Just another average day at the office.
Friday around 3pm, we no longer have access to our "outside" mailserver for some
of our domains.
First thing, of course, is that we "think" the server is down again.
Again... yes, again, as it was down for about 2 hours on Thursday too.
But no, it was not down... After some extra testing (logged in on a VPS in Sweden
and tried to connect to the server from there, which worked fine), it turned out
that some link on the internet was down. A traceroute showed that we went via
Hurricane Electric (HE) to get from Europe to the US. The last hop we could see
was Paris, France. Tracing from the US to that hop in Paris showed that the last
hop was NYC. So the problem was somewhere between Paris and NYC. It's HE, so it
won't take long... Right... 2 hours later still no change.
Luckily we have a webhosting account somewhere else, so we create the most
important mailboxes on it and change the MX records (DNS runs on other systems
than mail and web). A couple of minutes later we have mail again. The TTL on
our MX records (the primary MX, that is) is set very low, just in case, and in
this case it was important. The next step was to change the IP of the main
server on the backup mailserver, so everything that went to the backup now also
goes to the "new" MX server.
Saturday around noon, the link is still down :-(
For some reason I forgot that we could try to reach the original MX via our 2nd
internet connection. Tested that, and yes, that worked fine.
On our internal mailcollector (it collects mail from various domains/mailboxes
and re-distributes it to internal mailboxes) we have no remote access (don't ask
:-)). We wanted to log in locally, so a keyboard and monitor had to be connected.
Once they were connected, the keyboard didn't work; looks like we needed to
reboot the system. Did a power-off/power-on and after about 10 minutes still
nothing. When checking the server, there was a very bad ticking noise :-( the
harddisk was giving problems...
Looking at the other "spare" systems we have, to see if we can install a new
mailcollector on one of them. First system: power supply works, but no image.
2nd system: hmmm, kind of stupid to use an 8 GB system for just a mailcollector
(that 8 GB system is in fact our Win2008 system). 3rd system boots into Debian
(with VMware)... Let's use that one. Booted the system with a Win2003 x64 CD and
the installation starts. Installation is done and... no network :-( The OS
doesn't recognize the NIC :-( As this is a rack-mount case with a desktop
motherboard but without a riser card, we can't insert another PCI NIC. Luckily
I still have a USB network adapter and a USB stick with drivers, and we're on
the network.
The "old" system is still trying to boot; the boot screen already shows
something that looks like a Win2003 startup screen, but still no login or
services running.
On the new system we install vpop3 (a trial for the moment), configure our main
mailboxes and start collecting the mail from the new MX, and once we have added
a new route to the old MX we can collect all mail again.
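For the record, what a mailcollector like this does (pull mail from several outside POP3 mailboxes and hand it to internal mailboxes) can be sketched in Python with only the standard library. This is a hypothetical sketch, not how vpop3 works: the routing table, the internal addresses and the envelope sender are made up, and on Linux a ready-made tool such as fetchmail would normally do this job.

```python
import poplib
import smtplib
from email import message_from_bytes
from email.utils import getaddresses

# Hypothetical routing table: external address -> internal mailbox.
ROUTES = {
    "info@example.com": "info@internal.lan",
    "sales@example.org": "sales@internal.lan",
}
DEFAULT_MAILBOX = "postmaster@internal.lan"   # hypothetical catch-all
ENVELOPE_FROM = "collector@internal.lan"      # hypothetical envelope sender


def pick_internal_mailbox(raw: bytes, routes: dict, default: str) -> str:
    """Choose the internal mailbox for one message based on its recipient headers."""
    msg = message_from_bytes(raw)
    values = []
    for header in ("Delivered-To", "To", "Cc"):
        values.extend(str(v) for v in msg.get_all(header, []))
    for _name, addr in getaddresses(values):
        target = routes.get(addr.lower())
        if target:
            return target
    return default


def collect(pop_host: str, user: str, password: str, smtp_host: str,
            routes: dict = ROUTES, default: str = DEFAULT_MAILBOX) -> int:
    """Fetch every message from one POP3 mailbox and re-deliver it internally.

    Returns the number of messages moved.
    """
    pop = poplib.POP3_SSL(pop_host)
    moved = 0
    try:
        pop.user(user)
        pop.pass_(password)
        count, _size = pop.stat()
        with smtplib.SMTP(smtp_host) as smtp:
            for i in range(1, count + 1):
                _resp, lines, _octets = pop.retr(i)
                raw = b"\r\n".join(lines) + b"\r\n"
                rcpt = pick_internal_mailbox(raw, routes, default)
                smtp.sendmail(ENVELOPE_FROM, [rcpt], raw)
                # Mark for deletion only after the internal server accepted it;
                # POP3 applies deletions when the session ends with QUIT.
                pop.dele(i)
                moved += 1
    finally:
        pop.quit()
    return moved
```

Deleting only after a successful hand-off means a crash halfway (say, a dying harddisk) leaves the remaining mail on the outside server instead of losing it.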
And surprise, the old system is up and running again. Great, at least we didn't
lose those 1 "older" mails.
It's now Saturday, almost 6pm, and surprise, the HE connection is working
again; we can access the old MX via our normal internet connection again.
What did we learn and what will we do:
* have a better backup
* have more working spare systems
* buy some extra spare systems (eBay will help)
* install a new mailfetcher on a "small" system (will go for a Linux system
instead of a Win2003 system)
* always have remote access to our systems, even internal servers
dirk
To unsubscribe send a message with UNSUBSCRIBE in the subject line to
salive@woodstone.nu
If you use auto-responders (like out-of-the-office messages), make sure that
they are not sent to the list nor to individual members. Doing so will cause
you to be automatically removed from the list.