Just another average day at the office. Friday around 3pm, we no longer have access to the "outside" mailserver for some of our domains. First thing, of course, is that we "think" the server is down again. Again... yes, again, as it was down for about 2 hours on Thursday too. But no, it was not down. After some extra testing (logged in to a VPS in Sweden, and connecting to the server from there worked fine), it turned out that some link on the internet was down.

Doing a traceroute showed that we went via Hurricane Electric to get from Europe to the US. The last hop we could see was Paris, France. Tracing from the US towards that hop in Paris showed that the last hop was NYC. So there was a problem between Paris and NYC. It's HE, so it won't take long... Right... 2 hours later, still no change.

Luckily we have a web hosting account somewhere else, so we created the most important mailboxes on it and changed the MX records (they are on other systems than mail and web). A couple of minutes later we have mail again. The TTL on our MX records (the primary MX, that is) is set very low, just in case, and in this case that was important. Next step was to change the IP of the main server on the backup mailserver, so that everything that went to the backup now also goes to the "new" MX server.

Saturday around noon, the link is still down :-( For some reason I had forgotten that we could try to reach the original MX via our 2nd internet connection. Tested that, and yes, that worked fine.

On our internal mailcollector (it collects mail from various domains/mailboxes and redistributes it to internal mailboxes) we have no remote access (don't ask :-)). I wanted to log in locally, so a keyboard and monitor had to be connected. Once they were connected, the keyboard didn't work; it looked like the system needed a reboot. Did a power-off/power-on of the system, and after about 10 minutes still nothing... When checking the server, there was a very bad ticking noise :-( The harddisk was giving problems...
Looking at the other "spare" systems we have, to see if we can install a new mailcollector. First system: power supply works, but no image. 2nd system: hmmm, kind of stupid to use an 8 GB system for just a mailcollector (that 8 GB system is in fact our Win2008 system). 3rd system boots into Debian (with VMware)... Let's use that one.

Booted the system with a Win2003 x64 CD and the installation starts. Installation is done and no network :-( The OS doesn't recognize the NIC :-( As this is a rack-mount case with a desktop motherboard but without a riser card, we cannot insert another PCI NIC. Luckily I still have a USB network adapter and a USB stick with drivers, and we're on the network...

The "old" system is still trying to boot... the screen already shows something that looks like a Win2003 startup screen, but still no login and no services running.

On the new system we install vpop3 (trial for the moment), configure our main mailboxes and start collecting the mail from the new MX. Once we added a new route for the old MX, we can collect all mail again. And surprise, the old system is up and running again... great, at least we didn't lose those 10000 "older" mails.

It's now Saturday, almost 6pm, and surprise, the HE connection is working again... we can again access the old MX via our normal internet connection.

What did we learn and what will we do:
* have a better backup
* have more working spare systems
* buy some extra spare systems (eBay will help)
* install a new mailfetcher on a "small" system (will go for a Linux system instead of a Win2003 system)
* always have remote access to our systems, even internal servers

dirk
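PS: The check that was forgotten until Saturday (can the old MX still be reached, and can it be reached via the 2nd internet connection?) could be scripted roughly as below. This is only a sketch under assumptions: the function name can_reach and its parameters are made up for illustration, and it only tests whether a TCP connection to the SMTP port opens, not whether mail actually flows. Binding to a source address also only selects the 2nd connection if routing on the machine is set up for that (in our case a route for the old MX had to be added).

```python
import socket


def can_reach(host, port=25, source_ip=None, timeout=10.0):
    """Return True if a TCP connection to host:port can be opened.

    source_ip (optional) is the local address to bind to before
    connecting, e.g. the address used on the 2nd internet connection.
    """
    # create_connection expects a (host, port) tuple as source address;
    # port 0 lets the OS pick any free local port.
    source = (source_ip, 0) if source_ip else None
    try:
        with socket.create_connection(
            (host, port), timeout=timeout, source_address=source
        ):
            return True
    except OSError:
        # Covers refused connections, timeouts, unreachable networks
        # and name resolution failures.
        return False
```

Calling can_reach("mx.example.com") and then can_reach("mx.example.com", source_ip="192.0.2.10") (both values are placeholders) from the office would show fairly quickly whether the server itself or only one path to it is down.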