I was working at Nordstrom in Seattle in both 1988 and 1993 when fires in electrical vaults in the city (one in Belltown and one downtown) blacked out large areas. In each case, they kept a semi-trailer-mounted generator running for their data center for days while repairs were made.
Kurt

On Tue, Dec 22, 2009 at 12:34, David Lum <david....@nwea.org> wrote:
> I like the full disclosure. I've worked at places where all anyone would "see" is "We had a problem and fixed it, we're still the best."
>
> From: Greg Olson [mailto:gol...@markettools.com]
> Sent: Tuesday, December 22, 2009 12:13 PM
> To: NT System Admin Issues
> Subject: RE: My Tuesday Morning/Afternoon
>
> These are always good learning opportunities, if nothing else. I had the joy of being in the data center at 365 Main in SF back in 2007 when, through a really interesting set of events, it went dark. Even the best planning sometimes doesn't cover everything, but an outage does let us look at how we handled it and improve our response. I know we're in a far better state now in terms of which systems to bring up, and in what order, if something like this were to happen again. Our biggest time sinks were DBs coming online before the SAN had fully re-initialized, and servers booting up before the domain controllers were back up. We now have scripts on the servers that delay boot-up by X seconds to enforce that order.
>
> Always fun stuff :)
>
> Oh, and if you want to know what killed a data center with N+2 redundancy, here's a link to the final report:
>
> http://365main.com/status_update.html
>
> -Greg
>
> From: Sherry Abercrombie [mailto:saber...@gmail.com]
> Sent: Tuesday, December 22, 2009 11:22 AM
> To: NT System Admin Issues
> Subject: Re: My Tuesday Morning/Afternoon
>
> Actually Dave, I wish the AC had failed on the switch back to house power. We would have known right away, because we do a physical check on it when the generator goes off. At the latest, we would have known within a few minutes, because we would have started getting heat alert messages from our NetBotz. That would have given us a good 30 minutes to assess the situation and get the electrician in here, and would very realistically have avoided this whole situation.
>
> I am asking some very pointed questions right now. Why didn't the generator detect that the UPS was still on battery and kick in? Why didn't the electrician get notifications on his pager that the UPS was still running on battery, as it's supposed to be set up to do? And so on. This should not have happened.
>
> On Tue, Dec 22, 2009 at 12:30 PM, Eldridge, Dave <d...@parkviewmc.com> wrote:
>
> You're lucky the AC stayed on.
>
> Makes my day look easy. :) Good luck with this one.
>
> You should book that Colorado ski vacation soon and get away.
>
> From: Sherry Abercrombie [mailto:saber...@gmail.com]
> Sent: Tuesday, December 22, 2009 11:13 AM
> To: NT System Admin Issues
> Subject: My Tuesday Morning/Afternoon
>
> So here's how my Tuesday morning/afternoon is going so far.
>
> Arrive shortly after 7AM CST; all is quiet and functioning normally. I proceed with taking care of some Heat tickets, monitoring stuff as normal via Nagios, checking on backups, etc., etc. Send a few emails to this list about Citrix, blah blah blah, all rocking along nice and quiet like.
>
> 9AM - Generator testing, which happens every Tuesday at 9AM.
>
> 9:30AM - Generator shuts down. Check the AC in the server room to confirm it has made the switch from generator back to house power (it doesn't always switch, and it won't be cooling in that event). All is normal.
>
> 10:25AM - Fire suppression system starts alerting with a very loud beeping noise. The server room is locked down, and we cannot get in with our proximity badges. The fire suppression system continues to alert with a "System Problem" message.
>
> 10:28AM - Gain entry to the server room. The AC unit is running... and that's the only thing that is running. The server room is black, with no power.
>
> Now that the issue has been assessed, it has been determined that the UPS didn't switch from battery power (which it had been on for the generator test) back to house power. The batteries drained. When they drained completely and power went down, the fire suppression system performed as it was supposed to. It locked everything down, shut the dampers on the vents, etc., and went into an "almost a fire" mode. Fortunately, it didn't release, since no fire or smoke was detected.
>
> It's now 12:10PM, and almost everything is back online except for two uncooperative VMware hosts and the Oracle databases.
>
> Sigh... Can I go home now?
>
> Oh, and the replacement parts are on order and should be here in the morning. The electrician says he can replace them with no disruption of service. One minor detail: this could happen again before the parts get replaced tomorrow. Glad I'm not the on-call person this week!!
>
> --
> Sherry Abercrombie
>
> "Any sufficiently advanced technology is indistinguishable from magic."
> Arthur C. Clarke
>
> This e-mail contains the thoughts and opinions of the sender and does not represent official Parkview Medical Center policy.
>
> --
> Sherry Abercrombie
> Sent from Keller, TX, United States