+1

On Oct 12, 2011 11:51 AM, <valdis.kletni...@vt.edu> wrote:
> On Wed, 12 Oct 2011 09:52:02 CDT, -Hammer- said:
>
> > What kills me is what they have told the public. They lost a "core
> > switch". I don't know if they actually mean network switch or not, but
> > I'm pretty sure any of us that work in an enterprise environment know
> > how to factor N+1 just for these types of days. And then the backup
> > solution failed? I'm not buying it either.
>
> Yeah, and that extra comma in the one config file that didn't make a
> difference when you tested the failover in the lab *never* makes a
> difference when it hits in the production network, right? Or they changed
> the config of the primary and it didn't get propagated just right to the
> backup, or they had mismatched firmware levels on the blades in the
> primary and backup switches, so traffic that didn't tickle a bug on the
> primary blades caused the blade to crash on the backup, or...
>
> Anybody on this list who's been around long enough probably has enough "We
> should have had N+2 because the N+1'th device failed too" stories to drain
> *several* pitchers of beer at a good pub... I've even had one case where my
> butt got *saved* from an ohnosecond-class whoops because the N+1'th device
> *was* crashed (stomped a config file, it replicated, was able to salvage a
> copy from a device that didn't replicate because it was down at the time).