FYI, this caused a Beta Cluster outage tonight.

----- Forwarded message from Andrew Bogott <[email protected]> -----

> Date: Thu, 26 Feb 2015 19:12:23 -0800
> From: Andrew Bogott <[email protected]>
> To: Operations Engineers <[email protected]>
> Subject: Re: [Ops] 2015-02-24 Labs outage post-mortem
>
> This happened again, just now. I don't have any theory for what's happening
> -- it looks like a software issue except that it's now happened twice on
> virt1012 and 1012 should be identical to 1011 and 1010.
>
> Giuseppe has already had a go at this issue -- I'd appreciate any log-diving
> that anyone else is able to do. Additionally, I'd feel a lot better if we
> could order at least one more server, tomorrow[1]. As it is, even if we
> decide that virt1012 is cursed there's nowhere else for us to go.
>
> [1] Related tickets:
> https://phabricator.wikimedia.org/T90783
> https://phabricator.wikimedia.org/T89752
> https://phabricator.wikimedia.org/T90962
>
>
> On 2/24/15 11:10 AM, Andrew Bogott wrote:
> >We suffered yet another virt outage last night -- this time instance
> >networking failed on virt1012. Awkwardly, virt1012 is where I moved
> >everything from virt1005 during the outage last week, so all the same
> >instances were affected this week as last.
> >
> >The outage report is here:
> >
> >https://wikitech.wikimedia.org/wiki/Incident_documentation/20150224-LabsOutage
> >
> >
> >We didn't learn much from this one -- I welcome your thoughts and
> >additions.
> >
> >-Andrew
> >
>
> _______________________________________________
> Ops mailing list
> [email protected]
> https://lists.wikimedia.org/mailman/listinfo/ops

----- End forwarded message -----

-- 
| Greg Grossmeier            GPG: B2FA 27B1 F7EB D327 6B8E |
| identi.ca: @greg                A18D 1138 8E47 FAC8 1C7D |

_______________________________________________
QA mailing list
[email protected]
https://lists.wikimedia.org/mailman/listinfo/qa
