Send Outages-discussion mailing list submissions to
    outages-discussion@outages.org

To subscribe or unsubscribe via the World Wide Web, visit
    https://puck.nether.net/mailman/listinfo/outages-discussion
or, via email, send a message with subject or body 'help' to
    outages-discussion-requ...@outages.org

You can reach the person managing the list at
    outages-discussion-ow...@outages.org

When replying, please edit your Subject line so it is more specific
than "Re: Contents of Outages-discussion digest..."


Today's Topics:

   1. Re: S3 Outages Postmortem (Michael Christian)
   2. Re: S3 Outages Postmortem (Jim Popovitch)


----------------------------------------------------------------------

Message: 1
Date: Wed, 1 Mar 2017 23:44:37 -0800
From: Michael Christian <mfletcherchrist...@yahoo.com>
To: "Chapman, Brad (NBCUniversal)" <brad.chap...@nbcuni.com>
Cc: Kevin Blackham <black...@gmail.com>, Bob Strecansky
    <b...@mailchimp.com>, "outages-discussion@outages.org"
    <outages-discussion@outages.org>
Subject: Re: [Outages-discussion] S3 Outages Postmortem
Message-ID: <0b57d2b9-cfc3-4e5b-90be-05d56d657...@yahoo.com>
Content-Type: text/plain; charset="utf-8"

The outage was abrupt, but the recovery came in stages. Read traffic first, followed by write traffic ~1.5 hours later. That makes me think a power problem, or automation gone awry. We always blame the network team, but that rings hollow to me here.

On strategy, I am fully behind prioritization of read traffic recovery over write traffic. That's evolving over time, but is still true for most use cases.

For those saying "who cares," you may not understand the number of blended integrated systems out there in this age. This took down a huge number of correlated services, and it shouldn't have. We need looser coupling.

- Mike Christian

Sent from my iPad

> On Mar 1, 2017, at 11:25 AM, Chapman, Brad (NBCUniversal)
> <brad.chap...@nbcuni.com> wrote:
>
> "lots of services affected"
>
> Well, that was pretty obvious from the dashboard yesterday:
>
> https://i.imgur.com/xTec0Bn.png
>
> -Brad
>
> From: Outages-discussion [mailto:outages-discussion-boun...@outages.org] On
> Behalf Of Kevin Blackham
> Sent: Wednesday, March 1, 2017 11:17 AM
> To: Bob Strecansky <b...@mailchimp.com>
> Cc: outages-discussion@outages.org
> Subject: Re: [Outages-discussion] S3 Outages Postmortem
>
> I have some insights, but I'm under NDA. This was big enough I expect some
> public disclosure (my words).
>
> I can tell you we observed lots of services affected, not just S3. EBS was
> jacking up IO all over the place, and many machines didn't even ping. SES was
> quite broken, as was autoscaling. One might conclude it was a network problem.
>
> On Mar 1, 2017 12:09, "Bob Strecansky" <b...@mailchimp.com> wrote:
> Has anyone heard anything about why S3 was down for 5 hours yesterday?
> Usually Amazon doesn't post postmortems, and I'm curious as to what happened.
>
> Thanks,
>
> Bob Strecansky
> --
> Thanks,
>
> -B
>
> _______________________________________________
> Outages-discussion mailing list
> Outages-discussion@outages.org
> https://puck.nether.net/mailman/listinfo/outages-discussion
-------------- next part --------------
An HTML attachment was scrubbed...
URL: <https://puck.nether.net/pipermail/outages-discussion/attachments/20170301/0cdc9eb1/attachment-0001.html>

------------------------------

Message: 2
Date: Thu, 2 Mar 2017 09:45:20 -0500
From: Jim Popovitch <jim...@gmail.com>
To: "outages-discussion@outages.org" <outages-discussion@outages.org>
Subject: Re: [Outages-discussion] S3 Outages Postmortem
Message-ID:
    <CAGfsgR0nFc+=T6RDd7NBRLaYQU1DrBNMEVx+cEmX9v-e=nz...@mail.gmail.com>
Content-Type: text/plain; charset=UTF-8

On Thu, Mar 2, 2017 at 2:44 AM, Michael Christian
<mfletcherchrist...@yahoo.com> wrote:
> For those saying "who cares," you may not understand the number of blended
> integrated systems out there in this age.

I'm someone who says 'who cares', but not in the context you're suggesting.

I say:

Who cares to see 30 outages posts for an outage in 1/20th of 1
provider's datacenter services?

Who cares to see 30 outages posts about "important" websites that
don't follow decades of best practices on redundancy and resiliency?

Who cares to see 30 outages posts about "me too", "me too", "me too"?

-Jim P.


------------------------------

Subject: Digest Footer

_______________________________________________
Outages-discussion mailing list
Outages-discussion@outages.org
https://puck.nether.net/mailman/listinfo/outages-discussion

------------------------------

End of Outages-discussion Digest, Vol 93, Issue 3
*************************************************