On a somewhat related note - somebody at Joyent accidentally rebooted their entire US-East-1 datacenter.
Postmortem is here:
http://www.joyent.com/blog/postmortem-for-outage-of-us-east-1-may-27-2014

On Tue, May 27, 2014 at 4:22 PM, Mark McCullough <[email protected]> wrote:

> I've done a lot of this with cfengine.
>
> With pushes, cfrun has a rate limit capability to limit the push to no
> more than x hosts at a time, built in. We set up a framework of cfengine
> classes where we flagged sandbox, beta, nonprod, prod1, prod2 groups and a
> policy couldn't skip levels without unusual overrides that set off alarms.
>
> It worked extremely well, and was key to the buy-in of cfgmgmt as a
> concept.
>
> On 2014 May 27, at 10:31 , Chaos Golubitsky <[email protected]>
> wrote:
>
> > On Mon, 19 May, 2014 at 11:05:30 -0700, Brent Chapman wrote:
> >
> >> Google uses both of these patterns ("rate limit your rollouts" and "one,
> >> few, many") together in many of its systems; the value of these patterns
> >> has been proven many, many times in allowing us to catch "unexpected"
> >> failures ("it worked fine in testing, and in the first few hosts we
> >> updated, and in the first few clusters, but then it blew up...") before
> >> they swept through an entire service or the whole fleet.
> >
> > Out of curiosity, is anyone using config management tools to do this kind
> > of rate limiting or one/few/many rollout? In particular, while i've never
> > used Ansible, i gather some people choose it over other CM tools because
> > it has functionality for, at the very least, "roll out to N hosts at a
> > time" type updates. Is anyone using it (or any other open source tool)
> > to manage the logic of staged updates? If so, do you like it?
> >
> > Thanks.
> >
> > Chaos
>
> ----
> "The speed of communications is wondrous to behold. It is also true that
> speed can multiply the distribution of information that we know to be
> untrue." Edward R Murrow (1964)
>
> Mark McCullough
> [email protected]
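
For anyone skimming the thread, the staged rollout Mark describes in the quoted message above (ordered tiers that cannot be skipped, plus a cap on hosts per push) can be sketched roughly as follows. This is a hypothetical Python illustration, not cfengine or cfrun syntax. Only the tier names come from the thread; `batches`, `staged_rollout`, the `push` callback, and the inventory layout are all assumptions made up for the example.

```python
from typing import Callable, Dict, List

# Tier order taken from the thread; everything else here is illustrative.
TIERS = ["sandbox", "beta", "nonprod", "prod1", "prod2"]


def batches(hosts: List[str], limit: int) -> List[List[str]]:
    """Split hosts into consecutive groups of at most `limit` hosts."""
    if limit < 1:
        raise ValueError("limit must be at least 1")
    return [hosts[i:i + limit] for i in range(0, len(hosts), limit)]


def staged_rollout(
    inventory: Dict[str, List[str]],
    push: Callable[[str], bool],
    limit: int,
) -> List[str]:
    """Push tier by tier, in order, at most `limit` hosts per batch.

    Stops after the first batch containing a failure, so later batches
    and later tiers are never reached. Returns the hosts that were
    updated successfully.
    """
    done: List[str] = []
    for tier in TIERS:
        for batch in batches(inventory.get(tier, []), limit):
            results = [(host, push(host)) for host in batch]
            done.extend(host for host, ok in results if ok)
            if not all(ok for _, ok in results):
                return done  # halt: leave remaining hosts and tiers alone
    return done
```

In this sketch `push` would wrap whatever actually applies the change to one host and report success or failure. Halting on the first bad batch is what keeps a change that "worked fine in testing" from sweeping through every tier.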
_______________________________________________
Discuss mailing list
[email protected]
https://lists.lopsa.org/cgi-bin/mailman/listinfo/discuss
This list provided by the League of Professional System Administrators
http://lopsa.org/
