Excerpts from Philipp Marek's message of 2015-08-05 00:10:30 -0700:
>
> > >Well, is it already decided that Pacemaker would be chosen to provide HA in
> > >Openstack? There's been a talk "Pacemaker: the PID 1 of Openstack" IIRC.
> > >
> > >I know that Pacemaker's been pushed aside in an earlier ML post, but IMO
> > >there's already *so much* been done for HA in Pacemaker that Openstack
> > >should just use it.
> > >
> > >All HA nodes need to participate in a Pacemaker cluster - and if one node
> > >loses connection, all services will get stopped automatically (by
> > >Pacemaker) - or the node gets fenced.
> > >
> > >
> > >No need to invent some sloppy scripts to do exactly the tasks (badly!) that
> > >the Linux HA Stack has been providing for quite a few years.
> > So just a piece of information, but yahoo (the company I work for, with vms
> > in the tens of thousands, baremetal in the much more than that...) hasn't
> > used pacemaker, and in all honesty this is the first project (openstack)
> > that I have heard that needs such a solution. I feel that we really should
> > be building our services better so that they can be A-A vs having to depend
> > on another piece of software to get around our 'sloppiness' (for lack of a
> > better word).
> >
> > Nothing against pacemaker personally... IMHO it just doesn't feel like we
> > are doing this right if we need such a product in the first place.
> Well, Pacemaker is *the* Linux HA Stack.
>
I'm not sure it's wise to claim the definite article for anything in
Open Source. :) That said, it's certainly the most mature and widely
accepted.

> So, before trying to achieve similar goals by self-written scripts (and
> having to re-discover all the gotchas involved), it would be much better to
> learn from previous experiences - even if they are not one's own.
>
> Pacemaker has e.g. the concept of clones[1] - these define services that run
> multiple instances within a cluster. And behold! the instances get some
> Pacemaker-internal unique id[2], which can be used to do sharding.
>
>
> Yes, that still means that upon service or node crash the failed instance
> has to be started on some other node; but as that'll typically be up and
> running already, the startup time should be in the range of seconds.
>
>
> We'd instantly get
> * a supervisor to start/stop/restart/fence/monitor the service(s)
> * node/service failure detection
> * only small changes needed in the services
> * and all that in a tested software that's available in all distributions,
> and that already has its own testsuite...
>
>
> If we decide that this solution won't fulfill all our expectations, fine -
> let's use something else.
>
> But I don't think it makes *any* sense to try to redo some (existing)
> High-Availability code in some quickly written scripts, just because it
> looks easy - there are quite a few traps for the unwary.
>

I think Keystone's dev team agrees with you, and also doesn't want to
get in the way of that with any half-baked solution. They give you all
the CLI tools and filesystem layouts to make this work perfectly. It
would be nice to even ship the pacemaker resources in a contrib
directory and run tests in the gate on them. But if users have some
reason not to use it, they shouldn't be forced to use it.
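
For what it's worth, the sharding idea quoted above (clone instances
carrying a Pacemaker-assigned id) could look roughly like the sketch
below. This is only an illustration under assumptions, not something
any OpenStack service does today: it assumes the service is started by
an OCF resource agent running as a clone, and that the agent passes on
the OCF_RESKEY_CRM_meta_clone and OCF_RESKEY_CRM_meta_clone_max
environment variables Pacemaker sets for clone instances. The function
names (clone_identity, owns, my_share) and the "vol-N" keys are made up
for the example.

```python
import os
import zlib


def clone_identity(env=None):
    """Return (instance_index, instance_count) for this clone instance.

    Assumption: the variables below were exported by Pacemaker to the
    resource agent and are visible to the service. When they are
    missing we fall back to a single shard that owns everything.
    """
    env = os.environ if env is None else env
    index = int(env.get("OCF_RESKEY_CRM_meta_clone", "0"))
    count = int(env.get("OCF_RESKEY_CRM_meta_clone_max", "1"))
    if count < 1 or not 0 <= index < count:
        raise ValueError("bad clone identity: %d of %d" % (index, count))
    return index, count


def owns(key, index, count):
    """True if the instance with this index is responsible for key."""
    # crc32 is stable across processes and hosts, unlike built-in hash().
    return zlib.crc32(key.encode("utf-8")) % count == index


def my_share(keys, index, count):
    """Filter keys down to the ones this instance should work on."""
    return [k for k in keys if owns(k, index, count)]


if __name__ == "__main__":
    index, count = clone_identity()
    work = ["vol-%d" % i for i in range(10)]  # hypothetical work items
    print("instance %d/%d handles %s" % (index, count, my_share(work, index, count)))
```

Plain modulo hashing is the simplest choice here; it reshuffles most
keys whenever clone-max changes, so something like consistent hashing
may be preferable if the instance count is expected to vary.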