Excerpts from Philipp Marek's message of 2015-08-05 00:10:30 -0700:
> 
> > >Well, is it already decided that Pacemaker would be chosen to provide HA in
> > >Openstack? There's been a talk "Pacemaker: the PID 1 of Openstack" IIRC.
> > >
> > >I know that Pacemaker's been pushed aside in an earlier ML post, but IMO
> > >there's already *so much* been done for HA in Pacemaker that Openstack
> > >should just use it.
> > >
> > >All HA nodes need to participate in a Pacemaker cluster - and if one node
> > >loses connection, all services will get stopped automatically (by
> > >Pacemaker) - or the node gets fenced.
> > >
> > >
> > >No need to invent some sloppy scripts to do exactly the tasks (badly!) that
> > >the Linux HA Stack has been providing for quite a few years.
> > So just a piece of information, but Yahoo (the company I work for, with VMs
> > in the tens of thousands, and bare metal in far greater numbers than that)
> > hasn't used Pacemaker, and in all honesty this is the first project
> > (OpenStack) I have heard of that needs such a solution. I feel that we
> > really should be building our services better so that they can be A-A,
> > rather than having to depend on another piece of software to get around
> > our 'sloppiness' (for lack of a better word).
> > 
> > Nothing against pacemaker personally... IMHO it just doesn't feel like we
> > are doing this right if we need such a product in the first place.
> Well, Pacemaker is *the* Linux HA Stack.
> 

I'm not sure it's wise to claim the definite article for anything in
Open Source. :)

That said, it's certainly the most mature and widely accepted option.

> So, before trying to achieve similar goals by self-written scripts (and 
> having to re-discover all the gotchas involved), it would be much better to 
> learn from previous experiences - even if they are not one's own.
> 
> Pacemaker has, e.g., the concept of clones[1] - these define services that
> run multiple instances within a cluster. And behold! each instance gets a
> Pacemaker-internal unique id[2], which can be used to do sharding.
> 
> 
> Yes, that still means that upon a service or node crash the failed instance
> has to be started on some other node; but as that node will typically be up
> and running already, the startup time should be in the range of seconds.
> 
> 
> We'd instantly get
>  * a supervisor to start/stop/restart/fence/monitor the service(s)
>  * node/service failure detection
>  * only small changes needed in the services
>  * and all that in a tested software that's available in all distributions,
>    and that already has its own testsuite...
> 
> 
> If we decide that this solution won't fulfill all our expectations, fine -
> let's use something else.
> 
> But I don't think it makes *any* sense to try to redo some (existing) 
> High-Availability code in some quickly written scripts, just because it 
> looks easy - there are quite a few traps for the unwary.
> 

I think Keystone's dev team agrees with you, and also doesn't want to get
in the way of that with any half-baked solution. They give you all the
CLI tools and filesystem layouts to make this work properly. It would
even be nice to ship the Pacemaker resources in a contrib directory and
run tests on them in the gate. But if users have some reason not to use
Pacemaker, they shouldn't be forced to use it.
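For the curious, here is a rough sketch of the clone/sharding idea Philipp
describes. The resource and agent names below are invented, and the exact
meta attributes can vary by Pacemaker version; the key point is that with
globally-unique clones, Pacemaker hands each instance its own number via
the OCF_RESKEY_CRM_meta_clone environment variable, which a resource agent
can use as a shard id:

```shell
#!/bin/sh
# Sketch of the sharding logic inside a hypothetical OCF resource agent.
# Assumes the service was cloned roughly like this (pcs syntax, names invented):
#   pcs resource create my-svc ocf:example:my-agent
#   pcs resource clone my-svc clone-max=3 clone-node-max=3 globally-unique=true
#
# Pacemaker passes each clone instance its number (0 .. clone-max-1) in
# OCF_RESKEY_CRM_meta_clone, and clone-max in OCF_RESKEY_CRM_meta_clone_max.
# Default both for standalone testing outside the cluster.
INSTANCE="${OCF_RESKEY_CRM_meta_clone:-0}"
TOTAL="${OCF_RESKEY_CRM_meta_clone_max:-3}"

# Use the instance number directly as the shard id.
SHARD=$(( INSTANCE % TOTAL ))
echo "instance ${INSTANCE}: serving shard ${SHARD} of ${TOTAL}"
```

Each instance would then work only on its own slice of the keyspace, and
after a failover the restarted instance inherits the same number, so the
shard assignment stays stable.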

__________________________________________________________________________
OpenStack Development Mailing List (not for usage questions)
Unsubscribe: openstack-dev-requ...@lists.openstack.org?subject:unsubscribe
http://lists.openstack.org/cgi-bin/mailman/listinfo/openstack-dev
