Yup, Galera, thx! :)

As for the:

"It also doesn't handle the case where u can automatically recover from the
current resource owner (nova-compute for example) dying."

What I mean: heat is actively working on some resources, doing its thing,
and then its binary crashes (or a kill -9 occurs). What happens? The same
question applies to nova-compute. Hope that makes more sense now. To me u
need a system that can detect the liveness of processes and can
automatically handle the case where one dies (maybe by starting up another
heat, or nova-compute, or ...).

But ya, your summary is right: distributed systems are just wonky in
general. All I can say is that zookeeper is pretty battle tested :)

On 10/30/13 6:04 PM, "Clint Byrum" <[email protected]> wrote:

>Excerpts from Joshua Harlow's message of 2013-10-30 17:46:44 -0700:
>> This works as long as you have 1 DB and don't fail over to a secondary
>> slave DB.
>>
>> Now u can say we all must use percona (or similar) for this, but then
>
>Did you mean Galera, which provides multiple synchronous masters?
>
>> that's a change in deployment as well (and imho a bigger one). This is
>> where the concept of a quorum in zookeeper comes into play; the
>> transaction log that zookeeper maintains will be consistent among all
>> members in that quorum. This is a typical zookeeper deployment strategy
>> (selecting how many nodes u want for that quorum being an important
>> question).
>
>Galera uses more or less the exact same mechanism.
>
>> It also doesn't handle the case where u can automatically recover from
>> the current resource owner (nova-compute for example) dying.
>
>I don't know what that means.
>
>> Your atomic "check-if-owner-is-empty-and-store-instance-as-owner" is now
>> user initiated instead of being automatic (zookeeper provides these kinds
>> of notifications via its watch concept). So that makes it hard for, say,
>> an automated system (heat?) to react to these failures in any other way
>> than repeated polling (or repeated retries or periodic tasks), which
>> means that heat will not be able to react to failure in a 'live' manner.
>> So this to me is the liveness question that zookeeper is designed to help
>> out with. Of course u can simulate this in a DB with repeated polling (as
>> long as u don't try to do anything complicated with mysql, like
>> replicas/slaves with transaction logs that may not be caught up and that
>> u might have to fail over to if problems happen, since u are on your own
>> if this happens).
>
>Right, even if you have a Galera cluster you still have to poll it or
>use wonky things like triggers hooked up to memcache/gearman/amqp UDF's
>to get around polling latency.
>
>I think your point is that a weird MySQL is just as disruptive to "the
>normal OpenStack deployment" as a weird service like ZooKeeper.
>
>_______________________________________________
>OpenStack-dev mailing list
>[email protected]
>http://lists.openstack.org/cgi-bin/mailman/listinfo/openstack-dev
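[Editor's note] For readers following along, the DB-side pattern the thread
debates can be sketched in a few lines. This is a minimal sketch using
SQLite in-memory and hypothetical table/column names (`resources`, `owner`,
`heartbeat` are all invented for illustration, not an OpenStack schema): the
"check-if-owner-is-empty-and-store-instance-as-owner" claim is a single
atomic UPDATE, but detecting a dead owner still comes down to repeatedly
polling a heartbeat timestamp.

```python
import sqlite3
import time

# Hypothetical schema: one row per resource, with a nullable owner and a
# heartbeat timestamp the owner is expected to refresh while it is alive.
conn = sqlite3.connect(":memory:")
conn.execute("""CREATE TABLE resources (
    id INTEGER PRIMARY KEY,
    owner TEXT,
    heartbeat REAL)""")
conn.execute("INSERT INTO resources (id, owner, heartbeat) VALUES (1, NULL, NULL)")
conn.commit()

def try_claim(conn, resource_id, me):
    """Atomic check-if-owner-is-empty-and-store-instance-as-owner.

    The WHERE clause makes check-and-set a single atomic statement;
    rowcount tells us whether we actually won the claim.
    """
    cur = conn.execute(
        "UPDATE resources SET owner = ?, heartbeat = ? "
        "WHERE id = ? AND owner IS NULL",
        (me, time.time(), resource_id))
    conn.commit()
    return cur.rowcount == 1

def poll_and_take_over(conn, resource_id, me, timeout):
    """The polling side: with no watches, a would-be successor has to keep
    checking whether the current owner's heartbeat has gone stale and, if
    so, atomically steal ownership. Reaction time is bounded below by the
    polling interval, which is the 'liveness' gap the thread is about."""
    now = time.time()
    cur = conn.execute(
        "UPDATE resources SET owner = ?, heartbeat = ? "
        "WHERE id = ? AND heartbeat IS NOT NULL AND heartbeat < ?",
        (me, now, resource_id, now - timeout))
    conn.commit()
    return cur.rowcount == 1

# compute-1 claims the free resource; compute-2's claim then fails.
print(try_claim(conn, 1, "compute-1"))   # → True
print(try_claim(conn, 1, "compute-2"))   # → False

# Later, compute-1 has died (heartbeat no longer refreshed); once the
# timeout elapses, a poller can take over -- but only on its next poll.
conn.execute("UPDATE resources SET heartbeat = ? WHERE id = 1",
             (time.time() - 60,))
conn.commit()
print(poll_and_take_over(conn, 1, "compute-2", timeout=30))  # → True
```

Note the caveat from the thread still applies: this is only atomic against
one primary DB; with an unsynchronized replica after failover, the claim row
itself may be stale.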
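[Editor's note] By contrast, the ZooKeeper pattern referenced above is an
ephemeral node plus a watch: when the owner's session dies, the node
vanishes and watchers are called back immediately, with no polling loop.
The following is a toy in-process stand-in (invented class and method names,
not the real ZooKeeper or kazoo API) just to show the notification shape:

```python
import threading

class ToyWatchRegistry:
    """In-process stand-in for ZooKeeper's ephemeral-node + watch pattern
    (a toy, not the real API): owners register presence; watchers get a
    callback the moment an owner is deregistered."""

    def __init__(self):
        self._lock = threading.Lock()
        self._owners = {}   # path -> owner name (like ephemeral znodes)
        self._watches = {}  # path -> list of one-shot callbacks

    def create_ephemeral(self, path, owner):
        # Succeeds only if nobody currently owns the path.
        with self._lock:
            if path in self._owners:
                return False
            self._owners[path] = owner
            return True

    def watch(self, path, callback):
        with self._lock:
            self._watches.setdefault(path, []).append(callback)

    def session_died(self, path):
        """What the server does when an owner's session expires: the
        ephemeral node vanishes and watchers are notified right away."""
        with self._lock:
            self._owners.pop(path, None)
            callbacks, self._watches[path] = self._watches.get(path, []), []
        for cb in callbacks:
            cb(path)

reg = ToyWatchRegistry()
reg.create_ephemeral("/locks/instance-1", "compute-1")

fired = threading.Event()
def on_owner_gone(path):
    # React 'live': e.g. re-schedule the work, no polling interval to wait out.
    fired.set()

reg.watch("/locks/instance-1", on_owner_gone)
reg.session_died("/locks/instance-1")   # owner crashed / kill -9
print(fired.is_set())  # → True
```

The difference from the DB sketch is purely in who notices the death: here
the registry pushes the event to waiters, instead of waiters pulling on a
timer.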
