For some reason Darren's message is late in reaching my inbox, so I'm replying to Tristan's response instead.

On Tue, Feb 21, 2012 at 3:40 PM, i3D.net - Tristan van Bokkem <tristanvanbok...@i3d.nl> wrote:
> Agreed, but you're talking about HA at a different layer - within the
> applications that will sit on top of the cloud infrastructure. Whilst
> that's absolutely valid, and a message that needs to be spread more
> amongst those designing apps for the cloud, it's not addressing the
> concerns of the OP.
>
> We still need to be thinking about HA of the cloud infrastructure
> components themselves, so that there are less failures for the app
> designers to have to tolerate in the first place.

True. There are two different layers to be addressed here, and I believe HA of the cloud infra components is a pretty crucial one to address.

That being said -- and excuse me for beating the Pacemaker drum again -- the issue of sites (multiple cloud "cells", if you will, that communicate automagically, maintain a distributed consensus about site availability, and fail over automatically) is also solved in that stack. The relevant project is named booth; for those who are interested, the SUSE docs team has whipped up some useful documentation for that:

http://doc.opensuse.org/products/draft/SLE-HA/SLE-ha-guide_sd_draft/cha.ha.geo.html

This is an automatic site-failover mechanism based on site "tickets", the state of which is maintained via a simple Paxos cluster (for those into Ceph/RADOS, this will sound familiar). Pacemaker can then simply fire up or shut down resources (OpenStack services) based on site availability. Third-party arbitration is supported.

Hope this is useful.

Cheers,
Florian

--
Need help with High Availability?
http://www.hastexo.com/now