On 10/15/2014 06:30 PM, Jay Pipes wrote:
>
>
> On 10/15/2014 04:50 PM, Florian Haas wrote:
>> On Wed, Oct 15, 2014 at 9:58 PM, Jay Pipes <jaypi...@gmail.com> wrote:
>>> On 10/15/2014 03:16 PM, Florian Haas wrote:
>>>>
>>>> On Wed, Oct 15, 2014 at 7:20 PM, Russell Bryant <rbry...@redhat.com>
>>>> wrote:
>>>>>
>>>>> On 10/13/2014 05:59 PM, Russell Bryant wrote:
>>>>>>
>>>>>> Nice timing. I was working on a blog post on this topic.
>>>>>
>>>>>
>>>>> which is now here:
>>>>>
>>>>> http://blog.russellbryant.net/2014/10/15/openstack-instance-ha-proposal/
>>>>>
>>>>
>>>>
>>>> I am absolutely loving the fact that we are finally having a
>>>> discussion in earnest about this. I think this deserves a Design
>>>> Summit session.
>>>>
>>>> If I may weigh in here, let me share what I've seen users do and what
>>>> can currently be done, and what may be supported in the future.
>>>>
>>>> Problem: automatically ensure that a Nova guest continues to run, even
>>>> if its host fails.
>>>>
>>>> (That's the general problem description and I don't need to go into
>>>> further details explaining the problem, because Russell has done that
>>>> beautifully in his blog post.)
>>>>
>>>> Now, what are the options?
>>>>
>>>> (1) Punt and leave it to the hypervisor.
>>>>
>>>> This essentially means that you must use a hypervisor that already has
>>>> HA built in, such as VMware with the vCenter driver. In that scenario,
>>>> Nova itself neither deals with HA, nor exposes any HA switches to the
>>>> user. Obvious downside: not generic, doesn't work with all
>>>> hypervisors, most importantly doesn't work with the most popular one
>>>> (libvirt/KVM).
>>>>
>>>> (2) Deploy Nova nodes in pairs/groups, and pretend that they are one
>>>> node.
>>>>
>>>> You can already do that by overriding "host" in nova-compute.conf,
>>>> setting resume_guests_state_on_host_boot, and using VIPs with
>>>> Corosync/Pacemaker. You can then group these hosts in host aggregates,
>>>> and the user's scheduler hint to point a newly scheduled guest to such
>>>> a host aggregate becomes, effectively, the "keep this guest running at
>>>> all times" flag. Upside: no changes to Nova at all, monitoring,
>>>> fencing and recovery for free from Corosync/Pacemaker. Downsides:
>>>> requires vendors to automate Pacemaker configuration in deployment
>>>> tools (because you really don't want to do those things manually).
>>>> Additional downside: you either have some idle hardware, or you might
>>>> be overcommitting resources in case of failover.
>>>>
>>>> (3) Automatic host evacuation.
>>>>
>>>> Not supported in Nova right now, as Adam pointed out at the top of the
>>>> thread, and repeatedly shot down. If someone were to implement this,
>>>> it would *still* require that Corosync/Pacemaker be used for
>>>> monitoring and fencing of nodes, because re-implementing this from
>>>> scratch would be the reinvention of a wheel while painting a bikeshed.
>>>>
>>>> (4) Per-guest HA.
>>>>
>>>> This is the idea of just doing "nova boot --keep-this running", i.e.
>>>> setting a per-guest flag that still means the machine is to be kept up
>>>> at all times. Again, not supported in Nova right now, and probably
>>>> even more complex to implement generically than (3), at the same or
>>>> greater cost.
>>>>
>>>> I have a suggestion to tackle this that I *think* is reasonably
>>>> user-friendly while still bearable in terms of Nova development
>>>> effort:
>>>>
>>>> (a) Define a well-known metadata key for a host aggregate, say "ha".
>>>> Define that any host aggregate that represents a highly available
>>>> group of compute nodes should have this metadata key set.
>>>>
>>>> (b) Then define a flavor that sets extra_specs "ha=true".
>>>>
>>>> Granted, this places an additional burden on distro vendors to
>>>> integrate highly-available compute nodes into their deployment
>>>> infrastructure. But since practically all of them already include
>>>> Pacemaker, the additional scaffolding required is actually rather
>>>> limited.
>>>
>>>
>>> Or:
>>>
>>> (5) Let monitoring and orchestration services deal with these use
>>> cases and have Nova simply provide the primitive API calls that it
>>> already does (i.e. host evacuate).
>>
>> That would arguably lead to an incredible amount of wheel reinvention
>> for node failure detection, service failure detection, etc. etc.
>
> How so? (5) would use existing wheels for monitoring and orchestration
> instead of writing all new code paths inside Nova to do the same thing.

Right, there may be some confusion here ... I thought you were both
agreeing that the use of an external toolset was a good approach for the
problem, but Florian's last message makes that not so clear ...

-- 
Russell Bryant

_______________________________________________
OpenStack-dev mailing list
OpenStack-dev@lists.openstack.org
http://lists.openstack.org/cgi-bin/mailman/listinfo/openstack-dev
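
For readers following the thread: Florian's suggestion (a)/(b) in the quoted text pairs a host-aggregate metadata key ("ha") with a flavor extra spec ("ha=true"). The sketch below is only an illustration of that matching idea in plain Python. It is not Nova code and does not reproduce the behaviour of any actual Nova scheduler filter; the `Aggregate` class, the `eligible_hosts` function, and the host and aggregate names are all hypothetical, and plain string equality is an assumed simplification.

```python
from dataclasses import dataclass, field


@dataclass
class Aggregate:
    # Hypothetical stand-in for a host aggregate; not a Nova class.
    name: str
    hosts: set
    metadata: dict = field(default_factory=dict)


def eligible_hosts(aggregates, extra_specs):
    """Return the hosts whose merged aggregate metadata satisfies every
    key/value pair in the flavor's extra_specs (simple equality only)."""
    # Collect the metadata each host inherits from its aggregates.
    host_metadata = {}
    for aggregate in aggregates:
        for host in aggregate.hosts:
            host_metadata.setdefault(host, {}).update(aggregate.metadata)

    # A host qualifies only if every requested spec matches exactly.
    return {
        host
        for host, metadata in host_metadata.items()
        if all(metadata.get(key) == value for key, value in extra_specs.items())
    }


# Assumed example layout: one aggregate of HA pairs, one of ordinary nodes.
aggregates = [
    Aggregate("ha-pairs", {"compute1", "compute2"}, {"ha": "true"}),
    Aggregate("general", {"compute3", "compute4"}),
]

# A flavor carrying "ha=true" is confined to the HA aggregate.
print(sorted(eligible_hosts(aggregates, {"ha": "true"})))
# A flavor with no extra specs can land on any host.
print(sorted(eligible_hosts(aggregates, {})))
```

Under these assumptions, choosing the "ha" flavor is the only thing the user has to do; the grouping of hosts and the Pacemaker setup behind it stay a deployment concern, which is the trade-off Florian describes.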