I see several topics in this thread:

1) Is VM HA really relevant (in a cloud)?

This is the most difficult question to answer, because it really depends on who you are talking to and which user community you are facing. IMHO, most web-based applications that are born to run on a cloud probably already have a certain level of business resiliency built into the code, so the application or service can live happily when VMs come and go.

For traditional business applications, the scenario may be quite different. These apps are migrated to the cloud for reasons like cost savings, server consolidation, etc. Quite a few companies are evaluating OpenStack for their "private cloud" -- which is a weird term, IMHO.

In addition, while we are looking into the 'utility' vision of cloud, we can still ask ourselves:

a) Can we survive one month of power outage or water outage, even though there is abundant supply elsewhere on this planet?
b) What costs do we need to pay if we eventually make it?
c) Do we want to pay for this?

My personal experience is that our customers really want this feature (VM HA) for their private clouds. The question they asked us was: "Does OpenStack support VM HA? Maybe not for all VMs... We know we can have that using vSphere, Azure, or CloudStack..."

2) Where is the best location to provide VM HA?

Suppose that we do feel the need to support VM HA; then the questions that follow are 'where' and 'how'. A VM is not merely a bundle of compute processes; it is a virtual execution environment that consumes resources like storage and network bandwidth besides processor cycles. For that reason, Nova may NOT be the ideal location to deal with this cross-cutting concern. High availability involves redundant resource provisioning, effective failure detection and appropriate fail-over policies, including fencing. Imposing all these requirements on Nova is impractical. We may need to consider whether VM HA, if ever implemented/supported, should be part of the orchestration service, aka Heat.

3) Can/should we do the VM HA orchestration in Heat?

My perception is that it can be done in Heat, based on my limited understanding of how Heat works. It may imply some requirements on other projects (e.g. nova, cinder, neutron ...) as well, though Heat should be the orchestrator. What do we need then?

- A resource type for VM groups/clusters, for the redundant provisioning. VMs in the group can be identical instances, managed by a Pacemaker setup among the VMs, just like a WatchRule in Heat can be controlled by Ceilometer. Another way to do this is to have the VMs monitored via heartbeat messages sent by Nova (if possible/needed), or by some services injected into the VMs (consider what cfn-hup and cfn-signal do today).

  Either way, the VM group/cluster can decide how to react to a VM online/offline signal. It may choose to a) restart the VM in-place; b) remote-restart (aka evacuate) the VM somewhere else; c) live/cold migrate the VM to other nodes. The policies can be outsourced to other plugins to account for global load-balancing or power management requirements. But that is an advanced feature that warrants another blueprint.

- Some fencing support from nova, cinder, neutron to shoot the bad VMs in the head, so that a VM that cannot be reached is guaranteed to be cleanly killed.

- VM failure detectors that can reliably tell whether a VM has failed. A VM that misses its expected performance goal should sometimes be treated as failed as well, if we really want to be strict on this. A failure detector can reside inside Nova, as has been done for the 'service groups' there. It can also reside inside a VM, as a service installed there, sending out heartbeat messages (before the battery runs out, :))

- A generic signaling mechanism that allows secure message delivery back to Heat, indicating that a VM is alive or dead.

My current understanding is that we may avoid complicated task-flow here.

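To make the pieces above a bit more concrete, here is a rough sketch in Python of how a heartbeat-based failure detector, a fencing step and a per-group recovery policy could fit together. This is only an illustration under my own assumptions: none of the names below (HeartbeatDetector, VMGroup, Action, fence, policy) exist in Heat or Nova, and the fencing and policy callbacks are placeholders for whatever nova/cinder/neutron would actually provide.

```python
import enum
from dataclasses import dataclass, field
from typing import Callable, Dict, List, Optional, Tuple


class Action(enum.Enum):
    """The three reactions listed above (hypothetical names)."""
    RESTART_IN_PLACE = "restart"
    EVACUATE = "evacuate"
    MIGRATE = "migrate"


@dataclass
class HeartbeatDetector:
    """Treats a VM as failed if no heartbeat arrived within `timeout` seconds."""
    timeout: float
    last_seen: Dict[str, float] = field(default_factory=dict)

    def beat(self, vm_id: str, now: float) -> None:
        # Record the time of the latest heartbeat for this VM.
        self.last_seen[vm_id] = now

    def failed(self, now: float) -> List[str]:
        # A VM is suspected dead once its last heartbeat is older than timeout.
        return sorted(vm for vm, seen in self.last_seen.items()
                      if now - seen > self.timeout)

    def forget(self, vm_id: str) -> None:
        # Stop tracking a VM once recovery has been handed off.
        self.last_seen.pop(vm_id, None)


@dataclass
class VMGroup:
    """Hypothetical VM group: decides what to do with each failed member."""
    detector: HeartbeatDetector
    policy: Callable[[str], Action]   # pluggable fail-over policy (assumed)
    fence: Callable[[str], bool]      # returns True if the VM was cleanly killed

    def reconcile(self, now: float) -> List[Tuple[str, Optional[Action]]]:
        decisions: List[Tuple[str, Optional[Action]]] = []
        for vm_id in self.detector.failed(now):
            if not self.fence(vm_id):
                # Without successful fencing, recovery is unsafe: take no action
                # and keep the VM tracked so it is retried on the next pass.
                decisions.append((vm_id, None))
                continue
            decisions.append((vm_id, self.policy(vm_id)))
            self.detector.forget(vm_id)
        return decisions


# Usage: vm-b stops sending heartbeats and is evacuated after being fenced.
detector = HeartbeatDetector(timeout=30.0)
detector.beat("vm-a", now=0.0)
detector.beat("vm-b", now=0.0)
detector.beat("vm-a", now=40.0)
group = VMGroup(detector, policy=lambda vm: Action.EVACUATE, fence=lambda vm: True)
print(group.reconcile(now=45.0))
```

The point of the sketch is the ordering: detection only produces a suspicion, fencing has to succeed before any recovery action is chosen, and the policy itself stays pluggable. Time is passed in explicitly here just to keep the example deterministic.
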
Regards,
- Qiming

> >>For the most part we've been trying to encourage projects that want to
> >>control VMs to add such functionality to the Orchestration program, aka
> >>"Heat".
> >Yes, exactly.
> >
> >-jay
> >
> Hey folks,
>
> Just as a note for HA for VMs, our current heat-core thinking is our
> HARestarter resource functionality is a workflow (Restarter is a
> verb, rather than a Noun - Heat orchestrates Nouns) and would be
> better suited to a workflow service like Mistral. Clearly we don't
> know how to get from where we are today to the proper separation of
> concerns as pointed out by Zane Bitter in recent threads on the ml
> but just throwing this out there so folks are aware.
>
> Regards
> -steve