On Fri, Feb 20, 2015 at 3:48 AM, Matthew Booth <mbo...@redhat.com> wrote:
> Gary Kotton came across a doozy of a bug recently:
>
> https://bugs.launchpad.net/nova/+bug/1419785
>
> In short, when you start a Nova compute, it will query the driver for
> instances and compare that against the expected host of the instance
> according to the DB. If the driver is reporting an instance the DB
> thinks is on a different host, it assumes the instance was evacuated
> while Nova compute was down, and deletes it on the hypervisor. However,
> Gary found that you trigger this when starting up a backup HA node which
> has a different `host` config setting, i.e. you fail over, and the first
> thing it does is delete all your instances.
>
> Gary and I both agree on a couple of things:
>
> 1. Deleting all your instances is bad
> 2. HA nova compute is highly desirable for some drivers
>

There is a deeper issue here that we are trying to work around. Nova was
never designed to have entire systems running behind a nova-compute. It was
designed to have one nova-compute per 'physical box that runs instances'.

There have been many discussions in the past on how to fix this issue (by
adding a new point in nova where clustered systems can plug in), but if I
remember correctly the gotcha was that no one was willing to step up to do it.

>
> We disagree on the approach to fixing it, though. Gary posted this:
>
> https://review.openstack.org/#/c/154029/
>
> I've already outlined my objections to this approach elsewhere, but to
> summarise I think this fixes one symptom of a design problem, and leaves
> the rest untouched. If the value of nova compute's `host` changes, then
> the assumption that instances associated with that compute can be
> identified by the value of instance.host becomes invalid. This
> assumption is pervasive, so it breaks a lot of stuff. The worst one is
> _destroy_evacuated_instances(), which Gary found, but if you scan
> nova/compute/manager for the string 'self.host' you'll find lots of
> them.
> For example, all the periodic tasks are broken, including image
> cache management, and the state of ResourceTracker will be unusual.
> Worse, whenever a new instance is created it will have a different value
> of instance.host, so instances running on a single hypervisor will
> become partitioned based on which nova compute was used to create them.
>
> In short, the system may appear to function superficially, but it's
> unsupportable.
>
> I had an alternative idea. The current assumption is that the `host`
> managing a single hypervisor never changes. If we break that assumption,
> we break Nova, so we could assert it at startup and refuse to start if
> it's violated. I posted this VMware-specific POC:
>
> https://review.openstack.org/#/c/154907/
>
> However, I think I've had a better idea. Nova creates ComputeNode
> objects for its current configuration at startup which, amongst other
> things, are a map of host:hypervisor_hostname. We could assert when
> creating a ComputeNode that hypervisor_hostname is not already
> associated with a different host, and refuse to start if it is. We would
> give an appropriate error message explaining that this is a
> misconfiguration. This would prevent the user from hitting any of the
> associated problems, including the deletion of all their instances.
>
> We can still do active/passive HA!
>
> If we configure both nodes in the active/passive cluster identically,
> including with the same value of `host`, I don't see why this shouldn't
> work today. I don't even think the configuration is onerous. All we
> would be doing is preventing the user from accidentally running a
> misconfigured HA which leads to inconsistent state, and will eventually
> require manual cleanup.
>
> We would still have to be careful that we don't bring up both nova
> computes simultaneously. The VMware driver, at least, has hardcoded
> assumptions that it is the only writer in certain circumstances.
> That problem would have to be handled separately, perhaps at the messaging
> layer.
>
> Matt
> --
> Matthew Booth
> Red Hat Engineering, Virtualisation Team
>
> Phone: +442070094448 (UK)
> GPG ID: D33C3490
> GPG FPR: 3733 612D 2D05 5458 8A8A 1600 3441 EA19 D33C 3490
>
> __________________________________________________________________________
> OpenStack Development Mailing List (not for usage questions)
> Unsubscribe: openstack-dev-requ...@lists.openstack.org?subject:unsubscribe
> http://lists.openstack.org/cgi-bin/mailman/listinfo/openstack-dev
>
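For illustration, here is a minimal Python sketch of the startup check Matthew proposes: when recording the host:hypervisor_hostname mapping, refuse to start if that hypervisor is already associated with a different `host`. Every name here (ComputeNodeRecord, register_compute_node, HostMismatchError, instances_assumed_evacuated, and the dict standing in for the database) is hypothetical and is not Nova's actual API. instances_assumed_evacuated() is only a simplified model of the behaviour described in the bug report, not the real _destroy_evacuated_instances().

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class ComputeNodeRecord:
    """Hypothetical stand-in for a ComputeNode row (host -> hypervisor)."""
    host: str
    hypervisor_hostname: str


class HostMismatchError(RuntimeError):
    """Hypothetical error: the hypervisor is already owned by another host."""


def instances_assumed_evacuated(driver_uuids, db_host_by_uuid, my_host):
    """Simplified model of the check in the bug report: any instance the
    driver reports whose DB host differs from ours is treated as evacuated
    (and would then be destroyed on the hypervisor)."""
    return [u for u in driver_uuids if db_host_by_uuid.get(u) != my_host]


def register_compute_node(db, host, hypervisor_hostname):
    """Record host -> hypervisor_hostname at startup, refusing to start if
    the hypervisor is already associated with a different host."""
    existing = db.get(hypervisor_hostname)
    if existing is not None and existing.host != host:
        # Raise before touching the record, so the stored mapping is unchanged.
        raise HostMismatchError(
            f"hypervisor {hypervisor_hostname!r} is already associated with "
            f"host {existing.host!r}, but this service has host={host!r}; "
            "refusing to start (misconfigured HA pair?)"
        )
    record = ComputeNodeRecord(host, hypervisor_hostname)
    db[hypervisor_hostname] = record
    return record


if __name__ == "__main__":
    db = {}  # hypothetical in-memory stand-in for the ComputeNode table
    instance_hosts = {"inst-1": "compute-a", "inst-2": "compute-a"}
    register_compute_node(db, "compute-a", "cluster-1")  # primary starts

    # Failover node configured with a different `host` value.
    failover_host = "compute-b"
    doomed = instances_assumed_evacuated(
        list(instance_hosts), instance_hosts, failover_host)
    print("without a guard, cleanup would target:", doomed)
    try:
        register_compute_node(db, failover_host, "cluster-1")
    except HostMismatchError as exc:
        print("guard refused startup:", exc)

    # A passive node configured identically (same `host`) is accepted.
    register_compute_node(db, "compute-a", "cluster-1")
```

The design choice being illustrated is to fail loudly at startup rather than special-case one code path: an identically configured passive node passes the check, while a node with a different `host` is rejected. The intent, under these assumptions, is that such a check would run before any instance.host-based cleanup gets a chance to act.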