On 2/23/15, 2:05 PM, "Matthew Booth" <mbo...@redhat.com> wrote:
>On 20/02/15 11:48, Matthew Booth wrote:
>> Gary Kotton came across a doozy of a bug recently:
>>
>> https://bugs.launchpad.net/nova/+bug/1419785
>>
>> In short, when you start a Nova compute, it will query the driver for
>> instances and compare that against the expected host of the instance
>> according to the DB. If the driver is reporting an instance the DB
>> thinks is on a different host, it assumes the instance was evacuated
>> while Nova compute was down, and deletes it on the hypervisor. However,
>> Gary found that you trigger this when starting up a backup HA node which
>> has a different `host` config setting. i.e. You fail over, and the first
>> thing it does is delete all your instances.
>>
>> Gary and I both agree on a couple of things:
>>
>> 1. Deleting all your instances is bad
>> 2. HA nova compute is highly desirable for some drivers
>>
>> We disagree on the approach to fixing it, though. Gary posted this:
>>
>> https://review.openstack.org/#/c/154029/
>>
>> I've already outlined my objections to this approach elsewhere, but to
>> summarise I think this fixes 1 symptom of a design problem, and leaves
>> the rest untouched. If the value of nova compute's `host` changes, then
>> the assumption that instances associated with that compute can be
>> identified by the value of instance.host becomes invalid. This
>> assumption is pervasive, so it breaks a lot of stuff. The worst one is
>> _destroy_evacuated_instances(), which Gary found, but if you scan
>> nova/compute/manager for the string 'self.host' you'll find lots of
>> them. For example, all the periodic tasks are broken, including image
>> cache management, and the state of ResourceTracker will be unusual.
>> Worse, whenever a new instance is created it will have a different value
>> of instance.host, so instances running on a single hypervisor will
>> become partitioned based on which nova compute was used to create them.
>>
>> In short, the system may appear to function superficially, but it's
>> unsupportable.
>>
>> I had an alternative idea. The current assumption is that the `host`
>> managing a single hypervisor never changes. If we break that assumption,
>> we break Nova, so we could assert it at startup and refuse to start if
>> it's violated. I posted this VMware-specific POC:
>>
>> https://review.openstack.org/#/c/154907/
>>
>> However, I think I've had a better idea. Nova creates ComputeNode
>> objects for its current configuration at startup which, amongst other
>> things, are a map of host:hypervisor_hostname. We could assert when
>> creating a ComputeNode that hypervisor_hostname is not already
>> associated with a different host, and refuse to start if it is. We would
>> give an appropriate error message explaining that this is a
>> misconfiguration. This would prevent the user from hitting any of the
>> associated problems, including the deletion of all their instances.
>
>I have posted a patch implementing the above for review here:
>
>https://review.openstack.org/#/c/158269/

I have to look at what you have posted. I think that this topic is
something that we should speak about at the summit, and that it should
fall under a BP with a well-defined spec. I really would not like to see
existing installations broken if and when this patch lands. It may also
affect Ironic, as it works on the same model.
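
To make the startup check proposed above concrete, here is a minimal
sketch in Python. It is not Nova code: HostMismatchError,
claim_hypervisor, and the plain dict standing in for the stored
ComputeNode records are all hypothetical names chosen for illustration,
and the patch under review may look quite different.

```python
class HostMismatchError(Exception):
    """Hypothetical error: the hypervisor is already managed by another host."""


def claim_hypervisor(compute_nodes, host, hypervisor_hostname):
    """Record that `host` manages `hypervisor_hostname`, or refuse.

    `compute_nodes` is a plain dict mapping hypervisor_hostname -> host.
    It is an assumed stand-in for the ComputeNode records in the DB.
    """
    recorded_host = compute_nodes.get(hypervisor_hostname)
    if recorded_host is not None and recorded_host != host:
        # Refuse to start rather than let later code (for example the
        # evacuation cleanup) act on instances owned by another host.
        raise HostMismatchError(
            "hypervisor %r is already associated with host %r, but this "
            "service is configured with host %r; this is a "
            "misconfiguration" % (hypervisor_hostname, recorded_host, host)
        )
    # First start, or a restart with an unchanged `host`: record the mapping.
    compute_nodes[hypervisor_hostname] = host
    return compute_nodes
```

With this sketch, restarting with the same `host` is a no-op, while a
backup node configured with a different `host` fails at startup instead
of going on to delete instances.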
>
>Matt
>--
>Matthew Booth
>Red Hat Engineering, Virtualisation Team
>
>Phone: +442070094448 (UK)
>GPG ID: D33C3490
>GPG FPR: 3733 612D 2D05 5458 8A8A 1600 3441 EA19 D33C 3490
>
>__________________________________________________________________________
>OpenStack Development Mailing List (not for usage questions)
>Unsubscribe: openstack-dev-requ...@lists.openstack.org?subject:unsubscribe
>http://lists.openstack.org/cgi-bin/mailman/listinfo/openstack-dev