Ok, the fix for CLOUDSTACK-600 is the first patch mentioned plus two fixes to it. The first additional patch handles upgrades, and the second one is a minor fix for when we shut down a non-persistent VM that will avoid printing useless scary messages (we throw an error in the timeout loop that waits for the vm to shut down when it succeeds, because the domain can't be found).
I've tested this patch set both in a fresh 4.1 install and a fresh master install, and both were able to start/stop VMs successfully. When upgrading an existing 4.1 install (without the patches), the VMs were undefined on shutdown as they should be. commit 5dfcd309f10e5bd6a918f7fdff3f44a3dff2374a Author: Wido den Hollander <w...@widodh.nl> Date: Thu Feb 7 22:58:20 2013 +0100 agent: Do not define domains persistent in libvirt We used to define domains persistent in libvirt, which caused XML definitionsto stay there after a reboot of the hypervisor. We however don't do anything with those already defined domains, actually, we wipe all defined domains when starting the agent. Some users however reported that libvirt started these domains after a reboot before the CloudStack agent was started. By starting domains from the XML description and not defining them we prevent them from ever being stored in libvirt. commit 8d7d1cd5623a6744a954fb35eeff8408d6381083 Author: Marcus Sorensen <mar...@betterservers.com> Date: Wed Mar 13 11:10:56 2013 -0600 Summary: KVM - undefine persistent VMs on stop Detail: A previous patch fixed an issue where we are defining VMs to persist locally on KVM hosts, which can cause issues if the agent isn't running and libvirt decides to start the VM unbeknownst to cloudstack. The previous patch stopped defining VMs as persistent. This patch adds compatibility for existing cloudstack environments, removing the persistent definition on stop if needed. BUG-ID: CLOUDSTACK-600 commit 97d2e3fe7772fa01941295397f9d59d35cf47671 Author: Marcus Sorensen <mar...@betterservers.com> Date: Wed Mar 13 12:57:46 2013 -0600 Summary: KVM - remove harmless message about domain not found on VM stop Detail: When we stop a VM, it's definition is no longer valid. Therefore, we need to catch the exception thrown from libvirt in trying to lookup a non-existent domain by UUID while trying to check if it's shut down. BUG-ID:CLOUDSTACK-600 Signed-off-by: Marcus Sorensen <mar...@betterservers.com> 1363201066 -0600 On Tue, Mar 12, 2013 at 11:49 PM, Marcus Sorensen <shadow...@gmail.com> wrote: > Looks like this patch might need to be adjusted slightly. We can use > the domain.isPersistent() method to determine if we need to call > domain.undefine() in addition to dm.shutdown and dm.destroy. That will > allow upgrades to clean up existing definitions as VMs are shutdown. > If you don't have a chance to get to that, I'll try to tomorrow. > > On Tue, Mar 12, 2013 at 7:23 PM, Marcus Sorensen <shadow...@gmail.com> wrote: >> To clarify, this is a fairly serious bug, and it should probably be >> addressed, I'm just wanting clarification that the patch is good to go >> as a fix. On my way home I was thinking about how I'm not sure if it >> addresses upgrades, i.e. does it still allow existing definitions to >> be wiped, or will those be forever stuck. I haven't even looked yet. >> >> Essentially, the bug is that if a KVM host is uncleanly shut down, and >> then brought back up without starting the agent, the VMs that were >> running on the system will be started by libvirt/startup scripts, >> which can be disastrous for shared-storage VMs. So if a host goes >> down without being put into maintenance mode, for whatever reason, and >> the agent is either slow to start, the agent fails to start for some >> reason, or auto-start is disabled for some reason, you're almost >> guaranteed to have two VMs running off of the same shared storage >> image. >> >> We don't want to make any VMs permanently known to a host, and libvirt >> provides a simple way to define a VM domain without making it >> permanent. >> >> On Tue, Mar 12, 2013 at 6:42 PM, Chip Childers >> <chip.child...@sungard.com> wrote: >>> On Tue, Mar 12, 2013 at 06:39:09PM -0600, Marcus Sorensen wrote: >>>> This addresses CLOUDSTACK-600. Please get buyoff from Wido if >>>> possible, since it's his patch. I've reviewed and tested it, but I >>>> want to make sure I'm not wrong in assuming that this should be in 4.1 >>>> >>>> commit 5dfcd309f10e5bd6a918f7fdff3f44a3dff2374a >>>> Author: Wido den Hollander <w...@widodh.nl> >>>> Date: Thu Feb 7 22:58:20 2013 +0100 >>>> >>>> agent: Do not define domains persistent in libvirt >>>> >>>> We used to define domains persistent in libvirt, which caused XML >>>> definitions >>>> to stay there after a reboot of the hypervisor. >>>> >>>> We however don't do anything with those already defined domains, >>>> actually, we wipe >>>> all defined domains when starting the agent. >>>> >>>> Some users however reported that libvirt started these domains >>>> after a reboot >>>> before the CloudStack agent was started. >>>> >>>> By starting domains from the XML description and not defining them >>>> we prevent >>>> them from ever being stored in libvirt. >>>> >>> >>> Wido - I'll let you comment before we apply this.