Ok, the fix for CLOUDSTACK-600 is the first patch mentioned plus two
fixes to it. The first additional patch handles upgrades, and the
second one is a minor fix for when we shut down a non-persistent VM
that will avoid printing useless scary messages (we throw an error in
the timeout loop that waits for the vm to shut down when it succeeds,
because the domain can't be found).

I've tested this patch set both in a fresh 4.1 install and a fresh
master install, and both were able to start/stop VMs successfully.
When upgrading an existing 4.1 install (without the patches), the VMs
were undefined on shutdown as they should be.

commit 5dfcd309f10e5bd6a918f7fdff3f44a3dff2374a
Author: Wido den Hollander <w...@widodh.nl>
Date:   Thu Feb 7 22:58:20 2013 +0100

    agent: Do not define domains persistent in libvirt

    We used to define domains persistent in libvirt, which caused XML
definitionsto stay there after a reboot of the hypervisor.

    We however don't do anything with those already defined domains,
actually, we wipe all defined domains when starting the agent.

    Some users however reported that libvirt started these domains
after a reboot before the CloudStack agent was started.

    By starting domains from the XML description and not defining them
we prevent them from ever being stored in libvirt.

commit 8d7d1cd5623a6744a954fb35eeff8408d6381083
Author: Marcus Sorensen <mar...@betterservers.com>
Date:   Wed Mar 13 11:10:56 2013 -0600

    Summary: KVM - undefine persistent VMs on stop

    Detail: A previous patch fixed an issue where we are defining VMs to persist
    locally on KVM hosts, which can cause issues if the agent isn't running and
    libvirt decides to start the VM unbeknownst to cloudstack. The
previous patch
    stopped defining VMs as persistent. This patch adds compatibility
for existing
    cloudstack environments, removing the persistent definition on
stop if needed.

    BUG-ID: CLOUDSTACK-600

commit 97d2e3fe7772fa01941295397f9d59d35cf47671
Author: Marcus Sorensen <mar...@betterservers.com>
Date:   Wed Mar 13 12:57:46 2013 -0600

    Summary: KVM - remove harmless message about domain not found on VM stop

    Detail: When we stop a VM, it's definition is no longer valid. Therefore, we
    need to catch the exception thrown from libvirt in trying to lookup a
    non-existent domain by UUID while trying to check if it's shut down.

    BUG-ID:CLOUDSTACK-600
    Signed-off-by: Marcus Sorensen <mar...@betterservers.com> 1363201066 -0600


On Tue, Mar 12, 2013 at 11:49 PM, Marcus Sorensen <shadow...@gmail.com> wrote:
> Looks like this patch might need to be adjusted slightly.  We can use
> the domain.isPersistent() method to determine if we need to call
> domain.undefine() in addition to dm.shutdown and dm.destroy. That will
> allow upgrades to clean up existing definitions as VMs are shutdown.
> If you don't have a chance to get to that, I'll try to tomorrow.
>
> On Tue, Mar 12, 2013 at 7:23 PM, Marcus Sorensen <shadow...@gmail.com> wrote:
>> To clarify, this is a fairly serious bug, and it should probably be
>> addressed, I'm just wanting clarification that the patch is good to go
>> as a fix. On my way home I was thinking about how I'm not sure if it
>> addresses upgrades, i.e. does it still allow existing definitions to
>> be wiped, or will those be forever stuck. I haven't even looked yet.
>>
>> Essentially, the bug is that if a KVM host is uncleanly shut down, and
>> then brought back up without starting the agent, the VMs that were
>> running on the system will be started by libvirt/startup scripts,
>> which can be disastrous for shared-storage VMs.  So if a host goes
>> down without being put into maintenance mode, for whatever reason, and
>> the agent is either slow to start, the agent fails to start for some
>> reason, or auto-start is disabled for some reason, you're almost
>> guaranteed to have two VMs running off of the same shared storage
>> image.
>>
>> We don't want to make any VMs permanently known to a host, and libvirt
>> provides a simple way to define a VM domain without making it
>> permanent.
>>
>> On Tue, Mar 12, 2013 at 6:42 PM, Chip Childers
>> <chip.child...@sungard.com> wrote:
>>> On Tue, Mar 12, 2013 at 06:39:09PM -0600, Marcus Sorensen wrote:
>>>> This addresses CLOUDSTACK-600.  Please get buyoff from Wido if
>>>> possible, since it's his patch. I've reviewed and tested it, but I
>>>> want to make sure I'm not wrong in assuming that this should be in 4.1
>>>>
>>>> commit 5dfcd309f10e5bd6a918f7fdff3f44a3dff2374a
>>>> Author: Wido den Hollander <w...@widodh.nl>
>>>> Date:   Thu Feb 7 22:58:20 2013 +0100
>>>>
>>>>     agent: Do not define domains persistent in libvirt
>>>>
>>>>     We used to define domains persistent in libvirt, which caused XML
>>>> definitions
>>>>     to stay there after a reboot of the hypervisor.
>>>>
>>>>     We however don't do anything with those already defined domains,
>>>> actually, we wipe
>>>>     all defined domains when starting the agent.
>>>>
>>>>     Some users however reported that libvirt started these domains
>>>> after a reboot
>>>>     before the CloudStack agent was started.
>>>>
>>>>     By starting domains from the XML description and not defining them
>>>> we prevent
>>>>     them from ever being stored in libvirt.
>>>>
>>>
>>> Wido - I'll let you comment before we apply this.

Reply via email to