Hi All,

We have this interesting problem I was hoping someone could shed some light on. Basically, we have 2 servers acting as a pacemaker cluster for DRBD and VirtualDomain (KVM) resources under CentOS 5.5.

As it is set up, if one node dies, the other node promotes the DRBD devices to "Master", then starts up the VMs there (there is one DRBD device for each VM). This works great. I set resource-stickiness="100", and the location score for the VM resource is 50, so that once a VM has moved to the other server, it stays there until I move it back manually.
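To spell out why the VM stays put, here is a rough sketch of the comparison as I understand it. It is a simplified model only: it ignores the colocation with the DRBD master and any failcounts, and the variable names are mine, not pacemaker's.

```shell
#!/bin/sh
# Simplified model of the placement decision for caweb-vd after a failover.
# Assumption: only stickiness and the location constraint are considered.
stickiness=100        # rsc_defaults resource-stickiness
location_score=50     # location caweb-prefers-vmserver1

# Staying on the current node (vmserver2) earns the stickiness score;
# moving back to vmserver1 earns only the location score.
stay_score=$stickiness
move_score=$location_score

if [ "$stay_score" -gt "$move_score" ]; then
    decision="stay on current node"
elif [ "$stay_score" -lt "$move_score" ]; then
    decision="move to vmserver1"
else
    decision="tie (not modeled here)"
fi
echo "$decision"
```

With the values from my configuration this prints "stay on current node", which matches what I see.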

Now, in the event of a failure of one server, all the VMs go to the other server. When I fix the broken server and bring it back online, the VMs do not migrate back automatically because of the scoring I mentioned above. I wanted this because when a VM goes back, it essentially has to shut down and then boot again on the other node. I'm trying to avoid the 'shut down' part and do a live migration back to the first server instead. However, I cannot figure out the exact sequence of steps that achieves this without pacemaker rebooting the VM somewhere in the process. This is my configuration, with one VM called 'caweb':

node vmserver1
node vmserver2
primitive caweb-vd ocf:heartbeat:VirtualDomain \
        params config="/etc/libvirt/qemu/caweb.xml" hypervisor="qemu:///system" \
        meta allow-migrate="false" target-role="Started" \
        op start interval="0" timeout="120s" \
        op stop interval="0" timeout="120s" \
        op monitor interval="10" timeout="30" depth="0"
primitive drbd-caweb ocf:linbit:drbd \
        params drbd_resource="caweb" \
        op monitor interval="15s" \
        op start interval="0" timeout="240s" \
        op stop interval="0" timeout="100s"
ms ms-drbd-caweb drbd-caweb \
        meta master-max="1" master-node-max="1" clone-max="2" clone-node-max="1" notify="true" target-role="Started"
location caweb-prefers-vmserver1 caweb-vd 50: vmserver1
colocation caweb-vd-on-drbd inf: caweb-vd ms-drbd-caweb:Master
order caweb-after-drbd inf: ms-drbd-caweb:promote caweb-vd:start
property $id="cib-bootstrap-options" \
        dc-version="1.0.8-9881a7350d6182bae9e8e557cf20a3cc5dac3ee7" \
        cluster-infrastructure="openais" \
        expected-quorum-votes="2" \
        stonith-enabled="false" \
        no-quorum-policy="ignore" \
        last-lrm-refresh="1276538859"
rsc_defaults $id="rsc-options" \
        resource-stickiness="100"

One thing I tried, in an effort to do a live migration from vmserver2 to vmserver1 and afterward tell pacemaker to 're-acquire' the current state of things without a VM reboot, was:

vmserver1# crm resource unmanage caweb-vd
vmserver1# crm resource unmanage ms-drbd-caweb
vmserver1# drbdadm primary caweb   <--make dual primary

(then back on vmserver2...)

vmserver2# virsh migrate --live caweb qemu+ssh://hgvmserver1.local/system
vmserver2# drbdadm secondary caweb  <--disable dual primary
vmserver2# crm resource manage ms-drbd-caweb
vmserver2# crm resource manage caweb-vd
vmserver2# crm resource cleanup ms-drbd-caweb
vmserver2# crm resource cleanup caweb-vd
vmserver2# crm resource refresh
vmserver2# crm resource reprobe
vmserver2# crm resource start caweb-vd

At this point the VM has live migrated and is still online.
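For anyone trying to reproduce this, the state can be confirmed by hand on vmserver1 with something like the sketch below. The DRY_RUN variable and the check() helper are illustrative names of my own; by default the script only prints the commands, so it does not touch the cluster.

```shell
#!/bin/sh
# Sketch only: print (or run) the checks that show where the VM and the
# DRBD Primary role currently live. DRY_RUN and check() are hypothetical helpers.
DRY_RUN="${DRY_RUN:-1}"

check() {
    # With DRY_RUN=1 the command is only printed, never executed.
    if [ "$DRY_RUN" = "1" ]; then
        echo "would run: $*"
    else
        "$@"
    fi
}

check virsh domstate caweb   # I would expect "running" on vmserver1
check drbdadm role caweb     # I would expect Primary/Secondary on vmserver1
check crm_mon -1             # one-shot view of what pacemaker believes
```

Running it with DRY_RUN=0 on vmserver1 executes the three commands for real.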

[wait 120 seconds for caweb-vd start timeouts to expire]

For a moment I thought it had worked, but then pacemaker put the resource into an error state and shut it down. After bringing one or more resources back into 'managed' mode, is there any way to tell pacemaker to 'figure things out' without restarting them? Or is this impossible because the VM resource depends on the DRBD resource, and pacemaker has trouble working out the state of stacked resources without restarting them?

Or - does anyone know another way to manually live migrate a pacemaker/VirtualDomain managed VM (with DRBD) without having to reboot the VM after the live migrate?

Thanks in advance for any clues! BTW, I am using pacemaker 1.0.8 and DRBD 8.3.

Cheers,
-erich

_______________________________________________
Pacemaker mailing list: Pacemaker@oss.clusterlabs.org
http://oss.clusterlabs.org/mailman/listinfo/pacemaker

Project Home: http://www.clusterlabs.org
Getting started: http://www.clusterlabs.org/doc/Cluster_from_Scratch.pdf
Bugs: http://developerbugs.linux-foundation.org/enter_bug.cgi?product=Pacemaker
