Hi Ken,
My comments are inline below.

On 4.12.2015 19:06, Ken Gaillot wrote:
On 12/04/2015 10:22 AM, Klechomir wrote:
Hi list,
My issue is the following:

I have a very stable cluster running Corosync 2.1.0.26 and Pacemaker 1.1.8
(I observed the same problem with Corosync 2.3.5 & Pacemaker 1.1.13-rc3).

I bumped into this issue when I started playing with VirtualDomain
resources, but it seems to be unrelated to the RA.

The problem is that, for no apparent reason, a resource gets
target-role="Stopped". This happens after a (successful) migration, after
a failover, or after a VM restart.

My tests showed that renaming the resource fixes the problem, but that is
only a temporary workaround.
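
When it happens, the attribute can be confirmed and cleared with
crm_resource's standard meta-attribute options (using the resource name
from the configuration below):

    # show the current value of the meta attribute, if it is set at all
    crm_resource --resource VMA_VM1 --meta --get-parameter target-role
    # delete it, so the cluster is again free to start the resource
    crm_resource --resource VMA_VM1 --meta --delete-parameter target-role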

The resource configuration is:
primitive VMA_VM1 ocf:heartbeat:VirtualDomain \
         params config="/NFSvolumes/CDrive1/VM1/VM1.xml" \
                hypervisor="qemu:///system" migration_transport="tcp" \
         meta allow-migrate="true" target-role="Started" \
         op start interval="0" timeout="120s" \
         op stop interval="0" timeout="120s" \
         op monitor interval="10" timeout="30" depth="0" \
         utilization cpu="1" hv_memory="925"
order VM_VM1_after_Filesystem_CDrive1 inf: Filesystem_CDrive1 VMA_VM1

Here is the log from one such stop, after a successful migration with
"crm resource migrate VMA_VM1" (see also the note after the log):

Dec 04 15:18:22 [3818929] CLUSTER-1       crmd:    debug: cancel_op:
Cancelling op 5564 for VMA_VM1 (VMA_VM1:5564)
Dec 04 15:18:22 [4434] CLUSTER-1       lrmd:     info:
cancel_recurring_action:         Cancelling operation VMA_VM1_monitor_10000
Dec 04 15:18:23 [3818929] CLUSTER-1       crmd:    debug: cancel_op:
Op 5564 for VMA_VM1 (VMA_VM1:5564): cancelled
Dec 04 15:18:23 [3818929] CLUSTER-1       crmd:    debug:
do_lrm_rsc_op:        Performing
key=351:199:0:fb6e486a-023a-4b44-83cf-4c0c208a0f56 op=VMA_VM1_migrate_to_0
VirtualDomain(VMA_VM1)[1797698]:        2015/12/04_15:18:23 DEBUG:
Virtual domain VM1 is currently running.
VirtualDomain(VMA_VM1)[1797698]:        2015/12/04_15:18:23 INFO: VM1:
Starting live migration to CLUSTER-2 (using virsh
--connect=qemu:///system --quiet migrate --live  VM1
qemu+tcp://CLUSTER-2/system ).
Dec 04 15:18:24 [3818929] CLUSTER-1       crmd:     info:
process_lrm_event:    LRM operation VMA_VM1_monitor_10000 (call=5564,
status=1, cib-update=0, confirmed=false) Cancelled
Dec 04 15:18:24 [3818929] CLUSTER-1       crmd:    debug:
update_history_cache:         Updating history for 'VMA_VM1' with
monitor op
VirtualDomain(VMA_VM1)[1797698]:        2015/12/04_15:18:26 INFO: VM1:
live migration to CLUSTER-2 succeeded.
Dec 04 15:18:26 [4434] CLUSTER-1       lrmd:    debug:
operation_finished:      VMA_VM1_migrate_to_0:1797698 - exited with rc=0
Dec 04 15:18:26 [4434] CLUSTER-1       lrmd:   notice:
operation_finished:      VMA_VM1_migrate_to_0:1797698 [
2015/12/04_15:18:23 INFO: VM1: Starting live migration to CLUSTER-2
(using virsh --connect=qemu:///system --quiet migrate --live  VM1
qemu+tcp://CLUSTER-2/system ). ]
Dec 04 15:18:26 [4434] CLUSTER-1       lrmd:   notice:
operation_finished:      VMA_VM1_migrate_to_0:1797698 [
2015/12/04_15:18:26 INFO: VM1: live migration to CLUSTER-2 succeeded. ]
Dec 04 15:18:27 [3818929] CLUSTER-1       crmd:    debug:
create_operation_update:      do_update_resource: Updating resouce
VMA_VM1 after complete migrate_to op (interval=0)
Dec 04 15:18:27 [3818929] CLUSTER-1       crmd:   notice:
process_lrm_event:    LRM operation VMA_VM1_migrate_to_0 (call=5697,
rc=0, cib-update=89, confirmed=true) ok
Dec 04 15:18:27 [3818929] CLUSTER-1       crmd:    debug:
update_history_cache:         Updating history for 'VMA_VM1' with
migrate_to op
Dec 04 15:18:31 [3818929] CLUSTER-1       crmd:    debug: cancel_op:
Operation VMA_VM1:5564 already cancelled
Dec 04 15:18:31 [3818929] CLUSTER-1       crmd:    debug:
do_lrm_rsc_op:        Performing
key=225:200:0:fb6e486a-023a-4b44-83cf-4c0c208a0f56 op=VMA_VM1_stop_0
VirtualDomain(VMA_VM1)[1798719]:        2015/12/04_15:18:31 DEBUG:
Virtual domain VM1 is not running:  failed to get domain 'vm1' domain
not found: no domain with matching name 'vm1'
This looks like the problem. Configuration error?

As far as I have checked, this is a harmless bug in the VirtualDomain RA:
it lowercases the output of its virsh status query so the state string can
be parsed easily, which then prevents the domain name in virsh's error
message from matching. In any case the error doesn't affect the RA's
functionality; here it just concludes that the resource is already
stopped. My big concern is why it was stopped in the first place.
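
To illustrate, this is roughly the pattern the RA uses (paraphrased from
my reading of the resource-agents source, not the verbatim code):

    # stdout and stderr of virsh are lowercased together, so that states
    # like "Running" match case-insensitively -- but a domain name echoed
    # back in an error message ("VM1") gets lowercased to "vm1" as well
    status="$(LANG=C virsh --connect=qemu:///system domstate VM1 2>&1 \
        | tr 'A-Z' 'a-z')"
    case "$status" in
        *"domain not found"*|"shut off")
            echo "domain is not running: $status" ;;
        running|paused|idle|blocked|"in shutdown")
            echo "domain is active: $status" ;;
    esac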

VirtualDomain(VMA_VM1)[1798719]:        2015/12/04_15:18:31 INFO: Domain
VM1 already stopped.
Dec 04 15:18:31 [4434] CLUSTER-1       lrmd:    debug:
operation_finished:      VMA_VM1_stop_0:1798719 - exited with rc=0
Dec 04 15:18:31 [4434] CLUSTER-1       lrmd:   notice:
operation_finished:      VMA_VM1_stop_0:1798719 [ 2015/12/04_15:18:31
INFO: Domain VM1 already stopped. ]
Dec 04 15:18:32 [3818929] CLUSTER-1       crmd:    debug:
create_operation_update:      do_update_resource: Updating resouce
VMA_VM1 after complete stop op (interval=0)
Dec 04 15:18:32 [3818929] CLUSTER-1       crmd:   notice:
process_lrm_event:    LRM operation VMA_VM1_stop_0 (call=5701, rc=0,
cib-update=90, confirmed=true) ok
Dec 04 15:18:32 [3818929] CLUSTER-1       crmd:    debug:
update_history_cache:         Updating history for 'VMA_VM1' with stop op
Dec 04 15:20:58 [3818929] CLUSTER-1       crmd:    debug:
create_operation_update:      build_active_RAs: Updating resouce VMA_VM1
after complete stop op (interval=0)
Dec 04 15:20:58 [3818929] CLUSTER-1       crmd:    debug:
create_operation_update:      build_active_RAs: Updating resouce VMA_VM1
after complete monitor op (interval=0)
Dec 04 15:23:31 [1833996] CLUSTER-1 crm_resource:    debug:
process_orphan_resource:    Detected orphan resource VMA_VM1 on CLUSTER-2
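
Two notes for completeness: as far as I know, "crm resource migrate"
records a temporary location constraint in the CIB, which is normally
cleared afterwards with

    crm resource unmigrate VMA_VM1

but such a constraint should only move the resource, not set its
target-role. And to see whether (and where) target-role="Stopped"
actually lands in the CIB, an XPath query works:

    cibadmin --query --xpath "//primitive[@id='VMA_VM1']//nvpair[@name='target-role']"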

Any suggestions are welcome.

Best regards,
Klecho

_______________________________________________
Users mailing list: Users@clusterlabs.org
http://clusterlabs.org/mailman/listinfo/users

Project Home: http://www.clusterlabs.org
Getting started: http://www.clusterlabs.org/doc/Cluster_from_Scratch.pdf
Bugs: http://bugs.clusterlabs.org
