Re: [ClusterLabs] Resources suddenly get target-role="stopped"

2015-12-04 Thread Ken Gaillot
On 12/04/2015 10:22 AM, Klechomir wrote:
> Hi list,
> My issue is the following:
> 
> I have a very stable cluster running Corosync 2.1.0.26 and Pacemaker 1.1.8
> (I observed the same problem with Corosync 2.3.5 & Pacemaker 1.1.13-rc3).
> 
> I bumped into this issue when I started playing with VirtualDomain
> resources, but it seems to be unrelated to the RA.
> 
> The problem is that, for no apparent reason, a resource gets
> target-role="Stopped". This happens after a (successful) migration,
> after a failover, or after a VM restart.
> 
> My tests showed that renaming the resource fixes the problem, but that
> is only a temporary workaround.
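
If you want to pin down where the Stopped value is coming from, it may
help to query the meta attribute directly the next time it happens. A
quick sketch (option names as in Pacemaker 1.1's crm_resource):

# Show the target-role meta attribute the CIB currently holds
crm_resource --resource VMA_VM1 --meta --get-parameter target-role

# Or search the raw CIB for every place target-role appears
cibadmin --query | grep target-role
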
> 
> The resource configuration is:
> primitive VMA_VM1 ocf:heartbeat:VirtualDomain \
>     params config="/NFSvolumes/CDrive1/VM1/VM1.xml" \
>     hypervisor="qemu:///system" migration_transport="tcp" \
>     meta allow-migrate="true" target-role="Started" \
>     op start interval="0" timeout="120s" \
>     op stop interval="0" timeout="120s" \
>     op monitor interval="10" timeout="30" depth="0" \
>     utilization cpu="1" hv_memory="925"
> order VM_VM1_after_Filesystem_CDrive1 inf: Filesystem_CDrive1 VMA_VM1
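
As a side note, with migration_transport="tcp" the RA builds a virsh
migrate command like the one visible in your logs below; running the
same command by hand can help separate RA behaviour from cluster
behaviour (same names as in this config):

virsh --connect=qemu:///system --quiet migrate --live VM1 qemu+tcp://CLUSTER-2/system
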
> 
> Here is the log from one such stop, after a successful migration with
> "crm resource migrate VMA_VM1":
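
Note that a manual "crm resource migrate" works by inserting a location
constraint (typically named cli-prefer-<resource>, or cli-standby-* in
older crmsh versions) rather than touching target-role, and the
constraint stays until you clear it. A sketch:

# Clear the constraint added by a manual migrate
crm resource unmigrate VMA_VM1

# Check for leftover cli-* constraints
crm configure show | grep cli-
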
> 
> Dec 04 15:18:22 [3818929] CLUSTER-1   crmd:debug: cancel_op:   
> Cancelling op 5564 for VMA_VM1 (VMA_VM1:5564)
> Dec 04 15:18:22 [4434] CLUSTER-1   lrmd: info:
> cancel_recurring_action: Cancelling operation VMA_VM1_monitor_1
> Dec 04 15:18:23 [3818929] CLUSTER-1   crmd:debug: cancel_op:   
> Op 5564 for VMA_VM1 (VMA_VM1:5564): cancelled
> Dec 04 15:18:23 [3818929] CLUSTER-1   crmd:debug:
> do_lrm_rsc_op:Performing
> key=351:199:0:fb6e486a-023a-4b44-83cf-4c0c208a0f56 op=VMA_VM1_migrate_to_0
> VirtualDomain(VMA_VM1)[1797698]:2015/12/04_15:18:23 DEBUG:
> Virtual domain VM1 is currently running.
> VirtualDomain(VMA_VM1)[1797698]:2015/12/04_15:18:23 INFO: VM1:
> Starting live migration to CLUSTER-2 (using virsh
> --connect=qemu:///system --quiet migrate --live  VM1
> qemu+tcp://CLUSTER-2/system ).
> Dec 04 15:18:24 [3818929] CLUSTER-1   crmd: info:
> process_lrm_event:LRM operation VMA_VM1_monitor_1 (call=5564,
> status=1, cib-update=0, confirmed=false) Cancelled
> Dec 04 15:18:24 [3818929] CLUSTER-1   crmd:debug:
> update_history_cache: Updating history for 'VMA_VM1' with
> monitor op
> VirtualDomain(VMA_VM1)[1797698]:2015/12/04_15:18:26 INFO: VM1:
> live migration to CLUSTER-2 succeeded.
> Dec 04 15:18:26 [4434] CLUSTER-1   lrmd:debug:
> operation_finished:  VMA_VM1_migrate_to_0:1797698 - exited with rc=0
> Dec 04 15:18:26 [4434] CLUSTER-1   lrmd:   notice:
> operation_finished:  VMA_VM1_migrate_to_0:1797698 [
> 2015/12/04_15:18:23 INFO: VM1: Starting live migration to CLUSTER-2
> (using virsh --connect=qemu:///system --quiet migrate --live  VM1
> qemu+tcp://CLUSTER-2/system ). ]
> Dec 04 15:18:26 [4434] CLUSTER-1   lrmd:   notice:
> operation_finished:  VMA_VM1_migrate_to_0:1797698 [
> 2015/12/04_15:18:26 INFO: VM1: live migration to CLUSTER-2 succeeded. ]
> Dec 04 15:18:27 [3818929] CLUSTER-1   crmd:debug:
> create_operation_update:  do_update_resource: Updating resouce
> VMA_VM1 after complete migrate_to op (interval=0)
> Dec 04 15:18:27 [3818929] CLUSTER-1   crmd:   notice:
> process_lrm_event:LRM operation VMA_VM1_migrate_to_0 (call=5697,
> rc=0, cib-update=89, confirmed=true) ok
> Dec 04 15:18:27 [3818929] CLUSTER-1   crmd:debug:
> update_history_cache: Updating history for 'VMA_VM1' with
> migrate_to op
> Dec 04 15:18:31 [3818929] CLUSTER-1   crmd:debug: cancel_op:   
> Operation VMA_VM1:5564 already cancelled
> Dec 04 15:18:31 [3818929] CLUSTER-1   crmd:debug:
> do_lrm_rsc_op:Performing
> key=225:200:0:fb6e486a-023a-4b44-83cf-4c0c208a0f56 op=VMA_VM1_stop_0
> VirtualDomain(VMA_VM1)[1798719]:2015/12/04_15:18:31 DEBUG:
> Virtual domain VM1 is not running:  failed to get domain 'vm1' domain
> not found: no domain with matching name 'vm1'

This looks like the problem. Configuration error?
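
Note the RA is probing for 'vm1' while your configuration and the
earlier log lines say VM1. If the <name> element in the domain XML does
not match, monitor and stop can end up acting on the wrong name. A
quick check, using the paths from your config above:

# Domain name as defined in the XML the resource points at
grep '<name>' /NFSvolumes/CDrive1/VM1/VM1.xml

# Domain names libvirt actually knows about
virsh --connect qemu:///system list --all
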

> VirtualDomain(VMA_VM1)[1798719]:2015/12/04_15:18:31 INFO: Domain
> VM1 already stopped.
> Dec 04 15:18:31 [4434] CLUSTER-1   lrmd:debug:
> operation_finished:  VMA_VM1_stop_0:1798719 - exited with rc=0
> Dec 04 15:18:31 [4434] CLUSTER-1   lrmd:   notice:
> operation_finished:  VMA_VM1_stop_0:1798719 [ 2015/12/04_15:18:31
> INFO: Domain VM1 already stopped. ]
> Dec 04 15:18:32 [3818929] CLUSTER-1   crmd:debug:
> create_operation_update:  do_update_resource: Updating resouce
> VMA_VM1 after complete stop op (interval=0)
> Dec 04 15:18:32 [3818929] CLUSTER-1   crmd:   notice:
> process_lrm_event:LRM operation VMA_VM1_stop_0 (call=5701, rc=0,
> cib-update=90, confirmed=true) ok
> Dec 04 15:18:32 [3818929] CLUSTER-1   crmd:debug:
> update_history_cache: Updating history for 'VMA_VM1' with stop op
> Dec 04 15:20:58 [3818929] CLUSTER-1   crmd:debug:
> create_operation_update:  build_active_RAs: Updating
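
When it next happens, the CIB diff in the Pacemaker log should show
which client wrote target-role="Stopped"; grepping around the timestamp
is usually enough. A sketch (the log path varies by distribution, so
this location is an assumption):

# Look for the CIB update that carried the target-role change;
# the cib diff entries near the match show the origin of the update
grep -n 'target-role' /var/log/cluster/corosync.log | tail
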
