On 05/04/2013, at 3:21 PM, Kazunori INOUE <inouek...@intellilink.co.jp> wrote:

> Hi,
> 
> When fencing failed (*1) on the following conditions, an error occurs
> in stonith_perform_callback().
> 
> - using fencing-topology. (*2)
> - fence DC node. ($ crm node fence dev2)
> 
> Apr  3 17:04:47 dev2 stonith-ng[2278]:   notice: handle_request: Client 
> crmd.2282.b9e69280 wants to fence (reboot) 'dev2' with device '(any)'
> Apr  3 17:04:47 dev2 stonith-ng[2278]:   notice: handle_request: Forwarding 
> complex self fencing request to peer dev1
> Apr  3 17:04:47 dev2 stonith-ng[2278]:     info: stonith_command: Processed 
> st_fence from crmd.2282: Operation now in progress (-115)
> Apr  3 17:04:47 dev2 pengine[2281]:  warning: process_pe_message: Calculated 
> Transition 2: /var/lib/pacemaker/pengine/pe-warn-0.bz2
> Apr  3 17:04:47 dev2 stonith-ng[2278]:     info: stonith_command: Processed 
> st_query from dev1: OK (0)
> Apr  3 17:04:47 dev2 stonith-ng[2278]:     info: stonith_action_create: 
> Initiating action list for agent fence_legacy (target=(null))
> Apr  3 17:04:47 dev2 stonith-ng[2278]:     info: stonith_command: Processed 
> st_timeout_update from dev1: OK (0)
> Apr  3 17:04:47 dev2 stonith-ng[2278]:     info: dynamic_list_search_cb: 
> Refreshing port list for f-dev1
> Apr  3 17:04:48 dev2 stonith-ng[2278]:   notice: remote_op_done: Operation 
> reboot of dev2 by dev1 for crmd.2282@dev1.4494ed41: Generic Pacemaker error
> Apr  3 17:04:48 dev2 stonith-ng[2278]:     info: stonith_command: Processed 
> st_notify reply from dev1: OK (0)
> Apr  3 17:04:48 dev2 crmd[2282]:    error: crm_abort: 
> stonith_perform_callback: Triggered assert at st_client.c:1894 : call_id > 0
> Apr  3 17:04:48 dev2 crmd[2282]:    error: stonith_perform_callback: Bad 
> result   <st-reply st_origin="stonith_construct_reply" t="stonith-ng" 
> st_rc="-201" st_op="st_query" st_callid="0" 
> st_clientid="b9e69280-e557-478e-aa94-fd7ca6a533b1" st_clientname="crmd.2282" 
> st_remote_op="4494ed41-2306-4707-8406-fa066b7f3ef0" st_callopt="0" 
> st_delegate="dev1">
> Apr  3 17:04:48 dev2 crmd[2282]:    error: stonith_perform_callback: Bad 
> result     <st_calldata>
> Apr  3 17:04:48 dev2 crmd[2282]:    error: stonith_perform_callback: Bad 
> result       <st-reply t="st_notify" subt="broadcast" st_op="reboot" 
> count="1" src="dev1" state="4" st_target="dev2">
> Apr  3 17:04:48 dev2 crmd[2282]:    error: stonith_perform_callback: Bad 
> result         <st_calldata>
> Apr  3 17:04:48 dev2 crmd[2282]:    error: stonith_perform_callback: Bad 
> result           <st_notify_fence state="4" st_rc="-201" st_target="dev2" 
> st_device_action="reboot" st_delegate="dev1" 
> st_remote_op="4494ed41-2306-4707-8406-fa066b7f3ef0" st_origin="dev1" 
> st_clientid="b9e69280-e557-478e-aa94-fd7ca6a533b1" st_clientname="crmd.2282"/>
> Apr  3 17:04:48 dev2 crmd[2282]:    error: stonith_perform_callback: Bad 
> result         </st_calldata>
> Apr  3 17:04:48 dev2 crmd[2282]:    error: stonith_perform_callback: Bad 
> result       </st-reply>
> Apr  3 17:04:48 dev2 crmd[2282]:    error: stonith_perform_callback: Bad 
> result     </st_calldata>
> Apr  3 17:04:48 dev2 crmd[2282]:    error: stonith_perform_callback: Bad 
> result   </st-reply>
> Apr  3 17:04:48 dev2 crmd[2282]:  warning: stonith_perform_callback: STONITH 
> command failed: Generic Pacemaker error
> Apr  3 17:04:48 dev2 crmd[2282]:   notice: tengine_stonith_notify: Peer dev2 
> was not terminated (st_notify_fence) by dev1 for dev1: Generic Pacemaker 
> error (ref=4494ed41-2306-4707-8406-fa066b7f3ef0) by client crmd.2282
> Apr  3 17:07:11 dev2 crmd[2282]:    error: stonith_async_timeout_handler: 
> Async call 2 timed out after 144000ms
> 
> Is this the intended behavior?

Definitely not :-(
Is this the first fencing operation that has been initiated by the cluster?  Or 
has the cluster been running for some time?

> 
> 
> *1: I added "exit 1" to the reset() function of the stonith plugin in
>     order to make fencing fail.
> 
>  $ diff -u libvirt.ORG libvirt
>  --- libvirt.ORG 2012-12-17 09:56:37.000000000 +0900
>  +++ libvirt     2013-04-03 16:33:08.118157947 +0900
>  @@ -240,6 +240,7 @@
>       ;;
> 
>       reset)
>  +    exit 1
>       libvirt_check_config
>       libvirt_set_domain_id $2
> 
> *2:
>  node $id="3232261523" dev2
>  node $id="3232261525" dev1
>  primitive f-dev1 stonith:external/libvirt \
>      params pcmk_reboot_retries="1" hostlist="dev1" \
>      hypervisor_uri="qemu+ssh://bl460g1n5/system"
>  primitive f-dev2 stonith:external/libvirt \
>      params pcmk_reboot_retries="1" hostlist="dev2" \
>      hypervisor_uri="qemu+ssh://bl460g1n6/system"
>  location rsc_location-f-dev1 f-dev1 \
>      rule $id="rsc_location-f-dev1-rule" -inf: #uname eq dev1
>  location rsc_location-f-dev2 f-dev2 \
>      rule $id="rsc_location-f-dev2-rule" -inf: #uname eq dev2
>  fencing_topology \
>      dev1: f-dev1 \
>      dev2: f-dev2
>  property $id="cib-bootstrap-options" \
>      dc-version="1.1.10-1.el6-132019b" \
>      cluster-infrastructure="corosync" \
>      no-quorum-policy="ignore" \
>      stonith-timeout="70s"
> 
> Best Regards,
> Kazunori INOUE
> 
> _______________________________________________
> Pacemaker mailing list: Pacemaker@oss.clusterlabs.org
> http://oss.clusterlabs.org/mailman/listinfo/pacemaker
> 
> Project Home: http://www.clusterlabs.org
> Getting started: http://www.clusterlabs.org/doc/Cluster_from_Scratch.pdf
> Bugs: http://bugs.clusterlabs.org

