Hi Andrew, I confirmed that this problem was fixed. Thanks!
> -----Original Message-----
> From: Andrew Beekhof [mailto:and...@beekhof.net]
> Sent: Wednesday, April 17, 2013 2:04 PM
> To: The Pacemaker cluster resource manager
> Cc: shimaza...@intellilink.co.jp
> Subject: Re: [Pacemaker] Question about the error when fencing failed
>
> This should solve your issue:
>
> https://github.com/beekhof/pacemaker/commit/dbbb6a6
>
> On 11/04/2013, at 7:23 PM, Kazunori INOUE <inouek...@intellilink.co.jp> wrote:
>
> > Hi Andrew,
> >
> > (13.04.08 11:04), Andrew Beekhof wrote:
> >>
> >> On 05/04/2013, at 3:21 PM, Kazunori INOUE <inouek...@intellilink.co.jp> wrote:
> >>
> >>> Hi,
> >>>
> >>> When fencing failed (*1) on the following conditions, an error occurs
> >>> in stonith_perform_callback().
> >>>
> >>> - using fencing-topology. (*2)
> >>> - fence DC node. ($ crm node fence dev2)
> >>>
> >>> Apr 3 17:04:47 dev2 stonith-ng[2278]: notice: handle_request: Client crmd.2282.b9e69280 wants to fence (reboot) 'dev2' with device '(any)'
> >>> Apr 3 17:04:47 dev2 stonith-ng[2278]: notice: handle_request: Forwarding complex self fencing request to peer dev1
> >>> Apr 3 17:04:47 dev2 stonith-ng[2278]: info: stonith_command: Processed st_fence from crmd.2282: Operation now in progress (-115)
> >>> Apr 3 17:04:47 dev2 pengine[2281]: warning: process_pe_message: Calculated Transition 2: /var/lib/pacemaker/pengine/pe-warn-0.bz2
> >>> Apr 3 17:04:47 dev2 stonith-ng[2278]: info: stonith_command: Processed st_query from dev1: OK (0)
> >>> Apr 3 17:04:47 dev2 stonith-ng[2278]: info: stonith_action_create: Initiating action list for agent fence_legacy (target=(null))
> >>> Apr 3 17:04:47 dev2 stonith-ng[2278]: info: stonith_command: Processed st_timeout_update from dev1: OK (0)
> >>> Apr 3 17:04:47 dev2 stonith-ng[2278]: info: dynamic_list_search_cb: Refreshing port list for f-dev1
> >>> Apr 3 17:04:48 dev2 stonith-ng[2278]: notice: remote_op_done: Operation reboot of dev2 by dev1 for crmd.2282@dev1.4494ed41: Generic Pacemaker error
> >>> Apr 3 17:04:48 dev2 stonith-ng[2278]: info: stonith_command: Processed st_notify reply from dev1: OK (0)
> >>> Apr 3 17:04:48 dev2 crmd[2282]: error: crm_abort: stonith_perform_callback: Triggered assert at st_client.c:1894 : call_id > 0
> >>> Apr 3 17:04:48 dev2 crmd[2282]: error: stonith_perform_callback: Bad result <st-reply st_origin="stonith_construct_reply" t="stonith-ng" st_rc="-201" st_op="st_query" st_callid="0" st_clientid="b9e69280-e557-478e-aa94-fd7ca6a533b1" st_clientname="crmd.2282" st_remote_op="4494ed41-2306-4707-8406-fa066b7f3ef0" st_callopt="0" st_delegate="dev1">
> >>> Apr 3 17:04:48 dev2 crmd[2282]: error: stonith_perform_callback: Bad result <st_calldata>
> >>> Apr 3 17:04:48 dev2 crmd[2282]: error: stonith_perform_callback: Bad result <st-reply t="st_notify" subt="broadcast" st_op="reboot" count="1" src="dev1" state="4" st_target="dev2">
> >>> Apr 3 17:04:48 dev2 crmd[2282]: error: stonith_perform_callback: Bad result <st_calldata>
> >>> Apr 3 17:04:48 dev2 crmd[2282]: error: stonith_perform_callback: Bad result <st_notify_fence state="4" st_rc="-201" st_target="dev2" st_device_action="reboot" st_delegate="dev1" st_remote_op="4494ed41-2306-4707-8406-fa066b7f3ef0" st_origin="dev1" st_clientid="b9e69280-e557-478e-aa94-fd7ca6a533b1" st_clientname="crmd.2282"/>
> >>> Apr 3 17:04:48 dev2 crmd[2282]: error: stonith_perform_callback: Bad result </st_calldata>
> >>> Apr 3 17:04:48 dev2 crmd[2282]: error: stonith_perform_callback: Bad result </st-reply>
> >>> Apr 3 17:04:48 dev2 crmd[2282]: error: stonith_perform_callback: Bad result </st_calldata>
> >>> Apr 3 17:04:48 dev2 crmd[2282]: error: stonith_perform_callback: Bad result </st-reply>
> >>> Apr 3 17:04:48 dev2 crmd[2282]: warning: stonith_perform_callback: STONITH command failed: Generic Pacemaker error
> >>> Apr 3 17:04:48 dev2 crmd[2282]: notice: tengine_stonith_notify: Peer dev2 was not terminated (st_notify_fence) by dev1 for dev1: Generic Pacemaker error (ref=4494ed41-2306-4707-8406-fa066b7f3ef0) by client crmd.2282
> >>> Apr 3 17:07:11 dev2 crmd[2282]: error: stonith_async_timeout_handler: Async call 2 timed out after 144000ms
> >>>
> >>> Is this the designed behavior?
> >>
> >> Definitely not :-(
> >> Is this the first fencing operation that has been initiated by the cluster?
> >
> > Yes.
> > I attached crm_report.
> >
> >> Or has the cluster been running for some time?
> >>
> >
> > ----
> > Best Regards,
> > Kazunori INOUE
> >
> >>>
> >>> *1: I added "exit 1" to reset() of stonith-plugin in order to make
> >>> fencing fail.
> >>>
> >>> $ diff -u libvirt.ORG libvirt
> >>> --- libvirt.ORG 2012-12-17 09:56:37.000000000 +0900
> >>> +++ libvirt 2013-04-03 16:33:08.118157947 +0900
> >>> @@ -240,6 +240,7 @@
> >>>         ;;
> >>>
> >>>     reset)
> >>> +       exit 1
> >>>         libvirt_check_config
> >>>         libvirt_set_domain_id $2
> >>>
> >>> *2:
> >>> node $id="3232261523" dev2
> >>> node $id="3232261525" dev1
> >>> primitive f-dev1 stonith:external/libvirt \
> >>>     params pcmk_reboot_retries="1" hostlist="dev1" \
> >>>     hypervisor_uri="qemu+ssh://bl460g1n5/system"
> >>> primitive f-dev2 stonith:external/libvirt \
> >>>     params pcmk_reboot_retries="1" hostlist="dev2" \
> >>>     hypervisor_uri="qemu+ssh://bl460g1n6/system"
> >>> location rsc_location-f-dev1 f-dev1 \
> >>>     rule $id="rsc_location-f-dev1-rule" -inf: #uname eq dev1
> >>> location rsc_location-f-dev2 f-dev2 \
> >>>     rule $id="rsc_location-f-dev2-rule" -inf: #uname eq dev2
> >>> fencing_topology \
> >>>     dev1: f-dev1 \
> >>>     dev2: f-dev2
> >>> property $id="cib-bootstrap-options" \
> >>>     dc-version="1.1.10-1.el6-132019b" \
> >>>     cluster-infrastructure="corosync" \
> >>>     no-quorum-policy="ignore" \
> >>>     stonith-timeout="70s"
> >>>
> >>> Best Regards,
> >>> Kazunori INOUE
> >>>
> >>> _______________________________________________
> >>> Pacemaker mailing list: Pacemaker@oss.clusterlabs.org
> >>> http://oss.clusterlabs.org/mailman/listinfo/pacemaker
> >>>
> >>> Project Home: http://www.clusterlabs.org
> >>> Getting started: http://www.clusterlabs.org/doc/Cluster_from_Scratch.pdf
> >>> Bugs: http://bugs.clusterlabs.org
> >
> > <unexplained-crmd-error.tar.bz2>