Hi!

I have a two-node cluster (virtual machines) with several resources and shared storage. When connectivity is lost (for a reason that still needs to be debugged), here is what I get (I am skipping unrelated messages):

May 14 16:49:21 wcs2 corosync[27531]: [TOTEM ] The token was lost in the OPERATIONAL state.
May 14 16:49:21 wcs2 corosync[27531]: [TOTEM ] A processor failed, forming new configuration.

Why is corosync connectivity lost? There was nothing suspicious in the logs at all.
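One thing I can think of (an assumption on my part, not confirmed by the logs): on virtual machines, host scheduling pauses can exceed corosync's token timeout (1000 ms by default in corosync 2.x), which produces exactly this "token was lost" message without anything else looking wrong. A sketch of how one might check and raise it:

```shell
# Dump the runtime CMAP keys and look at the effective token timeout (ms).
corosync-cmapctl | grep totem.token

# If it is at the default, a larger value can be set in
# /etc/corosync/corosync.conf on both nodes, e.g.:
#
#   totem {
#       token: 5000
#   }
#
# and corosync restarted (or, on recent versions, reloaded).
```

This does not explain the root cause of the connectivity loss, of course, only whether the cluster is unusually sensitive to short stalls.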

May 14 16:49:36 wcs2 corosync[27531]: [VOTEQ ] node 739269211 state=2, votes=1, expected=2
May 14 16:49:36 wcs2 corosync[27531]: [VOTEQ ] node 739269212 state=1, votes=1, expected=2
May 14 16:49:36 wcs2 corosync[27531]: [QUORUM] This node is within the non-primary component and will NOT provide any services.
May 14 16:49:36 wcs2 corosync[27531]: [QUORUM] Members[1]: 739269212
May 14 16:49:36 wcs2 corosync[27531]: [QUORUM] sending quorum notification to (nil), length = 52
May 14 16:49:36 wcs2 crmd[11381]: warning: match_down_event: No match for shutdown action on 739269211
May 14 16:49:36 wcs2 crmd[11381]: notice: peer_update_callback: Stonith/shutdown of wcs1 not matched

What does that warning mean?

May 14 16:49:37 wcs2 pengine[27574]: notice: unpack_config: On loss of CCM Quorum: Ignore
May 14 16:49:37 wcs2 pengine[27574]: warning: pe_fence_node: Node wcs1 will be fenced because stonith_sbd is thought to be active there
May 14 16:49:37 wcs2 pengine[27574]: warning: custom_action: Action stonith_sbd_stop_0 on wcs1 is unrunnable (offline)
May 14 16:49:37 wcs2 pengine[27574]: warning: stage6: Scheduling Node wcs1 for STONITH
May 14 16:49:37 wcs2 pengine[27574]: notice: LogActions: Move stonith_sbd#011(Started wcs1 -> wcs2)

All resources were active on node wcs2 (the survivor); stonith_sbd was active on node wcs1.

May 14 16:49:37 wcs2 crmd[11381]: notice: te_fence_node: Executing reboot fencing operation (38) on wcs1 (timeout=60000)
May 14 16:49:37 wcs2 stonith-ng[27571]: notice: handle_request: Client crmd.11381.a02439c4 wants to fence (reboot) 'wcs1' with device '(any)'
May 14 16:49:37 wcs2 stonith-ng[27571]: notice: initiate_remote_stonith_op: Initiating remote operation reboot for wcs1: 37151815-2182-42fa-b32e-86288b18085b (0)

Now, as these are actually virtual machines, the reboot happens quite quickly:

May 14 16:49:46 wcs2 crmd[11381]: notice: pcmk_quorum_notification: Membership 1000: quorum acquired (2)
May 14 16:49:46 wcs2 crmd[11381]: notice: crm_update_peer_state: pcmk_quorum_notification: Node wcs1[739269211] - state is now member
May 14 16:50:05 wcs2 crmd[11381]: notice: do_state_transition: State transition S_ELECTION -> S_INTEGRATION [ input=I_ELECTION_DC cause=C_FSA_INTERNAL origin=do_election_check ]
May 14 16:50:07 wcs2 attrd[27573]: notice: attrd_local_callback: Sending full refresh (origin=crmd)
May 14 16:50:07 wcs2 attrd[27573]: notice: attrd_trigger_update: Sending flush op to all hosts for: probe_complete (true)
May 14 16:50:49 wcs2 stonith-ng[27571]: error: remote_op_done: Operation reboot of wcs1 by wcs2 for crmd.11381@wcs2.37151815: Timer expired
May 14 16:50:49 wcs2 crmd[11381]: notice: tengine_stonith_callback: Stonith operation 11/38:2655:0:8f1636b7-dd1d-470c-b645-65a9c8743a69: Timer expired (-62)
May 14 16:50:49 wcs2 crmd[11381]: notice: tengine_stonith_callback: Stonith operation 11 for wcs1 failed (Timer expired): aborting transition.
May 14 16:50:49 wcs2 crmd[11381]: notice: tengine_stonith_notify: Peer wcs1 was not terminated (st_notify_fence) by wcs2 for wcs2: Timer expired (ref=37151815-2182-42fa-b32e-86288b18085b) by client crmd.11381

But why does the reboot operation's timer expire?
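One possibility I want to rule out (again an assumption, not something the logs confirm): with sbd, the fence agent writes a "poison pill" to the shared disk and then waits out sbd's msgwait timeout before reporting success, so Pacemaker's stonith timeout (60000 ms above) has to comfortably exceed msgwait; if it doesn't, or if the target's watchdog never fires, the operation times out just like this. A sketch of how one might compare the two values (the device path is a placeholder, not from my setup):

```shell
# Dump the sbd header from the shared disk; look at
# "Timeout (watchdog)" and "Timeout (msgwait)" in the output.
sbd -d /dev/<shared-disk> dump

# Query the cluster-wide stonith timeout that Pacemaker uses
# (falls back to its default if the property is unset).
crm_attribute --type crm_config --name stonith-timeout --query
```

If msgwait turns out to be close to or above the stonith timeout, that alone would explain the "Timer expired" result even though the node actually did reboot.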




_______________________________________________
Pacemaker mailing list: Pacemaker@oss.clusterlabs.org
http://oss.clusterlabs.org/mailman/listinfo/pacemaker

Project Home: http://www.clusterlabs.org
Getting started: http://www.clusterlabs.org/doc/Cluster_from_Scratch.pdf
Bugs: http://bugs.clusterlabs.org
