Hello,

On 03/02/2012 12:06 PM, Thomas Boernert wrote:
> Hi List,
>
> my problem is that stonith executes the fencing command on the remote
> dead host and not on the local machine :-(. This ends with a timeout.
>
> some facts:
> - 2-node cluster with 2 Dell servers
> - each server has its own DRAC card
> - pacemaker 1.1.6
> - heartbeat 3.0.4
> - corosync 1.4.1
>
> node1 should fence node2 if node2 is dead, and
> node2 should fence node1 if node1 is dead.
>
> It works fine when run manually with the stonith script
> fence_drac5 ....
try adding pcmk_host_check="static-list" to your stonith resources

Regards,
Andreas

--
Need help with Pacemaker? http://www.hastexo.com/now

> my config
> <---------------------------------- snip -------------------------------->
> node node1 \
>   attributes standby="off"
> node node2 \
>   attributes standby="off"
> primitive httpd ocf:heartbeat:apache \
>   params configfile="/etc/httpd/conf/httpd.conf" port="80" \
>   op start interval="0" timeout="60s" \
>   op monitor interval="5s" timeout="20s" \
>   op stop interval="0" timeout="60s"
> primitive node1-stonith stonith:fence_drac5 \
>   params ipaddr="192.168.1.101" login="root" passwd="1234" action="reboot" \
>     secure="true" cmd_prompt="admin1->" power_wait="300" pcmk_host_list="node1"
> primitive node2-stonith stonith:fence_drac5 \
>   params ipaddr="192.168.1.102" login="root" passwd="1234" action="reboot" \
>     secure="true" cmd_prompt="admin1->" power_wait="300" pcmk_host_list="node2"
> primitive nodeIP ocf:heartbeat:IPaddr2 \
>   op monitor interval="60" timeout="20" \
>   params ip="192.168.1.10" cidr_netmask="24" nic="eth0:0" \
>     broadcast="192.168.1.255"
> primitive nodeIParp ocf:heartbeat:SendArp \
>   params ip="192.168.1.10" nic="eth0:0"
> group WebServices nodeIP nodeIParp httpd
> location node1-stonith-log node1-stonith -inf: node1
> location node2-stonith-log node2-stonith -inf: node2
> property $id="cib-bootstrap-options" \
>   dc-version="1.1.6-3.el6-a02c0f19a00c1eb2527ad38f146ebc0834814558" \
>   cluster-infrastructure="openais" \
>   expected-quorum-votes="2" \
>   stonith-enabled="true" \
>   no-quorum-policy="ignore" \
>   last-lrm-refresh="1330685786"
> <---------------------------------- snip -------------------------------->
>
> [root@node2 ~]# stonith_admin -l node1
> node1-stonith
> 1 devices found
>
> it seems ok
>
> now i try
>
> [root@node2 ~]# stonith_admin -V -F node1
> stonith_admin[5685]: 2012/03/02_13:00:44 debug: main: Create
> stonith_admin[5685]: 2012/03/02_13:00:44 debug: init_client_ipc_comms_nodispatch: Attempting to talk on: /var/run/crm/st_command
> stonith_admin[5685]: 2012/03/02_13:00:44 debug: get_stonith_token: Obtained registration token: 6258828b-4b19-472f-9256-8da36fe87962
> stonith_admin[5685]: 2012/03/02_13:00:44 debug: init_client_ipc_comms_nodispatch: Attempting to talk on: /var/run/crm/st_callback
> stonith_admin[5685]: 2012/03/02_13:00:44 debug: get_stonith_token: Obtained registration token: 6266ebb8-2112-4378-a00c-3eaff47c9a9d
> stonith_admin[5685]: 2012/03/02_13:00:44 debug: stonith_api_signon: Connection to STONITH successful
> stonith_admin[5685]: 2012/03/02_13:00:44 debug: main: Connect: 0
> Command failed: Operation timed out
> stonith_admin[5685]: 2012/03/02_13:00:56 debug: stonith_api_signoff: Signing out of the STONITH Service
> stonith_admin[5685]: 2012/03/02_13:00:56 debug: main: Disconnect: -8
> stonith_admin[5685]: 2012/03/02_13:00:56 debug: main: Destroy
>
> the log on node2 shows:
> <----------------------------------------------- snip --------------------------------------->
> Mar 2 13:00:58 node2 crmd: [2665]: info: te_fence_node: Executing reboot fencing operation (21) on node1 (timeout=60000)
> Mar 2 13:00:58 node2 stonith-ng: [2660]: info: initiate_remote_stonith_op: Initiating remote operation reboot for node1: 3325df94-8d59-4c00-a37e-be31e79f7503
> Mar 2 13:00:58 node2 stonith-ng: [2638]: info: stonith_command: Processed st_query from node2: rc=0
> <----------------------------------------------- snip --------------------------------------->
>
> why remote on the dead host?
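The "remote operation" in that log line does not mean the command runs on the dead host: stonith-ng first broadcasts a query asking which node has a device that can fence node1, and with pcmk_host_check left at its default (a dynamic list obtained by asking the agent itself) fence_drac5 never claims node1, so nobody answers and the query times out. With pcmk_host_check="static-list", the pcmk_host_list value is trusted instead. A sketch of the change, reusing the params from your posted config (illustrative only, not tested on your cluster):

```
primitive node1-stonith stonith:fence_drac5 \
  params ipaddr="192.168.1.101" login="root" passwd="1234" action="reboot" \
    secure="true" cmd_prompt="admin1->" power_wait="300" \
    pcmk_host_list="node1" pcmk_host_check="static-list"
primitive node2-stonith stonith:fence_drac5 \
  params ipaddr="192.168.1.102" login="root" passwd="1234" action="reboot" \
    secure="true" cmd_prompt="admin1->" power_wait="300" \
    pcmk_host_list="node2" pcmk_host_check="static-list"
```

With that in place, node2's copy of node1-stonith should answer the query and run the fencing itself, which is what your location constraints already arrange.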
>
> Thanks
>
> Thomas
>
> the complete log
> <----------------------------------------------- snip --------------------------------------->
> Mar 2 13:00:44 node2 stonith_admin: [5685]: info: crm_log_init_worker: Changed active directory to /var/lib/heartbeat/cores/root
> Mar 2 13:00:44 node2 stonith-ng: [2660]: info: initiate_remote_stonith_op: Initiating remote operation off for node1: 7d8beca4-1853-44fd-9bb2-4015b080c37b
> Mar 2 13:00:44 node2 stonith-ng: [2638]: info: stonith_command: Processed st_query from node2: rc=0
> Mar 2 13:00:46 node2 stonith-ng: [2660]: ERROR: remote_op_query_timeout: Query 561e89af-6f5a-45cb-adc2-45389940f1db for node1 timed out
> Mar 2 13:00:46 node2 stonith-ng: [2660]: ERROR: remote_op_timeout: Action reboot (561e89af-6f5a-45cb-adc2-45389940f1db) for node1 timed out
> Mar 2 13:00:46 node2 stonith-ng: [2660]: info: remote_op_done: Notifing clients of 561e89af-6f5a-45cb-adc2-45389940f1db (reboot of node1 from 8231841e-3537-44a9-8870-899d0d846c42 by (null)): 0, rc=-8
> Mar 2 13:00:46 node2 stonith-ng: [2660]: info: stonith_notify_client: Sending st_fence-notification to client 2665/ff16ec78-3634-444c-88a6-275ce79eec6b
> Mar 2 13:00:46 node2 crmd: [2665]: info: tengine_stonith_callback: StonithOp <remote-op state="0" st_target="node1" st_op="reboot" />
> Mar 2 13:00:46 node2 crmd: [2665]: info: tengine_stonith_callback: Stonith operation 798/21:815:0:d274c31a-571b-4e22-b453-1c151a8871b1: Operation timed out (-8)
> Mar 2 13:00:46 node2 crmd: [2665]: ERROR: tengine_stonith_callback: Stonith of node1 failed (-8)... aborting transition.
> Mar 2 13:00:46 node2 crmd: [2665]: info: abort_transition_graph: tengine_stonith_callback:454 - Triggered transition abort (complete=0) : Stonith failed
> Mar 2 13:00:46 node2 crmd: [2665]: info: update_abort_priority: Abort priority upgraded from 0 to 1000000
> Mar 2 13:00:46 node2 crmd: [2665]: info: update_abort_priority: Abort action done superceeded by restart
> Mar 2 13:00:46 node2 crmd: [2665]: ERROR: tengine_stonith_notify: Peer node1 could not be terminated (reboot) by <anyone> for node2 (ref=561e89af-6f5a-45cb-adc2-45389940f1db): Operation timed out
> Mar 2 13:00:46 node2 crmd: [2665]: info: run_graph: ====================================================
> Mar 2 13:00:46 node2 crmd: [2665]: notice: run_graph: Transition 815 (Complete=3, Pending=0, Fired=0, Skipped=14, Incomplete=0, Source=/var/lib/pengine/pe-warn-39.bz2): Stopped
> Mar 2 13:00:46 node2 crmd: [2665]: info: te_graph_trigger: Transition 815 is now complete
> Mar 2 13:00:46 node2 crmd: [2665]: info: do_state_transition: State transition S_TRANSITION_ENGINE -> S_POLICY_ENGINE [ input=I_PE_CALC cause=C_FSA_INTERNAL origin=notify_crmd ]
> Mar 2 13:00:46 node2 crmd: [2665]: info: do_state_transition: All 1 cluster nodes are eligible to run resources.
> Mar 2 13:00:46 node2 crmd: [2665]: info: do_pe_invoke: Query 1271: Requesting the current CIB: S_POLICY_ENGINE
> Mar 2 13:00:46 node2 crmd: [2665]: info: do_pe_invoke_callback: Invoking the PE: query=1271, ref=pe_calc-dc-1330689646-1028, seq=404, quorate=0
> Mar 2 13:00:46 node2 pengine: [2664]: notice: unpack_config: On loss of CCM Quorum: Ignore
> Mar 2 13:00:46 node2 pengine: [2664]: WARN: pe_fence_node: Node node1 will be fenced because it is un-expectedly down
> Mar 2 13:00:46 node2 pengine: [2664]: WARN: determine_online_status: Node node1 is unclean
> Mar 2 13:00:46 node2 pengine: [2664]: notice: unpack_rsc_op: Operation nodeIParp_last_failure_0 found resource nodeIParp active on node2
> Mar 2 13:00:46 node2 pengine: [2664]: notice: unpack_rsc_op: Operation node1-stonith_last_failure_0 found resource node1-stonith active on node2
> Mar 2 13:00:46 node2 pengine: [2664]: notice: unpack_rsc_op: Operation nodeIParp_last_failure_0 found resource nodeIParp active on node1
> Mar 2 13:00:46 node2 pengine: [2664]: notice: unpack_rsc_op: Operation nodeIP_last_failure_0 found resource nodeIP active on node1
> Mar 2 13:00:46 node2 pengine: [2664]: notice: unpack_rsc_op: Operation httpd_last_failure_0 found resource httpd active on node1
> Mar 2 13:00:46 node2 pengine: [2664]: WARN: custom_action: Action nodeIP_stop_0 on node1 is unrunnable (offline)
> Mar 2 13:00:46 node2 pengine: [2664]: WARN: custom_action: Marking node node1 unclean
> Mar 2 13:00:46 node2 pengine: [2664]: notice: RecurringOp: Start recurring monitor (60s) for nodeIP on node2
> Mar 2 13:00:46 node2 pengine: [2664]: WARN: custom_action: Action nodeIParp_stop_0 on node1 is unrunnable (offline)
> Mar 2 13:00:46 node2 pengine: [2664]: WARN: custom_action: Marking node node1 unclean
> Mar 2 13:00:46 node2 pengine: [2664]: WARN: custom_action: Action httpd_stop_0 on node1 is unrunnable (offline)
> Mar 2 13:00:46 node2 pengine: [2664]: WARN: custom_action: Marking node node1 unclean
> Mar 2 13:00:46 node2 pengine: [2664]: notice: RecurringOp: Start recurring monitor (5s) for httpd on node2
> Mar 2 13:00:46 node2 pengine: [2664]: WARN: custom_action: Action node2-stonith_stop_0 on node1 is unrunnable (offline)
> Mar 2 13:00:46 node2 pengine: [2664]: WARN: custom_action: Marking node node1 unclean
> Mar 2 13:00:46 node2 pengine: [2664]: WARN: stage6: Scheduling Node node1 for STONITH
> Mar 2 13:00:46 node2 pengine: [2664]: notice: LogActions: Move nodeIP#011(Started node1 -> node2)
> Mar 2 13:00:46 node2 pengine: [2664]: notice: LogActions: Move nodeIParp#011(Started node1 -> node2)
> Mar 2 13:00:46 node2 pengine: [2664]: notice: LogActions: Move httpd#011(Started node1 -> node2)
> Mar 2 13:00:46 node2 pengine: [2664]: notice: LogActions: Leave node1-stonith#011(Started node2)
> Mar 2 13:00:46 node2 pengine: [2664]: notice: LogActions: Stop node2-stonith#011(node1)
> Mar 2 13:00:46 node2 pengine: [2664]: WARN: process_pe_message: Transition 816: WARNINGs found during PE processing. PEngine Input stored in: /var/lib/pengine/pe-warn-39.bz2
> Mar 2 13:00:46 node2 pengine: [2664]: notice: process_pe_message: Configuration WARNINGs found during PE processing. Please run "crm_verify -L" to identify issues.
> Mar 2 13:00:46 node2 crmd: [2665]: info: do_state_transition: State transition S_POLICY_ENGINE -> S_TRANSITION_ENGINE [ input=I_PE_SUCCESS cause=C_IPC_MESSAGE origin=handle_response ]
> Mar 2 13:00:46 node2 crmd: [2665]: info: unpack_graph: Unpacked transition 816: 17 actions in 17 synapses
> Mar 2 13:00:46 node2 crmd: [2665]: info: do_te_invoke: Processing graph 816 (ref=pe_calc-dc-1330689646-1028) derived from /var/lib/pengine/pe-warn-39.bz2
> Mar 2 13:00:46 node2 crmd: [2665]: info: te_pseudo_action: Pseudo action 18 fired and confirmed
> Mar 2 13:00:46 node2 crmd: [2665]: info: te_pseudo_action: Pseudo action 19 fired and confirmed
> Mar 2 13:00:46 node2 crmd: [2665]: info: te_fence_node: Executing reboot fencing operation (21) on node1 (timeout=60000)
> Mar 2 13:00:46 node2 stonith-ng: [2660]: info: initiate_remote_stonith_op: Initiating remote operation reboot for node1: 07f10c9c-b33e-41b4-8781-fb32eb850bd2
> Mar 2 13:00:46 node2 stonith-ng: [2638]: info: stonith_command: Processed st_query from node2: rc=0
> Mar 2 13:00:52 node2 stonith-ng: [2660]: ERROR: remote_op_query_timeout: Query 07f10c9c-b33e-41b4-8781-fb32eb850bd2 for node1 timed out
> Mar 2 13:00:52 node2 stonith-ng: [2660]: ERROR: remote_op_timeout: Action reboot (07f10c9c-b33e-41b4-8781-fb32eb850bd2) for node1 timed out
> Mar 2 13:00:52 node2 stonith-ng: [2660]: info: remote_op_done: Notifing clients of 07f10c9c-b33e-41b4-8781-fb32eb850bd2 (reboot of node1 from 8231841e-3537-44a9-8870-899d0d846c42 by (null)): 0, rc=-8
> Mar 2 13:00:52 node2 stonith-ng: [2660]: info: stonith_notify_client: Sending st_fence-notification to client 2665/ff16ec78-3634-444c-88a6-275ce79eec6b
> Mar 2 13:00:52 node2 crmd: [2665]: info: tengine_stonith_callback: StonithOp <remote-op state="0" st_target="node1" st_op="reboot" />
> Mar 2 13:00:52 node2 crmd: [2665]: info: tengine_stonith_callback: Stonith operation 799/21:816:0:d274c31a-571b-4e22-b453-1c151a8871b1: Operation timed out (-8)
> Mar 2 13:00:52 node2 crmd: [2665]: ERROR: tengine_stonith_callback: Stonith of node1 failed (-8)... aborting transition.
> Mar 2 13:00:52 node2 crmd: [2665]: info: abort_transition_graph: tengine_stonith_callback:454 - Triggered transition abort (complete=0) : Stonith failed
> Mar 2 13:00:52 node2 crmd: [2665]: info: update_abort_priority: Abort priority upgraded from 0 to 1000000
> Mar 2 13:00:52 node2 crmd: [2665]: info: update_abort_priority: Abort action done superceeded by restart
> Mar 2 13:00:52 node2 crmd: [2665]: ERROR: tengine_stonith_notify: Peer node1 could not be terminated (reboot) by <anyone> for node2 (ref=07f10c9c-b33e-41b4-8781-fb32eb850bd2): Operation timed out
> Mar 2 13:00:52 node2 crmd: [2665]: info: run_graph: ====================================================
> Mar 2 13:00:52 node2 crmd: [2665]: notice: run_graph: Transition 816 (Complete=3, Pending=0, Fired=0, Skipped=14, Incomplete=0, Source=/var/lib/pengine/pe-warn-39.bz2): Stopped
> Mar 2 13:00:52 node2 crmd: [2665]: info: te_graph_trigger: Transition 816 is now complete
> Mar 2 13:00:52 node2 crmd: [2665]: info: do_state_transition: State transition S_TRANSITION_ENGINE -> S_POLICY_ENGINE [ input=I_PE_CALC cause=C_FSA_INTERNAL origin=notify_crmd ]
> Mar 2 13:00:52 node2 crmd: [2665]: info: do_state_transition: All 1 cluster nodes are eligible to run resources.
> Mar 2 13:00:52 node2 crmd: [2665]: info: do_pe_invoke: Query 1272: Requesting the current CIB: S_POLICY_ENGINE
> Mar 2 13:00:52 node2 crmd: [2665]: info: do_pe_invoke_callback: Invoking the PE: query=1272, ref=pe_calc-dc-1330689652-1029, seq=404, quorate=0
> Mar 2 13:00:52 node2 pengine: [2664]: notice: unpack_config: On loss of CCM Quorum: Ignore
> Mar 2 13:00:52 node2 pengine: [2664]: WARN: pe_fence_node: Node node1 will be fenced because it is un-expectedly down
> Mar 2 13:00:52 node2 pengine: [2664]: WARN: determine_online_status: Node node1 is unclean
> Mar 2 13:00:52 node2 pengine: [2664]: notice: unpack_rsc_op: Operation nodeIParp_last_failure_0 found resource nodeIParp active on node2
> Mar 2 13:00:52 node2 pengine: [2664]: notice: unpack_rsc_op: Operation node1-stonith_last_failure_0 found resource node1-stonith active on node2
> Mar 2 13:00:52 node2 pengine: [2664]: notice: unpack_rsc_op: Operation nodeIParp_last_failure_0 found resource nodeIParp active on node1
> Mar 2 13:00:52 node2 pengine: [2664]: notice: unpack_rsc_op: Operation nodeIP_last_failure_0 found resource nodeIP active on node1
> Mar 2 13:00:52 node2 pengine: [2664]: notice: unpack_rsc_op: Operation httpd_last_failure_0 found resource httpd active on node1
> Mar 2 13:00:52 node2 pengine: [2664]: WARN: custom_action: Action nodeIP_stop_0 on node1 is unrunnable (offline)
> Mar 2 13:00:52 node2 pengine: [2664]: WARN: custom_action: Marking node node1 unclean
> Mar 2 13:00:52 node2 pengine: [2664]: notice: RecurringOp: Start recurring monitor (60s) for nodeIP on node2
> Mar 2 13:00:52 node2 pengine: [2664]: WARN: custom_action: Action nodeIParp_stop_0 on node1 is unrunnable (offline)
> Mar 2 13:00:52 node2 pengine: [2664]: WARN: custom_action: Marking node node1 unclean
> Mar 2 13:00:52 node2 pengine: [2664]: WARN: custom_action: Action httpd_stop_0 on node1 is unrunnable (offline)
> Mar 2 13:00:52 node2 pengine: [2664]: WARN: custom_action: Marking node node1 unclean
> Mar 2 13:00:52 node2 pengine: [2664]: notice: RecurringOp: Start recurring monitor (5s) for httpd on node2
> Mar 2 13:00:52 node2 pengine: [2664]: WARN: custom_action: Action node2-stonith_stop_0 on node1 is unrunnable (offline)
> Mar 2 13:00:52 node2 pengine: [2664]: WARN: custom_action: Marking node node1 unclean
> Mar 2 13:00:52 node2 pengine: [2664]: WARN: stage6: Scheduling Node node1 for STONITH
> Mar 2 13:00:52 node2 pengine: [2664]: notice: LogActions: Move nodeIP#011(Started node1 -> node2)
> Mar 2 13:00:52 node2 pengine: [2664]: notice: LogActions: Move nodeIParp#011(Started node1 -> node2)
> Mar 2 13:00:52 node2 pengine: [2664]: notice: LogActions: Move httpd#011(Started node1 -> node2)
> Mar 2 13:00:52 node2 pengine: [2664]: notice: LogActions: Leave node1-stonith#011(Started node2)
> Mar 2 13:00:52 node2 pengine: [2664]: notice: LogActions: Stop node2-stonith#011(node1)
> Mar 2 13:00:52 node2 pengine: [2664]: WARN: process_pe_message: Transition 817: WARNINGs found during PE processing. PEngine Input stored in: /var/lib/pengine/pe-warn-39.bz2
> Mar 2 13:00:52 node2 pengine: [2664]: notice: process_pe_message: Configuration WARNINGs found during PE processing. Please run "crm_verify -L" to identify issues.
> Mar 2 13:00:52 node2 crmd: [2665]: info: do_state_transition: State transition S_POLICY_ENGINE -> S_TRANSITION_ENGINE [ input=I_PE_SUCCESS cause=C_IPC_MESSAGE origin=handle_response ]
> Mar 2 13:00:52 node2 crmd: [2665]: info: unpack_graph: Unpacked transition 817: 17 actions in 17 synapses
> Mar 2 13:00:52 node2 crmd: [2665]: info: do_te_invoke: Processing graph 817 (ref=pe_calc-dc-1330689652-1029) derived from /var/lib/pengine/pe-warn-39.bz2
> Mar 2 13:00:52 node2 crmd: [2665]: info: te_pseudo_action: Pseudo action 18 fired and confirmed
> Mar 2 13:00:52 node2 crmd: [2665]: info: te_pseudo_action: Pseudo action 19 fired and confirmed
> Mar 2 13:00:52 node2 crmd: [2665]: info: te_fence_node: Executing reboot fencing operation (21) on node1 (timeout=60000)
> Mar 2 13:00:52 node2 stonith-ng: [2660]: info: initiate_remote_stonith_op: Initiating remote operation reboot for node1: a4ebce93-0eee-43dd-b610-0115e62b0285
> Mar 2 13:00:52 node2 stonith-ng: [2638]: info: stonith_command: Processed st_query from node2: rc=0
> Mar 2 13:00:56 node2 stonith-ng: [2660]: ERROR: remote_op_query_timeout: Query 7d8beca4-1853-44fd-9bb2-4015b080c37b for node1 timed out
> Mar 2 13:00:56 node2 stonith-ng: [2660]: ERROR: remote_op_timeout: Action off (7d8beca4-1853-44fd-9bb2-4015b080c37b) for node1 timed out
> Mar 2 13:00:56 node2 stonith-ng: [2660]: info: remote_op_done: Notifing clients of 7d8beca4-1853-44fd-9bb2-4015b080c37b (off of node1 from 6258828b-4b19-472f-9256-8da36fe87962 by (null)): 0, rc=-8
> Mar 2 13:00:56 node2 stonith-ng: [2660]: info: stonith_notify_client: Sending st_fence-notification to client 2665/ff16ec78-3634-444c-88a6-275ce79eec6b
> Mar 2 13:00:56 node2 crmd: [2665]: ERROR: tengine_stonith_notify: Peer node1 could not be terminated (off) by <anyone> for node2 (ref=7d8beca4-1853-44fd-9bb2-4015b080c37b): Operation timed out
> Mar 2 13:00:58 node2 stonith-ng: [2660]: ERROR: remote_op_query_timeout: Query a4ebce93-0eee-43dd-b610-0115e62b0285 for node1 timed out
> Mar 2 13:00:58 node2 stonith-ng: [2660]: ERROR: remote_op_timeout: Action reboot (a4ebce93-0eee-43dd-b610-0115e62b0285) for node1 timed out
> Mar 2 13:00:58 node2 stonith-ng: [2660]: info: remote_op_done: Notifing clients of a4ebce93-0eee-43dd-b610-0115e62b0285 (reboot of node1 from 8231841e-3537-44a9-8870-899d0d846c42 by (null)): 0, rc=-8
> Mar 2 13:00:58 node2 stonith-ng: [2660]: info: stonith_notify_client: Sending st_fence-notification to client 2665/ff16ec78-3634-444c-88a6-275ce79eec6b
> Mar 2 13:00:58 node2 crmd: [2665]: info: tengine_stonith_callback: StonithOp <remote-op state="0" st_target="node1" st_op="reboot" />
> Mar 2 13:00:58 node2 crmd: [2665]: info: tengine_stonith_callback: Stonith operation 800/21:817:0:d274c31a-571b-4e22-b453-1c151a8871b1: Operation timed out (-8)
> Mar 2 13:00:58 node2 crmd: [2665]: ERROR: tengine_stonith_callback: Stonith of node1 failed (-8)... aborting transition.
> Mar 2 13:00:58 node2 crmd: [2665]: info: abort_transition_graph: tengine_stonith_callback:454 - Triggered transition abort (complete=0) : Stonith failed
> Mar 2 13:00:58 node2 crmd: [2665]: info: update_abort_priority: Abort priority upgraded from 0 to 1000000
> Mar 2 13:00:58 node2 crmd: [2665]: info: update_abort_priority: Abort action done superceeded by restart
> Mar 2 13:00:58 node2 crmd: [2665]: ERROR: tengine_stonith_notify: Peer node1 could not be terminated (reboot) by <anyone> for node2 (ref=a4ebce93-0eee-43dd-b610-0115e62b0285): Operation timed out
> Mar 2 13:00:58 node2 crmd: [2665]: info: run_graph: ====================================================
> Mar 2 13:00:58 node2 crmd: [2665]: notice: run_graph: Transition 817 (Complete=3, Pending=0, Fired=0, Skipped=14, Incomplete=0, Source=/var/lib/pengine/pe-warn-39.bz2): Stopped
> Mar 2 13:00:58 node2 crmd: [2665]: info: te_graph_trigger: Transition 817 is now complete
> Mar 2 13:00:58 node2 crmd: [2665]: info: do_state_transition: State transition S_TRANSITION_ENGINE -> S_POLICY_ENGINE [ input=I_PE_CALC cause=C_FSA_INTERNAL origin=notify_crmd ]
> Mar 2 13:00:58 node2 crmd: [2665]: info: do_state_transition: All 1 cluster nodes are eligible to run resources.
> Mar 2 13:00:58 node2 crmd: [2665]: info: do_pe_invoke: Query 1273: Requesting the current CIB: S_POLICY_ENGINE
> Mar 2 13:00:58 node2 crmd: [2665]: info: do_pe_invoke_callback: Invoking the PE: query=1273, ref=pe_calc-dc-1330689658-1030, seq=404, quorate=0
> Mar 2 13:00:58 node2 pengine: [2664]: notice: unpack_config: On loss of CCM Quorum: Ignore
> Mar 2 13:00:58 node2 pengine: [2664]: WARN: pe_fence_node: Node node1 will be fenced because it is un-expectedly down
> Mar 2 13:00:58 node2 pengine: [2664]: WARN: determine_online_status: Node node1 is unclean
> Mar 2 13:00:58 node2 pengine: [2664]: notice: unpack_rsc_op: Operation nodeIParp_last_failure_0 found resource nodeIParp active on node2
> Mar 2 13:00:58 node2 pengine: [2664]: notice: unpack_rsc_op: Operation node1-stonith_last_failure_0 found resource node1-stonith active on node2
> Mar 2 13:00:58 node2 pengine: [2664]: notice: unpack_rsc_op: Operation nodeIParp_last_failure_0 found resource nodeIParp active on node1
> Mar 2 13:00:58 node2 pengine: [2664]: notice: unpack_rsc_op: Operation nodeIP_last_failure_0 found resource nodeIP active on node1
> Mar 2 13:00:58 node2 pengine: [2664]: notice: unpack_rsc_op: Operation httpd_last_failure_0 found resource httpd active on node1
> Mar 2 13:00:58 node2 pengine: [2664]: WARN: custom_action: Action nodeIP_stop_0 on node1 is unrunnable (offline)
> Mar 2 13:00:58 node2 pengine: [2664]: WARN: custom_action: Marking node node1 unclean
> Mar 2 13:00:58 node2 pengine: [2664]: notice: RecurringOp: Start recurring monitor (60s) for nodeIP on node2
> Mar 2 13:00:58 node2 pengine: [2664]: WARN: custom_action: Action nodeIParp_stop_0 on node1 is unrunnable (offline)
> Mar 2 13:00:58 node2 pengine: [2664]: WARN: custom_action: Marking node node1 unclean
> Mar 2 13:00:58 node2 pengine: [2664]: WARN: custom_action: Action httpd_stop_0 on node1 is unrunnable (offline)
> Mar 2 13:00:58 node2 pengine: [2664]: WARN: custom_action: Marking node node1 unclean
> Mar 2 13:00:58 node2 pengine: [2664]: notice: RecurringOp: Start recurring monitor (5s) for httpd on node2
> Mar 2 13:00:58 node2 pengine: [2664]: WARN: custom_action: Action node2-stonith_stop_0 on node1 is unrunnable (offline)
> Mar 2 13:00:58 node2 pengine: [2664]: WARN: custom_action: Marking node node1 unclean
> Mar 2 13:00:58 node2 pengine: [2664]: WARN: stage6: Scheduling Node node1 for STONITH
> Mar 2 13:00:58 node2 pengine: [2664]: notice: LogActions: Move nodeIP#011(Started node1 -> node2)
> Mar 2 13:00:58 node2 pengine: [2664]: notice: LogActions: Move nodeIParp#011(Started node1 -> node2)
> Mar 2 13:00:58 node2 pengine: [2664]: notice: LogActions: Move httpd#011(Started node1 -> node2)
> Mar 2 13:00:58 node2 pengine: [2664]: notice: LogActions: Leave node1-stonith#011(Started node2)
> Mar 2 13:00:58 node2 pengine: [2664]: notice: LogActions: Stop node2-stonith#011(node1)
> Mar 2 13:00:58 node2 pengine: [2664]: WARN: process_pe_message: Transition 818: WARNINGs found during PE processing. PEngine Input stored in: /var/lib/pengine/pe-warn-39.bz2
> Mar 2 13:00:58 node2 pengine: [2664]: notice: process_pe_message: Configuration WARNINGs found during PE processing. Please run "crm_verify -L" to identify issues.
> Mar 2 13:00:58 node2 crmd: [2665]: info: do_state_transition: State transition S_POLICY_ENGINE -> S_TRANSITION_ENGINE [ input=I_PE_SUCCESS cause=C_IPC_MESSAGE origin=handle_response ]
> Mar 2 13:00:58 node2 crmd: [2665]: info: unpack_graph: Unpacked transition 818: 17 actions in 17 synapses
> Mar 2 13:00:58 node2 crmd: [2665]: info: do_te_invoke: Processing graph 818 (ref=pe_calc-dc-1330689658-1030) derived from /var/lib/pengine/pe-warn-39.bz2
> Mar 2 13:00:58 node2 crmd: [2665]: info: te_pseudo_action: Pseudo action 18 fired and confirmed
> Mar 2 13:00:58 node2 crmd: [2665]: info: te_pseudo_action: Pseudo action 19 fired and confirmed
> Mar 2 13:00:58 node2 crmd: [2665]: info: te_fence_node: Executing reboot fencing operation (21) on node1 (timeout=60000)
> Mar 2 13:00:58 node2 stonith-ng: [2660]: info: initiate_remote_stonith_op: Initiating remote operation reboot for node1: 3325df94-8d59-4c00-a37e-be31e79f7503
> Mar 2 13:00:58 node2 stonith-ng: [2638]: info: stonith_command: Processed st_query from node2: rc=0
> <----------------------------------------------- snip --------------------------------------->
>
>
> _______________________________________________
> Pacemaker mailing list: Pacemaker@oss.clusterlabs.org
> http://oss.clusterlabs.org/mailman/listinfo/pacemaker
>
> Project Home: http://www.clusterlabs.org
> Getting started: http://www.clusterlabs.org/doc/Cluster_from_Scratch.pdf
> Bugs: http://bugs.clusterlabs.org
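P.S. for the archives: a by-hand sanity check of the agent (the "works fine manual" case Thomas mentions) would look roughly like the following. The flags are the standard fence-agent command-line options and the values are taken from the posted config, so treat this as an illustration only:

```
# Ask node1's DRAC for its power status, run from node2
# (-a ipaddr, -l login, -p passwd, -z use SSL, -c command prompt, -o action)
fence_drac5 -a 192.168.1.101 -l root -p 1234 -z -c "admin1->" -o status
```

If this succeeds by hand while cluster-initiated fencing times out, the problem is usually the host-to-device mapping (pcmk_host_list/pcmk_host_check) rather than the agent or the DRAC itself.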