On Tue, Apr 03, 2007 at 03:52:37PM -0400, Bjorn Oglefjorn wrote:
> Sorry Alan, I realize that this post is getting quite long.  Here is a
> rundown of where I am currently.
> 
> STONITH is failing and I'm still not sure why.

Me neither. There's nothing in the logs apart from:

Mar 30 09:38:20 test-1 stonithd: [855]: info: Failed to STONITH
the node test-2.domain: optype=1, op_result=2

What does op_result=2 mean?

I think I've already suggested that you add some debug output to
your stonith RA (you implemented it yourself, right?).
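If it helps, this is the kind of tracing I mean, as a wrapper
sketch. It assumes your RA is a shell script using the external
stonith plugin API (the operation arrives as $1, parameters such
as DRAC_ADDR via the environment); the drac4.real name is only
for illustration, i.e. rename your original plugin first:

    #!/bin/sh
    # log every invocation of the drac4 plugin and its result
    LOG=/var/log/drac4-debug.log
    echo "`date` called: op=$1 DRAC_ADDR=$DRAC_ADDR" >>$LOG
    # run the real (renamed) plugin, capturing output and exit code
    /usr/lib/stonith/plugins/external/drac4.real "$@" >>$LOG 2>&1
    rc=$?
    echo "`date` op=$1 finished: rc=$rc" >>$LOG
    exit $rc

A non-zero rc on the reset operation is most likely what shows up
as op_result=2 in stonithd's log.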

This is also no good:

Mar 30 09:38:20 test-1 pengine: [975]: ERROR: native_add_running:native.c 
Resource stonith::external/drac4:test-1_DRAC appears to be active on 2 nodes.

Are you sure that drac4 always returns good status?
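You can also exercise the plugin outside the cluster with the
stonith(8) utility, along these lines (check stonith(8) on your
version for the exact parameter syntax; password elided):

    # device status and the list of hosts drac4 claims to control
    stonith -t external/drac4 DRAC_ADDR=test-2.drac.domain \
        DRAC_LOGIN=root DRAC_PASSWD=******** -S
    stonith -t external/drac4 DRAC_ADDR=test-2.drac.domain \
        DRAC_LOGIN=root DRAC_PASSWD=******** -l
    # and, if you can afford to reboot test-2, a real reset
    stonith -t external/drac4 DRAC_ADDR=test-2.drac.domain \
        DRAC_LOGIN=root DRAC_PASSWD=******** -T reset test-2.domain

If -S or -l is flaky, that could also be behind the "appears to
be active on 2 nodes" error.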

Timeouts: They are really way too short. I think there was
already a discussion on the list about that. Short timeouts will
normally work, but it may happen, and most probably will, that
they don't. Then you'll wonder what the heck went wrong.

Intervals: The monitors run too often for my taste, as I think I
already said, but hey, it's your cluster :)
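For illustration, something more forgiving for your test_IP
monitor would be (the values are only an example, tune them to
your environment):

    <op id="test_IP_mon" name="monitor" interval="30s"
        timeout="60s" on_fail="restart"/>

That is, check every 30s instead of every 10s, and give the op
60s rather than 9s before declaring it failed.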

Constraints: I'm not sure you want to allow test-1_DRAC to run on
test-1. Setting the score to INFINITY won't help if the preferred
node is down. It'd be better, and more logical, to go the other
way around: set -INFINITY for the nodes where it shouldn't run.
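For test-1_DRAC that would look like this (the rule ids are just
illustrative):

    <rsc_location id="test-1_DRAC_location" rsc="test-1_DRAC">
      <rule id="test-1_DRAC_not_on_test-1" score="-INFINITY">
        <expression attribute="#uname"
          id="test-1_DRAC_not_on_test-1_expr_1" operation="eq"
          value="test-1.domain"/>
      </rule>
    </rsc_location>

Then test-1_DRAC may run anywhere except on the node it is
supposed to fence. Mirror the same for test-2_DRAC.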

And please go and read
http://linux-ha.org/ReportingProblems
again. It says, in bold type:

    Please don't send "cleaned up" logs.

And there's no CIB from the running cluster either.
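(While the cluster is up you can grab that on the DC with
something like "cibadmin -Q > cib-running.xml" and attach it.)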

> I'm quite sure the plugin is
> working properly and my cib.xml seems sane enough.  I'm having a hard time
> digesting the logs and understanding where the trouble is coming from.
> Attached again are my cib.xml and the logs from when heartbeat noticed
> test-2 was dead all the way up to STONITH failing for the second time...it
> loops and tries to STONITH the dead node indefinitely. It almost seems as
> if heartbeat is 'restarting', but like I said, I'm having a hard time
> getting useful information from the logs.
> 
> Sorry if I'm being a pest, but I really don't like not figuring stuff out.
> Let me know if you need more info.
> --BO
> 
> On 4/3/07, Alan Robertson <[EMAIL PROTECTED]> wrote:
> >
> >Bjorn Oglefjorn wrote:
> >> Anyone? Help?
> >> --BO
> >>
> >> On 4/2/07, Bjorn Oglefjorn <[EMAIL PROTECTED]> wrote:
> >>>
> >>> Any ideas as to what's going wrong here?
> >
> >There is so much send/reply/try/fail/fix stuff in the email that I had
> >trouble following what was going on.
> >
> >Could you try reposting this cleanly and explain what symptoms you're
> >seeing?  I just saw "it doesn't work", and that's not very helpful.
> >
> >> See also: http://linux-ha.org/ReportingProblems
> >
> >
> >--
> >    Alan Robertson <[EMAIL PROTECTED]>
> >
> >"Openness is the foundation and preservative of friendship...  Let me
> >claim from you at all times your undisguised opinions." - William
> >Wilberforce

> Mar 30 09:37:56 test-1 heartbeat: [836]: WARN: node test-2.domain: is dead
> Mar 30 09:37:56 test-1 heartbeat: [836]: info: Link test-2.domain:eth0 dead.
> Mar 30 09:37:56 test-1 crmd: [857]: notice: 
> crmd_ha_status_callback:callbacks.c Status update: Node test-2.domain now has 
> status [dead]
> Mar 30 09:37:56 test-1 crmd: [857]: info: mem_handle_event: Got an event 
> OC_EV_MS_NOT_PRIMARY from ccm
> Mar 30 09:37:56 test-1 crmd: [857]: info: mem_handle_event: instance=2, 
> nodes=2, new=2, lost=0, n_idx=0, new_idx=0, old_idx=4
> Mar 30 09:37:56 test-1 crmd: [857]: info: crmd_ccm_msg_callback:callbacks.c 
> Quorum lost after event=NOT PRIMARY (id=2)
> Mar 30 09:37:56 test-1 cib: [853]: info: activateCibXml:io.c CIB size is 
> 148544 bytes (was 153264)
> Mar 30 09:37:56 test-1 cib: [853]: info: cib_diff_notify:notify.c Local-only 
> Change (client:857, call: 24): 0.4.134 (ok)
> Mar 30 09:37:56 test-1 cib: [853]: info: mem_handle_event: Got an event 
> OC_EV_MS_NOT_PRIMARY from ccm
> Mar 30 09:37:56 test-1 cib: [853]: info: mem_handle_event: instance=2, 
> nodes=2, new=2, lost=0, n_idx=0, new_idx=0, old_idx=4
> Mar 30 09:37:56 test-1 cib: [973]: info: write_cib_contents:io.c Wrote 
> version 0.4.134 of the CIB to disk (digest: 9c22a56909f0417ecc1f3af561f3523a)
> Mar 30 09:38:01 test-1 cib: [853]: info: mem_handle_event: Got an event 
> OC_EV_MS_INVALID from ccm
> Mar 30 09:38:01 test-1 crmd: [857]: info: mem_handle_event: Got an event 
> OC_EV_MS_INVALID from ccm
> Mar 30 09:38:01 test-1 cib: [853]: info: mem_handle_event: no mbr_track info
> Mar 30 09:38:01 test-1 crmd: [857]: info: mem_handle_event: no mbr_track info
> Mar 30 09:38:01 test-1 cib: [853]: info: mem_handle_event: Got an event 
> OC_EV_MS_NEW_MEMBERSHIP from ccm
> Mar 30 09:38:01 test-1 crmd: [857]: info: mem_handle_event: Got an event 
> OC_EV_MS_NEW_MEMBERSHIP from ccm
> Mar 30 09:38:01 test-1 cib: [853]: info: mem_handle_event: instance=3, 
> nodes=1, new=0, lost=1, n_idx=0, new_idx=1, old_idx=3
> Mar 30 09:38:01 test-1 crmd: [857]: info: mem_handle_event: instance=3, 
> nodes=1, new=0, lost=1, n_idx=0, new_idx=1, old_idx=3
> Mar 30 09:38:01 test-1 crmd: [857]: info: crmd_ccm_msg_callback:callbacks.c 
> Quorum (re)attained after event=NEW MEMBERSHIP (id=3)
> Mar 30 09:38:01 test-1 cib: [853]: info: cib_ccm_msg_callback:callbacks.c 
> LOST: test-2.domain
> Mar 30 09:38:01 test-1 crmd: [857]: WARN: check_dead_member:ccm.c Our DC node 
> (test-2.domain) left the cluster
> Mar 30 09:38:01 test-1 cib: [853]: info: cib_ccm_msg_callback:callbacks.c 
> PEER: test-1.domain
> Mar 30 09:38:01 test-1 crmd: [857]: info: ccm_event_detail:ccm.c NEW 
> MEMBERSHIP: trans=3, nodes=1, new=0, lost=1 n_idx=0, new_idx=1, old_idx=3
> Mar 30 09:38:01 test-1 ccm: [852]: info: Break tie for 2 nodes cluster
> Mar 30 09:38:01 test-1 crmd: [857]: info: ccm_event_detail:ccm.c      
> CURRENT: test-1.domain [nodeid=0, born=3]
> Mar 30 09:38:01 test-1 crmd: [857]: info: ccm_event_detail:ccm.c      LOST:   
>  test-2.domain [nodeid=1, born=1]
> Mar 30 09:38:01 test-1 crmd: [857]: info: do_state_transition:fsa.c 
> test-1.domain: State transition S_NOT_DC -> S_ELECTION [ input=I_ELECTION 
> cause=C_FSA_INTERNAL origin=check_dead_member ]
> Mar 30 09:38:01 test-1 crmd: [857]: info: update_dc:utils.c Set DC to <null> 
> (<null>)
> Mar 30 09:38:01 test-1 crmd: [857]: info: do_election_count_vote:election.c 
> Updated voted hash for test-1.domain to vote
> Mar 30 09:38:01 test-1 crmd: [857]: info: do_election_count_vote:election.c 
> Election ignore: our vote (test-1.domain)
> Mar 30 09:38:01 test-1 crmd: [857]: info: do_state_transition:fsa.c 
> test-1.domain: State transition S_ELECTION -> S_INTEGRATION [ 
> input=I_ELECTION_DC cause=C_FSA_INTERNAL origin=do_election_check ]
> Mar 30 09:38:01 test-1 crmd: [857]: info: start_subsystem:subsystems.c 
> Starting sub-system "tengine"
> Mar 30 09:38:01 test-1 crmd: [857]: info: start_subsystem:subsystems.c 
> Starting sub-system "pengine"
> Mar 30 09:38:01 test-1 tengine: [974]: info: G_main_add_SignalHandler: Added 
> signal handler for signal 15
> Mar 30 09:38:01 test-1 tengine: [974]: info: G_main_add_TriggerHandler: Added 
> signal manual handler
> Mar 30 09:38:01 test-1 crmd: [857]: info: do_dc_takeover:election.c Taking 
> over DC status for this partition
> Mar 30 09:38:01 test-1 crmd: [857]: info: update_dc:utils.c Set DC to <null> 
> (<null>)
> Mar 30 09:38:01 test-1 crmd: [857]: info: do_dc_join_offer_all:join_dc.c 
> join-1: Waiting on 1 outstanding join acks
> Mar 30 09:38:01 test-1 pengine: [975]: info: G_main_add_SignalHandler: Added 
> signal handler for signal 15
> Mar 30 09:38:01 test-1 pengine: [975]: info: init_start:main.c Starting 
> pengine
> Mar 30 09:38:01 test-1 cib: [853]: info: activateCibXml:io.c CIB size is 
> 148648 bytes (was 148544)
> Mar 30 09:38:01 test-1 cib: [853]: info: cib_diff_notify:notify.c Local-only 
> Change (client:857, call: 25): 0.4.134 (ok)
> Mar 30 09:38:01 test-1 cib: [853]: info: cib_process_readwrite:messages.c We 
> are now in R/W mode
> Mar 30 09:38:01 test-1 cib: [853]: info: cib_diff_notify:notify.c Update 
> (client: 857, call:28): 0.4.134 -> 0.4.135 (ok)
> Mar 30 09:38:01 test-1 cib: [853]: info: cib_null_callback:callbacks.c 
> Setting cib_diff_notify callbacks for tengine: on
> Mar 30 09:38:01 test-1 tengine: [974]: info: init_start:main.c Registering TE 
> UUID: 7ec7d2e0-ae10-4810-a67a-73119ab6855f
> Mar 30 09:38:01 test-1 tengine: [974]: info: set_graph_functions:utils.c 
> Setting custom graph functions
> Mar 30 09:38:01 test-1 tengine: [974]: info: unpack_graph:unpack.c Unpacked 
> transition -1: 0 actions in 0 synapses
> Mar 30 09:38:01 test-1 tengine: [974]: info: init_start:main.c Starting 
> tengine
> Mar 30 09:38:01 test-1 cib: [976]: info: write_cib_contents:io.c Wrote 
> version 0.4.135 of the CIB to disk (digest: 676f407011dd2084813f1c414db97ce5)
> Mar 30 09:38:01 test-1 crmd: [857]: info: update_dc:utils.c Set DC to 
> test-1.domain (1.0.6)
> Mar 30 09:38:01 test-1 cib: [853]: info: sync_our_cib:messages.c Syncing CIB 
> to all peers
> Mar 30 09:38:01 test-1 crmd: [857]: info: do_state_transition:fsa.c 
> test-1.domain: State transition S_INTEGRATION -> S_FINALIZE_JOIN [ 
> input=I_INTEGRATED cause=C_FSA_INTERNAL origin=check_join_state ]
> Mar 30 09:38:01 test-1 attrd: [856]: info: attrd_local_callback:attrd.c 
> Sending full refresh
> Mar 30 09:38:01 test-1 crmd: [857]: info: do_state_transition:fsa.c All 1 
> cluster nodes responded to the join offer.
> Mar 30 09:38:01 test-1 crmd: [857]: info: update_attrd:join_dc.c Connecting 
> to attrd...
> Mar 30 09:38:01 test-1 cib: [853]: info: cib_diff_notify:notify.c Update 
> (client: 857, call:31): 0.4.135 -> 0.4.136 (ok)
> Mar 30 09:38:01 test-1 tengine: [974]: info: te_update_diff:callbacks.c 
> Processing diff (cib_update): 0.4.135 -> 0.4.136
> Mar 30 09:38:01 test-1 cib: [853]: info: cib_diff_notify:notify.c Update 
> (client: 857, call:32): 0.4.136 -> 0.5.137 (ok)
> Mar 30 09:38:01 test-1 tengine: [974]: info: te_update_diff:callbacks.c 
> Processing diff (cib_bump): 0.4.136 -> 0.5.137
> Mar 30 09:38:01 test-1 cib: [853]: info: cib_diff_notify:notify.c Update 
> (client: 857, call:33): 0.5.137 -> 0.5.138 (ok)
> Mar 30 09:38:01 test-1 tengine: [974]: info: te_update_diff:callbacks.c 
> Processing diff (cib_update): 0.5.137 -> 0.5.138
> Mar 30 09:38:01 test-1 cib: [977]: info: write_cib_contents:io.c Wrote 
> version 0.5.138 of the CIB to disk (digest: ff63aedd166b5e7c8880f380b334321e)
> Mar 30 09:38:02 test-1 crmd: [857]: info: update_dc:utils.c Set DC to 
> test-1.domain (1.0.6)
> Mar 30 09:38:02 test-1 crmd: [857]: info: do_dc_join_ack:join_dc.c join-1: 
> Updating node state to member for test-1.domain)
> Mar 30 09:38:02 test-1 cib: [853]: info: cib_diff_notify:notify.c Update 
> (client: 857, call:34): 0.5.138 -> 0.5.139 (ok)
> Mar 30 09:38:02 test-1 tengine: [974]: info: te_update_diff:callbacks.c 
> Processing diff (cib_update): 0.5.138 -> 0.5.139
> Mar 30 09:38:02 test-1 crmd: [857]: info: do_state_transition:fsa.c 
> test-1.domain: State transition S_FINALIZE_JOIN -> S_POLICY_ENGINE [ 
> input=I_FINALIZED cause=C_FSA_INTERNAL origin=check_join_state ]
> Mar 30 09:38:02 test-1 tengine: [974]: info: update_abort_priority:utils.c 
> Abort priority upgraded to 1000000
> Mar 30 09:38:02 test-1 crmd: [857]: info: do_state_transition:fsa.c All 1 
> cluster nodes are eligable to run resources.
> Mar 30 09:38:02 test-1 cib: [978]: info: write_cib_contents:io.c Wrote 
> version 0.5.139 of the CIB to disk (digest: 86e0ea1b0a56ca84cf3d1f0e261f7816)
> Mar 30 09:38:02 test-1 pengine: [975]: info: process_pe_message: [generation] 
> <cib admin_epoch="0" have_quorum="true" num_peers="2" 
> cib_feature_revision="1.3" generated="true" ccm_transition="3" 
> dc_uuid="b3bba1ca-b072-49ac-8e93-a2c6fbf4678e" epoch="5" num_updates="139"/>
> Mar 30 09:38:02 test-1 pengine: [975]: info: unpack_config:unpack.c Default 
> stickiness: 1000
> Mar 30 09:38:02 test-1 pengine: [975]: info: unpack_config:unpack.c Default 
> failure stickiness: -400
> Mar 30 09:38:02 test-1 pengine: [975]: info: unpack_config:unpack.c STONITH 
> of failed nodes is enabled
> Mar 30 09:38:02 test-1 pengine: [975]: info: unpack_config:unpack.c STONITH 
> will reboot nodes
> Mar 30 09:38:02 test-1 pengine: [975]: info: unpack_config:unpack.c Cluster 
> is symmetric - resources can run anywhere by default
> Mar 30 09:38:02 test-1 pengine: [975]: info: unpack_config:unpack.c On loss 
> of CCM Quorum: Stop ALL resources
> Mar 30 09:38:02 test-1 pengine: [975]: info: unpack_config:unpack.c Orphan 
> resources are stopped
> Mar 30 09:38:02 test-1 pengine: [975]: info: unpack_config:unpack.c Orphan 
> resource actions are stopped
> Mar 30 09:38:02 test-1 pengine: [975]: info: unpack_config:unpack.c Stopped 
> resources are removed from the status section: true
> Mar 30 09:38:02 test-1 pengine: [975]: info: unpack_config:unpack.c By 
> default resources are managed
> Mar 30 09:38:02 test-1 pengine: [975]: info: determine_online_status:unpack.c 
> Node test-1.domain is online
> Mar 30 09:38:02 test-1 pengine: [975]: WARN: 
> determine_online_status_fencing:unpack.c Node test-2.domain 
> (d822d67b-5495-47c9-bdb9-f4f66e0bea85) is un-expectedly down
> Mar 30 09:38:02 test-1 pengine: [975]: info: 
> determine_online_status_fencing:unpack.c         ha_state=dead, 
> ccm_state=false, crm_state=online, join_state=down, expected=member
> Mar 30 09:38:02 test-1 pengine: [975]: WARN: determine_online_status:unpack.c 
> Node test-2.domain is unclean
> Mar 30 09:38:02 test-1 pengine: [975]: info: Resource Group: test_group
> Mar 30 09:38:02 test-1 pengine: [975]: info:     test_IP      
> (heartbeat::ocf:IPaddr):        Started test-2.domain
> Mar 30 09:38:02 test-1 pengine: [975]: info:     httpd        
> (heartbeat::ocf:apache):        Started test-2.domain
> Mar 30 09:38:02 test-1 pengine: [975]: info: test-1_DRAC      
> (stonith:external/drac4):       Started test-2.domain
> Mar 30 09:38:02 test-1 pengine: [975]: info: test-2_DRAC      
> (stonith:external/drac4):       Started test-1.domain
> Mar 30 09:38:02 test-1 pengine: [975]: notice: NoRoleChange:native.c Move  
> resource test_IP   (test-2.domain -> test-1.domain)
> Mar 30 09:38:02 test-1 pengine: [975]: WARN: custom_action:utils.c Action 
> test_IP_stop_0 on test-2.domain is unrunnable (offline)
> Mar 30 09:38:02 test-1 pengine: [975]: WARN: custom_action:utils.c Marking 
> node test-2.domain unclean
> Mar 30 09:38:02 test-1 pengine: [975]: notice: Recurring:native.c 
> test-1.domain          test_IP_monitor_10000
> Mar 30 09:38:02 test-1 pengine: [975]: notice: NoRoleChange:native.c Move  
> resource httpd     (test-2.domain -> test-1.domain)
> Mar 30 09:38:02 test-1 pengine: [975]: WARN: custom_action:utils.c Action 
> httpd_stop_0 on test-2.domain is unrunnable (offline)
> Mar 30 09:38:02 test-1 pengine: [975]: WARN: custom_action:utils.c Marking 
> node test-2.domain unclean
> Mar 30 09:38:02 test-1 pengine: [975]: notice: Recurring:native.c 
> test-1.domain          httpd_monitor_10000
> Mar 30 09:38:02 test-1 pengine: [975]: notice: NoRoleChange:native.c Move  
> resource test-1_DRAC       (test-2.domain -> test-1.domain)
> Mar 30 09:38:02 test-1 pengine: [975]: WARN: custom_action:utils.c Action 
> test-1_DRAC_stop_0 on test-2.domain is unrunnable (offline)
> Mar 30 09:38:02 test-1 pengine: [975]: WARN: custom_action:utils.c Marking 
> node test-2.domain unclean
> Mar 30 09:38:02 test-1 pengine: [975]: notice: StopRsc:native.c   
> test-1.domain       Stop test-2_DRAC
> Mar 30 09:38:02 test-1 pengine: [975]: notice: DeleteRsc:native.c Removing 
> test-2_DRAC from test-1.domain
> Mar 30 09:38:02 test-1 pengine: [975]: WARN: stage6:allocate.c Scheduling 
> Node test-2.domain for STONITH
> Mar 30 09:38:02 test-1 pengine: [975]: WARN: native_stop_constraints:native.c 
> Stop of failed resource test_IP is implict after test-2.domain is fenced
> Mar 30 09:38:02 test-1 pengine: [975]: info: native_stop_constraints:native.c 
> Re-creating actions for test_group
> Mar 30 09:38:02 test-1 pengine: [975]: notice: NoRoleChange:native.c Move  
> resource test_IP   (test-2.domain -> test-1.domain)
> Mar 30 09:38:02 test-1 pengine: [975]: WARN: custom_action:utils.c Action 
> test_IP_stop_0 on test-2.domain is unrunnable (offline)
> Mar 30 09:38:02 test-1 pengine: [975]: WARN: custom_action:utils.c Marking 
> node test-2.domain unclean
> Mar 30 09:38:02 test-1 pengine: [975]: notice: Recurring:native.c 
> test-1.domain          test_IP_monitor_10000
> Mar 30 09:38:02 test-1 pengine: [975]: notice: NoRoleChange:native.c Move  
> resource httpd     (test-2.domain -> test-1.domain)
> Mar 30 09:38:02 test-1 pengine: [975]: WARN: custom_action:utils.c Action 
> httpd_stop_0 on test-2.domain is unrunnable (offline)
> Mar 30 09:38:02 test-1 pengine: [975]: WARN: custom_action:utils.c Marking 
> node test-2.domain unclean
> Mar 30 09:38:02 test-1 pengine: [975]: notice: Recurring:native.c 
> test-1.domain          httpd_monitor_10000
> Mar 30 09:38:02 test-1 pengine: [975]: WARN: native_stop_constraints:native.c 
> Stop of failed resource httpd is implict after test-2.domain is fenced
> Mar 30 09:38:02 test-1 pengine: [975]: info: native_stop_constraints:native.c 
> Re-creating actions for test_group
> Mar 30 09:38:02 test-1 pengine: [975]: notice: NoRoleChange:native.c Move  
> resource test_IP   (test-2.domain -> test-1.domain)
> Mar 30 09:38:02 test-1 pengine: [975]: WARN: custom_action:utils.c Action 
> test_IP_stop_0 on test-2.domain is unrunnable (offline)
> Mar 30 09:38:02 test-1 pengine: [975]: WARN: custom_action:utils.c Marking 
> node test-2.domain unclean
> Mar 30 09:38:02 test-1 pengine: [975]: notice: Recurring:native.c 
> test-1.domain          test_IP_monitor_10000
> Mar 30 09:38:02 test-1 pengine: [975]: notice: NoRoleChange:native.c Move  
> resource httpd     (test-2.domain -> test-1.domain)
> Mar 30 09:38:02 test-1 pengine: [975]: WARN: custom_action:utils.c Action 
> httpd_stop_0 on test-2.domain is unrunnable (offline)
> Mar 30 09:38:02 test-1 pengine: [975]: WARN: custom_action:utils.c Marking 
> node test-2.domain unclean
> Mar 30 09:38:02 test-1 pengine: [975]: notice: Recurring:native.c 
> test-1.domain          httpd_monitor_10000
> Mar 30 09:38:02 test-1 pengine: [975]: WARN: native_stop_constraints:native.c 
> Stop of failed resource test-1_DRAC is implict after test-2.domain is fenced
> Mar 30 09:38:02 test-1 pengine: [975]: notice: stage8:allocate.c Created 
> transition graph 0.
> Mar 30 09:38:02 test-1 pengine: [975]: WARN: process_pe_message:pengine.c No 
> value specified for cluster preference: pe-warn-series-max
> Mar 30 09:38:02 test-1 pengine: [975]: WARN: process_pe_message:pengine.c 
> Transition 0: WARNINGs found during PE processing. PEngine Input stored in: 
> /var/lib/heartbeat/pengine/pe-warn-45.bz2
> Mar 30 09:38:02 test-1 pengine: [975]: info: process_pe_message:pengine.c 
> Configuration WARNINGs found during PE processing.  Please run "crm_verify 
> -L" to identify issues.
> Mar 30 09:38:02 test-1 crmd: [857]: info: do_state_transition:fsa.c 
> test-1.domain: State transition S_POLICY_ENGINE -> S_TRANSITION_ENGINE [ 
> input=I_PE_SUCCESS cause=C_IPC_MESSAGE origin=route_message ]
> Mar 30 09:38:02 test-1 tengine: [974]: info: unpack_graph:unpack.c Unpacked 
> transition 0: 14 actions in 14 synapses
> Mar 30 09:38:02 test-1 tengine: [974]: info: te_pseudo_action:actions.c 
> Pseudo action 9 confirmed
> Mar 30 09:38:02 test-1 tengine: [974]: info: te_pseudo_action:actions.c 
> Pseudo action 6 confirmed
> Mar 30 09:38:02 test-1 tengine: [974]: info: te_pseudo_action:actions.c 
> Pseudo action 13 confirmed
> Mar 30 09:38:02 test-1 tengine: [974]: info: send_rsc_command:actions.c 
> Initiating action 14: test-1_DRAC_start_0 on test-1.domain
> Mar 30 09:38:02 test-1 tengine: [974]: info: send_rsc_command:actions.c 
> Initiating action 15: test-2_DRAC_stop_0 on test-1.domain
> Mar 30 09:38:02 test-1 crmd: [857]: info: do_lrm_rsc_op:lrm.c Performing op 
> start on test-1_DRAC (interval=0ms, 
> key=0:7ec7d2e0-ae10-4810-a67a-73119ab6855f)
> Mar 30 09:38:02 test-1 tengine: [974]: info: te_pseudo_action:actions.c 
> Pseudo action 3 confirmed
> Mar 30 09:38:02 test-1 lrmd: [979]: info: Try to start STONITH resource 
> <rsc_id=test-1_DRAC> : Device=external/drac4
> Mar 30 09:38:02 test-1 crmd: [857]: info: do_lrm_rsc_op:lrm.c Performing op 
> stop on test-2_DRAC (interval=0ms, key=0:7ec7d2e0-ae10-4810-a67a-73119ab6855f)
> Mar 30 09:38:03 test-1 lrmd: [981]: info: Try to stop STONITH resource 
> <rsc_id=test-2_DRAC> : Device=external/drac4
> Mar 30 09:38:03 test-1 crmd: [857]: info: process_lrm_event:lrm.c LRM 
> operation (8) stop_0 on test-2_DRAC complete
> Mar 30 09:38:03 test-1 cib: [853]: info: activateCibXml:io.c CIB size is 
> 150940 bytes (was 148648)
> Mar 30 09:38:03 test-1 cib: [853]: info: cib_diff_notify:notify.c Update 
> (client: 857, call:36): 0.5.139 -> 0.5.140 (ok)
> Mar 30 09:38:03 test-1 tengine: [974]: info: te_update_diff:callbacks.c 
> Processing diff (cib_update): 0.5.139 -> 0.5.140
> Mar 30 09:38:03 test-1 tengine: [974]: info: match_graph_event:events.c 
> Action test-2_DRAC_stop_0 (15) confirmed
> Mar 30 09:38:03 test-1 tengine: [974]: info: send_rsc_command:actions.c 
> Initiating action 16: test-2_DRAC_delete_0 on test-1.domain
> Mar 30 09:38:03 test-1 crmd: [857]: info: do_lrm_invoke:lrm.c Removing 
> resource test-2_DRAC from the LRM
> Mar 30 09:38:03 test-1 crmd: [857]: info: send_direct_ack:lrm.c ACK'ing 
> resource op: delete for test-2_DRAC
> Mar 30 09:38:03 test-1 tengine: [974]: info: match_graph_event:events.c 
> Action test-2_DRAC_delete_0 (16) confirmed
> Mar 30 09:38:03 test-1 tengine: [974]: info: te_crm_command:actions.c 
> Executing crm-event (17): lrm_refresh on test-1.domain
> Mar 30 09:38:03 test-1 tengine: [974]: info: te_crm_command:actions.c 
> Skipping wait for 17
> Mar 30 09:38:03 test-1 crmd: [857]: WARN: msg_to_op(1151): failed to get the 
> value of field lrm_opstatus from a ha_msg
> Mar 30 09:38:03 test-1 crmd: [857]: info: msg_to_op: Message follows:
> Mar 30 09:38:03 test-1 crmd: [857]: info: MSG: Dumping message with 13 fields
> Mar 30 09:38:03 test-1 crmd: [857]: info: MSG[0] : [lrm_t=op]
> Mar 30 09:38:03 test-1 crmd: [857]: info: MSG[1] : [lrm_rid=test-1_DRAC]
> Mar 30 09:38:03 test-1 crmd: [857]: info: MSG[2] : [lrm_op=start]
> Mar 30 09:38:03 test-1 crmd: [857]: info: MSG[3] : [lrm_timeout=30000]
> Mar 30 09:38:03 test-1 crmd: [857]: info: MSG[4] : [lrm_interval=0]
> Mar 30 09:38:03 test-1 crmd: [857]: info: MSG[5] : [lrm_delay=0]
> Mar 30 09:38:03 test-1 crmd: [857]: info: MSG[6] : [lrm_targetrc=-1]
> Mar 30 09:38:03 test-1 crmd: [857]: info: MSG[7] : [lrm_app=crmd]
> Mar 30 09:38:03 test-1 crmd: [857]: info: MSG[8] : 
> [lrm_userdata=0:7ec7d2e0-ae10-4810-a67a-73119ab6855f]
> Mar 30 09:38:03 test-1 crmd: [857]: info: MSG[9] : 
> [(2)lrm_param=0x907df78(164 198)]
> Mar 30 09:38:03 test-1 crmd: [857]: info: MSG: Dumping message with 6 fields
> Mar 30 09:38:03 test-1 crmd: [857]: info: MSG[0] : 
> [DRAC_ADDR=test-1.drac.domain]
> Mar 30 09:38:03 test-1 crmd: [857]: info: MSG[1] : [CRM_meta_op_target_rc=7]
> Mar 30 09:38:03 test-1 crmd: [857]: info: MSG[2] : [DRAC_LOGIN=root]
> Mar 30 09:38:03 test-1 crmd: [857]: info: MSG[3] : [DRAC_PASSWD=********]
> Mar 30 09:38:03 test-1 crmd: [857]: info: MSG[4] : [CRM_meta_timeout=30000]
> Mar 30 09:38:03 test-1 crmd: [857]: info: MSG[5] : [crm_feature_set=1.0.6]
> Mar 30 09:38:03 test-1 crmd: [857]: info: MSG[10] : [lrm_callid=7]
> Mar 30 09:38:03 test-1 crmd: [857]: info: MSG[11] : [lrm_app=crmd]
> Mar 30 09:38:03 test-1 crmd: [857]: info: MSG[12] : [lrm_callid=7]
> Mar 30 09:38:03 test-1 crmd: [857]: info: do_lrm_invoke:lrm.c Forcing a local 
> LRM refresh
> Mar 30 09:38:03 test-1 cib: [985]: info: write_cib_contents:io.c Wrote 
> version 0.5.140 of the CIB to disk (digest: da245127f5a42d016fecac8050eb99a1)
> Mar 30 09:38:03 test-1 cib: [853]: info: activateCibXml:io.c CIB size is 
> 144888 bytes (was 150940)
> Mar 30 09:38:03 test-1 cib: [853]: info: cib_diff_notify:notify.c Update 
> (client: 857, call:37): 0.5.140 -> 0.5.141 (ok)
> Mar 30 09:38:03 test-1 tengine: [974]: info: te_update_diff:callbacks.c 
> Processing diff (cib_update): 0.5.140 -> 0.5.141
> Mar 30 09:38:03 test-1 cib: [986]: info: write_cib_contents:io.c Wrote 
> version 0.5.141 of the CIB to disk (digest: da6197ce54aa4e2715dd524f8f2a40c5)
> Mar 30 09:38:05 test-1 stonithd: [855]: WARN: G_SIG_dispatch: Dispatch 
> function for SIGCHLD took too long to execute: 350 ms (> 10 ms) (GSource: 
> 0x929b820)
> Mar 30 09:38:05 test-1 crmd: [857]: info: process_lrm_event:lrm.c LRM 
> operation (7) start_0 on test-1_DRAC complete
> Mar 30 09:38:05 test-1 cib: [853]: info: cib_diff_notify:notify.c Update 
> (client: 857, call:38): 0.5.141 -> 0.5.142 (ok)
> Mar 30 09:38:05 test-1 tengine: [974]: info: te_update_diff:callbacks.c 
> Processing diff (cib_update): 0.5.141 -> 0.5.142
> Mar 30 09:38:05 test-1 tengine: [974]: info: match_graph_event:events.c 
> Action test-1_DRAC_start_0 (14) confirmed
> Mar 30 09:38:05 test-1 tengine: [974]: info: te_fence_node:actions.c 
> Executing reboot fencing operation (18) on test-2.domain (timeout=15000)
> Mar 30 09:38:05 test-1 stonithd: [855]: info: client tengine [pid: 974] want 
> a STONITH operation RESET to node test-2.domain.
> Mar 30 09:38:05 test-1 stonithd: [855]: info: Broadcasting the message 
> succeeded: require others to stonith node test-2.domain.
> Mar 30 09:38:05 test-1 cib: [988]: info: write_cib_contents:io.c Wrote 
> version 0.5.142 of the CIB to disk (digest: d5d12bf8f6bdd66647f4dc5cc27adb6e)
> Mar 30 09:38:20 test-1 stonithd: [855]: info: Failed to STONITH the node 
> test-2.domain: optype=1, op_result=2
> Mar 30 09:38:20 test-1 tengine: [974]: info: 
> tengine_stonith_callback:callbacks.c call=-2, optype=1, 
> node_name=test-2.domain, result=2, node_list=, 
> action=18;0:7ec7d2e0-ae10-4810-a67a-73119ab6855f
> Mar 30 09:38:20 test-1 tengine: [974]: ERROR: 
> tengine_stonith_callback:callbacks.c Stonith of test-2.domain failed (2)... 
> aborting transition.
> Mar 30 09:38:20 test-1 tengine: [974]: info: update_abort_priority:utils.c 
> Abort priority upgraded to 1000000
> Mar 30 09:38:20 test-1 tengine: [974]: info: update_abort_priority:utils.c 
> Abort action 0 superceeded by 2
> Mar 30 09:38:20 test-1 crmd: [857]: info: do_state_transition:fsa.c 
> test-1.domain: State transition S_TRANSITION_ENGINE -> S_POLICY_ENGINE [ 
> input=I_PE_CALC cause=C_IPC_MESSAGE origin=route_message ]
> Mar 30 09:38:20 test-1 tengine: [974]: info: run_graph:graph.c 
> ====================================================
> Mar 30 09:38:20 test-1 crmd: [857]: info: do_state_transition:fsa.c All 1 
> cluster nodes are eligable to run resources.
> Mar 30 09:38:20 test-1 tengine: [974]: notice: run_graph:graph.c Transition 
> 0: (Complete=9, Pending=0, Fired=0, Skipped=5, Incomplete=0)
> Mar 30 09:38:20 test-1 pengine: [975]: info: process_pe_message: [generation] 
> <cib admin_epoch="0" have_quorum="true" num_peers="2" 
> cib_feature_revision="1.3" generated="true" ccm_transition="3" 
> dc_uuid="b3bba1ca-b072-49ac-8e93-a2c6fbf4678e" epoch="5" num_updates="142"/>
> Mar 30 09:38:20 test-1 pengine: [975]: info: unpack_config:unpack.c Default 
> stickiness: 1000
> Mar 30 09:38:20 test-1 pengine: [975]: info: unpack_config:unpack.c Default 
> failure stickiness: -400
> Mar 30 09:38:20 test-1 pengine: [975]: info: unpack_config:unpack.c STONITH 
> of failed nodes is enabled
> Mar 30 09:38:20 test-1 pengine: [975]: info: unpack_config:unpack.c STONITH 
> will reboot nodes
> Mar 30 09:38:20 test-1 pengine: [975]: info: unpack_config:unpack.c Cluster 
> is symmetric - resources can run anywhere by default
> Mar 30 09:38:20 test-1 pengine: [975]: info: unpack_config:unpack.c On loss 
> of CCM Quorum: Stop ALL resources
> Mar 30 09:38:20 test-1 pengine: [975]: info: unpack_config:unpack.c Orphan 
> resources are stopped
> Mar 30 09:38:20 test-1 pengine: [975]: info: unpack_config:unpack.c Orphan 
> resource actions are stopped
> Mar 30 09:38:20 test-1 pengine: [975]: info: unpack_config:unpack.c Stopped 
> resources are removed from the status section: true
> Mar 30 09:38:20 test-1 pengine: [975]: info: unpack_config:unpack.c By 
> default resources are managed
> Mar 30 09:38:20 test-1 pengine: [975]: info: determine_online_status:unpack.c 
> Node test-1.domain is online
> Mar 30 09:38:20 test-1 pengine: [975]: WARN: 
> determine_online_status_fencing:unpack.c Node test-2.domain 
> (d822d67b-5495-47c9-bdb9-f4f66e0bea85) is un-expectedly down
> Mar 30 09:38:20 test-1 pengine: [975]: info: 
> determine_online_status_fencing:unpack.c         ha_state=dead, 
> ccm_state=false, crm_state=online, join_state=down, expected=member
> Mar 30 09:38:20 test-1 pengine: [975]: WARN: determine_online_status:unpack.c 
> Node test-2.domain is unclean
> Mar 30 09:38:20 test-1 pengine: [975]: ERROR: native_add_running:native.c 
> Resource stonith::external/drac4:test-1_DRAC appears to be active on 2 nodes.
> Mar 30 09:38:20 test-1 pengine: [975]: ERROR: See 
> http://linux-ha.org/v2/faq/resource_too_active for more information.
> Mar 30 09:38:20 test-1 pengine: [975]: info: Resource Group: test_group
> Mar 30 09:38:20 test-1 pengine: [975]: info:     test_IP      
> (heartbeat::ocf:IPaddr):        Started test-2.domain
> Mar 30 09:38:20 test-1 pengine: [975]: info:     httpd        
> (heartbeat::ocf:apache):        Started test-2.domain
> Mar 30 09:38:20 test-1 pengine: [975]: info: test-1_DRAC      
> (stonith:external/drac4)
> Mar 30 09:38:20 test-1 pengine: [975]: info:  0 : test-1.domain
> Mar 30 09:38:20 test-1 pengine: [975]: info:  1 : test-2.domain
> Mar 30 09:38:20 test-1 pengine: [975]: info: test-2_DRAC      
> (stonith:external/drac4):       Stopped 
> Mar 30 09:38:20 test-1 pengine: [975]: notice: native_create_probe:native.c 
> test-1.domain: Created probe for test-2_DRAC
> Mar 30 09:38:20 test-1 pengine: [975]: notice: NoRoleChange:native.c Move  
> resource test_IP   (test-2.domain -> test-1.domain)
> Mar 30 09:38:20 test-1 pengine: [975]: WARN: custom_action:utils.c Action 
> test_IP_stop_0 on test-2.domain is unrunnable (offline)
> Mar 30 09:38:20 test-1 pengine: [975]: WARN: custom_action:utils.c Marking 
> node test-2.domain unclean
> Mar 30 09:38:20 test-1 pengine: [975]: notice: Recurring:native.c 
> test-1.domain          test_IP_monitor_10000
> Mar 30 09:38:20 test-1 pengine: [975]: notice: NoRoleChange:native.c Move  
> resource httpd     (test-2.domain -> test-1.domain)
> Mar 30 09:38:20 test-1 pengine: [975]: WARN: custom_action:utils.c Action 
> httpd_stop_0 on test-2.domain is unrunnable (offline)
> Mar 30 09:38:20 test-1 pengine: [975]: WARN: custom_action:utils.c Marking 
> node test-2.domain unclean
> Mar 30 09:38:20 test-1 pengine: [975]: notice: Recurring:native.c 
> test-1.domain          httpd_monitor_10000
> Mar 30 09:38:20 test-1 pengine: [975]: ERROR: native_create_actions:native.c 
> Attempting recovery of resource test-1_DRAC
> Mar 30 09:38:20 test-1 pengine: [975]: notice: StopRsc:native.c   
> test-1.domain       Stop test-1_DRAC
> Mar 30 09:38:20 test-1 pengine: [975]: notice: DeleteRsc:native.c Removing 
> test-1_DRAC from test-1.domain
> Mar 30 09:38:20 test-1 pengine: [975]: notice: StopRsc:native.c   
> test-2.domain       Stop test-1_DRAC
> Mar 30 09:38:20 test-1 pengine: [975]: WARN: custom_action:utils.c Action 
> test-1_DRAC_stop_0 on test-2.domain is unrunnable (offline)
> Mar 30 09:38:20 test-1 pengine: [975]: WARN: custom_action:utils.c Marking 
> node test-2.domain unclean
> Mar 30 09:38:20 test-1 pengine: [975]: notice: StartRsc:native.c  
> test-1.domain       Start test-1_DRAC
> Mar 30 09:38:20 test-1 pengine: [975]: WARN: stage6:allocate.c Scheduling 
> Node test-2.domain for STONITH
> Mar 30 09:38:20 test-1 pengine: [975]: WARN: native_stop_constraints:native.c 
> Stop of failed resource test_IP is implict after test-2.domain is fenced
> Mar 30 09:38:20 test-1 pengine: [975]: info: native_stop_constraints:native.c 
> Re-creating actions for test_group
> Mar 30 09:38:20 test-1 pengine: [975]: notice: NoRoleChange:native.c Move  
> resource test_IP   (test-2.domain -> test-1.domain)
> Mar 30 09:38:20 test-1 pengine: [975]: WARN: custom_action:utils.c Action 
> test_IP_stop_0 on test-2.domain is unrunnable (offline)
> Mar 30 09:38:20 test-1 pengine: [975]: WARN: custom_action:utils.c Marking 
> node test-2.domain unclean
> Mar 30 09:38:20 test-1 pengine: [975]: notice: Recurring:native.c 
> test-1.domain          test_IP_monitor_10000
> Mar 30 09:38:20 test-1 pengine: [975]: notice: NoRoleChange:native.c Move  
> resource httpd     (test-2.domain -> test-1.domain)
> Mar 30 09:38:20 test-1 pengine: [975]: WARN: custom_action:utils.c Action 
> httpd_stop_0 on test-2.domain is unrunnable (offline)
> Mar 30 09:38:20 test-1 pengine: [975]: WARN: custom_action:utils.c Marking 
> node test-2.domain unclean
> Mar 30 09:38:20 test-1 pengine: [975]: notice: Recurring:native.c 
> test-1.domain          httpd_monitor_10000
> Mar 30 09:38:20 test-1 pengine: [975]: WARN: native_stop_constraints:native.c 
> Stop of failed resource httpd is implict after test-2.domain is fenced
> Mar 30 09:38:20 test-1 pengine: [975]: info: native_stop_constraints:native.c 
> Re-creating actions for test_group
> Mar 30 09:38:20 test-1 pengine: [975]: notice: NoRoleChange:native.c Move  
> resource test_IP   (test-2.domain -> test-1.domain)
> Mar 30 09:38:20 test-1 pengine: [975]: WARN: custom_action:utils.c Action 
> test_IP_stop_0 on test-2.domain is unrunnable (offline)
> Mar 30 09:38:20 test-1 pengine: [975]: WARN: custom_action:utils.c Marking 
> node test-2.domain unclean
> Mar 30 09:38:20 test-1 pengine: [975]: notice: Recurring:native.c 
> test-1.domain          test_IP_monitor_10000
> Mar 30 09:38:20 test-1 pengine: [975]: notice: NoRoleChange:native.c Move  
> resource httpd     (test-2.domain -> test-1.domain)
> Mar 30 09:38:20 test-1 pengine: [975]: WARN: custom_action:utils.c Action 
> httpd_stop_0 on test-2.domain is unrunnable (offline)
> Mar 30 09:38:20 test-1 pengine: [975]: WARN: custom_action:utils.c Marking 
> node test-2.domain unclean
> Mar 30 09:38:20 test-1 pengine: [975]: notice: Recurring:native.c 
> test-1.domain          httpd_monitor_10000
> Mar 30 09:38:20 test-1 pengine: [975]: WARN: native_stop_constraints:native.c 
> Stop of failed resource test-1_DRAC is implict after test-2.domain is fenced
> Mar 30 09:38:20 test-1 pengine: [975]: notice: stage8:allocate.c Created 
> transition graph 1.
> Mar 30 09:38:20 test-1 pengine: [975]: WARN: process_pe_message:pengine.c No 
> value specified for cluster preference: pe-error-series-max
> Mar 30 09:38:20 test-1 pengine: [975]: ERROR: process_pe_message:pengine.c 
> Transition 1: ERRORs found during PE processing. PEngine Input stored in: 
> /var/lib/heartbeat/pengine/pe-error-128.bz2
> Mar 30 09:38:20 test-1 pengine: [975]: info: process_pe_message:pengine.c 
> Configuration WARNINGs found during PE processing.  Please run "crm_verify 
> -L" to identify issues.
> Mar 30 09:38:20 test-1 crmd: [857]: info: do_state_transition:fsa.c 
> test-1.domain: State transition S_POLICY_ENGINE -> S_TRANSITION_ENGINE [ 
> input=I_PE_SUCCESS cause=C_IPC_MESSAGE origin=route_message ]
> Mar 30 09:38:20 test-1 tengine: [974]: info: unpack_graph:unpack.c Unpacked 
> transition 1: 17 actions in 17 synapses
> Mar 30 09:38:21 test-1 tengine: [974]: info: te_pseudo_action:actions.c 
> Pseudo action 10 confirmed
> Mar 30 09:38:21 test-1 tengine: [974]: info: te_pseudo_action:actions.c 
> Pseudo action 7 confirmed
> Mar 30 09:38:21 test-1 tengine: [974]: info: send_rsc_command:actions.c 
> Initiating action 14: test-1_DRAC_stop_0 on test-1.domain
> Mar 30 09:38:21 test-1 tengine: [974]: info: te_pseudo_action:actions.c 
> Pseudo action 17 confirmed
> Mar 30 09:38:21 test-1 tengine: [974]: info: send_rsc_command:actions.c 
> Initiating action 3: test-2_DRAC_monitor_0 on test-1.domain
> Mar 30 09:38:21 test-1 crmd: [857]: info: do_lrm_rsc_op:lrm.c Performing op 
> stop on test-1_DRAC (interval=0ms, key=1:7ec7d2e0-ae10-4810-a67a-73119ab6855f)
> Mar 30 09:38:21 test-1 tengine: [974]: info: te_pseudo_action:actions.c 
> Pseudo action 4 confirmed
> Mar 30 09:38:21 test-1 lrmd: [989]: info: Try to stop STONITH resource 
> <rsc_id=test-1_DRAC> : Device=external/drac4
> Mar 30 09:38:21 test-1 crmd: [857]: info: do_lrm_rsc_op:lrm.c Performing op 
> monitor on test-2_DRAC (interval=0ms, 
> key=1:7ec7d2e0-ae10-4810-a67a-73119ab6855f)
> Mar 30 09:38:21 test-1 crmd: [857]: info: process_lrm_event:lrm.c LRM 
> operation (9) stop_0 on test-1_DRAC complete
> Mar 30 09:38:21 test-1 crmd: [857]: info: process_lrm_event:lrm.c LRM 
> operation (10) monitor_0 on test-2_DRAC Error: (7) not running
> Mar 30 09:38:21 test-1 cib: [853]: info: activateCibXml:io.c CIB size is 
> 147180 bytes (was 144888)
> Mar 30 09:38:21 test-1 cib: [853]: info: cib_diff_notify:notify.c Update 
> (client: 857, call:40): 0.5.142 -> 0.5.143 (ok)
> Mar 30 09:38:21 test-1 tengine: [974]: info: te_update_diff:callbacks.c 
> Processing diff (cib_update): 0.5.142 -> 0.5.143
> Mar 30 09:38:21 test-1 tengine: [974]: info: match_graph_event:events.c 
> Action test-1_DRAC_stop_0 (14) confirmed
> Mar 30 09:38:21 test-1 tengine: [974]: info: send_rsc_command:actions.c 
> Initiating action 15: test-1_DRAC_delete_0 on test-1.domain
> Mar 30 09:38:21 test-1 crmd: [857]: info: do_lrm_invoke:lrm.c Removing 
> resource test-1_DRAC from the LRM
> Mar 30 09:38:21 test-1 crmd: [857]: info: send_direct_ack:lrm.c ACK'ing 
> resource op: delete for test-1_DRAC
> Mar 30 09:38:21 test-1 tengine: [974]: info: match_graph_event:events.c 
> Action test-1_DRAC_delete_0 (15) confirmed
> Mar 30 09:38:21 test-1 tengine: [974]: info: send_rsc_command:actions.c 
> Initiating action 18: test-1_DRAC_start_0 on test-1.domain
> Mar 30 09:38:21 test-1 tengine: [974]: info: te_crm_command:actions.c 
> Executing crm-event (16): lrm_refresh on test-1.domain
> Mar 30 09:38:21 test-1 tengine: [974]: info: te_crm_command:actions.c 
> Skipping wait for 16
> Mar 30 09:38:21 test-1 crmd: [857]: info: do_lrm_rsc_op:lrm.c Performing op 
> start on test-1_DRAC (interval=0ms, 
> key=1:7ec7d2e0-ae10-4810-a67a-73119ab6855f)
> Mar 30 09:38:21 test-1 lrmd: [993]: info: Try to start STONITH resource 
> <rsc_id=test-1_DRAC> : Device=external/drac4
> Mar 30 09:38:21 test-1 crmd: [857]: WARN: msg_to_op(1151): failed to get the 
> value of field lrm_opstatus from a ha_msg
> Mar 30 09:38:21 test-1 crmd: [857]: info: msg_to_op: Message follows:
> Mar 30 09:38:21 test-1 crmd: [857]: info: MSG: Dumping message with 13 fields
> Mar 30 09:38:21 test-1 crmd: [857]: info: MSG[0] : [lrm_t=op]
> Mar 30 09:38:21 test-1 crmd: [857]: info: MSG[1] : [lrm_rid=test-1_DRAC]
> Mar 30 09:38:21 test-1 crmd: [857]: info: MSG[2] : [lrm_op=start]
> Mar 30 09:38:21 test-1 crmd: [857]: info: MSG[3] : [lrm_timeout=30000]
> Mar 30 09:38:21 test-1 crmd: [857]: info: MSG[4] : [lrm_interval=0]
> Mar 30 09:38:21 test-1 crmd: [857]: info: MSG[5] : [lrm_delay=0]
> Mar 30 09:38:21 test-1 crmd: [857]: info: MSG[6] : [lrm_targetrc=-1]
> Mar 30 09:38:21 test-1 crmd: [857]: info: MSG[7] : [lrm_app=crmd]
> Mar 30 09:38:21 test-1 crmd: [857]: info: MSG[8] : 
> [lrm_userdata=1:7ec7d2e0-ae10-4810-a67a-73119ab6855f]
> Mar 30 09:38:21 test-1 crmd: [857]: info: MSG[9] : 
> [(2)lrm_param=0x90afd20(140 168)]
> Mar 30 09:38:21 test-1 crmd: [857]: info: MSG: Dumping message with 5 fields
> Mar 30 09:38:21 test-1 crmd: [857]: info: MSG[0] : 
> [DRAC_ADDR=test-1.drac.domain]
> Mar 30 09:38:21 test-1 crmd: [857]: info: MSG[1] : [DRAC_LOGIN=root]
> Mar 30 09:38:21 test-1 crmd: [857]: info: MSG[2] : [DRAC_PASSWD=********]
> Mar 30 09:38:21 test-1 crmd: [857]: info: MSG[3] : [CRM_meta_timeout=30000]
> Mar 30 09:38:21 test-1 crmd: [857]: info: MSG[4] : [crm_feature_set=1.0.6]
> Mar 30 09:38:21 test-1 crmd: [857]: info: MSG[10] : [lrm_callid=11]
> Mar 30 09:38:21 test-1 crmd: [857]: info: MSG[11] : [lrm_app=crmd]
> Mar 30 09:38:21 test-1 crmd: [857]: info: MSG[12] : [lrm_callid=11]
> Mar 30 09:38:21 test-1 crmd: [857]: info: do_lrm_invoke:lrm.c Forcing a local 
> LRM refresh
> Mar 30 09:38:21 test-1 cib: [853]: info: activateCibXml:io.c CIB size is 
> 150940 bytes (was 147180)
> Mar 30 09:38:21 test-1 cib: [853]: info: cib_diff_notify:notify.c Update 
> (client: 857, call:41): 0.5.143 -> 0.5.144 (ok)
> Mar 30 09:38:21 test-1 tengine: [974]: info: te_update_diff:callbacks.c 
> Processing diff (cib_update): 0.5.143 -> 0.5.144
> Mar 30 09:38:21 test-1 tengine: [974]: info: match_graph_event:events.c 
> Action test-2_DRAC_monitor_0 (3) confirmed
> Mar 30 09:38:21 test-1 tengine: [974]: info: send_rsc_command:actions.c 
> Initiating action 2: probe_complete on test-1.domain
> Mar 30 09:38:21 test-1 tengine: [974]: info: te_pseudo_action:actions.c 
> Pseudo action 1 confirmed
> Mar 30 09:38:21 test-1 cib: [853]: info: activateCibXml:io.c CIB size is 
> 146356 bytes (was 150940)
> Mar 30 09:38:21 test-1 cib: [853]: info: cib_diff_notify:notify.c Update 
> (client: 857, call:42): 0.5.144 -> 0.5.145 (ok)
> Mar 30 09:38:21 test-1 tengine: [974]: info: te_update_diff:callbacks.c 
> Processing diff (cib_update): 0.5.144 -> 0.5.145
> Mar 30 09:38:21 test-1 cib: [853]: info: cib_diff_notify:notify.c Update 
> (client: 857, call:43): 0.5.145 -> 0.5.146 (ok)
> Mar 30 09:38:21 test-1 tengine: [974]: info: te_update_diff:callbacks.c 
> Processing diff (cib_update): 0.5.145 -> 0.5.146
> Mar 30 09:38:21 test-1 cib: [995]: info: write_cib_contents:io.c Wrote 
> version 0.5.146 of the CIB to disk (digest: 53f6e570e78f4cf92213e89be4ae0573)
> Mar 30 09:38:24 test-1 stonithd: [855]: WARN: G_SIG_dispatch: Dispatch 
> function for SIGCHLD took too long to execute: 350 ms (> 10 ms) (GSource: 
> 0x929b820)
> Mar 30 09:38:24 test-1 crmd: [857]: info: process_lrm_event:lrm.c LRM 
> operation (11) start_0 on test-1_DRAC complete
> Mar 30 09:38:24 test-1 cib: [853]: info: cib_diff_notify:notify.c Update 
> (client: 857, call:44): 0.5.146 -> 0.5.147 (ok)
> Mar 30 09:38:24 test-1 tengine: [974]: info: te_update_diff:callbacks.c 
> Processing diff (cib_update): 0.5.146 -> 0.5.147
> Mar 30 09:38:24 test-1 tengine: [974]: info: match_graph_event:events.c 
> Action test-1_DRAC_start_0 (18) confirmed
> Mar 30 09:38:24 test-1 tengine: [974]: info: te_fence_node:actions.c 
> Executing reboot fencing operation (19) on test-2.domain (timeout=15000)
> Mar 30 09:38:24 test-1 stonithd: [855]: info: client tengine [pid: 974] want 
> a STONITH operation RESET to node test-2.domain.
> Mar 30 09:38:24 test-1 stonithd: [855]: info: Broadcasting the message 
> succeeded: require others to stonith node test-2.domain.
> Mar 30 09:38:24 test-1 cib: [999]: info: write_cib_contents:io.c Wrote 
> version 0.5.147 of the CIB to disk (digest: 7978776db187ec2d2397921646f4053b)
> Mar 30 09:38:39 test-1 stonithd: [855]: info: Failed to STONITH the node 
> test-2.domain: optype=1, op_result=2
> Mar 30 09:38:39 test-1 tengine: [974]: info: 
> tengine_stonith_callback:callbacks.c call=-3, optype=1, 
> node_name=test-2.domain, result=2, node_list=, 
> action=19;1:7ec7d2e0-ae10-4810-a67a-73119ab6855f
> Mar 30 09:38:39 test-1 tengine: [974]: ERROR: 
> tengine_stonith_callback:callbacks.c Stonith of test-2.domain failed (2)... 
> aborting transition.
> Mar 30 09:38:39 test-1 tengine: [974]: info: update_abort_priority:utils.c 
> Abort priority upgraded to 1000000
> Mar 30 09:38:39 test-1 tengine: [974]: info: update_abort_priority:utils.c 
> Abort action 0 superceeded by 2
> Mar 30 09:38:39 test-1 crmd: [857]: info: do_state_transition:fsa.c 
> test-1.domain: State transition S_TRANSITION_ENGINE -> S_POLICY_ENGINE [ 
> input=I_PE_CALC cause=C_IPC_MESSAGE origin=route_message ]
> Mar 30 09:38:39 test-1 tengine: [974]: info: run_graph:graph.c 
> ====================================================
>  <cib admin_epoch="0" epoch="0" num_updates="0">
>    <configuration>
>      <crm_config>
>        <cluster_property_set id="cib-bootstrap-options">
>          <attributes>
>            <nvpair id="cib-bootstrap-options-transition_idle_timeout" 
> name="transition_idle_timeout" value="5min"/>
>            <nvpair id="cib-bootstrap-options-stonith_enabled" 
> name="stonith_enabled" value="true"/>
>            <nvpair id="cib-bootstrap-options-default_resource_stickiness" 
> name="default_resource_stickiness" value="1000"/>
>            <nvpair id="cib-bootstrap-options-short_resource_names" 
> name="short_resource_names" value="true"/>
>            <nvpair 
> id="cib-bootstrap-options-default_resource_failure_stickiness" 
> name="default_resource_failure_stickiness" value="-400"/>
>            <nvpair id="cib-bootstrap-options-stonith_action" 
> name="stonith_action" value="reboot"/>
>            <nvpair id="cib-bootstrap-options-remove_after_stop" 
> name="remove_after_stop" value="true"/>
>            <nvpair id="cib-bootstrap-options-default_action_timeout" 
> name="default_action_timeout" value="30s"/>
>          <nvpair id="cib-bootstrap-options-symmetric_cluster" 
> name="symmetric_cluster" value="true"/>
>            <nvpair id="cib-bootstrap-options-no_quorum_policy" 
> name="no_quorum_policy" value="stop"/>
>            <nvpair id="cib-bootstrap-options-stop_orphan_resources" 
> name="stop_orphan_resources" value="true"/>
>            <nvpair id="cib-bootstrap-options-stop_orphan_actions" 
> name="stop_orphan_actions" value="true"/>
>            <nvpair id="cib-bootstrap-options-is_managed_default" 
> name="is_managed_default" value="true"/>
>          </attributes>
>        </cluster_property_set>
>      </crm_config>
>      <nodes/>
>      <resources>
>        <group id="test_group">
>          <primitive class="ocf" id="test_IP" provider="heartbeat" 
> type="IPaddr">
>            <operations>
>              <op id="test_IP_mon" interval="10s" name="monitor" timeout="9s" 
> on_fail="restart"/>
>              <op id="test_IP_start" name="start" timeout="20s" 
> on_fail="restart" prereq="fencing"/>
>              <op id="test_IP_stop" name="stop" timeout="20s" 
> on_fail="restart"/>
>            </operations>
>            <instance_attributes id="test_IP_inst_attr">
>              <attributes>
>                <nvpair id="test_IP_attr_0" name="ip" value="192.168.168.168"/>
>                <nvpair id="test_IP_attr_1" name="netmask" value="25"/>
>              </attributes>
>            </instance_attributes>
>          </primitive>
>          <primitive class="ocf" id="httpd" provider="heartbeat" type="apache">
>            <operations>
>              <op id="httpd_mon" name="monitor" interval="10s" timeout="9s" 
> on_fail="restart"/>
>              <op id="httpd_start" name="start" timeout="20s" 
> on_fail="restart" prereq="fencing"/>
>              <op id="httpd_stop" name="stop" timeout="20s" on_fail="restart"/>
>            </operations>
>          </primitive>
>        </group>
>          <primitive id="test-1_DRAC" class="stonith" type="external/drac4" 
> provider="heartbeat">
>            <operations>
>              <op id="test-1_DRAC_reset" name="reset" timeout="3m" 
> prereq="nothing"/>
>            </operations>
>            <instance_attributes id="test-1_DRAC_inst_attr">
>              <attributes>
>                <nvpair id="test-1_DRAC_attr_0" name="DRAC_ADDR" 
> value="test-1.drac.domain"/>
>                <nvpair id="test-1_DRAC_attr_1" name="DRAC_LOGIN" 
> value="root"/>
>                <nvpair id="test-1_DRAC_attr_2" name="DRAC_PASSWD" 
> value="********"/>
>              </attributes>
>            </instance_attributes>
>          </primitive>
>          <primitive id="test-2_DRAC" class="stonith" type="external/drac4" 
> provider="heartbeat">
>            <operations>
>              <op id="test-2_DRAC_reset" name="reset" timeout="3m" 
> prereq="nothing"/>
>            </operations>
>            <instance_attributes id="test-2_DRAC_inst_attr">
>              <attributes>
>                <nvpair id="test-2_DRAC_attr_0" name="DRAC_ADDR" 
> value="test-2.drac.domain"/>
>                <nvpair id="test-2_DRAC_attr_1" name="DRAC_LOGIN" 
> value="root"/>
>                <nvpair id="test-2_DRAC_attr_2" name="DRAC_PASSWD" 
> value="********"/>
>              </attributes>
>            </instance_attributes>
>          </primitive>
>      </resources>
>      <constraints>
>        <rsc_location id="test_group_location_test-1" rsc="test_group">
>          <rule id="prefered_location_test_group_test-1" score="1000">
>            <expression attribute="#uname" 
> id="prefered_location_test_group_test-1_expr_1" operation="eq" 
> value="test-1.domain"/>
>          </rule>
>        </rsc_location>
>        <rsc_location id="test_group_location_test-2" rsc="test_group">
>          <rule id="prefered_location_test_group_test-2" score="1000">
>            <expression attribute="#uname" 
> id="prefered_location_test_group_test-2_expr_1" operation="eq" 
> value="test-2.domain"/>
>          </rule>
>        </rsc_location>
>        <rsc_location id="test-1_DRAC_location" rsc="test-1_DRAC">
>          <rule id="prefered_location_test-1_DRAC" score="INFINITY">
>            <expression attribute="#uname" 
> id="prefered_location_test-1_DRAC_expr_1" operation="eq" 
> value="test-2.domain"/>
>          </rule>
>        </rsc_location>
>        <rsc_location id="test-2_DRAC_location" rsc="test-2_DRAC">
>          <rule id="prefered_location_test-2_DRAC" score="INFINITY">
>            <expression attribute="#uname" 
> id="prefered_location_test-2_DRAC_expr_1" operation="eq" 
> value="test-1.domain"/>
>          </rule>
>        </rsc_location>
>        <rsc_order id="Ord_websrv" from="httpd" type="after" to="test_IP"/>
>        <rsc_colocation id="Colo_not_same_DRAC" from="test-1_DRAC" 
> to="test-2_DRAC" score="-INFINITY"/>
>      </constraints>
>    </configuration>
>    <status/>
>  </cib>



-- 
Dejan
_______________________________________________
Linux-HA mailing list
Linux-HA@lists.linux-ha.org
http://lists.linux-ha.org/mailman/listinfo/linux-ha
See also: http://linux-ha.org/ReportingProblems
