Any ideas?
--BO

On 4/4/07, Bjorn Oglefjorn <[EMAIL PROTECTED]> wrote:

I don't know what op_result=2 means either. I can only say that the drac4 RA
never exits with code 2, and that it works as expected in every use case I've
tried, including when called via the stonith command from the command line.

*Timeouts and Intervals*: I've increased these per your recommendations;
the sketch below shows the shape of what I set. Please let me know if they
look better now.
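For reference, the ops now carry more generous values along these lines
(illustrative numbers only, not a verbatim paste from the new CIB):

  <op id="test_IP_mon" name="monitor" interval="30s" timeout="60s" on_fail="restart"/>
  <op id="test_IP_start" name="start" timeout="120s" on_fail="restart" prereq="fencing"/>
  <op id="test_IP_stop" name="stop" timeout="120s" on_fail="restart"/>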
*Constraints*: That definitely makes more sense, and I've updated the CIB to
reflect it.

As for the "cleaned up logs": the attached logs are complete and include
debug output. The only "cleaning" performed was to redact sensitive company
information, which is mandated by policy, so I have no choice there.

Attached you will find ha.cf, the heartbeat logs (with debug) from both
nodes, the output of cibadmin -Ql at several points of execution, and the
debug output from the drac4 RA.

I hope this is enough to go on.  Thanks for the continued help.
--BO

On 4/4/07, Dejan Muhamedagic <[EMAIL PROTECTED]> wrote:
>
> On Tue, Apr 03, 2007 at 03:52:37PM -0400, Bjorn Oglefjorn wrote:
> > Sorry Alan, I realize that this post is getting quite long.  Here is a
> > rundown of where I am currently.
> >
> > STONITH is failing and I'm still not sure why.
>
> Me neither. There's nothing in the logs apart from:
>
> Mar 30 09:38:20 test-1 stonithd: [855]: info: Failed to STONITH the node test-2.domain: optype=1, op_result=2
>
> What does op_result=2 mean?
>
> I think I've already suggested that you add some debug output to your
> stonith RA (you implemented it yourself, right?).
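> Since an external stonith plugin is just an executable that gets the
> operation name in $1 and its parameters in the environment, something as
> crude as this near the top of the script would do (a sketch; the log path
> is arbitrary):
>
>   # trace every invocation of the plugin
>   DEBUGLOG=/var/tmp/drac4-debug.log
>   echo "`date`: op=$1 args=$* DRAC_ADDR=$DRAC_ADDR" >> $DEBUGLOG
>   exec 2>>$DEBUGLOG
>   set -x
>
> Then you can see which operation is invoked and where it dies.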
>
> This is also no good:
>
> Mar 30 09:38:20 test-1 pengine: [975]: ERROR: native_add_running:
> native.c Resource stonith::external/drac4:test-1_DRAC appears to be
> active on 2 nodes.
>
> You sure that drac4 always returns good status?
>
> Timeouts: They are really way too short. I think there was already a
> discussion on the list about that. Short timeouts will normally work,
> but sooner or later they won't, and then you'll wonder what the heck
> went wrong.
>
> Intervals: Too frequent for my taste, as I think I already said, but
> hey, it's your cluster :)
>
> Constraints: I'm not sure you want to allow test-1_DRAC to run on
> test-1 at all. Setting the score to INFINITY won't help if the preferred
> node is down. It would be better, and more logical, to go the other way
> around: set -INFINITY for the nodes where it should not run.
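> Something along these lines (untested sketch, adjust the ids to taste):
>
>   <rsc_location id="test-1_DRAC_not_on_test-1" rsc="test-1_DRAC">
>     <rule id="test-1_DRAC_not_on_test-1_rule" score="-INFINITY">
>       <expression attribute="#uname" id="test-1_DRAC_not_on_test-1_expr" operation="eq" value="test-1.domain"/>
>     </rule>
>   </rsc_location>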
>
> And please go and read again
> http://linux-ha.org/ReportingProblems
> It says, in bold type:
>
>     Please don't send "cleaned up" logs.
>
> And there's no running CIB in what you sent, either.
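> By the running CIB I mean what cibadmin prints on a live node, status
> section included, e.g.:
>
>   cibadmin -Q > /tmp/running-cib.xml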
>
> > I'm quite sure the plugin is working properly and my cib.xml seems sane
> > enough.  I'm having a hard time digesting the logs and understanding where
> > the trouble is coming from.  Attached again are my cib.xml and the logs
> > from when heartbeat noticed test-2 was dead, all the way up to STONITH
> > failing for the second time... it loops and tries to STONITH the dead node
> > indefinitely.  It almost seems as if heartbeat is 'restarting', but like I
> > said, I'm having a hard time getting useful information from the logs.
> >
> > Sorry if I'm being a pest, but I really don't like not figuring stuff out.
> > Let me know if you need more info.
> > --BO
> >
> > On 4/3/07, Alan Robertson <[EMAIL PROTECTED]> wrote:
> > >
> > >Bjorn Oglefjorn wrote:
> > >> Anyone? Help?
> > >> --BO
> > >>
> > >> On 4/2/07, Bjorn Oglefjorn <[EMAIL PROTECTED]> wrote:
> > >>>
> > >>> Any ideas as to what's going wrong here?
> > >
> > >There is so much send/reply/try/fail/fix stuff in the email that I had
> > >trouble following what was going on.
> > >
> > >Could you try reposting this cleanly and explain what symptoms you're
> > >seeing?  I just saw "it doesn't work", and that's not very helpful.
> > >
> > >> See also: http://linux-ha.org/ReportingProblems
> > >
> > >
> > >--
> > >    Alan Robertson <[EMAIL PROTECTED]>
> > >
> > >"Openness is the foundation and preservative of friendship...  Let me
> > >claim from you at all times your undisguised opinions." - William
> > >Wilberforce
>
> > Mar 30 09:37:56 test-1 heartbeat: [836]: WARN: node test-2.domain: is
> dead
> > Mar 30 09:37:56 test-1 heartbeat: [836]: info: Link test-2.domain:eth0 dead.
> > Mar 30 09:37:56 test-1 crmd: [857]: notice: crmd_ha_status_callback:
> callbacks.c Status update: Node test-2.domain now has status [dead]
> > Mar 30 09:37:56 test-1 crmd: [857]: info: mem_handle_event: Got an
> event OC_EV_MS_NOT_PRIMARY from ccm
> > Mar 30 09:37:56 test-1 crmd: [857]: info: mem_handle_event:
> instance=2, nodes=2, new=2, lost=0, n_idx=0, new_idx=0, old_idx=4
> > Mar 30 09:37:56 test-1 crmd: [857]: info: crmd_ccm_msg_callback:
> callbacks.c Quorum lost after event=NOT PRIMARY (id=2)
> > Mar 30 09:37:56 test-1 cib: [853]: info: activateCibXml:io.c CIB size
> is 148544 bytes (was 153264)
> > Mar 30 09:37:56 test-1 cib: [853]: info: cib_diff_notify:notify.c Local-only Change (client:857, call: 24): 0.4.134 (ok)
> > Mar 30 09:37:56 test-1 cib: [853]: info: mem_handle_event: Got an
> event OC_EV_MS_NOT_PRIMARY from ccm
> > Mar 30 09:37:56 test-1 cib: [853]: info: mem_handle_event: instance=2,
> nodes=2, new=2, lost=0, n_idx=0, new_idx=0, old_idx=4
> > Mar 30 09:37:56 test-1 cib: [973]: info: write_cib_contents:io.c Wrote
> version 0.4.134 of the CIB to disk (digest:
> 9c22a56909f0417ecc1f3af561f3523a)
> > Mar 30 09:38:01 test-1 cib: [853]: info: mem_handle_event: Got an
> event OC_EV_MS_INVALID from ccm
> > Mar 30 09:38:01 test-1 crmd: [857]: info: mem_handle_event: Got an
> event OC_EV_MS_INVALID from ccm
> > Mar 30 09:38:01 test-1 cib: [853]: info: mem_handle_event: no
> mbr_track info
> > Mar 30 09:38:01 test-1 crmd: [857]: info: mem_handle_event: no
> mbr_track info
> > Mar 30 09:38:01 test-1 cib: [853]: info: mem_handle_event: Got an
> event OC_EV_MS_NEW_MEMBERSHIP from ccm
> > Mar 30 09:38:01 test-1 crmd: [857]: info: mem_handle_event: Got an
> event OC_EV_MS_NEW_MEMBERSHIP from ccm
> > Mar 30 09:38:01 test-1 cib: [853]: info: mem_handle_event: instance=3,
> nodes=1, new=0, lost=1, n_idx=0, new_idx=1, old_idx=3
> > Mar 30 09:38:01 test-1 crmd: [857]: info: mem_handle_event:
> instance=3, nodes=1, new=0, lost=1, n_idx=0, new_idx=1, old_idx=3
> > Mar 30 09:38:01 test-1 crmd: [857]: info: crmd_ccm_msg_callback:
> callbacks.c Quorum (re)attained after event=NEW MEMBERSHIP (id=3)
> > Mar 30 09:38:01 test-1 cib: [853]: info: cib_ccm_msg_callback:
> callbacks.c LOST: test-2.domain
> > Mar 30 09:38:01 test-1 crmd: [857]: WARN: check_dead_member:ccm.c Our
> DC node (test-2.domain) left the cluster
> > Mar 30 09:38:01 test-1 cib: [853]: info: cib_ccm_msg_callback:
> callbacks.c PEER: test-1.domain
> > Mar 30 09:38:01 test-1 crmd: [857]: info: ccm_event_detail:ccm.c NEW
> MEMBERSHIP: trans=3, nodes=1, new=0, lost=1 n_idx=0, new_idx=1, old_idx=3
> > Mar 30 09:38:01 test-1 ccm: [852]: info: Break tie for 2 nodes cluster
> > Mar 30 09:38:01 test-1 crmd: [857]: info: ccm_event_detail:ccm.c CURRENT: test-1.domain [nodeid=0, born=3]
> > Mar 30 09:38:01 test-1 crmd: [857]: info: ccm_event_detail:ccm.c LOST: test-2.domain [nodeid=1, born=1]
> > Mar 30 09:38:01 test-1 crmd: [857]: info: do_state_transition:fsa.c
> test-1.domain: State transition S_NOT_DC -> S_ELECTION [
> input=I_ELECTION cause=C_FSA_INTERNAL origin=check_dead_member ]
> > Mar 30 09:38:01 test-1 crmd: [857]: info: update_dc:utils.c Set DC to
> <null> (<null>)
> > Mar 30 09:38:01 test-1 crmd: [857]: info: do_election_count_vote:
> election.c Updated voted hash for test-1.domain to vote
> > Mar 30 09:38:01 test-1 crmd: [857]: info: do_election_count_vote:
> election.c Election ignore: our vote (test-1.domain)
> > Mar 30 09:38:01 test-1 crmd: [857]: info: do_state_transition:fsa.c test-1.domain: State transition S_ELECTION -> S_INTEGRATION [ input=I_ELECTION_DC cause=C_FSA_INTERNAL origin=do_election_check ]
> > Mar 30 09:38:01 test-1 crmd: [857]: info: start_subsystem:subsystems.c Starting sub-system "tengine"
> > Mar 30 09:38:01 test-1 crmd: [857]: info: start_subsystem:subsystems.c Starting sub-system "pengine"
> > Mar 30 09:38:01 test-1 tengine: [974]: info: G_main_add_SignalHandler:
> Added signal handler for signal 15
> > Mar 30 09:38:01 test-1 tengine: [974]: info:
> G_main_add_TriggerHandler: Added signal manual handler
> > Mar 30 09:38:01 test-1 crmd: [857]: info: do_dc_takeover:election.c Taking over DC status for this partition
> > Mar 30 09:38:01 test-1 crmd: [857]: info: update_dc:utils.c Set DC to
> <null> (<null>)
> > Mar 30 09:38:01 test-1 crmd: [857]: info:
> do_dc_join_offer_all:join_dc.c join-1: Waiting on 1 outstanding join acks
> > Mar 30 09:38:01 test-1 pengine: [975]: info: G_main_add_SignalHandler:
> Added signal handler for signal 15
> > Mar 30 09:38:01 test-1 pengine: [975]: info: init_start:main.c Starting pengine
> > Mar 30 09:38:01 test-1 cib: [853]: info: activateCibXml:io.c CIB size is 148648 bytes (was 148544)
> > Mar 30 09:38:01 test-1 cib: [853]: info: cib_diff_notify:notify.c Local-only Change (client:857, call: 25): 0.4.134 (ok)
> > Mar 30 09:38:01 test-1 cib: [853]: info: cib_process_readwrite:
> messages.c We are now in R/W mode
> > Mar 30 09:38:01 test-1 cib: [853]: info: cib_diff_notify:notify.c Update (client: 857, call:28): 0.4.134 -> 0.4.135 (ok)
> > Mar 30 09:38:01 test-1 cib: [853]: info: cib_null_callback:
> callbacks.c Setting cib_diff_notify callbacks for tengine: on
> > Mar 30 09:38:01 test-1 tengine: [974]: info: init_start:main.c Registering TE UUID: 7ec7d2e0-ae10-4810-a67a-73119ab6855f
> > Mar 30 09:38:01 test-1 tengine: [974]: info: set_graph_functions:
> utils.c Setting custom graph functions
> > Mar 30 09:38:01 test-1 tengine: [974]: info: unpack_graph:unpack.c Unpacked transition -1: 0 actions in 0 synapses
> > Mar 30 09:38:01 test-1 tengine: [974]: info: init_start:main.c Starting tengine
> > Mar 30 09:38:01 test-1 cib: [976]: info: write_cib_contents:io.c Wrote
> version 0.4.135 of the CIB to disk (digest:
> 676f407011dd2084813f1c414db97ce5)
> > Mar 30 09:38:01 test-1 crmd: [857]: info: update_dc:utils.c Set DC to test-1.domain (1.0.6)
> > Mar 30 09:38:01 test-1 cib: [853]: info: sync_our_cib:messages.c Syncing CIB to all peers
> > Mar 30 09:38:01 test-1 crmd: [857]: info: do_state_transition:fsa.c test-1.domain: State transition S_INTEGRATION -> S_FINALIZE_JOIN [ input=I_INTEGRATED cause=C_FSA_INTERNAL origin=check_join_state ]
> > Mar 30 09:38:01 test-1 attrd: [856]: info: attrd_local_callback:
> attrd.c Sending full refresh
> > Mar 30 09:38:01 test-1 crmd: [857]: info: do_state_transition:fsa.c All 1 cluster nodes responded to the join offer.
> > Mar 30 09:38:01 test-1 crmd: [857]: info: update_attrd:join_dc.c
> Connecting to attrd...
> > Mar 30 09:38:01 test-1 cib: [853]: info: cib_diff_notify:notify.c Update (client: 857, call:31): 0.4.135 -> 0.4.136 (ok)
> > Mar 30 09:38:01 test-1 tengine: [974]: info: te_update_diff:callbacks.c Processing diff (cib_update): 0.4.135 -> 0.4.136
> > Mar 30 09:38:01 test-1 cib: [853]: info: cib_diff_notify:notify.c Update (client: 857, call:32): 0.4.136 -> 0.5.137 (ok)
> > Mar 30 09:38:01 test-1 tengine: [974]: info: te_update_diff:callbacks.c Processing diff (cib_bump): 0.4.136 -> 0.5.137
> > Mar 30 09:38:01 test-1 cib: [853]: info: cib_diff_notify:notify.c Update (client: 857, call:33): 0.5.137 -> 0.5.138 (ok)
> > Mar 30 09:38:01 test-1 tengine: [974]: info: te_update_diff:callbacks.c Processing diff (cib_update): 0.5.137 -> 0.5.138
> > Mar 30 09:38:01 test-1 tengine: [974]: info: te_update_diff:
> callbacks.c Processing diff (cib_update): 0.5.137 -> 0.5.138
> > Mar 30 09:38:01 test-1 cib: [977]: info: write_cib_contents:io.c Wrote
> version 0.5.138 of the CIB to disk (digest:
> ff63aedd166b5e7c8880f380b334321e)
> > Mar 30 09:38:02 test-1 crmd: [857]: info: update_dc:utils.c Set DC to
> test-1.domain (1.0.6)
> > Mar 30 09:38:02 test-1 crmd: [857]: info: do_dc_join_ack:join_dc.c join-1: Updating node state to member for test-1.domain
> > Mar 30 09:38:02 test-1 cib: [853]: info: cib_diff_notify:notify.c Update (client: 857, call:34): 0.5.138 -> 0.5.139 (ok)
> > Mar 30 09:38:02 test-1 tengine: [974]: info: te_update_diff:
> callbacks.c Processing diff (cib_update): 0.5.138 -> 0.5.139
> > Mar 30 09:38:02 test-1 crmd: [857]: info: do_state_transition:fsa.c
> test-1.domain: State transition S_FINALIZE_JOIN -> S_POLICY_ENGINE [
> input=I_FINALIZED cause=C_FSA_INTERNAL origin=check_join_state ]
> > Mar 30 09:38:02 test-1 tengine: [974]: info: update_abort_priority:
> utils.c Abort priority upgraded to 1000000
> > Mar 30 09:38:02 test-1 crmd: [857]: info: do_state_transition:fsa.c All 1 cluster nodes are eligable to run resources.
> > Mar 30 09:38:02 test-1 cib: [978]: info: write_cib_contents:io.c Wrote
> version 0.5.139 of the CIB to disk (digest:
> 86e0ea1b0a56ca84cf3d1f0e261f7816)
> > Mar 30 09:38:02 test-1 pengine: [975]: info: process_pe_message: [generation] <cib admin_epoch="0" have_quorum="true" num_peers="2" cib_feature_revision="1.3" generated="true" ccm_transition="3" dc_uuid="b3bba1ca-b072-49ac-8e93-a2c6fbf4678e" epoch="5" num_updates="139"/>
> > Mar 30 09:38:02 test-1 pengine: [975]: info: unpack_config:unpack.c Default stickiness: 1000
> > Mar 30 09:38:02 test-1 pengine: [975]: info: unpack_config:unpack.c Default failure stickiness: -400
> > Mar 30 09:38:02 test-1 pengine: [975]: info: unpack_config:unpack.c STONITH of failed nodes is enabled
> > Mar 30 09:38:02 test-1 pengine: [975]: info: unpack_config:unpack.c STONITH will reboot nodes
> > Mar 30 09:38:02 test-1 pengine: [975]: info: unpack_config:unpack.c Cluster is symmetric - resources can run anywhere by default
> > Mar 30 09:38:02 test-1 pengine: [975]: info: unpack_config:unpack.c On loss of CCM Quorum: Stop ALL resources
> > Mar 30 09:38:02 test-1 pengine: [975]: info: unpack_config:unpack.c Orphan resources are stopped
> > Mar 30 09:38:02 test-1 pengine: [975]: info: unpack_config:unpack.c Orphan resource actions are stopped
> > Mar 30 09:38:02 test-1 pengine: [975]: info: unpack_config:unpack.c Stopped resources are removed from the status section: true
> > Mar 30 09:38:02 test-1 pengine: [975]: info: unpack_config:unpack.c By default resources are managed
> > Mar 30 09:38:02 test-1 pengine: [975]: info: determine_online_status:
> unpack.c Node test-1.domain is online
> > Mar 30 09:38:02 test-1 pengine: [975]: WARN: determine_online_status_fencing:unpack.c Node test-2.domain (d822d67b-5495-47c9-bdb9-f4f66e0bea85) is un-expectedly down
> > Mar 30 09:38:02 test-1 pengine: [975]: info:
> determine_online_status_fencing: unpack.c         ha_state=dead,
> ccm_state=false, crm_state=online, join_state=down, expected=member
> > Mar 30 09:38:02 test-1 pengine: [975]: WARN: determine_online_status:
> unpack.c Node test-2.domain is unclean
> > Mar 30 09:38:02 test-1 pengine: [975]: info: Resource Group:
> test_group
> > Mar 30 09:38:02 test-1 pengine: [975]: info:
> test_IP      (heartbeat::ocf:IPaddr):        Started test-2.domain
> > Mar 30 09:38:02 test-1 pengine: [975]: info:
> httpd        (heartbeat::ocf:apache):        Started test-2.domain
> > Mar 30 09:38:02 test-1 pengine: [975]: info:
> test-1_DRAC      (stonith:external/drac4):       Started test-2.domain
> > Mar 30 09:38:02 test-1 pengine: [975]: info:
> test-2_DRAC      (stonith:external/drac4):       Started test-1.domain
> > Mar 30 09:38:02 test-1 pengine: [975]: notice: NoRoleChange:native.c Move resource test_IP (test-2.domain -> test-1.domain)
> > Mar 30 09:38:02 test-1 pengine: [975]: WARN: custom_action:utils.c Action test_IP_stop_0 on test-2.domain is unrunnable (offline)
> > Mar 30 09:38:02 test-1 pengine: [975]: WARN: custom_action:utils.c Marking node test-2.domain unclean
> > Mar 30 09:38:02 test-1 pengine: [975]: notice: Recurring:native.c test-1.domain test_IP_monitor_10000
> > Mar 30 09:38:02 test-1 pengine: [975]: notice: NoRoleChange:native.c Move resource httpd (test-2.domain -> test-1.domain)
> > Mar 30 09:38:02 test-1 pengine: [975]: WARN: custom_action:utils.c Action httpd_stop_0 on test-2.domain is unrunnable (offline)
> > Mar 30 09:38:02 test-1 pengine: [975]: WARN: custom_action:utils.c Marking node test-2.domain unclean
> > Mar 30 09:38:02 test-1 pengine: [975]: notice: Recurring:native.c test-1.domain httpd_monitor_10000
> > Mar 30 09:38:02 test-1 pengine: [975]: notice: NoRoleChange:native.c Move resource test-1_DRAC (test-2.domain -> test-1.domain)
> > Mar 30 09:38:02 test-1 pengine: [975]: WARN: custom_action:utils.c Action test-1_DRAC_stop_0 on test-2.domain is unrunnable (offline)
> > Mar 30 09:38:02 test-1 pengine: [975]: WARN: custom_action:utils.c Marking node test-2.domain unclean
> > Mar 30 09:38:02 test-1 pengine: [975]: notice: StopRsc:native.c test-1.domain Stop test-2_DRAC
> > Mar 30 09:38:02 test-1 pengine: [975]: notice: DeleteRsc:native.c Removing test-2_DRAC from test-1.domain
> > Mar 30 09:38:02 test-1 pengine: [975]: WARN: stage6:allocate.c Scheduling Node test-2.domain for STONITH
> > Mar 30 09:38:02 test-1 pengine: [975]: WARN: native_stop_constraints:native.c Stop of failed resource test_IP is implict after test-2.domain is fenced
> > Mar 30 09:38:02 test-1 pengine: [975]: info: native_stop_constraints:native.c Re-creating actions for test_group
> > Mar 30 09:38:02 test-1 pengine: [975]: notice: NoRoleChange:native.c Move resource test_IP (test-2.domain -> test-1.domain)
> > Mar 30 09:38:02 test-1 pengine: [975]: WARN: custom_action:utils.c Action test_IP_stop_0 on test-2.domain is unrunnable (offline)
> > Mar 30 09:38:02 test-1 pengine: [975]: WARN: custom_action:utils.c Marking node test-2.domain unclean
> > Mar 30 09:38:02 test-1 pengine: [975]: notice: Recurring:native.c test-1.domain test_IP_monitor_10000
> > Mar 30 09:38:02 test-1 pengine: [975]: notice: NoRoleChange:native.c Move resource httpd (test-2.domain -> test-1.domain)
> > Mar 30 09:38:02 test-1 pengine: [975]: WARN: custom_action:utils.c Action httpd_stop_0 on test-2.domain is unrunnable (offline)
> > Mar 30 09:38:02 test-1 pengine: [975]: WARN: custom_action:utils.c Marking node test-2.domain unclean
> > Mar 30 09:38:02 test-1 pengine: [975]: notice: Recurring:native.c test-1.domain httpd_monitor_10000
> > Mar 30 09:38:02 test-1 pengine: [975]: WARN: native_stop_constraints:native.c Stop of failed resource httpd is implict after test-2.domain is fenced
> > Mar 30 09:38:02 test-1 pengine: [975]: info: native_stop_constraints:native.c Re-creating actions for test_group
> > Mar 30 09:38:02 test-1 pengine: [975]: notice: NoRoleChange:native.c Move resource test_IP (test-2.domain -> test-1.domain)
> > Mar 30 09:38:02 test-1 pengine: [975]: WARN: custom_action:utils.c Action test_IP_stop_0 on test-2.domain is unrunnable (offline)
> > Mar 30 09:38:02 test-1 pengine: [975]: WARN: custom_action:utils.c Marking node test-2.domain unclean
> > Mar 30 09:38:02 test-1 pengine: [975]: notice: Recurring:native.c test-1.domain test_IP_monitor_10000
> > Mar 30 09:38:02 test-1 pengine: [975]: notice: NoRoleChange:native.c Move resource httpd (test-2.domain -> test-1.domain)
> > Mar 30 09:38:02 test-1 pengine: [975]: WARN: custom_action:utils.c Action httpd_stop_0 on test-2.domain is unrunnable (offline)
> > Mar 30 09:38:02 test-1 pengine: [975]: WARN: custom_action:utils.c Marking node test-2.domain unclean
> > Mar 30 09:38:02 test-1 pengine: [975]: notice: Recurring:native.c test-1.domain httpd_monitor_10000
> > Mar 30 09:38:02 test-1 pengine: [975]: WARN: native_stop_constraints:native.c Stop of failed resource test-1_DRAC is implict after test-2.domain is fenced
> > Mar 30 09:38:02 test-1 pengine: [975]: notice: stage8:allocate.c Created transition graph 0.
> > Mar 30 09:38:02 test-1 pengine: [975]: WARN: process_pe_message:
> pengine.c No value specified for cluster preference: pe-warn-series-max
> > Mar 30 09:38:02 test-1 pengine: [975]: WARN: process_pe_message:
> pengine.c Transition 0: WARNINGs found during PE processing. PEngine
> Input stored in: /var/lib/heartbeat/pengine/pe-warn-45.bz2
> > Mar 30 09:38:02 test-1 pengine: [975]: info: process_pe_message:
> pengine.c Configuration WARNINGs found during PE processing.  Please run
> "crm_verify -L" to identify issues.
> > Mar 30 09:38:02 test-1 crmd: [857]: info: do_state_transition:fsa.c
> test-1.domain: State transition S_POLICY_ENGINE -> S_TRANSITION_ENGINE [
> input=I_PE_SUCCESS cause=C_IPC_MESSAGE origin=route_message ]
> > Mar 30 09:38:02 test-1 tengine: [974]: info: unpack_graph:unpack.c Unpacked transition 0: 14 actions in 14 synapses
> > Mar 30 09:38:02 test-1 tengine: [974]: info: te_pseudo_action:
> actions.c Pseudo action 9 confirmed
> > Mar 30 09:38:02 test-1 tengine: [974]: info: te_pseudo_action:
> actions.c Pseudo action 6 confirmed
> > Mar 30 09:38:02 test-1 tengine: [974]: info: te_pseudo_action:
> actions.c Pseudo action 13 confirmed
> > Mar 30 09:38:02 test-1 tengine: [974]: info: send_rsc_command:
> actions.c Initiating action 14: test-1_DRAC_start_0 on test-1.domain
> > Mar 30 09:38:02 test-1 tengine: [974]: info: send_rsc_command:
> actions.c Initiating action 15: test-2_DRAC_stop_0 on test-1.domain
> > Mar 30 09:38:02 test-1 crmd: [857]: info: do_lrm_rsc_op:lrm.c Performing op start on test-1_DRAC (interval=0ms, key=0:7ec7d2e0-ae10-4810-a67a-73119ab6855f)
> > Mar 30 09:38:02 test-1 tengine: [974]: info: te_pseudo_action:
> actions.c Pseudo action 3 confirmed
> > Mar 30 09:38:02 test-1 lrmd: [979]: info: Try to start STONITH
> resource <rsc_id=test-1_DRAC> : Device=external/drac4
> > Mar 30 09:38:02 test-1 crmd: [857]: info: do_lrm_rsc_op:lrm.c Performing op stop on test-2_DRAC (interval=0ms, key=0:7ec7d2e0-ae10-4810-a67a-73119ab6855f)
> > Mar 30 09:38:03 test-1 lrmd: [981]: info: Try to stop STONITH resource
> <rsc_id=test-2_DRAC> : Device=external/drac4
> > Mar 30 09:38:03 test-1 crmd: [857]: info: process_lrm_event:lrm.c LRM
> operation (8) stop_0 on test-2_DRAC complete
> > Mar 30 09:38:03 test-1 cib: [853]: info: activateCibXml:io.c CIB size
> is 150940 bytes (was 148648)
> > Mar 30 09:38:03 test-1 cib: [853]: info: cib_diff_notify:notify.c Update (client: 857, call:36): 0.5.139 -> 0.5.140 (ok)
> > Mar 30 09:38:03 test-1 tengine: [974]: info: te_update_diff:
> callbacks.c Processing diff (cib_update): 0.5.139 -> 0.5.140
> > Mar 30 09:38:03 test-1 tengine: [974]: info: match_graph_event:
> events.c Action test-2_DRAC_stop_0 (15) confirmed
> > Mar 30 09:38:03 test-1 tengine: [974]: info: send_rsc_command:
> actions.c Initiating action 16: test-2_DRAC_delete_0 on test-1.domain
> > Mar 30 09:38:03 test-1 crmd: [857]: info: do_lrm_invoke:lrm.c Removing
> resource test-2_DRAC from the LRM
> > Mar 30 09:38:03 test-1 crmd: [857]: info: send_direct_ack:lrm.c ACK'ing resource op: delete for test-2_DRAC
> > Mar 30 09:38:03 test-1 tengine: [974]: info: match_graph_event:
> events.c Action test-2_DRAC_delete_0 (16) confirmed
> > Mar 30 09:38:03 test-1 tengine: [974]: info: te_crm_command:actions.c Executing crm-event (17): lrm_refresh on test-1.domain
> > Mar 30 09:38:03 test-1 tengine: [974]: info: te_crm_command:actions.c Skipping wait for 17
> > Mar 30 09:38:03 test-1 crmd: [857]: WARN: msg_to_op(1151): failed to
> get the value of field lrm_opstatus from a ha_msg
> > Mar 30 09:38:03 test-1 crmd: [857]: info: msg_to_op: Message follows:
> > Mar 30 09:38:03 test-1 crmd: [857]: info: MSG: Dumping message with 13
> fields
> > Mar 30 09:38:03 test-1 crmd: [857]: info: MSG[0] : [lrm_t=op]
> > Mar 30 09:38:03 test-1 crmd: [857]: info: MSG[1] :
> [lrm_rid=test-1_DRAC]
> > Mar 30 09:38:03 test-1 crmd: [857]: info: MSG[2] : [lrm_op=start]
> > Mar 30 09:38:03 test-1 crmd: [857]: info: MSG[3] : [lrm_timeout=30000]
> > Mar 30 09:38:03 test-1 crmd: [857]: info: MSG[4] : [lrm_interval=0]
> > Mar 30 09:38:03 test-1 crmd: [857]: info: MSG[5] : [lrm_delay=0]
> > Mar 30 09:38:03 test-1 crmd: [857]: info: MSG[6] : [lrm_targetrc=-1]
> > Mar 30 09:38:03 test-1 crmd: [857]: info: MSG[7] : [lrm_app=crmd]
> > Mar 30 09:38:03 test-1 crmd: [857]: info: MSG[8] :
> [lrm_userdata=0:7ec7d2e0-ae10-4810-a67a-73119ab6855f]
> > Mar 30 09:38:03 test-1 crmd: [857]: info: MSG[9] :
> [(2)lrm_param=0x907df78(164 198)]
> > Mar 30 09:38:03 test-1 crmd: [857]: info: MSG: Dumping message with 6
> fields
> > Mar 30 09:38:03 test-1 crmd: [857]: info: MSG[0] : [DRAC_ADDR=test-1.drac.domain]
> > Mar 30 09:38:03 test-1 crmd: [857]: info: MSG[1] :
> [CRM_meta_op_target_rc=7]
> > Mar 30 09:38:03 test-1 crmd: [857]: info: MSG[2] : [DRAC_LOGIN=root]
> > Mar 30 09:38:03 test-1 crmd: [857]: info: MSG[3] :
> [DRAC_PASSWD=********]
> > Mar 30 09:38:03 test-1 crmd: [857]: info: MSG[4] :
> [CRM_meta_timeout=30000]
> > Mar 30 09:38:03 test-1 crmd: [857]: info: MSG[5] : [crm_feature_set=1.0.6]
> > Mar 30 09:38:03 test-1 crmd: [857]: info: MSG[10] : [lrm_callid=7]
> > Mar 30 09:38:03 test-1 crmd: [857]: info: MSG[11] : [lrm_app=crmd]
> > Mar 30 09:38:03 test-1 crmd: [857]: info: MSG[12] : [lrm_callid=7]
> > Mar 30 09:38:03 test-1 crmd: [857]: info: do_lrm_invoke:lrm.c Forcing
> a local LRM refresh
> > Mar 30 09:38:03 test-1 cib: [985]: info: write_cib_contents:io.c Wrote version 0.5.140 of the CIB to disk (digest: da245127f5a42d016fecac8050eb99a1)
> > Mar 30 09:38:03 test-1 cib: [853]: info: activateCibXml:io.c CIB size
> is 144888 bytes (was 150940)
> > Mar 30 09:38:03 test-1 cib: [853]: info: cib_diff_notify:notify.c Update (client: 857, call:37): 0.5.140 -> 0.5.141 (ok)
> > Mar 30 09:38:03 test-1 tengine: [974]: info: te_update_diff:
> callbacks.c Processing diff (cib_update): 0.5.140 -> 0.5.141
> > Mar 30 09:38:03 test-1 cib: [986]: info: write_cib_contents:io.c Wrote version 0.5.141 of the CIB to disk (digest: da6197ce54aa4e2715dd524f8f2a40c5)
> > Mar 30 09:38:05 test-1 stonithd: [855]: WARN: G_SIG_dispatch: Dispatch
> function for SIGCHLD took too long to execute: 350 ms (> 10 ms) (GSource:
> 0x929b820)
> > Mar 30 09:38:05 test-1 crmd: [857]: info: process_lrm_event:lrm.c LRM
> operation (7) start_0 on test-1_DRAC complete
> > Mar 30 09:38:05 test-1 cib: [853]: info: cib_diff_notify:notify.c Update (client: 857, call:38): 0.5.141 -> 0.5.142 (ok)
> > Mar 30 09:38:05 test-1 tengine: [974]: info: te_update_diff:
> callbacks.c Processing diff (cib_update): 0.5.141 -> 0.5.142
> > Mar 30 09:38:05 test-1 tengine: [974]: info: match_graph_event:
> events.c Action test-1_DRAC_start_0 (14) confirmed
> > Mar 30 09:38:05 test-1 tengine: [974]: info: te_fence_node:actions.c Executing reboot fencing operation (18) on test-2.domain (timeout=15000)
> > Mar 30 09:38:05 test-1 stonithd: [855]: info: client tengine [pid:
> 974] want a STONITH operation RESET to node test-2.domain.
> > Mar 30 09:38:05 test-1 stonithd: [855]: info: Broadcasting the message
> succeeded: require others to stonith node test-2.domain.
> > Mar 30 09:38:05 test-1 cib: [988]: info: write_cib_contents:io.c Wrote
> version 0.5.142 of the CIB to disk (digest:
> d5d12bf8f6bdd66647f4dc5cc27adb6e)
> > Mar 30 09:38:20 test-1 stonithd: [855]: info: Failed to STONITH the
> node test-2.domain: optype=1, op_result=2
> > Mar 30 09:38:20 test-1 tengine: [974]: info: tengine_stonith_callback:
> callbacks.c call=-2, optype=1, node_name=test-2.domain, result=2,
> node_list=, action=18;0:7ec7d2e0-ae10-4810-a67a-73119ab6855f
> > Mar 30 09:38:20 test-1 tengine: [974]: ERROR:
> tengine_stonith_callback:callbacks.c Stonith of test-2.domain failed
> (2)... aborting transition.
> > Mar 30 09:38:20 test-1 tengine: [974]: info: update_abort_priority:
> utils.c Abort priority upgraded to 1000000
> > Mar 30 09:38:20 test-1 tengine: [974]: info: update_abort_priority:
> utils.c Abort action 0 superceeded by 2
> > Mar 30 09:38:20 test-1 crmd: [857]: info: do_state_transition:fsa.c
> test-1.domain: State transition S_TRANSITION_ENGINE -> S_POLICY_ENGINE [
> input=I_PE_CALC cause=C_IPC_MESSAGE origin=route_message ]
> > Mar 30 09:38:20 test-1 tengine: [974]: info: run_graph:graph.c ====================================================
> > Mar 30 09:38:20 test-1 crmd: [857]: info: do_state_transition:fsa.c All 1 cluster nodes are eligable to run resources.
> > Mar 30 09:38:20 test-1 tengine: [974]: notice: run_graph:graph.c Transition 0: (Complete=9, Pending=0, Fired=0, Skipped=5, Incomplete=0)
> > Mar 30 09:38:20 test-1 pengine: [975]: info: process_pe_message: [generation] <cib admin_epoch="0" have_quorum="true" num_peers="2" cib_feature_revision="1.3" generated="true" ccm_transition="3" dc_uuid="b3bba1ca-b072-49ac-8e93-a2c6fbf4678e" epoch="5" num_updates="142"/>
> > Mar 30 09:38:20 test-1 pengine: [975]: info: unpack_config:unpack.c Default stickiness: 1000
> > Mar 30 09:38:20 test-1 pengine: [975]: info: unpack_config:unpack.c Default failure stickiness: -400
> > Mar 30 09:38:20 test-1 pengine: [975]: info: unpack_config:unpack.c STONITH of failed nodes is enabled
> > Mar 30 09:38:20 test-1 pengine: [975]: info: unpack_config:unpack.c STONITH will reboot nodes
> > Mar 30 09:38:20 test-1 pengine: [975]: info: unpack_config:unpack.c Cluster is symmetric - resources can run anywhere by default
> > Mar 30 09:38:20 test-1 pengine: [975]: info: unpack_config:unpack.c On loss of CCM Quorum: Stop ALL resources
> > Mar 30 09:38:20 test-1 pengine: [975]: info: unpack_config:unpack.c Orphan resources are stopped
> > Mar 30 09:38:20 test-1 pengine: [975]: info: unpack_config:unpack.c Orphan resource actions are stopped
> > Mar 30 09:38:20 test-1 pengine: [975]: info: unpack_config:unpack.c Stopped resources are removed from the status section: true
> > Mar 30 09:38:20 test-1 pengine: [975]: info: unpack_config:unpack.c By default resources are managed
> > Mar 30 09:38:20 test-1 pengine: [975]: info: determine_online_status:
> unpack.c Node test-1.domain is online
> > Mar 30 09:38:20 test-1 pengine: [975]: WARN: determine_online_status_fencing:unpack.c Node test-2.domain (d822d67b-5495-47c9-bdb9-f4f66e0bea85) is un-expectedly down
> > Mar 30 09:38:20 test-1 pengine: [975]: info:
> determine_online_status_fencing: unpack.c         ha_state=dead,
> ccm_state=false, crm_state=online, join_state=down, expected=member
> > Mar 30 09:38:20 test-1 pengine: [975]: WARN: determine_online_status:
> unpack.c Node test-2.domain is unclean
> > Mar 30 09:38:20 test-1 pengine: [975]: ERROR: native_add_running:
> native.c Resource stonith::external/drac4:test-1_DRAC appears to be
> active on 2 nodes.
> > Mar 30 09:38:20 test-1 pengine: [975]: ERROR: See http://linux-ha.org/v2/faq/resource_too_active for more information.
> > Mar 30 09:38:20 test-1 pengine: [975]: info: Resource Group:
> test_group
> > Mar 30 09:38:20 test-1 pengine: [975]: info:
> test_IP      (heartbeat::ocf:IPaddr):        Started test-2.domain
> > Mar 30 09:38:20 test-1 pengine: [975]: info:
> httpd        (heartbeat::ocf:apache):        Started test-2.domain
> > Mar 30 09:38:20 test-1 pengine: [975]: info:
> test-1_DRAC      (stonith:external/drac4)
> > Mar 30 09:38:20 test-1 pengine: [975]: info:  0 : test-1.domain
> > Mar 30 09:38:20 test-1 pengine: [975]: info:  1 : test-2.domain
> > Mar 30 09:38:20 test-1 pengine: [975]: info:
> test-2_DRAC      (stonith:external/drac4):       Stopped
> > Mar 30 09:38:20 test-1 pengine: [975]: notice: native_create_probe:
> native.c test-1.domain: Created probe for test-2_DRAC
> > Mar 30 09:38:20 test-1 pengine: [975]: notice: NoRoleChange:native.c Move resource test_IP (test-2.domain -> test-1.domain)
> > Mar 30 09:38:20 test-1 pengine: [975]: WARN: custom_action:utils.c Action test_IP_stop_0 on test-2.domain is unrunnable (offline)
> > Mar 30 09:38:20 test-1 pengine: [975]: WARN: custom_action:utils.c Marking node test-2.domain unclean
> > Mar 30 09:38:20 test-1 pengine: [975]: notice: Recurring:native.c test-1.domain test_IP_monitor_10000
> > Mar 30 09:38:20 test-1 pengine: [975]: notice: NoRoleChange:native.c Move resource httpd (test-2.domain -> test-1.domain)
> > Mar 30 09:38:20 test-1 pengine: [975]: WARN: custom_action:utils.c Action httpd_stop_0 on test-2.domain is unrunnable (offline)
> > Mar 30 09:38:20 test-1 pengine: [975]: WARN: custom_action:utils.c Marking node test-2.domain unclean
> > Mar 30 09:38:20 test-1 pengine: [975]: notice: Recurring:native.c test-1.domain httpd_monitor_10000
> > Mar 30 09:38:20 test-1 pengine: [975]: ERROR: native_create_actions:native.c Attempting recovery of resource test-1_DRAC
> > Mar 30 09:38:20 test-1 pengine: [975]: notice: StopRsc:native.c test-1.domain Stop test-1_DRAC
> > Mar 30 09:38:20 test-1 pengine: [975]: notice: DeleteRsc:native.c Removing test-1_DRAC from test-1.domain
> > Mar 30 09:38:20 test-1 pengine: [975]: notice: StopRsc:native.c test-2.domain Stop test-1_DRAC
> > Mar 30 09:38:20 test-1 pengine: [975]: WARN: custom_action:utils.c Action test-1_DRAC_stop_0 on test-2.domain is unrunnable (offline)
> > Mar 30 09:38:20 test-1 pengine: [975]: WARN: custom_action:utils.c Marking node test-2.domain unclean
> > Mar 30 09:38:20 test-1 pengine: [975]: notice: StartRsc:native.c test-1.domain Start test-1_DRAC
> > Mar 30 09:38:20 test-1 pengine: [975]: WARN: stage6:allocate.c Scheduling Node test-2.domain for STONITH
> > Mar 30 09:38:20 test-1 pengine: [975]: WARN: native_stop_constraints:native.c Stop of failed resource test_IP is implict after test-2.domain is fenced
> > Mar 30 09:38:20 test-1 pengine: [975]: info: native_stop_constraints:native.c Re-creating actions for test_group
> > Mar 30 09:38:20 test-1 pengine: [975]: notice: NoRoleChange:native.c Move resource test_IP (test-2.domain -> test-1.domain)
> > Mar 30 09:38:20 test-1 pengine: [975]: WARN: custom_action:utils.c Action test_IP_stop_0 on test-2.domain is unrunnable (offline)
> > Mar 30 09:38:20 test-1 pengine: [975]: WARN: custom_action:utils.c Marking node test-2.domain unclean
> > Mar 30 09:38:20 test-1 pengine: [975]: notice: Recurring:native.c test-1.domain test_IP_monitor_10000
> > Mar 30 09:38:20 test-1 pengine: [975]: notice: NoRoleChange:native.c Move resource httpd (test-2.domain -> test-1.domain)
> > Mar 30 09:38:20 test-1 pengine: [975]: WARN: custom_action:utils.c Action httpd_stop_0 on test-2.domain is unrunnable (offline)
> > Mar 30 09:38:20 test-1 pengine: [975]: WARN: custom_action:utils.c Marking node test-2.domain unclean
> > Mar 30 09:38:20 test-1 pengine: [975]: notice: Recurring:native.c test-1.domain httpd_monitor_10000
> > Mar 30 09:38:20 test-1 pengine: [975]: WARN: native_stop_constraints:native.c Stop of failed resource httpd is implict after test-2.domain is fenced
> > Mar 30 09:38:20 test-1 pengine: [975]: info: native_stop_constraints:native.c Re-creating actions for test_group
> > Mar 30 09:38:20 test-1 pengine: [975]: notice: NoRoleChange:native.c Move resource test_IP (test-2.domain -> test-1.domain)
> > Mar 30 09:38:20 test-1 pengine: [975]: WARN: custom_action:utils.c Action test_IP_stop_0 on test-2.domain is unrunnable (offline)
> > Mar 30 09:38:20 test-1 pengine: [975]: WARN: custom_action:utils.c Marking node test-2.domain unclean
> > Mar 30 09:38:20 test-1 pengine: [975]: notice: Recurring:native.c test-1.domain test_IP_monitor_10000
> > Mar 30 09:38:20 test-1 pengine: [975]: notice: NoRoleChange:native.c Move resource httpd (test-2.domain -> test-1.domain)
> > Mar 30 09:38:20 test-1 pengine: [975]: WARN: custom_action:utils.c Action httpd_stop_0 on test-2.domain is unrunnable (offline)
> > Mar 30 09:38:20 test-1 pengine: [975]: WARN: custom_action:utils.c Marking node test-2.domain unclean
> > Mar 30 09:38:20 test-1 pengine: [975]: notice: Recurring:native.c test-1.domain httpd_monitor_10000
> > Mar 30 09:38:20 test-1 pengine: [975]: WARN: native_stop_constraints:native.c Stop of failed resource test-1_DRAC is implict after test-2.domain is fenced
> > Mar 30 09:38:20 test-1 pengine: [975]: notice: stage8:allocate.c Created transition graph 1.
> > Mar 30 09:38:20 test-1 pengine: [975]: WARN: process_pe_message:pengine.c No value specified for cluster preference: pe-error-series-max
> > Mar 30 09:38:20 test-1 pengine: [975]: ERROR: process_pe_message:pengine.c Transition 1: ERRORs found during PE processing. PEngine Input stored in: /var/lib/heartbeat/pengine/pe-error-128.bz2
> > Mar 30 09:38:20 test-1 pengine: [975]: info: process_pe_message:pengine.c Configuration WARNINGs found during PE processing.  Please run "crm_verify -L" to identify issues.
> > Mar 30 09:38:20 test-1 crmd: [857]: info: do_state_transition: fsa.c
> test-1.domain: State transition S_POLICY_ENGINE -> S_TRANSITION_ENGINE [
> input=I_PE_SUCCESS cause=C_IPC_MESSAGE origin=route_message ]
> > Mar 30 09:38:20 test-1 tengine: [974]: info: unpack_graph:unpack.c Unpacked transition 1: 17 actions in 17 synapses
> > Mar 30 09:38:21 test-1 tengine: [974]: info: te_pseudo_action:
> actions.c Pseudo action 10 confirmed
> > Mar 30 09:38:21 test-1 tengine: [974]: info: te_pseudo_action:
> actions.c Pseudo action 7 confirmed
> > Mar 30 09:38:21 test-1 tengine: [974]: info: send_rsc_command:
> actions.c Initiating action 14: test-1_DRAC_stop_0 on test-1.domain
> > Mar 30 09:38:21 test-1 tengine: [974]: info: te_pseudo_action:
> actions.c Pseudo action 17 confirmed
> > Mar 30 09:38:21 test-1 tengine: [974]: info: send_rsc_command:
> actions.c Initiating action 3: test-2_DRAC_monitor_0 on test-1.domain
> > Mar 30 09:38:21 test-1 crmd: [857]: info: do_lrm_rsc_op:lrm.c Performing op stop on test-1_DRAC (interval=0ms, key=1:7ec7d2e0-ae10-4810-a67a-73119ab6855f)
> > Mar 30 09:38:21 test-1 tengine: [974]: info: te_pseudo_action:
> actions.c Pseudo action 4 confirmed
> > Mar 30 09:38:21 test-1 lrmd: [989]: info: Try to stop STONITH resource
> <rsc_id=test-1_DRAC> : Device=external/drac4
> > Mar 30 09:38:21 test-1 crmd: [857]: info: do_lrm_rsc_op:lrm.c Performing op monitor on test-2_DRAC (interval=0ms, key=1:7ec7d2e0-ae10-4810-a67a-73119ab6855f)
> > Mar 30 09:38:21 test-1 crmd: [857]: info: process_lrm_event: lrm.c LRM
> operation (9) stop_0 on test-1_DRAC complete
> > Mar 30 09:38:21 test-1 crmd: [857]: info: process_lrm_event:lrm.c LRM
> operation (10) monitor_0 on test-2_DRAC Error: (7) not running
> > Mar 30 09:38:21 test-1 cib: [853]: info: activateCibXml: io.c CIB size
> is 147180 bytes (was 144888)
> > Mar 30 09:38:21 test-1 cib: [853]: info: cib_diff_notify:notify.c Update (client: 857, call:40): 0.5.142 -> 0.5.143 (ok)
> > Mar 30 09:38:21 test-1 tengine: [974]: info: te_update_diff:
> callbacks.c Processing diff (cib_update): 0.5.142 -> 0.5.143
> > Mar 30 09:38:21 test-1 tengine: [974]: info: match_graph_event:
> events.c Action test-1_DRAC_stop_0 (14) confirmed
> > Mar 30 09:38:21 test-1 tengine: [974]: info: send_rsc_command:
> actions.c Initiating action 15: test-1_DRAC_delete_0 on test-1.domain
> > Mar 30 09:38:21 test-1 crmd: [857]: info: do_lrm_invoke:lrm.c Removing
> resource test-1_DRAC from the LRM
> > Mar 30 09:38:21 test-1 crmd: [857]: info: send_direct_ack:lrm.c ACK'ing resource op: delete for test-1_DRAC
> > Mar 30 09:38:21 test-1 tengine: [974]: info: match_graph_event:
> events.c Action test-1_DRAC_delete_0 (15) confirmed
> > Mar 30 09:38:21 test-1 tengine: [974]: info: send_rsc_command:
> actions.c Initiating action 18: test-1_DRAC_start_0 on test-1.domain
> > Mar 30 09:38:21 test-1 tengine: [974]: info: te_crm_command:actions.c Executing crm-event (16): lrm_refresh on test-1.domain
> > Mar 30 09:38:21 test-1 tengine: [974]: info: te_crm_command:actions.c Skipping wait for 16
> > Mar 30 09:38:21 test-1 crmd: [857]: info: do_lrm_rsc_op:lrm.c Performing op start on test-1_DRAC (interval=0ms, key=1:7ec7d2e0-ae10-4810-a67a-73119ab6855f)
> > Mar 30 09:38:21 test-1 lrmd: [993]: info: Try to start STONITH
> resource <rsc_id=test-1_DRAC> : Device=external/drac4
> > Mar 30 09:38:21 test-1 crmd: [857]: WARN: msg_to_op(1151): failed to
> get the value of field lrm_opstatus from a ha_msg
> > Mar 30 09:38:21 test-1 crmd: [857]: info: msg_to_op: Message follows:
> > Mar 30 09:38:21 test-1 crmd: [857]: info: MSG: Dumping message with 13
> fields
> > Mar 30 09:38:21 test-1 crmd: [857]: info: MSG[0] : [lrm_t=op]
> > Mar 30 09:38:21 test-1 crmd: [857]: info: MSG[1] :
> [lrm_rid=test-1_DRAC]
> > Mar 30 09:38:21 test-1 crmd: [857]: info: MSG[2] : [lrm_op=start]
> > Mar 30 09:38:21 test-1 crmd: [857]: info: MSG[3] : [lrm_timeout=30000]
> > Mar 30 09:38:21 test-1 crmd: [857]: info: MSG[4] : [lrm_interval=0]
> > Mar 30 09:38:21 test-1 crmd: [857]: info: MSG[5] : [lrm_delay=0]
> > Mar 30 09:38:21 test-1 crmd: [857]: info: MSG[6] : [lrm_targetrc=-1]
> > Mar 30 09:38:21 test-1 crmd: [857]: info: MSG[7] : [lrm_app=crmd]
> > Mar 30 09:38:21 test-1 crmd: [857]: info: MSG[8] :
> [lrm_userdata=1:7ec7d2e0-ae10-4810-a67a-73119ab6855f]
> > Mar 30 09:38:21 test-1 crmd: [857]: info: MSG[9] :
> [(2)lrm_param=0x90afd20(140 168)]
> > Mar 30 09:38:21 test-1 crmd: [857]: info: MSG: Dumping message with 5
> fields
> > Mar 30 09:38:21 test-1 crmd: [857]: info: MSG[0] : [DRAC_ADDR=test-1.drac.domain]
> > Mar 30 09:38:21 test-1 crmd: [857]: info: MSG[1] : [DRAC_LOGIN=root]
> > Mar 30 09:38:21 test-1 crmd: [857]: info: MSG[2] :
> [DRAC_PASSWD=********]
> > Mar 30 09:38:21 test-1 crmd: [857]: info: MSG[3] :
> [CRM_meta_timeout=30000]
> > Mar 30 09:38:21 test-1 crmd: [857]: info: MSG[4] : [crm_feature_set=1.0.6]
> > Mar 30 09:38:21 test-1 crmd: [857]: info: MSG[10] : [lrm_callid=11]
> > Mar 30 09:38:21 test-1 crmd: [857]: info: MSG[11] : [lrm_app=crmd]
> > Mar 30 09:38:21 test-1 crmd: [857]: info: MSG[12] : [lrm_callid=11]
> > Mar 30 09:38:21 test-1 crmd: [857]: info: do_lrm_invoke:lrm.c Forcing
> a local LRM refresh
> > Mar 30 09:38:21 test-1 cib: [853]: info: activateCibXml: io.c CIB size
> is 150940 bytes (was 147180)
> > Mar 30 09:38:21 test-1 cib: [853]: info: cib_diff_notify:notify.c Update (client: 857, call:41): 0.5.143 -> 0.5.144 (ok)
> > Mar 30 09:38:21 test-1 tengine: [974]: info: te_update_diff:
> callbacks.c Processing diff (cib_update): 0.5.143 -> 0.5.144
> > Mar 30 09:38:21 test-1 tengine: [974]: info: match_graph_event:
> events.c Action test-2_DRAC_monitor_0 (3) confirmed
> > Mar 30 09:38:21 test-1 tengine: [974]: info: send_rsc_command:
> actions.c Initiating action 2: probe_complete on test-1.domain
> > Mar 30 09:38:21 test-1 tengine: [974]: info: te_pseudo_action:
> actions.c Pseudo action 1 confirmed
> > Mar 30 09:38:21 test-1 cib: [853]: info: activateCibXml: io.c CIB size
> is 146356 bytes (was 150940)
> > Mar 30 09:38:21 test-1 cib: [853]: info: cib_diff_notify:notify.c Update (client: 857, call:42): 0.5.144 -> 0.5.145 (ok)
> > Mar 30 09:38:21 test-1 tengine: [974]: info: te_update_diff:callbacks.c Processing diff (cib_update): 0.5.144 -> 0.5.145
> > Mar 30 09:38:21 test-1 cib: [853]: info: cib_diff_notify:notify.c Update (client: 857, call:43): 0.5.145 -> 0.5.146 (ok)
> > Mar 30 09:38:21 test-1 tengine: [974]: info: te_update_diff:
> callbacks.c Processing diff (cib_update): 0.5.145 -> 0.5.146
> > Mar 30 09:38:21 test-1 cib: [995]: info: write_cib_contents:io.c Wrote
> version 0.5.146 of the CIB to disk (digest:
> 53f6e570e78f4cf92213e89be4ae0573)
> > Mar 30 09:38:24 test-1 stonithd: [855]: WARN: G_SIG_dispatch: Dispatch
> function for SIGCHLD took too long to execute: 350 ms (> 10 ms) (GSource:
> 0x929b820)
> > Mar 30 09:38:24 test-1 crmd: [857]: info: process_lrm_event: lrm.c LRM
> operation (11) start_0 on test-1_DRAC complete
> > Mar 30 09:38:24 test-1 cib: [853]: info: cib_diff_notify:notify.c Update (client: 857, call:44): 0.5.146 -> 0.5.147 (ok)
> > Mar 30 09:38:24 test-1 tengine: [974]: info: te_update_diff:
> callbacks.c Processing diff (cib_update): 0.5.146 -> 0.5.147
> > Mar 30 09:38:24 test-1 tengine: [974]: info: match_graph_event:
> events.c Action test-1_DRAC_start_0 (18) confirmed
> > Mar 30 09:38:24 test-1 tengine: [974]: info: te_fence_node:actions.c Executing reboot fencing operation (19) on test-2.domain (timeout=15000)
> > Mar 30 09:38:24 test-1 stonithd: [855]: info: client tengine [pid:
> 974] want a STONITH operation RESET to node test-2.domain.
> > Mar 30 09:38:24 test-1 stonithd: [855]: info: Broadcasting the message
> succeeded: require others to stonith node test-2.domain.
> > Mar 30 09:38:24 test-1 cib: [999]: info: write_cib_contents:io.c Wrote
> version 0.5.147 of the CIB to disk (digest:
> 7978776db187ec2d2397921646f4053b)
> > Mar 30 09:38:39 test-1 stonithd: [855]: info: Failed to STONITH the
> node test-2.domain: optype=1, op_result=2
> > Mar 30 09:38:39 test-1 tengine: [974]: info: tengine_stonith_callback:
> callbacks.c call=-3, optype=1, node_name=test-2.domain, result=2,
> node_list=, action=19;1:7ec7d2e0-ae10-4810-a67a-73119ab6855f
> > Mar 30 09:38:39 test-1 tengine: [974]: ERROR:
> tengine_stonith_callback:callbacks.c Stonith of test-2.domain failed
> (2)... aborting transition.
> > Mar 30 09:38:39 test-1 tengine: [974]: info: update_abort_priority:
> utils.c Abort priority upgraded to 1000000
> > Mar 30 09:38:39 test-1 tengine: [974]: info: update_abort_priority:
> utils.c Abort action 0 superceeded by 2
> > Mar 30 09:38:39 test-1 crmd: [857]: info: do_state_transition: fsa.c
> test-1.domain: State transition S_TRANSITION_ENGINE -> S_POLICY_ENGINE [
> input=I_PE_CALC cause=C_IPC_MESSAGE origin=route_message ]
> > Mar 30 09:38:39 test-1 tengine: [974]: info: run_graph:graph.c ====================================================
> >  <cib admin_epoch="0" epoch="0" num_updates="0">
> >    <configuration>
> >      <crm_config>
> >        <cluster_property_set id="cib-bootstrap-options">
> >          <attributes>
> >            <nvpair id="cib-bootstrap-options-transition_idle_timeout"
> name="transition_idle_timeout" value="5min"/>
> >            <nvpair id="cib-bootstrap-options-stonith_enabled"
> name="stonith_enabled" value="true"/>
> >            <nvpair
> id="cib-bootstrap-options-default_resource_stickiness"
> name="default_resource_stickiness" value="1000"/>
> >            <nvpair id="cib-bootstrap-options-short_resource_names"
> name="short_resource_names" value="true"/>
> >            <nvpair
> id="cib-bootstrap-options-default_resource_failure_stickiness"
> name="default_resource_failure_stickiness" value="-400"/>
> >            <nvpair id="cib-bootstrap-options-stonith_action"
> name="stonith_action" value="reboot"/>
> >            <nvpair id="cib-bootstrap-options-remove_after_stop"
> name="remove_after_stop" value="true"/>
> >            <nvpair id="cib-bootstrap-options-default_action_timeout"
> name="default_action_timeout" value="30s"/>
> >            <nvpair id="cib-bootstrap-options-symmetric_cluster"
> name="symmetric_cluster" value="true"/>
> >            <nvpair id="cib-bootstrap-options-no_quorum_policy"
> name="no_quorum_policy" value="stop"/>
> >            <nvpair id="cib-bootstrap-options-stop_orphan_resources"
> name="stop_orphan_resources" value="true"/>
> >            <nvpair id="cib-bootstrap-options-stop_orphan_actions"
> name="stop_orphan_actions" value="true"/>
> >            <nvpair id="cib-bootstrap-options-is_managed_default"
> name="is_managed_default" value="true"/>
> >          </attributes>
> >        </cluster_property_set>
> >      </crm_config>
> >      <nodes/>
> >      <resources>
> >        <group id="test_group">
> >          <primitive class="ocf" id="test_IP" provider="heartbeat"
> type="IPaddr">
> >            <operations>
> >              <op id="test_IP_mon" interval="10s" name="monitor"
> timeout="9s" on_fail="restart"/>
> >              <op id="test_IP_start" name="start" timeout="20s"
> on_fail="restart" prereq="fencing"/>
> >              <op id="test_IP_stop" name="stop" timeout="20s"
> on_fail="restart"/>
> >            </operations>
> >            <instance_attributes id="test_IP_inst_attr">
> >              <attributes>
> >                <nvpair id="test_IP_attr_0" name="ip" value="192.168.168.168"/>
> >                <nvpair id="test_IP_attr_1" name="netmask" value="25"/>
> >              </attributes>
> >            </instance_attributes>
> >          </primitive>
> >          <primitive class="ocf" id="httpd" provider="heartbeat"
> type="apache">
> >            <operations>
> >              <op id="httpd_mon" name="monitor" interval="10s"
> timeout="9s" on_fail="restart"/>
> >              <op id="httpd_start" name="start" timeout="20s"
> on_fail="restart" prereq="fencing"/>
> >              <op id="httpd_stop" name="stop" timeout="20s"
> on_fail="restart"/>
> >            </operations>
> >          </primitive>
> >        </group>
> >        <primitive id="test-1_DRAC" class="stonith" type="external/drac4" provider="heartbeat">
> >          <operations>
> >            <op id="test-1_DRAC_reset" name="reset" timeout="3m" prereq="nothing"/>
> >          </operations>
> >          <instance_attributes id="test-1_DRAC_inst_attr">
> >            <attributes>
> >              <nvpair id="test-1_DRAC_attr_0" name="DRAC_ADDR" value="test-1.drac.domain"/>
> >              <nvpair id="test-1_DRAC_attr_1" name="DRAC_LOGIN" value="root"/>
> >              <nvpair id="test-1_DRAC_attr_2" name="DRAC_PASSWD" value="********"/>
> >            </attributes>
> >          </instance_attributes>
> >        </primitive>
> >        <primitive id="test-2_DRAC" class="stonith" type="external/drac4" provider="heartbeat">
> >          <operations>
> >            <op id="test-2_DRAC_reset" name="reset" timeout="3m" prereq="nothing"/>
> >          </operations>
> >          <instance_attributes id="test-2_DRAC_inst_attr">
> >            <attributes>
> >              <nvpair id="test-2_DRAC_attr_0" name="DRAC_ADDR" value="test-2.drac.domain"/>
> >              <nvpair id="test-2_DRAC_attr_1" name="DRAC_LOGIN" value="root"/>
> >              <nvpair id="test-2_DRAC_attr_2" name="DRAC_PASSWD" value="********"/>
> >            </attributes>
> >          </instance_attributes>
> >        </primitive>
> >      </resources>
> >      <constraints>
> >        <rsc_location id="test_group_location_test-1" rsc="test_group">
> >          <rule id="prefered_location_test_group_test-1" score="1000">
> >            <expression attribute="#uname" id="prefered_location_test_group_test-1_expr_1" operation="eq" value="test-1.domain"/>
> >          </rule>
> >        </rsc_location>
> >        <rsc_location id="test_group_location_test-2" rsc="test_group">
> >          <rule id="prefered_location_test_group_test-2" score="1000">
> >            <expression attribute="#uname" id="prefered_location_test_group_test-2_expr_1" operation="eq" value="test-2.domain"/>
> >          </rule>
> >        </rsc_location>
> >        <rsc_location id="test-1_DRAC_location" rsc="test-1_DRAC">
> >          <rule id="prefered_location_test-1_DRAC" score="INFINITY">
> >            <expression attribute="#uname" id="prefered_location_test-1_DRAC_expr_1" operation="eq" value="test-2.domain"/>
> >          </rule>
> >        </rsc_location>
> >        <rsc_location id="test-2_DRAC_location" rsc="test-2_DRAC">
> >          <rule id="prefered_location_test-2_DRAC" score="INFINITY">
> >            <expression attribute="#uname" id="prefered_location_test-2_DRAC_expr_1" operation="eq" value="test-1.domain"/>
> >          </rule>
> >        </rsc_location>
> >        <rsc_order id="Ord_websrv" from="httpd" type="after"
> to="test_IP"/>
> >        <rsc_colocation id="Colo_not_same_DRAC" from="test-1_DRAC"
> to="test-2_DRAC" score="-INFINITY"/>
> >      </constraints>
> >    </configuration>
> >    <status/>
> >  </cib>
>
>
>
> --
> Dejan
>



_______________________________________________
Linux-HA mailing list
Linux-HA@lists.linux-ha.org
http://lists.linux-ha.org/mailman/listinfo/linux-ha
See also: http://linux-ha.org/ReportingProblems
