Do you mean that STONITH will itself determine whether there is a need for
fencing and will be executed accordingly?

I am completely new to STONITH and am configuring it for the first time. I am
planning the following; could you please verify it?

I want to configure stonith external/riloe. Configuring riloe requires
parameters such as ilo_hostname and ilo_password, which are specific to each
host. Since I have two nodes, I am adding two primitive stonith resources
(stonith-node1 and stonith-node2): stonith-node1 will hold the riloe
parameters for node1, and stonith-node2 will hold the riloe parameters for
node2. I will then add location constraints so that stonith-node1 always runs
on node1, and likewise for stonith-node2.
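
Here is a rough sketch of what I have in mind, in crm shell syntax. The
ilo_* names are the parameters I mentioned above; everything else (the
hostlist parameter, ilo_user, the example hostnames and passwords) is only
a guess that I still need to verify with "stonith -t external/riloe -n":

    # one stonith primitive per node, each holding that node's iLO details
    primitive stonith-node1 stonith:external/riloe \
        params hostlist="node1" ilo_hostname="ilo-node1.example.com" \
               ilo_user="Administrator" ilo_password="secret1"
    primitive stonith-node2 stonith:external/riloe \
        params hostlist="node2" ilo_hostname="ilo-node2.example.com" \
               ilo_user="Administrator" ilo_password="secret2"
    # location constraints to pin each stonith resource as described above
    location loc-stonith-node1 stonith-node1 inf: node1
    location loc-stonith-node2 stonith-node2 inf: node2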

Please let me know your view on the above steps.

One more thing: does a node need quorum to fence another node? In my case I
have only two nodes, so will one node be able to STONITH the other in a
split-brain situation? In a split brain, both nodes will lose quorum.
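
I have read that a two-node cluster is normally told to ignore loss of
quorum so that the surviving node can still fence and take over; is that
what I need here? Something like this (assuming the crm shell; please
correct me if this is wrong):

    # allow a node to keep managing resources (and fence) without quorum,
    # which I understand is the usual setting for two-node clusters
    property no-quorum-policy="ignore"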

Thanks a lot for your help.

On Mon, Feb 9, 2009 at 1:52 PM, Dejan Muhamedagic <deja...@fastmail.fm> wrote:

> On Thu, Feb 05, 2009 at 10:21:04AM -0800, Gruher, Joseph R wrote:
> > Thanks for the input.  What could cause the STONITH request to
> > not be sent from tengine?
>
> Nothing. If there is a need for fencing that is. If you think
> that there should've been one sent but wasn't, please use
> hb_report and create a bugzilla.
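>
> For example, something like this (adjust the time and destination as
> needed so it covers the incident):
>
>   hb_report -f "2009/01/30 15:50" /tmp/stonith-report
>
> and attach the resulting tarball.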
>
> Thanks,
>
> Dejan
>
> >
> > Thanks,
> > Joe
> >
> > -----Original Message-----
> > From: linux-ha-boun...@lists.linux-ha.org [mailto:
> linux-ha-boun...@lists.linux-ha.org] On Behalf Of Dejan Muhamedagic
> > Sent: Thursday, February 05, 2009 10:13 AM
> > To: General Linux-HA mailing list
> > Subject: Re: [Linux-HA] Help with STONITH Plugin
> >
> > On Mon, Feb 02, 2009 at 03:26:20PM -0800, Gruher, Joseph R wrote:
> > > Hello-
> > >
> > > We are developing our own STONITH plugin for our blade server
> > > and having an issue we are hoping this list can help us with.
> > > Our STONITH plugin script works well some of the time (the bad
> > > node is reset and the resources fail over) but does not work
> > > some of the time (resources never fail over).  We have been
> > > looking in the messages log (see examples below) and it appears
> > > in the non-working case that the reset call to our plugin never
> > > occurs, even though the getconfignames, status and gethosts
> > > calls that normally lead up to it are made.  Are there any
> > > common problems that could cause this behavior?  Any suggestions
> > > how we can continue to debug this problem?  Other logs we
> > > should be looking in?  Would it be useful to send our plugin or
> > > any other files from the system?
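> > >
> > > For reference, the plugin follows the standard external STONITH
> > > plugin convention; the outline below is only a simplified sketch of
> > > that interface (using our node2slot parameter), not our actual script:
> > >
> > >   #!/bin/sh
> > >   # external plugins are invoked with the operation as $1 and, for
> > >   # power operations, the target host as $2; configuration values
> > >   # (here: node2slot) are passed in as environment variables.
> > >   case "$1" in
> > >   getconfignames)
> > >       echo "node2slot" ;;            # parameter names, one per line
> > >   gethosts)
> > >       # node2slot looks like "host1=1,host2=2"; print the hostnames
> > >       echo "$node2slot" | tr ',' '\n' | cut -d= -f1 ;;
> > >   status)
> > >       exit 0 ;;                      # check the chassis module here
> > >   reset|off|on)
> > >       host="$2"                      # map $host to its slot, act on it
> > >       exit 0 ;;
> > >   *)                                 # getinfo-* handlers omitted here
> > >       exit 1 ;;
> > >   esac
> > >   exit 0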
> > >
> > > Thanks very much for any and all input.  We are testing with SLES10.2
> x64.
> >
> > The "works" log looks fine.
> >
> > In the "not works" log, there's no request for stonith from
> > tengine (look for tengine.*reboot).
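> >
> > For example:
> >
> >   grep 'tengine.*reboot' /var/log/messages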
> >
> > Thanks,
> >
> > Dejan
> >
> > > -Joe
> > >
> > >
> > > When it works:
> > >
> > > Jan 28 11:33:09 25node2 crmd: [6325]: info: do_lrm_rsc_op: Performing
> op=STONITH_v1.1_start_0 key=16:47:676b4bb5-523b-49c5-a5f3-a228f2af8149)
> > > Jan 28 11:33:09 25node2 lrmd: [6322]: info: rsc:STONITH_v1.1: start
> > > Jan 28 11:33:09 25node2 lrmd: [14675]: info: Try to start STONITH
> resource <rsc_id=STONITH_v1.1> : Device=external/MFSYS_STONITH_PLUGIN_v1.1
> > > Jan 28 11:33:09 25node2 MFSYS_STONITH_PLUGIN_v1.1[14676]:
> getconfignames (node2slot=; slot=)
> > > Jan 28 11:33:09 25node2 MFSYS_STONITH_PLUGIN_v1.1[14676]: exiting
> script with an rc=0
> > > Jan 28 11:33:09 25node2 MFSYS_STONITH_PLUGIN_v1.1[14687]: status
> (node2slot=25node1=1,25node2=2,25node3=3; slot=)
> > > Jan 28 11:33:09 25node2 ccm: [6320]: debug: quorum plugin: majority
> > > Jan 28 11:33:09 25node2 ccm: [6320]: debug: cluster:linux-ha,
> member_count=2, member_quorum_votes=200
> > > Jan 28 11:33:09 25node2 cib: [6321]: info: mem_handle_event: Got an
> event OC_EV_MS_INVALID from ccm
> > > Jan 28 11:33:09 25node2 ccm: [6320]: debug: total_node_count=3,
> total_quorum_votes=300
> > > Jan 28 11:33:09 25node2 cib: [6321]: info: mem_handle_event: no
> mbr_track info
> > > Jan 28 11:33:09 25node2 ccm: [6320]: debug: quorum plugin: majority
> > > Jan 28 11:33:09 25node2 cib: [6321]: info: mem_handle_event: Got an
> event OC_EV_MS_NEW_MEMBERSHIP from ccm
> > > Jan 28 11:33:09 25node2 ccm: [6320]: debug: cluster:linux-ha,
> member_count=2, member_quorum_votes=200
> > > Jan 28 11:33:09 25node2 crmd: [6325]: info: mem_handle_event: Got an
> event OC_EV_MS_INVALID from ccm
> > > Jan 28 11:33:09 25node2 cib: [6321]: info: mem_handle_event:
> instance=16, nodes=2, new=0, lost=1, n_idx=0, new_idx=2, old_idx=5
> > > Jan 28 11:33:09 25node2 ccm: [6320]: debug: total_node_count=3,
> total_quorum_votes=300
> > > Jan 28 11:33:09 25node2 crmd: [6325]: info: mem_handle_event: no
> mbr_track info
> > > Jan 28 11:33:09 25node2 cib: [6321]: info: cib_ccm_msg_callback: LOST:
> 25node1
> > > Jan 28 11:33:09 25node2 crmd: [6325]: info: mem_handle_event: Got an
> event OC_EV_MS_NEW_MEMBERSHIP from ccm
> > > Jan 28 11:33:09 25node2 cib: [6321]: info: cib_ccm_msg_callback: PEER:
> 25node2
> > > Jan 28 11:33:09 25node2 crmd: [6325]: info: mem_handle_event:
> instance=16, nodes=2, new=0, lost=1, n_idx=0, new_idx=2, old_idx=5
> > > Jan 28 11:33:09 25node2 cib: [6321]: info: cib_ccm_msg_callback: PEER:
> 25node3
> > > Jan 28 11:33:09 25node2 crmd: [6325]: info: crmd_ccm_msg_callback:
> Quorum (re)attained after event=NEW MEMBERSHIP (id=16)
> > > Jan 28 11:33:09 25node2 crmd: [6325]: info: ccm_event_detail: NEW
> MEMBERSHIP: trans=16, nodes=2, new=0, lost=1 n_idx=0, new_idx=2, old_idx=5
> > > Jan 28 11:33:09 25node2 crmd: [6325]: info: ccm_event_detail:
> CURRENT: 25node2 [nodeid=1, born=2]
> > > Jan 28 11:33:09 25node2 crmd: [6325]: info: ccm_event_detail:
> CURRENT: 25node3 [nodeid=2, born=3]
> > > Jan 28 11:33:09 25node2 crmd: [6325]: info: ccm_event_detail:     LOST:
>    25node1 [nodeid=0, born=15]
> > > Jan 28 11:33:09 25node2 mgmtd: [6326]: ERROR: unpack_rsc_op: Remapping
> WebServer_monitor_0 (rc=1) on 25node2 to an ERROR
> > > Jan 28 11:33:09 25node2 mgmtd: [6326]: ERROR: unpack_rsc_op: Remapping
> WebServer_monitor_0 (rc=1) on 25node3 to an ERROR
> > > Jan 28 11:33:09 25node2 mgmtd: [6326]: ERROR: unpack_rsc_op: Remapping
> WebServer_monitor_0 (rc=1) on 25node2 to an ERROR
> > > Jan 28 11:33:09 25node2 mgmtd: [6326]: ERROR: unpack_rsc_op: Remapping
> WebServer_monitor_0 (rc=1) on 25node3 to an ERROR
> > > Jan 28 11:33:10 25node2 haclient: on_event: from message queue:
> evt:cib_changed
> > > Jan 28 11:33:11 25node2 MFSYS_STONITH_PLUGIN_v1.1[14687]: exiting
> script with an rc=0
> > > Jan 28 11:33:12 25node2 MFSYS_STONITH_PLUGIN_v1.1[14703]: gethosts
> (node2slot=25node1=1,25node2=2,25node3=3; slot=)
> > > Jan 28 11:33:12 25node2 MFSYS_STONITH_PLUGIN_v1.1[14703]: exiting
> script with an rc=0
> > > Jan 28 11:33:12 25node2 crmd: [6325]: info: process_lrm_event: LRM
> operation STONITH_v1.1_start_0 (call=24, rc=0) complete
> > > Jan 28 11:33:12 25node2 tengine: [6860]: info: match_graph_event:
> Action STONITH_v1.1_start_0 (16) confirmed on 25node2 (rc=0)
> > > Jan 28 11:33:12 25node2 tengine: [6860]: info: te_pseudo_action: Pseudo
> action 17 fired and confirmed
> > > Jan 28 11:33:12 25node2 tengine: [6860]: info: te_fence_node: Executing
> reboot fencing operation (18) on 25node1 (timeout=50000)
> > > Jan 28 11:33:12 25node2 haclient: on_event:evt:cib_changed
> > > Jan 28 11:33:12 25node2 stonithd: [6323]: info: client tengine [pid:
> 6860] want a STONITH operation RESET to node 25node1.
> > > Jan 28 11:33:12 25node2 stonithd: [6323]: info:
> stonith_operate_locally::2368: sending fencing op (RESET) for 25node1 to
> device external (rsc_id=STONITH_v1.1, pid=14713)
> > > Jan 28 11:33:12 25node2 MFSYS_STONITH_PLUGIN_v1.1[14714]: reset 25node1
> (node2slot=25node1=1,25node2=2,25node3=3; slot=1)
> > > Jan 28 11:33:12 25node2 MFSYS_STONITH_PLUGIN_v1.1[14714]: retry=0
> bladeState=-1 powerstate=-32
> > > Jan 28 11:33:12 25node2 MFSYS_STONITH_PLUGIN_v1.1[14714]: exiting
> script with an rc=0
> > >
> > >
> > > When it fails:
> > >
> > > Jan 30 15:54:27 vs-cb03-5cl crmd: [5116]: info: do_lrm_rsc_op:
> Performing op=CBSTONITH_monitor_0
> key=4:8:469db639-f549-4582-9735-2b5e89d147c2)
> > > Jan 30 15:54:27 vs-cb03-5cl lrmd: [5113]: info: rsc:CBSTONITH: monitor
> > > Jan 30 15:54:27 vs-cb03-5cl crmd: [5116]: info: process_lrm_event: LRM
> operation CBSTONITH_monitor_0 (call=20, rc=7) complete
> > > Jan 30 15:54:27 vs-cb03-5cl cib: [30766]: info: retrieveCib: Reading
> cluster configuration from: /var/lib/heartbeat/crm/cib.xml (digest:
> /var/lib/heartbeat/crm/cib.xml.sig)
> > > Jan 30 15:54:27 vs-cb03-5cl cib: [30766]: info: retrieveCib: Reading
> cluster configuration from: /var/lib/heartbeat/crm/cib.xml (digest:
> /var/lib/heartbeat/crm/cib.xml.sig)
> > > Jan 30 15:54:27 vs-cb03-5cl cib: [30766]: info: retrieveCib: Reading
> cluster configuration from: /var/lib/heartbeat/crm/cib.xml.last (digest:
> /var/lib/heartbeat/crm/cib.xml.sig.last)
> > > Jan 30 15:54:27 vs-cb03-5cl cib: [30766]: info: write_cib_contents:
> Wrote version 0.1175.3 of the CIB to disk (digest:
> 73c66f86b10cd0b1bc58a3beab870faa)
> > > Jan 30 15:54:27 vs-cb03-5cl cib: [30766]: info: retrieveCib: Reading
> cluster configuration from: /var/lib/heartbeat/crm/cib.xml (digest:
> /var/lib/heartbeat/crm/cib.xml.sig)
> > > Jan 30 15:54:27 vs-cb03-5cl cib: [30766]: info: retrieveCib: Reading
> cluster configuration from: /var/lib/heartbeat/crm/cib.xml.last (digest:
> /var/lib/heartbeat/crm/cib.xml.sig.last)
> > > Jan 30 15:54:28 vs-cb03-5cl crmd: [5116]: info: do_lrm_rsc_op:
> Performing op=CBSTONITH_start_0
> key=19:8:469db639-f549-4582-9735-2b5e89d147c2)
> > > Jan 30 15:54:28 vs-cb03-5cl lrmd: [5113]: info: rsc:CBSTONITH: start
> > > Jan 30 15:54:28 vs-cb03-5cl lrmd: [30768]: info: Try to start STONITH
> resource <rsc_id=CBSTONITH> : Device=external/MFSYS_STONITH_PLUGIN_v1.1
> > > Jan 30 15:54:28 vs-cb03-5cl cib: [30767]: info: retrieveCib: Reading
> cluster configuration from: /var/lib/heartbeat/crm/cib.xml (digest:
> /var/lib/heartbeat/crm/cib.xml.sig)
> > > Jan 30 15:54:28 vs-cb03-5cl cib: [30767]: info: retrieveCib: Reading
> cluster configuration from: /var/lib/heartbeat/crm/cib.xml (digest:
> /var/lib/heartbeat/crm/cib.xml.sig)
> > > Jan 30 15:54:28 vs-cb03-5cl cib: [30767]: info: retrieveCib: Reading
> cluster configuration from: /var/lib/heartbeat/crm/cib.xml.last (digest:
> /var/lib/heartbeat/crm/cib.xml.sig.last)
> > > Jan 30 15:54:28 vs-cb03-5cl cib: [30767]: info: write_cib_contents:
> Wrote version 0.1175.5 of the CIB to disk (digest:
> f2c35efbf0c7066043202d754cf607bb)
> > > Jan 30 15:54:28 vs-cb03-5cl cib: [30767]: info: retrieveCib: Reading
> cluster configuration from: /var/lib/heartbeat/crm/cib.xml (digest:
> /var/lib/heartbeat/crm/cib.xml.sig)
> > > Jan 30 15:54:28 vs-cb03-5cl cib: [30767]: info: retrieveCib: Reading
> cluster configuration from: /var/lib/heartbeat/crm/cib.xml.last (digest:
> /var/lib/heartbeat/crm/cib.xml.sig.last)
> > > Jan 30 15:54:29 vs-cb03-5cl MFSYS_STONITH_PLUGIN_v1.1[30769]:
> getconfignames (node2slot=; slot=)
> > > Jan 30 15:54:29 vs-cb03-5cl MFSYS_STONITH_PLUGIN_v1.1[30769]: exiting
> script with an rc=0
> > > Jan 30 15:54:29 vs-cb03-5cl MFSYS_STONITH_PLUGIN_v1.1[30780]: status
> (node2slot=vs-cb03-2cl=2,vs-cb03-5cl=5,vs-cb03-6cl=6; slot=)
> > > Jan 30 15:54:29 vs-cb03-5cl cib: [30795]: info: retrieveCib: Reading
> cluster configuration from: /var/lib/heartbeat/crm/cib.xml (digest:
> /var/lib/heartbeat/crm/cib.xml.sig)
> > > Jan 30 15:54:29 vs-cb03-5cl cib: [30795]: info: retrieveCib: Reading
> cluster configuration from: /var/lib/heartbeat/crm/cib.xml (digest:
> /var/lib/heartbeat/crm/cib.xml.sig)
> > > Jan 30 15:54:29 vs-cb03-5cl cib: [30795]: info: retrieveCib: Reading
> cluster configuration from: /var/lib/heartbeat/crm/cib.xml.last (digest:
> /var/lib/heartbeat/crm/cib.xml.sig.last)
> > > Jan 30 15:54:29 vs-cb03-5cl cib: [30795]: info: write_cib_contents:
> Wrote version 0.1175.7 of the CIB to disk (digest:
> 551519e2dfb361ccf0f4303167997435)
> > > Jan 30 15:54:29 vs-cb03-5cl cib: [30795]: info: retrieveCib: Reading
> cluster configuration from: /var/lib/heartbeat/crm/cib.xml (digest:
> /var/lib/heartbeat/crm/cib.xml.sig)
> > > Jan 30 15:54:29 vs-cb03-5cl cib: [30795]: info: retrieveCib: Reading
> cluster configuration from: /var/lib/heartbeat/crm/cib.xml.last (digest:
> /var/lib/heartbeat/crm/cib.xml.sig.last)
> > > Jan 30 15:54:31 vs-cb03-5cl MFSYS_STONITH_PLUGIN_v1.1[30780]: exiting
> script with an rc=0
> > > Jan 30 15:54:31 vs-cb03-5cl MFSYS_STONITH_PLUGIN_v1.1[30797]: gethosts
> (node2slot=vs-cb03-2cl=2,vs-cb03-5cl=5,vs-cb03-6cl=6; slot=)
> > > Jan 30 15:54:31 vs-cb03-5cl MFSYS_STONITH_PLUGIN_v1.1[30797]: JOE02:
> hostlist = vs-cb03-2cl vs-cb03-5cl vs-cb03-6cl
> > > Jan 30 15:54:31 vs-cb03-5cl MFSYS_STONITH_PLUGIN_v1.1[30797]: exiting
> script with an rc=0
> > > Jan 30 15:54:31 vs-cb03-5cl crmd: [5116]: info: process_lrm_event: LRM
> operation CBSTONITH_start_0 (call=21, rc=0) complete
_______________________________________________
Linux-HA mailing list
Linux-HA@lists.linux-ha.org
http://lists.linux-ha.org/mailman/listinfo/linux-ha
See also: http://linux-ha.org/ReportingProblems
