On Mon, Feb 9, 2009 at 2:54 PM, Dejan Muhamedagic <deja...@fastmail.fm> wrote:
> On Mon, Feb 09, 2009 at 02:09:11PM +0530, Priyanka Ranjan wrote:
> > do you mean STONITH will itself determine whether there is a need for
> > fencing or not and will get executed accordingly?
>
> No. It's the crmd.
>
> > i am completely new to STONITH and am configuring it for the first
> > time. i am doing the following. could you please verify it?
> >
> > i want to configure stonith external/riloe. to configure riloe i need
> > to give ilo_hostname, ilo_password etc., which are specific to each
> > host. i have two nodes, and that's why i am adding two primitive
> > stonith resources (stonith-node1 and stonith-node2). stonith-node1
> > will keep all the riloe parameter details of node1 and stonith-node2
> > will keep all the riloe parameter details of node2. now i will add a
> > resource constraint to make stonith-node1 always run on node1, and the
> > same for stonith-node2.
>
> It should be the other way around: stonith-node1 on node2.

just want to clear one thing here. suppose i have a four-node cluster: do i
then need to configure and run stonith-node1 (which keeps the info about
node1) on all three other nodes (i.e. node2, node3 and node4)?

> > Please let me know your view on the above steps.
> >
> > one thing more: does a node need quorum to fence another node?
>
> Yes. But being useless for two-node clusters it should be ignored.
>
> > in my case i have only two nodes, so will a node be able to stonith
> > the other in a split-brain situation? as in a split-brain situation
> > both nodes will lose quorum.
>
> It is either ignored by a special quorum plugin in older versions
> or by pacemaker in the newer versions (no-quorum-policy=ignore).
>
> Thanks,
>
> Dejan
>
> > Thanks a lot for your help.
> >
> > On Mon, Feb 9, 2009 at 1:52 PM, Dejan Muhamedagic <deja...@fastmail.fm> wrote:
> >
> > > On Thu, Feb 05, 2009 at 10:21:04AM -0800, Gruher, Joseph R wrote:
> > > > Thanks for the input.  What could cause the STONITH request to
> > > > not be sent from tengine?
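[Editorial note: the layout Dejan describes above — each stonith primitive kept off the node it fences, and quorum ignored for the two-node case — could be sketched in crm shell syntax roughly as below. The resource names, ILO addresses and credentials are placeholders, and the exact external/riloe parameter names should be checked against the plugin's own documentation. A -inf location rule also answers the four-node question: the primitive may then run on any node except the one it fences.]

```
primitive stonith-node1 stonith:external/riloe \
        params hostlist="node1" ilo_hostname="ilo-node1" \
               ilo_user="admin" ilo_password="secret"
primitive stonith-node2 stonith:external/riloe \
        params hostlist="node2" ilo_hostname="ilo-node2" \
               ilo_user="admin" ilo_password="secret"
# a node must never be the one to fence itself
location l-stonith-node1 stonith-node1 -inf: node1
location l-stonith-node2 stonith-node2 -inf: node2
# two-node cluster: ignore loss of quorum, otherwise neither
# survivor could fence the other after a split
property no-quorum-policy="ignore"
```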
> > >
> > > Nothing.  If there is a need for fencing, that is.  If you think
> > > that there should've been one sent but wasn't, please use
> > > hb_report and create a bugzilla.
> > >
> > > Thanks,
> > >
> > > Dejan
> > >
> > > >
> > > > Thanks,
> > > > Joe
> > > >
> > > > -----Original Message-----
> > > > From: linux-ha-boun...@lists.linux-ha.org [mailto:linux-ha-boun...@lists.linux-ha.org] On Behalf Of Dejan Muhamedagic
> > > > Sent: Thursday, February 05, 2009 10:13 AM
> > > > To: General Linux-HA mailing list
> > > > Subject: Re: [Linux-HA] Help with STONITH Plugin
> > > >
> > > > On Mon, Feb 02, 2009 at 03:26:20PM -0800, Gruher, Joseph R wrote:
> > > > > Hello-
> > > > >
> > > > > We are developing our own STONITH plugin for our blade server
> > > > > and having an issue we are hoping this list can help us with.
> > > > > Our STONITH plugin script works well some of the time (the bad
> > > > > node is reset and the resources fail over) but does not work
> > > > > some of the time (resources never fail over).  We have been
> > > > > looking in the messages log (see examples below) and it appears
> > > > > in the non-working case that the reset call to our plugin never
> > > > > occurs, even though the getconfignames, status and gethosts
> > > > > calls that normally lead up to it are made.  Are there any
> > > > > common problems that could cause this behavior?  Any suggestions
> > > > > how we can continue to debug this problem?  Other logs we
> > > > > should be looking in?  Would it be useful to send our plugin or
> > > > > any other files from the system?
> > > > >
> > > > > Thanks very much for any and all input.  We are testing with SLES10.2 x64.
> > > >
> > > > The "works" log looks fine.
> > > >
> > > > In the "not works" log, there's no request for stonith from
> > > > tengine (look for tengine.*reboot).
> > > >
> > > > Thanks,
> > > >
> > > > Dejan
> > > >
> > > > > -Joe
> > > > >
> > > > >
> > > > > When it works:
> > > > >
> > > > > Jan 28 11:33:09 25node2 crmd: [6325]: info: do_lrm_rsc_op: Performing op=STONITH_v1.1_start_0 key=16:47:676b4bb5-523b-49c5-a5f3-a228f2af8149)
> > > > > Jan 28 11:33:09 25node2 lrmd: [6322]: info: rsc:STONITH_v1.1: start
> > > > > Jan 28 11:33:09 25node2 lrmd: [14675]: info: Try to start STONITH resource <rsc_id=STONITH_v1.1> : Device=external/MFSYS_STONITH_PLUGIN_v1.1
> > > > > Jan 28 11:33:09 25node2 MFSYS_STONITH_PLUGIN_v1.1[14676]: getconfignames (node2slot=; slot=)
> > > > > Jan 28 11:33:09 25node2 MFSYS_STONITH_PLUGIN_v1.1[14676]: exiting script with an rc=0
> > > > > Jan 28 11:33:09 25node2 MFSYS_STONITH_PLUGIN_v1.1[14687]: status (node2slot=25node1=1,25node2=2,25node3=3; slot=)
> > > > > Jan 28 11:33:09 25node2 ccm: [6320]: debug: quorum plugin: majority
> > > > > Jan 28 11:33:09 25node2 ccm: [6320]: debug: cluster:linux-ha, member_count=2, member_quorum_votes=200
> > > > > Jan 28 11:33:09 25node2 cib: [6321]: info: mem_handle_event: Got an event OC_EV_MS_INVALID from ccm
> > > > > Jan 28 11:33:09 25node2 ccm: [6320]: debug: total_node_count=3, total_quorum_votes=300
> > > > > Jan 28 11:33:09 25node2 cib: [6321]: info: mem_handle_event: no mbr_track info
> > > > > Jan 28 11:33:09 25node2 ccm: [6320]: debug: quorum plugin: majority
> > > > > Jan 28 11:33:09 25node2 cib: [6321]: info: mem_handle_event: Got an event OC_EV_MS_NEW_MEMBERSHIP from ccm
> > > > > Jan 28 11:33:09 25node2 ccm: [6320]: debug: cluster:linux-ha, member_count=2, member_quorum_votes=200
> > > > > Jan 28 11:33:09 25node2 crmd: [6325]: info: mem_handle_event: Got an event OC_EV_MS_INVALID from ccm
> > > > > Jan 28 11:33:09 25node2 cib: [6321]: info: mem_handle_event: instance=16, nodes=2, new=0, lost=1, n_idx=0, new_idx=2, old_idx=5
> > > > > Jan 28 11:33:09 25node2 ccm: [6320]: debug: total_node_count=3, total_quorum_votes=300
> > > > > Jan 28 11:33:09 25node2 crmd: [6325]: info: mem_handle_event: no mbr_track info
> > > > > Jan 28 11:33:09 25node2 cib: [6321]: info: cib_ccm_msg_callback: LOST: 25node1
> > > > > Jan 28 11:33:09 25node2 crmd: [6325]: info: mem_handle_event: Got an event OC_EV_MS_NEW_MEMBERSHIP from ccm
> > > > > Jan 28 11:33:09 25node2 cib: [6321]: info: cib_ccm_msg_callback: PEER: 25node2
> > > > > Jan 28 11:33:09 25node2 crmd: [6325]: info: mem_handle_event: instance=16, nodes=2, new=0, lost=1, n_idx=0, new_idx=2, old_idx=5
> > > > > Jan 28 11:33:09 25node2 cib: [6321]: info: cib_ccm_msg_callback: PEER: 25node3
> > > > > Jan 28 11:33:09 25node2 crmd: [6325]: info: crmd_ccm_msg_callback: Quorum (re)attained after event=NEW MEMBERSHIP (id=16)
> > > > > Jan 28 11:33:09 25node2 crmd: [6325]: info: ccm_event_detail: NEW MEMBERSHIP: trans=16, nodes=2, new=0, lost=1 n_idx=0, new_idx=2, old_idx=5
> > > > > Jan 28 11:33:09 25node2 crmd: [6325]: info: ccm_event_detail: CURRENT: 25node2 [nodeid=1, born=2]
> > > > > Jan 28 11:33:09 25node2 crmd: [6325]: info: ccm_event_detail: CURRENT: 25node3 [nodeid=2, born=3]
> > > > > Jan 28 11:33:09 25node2 crmd: [6325]: info: ccm_event_detail: LOST: 25node1 [nodeid=0, born=15]
> > > > > Jan 28 11:33:09 25node2 mgmtd: [6326]: ERROR: unpack_rsc_op: Remapping WebServer_monitor_0 (rc=1) on 25node2 to an ERROR
> > > > > Jan 28 11:33:09 25node2 mgmtd: [6326]: ERROR: unpack_rsc_op: Remapping WebServer_monitor_0 (rc=1) on 25node3 to an ERROR
> > > > > Jan 28 11:33:09 25node2 mgmtd: [6326]: ERROR: unpack_rsc_op: Remapping WebServer_monitor_0 (rc=1) on 25node2 to an ERROR
> > > > > Jan 28 11:33:09 25node2 mgmtd: [6326]: ERROR: unpack_rsc_op: Remapping WebServer_monitor_0 (rc=1) on 25node3 to an ERROR
> > > > > Jan 28 11:33:10 25node2 haclient: on_event: from message queue: evt:cib_changed
> > > > > Jan 28 11:33:11 25node2 MFSYS_STONITH_PLUGIN_v1.1[14687]: exiting script with an rc=0
> > > > > Jan 28 11:33:12 25node2 MFSYS_STONITH_PLUGIN_v1.1[14703]: gethosts (node2slot=25node1=1,25node2=2,25node3=3; slot=)
> > > > > Jan 28 11:33:12 25node2 MFSYS_STONITH_PLUGIN_v1.1[14703]: exiting script with an rc=0
> > > > > Jan 28 11:33:12 25node2 crmd: [6325]: info: process_lrm_event: LRM operation STONITH_v1.1_start_0 (call=24, rc=0) complete
> > > > > Jan 28 11:33:12 25node2 tengine: [6860]: info: match_graph_event: Action STONITH_v1.1_start_0 (16) confirmed on 25node2 (rc=0)
> > > > > Jan 28 11:33:12 25node2 tengine: [6860]: info: te_pseudo_action: Pseudo action 17 fired and confirmed
> > > > > Jan 28 11:33:12 25node2 tengine: [6860]: info: te_fence_node: Executing reboot fencing operation (18) on 25node1 (timeout=50000)
> > > > > Jan 28 11:33:12 25node2 haclient: on_event:evt:cib_changed
> > > > > Jan 28 11:33:12 25node2 stonithd: [6323]: info: client tengine [pid: 6860] want a STONITH operation RESET to node 25node1.
> > > > > Jan 28 11:33:12 25node2 stonithd: [6323]: info: stonith_operate_locally::2368: sending fencing op (RESET) for 25node1 to device external (rsc_id=STONITH_v1.1, pid=14713)
> > > > > Jan 28 11:33:12 25node2 MFSYS_STONITH_PLUGIN_v1.1[14714]: reset 25node1 (node2slot=25node1=1,25node2=2,25node3=3; slot=1)
> > > > > Jan 28 11:33:12 25node2 MFSYS_STONITH_PLUGIN_v1.1[14714]: retry=0 bladeState=-1 powerstate=-32
> > > > > Jan 28 11:33:12 25node2 MFSYS_STONITH_PLUGIN_v1.1[14714]: exiting script with an rc=0
> > > > >
> > > > >
> > > > > When it fails:
> > > > >
> > > > > Jan 30 15:54:27 vs-cb03-5cl crmd: [5116]: info: do_lrm_rsc_op: Performing op=CBSTONITH_monitor_0 key=4:8:469db639-f549-4582-9735-2b5e89d147c2)
> > > > > Jan 30 15:54:27 vs-cb03-5cl lrmd: [5113]: info: rsc:CBSTONITH: monitor
> > > > > Jan 30 15:54:27 vs-cb03-5cl crmd: [5116]: info: process_lrm_event: LRM operation CBSTONITH_monitor_0 (call=20, rc=7) complete
> > > > > Jan 30 15:54:27 vs-cb03-5cl cib: [30766]: info: retrieveCib: Reading cluster configuration from: /var/lib/heartbeat/crm/cib.xml (digest: /var/lib/heartbeat/crm/cib.xml.sig)
> > > > > Jan 30 15:54:27 vs-cb03-5cl cib: [30766]: info: retrieveCib: Reading cluster configuration from: /var/lib/heartbeat/crm/cib.xml (digest: /var/lib/heartbeat/crm/cib.xml.sig)
> > > > > Jan 30 15:54:27 vs-cb03-5cl cib: [30766]: info: retrieveCib: Reading cluster configuration from: /var/lib/heartbeat/crm/cib.xml.last (digest: /var/lib/heartbeat/crm/cib.xml.sig.last)
> > > > > Jan 30 15:54:27 vs-cb03-5cl cib: [30766]: info: write_cib_contents: Wrote version 0.1175.3 of the CIB to disk (digest: 73c66f86b10cd0b1bc58a3beab870faa)
> > > > > Jan 30 15:54:27 vs-cb03-5cl cib: [30766]: info: retrieveCib: Reading cluster configuration from: /var/lib/heartbeat/crm/cib.xml (digest: /var/lib/heartbeat/crm/cib.xml.sig)
> > > > > Jan 30 15:54:27 vs-cb03-5cl cib: [30766]: info: retrieveCib: Reading cluster configuration from: /var/lib/heartbeat/crm/cib.xml.last (digest: /var/lib/heartbeat/crm/cib.xml.sig.last)
> > > > > Jan 30 15:54:28 vs-cb03-5cl crmd: [5116]: info: do_lrm_rsc_op: Performing op=CBSTONITH_start_0 key=19:8:469db639-f549-4582-9735-2b5e89d147c2)
> > > > > Jan 30 15:54:28 vs-cb03-5cl lrmd: [5113]: info: rsc:CBSTONITH: start
> > > > > Jan 30 15:54:28 vs-cb03-5cl lrmd: [30768]: info: Try to start STONITH resource <rsc_id=CBSTONITH> : Device=external/MFSYS_STONITH_PLUGIN_v1.1
> > > > > Jan 30 15:54:28 vs-cb03-5cl cib: [30767]: info: retrieveCib: Reading cluster configuration from: /var/lib/heartbeat/crm/cib.xml (digest: /var/lib/heartbeat/crm/cib.xml.sig)
> > > > > Jan 30 15:54:28 vs-cb03-5cl cib: [30767]: info: retrieveCib: Reading cluster configuration from: /var/lib/heartbeat/crm/cib.xml (digest: /var/lib/heartbeat/crm/cib.xml.sig)
> > > > > Jan 30 15:54:28 vs-cb03-5cl cib: [30767]: info: retrieveCib: Reading cluster configuration from: /var/lib/heartbeat/crm/cib.xml.last (digest: /var/lib/heartbeat/crm/cib.xml.sig.last)
> > > > > Jan 30 15:54:28 vs-cb03-5cl cib: [30767]: info: write_cib_contents: Wrote version 0.1175.5 of the CIB to disk (digest: f2c35efbf0c7066043202d754cf607bb)
> > > > > Jan 30 15:54:28 vs-cb03-5cl cib: [30767]: info: retrieveCib: Reading cluster configuration from: /var/lib/heartbeat/crm/cib.xml (digest: /var/lib/heartbeat/crm/cib.xml.sig)
> > > > > Jan 30 15:54:28 vs-cb03-5cl cib: [30767]: info: retrieveCib: Reading cluster configuration from: /var/lib/heartbeat/crm/cib.xml.last (digest: /var/lib/heartbeat/crm/cib.xml.sig.last)
> > > > > Jan 30 15:54:29 vs-cb03-5cl MFSYS_STONITH_PLUGIN_v1.1[30769]: getconfignames (node2slot=; slot=)
> > > > > Jan 30 15:54:29 vs-cb03-5cl MFSYS_STONITH_PLUGIN_v1.1[30769]: exiting script with an rc=0
> > > > > Jan 30 15:54:29 vs-cb03-5cl MFSYS_STONITH_PLUGIN_v1.1[30780]: status (node2slot=vs-cb03-2cl=2,vs-cb03-5cl=5,vs-cb03-6cl=6; slot=)
> > > > > Jan 30 15:54:29 vs-cb03-5cl cib: [30795]: info: retrieveCib: Reading cluster configuration from: /var/lib/heartbeat/crm/cib.xml (digest: /var/lib/heartbeat/crm/cib.xml.sig)
> > > > > Jan 30 15:54:29 vs-cb03-5cl cib: [30795]: info: retrieveCib: Reading cluster configuration from: /var/lib/heartbeat/crm/cib.xml (digest: /var/lib/heartbeat/crm/cib.xml.sig)
> > > > > Jan 30 15:54:29 vs-cb03-5cl cib: [30795]: info: retrieveCib: Reading cluster configuration from: /var/lib/heartbeat/crm/cib.xml.last (digest: /var/lib/heartbeat/crm/cib.xml.sig.last)
> > > > > Jan 30 15:54:29 vs-cb03-5cl cib: [30795]: info: write_cib_contents: Wrote version 0.1175.7 of the CIB to disk (digest: 551519e2dfb361ccf0f4303167997435)
> > > > > Jan 30 15:54:29 vs-cb03-5cl cib: [30795]: info: retrieveCib: Reading cluster configuration from: /var/lib/heartbeat/crm/cib.xml (digest: /var/lib/heartbeat/crm/cib.xml.sig)
> > > > > Jan 30 15:54:29 vs-cb03-5cl cib: [30795]: info: retrieveCib: Reading cluster configuration from: /var/lib/heartbeat/crm/cib.xml.last (digest: /var/lib/heartbeat/crm/cib.xml.sig.last)
> > > > > Jan 30 15:54:31 vs-cb03-5cl MFSYS_STONITH_PLUGIN_v1.1[30780]: exiting script with an rc=0
> > > > > Jan 30 15:54:31 vs-cb03-5cl MFSYS_STONITH_PLUGIN_v1.1[30797]: gethosts (node2slot=vs-cb03-2cl=2,vs-cb03-5cl=5,vs-cb03-6cl=6; slot=)
> > > > > Jan 30 15:54:31 vs-cb03-5cl MFSYS_STONITH_PLUGIN_v1.1[30797]: JOE02: hostlist = vs-cb03-2cl vs-cb03-5cl vs-cb03-6cl
> > > > > Jan 30 15:54:31 vs-cb03-5cl MFSYS_STONITH_PLUGIN_v1.1[30797]: exiting script with an rc=0
> > > > > Jan 30 15:54:31 vs-cb03-5cl crmd: [5116]: info: process_lrm_event: LRM operation CBSTONITH_start_0 (call=21, rc=0) complete
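[Editorial note: the subcommand sequence visible in both logs — getconfignames, status, gethosts, then (only in the working case) reset, each exiting with rc=0 — follows the external stonith plugin convention. A minimal sketch of such a dispatcher is below. The hostnames are placeholders, the reset branch only echoes instead of touching real hardware, and the logic is wrapped in a function purely so it can be exercised inline; a real plugin is a standalone script that uses exit codes.]

```shell
#!/bin/sh
# Sketch of an external stonith plugin dispatcher.
# Placeholders: hostlist contents and the stubbed power-control call.

stonith_plugin() {
    hostlist="node1 node2"          # placeholder managed hosts
    case "$1" in
    getconfignames)
        # Names of the parameters the plugin expects from the cluster config.
        echo "hostlist"
        return 0 ;;
    gethosts)
        # One controlled hostname per line.
        for h in $hostlist; do echo "$h"; done
        return 0 ;;
    status)
        # rc=0 means the fencing device is reachable.
        return 0 ;;
    reset|off|on)
        # "$2" is the victim node; a real plugin calls the BMC/management
        # module here and returns its success or failure.
        echo "would $1 $2"
        return 0 ;;
    *)
        # Unknown subcommand.
        return 1 ;;
    esac
}

# Example invocations, mirroring what stonithd does:
stonith_plugin gethosts        # prints "node1" and "node2"
stonith_plugin reset node1     # prints "would reset node1"
```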
> > > > > _______________________________________________
> > > > > Linux-HA mailing list
> > > > > Linux-HA@lists.linux-ha.org
> > > > > http://lists.linux-ha.org/mailman/listinfo/linux-ha
> > > > > See also: http://linux-ha.org/ReportingProblems