OK, I've set the stonith action to "poweroff" and I already had the quorum action set to "ignore". The "poweroff" makes it much easier to re-set "stonith-enabled" to "false" so that I can get two systems online again. ;-)
However, I was more hoping to be able to reboot the fenced system without triggering a reboot (or halt) of the working system.

Here are some specifics:

SLES11 HAE (GA)
external/ipmi
two HA servers

<crm_config>
  <cluster_property_set id="cib-bootstrap-options">
    <nvpair id="cib-bootstrap-options-dc-version" name="dc-version" value="1.0.3-0080ec086ae9c20ad5c4c3562000c0ad68374f0a"/>
    <nvpair id="cib-bootstrap-options-expected-quorum-votes" name="expected-quorum-votes" value="2"/>
    <nvpair id="cib-bootstrap-options-last-lrm-refresh" name="last-lrm-refresh" value="1242661586"/>
    <nvpair id="cib-bootstrap-options-no_quorum_policy" name="no_quorum_policy" value="ignore"/>
    <nvpair id="cib-bootstrap-options-stonith-enabled" name="stonith-enabled" value="true"/>
    <nvpair id="nvpair-a8fa01f7-fd6c-4e9b-adf6-0e48250691f1" name="stonith-action" value="poweroff"/>
    <nvpair id="nvpair-1d2c923d-7619-4b45-989a-698357f9f8cb" name="no-quorum-policy" value="ignore"/>
  </cluster_property_set>

And, the two stonith resources:

<primitive class="stonith" id="ipmi_stonith_hikari" type="external/ipmi">
  <meta_attributes id="ipmi_stonith_hikari-meta_attributes"/>
  <operations id="ipmi_stonith_hikari-operations">
    <op id="ipmi_stonith_hikari-op-monitor-15" interval="30" name="monitor" start-delay="30" timeout="30"/>
  </operations>
  <instance_attributes id="ipmi_stonith_hikari-instance_attributes">
    <nvpair id="nvpair-d95c4018-1ebc-447b-9028-050e68c9929c" name="hostname" value="hikari"/>
    <nvpair id="nvpair-3aca66aa-bb82-43ec-8b63-e936b2507fa3" name="ipaddr" value="172.16.1.247"/>
    <nvpair id="nvpair-3f623098-c266-4132-8d9c-77744e0e8713" name="userid" value="ADMIN"/>
    <nvpair id="nvpair-04e6a6d7-6541-45d4-8d36-9768e240e79d" name="passwd" value="ADMIN"/>
    <nvpair id="nvpair-1a90ef3c-3b67-41c2-98cf-58b8a2f9cfe0" name="interface" value="lanplus"/>
  </instance_attributes>
</primitive>
<primitive class="stonith" id="ipmi_stonith_hikari2" type="external/ipmi">
  <meta_attributes id="ipmi_stonith_hikari2-meta_attributes">
    <nvpair id="nvpair-88049439-39e2-459d-9820-78cdeb9ae282" name="target-role" value="started"/>
  </meta_attributes>
  <operations id="ipmi_stonith_hikari2-operations">
    <op id="ipmi_stonith_hikari2-op-monitor-15" interval="30" name="monitor" start-delay="30" timeout="30"/>
  </operations>
  <instance_attributes id="ipmi_stonith_hikari2-instance_attributes">
    <nvpair id="nvpair-c4b4e4ce-6f9a-4a8d-a7fb-b8726f09ccf0" name="hostname" value="hikari2"/>
    <nvpair id="nvpair-e9d42aca-110f-4308-a3dd-645d793e49d3" name="ipaddr" value="172.16.1.248"/>
    <nvpair id="nvpair-31b086de-5209-4361-a4b8-55460cad95a8" name="userid" value="ADMIN"/>
    <nvpair id="nvpair-5b3c6b97-a49e-4d18-beea-6d7aaec000fa" name="passwd" value="ADMIN"/>
    <nvpair id="nvpair-6f98c068-7b2e-4309-8f5b-2c7c2527cc93" name="interface" value="lanplus"/>
  </instance_attributes>
</primitive>

And the relevant pair of constraints:

<rsc_location id="stonith_hikari_on_hikari2" node="hikari" rsc="ipmi_stonith_hikari" score="-INFINITY"/>
<rsc_location id="stonith_hikari2_on_hikari" node="hikari2" rsc="ipmi_stonith_hikari2" score="-INFINITY"/>

Any suggestions as to what needs changing so that the stonith deathmarch can be avoided?

Cheers and thanks,
Bob Haxo
SGI

On Fri, 2009-05-15 at 20:26 -0500, Karl Katzke wrote:
> Bob, as we've discussed a few other times recently, when you're
> testing (and depending on your preference in production), you may want
> to set the stonith policy to 'poweroff' as opposed to 'reboot'.
> Also, if you have a two-node cluster, pacemaker depends on quorum and
> the loss thereof creates another stonith event. You'll want to set the
> loss of quorum action to 'ignore'.
> ... in short, RTFM: http://www.clusterlabs.org/wiki/Documentation --
> Pacemaker Configuration Explained 1.0 has *everything* you need to
> know in it.
>
>
> -K
>
>
> ---
> Karl Katzke
> Systems Analyst II
> TAMU - DRGS
>
>
> >>> On 5/15/2009 at 7:22 PM, in message
> <1242433367.21186.4.ca...@nalu.engr.sgi.com>, Bob Haxo <bh...@sgi.com> wrote:
>
> > Ok, never mind this question. "ifdown interface" works nicely to
> > trigger STONITH action.
> >
> > Unfortunately (if I may ask a new question) ... I now have one server
> > rebooting, then the other rebooting, and back to the first rebooting in
> > what looks to be an endless loop of reboots.
> >
> > Suggestions?
> >
> > Cheers,
> > Bob Haxo
> > SGI
> >
> > On Fri, 2009-05-15 at 16:53 -0700, Bob Haxo wrote:
> >
> > > Greetings,
> > >
> > > What manual administrative actions can be used to trigger STONITH
> > > action?
> > >
> > > I have created a pair of STONITH resources (external/ipmi) and would
> > > like to test that these resources work as expected (which, if I
> > > understand the default correctly, is to reboot the node).
> > >
> > > Thanks,
> > > Bob Haxo
> > > SGI
> > >
> > > SLES11 HAE
> > >
> > > _______________________________________________
> > > Pacemaker mailing list
> > > Pacemaker@oss.clusterlabs.org
> > > http://oss.clusterlabs.org/mailman/listinfo/pacemaker
> >
> >
>
> _______________________________________________
> Pacemaker mailing list
> Pacemaker@oss.clusterlabs.org
> http://oss.clusterlabs.org/mailman/listinfo/pacemaker
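
For reference, the two settings Karl suggests (stonith action 'poweroff', loss-of-quorum action 'ignore') would sit in the cluster property set roughly as shown here. This is only a minimal sketch, not a verified configuration: it assumes the underscore-spelled "no_quorum_policy" entry in the posted CIB is a leftover and that the hyphenated "no-quorum-policy" is the name Pacemaker 1.0 reads. The last two nvpair ids are made up for illustration, and dc-version and last-lrm-refresh are left out for brevity.

```xml
<!-- Sketch only: the posted cluster options with each option listed once.
     Assumption: the underscore-spelled no_quorum_policy entry is redundant.
     The ids on the last two nvpairs are hypothetical. -->
<cluster_property_set id="cib-bootstrap-options">
  <nvpair id="cib-bootstrap-options-expected-quorum-votes" name="expected-quorum-votes" value="2"/>
  <nvpair id="cib-bootstrap-options-stonith-enabled" name="stonith-enabled" value="true"/>
  <!-- Power the fenced node off instead of rebooting it -->
  <nvpair id="cib-bootstrap-options-stonith-action" name="stonith-action" value="poweroff"/>
  <!-- Two-node cluster: keep running resources when quorum is lost -->
  <nvpair id="cib-bootstrap-options-no-quorum-policy" name="no-quorum-policy" value="ignore"/>
</cluster_property_set>
```

With "poweroff" the fenced node stays down until someone powers it back on, which is what breaks the alternating reboots described earlier in the thread; it does not by itself answer how to bring the fenced node back without the survivor being fenced in turn.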