Re: [Pacemaker] trigger STONITH for testing purposes

2009-06-03 Thread Andrew Beekhof
2009/6/3 Yan Gao :
> On Wed, 2009-06-03 at 09:26 +0200, Andrew Beekhof wrote:
>> 2009/6/3 Yan Gao :

>> > Andrew,
>> > If we execute crm_mon without "-r", the resources that had been running
>> > on the uncleanly offline node will be hidden.
>>
>> Even when stonith-enabled is set to true?
> Yes, when the node is "uncleanly offline" -- before it has been stonithed,
> or when no stonith resource is configured. As far as I can tell, this is
> the only situation in which the inconsistency happens.

Oh, if that's happening then crm_mon is definitely wrong.
Sorry, I misunderstood the problem.

___
Pacemaker mailing list
Pacemaker@oss.clusterlabs.org
http://oss.clusterlabs.org/mailman/listinfo/pacemaker


Re: [Pacemaker] trigger STONITH for testing purposes

2009-06-03 Thread Yan Gao
On Wed, 2009-06-03 at 09:26 +0200, Andrew Beekhof wrote:
> 2009/6/3 Yan Gao :
> > On Fri, 2009-05-22 at 12:33 +0200, Andrew Beekhof wrote:
> >> On Wed, May 20, 2009 at 6:39 PM, Bob Haxo  wrote:
> >> > Hi Andrew,
> >> >
> >> > I'd say you removed no-quorum-policy=ignore
> >> >
> >> > Actually, the pair of no_quorum_policy and no-quorum-policy are set to
> >> > "ignore", and expected-quorum-votes is set to "2":
> >> >
> >> >   <crm_config>
> >> >     <cluster_property_set id="...">
> >> >       ...
> >> >       <nvpair id="..." name="expected-quorum-votes" value="2"/>
> >> >       <nvpair id="..." name="no_quorum_policy" value="ignore"/>
> >> >       <nvpair id="..." name="no-quorum-policy" value="ignore"/>
> >> >       ...
> >> >     </cluster_property_set>
> >> >   </crm_config>
> >> >
> >> >
> >> > Removing the no-quorum-policy=ignore and no_quorum_policy=ignore (as in,
> >> > deleting the variables) left the cluster unable to fail over with either
> >> > an ifdown iface or with a node reboot.  The state displayed by the GUI
> >> > did not agree with the state displayed by crm_mon (the GUI showed the
> >> > ifdown or rebooted node as still controlling resources, whereas crm_mon
> >> > showed the resources unavailable ... both showed the inaccessible node
> >> > as offline).
> >>
> >> Assuming stonith-enabled was set to false, crm_mon is correct as the
> >> cluster assumes that the node is cleanly down*.
> >> You should file a bug for the GUI in that case.
> > It happens when a node is uncleanly offline, while the resources are
> > still seen running on the node (according to rsc->running_on), and the
> > resources' role is still "Started".
> >
> > Changed in mgmtd:
> > http://hg.clusterlabs.org/pacemaker/pygui/rev/f6b91f133ce8
> >
> > In that case, it regards the resources' status as "unclean":
> > ..
> > if (g_list_length(rsc->running_on) > 0
> >         && rsc->fns->active(rsc, TRUE) == FALSE) {
> >     strncat(buf, "unclean", sizeof(buf)-strlen(buf)-1);
> > ..
> >
> >
> > Andrew,
> > If we execute crm_mon without "-r", the resources that had been running
> > on the uncleanly offline node will be hidden.
> 
> Even when stonith-enabled is set to true?
Yes, when the node is "uncleanly offline" -- before it has been stonithed,
or when no stonith resource is configured. As far as I can tell, this is
the only situation in which the inconsistency happens.

> 
> > While with "-r", the
> > primitive resources will be shown as "Started" on that node.
> > "crm_resource -W"  has the same behavior.
> >
> > That's inconsistent. Perhaps we also need to consider if resources are
> > "active" when those options are enabled?
> >

-- 
Regards,
Yan Gao
China R&D Software Engineer
y...@novell.com

Novell, Inc.
Making IT Work As One™


___
Pacemaker mailing list
Pacemaker@oss.clusterlabs.org
http://oss.clusterlabs.org/mailman/listinfo/pacemaker


Re: [Pacemaker] trigger STONITH for testing purposes

2009-06-03 Thread Andrew Beekhof
2009/6/3 Yan Gao :
> On Fri, 2009-05-22 at 12:33 +0200, Andrew Beekhof wrote:
>> On Wed, May 20, 2009 at 6:39 PM, Bob Haxo  wrote:
>> > Hi Andrew,
>> >
>> > I'd say you removed no-quorum-policy=ignore
>> >
>> > Actually, the pair of no_quorum_policy and no-quorum-policy are set to
>> > "ignore", and expected-quorum-votes is set to "2":
>> >
>> >   <crm_config>
>> >     <cluster_property_set id="...">
>> >       ...
>> >       <nvpair id="..." name="expected-quorum-votes" value="2"/>
>> >       <nvpair id="..." name="no_quorum_policy" value="ignore"/>
>> >       <nvpair id="..." name="no-quorum-policy" value="ignore"/>
>> >       ...
>> >     </cluster_property_set>
>> >   </crm_config>
>> >
>> > Removing the no-quorum-policy=ignore and no_quorum_policy=ignore (as in,
>> > deleting the variables) left the cluster unable to fail over with either an
>> > ifdown iface or with a node reboot.  The state displayed by the GUI did not
>> > agree with the state displayed by crm_mon (the GUI showed the ifdown or
>> > rebooted node as still controlling resources, whereas crm_mon showed the
>> > resources unavailable ... both showed the inaccessible node as offline).
>>
>> Assuming stonith-enabled was set to false, crm_mon is correct as the
>> cluster assumes that the node is cleanly down*.
>> You should file a bug for the GUI in that case.
> It happens when a node is uncleanly offline, while the resources are
> still seen running on the node (according to rsc->running_on), and the
> resources' role is still "Started".
>
> Changed in mgmtd:
> http://hg.clusterlabs.org/pacemaker/pygui/rev/f6b91f133ce8
>
> In that case, it regards the resources' status as "unclean":
> ..
> if (g_list_length(rsc->running_on) > 0
>         && rsc->fns->active(rsc, TRUE) == FALSE) {
>     strncat(buf, "unclean", sizeof(buf)-strlen(buf)-1);
> ..
>
>
> Andrew,
> If we execute crm_mon without "-r", the resources that had been running
> on the uncleanly offline node will be hidden.

Even when stonith-enabled is set to true?

> While with "-r", the
> primitive resources will be shown as "Started" on that node.
> "crm_resource -W"  has the same behavior.
>
> That's inconsistent. Perhaps we also need to consider if resources are
> "active" when those options are enabled?
>
> --
> Regards,
> Yan Gao
> China R&D Software Engineer
> y...@novell.com
>
> Novell, Inc.
> Making IT Work As One™
>
>
> ___
> Pacemaker mailing list
> Pacemaker@oss.clusterlabs.org
> http://oss.clusterlabs.org/mailman/listinfo/pacemaker
>

___
Pacemaker mailing list
Pacemaker@oss.clusterlabs.org
http://oss.clusterlabs.org/mailman/listinfo/pacemaker


Re: [Pacemaker] trigger STONITH for testing purposes

2009-06-02 Thread Yan Gao
On Fri, 2009-05-22 at 12:33 +0200, Andrew Beekhof wrote: 
> On Wed, May 20, 2009 at 6:39 PM, Bob Haxo  wrote:
> > Hi Andrew,
> >
> > I'd say you removed no-quorum-policy=ignore
> >
> > Actually, the pair of no_quorum_policy and no-quorum-policy are set to
> > "ignore", and expected-quorum-votes is set to "2":
> >
> >   <crm_config>
> >     <cluster_property_set id="...">
> >       ...
> >       <nvpair id="..." name="expected-quorum-votes" value="2"/>
> >       <nvpair id="..." name="no_quorum_policy" value="ignore"/>
> >       <nvpair id="..." name="no-quorum-policy" value="ignore"/>
> >       ...
> >     </cluster_property_set>
> >   </crm_config>
> >
> > Removing the no-quorum-policy=ignore and no_quorum_policy=ignore (as in,
> > deleting the variables) left the cluster unable to fail over with either an
> > ifdown iface or with a node reboot.  The state displayed by the GUI did not
> > agree with the state displayed by crm_mon (the GUI showed the ifdown or
> > rebooted node as still controlling resources, whereas crm_mon showed the
> > resources unavailable ... both showed the inaccessible node as offline).
> 
> Assuming stonith-enabled was set to false, crm_mon is correct as the
> cluster assumes that the node is cleanly down*.
> You should file a bug for the GUI in that case.
It happens when a node is uncleanly offline, while the resources are
still seen running on the node (according to rsc->running_on), and the
resources' role is still "Started".

Changed in mgmtd:
http://hg.clusterlabs.org/pacemaker/pygui/rev/f6b91f133ce8

In that case, it regards the resources' status as "unclean":
..
if (g_list_length(rsc->running_on) > 0
        && rsc->fns->active(rsc, TRUE) == FALSE) {
    strncat(buf, "unclean", sizeof(buf)-strlen(buf)-1);
..


Andrew,
If we execute crm_mon without "-r", the resources that had been running
on the uncleanly offline node will be hidden.  While with "-r", the
primitive resources will be shown as "Started" on that node.
"crm_resource -W"  has the same behavior.

That's inconsistent. Perhaps we also need to consider if resources are
"active" when those options are enabled? 

-- 
Regards,
Yan Gao
China R&D Software Engineer
y...@novell.com

Novell, Inc.
Making IT Work As One™


___
Pacemaker mailing list
Pacemaker@oss.clusterlabs.org
http://oss.clusterlabs.org/mailman/listinfo/pacemaker


Re: [Pacemaker] trigger STONITH for testing purposes

2009-05-22 Thread Andrew Beekhof
On Wed, May 20, 2009 at 6:39 PM, Bob Haxo  wrote:
> Hi Andrew,
>
> I'd say you removed no-quorum-policy=ignore
>
> Actually, the pair of no_quorum_policy and no-quorum-policy are set to
> "ignore", and expected-quorum-votes is set to "2":
>
>   <crm_config>
>     <cluster_property_set id="...">
>       ...
>       <nvpair id="..." name="expected-quorum-votes" value="2"/>
>       <nvpair id="..." name="no_quorum_policy" value="ignore"/>
>       <nvpair id="..." name="no-quorum-policy" value="ignore"/>
>       ...
>     </cluster_property_set>
>   </crm_config>
>
> Removing the no-quorum-policy=ignore and no_quorum_policy=ignore (as in,
> deleting the variables) left the cluster unable to fail over with either an
> ifdown iface or with a node reboot.  The state displayed by the GUI did not
> agree with the state displayed by crm_mon (the GUI showed the ifdown or
> rebooted node as still controlling resources, whereas crm_mon showed the
> resources unavailable ... both showed the inaccessible node as offline).

Assuming stonith-enabled was set to false, crm_mon is correct as the
cluster assumes that the node is cleanly down*.
You should file a bug for the GUI in that case.

* Which is clearly insane and going to cause data corruption some day,
but it's also the only way the cluster can continue if STONITH is
disabled.
For this reason SUSE won't support any cluster without a valid STONITH setup.

>
> Setting the no-quorum-policy=stop had the same results, which included the
> resources not migrating to the working system until returning
> no-quorum-policy=ignore.  One of the tests led to filesystem corruption.

Without STONITH I can easily believe this happened.

> Very messy.  (this is a test-only setup, so no real data is present)
>
> So, no, the change that I made was neither deleting nor setting
> no-quorum-policy=stop.

Strange.

> Setting no-quorum-policy=ignore seems to be required
> for the cluster to support migrations and failovers.

For two-node clusters, yes.

Heartbeat pretends that two-node clusters always have quorum, but this
is not the case when using OpenAIS.
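
A minimal crm shell sketch of the setting in question (assuming the crm
shell shipped with Pacemaker 1.0):

  # Let a two-node OpenAIS cluster keep running resources without quorum:
  crm configure property no-quorum-policy=ignore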

___
Pacemaker mailing list
Pacemaker@oss.clusterlabs.org
http://oss.clusterlabs.org/mailman/listinfo/pacemaker


Re: [Pacemaker] trigger STONITH for testing purposes

2009-05-20 Thread Bob Haxo
Hi Andrew,


> I'd say you removed no-quorum-policy=ignore


Actually, the pair of no_quorum_policy and no-quorum-policy are set to
"ignore", and expected-quorum-votes is set to "2":

  

  ...
  
  
  
  ...
  
   


Removing the no-quorum-policy=ignore and no_quorum_policy=ignore (as in,
deleting the variables) left the cluster unable to fail over with either
an ifdown iface or with a node reboot.  The state displayed by the GUI
did not agree with the state displayed by crm_mon (the GUI showed the
ifdown or rebooted node as still controlling resources, whereas crm_mon
showed the resources unavailable ... both showed the inaccessible node
as offline).

Setting the no-quorum-policy=stop had the same results, which included
the resources not migrating to the working system until returning
no-quorum-policy=ignore.  One of the tests led to filesystem corruption.
Very messy.  (this is a test-only setup, so no real data is present)

So, no, the change that I made was neither deleting nor setting
no-quorum-policy=stop.  Setting no-quorum-policy=ignore seems to be
required for the cluster to support migrations and failovers.

Cheers and thanks,
Bob Haxo


On Wed, 2009-05-20 at 11:17 +0200, Andrew Beekhof wrote:

> On Wed, May 20, 2009 at 1:31 AM, Bob Haxo  wrote:
> > Greetings,
> >
> > I liked the idea of not starting the cluster at boot, and found that the
> > fenced node would reboot and then openais start brought the node onboard
> > without triggering a reboot of the already running node.
> >
> > Then magic happened.  I chkconfig'd openais to start with boot, re-ran the
> > "ifdown eth0" command that had been triggering STONITH and then the STONITH
> > deathmarch, and, well, everything worked.  I've done this test many 10s of
> > times without a STONITH deathmarch.
> >
> > Unfortunately, I haven't a clue as to what was changed that cleared the
> > issue.
> 
> At a guess, I'd say you removed no-quorum-policy=ignore.
> OpenAIS-based clusters don't pretend they have quorum when only 1 of
> the 2 nodes is available (and you can't start shooting until you have
> quorum or the above option is set).
> 
> 
> >
> > Thanks for all the suggestions.
> >
> > Cheers,
> > Bob Haxo
> >
> >
> > On Tue, 2009-05-19 at 14:03 +0200, Andrew Beekhof wrote:
> >
> > On Mon, May 18, 2009 at 8:12 PM, Bob Haxo  wrote:
> >>
> >> Any suggestions as to what needs changing so that the stonith deathmarch
> >> can
> >> be avoided?
> >
> > If you only have two nodes, the only two ways have already been discussed:
> > use poweroff, or don't start the cluster at boot.
> > If you don't want to do either of those, the only way to terminate the
> > stonith loop is to fix the network failure.
> >
> > If you had 3 or more nodes, the returning node wouldn't have quorum
> > and therefore wouldn't be allowed to shoot anyone.
> >
> > ___
> > Pacemaker mailing list
> > Pacemaker@oss.clusterlabs.org
> > http://oss.clusterlabs.org/mailman/listinfo/pacemaker
> >
> > ___
> > Pacemaker mailing list
> > Pacemaker@oss.clusterlabs.org
> > http://oss.clusterlabs.org/mailman/listinfo/pacemaker
> >
> >
> 
> ___
> Pacemaker mailing list
> Pacemaker@oss.clusterlabs.org
> http://oss.clusterlabs.org/mailman/listinfo/pacemaker
___
Pacemaker mailing list
Pacemaker@oss.clusterlabs.org
http://oss.clusterlabs.org/mailman/listinfo/pacemaker


Re: [Pacemaker] trigger STONITH for testing purposes

2009-05-20 Thread Andrew Beekhof
On Wed, May 20, 2009 at 1:31 AM, Bob Haxo  wrote:
> Greetings,
>
> I liked the idea of not starting the cluster at boot, and found that the
> fenced node would reboot and then openais start brought the node onboard
> without triggering a reboot of the already running node.
>
> Then magic happened.  I chkconfig'd openais to start with boot, re-ran the
> "ifdown eth0" command that had been triggering STONITH and then the STONITH
> deathmarch, and, well, everything worked.  I've done this test many 10s of
> times without a STONITH deathmarch.
>
> Unfortunately, I haven't a clue as to what was changed that cleared the
> issue.

At a guess, I'd say you removed no-quorum-policy=ignore.
OpenAIS-based clusters don't pretend they have quorum when only 1 of
the 2 nodes is available (and you can't start shooting until you have
quorum or the above option is set).


>
> Thanks for all the suggestions.
>
> Cheers,
> Bob Haxo
>
>
> On Tue, 2009-05-19 at 14:03 +0200, Andrew Beekhof wrote:
>
> On Mon, May 18, 2009 at 8:12 PM, Bob Haxo  wrote:
>>
>> Any suggestions as to what needs changing so that the stonith deathmarch
>> can
>> be avoided?
>
> If you only have two nodes, the only two ways have already been discussed:
> use poweroff, or don't start the cluster at boot.
> If you don't want to do either of those, the only way to terminate the
> stonith loop is to fix the network failure.
>
> If you had 3 or more nodes, the returning node wouldn't have quorum
> and therefore wouldn't be allowed to shoot anyone.
>
> ___
> Pacemaker mailing list
> Pacemaker@oss.clusterlabs.org
> http://oss.clusterlabs.org/mailman/listinfo/pacemaker
>
> ___
> Pacemaker mailing list
> Pacemaker@oss.clusterlabs.org
> http://oss.clusterlabs.org/mailman/listinfo/pacemaker
>
>

___
Pacemaker mailing list
Pacemaker@oss.clusterlabs.org
http://oss.clusterlabs.org/mailman/listinfo/pacemaker


Re: [Pacemaker] trigger STONITH for testing purposes

2009-05-19 Thread Bob Haxo
Greetings,

I liked the idea of not starting the cluster at boot, and found that the
fenced node would reboot and then openais start brought the node onboard
without triggering a reboot of the already running node.
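
A sketch of the steps assumed here, using the SLES init tooling:

  # Keep the cluster stack from starting at boot:
  chkconfig openais off
  # After the fenced node reboots, rejoin it by hand:
  rcopenais start    # i.e. /etc/init.d/openais start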

Then magic happened.  I chkconfig'd openais to start with boot, re-ran
the "ifdown eth0" command that had been triggering STONITH and then the
STONITH deathmarch, and, well, everything worked.  I've done this test
many 10s of times without a STONITH deathmarch.

Unfortunately, I haven't a clue as to what was changed that cleared the
issue.

Thanks for all the suggestions.

Cheers,
Bob Haxo


On Tue, 2009-05-19 at 14:03 +0200, Andrew Beekhof wrote:

> On Mon, May 18, 2009 at 8:12 PM, Bob Haxo  wrote:
> >
> > Any suggestions as to what needs changing so that the stonith deathmarch can
> > be avoided?
> 
> If you only have two nodes, the only two ways have already been discussed:
> use poweroff, or don't start the cluster at boot.
> If you don't want to do either of those, the only way to terminate the
> stonith loop is to fix the network failure.
> 
> If you had 3 or more nodes, the returning node wouldn't have quorum
> and therefore wouldn't be allowed to shoot anyone.
> 
> ___
> Pacemaker mailing list
> Pacemaker@oss.clusterlabs.org
> http://oss.clusterlabs.org/mailman/listinfo/pacemaker
___
Pacemaker mailing list
Pacemaker@oss.clusterlabs.org
http://oss.clusterlabs.org/mailman/listinfo/pacemaker


Re: [Pacemaker] trigger STONITH for testing purposes

2009-05-19 Thread Andrew Beekhof
On Mon, May 18, 2009 at 8:12 PM, Bob Haxo  wrote:
>
> Any suggestions as to what needs changing so that the stonith deathmarch can
> be avoided?

If you only have two nodes, the only two ways have already been discussed:
use poweroff, or don't start the cluster at boot.
If you don't want to do either of those, the only way to terminate the
stonith loop is to fix the network failure.

If you had 3 or more nodes, the returning node wouldn't have quorum
and therefore wouldn't be allowed to shoot anyone.

___
Pacemaker mailing list
Pacemaker@oss.clusterlabs.org
http://oss.clusterlabs.org/mailman/listinfo/pacemaker


Re: [Pacemaker] trigger STONITH for testing purposes

2009-05-19 Thread Tim Serong
Bob Haxo wrote:
> OK, I've set the stonith action to "poweroff" and I already had quorum
> action set to "ignore".  The "poweroff" makes it much easier to re-set
> "stonith-enabled" to "false" so that I can get two systems online
> again. ;-)
> 
> However, I was more hoping to be able to reboot the fenced system
> without triggering a reboot (or halt) of the working system.  Here are
> some specifics:
> 
> SLES11 HAE (GA)
> external/ipmi
> two HA servers
> 
> 
> ...
> 
> Any suggestions as to what needs changing so that the stonith deathmarch
> can be avoided?

I can't offer any useful commentary on your config, but I can suggest
another trick for debugging this:

1) Change the IPMI password, so that STONITH will still be attempted,
   but will fail (can't reboot the node due to authentication failure).
2) This will put the cluster into a slightly bizarre state, where
   (ultimately) no resources will run properly, but at least your
   machines won't be continually rebooting.
3) tail and/or cat /var/log/ha_log and /var/log/ha_debug (or wherever
   the log files are) on both nodes; see the sketch after this list.
   This should tell you what it was that failed and resulted in
   STONITH, and hopefully give you some
   idea of where to look next (eg: if a "stop" action failed, maybe
   instrument that resource agent to log more detailed failure
   messages).
4) Don't forget to reset your IPMI passwords once the problem is
   solved! :)
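
A minimal sketch for step 3, assuming the default SLES11 HAE log paths:

  # Watch both logs on each node while reproducing the failure:
  tail -f /var/log/ha_log /var/log/ha_debug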

Hope that helps,

Tim

___
Pacemaker mailing list
Pacemaker@oss.clusterlabs.org
http://oss.clusterlabs.org/mailman/listinfo/pacemaker


Re: [Pacemaker] trigger STONITH for testing purposes

2009-05-18 Thread Bob Haxo
OK, I've set the stonith action to "poweroff" and I already had quorum
action set to "ignore".  The "poweroff" makes it much easier to re-set
"stonith-enabled" to "false" so that I can get two systems online
again. ;-)

However, I was more hoping to be able to reboot the fenced system
without triggering a reboot (or halt) of the working system.  Here are
some specifics:

SLES11 HAE (GA)
external/ipmi
two HA servers

  [cluster options XML stripped by the archive]

And, the two stonith resources:

  [external/ipmi stonith primitive XML stripped by the archive]

And the relevant pair of constraints:

  [location constraint XML stripped by the archive]


Any suggestions as to what needs changing so that the stonith deathmarch
can be avoided?
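
Since the XML above was eaten by the archive, here is a purely
hypothetical crm shell sketch of such a pair of external/ipmi resources
and their constraints -- every name, address and credential below is
invented, and the parameters are the plugin's usual ones, not Bob's
actual config:

  # Hypothetical external/ipmi fencing device for each node:
  primitive stonith-node1 stonith:external/ipmi \
          params hostname=node1 ipaddr=192.168.1.101 userid=admin \
                 passwd=secret interface=lan \
          op monitor interval=60s
  primitive stonith-node2 stonith:external/ipmi \
          params hostname=node2 ipaddr=192.168.1.102 userid=admin \
                 passwd=secret interface=lan \
          op monitor interval=60s
  # A stonith resource must not run on the node it is meant to fence:
  location l-stonith-node1 stonith-node1 -inf: node1
  location l-stonith-node2 stonith-node2 -inf: node2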

Cheers and thanks,
Bob Haxo
SGI



On Fri, 2009-05-15 at 20:26 -0500, Karl Katzke wrote:

> Bob, as we've discussed a few other times recently, when you're
> testing (and depending on your preference in production), you may want
> to set the stonith policy to 'poweroff' as opposed to 'reboot'. 
> Also, if you have a two-node cluster, pacemaker depends on quorum and
> the loss thereof creates another stonith event. You'll want to set the
> loss of quorum action to 'ignore'. 
> ... in short, RTFM: http://www.clusterlabs.org/wiki/Documentation --
> Pacemaker Configuration Explained 1.0 has *everything* you need to
> know in it. 
> 
> 
> -K 
> 
> 
> ---
> Karl Katzke
> Systems Analyst II
> TAMU - DRGS
> 
> 
> 
> 
> 
> 
> >>> On 5/15/2009 at  7:22 PM, in message
> <1242433367.21186.4.ca...@nalu.engr.sgi.com>, Bob Haxo  wrote:
> 
> > Ok, never mind this question.  "ifdown interface" works nicely to 
> > trigger STONITH action. 
> >  
> > Unfortunately (if I may ask a new question) ... I now have one server 
> > rebooting, then the other rebooting, and back to the first rebooting in 
> > what looks to be an endless loop of reboots. 
> >  
> > Suggestions? 
> >  
> > Cheers, 
> > Bob Haxo 
> > SGI 
> >  
> > On Fri, 2009-05-15 at 16:53 -0700, Bob Haxo wrote: 
> >  
> > > Greetings, 
> > >  
> > > What manual administrative actions can be used to trigger STONITH 
> > > action?   
> > >  
> > > I have created a pair of STONITH resources (external/ipmi) and would 
> > > like to test that these resources work as expected (which, if I 
> > > understand the default correctly, is to reboot the node). 
> > >  
> > > Thanks, 
> > > Bob Haxo 
> > > SGI 
> > >  
> > > SLES11 HAE  
> > >  
> > > ___ 
> > > Pacemaker mailing list 
> > > Pacemaker@oss.clusterlabs.org 
> > > http://oss.clusterlabs.org/mailman/listinfo/pacemaker 
> >  
> 
> 
> 
> ___
> Pacemaker mailing list
> Pacemaker@oss.clusterlabs.org
> http://oss.clusterlabs.org/mailman/listinfo/pacemaker
___
Pacemaker mailing list
Pacemaker@oss.clusterlabs.org
http://oss.clusterlabs.org/mailman/listinfo/pacemaker


Re: [Pacemaker] trigger STONITH for testing purposes

2009-05-15 Thread Tim Serong
Bob Haxo wrote:
> Ok, never mind this question.  "ifdown interface" works nicely to
> trigger STONITH action.
> 
> Unfortunately (if I may ask a new question) ... I now have one server
> rebooting, then the other rebooting, and back to the first rebooting in
> what looks to be an endless loop of reboots.
> 
> Suggestions?

Funny you should ask.  I wrote up some notes about this exact problem a
few days ago:

  http://ourobengr.com/ha

There is more that needs to be added to this document (thank you Dejan &
Joe for the suggestions - I'll incorporate them as soon as I am able),
but it should nevertheless be of some use to you in its current form.

In a bizarre twist, I can't help but think that there is a small tragedy
unfolding here.  My former team and I were working on HA at SGI
until the December round of layoffs; were it not for that event I would
likely have been able to be of direct assistance :-/

*sigh*

Tim


___
Pacemaker mailing list
Pacemaker@oss.clusterlabs.org
http://oss.clusterlabs.org/mailman/listinfo/pacemaker


Re: [Pacemaker] trigger STONITH for testing purposes

2009-05-15 Thread Karl Katzke
Bob, as we've discussed a few other times recently, when you're testing (and 
depending on your preference in production), you may want to set the stonith 
policy to 'poweroff' as opposed to 'reboot'. 

Also, if you have a two-node cluster, pacemaker depends on quorum and the loss 
thereof creates another stonith event. You'll want to set the loss of quorum 
action to 'ignore'. 
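
A crm shell sketch of those two settings (the Pacemaker 1.0 cluster
property names are stonith-action and no-quorum-policy):

  crm configure property stonith-action=poweroff
  crm configure property no-quorum-policy=ignore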

... in short, RTFM: http://www.clusterlabs.org/wiki/Documentation -- Pacemaker 
Configuration Explained 1.0 has *everything* you need to know in it. 

-K 


---
Karl Katzke
Systems Analyst II
TAMU - DRGS






>>> On 5/15/2009 at  7:22 PM, in message
<1242433367.21186.4.ca...@nalu.engr.sgi.com>, Bob Haxo  wrote:

> Ok, never mind this question.  "ifdown interface" works nicely to 
> trigger STONITH action. 
>  
> Unfortunately (if I may ask a new question) ... I now have one server 
> rebooting, then the other rebooting, and back to the first rebooting in 
> what looks to be an endless loop of reboots. 
>  
> Suggestions? 
>  
> Cheers, 
> Bob Haxo 
> SGI 
>  
> On Fri, 2009-05-15 at 16:53 -0700, Bob Haxo wrote: 
>  
> > Greetings, 
> >  
> > What manual administrative actions can be used to trigger STONITH 
> > action?   
> >  
> > I have created a pair of STONITH resources (external/ipmi) and would 
> > like to test that these resources work as expected (which, if I 
> > understand the default correctly, is to reboot the node). 
> >  
> > Thanks, 
> > Bob Haxo 
> > SGI 
> >  
> > SLES11 HAE  
> >  
> > ___ 
> > Pacemaker mailing list 
> > Pacemaker@oss.clusterlabs.org 
> > http://oss.clusterlabs.org/mailman/listinfo/pacemaker 
>  



___
Pacemaker mailing list
Pacemaker@oss.clusterlabs.org
http://oss.clusterlabs.org/mailman/listinfo/pacemaker


Re: [Pacemaker] trigger STONITH for testing purposes

2009-05-15 Thread Bob Haxo
Ok, never mind this question.  "ifdown interface" works nicely to
trigger STONITH action.

Unfortunately (if I may ask a new question) ... I now have one server
rebooting, then the other rebooting, and back to the first rebooting in
what looks to be an endless loop of reboots.

Suggestions?

Cheers,
Bob Haxo
SGI

On Fri, 2009-05-15 at 16:53 -0700, Bob Haxo wrote:

> Greetings,
> 
> What manual administrative actions can be used to trigger STONITH
> action?  
> 
> I have created a pair of STONITH resources (external/ipmi) and would
> like to test that these resources work as expected (which, if I
> understand the default correctly, is to reboot the node).
> 
> Thanks,
> Bob Haxo
> SGI
> 
> SLES11 HAE 
> 
> ___
> Pacemaker mailing list
> Pacemaker@oss.clusterlabs.org
> http://oss.clusterlabs.org/mailman/listinfo/pacemaker
___
Pacemaker mailing list
Pacemaker@oss.clusterlabs.org
http://oss.clusterlabs.org/mailman/listinfo/pacemaker