RE: [Linux-HA] standby does not take over on multiple power failure

Thomas Åkerblom (HF/EBC) Mon, 04 Jun 2007 23:29:38 -0700

Hi Andrew, and thank you for the prompt response.
My objection regarding the failure stickiness is that in my lab rscX does go 
back to nodeY after running on the standby.

Ex:
I pull the power cord from ha-8
        rsc_lim8 starts to run on the standby ha-9
I reinsert the power cord into ha-8
        rsc_lim8 stops on ha-9 and starts to run on ha-8
This can be repeated.

In my case the standby gets into a state where it will never start rsc_lim8 
when ha-8 goes down. That happens after the procedure I described earlier: I 
first pull the power cord of one server, reinsert it, and then pull the cord of 
another server before the first has started properly. That behavior ceased 
when I removed the failure stickiness.

BR.
/Thomas

-----Original Message-----
From: [EMAIL PROTECTED] [mailto:[EMAIL PROTECTED] On Behalf Of Andrew Beekhof
Sent: 4 June 2007 13:20
To: General Linux-HA mailing list
Subject: Re: [Linux-HA] standby does not take over on multiple power failure

On 6/4/07, Thomas Åkerblom (HF/EBC) <[EMAIL PROTECTED]> wrote:
> Hi Andrew.
> I'm using 2.0.8-0.15, but I have seen the same behavior in 2.0.7.
> In this case ha-9 is DC and also the standby server.
> ha-8 has no power, but the standby server does not take over.
> The logs begin right before I pulled the power cord.
>
> Actually I do know how to get around this problem now, but I also have some 
> new questions.
> If I remove the line:
> <nvpair id="default_resource_failure_stickiness" 
> name="default_resource_failure_stickiness" value="-INFINITY"/>
> In the cib file the problem disappears.
> I wouldn't expect that parameter to have this effect, rather the opposite.
> Is this a known/expected correlation?

not so much "correlation" as "that's what it's designed to do".

setting default_resource_failure_stickiness=-INFINITY means that if
heartbeat finds the rscX as failed on nodeY, then never ever consider
nodeY as a valid place to run rscX ever again... at least not until
the admin "clears" the error by resetting the failcount.
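
The effect described above can be illustrated with a toy model. This is a
minimal sketch, not the actual policy engine: it assumes a node's score is its
base preference plus failcount times the failure stickiness, and that a node
scoring -INFINITY is never chosen. The preference values, the recorded failure
on ha-9, and the helper names placement_score and choose_node are assumptions
made for illustration only.

```python
import math

def placement_score(preference, failcount, failure_stickiness):
    """Simplified score: base preference plus a penalty per recorded failure."""
    if failcount == 0:
        # Skip the multiplication: 0 * -inf would be NaN.
        return float(preference)
    return preference + failcount * failure_stickiness

def choose_node(preferences, failcounts, failure_stickiness, online):
    """Return the best online node, or None if every candidate is banned."""
    best, best_score = None, -math.inf
    for node in sorted(preferences):
        if node not in online:
            continue
        score = placement_score(
            preferences[node], failcounts.get(node, 0), failure_stickiness
        )
        # A score of -inf never beats the initial -inf, so the node stays banned.
        if score > best_score:
            best, best_score = node, score
    return best

# Hypothetical values: ha-8 is preferred, ha-9 is the standby.
preferences = {"ha-8": 100, "ha-9": 50}
# Assumption: one failure of rsc_lim8 has been recorded on ha-9.
failcounts = {"ha-9": 1}

# ha-8 has lost power, so only ha-9 is online.
print(choose_node(preferences, failcounts, -math.inf, online={"ha-9"}))  # None
print(choose_node(preferences, failcounts, -10, online={"ha-9"}))        # ha-9
print(choose_node(preferences, {}, -math.inf, online={"ha-9"}))          # ha-9
```

In this model a single recorded failure combined with -INFINITY leaves no
eligible node once ha-8 is down, while a finite penalty or a cleared failcount
lets the standby take over.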

in the future we'll expire the failures after "a period of time" but
that is not yet implemented as the lrm doesn't provide the information
to do so.

> I would like to set that parameter in order to be able to use the failure 
> counters.
> Furthermore I am not able to read and reset the counters using:
>
> crm_failcount -G -U ha-8 -r rsc_lim8
>         The result is always 0
>
> crm_failcount -D -U ha-8 -r rsc_lim8
>         Error performing operation: The object/attribute does not exist.

later versions return 0 instead of "The object/attribute does not exist."
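
For scripting the two crm_failcount calls quoted above, a small wrapper might
look like the sketch below. It uses only the flags that appear in this thread
(-G and -D, used there to read and reset the counter, plus -U and -r); the
helper names failcount_argv and run_failcount are hypothetical, and the output
for a missing attribute depends on the installed version, as noted above.

```python
import shutil
import subprocess

def failcount_argv(action, node, resource):
    """Build the argument list for one crm_failcount call from this thread."""
    if action not in ("-G", "-D"):
        raise ValueError("only -G and -D are taken from the thread")
    return ["crm_failcount", action, "-U", node, "-r", resource]

def run_failcount(action, node, resource):
    """Run the command if the binary is installed; return None otherwise."""
    argv = failcount_argv(action, node, resource)
    if shutil.which(argv[0]) is None:
        return None
    result = subprocess.run(argv, capture_output=True, text=True)
    # Return everything so the caller can handle version-specific output.
    return result.returncode, result.stdout.strip(), result.stderr.strip()

# Show the commands without executing them.
print(" ".join(failcount_argv("-G", "ha-8", "rsc_lim8")))
print(" ".join(failcount_argv("-D", "ha-8", "rsc_lim8")))
```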

updated packages for most distros/platforms are available at:
   http://software.opensuse.org/download/server:/ha-clustering/
_______________________________________________
Linux-HA mailing list
Linux-HA@lists.linux-ha.org
http://lists.linux-ha.org/mailman/listinfo/linux-ha
See also: http://linux-ha.org/ReportingProblems