Hi Andrew, and thank you for the prompt response.

My objection regarding the failure stickiness is that in my lab rscX will move back to nodeY after running on the standby.
Ex:
I pull the power cord on ha-8.
rsc_lim8 starts running on the standby, ha-9.
I reinsert the power cord on ha-8.
rsc_lim8 stops on ha-9 and starts running on ha-8.
This can be repeated.

In my case the standby gets into a state where it will never start rsc_lim8 when ha-8 goes down. That happens after the procedure I described at first: I pull the power cord of one server, reinsert it, and pull the cord of another server before the first has started properly. That behavior ceased when I removed the failure stickiness.

BR.
/Thomas

-----Original Message-----
From: [EMAIL PROTECTED] [mailto:[EMAIL PROTECTED] On Behalf Of Andrew Beekhof
Sent: 4 June 2007 13:20
To: General Linux-HA mailing list
Subject: Re: [Linux-HA] standby does not take over on multiple power failure

On 6/4/07, Thomas Åkerblom (HF/EBC) <[EMAIL PROTECTED]> wrote:
> Hi Andrew.
> I'm using 2.0.8-0.15, but I have seen the same behavior in 2.0.7.
> In this case ha-9 is DC and also the standby server.
> ha-8 has no power, but the standby server does not take over.
> The logs begin right before I pulled the power cord.
>
> Actually I do know how to get around this problem now, but I also have some
> new questions.
> If I remove the line:
> <nvpair id="default_resource_failure_stickiness"
> name="default_resource_failure_stickiness" value="-INFINITY"/>
> In the cib file the problem disappears.
> I wouldn't expect that parameter to have this effect, rather the opposite.
> Is this a known/expected correlation?

not so much "correlation" as "that's what it's designed to do".

setting default_resource_failure_stickiness=-INFINITY means that if heartbeat finds rscX as failed on nodeY, then it will never ever consider nodeY as a valid place to run rscX again... at least not until the admin "clears" the error by resetting the failcount.

in the future we'll expire the failures after "a period of time", but that is not yet implemented as the lrm doesn't provide the information to do so.

> I would like to set that parameter in order to be able to use the failure
> counters.
> Furthermore I am not able to read and reset the counters using:
>
> crm_failcount -G -U ha-8 -r rsc_lim8
> The result is always 0
>
> crm_failcount -D -U ha-8 -r rsc_lim8
> Error performing operation: The object/attribute does not exist.

later versions return 0 instead of "The object/attribute does not exist."

updated packages for most distros/platforms are available at:
http://software.opensuse.org/download/server:/ha-clustering/
_______________________________________________
Linux-HA mailing list
Linux-HA@lists.linux-ha.org
http://lists.linux-ha.org/mailman/listinfo/linux-ha
See also: http://linux-ha.org/ReportingProblems