Re: [ClusterLabs] Antw: Retries before setting fail-count to INFINITY

Ken Gaillot Mon, 21 Aug 2017 07:57:42 -0700

On Mon, 2017-08-21 at 15:39 +0200, Ulrich Windl wrote:
> >>> Vaibhaw Pandey <vabu.v...@gmail.com> wrote on 21.08.2017 at 14:58 in
> message
> <CAAdwLTsZMX5fD=RsA7k1DKgMKoZ51A0jM=hay4rub4ef44z...@mail.gmail.com>:
> > Version in use: 1.1 along with corosync 1.4
> > 
> > Hello,
> > I am new to pacemaker and was trying to set up a MySQL master/slave cluster
> > using pacemaker and had a question on resource failure response which I
> > couldn't resolve from the documentation.
> > 
> > The pacemaker doc (
> > https://clusterlabs.org/doc/en-US/Pacemaker/1.1/html/Pacemaker_Explained/_failure_response.html)
> > says clearly that:
> > 
> > "Normally, if a running resource fails, pacemaker will try to stop it and
> > start it again."
> > 
> > I was wondering if there is a way to configure the # of times pacemaker
> > will attempt this start and stop sequence - we want to try and restart the
> > resource 2 or 3 times before it is stopped. Obviously setting a
> 
> Maybe you misunderstood: A stopped resource is the precondition for a 
> successful start. So before any start attempt of a failed resource comes a 
> stop attempt. If your monitor times out, try to increase the monitor timeout; 
> if it causes false alerts, fix the monitor. If the database is crap, replace 
> the database ;-)


Agreed, the ideal solution here is to fix the monitor. (The agent's
monitor action is free to try 2 or 3 times internally before returning a
result, as long as it finishes within the monitor operation's timeout.)
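
As a rough illustration of retrying inside the monitor, here is a minimal
sketch in Python. It is not taken from any shipped resource agent: the
function name monitor_with_retries, the probe callable, and the default
attempt count and delay are all hypothetical. Only the numeric OCF return
codes are standard. Real agents are usually shell scripts, but the logic
would be the same.

```python
import time
from typing import Callable

# Standard OCF return codes used by resource agents.
OCF_SUCCESS = 0
OCF_ERR_GENERIC = 1
OCF_RUNNING_MASTER = 8

# Results that mean "the resource is fine", so no retry is needed.
HEALTHY = {OCF_SUCCESS, OCF_RUNNING_MASTER}


def monitor_with_retries(
    probe: Callable[[], int],
    attempts: int = 3,
    delay: float = 2.0,
    sleep: Callable[[float], None] = time.sleep,
) -> int:
    """Call probe() up to `attempts` times.

    Returns the first healthy result, or the result of the last attempt
    if none of them was healthy. `probe` is a hypothetical stand-in for
    whatever check the agent really performs (e.g. a test query).
    """
    result = OCF_ERR_GENERIC
    for attempt in range(1, attempts + 1):
        result = probe()
        if result in HEALTHY:
            return result
        if attempt < attempts:
            # Wait a little before trying again; no wait after the last try.
            sleep(delay)
    return result


if __name__ == "__main__":
    # Simulated probe: fails twice (database busy), then succeeds.
    outcomes = iter([OCF_ERR_GENERIC, OCF_ERR_GENERIC, OCF_SUCCESS])
    rc = monitor_with_retries(lambda: next(outcomes), attempts=3, delay=0.1)
    print("monitor result:", rc)
```

Whatever values are chosen, the worst case (attempts times the probe's own
duration, plus the delays in between) has to fit inside the timeout
configured for the monitor operation, otherwise the cluster will see a
timeout instead of the agent's result.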

FYI, there is a planned overhaul of pacemaker's failure handling that
would give this capability. The new options would allow you to say
"ignore this many failures, then try restarting this many times, then do
this hard recovery action". However, there's no time frame for when that
will arrive.

> > migration-threshold doesn't work in this case because the moment the 1st
> > attempt to restart the resource fails, fail-count is set to INFINITY. Our
> > failure-timeout is set to default (0).
> 
> Yes, the cluster cannot predict the future: If the resource failed to start, 
> it's unlikely that repeating the same thing will suddenly succeed. It's more 
> likely that the start will succeed elsewhere (disregarding configuration 
> errors).
> 
> > 
> > The reason we wish to do this is that, at times the database is busy and
> > the monitor action fails. However there is a good chance it might succeed
> > on a second or third attempt.
> 
> By "it", do you mean the monitor operation?
> 
> > 
> > Is there a parameter in pacemaker that we can utilize to cause this
> > behavior or will this have to be coded in the resource agent?
> 
> See above.
> 
> > 
> > Thanks,
> > Vaibhaw
> 
> _______________________________________________
> Users mailing list: Users@clusterlabs.org
> http://lists.clusterlabs.org/mailman/listinfo/users
> 
> Project Home: http://www.clusterlabs.org
> Getting started: http://www.clusterlabs.org/doc/Cluster_from_Scratch.pdf
> Bugs: http://bugs.clusterlabs.org

-- 
Ken Gaillot <kgail...@redhat.com>