Re: [ClusterLabs] clearing failed actions

Ken Gaillot Tue, 30 May 2017 07:34:25 -0700

On 05/30/2017 09:13 AM, Attila Megyeri wrote:
> Hi,
>
> Shouldn’t the
>
> cluster-recheck-interval="2m"
>
> property instruct pacemaker to recheck the cluster every 2 minutes and
> clean the failcounts?


It instructs pacemaker to recalculate whether any actions need to be
taken (including expiring any failcounts appropriately).

> At the primitive level I also have a
>
> migration-threshold="30" failure-timeout="2m"
>
> but whenever I have a failure, it remains there forever.
>
> What could be causing this?
>
> thanks,
>
> Attila

Is it a single old failure, or a recurring failure? The failure timeout
works in a somewhat nonintuitive way. Old failures are not individually
expired. Instead, all failures of a resource are simultaneously cleared
if all of them are older than the failure-timeout. So if something keeps
failing repeatedly (more frequently than the failure-timeout), none of
the failures will be cleared.
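
To make the all-or-nothing rule concrete, here is a minimal Python sketch.
It is only a model of the behaviour described above, not Pacemaker code:
the function name, the use of plain seconds, and the strict "older than"
comparison are assumptions made for illustration.

```python
from typing import List


def failures_after_recheck(failure_times: List[float],
                           now: float,
                           failure_timeout: float) -> List[float]:
    """Model one recheck: return the failures still on record afterwards.

    Hypothetical helper. Failures are never expired one by one; the whole
    list is dropped only when every entry is older than failure_timeout.
    """
    all_expired = all(now - t > failure_timeout for t in failure_times)
    if failure_times and all_expired:
        return []                     # every failure is old enough: clear all
    return list(failure_times)        # at least one is recent: keep all


TIMEOUT = 120.0  # failure-timeout="2m", expressed in seconds

# A single old failure is cleared once it is older than the timeout.
print(failures_after_recheck([0.0], now=130.0, failure_timeout=TIMEOUT))
# prints []

# A failure recurring every 60s: the newest one is only 10s old, so
# nothing is cleared, not even the failure from 130s ago.
print(failures_after_recheck([0.0, 60.0, 120.0], now=130.0,
                             failure_timeout=TIMEOUT))
# prints [0.0, 60.0, 120.0]
```

In this model the failures only go away once the resource has been quiet
for longer than the failure-timeout, which matches the "remains there
forever" symptom when something fails more often than every 2 minutes.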

If it's not a repeating failure, something odd is going on.

_______________________________________________
Users mailing list: Users@clusterlabs.org
http://lists.clusterlabs.org/mailman/listinfo/users

Project Home: http://www.clusterlabs.org
Getting started: http://www.clusterlabs.org/doc/Cluster_from_Scratch.pdf
Bugs: http://bugs.clusterlabs.org
