
> [...]
> > But back to the original question:
> >
> >>>>> Is there a way to tell Linux-HA to retry a failed resource
> >>>>> after a certain amount of time again? [...]
> >
> > The mentioned cluster had also a feature called "auto-clear" which 
> > would clear the faulted-state after some time.
> > I personally dislike this idea - while I think the idea of a
> > confidence-interval, which clears the fail-count if a resource has
> > not faulted and is online again is a good one.
> is it not essentially the same thing but with a more complicated
> formula?

No - it's a different matter altogether.

"auto-clear" comes into play only after a resource has failed. In my
view that should only happen when something is completely wrong and
can't be fixed automatically (in which case human intervention is
needed anyway), or when something is wrong with the monitoring methods
or the cluster setup.

Clearing the fail-count covers a more common situation. The main
sources here are timeouts of monitoring procedures, a monitor that runs
too early, or a temporary resource failure which should not cause a
failover. In these cases a warning would be OK, but human intervention
is normally not needed.
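
To make the distinction concrete, here is a minimal Python sketch of the
two policies. It is only an illustration of the idea, not how Linux-HA
or the other cluster implements it: all names (ResourceState,
record_failure, auto_clear, clear_fail_count_if_confident, max_failures,
clear_after, confidence_interval) are hypothetical, and I assume a
resource becomes "faulted" once its fail-count reaches a threshold.

```python
from dataclasses import dataclass
from typing import Optional


@dataclass
class ResourceState:
    """Hypothetical per-resource bookkeeping, not a Linux-HA structure."""
    fail_count: int = 0
    faulted: bool = False            # True once the cluster has given up
    last_failure: Optional[float] = None


def record_failure(state: ResourceState, now: float, max_failures: int) -> None:
    """Count one failure; mark the resource faulted at the threshold."""
    state.fail_count += 1
    state.last_failure = now
    if state.fail_count >= max_failures:
        state.faulted = True


def auto_clear(state: ResourceState, now: float, clear_after: float) -> bool:
    """'auto-clear': un-fault a FAILED resource purely because time passed."""
    if (state.faulted and state.last_failure is not None
            and now - state.last_failure >= clear_after):
        state.faulted = False
        state.fail_count = 0
        return True
    return False


def clear_fail_count_if_confident(state: ResourceState, now: float,
                                  online: bool,
                                  confidence_interval: float) -> bool:
    """Confidence interval: forget old failures only if the resource has
    NOT faulted, is online, and has run failure-free for the interval."""
    if state.faulted or not online or state.fail_count == 0:
        return False
    if now - state.last_failure >= confidence_interval:
        state.fail_count = 0
        return True
    return False
```

The point of the sketch: clear_fail_count_if_confident never touches a
resource that is already faulted, so a really broken resource still
waits for a human, while a single monitor timeout is forgotten after a
quiet period.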

Kind regards, Nils
Linux-HA mailing list
See also: http://linux-ha.org/ReportingProblems
