Hi,

> [...]
> > But back to the original question:
> >
> > > >>>>> Is there a way to tell Linux-HA to retry a failed resource after a
> > > >>>>> certain amount of time again?
[...]
> > The mentioned cluster had also a feature called "auto-clear" which
> > would clear the faulted-state after some time.
> > I personally dislike this idea - while I think the idea of a
> > confidence-interval, which clears the fail-count if a resource has not
> > faulted and is online again is a good one.
>
> is it not essentially the same thing but with a more
> complicated formula?

No - that's a different matter.

"auto-clear" only comes into play after a resource has already failed. In my view that should only happen when something is seriously wrong and can't be fixed automatically - in which case human intervention is needed anyway - or when something is wrong with the monitoring methods or the cluster setup.

Clearing the fail-count covers a much more common situation. The main causes here are timeouts of monitoring procedures, a monitor that runs too early, or a temporary resource failure that should not trigger a failover. In these cases a warning would be fine, but human intervention is normally not needed.

Kind regards,

Nils
_______________________________________________
Linux-HA mailing list
Linux-HA@lists.linux-ha.org
http://lists.linux-ha.org/mailman/listinfo/linux-ha
See also: http://linux-ha.org/ReportingProblems