On 09/22/2016 12:58 PM, Kristoffer Grönlund wrote:
> Ken Gaillot <kgail...@redhat.com> writes:
>>
>> "restart" is the only on-fail value that it makes sense to escalate.
>>
>> block/stop/fence/standby are final. Block means "don't touch the
>> resource again", so there can't be any further response to failures.
>> Stop/fence/standby move the resource off the local node, so failure
>> handling is reset (there are 0 failures on the new node to begin with).
>
> Hrm. If a restart potentially migrates the resource to a different node,
> is the failcount reset then as well? If so, wouldn't that complicate the
> hard-fail-threshold variable too, since potentially, the resource could
> keep migrating between nodes and since the failcount is reset on each
> migration, it would never reach the hard-fail-threshold. (or am I
> missing something?)

The failure count is specific to each node. By "failure handling is
reset" I mean that when the resource moves to a different node, the
failure count of the original node no longer matters -- the new node's
failure count is now what matters.

A node's failure count is reset only when the user manually clears it,
or the node is rebooted. Also, resources may have a failure-timeout
configured, in which case the count will go down as failures expire.

So, a resource with on-fail=restart would never go back to a node where
it had previously reached the threshold, unless that node's fail count
were cleared in one of those ways.
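
To make the per-node bookkeeping concrete, here is a toy Python sketch.
It is only an illustration of the reasoning above, not how Pacemaker's
scheduler is implemented: the ToyCluster class, its method names, and
the "first eligible node" placement rule are all made up for this
example, and failure-timeout expiry is deliberately left out.

```python
class ToyCluster:
    """Toy model of per-node failure counts (hypothetical, not Pacemaker code)."""

    def __init__(self, nodes, threshold):
        self.threshold = threshold
        # One independent counter per node; moving the resource resets nothing.
        self.fail_count = {node: 0 for node in nodes}
        # Node currently running the resource, or None if it is stopped.
        self.current = nodes[0]

    def eligible(self):
        """Nodes whose own failure count is still below the threshold."""
        return [n for n, c in self.fail_count.items() if c < self.threshold]

    def fail(self):
        """Record a failure on the current node and return the node used next."""
        if self.current is None:
            return None
        self.fail_count[self.current] += 1
        if self.fail_count[self.current] >= self.threshold:
            # This node is no longer allowed; pick another one, if any is left.
            candidates = self.eligible()
            self.current = candidates[0] if candidates else None
        return self.current

    def clear(self, node):
        """Model a manual cleanup (or reboot) of one node's failure count."""
        self.fail_count[node] = 0
        if self.current is None:
            self.current = node


if __name__ == "__main__":
    cluster = ToyCluster(["node1", "node2"], threshold=2)
    print([cluster.fail() for _ in range(4)])
    print(cluster.fail_count)
    cluster.clear("node1")
    print(cluster.current)
```

With two nodes and a threshold of 2, the resource restarts once on
node1, moves to node2 on the second failure, and has nowhere left to
run after four failures in total. It cannot bounce between the nodes
indefinitely, because node1's count is still at the threshold when
node2 fails. Only clearing node1's count makes it eligible again.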