On 09/22/2016 10:43 AM, Jan Pokorný wrote:
> On 21/09/16 10:51 +1000, Andrew Beekhof wrote:
>> On Wed, Sep 21, 2016 at 6:25 AM, Ken Gaillot <kgail...@redhat.com> wrote:
>>> Our first proposed approach would add a new hard-fail-threshold
>>> operation property. If specified, the cluster would first try restarting
>>> the resource on the same node,
>>
>> Well, just as now, it would be _allowed_ to start on the same node, but
>> this is not guaranteed.
>
> Yeah, I should attend doublethink classes to understand "the same
> node" term better:
>
> https://github.com/ClusterLabs/pacemaker/pull/1146/commits/3b3fc1fd8f2c95d8ab757711cf096cf231f27941
"Same node" is really a shorthand to hand-wave some details, because that's what will typically happen. The exact behavior is: "If the fail-count on this node reaches <N>, ban this node from running the resource." That's not the same as *requiring* the resource to restart on the same node before <N> is reached. As in any situation, Pacemaker will re-evaluate the current state of the cluster, and choose the best node to try starting the resource on. For example, consider if the failed resource with on-fail=restart is colocated with another resource with on-fail=standby that also failed, then the whole node will be put in standby, and the original resource will of course move away. It will be restarted, but the start will happen on another node. There are endless such scenarios, so "try restarting on the same node" is not really accurate. To be accurate, I should have said something like "try restarting without banning the node with the failure". _______________________________________________ Users mailing list: Users@clusterlabs.org http://clusterlabs.org/mailman/listinfo/users Project Home: http://www.clusterlabs.org Getting started: http://www.clusterlabs.org/doc/Cluster_from_Scratch.pdf Bugs: http://bugs.clusterlabs.org