On Fri, 2018-05-04 at 15:46 +0200, alessandro.par...@softeco.it wrote:
> Hi, I have a problem with a Pacemaker (0.9.158) / Corosync (Corosync
> Cluster Engine 2.4.0) cluster composed of two servers (Oracle Cloud)
> running Oracle Linux Server 7.4.
>
> On one of the two nodes (for example node1), a service seems to fail a
> great number of times, until it exhausts the counter of attempts.
> At this point, correctly, the service is activated on the other node
> (node2).
>
> If the service has to change server again (for example, if node2 is
> shut down), Pacemaker doesn't try to restart the service on node1.
> Apparently it doesn't reset the number of failed attempts.
> The situation is restored only after a cleanup (pcs resource cleanup).
>
> Is there any solution? Is it possible to tell Pacemaker to reset the
> number of failed attempts when, for example, the resource is
> activated on the other node?
>
> Thanks, alex

You can clean failures manually, or set the failure-timeout resource
meta-attribute (which can be set on a particular resource, or for all
resources via rsc_defaults).

The failure-timeout (as you might expect) works by automatically
cleaning the failure after a certain amount of time has passed, not
when a particular event occurs (such as a start on another node).

Once a failure is cleaned, that node becomes eligible to run the
resource again, and (depending on stickiness and so forth) the cluster
may choose to move the resource back to that node. That's one reason
failures aren't automatically cleaned after a successful start
elsewhere. Also, keeping the failure allows an administrator to notice
that something went wrong, and manually investigate before allowing the
node to host the resource again.
-- 
Ken Gaillot <kgail...@redhat.com>
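
For reference, the two options above (manual cleanup versus failure-timeout) might look roughly like this with pcs. This is only a sketch under assumptions: the resource id my_service and the 600s timeout are made-up placeholders, the "pcs resource defaults name=value" form is the pcs 0.9 syntax (newer pcs releases use "pcs resource defaults update"), and the run wrapper is just a dry-run helper that prints the commands unless DRY_RUN=0 is set.

```shell
#!/bin/sh
# Sketch only: prints the pcs commands by default (dry run).
# Set DRY_RUN=0 on a real cluster node to actually execute them.
DRY_RUN="${DRY_RUN:-1}"
RESOURCE="${RESOURCE:-my_service}"   # hypothetical resource id
TIMEOUT="${TIMEOUT:-600s}"           # example value, pick what suits you

# Print the command in dry-run mode, otherwise run it.
run() {
    if [ "$DRY_RUN" = "1" ]; then
        echo "$*"
    else
        "$@"
    fi
}

failure_timeout_commands() {
    # Option 1: let failures of one resource expire after TIMEOUT.
    run pcs resource meta "$RESOURCE" failure-timeout="$TIMEOUT"

    # Option 2: same setting for all resources via rsc_defaults
    # (pcs 0.9 syntax; assumption: adjust for your pcs version).
    run pcs resource defaults failure-timeout="$TIMEOUT"

    # Manual route: inspect the fail count, then clean it.
    run pcs resource failcount show "$RESOURCE"
    run pcs resource cleanup "$RESOURCE"
}

failure_timeout_commands
```

You would normally pick either option 1 or option 2, not both. Note also that, as far as I know, an expired failure is only acted on when the cluster next re-evaluates its state (see cluster-recheck-interval), so the resource may become eligible somewhat later than the timeout itself.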