[ClusterLabs] Failing operations immediately when node is known to be down

Ryan Thomas Thu, 12 Apr 2018 19:28:42 -0700

I’m trying to implement an HA solution which recovers very quickly when a
node fails.  In my configuration, when I reboot a node, I see in the logs
that pacemaker realizes the node is down, and decides to move all resources
to the surviving node.  To do this, it initiates a ‘stop’ operation on each
of the resources to perform the move.  The ‘stop’ fails as expected after
20s (the default action timeout).  However, since the node is already
known to be down, any operation sent to it is bound to fail, and I’d like
to avoid this 20-second delay.  It would be nice if operations sent to a
down node failed immediately, thus reducing the time it takes for the
resources to be started on the surviving node.  I do not want to reduce
the timeout for the operation, because the timeout is sensible when a
resource moves for reasons other than a node failure.  Is there a way to
accomplish this?



Thanks for your help.
