>>> Harvey Shepherd <harvey.sheph...@aviatnet.com> wrote on 12.08.2019 at
23:38 in message <ec767e3d-0cde-42c2-a8de-72ffce859...@email.android.com>:
> I've been experiencing exactly the same issue. Pacemaker prioritises
> restarting the failed resource over maintaining a master instance. In my
> case I used crm_simulate to analyse the actions planned and taken by
> Pacemaker during resource recovery. It showed that the system did plan to
> fail over the master instance, but that action was near the bottom of the
> action list. Higher priority was given to restarting the failed instance,
> and once that restart had happened, it was easier to just promote the same
> instance again rather than to fail over.
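(As an aside, for anyone who wants to reproduce that kind of analysis:
something like the following should print the allocation scores and the
planned action list. The resource and node names below are made up.)

  # Show scores and the planned transition for the live cluster:
  crm_simulate -sL

  # Simulate the cluster's reaction to a failed monitor (rc=1) of a
  # made-up resource "p_foo" on "node1", without touching the cluster:
  crm_simulate -SL --op-inject=p_foo_monitor_10000@node1=1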
That's interesting. Maybe it is usually actually faster to restart a failed
(master) process than to promote a slave to master, possibly demote the old
master to slave, etc. But notably, while there is a (possible) utilization
for resources, there is none for operations (AFAIK): if one could configure
"operation costs" (maybe as rules), the cluster could prefer the transition
with the lowest total cost. Unfortunately that would make things more
complicated. I could even imagine that if you set the cost for "stop" to
infinity, the cluster would not even try to stop the resource, but would
fence the node instead...

> This particular behaviour caused me a lot of headaches. In the end I had
> to use a workaround: setting the maximum failure count for the resource
> to 1 and clearing the failure after 10 seconds. This forces a failover,
> but there is then a window (longer than 10 seconds, due to the cluster
> recheck timer that is used to clear failures) during which the resource
> cannot fail back if a second failure happens. It also means that no slave
> is running during this time, which causes a performance hit in my case.
> [...]
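For reference, I read "max failures = 1, clear after 10 seconds" as the
migration-threshold and failure-timeout meta attributes. A minimal crmsh
sketch, with made-up resource names (ocf:pacemaker:Stateful only as a
stand-in for the real agent):

  # migration-threshold=1 forces a move after a single failure;
  # failure-timeout=10s lets the failcount expire again afterwards.
  crm configure primitive p_foo ocf:pacemaker:Stateful \
        op monitor interval=10s role=Master \
        op monitor interval=11s role=Slave \
        meta migration-threshold=1 failure-timeout=10s
  crm configure ms ms_foo p_foo meta master-max=1 clone-max=2

  # The failcount is only expired when the policy engine next runs, so
  # the effective window also depends on this cluster property:
  crm configure property cluster-recheck-interval=60s

That would also explain the window Harvey describes: failure-timeout is only
acted on at the next recheck, not exactly 10 seconds after the failure.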