Hello,

On Fri, 18 Feb 2022 21:44:58 +0000
"Larry G. Mills" <lgmi...@fnal.gov> wrote:
> ... This happened again recently, and the running primary DB was demoted and
> then re-promoted to be the running primary. What I'm having trouble
> understanding is why the running Master/primary DB was demoted. After the
> monitor operation timed out, the failcount for the ha-db resource was still
> less than the configured "migration-threshold", which is set to 5.

It was demoted because "migration-threshold" is only the limit before the
resource is moved away from the node. It does not prevent recovery on the
same node.

As long as your failcount is less than "migration-threshold" and the failure
is not fatal, the cluster will keep the resource on the same node and try to
"recover" it by running a full restart:

  demote -> stop -> start -> promote

Since 2.0, the recovery action can instead be just demote -> promote,
depending on the operation's "on-fail" property. See the "on-fail" property
and the details about it below the table:

https://clusterlabs.org/pacemaker/doc/2.1/Pacemaker_Explained/singlehtml/index.html#operation-properties

Regards,
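
PS: To make the decision described above concrete, here is a minimal Python
sketch. It is only a simplified model of the behaviour explained in this
mail, not Pacemaker's actual scheduler, which weighs many more inputs. The
function name `recovery_plan` and its parameters are hypothetical and chosen
for illustration, and treating on-fail "demote" as the setting that selects
the shorter sequence is my assumption; check the documentation linked above.

```python
def recovery_plan(failcount, migration_threshold, on_fail="restart"):
    """Return the recovery actions for a failed promoted resource.

    Simplified illustration only: fatal failures and every other scheduler
    input are deliberately ignored.
    """
    if failcount >= migration_threshold:
        # Threshold reached: the resource is moved away from this node.
        return ["move away from node"]
    if on_fail == "demote":
        # Assumed setting for the lighter recovery: no stop/start cycle.
        return ["demote", "promote"]
    # Default recovery on the same node: a full restart.
    return ["demote", "stop", "start", "promote"]


# One monitor timeout with migration-threshold=5, as in the quoted report.
print(recovery_plan(failcount=1, migration_threshold=5))
```

With a failcount of 1 and a threshold of 5, the model stays on the same node
and returns the full restart sequence, which is why the primary is demoted
and then promoted again.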