I have two Pacemaker resources; call them A and B. For environmental reasons, their start and monitor methods always return failure
(OCF_ERR_GENERIC). Their configurations are below (the cluster property start-failure-is-fatal is set to false):

    primitive A A \
        op monitor interval=20 timeout=120 \
        op stop interval=0 timeout=120 on-fail=restart \
        op start interval=0 timeout=240 on-fail=restart \
        meta failure-timeout=60s
    primitive B B \
        op monitor interval=20 timeout=120 \
        op stop interval=0 timeout=120 on-fail=restart \
        op start interval=0 timeout=240 on-fail=restart \
        meta failure-timeout=60s
    clone A_cl A
    clone B_cl B

Their methods take different amounts of time to complete:

    A: start = 60s, monitor < 1s, stop = 80s
    B: start < 1s,  monitor < 1s, stop < 1s

Resource A is scheduled normally: it is repeatedly started and stopped. Resource B, however, only shows repeated monitor failures; it is never stopped or started. In addition, "crm status -f" shows no fail-count for B.

Either of two changes makes B get scheduled:

1. Increase B's failure-timeout from 60s to 600s.
2. Modify A's OCF agent so that its stop method returns as soon as possible.

I tested this several times and the results were always the same. Why is the resource not scheduled when failure-timeout is set too short? And what does that have to do with how long another resource takes to stop? Is this a bug? My Pacemaker version is 1.1.16.

Any suggestions are welcome. Thank you!

James
2018-05-20
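P.S. For reference, workaround 1 corresponds to a definition of B like the following. This is a sketch in the same crm shell syntax as the configuration above; the only difference from the original is the failure-timeout value (600s instead of 60s), and the agent name is left as in the original.

```
# Workaround 1 (sketch): same definition of B as above,
# with failure-timeout raised from 60s to 600s.
primitive B B \
    op monitor interval=20 timeout=120 \
    op stop interval=0 timeout=120 on-fail=restart \
    op start interval=0 timeout=240 on-fail=restart \
    meta failure-timeout=600s
clone B_cl B
```

Note that A's operations (start 60s, stop 80s) stay well within their configured timeouts (240s and 120s), so none of A's operations should be timing out; the difference is only in how long A's start/stop cycle runs relative to B's 60s failure-timeout.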
_______________________________________________
Users mailing list: Users@clusterlabs.org
https://lists.clusterlabs.org/mailman/listinfo/users

Project Home: http://www.clusterlabs.org
Getting started: http://www.clusterlabs.org/doc/Cluster_from_Scratch.pdf
Bugs: http://bugs.clusterlabs.org