Possible bug with mandatory ordering involving stateful (i.e. master-slave) resources
I have a 2-node cluster (we are running the SLES 11 HA extension, so the pacemaker version is 1.1.2) in which a master-slave resource is dependent on a clone resource via a mandatory ordering constraint.

From "crm configure show":

    primitive dummy ocf:heartbeat:Dummy \
        op monitor interval="15s" \
        op start interval="0" timeout="40s" \
        op stop interval="0" timeout="60s"
    primitive statefuldummy ocf:heartbeat:Stateful \
        op start timeout="1800s" \
        op timeout="45s" \
        op monitor interval="10s" timeout="60s" \
        op promote timeout="45s" \
        op demote timeout="30s"
    ms dummy-ms statefuldummy \
        meta target-role="Started" master-max="1" master-node-max="1" clone-max="2" clone-node-max="1" notify="false" ordered="false" globally-unique="false" is-managed="true"
    clone dummy-clone dummy \
        meta target-role="Started"
    order dummy-order inf: dummy-clone dummy-ms

(I reproduced the problem we are experiencing with dummy resources to try to eliminate the RAs for our real resources as the source of the issue.)

The order of events is as follows:

1) Force a shutdown of the dummy-clone via "crm resource stop dummy-clone".
2) Logs show that the crm stops both the master and slave statefuldummy resources of the dummy-ms. Good.
3) Logs show that the crm stops the dummy-clone resources. Good.
4) Logs immediately show that the crm starts the master and slave statefuldummy resources of the dummy-ms. Bad.
5) Logs show the crm stopping the statefuldummy resources again. Good?

Has anyone seen something similar? My understanding of the ordering constraints tells me that event #4 is erroneous behaviour. I would not expect the statefuldummy resources to be restarted until a "crm resource start dummy-clone" command is issued.

If I have other types of resources dependent on the clone, such as another clone or a group, they behave as I would expect. It seems to be only with master-slave resources that the crm tries to start the resource inappropriately.
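For anyone who wants to try this on a test cluster, the sequence above can be scripted roughly as follows. This is only a minimal sketch: the names DRY_RUN, SETTLE_SECS, run and reproduce are hypothetical helpers introduced here, the 30-second settle time is an assumption, and "crm_mon -1" is used only to print a one-shot status between steps. By default the script just prints the commands rather than executing them.

```shell
#!/bin/sh
# Reproduction sketch for the stop/start sequence described above.
# DRY_RUN and SETTLE_SECS are hypothetical knobs, not part of pacemaker.
DRY_RUN="${DRY_RUN:-1}"            # 1 = only print the commands (default)
SETTLE_SECS="${SETTLE_SECS:-30}"   # assumed time for a transition to settle

# Print the command in dry-run mode, otherwise execute it.
run() {
    if [ "$DRY_RUN" = "1" ]; then
        echo "+ $*"
    else
        "$@"
    fi
}

reproduce() {
    run crm configure show               # confirm dummy-order is in place
    run crm resource stop dummy-clone    # event 1: stop the prerequisite
    run sleep "$SETTLE_SECS"
    run crm_mon -1                       # dummy-ms is expected to be stopped here
    run crm resource start dummy-clone   # only now should dummy-ms come back
    run sleep "$SETTLE_SECS"
    run crm_mon -1
}

reproduce
```

Running it with DRY_RUN=0 on a cluster node executes the commands for real; comparing the first "crm_mon -1" output with the cluster logs should show whether the statefuldummy instances were started while dummy-clone was still stopped.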

In our real cluster, the master-slave resource returns an error (OCF_ERR_GENERIC) when it is started while its prerequisite resource is not running. In this case, event #5 does not happen, and the master-slave resource is never started again, even after the prerequisite clone resource is restarted via "crm resource start <resource-name>".

Thanks for your help,
Chris King
_______________________________________________
Pacemaker mailing list: Pacemaker@oss.clusterlabs.org
http://oss.clusterlabs.org/mailman/listinfo/pacemaker

Project Home: http://www.clusterlabs.org
Getting started: http://www.clusterlabs.org/doc/Cluster_from_Scratch.pdf
Bugs: http://developerbugs.linux-foundation.org/enter_bug.cgi?product=Pacemaker