On Fri, Oct 7, 2011 at 2:05 AM, King, Christopher <ck...@broadviewnet.com> wrote:
> Possible bug with mandatory ordering involving stateful (i.e. master-slave)
> resources
>
> I have a 2-node cluster (we are running the SLES 11 HA extension, so the
> pacemaker version is 1.1.2) in which a master-slave resource is dependent on
> a clone resource via a mandatory ordering constraint. From “crm configure
> show”:
>
> primitive dummy ocf:heartbeat:Dummy \
>         op monitor interval="15s" \
>         op start interval="0" timeout="40s" \
>         op stop interval="0" timeout="60s"
>
> primitive statefuldummy ocf:heartbeat:Stateful \
>         op start timeout="1800s" \
>         op timeout="45s" \
>         op monitor interval="10s" timeout="60s" \
>         op promote timeout="45s" \
>         op demote timeout="30s"
>
> ms dummy-ms statefuldummy \
>         meta target-role="Started" master-max="1" master-node-max="1" clone-max="2" clone-node-max="1" notify="false" ordered="false" globally-unique="false" is-managed="true"
>
> clone dummy-clone dummy \
>         meta target-role="Started"
>
> order dummy-order inf: dummy-clone dummy-ms
>
> (I reproduced the problem we are experiencing with dummy resources to try
> and eliminate the RAs for our real resources as the source of the issue.)
>
> The order of events is as follows:
>
> 1) Force a shutdown of the dummy-clone via “crm resource stop dummy-clone”.
> 2) Logs show that the crm stops both the master and slave statefuldummy
>    resources of the dummy-ms. Good.
> 3) Logs show that the crm stops the dummy-clone resources. Good.
> 4) Logs immediately show that the crm starts the master and slave
>    statefuldummy resources of the dummy-ms. Bad.
> 5) Logs show the crm stopping the statefuldummy resources again. Good?
>
> Has anyone seen something similar? My understanding of the ordering
> constraints tells me that event #4 is erroneous behaviour.
Correct. Since you're a SLES customer, I'd advise you to contact SUSE
directly; they should be able to give it the proper attention and escalate
upstream if it's not already fixed.

> I would not expect the statefuldummy resources to be restarted until a
> “crm resource start dummy-clone” command is issued. If I have other types
> of resources dependent on the clone, such as another clone or a group, they
> behave as I would expect. It seems to be only with master-slave resources
> that the crm tries to start the resource inappropriately.
>
> In our real cluster, the master-slave returns an error (OCF_ERR_GENERIC)
> when it is started while its prerequisite resource is not started. In this
> case, event #5 does not happen, and the master-slave is never again
> restarted, even after the prerequisite clone resource is restarted via “crm
> resource start <resource-name>”.
>
> Thanks for your help,
>
> Chris King
>
> _______________________________________________
> Pacemaker mailing list: Pacemaker@oss.clusterlabs.org
> http://oss.clusterlabs.org/mailman/listinfo/pacemaker
>
> Project Home: http://www.clusterlabs.org
> Getting started: http://www.clusterlabs.org/doc/Cluster_from_Scratch.pdf
> Bugs:
> http://developerbugs.linux-foundation.org/enter_bug.cgi?product=Pacemaker
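The behaviour Chris expects from `order dummy-order inf: dummy-clone dummy-ms` can be pinned down with a toy model: stopping the first resource forces the dependent to stop before it, and the dependent may not start again until the first is running. The sketch below is a hypothetical Python illustration only; the `Cluster` class and its methods are assumptions made for this example, not Pacemaker's policy engine or any Pacemaker API, and it deliberately ignores master/slave roles, promotion, per-node clone instances, and ordering cycles.

```python
from dataclasses import dataclass, field
from typing import Dict, List, Tuple


@dataclass
class Cluster:
    """Hypothetical toy model of mandatory ("inf:") ordering constraints.

    Not Pacemaker code: it only encodes the expected start/stop rules.
    """

    # Resource name -> True if started.
    running: Dict[str, bool] = field(default_factory=dict)
    # (first, then) pairs: "then" may only run while "first" is running.
    mandatory_orders: List[Tuple[str, str]] = field(default_factory=list)

    def can_start(self, name: str) -> bool:
        # A dependent may start only if everything it is ordered after is up.
        return all(self.running.get(first, False)
                   for first, then in self.mandatory_orders if then == name)

    def stop(self, name: str) -> List[str]:
        """Stop `name`, stopping its dependents first; return the stop order."""
        sequence: List[str] = []
        for first, then in self.mandatory_orders:
            if first == name and self.running.get(then, False):
                sequence.extend(self.stop(then))
        if self.running.get(name, False):
            self.running[name] = False
            sequence.append(name)
        return sequence

    def start(self, name: str) -> bool:
        """Start `name` if its ordering prerequisites allow; report success."""
        if not self.can_start(name):
            return False
        self.running[name] = True
        return True


if __name__ == "__main__":
    c = Cluster(running={"dummy-clone": True, "dummy-ms": True},
                mandatory_orders=[("dummy-clone", "dummy-ms")])
    print(c.stop("dummy-clone"))  # ['dummy-ms', 'dummy-clone'] (events 2 and 3)
    print(c.start("dummy-ms"))    # False: event 4 should not happen
    print(c.start("dummy-clone"), c.start("dummy-ms"))  # True True
```

In this model the dependent only becomes startable again once the prerequisite is started, which matches the expectation stated in the thread that dummy-ms should stay down until “crm resource start dummy-clone” is issued.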