Hi,

On Mon, Nov 29, 2010 at 06:08:11PM +0100, Uwe Grawert wrote:
> Quoting Dejan Muhamedagic <deja...@fastmail.fm>:
>
> >Hi,
> >
> >On Mon, Nov 29, 2010 at 02:42:42PM +0100, Uwe Grawert wrote:
> >>Was: Re: [Pacemaker] crm resource restart doesn't restart the
> >>correct resource
> >>
> >>Quoting Dejan Muhamedagic <deja...@fastmail.fm>:
> >>
> >>>>This is happening because, when the clone is created,
> >>>>pacemaker stops the primitive but does not wait for the stop action
> >>>>to return, and just starts the primitive over. And that of course
> >>>>causes problems.
> >>>
> >>>Hmm, don't quite understand what is going on. Is that primitive
> >>>part of the group? Can you describe in more detail what is going
> >>>on?
> >>
> >>I have a group (grp_fs) consisting of an LVM and several Filesystem
> >>resources, in that order. That group is started and all resources are
> >>running. Now I clone this group by issuing:
> >>
> >>crm configure clone clo_fs grp_fs
> >>
> >>That stops all resources and starts them again as a clone. But
> >>Pacemaker does not seem to wait until the stop action has finished. I
> >>have modified the LVM RA to log the action command issued to the agent
> >>and the value returned by the agent:
> >>
> >>14:24:11 [ 14495 ] Action: start
> >>14:24:11 [ 14494 ] Action: stop
> >>14:24:13 [ 14494 ] RC: 1
> >>14:24:14 [ 14495 ] RC: 0
> >>14:24:14 [ 14599 ] Action: monitor
> >>14:24:14 [ 14599 ] RC: 0
> >>
> >>In brackets you see the PID. As can be seen, Pacemaker first issues a
> >>start command and then immediately a stop afterwards, not waiting for
> >>the first command to return. That produces an orphan resource. As a
> >>result, the state of the LVM resource (which is now cloned) is
> >>uncertain. It may start, but it may also fail.
> >
> >I see. The problem here is that as far as the cluster's
> >concerned, the new resources and the old resources are
> >unrelated: they have different names (before it was, say, lvm1 and
> >now it's lvm1:0). I'm not sure if the crmd/pengine can tell if
> >the resources of the group which are running actually belong to
> >the cloned group as well. Andrew? If not, then we'll have to
> >forbid creating a clone of running resources in the shell.
>
> Ok, if it is going to be forbidden to clone a running resource,
> there is a problem with groups. A stopped primitive gets its
> target-role property cleared when cloned. A group does not! If I
> stop a group, make a clone and try to start the clone, nothing
> happens until the target-role="stopped" is cleared manually from the
> CIB. Stopping a primitive in that group (say the first one) has the
> same effect. As long as some resource or group in the clone has the
> target-role property set, nothing will happen.

That bug was fixed yesterday in the 1.1 repository:

changeset:   10433:e99aa3451ce7
user:        Dejan Muhamedagic <de...@hello-penguin.com>
date:        Thu Dec 02 16:52:37 2010 +0100
summary:     Medium: Shell: repair management of cloned groups

Thanks for reporting.

Cheers,

Dejan

_______________________________________________
Pacemaker mailing list: Pacemaker@oss.clusterlabs.org
http://oss.clusterlabs.org/mailman/listinfo/pacemaker

Project Home: http://www.clusterlabs.org
Getting started: http://www.clusterlabs.org/doc/Cluster_from_Scratch.pdf
Bugs: http://developerbugs.linux-foundation.org/enter_bug.cgi?product=Pacemaker