On Tue, Dec 18, 2012 at 4:24 PM, pavan tc <pavan...@gmail.com> wrote:
> [..]
>
>> > The idea is to make sure that stop does not fail when the underlying
>> > resource goes away.
>> > (Otherwise I see that the resource gets to an unmanaged state)
>> > Also, the expectation is that when the resource comes back, it joins the
>> > cluster without much fuss.
>> >
>> > What I see is that pacemaker calls stop twice
>>
>> That would not be expected. Bug?
>
> Are you pointing at stop getting called 'twice'?

Correct

> If yes, I will confirm once more about
> the behaviour and will raise a bug.
>
>> > and if it finds that stop returns success,
>> > it does not continue with monitor any more. I also do not see an
>> > attempt to start.
>>
>> Anywhere? Or just on the same node?
>
> On the same node. The resource does get promoted on the other node.
> My expectation was that if I kept returning OCF_NOT_RUNNING in monitor,
> then it should attempt a start-stop-monitor cycle till the resource came
> back.
> It seems this is not what the cluster manager does?

Not always, it very much depends on the constraints you've defined and
things like migration-threshold.

>> > Is there a way to keep the monitor going in such circumstances?
>>
>> Not really. You can define a recurring monitor for the Stopped role
>> though.
>
> I did not want to go there if I could achieve it via the usual mechanisms.

If you want to monitor a resource on a node that it's not running on,
that _is_ the usual mechanism. The thing is that it's an unusual
thing to want to do.

> If that is not possible, I will explore this option in more detail.
>
>> But why would it come back? You _really_ should not be starting
>> services outside of the cluster - not least of all because we've
>> probably started it somewhere else in the meantime.
>
> Even if we started the resource elsewhere, we are running in degraded mode.

Not on the node for which you returned "stopped". There you are just
flat-out not running at all.

> (My bad, I did not mention this is a _two-node_ multi-state resource).
> We would like to come back to the available mode as early as possible and
> with the least amount of manual intervention with the cluster.

Normally I wouldn't expect any manual intervention either, but I
really can't comment further without seeing logs and configs.
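
To make the Stopped-role monitor and migration-threshold remarks above
concrete, here is a rough sketch in crm shell syntax. It is not taken from
your configuration: the resource ids (p_myres, ms_myres) and the agent name
(ocf:example:myres) are placeholders, and the intervals, threshold and
timeout are arbitrary values chosen for illustration.

```
# Hypothetical multi-state resource; substitute your own agent and ids.
# Each role gets its own recurring monitor, and the intervals must differ.
# The role="Stopped" monitor also runs on nodes where the cluster believes
# the resource is not active.
primitive p_myres ocf:example:myres \
    op monitor interval="10s" role="Master" \
    op monitor interval="11s" role="Slave" \
    op monitor interval="30s" role="Stopped" \
    meta migration-threshold="3" failure-timeout="120s"

# Two-node master/slave wrapper around the primitive.
ms ms_myres p_myres \
    meta master-max="1" clone-max="2" notify="true"
```

With settings along these lines, migration-threshold limits how many
failures a node may accumulate before the resource is kept off it, and
failure-timeout lets that failure count expire so the node can become
eligible again without a manual cleanup. Whether this gives you the
behaviour you are after still depends on the rest of your constraints.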
_______________________________________________
Pacemaker mailing list: Pacemaker@oss.clusterlabs.org
http://oss.clusterlabs.org/mailman/listinfo/pacemaker

Project Home: http://www.clusterlabs.org
Getting started: http://www.clusterlabs.org/doc/Cluster_from_Scratch.pdf
Bugs: http://bugs.clusterlabs.org