On Tue, Sep 27, 2011 at 6:24 PM, Vladislav Bogdanov <bub...@hoster-ok.com> wrote:
> 27.09.2011 10:56, Andrew Beekhof wrote:
>> On Tue, Sep 27, 2011 at 5:07 PM, Vladislav Bogdanov
>> <bub...@hoster-ok.com> wrote:
>>> 27.09.2011 08:59, Andrew Beekhof wrote:
>>> [snip]
>>>>>>>>> I agree with Jiaju
>>>>>>>>> (https://lists.linux-foundation.org/pipermail/openais/2011-September/016713.html),
>>>>>>>>> that could be solely pacemaker problem, because it probably should
>>>>>>>>> originate fencing itself in such a situation I think.
>>>>>>>>>
>>>>>>>>> So, using pacemaker/dlm with openais stack is currently risky due to
>>>>>>>>> possible hangs of dlm_lockspaces.
>>>>>>>>
>>>>>>>> It shouldn't be, failing to connect to attrd is very unusual.
>>>>>>>
>>>>>>> By the way, one of underlying problems, which actually made me notice
>>>>>>> all this, is that pacemaker cluster does not fence its DC if it leaves
>>>>>>> the cluster for a very short time. That is what Jiaju told in his notes.
>>>>>>> And I can confirm that.
>>>>>>
>>>>>> That's highly surprising. Do the logs you sent display this behaviour?
>>>>>
>>>>> They do. Rest of the cluster begins the election, but then accepts
>>>>> returned DC back (I write this from memory, I looked at logs Sep 5-6, so
>>>>> I may mix up something).
>>>>
>>>> Actually, this might be possible - if DC.old came back before DC.new
>>>> had a chance to get elected, run the PE and initiate fencing, then
>>>> there would be no need to fence.
>>>>
>>>
>>> (text below is for pacemaker on top of openais stack, not for cman)
>>>
>>> Except dlm lockspaces are in kern_stop state, so a whole dlm-related
>>> part is frozen :( - clvmd in my case, but I expect the same from gfs2
>>> and ocfs2.
>>> And fencing requests originated on CPG NODEDOWN event by dlm_controld
>>> (with my patch to dlm_controld and your patch for
>>> crm_terminate_member_common()) on a quorate partition are lost. DC.old
>>> doesn't accept CIB updates from other nodes, so that fencing requests
>>> are discarded.
>>
>> All the more reason to start using the stonith api directly.
>> I was playing around last night with the dlm_controld.pcmk code:
>>
>> https://github.com/beekhof/dlm/commit/9f890a36f6844c2a0567aea0a0e29cc47b01b787
>
> Wow, I'll try it!
>
> Btw (offtopic), don't you think that it could be interesting to have
> stacks support in dlopened modules there? From what I see in that code,
> it could be almost easily achieved. One just needs to create module API
> structure, enumerate functions in each stack, add module loading to
> dlm_controld core and change calls to module functions.
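
For illustration only, here is a rough sketch of the shape such a module API could take. Nothing in it comes from the dlm_controld sources: struct stack_ops, find_stack(), core_fence() and the stub functions are all hypothetical names, and the two stacks are compiled in rather than loaded. In a real build each table would presumably live in its own shared object and be fetched with dlopen()/dlsym(), which is left out here to keep the example self-contained.

```c
#include <assert.h>
#include <stddef.h>
#include <string.h>

/* Hypothetical module API: each cluster stack fills in one of these. */
struct stack_ops {
    const char *name;               /* stack identifier used for lookup */
    int (*setup)(void);             /* connect to the cluster stack */
    int (*fence_node)(int nodeid);  /* ask the stack to fence a node */
};

/* Bookkeeping for the stubs below, so the dispatch can be observed. */
static int last_fenced_node = -1;
static const char *last_fenced_by = "";

/* Stub "pcmk" stack (illustrative, does no real work). */
static int pcmk_setup(void) { return 0; }
static int pcmk_fence_node(int nodeid)
{
    last_fenced_node = nodeid;
    last_fenced_by = "pcmk";
    return 0;
}

/* Stub "cman" stack (illustrative, does no real work). */
static int cman_setup(void) { return 0; }
static int cman_fence_node(int nodeid)
{
    last_fenced_node = nodeid;
    last_fenced_by = "cman";
    return 0;
}

/* The enumerated stacks; with dlopen() each entry would come from a module. */
static const struct stack_ops stacks[] = {
    { "pcmk", pcmk_setup, pcmk_fence_node },
    { "cman", cman_setup, cman_fence_node },
};

/* Look a stack up by name; returns NULL if it is not known. */
const struct stack_ops *find_stack(const char *name)
{
    size_t i;

    for (i = 0; i < sizeof(stacks) / sizeof(stacks[0]); i++) {
        if (strcmp(stacks[i].name, name) == 0)
            return &stacks[i];
    }
    return NULL;
}

/* Core code calls through the table instead of a stack-specific function. */
int core_fence(const struct stack_ops *ops, int nodeid)
{
    assert(ops != NULL && ops->fence_node != NULL);
    return ops->fence_node(nodeid);
}
```

The core would pick a table once at startup, e.g. find_stack("pcmk"), call its setup() hook, and route every later call through the pointers, so adding a stack means adding one table and no changes to the callers.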

I'm sure it's possible. Just up to David if he wants to support it.

>
>>
>>>
>>> I think that problem is that membership changes are handled in a
>>> non-transactional way (?).
>>
>> Sounds more like the dlm/etc is being dumb - if the host is back and
>> healthy, why would we want to shoot it?
>
> Ammmm..... No comments from me on this ;)
>
> But, anyways, something needs to be done at either side...
>
>>
>>> If pacemaker fully finishes processing of one membership change - elect
>>> new DC on a quorate partition, and do not try to take over dc role (or
>>> release it) on a non-quorate partition if quorate one exists, that
>>> problem could be gone.
>>
>> Non-quorate partitions still have a DC.
>> They're just not supposed to do anything (depending on the value of
>> no-quorum-policy).
>
> I actually meant "do not try to take over dc role in a rejoined cluster
> (or release that role) if it was running on a non-quorate partition
> before rejoin if quorate one existed".

All existing DCs give up the role and a new one is elected when two
partitions join. So I'm unsure what you're referring to here :-)

> Sorry for confusion. Not very
> natural wording again, but should be better.
>
> Maybe DC from non-quorate partition should just have lower priority to
> become DC when cluster rejoins and a new election happens (does it?)?

There is no bias towards past DCs in the election.

_______________________________________________
Pacemaker mailing list: Pacemaker@oss.clusterlabs.org
http://oss.clusterlabs.org/mailman/listinfo/pacemaker

Project Home: http://www.clusterlabs.org
Getting started: http://www.clusterlabs.org/doc/Cluster_from_Scratch.pdf
Bugs: http://developerbugs.linux-foundation.org/enter_bug.cgi?product=Pacemaker