On Fri, Mar 25, 2011 at 11:41 AM, Simone Gotti <simone.go...@gmail.com> wrote:
> On 03/25/2011 11:10 AM, Andrew Beekhof wrote:
>> On Thu, Mar 17, 2011 at 11:54 PM, Simone Gotti <simone.go...@gmail.com>
>> wrote:
>>> Hi,
>>>
>>> When using corosync + pcmk v1, starting both corosync and pacemakerd (and
>>> I think also when using heartbeat or anything other than cman) as quorum
>>> provider, at startup there will not be a <node_state/> entry in the CIB
>>> for the nodes that are not in the cluster.
>> No, I'm pretty sure heartbeat has the same behavior.
> I haven't tested it, but if it works like cman then I think that
> startup-fencing won't work on it either. But this would be very strange.
>
>>> Instead, when using cman as quorum provider there will be a <node_state>
>>> for every node known by cman, as lib/common/ais.c:cman_event_callback
>>> calls crm_update_peer for every node reported by cman_get_nodes.
>> Yep
>>
>>> Something similar will happen when using corosync+pcmkv1 if corosync is
>>> started on N nodes but pacemakerd is started only on N-M nodes.
>> Probably true.
>>
>>> All of this will break 'startup-fencing' because, from my understanding,
>>> the logic is this:
>>>
>>> 1) At startup all the nodes are marked (in
>>> lib/pengine/unpack.c:unpack_node) as unclean.
>>> 2) lib/pengine/unpack.c:unpack_status will cycle only over the available
>>> <node_state/> entries in the cib status section, resetting them to a clean
>>> status at the start and then marking them unclean if some conditions are met.
>>> 3) In pengine/allocate.c:stage6 all the unclean nodes are fenced.
>>>
>>> In the above conditions you'll have a <node_state/> in the cib status
>>> section also for nodes without pacemakerd enabled, and the startup
>>> fencing won't happen because there isn't any condition in unpack_status
>>> that will mark them as unclean.
>> But they're unclean by default... so the lack of a node_state
>> shouldn't affect that.
>> Or did you mean "clean" instead of "unclean"?
>
> The problem is not the lack of a node state but the opposite: the presence
> of a node state even for nodes that haven't joined the cluster. This
> happens with the current cman integration.
>
> The nodes known to pacemaker are all set as unclean by default (point
> 1 above).
> But if their <node_state/> is available in the CIB, then in point 2 they
> will be set as clean (unclean=false) and no condition check in
> unpack_status will mark them as unclean=true again.
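
The three points above can be illustrated with a toy model. This is only a sketch in Python, not Pacemaker's C code: the function name startup_fence_targets, the dict-based node states and the marks_unclean predicate are hypothetical stand-ins for unpack_node, the <node_state/> entries and the condition checks in unpack_status, whose real conditions are not reproduced here.

```python
def startup_fence_targets(known_nodes, node_states,
                          marks_unclean=lambda state: False):
    """Toy model of the startup-fencing logic described in the thread.

    known_nodes   -- names of all nodes the cluster knows about
    node_states   -- dict: node name -> state dict (one per <node_state/>)
    marks_unclean -- hypothetical stand-in for the condition checks in
                     unpack_status; by default no condition ever matches
    """
    # Point 1: every known node starts out unclean.
    unclean = {name: True for name in known_nodes}

    # Point 2: only nodes that have a <node_state/> entry are visited.
    # Each one is reset to clean, then re-marked only if a condition matches.
    for name, state in node_states.items():
        if name not in unclean:
            continue
        unclean[name] = False
        if marks_unclean(state):
            unclean[name] = True

    # Point 3: whatever is still unclean gets fenced.
    return sorted(name for name, flag in unclean.items() if flag)


known = ["A", "B", "C"]

# Only the joined nodes A and B have a <node_state/>: C stays unclean.
only_joined = startup_fence_targets(known, {"A": {}, "B": {}})
print(only_joined)   # ['C']

# cman-style: a <node_state/> exists for every known node, so C is reset
# to clean and nothing marks it unclean again.
all_present = startup_fence_targets(known, {"A": {}, "B": {}, "C": {}})
print(all_present)   # []
```

In this model the second call returns an empty list, which is the failure being reported: a node that never joined is not fenced at startup because its <node_state/> entry exists and no check marks it unclean again.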

Ok, I understand what you're saying now.

>>> I'm not an expert in the code. I discarded the solution of registering
>>> at startup only the active nodes rather than all the nodes known by cman,
>>> as it won't fix the corosync+pcmkv1 case.
>>>
>>> Instead I tried to understand when a node that has its status in the cib
>>> should be startup fenced, and a possible solution is in the attached patch.
>>> I noticed that when crm_update_peer inserts a new node, this one doesn't
>>> have the expected attribute set. So if startup-fencing is enabled I'm
>>> going to set the node as expected up.
>>
>> You lost me there... isn't this covered by just setting
>> startup-fencing=false?
> I lost you too :D . The problem is that startup-fencing is not working.
>
>
> Anyway. This first patch is a sort of attempt to make startup-fencing
> work when in the CIB there are <node_state/> tags also for nodes not in
> the cluster. But it was a quick attempt that I don't like, as my
> intention was primarily to explain the actual problem. But probably I
> wasn't very clear in doing this. Sorry.
>
> In the mail I sent after this one, I tried to make a first step by changing
> the behavior of the cman integration to make it work like the other
> implementations: add a <node_state/> tag only for the hosts that joined
> the cluster.

That patch looks dangerous.
If A comes up and then B, then:
A will have entries for A, B, C and D, but
B will only have entries for A and B

Can you file a bug for this please?

_______________________________________________
Pacemaker mailing list: Pacemaker@oss.clusterlabs.org
http://oss.clusterlabs.org/mailman/listinfo/pacemaker

Project Home: http://www.clusterlabs.org
Getting started: http://www.clusterlabs.org/doc/Cluster_from_Scratch.pdf
Bugs: http://developerbugs.linux-foundation.org/enter_bug.cgi?product=Pacemaker