On 03/17/2011 11:54 PM, Simone Gotti wrote:
> Hi,
>
> When using corosync + pcmk v1 starting both corosync and pacemakerd (and
> I think also using heartbeat or anything other than cman) as quorum
> provider, at startup in the CIB will not be a <node_state/> entry for
> the nodes that are not in cluster.
>
> Instead when using cman as quorum provider there will be a <node_state>
> for every node known by cman as lib/common/ais.c:cman_event_callback
> calls crm_update_peer for every node reported by cman_get_nodes.
>
> Something similar will happen when using corosync+pcmkv1 if corosync is
> started on N nodes but pacemakerd is started only on N-M nodes.
>
> All of this will break 'startup-fencing' because, from my understanding,
> the logic is this:
>
> 1) At startup all the nodes are marked (in
> lib/pengine/unpack.c:unpack_node) as unclean.
> 2) lib/pengine/unpack.c:unpack_status will cycle only the available
> <node_state/> in the cib status section resetting them to a clean status
> at the start and then putting them as unclean if some conditions are met.
> 3) pengine/allocate.c:stage6 all the unclean nodes are fenced.
>
> In the above conditions you'll have a <node_state/> in the cib status
> section also for nodes without pacemakerd enabled and the startup
> fencing won't happen because there isn't any condition in unpack_status
> that will mark them as unclean.
>
>
> I'm not very expert of the code. I discarded the solution to not
> register at startup all the nodes known by cman but only the active ones
> as it won't fix the corosync+pcmkv1 case.
>
> Instead I tried to understand when a node that has its status in the cib
> should be startup fenced and a possible solution is in the attached patch.
> I noticed that when crm_update_peer inserts a new node this one doesn't
> have the expected attribute set. So if startup-fencing is enabled I'm
> going to set the node as expected up.

Hi,
Thinking a little more about this, I think that the cman case and the
pcmkv1 case are quite different. It's probably correct to have cman +
pacemaker started on some nodes and only cman started on other nodes.

So it would be better, as a first step, to make the cman integration
work like the other cases, and then look at some problems already present
in all the implementations (I have some corner cases in mind that I'd
like to explain in the next few days).

The attached patch tries to add only the active nodes to the cib status
section at startup.

Thanks!
Bye!

>
> Thanks!
> Bye!
>
# HG changeset patch
# User Simone Gotti <simone.go...@gmail.com>
# Date 1300498033 -3600
# Node ID 1152982cac5558fea2faf5e344e76ac18d0b80c5
# Parent  30d64eaba0506e3ed85f442fd90ea3adc83c9501
At startup add only the active nodes. This will make the cman integration behave as the other and let startup-fencing work.

diff -r 30d64eaba050 -r 1152982cac55 lib/common/ais.c
--- a/lib/common/ais.c	Thu Mar 17 23:42:33 2011 +0100
+++ b/lib/common/ais.c	Sat Mar 19 02:27:13 2011 +0100
@@ -636,7 +636,7 @@
 
 #define MAX_NODES 256
 
-static void cman_event_callback(cman_handle_t handle, void *privdata, int reason, int arg)
+static void cman_event_handle(cman_handle_t handle, void *privdata, int reason, int arg, int startup)
 {
     int rc = 0, lpc = 0, node_count = 0;
 
@@ -674,10 +674,13 @@
 		/* Never allow node ID 0 to be considered a member #315711 */
 		cman_nodes[lpc].cn_member = 0;
 	    }
-	    crm_update_peer(cman_nodes[lpc].cn_nodeid, cman_nodes[lpc].cn_incarnation,
+	    /* At startup add only the active nodes or startup fencing won't work */
+	    if ((startup && cman_nodes[lpc].cn_member) || !startup ) {
+		crm_update_peer(cman_nodes[lpc].cn_nodeid, cman_nodes[lpc].cn_incarnation,
 			    cman_nodes[lpc].cn_member?crm_peer_seq:0, 0, 0,
 			    cman_nodes[lpc].cn_name, cman_nodes[lpc].cn_name, NULL,
 			    cman_nodes[lpc].cn_member?CRM_NODE_MEMBER:CRM_NODE_LOST);
+	    }
 	}
 
 	if(dispatch) {
@@ -696,6 +699,12 @@
 	    break;
     }
 }
+
+static void cman_event_callback(cman_handle_t handle, void *privdata, int reason, int arg)
+{
+    cman_event_handle(handle, privdata, reason, arg, FALSE);
+}
+
 #endif
 
 gboolean init_cman_connection(
@@ -729,8 +738,8 @@
     }
 
     /* Get the current membership state */
-    cman_event_callback(pcmk_cman_handle, dispatch, CMAN_REASON_STATECHANGE,
-			cman_is_quorate(pcmk_cman_handle));
+    cman_event_handle(pcmk_cman_handle, dispatch, CMAN_REASON_STATECHANGE,
+			cman_is_quorate(pcmk_cman_handle), TRUE);
 
     fd = cman_get_fd(pcmk_cman_handle);
     crm_debug("Adding fd=%d to mainloop", fd);
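
To make the effect of the new condition easier to see in isolation, here is a minimal standalone sketch of the filter. It is only an illustration, not Pacemaker code: struct node_info, should_register and count_registered are hypothetical names, and the struct merely stands in for the cn_nodeid and cn_member fields that the patch reads from the cman node list.

```c
#include <assert.h>
#include <stddef.h>

/* Hypothetical stand-in for the cn_nodeid / cn_member fields used in the patch. */
struct node_info {
    int nodeid;
    int member;   /* non-zero if the node is currently a cluster member */
};

/* Same condition as in the patch: at startup only current members are
 * registered; on later membership events every reported node is. */
static int should_register(int startup, int member)
{
    return (startup && member) || !startup;
}

/* Count how many of the reported nodes would be handed to crm_update_peer. */
static size_t count_registered(const struct node_info *nodes, size_t n, int startup)
{
    size_t lpc, count = 0;

    assert(nodes != NULL || n == 0);
    for (lpc = 0; lpc < n; lpc++) {
        if (should_register(startup, nodes[lpc].member)) {
            count++;
        }
    }
    return count;
}
```

With three reported nodes of which one is not a member, the startup pass would register two of them, while a later state-change event would register all three, so the inactive node gets no status entry at startup and stays unclean for startup-fencing.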
_______________________________________________
Pacemaker mailing list: Pacemaker@oss.clusterlabs.org
http://oss.clusterlabs.org/mailman/listinfo/pacemaker

Project Home: http://www.clusterlabs.org
Getting started: http://www.clusterlabs.org/doc/Cluster_from_Scratch.pdf
Bugs: http://developerbugs.linux-foundation.org/enter_bug.cgi?product=Pacemaker