Can you try with these two patches please?

+ Andrew Beekhof (4 seconds ago) fec946a: Fix: crmd: When the DC gracefully shuts down, record the new expected state into the cib (HEAD, master)
+ Andrew Beekhof (10 seconds ago) 740122a: Fix: crmd: When a peer expectedly shuts down, record the new join and expected states into the cib
On 12 Nov 2013, at 11:05 am, Andrew Beekhof <and...@beekhof.net> wrote:

>
> On 12 Nov 2013, at 10:29 am, Andrew Beekhof <and...@beekhof.net> wrote:
>
>>
>> On 12 Nov 2013, at 2:46 am, Vladislav Bogdanov <bub...@hoster-ok.com> wrote:
>>
>>> 11.11.2013 09:00, Vladislav Bogdanov wrote:
>>> ...
>>>>>>> Looking at the crm-fence-peer.sh script, it would determine the peer state as
>>>>>>> offline immediately if the node state (all of)
>>>>>>> * doesn't contain "expected" tag or has it set to "down"
>>>>>>> * has "in_ccm" tag set to false
>>>>>>> * has "crmd" tag set to anything except "online"
>>>>>>>
>>>>>>> On the other hand, crmd sets "expected" = "down" only after fencing is
>>>>>>> complete (probably the same for "in_ccm"?). Shouldn't it do the same (or
>>>>>>> maybe just remove that tag) if a clean shutdown is about to complete?
>>>>>>
>>>>>> That would make sense. Are you using the plugin, cman or corosync 2?
>>>>
>>>
>>> This one works in all tests I was able to imagine, but I'm not sure it is
>>> completely safe to set expected="down" for the old DC (in the test where drbd is
>>> promoted on the DC and it reboots).
>>>
>>> From ddfccc8a40cfece5c29d61f44a4467954d5c5da8 Mon Sep 17 00:00:00 2001
>>> From: Vladislav Bogdanov <bub...@hoster-ok.com>
>>> Date: Mon, 11 Nov 2013 14:32:48 +0000
>>> Subject: [PATCH] Update node values in cib on clean shutdown
>>>
>>> ---
>>> crmd/callbacks.c | 6 +++++-
>>> crmd/membership.c | 2 +-
>>> 2 files changed, 6 insertions(+), 2 deletions(-)
>>>
>>> diff --git a/crmd/callbacks.c b/crmd/callbacks.c
>>> index 3dae17b..9cfb973 100644
>>> --- a/crmd/callbacks.c
>>> +++ b/crmd/callbacks.c
>>> @@ -162,6 +162,8 @@ peer_update_callback(enum crm_status_type type, crm_node_t * node, const void *d
>>> } else if (safe_str_eq(node->uname, fsa_our_dc) && crm_is_peer_active(node) == FALSE) {
>>> /* Did the DC leave us? */
>>> crm_notice("Our peer on the DC (%s) is dead", fsa_our_dc);
>>> + /* FIXME: is it safe? */
>>
>> Not at all safe. It will prevent fencing.
>>
>>> + crm_update_peer_expected(__FUNCTION__, node, CRMD_JOINSTATE_DOWN);
>>> register_fsa_input(C_CRMD_STATUS_CALLBACK, I_ELECTION, NULL);
>>> }
>>> break;
>>> @@ -169,6 +171,7 @@ peer_update_callback(enum crm_status_type type, crm_node_t * node, const void *d
>>>
>>> if (AM_I_DC) {
>>> xmlNode *update = NULL;
>>> + int flags = node_update_peer;
>>> gboolean alive = crm_is_peer_active(node);
>>> crm_action_t *down = match_down_event(0, node->uuid, NULL, appeared);
>>>
>>> @@ -199,6 +202,7 @@ peer_update_callback(enum crm_status_type type, crm_node_t * node, const void *d
>>>
>>> crm_update_peer_join(__FUNCTION__, node, crm_join_none);
>>> crm_update_peer_expected(__FUNCTION__, node, CRMD_JOINSTATE_DOWN);
>>> + flags |= node_update_cluster | node_update_join | node_update_expected;
>>
>> This does look ok though
>
> With the exception of 'node_update_cluster'.
> That didn't change here and shouldn't be touched until it really does leave
> the membership.
>
>>
>>> check_join_state(fsa_state, __FUNCTION__);
>>>
>>> update_graph(transition_graph, down);
>>> @@ -221,7 +225,7 @@ peer_update_callback(enum crm_status_type type, crm_node_t * node, const void *d
>>> crm_trace("Other %p", down);
>>> }
>>>
>>> - update = do_update_node_cib(node, node_update_peer, NULL, __FUNCTION__);
>>> + update = do_update_node_cib(node, flags, NULL, __FUNCTION__);
>>> fsa_cib_anon_update(XML_CIB_TAG_STATUS, update,
>>> cib_scope_local | cib_quorum_override | cib_can_create);
>>> free_xml(update);
>>> diff --git a/crmd/membership.c b/crmd/membership.c
>>> index be1863a..d68b3aa 100644
>>> --- a/crmd/membership.c
>>> +++ b/crmd/membership.c
>>> @@ -152,7 +152,7 @@ do_update_node_cib(crm_node_t * node, int flags, xmlNode * parent, const char *s
>>> crm_xml_add(node_state, XML_ATTR_UNAME, node->uname);
>>>
>>> if (flags & node_update_cluster) {
>>> - if (safe_str_eq(node->state, CRM_NODE_ACTIVE)) {
>>> + if (crm_is_peer_active(node)) {
>>
>> This is also wrong. XML_NODE_IN_CLUSTER is purely a record of whether the
>> node is in the current corosync/cman/heartbeat membership.
>>
>>> value = XML_BOOLEAN_YES;
>>> } else if (node->state) {
>>> value = XML_BOOLEAN_NO;
>>> --
>>> 1.7.1
>>>
>>>
>>>
>>> _______________________________________________
>>> Pacemaker mailing list: Pacemaker@oss.clusterlabs.org
>>> http://oss.clusterlabs.org/mailman/listinfo/pacemaker
>>>
>>> Project Home: http://www.clusterlabs.org
>>> Getting started: http://www.clusterlabs.org/doc/Cluster_from_Scratch.pdf
>>> Bugs: http://bugs.clusterlabs.org