The coordinator IMMND on PL-3 crashed. The active IMMD then elected a new coordinator on the standby node, SC-2, but the election failed because the IMMND on SC-2 had also restarted. As a result, the active IMMD exited and a failover occurred. SC-2 then took the active role, found no candidate for a new IMMND coordinator, and the cluster was rebooted.

This can be prevented if the active IMMD prefers to elect a coordinator located on the same node as itself, provided that the IMMND's database is up to date.
---
 src/imm/immd/immd_proc.c | 16 +++++++++++++---
 1 file changed, 13 insertions(+), 3 deletions(-)

diff --git a/src/imm/immd/immd_proc.c b/src/imm/immd/immd_proc.c
index 1882eef..34c1415 100644
--- a/src/imm/immd/immd_proc.c
+++ b/src/imm/immd/immd_proc.c
@@ -331,24 +331,34 @@ bool immd_proc_elect_coord(IMMD_CB *cb, bool new_active)
 		 */
 	} else {
 		/* Try to elect a new coord. */
+		IMMD_IMMND_INFO_NODE *candidate_coord_node = NULL;
 		cb->payload_coord_dest = 0LL;
 		memset(&key, 0, sizeof(MDS_DEST));
 		immd_immnd_info_node_getnext(&cb->immnd_tree, &key,
 					     &immnd_info_node);
+
+		// Election priority:
+		// 1) Coordinator on active node
+		// 2) Coordinator on standby node
+		// 3) Coordinator on PL node if SC absence is allowed.
 		while (immnd_info_node) {
 			key = immnd_info_node->immnd_dest;
 			if ((immnd_info_node->isOnController) &&
 			    (immnd_info_node->epoch == cb->mRulingEpoch)) {
-				/*We found a new candidate for cordinator */
+				candidate_coord_node = immnd_info_node;
 				immnd_info_node->isCoord = true;
-				break;
+				if (immnd_info_node->immnd_key == cb->node_id) {
+					/* Found a new candidate on active SC */
+					break;
+				}
 			}
 
 			immd_immnd_info_node_getnext(&cb->immnd_tree, &key,
 						     &immnd_info_node);
 		}
 
-		if (!immnd_info_node && cb->mScAbsenceAllowed) {
+		immnd_info_node = candidate_coord_node;
+		if (!immnd_info_node && cb->mScAbsenceAllowed) {
 			/* If SC absence is allowed and no SC based IMMND is
 			   available then elect an IMMND coord at a payload.
 			   Note this means that an IMMND at a payload may be
-- 
1.9.1
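
For reviewers who want to reason about the election order in isolation, here is a minimal standalone sketch of the priority rule described above. It is not the IMMD code: `struct immnd_node`, `elect_coord()` and all of their fields and parameters are hypothetical stand-ins, and the node list is a plain array rather than the IMMND tree. The sketch also assumes that a payload candidate must be at the ruling epoch, which this patch does not show. It marks the coordinator once, after the scan, so only the chosen node carries the flag.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>
#include <stdint.h>

/* Hypothetical, simplified stand-in for IMMD_IMMND_INFO_NODE. */
struct immnd_node {
	uint32_t node_id;   /* node hosting this IMMND */
	bool on_controller; /* true for SC nodes, false for payloads */
	uint32_t epoch;     /* epoch of this IMMND's database */
	bool is_coord;      /* set on the elected coordinator only */
};

/*
 * Pick a coordinator by priority:
 *   1) up-to-date IMMND on the active SC (active_node_id)
 *   2) up-to-date IMMND on another SC
 *   3) up-to-date IMMND on a payload, only if sc_absence_allowed
 *      (the epoch check for payloads is an assumption of this sketch)
 * Returns NULL if no candidate exists.
 */
struct immnd_node *elect_coord(struct immnd_node *nodes, size_t count,
			       uint32_t active_node_id, uint32_t ruling_epoch,
			       bool sc_absence_allowed)
{
	struct immnd_node *candidate = NULL; /* best SC candidate so far */
	struct immnd_node *payload = NULL;   /* first usable payload */

	assert(nodes != NULL || count == 0);

	for (size_t i = 0; i < count; i++) {
		struct immnd_node *n = &nodes[i];

		if (n->epoch != ruling_epoch)
			continue; /* database is not up to date */

		if (n->on_controller) {
			if (n->node_id == active_node_id) {
				candidate = n; /* priority 1: stop searching */
				break;
			}
			if (candidate == NULL)
				candidate = n; /* priority 2: keep looking */
		} else if (payload == NULL) {
			payload = n; /* priority 3: remember as fallback */
		}
	}

	if (candidate == NULL && sc_absence_allowed)
		candidate = payload;
	if (candidate != NULL)
		candidate->is_coord = true;
	return candidate;
}
```

With an up-to-date IMMND on both controllers, the sketch returns the one on the active SC regardless of its position in the list. It falls back to the other SC only when the active SC's IMMND is missing or stale.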