Hi Vu,

Ack, review only. One minor comment below.

/Thanks HansN

On 10/19/18 10:16, Vu Minh Nguyen wrote:
After split-recovery, the epoch counter on the IMMND veteran in this
partition may not match the one from the active IMMD on another partition.
In that case, instead of generating a coredump, we should log an error
message to syslog and have the IMMND veteran terminate itself.
---
 src/imm/immnd/immnd_evt.c | 17 ++++++++---------
 1 file changed, 8 insertions(+), 9 deletions(-)

diff --git a/src/imm/immnd/immnd_evt.c b/src/imm/immnd/immnd_evt.c
index b260d43ff..bc55ea946 100644
--- a/src/imm/immnd/immnd_evt.c
+++ b/src/imm/immnd/immnd_evt.c
@@ -10279,22 +10279,21 @@ static uint32_t immnd_evt_proc_start_sync(IMMND_CB *cb, IMMND_EVT *evt,
 	} else {
 		if (cb->mMyEpoch + 1 < cb->mRulingEpoch) {
 			if (cb->mState > IMM_SERVER_LOADING_PENDING) {
-				LOG_WA(
-				    "Imm at this node has epoch %u, "
+				LOG_ER(
+				    "Imm at this node has epoch %u, rulling epoch %u"
 				    "appears to be a stragler in wrong state %u",
-				    cb->mMyEpoch, cb->mState);
-				abort();
+				    cb->mMyEpoch, cb->mRulingEpoch, cb->mState);
+				exit(1);
 			} else {
 				TRACE_2(
 				    "This nodes apparently missed start of sync");
 			}
 		} else {
-			osafassert(cb->mMyEpoch + 1 > cb->mRulingEpoch);
-			LOG_WA(
-			    "Imm at this evs node has epoch %u, "
+			LOG_ER(
+			    "Imm at this evs node has epoch %u, rulling epoch %u"

[HansN] Perhaps the log message needs some updates, e.g.
"COORDINATOR appears to be a straggler!!, exiting."

 			    "COORDINATOR appears to be a stragler!!, aborting.",
-			    cb->mMyEpoch);
-			abort();
+			    cb->mMyEpoch, cb->mRulingEpoch);
+			exit(1);
 			/* TODO: 080414 re-inserted the osafassert/abort ...
 			   This is an extreemely odd case. Possibly it could occur after a failover ?? */
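
For reference, the branch structure the patch touches can be sketched in isolation as below. This is only an illustration, not the OpenSAF code: the names epoch_verdict, classify_epoch and act_on_verdict are made up for this mail, the boolean past_loading_pending stands in for the check cb->mState > IMM_SERVER_LOADING_PENDING, and the "in step" case (my epoch + 1 == ruling epoch) is my assumption about what the branch above the quoted hunk handles. fprintf(stderr, ...) stands in for LOG_ER.

```c
#include <stdbool.h>
#include <stdint.h>
#include <stdio.h>
#include <stdlib.h>

/* Hypothetical classification of the local epoch against the ruling epoch. */
enum epoch_verdict {
    EPOCH_IN_STEP,     /* my_epoch + 1 == ruling_epoch (assumed normal case) */
    EPOCH_MISSED_SYNC, /* behind, but not past loading-pending: only traced  */
    EPOCH_STRAGGLER,   /* behind and past loading-pending: log and exit      */
    EPOCH_AHEAD        /* my_epoch + 1 > ruling_epoch: log and exit          */
};

/* Mirrors the comparisons in the quoted hunk; uses the same unsigned
 * 32-bit arithmetic, so my_epoch + 1 wraps at UINT32_MAX just as there. */
enum epoch_verdict classify_epoch(uint32_t my_epoch, uint32_t ruling_epoch,
                                  bool past_loading_pending)
{
    if (my_epoch + 1 == ruling_epoch)
        return EPOCH_IN_STEP;
    if (my_epoch + 1 < ruling_epoch)
        return past_loading_pending ? EPOCH_STRAGGLER : EPOCH_MISSED_SYNC;
    return EPOCH_AHEAD;
}

/* Log and terminate with a plain exit status instead of abort(), so that
 * no core dump is produced for a mismatch that is expected after
 * split-recovery. */
void act_on_verdict(enum epoch_verdict verdict, uint32_t my_epoch,
                    uint32_t ruling_epoch)
{
    switch (verdict) {
    case EPOCH_STRAGGLER:
    case EPOCH_AHEAD:
        fprintf(stderr,
                "epoch %u, ruling epoch %u: inconsistent epochs, exiting\n",
                (unsigned)my_epoch, (unsigned)ruling_epoch);
        exit(EXIT_FAILURE);
    case EPOCH_MISSED_SYNC:
    case EPOCH_IN_STEP:
        break; /* nothing fatal; the caller carries on */
    }
}
```

With the osafassert removed, the last branch is reached purely by elimination, which is why classify_epoch returns EPOCH_AHEAD without a further check.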