Hi Vu, ack review only. One minor comment below. /Thanks HansN

On 10/19/18 10:16, Vu Minh Nguyen wrote:

After split-recovery, the epoch counter on an IMMND veteran in this partition
may not match the one held by the active IMMD in another partition.

In that case, instead of generating a core dump, we should log an error to
syslog and have the IMMND veteran terminate itself.
---
 src/imm/immnd/immnd_evt.c | 17 ++++++++---------
 1 file changed, 8 insertions(+), 9 deletions(-)

diff --git a/src/imm/immnd/immnd_evt.c b/src/imm/immnd/immnd_evt.c
index b260d43ff..bc55ea946 100644
--- a/src/imm/immnd/immnd_evt.c
+++ b/src/imm/immnd/immnd_evt.c
@@ -10279,22 +10279,21 @@ static uint32_t immnd_evt_proc_start_sync(IMMND_CB *cb, IMMND_EVT *evt,
        } else {
                if (cb->mMyEpoch + 1 < cb->mRulingEpoch) {
                        if (cb->mState > IMM_SERVER_LOADING_PENDING) {
-                               LOG_WA(
-                                   "Imm at this node has epoch %u, "
+                               LOG_ER(
+                                   "Imm at this node has epoch %u, rulling epoch %u"
                                    "appears to be a stragler in wrong state %u",
-                                   cb->mMyEpoch, cb->mState);
-                               abort();
+                                   cb->mMyEpoch, cb->mRulingEpoch, cb->mState);
+                               exit(1);
                        } else {
                                TRACE_2(
                                    "This nodes apparently missed start of sync");
                        }
                } else {
-                       osafassert(cb->mMyEpoch + 1 > cb->mRulingEpoch);
-                       LOG_WA(
-                           "Imm at this evs node has epoch %u, "
+                       LOG_ER(
+                           "Imm at this evs node has epoch %u, rulling epoch %u"

[HansN] Perhaps the log message needs an update, e.g. "COORDINATOR appears to
be a straggler!!, exiting.", since the code now calls exit(1) instead of abort().


                            "COORDINATOR appears to be a stragler!!, aborting.",
-                           cb->mMyEpoch);
-                       abort();
+                           cb->mMyEpoch, cb->mRulingEpoch);
+                       exit(1);
                        /* TODO: 080414 re-inserted the osafassert/abort ...
                           This is an extreemely odd case. Possibly it could
                           occur after a failover ?? */
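
For reference, a minimal sketch of the decision the patched branch makes: compare
the local epoch with the ruling epoch, then either trace, or log an error and
exit. All names here (epoch_action, classify_epoch, must_exit,
handle_epoch_mismatch) are hypothetical and not taken from immnd_evt.c. The
past_loading flag stands in for cb->mState > IMM_SERVER_LOADING_PENDING, and
fprintf stands in for LOG_ER. The sketch assumes the caller has already handled
the normal case that the diff context does not show.

```c
#include <inttypes.h>
#include <stdint.h>
#include <stdio.h>
#include <stdlib.h>

/* Hypothetical names for illustration; not taken from immnd_evt.c. */
enum epoch_action {
    EPOCH_MISSED_SYNC_START, /* behind, but not past loading: trace only */
    EPOCH_NODE_STRAGGLER,    /* behind and past loading: log error, exit */
    EPOCH_COORD_STRAGGLER    /* not behind: coordinator looks stale, exit */
};

/* Classify the mismatch. past_loading is an assumed stand-in for
   cb->mState > IMM_SERVER_LOADING_PENDING. */
enum epoch_action classify_epoch(uint32_t my_epoch, uint32_t ruling_epoch,
                                 int past_loading)
{
    if (my_epoch + 1 < ruling_epoch) {
        return past_loading ? EPOCH_NODE_STRAGGLER : EPOCH_MISSED_SYNC_START;
    }
    return EPOCH_COORD_STRAGGLER;
}

/* Only the "missed start of sync" case keeps the process running. */
int must_exit(enum epoch_action action)
{
    return action != EPOCH_MISSED_SYNC_START;
}

/* Log and terminate without a core dump: exit() returns a status to the
   parent, whereas abort() raises SIGABRT. */
void handle_epoch_mismatch(uint32_t my_epoch, uint32_t ruling_epoch,
                           int past_loading)
{
    enum epoch_action action =
        classify_epoch(my_epoch, ruling_epoch, past_loading);

    if (!must_exit(action)) {
        return;
    }
    fprintf(stderr,
            "epoch %" PRIu32 ", ruling epoch %" PRIu32
            ": %s appears to be a straggler, exiting\n",
            my_epoch, ruling_epoch,
            action == EPOCH_NODE_STRAGGLER ? "node" : "coordinator");
    exit(EXIT_FAILURE);
}
```

Keeping the classification separate from the exit call makes the three outcomes
easy to check without terminating the process.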


_______________________________________________
Opensaf-devel mailing list
Opensaf-devel@lists.sourceforge.net
https://lists.sourceforge.net/lists/listinfo/opensaf-devel
