After split-recovery, the epoch counter on the IMMND veteran in this partition
may not match the one from the active IMMD in another partition.

Instead of generating a core dump in that case, log an error message to syslog
and have the IMMND veteran terminate itself.
---
 src/imm/immnd/immnd_evt.c | 17 ++++++++---------
 1 file changed, 8 insertions(+), 9 deletions(-)
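
For reviewers, the branch logic touched by this hunk can be sketched in
isolation as below. This is only an illustration under stated assumptions:
classify_epoch, the enum, and the loading_pending parameter are hypothetical
names that do not exist in immnd_evt.c, the epoch type is assumed to be
uint32_t, and only the branches visible in the hunk are covered (the enclosing
condition above the hunk is not modelled).

```c
#include <stdint.h>

/* Hypothetical names; only the comparisons mirror the patched hunk. */
enum epoch_action {
	EPOCH_TRACE_ONLY,  /* node merely missed the start of sync */
	EPOCH_LOG_AND_EXIT /* log at error level, then exit(1) instead of abort() */
};

/*
 * Decide what to do on the branches visible in the hunk.
 * loading_pending stands in for IMM_SERVER_LOADING_PENDING, whose numeric
 * value is not shown in the patch (assumption).
 */
enum epoch_action classify_epoch(uint32_t my_epoch, uint32_t ruling_epoch,
				 unsigned state, unsigned loading_pending)
{
	if (my_epoch + 1 < ruling_epoch) {
		/* Local epoch lags by more than one step. */
		return (state > loading_pending) ? EPOCH_LOG_AND_EXIT
						 : EPOCH_TRACE_ONLY;
	}
	/* Remaining visible branch: treated as a straggling coordinator. */
	return EPOCH_LOG_AND_EXIT;
}
```

In both "log and exit" outcomes the patch replaces abort() with exit(1), so the
mismatch terminates the process without producing a core dump.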

diff --git a/src/imm/immnd/immnd_evt.c b/src/imm/immnd/immnd_evt.c
index b260d43ff..bc55ea946 100644
--- a/src/imm/immnd/immnd_evt.c
+++ b/src/imm/immnd/immnd_evt.c
@@ -10279,22 +10279,21 @@ static uint32_t immnd_evt_proc_start_sync(IMMND_CB *cb, IMMND_EVT *evt,
        } else {
                if (cb->mMyEpoch + 1 < cb->mRulingEpoch) {
                        if (cb->mState > IMM_SERVER_LOADING_PENDING) {
-                               LOG_WA(
-                                   "Imm at this node has epoch %u, "
+                               LOG_ER(
+                                   "Imm at this node has epoch %u, ruling epoch %u, "
                                    "appears to be a stragler in wrong state %u",
-                                   cb->mMyEpoch, cb->mState);
-                               abort();
+                                   cb->mMyEpoch, cb->mRulingEpoch, cb->mState);
+                               exit(1);
                        } else {
                                TRACE_2(
                                    "This nodes apparently missed start of sync");
                        }
                } else {
-                       osafassert(cb->mMyEpoch + 1 > cb->mRulingEpoch);
-                       LOG_WA(
-                           "Imm at this evs node has epoch %u, "
+                       LOG_ER(
+                           "Imm at this evs node has epoch %u, ruling epoch %u, "
                            "COORDINATOR appears to be a stragler!!, aborting.",
-                           cb->mMyEpoch);
-                       abort();
+                           cb->mMyEpoch, cb->mRulingEpoch);
+                       exit(1);
                        /* TODO: 080414 re-inserted the osafassert/abort ...
                           This is an extreemely odd case. Possibly it could
                           occur after a failover ?? */
-- 
2.18.0


