osaf/services/saf/immsv/immpbed/immpbe.cc |  24 ++++++++++++++++++++++--
 1 files changed, 22 insertions(+), 2 deletions(-)


If the PBE detaches and re-attaches while there are one or more open
non-critical (not yet committing) but non-empty CCBs, then before this
enhancement one would see the following in the syslog at apply of the CCB:

 May 20 13:25:33 SC-2 local0.notice osafimmnd[406]: NO STARTING PBE process.
 ......
 May 20 13:25:34 SC-2 local0.notice osafimmnd[406]: NO PBE-OI established on this SC. Dumping incrementally to file imm.db
 May 20 13:25:49 SC-2 local0.info osafimmnd[406]: IN GOING FROM IMM_CCB_PREPARE to IMM_CCB_CRITICAL Ccb:4
 May 20 13:25:49 SC-2 user.notice osafimmpbed: NO Record for ccb 0x4 not found or found aborted in ok_for_critical
 May 20 13:25:49 SC-2 user.warn osafimmpbed: WA WARNING: CCB record for 4 does not have correct op-count
 May 20 13:25:49 SC-2 local0.notice osafimmnd[406]: NO Invalid error reported implementer 'OpenSafImmPBE', Ccb 4 will be aborted

While this does catch the problem and abort the CCB, the op-count mechanism
that catches it is not intended for handling regular processing cases.
It is an extra safety harness intended to catch bugs, lost messages, or
incorrect behavior of the PBE.

This enhancement avoids dependence on the op-count safety harness by having the
restarted PBE (primary or slave) invoke the special admin-operation that aborts
all non-critical CCBs in the immsv. See enhancement ticket #1107 or the IMMSV
README for details about this admin-operation.

The newly (re)started PBE invokes the admin-operation asynchronously to avoid
getting blocked waiting for the reply to this admin-op. The risk of the admin-op
failing is minimal, and if it does fail we end up in the same distributed logic
as we have today, that is, in the op-count safety harness. No CCB can get
applied without an ack from the PBE, so the admin-operation, if it is
successfully received by the IMMND coord, should result in all currently
non-critical CCBs getting aborted before the PBE can get any completed/apply
for such a CCB over FEVS.
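
The fire-and-forget pattern described above can be sketched as follows. This is
only an illustration under stated assumptions: FakeService, ReplyCounters,
onAdminOpReply and kAbortCcbsInvocation are hypothetical stand-ins and not part
of the IMM OM API. The only detail taken from the patch is that the invocation
id 1 identifies the abort-CCBs admin-op in the callback.

```cpp
#include <cassert>
#include <cstdint>
#include <cstdio>
#include <vector>

// Hypothetical stand-ins for illustration; none of this is the real IMM OM API.
using Invocation = std::uint64_t;
constexpr Invocation kAbortCcbsInvocation = 1;  // mirrors the literal 1 used in the patch

struct ReplyCounters {
    int abortCcbsReplies = 0;   // replies matching the abort-CCBs invocation
    int unexpectedReplies = 0;  // replies for any other invocation id
};

// Callback: recognise the one invocation we issued, flag anything else.
void onAdminOpReply(Invocation inv, unsigned opResult, unsigned svcResult,
                    ReplyCounters& counters) {
    if (inv == kAbortCcbsInvocation) {
        ++counters.abortCcbsReplies;
        std::printf("abort-CCBs admin-op result: %u, service returned %u\n",
                    opResult, svcResult);
    } else {
        ++counters.unexpectedReplies;
        std::printf("unexpected async admin-op reply, invocation %llx\n",
                    static_cast<unsigned long long>(inv));
    }
}

struct FakeService {
    std::vector<Invocation> pending;

    // Queue the request and return at once; the caller never waits for a reply.
    bool invokeAsync(Invocation inv) {
        pending.push_back(inv);
        return true;
    }

    // Deliver queued replies later, as a dispatch loop would.
    void dispatch(ReplyCounters& counters) {
        for (Invocation inv : pending) onAdminOpReply(inv, 1, 1, counters);
        pending.clear();
    }
};
```

The caller continues its start-up immediately after invokeAsync returns; the
outcome is only logged when the reply is dispatched, so a failed admin-op does
not block or fail the start-up path.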

With this enhancement, if the PBE detaches and re-attaches while there are one
or more open non-critical (not yet committing) and non-empty CCBs, then these
CCBs will be aborted. The newly attached PBE may get an abort callback for such
CCBs, but these are ignored by the PBE.
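
A minimal sketch of why such an abort callback is harmless, assuming the
restarted PBE simply has no record of the CCB. CcbRecords and onAbort are
hypothetical names used for illustration and do not reflect the actual
immpbe.cc data structures.

```cpp
#include <cassert>
#include <cstdint>
#include <cstdio>
#include <set>

// Hypothetical model of the CCB records a PBE holds; not the immpbe.cc code.
struct CcbRecords {
    std::set<std::uint64_t> open;

    // Returns true if a record was found and dropped,
    // false if the abort referred to an unknown CCB and was ignored.
    bool onAbort(std::uint64_t ccbId) {
        if (open.erase(ccbId) == 0) {
            std::printf("no CCB record for %llu, ignoring abort\n",
                        static_cast<unsigned long long>(ccbId));
            return false;
        }
        return true;
    }
};
```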

With this enhancement one will see something like the following in the syslog
at an attempt to apply a CCB that was active during detach and attach of the PBE:

May 21 12:41:34 SC-2 local0.notice osafimmnd[406]: NO Persistent Back End OI attached, pid: 764
May 21 12:41:34 SC-2 local0.notice osafimmnd[406]: NO Received: immadm -o 202 safRdn=immManagement,safApp=safImmService
May 21 12:41:34 SC-2 local0.info osafimmnd[406]: IN sAbortNonCriticalCcbs = true;
May 21 12:41:34 SC-2 local0.notice osafimmnd[406]: NO Implementer connected: 19 (OpenSafImmPBE) <332, 2020f>
May 21 12:41:34 SC-2 user.info osafimmpbed: IN Admop for aborting CCBs result: 1, immsv returned 1
May 21 12:41:34 SC-2 user.notice osafimmpbed: NO Update epoch 21 committing with ccbId:100000014/4294967316
May 21 12:41:34 SC-2 local0.notice osafimmd[396]: NO IMMND coord at 2020f
May 21 12:41:34 SC-2 local0.info osafimmnd[406]: IN Update of epoch is PERSISTENT.
May 21 12:41:35 SC-2 local0.notice osafimmnd[406]: NO PBE-OI established on this SC. Dumping incrementally to file imm.db
May 21 12:41:35 SC-2 local0.info osafimmnd[406]: IN sAbortNonCriticalCcbs is true => set max_oi_timeout to 0
May 21 12:41:35 SC-2 local0.notice osafimmnd[406]: NO CCB 5 aborted by: immadm -o 202 safRdn=immManagement,safApp=safImmService
May 21 12:41:35 SC-2 local0.info osafimmnd[406]: IN sAbortNonCriticalCcbs reset to false
May 21 12:41:35 SC-2 local0.warn osafimmnd[406]: WA Timeout while waiting for implementer, aborting ccb:5
May 21 12:41:35 SC-2 user.warn osafimmpbed: WA Failed to find CCB object for 5/5
May 21 12:41:45 SC-2 local0.notice osafimmnd[406]: NO Ccb <5> not in correct state (12) for Apply ignoring request
May 21 12:41:45 SC-2 local0.warn osafimmnd[406]: WA Spurious and redundant ccb-apply request ignored ccbId:5

diff --git a/osaf/services/saf/immsv/immpbed/immpbe.cc b/osaf/services/saf/immsv/immpbed/immpbe.cc
--- a/osaf/services/saf/immsv/immpbed/immpbe.cc
+++ b/osaf/services/saf/immsv/immpbed/immpbe.cc
@@ -33,9 +33,13 @@
 
 static void saImmOmAdminOperationInvokeCallback(SaInvocationT invocation,
        SaAisErrorT operationReturnValue,
-       SaAisErrorT)
+       SaAisErrorT err)
 {
-       LOG_ER("Unexpected async admin-op callback invocation:%llx", invocation);
+       if(invocation == 1) {
+               LOG_IN("Admop for aborting CCBs result: %u, immsv returned %u", operationReturnValue, err);
+       } else {
+               LOG_ER("Unexpected async admin-op callback invocation:%llx", invocation);
+       }
 }
 
 static const SaImmCallbacksT callbacks = {
@@ -118,6 +122,7 @@ int main(int argc, char* argv[])
        unsigned int            retryInterval = 1000000;        /* 1 sec */
        unsigned int            maxTries = 70;                          /* 70 times == max 70 secs */
        unsigned int            tryCount=0;
+       const SaImmAdminOperationParamsT_2 *params[] = {NULL};
 
        if ((logPath = getenv("IMMSV_TRACE_PATHNAME")))
        {
@@ -320,6 +325,21 @@ int main(int argc, char* argv[])
                exit(1);
        }
 
+       /* Admin-op invoked to abort any non-empty non critical CCBs.
+          Such CCbs are doomed if the PBE (primary or slave) restarts.
+          Slave PBE can in fact not attach as long as there are active
+          non-empty CCBs in the system. 
+        */
+       errorCode = saImmOmAdminOperationInvokeAsync_o3(ownerHandle, 1,
+               "safRdn=immManagement,safApp=safImmService", 0, 
+               SA_IMM_ADMIN_ABORT_CCBS, params);
+
+       if(SA_AIS_OK != errorCode)
+       {
+               LOG_WA("Failed to invoke admin-op for aborting CCBs: err:%u - ignoring",
+                       errorCode);
+       }
+
        /*
        errorCode = saImmOmAdminOwnerSet(ownerHandle, objectNames, SA_IMM_ONE);
        */
