 osaf/services/saf/immsv/immpbed/immpbe.cc |  24 ++++++++++++++++++++++--
 1 files changed, 22 insertions(+), 2 deletions(-)

If the PBE detaches and re-attaches while there are one or more open
non-critical (not yet committing) but non-empty CCBs, then before this
enhancement one would see the following in the syslog at apply of the CCB:

May 20 13:25:33 SC-2 local0.notice osafimmnd[406]: NO STARTING PBE process.
......
May 20 13:25:34 SC-2 local0.notice osafimmnd[406]: NO PBE-OI established on this SC. Dumping incrementally to file imm.db
May 20 13:25:49 SC-2 local0.info osafimmnd[406]: IN GOING FROM IMM_CCB_PREPARE to IMM_CCB_CRITICAL Ccb:4
May 20 13:25:49 SC-2 user.notice osafimmpbed: NO Record for ccb 0x4 not found or found aborted in ok_for_critical
May 20 13:25:49 SC-2 user.warn osafimmpbed: WA WARNING: CCB record for 4 does not have correct op-count
May 20 13:25:49 SC-2 local0.notice osafimmnd[406]: NO Invalid error reported implementer 'OpenSafImmPBE', Ccb 4 will be aborted

While this does catch the problem and abort the CCB, the op-count mechanism
that catches it is not intended for handling regular processing cases. It is
an extra safety harness intended to catch bugs, lost messages, or incorrect
behavior of the PBE.

This enhancement avoids dependence on the op-count safety harness by having
the restarted PBE (primary or slave) invoke the special admin-operation that
aborts all non-critical CCBs in the immsv. See enhancement ticket #1107 or the
IMMSV README for details about this admin-operation.

The newly (re)started PBE invokes the admin-operation asynchronously to avoid
getting blocked waiting for the reply to this admin-op. The risk of the
admin-op failing is minimal, and if it does fail we end up in the same
distributed logic as we have today. That is, we would end up in the op-count
safety harness.

No CCB can get applied without an ack from the PBE, so the admin-operation, if
it is successfully received by the IMMND coord, should result in all currently
non-critical CCBs getting aborted before the PBE can get any completed/apply
for such a CCB over FEVS.

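The intended effect of the admin-operation can be illustrated with a small toy
model. This is only a sketch under stated assumptions, not the immsv
implementation: the names CcbState, Ccb, CcbTable, abortNonCriticalCcbs and
tryApply are hypothetical and exist only for this example. It shows that once
the non-empty, non-critical CCBs have been aborted, a later apply of such a
CCB is rejected up front instead of being caught by the op-count check.

```cpp
#include <cassert>
#include <cstdint>
#include <map>
#include <vector>

// Toy model only: hypothetical names, not the immsv data structures.
enum class CcbState { Open, Critical, Aborted };

struct Ccb {
    CcbState state;
    unsigned opCount;  // number of operations added to the CCB so far
};

using CcbTable = std::map<std::uint32_t, Ccb>;

// Models the effect of the admin-operation invoked by a (re)started PBE:
// every non-empty CCB that has not yet reached the critical state is aborted.
// Returns the ids of the CCBs that were aborted.
std::vector<std::uint32_t> abortNonCriticalCcbs(CcbTable& ccbs) {
    std::vector<std::uint32_t> aborted;
    for (auto& entry : ccbs) {
        Ccb& ccb = entry.second;
        if (ccb.state == CcbState::Open && ccb.opCount > 0) {
            ccb.state = CcbState::Aborted;
            aborted.push_back(entry.first);
        }
    }
    return aborted;
}

// Models an apply request: only a CCB that is still open may go critical.
// Unknown, aborted, or already critical CCBs are rejected.
bool tryApply(CcbTable& ccbs, std::uint32_t id) {
    auto it = ccbs.find(id);
    if (it == ccbs.end() || it->second.state != CcbState::Open) {
        return false;
    }
    it->second.state = CcbState::Critical;
    return true;
}
```

In this model an empty open CCB and an already critical CCB are left
untouched, which mirrors the "non-empty, non-critical" condition described
above.
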
With this enhancement, if the PBE detaches and re-attaches while there are one
or more open non-critical (not yet committing) and non-empty CCBs, then these
CCBs will be aborted. The newly attached PBE may get an abort callback for
such CCBs, but these are ignored by the PBE.

With this enhancement one will see something like the following in the syslog
at an attempt to apply a CCB that was active during detach and attach of the
PBE:

May 21 12:41:34 SC-2 local0.notice osafimmnd[406]: NO Persistent Back End OI attached, pid: 764
May 21 12:41:34 SC-2 local0.notice osafimmnd[406]: NO Received: immadm -o 202 safRdn=immManagement,safApp=safImmService
May 21 12:41:34 SC-2 local0.info osafimmnd[406]: IN sAbortNonCriticalCcbs = true;
May 21 12:41:34 SC-2 local0.notice osafimmnd[406]: NO Implementer connected: 19 (OpenSafImmPBE) <332, 2020f>
May 21 12:41:34 SC-2 user.info osafimmpbed: IN Admop for aborting CCBs result: 1, immsv returned 1
May 21 12:41:34 SC-2 user.notice osafimmpbed: NO Update epoch 21 committing with ccbId:100000014/4294967316
May 21 12:41:34 SC-2 local0.notice osafimmd[396]: NO IMMND coord at 2020f
May 21 12:41:34 SC-2 local0.info osafimmnd[406]: IN Update of epoch is PERSISTENT.
May 21 12:41:35 SC-2 local0.notice osafimmnd[406]: NO PBE-OI established on this SC.
Dumping incrementally to file imm.db
May 21 12:41:35 SC-2 local0.info osafimmnd[406]: IN sAbortNonCriticalCcbs is true => set max_oi_timeout to 0
May 21 12:41:35 SC-2 local0.notice osafimmnd[406]: NO CCB 5 aborted by: immadm -o 202 safRdn=immManagement,safApp=safImmService
May 21 12:41:35 SC-2 local0.info osafimmnd[406]: IN sAbortNonCriticalCcbs reset to false
May 21 12:41:35 SC-2 local0.warn osafimmnd[406]: WA Timeout while waiting for implementer, aborting ccb:5
May 21 12:41:35 SC-2 user.warn osafimmpbed: WA Failed to find CCB object for 5/5
May 21 12:41:45 SC-2 local0.notice osafimmnd[406]: NO Ccb <5> not in correct state (12) for Apply ignoring request
May 21 12:41:45 SC-2 local0.warn osafimmnd[406]: WA Spurious and redundant ccb-apply request ignored ccbId:5

diff --git a/osaf/services/saf/immsv/immpbed/immpbe.cc b/osaf/services/saf/immsv/immpbed/immpbe.cc
--- a/osaf/services/saf/immsv/immpbed/immpbe.cc
+++ b/osaf/services/saf/immsv/immpbed/immpbe.cc
@@ -33,9 +33,13 @@
 
 static void saImmOmAdminOperationInvokeCallback(SaInvocationT invocation,
     SaAisErrorT operationReturnValue,
-    SaAisErrorT)
+    SaAisErrorT err)
 {
-    LOG_ER("Unexpected async admin-op callback invocation:%llx", invocation);
+    if(invocation == 1) {
+        LOG_IN("Admop for aborting CCBs result: %u, immsv returned %u", operationReturnValue, err);
+    } else {
+        LOG_ER("Unexpected async admin-op callback invocation:%llx", invocation);
+    }
 }
 
 static const SaImmCallbacksT callbacks = {
@@ -118,6 +122,7 @@ int main(int argc, char* argv[])
     unsigned int retryInterval = 1000000; /* 1 sec */
     unsigned int maxTries = 70; /* 70 times == max 70 secs */
     unsigned int tryCount=0;
+    const SaImmAdminOperationParamsT_2 *params[] = {NULL};
 
     if ((logPath = getenv("IMMSV_TRACE_PATHNAME")))
     {
@@ -320,6 +325,21 @@ int main(int argc, char* argv[])
         exit(1);
     }
 
+    /* Admin-op invoked to abort any non-empty non critical CCBs.
+       Such CCbs are doomed if the PBE (primary or slave) restarts.
+       Slave PBE can in fact not attach as long as there are active
+       non-empty CCBs in the system.
+    */
+    errorCode = saImmOmAdminOperationInvokeAsync_o3(ownerHandle, 1,
+        "safRdn=immManagement,safApp=safImmService", 0,
+        SA_IMM_ADMIN_ABORT_CCBS, params);
+
+    if(SA_AIS_OK != errorCode)
+    {
+        LOG_WA("Failed to invoke admin-op for aborting CCBs: err:%u - ignoring",
+            errorCode);
+    }
+
     /* errorCode = saImmOmAdminOwnerSet(ownerHandle, objectNames, SA_IMM_ONE); */

_______________________________________________
Opensaf-devel mailing list
Opensaf-devel@lists.sourceforge.net
https://lists.sourceforge.net/lists/listinfo/opensaf-devel