osaf/services/saf/immsv/immpbed/immpbe.cc | 24 ++++++++++++++++++++++--
1 files changed, 22 insertions(+), 2 deletions(-)
If the PBE detaches and re-attaches while there are one or more open
non-critical (not yet committing) but non-empty CCBs, then before this
enhancement one would see the following in the syslog at apply of the CCB:
May 20 13:25:33 SC-2 local0.notice osafimmnd[406]: NO STARTING PBE process.
......
May 20 13:25:34 SC-2 local0.notice osafimmnd[406]: NO PBE-OI established on this SC. Dumping incrementally to file imm.db
May 20 13:25:49 SC-2 local0.info osafimmnd[406]: IN GOING FROM IMM_CCB_PREPARE to IMM_CCB_CRITICAL Ccb:4
May 20 13:25:49 SC-2 user.notice osafimmpbed: NO Record for ccb 0x4 not found or found aborted in ok_for_critical
May 20 13:25:49 SC-2 user.warn osafimmpbed: WA WARNING: CCB record for 4 does not have correct op-count
May 20 13:25:49 SC-2 local0.notice osafimmnd[406]: NO Invalid error reported implementer 'OpenSafImmPBE', Ccb 4 will be aborted
While this does catch the problem and abort the CCB, the op-count mechanism
that catches it is not intended for handling regular processing cases.
It is an extra safety harness intended to catch bugs, lost messages, or
incorrect behavior of the PBE.
This enhancement avoids dependence on the op-count safety harness by having the
restarted PBE (primary or slave) invoke the special admin-operation that aborts
all non-critical CCBs in the immsv. See enhancement ticket #1107 or the IMMSV
README for details about this admin-operation.
The newly (re)started PBE invokes the admin-operation asynchronously to avoid
getting blocked waiting for the reply to this admin-op. The risk of the
admin-op failing is minimal, and if it does fail we end up in the same
distributed logic as we have today, that is, in the op-count safety harness.
No CCB can get applied without an ack from the PBE, so the admin-operation, if
it is successfully received by the IMMND coord, should result in all currently
non-critical CCBs getting aborted before the PBE can get any completed/apply
for such a CCB over FEVS.
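The ordering argument above can be illustrated with a small toy model. It is
only a sketch under the assumption that abort and apply messages are processed
strictly in arrival order. The names Msg, MsgKind and deliverInOrder are
hypothetical and are not part of the immsv code.

```cpp
#include <cassert>
#include <cstdint>
#include <deque>
#include <set>
#include <vector>

// Hypothetical message kinds for this toy model; not the real FEVS messages.
enum class MsgKind { AbortNonCriticalCcbs, ApplyCcb };

struct Msg {
    MsgKind kind;
    std::uint32_t ccbId;  // ignored for AbortNonCriticalCcbs
};

// Processes messages strictly in arrival order and returns the ids of the
// CCBs whose apply would be forwarded to the PBE.
std::vector<std::uint32_t> deliverInOrder(const std::deque<Msg>& stream,
                                          std::set<std::uint32_t> openCcbs) {
    std::vector<std::uint32_t> forwardedToPbe;
    for (const Msg& msg : stream) {
        if (msg.kind == MsgKind::AbortNonCriticalCcbs) {
            openCcbs.clear();  // every open non-critical CCB is aborted
        } else if (openCcbs.erase(msg.ccbId) == 1) {
            forwardedToPbe.push_back(msg.ccbId);  // CCB was still open
        }
        // An apply for a CCB that is no longer open is dropped.
    }
    return forwardedToPbe;
}

// Usage example: the abort arrives first, so the apply for CCB 5 is dropped.
void example() {
    const std::deque<Msg> stream = {{MsgKind::AbortNonCriticalCcbs, 0},
                                    {MsgKind::ApplyCcb, 5}};
    assert(deliverInOrder(stream, {5}).empty());
}
```

If the apply were to arrive before the abort in this model, it would be
forwarded, which corresponds to the fallback case handled by the op-count
safety harness.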
With this enhancement, if the PBE detaches and re-attaches while there are one
or more open non-critical (not yet committing) and non-empty CCBs, then these
CCBs will be aborted. The newly attached PBE may get an abort callback for
such CCBs, but these callbacks are ignored by the PBE.
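As a rough illustration of which CCBs are affected, the following sketch models
the selection rule described above: only CCBs that are non-empty and not yet
critical are aborted. The types Ccb and CcbState and the functions below are
hypothetical and do not mirror the IMMND data structures.

```cpp
#include <cassert>
#include <cstddef>
#include <cstdint>
#include <vector>

// Toy model only: simplified states, not the real IMM CCB state machine.
enum class CcbState { Open, Critical, Aborted };

struct Ccb {
    std::uint32_t id;
    CcbState state;
    unsigned opCount;  // number of operations added so far; 0 means empty
};

// Aborts every CCB that is open (not yet critical) and non-empty.
// Returns the number of CCBs that were aborted.
std::size_t abortNonCriticalCcbs(std::vector<Ccb>& ccbs) {
    std::size_t aborted = 0;
    for (Ccb& ccb : ccbs) {
        if (ccb.state == CcbState::Open && ccb.opCount > 0) {
            ccb.state = CcbState::Aborted;
            ++aborted;
        }
    }
    assert(aborted <= ccbs.size());  // sanity check on the counter
    return aborted;
}

// An apply request is only accepted for a CCB that is still open.
bool tryApply(Ccb& ccb) {
    if (ccb.state != CcbState::Open) {
        return false;  // aborted or already critical: request is refused
    }
    ccb.state = CcbState::Critical;
    return true;
}
```

In this model an empty open CCB and a critical CCB are left untouched, while a
later apply attempt on an aborted CCB is refused.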
With this enhancement one will see something like the following in the syslog
at an attempt to apply a CCB that was active during detach and attach of the
PBE:
May 21 12:41:34 SC-2 local0.notice osafimmnd[406]: NO Persistent Back End OI attached, pid: 764
May 21 12:41:34 SC-2 local0.notice osafimmnd[406]: NO Received: immadm -o 202 safRdn=immManagement,safApp=safImmService
May 21 12:41:34 SC-2 local0.info osafimmnd[406]: IN sAbortNonCriticalCcbs = true;
May 21 12:41:34 SC-2 local0.notice osafimmnd[406]: NO Implementer connected: 19 (OpenSafImmPBE) <332, 2020f>
May 21 12:41:34 SC-2 user.info osafimmpbed: IN Admop for aborting CCBs result: 1, immsv returned 1
May 21 12:41:34 SC-2 user.notice osafimmpbed: NO Update epoch 21 committing with ccbId:100000014/4294967316
May 21 12:41:34 SC-2 local0.notice osafimmd[396]: NO IMMND coord at 2020f
May 21 12:41:34 SC-2 local0.info osafimmnd[406]: IN Update of epoch is PERSISTENT.
May 21 12:41:35 SC-2 local0.notice osafimmnd[406]: NO PBE-OI established on this SC. Dumping incrementally to file imm.db
May 21 12:41:35 SC-2 local0.info osafimmnd[406]: IN sAbortNonCriticalCcbs is true => set max_oi_timeout to 0
May 21 12:41:35 SC-2 local0.notice osafimmnd[406]: NO CCB 5 aborted by: immadm -o 202 safRdn=immManagement,safApp=safImmService
May 21 12:41:35 SC-2 local0.info osafimmnd[406]: IN sAbortNonCriticalCcbs reset to false
May 21 12:41:35 SC-2 local0.warn osafimmnd[406]: WA Timeout while waiting for implementer, aborting ccb:5
May 21 12:41:35 SC-2 user.warn osafimmpbed: WA Failed to find CCB object for 5/5
May 21 12:41:45 SC-2 local0.notice osafimmnd[406]: NO Ccb <5> not in correct state (12) for Apply ignoring request
May 21 12:41:45 SC-2 local0.warn osafimmnd[406]: WA Spurious and redundant ccb-apply request ignored ccbId:5
diff --git a/osaf/services/saf/immsv/immpbed/immpbe.cc b/osaf/services/saf/immsv/immpbed/immpbe.cc
--- a/osaf/services/saf/immsv/immpbed/immpbe.cc
+++ b/osaf/services/saf/immsv/immpbed/immpbe.cc
@@ -33,9 +33,13 @@
static void saImmOmAdminOperationInvokeCallback(SaInvocationT invocation,
SaAisErrorT operationReturnValue,
- SaAisErrorT)
+ SaAisErrorT err)
{
- LOG_ER("Unexpected async admin-op callback invocation:%llx", invocation);
+ if(invocation == 1) {
+ LOG_IN("Admop for aborting CCBs result: %u, immsv returned %u", operationReturnValue, err);
+ } else {
+ LOG_ER("Unexpected async admin-op callback invocation:%llx", invocation);
+ }
}
static const SaImmCallbacksT callbacks = {
@@ -118,6 +122,7 @@ int main(int argc, char* argv[])
unsigned int retryInterval = 1000000; /* 1 sec */
unsigned int maxTries = 70; /* 70 times == max 70 secs */
unsigned int tryCount=0;
+ const SaImmAdminOperationParamsT_2 *params[] = {NULL};
if ((logPath = getenv("IMMSV_TRACE_PATHNAME")))
{
@@ -320,6 +325,21 @@ int main(int argc, char* argv[])
exit(1);
}
+ /* Admin-op invoked to abort any non-empty non critical CCBs.
+ Such CCbs are doomed if the PBE (primary or slave) restarts.
+ Slave PBE can in fact not attach as long as there are active
+ non-empty CCBs in the system.
+ */
+ errorCode = saImmOmAdminOperationInvokeAsync_o3(ownerHandle, 1,
+ "safRdn=immManagement,safApp=safImmService", 0,
+ SA_IMM_ADMIN_ABORT_CCBS, params);
+
+ if(SA_AIS_OK != errorCode)
+ {
+ LOG_WA("Failed to invoke admin-op for aborting CCBs: err:%u - ignoring",
+ errorCode);
+ }
+
/*
errorCode = saImmOmAdminOwnerSet(ownerHandle, objectNames, SA_IMM_ONE);
*/