 osaf/services/saf/immsv/immpbed/immpbe.cc |  22 +++++++++++++---------
 1 files changed, 13 insertions(+), 9 deletions(-)

Using a synchronous admin-op results in the PBE getting blocked until the
admin-op has been completed. The admin-op only sets a flag in the IMMND coord
indicating the request to abort non-critical CCBs. The actual processing is
done in ImmModel::cleanTheBasement, which runs with a period of 1 second. To
cover this, the PBE also waits for 1 second after the admin-operation has
returned, to ensure that the abort messages have been generated by the IMMND
coord.

This still means that fevs messages for CCB aborts may be in transit. But
since the attach of the PBE-OI itself goes over fevs, the PBE-OI attach should
get processed after the aborts.

The mechanism is not 100% tight. On the other hand, enabling and disabling the
PBE should be rare and avoided, and CCBs may get resource aborted for other
reasons, such as a blocked or overloaded filesystem.

This adjustment to the PBE was needed to cater for usage by SMF, which is
actually questionable since the PBE is toggled off and on during an SMF
campaign and not just before and after a campaign. But since the fix is
trivial and has no serious downside, we accept it.

The 1 second delay will not delay cluster restart or any other realtime
critical operations. The PBE availability is only necessary to service CCBs
and persistent runtime data changes, neither of which should ever be realtime
critical or SA critical.

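The ordering argument above can be illustrated with a small, self-contained
model. This is only a sketch and not OpenSAF code: CoordModel, housekeepingTick,
restartPbe and the message strings are hypothetical names invented for the
illustration. The fevs stream is modelled as a plain ordered vector, and the
1 second sleep is modelled as one explicit housekeeping pass.

```cpp
#include <cassert>
#include <string>
#include <vector>

// Hypothetical model of the IMMND coordinator; not OpenSAF code.
struct CoordModel {
    bool abortRequested = false;       // flag set by the admin-op
    std::vector<int> nonCriticalCcbs;  // ids of open non-critical CCBs
    std::vector<std::string> fevs;     // totally ordered message stream

    // Models the synchronous admin-op: it only records the request.
    bool invokeAbortCcbs() {
        abortRequested = true;
        return true;
    }

    // Models one periodic housekeeping pass (cleanTheBasement in the text).
    void housekeepingTick() {
        if (!abortRequested) return;
        for (int id : nonCriticalCcbs)
            fevs.push_back("abort-ccb-" + std::to_string(id));
        nonCriticalCcbs.clear();
        abortRequested = false;
    }

    // Models the PBE-OI attach, which travels over the same ordered stream.
    void attachPbe() { fevs.push_back("pbe-attach"); }
};

// Invoke the admin-op, optionally wait one housekeeping period, then attach.
// Returns the resulting message order on the modelled stream.
std::vector<std::string> restartPbe(CoordModel& coord, bool waitOnePeriod) {
    bool ok = coord.invokeAbortCcbs();
    if (ok && waitOnePeriod)
        coord.housekeepingTick();  // stands in for sleep(1)
    coord.attachPbe();
    coord.housekeepingTick();      // the next periodic pass runs regardless
    return coord.fevs;
}
```

With waitOnePeriod set, the abort messages are queued before the attach
message. Without it, the attach is queued first, which is the situation the
1 second wait is meant to avoid.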
diff --git a/osaf/services/saf/immsv/immpbed/immpbe.cc b/osaf/services/saf/immsv/immpbed/immpbe.cc
--- a/osaf/services/saf/immsv/immpbed/immpbe.cc
+++ b/osaf/services/saf/immsv/immpbed/immpbe.cc
@@ -97,8 +97,9 @@ int main(int argc, char* argv[])
         {0, 0, 0, 0}
     };
     SaImmHandleT immHandle;
-    SaAisErrorT errorCode;
-    SaVersionT version;
+    SaAisErrorT errorCode = SA_AIS_OK;
+    SaAisErrorT admoRetVal = SA_AIS_OK;
+    SaVersionT version;
     SaImmAdminOwnerHandleT ownerHandle;
 
 
@@ -123,6 +124,7 @@ int main(int argc, char* argv[])
     unsigned int maxTries = 70; /* 70 times == max 70 secs */
     unsigned int tryCount=0;
     const SaImmAdminOperationParamsT_2 *params[] = {NULL};
+    SaImmAdminOperationParamsT_2 **retParams=NULL;
 
     if ((logPath = getenv("IMMSV_TRACE_PATHNAME")))
     {
@@ -325,19 +327,21 @@ int main(int argc, char* argv[])
         exit(1);
     }
 
-    /* Admin-op invoked to abort any non-empty non critical CCBs.
+    /* Admin-op invoked to abort any non-empty non critical CCBs (#1261, #1107).
        Such CCbs are doomed if the PBE (primary or slave) restarts.
        Slave PBE can in fact not attach as long as there are active
-       non-empty CCBs in the system.
+       non-empty CCBs in the system.
     */
-    errorCode = saImmOmAdminOperationInvokeAsync_o3(ownerHandle, 1,
+    errorCode = saImmOmAdminOperationInvoke_o3(ownerHandle,
         "safRdn=immManagement,safApp=safImmService", 0,
-        SA_IMM_ADMIN_ABORT_CCBS, params);
+        SA_IMM_ADMIN_ABORT_CCBS, params, &admoRetVal, 0, &retParams);
 
-    if(SA_AIS_OK != errorCode)
+    if((errorCode == SA_AIS_OK) && (admoRetVal == SA_AIS_OK))
     {
-        LOG_WA("Failed to invoke admin-op for aborting CCBs: err:%u - ignoring",
-            errorCode);
+        sleep(1); /* Sleep for 1 second since cleanTheBasement runs once every second. */
+    } else {
+        LOG_WA("PBE: Problem with invoking admin-op for aborting noncritical CCBs "
+            "retvals: %u %u- ignoring", errorCode, admoRetVal);
     }
 
     /*