In a large cluster, or on a system under high load, SMF orders
AMF to lock a node group (NG) during a single-step upgrade.
The lock generates many attribute-update requests to IMM, which
causes the response from IMM to AMF to time out. SMF receives
the timeout and retries the lock again and again while the
first lock is still in progress. Once the first lock succeeds,
the repeated lock requests from SMF receive a NO_OP error from
AMF.

In this case, NO_OP should be treated as success.
---
 src/smf/smfd/SmfAdminState.cc | 6 +++++-
 1 file changed, 5 insertions(+), 1 deletion(-)
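
Note for reviewers: the return-code handling described in the commit
message can be sketched in isolation as follows. This is a simplified
illustration, not the code in SmfAdminState.cc. The Rc and Action enums
and the Classify() helper are hypothetical stand-ins for SaAisErrorT and
the retry loop, and the final two branches (both codes OK means success,
anything else fails) are an assumption about the surrounding code that
the diff does not show.

```cpp
// Hypothetical stand-ins for the SaAisErrorT codes named in the patch.
enum class Rc { kOk, kTimeout, kTryAgain, kNoOp, kOther };

// What the caller should do after one admin-operation attempt.
enum class Action { kSuccess, kRetry, kRetryAfterDelay, kFail };

// Classify one attempt from the IMM return code (imm_rc) and the
// object implementer return code (oi_rc).
constexpr Action Classify(Rc imm_rc, Rc oi_rc) {
  // TRY_AGAIN from IMM, or from the OI via a successful IMM call:
  // retry after a delay (the patched code sleeps 2000 ms).
  if (imm_rc == Rc::kTryAgain ||
      (imm_rc == Rc::kOk && oi_rc == Rc::kTryAgain)) {
    return Action::kRetryAfterDelay;
  }
  // IMM timed out: retry immediately.
  if (imm_rc == Rc::kTimeout) {
    return Action::kRetry;
  }
  // NO_OP from either layer means the operation was already performed,
  // for example by an earlier attempt that timed out but completed.
  if (imm_rc == Rc::kNoOp || oi_rc == Rc::kNoOp) {
    return Action::kSuccess;
  }
  // Assumption: both codes OK is success, anything else is a failure.
  if (imm_rc == Rc::kOk && oi_rc == Rc::kOk) {
    return Action::kSuccess;
  }
  return Action::kFail;
}
```

With this classification, a first attempt that returns (kTimeout, any)
is retried, and a second attempt that returns (kOk, kNoOp) ends the loop
as a success, which is the scenario this patch addresses.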

diff --git a/src/smf/smfd/SmfAdminState.cc b/src/smf/smfd/SmfAdminState.cc
index 958b7ae82..c20df8d74 100755
--- a/src/smf/smfd/SmfAdminState.cc
+++ b/src/smf/smfd/SmfAdminState.cc
@@ -926,6 +926,9 @@ bool SmfAdminStateHandler::nodeGroupAdminOperation(
           saImmOmAdminOperationInvoke_2(ownerHandle_, &nodeGroupName, 0,
                                         adminOp, params, &oi_rc,
                                         smfd_cb->adminOpTimeout);
+      if ((imm_rc != SA_AIS_OK) || (oi_rc != SA_AIS_OK))
+        LOG_WA("%s: imm_rc: %s, oi_rc: %s", __FUNCTION__,
+            saf_error(imm_rc), saf_error(oi_rc));
       if ((imm_rc == SA_AIS_ERR_TRY_AGAIN) ||
           (imm_rc == SA_AIS_OK && oi_rc == SA_AIS_ERR_TRY_AGAIN)) {
         base::Sleep(base::MillisToTimespec(2000));
@@ -933,7 +936,8 @@ bool SmfAdminStateHandler::nodeGroupAdminOperation(
       } else if (imm_rc == SA_AIS_ERR_TIMEOUT) {
         // Retry
         continue;
-      } else if (imm_rc == SA_AIS_ERR_NO_OP) {
+      } else if ((imm_rc == SA_AIS_ERR_NO_OP) ||
+                (oi_rc == SA_AIS_ERR_NO_OP)) {
         // If an admin operation is already performed SA_AIS_ERR_NO_OP
         // is returned. Treat this as OK, just log it and return
         // operation success
-- 
2.25.1


