Hi Thang,

ACK from me.

Best Regards,
Thien

-----Original Message-----
From: Thang Duc Nguyen <[email protected]> 
Sent: Tuesday, June 25, 2024 8:56 AM
To: Thien Minh Huynh <[email protected]>; Dat Tran Quoc Phan 
<[email protected]>
Cc: [email protected]; Thang Duc Nguyen 
<[email protected]>
Subject: [PATCH 1/1] smf: fix one step upgrade failed [#3354]

In a large cluster, or in a system under high load, SMF orders AMF to lock the
node group (NG) during a one-step upgrade. Many requests are then sent to IMM to
update attributes, which causes the response from IMM to AMF to time out. SMF
receives the timeout (SA_AIS_ERR_TIMEOUT) and retries the lock again and again
while the first lock is still ongoing. Once the first lock succeeds, the next
lock request from SMF receives a NO_OP error (SA_AIS_ERR_NO_OP) from AMF.

In this case, NO_OP should be treated as success, whether it is returned by
IMM (imm_rc) or by AMF (oi_rc).
---
 src/smf/smfd/SmfAdminState.cc | 6 +++++-
 1 file changed, 5 insertions(+), 1 deletion(-)

diff --git a/src/smf/smfd/SmfAdminState.cc b/src/smf/smfd/SmfAdminState.cc 
index 958b7ae82..c20df8d74 100755
--- a/src/smf/smfd/SmfAdminState.cc
+++ b/src/smf/smfd/SmfAdminState.cc
@@ -926,6 +926,9 @@ bool SmfAdminStateHandler::nodeGroupAdminOperation(
           saImmOmAdminOperationInvoke_2(ownerHandle_, &nodeGroupName, 0,
                                         adminOp, params, &oi_rc,
                                         smfd_cb->adminOpTimeout);
+      if ((imm_rc != SA_AIS_OK) || (oi_rc != SA_AIS_OK))
+        LOG_WA("%s: imm_rc: %s, oi_rc: %s", __FUNCTION__,
+            saf_error(imm_rc), saf_error(oi_rc));
       if ((imm_rc == SA_AIS_ERR_TRY_AGAIN) ||
           (imm_rc == SA_AIS_OK && oi_rc == SA_AIS_ERR_TRY_AGAIN)) {
         base::Sleep(base::MillisToTimespec(2000));
@@ -933,7 +936,8 @@ bool SmfAdminStateHandler::nodeGroupAdminOperation(
       } else if (imm_rc == SA_AIS_ERR_TIMEOUT) {
         // Retry
         continue;
-      } else if (imm_rc == SA_AIS_ERR_NO_OP) {
+      } else if ((imm_rc == SA_AIS_ERR_NO_OP) ||
+                (oi_rc == SA_AIS_ERR_NO_OP)) {
         // If an admin operation is already performed SA_AIS_ERR_NO_OP
         // is returned. Treat this as OK, just log it and return
         // operation success
--
2.25.1



_______________________________________________
Opensaf-devel mailing list
[email protected]
https://lists.sourceforge.net/lists/listinfo/opensaf-devel
