lock-in [#2835]

Alex Jones Mon, 23 Apr 2018 07:05:48 -0700

   My comments inline:

   Alex


   On 04/20/2018 04:00 AM, Hans Nordeback wrote:
     __________________________________________________________________

   NOTICE: This email was received from an EXTERNAL sender
     __________________________________________________________________

   Hi Alex,

   please see below for some comments/questions.

   /Regards HansN

   On 04/18/2018 03:41 PM, Alex Jones wrote:

When using PLM an AMF node mapped to a CLM node mapped to a PLM EE, can get
stuck in locked state when rebooting, or going through a PLM EE lock/unlock.

When amfd receives a START step from CLM tracking it attempts to gracefully
shutdown the AMF node using AMF admin operations lock/lock-in. When PLM is
involved this doesn't always work correctly because PLM is also shutting down
the node by calling "opensafd stop". There is a race condition between PLM
using "opensafd stop", and amfd using the admin operations to bring down the
node, so that sometimes the AMF node gets stuck in locked state.

If the rootCauseEntity in the CLM tracking is a PLM entity then don't do
anything, as "opensafd stop" is already being called.
---
 src/amf/amfd/clm.cc | 25 ++++++++++++++++++++++++-
 1 file changed, 24 insertions(+), 1 deletion(-)

diff --git a/src/amf/amfd/clm.cc b/src/amf/amfd/clm.cc
index 2bcea2db0..7f675d8e9 100644
--- a/src/amf/amfd/clm.cc
+++ b/src/amf/amfd/clm.cc
@@ -274,6 +274,27 @@ static void clm_track_cb(
             TRACE_3("Already got callback for start of this change.");
             continue;
           }
+
+          if (strncmp(osaf_extended_name_borrow(rootCauseEntity),
+                      "safEE=",
+                      sizeof("safEE=") - 1) == 0 ||
+              strncmp(osaf_extended_name_borrow(rootCauseEntity),
+                      "safHE=",
+                      sizeof("safHE=") - 1) == 0) {
+            // PLM will take care of calling opensafd stop
+            TRACE("rootCause: %s from PLM operation so skipping %u",
+                  osaf_extended_name_borrow(rootCauseEntity),
+                  notifItem->clusterNode.nodeId);
+
+            SaAisErrorT rc(saClmResponse_4(avd_cb->clmHandle,
+                                           invocation,
+                                           SA_CLM_CALLBACK_RESPONSE_OK));

   [HansN] perhaps use:
SaAisErrorT rc = saClmResponse_4 or SaAisErrorT rc{saClmResponse_4 instead?

   [Alex] I'm not sure what you are asking here. Do you not like the
   function syntax? And what is '{'? I don't understand your second
   suggestion.


+            if (rc != SA_AIS_OK)
+              LOG_ER("saClmResponse_4 failed: %i", rc);
+

     [HansN] I think the amf operational state has to be checked and set
     to disabled? And should
     break be used instead of continue?

   [Alex] Setting operational state to disabled is taken care of when
   COMPLETED is received in the track callback. My code change is only
   when receiving START. I used "continue" to explicitly mean that we are
   done processing this node, and we need to move to the next node in the
   for loop. The same thing is done in legacy code above when checking for
   "clm_change_start_preceded."

+            continue;
+          }
+
           /* invocation to be used by pending clm response */
           node->clm_pend_inv = invocation;
           clm_node_exit_start(node, notifItem->clusterChange);
@@ -304,7 +325,9 @@ static void clm_track_cb(
                 osaf_extended_name_borrow(rootCauseEntity),
                 notifItem->clusterNode.nodeId);
           if (strncmp(osaf_extended_name_borrow(rootCauseEntity),
-                      "safEE=", 6) == 0) {
+                      "safEE=", 6) == 0 ||
+              strncmp(osaf_extended_name_borrow(rootCauseEntity),
+                      "safHE=", 6) == 0) {

     [HansN] sizeof("safHE=") as above

   [Alex] Agreed. I will make this change. And change the older code to
   conform.

             /* This callback is because of operation on PLM, so we need to mark
                the node absent, because PLCD will anyway call opensafd stop.*/
             AVD_AVND *node =

signature.asc
Description: OpenPGP digital signature

------------------------------------------------------------------------------
Check out the vibrant tech community on one of the world's most
engaging tech sites, Slashdot.org! http://sdm.link/slashdot

_______________________________________________
Opensaf-devel mailing list
[email protected]
https://lists.sourceforge.net/lists/listinfo/opensaf-devel

Re: [devel] [PATCH 1/1] amfd: if rootCauseEntity is PLM entity don't engage lock/lock-in [#2835]

Reply via email to