My comments inline: Alex
On 04/20/2018 04:00 AM, Hans Nordeback wrote:
__________________________________________________________________
NOTICE: This email was received from an EXTERNAL sender
__________________________________________________________________
Hi Alex,
please see below for some comments/questions.
/Regards HansN
On 04/18/2018 03:41 PM, Alex Jones wrote:
When using PLM an AMF node mapped to a CLM node mapped to a PLM EE, can get
stuck in locked state when rebooting, or going through a PLM EE lock/unlock.
When amfd receives a START step from CLM tracking it attempts to gracefully
shutdown the AMF node using AMF admin operations lock/lock-in. When PLM is
involved this doesn't always work correctly because PLM is also shutting down
the node by calling "opensafd stop". There is a race condition between PLM
using "opensafd stop", and amfd using the admin operations to bring down the
node, so that sometimes the AMF node gets stuck in locked state.
If the rootCauseEntity in the CLM tracking is a PLM entity then don't do
anything, as "opensafd stop" is already being called.
---
src/amf/amfd/clm.cc | 25 ++++++++++++++++++++++++-
1 file changed, 24 insertions(+), 1 deletion(-)
diff --git a/src/amf/amfd/clm.cc b/src/amf/amfd/clm.cc
index 2bcea2db0..7f675d8e9 100644
--- a/src/amf/amfd/clm.cc
+++ b/src/amf/amfd/clm.cc
@@ -274,6 +274,27 @@ static void clm_track_cb(
TRACE_3("Already got callback for start of this change.");
continue;
}
+
+ if (strncmp(osaf_extended_name_borrow(rootCauseEntity),
+ "safEE=",
+ sizeof("safEE=") - 1) == 0 ||
+ strncmp(osaf_extended_name_borrow(rootCauseEntity),
+ "safHE=",
+ sizeof("safHE=") - 1) == 0) {
+ // PLM will take care of calling opensafd stop
+ TRACE("rootCause: %s from PLM operation so skipping %u",
+ osaf_extended_name_borrow(rootCauseEntity),
+ notifItem->clusterNode.nodeId);
+
+ SaAisErrorT rc(saClmResponse_4(avd_cb->clmHandle,
+ invocation,
+ SA_CLM_CALLBACK_RESPONSE_OK));
[HansN] perhaps use:
SaAisErrorT rc = saClmResponse_4 or SaAisErrorT rc{saClmResponse_4 instead?
[Alex] I'm not sure what you are asking here. Do you not like the
function syntax? And what is '{'? I don't understand your second
suggestion.
+ if (rc != SA_AIS_OK)
+ LOG_ER("saClmResponse_4 failed: %i", rc);
+
[HansN] I think the amf operational state has to be checked and set
to disabled? And should
break be used instead of continue?
[Alex] Setting operational state to disabled is taken care of when
COMPLETED is received in the track callback. My code change is only
when receiving START. I used "continue" to explicitly mean that we are
done processing this node, and we need to move to the next node in the
for loop. The same thing is done in legacy code above when checking for
"clm_change_start_preceded."
+ continue;
+ }
+
/* invocation to be used by pending clm response */
node->clm_pend_inv = invocation;
clm_node_exit_start(node, notifItem->clusterChange);
@@ -304,7 +325,9 @@ static void clm_track_cb(
osaf_extended_name_borrow(rootCauseEntity),
notifItem->clusterNode.nodeId);
if (strncmp(osaf_extended_name_borrow(rootCauseEntity),
- "safEE=", 6) == 0) {
+ "safEE=", 6) == 0 ||
+ strncmp(osaf_extended_name_borrow(rootCauseEntity),
+ "safHE=", 6) == 0) {
[HansN] sizeof("safHE=") as above
[Alex] Agreed. I will make this change. And change the older code to
conform.
/* This callback is because of operation on PLM, so we need to mark
the node absent, because PLCD will anyway call opensafd stop.*/
AVD_AVND *node =
signature.asc
Description: OpenPGP digital signature
------------------------------------------------------------------------------ Check out the vibrant tech community on one of the world's most engaging tech sites, Slashdot.org! http://sdm.link/slashdot
_______________________________________________ Opensaf-devel mailing list [email protected] https://lists.sourceforge.net/lists/listinfo/opensaf-devel
