My comments inline: Alex
On 04/20/2018 04:00 AM, Hans Nordeback wrote: __________________________________________________________________ NOTICE: This email was received from an EXTERNAL sender __________________________________________________________________ Hi Alex, please see below for some comments/questions. /Regards HansN On 04/18/2018 03:41 PM, Alex Jones wrote: When using PLM an AMF node mapped to a CLM node mapped to a PLM EE, can get stuck in locked state when rebooting, or going through a PLM EE lock/unlock. When amfd receives a START step from CLM tracking it attempts to gracefully shutdown the AMF node using AMF admin operations lock/lock-in. When PLM is involved this doesn't always work correctly because PLM is also shutting down the node by calling "opensafd stop". There is a race condition between PLM using "opensafd stop", and amfd using the admin operations to bring down the node, so that sometimes the AMF node gets stuck in locked state. If the rootCauseEntity in the CLM tracking is a PLM entity then don't do anything, as "opensafd stop" is already being called. --- src/amf/amfd/clm.cc | 25 ++++++++++++++++++++++++- 1 file changed, 24 insertions(+), 1 deletion(-) diff --git a/src/amf/amfd/clm.cc b/src/amf/amfd/clm.cc index 2bcea2db0..7f675d8e9 100644 --- a/src/amf/amfd/clm.cc +++ b/src/amf/amfd/clm.cc @@ -274,6 +274,27 @@ static void clm_track_cb( TRACE_3("Already got callback for start of this change."); continue; } + + if (strncmp(osaf_extended_name_borrow(rootCauseEntity), + "safEE=", + sizeof("safEE=") - 1) == 0 || + strncmp(osaf_extended_name_borrow(rootCauseEntity), + "safHE=", + sizeof("safHE=") - 1) == 0) { + // PLM will take care of calling opensafd stop + TRACE("rootCause: %s from PLM operation so skipping %u", + osaf_extended_name_borrow(rootCauseEntity), + notifItem->clusterNode.nodeId); + + SaAisErrorT rc(saClmResponse_4(avd_cb->clmHandle, + invocation, + SA_CLM_CALLBACK_RESPONSE_OK)); [HansN] perhaps use: SaAisErrorT rc = saClmResponse_4 or SaAisErrorT rc{saClmResponse_4 instead? [Alex] I'm not sure what you are asking here. Do you not like the function syntax? And what is '{'? I don't understand your second suggestion. + if (rc != SA_AIS_OK) + LOG_ER("saClmResponse_4 failed: %i", rc); + [HansN] I think the amf operational state has to be checked and set to disabled? And should break be used instead of continue? [Alex] Setting operational state to disabled is taken care of when COMPLETED is received in the track callback. My code change is only when receiving START. I used "continue" to explicitly mean that we are done processing this node, and we need to move to the next node in the for loop. The same thing is done in legacy code above when checking for "clm_change_start_preceded." + continue; + } + /* invocation to be used by pending clm response */ node->clm_pend_inv = invocation; clm_node_exit_start(node, notifItem->clusterChange); @@ -304,7 +325,9 @@ static void clm_track_cb( osaf_extended_name_borrow(rootCauseEntity), notifItem->clusterNode.nodeId); if (strncmp(osaf_extended_name_borrow(rootCauseEntity), - "safEE=", 6) == 0) { + "safEE=", 6) == 0 || + strncmp(osaf_extended_name_borrow(rootCauseEntity), + "safHE=", 6) == 0) { [HansN] sizeof("safHE=") as above [Alex] Agreed. I will make this change. And change the older code to conform. /* This callback is because of operation on PLM, so we need to mark the node absent, because PLCD will anyway call opensafd stop.*/ AVD_AVND *node =
signature.asc
Description: OpenPGP digital signature
------------------------------------------------------------------------------ Check out the vibrant tech community on one of the world's most engaging tech sites, Slashdot.org! http://sdm.link/slashdot
_______________________________________________ Opensaf-devel mailing list Opensaf-devel@lists.sourceforge.net https://lists.sourceforge.net/lists/listinfo/opensaf-devel