Re: [devel] [PATCH 1/1] amfnd: don't attempt su failover if active controller is rebooting [#3035]

Nagendra Kumar Tue, 07 May 2019 23:13:08 -0700

Hi Alex,

The patch looks good to me. Ack.

Thanks

-Nagendra, +91-9866424860

High Availability Solutions

(OpenSAF Support and Services)

 <http://www.hasolutions.in/> www.hasolutions.in

 <mailto:[email protected]> [email protected]

Delaware, USA: +1 508-422-7725    |    Hyderabad, India: +91 798-992-5293 

From: Jones, Alex [mailto:[email protected]] 
Sent: 08 May 2019 01:17
To: [email protected]; [email protected]; 
[email protected]
Cc: [email protected]; Jones, Alex
Subject: [PATCH 1/1] amfnd: don't attempt su failover if active controller is 
rebooting [#3035]

 

In the N+M redundancy model, CSI-remove responses can be lost if the active
controller reboots. When that happens the SG stays stuck in an unstable state
and the standby never gets assignments.

Scenario: this node is the active controller and is active for N+M, SU
failover is enabled, and failfast on termination failure is enabled for the
nodes. If a component in the SU crashes and another component fails during
cleanup, the node does a failfast. amfd currently still attempts SU failover
in this case, but the CSI-remove responses from the payload can be lost
because the controller is rebooting. They eventually show up on the new
active controller, where they produce message-id errors.

Set a flag when the active controller is about to reboot itself. If the flag
is set, skip SU failover and let the new active controller handle it.
---
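For reviewers, the intended control flow can be sketched in isolation as
below. This is only an illustration under stated assumptions: Node, Su,
send_reboot() and su_failover() are hypothetical stand-ins for AVD_AVND,
AVD_SU, avd_d2n_reboot_snd() and sg_su_failover_func(), and the send_ok
parameter merely models whether the reboot message was sent successfully.
Only the flag name actv_ctrl_reboot_in_progress is taken from the patch.

```cpp
#include <cstdint>
#include <string>

// Hypothetical stand-in for AVD_AVND; not the real amfd type.
struct Node {
  uint32_t node_id = 0;
  bool actv_ctrl_reboot_in_progress = false;  // flag added by this patch
};

// Hypothetical stand-in for AVD_SU; not the real amfd type.
struct Su {
  std::string name;
  Node *su_on_node = nullptr;
  bool failover_started = false;  // illustration only, not an amfd field
};

// Models the change in avd_d2n_reboot_snd(): once the reboot order has been
// sent successfully, remember that the target is the local active director.
void send_reboot(Node &node, uint32_t local_node_id, bool send_ok) {
  if (!send_ok) return;  // send failed: flag stays clear
  if (node.node_id == local_node_id) {
    node.actv_ctrl_reboot_in_progress = true;
  }
}

// Models the guard in sg_su_failover_func(): report success without starting
// SU failover when the hosting node is the rebooting active controller.
bool su_failover(Su &su) {
  if (su.su_on_node->actv_ctrl_reboot_in_progress) {
    return true;  // leave the failover to the new active controller
  }
  su.failover_started = true;  // the normal failover path would run here
  return true;
}
```

Returning success from the guarded path mirrors the patch, which sets rc to
NCSCC_RC_SUCCESS before jumping to done, so callers do not treat the skipped
failover as an error.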
src/amf/amfd/node.cc | 1 +
src/amf/amfd/node.h | 1 +
src/amf/amfd/sgproc.cc | 7 +++++++
src/amf/amfd/util.cc | 3 +++
4 files changed, 12 insertions(+)

diff --git a/src/amf/amfd/node.cc b/src/amf/amfd/node.cc
index 7fc764f22..b8d8a7d77 100644
--- a/src/amf/amfd/node.cc
+++ b/src/amf/amfd/node.cc
@@ -121,6 +121,7 @@ void AVD_AVND::initialize() {
   clm_pend_inv = {};
   clm_change_start_preceded = {};
   recvr_fail_sw = {};
+  actv_ctrl_reboot_in_progress = {};
   admin_ng = {};
 }

diff --git a/src/amf/amfd/node.h b/src/amf/amfd/node.h
index ecee5c591..dbe48dc43 100644
--- a/src/amf/amfd/node.h
+++ b/src/amf/amfd/node.h
@@ -140,6 +140,7 @@ class AVD_AVND {
CLM completed cb. */
bool recvr_fail_sw; /* to indicate there was node reboot because of node
failover/switchover.*/
+ bool actv_ctrl_reboot_in_progress;
AVD_AMF_NG *admin_ng; /* points to the nodegroup on which admin operation is
going on.*/
uint16_t node_up_msg_count; /* to count of node_up msg that director had
diff --git a/src/amf/amfd/sgproc.cc b/src/amf/amfd/sgproc.cc
index 1537acac3..7c8d9a558 100644
--- a/src/amf/amfd/sgproc.cc
+++ b/src/amf/amfd/sgproc.cc
@@ -478,6 +478,13 @@ static uint32_t sg_su_failover_func(AVD_SU *su) {
     goto done;
   }

+  if (su->su_on_node->actv_ctrl_reboot_in_progress) {
+    TRACE("'%s' is already going down, so not doing SU failover",
+          su->name.c_str());
+    rc = NCSCC_RC_SUCCESS;
+    goto done;
+  }
+
   su->set_oper_state(SA_AMF_OPERATIONAL_DISABLED);
   su->set_readiness_state(SA_AMF_READINESS_OUT_OF_SERVICE);
   if (su->saAmfSUAdminState == SA_AMF_ADMIN_LOCKED)
diff --git a/src/amf/amfd/util.cc b/src/amf/amfd/util.cc
index 14a4e0485..0dc3e99e3 100644
--- a/src/amf/amfd/util.cc
+++ b/src/amf/amfd/util.cc
@@ -1802,6 +1802,9 @@ void avd_d2n_reboot_snd(AVD_AVND *node) {
   if (avd_d2n_msg_snd(avd_cb, node, d2n_msg) != NCSCC_RC_SUCCESS) {
     LOG_ER("%s: snd to %x failed", __FUNCTION__, node->node_info.nodeId);
     d2n_msg_free(d2n_msg);
+  } else if (node->node_info.nodeId == avd_cb->node_id_avd) {
+    TRACE("rebooting active amf director which is ourself");
+    node->actv_ctrl_reboot_in_progress = true;
   }
 }

-- 
2.17.2




_______________________________________________
Opensaf-devel mailing list
[email protected]
https://lists.sourceforge.net/lists/listinfo/opensaf-devel
