osaf/services/saf/amf/amfd/sgproc.cc |  21 +++++++++++++++++++++
 1 files changed, 21 insertions(+), 0 deletions(-)


NG gets stuck in SHUTTING_DOWN state during shutdown op and controller failover.

During a SHUTDOWN admin operation on an NG, the admin state is initially set
to SHUTTING_DOWN and checkpointed to the standby AMFD. On decoding it, the
standby AMFD sets node->admin_ng, and it clears this pointer when the active
AMFD checkpoints the LOCKED state. After a fail-over, when AMFD gets the
quiescing success response from AMFND, it clears this pointer in
process_su_si_response_for_ng(), assuming that only one SU is hosted on that
node. When the response for a second SU then arrives, it is not processed
from the NG perspective because AMFD has already cleared node->admin_ng. The
issue does not occur when the node hosts only one application SU.

The patch fixes the problem by not clearing node->admin_ng while the NG is
in SHUTTING_DOWN state.
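
The failure mode described above can be illustrated with a toy model. This is
only a sketch: NodeGroup, Node, handle_su_response() and
count_handled_responses() are hypothetical stand-ins invented for
illustration, not the AMFD types or functions, and the model assumes the
pending admin callback fields are zero on the AMFD that takes over, as the
condition added by the patch tests.

```cpp
#include <cassert>

// Toy model only: hypothetical stand-ins, not the real AMFD structures.
enum class AdminState { Unlocked, ShuttingDown, Locked };

struct NodeGroup {
    AdminState admin_state = AdminState::ShuttingDown;
    // Zero values model the post-failover state checked by the patch.
    unsigned pend_admin_oper = 0;
    unsigned long long pend_invocation = 0;
};

struct Node {
    NodeGroup* admin_ng = nullptr;  // set while an NG admin operation is in progress
};

// Handles one quiescing response for an SU hosted on `node`.
// Returns true if the response was processed from the NG perspective.
bool handle_su_response(Node& node, bool with_patch) {
    if (node.admin_ng == nullptr) {
        return false;  // NG context already lost, response is not seen by NG logic
    }
    const NodeGroup& ng = *node.admin_ng;
    const bool shutdown_after_failover =
        ng.admin_state == AdminState::ShuttingDown &&
        ng.pend_admin_oper == 0 && ng.pend_invocation == 0;
    if (with_patch && shutdown_after_failover) {
        return true;  // patched: keep node.admin_ng for the remaining SUs
    }
    node.admin_ng = nullptr;  // unpatched: pointer cleared on the first response
    return true;
}

// Counts how many of `su_count` responses are processed from the NG perspective.
int count_handled_responses(int su_count, bool with_patch) {
    assert(su_count >= 0);
    NodeGroup ng;
    Node node;
    node.admin_ng = &ng;
    int handled = 0;
    for (int i = 0; i < su_count; ++i) {
        if (handle_su_response(node, with_patch)) {
            ++handled;
        }
    }
    return handled;
}
```

In this model, two SUs on one node give one handled response without the
early exit and two with it, while a single SU gives one handled response in
both cases, which matches the observation that the issue needs more than one
SU on the node.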

diff --git a/osaf/services/saf/amf/amfd/sgproc.cc b/osaf/services/saf/amf/amfd/sgproc.cc
--- a/osaf/services/saf/amf/amfd/sgproc.cc
+++ b/osaf/services/saf/amf/amfd/sgproc.cc
@@ -400,6 +400,27 @@ void process_su_si_response_for_ng(AVD_S
                ng->node_oper_list.erase(Amf::to_string(&node->name));
                TRACE("node_oper_list size:%u",ng->oper_list_size());
        }
+
+       /*Handling for the case: There are pending assignments on more than one SU
+         on the same node of the nodegroup with at least one quiescing assignment and controller
+         failover occurred.
+         The if block below will be hit only when assignments for quiescing state are still pending
+         on at least one SU and on at least one node of the NG.
+       */
+       if ((ng->saAmfNGAdminState == SA_AMF_ADMIN_SHUTTING_DOWN) &&
+                       (ng->admin_ng_pend_cbk.admin_oper == 0) && 
+                       (ng->admin_ng_pend_cbk.invocation == 0)) {
+               /*During SHUTDOWN admin operation on NG, initial admin state is set to SHUTTING_DOWN
+                 and it is checkpointed to standby AMFD. On decoding it, standby AMFD sets
+                 node->admin_ng and it clears it when active AMFD checkpoints the LOCKED state.
+                 In case active AMFD sends quiescing state and reboots after checkpointing only
+                 SHUTTING_DOWN state, standby AMFD will be able to mark NG LOCKED by processing
+                 response of assignments as it has set node->admin_ng. So this pointer should be
+                 cleared only when NG is marked LOCKED. And in that case we will not be in this if block.
+                */
+               TRACE_1("'%s' in shutting_down state after failover.",ng->name.value);
+               goto done;
+       }
        /*If assignment changes are done on all the SUs on each node of nodegroup
          then reply to IMM for status of admin operation.*/
        if (ng->node_oper_list.empty())
