Hi Nagu,

Yes, you are right. The "component failover" recovery is unstable. I will post my analysis of the component failover problem to #1902 after you have verified the other recoveries.

Thanks,
Minh

On 15/09/16 16:51, Nagendra Kumar wrote:
> Hi Minh,
> @2.a.) and @2.b.) are working except "Component Failover" as
> recovery. Other recovery like SU Failover, etc are working fine with
> 1725_pending_review.tgz and 07_no_recovery_if_no_pending_susi.diff.
>
> Please confirm.
>
> Thanks
> -Nagu
>
>> -----Original Message-----
>> From: Nagendra Kumar
>> Sent: 15 September 2016 12:13
>> To: minh chau; hans.nordeb...@ericsson.com; Praveen Malviya;
>> gary....@dektech.com.au; long.hb.ngu...@dektech.com.au
>> Cc: opensaf-devel@lists.sourceforge.net
>> Subject: Re: [devel] [PATCH 2 of 4] AMFND: Admin operation continuation if
>> csi completes during headless [#1725 part 1] V1
>>
>> Hi Minh,
>>>> If there's no any major problem, can we make SI Dep as last phase?
>> Yes, absolutely. There is no problem.
>>>> If I am right, I think you are testing @2.a) - and *fault* has just been as node reboot/powered-off by user during headless.
>> Yes, you are right.
>>
>> Thanks
>> -Nagu
>>
>>> -----Original Message-----
>>> From: minh chau [mailto:minh.c...@dektech.com.au]
>>> Sent: 14 September 2016 17:54
>>> To: Nagendra Kumar; hans.nordeb...@ericsson.com; Praveen Malviya;
>>> gary....@dektech.com.au; long.hb.ngu...@dektech.com.au
>>> Cc: opensaf-devel@lists.sourceforge.net
>>> Subject: Re: [devel] [PATCH 2 of 4] AMFND: Admin operation
>>> continuation if csi completes during headless [#1725 part 1] V1
>>>
>>> Hi Nagu,
>>>
>>> I have proposed to change the order on 28 Jul:
>>>
>>> ==============
>>>
>>> I would like to change the above orders of implementation:
>>> @0. We are here now: No admin op continuation, no recovery on faults
>>> during headless.
>>> Since componentRestart/suRestart has no impact on recovery after
>>> headless, faults during headless here mean: failover escalation, node
>>> reboot/powered-off by user during headless. Faults are different
>>> phenomenons but they all result in loss of SUSI.
>>> Having #1902 will
>>> remove the major impact of a node reboot due to immediate escalation
>>> and AMF also has to deal with the loss of SUSI the same as without
>>> #1902 plus failover escalation
>>>
>>> @1. Admin op continuation without required recovery on faults during
>>> headless
>>> @1.a) All CSI(s) callback completes during headless, but SUSI states
>>> are still QUIESCED/QUIESCING
>>> @1.b) One of CSI(s) callback is still ongoing after headless (AMFD
>>> would have to wait for it?)
>>>
>>> @2. Recovery on faults. (Doing fault recovery needs to consider admin
>>> op continuation which would have been implemented in step @1) Need
>>> #1902
>>> @2.a.) Faults in normal flow: No admin op continuation is required
>>> after headless, but fault did happen during headless
>>> @2.b.) Faults happen during admin operation while headless, after
>>> headless AMFD needs to consider a recovery on fault together with
>>> admin op continuation.
>>>
>>> @3. @1 + @2 + With SI Dep.
>>>
>>> ===============
>>> I thought we have followed the above order so far? Because part 1 was
>>> acked, which is "@1. Admin op continuation without required recovery
>>> on faults during headless"
>>> If there's no any major problem, can we make SI Dep as last phase?
>>> If I am right, I think you are testing @2.a) - and *fault* has just
>>> been as node reboot/powered-off by user during headless.
>>>
>>> Thanks,
>>> Minh
>>>
>>> On 14/09/16 21:48, Nagendra Kumar wrote:
>>>> Hi Minh,
>>>> If it is not tested, then it is fine. But, we had added (#1) the following in the ticket #1725 on 27 Jul :
>>>> ===========================================
>>>> Nagendra Kumar - 2016-07-27
>>>>
>>>> For 2N red model, implementation can be done in the following phased manner.
>>>> It has advantages of being logically segregated and it continues from where we left in 5.0.
>>>> (Phases #1, #2 and #3 is more related to ticket #1725 and phases #4
>>>> and #5 are related to #1902)
>>>>
>>>> 1. Node restart escalation (with and without SI Dep).
>>>> 2. Without Si Dep : Admin op (no faults/escalations).
>>>> 3. Without Si Dep : Admin Op + node restart faults/escalations during headless.
>>>> 4. Without Si Dep :
>>>>    a.) All faults in normal flows.
>>>>    b.) All faults during admin operation(minus node reboot during headless as covered in #3).
>>>> 5. With Si Dep : #2, #3 and #4.
>>>>
>>>> Since 5.0 already has immediate escalation model (component and node restart/reboot), so #1, #2 and #3 completes left over portion of headless contribution in 5.0 with that model.
>>>> ======================================
>>>>
>>>> Thanks
>>>> -Nagu
>>>>
>>>>> -----Original Message-----
>>>>> From: minh chau [mailto:minh.c...@dektech.com.au]
>>>>> Sent: 14 September 2016 17:05
>>>>> To: Nagendra Kumar; hans.nordeb...@ericsson.com; Praveen Malviya;
>>>>> gary....@dektech.com.au; long.hb.ngu...@dektech.com.au
>>>>> Cc: opensaf-devel@lists.sourceforge.net
>>>>> Subject: Re: [devel] [PATCH 2 of 4] AMFND: Admin operation
>>>>> continuation if csi completes during headless [#1725 part 1] V1
>>>>>
>>>>> Hi Nagu,
>>>>>
>>>>> SI Dep is the last phase of implementation of headless recovery,
>>>>> its support is not included in all patches attached in ticket #1725.
>>>>>
>>>>> Thanks,
>>>>> Minh
>>>>>
>>>>> On 14/09/16 21:21, Nagendra Kumar wrote:
>>>>>> Hi Minh,
>>>>>> Have you tested Si Dep (2N Red model) for "node restart test cases" ? I can't see it in the test case doc.
>>>>>> Thanks
>>>>>> -Nagu
>>>>>>
>>>>>>> -----Original Message-----
>>>>>>> From: Nagendra Kumar
>>>>>>> Sent: 13 September 2016 11:20
>>>>>>> To: minh chau; hans.nordeb...@ericsson.com; Praveen Malviya;
>>>>>>> gary....@dektech.com.au; long.hb.ngu...@dektech.com.au
>>>>>>> Cc: opensaf-devel@lists.sourceforge.net
>>>>>>> Subject: Re: [devel] [PATCH 2 of 4] AMFND: Admin operation
>>>>>>> continuation if csi completes during headless [#1725 part 1] V1
>>>>>>>
>>>>>>> Hi Minh,
>>>>>>> I have tested these scenarios again and it works well.
>>>>>>>
>>>>>>> Thanks
>>>>>>> -Nagu
>>>>>>>
>>>>>>>> -----Original Message-----
>>>>>>>> From: minh chau [mailto:minh.c...@dektech.com.au]
>>>>>>>> Sent: 12 September 2016 11:53
>>>>>>>> To: Nagendra Kumar; hans.nordeb...@ericsson.com; Praveen Malviya;
>>>>>>>> gary....@dektech.com.au; long.hb.ngu...@dektech.com.au
>>>>>>>> Cc: opensaf-devel@lists.sourceforge.net
>>>>>>>> Subject: Re: [PATCH 2 of 4] AMFND: Admin operation continuation
>>>>>>>> if csi completes during headless [#1725 part 1] V1
>>>>>>>>
>>>>>>>> Hi Nagu,
>>>>>>>>
>>>>>>>> One bug get hit by your configuration, where the absent SUSIs
>>>>>>>> are found after headless but no real SUSIs are available also.
>>>>>>>> In this case I think that AMFD can do like a fresh assignment.
>>>>>>>> I attach the patch to ticket #1725, please help to test again.
>>>>>>>>
>>>>>>>> Thanks,
>>>>>>>> Minh
>>>>>>>>
>>>>>>>> On 12/09/16 11:09, minh chau wrote:
>>>>>>>>> Hi Nagu,
>>>>>>>>>
>>>>>>>>> I'm running the tests with this configuration and will get back to you.
>>>>>>>>> Thanks,
>>>>>>>>> Minh
>>>>>>>>>
>>>>>>>>> On 09/09/16 22:26, Nagendra Kumar wrote:
>>>>>>>>>> Hi Minh,
>>>>>>>>>> I am using 1725_pending_review.tgz
>>>>>>>>>> (1725_02_V2_bugfix_01_resend_buffer_in_set_leds.diff,
>>>>>>>>>> 1725_02_V2_bugfix_02_honor_clusterinit_nodesync_timer.diff,
>>>>>>>>>> 1725_02_V2_bugfix_03_restore_ng_admin.diff,
>>>>>>>>>> 1725_03_V4_failover_absent_susi_longDn.diff,
>>>>>>>>>> 1725_04_V2_headless_validation.diff,
>>>>>>>>>> 1725_05_V2_resend_oper_state.diff,
>>>>>>>>>> 1725_06a_fullscope_escalation_headless.diff).
>>>>>>>>>>
>>>>>>>>>> I am doing basic node reboot validation testing with no faults.
>>>>>>>>>>
>>>>>>>>>> Configuration: SU1(act) and SU2(standby) both on PL-3.
>>>>>>>>>>
>>>>>>>>>> TC #1: Start SC-1, PL-3 and PL-5: Unlock SU1 and SU2. Stop
>>>>>>>>>> SC-1 and stop PL-3, start PL-3 and start SC-1.
>>>>>>>>>> After SC-1 and PL-3 comes back, ideally SU1 and SU2 should get
>>>>>>>>>> assignments as Act and Std, but no assignment are being given
>>>>>>>>>> to SUs on PL-3 and it shows following in status:
>>>>>>>>>>
>>>>>>>>>> Only Su2 has Std assignment.
>>>>>>>>>>
>>>>>>>>>> safSISU=safSu=SC-1\,safSg=NoRed\,safApp=OpenSAF,safSi=NoRed1,safApp=OpenSAF
>>>>>>>>>>         saAmfSISUHAState=ACTIVE(1)
>>>>>>>>>> safSISU=safSu=PL-5\,safSg=NoRed\,safApp=OpenSAF,safSi=NoRed2,safApp=OpenSAF
>>>>>>>>>>         saAmfSISUHAState=ACTIVE(1)
>>>>>>>>>> safSISU=safSu=SU2\,safSg=AmfDemo_2N\,safApp=AmfDemo1,safSi=AmfDemo1,safApp=AmfDemo1
>>>>>>>>>>         saAmfSISUHAState=STANDBY(2)
>>>>>>>>>> safSISU=safSu=SC-1\,safSg=2N\,safApp=OpenSAF,safSi=SC-2N,safApp=OpenSAF
>>>>>>>>>>         saAmfSISUHAState=ACTIVE(1)
>>>>>>>>>> safSISU=safSu=PL-3\,safSg=NoRed\,safApp=OpenSAF,safSi=NoRed3,safApp=OpenSAF
>>>>>>>>>>         saAmfSISUHAState=ACTIVE(1)
>>>>>>>>>>
>>>>>>>>>> TC #2: Configuration same as TC#1. Stop PL-3 and don't start.
>>>>>>>>>> The same issue:
>>>>>>>>>> safSISU=safSu=PL-5\,safSg=NoRed\,safApp=OpenSAF,safSi=NoRed3,safApp=OpenSAF
>>>>>>>>>>         saAmfSISUHAState=ACTIVE(1)
>>>>>>>>>> safSISU=safSu=SU2\,safSg=AmfDemo_2N\,safApp=AmfDemo1,safSi=AmfDemo1,safApp=AmfDemo1
>>>>>>>>>>         saAmfSISUHAState=STANDBY(2)
>>>>>>>>>> safSISU=safSu=SC-1\,safSg=NoRed\,safApp=OpenSAF,safSi=NoRed2,safApp=OpenSAF
>>>>>>>>>>         saAmfSISUHAState=ACTIVE(1)
>>>>>>>>>> safSISU=safSu=SC-1\,safSg=2N\,safApp=OpenSAF,safSi=SC-2N,safApp=OpenSAF
>>>>>>>>>>         saAmfSISUHAState=ACTIVE(1)
>>>>>>>>>>
>>>>>>>>>> TC #3: Configured SU1(Act) on PL-3 and SU2(Std) on PL-4.
>>>>>>>>>> Stop SC-1, stop PL-3 and PL-4, but PL-5 is running. start
>>>>>>>>>> SC-1, the same issue.
>>>>>>>>>>
>>>>>>>>>> TC #4: Same as TC #3, but SU3 configured on PL-5 as spare. SU3
>>>>>>>>>> doesn't get any assignment and Sg is unstable.
>>>>>>>>>>
>>>>>>>>>> Thanks
>>>>>>>>>> -Nagu
>>>>>>>>>>
>>>>>>>>>>> -----Original Message-----
>>>>>>>>>>> From: Minh Hon Chau [mailto:minh.c...@dektech.com.au]
>>>>>>>>>>> Sent: 18 August 2016 05:46
>>>>>>>>>>> To: hans.nordeb...@ericsson.com; Nagendra Kumar; Praveen Malviya;
>>>>>>>>>>> gary....@dektech.com.au; long.hb.ngu...@dektech.com.au;
>>>>>>>>>>> minh.c...@dektech.com.au
>>>>>>>>>>> Cc: opensaf-devel@lists.sourceforge.net
>>>>>>>>>>> Subject: [PATCH 2 of 4] AMFND: Admin operation continuation
>>>>>>>>>>> if csi completes during headless [#1725 part 1] V1
>>>>>>>>>>>
>>>>>>>>>>>  osaf/services/saf/amf/amfnd/di.cc             |  199 +++++++++++++++++--------
>>>>>>>>>>>  osaf/services/saf/amf/amfnd/include/avnd_di.h |    1 +
>>>>>>>>>>>  2 files changed, 134 insertions(+), 66 deletions(-)
>>>>>>>>>>>
>>>>>>>>>>>
>>>>>>>>>>> There're two options basically that AMFD can continue admin
>>>>>>>>>>> operation with completed csi(s)
>>>>>>>>>>>
>>>>>>>>>>> First: AMFD can use the sync SUSI fsm state as latest, AMFD
>>>>>>>>>>> then has to explore its SUSI assignments with adminStates of
>>>>>>>>>>> relevant entities to determine which SU should be on call of susi_success().
>>>>>>>>>>> Deeper level of exploration for csi addition. It also depends
>>>>>>>>>>> on SG Fsm state which is being used variously in different SG types.
>>>>>>>>>>> Second: AMFD uses the SUSI fsm state read from IMM as latest,
>>>>>>>>>>> and AMFND needs to resend susi_resp messages which were
>>>>>>>>>>> deferred during headless so that AMFD can continue the admin
>>>>>>>>>>> operation sequence.
>>>>>>>>>>> Both cases of csi completion [during or after] headless can
>>>>>>>>>>> run in the same code flow.
>>>>>>>>>>>
>>>>>>>>>>> The patch buffers susi_resp_msg during headless stage and
>>>>>>>>>>> resend it to AMFD after headless. There could be a chance
>>>>>>>>>>> that AMFND sent out susi response message but AMFD could not receive
>>>>>>>>>>> or process it. This case could be seen as a defect, which can
>>>>>>>>>>> be fixed by securing the result of sending susi_resp message
>>>>>>>>>>> from AMFND toward AMFD.
>>>>>>>>>>>
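For readers skimming the thread, the "Second" option described above (buffer the susi response with msg_id 0 while headless, then number and resend it once the director is back) can be sketched roughly as below. This is only an illustration under assumptions, not the AMFND code: `SusiResp`, `NodeDirectorSketch`, `send_susi_resp`, `resend_buffered` and `run_headless_example` are hypothetical names, and the sketch drops a record as soon as it is "sent", whereas the patch keeps records queued and moves numbered ones to the tail.

```cpp
#include <cassert>
#include <cstdint>
#include <deque>
#include <string>
#include <vector>

// Hypothetical stand-in for a su_si_assign response from AMFND to AMFD.
struct SusiResp {
  std::string su_name;
  std::uint32_t msg_id;  // 0 means "buffered while headless, not yet numbered"
};

// Hypothetical stand-in for the node director control block.
struct NodeDirectorSketch {
  bool director_down = false;    // plays the role of cb->is_avd_down
  std::uint32_t snd_msg_id = 0;  // plays the role of cb->snd_msg_id
  std::deque<SusiResp> pending;  // plays the role of the buffered msg list
  std::vector<SusiResp> sent;    // what reached the director in this sketch

  // Send a response now, or buffer it with msg_id 0 while headless.
  void send_susi_resp(const std::string& su_name) {
    if (director_down) {
      pending.push_back(SusiResp{su_name, 0u});
    } else {
      sent.push_back(SusiResp{su_name, ++snd_msg_id});
    }
  }

  // After headless: give each buffered response a fresh msg_id in arrival
  // order, then send everything that is queued.
  void resend_buffered() {
    for (SusiResp& rec : pending) {
      if (rec.msg_id == 0) rec.msg_id = ++snd_msg_id;
    }
    while (!pending.empty()) {
      sent.push_back(pending.front());
      pending.pop_front();
    }
  }
};

// Usage: two responses complete during headless, one after the director returns.
inline std::vector<SusiResp> run_headless_example() {
  NodeDirectorSketch nd;
  nd.director_down = true;
  nd.send_susi_resp("SU1");
  nd.send_susi_resp("SU2");
  assert(nd.sent.empty() && nd.pending.size() == 2);
  nd.director_down = false;
  nd.resend_buffered();
  nd.send_susi_resp("SU3");
  return nd.sent;
}
```

Numbering only at resend time keeps msg_id values contiguous from the director's point of view, which seems to be why the patch stores 0 while headless rather than consuming an id that may never be acknowledged.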
>>>>>>>>>>> diff --git a/osaf/services/saf/amf/amfnd/di.cc b/osaf/services/saf/amf/amfnd/di.cc
>>>>>>>>>>> --- a/osaf/services/saf/amf/amfnd/di.cc
>>>>>>>>>>> +++ b/osaf/services/saf/amf/amfnd/di.cc
>>>>>>>>>>> @@ -805,11 +805,6 @@ uint32_t avnd_di_susi_resp_send(AVND_CB
>>>>>>>>>>>    if (cb->term_state == AVND_TERM_STATE_OPENSAF_SHUTDOWN_STARTED)
>>>>>>>>>>>      return rc;
>>>>>>>>>>>
>>>>>>>>>>> -  if (cb->is_avd_down == true) {
>>>>>>>>>>> -    m_AVND_SU_ALL_SI_RESET(su);
>>>>>>>>>>> -    return rc;
>>>>>>>>>>> -  }
>>>>>>>>>>> -
>>>>>>>>>>>    // should be in assignment pending state to be here
>>>>>>>>>>>    osafassert(m_AVND_SU_IS_ASSIGN_PEND(su));
>>>>>>>>>>>
>>>>>>>>>>> @@ -820,64 +815,76 @@ uint32_t avnd_di_susi_resp_send(AVND_CB
>>>>>>>>>>>    TRACE_ENTER2("Sending Resp su=%s, si=%s, curr_state=%u, prv_state=%u", su->name.value, curr_si->name.value,curr_si->curr_state,curr_si->prv_state);
>>>>>>>>>>>    /* populate the susi resp msg */
>>>>>>>>>>>    msg.info.avd = new AVSV_DND_MSG();
>>>>>>>>>>> -  msg.type = AVND_MSG_AVD;
>>>>>>>>>>> -  msg.info.avd->msg_type = AVSV_N2D_INFO_SU_SI_ASSIGN_MSG;
>>>>>>>>>>> -  msg.info.avd->msg_info.n2d_su_si_assign.msg_id = ++(cb->snd_msg_id);
>>>>>>>>>>> -  msg.info.avd->msg_info.n2d_su_si_assign.node_id = cb->node_info.nodeId;
>>>>>>>>>>> -  if (si) {
>>>>>>>>>>> -    msg.info.avd->msg_info.n2d_su_si_assign.single_csi =
>>>>>>>>>>> -      ((si->single_csi_add_rem_in_si == AVSV_SUSI_ACT_BASE) ? false : true);
>>>>>>>>>>> -  }
>>>>>>>>>>> -  TRACE("curr_assign_state '%u'", curr_si->curr_assign_state);
>>>>>>>>>>> -  msg.info.avd->msg_info.n2d_su_si_assign.msg_act =
>>>>>>>>>>> -    (m_AVND_SU_SI_CURR_ASSIGN_STATE_IS_ASSIGNED(curr_si) ||
>>>>>>>>>>> -     m_AVND_SU_SI_CURR_ASSIGN_STATE_IS_ASSIGNING(curr_si)) ?
>>>>>>>>>>> -    ((!curr_si->prv_state) ? AVSV_SUSI_ACT_ASGN : AVSV_SUSI_ACT_MOD) : AVSV_SUSI_ACT_DEL;
>>>>>>>>>>> -  msg.info.avd->msg_info.n2d_su_si_assign.su_name = su->name;
>>>>>>>>>>> -  if (si) {
>>>>>>>>>>> -    msg.info.avd->msg_info.n2d_su_si_assign.si_name = si->name;
>>>>>>>>>>> -    if (AVSV_SUSI_ACT_ASGN == si->single_csi_add_rem_in_si) {
>>>>>>>>>>> -      TRACE("si->curr_assign_state '%u'", curr_si->curr_assign_state);
>>>>>>>>>>> -      msg.info.avd->msg_info.n2d_su_si_assign.msg_act =
>>>>>>>>>>> -        (m_AVND_SU_SI_CURR_ASSIGN_STATE_IS_ASSIGNED(curr_si) ||
>>>>>>>>>>> -         m_AVND_SU_SI_CURR_ASSIGN_STATE_IS_ASSIGNING(curr_si)) ?
>>>>>>>>>>> -        AVSV_SUSI_ACT_ASGN : AVSV_SUSI_ACT_DEL;
>>>>>>>>>>> -    }
>>>>>>>>>>> -  }
>>>>>>>>>>> -  msg.info.avd->msg_info.n2d_su_si_assign.ha_state =
>>>>>>>>>>> -    (SA_AMF_HA_QUIESCING == curr_si->curr_state) ? SA_AMF_HA_QUIESCED : curr_si->curr_state;
>>>>>>>>>>> -  msg.info.avd->msg_info.n2d_su_si_assign.error =
>>>>>>>>>>> -    (m_AVND_SU_SI_CURR_ASSIGN_STATE_IS_ASSIGNED(curr_si) ||
>>>>>>>>>>> -     m_AVND_SU_SI_CURR_ASSIGN_STATE_IS_REMOVED(curr_si)) ? NCSCC_RC_SUCCESS : NCSCC_RC_FAILURE;
>>>>>>>>>>> +  msg.type = AVND_MSG_AVD;
>>>>>>>>>>> +  msg.info.avd->msg_type = AVSV_N2D_INFO_SU_SI_ASSIGN_MSG;
>>>>>>>>>>> +  msg.info.avd->msg_info.n2d_su_si_assign.node_id = cb->node_info.nodeId;
>>>>>>>>>>> +  if (si) {
>>>>>>>>>>> +    msg.info.avd->msg_info.n2d_su_si_assign.single_csi =
>>>>>>>>>>> +      ((si->single_csi_add_rem_in_si == AVSV_SUSI_ACT_BASE) ? false : true);
>>>>>>>>>>> +  }
>>>>>>>>>>> +  TRACE("curr_assign_state '%u'", curr_si->curr_assign_state);
>>>>>>>>>>> +  msg.info.avd->msg_info.n2d_su_si_assign.msg_act =
>>>>>>>>>>> +    (m_AVND_SU_SI_CURR_ASSIGN_STATE_IS_ASSIGNED(curr_si) ||
>>>>>>>>>>> +     m_AVND_SU_SI_CURR_ASSIGN_STATE_IS_ASSIGNING(curr_si)) ?
>>>>>>>>>>> +    ((!curr_si->prv_state) ? AVSV_SUSI_ACT_ASGN : AVSV_SUSI_ACT_MOD) : AVSV_SUSI_ACT_DEL;
>>>>>>>>>>> +  msg.info.avd->msg_info.n2d_su_si_assign.su_name = su->name;
>>>>>>>>>>> +  if (si) {
>>>>>>>>>>> +    msg.info.avd->msg_info.n2d_su_si_assign.si_name = si->name;
>>>>>>>>>>> +    if (AVSV_SUSI_ACT_ASGN == si->single_csi_add_rem_in_si) {
>>>>>>>>>>> +      TRACE("si->curr_assign_state '%u'", curr_si->curr_assign_state);
>>>>>>>>>>> +      msg.info.avd->msg_info.n2d_su_si_assign.msg_act =
>>>>>>>>>>> +        (m_AVND_SU_SI_CURR_ASSIGN_STATE_IS_ASSIGNED(curr_si) ||
>>>>>>>>>>> +         m_AVND_SU_SI_CURR_ASSIGN_STATE_IS_ASSIGNING(curr_si)) ?
>>>>>>>>>>> +        AVSV_SUSI_ACT_ASGN : AVSV_SUSI_ACT_DEL;
>>>>>>>>>>> +    }
>>>>>>>>>>> +  }
>>>>>>>>>>> +  msg.info.avd->msg_info.n2d_su_si_assign.ha_state =
>>>>>>>>>>> +    (SA_AMF_HA_QUIESCING == curr_si->curr_state) ? SA_AMF_HA_QUIESCED : curr_si->curr_state;
>>>>>>>>>>> +  msg.info.avd->msg_info.n2d_su_si_assign.error =
>>>>>>>>>>> +    (m_AVND_SU_SI_CURR_ASSIGN_STATE_IS_ASSIGNED(curr_si) ||
>>>>>>>>>>> +     m_AVND_SU_SI_CURR_ASSIGN_STATE_IS_REMOVED(curr_si)) ? NCSCC_RC_SUCCESS : NCSCC_RC_FAILURE;
>>>>>>>>>>>
>>>>>>>>>>> -  if (msg.info.avd->msg_info.n2d_su_si_assign.msg_act == AVSV_SUSI_ACT_ASGN)
>>>>>>>>>>> -    osafassert(si);
>>>>>>>>>>> +  if (msg.info.avd->msg_info.n2d_su_si_assign.msg_act == AVSV_SUSI_ACT_ASGN)
>>>>>>>>>>> +    osafassert(si);
>>>>>>>>>>>
>>>>>>>>>>> -  /* send the msg to AvD */
>>>>>>>>>>> -  TRACE("Sending. msg_id'%u', node_id'%u', msg_act'%u', su'%s', si'%s', ha_state'%u', error'%u', single_csi'%u'",
>>>>>>>>>>> -    msg.info.avd->msg_info.n2d_su_si_assign.msg_id, msg.info.avd->msg_info.n2d_su_si_assign.node_id,
>>>>>>>>>>> -    msg.info.avd->msg_info.n2d_su_si_assign.msg_act, msg.info.avd->msg_info.n2d_su_si_assign.su_name.value,
>>>>>>>>>>> -    msg.info.avd->msg_info.n2d_su_si_assign.si_name.value, msg.info.avd->msg_info.n2d_su_si_assign.ha_state,
>>>>>>>>>>> -    msg.info.avd->msg_info.n2d_su_si_assign.error, msg.info.avd->msg_info.n2d_su_si_assign.single_csi);
>>>>>>>>>>> +  /* send the msg to AvD */
>>>>>>>>>>> +  TRACE("Sending. msg_id'%u', node_id'%u', msg_act'%u', su'%s', si'%s', ha_state'%u', error'%u', single_csi'%u'",
>>>>>>>>>>> +    msg.info.avd->msg_info.n2d_su_si_assign.msg_id, msg.info.avd->msg_info.n2d_su_si_assign.node_id,
>>>>>>>>>>> +    msg.info.avd->msg_info.n2d_su_si_assign.msg_act, msg.info.avd->msg_info.n2d_su_si_assign.su_name.value,
>>>>>>>>>>> +    msg.info.avd->msg_info.n2d_su_si_assign.si_name.value, msg.info.avd->msg_info.n2d_su_si_assign.ha_state,
>>>>>>>>>>> +    msg.info.avd->msg_info.n2d_su_si_assign.error, msg.info.avd->msg_info.n2d_su_si_assign.single_csi);
>>>>>>>>>>>
>>>>>>>>>>> -  if ((su->si_list.n_nodes > 1) && (si == nullptr)) {
>>>>>>>>>>> -    if (msg.info.avd->msg_info.n2d_su_si_assign.msg_act == AVSV_SUSI_ACT_DEL)
>>>>>>>>>>> -      LOG_NO("Removed 'all SIs' from '%s'", su->name.value);
>>>>>>>>>>> +  if ((su->si_list.n_nodes > 1) && (si == nullptr)) {
>>>>>>>>>>> +    if (msg.info.avd->msg_info.n2d_su_si_assign.msg_act == AVSV_SUSI_ACT_DEL)
>>>>>>>>>>> +      LOG_NO("Removed 'all SIs' from '%s'", su->name.value);
>>>>>>>>>>> -    if (msg.info.avd->msg_info.n2d_su_si_assign.msg_act == AVSV_SUSI_ACT_MOD)
>>>>>>>>>>> -      LOG_NO("Assigned 'all SIs' %s of '%s'",
>>>>>>>>>>> -        ha_state[msg.info.avd->msg_info.n2d_su_si_assign.ha_state],
>>>>>>>>>>> -        su->name.value);
>>>>>>>>>>> -  }
>>>>>>>>>>> +    if (msg.info.avd->msg_info.n2d_su_si_assign.msg_act == AVSV_SUSI_ACT_MOD)
>>>>>>>>>>> +      LOG_NO("Assigned 'all SIs' %s of '%s'",
>>>>>>>>>>> +        ha_state[msg.info.avd->msg_info.n2d_su_si_assign.ha_state],
>>>>>>>>>>> +        su->name.value);
>>>>>>>>>>> +  }
>>>>>>>>>>>
>>>>>>>>>>> -  rc = avnd_di_msg_send(cb, &msg);
>>>>>>>>>>> -  if (NCSCC_RC_SUCCESS == rc)
>>>>>>>>>>> -    msg.info.avd = 0;
>>>>>>>>>>> -
>>>>>>>>>>> -  /* we have completed the SU SI msg processing */
>>>>>>>>>>> -  if (su_assign_state_is_stable(su))
>>>>>>>>>>> -    m_AVND_SU_ASSIGN_PEND_RESET(su);
>>>>>>>>>>> -  m_AVND_SU_ALL_SI_RESET(su);
>>>>>>>>>>> +  if (cb->is_avd_down == true) {
>>>>>>>>>>> +    // We are in headless, buffer this msg
>>>>>>>>>>> +    msg.info.avd->msg_info.n2d_su_si_assign.msg_id = 0;
>>>>>>>>>>> +    if (avnd_diq_rec_add(cb, &msg) == nullptr) {
>>>>>>>>>>> +      rc = NCSCC_RC_FAILURE;
>>>>>>>>>>> +    }
>>>>>>>>>>> +    m_AVND_SU_ALL_SI_RESET(su);
>>>>>>>>>>> +    LOG_NO("avnd_di_susi_resp_send() deferred as AMF director is offline");
>>>>>>>>>>> +  } else {
>>>>>>>>>>> +    // We are in normal cluster, send msg to director
>>>>>>>>>>> +    msg.info.avd->msg_info.n2d_su_si_assign.msg_id = ++(cb->snd_msg_id);
>>>>>>>>>>> +    /* send the msg to AvD */
>>>>>>>>>>> +    rc = avnd_di_msg_send(cb, &msg);
>>>>>>>>>>> +    if (NCSCC_RC_SUCCESS == rc)
>>>>>>>>>>> +      msg.info.avd = 0;
>>>>>>>>>>> +    /* we have completed the SU SI msg processing */
>>>>>>>>>>> +    if (su_assign_state_is_stable(su)) {
>>>>>>>>>>> +      m_AVND_SU_ASSIGN_PEND_RESET(su);
>>>>>>>>>>> +    }
>>>>>>>>>>> +    m_AVND_SU_ALL_SI_RESET(su);
>>>>>>>>>>> +  }
>>>>>>>>>>>
>>>>>>>>>>>    /* free the contents of avnd message */
>>>>>>>>>>>    avnd_msg_content_free(cb, &msg);
>>>>>>>>>>> @@ -1256,14 +1263,7 @@ void avnd_diq_rec_del(AVND_CB *cb, AVND_
>>>>>>>>>>>      /* stop the AvD msg response timer */
>>>>>>>>>>>      if (m_AVND_TMR_IS_ACTIVE(rec->resp_tmr)) {
>>>>>>>>>>>        m_AVND_TMR_MSG_RESP_STOP(cb, *rec);
>>>>>>>>>>> -      // Resend msgs from queue because amfd dropped during sync
>>>>>>>>>>> -      if ((cb->dnd_list.head != nullptr)) {
>>>>>>>>>>> -        TRACE("retransmit message to amfd");
>>>>>>>>>>> -        AVND_DND_MSG_LIST *pending_rec = 0;
>>>>>>>>>>> -        for (pending_rec = cb->dnd_list.head; pending_rec != nullptr; pending_rec = pending_rec->next) {
>>>>>>>>>>> -          avnd_diq_rec_send(cb, pending_rec);
>>>>>>>>>>> -        }
>>>>>>>>>>> -      }
>>>>>>>>>>> +      avnd_diq_rec_send_buffered_msg(cb);
>>>>>>>>>>>        /* resend pg start track */
>>>>>>>>>>>        avnd_di_resend_pg_start_track(cb);
>>>>>>>>>>>      }
>>>>>>>>>>> @@ -1276,6 +1276,73 @@ void avnd_diq_rec_del(AVND_CB *cb, AVND_
>>>>>>>>>>>    TRACE_LEAVE();
>>>>>>>>>>>    return;
>>>>>>>>>>>  }
>>>>>>>>>>> +/****************************************************************************
>>>>>>>>>>> +  Name          : avnd_diq_rec_send_buffered_msg
>>>>>>>>>>> +
>>>>>>>>>>> +  Description   : Resend buffered msg
>>>>>>>>>>> +
>>>>>>>>>>> +  Arguments     : cb - ptr to the AvND control block
>>>>>>>>>>> +
>>>>>>>>>>> +  Return Values : None.
>>>>>>>>>>> +
>>>>>>>>>>> +  Notes         : None.
>>>>>>>>>>> +******************************************************************************/
>>>>>>>>>>> +void avnd_diq_rec_send_buffered_msg(AVND_CB *cb) {
>>>>>>>>>>> +  TRACE_ENTER();
>>>>>>>>>>> +  // Resend msgs from queue because amfnd dropped during headless
>>>>>>>>>>> +  // or headless-synchronization
>>>>>>>>>>> +  if ((cb->dnd_list.head != nullptr)) {
>>>>>>>>>>> +    AVND_DND_MSG_LIST *pending_rec = 0;
>>>>>>>>>>> +    TRACE("Attach msg_id of buffered msg");
>>>>>>>>>>> +    bool found = true;
>>>>>>>>>>> +    while (found) {
>>>>>>>>>>> +      found = false;
>>>>>>>>>>> +      for (pending_rec = cb->dnd_list.head; pending_rec != nullptr; pending_rec = pending_rec->next) {
>>>>>>>>>>> +        if (pending_rec->msg.type == AVND_MSG_AVD) {
>>>>>>>>>>> +          // At this moment, only oper_state msg needs to report to director
>>>>>>>>>>> +          if (pending_rec->msg.info.avd->msg_type == AVSV_N2D_INFO_SU_SI_ASSIGN_MSG &&
>>>>>>>>>>> +              pending_rec->msg.info.avd->msg_info.n2d_su_si_assign.msg_id == 0) {
>>>>>>>>>>> +            m_AVND_DIQ_REC_POP(cb, pending_rec);
>>>>>>>>>>> +#if 0
>>>>>>>>>>> +            // only resend if this SUSI does exist
>>>>>>>>>>> +            AVND_SU *su = m_AVND_SUDB_REC_GET(cb->sudb,
>>>>>>>>>>> +                pending_rec->msg.info.avd->msg_info.n2d_su_si_assign.su_name);
>>>>>>>>>>> +            if (su != nullptr && su->si_list.n_nodes > 0) {
>>>>>>>>>>> +#endif
>>>>>>>>>>> +              pending_rec->msg.info.avd->msg_info.n2d_su_si_assign.msg_id = ++(cb->snd_msg_id);
>>>>>>>>>>> +              m_AVND_DIQ_REC_PUSH(cb, pending_rec);
>>>>>>>>>>> +              LOG_NO("Found and resend buffered su_si_assign msg for SU:'%s', "
>>>>>>>>>>> +                  "SI:'%s', ha_state:'%u', msg_act:'%u', single_csi:'%u', "
>>>>>>>>>>> +                  "error:'%u', msg_id:'%u'",
>>>>>>>>>>> +                  pending_rec->msg.info.avd->msg_info.n2d_su_si_assign.su_name.value,
>>>>>>>>>>> +                  pending_rec->msg.info.avd->msg_info.n2d_su_si_assign.si_name.value,
>>>>>>>>>>> +                  pending_rec->msg.info.avd->msg_info.n2d_su_si_assign.ha_state,
>>>>>>>>>>> +                  pending_rec->msg.info.avd->msg_info.n2d_su_si_assign.msg_act,
>>>>>>>>>>> +                  pending_rec->msg.info.avd->msg_info.n2d_su_si_assign.single_csi,
>>>>>>>>>>> +                  pending_rec->msg.info.avd->msg_info.n2d_su_si_assign.error,
>>>>>>>>>>> +                  pending_rec->msg.info.avd->msg_info.n2d_su_si_assign.msg_id);
>>>>>>>>>>> +
>>>>>>>>>>> +#if 0
>>>>>>>>>>> +            } else {
>>>>>>>>>>> +              avnd_msg_content_free(cb, &pending_rec->msg);
>>>>>>>>>>> +              delete pending_rec;
>>>>>>>>>>> +              pending_rec = cb->dnd_list.head;
>>>>>>>>>>> +            }
>>>>>>>>>>> +#endif
>>>>>>>>>>> +            found = true;
>>>>>>>>>>> +          }
>>>>>>>>>>> +        }
>>>>>>>>>>> +      }
>>>>>>>>>>> +    }
>>>>>>>>>>> +    TRACE("retransmit message to amfd");
>>>>>>>>>>> +    for (pending_rec = cb->dnd_list.head; pending_rec != nullptr; pending_rec = pending_rec->next) {
>>>>>>>>>>> +      avnd_diq_rec_send(cb, pending_rec);
>>>>>>>>>>> +    }
>>>>>>>>>>> +  }
>>>>>>>>>>> +  TRACE_LEAVE();
>>>>>>>>>>> +  return;
>>>>>>>>>>> +}
>>>>>>>>>>>
>>>>>>>>>>>
>>>>>>>>>>>  /****************************************************************************
>>>>>>>>>>>    Name          : avnd_diq_rec_send
>>>>>>>>>>> diff --git a/osaf/services/saf/amf/amfnd/include/avnd_di.h b/osaf/services/saf/amf/amfnd/include/avnd_di.h
>>>>>>>>>>> --- a/osaf/services/saf/amf/amfnd/include/avnd_di.h
>>>>>>>>>>> +++ b/osaf/services/saf/amf/amfnd/include/avnd_di.h
>>>>>>>>>>> @@ -79,6 +79,7 @@ void avnd_di_msg_ack_process(struct avnd
>>>>>>>>>>>  void avnd_diq_del(struct avnd_cb_tag *);
>>>>>>>>>>>  AVND_DND_MSG_LIST *avnd_diq_rec_add(struct avnd_cb_tag *cb, AVND_MSG *msg);
>>>>>>>>>>>  void avnd_diq_rec_del(struct avnd_cb_tag *cb, AVND_DND_MSG_LIST *rec);
>>>>>>>>>>> +void avnd_diq_rec_send_buffered_msg(struct avnd_cb_tag *cb);
>>>>>>>>>>>  uint32_t avnd_diq_rec_send(struct avnd_cb_tag *cb, AVND_DND_MSG_LIST *rec);
>>>>>>>>>>>  uint32_t avnd_di_reg_su_rsp_snd(struct avnd_cb_tag *cb, SaNameT *su_name, uint32_t ret_code);
>>>>>>>>>>>  uint32_t avnd_di_ack_nack_msg_send(struct avnd_cb_tag *cb, uint32_t rcv_id, uint32_t view_num);
>>>>>>> ------------------------------------------------------------------------------
>>>>>>> _______________________________________________
>>>>>>> Opensaf-devel mailing list
>>>>>>> Opensaf-devel@lists.sourceforge.net
>>>>>>> https://lists.sourceforge.net/lists/listinfo/opensaf-devel