Hi Minh,

One quick question:
Ticket description says:
"Si deps safSi=AmfDemoTwon2 depends safSi=AmfDemoTwon1 depends 
safSi=AmfDemoTwon"
But logs are related to without SIdep. Also in the configuration 
app3_twon3su3si.xml, SI dep classes are commented.
I think ticket description needs correction as problem is without SI dep.
Please confirm.

Thanks,
Praveen


On 17-Feb-17 10:58 AM, praveen malviya wrote:
> Hi Minh,
>
> I have started reviewing this patch.
>
> Thanks,
> Praveen
>
> On 15-Feb-17 9:22 AM, minh chau wrote:
>> Hi all,
>>
>> Have you had time to review this patch?
>> It changes the component failover sequence, so I think we need more time
>> to look at it.
>>
>> Thanks,
>> Minh
>>
>> On 23/01/17 12:28, Minh Hon Chau wrote:
>>>   src/amf/amfnd/avnd_su.h |   1 +
>>>   src/amf/amfnd/clc.cc    |   3 ---
>>>   src/amf/amfnd/di.cc     |  12 +++++++++++-
>>>   src/amf/amfnd/susm.cc   |  32 +++++++++++++++++++++++++++++---
>>>   4 files changed, 41 insertions(+), 7 deletions(-)
>>>
>>>
>>> In case component failover, faulty component will be terminated. When
>>> the reinstantiation
>>> is done, amfnd will send su_oper_message (enabled) to amfd which is
>>> running along with
>>> component failover. In the reported problem, if su_oper_message
>>> (enabled) comes to amfd
>>> before the quiesced assignment response (as part of component failover
>>> sequence) comes to
>>> amfd, then this quiesced assignment response is ignored, thus
>>> component failover will not
>>> finish.
>>>
>>> The problem is in function susi_success_sg_realign with act=5,
>>> state=3, amfd always assumes
>>> su having faulty component is OUT_OF_SERVICE. This assumption is true
>>> in most of the time
>>> when su_oper_message (enabled) comes a little later than quiesced
>>> assignment response. In fact
>>> the su_oper_message (enabled) is not designed as part of component
>>> failover sequence, thus it
>>> can come any time during the failover. If amfd is getting a bit busier
>>> with RTA update then
>>> the faulty component has enough to reinstiantiate so that amfnd sends
>>> su_oper_message (enabled)
>>> before quiesced assignment response, the reported problem will be seen.
>>>
>>> This patch hardens the component failover sequence by ensuring the
>>> su_oper_message (enabled) to
>>> be sent after su completes to remove assignment. This approach comes
>>> from the similarity in
>>> su failover, where the su_oper_message (enabled) is sent in repair phase.
>>>
>>> diff --git a/src/amf/amfnd/avnd_su.h b/src/amf/amfnd/avnd_su.h
>>> --- a/src/amf/amfnd/avnd_su.h
>>> +++ b/src/amf/amfnd/avnd_su.h
>>> @@ -393,6 +393,7 @@ extern struct avnd_su_si_rec *avnd_silis
>>>   extern struct avnd_su_si_rec *avnd_silist_getprev(const struct
>>> avnd_su_si_rec *);
>>>   extern struct avnd_su_si_rec *avnd_silist_getlast(void);
>>>   extern bool sufailover_in_progress(const AVND_SU *su);
>>> +extern bool componentfailover_in_progress(const AVND_SU *su);
>>>   extern bool sufailover_during_nodeswitchover(const AVND_SU *su);
>>>   extern bool all_csis_in_removed_state(const AVND_SU *su);
>>>   extern void su_reset_restart_count_in_comps(const struct avnd_cb_tag
>>> *cb, const AVND_SU *su);
>>> diff --git a/src/amf/amfnd/clc.cc b/src/amf/amfnd/clc.cc
>>> --- a/src/amf/amfnd/clc.cc
>>> +++ b/src/amf/amfnd/clc.cc
>>> @@ -2381,9 +2381,6 @@ uint32_t avnd_comp_clc_terming_cleansucc
>>>               (m_AVND_SU_IS_FAILOVER(su))) {
>>>           /* yes, request director to orchestrate component failover */
>>>           rc = avnd_di_oper_send(cb, su, SA_AMF_COMPONENT_FAILOVER);
>>> -
>>> -        //Reset component-failover here. SU failover is reset as part
>>> of REPAIRED admin op.
>>> -        m_AVND_SU_FAILOVER_RESET(su);
>>>       }
>>>         /*
>>> diff --git a/src/amf/amfnd/di.cc b/src/amf/amfnd/di.cc
>>> --- a/src/amf/amfnd/di.cc
>>> +++ b/src/amf/amfnd/di.cc
>>> @@ -894,7 +894,17 @@ uint32_t avnd_di_susi_resp_send(AVND_CB
>>>               }
>>>               m_AVND_SU_ALL_SI_RESET(su);
>>>           }
>>> -
>>> +        if (componentfailover_in_progress(su)) {
>>> +            if (all_csis_in_removed_state(su) == true) {
>>> +                bool is_en;
>>> +                m_AVND_SU_IS_ENABLED(su, is_en);
>>> +                if (is_en) {
>>> +                    if (avnd_di_oper_send(cb, su, 0) ==
>>> NCSCC_RC_SUCCESS) {
>>> +                        m_AVND_SU_FAILOVER_RESET(su);
>>> +                    }
>>> +                }
>>> +            }
>>> +        }
>>>       /* free the contents of avnd message */
>>>       avnd_msg_content_free(cb, &msg);
>>>   diff --git a/src/amf/amfnd/susm.cc b/src/amf/amfnd/susm.cc
>>> --- a/src/amf/amfnd/susm.cc
>>> +++ b/src/amf/amfnd/susm.cc
>>> @@ -1633,10 +1633,22 @@ uint32_t avnd_su_pres_st_chng_prc(AVND_C
>>>               m_AVND_SU_IS_ENABLED(su, is_en);
>>>               if (true == is_en) {
>>>                   TRACE("SU oper state is enabled");
>>> +                // do not send su_oper state if component failover is
>>> in progress
>>>                   m_AVND_SU_OPER_STATE_SET(su,
>>> SA_AMF_OPERATIONAL_ENABLED);
>>> -                rc = avnd_di_oper_send(cb, su, 0);
>>> -                if (NCSCC_RC_SUCCESS != rc)
>>> -                    goto done;
>>> +                if (componentfailover_in_progress(su) == true) {
>>> +                    si = reinterpret_cast<AVND_SU_SI_REC*>
>>> +                            (m_NCS_DBLIST_FIND_FIRST(&su->si_list));
>>> +                    if (si == nullptr ||
>>> all_csis_in_removed_state(su)) {
>>> +                        rc = avnd_di_oper_send(cb, su, 0);
>>> +                        if (rc != NCSCC_RC_SUCCESS)
>>> +                            goto done;
>>> +                        m_AVND_SU_FAILOVER_RESET(su);
>>> +                    }
>>> +                } else {
>>> +                    rc = avnd_di_oper_send(cb, su, 0);
>>> +                    if (NCSCC_RC_SUCCESS != rc)
>>> +                        goto done;
>>> +                }
>>>               }
>>>               else
>>>                   TRACE("SU oper state is disabled");
>>> @@ -3551,6 +3563,20 @@ bool sufailover_in_progress(const AVND_S
>>>   }
>>>     /**
>>> + * This function checks if the componentfailover is going on.
>>> + * @param su: ptr to the SU .
>>> + *
>>> + * @return true/false.
>>> + */
>>> +bool componentfailover_in_progress(const AVND_SU *su) {
>>> +    if ((su->sufailover == false) && (!m_AVND_SU_IS_RESTART(su)) &&
>>> +            (avnd_cb->oper_state != SA_AMF_OPERATIONAL_DISABLED) &&
>>> (!su->is_ncs) &&
>>> +            m_AVND_SU_IS_FAILOVER(su))
>>> +        return true;
>>> +    return false;
>>> +}
>>> +
>>> +/**
>>>    * This function checks if the sufailover and node switchover are
>>> going on.
>>>    * @param su: ptr to the SU .
>>>    *
>>>
>>
>
> ------------------------------------------------------------------------------
> Check out the vibrant tech community on one of the world's most
> engaging tech sites, SlashDot.org! http://sdm.link/slashdot
> _______________________________________________
> Opensaf-devel mailing list
> Opensaf-devel@lists.sourceforge.net
> https://lists.sourceforge.net/lists/listinfo/opensaf-devel
>

------------------------------------------------------------------------------
Check out the vibrant tech community on one of the world's most
engaging tech sites, SlashDot.org! http://sdm.link/slashdot
_______________________________________________
Opensaf-devel mailing list
Opensaf-devel@lists.sourceforge.net
https://lists.sourceforge.net/lists/listinfo/opensaf-devel

Reply via email to