- **status**: review --> fixed
- **assigned_to**: Minh Hon Chau -->  nobody 



---

**[tickets:#2233] AMF: SG is unstable after component failover recovery**

**Status:** fixed
**Milestone:** 5.0.2
**Labels:** unstable sg 
**Created:** Tue Dec 20, 2016 03:00 AM UTC by Minh Hon Chau
**Last Updated:** Mon Mar 06, 2017 06:51 AM UTC
**Owner:** nobody


This issue occurs when component failover recovery takes place while a node is 
being locked.

**Configuration and steps:**
1- Set up a 2N model: PL4 hosts SU4, PL5 hosts SU5, PL3 hosts SU5B.
2- Bring up the 2N app; SU4 has the active assignment and SU5 has the standby 
assignment.
3- Lock PL4.
4- Delay the csi remove callback in the component of SU4 by a few seconds.
5- Delay the quiesced csi set callback in the component of SU5 by a few seconds.
6- When SU5 finishes its active assignment, SU4 receives the assignment removal 
from amfd. In the meantime, a component of SU5 triggers a component failover 
report.
7- SU5 now receives the quiesced csi set callback from amfd.
8- Release both callbacks delayed in steps 4 and 5.

**Observation:**
The SG is unstable: the failed SU (SU5) cannot be repaired, and no entity can be 
locked or unlocked.

When amfd processes the quiesced assignment response in the REALIGN state, it 
takes no action:
> Dec 20 13:23:22.272043 osafamfd [487:sg_2n_fsm.cc:1448] >> 
> susi_success_sg_realign: 'safSu=SU5,safSg=AmfDemoTwon,safApp=AmfDemoTwon' 
> act=5, state=3
> Dec 20 13:23:22.272048 osafamfd [487:sg.cc:1756] TR 
> safSu=SU5,safSg=AmfDemoTwon,safApp=AmfDemoTwon found in 
> safSg=AmfDemoTwon,safApp=AmfDemoTwon
> Dec 20 13:23:22.272054 osafamfd [487:sg_2n_fsm.cc:0477] >> 
> avd_sg_2n_act_susi: 'safSg=AmfDemoTwon,safApp=AmfDemoTwon'
> Dec 20 13:23:22.272059 osafamfd [487:sg_2n_fsm.cc:0486] TR 
> si'safSi=AmfDemoTwon,safApp=AmfDemoTwon', 
> su'safSu=SU5,safSg=AmfDemoTwon,safApp=AmfDemoTwon', 
> si'safSi=AmfDemoTwon,safApp=AmfDemoTwon'
> Dec 20 13:23:22.272065 osafamfd [487:sg_2n_fsm.cc:0486] TR 
> si'safSi=AmfDemoTwonDep1,safApp=AmfDemoTwon', 
> su'safSu=SU5,safSg=AmfDemoTwon,safApp=AmfDemoTwon', 
> si'safSi=AmfDemoTwonDep1,safApp=AmfDemoTwon'
> Dec 20 13:23:22.272071 osafamfd [487:sg_2n_fsm.cc:0486] TR 
> si'safSi=AmfDemoTwonDep2,safApp=AmfDemoTwon', 
> su'safSu=SU5,safSg=AmfDemoTwon,safApp=AmfDemoTwon', 
> si'safSi=AmfDemoTwonDep2,safApp=AmfDemoTwon'
> Dec 20 13:23:22.272076 osafamfd [487:sg_2n_fsm.cc:0501] TR 
> su_1'safSu=SU5,safSg=AmfDemoTwon,safApp=AmfDemoTwon', su_2'(null)'
> Dec 20 13:23:22.272082 osafamfd [487:sg_2n_fsm.cc:0555] << 
> avd_sg_2n_act_susi: act: 'safSu=SU5,safSg=AmfDemoTwon,safApp=AmfDemoTwon', 
> stdby: '(null)'
> Dec 20 13:23:22.272087 osafamfd [487:sg_2n_fsm.cc:1862] << 
> susi_success_sg_realign: rc:1

In this SG FSM function, SU5 is expected to be OUT_OF_SERVICE, but it is 
currently IN_SERVICE.
SU5 is first reported as OUT_OF_SERVICE by the su_oper_state[DISABLED] message 
sent as part of the component failover report:
Dec 20 13:22:56.241508 osafamfd [487:sgproc.cc:0656] >> avd_su_oper_state_evh: 
id:56, node:2050f, 'safSu=SU5,safSg=AmfDemoTwon,safApp=AmfDemoTwon' state:2

The failed component is then instantiated again and generates another message, 
su_oper_state[ENABLED], which sets SU5 back to IN_SERVICE:
Dec 20 13:22:58.481319 osafamfd [487:sgproc.cc:0656] >> avd_su_oper_state_evh: 
id:62, node:2050f, 'safSu=SU5,safSg=AmfDemoTwon,safApp=AmfDemoTwon' state:1
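
To check the order of these events in a longer amfd trace, a small script along 
these lines can pull out the su_oper_state transitions. This is only a sketch: 
the regular expression is derived from the two trace lines quoted above, the 
mapping of state:1 to ENABLED and state:2 to DISABLED is taken from how this 
ticket describes those lines, the helper name `su_oper_state_events` is made up 
for illustration, and it assumes each trace entry sits on a single line (not 
wrapped as in this mail).

```python
import re

# Pattern derived from the two avd_su_oper_state_evh lines quoted in this ticket.
# Assumption: other amfd versions may format this trace differently.
OPER_STATE_RE = re.compile(
    r"^(?P<ts>\w{3} +\d+ [\d:.]+) osafamfd .*>> avd_su_oper_state_evh:\s+"
    r"id:(?P<id>\d+), node:(?P<node>\w+), '(?P<su>[^']+)' state:(?P<state>\d+)"
)

# Mapping inferred from the ticket text (state:2 with DISABLED, state:1 with ENABLED).
OPER_STATE_NAMES = {1: "ENABLED", 2: "DISABLED"}


def su_oper_state_events(lines):
    """Return (timestamp, su_dn, state_name) for each su_oper_state trace line."""
    events = []
    for line in lines:
        match = OPER_STATE_RE.search(line)
        if match:
            state = int(match.group("state"))
            events.append(
                (match.group("ts"), match.group("su"),
                 OPER_STATE_NAMES.get(state, str(state)))
            )
    return events


# The two trace entries from this ticket, joined back onto single lines.
trace = [
    "Dec 20 13:22:56.241508 osafamfd [487:sgproc.cc:0656] >> avd_su_oper_state_evh: "
    "id:56, node:2050f, 'safSu=SU5,safSg=AmfDemoTwon,safApp=AmfDemoTwon' state:2",
    "Dec 20 13:22:58.481319 osafamfd [487:sgproc.cc:0656] >> avd_su_oper_state_evh: "
    "id:62, node:2050f, 'safSu=SU5,safSg=AmfDemoTwon,safApp=AmfDemoTwon' state:1",
]

for ts, su, state in su_oper_state_events(trace):
    print(ts, su.split(",")[0], state)
```

Comparing these timestamps with the susi_success_sg_realign entry at 13:23:22 
shows that SU5 was re-enabled well before the quiesced response was processed.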

SU5 should be OUT_OF_SERVICE while amfd orchestrates component failover 
recovery, which starts with the QUIESCED assignment of SU5. If the failed 
component is re-instantiated before the quiesced assignment response arrives, 
as in this test, the SG FSM runs into an unexpected sequence.
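
The ordering problem can be illustrated with a toy model. This is not amfd 
code: the class `ToySu`, the function `toy_realign_on_quiesced_response`, the 
event names, and the "continue failover" action are all hypothetical 
placeholders. The only things taken from this ticket are that the realign 
handler expects SU5 to be OUT_OF_SERVICE and does nothing when it finds SU5 
IN_SERVICE.

```python
from enum import Enum


class Readiness(Enum):
    IN_SERVICE = "IN_SERVICE"
    OUT_OF_SERVICE = "OUT_OF_SERVICE"


class ToySu:
    """Hypothetical stand-in for an SU; not the amfd data structure."""

    def __init__(self, name):
        self.name = name
        self.readiness = Readiness.IN_SERVICE

    def on_oper_state(self, enabled):
        # Simplification: readiness follows the reported operational state.
        self.readiness = (
            Readiness.IN_SERVICE if enabled else Readiness.OUT_OF_SERVICE
        )


def toy_realign_on_quiesced_response(su):
    """Toy version of the REALIGN handling of a QUIESCED response.

    Assumption: the handler only proceeds when the SU is OUT_OF_SERVICE.
    "continue failover" is a placeholder, not the real amfd action.
    """
    if su.readiness is Readiness.OUT_OF_SERVICE:
        return "continue failover"
    return "no action"


def run(events):
    """Feed events to a fresh SU5 and return the handler's decision."""
    su = ToySu("SU5")
    for event in events:
        if event == "oper_disabled":
            su.on_oper_state(False)
        elif event == "oper_enabled":
            su.on_oper_state(True)
        elif event == "quiesced_response":
            return toy_realign_on_quiesced_response(su)
    return None


# Expected order: the quiesced response arrives while SU5 is still disabled.
expected = run(["oper_disabled", "quiesced_response", "oper_enabled"])
# Order seen in this ticket: the component is re-instantiated first.
observed = run(["oper_disabled", "oper_enabled", "quiesced_response"])
print(expected, "|", observed)
```

In the second ordering the toy handler falls through to "no action", which 
mirrors the trace above where susi_success_sg_realign returns without 
orchestrating anything further.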


