- Description has changed:

Diff:

~~~~

--- old
+++ new
@@ -1,7 +1,7 @@
 This issue occurs as component failover recovery in context of locking node.
 
 **Configuration and steps:**
-1- Set up 2N model, PL4 hosts SU4, PL5 hosts SU5, PL3 hosts SU5B. Si deps 
safSi=AmfDemoTwon2 depends safSi=AmfDemoTwon1 depends safSi=AmfDemoTwon
+1- Set up 2N model, PL4 hosts SU4, PL5 hosts SU5, PL3 hosts SU5B. 
 2- Bring up 2N app, SU4 has active assignment, SU5 has standby assignment
 3- Lock PL4
 4- Set a few seconds delay csi remove callback in component of SU4

~~~~




---

**[tickets:#2233] AMF: SG is unstable after component failover recovery**

**Status:** review
**Milestone:** 5.0.2
**Labels:** unstable sg 
**Created:** Tue Dec 20, 2016 03:00 AM UTC by Minh Hon Chau
**Last Updated:** Mon Feb 20, 2017 12:59 AM UTC
**Owner:** Minh Hon Chau


This issue occurs during component failover recovery while a node is being locked.

**Configuration and steps:**
1- Set up a 2N model: PL4 hosts SU4, PL5 hosts SU5, PL3 hosts SU5B.
2- Bring up the 2N app; SU4 has the active assignment, SU5 has the standby 
assignment.
3- Lock PL4.
4- Delay the CSI remove callback in the component of SU4 by a few seconds.
5- Delay the quiesced CSI set callback in the component of SU5 by a few seconds.
6- When SU5 finishes its active assignment, SU4 receives the assignment removal 
from amfd. In the meantime, a component failover report is triggered by the 
component of SU5.
7- SU5 now receives the quiesced CSI set callback from amfd.
8- Release both callbacks delayed in steps 4 and 5.

**Observation:**
The SG is unstable: the failed SU (SU5) cannot be repaired, and no entity can be 
locked or unlocked.

When amfd processes the quiesced assignment response in the REALIGN state, it 
takes no action:
> Dec 20 13:23:22.272043 osafamfd [487:sg_2n_fsm.cc:1448] >> 
> susi_success_sg_realign: 'safSu=SU5,safSg=AmfDemoTwon,safApp=AmfDemoTwon' 
> act=5, state=3
> Dec 20 13:23:22.272048 osafamfd [487:sg.cc:1756] TR 
> safSu=SU5,safSg=AmfDemoTwon,safApp=AmfDemoTwon found in 
> safSg=AmfDemoTwon,safApp=AmfDemoTwon
> Dec 20 13:23:22.272054 osafamfd [487:sg_2n_fsm.cc:0477] >> 
> avd_sg_2n_act_susi: 'safSg=AmfDemoTwon,safApp=AmfDemoTwon'
> Dec 20 13:23:22.272059 osafamfd [487:sg_2n_fsm.cc:0486] TR 
> si'safSi=AmfDemoTwon,safApp=AmfDemoTwon', 
> su'safSu=SU5,safSg=AmfDemoTwon,safApp=AmfDemoTwon', 
> si'safSi=AmfDemoTwon,safApp=AmfDemoTwon'
> Dec 20 13:23:22.272065 osafamfd [487:sg_2n_fsm.cc:0486] TR 
> si'safSi=AmfDemoTwonDep1,safApp=AmfDemoTwon', 
> su'safSu=SU5,safSg=AmfDemoTwon,safApp=AmfDemoTwon', 
> si'safSi=AmfDemoTwonDep1,safApp=AmfDemoTwon'
> Dec 20 13:23:22.272071 osafamfd [487:sg_2n_fsm.cc:0486] TR 
> si'safSi=AmfDemoTwonDep2,safApp=AmfDemoTwon', 
> su'safSu=SU5,safSg=AmfDemoTwon,safApp=AmfDemoTwon', 
> si'safSi=AmfDemoTwonDep2,safApp=AmfDemoTwon'
> Dec 20 13:23:22.272076 osafamfd [487:sg_2n_fsm.cc:0501] TR 
> su_1'safSu=SU5,safSg=AmfDemoTwon,safApp=AmfDemoTwon', su_2'(null)'
> Dec 20 13:23:22.272082 osafamfd [487:sg_2n_fsm.cc:0555] << 
> avd_sg_2n_act_susi: act: 'safSu=SU5,safSg=AmfDemoTwon,safApp=AmfDemoTwon', 
> stdby: '(null)'
> Dec 20 13:23:22.272087 osafamfd [487:sg_2n_fsm.cc:1862] << 
> susi_success_sg_realign: rc:1

In this SG FSM function, SU5 is expected to be OUT_OF_SERVICE, but it is 
currently IN_SERVICE.
SU5 is first reported as OUT_OF_SERVICE by the su_oper_state[DISABLED] message 
sent as part of the component failover report:
> Dec 20 13:22:56.241508 osafamfd [487:sgproc.cc:0656] >> avd_su_oper_state_evh: 
> id:56, node:2050f, 'safSu=SU5,safSg=AmfDemoTwon,safApp=AmfDemoTwon' state:2

The failed component is then instantiated again, which generates another 
message, su_oper_state[ENABLED], and sets SU5 back to IN_SERVICE:
> Dec 20 13:22:58.481319 osafamfd [487:sgproc.cc:0656] >> avd_su_oper_state_evh: 
> id:62, node:2050f, 'safSu=SU5,safSg=AmfDemoTwon,safApp=AmfDemoTwon' state:1
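
For anyone reproducing this, the two su_oper_state events quoted above can be pulled out of a trace mechanically. The following is only a sketch: the function name `oper_state_events` and the regular expression are made up for illustration, the state mapping (2 = DISABLED, 1 = ENABLED) is taken from the description above, and the entry layout is assumed to match the lines quoted in this ticket.

```python
import re

# Mapping taken from the ticket text: state:2 is DISABLED, state:1 is ENABLED.
OPER_STATE = {1: "ENABLED", 2: "DISABLED"}

# Matches one avd_su_oper_state_evh entry once mail wrapping has been undone.
EVENT_RE = re.compile(
    r"(?P<time>\d{2}:\d{2}:\d{2}\.\d+) osafamfd \[[^\]]+\] >> "
    r"avd_su_oper_state_evh: id:(?P<id>\d+), node:(?P<node>[0-9a-f]+), "
    r"'(?P<su>[^']+)' state:(?P<state>\d+)"
)


def oper_state_events(trace_text):
    """Return (time, su, state_name) for each su_oper_state event, in order."""
    # Drop a leading "> " quote marker, but leave the ">>" trace marker alone.
    lines = [re.sub(r"^>(?: |$)", "", line) for line in trace_text.splitlines()]
    # Wrapped entries span several lines, so collapse all whitespace runs.
    flat = " ".join(" ".join(lines).split())
    events = []
    for match in EVENT_RE.finditer(flat):
        state = OPER_STATE.get(int(match.group("state")), "UNKNOWN")
        events.append((match.group("time"), match.group("su"), state))
    return events
```

Run against the two entries in this ticket, it should list a DISABLED event followed about two seconds later by an ENABLED event for SU5.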

SU5 should be OUT_OF_SERVICE while amfd orchestrates component failover 
recovery, which starts with the QUIESCED assignment of SU5. If re-instantiation 
of the failed component completes first, as in this test, the SG FSM runs into 
an unexpected sequence.
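
The ordering problem can be shown with a toy model. This is a sketch only, not amfd code: `ToySu`, `quiesced_response_proceeds` and `replay` are hypothetical names, and the only behaviour modelled is the assumption described above, that the realign path acts on the quiesced response only if the failed SU is still OUT_OF_SERVICE.

```python
from dataclasses import dataclass


@dataclass
class ToySu:
    """Toy stand-in for an SU; not the amfd data structure."""
    name: str
    readiness: str = "IN_SERVICE"

    def on_oper_state(self, oper_state):
        # DISABLED takes the SU out of service, ENABLED puts it back.
        if oper_state == "DISABLED":
            self.readiness = "OUT_OF_SERVICE"
        else:
            self.readiness = "IN_SERVICE"


def quiesced_response_proceeds(su):
    """Assumed rule: recovery continues only for an OUT_OF_SERVICE SU."""
    return su.readiness == "OUT_OF_SERVICE"


def replay(events):
    """Apply events in order; report what the quiesced response would do."""
    su = ToySu("SU5")
    for event in events:
        if event == "QUIESCED_RESPONSE":
            return quiesced_response_proceeds(su)
        su.on_oper_state(event)
    return None


# Expected order: the quiesced response arrives before re-instantiation.
expected_order = replay(["DISABLED", "QUIESCED_RESPONSE", "ENABLED"])
# Order seen in this test: the component is re-instantiated first.
observed_order = replay(["DISABLED", "ENABLED", "QUIESCED_RESPONSE"])
```

In this model `expected_order` is `True` and `observed_order` is `False`, which corresponds to the "no action from amfd" seen in the trace.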


