Hi Praveen,

Those limitations were picked up and intended for the 5.2 documentation, but 
that has not been done officially yet. The limitation related to the loss of 
RTA is going to be fixed for 5.2, since it has been seen a few times in our 
cluster; that is the reason I set it as *defect*.

The fix for this problem should be based on the current implementation of #1725.

I am wondering whether the same problem can happen in a normal cluster if the 
active controller is powered off just before the active amfd is about to update 
the RTA. I think the loss of an RTA update seen when the standby amfd takes 
over the active role after an SC failover should basically be the same as the 
one seen when the first amfd takes the active role after coming back from 
headless.

I hacked the amfd code a bit to wait a few seconds just before updating the 
RTA, so that I have enough time to power off the active SC for the SC failover 
test (similar to stopping/restarting both SCs for the headless test). I can see 
the same problem in both cases: SC failover in a normal cluster, and SCs coming 
back from headless. The attached log is for the SC failover case.
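
As a rough illustration of the kind of delay described above (this is not the actual amfd change), a test-only helper could sleep for a configurable time and then run the update. Everything here is hypothetical: the helper names, the environment variable name, and the callback standing in for the RTA update are assumptions made for this sketch only.

```cpp
#include <cassert>
#include <chrono>
#include <cstdlib>
#include <functional>
#include <thread>

// Hypothetical test-only helper, not part of amfd.
// Reads a delay in milliseconds from the environment variable `env_name`
// (the variable name is chosen by the caller and is an assumption here).
// Returns 0 ms if the variable is unset, not a number, or negative.
inline std::chrono::milliseconds injected_delay(const char* env_name) {
  const char* value = std::getenv(env_name);
  if (value == nullptr) return std::chrono::milliseconds(0);
  char* end = nullptr;
  const long ms = std::strtol(value, &end, 10);
  if (end == value || *end != '\0' || ms < 0) {
    return std::chrono::milliseconds(0);
  }
  return std::chrono::milliseconds(ms);
}

// Sleeps for the injected delay, then runs `update`.
// `update` stands in for whatever call performs the RTA update; the sleep
// widens the window in which the active SC can be powered off by hand.
inline void delayed_update(const char* env_name,
                           const std::function<void()>& update) {
  assert(update && "update callback must be set");
  std::this_thread::sleep_for(injected_delay(env_name));
  update();
}
```

With the variable unset the helper adds no delay, so the same build could be used for normal runs; setting it to a few thousand milliseconds would give the "few seconds" mentioned above.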

The way I am testing is not entirely convincing, but I think it can happen in 
theory. Viewed another way, the SC absence period is an extended version of the 
window in which the cluster has no active controller during an SC failover, so 
some issues that are rarely seen in a normal cluster could be seen more easily 
in headless (#2233 could be another example).

I am trying to use su_fault() to remove the assignment of the SU that is faulty 
due to the loss of RTA.
Please let me know if I am not on the right track.

Thanks,
Minh


Attachments:

- [2210_sc_failover.tgz](https://sourceforge.net/p/opensaf/tickets/_discuss/thread/a0c8f091/fb34/attachment/2210_sc_failover.tgz) (902.6 kB; application/x-compressed)


---

**[tickets:#2210] AMFD: Loss of RT attribute update before headless**

**Status:** accepted
**Milestone:** 5.2.FC
**Created:** Mon Nov 28, 2016 10:18 PM UTC by Minh Hon Chau
**Last Updated:** Tue Jan 24, 2017 05:45 AM UTC
**Owner:** Minh Hon Chau


A loss of the IMM RT saAmfSIAdminState update in AMFD has been seen just before 
the cluster goes headless. It results in a coredump after headless.

One scenario is:
- Issue amf-admin shutdown on the SI and delay the CSI quiescing callback
- Stop the SCs and release the CSI quiescing callback
- Restart the SCs

Observation: saAmfSIAdminState is read as UNLOCKED while the related SUSI was 
QUIESCED, and a coredump is produced as below

~~~
Thread 1 (Thread 0x7fec174a0780 (LWP 493)):
#0  0x00000000004fbfd5 in SG_2N::node_fail_si_oper (this=0x24109d0, su=0x2413440) at sg_2n_fsm.cc:3102
        s_susi = 0x8f50000000b
        susi_temp = 0x5fa169
        o_su = 0x2417f98
        __FUNCTION__ = "node_fail_si_oper"
        cb = 0x919240 <_control_block>
#1  0x00000000004fe69c in SG_2N::node_fail (this=0x24109d0, cb=0x919240 <_control_block>, su=0x2413440) at sg_2n_fsm.cc:3469
        a_susi = 0x1
        s_susi = 0x7fffedecd2d0
        o_su = 0x5a50bd <AVD_SU::any_susi_fsm_in(unsigned int)+497>
        flag = 2
        __FUNCTION__ = "node_fail"
        su_ha_state = 0
#2  0x0000000000513010 in AVD_SG::failover_absent_assignment (this=0x24109d0) at sg.cc:2273
        su = @0x2411330: 0x2413440
        __for_range = std::vector of length 2, capacity 2 = {0x2413440, 0x24111e0}
        __for_begin = 
        __for_end = 
        __FUNCTION__ = "failover_absent_assignment"
#3  0x000000000043be65 in avd_cluster_tmr_init_evh (cb=0x919240 <_control_block>, evt=0x7fec04000df0) at cluster.cc:103
        i_sg = 0x24109d0
        it = {first = "safSg=1,safApp=osaftest", second = }
        __FUNCTION__ = "avd_cluster_tmr_init_evh"
        su = 0x0
        node = 0x240f9b0
~~~


