The attached 203.tgz contains:
- 203.xml: configuration to reproduce the issue.
- new_si_csi_add.sh: script to add the new SI.
- Traces.

Steps to reproduce:
1) immcfg -f 203.xml
2) Unlock-in and unlock SU1 and SU2.
3) ./new_si_csi_add.sh
4) Lock SU1
5) Stop OpenSAF on the payload hosting SU2.



---

**[tickets:#203] avsv: SG went to unstable state when active SU is locked after adding new SI in NWay RM**

**Status:** assigned
**Created:** Wed May 15, 2013 04:32 AM UTC by Praveen
**Last Updated:** Fri Sep 06, 2013 01:14 PM UTC
**Owner:** Praveen

The issue is observed on SLES 64-bit VMs.
 

Configuration:
NWay RM with 2 SUs, 2 SIs and 2 CSIs. PBE is enabled and OpenSAF runs as the root user.

A new SI is added and then the active SU is locked. The following messages are seen in the syslog:
 

Oct 6 19:24:39 SLES-SLOT-1 osafamfnd[3703]: Assigning 'safSi=d_NWay_1Norm_3,safApp=N' ACTIVE to 'safSu=d_NWay_1Norm_1,safSg=SG_d_n,safApp=N'
Oct 6 19:24:39 SLES-SLOT-1 osafamfnd[3703]: Assigned 'safSi=d_NWay_1Norm_3,safApp=N' ACTIVE to 'safSu=d_NWay_1Norm_1,safSg=SG_d_n,safApp=N'
Oct 6 19:24:42 SLES-SLOT-1 osafamfnd[3703]: Assigning 'safSi=d_NWay_1Norm_1,safApp=N' QUIESCED to 'safSu=d_NWay_1Norm_1,safSg=SG_d_n,safApp=N'
Oct 6 19:24:42 SLES-SLOT-1 osafamfnd[3703]: Assigned 'safSi=d_NWay_1Norm_1,safApp=N' QUIESCED to 'safSu=d_NWay_1Norm_1,safSg=SG_d_n,safApp=N'
Oct 6 19:24:42 SLES-SLOT-1 osafamfnd[3703]: Assigning 'safSi=d_NWay_1Norm_3,safApp=N' QUIESCED to 'safSu=d_NWay_1Norm_1,safSg=SG_d_n,safApp=N'
Oct 6 19:24:42 SLES-SLOT-1 osafamfnd[3703]: Assigned 'safSi=d_NWay_1Norm_3,safApp=N' QUIESCED to 'safSu=d_NWay_1Norm_1,safSg=SG_d_n,safApp=N'
Oct 6 19:24:42 SLES-SLOT-1 osafamfnd[3703]: Removing 'safSi=d_NWay_1Norm_3,safApp=N' from 'safSu=d_NWay_1Norm_1,safSg=SG_d_n,safApp=N'
Oct 6 19:24:42 SLES-SLOT-1 osafamfnd[3703]: Removed 'safSi=d_NWay_1Norm_3,safApp=N' from 'safSu=d_NWay_1Norm_1,safSg=SG_d_n,safApp=N'
Oct 6 19:24:42 SLES-SLOT-1 osafamfnd[3703]: Removing 'all SIs' from 'safSu=d_NWay_1Norm_1,safSg=SG_d_n,safApp=N'
Oct 6 19:24:42 SLES-SLOT-1 osafamfd[3693]: SG state is not stable
Oct 6 19:24:43 SLES-SLOT-1 osafamfd[3693]: SG state is not stable
Oct 6 19:24:44 SLES-SLOT-1 osafamfd[3693]: SG state is not stable
 

Further operations failed because the SG was not stable. When PL-4, which was hosting the active SU, was brought down, amfd on the active controller crashed, leading to a reboot of the node. The following messages are seen in the syslog:
Oct 6 19:43:59 SLES-SLOT-1 osafamfd[3693]: Node 'PL-4' left the cluster
Oct 6 19:44:00 SLES-SLOT-1 osafamfd[3693]: avd_su.c:1585: avd_su_dec_curr_stdby_si: Assertion 'su->saAmfSUNumCurrStandbySIs > 0' failed.
Oct 6 19:44:00 SLES-SLOT-1 osafamfnd[3703]: AMF director unexpectedly crashed
Oct 6 19:44:00 SLES-SLOT-1 osafamfnd[3703]: Rebooting OpenSAF NodeId = 131343 EE Name = , Reason: local AVD down(Adest) or both AVD down(Vdest) received
Oct 6 19:44:00 SLES-SLOT-1 osafimmnd[3628]: Implementer locally disconnected. Marking it as doomed 3 <17, 2010f> (safAmfService)
Oct 6 19:44:00 SLES-SLOT-1 osafimmnd[3628]: Implementer disconnected 3 <17, 2010f> (safAmfService)
Oct 6 19:44:00 SLES-SLOT-1 opensaf_reboot: Rebooting local node
 

Backtrace of the core file:
Core was generated by `/usr/lib64/opensaf/osafamfd --tracemask=0xffffffff'.
Program terminated with signal 6, Aborted.
#0 0x00007f457decd645 in raise () from /lib64/libc.so.6
(gdb) bt
#0 0x00007f457decd645 in raise () from /lib64/libc.so.6
#1 0x00007f457decec33 in abort () from /lib64/libc.so.6
#2 0x00007f457f4df095 in osafassert_fail (file=0x4ac5e5 "avd_su.c", line=1585, func=0x4ad590 "avd_su_dec_curr_stdby_si", assertion=0x4ad5b0 "su->saAmfSUNumCurrStandbySIs > 0") at sysf_def.c:399
#3 0x000000000048964f in avd_su_dec_curr_stdby_si (su=0x727f70) at avd_su.c:1585
#4 0x000000000048b244 in avd_susi_update_assignment_counters (susi=0x767bf0, action=AVSV_SUSI_ACT_DEL, current_ha_state=0, new_ha_state=0) at avd_siass.c:730
#5 0x000000000048aff7 in avd_susi_del_send (susi=0x767bf0) at avd_siass.c:663
#6 0x0000000000474bbc in avd_sg_nway_node_fail_stable (cb=0x6bdbe0, su=0x732130, susi=0x0) at avd_sgNWayfsm.c:3191
#7 0x0000000000476257 in avd_sg_nway_node_fail_sg_realign (cb=0x6bdbe0, su=0x732130) at avd_sgNWayfsm.c:3645
#8 0x000000000046c82c in avd_sg_nway_node_fail_func (cb=0x6bdbe0, su=0x732130) at avd_sgNWayfsm.c:657
#9 0x000000000047ad65 in avd_node_susi_fail_func (cb=0x6bdbe0, avnd=0x6fef50) at avd_sgproc.c:2126
#10 0x0000000000434f72 in avd_node_failover (node=0x6fef50) at avd_ndproc.c:776
#11 0x0000000000431a80 in avd_mds_avnd_down_evh (cb=0x6bdbe0, evt=0x7f4578000ae0) at avd_ndfsm.c:407
#12 0x000000000043b57e in avd_process_event (cb_now=0x6bdbe0, evt=0x7f4578000ae0) at avd_proc.c:589
#13 0x000000000043b305 in avd_main_proc () at avd_proc.c:505
#14 0x0000000000409210 in main (argc=2, argv=0x7fff87968c08) at amfd_main.c:47
 

Traces from the active controller are attached.
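
For readers unfamiliar with the failing check: the assertion 'su->saAmfSUNumCurrStandbySIs > 0' in avd_su_dec_curr_stdby_si means the standby-SI counter of the SU was already zero when the node-failover path tried to decrement it. The C fragment below is only a toy model of such a guarded decrement, not the OpenSAF source. The names struct su_model, num_curr_standby_sis and su_dec_curr_standby_si are hypothetical, and the function returns an error code instead of aborting so that the underflow case can be observed.

```c
/* Toy model only: none of these names exist in OpenSAF. */
struct su_model {
    /* Stands in for saAmfSUNumCurrStandbySIs on the SU object. */
    unsigned int num_curr_standby_sis;
};

/*
 * Decrement the standby-SI counter of an SU.
 * Returns 0 on success, or -1 when the counter is already zero.
 * In the ticket, the equivalent condition is enforced with an
 * assertion, so amfd aborts instead of returning an error.
 */
static int su_dec_curr_standby_si(struct su_model *su)
{
    if (su->num_curr_standby_sis == 0)
        return -1; /* counter and assignment bookkeeping are out of sync */
    su->num_curr_standby_sis--;
    return 0;
}
```

With a counter that starts at 1, the first call succeeds and the second call reports the underflow. In this model that corresponds to deleting one more standby assignment than the counter knows about, which is the situation the assertion reports. The ticket does not establish how the counter and the assignment list got out of sync.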

Changed 7 months ago by nagendra 
Can you please test it on 4.2.2? I suspect that #2832 may solve the issue, as there were many CSI add/delete operations before this issue occurred.
 
Changed 7 months ago by allasirisha 
This is still seen with the #2832 patch.
 
Changed 7 months ago by praveenmalviya 
A similar problem is also reported in http://devel.opensaf.org/ticket/2861.

In both tickets, when the SU is locked, avnd does not issue the CSI remove callback to all components (one component is always left out even though it has assignments). Because of this, avnd never sends a response to avd, which leads to "SG state is not stable" in subsequent operations.
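
The stall described above can be illustrated with a small model in C. This is a sketch under stated assumptions, not avnd code: struct comp_model and the three helper functions are hypothetical, and the skip_index parameter exists only to simulate the one component that never receives the callback. The model assumes the reply to avd is sent only after every component holding an assignment has answered its CSI remove callback.

```c
#include <stdbool.h>
#include <stddef.h>

/* Hypothetical per-component record; not an OpenSAF type. */
struct comp_model {
    bool has_assignment;    /* component currently holds a CSI assignment */
    bool remove_cbk_issued; /* CSI remove callback was issued to it */
    bool responded;         /* component answered the callback */
};

/* Issue the remove callback to every assigned component except the
 * one at skip_index (pass -1 to skip none). */
static void issue_remove_callbacks(struct comp_model *comps, size_t n,
                                   int skip_index)
{
    for (size_t i = 0; i < n; i++) {
        if (!comps[i].has_assignment || (int)i == skip_index)
            continue;
        comps[i].remove_cbk_issued = true;
    }
}

/* A component can only respond to a callback it actually received. */
static void collect_responses(struct comp_model *comps, size_t n)
{
    for (size_t i = 0; i < n; i++) {
        if (comps[i].remove_cbk_issued)
            comps[i].responded = true;
    }
}

/* The reply to the director is possible only when every assigned
 * component has responded. */
static bool su_removal_complete(const struct comp_model *comps, size_t n)
{
    for (size_t i = 0; i < n; i++) {
        if (comps[i].has_assignment && !comps[i].responded)
            return false;
    }
    return true;
}
```

If two assigned components both receive the callback, su_removal_complete returns true after collect_responses. If one of them is skipped, it stays false indefinitely, which in this model corresponds to avd waiting for a response that never arrives and reporting the SG as not stable.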


