[tickets] [opensaf:tickets] #203 avsv: SG went to unstable state when active SU is locked after adding new SI in NWay RM

2013-12-09 Thread Praveen
This issue got fixed with #233 in the following change sets:

changeset: 4690:5bc315546fe4
user: praveen.malv...@oracle.com
date: Mon Dec 09 14:09:31 2013 +0530
summary: amfnd : issue remove cbk to all comps when CSIs are unevenly distributed [#233]

changeset: 4691:38f2d79c9698
branch: opensaf-4.3.x
parent: 4688:1b9b59cf671f
user: praveen.malv...@oracle.com
date: Mon Dec 09 14:10:24 2013 +0530
summary: amfnd : issue remove cbk to all comps when CSIs are unevenly distributed [#233]

changeset: 4692:41ad09d56d11
branch: opensaf-4.2.x
tag: tip
parent: 4687:251f0595bb4a
user: praveen.malv...@oracle.com
date: Mon Dec 09 14:10:59 2013 +0530
summary: amfnd : issue remove cbk to all comps when CSIs are unevenly distributed [#233]

[staging:5bc315]
[staging:38f2d7]
[staging:41ad09]
 





---

** [tickets:#203] avsv: SG went to unstable state when active SU is locked 
after adding new SI in NWay RM**

**Status:** fixed
**Created:** Wed May 15, 2013 04:32 AM UTC by Praveen
**Last Updated:** Wed Dec 04, 2013 10:14 AM UTC
**Owner:** Praveen

The issue is observed on SLES 64-bit VMs.

Configuration:
 NWay RM with 2 SUs, 2 SIs and 2 CSIs. PBE is enabled and OpenSAF is run as the root user.

A new SI is added and then the active SU is locked. The following messages are seen in the syslog:
 

Oct 6 19:24:39 SLES-SLOT-1 osafamfnd[3703]: Assigning 'safSi=d_NWay_1Norm_3,safApp=N' ACTIVE to 'safSu=d_NWay_1Norm_1,safSg=SG_d_n,safApp=N'
 Oct 6 19:24:39 SLES-SLOT-1 osafamfnd[3703]: Assigned 'safSi=d_NWay_1Norm_3,safApp=N' ACTIVE to 'safSu=d_NWay_1Norm_1,safSg=SG_d_n,safApp=N'
 Oct 6 19:24:42 SLES-SLOT-1 osafamfnd[3703]: Assigning 'safSi=d_NWay_1Norm_1,safApp=N' QUIESCED to 'safSu=d_NWay_1Norm_1,safSg=SG_d_n,safApp=N'
 Oct 6 19:24:42 SLES-SLOT-1 osafamfnd[3703]: Assigned 'safSi=d_NWay_1Norm_1,safApp=N' QUIESCED to 'safSu=d_NWay_1Norm_1,safSg=SG_d_n,safApp=N'
 Oct 6 19:24:42 SLES-SLOT-1 osafamfnd[3703]: Assigning 'safSi=d_NWay_1Norm_3,safApp=N' QUIESCED to 'safSu=d_NWay_1Norm_1,safSg=SG_d_n,safApp=N'
 Oct 6 19:24:42 SLES-SLOT-1 osafamfnd[3703]: Assigned 'safSi=d_NWay_1Norm_3,safApp=N' QUIESCED to 'safSu=d_NWay_1Norm_1,safSg=SG_d_n,safApp=N'
 Oct 6 19:24:42 SLES-SLOT-1 osafamfnd[3703]: Removing 'safSi=d_NWay_1Norm_3,safApp=N' from 'safSu=d_NWay_1Norm_1,safSg=SG_d_n,safApp=N'
 Oct 6 19:24:42 SLES-SLOT-1 osafamfnd[3703]: Removed 'safSi=d_NWay_1Norm_3,safApp=N' from 'safSu=d_NWay_1Norm_1,safSg=SG_d_n,safApp=N'
 Oct 6 19:24:42 SLES-SLOT-1 osafamfnd[3703]: Removing 'all SIs' from 'safSu=d_NWay_1Norm_1,safSg=SG_d_n,safApp=N'
 Oct 6 19:24:42 SLES-SLOT-1 osafamfd[3693]: SG state is not stable
 Oct 6 19:24:43 SLES-SLOT-1 osafamfd[3693]: SG state is not stable
 Oct 6 19:24:44 SLES-SLOT-1 osafamfd[3693]: SG state is not stable
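
The repeated `SG state is not stable` lines are consistent with a guard that refuses new operations while the SG's state machine is still mid-transition. A minimal sketch of such a guard, assuming a simplified `struct sg` and state enum (both hypothetical, not the actual amfd definitions):

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>
#include <stdio.h>

/* Hypothetical, simplified SG FSM states; the names are illustrative only. */
enum sg_fsm_state { SG_STABLE, SG_REALIGN, SG_SU_OPER };

struct sg {
    const char *name;
    enum sg_fsm_state fsm_state;
};

/* Returns true if an admin operation may proceed.
 * While the SG is in any state other than SG_STABLE, the request is
 * refused and a message is logged. */
static bool sg_admin_op_allowed(const struct sg *sg)
{
    assert(sg != NULL);
    if (sg->fsm_state != SG_STABLE) {
        fprintf(stderr, "SG state is not stable\n");
        return false;
    }
    return true;
}
```

In this model, an SG that never returns to `SG_STABLE` after the lock keeps rejecting every later request, which matches the symptom reported here. It does not say why the real state machine got stuck.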
 

Further operations failed because the SG was not stable. When PL-4, which was hosting the active SU, was brought down, amfd on the active controller crashed, leading to a reboot of the node. The following messages are seen in the syslog:
 Oct 6 19:43:59 SLES-SLOT-1 osafamfd[3693]: Node 'PL-4' left the cluster
 Oct 6 19:44:00 SLES-SLOT-1 osafamfd[3693]: avd_su.c:1585: avd_su_dec_curr_stdby_si: Assertion 'su->saAmfSUNumCurrStandbySIs > 0' failed.
 Oct 6 19:44:00 SLES-SLOT-1 osafamfnd[3703]: AMF director unexpectedly crashed
 Oct 6 19:44:00 SLES-SLOT-1 osafamfnd[3703]: Rebooting OpenSAF NodeId = 131343 EE Name = , Reason: local AVD down(Adest) or both AVD down(Vdest) received
 Oct 6 19:44:00 SLES-SLOT-1 osafimmnd[3628]: Implementer locally disconnected. Marking it as doomed 3 <17, 2010f> (safAmfService)
 Oct 6 19:44:00 SLES-SLOT-1 osafimmnd[3628]: Implementer disconnected 3 <17, 2010f> (safAmfService)
 Oct 6 19:44:00 SLES-SLOT-1 opensaf_reboot: Rebooting local node
 

Backtrace from the core file:
 Core was generated by `/usr/lib64/opensaf/osafamfd --tracemask=0x'.
 Program terminated with signal 6, Aborted.
 #0 0x7f457decd645 in raise () from /lib64/libc.so.6
 (gdb) bt
 #0 0x7f457decd645 in raise () from /lib64/libc.so.6
 #1 0x7f457decec33 in abort () from /lib64/libc.so.6
 #2 0x7f457f4df095 in osafassert_fail (file=0x4ac5e5 "avd_su.c", line=1585, func=0x4ad590 "avd_su_dec_curr_stdby_si", assertion=0x4ad5b0 "su->saAmfSUNumCurrStandbySIs > 0") at sysf_def.c:399
 #3 0x0048964f in avd_su_dec_curr_stdby_si (su=0x727f70) at avd_su.c:1585
 #4 0x0048b244 in avd_susi_update_assignment_counters (susi=0x767bf0, action=AVSV_SUSI_ACT_DEL, current_ha_state=0, new_ha_state=0) at avd_siass.c:730
 #5 0x0048aff7 in avd_susi_del_send (susi=0x767bf0) at avd_siass.c:663
 #6 0x00474bbc in avd_sg_nway_node_fail_stable (cb=0x6bdbe0, su=0x732130, susi=0x0) at avd_sgNWayfsm.c:3191
 #7 0x00476257 in avd_sg_nway_node_fail_sg_realign (cb=0x6bdbe0, su=0x732130) at avd_sgNWayfsm.c:3645
 #8 0x0046c82c in avd_sg_nway_node_fail_func (cb=0x6bdbe0, su=0x732130) at avd_sgNWayfsm.c:657
 #9 0x0047ad65 in avd_node_susi_fail_func (cb=0x6bdbe0, avnd=0x6fef50) at avd_sgproc.c:2126
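
The assertion in frame #3 guards a per-SU count of standby assignments. A minimal sketch, assuming a simplified `su_model` struct (hypothetical, not the actual SU structure in amfd), of why a delete that has no matching earlier increment aborts the process:

```c
#include <assert.h>
#include <stdbool.h>

/* Hypothetical, simplified stand-in for the director's per-SU bookkeeping. */
struct su_model {
    unsigned int num_curr_standby_sis; /* models saAmfSUNumCurrStandbySIs */
};

/* Called when a standby assignment is recorded for the SU. */
static void su_inc_curr_stdby_si(struct su_model *su)
{
    su->num_curr_standby_sis++;
}

/* Same shape as the failing check: decrementing at zero aborts. */
static void su_dec_curr_stdby_si(struct su_model *su)
{
    assert(su->num_curr_standby_sis > 0);
    su->num_curr_standby_sis--;
}

/* Non-aborting helper, used only to show when the check above would fire. */
static bool su_dec_would_assert(const struct su_model *su)
{
    return su->num_curr_standby_sis == 0;
}
```

If the counters are already out of step with the actual assignments when the node fails, for example because an earlier removal never completed, the cleanup path decrements a counter that is already zero and the check fires. Whether that is the exact sequence in this ticket is an assumption; the backtrace only shows that the counter was zero at the time of the delete.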

[tickets] [opensaf:tickets] #203 avsv: SG went to unstable state when active SU is locked after adding new SI in NWay RM

2013-12-09 Thread Praveen
- **status**: review --> fixed



---

** [tickets:#203] avsv: SG went to unstable state when active SU is locked 
after adding new SI in NWay RM**

**Status:** fixed
**Created:** Wed May 15, 2013 04:32 AM UTC by Praveen
**Last Updated:** Wed Dec 04, 2013 10:14 AM UTC
**Owner:** Praveen

The issue is observed on SLES 64-bit VMs.

Configuration:
 NWay RM with 2 SUs, 2 SIs and 2 CSIs. PBE is enabled and OpenSAF is run as the root user.

A new SI is added and then the active SU is locked. The following messages are seen in the syslog:
 

Oct 6 19:24:39 SLES-SLOT-1 osafamfnd[3703]: Assigning 'safSi=d_NWay_1Norm_3,safApp=N' ACTIVE to 'safSu=d_NWay_1Norm_1,safSg=SG_d_n,safApp=N'
 Oct 6 19:24:39 SLES-SLOT-1 osafamfnd[3703]: Assigned 'safSi=d_NWay_1Norm_3,safApp=N' ACTIVE to 'safSu=d_NWay_1Norm_1,safSg=SG_d_n,safApp=N'
 Oct 6 19:24:42 SLES-SLOT-1 osafamfnd[3703]: Assigning 'safSi=d_NWay_1Norm_1,safApp=N' QUIESCED to 'safSu=d_NWay_1Norm_1,safSg=SG_d_n,safApp=N'
 Oct 6 19:24:42 SLES-SLOT-1 osafamfnd[3703]: Assigned 'safSi=d_NWay_1Norm_1,safApp=N' QUIESCED to 'safSu=d_NWay_1Norm_1,safSg=SG_d_n,safApp=N'
 Oct 6 19:24:42 SLES-SLOT-1 osafamfnd[3703]: Assigning 'safSi=d_NWay_1Norm_3,safApp=N' QUIESCED to 'safSu=d_NWay_1Norm_1,safSg=SG_d_n,safApp=N'
 Oct 6 19:24:42 SLES-SLOT-1 osafamfnd[3703]: Assigned 'safSi=d_NWay_1Norm_3,safApp=N' QUIESCED to 'safSu=d_NWay_1Norm_1,safSg=SG_d_n,safApp=N'
 Oct 6 19:24:42 SLES-SLOT-1 osafamfnd[3703]: Removing 'safSi=d_NWay_1Norm_3,safApp=N' from 'safSu=d_NWay_1Norm_1,safSg=SG_d_n,safApp=N'
 Oct 6 19:24:42 SLES-SLOT-1 osafamfnd[3703]: Removed 'safSi=d_NWay_1Norm_3,safApp=N' from 'safSu=d_NWay_1Norm_1,safSg=SG_d_n,safApp=N'
 Oct 6 19:24:42 SLES-SLOT-1 osafamfnd[3703]: Removing 'all SIs' from 'safSu=d_NWay_1Norm_1,safSg=SG_d_n,safApp=N'
 Oct 6 19:24:42 SLES-SLOT-1 osafamfd[3693]: SG state is not stable
 Oct 6 19:24:43 SLES-SLOT-1 osafamfd[3693]: SG state is not stable
 Oct 6 19:24:44 SLES-SLOT-1 osafamfd[3693]: SG state is not stable
 

Further operations failed because the SG was not stable. When PL-4, which was hosting the active SU, was brought down, amfd on the active controller crashed, leading to a reboot of the node. The following messages are seen in the syslog:
 Oct 6 19:43:59 SLES-SLOT-1 osafamfd[3693]: Node 'PL-4' left the cluster
 Oct 6 19:44:00 SLES-SLOT-1 osafamfd[3693]: avd_su.c:1585: avd_su_dec_curr_stdby_si: Assertion 'su->saAmfSUNumCurrStandbySIs > 0' failed.
 Oct 6 19:44:00 SLES-SLOT-1 osafamfnd[3703]: AMF director unexpectedly crashed
 Oct 6 19:44:00 SLES-SLOT-1 osafamfnd[3703]: Rebooting OpenSAF NodeId = 131343 EE Name = , Reason: local AVD down(Adest) or both AVD down(Vdest) received
 Oct 6 19:44:00 SLES-SLOT-1 osafimmnd[3628]: Implementer locally disconnected. Marking it as doomed 3 <17, 2010f> (safAmfService)
 Oct 6 19:44:00 SLES-SLOT-1 osafimmnd[3628]: Implementer disconnected 3 <17, 2010f> (safAmfService)
 Oct 6 19:44:00 SLES-SLOT-1 opensaf_reboot: Rebooting local node
 

Backtrace from the core file:
 Core was generated by `/usr/lib64/opensaf/osafamfd --tracemask=0x'.
 Program terminated with signal 6, Aborted.
 #0 0x7f457decd645 in raise () from /lib64/libc.so.6
 (gdb) bt
 #0 0x7f457decd645 in raise () from /lib64/libc.so.6
 #1 0x7f457decec33 in abort () from /lib64/libc.so.6
 #2 0x7f457f4df095 in osafassert_fail (file=0x4ac5e5 "avd_su.c", line=1585, func=0x4ad590 "avd_su_dec_curr_stdby_si", assertion=0x4ad5b0 "su->saAmfSUNumCurrStandbySIs > 0") at sysf_def.c:399
 #3 0x0048964f in avd_su_dec_curr_stdby_si (su=0x727f70) at avd_su.c:1585
 #4 0x0048b244 in avd_susi_update_assignment_counters (susi=0x767bf0, action=AVSV_SUSI_ACT_DEL, current_ha_state=0, new_ha_state=0) at avd_siass.c:730
 #5 0x0048aff7 in avd_susi_del_send (susi=0x767bf0) at avd_siass.c:663
 #6 0x00474bbc in avd_sg_nway_node_fail_stable (cb=0x6bdbe0, su=0x732130, susi=0x0) at avd_sgNWayfsm.c:3191
 #7 0x00476257 in avd_sg_nway_node_fail_sg_realign (cb=0x6bdbe0, su=0x732130) at avd_sgNWayfsm.c:3645
 #8 0x0046c82c in avd_sg_nway_node_fail_func (cb=0x6bdbe0, su=0x732130) at avd_sgNWayfsm.c:657
 #9 0x0047ad65 in avd_node_susi_fail_func (cb=0x6bdbe0, avnd=0x6fef50) at avd_sgproc.c:2126
 #10 0x00434f72 in avd_node_failover (node=0x6fef50) at avd_ndproc.c:776
 #11 0x00431a80 in avd_mds_avnd_down_evh (cb=0x6bdbe0, evt=0x7f4578000ae0) at avd_ndfsm.c:407
 #12 0x0043b57e in avd_process_event (cb_now=0x6bdbe0, evt=0x7f4578000ae0) at avd_proc.c:589
 #13 0x0043b305 in avd_main_proc () at avd_proc.c:505
 #14 0x00409210 in main (argc=2, argv=0x7fff87968c08) at amfd_main.c:47
 

Traces from the active controller are attached.

Changed 7 months ago by nagendra 
Can you please test it on 4.2.2? I suspect that #2832 may solve the issue, as there were many CSI add/delete operations before this issue occurred.
 
Changed 7 months ago by allasirisha 
This is still seen with the #2832 patch.
 
Changed 7 months ago

[tickets] [opensaf:tickets] #203 avsv: SG went to unstable state when active SU is locked after adding new SI in NWay RM

2013-12-04 Thread Nagendra Kumar
- **assigned_to**: Praveen



---

** [tickets:#203] avsv: SG went to unstable state when active SU is locked 
after adding new SI in NWay RM**

**Status:** review
**Created:** Wed May 15, 2013 04:32 AM UTC by Praveen
**Last Updated:** Mon Dec 02, 2013 10:29 AM UTC
**Owner:** Praveen


[tickets] [opensaf:tickets] #203 avsv: SG went to unstable state when active SU is locked after adding new SI in NWay RM

2013-12-02 Thread Nagendra Kumar
- **assigned_to**: Praveen -->  nobody 
- **Milestone**: 4.4.FC --> 4.2.5



---

** [tickets:#203] avsv: SG went to unstable state when active SU is locked 
after adding new SI in NWay RM**

**Status:** review
**Created:** Wed May 15, 2013 04:32 AM UTC by Praveen
**Last Updated:** Mon Nov 04, 2013 04:44 AM UTC
**Owner:** nobody


[tickets] [opensaf:tickets] #203 avsv: SG went to unstable state when active SU is locked after adding new SI in NWay RM

2013-11-03 Thread Praveen
- **status**: assigned --> review



---

** [tickets:#203] avsv: SG went to unstable state when active SU is locked 
after adding new SI in NWay RM**

**Status:** review
**Created:** Wed May 15, 2013 04:32 AM UTC by Praveen
**Last Updated:** Mon Nov 04, 2013 04:42 AM UTC
**Owner:** Praveen


[tickets] [opensaf:tickets] #203 avsv: SG went to unstable state when active SU is locked after adding new SI in NWay RM

2013-11-03 Thread Praveen
Patch floated for #233 applies to this also.


---

** [tickets:#203] avsv: SG went to unstable state when active SU is locked 
after adding new SI in NWay RM**

**Status:** assigned
**Created:** Wed May 15, 2013 04:32 AM UTC by Praveen
**Last Updated:** Tue Oct 29, 2013 07:06 AM UTC
**Owner:** Praveen


[tickets] [opensaf:tickets] #203 avsv: SG went to unstable state when active SU is locked after adding new SI in NWay RM

2013-10-29 Thread Praveen
Attached traces and configuration.


Attachment: 203.tgz (628.1 kB; application/x-compressed) 


---

** [tickets:#203] avsv: SG went to unstable state when active SU is locked 
after adding new SI in NWay RM**

**Status:** assigned
**Created:** Wed May 15, 2013 04:32 AM UTC by Praveen
**Last Updated:** Tue Oct 29, 2013 07:04 AM UTC
**Owner:** Praveen

The issue is observed on SLES 64bit VMs.
 

Configuration:
 NWay RM with 2 SUs, 2SIs and 2 CSIs. PBE is enabled and opensaf is run as root 
user.
 

New SI is added and then active SU is locked. The following message is seen in 
the syslog:
 

Oct 6 19:24:39 SLES-SLOT-1 osafamfnd[3703]: Assigning 
'safSi=d_NWay_1Norm_3,safApp=N' ACTIVE to 
'safSu=d_NWay_1Norm_1,safSg=SG_d_n,safApp=N'
 Oct 6 19:24:39 SLES-SLOT-1 osafamfnd[3703]: Assigned 
'safSi=d_NWay_1Norm_3,safApp=N' ACTIVE to 
'safSu=d_NWay_1Norm_1,safSg=SG_d_n,safApp=N'
 Oct 6 19:24:42 SLES-SLOT-1 osafamfnd[3703]: Assigning 
'safSi=d_NWay_1Norm_1,safApp=N' QUIESCED to 
'safSu=d_NWay_1Norm_1,safSg=SG_d_n,safApp=N'
 Oct 6 19:24:42 SLES-SLOT-1 osafamfnd[3703]: Assigned 
'safSi=d_NWay_1Norm_1,safApp=N' QUIESCED to 
'safSu=d_NWay_1Norm_1,safSg=SG_d_n,safApp=N'
 Oct 6 19:24:42 SLES-SLOT-1 osafamfnd[3703]: Assigning 
'safSi=d_NWay_1Norm_3,safApp=N' QUIESCED to 
'safSu=d_NWay_1Norm_1,safSg=SG_d_n,safApp=N'
 Oct 6 19:24:42 SLES-SLOT-1 osafamfnd[3703]: Assigned 
'safSi=d_NWay_1Norm_3,safApp=N' QUIESCED to 
'safSu=d_NWay_1Norm_1,safSg=SG_d_n,safApp=N'
 Oct 6 19:24:42 SLES-SLOT-1 osafamfnd[3703]: Removing 
'safSi=d_NWay_1Norm_3,safApp=N' from 
'safSu=d_NWay_1Norm_1,safSg=SG_d_n,safApp=N'
 Oct 6 19:24:42 SLES-SLOT-1 osafamfnd[3703]: Removed 
'safSi=d_NWay_1Norm_3,safApp=N' from 
'safSu=d_NWay_1Norm_1,safSg=SG_d_n,safApp=N'
 Oct 6 19:24:42 SLES-SLOT-1 osafamfnd[3703]: Removing 'all SIs' from 
'safSu=d_NWay_1Norm_1,safSg=SG_d_n,safApp=N'
 Oct 6 19:24:42 SLES-SLOT-1 osafamfd[3693]: SG state is not stable
 Oct 6 19:24:43 SLES-SLOT-1 osafamfd[3693]: SG state is not stable
 Oct 6 19:24:44 SLES-SLOT-1 osafamfd[3693]: SG state is not stable
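
The excerpt above contains a 'Removing' record for 'all SIs' with no matching 'Removed' record before amfd starts reporting that the SG is not stable. A minimal Python sketch for spotting that pattern in a saved syslog follows. It assumes the records are wrapped across lines the way they are in this mail; the function names and regular expressions are illustrative only and are not part of OpenSAF.

```python
import re

# A record starts with a syslog timestamp such as "Oct 6 19:24:42 ".
RECORD_START = re.compile(r"^\s*[A-Z][a-z]{2}\s+\d{1,2} \d{2}:\d{2}:\d{2} ")
# amfnd logs "Removing '<SI>' from '<SU>'" and later "Removed '<SI>' from '<SU>'".
REMOVAL = re.compile(r"(Removing|Removed) '([^']*)' from '([^']*)'")


def join_wrapped(lines):
    """Rejoin syslog records that were wrapped across several lines."""
    records = []
    for line in lines:
        if not line.strip():
            continue
        if RECORD_START.match(line) or not records:
            records.append(line.strip())
        else:
            records[-1] += " " + line.strip()
    return records


def pending_removals(records):
    """Return (si, su) pairs that logged 'Removing' but never 'Removed'."""
    pending = []
    for rec in records:
        m = REMOVAL.search(rec)
        if not m:
            continue
        key = (m.group(2), m.group(3))
        if m.group(1) == "Removing":
            pending.append(key)
        elif key in pending:
            pending.remove(key)
    return pending


def unstable_reports(records):
    """Count amfd 'SG state is not stable' records."""
    return sum("SG state is not stable" in rec for rec in records)
```

A non-empty result from `pending_removals` together with a non-zero `unstable_reports` count is only a heuristic hint that a removal never completed; the attached traces remain the authoritative source.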
 

Further operations failed because the SG was not stable. When PL-4, which was 
hosting the active SU, was brought down, amfd on the active controller crashed, 
leading to a reboot of the node. The following messages were seen in the syslog:
 Oct 6 19:43:59 SLES-SLOT-1 osafamfd[3693]: Node 'PL-4' left the cluster
 Oct 6 19:44:00 SLES-SLOT-1 osafamfd[3693]: avd_su.c:1585: 
avd_su_dec_curr_stdby_si: Assertion 'su->saAmfSUNumCurrStandbySIs > 0' failed.
 Oct 6 19:44:00 SLES-SLOT-1 osafamfnd[3703]: AMF director unexpectedly crashed
 Oct 6 19:44:00 SLES-SLOT-1 osafamfnd[3703]: Rebooting OpenSAF NodeId = 131343 
EE Name = , Reason: local AVD down(Adest) or both AVD down(Vdest) received
 Oct 6 19:44:00 SLES-SLOT-1 osafimmnd[3628]: Implementer locally disconnected. 
Marking it as doomed 3 <17, 2010f> (safAmfService)
 Oct 6 19:44:00 SLES-SLOT-1 osafimmnd[3628]: Implementer disconnected 3 <17, 
2010f> (safAmfService)
 Oct 6 19:44:00 SLES-SLOT-1 opensaf_reboot: Rebooting local node
 

Bt of the core file:
 Core was generated by `/usr/lib64/opensaf/osafamfd --tracemask=0x'.
 Program terminated with signal 6, Aborted.
 #0 0x7f457decd645 in raise () from /lib64/libc.so.6
 (gdb) bt
 #0 0x7f457decd645 in raise () from /lib64/libc.so.6
 #1 0x7f457decec33 in abort () from /lib64/libc.so.6
 #2 0x7f457f4df095 in osafassert_fail (file=0x4ac5e5 "avd_su.c", line=1585, 
func=0x4ad590 "avd_su_dec_curr_stdby_si", 
assertion=0x4ad5b0 "su->saAmfSUNumCurrStandbySIs > 0") at sysf_def.c:399
#3 0x0048964f in avd_su_dec_curr_stdby_si (su=0x727f70) at avd_su.c:1585
 #4 0x0048b244 in avd_susi_update_assignment_counters (susi=0x767bf0, 
action=AVSV_SUSI_ACT_DEL, current_ha_state=0, new_ha_state=0)
at avd_siass.c:730
#5 0x0048aff7 in avd_susi_del_send (susi=0x767bf0) at avd_siass.c:663
 #6 0x00474bbc in avd_sg_nway_node_fail_stable (cb=0x6bdbe0, 
su=0x732130, susi=0x0) at avd_sgNWayfsm.c:3191
 #7 0x00476257 in avd_sg_nway_node_fail_sg_realign (cb=0x6bdbe0, 
su=0x732130) at avd_sgNWayfsm.c:3645
 #8 0x0046c82c in avd_sg_nway_node_fail_func (cb=0x6bdbe0, su=0x732130) 
at avd_sgNWayfsm.c:657
 #9 0x0047ad65 in avd_node_susi_fail_func (cb=0x6bdbe0, avnd=0x6fef50) 
at avd_sgproc.c:2126
 #10 0x00434f72 in avd_node_failover (node=0x6fef50) at avd_ndproc.c:776
 #11 0x00431a80 in avd_mds_avnd_down_evh (cb=0x6bdbe0, 
evt=0x7f4578000ae0) at avd_ndfsm.c:407
 #12 0x0043b57e in avd_process_event (cb_now=0x6bdbe0, 
evt=0x7f4578000ae0) at avd_proc.c:589
 #13 0x0043b305 in avd_main_proc () at avd_proc.c:505
 #14 0x00409210 in main (argc=2, argv=0x7fff87968c08) at amfd_main.c:47
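
The backtrace shows the abort coming from a guard in avd_su_dec_curr_stdby_si: the SU's standby SI counter was already 0 when an assignment delete tried to decrement it. The ticket does not state how the counter and the assignment record got out of step, so the following Python model is only a sketch of that class of failure. The class and method names are hypothetical and do not mirror the amfd sources.

```python
class ServiceUnit:
    """Toy model of an SU that tracks its current standby assignments."""

    def __init__(self, name):
        self.name = name
        self.num_curr_standby_sis = 0

    def inc_curr_standby_si(self):
        self.num_curr_standby_sis += 1

    def dec_curr_standby_si(self):
        # Same kind of guard as 'su->saAmfSUNumCurrStandbySIs > 0'.
        if self.num_curr_standby_sis <= 0:
            raise AssertionError(
                "standby SI counter underflow on %s" % self.name)
        self.num_curr_standby_sis -= 1


def delete_standby_assignment(su):
    """Deleting a standby assignment decrements the SU's standby counter."""
    su.dec_curr_standby_si()


if __name__ == "__main__":
    su = ServiceUnit("SU2")
    su.inc_curr_standby_si()          # one standby assignment is recorded
    delete_standby_assignment(su)     # first delete: counter goes 1 -> 0
    try:
        # A second delete for the same assignment, for example one left
        # behind by an operation that never completed, trips the guard.
        delete_standby_assignment(su)
    except AssertionError as err:
        print("guard tripped:", err)
```

In amfd the guard is a hard assertion, so the process aborts instead of raising an exception that a caller could handle.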
 

Traces from the active controller are attached.

Changed 7 months ago by nagendra 
Can you please test it on 4.2.2? I suspect that 2832 may solve the issue, 
as there were many CSI add/del operations before this issue occurred.
 
Changed 7 months ago by allasi

[tickets] [opensaf:tickets] #203 avsv: SG went to unstable state when active SU is locked after adding new SI in NWay RM

2013-10-29 Thread Praveen
The attached 203.tgz contains:
- 203.xml: configuration to reproduce the issue.
- new_si_csi_add.sh: script to add the new SI.
- Traces.

Steps to reproduce:
1) immcfg -f 203.xml
2) Unlock-in and then unlock SU1 and SU2.
3) ./new_si_csi_add.sh
4) Lock SU1.
5) Stop OpenSAF on the payload hosting SU2.
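
The steps above can be sketched as a small Python driver that prints the commands and, if asked, runs them. This is a sketch under assumptions: the SU1 DN is taken from the syslog in the ticket, the SU2 DN is a guess that follows the same naming pattern and may differ in 203.xml, and unlock-in is issued before unlock because an SU has to leave the locked-instantiation state first. Step 5 is left out of the command list because it has to be run on the payload hosting SU2.

```python
import shlex
import subprocess

# Assumed DNs: SU1 appears in the syslog; SU2 is hypothetical.
SU1 = "safSu=d_NWay_1Norm_1,safSg=SG_d_n,safApp=N"
SU2 = "safSu=d_NWay_1Norm_2,safSg=SG_d_n,safApp=N"


def repro_commands(su1=SU1, su2=SU2):
    """Return steps 1 to 4 as a list of argv lists, in order."""
    cmds = [["immcfg", "-f", "203.xml"]]            # step 1
    for su in (su1, su2):                           # step 2
        cmds.append(["amf-adm", "unlock-in", su])
        cmds.append(["amf-adm", "unlock", su])
    cmds.append(["./new_si_csi_add.sh"])            # step 3
    cmds.append(["amf-adm", "lock", su1])           # step 4
    return cmds


def run(cmds, dry_run=True):
    """Print each command; execute it only when dry_run is False."""
    for argv in cmds:
        print(shlex.join(argv))   # shlex.join needs Python 3.8 or later
        if not dry_run:
            subprocess.run(argv, check=True)


if __name__ == "__main__":
    run(repro_commands())
    print("# step 5: stop OpenSAF on the payload hosting SU2")
```

The default is a dry run, so the script can be read and checked on a machine without OpenSAF installed before it is used on a test cluster.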



---

** [tickets:#203] avsv: SG went to unstable state when active SU is locked 
after adding new SI in NWay RM**

**Status:** assigned
**Created:** Wed May 15, 2013 04:32 AM UTC by Praveen
**Last Updated:** Fri Sep 06, 2013 01:14 PM UTC
**Owner:** Praveen


[tickets] [opensaf:tickets] #203 avsv: SG went to unstable state when active SU is locked after adding new SI in NWay RM

2013-09-06 Thread Praveen
- **Milestone**: future --> 4.4.FC



---

** [tickets:#203] avsv: SG went to unstable state when active SU is locked 
after adding new SI in NWay RM**

**Status:** assigned
**Created:** Wed May 15, 2013 04:32 AM UTC by Praveen
**Last Updated:** Wed May 15, 2013 04:33 AM UTC
**Owner:** Praveen
