[tickets] [opensaf:tickets] #203 avsv: SG went to unstable state when active SU is locked after adding new SI in NWay RM
This issue got fixed with #233 in the following changesets:

changeset: 4690:5bc315546fe4
user: praveen.malv...@oracle.com
date: Mon Dec 09 14:09:31 2013 +0530
summary: amfnd : issue remove cbk to all comps when CSIs are unevenly distributed [#233]

changeset: 4691:38f2d79c9698
branch: opensaf-4.3.x
parent: 4688:1b9b59cf671f
user: praveen.malv...@oracle.com
date: Mon Dec 09 14:10:24 2013 +0530
summary: amfnd : issue remove cbk to all comps when CSIs are unevenly distributed [#233]

changeset: 4692:41ad09d56d11
branch: opensaf-4.2.x
tag: tip
parent: 4687:251f0595bb4a
user: praveen.malv...@oracle.com
date: Mon Dec 09 14:10:59 2013 +0530
summary: amfnd : issue remove cbk to all comps when CSIs are unevenly distributed [#233]

[staging:5bc315] [staging:38f2d7] [staging:41ad09]

---

**[tickets:#203] avsv: SG went to unstable state when active SU is locked after adding new SI in NWay RM**

**Status:** fixed
**Created:** Wed May 15, 2013 04:32 AM UTC by Praveen
**Last Updated:** Wed Dec 04, 2013 10:14 AM UTC
**Owner:** Praveen

The issue is observed on SLES 64-bit VMs.

Configuration: NWay RM with 2 SUs, 2 SIs and 2 CSIs. PBE is enabled and OpenSAF is run as the root user.

A new SI is added and then the active SU is locked. The following messages are seen in the syslog:

Oct 6 19:24:39 SLES-SLOT-1 osafamfnd[3703]: Assigning 'safSi=d_NWay_1Norm_3,safApp=N' ACTIVE to 'safSu=d_NWay_1Norm_1,safSg=SG_d_n,safApp=N'
Oct 6 19:24:39 SLES-SLOT-1 osafamfnd[3703]: Assigned 'safSi=d_NWay_1Norm_3,safApp=N' ACTIVE to 'safSu=d_NWay_1Norm_1,safSg=SG_d_n,safApp=N'
Oct 6 19:24:42 SLES-SLOT-1 osafamfnd[3703]: Assigning 'safSi=d_NWay_1Norm_1,safApp=N' QUIESCED to 'safSu=d_NWay_1Norm_1,safSg=SG_d_n,safApp=N'
Oct 6 19:24:42 SLES-SLOT-1 osafamfnd[3703]: Assigned 'safSi=d_NWay_1Norm_1,safApp=N' QUIESCED to 'safSu=d_NWay_1Norm_1,safSg=SG_d_n,safApp=N'
Oct 6 19:24:42 SLES-SLOT-1 osafamfnd[3703]: Assigning 'safSi=d_NWay_1Norm_3,safApp=N' QUIESCED to 'safSu=d_NWay_1Norm_1,safSg=SG_d_n,safApp=N'
Oct 6 19:24:42 SLES-SLOT-1 osafamfnd[3703]: Assigned 'safSi=d_NWay_1Norm_3,safApp=N' QUIESCED to 'safSu=d_NWay_1Norm_1,safSg=SG_d_n,safApp=N'
Oct 6 19:24:42 SLES-SLOT-1 osafamfnd[3703]: Removing 'safSi=d_NWay_1Norm_3,safApp=N' from 'safSu=d_NWay_1Norm_1,safSg=SG_d_n,safApp=N'
Oct 6 19:24:42 SLES-SLOT-1 osafamfnd[3703]: Removed 'safSi=d_NWay_1Norm_3,safApp=N' from 'safSu=d_NWay_1Norm_1,safSg=SG_d_n,safApp=N'
Oct 6 19:24:42 SLES-SLOT-1 osafamfnd[3703]: Removing 'all SIs' from 'safSu=d_NWay_1Norm_1,safSg=SG_d_n,safApp=N'
Oct 6 19:24:42 SLES-SLOT-1 osafamfd[3693]: SG state is not stable
Oct 6 19:24:43 SLES-SLOT-1 osafamfd[3693]: SG state is not stable
Oct 6 19:24:44 SLES-SLOT-1 osafamfd[3693]: SG state is not stable

Further operations failed since the SG is not stable.

When PL-4, which was hosting the active SU, is brought down, amfd on the active controller crashed, leading to a reboot of the node. The following messages are seen in the syslog:

Oct 6 19:43:59 SLES-SLOT-1 osafamfd[3693]: Node 'PL-4' left the cluster
Oct 6 19:44:00 SLES-SLOT-1 osafamfd[3693]: avd_su.c:1585: avd_su_dec_curr_stdby_si: Assertion 'su->saAmfSUNumCurrStandbySIs > 0' failed.
Oct 6 19:44:00 SLES-SLOT-1 osafamfnd[3703]: AMF director unexpectedly crashed
Oct 6 19:44:00 SLES-SLOT-1 osafamfnd[3703]: Rebooting OpenSAF NodeId = 131343 EE Name = , Reason: local AVD down(Adest) or both AVD down(Vdest) received
Oct 6 19:44:00 SLES-SLOT-1 osafimmnd[3628]: Implementer locally disconnected. Marking it as doomed 3 <17, 2010f> (safAmfService)
Oct 6 19:44:00 SLES-SLOT-1 osafimmnd[3628]: Implementer disconnected 3 <17, 2010f> (safAmfService)
Oct 6 19:44:00 SLES-SLOT-1 opensaf_reboot: Rebooting local node

Backtrace of the core file:

Core was generated by `/usr/lib64/opensaf/osafamfd --tracemask=0x'.
Program terminated with signal 6, Aborted.
#0 0x7f457decd645 in raise () from /lib64/libc.so.6
(gdb) bt
#0 0x7f457decd645 in raise () from /lib64/libc.so.6
#1 0x7f457decec33 in abort () from /lib64/libc.so.6
#2 0x7f457f4df095 in osafassert_fail (file=0x4ac5e5 "avd_su.c", line=1585, func=0x4ad590 "avd_su_dec_curr_stdby_si", assertion=0x4ad5b0 "su->saAmfSUNumCurrStandbySIs > 0") at sysf_def.c:399
#3 0x0048964f in avd_su_dec_curr_stdby_si (su=0x727f70) at avd_su.c:1585
#4 0x0048b244 in avd_susi_update_assignment_counters (susi=0x767bf0, action=AVSV_SUSI_ACT_DEL, current_ha_state=0, new_ha_state=0) at avd_siass.c:730
#5 0x0048aff7 in avd_susi_del_send (susi=0x767bf0) at avd_siass.c:663
#6 0x00474bbc in avd_sg_nway_node_fail_stable (cb=0x6bdbe0, su=0x732130, susi=0x0) at avd_sgNWayfsm.c:3191
#7 0x00476257 in avd_sg_nway_node_fail_sg_realign (cb=0x6bdbe0, su=0x732130) at avd_sgNWayfsm.c:3645
#8 0x0046c82c in avd_sg_nway_node_fail_func (cb=0x6bdbe0, su=0x732130) at avd_sgNWayfsm.c:657
#9 0x0047ad65 in avd_node_susi_fail_func (cb=0x6bdbe0, avnd=0x6fef50) at avd_sgproc.c:2126
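For readers unfamiliar with the failure mode, the assertion in frame #3 fires when a standby-SI decrement is requested for an SU whose counter is already zero. The following is only a minimal sketch of that pattern, not the amfd source: `struct su_counters`, `dec_standby_strict` and `dec_standby_checked` are hypothetical names introduced here for illustration, and the tolerant variant is an assumption about one possible alternative, not the fix delivered in #233.

```c
#include <assert.h>
#include <stdbool.h>
#include <stdint.h>

/* Hypothetical stand-in for the per-SU bookkeeping; not the real AVD SU type. */
struct su_counters {
    uint32_t standby_si_count; /* plays the role of saAmfSUNumCurrStandbySIs */
};

/* Strict variant: aborts via assert() if the counter is already zero,
 * which is the kind of abort the backtrace shows. */
void dec_standby_strict(struct su_counters *su)
{
    assert(su->standby_si_count > 0);
    su->standby_si_count--;
}

/* Tolerant variant (illustrative assumption): refuses to underflow and
 * reports the mismatch to the caller instead of aborting. */
bool dec_standby_checked(struct su_counters *su)
{
    if (su->standby_si_count == 0)
        return false; /* counter and assignment list disagree */
    su->standby_si_count--;
    return true;
}
```

Calling `dec_standby_strict` on a counter that is already zero aborts the process, while `dec_standby_checked` leaves the counter at zero and returns `false`. Either way, a zero counter at this point suggests the bookkeeping went out of step with the actual assignments earlier, which would be consistent with the SG having been left unstable before the node failure.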
[tickets] [opensaf:tickets] #203 avsv: SG went to unstable state when active SU is locked after adding new SI in NWay RM
- **status**: review --> fixed

---

**[tickets:#203] avsv: SG went to unstable state when active SU is locked after adding new SI in NWay RM**

**Status:** fixed
**Created:** Wed May 15, 2013 04:32 AM UTC by Praveen
**Last Updated:** Wed Dec 04, 2013 10:14 AM UTC
**Owner:** Praveen

The issue is observed on SLES 64-bit VMs.

Configuration: NWay RM with 2 SUs, 2 SIs and 2 CSIs. PBE is enabled and OpenSAF is run as the root user.

A new SI is added and then the active SU is locked. The following messages are seen in the syslog:

Oct 6 19:24:39 SLES-SLOT-1 osafamfnd[3703]: Assigning 'safSi=d_NWay_1Norm_3,safApp=N' ACTIVE to 'safSu=d_NWay_1Norm_1,safSg=SG_d_n,safApp=N'
Oct 6 19:24:39 SLES-SLOT-1 osafamfnd[3703]: Assigned 'safSi=d_NWay_1Norm_3,safApp=N' ACTIVE to 'safSu=d_NWay_1Norm_1,safSg=SG_d_n,safApp=N'
Oct 6 19:24:42 SLES-SLOT-1 osafamfnd[3703]: Assigning 'safSi=d_NWay_1Norm_1,safApp=N' QUIESCED to 'safSu=d_NWay_1Norm_1,safSg=SG_d_n,safApp=N'
Oct 6 19:24:42 SLES-SLOT-1 osafamfnd[3703]: Assigned 'safSi=d_NWay_1Norm_1,safApp=N' QUIESCED to 'safSu=d_NWay_1Norm_1,safSg=SG_d_n,safApp=N'
Oct 6 19:24:42 SLES-SLOT-1 osafamfnd[3703]: Assigning 'safSi=d_NWay_1Norm_3,safApp=N' QUIESCED to 'safSu=d_NWay_1Norm_1,safSg=SG_d_n,safApp=N'
Oct 6 19:24:42 SLES-SLOT-1 osafamfnd[3703]: Assigned 'safSi=d_NWay_1Norm_3,safApp=N' QUIESCED to 'safSu=d_NWay_1Norm_1,safSg=SG_d_n,safApp=N'
Oct 6 19:24:42 SLES-SLOT-1 osafamfnd[3703]: Removing 'safSi=d_NWay_1Norm_3,safApp=N' from 'safSu=d_NWay_1Norm_1,safSg=SG_d_n,safApp=N'
Oct 6 19:24:42 SLES-SLOT-1 osafamfnd[3703]: Removed 'safSi=d_NWay_1Norm_3,safApp=N' from 'safSu=d_NWay_1Norm_1,safSg=SG_d_n,safApp=N'
Oct 6 19:24:42 SLES-SLOT-1 osafamfnd[3703]: Removing 'all SIs' from 'safSu=d_NWay_1Norm_1,safSg=SG_d_n,safApp=N'
Oct 6 19:24:42 SLES-SLOT-1 osafamfd[3693]: SG state is not stable
Oct 6 19:24:43 SLES-SLOT-1 osafamfd[3693]: SG state is not stable
Oct 6 19:24:44 SLES-SLOT-1 osafamfd[3693]: SG state is not stable

Further operations failed since the SG is not stable.

When PL-4, which was hosting the active SU, is brought down, amfd on the active controller crashed, leading to a reboot of the node. The following messages are seen in the syslog:

Oct 6 19:43:59 SLES-SLOT-1 osafamfd[3693]: Node 'PL-4' left the cluster
Oct 6 19:44:00 SLES-SLOT-1 osafamfd[3693]: avd_su.c:1585: avd_su_dec_curr_stdby_si: Assertion 'su->saAmfSUNumCurrStandbySIs > 0' failed.
Oct 6 19:44:00 SLES-SLOT-1 osafamfnd[3703]: AMF director unexpectedly crashed
Oct 6 19:44:00 SLES-SLOT-1 osafamfnd[3703]: Rebooting OpenSAF NodeId = 131343 EE Name = , Reason: local AVD down(Adest) or both AVD down(Vdest) received
Oct 6 19:44:00 SLES-SLOT-1 osafimmnd[3628]: Implementer locally disconnected. Marking it as doomed 3 <17, 2010f> (safAmfService)
Oct 6 19:44:00 SLES-SLOT-1 osafimmnd[3628]: Implementer disconnected 3 <17, 2010f> (safAmfService)
Oct 6 19:44:00 SLES-SLOT-1 opensaf_reboot: Rebooting local node

Backtrace of the core file:

Core was generated by `/usr/lib64/opensaf/osafamfd --tracemask=0x'.
Program terminated with signal 6, Aborted.
#0 0x7f457decd645 in raise () from /lib64/libc.so.6
(gdb) bt
#0 0x7f457decd645 in raise () from /lib64/libc.so.6
#1 0x7f457decec33 in abort () from /lib64/libc.so.6
#2 0x7f457f4df095 in osafassert_fail (file=0x4ac5e5 "avd_su.c", line=1585, func=0x4ad590 "avd_su_dec_curr_stdby_si", assertion=0x4ad5b0 "su->saAmfSUNumCurrStandbySIs > 0") at sysf_def.c:399
#3 0x0048964f in avd_su_dec_curr_stdby_si (su=0x727f70) at avd_su.c:1585
#4 0x0048b244 in avd_susi_update_assignment_counters (susi=0x767bf0, action=AVSV_SUSI_ACT_DEL, current_ha_state=0, new_ha_state=0) at avd_siass.c:730
#5 0x0048aff7 in avd_susi_del_send (susi=0x767bf0) at avd_siass.c:663
#6 0x00474bbc in avd_sg_nway_node_fail_stable (cb=0x6bdbe0, su=0x732130, susi=0x0) at avd_sgNWayfsm.c:3191
#7 0x00476257 in avd_sg_nway_node_fail_sg_realign (cb=0x6bdbe0, su=0x732130) at avd_sgNWayfsm.c:3645
#8 0x0046c82c in avd_sg_nway_node_fail_func (cb=0x6bdbe0, su=0x732130) at avd_sgNWayfsm.c:657
#9 0x0047ad65 in avd_node_susi_fail_func (cb=0x6bdbe0, avnd=0x6fef50) at avd_sgproc.c:2126
#10 0x00434f72 in avd_node_failover (node=0x6fef50) at avd_ndproc.c:776
#11 0x00431a80 in avd_mds_avnd_down_evh (cb=0x6bdbe0, evt=0x7f4578000ae0) at avd_ndfsm.c:407
#12 0x0043b57e in avd_process_event (cb_now=0x6bdbe0, evt=0x7f4578000ae0) at avd_proc.c:589
#13 0x0043b305 in avd_main_proc () at avd_proc.c:505
#14 0x00409210 in main (argc=2, argv=0x7fff87968c08) at amfd_main.c:47

Traces from the active controller are attached.

Changed 7 months ago by nagendra

Can you please test it on 4.2.2? I suspect that 2832 may solve the issue, as there were many CSI add/del operations before this issue occurred.

Changed 7 months ago by allasirisha

This is still seen with the #2832 patch.

Changed 7 months ago
[tickets] [opensaf:tickets] #203 avsv: SG went to unstable state when active SU is locked after adding new SI in NWay RM
- **assigned_to**: Praveen

---

**[tickets:#203] avsv: SG went to unstable state when active SU is locked after adding new SI in NWay RM**

**Status:** review
**Created:** Wed May 15, 2013 04:32 AM UTC by Praveen
**Last Updated:** Mon Dec 02, 2013 10:29 AM UTC
**Owner:** Praveen
[tickets] [opensaf:tickets] #203 avsv: SG went to unstable state when active SU is locked after adding new SI in NWay RM
- **assigned_to**: Praveen --> nobody
- **Milestone**: 4.4.FC --> 4.2.5

---

**[tickets:#203] avsv: SG went to unstable state when active SU is locked after adding new SI in NWay RM**

**Status:** review
**Created:** Wed May 15, 2013 04:32 AM UTC by Praveen
**Last Updated:** Mon Nov 04, 2013 04:44 AM UTC
**Owner:** nobody
[tickets] [opensaf:tickets] #203 avsv: SG went to unstable state when active SU is locked after adding new SI in NWay RM
- **status**: assigned --> review

---

**[tickets:#203] avsv: SG went to unstable state when active SU is locked after adding new SI in NWay RM**

**Status:** review
**Created:** Wed May 15, 2013 04:32 AM UTC by Praveen
**Last Updated:** Mon Nov 04, 2013 04:42 AM UTC
**Owner:** Praveen
[tickets] [opensaf:tickets] #203 avsv: SG went to unstable state when active SU is locked after adding new SI in NWay RM
The patch floated for #233 applies to this ticket as well.

---

**[tickets:#203] avsv: SG went to unstable state when active SU is locked after adding new SI in NWay RM**

**Status:** assigned
**Created:** Wed May 15, 2013 04:32 AM UTC by Praveen
**Last Updated:** Tue Oct 29, 2013 07:06 AM UTC
**Owner:** Praveen
[tickets] [opensaf:tickets] #203 avsv: SG went to unstable state when active SU is locked after adding new SI in NWay RM
Attached traces and configuration. Attachment: 203.tgz (628.1 kB; application/x-compressed) --- ** [tickets:#203] avsv: SG went to unstable state when active SU is locked after adding new SI in NWay RM** **Status:** assigned **Created:** Wed May 15, 2013 04:32 AM UTC by Praveen **Last Updated:** Tue Oct 29, 2013 07:04 AM UTC **Owner:** Praveen The issue is observed on SLES 64bit VMs. Configuration: NWay RM with 2 SUs, 2SIs and 2 CSIs. PBE is enabled and opensaf is run as root user. New SI is added and then active SU is locked. The following message is seen in the syslog: Oct 6 19:24:39 SLES-SLOT-1 osafamfnd[3703]: Assigning 'safSi=d_NWay_1Norm_3,safApp=N' ACTIVE to 'safSu=d_NWay_1Norm_1,safSg=SG_d_n,safApp=N' Oct 6 19:24:39 SLES-SLOT-1 osafamfnd[3703]: Assigned 'safSi=d_NWay_1Norm_3,safApp=N' ACTIVE to 'safSu=d_NWay_1Norm_1,safSg=SG_d_n,safApp=N' Oct 6 19:24:42 SLES-SLOT-1 osafamfnd[3703]: Assigning 'safSi=d_NWay_1Norm_1,safApp=N' QUIESCED to 'safSu=d_NWay_1Norm_1,safSg=SG_d_n,safApp=N' Oct 6 19:24:42 SLES-SLOT-1 osafamfnd[3703]: Assigned 'safSi=d_NWay_1Norm_1,safApp=N' QUIESCED to 'safSu=d_NWay_1Norm_1,safSg=SG_d_n,safApp=N' Oct 6 19:24:42 SLES-SLOT-1 osafamfnd[3703]: Assigning 'safSi=d_NWay_1Norm_3,safApp=N' QUIESCED to 'safSu=d_NWay_1Norm_1,safSg=SG_d_n,safApp=N' Oct 6 19:24:42 SLES-SLOT-1 osafamfnd[3703]: Assigned 'safSi=d_NWay_1Norm_3,safApp=N' QUIESCED to 'safSu=d_NWay_1Norm_1,safSg=SG_d_n,safApp=N' Oct 6 19:24:42 SLES-SLOT-1 osafamfnd[3703]: Removing 'safSi=d_NWay_1Norm_3,safApp=N' from 'safSu=d_NWay_1Norm_1,safSg=SG_d_n,safApp=N' Oct 6 19:24:42 SLES-SLOT-1 osafamfnd[3703]: Removed 'safSi=d_NWay_1Norm_3,safApp=N' from 'safSu=d_NWay_1Norm_1,safSg=SG_d_n,safApp=N' Oct 6 19:24:42 SLES-SLOT-1 osafamfnd[3703]: Removing 'all SIs' from 'safSu=d_NWay_1Norm_1,safSg=SG_d_n,safApp=N' Oct 6 19:24:42 SLES-SLOT-1 osafamfd[3693]: SG state is not stable Oct 6 19:24:43 SLES-SLOT-1 osafamfd[3693]: SG state is not stable Oct 6 19:24:44 SLES-SLOT-1 osafamfd[3693]: SG state is 
not stable Further operations failed since the SG is not stable. When PL-4 which was hosting the active SU is brought down, amfd on active controller crashed leading to the reboot of the node. The following message is seen in the syslog. Oct 6 19:43:59 SLES-SLOT-1 osafamfd[3693]: Node 'PL-4' left the cluster Oct 6 19:44:00 SLES-SLOT-1 osafamfd[3693]: avd_su.c:1585: avd_su_dec_curr_stdby_si: Assertion 'su->saAmfSUNumCurrStandbySIs > 0' failed. Oct 6 19:44:00 SLES-SLOT-1 osafamfnd[3703]: AMF director unexpectedly crashed Oct 6 19:44:00 SLES-SLOT-1 osafamfnd[3703]: Rebooting OpenSAF NodeId? = 131343 EE Name = , Reason: local AVD down(Adest) or both AVD down(Vdest) received Oct 6 19:44:00 SLES-SLOT-1 osafimmnd[3628]: Implementer locally disconnected. Marking it as doomed 3 <17, 2010f> (safAmfService) Oct 6 19:44:00 SLES-SLOT-1 osafimmnd[3628]: Implementer disconnected 3 <17, 2010f> (safAmfService) Oct 6 19:44:00 SLES-SLOT-1 opensaf_reboot: Rebooting local node Bt of the core file: Core was generated by `/usr/lib64/opensaf/osafamfd —tracemask=0x'. Program terminated with signal 6, Aborted. 
#0 0x7f457decd645 in raise () from /lib64/libc.so.6
(gdb) bt
#0 0x7f457decd645 in raise () from /lib64/libc.so.6
#1 0x7f457decec33 in abort () from /lib64/libc.so.6
#2 0x7f457f4df095 in osafassert_fail (file=0x4ac5e5 "avd_su.c", line=1585, func=0x4ad590 "avd_su_dec_curr_stdby_si", assertion=0x4ad5b0 "su->saAmfSUNumCurrStandbySIs > 0") at sysf_def.c:399
#3 0x0048964f in avd_su_dec_curr_stdby_si (su=0x727f70) at avd_su.c:1585
#4 0x0048b244 in avd_susi_update_assignment_counters (susi=0x767bf0, action=AVSV_SUSI_ACT_DEL, current_ha_state=0, new_ha_state=0) at avd_siass.c:730
#5 0x0048aff7 in avd_susi_del_send (susi=0x767bf0) at avd_siass.c:663
#6 0x00474bbc in avd_sg_nway_node_fail_stable (cb=0x6bdbe0, su=0x732130, susi=0x0) at avd_sgNWayfsm.c:3191
#7 0x00476257 in avd_sg_nway_node_fail_sg_realign (cb=0x6bdbe0, su=0x732130) at avd_sgNWayfsm.c:3645
#8 0x0046c82c in avd_sg_nway_node_fail_func (cb=0x6bdbe0, su=0x732130) at avd_sgNWayfsm.c:657
#9 0x0047ad65 in avd_node_susi_fail_func (cb=0x6bdbe0, avnd=0x6fef50) at avd_sgproc.c:2126
#10 0x00434f72 in avd_node_failover (node=0x6fef50) at avd_ndproc.c:776
#11 0x00431a80 in avd_mds_avnd_down_evh (cb=0x6bdbe0, evt=0x7f4578000ae0) at avd_ndfsm.c:407
#12 0x0043b57e in avd_process_event (cb_now=0x6bdbe0, evt=0x7f4578000ae0) at avd_proc.c:589
#13 0x0043b305 in avd_main_proc () at avd_proc.c:505
#14 0x00409210 in main (argc=2, argv=0x7fff87968c08) at amfd_main.c:47

Traces from the active controller are attached.

Changed 7 months ago by nagendra
Can you please test it on 4.2.2? I suspect that 2832 may solve the issue, as there were many CSI add/del operations before this issue occurred.

Changed 7 months ago by allasi
[tickets] [opensaf:tickets] #203 avsv: SG went to unstable state when active SU is locked after adding new SI in NWay RM
Attached 203.tgz contains:
- 203.xml: configuration to reproduce the issue.
- new_si_csi_add.sh: script to add the new SI.
- Traces.

Steps to reproduce:
1) immcfg -f 203.xml
2) Unlock and Unlock-in of SU1 and SU2.
3) ./new_si_csi_add.sh
4) Lock SU1
5) OpenSAF stop of payload hosting SU2.

---

**[tickets:#203] avsv: SG went to unstable state when active SU is locked after adding new SI in NWay RM**

**Status:** assigned
**Created:** Wed May 15, 2013 04:32 AM UTC by Praveen
**Last Updated:** Fri Sep 06, 2013 01:14 PM UTC
**Owner:** Praveen
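
The reproduction steps listed in this message could be scripted roughly as follows. This is only a sketch under stated assumptions. SU1's DN is taken from the syslog excerpt, while SU2's DN is a guess that follows the same naming pattern. The `amf-adm` operations are run as unlock-in followed by unlock, and the `/etc/init.d/opensafd stop` path for the final step is assumed from a typical OpenSAF 4.x installation. By default the script only prints the commands (`DRY_RUN=1`).

```shell
#!/bin/sh
# Sketch of the reproduction steps for ticket #203.
# DRY_RUN=1 (default) prints each command instead of executing it.
DRY_RUN="${DRY_RUN:-1}"

# SU1's DN appears in the syslog excerpt; SU2's DN is an ASSUMPTION.
SU1='safSu=d_NWay_1Norm_1,safSg=SG_d_n,safApp=N'
SU2='safSu=d_NWay_1Norm_2,safSg=SG_d_n,safApp=N'

# Print the command in dry-run mode, otherwise execute it.
run() {
    if [ "$DRY_RUN" = "1" ]; then
        echo "+ $*"
    else
        "$@"
    fi
}

repro() {
    # 1) Load the configuration from the attachment.
    run immcfg -f 203.xml
    # 2) Unlock-instantiate, then unlock, both SUs.
    for su in "$SU1" "$SU2"; do
        run amf-adm unlock-in "$su"
        run amf-adm unlock "$su"
    done
    # 3) Add the new SI and CSI (script from the attachment).
    run ./new_si_csi_add.sh
    # 4) Lock the active SU.
    run amf-adm lock "$SU1"
    # 5) Must be run on the payload hosting SU2; the path is an ASSUMPTION.
    echo "# on the payload hosting SU2: /etc/init.d/opensafd stop"
}

repro
```

Run it once with the default dry-run setting to review the command sequence, then set `DRY_RUN=0` on a disposable test cluster.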
[tickets] [opensaf:tickets] #203 avsv: SG went to unstable state when active SU is locked after adding new SI in NWay RM
- **Milestone**: future --> 4.4.FC

---

**[tickets:#203] avsv: SG went to unstable state when active SU is locked after adding new SI in NWay RM**

**Status:** assigned
**Created:** Wed May 15, 2013 04:32 AM UTC by Praveen
**Last Updated:** Wed May 15, 2013 04:33 AM UTC
**Owner:** Praveen

Changed 7 months ago by nagendra
Can you please test it on 4.2.2? I suspect that 2832 may solve the issue, as there were many CSI add/del operations before this issue occurred.

Changed 7 months ago by allasirisha
This is still seen with #2832 patch.

Changed 7 mon