---

**[tickets:#1165] Out of sync messages cause amfnd crash and cluster reboot**

**Status:** unassigned
**Milestone:** 4.4.2
**Created:** Thu Oct 09, 2014 11:11 AM UTC by surender khetavath
**Last Updated:** Thu Oct 09, 2014 11:11 AM UTC
**Owner:** nobody

Changeset : 6012
Setup : 2 controllers
Initially SC-1 active and SC-2 standby

Test:
1) On SC-1, do 'kill -STOP `pidof osafamfd`'
2) On SC-2, do 'amf-adm si-swap safSi=SC-2N,safApp=OpenSAF'
3) After a few seconds, on SC-1 do 'kill -CONT <pid of amfd>'
At this point the switchover succeeds. Then:
4) Again on SC-1, do 'kill -STOP <pid of amfd>'
5) On SC-2, do 'amf-adm si-swap safSi=SC-2N,safApp=OpenSAF'
6) After a few seconds, on SC-1 do 'kill -CONT <pid of amfd>'
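
The SC-1 side of the steps above (stop amfd, wait, continue it) could be scripted roughly as follows. This is only a sketch: the helper name pause_and_resume and the 5-second default delay are assumptions, not part of the original test, and the si-swap on SC-2 still has to be issued by hand while the process is stopped.

```shell
#!/bin/sh
# Hypothetical helper: stop a process, wait, then let it continue.
# $1 = pid to pause, $2 = seconds to stay stopped (assumed default: 5)
pause_and_resume() {
    pid="$1"
    delay="${2:-5}"
    # SIGSTOP freezes the process; give up if the pid does not exist
    kill -STOP "$pid" || return 1
    # Window in which 'amf-adm si-swap ...' is run on the other controller
    sleep "$delay"
    # SIGCONT lets the process run again
    kill -CONT "$pid"
}

# Example use on SC-1 (not executed here):
#   pause_and_resume "$(pidof osafamfd)" 5
```

Running the helper twice, with an si-swap on SC-2 during each pause, corresponds to steps 1-6.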

SC-2 now crashes and reboots.
SC-1 reboots later as well.

Syslog start time on SC-1 at 1st swap:
Oct  9 15:25:33 SC-1 osafamfd[8666]: NO safSi=SC-2N,safApp=OpenSAF Swap initiated
Syslog start time on SC-1 at 2nd swap:
Oct  9 15:25:35 SC-1 osafamfd[8666]: NO Controller switch over initiated

Errors seen in the SC-1 syslog after the second swap:

Oct  9 15:26:28 SC-1 opensaf_reboot: Rebooting remote node in the absence of PLM is outside the scope of OpenSAF
Oct  9 15:26:28 SC-1 osaflckd[8778]: ER GLD mbcsv chgrole failed
Oct  9 15:26:28 SC-1 osafevtd[8797]: ER MBCSv state change failed
Oct  9 15:26:28 SC-1 osafckptd[8806]: NO ERR_INVALID_PARAM: Implementer safCheckPointService already set for this handle when trying to set safCheckPointService
Oct  9 15:26:28 SC-1 osafckptd[8806]: ER cpd immOiImplmenterSet failed with err = 7
Oct  9 15:26:28 SC-1 osafamfnd[8676]: NO 'safComp=CPD,safSu=SC-1,safSg=2N,safApp=OpenSAF' faulted due to 'avaDown' : Recovery is 'nodeFailfast'
Oct  9 15:26:28 SC-1 osafamfnd[8676]: ER safComp=CPD,safSu=SC-1,safSg=2N,safApp=OpenSAF Faulted due to:avaDown Recovery is:nodeFailfast
Oct  9 15:26:28 SC-1 osafamfnd[8676]: Rebooting OpenSAF NodeId = 131343 EE Name = , Reason: Component faulted: recovery is node failfast, OwnNodeId = 131343, SupervisionTime = 60
Oct  9 15:26:28 SC-1 opensaf_reboot: Rebooting local node; timeout=60
Oct  9 15:26:28 SC-1 osafimmnd[8603]: NO Implementer locally disconnected. Marking it as doomed 40 <514, 2010f> (MsgQueueService131599)
Oct  9 15:26:28 SC-1 osafmsgd[8726]: NO ERR_INVALID_PARAM: Implementer safMsgGrpService already set for this handle when trying to set safMsgGrpService
Oct  9 15:26:28 SC-1 osafmsgd[8726]: ER mqd_imm_declare_implementer failed: err = 7
Oct  9 15:26:28 SC-1 osafmsgd[8726]: ER MBCSV ChangeRole Failed



Syslog start time on SC-2 at 1st swap:
Oct  9 15:25:55 SC-2 osafamfd[7552]: NO safSi=SC-2N,safApp=OpenSAF Swap initiated
Syslog start time on SC-2 at 2nd swap:
Oct  9 15:26:00 SC-2 osafamfd[7552]: NO Controller switch over initiated

Errors seen in the SC-2 syslog after the second swap:
Oct  9 15:26:16 SC-2 osafamfd[7552]: ER Out of sync detected in warm sync response, exiting
Oct  9 15:26:16 SC-2 osafamfd[7552]: ckpt_dec.cc:2766: avd_dec_warm_sync_rsp: Assertion '0' failed.
Oct  9 15:26:16 SC-2 osafamfnd[7562]: ER AMF director unexpectedly crashed
Oct  9 15:26:16 SC-2 osafamfnd[7562]: Rebooting OpenSAF NodeId = 131599 EE Name = , Reason: local AVD down(Adest) or both AVD down(Vdest) received, OwnNodeId = 131599, SupervisionTime = 60


(gdb) bt
#0  0x00007f394c52fb55 in raise () from /lib64/libc.so.6
#1  0x00007f394c531131 in abort () from /lib64/libc.so.6
#2  0x00007f394e2f3ffe in __osafassert_fail () from /usr/lib64/libopensaf_core.so.0
#3  0x00000000004154e6 in avd_dec_warm_sync_rsp(cl_cb_tag*, ncs_mbcsv_cb_dec*) () at ckpt_dec.cc:2766
#4  0x000000000040ca75 in avsv_mbcsv_process_dec_cb(cl_cb_tag*, ncs_mbcsv_cb_arg*) ()
#5  0x000000000040c2ec in avsv_mbcsv_cb(ncs_mbcsv_cb_arg*) ()
#6  0x00007f394e304036 in ncs_mbscv_rcv_decode () from /usr/lib64/libopensaf_core.so.0
#7  0x00007f394e3047ce in ncs_mbcsv_rcv_warm_sync_resp_cmplt () from /usr/lib64/libopensaf_core.so.0
#8  0x00007f394e30ac40 in mbcsv_process_events () from /usr/lib64/libopensaf_core.so.0
#9  0x00007f394e30adab in mbcsv_hdl_dispatch_all () from /usr/lib64/libopensaf_core.so.0
#10 0x00007f394e305782 in mbcsv_process_dispatch_request () at mbcsv_api.c:423
#11 0x000000000040d26a in avsv_mbcsv_dispatch(cl_cb_tag*, unsigned int) ()
#12 0x000000000044337e in main_loop() () at main.cc:698
#13 0x00000000004437de in main () at main.cc:830


---
