---
**[tickets:#1165] Out of sync messages cause amfnd crash and cluster reboot**
**Status:** unassigned
**Milestone:** 4.4.2
**Created:** Thu Oct 09, 2014 11:11 AM UTC by surender khetavath
**Last Updated:** Thu Oct 09, 2014 11:11 AM UTC
**Owner:** nobody
Changeset : 6012
Setup : 2 controllers
Initially SC-1 active and SC-2 standby
Test:
1) On SC-1, do 'kill -STOP `pidof osafamfd`'
2) On SC-2, do 'amf-adm si-swap safSi=SC-2N,safApp=OpenSAF'
3) After a few seconds, on SC-1 do 'kill -CONT <pid of amfd>'
At this point the switchover succeeds. Then:
4) Again on SC-1, do 'kill -STOP <pid of amfd>'
5) On SC-2, do 'amf-adm si-swap safSi=SC-2N,safApp=OpenSAF'
6) After a few seconds, do 'kill -CONT <pid of amfd>' on SC-1
Now SC-2 crashes and reboots.
Later SC-1 also reboots.
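The six steps above could be scripted roughly as follows. This is only a sketch, not something that was run for this ticket: run_on, swap_with_stalled_amfd, DRY_RUN and PAUSE are hypothetical names, and it assumes SC-1 and SC-2 are reachable over ssh as root under those host names. By default it only prints the commands; set DRY_RUN=0 to actually execute them.

```shell
#!/bin/sh
# Sketch of the reproduction steps. All helper names here are hypothetical.
DRY_RUN="${DRY_RUN:-1}"   # 1 = only print the commands (default), 0 = run them
PAUSE="${PAUSE:-5}"       # "a few seconds" between the swap and kill -CONT

run_on() {
    # run_on <node> <command>: print the command, or run it on the node via ssh
    # (assumption: root ssh access to the controllers)
    node="$1"; shift
    if [ "$DRY_RUN" = "1" ]; then
        echo "[$node] $*"
    else
        ssh "root@$node" "$*"
    fi
}

swap_with_stalled_amfd() {
    # Stall amfd on SC-1, request the swap from SC-2, then resume amfd.
    # $(pidof osafamfd) is single-quoted so it expands on the remote node.
    run_on SC-1 'kill -STOP $(pidof osafamfd)'
    run_on SC-2 'amf-adm si-swap safSi=SC-2N,safApp=OpenSAF'
    [ "$DRY_RUN" = "1" ] || sleep "$PAUSE"
    run_on SC-1 'kill -CONT $(pidof osafamfd)'
}

swap_with_stalled_amfd    # steps 1-3: switchover succeeds
[ "$DRY_RUN" = "1" ] || sleep "$PAUSE"
swap_with_stalled_amfd    # steps 4-6: SC-2 crashed at this point in the report
```

Keeping the default at DRY_RUN=1 is deliberate, since a real run is expected to reboot both controllers.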
Syslog start time on SC-1 at 1st swap:
Oct 9 15:25:33 SC-1 osafamfd[8666]: NO safSi=SC-2N,safApp=OpenSAF Swap initiated
Syslog start time on SC-1 at 2nd swap:
Oct 9 15:25:35 SC-1 osafamfd[8666]: NO Controller switch over initiated
After the second swap, errors seen in SC-1 syslog:
Oct 9 15:26:28 SC-1 opensaf_reboot: Rebooting remote node in the absence of PLM is outside the scope of OpenSAF
Oct 9 15:26:28 SC-1 osaflckd[8778]: ER GLD mbcsv chgrole failed
Oct 9 15:26:28 SC-1 osafevtd[8797]: ER MBCSv state change failed
Oct 9 15:26:28 SC-1 osafckptd[8806]: NO ERR_INVALID_PARAM: Implementer safCheckPointService already set for this handle when trying to set safCheckPointService
Oct 9 15:26:28 SC-1 osafckptd[8806]: ER cpd immOiImplmenterSet failed with err = 7
Oct 9 15:26:28 SC-1 osafamfnd[8676]: NO 'safComp=CPD,safSu=SC-1,safSg=2N,safApp=OpenSAF' faulted due to 'avaDown' : Recovery is 'nodeFailfast'
Oct 9 15:26:28 SC-1 osafamfnd[8676]: ER safComp=CPD,safSu=SC-1,safSg=2N,safApp=OpenSAF Faulted due to:avaDown Recovery is:nodeFailfast
Oct 9 15:26:28 SC-1 osafamfnd[8676]: Rebooting OpenSAF NodeId = 131343 EE Name = , Reason: Component faulted: recovery is node failfast, OwnNodeId = 131343, SupervisionTime = 60
Oct 9 15:26:28 SC-1 opensaf_reboot: Rebooting local node; timeout=60
Oct 9 15:26:28 SC-1 osafimmnd[8603]: NO Implementer locally disconnected. Marking it as doomed 40 <514, 2010f> (MsgQueueService131599)
Oct 9 15:26:28 SC-1 osafmsgd[8726]: NO ERR_INVALID_PARAM: Implementer safMsgGrpService already set for this handle when trying to set safMsgGrpService
Oct 9 15:26:28 SC-1 osafmsgd[8726]: ER mqd_imm_declare_implementer failed: err = 7
Oct 9 15:26:28 SC-1 osafmsgd[8726]: ER MBCSV ChangeRole Failed
Syslog start time on SC-2 at 1st swap:
Oct 9 15:25:55 SC-2 osafamfd[7552]: NO safSi=SC-2N,safApp=OpenSAF Swap initiated
Syslog start time on SC-2 at 2nd swap:
Oct 9 15:26:00 SC-2 osafamfd[7552]: NO Controller switch over initiated
After the second swap, errors seen in SC-2 syslog:
Oct 9 15:26:16 SC-2 osafamfd[7552]: ER Out of sync detected in warm sync response, exiting
Oct 9 15:26:16 SC-2 osafamfd[7552]: ckpt_dec.cc:2766: avd_dec_warm_sync_rsp: Assertion '0' failed.
Oct 9 15:26:16 SC-2 osafamfnd[7562]: ER AMF director unexpectedly crashed
Oct 9 15:26:16 SC-2 osafamfnd[7562]: Rebooting OpenSAF NodeId = 131599 EE Name = , Reason: local AVD down(Adest) or both AVD down(Vdest) received, OwnNodeId = 131599, SupervisionTime = 60
(gdb) bt
#0 0x00007f394c52fb55 in raise () from /lib64/libc.so.6
#1 0x00007f394c531131 in abort () from /lib64/libc.so.6
#2 0x00007f394e2f3ffe in __osafassert_fail () from /usr/lib64/libopensaf_core.so.0
#3 0x00000000004154e6 in avd_dec_warm_sync_rsp(cl_cb_tag*, ncs_mbcsv_cb_dec*) () at ckpt_dec.cc:2766
#4 0x000000000040ca75 in avsv_mbcsv_process_dec_cb(cl_cb_tag*, ncs_mbcsv_cb_arg*) ()
#5 0x000000000040c2ec in avsv_mbcsv_cb(ncs_mbcsv_cb_arg*) ()
#6 0x00007f394e304036 in ncs_mbscv_rcv_decode () from /usr/lib64/libopensaf_core.so.0
#7 0x00007f394e3047ce in ncs_mbcsv_rcv_warm_sync_resp_cmplt () from /usr/lib64/libopensaf_core.so.0
#8 0x00007f394e30ac40 in mbcsv_process_events () from /usr/lib64/libopensaf_core.so.0
#9 0x00007f394e30adab in mbcsv_hdl_dispatch_all () from /usr/lib64/libopensaf_core.so.0
#10 0x00007f394e305782 in mbcsv_process_dispatch_request () at mbcsv_api.c:423
#11 0x000000000040d26a in avsv_mbcsv_dispatch(cl_cb_tag*, unsigned int) ()
#12 0x000000000044337e in main_loop() () at main.cc:698
#13 0x00000000004437de in main () at main.cc:830
---