This osafamfd crash has been observed in our lab several times, and it is easy
to trigger. We have some applications started by opensaf. If one application
fails, opensaf tries to restart it. If the application then fails to restart,
osafamfd always crashes on the same assert at di.cc line 569.

Here is the syslog:

Aug 27 13:34:00 slot4-MW984 osafamfnd[22493]: NO 
'safComp=ZookeeperSector_PL-4,safSu=ZookeeperSectorSU_PL-4,safSg=ZookeeperSectorSG,safApp=ZookeeperSectorApp'
 faulted due to 'passiveMonitorFailed' : Recovery is 'componentRestart'
Aug 27 13:34:00 slot4-MW984 zookeeper_sector_clean: Cleanup for CompName: 
safComp=ZookeeperSector_PL-4,safSu=ZookeeperSectorSU_PL-4,safSg=ZookeeperSectorSG,safApp=ZookeeperSectorApp
Aug 27 13:34:00 slot4-MW984 charon: 02[KNL] 169.254.91.248 disappeared from 
bond0
Aug 27 13:34:01 slot4-MW984 CRON[26217]: (root) CMD 
(/usr/share/platform-config/atca/update-ssh-keys)
Aug 27 13:34:05 slot4-MW984 zookeeper_sector_clean: Cleanup Complete for 
CompName: 
safComp=ZookeeperSector_PL-4,safSu=ZookeeperSectorSU_PL-4,safSg=ZookeeperSectorSG,safApp=ZookeeperSectorApp
Aug 27 13:34:05 slot4-MW984 zookeeper_sector_inst: copying 
zkCleanup.movik.sector.sh to /usr/share/zookeeper/bin
Aug 27 13:34:05 slot4-MW984 zookeeper_sector_inst: zkId=2
Aug 27 13:34:05 slot4-MW984 zookeeper_sector_inst: using myId 2
Aug 27 13:34:05 slot4-MW984 zookeeper_sector_inst: executing script for type 
sector interface bond0:2
Aug 27 13:34:05 slot4-MW984 zookeeper_sector_inst: performing zkCleanup of 
/var/lib/zookeeper/movik.sector/
Aug 27 13:34:05 slot4-MW984 zookeeper_sector_inst: MVK_ZK_IP_BASE = 
169.254.91.247
Aug 27 13:34:05 slot4-MW984 zookeeper_sector_inst: MVK_ZK_IP_MASK = 
255.255.255.0
Aug 27 13:34:05 slot4-MW984 zookeeper_sector_inst: MVK_ZK_IP_CNT = 3
Aug 27 13:34:05 slot4-MW984 zookeeper_sector_inst: MVK_ZK_EXT_PORT = 2889
Aug 27 13:34:05 slot4-MW984 zookeeper_sector_inst: MVK_ZK_INT_PORT = 3889
Aug 27 13:34:05 slot4-MW984 zookeeper_sector_inst: MY_IP = 169.254.91.248
Aug 27 13:34:05 slot4-MW984 charon: 02[KNL] 169.254.91.248 appeared on bond0
Aug 27 13:34:05 slot4-MW984 charon: 02[KNL] 169.254.91.248 disappeared from 
bond0
Aug 27 13:34:05 slot4-MW984 charon: 02[KNL] 169.254.91.248 appeared on bond0
Aug 27 13:34:05 slot4-MW984 zookeeper_sector_inst: copying 
zookeeper_environment.movik.sector to /etc/zookeeper/conf
Aug 27 13:34:05 slot4-MW984 zookeeper_sector_inst: copying zkServer.sh to 
/usr/share/zookeeper/bin
Aug 27 13:34:05 slot4-MW984 zookeeper_sector_inst: overwriting 
/etc/zookeeper/conf/conf.movik.sector/zoo.movik.sector.cfg
Aug 27 13:34:05 slot4-MW984 zookeeper_sector_inst: touching file 
/var/run/zookeeper.movik.sector/zookeeper_server.pid
Aug 27 13:34:05 slot4-MW984 zookeeper_sector_inst: 
\nserver.1=169.254.91.247:2889:3889\nserver.2=169.254.91.248:2889:3889\nserver.3=169.254.91.249:2889:3889\n
Aug 27 13:34:05 slot4-MW984 zookeeper_sector_inst: Instantiating CompName: 
safComp=ZookeeperSector_PL-4,safSu=ZookeeperSectorSU_PL-4,safSg=ZookeeperSectorSG,safApp=ZookeeperSectorApp
Aug 27 13:34:06 slot4-MW984 zookeeper_sector_inst: 
COMP_PID_MAP_FILE=/var/run/zookeeper.movik.sector/zookeeper_server.pid, 
PID=26336
Aug 27 13:34:06 slot4-MW984 amfpm: saAmfPmStart FAILED 12
Aug 27 13:34:06 slot4-MW984 osafamfnd[22493]: NO Instantiation of 
'safComp=ZookeeperSector_PL-4,safSu=ZookeeperSectorSU_PL-4,safSg=ZookeeperSectorSG,safApp=ZookeeperSectorApp'
 failed
Aug 27 13:34:06 slot4-MW984 osafamfnd[22493]: NO Reason:'Exec of script 
success, but script exits with non-zero status'
Aug 27 13:34:06 slot4-MW984 osafamfnd[22493]: NO Exit code: 1
Aug 27 13:34:06 slot4-MW984 zookeeper_sector_clean: Cleanup for CompName: 
safComp=ZookeeperSector_PL-4,safSu=ZookeeperSectorSU_PL-4,safSg=ZookeeperSectorSG,safApp=ZookeeperSectorApp
Aug 27 13:34:06 slot4-MW984 charon: 02[KNL] 169.254.91.248 disappeared from 
bond0
Aug 27 13:34:11 slot4-MW984 zookeeper_sector_clean: Cleanup Complete for 
CompName: 
safComp=ZookeeperSector_PL-4,safSu=ZookeeperSectorSU_PL-4,safSg=ZookeeperSectorSG,safApp=ZookeeperSectorApp

Aug 27 13:34:17 slot4-MW984 zookeeper_sector_clean: Cleanup Complete for 
CompName: 
safComp=ZookeeperSector_PL-4,safSu=ZookeeperSectorSU_PL-4,safSg=ZookeeperSectorSG,safApp=ZookeeperSectorApp
Aug 27 13:34:17 slot4-MW984 osafamfnd[22493]: WA 
'safComp=ZookeeperSector_PL-4,safSu=ZookeeperSectorSU_PL-4,safSg=ZookeeperSectorSG,safApp=ZookeeperSectorApp'
 Presence State RESTARTING => INSTANTIATION_FAILED
Aug 27 13:34:17 slot4-MW984 osafamfnd[22493]: NO Component Failover trigerred 
for 
'safSu=ZookeeperSectorSU_PL-4,safSg=ZookeeperSectorSG,safApp=ZookeeperSectorApp':
 Failed component: 
'safComp=ZookeeperSector_PL-4,safSu=ZookeeperSectorSU_PL-4,safSg=ZookeeperSectorSG,safApp=ZookeeperSectorApp'
Aug 27 13:34:17 slot4-MW984 osafamfnd[22493]: NO 
'safSu=ZookeeperSectorSU_PL-4,safSg=ZookeeperSectorSG,safApp=ZookeeperSectorApp'
 Presence State INSTANTIATED => INSTANTIATION_FAILED
Aug 27 13:34:17 slot4-MW984 osafamfnd[22493]: 
/home/ksenlu/sandbox/klu_main/cae/extern/opensaf4/opensaf-4.4.0/osaf/services/saf/amf/amfnd/di.cc:569:
 avnd_di_susi_resp_send: Assertion 'm_AVND_SU_IS_ASSIGN_PEND(su)' failed.
Aug 27 13:34:17 slot4-MW984 compress-core.sh: Running 
/etc/compressed-coredump.d/001_kdp_bypass_for_gtppx_crash
Aug 27 13:34:17 slot4-MW984 osafamfwd[22526]: Rebooting OpenSAF NodeId = 0 EE 
Name = No EE Mapped, Reason: AMF unexpectedly crashed, OwnNodeId = 66561, 
SupervisionTime = 60
Aug 27 13:34:17 slot4-MW984 osafimmnd[22172]: AL AMF Node Director is down, 
terminate this process
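
For triage, the presence state transitions can be pulled out of a wrapped syslog excerpt like the one above with a short script. This is only a sketch: it assumes that every entry starts with a "Mon DD HH:MM:SS" timestamp, that any other non-empty line is a continuation produced by mail wrapping, and that wrapping happened at spaces. The names `unwrap` and `presence_transitions` are made up for this illustration and are not part of opensaf.

```python
import re

# Assumption: a new syslog entry always begins with "Mon DD HH:MM:SS ".
ENTRY_START = re.compile(r"^[A-Z][a-z]{2} [ \d]\d \d{2}:\d{2}:\d{2} ")

# Matches osafamfnd lines of the form:
#   <timestamp> <host> osafamfnd[pid]: <level> '<DN>' Presence State A => B
TRANSITION = re.compile(
    r"^(?P<ts>\w{3} [ \d]\d \d{2}:\d{2}:\d{2}) \S+ osafamfnd\[\d+\]: \w+ "
    r"'(?P<dn>[^']+)' Presence State (?P<old>\w+) => (?P<new>\w+)"
)


def unwrap(text):
    """Join mail-wrapped continuation lines back onto their syslog entry."""
    entries = []
    for raw in text.splitlines():
        line = raw.strip()
        if not line:
            continue
        if ENTRY_START.match(line) or not entries:
            entries.append(line)
        else:
            # Continuation line: re-attach it with a single space.
            entries[-1] += " " + line
    return entries


def presence_transitions(text):
    """Return (timestamp, first RDN of the DN, old state, new state) tuples."""
    found = []
    for entry in unwrap(text):
        m = TRANSITION.match(entry)
        if m:
            rdn = m.group("dn").split(",")[0]
            found.append((m.group("ts"), rdn, m.group("old"), m.group("new")))
    return found
```

Run against the excerpt above, this should report the component going RESTARTING => INSTANTIATION_FAILED and the SU going INSTANTIATED => INSTANTIATION_FAILED, both at 13:34:17, which is the same second in which the assertion is logged.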




---

** [tickets:#1025] osafamfd crashed during restart failure **

**Status:** unassigned
**Milestone:** 4.3.3
**Created:** Wed Aug 27, 2014 05:31 PM UTC by KANG-SEN LU
**Last Updated:** Thu Aug 28, 2014 02:46 PM UTC
**Owner:** nobody

We are running opensaf 4.4.0.

Here is a gdb stack trace of the osafamfd crash:

    ==========================
    (gdb) bt
    #0  0x00007f457067f425 in __GI_raise (sig=<optimized out>)
        at ../nptl/sysdeps/unix/sysv/linux/raise.c:64
    #1  0x00007f4570682b8b in __GI_abort () at abort.c:91
    #2  0x00007f4572105f21 in __osafassert_fail (
        __file=0x448498 "/home/ksenlu/sandbox/klu_main/cae/extern/opensaf4/opensaf-4.4.0/osaf/services/saf/amf/amfnd/di.cc",
        line=569,
        func=0x4488b0 <avnd_di_susi_resp_send(avnd_cb_tag*, avnd_su_tag*, avnd_su_si_rec*)::__FUNCTION__> "avnd_di_susi_resp_send",
        __assertion=0x44837a "m_AVND_SU_IS_ASSIGN_PEND(su)")
        at /home/ksenlu/sandbox/klu_main/cae/extern/opensaf4/opensaf-4.4.0/osaf/libs/core/leap/sysf_def.c:278
    #3  0x0000000000427a42 in avnd_di_susi_resp_send (cb=cb@entry=0x65e4a0 <_avnd_cb>,
        su=su@entry=0x2444980, si=si@entry=0x2439720)
        at /home/ksenlu/sandbox/klu_main/cae/extern/opensaf4/opensaf-4.4.0/osaf/services/saf/amf/amfnd/di.cc:569
    #4  0x0000000000438c21 in avnd_su_pres_st_chng_prc (
        final_st=SA_AMF_PRESENCE_INSTANTIATION_FAILED,
        prv_st=SA_AMF_PRESENCE_INSTANTIATED, su=0x2444980,
        cb=0x65e4a0 <_avnd_cb>)
        at /home/ksenlu/sandbox/klu_main/cae/extern/opensaf4/opensaf-4.4.0/osaf/services/saf/amf/amfnd/susm.cc:1608
    #5  avnd_su_pres_fsm_run (cb=cb@entry=0x65e4a0 <_avnd_cb>,
        su=0x2444980, comp=comp@entry=0x2444bb0, ev=<optimized out>)
        at /home/ksenlu/sandbox/klu_main/cae/extern/opensaf4/opensaf-4.4.0/osaf/services/saf/amf/amfnd/susm.cc:1394
    #6  0x00000000004188b3 in avnd_comp_clc_st_chng_prc (
        cb=cb@entry=0x65e4a0 <_avnd_cb>, comp=comp@entry=0x2444bb0,
        prv_st=prv_st@entry=SA_AMF_PRESENCE_RESTARTING,
        final_st=final_st@entry=SA_AMF_PRESENCE_INSTANTIATION_FAILED)
        at /home/ksenlu/sandbox/klu_main/cae/extern/opensaf4/opensaf-4.4.0/osaf/services/saf/amf/amfnd/clc.cc:1298
    #7  0x000000000041a512 in avnd_comp_clc_fsm_run (
        cb=cb@entry=0x65e4a0 <_avnd_cb>, comp=comp@entry=0x2444bb0,
        ev=<optimized out>)
        at /home/ksenlu/sandbox/klu_main/cae/extern/opensaf4/opensaf-4.4.0/osaf/services/saf/amf/amfnd/clc.cc:862
    #8  0x000000000041aa39 in avnd_evt_clc_resp_evh (cb=0x65e4a0 <_avnd_cb>,
        evt=0x7f45640008c0)
        at /home/ksenlu/sandbox/klu_main/cae/extern/opensaf4/opensaf-4.4.0/osaf/services/saf/amf/amfnd/clc.cc:416
    #9  0x000000000042c23c in avnd_evt_process (evt=0x7f45640008c0)
        at /home/ksenlu/sandbox/klu_main/cae/extern/opensaf4/opensaf-4.4.0/osaf/services/saf/amf/amfnd/main.cc:678
    #10 avnd_main_process ()
        at /home/ksenlu/sandbox/klu_main/cae/extern/opensaf4/opensaf-4.4.0/osaf/services/saf/amf/amfnd/main.cc:619
    #11 0x0000000000405328 in main (argc=1, argv=0x7fff7bcfc988)
        at /home/ksenlu/sandbox/klu_main/cae/extern/opensaf4/opensaf-4.4.0/osaf/services/saf/amf/amfnd/main.cc:178
    (gdb)
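
To compare this trace against other core dumps of the same failure, it can help to reduce a backtrace to its call chain. The sketch below assumes the usual gdb layout where each frame starts with `#N` and ends with `at <file>:<line>`; it takes the last `file:line` token of each frame, so it also tolerates long paths that were wrapped by mail. The name `call_chain` is hypothetical and chosen for this illustration only.

```python
import re

# Assumption: frames begin with "#<number>" as in standard gdb "bt" output.
FRAME_START = re.compile(r"^\s*#(\d+)\s+")
# Optional "0x... in " prefix, then the function name, then " (".
FUNC = re.compile(r"^(?:0x[0-9a-f]+ in )?(\S+) \(")
# Last whitespace-free token of the frame that looks like "path:line".
LOCATION = re.compile(r"(\S+):(\d+)\s*$")


def call_chain(bt_text):
    """Return (frame number, function, source file basename, line) tuples."""
    frames = []
    for raw in bt_text.splitlines():
        line = raw.strip()
        m = FRAME_START.match(raw)
        if m:
            frames.append([int(m.group(1)), raw[m.end():].strip()])
        elif frames and line and line != "(gdb)":
            # Continuation of the current frame (arguments or location).
            frames[-1][1] += " " + line

    chain = []
    for number, body in frames:
        func = FUNC.match(body)
        loc = LOCATION.search(body)
        chain.append((
            number,
            func.group(1) if func else None,
            loc.group(1).rsplit("/", 1)[-1] if loc else None,
            int(loc.group(2)) if loc else None,
        ))
    return chain
```

Two cores whose chains match from `avnd_evt_clc_resp_evh` down to `avnd_di_susi_resp_send` at di.cc:569 are very likely the same failure, though that is a heuristic rather than proof.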



