- **status**: unassigned --> assigned
- **assigned_to**: Praveen


---

**[tickets:#2320] clm: standby clmd crashes due to missing node information**

**Status:** assigned
**Milestone:** 5.2.RC2
**Created:** Wed Feb 22, 2017 12:55 PM UTC by Zoran Milinkovic
**Last Updated:** Fri Feb 24, 2017 10:06 AM UTC
**Owner:** Praveen


The standby CLMD service (osafclmd on SC-2) crashed on an assertion in ckpt_proc_node_rec because the PL-3 node information was missing from its database.

syslog from SC-2:
~~~
Feb 13 00:43:31 SC-2-2 osafamfd[5082]: NO Cold sync complete!
Feb 13 00:43:31 SC-2-2 osafimmd[5009]: NO SBY: Ruling epoch noted as:5
Feb 13 00:43:31 SC-2-2 osafimmd[5009]: NO IMMND coord at 2010f
Feb 13 00:43:31 SC-2-2 osafimmd[5009]: NO SBY: SaImmRepositoryInitModeT changed and noted as 'SA_IMM_KEEP_REPOSITORY'
Feb 13 00:43:31 SC-2-2 osafimmnd[5024]: NO NODE STATE-> IMM_NODE_R_AVAILABLE
Feb 13 00:43:31 SC-2-2 osafimmnd[5024]: NO NODE STATE-> IMM_NODE_FULLY_AVAILABLE 19082
Feb 13 00:43:31 SC-2-2 osafimmnd[5024]: NO Epoch set to 5 in ImmModel
Feb 13 00:43:31 SC-2-2 osafimmd[5009]: NO SBY: New Epoch for IMMND process at node 2020f old epoch: 4  new epoch:5
Feb 13 00:43:31 SC-2-2 osafimmd[5009]: NO SBY: New Epoch for IMMND process at node 2010f old epoch: 4  new epoch:5
Feb 13 00:43:31 SC-2-2 osafimmd[5009]: NO IMMND coord at 2010f
Feb 13 00:43:31 SC-2-2 osafimmd[5009]: NO SBY: New Epoch for IMMND process at node 2030f old epoch: 0  new epoch:5
Feb 13 00:43:31 SC-2-2 osafclmd[5066]: ER Node is NULL,problem with the database.
Feb 13 00:43:31 SC-2-2 osafclmd[5066]: ../../opensaf/src/clm/clmd/clms_mbcsv.c:468: ckpt_proc_node_rec: Assertion '0' failed.
Feb 13 00:43:32 SC-2-2 osafamfnd[5096]: NO 'safComp=CLM,safSu=SC-2,safSg=2N,safApp=OpenSAF' faulted due to 'avaDown' : Recovery is 'nodeFailfast'
Feb 13 00:43:32 SC-2-2 osafamfnd[5096]: ER safComp=CLM,safSu=SC-2,safSg=2N,safApp=OpenSAF Faulted due to:avaDown Recovery is:nodeFailfast
Feb 13 00:43:32 SC-2-2 osafamfnd[5096]: Rebooting OpenSAF NodeId = 131599 EE Name = , Reason: Component faulted: recovery is node failfast, OwnNodeId = 131599, SupervisionTime = 60
Feb 13 00:43:32 SC-2-2 opensaf_reboot: Rebooting local node; timeout=60
~~~
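
When reading the syslog above, note that the reboot line prints the node ID in decimal (131599) while the IMMD lines print node IDs in hex (2010f, 2020f, 2030f). The short Python sketch below converts between the two so the lines can be correlated. The helper names are made up for illustration, and treating the second-lowest byte as a slot number is only an assumption inferred from the three IDs in this log, not something the ticket states.

```python
def node_id_hex(node_id: int) -> str:
    """Render a decimal node ID in the lowercase hex form used by the IMMD log lines."""
    return format(node_id, "x")


def slot_byte(node_id: int) -> int:
    """Return the second-lowest byte of the node ID.

    Assumption: this byte is the slot number, inferred only from the IDs
    2010f, 2020f and 2030f seen in this syslog.
    """
    return (node_id >> 8) & 0xFF


if __name__ == "__main__":
    own_node_id = 131599  # "OwnNodeId = 131599" from the reboot line on SC-2
    print(node_id_hex(own_node_id))
    for hex_id in ("2010f", "2020f", "2030f"):
        print(hex_id, slot_byte(int(hex_id, 16)))
```

Under that assumption, 2030f would be the node in slot 3, and its `old epoch: 0` entry appears in the same second as the CLMD assertion, which may be worth checking against the PL-3 join time.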

Coredump:
~~~
[New LWP 5066]
[New LWP 5069]
[New LWP 5068]
[New LWP 5070]
[Thread debugging using libthread_db enabled]
Using host libthread_db library "/lib64/libthread_db.so.1".
Core was generated by `/usr/lib64/opensaf/osafclmd'.
Program terminated with signal SIGABRT, Aborted.
#0  0x00007fbc7be880c7 in raise () from /lib64/libc.so.6
### BT ###
#0  0x00007fbc7be880c7 in raise () from /lib64/libc.so.6
#1  0x00007fbc7be89478 in abort () from /lib64/libc.so.6
#2  0x00007fbc7c85202e in __osafassert_fail (__file=__file@entry=0x7fbc7e1b7d50 "../../opensaf/src/clm/clmd/clms_mbcsv.c", __line=__line@entry=468, __func=__func@entry=0x7fbc7e1b8820 <__FUNCTION__.12739> "ckpt_proc_node_rec", __assertion=__assertion@entry=0x7fbc7e1b78ea "0") at ../../opensaf/src/base/sysf_def.c:281
#3  0x00007fbc7e1aa016 in ckpt_proc_node_rec (cb=<optimized out>, data=0x7fbc7f218a50) at ../../opensaf/src/clm/clmd/clms_mbcsv.c:468
#4  0x00007fbc7e1ae044 in ckpt_decode_async_update (cbk_arg=<optimized out>, cb=0x7fbc7e3be100 <_clms_cb>) at ../../opensaf/src/clm/clmd/clms_mbcsv.c:2310
#5  ckpt_decode_cbk_handler (cbk_arg=0x7fff27b6b1a0) at ../../opensaf/src/clm/clmd/clms_mbcsv.c:1997
#6  mbcsv_callback (arg=0x7fff27b6b1a0) at ../../opensaf/src/clm/clmd/clms_mbcsv.c:719
#7  0x00007fbc7c856f76 in ncs_mbscv_rcv_decode (peer=peer@entry=0x7fbc7f217a60, evt=evt@entry=0x7fbc740036e0) at ../../opensaf/src/mbc/mbcsv_act.c:393
#8  0x00007fbc7c857146 in ncs_mbcsv_rcv_async_update (peer=0x7fbc7f217a60, evt=0x7fbc740036e0) at ../../opensaf/src/mbc/mbcsv_act.c:440
#9  0x00007fbc7c85dd30 in mbcsv_process_events (rcvd_evt=0x7fbc740036e0, mbcsv_hdl=mbcsv_hdl@entry=4293918753) at ../../opensaf/src/mbc/mbcsv_pr_evts.c:168
#10 0x00007fbc7c85de9b in mbcsv_hdl_dispatch_all (mbcsv_hdl=4293918753, mbx=mbx@entry=4283432961) at ../../opensaf/src/mbc/mbcsv_pr_evts.c:272
#11 0x00007fbc7c8586c2 in mbcsv_process_dispatch_request (arg=0x7fff27b6b310) at ../../opensaf/src/mbc/mbcsv_api.c:423
#12 0x00007fbc7e1aa7be in clms_mbcsv_dispatch (mbcsv_hdl=<optimized out>) at ../../opensaf/src/clm/clmd/clms_mbcsv.c:687
#13 0x00007fbc7e19e4e4 in main (argc=<optimized out>, argv=<optimized out>) at ../../opensaf/src/clm/clmd/clms_main.c:535
### BT FULL ###
#0  0x00007fbc7be880c7 in raise () from /lib64/libc.so.6
No symbol table info available.
#1  0x00007fbc7be89478 in abort () from /lib64/libc.so.6
No symbol table info available.
#2  0x00007fbc7c85202e in __osafassert_fail (__file=__file@entry=0x7fbc7e1b7d50 
"../../opensaf/src/clm/clmd/clms_mbcsv.c", __line=__line@entry=468, 
__func=__func@entry=0x7fbc7e1b8820 <__FUNCTION__.12739> "ckpt_proc_node_rec", 
__assertion=__assertion@entry=0x7fbc7e1b78ea "0") at 
../../opensaf/src/base/sysf_def.c:281
No locals.
#3  0x00007fbc7e1aa016 in ckpt_proc_node_rec (cb=<optimized out>, 
data=0x7fbc7f218a50) at ../../opensaf/src/clm/clmd/clms_mbcsv.c:468
        param = 0x7fbc7f218a60
        node = 0x0
        ip = 0x0
        __FUNCTION__ = "ckpt_proc_node_rec"
#4  0x00007fbc7e1ae044 in ckpt_decode_async_update (cbk_arg=<optimized out>, 
cb=0x7fbc7e3be100 <_clms_cb>) at ../../opensaf/src/clm/clmd/clms_mbcsv.c:2310
        ckpt_cluster_rec = <optimized out>
        rc = 1
        num_bytes = <optimized out>
        hdr = 0x7fbc7f218a50
        ckpt_finalize_rec = <optimized out>
        ckpt_node_rec = <optimized out>
        ckpt_node_config_rec = <optimized out>
        ckpt_node_del_rec = <optimized out>
        ckpt_node_down_rec = <optimized out>
        ckpt_msg = 0x7fbc7f218a50
        ckpt_client_rec = <optimized out>
        ckpt_csync_node_rec = <optimized out>
        ckpt_agent_down = <optimized out>
#5  ckpt_decode_cbk_handler (cbk_arg=0x7fff27b6b1a0) at 
../../opensaf/src/clm/clmd/clms_mbcsv.c:1997
        rc = 1
        msg_fmt_version = 1
#6  mbcsv_callback (arg=0x7fff27b6b1a0) at 
../../opensaf/src/clm/clmd/clms_mbcsv.c:719
        rc = 1
        __FUNCTION__ = "mbcsv_callback"
#7  0x00007fbc7c856f76 in ncs_mbscv_rcv_decode (peer=peer@entry=0x7fbc7f217a60, 
evt=evt@entry=0x7fbc740036e0) at ../../opensaf/src/mbc/mbcsv_act.c:393
        parg = {
          i_op = NCS_MBCSV_CBOP_DEC,
          i_client_hdl = 0,
          i_ckpt_hdl = 4292870177,
          info = {
            encode = {
              io_msg_type = NCS_MBCSV_MSG_ASYNC_UPDATE,
              io_action = NCS_MBCSV_ACT_ADD,
              io_reo_type = 6,
              io_reo_hdl = 0,
              io_uba = {
                start = 0x0,
                ub = 0x0,
                bufp = 0x7000000000 <error: Cannot access memory at address 
0x7000000000>,
                res = 112,
                ttl = 0,
                max = 2132894108
              },
              io_req_context = 9209973925752930305,
              i_peer_version = 21264
            },
            decode = {
              i_msg_type = NCS_MBCSV_MSG_ASYNC_UPDATE,
              i_action = NCS_MBCSV_ACT_ADD,
              i_reo_type = 6,
              i_uba = {
                start = 0x0,
                ub = 0x0,
                bufp = 0x0,
                res = 0,
                ttl = 112,
                max = 112
              },
              o_req_context = 140447563473308,
              i_peer_version = 1
            },
            peer = {
              i_service = NCS_SERVICE_ID_LEAP_TMR,
              i_peer_version = 1
            },
            notify = {
              i_uba = {
                start = 0x100000001,
                ub = 0x6,
                bufp = 0x0,
                res = 0,
                ttl = 0,
                max = 0
              },
              i_peer_version = 0,
              i_msg = 0x70
            },
            error = {
              i_code = NCS_MBCSV_WARM_SYNC_TMR_EXP,
              i_err = true,
              i_arg = 0x6,
              i_peer_version = 0
            }
          }
        }
        status = 2
        mbc_inst = 0x7fbc7f2057f0
        __FUNCTION__ = "ncs_mbscv_rcv_decode"
#8  0x00007fbc7c857146 in ncs_mbcsv_rcv_async_update (peer=0x7fbc7f217a60, 
evt=0x7fbc740036e0) at ../../opensaf/src/mbc/mbcsv_act.c:440
        __FUNCTION__ = "ncs_mbcsv_rcv_async_update"
#9  0x00007fbc7c85dd30 in mbcsv_process_events (rcvd_evt=0x7fbc740036e0, 
mbcsv_hdl=mbcsv_hdl@entry=4293918753) at 
../../opensaf/src/mbc/mbcsv_pr_evts.c:168
        mbc_reg = 0x7fbc7f2057f0
        peer = <optimized out>
        ckpt = <optimized out>
        hdl_to_give = 4291821601
        __FUNCTION__ = "mbcsv_process_events"
#10 0x00007fbc7c85de9b in mbcsv_hdl_dispatch_all (mbcsv_hdl=4293918753, 
mbx=mbx@entry=4283432961) at ../../opensaf/src/mbc/mbcsv_pr_evts.c:272
        rcvd_evt = <optimized out>
        rc = 1
        __FUNCTION__ = "mbcsv_hdl_dispatch_all"
#11 0x00007fbc7c8586c2 in mbcsv_process_dispatch_request (arg=0x7fff27b6b310) 
at ../../opensaf/src/mbc/mbcsv_api.c:423
        mbc_reg = <optimized out>
        rc = SA_AIS_OK
        mail_box = 4283432961
        __FUNCTION__ = "mbcsv_process_dispatch_request"
#12 0x00007fbc7e1aa7be in clms_mbcsv_dispatch (mbcsv_hdl=<optimized out>) at 
../../opensaf/src/clm/clmd/clms_mbcsv.c:687
        mbcsv_arg = {
          i_op = NCS_MBCSV_OP_DISPATCH,
          i_mbcsv_hdl = 4293918753,
          info = {
            initialize = {
              i_mbcsv_cb = 0x2,
              i_version = 0,
              i_service = NCS_SERVICE_ID_BASE,
              o_mbcsv_hdl = 0
            },
            sel_obj_get = {
              o_select_obj = 2
            },
            dispatch = {
              i_disp_flags = SA_DISPATCH_ALL
            },
            finalize = {
              i_dummy = 2
            },
            open = {
              i_pwe_hdl = 2,
              i_client_hdl = 0,
              o_ckpt_hdl = 0
            },
            close = {
              i_ckpt_hdl = 2
            },
            chg_role = {
              i_ckpt_hdl = 2,
              i_ha_state = 0
            },
            send_ckpt = {
              i_ckpt_hdl = 2,
              i_send_type = NCS_MBCSV_SND_SYNC,
              i_reo_type = 0,
              i_reo_hdl = 0,
              i_action = NCS_MBCSV_ACT_DONT_CARE,
              io_no_peer = false
            },
            send_notify = {
              i_ckpt_hdl = 2,
              i_msg_dest = NCS_MBCSV_ACTIVE,
              i_msg = 0x0
            },
            send_data_req = {
              i_ckpt_hdl = 2,
              i_uba = {
                start = 0x0,
                ub = 0x0,
                bufp = 0x0,
                res = 0,
                ttl = 0,
                max = 0
              }
            },
            obj_set = {
              i_ckpt_hdl = 2,
              i_obj = NCS_MBCSV_OBJ_WARM_SYNC_ON_OFF,
              i_val = 0
            },
            obj_get = {
              i_ckpt_hdl = 2,
              i_obj = NCS_MBCSV_OBJ_WARM_SYNC_ON_OFF,
              o_val = 0
            }
          }
        }
#13 0x00007fbc7e19e4e4 in main (argc=<optimized out>, argv=<optimized out>) at 
../../opensaf/src/clm/clmd/clms_main.c:535
        mbx_fd = <optimized out>
        error = <optimized out>
        rc = <optimized out>
        term_fd = 17
        timeout = <optimized out>
        __FUNCTION__ = "main"
### THREAD APPLY ALL BT ###

Thread 4 (Thread 0x7fbc7e333b00 (LWP 5070)):
#0  0x00007fbc7bf2fbfd in poll () from /lib64/libc.so.6
#1  0x00007fbc7c84e361 in poll (__timeout=30000, __nfds=1, 
__fds=0x7fbc7e3331f0) at /usr/include/bits/poll2.h:46
#2  osaf_ppoll (io_fds=io_fds@entry=0x7fbc7e3331f0, i_nfds=i_nfds@entry=1, 
i_timeout_ts=i_timeout_ts@entry=0x7fbc7e3331c0, i_sigmask=i_sigmask@entry=0x0) 
at ../../opensaf/src/base/osaf_poll.c:105
#3  0x00007fbc7c84e4fb in osaf_poll (io_fds=io_fds@entry=0x7fbc7e3331f0, 
i_nfds=i_nfds@entry=1, i_timeout=i_timeout@entry=30000) at 
../../opensaf/src/base/osaf_poll.c:44
#4  0x00007fbc7c84e545 in osaf_poll_one_fd (i_fd=11, i_timeout=30000) at 
../../opensaf/src/base/osaf_poll.c:128
#5  0x00007fbc7c87f2d7 in rda_read_msg (sockfd=11, msg=msg@entry=0x7fbc7e333290 
"10 2", size=64) at ../../opensaf/src/rde/agent/rda_papi.cc:673
#6  0x00007fbc7c87f564 in rda_callback_task (rda_callback_cb=0x7fbc7f2034a0) at 
../../opensaf/src/rde/agent/rda_papi.cc:150
#7  0x00007fbc7c2030a4 in start_thread () from /lib64/libpthread.so.0
#8  0x00007fbc7bf3802d in clone () from /lib64/libc.so.6

Thread 3 (Thread 0x7fbc7e373b00 (LWP 5068)):
#0  0x00007fbc7bf2fbfd in poll () from /lib64/libc.so.6
#1  0x00007fbc7c84e361 in poll (__timeout=5100, __nfds=1, __fds=0x7fbc7e373260) 
at /usr/include/bits/poll2.h:46
#2  osaf_ppoll (io_fds=io_fds@entry=0x7fbc7e373260, i_nfds=i_nfds@entry=1, 
i_timeout_ts=0x7fbc7e373280, i_sigmask=i_sigmask@entry=0x0) at 
../../opensaf/src/base/osaf_poll.c:105
#3  0x00007fbc7c85549f in ncs_tmr_wait () at 
../../opensaf/src/base/sysf_tmr.c:414
#4  0x00007fbc7c2030a4 in start_thread () from /lib64/libpthread.so.0
#5  0x00007fbc7bf3802d in clone () from /lib64/libc.so.6

Thread 2 (Thread 0x7fbc7e353b00 (LWP 5069)):
#0  0x00007fbc7bf2fbfd in poll () from /lib64/libc.so.6
#1  0x00007fbc7c87c9cf in poll (__timeout=20000, __nfds=3, 
__fds=0x7fbc7e353280) at /usr/include/bits/poll2.h:46
#2  mdtm_process_recv_events () at ../../opensaf/src/mds/mds_dt_tipc.c:684
#3  0x00007fbc7c2030a4 in start_thread () from /lib64/libpthread.so.0
#4  0x00007fbc7bf3802d in clone () from /lib64/libc.so.6

Thread 1 (Thread 0x7fbc7e3ab740 (LWP 5066)):
#0  0x00007fbc7be880c7 in raise () from /lib64/libc.so.6
#1  0x00007fbc7be89478 in abort () from /lib64/libc.so.6
#2  0x00007fbc7c85202e in __osafassert_fail (__file=__file@entry=0x7fbc7e1b7d50 
"../../opensaf/src/clm/clmd/clms_mbcsv.c", __line=__line@entry=468, 
__func=__func@entry=0x7fbc7e1b8820 <__FUNCTION__.12739> "ckpt_proc_node_rec", 
__assertion=__assertion@entry=0x7fbc7e1b78ea "0") at 
../../opensaf/src/base/sysf_def.c:281
#3  0x00007fbc7e1aa016 in ckpt_proc_node_rec (cb=<optimized out>, 
data=0x7fbc7f218a50) at ../../opensaf/src/clm/clmd/clms_mbcsv.c:468
#4  0x00007fbc7e1ae044 in ckpt_decode_async_update (cbk_arg=<optimized out>, 
cb=0x7fbc7e3be100 <_clms_cb>) at ../../opensaf/src/clm/clmd/clms_mbcsv.c:2310
#5  ckpt_decode_cbk_handler (cbk_arg=0x7fff27b6b1a0) at 
../../opensaf/src/clm/clmd/clms_mbcsv.c:1997
#6  mbcsv_callback (arg=0x7fff27b6b1a0) at 
../../opensaf/src/clm/clmd/clms_mbcsv.c:719
#7  0x00007fbc7c856f76 in ncs_mbscv_rcv_decode (peer=peer@entry=0x7fbc7f217a60, 
evt=evt@entry=0x7fbc740036e0) at ../../opensaf/src/mbc/mbcsv_act.c:393
#8  0x00007fbc7c857146 in ncs_mbcsv_rcv_async_update (peer=0x7fbc7f217a60, 
evt=0x7fbc740036e0) at ../../opensaf/src/mbc/mbcsv_act.c:440
#9  0x00007fbc7c85dd30 in mbcsv_process_events (rcvd_evt=0x7fbc740036e0, 
mbcsv_hdl=mbcsv_hdl@entry=4293918753) at 
../../opensaf/src/mbc/mbcsv_pr_evts.c:168
#10 0x00007fbc7c85de9b in mbcsv_hdl_dispatch_all (mbcsv_hdl=4293918753, 
mbx=mbx@entry=4283432961) at ../../opensaf/src/mbc/mbcsv_pr_evts.c:272
#11 0x00007fbc7c8586c2 in mbcsv_process_dispatch_request (arg=0x7fff27b6b310) 
at ../../opensaf/src/mbc/mbcsv_api.c:423
#12 0x00007fbc7e1aa7be in clms_mbcsv_dispatch (mbcsv_hdl=<optimized out>) at 
../../opensaf/src/clm/clmd/clms_mbcsv.c:687
#13 0x00007fbc7e19e4e4 in main (argc=<optimized out>, argv=<optimized out>) at 
../../opensaf/src/clm/clmd/clms_main.c:535
~~~

