Hi Mathi/Mahesh,

First of all, thanks for helping me resolve this issue.
Do you need the CPA (application) itself or traces of the CPA? If it is traces, please let me know how to collect them.

Regards,
Girish

-----Original Message-----
From: Mathivanan Naickan Palanivelu [mailto:[email protected]]
Sent: Friday, February 20, 2015 3:55 PM
To: [email protected]
Cc: [email protected]; [email protected]
Subject: Re: [users] Issues with CPSv

Hi,

Please raise a ticket for this crash and share the traces of CPND and CPA (your application). Also, please specify a test case, or explain what the application is doing and at what point the crash occurs.

Thanks,
Mathi.

----- [email protected] wrote:

> Hi,
>
> I don’t get this issue with opensaf version 4.3, but I get segfault:
>
> application sometimes crashes, stack trace as below:
>
> Program received signal SIGSEGV, Segmentation fault.
> search (pTree=pTree@entry=0x8f733e4, key=key@entry=0xbfa0cdf8 "H\356\367\b") at patricia.c:94
> 94      patricia.c: No such file or directory.
> (gdb) bt
> #0  search (pTree=pTree@entry=0x8f733e4, key=key@entry=0xbfa0cdf8 "H\356\367\b") at patricia.c:94
> #1  0xb76d0bef in ncs_patricia_tree_get (pTree=pTree@entry=0x8f733e4, pKey=pKey@entry=0xbfa0cdf8 "H\356\367\b") at patricia.c:434
> #2  0xb7738493 in cpa_lcl_ckpt_node_get (lcl_ckpt_tree=lcl_ckpt_tree@entry=0x8f733e4, lc_hdl=lc_hdl@entry=0xbfa0cdf8, lc_node=lc_node@entry=0xbfa0ce10)
>     at cpa_db.c:195
> #3  0xb7734d76 in saCkptCheckpointWrite (checkpointHandle=150466120, ioVector=0x92c6d28, numberOfElements=1320,
>     erroneousVectorIndex=erroneousVectorIndex@entry=0xbfa0d35c) at cpa_api.c:3134
>
> (gdb) p pNode
> $2 = (NCS_PATRICIA_NODE *) 0x5e
> (gdb) p *pTree
> $4 = {root_node = {bit = -1, left = 0x8f7e9c0, right = 0x8f733e4, key_info = 0x8f734b8 ""}, params = {key_size = 8, info_size = 0, actual_key_size = 0,
>     node_size = 0}, n_nodes = 3}
>
> Regards,
> Girish
>
> *From:* Girish Nagaraj [mailto:[email protected]]
> *Sent:* Friday, February 20, 2015 3:34 PM
> *To:* 'A V Mahesh'; '[email protected]'
> *Subject:* RE: [users] Issues with CPSv
>
> Hi,
>
> Yes, similar issue in TCP also: exits with message:
>
> Feb 20 15:24:59 fedvm1 RIB[28549]: MDTM:socket_recv() = 0, conn lost with dh server, exiting library err :Success
> Feb 20 15:24:59 fedvm1 osafamfnd[28263]: NO 'safSu=SU1,safSg=zebos-simplex,safApp=zebos' component restart probation timer started (timeout: 4000000000 ns)
> Feb 20 15:24:59 fedvm1 osafamfnd[28263]: NO Restarting a component of 'safSu=SU1,safSg=zebos-simplex,safApp=zebos' (comp restart count: 1)
> Feb 20 15:24:59 fedvm1 osafamfnd[28263]: NO 'safComp=ribd,safSu=SU1,safSg=zebos-simplex,safApp=zebos' faulted due to 'avaDown' : Recovery is 'componentRestart'
>
> I experimented with code changes:
>
>     recd_bytes = recv(tcp_cb->DBSRsock, tcp_cb->len_buff, 2, MSG_NOSIGNAL);
>     if (0 == recd_bytes) {
>         syslog(LOG_ERR, "MDTM:socket_recv() = %d, conn lost with dh server, exiting library err 111:%d", recd_bytes, errno);
>         close(tcp_cb->DBSRsock);
>         exit(0);
>     } else if (2 == recd_bytes) {
>         uint16_t local_len_buf = 0;
>
>         data = tcp_cb->len_buff;
>         local_len_buf = ncs_decode_16bit(&data);
>
>         /* MY CHANGE START */
>         if (0 == local_len_buf)
>             return;
>         /* MY CHANGE END */
>
>         tcp_cb->buff_total_len = local_len_buf;
>         tcp_cb->num_by_read_for_len_buff = 2;
>
>         if (NULL == (tcp_cb->buffer = calloc(1, (local_len_buf + 1)))) {
>             /* Length + 2 is done to reuse the same buffer
>                while sending to other nodes */
>             syslog(LOG_ERR, "Memory allocation failed in dtm_intranode_processing");
>             return;
>         }
>         recd_bytes = recv(tcp_cb->DBSRsock, tcp_cb->buffer, local_len_buf, 0);
>         if (recd_bytes < 0) {
>             return;
>         } else if (0 == recd_bytes) {
>             syslog(LOG_ERR, "MDTM:socket_recv() = %d, conn lost with dh server, exiting library err 222:%d len:%d", recd_bytes, errno, local_len_buf);
>             close(tcp_cb->DBSRsock);
>             exit(0);
>
> This caused many other issues, so I think just returning won’t work.
>
> Regards,
> Girish
>
> -----Original Message-----
> From: A V Mahesh [mailto:[email protected]]
> Sent: Friday, February 20, 2015 1:38 PM
> To: Girish Nagaraj; [email protected]
> Subject: Re: [users] Issues with CPSv
>
> Hi,
>
> On 2/20/2015 1:19 PM, Girish Nagaraj wrote:
> > > Hi,
> > >
> > > I think this is not a connection loss. We are passing 0 (the number of bytes to be read) to the recv() function, which returns 0 received bytes.
>
> You mean you are seeing an issue similar to `TIPC ticket #1227 mds/tipc : protect mds application form zero bytes hacking messages` for TCP as well?
>
> -AVM
>
> > > local_len_buf = ncs_decode_16bit(&data);
> > >
> > > Is there a mistake in decoding local_len_buf?
> > >
> > > Regards,
> > > Girish
> > >
> > > -----Original Message-----
> > > From: A V Mahesh [mailto:[email protected]]
> > > Sent: Friday, February 20, 2015 11:03 AM
> > > To: [email protected]
> > > Subject: Re: [users] Issues with CPSv
> > >
> > > Hi,
> > >
> > > On 2/19/2015 3:42 PM, Girish Nagaraj wrote:
> > >> local_len_buf turns out to be 0, this causes recv() to return 0 and
> > >> the application exits. Is this a programming bug?
> > > This is expected behavior: if any connection loss happens on a TCP
> > > socket, recv() will receive zero bytes. This is not related to CPSv.
> > >
> > > -AVM
> > >
> > > On 2/19/2015 3:42 PM, Girish Nagaraj wrote:
> > >> Hi,
> > >>
> > >> *Background*:
> > >>
> > >> Opensaf version: 4.5
> > >> Number of checkpoints used: 2
> > >> In our application we use CPSv to save application data. When the
> > >> application faults, it is restarted and its state is restored by
> > >> reading data from the checkpoints.
> > >> Model: Simplex
> > >>
> > >> *Issue faced:*
> > >>
> > >> application sometimes crashes, stack trace as below:
> > >>
> > >> Program received signal SIGSEGV, Segmentation fault.
> > >> search (pTree=pTree@entry=0x8f733e4, key=key@entry=0xbfa0cdf8 "H\356\367\b") at patricia.c:94
> > >> 94      patricia.c: No such file or directory.
> > >> (gdb) bt
> > >> #0  search (pTree=pTree@entry=0x8f733e4, key=key@entry=0xbfa0cdf8 "H\356\367\b") at patricia.c:94
> > >> #1  0xb76d0bef in ncs_patricia_tree_get (pTree=pTree@entry=0x8f733e4, pKey=pKey@entry=0xbfa0cdf8 "H\356\367\b") at patricia.c:434
> > >> #2  0xb7738493 in cpa_lcl_ckpt_node_get (lcl_ckpt_tree=lcl_ckpt_tree@entry=0x8f733e4, lc_hdl=lc_hdl@entry=0xbfa0cdf8, lc_node=lc_node@entry=0xbfa0ce10)
> > >>     at cpa_db.c:195
> > >> #3  0xb7734d76 in saCkptCheckpointWrite (checkpointHandle=150466120, ioVector=0x92c6d28, numberOfElements=1320,
> > >>     erroneousVectorIndex=erroneousVectorIndex@entry=0xbfa0d35c) at cpa_api.c:3134
> > >>
> > >> (gdb) p pNode
> > >> $2 = (NCS_PATRICIA_NODE *) 0x5e
> > >> (gdb) p *pTree
> > >> $4 = {root_node = {bit = -1, left = 0x8f7e9c0, right = 0x8f733e4, key_info = 0x8f734b8 ""}, params = {key_size = 8, info_size = 0, actual_key_size = 0,
> > >>     node_size = 0}, n_nodes = 3}
> > >>
> > >> sometimes application exits with below message:
> > >>
> > >> Feb 19 15:13:31 controller2 RIB[28395]: MDTM:socket_recv() = 0, conn lost with dh server, exiting library err:0 len:0
> > >> Feb 19 15:13:31 controller2 osafamfnd[28110]: NO 'safSu=SU1,safSg=zebos-simplex,safApp=zebos' component restart probation timer started (timeout: 4000000000 ns)
> > >> Feb 19 15:13:31 controller2 osafamfnd[28110]: NO Restarting a component of 'safSu=SU1,safSg=zebos-simplex,safApp=zebos' (comp restart count: 1)
> > >>
> > >> Below is the modified code snippet from file
> > >> osaf/libs/core/mds/mds_dt_trans.c
> > >>
> > >>     } else if (2 == recd_bytes) {
> > >>         uint16_t local_len_buf = 0;
> > >>
> > >>         data = tcp_cb->len_buff;
> > >>         local_len_buf = ncs_decode_16bit(&data);
> > >>         tcp_cb->buff_total_len = local_len_buf;
> > >>         tcp_cb->num_by_read_for_len_buff = 2;
> > >>
> > >>         if (NULL == (tcp_cb->buffer = calloc(1, (local_len_buf + 1)))) {
> > >>             /* Length + 2 is done to reuse the same buffer
> > >>                while sending to other nodes */
> > >>             syslog(LOG_ERR, "Memory allocation failed in dtm_intranode_processing");
> > >>             return;
> > >>         }
> > >>         recd_bytes = recv(tcp_cb->DBSRsock, tcp_cb->buffer, local_len_buf, 0);
> > >>         if (recd_bytes < 0) {
> > >>             return;
> > >>         } else if (0 == recd_bytes) {
> > >>             syslog(LOG_ERR, "MDTM:socket_recv() = %d, conn lost with dh server, exiting library err:%d len:%d", recd_bytes, errno, local_len_buf);
> > >>             close(tcp_cb->DBSRsock);
> > >>             exit(0);    /* <<<<<<< EXITS HERE >>>>>>>>>> */
> > >>         } else if (local_len_buf > recd_bytes) {
> > >>             /* can happen only in two cases, system call interrupt or half data, */
> > >>             TRACE("less data recd, recd bytes = %d, actual len = %d", recd_bytes, local_len_buf);
> > >>             tcp_cb->bytes_tb_read = tcp_cb->buff_total_len - recd_bytes;
> > >>             return;
> > >>
> > >> local_len_buf turns out to be 0, this causes recv() to return 0 and
> > >> the application exits. Is this a programming bug?
> > >>
> > >> Could someone please help to resolve these issues.
> > >>
> > >> Regards,
> > >> Girish
> > >
> > > ------------------------------------------------------------------------------
> > > Download BIRT iHub F-Type - The Free Enterprise-Grade BIRT Server from Actuate!
> > > Instantly Supercharge Your Business Reports and Dashboards with
> > > Interactivity, Sharing, Native Excel Exports, App Integration & more
> > > Get technology previously reserved for billion-dollar corporations, FREE
> > > http://pubads.g.doubleclick.net/gampad/clk?id=190641631&iu=/4140/ostg.clktrk
> > > _______________________________________________
> > > Opensaf-users mailing list
> > > [email protected]
> > > https://lists.sourceforge.net/lists/listinfo/opensaf-users
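
P.S. To make the zero-length question from the quoted thread easier to discuss, here is a small standalone sketch. It is not the OpenSAF code and not a proposed patch. It only shows one way a reader of length-prefixed frames can tell "header says 0 bytes" apart from "peer closed the connection", by never calling recv() with a length of 0. The names read_full, read_frame, classify_demo_frames and the frame_status codes are made up for this example, and it assumes a 2-byte big-endian length prefix (which is my understanding of what ncs_decode_16bit() decodes).

```c
#include <assert.h>
#include <errno.h>
#include <stdint.h>
#include <sys/socket.h>
#include <sys/types.h>
#include <unistd.h>

/* Hypothetical result codes for this sketch (not OpenSAF names). */
enum frame_status { FRAME_OK, FRAME_EMPTY, FRAME_PEER_CLOSED, FRAME_ERROR };

/* Read exactly len bytes from a stream socket.
 * Returns 1 on success, 0 if the peer closed, -1 on error.
 * With len == 0 the loop never runs, so recv() is not called at all. */
int read_full(int fd, uint8_t *buf, size_t len)
{
	size_t got = 0;

	while (got < len) {
		ssize_t n = recv(fd, buf + got, len - got, 0);
		if (n == 0)
			return 0;		/* orderly shutdown by the peer */
		if (n < 0) {
			if (errno == EINTR)
				continue;	/* interrupted, try again */
			return -1;
		}
		got += (size_t)n;
	}
	return 1;
}

/* Read one frame: 2-byte big-endian length (assumed), then the payload. */
enum frame_status read_frame(int fd, uint8_t *payload, size_t cap,
			     uint16_t *out_len)
{
	uint8_t hdr[2];
	int rc;

	assert(payload != NULL && out_len != NULL);

	rc = read_full(fd, hdr, sizeof hdr);
	if (rc == 0)
		return FRAME_PEER_CLOSED;
	if (rc < 0)
		return FRAME_ERROR;

	*out_len = (uint16_t)((hdr[0] << 8) | hdr[1]);
	if (*out_len == 0)
		return FRAME_EMPTY;	/* a zero length is not a disconnect */
	if (*out_len > cap)
		return FRAME_ERROR;	/* payload would not fit */

	rc = read_full(fd, payload, *out_len);
	if (rc == 0)
		return FRAME_PEER_CLOSED;
	return rc < 0 ? FRAME_ERROR : FRAME_OK;
}

/* Demo: send an empty frame and a 3-byte frame over a socketpair,
 * close the writer, then classify three reads. Returns 0 on success. */
int classify_demo_frames(enum frame_status out[3])
{
	static const uint8_t wire[] = { 0, 0, 0, 3, 'a', 'b', 'c' };
	uint8_t payload[16];
	uint16_t len = 0;
	int sv[2];

	if (socketpair(AF_UNIX, SOCK_STREAM, 0, sv) != 0)
		return -1;
	if (send(sv[0], wire, sizeof wire, 0) != (ssize_t)sizeof wire) {
		close(sv[0]);
		close(sv[1]);
		return -1;
	}
	close(sv[0]);			/* writer goes away after two frames */

	for (int i = 0; i < 3; i++)
		out[i] = read_frame(sv[1], payload, sizeof payload, &len);
	close(sv[1]);
	return 0;
}
```

Calling classify_demo_frames() from a small main() should report the first frame as empty, the second as a normal frame, and the third read as a closed peer. What the transport should actually do with an empty frame (skip it, log it, or treat it as a protocol error) is a separate decision that this sketch does not answer.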
