Re: [OpenSIPS-Devel] [opensips] error in infinite loop "io_watch_del: BUG - trying to del fd 34 with flags 2 1" (#591)
Hi @rgupta0110 , Manage to replicate the bug, now working on the fix. The problem is that somehow epoll_wait generates EPOLLERR event(strerror(errno)="Bad file descriptor") so we will try to come with the fix as soon as possible. --- Reply to this email directly or view it on GitHub: https://github.com/OpenSIPS/opensips/issues/591#issuecomment-154346077___ Devel mailing list Devel@lists.opensips.org http://lists.opensips.org/cgi-bin/mailman/listinfo/devel
Re: [OpenSIPS-Devel] [opensips] error in infinite loop "io_watch_del: BUG - trying to del fd 34 with flags 2 1" (#591)
@rgupta0110 i would be very thankful if you could try this patch[0] and tell me if it solves your problem. It solved the problem for me. Regards, Ionut Ionita [0] https://gist.github.com/ionutrazvanionita/86efca1c13c6654c2a74 --- Reply to this email directly or view it on GitHub: https://github.com/OpenSIPS/opensips/issues/591#issuecomment-154398863___ Devel mailing list Devel@lists.opensips.org http://lists.opensips.org/cgi-bin/mailman/listinfo/devel
Re: [OpenSIPS-Devel] [opensips] error in infinite loop io_watch_del: BUG - trying to del fd 34 with flags 2 1 (#591)
Hi Bogdan, I have uploaded the tcpdump traces for the issue along with core, binaries and similar log files http://www88.zippyshare.com/v/kgfvUJ9b/file.html Opensips IP - 10.205.235.5 (port 5060) CallManager - 10.205.235.5 (Port 5059) endpoint - 10.205.203.130 Thanks Rahul Gupta --- Reply to this email directly or view it on GitHub: https://github.com/OpenSIPS/opensips/issues/591#issuecomment-133458933___ Devel mailing list Devel@lists.opensips.org http://lists.opensips.org/cgi-bin/mailman/listinfo/devel
Re: [OpenSIPS-Devel] [opensips] error in infinite loop io_watch_del: BUG - trying to del fd 34 with flags 2 1 (#591)
Hi Bogdan, this is inside our lab and it won’t be possible to give you the access. However, I can work with you to put any debug binaries or code changes and test it out. Let me know if that’s something we can do. Thanks Rahul Gupta From: Bogdan Andrei IANCU [mailto:notificati...@github.com] Sent: Friday, August 14, 2015 10:13 AM To: OpenSIPS/opensips Cc: Gupta, Rahul Subject: Re: [opensips] error in infinite loop io_watch_del: BUG - trying to del fd 34 with flags 2 1 (#591) @rgupta0110https://github.com/rgupta0110 thank you for your effort. In order to speed up the fixing, is it possible to get access to an opensips where this can be reproduced (once again, I couldn't reproduce it on my side) - so I can trigger the testing and have opensips im my hands to change and debug ? If you can manage to provide this, I guess I will manage to trace the problem and have it fixed in max 1 day. — Reply to this email directly or view it on GitHubhttps://github.com/OpenSIPS/opensips/issues/591#issuecomment-131117313. -- DISCLAIMER: This e-mail may contain information that is confidential, privileged or otherwise protected from disclosure. If you are not an intended recipient of this e-mail, do not duplicate or redistribute it by any means. Please delete it and any attachments and notify the sender that you have received it in error. Unintended recipients are prohibited from taking action on the basis of information in this e-mail.E-mail messages may contain computer viruses or other defects, may not be accurately replicated on other systems, or may be intercepted, deleted or interfered with without the knowledge of the sender or the intended recipient. If you are not comfortable with the risks associated with e-mail messages, you may decide not to use e-mail to communicate with IPC. IPC reserves the right, to the extent and under circumstances permitted by applicable law, to retain, monitor and intercept e-mail messages to and from its systems. --- Reply to this email directly or view it on GitHub: https://github.com/OpenSIPS/opensips/issues/591#issuecomment-131811375___ Devel mailing list Devel@lists.opensips.org http://lists.opensips.org/cgi-bin/mailman/listinfo/devel
Re: [OpenSIPS-Devel] [opensips] error in infinite loop io_watch_del: BUG - trying to del fd 34 with flags 2 1 (#591)
Rahul, the missing element for me is how the TCP conn gets terminated while in the SYN_SENT state (after attempting to connect to the disconnected party). What is actually happening with the conn while in SYN_SENT ? I get is not opensips the one destroying itis there some TCP timeout in kernel ? Maybe a tcpdump capture will help to show all the packages in that TCP conn. Thanks and regards, Bogdan --- Reply to this email directly or view it on GitHub: https://github.com/OpenSIPS/opensips/issues/591#issuecomment-131891540___ Devel mailing list Devel@lists.opensips.org http://lists.opensips.org/cgi-bin/mailman/listinfo/devel
Re: [OpenSIPS-Devel] [opensips] error in infinite loop io_watch_del: BUG - trying to del fd 34 with flags 2 1 (#591)
@rgupta0110 thank you for your effort. In order to speed up the fixing, is it possible to get access to an opensips where this can be reproduced (once again, I couldn't reproduce it on my side) - so I can trigger the testing and have opensips im my hands to change and debug ? If you can manage to provide this, I guess I will manage to trace the problem and have it fixed in max 1 day. --- Reply to this email directly or view it on GitHub: https://github.com/OpenSIPS/opensips/issues/591#issuecomment-131117313___ Devel mailing list Devel@lists.opensips.org http://lists.opensips.org/cgi-bin/mailman/listinfo/devel
Re: [OpenSIPS-Devel] [opensips] error in infinite loop io_watch_del: BUG - trying to del fd 34 with flags 2 1 (#591)
The last logs you provided actually indicate that the poll event was an EPOLLIN, while that fd is added only for EPOLLOUT :-/ . This is way, even if your fix hides the effects, it is not dealing with the actual problem. --- Reply to this email directly or view it on GitHub: https://github.com/OpenSIPS/opensips/issues/591#issuecomment-131118983___ Devel mailing list Devel@lists.opensips.org http://lists.opensips.org/cgi-bin/mailman/listinfo/devel
Re: [OpenSIPS-Devel] [opensips] error in infinite loop io_watch_del: BUG - trying to del fd 34 with flags 2 1 (#591)
Hi Bogdan, I have reproduced the issue and uploaded the core file along with opensips bins,libs and other logs at the following location http://www81.zippyshare.com/v/LdXDN5r6/file.html Call flow and timestamps : 1) Tue Aug 11 11:36:54 --- created a TCP connection to send out SIP NOTIFY to 10.205.203.130 2) Tue Aug 11 11:37:07 --- unplugged the network cable from the endpoint 10.205.203.130 3) Tue Aug 11 11:39:38 --- tcp connection timed out (FYI tcp_connection_lifetime=120 in opensips.cfg) 4) Tue Aug 11 11:41:08 --- Tried another call to the same end point while the network cable was unplugged. opensips created another TCP connection to send another SIP NOTIFY, netstat showed this connection in SYN_SENT state 5) 2015-08-11T11:42:10.000 --- the above connection went away from net stat and opensips started printed BUG - trying to del fd 34 with flags 2 1 ABORT and it generated the uploaded core file following is the backtrace #0 0x7f217ed2a925 in raise () from /lib64/libc.so.6 Missing separate debuginfos, use: debuginfo-install glibc-2.12-1.132.el6.x86_64 keyutils-libs-1.4-4.el6.x86_64 krb5-libs-1.10.3-33.el6.x86_64 libcom_err-1.41.12-18.el6.x86_64 libselinux-2.0.94-5.3.el6_4.1.x86_64 openssl-1.0.1e-30.el6.x86_64 zlib-1.2.3-29.el6.x86_64 (gdb) bt #0 0x7f217ed2a925 in raise () from /lib64/libc.so.6 #1 0x7f217ed2c105 in abort () from /lib64/libc.so.6 #2 0x004aa981 in io_watch_del (fd=34, idx=value optimized out, flags=0, sock_flags=1, h=0x7e6d00) at io_wait.h:608 #3 0x004b049b in handle_tcpconn_ev (tcpconn=0x7f216b26eb00, fd_i=-1, event_type=value optimized out) at tcp_main.c:1560 #4 0x004b7eba in io_wait_loop_epoll () at io_wait.h:846 #5 tcp_main_loop () at tcp_main.c:2138 #6 0x00438058 in main_loop (argc=value optimized out, argv=value optimized out) at main.c:1053 #7 main (argc=value optimized out, argv=value optimized out) at main.c:1634 Thanks Rahul Gupta --- Reply to this email directly or view it on GitHub: https://github.com/OpenSIPS/opensips/issues/591#issuecomment-129952925___ Devel mailing list Devel@lists.opensips.org http://lists.opensips.org/cgi-bin/mailman/listinfo/devel
Re: [OpenSIPS-Devel] [opensips] error in infinite loop io_watch_del: BUG - trying to del fd 34 with flags 2 1 (#591)
Hi Bogdan, I will generate the core file and upload the logs and core soon. Meanwhile, I have put the following fix which seems working and I haven't experienced any issues so far. Could you please review it and let me know if there is any potential problem Added the following condition in tcp_main.c where its calling io_watch_del in the particular failure scenario inside the method handle_tcpconn_ev() if(tcpconn-flags F_CONN_NOT_CONNECTED) { if (io_watch_del(io_h, tcpconn-s, fd_i, 0,IO_WATCH_WRITE)==-1) return -1; } This F_CONN_NOT_CONNECTED flag is only added in case of ASYNC_CONNECT where the connection is being added with IO_WATCH_WRITE mode. The difference between file code is: [root@rg-rhel-65 opensips-1.11.5-tls]# diff tcp_main.c.orig tcp_main.c 1565,1566c1565,1575 if (io_watch_del(io_h, tcpconn-s, fd_i, 0,IO_WATCH_READ)==-1) return -1; --- if(tcpconn-flags F_CONN_NOT_CONNECTED) { if (io_watch_del(io_h, tcpconn-s, fd_i, 0,IO_WATCH_WRITE)==-1) return -1; } else { if (io_watch_del(io_h, tcpconn-s, fd_i, 0,IO_WATCH_READ)==-1) return -1; } Thanks Rahul --- Reply to this email directly or view it on GitHub: https://github.com/OpenSIPS/opensips/issues/591#issuecomment-129476952___ Devel mailing list Devel@lists.opensips.org http://lists.opensips.org/cgi-bin/mailman/listinfo/devel
Re: [OpenSIPS-Devel] [opensips] error in infinite loop io_watch_del: BUG - trying to del fd 34 with flags 2 1 (#591)
@rgupta0110 , that may be a simplistic way of fixing the effect, but without understanding the cause. As I mentioned, in my tests I get a completely different behavior, so fixing the effects may not bit a generic solution. Please let me know about the ERROR logs that I asked you to enable and about the backtrace. Thanks, Bogdan --- Reply to this email directly or view it on GitHub: https://github.com/OpenSIPS/opensips/issues/591#issuecomment-129516728___ Devel mailing list Devel@lists.opensips.org http://lists.opensips.org/cgi-bin/mailman/listinfo/devel
Re: [OpenSIPS-Devel] [opensips] error in infinite loop io_watch_del: BUG - trying to del fd 34 with flags 2 1 (#591)
@rgupta0110 , I did a lot of testing and research on this. Still I was not able to reproduce the the behavior you describe (at least not the last step). While I have the conn in SYN_SENT, if (A) conn is expired by opensips, it is simply removed or if (b) the kernel is timing it out (due low tcp_syn_retries), I get the connection %p fd %d is now writable message and it is closed. What is really strange in your case is the fact that the FD never gets added to poll with the READ indication (if you grep in your log for FD 34 in proc 13168) for io_watch_xxx ops, you will see that the FD is added only for WRITE: DBG:core:io_watch_add: io_watch_add op on 34 (0x7e5f40, 34, 2, 0x7f18fc1809e8,2), fd_no=27 Still, the triggering on FD 34: DBG:core:handle_tcpconn_ev: data available on 0x7f18fc1809e8 34 shows a READ triggering (handle_tcpconn_ev is triggered by epoll only with IO_WATCH_READ.. So, I my only logical assumption is that we deal with a mis-mapping of the epoll event to the IO_WATCH event. What I suspect is the EPOLLHUP epoll event, which according to the man : Hang up happened on the associated file descriptor. epoll_wait(2) will always wait for this event; it is not necessary to set it in events. Note that when reading from a channel such as a pipe or a stream socket, this event merely indicates that the peer closed its end of the channel. Subsequent reads from the channel will return 0 (end of file) only after all outstanding data in the channel has been consumed. So, even if added by epoll_ctl() for EPOLLOUT event, our FD reports the EPOLLHUP which is automatically translated by the io_wait epoll implementation (see io_wait_loop_epoll()) to a IO_WATCH_READ event. Which leads to the described bug. Now, I need to confirm this . For this, in io_wait_loop_epoll(), line 833, change the #if 0 into an #if 1 to print the actual epoll events. Also, in io_watch_del(), after the LM_ERR printing the bug log BUG - trying to del fd %d with flags %d %d (line 606), add an abort(); to force opensips to core dump when getting there - this will help us to investigate the whole function's stack. Please do the above changes, recompile and run the test again. Be sure with the ulimit -c unlimited in order to get a core file. Send me the opensips logs again, plus the backtrace from the core file. Thanks and regards, Bogdan --- Reply to this email directly or view it on GitHub: https://github.com/OpenSIPS/opensips/issues/591#issuecomment-129471918___ Devel mailing list Devel@lists.opensips.org http://lists.opensips.org/cgi-bin/mailman/listinfo/devel
Re: [OpenSIPS-Devel] [opensips] error in infinite loop io_watch_del: BUG - trying to del fd 34 with flags 2 1 (#591)
Server:: OpenSIPS (1.11.3-tls (x86_64/linux)) FYI, its reproducible with 1.11.5 latest src too. --- Reply to this email directly or view it on GitHub: https://github.com/OpenSIPS/opensips/issues/591#issuecomment-128704584___ Devel mailing list Devel@lists.opensips.org http://lists.opensips.org/cgi-bin/mailman/listinfo/devel
Re: [OpenSIPS-Devel] [opensips] error in infinite loop io_watch_del: BUG - trying to del fd 34 with flags 2 1 (#591)
@rgupta0110 , which opensips version were you using for this logs ? --- Reply to this email directly or view it on GitHub: https://github.com/OpenSIPS/opensips/issues/591#issuecomment-128635411___ Devel mailing list Devel@lists.opensips.org http://lists.opensips.org/cgi-bin/mailman/listinfo/devel
[OpenSIPS-Devel] [opensips] error in infinite loop io_watch_del: BUG - trying to del fd 34 with flags 2 1 (#591)
Hi Bogdan, Almost a month ago, I have raised the issue regarding the opensips log file filling up with the following message ERROR:core:io_watch_del: BUG - trying to del fd 38 with flags 2 1 Now I can reproduce it every time with the following scenario: 1) UAC and UAS are connected to opensips proxy with TCP transport 2) Connections looks good 3) Netstat shows the connection to UAC in ESTABLISHED state and the tcp_conn_lists of opensips also looks fine for that UAC 4) Now unplug the Ethernet cable on UAC 5) After tcp connection timeout (set to 5 mins in opensips.cfg), the tcp connection goes away from netstat as well as from tcp_conn_lists 6) UAS tries another tcp call to the UAC which is still unplugged 7) tcp_conn_lists shows the tcp connection to the UAC and netstat shows the connection in SYN_SENT state 8) After the tcp connection timeout (set to 5 mins in opensips.cfg), the connection goes away from netstat however it remains there in tcp_conn_lists and at that moment the “BUG - trying to del fd 38 with flags 2 1” starts printing in infinite loop. I looked at the source code and observed the following 1) When the network cable is plugged in The io_watch_add happens with flag IO_WATCH_READ in tcp_main.c when the command is CONN_NEW 2) When the network cable is unplugged The io_watch_add happens with flag IO_WATCH_WRITE in tcp_main.c when the command is ASYNC_CONNECT 3) While doing io_watch_del after timeout, from handle_tcpconn_ev method, io_watch_del always uses IO_WATCH_READ to delete the fd, which gives this error in case of unplugged cable. Please look into this and suggest how can I fix this issue ? Thanks Rahul Gupta --- Reply to this email directly or view it on GitHub: https://github.com/OpenSIPS/opensips/issues/591___ Devel mailing list Devel@lists.opensips.org http://lists.opensips.org/cgi-bin/mailman/listinfo/devel
Re: [OpenSIPS-Devel] [opensips] error in infinite loop io_watch_del: BUG - trying to del fd 34 with flags 2 1 (#591)
I have captured logs for the scenario and uploaded to zippyshare . Please let me know if you want me to upload to anyother location http://www76.zippyshare.com/v/njlgk4lW/file.html 1) netstat_debug.log --- netstat info for the particular endpoint (10.205.203.130 in this case) 2) opensipsctl_debug.log --- list_tcp_conns from opensips for the particular end point (10.205.203.130) 3) opensips.log --- opensips logs with debug level 6 Call flow and timestamps : 1) 15:45:02 --- created a TCP connection to send out SIP NOTIFY to 10.205.203.130 2) 15:46:30 --- unplugged the network cable from the endpoint 10.205.203.130 3) 15:47:11 --- tcp connection timed out (FYI tcp_connection_lifetime=120 in opensips.cfg) 4) 15:49:30 --- Tried another call to the same end point while the network cable was unplugged. opensips created another TCP connection to send another SIP NOTIFY, netstat showed this connection in SYN_SENT state 5) 15:50:31 --- the above connection went away from net stat and opensips started printing BUG - trying to del fd 34 with flags 2 1 in infinite loop FYI, I changed the debug level of this message from ERR to DBG in source code and recompiled so that it doesnt fill my hard drive while running in debug=2 mode Let me know if you need more clarification or any other details regarding this issue Thanks Rahul Gupta --- Reply to this email directly or view it on GitHub: https://github.com/OpenSIPS/opensips/issues/591#issuecomment-128507656___ Devel mailing list Devel@lists.opensips.org http://lists.opensips.org/cgi-bin/mailman/listinfo/devel