Bugs item #3570495, was opened at 2012-09-21 05:26 Message generated for change (Comment added) made by dragosoancea You can respond by visiting: https://sourceforge.net/tracker/?func=detail&atid=1086410&aid=3570495&group_id=232389
Please note that this message will contain a full copy of the comment thread, including the initial issue submission, for this request, not just the latest update. Category: None Group: None Status: Open Resolution: None Priority: 5 Private: No Submitted By: Dragos Oancea (dragosoancea) Assigned to: Nobody/Anonymous (nobody) Summary: crashes in the dialog module with opensips 1.8.1-tls Initial Comment: Hi , The setup where I am experiencing these crashes has the following particularities: 1. TLS My TLS config looks like this: listen = tls:212.162.x.x:5061 listen = tcp:172.20.x.x:5060 # tls params disable_tls = no tls_verify_server = 1 tls_verify_client = 0 tls_require_client_certificate = 0 tls_method = TLSv1 tls_certificate = "/etc/pki/CA/certs/mydomain.crt" tls_private_key = "/etc/pki/CA/private/mydomain.key" tls_ca_list = "/etc/pki/CA/certs/ca.crt" # tcp parameters tcp_connection_lifetime=1860 open_files_limit=8192 tcp_max_connections=4096 2. SIP ping for NAT bypassing I create the dialog with create_dialog("BPp") , sending OPTIONS to both sides . I also call fix_nated_contact() on requests and replies. 3. I have calls where a protocol change is necessary , like UDP->TLS , TLS->UDP , TLS->TCP . I use: modparam("rr", "enable_double_rr", 1) 4. I use the same database for multiple SIP proxies. Unfortunally opensips was not compiled with debugging symbols. Here 1st crash ( I saw this one ALOT of times) : Core was generated by `/usr/sbin/opensips -P /var/run/opensips.pid -f /etc/opensips/opensips.cfg'. Program terminated with signal 11, Segmentation fault. #0 0x00007f160948c7e8 in dlg_ping_routine () from /usr/lib64/opensips/modules/dialog.so Missing separate debuginfos, use: debuginfo-install opensips-1.8.1-3.el6.x86_64 (gdb) bt full #0 0x00007f160948c7e8 in dlg_ping_routine () from /usr/lib64/opensips/modules/dialog.so No symbol table info available. #1 0x000000000047d4ca in start_timer_processes () No symbol table info available. #2 0x000000000042a814 in main () No symbol table info available. Core was generated by `/usr/sbin/opensips -P /var/run/opensips.pid -f /etc/opensips/opensips.cfg'. Program terminated with signal 11, Segmentation fault. #0 0x00007f160948c7e8 in dlg_ping_routine () from /usr/lib64/opensips/modules/dialog.so Missing separate debuginfos, use: debuginfo-install opensips-1.8.1-3.el6.x86_64 (gdb) i r rax 0x7f15e430a388 139731999433608 rbx 0x1 1 rcx 0x7f15e43020a0 139731999400096 rdx 0x400 1024 rsi 0x1 1 rdi 0x200000009 8589934601 rbp 0x7f15e4792570 0x7f15e4792570 rsp 0x7fff372daaa0 0x7fff372daaa0 r8 0x7f15e4792558 139732004185432 r9 0x6 6 r10 0x4000 16384 r11 0x0 0 r12 0x7f15e47926e8 139732004185832 r13 0x200000009 8589934601 r14 0x769660 7771744 r15 0x734cc8 7556296 rip 0x7f160948c7e8 0x7f160948c7e8 <dlg_ping_routine+280> eflags 0x10202 [ IF RF ] cs 0x33 51 ss 0x2b 43 ds 0x0 0 es 0x0 0 fs 0x0 0 gs 0x0 0 (gdb) bt #0 0x00007f160948c7e8 in dlg_ping_routine () from /usr/lib64/opensips/modules/dialog.so #1 0x000000000047d4ca in start_timer_processes () #2 0x000000000042a814 in main () (gdb) bt full #0 0x00007f160948c7e8 in dlg_ping_routine () from /usr/lib64/opensips/modules/dialog.so No symbol table info available. #1 0x000000000047d4ca in start_timer_processes () No symbol table info available. #2 0x000000000042a814 in main () No symbol table info available. (gdb) this crash happends from time to time , unrelated to the load apparently. machine configuration: =================== Centos 6.3 Linux somedomain.com 2.6.32-220.23.1.el6.x86_64 #1 SMP Mon Jun 18 18:58:52 BST 2012 x86_64 x86_64 x86_64 GNU/Linux opensips -V version: opensips 1.8.1-tls (x86_64/linux) flags: STATS: Off, USE_IPV6, USE_TCP, USE_TLS, DISABLE_NAGLE, USE_MCAST, SHM_MEM, SHM_MMAP, PKG_MALLOC, F_MALLOC, FAST_LOCK-ADAPTIVE_WAIT ADAPTIVE_WAIT_LOOPS=1024, MAX_RECV_BUFFER_SIZE 262144, MAX_LISTEN 16, MAX_URI_SIZE 1024, BUF_SIZE 65535 poll method support: poll, epoll_lt, epoll_et, sigio_rt, select. svnrevision: unknown @(#) $Id: main.c 8772 2012-03-08 11:16:13Z bogdan_iancu $ main.c compiled on 17:04:54 Sep 8 2012 with gcc 4.4.6 openssl version: openssl-1.0.0-20.el6_2.5.x86_64 ====================== ------------------------------------------------------------------- ------------------------------------------------------------------- Here is the 2nd crash - it could be related to the first. (this is on another box, with a slightly different config file (it listens to UDP as well) I did not see this showing like below in gdb on opensips 1.7.2 (it never used to crash in unref_dlg , only in dlg_ping_routing()) Core was generated by `/usr/sbin/opensips -P /var/run/opensips.pid -f /etc/opensips/opensips.cfg'. Program terminated with signal 11, Segmentation fault. #0 0x00007fe93ab05491 in unref_dlg () from /usr/lib64/opensips/modules/dialog.so Missing separate debuginfos, use: debuginfo-install opensips-1.8.1-3.el6.x86_64 (gdb) bt full #0 0x00007fe93ab05491 in unref_dlg () from /usr/lib64/opensips/modules/dialog.so No symbol table info available. #1 0x00007fe93ab0e7f5 in dlg_ping_routine () from /usr/lib64/opensips/modules/dialog.so No symbol table info available. #2 0x000000000047d4ca in start_timer_processes () No symbol table info available. #3 0x000000000042a814 in main () No symbol table info available. machine configuration (almost the same config as 2st crash): =============== Centos 6.3 Linux somedomain.com 2.6.32-279.2.1.el6.x86_64 #1 SMP Fri Jul 20 01:55:29 UTC 2012 x86_64 x86_64 x86_64 GNU/Linux opensips -V version: opensips 1.8.1-tls (x86_64/linux) flags: STATS: Off, USE_IPV6, USE_TCP, USE_TLS, DISABLE_NAGLE, USE_MCAST, SHM_MEM, SHM_MMAP, PKG_MALLOC, F_MALLOC, FAST_LOCK-ADAPTIVE_WAIT ADAPTIVE_WAIT_LOOPS=1024, MAX_RECV_BUFFER_SIZE 262144, MAX_LISTEN 16, MAX_URI_SIZE 1024, BUF_SIZE 65535 poll method support: poll, epoll_lt, epoll_et, sigio_rt, select. svnrevision: unknown @(#) $Id: main.c 8772 2012-03-08 11:16:13Z bogdan_iancu $ main.c compiled on 17:04:54 Sep 8 2012 with gcc 4.4.6 opessl version: openssl-1.0.0-20.el6_2.5.x86_64 ================ syslog of the crashed process: http://pastebin.com/F3hu1WY5 please notice something that looks like a memory leak in opensips: CRITICAL:dialog:unref_dlg: bogus ref -1788757169 with cnt 1 for dlg 0x7fe915eccd30 [0:0] with clid '' and tags 'EDH-RSA-DES-CBC3-SHA' '' tags "EDH-RSA-DES-CBC3-SHA" ? these are ssl ciphers. how could they be related to the dialog module ? One particular thing related to the second bug is that just before it crashed, opensips was showing alot of messages like this, coming everytime from the same remote unknown client , but with different source port: ERROR:core:tls_accept: New TLS connection from 82.113.x.x:43574 failed to accept: rejected by client ERROR:core:tls_accept: New TLS connection from 82.113.x.x:61586 failed to accept: rejected by client I have core files, let me know if I can upload them somewhere. Cheers, Dragos ---------------------------------------------------------------------- >Comment By: Dragos Oancea (dragosoancea) Date: 2012-10-02 06:30 Message: Hi Yes, both patches were applied at the same time. The only thing changed is the ping_timeout option which was 20 seconds, now is 40 seconds. But I am not sure this is relevant. It is looking good so far. I had no crash but I also had to restart opensips yesterday because I made a script change to disable tcp keepalive which was draining out the battery of our mobile devices. I also noticed that tcp keepalive is enabled by default in opensips 1.8.1 , the tcp socket is created with the SO_KEEPALIVE option regardless of the state of the tcp_keepalive switch of the script. I will keep you updated. Cheers, Dragos Oancea ---------------------------------------------------------------------- Comment By: Vladut-Stefan Paiu (vladut-paiu) Date: 2012-10-01 05:45 Message: Hello, Ok,cool. Is this with both patches applied ( dlg patch + TLS patch ) ? Please keep us up to date if things remain stable in time. Regards, Vlad ---------------------------------------------------------------------- Comment By: Dragos Oancea (dragosoancea) Date: 2012-10-01 05:00 Message: Hi Thank you for the feedback. I applied the patch three days ago. Looks good so far. Cheers, Dragos ---------------------------------------------------------------------- Comment By: Vladut-Stefan Paiu (vladut-paiu) Date: 2012-09-28 00:42 Message: Hello, Also, for the TLS memory corruption, can you please try the patch for this trunk revision : http://opensips.svn.sourceforge.net/viewvc/opensips?view=revision&revision=9268 It added support for proper locking within the openssl library. It's not yet back-ported to the 1.8 branch, as we felt it needed a little bit more testing. Regards, Vlad ---------------------------------------------------------------------- Comment By: Vladut-Stefan Paiu (vladut-paiu) Date: 2012-09-28 00:32 Message: Hello, Can you apply the attached patch, and see if you still experience these issues ? Thanks and Regards, Vlad ---------------------------------------------------------------------- Comment By: Dragos Oancea (dragosoancea) Date: 2012-09-27 07:55 Message: Hi It happened again. Program terminated with signal 11, Segmentation fault. #0 0x00007fe93ab05491 in unlink_unsafe_dlg (dlg=0x7fe915eccd30, cnt=1) at dlg_hash.c:706 706 dlg->next->prev = dlg->prev; (gdb) bt full #0 0x00007fe93ab05491 in unlink_unsafe_dlg (dlg=0x7fe915eccd30, cnt=1) at dlg_h ash.c:706 No locals. #1 unref_dlg (dlg=0x7fe915eccd30, cnt=1) at dlg_hash.c:740 d_entry = 0x7fe915c3f990 __FUNCTION__ = "unref_dlg" #2 0x00007fe93ab0e7f5 in dlg_ping_routine (ticks=<value optimized out>, attr=<v alue optimized out>) at dlg_timer.c:529 expired = <value optimized out> it = 0x0 curr = 0x0 dlg = 0x7fe915eccd30 __FUNCTION__ = "dlg_ping_routine" #3 0x000000000047d4ca in timer_ticker () at timer.c:355 t = 0x7fe93c19c810 #4 run_timer_process () at timer.c:425 multiple = 10 cnt = <value optimized out> tv = {tv_sec = 0, tv_usec = 0} #5 start_timer_processes () at timer.c:522 tpl = 0x7fe93c20aa60 pid = <value optimized out> __FUNCTION__ = "start_timer_processes" #6 0x000000000042a814 in main_loop (argc=<value optimized out>, argv=<value optimized out>) at main.c:933 i = <value optimized out> pid = <value optimized out> si = 0x0 startup_done = 0x0 chd_rank = 0 rc = <value optimized out> load_p = 0x0 #7 main (argc=<value optimized out>, argv=<value optimized out>) at main.c:1520 cfg_log_stderr = <value optimized out> cfg_stream = <value optimized out> c = <value optimized out> r = <value optimized out> tmp = 0x1 <Address 0x1 out of bounds> tmp_len = <value optimized out> port = <value optimized out> proto = <value optimized out> options = 0x4eb5a8 "f:cCm:M:b:l:n:N:rRvdDETSVhw:t:u:g:P:G:W:o:" ret = -1 seed = 1183817865 rfd = <value optimized out> __FUNCTION__ = "main" This function is called when Opensips would try to end the dialog sending BYE to both sides, due to the fact it did not receive a reply to OPTIONS request that it's sending. These are the comments of a good samaritan on IRC (freenode / #opensips) , in regard to this bug : ....... > how many process have the run_timer_process ? > Only one or multiple ? > the dialog may have been deleted after get_timeout_dlgs() which hold the lock for lock_get(ping_timer->lock); > OR > /* no longer reffed in list */ > unref_dlg(dlg,1); ....... > how many process have the run_timer_process ? > Only one or multiple ? > the dialog may have been deleted after get_timeout_dlgs() which hold the lock for lock_get(ping_timer->lock); > OR > /* no longer reffed in list */ > unref_dlg(dlg,1); > /* dlg is still reffed in TM callback. deletion from memory > * will happen only on 408 timeout */ > so that if you ping a leg > and there is no answer > and while you called get_timeout_dlgs() you received a 408 > the dialog may be destroyed > if it happens randomly when a pinged leg doesn't answer then it might be related to a race condition with the 408 (or any other path that destroy the dialog) > (the dialog extracted from the ping list) .......... PS: Thank you for the comments, these might really help! Cheers, Dragos ---------------------------------------------------------------------- You can respond by visiting: https://sourceforge.net/tracker/?func=detail&atid=1086410&aid=3570495&group_id=232389 _______________________________________________ Devel mailing list Devel@lists.opensips.org http://lists.opensips.org/cgi-bin/mailman/listinfo/devel