On Thu, Jul 12, 2018 at 5:01 PM, Salahuddin Ahmed <txru...@gmail.com> wrote:
> Hello, > > Recently I have experienced a deadlock situation in Asterisk Version > 13.20.0. According to our gdb analysis there have 3 suspicious thread(3, 11 > and 44). Thread 44 push a task to taskpool, thread 11 execute that task and > the thread 3 is a timer event. In thread 11 during a response going to > transmit a group lock has been acquired. and this thread was locked on this > point. In Thread 3 while retransmission timer triggered same group lock > again acquired and try to retransmit response. In this situation asterisk > was blocked. > > Let me explain according to GDB log, > > Thread-03: > #4 0x00007f00ba6d9db0 in tsx_send_msg (tsx=0x7eff8bdd4568, > tdata=0x7eff892ae668) at ../src/pjsip/sip_transaction.c:2122 > > Thread-11: > #6 0x00007f00ba6dc4ac in pjsip_tsx_send_msg (tsx=tsx@entry=0x7eff8bdd4568, > tdata=tdata@entry=0x7eff892ae668) at ../src/pjsip/sip_transaction.c:1783 > (gdb) p tsx->transport_flag > $2 = 1(TSX_HAS_PENDING_TRANSPORT) > > Here Thread-03 is execute first and try to retransmit a message before > acquire a tsx group lock in Thread-11, because in thread 3 > tsx->transport_flag |= TSX_HAS_PENDING_TRANSPORT has been set, Now if we > read this flag from thread 11 in frame 04 then we have found this flag was > set by someone before acquiring tsx group lock. That means during this > acquiring lock in Thread3, Thread 11 trying to send data on transport. So > this might be the case of deadlock, somehow timer thread 3 execute before > thread-11. > > Here I enclose Thread-3 and Thread-11 gdb backtrace. > > Could anyone please help me on this? > > > Thread-11: > #0 __lll_lock_wait () at ../nptl/sysdeps/unix/sysv/ > linux/x86_64/lowlevellock.S:135 > #1 0x00007f00f62bf494 in _L_lock_952 () from /lib/x86_64-linux-gnu/ > libpthread.so.0 > #2 0x00007f00f62bf2f6 in __GI___pthread_mutex_lock (mutex=0x7eff41197c58) > at ../nptl/pthread_mutex_lock.c:114 > #3 0x00007f00b97e49a5 in pj_mutex_lock (mutex=0x7eff41197c58) at > ../src/pj/os_core_unix.c:1265 > #4 0x00007f00b97eace1 in grp_lock_acquire (p=0x7eff41197b48) at > ../src/pj/lock.c:290 > #5 0x00007f00b97ead22 in pj_grp_lock_acquire (grp_lock=<optimized out>) > at ../src/pj/lock.c:478 > #6 0x00007f00ba6dc4ac in pjsip_tsx_send_msg (tsx=tsx@entry=0x7eff8bdd4568, > tdata=tdata@entry=0x7eff892ae668) at ../src/pjsip/sip_transaction.c:1783 > #7 0x00007f00ba6de952 in pjsip_dlg_send_response (dlg=<optimized out>, > tsx=0x7eff8bdd4568, tdata=tdata@entry=0x7eff892ae668) at > ../src/pjsip/sip_dialog.c:1531 > #8 0x00007f00bab1c09a in pjsip_inv_send_msg (inv=0x7eff88bdf9b8, > tdata=tdata@entry=0x7eff892ae668) at ../src/pjsip-ua/sip_inv.c:3231 > #9 0x00007f0092874b4c in ast_sip_session_send_response > (session=session@entry=0x7eff89367700, tdata=0x7eff892ae668) at > res_pjsip_session.c:983 > #10 0x00007f002a431841 in answer (data=0x7eff89367700) at chan_pjsip.c:591 > #11 0x00007f0094b7a490 in sync_task (data=0x7efdbcf8b810) at > res_pjsip.c:4268 > #12 0x00000000005d52ce in ast_taskprocessor_execute (tps=tps@entry=0x267f9b0) > at taskprocessor.c:971 > #13 0x00000000005dcc80 in execute_tasks (data=0x267f9b0) at > threadpool.c:1322 > #14 0x00000000005d52ce in ast_taskprocessor_execute (tps=0x23e0580) at > taskprocessor.c:971 > #15 0x00000000005dd558 in threadpool_execute (pool=<optimized out>) at > threadpool.c:351 > #16 worker_active (worker=<optimized out>) at threadpool.c:1105 > #17 worker_start (arg=0x7f0080001680) at threadpool.c:1024 > #18 0x00000000005e6dfa in dummy_start (data=<optimized out>) at > utils.c:1238 > #19 0x00007f00f62bd064 in start_thread (arg=0x7eff67c9b700) at > pthread_create.c:309 > #20 0x00007f00f55a562d in clone () at ../sysdeps/unix/sysv/linux/ > x86_64/clone.S:111 > > > Thread-3: > #0 sanitize_tdata (tdata=<optimized out>) at res_pjsip/pjsip_message_ > filter.c:211 > #1 filter_on_tx_message (tdata=0x7eff892ae668) at res_pjsip/pjsip_message_ > filter.c:233 > #2 0x00007f00ba6c5b0e in endpt_on_tx_msg (endpt=<optimized out>, > tdata=0x7eff892ae668) at ../src/pjsip/sip_endpoint.c:1088 > #3 0x00007f00ba6cc5c0 in pjsip_transport_send (tr=0x2677718, > tdata=tdata@entry=0x7eff892ae668, addr=addr@entry=0x7eff8bdd4644, > addr_len=16, token=token@entry=0x7eff8bdd4568, > cb=cb@entry=0x7f00ba6d91b0 <transport_callback>) at > ../src/pjsip/sip_transport.c:833 > #4 0x00007f00ba6d9db0 in tsx_send_msg (tsx=0x7eff8bdd4568, > tdata=0x7eff892ae668) at ../src/pjsip/sip_transaction.c:2122 > #5 0x00007f00ba6da8da in tsx_on_state_proceeding_uas (tsx=0x7eff8bdd4568, > event=<optimized out>) at ../src/pjsip/sip_transaction.c:2882 > #6 0x00007f00ba6d9b86 in tsx_timer_callback (theap=<optimized out>, > entry=0x7eff8bdd46e8) at ../src/pjsip/sip_transaction.c:1189 > #7 0x00007f00b97f6757 in pj_timer_heap_poll (ht=0x23e53a0, > next_delay=next_delay@entry=0x7f0090649e50) at ../src/pj/timer.c:644 > #8 0x00007f00ba6c65c4 in pjsip_endpt_handle_events2 (endpt=0x23e50b8, > max_timeout=max_timeout@entry=0x7f0090649ea0, p_count=p_count@entry=0x0) > at ../src/pjsip/sip_endpoint.c:715 > #9 0x00007f00ba6c66d7 in pjsip_endpt_handle_events (endpt=<optimized > out>, max_timeout=max_timeout@entry=0x7f0090649ea0) at > ../src/pjsip/sip_endpoint.c:776 > #10 0x00007f0094b7a118 in monitor_thread_exec (endpt=<optimized out>) at > res_pjsip.c:4345 > #11 0x00007f00b97e4ca0 in thread_main (param=0x23e3c68) at > ../src/pj/os_core_unix.c:541 > #12 0x00007f00f62bd064 in start_thread (arg=0x7f009064a700) at > pthread_create.c:309 > #13 0x00007f00f55a562d in clone () at ../sysdeps/unix/sysv/linux/ > x86_64/clone.S:111 > > * The attached backtrace does not match the threads you are highlighting in your email. In fact the attached backtrace was done with a non-bundled pjproject without any symbols. * Thread 3 above should have acquired the transaction group lock in tsx_timer_callback() before calling the state handler routine. From there the pjproject line numbers do not match the bundled pjproject version for 13.20.0. Also the thread is not blocked so I don't see how it as participating in a deadlock unless you think it is in an endless loop. You *really* need to be using bundled pjproject [1] for the reasons listed on that page. Richard [1] https://wiki.asterisk.org/wiki/display/AST/PJSIP-pjproject
-- _____________________________________________________________________ -- Bandwidth and Colocation Provided by http://www.api-digital.com -- asterisk-dev mailing list To UNSUBSCRIBE or update options visit: http://lists.digium.com/mailman/listinfo/asterisk-dev