Re: [asterisk-dev] Deadlock between transaction and transport

2018-07-13 Thread Richard Mudgett
On Thu, Jul 12, 2018 at 5:01 PM, Salahuddin Ahmed  wrote:

> Hello,
>
> Recently I have experienced a deadlock situation in Asterisk Version
> 13.20.0. According to our gdb analysis there have 3 suspicious thread(3, 11
> and 44). Thread 44 push a task to taskpool, thread 11 execute that task and
> the thread 3 is a timer event. In thread 11 during a response going to
> transmit a group lock has been acquired. and this thread was locked on this
> point. In Thread 3 while retransmission timer triggered same group lock
> again acquired and try to retransmit response. In this situation asterisk
> was blocked.
>
> Let me explain according to GDB log,
>
> Thread-03:
> #4  0x7f00ba6d9db0 in tsx_send_msg (tsx=0x7eff8bdd4568,
> tdata=0x7eff892ae668) at ../src/pjsip/sip_transaction.c:2122
>
> Thread-11:
> #6  0x7f00ba6dc4ac in pjsip_tsx_send_msg (tsx=tsx@entry=0x7eff8bdd4568,
> tdata=tdata@entry=0x7eff892ae668) at ../src/pjsip/sip_transaction.c:1783
> (gdb) p tsx->transport_flag
> $2 = 1(TSX_HAS_PENDING_TRANSPORT)
>
> Here Thread-03 is execute first and try to retransmit a message before
> acquire a tsx group lock in Thread-11, because in thread 3
> tsx->transport_flag |= TSX_HAS_PENDING_TRANSPORT has been set, Now if we
> read this flag from thread 11 in frame 04 then we have found this flag was
> set by someone before acquiring tsx group lock. That means during this
> acquiring lock in Thread3, Thread 11 trying to send data on transport. So
> this might be the case of deadlock, somehow timer thread 3 execute before
> thread-11.
>
> Here I enclose Thread-3 and Thread-11 gdb backtrace.
>
> Could anyone please help me on this?
>
>
> Thread-11:
> #0  __lll_lock_wait () at ../nptl/sysdeps/unix/sysv/
> linux/x86_64/lowlevellock.S:135
> #1  0x7f00f62bf494 in _L_lock_952 () from /lib/x86_64-linux-gnu/
> libpthread.so.0
> #2  0x7f00f62bf2f6 in __GI___pthread_mutex_lock (mutex=0x7eff41197c58)
> at ../nptl/pthread_mutex_lock.c:114
> #3  0x7f00b97e49a5 in pj_mutex_lock (mutex=0x7eff41197c58) at
> ../src/pj/os_core_unix.c:1265
> #4  0x7f00b97eace1 in grp_lock_acquire (p=0x7eff41197b48) at
> ../src/pj/lock.c:290
> #5  0x7f00b97ead22 in pj_grp_lock_acquire (grp_lock=)
> at ../src/pj/lock.c:478
> #6  0x7f00ba6dc4ac in pjsip_tsx_send_msg (tsx=tsx@entry=0x7eff8bdd4568,
> tdata=tdata@entry=0x7eff892ae668) at ../src/pjsip/sip_transaction.c:1783
> #7  0x7f00ba6de952 in pjsip_dlg_send_response (dlg=,
> tsx=0x7eff8bdd4568, tdata=tdata@entry=0x7eff892ae668) at
> ../src/pjsip/sip_dialog.c:1531
> #8  0x7f00bab1c09a in pjsip_inv_send_msg (inv=0x7eff88bdf9b8,
> tdata=tdata@entry=0x7eff892ae668) at ../src/pjsip-ua/sip_inv.c:3231
> #9  0x7f0092874b4c in ast_sip_session_send_response
> (session=session@entry=0x7eff89367700, tdata=0x7eff892ae668) at
> res_pjsip_session.c:983
> #10 0x7f002a431841 in answer (data=0x7eff89367700) at chan_pjsip.c:591
> #11 0x7f0094b7a490 in sync_task (data=0x7efdbcf8b810) at
> res_pjsip.c:4268
> #12 0x005d52ce in ast_taskprocessor_execute (tps=tps@entry=0x267f9b0)
> at taskprocessor.c:971
> #13 0x005dcc80 in execute_tasks (data=0x267f9b0) at
> threadpool.c:1322
> #14 0x005d52ce in ast_taskprocessor_execute (tps=0x23e0580) at
> taskprocessor.c:971
> #15 0x005dd558 in threadpool_execute (pool=) at
> threadpool.c:351
> #16 worker_active (worker=) at threadpool.c:1105
> #17 worker_start (arg=0x7f0080001680) at threadpool.c:1024
> #18 0x005e6dfa in dummy_start (data=) at
> utils.c:1238
> #19 0x7f00f62bd064 in start_thread (arg=0x7eff67c9b700) at
> pthread_create.c:309
> #20 0x7f00f55a562d in clone () at ../sysdeps/unix/sysv/linux/
> x86_64/clone.S:111
>
>
> Thread-3:
> #0  sanitize_tdata (tdata=) at res_pjsip/pjsip_message_
> filter.c:211
> #1  filter_on_tx_message (tdata=0x7eff892ae668) at res_pjsip/pjsip_message_
> filter.c:233
> #2  0x7f00ba6c5b0e in endpt_on_tx_msg (endpt=,
> tdata=0x7eff892ae668) at ../src/pjsip/sip_endpoint.c:1088
> #3  0x7f00ba6cc5c0 in pjsip_transport_send (tr=0x2677718,
> tdata=tdata@entry=0x7eff892ae668, addr=addr@entry=0x7eff8bdd4644,
> addr_len=16, token=token@entry=0x7eff8bdd4568,
> cb=cb@entry=0x7f00ba6d91b0 ) at
> ../src/pjsip/sip_transport.c:833
> #4  0x7f00ba6d9db0 in tsx_send_msg (tsx=0x7eff8bdd4568,
> tdata=0x7eff892ae668) at ../src/pjsip/sip_transaction.c:2122
> #5  0x7f00ba6da8da in tsx_on_state_proceeding_uas (tsx=0x7eff8bdd4568,
> event=) at ../src/pjsip/sip_transaction.c:2882
> #6  0x7f00ba6d9b86 in tsx_timer_callback (theap=,
> entry=0x7eff8bdd46e8) at ../src/pjsip/sip_transaction.c:1189
> #7  0x7f00b97f6757 in pj_timer_heap_poll (ht=0x23e53a0,
> next_delay=next_delay@entry=0x7f0090649e50) at ../src/pj/timer.c:644
> #8  0x7f00ba6c65c4 in pjsip_endpt_handle_events2 (endpt=0x23e50b8,
> max_timeout=max_timeout@entry=0x7f0090649ea0, p_count=p_count@entry=0x0)
> at ../src/pjsip/sip_endpoint.c:715
> #9  0x7f00ba6c66d7 in pjsip_endpt_handle_events 

[asterisk-dev] Deadlock between transaction and transport

2018-07-12 Thread Salahuddin Ahmed
Hello,

Recently I have experienced a deadlock situation in Asterisk Version
13.20.0. According to our gdb analysis there have 3 suspicious thread(3, 11
and 44). Thread 44 push a task to taskpool, thread 11 execute that task and
the thread 3 is a timer event. In thread 11 during a response going to
transmit a group lock has been acquired. and this thread was locked on this
point. In Thread 3 while retransmission timer triggered same group lock
again acquired and try to retransmit response. In this situation asterisk
was blocked.

Let me explain according to GDB log,

Thread-03:
#4  0x7f00ba6d9db0 in tsx_send_msg (tsx=0x7eff8bdd4568,
tdata=0x7eff892ae668) at ../src/pjsip/sip_transaction.c:2122

Thread-11:
#6  0x7f00ba6dc4ac in pjsip_tsx_send_msg (tsx=tsx@entry=0x7eff8bdd4568,
tdata=tdata@entry=0x7eff892ae668) at ../src/pjsip/sip_transaction.c:1783
(gdb) p tsx->transport_flag
$2 = 1(TSX_HAS_PENDING_TRANSPORT)

Here Thread-03 is execute first and try to retransmit a message before
acquire a tsx group lock in Thread-11, because in thread 3
tsx->transport_flag |= TSX_HAS_PENDING_TRANSPORT has been set, Now if we
read this flag from thread 11 in frame 04 then we have found this flag was
set by someone before acquiring tsx group lock. That means during this
acquiring lock in Thread3, Thread 11 trying to send data on transport. So
this might be the case of deadlock, somehow timer thread 3 execute before
thread-11.

Here I enclose Thread-3 and Thread-11 gdb backtrace.

Could anyone please help me on this?


Thread-11:
#0  __lll_lock_wait () at
../nptl/sysdeps/unix/sysv/linux/x86_64/lowlevellock.S:135
#1  0x7f00f62bf494 in _L_lock_952 () from
/lib/x86_64-linux-gnu/libpthread.so.0
#2  0x7f00f62bf2f6 in __GI___pthread_mutex_lock (mutex=0x7eff41197c58)
at ../nptl/pthread_mutex_lock.c:114
#3  0x7f00b97e49a5 in pj_mutex_lock (mutex=0x7eff41197c58) at
../src/pj/os_core_unix.c:1265
#4  0x7f00b97eace1 in grp_lock_acquire (p=0x7eff41197b48) at
../src/pj/lock.c:290
#5  0x7f00b97ead22 in pj_grp_lock_acquire (grp_lock=) at
../src/pj/lock.c:478
#6  0x7f00ba6dc4ac in pjsip_tsx_send_msg (tsx=tsx@entry=0x7eff8bdd4568,
tdata=tdata@entry=0x7eff892ae668) at ../src/pjsip/sip_transaction.c:1783
#7  0x7f00ba6de952 in pjsip_dlg_send_response (dlg=,
tsx=0x7eff8bdd4568, tdata=tdata@entry=0x7eff892ae668) at
../src/pjsip/sip_dialog.c:1531
#8  0x7f00bab1c09a in pjsip_inv_send_msg (inv=0x7eff88bdf9b8,
tdata=tdata@entry=0x7eff892ae668) at ../src/pjsip-ua/sip_inv.c:3231
#9  0x7f0092874b4c in ast_sip_session_send_response
(session=session@entry=0x7eff89367700, tdata=0x7eff892ae668) at
res_pjsip_session.c:983
#10 0x7f002a431841 in answer (data=0x7eff89367700) at chan_pjsip.c:591
#11 0x7f0094b7a490 in sync_task (data=0x7efdbcf8b810) at
res_pjsip.c:4268
#12 0x005d52ce in ast_taskprocessor_execute (tps=tps@entry=0x267f9b0)
at taskprocessor.c:971
#13 0x005dcc80 in execute_tasks (data=0x267f9b0) at
threadpool.c:1322
#14 0x005d52ce in ast_taskprocessor_execute (tps=0x23e0580) at
taskprocessor.c:971
#15 0x005dd558 in threadpool_execute (pool=) at
threadpool.c:351
#16 worker_active (worker=) at threadpool.c:1105
#17 worker_start (arg=0x7f0080001680) at threadpool.c:1024
#18 0x005e6dfa in dummy_start (data=) at utils.c:1238
#19 0x7f00f62bd064 in start_thread (arg=0x7eff67c9b700) at
pthread_create.c:309
#20 0x7f00f55a562d in clone () at
../sysdeps/unix/sysv/linux/x86_64/clone.S:111


Thread-3:
#0  sanitize_tdata (tdata=) at
res_pjsip/pjsip_message_filter.c:211
#1  filter_on_tx_message (tdata=0x7eff892ae668) at
res_pjsip/pjsip_message_filter.c:233
#2  0x7f00ba6c5b0e in endpt_on_tx_msg (endpt=,
tdata=0x7eff892ae668) at ../src/pjsip/sip_endpoint.c:1088
#3  0x7f00ba6cc5c0 in pjsip_transport_send (tr=0x2677718,
tdata=tdata@entry=0x7eff892ae668, addr=addr@entry=0x7eff8bdd4644,
addr_len=16, token=token@entry=0x7eff8bdd4568,
cb=cb@entry=0x7f00ba6d91b0 ) at
../src/pjsip/sip_transport.c:833
#4  0x7f00ba6d9db0 in tsx_send_msg (tsx=0x7eff8bdd4568,
tdata=0x7eff892ae668) at ../src/pjsip/sip_transaction.c:2122
#5  0x7f00ba6da8da in tsx_on_state_proceeding_uas (tsx=0x7eff8bdd4568,
event=) at ../src/pjsip/sip_transaction.c:2882
#6  0x7f00ba6d9b86 in tsx_timer_callback (theap=,
entry=0x7eff8bdd46e8) at ../src/pjsip/sip_transaction.c:1189
#7  0x7f00b97f6757 in pj_timer_heap_poll (ht=0x23e53a0,
next_delay=next_delay@entry=0x7f0090649e50) at ../src/pj/timer.c:644
#8  0x7f00ba6c65c4 in pjsip_endpt_handle_events2 (endpt=0x23e50b8,
max_timeout=max_timeout@entry=0x7f0090649ea0, p_count=p_count@entry=0x0) at
../src/pjsip/sip_endpoint.c:715
#9  0x7f00ba6c66d7 in pjsip_endpt_handle_events (endpt=,
max_timeout=max_timeout@entry=0x7f0090649ea0) at
../src/pjsip/sip_endpoint.c:776
#10 0x7f0094b7a118 in monitor_thread_exec (endpt=) at
res_pjsip.c:4345
#11 0x7f00b97e4ca0 in thread_main (param=0x23e3c68) at
../src/pj/os_core_unix.c:541
#12