-----------------------------------------------------------
This is an automatically generated e-mail. To reply, visit:
https://reviewboard.asterisk.org/r/3069/
-----------------------------------------------------------

Review request for Asterisk Developers, Joshua Colp, Jonathan Rose, Matt 
Jordan, and rmudgett.


Bugs: ASTERISK-22936
    https://issues.asterisk.org/jira/browse/ASTERISK-22936


Repository: Asterisk


Description
-------

This is more-or-less a take 2 of https://reviewboard.asterisk.org/r/3042

Initially, it was discovered that performing an attended transfer of a 
multiparty bridge with a PJSIP channel would cause a deadlock. A PBX thread 
started a masquerade and reached the point where it was calling the fixup() 
callback on the "original" channel. For chan_pjsip, this involves pushing a 
synchronous task to the session's serializer. The problem was that a task ahead 
of the fixup task was also attempting to perform a channel masquerade. However, 
since masquerades are designed in a way to only allow for one to occur at a 
time, the task ahead of the fixup could not continue until the masquerade 
already in progress had completed. And of course, the masquerade in progress 
could not complete until the task ahead of the fixup task had completed. 
Deadlock.

The initial fix was to change the fixup task to be asynchronous. While this 
prevented the deadlock from occurring, it had the frightful side effect of 
potentially allowing for tasks in the session's serializer to operate on a 
zombie channel.

Taking a step back from this particular deadlock, it became clear that the 
problem was not really this one particular issue but that masquerades 
themselves needed to be addressed. A PJSIP attended transfer operation calls 
ast_channel_move(), which attempts to both set up and execute a masquerade. The 
problem was that after it had set up the masquerade, the PBX thread had swooped 
in and tried to actually perform the masquerade. Looking at changes that had 
been made to Asterisk 12, it became clear that there never is any time now that 
anyone ever wants to set up a masquerade and allow for the channel thread to 
actually perform the masquerade. Everyone always is calling ast_channel_move(), 
performs the masquerade itself before returning.

In this patch, I have removed all blocks of code from channel.c that will 
attempt to perform a masquerade if ast_channel_masq() returns true. Now, there 
is no distinction between setting up a masquerade and performing the 
masquerade. It is one operation. The only remaining checks for 
ast_channel_masq() and ast_channel_masqr() are in ast_hangup() since we do not 
want to interrupt a masquerade by hanging up the channel. Instead, now 
ast_hangup() will wait for a masquerade to complete before moving forward with 
its operation.

The ast_channel_move() function has been modified to basically in-line the 
logic that used to be in ast_channel_masquerade(). ast_channel_masquerade() has 
been killed off for real. ast_channel_move() now has a lock associated with it 
that is used to prevent any simultaneous moves from occurring at once. This 
means there is no need to make sure that ast_channel_masq() or 
ast_channel_masqr() are already set on a channel when ast_channel_move() is 
called. It also means the channel container lock is not pulling double duty by 
both keeping the container locked and preventing multiple masquerades from 
occurring simultaneously.

The ast_do_masquerade() function has been renamed to do_channel_masquerade() 
and is now internal to channel.c. The function now takes explicit arguments of 
which channels are involved in the masquerade instead of a single channel. 
While it probably is possible to do some further refactoring of this method, I 
feel that I would be treading dangerously. Instead, all I did was change some 
comments that no longer are true after this changeset.

The other more minor change introduced in this patch is to res_pjsip.c to make 
ast_sip_push_task_synchronous() run the task in-place if we are already a SIP 
servant thread. This is related to this patch because even when we isolate the 
channel masquerade to only running in the SIP servant thread, we would still 
deadlock when the fixup() callback is reached since we would essentially be 
waiting forever for ourselves to finish before actually running the fixup. This 
makes it so the fixup is run without having to push a task into a serializer at 
all.


Diffs
-----

  /branches/12/res/res_pjsip.c 403349 
  /branches/12/main/channel.c 403343 
  /branches/12/include/asterisk/channel.h 403343 

Diff: https://reviewboard.asterisk.org/r/3069/diff/


Testing
-------

Jonathan ran the same test he had originally run in order to make the deadlock 
occur, and with this patch it no longer occurs. I also ran a bevy of testsuite 
tests to see if there were any that did not pass with this patch but did pass 
without the patch. I found none on my machine, though I'll admit I did not run 
every test.


Thanks,

Mark Michelson

-- 
_____________________________________________________________________
-- Bandwidth and Colocation Provided by http://www.api-digital.com --

asterisk-dev mailing list
To UNSUBSCRIBE or update options visit:
   http://lists.digium.com/mailman/listinfo/asterisk-dev

Reply via email to