> On Dec. 12, 2013, 4:03 p.m., Mark Michelson wrote:
> > Josh and I talked on IRC, and it would really be great if the channels 
> > involved in the masquerade could be locked for the duration of the 
> > operation instead of having to unlock them in order to send indications to 
> > them. That way, it could be guaranteed that any time something acquires a 
> > channel lock, it is not interrupting the middle of a masquerade and can 
> > freely do what it needs to. I think such refactoring could actually be 
> > done, but I also feel like since that sort of refactoring does not directly 
> > affect the problem being fixed here, it should be saved for a separate 
> > review.

Agreed. If (when?) this goes in, I'd make a new issue in JIRA for that part of 
it.


- Matt


-----------------------------------------------------------
This is an automatically generated e-mail. To reply, visit:
https://reviewboard.asterisk.org/r/3069/#review10408
-----------------------------------------------------------


On Dec. 12, 2013, 3:59 p.m., Mark Michelson wrote:
> 
> -----------------------------------------------------------
> This is an automatically generated e-mail. To reply, visit:
> https://reviewboard.asterisk.org/r/3069/
> -----------------------------------------------------------
> 
> (Updated Dec. 12, 2013, 3:59 p.m.)
> 
> 
> Review request for Asterisk Developers, Joshua Colp, Jonathan Rose, Matt 
> Jordan, and rmudgett.
> 
> 
> Bugs: ASTERISK-22936
>     https://issues.asterisk.org/jira/browse/ASTERISK-22936
> 
> 
> Repository: Asterisk
> 
> 
> Description
> -------
> 
> This is more-or-less a take 2 of https://reviewboard.asterisk.org/r/3042
> 
> Initially, it was discovered that performing an attended transfer of a 
> multiparty bridge with a PJSIP channel would cause a deadlock. A PBX thread 
> started a masquerade and reached the point where it was calling the fixup() 
> callback on the "original" channel. For chan_pjsip, this involves pushing a 
> synchronous task to the session's serializer. The problem was that a task 
> ahead of the fixup task was also attempting to perform a channel masquerade. 
> However, since masquerades are designed in a way to only allow for one to 
> occur at a time, the task ahead of the fixup could not continue until the 
> masquerade already in progress had completed. And of course, the masquerade 
> in progress could not complete until the task ahead of the fixup task had 
> completed. Deadlock.
> 
> The initial fix was to change the fixup task to be asynchronous. While this 
> prevented the deadlock from occurring, it had the frightful side effect of 
> potentially allowing for tasks in the session's serializer to operate on a 
> zombie channel.
> 
> Taking a step back from this particular deadlock, it became clear that the 
> problem was not really this one particular issue but that masquerades 
> themselves needed to be addressed. A PJSIP attended transfer operation calls 
> ast_channel_move(), which attempts to both set up and execute a masquerade. 
> The problem was that after it had set up the masquerade, the PBX thread had 
> swooped in and tried to actually perform the masquerade. Looking at changes 
> that had been made to Asterisk 12, it became clear that there never is any 
> time now that anyone ever wants to set up a masquerade and allow for the 
> channel thread to actually perform the masquerade. Everyone always is calling 
> ast_channel_move(), performs the masquerade itself before returning.
> 
> In this patch, I have removed all blocks of code from channel.c that will 
> attempt to perform a masquerade if ast_channel_masq() returns true. Now, 
> there is no distinction between setting up a masquerade and performing the 
> masquerade. It is one operation. The only remaining checks for 
> ast_channel_masq() and ast_channel_masqr() are in ast_hangup() since we do 
> not want to interrupt a masquerade by hanging up the channel. Instead, now 
> ast_hangup() will wait for a masquerade to complete before moving forward 
> with its operation.
> 
> The ast_channel_move() function has been modified to basically in-line the 
> logic that used to be in ast_channel_masquerade(). ast_channel_masquerade() 
> has been killed off for real. ast_channel_move() now has a lock associated 
> with it that is used to prevent any simultaneous moves from occurring at 
> once. This means there is no need to make sure that ast_channel_masq() or 
> ast_channel_masqr() are already set on a channel when ast_channel_move() is 
> called. It also means the channel container lock is not pulling double duty 
> by both keeping the container locked and preventing multiple masquerades from 
> occurring simultaneously.
> 
> The ast_do_masquerade() function has been renamed to do_channel_masquerade() 
> and is now internal to channel.c. The function now takes explicit arguments 
> of which channels are involved in the masquerade instead of a single channel. 
> While it probably is possible to do some further refactoring of this method, 
> I feel that I would be treading dangerously. Instead, all I did was change 
> some comments that no longer are true after this changeset.
> 
> The other more minor change introduced in this patch is to res_pjsip.c to 
> make ast_sip_push_task_synchronous() run the task in-place if we are already 
> a SIP servant thread. This is related to this patch because even when we 
> isolate the channel masquerade to only running in the SIP servant thread, we 
> would still deadlock when the fixup() callback is reached since we would 
> essentially be waiting forever for ourselves to finish before actually 
> running the fixup. This makes it so the fixup is run without having to push a 
> task into a serializer at all.
> 
> 
> Diffs
> -----
> 
>   /branches/12/res/res_pjsip.c 403349 
>   /branches/12/main/channel.c 403343 
>   /branches/12/include/asterisk/channel.h 403343 
> 
> Diff: https://reviewboard.asterisk.org/r/3069/diff/
> 
> 
> Testing
> -------
> 
> Jonathan ran the same test he had originally run in order to make the 
> deadlock occur, and with this patch it no longer occurs. I also ran a bevy of 
> testsuite tests to see if there were any that did not pass with this patch 
> but did pass without the patch. I found none on my machine, though I'll admit 
> I did not run every test.
> 
> 
> Thanks,
> 
> Mark Michelson
> 
>

-- 
_____________________________________________________________________
-- Bandwidth and Colocation Provided by http://www.api-digital.com --

asterisk-dev mailing list
To UNSUBSCRIBE or update options visit:
   http://lists.digium.com/mailman/listinfo/asterisk-dev

Reply via email to