> On Dec. 12, 2013, 4:03 p.m., Mark Michelson wrote: > > Josh and I talked on IRC, and it would really be great if the channels > > involved in the masquerade could be locked for the duration of the > > operation instead of having to unlock them in order to send indications to > > them. That way, it could be guaranteed that any time something acquires a > > channel lock, it is not interrupting the middle of a masquerade and can > > freely do what it needs to. I think such refactoring could actually be > > done, but I also feel like since that sort of refactoring does not directly > > affect the problem being fixed here, it should be saved for a separate > > review.
Agreed. If (when?) this goes in, I'd make a new issue in JIRA for that part of it. - Matt ----------------------------------------------------------- This is an automatically generated e-mail. To reply, visit: https://reviewboard.asterisk.org/r/3069/#review10408 ----------------------------------------------------------- On Dec. 12, 2013, 3:59 p.m., Mark Michelson wrote: > > ----------------------------------------------------------- > This is an automatically generated e-mail. To reply, visit: > https://reviewboard.asterisk.org/r/3069/ > ----------------------------------------------------------- > > (Updated Dec. 12, 2013, 3:59 p.m.) > > > Review request for Asterisk Developers, Joshua Colp, Jonathan Rose, Matt > Jordan, and rmudgett. > > > Bugs: ASTERISK-22936 > https://issues.asterisk.org/jira/browse/ASTERISK-22936 > > > Repository: Asterisk > > > Description > ------- > > This is more-or-less a take 2 of https://reviewboard.asterisk.org/r/3042 > > Initially, it was discovered that performing an attended transfer of a > multiparty bridge with a PJSIP channel would cause a deadlock. A PBX thread > started a masquerade and reached the point where it was calling the fixup() > callback on the "original" channel. For chan_pjsip, this involves pushing a > synchronous task to the session's serializer. The problem was that a task > ahead of the fixup task was also attempting to perform a channel masquerade. > However, since masquerades are designed in a way to only allow for one to > occur at a time, the task ahead of the fixup could not continue until the > masquerade already in progress had completed. And of course, the masquerade > in progress could not complete until the task ahead of the fixup task had > completed. Deadlock. > > The initial fix was to change the fixup task to be asynchronous. While this > prevented the deadlock from occurring, it had the frightful side effect of > potentially allowing for tasks in the session's serializer to operate on a > zombie channel. > > Taking a step back from this particular deadlock, it became clear that the > problem was not really this one particular issue but that masquerades > themselves needed to be addressed. A PJSIP attended transfer operation calls > ast_channel_move(), which attempts to both set up and execute a masquerade. > The problem was that after it had set up the masquerade, the PBX thread had > swooped in and tried to actually perform the masquerade. Looking at changes > that had been made to Asterisk 12, it became clear that there never is any > time now that anyone ever wants to set up a masquerade and allow for the > channel thread to actually perform the masquerade. Everyone always is calling > ast_channel_move(), performs the masquerade itself before returning. > > In this patch, I have removed all blocks of code from channel.c that will > attempt to perform a masquerade if ast_channel_masq() returns true. Now, > there is no distinction between setting up a masquerade and performing the > masquerade. It is one operation. The only remaining checks for > ast_channel_masq() and ast_channel_masqr() are in ast_hangup() since we do > not want to interrupt a masquerade by hanging up the channel. Instead, now > ast_hangup() will wait for a masquerade to complete before moving forward > with its operation. > > The ast_channel_move() function has been modified to basically in-line the > logic that used to be in ast_channel_masquerade(). ast_channel_masquerade() > has been killed off for real. ast_channel_move() now has a lock associated > with it that is used to prevent any simultaneous moves from occurring at > once. This means there is no need to make sure that ast_channel_masq() or > ast_channel_masqr() are already set on a channel when ast_channel_move() is > called. It also means the channel container lock is not pulling double duty > by both keeping the container locked and preventing multiple masquerades from > occurring simultaneously. > > The ast_do_masquerade() function has been renamed to do_channel_masquerade() > and is now internal to channel.c. The function now takes explicit arguments > of which channels are involved in the masquerade instead of a single channel. > While it probably is possible to do some further refactoring of this method, > I feel that I would be treading dangerously. Instead, all I did was change > some comments that no longer are true after this changeset. > > The other more minor change introduced in this patch is to res_pjsip.c to > make ast_sip_push_task_synchronous() run the task in-place if we are already > a SIP servant thread. This is related to this patch because even when we > isolate the channel masquerade to only running in the SIP servant thread, we > would still deadlock when the fixup() callback is reached since we would > essentially be waiting forever for ourselves to finish before actually > running the fixup. This makes it so the fixup is run without having to push a > task into a serializer at all. > > > Diffs > ----- > > /branches/12/res/res_pjsip.c 403349 > /branches/12/main/channel.c 403343 > /branches/12/include/asterisk/channel.h 403343 > > Diff: https://reviewboard.asterisk.org/r/3069/diff/ > > > Testing > ------- > > Jonathan ran the same test he had originally run in order to make the > deadlock occur, and with this patch it no longer occurs. I also ran a bevy of > testsuite tests to see if there were any that did not pass with this patch > but did pass without the patch. I found none on my machine, though I'll admit > I did not run every test. > > > Thanks, > > Mark Michelson > >
-- _____________________________________________________________________ -- Bandwidth and Colocation Provided by http://www.api-digital.com -- asterisk-dev mailing list To UNSUBSCRIBE or update options visit: http://lists.digium.com/mailman/listinfo/asterisk-dev