> On Aug. 24, 2014, 3:56 p.m., Joshua Colp wrote: > > My only concern with this is that we somehow have code which expected to be > > invoked at #3 - do you think this is possible at all? > > Joshua Colp wrote: > It may also be useful to document your flow discovery above within the > code somewhere for future reference.
The only thing I could think of is if a session supplement specifically wants to be called before media negotiation so that it may play a part in how media ends up getting negotiated. Does anything do that? - Mark ----------------------------------------------------------- This is an automatically generated e-mail. To reply, visit: https://reviewboard.asterisk.org/r/3930/#review13157 ----------------------------------------------------------- On Aug. 24, 2014, 3:52 p.m., Mark Michelson wrote: > > ----------------------------------------------------------- > This is an automatically generated e-mail. To reply, visit: > https://reviewboard.asterisk.org/r/3930/ > ----------------------------------------------------------- > > (Updated Aug. 24, 2014, 3:52 p.m.) > > > Review request for Asterisk Developers. > > > Bugs: ASTERISK-24212 > https://issues.asterisk.org/jira/browse/ASTERISK-24212 > > > Repository: Asterisk > > > Description > ------- > > In /r/3927, I introduced a changeset that fixes a scheduler race condition. I > noted on that issue that even though I was no longer seeing the same crashes > when running tests, I was still seeing occasional test failures due to some > media-related race condition. > > I have determined the cause of this race condition. PJSIP's inv_session API > provides two callbacks, on_state_changed() and on_tsx_state_changed(). Both > of these are called into when the state of a transaction changes, and the > documentation is not particularly clear about what the difference is between > these. Through some investigation, I've found that what happens is: > > 1) Transaction layer detects a state change (such as a request/response being > sent/received, or a timeout) > 2) Transaction layer informs the inv_session layer. > 3) inv_session layer sets the new state and calls the on_state_changed() > callback. > 4) inv_session layer performs its processing of the transaction state change > (including potential calls to the on_media_update() callback). > 5) inv_session layer calls the on_tsx_state_changed() callback. > > res_pjsip_session is hooked into step (3) such that when a new SIP request or > response is sent or received, we call into session supplements to let them > react to the new message. One of these session supplements is in chan_pjsip > and queues an AST_CONTROL_ANSWER frame when a 200 OK response is received. > > In the test where I am seeing the race condition occur, a call is originated > to a PJSIP channel (Alice). Once Alice answers, her channel is placed into > the dialplan to execute the TalkDetect() application. The problem here is > that at step (3), the AST_CONTROL_ANSWER frame is queued onto Alice's > channel, and the PBX thread moves the channel into the TalkDetect() > application before step (4) is executed. Step (4) is important for setting > formats on Alice's channel and setting appropriate translation paths as well. > Since step (4) has not been completed yet, the attempt to play a file as part > of TalkDetect() fails, and therefore the test fails. In most runs, however, > step (4) does get completed before the PBX thread moves Alice's channel into > TalkDetect(). > > The fix presented here is pretty simple. For transaction state changes, we > are now hooked into step (5) instead of step (3) to call session supplements. > This means that we do not call into session supplements until media has been > negotiated on the channel. This eliminates the race condition entirely. > > > Diffs > ----- > > /branches/12/res/res_pjsip_session.c 421883 > > Diff: https://reviewboard.asterisk.org/r/3930/diff/ > > > Testing > ------- > > I ran the > tests/channels/pjsip/basic_calls/two_parties/nominal/alice_initiated/alice_hangs_up > test in a loop on my console both before and after applying this patch. > Before applying the patch, I would usually encounter a test failure within an > hour or two. After applying this patch, I left the test running in a loop for > over 24 hours and never had a test failure. > > > Thanks, > > Mark Michelson > >
-- _____________________________________________________________________ -- Bandwidth and Colocation Provided by http://www.api-digital.com -- asterisk-dev mailing list To UNSUBSCRIBE or update options visit: http://lists.digium.com/mailman/listinfo/asterisk-dev