> On Aug. 24, 2014, 3:56 p.m., Joshua Colp wrote:
> > My only concern with this is that we somehow have code which expected to be 
> > invoked at #3 - do you think this is possible at all?
> 
> Joshua Colp wrote:
>     It may also be useful to document your flow discovery above within the 
> code somewhere for future reference.

The only thing I could think of is if a session supplement specifically wants 
to be called before media negotiation so that it may play a part in how media 
ends up getting negotiated. Does anything do that?


- Mark


-----------------------------------------------------------
This is an automatically generated e-mail. To reply, visit:
https://reviewboard.asterisk.org/r/3930/#review13157
-----------------------------------------------------------


On Aug. 24, 2014, 3:52 p.m., Mark Michelson wrote:
> 
> -----------------------------------------------------------
> This is an automatically generated e-mail. To reply, visit:
> https://reviewboard.asterisk.org/r/3930/
> -----------------------------------------------------------
> 
> (Updated Aug. 24, 2014, 3:52 p.m.)
> 
> 
> Review request for Asterisk Developers.
> 
> 
> Bugs: ASTERISK-24212
>     https://issues.asterisk.org/jira/browse/ASTERISK-24212
> 
> 
> Repository: Asterisk
> 
> 
> Description
> -------
> 
> In /r/3927, I introduced a changeset that fixes a scheduler race condition. I 
> noted on that issue that even though I was no longer seeing the same crashes 
> when running tests, I was still seeing occasional test failures due to some 
> media-related race condition.
> 
> I have determined the cause of this race condition. PJSIP's inv_session API 
> provides two callbacks, on_state_changed() and on_tsx_state_changed(). Both 
> of these are called into when the state of a transaction changes, and the 
> documentation is not particularly clear about what the difference is between 
> these. Through some investigation, I've found that what happens is:
> 
> 1) Transaction layer detects a state change (such as a request/response being 
> sent/received, or a timeout)
> 2) Transaction layer informs the inv_session layer.
> 3) inv_session layer sets the new state and calls the on_state_changed() 
> callback.
> 4) inv_session layer performs its processing of the transaction state change 
> (including potential calls to the on_media_update() callback).
> 5) inv_session layer calls the on_tsx_state_changed() callback.
> 
> res_pjsip_session is hooked into step (3) such that when a new SIP request or 
> response is sent or received, we call into session supplements to let them 
> react to the new message. One of these session supplements is in chan_pjsip 
> and queues an AST_CONTROL_ANSWER frame when a 200 OK response is received.
> 
> In the test where I am seeing the race condition occur, a call is originated 
> to a PJSIP channel (Alice). Once Alice answers, her channel is placed into 
> the dialplan to execute the TalkDetect() application. The problem here is 
> that at step (3), the AST_CONTROL_ANSWER frame is queued onto Alice's 
> channel, and the PBX thread moves the channel into the TalkDetect() 
> application before step (4) is executed. Step (4) is important for setting 
> formats on Alice's channel and setting appropriate translation paths as well. 
> Since step (4) has not been completed yet, the attempt to play a file as part 
> of TalkDetect() fails, and therefore the test fails. In most runs, however, 
> step (4) does get completed before the PBX thread moves Alice's channel into 
> TalkDetect().
> 
> The fix presented here is pretty simple. For transaction state changes, we 
> are now hooked into step (5) instead of step (3) to call session supplements. 
> This means that we do not call into session supplements until media has been 
> negotiated on the channel. This eliminates the race condition entirely.
> 
> 
> Diffs
> -----
> 
>   /branches/12/res/res_pjsip_session.c 421883 
> 
> Diff: https://reviewboard.asterisk.org/r/3930/diff/
> 
> 
> Testing
> -------
> 
> I ran the 
> tests/channels/pjsip/basic_calls/two_parties/nominal/alice_initiated/alice_hangs_up
>  test in a loop on my console both before and after applying this patch. 
> Before applying the patch, I would usually encounter a test failure within an 
> hour or two. After applying this patch, I left the test running in a loop for 
> over 24 hours and never had a test failure.
> 
> 
> Thanks,
> 
> Mark Michelson
> 
>

-- 
_____________________________________________________________________
-- Bandwidth and Colocation Provided by http://www.api-digital.com --

asterisk-dev mailing list
To UNSUBSCRIBE or update options visit:
   http://lists.digium.com/mailman/listinfo/asterisk-dev

Reply via email to