-----------------------------------------------------------
This is an automatically generated e-mail. To reply, visit:
https://reviewboard.asterisk.org/r/3930/
-----------------------------------------------------------

(Updated Aug. 24, 2014, 11:37 p.m.)


Review request for Asterisk Developers.


Changes
-------

Added doxygen for the two callbacks in res_pjsip_session.c


Bugs: ASTERISK-24212
    https://issues.asterisk.org/jira/browse/ASTERISK-24212


Repository: Asterisk


Description
-------

In /r/3927, I introduced a changeset that fixes a scheduler race condition. I 
noted on that issue that even though I was no longer seeing the same crashes 
when running tests, I was still seeing occasional test failures due to some 
media-related race condition.

I have determined the cause of this race condition. PJSIP's inv_session API 
provides two callbacks, on_state_changed() and on_tsx_state_changed(). Both of 
these are called into when the state of a transaction changes, and the 
documentation is not particularly clear about what the difference is between 
these. Through some investigation, I've found that what happens is:

1) Transaction layer detects a state change (such as a request/response being 
sent/received, or a timeout)
2) Transaction layer informs the inv_session layer.
3) inv_session layer sets the new state and calls the on_state_changed() 
callback.
4) inv_session layer performs its processing of the transaction state change 
(including potential calls to the on_media_update() callback).
5) inv_session layer calls the on_tsx_state_changed() callback.

res_pjsip_session is hooked into step (3) such that when a new SIP request or 
response is sent or received, we call into session supplements to let them 
react to the new message. One of these session supplements is in chan_pjsip and 
queues an AST_CONTROL_ANSWER frame when a 200 OK response is received.

In the test where I am seeing the race condition occur, a call is originated to 
a PJSIP channel (Alice). Once Alice answers, her channel is placed into the 
dialplan to execute the TalkDetect() application. The problem here is that at 
step (3), the AST_CONTROL_ANSWER frame is queued onto Alice's channel, and the 
PBX thread moves the channel into the TalkDetect() application before step (4) 
is executed. Step (4) is important for setting formats on Alice's channel and 
setting appropriate translation paths as well. Since step (4) has not been 
completed yet, the attempt to play a file as part of TalkDetect() fails, and 
therefore the test fails. In most runs, however, step (4) does get completed 
before the PBX thread moves Alice's channel into TalkDetect().

The fix presented here is pretty simple. For transaction state changes, we are 
now hooked into step (5) instead of step (3) to call session supplements. This 
means that we do not call into session supplements until media has been 
negotiated on the channel. This eliminates the race condition entirely.


Diffs (updated)
-----

  /branches/12/res/res_pjsip_session.c 421883 

Diff: https://reviewboard.asterisk.org/r/3930/diff/


Testing
-------

I ran the 
tests/channels/pjsip/basic_calls/two_parties/nominal/alice_initiated/alice_hangs_up
 test in a loop on my console both before and after applying this patch. Before 
applying the patch, I would usually encounter a test failure within an hour or 
two. After applying this patch, I left the test running in a loop for over 24 
hours and never had a test failure.


Thanks,

Mark Michelson

-- 
_____________________________________________________________________
-- Bandwidth and Colocation Provided by http://www.api-digital.com --

asterisk-dev mailing list
To UNSUBSCRIBE or update options visit:
   http://lists.digium.com/mailman/listinfo/asterisk-dev

Reply via email to