----------------------------------------------------------- This is an automatically generated e-mail. To reply, visit: https://reviewboard.asterisk.org/r/3930/#review13188 -----------------------------------------------------------
Ship it! Ship It! /branches/12/include/asterisk/res_pjsip_session.h <https://reviewboard.asterisk.org/r/3930/#comment23664> Document the default value for this. - Joshua Colp On Aug. 26, 2014, 10:44 p.m., Mark Michelson wrote: > > ----------------------------------------------------------- > This is an automatically generated e-mail. To reply, visit: > https://reviewboard.asterisk.org/r/3930/ > ----------------------------------------------------------- > > (Updated Aug. 26, 2014, 10:44 p.m.) > > > Review request for Asterisk Developers. > > > Bugs: ASTERISK-24212 > https://issues.asterisk.org/jira/browse/ASTERISK-24212 > > > Repository: Asterisk > > > Description > ------- > > In /r/3927, I introduced a changeset that fixes a scheduler race condition. I > noted on that issue that even though I was no longer seeing the same crashes > when running tests, I was still seeing occasional test failures due to some > media-related race condition. > > I have determined the cause of this race condition. PJSIP's inv_session API > provides two callbacks, on_state_changed() and on_tsx_state_changed(). Both > of these are called into when the state of a transaction changes, and the > documentation is not particularly clear about what the difference is between > these. Through some investigation, I've found that what happens is: > > 1) Transaction layer detects a state change (such as a request/response being > sent/received, or a timeout) > 2) Transaction layer informs the inv_session layer. > 3) inv_session layer sets the new state and calls the on_state_changed() > callback. > 4) inv_session layer performs its processing of the transaction state change > (including potential calls to the on_media_update() callback). > 5) inv_session layer calls the on_tsx_state_changed() callback. > > res_pjsip_session is hooked into step (3) such that when a new SIP request or > response is sent or received, we call into session supplements to let them > react to the new message. One of these session supplements is in chan_pjsip > and queues an AST_CONTROL_ANSWER frame when a 200 OK response is received. > > In the test where I am seeing the race condition occur, a call is originated > to a PJSIP channel (Alice). Once Alice answers, her channel is placed into > the dialplan to execute the TalkDetect() application. The problem here is > that at step (3), the AST_CONTROL_ANSWER frame is queued onto Alice's > channel, and the PBX thread moves the channel into the TalkDetect() > application before step (4) is executed. Step (4) is important for setting > formats on Alice's channel and setting appropriate translation paths as well. > Since step (4) has not been completed yet, the attempt to play a file as part > of TalkDetect() fails, and therefore the test fails. In most runs, however, > step (4) does get completed before the PBX thread moves Alice's channel into > TalkDetect(). > > The fix presented here is pretty simple. For transaction state changes, we > are now hooked into step (5) instead of step (3) to call session supplements. > This means that we do not call into session supplements until media has been > negotiated on the channel. This eliminates the race condition entirely. > > > ***EDIT 26 Aug, 2014*** > After running all PJSIP tests in the testsuite, I found that things weren't > quite so cut and dry as they were initially presented here. Here are some > additional findings: > * For incoming 3XX responses, it's actually at step (2) that we get told > about the redirection. There are some session supplements that need to be > called back at this time. > * Not all session supplements can safely wait until after media handling to > be called into. For instance, the NAT handling code that rewrites contacts > needs to be called into before media handling on an inbound 200 OK. > * In something pretty much completely unrelated, I found that there was an > undiscovered bug in res_pjsip_session.c that prevented incoming reinvites > from reaching session supplements. > > With regards to the first two bullet points, I have added a > "response_priority" field to session supplements. This allows for a session > supplement to tell res_pjsip_session at what point during response handling > it should be called into. All but two session supplements are called into at > the same point that they were prior to writing this patch. However, two > supplements now specify when they should be called into: > 1) res_pjsip_diversion's supplement specifies to be called into before > processing redirections. > 2) One of chan_pjsip's session supplements specifies to be called into after > media negotiation. > > I discovered the bug in the third bullet by accidentally fixing it. Because > of this change, there are some session supplements in chan_pjsip that now > will check INVITE state and stop processing if dealing with a reinvite. > > > Diffs > ----- > > /branches/12/res/res_pjsip_session.c 421883 > /branches/12/res/res_pjsip_diversion.c 421883 > /branches/12/include/asterisk/res_pjsip_session.h 421883 > /branches/12/channels/chan_pjsip.c 421883 > > Diff: https://reviewboard.asterisk.org/r/3930/diff/ > > > Testing > ------- > > I ran the > tests/channels/pjsip/basic_calls/two_parties/nominal/alice_initiated/alice_hangs_up > test in a loop on my console both before and after applying this patch. > Before applying the patch, I would usually encounter a test failure within an > hour or two. After applying this patch, I left the test running in a loop for > over 24 hours and never had a test failure. > > > Thanks, > > Mark Michelson > >
-- _____________________________________________________________________ -- Bandwidth and Colocation Provided by http://www.api-digital.com -- asterisk-dev mailing list To UNSUBSCRIBE or update options visit: http://lists.digium.com/mailman/listinfo/asterisk-dev