> On Oct. 31, 2014, 12:39 p.m., rmudgett wrote:
> > /branches/12/main/bridge.c, line 4253
> > <https://reviewboard.asterisk.org/r/4135/diff/2/?file=68675#file68675line4253>
> >
> >     I have a feeling that the race also affects parking here.  If so then 
> > holding the bridge lock is not a viable solution.
> 
> Mark Michelson wrote:
>     I took a gander through the parking code and didn't find anywhere where 
> the blind transfer operation was being queued onto a bridge channel. Did you 
> have a specific spot you had found in the parking code that looked suspicious?

At the end of 
res/parking/parking_bridge_features.c:parking_blind_transfer_park() an action 
is written to the bridge for the peer to move itself to the parking/holding 
bridge.  If this happens before the transfer is published you have the same 
race condition.


- rmudgett


-----------------------------------------------------------
This is an automatically generated e-mail. To reply, visit:
https://reviewboard.asterisk.org/r/4135/#review13651
-----------------------------------------------------------


On Oct. 31, 2014, 10:58 a.m., Mark Michelson wrote:
> 
> -----------------------------------------------------------
> This is an automatically generated e-mail. To reply, visit:
> https://reviewboard.asterisk.org/r/4135/
> -----------------------------------------------------------
> 
> (Updated Oct. 31, 2014, 10:58 a.m.)
> 
> 
> Review request for Asterisk Developers.
> 
> 
> Repository: Asterisk
> 
> 
> Description
> -------
> 
> During blind transfer testing, it was noticed that tests were failing 
> occasionally because the ARI blind transfer event was not being sent. After 
> investigating, I detected a race condition in the blind transfer code. When 
> blind transferring a single channel, the actual transfer operation (i.e. 
> removing the transferee from the bridge and directing them to the proper 
> dialplan location) is queued onto the transferee bridge channel. After 
> queuing the transfer operation, the blind transfer Stasis message is 
> published. At the time of publication, snapshots of the channels and bridge 
> involved are created. The ARI subscriber to the blind transfer Stasis message 
> then attempts to determine if the bridge or any of the involved channels are 
> subscribed to by ARI applications. If so, then the blind transfer message is 
> sent to the applications. The way that the ARI blind transfer message handler 
> works is to first see if the transferer channel is subscribed to. If not, 
> then iterate over all the ch
 annel IDs in the bridge snapshot and determine if any of those are subscribed 
to. In the test we were running, the lone transferee channel was subscribed to, 
so an ARI event should have been sent to our application. Occasionally, though, 
the bridge snapshot did not have any channels IDs on it at all. Why?
> 
> The problem is that since the blind transfer operation is handled by a 
> separate thread, it is possible that the transfer will have completed and the 
> channels removed from the bridge before we publish the blind transfer Stasis 
> message. Since the blind transfer has completed, the bridge on which the 
> transfer occurred no longer has any channels on it, so the resulting bridge 
> snapshot has no channels on it. Through investigation of the code, I found 
> that attended transfers can have this issue too for the case where a 
> transferee is transferred to an application.
> 
> To fix this problem, I thought of four possible fixes:
> 1) Let the thread that actually performs the blind transfer publish the 
> Stasis message.
> 2) Get the bridge snapshot before queuing the blind transfer operation.
> 3) Publish the blind transfer Stasis message before queuing the blind 
> transfer operation.
> 4) Hold the bridge lock while queuing the blind transfer operation and 
> publishing the blind transfer Stasis message.
> 
> Option 1 is clearly the "best" option, but it also is made nearly impossible 
> due to the way that bridge channel operations are queued. Bridge channel 
> operations use frames, which require that their payload be copyable using 
> memcpy(). Including any sort of pointer is a no-no. So I would be forced to 
> come up with some inane method of representing multiple channels and bridges 
> in a frame in order to do this.
> 
> Option 2 is slightly more workable. Currently, there are functions to publish 
> blind and attended transfers that require bridges and channels, not 
> snapshots. Changing the API to accommodate snapshots is possible, but it is 
> more widespread than I would like, and it changes the API. It also runs the 
> slight risk of publishing "stale" data, though that is not likely.
> 
> Option 3 is easiest to implement, but it runs the (very slight) risk that we 
> could publish that a blind transfer happened successfully when in fact the 
> attempt to queue the blind transfer operation failed. I almost went with 
> this, but I felt like the possibility of lying in our events was a bad thing 
> to do.
> 
> Option 4 is my least favorite, but within a release felt like the most viable 
> method. The bridge is already locked while publishing the transfer messages, 
> and I didn't see any harm in having the bridge locked while queuing the 
> transfer operation either. By holding the bridge lock the entire time, we 
> prevent the bridge state from changing out from under us while we publish our 
> Stasis message. This means that the bridge will stay stable until we finally 
> unlock after publishing the Stasis message.
> 
> 
> Diffs
> -----
> 
>   /branches/12/main/bridge.c 426802 
> 
> Diff: https://reviewboard.asterisk.org/r/4135/diff/
> 
> 
> Testing
> -------
> 
> I have re-run the blind transfer tests that had occasionally failed, and I do 
> not see failures. More importantly, the tests still function properly and the 
> change to locking has not introduced any noticeable bad effects.
> 
> 
> Thanks,
> 
> Mark Michelson
> 
>

-- 
_____________________________________________________________________
-- Bandwidth and Colocation Provided by http://www.api-digital.com --

asterisk-dev mailing list
To UNSUBSCRIBE or update options visit:
   http://lists.digium.com/mailman/listinfo/asterisk-dev

Reply via email to