Re: [asterisk-dev] bridge_unreal: An alternative approach to Local/Unreal channel optimization
On Mon, Mar 10, 2014 at 7:27 AM, Matthew Jordan mjor...@digium.com wrote:
> On Mon, Mar 10, 2014 at 6:59 AM, Joshua Colp jc...@digium.com wrote:
>> Matthew Jordan wrote:
>> snip
>>
>> The NLB compatibility code actually checks whether something like a MixMonitor is on either Local channel and won't allow it to be used. Now that I've given a diagram to show where things optimize and how it isn't inside of chan_local... what do you think NOW? ;)
>
> I think this proposal is tantamount to killing Local channel optimization. I'm not sure that's a bad thing, but I'd certainly like to get more opinions.

Updating this thread with some more thoughts. These are a bit random, but hopefully they'll spark some conversation about possibilities here:

* We probably can't get rid of Local channel optimization. While it is ugly - and prone to causing strange things to happen both in the core and from the perspective of an external user - there's at least one use case that needs this feature: collapsing two RTP capable channels into a native bridge. For example, assume we have the following:

  [Diagram: SIP/foo and Local;1 in bridge B0; Local;2 and SIP/bar in bridge B1]

  Here, B0 and B1 would currently be simple two party bridges with the media flowing through the core. If feature requirements meant that SIP/foo and SIP/bar could not be natively bridged - even if they were directly in a bridge together - then optimizing this scenario doesn't buy much performance. If, however, optimizing away the Local channel would result in the bridge between SIP/foo and SIP/bar being a native bridge, then the performance gain is significant.

* There are lots of strange edge cases with Local channels. Consider, for example, some of the following scenarios (all of which are possible in 12):

** Local channel between Real channel and multi-party bridge with Real channels. In this case, optimization should result in the Real/A channel being pushed into bridge B1.

   [Diagram: Real/A and Local;1 in bridge B0; Local;2 in multi-party bridge B1 with Real channels]

** Local channel between Local channel and a multi-party bridge with Real channels. Here, there's a possible race condition between the Local channels: we don't know for sure what is on the other end of the Local/A;2 channel. Finding out is also a bad idea - a whole lot of things would have to be locked in order to get that information. What's more, there may be another Local channel beyond Local/A! This is where Josh's proposal comes into play, as the information is passed down the chain - making it so that optimizations don't have to occur in Local channel chains. At the same time, we may want to try and optimize away the Local channel between the multi-party bridge of Real channels and the other Local channel. Assuming Local/A doesn't win in an optimization race, we'd want Local/A to take the place of the existing Local channel - but we have to prevent it from optimizing away at the same time.

   [Diagram: Local/A;2 and Local;1 in bridge B0; Local;2 in multi-party bridge B1 with Real channels]

** Local channel between two multi-party bridges. Here, there are really two ways to handle this: either don't optimize away, or merge both bridges together into one massive multi-party bridge.

   [Diagram: Real channels and Local;1 in multi-party bridge B0; Local;2 and Real channels in multi-party bridge B1]

** Two Local channels optimizing into a multi-party bridge. Both our Local channel - as well as Local/B - may attempt to optimize the channels on the other ends into B1 at the same time. The bridge has to carefully manage this process.

   [Diagram: Real/A and Local;1 in bridge B0; Local;2 in multi-party bridge B1 with Local/B and Real channels]

All of these scenarios are currently handled by core_unreal and core_local in some fashion. It is, however, very complex code that - particularly with Local channel chains - is prone to error. The implementation today faces three problems:

(1) Knowledge of what is on the other side of the bridge is known by the bridge, but not by either Local channel half. In order to get that knowledge, both Local channel halves must take control of the bridge (and all of its participants), then synchronize with each other.

(2) When multiple Local channels can optimize in a chain, they have to communicate with each other (or at least compete with each other) to see who optimizes out first. This can change the information that a Local channel has about how it can optimize: for example, a Local channel may view that it is in a two party bridge with another Local channel, attempt to optimize, only to find out later that it is now in a multi-party bridge with multiple Real channels.

[...]
Re: [asterisk-dev] bridge_unreal: An alternative approach to Local/Unreal channel optimization
Matthew Jordan wrote:
> snip
>
> All of these scenarios are currently handled by core_unreal and core_local in some fashion. It is, however, very complex code that - particularly with Local channel chains - is prone to error. The implementation today faces three problems:
>
> (1) Knowledge of what is on the other side of the bridge is known by the bridge, but not by either Local channel half. In order to get that knowledge, both Local channel halves must take control of the bridge (and all of its participants), then synchronize with each other.
>
> (2) When multiple Local channels can optimize in a chain, they have to communicate with each other (or at least compete with each other) to see who optimizes out first. This can change the information that a Local channel has about how it can optimize: for example, a Local channel may view that it is in a two party bridge with another Local channel, attempt to optimize, only to find out later that it is now in a multi-party bridge with multiple Real channels.
>
> (3) When optimization occurs, there can be *no* information in flight on the Local channel. This is particularly difficult as the write queue exists on the ast_channel struct - which means that the bridging layer has to be informed to not write to the channel when the optimization occurs. Again, more points of synchronization and locking.
>
> There are a few possible approaches that may simplify the implementation:
>
> * Use approaches such as Josh's native Local bridge to move logic out of core_unreal and core_local into bridge implementations. The bridges actually have state now, and *know* who is in the bridge with them. A bridge implementation could be written that handles a Local channel + one other channel, and it could tell the Local channel when it can optimize.

I ended up toying with a prototype[1] last night which does Local channel optimization using this approach. It implements a native bridge technology which requires at least one Local channel to be present in the bridge. Once two channels have joined, it stores the bridge and peer channel on the shared structure of each Local channel in the bridge. If the shared structure contains information about both sides of the Local channel, it queues up a task with all of the bridges/channels to optimize. The task is executed in a serialized fashion using a taskprocessor and moves the respective channels around.

If there is a chain of Local channels involved then multiple tasks are queued. Some may fail due to actions taken before they are executed, but another task will have already been queued to optimize once again. This happens until the entire chain is collapsed.

[1] http://svn.digium.com/svn/asterisk/team/file/bridge_unreal_optimizer/

--
Joshua Colp
Digium, Inc. | Senior Software Developer
445 Jan Davis Drive NW - Huntsville, AL 35806 - US
Check us out at: www.digium.com www.asterisk.org

--
_
-- Bandwidth and Colocation Provided by http://www.api-digital.com --

asterisk-dev mailing list
To UNSUBSCRIBE or update options visit:
   http://lists.digium.com/mailman/listinfo/asterisk-dev
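The serialized task queue described in this message can be illustrated with a small model. This is a hedged sketch in Python, not the prototype's code: `ChainOptimizer`, its methods, and the list-based bridges are all hypothetical names invented for illustration, and a plain FIFO `deque` stands in for the taskprocessor. It assumes two-party bridges only and shows how a task queued with stale bridge information fails harmlessly while a freshly queued task finishes collapsing the chain.

```python
from collections import deque


class ChainOptimizer:
    """Toy model (hypothetical): two-party bridges joined by Local pairs."""

    def __init__(self):
        self.bridges = {}     # bridge name -> list of channel names
        self.tasks = deque()  # FIFO queue standing in for a taskprocessor
        self.results = []     # outcome of each executed task, in order

    def _bridge_of(self, channel):
        for name, members in self.bridges.items():
            if channel in members:
                return name
        return None

    def _queue_if_ready(self, channel):
        # Queue a task only once both halves sit in distinct two-party bridges.
        if not channel.startswith("Local"):
            return
        base = channel.rsplit(";", 1)[0]
        b1 = self._bridge_of(base + ";1")
        b2 = self._bridge_of(base + ";2")
        if b1 and b2 and b1 != b2:
            if len(self.bridges[b1]) == 2 and len(self.bridges[b2]) == 2:
                self.tasks.append((base, b1, b2))

    def join(self, bridge, channel):
        members = self.bridges.setdefault(bridge, [])
        members.append(channel)
        if len(members) == 2:
            for chan in list(members):
                self._queue_if_ready(chan)

    def run(self):
        # Tasks run strictly one at a time, in the order they were queued.
        while self.tasks:
            base, b1, b2 = self.tasks.popleft()
            half1, half2 = base + ";1", base + ";2"
            # The bridge names captured at queue time may be stale by now.
            if (half1 not in self.bridges.get(b1, [])
                    or half2 not in self.bridges.get(b2, [])):
                self.results.append((base, "stale"))
                continue
            # Move the peer of the ;2 half into the ;1 half's place.
            peer = next(c for c in self.bridges[b2] if c != half2)
            members = self.bridges[b1]
            members[members.index(half1)] = peer
            del self.bridges[b2]
            self.results.append((base, "optimized"))
            # The moved channel has effectively joined b1, so re-examine it.
            self._queue_if_ready(peer)


opt = ChainOptimizer()
for bridge, chan in [("B1", "Real-01"), ("B1", "Local-02;1"),
                     ("B2", "Local-02;2"), ("B2", "Local-03;1"),
                     ("B3", "Local-03;2"), ("B3", "Real-02")]:
    opt.join(bridge, chan)
opt.run()
print(opt.bridges)
print(opt.results)
```

In this model the second task for Local-03 was queued against bridge B2, which no longer exists by the time it runs, so it is skipped; the task queued when Local-03;1 moved into B1 then completes the collapse.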
Re: [asterisk-dev] bridge_unreal: An alternative approach to Local/Unreal channel optimization
My one concern is that if we stop optimizing Local channels and allow the ast_channel to live for the duration of the call, this could significantly increase the number of open FDs. This would be a bigger issue for systems using res_timing_timerfd, since that causes alert pipes to be created.

On Tue, Mar 11, 2014 at 11:39 AM, Joshua Colp jc...@digium.com wrote:
> Matthew Jordan wrote:
>> snip
>>
>> All of these scenarios are currently handled by core_unreal and core_local in some fashion. It is, however, very complex code that - particularly with Local channel chains - is prone to error. The implementation today faces three problems:
>>
>> (1) Knowledge of what is on the other side of the bridge is known by the bridge, but not by either Local channel half. In order to get that knowledge, both Local channel halves must take control of the bridge (and all of its participants), then synchronize with each other.
>>
>> (2) When multiple Local channels can optimize in a chain, they have to communicate with each other (or at least compete with each other) to see who optimizes out first. This can change the information that a Local channel has about how it can optimize: for example, a Local channel may view that it is in a two party bridge with another Local channel, attempt to optimize, only to find out later that it is now in a multi-party bridge with multiple Real channels.
>>
>> (3) When optimization occurs, there can be *no* information in flight on the Local channel. This is particularly difficult as the write queue exists on the ast_channel struct - which means that the bridging layer has to be informed to not write to the channel when the optimization occurs. Again, more points of synchronization and locking.
>>
>> There are a few possible approaches that may simplify the implementation:
>>
>> * Use approaches such as Josh's native Local bridge to move logic out of core_unreal and core_local into bridge implementations. The bridges actually have state now, and *know* who is in the bridge with them. A bridge implementation could be written that handles a Local channel + one other channel, and it could tell the Local channel when it can optimize.
>
> I ended up toying with a prototype[1] last night which does Local channel optimization using this approach. It implements a native bridge technology which requires at least one Local channel to be present in the bridge. Once two channels have joined, it stores the bridge and peer channel on the shared structure of each Local channel in the bridge. If the shared structure contains information about both sides of the Local channel, it queues up a task with all of the bridges/channels to optimize. The task is executed in a serialized fashion using a taskprocessor and moves the respective channels around.
>
> If there is a chain of Local channels involved then multiple tasks are queued. Some may fail due to actions taken before they are executed, but another task will have already been queued to optimize once again. This happens until the entire chain is collapsed.
>
> [1] http://svn.digium.com/svn/asterisk/team/file/bridge_unreal_optimizer/
Re: [asterisk-dev] bridge_unreal: An alternative approach to Local/Unreal channel optimization
Matthew Jordan wrote:
> snip
>
> It's important to point out that optimization's goal was never the removal of the channel. If anything, nuking Local channels has - in my opinion - always made life more difficult for everyone, not easier. The goal was performance - minimize the frame path.
>
> If I'm picturing this correctly, this doesn't *quite* optimize as efficiently as completely removing the Local channels - but it may still be sufficient.
>
> [Diagram: Real-01 and Local-02;1 in bridge B0; Local-02;1 and Local-02;2 joined by an NLB; Local-02;2 and Real-03 in bridge B1; Local-02;1 targets Real-03, Local-02;2 targets Real-01]
>
> In this case - and this is assuming I understand the proposed Native Local Bridge correctly! - Local-02;1 has as its actual destination target Real-03, while Local-02;2 has as its actual destination Real-01. When B0 pushes a frame to Local-02;1, Local-02;1 knows that it should just pass it on to its destination. Rather than passing to its bridge, it writes directly to Real-03. The same happens in reverse for Real-03 to Local-02;2.

As it is right now this approach can't optimize the bridged scenario above, but I'm not sure that's a bad thing. While having a goal of optimizing things as much as possible is good, in this scenario that's costing us a lot of complex code (with issues) and also requiring outside consumers to understand what can happen. I'd personally like to see Local channels become a connection between things instead of channels that can transmogrify, morph, and disappear. It makes both of our lives easier. To outside consumers they become channels that have the same semantics as other channels but are implemented as a connection, and a single event will be produced which shows you how they are connected.

As for your diagram, no NLB would be present there because a bridge does not exist within chan_local. Making it use the bridging framework would require rewriting it, as it cannot work within the confines of what bridging requires (ie: you can't have a channel doing two things at once). Where the NLB would be in use is this:

Real-01 (B1) - Local-02;1 - Local-02;2 (B2) - Local-03;1 - Local-03;2 (B3) - Real-02

In this case B2 would be an NLB and optimize things so media coming from Real-01 would be queued onto Local-03;2 for reading by B3 and media coming from Real-02 would be queued onto Local-02;1 for reading by B1. This bypasses Local-02;2 and Local-03;1 in the middle. This works no matter what each far end is doing. The reason optimizing your example is hard is because frames have to come from a channel within the bridge and pass through it.

> Creating a chain of these works by the real 'endpoints' getting passed down the chain of Local channels via control frames.
>
> There are two issues I can see with this - one minor, one maybe not.
>
> (1) There's a small amount of work here that occurs by the Local channel passing the frame on to its destination channel. It's minor, but it would be slightly more work than what occurs during today's optimization.
>
> (2) More seriously: I wonder if the destination shouldn't be a channel but a bridge. The above optimization cannot work for multi-party bridges: there is no single channel destination. Today's optimization does work in that scenario via a bridge swap - the single party on one end gets swapped with the Local channel in the multi-party bridge. This really is a minor case - the idea of optimizing channels into multi-party bridges is admittedly ridiculously new - but it may be useful to think through this use case.

Yes, as it is right now this doesn't optimize as much as the code that currently exists. I can say though that now, when any video or audio frame goes through a Local channel, the channel no longer attempts to optimize out. (Yes, every 20ms pretty much the code attempts to do the optimization).

> snip
>
> I would think you'd need it if you had a hook that needed the audio on that Local channel - such as a MixMonitor.

The NLB compatibility code actually checks whether something like a MixMonitor is on either Local channel and won't allow it to be used.

Now that I've given a diagram to show where things optimize and how it isn't inside of chan_local... what do you think NOW? ;)

--
Joshua Colp
Digium, Inc. | Senior Software Developer
445 Jan Davis Drive NW - Huntsville, AL 35806 - US
Check us out at: www.digium.com www.asterisk.org
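The frame path through the Real-01 ... Real-02 chain in this message can be modelled in a few lines. This is an illustrative Python sketch only, not Asterisk code: `CHAIN`, `NLB_MEMBERS`, `nlb_allowed` and `frame_path` are hypothetical names, and representing audio hooks as a dictionary of lists is an assumption made for the example. It shows which channels a frame touches with and without the NLB in the middle bridge, and that a hook such as a MixMonitor on either Local channel in that bridge disables the shortcut.

```python
# The chain from the message, left to right.
CHAIN = ["Real-01", "Local-02;1", "Local-02;2",
         "Local-03;1", "Local-03;2", "Real-02"]

# The two Local halves that sit together in the middle bridge (B2).
NLB_MEMBERS = ("Local-02;2", "Local-03;1")


def nlb_allowed(audio_hooks):
    """Return True if neither channel in B2 has a hook that needs audio.

    audio_hooks maps a channel name to a list of hook names (assumed shape).
    """
    return not any(audio_hooks.get(chan) for chan in NLB_MEMBERS)


def frame_path(source, audio_hooks=None):
    """List the channels a frame from `source` passes through, in order."""
    hooks = audio_hooks or {}
    path = CHAIN if source == CHAIN[0] else CHAIN[::-1]
    if nlb_allowed(hooks):
        # With the NLB active the two middle halves are bypassed.
        path = [chan for chan in path if chan not in NLB_MEMBERS]
    return path


print(frame_path("Real-01"))
print(frame_path("Real-02"))
print(frame_path("Real-01", {"Local-03;1": ["MixMonitor"]}))
```

With no hooks, a frame from Real-01 goes via Local-02;1 straight to Local-03;2, matching the description above; with a MixMonitor on Local-03;1 the model falls back to the full six-channel path.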
Re: [asterisk-dev] bridge_unreal: An alternative approach to Local/Unreal channel optimization
On Mon, Mar 10, 2014 at 6:59 AM, Joshua Colp jc...@digium.com wrote:
> Matthew Jordan wrote:
>> snip
>>
>> It's important to point out that optimization's goal was never the removal of the channel. If anything, nuking Local channels has - in my opinion - always made life more difficult for everyone, not easier. The goal was performance - minimize the frame path.
>>
>> If I'm picturing this correctly, this doesn't *quite* optimize as efficiently as completely removing the Local channels - but it may still be sufficient.
>>
>> [Diagram: Real-01 and Local-02;1 in bridge B0; Local-02;1 and Local-02;2 joined by an NLB; Local-02;2 and Real-03 in bridge B1; Local-02;1 targets Real-03, Local-02;2 targets Real-01]
>>
>> In this case - and this is assuming I understand the proposed Native Local Bridge correctly! - Local-02;1 has as its actual destination target Real-03, while Local-02;2 has as its actual destination Real-01. When B0 pushes a frame to Local-02;1, Local-02;1 knows that it should just pass it on to its destination. Rather than passing to its bridge, it writes directly to Real-03. The same happens in reverse for Real-03 to Local-02;2.
>
> As it is right now this approach can't optimize the bridged scenario above, but I'm not sure that's a bad thing. While having a goal of optimizing things as much as possible is good, in this scenario that's costing us a lot of complex code (with issues) and also requiring outside consumers to understand what can happen. I'd personally like to see Local channels become a connection between things instead of channels that can transmogrify, morph, and disappear. It makes both of our lives easier. To outside consumers they become channels that have the same semantics as other channels but are implemented as a connection, and a single event will be produced which shows you how they are connected.

I don't disagree with that. Optimization of Local channels is a complexity that is hard for us, and hard for users of Asterisk. It improves performance - but I'm not sure the amount we improve it by is actually worth that pain. When it was first written, people cared less about the guts of Asterisk, and operations internally were generally simpler. If this complexity is going to stick around, we have to find a way to manage it appropriately.

> As for your diagram, no NLB would be present there because a bridge does not exist within chan_local. Making it use the bridging framework would require rewriting it, as it cannot work within the confines of what bridging requires (ie: you can't have a channel doing two things at once). Where the NLB would be in use is this:
>
> Real-01 (B1) - Local-02;1 - Local-02;2 (B2) - Local-03;1 - Local-03;2 (B3) - Real-02
>
> In this case B2 would be an NLB and optimize things so media coming from Real-01 would be queued onto Local-03;2 for reading by B3 and media coming from Real-02 would be queued onto Local-02;1 for reading by B1. This bypasses Local-02;2 and Local-03;1 in the middle. This works no matter what each far end is doing. The reason optimizing your example is hard is because frames have to come from a channel within the bridge and pass through it.

Well, there are both good and bad things about this.

On the good side, AMI events that exist today won't change. Fewer breaking changes is good. Optimization begin/end events wouldn't occur, but the semantics of Local channels otherwise stay the same.

On the bad side, the simplest case is now the case that receives no improvements. You only get benefit from the Native Local Bridge if you have chains of Local channels - which I would imagine to be relatively rare in practice.

>> Creating a chain of these works by the real 'endpoints' getting passed down the chain of Local channels via control frames.
>>
>> There are two issues I can see with this - one minor, one maybe not.
>>
>> (1) There's a small amount of work here that occurs by the Local channel passing the frame on to its destination channel. It's minor, but it would be slightly more work than what occurs during today's optimization.
>>
>> (2) More seriously: I wonder if the destination shouldn't be a channel but a bridge. The above optimization cannot work for multi-party bridges: there is no single channel destination. Today's optimization does work in that scenario via a bridge swap - the single party on one end gets swapped with the Local channel in the multi-party bridge. This really is a minor case - the idea of optimizing channels into multi-party bridges is admittedly ridiculously new - but it may be useful to think through this use case.
>
> Yes, as it is right now this doesn't optimize as much as the code that currently exists. I can say though that now, when any video or audio frame goes through a Local channel, the channel no longer attempts to optimize out. (Yes, every 20ms pretty much the code attempts to do the optimization).

Which, if it can't optimize, is a bunch of needless work. Trade-offs!
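The bridge swap mentioned in point (2), together with the bridge merge discussed elsewhere in this thread, can be sketched as a small decision function. This is a hypothetical Python model for illustration, not the core_unreal/core_local logic: `plan_optimization` and its list-of-names bridges are invented here, it assumes each bridge holds at least one party besides the Local half, and it ignores all locking and synchronization, which is where the real difficulty lies.

```python
def plan_optimization(bridge_a, half_a, bridge_b, half_b):
    """Pick an action for a Local pair whose halves sit in two bridges.

    bridge_a and bridge_b are lists of channel names; half_a is a member
    of bridge_a and half_b is a member of bridge_b.
    Returns (action, members_of_the_surviving_bridge).
    """
    others_a = [chan for chan in bridge_a if chan != half_a]
    others_b = [chan for chan in bridge_b if chan != half_b]
    if len(others_a) == 1:
        # Swap: the single party on side A takes half_b's place in bridge B.
        return "swap", [others_a[0] if c == half_b else c for c in bridge_b]
    if len(others_b) == 1:
        # Swap in the other direction.
        return "swap", [others_b[0] if c == half_a else c for c in bridge_a]
    # Both sides are multi-party: merging is the only way to optimize.
    return "merge", others_a + others_b


print(plan_optimization(["Real/A", "Local;1"], "Local;1",
                        ["Local;2", "Real/1", "Real/2"], "Local;2"))
print(plan_optimization(["Real/1", "Real/2", "Local;1"], "Local;1",
                        ["Local;2", "Real/3", "Real/4"], "Local;2"))
```

A real implementation could also decline to optimize the two multi-party case at all; the sketch picks merge only to show both operations.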
Re: [asterisk-dev] bridge_unreal: An alternative approach to Local/Unreal channel optimization
On Sat, Mar 8, 2014 at 1:19 PM, Joshua Colp jc...@digium.com wrote:
> Greetings everyone on this glorious weekend!
>
> I've had an idea bouncing around my head for the past many months on an alternative approach for optimizing Local/Unreal channels. This morning everything finally clicked and I put it together[1] (I'm still working on it/tweaking it, but it DOES work).
>
> The traditional approach has been to collapse the chain of Local channels down until you are left with the minimum amount required. Unfortunately this can be rather complex and error prone as you need to go through the entire chain and then figure out the best way to accomplish this (keeping in mind juggling multiple locks and potentially multiple bridges). You also end up needing to give information when this happens so consumers know what is going on.

In any of the code bases, this is a difficult and complex thing to do. In 12, while I'm not sure we made the problem worse, we certainly didn't make it any better.

In order to optimize, each Local channel half has to first determine if it can even optimize. If the halves are in a bridge with multiple participants, there are ways in which they can - either by a bridge merge or a bridge swap. (Merge puts two multi-party bridges together; swap moves a single participant into another bridge (single or multi)). If they can, they then have to synchronize with the other half, lock both bridges that the halves are in - including all of the participants (via the bridge lock) - then move a lot of channels around.

To date, the Local channel optimization test - which collapses 150 Local channels - is the number one failing test in the test suite. Weird timing errors cause weird errors. While I'm confident we'll get to the bottom of all the edge cases, it is very, very, very complex. We eliminated the vast majority of masquerades - but this particular operation is, in many ways, just as nasty.

> The bridge_unreal approach doesn't do this. It aims to optimize the path for frames traveling through the chain, allowing them to skip intermediary hops where they don't need to go through. This results in a very similar situation for the frames but does not move/change/alter/hangup the intermediary channels involved.

It's important to point out that optimization's goal was never the removal of the channel. If anything, nuking Local channels has - in my opinion - always made life more difficult for everyone, not easier. The goal was performance - minimize the frame path.

If I'm picturing this correctly, this doesn't *quite* optimize as efficiently as completely removing the Local channels - but it may still be sufficient.

[Diagram: Real-01 and Local-02;1 in bridge B0; Local-02;1 and Local-02;2 joined by an NLB; Local-02;2 and Real-03 in bridge B1; Local-02;1 targets Real-03, Local-02;2 targets Real-01]

In this case - and this is assuming I understand the proposed Native Local Bridge correctly! - Local-02;1 has as its actual destination target Real-03, while Local-02;2 has as its actual destination Real-01. When B0 pushes a frame to Local-02;1, Local-02;1 knows that it should just pass it on to its destination. Rather than passing to its bridge, it writes directly to Real-03. The same happens in reverse for Real-03 to Local-02;2.

Creating a chain of these works by the real 'endpoints' getting passed down the chain of Local channels via control frames.

There are two issues I can see with this - one minor, one maybe not.

(1) There's a small amount of work here that occurs by the Local channel passing the frame on to its destination channel. It's minor, but it would be slightly more work than what occurs during today's optimization.

(2) More seriously: I wonder if the destination shouldn't be a channel but a bridge. The above optimization cannot work for multi-party bridges: there is no single channel destination. Today's optimization does work in that scenario via a bridge swap - the single party on one end gets swapped with the Local channel in the multi-party bridge. This really is a minor case - the idea of optimizing channels into multi-party bridges is admittedly ridiculously new - but it may be useful to think through this use case.

> It does this by passing each far end channel through the entire chain, with each intermediary hop storing them and the next hop in the chain examining and forwarding them on over and over. Once this completes, each end has the channel that is at the far end and is able to queue frames onto it directly, bypassing the intermediary hops. This happens over time (less than a second, I'm not talking minutes here) but leads to eventual optimization. Even in a compromised optimized state, frames will still flow as expected.
>
> This also works perfectly fine when a hop uses /n and wishes to remain in the path of frames. Each side of that hop will optimize themselves and skip any intermediary hops.
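The hop-by-hop forwarding of far ends described in the quoted text can be illustrated with a simplified model. This is a Python sketch under stated assumptions, not the bridge_unreal code: `propagate` is a hypothetical function, each Local pair is collapsed into a single hop, and a boolean flag stands in for a hop that uses /n (or a Real channel) and therefore stays in the frame path. Each pass carries the nearest "staying" channel along the chain; every hop stores what it receives, and a staying hop then advertises itself instead.

```python
def propagate(chain):
    """Compute, for every hop, the channel it would queue frames onto.

    chain is a list of (name, stays_in_path) tuples from left to right.
    Returns {name: {"left": target, "right": target}} where a key is
    missing if there is nothing in that direction.
    """
    targets = {name: {} for name, _ in chain}
    passes = (("left", chain), ("right", chain[::-1]))
    for side, hops in passes:
        carried = None  # far end currently being forwarded down the chain
        for name, stays in hops:
            if carried is not None:
                # Each hop stores the far end it was handed.
                targets[name][side] = carried
            if stays:
                # A hop that remains in the path becomes the new far end.
                carried = name
    return targets


chain = [("Real-01", True), ("Local-02", False), ("Local-03", True),
         ("Local-04", False), ("Real-05", True)]
for name, found in propagate(chain).items():
    print(name, found)
```

In this example Local-03 plays the /n hop: Real-01 and Real-05 each end up targeting Local-03 directly, while Local-03 targets both Real channels, so the hops on either side of it are skipped.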