Re: [aqm] review of draft-ietf-aqm-ecn-benefits-04
On 5/7/2015 5:39 PM, Dave Taht wrote:
> On Thu, May 7, 2015 at 2:31 PM, Michael Welzl mich...@ifi.uio.no wrote:
>> Hi,
>> On 7. mai 2015, at 22.40, Dave Taht dave.t...@gmail.com wrote:
>>> I see that during my absence here most mention of the potential negative aspects of ecn have been nuked from this document.
>> Actually I don't think we really removed any - it's just a stylistic change (title, headlines). So: could you be specific about which one there is in a (which?) previous version that is missing in this one?
> You are correct. I should have said that few of the negatives I had attempted to discuss previously were added to the document.

Hi Dave,

I'm trying as a co-chair to figure out if we have consensus on this document to go forward. If it's easy to point to, or summarize, a list of negatives that haven't yet been included, I think that would make it simpler for the editors to incorporate. I wasn't able to go back and track every message, but the things that have been most discussed do seem to be included currently. If there are some still missing, I'd like to make sure they get discussed and incorporated as needed.

There was the topic of gaming ECN, which I thought Bob Briscoe's message on 4/15/2015 came close to putting to rest. Personally, I'm not sure if or how to reflect this conversation in the draft, but maybe others have clearer ideas?

--
Wes Eddy
MTI Systems

___ aqm mailing list aqm@ietf.org https://www.ietf.org/mailman/listinfo/aqm
Re: [aqm] CoDel on high-speed links
On 9 Jun, 2015, at 19:11, Steven Blake slbl...@petri-meat.com wrote:
> On a 10 GE link serving 2.5 Mpps on average, CoDel would only drop 0.013% of packets after 1000 drops (which would occur after 6.18 secs). This doesn't seem to be very effective.

Question: have you worked out what drop rate is required to achieve control of a TCP at that speed? There are well-known formulae for standard TCPs, particularly Reno. You might be surprised by the result.

Fundamentally, Codel operates on the principle that one mark/drop per RTT per flow is sufficient to control a TCP, or a flow which behaves like a TCP; *not* a particular percentage of packets. This is because TCPs are generally required to perform multiplicative decrease upon a *single* congestion event. The increasing count over time is meant to adapt to higher flow counts and lower RTTs.

Other types of flows tend to be sparse and unresponsive in general, and must be controlled using some harder mechanism if necessary. One such mechanism is to combine Codel with an FQ system, which is exactly what fq_codel in Linux does. Fq_codel has been tested successfully at 10 Gbps. Codel then operates separately for each flow, and unresponsive flows are isolated.

- Jonathan Morton
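Jonathan's "well-known formulae" remark can be made concrete with a short sketch (the function name and parameters below are illustrative, not from the thread), using the Mathis et al. approximation for Reno throughput, rate ≈ (MSS/RTT) · 1.22/√p:

```python
import math

# Illustrative sketch: the steady-state drop probability a standard Reno
# TCP needs in order to average a given rate, per the Mathis et al.
# approximation: rate ~ (MSS/RTT) * 1.22/sqrt(p), so
# p ~ (1.22 * MSS / (RTT * rate))^2.  Names here are mine, not the thread's.
def reno_drop_probability(rate_bps, rtt_s, mss_bytes=1500):
    mss_bits = mss_bytes * 8
    return (1.22 * mss_bits / (rtt_s * rate_bps)) ** 2

# A single Reno flow filling 10 Gbps at 100 ms RTT tolerates only a
# vanishingly small drop probability -- one reason a fixed drop
# *percentage* is the wrong mental model for controlling a TCP.
for rate in (4e6, 1e9, 10e9):
    print(f"{rate/1e6:8.0f} Mbps -> p ~ {reno_drop_probability(rate, 0.1):.2e}")
```

The faster the flow, the rarer the drops must be, which is why Codel reasons in marks per RTT per flow rather than percentages.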
Re: [aqm] CoDel on high-speed links
My concern with fq_codel is that by putting single flows into single Codel instances, you hit the problem with Codel where it limits bandwidth on higher-RTT paths.

Simon

On June 9, 2015 9:32:15 AM Jonathan Morton chromati...@gmail.com wrote:
> [...] One such mechanism is to combine Codel with an FQ system, which is exactly what fq_codel in Linux does. Fq_codel has been tested successfully at 10 Gbps. Codel then operates separately for each flow, and unresponsive flows are isolated.
>
> - Jonathan Morton
[aqm] CoDel on high-speed links
I have a question about how CoDel (as defined in draft-ietf-aqm-codel-01) behaves on high-speed (e.g., >= 1 Gbps) links. If this has been discussed before, please just point me in the right direction.

In the text below, I'm using "drop" to mean either packet discard or ECN mark. I'm using "(instantaneous) drop frequency" to mean the inverse of the interval between consecutive drops during a congestion epoch, measured in drops/sec.

The control law for CoDel computes the next time to drop a packet, and is given as:

    t + interval/sqrt(count)

where t is the current time, interval is a value roughly proportional to maximum RTT (recommended 100 msec), and count is the cumulative number of drops during a congestion epoch. It is not hard to see that drop frequency increases with sqrt(count). At the first drop, the frequency is 10 drops/sec; after 100 drops it is 100 drops/sec; after 1000 drops it is 316 drops/sec.

On a 4 Mbps link serving say 1000 packets/sec (on average), CoDel immediately starts dropping 1% of packets and ramps up to ~10% after 100 drops (1.86 secs). This seems like a reasonable range. On a 10 GE link serving 2.5 Mpps on average, CoDel would only drop 0.013% of packets after 1000 drops (which would occur after 6.18 secs). This doesn't seem to be very effective. It's possible to reduce interval to ramp up drop frequency more quickly, but that is counter-intuitive because interval should be roughly proportional to maximum RTT, which is link-speed independent.

Unless I am mistaken, it appears that the control law should be normalized in some way to average packet rate. On a high-speed link, it might be common to drop multiple packets per msec, so it also isn't clear to me whether the drop frequency needs to be recalculated on every drop, or whether it could be recalculated over a shorter interval (e.g., 5 msec).

Regards,

// Steve
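The back-of-envelope numbers above can be reproduced with a small sketch of the control law (illustrative code, assuming a steadily congested queue in which count only increments):

```python
import math

# Illustrative sketch of the control law as stated above: the next drop
# is scheduled at t + interval/sqrt(count), with count incrementing on
# each drop while the queue stays congested.
def drop_schedule(interval_s, n_drops):
    t, times, freqs = 0.0, [], []
    for count in range(1, n_drops + 1):
        t += interval_s / math.sqrt(count)           # time of this drop
        times.append(t)
        freqs.append(math.sqrt(count) / interval_s)  # instantaneous drops/sec
    return times, freqs

times, freqs = drop_schedule(0.100, 1000)
# Reproduces the figures above: 100th drop at ~1.86 s (100 drops/sec),
# 1000th drop at ~6.18 s (~316 drops/sec).
print(f"{times[99]:.2f}s {freqs[99]:.0f}/s; {times[999]:.2f}s {freqs[999]:.0f}/s")
```

On the 2.5 Mpps link, 316 drops/sec works out to 316/2.5e6 ≈ 0.013%, the figure quoted above.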
Re: [aqm] CoDel on high-speed links
On 9 Jun, 2015, at 21:29, Steven Blake slbl...@petri-meat.com wrote:
> That's great, but I'm talking about how to do AQM on Nx100 Gbps packet processing ASICs, not Linux boxes. And I agree that FQ is desirable, but not always cost effective or feasible to retrofit.

Ah, you are talking about core networks. That's a little out of my area of expertise, but...

As I understand it, core networks are generally supposed to be over-provisioned. However, even a nominally over-provisioned link can be saturated in the short term and/or at peak load. A straightforward AQM system is useful for coping with that.

There is a well-known theory which states that, for a dumb FIFO, queue length can be reduced from the "BDP rule of thumb" to BDP * sqrt(flows) in the case where a large number of flows is normally expected. This is the case for core networks, but not the edge (where the link is routinely saturated by a single flow). I think it's reasonable to suppose that the same theory might apply to Codel parameters.

If so, taking a nice round number of 10^4 flows (since you can't predict it very precisely, an order of magnitude or two is sufficient), the new parameters would be interval=1ms and target=50us. If you re-run your analysis using those parameters, do you get more reasonable behaviour? Intuitively, I think you should get similar per-packet behaviour at 100G using those parameters as at 1G using the defaults.

- Jonathan Morton
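The suggested rescaling is a one-liner. A hedged sketch (my own code; it assumes the stock Codel defaults of interval = 100 ms and target = 5 ms, and that the sqrt(flows) scaling applies as conjectured above):

```python
import math

# Hedged sketch of the sqrt(flows) rescaling suggested above, starting
# from the stock Codel defaults (interval 100 ms, target 5 ms).
def scaled_codel_params(expected_flows, interval_s=0.100, target_s=0.005):
    f = math.sqrt(expected_flows)
    return interval_s / f, target_s / f

interval, target = scaled_codel_params(10**4)
print(interval, target)  # 10^4 flows -> interval 1 ms, target 50 us
```

This is where the interval=1ms / target=50us numbers come from: sqrt(10^4) = 100, dividing both defaults by 100.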
Re: [aqm] CoDel on high-speed links
On Tue, 2015-06-09 at 19:31 +0300, Jonathan Morton wrote:
> Question: have you worked out what drop rate is required to achieve control of a TCP at that speed? There are well-known formulae for standard TCPs, particularly Reno. You might be surprised by the result.
>
> Fundamentally, Codel operates on the principle that one mark/drop per RTT per flow is sufficient to control a TCP, or a flow which behaves like a TCP; *not* a particular percentage of packets. This is because TCPs are generally required to perform multiplicative decrease upon a *single* congestion event. The increasing count over time is meant to adapt to higher flow counts and lower RTTs. Other types of flows tend to be sparse and unresponsive in general, and must be controlled using some harder mechanism if necessary.

I'm not worried about controlling one TCP. I'm worried about controlling a multiplex of 10K TCPs (on average).

> One such mechanism is to combine Codel with an FQ system, which is exactly what fq_codel in Linux does. Fq_codel has been tested successfully at 10 Gbps. Codel then operates separately for each flow, and unresponsive flows are isolated.

FQ is very nice, but not always an option at the necessary scale for 10/100 Gbps links.

Regards,

// Steve
Re: [aqm] CoDel on high-speed links
On Tue, 2015-06-09 at 20:46 +0300, Jonathan Morton wrote:
> In such cases you could reasonably configure fq_codel with a very large number of queues - I believe it supports 65536 out of the box - and/or to provide host fairness instead of flow fairness. A planned enhancement to cake (which really is designed for the last mile - fq_codel isn't so specialised) is to provide both at once.
>
> In the default mode, fq_codel still degrades gracefully when faced with an extreme flow count, due to the natural incidence of hash collisions. Each Codel instance then applies to traffic from a subset of flows. Even under these conditions, a single unresponsive flow is reasonably well isolated, only interfering with traffic unfortunate enough to end up in the same queue by chance.

That's great, but I'm talking about how to do AQM on Nx100 Gbps packet processing ASICs, not Linux boxes. And I agree that FQ is desirable, but it is not always cost effective or feasible to retrofit.

So back to my previous point: is CoDel as described in draft-ietf-aqm-codel-01 suitable *by itself* as an AQM for high-speed links multiplexing lots (> 1K) of flows?

Ex/ RTT = 25 msec, #flows = 1K. To drop 1 packet/flow/RTT as you say, I need to drop 40K packets/sec. CoDel ramps up to this drop rate after approximately eternity (~16e6 drops).

Has anyone tested CoDel (*not* fq_codel) in an environment with interval ~100 msec and #flows >= 1K? It would be easy enough to tweak the control law ramp: just scale the calculation of drop interval by a factor inversely proportional to link speed or average number of flows.

Regards,

// Steve
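Steven's "~16e6 drops" figure follows directly from the control law: to sustain a drop rate R, you need sqrt(count)/interval = R, i.e. count = (R · interval)^2. A quick check (illustrative code):

```python
# Checking the "~16e6 drops" figure: sustaining a drop rate R under the
# control law requires sqrt(count)/interval = R, i.e. count = (R*interval)^2.
def count_for_drop_rate(rate_dps, interval_s=0.100):
    return (rate_dps * interval_s) ** 2

flows, rtt = 1000, 0.025
needed = flows / rtt                 # 1 drop/flow/RTT -> 40,000 drops/sec
print(count_for_drop_rate(needed))   # ~1.6e7 drops
```

With count only increasing by one per drop, reaching a count of 16 million is indeed "approximately eternity" at the ramp rates discussed above.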
Re: [aqm] CoDel on high-speed links
On Tue, 2015-06-09 at 12:44 -0700, Dave Taht wrote:
> The below makes several mis-characterisations of codel in the first place, and then attempts to reason from there.

Hmmm...

> > The control law for CoDel computes the next time to drop a packet, and is given as: t + interval/sqrt(count), where t is the current time, interval is a value roughly proportional to maximum RTT (recommended 100 msec), and count is the cumulative number of drops during a congestion epoch.
>
> No. Count is just a variable to control the curve of the drop rate. It is not constantly incremented, either; it goes up and down based on how successful it is at controlling the flow(s), only incrementing while latency exceeds the target, decrementing slightly after it stays below the target. The time spent below the target is not accounted for, so you might have a high bang-bang drop rate retained when something goes above from below. This subtlety is something people consistently miss and something I tried to elucidate in the first Stanford talk.

I specifically mentioned "during a congestion epoch", but let me be more precise: count is continuously incremented during an extended period where latency exceeds the target (perhaps because CoDel isn't yet dropping hard enough). Correct?

The fact that the drop frequency doesn't ramp down quickly when congestion is momentarily relieved is good, but doesn't help if it takes forever for the algorithm to ramp up to an effective drop frequency (i.e., something greater than 1 drop/flow/minute).

> > On a 4 Mbps link serving say 1000 packets/sec (on average), CoDel immediately starts dropping 1% of packets and ramps up to ~10% after 100 drops (1.86 secs).
>
> No, it will wait 100ms after stuff first exceeds the target, then progressively shoot harder based on the progress of interval/sqrt(count).

Ok. At the first drop it is dropping at a rate of 1 packet/100 msec == 10 drops/sec and ramps up from there. At the 100th drop it is dropping at a rate of 100 msec/sqrt(100) == 1 packet/10 msec == 100 drops/sec. This just so happens to occur after 1.86 secs.

Aside: as described, CoDel's drop frequency during a congestion epoch increases approximately linearly with time (at a rate of about 50 drops/sec^2 when interval = 100 msec).

> secondly people have this tendency to measure full size packets, or a 1k average packet. The reality is a dynamic range of 64 bytes to 64k (gso/tso/gro offloads). So bytes is a far better proxy than packets in order to think about this properly. Offloads of various sorts bulking up packet sizes have been a headache. I favor reducing mss on highly congested underbuffered links (and bob favors sub-packet windows) to keep the signal strength up. The original definition of packet (circa 1962) was 1000 bits, with up to 8 fragments. I do wish the materials that were the foundation of packet behavior were online somewhere...

I don't see how this has anything to do with the text of the draft or my questions.

> > This seems like a reasonable range. On a 10 GE link serving 2.5 Mpps on average, CoDel would only drop 0.013% of packets after 1000 drops (which would occur after 6.18 secs).
>
> I am allergic to averages as a statistic in the network measurement case.

> > This doesn't seem to be very effective. It's possible to reduce interval to ramp up drop frequency more quickly, but that is counter-intuitive because interval should be roughly proportional to maximum RTT, which is link-speed independent.
>
> Except that tcp's drop their rates by (typically) half on a drop, and it's a matter of debate as to when on CE.

Ex/ 10 GE link, ~10K flows (average). During a congestion epoch, CoDel with interval = 100 msec starts dropping 257 packets/sec after 5 secs. How many flows is that effectively managing?

> > Unless I am mistaken, it appears that the control law should be normalized in some way to average packet rate. On a high-speed link, it might be common to drop multiple packets per msec, so it also isn't clear to me whether the drop frequency needs to be recalculated on every drop, or whether it could be recalculated over a shorter interval (e.g., 5 msec).
>
> Pie took the approach of sampling, setting a rate for shooting, over a 16ms interval. That's pretty huge, but also low cost in some hardware. Codel's timestamp per-packet control law is continuous (but you do need to have a cheap packet timestamping ability). Certainly in all cases more work is needed to address the problems 100Gbps rates have in general, and it is not just all queue theory!

High-speed metro/core links need AQM, too. I believe that draft-ietf-aqm-codel-01 doesn't work for these links in the general case (e.g., large #flows, recommended value for interval, no FQ). IMHO the draft should say something about this, or should adjust the algorithm accordingly.

Regards,

// Steve
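The "257 packets/sec after 5 secs" figure can be checked numerically: since the time of the Nth drop is approximately 2·interval·sqrt(N), the instantaneous drop frequency sqrt(N)/interval grows linearly as t/(2·interval^2), i.e. about 50 drops/sec^2 when interval = 100 msec. A sketch (my own, assuming count only increments during the epoch):

```python
import math

# Numerical check of the linear-ramp aside: simulate drops under
# t += interval/sqrt(count) (count only incrementing) until t reaches 5 s.
interval, t = 0.100, 0.0
for count in range(1, 10**6):
    t += interval / math.sqrt(count)
    if t >= 5.0:
        break
freq = math.sqrt(count) / interval
# Since t(N) ~ 2*interval*sqrt(N), freq ~ t/(2*interval^2) = 50*t here,
# giving roughly 250-260 drops/sec at t = 5 s.
print(count, round(freq))
```

Against the ~40,000 drops/sec needed for 10K flows at 25 ms RTT, a ramp reaching only ~257 drops/sec after 5 seconds is two orders of magnitude short.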
Re: [aqm] CoDel on high-speed links
> There is a well-known theory which states that, for a dumb FIFO, queue length can be reduced from the BDP rule of thumb to BDP * sqrt(flows) in the case where a large number of flows is normally expected.

That cannot be right; it goes the wrong direction. Maybe you meant something like BDP / sqrt(flows)? (I have no idea if that is correct, but at least it goes the right direction.)

-Tim Shepard
 s...@alum.mit.edu
Re: [aqm] CoDel on high-speed links
The below makes several mis-characterisations of codel in the first place, and then attempts to reason from there.

On Tue, Jun 9, 2015 at 9:11 AM, Steven Blake slbl...@petri-meat.com wrote:
> I have a question about how CoDel (as defined in draft-ietf-aqm-codel-01) behaves on high-speed (e.g., >= 1 Gbps) links. If this has been discussed before, please just point me in the right direction. In the text below, I'm using "drop" to mean either packet discard or ECN mark. I'm using "(instantaneous) drop frequency" to mean the inverse of the interval between consecutive drops during a congestion epoch, measured in drops/sec. The control law for CoDel computes the next time to drop a packet, and is given as: t + interval/sqrt(count), where t is the current time, interval is a value roughly proportional to maximum RTT (recommended 100 msec), and count is the cumulative number of drops during a congestion epoch.

No. Count is just a variable to control the curve of the drop rate. It is not constantly incremented, either; it goes up and down based on how successful it is at controlling the flow(s), only incrementing while latency exceeds the target, decrementing slightly after it stays below the target. The time spent below the target is not accounted for, so you might have a high bang-bang drop rate retained when something goes above from below. This subtlety is something people consistently miss and something I tried to elucidate in the first Stanford talk.

> It is not hard to see that drop frequency increases with sqrt(count). At the first drop, the frequency is 10 drops/sec; after 100 drops it is 100 drops/sec; after 1000 drops it is 316 drops/sec. On a 4 Mbps link serving say 1000 packets/sec (on average), CoDel immediately starts dropping 1% of packets and ramps up to ~10% after 100 drops (1.86 secs).

No, it will wait 100ms after stuff first exceeds the target, then progressively shoot harder based on the progress of interval/sqrt(count).

Secondly, people have this tendency to measure full size packets, or a 1k average packet. The reality is a dynamic range of 64 bytes to 64k (gso/tso/gro offloads). So bytes is a far better proxy than packets in order to think about this properly. Offloads of various sorts bulking up packet sizes have been a headache. I favor reducing mss on highly congested underbuffered links (and bob favors sub-packet windows) to keep the signal strength up. The original definition of packet (circa 1962) was 1000 bits, with up to 8 fragments. I do wish the materials that were the foundation of packet behavior were online somewhere...

> This seems like a reasonable range. On a 10 GE link serving 2.5 Mpps on average, CoDel would only drop 0.013% of packets after 1000 drops (which would occur after 6.18 secs).

I am allergic to averages as a statistic in the network measurement case.

> This doesn't seem to be very effective. It's possible to reduce interval to ramp up drop frequency more quickly, but that is counter-intuitive because interval should be roughly proportional to maximum RTT, which is link-speed independent.

Except that tcp's drop their rates by (typically) half on a drop, and it's a matter of debate as to when on CE.

> Unless I am mistaken, it appears that the control law should be normalized in some way to average packet rate. On a high-speed link, it might be common to drop multiple packets per msec, so it also isn't clear to me whether the drop frequency needs to be recalculated on every drop, or whether it could be recalculated over a shorter interval (e.g., 5 msec).

Pie took the approach of sampling, setting a rate for shooting, over a 16ms interval. That's pretty huge, but also low cost in some hardware. Codel's timestamp per-packet control law is continuous (but you do need to have a cheap packet timestamping ability). Certainly in all cases more work is needed to address the problems 100Gbps rates have in general, and it is not just all queue theory! A small packet is .62 *ns* in that regime. A benefit of fq in this case is that you can parallelize fib table lookups across multiple processors/caches, and of fq_codel is that all codels operate independently.

> Regards,
> // Steve

--
Dave Täht
What will it take to vastly improve wifi for everyone? https://plus.google.com/u/0/explore/makewififast
Re: [aqm] CoDel on high-speed links
On Tue, 9 Jun 2015, Steven Blake wrote:
> > Except that tcp's drop their rates by (typically) half on a drop, and it's a matter of debate as to when on CE.
>
> Ex/ 10 GE link, ~10K flows (average). During a congestion epoch, CoDel with interval = 100 msec starts dropping 257 packets/sec after 5 secs. How many flows is that effectively managing?

How fast are the flows ramping back up to the prior speed? If you have 10K flows and ~250 drops/sec, over 40 seconds each could end up with one drop. If that keeps the link uncongested, it's doing its job.

Unfortunately, when there is a drop, the affected flow slows down a LOT, so if you are near the edge of being uncongested, you may not need to slow that many flows down to be uncongested. Then as the flows ramp back up, the link becomes congested again and some flow needs to be slowed down. Hopefully CoDel is going to slow down a different flow the next time.

With the existing feedback that can be provided, it's not possible to slow all flows down 5%; all you could do is slow 10% of the flows by 50% to reduce overall load by 5%.

David Lang

> > Pie took the approach of sampling, setting a rate for shooting, over a 16ms interval. [...] A benefit of fq in this case is that you can parallelize fib table lookups across multiple processors/caches, and of fq_codel is that all codels operate independently.
>
> High-speed metro/core links need AQM, too. I believe that draft-ietf-aqm-codel-01 doesn't work for these links in the general case (e.g., large #flows, recommended value for interval, no FQ). IMHO the draft should say something about this, or should adjust the algorithm accordingly.
>
> Regards,
> // Steve
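The arithmetic behind David's point, sketched (the figures are the ones stated in the message; the halve-on-drop model is the stated assumption):

```python
# Rough arithmetic behind the argument, using the numbers stated above.
flows, drops_per_sec = 10_000, 250
print(flows / drops_per_sec)         # 40.0 s between drops per flow, on average

# With only halve-on-drop feedback, shedding 5% of aggregate load means
# halving roughly 10% of the flows (each halved flow sheds half its share).
target_reduction, per_flow_cut = 0.05, 0.5
fraction_halved = target_reduction / per_flow_cut
print(fraction_halved)               # 0.1
```

The coarseness of the feedback is the point: the signal is a 50% cut per affected flow, so fine-grained (e.g. 5%) aggregate reductions can only be approximated by hitting a fraction of the flows hard.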
Re: [aqm] CoDel on high-speed links
On Tue, 2015-06-09 at 13:52 -0700, David Lang wrote:
> How fast are the flows ramping back up to the prior speed? If you have 10K flows and ~250 drops/sec, over 40 seconds each could end up with one drop. If that keeps the link uncongested, it's doing its job.

According to my calculations, with RTT = 25 msec and MTU = 1500 bytes, you need to be going around 29 Mbps average (oscillating between 2/3 and 4/3 of this) to need to see a drop only every 40 seconds. For 1 Mbps average you need ~1.4 secs between drops. For 10K 1-Mbps flows you then need to drop ~7000 packets/sec (~0.8% drop frequency for MTU-sized packets on a 10 GE link). Of course this all assumes uniform stationary elephants, which is never the case in real life, but you see how CoDel's drop frequency (for interval = 100 msec) is not even in the right ballpark.

> Unfortunately, when there is a drop, the affected flow slows down a LOT, so if you are near the edge of being uncongested, you may not need to slow that many flows down to be uncongested. Then as the flows ramp back up, the link becomes congested again and some flow needs to be slowed down. Hopefully CoDel is going to slow down a different flow the next time. With the existing feedback that can be provided, it's not possible to slow all flows down 5%; all you could do is slow 10% of the flows by 50% to reduce overall load by 5%.

The more flows you have, the more flows you need to nuke to slow things down even a little bit. The more flows you need to nuke, the faster you need to drop packets.

Regards,

// Steve