Re: [aqm] [gautamramk/FQ-PIE-for-Linux-Kernel] max_prob & ecn (#2)
Somehow our naive attempt at putting ecn into pie became part of the standard. This project is making that more configurable. I'd like it if more pie folk took a look at it. https://github.com/gautamramk/FQ-PIE-for-Linux-Kernel/issues/2 Gautam Ramakrishnan writes: > I have added this feature in the latest commit. > > — > You are receiving this because you authored the thread. > Reply to this email directly, view it on GitHub, or mute the thread. ___ aqm mailing list aqm@ietf.org https://www.ietf.org/mailman/listinfo/aqm
Re: [aqm] [Bloat] [Cake] paper: per flow fairness in a data center network
Luca Muscariello writes: > I disagree on the claims that DC switches do not implement anything. > They do, from quite some time now. > > https://www.cisco.com/c/en/us/products/collateral/switches/nexus-9000-series-switches/white-paper-c11-738488.html I'm really impressed. I'd have probably heard about it if they'd mentioned bufferbloat once :/. The graphs comparing their performance to arista's are far, far, far too small to read. You can certainly see a huge improvement on mice in this paper. is there a better copy of this paper around? What's the cheapest form of this switch I can buy? (or beg, borrow, or steal?) I do need a 10GigE-40GigE capable switch in the lab, and BOY oh boy oh boy would I love to test this one. Has this tech made it into their routing products? > > On Thu, Dec 6, 2018 at 4:19 AM Dave Taht wrote: > > While I strongly agree with their premise: > > "Multi-tenant DCNs cannot rely on specialized protocols and > mechanisms > that assume single ownership and end-system compliance. It is > necessary rather to implement general, well-understood mechanisms > provided as a network service that require as few assumptions > about DC > workload as possible." > > ... And there's a solid set of links to current work, and a very > interesting comparison to pfabric, their DCTCP emulation is too > flawed > to be convincing, and we really should get around to making the > ns2 > fq_codel emulation fully match reality. This is also a scenario > where > I'd like to see cake tried, to demonstrate the effectiveness (or > not!) > of 8 way set associative queuing, cobalt, per host/per flow fq, > etc, > vs some of the workloads they outline. 
> > https://perso.telecom-paristech.fr/drossi/paper/rossi18hpsr.pdf > > -- > > Dave Täht > CTO, TekLibre, LLC > http://www.teklibre.com > Tel: 1-831-205-9740 > ___ > Cake mailing list > c...@lists.bufferbloat.net > https://lists.bufferbloat.net/listinfo/cake > > > > ___ > Bloat mailing list > bl...@lists.bufferbloat.net > https://lists.bufferbloat.net/listinfo/bloat ___ aqm mailing list aqm@ietf.org https://www.ietf.org/mailman/listinfo/aqm
[aqm] paper: per flow fairness in a data center network
While I strongly agree with their premise: "Multi-tenant DCNs cannot rely on specialized protocols and mechanisms that assume single ownership and end-system compliance. It is necessary rather to implement general, well-understood mechanisms provided as a network service that require as few assumptions about DC workload as possible." ... And there's a solid set of links to current work, and a very interesting comparison to pfabric, their DCTCP emulation is too flawed to be convincing, and we really should get around to making the ns2 fq_codel emulation fully match reality. This is also a scenario where I'd like to see cake tried, to demonstrate the effectiveness (or not!) of 8 way set associative queuing, cobalt, per host/per flow fq, etc, vs some of the workloads they outline. https://perso.telecom-paristech.fr/drossi/paper/rossi18hpsr.pdf -- Dave Täht CTO, TekLibre, LLC http://www.teklibre.com Tel: 1-831-205-9740 ___ aqm mailing list aqm@ietf.org https://www.ietf.org/mailman/listinfo/aqm
[aqm] sch_cake and sch_tbs now in linux 4.19
Of possible interest to the members of this (former) working group is that sch_cake (our all-singing, all-dancing shaper + per host fq + revised codel qdisc) is now in the Linux mainline. Of other possible interest is the new sch_tbs scheduler which allows for time-based packet releases and hardware offload support. https://kernelnewbies.org/Linux_4.19#Better_networking_experience_with_the_CAKE_queue_management_algorithm -- Dave Täht CTO, TekLibre, LLC http://www.teklibre.com Tel: 1-831-205-9740 ___ aqm mailing list aqm@ietf.org https://www.ietf.org/mailman/listinfo/aqm
[aqm] upstreamed sqm sch_cake in openwrt-18.06-rc2
hopefully the identical version of sch_cake that will also be in linux 4.19 (presently in net-next) is now in openwrt's 18.06-rc2 release. It would be good for tons more folk to beat it up thoroughly over the next several weeks before it is formally released. Come on, don't you remember back when reflashing for the cause was fun? https://downloads.openwrt.org/releases/18.06.0-rc2/targets/ For those of you not paying attention to sch_cake's development, see https://git.kernel.org/pub/scm/linux/kernel/git/davem/net-next.git/commit/?id=046f6fd5daefac7f5abdafb436b30f63bc7c602b 1) It would be good to get back more results from docsis mode on cable modems (where we hope now you can use 99.99% of the uplink set rate rather than 85-95%). Also, if anyone's got docsis-3.1 with pie enabled, it would be good to know if we co-exist well with that. 2) We hope everyone digs the default per-host/per-flow fq 3) There's a zillion other features worth exercising, like diffserv. A late addition was the ability to run at speeds far greater than the <= 1gbit speeds we initially targeted for the shaper component, on suitable hw. (shaper works great at a gigabit, try it!) Establishing good cpu constraints by architecture would be good too. Etc. huge thx to kevin db, toke, jon, and everyone else for finally "making it real". -- Dave Täht CEO, TekLibre, LLC http://www.teklibre.com Tel: 1-669-226-2619 ___ aqm mailing list aqm@ietf.org https://www.ietf.org/mailman/listinfo/aqm
Re: [aqm] CoDel: After much ado ...
Jana Iyengar <j...@google.com> writes: > ... draft-ietf-aqm-codel-08 is finally posted. This new version addresses all > IESG comments during IESG review, in addition to review comments by Patrick > Timmons and Yoav Nir. We thank everyone for their help with reviews. > > Most importantly, I want to personally thank the fq_codel authors for sending > me > Yerba Mate, Dave Taht for sending me delicious freshly-baked cookies, and Paul > McKenney for sending me a ton of organic green tea to help me move on the > document. I will say that you all managed to do something nobody has managed > so > far: you successfully shamed me into getting this work done. > > I also received bungee cords from the fq_codel authors to tie myself to my > chair > with, which I put to good use: I would like to share here evidence of my > atonement. (Cookies are not in the picture, because they were delicious. > Thanks, > Dave!) Yer welcome, and thank you VERY MUCH for completing this. I got some bungee cords for myself, too, as I have more than a few things 98% done I'd like to get off my plate. Perhaps we could include these new concepts in future standards for the RFC creation processes? However, I think at least one new hardware standard is necessary. There needs to be some sort of laptop mounting bracket for the cookies and a powerful feedback loop between interface and future IOT enabled-bungee cords. Particularly, reaching for the mouse, rather than cookies or tea, should be de-incentivised. I couldn't come up with a good way to distinguish between those forms of muscular traffic. > > - jana > > (P.S.: I now look forward to receiving thank you gifts. Oh, and I'm > caffeine-free and vegetarian, just in case.) > > > ___ > aqm mailing list > aqm@ietf.org > https://www.ietf.org/mailman/listinfo/aqm ___ aqm mailing list aqm@ietf.org https://www.ietf.org/mailman/listinfo/aqm
[aqm] I am setting up a per holiday cron job
the template: For [INSERT HOLIDAY], I'd really love to see a codel & fq_codel RFC published. -- Dave Täht Let's go make home routers and wifi faster! With better software! http://blog.cerowrt.org ___ aqm mailing list aqm@ietf.org https://www.ietf.org/mailman/listinfo/aqm
[aqm] make-wifi-fast linuxplumbers talk summary on lwn.net
and available here: https://lwn.net/SubscriberLink/705884/1bdb9c4aa048b0d5/ After the talk I discussed applying the same debloating techniques to other chipsets with several folk. I don't remember, unfortunately, who all those folk were, nor the candidate chipsets! We are still wrestling with "good" settings to get fq_codel to scale properly, and mostly trying to move in the direction of less inherent latency on more stations. -- Dave Täht Let's go make home routers and wifi faster! With better software! http://blog.cerowrt.org ___ aqm mailing list aqm@ietf.org https://www.ietf.org/mailman/listinfo/aqm
[aqm] "Globally, the average loss rates on policed flows are over 20%"
And while I'm catching up on my academic backlog (scholar.google.com has a ton of newer things on it about bufferbloat), this report on the effects of policing was pretty good: http://static.googleusercontent.com/media/research.google.com/en//pubs/archive/45411.pdf On Wed, Aug 3, 2016 at 3:37 PM, Dave Täht wrote: > I am especially grateful for the full documentation of how to configure > the bsd versions of this stuff, but the rest of the report was pretty > good too. > > http://caia.swin.edu.au/reports/160708A/CAIA-TR-160708A.pdf > ___ > Bloat mailing list > bl...@lists.bufferbloat.net > https://lists.bufferbloat.net/listinfo/bloat -- Dave Täht Let's go make home routers and wifi faster! With better software! http://blog.cerowrt.org ___ aqm mailing list aqm@ietf.org https://www.ietf.org/mailman/listinfo/aqm
[aqm] transport protocols in userspace
good discussion of a new feature for linux, proposed by facebook, that will make it much easier to write protocols in userspace, the positives, and negatives. https://lwn.net/SubscriberLink/691887/9388e53741d4c93e/ Please don't discuss on this list, I've had a bad morning already. -- Dave Täht Let's go make home routers and wifi faster! With better software! http://blog.cerowrt.org ___ aqm mailing list aqm@ietf.org https://www.ietf.org/mailman/listinfo/aqm
[aqm] A bit of history on RFC970 and RFC896 from john nagle
-- Forwarded message -- From: John Nagle Date: Thu, Apr 14, 2016 at 7:14 PM Subject: Re: Bufferbloat and FQ and you To: Dave Taht <dave.t...@gmail.com> Cc: ro...@cisco.com On 04/14/2016 03:33 PM, Dave Taht wrote: > > https://www.rfc-editor.org/rfc/rfc7806.txt was published today. "There is extensive history in the set of algorithms collectively referred to as "fair queuing". The model was initially discussed in [RFC970], which proposed it hypothetically as a solution to the TCP Silly Window Syndrome issue in BSD 4.1." That's somewhat wrong. First, tinygram prevention (the "Nagle algorithm") is not about "silly window syndrome". Silly window syndrome occurs when the window is full, and the reader does a small read, resulting in the sender being allowed a small write, resulting in very short messages. The solution is clearly to not offer more window until there's at least one full size datagram worth of window available. Tinygram prevention is a problem when the window is empty, not full, and the writer is doing small writes. The question is how to consolidate those writes. It's not obvious how to do this without impacting interactive response. The classic solution, from X.25, was an accumulation timer with a human response time sized delay. That's a bad idea, but unfortunately the people who put in delayed ACKs didn't know that. They were trying to fix TELNET responsiveness at Berkeley, which was using a large number of dumb terminals connected to terminal servers at the time. Delayed ACKs with a fixed timer are useful in that situation, and in few others. Actually, this didn't involve 4.1BSD's networking; we at Ford Aerospace were running a heavily modified version of 3COM's UNET TCP/IP stack on various UNIX systems. That TCP/IP stack lost out because it cost about $4000 per node for the software. The tinygram stuff was in my RFC 896, and isn't really relevant to fair queuing or congestion management in routers and other middle boxes.
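The tinygram-prevention rule Nagle describes above (hold small writes while data is in flight, send full segments immediately; RFC 896) can be sketched in a few lines. This is a simplification for intuition, not the BSD or UNET code:

```python
def nagle_should_send(payload_len, mss, unacked_data_in_flight):
    """Tinygram prevention (RFC 896), simplified: a full-sized
    segment always goes out; a small write goes out only when the
    connection is idle; otherwise it is held and coalesced with
    later writes until the in-flight data is acknowledged."""
    if payload_len >= mss:
        return True   # full segment: send immediately
    if not unacked_data_in_flight:
        return True   # idle connection: no reason to delay
    return False      # small write with data in flight: coalesce
```

This is why the rule never delays bulk transfers, and also why it interacts so badly with the delayed-ACK timer mentioned above: the held small write is waiting on an ACK that the receiver is itself delaying.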
The important part of RFC970 is at the section headed "Game Theoretic Aspects of Network Congestion". This discusses the relationship between endpoints and middle boxes, and the need to create an ecosystem which does not reward bad endpoint behavior. The [NOFAIR] reference is interesting. Yes, fairness is gameable. But FIFO is so much worse, as the bufferbloat people point out. It's worth thinking about when a packet becomes useless and should be dropped. If the packet times out (this was originally what TTL was for; it was a seconds count), it can be dropped as obsolete. A router which looks above the IP level could also detect that the packet has been superseded by a later packet in the same flow, that is, there's a retransmitted copy also queued. If enough resources are available, that's the only packet dropping you have to do. As an optimization, you can also drop packets that are so far back in queues that they'll time out before they're sent. It's worth viewing that as a goal - don't drop any packets that would be useful if they were delivered. Just reorder based on notions of fairness and quality of service. This is the opposite of Random Early Drop, but that's sort of a meat-axe approach. Bufferbloat is only bad if the queuing is FIFO-dumb. It's fine to have lots of queue space if you manage it well. (I'm retired from all this. It's up to you guys now.) John Nagle -- Dave Täht Let's go make home routers and wifi faster! With better software! http://blog.cerowrt.org ___ aqm mailing list aqm@ietf.org https://www.ietf.org/mailman/listinfo/aqm
Re: [aqm] Last Call: (FlowQueue-Codel) to Experimental RFC
rity as I could muster... and went and tested the hell out of it, whenever I could. As for bufferbloat.net's efforts: We've published all the source code, all the (flent.org) benchmark code, made all the code available widely for anyone to try for under 50 bucks worth of hardware that can be reflashed with openwrt (well, 54 dollars on amazon, for the edgerouter X), and begged interested parties to try *every* bufferbloat-fighting technology we have. I jumped all over pie when it came out, helped polish the code, added ecn support, and got it out there so it could be tested by as many as possible, as soon as possible. I thought fq_pie was pretty neat. I'd fiddle with your latest stuff if you'd just fix dctcp's behaviors vs loss. I love the work on BQL and on fixing TCPs, like pacing in sch_fq. My principal interest is in ending bufferbloat in my lifetime via any means possible. And I did not, and do not, intend to make a career out of it. If I, personally, can just get to where a few more pieces of gear can be bought off the shelf with stuff that has the products of the AQM wg in it - hopefully including wifi, 3g, and homeplug! - I can go back to things I consider far, far, far more interesting. and... Jeebus, it's just one experimental RFC. > Otherwise, FQ_CoDel will get bad press later. Then, after riding the hype > curve of coolness it will fall over the cliff of disillusionment. We'll see. Never in my life have I seen a set of ideas so enthusiastically adopted by those that have adopted it, with so few complaints. The only things that bother me at this point are behaviors below ~2mbit sans tuning, and the ecn support, for which we have research ongoing in cake that we can easily fold back into fq_codel if we need to. > > > Bob > > > > On 22/03/16 04:41, Dave Taht wrote: >> >> I don't even know where to start bob. This part of the language has >> been in the draft for 2 years, and you are the only person to object >> that I can recall.
>> >> It's an experimental RFC. By "safe" we mean that deploying it, within >> the guidelines, won't break anything to any huge extent, and brings enormous >> benefits. "unsafe", for example, would be promoting use of dctcp while >> it still responds incorrectly to packet loss. >> >> Versus your decades-long quest for better variable rate video, we've >> had over a decade of the bufferbloat problem to deal with on all >> traffic, particularly along the edge, and even after solutions started >> to appear in mid 2012, we haven't made a real dent in what's deployed, >> except for the small select group of devs, academics, ISPs, and >> manufacturers willing to try something new. I'd like to imagine >> things are shifting to the left side of the green line here, but under >> load, most users are still experiencing latency orders of magnitude in >> excess of what can be achieved >> http://www.dslreports.com/speedtest/results/bufferbloat?up=1 >> >> I've been testing the latest generation of wifi APs of late, and the >> "best" of them, under load, in a single direction, has over 2 seconds >> of latency at the lower rates. Applying any of these algorithms to wifi is >> proving hard, and it's where the bottlenecks are shifting to at least >> in my world, where the default download speed is hovering at around >> 75mbit, and wifi starts breaking down long before that is hit. >> >> ... >> >> I tore apart that HAS experiment you cited here: >> https://lists.bufferbloat.net/pipermail/bloat/2016-February/007198.html >> - where I was, at least, happy >> to see fq_codel handle the onslaught of dctcp traffic, gracefully. (It >> makes me nervous to have such tcps loose on the internet where a >> configuration mistake might send that at the wrong people. fq_codel, >> "safe" - not, perhaps, optimal - in the face of dctcp.)
>> >> my key objections to nearly all the experiments on your side are >> non-reproducibility, no competing traffic (not even bothering to >> measure web PLT in >> that paper, for example), no competing upload traffic, and no >> inclusion of the typical things that are latency sensitive at all >> (voip, dns, tcp neg, ssl neg, etc). >> >> with competing download and upload traffic, fq_codel *dramatically* >> improves the responsiveness and utilization of the link, for all >> traffic. Above 5mbits pretty much the only thing that matters for web >> traffic is RTT, the google cite for this is around somewhere. >> >> I tend to weigh low latency for every other form of traffic... >> today... over marginal improvements in a contrived video download >> scenario someday. >> >> As for pie vs fq_codel,
Re: [aqm] Alia Atlas' No Objection on draft-ietf-aqm-fq-codel-05: (with COMMENT)
On Thu, Mar 17, 2016 at 10:13 AM, Toke Høiland-Jørgensen wrote: > "Alia Atlas" writes: > >> -- >> COMMENT: >> -- >> >> I think it would be useful to have a reference to the Linux >> implementation ("current" version and pointer). > > Hi Alia > > I've added a reference pointing to the fq_codel code in Linux git tree > to the latest updated version, available here: > https://kau.toke.dk/ietf/draft-ietf-aqm-fq-codel-06.html (or .txt). I'm not huge on calling this reference [LINUX]. [LINUXSRC]? [SRC]? I also felt compelled, after this round of cite-adding, to add a few more cites, (what will be) rfc7806, BQL, HTB, and HFSC, with a brief section explaining why they are needed also. BQL was the underappreciated breakthrough that made scaling past a gbit possible, and would (if implemented) make dsl and cable modems a lot better, at their (much slower) speeds. https://github.com/dtaht/bufferbloat-rfcs/commit/7d500133008857b7b78000abac9d592e66477ffb adding: ## Device queues must also be well controlled It is best that these AQM and FQ algorithms run as close to the hardware as possible. Scheduling such complexity at interrupt time is difficult, so a small standing queue between the algorithm and the wire is often needed at higher transmit rates. In Linux, this is accomplished via "Byte Queue Limits" {{BQL}} in the device driver ring buffer (for physical line rates), and via a software rate limiter such as {{HTB}}, {{HFSC}}, or {{CAKE}} otherwise. Other issues with concatenated queues are described in {{CODEL}}. ... There has been such an accumulation of small changes in response to this wonderful review process that I fear that going through another "last, last" call will be needed. > -Toke ___ aqm mailing list aqm@ietf.org https://www.ietf.org/mailman/listinfo/aqm
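The BQL idea in the draft text above can be illustrated with a toy model. This is a sketch for intuition only, not the kernel's dynamic queue limits algorithm (which also grows and shrinks the byte limit adaptively from completion feedback):

```python
class ByteQueueLimit:
    """Toy BQL: cap the bytes outstanding in the driver ring buffer
    so the backlog accumulates in the qdisc layer instead, where
    fq_codel, cake, etc. can actually manage it."""
    def __init__(self, limit_bytes):
        self.limit = limit_bytes
        self.inflight = 0   # bytes currently handed to the hardware
    def can_enqueue(self, nbytes):
        # when this returns False, the packet stays in the qdisc,
        # which keeps the AQM/FQ machinery in control of it
        return self.inflight + nbytes <= self.limit
    def enqueue(self, nbytes):
        self.inflight += nbytes
    def complete(self, nbytes):
        # called on tx-completion; frees budget for the qdisc to refill
        self.inflight -= nbytes
```

With the limit set to only a few packets' worth at line rate, the hardware never starves but also never hoards a deep, unmanaged FIFO.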
Re: [aqm] working group last call on CoDel drafts
In just about every benchmark we have created to date, the linux version of the codel implementation wins over dozens of attempted alternatives. We have one that is mildly better at a 10ms RTT, but not as good at 80ms, but that's it. This doesn't mean that more experimentation isn't called for (there are two radical alternatives I know of still being tested), but I would vote for putting the linux version into the codel draft. On Fri, Dec 4, 2015 at 11:16 AM, Bless, Roland (TM) wrote: > Dear all, > > we believe that the Codel specification > https://datatracker.ietf.org/doc/draft-ietf-aqm-codel/ needs at least one > major clarification. > > The following lines are present in the draft's pseudo-code, but are not > explained further anywhere in the document text, and moreover differ from > the Linux implementation [*], which the document also suggests as reference > implementation. > >// If min went above target close to when it last went >// below, assume that the drop rate that controlled the >// queue on the last cycle is a good starting point to >// control it now. ('drop_next' will be at most 'interval' >// later than the time of the last drop so 'now - drop_next' >// is a good approximation of the time from the last drop >// until now.) >count_ = (count_ > 2 && now - drop_next_ < 8*interval_)? >count_ - 2 : 1; > This line makes sure that when two dropping states are entered within a > short interval from each other, the variable count is not reset (to 1), > but is rather changed somehow. In this document, count is decreased by two, > while in the Linux version, count is set to the number of packets that were > dropped in the previous dropping state. > > Based on the email-thread that was started from these messages ... > http://www.ietf.org/mail-archive/web/aqm/current/msg00376.html > http://www.ietf.org/mail-archive/web/aqm/current/msg01250.html > http://www.ietf.org/mail-archive/web/aqm/current/msg01455.html > > ...
one can infer that: > 1) the case where count is not reset is not an exception, but rather a > common case (that we can confirm from our measurements), It is a common case. Most of the other behaviors in codel are in attempting to seek to the optimum drop rate; that bit is the one that maintains the optimal drop rate. > 2) several options for this behavior were described on the mailing list some > time ago, > > Since it is the most common case, this part of the algorithm should be > explained in the specification. > If the two versions continue to differ, both algorithms (and their > difference in behavior) should be explained, > but in order to avoid confusion for implementers/operators we believe that > specification of a single algorithm is preferable. > > Regards, > Roland and Polina > > [*] https://github.com/torvalds/linux/blob/master/include/net/codel.h#L341 > > Am 02.12.2015 um 16:45 schrieb Wesley Eddy: > > These both have the intended status designated as "Informational". Similar > to the questions asked for PIE, we/chairs need to understand if there's > consensus on: > - Are these specifications of clear and sufficient quality to publish? > - Should the status of the RFCs be "Experimental", "Proposed Standard", or > "Informational"? > > > > ___ > aqm mailing list > aqm@ietf.org > https://www.ietf.org/mailman/listinfo/aqm > ___ aqm mailing list aqm@ietf.org https://www.ietf.org/mailman/listinfo/aqm
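The two count-resumption variants being contrasted in this thread can be sketched side by side. This is simplified from the draft pseudo-code and from the Linux include/net/codel.h of that era; the 8x vs 16x interval windows and the lastcount bookkeeping are taken from those sources, so treat it as an illustration rather than normative text:

```python
def next_count_draft(count, now, drop_next, interval):
    """Draft pseudo-code: on re-entering the dropping state soon
    after leaving it, resume from count - 2 instead of restarting."""
    if count > 2 and now - drop_next < 8 * interval:
        return count - 2
    return 1

def next_count_linux(count, lastcount, now, drop_next, interval):
    """Linux variant (simplified): resume from the number of drops
    made during the previous dropping state (count - lastcount)."""
    delta = count - lastcount
    if delta > 1 and now - drop_next < 16 * interval:
        return delta
    return 1
```

Both variants try to resume near the drop rate that last controlled the queue; they differ in what they remember about the previous dropping cycle, which is exactly the divergence Roland and Polina ask the draft to document.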
Re: [aqm] CoDel's control law that determines drop frequency
It helps to have the codel mailing list cc'd on codel discussions. Adding this message to the cc. One of these days we do have to write up - after finishing - cake's codel-like implementation. Dave Täht I just invested five years of my life to making wifi better. And, now... the FCC wants to make my work illegal for people to install. https://www.gofundme.com/savewifi On Tue, Nov 3, 2015 at 11:22 AM, Jeff Weeks <jwe...@sandvine.com> wrote: > The drop rate is affected by sojourn time, yes, but a 2x sojourn time goes > through the same incremental reduction of interval size, as does a sojourn > time of x. > > In investigating codel, I've set up various worst case scenarios, and I feel > like the algorithm could be made better by having its response time more > dependent upon how far away from the target latency it is. > > For example, consider a disabled interface, with a large (or even > conceptually infinite queue), that's attached to a fairly small shaper. > > The interface is then enabled, and immediately starts seeing 100Mbps, and > tries to shape it to 1Mbps. > > The queue will obviously build up quickly, and codel will notice this, and > enter the drop state. But it will start at count = 1. > > If the interface is receiving 64-byte udp packets, then it'll be receiving > 100,000,000/512 == 195,312 packets per second, and only transmitting 1953 > packets per second. > > The default target is 5ms, which is about 10 packets. So of those 195,312 > packets/second, we should ideally be dropping 195,312 - (1953 + 10) == > 193,349 packets/second. > > But in order to drop that many packets, 'count' needs to ramp up to the point > where the drop interval is consistently 5,172 ns. > > I believe that means 'count' has to reach some nearly impossibly high value > of (100ms/5172ns)^2 ≈ 374,000,000 > > I say nearly impossible, because it will take minutes (hours?) to get that > high (if my math is correct, it'll take over 17 seconds just to reach 7500).
> > In the meantime, the queue *isn't* being effectively managed, as packets with > extremely high latencies will be transmitted for far too long. > > Of course, as I stated earlier, simply increasing count more quickly, based > on how far away we are from the target latency effectively invalidates the > optimization which most (all?) codel implementations use (namely the newton > step integer-only sqrt approximation) as, at some point, the approximation > starts *diverging* from the appropriate value. > > One alternative which I've been investigating is the possibility of skewing > the precalculated 1/sqrt(count) value. > > If this is kept as a 32-bit all-fraction fixed-point number, then performing > the multiplication by intentionally mis-shifting will result in doubling > sqrt(count): > > eg, take the following accurate calculation of next interval: > > codel->next_interval_start_ticks = base_time + ((interval * > codel->one_over_sqrt_count) >> 32) > > And intentionally mis-shift by 1 bit: > > codel->next_interval_start_ticks = base_time + ((interval * > codel->one_over_sqrt_count) >> 33) > > Will effectively have the interval reduce twice as fast. > > Alternatively, (and similarly to how CAKE halves the count while re-entering > the drop interval), count can periodically be doubled, if the current value > is seen to not be adequately affecting traffic, and the pre-calculated > 1/sqrt(count) can then be divided by sqrt(2) (i.e., do not rely on the newton > step approximation for this modification of count).
> > Cheers, > --Jeff > > > > > /dev/jeff_weeks.x2936 > Sandvine Incorporated > > From: aqm [aqm-boun...@ietf.org] on behalf of Andrew Mcgregor > [andrewm...@google.com] > Sent: Sunday, October 25, 2015 6:44 PM > To: Dave Dolson > Cc: Kathleen Nichols; Bob Briscoe; Dave Taht; Van Jacobson; AQM IETF list > Subject: Re: [aqm] CoDel's control law that determines drop frequency > > CoDel does have the form of a controller; drop rate (not probability) is a > function of sojourn time (not queue size) and history, encoded in the state > variables. > > Now, I don't take it as proven that the particular form of the controller is > the best we could do, but making it a rate and based on sojourn time are > clear wins. Yes, you can use size as a proxy for sojourn time if your link > really has a constant bit rate, but not even ethernet is exactly CBR in > practice (and in some hardware situations, knowing the size is much more > expensive than measuring sojourn; the opposite can also apply). Yes, you can > use probability as a proxy for rate if y
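The back-of-envelope numbers in Jeff's 100 Mbit/s into 1 Mbit/s scenario above can be re-derived in a few lines (assuming codel's default 100 ms interval, 64-byte = 512-bit packets, and the idealized interval/sqrt(count) drop spacing):

```python
import math

arrival_pps   = 100_000_000 // 512      # 195_312 packets/s arriving
departure_pps = 1_000_000 // 512        # 1_953 packets/s the shaper passes
standing      = 10                      # ~5 ms target worth of packets
drops_pps     = arrival_pps - (departure_pps + standing)

interval_ns     = 100 * 1_000_000       # codel's default 100 ms interval
drop_spacing_ns = 1e9 / drops_pps       # gap needed between drops (~5.2 us)

# codel spaces successive drops interval/sqrt(count) apart, so the
# steady-state count must satisfy interval/sqrt(count) == spacing:
needed_count = (interval_ns / drop_spacing_ns) ** 2   # roughly 3.7e8

# and since the n-th drop lands near 2*interval*sqrt(n), reaching
# even count = 7500 takes on the order of:
t_7500 = 2 * 0.1 * math.sqrt(7500)      # seconds, ~17.3
```

This is why an unresponsive flood can stay badly managed for a long time under plain codel, and part of why fq_codel sidesteps the problem by isolating such a flood into its own queue.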
[aqm] Catching up on diffserv markings
I unsubscribed from the rmcat and rtcweb groups a while back after I got overloaded, and appear.in started working so well (for both ipv6 and ipv4! I use it all day long now!), to focus on finishing up the new "cake" qdisc/shaper/aqm/QoS system, among other things. http://www.bufferbloat.net/projects/codel/wiki/CakeTechnical Cake is now entering the testlab, and among other things, it has support for the diffserv markings discussed in the related, now-concluded dart wg, but in ways somewhat different from that imagined there. We have not got any good code in our testbeds yet to test videoconferencing behavior, and we could use some, although it does look like we can drive firefox with some remote control stuff with a fixed video playback now. Five questions: 1) Has anyone implemented or tested putting voice and video on two different 5-tuples in any running code out there? 2) How about diffserv markings in general? Do any browsers or webrtc capable software support what was discussed way back when? 3) Were diffserv marking changes eventually allowed on the same 5-tuple? 4) Did the ECN support that was originally in one draft or another ever make it into any running code? (yea, apple plans to turn on ecn universally in their next OS!) 5) What else did I miss in the past year I should know about? Feel free to contact me off list if these have already been discussed. I have totally lost track of the relevant drafts. Sincerely, Dave Täht I just lost five years of my life to making the edge of the internet, and wifi, better. And, now... the FCC wants to make my work illegal for ordinary people to install. https://www.gofundme.com/savewifi ___ aqm mailing list aqm@ietf.org https://www.ietf.org/mailman/listinfo/aqm
[aqm] one way to truly screw up ecn - is to mark CE on all packets
https://forums.developer.apple.com/thread/16699 Unitymedia sets “CE” on all packets. And totally messes up a vpn that adheres to bob's guidelines regarding encapsulation. Sigh. Can someone call those guys and straighten them out? -- Dave Täht Do you want faster, better, wifi? https://www.patreon.com/dtaht ___ aqm mailing list aqm@ietf.org https://www.ietf.org/mailman/listinfo/aqm
[aqm] Last call for signatures to the FCC on the wifi lockdown issue
The CeroWrt project's letter to the FCC on how to better manage the software on wifi and home routers vs some proposed regulations is now in last call for signatures. The final draft of our FCC submittal is here: https://docs.google.com/document/d/15QhugvMlIOjH7iCxFdqJFhhwT6_nmYT2j8xAscCImX0/edit?usp=sharing The principal signers (Dave Taht and Vint Cerf) are joined by many network researchers, open source developers, and dozens of developers of aftermarket firmware projects like OpenWrt. Prominent signers currently include: Jonathan Corbet, David P. Reed, Dan Geer, Jim Gettys, Phil Karn, Felix Fietkau, Corinna "Elektra" Aichele, Randell Jesup, Eric S. Raymond, Simon Kelly, Andreas Petlund, Sascha Meinrath, Joe Touch, Dave Farber, Nick Feamster, Paul Vixie, Bob Frankston, Eric Schultz, Bram Cohen, Jeff Osborn, Harald Alvestrand, and James Woodyatt. If you would like to join our call for substituting sane software engineering practices for misguided regulations, the window for adding your signature to the letter closes at 11:59AM ET, today, Friday, 2015-10-08. Sign via webform here: http://goo.gl/forms/WCF7kPcFl9 We are at approximately 170 signatures as I write. For more details on the controversy we are attempting to address, or to submit your own filing to the FCC see: https://libreplanet.org/wiki/Save_WiFi https://www.dearfcc.org/ Sincerely, Dave Täht CeroWrt Project Architect Tel: +46547001161 ___ aqm mailing list aqm@ietf.org https://www.ietf.org/mailman/listinfo/aqm
[aqm] FCC vs Wifi: The cerowrt letter to the FCC about the wifi firmware lockdown issue is nearly final
We go into a lot of bufferbloat and homenet stuff... it is my hope others involved in these efforts would be willing to add their voice to the mix, either by signing, commenting, or producing your own letters. For your comments, please see the current draft, and especially the 5 mandates at the end at: https://docs.google.com/document/d/1E1D1vWP9uA97Yj5UuBPZXuQEPHARp-AhRqUOeQB2WPk/edit?usp=sharing Final signatures are being accepted now via web form at: http://goo.gl/forms/WCF7kPcFl9 If there is another more apropos ietf mailing list for this sort of announcement/RFC, please forward. Also discussions are mostly on the make-wifi-fast mailing list on lists.bufferbloat.net or the fcc mailing list at prpl. I note that a similar letter needs to be constructed to the EU commission. Deadline for filing is Oct 8. -- Dave Täht Do you want faster, better, wifi? https://www.patreon.com/dtaht
Re: [aqm] CoDel's control law that determines drop frequency
On Wed, Sep 30, 2015 at 1:50 AM, Bob Briscoe wrote: > Andrew, > > I am also not so interested in an AQM dealing directly with unresponsive > traffic - I prefer to keep policing and AQM as separately deployable > functions, because AQM should be policy-neutral, whereas policing inherently > involves policy. > > My concern was merely that CoDel's linear increase in drop probability can > take a long time to reach where it intends to get to. I would have thought > some form of exponential increase, or at least super-linear, would have been > more responsive to changing traffic conditions. I.e., rather than have to > answer the question "how quickly should drop probability increase?", make it > increase increasingly quickly. > > Early on, Rong Pan showed that it takes CoDel ages to bring high load under > control. I think this linear increase is the reason. cake uses a better curve for codel, but we still need to do more testing in the lab. http://www.bufferbloat.net/projects/codel/wiki/CakeTechnical > > Bob > > > > On 30/09/15 01:42, Andrew McGregor wrote: > > Hmm, that's really interesting. > > Most interesting is that my understanding is that the control law was > intended to deal with aggregates of mostly TCP-like traffic, and that an > overload of unresponsive traffic wasn't much of a goal; this seems like > vaguely reasonable behaviour, I suppose, given that pathological situation. > > But I don't have a way to derive the control law from first principles at > this time (I haven't been working on that for a long time now). > > On 25 September 2015 at 06:27, Bob Briscoe wrote: >> >> Toke, >> >> Having originally whinged that no-one ever responded to my original 2013 >> posting, now it's my turn to be embarrassed for having missed your >> interesting response for over 3 months. >> >> Cool that the analysis proves correct in practice - always nice. 
>> >> The question is still open whether this was the intention, and if so why >> this particular control law was intended. >> I would rather we started from a statement of what the control law ought >> to do, then derive it. >> >> Andrew McGregor said he would have a go at this question some time ago... >> Andrew? >> >> >> Bob >> >> >> >> On 07/06/15 20:27, Toke Høiland-Jørgensen wrote: >> >> Hi Bob >> >> Apologies for reviving this ancient thread; been meaning to get around >> to it sooner, but well... better late than never I suppose. >> >> (Web link to your original mail, in case Message-ID referencing breaks: >> https://www.ietf.org/mail-archive/web/aqm/current/msg00376.html ). >> >> Having recently had a need to understand CoDel's behaviour in more >> detail, your analysis popped out of wherever it's been hiding in the >> back of my mind and presented itself as maybe a good place to start. :) >> >> So anyhow, I'm going to skip the initial assertions in your email and >> focus on the analysis: >> >> Here's my working (pls check it - I may have made mistakes) >> _ >> For brevity, I'll define some briefer variable names: >> interval = I [s] >> next_drop = D [s] >> packet-rate = R [pkt/s] >> count = n [pkt] >> >> From the CoDel control law code: >> D(n) = I / sqrt(n) >> And the instantaneous drop probability is: >> p(n) = 1/( R * D(n) ) >> >> Then the slope of the rise in drop probability with time is: >> Delta p / Delta t = [p(n+1) - p(n)] / D(n) >> = [1/D(n+1) - 1/D(n)] / [ R * D(n) ] >> = sqrt(n) * [sqrt(n+1) - sqrt(n)] / >> [R*I*I] >> = [ sqrt(n(n+1)) - n ] / [R*I^2] >> >> I couldn't find anything wrong with the derivation. I'm not entirely >> sure that I think it makes sense to speak about an "instantaneous drop >> probability" for an algorithm that is not probabilistic in nature. >> However, interpreting p(n) as "the fraction of packets dropped over the >> interval from D(n) to D(n+1)" makes sense, I guess, and for this >> analysis that works. 
>> >> At count = 1, the numerator starts at sqrt(2)-1 = 0.414. >> And as n increases, it rapidly tends to 1/2. >> >> So CoDel's rate of increase of drop probability with time is nearly >> constant (it >> is always between 0.414 and 0.5) and it rapidly approaches 0.5 after a few >> drops, tending towards: >> dp/dt = 1/(2*R*I^2) >> >> This constant increase clearly has very little to do with the square-root >> law of >> TCP Reno. >> >> In the above formula, drop probability increases inversely proportional to >> the >> packet rate. For instance, with I = 100ms and 1500B packets >> at 10Mb/s => R = 833 pkt/s => dp/dt = 6.0% /s >> at 100Mb/s => R = 8333 pkt/s => dp/dt = 0.6% /s >> >> I also tried to test this. I configured CoDel (on a Linux 4.0 box) on >> 1Mbps, 2Mbps and 10Mbps links with interval settings of 1 second and >> 500ms, and a total packet limit
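Bob's arithmetic above is easy to check numerically; a quick Python sketch (the code is mine, not from the thread, but the variable names follow his I, R, n):

```python
import math

def slope(I, R, n):
    """dp/dt between drops n and n+1, per the derivation above:
    [sqrt(n*(n+1)) - n] / (R * I^2)."""
    return (math.sqrt(n * (n + 1)) - n) / (R * I ** 2)

I = 0.100  # CoDel interval: 100 ms

# The numerator starts at sqrt(2)-1 ~= 0.414 at n=1 and tends to 1/2,
# so dp/dt rapidly approaches the constant 1/(2*R*I^2):
for R in (833, 8333):  # pkt/s at 10 Mb/s and 100 Mb/s with 1500 B packets
    print(f"R={R:5d} pkt/s: dp/dt -> {100 / (2 * R * I ** 2):.1f} %/s")
# prints 6.0 %/s and 0.6 %/s, matching the figures in the mail
```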
Re: [aqm] WGLC on draft-ietf-aqm-eval-guidelines
On Tue, Aug 18, 2015 at 3:03 PM, Roland Bless roland.bl...@kit.edu wrote: Hi, On 10.08.2015 at 15:43, Wesley Eddy wrote: As chairs, Richard and I would like to start a 2-week working group last call on the AQM characterization guidelines: https://datatracker.ietf.org/doc/draft-ietf-aqm-eval-guidelines/ Please make a review of this, and send comments to the list or chairs. Any comments that you might have will be useful to us, even if it's just to say that you've read it and have no other comments. Unfortunately, we (Polina and I) did a thorough review, which is attached. TL;DR: from our point-of-view the I-D needs a major revision. I am so tired of this document that I can hardly bear to read it again, but I agree with the majority of the comments. Sometimes I do wish we could do graphics and charts as the IEEE does. Regards, Roland -- Dave Täht worldwide bufferbloat report: http://www.dslreports.com/speedtest/results/bufferbloat And: What will it take to vastly improve wifi for everyone? https://plus.google.com/u/0/explore/makewififast
[aqm] cake status ( was Codel's count variable and re-entering dropping state at small time intervals)
I would like to stress that cake is a work in progress, taking place with very limited resources - Jonathan's funding ran out last month and we've had to scramble to keep a floor under him F/T, toke is contributing his testbed and test scripts that he used for The Good, the Bad and the WiFi, recently published in Computer Networks: https://kau.toke.dk/experiments/good-bad-wifi/ so we can compare all prior qdiscs... but he is otherwise on vacation... various other parties have contributed scripts to use it in openwrt... and I am entirely unpaid, yet contributing a few servers and clients in real world scenarios while working primarily on the make-wifi-fast stuff, for which some of cake's algorithms may apply but the code needs to move to the mac80211e layer, which was discussed at battlemesh. Other bits - like the new more robust linux hashing api which supports macaddr and mpls targets - are in rapid development elsewhere and we are not tracking that work well. Any suggestions towards putting a better floor under this increasingly promising work are welcomed. Any grant money out there? Exploration of various constants, ratios, and other bits of math throughout the code is welcomed, also. All the code is open source and easily buildable for many versions of linux now. Feel free to play. Much needed are testing and analysis at both line and shaped rates at 1gigE, 10gige and higher, (anyone got 10GigE in a testbed we can use?) - testing at longer rtts is needed (we probably need to expose the interval parameter for the satcomm folk), and with more mixtures of traffic than we currently use. We worked out how to test webrtc only recently (at ietf), for example, but have not coded it up. ns2 and ns3 models are needed. There are some thoughts towards leveraging qfq in another group of researchers, more news on that as it happens. 
Lastly, if anyone knows of some cites for previous attempts at deficit mode schedulers and the other key ideas in cake - we have not done an exhaustive literature search yet, for that penultimate paper that is in progress. Having a ton of fun though! What we did on our summer vacation!
Re: [aqm] [tsvwg] New Liaison Statement, Explicit Congestion Notification for Lower Layer Protocols
Is there anyone doing ECN outreach also to IEEE 802.11? On Tue, Jul 21, 2015 at 10:42 AM, Liaison Statement Management Tool l...@ietf.org wrote: Title: Explicit Congestion Notification for Lower Layer Protocols Submission Date: 2015-07-20 URL of the IETF Web page: https://datatracker.ietf.org/liaison/1424/ Please reply by 2015-10-30 From: Transport Area Working Group (David Black david.bl...@emc.com) To: 3GPP (susanna.koois...@etsi.org) Cc: Gonzalo Camarillo gonzalo.camari...@ericsson.com,Gorry Fairhurst go...@erg.abdn.ac.uk,Martin Stiemerling mls.i...@gmail.com,Spencer Dawkins spencerdawkins.i...@gmail.com,John Kaippallimalil john.kaippallima...@huawei.com,Bob Briscoe i...@bobbriscoe.net,Transport Area Working Group Discussion List ts...@ietf.org Response Contact: David Black david.bl...@emc.com Technical Contact: Bob Briscoe i...@bobbriscoe.net Purpose: For comment Body: To: 3GPP SA, 3GPP CT, 3GPP RAN, 3GPP SA4, 3GPP SA2, 3GPP RAN2 From: IETF TSVWG In 2001, the IETF introduced explicit congestion notification (ECN) to the Internet Protocol as a proposed standard [RFC3168]. The purpose of ECN was to notify congestion without having to drop packets. The IETF originally specified ECN for cases where buffers were IP-aware. However, ECN is now being used in a number of environments including codec selection and rate adaptation, where 3GPP protocols such as PDCP encapsulate IP. As active queue management (AQM) and ECN become widely deployed in 3GPP networks and interconnected IP networks, it could be incompatible with the standardized use of ECN across the end-to-end IP transport [RFC7567]. The IETF is now considering new uses of ECN for low latency [draft-welzl-ecn-benefits] that would be applicable to 5G mobile flows. However, the IETF has realized that it has given little if any guidance on how to add explicit congestion notification to lower layer protocols or interfaces between lower layers and ECN in IP. 
This liaison statement is to inform 3GPP, in particular those groups including those involved in 3GPP Release-10 work on the work item ECSRA_LA (TR23.860) - SA4, CT4, SA2 and RAN2. Please distribute to all groups that have used or plan to use IETF ECN /AQM RFCs in 3GPP specifications. The IETF has started work on guidelines for adding ECN to protocols that may encapsulate IP and interfacing these protocols with ECN in IP. Then IP may act in its role as an interoperability protocol over multiple forwarding protocols. This activity is led by the IETF's transport services working group (tsvwg). Actions: The IETF tsvwg kindly asks 3GPP: 1) to tell the IETF tsvwg which 3GPP working groups could be affected by this work. 2) To inform the IETF tsvwg of any specific 3GPP specifications affected by this work. 3) to forward this liaison statement to these affected working groups, and to invite them to review the latest draft of the guidelines, available here: http://tools.ietf.org/html/draft-ietf-tsvwg-ecn-encap-guidelines Review comments are particularly welcome on: - comprehensibility for the 3GPP community - usefulness and applicability - technical feasibility Review comments may be posted directly to the IETF tsvwg mailing list mailto: ts...@ietf.org. Postings from non-subscribers may be delayed by moderation. Alternatively, subscription is open to all at: https://www.ietf.org/mailman/listinfo/tsvwg. The following IETF specifications or drafts are particularly relevant to this activity (the relevance of each of them is explained in the first item below): * draft-ietf-tsvwg-ecn-encap-guidelines * RFC3168 updated by RFC4301, RFC6040 (ECN in respectively: IP/TCP, IPsec IP-in-IP tunnels) * RFC6679 (ECN in RTP) * RFC5129 updated by RFC5462 (ECN in MPLS) * RFC4774 (Specifying alternative semantics for the ECN field) * RFC7567 (Recommendations Regarding Active Queue Management * draft-welzl-ecn-benefits (Benefits to Applications of Using ECN) Yours, --David L. 
Black (TSVWG co-chair) Attachments: No document has been attached -- Dave Täht worldwide bufferbloat report: http://www.dslreports.com/speedtest/results/bufferbloat And: What will it take to vastly improve wifi for everyone? https://plus.google.com/u/0/explore/makewififast
[aqm] quick comment on aqm-eval-guidelines
From each of these sets of measurements, the 10th and 90th percentiles and the median value SHOULD be computed. For each scenario, a graph can be generated, with the x-axis showing the end-to-end delay and the y-axis the goodput. This graph provides part of a better understanding of (1) the delay/goodput trade-off for a given congestion control mechanism, and (2) how the goodput and average queue size vary as a function of the traffic load. This is lame. Capturing *all* the data as in a CDF or a Winstein ellipsis plot, across the entire range, is to be preferred when engineering a system. 90th percentile is a very, very low bar to cross; most of the nasty bufferbloat happens at the top end of the range. Packet crcs, as one example, are measured out to what, one in 6 million? Would you drive a car that had the steering wheel fail one time in 10 turns? As for medians, seven figure summaries, if you must... -- Dave Täht worldwide bufferbloat report: http://www.dslreports.com/speedtest/results/bufferbloat And: What will it take to vastly improve wifi for everyone? https://plus.google.com/u/0/explore/makewififast
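To make the complaint concrete, here is a toy example (all numbers invented) where the 10th/50th/90th percentiles the draft asks for look perfectly healthy, while the full distribution shows the bloat hiding in the tail:

```python
import random

random.seed(1)
# Toy latency samples: mostly ~10 ms, but 5% of them bloated to 1.5-2.5 s.
samples = sorted([random.uniform(8, 12) for _ in range(950)] +
                 [random.uniform(1500, 2500) for _ in range(50)])

def percentile(sorted_data, p):
    """Nearest-rank percentile of an already-sorted list."""
    k = max(0, round(p / 100 * len(sorted_data)) - 1)
    return sorted_data[k]

# The three numbers the draft asks for: all look fine (~8-12 ms).
for p in (10, 50, 90):
    print(f"p{p} = {percentile(samples, p):.1f} ms")

# A full CDF would expose what they hide: the top 5% is two orders of
# magnitude worse, which is exactly where the bufferbloat lives.
print(f"p99 = {percentile(samples, 99):.1f} ms, max = {max(samples):.1f} ms")
```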
Re: [aqm] FQ-PIE kernel module implementation
On Fri, Jul 3, 2015 at 11:22 AM, Fred Baker (fred) f...@cisco.com wrote: On Jul 3, 2015, at 10:56 AM, Dave Taht dave.t...@gmail.com wrote: There are also weighted FQ systems (like qfq+ + pie or codel) under development. Actually, a WFQ system has been in Cisco product for 20 years, and I wrote one at a different company four years earlier. Having FQ systems be weighted is pretty normal. yep! Sorry! What is the current limit on number of queues, however? /me gets head out of linux sand -- Dave Täht worldwide bufferbloat report: http://www.dslreports.com/speedtest/results/bufferbloat And: What will it take to vastly improve wifi for everyone? https://plus.google.com/u/0/explore/makewififast
Re: [aqm] FQ-PIE kernel module implementation
On Fri, Jul 3, 2015 at 2:52 AM, Fred Baker f...@cisco.com wrote: On Jul 3, 2015, at 2:45 AM, Polina Goltsman uu...@student.kit.edu wrote: As I understand the FQ-Codel draft, it seems to be fundamental to FQ-Codel that each queue has separate state variables. So my question is: is it indeed fundamental? If you're asking whether it is fundamental to fair queuing, I'll recommend you start researching that question with RFC 970 and the articles in SIGCOMM and INFOCOM on the topic circa 1988-1995 or so. Also take a look at Class-based Queueing (aka CBQ) in the same timeframe. I think you'll find that FQ systems are not approached as collections of queues with different characteristics; they are collections of queues with essentially the same set of characteristics, using scheduling to make the queues share bandwidth in a manner similar to the Generalized Processor Sharing model. On the other hand, CBQ systems are systems with separate queues or classes for different sets of traffic, with different characteristics such as drop policy or target latency. I do not think how FQ-codel works is fundamental to FQ; rather, it is an innovation that seems to work well in practice, with extremely low overhead, leaving (mostly) untouched request-response traffic like dns and other low rate traffic, while getting large bursts under control *rapidly*, yet allowing short flows through, with no further configuration. So it is a set of queues with different characteristics - which is why we ended up calling it Flow queueing, not fair queuing. AQMs (like pie) traditionally rely on applying increasing amounts of random drops/marks to a stream of packets, hoping to eventually pick out the fattest flows and shoot at them, and yet modern traffic is A) asymmetric, and B) bidirectional, and C) bursty, with large bursts coming from various forms of TSO/GSO/GRO offloads, IW10 (and now quic IW10+22 paced packets), and things like web browsers opening up many connections simultaneously. 
Any form of FQ reduces the impact of C) enormously. So much so that I regard FQ as the biggest part of the answer to achieving reliably low latency for all forms of traffic. No AQM deals particularly well with B - if you have 20 acks vs 1 full size MTU packet filling up the queue, it can take a lot longer for the AQM to find an ideal drop rate - which is not, actually, ideal. Toke did a preso on this on B and C. As for A) - I am seeing a 12x1 ratio of down to up in my current cable modem services, and this makes acks actually far more important and painful than they ever have been before. It does not take a lot of fat uplink traffic to start starving the downlink. Wifi and LTE are also often painfully asymmetric. FQ does impose some structure - per packet fairness has problems in some scenarios, byte fairness at the MTU size does not shoot at enough acks, the compromise in deployed fq_codel systems at lower bandwidths is a DRR quantum of 300 bytes. Cake uses peeling (to deal with up to 64k byte packets common now in consumer and server hardware), an 8 way set associative hash (much more flow isolation) and has a variable quantum (presently), based on the number of extant flows, and has some thoughts about the ideal BDP in the AQM, as the number of flows goes up, and is smarter about quite a few things, and perhaps dumber about others. http://www.bufferbloat.net/projects/codel/wiki/Cake I think FQ-pie is a good idea (aside from worrying about it over-prioritizing millions of small flows). I still lean towards stochastic methods (as in cake and fq_codel) along the edge. As for the core, or where you have a lot of cpus on the ingress side, damned if I know. As for wifi/wireless, we still have tons of work left to do there. On my benchmarks fq-pie, fq_codel, and cake all win pretty big. I think cake is now more or less better than sqm-scripts was, and handles more edge cases. 
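The DRR compromise mentioned above (a 300-byte quantum, so streams of small acks are not starved by MTU-sized bulk packets) can be sketched in a few lines. This is a simplification of my own for illustration - no hashing, peeling, or AQM, just the deficit round robin scheduling:

```python
from collections import deque

def drr(queues, quantum=300):
    """Minimal deficit round robin sketch: each queue is a deque of packet
    sizes in bytes; yields (queue_index, size) in dequeue order."""
    deficits = [0] * len(queues)
    active = deque(i for i, q in enumerate(queues) if q)
    while active:
        i = active.popleft()
        deficits[i] += quantum  # each visit earns one quantum of credit
        while queues[i] and queues[i][0] <= deficits[i]:
            size = queues[i].popleft()
            deficits[i] -= size
            yield i, size
        if queues[i]:
            active.append(i)   # still backlogged: back of the round
        else:
            deficits[i] = 0    # flow went idle; it keeps no credit

# A 1500 B bulk flow vs a stream of 64 B acks: with a 300 B quantum,
# most of the acks drain before the first bulk packet is released.
bulk = deque([1500] * 3)
acks = deque([64] * 20)
order = list(drr([bulk, acks]))
```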
When we built the differentiated services model, we modeled a FQ subsystem as if it were a single queue in a larger CBQ system. We might, for example, have a FQ system for an AF class, but give EF priority over the entire FQ subsystem. What we did with sqm-scripts (and other deployed fq_codel based systems, like free.fr's) was to have 3 tiers of relative fq_codel queues for priority, best effort, and background. That seems to work pretty well. There is not a lot of effective use of classification in my sample sets. Cake is experimenting with various means of layering diffserv on top of that, presently with a default of 4 fq_codel-ish queues. We are collecting a great deal more stats on queue behavior and actual loads in cake, now, example here: http://pastebin.com/bX1HmDP6 Couple notes on that url: Class 0 is just a name (background traffic goes here), class 1 is CS0, classes 2 and 3 are higher prio than 1. We need a better name than class as CS0 is actually in class 1. Sent 65725311796 bytes 52559409 pkt (dropped 11935,
Re: [aqm] Questioning the goal of a hard delay target
On Fri, Jul 3, 2015 at 10:42 AM, Bob Briscoe i...@bobbriscoe.net wrote: Simon, Y, if you're going to start autoadjusting a hard-coded parameter, you have to first question whether it was right to choose that parameter to hard-code in the first place. In codel, target was never a hardcoded parameter. It has always been specified as 5-10% of the interval, with a default of 5% (which equals 5ms on an interval of 100ms. In retrospect I really wish we had made it be an actual percentage in the code and configuration, and on other days wish we had only exposed interval as a parameter). We have always thought target in the case of wifi especially needed to be a function of active stations. This is sort of where cake is going. Target is merely a delay that codel *aims for*. When it hits a drop rate (or the flow slows down) enough - it turns off, and the algorithm goes into behavior that only goes on again after the delay exceeds target for the current computed interval. It is good to have some new thinking on this, of course, and codifying how to modify the target - or work on different curves on various other algorithms, is wonderful. I like bob's other piece on smaller cwnds to keep the tcp signal strength up, but am not as allergic as he to reducing mss in such cases. One of these days, perhaps, someone will successfully write up and explain the 3 modes of codel. People look too hard at the ramp up portion of the algo and not at what happens when you are at or near steady state. I would like to develop a model that shows what is going on in all the queues, all the time, and presents it graphically, somehow. Bob On 03/07/15 18:34, Simon Barber wrote: Hi Bob, Very interesting to see this. I had just recently privately proposed an extension to Codel - to auto tune the target parameter. The proposal is to observe the characteristics that are exhibited when target is too large or too small, and make adjustments appropriately. i.e. 
if you make a single drop during an interval, and the response of the flow is to go idle (even momentarily) then perhaps it was because target is too small. Using some rule you could increase target. Conversely you can heuristically identify when target is likely too large, and reduce it. Simon On 7/3/2015 5:20 AM, Bob Briscoe wrote: AQM chairs and list, 1) Delay-loss tradeoff We (Koen de Schepper and I) have designed an AQM aimed at removing the need for low delay QoS classes, initially as a cost/complexity reduction exercise for broadband remote access servers (BRASs). One of the requirements given to us was: * As background load increases, delay-sensitive apps previously given priority QoS treatment (e.g. voice, conversational video) should continue to get the same QoS as they got with Diffserv. We found that AQMs with a hard delay threshold (PIE, CoDel) have to drive up loss really high in order to maintain the hard cap on delay. The levels of loss start to cause QoS problems for voice, even tho delay is fine. Indeed, we found that the high levels of loss become the dominant cause of delay for Web traffic, due to tail losses and timeouts. Everyone has been focusing on delay, but we've not been noticing consequent really bad loss levels at high load. Once you know where to look, the problem is easy to grasp: As load increases, the bottleneck link has to get each TCP flow to go slower to use a smaller share of the link. The network can increase either drop or RTT. If it holds queuing delay (and therefore RTT) constant (as PIE and CoDel do), it has to increase drop more. We found that by softening the delay threshold a little, at high load we don't need crazy loss levels to keep delay within bounds. BTW, the implementation needs fewer operations per packet than RED, PIE or CoDel. Conversely, at low load, a hard queuing delay threshold also means that delay will be /higher/ than it needs to be. 
I've written up a brief (4pp) tech report quantifying the problem analytically. http://www.bobbriscoe.net/projects/latency/credi_tr.pdf Koen and colleagues have since done thousands of experiments on their broadband testbed with real equipment. It's looking good, even before we've explored varying what we call the 'curviness' parameter (which varies how hard the target is). We have a paper under submission with all the results, which we'll post as soon as it's not sub judice. 2) Does Flow Aggregation Increase or Decrease the Queue? Something else had been bugging me about how queue lengths vary with load: The above argument explains how more TCP flows /increase/ the queue. But queues are meant to get /smaller/ at higher levels of aggregation. The second half of the above tech report explains why there's no paradox. And it goes on to explain when you have to configure an AQM with different parameters for higher link capacity, and when you don't. It gives the formula for how to set the config too. Writing this
Re: [aqm] tackling torrent on a 10mbit uplink (100mbit down)
sometimes I pick the wrong week to actually try to benchmark a protocol in the wild. https://torrentfreak.com/popular-torrents-being-sabotaged-by-ipv6-peer-flood-150619/ On Fri, Jun 19, 2015 at 9:01 AM, Dave Taht dave.t...@gmail.com wrote: I just downloaded and seeded 4 popular torrents overnight using the latest version of the transmission-gtk client. I have not paid much attention to this app or protocol of late (about 2.5 years since last I did this), I got a little sparked by wanting to test cdg, but did not get that far. Some egress stats this morning (fq_codel on the uplink) bytes 32050522339 packets 3379478 dropped 702799 percent 20.80% maxpacket 28614 Some notes: 1) The link stayed remarkably usable: http://snapon.lab.bufferbloat.net/~d/withtorrent/vs64connectedpeers.png This graph shows what happened when one of the 4 torrents completed. The percentage of bandwidth the uplink on this test got was a bit larger than I expected. Subjectively, web browsing was slower but usable, and my other normal usages (like ssh and mosh and google music over quic) were seemingly unaffected. (latency for small flows stayed pretty flat) 2) even with 69 peers going at peak, I generally did not get anywhere near saturating the 100mbit downlink with torrent alone. 3) Offloads are a pita. Merely counting packets here does not show the real truth of what's going on (max packet of 28614 bytes!?), so linux, benchmarkers, and so on, should also be counting bytes dropped these days. (cake does peeling of superpackets but I was not testing that, and it too does not return bytes dropped) 4) *All* the traffic was udp. (uTP) Despite ipv6 being enabled (with two source specific ipv6 ips), I did not see any ipv6 peers connect. Bug? Death of torrent over ipv6? Blocking? What? 5) transmission-generated uplink traffic seemed bursty, but I did not tear apart the data or code. I will track queue length next time. 
6) Although transmission seems to support setting the diffserv bytes, it did not do so on the udp marked traffic. I think that was a tcp only option. Also it is incorrect for ipv6 (not using IPV6_TCLASS). I had figured (before starting the test) that this was going to be a good test of cake's diffserv support. Sigh. Is there some other client I could use? 7) transmission ate a metric ton of cpu (30% on a i3) at these speeds. 8) My (cable) link actually is 140mbit down, 11 up. I did not much care for asymmetric networks when the ratios were 6x1, so 13x1 is way up there Anyway, 20% packet loss of the right packets was survivable. I will subject myself to the same test on other fq or aqms. And, if I can force myself to, with no aqm or fq. For SCIENCE! Attention, DMCA lawyers: Please send takedown notices to bufferbloat-research@/dev/null.org . One of the things truly astonishing about this is that in 12 hours in one night I downloaded more stuff than I could ever watch (mp4) or listen to (even in flac format) in several days of dedicated consumption. And it all just got rm -rf'd. It occurs to me there is a human upper bound to how much data one would ever want to consume, and we cracked that limit at 20mbit, with only 4k+ video driving demand any harder. When we started bufferbloat.net 20mbit downlinks were the best you could easily get. -- Dave Täht worldwide bufferbloat report: http://www.dslreports.com/speedtest/results/bufferbloat And: What will it take to vastly improve wifi for everyone? https://plus.google.com/u/0/explore/makewififast -- Dave Täht worldwide bufferbloat report: http://www.dslreports.com/speedtest/results/bufferbloat And: What will it take to vastly improve wifi for everyone? https://plus.google.com/u/0/explore/makewififast ___ aqm mailing list aqm@ietf.org https://www.ietf.org/mailman/listinfo/aqm
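The headline numbers in the quoted stats are easy to sanity-check, and the average "packet" size makes point 3 (offloads distort packet counts) concrete:

```python
# Egress counters from the fq_codel stats quoted above.
packets = 3_379_478
dropped = 702_799
bytes_sent = 32_050_522_339

drop_pct = 100 * dropped / packets
print(f"drop rate: {drop_pct:.2f}%")   # reproduces the reported 20.80%

# With a 1500 B MTU this "average packet" is impossible on the wire --
# these are GSO/GRO superpackets, which is why counting bytes dropped
# (not just packets dropped) matters on modern linux.
avg_pkt = bytes_sent / packets
print(f"average packet: {avg_pkt:.0f} bytes")
```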
[aqm] loss + ect
After subjecting myself to the cable dslreports.com/speedtest on a 1mbit link, against current implementations of codel and fq_codel (no overload protection), pie and cake (overload protection)... and witnessing the carnage... ...I kind of think transports should treat loss with ect(3) also being sent as a stronger signal than they do.
[aqm] tackling torrent on a 10mbit uplink (100mbit down)
I just downloaded and seeded 4 popular torrents overnight using the latest version of the transmission-gtk client. I have not paid much attention to this app or protocol of late (about 2.5 years since last I did this), I got a little sparked by wanting to test cdg, but did not get that far. Some egress stats this morning (fq_codel on the uplink) bytes 32050522339 packets 3379478 dropped 702799 percent 20.80% maxpacket 28614 Some notes: 1) The link stayed remarkably usable: http://snapon.lab.bufferbloat.net/~d/withtorrent/vs64connectedpeers.png This graph shows what happened when one of the 4 torrents completed. The percentage of bandwidth the uplink on this test got was a bit larger than I expected. Subjectively, web browsing was slower but usable, and my other normal usages (like ssh and mosh and google music over quic) were seemingly unaffected. (latency for small flows stayed pretty flat) 2) even with 69 peers going at peak, I generally did not get anywhere near saturating the 100mbit downlink with torrent alone. 3) Offloads are a pita. Merely counting packets here does not show the real truth of what's going on (max packet of 28614 bytes!?), so linux, benchmarkers, and so on, should also be counting bytes dropped these days. (cake does peeling of superpackets but I was not testing that, and it too does not return bytes dropped) 4) *All* the traffic was udp. (uTP) Despite ipv6 being enabled (with two source specific ipv6 ips), I did not see any ipv6 peers connect. Bug? Death of torrent over ipv6? Blocking? What? 5) transmission-generated uplink traffic seemed bursty, but I did not tear apart the data or code. I will track queue length next time. 6) Although transmission seems to support setting the diffserv bytes, it did not do so on the udp marked traffic. I think that was a tcp only option. Also it is incorrect for ipv6 (not using IPV6_TCLASS). I had figured (before starting the test) that this was going to be a good test of cake's diffserv support. Sigh. 
Is there some other client I could use?

7) transmission ate a metric ton of cpu (30% on an i3) at these speeds.

8) My (cable) link actually is 140mbit down, 11 up. I did not much care for asymmetric networks when the ratios were 6x1, so 13x1 is way up there.

Anyway, 20% packet loss of the right packets was survivable. I will subject myself to the same test on other fq or aqms. And, if I can force myself to, with no aqm or fq. For SCIENCE!

Attention, DMCA lawyers: Please send takedown notices to bufferbloat-research@/dev/null.org .

One of the things truly astonishing about this is that in 12 hours in one night I downloaded more stuff than I could ever watch (mp4) or listen to (even in flac format) in several days of dedicated consumption. And it all just got rm -rf'd. It occurs to me there is a human upper bound to how much data one would ever want to consume, and we cracked that limit at 20mbit, with only 4k+ video driving demand any harder. When we started bufferbloat.net, 20mbit downlinks were the best you could easily get.

-- Dave Täht worldwide bufferbloat report: http://www.dslreports.com/speedtest/results/bufferbloat And: What will it take to vastly improve wifi for everyone? https://plus.google.com/u/0/explore/makewififast

___ aqm mailing list aqm@ietf.org https://www.ietf.org/mailman/listinfo/aqm
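A quick sanity check on the egress stats above (a sketch; it assumes the "percent" field is dropped/sent-packets, which is what matches the numbers shown):

```python
# Back-of-envelope check of the fq_codel egress stats quoted above.
# Assumption: "percent" is dropped / sent packets, not dropped / offered.
sent_packets = 3_379_478
dropped = 702_799

percent = 100 * dropped / sent_packets
print(f"drop percent: {percent:.2f}%")  # matches the reported 20.80%

# Note: with offloads, one "packet" here can be a 28614-byte superpacket,
# which is why byte-based drop accounting would tell a truer story.
```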
Re: [aqm] I-D Action: draft-ietf-aqm-ecn-benefits-04.txt
For many positive bullet points in the present document I can think of a negative counter-example in the real world that needs to be defeated in detail. Just off the top of my head:

3.2: tinc, when carrying tos, encapsulates the ecn markings also and does not apply them properly according to the various rfcs.

3.3: recently, one large provider's equal-cost multipath implementation used the full 8-bit tos field as part of the tuple. This worked fine, until CE started getting exerted by new aqms in the path, which led to massive packet re-ordering. Fixing it required fixing a ton of pretty modern vendor gear.

3.4: thus far, even with multiple queues, on the aqms I have, ECN marked traffic causes extra loss and delay in non-ecn-marked traffic. I agree that we should ecn mark sooner than drop; work is progressing.

I would like it if non-traditional (ab)uses of ecn were covered: 1) attacks using ecn marked packets on dns servers, for example; 2) future protocols that could use it (say, Quic); 3) as an example of something I've been fiddling with for a long time, coupling a routing protocol's metrics to something other than packet loss, and getting a better signal under congestion by using ecn marked packets for more reliable communications.

The draft touches upon voip uses (where I kind of think ecn is not the best idea), but does not cover videoconferencing well, where I think ecn protection of iframes would be a very good idea. So the guidance in sec 2.4 is a bit vague. Aggregating transports with retries (e.g. wifi) could use ecn basically for free when experiencing trouble at the lowest layers of the stack.

I know I have a tendency to accumulate the negatives (I do LIKE ecn), but would certainly like to have a forum, or a living document or wiki, for potential sysadmins, vendors, and deployers to get a clear grip on what can go wrong when attempting to roll out ecn.
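The 3.3 counter-example is easy to sketch: hash the full 8-bit tos byte into the ECMP path choice, and a CE mark can move a flow mid-stream (illustrative code only; the function names and hash are assumptions, not the vendor's implementation):

```python
import zlib

ECN_MASK = 0x03  # low two bits of the old TOS byte carry ECN

def ecmp_path(src, dst, sport, dport, tos, n_paths=4, buggy=False):
    # Pick an equal-cost path by hashing the flow key.
    # The bug: including the ECN bits in the hashed key.
    key_tos = tos if buggy else (tos & ~ECN_MASK)
    key = f"{src}|{dst}|{sport}|{dport}|{key_tos}".encode()
    return zlib.crc32(key) % n_paths

# Same flow before and after an AQM sets CE (ECT(0)=0b10 -> CE=0b11).
# With the masked (fixed) hash the path never changes:
fixed_before = ecmp_path("10.0.0.1", "10.0.0.2", 1234, 80, 0x02)
fixed_after  = ecmp_path("10.0.0.1", "10.0.0.2", 1234, 80, 0x03)
assert fixed_before == fixed_after

# With the buggy hash, a CE mark can flip the path mid-flow,
# which is exactly the re-ordering failure described above.
```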
So I am mostly in favor of this document getting published, so long as someone steps up to also be an ecn news central, chock full of user-generated content on the pitfalls, tips, and tricks - and benefits! - to guide ecn deployment further along. ecn is inevitable. finally.
Re: [aqm] CoDel on high-speed links
On Mon, Jun 15, 2015 at 4:40 PM, Agarwal, Anil anil.agar...@viasat.com wrote:

> I guess this is pointing to the age-old problem - what is the right buffer size or equivalent delay limit, when packets should be dropped or ECN-marked, so that the link is never under-utilized? For a single TCP connection, the answer is the bandwidth-delay product BDP. For large number of connections, it is BDP / sqrt(numConnections). Hence, one size does not fit all. E.g., for RTT of 100 ms or 500 ms, CoDel target delay of 5 or 10 ms is too short - when handling a small number of connections.

The codel recommendation is that the target be set to 5-10% of the typical interval (RTT), so in the case of a sat link, interval would be 600(?)ms, and target 5-10% of that, 30-60ms. The recommendation bears testing in this scenario. If you would like to specify what you expect your (say) 98th percentile real physical RTT to be, I can exhaustively simulate that rather rapidly later this week against reno, cdg, cubic, and westwood and various aqms, against flows from 1 to 100.

Most of this convo has also missed other advancements in tcp (reno is thoroughly dead), like PRR. A very good read which incorporates a discussion of PRR is http://folk.uio.no/kennetkl/jonassen_thesis.pdf which I plan to finish reading on the plane.

> I am not sure what the pie settings would be for a sat system. Perhaps, there is a need for a design that adapts the queue size (or delay) target dynamically by estimating numConnections !

Not perhaps.
That is one of the avenues cake is exploring: https://lists.bufferbloat.net/pipermail/cake/2015-June/000241.html

> Anil

-Original Message- From: aqm [mailto:aqm-boun...@ietf.org] On Behalf Of Simon Barber Sent: Monday, June 15, 2015 2:01 PM To: Dave Taht Cc: Jonathan Morton; aqm@ietf.org; Steven Blake Subject: Re: [aqm] CoDel on high-speed links

On 6/14/2015 10:26 PM, Dave Taht wrote: On Sun, Jun 14, 2015 at 4:10 PM, Simon Barber si...@superduper.net wrote: Indeed - I believe that Codel will drop too much to allow maximum bandwidth utilization, when there are very few flows, and RTT is significantly greater than target.

Interval. Not target. Interval defaults to 100ms. Target is 5ms. Dropping behaviors stop when the queue falls below the target.

In this case I specifically mean target, not interval. Dropping stops when queue falls below target, but by then it's too late. In the case I'm talking about (cwind cut by more than queue length) a period of link idle occurs, and so bandwidth is hurt. It happens repeatedly.

Range of tests from near zero to 300ms RTT codel does quite well with reno, better with cubic, on single flows. 4 flows, better. fq_codel does better than that on more than X flows in general.

The effect is not huge, but the bandwidth loss is there. More flows significantly reduce the effect, since the other flows keep the link busy. This bandwidth reduction effect only happens with very few flows. I think TCP Reno will be worse than Cubic, due to its 50% reduction in cwind on drop vs Cubic's 20% reduction - but Cubic's RTT-independent increase in cwind after the drop may make the effect happen more often with larger RTTs. What results have you seen for codel on single flow for these larger RTTs?

You can easily do whatever experiments you like with off the shelf hardware and RTTs around half the planet to get the observations you need to confirm your thinking. Remember that a drop tail queue of various sizes has problems of its own.
I have a long overdue rant in progress of being wikified about how to use netem correctly to properly emulate any rtt you like.

I note that a main aqm goal is not maximum bandwidth utilization, but maximum bandwidth while still having working congestion avoidance and minimal queue depth, so other new flows can rapidly grab their fair share of the link. The bufferbloat problem was the result of wanting maximum bandwidth for single flows.

Indeed - with many TCP CC algorithms it's just not possible to achieve maximum bandwidth utilization with only 5ms induced latency when the RTTs are long, and a single queue (no FQ, only drop tail or single queue AQM). The multiplicative decrease part of TCP CC simply does not allow it unless the decrease is smaller than the queue (PRR might mitigate a little here). Now add in FQ and you can have the best of both worlds.

The theory is - with a Reno based CC the cwind gets cut in half on a drop. If the drop in cwind is greater than the number of packets in the queue, then the queue will empty out, and the link will then be idle for a flight + queue. When cwind gets cut by N packets, the sender stops sending data while ACKs for N data packets are received. If the queue has fewer than N data packets, then it will empty out, resulting in an idle link at that point, and eventually at the receiver (hence bandwidth loss).
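Simon's cwnd-cut-versus-queue argument above reduces to a one-liner: with Wmax = BDP + Q at overflow, Reno's halving drains the queue dry exactly when Q < BDP. A toy model (my sketch; it ignores PRR, pacing, and delayed acks):

```python
def reno_idles_after_drop(rate_bps, rtt_s, queue_pkts, mss_bytes=1500):
    # BDP in packets: what the pipe itself holds at this rate and RTT.
    bdp_pkts = rate_bps * rtt_s / (8 * mss_bytes)
    wmax = bdp_pkts + queue_pkts   # cwnd when the queue overflows
    cut = wmax / 2                 # Reno halves cwnd on loss
    # If the cut exceeds the queued backlog, the queue empties and the
    # link goes idle until the reduced window catches up again.
    return cut > queue_pkts        # equivalent to queue_pkts < bdp_pkts

# 10 Mbit/s at 100 ms RTT -> BDP ~83 packets: a 10-packet queue
# goes idle after a drop; a 200-packet (>BDP) queue does not.
print(reno_idles_after_drop(10e6, 0.1, 10))    # True
print(reno_idles_after_drop(10e6, 0.1, 200))   # False
```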
Re: [aqm] CoDel on high-speed links
On Mon, Jun 15, 2015 at 5:12 PM, Agarwal, Anil anil.agar...@viasat.com wrote:

> Dave, I guess I need to read up on cake.

some basic doc is at: http://www.bufferbloat.net/projects/codel/wiki/Cake

The most important thing in cake at the moment is GRO packet peeling, which turned out desperately needed in all the new router hardware we have encountered. Huge wins there. the other stuff is not fully baked or implemented yet. We are in a bit of a debate about the most troublesome and misunderstood aspect of codel on the list over there.

> If you have time, can you simulate an RTT of 600 ms?

ok.

> With a few queue drain rates from 1 Mbps to 100 Mbps.

10,50,100,200,300 was my planned range, fed by gigE. I can't go much faster than that. (and even as low as 300 requires GRO offloads)

> Would help us satellite folks get a better understanding of CoDel parameters.

I have not looked at the very long rtt problem in several years, and since then tcps in particular have changed muchly (pacing in particular).

> Thanks, Anil
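The target/interval rule of thumb from earlier in the thread is trivial to encode (a sketch; the 600 ms figure is the assumed sat-path RTT from this discussion):

```python
def codel_params_for_rtt(typical_rtt_ms):
    """Rule of thumb from the thread: set interval to the typical
    (worst-case-ish) RTT, and target to 5-10% of interval."""
    interval = typical_rtt_ms
    target_lo = interval * 5 / 100
    target_hi = interval * 10 / 100
    return interval, (target_lo, target_hi)

# Terrestrial default: interval 100 ms -> target 5-10 ms.
print(codel_params_for_rtt(100))
# Sat link: interval ~600 ms -> target 30-60 ms, as suggested above.
print(codel_params_for_rtt(600))
```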
Re: [aqm] CoDel on high-speed links
On Sun, Jun 14, 2015 at 4:10 PM, Simon Barber si...@superduper.net wrote:

> Indeed - I believe that Codel will drop too much to allow maximum bandwidth utilization, when there are very few flows, and RTT is significantly greater than target.

Interval. Not target. Interval defaults to 100ms. Target is 5ms. Dropping behaviors stop when the queue falls below the target.

Range of tests from near zero to 300ms RTT codel does quite well with reno, better with cubic, on single flows. 4 flows, better. fq_codel does better than that on more than X flows in general. You can easily do whatever experiments you like with off the shelf hardware and RTTs around half the planet to get the observations you need to confirm your thinking. Remember that a drop tail queue of various sizes has problems of its own. I have a long overdue rant in progress of being wikified about how to use netem correctly to properly emulate any rtt you like.

I note that a main aqm goal is not maximum bandwidth utilization, but maximum bandwidth while still having working congestion avoidance and minimal queue depth, so other new flows can rapidly grab their fair share of the link. The bufferbloat problem was the result of wanting maximum bandwidth for single flows.

> The theory is - with a Reno based CC the cwind gets cut in half on a drop. If the drop in cwind is greater than the number of packets in the queue, then the queue will empty out, and the link will then be idle for a flight + queue. If you want to keep the data flowing uninterrupted, then you must have a full unloaded RTT's worth of data in the queue at that point.

Do the experiment? Recently landed in flent is the ability to monitor queue depth while running another test.

> A drop will happen, the cwind will be halved (assuming a Reno TCP), and the sender will stop sending until one (unloaded) RTT's worth of data has been received. At that point the queue will just hit empty as the sender starts sending again.

And reno is dead.
Long live reno!

> Simon

On 6/9/2015 10:30 AM, Jonathan Morton wrote:

> Wouldn't that be a sign of dropping too much, in contrast to your previous post suggesting it wouldn't drop enough? In practice, statistical multiplexing works just fine with fq_codel, and you do in fact get more throughput with multiple flows in those cases where a single flow fails to reach adequate utilisation. Additionally, utilisation below 100% is really characteristic of Reno on any worthwhile AQM queue and significant RTT. Other TCPs, particularly CUBIC and Westwood+, do rather better.

> - Jonathan Morton
Re: [aqm] CoDel on high-speed links
to. Regards, Anil Agarwal ViaSat Inc. -Original Message- From: aqm [mailto:aqm-boun...@ietf.org] On Behalf Of Steven Blake Sent: Tuesday, June 09, 2015 4:40 PM To: Dave Taht Cc: aqm@ietf.org Subject: Re: [aqm] CoDel on high-speed links On Tue, 2015-06-09 at 12:44 -0700, Dave Taht wrote: The below makes several mis-characterisations of codel in the first place, and then attempts to reason from there. Hmmm... On Tue, Jun 9, 2015 at 9:11 AM, Steven Blake slbl...@petri-meat.com wrote: I have a question about how CoDel (as defined in draft-ietf-aqm-codel-01) behaves on high-speed (e.g., = 1 Gbps) links. If this has been discussed before, please just point me in the right direction. In the text below, I'm using drop to mean either packet discard/ECN mark. I'm using (instantaneous) drop frequency to mean the inverse of the interval between consecutive drops during a congestion epoch, measured in drops/sec. The control law for CoDel computes the next time to drop a packet, and is given as: t + interval/sqrt(count) where t is the current time, interval is a value roughly proportional to maximum RTT (recommended 100 msec), and count is cumulative number of drops during a congestion epoch. No. Count is just a variable to control the curve of the drop rate. It is not constantly incremented, either, it goes up and down based on how successful it is at controlling the flow(s), only incrementing while latency exceeds the target, decrementing slightly after it stays below the target. The time spent below the target is not accounted for, so you might have a high bang-bang drop rate retained, when something goes above from below. This subtlety is something people consistently miss and something I tried to elucidate in the first stanford talk. I specifically mentioned during a congestion epoch, but let me be more precise: count is continuously incremented during an extended period where latency exceeds the target (perhaps because CoDel isn't yet dropping hard enough). Correct? 
The fact that the drop frequency doesn't ramp down quickly when congestion is momentarily relieved is good, but doesn't help if it takes forever for the algorithm to ramp up to an effective drop frequency (i.e., something greater than 1 drop/flow/minute). It is not hard to see that drop frequency increases with sqrt(count). At the first drop, the frequency is 10 drops/sec; after 100 drops it is 100 drops/sec; after 1000 drops it is 316 drops/sec. On a 4 Mbps link serving say 1000 packets/sec (on average), CoDel immediately starts dropping 1% of packets and ramps up to ~10% after 100 drops (1.86 secs).

No, it will wait 100ms after stuff first exceeds the target, then progressively shoot harder based on the progress of the interval/sqrt(count).

Ok. At the first drop it is dropping at a rate of 1 packet/100 msec == 10 drops/sec and ramps up from there. At the 100th drop it is dropping at a rate of 100 msec/sqrt(100) == 1 packet/10 msec == 100 drops/sec. This just so happens to occur after 1.8 secs. Aside: as described, CoDel's drop frequency during a congestion epoch increases approximately linearly with time (at a rate of about 50 drops/sec^2 when interval = 100 msec).

Secondly, people have this tendency to measure full size packets, or a 1k average packet. The reality is a dynamic range of 64 bytes to 64k (gso/tso/gro offloads). So bytes is a far better proxy than packets in order to think about this properly. offloads of various sorts bulking up packet sizes has been a headache. I favor reducing mss on highly congested underbuffered links (and bob favors sub-packet windows) to keep the signal strength up. The original definition of packet (circa 1962) was 1000 bits, with up to 8 fragments. I do wish the materials that were the foundation of packet behavior were online somewhere...

I don't see how this has anything to do with the text of the draft or my questions. This seems like a reasonable range.
On a 10 GE link serving 2.5 MPPs on average, CoDel would only drop 0.013% of packets after 1000 drops (which would occur after 6.18 secs).

I am allergic to averages as a statistic in the network measurement case.

This doesn't seem to be very effective. It's possible to reduce interval to ramp up drop frequency more quickly, but that is counter-intuitive because interval should be roughly proportional to maximum RTT, which is link-speed independent.

Except that tcps drop their rates by (typically) half on a drop, and it is a matter of debate as to when on CE.

Ex/ 10 GE link, ~10K flows (average). During a congestion epoch, CoDel with interval = 100 msec starts dropping 257 packets/sec after 5 secs. How many flows is that effectively managing? Unless I am mistaken, it appears that the control law should be normalized in some way to average packet rate. On a high-speed link, it might be common to drop multiple packets per-msec, so it also isn't clear to me whether the drop frequency needs to be recalculated on every drop, or whether it could be recalculated over a shorter interval (e.g., 5 msec).
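The numbers in this exchange check out against the control law t + interval/sqrt(count) (a sketch of the drop schedule only, not the full codel state machine with its count decay):

```python
import math

INTERVAL = 0.100  # seconds, the recommended default

def drop_frequency(count):
    # The next drop is scheduled at t + INTERVAL/sqrt(count), so the
    # instantaneous drop frequency is sqrt(count)/INTERVAL.
    return math.sqrt(count) / INTERVAL

print(drop_frequency(1))     # ~10 drops/sec at the first drop
print(drop_frequency(100))   # ~100 drops/sec at the 100th drop
print(drop_frequency(1000))  # ~316 drops/sec at the 1000th drop

# Time of the 100th drop: sum of INTERVAL/sqrt(k) for k = 1..100,
# which lands right on the 1.86 secs quoted above.
t_100 = sum(INTERVAL / math.sqrt(k) for k in range(1, 101))
print(f"{t_100:.2f}s")       # 1.86s
```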
Re: [aqm] CoDel on high-speed links
On Tue, Jun 9, 2015 at 10:14 AM, Simon Barber si...@superduper.net wrote:

> My concern with fq_codel is that by putting single flows into single Codel instances you hit the problem with Codel where it limits bandwidth on higher RTT paths.

I recently did a bit of work, testing rtt_fairness from my location (los gatos, california) to linodes in london, dallas, tokyo, and newark, at RTTs of roughly 145, 45, 115, and 85ms. The servers are all using sch_fq and a modern linux (on a vm). There is a rangeley box in between running the sqm scripts that lets me test pie, codel, fq_codel, cake, etc.

On the long path, with pie, the download rate was generally higher than on the shorter paths, which was kind of interesting and would bear a repeated look. Codel, more even, and fq_codel was very even across all rtts.

http://snapon.lab.bufferbloat.net/~d/qdisc-stats2/download_comparison.png
http://snapon.lab.bufferbloat.net/~d/qdisc-stats2/upload_comparison.png

(rawer data is in that dir, or you can get it all via http://snapon.lab.bufferbloat.net/~d/qdisc-stats2.tgz - toke is also working on getting buffering, drop, and delay measurements; some of those and preliminary plot types are in there also. pull flent from git.)

the rtt_fair4be dataset is noisy (and limited to my local connection speed of 70/10mbits). If anyone would like access to these servers for more extensive testing, I still have quite a few more gigabits to use up, and no time to use them. Contact me offlist for access.

> Simon

On June 9, 2015 9:32:15 AM Jonathan Morton chromati...@gmail.com wrote:

> On 9 Jun, 2015, at 19:11, Steven Blake slbl...@petri-meat.com wrote: On a 10 GE link serving 2.5 MPPs on average, CoDel would only drop 0.013% of packets after 1000 drops (which would occur after 6.18 secs). This doesn't seem to be very effective.

> Question: have you worked out what drop rate is required to achieve control of a TCP at that speed?
There are well-known formulae for standard TCPs, particularly Reno. You might be surprised by the result. Fundamentally, Codel operates on the principle that one mark/drop per RTT per flow is sufficient to control a TCP, or a flow which behaves like a TCP; *not* a particular percentage of packets. This is because TCPs are generally required to perform multiplicative decrease upon a *single* congestion event. The increasing count over time is meant to adapt to higher flow counts and lower RTTs. Other types of flows tend to be sparse and unresponsive in general, and must be controlled using some harder mechanism if necessary. One such mechanism is to combine Codel with an FQ system, which is exactly what fq_codel in Linux does. Fq_codel has been tested successfully at 10Gbps. Codel then operates separately for each flow, and unresponsive flows are isolated. - Jonathan Morton
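Jonathan's "you might be surprised" can be made concrete with the classic Reno steady-state (Mathis) model, rate ≈ (MSS/RTT) * sqrt(3/2) / sqrt(p) - a sketch only; CUBIC and modern paced stacks behave differently:

```python
def reno_drop_prob(rate_bps, rtt_s, mss_bytes=1448):
    # Invert the Mathis model: rate = (MSS/RTT) * sqrt(3/2) / sqrt(p)
    # => p = 1.5 * (MSS / (RTT * rate))^2, with rate in bytes/sec.
    rate_Bps = rate_bps / 8
    return 1.5 * (mss_bytes / (rtt_s * rate_Bps)) ** 2

# A single Reno flow filling 10 GigE at 100 ms RTT needs a loss
# probability around 2e-10: roughly one drop per 5 billion packets,
# which is why percentage-based intuitions mislead at these speeds.
p = reno_drop_prob(10e9, 0.100)
print(f"p ~ {p:.2e}")
```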
Re: [aqm] CoDel on high-speed links
The below makes several mis-characterisations of codel in the first place, and then attempts to reason from there. On Tue, Jun 9, 2015 at 9:11 AM, Steven Blake slbl...@petri-meat.com wrote: I have a question about how CoDel (as defined in draft-ietf-aqm-codel-01) behaves on high-speed (e.g., = 1 Gbps) links. If this has been discussed before, please just point me in the right direction. In the text below, I'm using drop to mean either packet discard/ECN mark. I'm using (instantaneous) drop frequency to mean the inverse of the interval between consecutive drops during a congestion epoch, measured in drops/sec. The control law for CoDel computes the next time to drop a packet, and is given as: t + interval/sqrt(count) where t is the current time, interval is a value roughly proportional to maximum RTT (recommended 100 msec), and count is cumulative number of drops during a congestion epoch. No. Count is just a variable to control the curve of the drop rate. It is not constantly incremented, either, it goes up and down based on how successful it is at controlling the flow(s), only incrementing while latency exceeds the target, decrementing slightly after it stays below the target. The time spent below the target is not accounted for, so you might have a high bang-bang drop rate retained, when something goes above from below. This subtlety is something people consistently miss and something I tried to elucidate in the first stanford talk. It is not hard to see that drop frequency increases with sqrt(count). At the first drop, the frequency is 10 drop/sec; after 100 drops it is 100 drops/sec; after 1000 drops it is 316 drops/sec. On a 4 Mbps link serving say 1000 packets/sec (on average), CoDel immediately starts dropping 1% of packets and ramps up to ~10% after 100 drops (1.86 secs). No it will wait 100ms after stuff first exceeds the target, then progressively shoot harder based on the progress of the interval/sqrt(count). 
Secondly, people have this tendency to measure full size packets, or a 1k average packet. The reality is a dynamic range of 64 bytes to 64k (gso/tso/gro offloads). So bytes is a far better proxy than packets in order to think about this properly. offloads of various sorts bulking up packet sizes has been a headache. I favor reducing mss on highly congested underbuffered links (and bob favors sub-packet windows) to keep the signal strength up. The original definition of packet (circa 1962) was 1000 bits, with up to 8 fragments. I do wish the materials that were the foundation of packet behavior were online somewhere...

This seems like a reasonable range. On a 10 GE link serving 2.5 MPPs on average, CoDel would only drop 0.013% of packets after 1000 drops (which would occur after 6.18 secs).

I am allergic to averages as a statistic in the network measurement case.

This doesn't seem to be very effective. It's possible to reduce interval to ramp up drop frequency more quickly, but that is counter-intuitive because interval should be roughly proportional to maximum RTT, which is link-speed independent.

Except that tcps drop their rates by (typically) half on a drop, and it is a matter of debate as to when on CE.

Unless I am mistaken, it appears that the control law should be normalized in some way to average packet rate. On a high-speed link, it might be common to drop multiple packets per-msec, so it also isn't clear to me whether the drop frequency needs to be recalculated on every drop, or whether it could be recalculated over a shorter interval (e.g., 5 msec).

Pie took the approach of sampling, setting a rate for shooting, over a 16ms interval. That's pretty huge, but also low cost in some hardware. Codel's timestamp per-packet control law is continuous (but you do need to have a cheap packet timestamping ability). Certainly in all cases more work is needed to address the problems 100gbps rates have in general, and it is not just all queue theory!
A small packet is .62 *ns* in that regime. A benefit of fq in this case is that you can parallelize fib table lookups across multiple processors/caches, and of fq_codel is that all codels operate independently.

> Regards, // Steve
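For scale at those rates, a back-of-envelope serialization-time calculation (my sketch, assuming a minimum 64-byte frame plus the 20 bytes of on-wire preamble and inter-frame gap):

```python
def serialization_ns(frame_bytes, rate_bps, overhead_bytes=20):
    # Wire time for one frame, including Ethernet preamble + IFG.
    return (frame_bytes + overhead_bytes) * 8 / rate_bps * 1e9

# At 100 Gbit/s a minimum frame occupies the wire for only a handful
# of nanoseconds, and a full-size frame for ~122 ns.
print(serialization_ns(64, 100e9))    # ~6.7 ns
print(serialization_ns(1500, 100e9))  # ~121.6 ns
```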
Re: [aqm] FQ-PIE kernel module implementation
On Thu, Jun 4, 2015 at 3:06 PM, Hironori Okano -X (hokano - AAP3 INC at Cisco) hok...@cisco.com wrote:

> Hi all, I'm Hironori Okano, Fred's intern. I'd like to let you know that I have implemented FQ-PIE as a linux kernel module, "fq-pie", along with iproute2 support for fq-pie. This was done in collaboration with others at Cisco including Fred Baker, Rong Pan, Bill Ver Steeg, and Preethi Natarajan. The source code is in my github repositories; I attached the patch file "fq-pie_patch.tar.gz" to this email also. I'm using the latest linux kernel (git.kernel.org/pub/scm/linux/kernel/git/next/linux-next.git).
>
> fq-pie kernel module: https://github.com/hironoriokano/fq-pie.git
> iproute2 for fq-pie: https://github.com/hironoriokano/iproute2_fq-pie.git
>
> If you have any comments, please reach out to me. Best regards,

Very cool. I have been building this as part of my testbed for some time now with some very impressive results. I will update my openwrt tree to pull from yours (if possible; openwrt is still largely linux-3.18 based, otherwise I might have to slip in some backport code)

https://github.com/dtaht/ceropackages-3.10/tree/master/net/kmod-sched-fq_pie

thanks for such a cool and interesting qdisc!

> Hironori Okano hok...@cisco.com
[aqm] big science discovers sch_fq and pacing
https://fasterdata.es.net/host-tuning/linux/fair-queuing-scheduler/
Re: [aqm] AQM hurts utilization with a single TCP stream?
On Fri, May 22, 2015 at 11:42 PM, Jonathan Morton chromati...@gmail.com wrote:

> In practice, I haven't noticed any loss of throughput due to using Codel on 100ms+ RTTs. Probably most servers now use CUBIC, which contributes to that impression.

There are only slight differences between these tcps (and everybody just uses multiple flows to do stuff anyway).

> Using ECN rather than tail drops also makes the delivery smoother.

I can try some longer rtts than this (10,70). this was against the latest cake on linux 4.1rc3. http://snapon.lab.bufferbloat.net/~cero3/renovscubic.tgz (or the dir)

> There is a flaw in your analysis. Codel only starts dropping (or marking) when the sojourn time has remained above target (5ms) for an entire interval (100ms), during which time the cwnd is still growing. Thus the peak queue occupancy is more than 5ms.

yes. people keep missing the wait-for-an-interval thing in codel.

> It is however probably fair to say that a single Reno flow does lose some throughput under AQM.

very clear plots of reno's classic sawtooth behavior vs cubic's:

http://snapon.lab.bufferbloat.net/~cero3/renovscubic/reno.png
http://snapon.lab.bufferbloat.net/~cero3/renovscubic/cubic.png

On a 200ms rtt things look more interesting, but aqm hardly enters into it. Reno is simply less efficient than cubic, period. But I'll leave it to you to generate your own plots if you want, out of the netperf-eu dataset above. (it would be nice to be able to generate directly comparable plots vs cc algo types; presently you can't combine these data sets in flent)

> - Jonathan Morton

-- Dave Täht Open Networking needs **Open Source Hardware** https://plus.google.com/u/0/+EricRaymond/posts/JqxCe2pFr67
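The single-flow throughput loss being debated here shows up even in a toy per-RTT fluid model of Reno (my sketch: fixed RTT, no slow start, no idle-period modeling, loss on queue overflow at BDP + Q packets):

```python
def reno_utilization(bdp_pkts, queue_cap_pkts, rtts=20000):
    """Fraction of link capacity a lone Reno-like flow achieves when
    the bottleneck queue overflows at bdp + queue_cap packets."""
    cwnd = max(1, bdp_pkts // 2)
    delivered = 0
    for _ in range(rtts):
        delivered += min(cwnd, bdp_pkts)       # link drains bdp per RTT
        if cwnd > bdp_pkts + queue_cap_pkts:   # queue overflow -> loss
            cwnd = max(1, cwnd // 2)           # multiplicative decrease
        else:
            cwnd += 1                          # additive increase
    return delivered / (rtts * bdp_pkts)

# A queue of a full BDP keeps the pipe full; a near-empty queue costs
# a Reno flow roughly a fifth of the link.
print(reno_utilization(100, 100))  # ~1.0
print(reno_utilization(100, 5))    # ~0.79
```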
Re: [aqm] I-D Action: draft-ietf-aqm-recommendation-04.txt
On Thu, May 21, 2015 at 10:18 PM, Simon Barber si...@superduper.net wrote: On 5/18/2015 10:00 AM, Dave Taht wrote: LEDBAT was probably my first concern and area of research before entering this project full time. I *knew* we were going to break ledbat, but the two questions were: how badly? (ans: pretty badly) and did it matter? (not that much, compared to saving seconds overall in induced delay) LEDBAT is about more than just reducing the delay caused by the stream - it's also about the bandwidth impact. AQM solves the delay situation, but breaks the bandwidth reduction that LEDBAT can achieve today when other traffic is present. In pointing out that paper I have to stress that their good ledbat result (read the text around table 3) was "look! it's scavenging!" And mine was: "with over 7 seconds of inherent delay on the link!" Revisiting the data sets with reasonable amounts of buffering on the link, a correctly functioning tcp stack, and a few other variables more under control would be good... (much as I pushed for dctcp to be looked at once real patches landed for it) ... as would investigating actual behavior of ledbat on real links with aqm and fq technologies on them. While I poked into it quite a lot, I did not do much more rigorous than observe that web traffic worked a lot better when torrent was present in a fq/aqm'd environment, and that cubic outcompeted it slightly, generally. There was supposed to be someone else updating the tcp_ledbat kernel module we used, but that never got fixed, and it is in dire need of an update since the change to usec from msec and other major tcp modifications in the linux kernel. While we have long recommended CS1 be set on torrent, it turns out that a lot of gear actually prioritizes that over BE, still. It helps on the outbound, where you can still control your dscp settings. Many torrent users have reported just setting their stuff to max outbound and rate limiting inbound, and observing no real effects on their link.
Do you have examples of the gear that prioritizes CS1 over best effort? How often have you seen it? Did you see it in places where it would be important? Yes; a lot; and yes. More details I can do later. Simon
Re: [aqm] I-D Action: draft-ietf-aqm-recommendation-04.txt
On Mon, May 18, 2015 at 8:54 AM, Jonathan Morton chromati...@gmail.com wrote: On 18 May, 2015, at 18:27, Simon Barber si...@superduper.net wrote: Apparently a significant chunk of bittorrent traffic and Windows updates use these techniques to deprioritise their traffic. Widespread adoption of AQM will remove their ability to avoid impacting the network at peak times. Use of DSCP could be one way to mitigate this problem with AQM, and this merits further study. I'm working on a comprehensive algorithm (including AQM, FQ and Diffserv support and a shaper in one neat package) which does address this problem, or at least provides a platform for doing so. Some information here: http://www.bufferbloat.net/projects/codel/wiki/Cake This is partially an outgrowth of some of the ideas and problems I attempted to discuss at ietf90. https://www.ietf.org/proceedings/90/slides/slides-90-aqm-6.pdf Since then various other working groups (like dart) have attempted to answer some of the same questions. I am pretty convinced (now) that inbound policing on cpe can be improved to better fool dumb upstream rate limiters (like those in cmtses), but I haven't got around to doing the work (it's called bobbie). The biggest problem we have with applying a shaper + fq/aqm algorithm to inbound traffic on a link that is already being dumbly rate limited is that a burst can back up in the upstream cmts and stay backed up - a rate differential of 90 to 100 takes a long time for an aqm to bring under control. Analysis of smoothness might also help. When the ratios are 10s or 1000s to 1 and there is only one bottleneck link, we do better. This is working code, albeit still under development. I'm actively dogfooding it, and I'm not the only one doing so. Pushing it into openwrt soon, we hope. As it stands cake is a win across the board on cpu cost and fairness, it does saner things with ecn, and so on...
We have discussed a few more advanced ideas that are not currently in cake on the cake mailing list, including better coupling between flows, more rapid response to overload, etc. The Diffserv layer provides a four-class system by default, corresponding in principle with the 802.1p classes - background, best-effort, video and voice. It does not inherit the naive mapping from DSCPs to those classes, though - only CS1 (001000) is mapped to the background class. I see a ton of traffic remarked to CS1 from comcast. Others may be more lucky. Since dart I have basically come to the conclusion that we need at least one new diffserv priority class for scavenging traffic. An important part of the Diffserv support in Cake is that the enhanced priority given to the video and voice classes applies only up to given shares of the overall bandwidth. If traffic in those classes exceeds that allocated share, deprioritisation occurs. This ensures that improperly marked traffic cannot starve the link, and attempts to incentivise correct marking. - Jonathan Morton
Re: [aqm] I-D Action: draft-ietf-aqm-recommendation-04.txt
On Mon, May 18, 2015 at 8:27 AM, Simon Barber si...@superduper.net wrote: Thank you Mikael, these are useful observations about the choice of exact DSCP value and various potential impacts. I agree that ultimately without operator agreement none of this matters. I do think that an important step towards garnering that operator agreement is to have the concerns clearly elucidated in this group's recommendations. I found a study of the interaction between low priority background techniques, including LEDBAT and AQM. http://www.enst.fr/~drossi/paper/rossi12conext.pdf That paper was continually extended and revised. (I have had very little to do with it since the first release.) http://perso.telecom-paristech.fr/~drossi/paper/rossi14comnet-b.pdf While it is pretty good... my favorite part of that paper is table 3, where the authors ignore the 7 second delay on the link but otherwise show the optimal ratio between real tcp and utp in their testbed. LEDBAT was probably my first concern and area of research before entering this project full time. I *knew* we were going to break ledbat, but the two questions were: how badly? (ans: pretty badly) and did it matter? (not that much, compared to saving seconds overall in induced delay) Its conclusion states: Shortly, our investigation confirms the negative interference: while AQM fixes the bufferbloat, it destroys the relative priority among Cc protocols. Yep. I do wish the paper was updated to account for 4 concepts: 0) never got around to trying ns2/ns3 fq_codel or the sqm_scripts against it 1) utp has a lower IW. With the move to IW10 in linux, tcp knocks utp more out of the way (note that a ton of torrent clients still use tcp and thus they are getting an advantage now by using iw10 that they shouldn't be). Anyway, most web traffic knocks utp out of the way handily 2) ledbat when first proposed had a 25ms target for induced delay. I would not mind that tried again.
3) coupled congestion control (one app, many flows) Apparently a significant chunk of bittorrent traffic and Windows updates use these techniques to deprioritise their traffic. So, torrent and ledbat are different things. Torrent has LOTS of flows (worst case 6 active per torrent, 50 or more connected, and switching one into an active state every 15 seconds). Ledbat is just a cc algorithm that torrent and some other heavy apps use. Widespread adoption of AQM will remove their ability to avoid impacting the network at peak times. No. A single ledbat flow will behave like a single tcp flow. Widespread adoption of AQM will make it easier for many flows to share the network with low latency. I don't see any impact from continued use of ledbat for applications like updates, backups, etc. My own recommendation is merely to try torrent today with your aqm or fq system of choice and see what happens. I did, and stopped worrying about ledbat. Use of DSCP could be one way to mitigate this problem with AQM, and this merits further study. While we have long recommended CS1 be set on torrent, it turns out that a lot of gear actually prioritizes that over BE, still. It helps on the outbound, where you can still control your dscp settings. Many torrent users have reported just setting their stuff to max outbound and rate limiting inbound, and observing no real effects on their link. Simon On May 13, 2015 1:47:33 AM Mikael Abrahamsson swm...@swm.pp.se wrote: On Tue, 12 May 2015, Simon Barber wrote: Hi John, Where would be the best place to see if it would be possible to get agreement on a global low priority DSCP? Currently the general assumption among ISPs is that DSCP should be zeroed between ISPs unless there is a commercial agreement saying that it shouldn't. This is generally accepted (there are NANOG mailing list threads on several occasions in the past 5-10 years where this was the outcome).
The problem is quite complex if you actually want things to act on this DSCP value, as there are devices whose default behaviour is 4-queue 802.1p, where queues 1 and 2 (which will match AF1x and AF2x) have lower priority than 0 and 3 (BE and AF3x), while people doing DSCP-based forwarding usually do things the other way around. It might be possible to get the last DSCP bits to map into this, because to DSCP-ignorant equipment that only looks at CSx (precedence) this would still be standard BE, while DSCP-aware gear could treat it as lower than 000000. So DSCP 000110 (high drop BE) might work, because it's incremental. Possibly DSCP 10 (low drop BE) might be able to get some agreement, because it doesn't really cause any problems in existing networks (most likely) and it could be enabled incrementally. I would suggest bringing this kind of proposal to operator organizations and the IETF. It needs to get sold to the ISPs mostly, because in this aspect the IETF decision will mostly be empty
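For what it's worth, the precedence arithmetic behind Mikael's point can be sketched like this (an illustrative helper of my own naming, not anything from the thread):

```python
# Sketch of why DSCP 000110 is "safe" for precedence-only gear: such
# equipment reads only the top 3 bits of the 6-bit DSCP (the legacy IP
# precedence / CSx bits), so any low-order-bit codepoint still looks
# like best effort (precedence 0).

def dscp_precedence(dscp):
    """Top 3 bits of the 6-bit DSCP = legacy IP precedence (CSx)."""
    return dscp >> 3

# e.g. the proposed "high drop BE" 000110 keeps precedence 0,
# whereas CS1 (001000) reads as precedence 1.
```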
Re: [aqm] I-D Action: draft-ietf-aqm-recommendation-04.txt
On Tue, May 12, 2015 at 9:17 PM, Simon Barber si...@superduper.net wrote: Hi Wesley, Thanks for considering my comments, and apologies for being so late in the process - I've only recently been able to put time into this area, and I understand it may be too late in the process to hack things in. I replied to John with where I'm concerned with the current -11 text. I am glad you are able to put time in, you have been a long way away. Re: background / low priority streams. There are other ways to achieve a 'lower priority', such as changing the AIMD parameters. Does not help if FQ is involved though. There are many ways to do lower priority streams if fq is present. Simplest: 1) Send 3 packets back to back, timestamped. The first packet arrives in an empty queue and gets sent out immediately; the 2nd and 3rd packets are affected by the total number of flows extant (fq_codel) (or, in SFQ, all are affected by the total number of flows). Keep that to 1/2 OWD (or less) plus fuzz/smoothing and you have a solution for how much additional load you are willing to add to the network. 2) For coupled congestion control on, say, 6 flows from one app, do the same sort of bunching and measure, then drop off when one or more of the flows experiences excessive delay. In both cases the timestamps would be received differently, and in order, via pure aqm or drop tail most of the time. In other words, it is relatively easy to get low priority in an fq'd system. It is harder to get to an optimal bandwidth while still staying low priority, and somewhat hard to figure out if you are being fq'd in the first place. My concern is that implementing AQM removes a capability from the network, so doing so without providing a mechanism to support low priority is a negative for certain applications (backups, updates - and the impact these have on other applications). Would be good for this to be at least common knowledge. Is there any other document this could go in? see dart.
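A rough sketch of probe idea 1) above, under the assumption of DRR-style per-flow FQ at the bottleneck spacing back-to-back packets by roughly one quantum per competing flow. The helper name and its calibration input (`pkt_time`) are hypothetical, purely to illustrate the measurement:

```python
# Sketch: under per-flow FQ, three back-to-back probe packets come out
# of the bottleneck separated by roughly one quantum's serialization
# time per competing flow, so the receive-side gaps estimate how many
# flows share the link (and hence how much load a scavenger may add).

def estimate_competing_flows(recv_times, pkt_time):
    """recv_times: arrival times of 3 back-to-back probes;
    pkt_time: serialization time of one quantum at the bottleneck."""
    gaps = [recv_times[i + 1] - recv_times[i] for i in range(2)]
    avg_gap = sum(gaps) / len(gaps)
    return max(1, round(avg_gap / pkt_time))
```

With pure AQM or drop tail the probes would instead arrive nearly back to back, which is also how a sender might detect whether it is being fq'd at all.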
Simon On 5/12/2015 5:11 PM, Wesley Eddy wrote: On 5/8/2015 11:42 PM, Simon Barber wrote: I have a couple of concerns with the recommendations of this document as they stand. Firstly - implementing AQM widely will reduce or even possibly completely remove the ability to use delay based congestion control in order to provide a low priority or background service. I think there should be a recommendation that if you are implementing AQM then you should also implement a low priority service using DSCP, e.g. CS1. This will enable these low priority applications to continue to work in an environment where AQM is increasingly deployed. Unlike DSCPs that give higher priority access to the network, a background or low priority DSCP is not going to be gamed to get better service! Secondly, there is a recommendation that AQM be implemented both within classes of service, and across all classes of service. This does not make sense. If you are implementing AQM across multiple classes of service, then you are making marks or drops while ignoring what class the data belongs to. This destroys the very unfairness that you wanted to achieve by implementing the classes in the first place. Hi Simon, thanks for your comments. These comments appear to be in response to version -04 of the document, from around 1 year ago. The document is currently on version -11, has passed working group last call and IESG evaluation, and is in the RFC Editor's queue. I mention this, because it isn't clear to me how applicable your comments are with regard to the current copy. The current copy can be found at: https://datatracker.ietf.org/doc/draft-ietf-aqm-recommendation/ The current revision does mention the impact to delay-based end-host algorithms as an area for future research. While I agree that in a lot of cases it seems like logically a good idea to have a DiffServ configuration like you mention, I don't think we have seen data on this yet in the working group.
Looking into this could be part of that mentioned future work, though not something I'd want to see hacked into this document today, so late in its publication process.
Re: [aqm] splat start?
On Mon, May 11, 2015 at 5:26 AM, Mirja Kühlewind mirja.kuehlew...@tik.ee.ethz.ch wrote: Hi Dave, Michael Scharf did his PhD thesis on startup mechanisms. Here is one of his papers (from 2009): Am I the only person that works in spreadsheets to model stuff? :( Scharf, M.: Work in Progress: Performance Evaluation of Fast Startup Congestion Control Schemes Proceedings of the 8th IFIP-TC6 Networking Conference (Networking 2009), Lecture Notes in Computer Science (LNCS) 5550, Aachen, May 2009 Thank you. I did not know that work had also fed back into RFC6928. https://www.bell-labs.com/researchers/537/ Thesis here, but I will need a spare weekend to read it: http://www.ikr.uni-stuttgart.de/Content/Publications/Archive/Sf_Diss_40112.pdf I guess he can further comment on his own (cc'ed). Mirja On 10.05.2015 at 04:18, Dave Taht dave.t...@gmail.com wrote: One of the things bugging me lately is that we actually have a lot of forms of slow start on the table - HyStart, Initial Spreading, reno vs cubic, dctcp, IW2, IW4, IW10, TSO offloads, the effect of GRO on it, etc. I don't know what is in QUIC, either. I would love a comprehensive guide to exactly the behaviors of slow start in every tcp known to man, and some sane way to refer to them all in a cross reference and a spreadsheet. Does something like that exist? Just the * start behavior. The world has spent way too much time analyzing congestion avoidance mode. -- Dave Täht
[aqm] splat start?
One of the things bugging me lately is that we actually have a lot of forms of slow start on the table - HyStart, Initial Spreading, reno vs cubic, dctcp, IW2, IW4, IW10, TSO offloads, the effect of GRO on it, etc. I don't know what is in QUIC, either. I would love a comprehensive guide to exactly the behaviors of slow start in every tcp known to man, and some sane way to refer to them all in a cross reference and a spreadsheet. Does something like that exist? Just the * start behavior. The world has spent way too much time analyzing congestion avoidance mode. -- Dave Täht
Re: [aqm] ECN AQM parameters
On Sat, May 9, 2015 at 10:20 AM, Bob Briscoe bob.bris...@bt.com wrote: Dave, As promised, here's my thoughts on what PIE (and CoDel) should do when ECN is enabled. There's also new info in here that I think is important: CoDel uses an RTT estimate in two different places. One has to be the max expected RTT, the other should be the (harmonic) mean of expected RTTs. I like the harmonic idea a lot, it ties in with some of my packet pair thinking on wifi aggregation. The former might be 100ms, but the latter is more likely to be 15-20ms, given most traffic in the developed world these days comes from CDNs. This could make a significant difference to performance. Bob Date: Tue, 14 Apr 2015 19:59:47 +0100 To: Fred Baker (fred) f...@cisco.com From: Bob Briscoe bob.bris...@bt.com Subject: ECN AQM parameters (was: AQM Recommendation: last minute change?) Cc: Gorry Fairhurst go...@erg.abdn.ac.uk, Richard Scheffenegger r...@netapp.com, Eddy Wesley M. [VZ] wesley.m.e...@nasa.gov, aqm-...@ietf.org aqm-...@ietf.org Fred, At 22:27 13/04/2015, Fred Baker (fred) wrote: I think we have a pregnant statement in this case. What parameters do you have in mind? The point was simply to ensure that implementers provide sufficient flexibility so that /any or all/ of the AQM parameters for ECN traffic could be separate instances from those for drop. But they would still apply to the same queue, much like the different RED curves for different traffic classes in the WRED algo. With RED, the parameters available to change are min-threshold, max-threshold, the limit mark/drop rate, and (IIRC) the minimum inter-mark/drop interval. ...and, importantly, the EWMA constant, which is the main parameter I would change for ECN (for ECN, set ewma-const = 0, assuming the Cisco definition of ewma-const where EWMA weight = 2^{ewma-const}; so for ECN, EWMA weight = 2^0 = 1). See also {Note 1} about inter-mark/drop interval. With PIE, the equation is p = p + alpha*(est_del-target_del) + beta*(est_del-est_del_old).
meaning that we can meaningfully tune alpha, beta, and target_del, and there is an additional 'max_burst' parameter. Yes. Strictly, the min data in the queue before PIE measures the rate ('dq_threshold') is a parameter as well. With Codel, if I understand section 4, the only parameters are a round-trip time metric (100 ms by default) and the setpoint, which they set to 5 ms based on it being 5-10% of the RTT. If it's not the target delay, which is essentially what Codel's setpoint is, I'm not sure what parameter you want to change. There are actually two more hidden parameters in CoDel's control law, which is written in the pseudocode as: t + interval / sqrt(count) but ought to have been written as: t + rtt_ave / (count)^b These parameters have been hard-coded as rtt_ave = interval and b = 1/2. 'rtt_ave' in the control law is better set to a likely /average/ RTT, whereas the interval used to determine whether to enter dropping mode is better set to a likely /maximum/ RTT. To implement an ECN variant of CoDel I would set interval = 0 (or very close to zero) and I would leave 'rtt_ave' (in the control law) as an average RTT, decoupled from 'interval'. However, the CoDel control law was designed assuming it will remove packets from the queue, so I'm not convinced that any naive approach for implementing ECN will work. I suspect a CoDel-ECN doesn't just need different parameters, it needs a different algo. I have no interest in solving this problem, because I wouldn't start from CoDel in the first place - I would never design an AQM that switches between discrete modes, and CoDel's control law assumes that the e2e congestion control is NewReno, which contravenes our AQM recommendations anyway. In saying 'in this case you might want to mess with the parameters', I'm not sure what parameters are under discussion, and in any event we're talking about the document that says 'we should have an algorithm', not the discussion of any of them in particular.
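For reference, the two formulas under discussion, sketched in floating point (the gains and target values here are illustrative, not the drafts' mandated defaults, and this is nothing like the fixed-point kernel code):

```python
import math

ALPHA, BETA = 0.125, 1.25  # illustrative PIE gains
TARGET_DEL = 0.015         # illustrative target delay (15 ms)

def pie_update(p, est_del, est_del_old):
    """One Tupdate step of PIE's drop probability:
    p += alpha*(est_del - target_del) + beta*(est_del - est_del_old),
    clamped to a valid probability."""
    p += ALPHA * (est_del - TARGET_DEL) + BETA * (est_del - est_del_old)
    return min(max(p, 0.0), 1.0)

def codel_next_drop(t, count, interval=0.100):
    """CoDel's control law as written in the pseudocode:
    next drop time = t + interval / sqrt(count)."""
    return t + interval / math.sqrt(count)
```

The beta term acts on the trend (delay rising or falling), which is what lets PIE react before the queue settles; Bob's point is that CoDel's `interval` silently plays two roles (dropping-mode trigger and control-law scale) that arguably deserve separate values.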
To my mind, this begs for a new draft on your part. Certainly. We're still doing the research and evaluation tho (see www.ietf.org/proceedings/92/slides/slides-92-iccrg-5.pdf - I don't remember whether you were in the room for that). But, yes, we will write it up. So far it's not based on RED, PIE or CoDel, but a new drop-based AQM that is most similar to RED but with only 2 parameters (not 4). This is because we needed the drop probability to be the square of the marking probability. So it made the implementation really simple to use a square curve through the origin for drop. It doesn't need min_thresh, because the square curve near-enough runs along the axis when it is close to the origin. For the square curve we used a probability trick - we merely had to compare the queue delay with the max of two random numbers. RED (especially gentle RED) can be thought
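The "max of two random numbers" trick mentioned above can be sketched as follows (my own illustration of the squaring property, not their implementation):

```python
# Sketch of the probability trick: if marking is decided by
# (uniform() < p), then dropping only when BOTH of two uniform draws
# are below p -- i.e. max(u1, u2) < p -- happens with probability p^2.
# That yields the square curve for drop with no extra arithmetic.

import random

def mark(p):
    """ECN mark with probability p."""
    return random.random() < p

def drop(p):
    """Drop with probability p^2: compare against the max of two draws."""
    return max(random.random(), random.random()) < p
```

Comparing the queue delay against the max of two random thresholds is the same idea with the comparison inverted, which is presumably why their implementation needs neither a min_thresh nor a multiply.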
Re: [aqm] ECN AQM parameters
I am looking over the rest of your email. It is a lot to absorb... but: I have no interest in solving this problem, because I wouldn't start from CoDel in the first place - I would never design an AQM that switches between discrete modes, and CoDel's control law assumes that the e2e congestion control is NewReno which contravenes our AQM recommendations anyway. 0) I too have trouble... particularly with the decay of codel as it stands. 1) I note we generally got better results from cubic than reno. But: whatever people call cubic is not what has been in linux, and is certainly not what is in it now after umpteen revisions and bug fixes over the past 4 years. It is not what is in QUIC, and it is not how tcp with sch_fq behaves, and I have tried to document each major change to tcp in every talk I give, with things like hystart being modified, etc. As a baseline reference, reno was useless 6+ years ago. Tracking the continuous changes and bugfixes to linux tcp and the driver subsystems over the past 4 years has been one of my biggest headaches. I sometimes wish I was tracking something stable and obsolete like windows or ns2. 2) It is good to know your honest opinion of why you would not start with codel as a base for an AQM, and also good to know your lines of inquiry. 3) I really hope that it is clear to everyone that my own main objective is to fix bufferbloat, and I really don't care what algorithms we use to do that - the parts that I care about are getting good data, working clean code, stuff that won't break the internet, clear rfcs, and all the solutions out there before the heat death of the universe. Obviously I think highly of fq as a big means to get there in many circumstances.
I got really good results from fq_pie btw, and published them in a dataset that I thought others would find interesting (apparently nobody looks at my data sets, no matter how fast you can fly through them now with netperf-wrapper): http://snapon.lab.bufferbloat.net/~d/cake3-fixed/baseline.png In the next year I hope to finally buckle down to writing some papers with all the needed math and results while we ramp up on the very difficult problems wifi represents... but now, damn it, I have to go re-run thousands of tests with whatever version of pie emerges from your analysis and the new draft (pie v8 as far as I am concerned - I have been tracking the code for far too long, and they kept changing stuff all over the place, where *codel has stayed stable). I have a bunch of ideas queued up for cake, which is going to be a test vehicle for also testing enhancements to codel if some more folk were willing to help implement: https://lists.bufferbloat.net/pipermail/cake/2015-April/02.html Particularly the cake_drop_monitor bits seem very useful to basically import over from ns3 and try on real traffic on real machines. tc qdisc add dev whatever root cake flowblind # is the codel-only test and I totally welcome new attempts at the problem as you allude to below and will gladly fix up, polish, and test *anything*. But I would like to be able to test stuff in the 3 testbeds I have, the dozens of routers that I have - or the 10s of thousands we can quickly muster by leveraging the openwrt effort - and in the 10 servers I have over the world, and so on, and all the other testbeds now out there, so we can lock the theorists in the same room with the experimenters, coders, and EEs making hardware, so we all line up in the same place(s) at the end. Implementations DO require tradeoffs from ideal circumstances (like fixed point) and sometimes those are significant and not understood by the implementers.
So I would like very much for the linux pie code that I have so many results on to also be subject to the same scrutiny as you subjected the draft to, and drew plots of, in the hope that some of the experimental data will line up fully with your analysis. And I really want to be creating and providing data people can use. Which I don't feel like I am doing right now. Highest on my list is webrtc behaviors, followed by a stack of wifi related issues so high that I don't think 2 years will be enough to get all the coding done... I have been delightfully distracted by debugging the dslreports tests of late. 4) So I really liked very much you identifying edge cases in particular in that document, and much else besides. That yields testable concepts instead of having to explore the whole parameter space. Thank you thank you thank you! I really understood pie, codel, and aqm behavior overall a lot better after reading that critique! 5) One thing I really gotta do is test the drop-on-overload-even-if-ecn-marked, then mark the next packet, idea out more fully and commit that to mainline (and look over the tcp scoreboard) and also figure out what to do with pie ecn. High on my list is producing some results showing the existing ecn and drop behaviors in all these algos on the table. 6) There is no 6! On Sat, May 9,
Re: [aqm] draft-ietf-aqm-pie-01: review
Dear Bob: I now understand the linux codebase for pie a lot better, as well as some of the experimental data I have. It looks like I could make several of the changes you describe and put them in my next series of tests, and based on your parameters I should be able to exercise some edge cases across those changes. Wow, thx! I have not actually read the latest pie draft, but would like to make a few comments on your comments quickly: re: 3.1: in linux, params->ecn is presently a boolean, and could easily be modified to mean anything and compare anything you want. What would be a good default? The ECN support in the linux code on enqueue looks like: if (!drop_early(sch, skb->len)) { enqueue = true; } else if (q->params.ecn && (q->vars.prob <= MAX_PROB / 10) && INET_ECN_set_ce(skb)) { /* If packet is ecn capable, mark it if drop probability * is lower than 10%, else drop it. */ Re: 5.0: will look over that code re: 5.1, linux code: /* Non-linear drop in probability: Reduce drop probability quickly if * delay is 0 for 2 consecutive Tupdate periods. */ if ((qdelay == 0) && (qdelay_old == 0) && update_prob) q->vars.prob = (q->vars.prob * 98) / 100; re: 5.2: strongly agree that the lookup table doesn't scale properly. The linux code appears to differ from the draft also, here, with a smaller lookup table, and some other smoothing functions. I am going to stop pasting now and just point at: https://github.com/torvalds/linux/blob/master/net/sched/sch_pie.c#L334 Will await more feedback from y'all on that. Codel also has a similar undershoot problem, for which I proposed we try a fixed-point fractional count variable in a recent post to the cake mailing list. re: 5.3.1: In the environments I work in it is extremely hard to get timers to reliably fire in under 2ms intervals, particularly on vm'd systems. Also, as you fire the timer more rapidly, the current calculations in pie, now done out of band of the packet processing, have a couple of divides in them, which tend to be processor intensive...
Both things said, I figure this and other implementations could fire faster than the default 16ms... re: 5.3.2: I like what you are saying but I gotta go work it out for myself, which will take a while. Patches wanted. re: 5.4: linux and all my tests have always been against: /* default of 100 ms in pschedtime */ vars->burst_time = PSCHED_NS2TICKS(100 * NSEC_PER_MSEC); 5.5: explains a lot. Probably. Will think on it. 5.6: :chortle: heats_this_room() indeed! Derandomization always looked to me like an overly complex solution to a non-problem. 5.7: don't think this problem exists in the linux code but will step through. But: in one of my recent (450 rrul_50_up tcp flows) tests neither codel nor pie got to a sane drop rate in under 60 seconds, and pie stayed stuck at 1 packet outstanding; I did not try more - on gigE local links. I think a lot of pie tests were run by others with a very low outside packet limit (200 packets) and thus the tail drop kicked in before pie itself could react. 5.8: I think this is a quibblish change not relevant for any reasonable length of queue measured in packets. But I do note that we switched from packet limits to byte limits in cake; that was for other reasons - primarily due to the extreme (1000x1) dynamic range of a modern packet. 6: I do wish the draft and the code I have still lined up, and the constants were clearly defined. 7: exp(p) is not cheap, and there ain't no floating point in the kernel either. 8. Haven't read the draft, can't comment on the nits. One quick note on: 4.1 Random Dropping s/Like any state-of-the-art AQM scheme, PIE would drop packets randomly/ /PIE drops packets randomly/ Rationale: The other scheme with a claim to be state-of-the-art doesn't (CoDel). I would agree if the draft had said "A state of the art scheme should introduce randomness into packet dropping in order to desynchronize flows," but maybe it was decided not to introduce such underhand criticism of CoDel.
Whatever, the draft needs to be careful about evangelising random drop, given it attempts to derandomize later. I don't buy the gospel that randomness is needed to avoid tcp global synchronization. I would prefer that whatever evangelization of randomness exists in any draft, here or elsewhere, be dropped in favor of discussing the real problem of tcp global sync instead... ... which, as near as I can tell, both codel and fq_codel avoid just fine without introducing a random number generator. I welcome evidence to the contrary.
[aqm] bob's summary of ecn and aqm use cases
I thought every bullet point here was marvelous: http://www.ietf.org/mail-archive/web/aqm/current/msg01118.html and would like to see it captured in a formal document somewhere, if it is not captured in the ecn advocacy document. I have only three quibbles, one kind of major.

re: #5 Slow-starts{Note 3} cause spikes in delay.
- AQM without ECN cannot remove this delay, and typically AQM is designed to allow such bursts of delay in the hope they will disappear of their own accord.
- Flow queuing can remove the effect of these delay bursts on other flows, but only if it gives all flows a separate queue from the start.

I don't think this last point is proven. Further, I am pretty sure that a fully dedicated queue per flow is kind of dangerous. I would have said:
- AQM *with* ECN cannot remove this delay, and typically AQM is designed to allow such bursts of delay in the hope they will disappear of their own accord. AQM without ECN, or with ECN overload protection, can make a dent in this delay, but not instantaneously. FQ can remove the effect of these delay bursts on other flows.

re: #6 Slow-starts{Note 3} can cause runs of losses, which in turn cause delays.
- AQM without ECN cannot remove these delays.
- Flow queuing cannot remove these losses, if self-induced.
- Delay-based SS like HSS can mitigate these losses, with increased risk of longer completion time.
- ECN can remove these losses, and the consequent delays.

Throughout much of our debates we have a problem distinguishing delay within a flow from delay induced in other flows, and perhaps we should come up with a clean word to distinguish these two forms of induced delay. AQM with ECN in this case would reduce the amount of induced delay on that flow, but cause delay for other flows, and the overall rate would only be reduced by half, while perhaps 5 of the IW10 packets (as an example) could have been dropped (clearing the immediate congestion for another flow).
With the current overload protection in the different ecn enabled AQM algorithms, different things happen, as I have noted elsewhere. pie very quickly starts dropping even ecn marked packets when slammed with stuff in slow start, which is perhaps as it should be.

Re:
> Whether flow queuing is applicable depends on the scale. The work I'm doing with Koen is to reduce the cost of the queuing mechanisms on our BNGs (broadband network gateways). We're trying to reduce the cost of per-customer queuing at scale, so per-flow queuing is simply out of the question. Whereas ECN requires no more processing than drop.

Three subpoints.

1) It turns out that the amount of packet inspection needed to pry apart a packet and mark it can be quite a lot at dequeue time, with additional memory accesses for configuration variables as well. Hashing and timestamping the headers at enqueue time is in some ways lighter weight, particularly if offloaded to the rx hardware.

There are other things besides queue algorithms that are pretty heavyweight in the code path, notably FIB lookups (recently massively improved in linux 4.0) - which benefit from being parallelized, as they are with the current 10GigE hardware in most intel systems, with 16 cpus handling the load of, typically, 64 rx and tx queues. So it is a total systems (Amdahl's law) sort of problem as to where the trade-offs are. I would be interested to know of the cpu, network hardware, and memory design of your BNGs.

I am painfully aware, by this point, of how hard it is to do software rate limiting in Linux in particular. Doing it in hardware turned out to be straightforward (senic). In the design of cake it was basically my hope to find a simple means to apply it to many, many customer specific queues, but that requires a customer lookup service filter not yet designed, some attention to how rx queues are handled in the stack on a per cpu basis, and perhaps some custom hardware. I look forward to trying it at 10GigE soon.
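The enqueue-time hashing mentioned above can be sketched like this. A toy illustration only: the Linux fq_codel code actually uses a Jenkins hash with a per-qdisc random seed, not Python's hashlib, and the queue count below is merely fq_codel's default.

```python
import hashlib

# Map a flow 5-tuple to one of N sub-queues at enqueue time.
# Toy sketch: real qdiscs use a seeded Jenkins hash, not SHA-1.
N_QUEUES = 1024  # fq_codel's default 'flows' count

def flow_queue_index(src, dst, sport, dport, proto):
    """Return a stable queue index in [0, N_QUEUES) for a 5-tuple."""
    key = f"{src}|{dst}|{sport}|{dport}|{proto}".encode()
    digest = hashlib.sha1(key).digest()
    return int.from_bytes(digest[:4], "big") % N_QUEUES

# Packets of the same flow always land in the same queue:
a = flow_queue_index("10.0.0.1", "10.0.0.2", 5000, 80, 6)
b = flow_queue_index("10.0.0.1", "10.0.0.2", 5000, 80, 6)
print(a == b)  # True
```

The point being that this is one cheap hash per packet at enqueue, rather than deep inspection plus config lookups at dequeue.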
2) per-flow queuing is a mere matter of memory organization after that point, and we already know how to scale that to millions of flows on a day to day basis on intel hardware.[1]

As sort of a side note that doesn't really fit anywhere:

3) FQ, for lack of a better word, can act as a step-down transformer. Imagine, if you will, a TSO burst emitted at 10GigE hitting a saturated 10GigE link with 1000 flows with FQ enabled. Each packet from the bursting flow will be slowed down and delivered at effectively 10Mbit/sec. At one level this is desirable, giving the ultimate endpoint more time to maneuver. Breaking up the burst applies a form of pacing, even if the bottleneck link is only momentarily saturated by another flow(s). At another level it isn't desirable, particularly in the case of packet aggregation on the endpoint. This burst break-up is, of course, something that already basically happens on switched ports, by design.

[1] Please note I just said per-flow queuing not any particular form of fq algorithm and
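The "step-down transformer" arithmetic above can be made concrete with a tiny sketch. This assumes idealized packet-by-packet round-robin; real schedulers like DRR are byte-based and flow counts fluctuate, so treat the numbers as illustrative.

```python
# FQ as a step-down transformer: with N backlogged flows sharing a link
# under round-robin, each flow's packets drain at LINK/N, and its packets
# are spaced one full scheduler round apart.
LINK_BPS = 10e9       # 10GigE bottleneck
N_FLOWS = 1000        # concurrently backlogged flows
MTU_BITS = 1514 * 8   # bits per full-size ethernet frame

per_flow_bps = LINK_BPS / N_FLOWS
print(per_flow_bps / 1e6)  # → 10.0 (Mbit/s, matching the text above)

# Inter-packet gap seen by any one flow: one round of the scheduler.
gap_s = N_FLOWS * MTU_BITS / LINK_BPS
print(f"{gap_s * 1e6:.1f} us between this flow's packets")
```

So the TSO burst arrives wire-rate but leaves paced at ~10Mbit/s, with over a millisecond between its packets - exactly the "more time to maneuver" effect described.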
Re: [aqm] [homenet] IEEE 1905.1 and 1905.1a
up until this moment I had never heard of this spec: http://en.wikipedia.org/wiki/IEEE_1905 - and it does sound useful. +10 on more open access to it. +100 on anyone working on open source code for it. I would certainly like closer relationships between the IEEE and IETF one day, perhaps even a truly joint (as opposed to back to back) conference. For far too long members of these two orgs have been going to different parties, and many, many cross layer issues have arisen because of this. In my own case I had hoped (in dropping ietf) to be able to attend more IEEE 802.11 wg meetings - but I would really prefer to stay home and code for a while. I would be very supportive of someone(s) taking on the task of better grokking wifi and other non-ethernet media across both orgs, both in the context of homenet and in aqm.

PS While I have a good grip on cable media layers, I am lacking such on gpon...
[aqm] comcast research funding tracks
http://techfund.comcast.com/ has quite a few topics on it that might be of interest to those working on networking and bufferbloat. I am going to put in for a bit of funding from there myself, but certainly others here have the right interests, yet not the time or money to pursue them, so... do check that url out. There are a few other programs I am exploring. Things like SBIR didn't seem useful, and most of what DHS is funding is security related rather than network performance related.

-- Dave Täht Let's make wifi fast, less jittery and reliable again! https://plus.google.com/u/0/107942175615993706558/posts/TVX3o84jjmb
[aqm] fixing bufferbloat on bigpond cable...
I was very pleased to see this tweet go by today: https://twitter.com/mnot/status/575581792650018816 where Mark Nottingham fixed his bufferbloat on bigpond cable using a very simple htb + fq_codel script. (I note ubnt edgerouters also have a nice gui for that, as does openwrt)

But: he does point out a flaw in netanalyzr's current tests[1], in that they do not correctly detect the presence of aqm or FQing on the link (in part due to not running long enough, and also due to not using multiple distinct flows). And, as with the "ping loss considered harmful" thread last week on the aqm and bloat lists, matching user expectations and perceptions would be good for any public tests that exist. There is some stuff in the aqm evaluation guide's burst tolerance tests that sort of applies, but... ideas?

[1] I am not aware of any other tests for FQ than mine, which are still kind of hacky. What I have is in my isochronous repo on github.

-- Dave Täht Let's make wifi fast, less jittery and reliable again! https://plus.google.com/u/0/107942175615993706558/posts/TVX3o84jjmb
Re: [aqm] fixing bufferbloat on bigpond cable...
Sorry, didn't read the thread closely. I made a few suggestions on that person's gist, as you probably also have downstream bufferbloat, which you can fix (on the edgerouter and openwrt) at speeds up to 60mbit on those weak cpus using the user-supplied edgerouter gui for the ingress stuff. The code for doing inbound shaping is also not much harder; a simple example is in the ingress section of the gentoo wiki here: http://wiki.gentoo.org/wiki/Traffic_shaping (sqm-scripts in openwrt and other linuxen also has the logic for this built in.)

It is grand to have helped you out a bit. Thx for all the work on http/2! How about some ecn? ;)

On Wed, Mar 11, 2015 at 7:14 PM, Mark Nottingham m...@mnot.net wrote:
> Hi, Just to clarify -- the credit goes to 'saltspork' on that thread, not I :) Cheers,

-- Mark Nottingham https://www.mnot.net/
-- Dave Täht Let's make wifi fast, less jittery and reliable again! https://plus.google.com/u/0/107942175615993706558/posts/TVX3o84jjmb
Re: [aqm] fixing bufferbloat on bigpond cable...
cake, if we ever get around to finishing it, gets it down to 1 line of code for outbound, and maybe 1 or 2 for inbound. That said, we probably need a policer for inbound traffic on the lowest end hardware, built around fq_codel principles. The design is called bobbie, and I have kept meaning to get around to it for about 3 years now.

That one line (for anyone willing to try the patches):

tc qdisc add dev eth0 root cake bandwidth 2500kbit diffserv

But back to my open question - how can we get better public benchmarks that accurately detect the presence of AQM and FQ technologies on the link?

-- Dave Täht Let's make wifi fast, less jittery and reliable again! https://plus.google.com/u/0/107942175615993706558/posts/TVX3o84jjmb
Re: [aqm] a test post on a thread that disappeared
On Sat, Mar 7, 2015 at 12:14 PM, Dave Taht dave.t...@gmail.com wrote:
> I was wondering why a certain thread did not show up in the ietf aqm archive: http://www.ietf.org/mail-archive/web/aqm/current/maillist.html

and now, stripping out the urls with an invalid cert, also as a test. Sorry for the noise... I wanted to see if it was merely me that goofed on the cc.

The aqm list was cc'd on a thread titled "some thoughts towards medals and other recognition for fundamental contributions to the internet" on the cerowrt-devel and bloat mailing lists, with a follow-on message from Vint Cerf. That title was certainly not indexed by google, thus far... and I really do need to fix that cert... but the post did make it to gmane, at least: http://article.gmane.org/gmane.network.routing.codel/629/match=medals

-- Dave Täht Let's make wifi fast, less jittery and reliable again! https://plus.google.com/u/0/107942175615993706558/posts/TVX3o84jjmb
Re: [aqm] [Cerowrt-devel] ping loss considered harmful
I had spoken to someone at nznog who promised to combine mrtg + smokeping or cacti + smokeping, so as to be able to get long term latency and bandwidth numbers on one graph. cc added.

On Thu, Mar 5, 2015 at 12:38 PM, Matt Taggart m...@lackof.org wrote:
> Dave Taht writes:
>> wow. It never registered to me that users might make a value judgement based on the amount of ping *loss*, rather than latency, and in looking back in time, I can think of multiple people that have said things based on their perception that losing pings was bad, and that sqm-scripts was worse than something else because of it.
> This thread makes me realize that my standard method of measuring latency over time might have issues. I use smokeping http://oss.oetiker.ch/smokeping/

in sqm-scripts's case, possibly, all you have been collecting is largely worst case behavior, which I don't mind collecting as it tends to be pretty good. :)

However, I have been unclear. In the main (modern - I don't know what version you have) sqm code, IF you enable dscp squashing on inbound (the default), you do end up with a single fq_codel queue, not 3 - no classification or ping prioritization. (it is the default because of all the re-marking I have seen from comcast) So if you are, as I am, monitoring your boxes from the outside, there is no classification and prioritization present for ping. Do a tc -s qdisc show ifbwhatever (varies by platform) to see how many queues you have.

Example of a single queued inbound rate limiter + fq_codel (yea! packet drop AND ecn working great!)
root@lorna-gw:~# tc -s qdisc show dev ifb4ge00
qdisc htb 1: root refcnt 2 r2q 10 default 10 direct_packets_stat 0 direct_qlen 32
 Sent 168443514948 bytes 334370551 pkt (dropped 0, overlimits 143273498 requeues 0)
 backlog 0b 0p requeues 0
qdisc fq_codel 110: parent 1:10 limit 1001p flows 1024 quantum 300 target 5.0ms interval 100.0ms ecn
 Sent 168443514948 bytes 334370551 pkt (dropped 17480, overlimits 0 requeues 0)
 backlog 0b 0p requeues 0
 maxpacket 1514 drop_overlimit 0 new_flow_count 125872421 ecn_mark 1044 new_flows_len 0 old_flows_len 1
root@lorna-gw:~# uptime
 12:45:35 up 54 days, 22:33, load average: 0.05, 0.05, 0.04

dscp classification, in general, is only useful from within your own network, going outside.

> which is a really nice way of measuring and visualizing packet loss and variations in latency. I am using the default probe type which uses fping (ICMP http://www.fping.org/ ).

I LOVE smokeping and wish very much we had a way to combine it with mrtg data to see latency AND bandwidth at the same time.

> It has been working well, I set it up for a site in advance of setting up SQM and then afterwards I can see the changes and determine if more tuning is needed. But if ICMP is having its priority adjusted (up or down), then the results might not reflect the latency of other services. Fortunately the nice thing is that many other probe types exist http://oss.oetiker.ch/smokeping/probe/index.en.html So which probe types would be good to use for bufferbloat measurement? I guess the answer is whatever is important to you, but I also suspect there is a set of things that ISPs are known to mess with. HTTP? But also maybe HTTPS in case they are doing some sort of transparent proxy? DNS? SIP? I suppose you could even do explicit checks for things like Netflix (but then it's easy to go off on a tangent of building a net neutrality observatory).
> On a somewhat related note, I was once using smokeping to measure a fiber link to a bandwidth provider and had it configured to ping the router IP on the other side of the link. In talking to one of their engineers, I learned that they deprioritize ICMP when talking _with_ their routers, so my measurements weren't valid. (I don't know if they deprioritize ICMP traffic going _through_ their routers)

I do strongly recommend deprioritizing ping slightly, and, as I noted, I have seen many a borken script that actually prioritized it, which is foolish at best.

I keep hoping multiple (many!) someones here will go have lunch with their company's oft lonely, oft starving sysadmin(s), to ask them what they are doing as to firewalling, QoS and traffic shaping. Most of the ones I have talked to are quite eager to show off their work, which is unfortunately often of wildly varying quality and complexity. I find that an offer of sake and sushi is most conducive to getting that conversation started. I certainly would like to see more default corporate firewall/QoS/shaping rules than I have personally, for various platforms. Someone's got to have some good ideas in them... and it would be nice to know how far the bad ones have propagated.

-- Matt Taggart m...@lackof.org
-- Dave Täht Let's make wifi fast, less jittery and reliable again! https://plus.google.com/u/0/107942175615993706558/posts/TVX3o84jjmb
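Counters like the ones in that tc dump can be scraped for long-term graphing (the mrtg + smokeping combination wished for above). A hedged sketch; field names follow the fq_codel output shown earlier, but tc's human-readable text is not a stable API and varies by iproute2 version (where available, `tc -s -j qdisc` JSON output is more robust):

```python
import re

# Parse drop/ECN-mark counters out of `tc -s qdisc` text output.
# Sample is the fq_codel stanza from the message above.
SAMPLE = """\
qdisc fq_codel 110: parent 1:10 limit 1001p flows 1024 quantum 300 target 5.0ms interval 100.0ms ecn
 Sent 168443514948 bytes 334370551 pkt (dropped 17480, overlimits 0 requeues 0)
 maxpacket 1514 drop_overlimit 0 new_flow_count 125872421 ecn_mark 1044
"""

def qdisc_counters(text):
    """Return (dropped, ecn_mark) parsed from tc -s qdisc output."""
    dropped = int(re.search(r"dropped (\d+)", text).group(1))
    ecn = re.search(r"ecn_mark (\d+)", text)
    return dropped, int(ecn.group(1)) if ecn else 0

print(qdisc_counters(SAMPLE))  # → (17480, 1044)
```

Polled periodically and differenced, these two counters give a drop-vs-mark rate over time to plot alongside smokeping's latency.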
Re: [aqm] the cisco pie patent and IETF IPR filing
On Wed, Mar 4, 2015 at 12:17 AM, Vishal Misra mi...@cs.columbia.edu wrote:
> Hi Dave, Thanks for your email. A few quick points:
> - I have actually sent a note already to someone on the Cisco PIE team about the error in the IETF IPR filing and am sure they will get it corrected. You have helpfully dug out the actual patent application, and it appears that one digit got inadvertently changed in the Cisco IETF IPR declaration of the patent application.
> - I wish I had a marketing department that would do stories for me :-). I work at Columbia University and that story that you point out was done by a writer at the UMass-Amherst engineering school as an example of academic research having practical impact. There is an urgent need to support more academic research and I think stories like this one support the cause.

Well, yes and no. One thing I have tried really hard to do throughout this project is give credit where credit is due - at every talk, for example, always mentioning pie, even before I actually had any data on its performance. I try to give every individual that has contributed something to this stone soup project praise for what they did to help out, as here at uknof: https://plus.google.com/u/0/107942175615993706558/posts/immF8Pkj19C

There has been an amazing level of detail to sort out along the way here, at every level in the OS stack and in the hardware, and there is simply no one individual or company I would single out as truly key, except maybe George P. Burdell!

A lesson I have learned is that folk in marketing are not particularly good at correctly distributing credit, and I assume that is how they are taught to write - to not look at any facts outside of their immediate objectives.
[1] http://newsroom.cisco.com/feature-content?type=webcontentarticleId=1414442

and 'course nobody in the press has shown up with a photographer to write puff pieces about the overall effort - except, well, cringely's work is not puffy enough by marketing standards: ( http://www.cringely.com/tag/bufferbloat/ )

I admit to a great deal of frustration when Nick Weaver writes an otherwise *excellent* piece in forbes, http://www.forbes.com/sites/valleyvoices/2015/02/27/this-one-clause-in-the-new-net-neutrality-regs-would-be-a-fiasco-for-the-internet/ and expends 3+ paragraphs explaining bufferbloat, but never gives the reader a link back *to the word*, so that maybe some CTO or CEO that reads that rag would have some context and clue when an engineer comes up to him asking for permission to go implement a fix that is now, basically, off the shelf.

*I* am going to keep giving credit to everyone I can, in every talk and presentation I do, and there are quite a few core contributors that I wish I had called out by name more - for example, I would have mentioned Felix Fietkau's contribution towards fixing wifi at the nznog talk if I could correctly pronounce his name! I struggled for years to be able to pronounce juliusz's!

At the very least, I hope we can do more from a SEO perspective - and all *pull together* to get the message out: that bufferbloat is fixed, that solutions are being standardized in the ietf, and that the code is widely available on a ton of platforms already - and move somehow to where ISPs are announcing settings for things like openwrt + sqm-scripts, and more importantly, schedules for rolling out fixes (like docsis 3.1 and better CPE) to their customers.

everyone: What else can we do here to cross the chasm?

> - Indeed neither me nor any of the other PI authors had any idea of the PIE work. I discovered it accidentally when I was at MIT giving a talk on Network Neutrality and Dave Clark mentioned Cisco's PIE and DOCSIS 3.1 to me.
> I later read up on PIE and was pleasantly surprised that our PI work from more than a decade back evolved into it.
> - I had contributed the PI code to Sally Floyd back in 2001 and it has been part of ns2 for the longest time (pi.cc). It shouldn't be difficult to adapt that for a Linux implementation and I am happy to help anyone who wishes to try it. Maybe that might affect your loyalty to fq_codel.

I let the data take me where it may. I have not always, but I reformed about 15 years ago. [1]

I hope that you and your students also do some experiments on the successors to PI and RED and DRR - and also follow the data wherever it leads you. I was fiercely proud of sfqred - until fq_codel blew it away on every benchmark I could devise. I have long longed to find another independent expert in the field to create new experiments and/or recreate/reproduce/disprove our results.

[1] "For a successful technology, reality must take precedence over public relations, for Nature cannot be fooled." - Richard P. Feynman, Challenger disaster report: https://www.youtube.com/watch?v=6Rwcbsn19c0

-Vishal
-- http://www.cs.columbia.edu/~misra/

On Mar 4, 2015, at 1:07 AM, Dave Taht dave.t...@gmail.com wrote:
> Two items: A) The IETF
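For readers unfamiliar with the PI controller being discussed (the pi.cc code contributed to ns2), its core update is a classic proportional-integral step on the drop probability. A hedged sketch of that textbook form (Hollot et al.'s PI AQM); the gains and target below are arbitrary illustrative values, not tuned constants from pi.cc or PIE:

```python
# Classic PI AQM update, run once per sampling interval:
#   p += A * (q_now - q_ref) - B * (q_prev - q_ref)
# It pushes drop probability up when the queue is above target and growing.
Q_REF = 50.0              # target queue length (packets) - illustrative
A, B = 0.00182, 0.00181   # example PI gains - illustrative only

def pi_update(p, q_now, q_prev):
    """One update of the drop probability, clamped to [0, 1]."""
    p += A * (q_now - Q_REF) - B * (q_prev - Q_REF)
    return min(max(p, 0.0), 1.0)

# Queue above target and growing -> probability rises:
p = pi_update(0.01, q_now=80.0, q_prev=70.0)
print(p > 0.01)  # True
```

PIE's refinement, very roughly, is to run this style of controller on queuing *delay* rather than queue length.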
Re: [aqm] the cisco pie patent and IETF IPR filing
On Wed, Mar 4, 2015 at 2:08 PM, Rong Pan (ropan) ro...@cisco.com wrote:
> The correct Cisco IPR is http://datatracker.ietf.org/ipr/2540/.

Thank you very much for the pointer to the correct IPR filing. I apologize for being grumpy.

-- Dave Täht Let's make wifi fast, less jittery and reliable again! https://plus.google.com/u/0/107942175615993706558/posts/TVX3o84jjmb
Re: [aqm] [Bloat] ping loss considered harmful
On Tue, Mar 3, 2015 at 10:00 AM, Fred Baker (fred) f...@cisco.com wrote:
> On Mar 3, 2015, at 9:29 AM, Wesley Eddy w...@mti-systems.com wrote:
>> On 3/3/2015 12:20 PM, Fred Baker (fred) wrote:
>>> On Mar 1, 2015, at 7:57 PM, Dave Taht dave.t...@gmail.com wrote:
>>>> How can we fix this user perception, short of re-prioritizing ping in sqm-scripts?
>>> IMHO, ping should go at the same priority as general traffic - the default class, DSCP=0. When I send one, I am asking whether a random packet can get to a given address and get a response back. I can imagine having a command-line parameter to set the DSCP to another value of my choosing.
>> I generally agree, however ... The DSCP of the response isn't controllable though, and likely the DSCP that is ultimately received will not be the one that was sent, so it can't be as simple as echoing back the same one. Ping doesn't tell you latency components in the forward or return path (some other protocols can do this though). So, setting the DSCP on the outgoing request may not be all that useful, depending on what the measurement is really for.
> Note that I didn't say "I demand"... :-)

My point was A) I have seen tons of shapers out there that actually prioritize ping over other traffic. I figure everyone here will agree that is a terrible practice, but I can certainly say it exists, as it is a dumb mistake replicated in tons of shapers I have seen... that makes people in marketing happy. I already put up extensive commentary on that bit of foolishness in "wondershaper must die". Please feel free to review any shapers or firewall code you might have access to for the same sort of BS, and/or post the code somewhere for public review. A BCP for these two things would be nice.

And B) Deprioritizing ping (slightly) as I do came from what has happened to me multiple times when hit by a bot that ping floods the network.
One time, 30+ virtual windows boxes in a lab got infected by something that went nuts pinging the entire 10/8 network we were on. It actually DID melt the switch - and merely isolating that network from the rest was a PITA, as getting to the (SFQ-ing) router involved was nearly impossible via ssh (like, 2 minutes between keystrokes). Thus, ping, deprioritized. I tend to feel deprioritizing it slightly is much more important in the post-ipv6 world.

> I share the perception that ping is useful when it's useful, and that it is at best an approximation. If I can get a packet to the destination and a response back, and I know the time I sent it and the time I received the response, I know exactly that - messages went out and back and took some amount of total time. I don't know anything about the specifics of the path, of buffers en route, or delay time in the target. Traceroute tells me a little more, at the cost of a more intense process. In places I use ping, I tend to send a number of them over a period of time and observe the statistics that result, not a single ping result.

-- Dave Täht Let's make wifi fast, less jittery and reliable again! https://plus.google.com/u/0/107942175615993706558/posts/TVX3o84jjmb
[aqm] speedtest-like results for 3g and 4g at ofcom
Anybody know anybody here that could ask them to run a valid latency under load test? http://media.ofcom.org.uk/news/2014/3g-4g-bb-speeds/

-- Dave Täht Let's make wifi fast, less jittery and reliable again! https://plus.google.com/u/0/107942175615993706558/posts/TVX3o84jjmb
Re: [aqm] Gathering Queue Length Statistics
On Wed, Feb 25, 2015 at 9:53 AM, Ryan Doyle rpdo...@live.unc.edu wrote:
> Hello, I am a senior undergraduate student at the University of North Carolina at Chapel Hill and am studying the effectiveness of AQMs. I have set up a lab network and plan on running different sets of experiments with different AQMs. My router machines are running Linux kernel version 3.16.0. I am using the fq_codel, codel, and pie qdiscs for my research and am wondering if there is a way to collect statistics regarding the average queue length since a qdisc was enabled? I have looked at tc's -s flag for statistics, but they show nothing about queue length, and I have been unable to find anything else that might help me get queue length statistics.

Oh, god. I am getting incredibly sensitive about average queue length, and I realize that that is not what you meant. But since not enough people have seemingly read this or any of the related materials, here it is again: http://www.pollere.net/Pdfdocs/QrantJul06.pdf

And I of course always recommend van's talk on the fountain model for thinking about closed loop servo systems: http://www.bufferbloat.net/projects/cerowrt/wiki/Bloat-videos

In the bufferbloat project we have developed many tools that drive aqms hard - see netperf-wrapper on github - which measure e2e delay without requiring any tools on the routers inbetween. e2e delay, in my mind, is way more important than average queue length. And you can derive the queue length(s) from tcp timestamps in those netperf-wrapper tests, or from additional packet captures, if you must.

If you absolutely MUST derive average queue length from the box, you can poll the interface frequently with tc -s qdisc show, as well as with ifconfig, and parse out the number of packets and the number of bytes.
But you can do MUCH more valid statistical analysis than that with that sort of data set - and if you poll too frequently you will heisenbug your tests, as those data collection calls take locks that interfere with the path. And we have all sorts of advice about traps for the unwary here: http://www.bufferbloat.net/projects/codel/wiki/Best_practices_for_benchmarking_Codel_and_FQ_Codel

Please use things like CDFs to see the range of delays, rather than averages. It is what happens above 90% of the range that makes bufferbloat maddening to ordinary users. I am summarily rejecting any papers that I review that report average queue length as if it meant anything - and for a few other reasons. You have been warned. I really lost my temper after the last paper I reviewed last weekend, and the resulting flamage is all over the bloat and codel lists, starting here: https://lists.bufferbloat.net/pipermail/codel/2015-February/000872.html

> Best, Ryan Doyle

-- Dave Täht Let's make wifi fast, less jittery and reliable again! https://plus.google.com/u/0/107942175615993706558/posts/TVX3o84jjmb
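The CDFs-not-averages point can be shown with a toy delay distribution. The numbers below are made up purely for illustration: a link that is fine 90% of the time and badly bloated 10% of the time.

```python
import random

# Why averages hide bufferbloat: a bimodal delay distribution where the
# mean looks tolerable but the tail (what users feel) is awful.
# All numbers here are invented for illustration.
random.seed(42)
delays_ms = [random.gauss(10, 2) for _ in range(900)] + \
            [random.gauss(500, 50) for _ in range(100)]

mean = sum(delays_ms) / len(delays_ms)
ordered = sorted(delays_ms)

def percentile(p):
    """Nearest-rank percentile over the sorted samples."""
    return ordered[int(p / 100 * (len(ordered) - 1))]

print(f"mean   = {mean:6.1f} ms")
print(f"median = {percentile(50):6.1f} ms")
print(f"p90    = {percentile(90):6.1f} ms")
print(f"p99    = {percentile(99):6.1f} ms")  # the part that maddens users
```

The median says the link is fine and the mean says it is merely mediocre; only the upper percentiles (the right edge of the CDF) reveal the half-second stalls.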
Re: [aqm] PIE implementation on NS2
pie, codel, sfq_codel, codel-dt and other variants are all part of the upcoming ns2 release. A release candidate is here: http://nsnam.isi.edu/nsnam/index.php/Roadmap

codel ended up in the september release of ns-3.21; the fq_codel variant is (hopefully) being merged in the 3.22 release, with the present tree for that awaiting some pending refactoring and someone to help do the work: https://www.nsnam.org/wiki/Ns-3.22

Please give 'em a try and report any bugs, etc, to the relevant ns* mailing lists.

On Tue, Jan 20, 2015 at 6:40 PM, ETAF dancing_li...@foxmail.com wrote:
> Hello! Does anyone have an implementation of PIE for NS2? Thanks a lot!

-- Dave Täht http://www.bufferbloat.net/projects/bloat/wiki/Upcoming_Talks
[aqm] incorrect defaults for RED in ns2 cross check
I am curious whether those of you fiddling with RED have been setting q_weight in your simulations and papers, overriding the incorrect ns2 default? Similarly, are those of you testing ARED making sure the adaptive parameter is really on? ... and if someone can come up with a way of validating that correct configurations for RED were used in every paper that used it in ns2 for the last 12 years, I would love to hear it. -- Forwarded message -- From: Tom Henderson t...@tomh.org Date: Sun, Dec 21, 2014 at 12:07 PM Subject: [Ns-developers] proposed changes to ns-2 RED code To: ns-developers list ns-develop...@isi.edu, ns-us...@isi.edu ns-us...@isi.edu Cc: Mohit P. Tahiliani tahiliani.n...@gmail.com If you are using ns-2 and RED queues, please help us to evaluate the following proposed change. Mohit Tahiliani has been working on Adaptive RED in ns-2, and has patched a few issues, including: - use of ARED in wireless networks leads to a floating point exception - the default value of Queue/RED set q_weight_ -1 is incorrect - Queue/RED set adaptive_ 0: this must be set to 1, otherwise the max_p parameter never adapts. While the default values of some parameters (such as thresh_, maxthresh_, q_weight_) were changed in 2001 to make ARED the default RED mechanism in ns-2, those of other parameters were left unchanged. The resulting code defaults to something that is neither RED nor ARED; this patch will fix the default to ARED. The proposed patch is in a tracker issue here: http://sourceforge.net/p/nsnam/patches/25/ I'm testing release candidates for ns-2.36, which are described here: http://nsnam.isi.edu/nsnam/index.php/Roadmap Mohit's patch is _not_ part of the first release candidate. If we move forward with it, it will be merged as part of a later release candidate. So to test it yourself, I recommend downloading the release candidate and applying the patch there. I've been through a couple of review cycles with Mohit on this patch.
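Given how easy it is to inherit the bad defaults, a quick audit of existing simulation scripts is worthwhile. A hedged sketch (the parameter names come from the patch discussion; the glob and the warning text are mine):

```shell
# Flag ns-2 scripts that never override the RED/ARED defaults in question.
for f in *.tcl; do
  grep -qE 'Queue/RED set (q_weight_|adaptive_)' "$f" \
    || echo "$f: runs RED with stock defaults (neither RED nor ARED)"
done
```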
We'll use lazy consensus to try to decide on its inclusion. Unless we hear from the community that these changes should be reconsidered (let's set a date, such as by January 10), I plan to work with Mohit to evaluate the ns-2 validate trace changes and update the traces, and commit this to ns-2 prior to the ns-2.36 release. Of course, even if you support this, it would be nice to hear positive feedback if you read over this patch, test it, and like what you see. Thanks, Tom -- Dave Täht http://www.bufferbloat.net/projects/bloat/wiki/Upcoming_Talks ___ aqm mailing list aqm@ietf.org https://www.ietf.org/mailman/listinfo/aqm
Re: [aqm] RED implementation on Linux 3.14.22
On Mon, Dec 15, 2014 at 5:41 AM, Jim Gettys j...@freedesktop.org wrote: On Mon, Dec 15, 2014 at 2:51 AM, Simone Ferlin-Oliveira fer...@simula.no wrote: All, I am doing some work with shared bottleneck detection that requires some evaluation with different AQMs, in particular, RED. Since I haven't been following the evolution of the implementation, I would like to ask about your experience with the code on Linux 3.14 (and newer). I know that Dave Taht ran into bugs in RED a while back, which I believe have been fixed for quite a while. The power of git to answer questions like this is unparalleled. Taking a look at my current kernel tree and doing a: git log net/sched/sch_red.c shows Eric fixed two bugs in Linux RED in commit 1ee5fa1e9970a16036e37c7b9d5ce81c778252fc Author: Eric Dumazet eric.duma...@gmail.com Date: Thu Dec 1 11:06:34 2011 + sch_red: fix red_change ... http://git.kernel.org/cgit/linux/kernel/git/torvalds/linux.git/commit/?id=1ee5fa1e9970a16036e37c7b9d5ce81c778252fc ARED was added slightly later, and sfqred (a first attempt at blending fq + AQM together, though it doesn't do ARED) shortly after that; sfqred never made it to mainline, as fq_codel landed soon after. Advice: keep track of net-next, do git pulls regularly, and watch git log net for changes. You should always be looking at whether code for a module you are interested in has been patched in the current kernel.org tree, so do a diff between 3.14 and the current Linux source. 3.14 is recent enough that it may be viable for experiments, for the time being. Planning to keep up with Linux development is wise long term in any case, as the rate of improvement/change in the networking stack is very high at the moment as draining the bufferbloat swamp and other performance work continues. Important changes since 3.14: pie added, DCTCP added, gso/tso offloads seriously reworked and made gentler, sch_fq's pacing improved.
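The "diff between 3.14 and current" advice, spelled out as a sketch (to be run inside a kernel.org git clone; the tag name and path are the ones from the message):

```shell
# What has touched RED since the 3.14 release, and the full delta.
git log --oneline v3.14..HEAD -- net/sched/sch_red.c
git diff v3.14..HEAD -- net/sched/sch_red.c
```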
The last kernel rounds (3.18, 3.19) were seriously productive: hystart improved at longer RTTs, still more TSO/GSO improvements, and xmit_more support was added for some devices. Also, support for per-route congestion control settings (primarily targeted at DCTCP) was just added. I believe some of the long-RTT falloff we saw in toke's paper was due to hystart issues, as I have been unable to duplicate some of his results with this upcoming release. I have basically thrown out all my 3.14 results at this point and am starting over with the soon-to-stabilize 3.19 release. (Well, in fact, I ended up starting over 3 times in the last 2 months as each of the new features above landed in the kernel.) (But as for red, no changes except in the underlying TCPs and device drivers.) Relevant commits were: Hystart change: http://git.kernel.org/cgit/linux/kernel/git/torvalds/linux.git/commit/?id=42eef7a0bb0989cd50d74e673422ff98a0ce4d7b xmit_more: http://netoptimizer.blogspot.com/2014/10/unlocked-10gbps-tx-wirespeed-smallest.html very good lwn article on it: http://lwn.net/Articles/615238/ one of several GSO fixes: commit d649a7a81f3b5bacb1d60abd7529894d8234a666 Author: Eric Dumazet eduma...@google.com Date: Thu Nov 13 09:45:22 2014 -0800 tcp: limit GSO packets to half cwnd ... etc. Do a git log net. :) preso that convinced systemd to switch to fq_codel: http://lwn.net/Articles/616241/ Also note that underlying device drivers may have (sometimes lots of) buffering out of the control of the Linux queue discipline. For Ethernet devices, you should ensure that the drivers have BQL support implemented to minimize this buffering. Other classes of drivers are more problematic, and may have lots of buffering to surprise you. +10 (or rather, -10). It's up to 25 devices now. I note that TSO/GSO used to interact very badly with soft rate limiting (htb); it seems better now.
Also be aware that ethernet flow control may move the bottleneck from where you expect to somewhere else, and that switches in networks also have to be well understood. Most consumer switches have this *on* by default, and mixed 1G/100Mb networks can be particularly entertaining in this regard. Cable modems, unfortunately, typically do not implement flow control, but some DSL modems do (putting the bottleneck into your router, rather than in the modem). I should probably put red back into my test matrices. I stopped benchmarking it and pfifo_fast a long time ago. A netperf-wrapper data set that predates the hystart fix, testing 3 RTTs: http://snapon.lab.bufferbloat.net/~d/comprehensive.puck/ or: http://snapon.lab.bufferbloat.net/~d/comprehensive_puck.tgz *Any* help is appreciated. Hope this helps. Thanks, Simone ___ aqm mailing list aqm@ietf.org https://www.ietf.org/mailman/listinfo/aqm -- Dave Täht http://www.bufferbloat.net/projects/bloat/wiki/Upcoming_Talks
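Two quick checks relevant to the flow-control and BQL points above. A sketch where "eth0" is a placeholder interface name and ethtool is assumed to be installed; both commands only inspect state:

```shell
# Is ethernet flow control (pause frames) negotiated on this NIC?
ethtool -a eth0
# BQL: these sysfs entries exist only when the driver implements it.
ls /sys/class/net/eth0/queues/tx-0/byte_queue_limits
```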
Re: [aqm] RED implementation on Linux 3.14.22
On Mon, Dec 15, 2014 at 7:54 AM, Dave Taht dave.t...@gmail.com wrote: On Mon, Dec 15, 2014 at 5:41 AM, Jim Gettys j...@freedesktop.org wrote: On Mon, Dec 15, 2014 at 2:51 AM, Simone Ferlin-Oliveira fer...@simula.no wrote: All, I am doing some work with shared bottleneck detection that requires some evaluation with different AQMs, in particular, RED. Since I haven't been following the evolution of the implementation, I would like to ask about your experience with the code on Linux 3.14 (and newer). I need to clarify something about "newer". The third number in a Linux version is for bug fixes only: 3.14 is the major release, and 3.14.22 is its 22nd bug-fix release. A -X or fourth component, if it exists, denotes distro-specific changes, which can often, particularly in major distros like redhat or ubuntu, be quite extensive. New features, such as the ones I mentioned in the previous email, generally do not make it to the bug-fix releases, and I don't know (without checking) if, for example, the hystart change or the GSO half-cwnd change will make it to the -stable tree for older releases, as usually only security- or crash-critical fixes make it into stable. I mention this in light of a fairly recent DCTCP paper which used a pre-bufferbloat-fixes kernel of 3.2.something, discussed (well, ranted about slightly, apologies) here: https://lists.bufferbloat.net/pipermail/bloat/2013-November/001736.html (I would dearly like to see that paper's experiments revised and updated in light of that discussion, now that all these other fixes have landed and DCTCP is in mainline linux.) I try to regularly publish a simple debian kernel build script and my own patch set of the codel-related research in progress, somewhere: http://snapon.lab.bufferbloat.net/~d/codel_patches/ and will probably restart publishing a separate debloat-testing tree for the upcoming make-wifi-fast effort, as that set of changes is going to be quite extensive, and buggy, for a while.
-- Dave Täht http://www.bufferbloat.net/projects/bloat/wiki/Upcoming_Talks ___ aqm mailing list aqm@ietf.org https://www.ietf.org/mailman/listinfo/aqm
Re: [aqm] analysis paper on PIE...
On Wed, Nov 12, 2014 at 4:03 PM, Scheffenegger, Richard r...@netapp.com wrote: Hi Martin, I believe these papers may qualify for that requirement: http://ipv6.cablelabs.com/wp-content/uploads/2014/06/DOCSIS-AQM_May2014.pdf This documents the docsis-pie implementation, which has quite a few basic improvements on pie by itself, notably bytemode, some predictive stuff, and drop semi-derandomization. It also uses overlarge, not-recommended-by-the-inventors constants for codel and sfq_codel, and lumps together all results at all bandwidths, where, as we've shown, current aqm implementations perform differently at different bandwidths and RTTs. The earlier paper covered the bandwidth scenarios more broadly and in depth, with less twiddling of the constants. I also had a very long post on this list going into the problems with the testing done here, which I'll search for unless someone beats me to it. Some of the tools used in this evaluation landed in ns2 earlier this year, and I would certainly like these tests reproduced independently, with sane values for codel and fq_codel, and preferably against a simulator I trust more, like ns3. http://ieeexplore.ieee.org/xpls/abs_all.jsp?arnumber=6925768 Yep. Toke points to codel's falloff at higher rates in his paper also - this is mostly due to a problem in the control law introduced in the linux version that isn't in the ns2 version, and is nearly invisible in the fq_codel version. I do fear a similar problem is in PIE when dealing with TSO packets, but have not tested. And the load transients are a problem with any straight aqm system. https://www.duo.uio.no/handle/10852/37381 I have picked apart this paper elsewhere also. I would have liked it if, in particular, fq_codel had been used throughout the tests in comparison, particularly with ecn. To add to the comparisons: http://caia.swin.edu.au/reports/140630A/CAIA-TR-140630A.pdf While quite good, this was kind of limited to steady-state performance, and at rates below 10mbit.
I was delighted to see someone actually use cerowrt for its intended purpose, evaluating each new algorithm on real hardware, in their bachelor's thesis, but I can't find that url right now. tl;dr - both the pie and codel camps did some independent implementations and testing of the respective other algorithm, with discussions ironing out some poorly described aspects in the process; it's my understanding that this led to better-quality drafts in both instances. I tracked codel, fq_codel, pie, improvements to red (ared), sfq, sfqred, and SFB closely for the past 4 years. All of these are in linux 3.14 or later (with pie entering last). The ability to do basic testing of everything on the table is a download away, on nearly every linux distro; cerowrt has 'em all, and openwrt several... I spent GSOC2014 getting ns3 up to speed. For some reason I don't seem to have any time to write papers myself... Ya know, I'm not part of the codel or pie camp. I'm pretty firmly in the low-knobs fq + an aqm with ecn support camp. There aren't a lot of people in a codel-only camp. The algorithm is a toolkit, much like pie is a toolkit for docsis-pie, and setting up the debate as codel vs pie feels like an exercise in dialectical dualism for the sake of excluding the third alternative. Certainly codel the algorithm, and codel the stand-alone aqm, can be improved (and I have had patches for that available for a long time now), but it has taken a long time to have an adequate suite of test tools to be able to analyze the often microscopic differences between versions of the base aqm algorithms, be they pie or codel or red derived.
(For example, I was unaware, until I got a preview of toke's paper, of the degree of decline in effectiveness of codel alone at 100Mbit, having been focused primarily on finding improvements at 5mbit or below (the speed most of the edge of the internet runs at), with my test hardware peaking out at about 60mbits before htb rolls over and dies on a cpu designed in 1989. So instead of fiddling with codel I've been fiddling with a better rate shaper embedded into fq_codel.) I do certainly hope that work can move forward on the evaluation guidelines based on a wide variety of scenarios. In my own case, my biases are towards managing slow start better (vs steady-state TCPs), capturing all the latency sources (e.g. DNS, tcp syns), and enabling voip, gaming, web, and videoconferencing traffic better, at the expense of full single-flow goodput, at rates well below 100mbit and typically at baseline physical rtts in the 4-50ms range. I utterly agree that more testing is needed. However the only open source test suite (netperf-wrapper) only implements a few of the tests in the pie and docsis-pie papers, making reproduction difficult, and the earlier Richard Scheffenegger -Original Message- From: aqm [mailto:aqm-boun...@ietf.org] On Behalf Of Martin Stiemerling Sent: Wednesday, 12 November 2014
Re: [aqm] adoption call: draft-welzl-ecn-benefits
On Tue, Aug 12, 2014 at 3:24 AM, Gorry Fairhurst go...@erg.abdn.ac.uk wrote: OK, so I have many comments, see below. Gorry On 12/08/2014 10:43, Bob Briscoe wrote: Wes, and responders so far, A doc on the benefits and pitfalls of ECN is needed. Personally I wouldn't assign much of my own time as a priority for such work; I'd rather work on finding the best road through the protocol engineering. But I'm glad others are doing this. We need to be clear that this doc (at the moment) is about the benefits of 'something like RFC3168 ECN'. I think that is the right direction. I would not be interested in a doc solely about the benefits of 'classic' ECN (we had RFC2884 for that). However, if it is about the benefits of some other ECN-like thing, it will not be worth writing unless it is more concrete on what that other ECN-like thing is. At present different and sometimes conflicting ideas are floating around (I'm to blame for a few). In order to write about benefits, surely experiments are needed to quantify the benefits? +10 Alternatively, this could be a manifesto to identify /potential/ benefits of ECN that the current classic ECN is failing to give. I think at the moment it's the latter (and that's OK given that's where we have reached today). GF: If someone wishes to write this research paper, I'd be happy to join them, but it was not what I had in mind for this ID. How about the title Explicit Congestion Notification (ECN): Benefits, Opportunities and Pitfalls ? GF: I could live with that, if the group wished this! +1 We (in the RITE project) have agreed to start work on an 'ECN Roadmap' in order to identify all the potential ideas for using ECN coming out of research, and write down whether new standards will be needed for some, whether they can evolve without changing standards, which are complementary, which conflict, etc. 
I'd like to see experiments done through the free.fr network as it's the only one I know of with ecn enabled along the edge in their revolution v6 product. Presently cerowrt ships with ecn enabled on the inbound rate limiter and disabled on the outbound, I have considered enabling it by default on the outbound for connections 4mbits. (users can override these settings, of course) I don't know whether this ECN benefits doc ought to include this detailed ECN roadmap work, but if it's going to talk about something like ECN I believe it will have to include a summary of the main items on such a roadmap to be concrete. more inline... At 00:38 12/08/2014, John Leslie wrote: (I have read Michael's reply to this, but I'll respond here.) Dave Taht dave.t...@gmail.com wrote: On Mon, Aug 11, 2014 at 7:48 AM, Wesley Eddy w...@mti-systems.com wrote: This draft has been discussed a bit here and in TSVWG: http://tools.ietf.org/html/draft-welzl-ecn-benefits-01 I do think this is the right place to discuss it. As I understand, the IAB has also discussed it a bit, and would be happy if this was something that an IETF working group published. I believe the TSVWG chairs also discussed this and would be fine if the AQM working group adopted it. Thus, I am in favor of adopting it, with the understanding that it will see significant changes during our discussion. I think we can and should agree the direction of those changes in this thread. I'd rather not agree to start on a doc and plan to meander. GF: +1, we can add comments to the ID to align to this, personally I've already said that I'd like to see text on: - bleaching and middlebox requirements to deploy. - Need to verify the paths actually really *do support* ECN (sorry, but may be needed). I agree that verifying that a path can take a congestion notification e2e is important. I don't think this will be a quick (6 months) job, because of the problem of being clear about the things like ECN that it needs to talk about. 
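For anyone wanting to reproduce a cerowrt-style ECN setup on stock linux, the knobs involved look roughly like this. A sketch: "eth0" is a placeholder interface, and these are not cerowrt's shipped values, which are described in prose above:

```shell
sysctl -w net.ipv4.tcp_ecn=1                 # negotiate ECN on TCP connections
tc qdisc replace dev eth0 root fq_codel ecn  # have the AQM mark rather than drop
```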
GF: That depends also in part on whether these new mechanisms will actually change the message to potential users of transports and people considering deployment. In my mind the definition of the protocol techniques does not HAVE to be the same document that tells people *HOW* to implement this in stacks or network devices. (My own choice would be to keep these to research papers and RFCs targeted at their respective communities.) I don't share the relentless optimism of this document, and would like it - or a competing document - to go into the potential negatives. I think it should concentrate on what its name says: the benefits of ECN, both now and in an expected future; but that it should also at least mention downsides this WG sees, and that it should avoid any recommendation stronger than make ECN available to consenting applications. I agree it should be informative, rather than making too many detailed recommendations. GF: Any other bullets listing additional topics are most welcome
[aqm] Sane analysis of typical traffic
changing the title, as this is not relevant to the aqm document... but to an attitude that is driving me absolutely crazy. On Tue, Jul 15, 2014 at 10:46 AM, Akhtar, Shahid (Shahid) shahid.akh...@alcatel-lucent.com wrote: Dave, The message of the results that we presented in November is that it is possible, with currently deployed access hardware, to configure RED so that it consistently improves the end user experience of common network services over Tail-Drop (which is most often configured), and that this improvement can be achieved with a fixed set of RED configuration guidelines. We did not run experiments with sfq_codel because it is not deployed in access networks today. We ran experiments with plain CoDel to understand the difference between a well-configured RED and a more recent single-bucket AQM in our target scenarios, and as reported, didn't observe significant differences in application QoE. Your application was a bunch of video streams. Not web traffic, not voip, not gaming, not bittorrent, not a family of four doing a combination of these things, nor a small business that isn't going to use HAS at all. Please don't over-generalize your results. "RED proven suitable for a family of couch potatoes watching 4 movies at once over the internet, but not 5, at 8mbit/sec" might have been a better title for this paper. In this fictional family, just one kid under the stair, trying to do something useful, interactive and/or fun, can both wreck the couch potatoes' internet experience, and have his own wrecked also. Additional inline clarifications below. -Shahid. -Original Message- From: Dave Taht [mailto:dave.t...@gmail.com] Sent: Monday, July 14, 2014 2:00 PM To: Akhtar, Shahid (Shahid) Cc: Fred Baker (fred); John Leslie; aqm@ietf.org Subject: Re: [aqm] Obsoleting RFC 2309 On Mon, Jul 14, 2014 at 11:08 AM, Akhtar, Shahid (Shahid) shahid.akh...@alcatel-lucent.com wrote: Hi Fred, All, Let me add an additional thought to this issue.
Given that (W)RED has been deployed extensively in operators' networks, and most vendors are still shipping equipment with (W)RED, the concern is that obsoleting 2309 would discourage research on trying to find good configurations to make (W)RED work. We had previously given a presentation at the ICCRG on why RED can still provide value to operators (http://www.ietf.org/proceedings/88/slides/slides-88-iccrg-0.pdf). We have a paper at Globecom 2014 that explains this study much better, but I cannot share a link to it until the proceedings are available. My problem with the above preso, and no doubt the resulting study, is that it doesn't appear to cover the classic, most basic bufferbloat scenario, which is 1 stream up, 1 stream down, one ping (or some form of voip-like traffic), usually on an edge network with asymmetric bandwidth. Two additional analyses of use from the download perspective might be Arris's analysis of the benefits of red and fq over cable head ends: http://snapon.lab.bufferbloat.net/~d/trimfat/Cloonan_Paper.pdf and the cablelabs work, which focused more on the effects of traffic going upstream and has been discussed fairly extensively here. SA: We tried to cover the typical expected traffic over the Internet. I don't know where you get your data, but my measured edge traffic looks nothing like yours. Sure, bandwidth-wise there's the netflix spike 3 hours out of the day, but the rest sure isn't HAS. Most of the traffic is now HAS traffic (as per the sandvine report), so if only a single stream is present, it is likely to be HAS. The closest approximation of a continuous TCP stream, as you mention, would be a progressive download, which can last long enough to look continuous. These were modeled together with other types of traffic. You keep saying download, download, download.
I am saying merely: please ALWAYS try an upload at the same time you are testing downloads - be it videoconferencing (which can easily use up that 1.6mbit link), a youtube upload, an rsync backup, a scp, anything... It needn't be the crux of your paper! But adding several tests of that sort does need to inform your total modeling experience. If you do that much, you will get a feel for how present-day systems interact with things like ack clocking, which will do very interesting things to your downstream couch-potato performance metrics. It's not clear from the study that this is an 8mbit down, 1mbit up DSL network (?) SA: In the study presented, it was 8M down and 1.6M up - slide 9 Thx. Nor is it clear if RED is being applied in both directions or only one direction? SA: AQMs (including RED) were only applied in the downstream direction - slide 9 Are you going to follow up with stuff that looks at the upstream direction? (The results you get from an asymmetric network are quite interesting, particularly in the face of any cross traffic at all.) SA
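The simplest way to follow this advice is a test that loads both directions at once. netperf-wrapper's rrul test does exactly that (several TCP flows each way plus latency probes). A sketch, assuming netserver is already running on the target host and that the -H/-l flags behave as in contemporary versions of the tool:

```shell
# Bidirectional load plus latency measurement, 60 seconds.
netperf-wrapper -H testhost.example.org -l 60 rrul
```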
Re: [aqm] Obsoleting RFC 2309
On Mon, Jul 14, 2014 at 11:23 AM, Fred Baker (fred) f...@cisco.com wrote: On Jul 14, 2014, at 11:08 AM, Akhtar, Shahid (Shahid) shahid.akh...@alcatel-lucent.com wrote: Hi Fred, All, Let me add an additional thought to this issue. Given that (W)RED has been deployed extensively in operators' networks, and most vendors are still shipping equipment with (W)RED, the concern is that obsoleting 2309 would discourage research on trying to find good configurations to make (W)RED work. Well, note that we're not saying to pull RED out of the network; we're saying to not make it the default. Note that even in the networks you mention, (W)RED is not the default configuration; you have to give it several parameters, and therefore have to actively turn it on. We had previously given a presentation at the ICCRG on why RED can still provide value to operators (http://www.ietf.org/proceedings/88/slides/slides-88-iccrg-0.pdf). We have a paper at Globecom 2014 that explains this study much better, but I cannot share a link to it until the proceedings are available. One of the major reasons why operators chose not to deploy (W)RED was a number of studies and research which gave operators conflicting messages on the value of (W)RED and appropriate parameters to use. Some of these are mentioned in the presentation above. In it we show that the previous studies which showed low value for RED used web traffic with very small file sizes (on the order of 5-10 packets), which reduces the effectiveness of all AQMs that work by dropping or ECN-marking flows to indicate congestion. Today's traffic is composed mostly of multimedia traffic like HAS or video progressive download, which has much larger file sizes and can be controlled much better with AQMs, and in our research we show that RED can be quite effective with this traffic, with little tuning needed for typical residential access flows.
I prefer John's proposal of updating 2309 rather than obsoleting it, but if we can have some text in Fred's draft acknowledging the large deployment of (W)RED and the need to still find good configurations - that may work. I can volunteer to provide that text. The existing draft doesn't mention any specific AQM algorithms. It seems to me that the more consistent approach would be to write a short draft documenting WRED, that the WG could pass along as informational or experimental on the basis of not meeting the requirements of being self-configuring/tuning, at the same time as it passes along others as PS or whatever. I strongly support a good, consistent set of recommendations for how, when, and where to use and not use WRED. -Shahid. -Original Message- From: aqm [mailto:aqm-boun...@ietf.org] On Behalf Of Fred Baker (fred) Sent: Monday, July 14, 2014 2:06 AM To: John Leslie Cc: aqm@ietf.org Subject: Re: [aqm] Obsoleting RFC 2309 On Jul 3, 2014, at 10:22 AM, John Leslie j...@jlc.net wrote: It would be possible for someone to argue that restating a recommendation from another document weakens both statements; but I disagree: We should clearly state what we mean in this document, and I believe this wording does so. The argument for putting it in there started from the fact that we are obsoleting 2309, as stated in the charter. I would understand a document that updates 2309 to be in a strange state if 2309 is itself made historic or obsolete. So we carried the recommendation into this document so it wouldn't get lost. ___ aqm mailing list aqm@ietf.org https://www.ietf.org/mailman/listinfo/aqm -- Dave Täht NSFW: https://w2.eff.org/Censorship/Internet_censorship_bills/russell_0296_indecent.article
Re: [aqm] [iccrg] Fwd: New Version Notification for draft-irtf-iccrg-tcpeval-01.txt
I like what I see here. I will have a few suggestions for the text after the (USA) holiday... On Jul 4, 2014 5:11 AM, David Ros d...@simula.no wrote: Dear all, After a long hiatus, we have finally posted an update to the TCP evaluation suite. Comments are welcome. Besides mostly editorial fixes, the main changes wrt version -00 that David Hayes presented in Berlin (*) concern parameter values for several scenarios (some have been fixed, some have been added). (*) We just realised version -00 never got posted to the IETF datatracker, so there was only a privately-hosted version online; we've just fixed this omission. Our apologies. Thanks, David (as individual) Begin forwarded message: From: internet-dra...@ietf.org Subject: New Version Notification for draft-irtf-iccrg-tcpeval-01.txt Date: 4 Jul 2014 13:50:53 GMT+2 To: Lachlan L.H. Andrew lachlan.and...@monash.edu, Lachlan L.H. Andrew lachlan.and...@monash.edu, David Hayes davi...@ifi.uio.no, Sally Floyd fl...@acm.org, David Ros d...@simula.no, David Ros d...@simula.no, Sally Floyd fl...@acm.org, David Hayes davi...@ifi.uio.no A new version of I-D, draft-irtf-iccrg-tcpeval-01.txt has been successfully submitted by David Ros and posted to the IETF repository. Name: draft-irtf-iccrg-tcpeval Revision: 01 Title:Common TCP Evaluation Suite Document date:2014-07-04 Group:iccrg Pages:34 URL: http://www.ietf.org/internet-drafts/draft-irtf-iccrg-tcpeval-01.txt Status: https://datatracker.ietf.org/doc/draft-irtf-iccrg-tcpeval/ Htmlized: http://tools.ietf.org/html/draft-irtf-iccrg-tcpeval-01 Diff: http://www.ietf.org/rfcdiff?url2=draft-irtf-iccrg-tcpeval-01 Abstract: This document presents an evaluation test suite for the initial assessment of proposed TCP modifications. 
The goal of the test suite is to allow researchers to quickly and easily evaluate their proposed TCP extensions in simulators and testbeds using a common set of well- defined, standard test cases, in order to compare and contrast proposals against standard TCP as well as other proposed modifications. This test suite is not intended to result in an exhaustive evaluation of a proposed TCP modification or new congestion control mechanism. Instead, the focus is on quickly and easily generating an initial evaluation report that allows the networking community to understand and discuss the behavioral aspects of a new proposal, in order to guide further experimentation that will be needed to fully investigate the specific aspects of such proposal. Please note that it may take a couple of minutes from the time of submission until the htmlized version and diff are available at tools.ietf.org. The IETF Secretariat ___ iccrg mailing list ic...@irtf.org https://www.irtf.org/mailman/listinfo/iccrg ___ aqm mailing list aqm@ietf.org https://www.ietf.org/mailman/listinfo/aqm
[aqm] aqm conference call results?
There were some slides presented on the aqm evaluation guide's directions that I'd like to see again. Link? As it is being broken up into an overview and a second document detailing tests, I'd like people to look over the tests proposed in http://tools.ietf.org/html/draft-sarker-rmcat-eval-test-01 as a possible inspiration. While I like the above a lot, it bothers me that it is only targeted at very low bandwidth scenarios (4mbit being the topmost). There are hopefully other tests proposed by other relevant working groups (ippm, http 2.0, sctp come to mind immediately) that I'd like to be aware of, and yet don't have the energy to sort through each wg to find. If there is a way to get a list of tests each wg considers important to work with, that would be a starting point. -- Dave Täht ___ aqm mailing list aqm@ietf.org https://www.ietf.org/mailman/listinfo/aqm
Re: [aqm] New Version Notification for draft-baker-aqm-sfq-implementation-00.txt
On Tue, Jun 24, 2014 at 1:01 PM, Fred Baker (fred) f...@cisco.com wrote: On Jun 24, 2014, at 12:45 PM, Daniel Havey dha...@yahoo.com wrote: So IMHO it really doesn't matter except in the weird corner case where a running flow has already bloated the queue and then we switch on the AQM. Hmm? In practice, changing the qdisc in Linux, at least, does completely blow up the existing queue: all packets are discarded, the various data structures removed, the new data structures created, then switched. Don't do that. Do it once, at init time, or before address acquisition. Simple schemes can be handled now (linux 3.13 and later) by a single sysctl variable, set either in /etc/sysctl.conf or via sysctl -w net.core.default_qdisc=fq_codel # or pie, or sfq, or fq Arguably this needs to allow for arguments, and to be more flexible and interface specific. The same goes for enabling ecn or not (net.ipv4.tcp_ecn=0). More complex implementations, like htb, have a default direction things go until they are fully set up. Other linux implementations, like drr and qfq, do not, and result in packet loss until entirely set up. Recently, support for a plug scheduler was developed in order to assist vm migration, which might make it more possible to switch out qdiscs without interrupting service. That actually has me a little worried with the 100 ms delay built into codel: the initial delay until it finds a sane delay to drop at. Imagine, if you will, that you have a deep buffer at the bottleneck and a relatively short RTT, and are moving a large file. I could imagine codel's delay allowing a session to build a large backlog and then "suddenly turning on AQM". On a 10 ms link with O(200) packets of queue depth, for example, you could build up 100 ms plus of data in the queue, spend the interval mostly emptying it, and then drop the last queued packet because 100 ms had gone by, there was still data in the queue, and the next packet had sat there longer than 5 ms.
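The "do it once, at init time" advice above can be sketched as a small boot-time script. This is a sketch under assumptions: eth0 is an assumed interface name, and whether fq_codel, pie, or fq is the right choice depends on the host's role.

```shell
# Hypothetical init-time sketch: pick the qdisc once, before traffic
# starts, rather than swapping qdiscs on a live, loaded queue.
# "eth0" is an assumed interface name.

# System-wide default for interfaces brought up from now on (linux 3.13+):
sysctl -w net.core.default_qdisc=fq_codel   # or pie, or sfq, or fq

# Or set it explicitly, per interface, at init time:
tc qdisc replace dev eth0 root fq_codel

# ECN policy is a separate, host-wide knob:
sysctl -w net.ipv4.tcp_ecn=1   # 0 disables; 1 enables when the peer requests it
```

Run at init (or before address acquisition, per the advice above), the queue blown away by the qdisc swap is empty, so nothing is lost.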
Given that pie depends on an estimation window being filled, this is not a problem pie has. However, needing that window filled is a big problem at low bandwidths for pie. As for codel: well, there is a specific inhibit in present forms of codel to not drop the last packet in the queue even if it has sat there too long. Codel stops dropping at minbytes (called maxpacket in the code), which is a variable determined from the flow characteristics, and is usually 1 MTU in size, but can be larger if TSO or GRO are in operation on the device. The first versions of fq_codel preserved this behavior: it would never drop the last packet in any fq_codel queue. This (still) seems like desirable behavior in the case of having nearly one queue per flow, but it led inevitably to what I had called the horizontal standing queue problem (where we could end up with 1024 queues, all with one packet, and no longer meeting the latency target(s)). So eric made the backlog maxpacket check global to all queues, and that's what's been deployed ever since. Later work (I think) is showing that in practice any inhibit at all hurts on the architectures available, as htb or (bql and the tx-ring) are already buffering up packets below where codel was dropping from near-head. More packets will always be along, later. This patch disables the maxpacket check entirely, and results in a space and cpu savings, without much observable negative or positive effect on latency and utilization at the bandwidths available to me. I remain a bit concerned about what happens with TSO and/or GRO enabled. http://snapon.lab.bufferbloat.net/~cero2/0003-codel-eliminate-maxpacket-variable.patch I'd love it if people tried it. Of higher concern to me has long been more sanely applying hysteresis in the drop rate over wildly varying high bandwidths and loads, but not a lot of work has gone into codel since its inception, as it was so good to start with, and so dramatically improved by fq_codel, as to be barely worth debating.
But certainly better control laws are welcomed! -- Dave Täht NSFW: https://w2.eff.org/Censorship/Internet_censorship_bills/russell_0296_indecent.article
Re: [aqm] New Version Notification for draft-baker-aqm-sfq-implementation-00.txt
On Tue, Jun 24, 2014 at 1:48 PM, Fred Baker (fred) f...@cisco.com wrote: On Jun 24, 2014, at 1:33 PM, Daniel Havey dha...@yahoo.com wrote: There may be scenarios where the interaction of the interval, the RTT and the bandwidth cause this to happen recurringly, constantly underflowing the bandwidth. To be honest, the real concern is very long delay paths, and it applies to AQM algorithms generally. During TCP slow start (which is not particularly slow, but contains exponential growth), we have an initial burst, which with TCP Offload Engines can, I'm told, spit 65K bytes out in the initial burst. The burst travels somewhere and results in a set of acks, which presumably arrive at the sender at approximately the rate the burst went through the bottleneck, but elicit a burst roughly twice as fast as the bottleneck. That happens again and again until either a loss/mark event is detected or cwnd hits ssthresh, at which point the growth of cwnd becomes linear. I think tcp offloads have been thoroughly shown by now to blow up all sorts of networks, and there has been a lot of work in recent linux kernels for hosts to mitigate it (use smaller bursts), most recently the sch_fq + pacing work. The objective of slow start is to fill the pipe, and especially in the case of long rtts, as in satellite and lte networks, it needs to be, well, slower. tcp offloads are an assist for slower cpus and a per-ethernet-device feature to get more bandwidth for less cpu... at the cost of latency, bursty loss, and packet mixing. Modern x86 hardware can easily saturate gigE links without TSO in use at all. Many lower-end (arm) products can't, as yet, and 10GigE is still the realm of TSO (with mitigations arriving in software as per above). I do hope things like TSO2 (bursts of 256k packets) are not widely adopted, and that smarter mixing happens on multi-queued ethernet devices instead.
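The per-RTT doubling Fred describes can be made concrete with a little arithmetic. The numbers below are illustrative assumptions, not values from the thread: an initial window of 10 packets and an ssthresh of 100 packets.

```shell
# Illustrative only: count round trips of slow-start doubling from an
# assumed initial window of 10 packets until cwnd crosses an assumed
# ssthresh of 100 packets. Each RTT elicits a burst roughly twice the
# size of the previous one.
awk 'BEGIN {
  cwnd = 10; rtts = 0
  while (cwnd < 100) { cwnd *= 2; rtts++ }
  printf "%d RTTs, final burst of %d packets\n", rtts, cwnd
}'
# prints: 4 RTTs, final burst of 160 packets
```

The point is how few round trips it takes for the bursts to overshoot the bottleneck, which is why the loss/mark event (or pacing, per the sch_fq work above) matters so much.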
If the burst is allowed to use the entire memory of the bottleneck system's interface, it will very possibly approach the capacity of the bottleneck. However, with pretty much any AQM algorithm I'm aware of, the algorithm will sense an issue and drop or mark something, kicking the session into congestion avoidance relatively early. Big bursts are bad. Let packets be packets! Kicking things into congestion avoidance early turns out to have interesting interactions with hystart. This is well-known behavior, and something we have a couple of RFCs on. But yes, it can happen on more nominal paths as well. -- Dave Täht
[aqm] AQM in every buffer?
In the other long thread, Gorry said something that didn't quite ring true with me: our goal should be AQM in every buffer. Well, that's somewhat desirable but not doable (at least in my world) - 1) The device has sufficient buffering to get at least one packet out. 2) There's a tx ring which holds packets for the device to pick up from. 3) In linux now (and some older cisco boxes) there is this thing called byte queue limits (BQL) which moderates the tx ring to only have enough data in it to keep the device busy. 4) These layers give the upper portions of the stack time to think harder about what to put on the tx ring. *Ideally* an AQM should have a picture of the total buffering in the system all the way to the wire, but in practice, at higher speeds, once things are controlled by BQL, it's a trivial amount of extra buffering. (This is partially why I get nonplussed by people dissing drop head, when what's on the tx ring is already past the drop-head point of the AQM layer.) Now, I imagine that at least some hardware switches *could* have a picture all the way to the wire, but I doubt that it's feasible, also. -- Dave Täht
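On a Linux box with a BQL-capable driver, the per-queue limits in layer 3 above can be inspected via sysfs. A sketch, assuming an interface named eth0 (an assumption; not all drivers expose BQL):

```shell
# Sketch: inspect byte queue limits on an assumed interface "eth0".
# Drivers without BQL support will not populate these directories.
for q in /sys/class/net/eth0/queues/tx-*/byte_queue_limits; do
    echo "$q:"
    echo "  limit:     $(cat "$q/limit") bytes currently allowed onto the tx ring"
    echo "  limit_max: $(cat "$q/limit_max") hard ceiling"
    echo "  inflight:  $(cat "$q/inflight") bytes on the ring right now"
done
```

On a busy gigE link the inflight figure is typically tens of kilobytes, which is the "trivial amount of extra buffering" below the AQM that the paragraph above refers to.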
Re: [aqm] AQM conference call - June 24
On Mon, Jun 2, 2014 at 8:19 AM, Wesley Eddy w...@mti-systems.com wrote: Hello, we're planning on holding an AQM conference call on June 24 at 1PM US/Eastern time. We'll publish webex/telecon coordinates closer to the day. This is just a notification for calendar planning purposes. I don't have webex capability. Do have google hangouts. Can dial in. We will also have it announced to the IETF announcement list soon. The goal of this is to give us a chance to focus some higher bandwidth discussion around the working group milestones, and hopefully make a little bit of progress prior to the next actual working group meeting at IETF 90. I'm hoping that we need no more than about an hour and a half. The rough agenda (for bashing) is: 1 - discuss overall WG status quickly 2 - discuss state of the 2309bis / recommendation draft - if any editors or people with comments are online, this will be a chance to discuss any remaining items that haven't converged through the mailing list yet 3 - discuss state of evaluation guidelines / scenarios - if one of the editors is available, we'd like them to share plans and status briefly 4 - discuss possibly adopting algorithms, as mentioned on the mailing list and get some feedback on this I am interested in feedback and discussion on the following two drafts before that date, if possible: http://tools.ietf.org/html/draft-nichols-tsvwg-codel-02 http://tools.ietf.org/html/draft-hoeiland-joergensen-aqm-fq-codel-00 IF the wg is interested in seeing this draft completed before ietf, let me know soonest: http://snapon.lab.bufferbloat.net/~d/draft-taht-home-gateway-best-practices-00.html I would probably accompany it with a preso on the case for comprehensive queue management, talking about the expected network behavior and test suites developed by other wgs like webrtc. 
5 - plan agenda for Toronto -- Wes Eddy MTI Systems -- Dave Täht
Re: [aqm] last call results on draft-ietf-aqm-recommendation
I agree with the complement language. I don't mind if they are separable. Integration, however, is highly advantageous. I started another thread on the backlog issue. Because scheduling requires policy and AQM doesn't. Machine-gunning down packets randomly until the flows start to behave does not require any policy, agreed. A 5-tuple fq system is not a lot of policy to impose. Certainly qos and rate scheduling systems impose a lot more policy. Actually, I'm going to retract part of what I just said. Everything is a policy. Drop tail is a policy; it's useful for e2e mechanisms like ledbat if the queue size is greater than 100ms. Not helpful for bufferbloat. Drop head is a policy; it's useful for voip (actually useful for tcp too). Not helpful for ledbat. Shooting randomly and increasingly until flows get under control is a decent compromise between drop head and drop tail, though it also shoots at a lot of packets it doesn't need to. drr is a policy that does better mixing and does byte fairness. sfq is a policy that does better mixing with packet fairness. qfq does weighted fq. red/ared/wred is a policy. hfsc is a policy that does interesting scheduling and drop things all its own. htb-based policies are often complex and interesting. So the problem is in defining what policies are needed and what algorithms can be used to implement each policy. May the ones that provide the best QoE for the end user succeed in the marketplace, and networks get ever better. https://www0.comp.nus.edu/~bleong/publications/pam14-ispcheck.pdf So operators don't want to have to face the dilemma of needing the AQM part, but not being able to have it because they don't want the policy implicit in the scheduling part. A dilemma of choosing which single line of code to incorporate in an otherwise far more complex system? I certainly do wish it was entirely parameterless, and perhaps a future version could be more so than it is today.
I can write up the complexity required to do, for example, qfq + pie, but it would be a great deal longer than the below, and qfq + RED, or red alone, is much longer than either. Scripting is needed to configure those...

# To do both AQM + DRR at the same time, with reasonable defaults for 4mbit-10gbit:
tc qdisc add dev your_device root fq_codel

# AQM only (ecn not presently recommended):
tc qdisc add dev your_device root codel
# or (functional equivalent):
tc qdisc add dev your_device root fq_codel flows 1 noecn

# (You could also replace the default tc filter, to get, like,
# a 4-queued system on dscp...)

# DRR + SQF-like behavior with minimal AQM, probably mostly reverting
# to drop head from the largest queue (with the largest delay I consider
# even slightly reasonable):
tc qdisc add dev your_device root fq_codel target 250ms interval 2500ms

# If your desire is to completely rip out the codel portion of fq_codel,
# that's doable. I know a fq_pie exists, too.

# Reasonable default for satellite systems (might need to be closer to 120ms,
# and given the speed of most satellites, quantum 300 makes sense, as does
# a reduced mtu and IW):
tc qdisc add dev your_device root fq_codel target 60ms interval 1200ms

# A useful option for lower-bandwidth systems is quantum 300.

# Data-center-only use can run at a reduced target and interval:
tc qdisc add dev your_device root fq_codel target 500us interval 10ms

# Above 10Gbit, increasing the packet limit is good, and probably a good idea
# to increase flows. A current problematic interaction with htb below 2.5mbit
# leads to a need for a larger target (it would be better to fix htb or to
# write a better rate limiter).

It's about a page of directions to handle every use case. I'd LOVE to have similar guideline and cookbook page(s) for EVERY well-known aqm and packet scheduling system - notably red and ared. I lack data on pie's scalability presently, too.
Most rate shaping code on top of this sort of stuff, and most shaping/qos-related code, is orders of magnitude more complex than this. Take htb's compensator for ATM and/or PPPoE framing. Please. Or the hideous QoS schemes people have designed using DPI. As things stand, fq_codel is a simpler/faster/better drop-in replacement for tons of code that shaped and used RED, or shaped and did sfq. Sensing the line rate, choosing an appropriate packet limit based on available memory, and auto-choosing the number of flows are things the C code could be smarter about. They are something I currently do in a shell script (that also tries to figure out atm framing and a 3-tier qos system). I think that adding a rate limiter directly to an fq_codel or wfq + codel derived algo is a great idea and would be better than htb or hfsc + X. Been meaning to polish up the code... This is critical for fq_codel, because apparently CoDel alone is not recommended (which I would agree with). The present version of that is useful (without ecn) in many scenarios. It has been used in combination with hfsc, htb, and standalone. We've long
Re: [aqm] chrome web page benchmarker fixed
Doug Orr recommended to us that we give http://www.chromium.org/developers/telemetry a shot in generating reproducible web traffic models.
Re: [aqm] the side effects of 330ms lag in the real world
On Tue, Apr 29, 2014 at 12:56 AM, Mikael Abrahamsson swm...@swm.pp.se wrote: On Tue, 29 Apr 2014, Fred Baker (fred) wrote: A couple of points here. 1) The video went viral, and garnered over 600,000 new hits in the 12 hours since I posted it here. There is pent-up demand for less latency. While the ad conflates bandwidth with latency, they could have published their RTTs on their local fiber network, which is probably a great deal less than dsl or cable. That counts for a lot when accessing local services. 2) There are a lot of things an ISP can do to improve apparent latency on the long haul: A) co-locating with a major dns server like f-root to reduce dns latency B) co-locating with major services like google and netflix (publishing ping times to google, for example, might be a good tactic) C) better peering Well, we could discuss international communications. I happen to be at Infocom in Toronto, VPN'd into Cisco San Jose, and did a ping to you: Yes, but as soon as you hit the long distance network the latency is the same regardless of access method. So while I agree that understanding the effect of latency is important, it's no longer a meaningful way of selling fiber access. A fiber last mile instead of ADSL2+ won't improve your long-distance latency. Well, it chops a great deal from the baseline physical latency, and most people tend to access resources closer to them rather than farther away. An American in Paris might want to access the NYT, but Parisians read Le Monde. Similarly, most major websites are replicated and use CDNs to distribute their data closer to the user. The physical RTT of the last mile matters more and more as resources are co-located in the local data center. -- Dave Täht
Re: [aqm] [Bloat] the side effects of 330ms lag in the real world
On Tue, Apr 29, 2014 at 9:44 AM, Jim Gettys j...@freedesktop.org wrote: On Tue, Apr 29, 2014 at 3:56 AM, Mikael Abrahamsson swm...@swm.pp.se wrote: On Tue, 29 Apr 2014, Fred Baker (fred) wrote: Well, we could discuss international communications. I happen to be at Infocom in Toronto, VPN'd into Cisco San Jose, and did a ping to you: Yes, but as soon as you hit the long distance network the latency is the same regardless of access method. So while I agree that understanding the effect of latency is important, it's no longer a meaningful way of selling fiber access. A fiber last mile instead of ADSL2+ won't improve your long-distance latency. FIOS bufferbloat is a problem too. Measured bufferbloat, on symmetric 25/25 service in New Jersey at my in-laws' house, is 200ms (on the ethernet port of the Actiontec router provided by Verizon). So latency under load is the usual problem. ESR's link, before and after the cerowrt SQM treatment: https://www.bufferbloat.net/projects/codel/wiki/RRUL_Rogues_Gallery#Verizon-FIOS-Testing-at-25Mbit-up-and-25Mbit-down Why would you think the GPON guys are any better in principle than cable or DSL? Cable and DSL may be somewhat worse, just because the gear is older and downward compatibility means that new modems on low bandwidth tiers are even more grossly overbuffered. Well, buffering on the DSLAM or CMTS needs to be more actively managed. Fixed limits are much like conventional policing: always either too large or too small, for sustained or bursty traffic respectively. I have been fiddling with Tim Shepard's udpburst tool as a quick means of measuring head-end buffering, even with fq_codel present on the inbound. (It's not suitable for open internet use as yet, but code in progress can be had or enhanced at https://github.com/dtaht/isochronous .) I just added ecn and tos setting support to it.
server: ./udpburst -S -E -D 32 # Server mode, enable ECN marking, set dscp to 0x20 (CS1) client: This is from a 22Mbit down CMTS d@nuc:~/git/isochronous$ ./udpburst -f 149.20.63.30 -E -C -d -n 400 -s 1400 1400 bytes -- received 382 of 400 -- 365 consecutive 0 ooo 0 dups 2 ect .. . ... . ... . .... . . 2 2 2 2 2 2 2 2 2 2 2 2 2 2 2 2 2 2 2 2 2 2 2 2 2 2 2 2 2 2 2 2 2 2 2 2 2 2 2 2 2 2 2 2 2 2 2 2 2 2 2 2 2 2 2 2 2 2 2 2 2 2 2 2 2 2 2 2 2 2 2 2 2 2 2 2 2 2 2 2 2 2 2 2 2 2 2 2 2 2 2 2 2 2 2 2 2 2 2 2 2 2 2 2 2 2 2 2 2 2 2 2 2 2 2 2 2 2 2 2 2 2 2 2 2 2 2 2 2 2 2 2 2 2 2 2 2 2 2 2 2 2 2 2 2 2 2 2 2 2 2 2 2 2 2 2 2 2 2 2 2 2 2 2 2 2 3 2 2 2 2 2 2 2 2 2 2 2 2 2 2 2 2 2 2 2 2 2 2 2 2 2 2 2 2 2 2 2 2 2 2 2 2 2 2 2 2 2 2 2 2 2 2 2 2 2 2 2 2 2 2 2 2 2 2 2 2 2 2 2 2 2 2 2 2 2 2 2 2 2 2 2 2 2 2 2 2 2 2 2 2 2 2 2 2 2 2 2 2 2 2 2 2 2 2 2 2 2 2 2 2 2 2 2 2 2 2 2 2 2 2 2 2 2 2 2 2 2 2 2 2 2 2 2 2 2 2 2 2 2 2 2 2 2 2 2 2 2 2 2 2 2 2 2 2 2 2 2 2 2 2 2 3 2 2 2 2 2 2 2 2 2 2 2 2 2 2 2 2 2 2 2 2 2 2 2 2 2 2 2 2 2 2 2 2 2 2 2 2 2 2 2 2 2 2 2 2 2 2 2 2 2 2 2 2 2 2 2 2 2 or roughly 512k of buffering. A DSL link (6400 down) d@puck:~/git/isochronous$ ./udpburst -f snapon.lab.bufferbloat.net -n 100 -C -d -s 1000 1000 bytes -- received 71 of 100 -- 71 consecutive 0 ooo 0 dups 0 ect .. 2 2 2 2 2 2 2 2 2 2 2 2 2 2 2 2 2 2 2 2 2 2 2 2 2 2 2 2 2 2 2 2 2 2 2 2 2 2 2 2 2 2 2 2 2 2 2 2 2 2 2 2 2 2 2 2 2 2 2 2 2 2 2 2 2 2 2 2 2 2 or roughly 64k worth of buffering. Interestingly the bandwidth disparity between the server (gigE in isc.org's co-lo), is so great that fq_codel can't kick in before the 64k dslam buffer is overrun. You can look at the netalyzr scatter plots in http://gettys.wordpress.com/2010/12/06/whose-house-is-of-glasse-must-not-throw-stones-at-another/ Now, if someone gives me real fiber to the home, with a real switch fabric upstream, rather than gpon life might be somewhat better (if the switches aren't themselves overbuffered But so far, it isn't. 
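As a sanity check on the "roughly 512k" and "roughly 64k" estimates above, the back-of-the-envelope arithmetic is just consecutive packets accepted before the first loss, times packet size (the DSL figure comes out nearer 70k than 64k; the rounding down to a likely power-of-two buffer size is my guess, not something stated above):

```shell
# Buffer-depth estimate from a udpburst run:
# consecutive packets accepted before the first drop, times packet size.
awk 'BEGIN {
  printf "CMTS: %d bytes\n", 365 * 1400   # 22Mbit cable head-end, 1400B packets
  printf "DSL:  %d bytes\n", 71 * 1000    # 6400kbit DSLAM, 1000B packets
}'
# prints:
# CMTS: 511000 bytes
# DSL:  71000 bytes
```

This only bounds the head-end buffer from below, of course; a longer burst than the buffer can absorb is needed for the estimate to be meaningful at all.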
- Jim -- Mikael Abrahamsson email: swm...@swm.pp.se -- Dave Täht
Re: [aqm] [Bloat] the side effects of 330ms lag in the real world
On Tue, Apr 29, 2014 at 10:01 AM, Toke Høiland-Jørgensen t...@toke.dk wrote: Jim Gettys j...@freedesktop.org writes: Now, if someone gives me real fiber to the home, with a real switch fabric upstream, rather than gpon, life might be somewhat better (if the switches aren't themselves overbuffered). But so far, it isn't. As a data point for this, I have fibre to my apartment building and ethernet into the apartment. I get .5 ms to my upstream gateway and about 6 ms to Google. Still measured up to ~20 ms of bufferbloat while running at 100 Mbps... http://files.toke.dk/bufferbloat/data/karlstad/cdf_comparison.png I need to note that what this wonderfully flat CDF for the measurement stream shows is that short flows under fq_codel leap to the head of the queue ever better as you get more and more bandwidth available. The background load flows not shown on this graph are experiencing 5-20ms worth of latency in each direction, as per codel's algorithm. A better test (in progress) would measure typical voip behaviors. However, as that graph shows, it is quite possible to completely avoid bufferbloat by deploying the right shaping. It does not completely avoid bufferbloat; the fq_codel fast queue merely eliminates queuing delay for sparse flows - things like arp, syn, syn/ack, dns, ntp, etc., as well as the first packet of any flow that has not built up a queue yet (which is, admittedly, quite a lot of bufferbloat reduction). The rest of the magic comes from codel. And in that case fibre *does* have a significant latency advantage. The best latency I've seen to the upstream gateway on DSL has been ~12 ms. And reduced RTT = money. This piece states observed average RTTs at peak times were 17ms for fiber, 28ms for cable, and 44ms for DSL: http://www.igvita.com/2012/07/19/latency-the-new-web-performance-bottleneck/ I don't know if the underlying report measures baseline unloaded last mile RTT.
-Toke -- Dave Täht
[aqm] the side effects of 330ms lag in the real world
pretty wonderful experiment and video http://livingwithlag.com/ -- Dave Täht
Re: [aqm] chrome web page benchmarker fixed
On Fri, Apr 18, 2014 at 1:41 PM, Greg White g.wh...@cablelabs.com wrote: On 4/18/14, 1:05 PM, Dave Taht dave.t...@gmail.com wrote: On Fri, Apr 18, 2014 at 11:15 AM, Greg White g.wh...@cablelabs.com wrote: The choice of RTTs also came from the web traffic captures. I saw RTTmin=16ms, RTTmean=53.8ms, RTTmax=134ms. Get a median? Median value was 62ms. My own stats are probably quite skewed lower from being in california, and doing some tests from places like isc.org in redwood city, which is insanely well co-located. Mine are probably skewed too. I was told that the global median (at the time I collected this data) was around 100ms. Well, the future is already here, just not evenly distributed. Nearly every sample I'd taken at the same time, almost entirely from major cities, came in at under a 70ms median. It strikes me that a possibly useful metric would be object size vs RTT, over time. -- Dave Täht
Re: [aqm] chrome web page benchmarker fixed
On Fri, Apr 18, 2014 at 11:15 AM, Greg White g.wh...@cablelabs.com wrote: Dave, We used the 25k object size for a short time back in 2012 until we had resources to build a more advanced model (appendix A). I did a bunch of captures of real web pages back in 2011 and compared the object size statistics to models that I'd seen published. Lognormal didn't seem to be *exactly* right, but it wasn't a bad fit to what I saw. I've attached a CDF. That does seem a bit large on the initial 20%. Hmm. There is a second kind of major case, where you are moving around on the same web property, and hopefully many core portions of the web page(s), such as the css and javascript, basic logos and other images, are cached. Caching is handled two ways: one is to explicitly mark the data as cacheable for a certain period; the other is an if-modified-since request, which costs RTTs for setup and the query. I am under the impression that we generally see a lot more of the latter than the former these days. The choice of 4 servers was based somewhat on logistics, and also on a finding that across our data set, the average web page retrieved 81% of its resources from the top 4 servers. Increasing to 5 servers only increased that percentage to 84%. The choice of RTTs also came from the web traffic captures. I saw RTTmin=16ms, RTTmean=53.8ms, RTTmax=134ms. Get a median? My own stats are probably quite skewed lower from being in california, and doing some tests from places like isc.org in redwood city, which is insanely well co-located. Much of this can be found in https://tools.ietf.org/html/draft-white-httpbis-spdy-analysis-00 Thx! In many of the cases that we've simulated, the packet drop probability is less than 1% for DNS packets. In our web model, there are a total of 4 servers, so 4 DNS lookups assuming none of the addresses are cached. (I think we have the ability to get a better number for dns loss now.)
If PLR = 1%, there would be a 3.9% chance of losing one or more DNS packets (with a resulting ~5 second additional delay on load time). I've probably oversimplified this, but Kathie N. and I made the call that it would be significantly easier to just do this math than to build a dns implementation in ns2. The specific thing I've been concerned about was not the probability of a dns loss, although as you note the consequences are huge - but the frequency and cost of a cache miss and the resulting fill. This is a very simple namebench test against the alexa top 1000: http://snapon.lab.bufferbloat.net/~d/namebench/namebench_2014-03-20_1255.html This is a more comprehensive one, taken against my own recent web history file: http://snapon.lab.bufferbloat.net/~d/namebench/namebench_2014-03-24_1541.html Both of these were taken against the default SQM system in cerowrt against a cable modem, so you can pretty safely assume the ~20ms (middle) knee in the curve is basically based on physical RTT to the nearest upstream DNS server. And it's a benchmark, so I don't generally believe in the relative hit ratios vis-a-vis normal traffic, but I do think the baseline RTT, and the knees in the curves for the cost of a miss and fill, are relevant. (It's also not clear to me if all cable modems run a local dns server.) Recently simon kelly added support for gathering hit and miss statistics to dnsmasq 2.69. They can be obtained via a simple dns lookup, as answers to queries of class CHAOS and type TXT in domain bind. The domain names are cachesize.bind, insertions.bind, evictions.bind, misses.bind, hits.bind, auth.bind and servers.bind. An example command to query this, using the dig utility, would be dig +short chaos txt cachesize.bind It would be very interesting to see the differences between dnsmasq without DNSSEC, with DNSSEC, and with DNSSEC and --dnssec-check-unsigned (checking for proof of non-existence) - we've been a bit concerned about the overheads of the last in particular.
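The 3.9% figure above checks out: with four independent lookups, each surviving with probability 0.99, the chance of losing at least one is 1 - 0.99^4.

```shell
# Probability of losing one or more of 4 DNS packets at a 1% PLR:
#   1 - (1 - 0.01)^4
awk 'BEGIN { printf "%.4f\n", 1 - (1 - 0.01)^4 }'
# prints: 0.0394
```

The same one-liner generalizes to other PLRs or lookup counts, which is presumably why doing this math beat building a dns implementation in ns2.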
Getting more elaborate stats (hit, miss, and fill costs) is under discussion. We've open sourced the web model (it's on Kathie's web page and will be part of ns2.36) with an encouragement to the community to improve on it. If you'd like to port it to ns3 and add a dns model, that would be fantastic. As part of the google summer of code I am signed up to mentor a student with tom for the *codel related bits in ns3, and certainly plan to get fingers dirty in the cablelabs drop, and there was a very encouraging patch set distributed around for tcp-cubic with hystart support recently as well as a halfway decent 802.11 mac emulation. As usual, I have no funding, personally, to tackle the job, but I'll do what I can anyway. It would be wonderful to finally have all the ns2 and ns3 code mainlined for more people to use it. -- Dave Täht
[aqm] The Effect of Network and Infrastructural Variables on SPDY's Performance
Last night's reading was quite good: http://arxiv.org/pdf/1401.6508.pdf As RTT goes up, it becomes increasingly expensive for HTTPS to establish separate connections for each resource. Each HTTPS connection costs one round trip on TCP handshaking and a further two on negotiating SSL setup. SPDY does this only once (per server) and hence reduces such large waste by multiplexing streams over a single connection. ... the separation between RTT and bandwidth is not particularly distinct. This is because HTTPS tends to operate in a somewhat network-unfriendly manner, creating queueing delays where bandwidth is low. The bursty use of HTTPS' parallel connections creates congestion at the gateway queues, causing up to 3% PLR and inflating RTT by up to 570%. In contrast, SPDY causes negligible packet loss at the gateway. The network-friendly behaviour of SPDY is particularly interesting as Google has recently argued for the use of a larger IW for TCP [7]. The aim of this is to reduce round trips and speed up delivery, an idea which has been criticised for potentially causing congestion. One question here is whether or not this is a strategy that is specifically designed to operate in conjunction with SPDY. To explore this, we run further tests with bandwidth fixed at 1Mbps (all other parameters as above). For HTTPS, it appears that the critics are right: RTT and loss increase greatly with larger IWs. In contrast, SPDY achieves much higher gains when increasing the IW, without these negative side effects. And then they inject packet loss: we inspect the impact of packet loss on SPDY's performance. We fix RTT at 150ms Sigh, the rest of the paper is pretty good, but they should have looked at packet loss at 10-30ms at least. and BW at 1Mbps, varying packet loss using the Linux kernel firewall with a stochastic proportional packet processing rule between 0 and 3%. Figure 6 presents the results.
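The setup-cost asymmetry quoted above (three round trips per HTTPS connection vs. a single multiplexed SPDY connection per server) is easy to quantify. The 150ms RTT is the paper's figure; the 24 connections across 4 servers are assumed counts, borrowed from typical browser behavior discussed elsewhere in this digest, not from the paper:

```shell
# Aggregate connection-setup overhead at a 150ms RTT:
# each HTTPS connection pays 1 RTT (TCP) + 2 RTTs (SSL) = 3 RTTs;
# SPDY pays that once per server and multiplexes everything else.
awk 'BEGIN {
  rtt = 0.150; conns = 24; servers = 4
  printf "HTTPS: %.2f s of handshake time\n", conns * 3 * rtt
  printf "SPDY:  %.2f s of handshake time\n", servers * 3 * rtt
}'
# prints:
# HTTPS: 10.80 s of handshake time
# SPDY:  1.80 s of handshake time
```

The parallel connections overlap their handshakes, so the HTTPS total does not serialize into a 10.8-second stall; but every object fetched on a fresh connection still waits the full 3 RTTs before its first byte, and the synchronized SYN/SSL bursts are exactly the gateway-queue congestion the paper measures.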
Immediately, we see that SPDY is far more adversely affected by packet loss than HTTPS is. This has been anticipated in other work [29] but never before tested. It is also contrary to what has been reported in the SPDY white paper [2], which states that SPDY is better able to deal with loss. The authors suggest that because SPDY sends fewer packets, the negative effect of TCP backoff is mitigated. We find that SPDY does, indeed, send fewer packets (up to 49% fewer due to TCP connection reuse). However, SPDY's multiplexed connections persist far longer compared to HTTPS. -- Dave Täht NSFW: https://w2.eff.org/Censorship/Internet_censorship_bills/russell_0296_indecent.article
Re: [aqm] chrome web page benchmarker fixed
- Once the initial HTTP GET completes, initiate 24 simultaneous HTTP GETs (via separate TCP connections), 6 connections each to 4 different server nodes I usually don't see more than 15, and certainly not 25 kB-sized objects. - Once each individual HTTP GET completes, initiate a subsequent GET to the same server, until 25 objects have been retrieved from each server. * We don't make sure to flush all the network state in between runs, so if you're using that option, don't trust it to work. The typical scenario we used was a run against dozens or hundreds of urls, capturing traffic, while varying network conditions. I regarded the first run as the most interesting. One can exit the browser and restart after a run like that. At the moment, I merely plan to use the tool primarily to survey various web sites and load times while doing packet captures. The hope was to get valid data from the network portion of the load, tho... * If you have an advanced Chromium setup, this definitely does not work. I advise using the benchmark extension only with a separate Chromium profile for testing purposes. Our flushing of sockets, caches, etc does not actually work correctly when you use the Chromium multiprofile feature and also fails to flush lots of our other network caches. Noted. * No one on Chromium really believes the time-to-paint numbers that we output :) It's complicated. Our graphics stack is complicated. I actually care only about time-to-full-layout, as that's a core network effect... The time from when Blink thinks it painted to when the GPU actually blits to the screen cannot currently be corroborated with any high degree of accuracy from within our code. * It has not been maintained since 2010. It is quite likely there are many other subtle inaccuracies here. Grok. In short, while you can expect it to give you a very high level understanding of performance issues, I advise against placing non-trivial confidence in the accuracy of the numbers generated by the benchmark extension.
The fact that numbers are produced by the extension should not be treated as evidence that the extension actually functions correctly. OK, noted. Still delighted to be able to have a simple load generator that exercises the browsers and generates some results, however dubious. Cheers. On Thu, Apr 17, 2014 at 10:49 AM, Dave Taht dave.t...@gmail.com wrote: Getting a grip on real web page load time behavior in an age of sharded websites, dozens of dns lookups, javascript, and fairly random behavior in ad services and cdns against how a modern browser behaves is very, very hard. It turns out that if you run google-chrome --enable-benchmarking --enable-net-benchmarking (Mac users have to embed these options in their startup script - see http://www.chromium.org/developers/how-tos/run-chromium-with-flags ), enable developer options, and install and run the chrome web page benchmarker ( https://chrome.google.com/webstore/detail/page-benchmarker/channimfdomahekjcahlbpccbgaopjll?hl=en ), it works (at least for me, on a brief test of the latest chrome, on linux. Can someone try windows and mac?) You can then feed in a list of urls to test against, and post-process the resulting .csv file to your heart's content. We used to use this benchmark a lot while trying to characterise typical web behaviors under aqm and packet scheduling systems under load. Running it simultaneously with a rrul test or one of the simpler tcp upload or download tests in the rrul suite was often quite interesting. It turned out the doc has been wrong a while as to the name of the second command line option.
I was gearing up mentally for having to look at the source http://code.google.com/p/chromium/issues/detail?id=338705 /me happy -- Dave Täht Heartbleed POC on wifi campus networks with EAP auth: http://www.eduroam.edu.au/advisory.html
Re: [aqm] [AQM Evaluation Guidelines]
On Tue, Apr 15, 2014 at 6:57 AM, Nicolas KUHN nicolas.k...@telecom-bretagne.eu wrote: Thank you for detailing the content of the Cable Labs document and where these 700kB come from. Concerning your last point: As such I would be strongly in favour of changing the draft to actually describe realistic web client behaviour, rather than just summarising it as repeated downloads of 700KB. +100. I understand that it may be a drastic simplification to just summarise the web client behaviour as only repeated downloads of 700kB. However, the draft may not detail realistic web client behaviour: I believe that it may be off topic and the draft cannot contain such a level of complexity for all the covered protocols/traffic. I propose the following changes: Was: - Realistic HTTP web traffic (repeated download of 700kB); Changed to: - Realistic HTTP web page downloads: the tester should at least consider repeated downloads of 700kB - for more accurate web traffic, a single user web page download [White] may be exploited; What do you think? An AQM evaluation guide MUST include evaluations against real traffic patterns. Period. The White PLT model was decent; the repeated single-flow 700k download proposal is nuts. (I can certainly see attempting to emulate DASH traffic, however) I have further pointed out some flaws in the White PLT model in previous emails - notably as to the effect of not emulating DNS traffic - and have been working towards acquiring a reasonable distribution of DNS hit, miss, and fill numbers to plug into it for some time. That has required work - work on finding a decent web benchmark - and work on acquiring statistics that make sense - and some of that work is beginning to bear fruit. Dnsmasq, for example, has sprouted the ability to collect statistics, we are trying to get the chrome web page benchmarker working again, and so on. You ignore the overhead of DNS lookups at your peril. There are other overheads worth looking at, too...
Similarly I regard testing a correct emulation of bittorrent's real-world behavior in an AQM'd environment as pretty critical. [white] was not even close in this respect. (but it was a good first try!) Overall I suggest that we also adopt the same tests that other WGs are proposing for their protocols. rmcat had a good starter set here: http://www.ietf.org/proceedings/89/slides/slides-89-rmcat-2.pdf Regards, Nicolas On Apr 15, 2014, at 12:28 PM, Toke Høiland-Jørgensen t...@toke.dk wrote: Nicolas KUHN nicolas.k...@telecom-bretagne.eu writes: and realistic HTTP web traffic (repeated download of 700kB). As a reminder, please find here the comments of Shahid Akhtar regarding these values: The Cablelabs work doesn't specify web traffic as simply repeated downloads of 700KB, though. Quoting from [0], the actual wording is: Webs indicates the number of simultaneous web users (repeated downloads of a 700 kB page as described in Appendix A of [White]), Where [White] refers to [1] which states (in the Appendix): The file sizes are generated via a log-normal distribution, such that the log10 of file size is drawn from a normal distribution with mean = 3.34 and standard deviation = 0.84. The file sizes (yi) are calculated from the resulting 100 draws (xi ) using the following formula, in order to produce a set of 100 files whose total size =~ 600 kB (614400 B): And in the main text it specifies (in section 3.2.3) the actual model for the web traffic used: Model single user web page download as follows: - Web page modeled as single HTML page + 100 objects spread evenly across 4 servers. Web object sizes are currently fixed at 25 kB each, whereas the initial HTML page is 100 kB. Appendix A provides an alternative page model that may be explored in future work. - Server RTTs set as follows (20 ms, 30 ms, 50 ms, 100 ms). - Initial HTTP GET to retrieve a moderately sized object (100 kB HTML page) from server 1. 
- Once initial HTTP GET completes, initiate 24 simultaneous HTTP GETs (via separate TCP connections), 6 connections each to 4 different server nodes - Once each individual HTTP GET completes, initiate a subsequent GET to the same server, until 25 objects have been retrieved from each server. Which is a pretty far cry from just saying repeated downloads of 700 KB and, while still somewhat bigger, matches the numbers from Google better in terms of distribution between page sizes and other objects. And, more importantly, it features the kind of parallelism and interactions that a real web browser does; which, as Shahid mentioned is (can be) quite important for the treatment it receives by an AQM. As such I would be strongly in favour of changing the draft to actually describe realistic web client behaviour, rather than just summarising it as repeated downloads of 700KB. -Toke [0]
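The log-normal size model quoted from [White] above is easy to reproduce. Here is a minimal sketch; the rescaling step (dividing through so the 100 draws total 614400 bytes) is an assumption on my part, since the quoted text mentions "the following formula" but the formula itself was elided:

```python
import random

def white_web_object_sizes(n=100, total=614400, seed=1):
    """Draw n file sizes per [White] Appendix A: log10(size) ~ N(3.34, 0.84),
    then rescale so the whole set sums to ~600 kB (614400 B).
    The proportional rescaling is assumed; the exact formula is not quoted."""
    rng = random.Random(seed)
    draws = [10 ** rng.gauss(3.34, 0.84) for _ in range(n)]
    scale = total / sum(draws)
    return [d * scale for d in draws]

sizes = white_web_object_sizes()
print(round(sum(sizes)))  # ~614400 by construction
print(max(sizes) / min(sizes) > 100)  # heavy-tailed: a few big objects dominate
```

The heavy tail is the point: a handful of large objects coexist with many tiny ones, which is exactly the mix that exercises an AQM differently than one repeated 700 kB transfer.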
Re: [aqm] working group LAST CALL on recommendations draft
I still don't support wglc. a) Nit: Network Working Group? b) I have given up on using the term AQM to describe anything other than Active Queue Length Management algorithms. What I wrote about SQM is mostly outside the scope of the AQM guidelines document, but it's here: http://www.bufferbloat.net/projects/cerowrt/wiki/Smart_Queue_Management but I can live with the broad definition as used in this document. c) I also don't just mean fair or flow queuing when I say packet scheduling; we've identified an elephant in the room that is actually rate limiting (tbf, htb), hybrid rate limiting/scheduling (hfsc), or powerboost-style rate limiting/expansion. Moving on: d) The traditional technique for managing the queue length in a network device is to set a maximum length (in terms of packets) - well, bytes are common on many devices like DSLAMs, CMTSes, and modems... substitute: packets or bytes e) 2. Provide a lower-delay interactive service I tend to regard dns traffic as rather significant here too, and telnet is rather obsolete; substitute ssh. f) 3. Non-TCP-friendly Transport Protocols There's also the problem of DDOS attacks. g) Another topic requiring consideration is the appropriate granularity of a flow when considering a queue management method. There are a few natural answers: 1) a transport (e.g. TCP or UDP) flow (source address/port, destination address/port, Differentiated Services Code Point - DSCP); 2) a source/destination host pair (IP addresses, DSCP); 3) a given source host or a given destination host. We suggest that the source/destination host pair gives the most appropriate granularity in many circumstances. I don't suggest the last, as I have no data that backs it up. And my request to include a 5-tuple (source address/port, destination address/port, and protocol) isn't in here, although the MF classifier is mentioned later on in section 4.4. Elsewhere (in webrtc) there is an assumption that the 5-tuple is addr/port daddr/dport protocol.
I don't actually think doing a 5-tuple including the dscp rather than the protocol is a very good idea, given the amount of misclassified flows I see transiting site boundaries. I do think moving dscp-marked flows out into their own queues and then 5-tupling with protocol is not a bad idea. As for the final recommendations: h) 4. AQM algorithms SHOULD respond to measured congestion, not application profiles. I'm not sure if this precludes active classification and optimization measures? http://www.smallnetbuilder.com/lanwan/lanwan-features/32297-does-qualcomms-streamboost-really-work Not all applications transmit packets of the same size. Although applications may be characterized by particular profiles of packet size this should not be used as the basis for AQM (see next section). From a packet scheduling perspective I strongly support using some differentiation based on packet size at low bandwidths. 300 bytes works well. For AQM, I don't care; I just care about latency, no matter whether it comes from a pps problem or a packet size problem. I didn't mind pie's increasing probability of a drop based on packet size (which, so far as I know, is still in cablemodem pie, and it helps on competing voip traffic). i) 4.5 might want to also mention more modern protocols like uTP and QUIC. In 2013, an obvious example of further research is the need to consider the use of Map/Reduce applications in data centers; do we need to extend our taxonomy of TCP/SCTP sessions to include not only mice and elephants, but lemmings? Lemmings are flash crowds of mice that the network inadvertently tries to signal to as if they were elephant flows, resulting in head of line blocking in data center applications. I like to talk about ANTS. Can suggest some language if you want. On Wed, Apr 9, 2014 at 8:35 AM, Wesley Eddy w...@mti-systems.com wrote: We didn't receive any comments yet on the updated recommendations draft, which we were trying to have a working group last call on per Richard's email to the list on 3/5.
Since we think people might not have noticed the last call, we're re-announcing it. In the next two weeks, please review this document: https://datatracker.ietf.org/doc/draft-ietf-aqm-recommendation/ and relay any comments, questions, corrections, words of support, etc. to this AQM mailing list. Thanks for your help in finishing this document! -- Wes Eddy MTI Systems -- Dave Täht NSFW: https://w2.eff.org/Censorship/Internet_censorship_bills/russell_0296_indecent.article
Re: [aqm] packet loss (and therefore ECN) might not happen much actually
I'd be very interested in DNS request/reply analysis of that traffic. On Fri, Mar 21, 2014 at 9:27 PM, Fred Baker (fred) f...@cisco.com wrote: On Mar 4, 2014, at 9:12 AM, Eggert, Lars l...@netapp.com wrote: it looks like (in japan at least) TCP is very rarely controlled by packet loss (dupack or timeout) but more by sender or receiver rate limiting (or just being too short lived:) It would be interesting to know their delay variation. You've seen my famous 9 second delay graphic. There was no packet loss at all in that... You have also seen, I believe, my annotated Shepherd Diagram of an upload to Picasa. That was from Akasaka, and had three drops in a five second window, resulting in the session spending 40% of its duration underrunning available capacity. It would be interesting to know the traffic mix, the line speeds and latencies end to end, and so on. From my perspective, it's Really Hard to say the internet acts this way; consider the problem of the six blind philosophers and the elephant... What I think I *can* say is that I measured something in a certain way in a particular topological place at a particular time and with a particular workload, I analyzed it in a certain way, and in that measurement and analysis I observed ... something. If you want my guess at what the Japanese trace measured, it had upwards of 50 MBPS end to end and enough buffer at that rate in the bottleneck switch to prevent tail-drop loss in the ambient workload. Short sessions, which predominate, would not touch that, and high volume sessions might, as you say, self-limit in one of several ways. For comparison, yesterday, I took 24 hours of tcpdump trace on my laptop and wrote a reduction script. I started out by capturing 38 hours of traces earlier in the week in one hour chunks, and discovered that tcpdump zero-bases its data structures when it switches output files. Then I took a single 24 hour trace file.
In that reduction, I distinguished between microflows *from* me and microflows *to* me (where me might be my IPv4 or my IPv6 address or name), which would be the two halves of a TCP session. I also threw out sessions that didn't make sense to me, such as ones that might have already been open when I started the trace. Reason? I have asymmetric bandwidth (12 MBPS down and 2 MBPS up, sez the contract, and I think that's interpreted as at least, as I have seen higher), and I expect the two directions to behave a little differently. Rates are in kilobits/second, and all numbers are for a session. I have TCP sessions that are as short as a single packet each way (data/RST, for whatever reason I might receive such things, and maybe SYN/SYN-ACK) and pipelined tcp connections lasting the better part of an hour (I opened all of my face:b00c friends' pages, which moved quite a bit of data, all using IPv6). my flows: 10548 my retransmissions: 4009 my packets: min=1 median=10 95%=33 max=73732 my bytes: min=1 median=2493 95%=19608 max=697486314 my durations: min=0.002751 median=58.096561 95%=120.355108 max=35764.656936 my kbps: min=0.74 median=0.577529 95%=17.171851 max=1049048788.929813 his flows: 14977 his retransmissions: 2859 his packets: min=1 median=9 95%=104 max=181542 his bytes: min=1 median=3795 95%=110354 max=221579702 his durations: min=0.15 median=46.146412 95%=148.102106 max=35764.620901 his kbps: min=0.000459 median=0.928163 95%=114.575466 max=22604.513177 There are some weird questions I want to understand about the max fields. I edited out sessions that were open when I started the trace, of which there were a few. There are a couple of other strange sessions. One of these days I might sort out the difference between 14977 and 10548. But I think the bottom line is that while the median session in my home office probably doesn't incur a loss, it looks to me like the ones at the 95th percentile for size probably do - and maybe several.
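Fred's bottom line falls out of simple binomial arithmetic. A hedged sketch, assuming independent per-packet loss (real links only approximate this, and the 1% rate is purely illustrative): a median inbound flow of ~9-10 packets almost never sees a loss, while a 95th-percentile flow of ~104 packets more likely than not does.

```python
def p_at_least_one_loss(n_packets, loss_rate):
    """P(a flow of n packets experiences >= 1 loss), assuming i.i.d. per-packet loss."""
    return 1 - (1 - loss_rate) ** n_packets

# Using the trace's "his packets" median (~9-10) and 95th percentile (~104),
# with an assumed 1% per-packet loss rate:
print(p_at_least_one_loss(10, 0.01))   # ~0.096: the median flow usually sees none
print(p_at_least_one_loss(104, 0.01))  # ~0.65: the big flows probably see one or more
```

This is why "median session incurs no loss, 95th percentile probably does" is exactly what one expects: loss exposure scales with flow size, not flow count.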
-- Dave Täht Fixing bufferbloat with cerowrt: http://www.teklibre.com/cerowrt/subscribe.html
[aqm] stochastic hashing in hardware with a limited number of queues
The thread on netdev starting here: http://comments.gmane.org/gmane.linux.network/307532 was pretty interesting; a research group at SUNY looked hard at the behavior of a 64-hardware-queue system running giant flows: http://www.fsl.cs.sunysb.edu/~mchen/fast14poster-hashcast-portrait.pdf They ran smack into the birthday problem inherent in a small number of queues. And also a bug (now fixed). The conclusion of the thread was amusing: the new sch_fq scheduler with a single hardware queue (and a string of fixes over the past year for tcp small queues and tso offloads) performed as well as the multi-queue implementation... with utter fairness. On Sun, Mar 9, 2014 at 9:44 AM, Eric Dumazet eric.dumazet at gmail.com wrote: Multiqueue is not a requirement in your case. You can easily reach line rate with a single queue on a 10Gbe NIC. I repeated the experiment 10 times using one tx queue with FQ, and all clients get a fair share of the bandwidth. The overall throughput showed no difference between the single queue case and the mq case, and the throughput in both cases is close to the line rate. Merely because a feature is available in hardware does not mean it should be used. Certainly multiple hw queues are a good idea for some traffic mixes, but not for the circumstances of this particular test series. -- Dave Täht Fixing bufferbloat with cerowrt: http://www.teklibre.com/cerowrt/subscribe.html
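The birthday problem here is easy to quantify. A quick sketch, assuming the stochastic hash is uniform over the 64 hardware queues (the idealized best case; a biased hash collides even sooner):

```python
from math import prod

def p_hash_collision(flows, queues=64):
    """P(at least two flows hash to the same queue), uniform hash assumed."""
    if flows > queues:
        return 1.0  # pigeonhole: collision is certain
    return 1 - prod((queues - i) / queues for i in range(flows))

# Even a handful of elephant flows collide more often than intuition suggests:
print(round(p_hash_collision(10, 64), 3))  # already better than even odds
```

With only 10 large flows on 64 queues the collision probability is already above 50%, so two elephants sharing one hardware queue (and its FIFO) is the expected case, not a corner case.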
Re: [aqm] Notes
On Tue, Mar 4, 2014 at 5:53 PM, Scheffenegger, Richard r...@netapp.com wrote: First of all, thanks to the note takers. We've had quite some discussion around the AQM evaluation guideline draft, and I believe the notes capture many of the points brought up. If you got up and made a comment at the microphone, I would like you to check whether the spirit of your comment has been properly captured in the notes: http://etherpad.tools.ietf.org:9000/p/notes-ietf-89-aqm Not even close to what I said. Is it too much to request that for future meetings the proceedings be recorded and comments transcribed? The technology exists... Dave Taht: Care a lot about inter-flow packet loss. Bursty is really bad. Like to have a metric on inter flow loss This reminds me of an old far side joke. http://hubpages.com/hub/Gary-Larson#slide209782 Substitute packet loss for Ginger here. What I said was: I care a lot about interflow latency, jitter, and packet loss. Only bursty packet loss is really bad. I'd like to have a metric on interflow latency, jitter, and packet loss. Of these, packet loss is the *least* of my concerns. Our protocols recover from and compensate well for non-bursty packet loss, and packet loss IS the most common signal to tell protocols to slow down, and thus desirable... As an illustrative example, the cerowrt group has been working on ways to make aqm and packet scheduling technologies work well at rates well below 10Mbit, notably on the 768kbit uplinks common in the DSL world (which also has weird framing derived from the bad olde days of ATM). At below 100Mbit, TCP behavior is dominated by certain constants - notably the initial window, be it 3, 4 or 10, but also MTU * IWx in relation to MSS, availability of pacing on on/off traffic with a large cwnd, etc.
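That "weird framing" aside is worth quantifying, since it is why naive rate estimates misbehave on DSL uplinks. A rough sketch of AAL5-over-ATM cell math; the 8-byte AAL5 trailer is standard, but per-packet encapsulation overheads (PPPoE, LLC/SNAP, etc.) vary by link, so treat the exact numbers as illustrative:

```python
from math import ceil

ATM_CELL = 53      # bytes on the wire per ATM cell
ATM_PAYLOAD = 48   # usable payload bytes per cell
AAL5_TRAILER = 8   # AAL5 trailer, appended before padding to a cell boundary

def atm_wire_bytes(packet_bytes):
    """Bytes actually consumed on an ATM-framed DSL link by one packet."""
    cells = ceil((packet_bytes + AAL5_TRAILER) / ATM_PAYLOAD)
    return cells * ATM_CELL

# A full-size packet carries ~13% framing overhead; a bare TCP ack ~66%:
print(atm_wire_bytes(1500))  # 1696 bytes on the wire for a 1500-byte packet
print(atm_wire_bytes(64))    # 106 bytes on the wire for a 64-byte ack
```

On a 768kbit uplink that step function (every packet rounds up to whole 53-byte cells) is exactly why a shaper that counts IP bytes instead of cells systematically under-accounts, especially for ack-heavy traffic.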
There are a string of recent tests put up here: http://richb-hanover.com/ The first graph shows bufferbloat in all its glory on the link - well over 2secs of delay and goodput of about 1.6Mbits on the download. The remainder of the graphs are on variants of nfq_codel and fq_codel setups, but the core result was that after applying the cerowrt SQM system (scheduling and aqm) goodput was way, way up and latency way, way down compared to the bufferbloated alternative: nearly triple the download goodput, and 1/50th the latency. (The debate is over how best to get better interflow results, and the differences in results are not much above a percentage point) - packet loss on this link after applying AQM was well over 35%! But as it is not bursty, and latency is held low, the link remains markedly useful, all the flows work pretty well, and the low rate flows are doing well... Thread for ongoing discussion here: https://lists.bufferbloat.net/pipermail/cerowrt-devel/2014-February/002370.html Packet captures seem to show that Mac TCP is not reducing its window to a reasonable value, nor is it reducing MSS to something more appropriate for the link rate. I'd recommend looking at the packet captures from that test to get a feel for how slow start, fast recovery, and dup acks interact at these timescales. Packet loss, particularly when taken as a pure percentage, is not a good metric for most measurements. Most of the time, I don't give a rats arse about it. Richard Scheffenegger NetApp r...@netapp.com +43 1 3676811 3146 Office (2143 3146 - internal) +43 676 654 3146 Mobile www.netapp.com EURO PLAZA Gebäude G, Stiege 7, 3.OG Am Euro Platz 2 A-1120 Wien -- Dave Täht Fixing bufferbloat with cerowrt: http://www.teklibre.com/cerowrt/subscribe.html
Re: [aqm] Draft Agenda for IETF89
On Sat, Feb 15, 2014 at 7:10 AM, Michael Welzl mich...@ifi.uio.no wrote: 14:40 draft-fairhurst-ecn-motivation Gorry Fairhurst 15 min This is apparently not a published draft yet. It's draft-welzl-ecn-benefits, http://tools.ietf.org/html/draft-welzl-ecn-benefits-00 It describes the benefits of ECN persuasively and well. I would rather like a section discussing the negatives. -- Dave Täht Fixing bufferbloat with cerowrt: http://www.teklibre.com/cerowrt/subscribe.html
Re: [aqm] I-D Action: draft-ietf-aqm-recommendation-02.txt
On Thu, Feb 13, 2014 at 4:30 PM, Fred Baker (fred) f...@cisco.com wrote: Gorry and I have posted a second update to the AQM Recommendations draft discussed at IETF 88. This update mostly picks up nit-level matters. We, of course, invite review, and would suggest that reviews look at this version. A few nits. A) I have not bought into byte-pkt. I don't want to go into it today. In particular, I'd like the original pie benchmarks rerun now that that code doesn't have a byte-sensitive dropping mode, and the two compared. Perhaps that would shed some light on the issue. B) Another topic requiring consideration is the appropriate granularity of a flow when considering a queue management method. There are a few natural answers: 1) a transport (e.g. TCP or UDP) flow (source address/port, destination address/port, Differentiated Services Code Point - DSCP); 2) a source/destination host pair (IP addresses, DSCP); 3) a given source host or a given destination host. add: 4) a 5-tuple consisting of source ip/port, dest ip/port, proto. And we can hash it out later. C) We suggest that the source/destination host pair gives the most appropriate granularity in many circumstances. Back that up with measurements of real traffic from real homes and small businesses, and I'll believe you. Breaking up packet trains back into packets in sane ways is the only way to deal with the impact of iw10 at low bandwidths that I can think of, in particular. In the interim I would suggest language that waffles more as to appropriate methods. D) Traffic classes may be differentiated based on an Access Control List (ACL), the packet DiffServ Code Point (DSCP) [RFC5559], setting of the ECN field [RFC3168] [RFC4774] or an equivalent codepoint at a lower layer. Are you ruling out port number? I have no problem with (for example) deprioritizing port 873 (rsync) somewhat relative to other traffic. Same goes for some other well known ports... Are you ruling out protocol number? Destination address?
(stuff inside my network gets treated differently than stuff egressing) These are all common methods of classifying traffic that has codepoints that cannot be trusted. And regrettably, on inbound from another domain, diffserv values cannot be trusted, period. I don't know how to fit that into this draft, but a MUST regarding remarking inbound diffserv appropriately is needed. Right now I just quash everything inbound to BE. E) A malfunctioning or non-conforming network device may similarly hide an ECN mark. In normal operation such cases should be very uncommon. I disagree with the last sentence. ECN unleashed will be ECN abused. If the recent ntp flooding attacks were ECN marked, and ECN widely deployed, what would have happened? (I still strongly support the notion of ECN, but don't want to deprecate the dangers) A diff from IETF 88's version may be found at http://tools.ietf.org/rfcdiff?url1=http://tools.ietf.org/id/draft-ietf-aqm-recommendation-00.txt&url2=http://tools.ietf.org/id/draft-ietf-aqm-recommendation-02.txt which is also http://tinyurl.com/k9tfufm On Feb 13, 2014, at 1:20 PM, internet-dra...@ietf.org wrote: A New Internet-Draft is available from the on-line Internet-Drafts directories. This draft is a work item of the Active Queue Management and Packet Scheduling Working Group of the IETF. Title : IETF Recommendations Regarding Active Queue Management Authors : Fred Baker Godred Fairhurst Filename: draft-ietf-aqm-recommendation-02.txt Pages : 22 Date: 2014-02-13 Abstract: This memo presents recommendations to the Internet community concerning measures to improve and preserve Internet performance. It presents a strong recommendation for testing, standardization, and widespread deployment of active queue management (AQM) in network devices, to improve the performance of today's Internet.
It also urges a concerted effort of research, measurement, and ultimate deployment of AQM mechanisms to protect the Internet from flows that are not sufficiently responsive to congestion notification. The note largely repeats the recommendations of RFC 2309, updated after fifteen years of experience and new research. The IETF datatracker status page for this draft is: https://datatracker.ietf.org/doc/draft-ietf-aqm-recommendation/ There's also a htmlized version available at: http://tools.ietf.org/html/draft-ietf-aqm-recommendation-02 A diff from the previous version is available at: http://www.ietf.org/rfcdiff?url2=draft-ietf-aqm-recommendation-02 Please note that it may take a couple of minutes from the time of submission until the htmlized version and diff are available at tools.ietf.org. Internet-Drafts are also available by anonymous FTP at: ftp://ftp.ietf.org/internet-drafts/
[aqm] some comments I'd made on the cablelabs study privately
Since that study and test design were highly influential on the AQM requirements draft, I am going to publish here now what my comments were at the time, with a couple of updates here and there. I am not sure if any of the ns2 code from the last round made it out to the public? * Executive summary The Cablelabs AQM paper was the best simulation study of the effects of the new bufferbloat-fighting AQMs versus the common Internet traffic types in the home that has been created to date... However, it is only a study of half the edge network. It assumes throughout that there are no excessive latencies to be had from the CMTS side. Field measurements show latencies in excess of 1.8 seconds from the CMTS side, 300ms on Verizon gpon, and buffer sizes in the range of 64k to 512k on DSL in general, with some RED, SFQ, and SQF actually deployed there. So... while the focus has been on what is perceived as the larger problem, the cable modems themselves, downstream behavior was not studied, and the entire simulation was set to values that seem reasonable to ns2 modelers... and not seen in the real world. In the real world (RW), flows are almost always bidirectional. What happens on the downstream side affects the upstream side and vice versa, as per Van Jacobson's fountain analogy. Correctly compensating for bidirectional TCP dynamics is incredibly important. The second largest problem with the original cablelabs study is that it only analyzed traffic at one specific (although common) setting for cable operators: 20Mbits down and 5 Mbits up. A common, lower setting should be analyzed, as well as more premier services. Some tweaking of codel-derived technologies (flows and quantum), and of pie (alpha and beta), is indicated both at lower and higher bandwidths for optimum results. Additionally the effects of classification, notably of background traffic, have not been explored.
There are numerous other difficulties in the simulations and models that need to be understood in order to make good decisions moving forward. This document goes into more detail on those later. All the AQMs tested performed vastly better than standard FIFO drop tail as well as buffercontrol. They all require minimal configuration to work. With some configuration they can be made to work better. * Recommendations I'd made at the time ** Study be repeated using at least two more bandwidth settings ** More exact emulation of current CMTS behavior, based on real world measurements ** Addition of more traffic types, notably VPN and videoconferencing ** Improvements to the VOIP and web models ** Continued attempts at getting real world and simulated benchmarks to line up. My approach has been to follow the simulation work and try to devise real world benchmarks that are similar, and feed back the results into the ongoing simulation process. There are multiple limitations in this method, too, notably getting repeatable results, and doing large scale tests on customer equipment, both of which are subject to heisenbugs. * Issues in the cablelabs study ** Downstream behavior Tests with actual cablemodems in actual configurations show a significant amount of buffering on the downstream. At 20Mbits, DS buffering well in excess of 1 second has been observed. The effect of excessive buffering on this side has not been explored in these tests. Certain behaviors - TCP's burstiness as it opens its window to account for what it thinks is a long path - reflect interestingly on congestion avoidance on the downstream, and the effects on the upstream side of the pair are interesting too. I note that my own RW statistics were often very skewed by some very bad ack behavior on TSO offloads that had been a bug in Linux for years and was recently fixed. ** Web model *** The web model does not emulate DNS lookups.
Caching DNS forwarders are typically located on a gateway box (not sure about cablemodems ??), and the ISP locates a full DNS server nearby (within 10ms RTT). DNS traffic is particularly sensitive to delay, loss, and head of line blocking, and slowed DNS traffic stalls subsequent tcp connections, on sharded web traffic in particular.

*** The web model does no caching

A fairly large percentage (though not high enough) of websites make use of various forms of caching, ranging from marking whole objects as cacheable for a certain amount of time, to using the etags method to provide a checksum-like value for a conditional GET request. The former method eliminates a RTT entirely; the latter works well inside of a http 1.1 pipeline.

*** The web model does not use https

Establishing a secure http connection requires additional round trips.

*** The web model doesn't emulate tons of tabs

Web users, already highly interactive, now tend to have tons of tabs open, each on an individual web site, many of which are doing some sort of polling or interaction in the background against the remote web server. These benchmarks do not emulate this highly parallel background behavior.
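As a back-of-envelope illustration of why these omissions matter, here is a rough round-trip count for fetching one object. The counts are generic illustrations of mine, not part of the study's model:

```python
def fetch_rtts(tls=False, cached=False, dns_cached=True):
    """Rough round-trip count to fetch one web object.

    Illustrative only: real handshakes vary (TLS session resumption,
    HTTP keep-alive, pipelining, etc. all change the count).
    """
    if cached:
        return 0                    # fresh local copy: no request at all
    rtts = 0 if dns_cached else 1   # DNS lookup to a (nearby) resolver
    rtts += 1                       # TCP three-way handshake
    if tls:
        rtts += 2                   # TLS-1.2-style handshake
    rtts += 1                       # GET + response (or a 304 via etag)
    return rtts

print(fetch_rtts())                            # plain http, warm DNS
print(fetch_rtts(tls=True, dns_cached=False))  # https + cold DNS costs more
```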
Re: [aqm] Prefatory comments re draft-aqm-reccommendation and -evaluation, and a question
On Thu, Jan 23, 2014 at 1:10 AM, Fred Baker (fred) f...@cisco.com wrote:

No, you're not blowing smoke. I'm not sure I would compare the behavior to PMTUD, in that there the endpoint is given a magic number and manages to it, whereas in this case it is given the results of its behavior, and it manages to improve on that. But this is what I have rambled on about in threads relating to the size of a buffer. Folks would really like to have a magic way to calculate the buffer size (an amount of memory) they need to install in a router or switch, and it isn't that easy, because it has a lot to do with where in the network a system is located and how it is used by the applications that use it.

But AQM, in the end, isn't about buffer size. It is about buffer occupancy. In the ideal case, if there are N sessions active on the bottleneck link in a path, we would like each to obtain 1/N of the bottleneck's capacity, which is to say that each should be able to maximize its throughput while keeping an average of zero packets standing in the queue (minimizing both latency and variation in latency). If you know your math, you know that the ideal goal isn't actually achievable. But that doesn't stop us from trying to asymptotically approach it.

I prefer to think of the goal as to keep a minimum of 1 packet in the queue, not as an average of 0.

On Jan 17, 2014, at 3:51 PM, David Collier-Brown dave...@rogers.com wrote:

I've been reading through the internet-drafts, and one paragraph struck me as very illuminating. This is therefore a sanity-check before I go full-hog down a particular path... The comment is from Baker and Fairhurst, https://datatracker.ietf.org/doc/draft-ietf-aqm-recommendation/ and the paragraph is [emphases added]:

The point of buffering in the network is to absorb data bursts and to transmit them during the (hopefully) ensuing bursts of silence. This is essential to permit the transmission of bursty data.
Normally small queues are preferred in network devices, with sufficient queue capacity to absorb the bursts. The counter-intuitive result is that maintaining normally-small queues can result in higher throughput as well as lower end-to-end delay. In summary, queue limits should not reflect the steady state queues we want to be maintained in the network; instead, they should reflect the size of bursts that a network device needs to absorb.

All of a sudden we're talking about the kinds of queues I know a little about (:-))

---

I'm going to suggest that these are queues and associated physical buffers that do two things:

- hold packets that arrive at a bottleneck for as long as it takes to send them out a slower link than the one they came in on, and
- hold bursts of packets that arrive adjacent to each other until they can be sent out with a normal spacing, with some small amount of time between them

In an illustration of Dave Taht's, the first looks something like this:

[ASCII diagram: packets queueing at a choke-point where a fat pipe narrows into a thin one]

At the choke-point there is a buffer at least big enough to give the packets a chance to wheel from line into column (:-)) and start down the smaller pipe. The speed at which the acks come back, the frequency of drops, and any explicit congestion notifications slow the sender until it doesn't overload the skinnier pipe, thus spacing out the packets in the fatter pipe.

Various causes [Leland] can slow or speed the packets in the fat pipe, making it possible for several to arrive adjacent to each other, followed by a gap. The second purpose of a buffer is to hold these bursts while things space themselves back out. Buffers need to be big enough at minimum to do the speed matching, and at maximum big enough to spread a burst back into a normal progression, always assuming that acks, drops, and explicit congestion notifications are slowing the sender to the speed of the slowest part of the network.
---

If I'm right about this, we can draw some helpful conclusions:

- buffer sizes can be set based on measurements: speed differences, which are pretty static, plus observed burstiness
- drops and ECN can be done to match the slowest speed in the path

The latter suddenly sounds a bit like path MTU discovery, except it's a bit more dynamic, and varies with both the path and what's happening in various parts of it. To me, as a capacity/performance nerd, this sounds a lot more familiar and manageable.

My question to you, before I start madly scribbling on the internet drafts, is: Am I blowing smoke?

--dave
--
David Collier-Brown, | Always do right. This will gratify
System Programmer and Author | some people and astonish the rest
dav...@spamcop.net | -- Mark Twain
(416) 223-8968
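David's conclusion that buffer sizes can be set from measured speed differences plus observed burstiness echoes the classic rules of thumb for sizing a bottleneck buffer. A sketch (the sqrt(N) refinement comes from the Stanford buffer-sizing work, not from this thread):

```python
import math

def buffer_bytes(rate_bps, rtt_s, n_flows=1):
    """Rule-of-thumb bottleneck buffer: one bandwidth-delay product,
    reduced by sqrt(N) when many desynchronized flows share the link
    (per Appenzeller et al.). A sizing sketch only - as Fred notes,
    AQM then manages *occupancy* within whatever buffer exists."""
    bdp_bytes = rate_bps * rtt_s / 8   # bandwidth-delay product in bytes
    return bdp_bytes / math.sqrt(n_flows)

# 20 Mbit/s with a 100 ms RTT: a 250 KB buffer for a single flow,
# shrinking to ~25 KB with 100 concurrent flows.
print(buffer_bytes(20e6, 0.1))
print(buffer_bytes(20e6, 0.1, n_flows=100))
```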
Re: [aqm] Prefatory comments re draft-aqm-reccommendation and -evaluation, and a question
On Thu, Jan 23, 2014 at 1:15 AM, Dave Taht dave.t...@gmail.com wrote:

On Thu, Jan 23, 2014 at 1:10 AM, Fred Baker (fred) f...@cisco.com wrote:

[quote of the preceding message trimmed]

> I prefer to think of the goal as to keep a minimum of 1 packet in the queue, not as an average of 0.

And that's not strictly true either. In the case of wifi, and other bundling technologies like those used in cable, you want to keep a minimum of a good aggregate of packets in the queue for that technology.
Re: [aqm] Text for aqm-recommendation on independent ECN config
For starters, codel's signaling delay, from the onset of continuous delay above the 5ms target, defaults to the 100ms interval, not 200ms. I don't know who started saying 200ms, but even I started believing it with the few brain cells I've had to spare of late. 5x a CDN rtt, in a world of 30-60k images, sounds about right. Secondly, codel drops/marks from the head of the queue, not the tail, so the signal gets back to the sender in 1/2 the real physical RTT after that, rather than from the tail of a queue that may be out of control at that point. Much faster than pie.

There has been so much misinformation spread of late on these threads. I'm hoping we're beginning to make a dent in it? I look forward to making all this clear in the upcoming RFCs. I think I should stop now, revisit the rest of this thread, and see what else can be cleared up before even beginning to tackle fq_codel, after I get caught up on sleep.

As for your other comments... I have always said deploy RED, and for that matter DRR, SFQ, or SQF, where you can. I distinctly remember polling the crowd at the first uknof I went to and being sad to discover only about 4% of the room had (4 people). I DO hold that RED is too hard to configure for ordinary mortals, and that it doesn't work at all on variable bandwidth links like cable or wireless, which happen to be the dominant forms of end-user link nowadays.

As for the hysteresis problem, in practice it doesn't seem to be much of a problem. Things get well under control before a web page completes. The same goes for my tests against DASH traffic. I have plenty of plots and traces of this; many are on the results webpage for bufferbloat.net.

As for a good default for interval, a good number IS dependent on your RTT, and without coupling the ingress and output queues it's difficult to determine or even auto-tune that. Perhaps with connection tracking or some other form of coupling, one day. The ACC code from the gargoyle router project is worth looking at.
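To pin down those numbers: codel acts only once the sojourn time has stayed above the 5ms target for a full 100ms interval, then head-drops (or marks) and schedules successive signals closer and closer together. A sketch of the published control law:

```python
import math

TARGET_S = 0.005    # 5 ms: the acceptable standing-queue delay
INTERVAL_S = 0.100  # 100 ms: on the order of a worst-case expected RTT

def next_drop_time(now_s, count):
    """After a head drop/mark, codel schedules the next one
    interval/sqrt(count) later, so the signaling rate accelerates
    for as long as the sojourn time stays above target."""
    return now_s + INTERVAL_S / math.sqrt(count)

# Drop spacing shrinks as the drop count climbs:
for count in (1, 4, 16):
    print(round(next_drop_time(0.0, count), 4))
```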
I am satisfied that fq_codel can be deployed on fixed rate lines without any tuning, on bandwidths ranging from 4mbit to 1gbit, today, as it stands. I have done hundreds of thousands of tests to prove that. Optimizations are helpful for the 3 band system that is mostly deployed today, such as smaller quantums on slow asymmetric links and a smaller packet limit on low memory routers. A larger target is working well on sub-4ms links; I think that could auto-tune better. A lower target and interval seem right for data center use, but I have yet to get anyone to run my suite of published tests.

A rate limiter is required to compensate for ISPs' lousy dslam/cmts/gpon head ends and CPE, at least until this code makes it onto those devices. Long lead times predominate on this sort of hardware - we have three years to get DOCSIS 3.1 right, as one example. These are second order problems that will be fixed over time. Wifi and wireless remain problematic, but dents in those problems seem imminent by next year, and many of the problems aren't aqm or packet scheduling ones.

SO, damn straight, I'm one of the people pushing for deployment, notably on boxes that are easy to upgrade and fix as we learn more about what we should be doing. I'm definitely reluctant to hard code stuff into big iron or hard-to-replace firmware as yet. But as Matt Mathis said at ietf - what we have is such an improvement over what is in place today that it is time to deploy. After almost 3 years of effort I'm happy to have a few million boxes in place to learn more from. Aren't you? We just have a couple billion boxes left to fix. Plenty of time to tweak things as we go along.

If you want RED, or ARED, in linux, it's been fixed now for 2 years to perform to the spec. Go for it. If you could create something to automate RED configuration, as I have for the ceroshaper tool in cerowrt, let me know. Any time someone has debloating code worth working on... I'm willing to help.
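The rate limiter mentioned above (an htb/tbf-style shaper in front of fq_codel, roughly what ceroshaper sets up) is at heart a token bucket. A minimal sketch, with made-up parameter values:

```python
class TokenBucket:
    """Minimal token-bucket shaper sketch: set the rate slightly below
    the ISP link speed so the queue (and hence the AQM) lives in the
    CPE rather than in the head-end gear."""

    def __init__(self, rate_bytes_per_s, burst_bytes):
        self.rate = rate_bytes_per_s
        self.burst = burst_bytes
        self.tokens = burst_bytes   # start with a full bucket
        self.last = 0.0

    def allow(self, pkt_len, now):
        # Refill tokens for elapsed time, capped at the burst size.
        self.tokens = min(self.burst,
                          self.tokens + (now - self.last) * self.rate)
        self.last = now
        if self.tokens >= pkt_len:
            self.tokens -= pkt_len
            return True   # transmit now
        return False      # hold back; let fq_codel manage the backlog

tb = TokenBucket(rate_bytes_per_s=2_500_000, burst_bytes=1514)  # ~20 Mbit
print(tb.allow(1514, now=0.0))   # full bucket: sent
print(tb.allow(1514, now=0.0))   # bucket empty: queued
```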
I've been helping on pie, and as you know I've been looking over your DCTCP experiment carefully, finding and fixing bugs, and moving the code forward to where it can be compared against a modern kernel, a modern TCP, and modern AQM and packet scheduling systems.

On Thu, Dec 12, 2013 at 4:05 PM, Bob Briscoe bob.bris...@bt.com wrote:

Dave,

At 22:11 12/12/2013, Dave Taht wrote:

> but quickly... Bob, I object to your characterization of users' links being busy 1-3% of the time. That's an average.

I said it was an average. You're repeating and agreeing with what I said, but saying you object to me saying it?

> When they are busy, they are very busy for short periods, typically 2-16 seconds in the case of web traffic, then idle for minutes. DASH traffic is busy for 2+ seconds every 10 on a 20mbit link, and so on, for 1.5 hours or so. Etc.

Yes, again, you're agreeing with me. The mean for a Web session is towards the low end of the 2-16 seconds range even now. And as we get the other latency-saving advances out