Since that study and test design were highly influential on the AQM requirements draft, I am going to publish here what my comments were at the time, with a couple of updates here and there....
I am not sure if any of the ns2 code from the last round made it out to the public?

* Executive summary

The Cablelabs AQM paper is the best simulation study to date of how the new bufferbloat-fighting AQMs interact with the common Internet traffic types found in the home. However, it is only a study of half the edge network. It assumes throughout that there are no excessive latencies to be had from the CMTS side. Field measurements show latencies in excess of 1.8 seconds from the CMTS side, 300ms on Verizon gpon, and buffer sizes in the range of 64k to 512k on DSL in general, with some RED, SFQ, and SQF actually deployed there.

So... while the focus has been on what is perceived as the larger problem, the cable modems themselves, downstream behavior was not studied, and the entire simulation was set to "reasonable" values for ns2 modelers - values not seen in the real world. In the real world (RW), flows are almost always bidirectional. What happens on the downstream side affects the upstream side and vice versa, as per Van Jacobson's "fountain" analogy. Correctly compensating for bidirectional TCP dynamics is incredibly important.

The second largest problem with the original cablelabs study is that it only analyzed traffic at one specific (although common) setting for cable operators: 20Mbits down and 5Mbits up. A common lower setting should be analyzed, as well as more premier services. Some tweaking of the codel-derived technologies (flows and quantum) and of pie (alpha and beta) is indicated at both lower and higher bandwidths for optimum results.

Additionally, the effects of classification, notably of background traffic, have not been explored. There are numerous other difficulties in the simulations and models that need to be understood in order to make good decisions moving forward. This document goes into more detail on those later.

All the AQMs tested performed vastly better than standard FIFO drop tail, as well as buffercontrol.
They all require minimal configuration to work. With some configuration they can be made to work better.

* Recommendations I'd made at the time

** Study be repeated using at least two more bandwidth settings
** More exact emulation of current CMTS behavior, based on real world measurements
** Addition of more traffic types, notably VPN and videoconferencing
** Improvements to the VOIP and web models
** Continued attempts at getting real world and simulated benchmarks to "line up"

My approach has been to follow the simulation work, try to devise real world benchmarks that are similar, and feed the results back into the ongoing simulation process. There are multiple limitations in this method, too, notably getting repeatable results and doing large scale tests on customer equipment, both of which are subject to heisenbugs.

* Issues in the cablelabs study

** Downstream behavior

Tests with actual cablemodems in actual configurations show a significant amount of buffering on the downstream. At 20Mbits, DS buffering well in excess of 1 second has been observed. The effect of excessive buffering on this side has not been explored in these tests. Certain behaviors - TCP's "burstiness" as it opens its window to account for what it thinks is a long path - reflect interestingly on congestion avoidance on the downstream, and the effects on the upstream side of the pair are interesting too.

I note that my own RW statistics were often very skewed by some very bad ack behavior on TSO offloads that had been a bug in Linux for years and was recently fixed.

** Web model

*** The web model does not emulate DNS lookups

Caching DNS forwarders are typically located on a gateway box (not sure about cablemodems ??), and the ISP locates a full DNS server nearby (within 10ms RTT). DNS traffic is particularly sensitive to delay, loss, and head of line blocking, and slowed DNS traffic stalls subsequent tcp connections, on sharded web traffic in particular.
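To illustrate why the missing DNS and https steps matter, here is a rough, back-of-the-envelope round-trip counter (entirely my own sketch, not anything from the study; the function name and the classic two-round-trip TLS handshake figure are illustrative assumptions):

```python
# Illustrative RTT accounting for fetching one small object on a fresh
# connection from a sharded site. Not from the study; the constants are
# assumptions (classic TLS full handshake = 2 extra round trips).

def fetch_rtts(dns_cached: bool, https: bool) -> int:
    """Rough count of round trips before the first response byte."""
    rtts = 0
    if not dns_cached:
        rtts += 1      # DNS lookup to a nearby resolver
    rtts += 1          # TCP three-way handshake
    if https:
        rtts += 2      # classic TLS handshake
    rtts += 1          # the HTTP GET itself
    return rtts

# Plain http, cold DNS cache: 3 RTTs before any body arrives.
print(fetch_rtts(dns_cached=False, https=False))  # 3
# https with a cold DNS cache: 5 RTTs.
print(fetch_rtts(dns_cached=False, https=True))   # 5
```

At a 10ms RTT to the resolver and a longer path to the server, those extra round trips dominate small-object fetch time, which is exactly the behavior a web model without DNS or https cannot show.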
*** The web model does no caching

A fairly large percentage (though not high enough) of websites make use of various forms of caching, ranging from marking whole objects as cacheable for a certain amount of time, to using the etags method to provide a checksum-like value for a conditional get request. The former method eliminates an RTT entirely; the latter works well inside an http 1.1 pipeline.

*** The web model does not use https

Establishing a secure http connection requires additional round trips.

*** The web model doesn't emulate tons of tabs

Web users, already highly interactive, now tend to have tons of tabs open, each on an individual web site, many of which are doing some sort of polling or interaction in the background against the remote web server. These benchmarks do not emulate this highly common behavior.

** TCP Cubic in ns2 is not the same as modern TCP cubic in Linux

Showing that these new AQMs work correctly with modern TCPs is essential.

** VOIP model

The VOIP model measures *one-way* egress delays only, and does not track the burstiness of packet loss or the burstiness of jitter. It uses an archaic codec emulation, as well. Something like opus should be looked at.

** The gaming model

Treats gaming as equivalent to VOIP, with the same problems as the VOIP model. Gaming traffic is bidirectional, even more so than voip. Additionally, some of the traces I have access to use a 15ms, rather than 20ms, period to emit packets. (I have collected a vast amount of gaming traffic data that I have not yet had time to analyze.)

** Videoconferencing was not studied.

** Bittorrent model

The torrent model is flawed in multiple ways. Notably, it only tests phase III of the typical torrent cycle, not the download or down/up phases...

**** The download saturation problem

Torrent can and WILL saturate a downlink and not respond to congestion indicators until over 100ms of delay is observed. Most clients do have a ratelimit set for download.
It is often turned off after-hours.

**** The upload saturation problem shown in the study

Bittorrent clients have evolved to where, out of the box, there is a very low rate limit set, typically in the range of 50-150KBytes/sec. This makes bittorrent uploads a non-problem for most people. Still, benchmarking each of these phases would be worthwhile. Torrent can be fixed.

*** LEDBAT != Bittorrent

LEDBAT as defined and uTP as deployed remain significantly different. No implementation of bittorrent I've looked at (utorrent and transmission) behaves anything like the LEDBAT modules I have.

*** TCP-LEDBAT kernel module is buggy

In my RW tests this Linux congestion control module never gets out of SSTHRESH and into congestion avoidance. The behavior LOOKS like bittorrent (look! it's scavenging), when in reality it's merely stuck at a low rate. However, it is possible this module works correctly under older linuxes under ns2.

The RW behavior of bittorrent under RED and SFQ was explored in [[YIXI2012]]; under those two AQM systems, formerly scavenging flows are reprioritized to have roughly the same weight as non-scavenging flows until 100ms of delay is incurred.

The simulated LEDBAT results do show that extraordinarily high numbers of persistent high rate flows interact badly with fq_codel at this 20/4 bandwidth setting. Using another TCP that is not buggy will probably be even worse. However, bittorrent is a very special case. Dozens of full rate flows are extremely rare in the real world. (RW benchmarks show that even with fairly large numbers of flows (50+), fq_codel still does quite well.)

** IP address hashing

ns2 does not have support for a full 5 tuple including the protocol (e.g. TCP, UDP, ARP, etc). This makes hashing multiple protocols together problematic, and I'm unsure if this was compensated for correctly in all the models.
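To make the 5-tuple point concrete, here is a small sketch (mine, not from the study or any fq_codel source; the sha1 hash, queue count, and function name are illustrative choices) showing what happens when the protocol field is missing from the flow hash:

```python
# Sketch of flow-to-queue hashing with and without the protocol in the
# tuple. The hash function and queue count are illustrative; real
# fq_codel uses a jhash over the 5-tuple, not sha1.
import hashlib

NQUEUES = 1024  # fq_codel's default number of flow queues

def queue_for(src, dst, sport, dport, proto=None):
    """Hash a flow tuple to a queue index; proto=None emulates a
    protocol-less classifier such as ns2's."""
    key = f"{src}|{dst}|{sport}|{dport}|{proto}".encode()
    digest = hashlib.sha1(key).digest()
    return int.from_bytes(digest[:4], "big") % NQUEUES

# A TCP flow and a UDP flow between the same addresses and ports:
tcp_q = queue_for("10.0.0.1", "10.0.0.2", 5000, 80, "tcp")
udp_q = queue_for("10.0.0.1", "10.0.0.2", 5000, 80, "udp")
print(tcp_q, udp_q)  # with the protocol hashed in, these usually differ

# Without the protocol, the two flows ALWAYS share one queue, so a
# bulk UDP flow and a TCP flow get lumped together:
assert queue_for("10.0.0.1", "10.0.0.2", 5000, 80) == \
       queue_for("10.0.0.1", "10.0.0.2", 5000, 80)
```

The failure mode is the second case: any model that drops the protocol from the tuple systematically merges flows that a real fq_codel would keep apart.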
** Fq_codel and PIE configuration with 10ms target

While I buy into the idea (kathie doesn't) that the delay target variable needs to be greater than the cable MAC media acquisition time (in this model), it does not need to be set to 10ms. 6.X should be sufficient to avoid MAC acquisition artifacts. There is a quadratic response time to delay in TCP, so doubling the default delay target from 5ms to 10ms results in much fuller queues. (Lest I be mis-interpreted, the quadratic behavior is end to end, and I honestly don't know what differences we'd see between a 6.5ms target and a 10ms target.)

PIE has similar constraints (but apparently ran fine on the cable MAC), so a target delay of 6.X or 7 being tried for both would be interesting. I note that (2014) pie has now grown a target of 20ms in the Linux implementation, and a closer tie with the htb rate shaper in the as-yet-unpublished cablelabs model.

At very, very low bandwidths (<4mbit) we have found it desirable to increase the fq_codel target to account for a single MTU at that bandwidth.

The fq_codel interval estimation window in the cablelabs testing was set to 150ms, rather than the default 100ms. I note that most experimental variants of codel fiddle with the "drop resumption" portion of the algorithm, which is very sensitive to the interval. It's an area of research... The estimation window in pie has grown to 10k bytes.

* Rate limiters have a cost

At higher rates, more fq_codel queues are of use, and the RW tests point to the rate limiter being the principal CPU hog and source of problems; the drop/mark/scheduler algorithms hardly enter into it. The PIE, codel, and fq_codel algorithms barely show up on a trace....

** Back to back packet drop

Gaming and VOIP traffic tolerate single, random packet drops with aplomb. It's bursty packet loss and sudden delays that cause audible artifacts and gaming misbehavior.
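The low-bandwidth target adjustment mentioned in the fq_codel configuration section above amounts to simple arithmetic: below roughly 4mbit, one full-size packet takes longer to serialize than codel's 5ms default target, so the target must grow to at least one MTU-time. A sketch (my own; the function name and constants are illustrative):

```python
# Sketch of the "raise the target to one MTU at low bandwidths" rule
# described above. Constants are illustrative assumptions, not values
# from the cablelabs study.

MTU_BYTES = 1500
DEFAULT_TARGET_MS = 5.0  # codel's stock target

def fq_codel_target_ms(link_mbit: float) -> float:
    """Target delay: the default, or one MTU's serialization time,
    whichever is larger."""
    mtu_time_ms = (MTU_BYTES * 8) / (link_mbit * 1e6) * 1e3
    return max(DEFAULT_TARGET_MS, mtu_time_ms)

print(fq_codel_target_ms(20.0))  # 5.0  -- default is fine at 20mbit
print(fq_codel_target_ms(4.0))   # 5.0  -- one MTU = 3ms, still under target
print(fq_codel_target_ms(1.0))   # 12.0 -- one MTU takes 12ms at 1mbit
```

Below the crossover point (a 1500-byte MTU at 2.4mbit takes exactly 5ms) the default target is physically unattainable for a queue holding even one packet, which is why the adjustment is needed.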
You can hear 3 packets lost in a row (which is no different from a sudden delay of 60ms on a stream). So in addition to the packet loss figure, good measurements would be a bursty packet loss graph and a bursty delay graph. Hopefully in the aqm cases these follow a random distribution, but... (Similarly, many tcps respond to bursty packet loss in a drastic fashion, but do not react much to bursty delay.)

Another problem with VOIP is "creeping delay", where a voip queue builds and builds and then delivers or drops a full boatload of packets to catch up. I have experienced this on multiple wifi based voip sessions where I ended up with seconds of delay on the line over time...

** ns2 issues

Many of the problems in this test series are due to using an obsolete and undermaintained network model system (ns2). Alternatives exist that are better (ns3, mininet, etc) in multiple respects. While ns2 can certainly be improved, the cost/benefit ratio of using ns3 seems better. (There is no harm in using other technologies as a cross-check, too.)

*** ns2 doesn't support many modern networking features, like ECN.
*** ns2 doesn't have TOS values (so it can't do diffserv).
*** ns2 doesn't have port numbers. fq_codel uses a 5 tuple of source and destination ports, protocol, and source and destination ips.

* Statistical notes

Analyzing network queue and network traffic behavior does not lend itself to many means of statistical analysis. In particular, throwing out the upper or lower percentiles of most results is a bad practice - with real time systems, it's the outliers that are interesting and important.

This paper uses CDF plots throughout, which is a fine way to measure the full range of results; however, the use of a log scale is difficult on the unpracticed eye. In its summary form, the paper uses a method of averaging together the results of each separate subtest, and the weighting is unclear. This in particular gives the bittorrent result a lot more weight than one would expect.
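The bursty-loss graph suggested in the back to back packet drop section above is cheap to compute from a per-packet trace: just collect the lengths of consecutive-loss runs. A minimal sketch (my own; the function name is made up):

```python
# Sketch of the "bursty packet loss" measurement proposed above:
# reduce a per-packet loss trace to the lengths of consecutive-loss
# runs. Three isolated drops are inaudible on VOIP; one run of three
# is not.

def loss_bursts(trace):
    """Given a per-packet trace (truthy = lost), return the length of
    each run of consecutive losses."""
    bursts, run = [], 0
    for lost in trace:
        if lost:
            run += 1
        elif run:
            bursts.append(run)
            run = 0
    if run:                 # trace may end mid-burst
        bursts.append(run)
    return bursts

# Two isolated drops plus one three-packet burst:
trace = [0, 1, 0, 0, 1, 0, 1, 1, 1, 0]
print(loss_bursts(trace))  # [1, 1, 3]
```

A histogram or CDF of these run lengths (and the analogous runs of above-threshold delay) would show whether an AQM's drops stay randomly scattered or clump into the audible bursts described above.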
NOTE: RW = real world

* Overall recommendations...

** Continue developing better models!

** Find a cable vendor willing to do the exploratory work

It took me 24 hours to port the code from ns2 to Linux. Analysis of several other operating systems indicates that a week, in conjunction with a local expert, would be enough to make this code work on most other network OSes.

** Better models of media acquisition (cable, wifi, lte), aggregation, or scheduling characteristics (wifi, lte, token ring) dissimilar to ethernet.

* Nits

** fq_pie

Given the probabilistic dropping technique in PIE, it too can benefit from flow queuing, fair queueing, or weighted fair queuing techniques inserted before it in the chain, with only one PIE queue needed to do so. (This is unlike fq_codel, which, due to the isochronous nature of the timestamping, presently has to have one codel queue per flow queue in order to work.) The prospect of fq + pie seems quite promising.

* Packet Trains

* Next steps in the RW cerowrt tests

** pie and fq_pie improvements

Cerowrt gained pie support back in august or so and has tracked revisions 1-4. Codel has had some tiny tweaks as well.

** Several new codel and fq_codel models

Based on the cablelabs testing, it is evident that the number of queues could be made dependent on the available bandwidth, and this can be made automatic. There are also a few tweaks that can be made to pie and codel (ecn handling, notably). It also seems possible to improve fq_codel's behavior by adopting slightly different strategies for nearly empty queues under load. Like (original) PIE, favoring dropping big packets more, particularly when a queue is at 1 packet and the system is experiencing large delays, seems a plausible way to handle the bittorrent problem while not affecting most other traffic types.

** classification

Under test are several shapers that use limited amounts of classification. In the RW, at the campground test site, over 52% of all packets are marked CS1 on egress.
Applications are obviously trying to deprioritize themselves; it seems logical to try to support that.

* Side notes

It should be clear that these AQMs and packet schedulers can apply not only to edge networks, but anywhere there is a fast to slow transition on a network. This includes within a machine itself! It may well be easier to apply these technologies to load balancers, high end Linux based routers, and vm root hardware, piecemeal, far faster than they can be deployed en masse across the customer edge network. Linux based servers can also benefit, today. Deploying these AQM technologies at any scale will help gain needed operational experience with them before they have to be burned into hard-to-update edge customer network hardware.

--
Dave Täht

Fixing bufferbloat with cerowrt: http://www.teklibre.com/cerowrt/subscribe.html

_______________________________________________
aqm mailing list
aqm@ietf.org
https://www.ietf.org/mailman/listinfo/aqm