Re: [Bloat] Congestion control with FQ-Codel/Cake with Multicast?
> On 24 May, 2024, at 12:43 am, Holland, Jake via Bloat wrote:
>
> I agree with your conclusion that FQ systems would see different
> streams as separate queues and each one could be independently
> overloaded, which is among the reasons I don't think FQ can be
> viewed as a solution here (though as a mitigation for the damage
> I'd expect it's a good thing to have in place).

Cake has the facility to override the built-in flow and tin classification using custom filter rules. Look in the tc-cake manpage under "OVERRIDING CLASSIFICATION". This could be applicable to multicast traffic in two ways:

1: Assign all traffic with a multicast-range IP address to the Video tin. Since Cake schedules by tin first, and only then by host and/or flow, this should successfully keep multicast traffic from obliterating best-effort and Voice tin traffic.

2: Assign all multicast traffic to a single flow ID (eg. zero), without reassigning the tin. This will cause it all to be treated like a single flow, giving the FQ mechanisms something to bite on.

 - Jonathan Morton
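As a sketch of option 1 (the interface, handle, and bandwidth are hypothetical, and the tin numbering is one reading of tc-cake(8) for diffserv4; verify against your version):

    # Hypothetical setup: Cake as the root qdisc on eth0, with diffserv4 tins.
    tc qdisc replace dev eth0 root handle 8001: cake bandwidth 100Mbit diffserv4

    # Steer the IPv4 multicast range into the Video tin. Per this reading of
    # tc-cake(8), a filter result whose major number matches the qdisc handle
    # overrides the tin; under diffserv4 the tins are 1=Bulk, 2=Best Effort,
    # 3=Video, 4=Voice.
    tc filter add dev eth0 parent 8001: protocol ip prio 1 \
        u32 match ip dst 224.0.0.0/4 flowid 8001:3

    # Option 2 (a single flow ID for all multicast) uses the separate
    # flow-override convention in the same manpage section; check the exact
    # classid encoding for your Cake version before relying on it.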
Re: [Bloat] "Very interesting L4S presentation from Nokia Bell Labs on tap for RIPE 88 in Krakow this week! "
> On 21 May, 2024, at 8:32 pm, Sebastian Moeller wrote:
>
>> On 21. May 2024, at 19:13, Livingood, Jason via Bloat wrote:
>>
>> On 5/21/24, 12:19, "Bloat on behalf of Jonathan Morton via Bloat" wrote:
>>
>>> Notice in particular that the only *performance* comparisons they make are
>>> between L4S and no AQM at all, not between L4S and conventional AQM - even
>>> though they now mention that the latter *exists*.
>>
>> I cannot speak to the Nokia deck. But in our field trials we have certainly
>> compared single queue AQM to L4S, and L4S flows perform better.

I don't dispute that, at least insofar as the metrics you prefer for such comparisons, under the network conditions you also prefer. But by omitting the conventional AQM results from the performance charts, the comparison presented to readers is not between L4S and the current state of the art, and the expected benefit is therefore exaggerated in a misleading way.

An unbiased presentation would alert readers to the fact that merely deploying a conventional AQM would already eliminate nearly all of the queue-related delay associated with a dumb FIFO, without sacrificing much if any goodput. By doing this, they would also not expose themselves to the risks associated with deploying L4S (see below).

>>> There's also no mention whatsoever of what happens when L4S traffic meets a
>>> conventional AQM.
>>
>> We also tested this and all is well; the performance of classic queue with
>> AQM is fine.
>
> [SM] I think you are thinking of a different case than Jonathan, not classic
> traffic in the C-queue, but L4S traffic (ECT(1)) that by chance is not hitting
> a bottleneck employing DualQ but the traditional FIFO...
> This is the case where at least TCP Prague just folds, gives up and goes
> home...
>
> Here is Pete's data showing that; the middle two bars show what happens when
> the bottleneck is not treating TCP Prague to the expected signalling...

This isn't even the case I was thinking of. Neither "classic" traffic in the C queue (a situation which L4S has always been designed to accommodate, however much we might debate the effectiveness of the design), nor L4S traffic in a dumb FIFO (which, though it performs badly, is at least "safe"), but L4S traffic in a "classic" RFC-3168 AQM, of the type which is already deployed to some extent. This is what exposes the fundamental incompatibility between L4S and conventional traffic, as I have been saying from practically the moment I heard about L4S.

It's unfortunate that this case is not covered in the chart that Sebastian linked. The situation arose because that particular chart is focused on a performance concern, not a safety concern which was treated elsewhere in the report. What it would show, if a fourth qdisc such as "codel" were included (with ECN turned on), is a similar magnitude of throughput bias as in the "pfifo" qdisc, but in the opposite direction. Note that the bias in the "pfifo" case arises solely because Prague does not *scale up* to high BDPs in the way that CUBIC does.

 - Jonathan Morton
Re: [Bloat] "Very interesting L4S presentation from Nokia Bell Labs on tap for RIPE 88 in Krakow this week! "
> On 21 May, 2024, at 6:31 pm, Frantisek Borsik via Bloat wrote:
>
> Just "fresh from the oven", shared by Jason on social media:
>
> https://ripe88.ripe.net/wp-content/uploads/presentations/67-20240521_RIPE88_L4S_introduction_Werner_Coomans_upload.pdf

The usual set of half-truths, with a fresh coat of paint.

Notice in particular that the only *performance* comparisons they make are between L4S and no AQM at all, not between L4S and conventional AQM - even though they now mention that the latter *exists*. There's also no mention whatsoever of what happens when L4S traffic meets a conventional AQM.

 - Jonathan Morton
Re: [Bloat] The Confucius queue management scheme
> On 10 Feb, 2024, at 7:05 pm, Toke Høiland-Jørgensen via Bloat wrote:
>
> This looks interesting: https://arxiv.org/pdf/2310.18030.pdf
>
> They propose a scheme to gradually let new flows achieve their fair
> share of the bandwidth, to avoid the sudden drops in the available
> capacity for existing flows that can happen with FQ if a lot of flows
> start up at the same time.

I took some time to read and think about this.

The basic idea is delightfully simple: "old" flows have a fixed weight of 1.0; "new" flows have a weight of (old flows / new flows) * 2^(k*t), where t is the age of the flow and k is a tuning constant, and are reclassified as "old" flows when this quantity reaches 1.0. They also describe a queuing mechanism which uses these weights, which while mildly interesting in itself, isn't directly relevant since a variant of DRR++ would also work here.

I noticed four significant problems, three of which arise from significant edge cases, and the fourth is an implementation detail which can easily be remedied. I didn't see any discussion of these edge cases in the paper, only the implementation detail. The latter is just a discretisation of the exponential function into doubling epochs, probably due to an unfamiliarity with fixed-point arithmetic techniques. We can ignore it when thinking about the wider design theory.

The first edge case is already fatal unless somehow handled: starting with an idle link, there are no "old" flows and thus the numerator of the equation is zero, resulting in a zero weight for any number of new flows which then arise. There are several reasonable and quite trivial ways to handle this.

The second edge case is the dynamic behaviour when "new" flows transition to "old" ones. This increases the numerator and decreases the denominator for other "new" flows, causing a cascade effect where several "new" flows of similar but not identical age suddenly become "old", and younger flows see a sudden jump in weight, thus available capacity. This would become apparent in realistic traffic more easily than in a lab setting. A formulation which remains smooth over this transition would be preferable.

The third edge case is that there is no described mechanism to remove flows from the "old" set when they become idle. Most flows on the Internet are in practice short, so they might even go permanently idle before leaving the "new" set. If not addressed, this becomes either a memory leak or a mechanism for the flow hash table to rapidly fill up, so that in practice all flows are soon seen as "old". The DRR++ mechanism doesn't suffice, because the state in Confucius is supposed to evolve over longer time periods, much longer than the sojourn time of an individual packet in the queue.

The basic idea is interesting, but the algorithmic realisation of the idea needs work.

 - Jonathan Morton
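A minimal Python sketch of the weight rule described above; the value of k and the one-line floor used to dodge the zero-numerator edge case are assumptions, not from the paper:

    import math

    def new_flow_weight(n_old: int, n_new: int, age_s: float, k: float = 1.0) -> float:
        """Weight of a 'new' flow per the Confucius rule: (old/new) * 2^(k*t).

        n_old, n_new: counts of flows currently classed as old / new.
        age_s: age of this flow in seconds; k: tuning constant (assumed value).
        """
        # Edge case 1: with no "old" flows the numerator is zero, so every new
        # flow would be stuck at weight 0 forever. Flooring the ratio is one of
        # the trivial fixes alluded to above (an assumption, not from the paper).
        ratio = max(n_old, 1) / max(n_new, 1)
        weight = ratio * math.pow(2.0, k * age_s)
        # A flow is reclassified as "old" once its weight reaches 1.0.
        return min(weight, 1.0)

    # Example: 1 old flow, 8 new flows. Each new flow starts at 1/8 weight
    # and doubles every 1/k seconds until it reaches parity.
    for t in range(5):
        print(t, new_flow_weight(1, 8, t))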
Re: [Bloat] slow start improvement
> On 28 Dec, 2023, at 12:17 pm, Sebastian Moeller via Bloat wrote:
>
> The inherent idea seems to be if one would know the available capacity one
> could 'jump' the cwnd immediately to that window... (ignoring the fact the
> rwnd typically takes a while to increase accordingly*).

Yes, I've just got to the bit about selectively ignoring rwnd - that's a straight violation of TCP. There may be scope for optimising congestion control in various ways, but rwnd is a fundamental part of the protocol that predates congestion control itself; it implements TCP's original function of "flow control". Sending data outside the rwnd invites the receiver invoking RST, or even firewall action, which I can guarantee will have a material impact on flow completion time!

Slow-start already increases cwnd to match the BDP in at most 20 RTTs, and that's the extreme condition, starting from an IW of 1 segment and ramping up to the maximum possible window of 2^30 bytes (assuming an MSS of at least 1KB, which is usual). The more recent standard of having IW=10 already shortens that by 3-4 RTTs. It's an exponential process, so even quite large changes in available bandwidth don't affect the convergence time very much. TCP's adaptation to changes in the BDP after slow-start is considerably slower, even with CUBIC.

I also note a lack of appreciation as to how HyStart (and HyStart++) works. Their delay-sensitive criterion is triggered not when the cwnd exceeds the BDP, but at an earlier point when the packet bursts (issued at double the natural ack-clocked rate) cause a meaningful amount of temporary queue delay. This queuing is normally drained almost immediately after it occurs, *precisely because* the cwnd has not yet reached the true path BDP. This allows slow-start to transition to congestion-avoidance smoothly, without a multiplicative-decrease episode.

HyStart++ adds a further phase of exponential growth on a more cautious schedule, but with essentially the same principle in mind. The irony is that they rely on precisely the same phenomenon of short-term queuing, but observe it in the form of the limited delivery rate of a burst, rather than an increase in delay on the later packets of the burst.

 - Jonathan Morton
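A quick back-of-the-envelope check of those figures, assuming an idealised slow-start that doubles cwnd every RTT (real stacks deviate somewhat):

    import math

    def slow_start_rtts(bdp_bytes: float, iw_segments: int = 1, mss: int = 1024) -> int:
        """RTTs for an idealised slow-start (cwnd doubling per RTT) to reach the BDP."""
        segments_needed = bdp_bytes / mss
        if segments_needed <= iw_segments:
            return 0
        return math.ceil(math.log2(segments_needed / iw_segments))

    # Worst case cited above: IW=1, window of 2^30 bytes, MSS of 1 KB -> 20 RTTs.
    print(slow_start_rtts(2**30, iw_segments=1))   # 20
    # IW=10 shaves log2(10) = 3.3 doublings off that, i.e. 3-4 RTTs.
    print(slow_start_rtts(2**30, iw_segments=10))  # 17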
Re: [Bloat] Best approach for debloating Airbnb host?
> On 17 Oct, 2023, at 4:10 pm, Sebastian Moeller via Bloat wrote:
>
> [SM] This is why maybe a demo unit would be helpful, but then we would
> need something with commercial grade support to point them at? Maybe
> evenroute's IQrouter (I like their approach, but I never tested it).

For IETF Montreal and Singapore, I carried along my IQrouter and temporarily inserted it into the network of the AirBnBs we used - for which the host wasn't directly present. I only had to inform it that there was a new network to calibrate itself to, and it ran the necessary capacity tests automatically. It's also possible to inform it directly about the line's rated capacity, and it will just run tests to verify that the capacity is actually available.

Mine is the v2 hardware, which is no longer the one sold, but the v3 is just a newer model from the same underlying vendor. There seems to be enough commonality for a similar feature set and UI to be available in both versions. I'm sure that simplifies support logistics. They would easily be able to cope with an 80/20 line.

> 2. What would I recommend? Obviously, inserting something with cake into the
> mix would help a lot. Even if they were willing to let me examine their
> entire network (Comcast router, Apple Airport in our Airbnb unit, other
> router?) I have no idea what kind of tar baby I would be touching. I don't
> want to become their network admin for the rest of time.

For a one-stop "plug it in and go" solution, the IQrouter is hard to beat. Evenroute also do a reasonably good job of explaining the technical background on the necessary level for end users, to help them understand what needs to be plugged into what and why, and more importantly where things should NOT be plugged in any more.

Of course, while the IQrouter has a decent WiFi AP of its own, installing it wouldn't directly improve the WiFi characteristics of the Apple Airport - it's quite understandable to have a separate AP for guests, in particular so they don't have to "shout through a wall". However, if the airwaves are not overly congested (we found that the 2.4GHz band was a mess in Montreal, but 5GHz was fine), that probably doesn't matter, as the WiFi link may not be the bottleneck. If necessary, it could be substituted with a debloated AP - if there's one we can recommend with the "new wifi stack", so much the better.

 - Jonathan Morton
Re: [Bloat] [Starlink] [LibreQoS] [Rpm] net neutrality back in the news
> On 29 Sep, 2023, at 1:19 am, David Lang via Bloat wrote:
>
> Dave T called out earlier that the rise of bittorrent was a large part of the
> initial NN discussion here in the US. But a second large portion was a money
> grab from ISPs thinking that they could hold up large paid websites (netflix
> for example) for additional fees by threatening to make their service less
> useful to their users (viewing their users as an asset to be marketed to the
> websites rather than customers to be satisfied by providing them access to
> the websites)
>
> I don't know if a new round of "it's not fair that Netflix doesn't pay us for
> the bandwidth to service them" would fall flat at this point or not.

I think there were three more-or-less separate concerns which have, over time, fallen under the same umbrella:

1: Capacity-seeking flows tend to interfere with latency-sensitive flows, and the "induced demand" phenomenon means that increases in link rate do not in themselves solve this problem, even though they may be sold as doing so.

This is directly addressed by properly-sized buffers and/or AQM, and even better by FQ and SQM. It's a solved problem, so long as the solutions are deployed. It's not usually necessary, for example, to specifically enhance service for latency-sensitive traffic, if FQ does a sufficiently good job. An increased link rate *does* enhance service quality for both latency-sensitive and capacity-seeking traffic, provided FQ is in use.

2: Swarm traffic tends to drown out conventional traffic, due to congestion control algorithms which try to be more-or-less fair on a per-flow basis, and the substantially larger number of parallel flows used by swarm traffic. This also caused subscribers using swarm traffic to impair the service of subscribers who had nothing to do with it. FQ on a per-flow basis (see problem 1) actually amplifies this effect, and I think it was occasionally used as an argument for *not* deploying FQ. ISPs' initial response was to outright block swarm traffic where they could identify it, which was then softened to merely throttling it heavily, before NN regulations intervened. Usage quotas also showed up around this time, and were probably related to this problem.

This has since been addressed by several means. ISPs may use FQ on a per-subscriber basis to prevent one subscriber's heavy traffic from degrading service for another. Swarm applications nowadays tend to employ altruistic congestion control which deliberately compensates for the large number of flows, and/or mark them with one or more of the Least Effort class DSCPs. Hence, swarm applications are no longer as damaging to service quality as they used to be. Usage quotas, however, still remain in use as a profit centre, to the point where an "unlimited" service is a rare and precious specimen in many jurisdictions.

3: ISPs merged with media distribution companies, creating a conflict of interest in which the media side of the business wanted the internet side to actively favour "their own" media traffic at the expense of "the competition". Some ISPs began to actively degrade Netflix traffic, in particular by refusing to provision adequate peering capacity at the nodes through which Netflix traffic predominated, or by zero-rating (for the purpose of usage quotas) traffic from their own media empire while refusing to do the same for Netflix traffic. **THIS** was the true core of Net Neutrality.

NN regulations forced ISPs to carry Netflix traffic with reasonable levels of service, even though they didn't want to for purely selfish and greedy commercial reasons. NN succeeded in curbing an anti-competitive and consumer-hostile practice, which I am perfectly sure would resume just as soon as NN regulations were repealed. And this type of practice is just the sort of thing that technologies like L4S are designed to support. The ISPs behind L4S actively do not want a technology that works end-to-end over the general Internet. They want something that can provide a domination service within their own walled gardens. That's why L4S is a NN hazard, and why they actively resisted all attempts to displace it with SCE.

All of the above were made more difficult to solve by the monopolistic nature of the Internet service industry. It is actively difficult for Internet users to move to a truly different service, especially one based on a different link technology. When attempts are made to increase competition, for example by deploying a publicly-funded network, the incumbents actively sabotage those attempts by any means they can. Monopolies are inherently customer-hostile, and arguments based on market forces fail in their presence.

 - Jonathan Morton
Re: [Bloat] [Ecn-sane] The curious case of "cursed-ECN" steam downloads
> On 3 Sep, 2023, at 9:54 pm, Sebastian Moeller via Ecn-sane wrote:
>
> B) Excessive ECT(1) marking (this happened with a multi-GB download)

This *could* be a badly configured middlebox attempting to apply a DSCP, but clobbering the entire TOS byte instead of the (left-justified) DSCP field.

Apparently Comcast just found a whole raft of these in their own network as part of rolling out L4S support. Funny how they didn't notice them previously.

 - Jonathan Morton
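A toy illustration of that failure mode; the specific clobbered value 0x45 is hypothetical, the point being that any whole-byte write whose low two bits happen to be 01 stamps ECT(1) onto every packet:

    # The old TOS byte is DSCP (6 bits, left-justified) plus ECN (2 bits).
    # A well-behaved middlebox rewrites only the top six bits; a buggy one
    # writes the whole byte and destroys the ECN field.

    ECN_NAMES = {0: "Not-ECT", 1: "ECT(1)", 2: "ECT(0)", 3: "CE"}

    def rewrite_dscp_correct(tos: int, dscp: int) -> int:
        return (dscp << 2) | (tos & 0x03)   # preserve the ECN field

    def rewrite_tos_clobber(tos: int, value: int) -> int:
        return value & 0xFF                 # stomp the whole byte (the bug)

    tos = (0 << 2) | 2   # best-effort DSCP, ECT(0) negotiated by the endpoints
    for new in (rewrite_dscp_correct(tos, 46), rewrite_tos_clobber(tos, 0x45)):
        print(f"TOS=0x{new:02x} DSCP={new >> 2} ECN={ECN_NAMES[new & 3]}")
    # The clobbered case comes out as DSCP=17, ECN=ECT(1), on every packet.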
Re: [Bloat] [Ecn-sane] quick question
> On 26 Aug, 2023, at 2:48 pm, Sebastian Moeller via Ecn-sane wrote:
>
> percentage of packets marked: 100 * (2346329 / 3259777) = 72%
>
> This seems like too high a marking rate to me. I would naively expect that a
> flow, on getting a mark, would scale back its cwnd by 20-50% and then slowly
> increase it again, so I expect the actual marking rate to be considerably
> below 50% per flow...
> My gut feeling is that these steam flows do not obey RFC3168 ECN (or
> something wipes the CE marks my router sends upstream along the path)... but
> without a good model of what marking rate I should expect, this is very
> hand-wavy, so if anybody could help me out with an easy derivation of the
> expected average marking rate I would be grateful.

Yeah, that's definitely too much marking. We've actually seen this behaviour from Steam servers before, but they had fixed it at some point. Perhaps they've unfixed it again.

My best guess is that they're running an old version of BBR with ECN negotiation left on. BBRv1, at least, completely ignores ECE responses. Fortunately BBR itself does a good job of congestion control in the FQ environment which Cake provides, as you can tell by the fact that the queues never get full enough to trigger heavy dropping.

The CUBIC RFC offers an answer to your question: reading the table, for an RTT of 100ms and throughput of 100Mbps in a single flow, a "loss rate" (equivalent to a marking rate) of about 1 per 7000 packets is required. The formula can be rearranged to find a more general answer.

 - Jonathan Morton
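For the rearrangement mentioned above, the standard deterministic-loss model behind CUBIC's response function (a textbook derivation, not quoted from the RFC) gives, with average window \bar{W} in packets, K and RTT in seconds, and one mark assumed per congestion epoch:

    W(t) = C\,(t-K)^3 + W_{max}, \qquad K = \left( \frac{W_{max}(1-\beta)}{C} \right)^{1/3}

    \bar{W} = \frac{3+\beta}{4}\,W_{max}   % average window over one epoch

    p = \frac{RTT}{K\,\bar{W}}
      = \frac{RTT}{\bar{W}^{4/3}} \left( \frac{(3+\beta)\,C}{4\,(1-\beta)} \right)^{1/3}

The expected packets per mark is then 1/p, which grows as \bar{W}^{4/3}; plugging in CUBIC's constants (C = 0.4, \beta = 0.7) and the window implied by a given rate and RTT gives the marking rate a compliant flow should see.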
Re: [Bloat] [Cake] Anybody has contacts at Dropbox?
> On 25 Jun, 2023, at 12:00 am, Sebastian Moeller via Cake wrote:
>
> Is dropbox silently already using an L4S-style CC for their TCP?

It should be possible to distinguish this by looking at the three-way handshake at the start of the connection. This will show a different set of TCP flags and ECN field values depending on whether RFC-3168 or AccECN is being attempted. Without AccECN, you won't have functioning L4S on a TCP stream.

But I think it is more likely that it's a misapplied DSCP.

 - Jonathan Morton
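If one wants to check this from a capture, the following tcpdump sketch illustrates the idea; the byte offsets follow the classic TCP header layout, but treat the AccECN bit details as an assumption to verify against the current draft:

    # RFC-3168 ECN setup: the client's SYN (no ACK) carries CWR+ECE.
    tcpdump -ni eth0 'tcp[13] & 0xd2 == 0xc2'

    # AccECN setup (a prerequisite for L4S over TCP): the SYN additionally
    # sets the AE bit, the old NS bit in the low-order bit of header byte 12.
    tcpdump -ni eth0 '(tcp[13] & 0xd2 == 0xc2) and (tcp[12] & 0x01 == 0x01)'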
Re: [Bloat] SQM tuning question
> On 3 Jun, 2023, at 4:56 pm, John D via Bloat wrote:
>
> On the website it says the following:
>
> CoDel is a novel “no knobs”, “just works”, “handles variable bandwidth and
> RTT”, and simple AQM algorithm.
>
> • It is parameterless — no knobs are required for operators, users, or
>   implementers to adjust.
> • It treats good queue and bad queue differently - that is, it keeps
>   the delays low while permitting bursts of traffic.
> • It controls delay, while insensitive to round-trip delays, link
>   rates, and traffic loads.
> • It adapts to dynamically changing link rates with no negative impact
>   on utilization.
>
> But everywhere I have read about hardware which implements SQM
> (including the bufferbloat website) it describes the need to tune based on
> actual internet connection speed.
> These seem to conflict, especially that "handles variable bandwidth" bit. Have
> I misunderstood, or do the algorithms used in modern hardware just not provide
> this part typically? My connection performance is quite variable and I'm
> worried about crippling SQM to the lowest speed seen.

SQM in practice requires three components:

1: Flow isolation, so that different flows don't affect each others' latency and are delivered fairly;

2: Active Queue Management (AQM) to signal flows to slow down transmissions when link capacity is exceeded;

3: Bandwidth shaping to match the queue to the available capacity.

CoDel is, in itself, only the AQM component. It does indeed work pretty well with no additional tuning - but only in combination with the other two components, or when applied directly to the actual bottleneck. Unfortunately in most consumer internet links, the actual bottleneck is inaccessible for this purpose. Thus an artificial bottleneck must be introduced, at which SQM is applied.

The most convenient tool for applying all three SQM components at once is Cake. This includes implementations of advanced flow isolation, CoDel AQM, and a deficit-mode bandwidth shaper. All you really need to do is to tell it how much bandwidth you have in each direction, minus a small margin to ensure it becomes the actual bottleneck and can exert the necessary control.

When your available bandwidth varies over time, that can be inconvenient. There are methods, however, of observing how available capacity tends to change over time (typically on diurnal and weekly patterns, if the variations are due to congestion in the ISP backhaul or peering) and scheduling adjustments on that basis. If you have more information on your situation, we might be able to give more detailed advice.

 - Jonathan Morton
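By way of illustration, a minimal Cake deployment for a nominal 100/20 Mbit line might look like this (interface names and rates are hypothetical, and the roughly 5% margin is a rule of thumb rather than a requirement):

    # Egress: shape to just below the uplink rate so Cake owns the bottleneck.
    tc qdisc replace dev eth0 root cake bandwidth 19Mbit

    # Ingress: redirect incoming traffic through an IFB device, then shape it.
    ip link add ifb0 type ifb
    ip link set ifb0 up
    tc qdisc replace dev eth0 handle ffff: ingress
    tc filter add dev eth0 parent ffff: protocol all matchall \
        action mirred egress redirect dev ifb0
    tc qdisc replace dev ifb0 root cake bandwidth 95Mbit ingress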
Re: [Bloat] [Codel] ACM queue article on facebook´s "Adaptive LIFO" and codel
> On 11 Apr, 2023, at 5:12 am, Dave Taht wrote:
>
> I have no idea what an "adaptive LIFO" is, but the acm queue paper
> here just takes the defaults from codel...
>
> https://twitter.com/teivah/status/1645362443986640896

They're applying it to a server request queue, not a network packet queue. I can see the logic of it in that context, but I would also note that LIFO breaks one of Codel's core assumptions, which is that the maximum delay of the queue it's controlling can be inferred from the delay experienced by the most recently dequeued item.

Maybe it still happens to work by accident, or maybe they've implemented some specific workaround, but that paper is a very high-level overview (of more than one technology, to boot) without much technical detail. If I didn't already know a great deal about Codel from the coal face, I wouldn't even know to consider such a failure mode, let alone be able to infer what they could do to mitigate it.

 - Jonathan Morton
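A toy model of the broken assumption, under the simplification of one arrival and one departure per time unit with a small standing queue (these rates are assumptions): the sojourn time sampled at dequeue reflects the backlog under FIFO, but stays near zero under LIFO, starving Codel of its signal:

    from collections import deque

    fifo, lifo = deque(), []
    for step in range(10):
        now = float(step)            # one arrival per time unit
        fifo.append(now)
        lifo.append(now)
        if step >= 3:                # let a standing queue of 3 build first
            f, l = fifo.popleft(), lifo.pop()
            # Codel infers queue delay from the item it has just dequeued:
            print(f"t={now:.0f}: FIFO sojourn {now - f:.0f}, "
                  f"LIFO sojourn {now - l:.0f}")
    # FIFO reports a steady sojourn of 3 (the standing backlog); LIFO reports
    # 0 as long as fresh items keep arriving, so Codel would never react.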
Re: [Bloat] Hey, all, what about bandlength?
> On 8 Apr, 2023, at 9:49 pm, Michael Richardson via Bloat wrote:
>
>> If I have a bandwidth of 1 Mbit/S, but it takes 2 seconds to deliver 1
>> Mbit, do I have a bandlength of only 1/2 Mbit/S?
>
> Is that because there is 2 seconds of delay?

It could merely be that, due to new-flow effects, the effective utilisation of the path is only 50% over those two seconds. A longer flow might have better utilisation in its later stages.

 - Jonathan Morton
Re: [Bloat] [ih] Installed base momentum (was Re: Design choices in SMTP)
> -- Forwarded message -
> From: Jack Haverty via Internet-history
>
> Even today, as an end user, I can't tell if "congestion control" is
> implemented and working well, or if congestion is just mostly being
> avoided by deployment of lots of fiber and lots of buffer memory in all
> the switching locations where congestion might be expected. That of
> course results in the phenomenon of "buffer bloat". That's another
> question for the Historians. Has "Congestion Control" in the Internet
> been solved? Or avoided?

It's a good question, and one that shows understanding of the underlying problem.

TCP has implemented a workable congestion control system since the introduction of Reno, and has continued to take congestion control seriously with the newer flavours of Reno (eg. NewReno, SACK, etc) and CUBIC. Each of these schemes reacts to congestion *signals* from the network; they probe gradually for capacity, then back off rapidly when that capacity is evidently exceeded, repeatedly. Confusingly, this process is called the "congestion avoidance" phase of TCP, to distinguish it from the "slow start" phase which is, equally confusingly, a rapid initial probe for path capacity. CUBIC's main refinement is that it spends more time near the capacity limit thus found than Reno does, and thus scales better to modern high-capacity networks at Internet scale.

In the simplest and most widespread case, the overflow of a buffer, resulting in packet loss, results in that loss being interpreted as a congestion signal, as well as triggering the "reliable stream" function of retransmission. Congestion signals can also be explicitly encoded by the network onto IP packets, in the form of ECN, without requiring packet losses and the consequent retransmissions.

My take is that *if* networks focus only on increasing link and buffer capacity, then they are "avoiding" congestion - a strategy that only works so long as capacity consistently exceeds load. However, it has repeatedly been shown in many contexts (not just networking) that increased capacity *stimulates* increased load; the phenomenon is called "induced demand". In particular, many TCP-based Internet applications are "capacity seeking" by nature, and will *immediately* expand to fill whatever path capacity is made available to them. If this causes the path latency to exceed about 2 seconds, DNS timeouts can be expected and the user experience will suffer dramatically.

Fortunately, many networks and, more importantly, equipment providers are now learning the value of implementing AQM (to apply congestion signals explicitly, before the buffers are full), or failing that, of sizing the buffers appropriately so that path latency doesn't increase unreasonably before congestion signals are naturally produced. This allows TCP's sophisticated congestion control algorithms to work as intended.

 - Jonathan Morton
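To make the probe-and-back-off cycle concrete, a minimal AIMD sketch in Python (idealised to one update per RTT; the 100-segment signalling threshold is hypothetical):

    def reno_step(cwnd: float, congestion_signal: bool) -> float:
        """One RTT of Reno-style congestion avoidance (AIMD)."""
        if congestion_signal:          # loss or ECN mark: back off rapidly
            return max(cwnd / 2.0, 1.0)
        return cwnd + 1.0              # otherwise: probe gradually for capacity

    # A link that signals whenever cwnd exceeds 100 segments produces the
    # familiar sawtooth oscillating around the capacity limit.
    cwnd, trace = 60.0, []
    for _ in range(80):
        cwnd = reno_step(cwnd, congestion_signal=(cwnd > 100))
        trace.append(cwnd)
    print(trace[-10:])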
Re: [Bloat] summarizing the bitag latency report?
ge "railcars" are also available for hire. For the next month's timetable, instead of the two 12-carriage trains each day, he will run one of these railcars every hour. These will provide exactly the same seating capacity over the course of the day, but the waiting time will now be limited to a much more palatable duration. (In Internet terms, he's optimised squarely for latency.) Still the complaints come in - but now from different sources. No longer are passengers waiting for hours and sleeping overnight in stations. Instead, rush-hour commuters who had previously found the 12-carriage trains convenient are finding the railcars too crowded. Even with over a hundred passengers crammed in like sardines, many more are left on the platforms and arrive at work late - or worse, come home to a cold dinner and an annoyed wife. Simply put, demand is not evenly distributed through the day, but concentrated on particular times; at other times, the railcars are sufficient for the relatively small number of passengers, or even run almost empty. So again, even though the "Quality of Service" is provided just as specified, the "Quality of Experience" for the passengers is very poor. Indeed the overcrowding leads to some railcars being delayed, due to the difficulty of getting everyone in and out of the doors, and the conductors have great difficulty in checking tickets, hence a noticeable reduction in fare revenue. Things improve markedly when the manager brings in 6-carriage express trains for the morning, lunchtime, and evening commuters, and continues to run the railcars at hourly intervals in between them, except for the small hours when some trains are removed due to minimal demand. Now there are enough carriages in the rush-hour trains to satisfy commuters, and there are still trains running at other times so that nobody needs to wait particularly long for one. In fact, demand increases substantially due to the good "Quality of Experience" that this new timetable provides, such that by the end of the first year, many of the railcars are upgraded to 3-carriage trains, and the commuter expresses are lengthened to 8 carriages. Fare revenue is more than doubled. The modernisation effort is a success. The lesson here is that QoS is merely the means by which you may attempt to achieve high QoE. Meeting QoS does not guarantee QoE. Only if the QoS is designed around the factors that genuinely influence QoE will you succeed. Unfortunately, many QoS schemes are inadequate for the needs of actual Internet users; this is because their designers have not kept up with the appropriate QoE factors. - Jonathan Morton ___ Bloat mailing list Bloat@lists.bufferbloat.net https://lists.bufferbloat.net/listinfo/bloat
Re: [Bloat] Researchers discover major roadblock in alleviating network congestion
> On 5 Aug, 2022, at 2:46 am, Daniel Sterling wrote:
>
> "Flow control power is non-decentralizable" is from -- 1981? So we've
> known for 40 years that TCP streams won't play nicely with each other
> unless you shape them at the slower endpoint -- am I understanding that
> correctly? But we keep trying anyway? :)

More precisely, what was stated in 1981 was: the specific metric of "network power" (the ratio of throughput to delay, calculated for each flow and globally summed) cannot reliably be maximised solely by the action of individual endpoints, without information from within the network itself.

Current TCPs generally converge not to maximise or even equalise "network power", but to equalise between flows a completely different metric called "RTT fairness", the *product* of throughput and delay. Adding information from the network via AQMs allows for reductions in delay with little effect on throughput, and thus a general increase in network power, but the theoretical global optimum is still not even approached.

Adding FQ in the network implements "max-min fairness" instead of "RTT fairness", hence equalising throughput instead of the product of throughput and delay; this is essentially the geometric mean of RTT-fairness and network power.

I believe it is actually possible to achieve equalisation of network power between flows, which would approach the global optimum of network power, using information from the network to guide endpoint behaviour. This is *only* possible using explicit information from the network, however, and is not directly compatible with the current congestion-control paradigm of RTT-fairness by default.

 - Jonathan Morton
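In symbols, with x_i the throughput and d_i the delay of flow i (this notation is an editorial convenience, not Jaffe's), the three quantities above are:

    P = \sum_i x_i / d_i                    % network power, summed over flows
    x_i d_i = x_j d_j \quad \forall i,j     % RTT fairness: equal rate-delay product
    x_i = x_j \quad \forall i,j             % max-min fairness, as enforced by FQ

    x_i = \sqrt{ (x_i / d_i)\,(x_i d_i) }   % equal throughput sits at the geometric
                                            % mean of per-flow power and RTT-fairness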
Re: [Bloat] Researchers discover major roadblock in alleviating network congestion
> On 4 Aug, 2022, at 3:21 pm, Bjørn Ivar Teigen via Bloat wrote:
>
> Main take-away (as I understand it) is something like "In real-world
> networks, jitter adds noise to the end-to-end delay such that any algorithm
> trying to infer congestion from end-to-end delay measurements will
> occasionally get it wrong and this can lead to starvation". Seems related to
> Jaffe's work on network power (titled "Flow control power is
> non-decentralizable").

Hasn't this been known for many years, as a consequence of experience with TCP Vegas?

 - Jonathan Morton
Re: [Bloat] updating the theory of buffer sizing
> On 10 Oct, 2021, at 8:48 pm, Dave Taht wrote:
>
> This latest from Nick & co was quite good:
>
> https://arxiv.org/pdf/2109.11693.pdf

Skip the false modesty - I think this is very important work, actually. I would expect it to get cited a heck of a lot in future work, both in academia and in the IETF.

In terms of its content, it confirms, contextualises, and formalises various things that I already understood at an intuitive level. The mathematics involved is simple and accessible (unlike some papers I've read recently), and the practical explanations of the observed behaviours are clear and to the point. I particularly appreciate the way they were able to parameterise certain characteristics on a continuum, rather than all-or-nothing, as that captures the complex characteristics of real traffic much better.

The observations about synchronisation of congestion responses are also very helpful. When synchronised, the aggregate behaviour of many flows is similar to that of a much smaller number, perhaps even a single flow. When desynchronised, the well-known statistical multiplexing effects apply. They also clearly explain why the "hard threshold" type of ECN marking is undesirable - because it provokes synchronisation in a way that tail-drop does not (and this is also firmly related to a point we discussed last week).

Notably, they started seeing the effects of burstiness, on a small and theoretically "smooth" network, on timescales of approximately a seventh of a millisecond (20 packets, 9000 byte MTU, 10Gbps). They were unable to reduce buffer sizes below that level without throughput dropping well below their theoretical predictions, which had held true down to that point. This has implications for setting AQM targets and tolerances in even near-ideal network environments. But they did also note that BBR showed much less sensitivity to this effect, as it uses pacing.

In any case, it confirms that the first role of a buffer is to absorb bursts without excessive loss.

 - Jonathan Morton
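For reference, the quoted timescale follows directly from the stated parameters:

    \frac{20 \times 9000\,\text{B} \times 8\,\text{b/B}}{10\,\text{Gb/s}}
      = 144\,\mu\text{s} \approx \tfrac{1}{7}\,\text{ms}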
Re: [Bloat] Relentless congestion control for testing purposes
> On 29 Sep, 2021, at 2:17 am, Dave Taht wrote:
>
> In today's rpm meeting I didn't quite manage to make a complicated point.
> This long-ago proposal of Matt Mathis's has often intrigued (inspired?
> frightened?) me:
>
> https://datatracker.ietf.org/doc/html/draft-mathis-iccrg-relentless-tcp-00
>
> where he proposed that a tcp variant have no response at all to loss or
> markings, merely replacing lost segments as they are requested, continually
> ramping up until the network basically explodes.

I think "no response at all" is overstating it. Right in the abstract, it is described as removing the lost segments from the cwnd; ie. only acked segments result in new segments being transmitted (modulo the 2-segment minimum). In this sense, Relentless TCP is an AIAD algorithm much like DCTCP, to be classified distinctly from Reno (AIMD) and Scalable TCP (MIMD).

Relentless congestion control is a simple modification that can be applied to almost any AIMD-style congestion control: instead of applying a multiplicative reduction to cwnd after a loss, cwnd is reduced by the number of lost segments. It can be modeled as a strict implementation of Van Jacobson's Packet Conservation Principle. During recovery, new segments are injected into the network in exact accordance with the segments that are reported to have been delivered to the receiver by the returning ACKs.

Obviously, an AIAD congestion control would not coexist nicely with AIMD-based traffic. We know this directly from experience with DCTCP. It cannot therefore be recommended for general use on the Internet. This is acknowledged extensively in Mathis' draft.

> In the context of *testing* bidirectional network behaviors in particular,
> seeing tcp tested more than unicast udp has been, in more labs, has long been
> on my mind.

Yes, as a tool specifically for testing with, and distributed with copious warnings against attempting to use it more generally, this might be interesting.

 - Jonathan Morton
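A sketch of the contrast in update rules (segment counting only; the 2-segment floor is from the draft, the rest of the framing is editorial):

    def reno_on_loss(cwnd: int, lost: int) -> int:
        # Multiplicative decrease: halve the window regardless of how much was lost.
        return max(cwnd // 2, 2)

    def relentless_on_loss(cwnd: int, lost: int) -> int:
        # Relentless: subtract exactly the lost segments, nothing more
        # (strict packet conservation), subject to the 2-segment minimum.
        return max(cwnd - lost, 2)

    # Losing 3 segments out of a 100-segment window:
    print(reno_on_loss(100, 3))        # 50
    print(relentless_on_loss(100, 3))  # 97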
Re: [Bloat] DSLReports Speed Test doesn't like Remote Desktop
> On 28 Aug, 2021, at 10:36 pm, Michael Richardson wrote:
>
> RDP (specifically with Windows as the desktop) is integrated into the display
> pipeline such that it effectively never loses frames. The results of an
> (e.g.) Excel redraw over a slow link can be spectacularly stupid, with every
> cell being drawn each time it is "re"-computed. The result is that the
> application itself is blocked when the RDP frames are being generated.
>
> I/we observed this a decade ago when building virtual desktop infrastructure.
> There was a Linux Xrdp server (via a bunch of patches that didn't survive)
> that was more of a screen-scraper. VNC has always screen-scraped the pixels,
> so it "naturally" skips the intermediate frames when the application draws
> faster than the remote desktop protocol can keep up.
>
> I thought that there were patches to RDP to make this better, but I never
> confirmed this.

Funnily enough, I was actually in the VNC community for a while, having written a functioning server for Classic MacOS, so I'm familiar with this dilemma. Due to some quirks of Classic MacOS, it was often necessary to do the screen-scraping, encoding and socket transmissions at interrupt time, and I had to limit the amount of data generated at any given time so that it didn't block on a full buffer - which could lock *everything* up.

My experience of modern browser rendering pipelines is that they do everything in backbuffers, then blit them to the screen wholesale. This *should* be quite efficient for an RDP to handle, so long as it excludes areas that were unchanged on consecutive blits. But it's also possible for it to pick up drawing to background tabs, and only after much CPU effort determine that nothing visibly changed.

At any rate, the original problem turned out to be something else entirely.

 - Jonathan Morton
Re: [Bloat] DSLReports Speed Test doesn't like Remote Desktop
> On 27 Aug, 2021, at 2:25 am, Kenneth Porter wrote:
>
> The DSLReports speed test gives me <5 Mbps download speed when used over a
> Remote Desktop connection. The same test gives me around 200 Mbps when run on
> my machine connected to my display. The Waveform test shows 200 Mbps from the
> remote machine. All are done with Chrome. Bunch of tabs open on both, similar
> sets of extensions.
>
> I'm testing my Comcast XB3 modem + OpenWrt router before upgrading it to XB7.
>
> I use two computers, both Win10-x64. One's a half-height with a bit better
> CPU and memory that I use for development and web/mail, while the other has a
> full-height tower chassis so it has my good video card for gaming. I have my
> big 43" display hooked to the latter and I remote to the short machine for
> "business" use.
>
> https://www.waveform.com/tools/bufferbloat?test-id=62b54f0c-eb3e-40c8-ab99-4f2105f39525
>
> This one looks very poor, 4 Mbps:
>
> http://www.dslreports.com/speedtest/69341504
>
> Much better, direct instead of through RDP:
>
> http://www.dslreports.com/speedtest/69341657

A browser-based speed test like DSLreports depends heavily on the responsiveness of the browser itself. It would appear that RDP interferes with that quite spectacularly, although I'm unsure exactly why. The only advice I can give is "don't do that, then".

 - Jonathan Morton
Re: [Bloat] [Rpm] Airbnb
> On 10 Aug, 2021, at 7:51 am, Matt Mathis via Bloat wrote:
>
> For years we published a "jitter" metric that I considered to be bogus,
> basically max_rtt - min_rtt (which were built-in Web100 instruments).
>
> In 2019, we transitioned from Web100 to "standard" Linux tcp_info, which does
> not capture max_rtt. Since the Web100 jitter was viewed as bogus, we did
> not attempt to reconstruct it, although we could have. Designing and
> implementing a new latency metric was on my todo list from the beginning of
> that transition, but chronically preempted by more pressing problems.
>
> It finally made it to the top of my queue, which is why I am suddenly not
> lurking here and on the new rpm list. I was very happy to see the Apple
> responsiveness metric, and realized that M-Lab can implement a TCP version of
> it, that can be computed both in real time on future tests and retroactively
> over archived tests collected over the last 12 years.
>
> This quick paper might be of interest: Preliminary Longitudinal Study of
> Internet Responsiveness

Intriguing. The properly processed version of the data will probably show the trends more clearly, too.

I think there is merit in presenting the European data as well, so long as the discontinuities caused by topological/geographical alterations can be identified and indicated. There are some particular local phenomena that I think would be reflected in that, such as the early rollout of fq_codel by free.fr.

 - Jonathan Morton
Re: [Bloat] [Rpm] Airbnb
> On 9 Aug, 2021, at 10:25 pm, Dave Collier-Brown wrote:
>
> My run of it reported latency, but without any qualifiers...

One would reasonably assume that's idle latency, then.

 - Jonathan Morton
Re: [Bloat] [Make-wifi-fast] [Starlink] [Cake] [Cerowrt-devel] Due Aug 2: Internet Quality workshop CFP for the internet architecture board
> On 8 Aug, 2021, at 9:36 pm, Aaron Wood wrote:
>
> Less common, but something I still see, is that a moving station has
> continual issues staying in proper MIMO phase(s) with the AP. Or I think
> that's what's happening. Slow, continual movement of the two, relative to
> each other, and the packet rate drops through the floor until they stop
> having relative motion. And I assume that also applies to time-varying
> path-loss and path-distance (multipath reflections).

So is it time to mount test stations on model railway wagons?

 - Jonathan Morton
Re: [Bloat] [Starlink] Of interest: Comcast AQM Paper
On Wed, 4 Aug 2021 at 21:31, Juliusz Chroboczek wrote:

> A Cortex-A53 SoC at 1GHz with correctly designed Ethernet (i.e. not the
> Raspberry Pi) can push 1Gbit from userspace without breaking a sweat.

That was true of the earlier Raspberry Pis (eg. the Pi 3 uses a brace of Cortex-A53s), which use Ethernet chipsets attached over USB 2, but the Pi 4B has a directly integrated Ethernet port, and two of the external USB ports are USB 3, giving enough bandwidth to attach a second GigE port. We have tested this in practice, and got full line-rate throughput through Cake (though the CPU usage went up fairly sharply after about halfway).

The Compute Module 4 exposes the same integrated Ethernet port, and a PCIe lane in place of the USB 3 chipset (the latter being attached to the former in the standard Pi 4B). This obviously allows attaching at least one real GigE port (with a free choice of PCIe-based chipset) at full line rate, without the intermediate step of USB. I think it would be reasonable to include a small Ethernet switch downstream of this, matching the connectivity of typical CPE on the LAN side. If a PCIe switch is inserted, then a choice of Mini-PCIe Wifi cards can be installed, with cables running to the normal array of external antennae, sidestepping the problem of USB Wifi dongles.
Re: [Bloat] [Starlink] Of interest: Comcast AQM Paper
I firmly believe this is due to an I/O bottleneck in the SoC between the network complex and the CPU complex, not due to any limitation of the CPU itself. It stems from the reliance on accelerated forwarding hardware to achieve full line-rate throughput. Even so, I'd much rather have 40Mbps with Cake than 400Mbps with a dumb FIFO. (Heck, 40Mbps would be a big upgrade from what I currently have.) I think some of the newer Atheros chipsets are less constrained in this respect.

There are two reasonably good solutions to this problem in the hands of the SoC vendors:

1: Relieve that I/O bottleneck, so that the CPU can handle packets at full line rate. I assume this is not hugely complicated to implement, and just requires a sufficient degree of will to select the right option from the upstream fabless IP vendor's design library.

2: Implement good shaping, FQ, and AQM within the network complex. At consumer broadband/LAN speeds, this shouldn't be too difficult (unlike doing the same at 100+ Gbps), but it does require a significant amount of hardware design and validation, and that tends to have long lead times.

There is a third solution in the hands of us mere mortals:

3: Leverage the Raspberry Pi ecosystem to build a CPE device that meets our needs. This could be a Compute Module 4 (which has the necessary I/O throughput) mounted on a custom PCB that provides additional Ethernet ports and some reasonable Wifi AP. It could alternatively be a standard Pi 4B with some USB Ethernet and Wifi hardware plugged into it. Either will do the job without any Ethernet bottlenecks, although the capabilities of USB Wifi dongles are usually quite limited.

 - Jonathan Morton
Re: [Bloat] [Starlink] Of interest: Comcast AQM Paper
> I assume by WiFi what is really meant is devices that have at least one WiFi
> (layer 1/layer 2) interface. While there are queues in the MAC sublayer,
> there is really no queue management functionality ... yet ... AFAIK. I know
> IEEE P802.11bd in conjunction w/ IEEE 1609 is working on implementing a few
> rudimentary queue mgmt functions.
>
> That said, seems any AQM in such devices would more than likely be in layer 3
> and above.

Linux-based CPE devices have AQM functionality integrated into the Wifi stack. The AQM itself operates at layer 3, but the Linux Wifi stack implementation uses information from layers 2 and 4 to improve scheduling decisions, eg. airtime-fairness and flow-isolation (FQ). This works best on soft-MAC Wifi hardware, such as ath9k/10k and MT76, where this information is most readily available to software. In principle it could also be implemented in the MAC, but I don't know of any vendor that's done that yet.

 - Jonathan Morton
Re: [Bloat] [Make-wifi-fast] Little's Law mea culpa, but not invalidating my main point
> On 12 Jul, 2021, at 11:04 pm, Bob McMahon via Make-wifi-fast wrote:
>
> "Flow control in store-and-forward computer networks is appropriate for
> decentralized execution. A formal description of a class of "decentralized
> flow control algorithms" is given. The feasibility of maximizing power with
> such algorithms is investigated. On the assumption that communication links
> behave like M/M/1 servers it is shown that no "decentralized flow control
> algorithm" can maximize network power. Power has been suggested in the
> literature as a network performance objective. It is also shown that no
> objective based only on the users' throughputs and average delay is
> decentralizable. Finally, a restricted class of algorithms cannot even
> approximate power."
>
> https://ieeexplore.ieee.org/document/1095152
>
> Did Jaffe make a mistake?

I would suggest that if you model traffic as having no control feedback, you will inevitably find that no control occurs. But real Internet traffic *does* have control feedback - though it was introduced some time *after* Jaffe's paper, so we can forgive him for a degree of ignorance on that point. Perhaps Jaffe effectively predicted the ARPANET congestion collapse events with his analysis.

> Also, it's been observed that latency is non-parametric in its distributions,
> and computing gaussians per the central limit theorem for OWD feedback loops
> isn't effective. How does one design a control loop around things that are
> non-parametric? It also begs the question, what are the feed-forward knobs
> that can actually help?

Control at endpoints benefits greatly from even small amounts of information supplied by the network about the degree of congestion present on the path. This is the role played first by packets lost at queue overflow, then deliberately dropped by AQMs, then marked using the ECN mechanism rather than dropped.

AQM algorithms can be exceedingly simple, or they can be rather sophisticated. Increased levels of sophistication in both the AQM and the endpoint's congestion control algorithm may be used to increase the "network power" actually obtained. The required level of complexity for each, achieving reasonably good results, is however quite low.

 - Jonathan Morton
Re: [Bloat] rpm (was: on incorporating as an educational institution(s)?)
> On 11 Jul, 2021, at 1:15 am, Kenneth Porter wrote:
>
> What is "rpm"? I only know of the Redhat Package Manager and revolutions per
> minute. I don't see it explained on the mailing list page or in the mailing
> list postings.

It has been discussed recently. It is "Rounds Per Minute", Apple's new measure of network responsiveness.

 - Jonathan Morton
Re: [Bloat] Little's Law mea culpa, but not invalidating my main point
> On 10 Jul, 2021, at 2:01 am, Leonard Kleinrock wrote:
>
> No question that non-stationarity and instability are what we often see in
> networks. And, non-stationarity and instability are both topics that lead to
> very complex analytical problems in queueing theory. You can find some
> results on the transient analysis in the queueing theory literature
> (including the second volume of my Queueing Systems book), but they are
> limited and hard. Nevertheless, the literature does contain some works on
> transient analysis of queueing systems as applied to network congestion
> control - again limited. On the other hand, as you said, control theory
> addresses stability head on and does offer some tools as well, but again, it
> is hairy.

I was just about to mention control theory.

One basic characteristic of Poisson traffic is that it is inelastic, and assumes there is no control feedback whatsoever. This means it can only be a valid model when the following are both true:

1: The offered load is *below* the link capacity, for all links, averaged over time.

2: A high degree of statistical multiplexing exists.

If 1 is not true and the traffic is truly inelastic, then the queues will inevitably fill up and congestion collapse will result, as shown from ARPANET experience in the 1980s; the solution was to introduce control feedback to the traffic, initially in the form of TCP Reno. If 2 is not true, then the traffic cannot be approximated as Poisson arrivals, regardless of load relative to capacity, because the degree of correlation is too high.

Taking the iPhone introduction anecdote as an illustrative example, measuring utilisation as very close to 100% is a clear warning sign that the Poisson model was inappropriate, and a control-theory approach was needed instead, to capture the feedback effects of congestion control. The high degree of statistical multiplexing inherent to a major ISP backhaul is irrelevant to that determination.

Such a model would have found that the primary source of control feedback was human users giving up in disgust. However, different humans have different levels of tolerance and persistence, so this feedback was not sufficient to reduce the load enough to give the majority of users a good service; instead, *all* users received a poor service and many users received no usable service. Introducing a technological control feedback, in the form of packet loss upon overflow of correctly-sized queues, improved service for everyone.

(BTW, DNS becomes significantly unreliable around 1-2 seconds RTT, due to protocol timeouts, which is inherited by all applications that rely on DNS lookups. Merely reducing the delays consistently below that threshold would have improved perceived reliability markedly.)

Conversely, when talking about the traffic on a single ISP subscriber's last-mile link, the Poisson model has to be discarded due to criterion 2 being false. The number of flows going to even a family household is probably in the low dozens at best. A control-theory approach can also work here.

 - Jonathan Morton
Re: [Bloat] Abandoning Window-based CC Considered Harmful (was Re: Bechtolschiem)
> On 8 Jul, 2021, at 4:29 pm, Matt Mathis via Bloat wrote:
>
> That said, it is also true that multi-stream BBR behavior is quite
> complicated and needs more queue space than single stream. This complicates
> the story around the traditional workaround of using multiple streams to
> compensate for Reno & CUBIC lameness at larger scales (ordinary scales
> today). Multi-stream does not help BBR throughput and raises the queue
> occupancy, to the detriment of other users.

I happen to think that using multiple streams for the sake of maximising throughput is the wrong approach - it is a workaround employed pragmatically by some applications, nothing more. If BBR can do just as well using a single flow, so much the better.

Another approach to improving the throughput of a single flow is high-fidelity congestion control. The L4S approach to this, derived rather directly from DCTCP, is fundamentally flawed in that, not being fully backwards compatible with ECN, it cannot safely be deployed on the existing Internet. An alternative HFCC design using non-ambiguous signalling would be incrementally deployable (thus applicable to Internet scale) and naturally overlaid on existing window-based congestion control. It's possible to imagine such a flow reaching optimal cwnd by way of slow-start alone, then "cruising" there in a true equilibrium with congestion signals applied by the network. In fact, we've already shown this occurring under lab conditions; in other cases it still takes one CUBIC cycle to get there. BBR's periodic probing phases would not be required here.

> IMHO, two approaches seem to be useful:
> a) congestion-window-based operation with paced sending
> b) rate-based/paced sending with limiting the amount of inflight data

So this corresponds to approach a) in Roland's taxonomy.

 - Jonathan Morton
Re: [Bloat] Really getting 1G out of ISP?
> On 7 Jul, 2021, at 12:27 pm, Wheelock, Ian wrote: > > It is entirely possible through the mechanics of DOCSIS provisioning that AQM > could be enabled or disabled on different CMs or groups of CMs. Doing so > would be rather petty and may add additional unnecessary complexity to the > provisioning system. Users that own their CMs are still paying for the > internet access with the specific ISP, so would likely expect equivalent > performance. Entirely true, but for the ISP the matter of whether the subscriber is using a rented or self-owned modem is not entirely petty - it is the difference of a line item on the monthly bill. I'm sure you can see how the perverse incentives arise with that. - Jonathan Morton ___ Bloat mailing list Bloat@lists.bufferbloat.net https://lists.bufferbloat.net/listinfo/bloat
Re: [Bloat] Credit and/or collaboration on a responsiveness metric?
> On 6 Jul, 2021, at 2:21 am, Matt Mathis wrote: > > The rounds based responsiveness metric is awesome! There are several > slightly different versions, with slightly different properties > > I would like to write a little paper (probably for the IAB workshop), but > don't want to short change anybody else's credit, or worse, scoop somebody > else's work in progress. I don't really know if I am retracing somebody > else's steps, or on a parallel but different path (more likely). I would be > really sad to publish something and then find out later that I trashed some > PhD students' thesis It's possible that I had some small influence in originating it, although Dave did most of the corporate marketing. My idea was simply to express delays and latencies as a frequency, in Hz, so that "bigger numbers are better", rather than always in milliseconds, where "smaller numbers are better". The advantage of Hz is that you can directly compare it to framerates of video or gameplay. Conversely, an advantage of "rounds per minute" is that you don't need to deal with fractions or rounding for relatively modest and common levels of bloat, where latencies of 1-5 seconds are typical. I'm not overly concerned with taking credit for it, though. It's a reasonably obvious idea to anyone who takes a genuine interest in this field, and other people did most of the hard work. > Please let me know if you know of anybody else working in this space, of any > publications that might be in progress or if people might be interested in > another collaborator. There are two distinct types of latency that RPM can be used to measure, and I have written a short Internet Draft describing the distinction: https://www.ietf.org/archive/id/draft-morton-tsvwg-interflow-intraflow-delays-00.html Briefly, "inter-flow delays" (or BFID) are what you measure with an independent latency-measuring flow, and "intra-flow delays" (or WFID) are what you measure by inserting latency probes into an existing flow (whether at the protocol level with HTTP2, or by extracting it from existing application activity). The two typically differ when the path bottleneck has a flow-isolating queue, or when the application flow experiences loss and retransmission recovery. I think both measures are important in different contexts. An individual application may be concerned with its own intra-flow delay, as that determines how quickly it can respond to changes in network conditions or user intent. Network engineers should be concerned with inter-flow delays, as those determine what effect a bulk application load has on other, more latency-sensitive applications. The two are also optimally controlled by different mechanisms - FQ versus AQM - which is why the combination of the two is so powerful. Feel free to use material from the above with appropriate attribution. - Jonathan Morton ___ Bloat mailing list Bloat@lists.bufferbloat.net https://lists.bufferbloat.net/listinfo/bloat
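For concreteness, the conversion involved is just a reciprocal - simple arithmetic rather than anything defined in the draft:

    frequency (Hz)    = 1 / RTT(seconds)
    rounds per minute = 60 / RTT(seconds)

So a 100ms round trip is 10 Hz or 600 RPM, while a badly bloated 5-second round trip is 0.2 Hz or 12 RPM - which illustrates why "rounds per minute" avoids fractions at the common levels of bloat mentioned above.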
Re: [Bloat] Bechtolschiem
> On 2 Jul, 2021, at 7:59 pm, Stephen Hemminger > wrote: > > In real world tests, TCP Cubic will consume any buffer it sees at a > congested link. Maybe that is what they mean by capture effect. First, I'll note that what they call "small buffer" corresponds to about a tenth of a millisecond at the port's link rate. This would be ludicrously small at Internet scale, but is actually reasonable for datacentre conditions where RTTs are often in the microseconds. Assuming the effect as described is real, it ultimately stems from a burst of traffic from a particular flow arriving at a queue that is *already* full. Such bursts are expected from ack-clocked flows coming out of application-limited mode (ie. on completion of a disk read), in slow-start, or recovering from earlier losses. It is also possible for a heavily coalesced ack to abruptly open the receive and congestion windows and trigger a send burst. These bursts occur much less in paced flows, because the object of pacing is to avoid bursts. The queue is full because tail drop upon queue overflow is the only congestion signal provided by the switch, and ack-clocked capacity-seeking transports naturally keep the queue as full as they can - especially under high statistical multiplexing conditions where a single multiplicative decrease event does not greatly reduce the total traffic demand. CUBIC arguably spends more time with the queue very close to full than Reno does, due to the plateau designed into it, but at these very short RTTs I would not be surprised if CUBIC is equivalent to Reno in practice. The solution is to keep some normally-unused space in the queue for bursts of traffic to use occasionally. This is most naturally done using ECN applied by some AQM algorithm, or the AQM can pre-emptively and selectively drop packets in Not-ECT flows. And because the AQM is more likely to mark or drop packets from flows that occupy more link time or queue capacity, it has a natural equalising effect between flows. Applying ECN requires some Layer 3 awareness in the switch, which might not be practical. A simple alternative is to drop packets instead. Single packet losses are easily recovered from by retransmission after approximately one RTT. There are also emerging techniques for applying congestion signals at Layer 2, which can be converted into ECN signals at some convenient point downstream. However it is achieved, the point is that keeping the *standing* queue down to some fraction of the total queue depth reserves space for accommodating those bursts which are expected occasionally in normal traffic. Because those bursts are not lost, the flows experiencing them are not disadvantaged and the so-called "capture effect" will not occur. - Jonathan Morton ___ Bloat mailing list Bloat@lists.bufferbloat.net https://lists.bufferbloat.net/listinfo/bloat
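As a rough illustration of reserving that headroom, this is how one might configure fq_codel with ECN on a Linux host or software switch, with the target and interval scaled down from their Internet-scale defaults to datacentre RTTs. The exact values here are illustrative assumptions, not tested recommendations:

    tc qdisc replace dev eth0 root fq_codel target 50us interval 500us ecn

A hardware switch would need the equivalent logic implemented in its own queue manager, as discussed above.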
Re: [Bloat] Apple WWDC Talks on Latency/Bufferbloat
> On 11 Jun, 2021, at 10:14 pm, Nathan Owens wrote: > > round-trips per minute Wow, one of my suggestions finally got some traction. - Jonathan Morton ___ Bloat mailing list Bloat@lists.bufferbloat.net https://lists.bufferbloat.net/listinfo/bloat
Re: [Bloat] Fwd: Traffic shaping at 10~300mbps at a 10Gbps link
> On 7 Jun, 2021, at 8:28 pm, Rich Brown wrote: > > Saw this on the lartc mailing list... For my own information, does anyone > have thoughts, esp. for this quote: > > "... when the speed comes to about 4.5Gbps download (upload is about > 500mbps), chaos kicks in. CPU load goes sky high (all 24x2.4Ghz physical > cores above 90% - 48x2.4Ghz if count that virtualization is on)..." This is probably the same phenomenon that limits most cheap CPE devices to about 100Mbps or 300Mbps with software shaping, just on a bigger scale due to running on fundamentally better hardware. My best theory to date on the root cause of this phenomenon is a throughput bottleneck between the NIC and the system RAM via DMA, which happens to be bypassed by a hardware forwarding engine within the NIC (or in an external switch chip) when software shaping is disabled. I note that 4.5Gbps is close to the capacity of a single PCIe v2 lane, so checking the topology of the NIC's attachment to the machine might help to confirm my theory. To avoid the problem, you'll either need to shape to a rate lower than the bottleneck capacity, or eliminate the unexpected bottleneck by implementing a faster connection to the NIC that can support wire-speed transfers. - Jonathan Morton ___ Bloat mailing list Bloat@lists.bufferbloat.net https://lists.bufferbloat.net/listinfo/bloat
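To put numbers on the single-lane theory, using the standard PCIe figures rather than anything measured on the machine in question:

    PCIe v2 lane: 5 GT/s raw, with 8b/10b line coding
    payload bandwidth = 5 * 8/10 = 4 Gbit/s per lane (~500 MB/s)

which is in the same neighbourhood as the reported ~4.5Gbps knee, once descriptor traffic and protocol overheads are taken into account.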
Re: [Bloat] Educate colleges on tcp vs udp
> On 27 May, 2021, at 10:42 am, Hal Murray > wrote: > > I would back up. You need to understand how networks work before discussing > TCP or UDP. > > The internet is not like a phone system. There are no connections within the > network and hence no reserved bandwidth and nothing like a busy signal to > tell > you that the network is full. (There are host-host connections, but the > network doesn't know anything about them.) Packets are delivered on a > best-efforts basis. They may be dropped, delayed, mangled, or duplicated. You're right - the distinction between Bell and ARPA networking is a crucial foundation topic. A discussion of the basic 10base Ethernet PHY (and how that fundamentally differs from the 8kHz multiplex of a traditional telephone network) might be helpful, since the intended audience already understands things like modulation. Once that is established, you can talk about how reliable stream transports are implemented on top of an ARPA-style network, using Ethernet as a concrete example. There are a lot of gritty details about how IP and TCP work that can be glossed over for a fundamental understanding, and maybe filled in later. Things like Diffserv, the URG pointer, option fields, and socket timeouts are not relevant topics. There's no need to actually hide them from a header diagram, but just highlight the fields that are fundamental to getting a payload from A to B. - Jonathan Morton ___ Bloat mailing list Bloat@lists.bufferbloat.net https://lists.bufferbloat.net/listinfo/bloat
Re: [Bloat] AQM & Net Neutrality
> On 24 May, 2021, at 10:18 pm, Stuart Cheshire via Bloat > wrote: > > When first class passengers board the plane first, all economy passengers > wait a little bit longer as a result. Technically, they all get to the runway at the same time anyway; the first-class pax just get out of the terminal to sit in their airline seats waiting for longer, while the more congested cattle-class cabin sorts itself out. If the latter process were optimised better, the first-class passengers might actually end up waiting less, and pretty much everyone would benefit accordingly. Where first-class passengers *do* have an advantage is in priority lanes at check-in and security. It means they can turn up at the airport later to catch the same flight, without fear of missing it and without having to spend unnecessary hours in duty-free hell. They also get posher waiting lounges with "free" food. It is that sort of atmosphere that Net Neutrality advocates object to in computer networking. I believe NN advocates will respond positively to concrete signs of improvement in perceived consumer fairness and reduction of costs to consumers. I also believe that implementing AQM well is a key enabler towards those improvements. That is probably the right perspective for "selling" AQM to them. - Jonathan Morton ___ Bloat mailing list Bloat@lists.bufferbloat.net https://lists.bufferbloat.net/listinfo/bloat
Re: [Bloat] AQM & Net Neutrality
>> Maybe the worries I have heard just points out the need for more >> education/awareness about what delay is and why things like AQM are not >> prioritization/QoS? I appreciate any thoughts. > > I'm pleased to help with education in this area. The short and simplistic > answer would be that AQM treats all traffic going through it equally; the > non-interactive traffic *also* sees a reduction in latency; though many > people won't viscerally notice this, they can observe it if they look > closely. More importantly, it's not necessary for traffic to make any sort > of business or authentication arrangement in order to benefit from AQM, only > comply with existing, well-established specifications as they already do. There is one more point I'd like to touch on up front. Net Neutrality first became a concern with file-sharing "swarm" protocols, and then with video-on-demand services. The common feature of these from a technical perspective, is high utilisation of throughput capacity, to the detriment of other users sharing the same back-end and head-end ISP infrastructure. Implementing AF-AQM or FQ-AQM within the backhaul and head-end equipment, not to distinguish individual 5-tuple flows but merely traffic associated with different subscribers, would fairly share out back-end and head-end capacity between subscribers. This would reduce the pressure on the ISP to implement policies and techniques that violate Net Neutrality and/or are otherwise unpopular with consumers, such as data caps. This assumes (as I believe has been represented in some official forums) that these measures are due to technical needs rather than financial greed. I'm aware of some reasonably fast equipment that already implements AF-AQM commercially. My understanding is that similar functionality can also be added to many recent cable head-ends by a firmware upgrade. - Jonathan Morton ___ Bloat mailing list Bloat@lists.bufferbloat.net https://lists.bufferbloat.net/listinfo/bloat
Re: [Bloat] AQM & Net Neutrality
mply hold bulk traffic to its "fair share", and keep it out of the way of interactive traffic, without also reducing the delay to the bulk traffic flows. I would suggest that if you implement FQ, you can also usually implement AQM on top with little difficulty. Please do ask for further clarification if that would be helpful. - Jonathan Morton ___ Bloat mailing list Bloat@lists.bufferbloat.net https://lists.bufferbloat.net/listinfo/bloat
Re: [Bloat] Educate colleges on tcp vs udp
> On 23 May, 2021, at 9:47 pm, Erik Auerswald > wrote: > > As an additional point to consider when pondering whether to > use TCP or UDP: > > To mitigate that simple request-response protocols using UDP > lend themselves to being abused for reflection and amplification… I suspect such considerations are well beyond the level of education requested here. I think what was being asked for was "how do these protocols work, and why do they work that way, in language suitable for people working in a different field", rather than "which one should I use for X application". - Jonathan Morton ___ Bloat mailing list Bloat@lists.bufferbloat.net https://lists.bufferbloat.net/listinfo/bloat
Re: [Bloat] Educate colleges on tcp vs udp
> On 21 May, 2021, at 9:01 am, Taraldsen Erik wrote: > > I'm getting some traction with my colleges in the Mobile department on > measurements to to say something about user experience. While they are > coming around to the idea, they have major gaps in tcp/udp/ip understanding. > I don't have the skill or will to try and educate them. > > Is there good education out there - preferably in the form of an video - > which I can send to my co workers? The part of tcp using ack's is pure magic > to them. They really struggle to grasp the concept. With so basic lack of > understanding it is hard to have a meaningful discussion about loss, latency > an buffering. > > I don't mean to talk them down to much, they are really good with the radio > part of their job - but the transition into seeing tcp and radio together is > very hard on them. I don't have a video link to hand, but let's tease out the major differences between these three protocols: IP (in both v4 and v6 variants) is all about getting a package of data to a particular destination. It works rather like a postal system. The package has a sender's address and a recipient's address, and the routers take care of getting it to the latter. Most packages get through, but for various reasons some packages can be lost, for example if the sorting office (queue) is full of traffic. Some packages are very small (eg. a postcard), some very large (eg. a container load), and some in between. UDP is an "unreliable datagram" protocol. You package it up in an IP wrapper, send it, and *usually* it gets to the recipient. It has an additional "office" address, as the postal system only gets the package to the right building. If it doesn't arrive, you don't get any notification about that - which is why it is "unreliable". Each package also stands on its own without any relationship to others, which is why it is a "datagram". Most UDP packets are small to medium in size. TCP is a "reliable stream" protocol. You use it when you have a lot of data to send, which won't fit into a single datagram, or when you need to know whether your data arrived safely or not. To do this, you use the biggest, container-sized packages the post office supports, and you number them in sequence so you know which ones come first. The recipient and the post office both have regulations so you can't have too many of these huge packages in the system at once, and they reserve the right to discard the excess so they can function properly (this is "congestion control"). So you arrange for the recipient to send the containers back empty when they've been received (they collapse to a small size when empty), and then you know there's room in the system for it to be sent out full again, with a fresh sequence number (this is the "stream"). And if you notice that a particular container *didn't* come back in the expected sequence, you infer that it got lost somewhere and send a replacement for its contents (making the delivery "reliable"). In fact, the actual containers are not sent back, but an acknowledgement postcard basically saying "all containers up to XXX arrived safely, we have room for YYY more, and the post office told us to tell you to slow down the sending rate because they're getting overwhelmed." Some of these postcards may themselves get lost in the system, but as long as some *do* get through, the sender knows all is well. It's common to use TCP for transferring files or establishing a persistent command-and-control connection. 
It's common to use UDP for simple request-response applications (where both the request and response are small) and where timeliness of delivery is far more important than reliability (eg. multiplayer games, voice/video calls). - Jonathan Morton ___ Bloat mailing list Bloat@lists.bufferbloat.net https://lists.bufferbloat.net/listinfo/bloat
Re: [Bloat] [EXTERNAL] Re: Terminology for Laypeople
> On 13 May, 2021, at 12:10 am, Michael Richardson wrote: > > But, I'm looking for terminology that I can use with my mother-in-law. Here's a slide I used a while ago, which seems to be relevant here: The important thing about the term "quick" in this context is that throughput capacity can contribute to it in some circumstances, but is mostly irrelevant in others. For small requests, throughput is irrelevant and quickness is a direct result of low latency. For a grandmother-friendly analogy, consider what you'd do if you wanted milk for your breakfast cereal, but found the fridge was empty. The ideal solution to this problem would be to walk down the road to the village shop and buy a bottle of milk, then walk back home. That might take about ten minutes - reasonably "quick". It might take twice that long if you have to wait for someone who wants to scratch off a dozen lottery tickets right at the counter while paying by cheque; it's politer for such people to step out of the way. My village doesn't have a shop, so that's not an option. But I've seen dairy tankers going along the main road, so I could consider flagging one of them down. Most of them ignore the lunatic trying to do that, and the one that does (five hours later) decides to offload a thousand gallons of milk instead of the pint I actually wanted, to make it worth his while. That made rather a mess of my kitchen and was quite expensive. Dairy tankers are set up for "fast" transport of milk - high throughput, not optimised for latency. The non-lunatic alternative would be to get on my bicycle and go to the supermarket in town. That takes about two hours, there and back. It takes me basically the same amount of time to fetch that one bottle of milk as it would to conduct a full shopping trip, and I can't reduce that time at all without upgrading to something faster than a bicycle, or moving house to somewhere closer to town. That's latency for you. - Jonathan Morton___ Bloat mailing list Bloat@lists.bufferbloat.net https://lists.bufferbloat.net/listinfo/bloat
Re: [Bloat] Terminology for Laypeople
> On 17 May, 2021, at 8:18 am, Simon Barber wrote: > > How’s that? It's a wall of text full of technical jargon. It seems to be technically correct, but probably not very useful for the intended context. - Jonathan Morton ___ Bloat mailing list Bloat@lists.bufferbloat.net https://lists.bufferbloat.net/listinfo/bloat
Re: [Bloat] Terminology for Laypeople
> On 17 May, 2021, at 12:33 am, Jonathan Morton wrote: > > The delay is caused by the fact that the product already in the pipeline has > already been bought by the hardware store, and thus contractually the loggers > can't divert it to an individual customer like me. The reason this part of the analogy is relevant (and why I set up the hardware store's representative buying the branches at the felling stage) is because in internet traffic I don't want just any old data packets, I need the ones that specifically relate to the connection I opened. We could say for the sake of the analogy that the hardware store is buying all the pine and spruce, and the felling team is thus working only on those trees, but I want a birch tree to fuel my sauna (since it's in less demand, the price is lower). That also makes it easier to identify my branches as they go through the pipeline. - Jonathan Morton ___ Bloat mailing list Bloat@lists.bufferbloat.net https://lists.bufferbloat.net/listinfo/bloat
Re: [Bloat] Terminology for Laypeople
> On 16 May, 2021, at 11:44 pm, Michael Richardson wrote: > > Your analogy is definitely the result of optimizing for batches rather than > latency. I really don't know how you got there from here. What I described is basically a pipeline process, not batch processing. The delay is caused by the fact that the product already in the pipeline has already been bought by the hardware store, and thus contractually the loggers can't divert it to an individual customer like me. You can think of one bag of firewood as representing a packet of data. I've requested a particular number of such bags to fill my trailer. Until my trailer is full, my request is not satisfied. The hardware store is just taking whatever manufacturing capacity is available; their warehouse is *huge*. We can explore the analogy further by changing some of the conditions: 1: If the felling of trees was the bottleneck of the operation, such that the trimming, chopping and bagging could all keep up with it, then the delay to me would be much less because I wouldn't have to wait for various backlogs (of complete trees, branches, and piles of firewood) belonging to the hardware store to be dealt with first. Processing each tree doesn't take very long, there's just an awful lot of them in this patch of forest. 1a: If the foreman told the felling team to take a tea break when a backlog built up, that would have nearly the same effect. That's what an AQM does. 2: If the hardware store wasn't involved at all, the bags of firewood would be waiting, ready to be sold. I'd be done in the time it took to load the bags into my trailer. 3: If the loggers sold the *output* of the process to the hardware store, rather than having them reserve it at the head of the pipeline, then I might only have to wait for the throughput of the operation to produce what I needed, and load it directly into my trailer. *That* would be just-in-time manufacturing. - Jonathan Morton ___ Bloat mailing list Bloat@lists.bufferbloat.net https://lists.bufferbloat.net/listinfo/bloat
Re: [Bloat] Terminology for Laypeople
> On 16 May, 2021, at 9:48 pm, john wrote: > > After watching Dave's YouTube video, it seems to me, the congestion of > the packets which Dave was explaining is equivalent to sushi plats on > conveyor stuck on the route and colliding each other on the conveyor > since the conveyor keeps bringing more packets one after another to the > collision point, then plats overflow from the conveyor and dropped on > the floor. > > So now, my question is the picture I described above is close to what > bufferbloat is? Or I am still very far from understanding? If I am still > far from understanding, will you be able to explain it to me, the > laypeople, using the sushi or donuts conveyor? Is the problem the speed > adjustment of the conveyor? Or too many plates or donuts are placed on > the conveyor? If so, why the rate or speed of each factors can not be > adjusted? I even wonder if you could explain it using the door to door > package delivery service since you are talking about delivering packets. Here's an analogy for you: Today there is a logging operation going on just up the road - not unusual in my part of the world. They have a team felling trees, another team trimming off the branches, and the trunks are then stacked for later delivery to the sawmill (*much* later - they have to season first). The branches are fed into a chopping machine which produces firewood and mulch, which is then weighed and bagged for immediate sale. I need firewood for my sauna stove. I know that if I load my trailer full of firewood, it'll last me about a year. I figure I'll pay these guys a visit, and it shouldn't take more than half an hour of my time to get what I need. Under normal circumstances, that would be true. However, the hardware store in the town an hour away has also chosen today to replenish its stock of firewood, and they have a representative on site who's basically buying the branches from every tree as it comes down; every so often a big van turns up and collects the product. He graciously lets me step in and buy the branches off one tree for my own use, and they're tagged as such by the loggers. So instead of just loading ready-made bags of firewood into my trailer, I have to wait for the trimming team to get around to taking the branches off "my" tree which is waiting behind a dozen others. The branches then go into a big stack of branches waiting for the chopping machine. When they eventually get around to chopping those, the firewood is carefully put in a separate pile, waiting for the weighing and bagging. It takes a full hour before I have the branches from "my" tree in a useful format for firing a sauna stove and in my trailer. Which is now only half full. To fill it completely, I have to go through the entire process again from the beginning - only the felling team has been going gangbusters and there are now *twenty* trees waiting for trimming. I planned for half an hour. It actually took me three hours to get my firewood. Not for lack of throughput - that was one pretty effective logging operation - but because of the *queues*. - Jonathan Morton ___ Bloat mailing list Bloat@lists.bufferbloat.net https://lists.bufferbloat.net/listinfo/bloat
Re: [Bloat] Questions for Bufferbloat Wikipedia article
> On 7 Apr, 2021, at 12:30 am, Sebastian Moeller wrote: > > I still think that it is not completely wrong to abstractly say BBR evaluates > RTT changes as function of the current sending rate to probe the bottlenecks > capacity (and adjust its sending rate based on that estimated capacity), but > that might either indicate I am looking at the whole thing at too abstract a > level, or, as I fear, that I am simply misunderstanding BBR's principle of > operation... It might be more accurate to say that it estimates the delivery rate at the receiver by observing the ack stream, and aims to match that with the send rate. There is some periodic probing upwards to see if a higher delivery rate is possible, followed by a downwards drain cycle which, I think, pays some attention to the observed RTT. And there is also a cwnd mechanism overlaid as a safety valve. Overall, it's very much a hybrid approach. - Jonathan Morton ___ Bloat mailing list Bloat@lists.bufferbloat.net https://lists.bufferbloat.net/listinfo/bloat
Re: [Bloat] how to ecn again on osx and ios!!!
> On 9 Mar, 2021, at 10:38 pm, Dave Taht wrote: > > sudo sysctl -w net.inet.tcp.disable_tcp_heuristics=1 Now that might well be the missing link. I think we missed it before since it doesn't have "ecn" in its name. - Jonathan Morton ___ Bloat mailing list Bloat@lists.bufferbloat.net https://lists.bufferbloat.net/listinfo/bloat
Re: [Bloat] HardenedBSD implementation of CAKE
> On 2 Mar, 2021, at 2:59 am, Dave Taht wrote: > > My major doubting point about a port was the > resolution of the kernel clock. Linux has a high quality hires clock, > BSDs didn't seem capable of scheduling on less than a 1ms tick at the > time I last paid attention. This is actually something Cake's shaper can already cope with. I did some testing on an ancient PC that didn't have HPET hardware, so timer interrupts only had 1ms resolution even on Linux. This merely results in small bursts of traffic at 1ms intervals, which collectively add up to the configured rate. - Jonathan Morton ___ Bloat mailing list Bloat@lists.bufferbloat.net https://lists.bufferbloat.net/listinfo/bloat
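As a sanity check on the burst sizes involved (simple arithmetic, assuming a 100Mbps shaped rate purely for illustration):

    burst per 1ms tick = 100 Mbit/s * 1 ms = 12.5 kB ~= 8 full-size Ethernet frames

which is small enough for downstream buffers to absorb without a noticeable latency cost.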
Re: [Bloat] [Make-wifi-fast] [Cake] Fwd: [Galene] Dave on bufferbloat and jitter at 8pm CET Tuesday 23
> On 24 Feb, 2021, at 5:19 pm, Taraldsen Erik wrote: > > Do you have a subscription with rate limitations? The PGW (router which > enforces the limit) is a lot more latency friendly than if you are radio > limited. So it may be beneficial to have a "slow" subscription rather than > "free speed" when it comes to latency. Slow meaning lower subscription rate > than radio rate. This is actually something I've noticed in Finland with DNA. The provisioning shaper they use for the "poverty tariff" is quite well debloated (which was very much not the case some years ago). However, there's no tariff at any convenient level between 1Mbps (poverty tariff) and 50Mbps (probably radio limited on a single carrier). - Jonathan Morton ___ Bloat mailing list Bloat@lists.bufferbloat.net https://lists.bufferbloat.net/listinfo/bloat
Re: [Bloat] uk and canada starlink beta
> On 23 Jan, 2021, at 6:44 pm, Jonathan Foulkes > wrote: > > Looking forward to this one. Any recommended settings for Cake on this > service? > > Is target RTT of ‘Internet’ (100ms) still appropriate? > Oceanic seems a bit high (300ms). I would say so, since the inherent path latency is (reportedly) similar to a terrestrial path and much shorter than a geostationary bounce. - Jonathan Morton ___ Bloat mailing list Bloat@lists.bufferbloat.net https://lists.bufferbloat.net/listinfo/bloat
Re: [Bloat] UniFi Dream Machine Pro
> On 22 Jan, 2021, at 11:09 pm, Toke Høiland-Jørgensen via Bloat > wrote: > > As Sebastian says, the source of lower performance when using SQM on > some boxes is the traffic shaper, and sometimes the lack of hardware > offloads. I have a strong suspicion that on some hardware, the offload engine & switch is connected to the SoC through a link that is much slower than the Ethernet ports exposed to the outside. As long as traffic stays within the engine, it can run at line rate, but engaging rich software measures requires stuffing it all through the narrower link. This is independent of the CPU's capabilities and is purely an I/O bottleneck. In this particular case, I believe the router portion of the Dream Machine is natively a Gigabit Ethernet device, for which good IPsec and SQM performance at 800Mbps is reasonably expected. The pair of 10G ports are part of the switch portion, and thus intended to support LAN rather than WAN traffic. Think of it as equivalent to attaching a Raspberry Pi 4 (which has native GigE) to a switch with a pair of 10G "uplink" ports for daisy-chaining to other switches. - Jonathan Morton ___ Bloat mailing list Bloat@lists.bufferbloat.net https://lists.bufferbloat.net/listinfo/bloat
Re: [Bloat] Rebecca Drucker's talk sounds like it exposes an addressable bloat issue in Ciscos
> On 10 Jan, 2021, at 7:39 am, Erik Auerswald > wrote: > > In my experience, asking about token-bucket algorithm details is often > a sign for the asker to not see the forest for the trees. IMHO, token-bucket is an obsolete algorithm that should not be used. Like RED, it requires tuning parameters whose correct values are not obvious to the typical end-user, nor even to automatic algorithms. Codel replaces RED, and virtual-clock algorithms can similarly replace token-bucket. Token-bucket is essentially a credit-mode algorithm. The notional "bucket" is replenished at regular (frequent) intervals by an amount proportional to the configured rate of delivery. Traffic may be delivered as long as there is sufficient credit in the bucket to cover it. This inherently leads to the delivery of traffic bursts at line rate, rather than delivery rate, and the size of those bursts may be as large as the bucket. Conversely, if the bucket is too small, then scheduling and other quantum effects may conspire to reduce achievable throughput. Since the bucket size must be chosen, manually, in advance, it is almost always wrong (and usually much too large). Many token-bucket implementations further complicate this by having two nested token-buckets. A larger bucket is replenished at exactly the configured rate from an infinite source, while a smaller bucket is replenished at some higher rate from the larger bucket. This reduces the incidence of line-rate bursts and accommodates Reno-like sawtooth behaviour, but as noted, has the potential to seriously confuse BBR if the buckets are too large. BBRv2 may handle it better if you add ECN and AQM, as the latter will help to correct bad estimations of throughput capacity resulting from the buckets initially being drained. The virtual-clock algorithm I implemented in Cake is essentially a deficit-mode algorithm. During any continuous period of traffic delivery, defined as finding a packet in the queue when one is scheduled to deliver, the time of delivering the next packet is updated after every packet is delivered, by calculating the serialisation time of that packet and adding it to the previous delivery schedule. As long as that time is in the past, the next packet may be delivered immediately. When it goes into the future, the time to wait before delivering the next packet is precisely known. Hence bursts occur only due to quantum effects and are automatically of the minimum size necessary to maintain throughput, without any configuration (explicit or otherwise). Since the scenario here involves an OpenWRT device, you should be able to install Cake on it, if it isn't there already. Please give it a try and let us know if it improves matters. - Jonathan Morton ___ Bloat mailing list Bloat@lists.bufferbloat.net https://lists.bufferbloat.net/listinfo/bloat
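The deficit-mode shaper described above boils down to a short recurrence - this is a paraphrase of the behaviour, not the literal Cake source:

    at the start of a delivery period:     t_next = now
    after delivering a packet of S bytes:  t_next += S * 8 / rate
    if t_next <= now:  the next packet may be delivered immediately
    else:              wait exactly (t_next - now), then deliver

Note that there is no tunable burst size anywhere in it; the only parameter is the rate itself.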
Re: [Bloat] [Make-wifi-fast] [bbr-dev] D* tcp looks pretty good, on paper
> On 8 Jan, 2021, at 5:38 pm, Neal Cardwell via Make-wifi-fast > wrote: > > What did you have in mind by "variable links" here? (I did not see that term > in the paper.) Wifi and LTE tend to vary their link characteristics a lot over time. - Jonathan Morton ___ Bloat mailing list Bloat@lists.bufferbloat.net https://lists.bufferbloat.net/listinfo/bloat
Re: [Bloat] [Cerowrt-devel] my thx to spacex (and kerbal space program) forcheering me up all year
> On 2 Jan, 2021, at 1:31 am, David P. Reed wrote: > > Now, one wonders: why can't Starlink get it right first time? > > It's not like bufferbloat is hard on a single bent pipe hop, which is all > Starlink does today. The bloat doesn't seem to be in Starlink itself, but in the consumer-end modem. This is fixable, just as soon as Starlink put their minds to it, because it's based on the same Atheros SoCs as the consumer CPE we're already familiar with. - Jonathan Morton ___ Bloat mailing list Bloat@lists.bufferbloat.net https://lists.bufferbloat.net/listinfo/bloat
Re: [Bloat] Good Wi-Fi test programs?
> On 7 Dec, 2020, at 1:00 am, Rich Brown wrote: > > I would first do the following "easy tests": > > - Check for conflicting/overlapping Wi-Fi channels. I am fond of the free > app, WiFi Analyzer from farproc (http://a.farproc.com/wifi-analyzer) for this > test, but there are several similar Android apps. > - Compare the signal strength for the DSL modem and the Calix modem, as shown > by WiFi Analyzer > - Be sure that all computer(s) are using the Calix modem. > - Use a variety of speed tests: DSLReports, Fast.com, other favorites? > - Compare speedtest results when the test computer is close to, or far from > the router. > - (If possible) compare the performance for both Wi-Fi and Ethernet > - Shut off the DSL modem on my way out the door to be sure it's not causing > interference or confusing the situation. > > Anything else you'd recommend? Make sure the customer's devices are using 5GHz rather than 2.4GHz band, where possible. The Calix devices apparently support both and try to perform "band steering", but it's worth double checking. https://www.calix.com/content/calix/en/site-prod/library-html/systems-products/prem/op/p-gw-op/eth-gw/800e-gc-spg/index.htm?toc.htm?76518.htm I also read while briefly scanning the accessible documentation that Calix operates at maximum permitted wifi transmit power and with up to 80MHz RF bandwidth. While this does maximise the range and throughput of an individual AP, many such APs in close proximity will see the RF channel as "occupied" by each other's transmissions more often than if a lower transmit power were used. The result is that they all shout so much that they can't hear themselves think, and clients can't get a word in edgewise to send acks (with generally lower transmit power themselves). You should look for evidence of this while analysing channel occupancy, especially in multi-occupancy buildings. It's probably less of a concern in detached or semi-detached housing. I didn't see any mention of Airtime Fairness technology, which is now a highlighted feature on some other manufacturers' products (specifically TP-Link). Ask whether that is present or can be implemented. You may be able to test for it, if you have established a case where wifi is clearly the bottleneck, by passing a saturating ECN Capable flow through it and looking for CE marks (and/or ECE feedback), since Airtime Fairness comes with built-in fq_codel. - Jonathan Morton ___ Bloat mailing list Bloat@lists.bufferbloat.net https://lists.bufferbloat.net/listinfo/bloat
Re: [Bloat] starlink
> On 1 Dec, 2020, at 3:20 pm, Toke Høiland-Jørgensen via Bloat > wrote: > >> Jim Gettys made a reddit post on r/Starlink asking for data from beta >> testers. I am one of those testers. I spun up an Ubuntu VM and did three >> runs of flent and rrul as depicted in the getting started page. You may >> find the results here: >> https://drive.google.com/file/d/1NIGPpCMrJgi8Pb27t9a9VbVOGzsKLE0K/view?usp=sharing > > Thanks for sharing! That is some terrible bloat, though! :( I imagine it exists in the uplink device rather than the Starlink network itself. Distinct upload and download bloat tests would help in determining whether it's at your end or the remote end. You should be able to use dslreports.com/speedtest to determine that. - Jonathan Morton ___ Bloat mailing list Bloat@lists.bufferbloat.net https://lists.bufferbloat.net/listinfo/bloat
Re: [Bloat] Adding CAKE "tc qdisc" options to NetworkManager
> On 5 Oct, 2020, at 7:13 pm, David Collier-Brown wrote: > > By pure luck, I ended up chatting with one of the NetworkManager chaps, who > invited a merge request with the proper parameters for CAKE. > > He wrote > > Currently NM doesn't support configuring CAKE parameters. IOW, if you > set "root cake bandwidth 100Mbit", you will see in the tc output that > cake was set but with default parameters. > > Yes, I think it will be useful to have CAKE support in NM, but I can't > say when it will be implemented. Of course, patches are always > welcome; if anybody is interested in contributing it, please have a > look at the work that was done to support SFQ: > > https://gitlab.freedesktop.org/NetworkManager/NetworkManager/-/commit/b22b4f9101b1cbfde49b65d9e2107e4ae0d817c0 > > Sounds like a good job for next weekend, can I get some reviewers for the > week after? I could probably at least glance at it. How easy is it to set this up in, say, Linux Mint? - Jonathan Morton ___ Bloat mailing list Bloat@lists.bufferbloat.net https://lists.bufferbloat.net/listinfo/bloat
Re: [Bloat] Dumb question time 1: using upload and download speeds from dslreports
> On 4 Oct, 2020, at 6:25 pm, Dave Collier-Brown > wrote: > > When setting my laptop to explicitly use CAKE for an article- and > recipe-writing effort, I blithely took the download speed and stuffed it > into > >tc qdisc replace dev enp0s25 root cake docsis ack-filter bandwidth > 179mbit > > When Iván Baldo kindly suggested I mention ingress, it suddenly struck > me: I was using the downstream/ingress value for my upstream setting! > > Should I not be using my upload speed, some 13mbit, not 179 ??? For ingress traffic (usually the download direction), you need to redirect the ingress traffic to an IFB device and attach an ingress-configured Cake instance there. That instance would use the "ingress" keyword instead of "ack-filter", together with your download bandwidth. For egress traffic you should indeed use the upload speed. - Jonathan Morton ___ Bloat mailing list Bloat@lists.bufferbloat.net https://lists.bufferbloat.net/listinfo/bloat
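A minimal sketch of the whole setup, reusing the interface name and rates from this thread (the IFB device name is arbitrary, and the cake keywords should be adjusted to match your link):

    # egress (upload): shape to the upload rate
    tc qdisc replace dev enp0s25 root cake bandwidth 13mbit docsis ack-filter

    # ingress (download): redirect into an IFB device and shape there
    ip link add name ifb0 type ifb
    ip link set ifb0 up
    tc qdisc add dev enp0s25 handle ffff: ingress
    tc filter add dev enp0s25 parent ffff: protocol all matchall action mirred egress redirect dev ifb0
    tc qdisc add dev ifb0 root cake bandwidth 179mbit docsis ingress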
Re: [Bloat] cake + ipv6
> On 23 Sep, 2020, at 8:36 pm, Daniel Sterling > wrote: > > I ran some updates on the xbox and watched iftop. I found that the > xbox does the following: > > * uses up to four http (TCP port 80) connections at once to download data > * connects (seemingly randomly) to both ipv4 and ipv6 update hosts > > That means at any given time, the xbox could be downloading solely via > ipv4, solely via ipv6, or with a mix of the two. > > I believe this means when it's using both v4 and v6, it's getting > double its "share" of the bandwidth since cake can't tell that the v4 > and v6 traffic is coming from the same LAN host -- is that correct? It fits my mental model, yes, though obviously the ideal would be to recognise that the xbox is a singular machine. Are you seeing a larger disparity than that? If so, is it even larger than four connections would justify without host-fairness? > I'm using the default "triple-isolate" parameter. I can try switching > to dual-src/dest host or even plain srchost / dsthost isolation. In > theory that should limit traffic more per download host, even if cake > can't determine the LAN host that's doing the downloading, right? Triple-isolate is designed to function reasonably well when the user can't be sure which side of the network is the LAN! The "dual" modes provide Cake with that information explicitly, so may be more reliable in corner cases. For your topology, eth0 (LAN egress) should get dual-dsthost, and eth1 (WAN egress) should get dual-srchost. - Jonathan Morton ___ Bloat mailing list Bloat@lists.bufferbloat.net https://lists.bufferbloat.net/listinfo/bloat
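A sketch of that per-direction configuration, with invented bandwidth figures - the nat keyword is worth adding on the WAN side so Cake can look through any IPv4 address translation when identifying hosts:

    # eth0 (LAN egress, i.e. downloads): fairness by destination host
    tc qdisc replace dev eth0 root cake bandwidth 180mbit dual-dsthost

    # eth1 (WAN egress, i.e. uploads): fairness by source host
    tc qdisc replace dev eth1 root cake bandwidth 13mbit dual-srchost nat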
Re: [Bloat] How about a topical LWN article on demonstrating the real-world goodness of CAKE?
> On 8 Sep, 2020, at 7:48 pm, Matt Mathis via Bloat > wrote: > > To be simplistic, you might just talk about cake vs (bloated) drop tail. To > be thorough, you also need to make the case that cake is better than other > AQMs. This feels like too much for LWN, but silence on other solutions might > trigger skeptics. Personally, my position is: 1: Bloated dumb FIFOs are terrible. 2: Basic AQM is good. This can be as simple as TBF+WRED; it solves a large part of the basic problem by eliminating multi-second queue delays. In some cases this can solve very serious problems, such as DNS lookups failing when the link is loaded, quite adequately. Properly configured, you can keep queue delays below the 100ms threshold for reasonable VoIP performance. 3: FQ-AQM is better. That generally means HTB+fq_codel, but other forms of this exist. It means essentially zero added delay for non-saturating flows. It's an easy way to make DNS, VoIP and online gaming work nicely without having to restrict data-hungry applications. 4: Cake offers some extra tools and aims to be easier (more intuitive) to configure. Currently, it is the best solution for slow and medium-speed broadband (up to 100Mbps), and can also be used at higher speeds with some care, mostly regarding device performance. - Jonathan Morton ___ Bloat mailing list Bloat@lists.bufferbloat.net https://lists.bufferbloat.net/listinfo/bloat
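As a concrete instance of point 3, a minimal HTB+fq_codel pairing looks something like this - the rate is illustrative, and should be set a little below the true link rate so that the queue forms where the AQM can see it:

    tc qdisc replace dev eth0 root handle 1: htb default 10
    tc class add dev eth0 parent 1: classid 1:10 htb rate 95mbit
    tc qdisc add dev eth0 parent 1:10 fq_codel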
Re: [Bloat] Other CAKE territory (was: CAKE in openwrt high CPU)
> On 4 Sep, 2020, at 1:14 am, David Collier-Brown wrote: > > I'm wondering if edge servers with 1Gb NICs are inside the "CAKE stays > relevant" territory? Edge servers usually have strong enough CPUs and I/O - by which I mean anything from AMD K8 and Intel Core 2 onwards with PCIe attached NICs - to run Cake at 1Gbps without needing special measures. I should run a test to see how much I can shove through an AMD Bobcat these days - not exactly a speed demon. We're usually seeing problems with the smaller-scale CPUs found in CPE SoCs, which are very much geared to take advantage of hardware accelerated packet forwarding. I think in some cases there might actually be insufficient internal I/O bandwidth to get 1Gbps out of the NIC, into the CPU, and back out to the NIC again, only through the dedicated forwarding path. That could manifest itself as a lot of kernel time spent waiting for the hardware, and can only really be solved by redesigning the hardware. - Jonathan Morton ___ Bloat mailing list Bloat@lists.bufferbloat.net https://lists.bufferbloat.net/listinfo/bloat
Re: [Bloat] CAKE in openwrt high CPU
> On 3 Sep, 2020, at 5:32 pm, Toke Høiland-Jørgensen via Bloat > wrote: > > Yeah, offloading of some sort is another option, but I consider that > outside of the "CAKE stays relevant" territory, since that will most > likely involve an entirely programmable packet scheduler. Offload of *just* shaping could be valuable in itself at higher rates, when combined with BQL, as it would avoid having to interact with the CPU-side timer infrastructure so much. It would also not be difficult at all to implement in hardware at line rate, even with overhead compensation. It's the sort of thing you could sensibly do with 74-series logic and a lookup table in a cheap SRAM, up to millions of PPS, and considerably faster in FPGA or ASIC territory. I think that's what the questions about combining "unlimited Cake" with some other shaper are angling towards, though I suspect that the way Cake's shaper is integrated is still better than having an external one in software. With that said, it's also possible that something a bit lighter than Cake might be appropriate at cable speeds. There is background work in this general area going on, so don't despair. - Jonathan Morton ___ Bloat mailing list Bloat@lists.bufferbloat.net https://lists.bufferbloat.net/listinfo/bloat
Re: [Bloat] CAKE in openwrt high CPU
> On 1 Sep, 2020, at 11:04 pm, Sebastian Moeller wrote: > >> The challenge are the end users, who only understand the silly ’speed’ >> metric, and feel anything that lowers that number is a ‘bad’ thing. It takes >> effort to get even technical users to get it. > > I repeatedly fall into that trap... For a lot of users, I rather suspect that setting 40/10 Mbps would give them entirely sufficient speed, and most existing CPE would be able to keep up with those settings even with all of Cake's bells and whistles turned on. The trouble is that that might be 10% of what the cable company is advertising to them. - Jonathan Morton ___ Bloat mailing list Bloat@lists.bufferbloat.net https://lists.bufferbloat.net/listinfo/bloat
Re: [Bloat] CAKE in openwrt high CPU
> On 1 Sep, 2020, at 9:45 pm, Toke Høiland-Jørgensen via Bloat > wrote: > > CAKE takes the global qdisc lock. Presumably this is a default mechanism because CAKE doesn't handle any locking itself. Obviously it would need to be replaced with at least a lock over CAKE's complete data structures, taking the lock on each entry point and releasing it at each return point, and I assume there is a flag we can set to indicate we do so. Finer-grained locking might be possible, but CAKE is fairly complex so that might be hard to implement. Locking per CAKE instance would at least allow running ingress and egress on different CPUs. Is there an example anywhere on how to do this? - Jonathan Morton ___ Bloat mailing list Bloat@lists.bufferbloat.net https://lists.bufferbloat.net/listinfo/bloat
Re: [Bloat] cake + ipv6
On 18/08/2020 06:44, Daniel Sterling wrote: ...is it possible to identify (and thus classify) plain old bulk downloads, as separate from video streams? They're both going to use http / https (or possibly QUIC) -- and they're both likely to come from CDN networks... I can't think of a simple way to tell them apart. If there was an easy way to do it, I would already have done so. We are unfortunately hamstrung by some bad design and deployment around Diffserv, which might otherwise provide a useful end-to-end visible signal here. Is this enough of a problem that people would try to make a list of netblocks / prefixes that belong to video vs other CDN content? It's possible that someone is doing this, but I don't specifically know of such a source of information. It would of course be better to find a solution that didn't rely on white/black lists, which have a distressing habit of going stale. But one of the more reliable ways might be to use Autonomous System (AS) information. ASes are an organisational unit used for assigning IP address ranges and for routing, and usually correspond to a more-or-less significant Internet organisation. It should be feasible to map an observed IP address to an AS, then look up the address blocks assigned to that AS, thereby capturing a whole range of related IP addresses. I do notice video streams are much more bursty than plain downloads for me, but that may not hold for all users. That is, for me at least, a video stream may average 5mbps over, say, 1 minute, but it will sit at 0mbps for a while and then burst at 20mbps for a bit. Correct, YouTube at least likes to fetch a big block of data from disk and send it all at once, then rely on the client buffer to tide it over while the disk services other requests. It makes some sense when you consider how slow disk seeks are relative to the number of clients they need to support, each of which will generally be watching a different video (or at least a different part of the same one). However, this burstiness disappears on the wire just when you would like to use it to identify traffic, ie. when the video traffic saturates the bandwidth available to it. If there's only just enough bandwidth, or even *less* than what is required, then YouTube sends data continuously into the client buffer, trying to keep it as full as possible. There are no easy answers here. But I've suggested some things to look for and try out. - Jonathan Morton ___ Bloat mailing list Bloat@lists.bufferbloat.net https://lists.bufferbloat.net/listinfo/bloat
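For anyone wanting to experiment with the AS idea, the public Team Cymru and RADb whois services can do both lookups from a shell - the address and AS number below are documentation examples, and the output formats of these services are of course subject to change:

    # map an observed IP address to its origin AS
    whois -h whois.cymru.com " -v 203.0.113.1"

    # list the address blocks registered as originating from that AS
    whois -h whois.radb.net -- '-i origin AS64496'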
Re: [Bloat] cake + ipv6
On Tuesday, 18 August 2020, Daniel Sterling wrote: > As you know, I'm here cuz I have an xbox and y'all created cake, which > I am eternally grateful for, since it makes latency go away. > > But I've recently hit an interesting issue -- > > Microsoft (and/or akamai, or whatever) has recently started pushing > updates to the xbox via ipv6 instead of v4. > > As I'm sure you know ipv6 addresses are essentially random on the > internal LAN as compared to v4 -- a box can grab as many v6 addresses > as it wants, and I don't believe my linux router can really know which > box is using which address, can it? > > Which means... ipv6 breaks cake's flow isolation. > > Cake can't throttle all those xbox downloads correctly cuz it doesn't > know they're all going to/from that one device. > > So I suppose this may be similar to the "bittorrent" problem -- which, > is there a general solution for that problem? > > In my case the xbox grabs more than its share of bandwidth, which > means other bulk streaming -- that is to say, youtube and netflix :) > -- stops working well > > I can think of one general solution -- run more wires to more devices, > and give devices their own VLAN, and tag / prioritize / deprioritize > specific traffic that way... > > But.. are there better / more general solutions? Does this traffic at least have some consistent means of identification, such as a port number or a remote address range? If so, you could use fwmark rules and Cake's diffserv3 mode to put that traffic in the Bulk tin, same as with BitTorrent. I suppose it's also possible to make Cake sensitive to Layer 2 addresses (that is, the Ethernet address) for the purpose of host isolation. That is presently not implemented, so might take a while to filter through the deployment range. ___ Bloat mailing list Bloat@lists.bufferbloat.net https://lists.bufferbloat.net/listinfo/bloat
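A rough sketch of the fwmark approach, with a hypothetical prefix standing in for the update servers; as I recall, under diffserv3 a mark value of 1 selects the first (Bulk) tin, but verify against the tc-cake manpage for your version:

    # on the LAN interface, mark download traffic arriving from the
    # (hypothetical) update CDN range
    ip6tables -t mangle -A POSTROUTING -o eth0 -p tcp --sport 80 -s 2001:db8:1::/48 -j MARK --set-mark 1

    # and have Cake map that mark to a tin
    tc qdisc replace dev eth0 root cake bandwidth 180mbit diffserv3 fwmark 0xff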
Re: [Bloat] How about a topical LWN article on demonstrating the real-world goodness of CAKE?
> The current best practice seems to be to instantiate cake/SQM on a reasonably > fixed rate wan link and select WiFi cards/socs that offer decent airtime > fairness. > Works pretty well in practice... Yes, AQL does essentially the right thing here, again along the lines of limiting the influence of one machine's load on another's performance, and completely automatically since it has fairly direct information and control over the relevant hardware. Cake is designed to deal with wired links where the capacity doesn't change much, but the true bottleneck is typically not at the device exerting control. On that note, there is a common wrinkle whereby the bottleneck may shift between the private last mile link and some shared backhaul in the ISP at different times of day and/or days of week. Locally I've seen it vary between 20M (small hours, weekday) and 1Mbps (weekend evening). When Cake is configured for one case but the situation is different, the results are obviously suboptimal. I'm actually now trying a different ISP to see if they do better in the evenings. Evenroute's product includes automatic detection of and scheduling for this case, assuming that it follows a consistent pattern over a weekly period. Once set up, it is essentially a cronjob adjusting Cake's parameters dynamically, so providing a manual setup for the general OpenWRT community should be feasible. On “tc qdisc change”, Cake usually doesn't drop any packets, so parameters can be changed frequently if you have a reason for it. ___ Bloat mailing list Bloat@lists.bufferbloat.net https://lists.bufferbloat.net/listinfo/bloat
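A hand-rolled version of that cronjob is straightforward - the times and rates below are invented purely for illustration:

    # /etc/crontab fragment: shape down for the congested weekend evenings,
    # back up again in the small hours
    0 18 * * 5,6,0  root  tc qdisc change dev eth0 root cake bandwidth 1mbit
    0 2  * * *      root  tc qdisc change dev eth0 root cake bandwidth 20mbit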
Re: [Bloat] How about a topical LWN article on demonstrating the real-world goodness of CAKE?
> Are the risks and tradeoffs well enough understood (and visible enough > for troubleshooting) to recommend broader deployment? > > I recently gave openwrt a try on some hardware that I ultimately > concluded was insufficient for the job. Fairly soon after changing out > my access point, I started getting complaints of Wi-Fi dropping in my > household, especially when someone was trying to videoconference. I > discovered that my AP was spontaneously rebooting, and the box was > getting hot. Most CPE devices these days rely on hardware accelerated packet forwarding to achieve their published specs. That's all about taking packets in one side and pushing them out the other as quickly as possible, with only minimal support from the CPU (likely, new connections get a NAT/firewall lookup, that's all). It has the advantages of speed and power efficiency, but unfortunately it is also incompatible with our debloating efforts. So debloated CPE will tend to run hotter and with lower peak throughput, which may be noticeable to cable and fibre users; VDSL (FTTC) users might have service of 80Mbps or less where this effect is less likely to matter. It sounds like that AP had a very marginal thermal design which caused the hardware to overheat as soon as the CPU was under significant load, which it can easily be when a shaper and AQM are running on it at high throughput. The cure is to use better designed hardware, though you could also contemplate breaking the case open to cure the thermal problem directly. There are some known reliable models which could be collected into a list. As a rule of thumb, the ones based on ARM cores are likely to be designed with CPU performance more in mind than those with MIPS. Cake has some features which can be used to support explicit classification and (de)prioritisation of traffic via firewall marking rules, either by rewriting the Diffserv field or by associating metadata with packets within the network stack (fwmark). This can be very useful for pushing Bittorrent or WinUpdate swarm traffic out of the way. But for most situations, the default flow-isolating behaviour already works pretty well, especially for ensuring that one computer's network load has only a bounded effect on any other. We can discuss that in more detail if that would be helpful. ___ Bloat mailing list Bloat@lists.bufferbloat.net https://lists.bufferbloat.net/listinfo/bloat
Re: [Bloat] Phoronix: Linux 5.9 to allow FQ_PIE as default
> On 16 Jul, 2020, at 12:58 am, Michael Yartys via Bloat wrote:
>
> Are there any major differences between fq_codel and fq_pie in terms of their
> performance?

I think some tests were run some time ago which showed significantly better behaviour by fq_codel than fq_pie. In particular, the latter used only a single AQM instance instead of an independent one for each flow. I'm not sure whether it's been changed since then.

The only advantage I can see for PIE over Codel is, possibly, a reduction in the CPU load imposed by the AQM. But fq_codel is already pretty efficient, so that would be an edge case. In any case, it is already possible to choose any qdisc you like (with default parameters) as the default qdisc - see the example below. I'm really not sure what the fuss is about.

> And how does the improved fq_codel called cobalt, which is used in cake,
> stack up?

COBALT has some modifications to basic Codel which, I think, could profitably be backported into fq_codel. It also has a particular extra mode, based on BLUE, for dealing with unresponsive traffic (traffic that continues to build queue even after lots of ECN signalling and/or Codel-scheduled packet drops). It is the latter which inspired the name.

As for the other major functional component of fq_codel: Cake also has a set-associative hash function for allocating flows into queues, which substantially reduces the probability of hash collisions in most cases.

- Jonathan Morton
___ Bloat mailing list Bloat@lists.bufferbloat.net https://lists.bufferbloat.net/listinfo/bloat
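For reference, the default qdisc on Linux is selected via sysctl, which is the mechanism the Phoronix article refers to (fq_pie shown here as the example value):

    sysctl -w net.core.default_qdisc=fq_pie
    # or persistently, via a line in /etc/sysctl.conf:
    #   net.core.default_qdisc = fq_pie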
Re: [Bloat] the future belongs to pacing
> On 5 Jul, 2020, at 9:09 pm, Stephen Hemminger wrote:
>
> I keep wondering how BBR will respond to intermediaries that aggregate
> packets.
> At higher speeds, won't packet trains happen and would it not get confused
> by this? Or is its measurement interval long enough that it doesn't matter.

Up-thread, there was mention of patches related to wifi. Aggregation is precisely one of the things those patches would address.

I should note that the brief description I gave glossed over a lot of fine details of BBR's implementation, which include careful filtering and conditioning of the data it gathers about the network path. I'm not altogether a fan of such complexity.

- Jonathan Morton
___ Bloat mailing list Bloat@lists.bufferbloat.net https://lists.bufferbloat.net/listinfo/bloat
Re: [Bloat] the future belongs to pacing
> On 4 Jul, 2020, at 8:52 pm, Daniel Sterling wrote:
>
> could someone explain this to a lay person or point to a doc talking
> about this more?
>
> What does BBR do that's different from other algorithms? Why does it
> break the clock? Before BBR, was the clock the only way TCP did CC?

Put simply, BBR directly probes for the capacity and baseline latency of the path, and picks a send rate (implemented using pacing) and a failsafe cwnd to match; a worked example follows below. The bandwidth probe looks at the rate of returning acks, so in fact it's still using the ack-clock mechanism - it's just connected much less directly to the send rate than before.

Other TCPs can use pacing as well. In that case the cwnd and RTT estimate are calculated in the normal way, and the send rate (for pacing) is calculated from those. Pacing prevents a sudden opening of the receive or congestion windows from causing a huge burst, which would tend to swamp buffers.

- Jonathan Morton
___ Bloat mailing list Bloat@lists.bufferbloat.net https://lists.bufferbloat.net/listinfo/bloat
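A worked example of the "failsafe cwnd" arithmetic, with illustrative numbers (the rate and RTT are assumptions; the steady-state cwnd gain of 2 is from the published BBRv1 description):

    estimated bottleneck rate = 48 Mbit/s, estimated min RTT = 50 ms
    BDP  = 48 Mbit/s x 50 ms / 8 = 300 kB
    cwnd = 2 x BDP = 600 kB

So the pacer normally governs the send rate, and the cwnd merely caps how much data can be in flight if acks stop arriving smoothly.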
Re: [Bloat] FW: [Dewayne-Net] Ajit Pai caves to SpaceX but is still skeptical of Musk's latency claims
> On 14 Jun, 2020, at 12:15 am, Michael Richardson wrote: > > They claim they will be able to play p2p first person shooters. > I don't know if this means e2e games, or ones that middlebox everything into > a server in a DC. That's what I keep asking. I think P2P implies that there is *not* a central server in the loop, at least not on the latency-critical path. But that's not how PvP multiplayer games are typically architected these days, largely due to the need to carefully manage the "fog of war" to prevent cheating; each client is supposed to receive only the information it needs to accurately render a (predicted) view of the game world from that player's perspective. So other players that are determined by the server to be "out of sight" cannot be rendered by x-ray type cheat mods, because the information about where they are is not available. The central server has full information and performs the appropriate filtering before replicating game state to each player. Furthermore, in a PvP game it's wise to hide information about other players' IP addresses, as that often leads to "griefing" tactics such as a DoS attack. If you can force an opposing player to experience lag at a crucial moment, you gain a big advantage over him. And there are players who are perfectly happy to "grief" members of their own team; I could dig up some World of Tanks videos demonstrating that. It might be more reasonable to implement a P2P communication strategy for a PvE game. The central server is then only responsible for coordinating enemy movements. - Jonathan Morton ___ Bloat mailing list Bloat@lists.bufferbloat.net https://lists.bufferbloat.net/listinfo/bloat
Re: [Bloat] FW: [Dewayne-Net] Ajit Pai caves to SpaceX but is still skeptical of Musk's latency claims
> On 11 Jun, 2020, at 7:03 pm, David P. Reed wrote: > > So, what do you think the latency (including bloat in the satellites) will > be? My guess is > 2000 msec, based on the experience with Apple on ATT > Wireless back when it was rolled out (at 10 am, in each of 5 cities I tested, > repeatedly with smokeping, for 24 hour periods, the ATT Wireless access > network experienced ping time grew to 2000 msec., and then to 4000 by mid day > - true lag-under-load, with absolutely zero lost packets!) > > I get that SpaceX is predicting low latency by estimating physical distance > and perfect routing in their LEO constellation. Possibly it is feasible to > achieve this if there is zero load over a fixed path. But networks aren't > physical, though hardware designers seem to think they are. > > Anyone know ANY reason to expect better from Musk's clown car parade? Speaking strictly from a theoretical perspective, I don't see any reason why they shouldn't be able to offer latency that is "normally" below 100ms (to a regional PoP, not between two arbitrary points on the globe). The satellites will be much closer to any given ground station than a GEO satellite, the latter typically adding 500ms to the path due mostly to physical distance. All that is needed is to keep queue delays reasonably under control, and there's any number of AQMs that can help with that. Clearly ATT Wireless did not perform any bufferbloat mitigation at all. I have no insight or visibility into anything they're *actually* doing, though. Can anyone dig up anything about that? - Jonathan Morton ___ Bloat mailing list Bloat@lists.bufferbloat.net https://lists.bufferbloat.net/listinfo/bloat
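To put numbers on the "physical distance" point above, a rough back-of-envelope calculation (assuming Starlink's published ~550 km orbital altitude, and ignoring slant angles and switching delays):

    GEO: 4 x 35,786 km / c ~= 477 ms minimum RTT through the satellite
    LEO: 4 x    550 km / c ~=   7 ms minimum RTT through the satellite

which is why queue management, rather than propagation delay, would dominate the latency budget of a LEO service.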
Re: [Bloat] What's a good non-intrusive way to look at bloat (and perhaps things like gout (:-))
> On 4 Jun, 2020, at 1:21 am, Dave Collier-Brown > wrote: > > We've good tools to measure network performance under stress, by the simple > expedient of stressing it, but is there a good approach I could recommend to > my company to monitor a bunch of reasonably modern links, without the > measurement significantly affecting their state? > > I don't mind increasing bandwidth usage, but I'm downright grumpy about > adding to the service time: I have a transaction that times out for gross > slowness if it takes much more that an tenth of a second, and it involves a > scatter-gather interaction with at least 10 customers in that time. > > I'm topically interested in bloat, but really we should understand > "everything" about our links. If they can get the bloats like cattle, they > can probably get the gout, like King Henry the Eighth (;-)) > > My platform is Centos 8, and I have lots of Smarter Colleagues to help. My first advice would be to browse pollere.net for tools - like pping (passive ping), which monitors the latency of flows in transit. That should give you some interesting information without adding any load at all. There is also connmon (https://github.com/pollere/connmon). - Jonathan Morton ___ Bloat mailing list Bloat@lists.bufferbloat.net https://lists.bufferbloat.net/listinfo/bloat
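A minimal example of running pping as suggested above (the flags are from memory of the pollere.net tools; check the README for the exact invocation):

    # passively measure per-flow RTTs on the interface carrying the traffic
    pping -i eth0
    # or analyse an existing capture offline
    pping -r trace.pcap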
Re: [Bloat] New speed/latency/jitter test site from Cloudflare
> On 3 Jun, 2020, at 7:48 pm, Dave Taht wrote: > > I am of course, always interested in how they are measuring latency, and > where. They don't seem to be adding more latency measurements once the download tests begin. So in effect they are only measuring idle latency. - Jonathan Morton ___ Bloat mailing list Bloat@lists.bufferbloat.net https://lists.bufferbloat.net/listinfo/bloat
Re: [Bloat] CPU consumption using TC-TBF and TC-POLICE to limit rate
> On 26 May, 2020, at 12:47 pm, Jose Blanquicet wrote: > > We have an embedded system with limited CPU resources that acts as > gateway to provide Internet access from LTE to a private Wi-Fi > network. Our problem is that the bandwidth on LTE and Wi-Fi links is > higher than what the system is able to handle thus it reaches 100% of > CPU load when we perform a simple speed test from a device connected > to our Wi-Fi Hotspot. > > Therefore, we want to limit the bandwidth to avoid system gets > saturated is such use-case. To do so, we thought to use the QDISC-TBF > on the Wi-Fi interface. For instance, to have 10Mbps: > >tc qdisc add dev wlan0 root tbf rate 10mbit burst 12500b latency 50ms > > It worked correctly and maximum rate was limited to 10Mbps. However, > we noticed that the CPU load added by the TBF was not negligible for > our system. Just how limited is the CPU on this device? I have successfully shaped at several tens of Mbps on a Pentium-MMX, where the limiting factor may have been the PCI bus rather than the CPU itself. Assuming your CPU is of that order of capability, I would suggest installing Cake using the out-of-tree build process, and the latest stable version of the iproute2 tools to configure it. Start with: git clone https://github.com/dtaht/sch_cake.git This provides a more efficient and more effective shaper than TBF, and a more effective AQM than a policer, and good flow-isolation properties, all in a single bundle that will be more efficient than running two separate components. Once installed, the following should set it up nicely for you: tc qdisc replace dev wlan0 root cake bandwidth 10Mbit besteffort flows ack-filter Cake is considered quite a heavyweight solution, but very effective. If it doesn't work well for this particular use case, it may be feasible to backport some more recent work which takes a simpler approach, though along similar lines. - Jonathan Morton ___ Bloat mailing list Bloat@lists.bufferbloat.net https://lists.bufferbloat.net/listinfo/bloat
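For completeness, the out-of-tree build mentioned above usually amounts to something like the following, assuming kernel headers for the running kernel are installed (the exact targets are the repo's own, so check its README):

    cd sch_cake
    make && sudo make install
    sudo modprobe sch_cake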
Re: [Bloat] Does it makes sense to shape traffic with 16Kbit/s up and 16Kbit/s down?
> On 4 May, 2020, at 6:47 pm, Richard Fröhning wrote: > > I have a VPN provider which support lzo-compression. If I were to use > VPN through the 16Kbps it could squeeze out some bytes. > > I guess in that case I shape the tunX interface, right? > > Would the MTU setting be on the usb0 device and/or the tunX? You should set the qdisc and those options on the *physical* device, not the one that carries your uncompressed data. Don't forget to set up ingress shaping as well as egress. - Jonathan Morton ___ Bloat mailing list Bloat@lists.bufferbloat.net https://lists.bufferbloat.net/listinfo/bloat
Re: [Bloat] Does it makes sense to shape traffic with 16Kbit/s up and 16Kbit/s down?
> On 4 May, 2020, at 5:09 pm, Sebastian Moeller wrote: > > At 16Kbps a full-MTU sized packet will take around > (1000 ms/sec * (1500 * 8) bits/packet ) / 16000 bits/sec = 750 ms > > This is just to put things into perspective, 16Kbps is going to be both > painful and much better than no service at all Reducing the MTU to 576 bytes is likely to help. That was commonly done in the days of analogue modems, when such low speeds were normal. I'm fortunate enough to live somewhere where the local ISPs don't limit your data transfer, even on the budget subscriptions. Roughly €25 will buy you 500Kbps mobile service for three months, and you can use that 500Kbps as much as you like. And that is with the lowest population density in Europe, so the per capita cost of covering the country in cell towers is obviously no excuse. - Jonathan Morton ___ Bloat mailing list Bloat@lists.bufferbloat.net https://lists.bufferbloat.net/listinfo/bloat
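Reducing the MTU is a one-line change, applied to the interface carrying the link (the interface name here is an assumption):

    ip link set dev usb0 mtu 576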
Re: [Bloat] Does it makes sense to shape traffic with 16Kbit/s up and 16Kbit/s down?
> On 4 May, 2020, at 3:26 pm, Richard Fröhning wrote: > > And if so, which queue discipline would work best with it? > > Background: I am forced to use my cell phone as uplink and after I > reach the monthly limit, bandwidth will be reduces to given up/downlink > speeds. > > I know Surfing websites with those speeds will take forever - however > it should be enough to send/receive emails and/or use a messenger. You should be able to do this with Cake. Unlike most other qdiscs, it will automatically adjust several parameters to work nicely with low-speed links, because with the built-in shaper it has knowledge of the speed. I don't think I've tested it as low as 16Kbit, but I have used it at 64Kbit. To keep things simple, you may want to specify "besteffort flows satellite" as parameters. Some of those settings may also be available in a GUI. - Jonathan Morton ___ Bloat mailing list Bloat@lists.bufferbloat.net https://lists.bufferbloat.net/listinfo/bloat
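Spelled out as a command, the suggestion above would look something like this (the interface name is an assumption; "satellite" tells Cake to assume a very long path RTT, loosening its AQM targets accordingly):

    tc qdisc replace dev usb0 root cake bandwidth 16kbit besteffort flows satellite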
Re: [Bloat] [tsvwg] my backlogged comments on the ECT(1) interim call
> On 29 Apr, 2020, at 12:25 pm, Luca Muscariello wrote: > > BTW, I hope I made the point about incentives to cheat, and the risks > for unresponsive traffic for L4S when using ECT(1) as a trusted input. One scenario that I think hasn't been highlighted yet, is the case of a transport which implements 1/p congestion control through CE, but marks itself as a "classic" transport. We don't even have to imagine such a thing; it already exists as DCTCP, so is trivial for a bad (or merely ignorant) actor to implement. Such a flow would squeeze out other traffic that correctly responds to CE with MD, and would not be "caught" by queue protection logic designed to protect the latency of the LL queue (as that has no effect on traffic in the classic queue). It would only be corralled by an AQM which can act to isolate the effects of one flow on others; in this case AF would suffice, but FQ would also work. This hazard already exists today. However, the L4S proposal "legitimises" the use of 1/p congestion control using CE, and the subtlety that marking such traffic with a specific classifier is required for effective congestion control is likely to be lost on people focused entirely on their own throughput, as much of the Internet still is. Using ECT(1) as an output from the network avoids this new hazard, by making it clear that 1/p CC behaviour is only acceptable on signals that unambiguously originate from an AQM which expects and can handle it. The SCE proposal also inserts AF or FQ protection at these nodes, which serves as a prophylactic against the likes of DCTCP being used inappropriately on the Internet. - Jonathan Morton ___ Bloat mailing list Bloat@lists.bufferbloat.net https://lists.bufferbloat.net/listinfo/bloat
Re: [Bloat] [tsvwg] my backlogged comments on the ECT(1) interim call
> On 28 Apr, 2020, at 10:43 pm, Black, David wrote: > > And I also noted this at the end of the meeting: “queue protection that > might apply the disincentive” > > That would send cheaters to the L4S conventional queue along with all the > other queue-building traffic. Alas, we have not yet seen an integrated implementation of the queue protection mechanism, so that we can test its effectiveness. I think it is part of the extra evidence that would be needed before a decision could be taken in favour of using ECT(1) as an input. I would also note in this context that mere volume of data, or length of development, are not marks that should be taken in favour of a proposal. The relevance, quality, thoroughness and results of data collection must be carefully evaluated, and it could easily be argued that a lengthy development cycle that still has not produced reliable results should be retired, to avoid throwing good money after bad. The fact that we were able to find serious problems with the (only?) reference implementation of L4S using a relatively small, but independently selected test suite does not lend confidence in its maturity. Reputable engineers know that it is necessary to establish a robust design first. Only then can a robust implementation be hoped for. It is the basic design decision, over the semantics of each ECN codepoint, that we were trying to discuss yesterday. I'm not certain that everyone in the room understood that. - Jonathan Morton ___ Bloat mailing list Bloat@lists.bufferbloat.net https://lists.bufferbloat.net/listinfo/bloat
Re: [Bloat] dropbox, bbr and ecn packet capture
> On 26 Apr, 2020, at 3:36 am, Dave Taht wrote: > > I just did a rather large dropbox download. They are well known to be > using bbr and experimenting with bbrv2. So I fired off a capture > during a big dropbox download... > > It negotiated ecn, my fq_codel shaper and/or my newly ath10k > fq_codel's wifi exerted CE, osx sent back ecn-echo, and the rtt > results were lovely. However, there is possibly not a causal > relationship here, and if anyone is bored and wants to scetrace, > tcptrace or otherwise tear this cap apart, go for it. Well, the CE response at their end is definitely not Multiplicative Decrease. I haven't dug into it more deeply than that. But they're also not running AccECN, nor are they "proactively" sending CWR to get a "more accurate" CE feedback. I suspect they're running BBRv1 in this one. - Jonathan Morton ___ Bloat mailing list Bloat@lists.bufferbloat.net https://lists.bufferbloat.net/listinfo/bloat
Re: [Bloat] this explains speedtest stuff
> On 25 Apr, 2020, at 8:24 pm, Y via Bloat wrote: > > ECN on > http://www.dslreports.com/speedtest/62823326 > ECN off > http://www.dslreports.com/speedtest/62823112 Yup, that's what I mean. > doesn't appear to have worked. retransmits are still high. Ken, it might be that your version of fq_codel doesn't actually have ECN support on by default. So try adding the "ecn" keyword to the qdisc. - Jonathan Morton ___ Bloat mailing list Bloat@lists.bufferbloat.net https://lists.bufferbloat.net/listinfo/bloat
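That is, something along these lines, assuming fq_codel is the root qdisc on the bottleneck interface (if it's a child of a shaper, as in sqm-scripts, the keyword goes on the fq_codel instances instead):

    tc qdisc change dev eth0 root fq_codel ecn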
Re: [Bloat] this explains speedtest stuff
> On 25 Apr, 2020, at 5:14 pm, Kenneth Porter wrote: > > I see "ecn" in the qdisc commands. No, not the qdisc (where ECN is enabled by default), but on the client. Linux: # sysctl net.ipv4.tcp_ecn=1 Windows: > netsh interface tcp set global ecncapability=enabled OSX: $ sudo sysctl -w net.inet.tcp.ecn_initiate_out=1 $ sudo sysctl -w net.inet.tcp.ecn_negotiate_in=1 In Linux and OSX, to make the setting persist across reboots, edit /etc/sysctl.conf. - Jonathan Morton ___ Bloat mailing list Bloat@lists.bufferbloat.net https://lists.bufferbloat.net/listinfo/bloat
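For example, the persistent Linux setting is a single line appended to /etc/sysctl.conf (reload it with "sysctl -p"):

    net.ipv4.tcp_ecn = 1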
Re: [Bloat] this explains speedtest stuff
> On 25 Apr, 2020, at 4:49 pm, Kenneth Porter wrote: > > before: > > http://www.dslreports.com/speedtest/62767361 > > after: > > http://www.dslreports.com/speedtest/62803997 > > Using simple.qos with: > > UPLINK=45000 > DOWNLINK=42500 > > (The link is supposed to be 50 Mbps symmetric and speed test does show it > bursting that high sometimes.) Looks like a definite improvement. The Quality grade of C may indicate that you haven't enabled ECN on your client; without it, Codel has to drop packets to do congestion signalling. - Jonathan Morton ___ Bloat mailing list Bloat@lists.bufferbloat.net https://lists.bufferbloat.net/listinfo/bloat
Re: [Bloat] this explains speedtest stuff
> On 25 Apr, 2020, at 4:16 am, Kenneth Porter wrote: > > Alas, CentOS 7 lacks cake. It does have fq_codel so I used the simple.qos > script from sqm-scripts, with uplink 5 and downlink 45000: > > http://www.dslreports.com/speedtest/62797600 Those bandwidth settings are definitely too high; you don't have complete control of the queue here, and that's visible particularly with the steady increase in the upload latency during the test. Try 44500 up, 42000 down, equivalent to my suggestions for Cake. - Jonathan Morton ___ Bloat mailing list Bloat@lists.bufferbloat.net https://lists.bufferbloat.net/listinfo/bloat
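In simple.qos terms, using the same configuration variables as quoted above, that means:

    UPLINK=44500
    DOWNLINK=42000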
Re: [Bloat] this explains speedtest stuff
> On 24 Apr, 2020, at 7:22 pm, Kenneth Porter wrote: > > My next project will be to enable cake on my CentOS 7 box that just got a new > 45 Mbps symmetric fiber connection from AT&T ("Business in a Box"). We > upgraded from 1.5Mbps/128kbps ADSL. Any hints on what settings to use? Fibre probably uses Ethernet-style framing, or at least it does at the provisioning shaper. So the following settings should probably work well: # outbound tc qdisc replace dev $WAN root cake bandwidth 44.5Mbit besteffort dual-srchost nonat ethernet ack-filter # inbound tc qdisc replace dev $IFB4WAN root cake bandwidth 42Mbit besteffort dual-dsthost nonat ethernet ingress With, of course, the usual redirecting of $WAN ingress to $IFB4WAN. The dual-src/dsthost settings should share things nicely between different users, including the server, even if one uses a lot more flows than another. - Jonathan Morton ___ Bloat mailing list Bloat@lists.bufferbloat.net https://lists.bufferbloat.net/listinfo/bloat
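The "usual redirecting" mentioned above expands to something like this sketch (the matchall filter needs a reasonably recent kernel; a u32 catch-all filter does the same job on older ones):

    ip link add name $IFB4WAN type ifb
    ip link set $IFB4WAN up
    tc qdisc add dev $WAN handle ffff: ingress
    tc filter add dev $WAN parent ffff: protocol all matchall \
        action mirred egress redirect dev $IFB4WAN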
Re: [Bloat] this explains speedtest stuff
> On 24 Apr, 2020, at 3:44 am, Kenneth Porter wrote: > >> dslreports.com is only on the third page of the search results. > > What does it mean that my bloat indicator is a grey dot? > > <http://www.dslreports.com/speedtest/62741609> It looks like there was a websockets error during the test, so try it again and it might work. - Jonathan Morton ___ Bloat mailing list Bloat@lists.bufferbloat.net https://lists.bufferbloat.net/listinfo/bloat
Re: [Bloat] Bufferbloat glossary
> On 29 Mar, 2020, at 9:10 pm, Kenneth Porter wrote: > > For example, in today's message from David P. Reed I find "EDF" and "ACID". Those aren't standard bufferbloat jargon, but come from elsewhere in computer science. EDF is Earliest Deadline First (a scheduling policy normally applied in RTOSes - Realtime Operating Systems), and ACID is Atomicity, Consistency, Isolation, Durability (a set of properties typically desirable in a database). I think the main distinction between online gaming and teleconferencing is the volume of data involved. Games demand low latency, but also usually aren't throwing megabytes of data across the network at a time, just little bundles of game state updates telling the server what actions the player is taking, and telling the player's computer what enemies and other effects the player needs to be able to see. Teleconferencing, by contrast, tends to involve multiple audio and video streams going everywhere. - Jonathan Morton ___ Bloat mailing list Bloat@lists.bufferbloat.net https://lists.bufferbloat.net/listinfo/bloat
Re: [Bloat] fcc's coronovirus guidelines
> On 28 Mar, 2020, at 4:30 pm, Sebastian Moeller wrote: > > *) I wonder how well macos devices stack-up here, given that they default to > fq_codel (at least over wifi)? That might help if the wifi link is the bottleneck, *and* if not too much buffering is done by the wifi hardware. Otherwise the benefit will only be limited. AQM and/or FQ has to be applied at the bottleneck; sometimes a bottleneck has to be artificially induced to implement that. - Jonathan Morton ___ Bloat mailing list Bloat@lists.bufferbloat.net https://lists.bufferbloat.net/listinfo/bloat
Re: [Bloat] pacing, applied differently than bbr
> On 26 Feb, 2020, at 8:51 am, Taran Lynn wrote:
>
> As promised, here's the updated arXiv paper on applying model predictive
> control to TCP CC [1]. It contains more in depth information about the
> implementation, as well as some data from physical experiments.
>
> [1] https://arxiv.org/abs/2002.09825

Hmmm. I see some qualitative similarities to BBR's behaviour, but the algorithm doesn't seem to be very robust, given that it improves a lot when handed approximate a-priori information via the cap and collar settings. How does it treat ECN information, or does it set itself Not-ECT?

- Jonathan Morton
___ Bloat mailing list Bloat@lists.bufferbloat.net https://lists.bufferbloat.net/listinfo/bloat
Re: [Bloat] is extremely consistent low-latency for e.g. xbox possible on SoHo networks w/o manual configuration?
> On 12 Feb, 2020, at 6:55 am, Daniel Sterling > wrote: > > * first and foremost, to the exclusion of all other goals, consistent > low-latency for non-bulk streams from particular endpoints; usually > those streams are easily identified and differentiated from all other > streams based on UDP/TCP port number, This is the ideal situation for simply deploying Cake without any special effort. Just tell it the capacity of the link it's controlling, minus a modest margin (say 1% upstream, 5% downstream). You should be pleasantly surprised by the results. - Jonathan Morton ___ Bloat mailing list Bloat@lists.bufferbloat.net https://lists.bufferbloat.net/listinfo/bloat
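Concretely, for a hypothetical 20 Mbit up / 100 Mbit down link, those margins work out as follows ($WAN and $IFB4WAN as in the earlier messages):

    tc qdisc replace dev $WAN root cake bandwidth 19.8Mbit            # 1% below upstream
    tc qdisc replace dev $IFB4WAN root cake bandwidth 95Mbit ingress  # 5% below downstream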
Re: [Bloat] [Ecn-sane] 2019-12-31 docsis strict priority dual queue patent granted
> On 24 Jan, 2020, at 7:37 am, Dave Taht wrote: > > "Otherwise, this exemplary embodiment enables system configuration to > discard the low-priority packet tail, and transmit the high-priority > packet instead, without waiting." So this really *is* a "fast lane" enabling technology. Just as we suspected. - Jonathan Morton ___ Bloat mailing list Bloat@lists.bufferbloat.net https://lists.bufferbloat.net/listinfo/bloat
Re: [Bloat] abc congestion control on time varying wireless links
> On 11 Dec, 2019, at 9:54 pm, Dave Taht wrote:
>
> The DC folk want a multibit more immediate signal, for which L4S is
> kind of targetted, (and SCE also
> applies). I haven't seen any data on how well dctcp or SCE -style can
> work on wildly RTT varying links as yet, although it's been pitched at
> the LTE direction, not at wifi.

It turns out that a Codel marking strategy for SCE, with modified parameters of course, works well for tolerating bursty and aggregating links. The RED-ramp and step-function strategies do not - and they're equally bad if the same test scenario is applied to DCTCP or TCP Prague. The difference is not small; switching from RED to Codel improves goodput from 1/8th to 80% of nominal link capacity, when a rough model of wifi characteristics is inserted into our usual Internet-path scenario.

We're currently exploring how best to tune the extra set of Codel parameters this involves.

- Jonathan Morton
___ Bloat mailing list Bloat@lists.bufferbloat.net https://lists.bufferbloat.net/listinfo/bloat
Re: [Bloat] [Ecn-sane] sce materials from ietf
> On 1 Dec, 2019, at 9:32 pm, Sebastian Moeller wrote: > >> Meanwhile, an ack filter that avoids dropping acks in which the reserved >> flag bits differ from its successor will not lose any information in the >> one-bit scheme. This is what's implemented in Cake (except that not all the >> reserved bits are covered yet, only the one we use). > > So, to show my lack of knowledge, basically a pure change in sequence number > is acceptable, any other differences should trigger ACK conservation instead > of filtering? You are broadly correct, in that a pure advance of acked sequence number effectively obsoletes the earlier ack and it is therefore safe (and even arguably beneficial) to drop it. However a *duplicate* ack should *not* be dropped, because that may be required to trigger Fast Retransmission in the absence of SACK. Cake's ack filter is a bit more sophisticated than that, in that it can also accept certain harmless changes within TCP options. I believe Timestamps and SACK get special handling along these lines; Timestamps can always change, SACK gets equivalent "pure superset" logic to detect when the old ack is completely covered by the new one. Other options not specifically handled are treated as disqualifying. All this only occurs in two consecutive packets which are both acks for the same connection and which are both waiting for a delivery opportunity in the queue. An earlier ack is never delayed just to see if it can be combined with a later one. The result is a better use of limited capacity to carry useful payloads, without having to rely on dropping acks by AQM action (which Codel is actually rather bad at). - Jonathan Morton ___ Bloat mailing list Bloat@lists.bufferbloat.net https://lists.bufferbloat.net/listinfo/bloat
Re: [Bloat] [Ecn-sane] sce materials from ietf
> On 1 Dec, 2019, at 9:03 pm, Sebastian Moeller wrote: > >> If less feedback is observed by the sender than intended by the AQM, growth >> will continue and the AQM will increase its marking to compensate, >> ultimately resorting to a CE mark. > > Well, that seems undesirable? As a safety valve, getting a CE mark is greatly preferable to losing congestion control entirely, or incurring a packet loss as the other alternative congestion signal. It would only happen if the SCE signal or feedback were seriously disrupted or entirely erased - the latter being the *normal* state of affairs when either endpoint is not SCE aware in the first place. > Am I right to assume that the fault tolerance requires a relative steady ACK > stream though? It only needs to be sufficient to keep the TCP stream flowing. If the acks are bursty, that's a separate problem in which it doesn't really matter if they're all present or not. And technically, the one-bit feedback mechanism is capable of precisely reflecting a sparse sequence of SCE marks using just two acks per mark. > I fully agree that if ACK thinning is performed it really should be careful > to not loose information when doing its job, but SCE hopefully can deal with > whatever is out in the field today (I am looking at you DOCSIS uplinks...), > no? Right, that's the essence of the above discussion about relative feedback error, which is the sort of thing that random ack loss or unprincipled ack thinning is likely to introduce. Meanwhile, an ack filter that avoids dropping acks in which the reserved flag bits differ from its successor will not lose any information in the one-bit scheme. This is what's implemented in Cake (except that not all the reserved bits are covered yet, only the one we use). - Jonathan Morton ___ Bloat mailing list Bloat@lists.bufferbloat.net https://lists.bufferbloat.net/listinfo/bloat
Re: [Bloat] [Ecn-sane] sce materials from ietf
> On 1 Dec, 2019, at 6:35 pm, Sebastian Moeller wrote: > > Belt and suspenders, eh? But realistically, the idea of using an accumulating > SCE counter to allow for a lossy reverse ACK path seems sort of okay (after > all TCP relies on the same, so there would be a nice symmetry ). Sure, we did think of several schemes that used a counter. But when it came down to actually implementing it, we decided to try the simplest possible solution first and see how well it worked in practice. It turned out to work very well, and can recover cleanly from as much as 100% relative feedback error caused by ack loss: If less feedback is observed by the sender than intended by the AQM, growth will continue and the AQM will increase its marking to compensate, ultimately resorting to a CE mark. This is, incidentally, exactly what happens if the receiver *or* sender are completely SCE-ignorant, and looks very much like RFC-3168 behaviour, which is entirely intentional. If feedback is systematically doubled by the time it reaches the sender, perhaps through faulty ack filtering on the return path, it will back off more than intended, the bottleneck queue will empty, and AQM feedback will consequently reduce or cease entirely. Only a very serious fault would re-inject ESCE feedback once SCE marking has completely ceased, so the sender will then grow back towards the correct cwnd after a relatively small negative excursion. The above represents both extremes of 100% relative error in the feedback, which is shown to be safe and reasonably tolerable. Smaller errors due to random ack loss are more likely, and consequently easier to tolerate in a closed negative-feedback control loop. - Jonathan Morton ___ Bloat mailing list Bloat@lists.bufferbloat.net https://lists.bufferbloat.net/listinfo/bloat
Re: [Bloat] [Ecn-sane] sce materials from ietf
> On 1 Dec, 2019, at 12:17 am, Carsten Bormann wrote: > >> There are unfortunate problems with introducing new TCP options, in that >> some overzealous firewalls block traffic which uses them. This would be a >> deployment hazard for SCE, which merely using a spare header flag avoids. >> So instead we are still planning to use the spare bit - which happens to be >> one that AccECN also uses, but AccECN negotiates in such a way that SCE can >> safely use it even with an AccECN capable partner. > > This got me curious: Do you have any evidence that firewalls are friendlier > to new flags than to new options? Mirja Kuhlewind said as much during the TCPM session we attended, and she ought to know. There appear to have been several studies performed on this subject; reserved TCP flags tend to get ignored pretty well, but unknown TCP options tend to get either stripped or blocked. This influenced the design of AccECN as well; in an early version it would have used only a TCP option and left the TCP flags alone. When it was found that firewalls would often interfere with this, the three-bit field in the TCP flags area was cooked up. - Jonathan Morton ___ Bloat mailing list Bloat@lists.bufferbloat.net https://lists.bufferbloat.net/listinfo/bloat