Re: [Bloat] Congestion control with FQ-Codel/Cake with Multicast?

2024-05-25 Thread Jonathan Morton via Bloat
> On 24 May, 2024, at 12:43 am, Holland, Jake via Bloat wrote:
> 
> I agree with your conclusion that FQ systems would see different
> streams as separate queues and each one could be independently
> overloaded, which is among the reasons I don't think FQ can be
> viewed as a solution here (though as a mitigation for the damage
> I'd expect it's a good thing to have in place).

Cake has the facility to override the built-in flow and tin classification 
using custom filter rules.  Look in the tc-cake manpage under "OVERRIDING 
CLASSIFICATION".  This could be applicable to multicast traffic in two ways:

1: Assign all traffic with a multicast-range IP address to the Video tin.  
Since Cake schedules by tin first, and only then by host and/or flow, this 
should successfully keep multicast traffic from obliterating best-effort and 
Voice tin traffic.

2: Assign all multicast traffic to a single flow ID (eg. zero), without 
reassigning the tin.  This will cause it all to be treated like a single flow, 
giving the FQ mechanisms something to bite on.
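
For concreteness, here is a sketch of what those filter rules might look like, 
assuming a Cake instance at handle 1: in diffserv4 mode, and assuming priority 
minor numbers 1-4 select the tins in Bulk, Best Effort, Video, Voice order; 
verify against tc-cake(8) and your kernel before relying on it:

    # Sketch only; check tc-cake(8) on your system.
    tc qdisc replace dev eth0 root handle 1: cake bandwidth 100Mbit diffserv4

    # Approach 1: steer IPv4 multicast destinations into the Video tin
    # by setting skb->priority to <cake handle>:<tin minor>.
    tc filter add dev eth0 parent 1: protocol ip prio 10 \
        u32 match ip dst 224.0.0.0/4 \
        action skbedit priority 1:3

    # Approach 2: instead pin all multicast onto a single flow ID,
    # leaving the tin alone (flow 1 here, assuming 0 means no override).
    tc filter add dev eth0 parent 1: protocol ip prio 10 \
        u32 match ip dst 224.0.0.0/4 flowid 1:1

You would pick one approach or the other, and IPv6 multicast (ff00::/8) needs 
a matching protocol ipv6 rule.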

 - Jonathan Morton


Re: [Bloat] "Very interesting L4S presentation from Nokia Bell Labs on tap for RIPE 88 in Krakow this week! "

2024-05-22 Thread Jonathan Morton via Bloat
> On 21 May, 2024, at 8:32 pm, Sebastian Moeller  wrote:
> 
>> On 21. May 2024, at 19:13, Livingood, Jason via Bloat wrote:
>> 
>> On 5/21/24, 12:19, "Bloat on behalf of Jonathan Morton via Bloat" wrote:
>> 
>>> Notice in particular that the only *performance* comparisons they make are 
>>> between L4S and no AQM at all, not between L4S and conventional AQM - even 
>>> though they now mention that the latter *exists*.
>> 
>> I cannot speak to the Nokia deck. But in our field trials we have certainly 
>> compared single queue AQM to L4S, and L4S flows perform better.

I don't dispute that, at least insofar as the metrics you prefer for such 
comparisons, under the network conditions you also prefer.  But by omitting the 
conventional AQM results from the performance charts, the comparison presented 
to readers is not between L4S and the current state of the art, and the 
expected benefit is therefore exaggerated in a misleading way.

An unbiased presentation would alert readers to the fact that merely deploying 
a conventional AQM would already eliminate nearly all of the queue-related 
delay associated with a dumb FIFO, without sacrificing much if any goodput.  By 
doing this, they would also not expose themselves to the risks associated with 
deploying L4S (see below).

>>> There's also no mention whatsoever of what happens when L4S traffic meets a 
>>> conventional AQM.
>> 
>> We also tested this and all is well; the performance of classic queue with 
>> AQM is fine.
> 
> [SM] I think you are thinking of a different case than Jonathan: not classic 
> traffic in the C-queue, but L4S traffic (ECT(1)) that by chance is not hitting 
> a bottleneck employing DualQ but the traditional FIFO...
> This is the case where at least TCP Prague just folds it, gives up and goes 
> home...
> 
> Here is Pete's data showing that, the middle two bars show what happens when 
> the bottleneck is not treating TCP Prague to the expected signalling...

This isn't even the case I was thinking of.  Neither "classic" traffic in the C 
queue (a situation which L4S has always been designed to accommodate, however 
much we might debate the effectiveness of the design), nor L4S traffic in a 
dumb FIFO (which, though it performs badly, is at least "safe"), but L4S 
traffic in a "classic" RFC-3168 AQM, of the type which is already deployed to 
some extent.  This is what exposes the fundamental incompatibility between L4S 
and conventional traffic, as I have been saying from practically the moment I 
heard about L4S.

It's unfortunate that this case is not covered in the chart that Sebastian 
linked.  The situation arose because that particular chart is focused on a 
performance concern, not a safety concern which was treated elsewhere in the 
report.  What it would show, if a fourth qdisc such as "codel" were included 
(with ECN turned on), is a similar magnitude of throughput bias as in the 
"pfifo" qdisc, but in the opposite direction.  Note that the bias in the 
"pfifo" case arises solely because Prague does not *scale up* to high BDPs in 
the way that CUBIC does.

 - Jonathan Morton


Re: [Bloat] "Very interesting L4S presentation from Nokia Bell Labs on tap for RIPE 88 in Krakow this week! "

2024-05-21 Thread Jonathan Morton via Bloat
> On 21 May, 2024, at 6:31 pm, Frantisek Borsik via Bloat wrote:
> 
> Just "fresh from the oven", shared by Jason on social media:
> 
> https://ripe88.ripe.net/wp-content/uploads/presentations/67-20240521_RIPE88_L4S_introduction_Werner_Coomans_upload.pdf

The usual set of half-truths, with a fresh coat of paint.  Notice in particular 
that the only *performance* comparisons they make are between L4S and no AQM at 
all, not between L4S and conventional AQM - even though they now mention that 
the latter *exists*.  There's also no mention whatsoever of what happens when 
L4S traffic meets a conventional AQM.

 - Jonathan Morton




Re: [Bloat] The Confucius queue management scheme

2024-02-14 Thread Jonathan Morton via Bloat
> On 10 Feb, 2024, at 7:05 pm, Toke Høiland-Jørgensen via Bloat wrote:
> 
> This looks interesting: https://arxiv.org/pdf/2310.18030.pdf
> 
> They propose a scheme to gradually let new flows achieve their fair
> share of the bandwidth, to avoid the sudden drops in the available
> capacity for existing flows that can happen with FQ if a lot of flows
> start up at the same time.

I took some time to read and think about this.

The basic idea is delightfully simple:  "old" flows have a fixed weight of 1.0; 
"new" flows have a weight of (old flows / new flows) * 2^(k*t), where t is the 
age of the flow and k is a tuning constant, and are reclassified as "old" flows 
when this quantity reaches 1.0.  They also describe a queuing mechanism which 
uses these weights, which while mildly interesting in itself, isn't directly 
relevant since a variant of DRR++ would also work here.
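
For concreteness, a minimal Python sketch of the weight rule as I read it 
(names and the choice of k are mine):

    def confucius_weight(n_old, n_new, age_s, k=1.0):
        # Weight of a "new" flow aged age_s seconds: "old" flows have
        # fixed weight 1.0, and a "new" flow is reclassified as "old"
        # once its weight reaches 1.0.
        # NB: n_old == 0 pins the weight at zero forever; that is the
        # first edge case discussed below.
        return min(1.0, (n_old / n_new) * 2.0 ** (k * age_s))

    # One "old" flow, two "new" flows, k = 1: each new flow starts at
    # weight 0.5 and doubles each second until reaching parity.
    for t in (0.0, 0.5, 1.0):
        print(t, confucius_weight(1, 2, t))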

I noticed four significant problems: three arise from edge cases, and the 
fourth is an implementation detail which can easily be remedied.  I didn't see 
any discussion of these edge cases in the paper, only of the implementation 
detail.  The latter is just a discretisation of the exponential 
function into doubling epochs, probably due to an unfamiliarity with 
fixed-point arithmetic techniques.  We can ignore it when thinking about the 
wider design theory.

The first edge case is already fatal unless somehow handled:  starting with an 
idle link, there are no "old" flows and thus the numerator of the equation is 
zero, resulting in a zero weight for any number of new flows which then arise.  
There are several reasonable and quite trivial ways to handle this.

The second edge case is the dynamic behaviour when "new" flows transition to 
"old" ones.  This increases the numerator and decreases the denominator for 
other "new" flows, causing a cascade effect where several "new" flows of 
similar but not identical age suddenly become "old", and younger flows see a 
sudden jump in weight, thus available capacity.  This would become apparent in 
realistic traffic more easily than in a lab setting.  A formulation which 
remains smooth over this transition would be preferable.

The third edge case is that there is no described mechanism to remove flows 
from the "old" set when they become idle.  Most flows on the Internet are in 
practice short, so they might even go permanently idle before leaving the "new" 
set.  If not addressed, this becomes either a memory leak or a mechanism for 
the flow hash table to rapidly fill up, so that in practice all flows are soon 
seen as "old".  The DRR++ mechanism doesn't suffice, because the state in 
Confucius is supposed to evolve over longer time periods, much longer than the 
sojourn time of an individual packet in the queue.

The basic idea is interesting, but the algorithmic realisation of the idea 
needs work.

 - Jonathan Morton


Re: [Bloat] slow start improvement

2023-12-28 Thread Jonathan Morton via Bloat
> On 28 Dec, 2023, at 12:17 pm, Sebastian Moeller via Bloat wrote:
> 
> The inherent idea seems to be if one would know the available capacity one 
> could 'jump' the cwnd immediately to that window... (ignoring the fact the 
> rwnd typically takes a while to increase accordingly*). 

Yes, I've just got to the bit about selectively ignoring rwnd - that's a 
straight violation of TCP.  There may be scope for optimising congestion 
control in various ways, but rwnd is a fundamental part of the protocol that 
predates congestion control itself; it implements TCP's original function of 
"flow control".  Sending data outside the rwnd invites the receiver invoking 
RST, or even firewall action, which I can guarantee will have a material impact 
on flow completion time!

Slow-start already increases cwnd to match the BDP in at most 20 RTTs, and 
that's the extreme condition, starting from an IW of 1 segment and ramping up 
to the maximum possible window of 2^30 bytes (assuming an MSS of at least 1KB, 
which is usual).  The more recent standard of having IW=10 already shortens 
that by 3-4 RTTs.  It's an exponential process, so even quite large changes in 
available bandwidth don't affect the convergence time very much.  TCP's 
adaptation to changes in the BDP after slow-start is considerably slower, even 
with CUBIC.
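
The arithmetic behind those figures is just a base-2 logarithm, as this sketch 
shows (assuming a doubling of cwnd per RTT and no losses):

    import math

    def slow_start_rtts(bdp_bytes, mss=1024, iw_segments=1):
        # RTTs for cwnd to grow from the initial window to the BDP,
        # doubling once per RTT.
        return max(0.0, math.log2(bdp_bytes / (iw_segments * mss)))

    print(slow_start_rtts(2 ** 30))                  # 20 RTTs from IW=1
    print(slow_start_rtts(2 ** 30, iw_segments=10))  # ~16.7 RTTs from IW=10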

I also note a lack of appreciation as to how HyStart (and HyStart++) works.  
Their delay-sensitive criterion is triggered not when the cwnd exceeds the BDP, 
but at an earlier point when the packet bursts (issued at double the natural 
ack-clocked rate) cause a meaningful amount of temporary queue delay.  This 
queuing is normally drained almost immediately after it occurs, *precisely 
because* the cwnd has not yet reached the true path BDP.  This allows 
slow-start to transition to congestion-avoidance smoothly, without a 
multiplicative-decrease episode.  HyStart++ adds a further phase of exponential 
growth on a more cautious schedule, but with essentially the same principle in 
mind.

The irony is that they rely on precisely the same phenomenon of short-term 
queuing, but observe it in the form of the limited delivery rate of a burst, 
rather than an increase in delay on the later packets of the burst.
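
For reference, a sketch of the delay-sensitive exit test in the spirit of 
HyStart++ (constants are RFC 9406's suggested values; real stacks sample 
several ACKs per round):

    def hystart_delay_exit(curr_round_min_rtt_ms, last_round_min_rtt_ms):
        # Leave exponential growth once this round's minimum RTT has
        # risen a bounded fraction above the previous round's, i.e.
        # when the bursts begin to build a measurable temporary queue.
        thresh = min(max(4.0, last_round_min_rtt_ms / 8.0), 16.0)
        return curr_round_min_rtt_ms - last_round_min_rtt_ms >= thresh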

 - Jonathan Morton


Re: [Bloat] Best approach for debloating Airbnb host?

2023-10-17 Thread Jonathan Morton via Bloat
> On 17 Oct, 2023, at 4:10 pm, Sebastian Moeller via Bloat wrote:
> 
>   [SM] This is why maybe a demo unit would be helpful, but then we would 
> need something with commercial grade support to point them at? Maybe 
> evenroute's IQrouter (I like their approach, but I never tested it).

For IETF Montreal and Singapore, I carried along my IQrouter and temporarily 
inserted it into the network of the AirBnBs we used - for which the host wasn't 
directly present.  I only had to inform it that there was a new network to 
calibrate itself to, and it ran the necessary capacity tests automatically.  
It's also possible to inform it directly about the line's rated capacity, and 
it will just run tests to verify that the capacity is actually available.

Mine is the v2 hardware, which is no longer the one sold, but the v3 is just a 
newer model from the same underlying vendor.  There seems to be enough 
commonality for a similar feature set and UI to be available in both versions.  
I'm sure that simplifies support logistics.  They would easily be able to cope 
with an 80/20 line.

> 2. What would I recommend? Obviously, inserting something with cake into the 
> mix would help a lot. Even if they were willing to let me examine their 
> entire network (Comcast router, Apple Airport in our Airbnb unit, other 
> router?) I have no idea what kind of tar baby I would be touching. I don't 
> want to become their network admin for the rest of time.

For a one-stop "plug it in and go" solution, the IQrouter is hard to beat.  
Evenroute also do a reasonably good job of explaining the technical background 
on the necessary level for end users, to help them understand what needs to be 
plugged into what and why, and more importantly where things should NOT be 
plugged in any more.

Of course, while the IQrouter has a decent WiFi AP of its own, installing it 
wouldn't directly improve the WiFi characteristics of the Apple Airport - it's 
quite understandable to have a separate AP for guests, in particular so they 
don't have to "shout through a wall".  However, if the airwaves are not overly 
congested (we found that the 2.4GHz band was a mess in Montreal, but 5GHz was 
fine), that probably doesn't matter, as the WiFi link may not be the 
bottleneck.  If necessary, it could be substituted with a debloated AP - if 
there's one we can recommend with the "new wifi stack", so much the better.

 - Jonathan Morton


Re: [Bloat] [Starlink] [LibreQoS] [Rpm] net neutrality back in the news

2023-09-28 Thread Jonathan Morton via Bloat
> On 29 Sep, 2023, at 1:19 am, David Lang via Bloat wrote:
> 
> Dave T called out earlier that the rise of bittorrent was a large part of the 
> inital NN discussion here in the US. But a second large portion was a money 
> grab from ISPs thinking that they could hold up large paid websites (netflix 
> for example) for additional fees by threatening to make their service less 
> useful to their users (viewing their users as an asset to be marketed to the 
> websites rather than customers to be satisfied by providing them access to 
> the websites)
> 
> I don't know if a new round of "it's not fair that Netflix doesn't pay us for 
> the bandwidth to service them" would fall flat at this point or not.

I think there were three more-or-less separate concerns which have, over time, 
fallen under the same umbrella:


1:  Capacity-seeking flows tend to interfere with latency-sensitive flows, and 
the "induced demand" phenomenon means that increases in link rate do not in 
themselves solve this problem, even though they may be sold as doing so.

This is directly addressed by properly-sized buffers and/or AQM, and even 
better by FQ and SQM.  It's a solved problem, so long as the solutions are 
deployed.  It's not usually necessary, for example, to specifically enhance 
service for latency-sensitive traffic, if FQ does a sufficiently good job.  An 
increased link rate *does* enhance service quality for both latency-sensitive 
and capacity-seeking traffic, provided FQ is in use.


2:  Swarm traffic tends to drown out conventional traffic, due to congestion 
control algorithms which try to be more-or-less fair on a per-flow basis, and 
the substantially larger number of parallel flows used by swarm traffic.  This 
also caused subscribers using swarm traffic to impair the service of 
subscribers who had nothing to do with it.

FQ on a per-flow basis (see problem 1) actually amplifies this effect, and I 
think it was occasionally used as an argument for *not* deploying FQ.  ISPs' 
initial response was to outright block swarm traffic where they could identify 
it, which was then softened to merely throttling it heavily, before NN 
regulations intervened.  Usage quotas also showed up around this time, and were 
probably related to this problem.

This has since been addressed by several means.  ISPs may use FQ on a 
per-subscriber basis to prevent one subscriber's heavy traffic from degrading 
service for another.  Swarm applications nowadays tend to employ altruistic 
congestion control which deliberately compensates for the large number of 
flows, and/or mark them with one of the Lower-Effort class DSCPs (LE, or the 
older CS1).  
Hence, swarm applications are no longer as damaging to service quality as they 
used to be.  Usage quotas, however, still remain in use as a profit centre, to 
the point where an "unlimited" service is a rare and precious specimen in many 
jurisdictions.


3:  ISPs merged with media distribution companies, creating a conflict of 
interest in which the media side of the business wanted the internet side to 
actively favour "their own" media traffic at the expense of "the competition".  
Some ISPs began to actively degrade Netflix traffic, in particular by refusing 
to provision adequate peering capacity at the nodes through which Netflix 
traffic predominated, or by zero-rating (for the purpose of usage quotas) 
traffic from their own media empire while refusing to do the same for Netflix 
traffic.

**THIS** was the true core of Net Neutrality.  NN regulations forced ISPs to 
carry Netflix traffic with reasonable levels of service, even though they 
didn't want to for purely selfish and greedy commercial reasons.  NN succeeded 
in curbing an anti-competitive and consumer-hostile practice, which I am 
perfectly sure would resume just as soon as NN regulations were repealed.

And this type of practice is just the sort of thing that technologies like L4S 
are designed to support.  The ISPs behind L4S actively do not want a technology 
that works end-to-end over the general Internet.  They want something that can 
provide a domination service within their own walled gardens.  That's why L4S 
is a NN hazard, and why they actively resisted all attempts to displace it with 
SCE.


All of the above were made more difficult to solve by the monopolistic nature 
of the Internet service industry.  It is actively difficult for Internet users 
to move to a truly different service, especially one based on a different link 
technology.  When attempts are made to increase competition, for example by 
deploying a publicly-funded network, the incumbents actively sabotage those 
attempts by any means they can.  Monopolies are inherently customer-hostile, 
and arguments based on market forces fail in their presence.

 - Jonathan Morton



Re: [Bloat] [Ecn-sane] The curious case of "cursed-ECN" steam downloads

2023-09-03 Thread Jonathan Morton via Bloat
> On 3 Sep, 2023, at 9:54 pm, Sebastian Moeller via Ecn-sane wrote:
> 
> B) Excessive ECT(1) marking (this happened with a multi-GB download)

This *could* be a badly configured middlebox attempting to apply a DSCP, but 
clobbering the entire TOS byte instead of the (left justified) DSCP field.  
Apparently Comcast just found a whole raft of these in their own network as 
part of rolling out L4S support.  Funny how they didn't notice them previously.
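
The bit layout makes the failure mode easy to see: the DSCP occupies the upper 
six bits of the old TOS byte, and the ECN field the lower two.  A sketch, with 
0x21 as an invented example of a clobbered TOS value:

    tos = 0x21                  # whole-byte write with a stray low bit
    print(tos >> 2, tos & 0x3)  # DSCP 8 (CS1), ECN field 1 == ECT(1)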

 - Jonathan Morton


Re: [Bloat] [Ecn-sane] quick question

2023-08-26 Thread Jonathan Morton via Bloat
> On 26 Aug, 2023, at 2:48 pm, Sebastian Moeller via Ecn-sane wrote:
> 
> percentage of packets marked: 100 * (2346329 / 3259777) = 72%
> 
> This seems like too high a marking rate to me. I would naively expect a 
> flow, on getting a mark, to scale back its cwnd by 20-50% and then slowly 
> increase it again, so I expect the actual marking rate to be considerably 
> below 50% per flow...

> My gut feeling is that these steam flows do not obey RFC3168 ECN (or 
> something wipes the CE marks my router sends upstream along the path)... but 
> without a good model what marking rate I should expect this is very 
> hand-wavy, so if anybody could help me out with an easy derivation of the 
> expected average marking rate I would be grateful.

Yeah, that's definitely too much marking.  We've actually seen this behaviour 
from Steam servers before, but they had fixed it at some point.  Perhaps 
they've unfixed it again.

My best guess is that they're running an old version of BBR with ECN 
negotiation left on.  BBRv1, at least, completely ignores ECE responses.  
Fortunately BBR itself does a good job of congestion control in the FQ 
environment which Cake provides, as you can tell by the fact that the queues 
never get full enough to trigger heavy dropping.

The CUBIC RFC offers an answer to your question:



Reading the table, for RTT of 100ms and throughput 100Mbps in a single flow, a 
"loss rate" (equivalent to a marking rate) of about 1 per 7000 packets is 
required.  The formula can be rearranged to find a more general answer.

 - Jonathan Morton


Re: [Bloat] [Cake] Anybody has contacts at Dropbox?

2023-06-24 Thread Jonathan Morton via Bloat
> On 25 Jun, 2023, at 12:00 am, Sebastian Moeller via Cake wrote:
> 
> Is dropbox silently already using an L4S-style CC for their TCP?

It should be possible to distinguish this by looking at the three-way handshake 
at the start of the connection.  This will show a different set of TCP flags 
and ECN field values depending on whether RFC-3168 or AccECN is being 
attempted.  Without AccECN, you won't have functioning L4S on a TCP stream.
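
A sketch of that distinction, as I understand the negotiation (RFC 3168 sets 
CWR+ECE on the SYN, while AccECN additionally sets AE, the repurposed NS bit):

    def classify_syn(ae, cwr, ece):
        # Classify a client SYN by its ECN-related TCP flags.
        if ae and cwr and ece:
            return "AccECN attempted (prerequisite for L4S over TCP)"
        if cwr and ece:
            return "classic RFC 3168 ECN attempted"
        return "no ECN negotiation"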

But I think it is more likely that it's a misapplied DSCP.

 - Jonathan Morton


Re: [Bloat] SQM tuning question

2023-06-03 Thread Jonathan Morton via Bloat
> On 3 Jun, 2023, at 4:56 pm, John D via Bloat wrote:
> 
> On the website it says the following:
> 
> CoDel is a novel “no knobs”, “just works”, “handles variable bandwidth and 
> RTT”, and simple AQM algorithm.
> 
>   • It is parameterless — no knobs are required for operators, users, or 
> implementers to adjust.
>   • It treats good queue and bad queue differently - that is, it keeps 
> the delays low while permitting bursts of traffic.
>   • It controls delay, while insensitive to round-trip delays, link 
> rates, and traffic loads.
>   • It adapts to dynamically changing link rates with no negative impact 
> on utilization.
> 
> But everywhere I have read about hardware which implements SQM 
> (including the bufferbloat website), it describes the need to tune based on 
> actual internet connection speed.
> These seem to conflict, especially that "handles variable bandwidth" bit. Have 
> I misunderstood or do the algorithms used in modern hardware just not provide 
> this part typically? My connection performance is quite variable and I'm 
> worried about crippling SQM to the lowest speed seen.

SQM in practice requires three components:

1: Flow isolation, so that different flows don't affect each other's latency 
and are delivered fairly;

2: Active Queue Management (AQM) to signal flows to slow down transmissions 
when link capacity is exceeded;

3: Bandwidth shaping to match the queue to the available capacity.

CoDel is, in itself, only the AQM component.  It does indeed work pretty well 
with no additional tuning - but only in combination with the other two 
components, or when applied directly to the actual bottleneck.  Unfortunately 
in most consumer internet links, the actual bottleneck is inaccessible for this 
purpose.  Thus an artificial bottleneck must be introduced, at which SQM is 
applied.

The most convenient tool for applying all three SQM components at once is Cake. 
 This includes implementations of advanced flow isolation, CoDel AQM, and a 
deficit-mode bandwidth shaper.  All you really need to do is to tell it how 
much bandwidth you have in each direction, minus a small margin to ensure it 
becomes the actual bottleneck and can exert the necessary control.
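
For illustration, a sketch of the usual recipe (rates invented for a nominal 
100/20 Mbit/s link; the sqm-scripts package automates all of this on OpenWrt):

    # Egress: shape slightly below the uplink rate so that Cake, not
    # the ISP's buffer, is the bottleneck.
    tc qdisc replace dev eth0 root cake bandwidth 19Mbit

    # Ingress: redirect through an IFB device and shape likewise.
    ip link add ifb0 type ifb
    ip link set ifb0 up
    tc qdisc add dev eth0 handle ffff: ingress
    tc filter add dev eth0 parent ffff: matchall \
        action mirred egress redirect dev ifb0
    tc qdisc replace dev ifb0 root cake bandwidth 95Mbit ingress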

When your available bandwidth varies over time, that can be inconvenient.  
There are methods, however, of observing how available capacity tends to change 
over time (typically on diurnal and weekly patterns, if the variations are due 
to congestion in the ISP backhaul or peering) and scheduling adjustments on 
that basis.  If you have more information on your situation, we might be able 
to give more detailed advice.

 - Jonathan Morton


Re: [Bloat] [Codel] ACM queue article on facebook´s "Adaptive LIFO" and codel

2023-04-10 Thread Jonathan Morton via Bloat
> On 11 Apr, 2023, at 5:12 am, Dave Taht  wrote:
> 
> I have no idea what an "adaptive LIFO" is, but the acm queue paper
> here just takes the defaults from codel...
> 
> https://twitter.com/teivah/status/1645362443986640896

They're applying it to a server request queue, not a network packet queue.  I 
can see the logic of it in that context, but I would also note that LIFO breaks 
one of Codel's core assumptions, which is that the maximum delay of the queue 
it's controlling can be inferred from the delay experienced by the most 
recently dequeued item.

Maybe it still happens to work by accident, or maybe they've implemented some 
specific workaround, but that paper is a very high-level overview (of more than 
one technology, to boot) without much technical detail.  If I didn't already 
know a great deal about Codel from the coal face, I wouldn't even know to 
consider such a failure mode, let alone be able to infer what they could do to 
mitigate it.

 - Jonathan Morton


Re: [Bloat] Hey, all, what about bandlength?

2023-04-08 Thread Jonathan Morton via Bloat
> On 8 Apr, 2023, at 9:49 pm, Michael Richardson via Bloat wrote:
> 
>> If I have a bandwidth of 1 Mbit/S, but it takes 2 seconds to deliver 1
>> Mbit, do I have a bandlength of only 1/2 Mbit/S?
> 
> Is that because there is 2seconds of delay?

It could merely be that, due to new-flow effects, the effective utilisation of 
the path is only 50% over those two seconds.  A longer flow might have better 
utilisation in its later stages.

 - Jonathan Morton


Re: [Bloat] [ih] Installed base momentum (was Re: Design choices in SMTP)

2023-02-13 Thread Jonathan Morton via Bloat
> -- Forwarded message -
> From: Jack Haverty via Internet-history
> Even today, as an end user, I can't tell if "congestion control" is
> implemented and working well, or if congestion is just mostly being
> avoided by deployment of lots of fiber and lots of buffer memory in all
> the switching locations where congestion might be expected. That of
> course results in the phenomenon of "buffer bloat".   That's another
> question for the Historians.  Has "Congestion Control" in the Internet
> been solved?  Or avoided?

It's a good question, and one that shows understanding of the underlying 
problem.

TCP has implemented a workable congestion control system since the introduction 
of Reno, and has continued to take congestion control seriously with the newer 
flavours of Reno (eg. NewReno, SACK, etc) and CUBIC.  Each of these schemes 
reacts to congestion *signals* from the network; they probe gradually for 
capacity, then back off rapidly when that capacity is evidently exceeded, 
repeatedly.
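
In sketch form, that probe-and-back-off cycle amounts to roughly this (Reno's 
rules, with per-ACK growth approximated):

    def reno_on_ack(cwnd):
        # Gradual probing: about one extra segment per round trip.
        return cwnd + 1.0 / cwnd

    def reno_on_congestion(cwnd):
        # Rapid back-off: halve the window on a congestion signal.
        return max(2.0, cwnd / 2.0)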

Confusingly, this process is called the "congestion avoidance" phase of TCP, to 
distinguish it from the "slow start" phase which is, equally confusingly, a 
rapid initial probe for path capacity.  CUBIC's main refinement is that it 
spends more time near the capacity limit thus found than Reno does, and thus 
scales better to modern high-capacity networks at Internet scale.

In the simplest and most widespread case, the overflow of a buffer, resulting 
in packet loss, results in that loss being interpreted as a congestion signal, 
as well as triggering the "reliable stream" function of retransmission.  
Congestion signals can also be explicitly encoded by the network onto IP 
packets, in the form of ECN, without requiring packet losses and the consequent 
retransmissions.

My take is that *if* networks focus only on increasing link and buffer 
capacity, then they are "avoiding" congestion - a strategy that only works so 
long as capacity consistently exceeds load.  However, it has repeatedly been 
shown in many contexts (not just networking) that increased capacity 
*stimulates* increased load; the phenomenon is called "induced demand".  In 
particular, many TCP-based Internet applications are "capacity seeking" by 
nature, and will *immediately* expand to fill whatever path capacity is made 
available to them.  If this causes the path latency to exceed about 2 seconds, 
DNS timeouts can be expected and the user experience will suffer dramatically.

Fortunately, many networks and, more importantly, equipment providers are now 
learning the value of implementing AQM (to apply congestion signals explicitly, 
before the buffers are full), or failing that, of sizing the buffers 
appropriately so that path latency doesn't increase unreasonably before 
congestion signals are naturally produced.  This allows TCP's sophisticated 
congestion control algorithms to work as intended.

 - Jonathan Morton



Re: [Bloat] summarizing the bitag latency report?

2022-11-14 Thread Jonathan Morton via Bloat
ge "railcars" are also available for hire.  For the next month's 
timetable, instead of the two 12-carriage trains each day, he will run one of 
these railcars every hour.  These will provide exactly the same seating 
capacity over the course of the day, but the waiting time will now be limited 
to a much more palatable duration.  (In Internet terms, he's optimised squarely 
for latency.)

Still the complaints come in - but now from different sources.  No longer are 
passengers waiting for hours and sleeping overnight in stations.  Instead, 
rush-hour commuters who had previously found the 12-carriage trains convenient 
are finding the railcars too crowded.  Even with over a hundred passengers 
crammed in like sardines, many more are left on the platforms and arrive at 
work late - or worse, come home to a cold dinner and an annoyed wife.  Simply 
put, demand is not evenly distributed through the day, but concentrated on 
particular times; at other times, the railcars are sufficient for the 
relatively small number of passengers, or even run almost empty.

So again, even though the "Quality of Service" is provided just as specified, 
the "Quality of Experience" for the passengers is very poor.  Indeed the 
overcrowding leads to some railcars being delayed, due to the difficulty of 
getting everyone in and out of the doors, and the conductors have great 
difficulty in checking tickets, hence a noticeable reduction in fare revenue.

Things improve markedly when the manager brings in 6-carriage express trains 
for the morning, lunchtime, and evening commuters, and continues to run the 
railcars at hourly intervals in between them, except for the small hours when 
some trains are removed due to minimal demand.  Now there are enough carriages 
in the rush-hour trains to satisfy commuters, and there are still trains 
running at other times so that nobody needs to wait particularly long for one.

In fact, demand increases substantially due to the good "Quality of Experience" 
that this new timetable provides, such that by the end of the first year, many 
of the railcars are upgraded to 3-carriage trains, and the commuter expresses 
are lengthened to 8 carriages.  Fare revenue is more than doubled.  The 
modernisation effort is a success.

The lesson here is that QoS is merely the means by which you may attempt to 
achieve high QoE.  Meeting QoS does not guarantee QoE.  Only if the QoS is 
designed around the factors that genuinely influence QoE will you succeed.  
Unfortunately, many QoS schemes are inadequate for the needs of actual Internet 
users; this is because their designers have not kept up with the appropriate 
QoE factors.

 - Jonathan Morton



Re: [Bloat] Researchers discover major roadblock in alleviating network congestion

2022-08-07 Thread Jonathan Morton via Bloat
> On 5 Aug, 2022, at 2:46 am, Daniel Sterling  wrote:
> 
> "Flow control power is non-decentralizable" is from -- 1981? So we've
> known for 40 years that TCP streams won't play nicely with each other
> unless you shape them at the slower endpoint-- am I understanding that
> correctly? But we keep trying anyway? :)

More precisely, what was stated in 1981 was:

The specific metric of "network power" (the ratio of throughput to delay, 
calculated for each flow and globally summed) cannot reliably be maximised 
solely by the action of individual endpoints, without information from within 
the network itself.

Current TCPs generally converge not to maximise or even equalise "network 
power", but to equalise between flows a completely different metric called "RTT 
fairness", the *product* of throughput and delay.  Adding information from the 
network via AQMs allows for reductions in delay with little effect on 
throughput, and thus a general increase in network power, but the theoretical 
global optimum is still not even approached.

Adding FQ in the network implements "max-min fairness" instead of "RTT 
fairness", equalising throughput rather than the product of throughput and 
delay.  Equalised throughput is essentially the geometric mean of the 
RTT-fairness product and the network power.
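
Spelled out, for a flow with throughput T and delay d (notation mine):

    power:         P = T / d       (equalised under power-fairness)
    RTT-fairness:  F = T * d       (what current TCPs converge to)
    FQ equalises:  T = sqrt(F * P)

which is the sense in which max-min fairness sits at the geometric mean of the 
other two metrics.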

I believe it is actually possible to achieve equalisation of network power 
between flows, which would approach the global optimum of network power, using 
information from the network to guide endpoint behaviour.  This is *only* 
possible using explicit information from the network, however, and is not 
directly compatible with the current congestion-control paradigm of 
RTT-fairness by default.

 - Jonathan Morton


Re: [Bloat] Researchers discover major roadblock in alleviating network congestion

2022-08-04 Thread Jonathan Morton via Bloat
> On 4 Aug, 2022, at 3:21 pm, Bjørn Ivar Teigen via Bloat wrote:
> 
> Main take-away (as I understand it) is something like "In real-world 
> networks, jitter adds noise to the end-to-end delay such that any algorithm 
> trying to infer congestion from end-to-end delay measurements will 
> occasionally get it wrong and this can lead to starvation". Seems related to 
> Jaffe's work on network power (titled "Flow control power is 
> non-decentralizable"). 

Hasn't this been known for many years, as a consequence of experience with TCP 
Vegas?

 - Jonathan Morton


Re: [Bloat] updating the theory of buffer sizing

2021-10-10 Thread Jonathan Morton
> On 10 Oct, 2021, at 8:48 pm, Dave Taht  wrote:
> 
> This latest from Nick & co, was quite good:
> 
> https://arxiv.org/pdf/2109.11693.pdf

Skip the false modesty - I think this is very important work, actually.  I 
would expect it to get cited a heck of a lot in future work, both in academia 
and in the IETF.

In terms of its content, it confirms, contextualises, and formalises various 
things that I already understood at an intuitive level.  The mathematics 
involved is simple and accessible (unlike some papers I've read recently), and 
the practical explanations of the observed behaviours are clear and to the 
point.  I particularly appreciate the way they were able to parameterise 
certain characteristics on a continuum, rather than all-or-nothing, as that 
captures the complex characteristics of real traffic much better.

The observations about synchronisation of congestion responses are also very 
helpful.  When synchronised, the aggregate behaviour of many flows is similar 
to that of a much smaller number, perhaps even a single flow.  When 
desynchronised, the well-known statistical multiplexing effects apply.  They 
also clearly explain why the "hard threshold" type of ECN marking is 
undesirable - because it provokes synchronisation in a way that tail-drop does 
not (and this is also firmly related to a point we discussed last week).

Notably, they started seeing the effects of burstiness, on a small and 
theoretically "smooth" network, on timescales of approximately a seventh of a 
millisecond (20 packets, 9000 byte MTU, 10Gbps).  They were unable to reduce 
buffer sizes below that level without throughput dropping well below their 
theoretical predictions, which had held true down to that point.  This has 
implications for setting AQM targets and tolerances in even near-ideal network 
environments.  But they did also note that BBR showed much less sensitivity to 
this effect, as it uses pacing.  In any case, it confirms that the first role 
of a buffer is to absorb bursts without excessive loss.
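
A quick arithmetic check of that timescale:

    packets, mtu_bytes, rate_bps = 20, 9000, 10e9
    print(packets * mtu_bytes * 8 / rate_bps)  # 0.000144 s, ~1/7 ms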

 - Jonathan Morton


Re: [Bloat] Relentless congestion control for testing purposes

2021-09-28 Thread Jonathan Morton
> On 29 Sep, 2021, at 2:17 am, Dave Taht  wrote:
> 
> In today's rpm meeting I didn't quite manage to make a complicated point. 
> This long-ago proposal of matt mathis's has often intrigued (inspired? 
> frightened?) me:
> 
> https://datatracker.ietf.org/doc/html/draft-mathis-iccrg-relentless-tcp-00
> 
> where he proposed that a tcp variant have no response at all to loss or 
> markings, merely replacing lost segments as they are requested, continually 
> ramping up until the network basically explodes.

I think "no response at all" is overstating it.  Right in the abstract, it is 
described as removing the lost segments from the cwnd; ie. only acked segments 
result in new segments being transmitted (modulo the 2-segment minimum).  In 
this sense, Relentless TCP is an AIAD algorithm much like DCTCP, to be 
classified distinctly from Reno (AIMD) and Scalable TCP (MIMD).

   Relentless congestion control is a simple modification that can be
   applied to almost any AIMD style congestion control: instead of
   applying a multiplicative reduction to cwnd after a loss, cwnd is
   reduced by the number of lost segments.  It can be modeled as a
   strict implementation of van Jacobson's Packet Conservation
   Principle.  During recovery, new segments are injected into the
   network in exact accordance with the segments that are reported to
   have been delivered to the receiver by the returning ACKs.
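
As a toy comparison of the two loss responses, as I read the draft (recovery 
details omitted):

    def reno_loss_response(cwnd, lost_segments):
        # Classic AIMD: one multiplicative decrease per loss event.
        return max(2, cwnd // 2)

    def relentless_loss_response(cwnd, lost_segments):
        # Relentless AIAD: subtract exactly the lost segments,
        # subject to the 2-segment minimum.
        return max(2, cwnd - lost_segments)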

Obviously, an AIAD congestion control would not coexist nicely with AIMD based 
traffic.  We know this directly from experience with DCTCP.  It cannot 
therefore be recommended for general use on the Internet.  This is acknowledged 
extensively in Mathis' draft.

> In the context of *testing* bidirectional network behaviors in particular, 
> seeing tcp tested more than unicast udp has been, in more labs, has long been 
> on my mind.

Yes, as a tool specifically for testing with, and distributed with copious 
warnings against attempting to use it more generally, this might be interesting.

 - Jonathan Morton


Re: [Bloat] DSLReports Speed Test doesn't like Remote Desktop

2021-08-28 Thread Jonathan Morton
> On 28 Aug, 2021, at 10:36 pm, Michael Richardson  wrote:
> 
> RDP (specifically with Windows as the desktop) is integrated into the display
> pipeline such that it effectively never loses frames.  The results of an
> (e.g.) Excel redraw over a slow link can be spectacularly stupid with every
> cell being drawn each time it is "re"-computed.  The result is that the
> application itself is blocked when the RDP frames are being generated.
> 
> I/we observed this a decade ago when building virtual desktop infrastructure.
> There was a Linux Xrdp server (via a bunch of patches that didn't survive)
> that was more screen-scraper.  VNC has always screen scraped the pixels, so it
> "naturally" skips the intermediate frames when the application draws faster
> than then remote desktop protocol can keep up.
> 
> I thought that there were patches to RDP to make this better, but I never
> confirmed this.

Funnily enough, I was actually in the VNC community for a while, having written 
a functioning server for Classic MacOS, so I'm familiar with this dilemma.  Due 
to some quirks of Classic MacOS, it was often necessary to do the 
screen-scraping, encoding and socket transmissions at interrupt time, and I had 
to limit the amount of data generated at any given time so that it didn't block 
on a full buffer - which could lock *everything* up.

My experience of modern browser rendering pipelines is that they do everything 
in backbuffers, then blit them to the screen wholesale.  This *should* be quite 
efficient for an RDP to handle, so long as it excludes areas that were 
unchanged on consecutive blits.  But it's also possible for it to pick up 
drawing to background tabs, and only after much CPU effort determine that 
nothing visibly changed.

At any rate, the original problem turned out to be something else entirely.

 - Jonathan Morton


Re: [Bloat] DSLReports Speed Test doesn't like Remote Desktop

2021-08-27 Thread Jonathan Morton
> On 27 Aug, 2021, at 2:25 am, Kenneth Porter  wrote:
> 
> The DSLReports speed test gives me <5 Mbps download speed when used over a 
> Remote Desktop connection. The same test gives me around 200 Mbps when run on 
> my machine connected to my display. The Waveform test shows 200 Mbps from the 
> remote machine. All are done with Chrome. Bunch of tabs open on both, similar 
> sets of extensions.
> 
> I'm testing my Comcast XB3 modem + OpenWrt router before upgrading it to XB7.
> 
> I use two computers, both Win10-x64. One's a half-height with a bit better 
> CPU and memory that I use for development and web/mail, while the other has a 
> full-height tower chassis so it has my good video card for gaming. I have my 
> big 43" display hooked to the latter and I remote to the short machine for 
> "business" use.
> 
> https://www.waveform.com/tools/bufferbloat?test-id=62b54f0c-eb3e-40c8-ab99-4f2105f39525
> 
> This one looks very poor, 4 Mbps:
> 
> http://www.dslreports.com/speedtest/69341504
> 
> Much better, direct instead of through RDP:
> 
> http://www.dslreports.com/speedtest/69341657

A browser-based speed test like DSLreports depends heavily on the 
responsiveness of the browser itself.  It would appear that RDP interferes with 
that quite spectacularly, although I'm unsure exactly why.  The only advice I 
can give is "don't do that, then".

 - Jonathan Morton


Re: [Bloat] [Rpm] Airbnb

2021-08-10 Thread Jonathan Morton
> On 10 Aug, 2021, at 7:51 am, Matt Mathis via Bloat wrote:
> 
> For years we published a "jitter" metric that I considered to be bogus, 
> basically max_rtt - min_rtt, (which were builtin Web100 instruments).
> 
> In 2019, we transitioned from web100 to "standard" linux tcp_info, which does 
> not capture max_rtt.   Since the web100 jitter was viewed as bogus, we did 
> not attempt to reconstruct it, although we could have.   Designing and 
> implementing a new latency metric was on my todo list from the beginning of 
> that transition, but chronically preempted by more pressing problems.
> 
> It finally made it to the top of my queue which is why I am suddenly not 
> lurking here and the new rpm list.  I was very happy to see the Apple 
> responsiveness metric, and realized that M-Lab can implement a TCP version of 
> it, that can be computed both in real time on future tests and retroactively 
> over archived tests collected over the last 12 years.
> 
> This quick paper might be of interest: Preliminary Longitudinal Study of 
> Internet Responsiveness

Intriguing.  The properly processed version of the data will probably show the 
trends more clearly, too.

I think there is merit in presenting the European data as well, so long as the 
discontinuities caused by topological/geographical alterations can be 
identified and indicated.  There are some particular local phenomena that I 
think would be reflected in that, such as the early rollout of fq_codel by 
free.fr.

 - Jonathan Morton


Re: [Bloat] [Rpm] Airbnb

2021-08-09 Thread Jonathan Morton
> On 9 Aug, 2021, at 10:25 pm, Dave Collier-Brown wrote:
> 
> My run of it reported latency, but without any qualifiers...

One would reasonably assume that's idle latency, then.

 - Jonathan Morton


Re: [Bloat] [Make-wifi-fast] [Starlink] [Cake] [Cerowrt-devel] Due Aug 2: Internet Quality workshop CFP for the internet architecture board

2021-08-08 Thread Jonathan Morton
> On 8 Aug, 2021, at 9:36 pm, Aaron Wood  wrote:
> 
> Less common, but something I still see, is that a moving station has 
> continual issues staying in proper MIMO phase(s) with the AP.  Or I think 
> that's what's happening.  Slow, continual movement of the two, relative to 
> each other, and the packet rate drops through the floor until they stop 
> having relative motion.  And I assume that also applies to time-varying 
> path-loss and path-distance (multipath reflections).

So is it time to mount test stations on model railway wagons?

 - Jonathan Morton


Re: [Bloat] [Starlink] Of interest: Comcast AQM Paper

2021-08-04 Thread Jonathan Morton
On Wed, 4 Aug 2021 at 21:31, Juliusz Chroboczek  wrote:
> A Cortex-A53 SoC at 1GHz with correctly designed Ethernet (i.e. not the
> Raspberry Pi) can push 1Gbit from userspace without breaking a sweat.

That was true of the earlier Raspberry Pis (eg. the Pi 3 uses a brace
of Cortex-A53s) which use Ethernet chipsets attached over USB 2, but
the Pi 4B has a directly integrated Ethernet port and two of the
external USB ports are USB 3, giving enough bandwidth to attach a
second GigE port.  We have tested this in practice, and got full line
rate throughput through Cake (though the CPU usage went up fairly
sharply after about halfway).

The Compute Module 4 exposes the same integrated Ethernet port, and a
PCIe lane in place of the USB 3 chipset (the latter being attached to
the former in the standard Pi 4B).  This obviously allows attaching at
least one real GigE port (with a free choice of PCIe-based chipset) at
full line rate, without the intermediate step of USB.  I think it
would be reasonable to include a small Ethernet switch downstream of
this, matching the connectivity of typical CPE on the LAN side.  If a
PCIe switch is inserted, then a choice of Mini-PCIe Wifi cards can be
installed, with cables running to the normal array of external
antennae, sidestepping the problem of USB Wifi dongles.


Re: [Bloat] [Starlink] Of interest: Comcast AQM Paper

2021-08-04 Thread Jonathan Morton
I firmly believe this is due to an I/O bottleneck in the SoC between
the network complex and the CPU complex, not due to any limitation of
the CPU itself.  It stems from the reliance on accelerated forwarding
hardware to achieve full line-rate throughput.  Even so, I'd much
rather have 40Mbps with Cake than 400Mbps with a dumb FIFO.  (Heck,
40Mbps would be a big upgrade from what I currently have.)  I think
some of the newer Atheros chipsets are less constrained in this
respect.

There are two reasonably good solutions to this problem in the hands
of the SoC vendors:

1: Relieve that I/O bottleneck, so that the CPU can handle packets at
full line rate.  I assume this is not hugely complicated to implement,
and just requires a sufficient degree of will to select the right
option from the upstream fabless IP vendor's design library.

2: Implement good shaping, FQ, and AQM within the network complex.  At
consumer broadband/LAN speeds, this shouldn't be too difficult (unlike
doing the same at 100+ Gbps), but it does require a significant amount
of hardware design and validation, and that tends to have long lead
times.

There is a third solution in the hands of us mere mortals:

3: Leverage the Raspberry Pi ecosystem to build a CPE device that
meets our needs.  This could be a Compute Module 4 (which has the
necessary I/O throughput) mounted on a custom PCB that provides
additional Ethernet ports and some reasonable Wifi AP.  It could
alternatively be a standard Pi 4B with some USB Ethernet and Wifi
hardware plugged into it.  Either will do the job without any
Ethernet bottlenecks, although the capabilities of USB Wifi dongles
are usually quite limited.

 - Jonathan Morton


Re: [Bloat] [Starlink] Of interest: Comcast AQM Paper

2021-08-04 Thread Jonathan Morton
> I assume by WiFi what is really meant is devices that have at least one WiFi 
> (layer 1/layer 2) interface. While there are queues in the MAC sublayer, 
> there is really no queue management functionality ... yet ... AFAIK. I know 
> IEEE P802.11bd in conjunction w/ IEEE 1609 is working on implementing a few 
> rudimentary queue mgmt functions.
>
> That said, seems any AQM in such devices would more than likely be in layer 3 
> and above.

Linux-based CPE devices have AQM functionality integrated into the
Wifi stack.  The AQM itself operates at layer 3, but the Linux Wifi
stack implementation uses information from layers 2 and 4 to improve
scheduling decisions, eg. airtime-fairness and flow-isolation (FQ).
This works best on soft-MAC Wifi hardware, such as ath9k/10k and MT76,
where this information is most readily available to software.  In
principle it could also be implemented in the MAC, but I don't know of
any vendor that's done that yet.

 - Jonathan Morton


Re: [Bloat] [Make-wifi-fast] Little's Law mea culpa, but not invalidating my main point

2021-07-12 Thread Jonathan Morton
> On 12 Jul, 2021, at 11:04 pm, Bob McMahon via Make-wifi-fast wrote:
> 
> "Flow control in store-and-forward computer networks is appropriate for 
> decentralized execution. A formal description of a class of "decentralized 
> flow control algorithms" is given. The feasibility of maximizing power with 
> such algorithms is investigated. On the assumption that communication links 
> behave like M/M/1 servers it is shown that no "decentralized flow control 
> algorithm" can maximize network power. Power has been suggested in the 
> literature as a network performance objective. It is also shown that no 
> objective based only on the users' throughputs and average delay is 
> decentralizable. Finally, a restricted class of algorithms cannot even 
> approximate power."
> 
> https://ieeexplore.ieee.org/document/1095152
> 
> Did Jaffe make a mistake?

I would suggest that if you model traffic as having no control feedback, you 
will inevitably find that no control occurs.  But real Internet traffic *does* 
have control feedback - though it was introduced some time *after* Jaffe's 
paper, so we can forgive him for a degree of ignorance on that point.  Perhaps 
Jaffe effectively predicted the ARPANET congestion collapse events with his 
analysis.

> Also, it's been observed that latency is non-parametric in its distributions, 
> and computing gaussians per the central limit theorem for OWD feedback loops 
> isn't effective. How does one design a control loop around things that are 
> non-parametric? It also begs the question, what are the feed forward knobs 
> that can actually help?

Control at endpoints benefits greatly from even small amounts of information 
supplied by the network about the degree of congestion present on the path.  
This is the role played first by packets lost at queue overflow, then 
deliberately dropped by AQMs, then marked using the ECN mechanism rather than 
dropped.

AQM algorithms can be exceedingly simple, or they can be rather sophisticated.  
Increased levels of sophistication in both the AQM and the endpoint's 
congestion control algorithm may be used to increase the "network power" 
actually obtained.  The required level of complexity for each, achieving 
reasonably good results, is however quite low.

 - Jonathan Morton


Re: [Bloat] rpm (was: on incorporating as an educational institution(s)?)

2021-07-10 Thread Jonathan Morton
> On 11 Jul, 2021, at 1:15 am, Kenneth Porter  wrote:
> 
> What is "rpm"? I only know of the Redhat Package Manager and revolutions per 
> minute. I don't see it explained on the mailing list page or in the mailing 
> list postings.

It has been discussed recently.  It is "Round-trips Per Minute", Apple's new 
measure of network responsiveness.

 - Jonathan Morton


Re: [Bloat] Little's Law mea culpa, but not invalidating my main point

2021-07-09 Thread Jonathan Morton
> On 10 Jul, 2021, at 2:01 am, Leonard Kleinrock  wrote:
> 
> No question that non-stationarity and instability are what we often see in 
> networks.  And, non-stationarity and instability are both topics that lead to 
> very complex analytical problems in queueing theory.  You can find some 
> results on the transient analysis in the queueing theory literature 
> (including the second volume of my Queueing Systems book), but they are 
> limited and hard. Nevertheless, the literature does contain some works on 
> transient analysis of queueing systems as applied to network congestion 
> control - again limited. On the other hand, as you said, control theory 
> addresses stability head on and does offer some tools as well, but again, it 
> is hairy. 

I was just about to mention control theory.

One basic characteristic of Poisson traffic is that it is inelastic, and 
assumes there is no control feedback whatsoever.  This means it can only be a 
valid model when the following are both true:

1: The offered load is *below* the link capacity, for all links, averaged over 
time.

2: A high degree of statistical multiplexing exists.

If 1: is not true and the traffic is truly inelastic, then the queues will 
inevitably fill up and congestion collapse will result, as shown from ARPANET 
experience in the 1980s; the solution was to introduce control feedback to the 
traffic, initially in the form of TCP Reno.  If 2: is not true then the traffic 
cannot be approximated as Poisson arrivals, regardless of load relative to 
capacity, because the degree of correlation is too high.
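
As a toy illustration of why criterion 1 matters: in an M/M/1 queue the mean 
sojourn time is 1/(mu - lambda), which diverges as offered load approaches 
capacity.

    mu = 1.0  # normalised service rate
    for rho in (0.5, 0.9, 0.99, 0.999):
        print(rho, 1.0 / (mu - rho * mu))  # mean delay blows up near 1.0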

Taking the iPhone introduction anecdote as an illustrative example, measuring 
utilisation as very close to 100% is a clear warning sign that the Poisson 
model was inappropriate, and a control-theory approach was needed instead, to 
capture the feedback effects of congestion control.  The high degree of 
statistical multiplexing inherent to a major ISP backhaul is irrelevant to that 
determination.

Such a model would have found that the primary source of control feedback was 
human users giving up in disgust.  However, different humans have different 
levels of tolerance and persistence, so this feedback was not sufficient to 
reduce the load sufficiently to give the majority of users a good service; 
instead, *all* users received a poor service and many users received no usable 
service.  Introducing a technological control feedback, in the form of packet 
loss upon overflow of correctly-sized queues, improved service for everyone.

(BTW, DNS becomes significantly unreliable around 1-2 seconds RTT, due to 
protocol timeouts, which is inherited by all applications that rely on DNS 
lookups.  Merely reducing the delays consistently below that threshold would 
have improved perceived reliability markedly.)

Conversely, when talking about the traffic on a single ISP subscriber's 
last-mile link, the Poisson model has to be discarded due to criterion 2 being 
false.  The number of flows going to even a family household is probably in the 
low dozens at best.  A control-theory approach can also work here.

 - Jonathan Morton
___
Bloat mailing list
Bloat@lists.bufferbloat.net
https://lists.bufferbloat.net/listinfo/bloat


Re: [Bloat] Abandoning Window-based CC Considered Harmful (was Re: Bechtolschiem)

2021-07-08 Thread Jonathan Morton
> On 8 Jul, 2021, at 4:29 pm, Matt Mathis via Bloat 
>  wrote:
> 
> That said, it is also true that multi-stream BBR behavior is quite 
> complicated and needs more queue space than single stream.   This complicates 
> the story around the traditional workaround of using multiple streams to 
> compensate for Reno & CUBIC lameness at larger scales (ordinary scales 
> today).  Multi-stream does not help BBR throughput and raises the queue 
> occupancy, to the detriment of other users.

I happen to think that using multiple streams for the sake of maximising 
throughput is the wrong approach - it is a workaround employed pragmatically by 
some applications, nothing more.  If BBR can do just as well using a single 
flow, so much the better.

Another approach to improving the throughput of a single flow is high-fidelity 
congestion control.  The L4S approach to this, derived rather directly from 
DCTCP, is fundamentally flawed in that, not being fully backwards compatible 
with ECN, it cannot safely be deployed on the existing Internet.

An alternative HFCC design using non-ambiguous signalling would be 
incrementally deployable (thus applicable to Internet scale) and naturally 
overlaid on existing window-based congestion control.  It's possible to imagine 
such a flow reaching optimal cwnd by way of slow-start alone, then "cruising" 
there in a true equilibrium with congestion signals applied by the network.  In 
fact, we've already shown this occurring under lab conditions; in other cases 
it still takes one CUBIC cycle to get there.  BBR's periodic probing phases 
would not be required here.

> IMHO, two approaches seem to be useful:
> a) congestion-window-based operation with paced sending
> b) rate-based/paced sending with limiting the amount of inflight data

So this corresponds to approach a) in Roland's taxonomy.

 - Jonathan Morton
___
Bloat mailing list
Bloat@lists.bufferbloat.net
https://lists.bufferbloat.net/listinfo/bloat


Re: [Bloat] Really getting 1G out of ISP?

2021-07-07 Thread Jonathan Morton
> On 7 Jul, 2021, at 12:27 pm, Wheelock, Ian  wrote:
> 
> It is entirely possible through the mechanics of DOCSIS provisioning that AQM 
> could be enabled or disabled on different CMs or groups of CMs. Doing so 
> would be rather petty and may add additional unnecessary complexity to the 
> provisioning system. Users that own their CMs are still paying for the 
> internet access with the specific ISP, so would likely expect equivalent 
> performance.

Entirely true, but for the ISP the matter of whether the subscriber is using a 
rented or self-owned modem is not entirely petty - it is the difference of a 
line item on the monthly bill.  I'm sure you can see how the perverse 
incentives arise with that.

 - Jonathan Morton
___
Bloat mailing list
Bloat@lists.bufferbloat.net
https://lists.bufferbloat.net/listinfo/bloat


Re: [Bloat] Credit and/or collaboration on a responsiveness metric?

2021-07-06 Thread Jonathan Morton
> On 6 Jul, 2021, at 2:21 am, Matt Mathis  wrote:
> 
> The rounds based responsiveness metric is awesome!   There are several 
> slightly different versions, with slightly different properties
> 
> I would like to write a little paper (probably for the IAB workshop), but 
> don't want to short change anybody else's credit, or worse, scoop somebody 
> else's work in progress.   I don't really know if I am retracing somebody 
> else's steps, or on a parallel but different path (more likely).   I would be 
> really sad to publish something and then find out later that I trashed some 
> PhD students' thesis

It's possible that I had some small influence in originating it, although Dave 
did most of the corporate marketing.

My idea was simply to express delays and latencies as a frequency, in Hz, so 
that "bigger numbers are better", rather than always in milliseconds, where 
"smaller numbers are better".  The advantage of Hz is that you can directly 
compare it to framerates of video or gameplay.

Conversely, an advantage of "rounds per minute" is that you don't need to deal 
with fractions or rounding for relatively modest and common levels of bloat, 
where latencies of 1-5 seconds are typical.
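
(The conversion is simply RPM = 60 / RTT-in-seconds: a 1-second round trip is 
60 RPM or 1 Hz, and a 5-second round trip is 12 RPM or 0.2 Hz.)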

I'm not overly concerned with taking credit for it, though.  It's a reasonably 
obvious idea to anyone who takes a genuine interest in this field, and other 
people did most of the hard work.

> Please let me know if you know of anybody else working in this space, of any 
> publications that might be in progress or if people might be interested in 
> another collaborator.

There are two distinct types of latency that RPM can be used to measure, and I 
have written a short Internet Draft describing the distinction:


https://www.ietf.org/archive/id/draft-morton-tsvwg-interflow-intraflow-delays-00.html

Briefly, "inter-flow delays" (or BFID) are what you measure with an independent 
latency-measuring flow, and "intra-flow delays" (or WFID) are what you measure 
by inserting latency probes into an existing flow (whether at the protocol 
level with HTTP2, or by extracting it from existing application activity).  The 
two typically differ when the path bottleneck has a flow-isolating queue, or 
when the application flow experiences loss and retransmission recovery.

I think both measures are important in different contexts.  An individual 
application may be concerned with its own intra-flow delay, as that determines 
how quickly it can respond to changes in network conditions or user intent.  
Network engineers should be concerned with inter-flow delays, as those 
determine what effect a bulk application load has on other, more 
latency-sensitive applications.  The two are also optimally controlled by 
different mechanisms - FQ versus AQM - which is why the combination of the two 
is so powerful.

Feel free to use material from the above with appropriate attribution.

 - Jonathan Morton
___
Bloat mailing list
Bloat@lists.bufferbloat.net
https://lists.bufferbloat.net/listinfo/bloat


Re: [Bloat] Bechtolschiem

2021-07-02 Thread Jonathan Morton
> On 2 Jul, 2021, at 7:59 pm, Stephen Hemminger  
> wrote:
> 
> In real world tests, TCP Cubic will consume any buffer it sees at a
> congested link. Maybe that is what they mean by capture effect.

First, I'll note that what they call "small buffer" corresponds to about a 
tenth of a millisecond at the port's link rate.  This would be ludicrously 
small at Internet scale, but is actually reasonable for datacentre conditions 
where RTTs are often in the microseconds.
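
(To put a number on that: at, say, 10Gbps, 0.1ms of buffering is 10^10 b/s * 
10^-4 s / 8 = 125KB, or roughly eighty full-size 1500-byte packets; scale 
linearly for other port rates.)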

Assuming the effect as described is real, it ultimately stems from a burst of 
traffic from a particular flow arriving at a queue that is *already* full.  
Such bursts are expected from ack-clocked flows coming out of 
application-limited mode (ie. on completion of a disk read), in slow-start, or 
recovering from earlier losses.  It is also possible for a heavily coalesced 
ack to abruptly open the receive and congestion windows and trigger a send 
burst.  These bursts occur much less in paced flows, because the object of 
pacing is to avoid bursts.

The queue is full because tail drop upon queue overflow is the only congestion 
signal provided by the switch, and ack-clocked capacity-seeking transports 
naturally keep the queue as full as they can - especially under high 
statistical multiplexing conditions where a single multiplicative decrease 
event does not greatly reduce the total traffic demand. CUBIC arguably spends 
more time with the queue very close to full than Reno does, due to the plateau 
designed into it, but at these very short RTTs I would not be surprised if 
CUBIC is equivalent to Reno in practice.

The solution is to keep some normally-unused space in the queue for bursts of 
traffic to use occasionally.  This is most naturally done using ECN applied by 
some AQM algorithm, or the AQM can pre-emptively and selectively drop packets 
in Not-ECT flows.  And because the AQM is more likely to mark or drop packets 
from flows that occupy more link time or queue capacity, it has a natural 
equalising effect between flows.

Applying ECN requires some Layer 3 awareness in the switch, which might not be 
practical.  A simple alternative is to drop packets instead.  Single packet 
losses are easily recovered from by retransmission after approximately one RTT. 
 There are also emerging techniques for applying congestion signals at Layer 2, 
which can be converted into ECN signals at some convenient point downstream.

However it is achieved, the point is that keeping the *standing* queue down to 
some fraction of the total queue depth reserves space for accommodating those 
bursts which are expected occasionally in normal traffic.  Because those bursts 
are not lost, the flows experiencing them are not disadvantaged and the 
so-called "capture effect" will not occur.

 - Jonathan Morton
___
Bloat mailing list
Bloat@lists.bufferbloat.net
https://lists.bufferbloat.net/listinfo/bloat


Re: [Bloat] Apple WWDC Talks on Latency/Bufferbloat

2021-06-11 Thread Jonathan Morton
> On 11 Jun, 2021, at 10:14 pm, Nathan Owens  wrote:
> 
> round-trips per minute

Wow, one of my suggestions finally got some traction.

 - Jonathan Morton
___
Bloat mailing list
Bloat@lists.bufferbloat.net
https://lists.bufferbloat.net/listinfo/bloat


Re: [Bloat] Fwd: Traffic shaping at 10~300mbps at a 10Gbps link

2021-06-07 Thread Jonathan Morton
> On 7 Jun, 2021, at 8:28 pm, Rich Brown  wrote:
> 
> Saw this on the lartc mailing list... For my own information, does anyone 
> have thoughts, esp. for this quote:
> 
> "... when the speed comes to about 4.5Gbps download (upload is about 
> 500mbps), chaos kicks in. CPU load goes sky high (all 24x2.4Ghz physical 
> cores above 90% - 48x2.4Ghz if count that virtualization is on)..."

This is probably the same phenomenon that limits most cheap CPE devices to 
about 100Mbps or 300Mbps with software shaping, just on a bigger scale due to 
running on fundamentally better hardware.

My best theory to date on the root cause of this phenomenon is a throughput 
bottleneck between the NIC and the system RAM via DMA, which happens to be 
bypassed by a hardware forwarding engine within the NIC (or in an external 
switch chip) when software shaping is disabled.  I note that 4.5Gbps is close 
to the capacity of a single PCIe v2 lane, so checking the topology of the NIC's 
attachment to the machine might help to confirm my theory.
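
(For reference: PCIe v2 signals at 5GT/s per lane with 8b/10b encoding, leaving 
4Gbps of usable bandwidth per lane before protocol overhead - which is why a 
single-lane attachment would top out in just this neighbourhood.)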

To avoid the problem, you'll either need to shape to a rate lower than the 
bottleneck capacity, or eliminate the unexpected bottleneck by implementing a 
faster connection to the NIC that can support wire-speed transfers.

 - Jonathan Morton
___
Bloat mailing list
Bloat@lists.bufferbloat.net
https://lists.bufferbloat.net/listinfo/bloat


Re: [Bloat] Educate colleges on tcp vs udp

2021-05-27 Thread Jonathan Morton
> On 27 May, 2021, at 10:42 am, Hal Murray  
> wrote:
> 
> I would back up.  You need to understand how networks work before discussing 
> TCP or UDP.
> 
> The internet is not like a phone system.  There are no connections within the 
> network and hence no reserved bandwidth and nothing like a busy signal to 
> tell 
> you that the network is full.  (There are host-host connections, but the 
> network doesn't know anything about them.)  Packets are delivered on a 
> best-efforts basis.  They may be dropped, delayed, mangled, or duplicated.

You're right - the distinction between Bell and ARPA networking is a crucial 
foundation topic.

A discussion of the basic 10base Ethernet PHY (and how that fundamentally 
differs from the 8kHz multiplex of a traditional telephone network) might be 
helpful, since the intended audience already understands things like 
modulation.  Once that is established, you can talk about how reliable stream 
transports are implemented on top of an ARPA-style network, using Ethernet as a 
concrete example.

There are a lot of gritty details about how IP and TCP work that can be glossed 
over for a fundamental understanding, and maybe filled in later.  Things like 
Diffserv, the URG pointer, option fields, and socket timeouts are not relevant 
topics.  There's no need to actually hide them from a header diagram, but just 
highlight the fields that are fundamental to getting a payload from A to B.

 - Jonathan Morton
___
Bloat mailing list
Bloat@lists.bufferbloat.net
https://lists.bufferbloat.net/listinfo/bloat


Re: [Bloat] AQM & Net Neutrality

2021-05-25 Thread Jonathan Morton
> On 24 May, 2021, at 10:18 pm, Stuart Cheshire via Bloat 
>  wrote:
> 
> When first class passengers board the plane first, all economy passengers 
> wait a little bit longer as a result.

Technically, they all get to the runway at the same time anyway; the 
first-class pax just get out of the terminal to sit in their airline seats 
waiting for longer, while the more congested cattle-class cabin sorts itself 
out.  If the latter process were optimised better, the first-class passengers 
might actually end up waiting less, and pretty much everyone would benefit 
accordingly.

Where first-class passengers *do* have an advantage is in priority lanes at 
check-in and security.  It means they can turn up at the airport later to catch 
the same flight, without fear of missing it and without having to spend 
unnecessary hours in duty-free hell.  They also get posher waiting lounges with 
"free" food.  It is that sort of atmosphere that Net Neutrality advocates 
object to in computer networking.

I believe NN advocates will respond positively to concrete signs of improvement 
in perceived consumer fairness and reduction of costs to consumers.  I also 
believe that implementing AQM well is a key enabler towards those improvements. 
 That is probably the right perspective for "selling" AQM to them.

 - Jonathan Morton
___
Bloat mailing list
Bloat@lists.bufferbloat.net
https://lists.bufferbloat.net/listinfo/bloat


Re: [Bloat] AQM & Net Neutrality

2021-05-24 Thread Jonathan Morton
>> Maybe the worries I have heard just points out the need for more 
>> education/awareness about what delay is and why things like AQM are not 
>> prioritization/QoS? I appreciate any thoughts.
> 
> I'm pleased to help with education in this area.  The short and simplistic 
> answer would be that AQM treats all traffic going through it equally; the 
> non-interactive traffic *also* sees a reduction in latency; though many 
> people won't viscerally notice this, they can observe it if they look 
> closely.  More importantly, it's not necessary for traffic to make any sort 
> of business or authentication arrangement in order to benefit from AQM, only 
> comply with existing, well-established specifications as they already do.

There is one more point I'd like to touch on up front.  Net Neutrality first 
became a concern with file-sharing "swarm" protocols, and then with 
video-on-demand services.  The common feature of these from a technical 
perspective, is high utilisation of throughput capacity, to the detriment of 
other users sharing the same back-end and head-end ISP infrastructure.

Implementing AF-AQM or FQ-AQM within the backhaul and head-end equipment, not 
to distinguish individual 5-tuple flows but merely traffic associated with 
different subscribers, would fairly share out back-end and head-end capacity 
between subscribers.  This would reduce the pressure on the ISP to implement 
policies and techniques that violate Net Neutrality and/or are otherwise 
unpopular with consumers, such as data caps.  This assumes (as I believe has 
been represented in some official forums) that these measures are due to 
technical needs rather than financial greed.

I'm aware of some reasonably fast equipment that already implements AF-AQM 
commercially.  My understanding is that similar functionality can also be added 
to many recent cable head-ends by a firmware upgrade.

 - Jonathan Morton
___
Bloat mailing list
Bloat@lists.bufferbloat.net
https://lists.bufferbloat.net/listinfo/bloat


Re: [Bloat] AQM & Net Neutrality

2021-05-24 Thread Jonathan Morton
FQ by itself will simply hold bulk traffic to its "fair share", and keep it 
out of the way of interactive traffic, without also reducing the delay to the 
bulk traffic flows.  I would suggest that if you implement FQ, you can also 
usually implement AQM on top with little difficulty.

Please do ask for further clarification if that would be helpful.

 - Jonathan Morton
___
Bloat mailing list
Bloat@lists.bufferbloat.net
https://lists.bufferbloat.net/listinfo/bloat


Re: [Bloat] Educate colleges on tcp vs udp

2021-05-23 Thread Jonathan Morton
> On 23 May, 2021, at 9:47 pm, Erik Auerswald  
> wrote:
> 
> As an additional point to consider when pondering whether to
> use TCP or UDP:
> 
> To mitigate that simple request-response protocols using UDP
> lend themselves to being abused for reflection and amplification…

I suspect such considerations are well beyond the level of education requested 
here.  I think what was being asked for was "how do these protocols work, and 
why do they work that way, in language suitable for people working in a 
different field", rather than "which one should I use for X application".

 - Jonathan Morton
___
Bloat mailing list
Bloat@lists.bufferbloat.net
https://lists.bufferbloat.net/listinfo/bloat


Re: [Bloat] Educate colleges on tcp vs udp

2021-05-23 Thread Jonathan Morton
> On 21 May, 2021, at 9:01 am, Taraldsen Erik  wrote:
> 
> I'm getting some traction with my colleagues in the Mobile department on 
> measurements to say something about user experience.  While they are 
> coming around to the idea, they have major gaps in tcp/udp/ip understanding.  
> I don't have the skill or will to try and educate them.
> 
> Is there good education out there - preferably in the form of a video - 
> which I can send to my co-workers?  The part of tcp using ack's is pure magic 
> to them.  They really struggle to grasp the concept.  With such a basic lack of 
> understanding it is hard to have a meaningful discussion about loss, latency 
> and buffering.
> 
> I don't mean to talk them down too much, they are really good with the radio 
> part of their job - but the transition into seeing tcp and radio together is 
> very hard on them.

I don't have a video link to hand, but let's tease out the major differences 
between these three protocols:

IP (in both v4 and v6 variants) is all about getting a package of data to a 
particular destination.  It works rather like a postal system.  The package has 
a sender's address and a recipient's address, and the routers take care of 
getting it to the latter.  Most packages get through, but for various reasons 
some packages can be lost, for example if the sorting office (queue) is full of 
traffic.  Some packages are very small (eg. a postcard), some very large (eg. a 
container load), and some in between.

UDP is an "unreliable datagram" protocol.  You package it up in an IP wrapper, 
send it, and *usually* it gets to the recipient.  It has an additional "office" 
address, as the postal system only gets the package to the right building.  If 
it doesn't arrive, you don't get any notification about that - which is why it 
is "unreliable".  Each package also stands on its own without any relationship 
to others, which is why it is a "datagram".  Most UDP packets are small to 
medium in size.

TCP is a "reliable stream" protocol.  You use it when you have a lot of data to 
send, which won't fit into a single datagram, or when you need to know whether 
your data arrived safely or not.  To do this, you use the biggest, 
container-sized packages the post office supports, and you number them in 
sequence so you know which ones come first.  The recipient and the post office 
both have regulations so you can't have too many of these huge packages in the 
system at once, and they reserve the right to discard the excess so they can 
function properly (this is "congestion control").  So you arrange for the 
recipient to send the containers back empty when they've been received (they 
collapse to a small size when empty), and then you know there's room in the 
system for it to be sent out full again, with a fresh sequence number (this is 
the "stream").  And if you notice that a particular container *didn't* come 
back in the expected sequence, you infer that it got lost somewhere and send a 
replacement for its contents (making the delivery "reliable").

In fact, the actual containers are not sent back, but an acknowledgement 
postcard basically saying "all containers up to XXX arrived safely, we have 
room for YYY more, and the post office told us to tell you to slow down the 
sending rate because they're getting overwhelmed."  Some of these postcards may 
themselves get lost in the system, but as long as some *do* get through, the 
sender knows all is well.

It's common to use TCP for transferring files or establishing a persistent 
command-and-control connection.  It's common to use UDP for simple 
request-response applications (where both the request and response are small) 
and where timeliness of delivery is far more important than reliability (eg. 
multiplayer games, voice/video calls).
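
If a code-level illustration is useful alongside the analogy, here is a minimal 
Python sketch of the two styles of delivery; the address and port are made up 
for the example, and nothing is listening there:

    import socket

    # UDP: fire off a single datagram; no delivery guarantee, no ordering.
    u = socket.socket(socket.AF_INET, socket.SOCK_DGRAM)
    u.sendto(b"postcard", ("192.0.2.1", 9))  # may silently never arrive

    # TCP: open a connection; the kernel numbers, acknowledges and
    # retransmits segments behind the scenes to present a reliable stream.
    t = socket.socket(socket.AF_INET, socket.SOCK_STREAM)
    t.connect(("192.0.2.1", 9))
    t.sendall(b"a large payload that may span many packets")
    t.close()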

 - Jonathan Morton
___
Bloat mailing list
Bloat@lists.bufferbloat.net
https://lists.bufferbloat.net/listinfo/bloat


Re: [Bloat] [EXTERNAL] Re: Terminology for Laypeople

2021-05-17 Thread Jonathan Morton
> On 13 May, 2021, at 12:10 am, Michael Richardson  wrote:
> 
> But, I'm looking for terminology that I can use with my mother-in-law.

Here's a slide I used a while ago, which seems to be relevant here:

[slide image omitted from the list archive]

The important thing about the term "quick" in this context is that throughput 
capacity can contribute to it in some circumstances, but is mostly irrelevant 
in others.  For small requests, throughput is irrelevant and quickness is a 
direct result of low latency.

For a grandmother-friendly analogy, consider what you'd do if you wanted milk 
for your breakfast cereal, but found the fridge was empty.  The ideal solution 
to this problem would be to walk down the road to the village shop and buy a 
bottle of milk, then walk back home.  That might take about ten minutes - 
reasonably "quick".  It might take twice that long if you have to wait for 
someone who wants to scratch off a dozen lottery tickets right at the counter 
while paying by cheque; it's politer for such people to step out of the way.

My village doesn't have a shop, so that's not an option.  But I've seen dairy 
tankers going along the main road, so I could consider flagging one of them 
down.  Most of them ignore the lunatic trying to do that, and the one that does 
(five hours later) decides to offload a thousand gallons of milk instead of the 
pint I actually wanted, to make it worth his while.  That made rather a mess of 
my kitchen and was quite expensive.  Dairy tankers are set up for "fast" 
transport of milk - high throughput, not optimised for latency.

The non-lunatic alternative would be to get on my bicycle and go to the 
supermarket in town.  That takes about two hours, there and back.  It takes me 
basically the same amount of time to fetch that one bottle of milk as it would 
to conduct a full shopping trip, and I can't reduce that time at all without 
upgrading to something faster than a bicycle, or moving house to somewhere 
closer to town.  That's latency for you.

 - Jonathan Morton___
Bloat mailing list
Bloat@lists.bufferbloat.net
https://lists.bufferbloat.net/listinfo/bloat


Re: [Bloat] Terminology for Laypeople

2021-05-16 Thread Jonathan Morton
> On 17 May, 2021, at 8:18 am, Simon Barber  wrote:
> 
> How’s that?

It's a wall of text full of technical jargon.  It seems to be technically 
correct, but probably not very useful for the intended context.

 - Jonathan Morton
___
Bloat mailing list
Bloat@lists.bufferbloat.net
https://lists.bufferbloat.net/listinfo/bloat


Re: [Bloat] Terminology for Laypeople

2021-05-16 Thread Jonathan Morton
> On 17 May, 2021, at 12:33 am, Jonathan Morton  wrote:
> 
> The delay is caused by the fact that the product already in the pipeline has 
> already been bought by the hardware store, and thus contractually the loggers 
> can't divert it to an individual customer like me.

The reason this part of the analogy is relevant (and why I set up the hardware 
store's representative buying the branches at the felling stage) is because in 
internet traffic I don't want just any old data packets, I need the ones that 
specifically relate to the connection I opened.

We could say for the sake of the analogy that the hardware store is buying all 
the pine and spruce, and the felling team is thus working only on those trees, 
but I want a birch tree to fuel my sauna (since it's in less demand, the price 
is lower).  That also makes it easier to identify my branches as they go 
through the pipeline.

 - Jonathan Morton
___
Bloat mailing list
Bloat@lists.bufferbloat.net
https://lists.bufferbloat.net/listinfo/bloat


Re: [Bloat] Terminology for Laypeople

2021-05-16 Thread Jonathan Morton
> On 16 May, 2021, at 11:44 pm, Michael Richardson  wrote:
> 
> Your analogy is definitely the result of optimizing for batches rather than 
> latency.

I really don't know how you got there from here.  What I described is basically 
a pipeline process, not batch processing.  The delay is caused by the fact that 
the product already in the pipeline has already been bought by the hardware 
store, and thus contractually the loggers can't divert it to an individual 
customer like me.

You can think of one bag of firewood as representing a packet of data.  I've 
requested a particular number of such bags to fill my trailer.  Until my 
trailer is full, my request is not satisfied.  The hardware store is just 
taking whatever manufacturing capacity is available; their warehouse is *huge*.

We can explore the analogy further by changing some of the conditions:

1: If the felling of trees was the bottleneck of the operation, such that the 
trimming, chopping and bagging could all keep up with it, then the delay to me 
would be much less because I wouldn't have to wait for various backlogs (of 
complete trees, branches, and piles of firewood) belonging to the hardware 
store to be dealt with first.  Processing each tree doesn't take very long, 
there's just an awful lot of them in this patch of forest.

1a: If the foreman told the felling team to take a tea break when a backlog 
built up, that would have nearly the same effect.  That's what an AQM does.

2: If the hardware store wasn't involved at all, the bags of firewood would be 
waiting, ready to be sold.  I'd be done in the time it took to load the bags 
into my trailer.

3: If the loggers sold the *output* of the process to the hardware store, 
rather than having them reserve it at the head of the pipeline, then I might 
only have to wait for the throughput of of the operation to produce what I 
needed, and load it directly into my trailer.  *That* would be just-in-time 
manufacturing.

 - Jonathan Morton
___
Bloat mailing list
Bloat@lists.bufferbloat.net
https://lists.bufferbloat.net/listinfo/bloat


Re: [Bloat] Terminology for Laypeople

2021-05-16 Thread Jonathan Morton
> On 16 May, 2021, at 9:48 pm, john  wrote:
> 
> After watching Dave's YouTube video, it seems to me the congestion of 
> the packets which Dave was explaining is equivalent to sushi plates on a 
> conveyor getting stuck en route and colliding with each other, since the 
> conveyor keeps bringing more packets one after another to the collision 
> point; plates then overflow from the conveyor and drop onto the floor.
> 
> So now, my question is: is the picture I described above close to what 
> bufferbloat is? Or am I still very far from understanding? If I am still 
> far from understanding, would you be able to explain it to me, a 
> layperson, using the sushi or donut conveyor? Is the problem the speed 
> adjustment of the conveyor? Or are too many plates or donuts placed on 
> the conveyor? If so, why can't the rate or speed of each factor be 
> adjusted? I even wonder if you could explain it using a door-to-door 
> package delivery service, since you are talking about delivering packets.

Here's an analogy for you:

Today there is a logging operation going on just up the road - not unusual in 
my part of the world.  They have a team felling trees, another team trimming 
off the branches, and the trunks are then stacked for later delivery to the 
sawmill (*much* later - they have to season first).  The branches are fed into 
a chopping machine which produces firewood and mulch, which is then weighed and 
bagged for immediate sale.

I need firewood for my sauna stove.  I know that if I load my trailer full of 
firewood, it'll last me about a year.  I figure I'll pay these guys a visit, 
and it shouldn't take more than half an hour of my time to get what I need.

Under normal circumstances, that would be true.  However, the hardware store in 
the town an hour away has also chosen today to replenish its stock of firewood, 
and they have a representative on site who's basically buying the branches from 
every tree as it comes down; every so often a big van turns up and collects the 
product.  He graciously lets me step in and buy the branches off one tree for 
my own use, and they're tagged as such by the loggers.

So instead of just loading ready-made bags of firewood into my trailer, I have 
to wait for the trimming team to get around to taking the branches off "my" 
tree which is waiting behind a dozen others.  The branches then go into a big 
stack of branches waiting for the chopping machine.  When they eventually get 
around to chopping those, the firewood is carefully put in a separate pile, 
waiting for the weighing and bagging.

It takes a full hour before I have the branches from "my" tree in a useful 
format for firing a sauna stove and in my trailer.  Which is now only half 
full.  To fill it completely, I have to go through the entire process again 
from the beginning - only the felling team has been going gangbusters and there 
are now *twenty* trees waiting for trimming.

I planned for half an hour.  It actually took me three hours to get my 
firewood.  Not for lack of throughput - that was one pretty effective logging 
operation - but because of the *queues*.

 - Jonathan Morton
___
Bloat mailing list
Bloat@lists.bufferbloat.net
https://lists.bufferbloat.net/listinfo/bloat


Re: [Bloat] Questions for Bufferbloat Wikipedia article

2021-04-06 Thread Jonathan Morton
> On 7 Apr, 2021, at 12:30 am, Sebastian Moeller  wrote:
> 
> I still think that it is not completely wrong to abstractly say BBR evaluates 
> RTT changes as function of the current sending rate to probe the bottlenecks 
> capacity (and adjust its sending rate based on that estimated capacity), but 
> that might either indicate I am looking at the whole thing at too abstract a 
> level, or, as I fear, that I am simply misunderstanding BBR's principle of 
> operation...

It might be more accurate to say that it estimates the delivery rate at the 
receiver by observing the ack stream, and aims to match that with the send 
rate.  There is some periodic probing upwards to see if a higher delivery rate 
is possible, followed by a downwards drain cycle which, I think, pays some 
attention to the observed RTT.  And there is also a cwnd mechanism overlaid as 
a safety valve.

Overall, it's very much a hybrid approach.

 - Jonathan Morton
___
Bloat mailing list
Bloat@lists.bufferbloat.net
https://lists.bufferbloat.net/listinfo/bloat


Re: [Bloat] how to ecn again on osx and ios!!!

2021-03-09 Thread Jonathan Morton
> On 9 Mar, 2021, at 10:38 pm, Dave Taht  wrote:
> 
> sudo sysctl -w net.inet.tcp.disable_tcp_heuristics=1

Now that might well be the missing link.  I think we missed it before since it 
doesn't have "ecn" in its name.

 - Jonathan Morton
___
Bloat mailing list
Bloat@lists.bufferbloat.net
https://lists.bufferbloat.net/listinfo/bloat


Re: [Bloat] HardenedBSD implementation of CAKE

2021-03-02 Thread Jonathan Morton
> On 2 Mar, 2021, at 2:59 am, Dave Taht  wrote:
> 
> My major doubting point about a port was the
> resolution of the kernel clock. Linux has a high-quality hi-res clock,
> BSDs didn't seem capable of scheduling on less than a 1ms tick at the
> time I last paid attention.

This is actually something Cake's shaper can already cope with.  I did some 
testing on an ancient PC that didn't have HPET hardware, so timer interrupts 
only had 1ms resolution even on Linux.  This merely results in small bursts of 
traffic at 1ms intervals, which collectively add up to the configured rate.
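
(At, say, 100Mbps the arithmetic is straightforward: 10^8 b/s * 10^-3 s / 8 = 
12.5KB per tick, or about eight full-size packets back-to-back - harmless at 
that rate.)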

 - Jonathan Morton
___
Bloat mailing list
Bloat@lists.bufferbloat.net
https://lists.bufferbloat.net/listinfo/bloat


Re: [Bloat] [Make-wifi-fast] [Cake] Fwd: [Galene] Dave on bufferbloat and jitter at 8pm CET Tuesday 23

2021-02-24 Thread Jonathan Morton
> On 24 Feb, 2021, at 5:19 pm, Taraldsen Erik  wrote:
> 
> Do you have a subscription with rate limitations?  The PGW (router which 
> enforces the limit) is a lot more latency friendly than if you are radio 
> limited.  So it may be beneficial to have a "slow" subscription rather than 
> "free speed" when it comes to latency.  Slow meaning a lower subscription rate 
> than the radio rate.

This is actually something I've noticed in Finland with DNA.  The provisioning 
shaper they use for the "poverty tariff" is quite well debloated (which was 
very much not the case some years ago).  However, there's no tariff at any 
convenient level between 1Mbps (poverty tariff) and 50Mbps (probably radio 
limited on a single carrier).

 - Jonathan Morton
___
Bloat mailing list
Bloat@lists.bufferbloat.net
https://lists.bufferbloat.net/listinfo/bloat


Re: [Bloat] uk and canada starlink beta

2021-01-23 Thread Jonathan Morton
> On 23 Jan, 2021, at 6:44 pm, Jonathan Foulkes  
> wrote:
> 
> Looking forward to this one. Any recommended settings for Cake on this 
> service?
> 
> Is target RTT of ‘Internet’ (100ms) still appropriate?
> Oceanic seems a bit high (300ms).

I would say so, since the inherent path latency is (reportedly) similar to a 
terrestrial path and much shorter than a geostationary bounce.

 - Jonathan Morton
___
Bloat mailing list
Bloat@lists.bufferbloat.net
https://lists.bufferbloat.net/listinfo/bloat


Re: [Bloat] UniFi Dream Machine Pro

2021-01-22 Thread Jonathan Morton
> On 22 Jan, 2021, at 11:09 pm, Toke Høiland-Jørgensen via Bloat 
>  wrote:
> 
> As Sebastian says, the source of lower performance when using SQM on
> some boxes is the traffic shaper, and sometimes the lack of hardware
> offloads.

I have a strong suspicion that on some hardware, the offload engine & switch is 
connected to the SoC through a link that is much slower than the Ethernet ports 
exposed to the outside.  As long as traffic stays within the engine, it can run 
at line rate, but engaging rich software measures requires stuffing it all 
through the narrower link.  This is independent of the CPU's capabilities and 
is purely an I/O bottleneck.

In this particular case, I believe the router portion of the Dream Machine is 
natively a Gigabit Ethernet device, for which good IPsec and SQM performance at 
800Mbps is reasonably expected.  The pair of 10G ports are part of the switch 
portion, and thus intended to support LAN rather than WAN traffic.  Think of it 
as equivalent to attaching a Raspberry Pi 4 (which has native GigE) to a switch 
with a pair of 10G "uplink" ports for daisy-chaining to other switches.

 - Jonathan Morton
___
Bloat mailing list
Bloat@lists.bufferbloat.net
https://lists.bufferbloat.net/listinfo/bloat


Re: [Bloat] Rebecca Drucker's talk sounds like it exposes an addressable bloat issue in Ciscos

2021-01-09 Thread Jonathan Morton
> On 10 Jan, 2021, at 7:39 am, Erik Auerswald  
> wrote:
> 
> In my experience, asking about token-bucket algorithm details is often
> a sign for the asker to not see the forest for the trees.

IMHO, token-bucket is an obsolete algorithm that should not be used.  Like RED, 
it requires tuning parameters whose correct values are not obvious to the 
typical end-user, nor even to automatic algorithms.  Codel replaces RED, and 
virtual-clock algorithms can similarly replace token-bucket.

Token-bucket is essentially a credit-mode algorithm.  The notional "bucket" is 
replenished at regular (frequent) intervals by an amount proportional to the 
configured rate of delivery.  Traffic may be delivered as long as there is 
sufficient credit in the bucket to cover it.  This inherently leads to the 
delivery of traffic bursts at line rate, rather than delivery rate, and the 
size of those bursts may be as large as the bucket.  Conversely, if the bucket 
is too small, then scheduling and other quantum effects may conspire to reduce 
achievable throughput.  Since the bucket size must be chosen, manually, in 
advance, it is almost always wrong (and usually much too large).

Many token-bucket implementations further complicate this by having two nested 
token-buckets.  A larger bucket is replenished at exactly the configured rate 
from an infinite source, while a smaller bucket is replenished at some higher 
rate from the larger bucket.  This reduces the incidence of line-rate bursts 
and accommodates Reno-like sawtooth behaviour, but as noted, has the potential 
to seriously confuse BBR if the buckets are too large.  BBRv2 may handle it 
better if you add ECN and AQM, as the latter will help to correct bad 
estimations of throughput capacity resulting from the buckets initially being 
drained.

The virtual-clock algorithm I implemented in Cake is essentially a deficit-mode 
algorithm.  During any continuous period of traffic delivery, defined as 
finding a packet in the queue when one is scheduled to deliver, the time of 
delivering the next packet is updated after every packet is delivered, by 
calculating the serialisation time of that packet and adding it to the previous 
delivery schedule.  As long as that time is in the past, the next packet may be 
delivered immediately.  When it goes into the future, the time to wait before 
delivering the next packet is precisely known.  Hence bursts occur only due to 
quantum effects and are automatically of the minimum size necessary to maintain 
throughput, without any configuration (explicit or otherwise).
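
For anyone who wants to see the contrast in miniature, here is an illustrative 
Python sketch of the two disciplines described above - not Cake's actual code, 
just the core bookkeeping:

    import time

    class TokenBucket:
        """Credit-mode: bursts up to 'bucket' bytes may go out at line rate."""
        def __init__(self, rate_bps, bucket_bytes):
            self.rate = rate_bps / 8.0    # bytes per second
            self.bucket = bucket_bytes
            self.tokens = bucket_bytes
            self.last = time.monotonic()

        def try_send(self, pkt_len):
            now = time.monotonic()
            self.tokens = min(self.bucket,
                              self.tokens + (now - self.last) * self.rate)
            self.last = now
            if self.tokens >= pkt_len:
                self.tokens -= pkt_len
                return True     # delivered immediately, possibly within a burst
            return False        # caller must wait, but for how long is unclear

    class VirtualClock:
        """Deficit-mode: the next delivery time is computed exactly."""
        def __init__(self, rate_bps):
            self.rate = rate_bps / 8.0
            self.next_time = time.monotonic()

        def send_delay(self, pkt_len):
            now = time.monotonic()
            if self.next_time < now:    # link went idle; restart the schedule
                self.next_time = now
            delay = self.next_time - now
            self.next_time += pkt_len / self.rate  # serialisation time
            return delay    # precisely how long to wait before sending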

Since the scenario here involves an OpenWRT device, you should be able to 
install Cake on it, if it isn't there already.  Please give it a try and let us 
know if it improves matters.

 - Jonathan Morton
___
Bloat mailing list
Bloat@lists.bufferbloat.net
https://lists.bufferbloat.net/listinfo/bloat


Re: [Bloat] [Make-wifi-fast] [bbr-dev] D* tcp looks pretty good, on paper

2021-01-08 Thread Jonathan Morton
> On 8 Jan, 2021, at 5:38 pm, Neal Cardwell via Make-wifi-fast 
>  wrote:
> 
> What did you have in mind by "variable links" here? (I did not see that term 
> in the paper.)

Wifi and LTE tend to vary their link characteristics a lot over time.

 - Jonathan Morton
___
Bloat mailing list
Bloat@lists.bufferbloat.net
https://lists.bufferbloat.net/listinfo/bloat


Re: [Bloat] [Cerowrt-devel] my thx to spacex (and kerbal space program) forcheering me up all year

2021-01-01 Thread Jonathan Morton
> On 2 Jan, 2021, at 1:31 am, David P. Reed  wrote:
> 
> Now, one wonders: why can't Starlink get it right first time?
> 
> It's not like bufferbloat is hard on a single bent pipe hop, which is all 
> Starlink does today.

The bloat doesn't seem to be in Starlink itself, but in the consumer-end modem. 
 This is fixable, just as soon as Starlink put their minds to it, because it's 
based on the same Atheros SoCs as the consumer CPE we're already familiar with.

 - Jonathan Morton
___
Bloat mailing list
Bloat@lists.bufferbloat.net
https://lists.bufferbloat.net/listinfo/bloat


Re: [Bloat] Good Wi-Fi test programs?

2020-12-06 Thread Jonathan Morton
> On 7 Dec, 2020, at 1:00 am, Rich Brown  wrote:
> 
> I would first do the following "easy tests":
> 
> - Check for conflicting/overlapping Wi-Fi channels. I am fond of the free 
> app, WiFi Analyzer from farproc (http://a.farproc.com/wifi-analyzer) for this 
> test, but there are several similar Android apps. 
> - Compare the signal strength for the DSL modem and the Calix modem, as shown 
> by WiFi Analyzer 
> - Be sure that all computer(s) are using the Calix modem.
> - Use a variety of speed tests: DSLReports, Fast.com, other favorites?
> - Compare speedtest results when the test computer is close to, or far from 
> the router.
> - (If possible) compare the performance for both Wi-Fi and Ethernet
> - Shut off the DSL modem on my way out the door to be sure it's not causing 
> interference or confusing the situation.
> 
> Anything else you'd recommend?

Make sure the customer's devices are using 5GHz rather than 2.4GHz band, where 
possible.  The Calix devices apparently support both and try to perform "band 
steering", but it's worth double checking.

https://www.calix.com/content/calix/en/site-prod/library-html/systems-products/prem/op/p-gw-op/eth-gw/800e-gc-spg/index.htm?toc.htm?76518.htm

I also read while briefly scanning the accessible documentation that Calix 
operates at maximum permitted wifi transmit power and with up to 80MHz RF 
bandwidth.  While this does maximise the range and throughput of an individual 
AP, many such APs in close proximity will see the RF channel as "occupied" by 
each others' transmissions more often than if a lower transmit power were used. 
 The result is that they all shout so much that they can't hear themselves 
think, and clients can't get a word in edgewise to send acks (with generally 
lower transmit power themselves).

You should look for evidence of this while analysing channel occupancy, 
especially in multi-occupancy buildings.  It's probably less of a concern in 
detached or semi-detached housing.

I didn't see any mention of Airtime Fairness technology, which is now a 
highlighted feature on some other manufacturers' products (specifically 
TP-Link).  Ask whether that is present or can be implemented.  You may be able 
to test for it, if you have established a case where wifi is clearly the 
bottleneck, by passing a saturating ECN Capable flow through it and looking for 
CE marks (and/or ECE feedback), since Airtime Fairness comes with built-in 
fq_codel.
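
As a rough recipe for that last test on a Linux client (the interface name is 
illustrative; the knobs themselves are standard): enable ECN negotiation, run 
a saturating download, and watch the wire for CE-marked IPv4 packets:

    sysctl -w net.ipv4.tcp_ecn=1            # negotiate ECN on outgoing TCP
    tcpdump -ni wlan0 '(ip[1] & 0x3) == 3'  # TOS byte with the CE codepoint set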

 - Jonathan Morton
___
Bloat mailing list
Bloat@lists.bufferbloat.net
https://lists.bufferbloat.net/listinfo/bloat


Re: [Bloat] starlink

2020-12-01 Thread Jonathan Morton
> On 1 Dec, 2020, at 3:20 pm, Toke Høiland-Jørgensen via Bloat 
>  wrote:
> 
>> Jim Gettys made a reddit post on r/Starlink asking for data from beta
>> testers. I am one of those testers. I spun up an Ubuntu VM and did three
>> runs of flent and rrul as depicted in the getting started page. You may
>> find the results here:
>> https://drive.google.com/file/d/1NIGPpCMrJgi8Pb27t9a9VbVOGzsKLE0K/view?usp=sharing
> 
> Thanks for sharing! That is some terrible bloat, though! :(

I imagine it exists in the uplink device rather than the Starlink network 
itself.  Distinct upload and download bloat tests would help in determining 
whether it's at your end or the remote end.  You should be able to use 
dslreports.com/speedtest to determine that.

 - Jonathan Morton
___
Bloat mailing list
Bloat@lists.bufferbloat.net
https://lists.bufferbloat.net/listinfo/bloat


Re: [Bloat] Adding CAKE "tc qdisc" options to NetworkManager

2020-10-05 Thread Jonathan Morton
> On 5 Oct, 2020, at 7:13 pm, David Collier-Brown  wrote:
> 
> By pure luck, I ended up chatting with one of the NetworkManager chaps, who 
> invited a merge request with the proper parameters for CAKE.
> 
> He wrote
> 
> Currently NM doesn't support configuring CAKE parameters. IOW, if you
> set "root cake bandwidth 100Mbit", you will see in the tc output that
> cake was set but with default parameters.
> 
> Yes, I think it will be useful to have CAKE support in NM, but I can't
> say when it will be implemented. Of course, patches are always
> welcome; if anybody is interested in contributing it, please have a
> look at the work that was done to support SFQ:
> 
> https://gitlab.freedesktop.org/NetworkManager/NetworkManager/-/commit/b22b4f9101b1cbfde49b65d9e2107e4ae0d817c0
> 
> Sounds like a good job for next weekend, can I get some reviewers for the 
> week after?

I could probably at least glance at it.  How easy is it to set this up in, say, 
Linux Mint?

 - Jonathan Morton
___
Bloat mailing list
Bloat@lists.bufferbloat.net
https://lists.bufferbloat.net/listinfo/bloat


Re: [Bloat] Dumb question time 1: using upload and download speeds from dslreports

2020-10-04 Thread Jonathan Morton
> On 4 Oct, 2020, at 6:25 pm, Dave Collier-Brown 
>  wrote:
> 
> When setting my laptop to explicitly use CAKE for an article- and
> recipe-writing effort, I blithely took the download speed and stuffed it
> into
> 
>tc qdisc replace dev enp0s25 root cake docsis ack-filter bandwidth
> 179mbit
> 
> When Iván Baldo kindly suggested I mention ingress, it suddenly struck
> me: I was using the downstream/ingress value for my upstream setting!
> 
> Should I not be using my upload speed, some 13mbit, not 179 ???

For ingress traffic (usually the download direction), you need to redirect the 
ingress traffic to an IFB device and attach an ingress-configured Cake instance 
there.  You would use "ingress" instead of "ack-filter" and your download 
bandwidth.

For egress traffic you should indeed use the upload speed.
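
Concretely, something along these lines should work - a sketch using the 
interface name and rates from your message, so double-check before relying on 
it:

    ip link add name ifb0 type ifb
    ip link set ifb0 up
    tc qdisc add dev enp0s25 handle ffff: ingress
    tc filter add dev enp0s25 parent ffff: protocol all matchall \
        action mirred egress redirect dev ifb0
    tc qdisc replace dev ifb0 root cake bandwidth 179mbit docsis ingress
    tc qdisc replace dev enp0s25 root cake bandwidth 13mbit docsis ack-filter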

 - Jonathan Morton

___
Bloat mailing list
Bloat@lists.bufferbloat.net
https://lists.bufferbloat.net/listinfo/bloat


Re: [Bloat] cake + ipv6

2020-09-23 Thread Jonathan Morton
> On 23 Sep, 2020, at 8:36 pm, Daniel Sterling  
> wrote:
> 
> I ran some updates on the xbox and watched iftop. I found that the
> xbox does the following:
> 
> * uses up to four http (TCP port 80) connections at once to download data
> * connects (seemingly randomly) to both ipv4 and ipv6 update hosts
> 
> That means at any given time, the xbox could be downloading solely via
> ipv4, solely via ipv6, or a with mix of the two.
> 
> I believe this means when it's using both v4 and v6, it's getting
> double its "share" of the bandwidth since cake can't tell that the v4
> and v6 traffic is coming from the same LAN host -- is that correct?

It fits my mental model, yes, though obviously the ideal would be to recognise 
that the xbox is a singular machine.  Are you seeing a larger disparity than 
that?  If so, is it even larger than four connections would justify without 
host-fairness?

> I'm using the default "triple-isolate" parameter. I can try switching
> to dual-src/dest host or even plain srchost / dsthost isolation. In
> theory that should limit traffic more per download host, even if cake
> can't determine the LAN host that's doing the downloading, right?

Triple-isolate is designed to function reasonably well when the user can't be 
sure which side of the network is the LAN!  The "dual" modes provide Cake with 
that information explicitly, so may be more reliable in corner cases.

For your topology, eth0 (LAN egress) should get dual-dsthost, and eth1 (WAN 
egress) should get dual-srchost.
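
As a sketch, with placeholder bandwidths that you would replace with your 
measured link rates:

    tc qdisc replace dev eth0 root cake bandwidth 100mbit dual-dsthost  # LAN
    tc qdisc replace dev eth1 root cake bandwidth 20mbit dual-srchost   # WAN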

 - Jonathan Morton

___
Bloat mailing list
Bloat@lists.bufferbloat.net
https://lists.bufferbloat.net/listinfo/bloat


Re: [Bloat] How about a topical LWN article on demonstrating the real-world goodness of CAKE?

2020-09-08 Thread Jonathan Morton
> On 8 Sep, 2020, at 7:48 pm, Matt Mathis via Bloat 
>  wrote:
> 
> To be simplistic, you might just talk about cake vs (bloated) drop tail.  To 
> be thorough, you also need to make the case that cake is better than other 
> AQMs.  This feels like too much for LWN, but silence on other solutions might 
> trigger skeptics.

Personally, my position is:

1: Bloated dumb FIFOs are terrible.

2: Basic AQM is good.  This can be as simple as TBF+WRED; it solves a large 
part of the basic problem by eliminating multi-second queue delays.  In some 
cases this can solve very serious problems, such as DNS lookups failing when 
the link is loaded, quite adequately.  Properly configured, you can keep queue 
delays below the 100ms threshold for reasonable VoIP performance.

3: FQ-AQM is better.  That generally means HTB+fq_codel, but other forms of 
this exist.  It means essentially zero added delay for non-saturating flows.  
It's an easy way to make DNS, VoIP and online gaming work nicely without having 
to restrict data-hungry applications (see the example after this list).

4: Cake offers some extra tools and aims to be easier (more intuitive) to 
configure.  Currently, it is the best solution for slow and medium-speed 
broadband (up to 100Mbps), and can also be used at higher speeds with some 
care, mostly regarding device performance.
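
For reference, a minimal HTB+fq_codel setup of the kind described in point 3 
looks something like this (interface and rate are placeholders):

    tc qdisc replace dev eth0 root handle 1: htb default 10
    tc class add dev eth0 parent 1: classid 1:10 htb rate 95mbit
    tc qdisc add dev eth0 parent 1:10 fq_codel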

 - Jonathan Morton

___
Bloat mailing list
Bloat@lists.bufferbloat.net
https://lists.bufferbloat.net/listinfo/bloat


Re: [Bloat] Other CAKE territory (was: CAKE in openwrt high CPU)

2020-09-03 Thread Jonathan Morton
> On 4 Sep, 2020, at 1:14 am, David Collier-Brown  wrote:
> 
> I'm wondering if edge servers with 1Gb NICs are inside the "CAKE stays 
> relevant" territory?  

Edge servers usually have strong enough CPUs and I/O - by which I mean anything 
from AMD K8 and Intel Core 2 onwards with PCIe attached NICs - to run Cake at 
1Gbps without needing special measures.  I should run a test to see how much I 
can shove through an AMD Bobcat these days - not exactly a speed demon.

We're usually seeing problems with the smaller-scale CPUs found in CPE SoCs, 
which are very much geared to take advantage of hardware accelerated packet 
forwarding.  I think in some cases there might actually be insufficient 
internal I/O bandwidth to get 1Gbps out of the NIC, into the CPU, and back out 
to the NIC again, only through the dedicated forwarding path.  That could 
manifest itself as a lot of kernel time spent waiting for the hardware, and can 
only really be solved by redesigning the hardware.

 - Jonathan Morton

___
Bloat mailing list
Bloat@lists.bufferbloat.net
https://lists.bufferbloat.net/listinfo/bloat


Re: [Bloat] CAKE in openwrt high CPU

2020-09-03 Thread Jonathan Morton
> On 3 Sep, 2020, at 5:32 pm, Toke Høiland-Jørgensen via Bloat 
>  wrote:
> 
> Yeah, offloading of some sort is another option, but I consider that
> outside of the "CAKE stays relevant" territory, since that will most
> likely involve an entirely programmable packet scheduler.

Offload of *just* shaping could be valuable in itself at higher rates, when 
combined with BQL, as it would avoid having to interact with the CPU-side timer 
infrastructure so much.  It would also not be difficult at all to implement in 
hardware at line rate, even with overhead compensation.  It's the sort of thing 
you could sensibly do with 74-series logic and a lookup table in a cheap SRAM, 
up to millions of PPS, and considerably faster in FPGA or ASIC territory.

I think that's what the questions about combining "unlimited Cake" with some 
other shaper are angling towards, though I suspect that the way Cake's shaper 
is integrated is still better than having an external one in software.

With that said, it's also possible that something a bit lighter than Cake might 
be appropriate at cable speeds.  There is background work in this general area 
going on, so don't despair.

 - Jonathan Morton
___
Bloat mailing list
Bloat@lists.bufferbloat.net
https://lists.bufferbloat.net/listinfo/bloat


Re: [Bloat] CAKE in openwrt high CPU

2020-09-01 Thread Jonathan Morton
> On 1 Sep, 2020, at 11:04 pm, Sebastian Moeller  wrote:
> 
>> The challenge are the end users, who only understand the silly ’speed’ 
>> metric, and feel anything that lowers that number is a ‘bad’ thing. It takes 
>> effort to get even technical users to get it.
> 
>   I repeatedly fall into that trap...

For a lot of users, I rather suspect that setting 40/10 Mbps would give them 
entirely sufficient speed, and most existing CPE would be able to keep up with 
those settings even with all of Cake's bells and whistles turned on.

The trouble is that that might be 10% of what the cable company is advertising 
to them.

 - Jonathan Morton
___
Bloat mailing list
Bloat@lists.bufferbloat.net
https://lists.bufferbloat.net/listinfo/bloat


Re: [Bloat] CAKE in openwrt high CPU

2020-09-01 Thread Jonathan Morton
> On 1 Sep, 2020, at 9:45 pm, Toke Høiland-Jørgensen via Bloat 
>  wrote:
> 
> CAKE takes the global qdisc lock.

Presumably this is a default mechanism because CAKE doesn't handle any locking 
itself.

Obviously it would need to be replaced with at least a lock over CAKE's 
complete data structures, taking the lock on each entry point and releasing it 
at each return point, and I assume there is a flag we can set to indicate we do 
so.  Finer-grained locking might be possible, but CAKE is fairly complex so 
that might be hard to implement.  Locking per CAKE instance would at least 
allow running ingress and egress on different CPUs.

Is there an example anywhere on how to do this?

 - Jonathan Morton
___
Bloat mailing list
Bloat@lists.bufferbloat.net
https://lists.bufferbloat.net/listinfo/bloat


Re: [Bloat] cake + ipv6

2020-08-17 Thread Jonathan Morton

On 18/08/2020 06:44, Daniel Sterling wrote:

> ...is it possible to identify (and thus classify)
> plain old bulk downloads, as separate from video streams? They're both
> going to use http / https (or possibly QUIC) -- and they're both
> likely to come from CDN networks... I can't think of a simple way to
> tell them apart.


If there was an easy way to do it, I would already have done so.  We are 
unfortunately hamstrung by some bad design and deployment around 
Diffserv, which might otherwise provide a useful end-to-end visible 
signal here.



> Is this enough of a problem that people would try to make a list of
> netblocks / prefixes that belong to video vs other CDN content?


It's possible that someone is doing this, but I don't specifically know 
of such a source of information.  It would of course be better to find a 
solution that didn't rely on white/black lists, which have a distressing 
habit of going stale.


But one of the more reliable ways might be to use Autonomous System (AS) 
information.  ASes are an organisational unit used for assigning IP 
address ranges and for routing, and usually correspond to a more-or-less 
significant Internet organisation.  It should be feasible to map an 
observed IP address to an AS, then look up the address blocks assigned 
to that AS, thereby capturing a whole range of related IP addresses.
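
One way to experiment with that mapping, assuming the service remains 
available, is Team Cymru's public IP-to-ASN lookup (the address here is from 
the documentation range):

    whois -h whois.cymru.com " -v 198.51.100.7"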



> I do notice video streams are much more bursty than plain downloads
> for me, but that may not hold for all users.
> 
> That is, for me at least, a video stream may average 5mbps over, say,
> 1 minute, but it will sit at 0mbps for a while and then burst at
> 20mbps for a bit.


Correct, YouTube at least likes to fetch a big block of data from disk 
and send it all at once, then rely on the client buffer to tide it over 
while the disk services other requests.  It makes some sense when you 
consider how slow disk seeks are relative to the number of clients they 
need to support, each of which will generally be watching a different 
video (or at least a different part of the same one).


However, this burstiness disappears on the wire just when you would like 
to use it to identify traffic, ie. when the video traffic saturates the 
bandwidth available to it.  If there's only just enough bandwidth, or 
even *less* than what is required, then YouTube sends data continuously 
into the client buffer, trying to keep it as full as possible.


There are no easy answers here.  But I've suggested some things to look 
for and try out.


 - Jonathan Morton
___
Bloat mailing list
Bloat@lists.bufferbloat.net
https://lists.bufferbloat.net/listinfo/bloat


Re: [Bloat] cake + ipv6

2020-08-17 Thread Jonathan Morton
On Tuesday, 18 August 2020, Daniel Sterling wrote:
> As you know, I'm here cuz I have an xbox and y'all created cake, which
> I am eternally grateful for, since it makes latency go away.
> 
> But I've recently hit an interesting issue --
> 
> Microsoft (and/or akamai, or whatever) has recently started pushing
> updates to the xbox via ipv6 instead of v4.
> 
> As I'm sure you know ipv6 addresses are essentially random on the
> internal LAN as compared to v4 -- a box can grab as many v6 addresses
> as it wants, and I don't believe my linux router can really know which
> box is using which address, can it?
> 
> Which means... ipv6 breaks cake's flow isolation.
> 
> Cake can't throttle all those xbox downloads correctly cuz it doesn't
> know they're all going to/from that one device.
> 
> So I suppose this may be similar to the "bittorrent" problem -- which,
> is there a general solution for that problem?
> 
> In my case the xbox grabs more than its share of bandwidth, which
> means other bulk streaming -- that is to say, youtube and netflix :)
> -- stops working well
> 
> I can think of one general solution -- run more wires to more devices,
> and give devices their own VLAN, and tag / prioritize / deprioritize
> specific traffic that way...
> 
> But.. are there better / more general solutions?

Does this traffic at least have some consistent means of identification, such 
as a port number or a remote address range?  If so, you could use fwmark rules 
and Cake's diffserv3 mode to put that traffic in the Bulk tin, same as with 
BitTorrent.
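
As a sketch of that approach - the matching rule and rates are purely 
illustrative, and it assumes the 1-based tin mapping of Cake's fwmark option 
(a mark of 1 selecting the first tin, which is Bulk under diffserv3):

# mark download traffic from an assumed CDN address range (ip6tables for the v6 case)
iptables -t mangle -A FORWARD -s 203.0.113.0/24 -j MARK --set-mark 1
# have Cake honour the mark as a tin override on the LAN-facing interface
tc qdisc replace dev $LAN root cake bandwidth 40Mbit diffserv3 fwmark 0xff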

I suppose it's also possible to make Cake sensitive to Layer 2 addresses (that 
is, the Ethernet address) for the purpose of host isolation.  That is presently 
not implemented, so it would take a while to filter through to deployed devices.
___
Bloat mailing list
Bloat@lists.bufferbloat.net
https://lists.bufferbloat.net/listinfo/bloat


Re: [Bloat] How about a topical LWN article on demonstrating the real-world goodness of CAKE?

2020-08-10 Thread Jonathan Morton
> The current best practice seems to be to instantiate cake/SQM on a reasonably 
> fixed rate wan link and select WiFi cards/socs that offer decent airtime 
> fairness.
> Works pretty well in practice...

Yes, AQL does essentially the right thing here, again along the lines of 
limiting the influence of one machine's load on another's performance, and 
completely automatically, since it has fairly direct information and control 
over the relevant hardware.  Cake is designed to deal with wired links, where 
the capacity doesn't change much but the true bottleneck is typically not at 
the device exerting control.

On that note, there is a common wrinkle whereby the bottleneck may shift 
between the private last mile link and some shared backhaul in the ISP at 
different times of day and/or days of week.  Locally I've seen it vary between 
20Mbps (small hours, weekday) and 1Mbps (weekend evening).  When Cake is 
configured for one case but the situation is different, the results are 
obviously suboptimal.  I'm actually now trying a different ISP to see if they 
do better in the evenings.

Evenroute's product includes automatic detection of and scheduling for this 
case, assuming that it follows a consistent pattern over a weekly period.  Once 
set up, it is essentially a cronjob adjusting Cake's parameters dynamically, so 
providing a manual setup for the general OpenWRT community should be feasible.  
On “tc qdisc change”, Cake usually doesn't drop any packets, so parameters can 
be changed frequently if you have a reason for it.
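
For instance, such a cronjob might look like this - times, rates and interface 
entirely assumed, and relying on "tc qdisc change" not resetting the qdisc:

# /etc/crontab: lower the shaper for the congested evening period
0 18 * * * root tc qdisc change dev eth0 root cake bandwidth 8Mbit
0 23 * * * root tc qdisc change dev eth0 root cake bandwidth 18Mbit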
___
Bloat mailing list
Bloat@lists.bufferbloat.net
https://lists.bufferbloat.net/listinfo/bloat


Re: [Bloat] How about a topical LWN article on demonstrating the real-world goodness of CAKE?

2020-08-09 Thread Jonathan Morton
> Are the risks and tradeoffs well enough understood (and visible enough 
> for troubleshooting) to recommend broader deployment?
> 
> I recently gave openwrt a try on some hardware that I ultimately 
> concluded was insufficient for the job.  Fairly soon after changing out 
> my access point, I started getting complaints of Wi-Fi dropping in my 
> household, especially when someone was trying to videoconference.  I 
> discovered that my AP was spontaneously rebooting, and the box was 
> getting hot.

Most CPE devices these days rely on hardware-accelerated packet forwarding to 
achieve their published specs.  That's all about taking packets in one side and 
pushing them out the other as quickly as possible, with only minimal support 
from the CPU (likely, new connections get a NAT/firewall lookup, that's all).  
It has the advantages of speed and power efficiency, but unfortunately it is 
also incompatible with our debloating efforts.  So debloated CPE will tend to 
run hotter and with lower peak throughput, which may be noticeable to cable and 
fibre users; VDSL (FTTC) users might have service of 80Mbps or less, where this 
effect is less likely to matter.

It sounds like that AP had a very marginal thermal design which caused the 
hardware to overheat as soon as the CPU was under significant load, which it 
can easily be when a shaper and AQM are running on it at high throughput.  The 
cure is to use better designed hardware, though you could also contemplate 
breaking the case open to cure the thermal problem directly.  There are some 
known reliable models which could be collected into a list.  As a rule of 
thumb, the ones based on ARM cores are likely to be designed with CPU 
performance more in mind than those with MIPS.

Cake has some features which can be used to support explicit classification and 
(de)prioritisation of traffic via firewall marking rules, either by rewriting 
the Diffserv field or by associating metadata with packets within the network 
stack (fwmark).  This can be very useful for pushing Bittorrent or WinUpdate 
swarm traffic out of the way.  But for most situations, the default 
flow-isolating behaviour already works pretty well, especially for ensuring 
that one computer's network load has only a bounded effect on any other.  We 
can discuss that in more detail if that would be helpful.
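
A minimal sketch of the Diffserv-rewriting variant, with an assumed BitTorrent 
port range purely for the sake of example; CS1 is placed in the Bulk tin by 
Cake's diffserv3 and diffserv4 modes:

iptables -t mangle -A FORWARD -p tcp --dport 6881:6889 -j DSCP --set-dscp-class CS1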
___
Bloat mailing list
Bloat@lists.bufferbloat.net
https://lists.bufferbloat.net/listinfo/bloat


Re: [Bloat] Phoronix: Linux 5.9 to allow FQ_PIE as default

2020-07-15 Thread Jonathan Morton
> On 16 Jul, 2020, at 12:58 am, Michael Yartys via Bloat 
>  wrote:
> 
> Are there any major differences between fq_codel and fq_pie in terms of their 
> performance?

I think some tests were run some time ago which showed significantly better 
behaviour by fq_codel than fq_pie.  In particular, the latter used only a 
single AQM instead of an independent one for each flow.  I'm not sure whether 
it's been changed since then.

The only advantage I can see for PIE over Codel is, possibly, a reduction in 
the system load imposed by the AQM.  But fq_codel is already pretty efficient, 
so that would be an edge case.

In any case, it is already possible to choose any qdisc you like (with default 
parameters) as the default qdisc.  I'm really not sure what the fuss is about.

> And how does the improved fq_codel called cobalt, which is used in cake, 
> stack up?

COBALT has some modifications to basic Codel which, I think, could profitably 
be backported into fq_codel.  It also has a particular extra mode, based on 
BLUE, for dealing with unresponsive traffic (which continues to build a queue even 
after lots of ECN signalling and/or Codel-scheduled packet drops).  It is the 
latter which inspired the name.

For the other major functional component of fq_codel, Cake also has a 
set-associative hash function for allocating flows into queues, which 
substantially reduces the probability of hash collisions in most cases.

 - Jonathan Morton
___
Bloat mailing list
Bloat@lists.bufferbloat.net
https://lists.bufferbloat.net/listinfo/bloat


Re: [Bloat] the future belongs to pacing

2020-07-05 Thread Jonathan Morton
> On 5 Jul, 2020, at 9:09 pm, Stephen Hemminger  
> wrote:
> 
> I keep wondering how BBR will respond to intermediaries that aggregate 
> packets.
> At higher speeds, won't packet trains happen and would it not get confused
> by this? Or is its measurement interval long enough that it doesn't matter.

Up-thread, there was mention of patches related to wifi.  Aggregation is 
precisely one of the things those patches would address.  I should note that the brief 
description I gave glossed over a lot of fine details of BBR's implementation, 
which include careful filtering and conditioning of the data it gathers about 
the network path.

I'm not altogether a fan of such complexity.

 - Jonathan Morton

___
Bloat mailing list
Bloat@lists.bufferbloat.net
https://lists.bufferbloat.net/listinfo/bloat


Re: [Bloat] the future belongs to pacing

2020-07-04 Thread Jonathan Morton
> On 4 Jul, 2020, at 8:52 pm, Daniel Sterling  wrote:
> 
> could someone explain this to a lay person or point to a doc talking
> about this more?
> 
> What does BBR do that's different from other algorithms? Why does it
> break the clock? Before BBR, was the clock the only way TCP did CC?

Put simply, BBR directly probes for the capacity and baseline latency of the 
path, and picks a send rate (implemented using pacing) and a failsafe cwnd to 
match that.  The bandwidth probe looks at the rate of returning acks, so in 
fact it's still using the ack-clock mechanism, it's just connected much less 
directly to the send rate than before.

Other TCPs can use pacing as well.  In that case the cwnd and RTT estimate are 
calculated in the normal way, and the send rate (for pacing) is calculated from 
those.  It prevents a sudden opening of the receive or congestion windows from 
causing a huge burst which would tend to swamp buffers.
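
On Linux, one common way to get pacing for ordinary TCPs is simply to install 
the fq qdisc, which honours the kernel's per-socket pacing rate (interface 
name assumed):

tc qdisc replace dev eth0 root fq

The rate itself is derived from cwnd and RTT, scaled by the sysctls 
net.ipv4.tcp_pacing_ss_ratio and net.ipv4.tcp_pacing_ca_ratio - by default 200 
and 120, i.e. 2x the nominal rate during slow-start and 1.2x in congestion 
avoidance.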

 - Jonathan Morton

___
Bloat mailing list
Bloat@lists.bufferbloat.net
https://lists.bufferbloat.net/listinfo/bloat


Re: [Bloat] FW: [Dewayne-Net] Ajit Pai caves to SpaceX but is still skeptical of Musk's latency claims

2020-06-13 Thread Jonathan Morton
> On 14 Jun, 2020, at 12:15 am, Michael Richardson  wrote:
> 
> They claim they will be able to play p2p first person shooters.
> I don't know if this means e2e games, or ones that middlebox everything into
> a server in a DC.  That's what I keep asking.

I think P2P implies that there is *not* a central server in the loop, at least 
not on the latency-critical path.

But that's not how PvP multiplayer games are typically architected these days, 
largely due to the need to carefully manage the "fog of war" to prevent 
cheating; each client is supposed to receive only the information it needs to 
accurately render a (predicted) view of the game world from that player's 
perspective.  So other players that are determined by the server to be "out of 
sight" cannot be rendered by x-ray type cheat mods, because the information 
about where they are is not available.  The central server has full information 
and performs the appropriate filtering before replicating game state to each 
player.

Furthermore, in a PvP game it's wise to hide information about other players' 
IP addresses, as that often leads to "griefing" tactics such as a DoS attack.  
If you can force an opposing player to experience lag at a crucial moment, you 
gain a big advantage over him.  And there are players who are perfectly happy 
to "grief" members of their own team; I could dig up some World of Tanks videos 
demonstrating that.

It might be more reasonable to implement a P2P communication strategy for a PvE 
game.  The central server is then only responsible for coordinating enemy 
movements.

 - Jonathan Morton

___
Bloat mailing list
Bloat@lists.bufferbloat.net
https://lists.bufferbloat.net/listinfo/bloat


Re: [Bloat] FW: [Dewayne-Net] Ajit Pai caves to SpaceX but is still skeptical of Musk's latency claims

2020-06-11 Thread Jonathan Morton
> On 11 Jun, 2020, at 7:03 pm, David P. Reed  wrote:
> 
> So, what do you think the latency (including bloat in the satellites) will 
> be? My guess is > 2000 msec, based on the experience with Apple on ATT 
> Wireless back when it was rolled out (at 10 am, in each of 5 cities I tested, 
> repeatedly with smokeping, for 24 hour periods, the ATT Wireless access 
> network experienced ping time grew to 2000 msec., and then to 4000 by mid day 
> - true lag-under-load, with absolutely zero lost packets!)
>  
> I get that SpaceX is predicting low latency by estimating physical distance 
> and perfect routing in their LEO constellation. Possibly it is feasible to 
> achieve this if there is zero load over a fixed path. But networks aren't 
> physical, though hardware designers seem to think they are.
>  
> Anyone know ANY reason to expect better from Musk's clown car parade?

Speaking strictly from a theoretical perspective, I don't see any reason why 
they shouldn't be able to offer latency that is "normally" below 100ms (to a 
regional PoP, not between two arbitrary points on the globe).  The satellites 
will be much closer to any given ground station than a GEO satellite, the 
latter typically adding 500ms to the path due mostly to physical distance.  All 
that is needed is to keep queue delays reasonably under control, and there's 
any number of AQMs that can help with that.  Clearly ATT Wireless did not 
perform any bufferbloat mitigation at all.

I have no insight or visibility into anything they're *actually* doing, though. 
 Can anyone dig up anything about that?

 - Jonathan Morton

___
Bloat mailing list
Bloat@lists.bufferbloat.net
https://lists.bufferbloat.net/listinfo/bloat


Re: [Bloat] What's a good non-intrusive way to look at bloat (and perhaps things like gout (:-))

2020-06-03 Thread Jonathan Morton
> On 4 Jun, 2020, at 1:21 am, Dave Collier-Brown 
>  wrote:
> 
> We've good tools to measure network performance under stress, by the simple 
> expedient of stressing it, but is there a good approach I could recommend to 
> my company to monitor a bunch of reasonably modern links,  without the 
> measurement significantly affecting their state?
> 
> I don't mind increasing bandwidth usage, but I'm downright grumpy about 
> adding to the service time: I have a transaction that times out for gross 
> slowness if it takes much more that an tenth of a second, and it involves a 
> scatter-gather interaction with at least 10 customers in that time.
> 
> I'm topically interested in bloat, but really we should understand 
> "everything" about our links. If they can get the bloats like cattle, they 
> can probably get the gout, like King Henry the Eighth (;-))
> 
> My platform is Centos 8, and I have lots of Smarter Colleagues to help.

My first advice would be to browse pollere.net for tools - like pping (passive 
ping), which monitors the latency of flows in transit.  That should give you 
some interesting information without adding any load at all.  There is also 
connmon (https://github.com/pollere/connmon).

 - Jonathan Morton
___
Bloat mailing list
Bloat@lists.bufferbloat.net
https://lists.bufferbloat.net/listinfo/bloat


Re: [Bloat] New speed/latency/jitter test site from Cloudflare

2020-06-03 Thread Jonathan Morton
> On 3 Jun, 2020, at 7:48 pm, Dave Taht  wrote:
> 
> I am of course, always interested in how they are measuring latency, and 
> where.

They don't seem to be adding more latency measurements once the download tests 
begin.  So in effect they are only measuring idle latency.

 - Jonathan Morton

___
Bloat mailing list
Bloat@lists.bufferbloat.net
https://lists.bufferbloat.net/listinfo/bloat


Re: [Bloat] CPU consumption using TC-TBF and TC-POLICE to limit rate

2020-05-26 Thread Jonathan Morton
> On 26 May, 2020, at 12:47 pm, Jose Blanquicet  wrote:
> 
> We have an embedded system with limited CPU resources that acts as
> gateway to provide Internet access from LTE to a private Wi-Fi
> network. Our problem is that the bandwidth on LTE and Wi-Fi links is
> higher than what the system is able to handle thus it reaches 100% of
> CPU load when we perform a simple speed test from a device connected
> to our Wi-Fi Hotspot.
> 
> Therefore, we want to limit the bandwidth to avoid system gets
> saturated is such use-case. To do so, we thought to use the QDISC-TBF
> on the Wi-Fi interface. For instance, to have 10Mbps:
> 
>tc qdisc add dev wlan0 root tbf rate 10mbit burst 12500b latency 50ms
> 
> It worked correctly and maximum rate was limited to 10Mbps. However,
> we noticed that the CPU load added by the TBF was not negligible for
> our system.

Just how limited is the CPU on this device?  I have successfully shaped at 
several tens of Mbps on a Pentium-MMX, where the limiting factor may have been 
the PCI bus rather than the CPU itself.

Assuming your CPU is of that order of capability, I would suggest installing 
Cake using the out-of-tree build process, and the latest stable version of the 
iproute2 tools to configure it.  Start with:

git clone https://github.com/dtaht/sch_cake.git
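
Building and installing would then be roughly as follows, assuming the usual 
targets in that repository's Makefile:

cd sch_cake
make
sudo make install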

This provides a more efficient and effective shaper than TBF, a more effective 
AQM than a policer, and good flow-isolation properties, all in a single bundle 
that will be more efficient than running two separate components.

Once installed, the following should set it up nicely for you:

tc qdisc replace dev wlan0 root cake bandwidth 10Mbit besteffort flows ack-filter

Cake is considered quite a heavyweight solution, but very effective.  If it 
doesn't work well for this particular use case, it may be feasible to backport 
some more recent work which takes a simpler approach, though along similar 
lines.

 - Jonathan Morton

___
Bloat mailing list
Bloat@lists.bufferbloat.net
https://lists.bufferbloat.net/listinfo/bloat


Re: [Bloat] Does it makes sense to shape traffic with 16Kbit/s up and 16Kbit/s down?

2020-05-04 Thread Jonathan Morton
> On 4 May, 2020, at 6:47 pm, Richard Fröhning  wrote:
> 
> I have a VPN provider which support lzo-compression. If I were to use
> VPN through the 16Kbps it could squeeze out some bytes.
> 
> I guess in that case I shape the tunX interface, right?
> 
> Would the MTU setting be on the usb0 device and/or the tunX?

You should set the qdisc and those options on the *physical* device, not the 
one that carries your uncompressed data.  Don't forget to set up ingress 
shaping as well as egress.
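
Concretely, and assuming the cellular modem appears as usb0, both settings 
would go on that device (Cake parameters as suggested elsewhere in this thread):

ip link set dev usb0 mtu 576
tc qdisc replace dev usb0 root cake bandwidth 16kbit besteffort flows satellite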

 - Jonathan Morton

___
Bloat mailing list
Bloat@lists.bufferbloat.net
https://lists.bufferbloat.net/listinfo/bloat


Re: [Bloat] Does it makes sense to shape traffic with 16Kbit/s up and 16Kbit/s down?

2020-05-04 Thread Jonathan Morton
> On 4 May, 2020, at 5:09 pm, Sebastian Moeller  wrote:
> 
> At 16Kbps a full-MTU sized packet will take around
> (1000 ms/sec * (1500 * 8) bits/packet ) / 16000 bits/sec = 750 ms
> 
> This is just to put things into perspective, 16Kbps is going to be both 
> painful and much better than no service at all

Reducing the MTU to 576 bytes is likely to help.  That was commonly done in the 
days of analogue modems, when such low speeds were normal.
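
By the same arithmetic, a 576-byte packet takes (576 * 8) bits / 16000 bits/sec 
= 288 ms to serialise - still painful, but far more workable for interactive 
traffic.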

I'm fortunate enough to live somewhere where the local ISPs don't limit your 
data transfer, even on the budget subscriptions.  Roughly €25 will buy you 
500Kbps mobile service for three months, and you can use that 500Kbps as much 
as you like.  And that is with the lowest population density in Europe, so the 
per capita cost of covering the country in cell towers is obviously no excuse.

 - Jonathan Morton
___
Bloat mailing list
Bloat@lists.bufferbloat.net
https://lists.bufferbloat.net/listinfo/bloat


Re: [Bloat] Does it makes sense to shape traffic with 16Kbit/s up and 16Kbit/s down?

2020-05-04 Thread Jonathan Morton
> On 4 May, 2020, at 3:26 pm, Richard Fröhning  wrote:
> 
> And if so, which queue discipline would work best with it?
> 
> Background: I am forced to use my cell phone as uplink and after I
> reach the monthly limit, bandwidth will be reduces to given up/downlink
> speeds.
> 
> I know Surfing websites with those speeds will take forever - however
> it should be enough to send/receive emails and/or use a messenger.

You should be able to do this with Cake.  Unlike most other qdiscs, it will 
automatically adjust several parameters to work nicely with low-speed links, 
because with the built-in shaper it has knowledge of the speed.  I don't think 
I've tested it as low as 16Kbit, but I have used it at 64Kbit.

To keep things simple, you may want to specify "besteffort flows satellite" as 
parameters.  Some of those settings may also be available in a GUI.

 - Jonathan Morton

___
Bloat mailing list
Bloat@lists.bufferbloat.net
https://lists.bufferbloat.net/listinfo/bloat


Re: [Bloat] [tsvwg] my backlogged comments on the ECT(1) interim call

2020-04-29 Thread Jonathan Morton
> On 29 Apr, 2020, at 12:25 pm, Luca Muscariello  wrote:
> 
> BTW, I hope I made the point about incentives to cheat, and the risks
> for unresponsive traffic for L4S when using ECT(1) as a trusted input.

One scenario that I think hasn't been highlighted yet, is the case of a 
transport which implements 1/p congestion control through CE, but marks itself 
as a "classic" transport.  We don't even have to imagine such a thing; it 
already exists as DCTCP, so it is trivial for a bad (or merely ignorant) actor to 
implement.

Such a flow would squeeze out other traffic that correctly responds to CE with 
MD, and would not be "caught" by queue protection logic designed to protect the 
latency of the LL queue (as that has no effect on traffic in the classic 
queue).  It would only be corralled by an AQM which can act to isolate the 
effects of one flow on others; in this case AF would suffice, but FQ would also 
work.

This hazard already exists today.  However, the L4S proposal "legitimises" the 
use of 1/p congestion control using CE, and the subtlety that marking such 
traffic with a specific classifier is required for effective congestion control 
is likely to be lost on people focused entirely on their own throughput, as 
much of the Internet still is.

Using ECT(1) as an output from the network avoids this new hazard, by making it 
clear that 1/p CC behaviour is only acceptable on signals that unambiguously 
originate from an AQM which expects and can handle it.  The SCE proposal also 
inserts AF or FQ protection at these nodes, which serves as a prophylactic 
against the likes of DCTCP being used inappropriately on the Internet.

 - Jonathan Morton
___
Bloat mailing list
Bloat@lists.bufferbloat.net
https://lists.bufferbloat.net/listinfo/bloat


Re: [Bloat] [tsvwg] my backlogged comments on the ECT(1) interim call

2020-04-28 Thread Jonathan Morton
> On 28 Apr, 2020, at 10:43 pm, Black, David  wrote:
> 
> And I also noted this at the end of the meeting:  “queue protection that 
> might apply the disincentive”
>  
> That would send cheaters to the L4S conventional queue along with all the 
> other queue-building traffic.

Alas, we have not yet seen an integrated implementation of the queue protection 
mechanism, so that we can test its effectiveness.  I think it is part of the 
extra evidence that would be needed before a decision could be taken in favour 
of using ECT(1) as an input.

I would also note in this context that mere volume of data, or length of 
development, is not in itself a mark in favour of a proposal.  The 
relevance, quality, thoroughness and results of data collection must be 
carefully evaluated, and it could easily be argued that a lengthy development 
cycle that still has not produced reliable results should be retired, to avoid 
throwing good money after bad.  The fact that we were able to find serious 
problems with the (only?) reference implementation of L4S using a relatively 
small, but independently selected test suite does not lend confidence in its 
maturity.

Reputable engineers know that it is necessary to establish a robust design 
first.  Only then can a robust implementation be hoped for.  It is the basic 
design decision, over the semantics of each ECN codepoint, that we were trying 
to discuss yesterday.  I'm not certain that everyone in the room understood 
that.

 - Jonathan Morton

___
Bloat mailing list
Bloat@lists.bufferbloat.net
https://lists.bufferbloat.net/listinfo/bloat


Re: [Bloat] dropbox, bbr and ecn packet capture

2020-04-25 Thread Jonathan Morton
> On 26 Apr, 2020, at 3:36 am, Dave Taht  wrote:
> 
> I just did a rather large dropbox download. They are well known to be
> using bbr and experimenting with bbrv2. So I fired off a capture
> during a big dropbox download...
> 
> It negotiated ecn, my fq_codel shaper and/or my newly ath10k
> fq_codel's wifi exerted CE, osx sent back ecn-echo, and the rtt
> results were lovely. However, there is possibly not a causal
> relationship here, and if anyone is bored and wants to scetrace,
> tcptrace or otherwise tear this cap apart, go for it.

Well, the CE response at their end is definitely not Multiplicative Decrease.  
I haven't dug into it more deeply than that.  But they're also not running 
AccECN, nor are they "proactively" sending CWR to get a "more accurate" CE 
feedback.  I suspect they're running BBRv1 in this one.

 - Jonathan Morton
___
Bloat mailing list
Bloat@lists.bufferbloat.net
https://lists.bufferbloat.net/listinfo/bloat


Re: [Bloat] this explains speedtest stuff

2020-04-25 Thread Jonathan Morton
> On 25 Apr, 2020, at 8:24 pm, Y via Bloat  wrote:
> 
> ECN on
> http://www.dslreports.com/speedtest/62823326
> ECN off
> http://www.dslreports.com/speedtest/62823112

Yup, that's what I mean.

> doesn't appear to have worked. retransmits are still high.

Ken, it might be that your version of fq_codel doesn't actually have ECN 
support on by default.  So try adding the "ecn" keyword to the qdisc.
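
For instance, if fq_codel is attached directly as the root qdisc (under 
sqm-scripts it normally hangs off an HTB class instead, so the parent handle 
would differ):

tc qdisc change dev eth0 root fq_codel ecn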

 - Jonathan Morton
___
Bloat mailing list
Bloat@lists.bufferbloat.net
https://lists.bufferbloat.net/listinfo/bloat


Re: [Bloat] this explains speedtest stuff

2020-04-25 Thread Jonathan Morton
> On 25 Apr, 2020, at 5:14 pm, Kenneth Porter  wrote:
> 
> I see "ecn" in the qdisc commands.

No, not the qdisc (where ECN is enabled by default), but on the client.

Linux:
# sysctl net.ipv4.tcp_ecn=1

Windows:
> netsh interface tcp set global ecncapability=enabled

OSX:
$ sudo sysctl -w net.inet.tcp.ecn_initiate_out=1  
$ sudo sysctl -w net.inet.tcp.ecn_negotiate_in=1

In Linux and OSX, to make the setting persist across reboots, edit 
/etc/sysctl.conf.

 - Jonathan Morton
___
Bloat mailing list
Bloat@lists.bufferbloat.net
https://lists.bufferbloat.net/listinfo/bloat


Re: [Bloat] this explains speedtest stuff

2020-04-25 Thread Jonathan Morton
> On 25 Apr, 2020, at 4:49 pm, Kenneth Porter  wrote:
> 
> before:
> 
> http://www.dslreports.com/speedtest/62767361
> 
> after:
> 
> http://www.dslreports.com/speedtest/62803997
> 
> Using simple.qos with:
> 
> UPLINK=45000
> DOWNLINK=42500
> 
> (The link is supposed to be 50 Mbps symmetric and speed test does show it 
> bursting that high sometimes.)

Looks like a definite improvement.  The Quality grade of C may indicate that 
you haven't enabled ECN on your client; without it, Codel has to drop packets 
to do congestion signalling.

 - Jonathan Morton
___
Bloat mailing list
Bloat@lists.bufferbloat.net
https://lists.bufferbloat.net/listinfo/bloat


Re: [Bloat] this explains speedtest stuff

2020-04-25 Thread Jonathan Morton
> On 25 Apr, 2020, at 4:16 am, Kenneth Porter  wrote:
> 
> Alas, CentOS 7 lacks cake. It does have fq_codel so I used the simple.qos 
> script from sqm-scripts, with uplink 5 and downlink 45000:
> 
> http://www.dslreports.com/speedtest/62797600

Those bandwidth settings are definitely too high; you don't have complete 
control of the queue here, and that's visible particularly with the steady 
increase in the upload latency during the test.  Try 44500 up, 42000 down, 
equivalent to my suggestions for Cake.

 - Jonathan Morton
___
Bloat mailing list
Bloat@lists.bufferbloat.net
https://lists.bufferbloat.net/listinfo/bloat


Re: [Bloat] this explains speedtest stuff

2020-04-24 Thread Jonathan Morton
> On 24 Apr, 2020, at 7:22 pm, Kenneth Porter  wrote:
> 
> My next project will be to enable cake on my CentOS 7 box that just got a new 
> 45 Mbps symmetric fiber connection from AT&T ("Business in a Box"). We 
> upgraded from 1.5Mbps/128kbps ADSL. Any hints on what settings to use?

Fibre probably uses Ethernet-style framing, or at least it does at the 
provisioning shaper.  So the following settings should probably work well:

# outbound
tc qdisc replace dev $WAN root cake bandwidth 44.5Mbit besteffort dual-srchost nonat ethernet ack-filter

# inbound
tc qdisc replace dev $IFB4WAN root cake bandwidth 42Mbit besteffort dual-dsthost nonat ethernet ingress

With, of course, the usual redirecting of $WAN ingress to $IFB4WAN.  The 
dual-src/dsthost settings should share things nicely between different users, 
including the server, even if one uses a lot more flows than another.
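
For reference, that redirection is typically set up along these lines (a 
sketch using the classic u32 catch-all match; names as above):

ip link add name $IFB4WAN type ifb
ip link set dev $IFB4WAN up
tc qdisc add dev $WAN handle ffff: ingress
tc filter add dev $WAN parent ffff: protocol all u32 match u32 0 0 \
	action mirred egress redirect dev $IFB4WAN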

 - Jonathan Morton

___
Bloat mailing list
Bloat@lists.bufferbloat.net
https://lists.bufferbloat.net/listinfo/bloat


Re: [Bloat] this explains speedtest stuff

2020-04-23 Thread Jonathan Morton
> On 24 Apr, 2020, at 3:44 am, Kenneth Porter  wrote:
> 
>> dslreports.com is only on the third page of the search results.
> 
> What does it mean that my bloat indicator is a grey dot?
> 
> <http://www.dslreports.com/speedtest/62741609>

It looks like there was a websockets error during the test, so try it again and 
it might work.

 - Jonathan Morton
___
Bloat mailing list
Bloat@lists.bufferbloat.net
https://lists.bufferbloat.net/listinfo/bloat


Re: [Bloat] Bufferbloat glossary

2020-03-29 Thread Jonathan Morton
> On 29 Mar, 2020, at 9:10 pm, Kenneth Porter  wrote:
> 
> For example, in today's message from David P. Reed I find "EDF" and "ACID".

Those aren't standard bufferbloat jargon, but come from elsewhere in computer 
science.  EDF is Earliest Deadline First (a scheduling policy normally applied 
in RTOSes - Realtime Operating Systems), and ACID is Atomicity, Consistency, 
Isolation, Durability (a set of properties typically desirable in a database).

I think the main distinction between online gaming and teleconferencing is the 
volume of data involved.  Games demand low latency, but also usually aren't 
throwing megabytes of data across the network at a time, just little bundles of 
game state updates telling the server what actions the player is taking, and 
telling the player's computer what enemies and other effects the player needs 
to be able to see.  Teleconferencing, by contrast, tends to involve multiple 
audio and video streams going everywhere.

 - Jonathan Morton

___
Bloat mailing list
Bloat@lists.bufferbloat.net
https://lists.bufferbloat.net/listinfo/bloat


Re: [Bloat] fcc's coronovirus guidelines

2020-03-28 Thread Jonathan Morton
> On 28 Mar, 2020, at 4:30 pm, Sebastian Moeller  wrote:
> 
> *) I wonder how well macos devices stack-up here, given that they default to 
> fq_codel (at least over wifi)?

That might help if the wifi link is the bottleneck, *and* if not too much 
buffering is done by the wifi hardware.  Otherwise the benefit will only be 
limited.  AQM and/or FQ has to be applied at the bottleneck; sometimes a 
bottleneck has to be artificially induced to implement that.

 - Jonathan Morton
___
Bloat mailing list
Bloat@lists.bufferbloat.net
https://lists.bufferbloat.net/listinfo/bloat


Re: [Bloat] pacing, applied differently than bbr

2020-02-26 Thread Jonathan Morton
> On 26 Feb, 2020, at 8:51 am, Taran Lynn  wrote:
> 
> As promised, here's the updated arXiv paper on applying model predictive
> control to TCP CC [1]. It contains more in depth information about the
> implementation, as well as some data from physical experiments.
> 
> [1] https://arxiv.org/abs/2002.09825

Hmmm.  I see some qualitative similarities to BBR behaviour, but the algorithm 
doesn't seem very robust, since it improves a lot when given approximate 
a-priori information via cap and collar settings.

How does it treat ECN information, or does it set itself Not-ECT?

 - Jonathan Morton
___
Bloat mailing list
Bloat@lists.bufferbloat.net
https://lists.bufferbloat.net/listinfo/bloat


Re: [Bloat] is extremely consistent low-latency for e.g. xbox possible on SoHo networks w/o manual configuration?

2020-02-12 Thread Jonathan Morton
> On 12 Feb, 2020, at 6:55 am, Daniel Sterling  
> wrote:
> 
> * first and foremost, to the exclusion of all other goals, consistent
> low-latency for non-bulk streams from particular endpoints; usually
> those streams are easily identified and differentiated from all other
> streams based on UDP/TCP port number,

This is the ideal situation for simply deploying Cake without any special 
effort.  Just tell it the capacity of the link it's controlling, minus a modest 
margin (say 1% upstream, 5% downstream).
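
For example, on a nominal 20Mbit up / 100Mbit down link (speeds assumed purely 
for illustration), that works out to about 19.8Mbit for egress and 95Mbit for 
ingress.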

You should be pleasantly surprised by the results.

 - Jonathan Morton
___
Bloat mailing list
Bloat@lists.bufferbloat.net
https://lists.bufferbloat.net/listinfo/bloat


Re: [Bloat] [Ecn-sane] 2019-12-31 docsis strict priority dual queue patent granted

2020-01-23 Thread Jonathan Morton
> On 24 Jan, 2020, at 7:37 am, Dave Taht  wrote:
> 
> "Otherwise, this exemplary embodiment enables system configuration to
> discard the low-priority packet tail, and transmit the high-priority
> packet instead, without waiting."

So this really *is* a "fast lane" enabling technology.  Just as we suspected.

 - Jonathan Morton
___
Bloat mailing list
Bloat@lists.bufferbloat.net
https://lists.bufferbloat.net/listinfo/bloat


Re: [Bloat] abc congestion control on time varying wireless links

2019-12-11 Thread Jonathan Morton
> On 11 Dec, 2019, at 9:54 pm, Dave Taht  wrote:
> 
> The DC folk want a multibit more immediate signal, for which L4S is
> kind of targetted, (and SCE also
> applies). I haven't seen any data on how well dctcp or SCE -style can
> work on wildly RTT varying links as yet, although it's been pitched at
> the LTE direction, not at wifi.

It turns out that a Codel marking strategy for SCE, with modified parameters of 
course, works well for tolerating bursty and aggregating links.  The RED-ramp 
and step-function strategies do not - and they're equally bad if the same test 
scenario is applied to DCTCP or TCP Prague.

The difference is not small; switching from RED to Codel improves goodput from 
1/8th to 80% of nominal link capacity, when a rough model of wifi 
characteristics is inserted into our usual Internet-path scenario.

We're currently exploring how best to set the extra set of Codel parameters 
involved.

 - Jonathan Morton

___
Bloat mailing list
Bloat@lists.bufferbloat.net
https://lists.bufferbloat.net/listinfo/bloat


Re: [Bloat] [Ecn-sane] sce materials from ietf

2019-12-01 Thread Jonathan Morton
> On 1 Dec, 2019, at 9:32 pm, Sebastian Moeller  wrote:
> 
>> Meanwhile, an ack filter that avoids dropping acks in which the reserved 
>> flag bits differ from its successor will not lose any information in the 
>> one-bit scheme.  This is what's implemented in Cake (except that not all the 
>> reserved bits are covered yet, only the one we use).
> 
> So, to show my lack of knowledge, basically a pure change in sequence number 
> is acceptable, any other differences should trigger ACK conservation instead 
> of filtering?

You are broadly correct, in that a pure advance of acked sequence number 
effectively obsoletes the earlier ack and it is therefore safe (and even 
arguably beneficial) to drop it.  However a *duplicate* ack should *not* be 
dropped, because that may be required to trigger Fast Retransmission in the 
absence of SACK.

Cake's ack filter is a bit more sophisticated than that, in that it can also 
accept certain harmless changes within TCP options.  I believe Timestamps and 
SACK get special handling along these lines; Timestamps can always change, SACK 
gets equivalent "pure superset" logic to detect when the old ack is completely 
covered by the new one.  Other options not specifically handled are treated as 
disqualifying.

All this only occurs in two consecutive packets which are both acks for the 
same connection and which are both waiting for a delivery opportunity in the 
queue.  An earlier ack is never delayed just to see if it can be combined with 
a later one.  The result is a better use of limited capacity to carry useful 
payloads, without having to rely on dropping acks by AQM action (which Codel is 
actually rather bad at).

 - Jonathan Morton
___
Bloat mailing list
Bloat@lists.bufferbloat.net
https://lists.bufferbloat.net/listinfo/bloat


Re: [Bloat] [Ecn-sane] sce materials from ietf

2019-12-01 Thread Jonathan Morton
> On 1 Dec, 2019, at 9:03 pm, Sebastian Moeller  wrote:
> 
>> If less feedback is observed by the sender than intended by the AQM, growth 
>> will continue and the AQM will increase its marking to compensate, 
>> ultimately resorting to a CE mark.  
> 
> Well, that seems undesirable?

As a safety valve, getting a CE mark is greatly preferable to losing congestion 
control entirely, or incurring a packet loss as the other alternative 
congestion signal.  It would only happen if the SCE signal or feedback were 
seriously disrupted or entirely erased - the latter being the *normal* state of 
affairs when either endpoint is not SCE aware in the first place.

> Am I right to assume that the fault tolerance requires a relative steady ACK 
> stream though?

It only needs to be sufficient to keep the TCP stream flowing.  If the acks are 
bursty, that's a separate problem in which it doesn't really matter if they're 
all present or not.  And technically, the one-bit feedback mechanism is capable 
of precisely reflecting a sparse sequence of SCE marks using just two acks per 
mark.

> I fully agree that if ACK thinning is performed it really should be careful 
> to not loose information when doing its job, but SCE hopefully can deal with 
> whatever is out in the field today (I am looking at you DOCSIS uplinks...), 
> no?

Right, that's the essence of the above discussion about relative feedback 
error, which is the sort of thing that random ack loss or unprincipled ack 
thinning is likely to introduce.

Meanwhile, an ack filter that avoids dropping acks in which the reserved flag 
bits differ from its successor will not lose any information in the one-bit 
scheme.  This is what's implemented in Cake (except that not all the reserved 
bits are covered yet, only the one we use).

 - Jonathan Morton

___
Bloat mailing list
Bloat@lists.bufferbloat.net
https://lists.bufferbloat.net/listinfo/bloat


Re: [Bloat] [Ecn-sane] sce materials from ietf

2019-12-01 Thread Jonathan Morton
> On 1 Dec, 2019, at 6:35 pm, Sebastian Moeller  wrote:
> 
> Belt and suspenders, eh? But realistically, the idea of using an accumulating 
> SCE counter to allow for a lossy reverse ACK path seems sort of okay (after 
> all TCP relies on the same, so there would be a nice symmetry ).

Sure, we did think of several schemes that used a counter.  But when it came 
down to actually implementing it, we decided to try the simplest possible 
solution first and see how well it worked in practice.  It turned out to work 
very well, and can recover cleanly from as much as 100% relative feedback error 
caused by ack loss:

If less feedback is observed by the sender than intended by the AQM, growth 
will continue and the AQM will increase its marking to compensate, ultimately 
resorting to a CE mark.  This is, incidentally, exactly what happens if the 
receiver *or* sender are completely SCE-ignorant, and looks very much like 
RFC-3168 behaviour, which is entirely intentional.

If feedback is systematically doubled by the time it reaches the sender, 
perhaps through faulty ack filtering on the return path, it will back off more 
than intended, the bottleneck queue will empty, and AQM feedback will 
consequently reduce or cease entirely.  Only a very serious fault would 
re-inject ESCE feedback once SCE marking has completely ceased, so the sender 
will then grow back towards the correct cwnd after a relatively small negative 
excursion.

The above represents both extremes of 100% relative error in the feedback, 
which is shown to be safe and reasonably tolerable.  Smaller errors due to 
random ack loss are more likely, and consequently easier to tolerate in a 
closed negative-feedback control loop.

 - Jonathan Morton
___
Bloat mailing list
Bloat@lists.bufferbloat.net
https://lists.bufferbloat.net/listinfo/bloat


Re: [Bloat] [Ecn-sane] sce materials from ietf

2019-11-30 Thread Jonathan Morton
> On 1 Dec, 2019, at 12:17 am, Carsten Bormann  wrote:
> 
>> There are unfortunate problems with introducing new TCP options, in that 
>> some overzealous firewalls block traffic which uses them.  This would be a 
>> deployment hazard for SCE, which merely using a spare header flag avoids.  
>> So instead we are still planning to use the spare bit - which happens to be 
>> one that AccECN also uses, but AccECN negotiates in such a way that SCE can 
>> safely use it even with an AccECN capable partner.
> 
> This got me curious:  Do you have any evidence that firewalls are friendlier 
> to new flags than to new options?

Mirja Kuhlewind said as much during the TCPM session we attended, and she ought 
to know.  There appear to have been several studies performed on this subject; 
reserved TCP flags tend to get ignored pretty well, but unknown TCP options 
tend to get either stripped or blocked.

This influenced the design of AccECN as well; in an early version it would have 
used only a TCP option and left the TCP flags alone.  When it was found that 
firewalls would often interfere with this, the three-bit field in the TCP flags 
area was cooked up.

 - Jonathan Morton

___
Bloat mailing list
Bloat@lists.bufferbloat.net
https://lists.bufferbloat.net/listinfo/bloat

