Re: [Bloat] "Very interesting L4S presentation from Nokia Bell Labs on tap for RIPE 88 in Krakow this week! "

2024-05-22 Thread Sebastian Moeller via Bloat
Hi Jason,

It is not just L4S; NQB and UDP options are similarly flawed process-wise... so 
this is not about me being in the rough.
It is rather that determination of consensus, however rough, seems to rest under 
the more or less sole power of the chairs (like in a court, but without a jury), 
and chairs are not bound to act as fair and impartial arbiters... and unlike in 
court there is no supposedly rigid set of rules by which to assess a chair's 
decision, let alone reliable methods to appeal one. Sure, the IETF lets yokels 
like me participate in the process, but no, we do not have any meaningful say. 
Because in the end rough consensus is what the chairs declare it to be... And 
this is where private strategy discussions with chairs become problematic.

Now, I understand why/how one ends up with a system like this, but that does 
not make it a great or desirable system IMHO.

On 23 May 2024 02:06:26 CEST, "Livingood, Jason"  
wrote:
>On 5/22/24, 09:11, "Sebastian Moeller"  wrote:
>>[SM] The solution is IMHO not to try to enforce rfc7282 
>
>[JL] ISTM that the things in 7282 are well reflected in how TSVWG operates. I 
>know from experience it can be hard when rough consensus doesn't go your way - 
>it happens. And at the end of the day there are always competing technical 
>solutions - and if L4S indeed does not scale up well and demonstrate 
>sufficient benefit (or demonstrate downside) then something else will win the 
>day. 

-- 
Sent from my Android device with K-9 Mail. Please excuse my brevity.


Re: [Bloat] "Very interesting L4S presentation from Nokia Bell Labs on tap for RIPE 88 in Krakow this week! "

2024-05-22 Thread Livingood, Jason via Bloat
On 5/22/24, 09:11, "Sebastian Moeller" <moell...@gmx.de> wrote:
>[SM] The solution is IMHO not to try to enforce rfc7282 

[JL] ISTM that the things in 7282 are well reflected in how TSVWG operates. I 
know from experience it can be hard when rough consensus doesn't go your way - 
it happens. And at the end of the day there are always competing technical 
solutions - and if L4S indeed does not scale up well and demonstrate sufficient 
benefit (or demonstrate downside) then something else will win the day. 




Re: [Bloat] A Transport Protocol's View of Starlink

2024-05-22 Thread Kenneth Porter via Bloat

The Register came out with this summary today:

https://www.theregister.com/2024/05/22/starlink_tcp_performance_evaluation/

Excerpt:

Using PING, he found "minimum latency changes regularly every 15 
seconds" and surmised "It appears that this change correlates to the 
Starlink user's terminal being assigned to a different satellite. That 
implies that the user equipment 'tracks' each satellite for a 
15-second interval, which corresponds to a tracking angle of 11 
degrees of arc."


During those handovers, Huston observed some packet loss – and a 
significant increase in latency. "The worst case in this data set is a 
shift from 30ms to 80ms," he wrote. Further: "Within each 15-second 
satellite tracking interval, the latency variation is relatively high. 
The average variation of jitter between successive RTT intervals is 
6.7ms. The latency spikes at handover impose an additional 30ms to 
50ms indicating the presence of deep buffers in the system to 
accommodate the transient issues associated with satellite handover."


Overall, Huston believes Starlink has "a very high jitter rate, a 
packet drop rate of around one percent to two percent that is 
unrelated to network congestion, and a latency profile that jumps 
regularly every 15 seconds."
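
For anyone who wants to look for the same signature on their own dish, a rough 
Python sketch along these lines should surface the 15-second minimum-latency 
steps (the target address and sample counts are arbitrary, and ping -i 0.1 may 
require root on Linux):

import re
import statistics
import subprocess

# Collect ~2 minutes of RTT samples at 10 Hz, then summarise each
# 15-second slot; a step change in the per-slot minimum at slot
# boundaries is the satellite-handover signature described above.
out = subprocess.run(
    ["ping", "-i", "0.1", "-c", "1200", "8.8.8.8"],
    capture_output=True, text=True, check=False,
).stdout
rtts = [float(m) for m in re.findall(r"time=([\d.]+)", out)]

window = 150  # 150 samples at 0.1 s spacing = one 15-second slot
for i in range(0, len(rtts) - window + 1, window):
    chunk = rtts[i:i + window]
    print(f"slot {i // window:2d}: min {min(chunk):6.1f} ms  "
          f"mean {statistics.mean(chunk):6.1f} ms  "
          f"jitter {statistics.pstdev(chunk):5.1f} ms")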


[No need to cc me in replies, I'll read them on the mailing list.]




Re: [Bloat] [EXTERNAL] "Very interesting L4S presentation from Nokia Bell Labs on tap for RIPE 88 in Krakow this week! "

2024-05-22 Thread Sebastian Moeller via Bloat
Hi Jason

let me apologise for the harsh tone. I should have phrased my point far more 
politely, but clearly failed to.
I am sure your testing matrix was large enough already and covered the 
conditions you considered most urgent for your use cases.
I understand that I am free to test whatever aspects of L4S interest me myself.

Regards
Sebastian


> On 22. May 2024, at 14:54, Sebastian Moeller  wrote:
> 
> Hi Jason
> 
>> On 22. May 2024, at 14:27, Livingood, Jason  
>> wrote:
>> 
>> [SM] Here is Pete's data showing that; the middle two bars show what happens 
>> when the bottleneck is not treating TCP Prague to the expected signalling... 
>> That is not really fit for use over the open internet...
>> 
>> [JL] That graph is not anything like what we’ve seen in lab or field 
>> testing. I suspect you may have made some bad assumptions in the simulation.
> 
> 
> So have you actually tested 1 TCP CUBIC flow versus 1 TCP Prague flow over a 
> FIFO bottleneck with 80ms minRTT? If so, I would appreciate it if you could 
> share that data.
> 
> My best guess is that you did not explicitly test that (*). I expect almost 
> all testing used short RTTs and likely the Low Latency DOCSIS scheduler/AQM 
> combination (essentially an implementation close to DualQ). But I am happy to 
> be wrong.
> 
> One of my complaints about the data presented in favor of L4S during the 
> ratification process was (and still is) that we got a multitude of very 
> similar tests, all around locations in parameter space that were known to 
> work, while the amount of even mildly adversarial testing was minuscule.
> 
> *) As Jonathan implied, the issue might mostly be TCP Prague's pedigree from 
> TCP Reno, as Reno and CUBIC compete similarly unequally at 80ms RTT. To which 
> I asked: who came up with the idea of basing TCP Prague on Reno in the first 
> place? Changing that now will essentially invalidate most previous L4S 
> testing. See above for why I do not believe this to be a terrible loss, but 
> purely procedurally I consider that not very impressive engineering. That 
> aside, if this explanation is correct, the only way for you not to have 
> encountered this during your tests is by not actually testing that condition. 
> But that in turn waters down the weight of the claim "not anything like what 
> we've seen in lab or field testing" considerably, no?



Re: [Bloat] A Transport Protocol's View of Starlink

2024-05-22 Thread Sebastian Moeller via Bloat


> On 22. May 2024, at 17:59, Stephen Hemminger via Bloat 
>  wrote:
> 
> On Wed, 22 May 2024 06:16:17 -0700
> Kenneth Porter via Bloat  wrote:
> 
>> This technical paper on Starlink by the chief scientist at APNIC crossed my 
>> feed this week. [I thought I'd share it to the Starlink list here but my 
>> application to join that list seems to have gotten stuck so I'll share it 
>> here for now.]
>> 
>> 
>> 
>> From the end of the paper:  
>> 
>>> While earlier TCP control protocols, such as Reno, have been observed to
>>> perform poorly on Starlink connections, more recent TCP counterparts,
>>> such as CUBIC, perform more efficiently. The major TCP feature that makes
>>> these protocols viable in Starlink contexts is the use of Selective
>>> Acknowledgement [11], that allows the TCP control algorithm to
>>> distinguish between isolated packet loss and loss-inducing levels of
>>> network congestion.
>>> 
>>> TCP control protocols that attempt to detect the onset of network queue
>>> formation can do so using end-to-end techniques by detecting changes in
>>> end-to-end latency during intermittent periods of burst, such as BBR.

[SM] Is that actually what BBR does? I believe BBR cyclically reduces its 
sending rate to measure the minRTT, but during its bandwidth probes (or 
"intermittent periods of burst") it actually measures the delivery rate via 
ACK feedback, not the increase in queueing delay?
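
To make explicit what I mean by rate-based, here is a toy sketch of BBR-style 
delivery-rate sampling (illustrative only, not BBR's actual code; the numbers 
at the end are made up):

from collections import deque

class DeliveryRateEstimator:
    """Toy BBR-style bandwidth sampling: a rate sample is delivered
    bytes divided by the time over which they were delivered, and the
    estimator keeps a windowed maximum. Queueing delay never enters."""
    def __init__(self, window_s=10.0):
        self.window_s = window_s
        self.samples = deque()  # (timestamp, rate in bits/s)

    def on_ack(self, now, delivered_bytes, interval_s):
        rate = 8 * delivered_bytes / interval_s
        self.samples.append((now, rate))
        while self.samples and now - self.samples[0][0] > self.window_s:
            self.samples.popleft()
        return max(r for _, r in self.samples)

est = DeliveryRateEstimator()
# Two ACK runs, each delivering 64 kB in 10 ms -> ~51 Mbit/s estimate,
# whether or not the measured RTT grew in the meantime.
print(est.on_ack(0.00, 64_000, 0.010))
print(est.on_ack(0.01, 64_000, 0.010))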


>>> These protocols need to operate with a careful implementation of their
>>> sensitivity to latency, as the highly unstable short-term latency seen on
>>> Starlink connections, coupled with the 15-second coarse level latency
>>> shifts have the potential to confuse the queue onset detection algorithm.
>>> 
>>> It would be interesting to observe the behaviour of an ECN-aware TCP
>>> protocol if ECN were to be enabled on Starlink routing devices.
>>> ECN has the potential to provide a clear signal to the endpoints about
>>> the onset of network-level queue formation, as distinct from latency
>>> variation.  
> 
> It frustrates me that all research still looks primarily at Reno, rather 
> than the congestion controls that are actually implemented in Linux and 
> Windows, which are used predominantly on the Internet.


Re: [Bloat] A Transport Protocol's View of Starlink

2024-05-22 Thread Stephen Hemminger via Bloat
On Wed, 22 May 2024 06:16:17 -0700
Kenneth Porter via Bloat  wrote:

> This technical paper on Starlink by the chief scientist at APNIC crossed my 
> feed this week. [I thought I'd share it to the Starlink list here but my 
> application to join that list seems to have gotten stuck so I'll share it 
> here for now.]
> 
> 
> 
> From the end of the paper:  
> 
> > While earlier TCP control protocols, such as Reno, have been observed to
> > perform poorly on Starlink connections, more recent TCP counterparts,
> > such as CUBIC, perform more efficiently. The major TCP feature that makes
> > these protocols viable in Starlink contexts is the use of Selective
> > Acknowledgement [11], that allows the TCP control algorithm to
> > distinguish between isolated packet loss and loss-inducing levels of
> > network congestion.
> >
> > TCP control protocols that attempt to detect the onset of network queue
> > formation can do so using end-to-end techniques by detecting changes in
> > end-to-end latency during intermittent periods of burst, such as BBR.
> > These protocols need to operate with a careful implementation of their
> > sensitivity to latency, as the highly unstable short-term latency seen on
> > Starlink connections, coupled with the 15-second coarse level latency
> > shifts have the potential to confuse the queue onset detection algorithm.
> >
> > It would be interesting to observe the behaviour of an ECN-aware TCP
> > protocol if ECN were to be enabled on Starlink routing devices.
> > ECN has the potential to provide a clear signal to the endpoints about
> > the onset of network-level queue formation, as distinct from latency
> > variation.  

It frustrates me that all research still looks primarily at Reno, rather
than the congestion controls that are actually implemented in Linux and Windows,
which are used predominantly on the Internet.


[Bloat] A Transport Protocol's View of Starlink

2024-05-22 Thread Kenneth Porter via Bloat
This technical paper on Starlink by the chief scientist at APNIC crossed my 
feed this week. [I thought I'd share it to the Starlink list here but my 
application to join that list seems to have gotten stuck so I'll share it 
here for now.]





From the end of the paper:



While earlier TCP control protocols, such as Reno, have been observed to
perform poorly on Starlink connections, more recent TCP counterparts,
such as CUBIC, perform more efficiently. The major TCP feature that makes
these protocols viable in Starlink contexts is the use of Selective
Acknowledgement [11], that allows the TCP control algorithm to
distinguish between isolated packet loss and loss-inducing levels of
network congestion.

TCP control protocols that attempt to detect the onset of network queue
formation can do so using end-to-end techniques by detecting changes in
end-to-end latency during intermittent periods of burst, such as BBR.
These protocols need to operate with a careful implementation of their
sensitivity to latency, as the highly unstable short-term latency seen on
Starlink connections, coupled with the 15-second coarse level latency
shifts have the potential to confuse the queue onset detection algorithm.

It would be interesting to observe the behaviour of an ECN-aware TCP
protocol if ECN were to be enabled on Starlink routing devices.
ECN has the potential to provide a clear signal to the endpoints about
the onset of network-level queue formation, as distinct from latency
variation.
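
As an aside, for anyone curious what their own Linux sender would bring to such 
a path, the sender-side features the paper leans on are readable from /proc. A 
minimal sketch (paths are Linux-specific; for tcp_ecn, 0 = off, 1 = request 
ECN, 2 = accept only if the peer requests it):

# Show SACK, the ECN negotiation policy, and the congestion control
# modules -- the TCP features discussed in the excerpt above.
for knob in ("tcp_sack", "tcp_ecn", "tcp_congestion_control",
             "tcp_available_congestion_control"):
    try:
        with open(f"/proc/sys/net/ipv4/{knob}") as f:
            print(f"{knob}: {f.read().strip()}")
    except FileNotFoundError:
        print(f"{knob}: not available on this kernel")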




Re: [Bloat] "Very interesting L4S presentation from Nokia Bell Labs on tap for RIPE 88 in Krakow this week! "

2024-05-22 Thread Sebastian Moeller via Bloat
Hi Jason,

> On 22. May 2024, at 14:48, Livingood, Jason  
> wrote:
> 
>> in the IETF the gap between the 'no politics' motto
> 
> There have always been politics at the IETF and in every other SDO, open 
> source project, etc. – it is human nature IMO.

[SM] I agree, but most other organisations openly accept that; it is only the 
IETF that claims to abhor politics. The IETF, however, publishes 
https://datatracker.ietf.org/doc/html/rfc7282 arguing against exactly the kind 
of horse-trading happening out in the open. The solution is IMHO not to try to 
enforce rfc7282 but to accept that politics is unavoidable and implement 
processes that take that into account. As is, the IETF rules allow chairs and 
ADs tremendous leeway without recourse or checks and balances.
BUT, I do admit that even with my limited experience with the IETF I have also 
seen WGs where the IETF process works really well, civil and productive, so not 
all is bad; but IMHO TSVWG demonstrates how easily that can derail or be 
derailed on purpose. Like when, for a humming event (cough, ECT(1) input or 
output, cough), dozens of members appear who seem never, before or after, to 
have given any attributable input on a draft...


>> And the fact that WG members see no harm in having private only strategy 
>> discussions with chairs and ADs.
> 
> In my personal experience at the IETF, when you are lead author or editor of 
> a working group document it is routine to strategize with WG chairs and even 
> ADs on how to keep the document moving forward, how to resolve conflict and 
> achieve consensus, and how to be well-prepared for meetings. That IMO is a 
> sign of WG chairs and ADs doing their job of developing standards on a timely 
> basis.

[SM] Chairs and ADs function as arbiters in the process (whether they like it 
or not), and I like my arbiters neutral and unbiased. What would be the harm in 
having the discussion about how to keep a document moving forward out in the 
open on the mailing list? Doing it in private is IMHO not a good look (even if, 
as I assume and hope, nothing untoward happens).
My impression is that "timely basis" has become far too important a factor in 
recent years; I prefer no RFC over sub-standard RFCs.



Regards
Sebastian


>   JL




Re: [Bloat] [EXTERNAL] "Very interesting L4S presentation from Nokia Bell Labs on tap for RIPE 88 in Krakow this week! "

2024-05-22 Thread Sebastian Moeller via Bloat
Hi Jason

> On 22. May 2024, at 14:27, Livingood, Jason  
> wrote:
> 
> [SM] Here is Pete's data showing that; the middle two bars show what happens 
> when the bottleneck is not treating TCP Prague to the expected signalling... 
> That is not really fit for use over the open internet...
> 
> [JL] That graph is not anything like what we’ve seen in lab or field testing. 
> I suspect you may have made some bad assumptions in the simulation.


So have you actually tested 1 TCP CUBIC flow versus 1 TCP Prague flow over a 
FIFO bottleneck with 80ms minRTT? If so, I would appreciate it if you could 
share that data.

My best guess is that you did not explicitly test that (*). I expect almost 
all testing used short RTTs and likely the Low Latency DOCSIS scheduler/AQM 
combination (essentially an implementation close to DualQ). But I am happy to 
be wrong.
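
For concreteness, the condition I mean can be approximated on a Linux 
middlebox roughly like this (a sketch only: the interface name and rate are 
placeholders, and tcp_prague on the end hosts needs an L4S-patched kernel):

import subprocess

def sh(cmd):
    # Echo and run a tc command; sketch only, no error handling.
    print("+", cmd)
    subprocess.run(cmd, shell=True, check=True)

IFACE = "eth1"  # placeholder: egress toward the receiver

# netem adds the 80 ms path delay; the child tbf is the FIFO bottleneck,
# with its byte limit sized to roughly one BDP (100 Mbit/s * 80 ms = 1 MB).
sh(f"tc qdisc add dev {IFACE} root handle 1:0 netem delay 80ms")
sh(f"tc qdisc add dev {IFACE} parent 1:1 handle 10: tbf rate 100mbit "
   f"burst 32k limit 1000k")

# On the senders: one long-running CUBIC flow and one TCP Prague flow
# (e.g. via sysctl net.ipv4.tcp_congestion_control), then compare goodput.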

One of my complaints about the data presented in favor of L4S during the 
ratification process was (and still is) that we got a multitude of very 
similar tests, all around locations in parameter space that were known to 
work, while the amount of even mildly adversarial testing was minuscule.

*) As Jonathan implied, the issue might mostly be TCP Prague's pedigree from 
TCP Reno, as Reno and CUBIC compete similarly unequally at 80ms RTT. To which 
I asked: who came up with the idea of basing TCP Prague on Reno in the first 
place? Changing that now will essentially invalidate most previous L4S 
testing. See above for why I do not believe this to be a terrible loss, but 
purely procedurally I consider that not very impressive engineering. That 
aside, if this explanation is correct, the only way for you not to have 
encountered this during your tests is by not actually testing that condition. 
But that in turn waters down the weight of the claim "not anything like what 
we've seen in lab or field testing" considerably, no?
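
To put rough numbers on that pedigree argument, one can plug the textbook 
response functions into a few lines of Python: the Mathis et al. approximation 
for Reno and the average-rate form of the CUBIC response function, with 
CUBIC's Reno-friendly region modelled as a simple max. This is strictly 
back-of-envelope: it assumes both flows see the same random loss probability, 
which a shared FIFO does not guarantee.

from math import sqrt

MSS = 1448          # payload bytes per segment (typical value, a placeholder)
C, BETA = 0.4, 0.7  # CUBIC constants per RFC 8312
RTT = 0.080         # the 80 ms case discussed above

def reno_bps(rtt, p):
    # Mathis et al.: rate ~ (MSS/RTT) * sqrt(3/(2p))
    return 8 * MSS / rtt * sqrt(1.5 / p)

def cubic_bps(rtt, p):
    # CUBIC average rate ~ MSS * (C*(3+b)/(4*(1-b)))**0.25 / (RTT**0.25 * p**0.75);
    # real CUBIC falls back to Reno-equivalent behaviour when that is faster.
    pure = 8 * MSS * (C * (3 + BETA) / (4 * (1 - BETA))) ** 0.25 \
           / (rtt ** 0.25 * p ** 0.75)
    return max(pure, reno_bps(rtt, p))

for p in (1e-3, 1e-4, 1e-5, 1e-6):
    r, c = reno_bps(RTT, p), cubic_bps(RTT, p)
    print(f"p={p:.0e}: Reno {r/1e6:6.1f} Mbit/s  CUBIC {c/1e6:6.1f} Mbit/s  "
          f"ratio {c/r:.1f}")

The exact ratios should not be taken too seriously; the point is only that the 
gap opens in the high-BDP direction (low loss, longer RTT), which is at least 
consistent with a Reno-pedigree flow losing out at 80ms, even if Prague's ECN 
response adds its own effects on top.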





Re: [Bloat] Fwd: "Very interesting L4S presentation from Nokia Bell Labs on tap for RIPE 88 in Krakow this week! "

2024-05-22 Thread Livingood, Jason via Bloat
> in the IETF the gap between the 'no politics' motto

There have always been politics at the IETF and in every other SDO, open source 
project, etc. – it is human nature IMO.

> And the fact that WG members see no harm in having private only strategy 
> discussions with chairs and ADs.

In my personal experience at the IETF, when you are lead author or editor of a 
working group document it is routine to strategize with WG chairs and even ADs 
on how to keep the document moving forward, how to resolve conflict and achieve 
consensus, and how to be well-prepared for meetings. That IMO is a sign of WG 
chairs and ADs doing their job of developing standards on a timely basis.


JL




Re: [Bloat] [EXTERNAL] Re: "Very interesting L4S presentation from Nokia Bell Labs on tap for RIPE 88 in Krakow this week! "

2024-05-22 Thread Livingood, Jason via Bloat
> I don't dispute that, at least insofar as the metrics you prefer for such 
> comparisons, under the network conditions you also prefer. But by omitting 
> the conventional AQM results from the performance charts, the comparison 
> presented to readers is not between L4S and the current state of the art, and 
> the expected benefit is therefore exaggerated in a misleading way.

[JL] That is good feedback for you to send to Nokia. But as I mentioned, all 
our comparisons in lab and field testing are of AQM vs L4S - so we have that 
covered (and lots of other test cases I won't cover here). 






Re: [Bloat] [EXTERNAL] Re: "Very interesting L4S presentation from Nokia Bell Labs on tap for RIPE 88 in Krakow this week! "

2024-05-22 Thread Livingood, Jason via Bloat
[SM] Here is Pete's data showing that; the middle two bars show what happens 
when the bottleneck is not treating TCP Prague to the expected signalling... 
That is not really fit for use over the open internet...

[JL] That graph is not anything like what we’ve seen in lab or field testing. I 
suspect you may have made some bad assumptions in the simulation.


Re: [Bloat] "Very interesting L4S presentation from Nokia Bell Labs on tap for RIPE 88 in Krakow this week! "

2024-05-22 Thread Jonathan Morton via Bloat
> On 21 May, 2024, at 8:32 pm, Sebastian Moeller  wrote:
> 
>> On 21. May 2024, at 19:13, Livingood, Jason via Bloat 
>>  wrote:
>> 
>> On 5/21/24, 12:19, "Bloat on behalf of Jonathan Morton via Bloat" wrote:
>> 
>>> Notice in particular that the only *performance* comparisons they make are 
>>> between L4S and no AQM at all, not between L4S and conventional AQM - even 
>>> though they now mention that the latter *exists*.
>> 
>> I cannot speak to the Nokia deck. But in our field trials we have certainly 
>> compared single queue AQM to L4S, and L4S flows perform better.

I don't dispute that, at least insofar as the metrics you prefer for such 
comparisons, under the network conditions you also prefer.  But by omitting the 
conventional AQM results from the performance charts, the comparison presented 
to readers is not between L4S and the current state of the art, and the 
expected benefit is therefore exaggerated in a misleading way.

An unbiased presentation would alert readers to the fact that merely deploying 
a conventional AQM would already eliminate nearly all of the queue-related 
delay associated with a dumb FIFO, without sacrificing much if any goodput.  By 
doing this, they would also not expose themselves to the risks associated with 
deploying L4S (see below).

>>> There's also no mention whatsoever of what happens when L4S traffic meets a 
>>> conventional AQM.
>> 
>> We also tested this and all is well; the performance of classic queue with 
>> AQM is fine.
> 
> [SM] I think you are thinking of a different case than Jonathan: not classic 
> traffic in the C-queue, but L4S traffic (ECT(1)) that by chance is not hitting 
> a bottleneck employing DualQ but a traditional FIFO...
> This is the case where at least TCP Prague just folds, gives up and goes 
> home...
> 
> Here is Pete's data showing that; the middle two bars show what happens when 
> the bottleneck is not treating TCP Prague to the expected signalling...

This isn't even the case I was thinking of.  Neither "classic" traffic in the C 
queue (a situation which L4S has always been designed to accommodate, however 
much we might debate the effectiveness of the design), nor L4S traffic in a 
dumb FIFO (which, though it performs badly, is at least "safe"), but L4S 
traffic in a "classic" RFC-3168 AQM, of the type which is already deployed to 
some extent.  This is what exposes the fundamental incompatibility between L4S 
and conventional traffic, as I have been saying from practically the moment I 
heard about L4S.

It's unfortunate that this case is not covered in the chart that Sebastian 
linked.  The situation arose because that particular chart is focused on a 
performance concern, not a safety concern which was treated elsewhere in the 
report.  What it would show, if a fourth qdisc such as "codel" were included 
(with ECN turned on), is a similar magnitude of throughput bias as in the 
"pfifo" qdisc, but in the opposite direction.  Note that the bias in the 
"pfifo" case arises solely because Prague does not *scale up* to high BDPs in 
the way that CUBIC does.
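
For completeness, that fourth case is a one-line change on the kind of testbed 
discussed earlier in the thread. A sketch (the interface name is a 
placeholder; "ecn" is CoDel's standard marking option in tc):

import subprocess

IFACE = "eth1"  # placeholder: the bottleneck egress from the FIFO tests

# Swap the dumb FIFO for a classic RFC 3168 AQM that CE-marks instead of
# dropping. A flow that treats CE as a shallow, DCTCP-style signal rather
# than as a multiplicative-decrease signal should then take far more than
# its share against CUBIC -- the opposite bias to the "pfifo" case.
subprocess.run(
    ["tc", "qdisc", "replace", "dev", IFACE, "root", "codel", "ecn"],
    check=True,
)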

 - Jonathan Morton