Re: [Bloat] Congestion control with FQ-Codel/Cake with Multicast?

2024-05-23 Thread Holland, Jake via Bloat
Hi Linus,

One correction on that last message:

After sending and re-reading, I belatedly realized it's completely viable to
snoop SDP over SAP if you wanted to.  So I guess it's not a non-starter if
you don't need to handle multicast traffic beyond the streams that can
announce their SDP on the SAP channels you can watch.

I'm not sure why it wouldn't be better to make a separate service that
coordinates the channels in use since the SAP channels will probably have
to do that anyway, but on reflection I think snooping SDP is probably
more viable than I gave it credit for.  My caution there would be that in
a network where multicast can get delivered and people are using it, I
would think someone might start doing SDP outside the SAP channels at
the app level, or maybe doing other kinds of multicast traffic.  So I'd
imagine it can only lead to a partial solution, but it could be useful.

Apologies for my confusion.

Best regards,
Jake




Re: [Bloat] Congestion control with FQ-Codel/Cake with Multicast?

2024-05-23 Thread Holland, Jake via Bloat
Hi Linus,

I did do some multicast tests with an OpenWRT build that was
recent at the time, around 2019-2020 IIRC.

I did not look at any FQ/CAKE while doing it.  I wouldn't call
FQ a viable, fair congestion control for this, though I do expect
it would isolate the damage to flows that share the queue with
the multicast stream, which at least helps prevent congestion
from breaking the network.

However, I'd describe that behavior as more like FQ providing
damage control against uncontrolled flows.  While it would be a
helpful mitigation, it's not a solution.

The work I did on the problem was mostly captured in the CBACC
draft (expired since the project was killed and I haven't managed
to take it up in my free time, but I still think it's a promising
approach):
https://www.ietf.org/archive/id/draft-ietf-mboned-cbacc-04.html

You probably need to do the congestion control in the app (see
https://www.rfc-editor.org/rfc/rfc8085.html#section-4.1 for the
general approaches that make sense, and CBACC lists several
specific ones in the references), but CBACC tries to provide a way
you can maintain network safety in the presence of misbehaving
apps that don't do it well (it would also disincentivize apps from
trying to cheat and causing harm to the network, which maybe
also should have been mentioned but I don't think was).

Also worth noting is that on the default OpenWRT install I saw at
the time, it did a layer 2 conversion to unicast for multicast packets
(tho I believe it can be turned off if you are talking something like
a stadium where you have a lot of users that really should be
getting the packets as layer 2 broadcast, as opposed to a home or
something with just a few users sharing the same wifi and watching
the same content).

There's a link in https://www.rfc-editor.org/rfc/rfc9119.html#section-4.6.2
about that, but basically at the Ethernet and WiFi layer, it uses IGMP/MLD
snooping to make the multicast IP packets go to one or more unicast MAC
addresses of the receivers that are joined, so it's actually using unicast as
far as WiFi or Ethernet is concerned, while still being multicast at the IP
layer (everything upstream of the router can still share the bandwidth
utilization, and the app still joins the multicast channel).
(PS: That whole RFC 9119 might be worth a read, for what you're doing.)
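
In case it helps make the L2 mapping concrete, here's a rough Python sketch
(my own illustration, not from the RFC; the 33:33 mapping comes from RFC 2464
section 7, and the group address is just an example):

    import ipaddress

    def ipv6_mcast_to_mac(group: str) -> str:
        # Without the unicast conversion, an IPv6 multicast packet goes out
        # with a multicast MAC: 33:33 + the low 32 bits of the group address.
        low32 = ipaddress.IPv6Address(group).packed[-4:]
        return "33:33:" + ":".join(f"{b:02x}" for b in low32)

    # With IGMP/MLD-snooping-based conversion, the AP instead sends one copy
    # per joined receiver, addressed to that receiver's unicast MAC, so the
    # frame is unicast at L2 while staying multicast at the IP layer.
    print(ipv6_mcast_to_mac("ff3e::8000:1"))  # -> 33:33:80:00:00:01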

I agree with your conclusion that FQ systems would see different
streams as separate queues and each one could be independently
overloaded, which is among the reasons I don't think FQ can be
viewed as a solution here (though as a mitigation for the damage
I'd expect it's a good thing to have in place).

To your 2nd question, I don't see snooping SDP as a viable path
for some kind of in-network congestion control either, I think
it'll have to be explicitly exposed in general (hence the CBACC+
DORMS approach proposed).

Anything you want to deploy at any scale is going to have to have
the SDP encrypted, I expect, so I would consider SDP-snooping a
non-starter for something like that.  In an enterprise or a closed
network, I guess it could maybe work, but there you can also just
control the streams by their IPs so you still probably don't need
SDP snooping (and a lot of enterprises wouldn't want unencrypted
SDP either--at least 80% of the security people I've met will veto it,
and at least 60% of those will have good reasons at top of mind).

I love RaptorQ and it has performed amazingly well for me, so it sounds
to me like you're on the best known path for loss recovery, but I don't
know how best to put it into SDP; I never did much work on that part.
I guess my general advice is to decide what level of
loss recovery you want to provide by default and just send FEC for
that to everyone (like at 1% or so), and maybe a separate channel
that can support another higher threshold (maybe like another 3-5%)
for networks that are persistently noisy or something, and anything
higher provide via an HTTP endpoint if individuals need more than
that once in a while.  And make those tunable without restarting
the stream.  And write up your findings :)
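
If it helps, here's the kind of back-of-envelope sizing I mean, as a rough
sketch (my own arithmetic; the block size and margin are made-up parameters,
and it assumes a fountain-style code like RaptorQ where repair overhead
roughly tracks the loss rate you want to survive):

    import math

    def repair_symbols(source_symbols: int, target_loss: float,
                       margin: float = 0.005) -> int:
        # Repair symbols per block to survive roughly target_loss random
        # loss, with a small safety margin on top.
        return math.ceil(source_symbols * (target_loss + margin))

    block = 1000  # source symbols per FEC block (assumed)
    print(repair_symbols(block, 0.01))  # default always-on channel, ~1% target
    print(repair_symbols(block, 0.05))  # opt-in extra channel, ~5% target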

Best of luck and I hope that's helpful.

-Jake

On 5/21/24, 7:56 AM, "Bloat on behalf of Linus Lüssing via Bloat"
<bloat-boun...@lists.bufferbloat.net on behalf of
bloat@lists.bufferbloat.net> wrote:


Hi,


In the past, flooding a network with multicast packets
was usually the easiest way to jam a (wifi) network,
as IPv6/UDP multicast in contrast to TCP has no native
congestion control. And broadcast/multicast packets on Wifi
would have a linear instead of exponential backoff time compared
to unicast for CSMA-CA, as far as I know.


I was wondering, have there been any tests with multicast on a
recent OpenWrt with FQ-Codel or Cake?  Do these queueing mechanisms
work as a viable, fair congestion control option for multicast,
too?  Has anyone looked at / tested this?


Second question: I'm sending an IPv6 multicast
UDP/RTP audio 

Re: [Bloat] Little's Law mea culpa, but not invalidating my main point

2021-07-14 Thread Holland, Jake via Bloat
From: Bob McMahon via Bloat 
> Date: Wed, 2021-07-14 at 11:38 AM
> One challenge I faced with iperf 2 was around flow control's effects on
> latency. I find if iperf 2 rate limits on writes then the end/end
> latencies, RTT look good because the pipe is basically empty, while rate
> limiting reads to the same value fills the window and drives the RTT up.
> One might conclude, from a network perspective, the write side is
> better.  But in reality, the write rate limiting is just pushing the
> delay into the application's logic, i.e. the relevant bytes may not be
> in the pipe but they aren't at the receiver either, they're stuck
> somewhere in the "tx application space."
>
> It wasn't obvious to me how to address this. We added burst measurements
> (burst xfer time, and bursts/sec) which, I think, helps.
...
>>> I find the assumption that congestion occurs "in network" as not always
>>> true. Taking OWD measurements with read side rate limiting suggests that
>>> equally important to mitigating bufferbloat driven latency using congestion
>>> signals is to make sure apps read "fast enough" whatever that means. I
>>> rarely hear about how important it is for apps to prioritize reads over
>>> open sockets. Not sure why that's overlooked and bufferbloat gets all the
>>> attention. I'm probably missing something.

Hi Bob,

You're right that the sender generally also has to avoid sending
more than the receiver can handle to avoid delays in a message-
reply cycle on the same TCP flow.

In general, I think of failures here as application faults rather
than network faults.  While important for user experience, it's
something that an app developer can solve.  That's importantly
different from network buffering.

It's also somewhat possible to avoid getting excessively backed up
in the network because of your own traffic.  Here bbr usually does
a decent job of keeping the queues decently low.  (And you'll maybe
find that some of the bufferbloat measurement efforts are relying
on the self-congestion you get out of cubic, so if you switch them
to bbr you might not get a good answer on how big the network buffers
are.)

In general, anything along these lines has to give backpressure to
the sender somehow.  What I'm guessing you saw when you did receiver-
side rate limiting was that the backpressure had to fill bytes all
the way back to a full receive kernel buffer (making a 0 rwnd for
TCP) and a full send kernel buffer before the send writes start
failing (I think with ENOBUFS iirc?), and that's the first hint the
sender has that it can't send more data right now.  The assumption
that the receiver can receive as fast as the sender can send is so
common that it often goes unstated.

(If you love to suffer, you can maybe get the backpressure to start
earlier, and with maybe a lower impact to your app-level RTT, if
you try hard enough from the receive side with TCP_WINDOW_CLAMP:
https://man7.org/linux/man-pages/man7/tcp.7.html#:~:text=tcp_window_clamp
But you'll still be living with a full send buffer ahead of the
message-response.)
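
(For the curious, a minimal sketch of that clamp, assuming Linux:
TCP_WINDOW_CLAMP is value 10 in linux/tcp.h, and not every Python build
exposes it by name:

    import socket

    TCP_WINDOW_CLAMP = getattr(socket, "TCP_WINDOW_CLAMP", 10)  # Linux value

    def clamp_receive_window(sock: socket.socket, max_window: int) -> None:
        # Bound the advertised receive window, capping how much the sender
        # can keep in flight toward us, and thus our share of the queue.
        sock.setsockopt(socket.IPPROTO_TCP, TCP_WINDOW_CLAMP, max_window)

    # e.g. clamp_receive_window(s, 64 * 1024) after connect(); note the
    # sender's own send buffer can still fill, as described above.
)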

But usually the right thing to do if you want receiver-driven rate
control is to send back some kind of explicit "slow down, it's too
fast for me" feedback at the app layer that will make the sender send
slower.  For instance most ABR players will shift down their bitrate
if they're failing to render video fast enough just as well as if the
network isn't feeding the video segments fast enough, like if they're
CPU-bound from something else churning on the machine.  (RTP-based
video players are supposed to send feedback with this same kind of
"slow down" capability, and sometimes they do.)

But what you can't fix from the endpoints no matter how hard you
try is the buffers in the network that get filled by other people's
traffic.

Getting other people's traffic to avoid breaking my latency when
we're sharing a bottleneck requires deploying something in the network
and it's not something I can fix myself except inside my own network.

While the app-specific fixes would make for very fine blog posts or
stack overflow questions that could help someone who managed to search
the right terms, there's a lot of different approaches for different
apps that can solve it more or less, and anyone who tries hard enough
will land on something that works well enough for them, and you don't
need a whole movement to get people to make it so their own app works
ok for them and their users.  The problems can be subtle and maybe
there will be some late and frustrating nights involved, but anyone
who gets it reproducible and keeps digging will solve it eventually.

But getting stuff deployed in networks to stop people's traffic
breaking each other's latency is harder, especially when it's a
major challenge for people to even grasp the problem and understand
its causes.  The only possible paths to getting a solution widely
deployed (assuming you have one that works) start with things like
"start an advocacy movement" 

Re: [Bloat] Little's Law mea culpa, but not invalidating my main point

2021-07-09 Thread Holland, Jake via Bloat
Hi David,

That’s an interesting point, and I think you’re right that packet arrival is 
poorly modeled as a Poisson process, because in practice packet transmissions 
are very rarely unrelated to other packet transmissions.

But now you’ve got me wondering what the right approach is.  Do you have any 
advice for how to improve this kind of modeling?

I’m thinking maybe a useful adjustment is to use Poisson start times on packet 
bursts, with a distribution on some burst characteristics? (Maybe like 
duration, rate, choppiness?)  Part of the point being that burst parameters 
then have a chance to become significant, as well as the load from aggregate 
user behavior.
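
To make that concrete, a minimal simulation sketch (the burst duration and
rate distributions here are placeholders I made up, just to show the shape
of the model):

    import random

    def poisson_bursts(rate_per_s: float, horizon_s: float, seed: int = 1):
        # Burst *start times* form a Poisson process (exponential gaps);
        # each burst then draws its own duration and rate.
        rng = random.Random(seed)
        t = 0.0
        while True:
            t += rng.expovariate(rate_per_s)
            if t >= horizon_s:
                return
            duration = rng.lognormvariate(-2.0, 1.0)  # placeholder dist.
            rate = rng.choice([1e6, 5e6, 20e6])       # placeholder dist.
            yield (t, duration, rate)

    load = sum(d * r for _, d, r in poisson_bursts(10.0, 60.0)) / 60.0
    print(f"offered load ~= {load / 1e6:.1f} Mbit/s")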

And although I think user behavior is probably often ok to model as independent 
(outside of a background average that changes by time of day), in some contexts 
maybe it needs a 2nd overlay for bursts in user activity to address 
user-synchronizing events...  But for some problems I expect this kind of 
approach might still miss important feedback loop effects, and maybe for some 
problems it needs a more generalized suite of patterns besides a “burst”.  But 
maybe it would still be a step in the right direction for examining network 
loading problems in the abstract?

Or maybe it’s better to ask a different question:
Are there any good exemplars to follow here?  Any network traffic analysis (or
related) work you'd recommend as having useful results that apply more broadly
than a specific set of simulation/testing parameters, and whose example you
wish more people would follow?

Also related: any particular papers come to mind that you wish someone would 
re-do with a better model?


Anyway, coming back to where that can of worms opened, I gotta say I like the 
“language of math” idea as a goal to aim for, and it would be surprising to me 
if no such useful information could be extracted from iperf runs.

A Little’s Law-based average queue estimate sounds possibly useful to me 
(especially compared across different runs or against external stats on 
background cross-traffic activity), and some kind of tail analysis on latency 
samples also sounds relevant to user experience. Maybe there’s some other 
things that would be better to include?
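
(For reference, the estimate I have in mind is just Little's Law applied to
the measured long-run averages:

    L = \lambda \, W

where \lambda is the average throughput and W the average time in system from
the latency samples; e.g. a run holding 100 Mbit/s with a 40 ms average
sojourn implies about 4 Mbit, roughly 500 kB, standing in queue, modulo
subtracting the base path delay out of W.)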

Best regards,
Jake


From: "David P. Reed" 
Date: Fri, 2021-07-09 at 12:31 PM
To: Luca Muscariello 
Cc: Cake List , Make-Wifi-fast 
, Leonard Kleinrock , 
Bob McMahon , "starl...@lists.bufferbloat.net" 
, "co...@lists.bufferbloat.net" 
, cerowrt-devel 
, bloat , Ben 
Greear 
Subject: Re: [Bloat] Little's Law mea culpa, but not invalidating my main point


Len - I admit I made a mistake in challenging Little's Law as being based on 
Poisson processes. It is more general. But it tells you an "average" in its 
base form, and latency averages are not useful for end user applications.



However, Little's Law does assume something that is not actually valid about 
the kind of distributions seen in the network, and in fact, it is NOT true that 
networks converge on Poisson arrival times.



The key issue is well-described in the standard analysis of the M/M/1 queue
(e.g. https://en.wikipedia.org/wiki/M/M/1_queue), which is done only for
Poisson processes, and is also limited to "stable" systems. But networks are
never stable when fully loaded. They get unstable and those instabilities
persist for a long time in the network. Instability is at core the underlying
*requirement* of the Internet's usage.



So specifically: real networks, even large ones, and certainly the Internet
today, are not asymptotic limits of sums of stationary stochastic arrival
processes. Each external terminal of any real network has a real user there,
running a real application, and the network is a complex graph. This makes it
completely unlike a single queue. Even the links within a network carry a
relatively small number of application flows. There's no ability to apply the
Law of Large Numbers to the distributions, because any particular path contains
only a small number of serialized flows with highly variable rates.



Here's an example of what really happens in a real network (I've observed this 
in 5 different cities on ATT's cellular network, back when it was running 
Alcatel Lucent HSPA+ gear in those cities).

But you can see this on any network where transient overload occurs, creating 
instability.





At 7 AM, the data transmission of the network is roughly stable. That's because
no links are overloaded within the network. Little's Law can tell you by 
observing the delay and throughput on any path that the average delay in the 
network is X.



Continue sampling delay in the network as the day wears on. At about 10 AM, 
ping delay starts to soar into the multiple second range. No packets are lost.
The peak ping time is about 4000 milliseconds - 4 

Re: [Bloat] AQM & Net Neutrality

2021-06-03 Thread Holland, Jake via Bloat
Hi Stuart,

On 05-24, 12:18 PM, "Stuart Cheshire via Bloat"  
wrote:
> Delay reduction is not an either/or choice. In order for some traffic to 
> benefit other traffic doesn’t have to suffer. It’s not a zero-sum game. 
> Eliminating standing queues in network buffers benefits all traffic. This can 
> be hard to communicate because it seems counter to human intuition. It sounds 
> too good to be true. In normal human life this is uncommon. When first class 
> passengers board the plane first, all economy passengers wait a little bit 
> longer as a result. Computer network queueing doesn’t operate like that, 
> which makes it hard to explain by analogy to everyday experiences that most 
> people understand.

This is a great point and maybe worth expanding on.  It made me
think of a few such analogies in the real world that maybe could be
easier for people to intuitively grasp.

For example, there's some good videos on traffic handling like this
one that can show the difference between 2 intersecting 4-lane roads
with an intersection having a traffic flow of 191 (at the start), vs.
2 intersecting 4-lane roads with a stack interchange doing an almost
6x better 1099 (at the 6-minute mark):
https://www.youtube.com/watch?v=yITr127KZtQ

I feel like if you told people that's kinda like the difference
between using a router with and without AQM, they'd have a useful
model that would make some sense to them, even though the dynamics
aren't quite the same.  The point is that by setting up a nicely
optimized infrastructure, it prevents people from getting in each
others' way and makes it better and faster for everyone.

But I don't think that's the only example, there's other real-world
scenarios people can maybe follow, like comparing the Steffen airplane
boarding method to back-to-front.  There's some short and sweet videos
that can make the point, perhaps in a way that non-experts can more or
less follow and see how it helps:
https://www.youtube.com/watch?v=Y7RXo20jTM4
(Even when there's a first class section with special treatment, this
could still help in coach, if only there were enough luggage space...)

Likewise with some other IRL queue-tuning strategies like how to
set up checkout lines at stores, there's some videos that do a good
job explaining why it helps and how it works (though IIRC this trades
off maximum delay vs. average delay, so it's not quite the same
"everybody just purely wins" point):
https://www.youtube.com/watch?v=9nczHfj-Oh8

But maybe some of these are too far away from networking, I dunno.
So another also-ran on "accessible explanation" is this adorable video
from RIPE in 2016, which is nice because it's actually talking about
the right thing.  It says the same thing about helping everyone, but
only quickly and in-passing.  I feel like something along these lines
that harps on the "helps everyone" point a bit more might be the right
kind of thing:
https://www.youtube.com/watch?v=3eAVGF70HjY

Anyway, it's an excellent point that this non-zero-sum aspect might be
a barrier to understanding, and I bet it's worthwhile to try to get
the "everybody benefits from a good AQM" point across more explicitly
and more often, especially if people are mistaking this for a priority/
net-neutrality issue.

And just a quick side note:

> This is why I’ve been advocating for making low delay available for *any* 
> traffic that chooses to opt-in to this smarter queue management, not 
> selectively for just some privileged traffic. 

AQM at a network bottleneck is not an opt-in thing, it just applies
to all the traffic passing through.  (ECN helps a little at the app
layer from loss avoidance, but usually not as much as the lower queue
target by far.)


Best regards,
Jake




Re: [Bloat] my backlogged comments on the ECT(1) interim call

2020-04-28 Thread Holland, Jake via Bloat
--- Begin Message ---
Hi Luca,

To your point about the discussion being difficult to follow: I tried to 
capture the intent of everyone who commented while taking notes:
https://etherpad.ietf.org:9009/p/notes-ietf-interim-2020-tsvwg-03

I think this was intended to take the place of a need for everyone to re-send 
the same points to the list, but of course some of the most crucial points 
could probably use fleshing out with on-list follow up.

It got a bit rough in places because I was disconnected a few times and had to 
cut over to a local text file, and I may have failed to correctly understand or 
summarize some of the comments, so there's a chance I might have missed 
something, but I did my best to capture them all.

I encourage people to review comments and check whether they came out more or 
less correct, and to offer formatting and cleanup suggestions if there’s a good 
way to make it easier to follow.

I had timestamps at the beginning of each main point of discussion, with the 
intent that after the video is published it would be easier to go back and 
check precisely what was said. It looks like someone has been making cleanup 
edits that removed the first half of those so far, but my local text file still 
has most of those and I can go back and re-insert them if it seems useful.

@Luca: during your comments in particular I think there might have been a 
disruption--I had a “first comment missed, please check video” placeholder and 
I may have misunderstood the part about video elasticity, but my interpretation 
at the time was that Stuart was claiming that video was elastic in that it 
would adjust downward to avoid overflowing a loaded link, and I thought you 
were claiming that it was not elastic in that it would not exceed a maximum 
rate, which I summarized as perhaps a semantic disagreement, but if you’d like 
to help clean that up, it might be useful.

From this message, it sounds like the key point you were making was that it 
also will not go below a certain rate, and perhaps that quality can stay 
relatively good in spite of high network loss?

Best regards,
Jake

From: Luca Muscariello 
Date: Tuesday, April 28, 2020 at 1:54 AM
To: Dave Taht 
Cc: tsvwg IETF list , bloat 
Subject: Re: [Bloat] my backlogged comments on the ECT(1) interim call

Hi Dave and list members,

It was difficult to follow the discussion at the meeting yesterday.
Who  said what in the first place.

There have been a lot of non-technical comments such as: this solution
is better than another in my opinion. "better" has often been used
as when evaluating the taste of an ice cream: White chocolate vs black 
chocolate.
This has taken a significant amount of time at the meeting. I haven't learned
much from that kind of discussion and I do not think that helped to make
much progress.

If people can re-make their points in the list it would help the debate.

Another point that a few raised is that we have to make a decision as fast as 
possible.
I dismissed entirely that argument. Trading off latency with resilience of the 
Internet
is entirely against the design principle of the Internet architecture itself.
Risk analysis is something that we should keep in mind even when deploying any 
experiment
and should be a substantial part of it.

Someone claimed that on-line meeting traffic is elastic. This is not true, I 
tried to
clarify this. These applications (WebEx/Zoom) are low rate, a typical maximum 
upstream
rate is 2Mbps and is not elastic. These applications often have a stand-alone
app that is not using the browser WebRTC stack (the standalone app typically
works better).

A client sends upstream one or two video qualities unless the video camera is
switched off.  In the presence of losses, FEC is used, but it is still not
elastic.
Someone claimed (at yesterday's meeting) that fairness is not an issue (who 
cares, I heard!)
Well, fairness can constitute a differentiation advantage between two companies 
that are
commercializing on-line meetings products. Unless at the IETF we accept
"law-of-the-jungle" behaviours from Internet applications developers, we should 
be careful
about making such claims.
Any opportunity to cheat, that brings a business advantage WILL be used.

/Luca

TL;DR
To Dave: you asked several times what  Cisco does on latency reduction in
network equipment. I tend to be very shy when replying on these questions
as this is not vendor neutral. If chairs think this is not appropriate for
the list, please say it and I'll reply privately only.

What I write below can be found in Cisco products data sheets and is not
trade secret. There are very good blog posts explaining details.
Not surprisingly Cisco implements the state of the art on the topic
and it is totally feasible to do-the-right-thing in software and hardware.

Cisco implements AFD (one queue + a flow table) accompanied by a priority queue 
for
flows that have a certain profile in rate and size. The concept is well known 
and well
studied in 

Re: [Bloat] what protocols do the AI's speak?

2019-12-17 Thread Holland, Jake
The article says RoCE.

On v2 that means UDP, I believe:
https://en.wikipedia.org/wiki/RDMA_over_Converged_Ethernet#RoCE_v2


On 2019-12-16, 18:06, "Dave Taht"  wrote:

Apparently 100Gbit networking is getting important in the AI space. My
question is - what protocols do they speak? Is it a ton of udp? raw
packets? tcp?

https://www.forbes.com/sites/moorinsights/2019/12/16/intel-acquires-habana-labs-for-2b/#79d3efe319f9

-- 
Make Music, Not War

Dave Täht
CTO, TekLibre, LLC
http://www.teklibre.com
Tel: 1-831-435-0729


Re: [Bloat] datapoint from one vendor regarding bloat

2019-04-11 Thread Holland, Jake
Ah, I see what you mean.  Yes, this makes sense as a major concern worth 
checking,
thanks for explaining.

-Jake

On 2019-04-11, 17:37, "Jonathan Morton"  wrote:

> On 12 Apr, 2019, at 2:56 am, Holland, Jake  wrote:
> 
> But in practice do you expect link speed changes to be a major issue?

For wireline, consider ADSL2+.  Maximum downstream link speed is 24Mbps, 
impaired somewhat by ATM framing but let's ignore that for now.  A basic 
"poverty package" might limit to 4Mbps; already a 6:1 ratio.  In rural areas 
the "last mile" copper may be so long and noisy for certain individual 
subscribers that only 128Kbps is available; this is now a 192:1 ratio, turning 
10ms into almost 2 seconds if uncompensated from the headline 24Mbps rate.  
Mind you, 10ms is too short to get even a single packet through at 128Kbps, so 
you'd need to put in a failsafe.
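
(Checking that arithmetic: a FIFO sized for 10ms at the 24Mbps headline rate
holds

    0.01\,\mathrm{s} \times 24\times10^{6}\,\mathrm{bit/s} = 2.4\times10^{5}\,\mathrm{bit}

which drains at 128Kbps in

    2.4\times10^{5}\,\mathrm{bit} \,/\, 1.28\times10^{5}\,\mathrm{bit/s} \approx 1.9\,\mathrm{s}

i.e. almost 2 seconds, as stated.)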

That's on wireline, where link speed changes are relatively infrequent and 
usually minor, so it's easy to signal changes back to some discrete policer box 
(usually called a BRAS in an ADSL context).  That may be what you have in mind.

One could, and should, also consider wireless technologies.  A given 
handset on a given SIM card may expect 100Mbps LTE under ideal conditions, in a 
major city during the small hours, but might only have a dodgy EDGE signal on a 
desolate hilltop a few hours later.  (Here in Finland, cell coverage was 
greatly improved in rural areas by cascading old 2G equipment from urban areas 
that received 3G upgrades, so this is not at all uncommon.)  In general, 
wireless links change rate rapidly and unpredictably in reaction to propagation 
conditions as the handset moves (or, for fixed stations, as the weather 
changes), and the ratio of possible link rates is even more severe than the 
ADSL example above.

Often a "poverty package" is implemented through a shaper rather than a 
policer, backed by a dumb FIFO on which no right-sizing has been considered 
(even though the link rate is obviously known in advance).  On one of these, I 
have personally experienced 40+ seconds of delay, rendering the connection 
useless for other purposes while any sort of sustained download was in 
progress.  In fact, that's one of the incidents which got me seriously 
interested in congestion control; at the time, I hacked the Linux TCP stack to 
right-size the receive window, and directed most of my traffic through a proxy 
running on that machine.  This was sufficient to restore some usability.

I find it notable that ISPs mostly consider only policers for congestion 
signalling, and rarely deploy even these to all the places where congestion may 
reasonably occur.

 - Jonathan Morton





Re: [Bloat] datapoint from one vendor regarding bloat

2019-04-11 Thread Holland, Jake
On 2019-04-11, 11:29, "Jonathan Morton"  wrote:
> A question I would ask, though, is whether that 10ms automatically scales to 
> the actual link rate, or whether it is pre-calculated for the fastest rate 
> and then actually turns into a larger time value when the link rate drops.  
> That's a common fault with sizing FIFOs, too.

That's an interesting question and maybe a useful experiment, if somebody's
got one of these boxes.

But in practice do you expect link speed changes to be a major issue?  Would
this just be an extra point that should also be tuned if the max link speed is
changed dramatically by config and the policer is turned on (so there's, say, a
5% increased chance of misconfig and a reasonably diagnosable problem that a
knowledgebase post somewhere would end up covering once somebody's dug into it),
or is there a deeper problem if it's pre-calculated for the fastest rate?

-Jake




Re: [Bloat] datapoint from one vendor regarding bloat

2019-04-11 Thread Holland, Jake
MBS = maximum burst size
PIR = peak information rate
CBS = committed burst size
CIR = committed information rate

Pages 1185 thru 1222 of the referenced doc* are actually really interesting 
reading
and an excellent walk-through of their token bucket concept and how to use it.

Best,
Jake

*
https://infoproducts.alcatel-lucent.com/cgi-bin/dbaccessfilename.cgi/3HE13300TQZZA01_V1_Advanced%20Configuration%20Guide%20for%207450%20ESS%207750%20SR%20and%207950%20XRS%20for%20Releases%20up%20to%2014.0.R7%20-%20Part%20II.pdf
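
As a rough illustration of how those four parameters interact, here's my own
simplification in Python (not the vendor's algorithm; the profile names are my
assumptions): a PIR bucket of depth MBS plus a CIR bucket of depth CBS, where
conforming to both is in-profile, conforming only to PIR/MBS is out-of-profile,
and exceeding PIR/MBS is dropped.

    class DualBucket:
        def __init__(self, cir, cbs, pir, mbs):
            self.cir, self.cbs = cir, cbs  # committed rate (B/s), burst (B)
            self.pir, self.mbs = pir, mbs  # peak rate (B/s), burst (B)
            self.c_tokens, self.p_tokens = cbs, mbs  # buckets start full
            self.last = 0.0

        def classify(self, size, now):
            # Refill both buckets for the elapsed time, capped at depth.
            dt, self.last = now - self.last, now
            self.c_tokens = min(self.cbs, self.c_tokens + self.cir * dt)
            self.p_tokens = min(self.mbs, self.p_tokens + self.pir * dt)
            if self.p_tokens < size:
                return "drop"            # exceeds PIR/MBS
            self.p_tokens -= size
            if self.c_tokens < size:
                return "out-of-profile"  # within PIR/MBS, above CIR/CBS
            self.c_tokens -= size
            return "in-profile"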


On 2019-04-11, 10:54, "Jonathan Morton"  wrote:

> On 11 Apr, 2019, at 1:38 pm, Mikael Abrahamsson  wrote:
> 
> The mbs defines the MBS for the PIR bucket and the cbs defines the CBS 
for the CIR bucket

What do these lumps of jargon refer to?

 - Jonathan Morton


Re: [Bloat] [Ecn-sane] [iccrg] Fwd: [tcpPrague] Implementation and experimentation of TCP Prague/L4S hackaton at IETF104

2019-03-27 Thread Holland, Jake
Thanks Greg,

But wouldn’t this potentially violate at least one MUST from section 5.2 of 
l4s-id?

   The likelihood that an AQM drops a Not-ECT Classic packet (p_C) MUST
   be roughly proportional to the square of the likelihood that it would
   have marked it if it had been an L4S packet (p_L)
https://tools.ietf.org/html/draft-ietf-tsvwg-ecn-l4s-id-06#section-5.2

Maybe it depends on how far you stretch “roughly”, I guess...  I’m not sure, 
but I’d imagine some realistic conditions could provide counterexamples, unless 
there’s some reason I’m missing that prevents it?

Regards,
Jake

From: Greg White 
Date: 2019-03-20 at 14:49
To: "Holland, Jake" , Bob Briscoe , 
"David P. Reed" , Vint Cerf 
Cc: tsvwg IETF list , bloat 
Subject: Re: [Bloat] [Ecn-sane] [iccrg] Fwd: [tcpPrague] Implementation and 
experimentation of TCP Prague/L4S hackaton at IETF104

I responded to point 2 separately.  In response to point 1, the dual queue 
mechanism is not the only way to support L4S and TCP Prague.  As we’ve 
mentioned a time or two, an fq_codel system can also support L4S and TCP 
Prague.  I’m not aware that anyone has implemented it to test it out yet 
(because so far most interest has been on dual-queue), but I think the simplest 
version is:

At dequeue:
    If packet indicates ECT(1):
        If sojourn_time > L4S_threshold:
            Set CE*, and forward packet
        Else:
            Forward packet
    Else:
        Do all the normal CoDel stuff

In many cases, all of the packets in a single queue will either all be ECT(1) 
or none of them will.  But, to handle VPNs (where a mix of ECT(1) and 
non-ECT(1) packets could share a queue), I would think that including the ECN 
field in the flow hash would keep those packets separate.

*a more sophisticated approach would be to mark CE based on a ramp function 
between a min_thresh and max_thresh, which could be implemented as a random 
draw, or via a counting function
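
A small sketch of that ramp variant (my illustration; the threshold names and
values are made up): mark probability rises linearly from 0 at min_thresh to
1 at max_thresh, realized as a per-packet random draw.

    import random

    def should_mark_ce(sojourn_ms, min_thresh_ms=1.0, max_thresh_ms=5.0,
                       rng=random.random):
        # Below the ramp: never mark; above it: always mark.
        if sojourn_ms <= min_thresh_ms:
            return False
        if sojourn_ms >= max_thresh_ms:
            return True
        # On the ramp: mark with probability proportional to position.
        p = (sojourn_ms - min_thresh_ms) / (max_thresh_ms - min_thresh_ms)
        return rng() < p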




From: Bloat  on behalf of "Holland, Jake" 

Date: Wednesday, March 20, 2019 at 1:06 PM
To: Bob Briscoe , "David P. Reed" , 
Vint Cerf 
Cc: tsvwg IETF list , bloat 
Subject: Re: [Bloat] [Ecn-sane] [iccrg] Fwd: [tcpPrague] Implementation and 
experimentation of TCP Prague/L4S hackaton at IETF104

Hi Bob & Greg,

I agree there has been a reasonably open conversation about the L4S
work, and thanks for all your efforts to make it so.

However, I think there's 2 senses in which "private" might be fair that
I didn't see covered in your rebuttals (merging forks and including
Greg's rebuttal by reference from here:
https://lists.bufferbloat.net/pipermail/bloat/2019-March/009038.html )

Please note:
I don't consider these senses of "private" a disqualifying argument
against the use of L4S, though I do consider them costs that should be
taken into account (and of course opinions may differ here).

With that said, I wondered whether either of you have any responses that
speak to these points:


1. the L4S use of the ECT(1) codepoint can't be marked CE except by a
patent-protected AQM scheduler.

I understand that l4s-id suggests the possibility of an alternate
scheme.  However, comparing the MUSTs of the section 5 requirements
with the claims made by the patent seems to leave no room for an
alternate that would not infringe the patent claims, unless I'm missing
something?

https://tools.ietf.org/html/draft-ietf-tsvwg-ecn-l4s-id-06#section-5
https://patents.google.com/patent/US20170019343A1/en


2. the L4S use of the ECT(1) codepoint privileges the low latency use
case.

As Jonathan Morton pointed out, low latency is only one of several
known use cases that would be worthwhile to identify to an AQM
scheduler, which the L4S approach cannot be extended to handle:
- Minimize Latency
- Minimize Loss
- Minimize Cost
- Maximize Throughput

https://lists.bufferbloat.net/pipermail/ecn-sane/2019-March/66.html

I understand that "Minimize Cost" is perhaps covered by LEPHB, and that
operator tuning parameters for a dualq node can possibly allow for
tuning between minimizing loss and latency, as mentioned in section
4.1 of aqm-dualq, but since the signaling is bundled together, it can
only happen for one at a time, if I'm reading it right.

But more importantly, the L4S usage couples the minimized latency use
case to any possibility of getting a high fidelity explicit congestion
signal, so the "maximize throughput" use case can't ever get it.


Regards,
Jake

PS:
If you get a chance, I'm still interested in seeing answers to my
questions about deployment mitigations on the tsvwg list:
https://mailarchive.ietf.org/arch/msg/tsvwg/TWOVpI-SvVsYVy0_U6K8R04eq3A

I'm not surprised if it slipped by unnoticed, there have been a lot of
emails on this.  But good answers to those questions would go a long way
toward easing my concerns about the urgency on this discussion.

PPS:
The

Re: [Bloat] [Ecn-sane] can we setup a "how to get this into existing networks" get-together in Prague coming week?

2019-03-26 Thread Holland, Jake
Hi Mikael,

Any operator nibbles on making this meeting happen?

I'm not sure how useful I can be, but if there's any network
operators reading, I would love to hear the questions and
concerns from your side, and I'm happy to explain anything
I can help explain.

(Of course if it's just going to be vendors and/or implementors
getting together to agree ECN is a good idea, we might as well
bring plenty of beer...)

-Jake

On 2019-03-23, 20:12, "Mikael Abrahamsson"  wrote:

On Sat, 23 Mar 2019, Jonathan Morton wrote:

> Heated agreement from over here, despite my preference for flow 
> isolation.  Plain old AQM can be orders of magnitude better than a dumb 
> FIFO.

In my testing, FIFO with RED was already huge improvement over just plain 
FIFO.

Configuring 1GE shaping with FIFO yielded 100ms buffering just by naive 
configuration, adding one line of random-detect config brought this down 
to 10-15ms without any loss of actual throughput.


On Thu, 21 Mar 2019, Mikael Abrahamsson wrote:

Btw, I reached out to some people here at the BBF about doing
anti-bufferbloat in this context and getting this into the documents, and
there is no reason why this can't be introduced.

Now, the proposal needs to be "reasonable" and implementable, so if
someone would be interested in work like that I'd like to hear from you. I
have taken initiative in trying to come up with configuration guidance for
operators for their existing equipment, and that could be a way forward.

...

I'll be at the IETF meeting monday-friday coming week, can we set up a
meeting with some interested parties and actually have a "how do we get
this into networks" kind of meeting. It would not be "my mechanism is
better than yours" meeting, I'm not interested in that in this context.

 



Re: [Bloat] [Ecn-sane] can we setup a "how to get this into existing networks" get-together in Prague coming week?

2019-03-23 Thread Holland, Jake
On 2019-03-21, 11:32, "Mikael Abrahamsson"  wrote:
I'll be at the IETF meeting monday-friday coming week, can we set up a 
meeting with some interested parties and actually have a "how do we get 
this into networks" kind of meeting. It would not be "my mechanism is 
better than yours" meeting, I'm not interested in that in this context.

I'm interested in "what can we do to improve the situation in the next 1-2 
years including installed base".

This seems like a useful idea to me.

I guess based on Apple's observations[1], we know there's at least
one network in France and at least one in Argentina that has
rolled this out at some scale.

Does anyone know which networks these are and maybe have contacts
there?  Maybe someone could find out what they did and write up a
case study?

I agree even getting some regular basic CE-marking stuff deployed
would probably be really cool if it can be done sanely.

-Jake

[1] page 12:
https://www.ietf.org/proceedings/98/slides/slides-98-maprg-tcp-ecn-experience-with-enabling-ecn-on-the-internet-padma-bhooma-00.pdf



Re: [Bloat] [tsvwg] [Ecn-sane] [iccrg] Fwd: [tcpPrague] Implementation and experimentation of TCP Prague/L4S hackaton at IETF104

2019-03-20 Thread Holland, Jake
Hi Greg,

On 2019-03-20, 13:55, "Greg White"  wrote:
In normal conditions, L4S offers "Maximize Throughput" +  "Minimize Loss" + 
"Minimize Latency" all at once.  It doesn't require an application to have to 
make that false choice (hence the name "Low Latency Low Loss Scalable 
throughput").  


[JH] This is an interesting claim, and I'm eager to see
how well it holds up under scrutiny and deployment.

I guess I'm not sure what exactly "normal" means, but I
would expect that there are a lot of cases that occur
frequently in practice where tradeoffs have to be made
between throughput, loss, and latency.

I'm finding I struggle to nail down exactly what I expect
from scenarios like a short-RTT L4S flow competing with a
long-RTT L4S flow (from transit delay) and with a BBR flow,
and likewise when a short and a long RTT L4S flow are
competing with a bunch of independent slow-start flows,
but if the L4S cases do indeed get a better throughput than
SCE-based approaches under the wide variety of situations
normal internet use can fall into, I think that would
convince me it's optimizing all of them at once, and it's
a mistake to call it focused on the latency use case.

But for now, I hope you'll forgive a little bit of
skepticism...  I find this stuff complicated, and it's hard
for me to put high confidence on some of the predictions.

Best regards,
Jake


If an application would prefer to "Minimize Cost", then I suppose it could 
adjust its congestion control to be less aggressive (assuming it is congestion 
controlled). Also, as you point out the LEPHB could be an option as well.

What section 4.1 in the dualq draft is referring to is a case where the 
system needs to protect against unresponsive, overloading flows in the low 
latency queue.   In that case something has to give (you can't ensure low 
latency & low loss to e.g. a 100 Mbps unresponsive flow arriving at a 50 Mbps 
bottleneck).

-Greg




On 3/20/19, 2:05 PM, "Bloat on behalf of Jonathan Morton" 
 wrote:

> On 20 Mar, 2019, at 9:39 pm, Gorry Fairhurst  
wrote:
> 
> Concerning "Maximize Throughput", if you don't need scalability to 
very high rates, then is your requirement met by TCP-like semantics, as in TCP 
with SACK/loss or even better TCP with ABE/ECT(0)?

My intention with "Maximise Throughput" is to support the bulk-transfer 
applications that TCP is commonly used for today.  In Diffserv terminology, you 
may consider it equivalent to "Best Effort".

As far as I can see, L4S offers "Maximise Throughput" and "Minimise 
Latency" services, but not the other two.

 - Jonathan Morton



Re: [Bloat] [tsvwg] [Ecn-sane] [iccrg] Fwd: [tcpPrague] Implementation and experimentation of TCP Prague/L4S hackaton at IETF104

2019-03-20 Thread Holland, Jake
Hi Sebastian,

+1 on "approximate classifier" and CE.  This does have me worried,
particularly in the case of multiple AQMs inline.

I think we mostly agree here, and my default position is still
that because SCE is cleaner, I like it better unless L4S can
demonstrate significant value over SCE.

I just wanted to make the point that there's a real chance
L4S might be able to demonstrate significant value that's much
harder to get in some other way (like DSCP), particularly in a
core or closer-to-core device where fq is impractical.

I basically was responding just because I felt like the dismissal
was premature, and seemed to rest on fq being practical in all
AQM cases, which I think is also unproven.  (Even if the OLT can
be solved, I'm willing to stipulate there might be relevant
devices where fq is impractical.)

But I don't mean to argue against the conclusion that SCE seems
more likely to cause fewer problems.  Just that it might be useful
to avoid dismissing legitimate points in L4S's favor.

Best Regards,
Jake

On 2019-03-20, 16:31, "Sebastian Moeller"  wrote:

Hi Jake,


> On Mar 21, 2019, at 00:11, Holland, Jake  wrote:
> 
> I think it's a fair point that even as close as the non-home side
> of the access network, fq would need a lot of queues, and if you
> want something in hardware it's going to be tricky.  I hear
> they're up to an average of ~6k homes per OLT.

Except they state "In practice it would also be important to deploy 
AQM in the residential gateway, but to minimise side-effects we kept upstream 
traffic below capacity." meaning in addition to the OLT/BNG/whatever shaper 
they also envision a shaper on the CPE. And I believe there is ample evidence 
(in openwrt with sqm-scripts) that in that case the downstream shaper can also 
be put on the CPE with reasonable success. 


> 
> I don't think the default assumption here should be that they
> missed something obvious, but rather that they're trying to
> solve a hard problem, and something with a classifier has a
> legitimate value.

I agree, except ECT(1) clearly is a very approximate "classifier" as it 
can not distinguish the L4S-ness of CE marked packets, which affects both the 
AQM part which will treat non-L4S traffic as false positive as well as TCP 
Prague endpoints that will mistreat CE-marked packets as L4S signals even if 
the CE mark is from a TCP-friendly AQM. I note that neither "‘Data Centre to 
the Home’: Ultra-Low Latency for All" nor "PI2: A Linearized AQM for both 
Classic and Scalable TCP" seem to discuss these classification errors and their 
effects on real traffic in sufficient depth.
It is one thing to soak up one of the last few available "codepoints" in
the IP headers, but it is another in my book to do so and not reliably be
able to extract the encoded information. At least from my layman's perspective 
I wonder why this does not seem to bother anybody here?

> 
> The question to me is about how much it breaks other things to
> extract that value, and how much you get out of it in the end.

That is basically the core of my question above: how much do you get 
out in the end?

>  If you need fq and therefore the only viable place for AQM with good
> results is on the home side of the router, that's got some bad
> deployment problems too.

As I state above, even the L4S project position seems to be that AQM on 
the CPE/router is essential, so we are only haggling about how much AQM needs 
to be done on the router. But from that perspective, I would not be unhappy if 
my ISP would employ a lower latency AQM solution upstream of my router than 
they currently do, sort of as a belt and suspender approach to have my router's 
back in cases of severe packet inrush.

Best Regards
Sebastian

> 
> Just my 2c.
> 
> -Jake
> 
> On 2019-03-20, 15:56, "Sebastian Moeller"  wrote:
> 
> 
> 
>> On Mar 20, 2019, at 23:31, Jonathan Morton  wrote:
>> 
>>> On 21 Mar, 2019, at 12:12 am, Sebastian Moeller  wrote:
>>> 
>>> they see 20ms queue delay with a 7ms base link delay @ 40 Mbps
>> 
>> At 40Mbps you might as well be running Cake, and thereby getting 1ms 
inter-flow induced delay; an order of magnitude better.  And we achieved that on
a shoestring budget while they were submarining for a patent application.
>> 
>> If we're supposed to be impressed…
> 
>Nah, there is this GEM:
> 
> 
>Comparing Experiments 5, 7 with 6, 8, we can again conclude that our 
DualQ AQM very much approximates the fq CoDel AQM without the need for flo

Re: [Bloat] [tsvwg] [Ecn-sane] [iccrg] Fwd: [tcpPrague] Implementation and experimentation of TCP Prague/L4S hackaton at IETF104

2019-03-20 Thread Holland, Jake
I think it's a fair point that even as close as the non-home side
of the access network, fq would need a lot of queues, and if you
want something in hardware it's going to be tricky.  I hear
they're up to an average of ~6k homes per OLT.

I don't think the default assumption here should be that they
missed something obvious, but rather that they're trying to
solve a hard problem, and something with a classifier has a
legitimate value.

The question to me is about how much it breaks other things to
extract that value, and how much you get out of it in the end.  If
you need fq and therefore the only viable place for AQM with good
results is on the home side of the router, that's got some bad
deployment problems too.

Just my 2c.

-Jake

On 2019-03-20, 15:56, "Sebastian Moeller"  wrote:



> On Mar 20, 2019, at 23:31, Jonathan Morton  wrote:
> 
>> On 21 Mar, 2019, at 12:12 am, Sebastian Moeller  wrote:
>> 
>> they see 20ms queue delay with a 7ms base link delay @ 40 Mbps
> 
> At 40Mbps you might as well be running Cake, and thereby getting 1ms 
inter-flow induced delay; an order of magnitude better.  And we achieved that on
a shoestring budget while they were submarining for a patent application.
> 
> If we're supposed to be impressed…

Nah, there is this GEM:


Comparing Experiments 5, 7 with 6, 8, we can again conclude that our DualQ
AQM very much approximates the fq CoDel AQM without the need for flow
identification and more complex processing. The main advantage is DualQ’s
lower queuing delay for L4S traffic.

So for normal traffic it is worse than fq_codel and better for traffic that
does behave TCP-friendly, for which it was bespoke-made. So at least they
should have pimped fq_codel/cake to emit their required CE marking regime and
do a
test against that, if the goal is to compare apples and apples. I note that 
they do come into this with a grudge against fq "Per-flow queuing:  Similarly 
per-flow queuing is not incompatible with the L4S approach.  However, one queue 
for every flow can be thought of as overkill compared to the minimum of two 
queues for 
all traffic needed for the L4S approach.  The overkill of per-flow queuing 
has side-effects:" followed by a list of 4 more or less straw-man arguments. 
Heck these might be actually reasonable arguments at their core, but the short 
description in the RFC is fishy.
I believe the coupling between the two queues to be clever and elegant, but 
the whole premise seems odd to me. What they should have done, IMHO is teach 
their AQM something like SCE so it can easily react to CE and drops in a 
standard compliant TCP-friendly fashion, and only do the clever window/rate 
adjustments if the AQM signals ECT(1), add fair queueing to separate the 
different TCP variants behavior from each other, and bang no classification bit 
needed. And no patent (assuming the patent covers the coupling between the two 
queues)... I am sure I am missing something here, it can not be that simple.


Best Regards
Sebastian

P.S.: How did the SCE-Talk go, interesting feed-back and discussions?



> 
> - Jonathan Morton
> 



Re: [Bloat] [Ecn-sane] [iccrg] Fwd: [tcpPrague] Implementation and experimentation of TCP Prague/L4S hackaton at IETF104

2019-03-20 Thread Holland, Jake
On 2019-03-20, 12:58, "Stephen Hemminger"  wrote:
Has anyone done a detailed investigation for prior art?

I don't know, but for serendipitous reasons I recently came across this,
which may be interesting to those who would be interested in digging
further on that question:

http://www.statslab.cam.ac.uk/~frank/PAPERS/tac.pdf
"On packet marking at priority queues"
R. J. Gibbens and F. P. Kelly
IEEE Transactions on Automatic Control 47 (2002) 1016-1020.

I wasn't sure if it was technically prior art or not, and I'm offering
no opinion, but it sounded similar to me in some significant ways.

HTH.

-Jake



Re: [Bloat] [tsvwg] [Ecn-sane] [iccrg] Fwd: [tcpPrague] Implementation and experimentation of TCP Prague/L4S hackaton at IETF104

2019-03-20 Thread Holland, Jake
On 2019-03-20, 12:40, "Gorry Fairhurst"  wrote:
Concerning "Maximize Throughput", if you don't need scalability to very 
high rates, then is your requirement met by TCP-like semantics, as in 
TCP with SACK/loss or even better TCP with ABE/ECT(0)?

[JH] A problem with TCP with ABE/ECT(0) is that it gets at most 1 ECE signal
per round-trip, so the kind of high-fidelity congestion response I hope to
see out of some upcoming SCE-echo proposal would be very welcome, especially
in BBR (or similar), as well as anything that uses slow-start.

I wonder: if the intent is to scale to really high rates, then the 
control loop delay for the congestion-controller becomes a limiting 
issue, and in that case low-latency is necessary to safely climb the 
rate to the high speed - and conversely to allow the controller to react 
quickly when (or if) that overshoots a capacity bottleneck. In other 
words, is scalable high throughput inseparable from low latency?

[JH] I agree lower latency, particularly including anything that avoids
buffer-bloat, is an important factor in returning a timely congestion
signal to the sender.

However, there may be a big difference in throughput between a CC that
allows for an increase of, say, 10-20ms, or +.5 base-RTT at a bottleneck,
vs. one that pushes back on anything above 1ms, especially when considering
paths with longer transit times.

In that sense of course a good bandwidth-maximizing approach benefits from
keeping latency low also, but perhaps with different thresholds.

Gorry




Re: [Bloat] [Ecn-sane] [iccrg] Fwd: [tcpPrague] Implementation and experimentation of TCP Prague/L4S hackaton at IETF104

2019-03-20 Thread Holland, Jake
on
* Nicolas Kuhn of CNES has been assessing the use of L4S for satellite.
* Magnus Westerlund of Ericsson with a team of others has written the necessary 
ECN feedback into QUIC
* L4S hardware is also being implemented for hi-speed switches at the moment
(the developer wants to announce it themselves, so I have been asked not to 
identify them).

There's a lot more stuff been going on, but I've tried to pick out highlights.

All this is good healthy development of much lower latency for Internet 
technology.


I find it extremely disappointing that some people on this list are taking such 
a negative attitude to the major development in their own field that they seem 
not to have noticed since it first hit the limelight in 2015.

L4S has been open-sourced since 2016 so that everyone can develop it and make 
it better...

If I was in this position, having overlooked something important for multiple 
years, I would certainly not condemn it while I was trying to understand what 
it was and how it worked. Can I suggest everyone takes a step back, and 
suspends judgement until they have understood the potential, the goals and the 
depth of what they have missed. People who know me, know that I am very careful 
with Internet architecture, and particularly with balancing public policy 
against commercial issues. Please presume respect unless proven otherwise.

Best Regards



Bob

PS. Oh and BBR would be welcome to use the ECT(1) codepoint to get into the L4S 
queue. But only if it can keep latency down below around 1ms. Currently those 
~15-25ms delay spikes would not pass muster. Indeed, its delay is not 
consistently low enough between the spikes either.









-----Original Message-----
From: "Vint Cerf" <v...@google.com>
Sent: Saturday, March 16, 2019 5:57pm
To: "Holland, Jake" <jholl...@akamai.com>
Cc: "Mikael Abrahamsson" <swm...@swm.pp.se>, "David P. Reed"
<dpr...@deepplum.com>, "ecn-s...@lists.bufferbloat.net"
<ecn-s...@lists.bufferbloat.net>, "bloat" <bloat@lists.bufferbloat.net>
Subject: Re: [Ecn-sane] [Bloat] [iccrg] Fwd: [tcpPrague] Implementation and 
experimentation of TCP Prague/L4S hackaton at IETF104
where does BBR fit into all this?
v

On Sat, Mar 16, 2019 at 5:39 PM Holland, Jake <jholl...@akamai.com> wrote:
On 2019-03-15, 11:37, "Mikael Abrahamsson" <swm...@swm.pp.se> wrote:
L4S has a much better possibility of actually getting deployment into the
wider Internet packet-moving equipment than anything being talked about
here. Same with PIE as opposed to FQ_CODEL. I know it might not be as
good, but it fits better into actual silicon and it's being proposed by
people who actually have better channels into the people setting hard
requirements.

I suggest you consider joining them instead of opposing them.


Hi Mikael,

I agree it makes sense that fq_anything has issues when you're talking
about the OLT/CMTS/BNG/etc., and I believe it when you tell me PIE
makes better sense there.

But fq_x makes great sense and provides real value for the uplink in a
home, small office, coffee shop, etc. (if you run the final rate limit
on the home side of the access link.)  I'm thinking maybe there's a
disconnect here driven by the different use cases for where AQMs can go.

The thing is, each of these is the most likely congestion point at
different times, and it's worthwhile for each of them to be able to
AQM (and mark packets) under congestion.

One of the several things that bothers me with L4S is that I've seen
precious little concern over interfering with the ability for another
different AQM in-path to mark packets, and because it changes the
semantics of CE, you can't have both working at the same time unless
they both do L4S.

SCE needs a lot of details filled in, but it's so much cleaner that it
seems to me there are reasonably obvious answers to all (or almost all) of
those detail questions, and because the semantics are so much cleaner,
it's much easier to tell it's non-harmful.


The point you raised in another thread about reordering is mostly
well-taken, and a good counterpoint to the claim "non-harmful relative
to L4S".

To me it seems sad and dumb that switches ended up trying to make
ordering guarantees at the cost of switching performance, because if it's
useful to put ordering in the switch, then it must be equally useful to
put it in the receiver's NIC or OS.

So why isn't it in all the receivers' NIC or OS (where it would render
the switch's ordering efforts moot) instead of in all the switches?

I'm guessing the answer is a competition trap for the switch vendors,
plus "with ordering goes faster than without, when you benchmark the
switch with typical load and current (non-RACK) receivers".

If that's the case, it seems like the drive for a competitive advantage
caused deployment of a packet ordering workaround in the wrong network
location(s), out of a pure misalignment of incentives.

[Bloat] SCE receiver-side vs. sender-side response

2019-03-16 Thread Holland, Jake
Hi guys,

I was looking through the updates on SCE, and I think this
receiver-driven idea is doomed.  I mean, good luck and it's a neat idea
if it works, but it's got some problems with flexibility:
- you don't want to reduce the right edge of the window, so you can't
  reduce rwnd faster than data is arriving (see the sketch after this
  list).
- growing rwnd won't grow cwnd automatically at that rate, so it's only
  a cap on how fast cwnd can grow, not a way to make it actually grow
  from the sender's side.  I think this limits the sender's response
  substantially.
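
To make that first constraint concrete, here is a minimal sketch
(illustrative only, not from any draft) of the arithmetic: if the right
edge of the window (rcv_nxt + rwnd) must never move left, then rwnd can
shrink per ACK by at most the number of bytes that just arrived.

    # Hypothetical illustration of the rwnd constraint: the right edge
    # (rcv_nxt + rwnd) must never retreat, so the fastest legal shrink of
    # the advertised window per ACK is the number of bytes just received.
    def next_rwnd(current_rwnd: int, bytes_newly_received: int,
                  target_rwnd: int) -> int:
        """Smallest rwnd we may advertise without pulling the right edge back."""
        floor = max(current_rwnd - bytes_newly_received, 0)
        return max(target_rwnd, floor)

    # Example: we want to drop from 64 KB to 16 KB, but only 4 KB arrived
    # since the last ACK, so the best we can advertise this round is 60 KB.
    assert next_rwnd(65536, 4096, 16384) == 61440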

So I'd like to get just a quick sanity check on the 3 ways that occur
to me to report the SCE ratio explicitly to the sender, because I think
that's going to end up being necessary.  I assume some of these have
occurred to others already, but it would be good to see some discussion:

1. a new TCP option:
As some have mentioned, there are middleboxes known to drop unknown TCP
options[1].  But the nice thing about "backward compatible" is that you
don't lose the regular ECE response, so it's no different from not
having the SCE marks, and just degrades gracefully.  So I rate this
viable here because it's an optional bonus, if I'm not missing
something?

2. new TCP header flag:
There are 4 reserved bits left in the TCP header, so it's not hard to
imagine an ESCE flag that's either echoed 1-for-1 with extra acks as
needed, like L4S's ECE response, or set in a proportion matching the
SCE packet-marking rate on the naturally occurring acks.  (I like the
2nd option better because I think it still probabilistically works well
even with GRO, depending how GRO coalesces the SCE flag, but maybe
there's some trickiness.)

3. new TCP header flag plus URG pointer space:
It's also not hard to imagine extending #2 so that it's incompatible
with URG on the same packet (which mostly nobody uses, AFAIK?), and
using the URG pointer space to report a fixed-point value for the SCE
rate over a defined time span or packet count or something (one
possible encoding is sketched below).
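
For concreteness, here's one hypothetical encoding for that fixed-point
value; the Q0.16 format and the reuse of the 16-bit urgent-pointer
field are assumptions for illustration, not part of any draft:

    # Hypothetical sketch of option 3: carry the observed SCE mark ratio
    # as a Q0.16 fixed-point value in the 16-bit urgent-pointer field.
    def encode_sce_rate(sce_marked: int, total_segments: int) -> int:
        """Map a mark ratio in [0, 1] onto the 16-bit field."""
        if total_segments == 0:
            return 0
        return min(round(sce_marked / total_segments * 0xFFFF), 0xFFFF)

    def decode_sce_rate(field: int) -> float:
        return field / 0xFFFF

    # e.g. 37 of 100 segments carried SCE in this feedback window:
    assert abs(decode_sce_rate(encode_sce_rate(37, 100)) - 0.37) < 1e-4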


I can see any of these 3 with or without SYN/SYNACK option negotiation
at startup. (Or even from just setting ESCE, sort of like the ECE
signaling in regular RFC3168 ECT negotiation.)

I do think that to capture the value here, the SCE rate probably has to
get back to the sender, but I think there are good options for doing
so, and anywhere it doesn't work (I think?) it just falls back to
regular CE, which is a nice bonus that comes from backward
compatibility.

Any other ideas?  Any pros or cons that spring to mind on these ideas?

Cheers,
Jake

[1] Is it still possible to extend TCP?  (Honda et al., IMC 2011)
http://nrg.cs.ucl.ac.uk/mjh/tmp/mboxes.pdf


___
Bloat mailing list
Bloat@lists.bufferbloat.net
https://lists.bufferbloat.net/listinfo/bloat


Re: [Bloat] [Ecn-sane] [iccrg] Fwd: [tcpPrague] Implementation and experimentation of TCP Prague/L4S hackaton at IETF104

2019-03-16 Thread Holland, Jake
Yeah, great question.

I don't want to answer for the L4S guys; I don't have a good feel for
what they might think.  But it does concern me that there seems to be at
least one tuning parameter that was picked for Reno, which I think I
mentioned on the tsvwg list:
https://tools.ietf.org/html/draft-ietf-tsvwg-aqm-dualq-coupled-08#section-2.1

For SCE, I would assume they'll want to make use of it at some point,
and so they'll have to write a draft for how BBR will handle it.

I think there's an open question of what exactly the rate of SCE
markings would look like for an SCE-capable AQM, and presumably this
also needs to be nailed down before it can be useful.  My initial
instinct is a probabilistic SCE setting based on current queue length,
either when forwarded or when enqueued (a rough sketch follows), but I
think this will take some more thought, and I'm not sure that's best.
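
As a strawman of that instinct (a sketch only; the thresholds are made
up for illustration), the marking probability could ramp with the
instantaneous queue depth, leaving CE for the existing hard-congestion
response:

    import random

    # Strawman queue-length-proportional SCE marking: no marks below a
    # floor, always mark above a ceiling, linear ramp in between.
    SCE_FLOOR_PKTS = 8     # assumed threshold
    SCE_CEILING_PKTS = 64  # assumed threshold

    def should_mark_sce(queue_depth_pkts: int) -> bool:
        """Decide whether this ECT packet should carry an SCE mark."""
        if queue_depth_pkts <= SCE_FLOOR_PKTS:
            return False
        if queue_depth_pkts >= SCE_CEILING_PKTS:
            return True
        ramp = ((queue_depth_pkts - SCE_FLOOR_PKTS)
                / (SCE_CEILING_PKTS - SCE_FLOOR_PKTS))
        return random.random() < ramp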

But whatever the most informative marking schedule turns out to be, my
understanding is that if that info can get back to the sender, the
sender can essentially do whatever it thinks is right, according to the
congestion control it's running.

-Jake


From: Vint Cerf
Date: 2019-03-16 at 14:57
To: "Holland, Jake"
Cc: Mikael Abrahamsson, "David P. Reed", ecn-s...@lists.bufferbloat.net, bloat
Subject: Re: [Ecn-sane] [Bloat] [iccrg] Fwd: [tcpPrague] Implementation and 
experimentation of TCP Prague/L4S hackaton at IETF104

where does BBR fit into all this?

v


On Sat, Mar 16, 2019 at 5:39 PM Holland, Jake <jholl...@akamai.com> wrote:
On 2019-03-15, 11:37, "Mikael Abrahamsson" <swm...@swm.pp.se> wrote:
L4S has a much better possibility of actually getting deployment into the
wider Internet packet-moving equipment than anything being talked about
here. Same with PIE as opposed to FQ_CODEL. I know it might not be as
good, but it fits better into actual silicon and it's being proposed by
people who actually have better channels into the people setting hard
requirements.

I suggest you consider joining them instead of opposing them.


Hi Mikael,

I agree it makes sense that fq_anything has issues when you're talking
about the OLT/CMTS/BNG/etc., and I believe it when you tell me PIE
makes better sense there.

But fq_x makes great sense and provides real value for the uplink in a
home, small office, coffee shop, etc. (if you run the final rate limit
on the home side of the access link).  I'm thinking maybe there's a
disconnect here driven by the different use cases for where AQMs can go.

The thing is, each of these is the most likely congestion point at
different times, and it's worthwhile for each of them to be able to
AQM (and mark packets) under congestion.

One of the several things that bothers me with L4S is that I've seen
precious little concern over interfering with the ability for another
different AQM in-path to mark packets, and because it changes the
semantics of CE, you can't have both working at the same time unless
they both do L4S.

SCE needs a lot of details filled in, but it's so much cleaner that it
seems to me there are reasonably obvious answers to all (or almost all) of
those detail questions, and because the semantics are so much cleaner,
it's much easier to tell it's non-harmful.


The point you raised in another thread about reordering is mostly
well-taken, and a good counterpoint to the claim "non-harmful relative
to L4S".

To me it seems sad and dumb that switches ended up trying to make
ordering guarantees at the cost of switching performance, because if it's
useful to put ordering in the switch, then it must be equally useful to
put it in the receiver's NIC or OS.

So why isn't it in all the receivers' NIC or OS (where it would render
the switch's ordering efforts moot) instead of in all the switches?

I'm guessing the answer is a competition trap for the switch vendors,
plus "with ordering goes faster than without, when you benchmark the
switch with typical load and current (non-RACK) receivers".

If that's the case, it seems like the drive for a competitive advantage
caused deployment of a packet ordering workaround in the wrong network
location(s), out of a pure misalignment of incentives.

RACK rates to fix that in the end, but a lot of damage is already done,
and the L4S approach gives switches a flag that can double as proof that
RACK is there on the receiver, so they can stop trying to order those
packets.

So point granted, I understand and agree there's a cost to abandoning
that advantage.


But as you also said so well in another thread, this is important.  ("The
last unicorn", IIRC.)  How much does it matter if there's a feature that
has value today, but only until RACK is widely deployed?  If you were
convinced RACK would roll out everywhere within 3 years and SCE would
produce better results than L4S over the following 15 years, would that
change your mind?

It would for me, and that's why I'd like to see SCE explored before
making a call.

Re: [Bloat] [Ecn-sane] [iccrg] Fwd: [tcpPrague] Implementation and experimentation of TCP Prague/L4S hackaton at IETF104

2019-03-16 Thread Holland, Jake
On 2019-03-15, 11:37, "Mikael Abrahamsson" <swm...@swm.pp.se> wrote:
L4S has a much better possibility of actually getting deployment into the 
wider Internet packet-moving equipment than anything being talked about 
here. Same with PIE as opposed to FQ_CODEL. I know it might not be as
good, but it fits better into actual silicon and it's being proposed by 
people who actually have better channels into the people setting hard 
requirements.

I suggest you consider joining them instead of opposing them.


Hi Mikael,

I agree it makes sense that fq_anything has issues when you're talking
about the OLT/CMTS/BNG/etc., and I believe it when you tell me PIE
makes better sense there.

But fq_x makes great sense and provides real value for the uplink in a
home, small office, coffee shop, etc. (if you run the final rate limit
on the home side of the access link).  I'm thinking maybe there's a
disconnect here driven by the different use cases for where AQMs can go.

The thing is, each of these is the most likely congestion point at
different times, and it's worthwhile for each of them to be able to
AQM (and mark packets) under congestion.

One of the several things that bothers me with L4S is that I've seen
precious little concern over interfering with the ability for another
different AQM in-path to mark packets, and because it changes the
semantics of CE, you can't have both working at the same time unless
they both do L4S.

SCE needs a lot of details filled in, but it's so much cleaner that it
seems to me there are reasonably obvious answers to all (or almost all) of
those detail questions, and because the semantics are so much cleaner,
it's much easier to tell it's non-harmful.


The point you raised in another thread about reordering is mostly
well-taken, and a good counterpoint to the claim "non-harmful relative
to L4S".

To me it seems sad and dumb that switches ended up trying to make
ordering guarantees at the cost of switching performance, because if it's
useful to put ordering in the switch, then it must be equally useful to
put it in the receiver's NIC or OS.

So why isn't it in all the receivers' NIC or OS (where it would render
the switch's ordering efforts moot) instead of in all the switches?

I'm guessing the answer is a competition trap for the switch vendors,
plus "with ordering goes faster than without, when you benchmark the
switch with typical load and current (non-RACK) receivers".

If that's the case, it seems like the drive for a competitive advantage
caused deployment of a packet ordering workaround in the wrong network
location(s), out of a pure misalignment of incentives.

RACK rates to fix that in the end, but a lot of damage is already done,
and the L4S approach gives switches a flag that can double as proof that
RACK is there on the receiver, so they can stop trying to order those
packets.

So point granted, I understand and agree there's a cost to abandoning
that advantage.


But as you also said so well in another thread, this is important.  ("The
last unicorn", IIRC.)  How much does it matter if there's a feature that
has value today, but only until RACK is widely deployed?  If you were
convinced RACK would roll out everywhere within 3 years and SCE would
produce better results than L4S over the following 15 years, would that
change your mind?

It would for me, and that's why I'd like to see SCE explored before
making a call.  I think at its core, it provides the same thing L4S does
(a high-fidelity explicit congestion signal for the sender), but with
much cleaner semantics that can be incrementally added to congestion
controls that people are already using.

Granted, it still remains to be seen whether SCE in practice can match
the results of L4S, and L4S was here first.  But it seems to me L4S comes
with some problems that have not yet been examined, and that are nicely
dodged by a SCE-based approach.

If L4S really is as good as they seem to think, I could imagine getting
behind it, but I don't think that's proven yet.  I'm not certain, but
all the comparative analyses I remember seeing have been from more or
less the same team, and I'm not convinced they don't have some
misaligned incentives of their own.

I understand a lot of work has gone into L4S, but this move to jump it
from interesting experiment to de-facto standard without a more critical
review that digs deeper into some of the potential deployment problems
has me concerned.

If it really does turn out to be good enough to be permanent, I'm not
opposed to it, but I'm just not convinced that it's non-harmful, and my
default position is that the cleaner solution is going to be better in
the long run, if they can do the same job.

It's not that I want it to be a fight, but I do want to end up with the
best solution we can get.  We only have the one internet.

Just my 2c.  

-Jake


___
Bloat mailing list
Bloat@lists.bufferbloat.net
https://lists.bufferbloat.net/listinfo/bloat

Re: [Bloat] The "Some Congestion Experienced" ECN codepoint - a new internet draft -

2019-03-11 Thread Holland, Jake
+1, I agree SCE on its own isn't enough.

Before I support adoption as a proposed standard I'd want real-world tests
demonstrating the value.  I believe SCE has potential similar to L4S by
providing a similar fine-grained congestion signal, and that it does so
in a much cleaner way.

But there's a big gap between "has potential" and "has proven benefit" that
I'd want to see filled before it's an RFC.

That said, I would like to see experiments go forward, and I would like to
see this become an active draft that a wg owns, so that if the experiments
do prove there's utility that can be captured, it has a good path forward.

I have concerns about L4S, and so my relief is about seeing a cleaner
(and backward compatible!) proposal that achieves what looks to me like
a very similar effect.  And I would very much like to see a bakeoff
of some sort before committing ECT(1) to the use of L4S, since it seems
to me there are some ugly problems down that road.

Cheers,
Jake

On 2019-03-11, 02:47, "Mikael Abrahamsson" <swm...@swm.pp.se> wrote:

On Mon, 11 Mar 2019, Jonathan Morton wrote:

>> On 11 Mar, 2019, at 11:07 am, Mikael Abrahamsson <swm...@swm.pp.se> wrote:
>>
>> Well, I am not convinced blowing the last codepoint on SCE has enough merit.
>
> I will make a stronger statement: I am convinced that blowing the last codepoint on L4S does *not* have enough merit.

... and I believe that blowing it on SCE doesn't have merit either.

That's exactly why I am opposing the use of ECT(1) unless we're
*really* *really* *really* sure there is something we want to blow it on.

Using it for SCE is, to me, marginal benefit compared to what else
ECT(1) could be used for.  I think L4S is proposing enough novelty that
it could be used for that, but I'm open to other suggestions.  SCE
isn't enough.

-- 
Mikael Abrahamsson    email: swm...@swm.pp.se
___
Bloat mailing list
Bloat@lists.bufferbloat.net
https://lists.bufferbloat.net/listinfo/bloat


___
Bloat mailing list
Bloat@lists.bufferbloat.net
https://lists.bufferbloat.net/listinfo/bloat


Re: [Bloat] [Ecn-sane] The "Some Congestion Experienced" ECN codepoint - a new internet draft -

2019-03-11 Thread Holland, Jake
Hi,

It's a fair question which is the best use of the last IP codepoint,
and thanks Mikael for raising some good questions.

There was a great example posted recently of why the wifi, docsis, etc.
ordering attempts aren't good enough to guarantee ordering (namely ECMP):
https://mailarchive.ietf.org/arch/msg/tsvwg/WR-UXtj9jXiYNQQIALfXQDikAjw


I also was thinking something like RACK is a better solution in the long
term.  If >80% of flows start responding better to reordering, future switches
can safely drop the reordering requirements they've adopted to work around
TCP's problems.  (It's a shame that the switches ended up having to work
around that, but there was never an ordering requirement on the IP
packets, just a performance improvement to the extent you managed to
provide one.)

I think both proposals are doing almost the same thing from the point of
view of the endpoints (that is: they're providing a fine-grained congestion
signal so that the sender can back off earlier and less aggressively than
with multiplicative decrease).
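
As a toy contrast (illustrative constants only, not values from either
proposal): a classic sender halves cwnd when it sees CE, while a
fine-grained signal lets the response scale with how much of the
traffic was actually marked.

    # Toy contrast between the classic CE response and a proportional
    # response to a fine-grained signal. The 0.5 and 0.25 constants are
    # assumptions for illustration only.
    def on_ce(cwnd: float) -> float:
        """Classic RFC 3168-style response: multiplicative decrease."""
        return cwnd / 2

    def on_fine_grained(cwnd: float, marked_fraction: float,
                        gain: float = 0.25) -> float:
        """Back off in proportion to the fraction of packets marked (0..1)."""
        return cwnd * (1.0 - gain * marked_fraction)

    # A lightly marked round (10% marked) costs 2.5% of cwnd, not 50%.
    assert abs(on_fine_grained(100.0, 0.10) - 97.5) < 1e-9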

The key difference to me is that L4S doesn't work without a dual queue.
It embeds a major design decision ("how many queues should there be at
a middlebox?") into the codepoint, and comes with very narrow requirements
on the sender's congestion control.

SCE by contrast can have different responses for different congestion
controls, with more room for experimenting and fine tuning.  It fits much
better with concepts like fq, which can maintain something like fairness
even when different senders are behaving differently.

Along the same lines, it's less open to abuse, because there's not an
opt-in fast lane, like with L4S.  I'm worried L4S just sticks us in the
same arms race with a different queue, in the end, where everybody does
better for them and worse for the network if they can find a way to cheat
a little.  So I view good fq support as more useful in the end.  (And I
agree that wifi/docsis/etc. shouldn't be trying to maintain ordering
guarantees.)

I do think SCE is going to end up needing TCP options to communicate the
congestion signal, but I think this is a good first step down a long
road, and I look forward to seeing experiments that can demonstrate the
advantages, if they turn out to be what I expect here.

Cheers,
Jake

On 2019-03-11, 02:07, "Mikael Abrahamsson" <swm...@swm.pp.se> wrote:

On Mon, 11 Mar 2019, Jonathan Morton wrote:

> Seriously?  I had to dig in the specs to find any mention of that, and… 
> it's all about better supporting bonded links.  Which can already be

It doesn't stop there. Right now DOCSIS, 3GPP networks, Wifi, etc. all
make ordering guarantees, so they will hold up delivery of packets
until they can ensure that none of them are delivered out of order.

Allowing these transports to re-order the packets means they can do a
better job than they do today. You do not want to ask them to drop
their packets either, because the drop rate is potentially way higher
than most transports would feel comfortable with.

> done by implementing RACK at the sender, and all you propose is that 
> when L4S is in use, the extra buffering at the link layer is dropped.

I did?

> This is absolutely useless for ordinary Internet users, who are unlikely 
> to have consecutive packets sufficiently closely traversing such a link 
> for this reordering to exceed the 3-dupack threshold in any case - so 
> you might as well delete that reordering buffer anyway, and let the 
> endpoints handle it.  You don't need L4S for that.

That's not my experience with wifi and how it behaves at the edge.

> endpoints (eg. using AccECN) to discover whether setting ECT(1) at the 
> sender is legal.  SCE does not require such negotiation (ie. a transport 
> could implement it entirely at the receiver, manipulating the send rate 
> via the already-standardised receive window), so should be easier to 
> specify and deploy successfully.

Well, I am not convinced blowing the last codepoint on SCE has enough 
merit.


___
Bloat mailing list
Bloat@lists.bufferbloat.net
https://lists.bufferbloat.net/listinfo/bloat


Re: [Bloat] The "Some Congestion Experienced" ECN codepoint - a new internet draft -

2019-03-10 Thread Holland, Jake
Hi Dave,

You and John have my enthusiastic +1.

It's a frank relief to read this draft after trying to figure out L4S,
and I think the basic core concept upon which to build the actual
response systems is very well separated and very well framed here.

Please submit this and present, I humbly beg you.  It seems to me a
strictly better use of ECT(1), even though there's still probably a
few hundred pages' worth of catching up to do on draft-writing to
nail down details.

I have a few minor comments for your consideration, but please don't
let them stop you from posting before deadline, if any are hard to
integrate.  It would be better to ignore them all and post as-is than
to get hung up on these:

1.
"Some" in "Some Congestion Experienced" is maybe misleading, and
arguably has the same meaning as "Congestion Experienced".

I was thinking maybe "Pre-Congestion Experienced" or "Queue
Utilization Observed", or if you want to preserve "SCE" and the
link to CE (which I do agree is nice), maybe "Slight" or "Sub"
instead of "Some", just to try to make it more obvious it's
flagging a lesser situation than "there is some congestion".

2.
It's easy to accidently read section 5 as underspecified concrete
proposals instead of rough sketches for future direction that might
be worth investigating.  I'll offer an attempt at some language,
feel free to edit (or ignore if you think the intro is enough to
make the scope sufficiently clear already):


The scope of this document is limited to the definition of the
SCE codepoint.  However, for illustration purposes, a few possible
future usage scenarios are outlined here.  These examples are non
normative.

3.
Similarly, I would lower-case the "MAY" and "SHOULD" in section
5.2 for receiver-side handling in TCP--it's not clear this will
ever be a good idea to do without more explicit signaling thru
new opts or whatever, and granting permission here seems like
asking for trouble that's just not necessary.


And a few that I'd defer if I were you, but I'd like to see
sometime in at least a post-Prague version or list discussion:

4.
an informative reference or 2 would be a welcome addition in Section 3:

   Research has shown that the ECT(1) codepoint goes essentially unused,
   with the "Nonce Sum" extension to ECN having not been implemented in

5.
Should this "must" be "MUST" in Section 4?  If not, why not?

   New SCE-aware receivers and transport protocols must continue to


Thanks guys, nice work and good luck!

Cheers,
Jake


On 2019-03-10, 11:07, "Dave Taht" wrote:

I would love to have some fresh eyeballs on a new IETF draft for the
TSVWG we intend to submit tonight.

I've attached the html for easy reading, but I would prefer
that folk referred back to the github repository for the most current
version, which is here:


https://github.com/dtaht/bufferbloat-rfcs/blob/master/sce/draft-morton-taht-SCE.txt

and in open source tradition, discuss here, or file bugs, and submit
pull requests to the github.

The first draft (of 3 or more pending), creating the SCE codepoint
and defining the state machine, is pretty short, and we think the
basic concept solves a zillion problems with ECN in one stroke. It's
easy to implement (one line of code in codel), backward compatible
with all the RFCs, and somewhat incompatible with the stalled out TCP
Prague/dualpi effort in the IETF.
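
As a guess at what that one line might look like (a sketch only, not
the actual patch; the threshold fraction is an assumption), CoDel
already computes a per-packet sojourn time that an SCE mark could
piggyback on:

    # Guess at the "one line of code in codel" idea (illustrative only):
    # CoDel already knows each packet's sojourn time, so SCE could be set
    # on ECT packets that linger past some fraction of CoDel's target,
    # while the existing CE/drop behavior stays unchanged.
    CODEL_TARGET_MS = 5.0  # CoDel's default target; the fraction is assumed

    def mark_for(sojourn_ms: float, codel_signals_congestion: bool) -> str:
        if codel_signals_congestion:
            return "CE"                       # existing CoDel response
        if sojourn_ms > CODEL_TARGET_MS / 2:  # the hypothetical added line
            return "SCE"                      # new early, gentle signal
        return "ECT"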

We have several other drafts in progress which I increasingly doubt
we'll finish today, but I think only this one is required to get an
audience in the tsvwg at the coming IETF meeting.

If ya have any comments and spare time today, I'd like to get the
first draft in tonight, and the filing deadline for final drafts is
sometime tomorrow. It may help for context to review some of the other
work in the github repo.

THX!

-- 

Dave Täht
CTO, TekLibre, LLC
http://www.teklibre.com
Tel: 1-831-205-9740


___
Bloat mailing list
Bloat@lists.bufferbloat.net
https://lists.bufferbloat.net/listinfo/bloat


Re: [Bloat] one benefit of turning off shaping + fq_codel

2018-11-27 Thread Holland, Jake
On 2018-11-27, 10:31, "Stephen Hemminger" wrote:
With asynchronous circuits there is too much unpredictability and
instability.
I seem to remember there are even cases where two inputs arrive at once
and the output is non-deterministic.

IIRC they talked about that some too. I think maybe some papers were going back 
and forth. But last I heard, they proved that this is not a real objection, in 
that:
1. you can quantify the probability of failure and ensure a design keeps it 
under threshold when operating within specified conditions (e.g. normal 
temperature and voltage thresholds)
2. you can work around the issues where it's critical by adding failure
detection and fault handling, and
3. you have the exact same fundamental theoretical problem with synchronous 
circuits, particularly in registers that can keep a value through a clock 
cycle, but it hasn't stopped them from being useful.

I'm not an expert and this was all a long time ago for me, but the QDI
wiki page doesn't disagree with what I'm remembering here, and has some
good references on the topic:
https://en.wikipedia.org/wiki/Quasi-delay-insensitive_circuit#Stability_and_non-interference
https://en.wikipedia.org/wiki/Quasi-delay-insensitive_circuit#Timing


___
Bloat mailing list
Bloat@lists.bufferbloat.net
https://lists.bufferbloat.net/listinfo/bloat


Re: [Bloat] one benefit of turning off shaping + fq_codel

2018-11-27 Thread Holland, Jake
On 2018-11-23, 08:33, "Dave Taht" wrote:
Back in the day, I was a huge fan of async logic, which I first
encountered via caltech's cpu and later the amulet.

https://en.wikipedia.org/wiki/Asynchronous_circuit#Asynchronous_CPU

...

I've never really understood why it didn't take off. I think, in part,
it doesn't scale to wide busses well, and centrally clocked designs
are how most engineers and FPGAs and code got designed since. Anything
with delay built into it seems hard for EEs to grasp, but I wish I
knew why, or had the time to go play with circuits again at a
reasonable scale.

At the time, I was told the objections they got were that it uses about
2x the space for the same functionality, that chip cost is approximately
linear with space usage, and that under load you still need reasonable
cooling, so it was only considered maybe worthwhile for some narrow use
cases.

I don't really know enough to confirm or deny the claim, and the use cases may 
have gotten a lot closer to a good match by now, but this was the opinion of at 
least some of the people involved with the work, IIRC.


___
Bloat mailing list
Bloat@lists.bufferbloat.net
https://lists.bufferbloat.net/listinfo/bloat