On 10/16/2025 8:44 PM, Kazuho Oku wrote:
On Thu, Oct 16, 2025 at 11:17 AM Ian Swett <[email protected]> wrote:
On Tue, Oct 14, 2025 at 9:28 PM Kazuho Oku <[email protected]> wrote:
On Tue, Oct 14, 2025 at 11:45 PM Ian Swett <[email protected]> wrote:
Thanks for bringing this up, Kazuho. My back-of-the-envelope math also
indicated that 1/3 was a better value than 1/2 when I looked into it a few
years ago, but I never constructed a clean test to prove it with a real
congestion controller. Unfortunately, our congestion control simulator sits
below our flow control layer.
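
For concreteness, here is a minimal sketch of the kind of rule being
discussed, assuming the value in question is the fraction of the current
window that the peer must consume before the receiver re-advertises
MAX_STREAM_DATA; the names and constants are illustrative, not taken from
any particular stack.

/* Minimal sketch of a receiver-side flow control update rule. The
 * constant below is the fraction of the window that must be consumed
 * before a new MAX_STREAM_DATA limit is sent; names are hypothetical. */
#include <stdbool.h>
#include <stdint.h>

#define UPDATE_NUM 1
#define UPDATE_DEN 3   /* update after 1/3 of the window, instead of 1/2 */

typedef struct {
    uint64_t max_stream_data; /* limit currently advertised to the peer */
    uint64_t consumed;        /* bytes the application has consumed */
    uint64_t window;          /* size of the flow control window */
} stream_fc_t;

/* True if a new MAX_STREAM_DATA frame should be sent for this stream. */
static bool should_update_max_stream_data(const stream_fc_t *fc)
{
    uint64_t available = fc->max_stream_data - fc->consumed;
    uint64_t threshold = fc->window - (fc->window * UPDATE_NUM) / UPDATE_DEN;
    /* i.e. the consumed part of the window is >= UPDATE_NUM/UPDATE_DEN */
    return available <= threshold;
}

The earlier the update goes out, the more headroom a sender still doubling
in slow start has before it hits the advertised limit and sends
STREAM_DATA_BLOCKED.
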
It probably makes sense to test this in the real world and see if
reducing it to 1/3 measurably reduces the number of blocked frames we
receive on our servers.
Makes perfect sense. In fact, that was how we noticed the problem —
someone asked us why the H3 traffic we were serving was slower than H2.
Looking at the stats, we saw that we were receiving blocked frames, and
ended up reading the client-side source code to identify the bug.
There are use cases where auto-tuning is nice. Even for Chrome, there
are cases where, if we had started with a smaller stream flow control
window, we would have avoided some bugs in which a few streams consumed
the entire connection flow control window.
Yeah, it can certainly be useful at times to block the sender’s progress
so that resources can be utilized elsewhere.
That said, blocking Slow Start from making progress is a different matter
— especially after spending so much effort developing QUIC based on the
idea that reducing startup latency by one RTT is worth it.
I completely agree. Is there a good heuristic for guessing whether the
peer is still in slow start, particularly when one doesn't know what
congestion controller they're using?
IIUC, the primary intent of auto-tuning is to avoid bufferbloat when the
receiving application is slow to read.
The intent makes perfect sense, but I’m under the impression that the "old"
approach - estimating the sender’s rate and trying to stay slightly ahead
of it - is showing its age.
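
For reference, that rate-estimating approach usually amounts to something
like the sketch below: measure how much the sender delivers per RTT and
keep the advertised window a bit ahead of it. The names and constants are
illustrative assumptions, not any specific implementation.

/* Sketch of rate-based receive-window auto-tuning: estimate the sender's
 * per-RTT delivery and keep the advertised window slightly ahead of it.
 * Names and constants are illustrative. */
#include <stdint.h>

#define HEADROOM 2   /* stay "slightly ahead" of the measured rate */

typedef struct {
    uint64_t window;   /* receive window currently being advertised */
} autotune_t;

/* Called once per RTT with the number of bytes received during that RTT. */
static void autotune_on_rtt_sample(autotune_t *at, uint64_t bytes_in_rtt,
                                   uint64_t window_cap)
{
    uint64_t target = bytes_in_rtt * HEADROOM;
    if (target > at->window) {
        at->window = (target < window_cap) ? target : window_cap;
    }
}

Part of why it shows its age: a sender in slow start roughly doubles its
rate every RTT, so an estimate taken one RTT ago is already a factor of
two behind.
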
It really depends on what you want to achieve. As you mention later, it
depends on whether the QUIC stack delivers data to the application via a
queue or via a callback. If using a callback, by definition the stack
will not keep a queue of packets "received but not delivered". Any
buffer bloat will happen in the application. If it cannot keep up with
the rate at which the peer is sending, it will have to either buffer the
unprocessed data or drop it on the floor. If you don't want that to
happen, you need to either do a lot of guesswork to predict how fast the
application will process the data, or instead just provide an API that
lets the application control that. That's why I ended up implementing a
per-stream API in picoquic that lets the application either just process
data as it comes (the default), or take control and open the "max stream
data" parameter as it sees fit.
By the way, there is another issue beyond just "the receiver cannot cope".
Packets for a stream may be received out of order, and the stack can only
deliver them to the application in order. Suppose the stack has
increased "max stream data" enough to allow a thousand packets on the
stream. If the first packet is lost, the stack may have to buffer 999
packets until it receives the retransmission. So there is a direct
relation between "max stream data" times the number of streams and the
maximum memory that the stack will need. On a small device, one has to be
cautious. And if you have a memory budget, then it makes sense to just
enforce it using "max data".
-- Christian Huitema