Re: Buggy receive window auto-tuning inside multiple QUIC stacks (Re: More QUIC, how?)

Kazuho Oku Sat, 18 Oct 2025 08:48:23 -0700

On Tue, Oct 14, 2025 at 23:45 Ian Swett <[email protected]> wrote:

> Thanks for bringing this up, Kazuho.  My back-of-the-envelope math also
> indicated that 1/3 was a better value than 1/2 when I looked into it a few
> years ago, but I never constructed a clean test to prove it with a real
> congestion controller.  Unfortunately, our congestion control simulator is
> below our flow control layer.
>
> It probably makes sense to test this in the real world and see if reducing
> it to 1/3 measurably reduces the number of blocked frames we receive on our
> servers.
>


Makes perfect sense. In fact, that was how we noticed the problem — someone
asked us why the H3 traffic we were serving was slower than H2. Looking at
the stats, we saw that we were receiving blocked frames, and ended up
reading the client-side source code to identify the bug.


> There are use cases when auto-tuning is nice.  Even for Chrome, there are
> cases when if we started with a smaller stream flow control window, we
> would have avoided some bugs where a few streams consume the entire
> connection flow control window.
>

Yeah, it can certainly be useful at times to block the sender’s progress so
that resources can be utilized elsewhere.

That said, blocking Slow Start from making progress is a different matter —
especially after spending so much effort developing QUIC based on the idea
that reducing startup latency by one RTT is worth it.


>
> Thanks, Ian
>
> On Tue, Oct 14, 2025 at 8:49 AM Kazuho Oku <[email protected]> wrote:
>
>>
>>
>> On Tue, Oct 14, 2025 at 19:36 Max Inden <[email protected]> wrote:
>>
>>> * Send MAX_DATA / MAX_STREAM_DATA no later than when 33% (i.e., 1/3) of
>>> the credit is consumed.
>>>
>>> Firefox will send MAX_STREAM_DATA after 25% of the credit has been
>>> consumed.
>>>
>>>
>>> https://github.com/mozilla/neqo/blob/791fd40fb7e9ee4599c07c11695d1849110e704b/neqo-transport/src/fc.rs#L30-L37
>>>
>>> * Instead of doubling (x2) the window size, increase it by a larger
>>> factor (e.g., x4).
>>>
>>> Firefox will increase the window by up to 4x the overshoot of the
>>> current BDP estimate.
>>>
>>
>> Good to know that Firefox uses these numbers. They look fine to me,
>> though depending on the size of the initial credit, Careful Resume might
>> get blocked.
>>
>>>
>>> https://github.com/mozilla/neqo/blob/791fd40fb7e9ee4599c07c11695d1849110e704b/neqo-transport/src/fc.rs#L402-L409
>>>
>>> * Disable auto-tuning entirely (it's needed only for latency-sensitive
>>> applications).
>>>
>>> What would be a reasonable one-size-fits-all stream data window size,
>>> which at the same time doesn't expose the receiver to a memory exhaustion
>>> attack?
>>>
>>> Because it is difficult to estimate the sender's initial window and how
>>> quickly it ramps up - especially with algorithms like Careful Resume, which
>>> don't use Slow Start - my preference is to disable auto-tuning by default.
>>>
>>> Wouldn't a high but reasonable start value + window auto-tuning be ideal?
>>>
>>
>>
>> Yeah, I think there’s often confusion between two distinct aspects:
>> a) the maximum buffer size that the receiver can allocate, and
>> b) how fast the sender might transmit.
>>
>> A is what receivers need to prevent memory-exhaustion attacks. It’s
>> purely a local policy: the limit might be 1 MB or 10 MB, but it’s unrelated
>> to B — that is, it doesn’t depend on how quickly the sender sends.
>>
>> For latency-sensitive applications that read slowly, it’s important to
>> cap the receive buffer at roughly read_speed × latency, because otherwise
>> bufferbloat increases latency. But again, that consideration is separate
>> from B.
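
As a rough illustration of keeping the memory limit (a) and the latency
cap separate from the sender's rate, the sketch below derives a receive
buffer cap from a read speed and a latency budget, then bounds it by a
local memory limit. The function name, the 10 MB limit, and the example
rates are assumptions made for illustration; they are not taken from any
QUIC stack.

```python
def receive_buffer_cap(read_speed_bytes_per_s: int, latency_budget_ms: int,
                       memory_limit_bytes: int) -> int:
    """Cap the receive buffer at read_speed x latency, bounded by local policy."""
    # Bytes the application can drain within the latency budget.
    latency_cap = read_speed_bytes_per_s * latency_budget_ms // 1000
    # The memory limit is a local policy, independent of the sender's rate.
    return min(latency_cap, memory_limit_bytes)


MEMORY_LIMIT = 10_000_000  # hypothetical local policy: 10 MB

# An application reading 2 MB/s that tolerates 50 ms of buffering delay.
slow_reader = receive_buffer_cap(2_000_000, 50, MEMORY_LIMIT)
# A bulk transfer reading 1 GB/s is bounded by the memory limit instead.
bulk = receive_buffer_cap(1_000_000_000, 50, MEMORY_LIMIT)
print(slow_reader, bulk)
```

Neither result depends on how fast the sender transmits, which is the
point being made about B.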
>>
>> In my view, B mainly concerns minimizing the amount of memory allocated
>> inside the kernel. Kernel-space memory management is far more constrained
>> than in user space: allocations often have to be contiguous, and falling
>> back to swap is not an option. Note also that the TCP/IP stack is decades
>> old, from an era when memory was a much more precious resource than it is
>> today.
>>
>> In contrast, a QUIC stack running in user space can rely on virtual
>> memory, where fragmentation is rarely a real issue. When a user-space
>> buffer fills up, the program can simply call realloc() and append
>> data—possibly incurring operations such as virtual-memory remapping or
>> paging. There is no need to pre-reserve large contiguous chunks of memory.
>>
>> To summarize, there is far less need in QUIC, if any, to minimize the
>> receive window advertised to the peer, compared to what was necessary for
>> in-kernel TCP.
>>
>>
>>> On 14/10/2025 03.38, Kazuho Oku wrote:
>>>
>>>
>>>
>>> On Mon, Sep 29, 2025 at 16:28 Max Inden <[email protected]> wrote:
>>>
>>>> For what it is worth, also referencing previous discussion on this list:
>>>>
>>>> "Why isn't QUIC growing?"
>>>>
>>>> https://mailarchive.ietf.org/arch/msg/quic/RBhFFY3xcGRdBEdkYmTK2k926mQ/
>>>>
>>>
>>> Reading the old thread, I'm reminded that people often assume QUIC
>>> performs better than TCP. However, that is true only when the QUIC stack is
>>> implemented, configured, and deployed correctly.
>>>
>>> One bug I've seen in multiple stacks - one that significantly affects
>>> benchmark results - is the failure to auto-tune the receive window as
>>> aggressively as the sender's Slow Start allows.
>>>
>>> Based on my understanding, Google Quiche implements receive window
>>> auto-tuning as follows:
>>> * Send MAX_DATA / MAX_STREAM_DATA when 50% of the credit has been
>>> consumed.
>>> * Double the window size when these frames are sent frequently.
>>>
>>> Several other stacks have adopted this approach.
>>>
>>> The problem with this logic is that it's too conservative and causes the
>>> sender to become flow-control-blocked during Slow Start.
>>>
>>> Consider the following example:
>>> 1. The receiver advertises an initial Maximum Data of W.
>>> 2. After receiving 0.5W bytes, the receiver sends Maximum Data=2.5W
>>> along with ACKs up to W/2. The next Maximum Data will be sent once the
>>> receiver has received 1.5W bytes.
>>> 3. The receiver receives bytes up to W and ACKs them.
>>> 4. At this point, the sender's Slow Start permits transmission up to 2W
>>> bytes, but the advertised receive window is only 1.5W. As a result, the
>>> connection becomes flow-control-blocked.
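
The four steps above can be traced with exact arithmetic. This is only a
sketch of the example, not code from any stack. It assumes that the
sender's first flight is exactly W bytes, that Slow Start has doubled the
congestion window to 2W once that flight is ACKed, and that "1.5W" in
step 4 means the credit remaining beyond the ACKed offset. All variable
names are made up for the illustration.

```python
from fractions import Fraction

W = Fraction(1)             # initial Maximum Data, used as the unit
THRESHOLD = Fraction(1, 2)  # send an update once 50% of the credit is consumed
GROWTH = 2                  # the window is doubled on that update

# Step 1: the receiver advertises Maximum Data = W.
max_data = W

# Step 2: after 0.5W bytes, a doubled window is advertised on top of the
# consumed offset, and the next update is scheduled.
consumed = THRESHOLD * W
window = GROWTH * W
max_data = consumed + window                 # 2.5W
next_update = consumed + THRESHOLD * window  # 1.5W

# Step 3: the rest of the first flight arrives; W < 1.5W, so no new update.
acked = W
update_sent_in_step_3 = acked >= next_update

# Step 4: compare what Slow Start permits with the credit that is left.
cwnd = 2 * W                          # assumption: initial cwnd W, doubled
remaining_credit = max_data - acked   # 1.5W
blocked = remaining_credit < cwnd

print(f"max_data={max_data}W next_update={next_update}W "
      f"credit={remaining_credit}W cwnd={cwnd}W blocked={blocked}")
```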
>>>
>>> There are several ways to address this issue:
>>> * Send MAX_DATA / MAX_STREAM_DATA no later than when 33% (i.e., 1/3) of
>>> the credit is consumed.
>>> * Instead of doubling (x2) the window size, increase it by a larger
>>> factor (e.g., x4).
>>> * Disable auto-tuning entirely (it's needed only for latency-sensitive
>>> applications).
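
To compare these options, here is a minimal model of the first round
trip only. It is a sketch under stated assumptions, not a description of
any implementation: the unit is the sender's first flight (assumed equal
to its initial congestion window), every update grows the window by the
given factor (that is, updates always count as "frequent"), and Slow
Start has doubled the congestion window once the first flight is ACKed.
The function and the option labels are hypothetical.

```python
from fractions import Fraction


def first_round(threshold, growth, initial_window=1):
    """Return (remaining_credit, cwnd) once the first flight is ACKed.

    All values are in units of the sender's first flight.
    """
    threshold = Fraction(threshold)
    window = Fraction(initial_window)
    base = Fraction(0)          # offset at which the current credit was issued
    max_data = base + window    # currently advertised limit
    received = Fraction(1)      # the whole first flight has arrived
    # Issue an update each time `threshold` of the current window is consumed.
    while base + threshold * window <= received:
        base += threshold * window
        window *= growth
        max_data = base + window
    cwnd = Fraction(2)          # idealized Slow Start after one full flight
    return max_data - received, cwnd


options = {
    "1/2 threshold, x2 growth": (Fraction(1, 2), 2, 1),
    "1/3 threshold, x2 growth": (Fraction(1, 3), 2, 1),
    "1/2 threshold, x4 growth": (Fraction(1, 2), 4, 1),
    "fixed window of 16 flights": (Fraction(1, 2), 1, 16),
}
for name, args in options.items():
    credit, cwnd = first_round(*args)
    print(f"{name}: credit={credit} cwnd={cwnd} blocked={credit < cwnd}")
```

Under these assumptions only the first configuration leaves less credit
than the congestion window. The 1/3 case passes because a second update
falls exactly at the end of the first flight, so the outcome is sensitive
to the sender's real initial window and ramp-up, which is the difficulty
described next.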
>>>
>>> Because it is difficult to estimate the sender's initial window and how
>>> quickly it ramps up - especially with algorithms like Careful Resume, which
>>> don't use Slow Start - my preference is to disable auto-tuning by default.
>>>
>>> In fact, this is also the choice made by Chromium, which is why it is
>>> not affected by this bug!
>>>
>>> For reference, Tatsuhiro addressed this issue in ngtcp2 in the
>>> following PRs:
>>> * https://github.com/ngtcp2/ngtcp2/pull/1396 - Tweak threshold for
>>> max_stream_data and max_data transmission
>>> * https://github.com/ngtcp2/ngtcp2/pull/1397 - Add note for window
>>> auto-tuning
>>> * https://github.com/ngtcp2/ngtcp2/pull/1398 - examples/client: Disable
>>> window auto-tuning by default
>>>
>>> However, I suspect the bug may still exist in other stacks.
>>>
>>> On 29/09/2025 05.38, Lars Eggert wrote:
>>>>
>>>> Hi,
>>>>
>>>> pitch for a discussion at 124.
>>>>
>>>> https://radar.cloudflare.com/
>>>> <https://radar.cloudflare.com/adoption-and-usage?dateRange=52w> and
>>>> similar stats have had H3 around 30% for a few years now, with little
>>>> change since the first quick ramp up to that level.
>>>>
>>>> Topic: why is that and is there anything the WG or IETF can do to
>>>> change it (upwards, of course)?
>>>>
>>>> Thanks,
>>>> Lars
>>>> --
>>>> Sent from a mobile device; please excuse typos.
>>>>
>>>>
>>>
>>> --
>>> Kazuho Oku
>>>
>>>
>>
>> --
>> Kazuho Oku
>>
>

-- 
Kazuho Oku
