On Tue, Oct 14, 2025 at 9:28 PM Kazuho Oku <[email protected]> wrote:
>
>
> On Tue, Oct 14, 2025 at 11:45 PM Ian Swett <[email protected]> wrote:
>
>> Thanks for bringing this up, Kazuho. My back-of-the-envelope math also
>> indicated that 1/3 was a better value than 1/2 when I looked into it a few
>> years ago, but I never constructed a clean test to prove it with a real
>> congestion controller. Unfortunately, our congestion control simulator is
>> below our flow control layer.
>>
>> It probably makes sense to test this in the real world and see if
>> reducing it to 1/3 measurably reduces the number of blocked frames we
>> receive on our servers.
>>
>
> Makes perfect sense. In fact, that was how we noticed the problem —
> someone asked us why the H3 traffic we were serving was slower than H2.
> Looking at the stats, we saw that we were receiving blocked frames, and
> ended up reading the client-side source code to identify the bug.
>
>
>> There are use cases where auto-tuning is nice. Even for Chrome, there are
>> cases where, if we had started with a smaller stream flow control window,
>> we would have avoided some bugs where a few streams consume the entire
>> connection flow control window.
>>
>
> Yeah, it can certainly be useful at times to block the sender's progress
> so that resources can be utilized elsewhere.
>
> That said, blocking Slow Start from making progress is a different matter
> — especially after spending so much effort developing QUIC based on the
> idea that reducing startup latency by one RTT is worth it.
>

I completely agree. Is there a good heuristic for guessing whether the peer
is still in slow start, particularly when one doesn't know which congestion
controller they're using? One could certainly use a heuristic such as
assuming the peer might be in slow start for the first N packets of the
connection and then changing strategies, but it's clearly not perfect.

>
>> Thanks, Ian
>>
>> On Tue, Oct 14, 2025 at 8:49 AM Kazuho Oku <[email protected]> wrote:
>>
>>>
>>>
>>> On Tue, Oct 14, 2025 at 7:36 PM Max Inden <[email protected]> wrote:
>>>
>>>> * Send MAX_DATA / MAX_STREAM_DATA no later than when 33% (i.e., 1/3) of
>>>> the credit is consumed.
>>>>
>>>> Firefox will send MAX_STREAM_DATA after 25% of the credit has been
>>>> consumed.
>>>>
>>>> https://github.com/mozilla/neqo/blob/791fd40fb7e9ee4599c07c11695d1849110e704b/neqo-transport/src/fc.rs#L30-L37
>>>>
>>>> * Instead of doubling (x2) the window size, increase it by a larger
>>>> factor (e.g., x4).
>>>>
>>>> Firefox will increase the window by up to 4x the overshoot of the
>>>> current BDP estimate.
>>>>
>>>
>>> Good to know that Firefox uses these numbers. They look fine to me,
>>> though depending on the size of the initial credit, Careful Resume might
>>> get blocked.
>>>
>>>>
>>>> https://github.com/mozilla/neqo/blob/791fd40fb7e9ee4599c07c11695d1849110e704b/neqo-transport/src/fc.rs#L402-L409
>>>>
>>>> * Disable auto-tuning entirely (it's needed only for latency-sensitive
>>>> applications).
>>>>
>>>> What would be a reasonable one-size-fits-all stream data window size,
>>>> which at the same time doesn't expose the receiver to a memory exhaustion
>>>> attack?
>>>>
>>>> Because it is difficult to estimate the sender's initial window and how
>>>> quickly it ramps up - especially with algorithms like Careful Resume, which
>>>> don't use Slow Start - my preference is to disable auto-tuning by default.
>>>>
>>>> Wouldn't a high but reasonable start value + window auto-tuning be
>>>> ideal?
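To make the numbers in this part of the thread concrete, here is a minimal
receiver-side sketch of the kind of auto-tuning being discussed: send an
update no later than when a third of the credit has been consumed, and grow
the window by 4x while the sender keeps using credit as fast as it is
granted. The names, the structure, and the updates_are_frequent signal are
hypothetical; this is not code from neqo, Quiche, or ngtcp2.

// Illustrative receiver-side flow control sketch (hypothetical; not taken
// from neqo, Quiche, or ngtcp2).
struct RxFlowControl {
    max_data: u64, // highest limit advertised to the peer so far
    consumed: u64, // bytes of credit the peer has used up
    window: u64,   // credit granted at the last update
    cap: u64,      // local memory cap; a policy choice, unrelated to sender speed
}

impl RxFlowControl {
    /// Called whenever stream data is consumed. Returns a new limit to
    /// advertise in MAX_DATA / MAX_STREAM_DATA, if an update is due.
    /// `updates_are_frequent` stands in for "the previous update was sent
    /// less than roughly one RTT ago", i.e. the sender is consuming credit
    /// about as fast as we grant it.
    fn on_consumed(&mut self, bytes: u64, updates_are_frequent: bool) -> Option<u64> {
        self.consumed += bytes;
        let remaining = self.max_data - self.consumed;

        // Send an update no later than when 1/3 of the window has been
        // consumed (i.e. at most 2/3 of it remains), instead of waiting
        // for the 1/2 point.
        if remaining * 3 > self.window * 2 {
            return None;
        }

        // Grow by 4x rather than 2x while the sender keeps consuming
        // credit every round trip, so the advertised limit can stay ahead
        // of a congestion window that doubles per RTT during Slow Start.
        if updates_are_frequent {
            self.window = (self.window * 4).min(self.cap);
        }

        self.max_data = self.consumed + self.window;
        Some(self.max_data)
    }
}

Whatever the exact constants, the point is that the update threshold and the
growth factor together determine whether flow control can stay ahead of a
sender that doubles its sending rate every round trip.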
>>>>
>>>
>>> Yeah, I think there's often confusion between two distinct aspects:
>>> a) the maximum buffer size that the receiver can allocate, and
>>> b) how fast the sender might transmit.
>>>
>>> A is what receivers need to prevent memory-exhaustion attacks. It's
>>> purely a local policy: the limit might be 1 MB or 10 MB, but it's unrelated
>>> to B — that is, it doesn't depend on how quickly the sender sends.
>>>
>>> For latency-sensitive applications that read slowly, it's important to
>>> cap the receive buffer at roughly read_speed × latency, because otherwise
>>> bufferbloat increases latency. But again, that consideration is separate
>>> from B.
>>>
>>> In my view, B mainly concerns minimizing the amount of memory allocated
>>> inside the kernel. Kernel-space memory management is far more constrained
>>> than in user space: allocations often have to be contiguous, and falling
>>> back to swap is not an option. Note also that the TCP/IP stack is decades
>>> old, from an era when memory was a much more precious resource than it is
>>> today.
>>>
>>> In contrast, a QUIC stack running in user space can rely on virtual
>>> memory, where fragmentation is rarely a real issue. When a user-space
>>> buffer fills up, the program can simply call realloc() and append
>>> data—possibly incurring operations such as virtual-memory remapping or
>>> paging. There is no need to pre-reserve large contiguous chunks of memory.
>>>
>>> To summarize, there is far less need in QUIC, if any, to minimize the
>>> receive window advertised to the peer, compared to what was necessary for
>>> in-kernel TCP.
>>>
>>>
>>>> On 14/10/2025 03.38, Kazuho Oku wrote:
>>>>
>>>>
>>>> On Mon, Sep 29, 2025 at 4:28 PM Max Inden <[email protected]> wrote:
>>>>
>>>>> For what it is worth, also referencing previous discussion on this
>>>>> list:
>>>>>
>>>>> "Why isn't QUIC growing?"
>>>>> https://mailarchive.ietf.org/arch/msg/quic/RBhFFY3xcGRdBEdkYmTK2k926mQ/
>>>>
>>>> Reading the old thread, I'm reminded that people often assume QUIC
>>>> performs better than TCP. However, that is true only when the QUIC stack is
>>>> implemented, configured, and deployed correctly.
>>>>
>>>> One bug I've seen in multiple stacks - one that significantly affects
>>>> benchmark results - is the failure to auto-tune the receive window as
>>>> aggressively as the sender's Slow Start allows.
>>>>
>>>> Based on my understanding, Google Quiche implements receive window
>>>> auto-tuning as follows:
>>>> * Send MAX_DATA / MAX_STREAM_DATA when 50% of the credit has been
>>>> consumed.
>>>> * Double the window size when these frames are sent frequently.
>>>>
>>>> Several other stacks have adopted this approach.
>>>>
>>>> The problem with this logic is that it's too conservative and causes
>>>> the sender to become flow-control-blocked during Slow Start.
>>>>
>>>> Consider the following example:
>>>> 1. The receiver advertises an initial Maximum Data of W.
>>>> 2. After receiving 0.5W bytes, the receiver sends Maximum Data=2.5W
>>>> along with ACKs up to W/2. The next Maximum Data will be sent once the
>>>> receiver has received 1.5W bytes.
>>>> 3. The receiver receives bytes up to W and ACKs them.
>>>> 4. At this point, the sender's Slow Start permits transmission up to 2W
>>>> bytes, but the advertised receive window is only 1.5W. As a result, the
>>>> connection becomes flow-control-blocked.
>>>>
>>>> There are several ways to address this issue:
>>>> * Send MAX_DATA / MAX_STREAM_DATA no later than when 33% (i.e., 1/3) of
>>>> the credit is consumed.
>>>> * Instead of doubling (x2) the window size, increase it by a larger
>>>> factor (e.g., x4).
>>>> * Disable auto-tuning entirely (it's needed only for latency-sensitive
>>>> applications).
>>>>
>>>> Because it is difficult to estimate the sender's initial window and how
>>>> quickly it ramps up - especially with algorithms like Careful Resume, which
>>>> don't use Slow Start - my preference is to disable auto-tuning by default.
>>>>
>>>> In fact, this is also the choice made by Chromium, which is why it is
>>>> not affected by this bug!
>>>>
>>>> For reference, Tatsuhiro addressed this issue in ngtcp2 in the
>>>> following PRs:
>>>> * https://github.com/ngtcp2/ngtcp2/pull/1396 - Tweak threshold for
>>>> max_stream_data and max_data transmission
>>>> * https://github.com/ngtcp2/ngtcp2/pull/1397 - Add note for window
>>>> auto-tuning
>>>> * https://github.com/ngtcp2/ngtcp2/pull/1398 - examples/client:
>>>> Disable window auto-tuning by default
>>>>
>>>> However, I suspect the bug may still exist in other stacks.
>>>>
>>>> On 29/09/2025 05.38, Lars Eggert wrote:
>>>>>
>>>>> Hi,
>>>>>
>>>>> pitch for a discussion at 124.
>>>>>
>>>>> https://radar.cloudflare.com/
>>>>> <https://radar.cloudflare.com/adoption-and-usage?dateRange=52w> and
>>>>> similar stats have had H3 around 30% for a few years now, with little
>>>>> change since the first quick ramp up to that level.
>>>>>
>>>>> Topic: why is that and is there anything the WG or IETF can do to
>>>>> change it (upwards, of course)?
>>>>>
>>>>> Thanks,
>>>>> Lars
>>>>> --
>>>>> Sent from a mobile device; please excuse typos.
>>>>>
>>>>
>>>> --
>>>> Kazuho Oku
>>>>
>>>
>>> --
>>> Kazuho Oku
>>
>
> --
> Kazuho Oku
>
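As a postscript to the example quoted above (receiver advertises W, updates
at 50% consumed, doubles the window): here is a rough back-of-the-envelope
sketch of the arithmetic, with an idealized sender whose Slow Start allowance
simply doubles per round trip. The numbers are only meant to reproduce the
quoted example, not to model a real congestion controller.

// Rough arithmetic for the quoted example; W is the initial MAX_DATA.
// Idealized: Slow Start is assumed to allow ~2W of new data in flight
// once the first W bytes have been ACKed, as in the example.
fn main() {
    let w: f64 = 1.0; // everything below is in units of W

    // Receiver with 50% threshold and 2x growth: after 0.5W is consumed,
    // the window grows from W to 2W and the advertised limit becomes
    // 0.5W + 2W = 2.5W. The next update only comes at 1.5W consumed.
    let advertised = 0.5 * w + 2.0 * w;

    // Once the first W bytes are sent and ACKed, Slow Start permits ~2W of
    // additional data, but only (2.5W - W) of flow-control credit remains.
    let slow_start_allows = 2.0 * w;
    let credit_left = advertised - 1.0 * w;

    println!("slow start allows   : {slow_start_allows:.1} x W");
    println!("flow control allows : {credit_left:.1} x W");
    println!(
        "sender is flow-control-blocked by {:.1} x W",
        slow_start_allows - credit_left
    );
}

With a 1/3 threshold and 4x growth, the first update would instead advertise
roughly W/3 + 4W ≈ 4.3W, so the remaining credit at the point above would be
about 3.3W rather than 1.5W, well ahead of the 2W that Slow Start permits.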
