Max Inden mentioned "careful resume",
 "Because it is difficult to estimate the sender's initial window and how
quickly it ramps up - especially with algorithms like Careful Resume, which
don't use Slow Start - my preference is to disable auto tuning by default."

If you do use Careful Resume, you probably want to ensure that the MAX_DATA
limit is larger than the remembered CWND. But of course, that's only useful
if there are enough bytes to send in the active streams.
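
As a rough illustration of the point above, the sketch below takes the
minimum of the three limits mentioned: the remembered CWND, the remaining
MAX_DATA credit, and the bytes queued on active streams. The function and
parameter names are hypothetical, and the sketch does not model how Careful
Resume itself chooses its jump size.

```python
def careful_resume_budget(saved_cwnd, max_data, bytes_sent, bytes_pending):
    """Bytes a resuming sender could put in flight immediately (hypothetical helper).

    saved_cwnd    -- congestion window remembered from an earlier connection
    max_data      -- connection-level limit advertised by the peer
    bytes_sent    -- bytes already sent on this connection
    bytes_pending -- bytes queued for sending on active streams
    """
    flow_control_credit = max(0, max_data - bytes_sent)
    return min(saved_cwnd, flow_control_credit, bytes_pending)


# MAX_DATA smaller than the remembered CWND: flow control is the bottleneck.
print(careful_resume_budget(1_000_000, 256_000, 0, 5_000_000))

# MAX_DATA is ample, but the streams have little to send.
print(careful_resume_budget(1_000_000, 4_000_000, 0, 300_000))
```

In the first call the result is the 256,000 bytes of credit; in the second it
is the 300,000 bytes that are actually queued.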

-- Christian Huitema


On 10/14/2025 6:27 PM, Kazuho Oku wrote:
On Tue, Oct 14, 2025 at 23:45, Ian Swett <[email protected]> wrote:

Thanks for bringing this up, Kazuho.  My back-of-the-envelope math also
indicated that 1/3 was a better value than 1/2 when I looked into it a few
years ago, but I never constructed a clean test to prove it with a real
congestion controller.  Unfortunately, our congestion control simulator is
below our flow control layer.

It probably makes sense to test this in the real world and see if reducing
it to 1/3 measurably reduces the number of blocked frames we receive on our
servers.

Makes perfect sense. In fact, that was how we noticed the problem — someone
asked us why the H3 traffic we were serving was slower than H2. Looking at
the stats, we saw that we were receiving blocked frames, and ended up
reading the client-side source code to identify the bug.


There are use cases where auto-tuning is nice.  Even for Chrome, there are
cases where starting with a smaller stream flow control window would have
avoided some bugs in which a few streams consume the entire connection
flow control window.

Yeah, it can certainly be useful at times to block the sender’s progress so
that resources can be utilized elsewhere.

That said, blocking Slow Start from making progress is a different matter —
especially after spending so much effort developing QUIC based on the idea
that reducing startup latency by one RTT is worth it.


Thanks, Ian

On Tue, Oct 14, 2025 at 8:49 AM Kazuho Oku <[email protected]> wrote:


On Tue, Oct 14, 2025 at 19:36, Max Inden <[email protected]> wrote:

* Send MAX_DATA / MAX_STREAM_DATA no later than when 33% (i.e., 1/3) of
the credit is consumed.

Firefox will send MAX_STREAM_DATA after 25% of the credit has been
consumed.


https://github.com/mozilla/neqo/blob/791fd40fb7e9ee4599c07c11695d1849110e704b/neqo-transport/src/fc.rs#L30-L37

* Instead of doubling (x2) the window size, increase it by a larger
factor (e.g., x4).

Firefox will increase the window by up to 4x the overshoot of the
current BDP estimate.

Good to know that Firefox uses these numbers. They look fine to me,
though depending on the size of the initial credit, Careful Resume might
get blocked.

https://github.com/mozilla/neqo/blob/791fd40fb7e9ee4599c07c11695d1849110e704b/neqo-transport/src/fc.rs#L402-L409

* disable auto tuning entirely (it's needed only for latency-sensitive
applications).

What would be a reasonable one-size-fits-all stream data window size,
which at the same time doesn't expose the receiver to a memory exhaustion
attack?

Because it is difficult to estimate the sender's initial window and how
quickly it ramps up - especially with algorithms like Careful Resume, which
don't use Slow Start - my preference is to disable auto tuning by default.

Wouldn't a high but reasonable start value + window auto-tuning be ideal?


Yeah, I think there’s often confusion between two distinct aspects:
a) the maximum buffer size that the receiver can allocate, and
b) how fast the sender might transmit.

A is what receivers need to prevent memory-exhaustion attacks. It’s
purely a local policy: the limit might be 1 MB or 10 MB, but it’s unrelated
to B — that is, it doesn’t depend on how quickly the sender sends.

For latency-sensitive applications that read slowly, it’s important to
cap the receive buffer at roughly read_speed × latency, because otherwise
bufferbloat increases latency. But again, that consideration is separate
from B.
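
A small sketch of how the two receiver-side limits above could be combined:
the local memory policy (A) and, for slow readers that care about latency,
the read_speed × latency cap. All names and the example figures are
hypothetical; neither limit depends on how fast the sender transmits.

```python
def receive_buffer_cap(max_buffer_bytes, read_bytes_per_sec=None,
                       target_latency_ms=None):
    """Upper bound on the receive buffer, in bytes (hypothetical helper).

    max_buffer_bytes   -- local memory policy, e.g. 1 MB or 10 MB
    read_bytes_per_sec -- how fast the application drains the buffer
    target_latency_ms  -- queueing delay the application can tolerate
    """
    cap = max_buffer_bytes
    if read_bytes_per_sec is not None and target_latency_ms is not None:
        # Anything buffered beyond read_speed * latency only adds delay.
        cap = min(cap, read_bytes_per_sec * target_latency_ms // 1000)
    return cap


# Bulk transfer: only the memory policy applies.
print(receive_buffer_cap(10_000_000))

# Latency-sensitive reader draining 1 MB/s with a 100 ms budget.
print(receive_buffer_cap(10_000_000, 1_000_000, 100))
```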

In my view, B mainly concerns minimizing the amount of memory allocated
inside the kernel. Kernel-space memory management is far more constrained
than in user space: allocations often have to be contiguous, and falling
back to swap is not an option. Note also that the TCP/IP stack is decades
old, from an era when memory was a much more precious resource than it is
today.

In contrast, a QUIC stack running in user space can rely on virtual
memory, where fragmentation is rarely a real issue. When a user-space
buffer fills up, the program can simply call realloc() and append
data—possibly incurring operations such as virtual-memory remapping or
paging. There is no need to pre-reserve large contiguous chunks of memory.

To summarize, there is far less need in QUIC, if any, to minimize the
receive window advertised to the peer, compared to what was necessary for
in-kernel TCP.


On 14/10/2025 03.38, Kazuho Oku wrote:



On Mon, Sep 29, 2025 at 16:28, Max Inden <[email protected]> wrote:

For what it is worth, also referencing previous discussion on this list:

"Why isn't QUIC growing?"

https://mailarchive.ietf.org/arch/msg/quic/RBhFFY3xcGRdBEdkYmTK2k926mQ/

Reading the old thread, I'm reminded that people often assume QUIC
performs better than TCP. However, that is true only when the QUIC stack is
implemented, configured, and deployed correctly.

One bug I've seen in multiple stacks - one that significantly affects
benchmark results - is the failure to auto-tune the receive window as
aggressively as the sender's Slow Start allows.

Based on my understanding, Google Quiche implements receive window
auto-tuning as follows:
* Send MAX_DATA / MAX_STREAM_DATA when 50% of the credit has been
consumed.
* Double the window size when these frames are sent frequently.
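
A minimal sketch of the logic described in the two bullets above. It is not
Google Quiche's code: the class, its parameters, and the "less than two RTTs
since the previous update" reading of "frequently" are all assumptions made
for illustration. The threshold and growth factor are parameters so that
other values (such as 1/3 and x4) can be plugged in.

```python
class AutoTunedWindow:
    """Receiver-side flow-control window with auto-tuning (hypothetical sketch)."""

    def __init__(self, window, max_window, threshold=0.5, growth=2, start=0.0):
        self.window = window          # current window size in bytes
        self.max_window = max_window  # local memory policy, never exceeded
        self.threshold = threshold    # fraction of credit consumed before updating
        self.growth = growth          # multiplier applied when updates are frequent
        self.max_data = window        # absolute limit currently advertised
        self.last_update = start      # connection start counts as an update

    def on_consumed(self, consumed, now, rtt):
        """Return a new MAX_DATA value to send, or None if no update is due.

        consumed is the total number of bytes consumed so far.
        """
        credit_used = consumed - (self.max_data - self.window)
        if credit_used < self.threshold * self.window:
            return None
        # Assumption: "frequently" means less than 2 RTTs since the last update.
        if now - self.last_update < 2 * rtt:
            self.window = min(self.window * self.growth, self.max_window)
        self.last_update = now
        self.max_data = consumed + self.window
        return self.max_data


fc = AutoTunedWindow(window=1000, max_window=8000)
print(fc.on_consumed(400, now=0.01, rtt=0.05))   # below the 50% threshold
print(fc.on_consumed(500, now=0.05, rtt=0.05))   # update, window doubled
print(fc.on_consumed(1500, now=0.10, rtt=0.05))  # next update, doubled again
```

With an initial window of 1000 bytes, the first update advertises 2500 and
the next one becomes due once 1500 bytes have been consumed.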

Several other stacks have adopted this approach.

The problem with this logic is that it's too conservative and causes the
sender to become flow-control-blocked during Slow Start.

Consider the following example:
1. The receiver advertises an initial Maximum Data of W.
2. After receiving 0.5W bytes, the receiver sends Maximum Data=2.5W
along with ACKs up to W/2. The next Maximum Data will be sent once the
receiver has received 1.5W bytes.
3. The receiver receives bytes up to W and ACKs them.
4. At this point, the sender's Slow Start permits up to 2W bytes in flight,
but the remaining flow-control credit is only 1.5W (the advertised limit of
2.5W minus the W bytes already sent). As a result, the connection becomes
flow-control-blocked.
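
The arithmetic in the example can be replayed as follows, in units of W. The
sketch assumes the sender's initial congestion window equals W, that Slow
Start has doubled it to 2W once the first W bytes are acknowledged, and that
nothing is lost; the function name is made up for illustration.

```python
from fractions import Fraction


def replay_example(w=Fraction(1)):
    """Replay steps 1-4 of the example above, in units of W."""
    received = w / 2                        # step 2: half the credit consumed
    window = 2 * w                          # receiver doubles its window
    max_data = received + window            # advertised limit: 2.5W
    next_update_at = received + window / 2  # next MAX_DATA due at 1.5W
    sent_and_acked = w                      # step 3: first W bytes acknowledged
    cwnd = 2 * w                            # assumption: cwnd grew from W to 2W
    credit_left = max_data - sent_and_acked
    return {
        "max_data": max_data,
        "next_update_at": next_update_at,
        "cwnd": cwnd,
        "credit_left": credit_left,
        "blocked": credit_left < cwnd,
    }


for name, value in replay_example().items():
    print(f"{name}: {value}")
```

Under these assumptions the sender is allowed 2W by congestion control but
only 1.5W by flow control, and the next MAX_DATA is not due until the
receiver has seen 1.5W bytes.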

There are several ways to address this issue:
* Send MAX_DATA / MAX_STREAM_DATA no later than when 33% (i.e., 1/3) of
the credit is consumed.
* Instead of doubling (x2) the window size, increase it by a larger
factor (e.g., x4).
* disable auto tuning entirely (it's needed only for latency-sensitive
applications).

Because it is difficult to estimate the sender's initial window and how
quickly it ramps up - especially with algorithms like Careful Resume, which
don't use Slow Start - my preference is to disable auto tuning by default.

In fact, this is also the choice made by Chromium, which is why it is
not affected by this bug!

For reference, Tatsuhiro addressed this issue in ngtcp2 in the
following PRs:
* https://github.com/ngtcp2/ngtcp2/pull/1396 - Tweak threshold for
max_stream_data and max_data transmission
* https://github.com/ngtcp2/ngtcp2/pull/1397 - Add note for window
auto-tuning
* https://github.com/ngtcp2/ngtcp2/pull/1398 - examples/client: Disable
window auto-tuning by default

However, I suspect the bug may still exist in other stacks.

On 29/09/2025 05.38, Lars Eggert wrote:
Hi,

pitch for a discussion at 124.

https://radar.cloudflare.com/adoption-and-usage?dateRange=52w and
similar stats have had H3 around 30% for a few years now, with little
change since the first quick ramp up to that level.

Topic: why is that and is there anything the WG or IETF can do to
change it (upwards, of course)?

Thanks,
Lars
--
Sent from a mobile device; please excuse typos.


--
Kazuho Oku


--
Kazuho Oku

