On Tue, Oct 14, 2025 at 9:28 PM Kazuho Oku <[email protected]> wrote:
>
>
> On Tue, Oct 14, 2025 at 11:45 PM Ian Swett <[email protected]> wrote:
>
>> Thanks for bringing this up, Kazuho. My back-of-the-envelope math also
>> indicated that 1/3 was a better value than 1/2 when I looked into it a few
>> years ago, but I never constructed a clean test to prove it with a real
>> congestion controller. Unfortunately, our congestion control simulator is
>> below our flow control layer.
>>
>> It probably makes sense to test this in the real world and see if
>> reducing it to 1/3 measurably reduces the number of blocked frames we
>> receive on our servers.
>>
>
> Makes perfect sense. In fact, that was how we noticed the problem —
> someone asked us why the H3 traffic we were serving was slower than H2.
> Looking at the stats, we saw that we were receiving blocked frames, and
> ended up reading the client-side source code to identify the bug.
>
>
>> There are use cases where auto-tuning is nice. Even for Chrome, there are
>> cases where, if we had started with a smaller stream flow control window,
>> we would have avoided some bugs where a few streams consume the entire
>> connection flow control window.
>>
>
> Yeah, it can certainly be useful at times to block the sender's progress
> so that resources can be utilized elsewhere.
>
> That said, blocking Slow Start from making progress is a different matter
> — especially after spending so much effort developing QUIC based on the
> idea that reducing startup latency by one RTT is worth it.
>

I completely agree. Is there a good heuristic for guessing whether the peer
is still in slow start, particularly when one doesn't know which congestion
controller they're using? One could certainly use a heuristic such as
assuming the peer might be in slow start for the first N packets of the
connection and then changing strategies, but it's clearly not perfect.

>
>> Thanks, Ian
>>
>> On Tue, Oct 14, 2025 at 8:49 AM Kazuho Oku <[email protected]> wrote:
>>
>>>
>>>
>>> On Tue, Oct 14, 2025 at 7:36 PM Max Inden <[email protected]> wrote:
>>>
>>>> * Send MAX_DATA / MAX_STREAM_DATA no later than when 33% (i.e., 1/3) of
>>>> the credit is consumed.
>>>>
>>>> Firefox will send MAX_STREAM_DATA after 25% of the credit has been
>>>> consumed.
>>>>
>>>> https://github.com/mozilla/neqo/blob/791fd40fb7e9ee4599c07c11695d1849110e704b/neqo-transport/src/fc.rs#L30-L37
>>>>
>>>> * Instead of doubling (x2) the window size, increase it by a larger
>>>> factor (e.g., x4).
>>>>
>>>> Firefox will increase the window by up to 4x the overshoot of the
>>>> current BDP estimate.
>>>>
>>>
>>> Good to know that Firefox uses these numbers. They look fine to me,
>>> though depending on the size of the initial credit, Careful Resume might
>>> get blocked.
>>>
>>>>
>>>> https://github.com/mozilla/neqo/blob/791fd40fb7e9ee4599c07c11695d1849110e704b/neqo-transport/src/fc.rs#L402-L409
>>>>
>>>> * Disable auto-tuning entirely (it's needed only for latency-sensitive
>>>> applications).
>>>>
>>>> What would be a reasonable one-size-fits-all stream data window size,
>>>> which at the same time doesn't expose the receiver to a memory exhaustion
>>>> attack?
>>>>
>>>> Because it is difficult to estimate the sender's initial window and how
>>>> quickly it ramps up - especially with algorithms like Careful Resume, which
>>>> don't use Slow Start - my preference is to disable auto-tuning by default.
>>>>
>>>> Wouldn't a high but reasonable start value + window auto-tuning be
>>>> ideal?
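To make the numbers in this part of the thread concrete, here is a minimal
receiver-side sketch of the kind of auto-tuning being discussed: send an
update no later than when a third of the credit has been consumed, and grow
the window by 4x while the sender keeps using credit as fast as it is
granted. The names, the structure, and the updates_are_frequent signal are
hypothetical; this is not code from neqo, Quiche, or ngtcp2.

// Illustrative receiver-side flow control sketch (hypothetical; not taken
// from neqo, Quiche, or ngtcp2).
struct RxFlowControl {
    max_data: u64, // highest limit advertised to the peer so far
    consumed: u64, // bytes of credit the peer has used up
    window: u64,   // credit granted at the last update
    cap: u64,      // local memory cap; a policy choice, unrelated to sender speed
}

impl RxFlowControl {
    /// Called whenever stream data is consumed. Returns a new limit to
    /// advertise in MAX_DATA / MAX_STREAM_DATA, if an update is due.
    /// `updates_are_frequent` stands in for "the previous update was sent
    /// less than roughly one RTT ago", i.e. the sender is consuming credit
    /// about as fast as we grant it.
    fn on_consumed(&mut self, bytes: u64, updates_are_frequent: bool) -> Option<u64> {
        self.consumed += bytes;
        let remaining = self.max_data - self.consumed;

        // Send an update no later than when 1/3 of the window has been
        // consumed (i.e. at most 2/3 of it remains), instead of waiting
        // for the 1/2 point.
        if remaining * 3 > self.window * 2 {
            return None;
        }

        // Grow by 4x rather than 2x while the sender keeps consuming
        // credit every round trip, so the advertised limit can stay ahead
        // of a congestion window that doubles per RTT during Slow Start.
        if updates_are_frequent {
            self.window = (self.window * 4).min(self.cap);
        }

        self.max_data = self.consumed + self.window;
        Some(self.max_data)
    }
}

Whatever the exact constants, the point is that the update threshold and the
growth factor together determine whether flow control can stay ahead of a
sender that doubles its sending rate every round trip.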
>>>>
>>>
>>> Yeah, I think there's often confusion between two distinct aspects:
>>> a) the maximum buffer size that the receiver can allocate, and
>>> b) how fast the sender might transmit.
>>>
>>> A is what receivers need to prevent memory-exhaustion attacks. It's
>>> purely a local policy: the limit might be 1 MB or 10 MB, but it's unrelated
>>> to B — that is, it doesn't depend on how quickly the sender sends.
>>>
>>> For latency-sensitive applications that read slowly, it's important to
>>> cap the receive buffer at roughly read_speed × latency, because otherwise
>>> bufferbloat increases latency. But again, that consideration is separate
>>> from B.
>>>
>>> In my view, B mainly concerns minimizing the amount of memory allocated
>>> inside the kernel. Kernel-space memory management is far more constrained
>>> than in user space: allocations often have to be contiguous, and falling
>>> back to swap is not an option. Note also that the TCP/IP stack is decades
>>> old, from an era when memory was a much more precious resource than it is
>>> today.
>>>
>>> In contrast, a QUIC stack running in user space can rely on virtual
>>> memory, where fragmentation is rarely a real issue. When a user-space
>>> buffer fills up, the program can simply call realloc() and append
>>> data—possibly incurring operations such as virtual-memory remapping or
>>> paging. There is no need to pre-reserve large contiguous chunks of memory.
>>>
>>> To summarize, there is far less need in QUIC, if any, to minimize the
>>> receive window advertised to the peer, compared to what was necessary for
>>> in-kernel TCP.
>>>
>>>
>>>> On 14/10/2025 03.38, Kazuho Oku wrote:
>>>>
>>>>
>>>> On Mon, Sep 29, 2025 at 4:28 PM Max Inden <[email protected]> wrote:
>>>>
>>>>> For what it is worth, also referencing previous discussion on this
>>>>> list:
>>>>>
>>>>> "Why isn't QUIC growing?"
>>>>> https://mailarchive.ietf.org/arch/msg/quic/RBhFFY3xcGRdBEdkYmTK2k926mQ/
>>>>
>>>> Reading the old thread, I'm reminded that people often assume QUIC
>>>> performs better than TCP. However, that is true only when the QUIC stack is
>>>> implemented, configured, and deployed correctly.
>>>>
>>>> One bug I've seen in multiple stacks - one that significantly affects
>>>> benchmark results - is the failure to auto-tune the receive window as
>>>> aggressively as the sender's Slow Start allows.
>>>>
>>>> Based on my understanding, Google Quiche implements receive window
>>>> auto-tuning as follows:
>>>> * Send MAX_DATA / MAX_STREAM_DATA when 50% of the credit has been
>>>> consumed.
>>>> * Double the window size when these frames are sent frequently.
>>>>
>>>> Several other stacks have adopted this approach.
>>>>
>>>> The problem with this logic is that it's too conservative and causes
>>>> the sender to become flow-control-blocked during Slow Start.
>>>>
>>>> Consider the following example:
>>>> 1. The receiver advertises an initial Maximum Data of W.
>>>> 2. After receiving 0.5W bytes, the receiver sends Maximum Data=2.5W
>>>> along with ACKs up to W/2. The next Maximum Data will be sent once the
>>>> receiver has received 1.5W bytes.
>>>> 3. The receiver receives bytes up to W and ACKs them.
>>>> 4. At this point, the sender's Slow Start permits transmission up to 2W
>>>> bytes, but the advertised receive window is only 1.5W. As a result, the
>>>> connection becomes flow-control-blocked.
>>>>
>>>> There are several ways to address this issue:
>>>> * Send MAX_DATA / MAX_STREAM_DATA no later than when 33% (i.e., 1/3) of
>>>> the credit is consumed.
>>>> * Instead of doubling (x2) the window size, increase it by a larger
>>>> factor (e.g., x4).
>>>> * Disable auto-tuning entirely (it's needed only for latency-sensitive
>>>> applications).
>>>>
>>>> Because it is difficult to estimate the sender's initial window and how
>>>> quickly it ramps up - especially with algorithms like Careful Resume, which
>>>> don't use Slow Start - my preference is to disable auto-tuning by default.
>>>>
>>>> In fact, this is also the choice made by Chromium, which is why it is
>>>> not affected by this bug!
>>>>
>>>> For reference, Tatsuhiro addressed this issue in ngtcp2 in the
>>>> following PRs:
>>>> * https://github.com/ngtcp2/ngtcp2/pull/1396 - Tweak threshold for
>>>> max_stream_data and max_data transmission
>>>> * https://github.com/ngtcp2/ngtcp2/pull/1397 - Add note for window
>>>> auto-tuning
>>>> * https://github.com/ngtcp2/ngtcp2/pull/1398 - examples/client:
>>>> Disable window auto-tuning by default
>>>>
>>>> However, I suspect the bug may still exist in other stacks.
>>>>
>>>> On 29/09/2025 05.38, Lars Eggert wrote:
>>>>>
>>>>> Hi,
>>>>>
>>>>> pitch for a discussion at 124.
>>>>>
>>>>> https://radar.cloudflare.com/
>>>>> <https://radar.cloudflare.com/adoption-and-usage?dateRange=52w> and
>>>>> similar stats have had H3 around 30% for a few years now, with little
>>>>> change since the first quick ramp up to that level.
>>>>>
>>>>> Topic: why is that and is there anything the WG or IETF can do to
>>>>> change it (upwards, of course)?
>>>>>
>>>>> Thanks,
>>>>> Lars
>>>>> --
>>>>> Sent from a mobile device; please excuse typos.
>>>>>
>>>>
>>>> --
>>>> Kazuho Oku
>>>>
>>>
>>> --
>>> Kazuho Oku
>>
>
> --
> Kazuho Oku
>
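As a postscript to the example quoted above (receiver advertises W, updates
at 50% consumed, doubles the window): here is a rough back-of-the-envelope
sketch of the arithmetic, with an idealized sender whose Slow Start allowance
simply doubles per round trip. The numbers are only meant to reproduce the
quoted example, not to model a real congestion controller.

// Rough arithmetic for the quoted example; W is the initial MAX_DATA.
// Idealized: Slow Start is assumed to allow ~2W of new data in flight
// once the first W bytes have been ACKed, as in the example.
fn main() {
    let w: f64 = 1.0; // everything below is in units of W

    // Receiver with 50% threshold and 2x growth: after 0.5W is consumed,
    // the window grows from W to 2W and the advertised limit becomes
    // 0.5W + 2W = 2.5W. The next update only comes at 1.5W consumed.
    let advertised = 0.5 * w + 2.0 * w;

    // Once the first W bytes are sent and ACKed, Slow Start permits ~2W of
    // additional data, but only (2.5W - W) of flow-control credit remains.
    let slow_start_allows = 2.0 * w;
    let credit_left = advertised - 1.0 * w;

    println!("slow start allows   : {slow_start_allows:.1} x W");
    println!("flow control allows : {credit_left:.1} x W");
    println!(
        "sender is flow-control-blocked by {:.1} x W",
        slow_start_allows - credit_left
    );
}

With a 1/3 threshold and 4x growth, the first update would instead advertise
roughly W/3 + 4W ≈ 4.3W, so the remaining credit at the point above would be
about 3.3W rather than 1.5W, well ahead of the 2W that Slow Start permits.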
