Hi all,

We've just isolated a very consistent way to reproduce this (or a similar 
condition) in 3.2.4.

If you reload HAProxy and then keep using an already established QUIC 
connection, the following fires consistently on the previous worker process 
(a rough reproduction sketch follows the trace below):

haproxy[1464934]: FATAL: bug condition "((&qcs->el_buf)->n != (&qcs->el_buf))" matched at src/mux_quic.c:3738
haproxy[1464934]:   call trace(13):
haproxy[1464934]:   | 0x56525634e410 <17 1c 00 e8 60 e8 1b 00]: main+0x51ab0 > ha_backtrace_to_stderr
haproxy[1464934]:   | 0x5652564fcbc8 <89 d6 48 89 df ff 50 18]: cli_io_handler+0x2cec8
haproxy[1464934]:   | 0x5652565011e2 <48 89 df e8 1e b8 ff ff]: sc_conn_io_cb+0x82/0xbd > cli_io_handler+0x2cd00
haproxy[1464934]:   | 0x5652565ad98d <89 c2 48 89 cf 41 ff d1]: run_tasks_from_lists+0x2fd/0x89b
haproxy[1464934]:   | 0x5652565ae31a <4e 30 01 e8 76 f3 ff ff]: process_runnable_tasks+0x3ea/0x8f3 >
haproxy[1464934]:   | 0x565256521966 <01 00 00 e8 ca c5 08 00]: run_poll_loop+0x146/0x567 >
haproxy[1464934]:   | 0x565256521fe1 <00 00 00 e8 3f f8 ff ff]: run_thread_poll_loop+0x251/0x54a > run_poll_loop+0
haproxy[1464934]:   | 0x5652562fddbf <48 d0 00 e8 d1 3f 22 00]: main+0x145f/0x21f0 > run_thread_poll_loop
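
For reference, the reproduction boils down to roughly the following (a sketch 
only: the URL is a placeholder, and it assumes a curl build with HTTP/3 
support plus a systemd-managed reload; any reload method that spawns a new 
worker should do):

  # 1. Start a long transfer over HTTP/3 so a QUIC connection stays established
  curl --http3-only https://example.com/large-object -o /dev/null &

  # 2. Reload HAProxy while the transfer is still running; the previous worker
  #    keeps serving the established QUIC connection and trips the BUG_ON on send
  systemctl reload haproxy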

Debugged with gdb against the core dump:

#0  0x000056525634e426 in qmux_strm_snd_buf (sc=<optimized out>, buf=0x7f0f1ea26478, count=16144, flags=<optimized out>) at src/mux_quic.c:3738
3738            BUG_ON(LIST_INLIST(&qcs->el_buf));
(gdb) bt
#0  0x000056525634e426 in qmux_strm_snd_buf (sc=<optimized out>, buf=0x7f0f1ea26478, count=16144, flags=<optimized out>) at src/mux_quic.c:3738
#1  0x00005652564fcbc8 in sc_conn_send (sc=sc@entry=0x7f0f1f493c60) at src/stconn.c:1728
#2  0x00005652565011e2 in sc_conn_io_cb (t=0x7f0f1f6cc640, ctx=0x7f0f1f493c60, state=<optimized out>) at src/stconn.c:1925
#3  0x00005652565ad98d in run_tasks_from_lists (budgets=<optimized out>) at src/task.c:648
#4  0x00005652565ae31a in process_runnable_tasks () at src/task.c:889
#5  0x0000565256521966 in run_poll_loop () at src/haproxy.c:2851
#6  0x0000565256521fe1 in run_thread_poll_loop (data=<optimized out>) at src/haproxy.c:3067
#7  0x00005652562fddbf in main (argc=<optimized out>, argv=<optimized out>) at src/haproxy.c:3670
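
In case it helps, we can also pull more state out of the same core, e.g. (a 
sketch; it assumes the qcs local is not optimized out in frame 0, and the 
field names are only our guesses from mux_quic, so they may need adjusting 
for 3.2.4):

  (gdb) frame 0
  (gdb) print qcs->el_buf
  (gdb) print /x qcs->flags
  (gdb) print *qcs
  (gdb) print *qcs->qcc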

We have the following relevant config:

  # Enable proper binding for QUIC
  setcap cap_net_bind_service

  tune.quic.socket-owner connection
  tune.quic.frontend.max-idle-timeout 300s
  hard-stop-after 30m
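
For context, those directives sit in the global section next to a QUIC 
listener along these lines (a trimmed sketch; the section layout, certificate 
path, names and addresses here are placeholders, not our real config):

  global
    setcap cap_net_bind_service
    tune.quic.socket-owner connection
    tune.quic.frontend.max-idle-timeout 300s
    hard-stop-after 30m

  frontend fe_main
    bind quic4@:443 ssl crt /etc/haproxy/certs/site.pem alpn h3
    bind :443 ssl crt /etc/haproxy/certs/site.pem alpn h2,http/1.1
    default_backend be_app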

We're not certain this is the same issue, but it's at least one thread to pull 
on in the meantime. Is there anything else we can pull from the core dump that 
would help debug this?

Best,
Luke

> On Jul 30, 2025, at 11:20, Luke Seelenbinder 
> <[email protected]> wrote:
> 
> Thanks, Amaury!
> 
> We'll give it a shot. Can we enable this selectively? We'll take a look at 
> that to isolate it to just the sessions that are problematic.
> 
> Best,
> Luke
> 
> —
> Luke Seelenbinder
> Stadia Maps | Founder & CEO
> stadiamaps.com
> 
>> On Jul 29, 2025, at 18:11, Amaury Denoyelle <[email protected]> wrote:
>> 
>> On Tue, Jul 29, 2025 at 04:10:12PM +0300, Luke Seelenbinder wrote:
>>> Hi list,
>>> We're working on debugging a quic/h3 issue on 3.2.3.
>>> The client shows a QUIC error (on Chrome v138). The server shows CD--, and 
>>> the connections time out. The client then retries after ~6s on http2, and 
>>> everything succeeds.
>>> We're seeing the following debug details on the connections:
>>> fs=< qcs=0x7ff797f4ca00 .id=176 .st=HCR .flg=0x0181 .rx=486/1474200 
>>> rxb=0(1) .tx=0 0/6291456  buf=0(0)/0 .ti=30148/29785/0 
>>> qcc=0x7ff798552200(F) qc=0x7ff797fe0000 .st=INIT .sc=11 .hreq=11 
>>> .flg=0x0028 .tx=8133047 8133047/15728640 bwnd=492506/491520 
>>> conn.flg=0x803c0300 qc.wnd=511307/491520> bs=< h1s=0x7ff7954c7fc0 
>>> h1s.flg=0x14094 .req.state=MSG_DONE .res.state=MSG_DATA .meth=GET 
>>> status=200 .sd.flg=0x106c0a01 .sd.evts=E1 .sc.flg=0x00035211 
>>> .sc.app=0x7ff797857c00 .sc.evts=S1 .subs=(nil) h1c.flg=0x80004800 .sub=0 
>>> .ibuf=15568@0x7ff78cf96500+800/16384 .obuf=0@(nil)+0/0 .evts=M1 
>>> .task=0x7ff79251eec0 .exp=<NEVER> conn.flg=0x080300 conn.err_code=0 
>>> conn.evts=F1>
>> 
>> At first glance, I'm seeing value 0x0181 for QCS instances, which can be
>> translated as QC_SF_HREQ_RECV | QC_SF_TO_RESET | QC_SF_SIZE_KNOWN.
>> 
>> The second flag is probably the issue here: the stream has been reset by
>> haproxy. This can occur for several reasons:
>> * a STOP_SENDING frame was received
>> * an error during HTTP/3 decoding
>> * the stream was shut prematurely by its upper haproxy layer
>> 
>> To investigate the issue further, the easiest solution is to activate
>> traces, so that we can detect which of the conditions above was
>> encountered. The simplest way is to pass the following command-line
>> argument to haproxy: '-dt qmux:developer:minimal', which outputs debug
>> traces on stderr. This will be really verbose; if that is not suitable, a
>> traces section can be configured to redirect the output to a sink.
>> 
>> Regards,
>> 
>> -- 
>> Amaury Denoyelle
>> 
>> 
> 
