On 08/10/2024 at 16:13, Christopher Faulet wrote:
On 08/10/2024 at 11:05, Luke Seelenbinder wrote:
Hi Christopher,
I was out last week, but we were able to gather a few more pieces of data.
1) We ran some tcpdumps, and nothing odd popped up at all. Given our traffic
levels, we were actually surprised how few RST, etc. we had. Any TCP issues that
did occur did *not* coincide with the SD-- log lines.
2) We've confirmed this also happens with similar rates on publicly served
traffic, as well (which mostly uses varnish as the backend). The symptom is a
slightly truncated response body.
That left us with the upgrade to 3.0.5 as the culprit, so we're going to need to
roll back to 3.0.4 for now. If we have the time, we'll run some more tests on a
limited subset of servers to see if we can find out anything more.
Thanks Luke for the info. For me, there are just a few commits that may be
related (though I have no explanation yet):
* e2a93b6492 ("BUG/MEDIUM: stconn: Report error on SC on send if a previous
  SE error was set")
* 0be5e36d8c ("BUG/MAJOR: mux-h1: Wake SC to perform 0-copy forwarding in
  CLOSING state")
* aa43ed1719 ("BUG/MEDIUM: http-ana: Report error on write error waiting for
  the response")
The first one is the best candidate. If you perform some tests, you can try to
revert it.
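
One way to do the revert is sketched below. It assumes you build HAProxy from a git checkout and that the abbreviated hash quoted above resolves in that checkout (it may differ between the development tree and a stable branch). The function name revert_candidate and the checkout path are hypothetical, not part of any HAProxy tooling.

```shell
# Sketch: revert one suspect commit in a local HAProxy source checkout.
# Assumption: the checkout at "$1" already contains commit "$2".
revert_candidate() {
    repo="$1"
    commit="$2"
    # Fail early if the commit cannot be resolved in this checkout.
    git -C "$repo" cat-file -e "${commit}^{commit}" || return 1
    # Create a new commit that undoes the suspect change.
    git -C "$repo" revert --no-edit "$commit"
}

# Hypothetical usage; the path is a placeholder and the build options
# should match the ones you normally use:
# revert_candidate "$HOME/src/haproxy-3.0" e2a93b6492 \
#     && make -C "$HOME/src/haproxy-3.0" TARGET=linux-glibc
```

If the revert does not apply cleanly, git stops with a conflict and nothing is built, so a failed attempt should not end up deployed by accident.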
In addition, you may enable the H1 traces at the error level. It may help to
spot the place where the error is detected. To do so, add the following
blocks to your configuration:

    ring buf1
        size 104857600 # 100MB
        format timed
        backing-file /tmp/blah

    global
        expose-experimental-directives
        trace h1 sink buf1
        trace h1 level error
        trace h1 verbosity complete
        trace h1 start now

This will write the H1 error traces to the /tmp/blah file. You can view the
traces by running:

    strings /tmp/blah | less

If that does not work, you can use the haring tool from the HAProxy sources. To
do so, compile it (make dev/haring/haring), then run:

    ./dev/haring/haring -f /tmp/blah | less

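
The two ways of reading the ring file can be wrapped in one small helper, sketched here. The function name dump_traces is hypothetical. Unlike the order given above, it uses haring whenever the binary has been built and falls back to strings otherwise; that choice is an assumption made for convenience.

```shell
# Sketch: dump the H1 traces from a ring backing file.
# $1: ring backing file (for example /tmp/blah)
# $2: path to the haring binary (defaults to the in-tree build location)
dump_traces() {
    ring="$1"
    haring="${2:-./dev/haring/haring}"
    if [ -x "$haring" ]; then
        # haring has been built: let it read the ring file.
        "$haring" -f "$ring"
    else
        # Fallback: extract the printable text from the file.
        strings "$ring"
    fi
}

# Example: dump_traces /tmp/blah | less
```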
--
Christopher Faulet