Hi Luke,
On 26/09/2024 at 12:28, Luke Seelenbinder wrote:
> On upgrading to 3.0.5, we began to see a lot of failed backend requests. They
> are successful status codes but fail with connection state `SD--`. On the
> upstream side, the request succeeds (the upstream is also HAProxy, its state is
> `----`).
> The data appears to be fully transferred without error, but something goes wrong
> towards the end of the request. This happens on a rather small percentage of
> requests, but I'm struggling to determine how to isolate the problem further.
> Timing and bytes transferred on both sides match up. Varnish is in the loop for
> most of these requests (but not all), and it ends up returning an error
> response, so it's not a spurious log line where the client doesn't register an
> error. To make matters worse, the response status code from the backend is
> successful, so the requests can't be retried using L7.
Sorry, I don't understand: was the response successfully sent to the client when
this happens, or not? Is it "just" an issue with the termination state, or is
there also an issue with the response itself?
> The only thing that was changed should be the upgrade between 3.0.4 and 3.0.5.
> Our settings are pretty standard. TLS on both sides; a mix of H3, H2, and H1.1
> for the frontend; exclusively client-cert TLS + H1.1 for the backend. Errors
> happen on all FE protocols.
> Any tips on how to debug this further? Possibly relevant config below.
Well, if it is only an issue with the termination state while the response is
fully sent to the client, it may be a server shutdown that is caught too early,
when it is received together with the last bytes of data.
At first glance, there are not many fixes between 3.0.4 and 3.0.5 that could
explain that. Maybe the following one, but I'm not sure:
commit e2a93b649286b30245333eec5851acd3991fda47
Author: Christopher Faulet <[email protected]>
Date: Mon Jul 29 17:48:16 2024 +0200

    BUG/MEDIUM: stconn: Report error on SC on send if a previous SE error was set

    When a send on a connection is performed, if a SE error (or a pending error)
    was already reported earlier, we leave immediately. No send is performed.
    However, we must be sure to report the error at the SC level if necessary.
    Indeed, the SE error may have been reported during the zero-copy data
    forwarding. So during receive on the opposite side. In that case, we may
    have missed the opportunity to report it at the SC level.

    The patch must be backported as far as 2.8.

    (cherry picked from commit 5dc45445ff18207dbacebf1f777e1f1abcd5065d)
    Signed-off-by: Christopher Faulet <[email protected]>
You may try to disable the zero-copy data forwarding with the -dZ command-line
option, to see whether the `SD--` termination states disappear.
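For instance, something along these lines should be enough for a quick test. This is only a sketch: the configuration path is an assumption, so adapt it to however HAProxy is started on your side (init script, systemd unit, container entrypoint, ...).

```shell
# Start HAProxy with zero-copy data forwarding disabled (-dZ).
# /etc/haproxy/haproxy.cfg is an assumed path, not taken from your setup.
haproxy -dZ -f /etc/haproxy/haproxy.cfg
```

If changing the command line is not convenient, setting "tune.disable-zero-copy-forwarding" in the global section should have the same effect, but please check it against the configuration manual for your version.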
--
Christopher Faulet