Hi Luke,

On 26/09/2024 at 12:28, Luke Seelenbinder wrote:
On upgrading to 3.0.5, we began to see a lot of failed backend requests. They are successful status codes but fail with connection state `SD--`. On the upstream side, the request succeeds (the upstream is also HAProxy, its state is `----`).

The data appears to be fully transferred without error, but something goes wrong towards the end of the request. This happens on a rather small percentage of requests, but I'm struggling to determine how to isolate the problem further. Timing and bytes transferred on both sides match up. Varnish is in the loop for most of these requests (but not all), and it ends up returning an error response, so it's not a spurious log line where the client doesn't register an error. To make matters worse, the response status code from the backend is successful, so the requests can't be retried using L7.

Sorry, I don't understand: was the response successfully sent to the client when this happens, or not? Is it "just" an issue with the termination state, or is there also an issue with the response itself?


The only thing that was changed should be the upgrade between 3.0.4 and 3.0.5.

Our settings are pretty standard. TLS on both sides; a mix of H3, H2, and H1.1 for the frontend; exclusively client-cert TLS + H1.1 for the backend. Errors happen on all FE protocols.

Any tips on how to debug this further? Possibly relevant config below.


Well, if it is an issue with the termination state while the response is fully sent to the client, it may be a server shutdown that is caught too early, when it is received together with the last bytes of data.

At first glance, there are not many fixes that could explain this. Maybe the following one, but I'm not sure:

commit e2a93b649286b30245333eec5851acd3991fda47
Author: Christopher Faulet <[email protected]>
Date:   Mon Jul 29 17:48:16 2024 +0200

    BUG/MEDIUM: stconn: Report error on SC on send if a previous SE error was set

    When a send on a connection is performed, if a SE error (or a pending error)
    was already reported earlier, we leave immediately. No send is performed.
    However, we must be sure to report the error at the SC level if necessary.
    Indeed, the SE error may have been reported during the zero-copy data
    forwarding. So during receive on the opposite side. In that case, we may
    have missed the opportunity to report it at the SC level.

    The patch must be backported as far as 2.8.

    (cherry picked from commit 5dc45445ff18207dbacebf1f777e1f1abcd5065d)
    Signed-off-by: Christopher Faulet <[email protected]>

You may try to disable the zero-copy data forwarding with the -dZ command-line option.
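
For example, here is a minimal sketch, assuming the configuration lives at /etc/haproxy/haproxy.cfg (that path is only a placeholder, adjust it to your setup):

```shell
# Assumption: hypothetical config path, replace with your own.
CFG=/etc/haproxy/haproxy.cfg

# Check that the configuration parses before restarting anything.
haproxy -c -f "$CFG"

# Start HAProxy with zero-copy data forwarding disabled (-dZ).
haproxy -dZ -f "$CFG"
```

If the `SD--` termination states disappear with this option, that would point to the zero-copy forwarding path.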

--
Christopher Faulet


