Just a quick update. `-dZ` doesn't seem to have any impact on the number of `SD--` requests.
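For reference, a minimal sketch of how the test run was started (paths and any other flags here are illustrative, not our real setup):

```shell
# -dZ disables zero-copy data forwarding, per Christopher's suggestion;
# everything else is a placeholder for the normal startup command.
haproxy -f /etc/haproxy/haproxy.cfg -dZ
```

The `SD--` rate looked the same with and without the flag.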
—
Luke Seelenbinder
Stadia Maps | Founder & CEO
stadiamaps.com

> On Sep 26, 2024, at 18:18, Luke Seelenbinder <[email protected]> wrote:
>
> Hi Christopher,
>
> Thanks for the response.
>
>> Sorry, I don't understand: was the response successfully sent to the client
>> when this happens or not? Is it "just" an issue with the termination state,
>> or is there also an issue with the response itself?
>
> It's also an issue with the response. The chain is:
>
> Varnish (status: 503) -> HAProxy (status: 200; termination: SD--) -> HAProxy
> Upstream (status: 200, termination: ----)
>
>> At first glance, there are not many fixes that could explain that. Maybe the
>> following one, not sure:
>
> I had the same thought…nothing really made sense to me either.
>
> I'll try with `-dZ` and report back!
>
> Best,
> Luke
>
> —
> Luke Seelenbinder
> Stadia Maps | Founder & CEO
> stadiamaps.com
>
>> On Sep 26, 2024, at 16:28, Christopher Faulet <[email protected]> wrote:
>>
>> Hi Luke,
>>
>> On 26/09/2024 at 12:28, Luke Seelenbinder wrote:
>>> On upgrading to 3.0.5, we began to see a lot of failed backend requests.
>>> They return successful status codes but fail with termination state `SD--`.
>>> On the upstream side, the request succeeds (the upstream is also HAProxy;
>>> its state is `----`).
>>> The data appears to be fully transferred without error, but something goes
>>> wrong towards the end of the request. This happens on a rather small
>>> percentage of requests, but I'm struggling to determine how to isolate the
>>> problem further. Timing and bytes transferred on both sides match up.
>>> Varnish is in the loop for most of these requests (but not all), and it
>>> ends up returning an error response, so it's not a spurious log line where
>>> the client doesn't register an error. To make matters worse, the response
>>> status code from the backend is successful, so the requests can't be
>>> retried using L7.
>>
>> Sorry, I don't understand: was the response successfully sent to the client
>> when this happens or not? Is it "just" an issue with the termination state,
>> or is there also an issue with the response itself?
>>
>>> The only thing that changed should be the upgrade from 3.0.4 to 3.0.5.
>>> Our settings are pretty standard: TLS on both sides; a mix of H3, H2, and
>>> H1.1 for the frontend; exclusively client-cert TLS + H1.1 for the backend.
>>> Errors happen on all FE protocols.
>>> Any tips on how to debug this further? Possibly relevant config below.
>>
>> Well, if it is an issue with the termination state while the response is
>> fully sent to the client, it may be a server shutdown that is caught too
>> early, when it is received with the last bytes of data.
>>
>> At first glance, there are not many fixes that could explain that. Maybe the
>> following one, not sure:
>>
>> commit e2a93b649286b30245333eec5851acd3991fda47
>> Author: Christopher Faulet <[email protected]>
>> Date:   Mon Jul 29 17:48:16 2024 +0200
>>
>>     BUG/MEDIUM: stconn: Report error on SC on send if a previous SE error was set
>>
>>     When a send on a connection is performed, if a SE error (or a pending error)
>>     was already reported earlier, we leave immediately. No send is performed.
>>     However, we must be sure to report the error at the SC level if necessary.
>>     Indeed, the SE error may have been reported during the zero-copy data
>>     forwarding, so during receive on the opposite side. In that case, we may
>>     have missed the opportunity to report it at the SC level.
>>
>>     The patch must be backported as far as 2.8.
>>
>> (cherry picked from commit 5dc45445ff18207dbacebf1f777e1f1abcd5065d)
>> Signed-off-by: Christopher Faulet <[email protected]>
>>
>> You may try to disable the zero-copy data forwarding with the -dZ command
>> line option.
>>
>> --
>> Christopher Faulet

