Just a quick update. `-dZ` doesn't seem to have any impact on the number of 
`SD--` requests.
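
For anyone following along, this is roughly how I'm counting them (a minimal sketch with made-up log lines; the backend names, timings, and file path below are hypothetical, but the four-character termination state is a standalone field in the default HTTP log format, so a whitespace-delimited match works):

```shell
# Hypothetical sample log lines showing the termination-state field
# (SD-- vs ----) as it appears in HAProxy's default HTTP log format.
printf '%s\n' \
  'be_tiles/srv1 10/0/5/3/18 200 1234 - - SD-- 1/1/0/0/0 0/0 "GET /a HTTP/1.1"' \
  'be_tiles/srv1 12/0/4/2/19 200 2345 - - ---- 1/1/0/0/0 0/0 "GET /b HTTP/1.1"' \
  'be_tiles/srv1 11/0/6/2/20 200 3456 - - SD-- 1/1/0/0/0 0/0 "GET /c HTTP/1.1"' \
  > /tmp/haproxy-sample.log

# Count requests logged with the SD-- termination state.
grep -c ' SD-- ' /tmp/haproxy-sample.log
```

The count stayed the same with and without `-dZ`.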

—
Luke Seelenbinder
Stadia Maps | Founder & CEO
stadiamaps.com

> On Sep 26, 2024, at 18:18, Luke Seelenbinder 
> <[email protected]> wrote:
> 
> Hi Christopher,
> 
> Thanks for the response.
> 
>> Sorry, I don't understand: was the response successfully sent to the client 
>> when this happens, or not? Is it "just" an issue with the termination state, 
>> or is there also an issue with the response itself?
> 
> It's also an issue with the response. The chain is:
> 
> Varnish (status: 503) -> HAProxy (status: 200; termination: SD--) -> HAProxy 
> Upstream (status: 200, termination: ----)
> 
>> At first glance, there are not many fixes that could explain that. Maybe the 
>> following one, not sure:
> 
> I had the same thought…nothing really made sense to me either.
> 
> I'll try with `-dZ` and report back!
> 
> Best,
> Luke
> 
> —
> Luke Seelenbinder
> Stadia Maps | Founder & CEO
> stadiamaps.com
> 
>> On Sep 26, 2024, at 16:28, Christopher Faulet <[email protected]> wrote:
>> 
>> Hi Luke,
>> 
>> On 26/09/2024 at 12:28, Luke Seelenbinder wrote:
>>> On upgrading to 3.0.5, we began to see a lot of failed backend requests. 
>>> They are successful status codes but fail with connection state `SD--`. On 
>>> the upstream side, the request succeeds (the upstream is also HAProxy, its 
>>> state is `----`).
>>> The data appears to be fully transferred without error, but something goes 
>>> wrong towards the end of the request. This happens on a rather small 
>>> percentage of requests, but I'm struggling to determine how to isolate the 
>>> problem further. Timing and bytes transferred on both sides match up. 
>>> Varnish is in the loop for most of these requests (but not all), and it 
>>> ends up returning an error response, so it's not a spurious log line where 
>>> the client doesn't register an error. To make matters worse, the response 
>>> status code from the backend is successful, so the requests can't be 
>>> retried using L7.
>> 
>> Sorry, I don't understand: was the response successfully sent to the client 
>> when this happens, or not? Is it "just" an issue with the termination state, 
>> or is there also an issue with the response itself?
>> 
>>> The only thing that was changed should be the upgrade between 3.0.4 and 
>>> 3.0.5.
>>> Our settings are pretty standard. TLS on both sides; a mix of H3, H2, and 
>>> H1.1 for the frontend; exclusively client-cert TLS + H1.1 for the backend. 
>>> Errors happen on all FE protocols.
>>> Any tips on how to debug this further? Possibly relevant config below.
>> 
>> Well, if it is an issue with the termination state while the response is 
>> fully sent to the client, it may be a server shutdown that is caught too 
>> early, when it is received with the last bytes of data.
>> 
>> At first glance, there are not many fixes that could explain that. Maybe the 
>> following one, not sure:
>> 
>> commit e2a93b649286b30245333eec5851acd3991fda47
>> Author: Christopher Faulet <[email protected]>
>> Date:   Mon Jul 29 17:48:16 2024 +0200
>> 
>>    BUG/MEDIUM: stconn: Report error on SC on send if a previous SE error was set
>> 
>>    When a send on a connection is performed, if a SE error (or a pending error)
>>    was already reported earlier, we leave immediately. No send is performed.
>>    However, we must be sure to report the error at the SC level if necessary.
>>    Indeed, the SE error may have been reported during the zero-copy data
>>    forwarding. So during receive on the opposite side. In that case, we may
>>    have missed the opportunity to report it at the SC level.
>> 
>>    The patch must be backported as far as 2.8.
>> 
>>    (cherry picked from commit 5dc45445ff18207dbacebf1f777e1f1abcd5065d)
>>    Signed-off-by: Christopher Faulet <[email protected]>
>> 
>> You may try to disable the zero-copy data forwarding with the -dZ command-line 
>> option.
>> 
>> -- 
>> Christopher Faulet
>> 
> 
