Re: Early connection close, incomplete transfers

Veiko Kukk Wed, 20 Feb 2019 07:54:59 -0800

On 2019-02-19 06:47, Willy Tarreau wrote:


This is interesting. As you observed in the trace you sent me, the
lighttpd server closes just after sending the response headers. This
indeed matches the "SD" log that haproxy emits. If it doesn't happen
in TCP mode nor with Nginx, it means that something haproxy modifies
in the request causes this effect on the server.

Hi

I'm sending an answer from a colleague who investigated this more thoroughly, especially from the lighttpd side:

We've been debugging this a bit further, and it does not look like the issue with the seemingly random incomplete HTTP responses is due to any particular request headers at the HTTP layer. It rather looks like something at the TCP level (although specific to HTTP mode):

A first observation we made is that the frequency of these incomplete transfers increases when we add a delay at the backend server after sending the response headers and before sending the body data. We added a 100 ms delay there and then got a lot of interrupted transfers that had received only the response headers (sent without delay) but 0 bytes of the body (sent just after the delay). So the frequency with which this happens appears to be proportional to latencies/stalls in the backend server sending the response data (some read timeout logic in haproxy?).
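
For anyone who wants to try this without lighttpd, the delay described above can be approximated with a small stand-in backend. This is only a sketch: the names serve_once and fetch, the 1 KiB body and the loopback setup are made up for illustration, and it talks to the backend directly rather than through haproxy, so by itself it is only expected to show a complete transfer.

```python
import socket
import threading
import time

BODY = b"x" * 1024
HEADERS = (
    "HTTP/1.1 200 OK\r\n"
    "Content-Length: %d\r\n"
    "Connection: close\r\n\r\n" % len(BODY)
).encode()


def serve_once(listener, delay):
    """Answer one request: headers first, then a pause, then the body."""
    conn, _ = listener.accept()
    with conn:
        conn.recv(65536)       # assumes the request fits in one read
        conn.sendall(HEADERS)  # headers go out immediately
        time.sleep(delay)      # artificial stall before the body
        conn.sendall(BODY)


def fetch(delay=0.1):
    """Start the stand-in backend on loopback and read one full response."""
    listener = socket.socket(socket.AF_INET, socket.SOCK_STREAM)
    listener.bind(("127.0.0.1", 0))
    listener.listen(1)
    worker = threading.Thread(target=serve_once, args=(listener, delay))
    worker.start()
    chunks = []
    with socket.create_connection(listener.getsockname()) as client:
        client.sendall(b"GET / HTTP/1.1\r\nHost: test\r\n\r\n")
        while True:
            data = client.recv(65536)
            if not data:       # server closed: response is finished
                break
            chunks.append(data)
    worker.join()
    listener.close()
    return b"".join(chunks)
```

Pointing a haproxy backend at such a server (with a fixed port instead of port 0) would be one way to vary the delay and watch how often the body goes missing.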

We debugged further and noticed that in all cases where transfers were incomplete, our lighttpd backend server was receiving an EPOLLRDHUP event on the socket where it communicates with haproxy. So it appears as if haproxy is *sometimes* (apparently depending on some read latency/stall, see above) shutting down its socket with the backend for writing *before* the full response and body data has been received.
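
To make the half-close concrete, here is a minimal sketch of a client that shuts down only the write side of its socket and still reads data that the server sends afterwards. The function name half_close_demo and the payloads are invented for the example, and it says nothing about what haproxy itself does internally.

```python
import socket
import threading


def half_close_demo():
    """Client half-closes (SHUT_WR); server sees EOF and can still reply."""
    listener = socket.create_server(("127.0.0.1", 0))
    seen = {}

    def server():
        conn, _ = listener.accept()
        with conn:
            request = b""
            while True:
                data = conn.recv(4096)
                if not data:   # EOF: the peer shut down its write side
                    break
                request += data
            seen["request"] = request
            # The server-to-client direction is still open at this point.
            conn.sendall(b"late body")

    worker = threading.Thread(target=server)
    worker.start()

    client = socket.create_connection(listener.getsockname())
    client.sendall(b"ping")
    client.shutdown(socket.SHUT_WR)  # sends FIN, keeps the read side open
    reply = b""
    while True:
        data = client.recv(4096)
        if not data:
            break
        reply += data
    client.close()
    worker.join()
    listener.close()
    return seen["request"], reply
```

On the server side the recv() returning an empty result corresponds to the condition that epoll reports as EPOLLRDHUP.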

And this is basically OK, because the socket remains writable for lighttpd, so it could still send down the rest of the response data. However, it looks like lighttpd is not expecting this kind of behavior from the client and is not correctly handling such a half-closed TCP session. There is code in lighttpd to handle such an EPOLLRDHUP event and half-closed TCP connection, but lighttpd then also checks the state of the TCP session with getsockopt and keeps the connection open *only* when the state is TCP_CLOSE_WAIT. In all other cases, upon receiving the EPOLLRDHUP, it actively changes the state of the connection to "ERROR" and then closes the connection:


https://github.com/lighttpd/lighttpd1.4/blob/master/src/connections.c#L908
https://github.com/lighttpd/lighttpd1.4/blob/master/src/fdevent.c#L995

We checked, and every time we have an incomplete response, lighttpd receives the EPOLLRDHUP event on the socket, but the TCP state queried via getsockopt is always TCP_CLOSE (and not TCP_CLOSE_WAIT as lighttpd seems to expect). Because of this, lighttpd then actively closes the half-closed connection from its end as well (which is likely the cause of the TCP FIN sent by lighttpd as seen in the tcpdump).
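
For reference, the state check can be reproduced outside lighttpd roughly as follows. This is a sketch under assumptions: it is Linux-only (it relies on tcpi_state being the first byte of struct tcp_info and on the kernel values TCP_CLOSE = 7 and TCP_CLOSE_WAIT = 8), the helper names tcp_state, should_keep_open and state_after_peer_half_close are made up, and should_keep_open only mirrors the condition as described in this mail, not lighttpd's actual code.

```python
import socket
import sys

# Linux kernel TCP state numbers (include/net/tcp_states.h).
TCP_CLOSE = 7
TCP_CLOSE_WAIT = 8


def tcp_state(sock):
    """Return the kernel TCP state of sock, or None where unsupported."""
    if not sys.platform.startswith("linux") or not hasattr(socket, "TCP_INFO"):
        return None
    info = sock.getsockopt(socket.IPPROTO_TCP, socket.TCP_INFO, 104)
    return info[0]  # tcpi_state is the first byte of struct tcp_info


def should_keep_open(state):
    """Hypothetical mirror of the check described above."""
    return state == TCP_CLOSE_WAIT


def state_after_peer_half_close():
    """State seen by the accepting side after the peer calls SHUT_WR."""
    listener = socket.create_server(("127.0.0.1", 0))
    client = socket.create_connection(listener.getsockname())
    conn, _ = listener.accept()
    try:
        client.shutdown(socket.SHUT_WR)
        conn.recv(1)  # returns b"" once the peer's FIN has arrived
        return tcp_state(conn)
    finally:
        conn.close()
        client.close()
        listener.close()
```

With a plain half-close on loopback like this, we would expect the accepting side to report TCP_CLOSE_WAIT, so this sketch does not by itself explain the TCP_CLOSE that we see behind haproxy.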

When we remove the condition from lighttpd that marks the connection as erroneous in case of EPOLLRDHUP and TCP state != TCP_CLOSE_WAIT, the problem with the incomplete transfers disappears:


https://github.com/lighttpd/lighttpd1.4/blob/master/src/connections.c#L922

We do not understand why this is or what the correct reaction to the EPOLLRDHUP event should be. In particular, we do not understand why lighttpd performs this check for TCP_CLOSE_WAIT, or why we always get a state of TCP_CLOSE when we receive this event while the socket still continues to be writable (so does TCP_CLOSE just indicate that one direction of the connection is closed?). Still, because this half-closing of the connection to the backend server appears to happen fairly randomly, depending on latency/stalls of the backend server sending down the response data, we assume that this is not the intended behavior of haproxy (and so possibly indicates some bug in haproxy too).

We assume that direct requests to the backend server and requests proxied via Nginx never failed because in these cases the EPOLLRDHUP event never occurs and there are never any half-closed connections. However, we have not tested this yet: we did not re-test with Nginx to verify that lighttpd then indeed never sees an EPOLLRDHUP.

Any ideas or suggestions, based on these findings, as to what the proper solution to the problem should be?


Thank you.
