On Tue, May 9, 2017 at 9:47 PM, Willy Tarreau <w...@1wt.eu> wrote:
> On Tue, May 09, 2017 at 02:54:45PM +0300, Jarno Huuskonen wrote:
>> My firefox(52.1 on linux) was able to send 128k file,
>> but 800k file results in connection reset. My chrome sent 16k file, but
>> fails (ERR_CONNECTION_RESET) on 17k file (few times even the 17k file
>> worked).


My particular request is 17575 octets in total (headers + body), and
as Jarno observed, it sometimes works Firefox / Chromium, but most
times Chromium trips over it.

The "non-determinism" of the issue is triggered by the fact if the
browser manages to push all its payload over the network before the
response from HAProxy is received.  (If the network latency would be
zero, and the receive window on HAProxy size would be extremly small,
this issue would happen every time.)

Based on a few `tcpdump` captures the issue can be described in terms
of TCP as this:
* the client sends its headers, and starts sending the request body;
* HAProxy receives the headers, determines it's a redirect;
* HAProxy writes the response status line and headers, and closes the
connection via a reset packet;
* meanwhile the client while still writing to the socket its payload,
the reset packet is received which puts the client into an error state
(because it tries to write to a closed socket);
* thus the client just errors out without reading the response;

Sometimes the reset packet is received after the client has finished
writing, thus the condition is not encountered any more.

(On explicit request over private email, I can provide a small tcpdump
capture displaying the behaviour described herein.)




> Hmmm that sounds bad, it looks like we've broken something again.


I don't think HAProxy was "broken" this time, as we are using HAProxy
1.6.11 since last November, and only recently (since two-three weeks
ago) we started encountering this issue without having any major
changes to either the HAProxy configuration or the web application
where we encountered this issue.

In fact I think the browsers "broke" something, as only with recent
variants of Chromium and Firefox we encounter this.


However, since there are far less few HAProxy deployments than
browsers (say 50.000 to 1 in our deployment), I think the easiest
component to fix is HAProxy, by making it "drain" the entire
connection before closing it.  (Although I have no idea how
complicated is to do this.)




> What status code are you facing there ? I remember we've had such
> an issue in the past where the server timeout could expire during
> a long upload because it was not being refreshed during the upload
> (it would thus result in a 504).


This is the point, there is no "error" situation, as HAProxy just
issues the redirect status line and headers and then resets the
connection.

Thanks,
Ciprian.

Reply via email to