Re: Browser downloads failing because of h2c_send_goaway_error (1.8.17 + 1.9.1)

2019-02-01 Thread Willy Tarreau
Hi Wert,

On Thu, Jan 17, 2019 at 01:38:52PM +0300, Wert wrote:
> It makes large downloads completely unusable even with infrequent reloads.

Yesterday while addressing some H2 issues, I found a bug which can cause
exactly what you were seeing. I'm interested in knowing whether the latest
1.9 fixes this for you:

 http://git.haproxy.org/?p=haproxy-1.9.git;a=snapshot;h=HEAD;sf=tgz

Thanks,
Willy



Re: Browser downloads failing because of h2c_send_goaway_error (1.8.17 + 1.9.1)

2019-01-17 Thread Wert
Hi

> Hi,

> On Wed, Jan 16, 2019 at 10:32:00AM +0300, Wert wrote:
>> How to reproduce:
>> 1. Start a browser download (content-disposition: attachment) of some big file 
>> through H2
>> * Tested with a 1 GB file and several Chrome versions (67-)
>> 2. Make a reload
>> 3. The process holding this connection stays, transfers everything successfully, 
>> and even logs a code-200 full-size transfer without any termination flags
>> 4. Chrome shows "Network error" at the end of the transfer
>> 
>> Versions:
>> 1.8.17 - it happens in all cases
>> #c4d56e5 - it looks like it happens only under some conditions, but only 
>> because there is another bug that makes old processes stay much longer than 
>> with 1.8.17
>> 
>> Fix:
>> For 1.8.17 I just disabled the condition "if (sess && unlikely(sess->fe->state 
>> == PR_STSTOPPED))" leading to h2c_send_goaway_error() in mux_h2.c:2330
>> It is not the correct way and probably breaks something else, but it fixes this

> I understand what's happening. And it is caused by the H2 state machine
> which does not have provisions for something equivalent to the LAST_ACK
> TCP state. A stream is closed as soon as the END_STREAM flag has been
> seen in both directions. Thus during such a reload, we end up with a
> connection which doesn't have any stream anymore but still bytes in
> flight for the last stream(s). Since we have a reload in progress, we
> know the process is waiting for old connections to leave, thus we close
> this idle and unused connection. The problem is that the client is still
> sending WINDOW_UPDATE frames for the data it receives, and these will
> cause a TCP RST to be emitted. Depending on the packet ordering on the
> path and on the remote TCP stack, this can lead to truncated responses,
> or even a client reporting an error after it correctly received everything.

> I'd like to study the possibility of introducing an equivalent of the LAST_ACK
> state for streams. In theory it should not be very difficult, but in practice
> it's tricky because frames are not really ACKed; the client sends us more
> budget to send more data, so we never know exactly when all data were received
> on the other end, though it serves as a hint. Ideally we could watch the
> window increments alone and compare them to the amount of data sent.

> I think it's something I'll bring to the IETF HTTP working group, because
> it could actually serve to address another issue related to priorities
> which are sometimes incorrectly re-attached when streams disappear.

> For the short term, however, I don't have any solution to this, and tricks
> such as the one above are indeed risky: you may face connections which
> never die :-/

> Best regards,
> Willy

Thank you for the answer.

It makes large downloads completely unusable even with infrequent reloads.

A "never die" connection doesn't look like such a big problem, since it belongs 
to a stopping process and "hard-stop-after" can deal with it.
It is the lesser evil in many cases.

How about optional delayed processing of h2c_send_goaway_error() (exclusively 
for stopping processes)?
For example, a separate queue of "connections awaiting a GOAWAY error".
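A rough sketch of that queue idea (all names hypothetical; HAProxy's real mux structures and scheduling differ): park stopping connections on a pending list instead of emitting GOAWAY immediately, and only close them once the budget the client returned via WINDOW_UPDATE suggests all data was consumed.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>

/* Hypothetical model: a connection that should receive GOAWAY because
 * the process is stopping, but still has response bytes in flight. */
struct h2_conn {
    size_t bytes_sent;      /* DATA payload bytes written to the client */
    size_t bytes_acked;     /* budget returned via WINDOW_UPDATE */
    bool   goaway_sent;
    struct h2_conn *next;   /* pending-goaway list linkage */
};

static struct h2_conn *pending_goaway;  /* connections awaiting GOAWAY */

/* Called instead of the immediate h2c_send_goaway_error() while stopping. */
static void defer_goaway(struct h2_conn *c)
{
    c->next = pending_goaway;
    pending_goaway = c;
}

/* Called whenever a WINDOW_UPDATE arrives: emit GOAWAY and unlink the
 * connections whose last bytes appear to have reached the peer. */
static void flush_pending(void)
{
    struct h2_conn **pc = &pending_goaway;
    while (*pc) {
        struct h2_conn *c = *pc;
        if (c->bytes_acked >= c->bytes_sent) {
            c->goaway_sent = true;   /* would emit GOAWAY + close here */
            *pc = c->next;           /* unlink from the pending list */
        } else {
            pc = &c->next;
        }
    }
}
```

A stopping process would still need "hard-stop-after" as a backstop for clients that never return the last window increments.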

-- 
Wert




Re: Browser downloads failing because of h2c_send_goaway_error (1.8.17 + 1.9.1)

2019-01-16 Thread Willy Tarreau
Hi,

On Wed, Jan 16, 2019 at 10:32:00AM +0300, Wert wrote:
> How to reproduce:
> 1. Start a browser download (content-disposition: attachment) of some big file 
> through H2
> * Tested with a 1 GB file and several Chrome versions (67-)
> 2. Make a reload
> 3. The process holding this connection stays, transfers everything successfully, 
> and even logs a code-200 full-size transfer without any termination flags
> 4. Chrome shows "Network error" at the end of the transfer
> 
> Versions:
> 1.8.17 - it happens in all cases
> #c4d56e5 - it looks like it happens only under some conditions, but only 
> because there is another bug that makes old processes stay much longer than 
> with 1.8.17
> 
> Fix:
> For 1.8.17 I just disabled the condition "if (sess && unlikely(sess->fe->state == 
> PR_STSTOPPED))" leading to h2c_send_goaway_error() in mux_h2.c:2330
> It is not the correct way and probably breaks something else, but it fixes this

I understand what's happening. And it is caused by the H2 state machine
which does not have provisions for something equivalent to the LAST_ACK
TCP state. A stream is closed as soon as the END_STREAM flag has been
seen in both directions. Thus during such a reload, we end up with a
connection which doesn't have any stream anymore but still bytes in
flight for the last stream(s). Since we have a reload in progress, we
know the process is waiting for old connections to leave, thus we close
this idle and unused connection. The problem is that the client is still
sending WINDOW_UPDATE frames for the data it receives, and these will
cause a TCP RST to be emitted. Depending on the packet ordering on the
path and on the remote TCP stack, this can lead to truncated responses,
or even a client reporting an error after it correctly received everything.

I'd like to study the possibility of introducing an equivalent of the LAST_ACK
state for streams. In theory it should not be very difficult, but in practice
it's tricky because frames are not really ACKed; the client sends us more
budget to send more data, so we never know exactly when all data were received
on the other end, though it serves as a hint. Ideally we could watch the
window increments alone and compare them to the amount of data sent.
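That accounting could look something like this rough sketch (hypothetical names, not HAProxy code; it deliberately ignores the initial window size, so as said above it is a hint, not a guarantee):

```c
#include <assert.h>
#include <stdbool.h>
#include <stdint.h>

/* HTTP/2 has no per-frame ACK, but each WINDOW_UPDATE the client sends
 * returns budget for data it has consumed.  If the cumulative
 * increments cover everything we wrote, the peer has most likely
 * received it all.  The client never has to return budget it has not
 * consumed yet, so this under-estimates and can only serve as a hint. */
struct h2_tx_acct {
    uint64_t data_sent;       /* DATA payload bytes emitted */
    uint64_t window_returned; /* sum of WINDOW_UPDATE increments seen */
};

/* Account for a DATA frame payload we just emitted. */
static void acct_data_sent(struct h2_tx_acct *a, uint32_t len)
{
    a->data_sent += len;
}

/* Account for a connection-level WINDOW_UPDATE from the client. */
static void acct_window_update(struct h2_tx_acct *a, uint32_t incr)
{
    a->window_returned += incr;
}

/* True once the returned budget covers everything sent; only then
 * would a stopping process consider the connection safe to close. */
static bool peer_likely_drained(const struct h2_tx_acct *a)
{
    return a->window_returned >= a->data_sent;
}
```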

I think it's something I'll bring to the IETF HTTP working group, because
it could actually serve to address another issue related to priorities
which are sometimes incorrectly re-attached when streams disappear.

For the short term, however, I don't have any solution to this, and tricks
such as the one above are indeed risky: you may face connections which
never die :-/

Best regards,
Willy



Browser downloads failing because of h2c_send_goaway_error (1.8.17 + 1.9.1)

2019-01-15 Thread Wert
How to reproduce:
1. Start a browser download (content-disposition: attachment) of some big file 
through H2
* Tested with a 1 GB file and several Chrome versions (67-)
2. Make a reload
3. The process holding this connection stays, transfers everything successfully, 
and even logs a code-200 full-size transfer without any termination flags
4. Chrome shows "Network error" at the end of the transfer

Versions:
1.8.17 - it happens in all cases
#c4d56e5 - it looks like it happens only under some conditions, but only because 
there is another bug that makes old processes stay much longer than with 1.8.17

Fix:
For 1.8.17 I just disabled the condition "if (sess && unlikely(sess->fe->state == 
PR_STSTOPPED))" leading to h2c_send_goaway_error() in mux_h2.c:2330
It is not the correct way and probably breaks something else, but it fixes this
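For reference, a minimal stub model of the check that was disabled (paraphrased from the mux_h2.c condition quoted above; the real HAProxy types, the unlikely() macro, and the surrounding wake-up code are omitted):

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>

/* PR_STSTOPPED marks a frontend whose process is stopping after a
 * reload; when it is set, the mux sends GOAWAY and tears the
 * connection down even if response bytes are still in flight. */
enum pr_state { PR_STREADY, PR_STSTOPPED };

struct proxy   { enum pr_state state; };
struct session { struct proxy *fe; };

static bool goaway_emitted;

/* Stands in for the real h2c_send_goaway_error() + connection close. */
static void send_goaway_error(void)
{
    goaway_emitted = true;
}

/* The reload-time close decision, as in 1.8.17 around mux_h2.c:2330. */
static void check_stopping(struct session *sess)
{
    if (sess && sess->fe->state == PR_STSTOPPED)
        send_goaway_error();
}
```

The workaround described above simply makes check_stopping() a no-op, which trades the truncation bug for connections that may linger until the old process is killed.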

Tcpdump for end of failed download:
04:28:16.947674 IP client.example.com.53228 > server.example.com.https: Flags 
[.], ack 1189837535, win 3645, length 0
04:28:16.947685 IP server.example.com.https > client.example.com.53228: Flags 
[.], seq 1190016095:1190018975, ack 22882, win 254, length 2880
04:28:16.947695 IP server.example.com.https > client.example.com.53228: Flags 
[.], seq 1190018975:1190021855, ack 22882, win 254, length 2880
04:28:16.950105 IP client.example.com.53228 > server.example.com.https: Flags 
[.], ack 1189843295, win 3645, length 0
04:28:16.950117 IP server.example.com.https > client.example.com.53228: Flags 
[.], seq 1190021855:1190024735, ack 22882, win 254, length 2880
04:28:16.950127 IP server.example.com.https > client.example.com.53228: Flags 
[.], seq 1190024735:1190027615, ack 22882, win 254, length 2880
04:28:16.952186 IP client.example.com.53228 > server.example.com.https: Flags 
[.], ack 1189847615, win 3645, length 0
04:28:16.952194 IP server.example.com.https > client.example.com.53228: Flags 
[.], seq 1190027615:1190030495, ack 22882, win 254, length 2880
04:28:16.952202 IP server.example.com.https > client.example.com.53228: Flags 
[.], seq 1190030495:1190033375, ack 22882, win 254, length 2880
04:28:16.954424 IP client.example.com.53228 > server.example.com.https: Flags 
[.], ack 1189853375, win 3645, length 0
04:28:16.954430 IP server.example.com.https > client.example.com.53228: Flags 
[.], seq 1190033375:1190036255, ack 22882, win 254, length 2880
04:28:16.954438 IP server.example.com.https > client.example.com.53228: Flags 
[.], seq 1190036255:1190039135, ack 22882, win 254, length 2880
04:28:16.954441 IP client.example.com.53228 > server.example.com.https: Flags 
[P.], seq 22882:22924, ack 1189853375, win 3645, length 42
04:28:16.954496 IP server.example.com.https > client.example.com.53228: Flags 
[R], seq 1015944401, win 0, length 0
04:28:16.954690 IP client.example.com.53228 > server.example.com.https: Flags 
[.], ack 1189856255, win 3645, length 0
04:28:16.954704 IP server.example.com.https > client.example.com.53228: Flags 
[R], seq 1015947281, win 0, length 0
04:28:16.959056 IP client.example.com.53228 > server.example.com.https: Flags 
[.], ack 1189862015, win 3645, length 0
...

---
Wert