On Fri, Dec 29, 2017 at 04:46:57PM +0100, Lukas Tribus wrote:
> On Fri, Dec 29, 2017 at 3:58 PM, Willy Tarreau <w...@1wt.eu> wrote:
> > On Fri, Dec 29, 2017 at 03:42:30PM +0100, Willy Tarreau wrote:
> >> OK I managed to reproduce it with nghttp using --expect-continue to
> >> force it to leave a pause before sending the data. And indeed there
> >> the data are immediately followed by a shutdown. Getting closer...
> >
> > So here's what I found : when dealing with request forwarding, we used
> > to let the close migrate from the client to the server with the last
> > block, and this only happens once we switch to fast forwarding, i.e.
> > only when the last block of the request didn't fit in the buffer. Thus
> > it would randomly impact large uploads (though timing would often
> > protect them) and almost always impact small ones sent in two parts,
> > as we managed to produce here.
> >
> > The attached patch fixes it for me. Could you please give it a try ?
> 
> Confirmed, that patch fixes the issue for me.

Excellent, thank you (and I've just read Lucas' confirmation as well).

> The client was just a Windows build of curl (with OpenSSL 1.1.0), and
> a tiny POST:
> curl -kv https://localhost/111 -d "bla=bla" --http2
> 
> Port 443 SSH tunneled to a remote dev box running haproxy (but that
> should not have affected the H2 stream).

OK. That was my first attempt as well (albeit on a different OS), but
what matters is to find one case where it fails, so we're fine now.

> I'm confident this fixes the POST issue reported in this thread, as
> already confirmed by Lucas (by modifying nginx's abort settings to
> permit an in-flight half-close). I'm not sure whether the GET issue is
> just another manifestation of the same underlying problem, or a
> completely different issue altogether.

I thought about it and can't imagine how it could be the same issue,
since the one I just fixed is in the upload path, which doesn't come
into play with a GET (unless they're purposely sending a content-length
just to exercise our bugs, but I'd rule that out :-)).
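
To make the mechanism a bit more concrete for those following the
thread, here's a very rough sketch of the pre-fix behaviour described
above (purely illustrative, all names invented, this is not the actual
haproxy code):

    #include <stdbool.h>
    #include <stddef.h>

    /* invented, simplified view of the request channel's state */
    struct req_chan {
        size_t pending;         /* bytes buffered, not yet forwarded */
        bool   client_closed;   /* client half-closed (H2 END_STREAM) */
        bool   fast_forward;    /* data didn't fit in the buffer */
        bool   srv_half_closed; /* half-close propagated to the server */
    };

    /* Pre-fix behaviour, roughly: the client's close only followed the
     * data through the fast-forwarding path. Since an H2 client
     * half-closes the stream right after the end of its request, the
     * server side could see a half-close while still handling the
     * request and treat it as an abort.
     */
    static void forward_step(struct req_chan *c)
    {
        /* ... forward c->pending bytes to the server here ... */
        c->pending = 0;

        if (c->fast_forward && c->client_closed)
            c->srv_half_closed = true; /* the part that hurt H2 POSTs */
    }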

> Lucas, Maximilian, can you check the situation with this patch? The
> POST issue should definitely be gone; please also verify the GET issue
> (as I was unable to reproduce it).

I'll merge the patch to ease testing for others. I think that between
this and the recent mworker fixes, we're in good shape for another
release, which should let us progressively get rid of old bug reports.

> Also, I think we can update the bugs filed against Firefox and Safari:
> clearly they don't send the connection header (they just show it in
> the dev-tools and debug logs).

Yes, that would be nice for them.

> > I'm currently thinking about how to add more information to improve
> > observability inside the mux, to match what can currently be done
> > outside (ie: we have neither the equivalent of "show errors" nor
> > error counters inside the mux).
> 
> Indeed, I think that's a good idea. This morning I plastered the mux
> code with printf's to figure out where exactly haproxy rejects the
> request, only to find out that the problem was not in the request :)
> There are quite a few possible H2 failure code paths, and the only
> debugging option right now is to manually add printf's.

I know. During development I had to use a *lot* of printf, even with
colors and hexdumps, as the usual strace doesn't give useful info since
it's entirely done over SSL. The problem was that during the progressive
cleanups leading to the final code merge, this debugging code disappeared
and we were left with something much more opaque.
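
For reference, the kind of throwaway helper we're talking about is as
dumb as this (illustrative only, nothing like it exists in the tree):

    #include <stdio.h>
    #include <stddef.h>

    /* dump a buffer to stderr with a colored tag so it stands out
     * among the other traces */
    static void dbg_hexdump(const char *tag, const void *data, size_t len)
    {
        const unsigned char *p = data;
        size_t i;

        fprintf(stderr, "\x1b[33m[%s] %zu bytes\x1b[0m", tag, len);
        for (i = 0; i < len; i++) {
            if (!(i & 15))
                fprintf(stderr, "\n  %04zx:", i);
            fprintf(stderr, " %02x", p[i]);
        }
        fputc('\n', stderr);
    }

Sprinkling calls like dbg_hexdump("IN HEADERS", buf, len) around the
frame parser is roughly what we both ended up redoing by hand.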

But while thinking about how to report more detailed info there, I'm
also considering what will change once we rearrange the connection
layer: the code will move around significantly and shrink quite a bit,
with fewer out-of-context errors. So it should become easier later to
report more precise information. That's why I'm trying to find the
sweet spot between the absolutely needed elements and the temporary
stuff that will vanish soon.
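
To give an idea of the "absolutely needed" side, even a handful of
counters per H2 connection would already help, something along these
lines (just a sketch with invented names, nothing of the sort exists
today):

    /* purely illustrative set of per-connection debug counters */
    struct h2_dbg_counters {
        unsigned int conn_errors;  /* connection errors (GOAWAY sent) */
        unsigned int strm_errors;  /* stream errors (RST_STREAM sent) */
        unsigned int hdr_rejects;  /* requests rejected in HEADERS decoding */
        unsigned int data_aborts;  /* DATA transfers aborted mid-way */
    };

Combined with a copy of the last offending frame header, that would be
roughly the mux-side equivalent of what "show errors" and the error
counters already give us outside.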

Cheers,
Willy
