Hi Willy,

This is all very good to hear. I'm glad you were able to get to the bottom of 
it all!

Feel free to send along patches if you want me to test before the 1.9.3 
release. I'm more than happy to do so.

Best,
Luke


--
Luke Seelenbinder
Stadia Maps | Founder
stadiamaps.com

------- Original Message -------
On Wednesday, January 23, 2019 6:02 PM, Willy Tarreau <w...@1wt.eu> wrote:

> Hi Luke,
> 

> On Wed, Jan 23, 2019 at 10:47:33AM +0000, Luke Seelenbinder wrote:
> 

> > We were using http-reuse always and experiencing this issue (as well
> > as getting 80+% connection reuse). When I scaled it back to http-reuse
> > safe, the frequency of the issue seemed much lower. (Though perhaps
> > that's because the bulk of my testing used a single client and was
> > somewhat unscientific?)
> 

> It could be caused by various things. In my tests the client doesn't even
> use keep-alive, so haproxy is less aggressive with connection reuse, and
> that could explain some differences.
> 
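> For reference, the difference between the two modes is a single backend
> directive; a minimal sketch, with made-up names and addresses:
> 
>     backend app
>         # "safe" gives the first request of a session its own connection
>         # and only lets subsequent requests reuse already-established
>         # idle ones; "always" lets any request, including a session's
>         # first, reuse any idle connection, which is more aggressive.
>         http-reuse safe    # was: http-reuse always
>         server app1 192.0.2.10:8080 proto h2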

> > > Thus it
> > > definitely is a matter of bad interaction between two streams, or one
> > > stream affecting the connection and hurting the other stream.
> > 

> > My debugging spidey-sense points to the same thing.
> 

> So I have more info now. There are multiple issues which stack up and
> cause this:
> 

> -   the GOAWAY frame indicating the last stream ID might still be in
>     flight while many more streams have already been created. This
>     results in streams dying in batches once the limit is reached;
>     

> -   the last stream ID received in the GOAWAY frame was not taken into
>     account when calculating the number of available streams, so more
>     streams could be created than the server was willing to accept;
>     

> -   there is an issue with how new streams are attached to idle
>     connections, making them non-retryable in case of a failure such as
>     the above. I managed to fix this but it still requires some testing
>     with other configs;
>     

> -   another issue affects idle connections: some of them could remain
>     in the idle list even though they have no room left, because they
>     are only removed when they deliver their last stream, so the check
>     doesn't handle jumps in the number of available streams. I suspect
>     it could be related to the client aborts that cause server aborts,
>     simply because it allowed some excess streams to be sent to a mux
>     which has no room left, but I could be wrong (both checks are
>     sketched after this list);
>     

> And a less important one: the maximum number of concurrent streams per
> connection is global. In this case it's 100, which is lower than
> nginx's 128, so it doesn't cause any issue here. But we could run into
> problems with this, and I must address it to make the limit
> per-connection (see the global-section sketch further below).
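> 
> To make the stream accounting concrete, here is a minimal sketch in C
> of the two checks mentioned above. This is not HAProxy's actual code;
> the structure and function names are made up for illustration:
> 
>     #include <stdbool.h>
>     #include <stdint.h>
> 
>     struct h2_conn {
>         uint32_t next_stream_id;   /* id the next new stream would take (odd) */
>         uint32_t nb_streams;       /* streams currently open on this connection */
>         uint32_t peer_max_streams; /* peer's SETTINGS_MAX_CONCURRENT_STREAMS */
>         uint32_t goaway_last_id;   /* last-stream-id from GOAWAY, UINT32_MAX if none */
>     };
> 
>     /* Number of new streams this connection can still take. The second
>      * check is the point of the fix: once a GOAWAY was received, streams
>      * whose id would exceed the advertised last stream id must not be
>      * created at all, instead of being created and then dying in a batch. */
>     static uint32_t h2_avail_streams(const struct h2_conn *c)
>     {
>         uint32_t avail;
> 
>         if (c->nb_streams >= c->peer_max_streams)
>             return 0;
>         avail = c->peer_max_streams - c->nb_streams;
> 
>         if (c->goaway_last_id != UINT32_MAX) {
>             if (c->next_stream_id > c->goaway_last_id)
>                 return 0;
>             /* client-initiated stream ids are odd, hence the /2 */
>             uint32_t left = (c->goaway_last_id - c->next_stream_id) / 2 + 1;
>             if (left < avail)
>                 avail = left;
>         }
>         return avail;
>     }
> 
>     /* Whether the connection may stay in the idle list. Removing it only
>      * when it delivers its last stream is not enough: a GOAWAY can make
>      * the number of available streams jump straight to zero, so the test
>      * has to be on the current room, not on a single decrement. */
>     static bool h2_keep_in_idle_list(const struct h2_conn *c)
>     {
>         return h2_avail_streams(c) > 0;
>     }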

> With all these changes, I managed to run a long test with no more
> errors, with only an occasional immediate retry when nginx announced
> the GOAWAY too late. When we set the limit ourselves, there isn't even
> any retry anymore. So I'll continue to work on this and we'll slightly
> delay 1.9.3 to collect these fixes. From there we'll be able to see if
> you still have problems and iterate.
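> 
> For reference, the limit in question is currently the single global
> tune.h2.max-concurrent-streams setting rather than a per-connection
> one; a minimal sketch of the relevant global section:
> 
>     global
>         # one knob for every H2 connection; the default of 100 sits
>         # below nginx's default http2_max_concurrent_streams of 128,
>         # which is what hides the problem in this particular setup
>         tune.h2.max-concurrent-streams 100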
>     

> > Let me know if you want me to share our config (it's quite complex) with you
> > privately or if there's anything else we can do to assist.
> 

> That's kind, but I don't need it anymore; it seems I have everything
> needed to reproduce the whole issue.
> 

> Thanks,
> Willy
