Hi Milan,

On Fri, Jun 08, 2018 at 01:26:41PM +0200, Milan Petruzelka wrote:
> On Wed, 6 Jun 2018 at 11:20, Willy Tarreau <w...@1wt.eu> wrote:
> 
> > Hi Milan,
> >
> > On Wed, Jun 06, 2018 at 11:09:19AM +0200, Milan Petruzelka wrote:
> > > Hi Willy,
> > >
> > > I've tracked one of connections hanging in CLOSE_WAIT state with tcpdump
> > > over last night. It started at 17:19 like this:
> > >
> > >     "Packet No.","Time in
> > seconds","Source","Destination","Protocol","Length","Info"
> > >     "1","0.000000","ip_client","ip_haproxy_server","TCP","62","64311
> > > 443 [SYN] Seq=0 Win=8192 Len=0 MSS=1460 SACK_PERM=1"
> > >     "2","0.001049","ip_haproxy_server","ip_client","TCP","62","443
> > > 64311 [SYN, ACK] Seq=0 Ack=1 Win=29200 Len=0 MSS=1460 SACK_PERM=1"
> >
> > Sure, it's extremely useful. It looks like a shutdown is ignored when
> > waiting for the H2 preface. The FD is not being polled for reading, so
> > I tend to conclude that either it was disabled for a reason I still do
> > not know, or it used to be enabled, the shutdown was received then the
> > polling was disabled. But that doesn't appear in the connection flags.
> > So it seems that the transition between the end of handshake and the
> > start of parsing could be at fault. Maybe we refrain from entering the
> > H2 state machine because of the early shutdown. I'm going to have a look
> > in that direction.
> 
> 
> Hi Willy,
> 
> Here is some additional info about the hanging CLOSE_WAIT connection
> described in my last message - screenshot from our log management -
> https://pasteboard.co/HoVCR0b.jpg and text variant:
> 
> No.  Time                         hap.ts.accept                hap.client.port  hap.term.state  hap.rq.method  hap.rsp.status  hap.rsp.size
> 1    June 5th 2018, 17:19:22.913  June 5th 2018, 17:19:22.435  64311            ----            GET            200             17,733
> 2    June 5th 2018, 17:19:23.814  June 5th 2018, 17:19:23.806  64311            ----            GET            200             4,671
> 3    June 5th 2018, 17:19:23.814  June 5th 2018, 17:19:23.806  64311            ----            GET            200             4,481
> 4    June 5th 2018, 17:19:23.820  June 5th 2018, 17:19:23.808  64311            ----            GET            200             4,613
> 5    June 5th 2018, 17:19:23.821  June 5th 2018, 17:19:23.808  64311            ----            GET            200             4,584
> 6    June 5th 2018, 17:19:24.967  June 5th 2018, 17:19:24.961  64311            ----            GET            200             2,786
> 7    June 5th 2018, 17:19:28.945  June 5th 2018, 17:19:25.049  64311            ----            GET            200             78,325
> 8    June 5th 2018, 17:19:28.945  June 5th 2018, 17:19:25.049  64311            ----            GET            200             46,830
> 9    June 5th 2018, 17:19:31.617  June 5th 2018, 17:19:24.707  64311            CL--            GET            200             13,086
> 10   June 5th 2018, 17:19:31.618  June 5th 2018, 17:19:25.049  64311            CD--            GET            200             49,162
> 11   June 5th 2018, 17:19:31.618  June 5th 2018, 17:19:25.049  64311            CD--            GET            200             40,328
> 12   June 5th 2018, 17:19:31.618  June 5th 2018, 17:19:25.066  64311            CL--            POST           200             3,198
> 
> As you can see, the client first made 8 successful requests over the
> http/2 connection through Haproxy to our backend server. The remaining
> 4 requests finished with termination state CL-- or CD--, probably
> because of a FIN packet received from the client.

Or because the client aborted with an RST_STREAM, which is equally
possible (eg: the user pressed STOP).
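
For reference, here is a minimal sketch (illustrative only, not haproxy
code, and mk_rst_stream is a hypothetical helper) of what such an abort
looks like on the wire. Per RFC 7540 an RST_STREAM frame is just the
9-byte frame header plus a 4-byte error code, and a browser typically
sends CANCEL when the user presses STOP. Note that it only aborts the
one stream; the TCP connection itself stays open, which is what makes
this case distinguishable from a FIN:

    #include <stdint.h>
    #include <stddef.h>

    #define H2_FT_RST_STREAM 0x03  /* frame type (RFC 7540, 6.4)      */
    #define H2_ERR_CANCEL    0x08  /* error code sent on user abort   */

    /* Serialize an RST_STREAM frame into buf (13 bytes) and return
     * its length.
     */
    static size_t mk_rst_stream(uint8_t *buf, uint32_t stream_id)
    {
        buf[0] = 0; buf[1] = 0; buf[2] = 4;   /* 24-bit payload len = 4 */
        buf[3] = H2_FT_RST_STREAM;            /* type                   */
        buf[4] = 0;                           /* no flags defined       */
        buf[5] = (stream_id >> 24) & 0x7f;    /* R bit cleared          */
        buf[6] = (stream_id >> 16) & 0xff;
        buf[7] = (stream_id >> 8) & 0xff;
        buf[8] = stream_id & 0xff;
        buf[9] = 0; buf[10] = 0; buf[11] = 0; /* 32-bit error code ...  */
        buf[12] = H2_ERR_CANCEL;              /* ... CANCEL             */
        return 13;
    }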

> C : the TCP session was unexpectedly aborted by the client
> D : the session was in the DATA phase.
> L : the proxy was still transmitting LAST data to the client while the
> server had already finished. This one is very rare as it can only happen
> when the client dies while receiving the last packets.
> 
> It looks like Haproxy knows about the client disconnection in the
> muxed requests inside the frontend http/2 connection. It just seems
> not to propagate this knowledge to the frontend connection itself,
> leaving it hanging in CLOSE_WAIT state.

It is indeed a possibility. I'm currently studying the code in the hope
of finding a clue there. So far I've been unsuccessful at provoking the
issue myself.
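
To make the suspected sequence more concrete, here is a deliberately
simplified model of the race described above (all names are invented
for illustration, this is not haproxy's actual code):

    #include <stdbool.h>

    /* Hypothetical connection state around the handshake-to-preface
     * transition.
     */
    struct conn {
        bool shutr_seen;    /* peer's FIN already noticed        */
        bool read_enabled;  /* fd subscribed for read events?    */
    };

    static void h2_start_parsing(struct conn *c)
    {
        /* Suspected bug pattern: if the shutdown was seen before we
         * reach this point, we never subscribe to read events, the
         * preface parser is never entered, nothing ever closes the
         * fd, and the kernel keeps the socket in CLOSE_WAIT forever.
         */
        if (c->shutr_seen)
            return;              /* early return swallows the event */

        c->read_enabled = true;  /* normal path: poll, then parse   */
    }

If that theory is right, the fix, whatever form it takes in the real
code, would be to treat a pending shutdown as a readable event so the
H2 state machine still runs once and gets a chance to close the
connection itself.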

> Sorry for not sending this piece of log last time, i didn't expect
> to see any of those CL-- / CD-- requests.

Hey, are you kidding? You don't have to be sorry, you've sent one of
the most detailed bug reports we've had in a while, and I really
appreciate the amount of work it must have taken on your side to dig
through this and blur all these fields while still providing us with
as much information as possible! No, thanks *a lot* for your traces,
really!

> I'll switch http/2 off on the site because I'll be AFK for 2 weeks.
> After that I'll be ready to provide any additional info to hunt this
> one down, or to test a patched version if available.

I'll continue to work on this. I really don't like this strange behaviour.

Thanks!
Willy