On Wed, 6 Jun 2018 at 11:20, Willy Tarreau <w...@1wt.eu> wrote:

> Hi Milan,
>
> On Wed, Jun 06, 2018 at 11:09:19AM +0200, Milan Petruzelka wrote:
> > Hi Willy,
> >
> > I've tracked one of connections hanging in CLOSE_WAIT state with tcpdump
> > over last night. It started at 17:19 like this:
> >
> >     "Packet No.","Time in
> seconds","Source","Destination","Protocol","Length","Info"
> >     "1","0.000000","ip_client","ip_haproxy_server","TCP","62","64311
> > 443 [SYN] Seq=0 Win=8192 Len=0 MSS=1460 SACK_PERM=1"
> >     "2","0.001049","ip_haproxy_server","ip_client","TCP","62","443
> > 64311 [SYN, ACK] Seq=0 Ack=1 Win=29200 Len=0 MSS=1460 SACK_PERM=1"
>
> Sure, it's extremely useful. It looks like a shutdown is ignored when
> waiting for the H2 preface. The FD is not being polled for reading, so
> I tend to conclude that either it was disabled for a reason I still do
> not know, or it used to be enabled, the shutdown was received then the
> polling was disabled. But that doesn't appear in the connection flags.
> So it seems that the transition between the end of handshake and the
> start of parsing could be at fault. Maybe we refrain from entering the
> H2 state machine because of the early shutdown. I'm going to have a look
> in that direction.


Hi Willy,

Here is some additional info about the hanging CLOSE_WAIT connection
described in my last message. A screenshot from our log management is at
https://pasteboard.co/HoVCR0b.jpg; the text version follows:

All timestamps below are from June 5th 2018.

No.  Time          hap.ts.accept  hap.client.port  hap.term.state  hap.rq.method  hap.rsp.status  hap.rsp.size
1    17:19:22.913  17:19:22.435   64311            ----            GET            200             17,733
2    17:19:23.814  17:19:23.806   64311            ----            GET            200             4,671
3    17:19:23.814  17:19:23.806   64311            ----            GET            200             4,481
4    17:19:23.820  17:19:23.808   64311            ----            GET            200             4,613
5    17:19:23.821  17:19:23.808   64311            ----            GET            200             4,584
6    17:19:24.967  17:19:24.961   64311            ----            GET            200             2,786
7    17:19:28.945  17:19:25.049   64311            ----            GET            200             78,325
8    17:19:28.945  17:19:25.049   64311            ----            GET            200             46,830
9    17:19:31.617  17:19:24.707   64311            CL--            GET            200             13,086
10   17:19:31.618  17:19:25.049   64311            CD--            GET            200             49,162
11   17:19:31.618  17:19:25.049   64311            CD--            GET            200             40,328
12   17:19:31.618  17:19:25.066   64311            CL--            POST           200             3,198

As you can see, the client first made 8 successful requests over the http/2
connection through Haproxy to our backend server. The remaining 4 requests
finished with termination states CL-- or CD--, probably because of a FIN
packet received from the client.

For reference, the meaning of those termination flags:

  C : the TCP session was unexpectedly aborted by the client.
  D : the session was in the DATA phase.
  L : the proxy was still transmitting LAST data to the client while the
      server had already finished. This one is very rare as it can only
      happen when the client dies while receiving the last packets.
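
For completeness, the FIN from the client can also be confirmed directly in
the capture with a filter along these lines (the file name is only a
placeholder for wherever the dump was saved):

    tcpdump -nn -r capture-64311.pcap 'tcp port 64311 and (tcp[tcpflags] & tcp-fin) != 0'

That should print only the FIN-bearing packets of that client connection.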

It looks like Haproxy knows about the client disconnection in the muxed
requests inside the frontend http/2 connection. It just does not seem to
propagate that knowledge to the frontend connection itself, leaving it
hanging in CLOSE_WAIT state. Sorry for not sending this piece of the log
last time; I didn't expect to see any of those CL-- / CD-- requests.
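
The hanging sockets can be listed on the haproxy machine with something like
this (assuming Linux with iproute2; 443 is the frontend bind port, so it is
the local/source port of these frontend connections):

    ss -tn state close-wait '( sport = :443 )'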

I'll switch http/2 off on the site because I'll be AFK for 2 weeks. After
that I'll be ready to provide any additional info to hunt this one down, or
to test a patched version if one is available.
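
For reference, with h2 enabled through the alpn parameter on the bind line,
switching http/2 off is just a matter of dropping h2 from the list, roughly
like this (the certificate path is only a placeholder):

    # with http/2 enabled
    bind :443 ssl crt /etc/haproxy/certs/site.pem alpn h2,http/1.1
    # with http/2 disabled
    bind :443 ssl crt /etc/haproxy/certs/site.pem alpn http/1.1

Putting h2 back and reloading haproxy restores the previous behaviour.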

Thank you, best regards
Milan
