Hi Milan,

On Wed, Jun 06, 2018 at 11:09:19AM +0200, Milan Petruzelka wrote:
> Hi Willy,
> 
> I've tracked one of connections hanging in CLOSE_WAIT state with tcpdump
> over last night. It started at 17:19 like this:
> 
>     "Packet No.","Time in
> seconds","Source","Destination","Protocol","Length","Info"
>     "1","0.000000","ip_client","ip_haproxy_server","TCP","62","64311  >
> 443 [SYN] Seq=0 Win=8192 Len=0 MSS=1460 SACK_PERM=1"
>     "2","0.001049","ip_haproxy_server","ip_client","TCP","62","443  >
> 64311 [SYN, ACK] Seq=0 Ack=1 Win=29200 Len=0 MSS=1460 SACK_PERM=1"
>     "3","0.127239","ip_client","ip_haproxy_server","TCP","54","64311  >
> 443 [ACK] Seq=1 Ack=1 Win=64240 Len=0"
>     "4","0.127344","ip_client","ip_haproxy_server","TLSv1.2","571","Client
> Hello"
>     "5","0.130304","ip_haproxy_server","ip_client","TLSv1.2","2974","Server
> Hello, Certificate"
>     "6","0.130336","ip_haproxy_server","ip_client","TLSv1.2","310","Server
> Key Exchange, Server Hello Done"
> 
> After some 13 seconds client sent it's last data, which haproxy server
> acknowledged.
> 
> 
> "319","13.781347","ip_client","ip_haproxy_server","TLSv1.2","96","Application
> Data"
>     "320","13.781365","ip_haproxy_server","ip_client","TCP","54","443  >
> 64311 [ACK] Seq=240156 Ack=3689 Win=36448 Len=0"
> 
> Then client sent FIN packet, server acknowledged it again
> 
>     "321","16.292016","ip_client","ip_haproxy_server","TCP","54","64311  >
> 443 [FIN, ACK] Seq=3689 Ack=240156 Win=64240 Len=0"
>     "322","16.329574","ip_haproxy_server","ip_client","TCP","54","443  >
> 64311 [ACK] Seq=240156 Ack=3690 Win=36448 Len=0"
> 
> >From then client sent only TCP keepalive every 45s, which server always
> ackonwledged.
> 
>     "323","61.443121","ip_client","ip_haproxy_server","TCP","55","[TCP
> Keep-Alive] 64311  >  443 [ACK] Seq=3689 Ack=240156 Win=64240 Len=1"
>     "324","61.443216","ip_haproxy_server","ip_client","TCP","66","[TCP
> Keep-Alive ACK] 443  >  64311 [ACK] Seq=240156 Ack=3690 Win=36448 Len=0
> SLE=3689 SRE=3690"
>     "325","106.528926","ip_client","ip_haproxy_server","TCP","55","[TCP
> Keep-Alive] 64311  >  443 [ACK] Seq=3689 Ack=240156 Win=64240 Len=1"
>     "326","106.529117","ip_haproxy_server","ip_client","TCP","66","[TCP
> Keep-Alive ACK] 443  >  64311 [ACK] Seq=240156 Ack=3690 Win=36448 Len=0
> SLE=3689 SRE=3690"
>     ...
> 
> After some 4.5 hours (at 21:51) client sent last keepalive which server
> acknowledged. There were no more packets after that.
> 
>     "1043","16284.644240","ip_client","ip_haproxy_server","TCP","55","[TCP
> Keep-Alive] 64311  >  443 [ACK] Seq=3689 Ack=240156 Win=64240 Len=1"
>     "1044","16284.644354","ip_haproxy_server","ip_client","TCP","66","[TCP
> Keep-Alive ACK] 443  >  64311 [ACK] Seq=240156 Ack=3690 Win=36448 Len=0
> SLE=3689 SRE=3690"
>     "1045","16329.797223","ip_client","ip_haproxy_server","TCP","55","[TCP
> Keep-Alive] 64311  >  443 [ACK] Seq=3689 Ack=240156 Win=64240 Len=1"
>     "1046","16329.797274","ip_haproxy_server","ip_client","TCP","66","[TCP
> Keep-Alive ACK] 443  >  64311 [ACK] Seq=240156 Ack=3690 Win=36448 Len=0
> SLE=3689 SRE=3690"
> 
> Next day in the morning at 10:40 I can still see the hanging connection on
> the server:
> 
>     netstat -aptn|grep 64311
>     tcp      430      0 ip_haproxy_server:443      ip_client:64311
>  CLOSE_WAIT  916/haproxy
> 
>     lsof|grep 64311
>     haproxy     916          haproxy   40u     IPv4          106204553
>   0t0        TCP ip_haproxy_server:https->ip_client:64311 (CLOSE_WAIT)
> 
>     echo "show fd" | socat - $HASOCK | grep "40 :"
>     40 : st=0x20(R:pra W:pRa) ev=0x00(heopi) [nlc] cache=0 owner=0x1648d80
> iocb=0x4d2c80(conn_fd_handler) tmask=0x1 umask=0x0 cflg=0x80203300
> fe=fe-http mux=H2 mux_ctx=0x15e9460
> 
> I hope this can help in tracking the problem down.

Sure, it's extremely useful. It looks like a shutdown is ignored when
waiting for the H2 preface. The FD is not being polled for reading, so
I tend to conclude that either it was disabled for a reason I still do
not know, or it used to be enabled, the shutdown was received then the
polling was disabled. But that doesn't appear in the connection flags.
So it seems that the transition between the end of handshake and the
start of parsing could be at fault. Maybe we refrain from entering the
H2 state machine because of the early shutdown. I'm going to have a look
in that direction.

Thank you!
Willy

Reply via email to