Hi Janusz,

On Thu, May 24, 2018 at 01:49:52PM +0200, Janusz Dziemidowicz wrote:
> Recently I've moved several servers from haproxy 1.7.x to 1.8.x. I have
> a setup with nghttpx handling h2 (haproxy connects to nghttpx via unix
> socket which handles h2 and connects back to haproxy with plain
> http/1.1 also through unix socket).
> 
> After the upgrade I wanted to switch to native h2 supported by
> haproxy. Unfortunately, it seems that over time haproxy is
> accumulating sockets in CLOSE_WAIT state. Currently, after 12h I have
> 5k connections in this state. All of them have non-zero Recv-Q and
> zero Send-Q. netstat -ntpa shows something like this:
> 
> tcp        1      0 IP:443      IP:28032      CLOSE_WAIT  115495/haproxy
> tcp       35      0 IP:443      IP:49531      CLOSE_WAIT  115495/haproxy
> tcp      507      0 IP:443      IP:31938      CLOSE_WAIT  115495/haproxy
> tcp      134      0 IP:443      IP:49672      CLOSE_WAIT  115495/haproxy
> tcp      732      0 IP:443      IP:3180       CLOSE_WAIT  115494/haproxy
> tcp      746      0 IP:443      IP:39731      CLOSE_WAIT  115494/haproxy
> tcp       35      0 IP:443      IP:62986      CLOSE_WAIT  115495/haproxy
> tcp      585      0 IP:443      IP:51318      CLOSE_WAIT  115493/haproxy
> tcp      100      0 IP:443      IP:60449      CLOSE_WAIT  115493/haproxy
> tcp       35      0 IP:443      IP:1274       CLOSE_WAIT  115494/haproxy
> ..

I have never managed to see this happen so far. Even haproxy.org uses H2,
and I've just checked on the server: zero CLOSE_WAIT. What is strange is
that they all have pending data, which means the clients sent some data
and then closed. It could correspond to a timeout where the client finally
closed after not receiving a response.
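
To track how these sockets accumulate per process over time, something
like the following could help. It is only a rough sketch that assumes the
netstat -ntpa column layout shown in your report; the function name is
made up for illustration, and the ESTABLISHED line in the sample is
invented to show that other states are ignored.

```python
from collections import Counter


def close_wait_by_process(netstat_output):
    """Count CLOSE_WAIT sockets with pending receive data, per PID/program."""
    counts = Counter()
    for line in netstat_output.splitlines():
        fields = line.split()
        # Assumed columns: proto recv-q send-q local foreign state pid/program
        if len(fields) < 7 or not fields[0].startswith("tcp"):
            continue
        recv_q, state, owner = fields[1], fields[5], fields[6]
        if state == "CLOSE_WAIT" and recv_q.isdigit() and int(recv_q) > 0:
            counts[owner] += 1
    return counts


# Two lines taken from the report, plus one invented ESTABLISHED line.
sample = """\
tcp        1      0 IP:443      IP:28032      CLOSE_WAIT  115495/haproxy
tcp      732      0 IP:443      IP:3180       CLOSE_WAIT  115494/haproxy
tcp        0      0 IP:443      IP:1274       ESTABLISHED 115494/haproxy
"""

for owner, count in sorted(close_wait_by_process(sample).items()):
    print(owner, count)
```

Feeding it the real netstat output at regular intervals would show
whether the count grows steadily or in bursts.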

> Those are all frontend connections. Reloading haproxy removes those
> connections, but only after hard-stop-after kicks in and old processes
> are killed. Disabling native h2 support and switching back to nghttpx
> makes the problem disappear.

OK.

> This kinda seems like the socket was closed on the writing side, but
> the client has already sent something and everything is stuck. I was
> not able to reproduce the problem by myself. Any ideas how to debug
> this further?

For now not much comes to my mind. I'd be interested in seeing the
output of "show fd" issued on the stats socket of such a process (it
can be large, be careful).
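
If it is easier to script than to type by hand, a minimal sketch for
capturing that output is below. It assumes the stats socket is a Unix
socket that closes the connection after answering one command (the
default non-interactive behaviour); the helper name and the socket path
in the usage note are placeholders, so use whatever your "stats socket"
line points to.

```python
import socket


def run_stats_command(sock_path, command, timeout=5.0):
    """Send one command to a Unix stats socket and return the full reply."""
    with socket.socket(socket.AF_UNIX, socket.SOCK_STREAM) as s:
        s.settimeout(timeout)
        s.connect(sock_path)
        # Commands are newline-terminated.
        s.sendall(command.encode("ascii") + b"\n")
        chunks = []
        while True:
            data = s.recv(65536)
            if not data:
                # Peer closed the connection: the reply is complete.
                break
            chunks.append(data)
    return b"".join(chunks).decode("ascii", errors="replace")
```

For example, run_stats_command("/path/to/haproxy.sock", "show fd") and
write the result to a file rather than the terminal, since the dump can
be large.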

> haproxy -vv (Debian package rebuilt on stretch with USE_TFO):

Interesting, and I'm seeing "tfo" on your bind line. We don't have it
on haproxy.org. Could you please re-test without it, just in case?
Maybe you're receiving SYN+data+FIN segments that are not properly handled.

> HA-Proxy version 1.8.9-1~tsg9+1 2018/05/21

Is 1.8.9 the first 1.8 version you tested, or just the first one on which
you saw the issue? Did you notice it on any other 1.8 version? If it
turns out to be a regression, it could in fact be easier to spot.

Your config is very clean and shows nothing suspicious at all. So knowing
whether tfo changes anything would be a good first step.

Thanks!
Willy
