On 2018/8/9 01:01, Patrick Hemmer wrote: > There's an issue on current master (287527a) where haproxy is losing > track of its connections, and they're getting stuck in CLOSE_WAIT. And > it's counting these connections towards limits (such as maxconn). > Eventually maxconn is reached and everything starts queuing, and > timing out. > > Here's an example: > # lsof -p 25520 > COMMAND PID USER FD TYPE DEVICE > SIZE/OFF NODE NAME > .... > haproxy 25520 phemmer 0u CHR 16,5 > 0x1409380c 1797 /dev/ttys005 > haproxy 25520 phemmer 1u CHR 16,5 > 0x1409380c 1797 /dev/ttys005 > haproxy 25520 phemmer 2u CHR 16,5 > 0x1409380c 1797 /dev/ttys005 > haproxy 25520 phemmer 3u > KQUEUE count=0, state=0xa > haproxy 25520 phemmer 4u unix 0x933b1c4c9ed396b7 > 0t0 /tmp/haproxy.sock.25520.tmp > haproxy 25520 phemmer 5u IPv4 0x933b1c4cbf5fc73f > 0t0 TCP *:8001 (LISTEN) > haproxy 25520 phemmer 6u IPv4 0x933b1c4c925fe437 > 0t0 UDP *:51977 > haproxy 25520 phemmer 7 PIPE 0x933b1c4c9f89c457 > 16384 ->0x933b1c4c9f89bf17 > haproxy 25520 phemmer 8 PIPE 0x933b1c4c9f89bf17 > 16384 ->0x933b1c4c9f89c457 > haproxy 25520 phemmer 9u unix 0x933b1c4c9548c96f > 0t0 /tmp/haproxy.sock.25520.tmp > haproxy 25520 phemmer 11u IPv4 0x933b1c4cc0cca73f > 0t0 TCP 127.0.0.1:8001->127.0.0.1:59199 (CLOSE_WAIT) > haproxy 25520 phemmer 12u IPv4 0x933b1c4cbfd20ddf > 0t0 TCP 127.0.0.1:59200->127.0.0.1:8081 (CLOSE_WAIT) > > > FD 11 was the client making the request to haproxy, and FD 12 was > haproxy to the server. > > > show fd > 4 : st=0x05(R:PrA W:pra) ev=0x01(heopI) [lc] > cnext=-13 cprev=-2 tmask=0x1 umask=0x0 owner=0x7f852af08430 > iocb=0x10bb89c40(listener_accept) l.st=RDY fe=GLOBAL > 5 : st=0x05(R:PrA W:pra) ev=0x01(heopI) [lc] > cnext=-13 cprev=-2 tmask=0x1 umask=0x0 owner=0x7f852af08a40 > iocb=0x10bb89c40(listener_accept) l.st=RDY fe=f1 > 7 : st=0x05(R:PrA W:pra) ev=0x00(heopi) [lc] > cnext=-3 cprev=0 tmask=0x1 umask=0x0 owner=0x10bbc2ca0 > iocb=0x10bbc2ca0(poller_pipe_io_handler) > 9 : st=0x20(R:pra W:pRa) ev=0x00(heopi) [lc] > cnext=-3 cprev=-2 tmask=0x1 umask=0x1 owner=0x7f852ae14550 > iocb=0x10bba60a0(conn_fd_handler) cflg=0x00201300 fe=GLOBAL mux=PASS > mux_ctx=0x7f852ca00000 > 11 : st=0x22(R:pRa W:pRa) ev=0x10(Heopi) [lc] > cnext=-3 cprev=-2 tmask=0x1 umask=0x0 owner=0x7f852ae13ef0 > iocb=0x10bba60a0(conn_fd_handler) cflg=0x80241300 fe=f1 mux=PASS > mux_ctx=0x7f852ae14080 > 12 : st=0x25(R:PrA W:pRa) ev=0x00(heopi) [Lc] > cnext=-17 cprev=-2 tmask=0x1 umask=0x0 owner=0x7f852ca000c0 > iocb=0x10bba60a0(conn_fd_handler) cflg=0x00202306 sv=s1/b1 mux=PASS > mux_ctx=0x7f852ae14bb0 > > > show sess > 0x7f852ae14140: proto=tcpv4 src=127.0.0.1:59199 fe=f1 be=b1 > srv=s1 ts=00 age=2m9s calls=6 rq[f=8840020h,i=0,an=8000h,rx=,wx=,ax=] > rp[f=80400000h,i=0,an=a00000h,rx=,wx=,ax=] s0=[7,8h,fd=11,ex=] > s1=[7,118h,fd=12,ex=] exp= > 0x7f852ae14830: proto=unix_stream src=unix:1 fe=GLOBAL > be=<NONE> srv=<none> ts=00 age=20s calls=3 > rq[f=40c08202h,i=0,an=00h,rx=,wx=,ax=] > rp[f=c0048202h,i=0,an=00h,rx=,wx=,ax=] s0=[7,ch,fd=9,ex=] > s1=[7,4018h,fd=-1,ex=] exp=23h59m > > > > Here's the config I'm using: > global > stats socket /tmp/haproxy.sock level admin > defaults > log 127.0.0.1:1234 daemon > mode http > option httplog > timeout queue 5s > frontend f1 > bind :8001 > default_backend b1 > backend b1 > server s1 127.0.0.1:8081 maxconn 1 > > > Log output around the time it broke: > <30>Aug 9 00:45:56 haproxy[25520]: 127.0.0.1:59186 > [09/Aug/2018:00:45:56.620] f1 b1/s1 0/167/1/107/275 200 121 - - ---- > 2/2/1/1/0 0/2 "GET / HTTP/1.1" > <30>Aug 9 00:45:57 haproxy[25520]: 127.0.0.1:59189 > [09/Aug/2018:00:45:56.726] f1 b1/s1 0/169/1/108/278 200 121 - - ---- > 2/2/1/1/0 0/2 "GET / HTTP/1.1" > <30>Aug 9 00:45:57 haproxy[25520]: 127.0.0.1:59193 > [09/Aug/2018:00:45:56.948] f1 b1/s1 0/56/1/109/167 200 121 - - ---- > 2/2/1/1/0 0/1 "GET / HTTP/1.1" > <30>Aug 9 00:45:57 haproxy[25520]: 127.0.0.1:59196 > [09/Aug/2018:00:45:57.056] f1 b1/s1 0/58/1/101/161 200 121 - - ---- > 2/2/1/1/0 0/1 "GET / HTTP/1.1" > <30>Aug 9 00:46:02 haproxy[25520]: 127.0.0.1:59202 > [09/Aug/2018:00:45:57.351] f1 b1/<NOSRV> 0/5001/-1/-1/5007 503 212 - - > sQ-- 3/3/2/0/0 0/1 "GET / HTTP/1.1" > <30>Aug 9 00:46:02 haproxy[25520]: 127.0.0.1:59204 > [09/Aug/2018:00:45:57.546] f1 b1/<NOSRV> 0/5005/-1/-1/5005 503 212 - - > sQ-- 3/3/2/0/0 0/2 "GET / HTTP/1.1" > <30>Aug 9 00:46:07 haproxy[25520]: 127.0.0.1:59206 > [09/Aug/2018:00:46:02.414] f1 b1/<NOSRV> 0/5010/-1/-1/5011 503 212 - - > sQ-- 3/3/2/0/0 0/2 "GET / HTTP/1.1" > <30>Aug 9 00:46:08 haproxy[25520]: 127.0.0.1:59208 > [09/Aug/2018:00:46:03.610] f1 b1/<NOSRV> 0/5010/-1/-1/5010 503 212 - - > sQ-- 3/3/2/0/0 0/2 "GET / HTTP/1.1" > > > Attached is the debug output (compiled with DEBUG=-DDEBUG_FULL) > > To reproduce I was hammering it with several clients simultaneously, > and a server which basically slept for 100ms before responding. Even > with this it's still rather difficult to trigger. I sometimes will > have to run for a few hours before it happens. > > Version data: > # ./haproxy -vv > Sharing comp_state with email_alert > Sharing filter with tcpcheck_ru > Sharing sig_handlers with pipe > Sharing tasklet with email_alert > Sharing notification with email_alert > Sharing session with task > Sharing conn_stream with task > Sharing pendconn with tcpcheck_ru > Sharing http_txn with spoe_ctx > Sharing hlua with uniqueid > HA-Proxy version 1.9-dev1-287527-26 2018/08/08 > Copyright 2000-2018 Willy Tarreau <wi...@haproxy.org> > > Build options : > TARGET = osx > CPU = generic > CC = gcc > CFLAGS = -O0 -g -fno-strict-aliasing > -Wdeclaration-after-statement -fwrapv -fno-strict-overflow > -Wno-address-of-packed-member -Wno-null-dereference -Wno-unused-label > OPTIONS = USE_ZLIB=1 USE_OPENSSL=1 USE_LUA=1 USE_PCRE=1 > > Default settings : > maxconn = 2000, bufsize = 16384, maxrewrite = 1024, > maxpollevents = 200 > > Built with OpenSSL version : OpenSSL 1.1.0h 27 Mar 2018 > Running on OpenSSL version : OpenSSL 1.1.0h 27 Mar 2018 > OpenSSL library supports TLS extensions : yes > OpenSSL library supports SNI : yes > OpenSSL library supports : TLSv1.0 TLSv1.1 TLSv1.2 > Built with Lua version : Lua 5.3.4 > Built with transparent proxy support using: > Encrypted password support via crypt(3): yes > Built with PCRE version : 8.42 2018-03-20 > Running on PCRE version : 8.42 2018-03-20 > PCRE library supports JIT : no (USE_PCRE_JIT not set) > Built with zlib version : 1.2.11 > Running on zlib version : 1.2.11 > Compression algorithms supported : identity("identity"), > deflate("deflate"), raw-deflate("deflate"), gzip("gzip") > Built with network namespace support. > > Available polling systems : > kqueue : pref=300, test result OK > poll : pref=200, test result OK > select : pref=150, test result OK > Total: 3 (3 usable), will use kqueue. > > Available multiplexer protocols : > (protocols markes as <default> cannot be specified using > 'proto' keyword) > h2 : mode=HTTP side=FE > <default> : mode=TCP|HTTP side=FE|BE > > Available filters : > [SPOE] spoe > [COMP] compression > [TRACE] trace > > > -Patrick
I think this is my last ping. -Patrick