Hey,

I've got a kinda strange problem with UNIX sockets. I first thought it was
a Varnish issue, but it may actually be a HAProxy one.

Attached are a test config and a static error file. There's also a test
script (Perl, no third-party modules, just a few lines), which I'd rather
share only off-list (just mail me / let me know) to keep script kiddies
from abusing it.

The huge timeouts and maxconn 500k are just for this test case. The test
script opens 200k connections (verify via netstat/ss or similar) and then
waits for a /tmp/debug file to exist (just touch it when ready). Once the
file exists, it sends GET / requests over all of those connections as
fast as possible. It was originally meant to test connections to Varnish.

The strange part is what happens next; the outcome varies between runs:

Often: some requests are answered with a 503 by HAProxy, which logs an
"SC" termination state on the backend connection:
Aug 25 18:14:20 localhost haproxy[23512]: unix:1 [25/Aug/2022:18:14:20.051] li_udstest be_static_err/<NOSRV> 62/-1/-1/-1/62 503 97 - - SC-- 162032/1/0/0/0 0/0 "GET / HTTP/1.1"

Sometimes: the same 503s, plus:
Aug 25 18:14:19 localhost haproxy[23512]: Connect() failed for backend be_udstest: can't connect to destination unix socket, check backlog size on the server.

And sometimes: all 200s, everything fine.
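
For context on the second message: on Linux, a non-blocking connect() to
a UNIX stream socket whose accept queue is already full fails immediately
with EAGAIN instead of waiting, which is presumably the condition that
warning refers to. The Python sketch below reproduces only that condition
in isolation, assuming Linux semantics; fill_unix_backlog, its parameters
and the temporary socket path are made up for the demo and are unrelated
to the test setup.

```python
import os
import socket
import tempfile


def fill_unix_backlog(backlog=1, max_clients=100):
    """Connect non-blocking AF_UNIX clients to a listener that never accepts.

    Returns (number of connects that succeeded, errno of the first failure),
    or (max_clients, None) if no connect failed. Assumes Linux semantics.
    """
    tmpdir = tempfile.mkdtemp()
    path = os.path.join(tmpdir, "demo.sock")  # hypothetical demo socket path
    server = socket.socket(socket.AF_UNIX, socket.SOCK_STREAM)
    clients = []
    try:
        server.bind(path)
        server.listen(backlog)  # small accept queue; accept() is never called
        for _ in range(max_clients):
            client = socket.socket(socket.AF_UNIX, socket.SOCK_STREAM)
            client.setblocking(False)  # non-blocking, as a proxy would use
            clients.append(client)
            try:
                client.connect(path)
            except OSError as exc:
                # Expected once the accept queue is full; on Linux: EAGAIN.
                return len(clients) - 1, exc.errno
        return len(clients), None
    finally:
        for client in clients:
            client.close()
        server.close()
        if os.path.exists(path):
            os.unlink(path)
        os.rmdir(tmpdir)
```

With the default backlog of 1 this should report a few successful connects
followed by errno.EAGAIN; the exact count depends on the kernel's queue
accounting.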

I'd expect it to behave basically the same way every time, but the three
cases are completely different. And of course I'd like it to answer only
200s :)
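
To check a run against that expectation without eyeballing the log, the
httplog lines can be tallied per frontend, status code and termination
state. A rough Python sketch, assuming the default option httplog layout
shown above behind a syslog prefix and no captured headers; HTTPLOG_RE and
tally are hypothetical names, not anything HAProxy ships.

```python
import re
from collections import Counter

# Default "option httplog" line after the syslog prefix, e.g. the li_udstest
# line quoted above. Assumes no captured request/response headers ({...}).
HTTPLOG_RE = re.compile(
    r'haproxy\[\d+\]: (?P<client>\S+) \[(?P<accept_date>[^\]]+)\] '
    r'(?P<frontend>\S+) (?P<backend>[^/\s]+)/(?P<server>\S+) '
    r'(?P<timers>\S+) (?P<status>-?\d+) (?P<bytes>\d+) \S+ \S+ '
    r'(?P<term>\S{4}) (?P<conns>\S+) (?P<queues>\S+) "(?P<request>[^"]*)"'
)


def tally(lines):
    """Count (frontend, status, termination state) triples in syslog lines.

    Lines that don't match the httplog layout (e.g. the "Connect() failed"
    alert) are counted under the key "other".
    """
    counts = Counter()
    for line in lines:
        m = HTTPLOG_RE.search(line)
        if m:
            counts[(m["frontend"], int(m["status"]), m["term"])] += 1
        else:
            counts["other"] += 1
    return counts
```

Feeding it the file the local0 facility is written to, e.g.
tally(open(path)), gives one Counter per run that can be compared across
the three kinds of outcome.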
The process limits are looking good so far:
# grep 'open files' /proc/$(pgrep -n haproxy)/limits
Max open files            1000082              8000000              files
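
The same check can be scripted, e.g. to verify the limits before every
run. A minimal Python sketch, assuming Linux's /proc/<pid>/limits text
layout; open_files_limits is a hypothetical helper name.

```python
def open_files_limits(pid="self"):
    """Return the (soft, hard) "Max open files" limits of a process.

    Reads /proc/<pid>/limits (Linux only). Numeric values are returned as
    ints; anything else (e.g. "unlimited") is returned as the raw string.
    """
    with open(f"/proc/{pid}/limits") as f:
        for line in f:
            if line.startswith("Max open files"):
                # Remaining columns: soft limit, hard limit, units.
                soft, hard = line[len("Max open files"):].split()[:2]
                return tuple(int(v) if v.isdigit() else v for v in (soft, hard))
    raise ValueError(f"no 'Max open files' line for pid {pid}")
```

For HAProxy, pass the PID the same way the grep above does, e.g.
open_files_limits(int(subprocess.check_output(["pgrep", "-n", "haproxy"]))).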

On another test machine we even have a 30 million hard and 10 million soft
limit, so the limits should be more than enough in any case.

This has been tested with HAProxy 2.6.2, 2.6.3 and 2.6.4. The same happens
if be_udstest points to, e.g., Varnish via UDS.

Can you reproduce it? Any idea what might cause it?
--
Regards,
Christian Ruppert

global
    maxconn 500000

defaults
    maxconn 500000
    timeout client 15m
    timeout client-fin 15m
    timeout connect 15m
    timeout http-request 15m
    timeout queue 15m
    timeout http-keep-alive 15m
    timeout server 15m
    log 127.0.0.1 len 65535 local0

frontend fe_udstest
    mode http
    bind :61610
    log global
    option httplog
    default_backend be_udstest

backend be_udstest
    mode http
    server udstest unix@/run/udstest.sock

listen li_udstest
    mode http
    option httplog
    bind unix@/run/udstest.sock mode 666
    default_backend be_static_err

backend be_static_err
    mode http
    errorfile 503 /etc/haproxy/static-err.txt

HTTP/1.1 200 OK
Cache-Control: no-cache
Connection: close
Content-Type: text/plain

Test