Hi List,
After upgrading from 3.2.11 to 3.2.12, we're seeing TCP connections from
HAProxy's DNS resolver to nameservers accumulate in CLOSE_WAIT state
indefinitely. The nameservers send their FIN, but HAProxy never completes
the teardown on its side. The leak is continuous: we observed growth from
~1K to ~5K total connections in about 4 hours.
The issue appeared immediately after upgrading with no configuration
changes. It does not occur on 3.2.11.
We use TCP resolvers (tcp6@ prefix) with SRV record resolution via
server-template. The relevant config:
resolvers default
    nameserver g1 tcp6@[2001:4860:4860::8888]:53 source [<ipv6>]
    nameserver g2 tcp6@[2001:4860:4860::8844]:53 source [<ipv6>]
    nameserver opendns tcp6@[2620:0:ccc::2]:53 source [<ipv6>]
    accepted_payload_size 8192
    resolve_retries 4
    hold valid 60s
    hold obsolete 30s
    hold timeout 300s
    timeout resolve 20s
    timeout retry 1s

defaults
    default-server resolvers default inter 5s [...]
    option abortonclose

backend example
    server-template myserver 2 ipv6@_svc._tcp.example.com [...]
We have several backends using this pattern with SRV records.
The CLOSE_WAIT connections are exclusively to port 53 on the configured
nameservers:
$ ss -tn state close-wait | awk '{print $5}' | \
    sed 's/:[0-9]*$//' | sort | uniq -c | sort -rn
1100 [2620:0:ccc::2]
1097 [2001:4860:4860::8844]
1092 [2001:4860:4860::8888]
$ ss -tn state close-wait -o | head -5
Recv-Q Send-Q Local Address:Port        Peer Address:Port
0      0      [2600:3c03::xxxx]:49570   [2001:4860:4860::8844]:53
0      0      [2600:3c03::xxxx]:40414   [2001:4860:4860::8888]:53
0      0      [2600:3c03::xxxx]:38996   [2001:4860:4860::8888]:53
0      0      [2600:3c03::xxxx]:53882   [2620:0:ccc::2]:53
Meanwhile, client and backend connections are healthy and show no leakage.
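For tracking the leak over time, something like the following shell sketch could be used. At the rate above (~4K new sockets in ~4 hours, so roughly 1,000 per hour, or about 5 to 6 per nameserver per minute if the growth is spread as evenly as the counts suggest), a one-minute sample interval should show steady growth. It assumes a Linux host with an iproute2 `ss` recent enough to support `-H` (no header) and the four-column `state close-wait` layout shown above, with the peer in the fourth field. `count_by_peer` and `sample_close_wait` are hypothetical helper names, not anything shipped with HAProxy.

```shell
#!/bin/sh
# Hypothetical helper: read `ss -tn state close-wait` output on stdin and
# print "<count> <peer>" for each peer on port 53, highest count first.
# Assumes the 4-column layout (Recv-Q Send-Q Local Peer) shown above; a
# header line, if present, is skipped because its 4th field is not ":53".
count_by_peer() {
    awk '$4 ~ /:53$/ { peer = $4; sub(/:53$/, "", peer); n[peer]++ }
         END { for (p in n) print n[p], p }' | sort -rn
}

# Hypothetical sampler: every INTERVAL seconds (default 60), print a Unix
# timestamp and the number of CLOSE_WAIT sockets whose remote port is 53,
# using ss's own filter instead of post-processing. Runs until interrupted.
sample_close_wait() {
    interval="${1:-60}"
    while :; do
        count=$(ss -Htn state close-wait '( dport = :53 )' | wc -l)
        printf '%s %s\n' "$(date +%s)" "$count"
        sleep "$interval"
    done
}
```

Usage would be along the lines of `ss -Htn state close-wait '( dport = :53 )' | count_by_peer` for a one-off breakdown, or `sample_close_wait 60 >> closewait.log` (log file name is just an example) left running in the background to record the growth curve.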
Two commits in 3.2.12 look like plausible candidates, since the resolvers'
TCP connections to nameservers go through the raw socket path:
- rawsock: introduce CO_RFL_TRY_HARDER to detect closures on complete
reads (Willy)
- ssl: don't always process pending handshakes on closed connections
(Willy)
Note that "option abortonclose" is enabled in our defaults section, and the
second commit explicitly interacts with that option.
We've confirmed that reverting to 3.2.11 resolves the issue,
though we'd prefer to stay on 3.2.12 for the QUIC CVE fixes.
Happy to provide haproxy -vv output, full config, or any additional
debugging if helpful.
Best,
Luke
—
Luke Seelenbinder
stadiamaps.com