Hi,

Since we don't really know how to track this one, we thought it might
be better to reach out here to get feedback.

We're using haproxy to deliver streaming files under pressure
(80-90 Gbps per machine). When using h1/http, splice-response is a
great help to keep load under control. We use branch v2.9 at the
moment.
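
For context, splicing is enabled through haproxy's "option
splice-response". A stripped-down sketch of the relevant bits of our
configuration (only the values mentioned in this mail; the real
listeners and backends are omitted):

  global
      maxconn 900000

  defaults
      mode http
      option splice-response
      # frontends/backends (the :80 streaming listener) omitted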

However, we've hit a bug with splice-response (GitHub issue created),
and we've had to run our haproxies without splicing all day.

When we reach a certain load, a "connection refused" alarm starts
buzzing like crazy (2-3 times every 30 minutes). The alarm is simply a
connect to localhost with a 500 ms timeout:

socat /dev/null tcp4-connect:127.0.0.1:80,connect-timeout=0.5

The log shows the connection being refused, as if the port were closed:

2024/03/27 01:06:04 socat[984480] E read(6, 0xe98000, 8192): Connection refused

The thing is, the haproxy process is very much alive... so we just
restart it every time this happens.

What data do you suggest we collect to help track this down? We're not
sure whether the stats socket is available, but we can definitely try
to get some information out of it.
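
For example, assuming the stats socket is enabled and bound at
/var/run/haproxy.sock (the path here is just an example), we could
dump something like:

  echo "show info" | socat stdio unix-connect:/var/run/haproxy.sock
  echo "show stat" | socat stdio unix-connect:/var/run/haproxy.sock
  echo "show fd"   | socat stdio unix-connect:/var/run/haproxy.sock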

We checked, and we're not running out of fds or connections, with or
without the backlog taken into account (we have a global maxconn of
900000 with ~30,000 streaming sessions active, and tcp_max_syn_backlog
is set to 262144). But the problem does seem to correlate with heavy
traffic.
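
When the alarm fires we could also grab the kernel's listen-queue
state and drop counters for the :80 listener; a sketch of what we'd
run (assuming iproute2's ss and nstat are available):

  ss -ltn 'sport = :80'
  nstat -az TcpExtListenOverflows TcpExtListenDrops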

Thanks!

-- 
Felipe Damasio
