Under load, we sometimes see HAProxy completely delete a bound unix
domain socket after a reload.

The "bad flow" looks something like the following:


   - haproxy is running on pid A, bound to /var/run/domain.sock (via a bind
   line in a frontend)
   - we run `haproxy -sf A`, which starts a new haproxy on pid B
   - pid B binds to /var/run/domain.sock.B
   - pid B moves /var/run/domain.sock.B to /var/run/domain.sock (in
   uxst_bind_listener)
   - in the meantime, there are a zillion connections to
   /var/run/domain.sock and pid B hasn't finished starting up yet; the
   backlog is exhausted
   - pid B signals pid A to shut down
   - pid A runs the destroy_uxst_socket function and tries to connect to
   /var/run/domain.sock to see if it's still in use. The connection fails
   (because the backlog is full). Pid A unlinks /var/run/domain.sock.
   Everything is sad forever now.
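
To make the last step concrete, here is a rough Python sketch of the
probe-then-unlink pattern as I described it. This is not HAProxy's actual
destroy_uxst_socket code; the function names are made up for illustration,
and it assumes a full backlog shows up as some connect error other than
ECONNREFUSED (EAGAIN on a non-blocking connect, as far as I know on Linux).

```python
import errno
import socket


def probe_errno(path):
    """Try a non-blocking connect to a unix stream socket.

    Returns 0 if the connect succeeded, otherwise the errno it failed with.
    """
    s = socket.socket(socket.AF_UNIX, socket.SOCK_STREAM)
    s.setblocking(False)
    try:
        s.connect(path)
        return 0
    except OSError as e:
        return e.errno
    finally:
        s.close()


def naive_safe_to_unlink(err):
    """The pattern described above: any connect failure means 'unused'."""
    return err != 0


def safe_to_unlink(err):
    """Stricter check: only a refused connection shows nobody is listening."""
    return err == errno.ECONNREFUSED
```

With the naive check, a listener that is alive but temporarily saturated
looks identical to a stale socket file, so the old process would remove a
path the new process is still bound to. The stricter check leaves the file
alone in that case.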

I'm thinking about just commenting out the call to destroy_uxst_socket,
since this is all on a tmpfs and we don't really care if spare sockets are
leaked when/if we change configuration in the future. Arguably, the real
fix is to avoid overflowing the listen socket at all; I'm also thinking
about binding to a TCP port on localhost and using that just for the few
seconds it takes to reload (we can't use TCP all the time, since we run
out of ephemeral ports to 127.0.0.1). Either way, it still seems wrong for
haproxy to unlink the socket.

This has proven extremely irritating to reproduce (since it only occurs if
there's enough load to fill up the backlog on the socket between when pid B
starts up and when pid A shuts down), but I'm pretty confident that what I
described above is happening, since periodically on reloads the domain
socket isn't there and this code fits.
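
For anyone who wants to try reproducing the full-backlog condition in
isolation, something like the following Python sketch may help. It is only
a stand-in for real load: the fill_backlog helper, the tiny backlog, and
the cap on attempts are all my own assumptions, and it talks to a throwaway
listener rather than to haproxy.

```python
import os
import socket
import tempfile


def fill_backlog(path, max_attempts=64):
    """Open idle connections to `path` until one fails.

    Returns (held_sockets, errno), where errno is None if every attempt
    up to max_attempts succeeded.
    """
    held = []
    for _ in range(max_attempts):
        c = socket.socket(socket.AF_UNIX, socket.SOCK_STREAM)
        c.setblocking(False)  # fail immediately instead of waiting for a slot
        try:
            c.connect(path)
        except OSError as e:
            c.close()
            return held, e.errno
        held.append(c)  # keep it open so it stays queued on the listener
    return held, None


if __name__ == "__main__":
    path = os.path.join(tempfile.mkdtemp(), "demo.sock")
    srv = socket.socket(socket.AF_UNIX, socket.SOCK_STREAM)
    srv.bind(path)
    srv.listen(1)  # tiny backlog, and we never call accept()
    held, err = fill_backlog(path)
    print(len(held), "connections queued; next connect errno:", err)
```

The listener here never accepts, which mimics a process that has bound the
socket but hasn't started serving yet. The exact errno and the number of
connections that fit before the failure depend on the kernel, so check them
on the machine in question.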

Our configs are quite large, so I'm not reproducing them here. The reason
we bind on a domain socket at all is that we're running two sets of
haproxies: one in multi-process mode doing TCP-mode SSL termination,
pointing back over a domain socket to a single-process haproxy that applies
all of our actual config.

-- 
James Brown
Systems Engineer