Hi,

we're hitting a big roadblock in getting HAProxy to load balance our apps with transparent reloads. When the new instance starts up with the -sf option, the old instances don't seem to react to SIGUSR1 properly: they hang around indefinitely and snatch incoming requests from the new instance, which can lead to them routing requests to backends that have since been taken out of commission.
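For reference, the reload we trigger looks roughly like this (paths are illustrative; the exact invocation is in the repro script mentioned below):

$ haproxy -f /etc/haproxy/haproxy.cfg -p /var/run/haproxy.pid -sf $(cat /var/run/haproxy.pid)

As we understand it, once the new process has bound its listeners, -sf makes it send the soft-stop signal (SIGUSR1) to the listed PIDs; it's that soft stop that never completes for us.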
We tried inspecting it, and the first lead we got was "option forceclose", but that didn't help. After playing with the debugging info available over the stats socket interface, we've come to the following conclusions:

- It reports no connections that would keep it from shutting down.
- Consistent with the above, the bug does NOT require any working backends to distribute load to. However, making HTTP requests to HAProxy's port still seems to increase the chances of it getting into the bugged state.
- Reported values for "tasks" are in the range of 9-15, and "run_queue" is 1-2 (see the P.S. for how we pulled these numbers). I don't know the exact meaning of these values, so I can't judge their importance, but processes that exited properly (see below) didn't seem to have values any lower.
- SIGTTOU doesn't appear to have an effect: the process doesn't unbind its sockets and continues to receive connections. It *does* print "global.srvcls", "global.clicls" and "global.close" to stdout when run in debugging mode, after which it happily prints handling details for newly accepted connections.
- SIGUSR1 does not shut it down when it's in the bugged state. I can't tell whether it *receives* SIGUSR1, though, as there's no debug output for a received signal, only for a signal-caused shutdown.

Weirdly enough, it doesn't happen every time. Once in a while everything would be fine, with the instances shutting down cleanly, which confused debugging greatly, as it made it look like our config changes had an effect. Nothing was consistent, though, and eventually we'd be back to staring at a long process list of haproxy instances stealing requests from each other.

We've created a test setup that reproduces the issue. It's very similar to the setup we're trying to deploy (modulo the backend "services" being complete dummies that don't even serve anything on port 80). It's based on Docker, so it requires a new enough version of Docker to run (we're running 1.9 here). To run it, unpack it, then run ./haproxy-bug-repro.sh.

The setup registers backends (== newly spawned docker containers exposing port 80 and named appropriately) via registrator into Consul, which facilitates service discovery. Consul's catalogue is then templated into HAProxy's config, and the haproxy process is reloaded.

To see the issue, do:

$ docker exec -ti haproxy sh
# ps ax

(Note: if you want to start over, you will have to remove the docker containers, because they have to be named uniquely. Running

$ docker rm -f backend-{1,2,3,4,5,6,7,8,9,10} haproxy

should help.)

Link to the tarball: https://purestorage.app.box.com/s/nnzqueais46plzd9xfisnmkeab7j9s0y

I will also send it as an attachment in a separate followup mail, in case the mailing list software scrubs attachments and/or considers them spam.

Thanks,
Maciej
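P.S. In case it helps with reproducing: the "tasks"/"run_queue" values above come from "show info" on the stats socket, queried roughly like this (assuming the socket is bound at /var/run/haproxy.sock; adjust the path to your config):

$ echo "show info" | socat stdio UNIX-CONNECT:/var/run/haproxy.sock
$ echo "show sess" | socat stdio UNIX-CONNECT:/var/run/haproxy.sock

"show sess" is what reports no lingering connections on the bugged processes.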