We've battled the same issue with our haproxy instances. We root-caused it to slow DNS lookups during config parsing: parsing took so long that we were attempting another reload before the original reload had completed. I'm still not sure why or where the signals to the old haproxy are getting dropped, but we found that installing a DNS cache on our haproxy nodes greatly decreased the likelihood of creating zombie haproxy instances.
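
To get a rough idea of whether DNS is the slow part, one option is to time the lookups for the hostnames used in the config. This is only a sketch: the helper name time_lookups is made up for illustration, and it measures the system resolver from Python, which is not necessarily the same code path haproxy takes while parsing.

```python
import socket
import time


def time_lookups(hostnames, resolve=socket.getaddrinfo):
    """Return {hostname: seconds to resolve}; the value is None if lookup failed."""
    timings = {}
    for name in hostnames:
        start = time.monotonic()
        try:
            resolve(name, None)
            timings[name] = time.monotonic() - start
        except OSError:
            # socket.gaierror (resolution failure) is a subclass of OSError.
            timings[name] = None
    return timings


# Example call (hostname is a placeholder, use the ones from your server lines):
#   time_lookups(["localhost"])
```

If the total across all server hostnames approaches your reload interval, overlapping reloads become plausible.
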
We further improved on that by rearchitecting our microservices to use the haproxy dynamic scaling APIs and allocating dummy slots for future expansion, similar to https://www.haproxy.com/blog/dynamic-scaling-for-microservices-with-runtime-api/ .

Good luck, I hope that's the answer to your problem.

Dave.

On Tue, May 29, 2018 at 10:12 AM William Dauchy <[email protected]> wrote:

> Hello William,
>
> Sorry for the last answer.
>
> > Are the problematical workers leaving when you reload a second time?
>
> no, they seems to stay for a long time (forever?)
>
> > Did you try to kill -USR1 the worker ? It should exits with "Former worker $PID
> > exited with code 0" on stderr.
> > If not, could you check the Sig* lines in /proc/$PID/status for this worker?
>
> will try. I need to put the setup back in shape, and maybe test
> without multiple binding.
>
> > Do you know how much time take haproxy to load its configuration, and do you
> > think you tried a reload before it finished to parse and load the config?
> > Type=notify in your systemd unit file should help for this case. If I remember
> > well it checks that the service is 'ready' before trying to reload.
>
> We are using Type=notify. I however cannot guarantee we do not trigger
> a new reload, before the previous one is done. Is there a way to check
> the "ready" state you mentioned?
> (We are talking about a reload every 10 seconds maximum though)
>
> > I suspect the SIGUSR1 signal is not received by the worker, but I'm not sure
> > either if it's the master that didn't send it or if the worker blocked it.
>
> good to know.
>
> Best,
> --
> William
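
For reference, the dummy-slot approach mentioned above can be sketched roughly as follows. The backend name, slot name, address and socket path are placeholders, not values from our setup. It assumes the backend already declares spare server lines and that the stats socket is configured with admin level; the two "set server" commands are the Runtime API ones described in the linked blog post.

```python
import socket


def slot_commands(backend, slot, ip, port):
    """Build the Runtime API commands that point a spare slot at a real server."""
    return [
        f"set server {backend}/{slot} addr {ip} port {port}",
        f"set server {backend}/{slot} state ready",
    ]


def send_command(sock_path, command):
    """Send one command to the haproxy stats socket and return its reply.

    In non-interactive mode haproxy closes the connection after answering,
    so we read until EOF.
    """
    with socket.socket(socket.AF_UNIX, socket.SOCK_STREAM) as s:
        s.connect(sock_path)
        s.sendall(command.encode() + b"\n")
        chunks = []
        while True:
            data = s.recv(4096)
            if not data:
                break
            chunks.append(data)
    return b"".join(chunks).decode()


# Example (all values hypothetical):
#   for cmd in slot_commands("be_app", "slot3", "10.0.0.5", 8080):
#       print(send_command("/var/run/haproxy.sock", cmd))
```

Because the slot is filled at runtime, no reload is needed for that change, which sidesteps the overlapping-reload problem for routine scaling.
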

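On the Sig* lines in /proc/$PID/status asked about in the quoted thread: they are hex bitmasks in which bit (N - 1) stands for signal number N. A small sketch for decoding them is below. The function names are made up for illustration, and the sample mask values in the comment are invented, not taken from a real haproxy worker.

```python
import signal


def signal_masks(status_text):
    """Extract the SigBlk/SigIgn/SigCgt hex masks from /proc/<pid>/status text."""
    masks = {}
    for line in status_text.splitlines():
        key, _, value = line.partition(":")
        if key in ("SigBlk", "SigIgn", "SigCgt"):
            masks[key] = int(value.strip(), 16)
    return masks


def has_signal(mask, signum):
    """True if the bit for signal number signum (bit signum - 1) is set."""
    return bool(mask & (1 << (signum - 1)))


def usr1_report(status_text, signum=signal.SIGUSR1):
    """Map each mask name to whether it contains the given signal."""
    return {
        name: has_signal(mask, signum)
        for name, mask in signal_masks(status_text).items()
    }


# Example with invented values:
#   usr1_report("SigBlk:\t0000000000000200\nSigCgt:\t0000000180004a03\n")
# On a live system, pass open(f"/proc/{pid}/status").read() instead.
```

If SIGUSR1 shows up under SigBlk for the stuck worker, the worker has it blocked; if it is absent from SigCgt, no handler is installed for it.
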
