We've battled the same issue with our haproxy instances.  We traced the
root cause to slow DNS lookups during config parsing: parsing took so long
that we were attempting to reload again before the original reload had
completed.  I'm still not sure why or where the signals to the old haproxy
are getting dropped, but we found that installing a DNS cache on our
haproxy nodes greatly decreased the likelihood of creating zombie haproxy
instances.
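
A rough way to check whether DNS is what slows down config parsing is to
time the lookups for the hostnames on the "server" lines.  The sketch below
is only an illustration, not what we run in production: the helper names
(server_hosts, time_lookups) are made up, and the regex assumes plain
"server <name> <host>[:port]" lines with IPv4 addresses or hostnames.

```python
import re
import socket
import time

# Assumption: servers are declared as "server <name> <host>[:port] ...".
# Lines such as "server-template" or IPv6 literals are not handled here.
SERVER_LINE = re.compile(r"^\s*server\s+\S+\s+([^\s:]+)(?::\d+)?")


def server_hosts(config_text):
    """Return the distinct host parts of the 'server' lines, in order."""
    hosts = []
    for line in config_text.splitlines():
        match = SERVER_LINE.match(line)
        if match and match.group(1) not in hosts:
            hosts.append(match.group(1))
    return hosts


def time_lookups(hosts, resolve=socket.getaddrinfo):
    """Resolve each host once and return {host: seconds spent}."""
    timings = {}
    for host in hosts:
        start = time.monotonic()
        try:
            resolve(host, None)
        except OSError:
            pass  # a failed lookup still costs time, so it is recorded too
        timings[host] = time.monotonic() - start
    return timings
```

Feeding it the text of the haproxy config and summing the timings gives a
lower bound on how long resolution alone takes; comparing the numbers with
and without a local DNS cache shows how much the cache helps.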

We further improved on that by rearchitecting our microservices to make
use of the haproxy Runtime API for dynamic scaling, and allocating dummy
server slots for future expansion, similar to
https://www.haproxy.com/blog/dynamic-scaling-for-microservices-with-runtime-api/
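
To give an idea of what filling a pre-allocated slot looks like, here is a
minimal sketch that talks to the haproxy stats socket.  The "set server"
commands are standard Runtime API commands; everything else is an
assumption for illustration: the function names, the backend and slot
names, and a socket configured with admin level (for example
"stats socket /var/run/haproxy.sock level admin").

```python
import socket


def runtime_commands(backend, slot, ip, port):
    """Build the commands that point a dummy slot at a real instance."""
    target = f"{backend}/{slot}"
    return [
        f"set server {target} addr {ip} port {port}",
        f"set server {target} state ready",
    ]


def send_command(sock_path, command):
    """Send one command to the stats socket and return haproxy's reply.

    In the default non-interactive mode haproxy closes the connection
    after answering, so each command uses its own connection.
    """
    with socket.socket(socket.AF_UNIX, socket.SOCK_STREAM) as sock:
        sock.connect(sock_path)
        sock.sendall(command.encode() + b"\n")
        chunks = []
        while True:
            data = sock.recv(4096)
            if not data:
                break
            chunks.append(data)
    return b"".join(chunks).decode()
```

With a hypothetical backend "be_app" holding a disabled server "slot3",
looping over runtime_commands("be_app", "slot3", "10.0.0.7", 8080) and
passing each command to send_command brings the new instance into
rotation without a reload, which is the point of the approach.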

Good luck, I hope that's the answer to your problem.
Dave.

On Tue, May 29, 2018 at 10:12 AM William Dauchy <[email protected]> wrote:

> Hello William,
>
> Sorry for the late answer.
>
> > Are the problematical workers leaving when you reload a second time?
>
> no, they seem to stay for a long time (forever?)
>
> > Did you try to kill -USR1 the worker? It should exit with "Former
> > worker $PID exited with code 0" on stderr.
> > If not, could you check the Sig* lines in /proc/$PID/status for this
> > worker?
>
> will try. I need to put the setup back in shape, and maybe test
> without multiple binding.
>
> > Do you know how much time haproxy takes to load its configuration, and
> > do you think you tried a reload before it finished parsing and loading
> > the config? Type=notify in your systemd unit file should help in this
> > case. If I remember correctly, it checks that the service is 'ready'
> > before trying to reload.
>
> We are using Type=notify. However, I cannot guarantee that we do not
> trigger a new reload before the previous one is done. Is there a way to
> check the "ready" state you mentioned?
> (We are talking about a reload every 10 seconds maximum though)
>
> > I suspect the SIGUSR1 signal is not received by the worker, but I'm
> > not sure whether the master didn't send it or the worker blocked it.
>
> good to know.
>
> Best,
> --
> William
>
>
