On Thu, May 24, 2018 at 11:00:29PM +0200, William Dauchy wrote:
> On Thu, May 24, 2018 at 12:01:38PM +0200, William Lallemand wrote:
> > I managed to reproduce something similar with the 1.8.8 version. It looks like
> > letting a socat connected to the socket helps.
> >
> > I'm looking into the code to see what's happening.
>
> Indeed, after some more hours, I got the same issue on v1.8.8. However it seems to
> be easier to reproduce in v1.8.9, but I might be wrong.
> So now I bet on either thread issue, or bind with reuseport.
> I'll try to do some more tests.
>
> Best,

Hi,

I don't think I reproduced the same problem, so I have a few questions for
you :-)

Are the problematic workers leaving when you reload a second time?

Did you try to kill -USR1 the worker? It should exit with
"Former worker $PID exited with code 0" on stderr.

If not, could you check the Sig* lines in /proc/$PID/status for this worker?

Do you know how long haproxy takes to load its configuration, and do you
think you tried a reload before it had finished parsing and loading the
config? Type=notify in your systemd unit file should help in this case. If I
remember correctly, it checks that the service is 'ready' before trying to
reload.

I suspect the SIGUSR1 signal is not received by the worker, but I'm not sure
whether the master didn't send it or the worker blocked it.

Thanks!

-- 
William Lallemand
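
In case it helps with reading the Sig* lines: each one is a hexadecimal bitmask in which bit N-1 stands for signal N. Below is a minimal Python sketch that decodes them for one signal. The function names (parse_sig_masks, has_signal, describe_signal, check_pid) are made up for this example and are not part of haproxy. It assumes a Linux /proc layout, and it takes the SIGUSR1 number from the local Python signal module, since that number differs between architectures.

```python
import signal

# Signal-mask fields found in /proc/<pid>/status on Linux.
SIG_FIELDS = ("SigPnd", "ShdPnd", "SigBlk", "SigIgn", "SigCgt")


def parse_sig_masks(status_text):
    """Return {field name: integer mask} for the Sig* lines in status_text."""
    masks = {}
    for line in status_text.splitlines():
        name, sep, value = line.partition(":")
        if sep and name in SIG_FIELDS:
            masks[name] = int(value.strip(), 16)  # masks are printed in hex
    return masks


def has_signal(mask, signum):
    """True if signal number signum is set in mask (bit signum-1)."""
    return bool((mask >> (signum - 1)) & 1)


def describe_signal(status_text, signum=int(signal.SIGUSR1)):
    """Map each Sig* field to whether signum is set in it."""
    masks = parse_sig_masks(status_text)
    return {name: has_signal(mask, signum) for name, mask in masks.items()}


def check_pid(pid):
    """Read /proc/<pid>/status and report on SIGUSR1 (Linux only)."""
    with open(f"/proc/{pid}/status") as f:
        return describe_signal(f.read())
```

Calling check_pid(pid) with the PID of a stuck worker returns a dict keyed by field name. If SigBlk is True, the worker has SIGUSR1 blocked. If SigPnd or ShdPnd is True, the signal was delivered but is still pending. If SigCgt is False, no handler is installed for it.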

