Re: master-worker no-exit-on-failure with SO_REUSEPORT and a port being already in use

William Lallemand Tue, 19 Nov 2019 07:20:30 -0800

On Tue, Nov 19, 2019 at 03:45:09PM +0100, Christian Ruppert wrote:
> Hi list,
>


Hello,

> I'm facing some issues with ports that are already in use and the 
> fallback feature during a reload. SO_REUSEPORT already makes it 
> easier/better but not perfect, as there are still cases where it fails.
> In my test case I've got a Squid running on port 80 and a HAProxy with 
> "master-worker no-exit-on-failure".
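
The bind conflict described here can be reproduced outside HAProxy. The
following is only a minimal sketch, assuming Linux semantics, where
SO_REUSEPORT lets two sockets share a port only if every socket bound to it
set the option. A listener that did not set it (standing in for Squid here)
blocks the bind. The helper names try_bind and demo are hypothetical, and the
sketch uses a kernel-chosen port on 127.0.0.1 rather than port 80.

```python
import errno
import socket


def try_bind(port, reuseport=True, host="127.0.0.1"):
    """Try to bind and listen on a TCP port.

    Returns (socket, None) on success or (None, errno) on failure.
    """
    sock = socket.socket(socket.AF_INET, socket.SOCK_STREAM)
    try:
        if reuseport and hasattr(socket, "SO_REUSEPORT"):
            sock.setsockopt(socket.SOL_SOCKET, socket.SO_REUSEPORT, 1)
        sock.bind((host, port))
        sock.listen(1)
        return sock, None
    except OSError as exc:
        sock.close()
        return None, exc.errno


def demo():
    """Bind a port without SO_REUSEPORT, then try to share it."""
    # Stand-in for the other daemon: it does NOT set SO_REUSEPORT.
    other = socket.socket(socket.AF_INET, socket.SOCK_STREAM)
    other.bind(("127.0.0.1", 0))  # port 0: the kernel picks a free port
    other.listen(1)
    port = other.getsockname()[1]
    try:
        # Setting SO_REUSEPORT on the second socket alone is not enough.
        sock, err = try_bind(port, reuseport=True)
        if sock is not None:
            sock.close()
        return err
    finally:
        other.close()


if __name__ == "__main__":
    err = demo()
    print("bind failed:", errno.errorcode.get(err, err))
```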

The "no-exit-on-failure" option is only useful when you don't want the master
to kill all the HAProxy processes when one of the workers was killed by
something other than the master (segfault, OOM, bug...). In this case you still
need another worker available to do the job. It's mostly used in configurations
with nbproc > 1.

> I am using the shipped (2.0.8) 
> systemd unit file and start HAProxy with some frontend and a bind on 
> a port like 1337.
> I then add another bind for port 80, which is already in use by Squid, 
> and try to reload HAProxy. It takes some time until it fails:
> 
> Nov 19 14:39:21 894a0f616fec haproxy[2978]: [WARNING] 322/143921 (2978) 
> : Reexecuting Master process
> ...
> Nov 19 14:39:28 894a0f616fec haproxy[2978]: [ALERT] 322/143922 (2978) : 
> Starting frontend somefrontend: cannot bind socket [0.0.0.0:80]
> ...
> Nov 19 14:39:28 894a0f616fec systemd[1]: haproxy.service: Main process 
> exited, code=exited, status=1/FAILURE
> 
> The reload itself is still running (systemd) and will time out after 
> about 90s. After that, because of the Restart=always, I guess, it ends 
> up in a restart loop.
> 
> So I would have expected that the master process would fall back to the 
> old process and proceed with the old child until the problem has been 
> fixed.
> 
> Can anybody confirm that? Is that intended?
> 
> https://cbonte.github.io/haproxy-dconv/2.0/management.html#4
> https://cbonte.github.io/haproxy-dconv/2.0/configuration.html#3.1-master-worker
>

Looks like a bug to me, the master should have fallen back to the "waitpid
mode" in this case.

Maybe we don't send the sd_notify OK when we are in waitpid mode, so systemd
kills the process after the reload timeout.

I'll do some tests to check what's going on. 

-- 
William Lallemand
