Hello Anthony,

On Wed, Dec 06, 2017 at 10:48:23PM +0000, Anthony Via wrote:
> Hello,
> 
> 
> I am testing seamless reloads on HAProxy 1.8.0 using the master-worker model 
> and am running into the following when sending SIGUSR2 to the master process:
> 
> 
> [ALERT] 339/222907 (61399) : Starting frontend internal_http: cannot bind 
> socket [0.0.0.0:80]
> [WARNING] 339/222907 (61399) : Reexecuting Master process in waitpid mode
> [WARNING] 339/222907 (61399) : Reexecuting Master process
> 
> From my understanding, after the master process receives the SIGUSR2 signal
> it should be sending the worker process(es) the SIGUSR1 signal, which does
> not appear to be happening.

When the master receives the SIGUSR2 signal, it should re-exec itself with the
-sf argument followed by the PIDs of the current workers.

It then follows the same steps as a normal haproxy process started with -sf:

- It parses the configuration.
- It tries to bind with SO_REUSEPORT if your system supports it (I don't think
  yours does).
- If it cannot bind, it sends the SIGTTOU signal to the old processes.
- The old processes receive SIGTTOU and disable their binds.
- The new process tries to bind again.
- The new process sends SIGUSR1 to the old ones.
- The new process forks the children.
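
To make the sequence above concrete, here is a rough sketch of the handoff in
Python. It is only an illustration of the steps, not HAProxy's actual code:
the names bind_with_handoff and finish_handoff, the retry count, the delay and
the injectable kill parameter are all assumptions made for the example.

```python
import errno
import os
import signal
import socket
import time


def bind_with_handoff(addr, old_pids, retries=5, delay=0.1, kill=os.kill):
    """Try to bind addr; if it is busy, ask the old processes to release it.

    Hypothetical helper: on EADDRINUSE it sends SIGTTOU to every old PID,
    waits a little, and retries the bind.
    """
    for _ in range(retries):
        sock = socket.socket(socket.AF_INET, socket.SOCK_STREAM)
        sock.setsockopt(socket.SOL_SOCKET, socket.SO_REUSEADDR, 1)
        try:
            sock.bind(addr)
            sock.listen(128)
            return sock
        except OSError as exc:
            sock.close()
            if exc.errno != errno.EADDRINUSE:
                raise
            # The address is still held: ask the old processes to unbind.
            for pid in old_pids:
                kill(pid, signal.SIGTTOU)
            time.sleep(delay)
    raise OSError(errno.EADDRINUSE, "old processes did not release the address")


def finish_handoff(old_pids, kill=os.kill):
    """Once the new listener is up, tell the old processes to stop softly."""
    for pid in old_pids:
        kill(pid, signal.SIGUSR1)
```

If the old processes never react to SIGTTOU, the retries run out and the bind
fails, which would look like the "cannot bind socket" alert in your log.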

I just tested the master-worker mode with the -dR option to check that the
SIGTTOU mechanism still works, and it seems to work on my Linux system.
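
Since the first bind attempt depends on SO_REUSEPORT, a small standalone probe
may help to see whether your system supports it at all. This is a sketch in
Python, independent of HAProxy; the function name reuseport_supported is made
up for the example, and a True result only means that two sockets could share
a loopback address with that option set.

```python
import socket


def reuseport_supported():
    """Return True if two sockets can bind the same address with SO_REUSEPORT."""
    opt = getattr(socket, "SO_REUSEPORT", None)
    if opt is None:
        # The constant is not exposed on this platform.
        return False
    first = socket.socket(socket.AF_INET, socket.SOCK_STREAM)
    second = socket.socket(socket.AF_INET, socket.SOCK_STREAM)
    try:
        first.setsockopt(socket.SOL_SOCKET, opt, 1)
        second.setsockopt(socket.SOL_SOCKET, opt, 1)
        first.bind(("127.0.0.1", 0))  # let the OS pick a free port
        first.listen(1)
        second.bind(first.getsockname())  # same address and port
        second.listen(1)
        return True
    except OSError:
        return False
    finally:
        first.close()
        second.close()


if __name__ == "__main__":
    print("SO_REUSEPORT usable:", reuseport_supported())
```

If this reports False on your system, the new process has to rely on the
SIGTTOU path, which is the same path that -dR forces on Linux.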

> I have manually sent worker processes the SIGUSR1 signal, and they do shut
> down cleanly as expected. I thought maybe the worker wasn't shutting down
> quick enough, so I played around with "hard-stop-after", but that did not
> help.
> 

Did you try launching a new haproxy process with the -sf option, without using
the master-worker?

> This is on a Solaris based operating system. I did have success on Ubuntu
> 16.04, so I'm wondering if this is an issue with my OS.
> 

What is your operating system and version exactly?

> Any ideas for narrowing the problem down?
> 
> Thanks,

I think the old processes either did not receive the SIGTTOU for some unknown
reason, or did not unbind once they received it.

Maybe you could compare what happens on your Solaris-like system and on your
Ubuntu machine with the -dR option, using strace on Linux and truss on Solaris.

Regards,

-- 
William Lallemand
