On Fri, Oct 06, 2017 at 05:04:18PM +0200, Moemen MHEDHBI wrote:
> Hi Lukas,
> 
> 
> On 04/10/2017 22:01, Lukas Tribus wrote:
> > I guess the problem is that when a reload happens before a restart and the
> > pre-reload systemd-wrapper process is still alive, systemd gets confused by
> > that old process and therefore refrains from starting up the new instance.
> >
> > Or systemd doesn't get confused, sends SIGTERM to the old systemd-wrapper
> > process as well, but the wrapper doesn't handle SIGTERM after a SIGUSR1
> > (a hard stop WHILE we are already gracefully stopping).
> >
> >
> > Should the systemd-wrapper exit after distributing the graceful stop
> > message to processes? I don't think so, it sounds horrible.
> >
> > Should the systemd-wrapper expect a SIGTERM after a SIGUSR1 and send the
> > TERM/INT to its children? I think so, but I'm not 100% sure. Is that even
> > the issue?
> >
> >
> >
> > We did get rid of the systemd-wrapper in haproxy 1.8-dev, and replaced it
> > with a master->worker solution, so I'd say there is a chance that this
> > doesn't affect 1.8.
> >
> 
> A. It appears to me that it is not the wrapper that receives the SIGUSR1
> but the haproxy process.
> 
> B. Here is how I technically explain the "bug" (to be confirmed by the
> Devs) reported by Niels:
> - During the reload:
>  1. A SIGUSR2 is sent to the systemd-wrapper.
>  2. The wrapper sends SIGUSR1 to the haproxy processes listed in the pid file.
>  3. A new haproxy process is listening for incoming connections and the
>     pid file now contains only the pid of the new process.
> - Then when issuing a restart/stop:
>  1. A SIGTERM is sent to the systemd-wrapper.
>  2. The wrapper sends SIGTERM to the haproxy processes listed in the pid file.
>  3. Only the new haproxy process is stopped; the other one is still there
>     since it did not receive the SIGTERM.
> - This is why systemd is getting confused, and after the timeout systemd
>   gets done with this by sending a SIGTERM to all child processes
>   (KillMode=mixed policy).
> 

During a reload the wrapper receives a SIGUSR2 or a SIGHUP, which causes it to
re-exec itself without changing its PID, read the pid file, and fork a kind of
master process with -sf.  This new master process sends SIGUSR1 to the
previous processes, forks the new children, and writes their PIDs to the pid
file.

A restart is simpler: the wrapper receives a SIGTERM or a SIGINT, reads the
pid file, and forwards the signal to the processes listed there. Once those
processes are killed, the master exits, and so does the wrapper.
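
To make the consequence of those two sequences concrete, here is a toy model
in Python. It is only a sketch: it sends no real signals, and the names
(Wrapper, reload, stop) are made up for the illustration, they are not haproxy
code. It assumes the old process is still busy draining connections when the
stop arrives.

```python
# Toy model of the wrapper behaviour described above; no real signals are sent.
# Wrapper, reload and stop are hypothetical names, not haproxy code.

class Wrapper:
    def __init__(self, first_pid):
        self.pidfile = [first_pid]   # what the wrapper reads on each signal
        self.alive = {first_pid}     # processes that actually exist
        self.next_pid = first_pid + 1

    def reload(self):
        # SIGUSR2/SIGHUP: the old processes get SIGUSR1 (soft stop) and keep
        # running until their connections end; the pid file is overwritten
        # with the new PID only.
        new_pid = self.next_pid
        self.next_pid += 1
        self.alive.add(new_pid)
        self.pidfile = [new_pid]

    def stop(self):
        # SIGTERM/SIGINT: forwarded only to the PIDs listed in the pid file.
        for pid in self.pidfile:
            self.alive.discard(pid)
        return sorted(self.alive)    # processes that never got the signal


w = Wrapper(first_pid=100)
w.reload()
leftover = w.stop()
print(leftover)
```

The PID left over after stop() is the pre-reload process, which is the one
systemd ends up waiting for.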


> C. I was able to verify this by doing the following:
>  1. After the reload I manually add the old process pid to the pidfile.
>  2. Then when I hit restart, all processes are stopped correctly.
> 
> So the question is ( @William ): when doing a soft stop should we
> preserve old process pid in the pidfile until the process terminates ?
> 

Unfortunately that's one of the problems of the current wrapper system: it's
more a hack than a real process supervisor. The wrapper does not keep track of
the PIDs itself; it only reads the pid file and forwards the signals.

The problem with leaving old PIDs in the pid file is that you don't know
whether they still belong to a haproxy process, so if you ask for a restart,
it may end up killing something unrelated that was forked between the reload
and the restart. The list would also grow indefinitely with each
reload/restart.
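
As an illustration of what "still a haproxy process" would mean in practice,
here is a minimal Linux-only sketch that looks at /proc/<pid>/comm. The helper
name is_haproxy is hypothetical, and the wrapper does nothing like this.

```python
# Linux-only sketch: is this PID currently a process named "haproxy"?
# is_haproxy is a hypothetical helper, not part of haproxy or the wrapper.
import os


def is_haproxy(pid):
    try:
        with open(f"/proc/{pid}/comm") as f:
            return f.read().strip() == "haproxy"
    except OSError:
        # No such process (or no /proc at all): treat as "not haproxy".
        return False


print(is_haproxy(os.getpid()))
```

Even with such a check, the PID can be recycled between the check and the
kill, so it narrows the window rather than closing it.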


The master-worker model should fix that kind of issue, because it's aware of
all PIDs, old and new.

You could try:

* Change the KillMode to the default (control-group), which should send
  SIGTERM to all processes on a stop or restart. But if I remember correctly,
  it leads to a bad exit code on the systemd side and displays an error.

* Reduce the SIGTERM timeout with TimeoutStopSec= in your unit file.
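
For example, a drop-in along these lines would cover either option. This is a
sketch only: the unit name haproxy.service, the drop-in path and the 10s value
are assumptions, so adjust them to your setup and keep only the option you
want.

```ini
# /etc/systemd/system/haproxy.service.d/override.conf
# (path and unit name are assumed, not taken from your system)
[Service]
# Option 1: systemd default, SIGTERM goes to every process in the cgroup
KillMode=control-group
# Option 2: keep your current KillMode, but wait less on stop
# (10s is an arbitrary example value)
#TimeoutStopSec=10s
```

Run "systemctl daemon-reload" afterwards so systemd picks up the change.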

-- 
William Lallemand
