Re: Haproxy refuses new connections when doing a reload followed by a restart
On 17:30 Thu 12 Oct, William Lallemand wrote:
> On Thu, Oct 12, 2017 at 05:50:52PM +0300, Apollon Oikonomopoulos wrote:
> > Yes, there are. systemd will only perform a single operation on a
> > unit at a time, and will queue up the rest. When you inform systemd
> > that something (startup/reload) is in progress, it will not let any
> > other action happen until the first operation is finished. Now it's
> > trivial to issue a ton of reloads in a row that will leave a ton of
> > old processes lying around until they terminate.
>
> I don't think you can, either with the master-worker or the wrapper; it
> was one of the problems we had in the past.
>
> The master-worker waits until it is ready to handle the signals, and the
> wrapper waits for a pipe to be closed on the children's side to handle
> signals.

Interesting, thanks! I guess I'm still stuck in the 1.6 era, I have some
catching up to do :)

Regards,
Apollon
Re: Haproxy refuses new connections when doing a reload followed by a restart
On Thu, Oct 12, 2017 at 05:50:52PM +0300, Apollon Oikonomopoulos wrote:
> > One helpful feature I read in the documentation is the usage of
> > sd_notify(.. "READY=1"). It can be useful for configuration files that
> > take time to process, for example those with a lot of SSL frontends.
> > This signal could be sent once the children have been forked.
> >
> > It's difficult to know when a reload is completely finished (old
> > processes killed) in case of long TCP sessions. So, if we use this
> > system, isn't there a risk of triggering a timeout in systemd on the
> > reload?
>
> The reload timeout is apparently controlled by TimeoutStartSec in
> systemd.
>
> > feature for the reload, it should be done after the fork of the new
> > processes, not after the exit of the old processes, because the
> > processes are ready to receive traffic at this stage.
>
> That's true. OTOH the problem with haproxy-systemd-wrapper is that once
> it re-execs itself it loses track of the old processes completely
> (IIRC),

That's right, but we won't fix it in the wrapper: the current architecture
doesn't allow it easily, and it's not reasonable to backport the
master-worker into a stable branch. Those problems will be fixed with the
master-worker in 1.8.

> combined with the fact that old processes may eat up a lot of
> memory. There are cases where you would prefer breaking a long TCP
> session after 30s if it would give you back 2GB of RSS, to having the
> process lying around just for one client.

Sure, that can be done in the haproxy config file with the
hard-stop-after keyword.

> > Are there really advantages to letting systemd know when a reload is
> > finished or when a process is ready?
>
> Yes, there are. systemd will only perform a single operation on a unit
> at a time, and will queue up the rest. When you inform systemd that
> something (startup/reload) is in progress, it will not let any other
> action happen until the first operation is finished. Now it's trivial to
> issue a ton of reloads in a row that will leave a ton of old processes
> lying around until they terminate.

I don't think you can, either with the master-worker or the wrapper; it
was one of the problems we had in the past. The master-worker waits until
it is ready to handle the signals, and the wrapper waits for a pipe to be
closed on the children's side to handle signals.

> The other advantage with Type=notify services is that systemd will wait
> for READY=1 before starting units with After=haproxy (although HAProxy
> is really a "leaf" kind of service).

That's interesting; nobody ever asked for that, but I see a few cases
where it could be useful.

> Note that the dependency is really thin, and you can always make it a
> compile-time option.
>
> Regards,
> Apollon

Cheers,

-- 
William Lallemand
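For reference, the hard-stop-after keyword mentioned above lives in the
haproxy global section; a minimal sketch (the 30s value is purely
illustrative):

```text
global
    # Any old process still in soft-stop is forcibly terminated after
    # 30s, releasing its memory even if long TCP sessions remain open.
    hard-stop-after 30s
```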
Re: Haproxy refuses new connections when doing a reload followed by a restart
On 16:17 Thu 12 Oct, William Lallemand wrote:
> Hi,
>
> On Thu, Oct 12, 2017 at 01:19:58PM +0300, Apollon Oikonomopoulos wrote:
> > The biggest issue here is that we are using a signal to trigger the
> > reload (which is a complex, non-atomic operation) and let things
> > settle on their own. Systemd assumes that as soon as the signal is
> > delivered (i.e. the ExecReload command is done), the reload is
> > finished, while in our case the reload is finished when the old
> > haproxy process is really dead. Using a signal to trigger the reload
> > is handy, so we could keep that, but the wrapper would need some
> > changes to make reloads more robust:
> >
> > 1. It should use sd_notify(3) to communicate the start/stop/reload
> >    status to systemd (that would also mean converting the actual
> >    service to Type=notify). This way no other operation will be
> >    performed on the unit until the reload is finished and the process
> >    group is in a known-good state.
> >
> > 2. It should handle the old process better: apart from relying on the
> >    new haproxy process for killing the old one, it should explicitly
> >    SIGKILL it after a given timeout if it's not dead yet and make sure
> >    reloads are timeboxed.
> >
> > IIUC, in 1.8 the wrapper has been replaced by the master process,
> > which seems to do point 2 above, but point 1 is something that should
> > still be handled IMHO.
>
> One helpful feature I read in the documentation is the usage of
> sd_notify(.. "READY=1"). It can be useful for configuration files that
> take time to process, for example those with a lot of SSL frontends.
> This signal could be sent once the children have been forked.
>
> It's difficult to know when a reload is completely finished (old
> processes killed) in case of long TCP sessions. So, if we use this
> system, isn't there a risk of triggering a timeout in systemd on the
> reload?

The reload timeout is apparently controlled by TimeoutStartSec in
systemd.

> feature for the reload, it should be done after the fork of the new
> processes, not after the exit of the old processes, because the
> processes are ready to receive traffic at this stage.

That's true. OTOH the problem with haproxy-systemd-wrapper is that once
it re-execs itself it loses track of the old processes completely (IIRC),
combined with the fact that old processes may eat up a lot of memory.
There are cases where you would prefer breaking a long TCP session after
30s if it would give you back 2GB of RSS, to having the process lying
around just for one client.

> However I'm not sure it's that useful: you can know when a process is
> ready using the logs, and it will add systemd-specific code and a
> dependency.
>
> Are there really advantages to letting systemd know when a reload is
> finished or when a process is ready?

Yes, there are. systemd will only perform a single operation on a unit
at a time, and will queue up the rest. When you inform systemd that
something (startup/reload) is in progress, it will not let any other
action happen until the first operation is finished. Now it's trivial to
issue a ton of reloads in a row that will leave a ton of old processes
lying around until they terminate.

The other advantage with Type=notify services is that systemd will wait
for READY=1 before starting units with After=haproxy (although HAProxy
is really a "leaf" kind of service).

Note that the dependency is really thin, and you can always make it a
compile-time option.

Regards,
Apollon
Re: Haproxy refuses new connections when doing a reload followed by a restart
Hi,

On Thu, Oct 12, 2017 at 01:19:58PM +0300, Apollon Oikonomopoulos wrote:
> The biggest issue here is that we are using a signal to trigger the
> reload (which is a complex, non-atomic operation) and let things settle
> on their own. Systemd assumes that as soon as the signal is delivered
> (i.e. the ExecReload command is done), the reload is finished, while in
> our case the reload is finished when the old haproxy process is really
> dead. Using a signal to trigger the reload is handy, so we could keep
> that, but the wrapper would need some changes to make reloads more
> robust:
>
> 1. It should use sd_notify(3) to communicate the start/stop/reload
>    status to systemd (that would also mean converting the actual
>    service to Type=notify). This way no other operation will be
>    performed on the unit until the reload is finished and the process
>    group is in a known-good state.
>
> 2. It should handle the old process better: apart from relying on the
>    new haproxy process for killing the old one, it should explicitly
>    SIGKILL it after a given timeout if it's not dead yet and make sure
>    reloads are timeboxed.
>
> IIUC, in 1.8 the wrapper has been replaced by the master process, which
> seems to do point 2 above, but point 1 is something that should still be
> handled IMHO.

One helpful feature I read in the documentation is the usage of
sd_notify(.. "READY=1"). It can be useful for configuration files that
take time to process, for example those with a lot of SSL frontends. This
signal could be sent once the children have been forked.

It's difficult to know when a reload is completely finished (old
processes killed) in case of long TCP sessions. So, if we use this
system, isn't there a risk of triggering a timeout in systemd on the
reload?

If we want to use this feature for the reload, it should be done after
the fork of the new processes, not after the exit of the old processes,
because the processes are ready to receive traffic at this stage.

However I'm not sure it's that useful: you can know when a process is
ready using the logs, and it will add systemd-specific code and a
dependency.

Are there really advantages to letting systemd know when a reload is
finished or when a process is ready?

Cheers,

-- 
William Lallemand
Re: Haproxy refuses new connections when doing a reload followed by a restart
Hi all,

On 22:01 Wed 04 Oct, Lukas Tribus wrote:
> Hello Moemen,
>
> Am 04.10.2017 um 19:21 schrieb Moemen MHEDHBI:
> > I am wondering if this is actually an expected behaviour and if maybe
> > that restart/stop should just shut down the process and its open
> > connections. I have made the following tests:
> > 1/ keep an open connection then do a restart will work correctly
> > without waiting for existing connections to be closed.
>
> You're right, I got confused there.
>
> Stop or restart is supposed to kill existing connections without any
> timeouts, and systemd would signal it with a SIGTERM to the
> systemd-wrapper:
>
> https://cbonte.github.io/haproxy-dconv/1.7/management.html#4
>
> > I think it makes more sense to say that restart will not wait for
> > established connections.
>
> Correct, that's the documented behavior as per the haproxy
> documentation and really the implicit assumption when talking about
> stopping/restarting in a process management context (I need more
> coffee).
>
> > Otherwise there will be no difference between reload and restart
> > unless there is something else I am not aware of.
> > If we need to fix 2/, a possible solution would be:
> > - Set killmode to "control-group" rather than "mixed" (the current
> >   value) in the systemd unit file.
>
> Indeed the mixed killmode was a conscious choice:
> https://marc.info/?l=haproxy=141277054505608=2
>
> I guess the problem is that when a reload happens before a restart and
> the pre-reload systemd-wrapper process is still alive, systemd gets
> confused by that old process and therefore refrains from starting up
> the new instance.

The biggest issue here is that we are using a signal to trigger the
reload (which is a complex, non-atomic operation) and let things settle
on their own. Systemd assumes that as soon as the signal is delivered
(i.e. the ExecReload command is done), the reload is finished, while in
our case the reload is finished when the old haproxy process is really
dead.

Using a signal to trigger the reload is handy, so we could keep that, but
the wrapper would need some changes to make reloads more robust:

1. It should use sd_notify(3) to communicate the start/stop/reload
   status to systemd (that would also mean converting the actual
   service to Type=notify). This way no other operation will be
   performed on the unit until the reload is finished and the process
   group is in a known-good state.

2. It should handle the old process better: apart from relying on the
   new haproxy process for killing the old one, it should explicitly
   SIGKILL it after a given timeout if it's not dead yet and make sure
   reloads are timeboxed.

IIUC, in 1.8 the wrapper has been replaced by the master process, which
seems to do point 2 above, but point 1 is something that should still be
handled IMHO.

Regards,
Apollon
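On the unit side, point 1 might look like the sketch below. The paths and
flags are assumptions modelled on the Debian packaging of the wrapper, and
the wrapper itself would still need the code that actually sends the
notifications:

```ini
# Hypothetical notify-type haproxy unit fragment (not the shipped file)
[Service]
Type=notify
# Only the main (wrapper) process may send readiness/status updates.
NotifyAccess=main
ExecStart=/usr/sbin/haproxy-systemd-wrapper -f /etc/haproxy/haproxy.cfg -p /run/haproxy.pid
# Reload stays signal-driven; with Type=notify, systemd then waits for
# the wrapper to report readiness again before queued operations run.
ExecReload=/bin/kill -USR2 $MAINPID
```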
Re: Haproxy refuses new connections when doing a reload followed by a restart
On 17-10-08 21:55:37, William Lallemand wrote:
> * To change the KillMode to the default, which should kill -SIGTERM
>   all processes on a stop or restart. But if I remember well, it leads
>   to a bad exit code on the systemd side and displays an error.

There is SuccessExitStatus [1] which might help with that.

Cheers,
Georg

[1] https://www.freedesktop.org/software/systemd/man/systemd.service.html#SuccessExitStatus=
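A possible shape for Georg's suggestion, as a drop-in; the values are an
assumption (143 is the conventional 128+SIGTERM shell exit code, and
systemd also accepts signal names in this setting):

```ini
# Hypothetical drop-in: /etc/systemd/system/haproxy.service.d/exit.conf
[Service]
# Treat death by SIGTERM as a successful stop instead of a failure.
SuccessExitStatus=143 SIGTERM
```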
Re: Haproxy refuses new connections when doing a reload followed by a restart
On Fri, Oct 06, 2017 at 05:04:18PM +0200, Moemen MHEDHBI wrote:
> Hi Lukas,
>
> On 04/10/2017 22:01, Lukas Tribus wrote:
> > I guess the problem is that when a reload happens before a restart
> > and the pre-reload systemd-wrapper process is still alive, systemd
> > gets confused by that old process and therefore refrains from
> > starting up the new instance.
> >
> > Or systemd doesn't get confused, sends SIGTERM to the old
> > systemd-wrapper process as well, but the wrapper doesn't handle
> > SIGTERM after a SIGUSR1 (a hard stop WHILE we are already gracefully
> > stopping).
> >
> > Should the systemd-wrapper exit after distributing the graceful stop
> > message to processes? I don't think so, it sounds horrible.
> >
> > Should the systemd-wrapper expect a SIGTERM after a SIGUSR1 and send
> > TERM/INT to its children? I think so, but I'm not 100% sure. Is that
> > even the issue?
> >
> > We did get rid of the systemd-wrapper in haproxy 1.8-dev and replaced
> > it with a master->worker solution, so I'd say there is a chance that
> > this doesn't affect 1.8.
>
> A. It appears to me that it is not the wrapper that receives the
> SIGUSR1 but the haproxy process.
>
> B. Here is how I technically explain the "bug" (to be confirmed by the
> devs) reported by Niels:
> - During the reload:
>   1. A SIGUSR2 is sent to the systemd-wrapper
>   2. The wrapper sends SIGUSR1 to the haproxy processes listed in the
>      pid file.
>   3. A new haproxy process is listening for incoming connections and
>      the pid file now contains only the pid of the new process.
> - Then when issuing a restart/stop:
>   1. A SIGTERM is sent to the systemd-wrapper
>   2. The wrapper sends SIGTERM to the haproxy processes listed in the
>      pid file.
>   3. Only the new haproxy process is stopped; the other one is still
>      there since it did not receive the SIGTERM
> - This is why systemd is getting confused, and after the timeout
>   systemd gets done with this by sending a SIGTERM to all child
>   processes (killmode=mixed policy)

During a reload the wrapper receives a SIGUSR2 or a SIGHUP, which causes
it to re-exec itself without changing its PID, read the pid file and fork
a kind of master process with -sf. This new master process will send
SIGUSR1 to the previous processes, fork the new children and write their
PIDs in the pid file.

During a restart it's simpler: the wrapper will receive a SIGTERM or a
SIGINT, read the PID file, and forward the signal to those processes.
Once the processes are killed, the master will leave and the wrapper too.

> C. I was able to verify this by doing the following:
> 1. After the reload I manually add the old process pid to the pidfile
> 2. Then when I hit restart, all processes are stopped correctly.
>
> So the question is ( @William ): when doing a soft stop should we
> preserve the old process pid in the pidfile until the process
> terminates?

Unfortunately that's one of the problems of the current wrapper system:
it's more a hack than a real process supervisor. The wrapper does not
manage the PIDs, it only forwards the signals and reads the pid file.

The problem with keeping old pids in the pidfile is that you don't know
whether they still belong to haproxy processes, so if you ask for a
restart, it will eventually kill something that was forked between the
reload and the restart. And the list will grow indefinitely with each
reload/restart.

The master-worker model should fix that kind of issue, because it's
aware of all PIDs, old and new.

You could try:

* To change the KillMode to the default, which should kill -SIGTERM all
  processes on a stop or restart. But if I remember well, it leads to a
  bad exit code on the systemd side and displays an error.

* To reduce the timeout of the SIGTERM with TimeoutStopSec= in your unit
  file.

-- 
William Lallemand
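The two suggestions above translate into unit settings along these lines;
this is a sketch, and the 30s value is arbitrary:

```ini
# Hypothetical drop-in: /etc/systemd/system/haproxy.service.d/stop.conf
[Service]
# Default behaviour: SIGTERM every process in the control group on
# stop/restart, old soft-stopping processes included.
KillMode=control-group
# Escalate to SIGKILL sooner than the 90s default, bounding the hang.
TimeoutStopSec=30
```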
Re: Haproxy refuses new connections when doing a reload followed by a restart
Hi Lukas,

On 04/10/2017 22:01, Lukas Tribus wrote:
> I guess the problem is that when a reload happens before a restart and
> the pre-reload systemd-wrapper process is still alive, systemd gets
> confused by that old process and therefore refrains from starting up
> the new instance.
>
> Or systemd doesn't get confused, sends SIGTERM to the old
> systemd-wrapper process as well, but the wrapper doesn't handle SIGTERM
> after a SIGUSR1 (a hard stop WHILE we are already gracefully stopping).
>
> Should the systemd-wrapper exit after distributing the graceful stop
> message to processes? I don't think so, it sounds horrible.
>
> Should the systemd-wrapper expect a SIGTERM after a SIGUSR1 and send
> TERM/INT to its children? I think so, but I'm not 100% sure. Is that
> even the issue?
>
> We did get rid of the systemd-wrapper in haproxy 1.8-dev and replaced
> it with a master->worker solution, so I'd say there is a chance that
> this doesn't affect 1.8.

A. It appears to me that it is not the wrapper that receives the SIGUSR1
but the haproxy process.

B. Here is how I technically explain the "bug" (to be confirmed by the
devs) reported by Niels:

- During the reload:
  1. A SIGUSR2 is sent to the systemd-wrapper
  2. The wrapper sends SIGUSR1 to the haproxy processes listed in the
     pid file.
  3. A new haproxy process is listening for incoming connections and the
     pid file now contains only the pid of the new process.
- Then when issuing a restart/stop:
  1. A SIGTERM is sent to the systemd-wrapper
  2. The wrapper sends SIGTERM to the haproxy processes listed in the
     pid file.
  3. Only the new haproxy process is stopped; the other one is still
     there since it did not receive the SIGTERM
- This is why systemd is getting confused, and after the timeout systemd
  gets done with this by sending a SIGTERM to all child processes
  (killmode=mixed policy)

C. I was able to verify this by doing the following:
1. After the reload I manually add the old process pid to the pidfile
2. Then when I hit restart, all processes are stopped correctly.

So the question is ( @William ): when doing a soft stop should we
preserve the old process pid in the pidfile until the process terminates?

-- 
Moemen MHEDHBI
Re: Haproxy refuses new connections when doing a reload followed by a restart
Hi all,

Thanks for the responses. I agree, in most cases having Ansible trigger a
reload instead of a restart is better, and it will prevent the situation
I described.

We have a few environments with very long-running sessions where there
are situations in which we change the configuration and configure
different backend IPs. With a reload, we found that those old sessions
would still be active to the old backend IPs instead of the new ones we
configured. Understandable of course due to the reload, but that's why we
have a restart handler in Ansible when we perform configuration changes.

I will however look into the option Lukas mentioned, which sounds like it
will prevent this and allow us to always reload instead.
( https://cbonte.github.io/haproxy-dconv/1.7/configuration.html#hard-stop-after )

Of course, regardless, it would also be great if the behaviour were made
consistent as Moemen suggests.

Regards,
Niels Hendriks

On 4 October 2017 at 22:01, Lukas Tribus wrote:
> Hello Moemen,
>
> Am 04.10.2017 um 19:21 schrieb Moemen MHEDHBI:
> > I am wondering if this is actually an expected behaviour and if maybe
> > that restart/stop should just shut down the process and its open
> > connections. I have made the following tests:
> > 1/ keep an open connection then do a restart will work correctly
> > without waiting for existing connections to be closed.
>
> You're right, I got confused there.
>
> Stop or restart is supposed to kill existing connections without any
> timeouts, and systemd would signal it with a SIGTERM to the
> systemd-wrapper:
>
> https://cbonte.github.io/haproxy-dconv/1.7/management.html#4
>
> > I think it makes more sense to say that restart will not wait for
> > established connections.
>
> Correct, that's the documented behavior as per the haproxy
> documentation and really the implicit assumption when talking about
> stopping/restarting in a process management context (I need more
> coffee).
>
> > Otherwise there will be no difference between reload and restart
> > unless there is something else I am not aware of.
> > If we need to fix 2/, a possible solution would be:
> > - Set killmode to "control-group" rather than "mixed" (the current
> >   value) in the systemd unit file.
>
> Indeed the mixed killmode was a conscious choice:
> https://marc.info/?l=haproxy=141277054505608=2
>
> I guess the problem is that when a reload happens before a restart and
> the pre-reload systemd-wrapper process is still alive, systemd gets
> confused by that old process and therefore refrains from starting up
> the new instance.
>
> Or systemd doesn't get confused, sends SIGTERM to the old
> systemd-wrapper process as well, but the wrapper doesn't handle SIGTERM
> after a SIGUSR1 (a hard stop WHILE we are already gracefully stopping).
>
> Should the systemd-wrapper exit after distributing the graceful stop
> message to processes? I don't think so, it sounds horrible.
>
> Should the systemd-wrapper expect a SIGTERM after a SIGUSR1 and send
> TERM/INT to its children? I think so, but I'm not 100% sure. Is that
> even the issue?
>
> We did get rid of the systemd-wrapper in haproxy 1.8-dev and replaced
> it with a master->worker solution, so I'd say there is a chance that
> this doesn't affect 1.8.
>
> Niels, I still think what you want is for Ansible to reload instead of
> restart, but I agree that this is an issue ("systemctl [stop|restart]
> haproxy" should work regardless of an additional instance that is
> already gracefully stopping).
>
> CC'ing William and Apollon, maybe they can share their opinion?
>
> [1] https://cbonte.github.io/haproxy-dconv/1.7/configuration.html#hard-stop-after
Re: Haproxy refuses new connections when doing a reload followed by a restart
Hello Moemen,

Am 04.10.2017 um 19:21 schrieb Moemen MHEDHBI:
> I am wondering if this is actually an expected behaviour and if maybe
> that restart/stop should just shut down the process and its open
> connections. I have made the following tests:
> 1/ keep an open connection then do a restart will work correctly
> without waiting for existing connections to be closed.

You're right, I got confused there.

Stop or restart is supposed to kill existing connections without any
timeouts, and systemd would signal it with a SIGTERM to the
systemd-wrapper:

https://cbonte.github.io/haproxy-dconv/1.7/management.html#4

> I think it makes more sense to say that restart will not wait for
> established connections.

Correct, that's the documented behavior as per the haproxy documentation
and really the implicit assumption when talking about stopping/restarting
in a process management context (I need more coffee).

> Otherwise there will be no difference between reload and restart unless
> there is something else I am not aware of.
> If we need to fix 2/, a possible solution would be:
> - Set killmode to "control-group" rather than "mixed" (the current
>   value) in the systemd unit file.

Indeed the mixed killmode was a conscious choice:
https://marc.info/?l=haproxy=141277054505608=2

I guess the problem is that when a reload happens before a restart and
the pre-reload systemd-wrapper process is still alive, systemd gets
confused by that old process and therefore refrains from starting up the
new instance.

Or systemd doesn't get confused, sends SIGTERM to the old systemd-wrapper
process as well, but the wrapper doesn't handle SIGTERM after a SIGUSR1
(a hard stop WHILE we are already gracefully stopping).

Should the systemd-wrapper exit after distributing the graceful stop
message to processes? I don't think so, it sounds horrible.

Should the systemd-wrapper expect a SIGTERM after a SIGUSR1 and send
TERM/INT to its children? I think so, but I'm not 100% sure. Is that even
the issue?

We did get rid of the systemd-wrapper in haproxy 1.8-dev and replaced it
with a master->worker solution, so I'd say there is a chance that this
doesn't affect 1.8.

Niels, I still think what you want is for Ansible to reload instead of
restart, but I agree that this is an issue ("systemctl [stop|restart]
haproxy" should work regardless of an additional instance that is already
gracefully stopping).

CC'ing William and Apollon, maybe they can share their opinion?

[1] https://cbonte.github.io/haproxy-dconv/1.7/configuration.html#hard-stop-after
Re: Haproxy refuses new connections when doing a reload followed by a restart
Hi Lukas,

On 04/10/2017 18:57, Lukas Tribus wrote:
> Hello Niels,
>
> a restart means stopping haproxy - and after haproxy has exited
> completely, starting haproxy again. When that happens, haproxy
> immediately stops listening on the sockets and then waits for existing
> connections to be closed (you can accelerate that with hard-stop-after
> [1], but that's not the point).
>
> So what you are seeing is expected behavior when RESTARTING.

I am wondering if this is actually an expected behaviour and if maybe
that restart/stop should just shut down the process and its open
connections. I have made the following tests:

1/ keep an open connection then do a restart: will work correctly without
waiting for existing connections to be closed.
2/ keep an open connection then do a reload + a restart: will wait for
existing connections to be closed.

So if restart should wait for existing connections to terminate then 1/
should be fixed, otherwise 2/ should be fixed. I think it makes more
sense to say that restart will not wait for established connections.
Otherwise there will be no difference between reload and restart unless
there is something else I am not aware of.

If we need to fix 2/, a possible solution would be:
- Set killmode to "control-group" rather than "mixed" (the current value)
  in the systemd unit file.

> Seems to me you want RELOAD behavior instead, so RELOAD is what Ansible
> should trigger when it detects a config change, not RESTART.

Agree

-- 
Moemen MHEDHBI
Re: Haproxy refuses new connections when doing a reload followed by a restart
Hello Niels,

a restart means stopping haproxy - and after haproxy has exited
completely, starting haproxy again. When that happens, haproxy
immediately stops listening on the sockets and then waits for existing
connections to be closed (you can accelerate that with hard-stop-after
[1], but that's not the point).

So what you are seeing is expected behavior when RESTARTING.

Seems to me you want RELOAD behavior instead, so RELOAD is what Ansible
should trigger when it detects a config change, not RESTART.

cheers,
lukas

[1] https://cbonte.github.io/haproxy-dconv/1.7/configuration.html#hard-stop-after
Haproxy refuses new connections when doing a reload followed by a restart
Hello,

First time mailing here, so hopefully I'm at the right place.

I use Ansible to deploy our haproxy configuration, and the following
scenario might happen:

1. Haproxy is running on an existing system.
2. I execute the Ansible playbook.
3. This playbook changes the haproxy config (which will trigger a restart
   of haproxy at the end of the run due to an Ansible handler).
4. The same playbook also updates haproxy due to a new package being
   available from the repository. This package automatically reloads
   haproxy.
5. Shortly after the automatic reload from the haproxy package, Ansible
   triggers a restart of haproxy due to the config change we did at 3).

This works as expected when there are no active sessions in haproxy.
However, if there is an active session, the active session will still
work, but no new connections will be accepted ("connection refused"). The
call to restart haproxy will also hang for 1.5 minutes. Please see the
following log:

# Reload happening here due to the apt upgrade of haproxy
Oct 4 13:01:03 icinga01 systemd[1]: Reloading HAProxy Load Balancer.
Oct 4 13:01:03 icinga01 systemd[1]: Reloaded HAProxy Load Balancer.
# Restart happening here due to the Ansible handler
Oct 4 13:01:18 icinga01 ansible-systemd: Invoked with no_block=False name=haproxy enabled=None daemon_reload=False state=restarted user=False masked=None
Oct 4 13:01:18 icinga01 systemd[1]: Stopping HAProxy Load Balancer...
Oct 4 13:02:48 icinga01 systemd[1]: haproxy.service: State 'stop-sigterm' timed out. Killing.
Oct 4 13:02:48 icinga01 systemd[1]: haproxy.service: Killing process 14410 (haproxy-systemd) with signal SIGKILL.
Oct 4 13:02:48 icinga01 systemd[1]: haproxy.service: Killing process 14413 (haproxy) with signal SIGKILL.
Oct 4 13:02:48 icinga01 systemd[1]: haproxy.service: Killing process 14414 (haproxy) with signal SIGKILL.
Oct 4 13:02:48 icinga01 systemd[1]: haproxy.service: Main process exited, code=killed, status=9/KILL
Oct 4 13:02:48 icinga01 systemd[1]: haproxy.service: Killing process 14413 (haproxy) with signal SIGKILL.
Oct 4 13:02:48 icinga01 systemd[1]: haproxy.service: Killing process 14414 (haproxy) with signal SIGKILL.
Oct 4 13:02:48 icinga01 systemd[1]: Stopped HAProxy Load Balancer.
Oct 4 13:02:48 icinga01 systemd[1]: haproxy.service: Unit entered failed state.
Oct 4 13:02:48 icinga01 systemd[1]: haproxy.service: Failed with result 'timeout'.
Oct 4 13:02:48 icinga01 systemd[1]: Starting HAProxy Load Balancer...
Oct 4 13:02:48 icinga01 systemd[1]: Started HAProxy Load Balancer.

Please see the environment information:

apt-cache policy haproxy
haproxy:
  Installed: 1.7.9-1~bpo9+1
  Candidate: 1.7.9-1~bpo9+1
  Version table:
 *** 1.7.9-1~bpo9+1 100
        100 http://ftp.nl.debian.org/debian stretch-backports/main amd64 Packages
        100 http://haproxy.debian.net stretch-backports-1.7/main amd64 Packages
        100 http://httpredir.debian.org/debian stretch-backports/main amd64 Packages
        100 /var/lib/dpkg/status
     1.7.5-2 500
        500 http://ftp.nl.debian.org/debian stretch/main amd64 Packages

This is a Debian 9 amd64 installation. I have also seen the issue on
Debian 8, however.

I have a minimal haproxy config to reproduce it:

cat /etc/haproxy/haproxy.cfg
global
    chroot /var/lib/haproxy
    user haproxy
    group haproxy
    daemon

listen icinga_ido
    bind :::3306 v4v6
    mode tcp
    server icingasql01 :3306 check port 9100
    server icingasql02 :3306 check port 9100
    server icingasql03 :3306 check port 9100

If haproxy is running with this config and you connect to MySQL through
haproxy (simply: mysql -h 127.0.0.1 -u user -p), the session will be
"open". No need to actually perform a query.

While this session is open, the following will cause the issue:

systemctl reload haproxy && systemctl restart haproxy.service

During this time, the active connection will still work, but I'm unable
to open a new connection until the timeout 1.5 minutes later. If I don't
have any sessions open to MySQL, the reload & restart will have no
noticeable delay/hang and it works as expected.

Is this a bug or is there a setting I should change to prevent this from
happening?

Thank you!
Niels Hendriks