Re: no-stop keyword proposal

Willy Tarreau Tue, 20 Jul 2021 10:18:21 -0700

Hi Joao,

On Tue, Jul 20, 2021 at 12:18:18PM -0300, Joao Morais wrote:
> 
> Hello list, the diff below is a proposal to add a bind keyword used to set
> the LI_O_NOSTOP option on the bind's listener.
> 
> Regarding the use case: I need to be able to reach an haproxy instance that
> is stopping but still running, in order to at least: 1) spread "shutdown
> sessions" fairly over long-running connections (usually websockets) before
> hard-stop-after expires and kills all the remaining connections at the same
> time[1]; 2) collect relevant metrics from a stopping instance, e.g. current
> sessions and rps, which would otherwise be lost when metrics are collected
> only from the current instance.


It's a bit confusing to me because it mentions two opposite needs: one
applies to the listeners (and will thus either prevent the new process
from binding, or randomly distribute connections between the old and the
new one), and the other implies killing randomly chosen active connections,
something we don't do at all.
 
> Regarding the patch: it contains just the changes I needed to confirm that
> it works as I expected, provided that the listening socket is changed
> before haproxy is reloaded into a new instance. Please let me know whether
> such an improvement can be made and whether I'm on the right path.

That's quite a concern to me, because it means you'll accumulate plenty
of old processes, even once they no longer have any connections at all.

I think that your various needs would have to be addressed differently
(the killing of active connections and keeping the old process active).
For example, if you connect to the old process' CLI, it will not quit, as
this connection counts as one. So maybe, as long as you can connect there,
that is enough to keep it alive and monitorable?
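To make the contrast between these last paragraphs concrete, here is a deliberately simplified Python model. It assumes, as the reply implies, that a listener kept open by the proposed flag counts against the old process's exit condition just like an open connection does. `Listener`, `OldProcess`, `ListenerOpt` and the flag value are hypothetical illustrations; HAProxy's real exit logic is written in C and tracks more state than this.

```python
from dataclasses import dataclass
from enum import IntFlag


class ListenerOpt(IntFlag):
    # Hypothetical bit values; HAProxy's real LI_O_* constants differ.
    NONE = 0
    NOSTOP = 1  # stands in for LI_O_NOSTOP: keep listening after soft-stop


@dataclass
class Listener:
    name: str
    options: ListenerOpt = ListenerOpt.NONE
    listening: bool = True


@dataclass
class OldProcess:
    listeners: list
    active_connections: int = 0  # client sessions, CLI sessions included

    def soft_stop(self) -> None:
        # On reload, close every listener except those flagged NOSTOP.
        for lst in self.listeners:
            if not lst.options & ListenerOpt.NOSTOP:
                lst.listening = False

    def can_exit(self) -> bool:
        # Simplified rule: quit only when nothing listens and no
        # connection (including a CLI session) remains.
        still_listening = any(lst.listening for lst in self.listeners)
        return not still_listening and self.active_connections == 0
```

With a NOSTOP listener, `can_exit()` stays false forever, which is how old processes would pile up. An open CLI session, modelled here as `active_connections = 1`, keeps the process alive only until the monitoring client disconnects.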

For the connection shutdown, maybe we could extend "shutdown sessions" to
take a percentage, and it could apply some randomness over all connections.
This way you could periodically emit some shutdowns to kill 1% of the
connections every few seconds until you reach 100%. It's possible that,
with a high number of long-lived connections, this would significantly
improve the reload. I don't know whether we could easily automate this,
but I do see some value in it. It could sometimes kill some stats
connections as well, but with a bit of cheating that could be avoided.
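The percentage idea can be sketched as a generator that is advanced once per period (e.g. every few seconds). Each time, it picks a uniformly random batch of the remaining connections so that the cumulative number closed grows by `step_pct` percent of the initial count. This is a hypothetical Python model, not an existing HAProxy command: `drain_rounds` and its parameters are made up for illustration, and `exempt` stands in for the "bit of cheating" that spares stats connections.

```python
import random
from collections.abc import Hashable, Iterable, Iterator


def drain_rounds(
    conn_ids: Iterable[Hashable],
    step_pct: int,
    rng: random.Random,
    exempt: Iterable[Hashable] = (),
) -> Iterator[list]:
    """Hypothetical percentage-based "shutdown sessions" model.

    Each yielded list is one round's batch of connections to shut down.
    """
    if not 0 < step_pct <= 100:
        raise ValueError("step_pct must be an integer in 1..100")
    skip = set(exempt)
    # Sessions listed in `exempt` (e.g. stats or CLI clients) are never picked.
    remaining = [c for c in conn_ids if c not in skip]
    initial = len(remaining)
    killed = 0
    pct = 0
    while remaining:
        pct = min(100, pct + step_pct)
        # Cumulative target, rounded up with integer math: after this round,
        # ceil(initial * pct / 100) connections in total have been closed.
        target = (initial * pct + 99) // 100
        # Uniform random choice among the connections still open.
        batch = rng.sample(remaining, target - killed)
        chosen = set(batch)
        remaining = [c for c in remaining if c not in chosen]
        killed = target
        yield batch
```

Targeting a cumulative, rounded-up share of the initial population guarantees that the round reaching 100% closes everything left. Closing a fixed fraction of whatever remains would instead shrink geometrically and drag on.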

Willy
