Re: no-stop keyword proposal

Joao Morais Tue, 20 Jul 2021 12:20:23 -0700

> On Jul 20, 2021, at 14:17, Willy Tarreau <[email protected]> wrote:
> 
> Hi Joao,
> 
> On Tue, Jul 20, 2021 at 12:18:18PM -0300, Joao Morais wrote:
>> 
>> Hello list, the diff below is a proposal to add a bind keyword used to flag
>> LI_O_NOSTOP option in the bind's listener.
>> 
>> Regarding the use case: I need the ability to reach a stopping, but still
>> running haproxy instance to, at least: 1) fairly distribute shutdown sessions
>> of long running connections (usually websockets) before hard-stop-after
>> timeouts and kicks all the remaining connections at the same time[1]; 2)
>> collect some relevant metrics from a stopping instance, e.g. current sessions
>> and rps, which would be otherwise lost when these metrics are collected only
>> from the current instance.
> 
> It's a bit confusing for me because it mentions two opposite needs, one
> applies to the listeners (and will thus either prevent the new process
> from binding, or randomly distribute connections between the old and the
> new one), and the other one implies killing random and active connections,
> something we don't do at all.

Hi Willy, I think I wasn’t clear about the needs and the proposal itself. I
actually have two needs regarding instances that are being stopped (SIGUSR1)
but still hold some active connections: 1) collect metrics in order to build
real data about these connections and the new data crossing them - some TCP
connections and tunnels create the most distortion, especially in bytes
in/out and concurrent connections; 2) be able to spread out some session
shutdowns before hard-stop-after expires.

So here is the proposal, which would allow me to connect to an old instance in
order to change its state or collect its data, not limited to shutting down
sessions or collecting metrics: a way to connect to these old instances, so
keeping a stats socket alive comes to mind.

You’re correct that reusing the same socket would balance requests between the
older and newer instances, depending on the SO_REUSEPORT state, but as stated
below the socket would be changed (unix socket, new path on every reload). This
wouldn’t be an option to use with a static config, of course.


>> Regarding the patch: it's just the changes I needed to make and confirm that
>> it works like I was expecting, provided that the listening socket is changed
>> before reloading haproxy into a new instance. Please let me know if such
>> improvement can be made and also if I'm in the right path.
> 
> That's quite of a concern to me because this means that you'll accumulate
> plenty of old processes, even if they do not have any connection at all
> anymore.

The instance wouldn’t run forever - well, I ran that test, and incrementing
unstoppable_jobs seems to be enough to see the instance shut down after the
last connection is closed.


> I think that your various needs would have to be addressed differently
> (the killing of active connections and keeping the old process active).
> For example, if you connect to the old process' CLI it will not quit, as
> this socket counts for one. So maybe as long as you can connect there it
> is enough to keep it alive and monitorable ?

This seems promising, I’ll give it a try. It’s the opposite of what I’m
currently doing, and I’ll need to be careful with the timeout, but at least I
have something to start with right now. Any advice here is very welcome.
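
A minimal sketch of what polling the old process over its CLI could look like,
assuming its stats socket is still reachable at a known unix socket path. The
helper names and the path argument are hypothetical; only the `show info`
command and its "Key: value" output format are taken from HAProxy itself. The
query helper works on any connected socket, so it can be tried without a
running instance.

```python
import socket


def cli_query(sock, command):
    """Send one CLI command on an already connected socket and return the reply.

    Assumes non-interactive mode, where the peer closes the connection
    after answering a single command.
    """
    sock.sendall(command.encode() + b"\n")
    chunks = []
    while True:
        data = sock.recv(4096)
        if not data:  # peer closed the connection: reply is complete
            break
        chunks.append(data)
    return b"".join(chunks).decode()


def parse_info(text):
    """Turn 'Key: value' lines, as printed by 'show info', into a dict."""
    info = {}
    for line in text.splitlines():
        key, sep, value = line.partition(":")
        if sep:  # skip blank lines and anything without a colon
            info[key.strip()] = value.strip()
    return info


def old_instance_info(path):
    """Connect to a stats socket at `path` (hypothetical) and run 'show info'."""
    with socket.socket(socket.AF_UNIX, socket.SOCK_STREAM) as sock:
        sock.connect(path)
        return parse_info(cli_query(sock, "show info"))
```

Regarding the timeout: an idle CLI connection is subject to the stats timeout,
so a connection held open only to keep the process alive would presumably need
periodic activity or a longer timeout.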


> For the connection shutdown, maybe we could extend "shutdown sessions" to
> take a percentage, and it could apply some randomness over all connections.
> This way you could periodically emit some shutdowns to kill 1% of the
> connections every few seconds until you reach 100%. It's possible that
> for high numbers of long connections, this significantly improves the
> reload. I don't know if we could even easily automate this, but I do
> see some value in it. It could sometimes kill some stats connections
> as well, but with a bit of cheating that could be avoided.
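
The percentage idea quoted above can be sketched outside HAProxy as follows.
This is only an illustration under assumptions: the function names are
hypothetical, the batch size is taken as a percentage of the initial session
count, and the session ids are assumed to have been collected beforehand (for
instance from `show sess`). It does not talk to HAProxy by itself.

```python
import random


def pick_batch(session_ids, percent_of_initial, initial_count, rng=random):
    """Pick a random batch sized as a percentage of the initial session count."""
    batch_size = max(1, initial_count * percent_of_initial // 100)
    batch_size = min(batch_size, len(session_ids))
    return rng.sample(session_ids, batch_size)


def drain_rounds(session_ids, percent_per_round=1):
    """Yield successive random batches until every session has been selected."""
    remaining = list(session_ids)
    initial = len(remaining)
    while remaining:
        batch = pick_batch(remaining, percent_per_round, initial)
        chosen = set(batch)
        # Drop the sessions selected in this round before the next one.
        remaining = [s for s in remaining if s not in chosen]
        yield batch
```

Each batch could then be sent to the old process, one `shutdown session <id>`
per id, with a pause of a few seconds between rounds.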

Maybe configure something that works with hard-stop-after? Add another
duration, or a percentage of the whole hard-stop-after period, at which the
shutdown starts. With a hard-stop-after of 30m and a configuration asking
HAProxy to start the shutdown in the last 10% of that time, 1000 sessions
would be shut down over the last 3 minutes, closing about 5-6 sessions per
second.
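
The arithmetic behind that example, as a small sketch. The function and
parameter names are hypothetical and only illustrate the calculation; no such
setting exists in HAProxy today.

```python
def shutdown_window(hard_stop_after_s, window_percent, sessions):
    """Return (start offset, window length, sessions per second).

    hard_stop_after_s: the hard-stop-after duration, in seconds
    window_percent:    the trailing share of that duration used for shutdowns
    sessions:          number of sessions still open when the window starts
    """
    window_s = hard_stop_after_s * window_percent / 100
    start_s = hard_stop_after_s - window_s
    rate = sessions / window_s
    return start_s, window_s, rate


# 30 minutes, last 10%, 1000 sessions: a 180 s window starting at 1620 s.
start, window, rate = shutdown_window(30 * 60, 10, 1000)
print(f"start after {start:.0f}s, window {window:.0f}s, {rate:.2f} sessions/s")
```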

-Joao Morais