> On Jul 20, 2021, at 14:17, Willy Tarreau <[email protected]> wrote:
>
> Hi Joao,
>
> On Tue, Jul 20, 2021 at 12:18:18PM -0300, Joao Morais wrote:
>>
>> Hello list, the diff below is a proposal to add a bind keyword used to set
>> the LI_O_NOSTOP option on the bind's listener.
>>
>> Regarding the use case: I need the ability to reach a stopping, but still
>> running, haproxy instance to, at least: 1) fairly distribute the shutdown
>> of long-running connections (usually websockets) before hard-stop-after
>> times out and kicks all the remaining connections at the same time[1]; 2)
>> collect some relevant metrics from a stopping instance, e.g. current sessions
>> and rps, which would otherwise be lost when these metrics are collected only
>> from the current instance.
>
> It's a bit confusing for me because it mentions two opposite needs, one
> applies to the listeners (and will thus either prevent the new process
> from binding, or randomly distribute connections between the old and the
> new one), and the other one implies killing random and active connections,
> something we don't do at all.
Hi Willy, I think I wasn't clear about the needs or the proposal itself. I
actually have two needs regarding instances that are being stopped (SIGUSR1)
but still hold some active connections: 1) collect metrics in order to get
real data about these connections and the new traffic crossing them, since
some TCP connections and tunnels cause most of the distortion, especially in
bytes in/out and concurrent connections; 2) distribute some session shutdowns
before hard-stop-after expires.
So here is the proposal, which would let me connect to an old instance in
order to change its state or collect its data, not limited to "shutdown
sessions" or metrics: some way to reach these old instances, and keeping a
stats socket alive comes to mind.
You're correct that reusing the same socket would balance requests between
the older and newer instances, depending on the SO_REUSEPORT state, but, as
stated below, the socket would be changed (a unix socket with a new path on
every reload). This wouldn't be an option for a static config, of course.
>> Regarding the patch: it's just the changes I needed to make to confirm that
>> it works as I expected, provided that the listening socket is changed
>> before reloading haproxy into a new instance. Please let me know if such an
>> improvement can be made and also if I'm on the right path.
>
> That's quite a concern to me because this means that you'll accumulate
> plenty of old processes, even if they do not have any connections at all
> anymore.
The instance wouldn't run forever: I tested it, and incrementing
unstoppable_jobs seems to be enough to see the instance shut down after the
last connection is closed.
> I think that your various needs would have to be addressed differently
> (the killing of active connections and keeping the old process active).
> For example, if you connect to the old process' CLI it will not quit, as
> this socket counts for one. So maybe as long as you can connect there it
> is enough to keep it alive and monitorable ?
This seems promising; I'll give it a try. It's the opposite of what I'm
currently doing, and I'll need to take care with the timeout, but at least I
have something to start with right now. Any advice here is very welcome.
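For the metrics side, a minimal sketch of reading data from an old instance
through its own stats socket, assuming the old process still exposes a unix
stats socket whose path the caller knows (the path in the usage line is
hypothetical). It uses the CLI's non-interactive mode, where HAProxy answers
one command and then closes the connection, and parses the "Name: value" lines
printed by "show info". Actually keeping the old process alive would instead
need a long-lived interactive session ("prompt") that sends something more
often than the configured "stats timeout".

```python
import socket


def parse_show_info(text: str) -> dict:
    """Parse the 'Name: value' lines printed by the CLI 'show info' command."""
    info = {}
    for line in text.splitlines():
        # Split on the first colon only; blank or malformed lines are skipped.
        key, sep, value = line.partition(":")
        if sep:
            info[key.strip()] = value.strip()
    return info


def query_old_instance(socket_path: str, command: str = "show info",
                       timeout: float = 5.0) -> str:
    """Send one command to a unix stats socket and return the whole reply.

    In non-interactive mode the connection is closed after the reply, so we
    simply read until EOF.
    """
    with socket.socket(socket.AF_UNIX, socket.SOCK_STREAM) as sock:
        sock.settimeout(timeout)
        sock.connect(socket_path)
        sock.sendall(command.encode() + b"\n")
        chunks = []
        while True:
            data = sock.recv(4096)
            if not data:
                break
            chunks.append(data)
    return b"".join(chunks).decode()


# Usage (hypothetical per-reload path of the old instance's stats socket):
# info = parse_show_info(query_old_instance("/var/run/haproxy-old.sock"))
# print(info.get("CurrConns"))
```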
> For the connection shutdown, maybe we could extend "shutdown sessions" to
> take a percentage, and it could apply some randomness over all connections.
> This way you could periodically emit some shutdowns to kill 1% of the
> connections every few seconds until you reach 100%. It's possible that
> for high numbers of long connections, this significantly improves the
> reload. I don't know if we could even easily automate this, but I do
> see some value in it. It could sometimes kill some stats connections
> as well, but with a bit of cheating that could be avoided.
Maybe this could be configured to work with hard-stop-after? It could take
another duration, or a percentage of the whole hard-stop-after time, defining
when the shutdown starts. For example, with a hard-stop-after of 30m and a
configuration asking HAProxy to start the shutdown in the last 10% of that
time, 1000 remaining sessions would be closed over the last 3 minutes, i.e.
about 5-6 sessions per second.
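The arithmetic of that example can be sketched as below. This is not an
existing HAProxy feature: the function and field names are hypothetical, and
the tail fraction stands for the extra setting suggested above. It computes
when closing would start, the average rate, and an even per-tick split of the
sessions (whole numbers per tick that always add up to the total).

```python
from __future__ import annotations

import math
from dataclasses import dataclass


@dataclass
class ShutdownPlan:
    start_after_s: float   # seconds after the soft stop when closing begins
    window_s: float        # length of the closing window, in seconds
    sessions_per_s: float  # average closing rate over the window


def plan_gradual_shutdown(hard_stop_after_s: float, tail_fraction: float,
                          sessions: int) -> ShutdownPlan:
    """Close `sessions` during the last `tail_fraction` of hard-stop-after."""
    if not 0 < tail_fraction <= 1:
        raise ValueError("tail_fraction must be in (0, 1]")
    window = hard_stop_after_s * tail_fraction
    return ShutdownPlan(start_after_s=hard_stop_after_s - window,
                        window_s=window,
                        sessions_per_s=sessions / window)


def per_tick_counts(sessions: int, window_s: float, tick_s: float) -> list[int]:
    """Spread sessions over ticks as evenly as possible.

    Tick i closes the difference between two cumulative integer targets,
    so every count is floor or ceil of the average and the sum is exact.
    """
    if tick_s <= 0:
        raise ValueError("tick_s must be positive")
    ticks = max(1, math.ceil(window_s / tick_s))
    return [(i + 1) * sessions // ticks - i * sessions // ticks
            for i in range(ticks)]


# 30m hard-stop-after, last 10%, 1000 sessions: starts at 1620 s,
# lasts 180 s, about 5.56 sessions/s, i.e. 5 or 6 per one-second tick.
plan = plan_gradual_shutdown(30 * 60, 0.10, 1000)
counts = per_tick_counts(1000, plan.window_s, 1.0)
```

The actual closing would still need something like the percentage-based
"shutdown sessions" described above, which is only a proposal at this point.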
-Joao Morais