Re: [PATCH] BUG/MINOR: promex: don't count servers in maintenance

Christopher Faulet Mon, 11 Sep 2023 11:15:37 -0700

On 07/09/2023 at 16:50, Cedric Paillet wrote:

And I guess we should also check that health checks are enabled for the server.
It is not really an issue because the call to get_check_status_result() will
exclude neutral and unknown statuses. But there is no reason to count these
servers.


What we observed is that the health check of servers that have been put into maintenance
is no longer updated, and the status returned is the last known one. I need to
double-check, but I believe we even saw L7OK when putting an "UP" server into
maintenance. (What I'm sure of is that the majority of servers in maintenance were in
L7STS and not in UNK).

If we add the PROMEX_FL_NO_MAINT_SRV check (which makes sense), we will continue to display
an incorrect backend_agg_server_status (and also haproxy_server_check_status) for
servers in maintenance for those who don't set (no-maint=empty), and we probably
then need another patch so that the status of these servers is
HCHK_STATUS_UNKNOWN?

Health-checks for servers in maintenance are paused. So indeed, the last known
status does not change anymore in this state. My purpose here was to also filter
servers to only count those with health-checks enabled and running. When the
server's metrics are dumped, the check status is already skipped for servers in
maintenance. Thus it seems logical to not count them for the aggregated metric.

In fact, this way, all servers in maintenance are skipped without checking the
server's admin state. But it is probably cleaner to keep both checks for
consistency. Except if I missed something. This part is not really clear for me
anymore...
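
To make the two filters discussed above concrete, here is a minimal sketch of
the counting logic. It is only an illustration under stated assumptions: the
struct, its fields and the sketch_* function names are hypothetical and do not
match the actual promex code or HAProxy's struct server.

```c
#include <stdbool.h>
#include <stddef.h>

/* Hypothetical, simplified server model (NOT HAProxy's struct server). */
struct sketch_server {
    bool in_maint;       /* admin state: server is in maintenance */
    bool check_enabled;  /* a health check is configured for the server */
    bool check_paused;   /* the check is paused, e.g. because of maintenance */
    int  check_status;   /* last known check status, possibly stale */
};

/* A server is counted only if both filters let it through. */
static bool sketch_is_counted(const struct sketch_server *s)
{
    if (s->in_maint)
        return false;    /* admin-state filter, kept for consistency */
    if (!s->check_enabled || s->check_paused)
        return false;    /* health-check filter: enabled and running */
    return true;
}

/* Count the servers whose last check status equals 'status'. */
static size_t sketch_count_status(const struct sketch_server *srv, size_t n,
                                  int status)
{
    size_t count = 0;

    for (size_t i = 0; i < n; i++) {
        if (sketch_is_counted(&srv[i]) && srv[i].check_status == status)
            count++;
    }
    return count;
}
```

In this sketch a server in maintenance has its check paused, so the second
test alone would already skip it. The first test is redundant in that case and
is kept only to make the intent explicit.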


--
Christopher Faulet
