RE: [PATCH 0/1] Introduce 'NOROOM' status for srv_state metric

Cedric Paillet Wed, 06 Sep 2023 05:45:34 -0700

Thanks for your review.

>I think that it might be more suitable to use the term "FULL"
Ok, no problem with that. (Perhaps we can also rename server_has_room to 
!server_is_full ?)

> Also, a more accurate metric that is generally watched is the queue (both 
> server and backend): 

My first use case is to detect/display one or more server(s) with problems, not 
to determine if the entire backend is undersized.
As I understand it, if just one server in the pool is very slow (and the load 
balancing method is round-robin), the number of sessions on this server will 
increase until it reaches maxconn. At this juncture, the server will no longer 
be selected, and requests will be routed to the other servers. Then, no queue 
(either backend or server) will start to fill up, correct? But the slow server 
will cease receiving requests until its session count drops below maxconn, 
right?

The second use case, as you've outlined, is to detect if a backend is 
undersized. My understanding is that if the backend is "nearly" undersized, the 
first symptom will be some servers reporting "FULL". Only when ALL servers are 
reporting "FULL" will the backend queue start to grow, correct?
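The behaviour described in the two use cases above can be sketched as a toy model. This is only an illustration of that understanding, not HAProxy's actual scheduler: the `Server` class, `is_full()` and `dispatch()` are hypothetical names, connections never complete, and persistent requests (which go to a server's own queue) are ignored.

```python
from dataclasses import dataclass


@dataclass
class Server:
    # Hypothetical stand-in for a backend server; not HAProxy's struct server.
    name: str
    maxconn: int
    cur_sess: int = 0

    def is_full(self) -> bool:
        # A server is "FULL" once its session count reaches maxconn.
        return self.cur_sess >= self.maxconn


def dispatch(servers, n_requests):
    """Assign requests round-robin, skipping full servers.

    Returns the number of requests that found no server with room and
    therefore landed in the (modelled) backend queue.
    """
    backend_queue = 0
    idx = 0
    for _ in range(n_requests):
        chosen = None
        for offset in range(len(servers)):
            candidate = servers[(idx + offset) % len(servers)]
            if not candidate.is_full():
                chosen = candidate
                idx = (idx + offset + 1) % len(servers)
                break
        if chosen is None:
            backend_queue += 1  # every server is full: the request queues
        else:
            chosen.cur_sess += 1
    return backend_queue


# One slow server (s1) already sits at maxconn; the others are idle.
pool = [Server("s1", 2, cur_sess=2), Server("s2", 2), Server("s3", 2)]
queued_first = dispatch(pool, 3)   # s1 is skipped, nothing queues yet
queued_second = dispatch(pool, 2)  # the pool fills up, then one request queues
```

In this model the first batch reports one, then two, FULL servers while the backend queue stays empty; the queue only starts to grow once every server is FULL.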

Cédric

-----Original Message-----
From: Willy Tarreau <w...@1wt.eu> 
Sent: Wednesday, September 6, 2023 10:36
To: Cedric Paillet <c.pail...@criteo.com>
Cc: haproxy@formilux.org
Subject: Re: [PATCH 0/1] Introduce 'NOROOM' status for srv_state metric

Hi Cedric,

On Tue, Sep 05, 2023 at 01:40:14PM +0000, Cedric Paillet wrote:
> We are using Prometheus to provide feedback to our users about the 
> status of backend servers. Currently, we have no means of informing 
> them if a server exceeds the maxconn limit, and consequently why it's 
> no longer receiving new requests.
> 
> Therefore, we would like to be able to display when a server surpasses 
> the maxconn limit and is in the "noroom" state. I have prepared a 
> patch specifically for Prometheus, but it might be better to include a 
> boolean directly in the server structure indicating whether the server 
> was considered to have no room the last time server_has_room was 
> called. However, this change seems to have a significant impact on other 
> parts of the code.

I think that it might be more suitable to use the term "FULL" that we already 
use for the listeners, and that also matches what's used at a few places in the 
doc in association with servers or backends being "full".

Also, a more accurate metric that is generally watched is the queue (both 
server and backend): instead of being a boolean, it directly indicates how many 
additional servers are needed to improve the processing time. Persistent 
requests are placed into the server's queue but all other requests go into the 
backend queue. So if you have a total capacity of 4 servers * 100 connections = 
400 outstanding requests, and you see a queue of 200, you know that you'd need 
two extra servers to process these without queuing, and you can even predict 
that the total processing time will decrease by 200 * the average queue time, so 
this even lets you add just the number of servers required to keep response 
time below a certain limit.
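The sizing arithmetic in this paragraph can be written out as a small sketch. The function name `extra_servers_needed` is hypothetical, and the calculation assumes every added server has the same maxconn as the existing ones.

```python
def extra_servers_needed(queue_len: int, maxconn_per_server: int) -> int:
    """Servers to add so that queue_len queued requests fit without queuing.

    Assumes all added servers share the same maxconn (an assumption for
    this sketch, not something HAProxy computes for you).
    """
    if maxconn_per_server <= 0:
        raise ValueError("maxconn_per_server must be positive")
    if queue_len < 0:
        raise ValueError("queue_len must not be negative")
    # Integer ceiling division: a partially filled server still counts.
    return -(-queue_len // maxconn_per_server)


servers = 4
maxconn = 100
queue = 200

capacity = servers * maxconn                  # 4 * 100 = 400 outstanding requests
extra = extra_servers_needed(queue, maxconn)  # 200 / 100 = 2 extra servers
```

With the figures from the example, a queue of 200 against a per-server maxconn of 100 yields two extra servers; a queue of 201 would round up to three.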

The only case where a server will be full without having any queue is when the 
total number of outstanding requests on a backend is exactly equal to the sum 
of the servers' maxconn values, so as you see, the extra metric covers very 
little compared to the queue itself. But there might be good use cases for 
this, I'm not denying it, I just wanted to make sure that you're going to 
monitor what's really relevant for your use case ;-)

Willy
