Hi Cedric,

On Tue, Sep 05, 2023 at 01:40:14PM +0000, Cedric Paillet wrote:
> We are using Prometheus to provide feedback to our users about the status of
> backend servers. Currently, we have no means of informing them if a server
> exceeds the maxconn limit, and consequently why it's no longer receiving new
> requests.
> 
> Therefore, we would like to be able to display when a server surpasses the
> maxconn limit and is in the "noroom" state. I have prepared a patch
> specifically for Prometheus, but it might be better to include a boolean
> directly in the server structure indicating whether the server was considered
> to have no room the last time server_has_room was called. However, this
> change seems to have a significant impact on other parts of the code.

I think that it might be more suitable to use the term "FULL" that we
already use for the listeners, and that also matches what's used at a
few places in the doc in association with servers or backends being
"full".

Also, a more accurate metric that is generally watched is the queue
(both server and backend): instead of being a boolean, it tells you how
many requests are waiting, and from that how many additional servers
are needed to improve the processing time. Persistent requests are
placed into the server's queue but all other requests go into the
backend queue. So if you have a total capacity of 4 servers * 100
connections = 400 outstanding requests, and you see a queue of 200, you
know that you'd need two extra servers to process these without
queuing, and you can even predict that the total processing time will
decrease by 200 * the average queue time. This even lets you add just
the required number of servers to keep the response time below a
certain limit.
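
To make the arithmetic above concrete, here is a minimal sketch in
Python. The function name and its parameters are made up for the
illustration (nothing here is an HAProxy API), and it assumes that all
servers share the same maxconn:

```python
import math

def extra_servers_needed(queued: int, maxconn_per_server: int) -> int:
    """Estimate how many servers to add so queued requests stop waiting.

    Hypothetical helper: assumes every added server has the same maxconn
    as the existing ones and that the queue length is a steady-state value.
    """
    if maxconn_per_server <= 0:
        raise ValueError("maxconn_per_server must be positive")
    # Each extra server absorbs up to maxconn_per_server queued requests;
    # round up because a fraction of a server cannot be added.
    return math.ceil(max(queued, 0) / maxconn_per_server)

# Example from the text: servers with maxconn 100 and a queue of 200.
print(extra_servers_needed(200, 100))  # 2
```

With a queue of 150 the same estimate would also be 2, since one extra
server would still leave 50 requests waiting.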

The only case where a server will be full without having any queue is
when the total number of outstanding requests on a backend is exactly
equal to the sum of the servers' maxconn values. So as you see, the
extra metric covers very little compared to the queue itself. There
might be good use cases for it, I'm not denying that, I just wanted to
make sure that you're going to monitor what's really relevant for your
use case ;-)
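
The "full but no queue" corner case can be sketched the same way. This
is a deliberately simplified model, not how HAProxy computes anything:
it assumes no persistent requests (so only the backend queue matters)
and that the load is spread so that all servers fill up before anything
is queued. The names are hypothetical:

```python
def backend_state(outstanding: int, maxconns: list[int]) -> tuple[bool, int]:
    """Return (all_servers_full, queued) for a simplified backend model.

    outstanding: requests being served plus requests waiting.
    maxconns: per-server maxconn values (illustrative input).
    """
    capacity = sum(maxconns)
    # Anything beyond the total capacity has to wait in the queue.
    queued = max(outstanding - capacity, 0)
    full = outstanding >= capacity
    return full, queued

print(backend_state(399, [100] * 4))  # (False, 0): room left, no queue
print(backend_state(400, [100] * 4))  # (True, 0): full, queue still empty
print(backend_state(600, [100] * 4))  # (True, 200): full and queuing
```

Only the middle case is visible through a "full" flag and not through
the queue.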

Willy
