On Wed, Sep 06, 2023 at 12:44:56PM +0000, Cedric Paillet wrote:
> > I think that it might be more suitable to use the term "FULL"
> Ok, no problem with that. (Perhaps we can also rename server_has_room to 
> !server_is_full ?)

You're right, it might be clearer!

> > Also, a more accurate metric that is generally watched is the queue (both 
> > server and backend): 
> 
> My first use case is to detect/display one or more server(s) with problems,
> not to determine if the entire backend is undersized.
> As I understand it, if just one server in the pool is very slow (and the load
> balancing method is round-robin), the number of sessions on this server will
> increase until it reaches maxconn. At this juncture, the server will no
> longer be selected, and requests will be routed to the other servers. Then,
> no queue (either backend or server) will start to fill up, correct? But the
> slow server will cease receiving requests until its session count drops below
> maxconn, right?

Indeed, so you've actually found a pretty valid use case for the edge
case I was considering rare enough not to be of much interest :-)

> The second use case, as you've outlined, is to detect if a backend is
> undersized. My understanding is that if the backend is "nearly" undersized,
> the first symptom will be some servers reporting "FULL". Only when ALL
> servers are reporting "FULL" will the backend queue start to grow, correct?

Yes, that's it. For this specific use case, watching the queue is much
simpler and provides more than just a boolean. But to be clear, the
situation between one server full and all servers full is very unstable:
since queuing is an exponential function of response time by definition,
once one server is full you can be certain that all the other ones are
about to be full as well, and between the two states you'll see a lot
of noise where servers are randomly full then ready depending on the
load. Let's say you have 40 servers processing 10000 requests per
second with a maxconn of 50. That's a total capacity of 2000
outstanding requests (and as many concurrent connections), which by
Little's law means requests will last 200ms on average at saturation.
At 196ms of response time you'll have 1960 outstanding requests, hence
49 per server, thus no server is marked full. At 200ms all of them will
be full. And at 198ms, half of the servers will be marked full, with
the set of full servers changing on every single request, i.e. 250
times per second per server.
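
In case it helps, here's a quick back-of-the-envelope sketch of that
arithmetic, purely for illustration (the figures are the ones above,
and the round-robin distribution is idealized):

    import math

    SERVERS = 40
    MAXCONN = 50
    RATE = 10000  # requests per second for the whole backend

    for rt_ms in (196, 198, 200):
        # Little's law: concurrent requests = arrival rate * response time
        outstanding = RATE * rt_ms / 1000.0
        per_server = outstanding / SERVERS
        # With fair round-robin, each server holds floor() or ceil() of
        # the average, so the fractional part is roughly the share of
        # servers sitting at maxconn at any instant.
        if per_server >= MAXCONN:
            full_share = 1.0
        elif math.ceil(per_server) == MAXCONN:
            full_share = per_server - math.floor(per_server)
        else:
            full_share = 0.0
        print(f"{rt_ms:3d}ms: {outstanding:4.0f} outstanding, "
              f"{per_server:4.1f}/server, ~{full_share:.0%} of servers full")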
Now you can see how sampling such a boolean only once every few
seconds, when it changes 250 times per second, will be useless for
scalability purposes: you're just sampling a random value. By contrast,
retrieving the queue length is much more stable since it covers the
whole backend and not just one server, plus it reports more than a
boolean.
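
If it's useful, here's a minimal sketch of how one might poll that
queue length from the stats socket (the socket path and backend name
below are only examples; "show stat" and its qcur/qmax CSV columns are
the standard runtime API output):

    import csv
    import io
    import socket

    def backend_queue(sock_path="/var/run/haproxy.sock", backend="bk_app"):
        """Return (qcur, qmax) for a backend, read from "show stat" CSV."""
        s = socket.socket(socket.AF_UNIX, socket.SOCK_STREAM)
        s.connect(sock_path)
        s.sendall(b"show stat\n")
        data = b""
        while chunk := s.recv(4096):
            data += chunk
        s.close()
        # The header line starts with "# pxname,svname,qcur,qmax,..."
        reader = csv.DictReader(io.StringIO(data.decode().lstrip("# ")))
        for row in reader:
            # The "BACKEND" row aggregates the whole backend.
            if row["pxname"] == backend and row["svname"] == "BACKEND":
                return int(row["qcur"]), int(row["qmax"])
        return None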

Hoping this helps,
Willy
