On Wed, Sep 06, 2023 at 12:44:56PM +0000, Cedric Paillet wrote:
> > I think that it might be more suitable to use the term "FULL"
>
> Ok, no problem with that. (Perhaps we can also rename server_has_room
> to !server_is_full ?)
You're right, it might be clearer!

> > Also, a more accurate metric that is generally watched is the queue
> > (both server and backend):
>
> My first use case is to detect/display one or more server(s) with
> problems, not to determine if the entire backend is undersized.
>
> As I understand it, if just one server in the pool is very slow (and
> the load balancing method is round-robin), the number of sessions on
> this server will increase until it reaches maxconn. At that point, the
> server will no longer be selected, and requests will be routed to the
> other servers. Then no queue (either backend or server) will start to
> fill up, correct? But the slow server will stop receiving requests
> until its session count drops below maxconn, right?

Indeed, so you've actually found a pretty valid use case for the edge
case I was considering too rare to be of much interest :-)

> The second use case, as you've outlined, is to detect whether a
> backend is undersized. My understanding is that if the backend is
> "nearly" undersized, the first symptom will be some servers reporting
> "FULL". Only when ALL servers are reporting "FULL" will the backend
> queue start to grow, correct?

Yes, that's it. For this specific use case, watching the queue is much
simpler and provides more than just a boolean. But to be clear, the
situation between one server full and all servers full is very
unstable: since queuing is by definition an exponential function of
response time, once one server is full you can be certain that all the
other ones are about to be full as well, and in between you'll see a
lot of noise where servers are randomly full then ready depending on
the load.

Let's say you have 40 servers processing 10000 requests per second with
maxconn 50. That's a total capacity of 2000 outstanding requests (and
as many concurrent connections), which means requests will last 200ms
on average at saturation.
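The arithmetic above follows Little's law (outstanding = arrival rate x
response time). A quick sketch of it, using only the example figures
from this thread (40 servers, maxconn 50, 10000 requests per second;
none of this is HAProxy code):

```python
# Little's law applied to the example backend from this thread.
# All numbers are the hypothetical figures used in the discussion.

servers = 40
maxconn = 50
rate = 10_000  # requests per second for the whole backend

# Total concurrency the backend can hold before every server is FULL.
capacity = servers * maxconn  # 2000 outstanding requests

# Average response time at which that capacity is exactly consumed.
saturation_rtt = capacity / rate  # 0.2 s, i.e. 200 ms

def per_server_outstanding(rtt_ms):
    """Average concurrent requests per server at a given response time."""
    return rate * (rtt_ms / 1000.0) / servers

print(capacity)                     # 2000
print(saturation_rtt)               # 0.2
print(per_server_outstanding(196))  # 49.0 -> below maxconn, no server FULL
print(per_server_outstanding(200))  # 50.0 -> at maxconn, all servers FULL
```

The same function shows why 198ms is the noisy middle ground: it yields
49.5 outstanding requests per server, so roughly half the servers sit
at maxconn at any instant, flipping with each request.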
At 196ms response time you'll have 1960 outstanding requests, hence 49
per server, so no server is marked full. At 200ms all of them will be.
At 198ms, half of the servers will be marked full, and that changes
with every single request, i.e. 250 times per second per server. Now
you can see how sampling such a boolean only once every few seconds,
when it changes 250 times per second, is useless for scalability
purposes: you're just sampling a random value. On the opposite,
retrieving the queue length is much more stable, since it covers the
whole backend rather than a single server, and it reports more than a
boolean.

Hoping this helps,
Willy