Hi there,

mod_jk, for example, uses such aging, but only for the non-busyness case. Busyness is meant to show the number of currently in-flight requests, so aging isn't a good fit there; old load numbers are never part of busyness. But busyness is the mode that is most sensitive to the number skew effects that JFC observed, hence the attempt to have more precise counting there.
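To illustrate what I mean by busyness being a pure in-flight counter, roughly something like this (just a sketch, not the actual mod_proxy code; struct and function names are made up):

#include "apr_atomic.h"

typedef struct {
    volatile apr_uint32_t busy;   /* currently in-flight requests */
} worker_sketch_t;

static void request_begin(worker_sketch_t *w)
{
    apr_atomic_inc32(&w->busy);
}

static void request_end(worker_sketch_t *w)
{
    /* If this decrement is ever skipped (error path, unsynchronized
     * update), busy never returns to zero -- which is exactly the
     * stale values JFC sees after his load tests. */
    if (apr_atomic_read32(&w->busy) > 0)
        apr_atomic_dec32(&w->busy);
}

There is no history in that number, so there is nothing an aging pass could meaningfully decay.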

Aging does make sense for byrequests and bytraffic, though. But in mod_jk we use a different byrequests algorithm: not the original count-and-decrement scheme that Mladen introduced, but a count-and-age scheme instead.
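Roughly, count and age means something like the following (only a sketch with made-up names; the real mod_jk code differs in the details such as normalization and the decay factor):

#include <stdint.h>

typedef struct {
    const char   *name;
    unsigned int  lb_factor;   /* configured weight, >= 1 */
    uint64_t      lb_value;    /* aged request count */
} lb_worker_sketch_t;

/* Pick the worker with the smallest weighted load and count the request. */
static lb_worker_sketch_t *pick_worker(lb_worker_sketch_t *w, int n)
{
    lb_worker_sketch_t *best = &w[0];
    int i;
    for (i = 1; i < n; i++) {
        if (w[i].lb_value / w[i].lb_factor <
            best->lb_value / best->lb_factor)
            best = &w[i];
    }
    best->lb_value += 1;       /* count */
    return best;
}

/* Periodic maintenance: age the counters so old load fades away. */
static void age_workers(lb_worker_sketch_t *w, int n)
{
    int i;
    for (i = 0; i < n; i++)
        w[i].lb_value >>= 1;   /* halve; the decay factor is a free choice */
}

The important property is that the aging happens in a periodic maintenance step, not per request.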

The aging for byrequests and bytraffic could be hooked onto mod_watchdog, which would be nice because we would not need to run it as part of normal request handling.
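Something along these lines (the watchdog_step hook itself is from mod_watchdog.h; the interval, the static variable and the decay step are made up for illustration, and a real implementation would keep its state in the shared slotmem and match the watchdog instance it asked for):

#include "httpd.h"
#include "http_config.h"
#include "apr_time.h"
#include "mod_watchdog.h"

#define AGING_INTERVAL apr_time_from_sec(60)   /* illustrative value */

static apr_time_t last_aging = 0;

static int balancer_watchdog_step(server_rec *s, const char *name,
                                  apr_pool_t *pool)
{
    apr_time_t now = apr_time_now();

    /* A real implementation would first check that 'name' is the
     * watchdog instance we claimed via the watchdog_need hook. */
    if (now - last_aging >= AGING_INTERVAL) {
        last_aging = now;
        /* Walk the shared worker slotmem here and age the
         * byrequests / bytraffic counters, e.g. halve them. */
    }
    return OK;
}

static void register_hooks(apr_pool_t *p)
{
    ap_hook_watchdog_step(balancer_watchdog_step, NULL, NULL,
                          APR_HOOK_MIDDLE);
}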

Another thing that comes to mind is (graceful) restart handling for bybusyness. It might make sense to clear the numbers in case of such an event.

Best regards,

Rainer

On 31.08.23 at 18:23, Jim Jagielski wrote:
IIRC, the goal of having an "aging" function was to handle this exact kind of 
thing, where values could be normalized over a long period of time so that old entries 
that may skew results are not weighted as heavily as new ones.

On Aug 30, 2023, at 11:19 AM, jean-frederic clere <jfcl...@gmail.com> wrote:

Hi,

All the balancers have thread/process safety issues, but with bybusyness the
effect is worse: a worker may stay with a busy count greater than zero even
when no request is being processed.

busy is displayed in the balancer_handler() so users/customers will notice the 
value doesn't return to zero...

If you run a load test, the busy value increases over time and on all the
workers.

When using bybusyness, peaks in the load followed by periods with little load
mean that only the workers with the lowest busy values get used, while the ones
stuck at a wrongly high value are not used at all.

In a test with 3 workers, I end up with these busy values:
worker1: 3
worker2: 0
worker3: 2
Running the load test several times, the busy values keep increasing on all workers.

I am wondering if we could end up with something like:
worker1: 1000
worker2: 0
worker3: 1000

In this case bybusyness will send all the load to worker2 until we reach 1000
simultaneous requests on worker2... Obviously that looks bad.

How to fix that?
1 - Reset busy via a watchdog when elected (or transferred+read) has stayed unchanged
for some time (using one of the timeouts we have on workers).
2 - Warn in the docs that bybusyness is not the best choice for load balancing.
3 - Create another balancer that just picks a worker at random.

--
Cheers

Jean-Frederic
ยด
