Hi List, hi Mladen (master of mod_jk):
a year ago we changed to algorithm in mod_jk, that "counts" weighted
requests in the lb worker to decide, which balanced worker should
receive the next request.
The new algorithm three main advantages:
a) using only integers
b) using only a limited range of values
c) behaving well when one balanced worker went out of service for a
longer time and then came back
Unfortunately the algorithm has another deficiency and I propose to do
it in yet another way.
The problem is: the algo needs to update counters on all balanced
workers not only on the worker that gets the request. No in more complex
balancer configurations we have different groups of workers eligible for
a request. A request might carry a session and the right worker is dwn,
then the whole domain (replicas) are eligible. A request might not carry
a session, then all available workers are eligible. Although we have
different groups depending on the type of request and the status of the
workers we only update one common set of load counters. This leads to
incorrect balancing decisions, whenever these situations get mixed.
My proposal is to go back basically to the old way of doing it, but with
an implementation, that still respects a) and c) and to some extend also b).
A) During initialization we calculate the smallest common multiple L of
all balancing weights (lb factor) l_1,...,l_n and set
L_1=L/l_1,...,L_n=L/l_n. That's cheap, the algo is well known and all is
exact integer operations.
B) We use a 64Bit load counter and for each request we do not only add
1, instead we add L_i whenever worker i gets chosen.
C) We always choose the eligible worker with the smallest load counter
(and we can use a rotating offset like it is already implemented for
traffic).
D) During maintenance we divide all load counters by some constant. Best
would be a power of 2, so we can simply shift right all load counters.
This way we shape the counters with a simple approximation of an
exponential curve, so the influence of incrementing the counters in the
past gets less important the longer away it is.
E) If we divide in D) by 2 each minute, then the counters will never
exceed 2 times the maximum a counter can reach in a single minute. For
most weight factors and loads this will easily fit into a 64 Bit
integer. We can furthermore care about overflows by prematurely shifting
right whenever the counters get to big (with very high load aging the
counters earlier on demand).
I could supply a patch including documentation. I think that would be
low risk, because that patch would only influence how the load counters
are handled, and would not interfere with changing worker status and
decision logic.
Comments?
Rainer
---------------------------------------------------------------------
To unsubscribe, e-mail: [EMAIL PROTECTED]
For additional commands, e-mail: [EMAIL PROTECTED]