Hi List, hi Mladen (master of mod_jk):

a year ago we changed to algorithm in mod_jk, that "counts" weighted requests in the lb worker to decide, which balanced worker should receive the next request.

The new algorithm three main advantages:

a) using only integers
b) using only a limited range of values
c) behaving well when one balanced worker went out of service for a longer time and then came back

Unfortunately the algorithm has another deficiency and I propose to do it in yet another way.

The problem is: the algo needs to update counters on all balanced workers not only on the worker that gets the request. No in more complex balancer configurations we have different groups of workers eligible for a request. A request might carry a session and the right worker is dwn, then the whole domain (replicas) are eligible. A request might not carry a session, then all available workers are eligible. Although we have different groups depending on the type of request and the status of the workers we only update one common set of load counters. This leads to incorrect balancing decisions, whenever these situations get mixed.

My proposal is to go back basically to the old way of doing it, but with an implementation, that still respects a) and c) and to some extend also b).

A) During initialization we calculate the smallest common multiple L of all balancing weights (lb factor) l_1,...,l_n and set L_1=L/l_1,...,L_n=L/l_n. That's cheap, the algo is well known and all is exact integer operations.

B) We use a 64Bit load counter and for each request we do not only add 1, instead we add L_i whenever worker i gets chosen.

C) We always choose the eligible worker with the smallest load counter (and we can use a rotating offset like it is already implemented for traffic).

D) During maintenance we divide all load counters by some constant. Best would be a power of 2, so we can simply shift right all load counters. This way we shape the counters with a simple approximation of an exponential curve, so the influence of incrementing the counters in the past gets less important the longer away it is.

E) If we divide in D) by 2 each minute, then the counters will never exceed 2 times the maximum a counter can reach in a single minute. For most weight factors and loads this will easily fit into a 64 Bit integer. We can furthermore care about overflows by prematurely shifting right whenever the counters get to big (with very high load aging the counters earlier on demand).

I could supply a patch including documentation. I think that would be low risk, because that patch would only influence how the load counters are handled, and would not interfere with changing worker status and decision logic.

Comments?

Rainer

---------------------------------------------------------------------
To unsubscribe, e-mail: [EMAIL PROTECTED]
For additional commands, e-mail: [EMAIL PROTECTED]

Reply via email to