Hi folks,

After seeing CPU-overloaded leaf servers become an ever-increasing
problem, I've spent some brain cycles thinking about a possible solution,
and this is what I came up with. Flame and/or praise, have at it.

Sengaia / Arjen

*****

Dynamic Y: lines

This is a description of a possible solution to the primary cause of
sendq on the network at the moment: leaf servers having so many fds open
that their CPUs do not have enough cycles available to chew through
sendq. I see this occurring on a daily basis at the moment.

The solution is to lower the number of open fds on a leaf server so that
it gets a chance to clear sendq. For this to happen, a server first
needs to detect that it is suffering from sendq.

Note that all numbers, percentages etc. mentioned are only there to
illustrate the concept and can obviously be changed.

Let's say for argument's sake a server considers itself lagged if:

- it is carrying more than 50% of MAXCONNECTIONS clients.
- the last two server-server pings each took more than 1 minute to
  return.
And possibly also:
- at least one server on the network is currently net.bursting (or was
  within the last 5 minutes).

But I think that last one is not needed.
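
To make the check concrete, here is a rough sketch of the two main
conditions. It is only an illustration: ServerState and is_lagged are
made-up names, not existing server code, and I am assuming both
conditions must hold at the same time. The thresholds are the example
numbers from above.

```python
from dataclasses import dataclass, field

# Illustrative thresholds taken from the example numbers in this mail.
LOAD_THRESHOLD_PCT = 50        # percent of MAXCONNECTIONS
PING_THRESHOLD_SECS = 60.0     # a ping slower than this counts as lagged


@dataclass
class ServerState:
    """Hypothetical snapshot of one leaf server (not a real server struct)."""
    max_connections: int
    client_count: int
    # Round-trip times of server-server pings in seconds, oldest first.
    ping_rtts: list = field(default_factory=list)


def is_lagged(state):
    """Return True if the server is heavily loaded AND its last two pings were slow."""
    heavily_loaded = (
        state.client_count * 100 > LOAD_THRESHOLD_PCT * state.max_connections
    )
    last_two = state.ping_rtts[-2:]
    pings_slow = len(last_two) == 2 and all(
        rtt > PING_THRESHOLD_SECS for rtt in last_two
    )
    return heavily_loaded and pings_slow
```

Requiring two slow pings in a row rather than one should keep a single
delayed ping from triggering the mechanism.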

Once the server detects that it is lagged, it should look at its Y:
lines, pick the one with the highest number of currently-connected
clients in it, and lower the maximum connections for that class by 25%.
This will result in a lower client load and hence allow the server to
clear its sendq.

If this does not solve the lagged condition within X minutes, the server
could take more aggressive measures (say a 50% reduction). You can
basically make this algorithm as simple or as complicated as you want, as
long as the concept stays the same (I'd go for simple :).
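
The simple version of that escalation could look something like the
sketch below. reduction_pct is a hypothetical helper, X is passed in as
x_minutes because no value is fixed here, and 25 and 50 are the example
percentages from above.

```python
def reduction_pct(lagged_for_secs, x_minutes, first_pct=25, aggressive_pct=50):
    """Pick how hard to cut the class limit.

    Use the first reduction while the lag is recent, and switch to the
    more aggressive one once the server has been lagged for X minutes.
    """
    if lagged_for_secs >= x_minutes * 60:
        return aggressive_pct
    return first_pct
```

A more complicated version could add further steps, but two levels are
enough to show the idea.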

Once the server detects that it is no longer lagged, it can raise the
connection limit(s) for the affected class(es) back to their original
value, possibly in increments.
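
Raising the limit in increments could be as simple as the following
sketch. restore_step is a hypothetical helper and the step size is an
assumption; no increment is specified here.

```python
def restore_step(current_max, configured_max, step):
    """Raise the enforced limit by one increment, never past the configured value."""
    return min(configured_max, current_max + step)
```

Called once per interval while the server stays un-lagged, this walks a
limit of 300 back up to a configured 400 in steps (for example 350, then
400 with a step of 50) instead of letting all the waiting clients in at
once.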

This is obviously not a 100% solution, and it will mean that in times of
many splits it will be hard for people to get online. However, it will
also greatly reduce the amount of time the network is lagged, which in
the end is the greater good.

