Ian Clarke (ian at locut.us) wrote:

> Using VMStat is a cop-out.  If we can't find a correlation between any 
> of the intuitive measurable factors (such as threads active or network 
> load) then it suggests that something is wrong - and we should figure 
> out why.

There are two bottlenecks which affect a node's performance: CPU
and network bandwidth.  (Sure, there's also memory, disk I/O
throughput and latency, and possibly others, but for the purposes
of this discussion we're only concerned with the factors that
affect the majority of nodes/users.  To the best of my knowledge,
these are CPU and network.)

Consider case 1: a Pentium 120 running Fred in a university dorm that
hasn't been firewalled yet.  It's probably on something like a T1
or a T3.

Now consider case 2: a Duron 1.3 running Fred on ADSL with a 256
kbps upstream.

Each of these machines is capable of handling a certain number of
requests per hour.  In case 1, the network is never going to be
saturated by Fred alone; the CPU will be the bottleneck.  In case
2, the CPU may be overwhelmed from time to time (e.g. FEC decoding)
but in general the network will be the bottleneck.  (I could have
given a "case 2" with even more bias in favor of the network, but
my case 2 is a real machine.)

> What are we going to do with this load measurement anyway?  We will 
> reduce the number of concurrent requests - yet the whole rationale for 
> this is that the number of concurrent requests doesn't directly affect 
> load - so the whole rationale for this falls apart.

No, the problem is that the number of requests is a useful measurement,
but only if we know how many requests the node is *capable* of
handling.  Some nodes can handle 1000 requests per hour and some can
handle 100,000 requests per hour.

There's no way to know a node's working capacity in advance, so the
next best thing is to measure the load itself.  You can sample the
CPU load (with vmstat or other per-platform tools), and if a
threshold is exceeded, refuse requests until the load drops back to
an acceptable level.  Unix daemons have been doing this for two
decades now (I'm thinking primarily of sendmail, which refuses
connections when the load average climbs too high).
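
To make that concrete, here is a rough sketch (in Java, since that's
what Fred is written in) of how a node might sample the CPU idle
percentage.  None of this is Fred code; the class and method names
are invented, and it locates the "id" column by header name because
vmstat's column layout varies between platforms:

    import java.io.BufferedReader;
    import java.io.InputStreamReader;

    // Illustrative sketch only: run "vmstat 1 2" and read the "id"
    // (CPU idle) column from the last sample line it prints.
    public class CpuLoadProbe {

        // Returns the idle CPU percentage, or -1 if vmstat can't be
        // run or its output can't be parsed.
        public static int idlePercent() {
            try {
                Process p = Runtime.getRuntime().exec(
                    new String[] { "vmstat", "1", "2" });
                BufferedReader in = new BufferedReader(
                    new InputStreamReader(p.getInputStream()));
                int idleCol = -1;
                String lastSample = null;
                String line;
                while ((line = in.readLine()) != null) {
                    String[] fields = line.trim().split("\\s+");
                    if (idleCol < 0) {
                        // Still looking for the header row that
                        // names the columns.
                        for (int i = 0; i < fields.length; i++)
                            if (fields[i].equals("id"))
                                idleCol = i;
                    } else if (fields.length > idleCol) {
                        lastSample = line;  // most recent data row
                    }
                }
                in.close();
                if (idleCol < 0 || lastSample == null)
                    return -1;
                return Integer.parseInt(
                    lastSample.trim().split("\\s+")[idleCol]);
            } catch (Exception e) {
                return -1;
            }
        }
    }

A node could poll something like this every few seconds and start
refusing requests when the idle percentage falls below a threshold,
resuming once it recovers.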

This leads us to the other variable: network bandwidth.  The only
reasonable measure of this that I'm aware of is the "Bytes waiting
to be sent" field in the OCM page (/servlet/nodeinfo/networking/ocm).
If that value becomes sufficiently large, then once again, you can
refuse requests until the outgoing queue is a bit leaner.
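
Again, only a sketch: I don't know what the OCM exposes internally,
so the interface and accessor below are hypothetical stand-ins for
whatever actually backs that figure on the networking page, and the
threshold is invented:

    // Hypothetical stand-in for whatever object backs the "Bytes
    // waiting to be sent" figure; Fred's real class and accessor
    // are almost certainly named differently.
    interface OutboundQueueInfo {
        long getBytesWaitingToBeSent();
    }

    public class NetworkLoadCheck {

        // Invented threshold: refuse requests once roughly 1 MB is
        // queued to be sent.  In practice this would need tuning,
        // or a config option.
        private static final long MAX_QUEUED_BYTES = 1024L * 1024L;

        public static boolean networkOverloaded(OutboundQueueInfo q) {
            return q.getBytesWaitingToBeSent() > MAX_QUEUED_BYTES;
        }
    }

A fixed byte count is the simplest possible rule; something smarter
(e.g. scaling the threshold to whatever output bandwidth limit the
node is configured with) would probably be needed in practice.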

> The solution to this problem is to do profiling to understand why some 
> things affect the load more than others, and fortunately some people 
> have already started work in this area.

No, profiling is a solution to a *different* problem, namely: how
do we make Fred more efficient?  That's a long-term problem, and
certainly an important one, but it's separate from what we're talking
about right now.

Matthew's vmstat proposal is a partial solution to the immediate
problem, which is: why are some nodes overloaded and others
underutilized?  My proposal (building on Matthew's) to address this
problem is:

 1) Use vmstat or other per-platform tools to determine the CPU load,
    and throttle back (refuse requests, etc.) if it's too high.

 2) Use the "Bytes waiting to be sent" field to determine the network
    load, and throttle back (refuse requests, etc.) if it's too high.
    (A rough sketch of how both checks might fit together follows.)
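
To tie the two together, here is how they might plug into the point
where a node decides whether to take on an incoming request.  Both
thresholds are invented, and CpuLoadProbe is the sketch from earlier
in this message:

    // Illustrative sketch only: gate incoming requests on both the
    // CPU check (point 1) and the outgoing-queue check (point 2).
    public class LoadThrottle {

        private static final int MIN_IDLE_CPU_PERCENT = 10;
        private static final long MAX_QUEUED_BYTES = 1024L * 1024L;

        public static boolean shouldAcceptRequest(long bytesWaiting) {
            int idle = CpuLoadProbe.idlePercent();
            if (idle >= 0 && idle < MIN_IDLE_CPU_PERCENT)
                return false;  // CPU saturated: refuse until load drops
            if (bytesWaiting > MAX_QUEUED_BYTES)
                return false;  // send queue too deep: refuse until it drains
            return true;
        }
    }

Whether "refuse" means rejecting the query outright or something
gentler is a separate question; the point is just that both checks
feed one yes/no decision.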

I'm not as familiar with the code as you and Matthew are.  I'm
willing to accept that my proposal might have flaws, even deadly
ones which would make it totally unsuitable.  But so far I haven't
seen an acceptably powerful rebuttal of our basic position.

-- 
Greg Wooledge                  |   "Truth belongs to everybody."
greg at wooledge.org              |    - The Red Hot Chili Peppers
http://wooledge.org/~greg/     |