I've only looked at the new load management emails somewhat, not in
detail. However, it seems to me that there hasn't been a deep enough
theoretical analysis of the issue. I'll try to lay out what I mean by
that here, and start the process. I don't have time to finish it
anytime soon, but hopefully this will be useful anyway.

I see four major factors in load management, plus a few complicating
factors I'll come back to afterwards. The four are: capacity of
neighbors, location distribution of neighbors, incoming request
distribution, and request misrouting tolerance.

Obviously not all neighbors have the same request-handling capacity.
At present, Freenet acknowledges this to a limited degree via the
bandwidth limit / peer limit relationship. It's possible we don't want
to allow an unbounded range of capacities, out of concern about the
effects of (non-malicious) supernodes on the network with regard to
things like security and fault tolerance.

Neighbor location distribution and incoming request distribution are
closely connected. If the two are matched, then all neighbors will be
sent a similar number of requests. If they're mismatched, some
neighbors will get a larger share of requests than others. Variable
peer capacity means that a mismatch here may be desirable, in order to
produce an outbound request pattern that matches available network
capacity.
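
As a rough illustration (assuming the usual circular [0,1) location
space and purely greedy closest-location routing; the neighbor
locations and key distribution below are made up), the share of
requests each neighbor sees falls directly out of where the neighbors
sit relative to the incoming keys:

    import random

    def circ_dist(a, b):
        # Distance on the circular [0, 1) location space.
        d = abs(a - b)
        return min(d, 1.0 - d)

    # Hypothetical neighbor locations; two clustered, two isolated.
    neighbors = [0.05, 0.10, 0.50, 0.90]

    # Uniform incoming request keys; a skewed key distribution would
    # skew the shares accordingly.
    random.seed(0)
    keys = [random.random() for _ in range(100000)]

    counts = [0] * len(neighbors)
    for k in keys:
        closest = min(range(len(neighbors)),
                      key=lambda i: circ_dist(neighbors[i], k))
        counts[closest] += 1

    shares = [c / len(keys) for c in counts]
    print(shares)  # roughly [0.10, 0.22, 0.40, 0.28]: the clustered
                   # neighbors split a small share, the isolated ones
                   # each see a large one

Even with uniformly distributed keys, neighbor locations alone are
enough to produce very uneven shares.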

Lastly, we have misrouting tolerance. At one extreme, we route every
request correctly, and the capacity of the local node is limited by
the slowest neighbor node. (To be more precise: the local capacity C_l
becomes min(r_0, r_1, ..., r_n), where r_i = C_i / S_i, C_i is the
capacity of neighbor i, and S_i is the share of incoming requests sent
to neighbor i. In other words, the local capacity may be limited not
by the slowest node, but by a faster node that is sent a
disproportionately large share of the request traffic.)
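
To make the bottleneck concrete, here is a tiny sketch of that
calculation (the capacities and shares are invented for illustration):

    # capacity[i]: requests/s neighbor i can handle
    # share[i]:    fraction of our outgoing requests routed to neighbor i
    capacity = [10.0, 50.0, 30.0]
    share    = [0.10, 0.60, 0.30]

    # With no misrouting allowed, local throughput is capped by the
    # neighbor with the smallest capacity/share ratio.
    ratios = [c / s for c, s in zip(capacity, share)]
    print(ratios)       # ~[100.0, 83.3, 100.0]
    print(min(ratios))  # ~83.3 requests/s

Here the slowest neighbor (10 requests/s) is not the limit; the 50
requests/s neighbor is, because it receives 60% of the traffic.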

These four interrelate, at a fundamental theoretical level. Namely,
you can't arbitrarily control all four at the same time. To use the
term loosely, there are four parameters, but only three degrees of
freedom. So, the question becomes, where do we want to exercise
control, and how can we do so most effectively?

This analysis seems to me to lead to a conclusion that we should be
trying to do load management by controlling who our peers are. We
can't do much about incoming request distribution (not entirely true
when you include local requests, more on this later) or peer capacity.
That leaves one degree of freedom between misrouting and peer location
distribution, so we find we're trading between those two.

This is, imho, not a surprising conclusion, but also not one I've seen
explicitly stated before. It's somewhat implicit in our current LRU
peer selection scheme, but it isn't discussed to the degree I think it
should be.

There are a few factors that complicate all of this: locally
originated requests, locally answered requests, and FOAF routing.
Locally answered requests aren't often discussed in detail in
simulations. For example, the simulators I've written all try to route
to the best node on the network (the node whose location is closest to
the request's location). However, variability in node store sizes
makes things more complicated than that, and I'm not sure how to
handle it in a nice clean fashion. We could try simulating it, but
that's just going to be an exercise in GIGO unless we have actual data
to use. (Probe requests as per issue 3568 would be good here.) Still,
I think we can get rather far while continuing to ignore that issue.

Locally originated requests are trickier. Some nodes have lots, some
have very few, some vary when people browse freesites, etc. I'm not
really sure what to do about them. They need more careful analysis.
Specifically, I see two conflicting requirements: local requests need
to be taken into account when choosing peers, so that our local
requests don't swamp the long links entirely; and local requests need
to not be taken into account, so that they don't distort the overall
network topology.

FOAF routing, combined with our bandwidth-peer count relationship, is
a start at handling load by managing who our peers are. I think this
is the direction further load management work should take.

I'll end with a few ideas I haven't explored in much detail yet. They
might be good ones or bad ones, but I hope they're at least
interesting.

What if each peer advertised its available outbound bandwidth to its
peers? The peer selection code could then attempt to keep the ratio of
advertised bandwidth to fraction of requests served roughly constant
across peers.
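
A minimal sketch of what that might look like, assuming we simply
trust each peer's advertised figure (peer names and numbers here are
hypothetical):

    # Advertised outbound bandwidth per peer, in KiB/s (hypothetical).
    advertised_bw = {"A": 20.0, "B": 50.0, "C": 10.0}

    # Fraction of our outgoing requests each peer currently serves.
    observed_share = {"A": 0.40, "B": 0.40, "C": 0.20}

    # A constant bandwidth-per-share ratio means each peer's target
    # share is proportional to its advertised bandwidth.
    total_bw = sum(advertised_bw.values())
    target_share = {p: bw / total_bw for p, bw in advertised_bw.items()}

    for p in sorted(advertised_bw):
        print(p, target_share[p], observed_share[p] - target_share[p])
    # A: target 0.250, about +0.15 over its share
    # B: target 0.625, about -0.22 under its share
    # C: target 0.125, about +0.08 over its share

Peer selection (or a routing bias) could then lean towards peers
running under their target share and away from peers running over it.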

What about modifying the LRU code to correct the time since last usage
by a factor of the number of FOAF locations advertised? That is, if we
have peer A with 8 FOAF locations advertised and peer B with 20, and B
is responding to only 1.5 requests for every 1 that A responds to, then
A should be treated as the better node, because B would need to answer
about 2.5 requests for A's every 1 to justify its 2.5x larger
advertised capacity. In other words, we favor nodes which minimize
((time since last request answered) * (FOAF locations advertised)).
(The count of FOAF locations is then being treated as a standin for
capacity, as per the suggestion above.)
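
A sketch of that comparison (hypothetical numbers, not the actual
Freenet LRU peer selection code), scaling each peer's raw LRU age by
its advertised FOAF location count before ranking:

    # (seconds since last answered request, FOAF locations advertised).
    # B answers ~1.5x as often as A, so its raw age is ~1/1.5 of A's.
    peers = {
        "A": (120.0, 8),
        "B": (80.0, 20),
    }

    def corrected_age(name):
        age, foaf_count = peers[name]
        # A peer advertising 2.5x the FOAF locations has to answer
        # ~2.5x as many requests to look equally good.
        return age * foaf_count

    print({p: corrected_age(p) for p in peers})  # A: 960, B: 1600
    print(sorted(peers, key=corrected_age))      # ['A', 'B']: A is
                                                 # favored, B would be
                                                 # the first to drop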

Of course, all of these have the interesting problem that they distort
the topology away from structures we have careful theoretical models
for. That's bad. OTOH, I'm not entirely sure those models are actually
directly applicable to the current network, given the combination of
LRU peer selection, FOAF routing, and peer counts that vary with
capacity. The models also don't tell us what to do about the fact that
some nodes generate a lot more requests than others.

Evan Daniel
