I've only skimmed the new load management emails rather than reading them in detail. However, it seems to me that there hasn't been a deep enough theoretical analysis of the issue. I'll try to lay out here what I mean by that, and start the process. I don't have time to finish it anytime soon, but hopefully this will be useful anyway.
I see four major factors in load management, plus a couple of others, mentioned afterward, that make things more complicated. The four are: capacity of neighbors, location distribution of neighbors, incoming request distribution, and misrouting tolerance.

Obviously, not all neighbors have the same request-handling capacity. At present, Freenet acknowledges this to a limited degree via the bandwidth limit / peer limit relationship. We may not want to allow an unbounded range of capacities, because of concerns about the effects of (non-malicious) supernodes on the network, as regards things like security and fault tolerance.

Neighbor location distribution and incoming request distribution are closely connected. If the two are matched, all neighbors are sent a similar number of requests; if they're mismatched, some neighbors get a larger share of requests than others. Because peer capacity varies, a mismatch here may actually be desirable, in order to produce an outbound request pattern that matches available network capacity.

Lastly, we have misrouting tolerance. At one extreme, we route every request correctly, and the capacity of the local node is limited by its slowest neighbor. (To be more precise: the local capacity C_l becomes min({r_0, r_1, ..., r_n}), where r_i is the ratio C_i/S_i, C_i is the capacity of neighbor i, and S_i is the share of incoming requests sent to neighbor i. In other words, the local capacity may be limited not by the slowest node, but by some faster node that gets sent a disproportionately large share of the request traffic. A small numerical sketch of this follows below.)

These four factors interrelate at a fundamental theoretical level: you can't arbitrarily control all four at the same time. To use the term loosely, there are four parameters but only three degrees of freedom. So the question becomes: where do we want to exercise control, and how can we exercise it most effectively?

This analysis leads me to the conclusion that we should be doing load management by controlling who our peers are. We can't do much about the incoming request distribution (not entirely true once you include local requests; more on this later) or about peer capacity. That leaves one degree of freedom shared between misrouting and peer location distribution, so we find ourselves trading between those two. This is, imho, not a surprising conclusion, but also not one I've seen explicitly stated before. It's somewhat implicit in our current LRU peer selection scheme, but not to the degree I think we should be talking about it.

There are a couple of factors that complicate all of this: locally originated requests, locally answered requests, and FOAF routing.

Locally answered requests aren't often discussed in detail in simulations. For example, the simulators I've written all try to route to the best node on the network (shortest distance from node location to request location). However, variability in node store size makes things more complicated than that, and I'm not sure how to handle it in a clean fashion. We could try simulating it, but that's just going to be an exercise in GIGO unless we have actual data to use. (Probe requests as per issue 3568 would be good here.) However, I think we can get rather far while continuing to ignore that issue.
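To make the min(C_i/S_i) relation above concrete, here's a small sketch in Python. The capacities and shares are made-up illustrative numbers, not measurements from the network:

    def local_capacity(capacities, shares):
        # C_l = min over i of C_i / S_i: whichever neighbor saturates
        # first caps our total outbound throughput.
        return min(c / s for c, s in zip(capacities, shares))

    C = [30.0, 20.0, 5.0]  # neighbor capacities, requests/sec

    # Shares badly matched to capacity: the slow neighbor is the bottleneck.
    print(local_capacity(C, [0.5, 0.3, 0.2]))  # 25.0 (= 5 / 0.2)

    # Send the slow neighbor less traffic, and the *fastest* neighbor becomes
    # the limit, because it now receives a disproportionately large share.
    print(local_capacity(C, [0.7, 0.2, 0.1]))  # ~42.9 (= 30 / 0.7)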
Locally originated requests are trickier. Some nodes have lots, some have very few, some vary as people browse freesites, etc. I'm not really sure what to do about them; they need more careful analysis. Specifically, I see two conflicting requirements: local requests need to be taken into account when choosing peers, so that our local requests don't swamp the long links entirely; and local requests need to not be taken into account, so that they don't distort the overall network topology.

FOAF routing, combined with our bandwidth / peer count relationship, is a start at handling load by managing who our peers are. I think this is the direction further load management work should take.

I'll end with a few ideas I haven't explored in much detail yet. They might be good ones or bad ones, but I hope they're at least interesting. (Rough sketches of the two ideas below are in the postscript.)

What if each peer advertised its available outbound bandwidth to its peers? The peer selection code could then attempt to maintain a constant ratio of bandwidth to fraction of requests served across peers.

What about modifying the LRU code to correct the time since last usage by the number of FOAF locations advertised? That is, if we have peer A with 8 FOAF locations advertised and peer B with 20, and B is responding to 1.5 requests for every 1 that A responds to, then A should be treated as the better node: relative to its advertised capacity, A is doing more of the work. That is, we favor nodes which minimize ((time since last request answered) * (FOAF locations advertised)), so that a peer advertising more capacity is held to a proportionally stricter recency standard. In the example, A scores 8t to B's 20 * (t/1.5) ~ 13.3t, so A wins. (The count of FOAF locations is then being treated as a stand-in for capacity, as per the suggestion above.)

Of course, all of these have the interesting problem that they distort the topology away from the structures we have careful theoretical models for. That's bad. OTOH, I'm not entirely sure those models are directly applicable to the current network anyway, given the combination of LRU peer selection, FOAF routing, and peer counts that vary with capacity. The models also don't tell us what to do about the fact that some nodes generate far more requests than others.

Evan Daniel
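P.S. A rough sketch of the bandwidth-advertisement idea, in Python. The Peer record, the advertised_bw field, and all the numbers are hypothetical; this just shows the bookkeeping of steering requests toward whichever peer is furthest below its bandwidth-proportional share:

    from dataclasses import dataclass

    @dataclass
    class Peer:
        name: str
        advertised_bw: float  # outbound bandwidth the peer claims, bytes/sec
        requests_served: int  # requests we've recently routed to it

    def most_underloaded(peers):
        # Target share is proportional to advertised bandwidth; pick the
        # peer with the largest deficit between target and actual share.
        total_bw = sum(p.advertised_bw for p in peers)
        total_req = sum(p.requests_served for p in peers) or 1
        return max(peers, key=lambda p: p.advertised_bw / total_bw
                                        - p.requests_served / total_req)

    peers = [Peer("A", 10_000, 40), Peer("B", 50_000, 60), Peer("C", 20_000, 30)]
    print(most_underloaded(peers).name)  # "B": 62.5% of bandwidth, ~46% of requests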
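And a similar sketch of the FOAF-corrected LRU scoring, using the A/B numbers from the example above (the data layout is invented for illustration):

    # Score = (time since last answered) * (FOAF locations advertised);
    # lower is better, so the LRU drop candidate is the highest scorer.
    def drop_candidate(peers, now):
        return max(peers, key=lambda p: (now - p["last_answered"]) * p["foafs"])

    now = 1000.0
    peers = [
        {"name": "A", "foafs": 8,  "last_answered": now - 30.0},  # score 240
        {"name": "B", "foafs": 20, "last_answered": now - 20.0},  # score 400
    ]
    # B answers 1.5 requests for each of A's, so its gap is 20s vs A's 30s,
    # yet it still scores worse: A is treated as the better node.
    print(drop_candidate(peers, now)["name"])  # "B" would be dropped first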