On Mon, Aug 29, 2011 at 12:37 PM, Matthew Toseland <toad at amphibian.dyndns.org> wrote:
> That is because we do not have the time or funding to empirically test
> hypotheses. We don't gather enough data, we don't have a huge testnet to try
> stuff on over extended periods, and so on. Most software development has
> deadlines and resource limitations, but most software development doesn't
> NEED to be empirical, at least not very often! See my other mail about this.

I don't think the problem is a lack of time or funding. The problem is that
we come up with solutions that are too complicated to analyze or fix when
they don't work. Throwing more time and money at that might help, but that
is addressing the symptom, not the cause. The cause is complexity, which
just grows and grows as we try to fix problems we don't understand by
layering on more complexity.

> Misrouting is unacceptable, in general. Extremely overloaded or extremely
> low capacity nodes may be routed around. We might even allow some bounded
> amount of misrouting in the more general case (e.g. go to either of the top
> two peers for the key). But in general, transforming load into misrouting
> (or into reduced HTL, or any other bogus escape valve) is a bad idea. We
> need to reduce the incoming load.

Right, the same is true of queueing. If nodes are forced to do things to
deal with overloading that make the problem worse, then the load balancing
algorithm has failed. Its job is to prevent that from happening.

> > What if all nodes told other nodes that they were overloaded because all
> > their peers are overloaded. Such a situation would basically cause the
> > entire network to think it was overloaded, even though nobody actually
> > was! It becomes a bit like this cartoon: http://flic.kr/p/5npfm2 How can
> > this be avoided?
>
> Exactly. There are two possible approaches:
> 1. Nodes control the rate at which *local* requests are initiated, based on
> how many slow-down signals they get.
> (This ensures that such gridlock or feedback loop problems can't happen)

Ah, so slow-downs only ever affect local request initiation? The problem
there is that if a peer is overloaded, only those peers that are directly
connected to it will slow down their rate of requests. Peers elsewhere on
the network will continue to fire requests at it, and there will be no way
for them to know they need to slow down :-/

Unless... what if the DataReply message (or whatever it's called these days)
contained a slow-down request, and it were respected by every peer along
that path according to AIMD? The problem is that there would be a very weak
link between the rate of requests and the rate of slow-down messages. It
would be like having 10,000 individual heaters with thermostats all trying
to regulate the temperature of a gigantic space. Each individual
thermostat/heater would have an almost imperceptible impact. Although
perhaps that would work...?

> Fair sharing can limit this. Limiting backoff so that we only misroute when
> nodes are severely overloaded can also help. We could queue (provided this
> happens only rarely and briefly), allow bounded misrouting (e.g. route to
> one of the top two routes), or terminate requests (thus freeing up
> resources).

I think "fair sharing" is one of those complexities that we need to go back
to the drawing board on. It's clear that even now we have hypotheses about
how fair sharing could be causing problems. It's too complicated to really
reason about, or to anticipate all of its implications.

> - The easiest way to implement #1 is with AIMDs. We can keep them, but
> send slow-down messages earlier.
> - We could keep NLM, i.e. queue for limited times to smooth out delays.

Doesn't that violate the principle that the load balancing stuff shouldn't
do anything that makes the problem worse? Queueing does exactly that.

> Currently we RNF after a (longish) period waiting for a node to route to.
> We would send a slow-down message when we have queued for more than a
> certain period. This means we can have a fixed, limited amount of
> misrouting, we can keep NLM's clear benefits for routing accuracy (as
> visible in success rates), and we can ensure that the input load is low
> enough to avoid severe slowdown due to queueing for a long time.

No, I think we need to avoid queueing except in case of emergency. Queueing
only makes things worse by tying up more resources for longer.

Ian.

--
Ian Clarke
Founder, The Freenet Project
Email: ian at freenetproject.org
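For reference, a minimal sketch of the AIMD idea behind approach #1 (a node throttling its *local* request rate in response to slow-down signals). This is illustrative Python, not Freenet's actual implementation; the class name and parameters are hypothetical:

```python
class AIMDRequestLimiter:
    """Additive-increase / multiplicative-decrease control of the rate
    at which local requests are initiated.

    Hypothetical sketch: Freenet's real window/backoff logic differs.
    """

    def __init__(self, increase=1.0, decrease=0.5, floor=1.0):
        self.window = floor       # allowed concurrent local requests
        self.increase = increase  # additive step when things go well
        self.decrease = decrease  # multiplicative factor on slow-down
        self.floor = floor        # never stall completely

    def on_success(self):
        # No slow-down signal: probe for spare capacity, additively.
        self.window += self.increase

    def on_slow_down(self):
        # A peer signalled overload: back off multiplicatively, so the
        # aggregate input load drops quickly instead of building into
        # the gridlock/feedback-loop scenario described above.
        self.window = max(self.floor, self.window * self.decrease)
```

Because only the originating node adjusts `window`, slow-down signals cannot cascade into the "everyone thinks they're overloaded" feedback loop; the open question in the thread is whether signals from distant overloaded peers ever reach the originator at all.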