[freenet-dev] Beyond New Load Management

Matthew Toseland Sat, 27 Aug 2011 15:53:01 +0100

On Saturday 27 Aug 2011 15:43:53 Matthew Toseland wrote:
> After trying out New Load Management on the network and seeing rather bad 
> results, we need to reconsider load management. IMHO Old Load Management (the 
> current system) is still not an acceptable answer.
> 
> Ideal load management would:
> - PERFORMANCE: Performance is the number of requests running in parallel, 
> divided by the average time taken. This means latency should only be 
> increased if we increase the number of requests running in parallel by a 
> similar factor. We want to achieve close to the ideal for *both* successful 
> and unsuccessful requests. Our capacity for unsuccessful requests is huge, 
> but we don't know whether they are going to succeed when we send them, which 
> creates some problems.
> - LATENCY: Not increase latency significantly for realtime requests. These 
> must respond quickly, even at the expense of somewhat poorer routing.
> - REACTIVITY: React quickly to variation in available resources (e.g. some 
> requests completing quickly), but not overshoot.
> - ACCURACY: Route accurately, severely limiting misrouting, but avoid 
> excessively slow nodes.
> - INCENTIVES: Be incentives-compatible and secure: Ideally, the originator 
> should not be special.
> - DOS: Make DoS attacks hard, dependent on the number of connections you have 
> into the network (hence very hard on darknet and hopefully similar difficulty 
> to global surveillance on opennet).
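As a rough illustration of the PERFORMANCE definition above (requests running in parallel divided by average time taken), here is a minimal sketch. The function name and the numbers are made up for the example and are not measurements from the network.

```python
def performance(parallel_requests: int, avg_time_seconds: float) -> float:
    """Requests completed per second: parallel requests / average time taken."""
    if avg_time_seconds <= 0:
        raise ValueError("average time must be positive")
    return parallel_requests / avg_time_seconds


# Hypothetical numbers: doubling latency only breaks even if the number
# of parallel requests doubles as well.
baseline = performance(10, 5.0)   # 10 requests, 5 s each
queued = performance(20, 10.0)    # twice the latency, twice the parallelism
worse = performance(15, 10.0)     # twice the latency, only 1.5x parallelism
```

In this toy case `baseline` and `queued` are equal, while `worse` is lower, which is the sense in which extra latency has to be paid for by a similar factor of extra parallelism.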
> 
> NEW LOAD MANAGEMENT:
> 
> So far, NLM seems to have problems. Ian argues these are largely due to 
> queueing making a bad situation worse: queueing causes requests to take 
> longer, and the only way to counteract this is to run more requests in 
> parallel, but that ALSO causes queueing to take longer...
> 
> Performance: So far poor
> Latency: Very poor
> Reactivity: Should be reasonable, not clear
> Accuracy: Very good
> Incentives: Good, the originator is not special
> DoS: Good, thanks mainly to fair sharing (see below)
> 
> OLD LOAD MANAGEMENT:
> 
> AIMDs on the originator, driven by RejectedOverload messages, which are 
> generated when a request is rejected and passed all the way back to the 
> originator. When the RejectedOverload is originally generated, the peer in 
> question gets backed off.
> 
> Performance: Moderate
> Latency: Good
> Reactivity: Poor
> Accuracy: Poor (all the backoffs)
> Incentives: Poor (the originator is special; sending loads of requests can 
> improve performance at the cost of the network)
> DoS: Poor (nothing to stop you ignoring AIMDs, sending loads of requests and 
> causing lots of backoffs)
> 
> OLD LOAD MANAGEMENT + FAIR SHARING:
> 
> Fair sharing between peers greatly reduces our vulnerability to DoS, and 
> improves performance on nodes with relatively few peers. However, current 
> fair sharing includes an abrupt transition which can cause backoffs, and 
> makes the next item harder. This should be fixed soon.
> 
> Incentives: Better but the originator is still special
> DoS: Moderate
> 
> IMPROVED AIMD'S:
> 
> We can make the AIMD's be a true request count window rather than a rate 
> estimator. This should respond faster to variations in retrieval times (e.g. 
> getting a bunch of offered keys). We should probably not multiply it by the 
> number of peers as we do now, and we should probably consider how sensitive 
> we want the AIMD's to be (I'm not sure how we would easily calibrate that).
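To make the "request count window" idea concrete, here is a minimal sketch of an AIMD window that caps the number of requests in flight. The class name, method names and the constants (increase of 1 per window, halving on overload) are assumptions chosen for illustration, not what the node currently does.

```python
class AimdWindow:
    """Illustrative additive-increase / multiplicative-decrease window."""

    def __init__(self, initial=1.0, increase=1.0, decrease=0.5, minimum=1.0):
        self.window = initial      # allowed number of requests in flight
        self.increase = increase   # additive step per full window of successes
        self.decrease = decrease   # multiplicative factor on overload
        self.minimum = minimum     # never shrink below this

    def on_success(self):
        # Additive increase: roughly +increase after a whole window succeeds.
        self.window += self.increase / self.window

    def on_rejected_overload(self):
        # Multiplicative decrease when a RejectedOverload comes back.
        self.window = max(self.minimum, self.window * self.decrease)

    def may_start_request(self, in_flight):
        # A window limits concurrent requests directly, so requests that
        # complete quickly free up slots immediately.
        return in_flight < self.window
```

Because the limit is on requests in flight rather than on a start rate, a burst of fast completions lets new requests start at once, which is where the faster reaction would come from.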
> 
> Performance: Should be improved a bit.
> Reactivity: Definitely improved a bit.
> 
> EARLY "SLOW DOWN" MESSAGES:
> 
> Ian has proposed that we send some sort of "slow down" message when we are 
> over some load threshold but still able to accept requests. This could be 
> implemented by sending a non-local RejectedOverload (i.e. pretending we are 
> relaying it), but ideally we'd like to have two distinct messages so we can 
> tell what proportion of requests receive each. One problem is that, given 
> the time lag involved, we could get oscillations and still see too many 
> rejections.


One difficulty here is whether the threshold should be on the per-peer limit 
(fair sharing) or just on the total load. If it's on the per-peer limit, the 
limit is often going to be rather small...
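A quick back-of-the-envelope sketch of why the per-peer threshold ends up small. It assumes capacity is split equally between peers, which is a simplification of fair sharing, and the capacity, peer count and fraction are invented numbers.

```python
def slow_down_thresholds(total_capacity, peers, fraction):
    """Return (total_threshold, per_peer_threshold) in concurrent requests.

    Assumes an equal split of capacity between peers (a simplification
    made only for this example).
    """
    if peers <= 0:
        raise ValueError("need at least one peer")
    total_threshold = total_capacity * fraction
    per_peer_threshold = total_threshold / peers
    return total_threshold, per_peer_threshold


# Hypothetical node: room for 100 concurrent requests, 25 peers,
# slow-down messages sent above 50% load.
total, per_peer = slow_down_thresholds(total_capacity=100, peers=25, fraction=0.5)
```

With these numbers the total threshold is 50 requests but the per-peer threshold is only 2, so a single extra request from one peer is already a large step relative to its limit.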
> 
> Performance: Should be improved
> Accuracy: Significantly better
> DoS: Unaffected but need to figure out how to deal with peers that 
> consistently send slow-down messages.
> 
> AIMD'S ON EACH NODE INCLUDING REMOTE REQUESTS:
> 
> I propose that we could keep a rate estimation on each node, and use it not 
> only for our requests but for all requests, based on whether requests 
> complete with or without a slow-down message. This would determine an upper 
> and lower bound. Above the upper bound, we would reject requests. Above the 
> lower bound, we would send slow-down messages, but still accept requests; 
> below the lower bound, we would accept requests. This would also determine 
> when we start local requests. The main risk here is that we could get some 
> sort of feedback loop situation. It probably would need to be simulated and 
> we might need to find a better algorithm, or tune the existing one, for 
> calculating the rate. Also, as with fair sharing, there is a time lag 
> involved in telling peers to slow down.
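The accept / slow-down / reject rule described above can be sketched as follows. The names `Decision` and `decide` are hypothetical, and the bounds are simply passed in; how the rate estimate would actually produce them is the open question and is not modelled here.

```python
from enum import Enum


class Decision(Enum):
    ACCEPT = "accept"
    ACCEPT_WITH_SLOW_DOWN = "accept, but send slow-down"
    REJECT = "reject"


def decide(current_load, lower_bound, upper_bound):
    """Classify an incoming request against the two bounds."""
    if lower_bound > upper_bound:
        raise ValueError("lower bound must not exceed upper bound")
    if current_load > upper_bound:
        return Decision.REJECT
    if current_load > lower_bound:
        # Between the bounds: still accept, but ask the sender to back off.
        return Decision.ACCEPT_WITH_SLOW_DOWN
    return Decision.ACCEPT
```

For example, with a lower bound of 10 and an upper bound of 20, a load of 5 is accepted, 15 is accepted with a slow-down message, and 25 is rejected. The same check would gate when local requests are started.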
> 
> Performance: Unclear, should be similar if not better
> Reactivity: Should be reasonable, possibly better than NLM as it should be 
> able to deal with bottlenecks better
> Incentives: Good, the originator is not special
> DoS: Good, assuming we solve the problem mentioned in the previous strategy
> 
The fundamental issue with the last proposal is feedback: if we are estimating 
the capacity of the network based partly on other nodes' estimates, we have the 
potential for feedback loops which will cause collapse. There should be ways to 
prevent this from becoming catastrophic. One thing we can do is not count 
requests we are handling locally towards the limits. But we will need more than 
that.

Another issue is that we have the infrastructure for determining, reasonably 
reliably, whether a request will be accepted or not before sending it. We could 
use this without queueing.