After trying out New Load Management on the network and seeing rather bad results, we need to reconsider load management. IMHO Old Load Management (the current system) is still not an acceptable answer.
Ideal load management would:
- PERFORMANCE: Performance is the number of requests running in parallel divided by the average time each takes - in other words, throughput. This means latency should only increase if we increase the number of requests running in parallel by a similar factor. We want to get close to the ideal for *both* successful and unsuccessful requests. Our capacity for unsuccessful requests is huge, but we don't know whether a request is going to succeed when we send it, which creates some problems.
- LATENCY: Not increase latency significantly for realtime requests. These must respond quickly, even at the expense of somewhat poorer routing.
- REACTIVITY: React quickly to variation in available resources (e.g. some requests completing quickly), but not overshoot.
- ACCURACY: Route accurately, severely limiting misrouting, but avoid excessively slow nodes.
- INCENTIVES: Be incentives-compatible and secure. Ideally, the originator should not be special.
- DOS: Make DoS attacks hard, dependent on the number of connections the attacker has into the network (hence very hard on darknet, and hopefully comparable in difficulty to global surveillance on opennet).

NEW LOAD MANAGEMENT:

So far, NLM seems to have problems. Ian argues these are largely due to queueing making a bad situation worse: queueing causes requests to take longer; the only way to counteract this is to run more requests in parallel, but that ALSO causes queueing to take longer...

Performance: So far poor
Latency: Very poor
Reactivity: Should be reasonable, not clear
Accuracy: Very good
Incentives: Good, the originator is not special
DoS: Good, thanks mainly to fair sharing (see below)

OLD LOAD MANAGEMENT:

AIMDs on the originator, driven by RejectedOverloads, which are generated when a request is rejected and passed all the way back to the originator. When the RejectedOverload is originally generated, the peer in question gets backed off.
Performance: Moderate
Latency: Good
Reactivity: Poor
Accuracy: Poor (all the backoffs)
Incentives: Poor (the originator is special; sending loads of requests can improve your performance at the cost of the network)
DoS: Poor (nothing stops you ignoring AIMDs, sending loads of requests and causing lots of backoffs)

OLD LOAD MANAGEMENT + FAIR SHARING:

Fair sharing between peers greatly reduces our vulnerability to DoS, and improves performance on nodes with relatively few peers. However, current fair sharing includes an abrupt transition which can cause backoffs, and it makes the next item harder. This should be fixed soon.

Incentives: Better, but the originator is still special
DoS: Moderate

IMPROVED AIMDS:

We can make the AIMDs a true request-count window rather than a rate estimator. This should respond faster to variations in retrieval times (e.g. getting a bunch of offered keys). We should probably not multiply the window by the number of peers as we do now, and we should consider how sensitive we want the AIMDs to be (I'm not sure how we would easily calibrate that).

Performance: Should be improved a bit
Reactivity: Definitely improved a bit

EARLY "SLOW DOWN" MESSAGES:

Ian has proposed that we send some sort of "slow down" message when we are over some load threshold but still able to accept requests. This could be implemented by sending a non-local RejectedOverload (i.e. pretending we are relaying it), but ideally we'd have two distinct messages so we can tell what proportion of requests receives each. One problem is that, given the time lag involved, we could get oscillations and still see too many rejections.

Performance: Should be improved
Accuracy: Significantly better
DoS: Unaffected, but we need to figure out how to deal with peers that consistently send slow-down messages.
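For concreteness, the AIMD window behaviour discussed above - additive increase while requests complete cleanly, multiplicative decrease on a RejectedOverload or slow-down - might look roughly like this. This is a minimal sketch, not Freenet's actual code; the class name and the step/decay parameters are illustrative assumptions:

```python
class AIMDWindow:
    """Sketch of an AIMD request-count window (illustrative, not Freenet's code).

    The window is how many requests we allow in flight at once: it grows
    additively while requests succeed and is cut multiplicatively when a
    RejectedOverload (or early "slow down" message) comes back.
    """

    def __init__(self, increase=1.0, decrease=0.5, floor=1.0):
        self.window = floor       # allowed requests in flight
        self.increase = increase  # additive step on each clean completion
        self.decrease = decrease  # multiplicative cut on overload
        self.floor = floor        # never drop below one request

    def on_success(self):
        self.window += self.increase

    def on_rejected_overload(self):
        self.window = max(self.floor, self.window * self.decrease)

    def can_send(self, in_flight):
        return in_flight < self.window
```

The point of making this a request-count window rather than a rate estimator is that the window reacts on the very next completion or rejection, instead of waiting for a smoothed rate to catch up.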
AIMDS ON EACH NODE, INCLUDING REMOTE REQUESTS:

I propose that we keep a rate estimate on each node, and use it not only for our own requests but for all requests, based on whether requests complete with or without a slow-down message. This would determine an upper and a lower bound. Above the upper bound, we would reject requests. Between the two bounds, we would send slow-down messages but still accept requests; below the lower bound, we would simply accept requests. The estimate would also determine when we start local requests. The main risk here is that we could get some sort of feedback loop. It probably needs to be simulated, and we might need a better algorithm, or to tune the existing one, for calculating the rate. Also, as with fair sharing, there is a time lag involved in telling peers to slow down.

Performance: Unclear; should be similar if not better
Reactivity: Should be reasonable, possibly better than NLM, as it should be able to deal with bottlenecks better
Incentives: Good, the originator is not special
DoS: Good, assuming we solve the problem mentioned in the previous strategy
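The accept / slow-down / reject decision described above could be sketched as follows. The two thresholds and the use of an exponentially-weighted moving average as the load estimator are my assumptions for illustration - the proposal deliberately leaves the rate-calculation algorithm open:

```python
# Illustrative sketch of the per-node accept / slow-down / reject decision.
# The EWMA estimator and the threshold values are assumptions, not a spec.

ACCEPT, SLOW_DOWN, REJECT = "accept", "slow-down", "reject"

class NodeLoadEstimator:
    def __init__(self, lower=0.6, upper=0.9, alpha=0.1):
        self.lower = lower   # above this: accept, but send "slow down"
        self.upper = upper   # above this: reject outright
        self.alpha = alpha   # EWMA smoothing factor
        self.load = 0.0      # smoothed estimate of this node's load, 0..1

    def observe(self, instantaneous_load):
        # Smooth raw load samples so short spikes don't cause flapping.
        self.load = (1 - self.alpha) * self.load + self.alpha * instantaneous_load

    def decide(self):
        if self.load > self.upper:
            return REJECT
        if self.load > self.lower:
            return SLOW_DOWN
        return ACCEPT
```

Note the feedback-loop risk mentioned above: the smoothing that prevents flapping is also a time lag, so peers keep sending at the old rate for a while after we cross a bound - which is exactly why this would need simulation before deployment.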