After trying out New Load Management on the network and seeing rather bad results, we need to reconsider load management. IMHO Old Load Management (the current system) is still not an acceptable answer.
Ideal load management would:
- PERFORMANCE: Performance is the number of requests running in parallel divided by the average time each takes - in other words, throughput. This means latency should only increase if we increase the number of requests running in parallel by a similar factor. We want to get close to the ideal for *both* successful and unsuccessful requests. Our capacity for unsuccessful requests is huge, but we don't know whether a request is going to succeed when we send it, which creates some problems.
- LATENCY: Not increase latency significantly for realtime requests. These must respond quickly, even at the expense of somewhat poorer routing.
- REACTIVITY: React quickly to variation in available resources (e.g. some requests completing quickly), but not overshoot.
- ACCURACY: Route accurately, severely limiting misrouting, but avoid excessively slow nodes.
- INCENTIVES: Be incentives-compatible and secure. Ideally, the originator should not be special.
- DOS: Make DoS attacks hard, dependent on the number of connections the attacker has into the network (hence very hard on darknet, and hopefully comparable in difficulty to global surveillance on opennet).

NEW LOAD MANAGEMENT:

So far, NLM seems to have problems. Ian argues these are largely due to queueing making a bad situation worse: queueing causes requests to take longer; the only way to counteract this is to run more requests in parallel, but that ALSO causes queueing to take longer...

Performance: So far poor
Latency: Very poor
Reactivity: Should be reasonable, not clear
Accuracy: Very good
Incentives: Good, the originator is not special
DoS: Good, thanks mainly to fair sharing (see below)

OLD LOAD MANAGEMENT:

AIMDs on the originator, driven by RejectedOverloads, which are generated when a request is rejected and passed all the way back to the originator. When the RejectedOverload is originally generated, the peer in question gets backed off.
Performance: Moderate
Latency: Good
Reactivity: Poor
Accuracy: Poor (all the backoffs)
Incentives: Poor (the originator is special; sending loads of requests can improve your performance at the cost of the network)
DoS: Poor (nothing stops you ignoring AIMDs, sending loads of requests and causing lots of backoffs)

OLD LOAD MANAGEMENT + FAIR SHARING:

Fair sharing between peers greatly reduces our vulnerability to DoS, and improves performance on nodes with relatively few peers. However, current fair sharing includes an abrupt transition which can cause backoffs, and it makes the next item harder. This should be fixed soon.

Incentives: Better, but the originator is still special
DoS: Moderate

IMPROVED AIMDS:

We can make the AIMDs a true request-count window rather than a rate estimator. This should respond faster to variations in retrieval times (e.g. getting a bunch of offered keys). We should probably not multiply the window by the number of peers as we do now, and we should consider how sensitive we want the AIMDs to be (I'm not sure how we would easily calibrate that).

Performance: Should be improved a bit
Reactivity: Definitely improved a bit

EARLY "SLOW DOWN" MESSAGES:

Ian has proposed that we send some sort of "slow down" message when we are over some load threshold but still able to accept requests. This could be implemented by sending a non-local RejectedOverload (i.e. pretending we are relaying it), but ideally we'd have two distinct messages so we can tell what proportion of requests receives each. One problem is that, given the time lag involved, we could get oscillations and still see too many rejections.

Performance: Should be improved
Accuracy: Significantly better
DoS: Unaffected, but we need to figure out how to deal with peers that consistently send slow-down messages.
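For concreteness, the AIMD window behaviour discussed above - additive increase while requests complete cleanly, multiplicative decrease on a RejectedOverload or slow-down - might look roughly like this. This is a minimal sketch, not Freenet's actual code; the class name and the step/decay parameters are illustrative assumptions:

```python
class AIMDWindow:
    """Sketch of an AIMD request-count window (illustrative, not Freenet's code).

    The window is how many requests we allow in flight at once: it grows
    additively while requests succeed and is cut multiplicatively when a
    RejectedOverload (or early "slow down" message) comes back.
    """

    def __init__(self, increase=1.0, decrease=0.5, floor=1.0):
        self.window = floor       # allowed requests in flight
        self.increase = increase  # additive step on each clean completion
        self.decrease = decrease  # multiplicative cut on overload
        self.floor = floor        # never drop below one request

    def on_success(self):
        self.window += self.increase

    def on_rejected_overload(self):
        self.window = max(self.floor, self.window * self.decrease)

    def can_send(self, in_flight):
        return in_flight < self.window
```

The point of making this a request-count window rather than a rate estimator is that the window reacts on the very next completion or rejection, instead of waiting for a smoothed rate to catch up.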
AIMDS ON EACH NODE, INCLUDING REMOTE REQUESTS:

I propose that we keep a rate estimate on each node, and use it not only for our own requests but for all requests, based on whether requests complete with or without a slow-down message. This would determine an upper and a lower bound. Above the upper bound, we would reject requests. Between the two bounds, we would send slow-down messages but still accept requests; below the lower bound, we would simply accept requests. The estimate would also determine when we start local requests. The main risk here is that we could get some sort of feedback loop. It probably needs to be simulated, and we might need a better algorithm, or to tune the existing one, for calculating the rate. Also, as with fair sharing, there is a time lag involved in telling peers to slow down.

Performance: Unclear; should be similar if not better
Reactivity: Should be reasonable, possibly better than NLM, as it should be able to deal with bottlenecks better
Incentives: Good, the originator is not special
DoS: Good, assuming we solve the problem mentioned in the previous strategy
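The accept / slow-down / reject decision described above could be sketched as follows. The two thresholds and the use of an exponentially-weighted moving average as the load estimator are my assumptions for illustration - the proposal deliberately leaves the rate-calculation algorithm open:

```python
# Illustrative sketch of the per-node accept / slow-down / reject decision.
# The EWMA estimator and the threshold values are assumptions, not a spec.

ACCEPT, SLOW_DOWN, REJECT = "accept", "slow-down", "reject"

class NodeLoadEstimator:
    def __init__(self, lower=0.6, upper=0.9, alpha=0.1):
        self.lower = lower   # above this: accept, but send "slow down"
        self.upper = upper   # above this: reject outright
        self.alpha = alpha   # EWMA smoothing factor
        self.load = 0.0      # smoothed estimate of this node's load, 0..1

    def observe(self, instantaneous_load):
        # Smooth raw load samples so short spikes don't cause flapping.
        self.load = (1 - self.alpha) * self.load + self.alpha * instantaneous_load

    def decide(self):
        if self.load > self.upper:
            return REJECT
        if self.load > self.lower:
            return SLOW_DOWN
        return ACCEPT
```

Note the feedback-loop risk mentioned above: the smoothing that prevents flapping is also a time lag, so peers keep sending at the old rate for a while after we cross a bound - which is exactly why this would need simulation before deployment.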