On Wed, 09 Aug 2006 15:30:31 +0100, Matthew Toseland wrote:

> What is backoff for? It's not to prevent overloading nodes; nodes reject
> requests pre-emptively. What it is for is this:
> 
> If we don't backoff slow nodes, then these slow nodes will have to reject
> (or occasionally timeout) the requests that we do forward to them. Those
> failures will be propagated back to the request senders, who will have to
> reduce their send rate. In other words, if we don't avoid sending requests
> to the slowest nodes, the requests we do send to them will be rejected,
> and those rejections will cause a universal slowdown in request rates.

Solution: make the send rate a function of the key in question. Requests
to the areas of keyspace that are occupied by slow nodes *should* come at
a slower rate, while requests to the areas with fast nodes *should* go at
full speed.

It must be a function of the key, not of the node, since the node we
forward to will itself forward based on the key; two requests to the same
node will therefore likely differ in speed depending on the key.

> Thus, I don't think that the current mechanism is sensible. Nodes can be
> expected to take care of themselves, now that we have pre-emptive
> rejection. I don't think the ethernet analogy works 100%; we don't really
> have a shared medium.
> 
> What we are looking at here is simply a way to ensure that requests are
> not forwarded to nodes who will inevitably reject them and thus cause an
> overall slowdown at the requester end. This is completely different to the
> case in 0.5, where requesters never slowed down.
> 
> So what am I proposing?
> 
> For each node, we track the proportion of requests which are accepted. We
> calculate a median. For any node above the median, all requests are
> forwarded to it: if most of our nodes are slow, that's not our problem,
> it's our peers who want to forward requests through us who have to deal
> with it (by not sending requests to us). For nodes below the median, we
> artificially narrow their specializations:
> 
> While routing:
> 
> public double distance(PeerNode node, double target) {
>     double distance;
>     if(node.acceptedFrac >= medianAcceptedFrac) {
>         distance = rawDistance(node.location, target);
>     } else {
>         distance = rawDistance(node.location, target) *
>                 medianAcceptedFrac / node.acceptedFrac;
>     }
>     return distance;
> }
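For reference, the snippet above leans on a `medianAcceptedFrac` that
would presumably be recomputed over all peers as their accept statistics
change. A sketch of that computation (the class and method names here are
hypothetical; only `acceptedFrac` comes from the proposal above):

```java
import java.util.Arrays;
import java.util.List;

public class MedianFrac {
    // Median of the peers' accepted-request fractions. For an even number
    // of peers, average the two middle values.
    public static double medianAcceptedFrac(List<Double> acceptedFracs) {
        if (acceptedFracs.isEmpty())
            throw new IllegalArgumentException("no peers");
        double[] fracs = acceptedFracs.stream()
                .mapToDouble(Double::doubleValue).toArray();
        Arrays.sort(fracs);
        int n = fracs.length;
        return (n % 2 == 1)
                ? fracs[n / 2]
                : (fracs[n / 2 - 1] + fracs[n / 2]) / 2.0;
    }
}
```

Note that with this definition roughly half the peers always fall below
the median and get their distances inflated, no matter how fast they are
in absolute terms.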

So, basically, if we can't forward to the best node, we forward to some
other node? That causes the request to travel the full HTL, increasing the
total load on the network and making it more likely that any given request
can't be forwarded to the best node, leading to a vicious circle of
rising load.

I have said this before and I'm saying it again: routing and load
balancing are mutually exclusive goals. You *can't* have both in the same
network at the same time. Any attempt to do so will simply lead to an
increasingly complex mess.

Oh, and making this specific modification will also make the code far more
confusing to read, since distance() doesn't actually return the distance
but rather some number that may or may not be the actual distance.

> Now, to some degree, this is alchemy, in the sense that it hasn't been
> simulated. However it will be some time before the new rate limiting
> mechanism is available, and it appears a solid argument to me: If all our
> peers are slow, that DOES NOT mean we shouldn't send requests to them.
> Because we have load propagation, it is entirely legitimate to send
> requests ANYWAY, and let upstream nodes deal with the slowness - either
> the entire network is slow, in which case we reduce all the send rates, or
> this particular node is slow, in which case we reduce the number of
> requests sent to it! Note also that slavishly following backoffs reduces
> our anonymity quite noticeably.
> 
> Comments?

You are trying to accomplish two mutually exclusive goals - routing and
load balancing - at the same time, and will inevitably end up failing
miserably, since it is logically impossible to simultaneously route to the
"best" and "least loaded" node, unless they happen to be the same - in
which case normal routing would have accomplished the same.
