For some time now I have been working on new load management. I am not 
convinced it is feasible, based both on theory and on experimentation with the 
implemented code. (Fortunately much of the code is reusable either way.)

Basically, the existing load management system relies on nodes rejecting 
requests when they think they can't be answered in a reasonable time, and the 
original sender calculates a rate for sending requests in a similar fashion to 
TCP, based on rejections/timeouts and the time taken for a request. (It 
calculates a window and then adjusts it, but it doesn't actually use the 
window directly; it converts it into a rate. This is similar to how we used 
to deal with link-level UDP congestion control.) When a request is rejected, 
the peer becomes "backed off" for a period, which increases exponentially on 
subsequent rejections. While a node is backed off we avoid routing to it 
unless all other peers are also backed off.
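
To make this concrete, here is a rough sketch (not the actual node code; 
class names and constants are invented) of the sort of controller described 
above: the window is adjusted AIMD-style on completions and rejections, but 
it is only ever used to derive a send rate:

    // Hypothetical sketch of the AIMD-style controller described above.
    class RequestRateEstimator {
        private double windowSize = 10.0;            // "congestion window", in requests
        private double smoothedRoundTripMs = 1000.0;

        synchronized void onRequestCompleted(long roundTripMs) {
            // Additive increase on success, as in TCP congestion avoidance.
            windowSize += 1.0 / windowSize;
            smoothedRoundTripMs = 0.9 * smoothedRoundTripMs + 0.1 * roundTripMs;
        }

        synchronized void onRejectionOrTimeout() {
            // Multiplicative decrease on a rejection or timeout.
            windowSize = Math.max(1.0, windowSize / 2.0);
        }

        // The window is not used directly; it is converted into a send rate.
        synchronized double requestsPerSecond() {
            return windowSize / (smoothedRoundTripMs / 1000.0);
        }
    }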

This has several problems:
- It adapts to the local network capacity very badly.
- It tends to, and in fact relies on, misrouting, that is, routing to the wrong 
peer.
- It may leak information about who is the request originator.
- Denial of service attacks are possible by simply ignoring the timeouts and 
rejections and sending requests as fast as possible. This is also a perverse 
incentive for a short-sighted client. The result, although it might be 
contained to some degree by the fair sharing code, would be much more backoff 
on the network, more misrouting, and other nodes reducing their send rate 
while having less success with the requests they do send.

For some years people have talked about making something better. Notable 
influences:
- Discussions mostly on Frost with various very insightful people about token 
passing and similar schemes. The eventual consensus was that we should just 
route stuff and rely on some sort of hop-by-hop flow control.
- The CCN paper, 
http://conferences.sigcomm.org/co-next/2009/papers/Jacobson.pdf . This is well 
worth reading, and proposes a content distribution and caching network 
operating at the same level as IP. It states in section 3.1 that because each 
packet of data is a reply to a request, flow balance is maintained in the 
network automatically, and there is no need for the client to dynamically 
adjust a window size according to packet loss. It does not clearly state how 
clients should decide how many requests to send, but implies that this is 
limited mainly by their own capacity. It does not queue requests, and does drop 
them when necessary.
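
As a toy illustration of the flow-balance point (this is just my reading of 
the paper, not code from it): a node only passes data back against a pending 
request, and each data packet consumes exactly one pending request, so the 
amount of data in flight towards a requester is bounded by what it has asked 
for.

    // Toy model of CCN-style flow balance: data is only delivered against a
    // pending interest, and each data packet consumes exactly one interest.
    import java.util.HashMap;
    import java.util.Map;

    class PendingInterestTable {
        private final Map<String, Integer> pending = new HashMap<>();

        void onInterestForwarded(String name) {
            pending.merge(name, 1, Integer::sum);
        }

        // Returns true if the data may be passed back; false means discard it.
        boolean onDataArrived(String name) {
            Integer count = pending.get(name);
            if (count == null) return false;        // unsolicited data: drop
            if (count == 1) pending.remove(name);
            else pending.put(name, count - 1);
            return true;                            // one interest consumed per data packet
        }
    }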

The current scheme, known as "new load management", which is mostly 
implemented, but configurable, has the following properties:
- Misrouting is explicitly limited. A request can go to a limited group of 2-3 
peers. These are added in specific cases; for instance, if the first choice has 
dramatically less capacity than its peers we add another peer (but keep the 
first one). (A rough sketch of this selection follows the list.)
- The request will wait as long as is necessary to go to a reasonable peer. 
I.e. queueing.
- We share information on our current load status so that peers can determine 
whether their requests are likely to be accepted.
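
As promised above, a rough sketch of the peer-selection side (names and 
thresholds are hypothetical, and the real code is more involved): start from 
the best-routed peer, and add a second or third choice only in specific 
cases, such as the first choice having dramatically less capacity than the 
alternative:

    // Hypothetical sketch of routing with explicitly limited misrouting.
    import java.util.ArrayList;
    import java.util.List;

    class LimitedMisroutingRouter {
        static final int MAX_CANDIDATES = 3;                   // at most 2-3 peers per request
        static final double CAPACITY_RATIO_THRESHOLD = 0.25;   // invented value

        interface Peer {
            double advertisedCapacity();                       // from the shared load status
        }

        // Choose the small set of peers this request is allowed to go to.
        List<Peer> chooseCandidates(List<Peer> peersByRoutingDistance) {
            List<Peer> candidates = new ArrayList<>();
            for (Peer p : peersByRoutingDistance) {
                if (candidates.isEmpty()) {
                    candidates.add(p);                         // always keep the best-routed choice
                } else if (candidates.get(0).advertisedCapacity()
                        < CAPACITY_RATIO_THRESHOLD * p.advertisedCapacity()) {
                    candidates.add(p);                         // first choice has far less capacity
                }
                if (candidates.size() >= MAX_CANDIDATES) break;
            }
            return candidates;   // the request then queues until one of these can accept it
        }
    }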

In practice, on the testnet, which has 10 or so nodes and relatively low 
traffic (I have a bunch of old inserts queued on my node), the time it takes 
for a request to find a node to route to is unreasonably large, and appears to 
be increasing. Initially I saw largish spikes (up to a minute or so), but low 
average queueing times (hundreds of milliseconds). However, after running it 
for a day or thereabouts, delay times now average 6 and 27 seconds and appear 
to be rising.

This might be because the scheme is fundamentally broken. I tried to figure out 
what the likely queueing time would be mathematically and the maths wouldn't 
work out. It is possible that, because on each hop a request can be held up 
waiting for other requests, the expected delay inevitably escalates to infinity.
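
One back-of-envelope way to see how this could happen (my own rough 
reasoning, not a worked-out model): suppose the basic service time per hop is 
s, and a fraction p of a node's capacity is occupied by requests that are 
themselves queued waiting on the next hop. Then the expected wait W at a hop 
is roughly

    W \approx s + p W  \quad\Longrightarrow\quad  W \approx \frac{s}{1 - p}

which blows up as p approaches 1. Worse, queueing itself pushes p up, because 
a queued request keeps holding a slot at the previous hop, so the system may 
drive itself towards p = 1.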

Another possibility however is that queueing causes network-level deadlocks or 
loops. On a simple circular network this is fairly easy to demonstrate:

A - B - C - D - E (- A)

Each node has a capacity of 1 request.

Request 1 starts at A and moves towards E. It stops at D.
Request 2 starts at D and moves towards B. It stops at A.
Request 1 is now waiting for request 2, and request 2 is waiting for request 1.
Oops!
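
To make the failure mode concrete: if we model "request X is queued waiting 
on capacity held by request Y" as an edge in a wait-for graph, the example 
above is a two-node cycle, and a standard cycle check would find it. A 
minimal, hypothetical sketch:

    // Hypothetical wait-for graph for queued requests; a cycle means deadlock.
    import java.util.*;

    class WaitForGraph {
        private final Map<String, Set<String>> waitsFor = new HashMap<>();

        void addWait(String request, String blockedOnRequest) {
            waitsFor.computeIfAbsent(request, k -> new HashSet<>()).add(blockedOnRequest);
        }

        // Depth-first search: does following wait-for edges from 'request' return to it?
        boolean isDeadlocked(String request) {
            return dfs(request, request, new HashSet<>());
        }

        private boolean dfs(String current, String target, Set<String> visited) {
            if (!visited.add(current)) return false;
            for (String next : waitsFor.getOrDefault(current, Collections.emptySet())) {
                if (next.equals(target) || dfs(next, target, visited)) return true;
            }
            return false;
        }
    }

With the example above, adding the edges request 1 -> request 2 and 
request 2 -> request 1 makes isDeadlocked("request 1") return true. Of 
course, no single node ever sees this whole graph, which is why option 3 
below has to make do with propagated "waiting" notices instead.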

Note that we can't rely on flow control, because requests are small, and blocks 
can be, and are, cached in full on each hop. So simple flow control to balance 
between multiple streams doesn't seem to make sense. Any higher-level mechanism 
would probably have to be some form of queueing.

Also, it might be possible to use the new load management code to avoid routing 
to nodes which will reject our requests, without queueing: if no acceptable 
peer is available, we would treat that as a rejection.

Rereading parts of the CCN paper, and pondering this, there appear to be 
several options:

1. Keep the old load management scheme.

IMHO this sucks, for reasons explained above. It can be improved a little, e.g. 
by using an actual window for requests in flight rather than a rate.
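
For comparison, an actual in-flight window would look something like the 
rough sketch below (hypothetical names; AIMD growth and shrinkage of the 
window size is omitted). The point is simply that the sender blocks when the 
window is full, instead of pacing itself by a computed rate:

    // Hypothetical sketch: bound requests in flight by a window, not a rate.
    import java.util.concurrent.Semaphore;

    class InFlightWindow {
        private final Semaphore slots;

        InFlightWindow(int initialWindow) {
            this.slots = new Semaphore(initialWindow);
        }

        // Blocks until a slot is free, so we can never exceed the window.
        void beforeSend() throws InterruptedException {
            slots.acquire();
        }

        void onCompleted() {
            slots.release();     // success frees a slot (the window could also grow here)
        }

        void onRejectedOrTimedOut() {
            slots.release();     // slot freed; the window size should shrink here
        }
    }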

It might be possible to limit misrouting even with old load management. 
However, using backoff even for short-term rejections, caused by a node being 
temporarily close to its capacity limit, is doubtful.

2. Something closer to CCN.

We could drop requests which can't be routed quickly, rather than queueing 
them. We would have to tell the preceding hops, thereby killing the request 
(CCN just relies on a timeout). This would result in the original sender 
retrying. In CCN there is no need to estimate the network capacity, but the 
problem then is: will we just keep sending requests and getting them rejected? 
Even if we can avoid this on the next hop, if there is a bottleneck a few hops 
away we could waste a lot of bandwidth on local requests that then get 
rejected. On the other hand, requests can be very small; requests for the same 
key will get routed differently anyway (due to per-node failure tables), 
ensuring that the local area at least is searched; and a bottleneck would 
presumably only be in a specific part of the keyspace.
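
The per-hop handling under this option might look roughly like the sketch 
below (message and method names are entirely hypothetical): if no acceptable 
peer is found within a short budget, reject back to the previous hop instead 
of queueing, and let the original sender retry:

    // Hypothetical sketch of option 2: fail fast instead of queueing.
    class FastFailForwarder {
        static final long ROUTING_BUDGET_MS = 500;      // invented value

        interface Request {}
        interface Peer { void forward(Request req); }
        interface PreviousHop { void sendRejection(Request req); }

        void handle(Request req, PreviousHop prev) {
            Peer target = findAcceptingPeer(req, ROUTING_BUDGET_MS);
            if (target == null) {
                // Tell the preceding hop explicitly, killing the request,
                // rather than relying on a timeout as CCN does.
                prev.sendRejection(req);
                return;
            }
            target.forward(req);
        }

        // Consult peers' shared load status; return null if none can accept in time.
        Peer findAcceptingPeer(Request req, long budgetMs) {
            return null;                                 // placeholder for the real logic
        }
    }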

Or we could go the whole hog and just use a timeout like CCN, without notifying 
downstream when a request is dropped. But this would be difficult, given that 
our current hop-by-hop load limiting is based on the number of requests we can 
transfer in a reasonable time.

3. Keep queueing and detect loops.

One possible scheme:
A request starts as "live". If it has been waiting for the same set of peers 
for more than a certain period, it is marked as "waiting", and a notice is sent 
downstream; this propagates slowly. If the request is routed forward, or 
receives a reply, a cancel notice is sent, which propagates quickly; likewise 
if the request completes. If all the nodes a request is waiting for are 
themselves marked as waiting, the request is killed, hopefully freeing up 
capacity and breaking loops.
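
A rough sketch of the per-request state machine in that scheme (hypothetical 
names; propagation of the notices over the wire is elided):

    // Hypothetical sketch of option 3: mark long-queued requests as "waiting",
    // propagate that downstream, and kill a request when everything it waits
    // for is itself waiting (a likely loop).
    import java.util.HashMap;
    import java.util.Map;

    class QueuedRequest {
        enum State { LIVE, WAITING, KILLED }

        private State state = State.LIVE;
        // Peer name -> does that peer currently report itself as "waiting"?
        private final Map<String, Boolean> waitingOnPeers = new HashMap<>();
        private long queuedSinceMs = System.currentTimeMillis();

        void nowWaitingOn(String peer) {
            waitingOnPeers.putIfAbsent(peer, false);
        }

        void peerReportedWaiting(String peer) {
            waitingOnPeers.put(peer, true);
            maybeKill();
        }

        // Called periodically; after a threshold, mark as waiting and notify downstream.
        void tick(long thresholdMs) {
            if (state == State.LIVE && System.currentTimeMillis() - queuedSinceMs > thresholdMs) {
                state = State.WAITING;
                // sendWaitingNoticeDownstream();   // propagates slowly
            }
        }

        // Called when the request is routed forward, gets a reply, or completes.
        void onProgress() {
            state = State.LIVE;
            queuedSinceMs = System.currentTimeMillis();
            // sendCancelNoticeDownstream();        // propagates quickly
        }

        private void maybeKill() {
            if (!waitingOnPeers.isEmpty() && !waitingOnPeers.containsValue(false)) {
                state = State.KILLED;               // everything we wait for is itself waiting
                // freeCapacityAndAbort();
            }
        }
    }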

I don't think I fully understand load management, and I would appreciate help 
from people who do! Unfortunately our theorists are mostly interested in 
routing, and mostly tend to go away after a while anyway. It is also possible 
that the problems on testnet are caused by simple bugs; I have fixed many, but 
more must remain.