For some time now I have been working on new load management. I am not convinced it is feasible, based both on theory and on experimentation with the implemented code. (Fortunately much of the code is
Basically, the existing load management system relies on nodes rejecting requests when they think they cannot answer them in a reasonable time. The original sender calculates a rate for sending requests in a similar fashion to TCP, based on rejections/timeouts and the time taken for a request. (It calculates a window and then adjusts it, but it doesn't actually use the window, it uses a rate; this is similar to how we used to handle link-level UDP congestion control.) When a request is rejected, the peer becomes "backed off" for a period which increases exponentially on subsequent rejections. While a node is backed off we avoid routing to it unless all other peers are also backed off.

This has several problems:
- It adapts very badly to the local network capacity.
- It tends to cause, and in fact relies on, misrouting, that is, routing to the wrong peer.
- It may leak information about who is the request originator.
- Denial of service attacks are possible by simply ignoring the timeouts and rejections and sending requests as fast as possible. This is also a perverse incentive for a short-sighted client. The result, while it might be contained to some degree by the fair sharing code, would be much more backoff on the network, more misrouting, and other nodes reducing their send rate while having less success for the requests they do send.

For some years people have talked about making something better. Notable influences:
- Discussions, mostly on Frost, with various very insightful people about token passing and similar schemes. The eventual consensus was that we should just route stuff and rely on some sort of hop-by-hop flow control.
- The CCN paper, http://conferences.sigcomm.org/co-next/2009/papers/Jacobson.pdf . This is well worth reading, and proposes a content distribution and caching network operating at the same level as IP. It states in section 3.1 that because each packet of data is a reply to a request, flow balance is maintained in the network automatically, and there is no need for the client to dynamically adjust a window size according to packet loss. It does not clearly state how clients should decide how many requests to send, but implies that this is limited mainly by their own capacity. It does not queue requests, and does drop them when necessary.

The current scheme, known as "new load management", which is mostly implemented but configurable, has the following properties (a rough sketch of the resulting per-request logic follows below):
- Misrouting is explicitly limited. A request can go to a limited group of 2-3 peers. Extra peers are added only in specific cases, for instance if the first choice has dramatically less capacity than its peers we add another peer (but keep the first one).
- The request will wait as long as is necessary to go to a reasonable peer. I.e. queueing.
- We share information on our current load status so that peers can determine whether their requests are likely to be accepted.

In practice, on the testnet, which has 10 or so nodes and relatively low traffic (I have a bunch of old inserts queued on my node), the time it takes for a request to find a node to route to is unreasonably large, and appears to be increasing. Initially I saw largish spikes (up to a minute or so) but low average queueing times (hundreds of milliseconds). However, after running it for a day or thereabouts, delay times average 6 and 27 seconds and appear to be rising. This might be because the scheme is fundamentally broken.
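For concreteness, the per-request behaviour described above (and measured by those testnet numbers) boils down to something like the following sketch. All names here (Peer, advertisedCapacity, likelyToAccept, routeWithQueueing) are hypothetical simplifications, not the actual fred code, and the "dramatically less capacity" test is just an arbitrary 2x threshold for illustration:

import java.util.ArrayList;
import java.util.List;

// Sketch of NLM-style routing: misrouting limited to a small candidate
// set, then queueing until one of those candidates looks likely to accept.
final class NlmRoutingSketch {

    interface Peer {
        double advertisedCapacity();          // from the shared load status
        boolean likelyToAccept(Object request); // derived from that status
    }

    private final Object loadStatusLock = new Object();
    // Assumed to be notify()'d whenever a peer's load status message arrives.

    Peer routeWithQueueing(Object request, List<Peer> peersByRoutingOrder)
            throws InterruptedException {
        // Limited misrouting: the first routing choice plus at most 1-2 more,
        // added only in specific cases, e.g. the first choice has dramatically
        // less capacity than an alternative.
        List<Peer> candidates = new ArrayList<>();
        Peer first = peersByRoutingOrder.get(0);
        candidates.add(first);
        for (Peer p : peersByRoutingOrder.subList(1, peersByRoutingOrder.size())) {
            if (candidates.size() >= 3) break;
            if (p.advertisedCapacity() > 2 * first.advertisedCapacity())
                candidates.add(p);
        }
        // Queueing: wait as long as necessary for a reasonable peer.
        synchronized (loadStatusLock) {
            while (true) {
                for (Peer p : candidates)
                    if (p.likelyToAccept(request)) return p;
                loadStatusLock.wait();
            }
        }
    }
}

The important property is the unbounded wait loop at the end: nothing in the scheme itself bounds how long a request can sit there, which is exactly what the rising testnet delays suggest is happening.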
I tried to figure out mathematically what the likely queueing time would be, and the maths wouldn't work out. It is possible that, because on each hop a request can be held up waiting for other requests, the delay inevitably escalates to infinity. Another possibility is that queueing causes network-level deadlocks or loops. On a simple circular network this is fairly easy to demonstrate:

A - B - C - D - E (- A)

Each node has a capacity of 1 request. Request 1 starts at A and moves towards E. It stops at D. Request 2 starts at D and moves towards B. It stops at A. Request 1 is now waiting for request 2, and request 2 is waiting for request 1. Oops!

Note that we can't rely on flow control, because requests are small, and blocks can be, and are, cached in full on each hop. So simple flow control to balance between multiple streams doesn't seem to make sense; a higher-level system would probably be some form of queueing. Also, it might be possible to use the new load management code to avoid routing to nodes which will reject our requests, without queueing, by treating it as a rejection.

Rereading parts of the CCN paper, and pondering this, there appear to be several options:

1. Keep the old load management scheme. IMHO this sucks, for the reasons explained above. It can be improved a little, e.g. by using an actual window for requests in flight rather than a rate. It might be possible to limit misrouting even with old load management. However, the use of backoff even for short-term rejections due to being temporarily close to capacity limits is doubtful.

2. Something closer to CCN. We could drop requests which can't be routed quickly, rather than queueing them. We would have to tell the preceding hops, thus killing the request (CCN just relies on a timeout). This would result in the original sender retrying. In CCN there is no need to estimate the network capacity, but the problem then is: will we just keep sending requests and getting them rejected? Even if we can avoid this on the next hop, if there is a bottleneck a few hops away we could waste a lot of bandwidth on local requests that then get rejected. On the other hand, requests can be very small, requests for the same key will get routed differently anyway (due to per-node failure tables), ensuring that at least the local area is searched, and a bottleneck would presumably only affect a specific part of the keyspace. Or we could go the whole hog and just use a timeout like CCN, without notifying downstream when a request is dropped. But this would be difficult given that our current hop-by-hop load limiting is based on the number of requests we can transfer in a reasonable time.

3. Keep queueing and detect loops. One possible scheme (sketched below): a request starts as "live". If it waits for the same set of peers for more than a certain period, it is marked as "waiting", and a notice is sent downstream. This propagates slowly. If the request is routed forward, or receives a reply, a cancel notice is sent, which propagates quickly. Likewise if the request completes. If all the nodes a request is waiting for are marked as waiting, the request is killed, hopefully freeing up capacity and breaking up loops.

I don't think I fully understand load management, and I would appreciate help from people who do! Unfortunately our theorists are mostly interested in routing, and mostly tend to go away after a while anyway. It is also possible that the problems on testnet are caused by simple bugs; I have solved many, but more must remain.
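To make option 3 a bit more concrete, here is a rough sketch of the per-request state machine it implies. Everything here is invented for illustration: the names, the 5-second threshold, and the PeerStatus/Downstream interfaces are not a design, and the propagation of the notices themselves (slow for "waiting", fast for cancels) is left out; only the local decision rule is shown.

import java.util.Set;

// Per-request loop detection: mark ourselves "waiting" after stalling too
// long on the same set of peers, and kill the request if everything we are
// waiting for is itself waiting.
final class WaitingLoopSketch {

    enum State { LIVE, WAITING, KILLED }

    interface PeerStatus { boolean isWaiting(); }

    interface Downstream {
        void sendWaitingNotice(); // propagates slowly, hop by hop
        void sendCancelNotice();  // propagates quickly
        void sendKilledNotice();
    }

    static final long WAIT_THRESHOLD_MS = 5_000; // arbitrary, for illustration

    private State state = State.LIVE;
    private long waitingSince = System.currentTimeMillis();

    // Called periodically while the request is still queued on the same set of peers.
    void onStillQueued(Set<PeerStatus> waitingFor, Downstream downstream) {
        if (state == State.LIVE
                && System.currentTimeMillis() - waitingSince > WAIT_THRESHOLD_MS) {
            state = State.WAITING;
            downstream.sendWaitingNotice();
        }
        // If every peer we are waiting for has told us it is itself waiting,
        // we are probably part of (or behind) a loop: kill the request to
        // free capacity and break the cycle.
        if (state != State.KILLED
                && !waitingFor.isEmpty()
                && waitingFor.stream().allMatch(PeerStatus::isWaiting)) {
            state = State.KILLED;
            downstream.sendKilledNotice();
        }
    }

    // Called when the request is routed forward, receives a reply, or completes.
    void onProgress(Downstream downstream) {
        if (state == State.WAITING) downstream.sendCancelNotice();
        state = State.LIVE;
        waitingSince = System.currentTimeMillis();
    }
}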