On Monday 29 March 2010 20:40:04 Robert Hailey wrote:
>
> On Mar 24, 2010, at 2:24 PM, Matthew Toseland wrote:
>
> > Currently our load management system works like this:
> > - At the level of any node, a request will be preferentially routed
> > to the best node by location (or FOAF location), but if many nodes
> > are backed off, we can route to the worst node by location. In
> > other words there is no limit whatever on misrouting.
> > - At the level of the node originating requests, we attempt to
> > estimate the capacity of the network in a manner similar to TCP
> > (although more vaguely, as we operate on rates rather than actual
> > windows).
> >
> > Hence we do not limit misrouting, but we do limit load. And
> > limiting load on the sender side is very expensive and inflexible,
> > costing us a great deal of performance.
>
> True, yet from a pragmatic perspective... is there really a way
> around this? Network theory: you can control what you send but not
> what you receive.
>
> > Recently the CCN paper, and since long ago various people on Frost,
> > have pointed out that we do not in fact need to limit load. We
> > don't need to limit it at the sender side, and arguably we don't
> > need to do more at the any-node level than basic limiting of the
> > number of requests in flight. This is because every time data is
> > transferred it is the result of a request, so congestion in the
> > sense that TCP/IP deals with cannot really occur.
>
> That is fascinating; I'll have to think about that. If the node would
> always be busy, I guess it could not be any worse than at present.
>
> > Clearly we do need to limit the number of requests which are
> > pending on any given node. If we don't, we will end up sharing a
> > finite and often very small resource (the output bandwidth limit)
> > among an unbounded and potentially very large number of requests.
>
> And in order to limit the number of requests in-flight, we need to
> limit the number of requests which the node accepts. Isn't this
> already the case?

Right. But we can do better at telling our peers about this than just
rejecting them and letting the peers back off. We can tell them how
many requests we are likely to be able to accept in the near future,
for example.
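For concreteness, here is a minimal sketch of what that might look
like: a hard cap on in-flight requests, plus an advertised window we
could send to peers. All the names here are hypothetical, not actual
Freenet code.

import java.util.concurrent.atomic.AtomicInteger;

/**
 * Sketch: a hard cap on in-flight requests, plus an advertised window
 * telling peers how many more requests we expect to accept.
 */
public class RequestWindow {

    private final int maxInFlight; // cap, e.g. derived from output bandwidth
    private final AtomicInteger inFlight = new AtomicInteger(0);

    public RequestWindow(int maxInFlight) {
        this.maxInFlight = maxInFlight;
    }

    /** Accept or reject instantly; never queue past the cap. */
    public boolean tryAccept() {
        while (true) {
            int current = inFlight.get();
            if (current >= maxInFlight)
                return false; // would share bandwidth among too many requests
            if (inFlight.compareAndSet(current, current + 1))
                return true;
        }
    }

    /** Call when a request completes, fails, or is cancelled. */
    public void release() {
        inFlight.decrementAndGet();
    }

    /** What we could send to peers instead of a bare rejection. */
    public int advertisedCapacity() {
        return Math.max(0, maxInFlight - inFlight.get());
    }
}

A peer that last heard advertisedCapacity() == 0 from us could hold
off entirely, rather than probing us with requests that will only be
rejected and trigger backoff.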
> > Also it would be good to directly limit misrouting by refusing to
> > send requests to a node that is too far from the ideal for the key
> > we are requesting. However, some nodes will always be very slow,
> > and some nodes will be temporarily very slow. IMHO these are two
> > different issues: Nodes which are always very slow (e.g. due to
> > being on dialup) should publish a very small capacity and be used
> > occasionally when they can be used, and the fact that we can't use
> > the very slow node for most requests that would in theory match it
> > should not be a big concern with regard to misrouting. Whereas
> > nodes which are temporarily very slow (e.g. temporarily suffering
> > under massive CPU load) should just be ignored - they reject
> > requests or time out, and we can back off. Hence backoff should
> > only happen in critical cases (e.g. high ping time), and most of
> > the time load is limited by the published request limit, which
> > takes over from output bandwidth liability preemptive rejection.
>
> I think this is the true problem. Slow nodes... heterogeneous
> bandwidth limitations... possibly some unrecognized (unmeasured?)
> machine-performance issues.
>
> I would like to resubmit my previous suggestion: a "pre-request"
> which calls for a node to make a very firm estimate as to the amount
> of time it would take to deliver a given key. If a follow-up "actual"
> request exceeds the estimate (or is remotely cancelled), the upstream
> node is penalized (via backoff), thus temporarily removing it from a
> healthy network.
>
> Plus, it closely mirrors *real life*... your boss says "when can you
> get this [very important] report to me", and you can indicate "given
> the amount of other bulk & important requests, and knowing how fast I
> can walk to your desk... 3 minutes".

IMHO this would be very difficult to implement, because of the
uncertainty in how many other requests will be running by that point.
It would also be a largish security issue, as it allows you to find
the data's location without actually transferring it, and thus to know
whom to DoS to mount an effective censorship attack (especially on
opennet).

> For security we may need to turn a fraction of these
> request-estimates into actual (bulk) requests.
>
> > Even with these precautions we will need a heuristic for the degree
> > of misrouting we will tolerate. Options:
> > - Always route to the ideal node, subject to the above limitations.
> > Arguably this is best, provided we can exclude outliers in a
> > satisfactory way.
> > - Allow routing to the top two or three nodes.
> > - Take a proportion of the routing table (e.g. the top 25%).
>
> In my idea, if the node received a request (or a pre-request), we
> could query an arbitrary number of nodes and take the most favored
> one.
>
> For bulk transfers we could take a measure of something other than
> the bottom-line time estimate, such as the largest-%-capacity or
> largest-%-uptime of all links between the requestor and the datum.

No, there may be routing options to take into account response time,
but fundamentally we must route to a location, not route load to
wherever it can go at the moment. And if we can't route to a location
right now, we need to slow things down, not devolve into routing load.
>
> > [...snip...]
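To make the "top two or three nodes" option above concrete, a minimal
sketch (hypothetical names, peers reduced to bare locations): sort
peers by circular distance to the key, consider only the first N, and
if they are all backed off return nothing, so the caller slows down
instead of misrouting.

import java.util.ArrayList;
import java.util.Comparator;
import java.util.List;
import java.util.function.Predicate;

/** Sketch: only the top-N closest peers are eligible for a given key. */
public class MisroutingLimiter {

    /** Distance on the circular [0,1) keyspace. */
    static double ringDistance(double a, double b) {
        double d = Math.abs(a - b);
        return Math.min(d, 1.0 - d);
    }

    /**
     * Pick the closest non-backed-off peer among the top N by distance
     * to the key. Returns null if they are all backed off: the caller
     * should then slow down, not fall through to a worse peer.
     */
    static Double route(List<Double> peerLocations, double targetKey,
                        int topN, Predicate<Double> backedOff) {
        List<Double> sorted = new ArrayList<>(peerLocations);
        sorted.sort(Comparator.comparingDouble(
                loc -> ringDistance(loc, targetKey)));
        int limit = Math.min(topN, sorted.size());
        for (int i = 0; i < limit; i++) {
            Double candidate = sorted.get(i);
            if (!backedOff.test(candidate))
                return candidate;
        }
        return null; // nothing within tolerance: back pressure, not misrouting
    }
}

The "top 25% of the routing table" variant is the same loop with
topN = peerLocations.size() / 4; "always route to the ideal node" is
topN = 1.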
> > 6. Realtime vs bulk flag
> >
> > Add an extra flag to requests. If the flag is set to BULK, we
> > proceed as before, with queueing and a relatively long timeout. If
> > the flag is set to REALTIME, we need an instant decision, even if
> > it means an instant failure. So we try to route to the top nodes -
> > a slightly more generous heuristic than with bulk - and if we
> > can't, we fail as if we had timed out. Another way to implement
> > this would be to have a dedicated quota within the per-node limit
> > for realtime requests. A third option would be to allow realtime
> > requests to be queued, but for a much shorter period, and to use
> > stats to estimate whether that period will be exceeded and, if so,
> > fail immediately.
>
> Another idea... three classifications: realtime, fast, and bulk.
>
> Realtime requests always occur at the full link speed between nodes
> and have an embedded "deadline" timestamp (which could be increased
> at the origin until it succeeds). A node instantly rejects such a
> request if it would make any other transfers go over time-budget;
> otherwise all other transfers are (very temporarily) suspended for
> the [one] realtime transfer.
>
> Hmmm... on second thought you would still need some kind of queuing
> or token system... or a realtime response could include the narrowest
> pipe, and a node could cancel any "slower" realtime requests.

Yeah, I think the idea of realtime requests as you suggest is
implausible, because they'd have such a short window that they would
not be able to be routed very far.
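For the "dedicated quota" option above, a minimal sketch (hypothetical
names, not actual Freenet code) of a per-node limit with a reserved
realtime slice, where realtime requests get an instant accept-or-fail
decision:

import java.util.concurrent.atomic.AtomicInteger;

/**
 * Sketch: one per-node limit with a reserved realtime slice. Realtime
 * requests get an instant accept-or-fail decision; bulk requests use
 * the rest (and in a real implementation would also be queueable).
 */
public class DualClassLimiter {

    private final int totalSlots;    // overall per-node request limit
    private final int realtimeSlots; // slice reserved for REALTIME
    private final AtomicInteger realtime = new AtomicInteger(0);
    private final AtomicInteger bulk = new AtomicInteger(0);

    public DualClassLimiter(int totalSlots, int realtimeSlots) {
        this.totalSlots = totalSlots;
        this.realtimeSlots = realtimeSlots;
    }

    /** REALTIME: decide now; false here means fail as if timed out. */
    public boolean tryAcceptRealtime() {
        while (true) {
            int r = realtime.get();
            if (r >= realtimeSlots) return false;
            if (realtime.compareAndSet(r, r + 1)) return true;
        }
    }

    /** BULK: may use everything the realtime class has not reserved. */
    public boolean tryAcceptBulk() {
        while (true) {
            int b = bulk.get();
            if (b >= totalSlots - realtimeSlots) return false;
            if (bulk.compareAndSet(b, b + 1)) return true;
        }
    }

    public void releaseRealtime() { realtime.decrementAndGet(); }
    public void releaseBulk()     { bulk.decrementAndGet(); }
}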
