On Wed, Aug 31, 2011 at 9:00 AM, Matthew Toseland
<toad at amphibian.dyndns.org> wrote:
> On Monday 29 Aug 2011 19:28:50 Ian Clarke wrote:
>> Ok, thinking about it, here is a proposal, or rather, the beginning of a
>> proposal. I'm assuming that we get rid of NLM, fair sharing, and anything
>> else intended to control load, and replace it with this. We will absolutely
>> need to simulate this before we write a single line of code to deploy it.
>>
>> The core idea is that a node will include a floating point number in
>> response to any kind of request showing how close that node is to being
>> overloaded. 0.0 would mean it's doing nothing, 1.0 would mean that it is
>> completely saturated and must reject requests. Clearly the goal is to avoid
>> getting anywhere near to 1.0.
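
To make that concrete, here is a rough sketch of how a node might compute the
figure it attaches to responses; the class name and the simple
requests-vs-capacity model are hypothetical, not existing Freenet code:

import java.util.concurrent.atomic.AtomicInteger;

/**
 * Rough sketch only: compute the load figure, in [0.0, 1.0], attached to
 * every response. 0.0 = idle, 1.0 = saturated and must reject requests.
 * The class and the capacity model are illustrative, not existing code.
 */
public final class LoadEstimator {

    private final int maxConcurrentRequests;                // capacity we will accept
    private final AtomicInteger runningRequests = new AtomicInteger();

    public LoadEstimator(int maxConcurrentRequests) {
        this.maxConcurrentRequests = maxConcurrentRequests;
    }

    public void requestStarted()  { runningRequests.incrementAndGet(); }
    public void requestFinished() { runningRequests.decrementAndGet(); }

    /** Load figure to include in the next response we send. */
    public double currentLoad() {
        double load = (double) runningRequests.get() / maxConcurrentRequests;
        return Math.min(1.0, Math.max(0.0, load));
    }
}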
>>
>> A node tracks several things:
>>
>>    - What is the overall average load reported by responses this node has
>>      received
>>    - What is the average load reported by responses this node has received,
>>      per remote node
>>    - What is the average load reported by responses this node forwarded, per
>>      remote node
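
The three averages above could be kept as exponentially decaying running
averages: one global value plus two per-peer maps. A rough sketch, with
hypothetical names, string peer identifiers, and a placeholder smoothing
factor:

import java.util.Map;
import java.util.concurrent.ConcurrentHashMap;

/** Sketch of the three decaying averages. ALPHA is a placeholder; choosing it
 *  (i.e. the effective time period) is one of the open questions below. */
public final class LoadStats {

    private static final double ALPHA = 0.05;   // smoothing factor, unvalidated

    private volatile double globalReportedLoad = 0.0;
    private final Map<String, Double> reportedByPeer = new ConcurrentHashMap<>();
    private final Map<String, Double> loadForwardedToPeer = new ConcurrentHashMap<>();

    /** Load figure seen in a response this node received from a peer. */
    public synchronized void onResponseReceived(String peer, double load) {
        globalReportedLoad = ema(globalReportedLoad, load);
        reportedByPeer.merge(peer, load, LoadStats::ema);
    }

    /** Load figure in a response this node forwarded on to a peer. */
    public synchronized void onResponseForwarded(String peer, double load) {
        loadForwardedToPeer.merge(peer, load, LoadStats::ema);
    }

    public double globalReportedLoad() { return globalReportedLoad; }

    private static double ema(double oldAvg, double sample) {
        return (1.0 - ALPHA) * oldAvg + ALPHA * sample;
    }
}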
>>
>> I think, given these metrics, we should be able to do the following:
>>
>>    - Limit our overall rate of initiating local requests based on the global
>>      average reported load
>>    - Limit our rate of local requests based on the average load of the
>>      connection to the peer it would need to be forwarded to
>>    - Detect when remote peers are abusing the system by disregarding load -
>>      as evidenced by a significantly higher average load in replies forwarded
>>      to them
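
As an illustration only, the mapping from load to request rate could be as
simple as scaling a maximum rate by (1 - load), and "abuse" could mean a
forwarded-load average well above the global one. The constants and the linear
mapping below are placeholders, not a proposal:

/** Sketch: turn an average load figure into an allowed local request rate,
 *  and flag peers whose forwarded-load average stays far above the norm. */
public final class LoadPolicy {

    private static final double MAX_REQUESTS_PER_SEC = 20.0; // arbitrary ceiling
    private static final double ABUSE_MARGIN = 0.25;          // arbitrary threshold

    /** Allowed rate of locally-initiated requests, given an average load. */
    public static double allowedRequestRate(double averageLoad) {
        // Linear back-off: full rate at load 0.0, nothing at load 1.0.
        return MAX_REQUESTS_PER_SEC * Math.max(0.0, 1.0 - averageLoad);
    }

    /** A peer looks abusive if the load reported in replies forwarded to it
     *  stays well above the global average, i.e. it keeps sending requests
     *  regardless of the load figures it sees. */
    public static boolean looksAbusive(double loadForwardedToPeer, double globalLoad) {
        return loadForwardedToPeer > globalLoad + ABUSE_MARGIN;
    }
}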
>>
>> Of course, lots of questions:
>>
>>    - How do we compute the averages? A decaying running average of some
>>      kind? What time period?
>>    - How do we translate load into a desired rate of requests?
>>    - What criteria indicate that a peer is abusing the system? What is the
>>      remedy?
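
On the first question, one option is a time-based decaying average, which
makes the time period an explicit parameter instead of hiding it in a
per-sample smoothing factor. A sketch; the half-life value itself is exactly
the open question:

/** Sketch of a decaying average whose time period is an explicit parameter. */
public final class DecayingAverage {

    private final double halfLifeMillis; // "what time period?" lives here
    private double value;
    private long lastUpdateMillis;

    public DecayingAverage(double halfLifeMillis, double initialValue) {
        this.halfLifeMillis = halfLifeMillis;
        this.value = initialValue;
        this.lastUpdateMillis = System.currentTimeMillis();
    }

    public synchronized double report(double sample) {
        long now = System.currentTimeMillis();
        double dt = now - lastUpdateMillis;
        lastUpdateMillis = now;
        // Weight of the old value decays by half every halfLifeMillis.
        double decay = Math.pow(0.5, dt / halfLifeMillis);
        value = decay * value + (1.0 - decay) * sample;
        return value;
    }
}

Written this way, changing the time period is a one-parameter change, which
should make it easier to experiment or simulate.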
>>
>> This is basically control theory stuff, and we'll need robust answers to
>> these questions before we proceed (ideally with a theoretical rather than
>> experimental foundation).
>
> One fundamental issue that hasn't been covered in detail yet:
>
> WE NEED MORE DATA.
>
> How can we achieve this?
>
> First, digger3's graphs are very helpful. But he can only very rarely show 
> statistically significant results between one build and another. We need a 
> lot more people gathering data for this. We can either just ask people, or we 
> can build a plugin that makes it easy (there is some manual setup at the 
> moment).
>
> Second, digger3's graphs don't address throughput. This is THE reason why NLM 
> was turned off: throughput was dramatically down. digger3's graphs are based 
> on the LongTermManySingleBlocksTest, which is based on inserting a bunch of 
> single blocks. It's true that each is timed, but that is irrelevant: what we
> need is a big insert, where other factors affecting throughput can come into
> play.
>
> Third, there are those who say our problems are not related to load
> management at all, but to the fact that Freenet nodes with a lot of
> downloads tend to have a flat distribution of links. This is a natural
> enough optimisation for their local usage, but if it happens across the 
> network, or even if it happens in clusters, it will wreak havoc. We need some 
> way to estimate how widespread this is. If it is common, we should do 
> something about it, for instance, not path folding until requests reach HTL 
> 16 (which is a good idea for security, but if this problem is rare and not 
> clustered it would reduce performance).
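
One possible single-value metric a probed node could report for "flat vs
specialised" is the mean circular distance of its peers' locations from its
own: lower for a specialised node, close to the uniform-random expectation of
0.25 for a flat link distribution. A sketch, not an agreed metric:

/** Sketch: one possible "peer spread" metric a probe could return.
 *  Locations are keyspace positions in [0.0, 1.0). */
public final class PeerSpread {

    /** Circular distance on the [0, 1) keyspace, in [0, 0.5]. */
    static double distance(double a, double b) {
        double d = Math.abs(a - b);
        return Math.min(d, 1.0 - d);
    }

    /** Mean distance of peers from our own location: smaller for a
     *  specialised node, around 0.25 for a flat (uniform) link distribution. */
    public static double meanPeerDistance(double ownLocation, double[] peerLocations) {
        if (peerLocations.length == 0) return 0.0;
        double sum = 0.0;
        for (double loc : peerLocations) sum += distance(ownLocation, loc);
        return sum / peerLocations.length;
    }
}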
>
> The obvious option is to implement the random routed probes that Evan 
> suggested. These should eventually be able to replace the current probe 
> requests, and would return a lot less data and so be safer. They would be 
> routed randomly and then terminate on a node, and return a single value from 
> that node - in this case some metric of how spread out vs specialised its 
> peers are. Other versions would be able to estimate the network size by a 
> tag-and-release method that should be cheaper than the current probe request 
> based stuff (it would return an estimate of uptime and a unique ID for the 
> node which is kept private and used solely for that purpose), or return the 
> proportion of peers backed off. Another variant is to random route and then 
> do a request - this would allow us to determine whether content is still 
> retrievable and thus whether we want to reinsert it, or might provide more 
> representative statistics than bootstrapping a new node every time (at least 
> allowing us to separate problems with bootstrapping from problems faced by 
> the average node).
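
For the tag-and-release idea, the obvious starting point is the classic
mark-recapture (Lincoln-Petersen) estimate: probe twice, and count how many of
the private IDs from the first round reappear in the second. A sketch, assuming
probes land on roughly uniform random nodes (which is what the
Metropolis-Hastings correction discussed below is for) and ignoring the uptime
correction mentioned above:

import java.util.HashSet;
import java.util.List;
import java.util.Set;

/** Sketch: Lincoln-Petersen mark-recapture estimate of network size from
 *  two rounds of random-routed probes returning per-node private IDs. */
public final class SizeEstimate {

    public static double estimate(List<String> firstRoundIds, List<String> secondRoundIds) {
        Set<String> tagged = new HashSet<>(firstRoundIds);
        int recaptured = 0;
        for (String id : secondRoundIds) {
            if (tagged.contains(id)) recaptured++;
        }
        if (recaptured == 0) return Double.POSITIVE_INFINITY; // too few samples
        // N ~= (n1 * n2) / recaptured
        return (double) firstRoundIds.size() * secondRoundIds.size() / recaptured;
    }
}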
>
> IMHO some or all of the above are worth seriously considering before 
> deploying any new scheme. If we want to be empirical we need to measure the 
> effect of our changes on the real network, not only in simulation.
>

I like this proposal :)

Is the existing documentation on the math needed to make the random routing
behave well sufficient? Let me know if it isn't. The MHMC routing math
shouldn't be too complicated, but we want to be certain it's
implemented correctly so that the data is sound.
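
For reference, the core of the correction is small: at each hop pick a
neighbour uniformly, then accept the move with probability
min(1, degree(current) / degree(candidate)), otherwise stay put for that step.
The walk's stationary distribution is then uniform over nodes rather than
proportional to peer count. A sketch with hypothetical interfaces, not Freenet
code:

import java.util.List;
import java.util.Random;

/** Sketch of one Metropolis-Hastings corrected step of a random walk, so the
 *  walk samples nodes uniformly instead of proportionally to peer count. */
public final class MhWalk {

    interface Node {
        List<Node> peers();
    }

    private final Random random = new Random();

    /** Returns the node the walk occupies after one step (possibly unchanged). */
    public Node step(Node current) {
        List<Node> peers = current.peers();
        if (peers.isEmpty()) return current;
        Node candidate = peers.get(random.nextInt(peers.size()));
        double accept = Math.min(1.0,
                (double) peers.size() / candidate.peers().size());
        return random.nextDouble() < accept ? candidate : current;
    }
}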

Evan Daniel
