On Monday 29 Aug 2011 19:28:50 Ian Clarke wrote:
> Ok, thinking about it, here is a proposal, or  rather, the beginning of a
> proposal.  I'm assuming that we get rid of NLM, fair sharing, and anything
> else intended to control load, and replace it with this.  We will absolutely
> need to simulate this before we write a single line of code to deploy it.
> 
> The core idea is that a node will include a floating point number in
> response to any kind of request showing how close that node is to being
> overloaded.  0.0 would mean it's doing nothing, 1.0 would mean that it is
> completely saturated and must reject requests.  Clearly the goal is to avoid
> getting anywhere near to 1.0.
> 
> A node tracks several things:
> 
>    - What is the overall average load reported by responses this node has
>    received
>    - What is the average load reported by responses this node has received,
>    per remote node
>    - What is the average load reported by responses this node forwarded, per
>    remote node
> 
> I think, given these metrics, we should be able to do the following:
> 
>    - Limit our overall rate of initiating local requests based on the global
>    average reported load
>    - Limit our rate of local requests based on the average load of the
>    connection to the peer it would need to be forwarded to
>    - Detect when remote peers are abusing the system by disregarding load -
>    as evidenced by a significantly higher average load in replies forwarded to
>    them
> 
> Of course, lots of questions:
> 
>    - How do we compute the averages?  A decaying running average of some
>    kind?  What time period?
>    - How do we translate load into a desired rate of requests?
>    - What criteria indicate that a peer is abusing the system?  What is the
>    remedy?
> 
> This is basically control theory stuff, and we'll need robust answers to
> these questions before we proceed (ideally with a theoretical rather than
> experimental foundation).
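
A quick side note on the averaging question before my main point: a "decaying
running average" would normally just be an exponentially decaying average,
something like the sketch below. This is only an illustration - the half-life
is a made-up parameter, not a recommendation - and the same structure would be
kept per peer for the second and third metrics.

// Sketch only: an exponentially decaying average of the load values (0.0 to
// 1.0) reported in responses. The half-life is illustrative, not a proposal.
public final class DecayingLoadAverage {

    private final double halfLifeMillis;
    private double average = 0.0;
    private long lastUpdate = -1;

    public DecayingLoadAverage(double halfLifeMillis) {
        this.halfLifeMillis = halfLifeMillis;
    }

    /** Fold one reported load value into the average. */
    public synchronized void report(double load, long nowMillis) {
        if (lastUpdate < 0) {
            average = load;
        } else {
            // Old data is weighted down by how many half-lives have elapsed
            // since the last report.
            double elapsed = nowMillis - lastUpdate;
            double decay = Math.pow(0.5, elapsed / halfLifeMillis);
            average = decay * average + (1.0 - decay) * load;
        }
        lastUpdate = nowMillis;
    }

    public synchronized double get() {
        return average;
    }
}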

One fundamental issue that hasn't been covered in detail yet:

WE NEED MORE DATA.

How can we achieve this?

First, digger3's graphs are very helpful, but they only rarely show
statistically significant differences between one build and another. We need a
lot more people gathering data. We can either just ask people, or we can build
a plugin that makes it easy (at the moment it requires some manual setup).

Second, digger3's graphs don't address throughput. This is THE reason why NLM
was turned off: throughput was dramatically down. The graphs are based on the
LongTermManySingleBlocksTest, which inserts a series of single blocks. Each
block is timed, but that tells us little about throughput: what we need is a
big insert, where other factors affecting throughput come into play.
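
To be concrete about what I mean by measuring throughput: something as crude as
the sketch below would do. BulkInserter is a hypothetical stand-in for whatever
client-side call the test harness actually uses; the point is simply to time
one large insert end to end and report MiB per hour.

// Sketch only. BulkInserter is a hypothetical stand-in, not an existing API;
// plug in whatever call the test framework uses to insert one large file.
public final class BulkInsertThroughput {

    public interface BulkInserter {
        /** Insert the data as a single file and block until it completes. */
        void insertOneFile(byte[] data) throws Exception;
    }

    /** Time one large random insert end to end and return MiB per hour. */
    public static double measureMiBPerHour(BulkInserter inserter, int totalMiB)
            throws Exception {
        byte[] data = new byte[totalMiB * 1024 * 1024];
        new java.util.Random().nextBytes(data);
        long start = System.currentTimeMillis();
        inserter.insertOneFile(data);
        double hours = (System.currentTimeMillis() - start) / 3600000.0;
        return totalMiB / hours;
    }
}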

Third, there are those who say our problems are not related to load management
at all, but to the fact that Freenet nodes with a lot of downloads tend to end
up with a flat distribution of links. This is a natural enough optimisation for
their local usage, but if it happens across the network, or even just in
clusters, it will wreak havoc on routing. We need some way to estimate how
widespread it is. If it is common, we should do something about it, for
instance not path folding until requests reach HTL 16 (which is a good idea for
security anyway, but if the problem is rare and not clustered it would reduce
performance).
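
A crude strawman for how a single node might quantify this: Freenet locations
live on a circle of circumference 1, and peers chosen uniformly at random would
sit at a mean circular distance of 0.25 from the node's own location, whereas a
well-specialised node's peers cluster much closer. The class below is purely
illustrative; the names are mine.

// Sketch only: a crude "flatness" metric for a node's links. Locations are in
// [0.0, 1.0) on a circle, so the circular distance is at most 0.5.
public final class PeerSpread {

    static double circularDistance(double a, double b) {
        double d = Math.abs(a - b);
        return Math.min(d, 1.0 - d);
    }

    /**
     * Mean circular distance from a node's own location to its peers'. A value
     * near 0.25 suggests a flat (uniform) link distribution; much smaller
     * values suggest the node is specialised around its own location.
     */
    public static double meanPeerDistance(double ownLocation, double[] peerLocations) {
        if (peerLocations.length == 0) return 0.0;
        double sum = 0.0;
        for (double loc : peerLocations) {
            sum += circularDistance(ownLocation, loc);
        }
        return sum / peerLocations.length;
    }
}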

The obvious option is to implement the random-routed probes that Evan
suggested. These should eventually be able to replace the current probe
requests, and would return far less data per probe and so be safer. A probe
would be routed randomly, terminate on a node, and return a single value from
that node - in this case some metric of how spread out vs. specialised its
peers are. Other variants could estimate the network size by a tag-and-release
method (each probe returning an estimate of the node's uptime and a unique ID
which is kept private and used solely for that purpose), which should be
cheaper than the current probe-request-based estimates, or could return the
proportion of peers that are backed off. Yet another variant would random-route
and then do a request: this would let us determine whether content is still
retrievable, and thus whether we want to reinsert it, or might give more
representative statistics than bootstrapping a new node every time (at least
allowing us to separate problems with bootstrapping from problems faced by the
average node).
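
The tag-and-release variant is just mark-recapture: send one batch of
random-routed probes, remember the returned IDs, send a second batch later, and
estimate the network size from the overlap, e.g. with the Lincoln-Petersen
estimator, roughly n1*n2/recaptures. A sketch, assuming the probes hand back
the opaque per-node ID described above:

import java.util.HashSet;
import java.util.List;
import java.util.Set;

// Sketch only: Lincoln-Petersen mark-recapture estimate of network size from
// two batches of random-routed probe replies. The IDs are assumed to be the
// private per-node identifiers returned solely for this purpose.
public final class NetworkSizeEstimate {

    /** Returns the estimated node count, or -1 if the batches did not overlap. */
    public static double estimate(List<String> firstBatch, List<String> secondBatch) {
        Set<String> marked = new HashSet<String>(firstBatch);
        int recaptured = 0;
        for (String id : secondBatch) {
            if (marked.contains(id)) recaptured++;
        }
        if (recaptured == 0) return -1; // too few probes to say anything useful
        return (double) marked.size() * secondBatch.size() / recaptured;
    }
}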

IMHO some or all of the above are worth seriously considering before deploying
any new scheme. If we want to be empirical, we need to measure the effect of
our changes on the real network, not only in simulation.