On Wednesday 31 Aug 2011 15:05:51 Evan Daniel wrote:
> On Wed, Aug 31, 2011 at 9:00 AM, Matthew Toseland <t...@amphibian.dyndns.org> wrote:
> > On Monday 29 Aug 2011 19:28:50 Ian Clarke wrote:
> >> Ok, thinking about it, here is a proposal, or rather, the beginning of a proposal. I'm assuming that we get rid of NLM, fair sharing, and anything else intended to control load, and replace it with this. We will absolutely need to simulate this before we write a single line of code to deploy it.
> >>
> >> The core idea is that a node will include a floating point number in response to any kind of request showing how close that node is to being overloaded. 0.0 would mean it's doing nothing, 1.0 would mean that it is completely saturated and must reject requests. Clearly the goal is to avoid getting anywhere near to 1.0.
> >>
> >> A node tracks several things:
> >>
> >> - What is the overall average load reported by responses this node has received
> >> - What is the average load reported by responses this node has received, per remote node
> >> - What is the average load reported by responses this node has forwarded, per remote node
> >>
> >> I think, given these metrics, we should be able to do the following:
> >>
> >> - Limit our overall rate of initiating local requests based on the global average reported load
> >> - Limit our rate of local requests based on the average load of the connection to the peer it would need to be forwarded to
> >> - Detect when remote peers are abusing the system by disregarding load - as evidenced by a significantly higher average load in replies forwarded to them
> >>
> >> Of course, lots of questions:
> >>
> >> - How do we compute the averages? A decaying running average of some kind? What time period?
> >> - How do we translate load into a desired rate of requests?
> >> - What criteria indicate that a peer is abusing the system? What is the remedy?
> >>
> >> This is basically control theory stuff, and we'll need robust answers to these questions before we proceed (ideally with a theoretical rather than experimental foundation).
> >
> > One fundamental issue that hasn't been covered in detail yet:
> >
> > WE NEED MORE DATA.
> >
> > How can we achieve this?
> >
> > First, digger3's graphs are very helpful. But he can only very rarely show statistically significant results between one build and another. We need a lot more people gathering data for this. We can either just ask people, or we can build a plugin that makes it easy (there is some manual setup at the moment).
> >
> > Second, digger3's graphs don't address throughput. This is THE reason why NLM was turned off: throughput was dramatically down. digger3's graphs are based on the LongTermManySingleBlocksTest, which is based on inserting a bunch of single blocks. It's true that each insert is timed, but that is irrelevant: what we need is a big insert, where other factors affecting throughput can come into play.
> >
> > Third, there are those who say our problems are not related to load management at all, they are related to the fact that Freenet nodes with a lot of downloads tend to have a flat distribution of links. This is a natural enough optimisation for their local usage, but if it happens across the network, or even if it happens in clusters, it will wreak havoc. We need some way to estimate how widespread this is.
> > If it is common, we should do something about it, for instance, not path folding until requests reach HTL 16 (which is a good idea for security, but if this problem is rare and not clustered it would reduce performance).
> >
> > The obvious option is to implement the random routed probes that Evan suggested. These should eventually be able to replace the current probe requests, and would return a lot less data and so be safer. They would be routed randomly and then terminate on a node, and return a single value from that node - in this case some metric of how spread out vs specialised its peers are. Other versions would be able to estimate the network size by a tag-and-release method that should be cheaper than the current probe request based stuff (it would return an estimate of uptime and a unique ID for the node which is kept private and used solely for that purpose), or return the proportion of peers backed off. Another variant is to random route and then do a request - this would allow us to determine whether content is still retrievable and thus whether we want to reinsert it, or might provide more representative statistics than bootstrapping a new node every time (at least allowing us to separate problems with bootstrapping from problems faced by the average node).
> >
> > IMHO some or all of the above are worth seriously considering before deploying any new scheme. If we want to be empirical we need to measure the effect of our changes on the real network, not only in simulation.
>
> I like this proposal :)
>
> Is the documentation on the math of how to get the random routing to behave well sufficient? Let me know if it isn't. The MHMC routing math shouldn't be too complicated, but we want to be certain it's implemented correctly so that the data is sound.
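On Ian's "how do we compute the averages?" question: an exponentially decaying running average is probably the simplest thing that behaves sensibly when reports arrive at irregular intervals. Purely as a sketch (the class and the half-life parameter are made up for illustration, this isn't existing code):

    /** Tracks a decaying average of the load values (0.0 - 1.0) reported by peers. */
    class DecayingLoadAverage {

        private final double halfLifeMillis;
        private double average = 0.0;
        private long lastUpdate = -1;

        DecayingLoadAverage(double halfLifeMillis) {
            this.halfLifeMillis = halfLifeMillis;
        }

        /** Fold a newly reported load value into the running average. */
        synchronized void report(double load, long now) {
            if (lastUpdate < 0) {
                average = load;
            } else {
                // The weight of the old average halves every halfLifeMillis.
                double decay = Math.pow(0.5, (now - lastUpdate) / halfLifeMillis);
                average = decay * average + (1.0 - decay) * load;
            }
            lastUpdate = now;
        }

        synchronized double currentAverage() {
            return average;
        }
    }

We would need one of these for the global average plus one per peer in each direction, and then throttle local request starts as the relevant average approaches some threshold well below 1.0. The "what time period?" question then becomes the choice of half-life, which is exactly the kind of thing the simulations should settle before we deploy anything.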
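On the tag-and-release network size estimate: if the probes sample nodes roughly uniformly (which is the point of getting the random routing right), the standard mark-recapture arithmetic should be enough. With n1 distinct IDs seen in one round of probes, n2 in a later round, and m IDs seen in both rounds, the size estimate is roughly n1 * n2 / m. A minimal sketch, again with made-up names rather than existing code:

    import java.util.HashSet;
    import java.util.Set;

    /** Mark-recapture (Lincoln-Petersen) estimate from two rounds of probe IDs. */
    final class NetworkSizeEstimator {
        static double estimate(Set<String> firstRound, Set<String> secondRound) {
            Set<String> recaptured = new HashSet<String>(firstRound);
            recaptured.retainAll(secondRound);
            if (recaptured.isEmpty())
                return Double.NaN; // Not enough overlap between rounds to estimate yet.
            return (double) firstRound.size() * secondRound.size() / recaptured.size();
        }
    }

Presumably the uptime estimate the probe returns is there to correct for nodes that are only intermittently online, since those are less likely to be counted in both rounds.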
Do you have a metric for how clustered vs uniform a node's peers are? What's MHMC?
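For what it's worth, one cheap candidate for that metric (just a sketch, names made up, not existing code) is the median circular distance from a node's own location to its peers' locations: a node with a flat, uniform set of links should come out near 0.25, while a properly specialised node should be well below that.

    import java.util.Arrays;

    /** One candidate probe statistic: median circular distance to a node's peers. */
    final class PeerSpreadMetric {

        /** Circular distance between two locations in [0.0, 1.0); result is in [0.0, 0.5]. */
        static double distance(double a, double b) {
            double d = Math.abs(a - b);
            return Math.min(d, 1.0 - d);
        }

        /**
         * Median link distance: roughly 0.25 for a flat (uniform) set of links,
         * much smaller for a node that is well specialised around its own location.
         */
        static double medianLinkDistance(double ownLocation, double[] peerLocations) {
            int n = peerLocations.length;
            if (n == 0) return Double.NaN;
            double[] distances = new double[n];
            for (int i = 0; i < n; i++)
                distances[i] = distance(ownLocation, peerLocations[i]);
            Arrays.sort(distances);
            return n % 2 == 1 ? distances[n / 2]
                              : (distances[n / 2 - 1] + distances[n / 2]) / 2.0;
        }
    }

A single number like this is the sort of thing the random routed probe could return without leaking much about the node.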