On Wednesday 31 Aug 2011 15:05:51 Evan Daniel wrote:
> On Wed, Aug 31, 2011 at 9:00 AM, Matthew Toseland <t...@amphibian.dyndns.org> wrote:
> > On Monday 29 Aug 2011 19:28:50 Ian Clarke wrote:
> >> Ok, thinking about it, here is a proposal, or rather, the beginning of a proposal. I'm assuming that we get rid of NLM, fair sharing, and anything else intended to control load, and replace it with this. We will absolutely need to simulate this before we write a single line of code to deploy it.
> >>
> >> The core idea is that a node will include a floating point number in response to any kind of request showing how close that node is to being overloaded. 0.0 would mean it's doing nothing, 1.0 would mean that it is completely saturated and must reject requests. Clearly the goal is to avoid getting anywhere near to 1.0.
> >>
> >> A node tracks several things:
> >>
> >> - What is the overall average load reported by responses this node has received
> >> - What is the average load reported by responses this node has received, per remote node
> >> - What is the average load reported by responses this node has forwarded, per remote node
> >>
> >> I think, given these metrics, we should be able to do the following:
> >>
> >> - Limit our overall rate of initiating local requests based on the global average reported load
> >> - Limit our rate of local requests based on the average load of the connection to the peer it would need to be forwarded to
> >> - Detect when remote peers are abusing the system by disregarding load - as evidenced by a significantly higher average load in replies forwarded to them
> >>
> >> Of course, lots of questions:
> >>
> >> - How do we compute the averages? A decaying running average of some kind? What time period?
> >> - How do we translate load into a desired rate of requests?
> >> - What criteria indicate that a peer is abusing the system? What is the remedy?
> >>
> >> This is basically control theory stuff, and we'll need robust answers to these questions before we proceed (ideally with a theoretical rather than experimental foundation).
> >
> > One fundamental issue that hasn't been covered in detail yet:
> >
> > WE NEED MORE DATA.
> >
> > How can we achieve this?
> >
> > First, digger3's graphs are very helpful. But he can only very rarely show statistically significant results between one build and another. We need a lot more people gathering data for this. We can either just ask people, or we can build a plugin that makes it easy (there is some manual setup at the moment).
> >
> > Second, digger3's graphs don't address throughput. This is THE reason why NLM was turned off: throughput was dramatically down. digger3's graphs are based on the LongTermManySingleBlocksTest, which is based on inserting a bunch of single blocks. It's true that each insert is timed, but that is irrelevant: what we need is a big insert, where other factors affecting throughput can come into play.
> >
> > Third, there are those who say our problems are not related to load management at all, they are related to the fact that Freenet nodes with a lot of downloads tend to have a flat distribution of links. This is a natural enough optimisation for their local usage, but if it happens across the network, or even if it happens in clusters, it will wreak havoc. We need some way to estimate how widespread this is.
> > If it is common, we should do something about it, for instance, not path folding until requests reach HTL 16 (which is a good idea for security, but if this problem is rare and not clustered it would reduce performance).
> >
> > The obvious option is to implement the random routed probes that Evan suggested. These should eventually be able to replace the current probe requests, and would return a lot less data and so be safer. They would be routed randomly and then terminate on a node, and return a single value from that node - in this case some metric of how spread out vs specialised its peers are. Other versions would be able to estimate the network size by a tag-and-release method that should be cheaper than the current probe request based stuff (it would return an estimate of uptime and a unique ID for the node which is kept private and used solely for that purpose), or return the proportion of peers backed off. Another variant is to random route and then do a request - this would allow us to determine whether content is still retrievable and thus whether we want to reinsert it, or might provide more representative statistics than bootstrapping a new node every time (at least allowing us to separate problems with bootstrapping from problems faced by the average node).
> >
> > IMHO some or all of the above are worth seriously considering before deploying any new scheme. If we want to be empirical we need to measure the effect of our changes on the real network, not only in simulation.
>
> I like this proposal :)
>
> Is the documentation on the math of how to get the random routing to behave well sufficient? Let me know if it isn't. The MHMC routing math shouldn't be too complicated, but we want to be certain it's implemented correctly so that the data is sound.
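On Ian's "how do we compute the averages?" question: an exponentially decaying running average is probably the simplest thing that behaves sensibly when reports arrive at irregular intervals. Purely as a sketch (the class and the half-life parameter are made up for illustration, this isn't existing code):

    /** Tracks a decaying average of the load values (0.0 - 1.0) reported by peers. */
    class DecayingLoadAverage {

        private final double halfLifeMillis;
        private double average = 0.0;
        private long lastUpdate = -1;

        DecayingLoadAverage(double halfLifeMillis) {
            this.halfLifeMillis = halfLifeMillis;
        }

        /** Fold a newly reported load value into the running average. */
        synchronized void report(double load, long now) {
            if (lastUpdate < 0) {
                average = load;
            } else {
                // The weight of the old average halves every halfLifeMillis.
                double decay = Math.pow(0.5, (now - lastUpdate) / halfLifeMillis);
                average = decay * average + (1.0 - decay) * load;
            }
            lastUpdate = now;
        }

        synchronized double currentAverage() {
            return average;
        }
    }

We would need one of these for the global average plus one per peer in each direction, and then throttle local request starts as the relevant average approaches some threshold well below 1.0. The "what time period?" question then becomes the choice of half-life, which is exactly the kind of thing the simulations should settle before we deploy anything.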
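On the tag-and-release network size estimate: if the probes sample nodes roughly uniformly (which is the point of getting the random routing right), the standard mark-recapture arithmetic should be enough. With n1 distinct IDs seen in one round of probes, n2 in a later round, and m IDs seen in both rounds, the size estimate is roughly n1 * n2 / m. A minimal sketch, again with made-up names rather than existing code:

    import java.util.HashSet;
    import java.util.Set;

    /** Mark-recapture (Lincoln-Petersen) estimate from two rounds of probe IDs. */
    final class NetworkSizeEstimator {
        static double estimate(Set<String> firstRound, Set<String> secondRound) {
            Set<String> recaptured = new HashSet<String>(firstRound);
            recaptured.retainAll(secondRound);
            if (recaptured.isEmpty())
                return Double.NaN; // Not enough overlap between rounds to estimate yet.
            return (double) firstRound.size() * secondRound.size() / recaptured.size();
        }
    }

Presumably the uptime estimate the probe returns is there to correct for nodes that are only intermittently online, since those are less likely to be counted in both rounds.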
Do you have a metric for how clustered vs uniform a node's peers are? What's MHMC?
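For what it's worth, one cheap candidate for that metric (just a sketch, names made up, not existing code) is the median circular distance from a node's own location to its peers' locations: a node with a flat, uniform set of links should come out near 0.25, while a properly specialised node should be well below that.

    import java.util.Arrays;

    /** One candidate probe statistic: median circular distance to a node's peers. */
    final class PeerSpreadMetric {

        /** Circular distance between two locations in [0.0, 1.0); result is in [0.0, 0.5]. */
        static double distance(double a, double b) {
            double d = Math.abs(a - b);
            return Math.min(d, 1.0 - d);
        }

        /**
         * Median link distance: roughly 0.25 for a flat (uniform) set of links,
         * much smaller for a node that is well specialised around its own location.
         */
        static double medianLinkDistance(double ownLocation, double[] peerLocations) {
            int n = peerLocations.length;
            if (n == 0) return Double.NaN;
            double[] distances = new double[n];
            for (int i = 0; i < n; i++)
                distances[i] = distance(ownLocation, peerLocations[i]);
            Arrays.sort(distances);
            return n % 2 == 1 ? distances[n / 2]
                              : (distances[n / 2 - 1] + distances[n / 2]) / 2.0;
        }
    }

A single number like this is the sort of thing the random routed probe could return without leaking much about the node.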