On Mon, Aug 29, 2011 at 2:09 PM, Matthew Toseland <t...@amphibian.dyndns.org > wrote:
> On Monday 29 Aug 2011 19:28:50 Ian Clarke wrote: > > Ok, thinking about it, here is a proposal, or rather, the beginning of a > > proposal. I'm assuming that we get rid of NLM, fair sharing, and > anything > > else intended to control load, and replace it with this. We will > absolutely > > need to simulate this before we write a single line of code to deploy it. > > I have justified fair sharing in my other reply. However alternate > mechanisms might replace it. > > I agree we should simulate, although this might be significant work. > I doubt it would be more work than trying it without simulation and failing for a decade, which was our previous approach. > > The core idea is that a node will include a floating point number in > > response to any kind of request showing how close that node is to being > > overloaded. 0.0 would mean its doing nothing, 1.0 would mean that it is > > completely saturated and must reject requests. Clearly the goal is to > avoid > > getting anywhere near to 1.0. > > > > A node tracks several things: > > All of these apply to both locally originated requests and remotely > originated requests? And are they for only our direct peers? > Every peer along the reply path will take note of the load value, so it will be a mixture of the loads of close-by peers, and more distant peers. Obviously, since we are more likely to receive messages from peers that are "closer", our average will automatically be biased towards closer peers. > - What is the average load reported by responses this node forwarded, > per > > remote node > > Ahhh, this one could be interesting - you could use it to penalise nodes > which spam excessively. > Exactly. > > I think, given these metrics, we should be able to do the following: > > > > - Limit our overall rate of initiating local requests based on the > global > > average reported load > > We can already do this on the basis of AIMD's, based purely on behaviour of > local requests. However, taking more data into account might result in more > accurate results - although it might also dilute the feedback. > Right, AIMD tends to control the rate at which a specific event occurs, like a dropped IP packet with TCP, or a dropped request as we currently use it. This is a bit like signaling that you have a headache by banging your head against the wall. It is also a rather coarse way to deliver feedback. A floating point load number conveys more information in a non-destructive way. In any case, this part is "safe": it's not going to cause feedback problems, > on its own. Conversely, it doesn't deal with problems such as network > bottlenecks. > Well, it depends. If we are tracking load on a per-outbound-connection basis then we can potentially take action to manage load per-connection in addition to overall. > However, while AIMD's are based on the *global* average load (or rather the > load of all the nodes along the chain), it sounds like you are proposing to > only report the *local* load of our direct peers? How are you going to > compute the global average load? > No, its not the local load of our direct peers. Its the load reported in all reply messages that we forward, some of those will be from our direct peers, but most will be from remote peers. > > - Limit our rate of local requests based on the average load of the > > connection to the peer it would need to be forwarded to > > Not sure I follow here. Are you proposing that we initiate more requests in > keyspace areas where there is lower load? > Well, kinda, except the opposite. We might reduce requests in keyspaces where there is higher load. > > - Detect when remote peers are abusing the system by disregarding load > - > > as evidenced by a significantly higher average load in replies > forwarded to > > them > > Okay, that is useful - although I doubt it would be an unambiguous > detection, and what would we do about it? We would have to reject requests, > and if we're not sure, or if it could be a downstream troublemaker, couldn't > that result in misrouting and make things worse? On the other hand, any > malicious node would be most easily detected by the nodes directly connected > to it, so this could work much the same way as fair sharing - provided that > it can reject or terminate the requests. > No, we probably wouldn't start rejecting requests - more likely we would disconnect from that node. Obviously we'd need to be fairly sure of abusive behavior before doing that - but abuse should be rare. > Arguably fair sharing achieves the same objective. It could be extended in > various ways e.g. if we are relaying a lot of slow-down notices to a peer we > could start to think about squeezing the peer's allocation. > Yes, except it doesn't work, and its too complicated to reason about. > > Of course, lots of questions: > > > > - How do we compute the averages? A decaying running average of some > > kind? What time period? > > Yay more tunables. :( > Ideally we'd do this in a way that avoids tunables. This is control theory, we shouldn't have to reinvent the wheel. > > - How do we translate load into a desired rate of requests? > > And even more tunables. :( > Don't be pessimistic, hopefully we can avoid tunables if we are smart about it. > > - What criteria indicate that a peer is abusing the system? What is > the > > remedy? > > In the current system, don't accept their requests. > Yes, although I'm leaning towards disconnecting from them, which is more-or-less the same thing, just more permanent and they don't consume our resources any more. > In ANY system, there will always be a borderline below which they can get > away with it. We can either try to make this as small as possible or try not > to hit legitimate traffic too hard. > Abuse should be rare, we should bias against false positives when we look for it. > > This is basically control theory stuff, and we'll need robust answers to > > these questions before we proceed (ideally with a theoretical rather than > > experimental foundation). > > IMHO small tweaks to AIMDs, combined with fair sharing, will achieve much > the same without having to wait years for our theoreticians to wake up. > Yes, small tweaks have worked so well for us for the last decade, leaving us pretty-much where we were in 2003. No, we don't understand how the current system works, there is no point in trying to fix something when we don't even know what is broken. I am touched by your faith in "theoreticians", but I have a feeling we're in uncharted territory here. We need to agree on an approach that is simple enough to reason about, and then build a simulation for it. I think the simulation should just be a few days of work, we're not talking years here, hopefully not even months. Ian. -- Ian Clarke Founder, The Freenet Project Email: i...@freenetproject.org
_______________________________________________ Devl mailing list Devl@freenetproject.org http://freenetproject.org/cgi-bin/mailman/listinfo/devl