On Monday 29 Aug 2011 18:58:26 Ian Clarke wrote:
> On Mon, Aug 29, 2011 at 12:37 PM, Matthew Toseland <
> toad at amphibian.dyndns.org> wrote:
> 
> > That is because we do not have the time or funding to empirically test
> > hypotheses. We don't gather enough data, we don't have a huge testnet to try
> > stuff on over extended periods, and so on. Most software development has
> > deadlines and resource limitations, but most software development doesn't
> > NEED to be empirical, at least not very often! See my other mail about this.
> 
> I don't think the problem is a lack of time or funding.  The problem is that
> we come up with solutions that are too complicated to analyze or fix when
> they don't work.  Throwing more time and money at that might help, but that
> is addressing the symptom, not the cause.  The cause is complexity, which
> just grows and grows as we try to fix problems we don't understand by
> layering on more complexity.

Perhaps. Ripping everything out and starting again is always tempting, 
especially when subordinates have been doing things without asking you. :)
> 
> > Misrouting is unacceptable, in general. Extremely overloaded or extremely
> > low capacity nodes may be routed around. We might even allow some bounded
> > amount of misrouting in the more general case (e.g. go to either of the top
> > two peers for the key). But in general, transforming load into misrouting
> > (or into reduced HTL, or any other bogus escape valve) is a bad idea. We
> > need to reduce the incoming load.
> 
> Right, the same is true of queueing.  If nodes are forced to do things to
> deal with overloading that make the problem worse then the load balancing
> algorithm has failed.  Its job is to prevent that from happening.

Okay.
> 
> > > What if all nodes told other nodes that they were overloaded because all
> > > their peers are overloaded.  Such a situation would basically cause the
> > > entire network to think it was overloaded, even though nobody actually
> > was!
> > >  It becomes a bit like this cartoon: http://flic.kr/p/5npfm2  How can
> > this
> > > be avoided?
> >
> > Exactly. There are two possible approaches:
> > 1. Nodes control the rate at which *local* requests are initiated, based on
> > how many slow-down signals they get.
> >
> > (This ensures that such gridlock or feedback loop problems can't happen)
> 
> Ah, so slow-downs only ever affect local request initiation?  The problem
> there is that if a peer is overloaded, only those peers that are directly
> connected to it will slow down their rate of requests.  Peers elsewhere on
> the network will continue to fire requests at it, and there will be no way
> for them to know they need to slow down :-/

No, because the slow-downs are propagated back to the original request 
originator, and only the request originator takes notice of them. That's how it 
works now.
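To make the originator-only mechanism concrete, here is a minimal sketch of an AIMD window as it might run on the request originator, reacting to slow-down messages propagated back along the path. The class name, constants, and method names are illustrative assumptions, not Freenet's actual code.

```python
# Hedged sketch: an originator-side AIMD window reacting to slow-down
# messages that were propagated back along the request path.
# All names and constants are illustrative, not Freenet's actual code.

class AIMDWindow:
    """Additive-increase / multiplicative-decrease request window."""

    def __init__(self, initial=4.0, floor=1.0, increase=1.0, decrease=0.5):
        self.window = initial      # max local requests in flight
        self.floor = floor         # never drop below one request
        self.increase = increase   # additive step on success
        self.decrease = decrease   # multiplicative factor on slow-down

    def on_success(self):
        # A request completed without a slow-down signal: probe upward.
        self.window += self.increase

    def on_slow_down(self):
        # A slow-down came back from somewhere along the path: back off
        # multiplicatively, but keep at least `floor` requests in flight.
        self.window = max(self.floor, self.window * self.decrease)

w = AIMDWindow()
w.on_success()     # window grows additively: 4.0 -> 5.0
w.on_slow_down()   # window shrinks multiplicatively: 5.0 -> 2.5
```

Because only the originator reacts, intermediate nodes never have to misroute or queue to shed load; the feedback loop closes at the edge of the network.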
> 
> Unless... what if the DataReply message (or whatever it's called these days)
> contains a slow-down request, and it's respected by every peer along that
> path according to AIMD?  The problem is that there will be a very weak link
> between the rate of requests, and the rate of slow-down messages.  It would
> be like having 10,000 individual heaters with thermostats all trying to
> regulate the temperature of a gigantic space.  Each individual
> thermostat/heater will have an almost imperceptible impact.  Although
> perhaps that would work...?

The rest is plausible - feedback might be too slow, but this can be helped in 
part by an increased window size between the point at which we start sending 
slow-down messages and the point at which things start breaking more badly.
> 
> > Fair sharing can limit this. Limiting backoff so that we only misroute when
> > nodes are severely overloaded can also help. We could queue (provided this
> > happens only rarely and briefly), allow bounded misrouting (e.g. route to
> > one of the top two routes), or terminate requests (thus freeing up
> > resources).
> 
> I think "fair sharing" is one of those complexities that we need to go back
> to the drawing board on.  It's clear that even now we have hypotheses about
> how fair sharing could be causing problems.  It's too complicated to really
> reason about, or to anticipate all of its implications.

It is vital for:
1. Security: It quashes DoS attacks/floods quickly. Without it, any attacker 
can easily swamp any node it is connected to, using all of its capacity for 
sending its flood onward. With it, they are limited to a fraction of the node's 
capacity. This is a concern that load management must deal with!
2. Fairness: it lets nodes with low numbers of peers, and slower nodes in 
general, get a look in, rather than being rejected at the same rate as everyone 
else.
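The core of fair sharing can be stated in a few lines: cap each peer at a fraction of the node's capacity, so a flooding peer cannot consume every slot. This is a sketch under assumed names and accounting, not the actual implementation.

```python
# Hedged sketch of per-peer fair sharing: each peer is limited to a
# fraction of this node's capacity, so one flooding peer cannot starve
# the rest. The function and its parameters are illustrative only.

def accept_request(in_flight_by_peer, peer, capacity, num_peers):
    """Accept iff `peer` is not already using more than its fair share.

    in_flight_by_peer: dict mapping peer id -> requests currently running.
    capacity: total request slots on this node.
    """
    fair_share = capacity / num_peers
    used = in_flight_by_peer.get(peer, 0)
    # A flood from one peer is capped at roughly its share, so other
    # (possibly slower, low-degree) peers still get a look in.
    return used < fair_share
```

With 20 slots and 5 peers, a peer already running 4 requests is refused, while an idle peer is still accepted; this is exactly the DoS-quashing and fairness behaviour described above.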
> 
> > - The easiest way to implement #1 is with AIMD's. We can keep them, but
> > send slow-down messages earlier.
> > - We could keep NLM, i.e. queue for limited times to smooth out delays.
> 
> Doesn't that violate the principle that the load balancing stuff shouldn't
> do anything that makes the problem worse?  Queueing does exactly that.

So what do you do if the incoming requests are not at the same time as the 
completed outgoing requests? I guess you keep a large number of request slots 
free at any given time?
> 
> > Currently we RNF after a (longish) period waiting for a node to route to.
> > We would send a slow-down message when we have queued for more than a
> > certain period.  This means we can have a fixed, limited amount of
> > misrouting, we can keep NLM's clear benefits for routing accuracy (as
> > visible in success rates), and we can ensure that the input load is low
> > enough to avoid severe slowdown due to queueing for a long time.
> 
> No, I think we need to avoid queueing except in case of emergency.  Queueing
> only makes things worse by tying up more resources for longer.

I have the same attitude to misrouting.
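The scheme sketched in the quoted text (queue briefly, send a slow-down past one threshold, terminate past a longer one, instead of the current long wait before RNF) can be summarised as a simple decision function. The threshold values here are invented for illustration.

```python
# Hedged sketch of bounded queueing: a request may wait briefly for a
# routable peer; past `slow_down_after` we signal the originator, and
# past `fail_after` we terminate (the RNF-like case, freeing resources).
# Both thresholds are invented placeholders, not Freenet's real values.

def handle_wait(queued_for, slow_down_after=0.5, fail_after=5.0):
    """Return the action for a request that has waited `queued_for` seconds."""
    if queued_for >= fail_after:
        return "terminate"      # give up: free the tied-up resources
    if queued_for >= slow_down_after:
        return "slow-down"      # tell the originator to reduce input load
    return "keep-waiting"       # brief queueing smooths transient delays
```

The point of the two thresholds is the "window" Matthew mentions earlier: slow-down signals start well before things break badly, so queueing stays rare and brief rather than becoming the escape valve itself.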