On Mon, Aug 29, 2011 at 12:37 PM, Matthew Toseland <toad at amphibian.dyndns.org> wrote:
> That is because we do not have the time or funding to empirically test
> hypotheses. We don't gather enough data, we don't have a huge testnet to try
> stuff on over extended periods, and so on. Most software development has
> deadlines and resource limitations, but most software development doesn't
> NEED to be empirical, at least not very often! See my other mail about this.

I don't think the problem is a lack of time or funding. The problem is that
we come up with solutions that are too complicated to analyze or fix when
they don't work. Throwing more time and money at that might help, but that
is addressing the symptom, not the cause. The cause is complexity, which
just grows and grows as we try to fix problems we don't understand by
layering on more complexity.

> Misrouting is unacceptable, in general. Extremely overloaded or extremely
> low capacity nodes may be routed around. We might even allow some bounded
> amount of misrouting in the more general case (e.g. go to either of the top
> two peers for the key). But in general, transforming load into misrouting
> (or into reduced HTL, or any other bogus escape valve) is a bad idea. We
> need to reduce the incoming load.

Right, the same is true of queueing. If nodes are forced to do things to
deal with overloading that make the problem worse, then the load balancing
algorithm has failed. Its job is to prevent that from happening.

> > What if all nodes told other nodes that they were overloaded because all
> > their peers are overloaded. Such a situation would basically cause the
> > entire network to think it was overloaded, even though nobody actually
> > was! It becomes a bit like this cartoon: http://flic.kr/p/5npfm2 How can
> > this be avoided?
>
> Exactly. There are two possible approaches:
> 1. Nodes control the rate at which *local* requests are initiated, based on
> how many slow-down signals they get.
> (This ensures that such gridlock or feedback loop problems can't happen)

Ah, so slow-downs only ever affect local request initiation? The problem
there is that if a peer is overloaded, only those peers that are directly
connected to it will slow down their rate of requests. Peers elsewhere on
the network will continue to fire requests at it, and there will be no way
for them to know they need to slow down :-/

Unless... what if the DataReply message (or whatever it's called these days)
contained a slow-down request, and it were respected by every peer along
that path according to AIMD? The problem is that there would be a very weak
link between the rate of requests and the rate of slow-down messages. It
would be like having 10,000 individual heaters with thermostats all trying
to regulate the temperature of a gigantic space. Each individual
thermostat/heater would have an almost imperceptible impact. Although
perhaps that would work...?

> Fair sharing can limit this. Limiting backoff so that we only misroute when
> nodes are severely overloaded can also help. We could queue (provided this
> happens only rarely and briefly), allow bounded misrouting (e.g. route to
> one of the top two routes), or terminate requests (thus freeing up
> resources).

I think "fair sharing" is one of those complexities that we need to go back
to the drawing board on. It's clear that even now we have hypotheses about
how fair sharing could be causing problems. It's too complicated to really
reason about, or to anticipate all of its implications.

> - The easiest way to implement #1 is with AIMDs. We can keep them, but
> send slow-down messages earlier.
> - We could keep NLM, i.e. queue for limited times to smooth out delays.

Doesn't that violate the principle that the load balancing stuff shouldn't
do anything that makes the problem worse? Queueing does exactly that.

> Currently we RNF after a (longish) period waiting for a node to route to.
> We would send a slow-down message when we have queued for more than a
> certain period. This means we can have a fixed, limited amount of
> misrouting, we can keep NLM's clear benefits for routing accuracy (as
> visible in success rates), and we can ensure that the input load is low
> enough to avoid severe slowdown due to queueing for a long time.

No, I think we need to avoid queueing except in case of emergency. Queueing
only makes things worse by tying up more resources for longer.

Ian.

--
Ian Clarke
Founder, The Freenet Project
Email: ian at freenetproject.org
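For reference, a minimal sketch of the AIMD idea behind approach #1 (a node throttling its *local* request rate in response to slow-down signals). This is illustrative Python, not Freenet's actual implementation; the class name and parameters are hypothetical:

```python
class AIMDRequestLimiter:
    """Additive-increase / multiplicative-decrease control of the rate
    at which local requests are initiated.

    Hypothetical sketch: Freenet's real window/backoff logic differs.
    """

    def __init__(self, increase=1.0, decrease=0.5, floor=1.0):
        self.window = floor       # allowed concurrent local requests
        self.increase = increase  # additive step when things go well
        self.decrease = decrease  # multiplicative factor on slow-down
        self.floor = floor        # never stall completely

    def on_success(self):
        # No slow-down signal: probe for spare capacity, additively.
        self.window += self.increase

    def on_slow_down(self):
        # A peer signalled overload: back off multiplicatively, so the
        # aggregate input load drops quickly instead of building into
        # the gridlock/feedback-loop scenario described above.
        self.window = max(self.floor, self.window * self.decrease)
```

Because only the originating node adjusts `window`, slow-down signals cannot cascade into the "everyone thinks they're overloaded" feedback loop; the open question in the thread is whether signals from distant overloaded peers ever reach the originator at all.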