On Mon, Aug 29, 2011 at 2:09 PM, Matthew Toseland <toad at amphibian.dyndns.org> wrote:

> On Monday 29 Aug 2011 19:28:50 Ian Clarke wrote:
> > Ok, thinking about it, here is a proposal, or rather, the beginning of a
> > proposal.  I'm assuming that we get rid of NLM, fair sharing, and
> > anything else intended to control load, and replace it with this.  We
> > will absolutely need to simulate this before we write a single line of
> > code to deploy it.
>
> I have justified fair sharing in my other reply. However alternate
> mechanisms might replace it.
>
> I agree we should simulate, although this might be significant work.
>

I doubt it would be more work than trying it without simulation and failing
for a decade, which was our previous approach.


> > The core idea is that a node will include a floating point number in
> > response to any kind of request showing how close that node is to being
> > overloaded.  0.0 would mean it's doing nothing, 1.0 would mean that it
> > is completely saturated and must reject requests.  Clearly the goal is
> > to avoid getting anywhere near to 1.0.
> >
> > A node tracks several things:
>
> All of these apply to both locally originated requests and remotely
> originated requests? And are they for only our direct peers?
>

Every peer along the reply path will take note of the load value, so it will
be a mixture of the loads of close-by peers and more distant peers.
Obviously, since we are more likely to receive messages from peers that are
"closer", our average will automatically be biased towards closer peers.

> >    - What is the average load reported by responses this node forwarded,
> >    per remote node
>
> Ahhh, this one could be interesting - you could use it to penalise nodes
> which spam excessively.
>

Exactly.


> > I think, given these metrics, we should be able to do the following:
> >
> >    - Limit our overall rate of initiating local requests based on the
> >    global average reported load
>
> We can already do this on the basis of AIMD's, based purely on behaviour of
> local requests. However, taking more data into account might result in more
> accurate results - although it might also dilute the feedback.
>

Right, AIMD adjusts the rate in reaction to a specific destructive event,
like a dropped IP packet with TCP, or a dropped request as we currently use
it.  Signaling overload by dropping requests is a bit like signaling that
you have a headache by banging your head against the wall.  It is also a
rather coarse way to deliver feedback: a single bit per event.

A floating point load number conveys much more information, and does so
non-destructively.
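
Roughly the difference is this (constants purely illustrative):

    // Sketch contrasting the two feedback styles.
    public class FeedbackStyles {

        // AIMD: one bit per request - it either survived or was dropped,
        // so we only learn about overload by destroying a request.
        static double aimd(double rate, boolean dropped) {
            return dropped ? rate * 0.5   // multiplicative decrease
                           : rate + 0.1;  // additive increase
        }

        // Continuous load signal: every reply tells us *how* loaded the
        // path is, so we can slow down before anything is dropped.
        static double loadSignal(double rate, double reportedLoad) {
            // Steer gently towards a target load of 0.8 (illustrative).
            return rate * (1.0 + 0.1 * (0.8 - reportedLoad));
        }
    }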

> In any case, this part is "safe": it's not going to cause feedback problems,
> on its own. Conversely, it doesn't deal with problems such as network
> bottlenecks.
>

Well, it depends.  If we are tracking load on a per-outbound-connection
basis, then we can potentially take action to manage load per-connection as
well as overall.


> However, while AIMD's are based on the *global* average load (or rather the
> load of all the nodes along the chain), it sounds like you are proposing to
> only report the *local* load of our direct peers? How are you going to
> compute the global average load?
>

No, it's not the local load of our direct peers.  It's the load reported in
all reply messages that we forward; some of those will be from our direct
peers, but most will be from remote peers.


> >    - Limit our rate of local requests based on the average load of the
> >    connection to the peer it would need to be forwarded to
>
> Not sure I follow here. Are you proposing that we initiate more requests in
> keyspace areas where there is lower load?
>

Well, kinda, except the opposite.  We might reduce requests in keyspaces
where there is higher load.
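
For example - purely a sketch, the thresholds are invented - before
initiating a local request we could look at the load average for the
connection it would be routed over, and probabilistically hold back as that
average approaches saturation:

    import java.util.Random;

    // Sketch: throttle locally-originated requests per outbound connection.
    public class PerPeerThrottle {
        private final Random random = new Random();

        /** avgLoad is the decaying load average seen on replies via this peer. */
        public boolean mayInitiateRequest(double avgLoad) {
            if (avgLoad <= 0.5) return true; // plenty of headroom
            // Back off linearly from certain accept at 0.5 to certain
            // reject at 1.0.
            return random.nextDouble() < (1.0 - avgLoad) / 0.5;
        }
    }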


> >    - Detect when remote peers are abusing the system by disregarding
> >    load - as evidenced by a significantly higher average load in replies
> >    forwarded to them
>
> Okay, that is useful - although I doubt it would be an unambiguous
> detection, and what would we do about it? We would have to reject requests,
> and if we're not sure, or if it could be a downstream troublemaker, couldn't
> that result in misrouting and make things worse? On the other hand, any
> malicious node would be most easily detected by the nodes directly connected
> to it, so this could work much the same way as fair sharing - provided that
> it can reject or terminate the requests.
>

No, we probably wouldn't start rejecting requests - more likely we would
disconnect from that node.  Obviously we'd need to be fairly sure of abusive
behavior before doing that - but abuse should be rare.
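
The test would be deliberately conservative, something like this (the margin
and sample count are placeholders, not proposals):

    // Sketch: only flag a peer when the evidence is overwhelming.
    public class AbuseDetector {
        private static final double MARGIN = 0.25;     // generous headroom
        private static final long MIN_SAMPLES = 10000; // no snap judgments

        /**
         * True if replies forwarded to this peer report far more load than
         * the global average, over a large number of samples.
         */
        public boolean looksAbusive(double peerAvgLoad, double globalAvgLoad,
                                    long samples) {
            return samples >= MIN_SAMPLES
                    && peerAvgLoad > globalAvgLoad + MARGIN;
        }
    }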


> Arguably fair sharing achieves the same objective. It could be extended in
> various ways e.g. if we are relaying a lot of slow-down notices to a peer we
> could start to think about squeezing the peer's allocation.
>

Yes, except it doesn't work, and it's too complicated to reason about.


> > Of course, lots of questions:
> >
> >    - How do we compute the averages?  A decaying running average of some
> >    kind?  What time period?
>
> Yay more tunables. :(
>

Ideally we'd do this in a way that avoids tunables.  This is control theory;
we shouldn't have to reinvent the wheel.
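
For example, an exponentially decaying average over wall-clock time has a
single parameter - a half-life - and is independent of how often reports
arrive.  A sketch, with the half-life picked out of thin air:

    // Sketch: decaying average where the weight of old data halves for
    // every HALF_LIFE_MS of elapsed time, regardless of report rate.
    public class DecayingAverage {
        private static final double HALF_LIFE_MS = 60000.0; // illustrative

        private double value;
        private long lastUpdate;
        private boolean started = false;

        public synchronized void report(double sample) {
            long now = System.currentTimeMillis();
            if (!started) {
                value = sample;
                started = true;
            } else {
                double decay = Math.pow(0.5, (now - lastUpdate) / HALF_LIFE_MS);
                value = value * decay + sample * (1.0 - decay);
            }
            lastUpdate = now;
        }

        public synchronized double value() { return value; }
    }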


> >    - How do we translate load into a desired rate of requests?
>
> And even more tunables. :(
>

Don't be pessimistic; hopefully we can avoid tunables if we are smart about
it.
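
For instance - again only a sketch, the floor value is arbitrary - we could
map the load average to an inter-request delay that grows steeply as the
network approaches saturation:

    // Sketch: convert a load average in [0.0, 1.0] into a delay between
    // locally-initiated requests.
    public class LoadToRate {
        private static final double MIN_DELAY_MS = 100.0; // arbitrary floor

        public static double delayMillis(double avgLoad) {
            double headroom = Math.max(0.01, 1.0 - avgLoad); // avoid div by 0
            // 0.0 -> 100ms, 0.9 -> 1s, 0.99 -> 10s
            return MIN_DELAY_MS / headroom;
        }
    }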


> >    - What criteria indicate that a peer is abusing the system?  What is
> >    the remedy?
>
> In the current system, don't accept their requests.
>

Yes, although I'm leaning towards disconnecting from them, which is
more-or-less the same thing, just more permanent, and they no longer consume
our resources.


> In ANY system, there will always be a borderline below which they can get
> away with it. We can either try to make this as small as possible or try not
> to hit legitimate traffic too hard.
>

Abuse should be rare, so we should bias against false positives when we look
for it.


> > This is basically control theory stuff, and we'll need robust answers to
> > these questions before we proceed (ideally with a theoretical rather than
> > experimental foundation).
>
> IMHO small tweaks to AIMDs, combined with fair sharing, will achieve much
> the same without having to wait years for our theoreticians to wake up.
>

Yes, small tweaks have worked so well for us for the last decade, leaving us
pretty much where we were in 2003.  No, we don't understand how the current
system works; there is no point in trying to fix something when we don't
even know what is broken.

I am touched by your faith in "theoreticians", but I have a feeling we're in
uncharted territory here.  We need to agree on an approach that is simple
enough to reason about, and then build a simulation for it.  I think the
simulation should be just a few days of work; we're not talking years here,
hopefully not even months.

Ian.

-- 
Ian Clarke
Founder, The Freenet Project
Email: ian at freenetproject.org