On Thursday 18 October 2001 08:04, you wrote:
> On Thu, Oct 18, 2001 at 12:25:44AM -0400, Gianni Johansson wrote:
> <>
>
> > I just checked in changes that give failed nodes an interval to
> > "rest" before they are retried again. I think that cooperatively backing
> > off in this manner will reduce network congestion.  Request will be
> > better routed to nodes that can answer them. If the node really is
> > completely out of usable refs requests will fail quickly with RNFs
> > instead of waiting for all of the known bad refs to be contacted and
> > fail.
>
> As I said, I don't see the point in this. The CP value already gives a
> "rest", it just does to ramdonly rather than absolutely, but it averages
> out.
>
Averaging out isn't good enough.  Transient routing failures -- caused by
target nodes hitting their thread/connection limit -- are very common.  You
need to recover from them as soon as possible.  If you have one node ref that
just stopped working and the rest haven't worked for days, it makes sense to
retry the one that was working as soon as reasonably possible (i.e. after the
rest interval expires).  Otherwise the network will collapse under heavy
stress a la 0.3.

Just to clarify, my change increases the timeout interval with each
successive failure, so if all the refs have been retried and failed and none
of their timeout intervals has expired, the request will RNF.
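
The bookkeeping behind this is roughly as follows -- a simplified sketch, not
the actual routing code; the class, field names, and rest-interval constants
are made up for illustration:

// Illustrative sketch only -- not the real freenet.node code.
// The rest interval doubles with each successive failure and is cleared
// again on success; routing skips any ref that is still resting.
class RestingRef {
    static final long BASE_REST = 30 * 1000;      // first rest (assumed)
    static final long MAX_REST  = 30 * 60 * 1000; // cap (assumed)

    private int  failures   = 0;
    private long retryAfter = 0;  // time after which this ref may be retried

    // Called when a request routed through this ref fails.
    void reportFailure() {
        failures++;
        long rest = BASE_REST << Math.min(failures - 1, 10); // doubles
        if (rest > MAX_REST) rest = MAX_REST;
        retryAfter = System.currentTimeMillis() + rest;
    }

    // Called when a request routed through this ref succeeds.
    void reportSuccess() {
        failures   = 0;
        retryAfter = 0;
    }

    // Routing skips this ref while it is still inside its rest interval.
    boolean isRoutable() {
        return System.currentTimeMillis() >= retryAfter;
    }
}

So a ref that failed only once comes back into rotation quickly, while refs
that keep failing stay out of the way for longer; and if every known ref is
still resting, the request fails immediately with an RNF instead of walking
the whole routing table.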

> If routing skips all the available refs, then the node just should
> simply send back a RouteNotFound on the request. Trying to make it force
> the routing to a node randomly was a well intentioned idea, but does not
> work out in reality. Complicating this will only make it worse.
>
My approach isn't random.  I am caching information about previous 
responsiveness.

Let's not make the perfect the enemy of the good.

Remember, this code should almost never run.  Its purpose is to allow the
node to bootstrap into the network with a small set of potentially crappy
noderefs and/or to recover after it has been overloaded -- i.e. routed so
many requests that all of its node refs are no longer responding.

As you have pointed out, the previous implementation was just wrong.  I think
my modifications are an improvement.


> > 1) It looks like the thread pools job queue is way too big.
> > Snippet from Freenet.node.Main.startNode:
> >
> > if (node.maximumThreads > 0) {
> >    tm = new ThreadPool(tg, 5, node.maximumThreads >> 1,
> >                        node.maximumThreads);
> >    tm.start();
> > }
> >
> > First, only half as many threads as are specified in
> > freenet.conf are actually created.  This is somewhat
> > counter-intuitive to the end user.
>
> I agree, that seems wrong. I think that it should rather be
>
> tm = new ThreadPool(tg, node.maximumThreads >> 1, node.maximumThreads, 5);
>
> though "5" seems awefully arbitrary. I'm quite sure the intention was to
> have maximumThreads maximum, and half as many in the pool by default, so
> this is probably simply a mistake.
>
I'm not sure I am following you.  

public ThreadPool(ThreadGroup tg, int maxPool, int maxThreads, 
                   int maxJobs) {

maxPool    -- The number of idle threads to keep in the pool.
maxThreads -- The maximum number of concurrent threads to run.
maxJobs    -- The maximum number of allowable jobs, including running jobs.

What I think you are saying is that the number of jobs that can be queued up 
should be half the maximum number of allowed threads.

tm = new ThreadPool(tg, 5, node.maximumThreads, 3 * node.maximumThreads / 2);
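
To spell out the capacity arithmetic (assuming my reading of the parameters
above is right -- this is just the current call and the proposed one side by
side, not a patch):

// Current call in Freenet.node.Main.startNode:
tm = new ThreadPool(tg, 5, node.maximumThreads >> 1, node.maximumThreads);
//   => only maximumThreads/2 threads can ever run; maximumThreads jobs total.

// Proposed call: run the full thread count, and allow half that many again
// to queue up behind the running jobs.
tm = new ThreadPool(tg, 5, node.maximumThreads, 3 * node.maximumThreads / 2);
//   => maximumThreads running plus maximumThreads/2 queued, i.e. 120 running
//      and 60 queued at the default maximumThreads of 120.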

>
> <>
>
> > It's easy to reduce the size of the job queue.  Figuring out what
> > to do to recover somewhat gracefully when it overflows is much more
> > difficult. I don't really know the answer.
>
> It should be ok:
>
> 1) Incoming connections use run() and simply close if nothing can be
> assigned.
>
> 2) The Ticker uses blockingRun() so it simply waits until it can run the
> threads.
>
> 3) Outgoing connections use forceRun() so they simply fall back on
> making new java threads.
>
> Those are really all the threads we start (I even considered creating
> connections by throwing them on the Ticker so as to not create any
> threads anywhere else).
>
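For the archives, here is how I read those three entry points.  This is only
a sketch, not the actual ThreadPool code; the signatures are guessed, and the
worker threads that actually drain the queue are left out:

import java.util.LinkedList;

// Illustrative only: the three submission styles described above.
class SketchPool {
    private final LinkedList jobs = new LinkedList();
    private final int maxJobs;

    SketchPool(int maxJobs) { this.maxJobs = maxJobs; }

    // 1) run(): reject the job if the queue is full; incoming connections
    //    just close in that case.
    synchronized boolean run(Runnable job) {
        if (jobs.size() >= maxJobs) return false;
        jobs.addLast(job);
        notify();
        return true;
    }

    // 2) blockingRun(): wait until there is room (used by the Ticker).
    synchronized void blockingRun(Runnable job) throws InterruptedException {
        while (jobs.size() >= maxJobs) wait();
        jobs.addLast(job);
        notify();
    }

    // 3) forceRun(): if the pool is full, fall back on a plain java.lang.Thread
    //    (used for outgoing connections).
    void forceRun(Runnable job) {
        if (!run(job)) new Thread(job).start();
    }
}
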
> > I will not comment on the practice of using shift operators instead
> > of division...
>
> If somebody doesn't know what a bit shift does, they are better off
> learning now.
There is no accounting for taste.  I think that you should make simple things 
simple so that the people who read your code can concentrate on the things 
that are truly complicated.

>
> > 2) Tuning the ratio of open connections and allowed threads.
> > I think the default maxNodeConnections (30) is too big for the default
> > value of maximumThreads (120 which is only 60 real threads).  When the
> > network gets congested, message chains restart more often which
> > seems to eat up more threads.
> >
> > I have been running my node with 10 connections and 240 threads (120 real
> > threads).
> >
> > Thoughts?
>
> I think you are going after symptoms here, clearly we should not
> need 12 threads per incoming connection.
> The problem is (I guess) that
> your living outgoing connections are staying alive and grabbing all the
> threads, hardly allowing any incoming. This effectively creates a
> situation where your node can only be routed to using the already open
> connections (something that has been discussed as a feature, but which I
> think we can agree must be implemented under much more controlled
> circumstances).
>
> What clearly needs to be done is tuning the lifetime of connections.
> Just setting the connectionTimeout down a little may help (why is that
> the one setting nobody is messing with?),
Good point, I will look at it.
> but OpenConnectionManager
> probably needs to be better at pruning out connections (like removing
> redundant connections to the same node) as well.
>
Ok, I will look at this too.
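
Something along these lines is probably where I would start -- purely a
sketch, and none of these names are the real OpenConnectionManager API:

import java.util.*;

// Illustrative only: close redundant open connections to the same node,
// keeping the most recently used connection per peer.
class ConnectionPruner {

    static class Conn {
        String peerIdentity;  // which node this connection goes to
        long   lastUsed;      // last time a message went over it
        void close() { /* tear down the socket */ }
    }

    // Keep one connection per peer (the most recently used); close the rest.
    static void pruneRedundant(List openConnections) {
        Map newest = new HashMap();  // peerIdentity -> most recently used Conn
        for (Iterator it = openConnections.iterator(); it.hasNext();) {
            Conn c = (Conn) it.next();
            Conn best = (Conn) newest.get(c.peerIdentity);
            if (best == null || c.lastUsed > best.lastUsed)
                newest.put(c.peerIdentity, c);
        }
        for (Iterator it = openConnections.iterator(); it.hasNext();) {
            Conn c = (Conn) it.next();
            if (newest.get(c.peerIdentity) != c) {
                c.close();
                it.remove();
            }
        }
    }
}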

-- gj

-- 
Freesites
(0.3) freenet:MSK@SSK@enI8YFo3gj8UVh-Au0HpKMftf6QQAgE/homepage//
(0.4) freenet:SSK@npfV5XQijFkF6sXZvuO0o~kG4wEPAgM/homepage//

_______________________________________________
Devl mailing list
Devl@freenetproject.org
http://lists.freenetproject.org/mailman/listinfo/devl
