[freenet-dev] Out of control load average / New CP code.

Matthew Toseland Mon, 6 Jan 2003 16:48:24 +0000

On Mon, Jan 06, 2003 at 10:51:42AM -0500, Gianni Johansson wrote:
> Matthew:
> I am still seeing load averages of 10-20 and higher for about 6 hours, then 
> my node goes moribund and stops answering incoming FNP requests.
Are you sure this isn't related to the CPAlgoRoutingTable deadlock that
people are hitting on the unstable branch, which causes vast numbers of
threads? I suppose that wouldn't cause load, though.
> 
> The load seems to be unrelated to the amount of data my node is moving.
> 
> Here are a few ideas:
> 0) One explanation I can come up with for the excessive load is that the new 
> CP code is allowing the outbound link crypto to thrash.
> 
> Routing is highly non-linear.  The previous CP code explicitly limited the 
> number of times a reference could be retried per unit time so that nodes that 
> were popular with the routing wouldn't be retried too often; i.e., if a 
> reference has failed once in the last minute, retrying it 5 times in the next 
> 20 seconds probably won't tell us much.  Making those connections could burn 
> a lot of mips, especially if the target node is hanging up after the link 
> crypto.
Whereas backing it off for a short time will not reduce the CP, and will
not be persisted.
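
A minimal sketch, in Java, of the kind of per-reference retry limit
described above. This is not the old CP code: the class name RetryLimiter,
the sliding-window approach and both parameters are assumptions made for
illustration only.

```java
import java.util.ArrayDeque;
import java.util.Deque;
import java.util.HashMap;
import java.util.Map;

// Hypothetical sketch; not Freenet's actual CP or routing code.
final class RetryLimiter {
    private final int maxAttempts;    // attempts allowed per window (assumed knob)
    private final long windowMillis;  // length of the sliding window (assumed knob)
    private final Map<String, Deque<Long>> attempts = new HashMap<>();

    RetryLimiter(int maxAttempts, long windowMillis) {
        this.maxAttempts = maxAttempts;
        this.windowMillis = windowMillis;
    }

    // Returns true and records the attempt if the reference may be tried now;
    // returns false if it has already used up its attempts for this window.
    synchronized boolean tryAcquire(String ref, long nowMillis) {
        Deque<Long> times = attempts.computeIfAbsent(ref, k -> new ArrayDeque<>());
        // Forget attempts that have aged out of the window.
        while (!times.isEmpty() && nowMillis - times.peekFirst() >= windowMillis) {
            times.pollFirst();
        }
        if (times.size() >= maxAttempts) {
            return false; // caller should route to a different reference
        }
        times.addLast(nowMillis);
        return true;
    }
}
```

With maxAttempts = 1 and a 60-second window, a reference tried at t=0 is
refused at t=20s, so the router would move on to another reference rather
than pay for another round of link crypto.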
> 
> You can argue that the CP will be reduced enough eventually to stop the ref 
> from being routed to, but eventually *doesn't matter*.  You have already paid 
> the price in unsuccessful connections to get there.  That price could be very 
> high for a popular node.
Hmmmm.
> 
> 1) There is a hack in the OCM that keeps too many outbound connections from 
> blocking trying to connect to the same host.  It throws 
> ConnectFailedExceptions.  I commented it out as a quick test.  Without it my 
> load average increased without bound until I lost control of the machine and 
> had to power down to regain control.  But it's good to keep it in mind as a 
> source of "fake" ConnectFailedExceptions.
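
A rough sketch, in Java, of the sort of per-host guard described in (1).
It is not the OCM code: PerHostConnectGate, its maxPending limit and the
stand-in ConnectFailedException class below are hypothetical names used
only to illustrate the idea of failing fast instead of blocking.

```java
import java.util.HashMap;
import java.util.Map;

// Hypothetical stand-in for the exception named in the thread.
class ConnectFailedException extends Exception {
    ConnectFailedException(String message) {
        super(message);
    }
}

// Hypothetical sketch; not Freenet's actual OCM.
final class PerHostConnectGate {
    private final int maxPending; // assumed cap on concurrent connects per host
    private final Map<String, Integer> pending = new HashMap<>();

    PerHostConnectGate(int maxPending) {
        this.maxPending = maxPending;
    }

    // Call before starting a connect; fails fast instead of queueing.
    synchronized void enter(String host) throws ConnectFailedException {
        int n = pending.getOrDefault(host, 0);
        if (n >= maxPending) {
            throw new ConnectFailedException("too many pending connects to " + host);
        }
        pending.put(host, n + 1);
    }

    // Call when the connect attempt finishes, whether it succeeded or not.
    synchronized void leave(String host) {
        Integer n = pending.get(host);
        if (n == null) {
            return;
        }
        if (n <= 1) {
            pending.remove(host);
        } else {
            pending.put(host, n - 1);
        }
    }

    synchronized int pendingFor(String host) {
        return pending.getOrDefault(host, 0);
    }
}
```

A caller would wrap each connect in enter(host) / try ... finally
leave(host). Exceptions thrown by enter() are the "fake" failures: the
remote host was never contacted, so they probably should not count against
its CP.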
> 
> 2) Persisting the updated routing table information might be burning mips.  I 
> don't think that the DataObjectRoutingMemory was ever designed to handle a 
> really fast update rate.  Maybe this is fixed. I know Oskar did some work a 
> while back to move the routing info into separate files.
I doubt that it is _that_ heavy now, but I will have a look.
> 
> --gj
> 
> p.s.:
> Do you intend to fix this?
> http://hawk.freenetproject.org/pipermail/devl/2002-December/003217.html
Not until 
a) the current deadlock-thread-runaway problems get fixed
b) I investigate it experimentally a bit more, and 
c) Oskar agrees.


There are other issues involved here, such as how the network adapts to
nodes going offline for more than the backoff time. Thus I cannot know
exactly what the right fix is without first eliminating bugs that might
interfere with its effects, investigating further, and discussing it with
the other two people (you and Oskar) who have some idea what is going
on. Additionally, load averages of 15 are what nice is for.

-- 
Matthew Toseland
toad at amphibian.dyndns.org
amphibian at users.sourceforge.net
Freenet/Coldstore open source hacker.
Employed full time by Freenet Project Inc. from 11/9/02 to 11/1/03
http://freenetproject.org/
