On Mon, Jan 06, 2003 at 10:51:42AM -0500, Gianni Johansson wrote:
> Matthew:
> I am still seeing load averages of 10-20 and higher for about 6 hours,
> then my node goes moribund and stops answering incoming FNP requests.

Are you sure this isn't related to the CPAlgoRoutingTable deadlock that
people are getting, which causes vast numbers of threads (on the unstable
branch)? I suppose that wouldn't cause load though.

> The load seems to be unrelated to the amount of data my node is moving.
>
> Here are a few ideas:
> 0) One explanation I can come up with for the excessive load is that
> the new CP code is allowing the outbound link crypto to thrash.
>
> Routing is highly non-linear. The previous CP code explicitly limited
> the number of times a reference could be retried per unit time so that
> nodes that were popular with the routing wouldn't be retried too often.
> i.e. If a reference has failed once in the last minute, retrying it 5
> times in the next 20 seconds probably won't tell us much. Making those
> connections could burn a lot of mips, especially if the target node is
> hanging up after the link crypto.

Whereas backing it off for a short time will not reduce the CP, and will
not be persisted.

> You can argue that the CP will be reduced enough eventually to stop the
> ref from being routed to, but eventually *doesn't matter*. You have
> already paid the price in unsuccessful connections to get there. That
> price could be very high for a popular node.

Hmmmm.

> 1) There is a hack in the OCM that keeps too many outbound connections
> from blocking trying to connect to the same host. It throws
> ConnectFailedExceptions. I commented it out as a quick test. Without it
> my load average increased without bound until I lost control of the
> machine and had to power down to regain control. But it's good to keep
> it in mind as a source of "fake" ConnectFailedExceptions.
>
> 2) Persisting the updated routing table information might be burning
> mips. I don't think that the DataObjectRoutingMemory was ever designed
> to handle a really fast update rate. Maybe this is fixed. I know Oskar
> did some work a while back to move the routing info into separate files.

I doubt that it is _that_ heavy now, but I will have a look.

> --gj
>
> p.s.:
> Do you intend to fix this?
> http://hawk.freenetproject.org/pipermail/devl/2002-December/003217.html

Not until a) the current deadlock-thread-runaway problems get fixed,
b) I investigate it experimentally a bit more, and c) Oskar agrees.
There are other issues involved here, such as how the network adapts to
nodes going offline for more than the backoff time. So I cannot say
exactly what the right fix is until the bugs that might interfere with
its effects are eliminated, I have investigated further, and I have
discussed it with the other two people (you and Oskar) who have some
idea what is going on.

Additionally, load averages of 15 are what nice is for.

--
Matthew Toseland
toad at amphibian.dyndns.org
amphibian at users.sourceforge.net
Freenet/Coldstore open source hacker.
Employed full time by Freenet Project Inc. from 11/9/02 to 11/1/03
http://freenetproject.org/
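
To make the per-reference retry limit from point 0 concrete, here is a
minimal Java sketch. It is not the actual Freenet CP code: the class name
RetryLimiter, the method mayRetry, and the limit and window values are all
hypothetical, chosen only to illustrate the idea of capping how often one
reference is tried per unit time, so that a popular but failing node does
not cost a fresh round of link crypto on every routing decision.

```java
import java.util.ArrayDeque;
import java.util.Deque;
import java.util.HashMap;
import java.util.Map;

/** Hypothetical sketch: caps connection attempts per reference in a sliding time window. */
class RetryLimiter {
    private final int maxAttempts;    // attempts allowed per window (assumed value)
    private final long windowMillis;  // window length in milliseconds (assumed value)
    private final Map<String, Deque<Long>> attempts = new HashMap<>();

    RetryLimiter(int maxAttempts, long windowMillis) {
        this.maxAttempts = maxAttempts;
        this.windowMillis = windowMillis;
    }

    /** Returns true and records the attempt if ref may be tried at nowMillis. */
    synchronized boolean mayRetry(String ref, long nowMillis) {
        Deque<Long> times = attempts.computeIfAbsent(ref, k -> new ArrayDeque<>());
        // Forget attempts that have aged out of the window.
        while (!times.isEmpty() && nowMillis - times.peekFirst() >= windowMillis) {
            times.pollFirst();
        }
        if (times.size() >= maxAttempts) {
            return false; // over the limit: skip this reference, no connection is opened
        }
        times.addLast(nowMillis);
        return true;
    }
}
```

With new RetryLimiter(1, 60000L), a reference that has been tried once is
refused for the following minute, so the "5 times in the next 20 seconds"
case above is skipped outright. Note that this sketch counts attempts
rather than failures, and keeps its state in memory only; nothing here is
persisted or fed back into the CP.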
