On Thursday 24 January 2008 22:58, Robert Hailey wrote:
>
> On Jan 24, 2008, at 2:56 PM, Matthew Toseland wrote:
>
> > That is certainly possible...
> >
> > What's the fix? Take out the temporal averaging? Or only apply it if
> > the number of peers is over a certain number?
> >
> > I don't like dropping the mechanism completely; historically, ping
> > time has proven to be a good general load indicator (for example, it
> > tends to go way high if there is a major CPU problem, because the
> > threads involved are starved of CPU and therefore get longer ping
> > times).
>
> But isn't that *still* the case? If it is a problem with our node in
> that respect, all our ping times will be way high, and using the
> per-node ping time is equivalent.
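A minimal Java sketch of the two policies being compared: rejecting on the node-wide average ping time versus rejecting only the peer whose own link is slow, plus a combined check. None of these names come from Freenet's code; the class, the peer names, the 1500 ms threshold and all ping figures are hypothetical and chosen only for illustration.

```java
import java.util.LinkedHashMap;
import java.util.Map;

// Hypothetical sketch: node-wide vs per-peer ping-time rejection.
// THRESHOLD_MS and every ping figure below are illustrative assumptions.
public class PingRejection {
    static final double THRESHOLD_MS = 1500.0; // assumed rejection threshold

    /** Node-wide policy: a high overall average rejects requests from every peer. */
    static boolean rejectGlobal(Map<String, Double> peerPingsMs) {
        return mean(peerPingsMs) > THRESHOLD_MS;
    }

    /** Per-peer policy: only the peer whose own link is slow is rejected. */
    static boolean rejectPerPeer(Map<String, Double> peerPingsMs, String peer) {
        Double ping = peerPingsMs.get(peer);
        return ping != null && ping > THRESHOLD_MS;
    }

    /** Combined policy: reject if either the whole node or this link looks overloaded. */
    static boolean rejectEither(Map<String, Double> peerPingsMs, String peer) {
        return rejectGlobal(peerPingsMs) || rejectPerPeer(peerPingsMs, peer);
    }

    /** Plain arithmetic mean of the per-peer ping times; 0 if there are no peers. */
    static double mean(Map<String, Double> values) {
        if (values.isEmpty()) return 0.0;
        double sum = 0.0;
        for (double v : values.values()) sum += v;
        return sum / values.size();
    }

    static void report(String label, Map<String, Double> pings) {
        System.out.println(label + " (mean " + mean(pings) + " ms)");
        for (String peer : pings.keySet()) {
            System.out.printf("  %s global=%b perPeer=%b%n",
                    peer, rejectGlobal(pings), rejectPerPeer(pings, peer));
        }
    }

    public static void main(String[] args) {
        // Case 1: one remote peer is slow, our own node is fine.
        Map<String, Double> remoteProblem = new LinkedHashMap<>();
        remoteProblem.put("A", 120.0);
        remoteProblem.put("B", 150.0);
        remoteProblem.put("C", 9000.0);
        report("remote problem", remoteProblem);

        // Case 2: our own node is CPU-starved, so every link looks slow.
        Map<String, Double> localProblem = new LinkedHashMap<>();
        localProblem.put("A", 4000.0);
        localProblem.put("B", 5000.0);
        localProblem.put("C", 6000.0);
        report("local problem", localProblem);
    }
}
```

With one slow remote peer, the global check rejects all three peers while the per-peer check rejects only C. When every link is slow (the local-CPU case), both checks reject everyone, which is the equivalence Robert describes.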
Except that using the overall ping time may be more accurate, no?

>
> Certainly if a link ping time is too high, it indicates a problem
> between (and including) the two nodes on that link. If the problem is
> with the remote node, surely we only want it to negatively affect that
> node's link (rejects from them, backoff from timeouts towards them).
> Your argument is that it may help if the problem is with us (the local
> node). In practice, if 'we' are the CPU/ping-time problem, our peers
> will back off from us.

They may route to us anyway, because all their other peers are also
overloaded.

Preemptive rejection has several functions:
- To prevent timeouts.
- To propagate the fact that there is a load problem back to the request
  sender's AIMDs without a timeout.
- To force the requestor to reroute, even if that means backing out to
  its requestor with a RouteNotFound.

>
> I'm not suggesting scrapping the stat; I think the proper fix (at
> least for the time being) is making the ping time rejection per-node
> (as committed in r17235)...

There may be an argument for having both, but I'm not convinced that we
should only have the one. I've seen high ping times on startup due to
high CPU usage; the node rapidly recovers. Can this be self-sustaining
on a per-peer basis? Yes, we would have requests coming in from other
peers, but if each side refused to accept requests from the other
because of a high ping time, you would still get a deadlock.

Except it's not a deadlock: it should recover. It doesn't. That's a bug.
Find the bug!

> there is another part to this deadlock I'm
> not understanding, though. It seems like the backed-off status of the
> other peers should eventually clear and be included in the average
> (although the ping average would be slowly dropping from 100x the other
> values). Yes, the remote side is backed off too (from all the
> rejections, and would be for some time...), but it should eventually
> clear, right?

Yes. It should clear pretty quickly.
Unless they are all backed off except for this one node which is the only one which really ought to be backed off.
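The failure mode in that last sentence can be illustrated with a hypothetical Java sketch. It assumes, as Robert's remark about backed-off peers being "included in the average" suggests, that the node-wide average only counts peers that are not backed off. The class, peer names, ping figures and the 1500 ms threshold are illustrative assumptions, not Freenet's actual code or values.

```java
import java.util.ArrayList;
import java.util.List;

// Hypothetical sketch: a node-wide ping average that skips backed-off peers.
// All names and numbers are illustrative assumptions.
public class BackedOffAverage {
    static final double THRESHOLD_MS = 1500.0; // assumed rejection threshold

    static final class Peer {
        final String name;
        final double pingMs;
        boolean backedOff;

        Peer(String name, double pingMs, boolean backedOff) {
            this.name = name;
            this.pingMs = pingMs;
            this.backedOff = backedOff;
        }
    }

    /** Averages ping times over peers that are not backed off; NaN if none are. */
    static double routableAverage(List<Peer> peers) {
        double sum = 0.0;
        int count = 0;
        for (Peer p : peers) {
            if (!p.backedOff) {
                sum += p.pingMs;
                count++;
            }
        }
        return count == 0 ? Double.NaN : sum / count;
    }

    /** Rejects all requests when the routable average exceeds the threshold (NaN > x is false). */
    static boolean rejectAll(List<Peer> peers) {
        return routableAverage(peers) > THRESHOLD_MS;
    }

    public static void main(String[] args) {
        List<Peer> peers = new ArrayList<>();
        peers.add(new Peer("A", 100.0, true));   // healthy, but currently backed off
        peers.add(new Peer("B", 120.0, true));   // healthy, but currently backed off
        peers.add(new Peer("C", 3000.0, false)); // slow; the one that ought to be backed off

        // Only C is counted, so the average is C's own ping time and everything is rejected.
        System.out.println("average=" + routableAverage(peers) + " reject=" + rejectAll(peers));

        // Once the healthy peers' backoff clears, they dilute C and rejection stops.
        for (Peer p : peers) {
            if (!p.name.equals("C")) p.backedOff = false;
        }
        System.out.println("average=" + routableAverage(peers) + " reject=" + rejectAll(peers));
    }
}
```

While A and B are backed off, the average is simply C's ping time, so the node rejects everything, which keeps A and B backed off; only when their backoff clears does the average fall below the threshold.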
