On Jan 31, 2008, at 12:00 PM, Matthew Toseland wrote:

> On Thursday 31 January 2008 17:34, you wrote:
>>
>> On Jan 31, 2008, at 8:41 AM, Matthew Toseland wrote:
>>
>>> We are still getting timeouts. [...]
>>> Any theories about the most likely cause?
>>
>> Considering the rather common occurrence of high-ping opennet
>> peernodes,
>
> Oh?

Every time I look at my opennet peers, I *always* have at least two  
with pings greater than 2 seconds. Right now, one with 4.5 secs, and  
one with 8.9 (the rest are sane).

>> my first suspect is that they are accumulated ping times and
>> coalescing delays.
>
> Is that possible? 2 minute timeout, say 30 hops, it'd have to be ~4 second
> round trip per hop, which doesn't happen much does it?

I don't know, I have seen ping times to peers in excess of 12 seconds.  
How common are the timeouts anyway?
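For reference, the arithmetic above can be sketched as follows. This is only a rough model: it assumes the delays along a path simply add up and that nothing else (queueing, coalescing, retries) contributes. The function names are made up for illustration and are not taken from the Freenet code.

```python
def per_hop_budget(timeout_s, hops):
    """Average round trip per hop that would use up the whole timeout."""
    if hops <= 0:
        raise ValueError("hops must be positive")
    return timeout_s / hops


def hop_where_timeout_hits(timeout_s, round_trips_s):
    """Return the 1-based hop at which the summed delay first exceeds
    the timeout, or None if the whole path fits inside it."""
    total = 0.0
    for hop, rtt in enumerate(round_trips_s, start=1):
        total += rtt
        if total > timeout_s:
            return hop
    return None


# 2 minute timeout over 30 hops, as in the quoted estimate.
print(per_hop_budget(120, 30))  # 4.0

# Hypothetical path: 28 sane peers at 2 s each plus two slow ones.
print(hop_where_timeout_hits(120, [2.0] * 28 + [8.9, 12.0]))
```

Under this model a couple of slow peers on an otherwise sane path do not reach the timeout on their own; it takes a path where most hops are slow.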

>> If this is the case, the only way I am aware of to
>> solve it is to favor nodes with low ping times;
>
> Opennet favours nodes that get successful requests. I want to keep alchemy
> out of it as much as possible, since this is what Oskar has shown to work,
> and anyway it ought to effectively balance all the other factors - if a node
> is too slow it won't generate successful requests.
>
>> I actually already
>> have a patch for that, although in its present incarnation it also
>> favors darknet nodes for routing (easily excised).
>
> :)
>>
>> My only other suspect is a bug in the message/link layer that drops
>> messages.
>
> Entirely possible.
>
> The current link layer sucks, it is in need of a major rewrite. There is one
> major bug which needs to be fixed (we increase the AIMD transfer rate to a
> peer indefinitely while we are not maxing it out, and then get a huge spike
> causing big problems when we do get more traffic to send). But more
> generally, it's not as close to TCP as I'd like it to be, and it has severe
> limits on packets in flight. The packet format is also much more verbose than
> it needs to be.
>
> http://wiki.freenetproject.org/NewTransportLayer
> http://wiki.freenetproject.org/NewPacketFormat
>
> Any such rewrite is highly unlikely to go in before 0.7.0. But if it turns
> out to be relatively urgent it should be done soon after that.
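The AIMD bug described above (the rate keeps growing while the link is not being maxed out) can be illustrated with a toy model. This is a sketch under assumptions, not the Freenet link layer: the class, its parameters, and the `grow_only_when_full` guard are all hypothetical. The guard is one possible fix, similar in spirit to TCP congestion window validation: only grow the window in rounds where it was actually filled.

```python
class AimdWindow:
    """Toy AIMD window, measured in packets per round trip."""

    def __init__(self, initial=4.0, increase=1.0, decrease=0.5,
                 grow_only_when_full=False):
        self.window = initial
        self.increase = increase          # additive increase per round
        self.decrease = decrease          # multiplicative decrease on loss
        self.grow_only_when_full = grow_only_when_full

    def on_round(self, packets_sent, loss=False):
        """Update the window after one round trip and return it."""
        if loss:
            self.window = max(1.0, self.window * self.decrease)
        elif not self.grow_only_when_full or packets_sent >= self.window:
            # The unguarded variant grows here even if the link was idle.
            self.window += self.increase
        return self.window


def window_after_idle(window, idle_rounds=100):
    """Run a mostly idle period: one packet per round, no loss."""
    for _ in range(idle_rounds):
        window.on_round(packets_sent=1)
    return window.window


naive = window_after_idle(AimdWindow())
guarded = window_after_idle(AimdWindow(grow_only_when_full=True))
print(naive, guarded)
```

In the unguarded case the window after the idle period bears no relation to what the path can carry, which is where the spike comes from once real traffic arrives.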
>>
>> In the past while examining the throttle controls, I have suspected
>> that (with priority queues) the "90-seconds at full throttle" constant
>> might actually reduce to taking on too many concurrent chk transfers
>> for them all to complete on time.
>
> Why? IIRC we include a fudge factor in that calculation, admittedly it isn't
> very accurate and should be made more so by using stats on bandwidth usage...

Just that the CHKs all use the same throttle, so they all throttle down
when we accept another CHK transfer.
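A back-of-the-envelope version of that concern, as a sketch. It assumes the throttle is split evenly between transfers, that every transfer is the same size, and that there is no overhead or fudge factor. The 32 KiB transfer size and the 16 KiB/s limit are illustrative numbers only, and the function names are hypothetical.

```python
def completion_time_s(concurrent, transfer_bytes, throttle_bytes_per_s):
    """Time for every transfer to finish when they share one throttle
    evenly: each one gets 1/concurrent of the bandwidth."""
    return concurrent * transfer_bytes / throttle_bytes_per_s


def max_transfers_within(deadline_s, transfer_bytes, throttle_bytes_per_s):
    """Largest number of equal transfers that can all finish in time."""
    return int(deadline_s * throttle_bytes_per_s // transfer_bytes)


TRANSFER = 32 * 1024   # assumed size of one transfer, in bytes
THROTTLE = 16 * 1024   # assumed output limit, in bytes per second
DEADLINE = 90          # the "90 seconds at full throttle" constant

limit = max_transfers_within(DEADLINE, TRANSFER, THROTTLE)
print(limit, completion_time_s(limit, TRANSFER, THROTTLE))
print(completion_time_s(limit + 1, TRANSFER, THROTTLE))
```

Under even sharing, accepting one transfer past the limit pushes all of them past the deadline together, not just the newest one.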

>>> Do timeouts show up in simulation?
>>
>> I don't normally watch for them, I've started a new run with Accepted
>> & Fatal request timeouts being logged. So far nothing.
>
> Ok.

After running the simulator for two hours w/ ten nodes, I spot exactly  
one Accepted timeout (17 minutes into the simulation).

So the answer is yes... timeouts still occur in the simulator.

>>> What can we do to debug this?
>>
>> Probably:
>> (1) a simulation of the high ping times seen in the public network, at
>> about the same rate,
>
> You mean bugs cause high ping times and high ping times cause timeouts?
>
>> (2) a message/link layer stress test complete with rekeying,
>> disconnects, and [busy/not-busy] spikes
>
> This would be a good idea, I dunno how much work would be involved?
>
> What can I usefully work on in this area? AFAICS:
> - The window-grows-while-unused bug.
> - More accurate bandwidth liability limiting.
> - Debug the not-forwarded detection and make assumeNATed false by default.
>   (Reduce baseload bandwidth usage).
>
> Anything else? You want to take any of these on?


I don't think I can take on a big project right now.

--
Robert Hailey

