On Jan 11, 2008, at 12:22 PM, Matthew Toseland wrote:
> On Tuesday 08 January 2008 22:57, Robert Hailey wrote:
>> Jan 08, 2008 20:03:41:146 (freenet.node.RequestSender, RequestSender
>> for UID -3998139406700477577, ERROR): discontinuing non-local request
>> search, general timeout (6 attempts, 3 overloads)
>> ...
>> Jan 08, 2008 20:12:21:226 (freenet.node.RequestSender, RequestSender
>> for UID 60170596711015291, ERROR): discontinuing non-local request
>> search, general timeout (1 attempts, 3 overloads)
>
> Ouch. How common is this?

Most of the requests that took too long (~65%) had asked only one peer,
but this says nothing about how many requests completed correctly.

>> Presently the node will "RejectLoop" if it is one of the last 10000
>> completed requests. My node runs through that many requests in about
>> 16 minutes. With this logging statement it is already shown that a
>> request can last longer than 2 minutes for one peer (and most nodes
>> have 20). If you assume that a request takes 4 minutes (two peers,
>> VERY optimistic), then it would then only take 4 nodes ('near' each
>> other) to generate a request-live-lock (absent the HTL; the request
>> would never drop from the network); each trying two of its other
>> peers, and then the next in the 4-chain.
>
> Okay, this is a problem.
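
To make the arithmetic above concrete, here is a minimal sketch of a
bounded "recently completed" memory and of the two time spans being
compared. The class and method names are hypothetical and are not the
node's actual code; the figures (10000 entries, about 16 minutes, 4 nodes
at 4 minutes each) are the ones quoted above.

```java
import java.util.LinkedHashMap;
import java.util.Map;

/** Hypothetical sketch: a bounded memory of recently completed request UIDs. */
class RecentlyCompletedSketch {
    // Insertion-ordered map used as a set; the oldest UID is evicted first.
    private final Map<Long, Boolean> completed;

    RecentlyCompletedSketch(final int maxEntries) {
        this.completed = new LinkedHashMap<Long, Boolean>() {
            @Override
            protected boolean removeEldestEntry(Map.Entry<Long, Boolean> eldest) {
                // Drop the oldest UID once the memory holds more than maxEntries.
                return size() > maxEntries;
            }
        };
    }

    /** Remember that the request with this UID has completed. */
    void markCompleted(long uid) {
        completed.put(uid, Boolean.TRUE);
    }

    /** True while the UID is still remembered, i.e. a repeat would be rejected as a loop. */
    boolean wouldRejectLoop(long uid) {
        return completed.containsKey(uid);
    }

    /** How long a UID stays remembered at a steady completion rate. */
    static double minutesUntilForgotten(int maxEntries, double completionsPerMinute) {
        return maxEntries / completionsPerMinute;
    }

    /** How long a request takes to come back around a ring of nodes. */
    static double minutesAroundLoop(int nodesInLoop, double minutesPerNode) {
        return nodesInLoop * minutesPerNode;
    }
}
```

With 10000 entries filled in about 16 minutes (roughly 625 completions per
minute), a UID is forgotten after about 16 minutes, which is the same time
a request needs to travel around 4 nodes at 4 minutes each. At that point
the first node no longer recognises the UID and would accept it again.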

On Jan 11, 2008, at 12:42 PM, Matthew Toseland wrote:
> On Wednesday 09 January 2008 17:14, Robert Hailey wrote:
>>
>> I have reverted r16886, as it appears to be based on a misunderstanding
>> of how requests work against the topology of the network (r16980).
>
> I was rather hoping to be talked into accepting 16886 ... we should at
> least have some logging in such cases IMHO.
>
> Requests really shouldn't be taking that long - maybe it's related to the
> HTL problem, maybe we have such a perverse network topology that we are
> resetting HTL time after time after time, I dunno, simulations would be
> interesting.

I'm thinking that message queue priorities will obsolete this problem,  
as the path for responses to requests will solidify nearly  
immediately. Which is to say, we will then see mostly fetch-timeouts,  
not accepted/fatal-timeouts.

>> Instead of canceling the search request, I have put similar logic in
>> RequestHandler to simply not respond to very old requests (r16982),
>
> Ok. Maybe this is where we should have such logging.

Agreed, r17017. I've put it in at 'error' level, so it will be a bit  
chatty until priorities come into play.
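
Roughly, the idea looks like the sketch below. This is not the code from
r16982 or r17017. The class name, the method name and the two-minute age
limit are assumptions made for illustration, and java.util.logging stands
in for the node's own logger.

```java
import java.util.logging.Level;
import java.util.logging.Logger;

/** Hypothetical sketch: decide whether a handler should still answer a request. */
class StaleRequestCheckSketch {
    private static final Logger LOG = Logger.getLogger("StaleRequestCheckSketch");

    // Assumed limit; the thread only says "very old" and gives no exact value.
    private final long maxAgeMillis;

    StaleRequestCheckSketch(long maxAgeMillis) {
        this.maxAgeMillis = maxAgeMillis;
    }

    /**
     * Returns false, and logs at the highest level, when the request is older
     * than the limit, so the caller can skip the response and free its thread.
     */
    boolean shouldRespond(long uid, long receivedAtMillis, long nowMillis) {
        long ageMillis = nowMillis - receivedAtMillis;
        if (ageMillis > maxAgeMillis) {
            LOG.log(Level.SEVERE,
                    "not responding to request {0}: age {1} ms exceeds limit {2} ms",
                    new Object[] {uid, ageMillis, maxAgeMillis});
            return false;
        }
        return true;
    }
}
```

A caller would construct it once, for example with a limit of 120000 ms,
and check shouldRespond(uid, receivedAt, System.currentTimeMillis()) just
before sending the reply.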

>> and not to hang onto the thread if no response is therefore necessary
>> (r16983).
>>
>> r16983 (given the timeout) makes extra calls to rejectedOverload(); I
>> imagine that at the timeout (every 2 minutes while the local request
>> is running) they are benign, but I can code up an infinite wait (as
>> before) if needed.
>
> Well it's not strictly infinite - it's 2min*peers at most. You reverted
> this in 16985.
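
For scale, that bound works out as follows. This is only the arithmetic,
with a hypothetical helper name, using the two-minute per-peer timeout and
the 20 peers mentioned earlier in the thread.

```java
/** Hypothetical sketch: upper bound on how long one node can wait on a request. */
class WorstCaseWaitSketch {
    /** Each peer is tried in turn and may take the full per-peer timeout. */
    static long worstCaseWaitMinutes(int peers, long perPeerTimeoutMinutes) {
        return (long) peers * perPeerTimeoutMinutes;
    }
}
```

With 20 peers at 2 minutes each the bound is 40 minutes, which is well
beyond the roughly 16 minutes it takes my node to cycle through 10000
completed requests.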
>>
>> I'm going to try and stay away from high level changes for a while,
>> but at your suggestion I will look at the HTL code.
>
> You are more than welcome to continue to investigate the timeouts, they
> have been a problem for a long time. Do they still show up in simulation
> with the HTL problem fixed? I'm very happy to have somebody else working
> on core stuff, I'm equally happy to point out any obvious problems with
> their commits. :)

Why thanks!

--
Robert Hailey

