Okay, so the answer to the original question:
* Implement selective accept/unobtanium. When moderately overloaded,
  accept only those queries with the best NGRouting estimates, and QR
  the rest, rather than accepting the first ones to come in. This is
  based on probabilistic accept and NGR (see the sketch after this
  list).
* Abolish termination of requests on receipt of DataNotFound. Rewrite
  the estimators to take this into account, and make any obvious
  corrections needed in other areas. DNF now means "the query was
  rejected for a reason related to the key". Send DNF when we
  selectively reject as above, as well as in the usual ways.
  This has the potential to cause a load explosion; that will be
  managed partly by the above.
* Implement the other half of Ian's load balancing idea. However, we
  want to figure out a way to do this without sacrificing unobtanium.
  One possibility: queue outbound requests briefly, as we would queue
  inbound requests, then send the best ones (to comply with the node's
  rate limit) and make the rest move on to their next-best choice.
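
For concreteness, a minimal sketch of the queue-briefly-and-pick-best
mechanic, in plain Java. None of these class or method names exist in
the tree; the estimate values, the rate limit, and the QR/DNF sending
are stand-ins for whatever we actually wire up:

import java.util.ArrayList;
import java.util.Collections;
import java.util.Comparator;
import java.util.List;

// Toy model only: queries arriving within a short window are buffered,
// sorted by their NGRouting estimate, and only as many as the rate limit
// allows are accepted; the rest are rejected (QR or the new-style DNF) so
// they can move on to the requester's next-best choice.
public class SelectiveAcceptSketch {

    // A pending query with the estimate NGRouting gives it
    // (lower = better, e.g. estimated time to success).
    static class PendingQuery {
        final String key;
        final double estimate;
        PendingQuery(String key, double estimate) {
            this.key = key;
            this.estimate = estimate;
        }
    }

    private final List<PendingQuery> window = new ArrayList<PendingQuery>();

    // Buffer a query for the current window instead of deciding immediately.
    public void enqueue(PendingQuery q) {
        window.add(q);
    }

    // At the end of the window, accept the best maxAccepts queries; the
    // returned list is the rejects, for which the caller sends QR/DNF.
    public List<PendingQuery> flush(int maxAccepts, List<PendingQuery> accepted) {
        Collections.sort(window, new Comparator<PendingQuery>() {
            public int compare(PendingQuery a, PendingQuery b) {
                return Double.compare(a.estimate, b.estimate);
            }
        });
        List<PendingQuery> rejected = new ArrayList<PendingQuery>();
        for (int i = 0; i < window.size(); i++) {
            if (i < maxAccepts) accepted.add(window.get(i));
            else rejected.add(window.get(i));
        }
        window.clear();
        return rejected;
    }
}

The same shape works in both directions: inbound it is the selective
accept, outbound it is the brief queue before we decide which requests
actually go out under the rate limit.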


As far as 0.5.3 goes... reports from the userbase are widely variable and
on the whole discouraging... maybe we should hack on NGR.

On Tue, Dec 02, 2003 at 11:18:46AM -0800, Martin Stone Davis wrote:
> Toad wrote:
> 
> >On Mon, Dec 01, 2003 at 10:52:52PM -0800, Martin Stone Davis wrote:
> >
> >>Toad wrote:
> >>
> >>
> >>>On Sun, Nov 30, 2003 at 10:09:40AM -0800, Martin Stone Davis wrote:
> >>>
> >>>
> >>>>Ian Clarke wrote:
> >>>>
> >>>>
> >>>>
> >>>>>Ok, the new stable build seems to be working quite well, are other 
> >>>>>people experiencing the same thing?
> >>>>>
> >>>>>We need to take stock of the situation with NGR.  I think one problem 
> >>>>>has been a willingness to dream up solutions, and implement them, 
> >>>>>before actually understanding what the problem is.
> >>>>>
> >>>>>I would like to propose that now that the time-pressure is off, we try 
> >>>>>to be more cautious - we need to form theories about what the problem 
> >>>>>is, figure out how to test these theories, and if they prove true, 
> >>>>>*then* we implement a solution.
> >>>>
> >>>>Well, okay, but how does that relate to the plan I outlined 
> >>>>(http://article.gmane.org/gmane.network.freenet.devel/8191)?
> >>>>
> >>>>I understand why you would want more proof that we should drop HTL in 
> >>>>favor of time-to-live (Toad's idea, which I support).  To do so *right 
> >>>>now* would be a big change, since we still have many details to be 
> >>>>filled in (http://article.gmane.org/gmane.network.freenet.devel/8184). 
> >>>>Therefore we should only do it if we're confident that it will cure a 
> >>>>major ill.
> >>>
> >>>
> >>>Unfortunately, we can't. Well, we can't easily. Because there will be a
> >>>nonlinear relationship between pDNF and the time to live value. 
> >>
> >>Right.  That's why I was having difficulty filling in the details on 
> >>your original plan.
> >
> >
> >However, there is a nonlinear relationship between pDNF and the HTL now.
> >We ignore it...
> True.
> 
> >
> >>>One
> >>>possible solution is to use something more sophisticated than an
> >>>estimator (some sort of machine learning algo maybe). 
> >>
> >>Well, it would still be an estimator.  It would just be a more 
> >>sophisticated estimator.
> >>
> >>
> >>>Another solution
> >>>would be to not have a variable timeout. Requests die when they go to
> >>>all *available* nodes (which is limited under load by unobtanium [ only
> >>>accepting requests we have a chance of completing ])...
> >>
> >>Nice.  That certainly would make calculating estimate() easier, since we 
> >>wouldn't have to worry about accounting for time-to-live.  Which, of 
> >>course, is your point.  The other benefit is that it removes an 
> >>alchemical parameter.
> >
> >
> >Definitely.
> >
> >>How does a node decide when a request was killed?  What if a node 
> >>accepts a query and never returns a DNF or a QR?  What if it just keeps 
> >>saying "I'm working on it..."?
> >
> >
> >We have timeouts... not an overall request timeout, but a timeout for
> >each response within the request. For example, if we don't get an
> >Accepted within 1 hop time, we time out and use another node, in the
> >current code.
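
By way of illustration only (these are stand-in names, not the real
fred classes), the per-response timeout amounts to something like:

// Stand-in types; illustrative only.
interface MessageSource {
    // Blocks for up to timeoutMs; returns null if nothing arrived in time.
    Object waitFor(String messageType, long timeoutMs);
}

class AcceptedTimeoutSketch {
    // Returns true if the peer sent Accepted within one hop time;
    // otherwise the caller backs off and routes to the next-best node.
    static boolean waitForAccepted(MessageSource peer, long hopTimeMs) {
        return peer.waitFor("Accepted", hopTimeMs) != null;
    }
}
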
> Okay, I didn't know that.
> 
> >All that is already written, but many parts would have to be rewritten
> >due to no HTL.
> 
> Doesn't the below QUERY_DEAD_FACTOR stuff amount to a sensible rewrite 
> of it then?
> 
> >
> >>I think the best thing we could do is the following: Compute the 
> >>estimatedSearchTimeForThisNode=estimatedSearchTime() as the amount of 
> >>time we expect this node to tell us it has found the data.  Then, once 
> >>any node has not started sending a trailer for longer than 
> >>QUERY_SEARCH_DEAD_FACTOR*estimatedSearchTimeForThisNode milliseconds, we 
> >>start considering whether we should consider the query killed.  To do 
> >>*that*, we check other available nodes and compute estimatedSearchTime() 
> >>for the best available node (still use estimate() to determine which 
> >>node is best).  When timeElapsed > QUERY_SEARCH_DEAD_FACTOR * 
> >>MAX(estimatedSearchTimeForThisNode,estimatedSearchTimeForBestAvailableNode), 
> >>we consider the query dead, and try the best available node.  The only 
> >>alchemy there is the configurable parameter, QUERY_DEAD_FACTOR. (I'd 
> >>recommend we try QUERY_DEAD_FACTOR=2.)
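
To pin the arithmetic down, a rough sketch of that check; the estimator
calls are assumed to come from NGR, and the factor is whatever we settle
on (Martin suggests 2):

// Illustration of the proposed dead-query check; not real code from the tree.
class DeadQuerySketch {

    static final double QUERY_SEARCH_DEAD_FACTOR = 2.0; // Martin's suggested value

    // elapsedMs: time since we routed to the current node.
    // estThisNodeMs: estimatedSearchTime() for the current node.
    // estBestAvailableMs: estimatedSearchTime() for the best node we could
    //   reroute to right now (only worth computing once elapsedMs has already
    //   exceeded QUERY_SEARCH_DEAD_FACTOR * estThisNodeMs).
    static boolean queryLooksDead(long elapsedMs, double estThisNodeMs,
                                  double estBestAvailableMs) {
        double threshold = QUERY_SEARCH_DEAD_FACTOR
                * Math.max(estThisNodeMs, estBestAvailableMs);
        return elapsedMs > threshold;
    }
}
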
> >
> >
> >Hmmm.
> >
> >>In fact, we could use the same sort of philosophy to kill transfers due 
> >>to a too-low transfer rate.  Once 
> >>bytesReceived/timeElapsedSinceSearchSuccess < 
> >>(1/OVERALL_TRANSFER_RATE_DEAD_FACTOR) * 
> >>MAX(estimatedRawTransferRateForThisNode,estimatedEffectiveTransferRateForBestAvailableNode),
> >> 
> >>we kill the transfer and go with the best available node.  (The 
> >>difference between a "raw" transfer rate and an "effective" transfer rate 
> >>is that the "effective" transfer rate accounts for the additional time 
> >>required to search for the key.)
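
Likewise for the transfer-rate version (again purely illustrative; the
raw vs. effective distinction is as defined above):

// Illustration of the proposed kill-the-transfer check; not real code.
class DeadTransferSketch {

    static final double OVERALL_TRANSFER_RATE_DEAD_FACTOR = 2.0; // illustrative value

    // bytesReceived: bytes of the trailer received so far.
    // elapsedMs: time since the search succeeded.
    // rawRateThisNode: estimated raw transfer rate of the current node (bytes/ms).
    // effectiveRateBestNode: estimated effective rate of the best available node,
    //   i.e. including the time it would need to find the key again.
    static boolean transferLooksDead(long bytesReceived, long elapsedMs,
                                     double rawRateThisNode,
                                     double effectiveRateBestNode) {
        if (elapsedMs <= 0) return false;
        double actualRate = bytesReceived / (double) elapsedMs;
        double floor = (1.0 / OVERALL_TRANSFER_RATE_DEAD_FACTOR)
                * Math.max(rawRateThisNode, effectiveRateBestNode);
        return actualRate < floor;
    }
}
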
> >
> >
> >I don't like killing transfers. Unless we have a really good reason.
> 
> The really good reason in this case is that a node which promised us 
> lots and lots of sex is only givin' it up when she's on her period now. 
>   And *also* because there's this hot chick at the office who's 
> promising to do it with us.  Gawd I love those sex metaphors.
> 
> >
> >Besides, why didn't we go with the best available node in the first
> >place?!
> 
> The best available node is the best available one RIGHT NOW.  That 
> changes with time due to backoff.  If the node we are currently 
> querying/transferring with goes south and there's another node 
> available which we think will do _much_ better than the current one 
> *actually is* doing, then we figure it's to our advantage to ditch the 
> current one and go with the best available.  How much is "much" is the 
> configurable factor.
> 
> >
> >>And, instead of using the transfer rate since the beginning of the 
> >>transfer, we might also consider just the RECENT transfer rate 
> >>experience.  That way we don't wait a long time while a node which 
> >>transferred all but the last 1k of a 1MB key stalls forever.  I guess 
> >>that would involve two configurable parameters: the amount of time that 
> >>is considered "recent", and the RECENT_TRANSFER_RATE_DEAD_FACTOR.
> >>
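
A sliding window over recent chunks is one obvious way to do the
"recent rate" (sketch only; the window length is the alchemical bit):

import java.util.LinkedList;

// Toy sliding-window rate tracker: only bytes received within the last
// windowMs count, so a node that sent 99% of the file and then stalled
// is noticed quickly.  Names and structure are illustrative only.
class RecentRateSketch {
    private static class Chunk {
        final long timeMs;
        final int bytes;
        Chunk(long timeMs, int bytes) { this.timeMs = timeMs; this.bytes = bytes; }
    }

    private final long windowMs; // what counts as "recent"
    private final LinkedList<Chunk> chunks = new LinkedList<Chunk>();

    RecentRateSketch(long windowMs) { this.windowMs = windowMs; }

    void received(long nowMs, int bytes) {
        chunks.addLast(new Chunk(nowMs, bytes));
    }

    // Bytes per millisecond over the recent window only.
    double recentRate(long nowMs) {
        while (!chunks.isEmpty() && chunks.getFirst().timeMs < nowMs - windowMs) {
            chunks.removeFirst();
        }
        long total = 0;
        for (Chunk c : chunks) total += c.bytes;
        return total / (double) windowMs;
    }
}

The dead check is then the same comparison as above, with
RECENT_TRANSFER_RATE_DEAD_FACTOR and this recent rate in place of the
overall one.
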
> >>Given the above, doesn't it also mean that we can throw away failure 
> >>tables???  Am I going nuts here?  What would be the use of failure 
> >>tables in such a system?
> >
> >
> >2 things:
> >1. Keys that Frost polls, for example, need to not take up a full
> >search's resources. Hence we DNF when we get a recently DNFed key. We
> >could still do this, with the new meaning of DNF.
> >2. We don't want such keys to influence routing stats and estimators.
> >This is the secondary failure table, discussed a while ago on this list.
> >
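For what it's worth, the recently-DNFed table is about as simple as it
sounds; something like the following (illustrative only):

import java.util.HashMap;
import java.util.Map;

// Toy model of the recently-DNFed table: a key that DNFed within the last
// ttlMs is answered with an immediate DNF, and such requests are kept out
// of the routing stats / estimators.  Names are illustrative only.
class RecentDNFTableSketch {
    private final Map<String, Long> lastDNF = new HashMap<String, Long>();
    private final long ttlMs;

    RecentDNFTableSketch(long ttlMs) { this.ttlMs = ttlMs; }

    void recordDNF(String key, long nowMs) {
        lastDNF.put(key, nowMs);
    }

    // If true, answer with DNF immediately and don't update the estimators.
    boolean shouldShortCircuit(String key, long nowMs) {
        Long t = lastDNF.get(key);
        return t != null && nowMs - t < ttlMs;
    }
}
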
> Okay, I see now.  But at least we won't need to create failure tables 
> for each individual node, right?
> 
> >>If I'm right about all that, then we have all the details needed for 
> >>implementation and we now just need a name for it.  Not hops-to-live, 
> >>not time-to-live...  how about indefinite-to-live, or iTL?  Looks kinda 
> >>Macintoshy.  Hip.
> >
> >
> >How about Flood? :)
> hmm... might be bad for PR. ;)
> 
> >
> >>I've re-convinced myself that we (you) should go ahead and implement 
> >>iTL, ian's backoff, unobtanium, and the new estimate() all at once, ASAP 
> >>after you've gotten 0.5.3 out the door.
> >>
> >>Whatcha think?
> >>
> >>-Martin
> 
> 
> -Martin
> 
> 

-- 
Matthew J Toseland - [EMAIL PROTECTED]
Freenet Project Official Codemonkey - http://freenetproject.org/
ICTHUS - Nothing is impossible. Our Boss says so.


_______________________________________________
Devl mailing list
[EMAIL PROTECTED]
http://dodo.freenetproject.org/cgi-bin/mailman/listinfo/devl
