OK, I think I've found a few errors in how NGRouting operates. Please note that anything I say here takes precedence over my previous message. Also, could someone please check all of this? I can't verify my own sanity :)
First, very minor: line 643 of /node/rt/ResponceTimeEstimator reads:

    if ((pos == 0 && n.compareTo(key[0]) < 0) || (pos == key.length - 1)) {

I'm not so sure about this. Should it be the following?

    if ((pos <= 0 && n.compareTo(key[0]) < 0) || (pos >= key.length - 2)) {

I don't remember well enough whether key[key.length] is valid or not. The <= and >= are just in case the other function changes in the event of the data not being there.

Second, in the guess() function we should do:

    if (n.compareTo(key[pos]) == 0)
        return time[pos];

That way, if we have retrieved this exact key from this node before, we should be able to do so again. Although it might be faster to just check whether we had ever seen the data before at all, and then go straight to that node from the route(...) function in /node/rt/NGRoutingTable; in that case we would not need to calculate the estimates at all.

And in /node/rt/nodeStanderdEstimator.estimate(...), around line 162, replace:

    if (pDNF == 0) pDNF = pLegitDNF;

with:

    if (pDNF < pLegitDNF && pDNF != 0) pDNF = pLegitDNF;

and replace:

    estimate += pNotConnectFailedOrSearchFailed * (pDNF - pLegitDNF) * (tDNF + requestFailTime);

with:

    if (pDNF == 0)
        estimate += pNotConnectFailedOrSearchFailed;
    else
        estimate += pNotConnectFailedOrSearchFailed * (pDNF - pLegitDNF) * (tDNF + requestFailTime);

Now, both of the sections I have just pointed out still have serious problems. First, the /node/rt/ResponceTimeEstimator.guess(...) function only interpolates between the nearest two points. Because it is possible for one point to be a fluke, I would feel better if we averaged more, or did something like this: compare the closest point, the one above it, and the one below it, then of the three average the two that are closest to each other and return that.

Second, and far more importantly: in nodeStanderdEstimator.estimate(...) we should not be using requestFailTime for a DNF at all.
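To make the guess() proposal concrete, here is a hypothetical sketch of both changes: the exact-match early return, plus the three-point comparison so that a single fluke sample cannot dominate the interpolation. The names (key, time, guess, GuessSketch) and the use of plain longs instead of Comparable keys are illustrative; this is not the actual ResponceTimeEstimator code.

```java
// Sketch only: fluke-resistant guess(), assuming key[] is sorted and
// time[] holds the observed response times for the parallel entries.
public class GuessSketch {
    long[] key;   // sorted sample keys
    long[] time;  // observed response times, parallel to key

    GuessSketch(long[] key, long[] time) { this.key = key; this.time = time; }

    // pos = index of the sample nearest to n (e.g. a binary-search result)
    long guess(long n, int pos) {
        if (n == key[pos]) return time[pos]; // seen this exact key before

        // gather the nearest sample and its neighbours
        int lo = Math.max(pos - 1, 0);
        int hi = Math.min(pos + 1, key.length - 1);
        if (hi - lo < 2) { // at an edge: fall back to a plain average
            return (time[lo] + time[hi]) / 2;
        }

        // of the three candidate times, average the two that are closest
        // to each other, so a single outlier (fluke) is discarded
        long dLoMid = Math.abs(time[lo] - time[pos]);
        long dMidHi = Math.abs(time[pos] - time[hi]);
        long dLoHi  = Math.abs(time[lo] - time[hi]);
        if (dLoMid <= dMidHi && dLoMid <= dLoHi) return (time[lo] + time[pos]) / 2;
        if (dMidHi <= dLoMid && dMidHi <= dLoHi) return (time[pos] + time[hi]) / 2;
        return (time[lo] + time[hi]) / 2;
    }
}
```

With samples {100, 500, 110}, the 500 is treated as the fluke and the estimate comes from the two agreeing neighbours instead.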
requestFailTime is the time to get the file successfully from another node if this one should fail. It looks in our global request time table and returns an estimated value for how long this would take for us to get. The problem is that we would not be asked for the file at all if the other node did not think we were their best bet for quickly getting it. Even then, the time to successfully get it is that time plus the time it would take to get back to the original requester.

It is true that we don't always want to include this number, because not all of the data is really in the network, and after so many tries the user is fairly likely never to find it and so will give up. However, properly trying to estimate all this is somewhat difficult. There are several problems:

If the request has to travel back all the way to the original requester, tDNF should be a global value rather than a node-specific one. (Or we could use a node-specific one and then add a global one, but that seems unnecessary.) It should be normalized by HTL and then multiplied by the max HTL. We cannot simply use the unnormalized values, because as requests pass through our node they are, on average, already half way to their destination.

Also, if the requester is then going to re-request the file, we should try to add that time into our estimate. This could be done by replacing requestFailTime (just for DNFs) with a new value created by averaging all of our own request times (or at least a random sample).

The real problem with doing both of these is that it makes finding the probability of a legitDNF (which won't be re-requested) very important. The reason this is so difficult is that if we do fail, we don't know why. Whether we see the re-request or not, we start a new timer, rather than adding to the old one as we should, and as a result this skews all of our time averages.
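The HTL normalization argued for above could look something like the following sketch. All names here (DnfTimeSketch, report, globalTDNF, MAX_HTL) are illustrative assumptions, not existing code: each observed DNF time is divided by the HTL the request ran under, and the global estimate is then scaled back up by the maximum HTL, since requests we relay are on average already part-way to their destination.

```java
// Sketch only: maintain a global, HTL-normalized estimate of the time
// a full-depth DNF takes, instead of a raw node-specific average.
public class DnfTimeSketch {
    static final int MAX_HTL = 20; // illustrative value, not the real constant

    double perHopSum = 0; // running sum of per-hop DNF times
    long samples = 0;

    // record one observed DNF round-trip time and the HTL it ran under
    void report(double tDNF, int observedHtl) {
        perHopSum += tDNF / observedHtl; // normalize by HTL
        samples++;
    }

    // global estimate of a full-depth DNF, scaled to the maximum HTL
    double globalTDNF() {
        if (samples == 0) return 0;
        return (perHopSum / samples) * MAX_HTL;
    }
}
```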
Then we have seen a failure and a success, and if other nodes tend to behave the same way, the first failure is seen as legitimate and we do not get punished.

One thing that could be done to lessen this would be to first check whether the key is a CHK or a KSK/SSK; then we could calculate different failure probabilities based on the key type. What's more, we could use CHKs as a baseline. If we assume there is a near-zero probability of a legitDNF for a CHK, then we can take the probability of DNF for a CHK as the minimum probability of a non-legitDNF for an SSK or KSK at the same key value.

It's easy to say that there is a possibility of lowering estimates too much, because the data may be in the network and unreachable, or the user may not actually re-request it. However, we don't need to worry about the former as long as we don't underestimate the probability of legitDNF (which I don't think is likely, given how it currently works). And the latter is not an issue at all right now, because we can only count the time that a successful request takes, so we will never be estimating more than one re-request.

To the critics on the other side who say "it's always better to find it the first time and be safe": this method will always include a little bit of an overestimate that should even out any inaccuracies. This is because we don't know the original HTL, and we can't use the average, so we have to use the MAX.

_______________________________________________
Devl mailing list
[EMAIL PROTECTED]
http://dodo.freenetproject.org/cgi-bin/mailman/listinfo/devl
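The CHK-baseline idea could be sketched as below. All names are hypothetical, and the whole thing rests on the stated assumption that legitDNFs for CHKs are near zero: the observed CHK DNF rate then acts as a floor on the non-legitDNF probability for an SSK/KSK at the same key value, which in turn bounds how much legitDNF probability we may attribute to it.

```java
// Sketch only: derive per-key-type DNF probability bounds from the
// assumption that a CHK essentially never produces a legitimate DNF.
public class LegitDnfSketch {
    // The CHK DNF rate is the minimum non-legit DNF probability
    // for an SSK/KSK at the same key value.
    static double minNonLegitDNF(double pDNF_CHK) {
        return pDNF_CHK;
    }

    // Consequently, the legitDNF probability for the SSK/KSK is
    // bounded above by the difference of the two observed rates
    // (clamped at zero in case the CHK rate is the larger one).
    static double maxLegitDNF(double pDNF_SSK, double pDNF_CHK) {
        return Math.max(pDNF_SSK - pDNF_CHK, 0);
    }
}
```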