OK, I think I've found a few errors in how NGRouting operates. Please note that anything I say here takes precedence over my previous message. Also, could someone please check all of this? I can't verify my own sanity :)
First, very minor: line 643 of /node/rt/ResponceTimeEstimator reads:

    if ((pos == 0 && n.compareTo(key[0]) < 0) || (pos == key.length - 1)) {

I'm not so sure about this. Should it be the following?

    if ((pos <= 0 && n.compareTo(key[0]) < 0) || (pos >= key.length - 2)) {

I don't remember well enough whether key[key.length] is valid or not. The <= and >= are just in case the other function changes in the event of the data not being there.

Second, in the guess() function we should do:

    if (n.compareTo(key[pos]) == 0)
        return time[pos];

That way, if we have retrieved this exact key from this node before, we should be able to do so again. Although it might be faster to just check whether we had ever seen the data before at all, and then go straight to that node from the route(...) function in /node/rt/NGRoutingTable; in that case we would not need to calculate the estimates at all.

And in /node/rt/nodeStanderdEstimator.estimate(...), around line 162, replace:

    if (pDNF == 0) pDNF = pLegitDNF;

with:

    if (pDNF < pLegitDNF && pDNF != 0) pDNF = pLegitDNF;

and replace:

    estimate += pNotConnectFailedOrSearchFailed * (pDNF - pLegitDNF) * (tDNF + requestFailTime);

with:

    if (pDNF == 0)
        estimate += pNotConnectFailedOrSearchFailed;
    else
        estimate += pNotConnectFailedOrSearchFailed * (pDNF - pLegitDNF) * (tDNF + requestFailTime);

Now, both of the sections I have just pointed out still have serious problems. First, the /node/rt/ResponceTimeEstimator.guess(...) function only interpolates between the nearest two points. Because it is possible for one point to be a fluke, I would feel better if we averaged more, or did something like this: compare the closest point, the one above it, and the one below it, then of the three average the two that are closest to each other and return that.

Second, and far more importantly: in nodeStanderdEstimator.estimate(...) we should not be using requestFailTime for a DNF at all.
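To make the guess() proposal concrete, here is a hypothetical sketch of both changes: the exact-match early return, plus the three-point comparison so that a single fluke sample cannot dominate the interpolation. The names (key, time, guess, GuessSketch) and the use of plain longs instead of Comparable keys are illustrative; this is not the actual ResponceTimeEstimator code.

```java
// Sketch only: fluke-resistant guess(), assuming key[] is sorted and
// time[] holds the observed response times for the parallel entries.
public class GuessSketch {
    long[] key;   // sorted sample keys
    long[] time;  // observed response times, parallel to key

    GuessSketch(long[] key, long[] time) { this.key = key; this.time = time; }

    // pos = index of the sample nearest to n (e.g. a binary-search result)
    long guess(long n, int pos) {
        if (n == key[pos]) return time[pos]; // seen this exact key before

        // gather the nearest sample and its neighbours
        int lo = Math.max(pos - 1, 0);
        int hi = Math.min(pos + 1, key.length - 1);
        if (hi - lo < 2) { // at an edge: fall back to a plain average
            return (time[lo] + time[hi]) / 2;
        }

        // of the three candidate times, average the two that are closest
        // to each other, so a single outlier (fluke) is discarded
        long dLoMid = Math.abs(time[lo] - time[pos]);
        long dMidHi = Math.abs(time[pos] - time[hi]);
        long dLoHi  = Math.abs(time[lo] - time[hi]);
        if (dLoMid <= dMidHi && dLoMid <= dLoHi) return (time[lo] + time[pos]) / 2;
        if (dMidHi <= dLoMid && dMidHi <= dLoHi) return (time[pos] + time[hi]) / 2;
        return (time[lo] + time[hi]) / 2;
    }
}
```

With samples {100, 500, 110}, the 500 is treated as the fluke and the estimate comes from the two agreeing neighbours instead.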
requestFailTime is the time to get the file successfully from another node if this one should fail. It looks in our global request time table and returns an estimated value for how long this would take for us to get. The problem is that we would not be asked for the file at all if the other node did not think we were their best bet for quickly getting it. Even then, the time to successfully get it is that time plus the time it would take to get back to the original requester.

It is true that we don't always want to include this number, because not all of the data is really in the network, and after so many tries the user is fairly likely never to find it and so will give up. However, properly trying to estimate all this is somewhat difficult. There are several problems:

If the request has to travel back all the way to the original requester, tDNF should be a global value rather than a node-specific one. (Or we could use a node-specific one and then add a global one, but that seems unnecessary.) It should be normalized by HTL and then multiplied by the max HTL. We cannot simply use the unnormalized values, because as requests pass through our node they are, on average, already half way to their destination.

Also, if the requester is then going to re-request the file, we should try to add that time into our estimate. This could be done by replacing requestFailTime (just for DNFs) with a new value created by averaging all of our own request times (or at least a random sample).

The real problem with doing both of these is that it makes finding the probability of a legitDNF (which won't be re-requested) very important. The reason this is so difficult is that if we do fail, we don't know why. Whether we see the re-request or not, we start a new timer, rather than adding to the old one as we should, and as a result this skews all of our time averages.
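The HTL normalization argued for above could look something like the following sketch. All names here (DnfTimeSketch, report, globalTDNF, MAX_HTL) are illustrative assumptions, not existing code: each observed DNF time is divided by the HTL the request ran under, and the global estimate is then scaled back up by the maximum HTL, since requests we relay are on average already part-way to their destination.

```java
// Sketch only: maintain a global, HTL-normalized estimate of the time
// a full-depth DNF takes, instead of a raw node-specific average.
public class DnfTimeSketch {
    static final int MAX_HTL = 20; // illustrative value, not the real constant

    double perHopSum = 0; // running sum of per-hop DNF times
    long samples = 0;

    // record one observed DNF round-trip time and the HTL it ran under
    void report(double tDNF, int observedHtl) {
        perHopSum += tDNF / observedHtl; // normalize by HTL
        samples++;
    }

    // global estimate of a full-depth DNF, scaled to the maximum HTL
    double globalTDNF() {
        if (samples == 0) return 0;
        return (perHopSum / samples) * MAX_HTL;
    }
}
```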
Then we have seen a failure and a success, and if other nodes tend to behave the same way, the first failure is seen as legitimate and we do not get punished.

One thing that could be done to lessen this would be to first check whether the key is a CHK or a KSK/SSK; then we could calculate different failure probabilities based on the key type. What's more, we could use CHKs as a baseline. If we assume there is a near-zero probability of a legitDNF for a CHK, then we can take the probability of DNF for a CHK as the minimum probability of a non-legitDNF for an SSK or KSK at the same key value.

It's easy to say that there is a possibility of lowering estimates too much, because the data may be in the network and unreachable, or the user may not actually re-request it. However, we don't need to worry about the former as long as we don't underestimate the probability of legitDNF (which I don't think is likely, given how it currently works). And the latter is not an issue at all right now, because we can only count the time that a successful request takes, so we will never be estimating more than one re-request.

To the critics on the other side who say "it's always better to find it the first time and be safe": this method will always include a little bit of an overestimate that should even out any inaccuracies. This is because we don't know the original HTL, and we can't use the average, so we have to use the MAX.

_______________________________________________
Devl mailing list
[EMAIL PROTECTED]
http://dodo.freenetproject.org/cgi-bin/mailman/listinfo/devl
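The CHK-baseline idea could be sketched as below. All names are hypothetical, and the whole thing rests on the stated assumption that legitDNFs for CHKs are near zero: the observed CHK DNF rate then acts as a floor on the non-legitDNF probability for an SSK/KSK at the same key value, which in turn bounds how much legitDNF probability we may attribute to it.

```java
// Sketch only: derive per-key-type DNF probability bounds from the
// assumption that a CHK essentially never produces a legitimate DNF.
public class LegitDnfSketch {
    // The CHK DNF rate is the minimum non-legit DNF probability
    // for an SSK/KSK at the same key value.
    static double minNonLegitDNF(double pDNF_CHK) {
        return pDNF_CHK;
    }

    // Consequently, the legitDNF probability for the SSK/KSK is
    // bounded above by the difference of the two observed rates
    // (clamped at zero in case the CHK rate is the larger one).
    static double maxLegitDNF(double pDNF_SSK, double pDNF_CHK) {
        return Math.max(pDNF_SSK - pDNF_CHK, 0);
    }
}
```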