Hi,
I decided to take a look at routing to see if it could be improved before NG routing
is implemented. I think what I have come up with works better than the current
code.
I have done a few things. First, I instrumented the module so we can see which
events are occurring. Second, I made IP-level and application-level connects be
counted independently, both globally and per node. Finally, I changed the model used
to estimate CP to use those numbers. Here is how it all works: every event is
assigned a weight, which is used to update the CP as follows:
CP = CP + .1 * (weight - CP)
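
For concreteness, here is a rough standalone sketch of that update (the class and
method names are just for illustration; in the patch it is the nextCP() helper,
with a smoothing factor of 0.1):

    // Sketch of the CP update: CP moves 10% of the way towards the event's weight.
    class CPSketch {
        float cp = 0.4f;                 // starting CP, roughly RC*AC on my test nodes

        void update(float weight) {      // corresponds to nextCP(value) in the patch
            update(weight, 0.1f);
        }

        void update(float weight, float k) {
            cp = cp + k * (weight - cp); // exponential smoothing towards the weight
        }

        public static void main(String[] args) {
            CPSketch s = new CPSketch();
            s.update(1.0f);              // e.g. 0.4 + 0.1*(1.0 - 0.4) = 0.46
            System.out.println(s.cp);
        }
    }
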
let
RC = route_connect_successes / route_connect_trials
AC = application_connect_successes / application_connect_trials
then
routeConnected
increments route_connect_trials
routeAccepted
increments route_connect_successes
routeSucceeded
increments application_connect_successes and application_connect_trials
adjust CP with a weight of 1.0
connectFailed
increments route_connect_trials
adjust CP with a weight of 0.0
authFailed
adjust CP with a weight of 0.0
Note: what we do here does not really matter, since this event does not happen
(I have yet to see it)
timedOut
adjust CP with a weight of (0.05)
Note: this may also increment route_connect_trials
transferFailed
adjust CP with a weight of (0.05)
Note: what we do here does not really matter, since this event does not happen often
verityFailed
adjust CP with a weight of (0.0)
Note: what we do here does not really matter, since this event does not happen
(I have yet to see it)
queryRejected
increments application_connect_trials
if this is a cached connection
adjust CP with a weight of max(AC, 0.01)
else
adjust CP with a weight of max(AC*RC, 0.01)
Note: the minimum is there since, in ARK terms, only a weight of 0.0 is
interesting. Anything else implies we found a node... (a sketch of this
weighting follows the list)
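
Here is a rough sketch of that queryRejected weighting (the parameter names are
illustrative; in the patch the per-node counters live in CPAlgoData, and 0.5 / 0.7
are used as fallbacks before any trials have been recorded):

    // Sketch of the queryRejected weight described above.
    static float queryRejectedWeight(long routeSuccesses, long routeTrials,
                                     long appSuccesses, long appTrials,
                                     boolean cached) {
        float rc = 0.5f;   // fallback RC before any route trials
        float ac = 0.7f;   // fallback AC before any application trials
        if (routeTrials > 0)
            rc = Math.max((float) routeSuccesses / routeTrials, 0.01f);
        if (appTrials > 0)
            ac = Math.max((float) appSuccesses / appTrials, 0.01f);
        // a cached connection never exercised the transport layer, so skip RC
        return cached ? ac : rc * ac;
    }

The result is fed into the CP update above, so a node that keeps rejecting queries
drifts towards its observed connect ratios rather than straight to zero.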
To conserve nodes with lots of references, two things were done. First, on my test
nodes RC = 0.6 and AC = 0.7, so I set the starting CP to 0.4 (roughly RC*AC).
Second, nodes reset the data source to themselves to ask for more traffic, so I
intercept 'reference' and adjust CP towards 1.0 depending on the number of
references the node holds, using:
CP = CP + (.9 - .8*((max_refs - refs)/max_refs)**2)*(1.0 - CP)
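
In code this is roughly the perturbCP() helper in the patch (a standalone sketch
with illustrative parameter names; in the patch max_refs comes from Node.rtMaxRefs):

    // Sketch of the reference-count perturbation described above.
    // A node holding many keys gets pulled strongly towards CP = 1.0,
    // a node with few keys only slightly.
    static float perturbCP(float cp, int refs, int maxRefs) {
        float t = (float) (maxRefs - refs) / maxRefs;
        float k = 0.9f - 0.8f * t * t;   // ranges from 0.1 (no refs) to 0.9 (full)
        return cp + k * (1.0f - cp);
    }
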
The net effect of the above changes is that, using 64 entries in the routing table
with a max of 64 refs, I see an average of 12 refs per node after a few days of
uptime.
Feedback from some busy nodes trying this would be really nice. The patch applies to
6043 and 6503.
Comments?
Ed Tomlinson
---------------
--- CPAlgoRoutingTable.java.orig 2003-06-06 21:42:30.000000000 -0400
+++ CPAlgoRoutingTable.java 2003-06-15 11:23:00.000000000 -0400
@@ -1,4 +1,4 @@
-package freenet.node.rt;
+package freenet.node.rt;
import freenet.BadAddressException;
import freenet.Address;
@@ -66,12 +66,19 @@
this.minDowntimeARKLookup = minDowntimeARKLookup;
this.minCP = minCP;
this.maxLookupThreads = maxLookupThreads;
+ this.cachedAttempts = 0;
+ this.cachedSuccesses = 0;
+ aFailed = 0;
+ aTimedOut = 0;
+ tFailed = 0;
}
int consecutiveFailuresLookupARK;
int minDowntimeARKLookup;
float minCP;
int maxLookupThreads;
+ int cachedAttempts, cachedSuccesses;
+ int aFailed, aTimedOut, tFailed;
public RTDiagSnapshot getSnapshot() {
long nowMs = System.currentTimeMillis();
@@ -80,6 +87,7 @@
totalRefs = 0;
contactedRefs = 0;
requestingRefs = 0;
+ totalKeys = 0;
// ^^^ the fact that you have to do this suggests these
// should be local variables not instance variables -tc
@@ -111,6 +119,7 @@
}
ReferenceSet refSet = ReferenceSet.getProperty(memory, "keys");
+ totalKeys += refSet.size();
Enumeration refs = refSet.references();
while (refs.hasMoreElements()) {
if(shouldLog)
@@ -153,10 +162,16 @@
Object[] props = new Object[TABLE_PROPERTIES.length];
props[0] = new Integer(totalRefs);
props[1] = new Integer(contactedRefs);
- props[2] = new Integer(requestingRefs);
- props[3] = new Integer(totalAttempts);
- props[4] = new Integer(totalSuccesses);
- props[5] = getClass().getName();
+ props[2] = new Integer(totalKeys);
+ props[3] = new Integer(requestingRefs);
+ props[4] = new Integer(totalAttempts);
+ props[5] = new Integer(totalSuccesses);
+ props[6] = new Integer(cachedAttempts);
+ props[7] = new Integer(cachedSuccesses);
+ props[8] = new Integer(aFailed);
+ props[9] = new Integer(aTimedOut);
+ props[10] = new Integer(tFailed);
+ props[11] = getClass().getName();
Key[] keyList = (Key[])keys.toArray(new Key[keys.size()]);
if(shouldLog)
@@ -195,21 +210,11 @@
}
}
-
- private final synchronized void incTrials(RoutingMemory mem) {
- CPAlgoData cpd = getProperty(mem, RMKEY, node);
- cpd.trials++;
- // We don't call mem.setProperty() because trials isn't
- // persisted.
- }
-
- private final synchronized void fail(RoutingMemory mem) {
- fail(mem, false);
- }
-
- private final synchronized void fail(RoutingMemory mem, boolean weak) {
+ private final synchronized void fail(RoutingMemory mem, float done, boolean cached) {
CPAlgoData cpd = getProperty(mem, RMKEY, node);
- if (cpd.decreaseContactProbability(weak)) {
+ if (done == 1.0f)
+ cpd.increaseContactProbability(done);
+ else if (cpd.decreaseContactProbability(done)) {
remove(mem, null);
return;
}
@@ -221,52 +226,30 @@
remove((RoutingMemory)(routingStore.getNode(i)), i);
}
- private final synchronized void succeed(RoutingMemory mem, boolean cached,
- boolean attenuated, boolean weak) {
- CPAlgoData cpd = getProperty(mem, RMKEY, node);
- cpd.successes++;
- if (!cached) {
- // Uncached connections don't count.
- cpd.increaseContactProbability(weak);
- }
- mem.setProperty(RMKEY, cpd);
- }
-
- /**
- * Succeed and/or fail on weak or strong in a single step
- * @param strongSuccess whether we strongly (connection level) succeded
- * @param weakSuccess whether we weakly (query level) succeeded
- * @param ignoreStrong whether to leave the strong component unchanged
- * - generally set when we are dealing with a cached connection
+ /*
+ * Nodes set references to themselves when they want more
+ * traffic - lets take that into account. This should help
+ * conserve nodes with lots of keys, which should lead to
+ * better route selections.
*/
- protected final synchronized void succeedFail(RoutingMemory mem,
- boolean strongSuccess,
- boolean weakSuccess,
- boolean ignoreStrong) {
- CPAlgoData cpd = getProperty(mem, RMKEY, node);
- if(!ignoreStrong) {
- if(!strongSuccess) {
- if(cpd.decreaseContactProbability(false)) {
- remove(mem, null);
- return;
- }
- } else {
- cpd.increaseContactProbability(false);
+ public synchronized void reference(Key k, NodeReference nr) {
+ super.reference(k, nr);
+ RoutingMemory mem = routingStore.getNode(nr.getIdentity());
+ if (mem != null) {
+ ReferenceSet refSet = ReferenceSet.getProperty(mem, "keys");
+ CPAlgoData cpd = getProperty(mem, RMKEY, node);
+ if (refSet.size() > 1) {
+ cpd.perturbCP(refSet.size());
+ mem.setProperty(RMKEY, cpd);
+ Core.logger.log(this, "key reference perturbed CP",
+ Core.logger.DEBUG);
}
- }
- if(weakSuccess) {
- cpd.increaseContactProbability(true);
} else {
- if(cpd.decreaseContactProbability(true)) {
- remove(mem, null);
- return;
- }
+ Core.logger.log(this, "key reference found null mem",
+ Core.logger.DEBUG);
}
- if(weakSuccess || strongSuccess)
- cpd.successes++;
- mem.setProperty(RMKEY, cpd);
}
-
+
protected boolean isRoutable(RoutingMemory mem,
boolean desperate) {
return isRoutable(mem, desperate, "");
@@ -332,43 +315,96 @@
length>500 ? Core.logger.MINOR : Core.logger.DEBUG);
}
}
-
+
+ // 33% done.
protected synchronized void routeConnected(RoutingMemory mem) {
- incTrials(mem);
- totalAttempts++;
+ synchronized(mem) {
+ CPAlgoData cpd = getProperty(mem, RMKEY, node);
+ cpd.trials++;
+ totalAttempts++;
+ cpd.state = 1;
+ }
}
+ // 66% done (routed at ip, auth ok)
protected synchronized void routeAccepted(RoutingMemory mem) {
- // hmm.. what?
+ synchronized(mem) {
+ CPAlgoData cpd = getProperty(mem, RMKEY, node);
+ if (cpd.state == 1) {
+ cpd.successes++;
+ totalSuccesses++;
+ cpd.state = 0;
+ }
+ }
}
+ // 100% done here
protected synchronized void routeSucceeded(RoutingMemory mem, boolean cached) {
- // Don't reward cached connections.
- succeedFail(mem, true, true, cached);
- totalSuccesses++;
+ synchronized(mem) {
+ CPAlgoData cpd = getProperty(mem, RMKEY, node);
+ if (!cached && cpd.state == 1) {
+ cpd.successes++;
+ totalSuccesses++;
+ cpd.state = 0;
+ }
+ cpd.cTrials++;
+ cpd.cSuccesses++;
+ cachedAttempts++;
+ cachedSuccesses++;
+ }
+ fail(mem,1.0f,cached);
}
+ // 0% done (did not get routeConnected)
protected synchronized void connectFailed(RoutingMemory mem) {
- incTrials(mem);
- totalAttempts++;
- fail(mem);
+ synchronized(mem) {
+ CPAlgoData cpd = getProperty(mem, RMKEY, node);
+ cpd.trials++;
+ totalAttempts++;
+ cpd.state = 0;
+ }
+ aFailed++;
+ fail(mem,0.0f,false);
}
+ // 0% done if this happens (have routeConnected but routeAccepted cannot occur)
protected synchronized void authFailed(RoutingMemory mem) {
- fail(mem);
+ synchronized(mem) {
+ CPAlgoData cpd = getProperty(mem, RMKEY, node);
+ cpd.state = 0;
+ }
+ fail(mem,0.0f,false);
// don't deref it completely in case this is corruption or something
- // don't ignore it because we want to clear out nodes that change identity reasonably quickly
+ // don't ignore it because we want to clear out nodes that change identity reasonably quickly
}
+ // 5% done - seems only to be called during setup of the route. We give it 5% since
+ // we did find something, but it was too busy to respond
protected synchronized void timedOut(RoutingMemory mem) {
- fail(mem);
+ synchronized(mem) {
+ CPAlgoData cpd = getProperty(mem, RMKEY, node);
+ if (cpd.state == 0) {
+ cpd.trials++;
+ totalAttempts++;
+ }
+ cpd.state = 0;
+ }
+ aTimedOut++;
+ fail(mem,0.05f,false);
}
+ // 5% done - suspect this happens after routeSucceeded, no counts are affected. This
+ // does not happen enough to really matter...
protected synchronized void transferFailed(RoutingMemory mem) {
- fail(mem);
+ synchronized(mem) {
+ CPAlgoData cpd = getProperty(mem, RMKEY, node);
+ cpd.state = 0;
+ }
+ tFailed++;
+ fail(mem,0.05f,false);
}
- // this is not a tyop!
+ // this is not a tyop! No fail() call since we are removing the node...
protected synchronized void verityFailed(RoutingMemory mem) {
// dereference node
remove(mem, null);
@@ -404,6 +440,8 @@
// Reduce the chances of routing to nodes that always
// replies with QueryRejected.
protected synchronized void queryRejected(RoutingMemory mem, boolean cached, long attenuation) {
+ float rp = 0.5f;
+ float cp = 0.7f;
// I would have preferred to make queryRejected() in addition
// to routeSucceeded() to keep transport and application
@@ -412,9 +450,29 @@
// Counter-intutitive. The routing has succeeded at the
// transport layer, but we got a QueryRejected
// back at the application layer.
- //
- succeedFail(mem, true, false, cached);
- totalSuccesses++;
+
+ synchronized(mem) {
+ CPAlgoData cpd = getProperty(mem, RMKEY, node);
+ if (!cached && cpd.state == 1) {
+ cpd.successes++;
+ totalSuccesses++;
+ cpd.state = 0;
+ }
+ cpd.cTrials++;
+ cachedAttempts++;
+
+ // we bottom out at 0.01 since this is not a routing failure
+ if (cpd.trials > 0)
+ rp = java.lang.Math.max(((float) cpd.successes) / ((float) cpd.trials), 0.01f);
+
+ if (cpd.cTrials > 0)
+ cp = java.lang.Math.max(((float) cpd.cSuccesses) / ((float) cpd.cTrials), 0.01f);
+ }
+
+ if (cached)
+ fail(mem, cp, cached); // it was cached, next try probably will be too
+ else
+ fail(mem, rp*cp, cached); // we started from scratch, probably will next time too
// Disable routing attenuation. 20020421 --gj
//
@@ -452,9 +510,10 @@
"ARK version", "Fetching ARK", "ARK URL"};
private final static String[] TABLE_PROPERTIES
- = {"Number of node references", "Contacted node references",
- "Node references requesting ARKs", "Total Trials",
- "Total Successes", "Implementation" };
+ = {"Number of node references", "Contacted node references", "Total Reference
Keys",
+ "Node references requesting ARKs", "Route Trials",
+ "Route Successes", "Cached Trials", "Cached Successes",
+ "Route Failures", "Route TimedOut", "TransferFailed", "Implementation" };
// ^^^ i don't understand why you are doing this here
// instead of just passing the NodeReference back -tc
@@ -492,6 +551,7 @@
// Diagnostic stats set by makeRefInfo
private int totalRefs = 0;
private int requestingRefs = 0;
+ private int totalKeys = 0;
private int contactedRefs = 0;
private final Long LONG_MINUS_ONE = new Long(-1);
@@ -634,12 +694,17 @@
**/
class CPAlgoData implements DataObject {
- // persisted.
- float contactProbability = 1.0f;
+ // persisted. Initial is rp=0.55 x cp=0.70 or about 0.4
+ float contactProbability = 0.4f;
+
+ // better, but not foolproof timeout handling (0=initial, 1=routed)
+ int state = 0;
// Diagnostic info
long successes = 0;
long trials = 0;
+ long cSuccesses = 0;
+ long cTrials = 0;
// Number of consecutive failures
long consecutiveFailures = 0;
@@ -745,50 +810,57 @@
}
return contactProbability;
}
+
+ final void perturbCP(int keys) {
+ float t = (float)(Node.rtMaxRefs-keys)/(float)Node.rtMaxRefs;
+ nextCP(1.0f, 0.9f-0.8f*t*t);
+ }
+
+ final void nextCP(float value) {
+ nextCP(value, 0.1f);
+ }
+ final void nextCP(float value, float k) {
+ contactProbability = contactProbability + k*(value - contactProbability);
+ }
+
// returns true if the node should be dereferenced.
- final boolean decreaseContactProbability(boolean weak) {
- contactProbability =
- (float)(weak ? ((96.0 * contactProbability) / 100.0) :
- (((4.0 * contactProbability) /* + 0.0 */) / 5.0));
- // Decrease by 4% for weak, 20% for strong
- if(contactProbability < minCP) contactProbability = minCP;
- consecutiveFailures++;
- // When consecutiveFailures reaches some value, start an ARK lookup
- if(consecutiveFailures >= consecutiveFailuresLookupARK &&
- lastSuccessfulContact + minDowntimeARKLookup <
- System.currentTimeMillis()) {
- synchronized(lookupLock) {
- if(node == null) throw new IllegalStateException
+ final boolean decreaseContactProbability(float done) {
+ nextCP(done);
+ // When consecutiveFailures reaches some value, start an ARK lookup and
+ // it's only a failure, in ARK terms, if the passed probability is zero.
+ if(done > 0f)
+ consecutiveFailures = 0;
+ else {
+ consecutiveFailures++;
+ if (consecutiveFailures >= consecutiveFailuresLookupARK &&
+ lastSuccessfulContact + minDowntimeARKLookup <
+ System.currentTimeMillis()) {
+ synchronized(lookupLock) {
+ if(node == null) throw new IllegalStateException
("creating LookupARK with null node");
- if(lookup == null && getARKURI() != null) {
- synchronized(lookupThreadsLock) {
- if(lookupThreads < maxLookupThreads)
- lookup = new LookupARK(getARKURI());
- }
+ if(lookup == null && getARKURI() != null) {
+ synchronized(lookupThreadsLock) {
+ if(lookupThreads < maxLookupThreads)
+ lookup = new LookupARK(getARKURI());
+ }
// FIXME: Kill the LookupARK whose ref is least recently used
+ }
}
- }
+ }
}
lastRetryMs = System.currentTimeMillis();
return false;
}
- final void increaseContactProbability(boolean weak) {
+ final void increaseContactProbability(float done) {
if(justFetchedARK) {
node.diagnostics.occurrenceCounting("successLookupARK",1);
justFetchedARK = false;
}
- if(contactProbability < minCP) contactProbability = minCP;
+ nextCP(done);
+ lastSuccessfulContact = lastRetryMs = System.currentTimeMillis();
consecutiveFailures = 0;
- float f = (float)
- (weak ? (((19 * contactProbability) + 1.0) / 20.0) :
- (((3.0 * contactProbability) + 1.0) / 4.0));
- // Move closer to 1.0 by 5% for weak
- // 25% for strong
- if(f>1) f = 1;
- contactProbability = f;
- lastRetryMs = lastSuccessfulContact = System.currentTimeMillis();
synchronized(lookupLock) {
if(lookup != null) {
lookup.kill();
@@ -796,7 +868,7 @@
}
}
}
-
+
final boolean fetchingARK() {
return lookup != null;
}
----------------