I uploaded some data extracted from Serapis simulations:

http://student.nada.kth.se/~md98-osa/nodes_30_days_low_caching
http://student.nada.kth.se/~md98-osa/documents_30_days_low_caching

This simulation ran for 2592011 seconds (30 days). 1306 nodes were in the
network at the end (1726 total over the run); 51713 docs were inserted, of
which 11903 docs (6491768414 bytes) were still in the request pool at the
end. 332292 requests were successful, 185069 failed (a 64% success rate).
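
The quoted success rate can be checked directly from the two request counts above:

```python
# Sanity check of the quoted success rate (figures taken from the text above).
successes = 332292
failures = 185069
rate = successes / (successes + failures)
print(f"{rate:.1%}")  # about 64.2%, matching the quoted 64%
```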

The number of documents kept in the request pool at any time was about
10% of the total capacity of the network. The size of the network was
set as a population of 2000 people: anybody outside the network would
join after an exponentially distributed time with a mean of 20 days, and
anybody in the network would leave after an exponentially distributed
time with a mean of 60 days.
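
The churn model above can be sketched as follows; this is a minimal illustration assuming independent exponential join/leave delays per person (the function names and the seed are mine, not Serapis's):

```python
import random

POPULATION = 2000          # total population of potential nodes
MEAN_JOIN = 20 * 86400.0   # mean delay before an offline person joins (seconds)
MEAN_LEAVE = 60 * 86400.0  # mean time an online node stays before leaving (seconds)

def next_join_delay(rng: random.Random) -> float:
    """Sample the delay until a person outside the network joins."""
    return rng.expovariate(1.0 / MEAN_JOIN)

def next_leave_delay(rng: random.Random) -> float:
    """Sample the delay until a node in the network leaves."""
    return rng.expovariate(1.0 / MEAN_LEAVE)

rng = random.Random(42)
print(next_join_delay(rng), next_leave_delay(rng))
```

With a 20-day mean join time and a 60-day mean lifetime, the steady-state fraction online works out to roughly 60/(20+60) = 75% of the population, consistent with ~1300 of 2000 being up at the end.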

This network was caching rather little, with a 4/(4 +
hopsSinceDSource^2) chance of caching the data. I ran a parallel
simulation on another machine that cached more (10 instead of 4 as the
constant), with marginally worse results. There was no load balancing
other than for nodes with datastores that were not full: when the
fullnessRatio of the store was < .82, the chance of resetting the
DataSource was (1 - fullnessRatio)^2 (otherwise 1/30, as we use in the
field).
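
The two formulas above, written out as a sketch (the constants 4, .82 and 1/30 come straight from the text; the function names are mine):

```python
def cache_probability(hops_since_source: int, k: float = 4.0) -> float:
    """Chance a node caches passing data: k / (k + hopsSinceDSource^2).
    The parallel run used k = 10 and did marginally worse."""
    return k / (k + hops_since_source ** 2)

def reset_probability(fullness_ratio: float) -> float:
    """Chance of resetting the DataSource; nodes with spare room reset more often."""
    if fullness_ratio < 0.82:
        return (1.0 - fullness_ratio) ** 2
    return 1.0 / 30.0

print(cache_probability(0))   # 1.0 right at the data source
print(cache_probability(4))   # 0.2 four hops out
print(reset_probability(0.5)) # 0.25 for a half-full store
```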

HopsToLive on requests and announcements was 20.
In the first document:

- Created is when the node joined the network, in seconds from the start
(though a bug divided it by 100 here, so it's actually in hundreds of
seconds).

- Id is just the node's Id in the sim; Retrieves shows the number of
retrieves (DataRequests) handled, Inserts the number of inserts, and
Announcements the number of announcements.

- FoundData shows the number of times a Retrieve found the data it was
looking for at that node.

- Traffic shows the total number of queries the node received per
second. The background activity for any node (the rate of requests
normally started there) is about 2.5e-4, so any node with that value or
less received few or no requests externally routed to it.

- Spread shows the number of retrieves received, as a distribution over
the 4 highest-order bits of the requested key (a bug cut off the first
value).

In the second document:

- Created shows the time the document was inserted (actually in seconds
this time).

- Document shows the key value in hex, and the size of the data.

- Success shows the number of times it was retrieved successfully.

- Failed shows the number of times the document failed to be retrieved.

- Nodes shows the number of nodes the document was cached in when the
simulation finished.

- Mean (deriv (?)) shows the mean and standard deviation of the number
of hops into the network at which the data was found, when it was found.

- Probability shows the probability that the data would be used if it
came up in the pool when the simulation finished. This value increases
slightly with successful requests, decays slightly with failed ones, and
decreases toward 0 with time (this is probably not an ideal mathematical
model, but I'm not sure what it should be). It starts off normally
distributed around .5 with a small deviation, bounded by [0,1].
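
One way the described dynamics could look in code. The text admits the exact model is ad hoc, so the increment sizes (0.01), the initial deviation (0.1), and the decay rate here are illustrative guesses of mine; only the qualitative behaviour (up on success, down on failure, decay toward 0, clamped to [0,1]) is from the text:

```python
import math
import random

def initial_probability(rng: random.Random) -> float:
    """Start normally distributed around 0.5 with a small (assumed 0.1) deviation,
    clamped to [0, 1]."""
    return min(1.0, max(0.0, rng.gauss(0.5, 0.1)))

def update(p: float, *, success: bool = False, failure: bool = False,
           dt: float = 0.0, decay_rate: float = 1e-7) -> float:
    """One step of the described dynamics (all constants are illustrative):
    a small boost per success, a small penalty per failure, and exponential
    decay toward 0 over elapsed time dt (seconds)."""
    if success:
        p += 0.01 * (1.0 - p)
    if failure:
        p -= 0.01 * p
    p *= math.exp(-decay_rate * dt)  # drift toward 0 with time
    return min(1.0, max(0.0, p))
```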


What is interesting to note is that earlier results from Theo regarding
the formation of two classes of nodes (class A nodes that get routed to,
and class B nodes that don't) are reflected here. Of the ~1200 nodes
present at the end of this simulation, only 299 show traffic
significantly greater than the background activity, and for these the
entire delta of requests is almost without exception confined to a very
small range of the keyspace. The oldest nodes are almost all Class A,
but they are not alone in that: see for example node 848 and contrast it
with the nodes created at around the same time.
It's possible that load balancing further, by resetting the datasource
more often on nodes with low activity (and vice versa), would help, but
it could also be that the announcement protocol is to blame and many of
these nodes never get a chance in the first place (it can be noted that
in earlier runs, increasing the announcement HTL from 20 to 40 did not
seem to change much, though adding the DataSource fill-ratio thing did
help a bit).
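
The class split described above amounts to a simple threshold test on the Traffic column. A sketch, assuming the 2.5e-4 background rate from the text; the margin factor is an arbitrary choice of mine for "significantly greater":

```python
BACKGROUND = 2.5e-4  # requests/s any node starts locally (from the text)
MARGIN = 2.0         # assumed factor for "significantly greater than background"

def classify(traffic: float) -> str:
    """Class A nodes receive externally routed requests; Class B nodes see
    little beyond their own background activity."""
    return "A" if traffic > MARGIN * BACKGROUND else "B"

# Hypothetical traffic values for three nodes:
for node_id, traffic in [(848, 4.1e-3), (901, 2.4e-4), (1320, 6.0e-4)]:
    print(node_id, classify(traffic))
```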

The documents output is harder to interpret. The low number of hops used
on the successful requests is interesting, especially as it holds even
for those documents that failed a large number of times (see for example
#923e7789fe1cfc41 - one would at least expect a greater deviation value
on such a document). It can also be noted that the majority of the
failures come from documents that are cached in significantly fewer
places in the network than most (for example #7df76f96ec18c469) - I'm
not sure what the reason for this could be. Another weird thing is that
the documents at the bottom, which were inserted right before the end of
the sim and never requested, are all stored in 10 or 11 nodes, even
though they were inserted with HTL 20 (I suspected a bug, but it looks
correct) - it could be a symptom of steep requests: half the hops are
spent looping back and forth to the same Class A node, which is found
almost right away (the high "gravity" of a Class A node could also
explain the failed requests - they get routed to one Class A node
covering that keyspace, but the data is in another node covering the
same keyspace, and they cannot break free).

I don't know if this interests anybody. Serapis is running quite well
now; simulations like this only take a couple of hours on the nearest
PIII 600, and use around 72 megs of RAM. I'll run a 90-day simulation
while I sleep to see what happens.

-- 
'DeCSS would be fine. Where is it?'
'Here,' Montag touched his head.
'Ah,' Granger smiled and nodded.

Oskar Sandberg
oskar at freenetproject.org

_______________________________________________
Devl mailing list
Devl at freenetproject.org
http://lists.freenetproject.org/mailman/listinfo/devl