I uploaded some data extracted from Serapis simulations:

http://student.nada.kth.se/~md98-osa/nodes_30_days_low_caching
http://student.nada.kth.se/~md98-osa/documents_30_days_low_caching
This simulation ran for 2592011 seconds (30 days). 1306 nodes remained at the end (1726 total over the run); 51713 docs were inserted, of which 11903 docs (6491768414 bytes) were still in the request pool at the end. 332292 requests were successful and 185069 failed (64% success). The number of documents kept in the request pool at any time was about 10% of the total capacity of the network. The size of the network was set as a population of 2000 people: anybody outside the network would join after an exponentially distributed wait with a mean of 20 days, and anybody in the network would leave after an exponentially distributed wait with a mean of 60 days. This network was caching rather little, with a probability of 4/(4 + hopsSinceDSource^2) of caching the data. I ran a parallel simulation on another machine that cached more (10 instead of 4 as the constant), with marginally worse results. There was no load balancing other than for nodes with datastores that were not full: when the fullnessRatio of the store was < .82, the chance of resetting the DataSource was (1 - fullnessRatio)^2 (otherwise 1/30, as we use in the field). HopsToLive on requests and announcements was 20.

In the first document:

- Created is when the node joined the network, in seconds from the start (though a bug divided it by 100 here, so it is in hundreds of seconds).
- Id is just the node's Id in the sim; Retrieves shows the number of retrieves (DataRequests) handled, Inserts the number of inserts, Announcements the number of announcements.
- FoundData is the number of times a Retrieve found the data it was looking for at that node.
- Traffic shows the total number of queries the node received per second; the background activity for any node (the number of requests normally started there) is about 2.5e-4, so any node with that or less received few or no requests externally routed to it.
- Spread shows the number of retrieves received, as a distribution over the 4 highest-order bits of the requested key (a bug cut off the first value).
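The two probabilities described above can be sketched as follows. This is a minimal illustration of the formulas as stated, not the Serapis source; the function names are mine.

```python
def cache_probability(hops_since_data_source, constant=4):
    """P(cache) = c / (c + hopsSinceDSource^2).

    c = 4 in this run; the parallel run used c = 10 (caching more),
    with marginally worse results.
    """
    return constant / (constant + hops_since_data_source ** 2)


def reset_data_source_probability(fullness_ratio):
    """Chance of resetting the DataSource on a node.

    (1 - fullnessRatio)^2 while the store is under 82% full,
    otherwise the fielded constant of 1/30.
    """
    if fullness_ratio < 0.82:
        return (1 - fullness_ratio) ** 2
    return 1 / 30


# Data found one hop from its source is cached with probability 4/5:
assert cache_probability(1) == 0.8
```

Note how quickly the cache probability falls off: at 10 hops from the data source it is already 4/104, which is what makes this a "low caching" run.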
In the second document:

- Created shows the time the document was inserted (actually in seconds this time).
- Document shows the key value in hex, and the size of the data.
- Success shows the number of times it was retrieved successfully.
- Failed shows the number of times the document failed to be retrieved.
- Nodes shows the number of nodes the document was cached in when the simulation finished.
- Mean (deriv (?)) shows the mean and standard deviation of the number of hops into the network at which the data was found, when it was.
- Probability shows the probability that the data would be used if it came up in the pool when the simulation finished. This value increases slightly with successful requests, decays slightly with failed ones, and decreases toward 0 with time (this is probably not an ideal mathematical model, but I'm not sure what would be). It starts off normally distributed around .5 with a small deviation, bounded by [0,1].

What is interesting to note is that the earlier results from Theo regarding the formation of two classes of nodes - Class A nodes that get routed to, and Class B nodes that don't - are reflected here. Of the ~1300 nodes present in this simulation at the end, only 299 show traffic significantly greater than the background activity, and for these the entire delta of requests is almost without exception concentrated in a very small range of the keyspace. The oldest nodes are almost all Class A, but they are not alone: see for example node 848, and contrast it with the nodes created at around the same time. It's possible that load balancing further, by resetting the DataSource more often on nodes with low activity (and vice versa), would help, but it could also be that the announcement protocol is to blame and many of these nodes never get a chance in the first place. (It can be noted that in earlier runs, increasing the announcement HTL from 20 to 40 did not seem to change much, though adding the DataSource fill-ratio thing did help a bit.)
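The document-probability model described above could look something like this. The update constants and the time half-life here are illustrative guesses: the source only says "slightly" and "toward 0 with time", not by how much.

```python
import random


class DocumentProbability:
    """Sketch of the per-document 'Probability' field: the chance the data
    would be used if it came up in the pool. Constants are assumptions."""

    def __init__(self, rng=random):
        # Starts normally distributed around 0.5 with a small deviation,
        # clamped into [0, 1].
        self.p = min(1.0, max(0.0, rng.gauss(0.5, 0.05)))

    def on_success(self):
        # Increases slightly with each successful request.
        self.p = min(1.0, self.p + 0.01 * (1.0 - self.p))

    def on_failure(self):
        # Decays slightly with each failed request.
        self.p = max(0.0, self.p * 0.99)

    def on_tick(self, dt_seconds):
        # Decreases toward 0 with time; a 10-day half-life is an
        # arbitrary choice for illustration.
        self.p *= 0.5 ** (dt_seconds / (10 * 24 * 3600))
```

As the author notes, this kind of additive-increase / multiplicative-decay rule is probably not an ideal mathematical model, but it has the stated qualitative behaviour and stays bounded in [0, 1].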
The documents output is harder to interpret. The low number of hops used on the successful requests is interesting, especially as it holds even for documents that failed a large number of times (see for example #923e7789fe1cfc41 - one would at least expect a greater deviation value on such a document). It can also be noted that the majority of the failures come from documents that are cached in significantly fewer places on the network than most (for example #7df76f96ec18c469) - I'm not sure what the reason for this could be. Another weird thing is that the documents at the bottom, which were inserted right before the end of the sim and never requested, are all stored in only 10 or 11 nodes even though they were inserted with HTL 20 (I suspected a bug, but it looks correct). It could be a symptom of steep requests: half the hops are spent looping back and forth to the same Class A node, which is found almost right away. (The high "gravity" of a Class A node could also explain the failed requests - they get routed to one Class A node covering that keyspace, but the data is in another node covering the same keyspace, and they cannot break free.)

Don't know if this interests anybody. Serapis is running quite well now: simulations like this take only a couple of hours on the nearest PIII 600, and end up around 72 megs of RAM. I'll run a 90 day simulation while I sleep to see what happens.

-- 
'DeCSS would be fine. Where is it?'
'Here,' Montag touched his head.
'Ah,' Granger smiled and nodded.

Oskar Sandberg
oskar at freenetproject.org

_______________________________________________
Devl mailing list
Devl at freenetproject.org
http://lists.freenetproject.org/mailman/listinfo/devl
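The Class A / Class B split discussed above could be picked out of the nodes file roughly like this, using the ~2.5e-4 requests/second background figure. The field names, the sample values, and the cutoff factor are all illustrative assumptions, not Serapis output.

```python
# Background request rate a node generates on its own (~2.5e-4 req/s, from
# the nodes file description). A node whose Traffic is well above this must
# be receiving externally routed requests, i.e. it "gets routed to".
BACKGROUND_TRAFFIC = 2.5e-4


def is_class_a(traffic, factor=4.0):
    """Flag a node as Class A if its query traffic is significantly above
    the background activity. The factor is an arbitrary cutoff."""
    return traffic > factor * BACKGROUND_TRAFFIC


# Made-up sample rows for illustration only:
nodes = [
    {"id": 848, "traffic": 3.1e-3},  # well above background -> Class A
    {"id": 901, "traffic": 2.4e-4},  # at/below background   -> Class B
]
class_a_ids = [n["id"] for n in nodes if is_class_a(n["traffic"])]
```

With the real file, comparing each node's Spread column for the Class A set would show whether the extra traffic really is concentrated in a small range of the keyspace, as observed.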
