On Thursday 18 February 2010 20:44:57 Evan Daniel wrote:
> I've followed up my previous crude estimates of node churn with some
> more detailed numbers.  (See my mail in re: "data persistence again"
> on 20100122 for previous version and more detailed explanation.)
> 
> Again, some brief caveats: the following basically assumes that all
> samples are independent.  This is quite incorrect, because of time of
> day effects.  Nonetheless, I think it's useful.  Many of the obvious
> uses for this data ("If an insert is stored on 3 nodes, how likely is
> it one of them will be online later?") are strongly impacted by this.
> Use appropriate caution in analysis.  Also, I have a few missing
> samples; for each sample, I looked at the previous set of 24 samples
> that I did have, whether or not those were contiguous.
> 
> What I did: for each of the probe request samples, I computed how many
> nodes appeared in i of the previous 24 samples (24 samples at 5 hour
> intervals is a 5 day window).  I then averaged these counts across
> samples.  If an average sample has N_i nodes appearing in i of the
> previous 24 samples, then the average sample size over those 24 is
> sum(N_i*(i/24)).  Over the 387 samples (ignoring the first 23 where
> there aren't a "most recent 24 samples"), I have an average sample
> size of 5757.1 nodes.  If we assume that each node is online with
> probability i/24, and all nodes are independent (see previous caveat
> about this assumption being incorrect), then the number of nodes that
> are online in both of two different sampling intervals is
> sum(N_i*(i/24)^2).  For this number, I get 3511.5 nodes.  That is, if
> you select a random online node at some time t_1, the probability that
> it will be online at some later time t_2 is about 3511.5/5757.1 = 0.610.
> 
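The calculation described above can be sketched roughly as follows. This is only an illustration, under the same independence assumption the caveat warns about. The function name churn_estimate and the counts in example_counts are made up for the example; the real averaged counts are not reproduced here.

```python
def churn_estimate(counts, window=24):
    """counts maps i -> N_i, the averaged number of nodes that appeared in
    i of the previous `window` samples.

    Returns (average sample size,
             expected nodes online in both of two samples,
             probability that a random online node is online later).
    Assumes each node is online with probability i/window, independently.
    """
    avg_size = sum(n * (i / window) for i, n in counts.items())
    both = sum(n * (i / window) ** 2 for i, n in counts.items())
    return avg_size, both, both / avg_size


# Hypothetical counts for illustration only, NOT the measured data.
example_counts = {6: 400, 12: 200, 24: 100}
avg_size, both, p_later = churn_estimate(example_counts)
print(avg_size, both, p_later)
```

With the measured counts and window=24, the same arithmetic should give the 5757.1, 3511.5 and 0.610 figures quoted above; window=72 corresponds to the 15 day version.
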
> I then repeated the above using the most recent 72 samples (15 days).
> The distributions were roughly similar.  Average sample size was
> 5824.1, expected nodes online in both of two samples is 3106.8, or a
> probability of 0.533 that a randomly chosen node will be online later.
> 
> Nodes online in 24 of 24 samples make up 21.9% of an average sample.
> Nodes online in 70, 71, or 72 samples make up 13.6%.  Low-uptime nodes
> (< 40% according to sink logic; here taken as <= 9 samples of 24 or <=
> 27 of 72 (to make the 24/72 numbers directly comparable)) are 30.8% on
> the 24-sample data, and 37.7% on the 72-sample data.  I believe both
> of these discrepancies result from join/leave churn, whether permanent
> or over medium time periods (i.e. users who use Freenet for a couple
> of hours or days every few weeks).
> 
> Evan Daniel
> 
> (If you want the full spreadsheet or raw data, ask.  The spreadsheet
> was nearly 0.5 MiB, so I didn't attach it.  The averaged counts are
> below; this is enough to reproduce my calculations assuming samples
> are independent.)
> 
Some more analysis on this:

[14:24:50] <evanbd> toad_: 5757 nodes online in an average sample.  Taking high uptime as 23 or 24 samples, low uptime as 1-9 samples, and medium as 10-22...
[14:25:52] <toad_> evanbd: the other question of course is how much redundancy can we get away with before it starts to be a problem ... that sort of depends on MHKs though
[14:25:56] <evanbd> toad_: The high uptime group is 1505 nodes (1258 in 24/24).  They have an average uptime of 99.3%.
[14:26:23] <evanbd> toad_: The medium uptime group is 2478 nodes; they have an average uptime of 65%.
[14:26:25] <toad_> if we don't have MHKs, the top block will always be grossly unreliable ...
[14:26:38] <toad_> evanbd: this is by nodes typically online ?
[14:26:47] <evanbd> toad_: And the low uptime group is 1774 nodes, with average uptime 22.9%.
[14:27:51] <toad_> evanbd: okay, and this is by nodes online at an instant?
[14:28:09] <evanbd> toad_: This is: Choose a random sample; choose a random node online in that sample.  It will be a medium-uptime node with probability 2478/5757 (= 0.430).  On average, its uptime will be 65%.
[14:28:17] <evanbd> toad_: (In other words, yes)
[14:28:31] <toad_> this is much better than i had expected
[14:28:47] <evanbd> Well, by definition their uptime is > 40% :)
[14:28:59] <toad_> so 26% have 99% uptime, 43% have 65% uptime, and 31% have 23% uptime
[14:29:18] <toad_> right, but it means that nearly 70% of nodes online at any given time have 65%+ uptime
[14:29:32] <toad_> i.e. we are *not* swamped with low uptime nodes
[14:30:01] <toad_> at least if we consider a week ... this doesn't answer the question of try-it-and-leave
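
As a rough cross-check, the group figures from the log can be combined as below. The node counts and average uptimes are the ones evanbd gives; the variable names are just for illustration. The sketch assumes each group's average uptime can stand in for the chance that one of its online nodes is online again later, which is the same independence assumption as in the quoted mail.

```python
# (nodes online in an average sample, average uptime) per group, from the log
groups = {
    "high": (1505, 0.993),    # 23 or 24 of 24 samples
    "medium": (2478, 0.65),   # 10 to 22 of 24 samples
    "low": (1774, 0.229),     # 1 to 9 of 24 samples
}

total = sum(n for n, _ in groups.values())

# Share of an average sample made up by each group
shares = {name: n / total for name, (n, _) in groups.items()}

# Expected nodes online in both of two samples, and the resulting
# probability that a random online node is online at a later time
both = sum(n * uptime for n, uptime in groups.values())
p_later = both / total

for name, share in shares.items():
    print(f"{name}: {share:.1%} of an average sample")
print(f"total {total}, online in both {both:.1f}, p(later) {p_later:.3f}")
```

The three groups add up to the 5757 node average sample, the shares round to the 26% / 43% / 31% split toad_ quotes, and the weighted sum comes out within rounding of the 3511.5 nodes and 0.610 probability in the quoted mail.
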