On Thursday, April 23, 2009 Russ Weeks wrote:
> The theory is, the odds of generating 2 identical 64-bit numbers
> in the same nanosecond are less than the odds of generating 2 128-
> bit numbers over the lifetime of the system.  Whether this is
> true isn't clear to me.
>
> Can anybody suggest a way to quantify the likelihood of collision
> in the second approach?

First, you have to figure out how many different keys will be compared
to a given key over its lifetime. Say your key lives for a year and
you generate 1,000 keys per second - so the key has to be compared to
about 3*10^10 different keys. The probability of a 128-bit key
(keyspace ~3*10^38) colliding with one of them is about 10^-28. For
all keys generated over a year's time, the total probability of at
least one of them colliding with another key will be 3*10^10 times
higher, or about 3*10^-18. If you have a 100M-machine installed base,
the probability of at least one of them having a collision during this
year will be 3*10^-10. This means that collisions will be happening in
your installed base at the rate of approximately once in 3 billion
years. Substitute your own numbers and repeat.
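(As a sanity check - not part of the original argument - the
arithmetic above can be reproduced in a few lines of Python, using
the same assumed numbers: 1,000 keys/second, one-year key lifetime,
128-bit keys, 100M machines.)

```python
# Back-of-envelope collision estimate, same assumptions as the text:
# 1,000 keys/s, one-year key lifetime, 128-bit keys, 1e8 machines.
keys_per_year = 1000 * 365 * 24 * 3600   # ~3.15e10 keys
keyspace = 2 ** 128                      # ~3.4e38 possible keys

# Probability that one key collides with any of the others
p_one = keys_per_year / keyspace         # ~1e-28

# Probability of at least one collision among all keys in a year
p_year = p_one * keys_per_year           # ~3e-18

# Across the whole installed base
machines = 1e8
p_base = p_year * machines               # ~3e-10

print(f"p_one={p_one:.1e}, p_year={p_year:.1e}, p_base={p_base:.1e}")
print(f"expected years between collisions: {1 / p_base:.1e}")
```

The last line confirms the "once in 3 billion years" figure.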

Best wishes -
S.Osokine.
24 Apr 2009.

________________________________

From: p2p-hackers-boun...@lists.zooko.com 
[mailto:p2p-hackers-boun...@lists.zooko.com] On Behalf Of Russ Weeks
Sent: Thursday, April 23, 2009 11:54 PM
To: theory and practice of decentralized computer networks
Subject: [p2p-hackers] Probability analysis - any tips?


Hi, All. 

I'm playing around with building some data structures backed by a conventional 
DHT like Chord or Kademlia.  It seems like many of the elements in these 
structures don't need a semantically useful key.  Take a linked list, for 
instance.  I need to be able to do a "get" on the head or tail of the list, but 
the keys of the interior elements don't need to be exposed to the caller.  This 
suggests that I can use random data for these keys.  My question is:
what's the best way to generate these keys?

The obvious approach is just to fill the key (say 128 bits) with random data.  
I think I understand the birthday problem and how to estimate the probability 
that a collision will occur.
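(For concreteness, the standard birthday approximation - probability
of at least one collision among n uniformly random b-bit keys is
roughly n^2 / 2^(b+1) - could be sketched like this; the key counts
below are illustrative, not from any real deployment.)

```python
# Birthday approximation: among n uniformly random b-bit keys, the
# probability of at least one collision is roughly n^2 / 2^(b+1).
def birthday_collision_prob(n: int, bits: int) -> float:
    return n * n / 2 ** (bits + 1)

# Illustrative key counts (assumptions, not from the original post):
for n in (10 ** 9, 10 ** 12):
    print(f"n={n:.0e}: p ~ {birthday_collision_prob(n, 128):.1e}")
```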

An alternative approach that was proposed to me was to use 64 bits of random 
data and a 64-bit timestamp.  Here, we hope that the random number generator is 
not seeded with the system time.  The theory is, the odds of generating 2 
identical 64-bit numbers in the same nanosecond are less than the odds of 
generating 2 identical 128-bit numbers over the lifetime of the system.  
Whether this is true isn't clear to me.

Can anybody suggest a way to quantify the likelihood of collision in the second 
approach?
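(One possible way to quantify it - my sketch, with made-up rates, not
an answer from the thread: only keys stamped with the same nanosecond
can collide, so apply the birthday approximation to the 64 random
bits within each nanosecond's batch of keys, then sum over batches.)

```python
# Sketch for the 64-bit-random + 64-bit-nanosecond-timestamp scheme.
# Only keys sharing a nanosecond timestamp can collide, so apply the
# birthday approximation within each nanosecond's batch and sum over
# batches.  Illustrative (invented) rate: 1e8 machines each generating
# 1,000 keys/s, with clocks close enough that keys can share a stamp.

def expected_collisions_per_second(keys_per_second: float,
                                   random_bits: int = 64) -> float:
    ns_per_second = 1e9
    k = keys_per_second / ns_per_second            # avg keys per nanosecond
    pairs_per_ns = k * k / 2 ** (random_bits + 1)  # expected colliding pairs
    return pairs_per_ns * ns_per_second

rate = 1e8 * 1000                                  # keys/s, whole installed base
exp_per_sec = expected_collisions_per_second(rate)
exp_per_year = exp_per_sec * 365 * 24 * 3600
print(f"expected collisions: {exp_per_sec:.1e}/s, {exp_per_year:.1f}/year")
```

Under these invented numbers the 64/64 scheme would expect a handful
of collisions per year across the installed base, versus ~3*10^-10
for 128 fully random bits - the timestamp only helps to the extent
that few keys can share a nanosecond.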

Thanks,

Russ
_______________________________________________
p2p-hackers mailing list
p2p-hackers@lists.zooko.com
http://lists.zooko.com/mailman/listinfo/p2p-hackers
