On Thursday, April 23, 2009 Russ Weeks wrote:

> The theory is, the odds of generating 2 identical 64-bit numbers
> in the same nanosecond is less than the odds of generating 2
> 128-bit numbers over the lifetime of the system. If this is true,
> it's not clear to me.
>
> Can anybody suggest a way to quantify the likelihood of collision
> in the second approach?
First, you have to figure out how many different keys will be compared to a given key over its lifetime. Say your key lives for a year and you generate 1,000 keys per second - so the key has to be compared to about 3*10^10 different keys. The probability of a 128-bit key (a space of about 3*10^38 values) colliding with one of them is about 10^-28.

For all keys generated over a year's time, the total probability of at least one of them colliding with another key will be about 3*10^10 times higher, or about 3*10^-18. If you have a 100-million-machine installed base, the probability of at least one of them having a collision during this year will be about 3*10^-10. This means that collisions will be happening in your installed base at the rate of approximately once in 3 billion years.

Substitute your own numbers and repeat. (A short Python sketch that reproduces this arithmetic, and another that compares the two proposed key schemes, follow the quoted message below.)

Best wishes -
S.Osokine.
24 Apr 2009.

________________________________
From: p2p-hackers-boun...@lists.zooko.com
[mailto:p2p-hackers-boun...@lists.zooko.com] On Behalf Of Russ Weeks
Sent: Thursday, April 23, 2009 11:54 PM
To: theory and practice of decentralized computer networks
Subject: [p2p-hackers] Probability analysis - any tips?

Hi, All.

I'm playing around with building some data structures backed by a conventional DHT like Chord or Kademlia. It seems like many of the elements in these structures don't need a semantically useful key. Take a linked list, for instance: I need to be able to do a "get" on the head or tail of the list, but the keys of the interior elements don't need to be exposed to the caller. This suggests that I can use random data for these keys.

My question is: what's the best way to generate these keys? The obvious approach is just to fill the key (say, 128 bits) with random data. I think I understand the birthday problem and how to estimate the probability that a collision will occur.

An alternative approach that was proposed to me was to use 64 bits of random data and a 64-bit timestamp. (Here, we hope that the random number generator is not seeded with the system time.) The theory is, the odds of generating 2 identical 64-bit numbers in the same nanosecond are lower than the odds of generating 2 identical 128-bit numbers over the lifetime of the system. Whether this is true is not clear to me.

Can anybody suggest a way to quantify the likelihood of collision in the second approach?

Thanks,
Russ

_______________________________________________
p2p-hackers mailing list
p2p-hackers@lists.zooko.com
http://lists.zooko.com/mailman/listinfo/p2p-hackers
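
To make the reply's arithmetic easy to rerun, here is a minimal Python sketch of the same union-bound estimate. The constants are the example figures from the reply (1,000 keys per second, a one-year key lifetime, a 100-million-machine installed base); substitute your own numbers and repeat.

# Union-bound estimate of 128-bit key collisions, using the
# example figures from the reply above.

KEY_BITS = 128
KEYS_PER_SECOND = 1_000
SECONDS_PER_YEAR = 365 * 24 * 3600            # ~3.15e7
INSTALLED_BASE = 100_000_000                  # 100M machines

keyspace = 2.0 ** KEY_BITS                    # ~3.4e38 possible keys
keys_per_year = KEYS_PER_SECOND * SECONDS_PER_YEAR   # ~3e10

# One key vs. the ~3e10 other keys alive during its lifetime.
p_one_key = keys_per_year / keyspace          # ~1e-28

# At least one collision among all of one machine's keys in a year.
p_one_machine = p_one_key * keys_per_year     # ~3e-18

# At least one collision anywhere in the installed base in a year.
p_fleet = p_one_machine * INSTALLED_BASE      # ~3e-10

print(f"one key:     {p_one_key:.1e}")
print(f"one machine: {p_one_machine:.1e}")
print(f"whole fleet: {p_fleet:.1e}")
print(f"years between collisions: {1 / p_fleet:.1e}")   # ~3e9

Running this reproduces the figures in the reply: roughly one collision per 3 billion fleet-years at these rates.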
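
And a sketch that addresses the original question directly: quantifying the second (64-bit timestamp + 64 random bits) approach. The function names are illustrative, not from any DHT API; the key constructors use Python's secrets module, which reads the OS entropy pool and so is not seeded with the system time (the failure mode the question worries about). The comparison assumes key-generation instants are spread uniformly over the period, which is a modelling assumption of this sketch, not something stated in the thread.

import secrets
import time

def random_128_key() -> bytes:
    # Approach 1: fill all 128 bits with random data.
    return secrets.token_bytes(16)

def timestamp_64_key() -> bytes:
    # Approach 2: a 64-bit nanosecond timestamp plus 64 random bits.
    ts = time.time_ns() & (2 ** 64 - 1)
    return ts.to_bytes(8, "big") + secrets.token_bytes(8)

def p_any_collision(rate_per_sec: float, seconds: float, scheme: str) -> float:
    # Birthday-style union bound: sum over all pairs of keys of the
    # probability that the pair matches exactly.
    n = rate_per_sec * seconds                # total keys generated
    pairs = n * n / 2                         # ~ C(n, 2)
    if scheme == "random128":
        return pairs / 2.0 ** 128             # pair must match in all 128 bits
    if scheme == "ts64+rand64":
        # A pair collides only if both keys land in the same nanosecond
        # AND draw the same 64 random bits (uniform-arrival assumption).
        return pairs / (seconds * 1e9) / 2.0 ** 64
    raise ValueError(scheme)

YEAR = 365 * 24 * 3600
print(p_any_collision(1_000, YEAR, "random128"))    # ~1.5e-18
print(p_any_collision(1_000, YEAR, "ts64+rand64"))  # ~8.6e-16

Under this model the ratio between the two schemes depends only on the period length, not on the generation rate: the timestamp scheme only catches up once the system has been generating keys for more than 2^64 nanoseconds, which is roughly 585 years. For any realistic lifetime, 128 random bits come out a few hundred times safer, so the "theory" quoted above does not hold under these assumptions.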