First, just review the birthday problem. Collision free set of k uniformly selected objects from a total set of N occurs with probability:
(1)(1-1/N)(1-2/N)...(1-(k-1)/N) If k << N, you can approximate the above by taking the ln, and noting that ln(1-x) ~ -x for small x: (1)(1-1/N)(1-2/N)...(1-(k-1)/N) ~ exp(-k(k-1)/(2N)) If you generate keys are rate R we can ask when you expect to have a collision: k = Rt, and solve for exp(k(k-1)/2N) ~ 1/2. Just to get the order, you get the square-root result: t = (1/R) \sqrt{N}, or the maximum rate: R = (1/t)\sqrt{N} In your second case, if we assume one timestamp per time P (or P units of time per timestamp), a maximum time T, and smaller set of names: n. We have a maximum number of timestamps as T/P. To compare, total names are N=nT/P. A collision happens if you choose the same name in the set n, at the same timestamp. If we assume key generation rate R per time, then you have RP keys per timestamp. So, use the above formula with k = RP and maximum number of names n = NP/T: approximately: (RP)^2 T/PN = 1, so T = N/(PR^2). The maximum key rate this can support: R = \sqrt{N/PT}. You can recover the original result if have one big timestamp: P = T, then R = (1/T) \sqrt{N}. To support a bigger R, we can control the parameter P. Clearly P <= T, so the timestamp approach is always better when compared this way (because \sqrt{N/PT} >= \sqrt{N/T^2} = (1/T)\sqrt{N}. So, choose P as small as is practical: so, the unit of time for the least precise timer you want to invoke. I guess you want to make sure T is ridiculously large to avoid Y2K issues. So, seems like a good idea. I would add one additional issue however: timestamps increase monotonically. You might want to use some one-to-one mapping for the bits of the timestamp that destroys the monotonicity property before using them as parts of keys in a hashtable. I guess this is well known, or I made a mistake in the analysis. Best, On Fri, Apr 24, 2009 at 2:54 AM, Russ Weeks <rwe...@gmail.com> wrote: > Hi, All. > I'm playing around with building some data structures backed by a > conventional DHT like Chord or Kademlia. It seems like many of the elements > in these structures don't need a semantically useful key. Take a linked > list, for instance. I need to be able to do a "get" on the head or tail of > the list, but the keys of the interior elements don't need to be exposed to > the caller. This suggests that I can use random data for these keys. My > question is, what's the best way to generate these keys. > The obvious approach is just to fill the key (say 128 bits) with random > data. I think I understand the birthday problem and how to estimate the > probability that a collision will occur. > An alternative approach that was proposed to me was to use 64 bits of random > data and a 64-bit timestamp. Here, we hope that the random number generator > is not seeded with the system time. The theory is, the odds of generating 2 > identical 64-bit numbers in the same nanosecond is less than the odds of > generating 2 128-bit numbers over the lifetime of the system. If this is > true, it's not clear to me. > Can anybody suggest a way to quantify the likelyhood of collision in the > second approach? > Thanks, > Russ > _______________________________________________ > p2p-hackers mailing list > p2p-hackers@lists.zooko.com > http://lists.zooko.com/mailman/listinfo/p2p-hackers > > -- P. Oscar Boykin http://boykin.acis.ufl.edu Assistant Professor, Department of Electrical and Computer Engineering University of Florida _______________________________________________ p2p-hackers mailing list p2p-hackers@lists.zooko.com http://lists.zooko.com/mailman/listinfo/p2p-hackers