First, let's review the birthday problem.  A set of k objects selected
uniformly at random from a total space of N is collision-free with
probability:

(1)(1 - 1/N)(1 - 2/N)...(1 - (k-1)/N)

If k << N, you can approximate this by taking the ln and noting that
ln(1-x) ~ -x for small x:

(1)(1 - 1/N)(1 - 2/N)...(1 - (k-1)/N) ~ exp(-k(k-1)/(2N))
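
A quick Python sketch (mine, just to sanity-check the approximation
against the exact product; the example numbers are made up):

import math

def p_no_collision_exact(k, N):
    # exact product (1 - i/N) for i = 1 .. k-1
    p = 1.0
    for i in range(1, k):
        p *= 1.0 - i / N
    return p

def p_no_collision_approx(k, N):
    # the exp(-k(k-1)/(2N)) approximation above
    return math.exp(-k * (k - 1) / (2.0 * N))

print(p_no_collision_exact(1000, 2**40))   # both ~ 0.9999995
print(p_no_collision_approx(1000, 2**40))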

If you generate keys at rate R, we can ask when you expect a collision:
set k = Rt and solve exp(-k(k-1)/(2N)) ~ 1/2.  Just to get the order,
you get the square-root result: t ~ (1/R)\sqrt{N}, or the maximum
rate: R ~ (1/t)\sqrt{N}.
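
For instance, a rough numerical check (the key size and lifetime here
are my own assumptions, not from the thread):

import math

def max_rate(N, t):
    # order-of-magnitude rate at which a collision becomes likely by time t
    return math.sqrt(N) / t

N = 2**128                  # 128-bit random keys
t = 10 * 365 * 24 * 3600    # ten years, in seconds
print(max_rate(N, t))       # ~ 6e10 keys per second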

In your second case, assume one timestamp per time P (or P units of
time per timestamp), a maximum lifetime T, and a smaller set of names,
n.  The maximum number of timestamps is T/P, so to compare, the total
number of names is N = nT/P.  A collision happens only if you choose
the same name from the set of n at the same timestamp.  If we assume a
key generation rate R per unit time, then you have RP keys per
timestamp.  So, use the above formula with k = RP and number of names
n = NP/T:

approximately (RP)^2 T/(PN) = 1, so T = N/(PR^2).  The maximum key
rate this can support is R = \sqrt{N/(PT)}.  You recover the original
result if you have one big timestamp, P = T: then R = (1/T)\sqrt{N}.
To support a bigger R, we can control the parameter P.  Clearly
P <= T, so the timestamp approach is always better when compared this
way (because \sqrt{N/(PT)} >= \sqrt{N/T^2} = (1/T)\sqrt{N}).  So,
choose P as small as is practical: the unit of time of the least
precise timer you want to rely on.  I guess you want to make sure T is
ridiculously large to avoid Y2K-style rollover issues.
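
To put rough numbers on it (the parameters below are assumptions of
mine: a nanosecond timer and a ten-year lifetime):

import math

N = 2**128                          # effective keyspace, N = n*T/P
T = 10 * 365 * 24 * 3600 * 10**9    # ten years, in nanoseconds
P = 1                               # one timestamp per nanosecond

r_flat  = math.sqrt(N) / T          # one big timestamp, P = T
r_stamp = math.sqrt(N / (P * T))    # the formula above

# both rates are in keys per nanosecond; the ratio is \sqrt{T/P}
print(r_flat, r_stamp, r_stamp / r_flat)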

So, this seems like a good idea.  I would add one additional issue,
however: timestamps increase monotonically.  You might want to apply
some one-to-one mapping to the bits of the timestamp that destroys the
monotonicity before using them as part of a key in a hashtable.
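
For example, here is a minimal sketch of one such bijection in Python
(the splitmix64 finalizer; my choice, not something from the thread;
each xor-shift and odd multiply mod 2^64 is invertible, so distinct
timestamps still map to distinct values).  The timestamp and random
part at the end are just illustrative placeholders:

import os, time

def mix64(x):
    # bijective 64-bit mixer: invertible xor-shifts and odd multiplies
    mask = (1 << 64) - 1
    x &= mask
    x ^= x >> 30
    x = (x * 0xBF58476D1CE4E5B9) & mask
    x ^= x >> 27
    x = (x * 0x94D049BB133111EB) & mask
    x ^= x >> 31
    return x

timestamp_ns = time.time_ns()                     # placeholder timestamp
random_64_bits = int.from_bytes(os.urandom(8), "big")
key = (mix64(timestamp_ns) << 64) | random_64_bits

Since the map is one-to-one, the collision analysis above is
unchanged.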

I guess this is well known, or I made a mistake in the analysis.

Best,

On Fri, Apr 24, 2009 at 2:54 AM, Russ Weeks <rwe...@gmail.com> wrote:
> Hi, All.
> I'm playing around with building some data structures backed by a
> conventional DHT like Chord or Kademlia.  It seems like many of the elements
> in these structures don't need a semantically useful key.  Take a linked
> list, for instance.  I need to be able to do a "get" on the head or tail of
> the list, but the keys of the interior elements don't need to be exposed to
> the caller.  This suggests that I can use random data for these keys.  My
> question is, what's the best way to generate these keys.
> The obvious approach is just to fill the key (say 128 bits) with random
> data.  I think I understand the birthday problem and how to estimate the
> probability that a collision will occur.
> An alternative approach that was proposed to me was to use 64 bits of random
> data and a 64-bit timestamp.  Here, we hope that the random number generator
> is not seeded with the system time.  The theory is, the odds of generating 2
> identical 64-bit numbers in the same nanosecond is less than the odds of
> generating 2 128-bit numbers over the lifetime of the system.  If this is
> true, it's not clear to me.
> Can anybody suggest a way to quantify the likelihood of collision in the
> second approach?
> Thanks,
> Russ
> _______________________________________________
> p2p-hackers mailing list
> p2p-hackers@lists.zooko.com
> http://lists.zooko.com/mailman/listinfo/p2p-hackers
>
>



-- 
P. Oscar Boykin                            http://boykin.acis.ufl.edu
Assistant Professor, Department of Electrical and Computer Engineering
University of Florida
_______________________________________________
p2p-hackers mailing list
p2p-hackers@lists.zooko.com
http://lists.zooko.com/mailman/listinfo/p2p-hackers
