On Tue, Jun 26, 2001 at 04:14:43PM -0400, Michael Bilow wrote:
> Fourth, the probability of a collision between your next randomly chosen
> number and any prior randomly chosen number is going to follow a Poisson
> distribution, which may not be obvious at first.  This means that the
> likelihood of collision is by no means negligible for databases of practical
> size even when using fairly big arbitrary numbers.  Exactly what does
> constitute "negligible" is a judgment call, not a technical decision.  
> Although a 128-bit hash seems unlikely to be a problem, it is certainly
> conceivable that a 64-bit hash could be.  Optimal behavior also depends
> upon the hash being properly seeded, as discussed.
> 
> In general, I think that these kinds of subtleties make using a random
> hash approach for primary key generation more trouble than it is worth.  
> If you absolutely need to eliminate a singleton constraint and you can
> afford to check for collisions, it may be worth the effort.

You can reduce the probability of a random id collision by including a
time stamp. With millisecond sampling, two ids can only collide if they
are generated within the same millisecond, so the 128 random bits only
have to avoid a clash within one millisecond rather than over the whole
lifetime of the id space. The increased collision probability in a
cluster environment can be reduced by inserting a unique node id into
the id -- e.g. the IP address or the MAC address of the network interface.
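A minimal sketch of such a composite id in Python. The field layout
(48-bit millisecond timestamp, 48-bit MAC-derived node id, 32 random
bits) is my own illustrative choice, not a standard -- a real scheme
such as UUID version 1 allocates the bits differently:

```python
import time
import uuid
import secrets

def make_id() -> int:
    """Compose a 128-bit id from timestamp + node id + random bits."""
    ts = int(time.time() * 1000) & ((1 << 48) - 1)  # 48-bit ms timestamp
    node = uuid.getnode() & ((1 << 48) - 1)         # 48-bit MAC (or fallback)
    rand = secrets.randbits(32)                     # 32 random bits
    # Two ids can only collide if the same node draws the same
    # random bits within the same millisecond.
    return (ts << 80) | (node << 32) | rand

print(make_id().bit_length() <= 128)
```

Note that `uuid.getnode()` falls back to a random 48-bit number when no
MAC address can be determined, so the "unique node id" property is only
as good as the platform's support for reading the hardware address.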

Resolution:
Using a time stamp (ignoring the 2038 problem ;-)) can reduce the probability
of collisions between 128-bit random ids to a level which is IMHO sufficient
for most production environments.

Any comments would be appreciated.

-billy.

-- 
Philipp Meier                              o-matic GmbH
Geschäftsführer                  Pfarrer-Weiß-Weg 16-18
Tel.: +49-(0)700-66284236                     89077 Ulm
