On 2001-06-27 at 15:24 +0200, Philipp Meier wrote:
> On Tue, Jun 26, 2001 at 04:14:43PM -0400, Michael Bilow wrote:
> > Fourth, the probability of a collision between your next randomly chosen
> > number and any prior randomly chosen number is going to follow a Poisson
> > distribution, which may not be obvious at first. This means that the
> > likelihood of collison is by no means neglible for databases of practical
> > size even when using fairly big arbitrary numbers. Exactly what does
> > constitute "negligible" is a judgment call, not a technical decision.
> > Although a 128-bit hash seems unlikely to be a problem, it is certainly
> > conceivable that a 64-bit hash could be. Optimal behavior also depends
> > upon the hash being properly seeded, as discussed.
> >
> > In general, I think that these kinds of subtleties make using a random
> > hash approach for primary key generation more trouble than it is worth.
> > If you absolutely need to eliminate a singleton constraint and you can
> > afford to check for collisions, it may be worth the effort.
>
> You can reduce the probability for random id collision by including a
> time stamp. Given millisecond sampling the probability for a id clash
> is reduced to 2^128 in one millisecond -- which indeed is less than
> 2^128 in the lifetime of the id space. The increased collision probability
> in a cluster environment can be reduced by inserting a unique node id
> into the id -- the ip-address e.g. or the MAC of the network interface.
>
> Resolution:
> Using a time stamp (ignoring the 2038 problem ;-)) can reduce the probability
> of id collisiong of 128 random ids to a level which is IMHO sufficient for
> most production environments.
>
> Any would be comment appreciated.
I am not sure I understand you at all. If you mean to suggest that the
primary key could be constructed by concatenating a 64-bit timestamp with
a 128-bit hash, I suppose that is actually true. However, this would have
the side effect of making primary keys 196 bits long. Another possibility
to keep the key at 128 bits would be to use some arbitrary part of the
hash output, choosing 64 bits of the hash to concatenate with the 64-bit
timestamp. This, however, would likely INCREASE the probability of
collision significantly while introducing platform dependence.
As I pointed out in a separate message, Java has millisecond _resolution_
but this is not the same thing as millisecond _accuracy_ against real
world time. For example, on the IBM S/390 mainframe, time of day is only
accurate to a full second, so you see time with millisecond resolution
stepping 0.000, 1.000, 2.000, and so on. Even on Windows, danch said that
he sees time moving in 10ms steps, so using a millisecond timestamp is
just going to throw away about three bits of keyspace.
Java also has no way to represent time in a convenient 64-bit format.
The closest you can get is with java.lang.System.currentTimeMillis(), as
far as I know,which has an accessor that returns a long. However, you
lose one bit on the high side because a time stamp will never be negative,
and you lose some bits on the low side if your platform does not report
time with 1ms accuracy. As a practical matter, several of the bits on the
high side will change so slowly that they will be fixed over the lifetime
of the database, so they are also effectively removed from the keyspace
and thrown away.
If, on the other hand, you mean to concatenate the timestamp with the seed
for the hash, this makes no mathematical difference in the probability of
collision. The main design goal of a hash, especially a cryptographic
hash, it to provide fairly wide distribution of outputs with minimally
different inputs, so there will be essentially no correlation between the
seeds or initializers and the hash output.
-- Mike
_______________________________________________
JBoss-user mailing list
[EMAIL PROTECTED]
http://lists.sourceforge.net/lists/listinfo/jboss-user