"Dave Sill" <[EMAIL PROTECTED]> writes:

> Ian Lance Taylor <[EMAIL PROTECTED]> wrote:
> 
> >If the input numbers are truly random, then a modulos hash will
> >distribute well whether or not the hash size is prime.
> >
> >However, if the input numbers are not truly random, then a modulos
> >hash may pick out some regularity in the input, and preferentially
> >hash to a given set of buckets.
> 
> If the input numbers are not fairly random, then a modulo hash is not
> a choice.

Sure it is.  A modulos hash works fine if the input numbers are fairly
random with respect to the hash size.  That doesn't imply that the
input numbers are random in any real size.

> >For a trivial example, if the numbers
> >tend to be even, then an even modulos hash will tend toward using the
> >even numbered buckets.
> 
> Which, unfortunately, wouldn't be helped by a prime table size.

I guess one of us misunderstands the other.  My point is that if the
input numbers and the table size tend to have a divisor greater than
one, then the distribution will be skewed.  Using a prime number
minimizes the number of divisors greater than one which may be shared
by the table size and the input numbers.

> >A prime modulos hash minimizes the types of
> >regularity which will lead to a poor hash distribution.
> 
> Exactly how does a prime modulus help? Can you give an example?

Suppose the input numbers are 2 4 6 8 10 12.  Suppose the hash size is
8.  Then the buckets are 2 4 6 0 2 4.  Note the bad distribution.
Suppose the hash size is 7.  Then the buckers are 2 4 6 1 3 5.  Note
the good distribution.

Here's another way to say it.  For any given hash size, call a series
of inputs which lead to a bad distribution B.  Consider the set of all
bad inputs {B}.  A prime hash size minimizes the number of elements in
{B}.  Therefore, if you don't know anything about the inputs--in
particular, if you don't know if they are random--it is better to
choose a prime hash size.

Ian

Reply via email to