On Nov 1, 2012, at 16:14, John Gilmore wrote:

> If a hashing scheme is working well there is almost no clustering.
> Suppose we divide by 17, a prime, i.e., use it, in the jargon, as our
> hashing modulus..  Remainders will have one of the 17 values
>
> 0, 1, 2, . . . , 16.
>
> Then some goodly number of hashing operations the same or about the
> same number of of the hash values 0, 1, 2, . . . , 16 are generated,
> clustering does not occur.
>
> For concreteness, suppose we do 170 divisions.  Then if clustering
> does not occur there are about ten remainders having the value 0,
> about 10 having the value 1, about 10 having the value 2, etc., etc.
>
> What happens when the divisor used is composite is that hash values
> that are prime factors of the divisor occur more frequently than
> others.
>
> For 36 we have 36 = 2 x 2 x 3 x 3, which is usually written 2^2 x 3^2
> or 2**2 x 3**2.  Its prime factors are 2 and 3; and when it is used as
> a divisor there are more remainders having the value 2 and the value 3
> than there are having other pairs of values.
>
> 37, on the other hand, is prime, divisible only by 1 and itself.  Its
> use as a divisor yields no clustering of remainders.
>
> Never hesitate to ask notional gurus such questions.  A request for a
> further explanation is always in order.
>
I'm missing something here; most likely an unstated assumption.
My intuition agrees with test program making a histogram of
consecutive integers mod 3:

SPPG@MVS3:139$ cat foomod
/* Rexx */ signal on novalue;  /*
   Doc: Demonstrate hash clustering.
*/
Counts. = 000      /* Set all counters to 0.  */

do I = 1 to 99999  /* Generate some samples.  */
    R = I // 36    /* mod 36                  */
    Counts.R = Counts.R + 1;  end

do I = 0 to 35     /* Display counters.       */
    say 'Counts.'right(I, 2, 0 ) '=' right( Counts.I, 4 );  end

Which produces output:

SPPG@MVS3:140$ foomod
Counts.00 = 2777
Counts.01 = 2778
Counts.02 = 2778
Counts.03 = 2778
Counts.04 = 2778
Counts.05 = 2778
Counts.06 = 2778
Counts.07 = 2778
Counts.08 = 2778
Counts.09 = 2778
Counts.10 = 2778
Counts.11 = 2778
Counts.12 = 2778
Counts.13 = 2778
Counts.14 = 2778
Counts.15 = 2778
Counts.16 = 2778
Counts.17 = 2778
Counts.18 = 2778
Counts.19 = 2778
Counts.20 = 2778
Counts.21 = 2778
Counts.22 = 2778
Counts.23 = 2778
Counts.24 = 2778
Counts.25 = 2778
Counts.26 = 2778
Counts.27 = 2778
Counts.28 = 2777
Counts.29 = 2777
Counts.30 = 2777
Counts.31 = 2777
Counts.32 = 2777
Counts.33 = 2777
Counts.34 = 2777
Counts.35 = 2777

Here I see no evidence of clustering.  Perhaps I should try it with
random rather than consecutive integers.  But if it were to show
clustering I'd be more inclined to consider it an indictment of the
PRNG than of the modulo 36 hash.

-- gil

Reply via email to