Dave Sill <[EMAIL PROTECTED]> wrote:
> 
> The first thing I did was Google for "hash prime modulo even
> distribution". That turns up many repetitions of Charles' assertion,
> without proof or explanation.
[...]
> Being a ``Profile, don't speculate'' kind of guy, I decided to write a
> little program to test modulo hashes, which is attached to this
> message for your entertainment.
> 
> The result is that I can't see any effect of primality of the hash
> table size on distribution.

Funny, I can reproduce it easily.  Sure your random numbers are random?
With the attached Python script (it depends on the popular stats.py and
pstat.py modules), analyzing 250000 15-bit random integers (read from a text
file, one per line), I get the following:

[charlesc@charon personal]$ ./buckets.py 12 13
A:  12 buckets
  count:          250000
  mean:       20833.3333
  std. dev.:  196.3153

B:  13 buckets
  count:          250000
  mean:       19230.7692
  std. dev.:  103.8646


The effect does appear to diminish significantly for large values, though.

Charles
-- 
-----------------------------------------------------------------------
Charles Cazabon                            <[EMAIL PROTECTED]>
GPL'ed software available at:  http://www.qcc.sk.ca/~charlesc/software/
-----------------------------------------------------------------------
#!/usr/bin/python

import whrandom
import string
import stats
import sys

ints = map (int, map (string.strip, open ('randints.txt').readlines ()))


#######################################
def fill_buckets (numbuckets):
        buckets = [0] * numbuckets
        for i in ints:
                x = i % numbuckets
                buckets[x] = buckets[x] + 1
        return buckets
                
#######################################
def main (a, b):
        A = fill_buckets (a)
        B = fill_buckets (b)

        for L, name in ((A, 'A'), (B, 'B')):
                sum = reduce (lambda a, b:  a+b, L)
                mean = stats.mean (L)
                dev = stats.stdev (L)
                print '%s:  %i buckets' % (name, len (L))
                print '  count:      %10i' % sum
                print '  mean:       %7.04f' % mean
                print '  std. dev.:  %7.04f' % dev
                print
        

#######################################
if __name__ == '__main__':
        a = int (sys.argv[1])
        b = int (sys.argv[2])
        main (a, b)

Reply via email to