"Dave Sill" <[EMAIL PROTECTED]> writes:

> You're right. The "hashing" used here is a simple modulo. From
> fmtqfn.c:
> 
>    i = fmt_ulong(s,id % auto_split); len += i; if (s) s += i;
> 
> I can't see that primality would do anything special here.
> 
> However, the default, 23, is prime, and in his only message to the
> list on the topic of conf-split, DJB suggested a value of 401, also
> prime, for a queue with 100000 entries:
> 
>   http://www.ornl.gov/its/archives/mailing-lists/qmail/1997/07/msg00295.html
> 
> Why would DJB use primes if they weren't necessary? He uses round
> numbers elsewhere (concurrencies, for example), so I don't think he
> just likes them.
> 
> So...anyone who still thinks conf-split must/should be prime... Could
> you explain why?

If the input numbers are truly random, then a modulos hash will
distribute well whether or not the hash size is prime.

However, if the input numbers are not truly random, then a modulos
hash may pick out some regularity in the input, and preferentially
hash to a given set of buckets.  For a trivial example, if the numbers
tend to be even, then an even modulos hash will tend toward using the
even numbered buckets.  A prime modulos hash minimizes the types of
regularity which will lead to a poor hash distribution.

Unix file system inode numbers are not truly random.  Therefore, it's
wise to choose a prime conf-split.

Ian

Reply via email to