On Fri, 22 Jun 2001, Dave Sill wrote:

>    i = fmt_ulong(s,id % auto_split); len += i; if (s) s += i;
> 
> I can't see that primality would do anything special here.

I am going to give the secret away right here to save your time. :)

It is obvious the distribution of B(i) = A(i) mod k will be very
non-uniform (i.e. very suboptimal when the values of B(i) are used as
hash values) when A(i) or its large subsequences can be expressed
as A(i+1) = A(i) + p such that gcd(p, k) > 1.

Our A(i)'s are inode numbers assigned to files holding the contents of
queued messages (queue/mess/*/*). An average implementation of a unix-like
filesystem assigns inode numbers less or more sequentially, i.e. the next
assigned number is often the last one plus 1. Of course, if we had
A(i+1) = A(i) + 1, there would be no problem. Alas, there are 2, 3, or
even 4 (or perhaps even more) files per a message in the qmail queue:
there are files in mess, as well as files in info, local, and remote.
This means A(i)'s are often incremented by 2, 3, or 4, and the risk of
uneven distribution of B(i)'s is rather high.

Using a prime number for k eliminates this risk for a very small price.
After all the set of primes is quite dense for reasonable values (<1000)
and you can always find a prime close to the number you want to use.

> However, the default, 23, is prime, and in his only message to the
> list on the topic of conf-split, DJB suggested a value of 401, also
> prime, for a queue with 100000 entries:

401 is quite close to sqrt(100000). It means the sizes of the 1st and 2nd
level are quite balanced. But I can only guess why DJB suggested 401
rather than any prime closer to that square root.

> Why would DJB use primes if they weren't necessary? He uses round
> numbers elsewhere (concurrencies, for example), so I don't think he
> just likes them.

There is a notation based on the (unique) factorization of numbers.
1 is (0) (or ()), 2 is (1), 3 is (0,1), 6 is (1,1) etc. It has some
interesting properties: like an incredible ease of multiplication (and an
incredible difficulty of addition). Prime numbers are the *round* ones in
this notation. :)


On Fri, 22 Jun 2001, Dave Sill wrote:

> Charles Cazabon <[EMAIL PROTECTED]> wrote:
> >
> >It does -- a large series of random numbers, modulo some number I, will result
> >in an even distribution of results if and only if I is prime.  If I isn't
> >prime, the results are skewed noticeably towards the low end.
> 
> Hmm.
> 
> On first reading that, I didn't believe it. I couldn't imagine how
> the primality of the divisor could "magically" guarantee an even
> distribution.

Indeed. Given a source A(i) of natural numbers uniformly distributed in
a given interval [0, N-1] (lrand48() is *supposed* to be such a source
for N = 2^31), it is trivial to show that the distribution D_B(x) of
B(i) = A(i) mod k for any natural k > 0 is as follows:

             / ceil(N/k)/N  if x < N mod k
   D_B(x) = <                                   for each x \in [0, k-1]
             \ floor(N/k)/N if x >= N mod k

You can see the results are skewed but for k << N, prime or not, the skew
is negligible and the resulting distribution is almost uniform. A test
showing anything different demonstrates nothing but an imperfection in
your PRNG or a flaw in the test itself.

Profiling may beat speculation but mathematical reasoning beats profiling
anytime. :)


On Fri, 22 Jun 2001, Dave Sill wrote:

> BTW, I modified my modhash program to read numbers from stdin, fed it
> lists of real, live inode numbers, and guess what? It still makes no
> difference whether you use a prime hash or not.

What "real, live inode numbers"? Have you picked the inode numbers of
messages stuffed in queue/mess/*? With all due respect, I doubt it.
Any profiling is pointless when you test something different than you
intented to.

If you still feel the need, feed your qmail with a large number of
messages that will get stuck in the queue (there are four important
cases: 1. qmail-send is not running, 2. local deliveries keep failing
temporarily, 3. remote deliveries keep failing, 4. both local and
remote deliveries keep failing) and test the inode numbers of files in
queue/mess/*.

And of course, you should repeat the test for different supported 
unix-like systems.

--Pavel Kankovsky aka Peak  [ Boycott Microsoft--http://www.vcnet.com/bms ]
"Resistance is futile. Open your source code and prepare for assimilation."


Reply via email to