Joe Orton <[EMAIL PROTECTED]> writes: > RSE has done the maths on why *33 is a magic string hashing function... > this from his "str" library in str_hash.c:
Actually, the same huge comment also appears right above the hash computation in the current apr_hash.c. (So the patch to change that computation should change or delete the comment too, not that it matters at the moment since said patch hasn't been applied.) -K > /* > ** Str - String Library > ** Copyright (c) 1999-2000 Ralf S. Engelschall <[EMAIL PROTECTED]> > ** > ** This file is part of Str, a string handling and manipulation > ** library which can be found at http://www.engelschall.com/sw/str/. > ** > ** Permission to use, copy, modify, and distribute this software for > ** any purpose with or without fee is hereby granted, provided that > ** the above copyright notice and this permission notice appear in all > ** copies. > ** > ** THIS SOFTWARE IS PROVIDED ``AS IS'' AND ANY EXPRESSED OR IMPLIED > ** WARRANTIES, INCLUDING, BUT NOT LIMITED TO, THE IMPLIED WARRANTIES OF > ** MERCHANTABILITY AND FITNESS FOR A PARTICULAR PURPOSE ARE DISCLAIMED. > ** IN NO EVENT SHALL THE AUTHORS AND COPYRIGHT HOLDERS AND THEIR > ** CONTRIBUTORS BE LIABLE FOR ANY DIRECT, INDIRECT, INCIDENTAL, > ** SPECIAL, EXEMPLARY, OR CONSEQUENTIAL DAMAGES (INCLUDING, BUT NOT > ** LIMITED TO, PROCUREMENT OF SUBSTITUTE GOODS OR SERVICES; LOSS OF > ** USE, DATA, OR PROFITS; OR BUSINESS INTERRUPTION) HOWEVER CAUSED AND > ** ON ANY THEORY OF LIABILITY, WHETHER IN CONTRACT, STRICT LIABILITY, > ** OR TORT (INCLUDING NEGLIGENCE OR OTHERWISE) ARISING IN ANY WAY OUT > ** OF THE USE OF THIS SOFTWARE, EVEN IF ADVISED OF THE POSSIBILITY OF > ** SUCH DAMAGE. > ** > ** str_hash.c: hashing functions > */ > > #include "str_p.h" > > /* > * DJBX33A (Daniel J. Bernstein, Times 33 with Addition) > * > * This is Daniel J. Bernstein's popular `times 33' hash function as > * posted by him years ago on comp.lang.c. It basically uses a function > * like ``hash(i) = hash(i-1) * 33 + string[i]''. This is one of the > * best hashing functions for strings. Because it is both computed very > * fast and distributes very well. > * > * The magic of the number 33, i.e. why it works better than many other > * constants, prime or not, has never been adequately explained by > * anyone. So I try an own RSE-explanation: if one experimentally tests > * all multipliers between 1 and 256 (as I did it) one detects that > * even numbers are not useable at all. The remaining 128 odd numbers > * (except for the number 1) work more or less all equally well. They > * all distribute in an acceptable way and this way fill a hash table > * with an average percent of approx. 86%. > * > * If one compares the Chi/2 values resulting of the various > * multipliers, the 33 not even has the best value. But the 33 and a > * few other equally good values like 17, 31, 63, 127 and 129 have > * nevertheless a great advantage over the remaining values in the large > * set of possible multipliers: their multiply operation can be replaced > * by a faster operation based on just one bit-wise shift plus either a > * single addition or subtraction operation. And because a hash function > * has to both distribute good and has to be very fast to compute, those > * few values should be preferred and seems to be also the reason why > * Daniel J. Bernstein also preferred it. > */ > > ...