I recently learned about Fibonacci hashing (aka Knuth's Multiplicative method) [1] as a way to have a power-of-two number of buckets and still get decent distribution of incoming hash function values across buckets for most real world scenarios, even when the user's hash function is itself not uniformly distributed. The key point is that multiplying by the inverse of the golden ratio, and then taking the high bits of the result with a shift, narrowing of a 32- or 64-bit hash value down to the power-of-two number of buckets can be done more efficiently than computing the remainder of a division by a prime number.
[1] https://probablydance.com/2018/06/16/fibonacci-hashing-the-optimization-that-the-world-forgot-or-a-better-alternative-to-integer-modulo/ But when trying to test it out in m4 sources to see if I could get faster performance by swapping how I narrow a string hash down to a bucket number, I noticed two things: hash.c is hard-coded to insisting on a prime number of buckets, AND it does so in a manner that looks unsafe: In hash.c, there is nothing guaranteeing that candidate is odd on entry: > static size_t _GL_ATTRIBUTE_PURE > compute_bucket_size (size_t candidate, const Hash_tuning *tuning) > { > if (!tuning->is_n_buckets) > { > float new_candidate = candidate / tuning->growth_threshold; > if ((float) SIZE_MAX <= new_candidate) > goto nomem; > candidate = new_candidate; > } > candidate = next_prime (candidate); But in next-prime.c, there is this contract: > /* Return true if CANDIDATE is a prime number or 1. > CANDIDATE should be an odd number >= 1. */ > static bool _GL_ATTRIBUTE_CONST > is_prime (size_t candidate) > { > size_t divisor = 3; > size_t square = divisor * divisor; > > for (;;) > { > if (square > candidate) > return true; > if ((candidate % divisor) == 0) > return false; > /* Increment divisor by 2. */ > divisor++; > square += 4 * divisor; > divisor++; > } > } which looks like it will mis-behave when handed an even number. Should hash.c be patched to ensure 'next_prime (candidate | 1)', at least as long as it insists on using a prime number of buckets with integer modulo? -- Eric Blake, Principal Software Engineer Red Hat, Inc. Virtualization: qemu.org | libguestfs.org
