Hi all, While playing around with Peter E.'s unicode normalization patch [1], I found that HEAD failed to build a perfect hash function for any of the four sets of 4-byte keys ranging from 1k to 17k in number. It probably doesn't help that codepoints have nul bytes and often cluster into consecutive ranges. In addition, I found that a couple of the candidate hash multipliers don't compile to shift-and-add instructions, although they were chosen with that intent in mind. It seems compilers will only do that if the number is exactly 2^n +/- 1.
Using the latest gcc and clang, I tested all prime numbers up to 5000 (plus 8191 for good measure), and found a handful that are compiled into non-imul instructions. Dialing back the version, gcc 4.8 and clang 7.0 are the earliest I found that have the same behavior as newer ones. For reference: https://gcc.godbolt.org/z/bxcXHu In addition to shift-and-add, there are also a few using lea, lea-and-add, or 2 leas. Then I used the attached program to measure various combinations of compiled instructions using two constant multipliers iterating over bytes similar to a generated hash function. <cc> -O2 -Wall test-const-mult.c test-const-mult-2.c ./a.out Median of 3 with clang 10: lea, lea 0.181s lea, lea+add 0.248s lea, shift+add 0.251s lea+add, shift+add 0.273s shift+add, shift+add 0.276s 2 leas, 2 leas 0.290s shift+add, imul 0.329s Taking this with a grain of salt, it nonetheless seems plausible that a single lea could be faster than any two instructions here. The only primes that compile to a single lea are 3 and 5, but I've found those multipliers can build hash functions for all our keyword lists, as demonstration. None of the others we didn't have already are particularly interesting from a performance point of view. With the unicode quick check, I found that the larger sets need (257, 8191) as multipliers to build the hash table, and none of the smaller special primes I tested will work. Keeping these two properties in mind, I came up with the scheme in the attached patch that tries adjacent pairs in this array: (3, 5, 17, 31, 127, 257, 8191) so that we try (3,5) first, next (5,17), and then all the pure shift-and-adds with (257,8191) last. The main motivation is to be able to build the unicode quick check tables, but if we ever use this functionality in a hot code path, we may as well try to shave a few more cycles while we're at it. [1] https://www.postgresql.org/message-id/flat/c1909f27-c269-2ed9-12f8-3ab72c8ca...@2ndquadrant.com -- John Naylor https://www.2ndQuadrant.com/ PostgreSQL Development, 24x7 Support, Remote DBA, Training & Services
diff --git a/src/tools/PerfectHash.pm b/src/tools/PerfectHash.pm index 74fb1f2ef6..4a029621b2 100644 --- a/src/tools/PerfectHash.pm +++ b/src/tools/PerfectHash.pm @@ -78,18 +78,25 @@ sub generate_hash_function # Try different hash function parameters until we find a set that works # for these keys. The multipliers are chosen to be primes that are cheap - # to calculate via shift-and-add, so don't change them without care. + # to calculate via lea or shift, so don't change them without care. + # The list of multipliers is designed such that by iterating through + # adjacent pairs, we first try a couple combinations where at least one + # multiplier is compiled to a single lea (3 or 5). Also, we eventually + # want to include the largest numbers in our search for the sake of + # finicky key sets such as a large number of unicode codepoints. # (Commonly, random seeds are tried, but we want reproducible results # from this program so we don't do that.) - my $hash_mult1 = 31; + my @HASH_MULTS = (3, 5, 17, 31, 127, 257, 8191); + my $hash_mult1; my $hash_mult2; my $hash_seed1; my $hash_seed2; my @subresult; FIND_PARAMS: - foreach (127, 257, 521, 1033, 2053) + foreach my $i (0 .. $#HASH_MULTS - 1) { - $hash_mult2 = $_; # "foreach $hash_mult2" doesn't work + $hash_mult1 = $HASH_MULTS[$i]; + $hash_mult2 = $HASH_MULTS[$i+1]; for ($hash_seed1 = 0; $hash_seed1 < 10; $hash_seed1++) { for ($hash_seed2 = 0; $hash_seed2 < 10; $hash_seed2++)