Re: [Haskell-cafe] String Hashing

Jan-Willem Maessen Mon, 18 Jun 2007 14:17:36 -0700


On Jun 17, 2007, at 9:55 PM, Thomas Conway wrote:

Hi All,

I'm trying to figure out how to maximum performance out of one of my
inner loops which involves string hashing.

...
   mix :: Triple -> Triple

This looks like a version of the Bob Jenkins hash function from burtleburtle.net. I implemented the 32-bit version of this as follows:


mix :: Int32 -> Int32 -> Int32 -> (Int32 -> Int32 -> Int32 -> a) -> a
mix a0 b0 c0 k0 =
  let mixR k a b c = (a-b-c) `xor` (c `shiftR` k)
      mixL k b c a = (b-c-a) `xor` (a `shiftL` k)
      mix3 k1 k2 k3 k a  b  c  =
          let a' = mixR k1 a  b  c
              b' = mixL k2 b  c  a'
              c' = mixR k3 c  a' b'
          in k a' b' c'
  in  (mix3 13 8 13 $ mix3 12 16 5 $ mix3 3 10 15 $ k0) a0 b0 c0

I mention this because your code writes the whole thing out longhand---which might be faster, or might not, but certainly misses the highest-level structural patterns in the original. Your use of a data type to represent triples is probably better nowadays than my rather quirky use of CPS (in other words, this could have been a function Triple -> Triple instead of the rather odd type you see above).

That said, I assume you instrumented your code and determined that hash collisions are actually a bottleneck for you, and that a hash table is the right structure to begin with? I fell back on much- simpler multiplicative hashing schemes for Data.HashTable. A multiply is much faster than vast amounts of bit-fiddling---but of course its collision behavior isn't nearly as good and this can be a problem with large data sets. And note that the multiplicative hashing currently used in Data.HashTable doesn't require prime table sizes; in fact we use powers of two and table doubling. When last I checked the result was faster than Data.Map, but not by much, and using strings probably wipes out that advantage vs. tries.


-Jan-Willem Maessen

smime.p7s
Description: S/MIME cryptographic signature

_______________________________________________
Haskell-Cafe mailing list
Haskell-Cafe@haskell.org
http://www.haskell.org/mailman/listinfo/haskell-cafe

Re: [Haskell-cafe] String Hashing

Reply via email to