On Mar 22, 2012, at 11:18 AM, H. S. Teoh wrote:

> On Wed, Mar 21, 2012 at 04:35:11PM +1100, Daniel Murphy wrote:
>> "H. S. Teoh" <hst...@quickfur.ath.cx> wrote in message
>> news:mailman.951.1332306541.4860.digitalmar...@puremagic.com...
>>>
>>> Here's the current hashing code for char[] and string:
>>>
>>>     foreach (char c; s)
>>>         hash = hash * 11 + c;
>>>
>>> For const(char)[], it's rt.util.hash.hashOf, which is Paul Hsieh's
>>> SuperFastHash algorithm with very good hash distribution properties.
>>> It does seem to involve a lot more operations than the simple loop
>>> above, though; so I assume the simple loop was chosen because
>>> hashing strings is a very common operation and, for the most part,
>>> only needs a simple hash function.
>>>
>>> So I'm kinda leaning towards SuperFastHash, but I'm worried about
>>> whether it will cause performance degradation, in which case we
>>> should stick with the simple loop.
> [...]
>> Benchmark time!
>
> Alright, so after some benchmarking, I found that the above custom
> hash function works best for *short* (3 to 10 character) randomized
> alphabetic strings (I assumed alphabetic to be the typical use case
> of strings). It's faster than SuperFastHash, and even has better
> distribution properties, probably because SuperFastHash is tuned for
> arbitrary binary data of arbitrary length, whereas the custom
> function is tuned for short string-like data.
>
> With longer strings, SuperFastHash beats the custom algorithm, and
> the distribution properties are approximately the same.
>
> So I'm still on the fence about which algorithm is better. I can see
> why the custom hash was adopted for strings, since your typical AA
> tends to have short alphabetic keys, and something like SuperFastHash
> is probably overkill. But for longer keys, SuperFastHash is better.
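For anyone wanting to reproduce the short-key numbers above, here's a
rough harness. simpleHash is just my name for the loop quoted above,
the key-generation parameters are guesses at "short randomized
alphabetic strings", and it uses the present-day std.datetime.stopwatch
module (which postdates this thread), so treat it as a sketch rather
than the benchmark actually run:

    import std.datetime.stopwatch : StopWatch, AutoStart;
    import std.random : Random, uniform;
    import std.stdio : writefln;

    // The custom string hash currently used for char[]/string.
    size_t simpleHash(in char[] s)
    {
        size_t hash = 0;
        foreach (char c; s)
            hash = hash * 11 + c;
        return hash;
    }

    void main()
    {
        // Short randomized alphabetic keys, the assumed typical AA
        // use case: 3 to 10 lowercase letters each.
        enum nKeys = 1_000_000;
        auto rng = Random(42);
        auto keys = new string[nKeys];
        foreach (ref k; keys)
        {
            auto buf = new char[uniform(3, 11, rng)];
            foreach (ref c; buf)
                c = uniform!"[]"('a', 'z', rng);
            k = cast(string) buf;
        }

        // Time a pass over all keys; the checksum keeps the compiler
        // from optimizing the hashing away.
        auto sw = StopWatch(AutoStart.yes);
        size_t checksum;
        foreach (k; keys)
            checksum += simpleHash(k);
        sw.stop();
        writefln("simpleHash: %s ms (checksum %s)",
                 sw.peek.total!"msecs", checksum);
    }

Swapping a different hash function into the timed loop (and bucketing
the results to eyeball distribution) is enough to compare candidates.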
Might be worth plugging in MurmurHash for comparison.
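The 32-bit x86 variant of MurmurHash3 is small enough to drop straight
into such a benchmark. Here's a sketch transcribed from Austin
Appleby's public-domain reference (murmur3_32 is my own name; verify
against the reference test vectors before trusting it):

    // MurmurHash3, x86 32-bit variant.
    uint murmur3_32(const(ubyte)[] data, uint seed = 0)
    {
        enum uint c1 = 0xcc9e2d51;
        enum uint c2 = 0x1b873593;
        uint h = seed;
        size_t i = 0;

        // Body: mix one 4-byte little-endian block at a time.
        for (; i + 4 <= data.length; i += 4)
        {
            uint k = data[i] | (data[i+1] << 8)
                   | (data[i+2] << 16) | (data[i+3] << 24);
            k *= c1;
            k = (k << 15) | (k >> 17);
            k *= c2;
            h ^= k;
            h = (h << 13) | (h >> 19);
            h = h * 5 + 0xe6546b64;
        }

        // Tail: mix the remaining 0-3 bytes.
        uint k = 0;
        switch (data.length & 3)
        {
            case 3: k ^= data[i+2] << 16; goto case;
            case 2: k ^= data[i+1] << 8;  goto case;
            case 1: k ^= data[i];
                    k *= c1;
                    k = (k << 15) | (k >> 17);
                    k *= c2;
                    h ^= k;
                    break;
            default: break;
        }

        // Finalization: force avalanching of the final bits.
        h ^= cast(uint) data.length;
        h ^= h >> 16;
        h *= 0x85ebca6b;
        h ^= h >> 13;
        h *= 0xc2b2ae35;
        h ^= h >> 16;
        return h;
    }

Hashing a string is then just murmur3_32(cast(const(ubyte)[]) s).
(Phobos has since gained std.digest.murmurhash, but that also postdates
this thread.)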