On Mar 22, 2012, at 11:18 AM, H. S. Teoh wrote:

> On Wed, Mar 21, 2012 at 04:35:11PM +1100, Daniel Murphy wrote:
>> "H. S. Teoh" <hst...@quickfur.ath.cx> wrote in message
>> news:mailman.951.1332306541.4860.digitalmar...@puremagic.com...
>>>
>>> Here's the current hashing code for char[] and string:
>>>
>>>     foreach (char c; s)
>>>         hash = hash * 11 + c;
>>>
>>> For const(char)[], it's rt.util.hash.hashOf, which is Paul Hsieh's
>>> SuperFastHash algorithm with very good hash distribution properties.
>>> It does seem to involve a lot more operations than the simple loop
>>> above, though; so I assume the simple loop was chosen because
>>> hashing strings is a very common operation and, for the most part,
>>> only needs a simple hash function.
>>>
>>> So I'm kinda leaning towards SuperFastHash, but I'm worried about
>>> whether it will cause performance degradation, in which case we
>>> should stick with the simple loop.
> [...]
>> Benchmark time!
>
> Alright, so after some benchmarking, I found that the above custom
> hash function works best for *short* (3 to 10 character) randomized
> alphabetic strings (I assumed alphabetic to be the typical use case
> of strings). It's faster than SuperFastHash, and even has better
> distribution properties, probably because SuperFastHash is tuned for
> arbitrary binary data of arbitrary length, whereas the custom
> function is tuned for short string-like data.
>
> With longer strings, SuperFastHash beats the custom algorithm, and
> the distribution properties are approximately the same.
>
> So I'm still on the fence about which algorithm is better. I can see
> why the custom hash was adopted for strings, since your typical AA
> tends to have short alphabetic keys, and something like SuperFastHash
> is probably overkill. But for longer keys, SuperFastHash is better.
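For anyone wanting to reproduce the short-key numbers above, here's a
rough harness. simpleHash is just my name for the loop quoted above,
the key-generation parameters are guesses at "short randomized
alphabetic strings", and it uses the present-day std.datetime.stopwatch
module (which postdates this thread), so treat it as a sketch rather
than the benchmark actually run:

    import std.datetime.stopwatch : StopWatch, AutoStart;
    import std.random : Random, uniform;
    import std.stdio : writefln;

    // The custom string hash currently used for char[]/string.
    size_t simpleHash(in char[] s)
    {
        size_t hash = 0;
        foreach (char c; s)
            hash = hash * 11 + c;
        return hash;
    }

    void main()
    {
        // Short randomized alphabetic keys, the assumed typical AA
        // use case: 3 to 10 lowercase letters each.
        enum nKeys = 1_000_000;
        auto rng = Random(42);
        auto keys = new string[nKeys];
        foreach (ref k; keys)
        {
            auto buf = new char[uniform(3, 11, rng)];
            foreach (ref c; buf)
                c = uniform!"[]"('a', 'z', rng);
            k = cast(string) buf;
        }

        // Time a pass over all keys; the checksum keeps the compiler
        // from optimizing the hashing away.
        auto sw = StopWatch(AutoStart.yes);
        size_t checksum;
        foreach (k; keys)
            checksum += simpleHash(k);
        sw.stop();
        writefln("simpleHash: %s ms (checksum %s)",
                 sw.peek.total!"msecs", checksum);
    }

Swapping a different hash function into the timed loop (and bucketing
the results to eyeball distribution) is enough to compare candidates.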
Might be worth plugging in MurmurHash for comparison.
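The 32-bit x86 variant of MurmurHash3 is small enough to drop straight
into such a benchmark. Here's a sketch transcribed from Austin
Appleby's public-domain reference (murmur3_32 is my own name; verify
against the reference test vectors before trusting it):

    // MurmurHash3, x86 32-bit variant.
    uint murmur3_32(const(ubyte)[] data, uint seed = 0)
    {
        enum uint c1 = 0xcc9e2d51;
        enum uint c2 = 0x1b873593;
        uint h = seed;
        size_t i = 0;

        // Body: mix one 4-byte little-endian block at a time.
        for (; i + 4 <= data.length; i += 4)
        {
            uint k = data[i] | (data[i+1] << 8)
                   | (data[i+2] << 16) | (data[i+3] << 24);
            k *= c1;
            k = (k << 15) | (k >> 17);
            k *= c2;
            h ^= k;
            h = (h << 13) | (h >> 19);
            h = h * 5 + 0xe6546b64;
        }

        // Tail: mix the remaining 0-3 bytes.
        uint k = 0;
        switch (data.length & 3)
        {
            case 3: k ^= data[i+2] << 16; goto case;
            case 2: k ^= data[i+1] << 8;  goto case;
            case 1: k ^= data[i];
                    k *= c1;
                    k = (k << 15) | (k >> 17);
                    k *= c2;
                    h ^= k;
                    break;
            default: break;
        }

        // Finalization: force avalanching of the final bits.
        h ^= cast(uint) data.length;
        h ^= h >> 16;
        h *= 0x85ebca6b;
        h ^= h >> 13;
        h *= 0xc2b2ae35;
        h ^= h >> 16;
        return h;
    }

Hashing a string is then just murmur3_32(cast(const(ubyte)[]) s).
(Phobos has since gained std.digest.murmurhash, but that also postdates
this thread.)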