On 05/25/12 21:43, Ulf Zibis wrote:
To me it seems that computing the murmur hash is more expensive, especially on short strings, than the old hash algorithm.
It is definitely slower, but it also definitely has better statistical properties (fewer collisions when used in hash tables).

In its original (C) context, Murmur hash is often about as fast as other string hashes because it operates on 4-byte words rather than 1-byte elements at a time. So even though the per-element cost is much higher, it takes 1/4 the steps. When done on Java char[]s, though, it can only process 2 chars at a time. (Plus, we cannot necessarily read 32 bits at a time because of possible byte-swapping.) So it is a struggle to make it only twice as slow.

This means that any application that hashes strings only once will be slower than with the (old) hashCode, but any application that uses the (cached) hashes more than a few times will tend to run faster than the old hashCode version because of the higher-quality hash codes. A few tests so far confirm this.

Because performance is so sensitive to this re-use factor, it is hard to test the expected performance impact without other JVM/JDK changes. To get a better handle on this, I put together an emulation (using Unsafe) that stashes the murmur3 hash in the current String.hash field. This is not exactly correct for String objects that might be concurrently hashed by different threads (one via the hash32 emulation, one via plain hashCode), but it might be useful for people to try in contexts where this can't happen. See code at http://gee.cs.oswego.edu/dl/wwwtmp/Althash.java
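To make the two-chars-per-word point concrete, here is a minimal sketch of a Murmur3-style 32-bit hash over a Java char[], packing two 16-bit chars into each 32-bit word. The constants are the standard Murmur3_x86_32 block and finalization constants; the class and method names are my own illustration, not the actual code in Althash.java.

```java
// Sketch: Murmur3_x86_32-style hash consuming a char[] two chars
// (32 bits) per mixing step. Illustrative only; not Althash.java.
public final class Murmur3CharSketch {
    private static final int C1 = 0xcc9e2d51;
    private static final int C2 = 0x1b873593;

    public static int hash32(char[] data, int seed) {
        int h = seed;
        int i = 0;
        // Body: pack two chars into one little-endian 32-bit word.
        for (; i + 1 < data.length; i += 2) {
            int k = data[i] | (data[i + 1] << 16);
            k *= C1;
            k = Integer.rotateLeft(k, 15);
            k *= C2;
            h ^= k;
            h = Integer.rotateLeft(h, 13);
            h = h * 5 + 0xe6546b64;
        }
        // Tail: at most one leftover char (a 2-byte tail in Murmur3 terms).
        if (i < data.length) {
            int k = data[i];
            k *= C1;
            k = Integer.rotateLeft(k, 15);
            k *= C2;
            h ^= k;
        }
        // Finalization: mix in the byte length, then avalanche.
        h ^= data.length * 2;
        h ^= h >>> 16;
        h *= 0x85ebca6b;
        h ^= h >>> 13;
        h *= 0xc2b2ae35;
        h ^= h >>> 16;
        return h;
    }

    public static void main(String[] args) {
        System.out.println(hash32("hello".toCharArray(), 0));
        System.out.println(hash32("hellp".toCharArray(), 0));
    }
}
```

Note how the body loop does roughly five arithmetic ops per two chars, versus one multiply-add per char in the old `h = 31*h + c` hashCode; that is the per-step cost Doug is weighing against the halved step count.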
So I think a developer should have influence on the hash algorithm used by a hash map.
I've had to make a lot of internal adjustments in hash tables to counteract crummy user hashCode() algorithms over the years, so I think that the less control, the better :-) -Doug
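One example of the kind of internal adjustment Doug mentions is the supplemental hash function that java.util.HashMap (JDK 6/7 era) applied to every user-supplied hashCode(), so that hashes differing only in their high bits don't all collide in a power-of-two-sized table. The `spread` method below uses those JDK constants; the surrounding demo class is my own illustration.

```java
// Demo: defensive bit-spreading applied on top of user hashCode().
public final class SupplementalHashDemo {
    // Supplemental hash (JDK 6/7 HashMap.hash(int)): XORs higher bits
    // downward so low-order index bits see the whole hash.
    static int spread(int h) {
        h ^= (h >>> 20) ^ (h >>> 12);
        return h ^ (h >>> 7) ^ (h >>> 4);
    }

    public static void main(String[] args) {
        int tableMask = 15;  // a 16-bucket table indexes with hash & 15
        // A "crummy" hashCode whose values differ only above bit 16:
        int a = 1 << 16, b = 2 << 16, c = 3 << 16;
        // Raw hashes all land in bucket 0...
        System.out.println((a & tableMask) + " " + (b & tableMask)
                + " " + (c & tableMask));       // prints "0 0 0"
        // ...but spread hashes land in distinct buckets.
        System.out.println((spread(a) & tableMask) + " "
                + (spread(b) & tableMask) + " "
                + (spread(c) & tableMask));     // prints "1 2 3"
    }
}
```

This spreading happens inside the map, invisible to callers, which is the sense in which the table can compensate for weak hash functions without handing users any extra control.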
