[ https://issues.apache.org/jira/browse/CASSANDRA-2975?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13073207#comment-13073207 ]
Brian Lindauer commented on CASSANDRA-2975:
-------------------------------------------

Surprising, but yes. It's dramatically faster. The MurmurHash author reports a 50% speedup over v2 at http://code.google.com/p/smhasher/wiki/MurmurHash3. I ran my own simple benchmark on the Java version, comparing the existing MurmurHash.hash64() function to the MurmurHash.hash3_x64_128() I added, and found an even larger advantage. The improvement is so large that I wonder a little whether there is a flaw in my test, but here it is:

{code:java}
start = System.currentTimeMillis();
long[] reta = {0, 0};
ByteBuffer buf = strToByteBuffer(key);
for (int i = 0; i < cnt; i++) {
    // Reset position and limit so every iteration hashes the same bytes.
    buf.clear();
    // Feed the previous result back in as the seed so each call depends on the last.
    reta = MurmurHash.hash3_x64_128(buf, 0, key.length(), (int) reta[0]);
}
end = System.currentTimeMillis();
System.err.println("Ran v3 " + cnt + " times in " + (end - start) + " ms.");
{code}

Similarly for v2. Output:

{code}
Ran v2 100000000 times in 19993 ms.
Ran v3 100000000 times in 3104 ms.
{code}

FWIW, I also ran some tests where I generated random strings and seeds, submitted them to both the reference implementation and the Java port, and found no differences.

> Upgrade MurmurHash to version 3
> -------------------------------
>
>                 Key: CASSANDRA-2975
>                 URL: https://issues.apache.org/jira/browse/CASSANDRA-2975
>             Project: Cassandra
>          Issue Type: Improvement
>          Components: Core
>    Affects Versions: 0.8.3
>            Reporter: Brian Lindauer
>            Priority: Trivial
>
> MurmurHash version 3 was finalized on June 3. It provides an enormous speedup
> and increased robustness over version 2, which is implemented in Cassandra.
> Information here:
> http://code.google.com/p/smhasher/
> The reference implementation is here:
> http://code.google.com/p/smhasher/source/browse/trunk/MurmurHash3.cpp?spec=svn136&r=136
> I have already done the work to port the (public domain) reference
> implementation to Java in the MurmurHash class and updated the BloomFilter
> class to use the new implementation:
> https://github.com/lindauer/cassandra/commit/cea6068a4a3e5d7d9509335394f9ef3350d37e93
> Apart from the faster hash time, the new version only requires one call to
> hash() rather than 2, since it returns 128 bits of hash instead of 64.
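
To illustrate the last point of the description, here is a minimal sketch of how a single 128-bit hash result could supply all the bucket indexes a Bloom filter needs, using the common double-hashing scheme index_i = (h1 + i * h2) mod bucketCount. Whether the linked commit does exactly this is not confirmed here. The class name, method names, and placeholderHash128() are hypothetical; the placeholder is NOT MurmurHash3 and only stands in for a call like hash3_x64_128() so that the example is self-contained.

```java
import java.nio.charset.StandardCharsets;
import java.util.Arrays;

public class BloomIndexSketch {

    // 64-bit finalizer from the MurmurHash3 reference code (fmix64),
    // used here only as a convenient bit mixer for the placeholder below.
    static long fmix64(long k) {
        k ^= k >>> 33;
        k *= 0xff51afd7ed558ccdL;
        k ^= k >>> 33;
        k *= 0xc4ceb9fe1a85ec53L;
        k ^= k >>> 33;
        return k;
    }

    // Hypothetical stand-in for a 128-bit hash: returns two 64-bit halves.
    // This is NOT MurmurHash3; swap in the real function when available.
    static long[] placeholderHash128(byte[] key, long seed) {
        long h1 = seed;
        long h2 = ~seed;
        for (byte b : key) {
            h1 = fmix64(h1 ^ (b & 0xff));
            h2 = fmix64(h2 + (b & 0xff) + 0x9e3779b97f4a7c15L);
        }
        return new long[] {h1, h2};
    }

    // Derive hashCount bucket indexes from ONE hash call by combining
    // the two 64-bit halves: index_i = (h1 + i * h2) mod bucketCount.
    static long[] bucketIndexes(byte[] key, int hashCount, long bucketCount) {
        long[] h = placeholderHash128(key, 0L);
        long[] indexes = new long[hashCount];
        for (int i = 0; i < hashCount; i++) {
            // floorMod keeps the result in [0, bucketCount) even if the sum is negative.
            indexes[i] = Math.floorMod(h[0] + i * h[1], bucketCount);
        }
        return indexes;
    }

    public static void main(String[] args) {
        byte[] key = "example-key".getBytes(StandardCharsets.UTF_8);
        System.out.println(Arrays.toString(bucketIndexes(key, 5, 1_000_003L)));
    }
}
```

With a 64-bit hash, the two values h1 and h2 would need two separate hash() calls with different seeds; with 128 bits of output, both halves come from one pass over the key.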