[ 
https://issues.apache.org/jira/browse/CASSANDRA-2975?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13073207#comment-13073207
 ] 

Brian Lindauer commented on CASSANDRA-2975:
-------------------------------------------

Surprising, but yes: it's dramatically faster. The MurmurHash author reports a 
50% speedup over v2 at http://code.google.com/p/smhasher/wiki/MurmurHash3. I 
ran my own simple benchmark on the Java version, comparing the existing 
MurmurHash.hash64() function to the MurmurHash.hash3_x64_128() I added, and 
found an even larger advantage. The improvement is so large that I wonder 
whether there is a flaw in my test, but here it is:

{code:java}
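// start, end (long), cnt (int) and key (String) are assumed to be declared earlier.
start = System.currentTimeMillis();
long[] reta = {0, 0};
ByteBuffer buf = strToByteBuffer(key);
for (int i = 0; i < cnt; i++)
{
  buf.clear();
  // Each result seeds the next call, so the calls form a dependency chain.
  reta = MurmurHash.hash3_x64_128(buf, 0, key.length(), (int) reta[0]);
}
end = System.currentTimeMillis();
System.err.println("Ran v3 " + cnt + " times in " + (end - start) + " ms.");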
{code}

Similarly for v2.

Output:
{code}
Ran v2 100000000 times in 19993 ms.
Ran v3 100000000 times in 3104 ms.
{code}
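
For scale, the two timings above work out to 19993 / 3104, or roughly a 6.4x 
speedup. One way to cross-check a measurement like this is to add a warm-up 
pass and time with System.nanoTime(). The following is only a minimal sketch 
under stated assumptions: HashBench, SeededHash, and the FNV-1a stand-in are 
hypothetical names that are not part of Cassandra, and the stand-in would have 
to be swapped for the real MurmurHash calls to reproduce the comparison.

```java
import java.nio.ByteBuffer;
import java.nio.charset.StandardCharsets;

final class HashBench
{
    /** Hypothetical interface: any hash that takes a buffer and a seed. */
    interface SeededHash
    {
        long hash(ByteBuffer buf, long seed);
    }

    /** Stand-in 64-bit FNV-1a variant. NOT MurmurHash; it only lets the sketch compile alone. */
    static long fnv1a64(ByteBuffer buf, long seed)
    {
        long h = 0xcbf29ce484222325L ^ seed;
        // Absolute reads leave the buffer position untouched between calls.
        for (int i = buf.position(); i < buf.limit(); i++)
        {
            h ^= (buf.get(i) & 0xff);
            h *= 0x100000001b3L;
        }
        return h;
    }

    /** Runs fn cnt times, chaining each result into the next seed. Returns {elapsedNanos, lastHash}. */
    static long[] time(SeededHash fn, ByteBuffer buf, int cnt)
    {
        long seed = 0;
        long start = System.nanoTime();
        for (int i = 0; i < cnt; i++)
            seed = fn.hash(buf, seed);
        long elapsed = System.nanoTime() - start;
        return new long[] { elapsed, seed };
    }

    /** Ratio of the old time to the new time. */
    static double speedup(long oldTime, long newTime)
    {
        return (double) oldTime / newTime;
    }

    public static void main(String[] args)
    {
        ByteBuffer buf = ByteBuffer.wrap("some row key".getBytes(StandardCharsets.UTF_8));
        time(HashBench::fnv1a64, buf, 1_000_000);             // warm-up pass, result discarded
        long[] r = time(HashBench::fnv1a64, buf, 10_000_000); // measured pass
        // Printing the last hash keeps the result observably used.
        System.err.println("Took " + (r[0] / 1_000_000) + " ms, last hash " + r[1]);
        System.err.println("Reported v2/v3 ratio: " + speedup(19993, 3104));
    }
}
```

Running each implementation through the same time() method, after an 
identical warm-up, keeps the two measurements comparable.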

FWIW, I also ran some tests where I generated random strings and seeds and 
submitted them to both the reference implementation and the Java port and found 
no differences.
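
A randomized comparison of that kind can be sketched as below. This is not 
the test that was actually run: DifferentialCheck and SeededHash are 
hypothetical names, and two CRC32-based stand-ins take the place of the 
reference implementation and the Java port so that the sketch is runnable on 
its own.

```java
import java.util.Random;
import java.util.zip.CRC32;

final class DifferentialCheck
{
    /** Hypothetical interface: any hash that takes raw bytes and a seed. */
    interface SeededHash
    {
        long hash(byte[] data, int seed);
    }

    /** Returns the number of random (input, seed) pairs on which a and b disagree. */
    static int countMismatches(SeededHash a, SeededHash b, int trials, long rngSeed)
    {
        Random rnd = new Random(rngSeed);
        int mismatches = 0;
        for (int t = 0; t < trials; t++)
        {
            byte[] data = new byte[rnd.nextInt(64)]; // random length from 0 to 63 bytes
            rnd.nextBytes(data);
            int seed = rnd.nextInt();
            if (a.hash(data, seed) != b.hash(data, seed))
                mismatches++;
        }
        return mismatches;
    }

    /** Stand-in "reference": CRC32 fed the whole array at once. */
    static long crcBulk(byte[] data, int seed)
    {
        CRC32 crc = new CRC32();
        crc.update(seed); // update(int) consumes only the low 8 bits of the seed
        crc.update(data);
        return crc.getValue();
    }

    /** Stand-in "port": the same checksum computed one byte at a time. */
    static long crcBytewise(byte[] data, int seed)
    {
        CRC32 crc = new CRC32();
        crc.update(seed);
        for (byte x : data)
            crc.update(x);
        return crc.getValue();
    }
}
```

A result of zero from countMismatches() only shows agreement on the sampled 
inputs; a fixed rngSeed makes any failure reproducible.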

> Upgrade MurmurHash to version 3
> -------------------------------
>
>                 Key: CASSANDRA-2975
>                 URL: https://issues.apache.org/jira/browse/CASSANDRA-2975
>             Project: Cassandra
>          Issue Type: Improvement
>          Components: Core
>    Affects Versions: 0.8.3
>            Reporter: Brian Lindauer
>            Priority: Trivial
>
> MurmurHash version 3 was finalized on June 3. It provides an enormous speedup 
> and increased robustness over version 2, which is implemented in Cassandra. 
> Information here:
> http://code.google.com/p/smhasher/
> The reference implementation is here:
> http://code.google.com/p/smhasher/source/browse/trunk/MurmurHash3.cpp?spec=svn136&r=136
> I have already done the work to port the (public domain) reference 
> implementation to Java in the MurmurHash class and updated the BloomFilter 
> class to use the new implementation:
> https://github.com/lindauer/cassandra/commit/cea6068a4a3e5d7d9509335394f9ef3350d37e93
> Apart from the faster hash time, the new version only requires one call to 
> hash() rather than 2, since it returns 128 bits of hash instead of 64.
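
The point in the description about needing one hash() call instead of two can 
be illustrated with double hashing, where the two 64-bit halves of a single 
128-bit result generate all of a Bloom filter's bit positions. This is a 
minimal sketch, assuming a double-hashing scheme; BloomIndexes and indexes() 
are hypothetical names, and the linked commit may derive its positions 
differently.

```java
final class BloomIndexes
{
    /**
     * Derives k bit positions in [0, numBits) from the two 64-bit halves
     * (h1, h2) of one 128-bit hash, as h1 + i * h2 reduced modulo numBits.
     */
    static long[] indexes(long h1, long h2, int k, long numBits)
    {
        long[] out = new long[k];
        long combined = h1;
        for (int i = 0; i < k; i++)
        {
            // floorMod keeps the position non-negative even when combined is negative.
            out[i] = Math.floorMod(combined, numBits);
            combined += h2;
        }
        return out;
    }
}
```

For example, indexes(5, 3, 4, 10) yields the positions 5, 8, 1, 4.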

--
This message is automatically generated by JIRA.
For more information on JIRA, see: http://www.atlassian.com/software/jira

        
