[ 
https://issues.apache.org/jira/browse/CASSANDRA-2975?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13150931#comment-13150931
 ] 

David Allsopp commented on CASSANDRA-2975:
------------------------------------------

Does anyone have any data on what the typical key size is (i.e. the average 
input size for the hash)?

I have a couple of optimisations for the MurmurHash3 implementation that I 
think give another 10-40% speedup, particularly for smaller values (e.g. 30% 
speedup for buffer lengths under 256 bytes) and no worse for large values (tens 
of KB). These results were on a AMD Phenom II X6 1055T @ 2.80 GHz, under 64-bit 
Windows 7, Java 1.6.0_27.

Firstly, inline the {rotl64}} calls , e.g. so that
{noformat}
k1 = rotl64(k1, 31);
{noformat}
becomes
{noformat}
k1 = (k1 << 31) | (k1 >>> 33);
{noformat}

Secondly, rather than a large {{switch-case}} to handle the 'tail', use nested 
{{if-else}} to form a simple binary search. Particularly for relatively small 
inputs, handling the 'tail' is a significant part of the computation. E.g:

{noformat}
int ln = length & 15;
if (ln > 8)
  {
     if (ln > 12)
       {
          // etc for cases 13 - 15
       }
     else
       {
          // cases 11 and 12
       }

  }
else
  {
     // etc for cases 1-7
  }
{noformat}

Will try to post a proper benchmark when I've tidied it up (run out of time 
today!) so anyone interested can try it on other hardware...

This latter optimisation is pretty verbose and ugly to look at - it _might_ be 
just as fast, and much more concise, to lookup the offsets and shifts from an 
array, and deal with the special cases 1 and 9 as, well, special cases - but 
haven't benchmarked this alternative yet...
                
> Upgrade MurmurHash to version 3
> -------------------------------
>
>                 Key: CASSANDRA-2975
>                 URL: https://issues.apache.org/jira/browse/CASSANDRA-2975
>             Project: Cassandra
>          Issue Type: Improvement
>          Components: Core
>            Reporter: Brian Lindauer
>            Assignee: Brian Lindauer
>            Priority: Trivial
>              Labels: lhf
>             Fix For: 1.1
>
>         Attachments: 
> 0001-Convert-BloomFilter-to-use-MurmurHash-v3-instead-of-.patch, 
> 0002-Backwards-compatibility-with-files-using-Murmur2-blo.patch
>
>
> MurmurHash version 3 was finalized on June 3. It provides an enormous speedup 
> and increased robustness over version 2, which is implemented in Cassandra. 
> Information here:
> http://code.google.com/p/smhasher/
> The reference implementation is here:
> http://code.google.com/p/smhasher/source/browse/trunk/MurmurHash3.cpp?spec=svn136&r=136
> I have already done the work to port the (public domain) reference 
> implementation to Java in the MurmurHash class and updated the BloomFilter 
> class to use the new implementation:
> https://github.com/lindauer/cassandra/commit/cea6068a4a3e5d7d9509335394f9ef3350d37e93
> Apart from the faster hash time, the new version only requires one call to 
> hash() rather than 2, since it returns 128 bits of hash instead of 64.

--
This message is automatically generated by JIRA.
If you think it was sent incorrectly, please contact your JIRA administrators: 
https://issues.apache.org/jira/secure/ContactAdministrators!default.jspa
For more information on JIRA, see: http://www.atlassian.com/software/jira

        

Reply via email to