[jira] [Commented] (CASSANDRA-2975) Upgrade MurmurHash to version 3

David Allsopp (Commented) (JIRA) Fri, 18 Nov 2011 12:47:16 -0800

    [ 
https://issues.apache.org/jira/browse/CASSANDRA-2975?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13153112#comment-13153112
 ]


David Allsopp commented on CASSANDRA-2975:
------------------------------------------

The benchmark does several rounds of warmup before timing each buffer size 
(the run below covers key lengths 1 to 31, so the upper bound of 32 appears 
to be exclusive). 

It reduces the number of iterations as the input buffer size grows 
(iterations = 100,000,000 / buffer size), so that each run processes a 
similar number of bytes - though this is probably irrelevant since the 
performance seems fairly constant with respect to buffer size.

{noformat}
Running test for buffer lengths from 1 to 32
         *|    Ratio: 0.96 for keylength 1 iterations=100000000
         *|    Ratio: 0.95 for keylength 2 iterations=50000000
         *|    Ratio: 0.95 for keylength 3 iterations=33333333
         *|    Ratio: 0.96 for keylength 4 iterations=25000000
         *|    Ratio: 0.94 for keylength 5 iterations=20000000
         *|    Ratio: 0.94 for keylength 6 iterations=16666666
         *|    Ratio: 0.96 for keylength 7 iterations=14285714
         *|    Ratio: 0.93 for keylength 8 iterations=12500000
        * |    Ratio: 0.89 for keylength 9 iterations=11111111
         *|    Ratio: 0.93 for keylength 10 iterations=10000000
         *|    Ratio: 0.95 for keylength 11 iterations=9090909
         *|    Ratio: 0.95 for keylength 12 iterations=8333333
         *|    Ratio: 0.93 for keylength 13 iterations=7692307
         *|    Ratio: 0.90 for keylength 14 iterations=7142857
         *|    Ratio: 0.95 for keylength 15 iterations=6666666
        * |    Ratio: 0.86 for keylength 16 iterations=6250000
        * |    Ratio: 0.87 for keylength 17 iterations=5882352
         *|    Ratio: 0.91 for keylength 18 iterations=5555555
        * |    Ratio: 0.83 for keylength 19 iterations=5263157
        * |    Ratio: 0.83 for keylength 20 iterations=5000000
        * |    Ratio: 0.80 for keylength 21 iterations=4761904
        * |    Ratio: 0.88 for keylength 22 iterations=4545454
         *|    Ratio: 0.91 for keylength 23 iterations=4347826
         *|    Ratio: 0.91 for keylength 24 iterations=4166666
        * |    Ratio: 0.88 for keylength 25 iterations=4000000
         *|    Ratio: 0.92 for keylength 26 iterations=3846153
        * |    Ratio: 0.85 for keylength 27 iterations=3703703
        * |    Ratio: 0.88 for keylength 28 iterations=3571428
        * |    Ratio: 0.88 for keylength 29 iterations=3448275
        * |    Ratio: 0.89 for keylength 30 iterations=3333333
         *|    Ratio: 0.92 for keylength 31 iterations=3225806
--------
Old (ms): 18938
New (ms): 17470
Overall ratio: 0.9224838948146583
{noformat}

i.e. the new version takes about 8% less time overall (17470 ms vs. 18938 ms); 
the per-keylength ratios range from 0.80 to 0.96.
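
For anyone who wants to reproduce the shape of this measurement without the 
attachment, here is a minimal sketch of the scheme described above: warmup 
rounds, an iteration count that shrinks with key length, and a new/old time 
ratio. It is not the attached Murmur3Benchmark.java. The class name, the 
number of warmup rounds, the reduced byte budget in main and both stand-in 
hash functions are assumptions made for illustration only.

```java
import java.util.Arrays;
import java.util.function.ToLongFunction;

// Hypothetical harness for illustration; NOT the attached Murmur3Benchmark.java.
final class HashBenchmarkSketch {
    // Assumed number of untimed warmup rounds per key length.
    static final int WARMUP_ROUNDS = 3;

    // Fewer iterations for longer keys, so each run hashes about bytesPerRun bytes.
    static long iterationsFor(long bytesPerRun, int keyLength) {
        return bytesPerRun / keyLength;
    }

    // Times `iterations` calls of `hash` on `key`; returns elapsed nanoseconds.
    static long time(ToLongFunction<byte[]> hash, byte[] key, long iterations) {
        long sink = 0;
        long start = System.nanoTime();
        for (long i = 0; i < iterations; i++) {
            sink += hash.applyAsLong(key);
        }
        long elapsed = System.nanoTime() - start;
        // Consume the result so the JIT cannot discard the loop as dead code.
        if (sink == Long.MIN_VALUE) System.out.println(sink);
        return elapsed;
    }

    // newTime / oldTime; a value below 1.0 means the new hash is faster.
    static double ratio(long oldTime, long newTime) {
        return (double) newTime / Math.max(1, oldTime);
    }

    // Warms up both hashes, then times each once and returns the ratio.
    static double measure(ToLongFunction<byte[]> oldHash, ToLongFunction<byte[]> newHash,
                          int keyLength, long iterations) {
        byte[] key = new byte[keyLength];
        for (int round = 0; round < WARMUP_ROUNDS; round++) {
            time(oldHash, key, iterations);
            time(newHash, key, iterations);
        }
        return ratio(time(oldHash, key, iterations), time(newHash, key, iterations));
    }

    public static void main(String[] args) {
        // Stand-in hash functions (assumptions); neither is MurmurHash 2 or 3.
        ToLongFunction<byte[]> oldHash = key -> Arrays.hashCode(key);
        ToLongFunction<byte[]> newHash = key -> {
            long h = 0xcbf29ce484222325L;
            for (byte b : key) { h ^= b; h *= 0x100000001b3L; }
            return h;
        };
        long bytesPerRun = 1_000_000L; // reduced budget so the demo finishes quickly
        for (int keyLength = 1; keyLength < 32; keyLength++) {
            long iterations = iterationsFor(bytesPerRun, keyLength);
            System.out.printf("Ratio: %.2f for keylength %d iterations=%d%n",
                    measure(oldHash, newHash, keyLength, iterations), keyLength, iterations);
        }
    }
}
```

With a byte budget of 100,000,000, iterationsFor gives the iteration counts 
shown in the output above (for example 33333333 for keylength 3), and 
ratio(18938, 17470) gives the overall ratio reported there.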


                
> Upgrade MurmurHash to version 3
> -------------------------------
>
>                 Key: CASSANDRA-2975
>                 URL: https://issues.apache.org/jira/browse/CASSANDRA-2975
>             Project: Cassandra
>          Issue Type: Improvement
>          Components: Core
>            Reporter: Brian Lindauer
>            Assignee: Brian Lindauer
>            Priority: Trivial
>              Labels: lhf
>             Fix For: 1.1
>
>         Attachments: 
> 0001-Convert-BloomFilter-to-use-MurmurHash-v3-instead-of-.patch, 
> 0002-Backwards-compatibility-with-files-using-Murmur2-blo.patch, 
> Murmur3Benchmark.java
>
>
> MurmurHash version 3 was finalized on June 3. It provides an enormous speedup 
> and increased robustness over version 2, which is implemented in Cassandra. 
> Information here:
> http://code.google.com/p/smhasher/
> The reference implementation is here:
> http://code.google.com/p/smhasher/source/browse/trunk/MurmurHash3.cpp?spec=svn136&r=136
> I have already done the work to port the (public domain) reference 
> implementation to Java in the MurmurHash class and updated the BloomFilter 
> class to use the new implementation:
> https://github.com/lindauer/cassandra/commit/cea6068a4a3e5d7d9509335394f9ef3350d37e93
> Apart from the faster hash time, the new version only requires one call to 
> hash() rather than 2, since it returns 128 bits of hash instead of 64.
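
To illustrate the last point of the description: if a single hash call 
yields 128 bits, its two 64-bit halves can feed a double-hashing scheme that 
derives all of a Bloom filter's bucket indexes without a second hash call. 
The sketch below shows that general technique, (h1 + i * h2) mod bucketCount. 
It is an assumption about how the halves might be combined, not a copy of 
the patch; the class and method names are hypothetical, and the hash itself 
is not implemented here.

```java
// Hypothetical helper for illustration; not taken from the attached patches.
final class DoubleHashingSketch {
    // Derives hashCount bucket indexes from two 64-bit hash halves using
    // double hashing: index_i = (h1 + i * h2) mod bucketCount.
    static long[] bucketIndexes(long h1, long h2, int hashCount, long bucketCount) {
        long[] indexes = new long[hashCount];
        for (int i = 0; i < hashCount; i++) {
            // floorMod keeps the index non-negative even if the sum overflows or is negative.
            indexes[i] = Math.floorMod(h1 + i * h2, bucketCount);
        }
        return indexes;
    }
}
```

Here h1 and h2 would be the low and high 64 bits of one 128-bit result, where 
a 64-bit hash would need two calls (for instance with different seeds) to 
supply the same two values.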

--
This message is automatically generated by JIRA.
If you think it was sent incorrectly, please contact your JIRA administrators: 
https://issues.apache.org/jira/secure/ContactAdministrators!default.jspa
For more information on JIRA, see: http://www.atlassian.com/software/jira
