[ https://issues.apache.org/jira/browse/CASSANDRA-9167?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=15258728#comment-15258728 ]
Robert Stupp commented on CASSANDRA-9167: ----------------------------------------- Rebased and updated the patch to address the comments. But I just removed the {{1.03}} fudge factor - otherwise, if the buckets-per-element is calculated from the rounded hash code , it ends up with a too high false-positive-ratio (sometime 20% and more). This patch also requires a change to {{KeyCacheCqlTest}}, because it checks the number requests against the key cache, which this patch reduces. Also scheduled a couple of cstar tests to check for any notable change. Just [Trades|http://cstar.datastax.com/tests/id/b87cc382-0bc3-11e6-a0f0-0256e416528f] shows a slight 5% improvement. The other tests [Read/Write|http://cstar.datastax.com/tests/id/e8e12b4e-0bc3-11e6-a0f0-0256e416528f] and [3 MV|http://cstar.datastax.com/tests/id/0c8b8fe4-0bc4-11e6-a0f0-0256e416528f] show a very marginal change, that I’d file under normal stress result variance. > Improve bloom-filter false-positive-ratio > ----------------------------------------- > > Key: CASSANDRA-9167 > URL: https://issues.apache.org/jira/browse/CASSANDRA-9167 > Project: Cassandra > Issue Type: Improvement > Reporter: Robert Stupp > Assignee: Robert Stupp > Priority: Minor > Labels: perfomance > > {{org.apache.cassandra.utils.BloomCalculations}} performs some table lookups > to calculate the bloom filter specification (size, # of hashes). Using the > exact maths for that computation brings a better false-positive-ratio (the > maths usually returns higher numbers for hash-counts). > TL;DR increasing the number of hash-rounds brings a nice improvement. Finally > it's a trade-off between CPU and I/O. > ||false-positive-chance||elements||capacity||hash count > new||false-positive-ratio new||hash count current||false-positive-ratio > current||improvement > |0.1|10000|50048|3|0.0848|3|0.0848|0 > |0.1|100000|500032|3|0.09203|3|0.09203|0 > |0.1|1000000|5000064|3|0.0919|3|0.0919|0 > |0.1|10000000|50000064|3|0.09182|3|0.09182|0 > |0.1|100000000|500000064|3|0.091874|3|0.091874|0 > |0.01|10000|100032|7|0.0092|5|0.0107|0.1630434783 > |0.01|100000|1000064|7|0.00818|5|0.00931|0.1381418093 > |0.01|1000000|10000064|7|0.008072|5|0.009405|0.1651387512 > |0.01|10000000|100000064|7|0.008174|5|0.009375|0.146929288 > |0.01|100000000|1000000064|7|0.008197|5|0.009428|0.150176894 > |0.001|10000|150080|10|0.0008|7|0.001|0.25 > |0.001|100000|1500032|10|0.0006|7|0.00094|0.5666666667 > |0.001|1000000|15000064|10|0.000717|7|0.000991|0.3821478382 > |0.001|10000000|150000064|10|0.000743|7|0.000992|0.33512786 > |0.001|100000000|1500000064|10|0.000741|7|0.001002|0.3522267206 > |0.0001|10000|200064|13|0|10|0.0002|#DIV/0! > |0.0001|100000|2000064|13|0.00004|10|0.0001|1.5 > |0.0001|1000000|20000064|13|0.000075|10|0.000091|0.2133333333 > |0.0001|10000000|200000064|13|0.000069|10|0.000087|0.2608695652 > |0.0001|100000000|2000000064|13|0.000068|10|0.00009|0.3235294118 > If we decide to allow more hash-rounds, it could be nicely back-ported even > to 2.0 without affecting existing sstables. -- This message was sent by Atlassian JIRA (v6.3.4#6332)