Improve Bloom filter efficiency
-------------------------------

                 Key: CASSANDRA-1220
                 URL: https://issues.apache.org/jira/browse/CASSANDRA-1220
             Project: Cassandra
          Issue Type: Improvement
          Components: Core
            Reporter: Coda Hale
            Priority: Minor
         Attachments: bloom-filter-calculation-correction.patch

Attached is a patch which resolves two issues:

First, the table of false positives for a given number of bits per element and 
a given number of hashes in {{BloomCalculations}} is incomplete. This patch 
includes the calculations from Table 5 of [Bloom Filters - The 
Math|http://pages.cs.wisc.edu/~cao/papers/summary-cache/node8.html#327], which 
is referenced in the documentation at the source of the table. This changes the 
optimal number of hashes for the Bloom filters created by {{SSTableWriter}} 
(with 15 bits per element) from 8 to 10, with a reduction in the false positive 
rate from 0.000852 to 0.000744. (The patch also extends the table to include 
false positive rates for 16-20 bits per element for future reference.)

Second, the values of {{BloomCalculations.optKPerBuckets}} were not correct, in 
that some of them did not choose the optimal number of hashes for a given 
number of bits per element, probably due to transcription errors in the past. 
This patch replaces that series of constants with an array of optimal hashes 
generated during {{BloomCalculation}}'s static initialization.

This patch should apply cleanly to the {{0.6}} branch and {{trunk}} with all 
tests passing.

-- 
This message is automatically generated by JIRA.
-
You can reply to this email to add a comment to the issue online.

Reply via email to