Yitong Zhou created HADOOP-11727:
------------------------------------

             Summary: Make org.hadoop.util.bloom.BloomFilter return the 
expected false positive probability
                 Key: HADOOP-11727
                 URL: https://issues.apache.org/jira/browse/HADOOP-11727
             Project: Hadoop Common
          Issue Type: Improvement
            Reporter: Yitong Zhou


When using a Bloom filter, it would sometimes be handy to know the current 
expected false positive rate, estimated as 
(bitSet's cardinality / vector size)^(# of hash functions), so that when the 
FP rate gets too high we can choose to rebuild the Bloom filter with a larger 
vector size.

The code would look like this:
{code}
  /**
   * Returns the expected false positive probability of the current filter,
   * estimated as (set bits / vector size) ^ (number of hash functions).
   *
   * @return The expected false positive probability
   */
  public double expectedFalsePositiveProbability() {
    return Math.pow((double) bits.cardinality() / vectorSize, nbHash);
  }
{code}
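
To try the estimate outside of Hadoop, here is a minimal standalone sketch that uses only java.util.BitSet. The class name FalsePositiveEstimate, the helper shouldRebuild, and the threshold values are hypothetical and only for illustration; they are not part of the existing BloomFilter API.

```java
import java.util.BitSet;

/** Standalone sketch of the proposed estimate; not Hadoop code. */
public class FalsePositiveEstimate {

  /**
   * Estimates the false positive probability as
   * (set bits / vector size) ^ (number of hash functions).
   */
  public static double expectedFalsePositiveProbability(
      BitSet bits, int vectorSize, int nbHash) {
    if (vectorSize <= 0 || nbHash <= 0) {
      throw new IllegalArgumentException("vectorSize and nbHash must be positive");
    }
    return Math.pow((double) bits.cardinality() / vectorSize, nbHash);
  }

  /** Hypothetical helper: true when the estimate exceeds the caller's limit. */
  public static boolean shouldRebuild(
      BitSet bits, int vectorSize, int nbHash, double maxFpRate) {
    return expectedFalsePositiveProbability(bits, vectorSize, nbHash) > maxFpRate;
  }

  public static void main(String[] args) {
    int vectorSize = 1000; // illustrative values only
    int nbHash = 3;

    BitSet bits = new BitSet(vectorSize);
    bits.set(0, 500); // sets bits 0..499, so half of the vector is filled

    // Fill ratio is 0.5, so the estimate is 0.5^3 = 0.125.
    double p = expectedFalsePositiveProbability(bits, vectorSize, nbHash);
    System.out.println("estimated FP probability: " + p);
    System.out.println("rebuild at 5% limit: "
        + shouldRebuild(bits, vectorSize, nbHash, 0.05));
  }
}
```

With half of the bits set and three hash functions, the estimate is 0.125, which is above a 5% limit, so a caller using this check would rebuild the filter with a larger vector.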

Does this sound like a reasonable minor function to add to the code base?



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)