Yitong Zhou created HADOOP-11727:
------------------------------------

             Summary: Make org.hadoop.util.bloom.BloomFilter returns the expected false positive probability
                 Key: HADOOP-11727
                 URL: https://issues.apache.org/jira/browse/HADOOP-11727
             Project: Hadoop Common
          Issue Type: Improvement
            Reporter: Yitong Zhou

When using a Bloom filter, it is sometimes handy to know the current expected false positive rate, (bitSet's cardinality / vector size)^(# of hash functions), so that when the FP rate gets too high we can choose to rebuild the Bloom filter with a larger size. The code would look like this:

{code}
/**
 * Returns the expected false positive probability of the current filter.
 *
 * @return The expected false positive probability
 */
public double expectedFalsePositiveProbability() {
  return Math.pow((double) bits.cardinality() / vectorSize, nbHash);
}
{code}

Does this sound like a reasonable minor function that could be added to the code base?
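
To make the intended use concrete, here is a minimal standalone sketch of the same estimate together with a rebuild check. It is not Hadoop code: the class name BloomFpEstimate, the helper shouldRebuild, and the threshold parameter maxFpRate are hypothetical names chosen for illustration. Only the formula and the names bits, vectorSize, and nbHash come from the proposal above.

```java
import java.util.BitSet;

// Hypothetical standalone illustration; not part of org.apache.hadoop.util.bloom.
public class BloomFpEstimate {

    /**
     * Estimated false positive probability for the current fill level:
     * (number of set bits / vector size) ^ (number of hash functions).
     */
    static double expectedFalsePositiveProbability(BitSet bits, int vectorSize, int nbHash) {
        if (vectorSize <= 0 || nbHash <= 0) {
            throw new IllegalArgumentException("vectorSize and nbHash must be positive");
        }
        return Math.pow((double) bits.cardinality() / vectorSize, nbHash);
    }

    /** Assumed policy: rebuild once the estimate exceeds a caller-chosen threshold. */
    static boolean shouldRebuild(BitSet bits, int vectorSize, int nbHash, double maxFpRate) {
        return expectedFalsePositiveProbability(bits, vectorSize, nbHash) > maxFpRate;
    }

    public static void main(String[] args) {
        int vectorSize = 1000;
        int nbHash = 3;
        BitSet bits = new BitSet(vectorSize);
        bits.set(0, 500); // pretend half of the bit vector is already set

        // Fill ratio 0.5 with 3 hash functions gives 0.5^3 = 0.125.
        System.out.println(expectedFalsePositiveProbability(bits, vectorSize, nbHash));
        System.out.println(shouldRebuild(bits, vectorSize, nbHash, 0.05));
    }
}
```

With half of the bits set and three hash functions the estimate is 0.125, so a caller with a 5% threshold would decide to rebuild. The estimate depends only on the current fill ratio, so it costs one cardinality() call and needs no count of inserted keys.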