[ https://issues.apache.org/jira/browse/HADOOP-2654?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]
Stu Hood updated HADOOP-2654:
-----------------------------
Attachment: counting-overflow-fourbit.patch
This patch implements CountingBloomFilter using 4-bit buckets packed into a
long[]. Since most of the running time is spent in the hash function, it runs
at approximately the same speed as BloomFilter.
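
To illustrate the layout described above, here is a minimal sketch of 4-bit
buckets packed into a long[], with counters that saturate instead of
overflowing. It is not the code from the attached patch: the class name
FourBitBuckets and its methods are hypothetical, and hashing is left out
entirely.

```java
/** Hypothetical sketch: sixteen 4-bit saturating counters packed into each long. */
final class FourBitBuckets {
    // Largest value a 4-bit bucket can hold; also the mask for one bucket.
    private static final long BUCKET_MAX = 15L;

    private final long[] words;

    FourBitBuckets(int numBuckets) {
        // 16 buckets fit in one 64-bit word; round up.
        words = new long[(numBuckets + 15) >>> 4];
    }

    /** Returns the current count (0..15) of the given bucket. */
    int get(int bucket) {
        int word = bucket >>> 4;          // which long holds the bucket
        int shift = (bucket & 0xF) << 2;  // bit offset of the bucket in that long
        return (int) ((words[word] >>> shift) & BUCKET_MAX);
    }

    /** Increments the bucket unless it is already at its maximum. */
    void increment(int bucket) {
        int word = bucket >>> 4;
        int shift = (bucket & 0xF) << 2;
        long value = (words[word] >>> shift) & BUCKET_MAX;
        if (value < BUCKET_MAX) {
            // value < 15, so the addition cannot carry into the next bucket.
            words[word] += 1L << shift;
        }
    }

    /** Decrements the bucket unless it is empty or saturated. */
    void decrement(int bucket) {
        int word = bucket >>> 4;
        int shift = (bucket & 0xF) << 2;
        long value = (words[word] >>> shift) & BUCKET_MAX;
        // A saturated bucket has lost its true count, so it must stay at 15.
        if (value > 0 && value < BUCKET_MAX) {
            words[word] -= 1L << shift;
        }
    }
}
```

In this sketch a bucket that reaches 15 stays there permanently, which trades
a slightly higher false positive rate for never producing a false negative.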
> CountingBloomFilter can overflow its storage
> --------------------------------------------
>
> Key: HADOOP-2654
> URL: https://issues.apache.org/jira/browse/HADOOP-2654
> Project: Hadoop
> Issue Type: Bug
> Components: contrib/hbase
> Reporter: Stu Hood
> Attachments: counting-overflow-fourbit.patch, counting-overflow.patch
>
>
> The org.onelab.filter.CountingBloomFilter implementation does not check the
> value of a bucket before incrementing/decrementing it. The buckets in a
> Counting Bloom filter must not be allowed to overflow, and if they reach
> their maximum value, they must not be allowed to decrement. This is the only
> way to preserve the assumptions of the filter (without larger buckets). See:
> http://en.wikipedia.org/wiki/Bloom_filter#Counting_filters
> Currently, if enough values hash to a bucket, the bucket can wrap back
> around to 0, after which the CountingBloomFilter may begin reporting false
> negatives.
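
The failure mode described in the issue can be reproduced in isolation. The
sketch below assumes a 4-bit bucket purely for illustration (the bucket width
of the original implementation may differ), and the class and method names
are hypothetical. It compares an unchecked increment with a saturating one.

```java
/** Hypothetical sketch comparing unchecked and saturating bucket increments. */
final class WrapAroundDemo {
    // Assumed bucket width for illustration only.
    static final int BUCKET_BITS = 4;
    static final int BUCKET_MAX = (1 << BUCKET_BITS) - 1; // 15

    /** Increment with no bounds check: the value wraps from 15 back to 0. */
    static int uncheckedIncrement(int bucket) {
        return (bucket + 1) & BUCKET_MAX;
    }

    /** Increment that stops at the maximum value instead of wrapping. */
    static int saturatingIncrement(int bucket) {
        return bucket < BUCKET_MAX ? bucket + 1 : bucket;
    }

    /** Simulates n keys hashing to the same, initially empty, bucket. */
    static int afterInserts(int n, boolean saturate) {
        int bucket = 0;
        for (int i = 0; i < n; i++) {
            bucket = saturate ? saturatingIncrement(bucket)
                              : uncheckedIncrement(bucket);
        }
        return bucket;
    }

    /** A membership test treats a key as possibly present only if its bucket is nonzero. */
    static boolean mightContain(int bucket) {
        return bucket != 0;
    }
}
```

With 16 inserts into one bucket, afterInserts(16, false) returns 0, so
mightContain reports all 16 keys as absent (a false negative), while
afterInserts(16, true) returns 15 and the keys are still reported as present.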