[ https://issues.apache.org/jira/browse/HADOOP-2654?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=12560490#action_12560490 ]
Stu Hood commented on HADOOP-2654:
----------------------------------

Yea, it is byte-wide: the storage in this implementation is an array of bytes, each byte representing a bucket.

On a related topic, I think it would be a nice improvement to convert from byte buckets to 4-bit buckets, as suggested in the article. Currently, we only effectively use 7 bits of each bucket, because Java bytes are signed, so only the range 0-127 is usable. It would also obviously be a space improvement.

> CountingBloomFilter can overflow its storage
> --------------------------------------------
>
>                 Key: HADOOP-2654
>                 URL: https://issues.apache.org/jira/browse/HADOOP-2654
>             Project: Hadoop
>          Issue Type: Bug
>          Components: contrib/hbase
>            Reporter: Stu Hood
>         Attachments: counting-overflow.patch
>
>
> The org.onelab.filter.CountingBloomFilter implementation does not check the
> value of a bucket before incrementing/decrementing it. The buckets in a
> Counting Bloom filter must not be allowed to overflow, and if they reach
> their maximum value, they must not be allowed to decrement. This is the only
> way to preserve the assumptions of the filter (without larger buckets). See:
> http://en.wikipedia.org/wiki/Bloom_filter#Counting_filters
> Currently, if enough values hash to a bucket, the CountingBloomFilter may
> begin reporting false negatives when it wraps back around to 0.

--
This message is automatically generated by JIRA.
-
You can reply to this email to add a comment to the issue online.
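As a rough illustration of what the comment proposes (this is a hypothetical sketch, not the attached counting-overflow.patch or the actual org.onelab.filter code): 4-bit buckets packed two per byte, with saturating increment and decrement. A bucket clamps at 15 rather than wrapping to 0, and a saturated bucket is never decremented, since its true count is unknown; this preserves the filter's no-false-negative guarantee at the cost of occasional stuck buckets.

```java
// Hypothetical sketch of saturating 4-bit counting-Bloom-filter buckets,
// two packed per byte. Class and method names are illustrative only.
public class NibbleBuckets {
    private final byte[] buckets;        // two 4-bit buckets per byte
    private static final int MAX = 0x0F; // saturation value for a 4-bit bucket

    public NibbleBuckets(int numBuckets) {
        buckets = new byte[(numBuckets + 1) / 2];
    }

    private int get(int i) {
        int b = buckets[i / 2] & 0xFF; // mask so the byte reads as unsigned
        return (i % 2 == 0) ? (b >>> 4) : (b & 0x0F);
    }

    private void set(int i, int value) {
        int b = buckets[i / 2] & 0xFF;
        b = (i % 2 == 0) ? (b & 0x0F) | (value << 4)
                         : (b & 0xF0) | value;
        buckets[i / 2] = (byte) b;
    }

    public void increment(int i) {
        int v = get(i);
        if (v < MAX) set(i, v + 1); // clamp at 15; never wrap back to 0
    }

    public void decrement(int i) {
        int v = get(i);
        // A saturated bucket may undercount, so it must stay saturated;
        // an empty bucket must not underflow.
        if (v > 0 && v < MAX) set(i, v - 1);
    }

    public int value(int i) {
        return get(i);
    }
}
```

Compared with the current byte-per-bucket storage, this halves the space while still allowing counts up to 15, which is ample for typical counting-filter workloads.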