[jira] Updated: (HADOOP-2654) CountingBloomFilter can overflow its storage

Stu Hood (JIRA) Fri, 18 Jan 2008 21:59:05 -0800

    [ https://issues.apache.org/jira/browse/HADOOP-2654?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]


Stu Hood updated HADOOP-2654:
-----------------------------

    Attachment: counting-overflow-fourbit.patch

This patch implements the CountingBloomFilter using 4-bit buckets packed into 
a long[]. Because most of the time is spent in the hash function, it runs at 
approximately the same speed as BloomFilter.
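
For readers without the attachment at hand, here is a minimal sketch of what 
4-bit buckets packed into a long[] could look like. It is not the code from 
counting-overflow-fourbit.patch: the class name FourBitCounters and its methods 
are hypothetical, and the sketch assumes 16 buckets per 64-bit word with 15 as 
the saturation value.

```java
// Hypothetical sketch, not the attached patch: 16 four-bit buckets per long.
final class FourBitCounters {
    private static final long BUCKET_MASK = 0xFL; // 15 = saturated bucket
    private final long[] words;

    FourBitCounters(int size) {
        // Round up so that every bucket index below size has a backing word.
        this.words = new long[(size + 15) >>> 4];
    }

    // Read one bucket: pick the word, shift its 4 bits down, mask them off.
    int get(int bucket) {
        int shift = (bucket & 15) << 2; // 4 bits per bucket
        return (int) ((words[bucket >>> 4] >>> shift) & BUCKET_MASK);
    }

    // Increment unless the bucket is already saturated, so the addition
    // can never carry into the neighbouring bucket.
    void increment(int bucket) {
        if (get(bucket) < BUCKET_MASK) {
            words[bucket >>> 4] += 1L << ((bucket & 15) << 2);
        }
    }

    // Decrement unless the bucket is empty or saturated. A saturated bucket
    // has lost track of its true count, so it stays pinned at 15.
    void decrement(int bucket) {
        int value = get(bucket);
        if (value > 0 && value < BUCKET_MASK) {
            words[bucket >>> 4] -= 1L << ((bucket & 15) << 2);
        }
    }
}
```

Because both the bounds check and the update are a shift, a mask and one 
add or subtract, the per-bucket cost stays small next to the hashing work.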

> CountingBloomFilter can overflow its storage
> --------------------------------------------
>
>                 Key: HADOOP-2654
>                 URL: https://issues.apache.org/jira/browse/HADOOP-2654
>             Project: Hadoop
>          Issue Type: Bug
>          Components: contrib/hbase
>            Reporter: Stu Hood
>         Attachments: counting-overflow-fourbit.patch, counting-overflow.patch
>
>
> The org.onelab.filter.CountingBloomFilter implementation does not check the 
> value of a bucket before incrementing/decrementing it. The buckets in a 
> Counting Bloom filter must not be allowed to overflow, and if they reach 
> their maximum value, they must not be allowed to decrement. This is the only 
> way to preserve the assumptions of the filter (without larger buckets). See: 
> http://en.wikipedia.org/wiki/Bloom_filter#Counting_filters
> Currently, if enough values hash to a bucket, the CountingBloomFilter may 
> begin reporting false negatives when that bucket wraps back around to 0.
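
The failure mode described in the issue can be illustrated with a single 
bucket. The sketch below is an illustration only and is not the 
org.onelab.filter code: the class BucketOverflowDemo and its helper methods 
are hypothetical, and a 4-bit bucket width is assumed purely for the demo.

```java
// Hypothetical single-bucket demo; a 4-bit bucket width is an assumption.
final class BucketOverflowDemo {
    static final int MAX = 15; // largest value a 4-bit bucket can hold

    // Unchecked increment: the value silently wraps from 15 back to 0.
    static int uncheckedIncrement(int value) {
        return (value + 1) & MAX;
    }

    // Saturating increment: the value sticks at MAX instead of wrapping.
    static int saturatingIncrement(int value) {
        return value < MAX ? value + 1 : MAX;
    }

    // Saturating decrement: empty and saturated buckets are left alone.
    static int saturatingDecrement(int value) {
        return (value > 0 && value < MAX) ? value - 1 : value;
    }

    public static void main(String[] args) {
        int wrapped = 0;
        int saturated = 0;
        // Sixteen keys hash to the same bucket.
        for (int i = 0; i < 16; i++) {
            wrapped = uncheckedIncrement(wrapped);
            saturated = saturatingIncrement(saturated);
        }
        // A membership test treats a zero bucket as "key absent".
        System.out.println("unchecked bucket:  " + wrapped);   // 0
        System.out.println("saturating bucket: " + saturated); // 15
    }
}
```

With the unchecked counter, all sixteen keys now look absent, which is the 
false negative the issue describes. The saturating counter can only err 
toward false positives, which a Bloom filter already tolerates.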

-- 
This message is automatically generated by JIRA.
-
You can reply to this email to add a comment to the issue online.
