[jira] Commented: (HADOOP-2588) org.onelab.filter.BloomFilter class uses 8X the memory it should be using

Hadoop QA (JIRA) Mon, 14 Jan 2008 00:06:10 -0800

    [ 
https://issues.apache.org/jira/browse/HADOOP-2588?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=12558535#action_12558535
 ]


Hadoop QA commented on HADOOP-2588:
-----------------------------------

+1 overall.  Here are the results of testing the latest attachment 
http://issues.apache.org/jira/secure/attachment/12373071/patch.txt
against trunk revision r611727.

    @author +1.  The patch does not contain any @author tags.

    javadoc +1.  The javadoc tool did not generate any warning messages.

    javac +1.  The applied patch does not generate any new compiler warnings.

    findbugs +1.  The patch does not introduce any new Findbugs warnings.

    core tests +1.  The patch passed core unit tests.

    contrib tests +1.  The patch passed contrib unit tests.

Test results: 
http://lucene.zones.apache.org:8080/hudson/job/Hadoop-Patch/1577/testReport/
Findbugs warnings: 
http://lucene.zones.apache.org:8080/hudson/job/Hadoop-Patch/1577/artifact/trunk/build/test/findbugs/newPatchFindbugsWarnings.html
Checkstyle results: 
http://lucene.zones.apache.org:8080/hudson/job/Hadoop-Patch/1577/artifact/trunk/build/test/checkstyle-errors.html
Console output: 
http://lucene.zones.apache.org:8080/hudson/job/Hadoop-Patch/1577/console

This message is automatically generated.

> org.onelab.filter.BloomFilter class uses 8X the memory it should be using
> -------------------------------------------------------------------------
>
>                 Key: HADOOP-2588
>                 URL: https://issues.apache.org/jira/browse/HADOOP-2588
>             Project: Hadoop
>          Issue Type: Improvement
>          Components: contrib/hbase
>    Affects Versions: 0.16.0
>         Environment: n/a
>            Reporter: Ian Clarke
>            Priority: Trivial
>             Fix For: 0.16.0
>
>         Attachments: patch.txt
>
>
> The org.onelab.filter.BloomFilter uses a boolean[] to store the filter. 
> However, in most Java implementations a boolean[] uses a byte per bit stored, 
> so it needs 8X the memory actually required.  This is unfortunate as 
> the whole point of a BloomFilter is to save memory.
> As a sidebar, the implementation looks a bit shaky in other ways, such as the 
> way hashes are generated from a SHA1 digest in the Filter class, and the way 
> the hash() method just assumes the digestBytes array will be long 
> enough.
> I discovered this while looking for a good Bloom Filter implementation to use 
> in my own project.  In the end I went ahead and implemented my own; it's very 
> simple and pretty elegant (even if I do say so myself ;) - you are welcome to 
> use it:
> http://locut.us/blog/2008/01/12/a-decent-stand-alone-java-bloom-filter-implementation/
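
For illustration only, here is a minimal sketch of the bit-packed storage the
issue asks for, using java.util.BitSet in place of a boolean[]. It is not the
attached patch and not the linked implementation. The class name
BitSetBloomSketch, its methods, and the double-hashing scheme are all
hypothetical and chosen only to keep the example self-contained; the original
Filter class derives its hashes from a SHA1 digest instead.

```java
import java.nio.charset.StandardCharsets;
import java.util.Arrays;
import java.util.BitSet;

// Hypothetical sketch; not part of Hadoop or org.onelab.filter.
final class BitSetBloomSketch {
    private final BitSet bits;    // one bit per slot, packed into 64-bit words
    private final int size;       // number of slots in the filter
    private final int hashCount;  // number of probes per key

    BitSetBloomSketch(int size, int hashCount) {
        if (size <= 0 || hashCount <= 0) {
            throw new IllegalArgumentException("size and hashCount must be positive");
        }
        this.bits = new BitSet(size);
        this.size = size;
        this.hashCount = hashCount;
    }

    // FNV-1a over the key bytes; an assumption of this sketch, not the SHA1 scheme.
    private static int fnv1a(byte[] key) {
        int h = 0x811C9DC5;
        for (byte b : key) {
            h ^= (b & 0xff);
            h *= 0x01000193;
        }
        return h;
    }

    // Double hashing: probe i lands at (h1 + i * h2) mod size, always in range.
    private int index(byte[] key, int i) {
        int h1 = Arrays.hashCode(key);
        int h2 = fnv1a(key) | 1; // force odd so successive probes differ
        return Math.floorMod(h1 + i * h2, size);
    }

    void add(String key) {
        byte[] k = key.getBytes(StandardCharsets.UTF_8);
        for (int i = 0; i < hashCount; i++) {
            bits.set(index(k, i));
        }
    }

    // False positives are possible; false negatives are not.
    boolean mightContain(String key) {
        byte[] k = key.getBytes(StandardCharsets.UTF_8);
        for (int i = 0; i < hashCount; i++) {
            if (!bits.get(index(k, i))) {
                return false;
            }
        }
        return true;
    }

    // Bits of backing storage currently allocated by the BitSet.
    int storageBits() {
        return bits.size();
    }
}
```

With a filter created as new BitSetBloomSketch(1024, 3), the BitSet holds its
1024 slots in roughly 128 bytes of backing storage, whereas a boolean[1024]
typically occupies about 1024 bytes on common JVMs.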

-- 
This message is automatically generated by JIRA.
-
You can reply to this email to add a comment to the issue online.
