[ https://issues.apache.org/jira/browse/HADOOP-2588?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=12558272#action_12558272 ]
Jim Kellerman commented on HADOOP-2588:
---------------------------------------
You must be looking at an older version than what is in trunk.
The current implementation uses a Jenkins hash rather than SHA-1.
You are correct that there is no guarantee about how a JVM lays out a
boolean array in memory.
Perhaps using a java.util.BitSet would be better.
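
To illustrate the BitSet suggestion, here is a minimal sketch of a Bloom filter
backed by java.util.BitSet, which packs its bits into long words instead of
relying on how a JVM stores a boolean[]. This is not the
org.onelab.filter.BloomFilter code: the class name BitSetBloomSketch, its
methods, and the placeholder hash (double hashing built on Arrays.hashCode,
standing in for the Jenkins hash used in trunk) are all assumptions made for
illustration.

```java
import java.util.Arrays;
import java.util.BitSet;

/** Hypothetical sketch only; not the org.onelab.filter.BloomFilter implementation. */
class BitSetBloomSketch {
    private final BitSet bits;    // one bit per filter position, packed into long words
    private final int size;       // number of bit positions in the filter
    private final int hashCount;  // number of positions set per key

    BitSetBloomSketch(int size, int hashCount) {
        if (size <= 0 || hashCount <= 0) {
            throw new IllegalArgumentException("size and hashCount must be positive");
        }
        this.bits = new BitSet(size);
        this.size = size;
        this.hashCount = hashCount;
    }

    // Placeholder hash (an assumption): derives the i-th position from two
    // values based on Arrays.hashCode. A real filter would use a stronger hash.
    private int position(byte[] key, int i) {
        int h1 = Arrays.hashCode(key);
        int h2 = Integer.rotateLeft(h1 * 0x9E3779B9, 15) | 1;
        return Math.floorMod(h1 + i * h2, size);
    }

    /** Sets the bit positions derived from the key. */
    void add(byte[] key) {
        for (int i = 0; i < hashCount; i++) {
            bits.set(position(key, i));
        }
    }

    /** Returns false if the key was definitely never added; true means "possibly added". */
    boolean mightContain(byte[] key) {
        for (int i = 0; i < hashCount; i++) {
            if (!bits.get(position(key, i))) {
                return false;
            }
        }
        return true;
    }

    /** Number of bits currently set; handy for checking the fill ratio. */
    int setBitCount() {
        return bits.cardinality();
    }
}
```

With this layout the filter needs roughly size / 8 bytes for its bits, rather
than the one byte per entry that a boolean[] typically occupies.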
> org.onelab.filter.BloomFilter class uses 8X the memory it should be using
> -------------------------------------------------------------------------
>
> Key: HADOOP-2588
> URL: https://issues.apache.org/jira/browse/HADOOP-2588
> Project: Hadoop
> Issue Type: Bug
> Components: contrib/hbase
> Environment: n/a
> Reporter: Ian Clarke
> Priority: Minor
>
> The org.onelab.filter.BloomFilter uses a boolean[] to store the filter,
> however in most Java implementations this will use a byte per bit stored,
> meaning that 8X the actual used memory is required. This is unfortunate as
> the whole point of a BloomFilter is to save memory.
> As a sidebar, the implementation looks a bit shaky in other ways, such as the
> way hashes are generated from a SHA1 digest in the Filter class, and the
> way the hash() method just assumes the digestBytes array will be long
> enough.
> I discovered this while looking for a good Bloom Filter implementation to use
> in my own project. In the end I went ahead and implemented my own; it's very
> simple and pretty elegant (even if I do say so myself ;) - you are welcome to
> use it:
> http://locut.us/blog/2008/01/12/a-decent-stand-alone-java-bloom-filter-implementation/