[ https://issues.apache.org/jira/browse/HADOOP-2588?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=12558274#action_12558274 ]

Ian Clarke commented on HADOOP-2588:
------------------------------------

You are right, the current implementation in trunk looks like an improvement, and is probably much faster than SHA1 (which was way overkill for a Bloom Filter).

The fact that the data structure, which is supposed to save RAM, is using 8X more RAM than it needs to strikes me as a pretty serious issue if this class sees much use in the codebase, but I will leave the prioritization up to your good judgement. Correcting the problem through use of a BitSet should be trivial enough anyway.

> org.onelab.filter.BloomFilter class uses 8X the memory it should be using
> -------------------------------------------------------------------------
>
>                 Key: HADOOP-2588
>                 URL: https://issues.apache.org/jira/browse/HADOOP-2588
>             Project: Hadoop
>          Issue Type: Bug
>          Components: contrib/hbase
>         Environment: n/a
>            Reporter: Ian Clarke
>            Priority: Trivial
>
> The org.onelab.filter.BloomFilter uses a boolean[] to store the filter,
> however in most Java implementations this will use a byte per bit stored,
> meaning that it requires 8X the memory it actually needs. This is unfortunate as
> the whole point of a BloomFilter is to save memory.
> As a sidebar, the implementation looks a bit shaky in other ways, such as the
> way hashes are generated from a SHA1 digest in the Filter class, and the
> way that it just assumes the digestBytes array will be long enough in the
> hash() method.
> I discovered this while looking for a good Bloom Filter implementation to use
> in my own project. In the end I went ahead and implemented my own, it's very
> simple and pretty elegant (even if I do say so myself ;) - you are welcome to
> use it:
> http://locut.us/blog/2008/01/12/a-decent-stand-alone-java-bloom-filter-implementation/

-- 
This message is automatically generated by JIRA.
-
You can reply to this email to add a comment to the issue online.
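
To make the boolean[] versus BitSet point concrete, here is a minimal sketch of a Bloom filter that keeps its bits in a java.util.BitSet. It is not the org.onelab.filter code, nor the implementation linked above. The class name BitSetBloomSketch, its methods, and the double-hashing scheme built on hashCode() are assumptions made purely for illustration.

```java
import java.util.BitSet;

/** Illustrative only: a hypothetical Bloom filter backed by a BitSet. */
class BitSetBloomSketch {
    private final BitSet bits;      // one bit per filter slot, packed into long words
    private final int numBits;      // number of slots in the filter
    private final int numHashes;    // number of indices set per key

    BitSetBloomSketch(int numBits, int numHashes) {
        if (numBits <= 0 || numHashes <= 0) {
            throw new IllegalArgumentException("numBits and numHashes must be positive");
        }
        this.bits = new BitSet(numBits);
        this.numBits = numBits;
        this.numHashes = numHashes;
    }

    // Assumed hashing scheme (double hashing): index_i = (h1 + i * h2) mod numBits.
    // A real filter would want a stronger hash than Object.hashCode().
    private int index(Object key, int i) {
        int h1 = key.hashCode();
        int h2 = Integer.rotateLeft(h1 * 0x9E3779B9, 15) | 1; // forced odd so it is never zero
        return Math.floorMod(h1 + i * h2, numBits);
    }

    void add(Object key) {
        for (int i = 0; i < numHashes; i++) {
            bits.set(index(key, i));
        }
    }

    /** False means definitely absent; true means possibly present. */
    boolean mightContain(Object key) {
        for (int i = 0; i < numHashes; i++) {
            if (!bits.get(index(key, i))) {
                return false;
            }
        }
        return true;
    }

    /** Bytes of bit storage allocated by the BitSet (object headers excluded). */
    int storageBytes() {
        return bits.size() / 8;
    }
}
```

With 1024 slots, storageBytes() reports 128 bytes, whereas a boolean[1024] typically occupies about 1024 bytes plus the array header on common JVMs. The exact layout of boolean[] is JVM-dependent, so the 8X figure is the usual case and not a guarantee.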