[ https://issues.apache.org/jira/browse/HADOOP-2588?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

Jim Kellerman updated HADOOP-2588:
----------------------------------

        Fix Version/s: 0.16.0
    Affects Version/s: 0.16.0
               Status: Patch Available  (was: Open)

> org.onelab.filter.BloomFilter class uses 8X the memory it should be using
> -------------------------------------------------------------------------
>
>                 Key: HADOOP-2588
>                 URL: https://issues.apache.org/jira/browse/HADOOP-2588
>             Project: Hadoop
>          Issue Type: Improvement
>          Components: contrib/hbase
>    Affects Versions: 0.16.0
>         Environment: n/a
>            Reporter: Ian Clarke
>            Priority: Trivial
>             Fix For: 0.16.0
>
>         Attachments: patch.txt
>
>
> The org.onelab.filter.BloomFilter uses a boolean[] to store the filter,
> however in most Java implementations this will use a byte per bit stored,
> meaning that 8X the memory actually needed is required. This is unfortunate
> as the whole point of a BloomFilter is to save memory.
> As a sidebar, the implementation looks a bit shaky in other ways, such as the
> way hashes are generated from a SHA1 digest in the Filter class, and the
> way that it just assumes the digestBytes array will be long enough in the
> hash() method.
> I discovered this while looking for a good Bloom Filter implementation to use
> in my own project. In the end I went ahead and implemented my own, it's very
> simple and pretty elegant (even if I do say so myself ;) - you are welcome to
> use it:
> http://locut.us/blog/2008/01/12/a-decent-stand-alone-java-bloom-filter-implementation/

-- 
This message is automatically generated by JIRA.
-
You can reply to this email to add a comment to the issue online.
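
To illustrate the memory point raised in the description, here is a minimal sketch of a Bloom filter backed by java.util.BitSet, which packs its bits into a long[] (one bit per slot) rather than a boolean[] (typically one byte per slot on common JVMs). This is not the attached patch.txt, not the org.onelab.filter code, and not the implementation linked above. The class name BitSetBloomFilter, its methods, and the FNV-1a-style double hashing are all assumptions made for illustration; the original code derives its hashes from a SHA1 digest instead.

```java
import java.nio.charset.StandardCharsets;
import java.util.BitSet;

/** Hypothetical sketch: a Bloom filter that stores one bit per slot via BitSet. */
class BitSetBloomFilter {
    private final BitSet bits;    // packed storage: 64 slots per long
    private final int size;       // number of bit slots (m)
    private final int hashCount;  // number of hash probes per key (k)

    BitSetBloomFilter(int size, int hashCount) {
        if (size <= 0 || hashCount <= 0) {
            throw new IllegalArgumentException("size and hashCount must be positive");
        }
        this.bits = new BitSet(size);
        this.size = size;
        this.hashCount = hashCount;
    }

    /** Sets the k bits selected for this key. */
    void add(String key) {
        byte[] data = key.getBytes(StandardCharsets.UTF_8);
        int h1 = hash(data, 0x811C9DC5);
        int h2 = hash(data, 0x01000193);
        for (int i = 0; i < hashCount; i++) {
            bits.set(index(h1, h2, i));
        }
    }

    /** Returns false if the key was definitely never added; true means "possibly added". */
    boolean mightContain(String key) {
        byte[] data = key.getBytes(StandardCharsets.UTF_8);
        int h1 = hash(data, 0x811C9DC5);
        int h2 = hash(data, 0x01000193);
        for (int i = 0; i < hashCount; i++) {
            if (!bits.get(index(h1, h2, i))) {
                return false;
            }
        }
        return true;
    }

    // Double hashing: the i-th probe is (h1 + i * h2) mod m, kept non-negative by floorMod.
    private int index(int h1, int h2, int i) {
        return Math.floorMod(h1 + i * h2, size);
    }

    // FNV-1a-style mixing with a caller-supplied seed (an illustrative choice, not SHA1).
    private static int hash(byte[] data, int seed) {
        int h = seed;
        for (byte b : data) {
            h ^= (b & 0xff);
            h *= 16777619;
        }
        return h;
    }
}
```

For example, new BitSetBloomFilter(1024, 3) needs roughly 128 bytes of bit storage, where a boolean[1024] would typically take about 1024 bytes plus the array header. Exact figures depend on the JVM.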