org.onelab.filter.BloomFilter class uses 8X the memory it should be using
-------------------------------------------------------------------------

                 Key: HADOOP-2588
                 URL: https://issues.apache.org/jira/browse/HADOOP-2588
             Project: Hadoop
          Issue Type: Bug
          Components: util
         Environment: n/a
            Reporter: Ian Clarke
            Priority: Minor


The org.onelab.filter.BloomFilter class uses a boolean[] to store the filter. 
In most Java implementations a boolean[] takes one byte per element, so the 
filter uses 8X the memory it actually needs.  This is unfortunate, as the whole 
point of a Bloom filter is to save memory.
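
To illustrate the difference, here is a minimal sketch that compares the two 
storage strategies.  It assumes a JVM that stores one byte per boolean[] 
element (typical, but not guaranteed by the language spec) and ignores object 
headers.  The class and method names are hypothetical and are not taken from 
the org.onelab.filter code; java.util.BitSet is just one way to pack bits.

```java
import java.util.BitSet;

public class BitStorageSketch {

    /** Approximate payload of a boolean[]: one byte per element (assumption). */
    static long booleanArrayBytes(int nBits) {
        return nBits;
    }

    /** Payload when bits are packed into 64-bit words, as java.util.BitSet does. */
    static long packedBytes(int nBits) {
        long words = (nBits + 63L) / 64L; // round up to whole long words
        return words * 8L;
    }

    public static void main(String[] args) {
        int nBits = 1 << 20; // filter size in bits, chosen only for illustration

        // A BitSet offers the same set/get operations a Bloom filter needs.
        BitSet bits = new BitSet(nBits);
        bits.set(12345);
        System.out.println(bits.get(12345)); // true
        System.out.println(bits.get(12346)); // false

        // Ratio of the two payload sizes for this filter size.
        System.out.println(booleanArrayBytes(nBits) / packedBytes(nBits)); // 8
    }
}
```

For a filter of 2^20 bits this works out to roughly 1 MB for the boolean[] 
against 128 KB for the packed form.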

As an aside, the implementation looks a bit shaky in other ways.  For example, 
hashes are generated from a SHA1 digest in the Filter class, and the hash() 
method just assumes that the digestBytes array will be long enough.
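
One way to avoid depending on the digest length is sketched below.  This is 
not the Filter class code and not necessarily how it should be fixed; the 
class name, method name, and parameters (nbHash, vectorSize) are assumptions 
made for illustration.  The idea is to take 4 bytes of the SHA-1 digest per 
hash value and, when the 20 bytes run out, compute a fresh digest salted with 
a round counter instead of reading past the end of the array.

```java
import java.security.MessageDigest;
import java.security.NoSuchAlgorithmException;

public class DigestHashSketch {

    /** Derives nbHash bit positions in [0, vectorSize) from SHA-1 digests of key. */
    static int[] indexes(byte[] key, int nbHash, int vectorSize) {
        if (nbHash <= 0 || vectorSize <= 0) {
            throw new IllegalArgumentException("nbHash and vectorSize must be positive");
        }
        try {
            MessageDigest sha1 = MessageDigest.getInstance("SHA-1");
            int[] result = new int[nbHash];
            byte[] digest = new byte[0]; // empty, so the first iteration computes a digest
            int pos = 0;
            int round = 0;
            for (int i = 0; i < nbHash; i++) {
                // Refill when fewer than 4 unread bytes remain, rather than
                // assuming the digest is long enough for every hash.
                if (pos + 4 > digest.length) {
                    sha1.reset();
                    sha1.update((byte) round++); // salt; wraps after 256 rounds
                    digest = sha1.digest(key);
                    pos = 0;
                }
                // Assemble 4 digest bytes into one int, big-endian.
                int v = ((digest[pos] & 0xFF) << 24)
                      | ((digest[pos + 1] & 0xFF) << 16)
                      | ((digest[pos + 2] & 0xFF) << 8)
                      | (digest[pos + 3] & 0xFF);
                pos += 4;
                result[i] = Math.floorMod(v, vectorSize); // never negative
            }
            return result;
        } catch (NoSuchAlgorithmException e) {
            throw new IllegalStateException("SHA-1 not available", e);
        }
    }

    public static void main(String[] args) {
        // 12 hashes need more than one 20-byte digest (5 ints per digest).
        int[] idx = indexes("example".getBytes(), 12, 1000);
        System.out.println(idx.length); // 12
    }
}
```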

I discovered this while looking for a good Bloom filter implementation to use 
in my own project.  In the end I went ahead and implemented my own.  It's very 
simple and pretty elegant (even if I do say so myself ;) - you are welcome to 
use it:

http://locut.us/blog/2008/01/12/a-decent-stand-alone-java-bloom-filter-implementation/


-- 
This message is automatically generated by JIRA.
-
You can reply to this email to add a comment to the issue online.
