Claude Warren created COLLECTIONS-801:
-----------------------------------------

             Summary: Simplify Bloom filter implementation
                 Key: COLLECTIONS-801
                 URL: https://issues.apache.org/jira/browse/COLLECTIONS-801
             Project: Commons Collections
          Issue Type: Improvement
    Affects Versions: 4.5
            Reporter: Claude Warren


The initial Bloom filter implementation has a number of issues arising from 
attempting to verify that filters were built the same way (i.e., same number of 
bits, number of functions, hashing algorithm, etc.).  The net result is 
significant overhead in processing the filters.  For a structure that is 
intended to speed up decisions, this is a fault.  There are also issues with 
implementation hiding in the current code.

The issue calls for:
 * Removal of the hash tracking, i.e. all the registration and verification 
that the hashes are the same.  Ensuring that filters use matching hashes 
becomes an operational issue for the developer using the library.  In most 
cases this is a trivial problem.
 * Removal of all methods that assume an implementation detail of the Bloom 
filter.  Rather than returning a list of indices or a list of bitmap longs (or 
bytes), the classes will implement a "Producer" style that uses IntConsumer, 
LongConsumer, etc. to receive the indices or bitmaps.  This pattern will carry 
forward to specialized filters such as counting Bloom filters, which will 
accept a BitCountConsumer to receive the count for each bit.
 * There are cases where Bloom filters need to be updated and cases where they 
do not.  The change should include merge() methods to produce new Bloom filters 
and mergeInPlace() methods to update the Bloom filters.
 * There are issues with some assumptions in the Hasher implementations: the 
static hasher does not function as described.  Part of the goal of this change 
is to simplify the mechanism for passing the internal state of one Bloom 
filter to another without either knowing how the other implements that state.  
This will be done via the "Producer" interface described above.  Hashers will 
be able to create IndexProducers that produce the indices for a given Bloom 
filter shape.
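
As a rough illustration of the points above, the "Producer" style together with 
merge() and mergeInPlace() might look something like the sketch below.  Only 
the names IndexProducer, merge() and mergeInPlace() come from this issue; 
forEachIndex(), SketchFilter, SketchHasher, indices() and the simple 
additive hashing scheme are assumptions made for the example and are not the 
proposed API.

```java
import java.util.BitSet;
import java.util.function.IntConsumer;

/** Hypothetical producer: pushes each enabled bit index to a consumer. */
interface IndexProducer {
    void forEachIndex(IntConsumer consumer);
}

/** Hypothetical filter backed by a BitSet; callers never see the BitSet. */
final class SketchFilter implements IndexProducer {
    private final int numberOfBits;
    private final BitSet bits;

    SketchFilter(int numberOfBits) {
        this.numberOfBits = numberOfBits;
        this.bits = new BitSet(numberOfBits);
    }

    @Override
    public void forEachIndex(IntConsumer consumer) {
        // Walk the set bits and hand each index to the consumer.
        for (int i = bits.nextSetBit(0); i >= 0; i = bits.nextSetBit(i + 1)) {
            consumer.accept(i);
        }
    }

    /** Updates this filter with the indices supplied by the other producer. */
    void mergeInPlace(IndexProducer other) {
        other.forEachIndex(bits::set);
    }

    /** Returns a new filter holding the union; this filter is unchanged. */
    SketchFilter merge(IndexProducer other) {
        SketchFilter result = new SketchFilter(numberOfBits);
        result.mergeInPlace(this);
        result.mergeInPlace(other);
        return result;
    }

    /** True if every index supplied by the other producer is enabled here. */
    boolean contains(IndexProducer other) {
        boolean[] all = {true};
        other.forEachIndex(i -> all[0] &= bits.get(i));
        return all[0];
    }

    int cardinality() {
        return bits.cardinality();
    }
}

/** Hypothetical hasher: index i is (initial + i * increment) mod numberOfBits. */
final class SketchHasher {
    private final long initial;
    private final long increment;

    SketchHasher(long initial, long increment) {
        this.initial = initial;
        this.increment = increment;
    }

    /** Creates a producer of indices for the given shape (bits, functions). */
    IndexProducer indices(int numberOfBits, int numberOfHashFunctions) {
        return consumer -> {
            long h = initial;
            for (int i = 0; i < numberOfHashFunctions; i++) {
                consumer.accept((int) Math.floorMod(h, (long) numberOfBits));
                h += increment;
            }
        };
    }
}
```

In this sketch the filter and the hasher both expose their state only through 
IndexProducer, so a filter can absorb either one without knowing how the other 
side stores its bits.  No check is made that two filters were built with the 
same shape or hash; as described above, that is left to the developer.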



--
This message was sent by Atlassian Jira
(v8.20.1#820001)
