[
https://issues.apache.org/jira/browse/COLLECTIONS-801?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
]
Claude Warren resolved COLLECTIONS-801.
---------------------------------------
Fix Version/s: 4.5
Resolution: Fixed
Closed with pull request
h1. Simplify bloom filters #258
> Simplify Bloom filter implementation
> ------------------------------------
>
> Key: COLLECTIONS-801
> URL: https://issues.apache.org/jira/browse/COLLECTIONS-801
> Project: Commons Collections
> Issue Type: Improvement
> Affects Versions: 4.5
> Reporter: Claude Warren
> Priority: Minor
> Fix For: 4.5
>
>
> The initial Bloom filter implementation has a number of issues arising from
> attempting to verify that filters were built the same way (ie. same number of
> bits, number of functions, hashing algorithm, etc.) The net result is that
> there is significant overhead in the processing of the filters. For a
> structure that is intended to speed up decisions this is a fault. There are
> also issues with implementation hiding in the current code.
> The issue calls for:
> * The removal of the hashing tracking, all the registration and verification
> that the hashes are the same. This becomes an operational issue for the
> developer using the library. In most cases this is a trivial problem.
> * Removal of all methods that assume an implementation detail of the Bloom
> filter. Rather than getting a list of indices or a list of bitmap longs (or
> bytes) the classes will implement a "Producer" style that uses IntConsumer,
> LongConsumer, etc to receive the indices or bit maps. This pattern to carry
> forward to specialized filters like counting Bloom filters that will accept a
> BitCountConsumer to receive the counts for each bit.
> * There are cases where Bloom filters need to be updated and cases where
> they do not. The change should include merge() methods to produce new Bloom
> filters and mergeInPlace() methods to update the Bloom filters.
> * There are issues with some assumptions in the Hasher implementations
> wherein the static hasher does not function as described. Part of the goal
> of this change is to simplify the mechanism to pass the internal state of one
> bloom filter to another without either knowing the implementation of that
> state in the other. This will be done via the "Producer" interface described
> above. Hashers will be able to create IndexProducers that will produce the
> indices for a Bloom filter shape.
--
This message was sent by Atlassian Jira
(v8.20.7#820007)