[jira] [Resolved] (COLLECTIONS-801) Simplify Bloom filter implementation

Claude Warren (Jira) Fri, 24 Jun 2022 04:30:03 -0700


     [ 
https://issues.apache.org/jira/browse/COLLECTIONS-801?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]


Claude Warren resolved COLLECTIONS-801.
---------------------------------------
    Fix Version/s: 4.5
       Resolution: Fixed

Closed with pull request 
 
h1. Simplify bloom filters #258

> Simplify Bloom filter implementation
> ------------------------------------
>
>                 Key: COLLECTIONS-801
>                 URL: https://issues.apache.org/jira/browse/COLLECTIONS-801
>             Project: Commons Collections
>          Issue Type: Improvement
>    Affects Versions: 4.5
>            Reporter: Claude Warren
>            Priority: Minor
>             Fix For: 4.5
>
>
> The initial Bloom filter implementation has a number of issues arising from 
> attempting to verify that filters were built the same way (ie. same number of 
> bits, number of functions, hashing algorithm, etc.)  The net result is that 
> there is significant overhead in the processing of the filters.  For a 
> structure that is intended to speed up decisions this is a fault.  There are 
> also issues with implementation hiding in the current code.
> The issue calls for:
>  * The removal of the hashing tracking, all the registration and verification 
> that the hashes are the same.  This becomes an operational issue for the 
> developer using the library.  In most cases this is a trivial problem.
>  * Removal of all methods that assume an implementation detail of the Bloom 
> filter.  Rather than getting a list of indices or a list of bitmap longs (or 
> bytes) the classes will implement a "Producer" style that uses IntConsumer, 
> LongConsumer, etc to receive the indices or bit maps.  This pattern to carry 
> forward to specialized filters like counting Bloom filters that will accept a 
> BitCountConsumer to receive the counts for each bit.
>  * There are cases where Bloom filters need to be updated and cases where 
> they do not.  The change should include merge() methods to produce new Bloom 
> filters and mergeInPlace() methods to update the Bloom filters.
>  * There are issues with some assumptions in the Hasher implementations 
> wherein the static hasher does not function as described.  Part of the goal 
> of this change is to simplify the mechanism to pass the internal state of one 
> bloom filter to another without either knowing the implementation of that 
> state in the other.  This will be done via the "Producer" interface described 
> above.  Hashers will be able to create IndexProducers that will produce the 
> indices for a Bloom filter shape.



--
This message was sent by Atlassian Jira
(v8.20.7#820007)

[jira] [Resolved] (COLLECTIONS-801) Simplify Bloom filter implementation

Reply via email to