[ 
https://issues.apache.org/jira/browse/COLLECTIONS-728?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16990527#comment-16990527
 ] 

Claude Warren commented on COLLECTIONS-728:
-------------------------------------------

*API Issue*

The Hash function takes a buffer and a seed and returns an integer.  In general 
this is a simple operation.  You might use MD5, SHA256, Murmur128_x64 or 
Murmur128_x86.  So the name must specify what hash is used -- I think we can 
agree on this.

The second the calculation may be iterative or cyclic.  Iterative is probably 
the one you are familiar with where the hashing function is called with a 
different seed each time and that result used.  cyclic is when two values are 
created: call them partA and partB.  The first time the function is called 
partA is returned.  for all subsequent calls partB is added to the previous 
results and returned.  This method is documented in the Cassandra codebase and 
their experimentation has shown that it does not impact the randomness of the 
subsequent Bloom filter but does significantly speed up processing.  Generally 
this method is used with a 128 bit hash as the result can be considers as 2 
longs.

Finally, the result of the function will differ depending on whether the 
numeric values were calculated using unsigned or signed arithmetic.

This information needs to be passed down to components that do not keep a 
reference to the hasher or may not even have the hasher implementation 
available.  When comparing 2 filters you have to know if the "shape" is the 
same.  but the "shape" does not maintain a reference to he hash function.

Classes like the static hasher which maintains a list of the bits that were 
enabled needs to know the shape but does not have a reference to the hash 
function.

These classes need to be able to verify that the function that generated the 
values was the same.

*Implementation Issues*

Are you saying that all classes in Collections have to be immutable?  Or that 
the classes noted are not mutable?  They are mutable and we had a discussion 
about this wherein the standard uses of Bloom filters requires  that they be 
mutable or significant overhead is imposed to create new Bloom filters whenever 
filters are merged (a common operation).




> BloomFilter contribution
> ------------------------
>
>                 Key: COLLECTIONS-728
>                 URL: https://issues.apache.org/jira/browse/COLLECTIONS-728
>             Project: Commons Collections
>          Issue Type: Task
>            Reporter: Claude Warren
>            Priority: Minor
>         Attachments: BF_Func.md, BloomFilter.java, BloomFilterI2.java, 
> Usage.md
>
>
> Contribution of BloomFilter library comprising base implementation and gated 
> collections.



--
This message was sent by Atlassian Jira
(v8.3.4#803005)

Reply via email to