[ https://issues.apache.org/jira/browse/COLLECTIONS-728?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16990527#comment-16990527 ]
Claude Warren commented on COLLECTIONS-728: ------------------------------------------- *API Issue* The Hash function takes a buffer and a seed and returns an integer. In general this is a simple operation. You might use MD5, SHA256, Murmur128_x64 or Murmur128_x86. So the name must specify what hash is used -- I think we can agree on this. The second the calculation may be iterative or cyclic. Iterative is probably the one you are familiar with where the hashing function is called with a different seed each time and that result used. cyclic is when two values are created: call them partA and partB. The first time the function is called partA is returned. for all subsequent calls partB is added to the previous results and returned. This method is documented in the Cassandra codebase and their experimentation has shown that it does not impact the randomness of the subsequent Bloom filter but does significantly speed up processing. Generally this method is used with a 128 bit hash as the result can be considers as 2 longs. Finally, the result of the function will differ depending on whether the numeric values were calculated using unsigned or signed arithmetic. This information needs to be passed down to components that do not keep a reference to the hasher or may not even have the hasher implementation available. When comparing 2 filters you have to know if the "shape" is the same. but the "shape" does not maintain a reference to he hash function. Classes like the static hasher which maintains a list of the bits that were enabled needs to know the shape but does not have a reference to the hash function. These classes need to be able to verify that the function that generated the values was the same. *Implementation Issues* Are you saying that all classes in Collections have to be immutable? Or that the classes noted are not mutable? They are mutable and we had a discussion about this wherein the standard uses of Bloom filters requires that they be mutable or significant overhead is imposed to create new Bloom filters whenever filters are merged (a common operation). > BloomFilter contribution > ------------------------ > > Key: COLLECTIONS-728 > URL: https://issues.apache.org/jira/browse/COLLECTIONS-728 > Project: Commons Collections > Issue Type: Task > Reporter: Claude Warren > Priority: Minor > Attachments: BF_Func.md, BloomFilter.java, BloomFilterI2.java, > Usage.md > > > Contribution of BloomFilter library comprising base implementation and gated > collections. -- This message was sent by Atlassian Jira (v8.3.4#803005)