[ https://issues.apache.org/jira/browse/COLLECTIONS-728?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16990798#comment-16990798 ]
Claude Warren commented on COLLECTIONS-728:
-------------------------------------------

Hasher names:

The Shape of the Bloom filter depends on several things. One is that the hashing function is consistent; that is, it uses the same techniques to generate the hashed values. Two different implementations of a hash function are the "same" as long as they generate the same values for the same input (referred to below as the "same function"). Comparing Bloom filters that do not use the same function does not make sense. A mechanism to distinguish between filters with different hashing is therefore desired, so that the user can be warned when such a comparison is attempted.

This is where the naming arises. The name is intended to act as a proxy for the implementation details and to give the user some idea of which hashes were used, which helps in resolving the conflict. You will note that the Shape class equality check verifies that the name is the same for this reason.

As in the Java Cryptography Architecture (JCA), two providers may supply implementations of the same hash function. Users assume that a JCA implementation has been vetted and is correctly implemented. In the Bloom filter case, an improperly implemented hash is only a serious issue in cross-application communication.

Note that the code in this contribution does not require that the name format be followed, only that different implementations be named differently. I do have a Caching Hasher implementation in another application that requires a cyclic hash function and will fail if the hash name does not match the one presented here.

Perhaps it makes more sense to have a name and booleans for cyclic/iterated and signed/unsigned. But I don't see a way to let users use new, old, or broken hash functions and still determine whether the same function is being used without using a name.
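A minimal Java sketch of the "name plus booleans" idea, assuming Java 16+ records. HashFunctionIdentity, Signedness, ProcessType, Shape, checkCompatible and the example hash names are hypothetical illustrations, not the contribution's actual API. The sketch only shows how a Shape equality check that includes the hash function's name lets a mismatch be reported to the user instead of silently comparing incompatible filters.

```java
import java.util.Objects;

// Hypothetical sketch: none of these type names come from the contribution.
public class HasherNameSketch {

    /** Whether hash values are treated as signed or unsigned. */
    enum Signedness { SIGNED, UNSIGNED }

    /** The two process types named in the comment; their exact semantics are assumed here. */
    enum ProcessType { CYCLIC, ITERATIVE }

    /**
     * Identity of a hash function. The name is a free-form String, so new, old,
     * or broken hash functions can be described without changing this code.
     * Implementations producing the same values for the same input share a name.
     */
    record HashFunctionIdentity(String name, Signedness signedness, ProcessType processType) {
        HashFunctionIdentity {
            Objects.requireNonNull(name, "name");
            Objects.requireNonNull(signedness, "signedness");
            Objects.requireNonNull(processType, "processType");
        }
    }

    /** Record equality compares every component, including the hash function identity. */
    record Shape(HashFunctionIdentity hashFunction, int numberOfBits, int numberOfHashFunctions) {
        Shape {
            Objects.requireNonNull(hashFunction, "hashFunction");
            if (numberOfBits < 1 || numberOfHashFunctions < 1) {
                throw new IllegalArgumentException("numberOfBits and numberOfHashFunctions must be positive");
            }
        }
    }

    /** Rejects a comparison between filters whose shapes (and therefore hash functions) differ. */
    static void checkCompatible(Shape a, Shape b) {
        if (!a.equals(b)) {
            // The names tell the user which hash functions are in conflict.
            throw new IllegalArgumentException("Incompatible shapes: "
                    + a.hashFunction().name() + " vs " + b.hashFunction().name());
        }
    }

    public static void main(String[] args) {
        HashFunctionIdentity murmur =
                new HashFunctionIdentity("Murmur3_x64_128", Signedness.SIGNED, ProcessType.CYCLIC);
        HashFunctionIdentity fnv =
                new HashFunctionIdentity("FNV1a_32", Signedness.UNSIGNED, ProcessType.ITERATIVE);

        checkCompatible(new Shape(murmur, 1024, 7), new Shape(murmur, 1024, 7)); // same function: passes
        try {
            checkCompatible(new Shape(murmur, 1024, 7), new Shape(fnv, 1024, 7));
        } catch (IllegalArgumentException e) {
            System.out.println(e.getMessage()); // Incompatible shapes: Murmur3_x64_128 vs FNV1a_32
        }
    }
}
```

Keeping the name an open String rather than a closed enum means anyone extending the framework can describe a new hash function without modifying the sketch's types; the two enums only capture the properties the comment suggests pairing with the name.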
In my mind, enums are appropriate under two basic conditions: 1) you know all the possible values; or 2) you want to tightly control the acceptable values. Neither of these conditions applies to hash function identification.

Addendum: In reading back over the previous post, I was struck by the use of the term "library". Perhaps this is just a "nomenclature" thing, or perhaps it is a "conceptual" thing. I see the Bloom filter contribution as a "framework", a scaffolding on which other developers may hang new implementations and tweaks. In my mind, a "library" is generally something that is used without modification or extension. I just wanted to ensure that we share the same "vision" of this contribution.

> BloomFilter contribution
> ------------------------
>
>                 Key: COLLECTIONS-728
>                 URL: https://issues.apache.org/jira/browse/COLLECTIONS-728
>             Project: Commons Collections
>          Issue Type: Task
>            Reporter: Claude Warren
>            Priority: Minor
>         Attachments: BF_Func.md, BloomFilter.java, BloomFilterI2.java, Usage.md
>
>
> Contribution of BloomFilter library comprising base implementation and gated
> collections.

--
This message was sent by Atlassian Jira
(v8.3.4#803005)