[ 
https://issues.apache.org/jira/browse/COLLECTIONS-728?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16990798#comment-16990798
 ] 

Claude Warren commented on COLLECTIONS-728:
-------------------------------------------

Hasher names:

The Shape of the Bloom filter is dependent on several things.  One is that the 
hashing function is consistent; that is, it uses the same techniques to generate 
the hashed values.  Two different implementations of a hash function are 
the "same" as long as they generate the same values for the same input 
(referred to below as the "same function"). 

Comparing Bloom filters that do not use the same function does not make sense.  
Thus a mechanism to distinguish between filters with different hashing is 
desired so that the user may be warned when such an attempt is made.  This is 
where the naming arises.  The naming is intended to provide a proxy for the 
implementation details as well as provide the user with some idea of what 
hashes were used in order to assist in the resolution of the conflict.

You will note that the Shape class equality check verifies that the name is the 
same for the same reason.
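
As a rough sketch of that idea (not the code in the contribution), a 
shape-like value object can carry the hash function name and include it in 
its equality check, so that a comparison can be refused with a message that 
names both hashes.  The class names ShapeSketch and ShapeCheck, the field 
names, and the name strings used with them are assumptions made purely for 
illustration.

```java
import java.util.Objects;

// Hypothetical stand-in for the Shape class; names and fields are assumptions.
final class ShapeSketch {
    final String hashFunctionName;   // proxy for the hashing implementation
    final int numberOfBits;
    final int numberOfHashFunctions;

    ShapeSketch(String hashFunctionName, int numberOfBits, int numberOfHashFunctions) {
        this.hashFunctionName = Objects.requireNonNull(hashFunctionName, "hashFunctionName");
        this.numberOfBits = numberOfBits;
        this.numberOfHashFunctions = numberOfHashFunctions;
    }

    // Two shapes are equal only if the sizes AND the hash function name match.
    @Override
    public boolean equals(Object o) {
        if (this == o) {
            return true;
        }
        if (!(o instanceof ShapeSketch)) {
            return false;
        }
        ShapeSketch other = (ShapeSketch) o;
        return numberOfBits == other.numberOfBits
                && numberOfHashFunctions == other.numberOfHashFunctions
                && hashFunctionName.equals(other.hashFunctionName);
    }

    @Override
    public int hashCode() {
        return Objects.hash(hashFunctionName, numberOfBits, numberOfHashFunctions);
    }
}

// Hypothetical helper that warns the user before filters are compared.
final class ShapeCheck {
    static void requireComparable(ShapeSketch a, ShapeSketch b) {
        if (!a.equals(b)) {
            // Reporting both names helps the user resolve the conflict.
            throw new IllegalArgumentException(
                    "Shapes differ: hash '" + a.hashFunctionName
                    + "' vs hash '" + b.hashFunctionName + "'");
        }
    }
}
```

With this sketch, two shapes built with the same sizes but different names 
compare as unequal, and requireComparable throws rather than letting the 
comparison proceed silently.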

As with the Java Cryptography Architecture (JCA), two providers may provide 
implementations of the same hash function.  Users assume that the JCA 
implementation has been vetted and is correctly implemented.  In the Bloom 
filter case an improperly implemented hash is only a serious issue in cross 
application communication. 

Note that the code in this contribution does not require the name format be 
followed, only that different implementations be named differently.  I do have 
a Caching Hasher implementation in another application that requires a cyclic 
hash function and will fail if the hash name does not match that presented here.

Perhaps it makes more sense to have a name and the booleans for cyclic/iterated 
and signed/unsigned.  But I don't see a way to provide users with the ability 
to use new, old or broken hash functions and be able to evaluate if the same 
function is being used without using a name. 
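
To make that alternative concrete, here is a minimal sketch of an identity 
made of a free-form name plus the two booleans.  Nothing here is taken from 
the contribution: HashIdentitySketch, CachingHasherSketch, and their fields 
are hypothetical names, and the check in the constructor only illustrates 
how a hasher that needs a cyclic function might reject anything else.

```java
import java.util.Objects;

// Hypothetical identity: a free-form name plus two flags (all names assumed).
final class HashIdentitySketch {
    final String name;      // still needed to tell new, old or broken hashes apart
    final boolean cyclic;   // true = cyclic, false = iterated
    final boolean signed;   // true = signed, false = unsigned

    HashIdentitySketch(String name, boolean cyclic, boolean signed) {
        this.name = Objects.requireNonNull(name, "name");
        this.cyclic = cyclic;
        this.signed = signed;
    }

    // The "same function" test: the name and both flags must agree.
    boolean isSameFunction(HashIdentitySketch other) {
        return name.equals(other.name)
                && cyclic == other.cyclic
                && signed == other.signed;
    }
}

// Hypothetical hasher that only works with cyclic hash functions.
final class CachingHasherSketch {
    final HashIdentitySketch identity;

    CachingHasherSketch(HashIdentitySketch identity) {
        if (!identity.cyclic) {
            throw new IllegalArgumentException(
                    "A cyclic hash function is required, got: " + identity.name);
        }
        this.identity = identity;
    }
}
```

Even in this form the name carries the real identity; the booleans only 
replace the part of the name format that a consumer would otherwise have to 
parse.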

In my mind Enums are appropriate in two basic conditions: 1) you know all the 
possible values; or 2) you want to tightly control the acceptable values.  
Neither of these conditions applies in the case of hash function identification.

Addendum:

In reading back over the previous post I was struck by the use of the term 
"library".  Perhaps this is just a "nomenclature" thing or perhaps it is a 
"conceptual" thing.  I see the Bloom filter contribution as a "framework", a 
scaffolding on which other developers may hang new implementations and tweaks.  
In my mind a "library" is generally something that is used without modification 
or extension.  I just wanted to ensure that we have the same "vision" of this 
contribution.

> BloomFilter contribution
> ------------------------
>
>                 Key: COLLECTIONS-728
>                 URL: https://issues.apache.org/jira/browse/COLLECTIONS-728
>             Project: Commons Collections
>          Issue Type: Task
>            Reporter: Claude Warren
>            Priority: Minor
>         Attachments: BF_Func.md, BloomFilter.java, BloomFilterI2.java, 
> Usage.md
>
>
> Contribution of BloomFilter library comprising base implementation and gated 
> collections.



--
This message was sent by Atlassian Jira
(v8.3.4#803005)
