I believe the issue (I think the history is at https://issues.apache.org/jira/browse/COLLECTIONS-728?page=com.atlassian.jira.plugin.system.issuetabpanels%3Acomment-tabpanel&focusedCommentId=17003600) is about how hash implementations are identified.
Currently there are a couple of types involved:

Hasher interface: has a method that returns a HashFunctionIdentity and a method that returns an iterator of enabled bits. There are a couple of implementations of Hasher: the DynamicHasher contains buffers that are passed to the hash function several times; the StaticHasher contains a list of bits enabled by a hasher for a specific Shape.

HashFunction interface: extends HashFunctionIdentity and adds a method that calls the actual hash function.

HashFunctionIdentity: contains the name of the hash function, the name of the provider, the processType (cyclic or iterative), the Signedness and a signature.

There are places in the code where the actual function is not required, and in some use cases requiring it would make the implementation difficult or fragile. These are the places where the Bloom filter has already been built and the system is verifying that two filters used the same hash function. In these cases the comparison is on the hashName, processType and Signedness. Where Bloom filters are stored in a database, retrieval would require either some sort of serialization/deserialization of the hash function or some other guarantee that the hash function is available. This is problematic.

The provider was added in a nod to a future factory that would follow the JCA pattern and allow implementations from multiple providers. The signature was added to support a requested quick check. The signature is calculated by calling hashFunction.apply( String.format( "%s-%s-%s", getName(), getSignedness(), getProcess() ).getBytes( "UTF-8" ), 0 ).

There were suggestions to create an enum of HashFunctions controlled by Collections. I think this adds a layer of coordination and management that we as a team may not want to take on. In addition, it makes it almost impossible for 3rd party users to create new hash functions and test them with the library.
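To make the identity-versus-function split concrete, here is a minimal sketch of how I picture it. It is not the code in the PR: the type and method names below (StoredIdentity, ToyFnv1a64, IdentitySketch, sameFunction, signatureOf, getProcessType) are my own stand-ins, the FNV-1a style hash is only a toy, and folding the seed into the initial value is an assumption made for the example. The point is that a filter read back from a database only needs the identity fields to be checked against a live function.

```java
import java.nio.charset.StandardCharsets;

// Hypothetical stand-ins for the types described in this mail, not the PR's source.
enum Signedness { SIGNED, UNSIGNED }
enum ProcessType { CYCLIC, ITERATIVE }

interface HashFunctionIdentity {
    String getName();
    String getProvider();
    Signedness getSignedness();
    ProcessType getProcessType();
    long getSignature();
}

interface HashFunction extends HashFunctionIdentity {
    long apply(byte[] buffer, int seed);
}

/** Identity only: can be rebuilt from stored metadata without the hash code itself. */
final class StoredIdentity implements HashFunctionIdentity {
    private final String name;
    private final String provider;
    private final Signedness signedness;
    private final ProcessType processType;
    private final long signature;

    StoredIdentity(String name, String provider, Signedness signedness,
                   ProcessType processType, long signature) {
        this.name = name;
        this.provider = provider;
        this.signedness = signedness;
        this.processType = processType;
        this.signature = signature;
    }

    @Override public String getName() { return name; }
    @Override public String getProvider() { return provider; }
    @Override public Signedness getSignedness() { return signedness; }
    @Override public ProcessType getProcessType() { return processType; }
    @Override public long getSignature() { return signature; }
}

/** Toy 64-bit FNV-1a style function, used only so the sketch has something to call. */
final class ToyFnv1a64 implements HashFunction {
    @Override public String getName() { return "ToyFNV1a-64"; }
    @Override public String getProvider() { return "Example"; }
    @Override public Signedness getSignedness() { return Signedness.SIGNED; }
    @Override public ProcessType getProcessType() { return ProcessType.CYCLIC; }
    @Override public long getSignature() { return IdentitySketch.signatureOf(this); }

    @Override public long apply(byte[] buffer, int seed) {
        long hash = 0xcbf29ce484222325L ^ seed; // seed handling is an assumption
        for (byte b : buffer) {
            hash ^= (b & 0xff);
            hash *= 0x100000001b3L;
        }
        return hash;
    }
}

final class IdentitySketch {
    /** The comparison described above: name, signedness and process type only. */
    static boolean sameFunction(HashFunctionIdentity a, HashFunctionIdentity b) {
        return a.getName().equals(b.getName())
                && a.getSignedness() == b.getSignedness()
                && a.getProcessType() == b.getProcessType();
    }

    /** The quick-check signature: the function hashes its own description with seed 0. */
    static long signatureOf(HashFunction f) {
        String description = String.format("%s-%s-%s",
                f.getName(), f.getSignedness(), f.getProcessType());
        return f.apply(description.getBytes(StandardCharsets.UTF_8), 0);
    }

    public static void main(String[] args) {
        HashFunction live = new ToyFnv1a64();
        // Pretend these fields were read back from a database row.
        HashFunctionIdentity stored = new StoredIdentity(live.getName(), live.getProvider(),
                live.getSignedness(), live.getProcessType(), live.getSignature());
        System.out.println("same function:   " + sameFunction(live, stored));
        System.out.println("signature match: " + (live.getSignature() == stored.getSignature()));
    }
}
```

In this sketch the stored side never has to load or deserialize the hash code; the signature is just a cheap extra check on top of the three-field comparison.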
I believe the current implementation provides the minimal information necessary to determine whether two functions are supposed to produce the same result. In my mind the signature and provider methods are extras: not necessary, but desirable.

I think this is a summary of the open discussion.

On Wed, Jan 8, 2020 at 2:32 PM Gilles Sadowski <gillese...@gmail.com> wrote:

> On Wed, Jan 8, 2020 at 15:15, Gary Gregory <garydgreg...@gmail.com> wrote:
> >
> > I think it is time to bring this PR in and make any adjustments within
> > master beyond that. This will be quicker and simpler than going round and
> > round for simple things like Javadoc tweaks and small non-functional
> > changes (formatting, variable names, and so on.) I'll proceed with that
> > tonight.
>
> Design issues were raised on the ML: With no agreement and no opinions
> other than Claude's and mine, things stayed where they were.
>
> Gilles
>
> >> [...]

--
I like: Like Like - The likeliest place on the web <http://like-like.xenei.com>
LinkedIn: http://www.linkedin.com/in/claudewarren