I believe the issue (I think the history is at
https://issues.apache.org/jira/browse/COLLECTIONS-728?page=com.atlassian.jira.plugin.system.issuetabpanels%3Acomment-tabpanel&focusedCommentId=17003600)
is about the identification of hash implementations.

Currently there are a couple of classes involved:

Hasher interface: has a method that returns a HashFunctionIdentity and a
method that returns an iterator of enabled bits.  There are a couple of
implementations of Hasher:  the DynamicHasher contains buffers that are
passed to the hash function several times; the StaticHasher contains a list
of the bits enabled by a hasher for a specific Shape.

HashFunction interface: extends HashFunctionIdentity and adds a method that
calls the actual hash function.

HashFunctionIdentity: contains the name of the hash function, the name of
the provider, the processType (cyclic or iterative), Signedness and a
signature.
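
As a rough sketch of how these pieces relate (this is not the code in the
PR: the Shape fields, the getBits name, the long return type of apply, the
seed-per-call scheme and ToyHash are all simplifying assumptions made for
illustration):

```java
import java.util.ArrayList;
import java.util.Iterator;
import java.util.List;
import java.util.TreeSet;

final class HasherSketch {
    /** Simplified stand-in for HashFunctionIdentity: only the name is kept here. */
    interface HashFunctionIdentity { String getName(); }

    /** Extends the identity with the method that calls the actual hash function. */
    interface HashFunction extends HashFunctionIdentity {
        long apply(byte[] buffer, int seed); // return type is an assumption
    }

    /** Hypothetical, reduced Shape: filter size and hash calls per item. */
    static final class Shape {
        final int numberOfBits;
        final int numberOfHashFunctions;
        Shape(int numberOfBits, int numberOfHashFunctions) {
            this.numberOfBits = numberOfBits;
            this.numberOfHashFunctions = numberOfHashFunctions;
        }
    }

    interface Hasher {
        HashFunctionIdentity getHashFunctionIdentity();
        Iterator<Integer> getBits(Shape shape);
    }

    /** Keeps the raw buffers and passes each one to the hash function several times. */
    static final class DynamicHasher implements Hasher {
        private final HashFunction function;
        private final List<byte[]> buffers = new ArrayList<>();
        DynamicHasher(HashFunction function) { this.function = function; }
        DynamicHasher with(byte[] buffer) { buffers.add(buffer); return this; }
        public HashFunctionIdentity getHashFunctionIdentity() { return function; }
        public Iterator<Integer> getBits(Shape shape) {
            List<Integer> bits = new ArrayList<>();
            for (byte[] buffer : buffers) {
                for (int seed = 0; seed < shape.numberOfHashFunctions; seed++) {
                    long hash = function.apply(buffer, seed);
                    bits.add((int) Math.floorMod(hash, (long) shape.numberOfBits));
                }
            }
            return bits.iterator();
        }
    }

    /** Keeps only the enabled bits for one Shape; it needs the identity, not the function. */
    static final class StaticHasher implements Hasher {
        private final HashFunctionIdentity identity;
        private final Shape shape;
        private final TreeSet<Integer> bits = new TreeSet<>();
        StaticHasher(Hasher source, Shape shape) {
            this.identity = source.getHashFunctionIdentity();
            this.shape = shape;
            source.getBits(shape).forEachRemaining(bits::add);
        }
        public HashFunctionIdentity getHashFunctionIdentity() { return identity; }
        public Iterator<Integer> getBits(Shape requested) {
            if (requested.numberOfBits != shape.numberOfBits
                    || requested.numberOfHashFunctions != shape.numberOfHashFunctions) {
                throw new IllegalArgumentException("Bits were computed for a different Shape");
            }
            return bits.iterator();
        }
    }

    /** Toy hash used only to make the sketch runnable; not a real library function. */
    static final class ToyHash implements HashFunction {
        public String getName() { return "Toy-Hash"; }
        public long apply(byte[] buffer, int seed) {
            long h = seed;
            for (byte b : buffer) {
                h = 31 * h + b;
            }
            return h;
        }
    }
}
```

The point of the sketch is that the StaticHasher can be built and used
without holding a reference to the executable hash function.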

There are places in the code where the actual function is not required, and
in some use cases requiring it would make the implementation difficult or
fragile.  These are the places where the Bloom filter has already been built
and the system is verifying that two filters used the same hash function.
In these cases the comparison uses the hashName, processType and Signedness.
If the Bloom filters are stored in a database, requiring the function itself
on retrieval would mean some sort of serialization/deserialization of the
hash function, or ensuring that the hash function is otherwise available.
This is problematic.
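
A minimal sketch of that identity-only comparison follows.  The Identity
class, the sameFunction helper and the "Toy-Hash" name are hypothetical; the
only thing taken from the description above is that the check looks at the
name, process type and signedness and never touches the function itself:

```java
final class IdentityCheckSketch {
    enum ProcessType { CYCLIC, ITERATIVE }
    enum Signedness { SIGNED, UNSIGNED }

    /** Identity only: what could be stored next to a filter in a database. */
    static final class Identity {
        final String name;
        final ProcessType processType;
        final Signedness signedness;

        Identity(String name, ProcessType processType, Signedness signedness) {
            this.name = name;
            this.processType = processType;
            this.signedness = signedness;
        }
    }

    /** True when both identities claim the same hash function behaviour. */
    static boolean sameFunction(Identity a, Identity b) {
        return a.name.equals(b.name)
                && a.processType == b.processType
                && a.signedness == b.signedness;
    }

    public static void main(String[] args) {
        // "stored" stands for an identity read back from storage, "live" for
        // the identity of a filter built in the running system.
        Identity stored = new Identity("Toy-Hash", ProcessType.CYCLIC, Signedness.SIGNED);
        Identity live = new Identity("Toy-Hash", ProcessType.CYCLIC, Signedness.SIGNED);
        System.out.println(sameFunction(stored, live));
    }
}
```

Nothing here needs the hash function to be serialized or otherwise
available when the stored filter is read back.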

The provider was added as a nod to a future factory that would follow the
JCA pattern and allow implementations from multiple providers.

The signature was added to support a requested quick check.  The signature
is calculated by calling hashFunction.apply( String.format( "%s-%s-%s",
getName(), getSignedness(), getProcess() ).getBytes( "UTF-8" ), 0 ).
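
Spelled out as a small runnable sketch (the reduced HashFunction interface,
the long return type and ToyHash are assumptions for illustration; it uses
StandardCharsets.UTF_8 rather than the "UTF-8" string, which yields the same
bytes without the checked exception):

```java
import java.nio.charset.StandardCharsets;

final class SignatureSketch {
    enum ProcessType { CYCLIC, ITERATIVE }
    enum Signedness { SIGNED, UNSIGNED }

    /** Reduced to the methods the signature calculation needs. */
    interface HashFunction {
        String getName();
        Signedness getSignedness();
        ProcessType getProcess();
        long apply(byte[] buffer, int seed); // return type is an assumption
    }

    /** Hashes "name-signedness-process" with the function itself, seed 0. */
    static long signature(HashFunction function) {
        String id = String.format("%s-%s-%s",
                function.getName(), function.getSignedness(), function.getProcess());
        return function.apply(id.getBytes(StandardCharsets.UTF_8), 0);
    }

    /** Toy hash used only to make the sketch runnable; not a real library function. */
    static final class ToyHash implements HashFunction {
        public String getName() { return "Toy-Hash"; }
        public Signedness getSignedness() { return Signedness.SIGNED; }
        public ProcessType getProcess() { return ProcessType.CYCLIC; }
        public long apply(byte[] buffer, int seed) {
            long h = seed;
            for (byte b : buffer) {
                h = 31 * h + b;
            }
            return h;
        }
    }

    public static void main(String[] args) {
        System.out.println(Long.toHexString(signature(new ToyHash())));
    }
}
```

Because the value is produced by the function itself, two implementations
that share a name but compute different hashes should normally end up with
different signatures, which is what makes it usable as a quick check.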

There were suggestions to create an enum of HashFunctions controlled by the
Collections team.  I think that this adds a layer of coordination and
management that, as a team, we may not want to take on.  In addition, it
makes it almost impossible for 3rd party users to create new hash functions
and test them with the library.

I believe the current implementation provides the minimal information
necessary to determine if two functions are supposed to produce the same
result.  In my mind the signature and provider methods are extras: not
necessary, but desirable.

I think this is a summary of the open discussion.


On Wed, Jan 8, 2020 at 2:32 PM Gilles Sadowski <gillese...@gmail.com> wrote:

> On Wed, Jan 8, 2020 at 15:15, Gary Gregory <garydgreg...@gmail.com> wrote:
> >
> > I think it is time to bring this PR in and make any adjustments within
> > master beyond that. This will be quicker and simpler than going round and
> > round for simple things like Javadoc tweaks and small non-functional
> > changes (formatting, variable names, and so on.) I'll proceed with that
> > tonight.
>
> Design issues were raised on the ML: With no agreement and no opinions
> other than Claude's and mine, things stayed where they were.
>
> Gilles
>
> >> [...]

-- 
I like: Like Like - The likeliest place on the web
<http://like-like.xenei.com>
LinkedIn: http://www.linkedin.com/in/claudewarren
