On Fri, Dec 27, 2019 at 1:02 AM Gary Gregory <garydgreg...@gmail.com> wrote:

> Hi Claude and all:
>
> Here are a couple of comments on the bloom filter PR.
>
> 10,100 ft level comment we do not have to worry about today: Before the
> release, we might want to split Commons Collection into a multi-module
> project and have the BF code in it own module. The _main_ reason for this
> is that it will allow the dependency on Commons Codecis be a non-optional
> dependency. Optional dependency are a a bit of a pain IMO, especially when
> you deal with large stacks. You end up sucking in everything that is
> optional when you deliver an app because you do not know for certain what's
> really required at runtime.
>
> Closer to the ground: I would make BloomFilter an interface and rename
> the current BloomFilter class AbstractBloomFilter implements BloomFilter.
> This will allow the user and maintainer to see what is truly public.
>

I have done this (not checked in) but the net effect is that all the public
methods in the AbstractBloomFilter class are reflected in the BloomFilter
interface and the Shape class has moved to the Interface as well.

I can remove several methods from BloomFilter that are not strictly
necessary for this code to function.  I am and have been reticent to do so
since there are a number of home-grown libraries used by various
researchers that provide these functions.  But if that is what it takes to
get this out the door it will be done.


> Since this is a first cut for Commons Collection, I would consider only
> making APIs public that must be public. Once released, we MUST maintain
> binary compatibility within minor releases. Major releases allow us to
> break this compatibility but these do not happen very often. I do not know
> what this means yet for BF but it's a thought to consider. This kind of
> work is made harder due to the current packaging of the BF code.
>
> We could consider putting it all in one package if that helps reduce the
> public API footprint.
>

I think that putting all the pieces into a single package will muddy the
waters a bit.  The classes in o.a.c.c.bloomfilter are high level classes
and used wherever the Bloom filter code is used.  Objects in the
o.a.c.c.bloomfilter.hasher classes are Hashers and are really only selected
when the application developer determines which hasher is best suited for
the environment.  Finally o.a.c.c.bloomfilter.hasher.function are
HasherFunction implementations.  These can be used or special cases built
by the implementer as necessary.  Perhaps I am being a bit too pedantic but
I do not think it makes sense to merge them into a single package.


>
> Q: All the code in main's
> org.apache.commons.collections4.bloomfilter.hasher.function is only called
> from test code. Why is that?
>

They are pluggable modules.  When a developer uses them they an instance is
created and it is passed to the BloomFilter.Shape() constructor as well as
to the various Hasher constructors.  For example:

    HashFunction hashFunction = new Murmur128x86Cyclic();
    Shape bloomFilterConfig = new Shape( hashFunction, 3, 1.0 / 100000);


>
> Nit: Some odd formatting like spaces after open parens: 'toUpperCase(
> Locale.ROOT)' should be  'toUpperCase(Locale.ROOT)'
>
> Gary
>


-- 
I like: Like Like - The likeliest place on the web
<http://like-like.xenei.com>
LinkedIn: http://www.linkedin.com/in/claudewarren

Reply via email to