I think it is time to bring this PR in and make any adjustments within
master beyond that. This will be quicker and simpler than going round and
round for simple things like Javadoc tweaks and small non-functional
changes (formatting, variable names, and so on.) I'll proceed with that
tonight.

Gary

On Sun, Dec 29, 2019 at 11:56 AM Claude Warren <cla...@xenei.com> wrote:

> It is currently a sub-class.  There was a suggestion to move it outside of
> the BloomFilter interface.
>
> On Sun, Dec 29, 2019 at 3:47 PM Gilles Sadowski <gillese...@gmail.com>
> wrote:
>
> > Le dim. 29 déc. 2019 à 15:30, Claude Warren <cla...@xenei.com> a écrit :
> > >
> > > If the Shape class (BloomFilter.Shape) is extracted from the
> BloomFilter
> > > interface and made a stand-alone class would the name change or is the
> > fact
> > > that it is in the o.a.c.c.bloomfilter package enough?
> > >
> > > I prefer the name Shape to BloomFilterShape.
> >
> > If "Shape" is only used for "BloomFilter" (as the suggestion above
> > seems to indicate, why not declare it as a sub-interface:
> > ---CUT---
> > public interface BloomFilter {
> >     // ...
> >
> >     public interface Shape {
> >         // ...
> >     }
> > }
> > ---CUT---
> > ?
> >
> > Regards,
> > Gilles
> >
> > >
> > > Claude
> > >
> > > On Sat, Dec 28, 2019 at 9:06 AM Claude Warren <cla...@xenei.com>
> wrote:
> > >
> > > > Once the interface is extracted and reduced to the minimum necessary
> > the
> > > > following methods are removed:
> > > >
> > > > orCardinality() -- we have andCardinality() and xorCardinality() this
> > was
> > > > included for completeness.
> > > >
> > > > isFull() -- true if all the bits in the vector are on. A convenience
> > > > method used to short circuit some logic.
> > > >
> > > > I think these 2 methods should go into the BloomFilter interface.
> > > >
> > > >
> > > > Set operations:
> > > >
> > > > jaccardSimilarity -- a measure of similarity in sets with range [0,1]
> > > >
> > > > jaccardDistance -- defined as 1 - jaccardSimilarity
> > > >
> > > > cosineSimilarity -- a measure of similarity with range  [0,1]
> > > >
> > > > cosineDistance -- defined as 1 - cosineSimilarity
> > > >
> > > > estimateSize -- Set operation estimate of the number of items that
> were
> > > > placed in the filter. (Requires Shape)
> > > >
> > > > estimateUnionSize -- Set operation estimate of the number of items
> that
> > > > would be represented by the merging of two filters. (Requires Shape)
> > > >
> > > > estimateIntersectionSize -- Set operations estimate of the number of
> > items
> > > > in the intersection. (Requires Shape)
> > > >
> > > > Perhaps it makes sense to move the Set operations into their own
> class
> > > > with static methods?  The set operations all operate on 2 Bloom
> > filters.
> > > > Moving them would clarify the AbstractBloomFilter class.
> > > >
> > > > Claude
> > > >
> > > >
> > > > On Sat, Dec 28, 2019 at 2:01 AM Gary Gregory <garydgreg...@gmail.com
> >
> > > > wrote:
> > > >
> > > >> On Fri, Dec 27, 2019 at 11:36 AM Claude Warren <cla...@xenei.com>
> > wrote:
> > > >>
> > > >> > On Fri, Dec 27, 2019 at 1:02 AM Gary Gregory <
> > garydgreg...@gmail.com>
> > > >> > wrote:
> > > >> >
> > > >> > > Hi Claude and all:
> > > >> > >
> > > >> > > Here are a couple of comments on the bloom filter PR.
> > > >> > >
> > > >> > > 10,100 ft level comment we do not have to worry about today:
> > Before
> > > >> the
> > > >> > > release, we might want to split Commons Collection into a
> > multi-module
> > > >> > > project and have the BF code in it own module. The _main_ reason
> > for
> > > >> this
> > > >> > > is that it will allow the dependency on Commons Codecis be a
> > > >> non-optional
> > > >> > > dependency. Optional dependency are a a bit of a pain IMO,
> > especially
> > > >> > when
> > > >> > > you deal with large stacks. You end up sucking in everything
> that
> > is
> > > >> > > optional when you deliver an app because you do not know for
> > certain
> > > >> > what's
> > > >> > > really required at runtime.
> > > >> > >
> > > >> > > Closer to the ground: I would make BloomFilter an interface and
> > rename
> > > >> > > the current BloomFilter class AbstractBloomFilter implements
> > > >> BloomFilter.
> > > >> > > This will allow the user and maintainer to see what is truly
> > public.
> > > >> > >
> > > >> >
> > > >> > I have done this (not checked in) but the net effect is that all
> the
> > > >> public
> > > >> > methods in the AbstractBloomFilter class are reflected in the
> > > >> BloomFilter
> > > >> > interface and the Shape class has moved to the Interface as well.
> > > >> >
> > > >>
> > > >> OK, I think I like BloomFilter as an interface especially since it
> is
> > used
> > > >> as an argument in methods. I'll leave it to you as to whether Shape
> > needs
> > > >> to be folded in. I did notice that Shape is an argument in a few
> > places.
> > > >> Might we loose some focus and abstraction by folding Shape into the
> > > >> BloomFilter interface?
> > > >>
> > > >>
> > > >> > I can remove several methods from BloomFilter that are not
> strictly
> > > >> > necessary for this code to function.  I am and have been reticent
> > to do
> > > >> so
> > > >> > since there are a number of home-grown libraries used by various
> > > >> > researchers that provide these functions.  But if that is what it
> > takes
> > > >> to
> > > >> > get this out the door it will be done.
> > > >> >
> > > >>
> > > >> Once you have BloomFilter as an interface with an implementing
> class,
> > it
> > > >> might make it much clearer as to what really belongs in the
> > interface. The
> > > >> handy methods can stay in the abstract class of course.
> > > >>
> > > >>
> > > >> >
> > > >> > > Since this is a first cut for Commons Collection, I would
> consider
> > > >> only
> > > >> > > making APIs public that must be public. Once released, we MUST
> > > >> maintain
> > > >> > > binary compatibility within minor releases. Major releases allow
> > us to
> > > >> > > break this compatibility but these do not happen very often. I
> do
> > not
> > > >> > know
> > > >> > > what this means yet for BF but it's a thought to consider. This
> > kind
> > > >> of
> > > >> > > work is made harder due to the current packaging of the BF code.
> > > >> > >
> > > >> > > We could consider putting it all in one package if that helps
> > reduce
> > > >> the
> > > >> > > public API footprint.
> > > >> > >
> > > >> >
> > > >> > I think that putting all the pieces into a single package will
> > muddy the
> > > >> > waters a bit.  The classes in o.a.c.c.bloomfilter are high level
> > classes
> > > >> > and used wherever the Bloom filter code is used.  Objects in the
> > > >> > o.a.c.c.bloomfilter.hasher classes are Hashers and are really only
> > > >> selected
> > > >> > when the application developer determines which hasher is best
> > suited
> > > >> for
> > > >> > the environment.  Finally o.a.c.c.bloomfilter.hasher.function are
> > > >> > HasherFunction implementations.  These can be used or special
> cases
> > > >> built
> > > >> > by the implementer as necessary.  Perhaps I am being a bit too
> > pedantic
> > > >> but
> > > >> > I do not think it makes sense to merge them into a single package.
> > > >> >
> > > >>
> > > >> Let's leave it as is then.
> > > >>
> > > >> Side note: I'll need to release Commons Codec 1.14 before we can
> > bring in
> > > >> this PR. I hope to do this after Commons VFS has gone through its
> own
> > > >> release cycle (which should be done in 48 hours or so if all goes
> > well.)
> > > >> Well, we could bring in the PR but depending on a SNAPSHOT build is
> > > >> usually
> > > >> not a good idea unless it is truly short term.
> > > >>
> > > >> Thank you,
> > > >> Gary
> > > >>
> > > >>
> > > >> >
> > > >> >
> > > >> > >
> > > >> > > Q: All the code in main's
> > > >> > > org.apache.commons.collections4.bloomfilter.hasher.function is
> > only
> > > >> > called
> > > >> > > from test code. Why is that?
> > > >> > >
> > > >> >
> > > >> > They are pluggable modules.  When a developer uses them they an
> > > >> instance is
> > > >> > created and it is passed to the BloomFilter.Shape() constructor as
> > well
> > > >> as
> > > >> > to the various Hasher constructors.  For example:
> > > >> >
> > > >> >     HashFunction hashFunction = new Murmur128x86Cyclic();
> > > >> >     Shape bloomFilterConfig = new Shape( hashFunction, 3, 1.0 /
> > 100000);
> > > >> >
> > > >> >
> > > >> > >
> > > >> > > Nit: Some odd formatting like spaces after open parens:
> > 'toUpperCase(
> > > >> > > Locale.ROOT)' should be  'toUpperCase(Locale.ROOT)'
> > > >> > >
> > > >> > > Gary
> > > >> > >
> > > >> >
> > > >> >
> > > >> > --
> > > >> > I like: Like Like - The likeliest place on the web
> > > >> > <http://like-like.xenei.com>
> > > >> > LinkedIn: http://www.linkedin.com/in/claudewarren
> > > >> >
> > > >>
> > > >
> > > >
> > > > --
> > > > I like: Like Like - The likeliest place on the web
> > > > <http://like-like.xenei.com>
> > > > LinkedIn: http://www.linkedin.com/in/claudewarren
> > > >
> > >
> > >
> > > --
> > > I like: Like Like - The likeliest place on the web
> > > <http://like-like.xenei.com>
> > > LinkedIn: http://www.linkedin.com/in/claudewarren
> >
> > ---------------------------------------------------------------------
> > To unsubscribe, e-mail: dev-unsubscr...@commons.apache.org
> > For additional commands, e-mail: dev-h...@commons.apache.org
> >
> >
>
> --
> I like: Like Like - The likeliest place on the web
> <http://like-like.xenei.com>
> LinkedIn: http://www.linkedin.com/in/claudewarren
>

Reply via email to