It is currently a sub-class.  There was a suggestion to move it outside of
the BloomFilter interface.

On Sun, Dec 29, 2019 at 3:47 PM Gilles Sadowski <gillese...@gmail.com>
wrote:

> Le dim. 29 déc. 2019 à 15:30, Claude Warren <cla...@xenei.com> a écrit :
> >
> > If the Shape class (BloomFilter.Shape) is extracted from the BloomFilter
> > interface and made a stand-alone class would the name change or is the
> fact
> > that it is in the o.a.c.c.bloomfilter package enough?
> >
> > I prefer the name Shape to BloomFilterShape.
>
> If "Shape" is only used for "BloomFilter" (as the suggestion above
> seems to indicate, why not declare it as a sub-interface:
> ---CUT---
> public interface BloomFilter {
>     // ...
>
>     public interface Shape {
>         // ...
>     }
> }
> ---CUT---
> ?
>
> Regards,
> Gilles
>
> >
> > Claude
> >
> > On Sat, Dec 28, 2019 at 9:06 AM Claude Warren <cla...@xenei.com> wrote:
> >
> > > Once the interface is extracted and reduced to the minimum necessary
> the
> > > following methods are removed:
> > >
> > > orCardinality() -- we have andCardinality() and xorCardinality() this
> was
> > > included for completeness.
> > >
> > > isFull() -- true if all the bits in the vector are on. A convenience
> > > method used to short circuit some logic.
> > >
> > > I think these 2 methods should go into the BloomFilter interface.
> > >
> > >
> > > Set operations:
> > >
> > > jaccardSimilarity -- a measure of similarity in sets with range [0,1]
> > >
> > > jaccardDistance -- defined as 1 - jaccardSimilarity
> > >
> > > cosineSimilarity -- a measure of similarity with range  [0,1]
> > >
> > > cosineDistance -- defined as 1 - cosineSimilarity
> > >
> > > estimateSize -- Set operation estimate of the number of items that were
> > > placed in the filter. (Requires Shape)
> > >
> > > estimateUnionSize -- Set operation estimate of the number of items that
> > > would be represented by the merging of two filters. (Requires Shape)
> > >
> > > estimateIntersectionSize -- Set operations estimate of the number of
> items
> > > in the intersection. (Requires Shape)
> > >
> > > Perhaps it makes sense to move the Set operations into their own class
> > > with static methods?  The set operations all operate on 2 Bloom
> filters.
> > > Moving them would clarify the AbstractBloomFilter class.
> > >
> > > Claude
> > >
> > >
> > > On Sat, Dec 28, 2019 at 2:01 AM Gary Gregory <garydgreg...@gmail.com>
> > > wrote:
> > >
> > >> On Fri, Dec 27, 2019 at 11:36 AM Claude Warren <cla...@xenei.com>
> wrote:
> > >>
> > >> > On Fri, Dec 27, 2019 at 1:02 AM Gary Gregory <
> garydgreg...@gmail.com>
> > >> > wrote:
> > >> >
> > >> > > Hi Claude and all:
> > >> > >
> > >> > > Here are a couple of comments on the bloom filter PR.
> > >> > >
> > >> > > 10,100 ft level comment we do not have to worry about today:
> Before
> > >> the
> > >> > > release, we might want to split Commons Collection into a
> multi-module
> > >> > > project and have the BF code in it own module. The _main_ reason
> for
> > >> this
> > >> > > is that it will allow the dependency on Commons Codecis be a
> > >> non-optional
> > >> > > dependency. Optional dependency are a a bit of a pain IMO,
> especially
> > >> > when
> > >> > > you deal with large stacks. You end up sucking in everything that
> is
> > >> > > optional when you deliver an app because you do not know for
> certain
> > >> > what's
> > >> > > really required at runtime.
> > >> > >
> > >> > > Closer to the ground: I would make BloomFilter an interface and
> rename
> > >> > > the current BloomFilter class AbstractBloomFilter implements
> > >> BloomFilter.
> > >> > > This will allow the user and maintainer to see what is truly
> public.
> > >> > >
> > >> >
> > >> > I have done this (not checked in) but the net effect is that all the
> > >> public
> > >> > methods in the AbstractBloomFilter class are reflected in the
> > >> BloomFilter
> > >> > interface and the Shape class has moved to the Interface as well.
> > >> >
> > >>
> > >> OK, I think I like BloomFilter as an interface especially since it is
> used
> > >> as an argument in methods. I'll leave it to you as to whether Shape
> needs
> > >> to be folded in. I did notice that Shape is an argument in a few
> places.
> > >> Might we loose some focus and abstraction by folding Shape into the
> > >> BloomFilter interface?
> > >>
> > >>
> > >> > I can remove several methods from BloomFilter that are not strictly
> > >> > necessary for this code to function.  I am and have been reticent
> to do
> > >> so
> > >> > since there are a number of home-grown libraries used by various
> > >> > researchers that provide these functions.  But if that is what it
> takes
> > >> to
> > >> > get this out the door it will be done.
> > >> >
> > >>
> > >> Once you have BloomFilter as an interface with an implementing class,
> it
> > >> might make it much clearer as to what really belongs in the
> interface. The
> > >> handy methods can stay in the abstract class of course.
> > >>
> > >>
> > >> >
> > >> > > Since this is a first cut for Commons Collection, I would consider
> > >> only
> > >> > > making APIs public that must be public. Once released, we MUST
> > >> maintain
> > >> > > binary compatibility within minor releases. Major releases allow
> us to
> > >> > > break this compatibility but these do not happen very often. I do
> not
> > >> > know
> > >> > > what this means yet for BF but it's a thought to consider. This
> kind
> > >> of
> > >> > > work is made harder due to the current packaging of the BF code.
> > >> > >
> > >> > > We could consider putting it all in one package if that helps
> reduce
> > >> the
> > >> > > public API footprint.
> > >> > >
> > >> >
> > >> > I think that putting all the pieces into a single package will
> muddy the
> > >> > waters a bit.  The classes in o.a.c.c.bloomfilter are high level
> classes
> > >> > and used wherever the Bloom filter code is used.  Objects in the
> > >> > o.a.c.c.bloomfilter.hasher classes are Hashers and are really only
> > >> selected
> > >> > when the application developer determines which hasher is best
> suited
> > >> for
> > >> > the environment.  Finally o.a.c.c.bloomfilter.hasher.function are
> > >> > HasherFunction implementations.  These can be used or special cases
> > >> built
> > >> > by the implementer as necessary.  Perhaps I am being a bit too
> pedantic
> > >> but
> > >> > I do not think it makes sense to merge them into a single package.
> > >> >
> > >>
> > >> Let's leave it as is then.
> > >>
> > >> Side note: I'll need to release Commons Codec 1.14 before we can
> bring in
> > >> this PR. I hope to do this after Commons VFS has gone through its own
> > >> release cycle (which should be done in 48 hours or so if all goes
> well.)
> > >> Well, we could bring in the PR but depending on a SNAPSHOT build is
> > >> usually
> > >> not a good idea unless it is truly short term.
> > >>
> > >> Thank you,
> > >> Gary
> > >>
> > >>
> > >> >
> > >> >
> > >> > >
> > >> > > Q: All the code in main's
> > >> > > org.apache.commons.collections4.bloomfilter.hasher.function is
> only
> > >> > called
> > >> > > from test code. Why is that?
> > >> > >
> > >> >
> > >> > They are pluggable modules.  When a developer uses them they an
> > >> instance is
> > >> > created and it is passed to the BloomFilter.Shape() constructor as
> well
> > >> as
> > >> > to the various Hasher constructors.  For example:
> > >> >
> > >> >     HashFunction hashFunction = new Murmur128x86Cyclic();
> > >> >     Shape bloomFilterConfig = new Shape( hashFunction, 3, 1.0 /
> 100000);
> > >> >
> > >> >
> > >> > >
> > >> > > Nit: Some odd formatting like spaces after open parens:
> 'toUpperCase(
> > >> > > Locale.ROOT)' should be  'toUpperCase(Locale.ROOT)'
> > >> > >
> > >> > > Gary
> > >> > >
> > >> >
> > >> >
> > >> > --
> > >> > I like: Like Like - The likeliest place on the web
> > >> > <http://like-like.xenei.com>
> > >> > LinkedIn: http://www.linkedin.com/in/claudewarren
> > >> >
> > >>
> > >
> > >
> > > --
> > > I like: Like Like - The likeliest place on the web
> > > <http://like-like.xenei.com>
> > > LinkedIn: http://www.linkedin.com/in/claudewarren
> > >
> >
> >
> > --
> > I like: Like Like - The likeliest place on the web
> > <http://like-like.xenei.com>
> > LinkedIn: http://www.linkedin.com/in/claudewarren
>
> ---------------------------------------------------------------------
> To unsubscribe, e-mail: dev-unsubscr...@commons.apache.org
> For additional commands, e-mail: dev-h...@commons.apache.org
>
>

-- 
I like: Like Like - The likeliest place on the web
<http://like-like.xenei.com>
LinkedIn: http://www.linkedin.com/in/claudewarren

Reply via email to