Once the interface is extracted and reduced to the minimum necessary the
following methods are removed:

orCardinality() -- we have andCardinality() and xorCardinality() this was
included for completeness.

isFull() -- true if all the bits in the vector are on. A convenience method
used to short circuit some logic.

I think these 2 methods should go into the BloomFilter interface.


Set operations:

jaccardSimilarity -- a measure of similarity in sets with range [0,1]

jaccardDistance -- defined as 1 - jaccardSimilarity

cosineSimilarity -- a measure of similarity with range  [0,1]

cosineDistance -- defined as 1 - cosineSimilarity

estimateSize -- Set operation estimate of the number of items that were
placed in the filter. (Requires Shape)

estimateUnionSize -- Set operation estimate of the number of items that
would be represented by the merging of two filters. (Requires Shape)

estimateIntersectionSize -- Set operations estimate of the number of items
in the intersection. (Requires Shape)

Perhaps it makes sense to move the Set operations into their own class with
static methods?  The set operations all operate on 2 Bloom filters.  Moving
them would clarify the AbstractBloomFilter class.

Claude


On Sat, Dec 28, 2019 at 2:01 AM Gary Gregory <garydgreg...@gmail.com> wrote:

> On Fri, Dec 27, 2019 at 11:36 AM Claude Warren <cla...@xenei.com> wrote:
>
> > On Fri, Dec 27, 2019 at 1:02 AM Gary Gregory <garydgreg...@gmail.com>
> > wrote:
> >
> > > Hi Claude and all:
> > >
> > > Here are a couple of comments on the bloom filter PR.
> > >
> > > 10,100 ft level comment we do not have to worry about today: Before the
> > > release, we might want to split Commons Collection into a multi-module
> > > project and have the BF code in it own module. The _main_ reason for
> this
> > > is that it will allow the dependency on Commons Codecis be a
> non-optional
> > > dependency. Optional dependency are a a bit of a pain IMO, especially
> > when
> > > you deal with large stacks. You end up sucking in everything that is
> > > optional when you deliver an app because you do not know for certain
> > what's
> > > really required at runtime.
> > >
> > > Closer to the ground: I would make BloomFilter an interface and rename
> > > the current BloomFilter class AbstractBloomFilter implements
> BloomFilter.
> > > This will allow the user and maintainer to see what is truly public.
> > >
> >
> > I have done this (not checked in) but the net effect is that all the
> public
> > methods in the AbstractBloomFilter class are reflected in the BloomFilter
> > interface and the Shape class has moved to the Interface as well.
> >
>
> OK, I think I like BloomFilter as an interface especially since it is used
> as an argument in methods. I'll leave it to you as to whether Shape needs
> to be folded in. I did notice that Shape is an argument in a few places.
> Might we loose some focus and abstraction by folding Shape into the
> BloomFilter interface?
>
>
> > I can remove several methods from BloomFilter that are not strictly
> > necessary for this code to function.  I am and have been reticent to do
> so
> > since there are a number of home-grown libraries used by various
> > researchers that provide these functions.  But if that is what it takes
> to
> > get this out the door it will be done.
> >
>
> Once you have BloomFilter as an interface with an implementing class, it
> might make it much clearer as to what really belongs in the interface. The
> handy methods can stay in the abstract class of course.
>
>
> >
> > > Since this is a first cut for Commons Collection, I would consider only
> > > making APIs public that must be public. Once released, we MUST maintain
> > > binary compatibility within minor releases. Major releases allow us to
> > > break this compatibility but these do not happen very often. I do not
> > know
> > > what this means yet for BF but it's a thought to consider. This kind of
> > > work is made harder due to the current packaging of the BF code.
> > >
> > > We could consider putting it all in one package if that helps reduce
> the
> > > public API footprint.
> > >
> >
> > I think that putting all the pieces into a single package will muddy the
> > waters a bit.  The classes in o.a.c.c.bloomfilter are high level classes
> > and used wherever the Bloom filter code is used.  Objects in the
> > o.a.c.c.bloomfilter.hasher classes are Hashers and are really only
> selected
> > when the application developer determines which hasher is best suited for
> > the environment.  Finally o.a.c.c.bloomfilter.hasher.function are
> > HasherFunction implementations.  These can be used or special cases built
> > by the implementer as necessary.  Perhaps I am being a bit too pedantic
> but
> > I do not think it makes sense to merge them into a single package.
> >
>
> Let's leave it as is then.
>
> Side note: I'll need to release Commons Codec 1.14 before we can bring in
> this PR. I hope to do this after Commons VFS has gone through its own
> release cycle (which should be done in 48 hours or so if all goes well.)
> Well, we could bring in the PR but depending on a SNAPSHOT build is usually
> not a good idea unless it is truly short term.
>
> Thank you,
> Gary
>
>
> >
> >
> > >
> > > Q: All the code in main's
> > > org.apache.commons.collections4.bloomfilter.hasher.function is only
> > called
> > > from test code. Why is that?
> > >
> >
> > They are pluggable modules.  When a developer uses them they an instance
> is
> > created and it is passed to the BloomFilter.Shape() constructor as well
> as
> > to the various Hasher constructors.  For example:
> >
> >     HashFunction hashFunction = new Murmur128x86Cyclic();
> >     Shape bloomFilterConfig = new Shape( hashFunction, 3, 1.0 / 100000);
> >
> >
> > >
> > > Nit: Some odd formatting like spaces after open parens: 'toUpperCase(
> > > Locale.ROOT)' should be  'toUpperCase(Locale.ROOT)'
> > >
> > > Gary
> > >
> >
> >
> > --
> > I like: Like Like - The likeliest place on the web
> > <http://like-like.xenei.com>
> > LinkedIn: http://www.linkedin.com/in/claudewarren
> >
>


-- 
I like: Like Like - The likeliest place on the web
<http://like-like.xenei.com>
LinkedIn: http://www.linkedin.com/in/claudewarren

Reply via email to