If the Shape class (BloomFilter.Shape) is extracted from the BloomFilter
interface and made a stand-alone class would the name change or is the fact
that it is in the o.a.c.c.bloomfilter package enough?

I prefer the name Shape to BloomFilterShape.

Claude

On Sat, Dec 28, 2019 at 9:06 AM Claude Warren <cla...@xenei.com> wrote:

> Once the interface is extracted and reduced to the minimum necessary the
> following methods are removed:
>
> orCardinality() -- we have andCardinality() and xorCardinality() this was
> included for completeness.
>
> isFull() -- true if all the bits in the vector are on. A convenience
> method used to short circuit some logic.
>
> I think these 2 methods should go into the BloomFilter interface.
>
>
> Set operations:
>
> jaccardSimilarity -- a measure of similarity in sets with range [0,1]
>
> jaccardDistance -- defined as 1 - jaccardSimilarity
>
> cosineSimilarity -- a measure of similarity with range  [0,1]
>
> cosineDistance -- defined as 1 - cosineSimilarity
>
> estimateSize -- Set operation estimate of the number of items that were
> placed in the filter. (Requires Shape)
>
> estimateUnionSize -- Set operation estimate of the number of items that
> would be represented by the merging of two filters. (Requires Shape)
>
> estimateIntersectionSize -- Set operations estimate of the number of items
> in the intersection. (Requires Shape)
>
> Perhaps it makes sense to move the Set operations into their own class
> with static methods?  The set operations all operate on 2 Bloom filters.
> Moving them would clarify the AbstractBloomFilter class.
>
> Claude
>
>
> On Sat, Dec 28, 2019 at 2:01 AM Gary Gregory <garydgreg...@gmail.com>
> wrote:
>
>> On Fri, Dec 27, 2019 at 11:36 AM Claude Warren <cla...@xenei.com> wrote:
>>
>> > On Fri, Dec 27, 2019 at 1:02 AM Gary Gregory <garydgreg...@gmail.com>
>> > wrote:
>> >
>> > > Hi Claude and all:
>> > >
>> > > Here are a couple of comments on the bloom filter PR.
>> > >
>> > > 10,100 ft level comment we do not have to worry about today: Before
>> the
>> > > release, we might want to split Commons Collection into a multi-module
>> > > project and have the BF code in it own module. The _main_ reason for
>> this
>> > > is that it will allow the dependency on Commons Codecis be a
>> non-optional
>> > > dependency. Optional dependency are a a bit of a pain IMO, especially
>> > when
>> > > you deal with large stacks. You end up sucking in everything that is
>> > > optional when you deliver an app because you do not know for certain
>> > what's
>> > > really required at runtime.
>> > >
>> > > Closer to the ground: I would make BloomFilter an interface and rename
>> > > the current BloomFilter class AbstractBloomFilter implements
>> BloomFilter.
>> > > This will allow the user and maintainer to see what is truly public.
>> > >
>> >
>> > I have done this (not checked in) but the net effect is that all the
>> public
>> > methods in the AbstractBloomFilter class are reflected in the
>> BloomFilter
>> > interface and the Shape class has moved to the Interface as well.
>> >
>>
>> OK, I think I like BloomFilter as an interface especially since it is used
>> as an argument in methods. I'll leave it to you as to whether Shape needs
>> to be folded in. I did notice that Shape is an argument in a few places.
>> Might we loose some focus and abstraction by folding Shape into the
>> BloomFilter interface?
>>
>>
>> > I can remove several methods from BloomFilter that are not strictly
>> > necessary for this code to function.  I am and have been reticent to do
>> so
>> > since there are a number of home-grown libraries used by various
>> > researchers that provide these functions.  But if that is what it takes
>> to
>> > get this out the door it will be done.
>> >
>>
>> Once you have BloomFilter as an interface with an implementing class, it
>> might make it much clearer as to what really belongs in the interface. The
>> handy methods can stay in the abstract class of course.
>>
>>
>> >
>> > > Since this is a first cut for Commons Collection, I would consider
>> only
>> > > making APIs public that must be public. Once released, we MUST
>> maintain
>> > > binary compatibility within minor releases. Major releases allow us to
>> > > break this compatibility but these do not happen very often. I do not
>> > know
>> > > what this means yet for BF but it's a thought to consider. This kind
>> of
>> > > work is made harder due to the current packaging of the BF code.
>> > >
>> > > We could consider putting it all in one package if that helps reduce
>> the
>> > > public API footprint.
>> > >
>> >
>> > I think that putting all the pieces into a single package will muddy the
>> > waters a bit.  The classes in o.a.c.c.bloomfilter are high level classes
>> > and used wherever the Bloom filter code is used.  Objects in the
>> > o.a.c.c.bloomfilter.hasher classes are Hashers and are really only
>> selected
>> > when the application developer determines which hasher is best suited
>> for
>> > the environment.  Finally o.a.c.c.bloomfilter.hasher.function are
>> > HasherFunction implementations.  These can be used or special cases
>> built
>> > by the implementer as necessary.  Perhaps I am being a bit too pedantic
>> but
>> > I do not think it makes sense to merge them into a single package.
>> >
>>
>> Let's leave it as is then.
>>
>> Side note: I'll need to release Commons Codec 1.14 before we can bring in
>> this PR. I hope to do this after Commons VFS has gone through its own
>> release cycle (which should be done in 48 hours or so if all goes well.)
>> Well, we could bring in the PR but depending on a SNAPSHOT build is
>> usually
>> not a good idea unless it is truly short term.
>>
>> Thank you,
>> Gary
>>
>>
>> >
>> >
>> > >
>> > > Q: All the code in main's
>> > > org.apache.commons.collections4.bloomfilter.hasher.function is only
>> > called
>> > > from test code. Why is that?
>> > >
>> >
>> > They are pluggable modules.  When a developer uses them they an
>> instance is
>> > created and it is passed to the BloomFilter.Shape() constructor as well
>> as
>> > to the various Hasher constructors.  For example:
>> >
>> >     HashFunction hashFunction = new Murmur128x86Cyclic();
>> >     Shape bloomFilterConfig = new Shape( hashFunction, 3, 1.0 / 100000);
>> >
>> >
>> > >
>> > > Nit: Some odd formatting like spaces after open parens: 'toUpperCase(
>> > > Locale.ROOT)' should be  'toUpperCase(Locale.ROOT)'
>> > >
>> > > Gary
>> > >
>> >
>> >
>> > --
>> > I like: Like Like - The likeliest place on the web
>> > <http://like-like.xenei.com>
>> > LinkedIn: http://www.linkedin.com/in/claudewarren
>> >
>>
>
>
> --
> I like: Like Like - The likeliest place on the web
> <http://like-like.xenei.com>
> LinkedIn: http://www.linkedin.com/in/claudewarren
>


-- 
I like: Like Like - The likeliest place on the web
<http://like-like.xenei.com>
LinkedIn: http://www.linkedin.com/in/claudewarren

Reply via email to