It is currently a sub-class. There was a suggestion to move it outside of the BloomFilter interface.
On Sun, Dec 29, 2019 at 3:47 PM Gilles Sadowski <gillese...@gmail.com> wrote: > Le dim. 29 déc. 2019 à 15:30, Claude Warren <cla...@xenei.com> a écrit : > > > > If the Shape class (BloomFilter.Shape) is extracted from the BloomFilter > > interface and made a stand-alone class would the name change or is the > fact > > that it is in the o.a.c.c.bloomfilter package enough? > > > > I prefer the name Shape to BloomFilterShape. > > If "Shape" is only used for "BloomFilter" (as the suggestion above > seems to indicate, why not declare it as a sub-interface: > ---CUT--- > public interface BloomFilter { > // ... > > public interface Shape { > // ... > } > } > ---CUT--- > ? > > Regards, > Gilles > > > > > Claude > > > > On Sat, Dec 28, 2019 at 9:06 AM Claude Warren <cla...@xenei.com> wrote: > > > > > Once the interface is extracted and reduced to the minimum necessary > the > > > following methods are removed: > > > > > > orCardinality() -- we have andCardinality() and xorCardinality() this > was > > > included for completeness. > > > > > > isFull() -- true if all the bits in the vector are on. A convenience > > > method used to short circuit some logic. > > > > > > I think these 2 methods should go into the BloomFilter interface. > > > > > > > > > Set operations: > > > > > > jaccardSimilarity -- a measure of similarity in sets with range [0,1] > > > > > > jaccardDistance -- defined as 1 - jaccardSimilarity > > > > > > cosineSimilarity -- a measure of similarity with range [0,1] > > > > > > cosineDistance -- defined as 1 - cosineSimilarity > > > > > > estimateSize -- Set operation estimate of the number of items that were > > > placed in the filter. (Requires Shape) > > > > > > estimateUnionSize -- Set operation estimate of the number of items that > > > would be represented by the merging of two filters. (Requires Shape) > > > > > > estimateIntersectionSize -- Set operations estimate of the number of > items > > > in the intersection. (Requires Shape) > > > > > > Perhaps it makes sense to move the Set operations into their own class > > > with static methods? The set operations all operate on 2 Bloom > filters. > > > Moving them would clarify the AbstractBloomFilter class. > > > > > > Claude > > > > > > > > > On Sat, Dec 28, 2019 at 2:01 AM Gary Gregory <garydgreg...@gmail.com> > > > wrote: > > > > > >> On Fri, Dec 27, 2019 at 11:36 AM Claude Warren <cla...@xenei.com> > wrote: > > >> > > >> > On Fri, Dec 27, 2019 at 1:02 AM Gary Gregory < > garydgreg...@gmail.com> > > >> > wrote: > > >> > > > >> > > Hi Claude and all: > > >> > > > > >> > > Here are a couple of comments on the bloom filter PR. > > >> > > > > >> > > 10,100 ft level comment we do not have to worry about today: > Before > > >> the > > >> > > release, we might want to split Commons Collection into a > multi-module > > >> > > project and have the BF code in it own module. The _main_ reason > for > > >> this > > >> > > is that it will allow the dependency on Commons Codecis be a > > >> non-optional > > >> > > dependency. Optional dependency are a a bit of a pain IMO, > especially > > >> > when > > >> > > you deal with large stacks. You end up sucking in everything that > is > > >> > > optional when you deliver an app because you do not know for > certain > > >> > what's > > >> > > really required at runtime. > > >> > > > > >> > > Closer to the ground: I would make BloomFilter an interface and > rename > > >> > > the current BloomFilter class AbstractBloomFilter implements > > >> BloomFilter. > > >> > > This will allow the user and maintainer to see what is truly > public. > > >> > > > > >> > > > >> > I have done this (not checked in) but the net effect is that all the > > >> public > > >> > methods in the AbstractBloomFilter class are reflected in the > > >> BloomFilter > > >> > interface and the Shape class has moved to the Interface as well. > > >> > > > >> > > >> OK, I think I like BloomFilter as an interface especially since it is > used > > >> as an argument in methods. I'll leave it to you as to whether Shape > needs > > >> to be folded in. I did notice that Shape is an argument in a few > places. > > >> Might we loose some focus and abstraction by folding Shape into the > > >> BloomFilter interface? > > >> > > >> > > >> > I can remove several methods from BloomFilter that are not strictly > > >> > necessary for this code to function. I am and have been reticent > to do > > >> so > > >> > since there are a number of home-grown libraries used by various > > >> > researchers that provide these functions. But if that is what it > takes > > >> to > > >> > get this out the door it will be done. > > >> > > > >> > > >> Once you have BloomFilter as an interface with an implementing class, > it > > >> might make it much clearer as to what really belongs in the > interface. The > > >> handy methods can stay in the abstract class of course. > > >> > > >> > > >> > > > >> > > Since this is a first cut for Commons Collection, I would consider > > >> only > > >> > > making APIs public that must be public. Once released, we MUST > > >> maintain > > >> > > binary compatibility within minor releases. Major releases allow > us to > > >> > > break this compatibility but these do not happen very often. I do > not > > >> > know > > >> > > what this means yet for BF but it's a thought to consider. This > kind > > >> of > > >> > > work is made harder due to the current packaging of the BF code. > > >> > > > > >> > > We could consider putting it all in one package if that helps > reduce > > >> the > > >> > > public API footprint. > > >> > > > > >> > > > >> > I think that putting all the pieces into a single package will > muddy the > > >> > waters a bit. The classes in o.a.c.c.bloomfilter are high level > classes > > >> > and used wherever the Bloom filter code is used. Objects in the > > >> > o.a.c.c.bloomfilter.hasher classes are Hashers and are really only > > >> selected > > >> > when the application developer determines which hasher is best > suited > > >> for > > >> > the environment. Finally o.a.c.c.bloomfilter.hasher.function are > > >> > HasherFunction implementations. These can be used or special cases > > >> built > > >> > by the implementer as necessary. Perhaps I am being a bit too > pedantic > > >> but > > >> > I do not think it makes sense to merge them into a single package. > > >> > > > >> > > >> Let's leave it as is then. > > >> > > >> Side note: I'll need to release Commons Codec 1.14 before we can > bring in > > >> this PR. I hope to do this after Commons VFS has gone through its own > > >> release cycle (which should be done in 48 hours or so if all goes > well.) > > >> Well, we could bring in the PR but depending on a SNAPSHOT build is > > >> usually > > >> not a good idea unless it is truly short term. > > >> > > >> Thank you, > > >> Gary > > >> > > >> > > >> > > > >> > > > >> > > > > >> > > Q: All the code in main's > > >> > > org.apache.commons.collections4.bloomfilter.hasher.function is > only > > >> > called > > >> > > from test code. Why is that? > > >> > > > > >> > > > >> > They are pluggable modules. When a developer uses them they an > > >> instance is > > >> > created and it is passed to the BloomFilter.Shape() constructor as > well > > >> as > > >> > to the various Hasher constructors. For example: > > >> > > > >> > HashFunction hashFunction = new Murmur128x86Cyclic(); > > >> > Shape bloomFilterConfig = new Shape( hashFunction, 3, 1.0 / > 100000); > > >> > > > >> > > > >> > > > > >> > > Nit: Some odd formatting like spaces after open parens: > 'toUpperCase( > > >> > > Locale.ROOT)' should be 'toUpperCase(Locale.ROOT)' > > >> > > > > >> > > Gary > > >> > > > > >> > > > >> > > > >> > -- > > >> > I like: Like Like - The likeliest place on the web > > >> > <http://like-like.xenei.com> > > >> > LinkedIn: http://www.linkedin.com/in/claudewarren > > >> > > > >> > > > > > > > > > -- > > > I like: Like Like - The likeliest place on the web > > > <http://like-like.xenei.com> > > > LinkedIn: http://www.linkedin.com/in/claudewarren > > > > > > > > > -- > > I like: Like Like - The likeliest place on the web > > <http://like-like.xenei.com> > > LinkedIn: http://www.linkedin.com/in/claudewarren > > --------------------------------------------------------------------- > To unsubscribe, e-mail: dev-unsubscr...@commons.apache.org > For additional commands, e-mail: dev-h...@commons.apache.org > > -- I like: Like Like - The likeliest place on the web <http://like-like.xenei.com> LinkedIn: http://www.linkedin.com/in/claudewarren