On 2024/04/30 14:33:47 Alex Herbert wrote:
> On Tue, 30 Apr 2024 at 14:45, Gary D. Gregory <ggreg...@apache.org> wrote:
> 
> > Hi Claude,
> >
> > Thank you for the detailed reply :-) A few comments below.
> >
> > On 2024/04/30 06:29:38 Claude Warren wrote:
> > > I will see if I can clarify the javadocs and make things clearer.
> > >
> > > What I think I specifically heard is:
> > >
> > >    - Be clear that producers are fast fail iterators with predicate
> > tests.
> > >    - Rename CellConsumer to CellPredicate (?)
> >
> > Agreed (as suggested by Albert)
> >
> > >    - The semantic nomenclature:
> > >       - Bitmaps are arrays of bits not a BitMap object.
> > >       - Indexes are ints and not an instance of a Collection object.
> > >       - Cells are pairs of ints representing an index and a value.  They
> > >       are not Pair<> objects.
> > >       - Producers iterate over collections of the object (Bitmap, Index,
> > >       Cell) applying a predicate to do work and stop the iteration early
> > if
> > >       necessary.  They are carriers/transporters of Bloom filter enabled
> > bits.
> > >       They allow us to query the contents of the Bloom filter in an
> > >       implementation agnostic way.
> >
> > As you say naming is hard. The above is a great example and a good
> > exercise I've gone through at work and in other FOSS projects: "Producers
> > iterate over collections of the object...". In general when I see or write
> > a Javadoc of the form "Foo bars" or "Runners walk" or "Walkers run", you
> > get the idea ;-) I know that either the class (or method) name is bad or
> > the Javadoc/documentation is bad; not _wrong_, just bad in the sense that
> > it's confusing (to me).
> >
> > I am not advocating for a specific change ATM but I want to discuss the
> > option because it is possible the current name is not as good as it could
> > be. It could end up as an acceptable compromise if we cannot use more Java
> > friendly terms though.
> >
> > Whenever I see a class that implements a "forEach"-kind of method, I think
> > "Iterable".
> >
> 
> Here we should think "Collection", or generally more than 1. In the Java
> sense an Iterable is something you can walk through to the
> end, possibly removing elements as you go using the Iterator interface. We
> would not require supporting removal, and we want to control a
> short-circuit. We could make this distinction by including it in the name:
> forEachUntil(Predicate ...), forEachUnless, ...

I really like the idea of have the method name reflect the short-circuit aspect 
of the operation!

We do not have to invent a new method name though, Stream uses 
"takeWhile(Predicate)" in Java 9: 
https://docs.oracle.com/javase/9/docs/api/java/util/stream/Stream.html#takeWhile-java.util.function.Predicate-

The question becomes whether this name make the class too close to a Stream and 
introduces some kind of confusion. If that's the case, then forEachUntil() is 
good.

> 
> 
> >
> > Note the difference with "Iterator", and I had to lookup the difference
> > since the former implements "forEach" and the  later "forEachRemaining"!
> > "Iterable" is also a factory of "Iterator"s.
> >
> > Should the Producers ever be implementations of Iterable or Iterator?
> > Right now, the answer is no because of the short-circuit aspect of using a
> > predicate. I'm not using the term fail-fast here because I don't think of
> > the iteration being in error (please tell me if I'm wrong).
> >
> > If not iterable, then we should not use that name as part of the class
> > name. Generally, the short-circuit aspect of Producers do not make a bad
> > candidates for implementations of Iterable since it can throw (unchecked)
> > exceptions. Different for call sites granted, but I'm just mentioning it
> > for fun.
> >
> 
> I already mentioned throwing runtime exceptions for the short-circuit
> functionality, and that it was ruled out on the basis of performance (given
> a lot of short-circuiting is expected) and convenience for the caller. I
> don't think we should go there. Design the API for the intended purpose,
> and not push it into a box that is easily recognisable.

Sounds good.

> 
> 
> >
> > So maybe there's nothing to do. I just want to be clear about it. For
> > example, I think of "factory" and "producer" as synonyms but in this case,
> > this is not a traditional application of the factory pattern.
> >
> > As an aside I can see that Producers would not be Streams out of the box
> > because Stream#filter(Predicate) filters but does not short-circuit
> > iteration. While Stream#findAny() and #findFirst() don't fit the
> > short-circuit bill, we could implement a #findLast() in a Stream
> > implementation. What I do not know is if Streams otherwise fit the bill of
> > Bloom filters.
> >
> 
> In the case of small loops with few instructions per loop, the overhead of
> Streams is significant. Unfortunately we don't have any performance tests
> for this library, but I wouldn't change to Streams without knowing it does
> not impact performance. Performance is a key feature of Bloom filters.
> Otherwise you can achieve some of their functionality with conventional
> collections.

OK.

> 
> 
> >
> > >
> > > Does that basically cover the confusion?   If there are better terms,
> > let's
> > > hash them out now before I update the javadocs.
> > >
> > > As an aside, Cells and Bitmaps are referenced in the literature.  For the
> > > most part the rest is made up out of whole cloth.  So we could change
> > > "Producer" to something else but we would need a good name.
> >
> > We have a class called BitMap and methods that use "BitMap" in the same
> > but I think I am more comfortable with the term reuse now.
> >
> > The question that remains is must it be public? Since the Javadoc mentions
> > it is about indices and bit positions, could all these methods be moved to
> > the package-private IndexUtils? My concern is to reduce the public and
> > protected API surface we will have to support and keep for the lifetime of
> > the 4.x code base.
> >
> 
> It was originally package-private and Claude requested it be made public
> for general re-use.
> 
> We could use the name BitMaps, in keeping with the pluralisation of classes
> seen in the JDK when working with elements of a given type: Executors,
> Collections, etc.

Agreed! I really like the plural convention for these static utility classes.

Gary

> 
> Alex
> 

---------------------------------------------------------------------
To unsubscribe, e-mail: dev-unsubscr...@commons.apache.org
For additional commands, e-mail: dev-h...@commons.apache.org

Reply via email to