Good suggestions.

short-circuit. We could make this distinction by including it in the name:
> forEachUntil(Predicate ...), forEachUnless, ...


We need the unit name in the method name.  All Bloom filters implement
IndexProducer and BitmapProducer and since they use Predicate method
parameters they will conflict.


I have opened a ticket [1] with the list of tasks, which I think is now:

   - Be clear that producers are like interruptible iterators with
   predicate tests acting as a switch to short-circuit the iteration.
   - Rename classes:
      - CellConsumer to CellPredicate (?)
      - Rename BitMap to BitMaps.
   - Rename methods:
      - Producer forEachX() to forEachUntil()
      - The semantic nomenclature:
      - Bitmaps are arrays of bits not a BitMaps object.
      - Indexes are ints and not an instance of a Collection object.
      - Cells are pairs of ints representing an index and a value.  They
      are not Pair<> objects.
      - Producers iterate over collections of the object (Bitmap, Index,
      Cell) applying a predicate to do work and stop the iteration early if
      necessary.  They are carriers/transporters of Bloom filter enabled bits.
      They allow us to query the contents of the Bloom filter in an
      implementation agnostic way.


In thinking about the term Producer, other terms could be used Interrogator
(sounds like you can add a query), Extractor might work.  But it has also
come to mind that there is a "compute" series of methods in the
ConcurrentMap class.  Perhaps the term we want is not "forEach", but
"process".  The current form of usage is something like:

IndexProducer ip = ....
ip.forEachIndex(idx -> someIntPredicate)

We could change the name from XProducer to XProcessor, or XExtractor; and
the method to processXs.  So the above code would look like:

IndexExtractor ix = ....
ix.processIndexs(idx -> someIntPredicate)

another example

BitMapExtractor bx = .....
bx.processBitMaps(bitmap -> someBitMapPredicate)

Claude

[1] https://issues.apache.org/jira/browse/COLLECTIONS-854


On Tue, Apr 30, 2024 at 4:51 PM Gary D. Gregory <ggreg...@apache.org> wrote:

>
>
> On 2024/04/30 14:33:47 Alex Herbert wrote:
> > On Tue, 30 Apr 2024 at 14:45, Gary D. Gregory <ggreg...@apache.org>
> wrote:
> >
> > > Hi Claude,
> > >
> > > Thank you for the detailed reply :-) A few comments below.
> > >
> > > On 2024/04/30 06:29:38 Claude Warren wrote:
> > > > I will see if I can clarify the javadocs and make things clearer.
> > > >
> > > > What I think I specifically heard is:
> > > >
> > > >    - Be clear that producers are fast fail iterators with predicate
> > > tests.
> > > >    - Rename CellConsumer to CellPredicate (?)
> > >
> > > Agreed (as suggested by Albert)
> > >
> > > >    - The semantic nomenclature:
> > > >       - Bitmaps are arrays of bits not a BitMap object.
> > > >       - Indexes are ints and not an instance of a Collection object.
> > > >       - Cells are pairs of ints representing an index and a value.
> They
> > > >       are not Pair<> objects.
> > > >       - Producers iterate over collections of the object (Bitmap,
> Index,
> > > >       Cell) applying a predicate to do work and stop the iteration
> early
> > > if
> > > >       necessary.  They are carriers/transporters of Bloom filter
> enabled
> > > bits.
> > > >       They allow us to query the contents of the Bloom filter in an
> > > >       implementation agnostic way.
> > >
> > > As you say naming is hard. The above is a great example and a good
> > > exercise I've gone through at work and in other FOSS projects:
> "Producers
> > > iterate over collections of the object...". In general when I see or
> write
> > > a Javadoc of the form "Foo bars" or "Runners walk" or "Walkers run",
> you
> > > get the idea ;-) I know that either the class (or method) name is bad
> or
> > > the Javadoc/documentation is bad; not _wrong_, just bad in the sense
> that
> > > it's confusing (to me).
> > >
> > > I am not advocating for a specific change ATM but I want to discuss the
> > > option because it is possible the current name is not as good as it
> could
> > > be. It could end up as an acceptable compromise if we cannot use more
> Java
> > > friendly terms though.
> > >
> > > Whenever I see a class that implements a "forEach"-kind of method, I
> think
> > > "Iterable".
> > >
> >
> > Here we should think "Collection", or generally more than 1. In the Java
> > sense an Iterable is something you can walk through to the
> > end, possibly removing elements as you go using the Iterator interface.
> We
> > would not require supporting removal, and we want to control a
> > short-circuit. We could make this distinction by including it in the
> name:
> > forEachUntil(Predicate ...), forEachUnless, ...
>
> I really like the idea of have the method name reflect the short-circuit
> aspect of the operation!
>
> We do not have to invent a new method name though, Stream uses
> "takeWhile(Predicate)" in Java 9:
> https://docs.oracle.com/javase/9/docs/api/java/util/stream/Stream.html#takeWhile-java.util.function.Predicate-
>
> The question becomes whether this name make the class too close to a
> Stream and introduces some kind of confusion. If that's the case, then
> forEachUntil() is good.
>
> >
> >
> > >
> > > Note the difference with "Iterator", and I had to lookup the difference
> > > since the former implements "forEach" and the  later
> "forEachRemaining"!
> > > "Iterable" is also a factory of "Iterator"s.
> > >
> > > Should the Producers ever be implementations of Iterable or Iterator?
> > > Right now, the answer is no because of the short-circuit aspect of
> using a
> > > predicate. I'm not using the term fail-fast here because I don't think
> of
> > > the iteration being in error (please tell me if I'm wrong).
> > >
> > > If not iterable, then we should not use that name as part of the class
> > > name. Generally, the short-circuit aspect of Producers do not make a
> bad
> > > candidates for implementations of Iterable since it can throw
> (unchecked)
> > > exceptions. Different for call sites granted, but I'm just mentioning
> it
> > > for fun.
> > >
> >
> > I already mentioned throwing runtime exceptions for the short-circuit
> > functionality, and that it was ruled out on the basis of performance
> (given
> > a lot of short-circuiting is expected) and convenience for the caller. I
> > don't think we should go there. Design the API for the intended purpose,
> > and not push it into a box that is easily recognisable.
>
> Sounds good.
>
> >
> >
> > >
> > > So maybe there's nothing to do. I just want to be clear about it. For
> > > example, I think of "factory" and "producer" as synonyms but in this
> case,
> > > this is not a traditional application of the factory pattern.
> > >
> > > As an aside I can see that Producers would not be Streams out of the
> box
> > > because Stream#filter(Predicate) filters but does not short-circuit
> > > iteration. While Stream#findAny() and #findFirst() don't fit the
> > > short-circuit bill, we could implement a #findLast() in a Stream
> > > implementation. What I do not know is if Streams otherwise fit the
> bill of
> > > Bloom filters.
> > >
> >
> > In the case of small loops with few instructions per loop, the overhead
> of
> > Streams is significant. Unfortunately we don't have any performance tests
> > for this library, but I wouldn't change to Streams without knowing it
> does
> > not impact performance. Performance is a key feature of Bloom filters.
> > Otherwise you can achieve some of their functionality with conventional
> > collections.
>
> OK.
>
> >
> >
> > >
> > > >
> > > > Does that basically cover the confusion?   If there are better terms,
> > > let's
> > > > hash them out now before I update the javadocs.
> > > >
> > > > As an aside, Cells and Bitmaps are referenced in the literature.
> For the
> > > > most part the rest is made up out of whole cloth.  So we could change
> > > > "Producer" to something else but we would need a good name.
> > >
> > > We have a class called BitMap and methods that use "BitMap" in the same
> > > but I think I am more comfortable with the term reuse now.
> > >
> > > The question that remains is must it be public? Since the Javadoc
> mentions
> > > it is about indices and bit positions, could all these methods be
> moved to
> > > the package-private IndexUtils? My concern is to reduce the public and
> > > protected API surface we will have to support and keep for the
> lifetime of
> > > the 4.x code base.
> > >
> >
> > It was originally package-private and Claude requested it be made public
> > for general re-use.
> >
> > We could use the name BitMaps, in keeping with the pluralisation of
> classes
> > seen in the JDK when working with elements of a given type: Executors,
> > Collections, etc.
>
> Agreed! I really like the plural convention for these static utility
> classes.
>
> Gary
>
> >
> > Alex
> >
>
> ---------------------------------------------------------------------
> To unsubscribe, e-mail: dev-unsubscr...@commons.apache.org
> For additional commands, e-mail: dev-h...@commons.apache.org
>
>

-- 
LinkedIn: http://www.linkedin.com/in/claudewarren

Reply via email to