LGTM. Maybe the current PR (LGTM) should be merged first, Alex, how does that PR look to you?
Gary On Fri, May 3, 2024, 11:44 AM Claude Warren <cla...@xenei.com> wrote: > Gary and Alex, > > Any thoughts on this? > > Claude > > On Wed, May 1, 2024 at 7:55 AM Claude Warren <cla...@xenei.com> wrote: > >> Good suggestions. >> >> short-circuit. We could make this distinction by including it in the name: >>> forEachUntil(Predicate ...), forEachUnless, ... >> >> >> We need the unit name in the method name. All Bloom filters implement >> IndexProducer and BitmapProducer and since they use Predicate method >> parameters they will conflict. >> >> >> I have opened a ticket [1] with the list of tasks, which I think is now: >> >> - Be clear that producers are like interruptible iterators with >> predicate tests acting as a switch to short-circuit the iteration. >> - Rename classes: >> - CellConsumer to CellPredicate (?) >> - Rename BitMap to BitMaps. >> - Rename methods: >> - Producer forEachX() to forEachUntil() >> - The semantic nomenclature: >> - Bitmaps are arrays of bits not a BitMaps object. >> - Indexes are ints and not an instance of a Collection object. >> - Cells are pairs of ints representing an index and a value. They >> are not Pair<> objects. >> - Producers iterate over collections of the object (Bitmap, Index, >> Cell) applying a predicate to do work and stop the iteration early if >> necessary. They are carriers/transporters of Bloom filter enabled >> bits. >> They allow us to query the contents of the Bloom filter in an >> implementation agnostic way. >> >> >> In thinking about the term Producer, other terms could be used >> Interrogator (sounds like you can add a query), Extractor might work. But >> it has also come to mind that there is a "compute" series of methods in the >> ConcurrentMap class. Perhaps the term we want is not "forEach", but >> "process". The current form of usage is something like: >> >> IndexProducer ip = .... >> ip.forEachIndex(idx -> someIntPredicate) >> >> We could change the name from XProducer to XProcessor, or XExtractor; and >> the method to processXs. So the above code would look like: >> >> IndexExtractor ix = .... >> ix.processIndexs(idx -> someIntPredicate) >> >> another example >> >> BitMapExtractor bx = ..... >> bx.processBitMaps(bitmap -> someBitMapPredicate) >> >> Claude >> >> [1] https://issues.apache.org/jira/browse/COLLECTIONS-854 >> >> >> On Tue, Apr 30, 2024 at 4:51 PM Gary D. Gregory <ggreg...@apache.org> >> wrote: >> >>> >>> >>> On 2024/04/30 14:33:47 Alex Herbert wrote: >>> > On Tue, 30 Apr 2024 at 14:45, Gary D. Gregory <ggreg...@apache.org> >>> wrote: >>> > >>> > > Hi Claude, >>> > > >>> > > Thank you for the detailed reply :-) A few comments below. >>> > > >>> > > On 2024/04/30 06:29:38 Claude Warren wrote: >>> > > > I will see if I can clarify the javadocs and make things clearer. >>> > > > >>> > > > What I think I specifically heard is: >>> > > > >>> > > > - Be clear that producers are fast fail iterators with predicate >>> > > tests. >>> > > > - Rename CellConsumer to CellPredicate (?) >>> > > >>> > > Agreed (as suggested by Albert) >>> > > >>> > > > - The semantic nomenclature: >>> > > > - Bitmaps are arrays of bits not a BitMap object. >>> > > > - Indexes are ints and not an instance of a Collection >>> object. >>> > > > - Cells are pairs of ints representing an index and a >>> value. They >>> > > > are not Pair<> objects. >>> > > > - Producers iterate over collections of the object (Bitmap, >>> Index, >>> > > > Cell) applying a predicate to do work and stop the iteration >>> early >>> > > if >>> > > > necessary. They are carriers/transporters of Bloom filter >>> enabled >>> > > bits. >>> > > > They allow us to query the contents of the Bloom filter in an >>> > > > implementation agnostic way. >>> > > >>> > > As you say naming is hard. The above is a great example and a good >>> > > exercise I've gone through at work and in other FOSS projects: >>> "Producers >>> > > iterate over collections of the object...". In general when I see or >>> write >>> > > a Javadoc of the form "Foo bars" or "Runners walk" or "Walkers run", >>> you >>> > > get the idea ;-) I know that either the class (or method) name is >>> bad or >>> > > the Javadoc/documentation is bad; not _wrong_, just bad in the sense >>> that >>> > > it's confusing (to me). >>> > > >>> > > I am not advocating for a specific change ATM but I want to discuss >>> the >>> > > option because it is possible the current name is not as good as it >>> could >>> > > be. It could end up as an acceptable compromise if we cannot use >>> more Java >>> > > friendly terms though. >>> > > >>> > > Whenever I see a class that implements a "forEach"-kind of method, I >>> think >>> > > "Iterable". >>> > > >>> > >>> > Here we should think "Collection", or generally more than 1. In the >>> Java >>> > sense an Iterable is something you can walk through to the >>> > end, possibly removing elements as you go using the Iterator >>> interface. We >>> > would not require supporting removal, and we want to control a >>> > short-circuit. We could make this distinction by including it in the >>> name: >>> > forEachUntil(Predicate ...), forEachUnless, ... >>> >>> I really like the idea of have the method name reflect the short-circuit >>> aspect of the operation! >>> >>> We do not have to invent a new method name though, Stream uses >>> "takeWhile(Predicate)" in Java 9: >>> https://docs.oracle.com/javase/9/docs/api/java/util/stream/Stream.html#takeWhile-java.util.function.Predicate- >>> >>> The question becomes whether this name make the class too close to a >>> Stream and introduces some kind of confusion. If that's the case, then >>> forEachUntil() is good. >>> >>> > >>> > >>> > > >>> > > Note the difference with "Iterator", and I had to lookup the >>> difference >>> > > since the former implements "forEach" and the later >>> "forEachRemaining"! >>> > > "Iterable" is also a factory of "Iterator"s. >>> > > >>> > > Should the Producers ever be implementations of Iterable or Iterator? >>> > > Right now, the answer is no because of the short-circuit aspect of >>> using a >>> > > predicate. I'm not using the term fail-fast here because I don't >>> think of >>> > > the iteration being in error (please tell me if I'm wrong). >>> > > >>> > > If not iterable, then we should not use that name as part of the >>> class >>> > > name. Generally, the short-circuit aspect of Producers do not make a >>> bad >>> > > candidates for implementations of Iterable since it can throw >>> (unchecked) >>> > > exceptions. Different for call sites granted, but I'm just >>> mentioning it >>> > > for fun. >>> > > >>> > >>> > I already mentioned throwing runtime exceptions for the short-circuit >>> > functionality, and that it was ruled out on the basis of performance >>> (given >>> > a lot of short-circuiting is expected) and convenience for the caller. >>> I >>> > don't think we should go there. Design the API for the intended >>> purpose, >>> > and not push it into a box that is easily recognisable. >>> >>> Sounds good. >>> >>> > >>> > >>> > > >>> > > So maybe there's nothing to do. I just want to be clear about it. For >>> > > example, I think of "factory" and "producer" as synonyms but in this >>> case, >>> > > this is not a traditional application of the factory pattern. >>> > > >>> > > As an aside I can see that Producers would not be Streams out of the >>> box >>> > > because Stream#filter(Predicate) filters but does not short-circuit >>> > > iteration. While Stream#findAny() and #findFirst() don't fit the >>> > > short-circuit bill, we could implement a #findLast() in a Stream >>> > > implementation. What I do not know is if Streams otherwise fit the >>> bill of >>> > > Bloom filters. >>> > > >>> > >>> > In the case of small loops with few instructions per loop, the >>> overhead of >>> > Streams is significant. Unfortunately we don't have any performance >>> tests >>> > for this library, but I wouldn't change to Streams without knowing it >>> does >>> > not impact performance. Performance is a key feature of Bloom filters. >>> > Otherwise you can achieve some of their functionality with conventional >>> > collections. >>> >>> OK. >>> >>> > >>> > >>> > > >>> > > > >>> > > > Does that basically cover the confusion? If there are better >>> terms, >>> > > let's >>> > > > hash them out now before I update the javadocs. >>> > > > >>> > > > As an aside, Cells and Bitmaps are referenced in the literature. >>> For the >>> > > > most part the rest is made up out of whole cloth. So we could >>> change >>> > > > "Producer" to something else but we would need a good name. >>> > > >>> > > We have a class called BitMap and methods that use "BitMap" in the >>> same >>> > > but I think I am more comfortable with the term reuse now. >>> > > >>> > > The question that remains is must it be public? Since the Javadoc >>> mentions >>> > > it is about indices and bit positions, could all these methods be >>> moved to >>> > > the package-private IndexUtils? My concern is to reduce the public >>> and >>> > > protected API surface we will have to support and keep for the >>> lifetime of >>> > > the 4.x code base. >>> > > >>> > >>> > It was originally package-private and Claude requested it be made >>> public >>> > for general re-use. >>> > >>> > We could use the name BitMaps, in keeping with the pluralisation of >>> classes >>> > seen in the JDK when working with elements of a given type: Executors, >>> > Collections, etc. >>> >>> Agreed! I really like the plural convention for these static utility >>> classes. >>> >>> Gary >>> >>> > >>> > Alex >>> > >>> >>> --------------------------------------------------------------------- >>> To unsubscribe, e-mail: dev-unsubscr...@commons.apache.org >>> For additional commands, e-mail: dev-h...@commons.apache.org >>> >>> >> >> -- >> LinkedIn: http://www.linkedin.com/in/claudewarren >> > > > -- > LinkedIn: http://www.linkedin.com/in/claudewarren >