uschindler commented on code in PR #15823:
URL: https://github.com/apache/lucene/pull/15823#discussion_r3010979109
##########
lucene/core/src/java/org/apache/lucene/util/PriorityQueue.java:
##########
@@ -174,6 +177,38 @@ public void addAll(Collection<T> elements) {
}
}
+ /**
+ * Adds all elements of the stream into the queue. This method should be
preferred over calling
+ * {@link #add(Object)} in loop if all elements are known in advance as it
builds queue faster.
+ *
+ * <p>If one needs to map or filter element in the iteration of elements in
this method, call this
+ * method with elements wrapped by {@link Stream#map(Function)} or {@link
+ * Stream#filter(Predicate)}, etc. In these cases, this method should be
preferred over calling
+ * {@link #addAll(Collection)}.
+ *
+ * <p>If one tries to add more objects than the maxSize passed in the
constructor, an {@link
+ * ArrayIndexOutOfBoundsException} is thrown. Which may result in parts of
elements added into the
+ * queue, but the heap is still stay in correct state. In this case, if
caller wants to readd or
+ * {@link #updateTop(Object)} with remaining elements, it should use a new
stream, and use {@link
+ * Stream#skip(long)} to skip consumed elements with the delta size of queue.
+ */
+ public void addAll(Stream<T> elements) {
+ // Heap with size S always takes first S elements of the array,
+ // and thus it's safe to fill array further - no actual non-sentinel value
will be overwritten.
+ try {
+ elements.forEachOrdered(
+ element -> {
+ this.heap[size + 1] = element;
+ this.size++;
Review Comment:
> No doubt I likely have a misunderstanding here :). I don't often leverage
streams to be honest.
>
> If I'm understanding, your point here is that `forEachOrdered` is going to
guarantee sequential invocation. Is that correct? I took a look at the
[documentation](https://docs.oracle.com/en/java/javase/21/docs/api/java.base/java/util/stream/Stream.html#forEachOrdered(java.util.function.Consumer))[1]
and it seems to highlight this point as well.
Yes, that is what I am after. The difference between `forEach()` and
`forEachOrdered()` is the ordering guarantee. Because the order is defined
(the encounter order of the stream), you can repeat the operation after an
exception, stepping over the already processed elements (using
`Stream#skip(long)` as mentioned in the Javadoc here).
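To make the skip-and-retry idea concrete outside of Lucene, here is a minimal
sketch. `BoundedBuffer` and `leftovers` are hypothetical names invented for
this example and are not part of `PriorityQueue`; the buffer just fills a
fixed array and overflows the way the heap array would.

```java
import java.util.List;
import java.util.stream.Collectors;
import java.util.stream.Stream;

final class ResumeAfterOverflow {
  // Hypothetical stand-in for a fixed-capacity container (not Lucene's PriorityQueue).
  static final class BoundedBuffer {
    private final Object[] slots;
    private int size;

    BoundedBuffer(int capacity) {
      this.slots = new Object[capacity];
    }

    int size() {
      return size;
    }

    void addAll(Stream<?> elements) {
      // forEachOrdered handles elements in encounter order, one at a time,
      // so "size" always equals the number of elements consumed so far.
      elements.forEachOrdered(
          element -> {
            slots[size] = element; // throws ArrayIndexOutOfBoundsException when full
            size++;
          });
    }
  }

  // Returns the elements of "source" that did not fit into a buffer of the given capacity.
  static List<Integer> leftovers(List<Integer> source, int capacity) {
    BoundedBuffer buffer = new BoundedBuffer(capacity);
    int before = buffer.size();
    try {
      buffer.addAll(source.stream());
      return List.of();
    } catch (ArrayIndexOutOfBoundsException e) {
      // The size delta tells us how many elements were consumed before the failure.
      int consumed = buffer.size() - before;
      // A stream cannot be reused, so build a new one and skip the consumed prefix.
      return source.stream().skip(consumed).collect(Collectors.toList());
    }
  }

  public static void main(String[] args) {
    System.out.println(leftovers(List.of(10, 20, 30, 40, 50), 3)); // prints [40, 50]
  }
}
```

This only works because the encounter order is defined; with an unordered
traversal the size delta would not identify which elements were consumed.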
> Apologies for the confusion. I should have looked at this documentation
first.
No problem. Parallel streams often make people think they need to guard
against this, but it is only an issue if an intermediate operation interferes
with the stream source, e.g., if the lambda passed to `map()` modifies the
underlying collection. `forEachOrdered()` is a terminal operation, so you can
be sure the action runs in encounter order with happens-before between
elements.
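As a small illustration of that guarantee (a sketch with made-up names, not
code from this PR): even on a parallel stream, `forEachOrdered()` performs
the action in encounter order, so appending to a plain, non-thread-safe
`ArrayList` yields the elements in source order.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.stream.IntStream;

final class OrderedOnParallel {
  // Collects 0..n-1 from a parallel stream into a non-thread-safe list.
  static List<Integer> collectOrdered(int n) {
    List<Integer> out = new ArrayList<>();
    // The action for one element happens-before the action for the next one,
    // so the unsynchronized add() calls never run concurrently.
    IntStream.range(0, n).parallel().boxed().forEachOrdered(out::add);
    return out;
  }

  public static void main(String[] args) {
    System.out.println(collectOrdered(5)); // prints [0, 1, 2, 3, 4]
  }
}
```

Replacing `forEachOrdered` with `forEach` here would remove both the ordering
and the safety of the unsynchronized `add()` calls.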
> [1] "Performing the action for one element
[**happens-before**](https://docs.oracle.com/en/java/javase/21/docs/api/java.base/java/util/concurrent/package-summary.html#MemoryVisibility)
performing the action for subsequent elements"
Thanks for the documentation hint; this confirms that `forEachOrdered()`
applies the action one element at a time and is not itself parallelized. In
fact, `forEachOrdered()` and `iterator()` consume the elements sequentially
anyway (plain `forEach()` gives no such guarantee on a parallel stream),
therefore operations like `collect()` or `reduce()` are a much better fit
when you have a parallel stream.
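For comparison, a minimal sketch of the reduction style that does benefit
from a parallel stream. The class and method names are invented for
illustration; only the `java.util.stream` calls are real API.

```java
import java.util.List;
import java.util.stream.Collectors;
import java.util.stream.IntStream;
import java.util.stream.LongStream;

final class ParallelReduction {
  // reduce() combines partial results with an associative function,
  // so the work can be split across threads without shared mutable state.
  static long sumOfSquares(int n) {
    return LongStream.rangeClosed(1, n).parallel().map(x -> x * x).reduce(0L, Long::sum);
  }

  // collect() builds per-thread containers and merges them;
  // for an ordered stream the result keeps encounter order.
  static List<String> labels(int n) {
    return IntStream.range(0, n)
        .parallel()
        .mapToObj(i -> "item-" + i)
        .collect(Collectors.toList());
  }

  public static void main(String[] args) {
    System.out.println(sumOfSquares(10)); // prints 385
    System.out.println(labels(3)); // prints [item-0, item-1, item-2]
  }
}
```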
`forEachOrdered()` effectively turns the stream into a for-each loop.
Alternatively, we could call `iterator()` on the stream and consume the
iterator like this:
```java
// Adapt the stream (the "elements" parameter) to an Iterable for the for-each loop.
Iterable<T> it = elements::iterator;
for (T element : it) {
  this.heap[size + 1] = element;
  this.size++;
}
```
Of course, that is not really more readable.
--
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.
To unsubscribe, e-mail: [email protected]
For queries about this service, please contact Infrastructure at:
[email protected]