findepi commented on code in PR #10691:
URL: https://github.com/apache/iceberg/pull/10691#discussion_r1682523817
##########
core/src/main/java/org/apache/iceberg/util/ParallelIterable.java:
##########
@@ -88,16 +92,26 @@ private ParallelIterator(
@Override
public void close() {
// close first, avoid new task submit
- this.closed = true;
+ this.closed.set(true);
+
+ try (Closer closer = Closer.create()) {
+ yieldedTasks.forEach(closer::register);
+ yieldedTasks.clear();
- // cancel background tasks
- for (Future<?> taskFuture : taskFutures) {
- if (taskFuture != null && !taskFuture.isDone()) {
- taskFuture.cancel(true);
+ // TODO close input iterables that were not started yet
Review Comment:
> The existing code use a try-with-resources for the iterable and used a
`for` loop.
I do believe that the code in this PR is semantically equivalent, so
whatever assumptions/contract there were, they should not be violated by these
changes.
> The way our iterables should work is that there is no need to close the
iterable itself (i.e. no resources it holds) but it is a convenient way to
close any iterators that were opened from the iterable, which do hold resources.
i find this mental model harder to reason about.
It's the iterable who is checked for being Closeable, so I would feel
obliged to call close on the instance that was given to me, for me to own it.
But it's not the place to debate this model. I will remove the TODO.
> We do need to make sure the yielded tasks or currently running tasks are
closed. I think that closing the yielded tasks above and cancelling the running
Task instances should take care of that. The running instances should hit the
catch for Throwable that calls close.
Yes, mostly, but there is a race condition.
(a) a task may be about to yield or (b) already yielded but not moved to
`yieldedTasks` collection yet.
In both cases `taskFuture.cancel(true)` won't cause the closing of the
iterable in the task.
What the solution should be depends on whether the Task can be closed
asyncly. If yes, it would makes things easier, but i don't think we can make
such an assumption.
I will introduce timed-wait inside the close to wait for the tasks
continuations and close them.
--
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.
To unsubscribe, e-mail: [email protected]
For queries about this service, please contact Infrastructure at:
[email protected]
---------------------------------------------------------------------
To unsubscribe, e-mail: [email protected]
For additional commands, e-mail: [email protected]