rdblue commented on a change in pull request #1446:
URL: https://github.com/apache/iceberg/pull/1446#discussion_r488102684
##########
File path: core/src/main/java/org/apache/iceberg/util/ParallelIterable.java
##########
@@ -98,6 +99,22 @@ private boolean checkTasks() {
     for (int i = 0; i < taskFutures.length; i += 1) {
       if (taskFutures[i] == null || taskFutures[i].isDone()) {
+        if (taskFutures[i] != null) {
+          // check for task failure and re-throw any exception
+          try {
+            taskFutures[i].get();
Review comment:
This class is used to parallelize operations that are consumed from a
single thread, primarily for scan planning, where we need to read many
manifest files (the bottleneck) and then split and combine them into tasks
(which is quick). We want the reads to run in parallel, but the consumer
thread should not need to deal with any additional concurrency concerns.
In addition, this class attempts to avoid running the back-end tasks too
quickly and accumulating too many results in memory. If the consumer stops
calling `next`, then we should also not submit more tasks. This avoids large
memory costs in scan planning for limit queries in Presto.
To avoid submitting tasks that are unnecessary and to avoid using an
additional monitor thread, this class manages task submission in the `Iterator`
methods called by the consumer. Those methods are synchronized so that
monitoring the task futures is a safe operation, which should not add much
overhead because we expect a single consumer thread.
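As a sketch of the pattern described above, here is a minimal, illustrative version (this is not Iceberg's actual `ParallelIterable`; the class name, `softCap` bound, and queue choice are assumptions made for the example). Worker tasks produce results, but task submission happens only inside the consumer's synchronized iterator calls, so work stops when the consumer stops calling `next`:

```java
import java.util.ArrayList;
import java.util.HashSet;
import java.util.Iterator;
import java.util.List;
import java.util.NoSuchElementException;
import java.util.Queue;
import java.util.Set;
import java.util.concurrent.ConcurrentLinkedQueue;
import java.util.concurrent.ExecutionException;
import java.util.concurrent.ExecutorService;
import java.util.concurrent.Executors;
import java.util.concurrent.Future;

// Illustrative sketch, not Iceberg's implementation: the consumer thread
// drives task submission from its own iterator calls.
class BoundedParallelIterator<T> implements Iterator<T> {
  private final ExecutorService pool;
  private final Iterator<? extends Iterable<T>> inputs;
  private final Queue<T> queue = new ConcurrentLinkedQueue<>();
  private final Future<?>[] taskFutures;
  private final int softCap;  // stop submitting when this many results are buffered

  BoundedParallelIterator(ExecutorService pool, Iterator<? extends Iterable<T>> inputs,
                          int parallelism, int softCap) {
    this.pool = pool;
    this.inputs = inputs;
    this.taskFutures = new Future<?>[parallelism];
    this.softCap = softCap;
  }

  // Returns true if any task is still running or was just submitted.
  private synchronized boolean checkTasks() {
    boolean hasRunningTask = false;
    for (int i = 0; i < taskFutures.length; i += 1) {
      if (taskFutures[i] == null || taskFutures[i].isDone()) {
        if (taskFutures[i] != null) {
          // check for task failure and re-throw any exception on the consumer thread
          try {
            taskFutures[i].get();
          } catch (InterruptedException e) {
            Thread.currentThread().interrupt();
            throw new RuntimeException("Interrupted while checking tasks", e);
          } catch (ExecutionException e) {
            throw new RuntimeException("Parallel task failed", e.getCause());
          }
          taskFutures[i] = null;
        }
        // submit more work only if the consumer is keeping up
        if (queue.size() < softCap && inputs.hasNext()) {
          Iterable<T> input = inputs.next();
          taskFutures[i] = pool.submit(() -> input.forEach(queue::add));
          hasRunningTask = true;
        }
      } else {
        hasRunningTask = true;
      }
    }
    return hasRunningTask;
  }

  @Override
  public synchronized boolean hasNext() {
    while (queue.isEmpty()) {
      if (!checkTasks() && queue.isEmpty()) {
        return false;  // nothing buffered, nothing running, nothing left to submit
      }
      Thread.yield();  // busy-wait for the sketch; real code would park briefly
    }
    return true;
  }

  @Override
  public synchronized T next() {
    if (!hasNext()) {
      throw new NoSuchElementException();
    }
    return queue.poll();
  }
}
```

Because `checkTasks` only submits when the buffered-result count is below `softCap`, a consumer that stops calling `next` (as in a Presto limit query) also stops new task submission.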
I'm not quite sure why you'd think that `Future.get` is a code smell. Isn't
that the correct way to get the result of the future?
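To make that last point concrete, a small self-contained demonstration (the class name `FutureGetDemo` is just for the example): `Future.get` is the standard way to surface a task's failure on the calling thread, with the original exception delivered as the cause of an `ExecutionException`.

```java
import java.util.concurrent.Callable;
import java.util.concurrent.ExecutionException;
import java.util.concurrent.ExecutorService;
import java.util.concurrent.Executors;
import java.util.concurrent.Future;

class FutureGetDemo {
  // Runs a task that always fails and recovers the original exception's
  // message via the ExecutionException thrown by Future.get.
  static String failureMessage() {
    ExecutorService pool = Executors.newSingleThreadExecutor();
    Callable<Integer> failing = () -> {
      throw new IllegalStateException("boom");
    };
    Future<Integer> future = pool.submit(failing);
    try {
      future.get();
      return null;  // not reached: the task always fails
    } catch (ExecutionException e) {
      return e.getCause().getMessage();  // original exception is the cause
    } catch (InterruptedException e) {
      Thread.currentThread().interrupt();
      return null;
    } finally {
      pool.shutdown();
    }
  }
}
```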