stevenzwu commented on code in PR #10691:
URL: https://github.com/apache/iceberg/pull/10691#discussion_r1685542742
##########
core/src/main/java/org/apache/iceberg/util/ParallelIterable.java:
##########
@@ -20,84 +20,117 @@
import java.io.Closeable;
import java.io.IOException;
+import java.io.UncheckedIOException;
+import java.util.ArrayDeque;
+import java.util.Deque;
import java.util.Iterator;
import java.util.NoSuchElementException;
+import java.util.Optional;
+import java.util.concurrent.CompletableFuture;
import java.util.concurrent.ConcurrentLinkedQueue;
import java.util.concurrent.ExecutionException;
import java.util.concurrent.ExecutorService;
-import java.util.concurrent.Future;
-import org.apache.iceberg.exceptions.RuntimeIOException;
+import java.util.concurrent.atomic.AtomicBoolean;
+import java.util.function.Supplier;
import org.apache.iceberg.io.CloseableGroup;
import org.apache.iceberg.io.CloseableIterable;
import org.apache.iceberg.io.CloseableIterator;
import org.apache.iceberg.relocated.com.google.common.base.Preconditions;
import org.apache.iceberg.relocated.com.google.common.collect.Iterables;
+import org.apache.iceberg.relocated.com.google.common.io.Closer;
+import org.slf4j.Logger;
+import org.slf4j.LoggerFactory;
public class ParallelIterable<T> extends CloseableGroup implements CloseableIterable<T> {
+
+  private static final Logger LOG = LoggerFactory.getLogger(ParallelIterable.class);
+
+  // Logic behind default value: ParallelIterable is often used for file planning.
+  // Assuming that a DataFile or DeleteFile is about 500 bytes, a 30k limit uses 14.3 MB of memory.
+  private static final int DEFAULT_MAX_QUEUE_SIZE = 30_000;
Review Comment:
Finding a good default here is a bit tricky, as it depends on two variables:
1) the consumer speed, which is hard to predict
2) the `Thread.sleep(10)` in the `checkTasks` while loop of the `hasNext` method. Half of the queue size should be large enough to avoid starving the consumer.

Anyway, I am good with the default here since I don't know how to come up with a better number. I would also be fine going a little higher, like 50K: even assuming 1 KB per item, that is 50 MB, which is pretty small on a modern computer. Since we are changing from unbounded to some bound, a higher value technically cannot make the problem worse than it was before.
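
To make the interplay above concrete, here is a minimal, self-contained sketch (not the code from this PR): a few producer tasks back off once a shared queue reaches a cap, while a consumer polls and sleeps 10 ms when the queue is empty. The class name, the 4-thread pool, and the `MAX_QUEUE_SIZE`/`TOTAL_ITEMS` constants are made up for illustration; only the 30k cap and the 10 ms sleep come from the discussion.

```java
import java.util.concurrent.ConcurrentLinkedQueue;
import java.util.concurrent.ExecutorService;
import java.util.concurrent.Executors;
import java.util.concurrent.atomic.AtomicInteger;

public class BoundedQueueSketch {
  // Hypothetical cap mirroring DEFAULT_MAX_QUEUE_SIZE from the PR.
  private static final int MAX_QUEUE_SIZE = 30_000;
  private static final int TOTAL_ITEMS = 1_000_000;

  public static void main(String[] args) throws InterruptedException {
    ConcurrentLinkedQueue<Integer> queue = new ConcurrentLinkedQueue<>();
    AtomicInteger produced = new AtomicInteger();
    ExecutorService pool = Executors.newFixedThreadPool(4);

    // Producers back off once the queue reaches the cap, so memory stays bounded.
    // (Note: ConcurrentLinkedQueue.size() is O(n); this is only for illustration.)
    for (int t = 0; t < 4; t++) {
      pool.submit(() -> {
        while (produced.get() < TOTAL_ITEMS) {
          if (queue.size() >= MAX_QUEUE_SIZE) {
            Thread.yield(); // a real task would yield its slot and be resumed later
            continue;
          }
          queue.offer(produced.incrementAndGet());
        }
      });
    }

    // Consumer: roughly models hasNext() polling, sleeping 10 ms when the
    // queue is momentarily empty. If producers cannot refill the drained
    // part of the queue within that window, the consumer starves.
    long consumed = 0;
    while (consumed < TOTAL_ITEMS) {
      Integer item = queue.poll();
      if (item == null) {
        Thread.sleep(10); // the sleep the comment above refers to
        continue;
      }
      consumed++;
    }

    pool.shutdownNow();
    System.out.println("consumed " + consumed + " items");
  }
}
```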
--
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.
To unsubscribe, e-mail: [email protected]
For queries about this service, please contact Infrastructure at:
[email protected]