danielcweeks commented on code in PR #7731:
URL: https://github.com/apache/iceberg/pull/7731#discussion_r1209471374
##########
core/src/main/java/org/apache/iceberg/util/TableScanUtil.java:
##########
@@ -35,16 +38,22 @@
import org.apache.iceberg.ScanTaskGroup;
import org.apache.iceberg.SplittableScanTask;
import org.apache.iceberg.StructLike;
+import org.apache.iceberg.io.CloseableGroup;
import org.apache.iceberg.io.CloseableIterable;
+import org.apache.iceberg.io.CloseableIterator;
import org.apache.iceberg.relocated.com.google.common.base.Preconditions;
import org.apache.iceberg.relocated.com.google.common.collect.FluentIterable;
import org.apache.iceberg.relocated.com.google.common.collect.ImmutableList;
import org.apache.iceberg.relocated.com.google.common.collect.Iterables;
import org.apache.iceberg.relocated.com.google.common.collect.Lists;
import org.apache.iceberg.relocated.com.google.common.collect.Maps;
import org.apache.iceberg.types.Types;
+import org.slf4j.Logger;
+import org.slf4j.LoggerFactory;
public class TableScanUtil {
+ private static final Logger LOG = LoggerFactory.getLogger(TableScanUtil.class);
+ private static final long MIN_SPLIT_SIZE = 16 * 1024 * 1024; // 16 MB
Review Comment:
This feels too large for a min split size. Either that, or we should take
into account the row group size that is actually configured. For example, if
the row group size were set to 8MB, we still wouldn't be able to achieve
maximum parallelism because of this arbitrary default.
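To make the 8MB example concrete, here is a rough sketch of the arithmetic. It is not the code in this PR. `SplitSizeSketch`, `adjustSplitSize`, `rowGroupAwareFloor`, and `splitCount` are hypothetical names. The sketch assumes the split size is shrunk toward `scanSize / parallelism` but never below a floor, and it simplifies splitting so that each split covers a whole number of row groups. With a 128MB file, 8MB row groups, and 16 slots, the fixed 16MB floor yields 8 splits, while a floor capped at the row group size yields 16.

```java
// Hypothetical sketch, not Iceberg's TableScanUtil implementation.
class SplitSizeSketch {
  static final long MB = 1024L * 1024L;

  // Fixed floor, as proposed in the diff under review.
  static final long MIN_SPLIT_SIZE = 16 * MB;

  // Assumed model: keep the configured split size if it already yields one split
  // per slot; otherwise shrink toward scanSize / parallelism, but not below the floor.
  static long adjustSplitSize(long scanSize, int parallelism, long splitSize, long floor) {
    long splitCount = (scanSize + splitSize - 1) / splitSize; // ceiling division
    if (splitCount >= parallelism) {
      return splitSize;
    }
    return Math.max(scanSize / parallelism, Math.min(floor, splitSize));
  }

  // Alternative floor that never exceeds the configured row group size.
  static long rowGroupAwareFloor(long rowGroupSize) {
    return Math.min(MIN_SPLIT_SIZE, rowGroupSize);
  }

  // Simplified model: a split covers a whole number of row groups, so a split size
  // of 16MB with 8MB row groups means two row groups per split.
  static long splitCount(long scanSize, long splitSize, long rowGroupSize) {
    long groupsPerSplit = Math.max(1, (splitSize + rowGroupSize - 1) / rowGroupSize);
    long rowGroups = (scanSize + rowGroupSize - 1) / rowGroupSize;
    return (rowGroups + groupsPerSplit - 1) / groupsPerSplit;
  }

  public static void main(String[] args) {
    long scanSize = 128 * MB;   // a single 128MB data file
    long rowGroupSize = 8 * MB; // row groups written at 8MB
    long splitSize = 128 * MB;  // hypothetical configured target split size
    int parallelism = 16;       // e.g. 16 task slots

    long fixed = adjustSplitSize(scanSize, parallelism, splitSize, MIN_SPLIT_SIZE);
    long aware =
        adjustSplitSize(scanSize, parallelism, splitSize, rowGroupAwareFloor(rowGroupSize));

    System.out.println("fixed 16MB floor: " + splitCount(scanSize, fixed, rowGroupSize) + " splits");
    System.out.println("row-group-aware floor: " + splitCount(scanSize, aware, rowGroupSize) + " splits");
  }
}
```

Under these assumptions, half of the 16 slots sit idle with the fixed floor, which is the parallelism loss described above.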