hantangwangd commented on code in PR #12201:
URL: https://github.com/apache/iceberg/pull/12201#discussion_r1947986708
##########
core/src/main/java/org/apache/iceberg/util/TableScanUtil.java:
##########
@@ -236,6 +236,9 @@ public static long adjustSplitSize(long scanSize, int parallelism, long splitSiz
// use the configured split size if it produces at least one split per slot
// otherwise, adjust the split size to target parallelism with a reasonable minimum
// increasing the split size may cause expensive spills and is not done automatically
+ if (splitSize <= 0) {
Review Comment:
When querying in Spark, `adjustSplitSize(...)` is invoked before
`TableScanUtil.planTaskGroups(...)`, as follows, so an illegal split size is
handled in `adjustSplitSize(...)` first and hits this problem there. I believe
the same holds for any other engine that has logic to adjust the split size.
```
TableScanUtil.planTaskGroups(
    CloseableIterable.withNoopClose(tasks()),
    adjustSplitSize(tasks(), scan.targetSplitSize()),
    scan.splitLookback(),
    scan.splitOpenFileCost());
```
If I understand what you meant correctly, we should simply add a
`checkArgument` for `splitSize` in `adjustSplitSize(...)` that throws an error
for an illegal value. Is that right?
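
To make the proposal concrete, here is a minimal, self-contained sketch of what such a precondition might look like. It is not the actual Iceberg code: the class name `SplitSizeSketch`, the local `checkArgument` helper (standing in for a Guava-style `Preconditions.checkArgument`), the error message, and the `MIN_SPLIT_SIZE` value are all assumptions made for illustration, and the body of `adjustSplitSize` is a simplified approximation of the logic described by the comments in the diff.

```java
public class SplitSizeSketch {
  // Assumed floor for an adjusted split size (hypothetical value for this sketch).
  private static final long MIN_SPLIT_SIZE = 16L * 1024 * 1024;

  // Local stand-in for a Guava-style precondition helper (assumption).
  static void checkArgument(boolean condition, String message, Object... args) {
    if (!condition) {
      throw new IllegalArgumentException(String.format(message, args));
    }
  }

  // Simplified approximation of the adjustment logic, with the proposed check first.
  public static long adjustSplitSize(long scanSize, int parallelism, long splitSize) {
    // Fail fast on an illegal value instead of silently adjusting it.
    checkArgument(splitSize > 0, "Split size must be > 0: %s", splitSize);

    // Number of splits the configured size would produce (ceiling division).
    long splitCount = (scanSize + splitSize - 1) / splitSize;

    // Target one split per slot, but never go below a reasonable minimum
    // and never increase the split size beyond the configured value's floor.
    long adjustedSplitSize =
        Math.max(scanSize / parallelism, Math.min(MIN_SPLIT_SIZE, splitSize));

    // Keep the configured size if it already yields at least one split per slot.
    return splitCount < parallelism ? adjustedSplitSize : splitSize;
  }

  public static void main(String[] args) {
    // A valid split size passes through the normal adjustment path.
    System.out.println(adjustSplitSize(1000L, 4, 100L));

    // An illegal split size is rejected before any arithmetic happens.
    try {
      adjustSplitSize(1000L, 4, 0L);
    } catch (IllegalArgumentException e) {
      System.out.println("rejected: " + e.getMessage());
    }
  }
}
```

With the check placed at the top of the method, a zero or negative `splitSize` surfaces as an `IllegalArgumentException` with a descriptive message rather than failing later in the division.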