aokolnychyi commented on code in PR #5077:
URL: https://github.com/apache/iceberg/pull/5077#discussion_r900472693
##########
api/src/main/java/org/apache/iceberg/Scan.java:
##########
@@ -113,26 +113,23 @@
Schema schema();
/**
- * Plan the {@link FileScanTask files} that will be read by this scan.
+ * Plan tasks for this scan without trying to balance the work.
* <p>
- * Each file has a residual expression that should be applied to filter the file's rows.
- * <p>
- * This simple plan returns file scans for each file from position 0 to the file's length. For
- * planning that will combine small files, split large files, and attempt to balance work, use
- * {@link #planTasks()} instead.
+ * Use {@link #planTasks()} for planning that will attempt to balance the work
+ * by combining small or splitting large files.
*
- * @return an Iterable of file tasks that are required by this scan
+ * @return an Iterable of tasks required by this scan
*/
- CloseableIterable<FileScanTask> planFiles();
+ CloseableIterable<T> planFiles();
/**
- * Plan the {@link CombinedScanTask tasks} for this scan.
+ * Plan input split tasks for this scan and balance the work.
* <p>
- * Tasks created by this method may read partial input files, multiple input files, or both.
+ * Tasks created by this method may read partial input files, multiple input files or both.
*
- * @return an Iterable of tasks for this scan
+ * @return an Iterable of input splits required by this scan
*/
- CloseableIterable<CombinedScanTask> planTasks();
+ CloseableIterable<S> planTasks();
Review Comment:
The plan is for other scans like CDC to use `InputSplit<T>` and
`BaseInputSplit` instead of `CombinedScanTask`.
Alternatively, we could add something like `planInputSplits` instead of
parameterizing `Scan` with `S extends InputSplit<T>` and keep `planTasks` only
in `TableScan` for compatibility. Not sure it is any better.
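To make the two options easier to compare, here is a rough sketch of both shapes side by side. Everything in it is a hypothetical stand-in, not the actual Iceberg API: `ScanSketch`, `SplitScan`, `FileTask`, `SimpleSplit` and `ToyScan` are made up for illustration, plain `Iterable` replaces `CloseableIterable`, and the "at most two tasks per split" grouping is an arbitrary assumption rather than real split planning.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.List;

// All names below are hypothetical stand-ins, not the real Iceberg types.
class ScanSketch {
  interface ScanTask {}

  interface InputSplit<T extends ScanTask> {
    List<T> tasks();
  }

  // Option 1: parameterize the scan with both the task type T and the split type S.
  interface Scan<T extends ScanTask, S extends InputSplit<T>> {
    Iterable<T> planFiles();
    Iterable<S> planTasks();
  }

  // Option 2: parameterize only by T and expose splits through a separate method.
  interface SplitScan<T extends ScanTask> {
    Iterable<T> planFiles();
    Iterable<InputSplit<T>> planInputSplits();
  }

  static final class FileTask implements ScanTask {
    final String path;
    FileTask(String path) { this.path = path; }
  }

  static final class SimpleSplit implements InputSplit<FileTask> {
    private final List<FileTask> tasks;
    SimpleSplit(List<FileTask> tasks) { this.tasks = tasks; }
    @Override public List<FileTask> tasks() { return tasks; }
  }

  // Toy scan that implements both shapes; it groups tasks into splits of at most two.
  static final class ToyScan implements Scan<FileTask, SimpleSplit>, SplitScan<FileTask> {
    private final List<FileTask> files = new ArrayList<>();

    ToyScan(List<String> paths) {
      for (String path : paths) {
        files.add(new FileTask(path));
      }
    }

    @Override public Iterable<FileTask> planFiles() { return files; }

    @Override public Iterable<SimpleSplit> planTasks() {
      List<SimpleSplit> splits = new ArrayList<>();
      for (int i = 0; i < files.size(); i += 2) {
        int end = Math.min(i + 2, files.size());
        splits.add(new SimpleSplit(new ArrayList<>(files.subList(i, end))));
      }
      return splits;
    }

    @Override public Iterable<InputSplit<FileTask>> planInputSplits() {
      List<InputSplit<FileTask>> splits = new ArrayList<>();
      for (SimpleSplit split : planTasks()) {
        splits.add(split);
      }
      return splits;
    }
  }

  public static void main(String[] args) {
    ToyScan scan = new ToyScan(Arrays.asList("a.parquet", "b.parquet", "c.parquet"));
    for (SimpleSplit split : scan.planTasks()) {
      System.out.println("split with " + split.tasks().size() + " task(s)");
    }
  }
}
```

With option 1 callers get the concrete split type back from `planTasks`, at the cost of a second type parameter on every scan. With option 2 the scan keeps a single type parameter, but callers only see the `InputSplit<T>` interface.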
--
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.
To unsubscribe, e-mail: [email protected]
For queries about this service, please contact Infrastructure at:
[email protected]