aokolnychyi commented on code in PR #2276:
URL: https://github.com/apache/iceberg/pull/2276#discussion_r1007098070
##########
api/src/main/java/org/apache/iceberg/Scan.java:
##########
@@ -129,6 +129,34 @@ default ThisT select(String... columns) {
*/
ThisT planWith(ExecutorService executorService);
+ /**
+  * Create a new {@link TableScan} which dictates that when planning tasks via {@link
+  * #planTasks()}, the scan should preserve the partition boundaries specified by the provided
+  * partition column names. In other words, the scan will not attempt to combine tasks whose
+  * input files have different partition data with respect to {@code columns}.
+  *
+  * @param columns the partition column names whose boundaries to preserve when planning tasks
+  * @return a table scan preserving partition boundaries when planning tasks
+  * @throws IllegalArgumentException if any of the input columns is not a partition column, or if
+  *     the table is unpartitioned
+  */
+ ThisT preservePartitions(Collection<String> columns);
+
+ /**
+  * Create a new {@link TableScan} which dictates that when planning tasks via {@link
+  * #planTasks()}, the scan should preserve the partition boundaries specified by the provided
+  * partition
Review Comment:
@sunchao, what if we make Spark pass join keys instead of V2 partition
expressions?
I think that can be done easily, since the operator optimization batch in Spark
runs before Spark builds scans, so we should know all equality predicates of
the join condition by that time. That means we can add a new mix-in interface
for `ScanBuilder`. If our join condition is `t.dep = s.dep AND t.id = s.id` and
the tables are partitioned by `dep, bucket(id)`, Spark will pass just `dep`, `id`,
and Iceberg will interpret this as: don't combine tasks across partition
expressions on top of `dep`, `id`.
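
A minimal sketch of the mix-in idea described above. All names here (`SupportsPreserveJoinKeys`, `pushJoinKeys`, `SketchScanBuilder`) are hypothetical placeholders, not the actual Iceberg or Spark API; the sketch only illustrates the flow where Spark pushes the equality join keys down before building the scan, and the scan builder records which columns' partition boundaries to preserve:

```java
import java.util.Collection;
import java.util.HashSet;
import java.util.Set;

// Hypothetical mix-in for ScanBuilder (name and signature are assumptions):
// Spark would call this with the join keys before the scan is built.
interface SupportsPreserveJoinKeys {
  void pushJoinKeys(Collection<String> joinKeys);
}

// Sketch of a scan builder that records the pushed keys; a real
// implementation would later avoid combining tasks across partition
// expressions on top of these columns (e.g. dep and bucket(id)).
class SketchScanBuilder implements SupportsPreserveJoinKeys {
  private final Set<String> preservedColumns = new HashSet<>();

  @Override
  public void pushJoinKeys(Collection<String> joinKeys) {
    // For a join condition t.dep = s.dep AND t.id = s.id,
    // Spark would pass ["dep", "id"].
    preservedColumns.addAll(joinKeys);
  }

  Set<String> preservedColumns() {
    return preservedColumns;
  }
}
```

The point of pushing plain column names rather than V2 partition expressions is that Spark only needs the join condition, which is known during operator optimization, while Iceberg maps each name to the matching partition expression on its side.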
--
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.
To unsubscribe, e-mail: [email protected]
For queries about this service, please contact Infrastructure at:
[email protected]