sunchao commented on code in PR #2276:
URL: https://github.com/apache/iceberg/pull/2276#discussion_r999781025


##########
core/src/main/java/org/apache/iceberg/BaseScan.java:
##########
@@ -160,6 +161,31 @@ public ThisT planWith(ExecutorService executorService) {
     return newRefinedScan(ops, table, schema, context.planWith(executorService));
   }
 
+  @Override
+  public ThisT preservePartitions(Collection<String> columns) {
+    if (table.spec().isUnpartitioned()) {

Review Comment:
   I wonder what the implications are of having multiple partition specs for the table. The workflow I'm thinking of:
   1. Iceberg reports the table's partitioning to Spark via the `SupportsReportPartitioning` interface, say (`day(ts), bucket(id)`).
   2. Spark performs query planning and pushes down the partition columns actually used by the query (say `bucket(id)`).
   3. Iceberg generates splits based on the result of step 2).
   
   Here, as long as step 1) reports the partitioning according to the current partition spec, it should be fine?
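   
   To make the workflow concrete, here is a minimal, self-contained sketch of the three steps. It does not use the real Iceberg or Spark APIs; the methods `reportPartitioning`, `pushDown`, and `planSplits`, and the string-based transform representation, are hypothetical stand-ins for illustration only:
   
```java
import java.util.*;
import java.util.stream.*;

// Hypothetical sketch of the report -> push-down -> split-planning workflow.
public class PartitionReportSketch {

  // Step 1: report partitioning from the table's *current* spec only,
  // e.g. (day(ts), bucket(id)); older specs are not reported here.
  static List<String> reportPartitioning(List<String> currentSpec) {
    return List.copyOf(currentSpec);
  }

  // Step 2: the engine keeps only the reported transforms the query
  // actually uses, e.g. just bucket(id).
  static List<String> pushDown(List<String> reported, Set<String> usedByQuery) {
    return reported.stream()
        .filter(usedByQuery::contains)
        .collect(Collectors.toList());
  }

  // Step 3: group files by their values for the pushed-down transforms,
  // so each group can become one split / task partition.
  static Map<List<String>, List<String>> planSplits(
      Map<String, Map<String, String>> filePartitionValues, List<String> keep) {
    return filePartitionValues.entrySet().stream().collect(Collectors.groupingBy(
        e -> keep.stream().map(t -> e.getValue().get(t)).collect(Collectors.toList()),
        Collectors.mapping(Map.Entry::getKey, Collectors.toList())));
  }

  public static void main(String[] args) {
    List<String> reported = reportPartitioning(List.of("day(ts)", "bucket(id)"));
    List<String> pushed = pushDown(reported, Set.of("bucket(id)"));

    Map<String, Map<String, String>> files = new LinkedHashMap<>();
    files.put("f1.parquet", Map.of("day(ts)", "2021-01-01", "bucket(id)", "0"));
    files.put("f2.parquet", Map.of("day(ts)", "2021-01-02", "bucket(id)", "0"));
    files.put("f3.parquet", Map.of("day(ts)", "2021-01-01", "bucket(id)", "1"));

    Map<List<String>, List<String>> groups = planSplits(files, pushed);
    // Two groups, one per bucket value; day(ts) no longer splits them.
    System.out.println(groups.size());
    System.out.println(groups.get(List.of("0")));
  }
}
```
   
   The sketch illustrates the question above: grouping in step 3 only works out if every file's partition values can be resolved against the transforms reported in step 1, which is why reporting from the current spec alone matters when multiple specs exist.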



-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: issues-unsubscr...@iceberg.apache.org

For queries about this service, please contact Infrastructure at:
us...@infra.apache.org

