Re: [PR] Spark: Support aggregate pushdown for identity partition column GROUP BY [iceberg]

via GitHub Sun, 03 May 2026 10:04:13 -0700


hemanthboyina commented on code in PR #16176:
URL: https://github.com/apache/iceberg/pull/16176#discussion_r3178458003



##########
spark/v4.1/spark/src/main/java/org/apache/iceberg/spark/source/SparkScanBuilder.java:
##########
@@ -186,6 +196,213 @@ public boolean pushAggregation(Aggregation aggregation) {
     return true;
   }
 
+  /**
+   * Push down aggregation with GROUP BY on identity partition columns. When all GROUP BY columns
+   * are identity partition fields, aggregates can be computed from file metadata grouped by
+   * partition values, avoiding reading any data files.
+   */
+  private boolean pushGroupByAggregation(
+      Aggregation aggregation, List<BoundAggregate<?, ?>> boundAggregates) {
+    PartitionSpec spec = table().spec();
+    Schema tableSchema = table().schema();
+
+    List<Integer> groupByPositions = Lists.newArrayList();
+    List<Types.NestedField> groupByFields = Lists.newArrayList();
+    if (!resolveGroupByPartitions(
+        aggregation, spec, tableSchema, groupByPositions, groupByFields)) {
+      return false;
+    }
+
+    Map<List<Object>, AggregateEvaluator> evaluatorsByPartition =
+        groupFilesByPartition(spec, groupByPositions, boundAggregates);

Review Comment:
   Handled the partition spec evolution changes. Can you please review?
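
   For context, the idea described in the Javadoc above (computing aggregates from file metadata grouped by partition value) can be illustrated with a minimal standalone sketch. This is not the PR's implementation and does not use Iceberg APIs: `FileStats`, `Agg`, and `aggregateByPartition` are hypothetical names, and the sketch assumes every file carries a single identity partition value plus a record count and min/max bounds for one numeric column.

```java
import java.util.LinkedHashMap;
import java.util.List;
import java.util.Map;

public class PartitionAggregateSketch {
  // Hypothetical stand-in for per-file metadata; not an Iceberg class.
  record FileStats(String partitionValue, long recordCount, long minValue, long maxValue) {}

  // Running COUNT(*), MIN and MAX for one partition value.
  static final class Agg {
    long count = 0;
    long min = Long.MAX_VALUE;
    long max = Long.MIN_VALUE;

    void update(FileStats file) {
      count += file.recordCount();
      min = Math.min(min, file.minValue());
      max = Math.max(max, file.maxValue());
    }
  }

  // Groups files by partition value and folds their metadata into one
  // aggregate per group, without reading any row data.
  static Map<String, Agg> aggregateByPartition(List<FileStats> files) {
    Map<String, Agg> result = new LinkedHashMap<>();
    for (FileStats file : files) {
      result.computeIfAbsent(file.partitionValue(), key -> new Agg()).update(file);
    }
    return result;
  }

  public static void main(String[] args) {
    List<FileStats> files = List.of(
        new FileStats("2026-05-01", 100, 3, 40),
        new FileStats("2026-05-01", 50, 1, 25),
        new FileStats("2026-05-02", 70, 10, 90));
    aggregateByPartition(files).forEach((partition, agg) ->
        System.out.println(
            partition + " count=" + agg.count + " min=" + agg.min + " max=" + agg.max));
  }
}
```

   The sketch assumes all files share one partition layout. Under partition spec evolution, files written with an older spec may not carry the group-by field at all, so a real implementation has to account for that case before relying on metadata alone.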



-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: [email protected]

For queries about this service, please contact Infrastructure at:
[email protected]

