hemanthboyina commented on code in PR #16176:
URL: https://github.com/apache/iceberg/pull/16176#discussion_r3176041459
##########
spark/v4.1/spark/src/main/java/org/apache/iceberg/spark/source/SparkScanBuilder.java:
##########
@@ -186,6 +196,213 @@ public boolean pushAggregation(Aggregation aggregation) {
return true;
}
+ /**
+ * Push down aggregation with GROUP BY on identity partition columns. When all GROUP BY columns
+ * are identity partition fields, aggregates can be computed from file metadata grouped by
+ * partition values, avoiding reading any data files.
+ */
+ private boolean pushGroupByAggregation(
+ Aggregation aggregation, List<BoundAggregate<?, ?>> boundAggregates) {
+ PartitionSpec spec = table().spec();
+ Schema tableSchema = table().schema();
+
+ List<Integer> groupByPositions = Lists.newArrayList();
+ List<Types.NestedField> groupByFields = Lists.newArrayList();
+ if (!resolveGroupByPartitions(
+ aggregation, spec, tableSchema, groupByPositions, groupByFields)) {
+ return false;
+ }
+
+ Map<List<Object>, AggregateEvaluator> evaluatorsByPartition =
+ groupFilesByPartition(spec, groupByPositions, boundAggregates);
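A minimal Java sketch of the idea in the Javadoc above, under simplifying assumptions: a hypothetical `FileStats` record (partition tuple, record count, and min/max of one `long` column) stands in for Iceberg's `DataFile` and `AggregateEvaluator`. Files are bucketed by the values at the GROUP BY positions of their partition tuple, and COUNT(*), MIN and MAX are folded from per-file metadata. It ignores delete files, null counts, and missing or truncated bounds, all of which a real implementation would need to check before trusting metadata.

```java
import java.util.ArrayList;
import java.util.LinkedHashMap;
import java.util.List;
import java.util.Map;

public class GroupByPartitionSketch {

  // Hypothetical per-file metadata: partition tuple plus stats for one long column.
  record FileStats(List<Object> partition, long recordCount, long min, long max) {}

  // Running aggregates for one group: COUNT(*), MIN(col), MAX(col).
  static final class Agg {
    long count;
    long min = Long.MAX_VALUE;
    long max = Long.MIN_VALUE;

    void add(FileStats f) {
      count += f.recordCount();
      min = Math.min(min, f.min());
      max = Math.max(max, f.max());
    }
  }

  // Groups files by the partition values at the GROUP BY positions, folding metadata per group.
  static Map<List<Object>, Agg> groupByPartition(
      List<FileStats> files, List<Integer> groupByPositions) {
    Map<List<Object>, Agg> result = new LinkedHashMap<>();
    for (FileStats file : files) {
      List<Object> key = new ArrayList<>();
      for (int pos : groupByPositions) {
        key.add(file.partition().get(pos));
      }
      result.computeIfAbsent(key, k -> new Agg()).add(file);
    }
    return result;
  }

  public static void main(String[] args) {
    List<FileStats> files = List.of(
        new FileStats(List.of("2024-01-01", "us"), 100, 5, 50),
        new FileStats(List.of("2024-01-01", "eu"), 40, 1, 9),
        new FileStats(List.of("2024-01-02", "us"), 60, 7, 70));
    // GROUP BY the first partition column only.
    groupByPartition(files, List.of(0)).forEach((key, agg) ->
        System.out.println(key + " count=" + agg.count + " min=" + agg.min + " max=" + agg.max));
  }
}
```

With GROUP BY on position 0, the two `2024-01-01` files collapse into a single group with count 140, min 1 and max 50, and no data file is opened.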
Review Comment:
Thanks for the review, @singhpk234. You raised a valid point: the current
implementation only considers the current partition spec and bails out for
files written with a different spec. I'll look into handling spec evolution
properly and update the PR.
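One possible direction, sketched under assumptions and not necessarily how this PR will resolve it: identify GROUP BY columns by source field id rather than by position in the current spec, and resolve each file's values through the spec it was written with. The `FileInfo` record and the per-spec list of identity source ids are hypothetical stand-ins for Iceberg's `DataFile` and `PartitionSpec`. If a file's spec has no identity partition on a GROUP BY column, the sketch returns empty so the caller can fall back to a regular scan, as the current code does.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.Map;
import java.util.Optional;

public class SpecEvolutionSketch {

  // Hypothetical per-file metadata: the spec id the file was written with and its partition tuple.
  record FileInfo(int specId, List<Object> partition) {}

  // identitySourceIdsBySpec: for each spec id, index i holds the source column id of partition
  // field i if that field uses an identity transform, or -1 otherwise.
  // Returns the GROUP BY key for the file, or empty if the file's spec cannot provide it.
  static Optional<List<Object>> groupKey(
      FileInfo file,
      Map<Integer, List<Integer>> identitySourceIdsBySpec,
      List<Integer> groupBySourceIds) {
    List<Integer> identitySourceIds = identitySourceIdsBySpec.get(file.specId());
    if (identitySourceIds == null) {
      return Optional.empty(); // unknown spec: fall back to a normal scan
    }
    List<Object> key = new ArrayList<>();
    for (int sourceId : groupBySourceIds) {
      int pos = identitySourceIds.indexOf(sourceId);
      if (pos < 0) {
        return Optional.empty(); // not an identity partition column in this file's spec
      }
      key.add(file.partition().get(pos));
    }
    return Optional.of(key);
  }

  public static void main(String[] args) {
    // spec 0: identity(date); spec 1: identity(date), bucket(id); spec 2: day(ts) only.
    Map<Integer, List<Integer>> identitySourceIds =
        Map.of(0, List.of(1), 1, List.of(1, -1), 2, List.of(-1));
    List<FileInfo> files = List.of(
        new FileInfo(0, List.of("2024-01-01")),
        new FileInfo(1, List.of("2024-01-01", 3)),
        new FileInfo(2, List.of(19723)));
    for (FileInfo f : files) {
      System.out.println(f + " -> " + groupKey(f, identitySourceIds, List.of(1)));
    }
  }
}
```

Here the files from specs 0 and 1 resolve to the same `[2024-01-01]` key even though the column sits in a tuple of different shape, while the spec 2 file yields empty and would trigger the fallback.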
--
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.