rdblue commented on code in PR #6622:
URL: https://github.com/apache/iceberg/pull/6622#discussion_r1103898014
##########
spark/v3.3/spark/src/main/java/org/apache/iceberg/spark/source/SparkScanBuilder.java:
##########
@@ -193,6 +337,19 @@ private Schema schemaWithMetadataColumns() {
@Override
public Scan build() {
+ // if aggregates are pushed down, instead of constructing a SparkBatchQueryScan, creating file
+ // read tasks and sending over the tasks to Spark executors, a SparkLocalScan will be created
+ // and the scan is done locally on the Spark driver instead of the executors. The statistics
+ // info will be retrieved from manifest file and used to build a Spark internal row, which
+ // contains the pushed down aggregate values.
+ if (pushedAggregateRows != null) {
Review Comment:
I think it would be slightly better to create the scan in the aggregation
methods. Then this could be `if (localScan != null) { return localScan; }`, which
is a bit more generic.
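The suggested restructuring can be sketched roughly as follows. This is a minimal, self-contained illustration, not Iceberg's actual code. `Scan`, `LocalScan`, `BatchQueryScan`, `FileStats`, `ScanBuilder` and `pushCountMinMax` are hypothetical stand-ins for Spark's `Scan`, Iceberg's `SparkLocalScan`/`SparkBatchQueryScan`, manifest-entry statistics, and the aggregate push-down method. The sketch also only handles `COUNT(*)`, `MIN` and `MAX` over a single long column. The push-down method computes the aggregate row from per-file statistics and stores the finished local scan, so `build()` only has to check whether one exists.

```java
import java.util.List;

// Hypothetical, simplified stand-in for Spark's Scan interface.
interface Scan {
  String description();
}

// Hypothetical per-file statistics, similar in spirit to the record count and
// lower/upper bounds kept in manifest entries.
final class FileStats {
  final long recordCount;
  final long minValue;
  final long maxValue;

  FileStats(long recordCount, long minValue, long maxValue) {
    this.recordCount = recordCount;
    this.minValue = minValue;
    this.maxValue = maxValue;
  }
}

// Stand-in for a scan whose result rows are already known on the driver.
final class LocalScan implements Scan {
  private final List<long[]> rows;

  LocalScan(List<long[]> rows) { this.rows = rows; }

  List<long[]> rows() { return rows; }

  public String description() { return "local scan with " + rows.size() + " row(s)"; }
}

// Stand-in for the normal scan that plans file tasks for the executors.
final class BatchQueryScan implements Scan {
  public String description() { return "distributed batch scan"; }
}

final class ScanBuilder {
  private final List<FileStats> files;
  private Scan localScan = null; // set only when an aggregate is fully pushed down

  ScanBuilder(List<FileStats> files) { this.files = files; }

  // Hypothetical analogue of the aggregate push-down method: answers
  // COUNT(*), MIN(col) and MAX(col) from file statistics alone and creates
  // the local scan here instead of in build().
  boolean pushCountMinMax() {
    if (files.isEmpty()) {
      return false; // MIN/MAX are undefined here; fall back to a normal scan
    }
    long count = 0;
    long min = Long.MAX_VALUE;
    long max = Long.MIN_VALUE;
    for (FileStats f : files) {
      count += f.recordCount;
      min = Math.min(min, f.minValue);
      max = Math.max(max, f.maxValue);
    }
    localScan = new LocalScan(List.of(new long[] {count, min, max}));
    return true;
  }

  Scan build() {
    if (localScan != null) {
      return localScan; // no need to know which push-down produced it
    }
    return new BatchQueryScan();
  }
}
```

Because `build()` only checks `localScan`, any other push-down path that can answer the query on the driver could reuse the same branch. A real implementation would also have to fall back to a normal scan whenever the statistics may not be exact, for example when bounds are missing or truncated.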
--
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.
To unsubscribe, e-mail: [email protected]
For queries about this service, please contact Infrastructure at:
[email protected]