Github user cloud-fan commented on a diff in the pull request: https://github.com/apache/spark/pull/19488#discussion_r144483555 --- Diff: sql/catalyst/src/main/scala/org/apache/spark/sql/catalyst/planning/patterns.scala --- @@ -205,14 +205,15 @@ object PhysicalAggregation { case logical.Aggregate(groupingExpressions, resultExpressions, child) => // A single aggregate expression might appear multiple times in resultExpressions. // In order to avoid evaluating an individual aggregate function multiple times, we'll - // build a set of the distinct aggregate expressions and build a function which can + // build a map of the distinct aggregate expressions and build a function which can // be used to re-write expressions so that they reference the single copy of the // aggregate function which actually gets computed. - val aggregateExpressions = resultExpressions.flatMap { expr => + val aggregateExpressionMap = resultExpressions.flatMap { expr => expr.collect { - case agg: AggregateExpression => agg + case agg: AggregateExpression => (agg.canonicalized, agg.deterministic) -> agg --- End diff -- I think non-deterministic functions should not be deduplicated, e.g. `select max(a + rand()), max(a + rand()) from ...` should still eveluate 2 aggregate funcitions. my suggestion: ``` val aggregateExpressions = resultExpressions.flatMap { expr => expr.collect { case agg: AggregateExpression => agg } } val aggregateExpressionMap = aggregateExpressions.filter(_.deterministic).map { agg => agg.canonicalized -> agg }.toMap ```
--- --------------------------------------------------------------------- To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org For additional commands, e-mail: reviews-h...@spark.apache.org