beliefer commented on a change in pull request #26656: [SPARK-27986][SQL] Support ANSI SQL filter clause for aggregate expression URL: https://github.com/apache/spark/pull/26656#discussion_r353571892
########## File path: sql/core/src/main/scala/org/apache/spark/sql/execution/aggregate/AggUtils.scala ##########

```scala
@@ -135,19 +135,25 @@ object AggUtils {
     }
     val distinctAttributes = namedDistinctExpressions.map(_.toAttribute)
     val groupingAttributes = groupingExpressions.map(_.toAttribute)
+    val filterWithDistinctAttributes = functionsWithDistinct.flatMap(_.filterAttributes.toSeq)
     // 1. Create an Aggregate Operator for partial aggregations.
     val partialAggregate: SparkPlan = {
       val aggregateExpressions = functionsWithoutDistinct.map(_.copy(mode = Partial))
       val aggregateAttributes = aggregateExpressions.map(_.resultAttribute)
       // We will group by the original grouping expression, plus an additional expression for the
-      // DISTINCT column. For example, for AVG(DISTINCT value) GROUP BY key, the grouping
-      // expressions will be [key, value].
+      // DISTINCT column and the attributes referenced in the FILTER clause associated with
+      // each aggregate function. For example:
+      // 1. for AVG(DISTINCT value) GROUP BY key, the grouping expressions will be
+      //    [key, value];
+      // 2. for AVG(DISTINCT value) FILTER (WHERE value2 > 20) GROUP BY key, the grouping
+      //    expressions will be [key, value, value2].
```

Review comment: For a query like `SELECT COUNT(DISTINCT a) FILTER (WHERE c > 0), SUM(b) FILTER (WHERE d = 0) FROM table`, the physical plan will be:
```
AGG-4 (count distinct)
  Shuffle to a single reducer
    Partial-AGG-3 (count distinct, no grouping, apply function COUNT on a with c > 0)
      Partial-AGG-2 (grouping on a and c)
        Shuffle by a
          Partial-AGG-1 (grouping on a and c, apply function SUM on b with d = 0)
```
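The staged plan above can be sketched with plain Scala collections. This is only an illustrative sketch, not Spark's actual execution path: the row values are hypothetical, grouping on `(a, c)` mirrors Partial-AGG-1/2 (note the filter attribute `c` joins the grouping key, as the diff's comment describes), and the final step mirrors Partial-AGG-3/AGG-4:

```scala
// Hypothetical rows (a, b, c, d), matching the columns in the example query
// SELECT COUNT(DISTINCT a) FILTER (WHERE c > 0), SUM(b) FILTER (WHERE d = 0) FROM table.
val rows = Seq(
  (1, 10, 5, 0),
  (1, 20, -1, 0),
  (2, 30, 3, 1),
  (3, 40, 7, 0)
)

// Partial-AGG-1/2: group by the distinct column `a` plus the filter attribute `c`,
// partially summing `b` under SUM's own filter (d = 0) within each group.
val partial: Map[(Int, Int), Int] = rows
  .groupBy { case (a, _, c, _) => (a, c) }
  .map { case (key, grp) => (key, grp.filter(_._4 == 0).map(_._2).sum) }

// Partial-AGG-3/AGG-4: over the de-duplicated (a, c) keys, count the distinct
// `a` values satisfying COUNT's filter (c > 0), and merge the partial sums.
val countDistinctA = partial.keys.filter(_._2 > 0).map(_._1).toSet.size
val sumB = partial.values.sum
```

With these rows, the distinct `a` values with `c > 0` are {1, 2, 3}, and the rows with `d = 0` contribute 10 + 20 + 40 to the sum.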