[GitHub] [spark] cloud-fan commented on a diff in pull request #36265: [SPARK-38951][SQL] Aggregate aliases override field names in ResolveAggregateFunctions
cloud-fan commented on code in PR #36265:
URL: https://github.com/apache/spark/pull/36265#discussion_r979562710

##
sql/catalyst/src/main/scala/org/apache/spark/sql/catalyst/analysis/Analyzer.scala:
##
@@ -2594,6 +2601,31 @@ class Analyzer(override val catalogManager: CatalogManager)
 })
 }

+private def resolveTemp(expr: Expression, agg: Aggregate): Expression = {

Review Comment:
This is very similar to the `resolveCol` method inside `resolveExprsWithAggregate`. Can we have a common method to share code?

--
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the URL above to go to the specific comment.

To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org

For queries about this service, please contact Infrastructure at: us...@infra.apache.org

-
To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org
For additional commands, e-mail: reviews-h...@spark.apache.org
[GitHub] [spark] cloud-fan commented on a diff in pull request #36265: [SPARK-38951][SQL] Aggregate aliases override field names in ResolveAggregateFunctions
cloud-fan commented on code in PR #36265:
URL: https://github.com/apache/spark/pull/36265#discussion_r866863003

##
sql/catalyst/src/main/scala/org/apache/spark/sql/catalyst/analysis/Analyzer.scala:
##
@@ -2563,7 +2563,23 @@ class Analyzer(override val catalogManager: CatalogManager)
 case Sort(sortOrder, global, agg: Aggregate) if agg.resolved =>
 // We should resolve the references normally based on child (agg.output) first.
-val maybeResolved = sortOrder.map(_.child).map(resolveExpressionByPlanOutput(_, agg))

Review Comment:
e.g.
```
val maybeResolved = sortOrder.map(_.child).map { expr =>
  // Resolve against agg.output, but only temporarily.
  val resolved = resolveTemp(expr, agg)
  if (resolved.exists(_.isInstanceOf[AggregateFunction])) {
    // The ORDER BY expression contains an aggregate function: undo the temporary resolution.
    unresolve(resolved)
  } else {
    resolved
  }
}
resolveOperatorWithAggregate...
```
[GitHub] [spark] cloud-fan commented on a diff in pull request #36265: [SPARK-38951][SQL] Aggregate aliases override field names in ResolveAggregateFunctions
cloud-fan commented on code in PR #36265:
URL: https://github.com/apache/spark/pull/36265#discussion_r866859656

##
sql/catalyst/src/main/scala/org/apache/spark/sql/catalyst/analysis/Analyzer.scala:
##
@@ -2563,7 +2563,23 @@ class Analyzer(override val catalogManager: CatalogManager)
 case Sort(sortOrder, global, agg: Aggregate) if agg.resolved =>
 // We should resolve the references normally based on child (agg.output) first.
-val maybeResolved = sortOrder.map(_.child).map(resolveExpressionByPlanOutput(_, agg))

Review Comment:
My proposal is: we still try to resolve the column to `agg.output` first, but only temporarily (using `TempResolvedColumn`). If the ORDER BY expression is resolved but contains aggregate functions, unresolve the column and call `resolveOperatorWithAggregate`.
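The proposal can be illustrated with a toy model. This is a minimal sketch under stated assumptions: `Expr`, `UnresolvedCol`, `TempAggRef`, `TableCol`, `Sum`, `Abs`, `mapLeaves` and `resolveSortExpr` are all hypothetical stand-ins, not Catalyst classes. `TempAggRef` plays the role of `TempResolvedColumn`, and `resolveAgainstChild` only approximates `resolveOperatorWithAggregate` by resolving against the child; it does not model pushing the aggregate into the `Aggregate` node.

```scala
// Hypothetical toy expression tree; these are NOT Catalyst classes.
sealed trait Expr {
  def children: Seq[Expr]
  // True if this node or any descendant satisfies p.
  def exists(p: Expr => Boolean): Boolean = p(this) || children.exists(_.exists(p))
}
final case class UnresolvedCol(name: String) extends Expr { val children: Seq[Expr] = Nil }
// Temporary resolution to a column of agg.output (loosely analogous to TempResolvedColumn);
// it keeps the name so the resolution can be undone.
final case class TempAggRef(name: String) extends Expr { val children: Seq[Expr] = Nil }
// A column resolved against the Aggregate's child, i.e. the table column.
final case class TableCol(name: String) extends Expr { val children: Seq[Expr] = Nil }
final case class Sum(child: Expr) extends Expr { val children: Seq[Expr] = Seq(child) }
final case class Abs(child: Expr) extends Expr { val children: Seq[Expr] = Seq(child) }

def isAggregate(e: Expr): Boolean = e match {
  case Sum(_) => true
  case _      => false
}

// Rebuild the tree, applying f to leaves where it is defined.
def mapLeaves(e: Expr)(f: PartialFunction[Expr, Expr]): Expr = e match {
  case Sum(c) => Sum(mapLeaves(c)(f))
  case Abs(c) => Abs(mapLeaves(c)(f))
  case leaf   => f.applyOrElse(leaf, (x: Expr) => x)
}

def tempResolve(e: Expr, aggOutput: Set[String]): Expr =
  mapLeaves(e) { case UnresolvedCol(n) if aggOutput.contains(n) => TempAggRef(n) }

def unresolve(e: Expr): Expr =
  mapLeaves(e) { case TempAggRef(n) => UnresolvedCol(n) }

def resolveAgainstChild(e: Expr, childOutput: Set[String]): Expr =
  mapLeaves(e) { case UnresolvedCol(n) if childOutput.contains(n) => TableCol(n) }

def resolveSortExpr(e: Expr, aggOutput: Set[String], childOutput: Set[String]): Expr = {
  // Step 1: resolve against agg.output, but only temporarily.
  val resolved = tempResolve(e, aggOutput)
  // Step 2: if the ORDER BY expression itself contains an aggregate function,
  // undo the temporary resolution and resolve against the child instead.
  if (resolved.exists(isAggregate)) resolveAgainstChild(unresolve(resolved), childOutput)
  else resolved
}

val aggOutput = Set("id")   // output of `select sum(id) as id ...`
val childOutput = Set("id") // columns of range(10)
println(resolveSortExpr(Sum(UnresolvedCol("id")), aggOutput, childOutput)) // Sum(TableCol(id))
println(resolveSortExpr(Abs(UnresolvedCol("id")), aggOutput, childOutput)) // Abs(TempAggRef(id))
```

The check deliberately looks only at the ORDER BY expression's own nodes: a reference to `sum(id) as id` is a leaf here, so `abs(id)` keeps the alias while `sum(id)` falls back to the table column.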
[GitHub] [spark] cloud-fan commented on a diff in pull request #36265: [SPARK-38951][SQL] Aggregate aliases override field names in ResolveAggregateFunctions
cloud-fan commented on code in PR #36265:
URL: https://github.com/apache/spark/pull/36265#discussion_r866856072

##
sql/catalyst/src/main/scala/org/apache/spark/sql/catalyst/analysis/Analyzer.scala:
##
@@ -2563,7 +2563,23 @@ class Analyzer(override val catalogManager: CatalogManager)
 case Sort(sortOrder, global, agg: Aggregate) if agg.resolved =>
 // We should resolve the references normally based on child (agg.output) first.
-val maybeResolved = sortOrder.map(_.child).map(resolveExpressionByPlanOutput(_, agg))

Review Comment:
I think this bug is much more complicated. Think about `order by sum(id)` and `order by abs(id)`. In the first case, we want to resolve `id` to the table column and push `sum(id)` down to the Aggregate. In the second case, we want to resolve `id` to `sum(id) as id`. How do we define a clear rule here?
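The conflict between the two cases can be sketched with a toy model of the "always prefer `agg.output`" rule. All names here (`Col`, `AggOutputRef`, `TableCol`, `Sum`, `Abs`, `naiveResolve`, `hasNestedAggregate`) are hypothetical and not Catalyst APIs. The sketch assumes the Aggregate outputs `sum(id) AS id`, so `AggOutputRef("id")` stands for `sum(id)`; under that rule `abs(id)` resolves as intended, but `sum(id)` becomes an aggregate over an aggregate, which SQL does not allow.

```scala
// Hypothetical toy expressions; these are NOT Catalyst classes.
sealed trait Expr
final case class Col(name: String) extends Expr          // unresolved column reference
final case class AggOutputRef(name: String) extends Expr // resolved to an Aggregate output (here `sum(id) AS id`)
final case class TableCol(name: String) extends Expr     // resolved to the table column
final case class Sum(child: Expr) extends Expr
final case class Abs(child: Expr) extends Expr

// Naive rule: always prefer agg.output, falling back to the table column.
def naiveResolve(e: Expr, aggOutput: Set[String]): Expr = e match {
  case Col(n) if aggOutput.contains(n) => AggOutputRef(n)
  case Col(n)                          => TableCol(n)
  case Sum(c)                          => Sum(naiveResolve(c, aggOutput))
  case Abs(c)                          => Abs(naiveResolve(c, aggOutput))
  case other                           => other
}

// True if e is an aggregate or contains a reference to an aggregate's output.
def refersToAggregate(e: Expr): Boolean = e match {
  case AggOutputRef(_) | Sum(_) => true
  case Abs(c)                   => refersToAggregate(c)
  case _                        => false
}

// An aggregate whose input already refers to an aggregate is a nested aggregate.
def hasNestedAggregate(e: Expr): Boolean = e match {
  case Sum(c) => refersToAggregate(c) || hasNestedAggregate(c)
  case Abs(c) => hasNestedAggregate(c)
  case _      => false
}

val aggOutput = Set("id") // `sum(id) AS id` shadows the table column `id`
println(naiveResolve(Abs(Col("id")), aggOutput)) // Abs(AggOutputRef(id))
println(naiveResolve(Sum(Col("id")), aggOutput)) // Sum(AggOutputRef(id)), i.e. sum(sum(id))
```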
[GitHub] [spark] cloud-fan commented on a diff in pull request #36265: [SPARK-38951][SQL] Aggregate aliases override field names in ResolveAggregateFunctions
cloud-fan commented on code in PR #36265:
URL: https://github.com/apache/spark/pull/36265#discussion_r866842272

##
sql/catalyst/src/test/scala/org/apache/spark/sql/catalyst/analysis/AnalysisSuite.scala:
##
@@ -1176,4 +1176,13 @@ class AnalysisSuite extends AnalysisTest with Matchers {
 false)
 }
 }
+
+ test("SPARK-38951: Aggregate aliases override field names in ResolveAggregateFunctions") {
+   assertAnalysisSuccess(parsePlan(
+     s"""
+       |select sum(id) as id
+       |from range(10)
+       |group by id
+       |order by sum(id)""".stripMargin))

Review Comment:
Does this query work on other databases?
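For reference, the result the test query should produce can be worked out with plain Scala collections. This is only a sketch of the intended SQL semantics, not how Spark evaluates the plan: because every `id` forms its own group, `sum(id)` equals `id`, so the query should return 0 through 9 in ascending order.

```scala
// Plain-collections evaluation of:
//   select sum(id) as id from range(10) group by id order by sum(id)
val ids: Seq[Long] = 0L until 10L        // range(10) yields ids 0..9
val result: Seq[Long] =
  ids
    .groupBy(identity)                    // group by id: every group holds a single row
    .map { case (_, group) => group.sum } // sum(id) per group, exposed as `id`
    .toSeq
    .sorted                               // order by sum(id)
println(result.mkString(", "))            // 0, 1, 2, 3, 4, 5, 6, 7, 8, 9
```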