Kris Mok created SPARK-26741:
--------------------------------

             Summary: Analyzer incorrectly resolves aggregate function outside of Aggregate operators
                 Key: SPARK-26741
                 URL: https://issues.apache.org/jira/browse/SPARK-26741
             Project: Spark
          Issue Type: Bug
          Components: SQL
    Affects Versions: 3.0.0
            Reporter: Kris Mok
The analyzer can sometimes incorrectly resolve aggregate functions, e.g.
{code:sql}
select max(id) from range(10) group by id having count(1) >= 1 order by max(id)
{code}
The analyzed plan of this query is:
{code:none}
== Analyzed Logical Plan ==
max(id): bigint
Project [max(id)#91L]
+- Sort [max(id#88L) ASC NULLS FIRST], true
   +- Project [max(id)#91L, id#88L]
      +- Filter (count(1)#93L >= cast(1 as bigint))
         +- Aggregate [id#88L], [max(id#88L) AS max(id)#91L, count(1) AS count(1)#93L, id#88L]
            +- Range (0, 10, step=1, splits=None)
{code}
Note how an aggregate function appears outside of any {{Aggregate}} operator in the fully analyzed plan: {{Sort [max(id#88L) ASC NULLS FIRST], true}}. That makes the plan invalid. Trying to run this query fails during code generation, but the root cause is in the analyzer:
{code:none}
java.lang.UnsupportedOperationException: Cannot generate code for expression: max(input[1, bigint, false])
	at org.apache.spark.sql.catalyst.expressions.Unevaluable.doGenCode(Expression.scala:291)
	at org.apache.spark.sql.catalyst.expressions.Unevaluable.doGenCode$(Expression.scala:290)
	at org.apache.spark.sql.catalyst.expressions.aggregate.AggregateExpression.doGenCode(interfaces.scala:87)
	at org.apache.spark.sql.catalyst.expressions.Expression.$anonfun$genCode$3(Expression.scala:138)
	at scala.Option.getOrElse(Option.scala:138)
	at org.apache.spark.sql.catalyst.expressions.Expression.genCode(Expression.scala:133)
	at org.apache.spark.sql.catalyst.expressions.codegen.GenerateOrdering$.$anonfun$createOrderKeys$1(GenerateOrdering.scala:82)
	at scala.collection.TraversableLike.$anonfun$map$1(TraversableLike.scala:237)
	at scala.collection.mutable.ResizableArray.foreach(ResizableArray.scala:62)
	at scala.collection.mutable.ResizableArray.foreach$(ResizableArray.scala:55)
	at scala.collection.mutable.ArrayBuffer.foreach(ArrayBuffer.scala:49)
	at scala.collection.TraversableLike.map(TraversableLike.scala:237)
	at scala.collection.TraversableLike.map$(TraversableLike.scala:230)
	at scala.collection.AbstractTraversable.map(Traversable.scala:108)
	at org.apache.spark.sql.catalyst.expressions.codegen.GenerateOrdering$.createOrderKeys(GenerateOrdering.scala:82)
	at org.apache.spark.sql.catalyst.expressions.codegen.GenerateOrdering$.genComparisons(GenerateOrdering.scala:91)
	at org.apache.spark.sql.catalyst.expressions.codegen.GenerateOrdering$.create(GenerateOrdering.scala:152)
	at org.apache.spark.sql.catalyst.expressions.codegen.GenerateOrdering$.create(GenerateOrdering.scala:44)
	at org.apache.spark.sql.catalyst.expressions.codegen.CodeGenerator.generate(CodeGenerator.scala:1194)
	at org.apache.spark.sql.catalyst.expressions.codegen.LazilyGeneratedOrdering.<init>(GenerateOrdering.scala:195)
	at org.apache.spark.sql.catalyst.expressions.codegen.LazilyGeneratedOrdering.<init>(GenerateOrdering.scala:192)
	at org.apache.spark.sql.execution.TakeOrderedAndProjectExec.executeCollect(limit.scala:153)
	at org.apache.spark.sql.Dataset.collectFromPlan(Dataset.scala:3302)
	at org.apache.spark.sql.Dataset.$anonfun$head$1(Dataset.scala:2470)
	at org.apache.spark.sql.Dataset.$anonfun$withAction$1(Dataset.scala:3291)
	at org.apache.spark.sql.execution.SQLExecution$.$anonfun$withNewExecutionId$1(SQLExecution.scala:87)
	at org.apache.spark.sql.execution.SQLExecution$.withSQLConfPropagated(SQLExecution.scala:147)
	at org.apache.spark.sql.execution.SQLExecution$.withNewExecutionId(SQLExecution.scala:74)
	at org.apache.spark.sql.Dataset.withAction(Dataset.scala:3287)
	at org.apache.spark.sql.Dataset.head(Dataset.scala:2470)
	at org.apache.spark.sql.Dataset.take(Dataset.scala:2684)
	at org.apache.spark.sql.Dataset.getRows(Dataset.scala:262)
	at org.apache.spark.sql.Dataset.showString(Dataset.scala:299)
	at org.apache.spark.sql.Dataset.show(Dataset.scala:753)
	at org.apache.spark.sql.Dataset.show(Dataset.scala:712)
	at org.apache.spark.sql.Dataset.show(Dataset.scala:721)
{code}
The test case {{SPARK-23957 Remove redundant sort from subquery plan(scalar subquery)}} in {{SubquerySuite}} has been disabled because it hits this issue, caught by SPARK-26735. We should re-enable that test once this bug is fixed.



--
This message was sent by Atlassian JIRA
(v7.6.3#76005)

---------------------------------------------------------------------
To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org
For additional commands, e-mail: issues-h...@spark.apache.org
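For comparison (an illustrative rewrite added here, not part of the original report): giving the aggregate an explicit alias and sorting by that alias should sidestep the misresolution, since the {{Sort}} then references the {{Project}}'s output attribute instead of re-resolving {{max(id)}} as a fresh aggregate expression:
{code:sql}
-- Hypothetical workaround: order by the alias of the aggregate output,
-- so the Sort resolves against an existing output attribute.
select max(id) as max_id
from range(10)
group by id
having count(1) >= 1
order by max_id
{code}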