[ https://issues.apache.org/jira/browse/SPARK-26741?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16753748#comment-16753748 ]
Xiao Li commented on SPARK-26741:
---------------------------------

Could we re-enable the test by making the test case valid? That is, would it be enough to remove the aggregate function from the ORDER BY clause?

> Analyzer incorrectly resolves aggregate function outside of Aggregate operators
> -------------------------------------------------------------------------------
>
>                 Key: SPARK-26741
>                 URL: https://issues.apache.org/jira/browse/SPARK-26741
>             Project: Spark
>          Issue Type: Bug
>          Components: SQL
>    Affects Versions: 3.0.0
>            Reporter: Kris Mok
>            Priority: Major
>
> The analyzer can sometimes hit issues with resolving functions, e.g.:
> {code:sql}
> select max(id)
> from range(10)
> group by id
> having count(1) >= 1
> order by max(id)
> {code}
> The analyzed plan of this query is:
> {code:none}
> == Analyzed Logical Plan ==
> max(id): bigint
> Project [max(id)#91L]
> +- Sort [max(id#88L) ASC NULLS FIRST], true
>    +- Project [max(id)#91L, id#88L]
>       +- Filter (count(1)#93L >= cast(1 as bigint))
>          +- Aggregate [id#88L], [max(id#88L) AS max(id)#91L, count(1) AS count(1)#93L, id#88L]
>             +- Range (0, 10, step=1, splits=None)
> {code}
> Note how an aggregate function appears outside of any {{Aggregate}} operator in the fully analyzed plan:
> {{Sort [max(id#88L) ASC NULLS FIRST], true}}, which makes the plan invalid.
> Trying to run this query will lead to weird issues in codegen, but the root cause is in the analyzer:
> {code:none}
> java.lang.UnsupportedOperationException: Cannot generate code for expression: max(input[1, bigint, false])
>   at org.apache.spark.sql.catalyst.expressions.Unevaluable.doGenCode(Expression.scala:291)
>   at org.apache.spark.sql.catalyst.expressions.Unevaluable.doGenCode$(Expression.scala:290)
>   at org.apache.spark.sql.catalyst.expressions.aggregate.AggregateExpression.doGenCode(interfaces.scala:87)
>   at org.apache.spark.sql.catalyst.expressions.Expression.$anonfun$genCode$3(Expression.scala:138)
>   at scala.Option.getOrElse(Option.scala:138)
>   at org.apache.spark.sql.catalyst.expressions.Expression.genCode(Expression.scala:133)
>   at org.apache.spark.sql.catalyst.expressions.codegen.GenerateOrdering$.$anonfun$createOrderKeys$1(GenerateOrdering.scala:82)
>   at scala.collection.TraversableLike.$anonfun$map$1(TraversableLike.scala:237)
>   at scala.collection.mutable.ResizableArray.foreach(ResizableArray.scala:62)
>   at scala.collection.mutable.ResizableArray.foreach$(ResizableArray.scala:55)
>   at scala.collection.mutable.ArrayBuffer.foreach(ArrayBuffer.scala:49)
>   at scala.collection.TraversableLike.map(TraversableLike.scala:237)
>   at scala.collection.TraversableLike.map$(TraversableLike.scala:230)
>   at scala.collection.AbstractTraversable.map(Traversable.scala:108)
>   at org.apache.spark.sql.catalyst.expressions.codegen.GenerateOrdering$.createOrderKeys(GenerateOrdering.scala:82)
>   at org.apache.spark.sql.catalyst.expressions.codegen.GenerateOrdering$.genComparisons(GenerateOrdering.scala:91)
>   at org.apache.spark.sql.catalyst.expressions.codegen.GenerateOrdering$.create(GenerateOrdering.scala:152)
>   at org.apache.spark.sql.catalyst.expressions.codegen.GenerateOrdering$.create(GenerateOrdering.scala:44)
>   at org.apache.spark.sql.catalyst.expressions.codegen.CodeGenerator.generate(CodeGenerator.scala:1194)
>   at org.apache.spark.sql.catalyst.expressions.codegen.LazilyGeneratedOrdering.<init>(GenerateOrdering.scala:195)
>   at org.apache.spark.sql.catalyst.expressions.codegen.LazilyGeneratedOrdering.<init>(GenerateOrdering.scala:192)
>   at org.apache.spark.sql.execution.TakeOrderedAndProjectExec.executeCollect(limit.scala:153)
>   at org.apache.spark.sql.Dataset.collectFromPlan(Dataset.scala:3302)
>   at org.apache.spark.sql.Dataset.$anonfun$head$1(Dataset.scala:2470)
>   at org.apache.spark.sql.Dataset.$anonfun$withAction$1(Dataset.scala:3291)
>   at org.apache.spark.sql.execution.SQLExecution$.$anonfun$withNewExecutionId$1(SQLExecution.scala:87)
>   at org.apache.spark.sql.execution.SQLExecution$.withSQLConfPropagated(SQLExecution.scala:147)
>   at org.apache.spark.sql.execution.SQLExecution$.withNewExecutionId(SQLExecution.scala:74)
>   at org.apache.spark.sql.Dataset.withAction(Dataset.scala:3287)
>   at org.apache.spark.sql.Dataset.head(Dataset.scala:2470)
>   at org.apache.spark.sql.Dataset.take(Dataset.scala:2684)
>   at org.apache.spark.sql.Dataset.getRows(Dataset.scala:262)
>   at org.apache.spark.sql.Dataset.showString(Dataset.scala:299)
>   at org.apache.spark.sql.Dataset.show(Dataset.scala:753)
>   at org.apache.spark.sql.Dataset.show(Dataset.scala:712)
>   at org.apache.spark.sql.Dataset.show(Dataset.scala:721)
> {code}
> The test case {{SPARK-23957 Remove redundant sort from subquery plan(scalar subquery)}} in {{SubquerySuite}} has been disabled because of hitting this issue, caught by SPARK-26735. We should re-enable that test once this bug is fixed.

--
This message was sent by Atlassian JIRA
(v7.6.3#76005)
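One concrete form of the rewrite suggested in the comment above might be to sort by a select-list alias instead of repeating the aggregate; the alias {{max_id}} here is illustrative, not taken from the original test:

{code:sql}
-- Hypothetical rewrite of the failing query: ORDER BY references the
-- select-list alias, so the Sort resolves against the Aggregate's output
-- attribute instead of a bare aggregate expression.
select max(id) as max_id
from range(10)
group by id
having count(1) >= 1
order by max_id
{code}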