[ https://issues.apache.org/jira/browse/SPARK-26084?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]
Herman van Hovell resolved SPARK-26084.
---------------------------------------
    Resolution: Fixed
      Assignee: Simeon Simeonov
 Fix Version/s: 3.0.0
                2.4.1
                2.3.3

> AggregateExpression.references fails on unresolved expression trees
> -------------------------------------------------------------------
>
>                 Key: SPARK-26084
>                 URL: https://issues.apache.org/jira/browse/SPARK-26084
>             Project: Spark
>          Issue Type: Bug
>          Components: SQL
>    Affects Versions: 2.3.1
>            Reporter: Simeon Simeonov
>            Assignee: Simeon Simeonov
>            Priority: Major
>              Labels: aggregate, regression, sql
>             Fix For: 2.3.3, 2.4.1, 3.0.0
>
>
> [SPARK-18394|https://issues.apache.org/jira/browse/SPARK-18394] introduced a
> stable ordering in {{AttributeSet.toSeq}} using expression IDs
> ([PR-18959|https://github.com/apache/spark/pull/18959/files#diff-75576f0ec7f9d8b5032000245217d233R128])
> without noticing that {{AggregateExpression.references}} used
> {{AttributeSet.toSeq}} as a shortcut
> ([link|https://github.com/apache/spark/blob/5264164a67df498b73facae207eda12ee133be7d/sql/catalyst/src/main/scala/org/apache/spark/sql/catalyst/expressions/aggregate/interfaces.scala#L132]).
> The net result is that {{AggregateExpression.references}} fails for
> unresolved aggregate functions.
> {code:scala}
> org.apache.spark.sql.catalyst.expressions.aggregate.AggregateExpression(
>   org.apache.spark.sql.catalyst.expressions.aggregate.Sum(('x + 'y).expr),
>   mode = org.apache.spark.sql.catalyst.expressions.aggregate.Complete,
>   isDistinct = false
> ).references
> {code}
> fails with
> {code:scala}
> org.apache.spark.sql.catalyst.analysis.UnresolvedException: Invalid call to exprId on unresolved object, tree: 'y
>   at org.apache.spark.sql.catalyst.analysis.UnresolvedAttribute.exprId(unresolved.scala:104)
>   at org.apache.spark.sql.catalyst.expressions.AttributeSet$$anonfun$toSeq$2.apply(AttributeSet.scala:128)
>   at org.apache.spark.sql.catalyst.expressions.AttributeSet$$anonfun$toSeq$2.apply(AttributeSet.scala:128)
>   at scala.math.Ordering$$anon$5.compare(Ordering.scala:122)
>   at java.util.TimSort.countRunAndMakeAscending(TimSort.java:355)
>   at java.util.TimSort.sort(TimSort.java:220)
>   at java.util.Arrays.sort(Arrays.java:1438)
>   at scala.collection.SeqLike$class.sorted(SeqLike.scala:648)
>   at scala.collection.AbstractSeq.sorted(Seq.scala:41)
>   at scala.collection.SeqLike$class.sortBy(SeqLike.scala:623)
>   at scala.collection.AbstractSeq.sortBy(Seq.scala:41)
>   at org.apache.spark.sql.catalyst.expressions.AttributeSet.toSeq(AttributeSet.scala:128)
>   at org.apache.spark.sql.catalyst.expressions.aggregate.AggregateExpression.references(interfaces.scala:201)
> {code}
> The solution is to avoid calling {{toSeq}} as ordering is not important in
> {{references}} and simplify (and speed up) the implementation to something
> like
> {code:scala}
> mode match {
>   case Partial | Complete => aggregateFunction.references
>   case PartialMerge | Final =>
>     AttributeSet(aggregateFunction.aggBufferAttributes)
> }
> {code}
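The failure mode described in this issue can be reproduced without Spark on the classpath. The following is only a minimal sketch under stated assumptions: `Attr`, `Resolved`, `Unresolved`, `AttrSet`, `AggFn` and both `references*` helpers are hypothetical stand-ins invented for illustration, not Catalyst's real classes, and `referencesViaToSeq` is a guess at the general shape implied by the stack trace rather than the actual pre-fix code. It shows why sorting by expression ID breaks on unresolved attributes, and why building the set directly does not.

```scala
import scala.util.Try

// Hypothetical stand-ins for Catalyst's attribute classes (not Spark's API).
sealed trait Attr {
  def name: String
  def exprId: Long
}

// A resolved attribute carries an expression ID.
final case class Resolved(name: String, exprId: Long) extends Attr

// An unresolved attribute has no ID yet, so asking for one throws.
final case class Unresolved(name: String) extends Attr {
  def exprId: Long =
    throw new UnsupportedOperationException(
      s"Invalid call to exprId on unresolved object, tree: '$name")
}

// Toy attribute set: membership is unordered, but toSeq sorts by exprId
// to give a stable order, which forces exprId to be evaluated.
final case class AttrSet(attrs: Set[Attr]) {
  def toSeq: Seq[Attr] = attrs.toSeq.sortBy(_.exprId)
}

sealed trait Mode
case object Partial extends Mode
case object Complete extends Mode
case object PartialMerge extends Mode
case object Final extends Mode

// Toy aggregate function: input attributes plus aggregation buffer attributes.
final case class AggFn(children: Seq[Attr], bufferAttrs: Seq[Attr]) {
  def references: AttrSet = AttrSet(children.toSet)
}

object ReferencesSketch {
  // Assumed shape of the failing path: a round trip through the sorted toSeq.
  def referencesViaToSeq(fn: AggFn, mode: Mode): AttrSet = {
    val childRefs: Seq[Attr] = mode match {
      case Partial | Complete   => fn.references.toSeq
      case PartialMerge | Final => fn.bufferAttrs
    }
    AttrSet(childRefs.toSet)
  }

  // Shape of the proposed fix: never sort, so exprId is never touched.
  def referencesDirect(fn: AggFn, mode: Mode): AttrSet = mode match {
    case Partial | Complete   => fn.references
    case PartialMerge | Final => AttrSet(fn.bufferAttrs.toSet)
  }

  def main(args: Array[String]): Unit = {
    // Mirrors Sum('x + 'y) from the report: two unresolved inputs.
    val sum = AggFn(Seq(Unresolved("x"), Unresolved("y")), Seq(Resolved("sum", 1L)))
    println(Try(referencesViaToSeq(sum, Complete)))
    println(referencesDirect(sum, Complete))
  }
}
```

Two unresolved attributes are used on purpose: sorting only calls `exprId` when it has to compare elements, so a single-attribute set would not trigger the exception in this model.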