[ https://issues.apache.org/jira/browse/SPARK-34946?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]
Wenchen Fan reopened SPARK-34946: --------------------------------- > Block unsupported correlated scalar subquery in Aggregate > --------------------------------------------------------- > > Key: SPARK-34946 > URL: https://issues.apache.org/jira/browse/SPARK-34946 > Project: Spark > Issue Type: Improvement > Components: SQL > Affects Versions: 3.2.0 > Reporter: Allison Wang > Priority: Major > > Currently, Spark supports Aggregate to host correlated scalar subqueries, but > in some cases, those subqueries cannot be rewritten properly in the > `RewriteCorrelatedScalarSubquery` rule. The error messages are also > confusing. Hence we should block these cases in CheckAnalysis. > > Case 1: correlated scalar subquery in the grouping expressions but not in > aggregate expressions > {code:java} > SELECT SUM(c2) FROM t t1 GROUP BY (SELECT SUM(c2) FROM t t2 WHERE t1.c1 = > t2.c1) > {code} > We get this error: > {code:java} > java.lang.AssertionError: assertion failed: Expects 1 field, but got 2; > something went wrong in analysis > {code} > because the correlated scalar subquery is not rewritten properly: > {code:java} > == Optimized Logical Plan == > Aggregate [scalar-subquery#5 [(c1#6 = c1#6#93)]], [sum(c2#7) AS sum(c2)#11L] > : +- Aggregate [c1#6], [sum(c2#7) AS sum(c2)#15L, c1#6 AS c1#6#93] > : +- LocalRelation [c1#6, c2#7] > +- LocalRelation [c1#6, c2#7] > {code} > > Case 2: correlated scalar subquery in the aggregate expressions but not in > the grouping expressions > {code:java} > SELECT (SELECT SUM(c2) FROM t t2 WHERE t1.c1 = t2.c1), SUM(c2) FROM t t1 > GROUP BY c1 > {code} > We get this error: > {code:java} > java.lang.IllegalStateException: Couldn't find sum(c2)#69L in > [c1#60,sum(c2#61)#64L] > {code} > because the transformed correlated scalar subquery output is not present in > the grouping expression of the Aggregate: > {code:java} > == Optimized Logical Plan == > Aggregate [c1#60], [sum(c2)#69L AS scalarsubquery(c1)#70L, sum(c2#61) AS > sum(c2)#65L] > +- Project [c1#60, c2#61, sum(c2)#69L] > +- Join LeftOuter, (c1#60 = c1#60#95) > :- LocalRelation [c1#60, c2#61] > +- Aggregate [c1#60], [sum(c2#61) AS sum(c2)#69L, c1#60 AS c1#60#95] > +- LocalRelation [c1#60, c2#61] > {code} > > -- This message was sent by Atlassian Jira (v8.3.4#803005) --------------------------------------------------------------------- To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org For additional commands, e-mail: issues-h...@spark.apache.org