[ https://issues.apache.org/jira/browse/SPARK-36656?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17409174#comment-17409174 ]
Apache Spark commented on SPARK-36656:
--------------------------------------

User 'allisonwang-db' has created a pull request for this issue:
https://github.com/apache/spark/pull/33903

> CollapseProject should not collapse correlated scalar subqueries
> ----------------------------------------------------------------
>
>                 Key: SPARK-36656
>                 URL: https://issues.apache.org/jira/browse/SPARK-36656
>             Project: Spark
>          Issue Type: Sub-task
>          Components: SQL
>    Affects Versions: 3.3.0
>            Reporter: Allison Wang
>            Priority: Major
>
> Currently, the optimizer rule `CollapseProject` inlines expressions generated
> from correlated scalar subqueries, which can create unnecessary left outer
> joins.
> {code:sql}
> select c1, s, s * 10 from (
>   select c1, (select first(c2) from t2 where t1.c1 = t2.c1) s from t1)
> {code}
> {code:scala}
> // Before
> Project [c1, s, (s * 10)]
> +- Project [c1, scalar-subquery [c1] AS s]
>    :  +- Aggregate [c1], [first(c2), c1]
>    :     +- LocalRelation [c1, c2]
>    +- LocalRelation [c1, c2]
> // After (scalar subqueries are inlined)
> Project [c1, scalar-subquery [c1], (scalar-subquery [c1] * 10)]
> :  +- Aggregate [c1], [first(c2), c1]
> :     +- LocalRelation [c1, c2]
> :  +- Aggregate [c1], [first(c2), c1]
> :     +- LocalRelation [c1, c2]
> +- LocalRelation [c1, c2]
> {code}
> Then this query will have two LeftOuter joins created. We should only
> collapse projects after correlated subqueries are rewritten as joins.
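
The duplication described in the issue can be illustrated with a small toy model. This is only a sketch and not Spark's implementation: expressions are plain tuples, and `collapse_projects`, `substitute`, `count_subqueries`, `total_subqueries` and the `guard` flag are all hypothetical names made up for this example. It assumes, as the issue states, that each scalar subquery left in the plan later becomes one left outer join.

```python
# Toy expressions (hypothetical encoding, not Catalyst classes):
#   ("attr", name)          reference to a column
#   ("subquery", label)     a correlated scalar subquery, treated as opaque
#   ("mul", expr, literal)  expr * literal

def substitute(expr, aliases):
    """Replace column references with the expressions they alias."""
    if expr[0] == "attr":
        return aliases.get(expr[1], expr)
    if expr[0] == "mul":
        return ("mul", substitute(expr[1], aliases), expr[2])
    return expr  # subqueries are left untouched

def count_subqueries(expr):
    """Count scalar subqueries inside a single expression."""
    if expr[0] == "subquery":
        return 1
    if expr[0] == "mul":
        return count_subqueries(expr[1])
    return 0

def collapse_projects(upper, lower, guard):
    """Merge two adjacent project lists; with guard=True, skip the merge
    when the lower project produces a scalar subquery."""
    if guard and any(count_subqueries(e) for _, e in lower):
        return [upper, lower]
    aliases = dict(lower)
    return [[(name, substitute(e, aliases)) for name, e in upper]]

def total_subqueries(plan):
    """Total scalar subqueries across all project lists in the plan."""
    return sum(count_subqueries(e) for project in plan for _, e in project)

# select c1, s, s * 10 from (select c1, (<scalar subquery>) s from t1)
lower = [("c1", ("attr", "c1")), ("s", ("subquery", "first(c2)"))]
upper = [("c1", ("attr", "c1")),
         ("s", ("attr", "s")),
         ("s10", ("mul", ("attr", "s"), 10))]

inlined = collapse_projects(upper, lower, guard=False)
guarded = collapse_projects(upper, lower, guard=True)

print(total_subqueries(inlined))  # 2
print(total_subqueries(guarded))  # 1
```

Without the guard, the alias `s` is substituted into both `s` and `s * 10`, so the subquery appears twice in the merged project list. With the guard, the two projects stay separate and the subquery appears once.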