[ https://issues.apache.org/jira/browse/SPARK-36656?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]
Allison Wang updated SPARK-36656:
---------------------------------
    Description:
Currently, the optimizer rule `CollapseProject` inlines expressions generated from correlated scalar subqueries, which can create unnecessary left outer joins.
{code:sql}
select c1, s, s * 10 from (
  select c1, (select first(c2) from t2 where t1.c1 = t2.c1) s from t1)
{code}
{code:scala}
// Before
Project [c1, s, (s * 10)]
+- Project [c1, scalar-subquery [c1] AS s]
   :  +- Aggregate [c1], [first(c2), c1]
   :     +- LocalRelation [c1, c2]
   +- LocalRelation [c1, c2]

// After (scalar subqueries are inlined)
Project [c1, scalar-subquery [c1], (scalar-subquery [c1] * 10)]
:  +- Aggregate [c1], [first(c2), c1]
:     +- LocalRelation [c1, c2]
:  +- Aggregate [c1], [first(c2), c1]
:     +- LocalRelation [c1, c2]
+- LocalRelation [c1, c2]
{code}
Because `s` is referenced twice in the outer project, the collapsed plan contains two copies of the scalar subquery, so this query ends up with two LeftOuter joins instead of one. We should only collapse projects after correlated subqueries are rewritten as joins.

  was:
Currently, the optimizer rule `CollapseProject` inlines expressions generated from correlated scalar subqueries, which can create unnecessary left outer joins.
{code:scala}
// Before
Project [c1, s, (s * 10)]
+- Project [c1, scalar-subquery [c1] AS s]
   :  +- Aggregate [c1], [first(c2), c1]
   :     +- LocalRelation [c1, c2]
   +- LocalRelation [c1, c2]

// After (scalar subqueries are inlined)
Project [c1, scalar-subquery [c1], (scalar-subquery [c1] * 10)]
:  +- Aggregate [c1], [first(c2), c1]
:     +- LocalRelation [c1, c2]
:  +- Aggregate [c1], [first(c2), c1]
:     +- LocalRelation [c1, c2]
+- LocalRelation [c1, c2]
{code}
Then this query will have two LeftOuter joins created. We should only collapse projects after correlated subqueries are rewritten as joins.

> CollapseProject should not collapse correlated scalar subqueries
> ----------------------------------------------------------------
>
>                 Key: SPARK-36656
>                 URL: https://issues.apache.org/jira/browse/SPARK-36656
>             Project: Spark
>          Issue Type: Sub-task
>          Components: SQL
>    Affects Versions: 3.3.0
>            Reporter: Allison Wang
>            Priority: Major
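
The duplication described in this issue can be reproduced with a small toy model. The sketch below is an illustration only: `Expr`, `Attr`, `ScalarSubquery`, `collapse`, and `canCollapse` are hypothetical names and are not Spark's Catalyst classes or the actual patch. It models the two project lists from the example, inlines the lower one into the upper one, and counts how many scalar subqueries result. The `canCollapse` guard is one assumed way to avoid the problem; it is more conservative than strictly necessary, since the duplication only arises when the alias is referenced more than once.

```scala
// Toy model only: these types are hypothetical, NOT Spark's Catalyst classes.
sealed trait Expr
case class Attr(name: String) extends Expr
case class Lit(value: Int) extends Expr
case class Mul(left: Expr, right: Expr) extends Expr
case class Alias(child: Expr, name: String) extends Expr
// Stands in for a correlated scalar subquery; each copy later becomes one join.
case class ScalarSubquery(id: Int) extends Expr

object CollapseSketch {
  // Replace each attribute reference with the expression its alias stands for.
  def inline(e: Expr, aliases: Map[String, Expr]): Expr = e match {
    case Attr(n)     => aliases.getOrElse(n, e)
    case Mul(l, r)   => Mul(inline(l, aliases), inline(r, aliases))
    case Alias(c, n) => Alias(inline(c, aliases), n)
    case other       => other
  }

  // Count scalar subquery occurrences inside an expression tree.
  def countSubqueries(e: Expr): Int = e match {
    case ScalarSubquery(_) => 1
    case Mul(l, r)         => countSubqueries(l) + countSubqueries(r)
    case Alias(c, _)       => countSubqueries(c)
    case _                 => 0
  }

  // Naive collapse: always inline the lower project list into the upper one.
  def collapse(upper: Seq[Expr], lower: Seq[Expr]): Seq[Expr] = {
    val aliases = lower.collect { case Alias(c, n) => n -> c }.toMap
    upper.map(inline(_, aliases))
  }

  // Assumed guard: skip collapsing while the lower project still holds a subquery.
  def canCollapse(lower: Seq[Expr]): Boolean =
    lower.forall(countSubqueries(_) == 0)

  def main(args: Array[String]): Unit = {
    // select c1, s, s * 10 from (select c1, (<subquery>) s from t1)
    val lower = Seq(Attr("c1"), Alias(ScalarSubquery(1), "s"))
    val upper = Seq(Attr("c1"), Attr("s"), Mul(Attr("s"), Lit(10)))

    println(s"subqueries before collapse: ${lower.map(countSubqueries).sum}")
    println(s"subqueries after collapse:  ${collapse(upper, lower).map(countSubqueries).sum}")
    println(s"guard allows collapse:      ${canCollapse(lower)}")
  }
}
```

In this model the lower project holds one subquery, while the naively collapsed list holds two, which corresponds to the two LeftOuter joins in the "After" plan. Deferring the collapse until the subquery has been rewritten as a join keeps a single copy.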