[jira] [Assigned] (SPARK-36656) CollapseProject should not collapse correlated scalar subqueries

2021-09-24 Thread Wenchen Fan (Jira)


 [ 
https://issues.apache.org/jira/browse/SPARK-36656?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Wenchen Fan reassigned SPARK-36656:
---

Assignee: Allison Wang

> CollapseProject should not collapse correlated scalar subqueries
> 
>
>                 Key: SPARK-36656
>                 URL: https://issues.apache.org/jira/browse/SPARK-36656
>             Project: Spark
>          Issue Type: Sub-task
>          Components: SQL
>    Affects Versions: 3.3.0
>            Reporter: Allison Wang
>            Assignee: Allison Wang
>            Priority: Major
>
> Currently, the optimizer rule `CollapseProject` inlines expressions generated 
> from correlated scalar subqueries, which can create unnecessary left outer 
> joins.
> {code:sql}
> select c1, s, s * 10 from (
> select c1, (select first(c2) from t2 where t1.c1 = t2.c1) s from t1)
> {code}
> {code:scala}
> // Before
> Project [c1, s, (s * 10)]
> +- Project [c1, scalar-subquery [c1] AS s]
>    :  +- Aggregate [c1], [first(c2), c1]
>    :     +- LocalRelation [c1, c2]
>    +- LocalRelation [c1, c2]
> // After (scalar subqueries are inlined)
> Project [c1, scalar-subquery [c1], (scalar-subquery [c1] * 10)]
> :  +- Aggregate [c1], [first(c2), c1]
> :     +- LocalRelation [c1, c2]
> :  +- Aggregate [c1], [first(c2), c1]
> :     +- LocalRelation [c1, c2]
> +- LocalRelation [c1, c2]
> {code}
> Once the subqueries are rewritten as joins, this query ends up with two 
> LeftOuter joins instead of one. We should only collapse projects after 
> correlated subqueries are rewritten as joins.
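
The effect described above can be reproduced with a small toy model. The sketch below is not Catalyst code: the tuple-based expression shapes, the `collapse` function, and its `guard` flag are all hypothetical names chosen for illustration. It only shows why substituting an alias that wraps a scalar subquery into an upper projection that references it twice duplicates the subquery, and how a simple guard (one possible approach, not necessarily what the actual fix does) avoids that.

```python
# Toy model of collapsing two adjacent Project nodes.
# Expressions are nested tuples (hypothetical shapes, not Catalyst classes):
#   ("attr", name), ("lit", value), ("mul", left, right), ("subquery", text)

def substitute(expr, aliases):
    """Replace attribute references with the expressions they alias."""
    if expr[0] == "attr":
        return aliases.get(expr[1], expr)
    if expr[0] == "mul":
        return ("mul", substitute(expr[1], aliases), substitute(expr[2], aliases))
    return expr  # literals and subqueries are returned unchanged


def count_subqueries(expr):
    """Count scalar-subquery nodes inside one expression."""
    if expr[0] == "subquery":
        return 1
    if expr[0] == "mul":
        return count_subqueries(expr[1]) + count_subqueries(expr[2])
    return 0


def collapse(upper, lower, guard):
    """Inline the lower project's aliases into the upper project list.

    upper: list of expressions in the outer Project
    lower: dict mapping output name -> expression in the inner Project
    guard: if True, refuse to collapse when the inner Project contains a
           scalar subquery (an assumed, simplified condition)
    Returns the collapsed project list, or None if the projects are kept apart.
    """
    if guard and any(count_subqueries(e) > 0 for e in lower.values()):
        return None
    return [substitute(e, lower) for e in upper]


# select c1, s, s * 10 from (select c1, (<scalar subquery>) s from t1)
lower = {
    "c1": ("attr", "c1"),
    "s": ("subquery", "select first(c2) from t2 where t1.c1 = t2.c1"),
}
upper = [("attr", "c1"), ("attr", "s"), ("mul", ("attr", "s"), ("lit", 10))]

inlined = collapse(upper, lower, guard=False)
total_after_inlining = sum(count_subqueries(e) for e in inlined)
guarded = collapse(upper, lower, guard=True)

print(total_after_inlining)  # 2: the subquery now appears twice
print(guarded)               # None: the two projects stay separate
```

In this model each subquery occurrence stands for one left outer join after the rewrite, so the unguarded collapse yields two joins where the uncollapsed plan needs only one.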



--
This message was sent by Atlassian Jira
(v8.3.4#803005)

-
To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org
For additional commands, e-mail: issues-h...@spark.apache.org



[jira] [Assigned] (SPARK-36656) CollapseProject should not collapse correlated scalar subqueries

2021-09-02 Thread Apache Spark (Jira)


 [ 
https://issues.apache.org/jira/browse/SPARK-36656?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Apache Spark reassigned SPARK-36656:


Assignee: Apache Spark

> CollapseProject should not collapse correlated scalar subqueries
> 
>
>                 Key: SPARK-36656
>                 URL: https://issues.apache.org/jira/browse/SPARK-36656
>             Project: Spark
>          Issue Type: Sub-task
>          Components: SQL
>    Affects Versions: 3.3.0
>            Reporter: Allison Wang
>            Assignee: Apache Spark
>            Priority: Major






[jira] [Assigned] (SPARK-36656) CollapseProject should not collapse correlated scalar subqueries

2021-09-02 Thread Apache Spark (Jira)


 [ 
https://issues.apache.org/jira/browse/SPARK-36656?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Apache Spark reassigned SPARK-36656:


Assignee: (was: Apache Spark)

> CollapseProject should not collapse correlated scalar subqueries
> 
>
>                 Key: SPARK-36656
>                 URL: https://issues.apache.org/jira/browse/SPARK-36656
>             Project: Spark
>          Issue Type: Sub-task
>          Components: SQL
>    Affects Versions: 3.3.0
>            Reporter: Allison Wang
>            Priority: Major


