[ https://issues.apache.org/jira/browse/SPARK-28940?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17141344#comment-17141344 ]
Apache Spark commented on SPARK-28940:
--------------------------------------

User 'peter-toth' has created a pull request for this issue:
https://github.com/apache/spark/pull/28885

> Subquery reuse across all subquery levels
> -----------------------------------------
>
>                 Key: SPARK-28940
>                 URL: https://issues.apache.org/jira/browse/SPARK-28940
>             Project: Spark
>          Issue Type: Improvement
>          Components: SQL
>    Affects Versions: 3.1.0
>            Reporter: Peter Toth
>            Priority: Major
>
> Currently subquery reuse doesn't work across all subquery levels.
> Here is an example query:
> {noformat}
> SELECT (SELECT avg(key) FROM testData), (SELECT (SELECT avg(key) FROM testData))
> FROM testData
> LIMIT 1
> {noformat}
> where the plan now is:
> {noformat}
> CollectLimit 1
> +- *(1) Project [Subquery scalar-subquery#268, [id=#231] AS scalarsubquery()#276, Subquery scalar-subquery#270, [id=#266] AS scalarsubquery()#277]
>    :  :- Subquery scalar-subquery#268, [id=#231]
>    :  :  +- *(2) HashAggregate(keys=[], functions=[avg(cast(key#13 as bigint))], output=[avg(key)#272])
>    :  :     +- Exchange SinglePartition, true, [id=#227]
>    :  :        +- *(1) HashAggregate(keys=[], functions=[partial_avg(cast(key#13 as bigint))], output=[sum#282, count#283L])
>    :  :           +- *(1) SerializeFromObject [knownnotnull(assertnotnull(input[0, org.apache.spark.sql.test.SQLTestData$TestData, true])).key AS key#13]
>    :  :              +- Scan[obj#12]
>    :  +- Subquery scalar-subquery#270, [id=#266]
>    :     +- *(1) Project [Subquery scalar-subquery#269, [id=#263] AS scalarsubquery()#275]
>    :        :  +- Subquery scalar-subquery#269, [id=#263]
>    :        :     +- *(2) HashAggregate(keys=[], functions=[avg(cast(key#13 as bigint))], output=[avg(key)#274])
>    :        :        +- Exchange SinglePartition, true, [id=#259]
>    :        :           +- *(1) HashAggregate(keys=[], functions=[partial_avg(cast(key#13 as bigint))], output=[sum#286, count#287L])
>    :        :              +- *(1) SerializeFromObject [knownnotnull(assertnotnull(input[0, org.apache.spark.sql.test.SQLTestData$TestData, true])).key AS key#13]
>    :        :                 +- Scan[obj#12]
>    :        +- *(1) Scan OneRowRelation[]
>    +- *(1) SerializeFromObject
>       +- Scan[obj#12]
> {noformat}
> but it could be:
> {noformat}
> CollectLimit 1
> +- *(1) Project [ReusedSubquery Subquery scalar-subquery#241, [id=#148] AS scalarsubquery()#248, Subquery scalar-subquery#242, [id=#164] AS scalarsubquery()#249]
>    :  :- ReusedSubquery Subquery scalar-subquery#241, [id=#148]
>    :  +- Subquery scalar-subquery#242, [id=#164]
>    :     +- *(1) Project [Subquery scalar-subquery#241, [id=#148] AS scalarsubquery()#247]
>    :        :  +- Subquery scalar-subquery#241, [id=#148]
>    :        :     +- *(2) HashAggregate(keys=[], functions=[avg(cast(key#13 as bigint))], output=[avg(key)#246])
>    :        :        +- Exchange SinglePartition, true, [id=#144]
>    :        :           +- *(1) HashAggregate(keys=[], functions=[partial_avg(cast(key#13 as bigint))], output=[sum#258, count#259L])
>    :        :              +- *(1) SerializeFromObject [knownnotnull(assertnotnull(input[0, org.apache.spark.sql.test.SQLTestData$TestData, true])).key AS key#13]
>    :        :                 +- Scan[obj#12]
>    :        +- *(1) Scan OneRowRelation[]
>    +- *(1) SerializeFromObject
>       +- Scan[obj#12]
> {noformat}
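
For anyone who wants to look at these plans outside Spark's test suite, here is a minimal Scala sketch (runnable in spark-shell or as a standalone app). It assumes a local SparkSession; the ad-hoc testData view is only a stand-in for the internal org.apache.spark.sql.test.SQLTestData table used in the description, so the expression IDs and the SerializeFromObject/Scan[obj] leaves will differ, but the subquery structure is the same.

{noformat}
// Minimal sketch: run the query from the description and print its physical plan.
import org.apache.spark.sql.SparkSession

val spark = SparkSession.builder()
  .master("local[*]")
  .appName("SPARK-28940-repro")
  .getOrCreate()
import spark.implicits._

// Stand-in for the internal SQLTestData (key, value) table referenced in the issue.
Seq((1, "a"), (2, "b"), (3, "c")).toDF("key", "value").createOrReplaceTempView("testData")

val df = spark.sql(
  """SELECT (SELECT avg(key) FROM testData),
    |       (SELECT (SELECT avg(key) FROM testData))
    |FROM testData
    |LIMIT 1""".stripMargin)

// With cross-level subquery reuse, the nested scalar subquery would show up as
// ReusedSubquery in this output instead of being planned and executed a second time.
df.explain()
{noformat}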