[ https://issues.apache.org/jira/browse/SPARK-28940?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17141344#comment-17141344 ]
Apache Spark commented on SPARK-28940:
--------------------------------------

User 'peter-toth' has created a pull request for this issue:
https://github.com/apache/spark/pull/28885

> Subquery reuse across all subquery levels
> -----------------------------------------
>
>                 Key: SPARK-28940
>                 URL: https://issues.apache.org/jira/browse/SPARK-28940
>             Project: Spark
>          Issue Type: Improvement
>          Components: SQL
>    Affects Versions: 3.1.0
>            Reporter: Peter Toth
>            Priority: Major
>
> Currently subquery reuse doesn't work across all subquery levels.
> Here is an example query:
> {noformat}
> SELECT (SELECT avg(key) FROM testData), (SELECT (SELECT avg(key) FROM testData))
> FROM testData
> LIMIT 1
> {noformat}
> where the plan now is:
> {noformat}
> CollectLimit 1
> +- *(1) Project [Subquery scalar-subquery#268, [id=#231] AS scalarsubquery()#276, Subquery scalar-subquery#270, [id=#266] AS scalarsubquery()#277]
>    :  :- Subquery scalar-subquery#268, [id=#231]
>    :  :  +- *(2) HashAggregate(keys=[], functions=[avg(cast(key#13 as bigint))], output=[avg(key)#272])
>    :  :     +- Exchange SinglePartition, true, [id=#227]
>    :  :        +- *(1) HashAggregate(keys=[], functions=[partial_avg(cast(key#13 as bigint))], output=[sum#282, count#283L])
>    :  :           +- *(1) SerializeFromObject [knownnotnull(assertnotnull(input[0, org.apache.spark.sql.test.SQLTestData$TestData, true])).key AS key#13]
>    :  :              +- Scan[obj#12]
>    :  +- Subquery scalar-subquery#270, [id=#266]
>    :     +- *(1) Project [Subquery scalar-subquery#269, [id=#263] AS scalarsubquery()#275]
>    :        :  +- Subquery scalar-subquery#269, [id=#263]
>    :        :     +- *(2) HashAggregate(keys=[], functions=[avg(cast(key#13 as bigint))], output=[avg(key)#274])
>    :        :        +- Exchange SinglePartition, true, [id=#259]
>    :        :           +- *(1) HashAggregate(keys=[], functions=[partial_avg(cast(key#13 as bigint))], output=[sum#286, count#287L])
>    :        :              +- *(1) SerializeFromObject [knownnotnull(assertnotnull(input[0, org.apache.spark.sql.test.SQLTestData$TestData, true])).key AS key#13]
>    :        :                 +- Scan[obj#12]
>    :        +- *(1) Scan OneRowRelation[]
>    +- *(1) SerializeFromObject
>       +- Scan[obj#12]
> {noformat}
> but it could be:
> {noformat}
> CollectLimit 1
> +- *(1) Project [ReusedSubquery Subquery scalar-subquery#241, [id=#148] AS scalarsubquery()#248, Subquery scalar-subquery#242, [id=#164] AS scalarsubquery()#249]
>    :  :- ReusedSubquery Subquery scalar-subquery#241, [id=#148]
>    :  +- Subquery scalar-subquery#242, [id=#164]
>    :     +- *(1) Project [Subquery scalar-subquery#241, [id=#148] AS scalarsubquery()#247]
>    :        :  +- Subquery scalar-subquery#241, [id=#148]
>    :        :     +- *(2) HashAggregate(keys=[], functions=[avg(cast(key#13 as bigint))], output=[avg(key)#246])
>    :        :        +- Exchange SinglePartition, true, [id=#144]
>    :        :           +- *(1) HashAggregate(keys=[], functions=[partial_avg(cast(key#13 as bigint))], output=[sum#258, count#259L])
>    :        :              +- *(1) SerializeFromObject [knownnotnull(assertnotnull(input[0, org.apache.spark.sql.test.SQLTestData$TestData, true])).key AS key#13]
>    :        :                 +- Scan[obj#12]
>    :        +- *(1) Scan OneRowRelation[]
>    +- *(1) SerializeFromObject
>       +- Scan[obj#12]
> {noformat}
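
For anyone who wants to look at these plans outside Spark's test suite, here is a minimal Scala sketch (runnable in spark-shell or as a standalone app). It assumes a local SparkSession; the ad-hoc testData view is only a stand-in for the internal org.apache.spark.sql.test.SQLTestData table used in the description, so the expression IDs and the SerializeFromObject/Scan[obj] leaves will differ, but the subquery structure is the same.

{noformat}
// Minimal sketch: run the query from the description and print its physical plan.
import org.apache.spark.sql.SparkSession

val spark = SparkSession.builder()
  .master("local[*]")
  .appName("SPARK-28940-repro")
  .getOrCreate()
import spark.implicits._

// Stand-in for the internal SQLTestData (key, value) table referenced in the issue.
Seq((1, "a"), (2, "b"), (3, "c")).toDF("key", "value").createOrReplaceTempView("testData")

val df = spark.sql(
  """SELECT (SELECT avg(key) FROM testData),
    |       (SELECT (SELECT avg(key) FROM testData))
    |FROM testData
    |LIMIT 1""".stripMargin)

// With cross-level subquery reuse, the nested scalar subquery would show up as
// ReusedSubquery in this output instead of being planned and executed a second time.
df.explain()
{noformat}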