Peter Toth created SPARK-28940:
----------------------------------

             Summary: Subquery reuse across all subquery levels
                 Key: SPARK-28940
                 URL: https://issues.apache.org/jira/browse/SPARK-28940
             Project: Spark
          Issue Type: Improvement
          Components: SQL
    Affects Versions: 3.0.0
            Reporter: Peter Toth

Currently subquery reuse doesn't work across all subquery levels.

Here is an example query:
{noformat}
SELECT (SELECT avg(key) FROM testData), (SELECT (SELECT avg(key) FROM testData))
FROM testData
LIMIT 1
{noformat}

where the plan now is:
{noformat}
CollectLimit 1
+- *(1) Project [Subquery scalar-subquery#268, [id=#231] AS scalarsubquery()#276, Subquery scalar-subquery#270, [id=#266] AS scalarsubquery()#277]
   :  :- Subquery scalar-subquery#268, [id=#231]
   :  :  +- *(2) HashAggregate(keys=[], functions=[avg(cast(key#13 as bigint))], output=[avg(key)#272])
   :  :     +- Exchange SinglePartition, true, [id=#227]
   :  :        +- *(1) HashAggregate(keys=[], functions=[partial_avg(cast(key#13 as bigint))], output=[sum#282, count#283L])
   :  :           +- *(1) SerializeFromObject [knownnotnull(assertnotnull(input[0, org.apache.spark.sql.test.SQLTestData$TestData, true])).key AS key#13]
   :  :              +- Scan[obj#12]
   :  +- Subquery scalar-subquery#270, [id=#266]
   :     +- *(1) Project [Subquery scalar-subquery#269, [id=#263] AS scalarsubquery()#275]
   :        :  +- Subquery scalar-subquery#269, [id=#263]
   :        :     +- *(2) HashAggregate(keys=[], functions=[avg(cast(key#13 as bigint))], output=[avg(key)#274])
   :        :        +- Exchange SinglePartition, true, [id=#259]
   :        :           +- *(1) HashAggregate(keys=[], functions=[partial_avg(cast(key#13 as bigint))], output=[sum#286, count#287L])
   :        :              +- *(1) SerializeFromObject [knownnotnull(assertnotnull(input[0, org.apache.spark.sql.test.SQLTestData$TestData, true])).key AS key#13]
   :        :                 +- Scan[obj#12]
   :        +- *(1) Scan OneRowRelation[]
   +- *(1) SerializeFromObject
      +- Scan[obj#12]
{noformat}

but it could be:
{noformat}
CollectLimit 1
+- *(1) Project [ReusedSubquery Subquery scalar-subquery#241, [id=#148] AS scalarsubquery()#248, Subquery scalar-subquery#242, [id=#164] AS scalarsubquery()#249]
   :  :- ReusedSubquery Subquery scalar-subquery#241, [id=#148]
   :  +- Subquery scalar-subquery#242, [id=#164]
   :     +- *(1) Project [Subquery scalar-subquery#241, [id=#148] AS scalarsubquery()#247]
   :        :  +- Subquery scalar-subquery#241, [id=#148]
   :        :     +- *(2) HashAggregate(keys=[], functions=[avg(cast(key#13 as bigint))], output=[avg(key)#246])
   :        :        +- Exchange SinglePartition, true, [id=#144]
   :        :           +- *(1) HashAggregate(keys=[], functions=[partial_avg(cast(key#13 as bigint))], output=[sum#258, count#259L])
   :        :              +- *(1) SerializeFromObject [knownnotnull(assertnotnull(input[0, org.apache.spark.sql.test.SQLTestData$TestData, true])).key AS key#13]
   :        :                 +- Scan[obj#12]
   :        +- *(1) Scan OneRowRelation[]
   +- *(1) SerializeFromObject
      +- Scan[obj#12]
{noformat}
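
To illustrate the idea only, here is a toy model in Python. It is not Spark's planner code: the `Subquery` and `ReusedSubquery` classes, the tuple-based plan encoding, and the `canonical` and `reuse_subqueries` helpers are all hypothetical stand-ins. The sketch shares a single cache of canonicalized subquery plans across every nesting level, so a nested subquery can be matched against one seen at an outer level.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Subquery:
    """Toy scalar subquery: an id plus a plan (nested tuples of strings/subqueries)."""
    expr_id: int
    plan: tuple


@dataclass(frozen=True)
class ReusedSubquery:
    """Toy marker that points at an equivalent subquery found earlier."""
    target: Subquery


def canonical(node):
    """Drop ids so that semantically equal subqueries compare equal."""
    if isinstance(node, Subquery):
        return ("subquery", canonical(node.plan))
    if isinstance(node, ReusedSubquery):
        return canonical(node.target)
    if isinstance(node, tuple):
        return tuple(canonical(child) for child in node)
    return node


def reuse_subqueries(node, cache=None):
    """Rewrite a plan, sharing ONE cache across all subquery nesting levels."""
    if cache is None:
        cache = {}
    if isinstance(node, tuple):
        return tuple(reuse_subqueries(child, cache) for child in node)
    if isinstance(node, Subquery):
        key = canonical(node)
        if key in cache:
            return ReusedSubquery(cache[key])
        # Recurse with the same cache so nested levels see outer subqueries.
        rewritten = Subquery(node.expr_id, reuse_subqueries(node.plan, cache))
        cache[key] = rewritten
        return rewritten
    return node


# Simplified version of the example query (operator names are abbreviated).
avg_plan = ("HashAggregate avg(key)", ("Scan testData",))
query = (
    "Project",
    Subquery(268, avg_plan),
    Subquery(270, ("Project", Subquery(269, avg_plan), ("Scan OneRowRelation",))),
    ("Scan testData",),
)
optimized = reuse_subqueries(query)
```

In this sketch the first occurrence in traversal order is kept and later equal occurrences become `ReusedSubquery`, so here the nested subquery 269 is replaced by a reference to 268. The proposed plan above keeps the nested instance and marks the top-level one as reused instead; either way the aggregate is planned only once.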