Yuming Wang created SPARK-34807: ----------------------------------- Summary: Push down filter through window after TransposeWindow Key: SPARK-34807 URL: https://issues.apache.org/jira/browse/SPARK-34807 Project: Spark Issue Type: Improvement Components: SQL Affects Versions: 3.2.0 Reporter: Yuming Wang
{code:scala} spark.range(10).selectExpr("id AS a", "id AS b", "id AS c", "id AS d").createTempView("t1") val df = spark.sql( """ |SELECT * | FROM ( | SELECT b, | sum(d) OVER (PARTITION BY a, b), | rank() OVER (PARTITION BY a ORDER BY c) | FROM t1 | ) v1 |WHERE b = 2 |""".stripMargin) {code} Current optimized plan: {noformat} == Optimized Logical Plan == Project [b#221L, sum(d) OVER (PARTITION BY a, b ROWS BETWEEN UNBOUNDED PRECEDING AND UNBOUNDED FOLLOWING)#231L, RANK() OVER (PARTITION BY a ORDER BY c ASC NULLS FIRST ROWS BETWEEN UNBOUNDED PRECEDING AND CURRENT ROW)#232] +- Filter (b#221L = 2) +- Window [rank(c#222L) windowspecdefinition(a#220L, c#222L ASC NULLS FIRST, specifiedwindowframe(RowFrame, unboundedpreceding$(), currentrow$())) AS RANK() OVER (PARTITION BY a ORDER BY c ASC NULLS FIRST ROWS BETWEEN UNBOUNDED PRECEDING AND CURRENT ROW)#232], [a#220L], [c#222L ASC NULLS FIRST] +- Project [b#221L, a#220L, c#222L, sum(d) OVER (PARTITION BY a, b ROWS BETWEEN UNBOUNDED PRECEDING AND UNBOUNDED FOLLOWING)#231L] +- Window [sum(d#223L) windowspecdefinition(a#220L, b#221L, specifiedwindowframe(RowFrame, unboundedpreceding$(), unboundedfollowing$())) AS sum(d) OVER (PARTITION BY a, b ROWS BETWEEN UNBOUNDED PRECEDING AND UNBOUNDED FOLLOWING)#231L], [a#220L, b#221L] +- Project [id#218L AS b#221L, id#218L AS d#223L, id#218L AS a#220L, id#218L AS c#222L] +- Range (0, 10, step=1, splits=Some(2)) {noformat} Expected optimized plan: {noformat} == Optimized Logical Plan == Project [b#221L, sum(d) OVER (PARTITION BY a, b ROWS BETWEEN UNBOUNDED PRECEDING AND UNBOUNDED FOLLOWING)#231L, RANK() OVER (PARTITION BY a ORDER BY c ASC NULLS FIRST ROWS BETWEEN UNBOUNDED PRECEDING AND CURRENT ROW)#232] +- Window [sum(d#223L) windowspecdefinition(a#220L, b#221L, specifiedwindowframe(RowFrame, unboundedpreceding$(), unboundedfollowing$())) AS sum(d) OVER (PARTITION BY a, b ROWS BETWEEN UNBOUNDED PRECEDING AND UNBOUNDED FOLLOWING)#231L], [a#220L, b#221L] +- Project [b#221L, d#223L, a#220L, RANK() OVER (PARTITION BY a ORDER BY c ASC NULLS FIRST ROWS BETWEEN UNBOUNDED PRECEDING AND CURRENT ROW)#232] +- Filter (b#221L = 2) +- Window [rank(c#222L) windowspecdefinition(a#220L, c#222L ASC NULLS FIRST, specifiedwindowframe(RowFrame, unboundedpreceding$(), currentrow$())) AS RANK() OVER (PARTITION BY a ORDER BY c ASC NULLS FIRST ROWS BETWEEN UNBOUNDED PRECEDING AND CURRENT ROW)#232], [a#220L], [c#222L ASC NULLS FIRST] +- Project [id#218L AS b#221L, id#218L AS d#223L, id#218L AS a#220L, id#218L AS c#222L] +- Range (0, 10, step=1, splits=Some(2)) {noformat} -- This message was sent by Atlassian Jira (v8.3.4#803005) --------------------------------------------------------------------- To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org For additional commands, e-mail: issues-h...@spark.apache.org