Yuming Wang created SPARK-34807:
-----------------------------------

             Summary: Push down filter through window after TransposeWindow
                 Key: SPARK-34807
                 URL: https://issues.apache.org/jira/browse/SPARK-34807
             Project: Spark
          Issue Type: Improvement
          Components: SQL
    Affects Versions: 3.2.0
            Reporter: Yuming Wang



{code:scala}
      spark.range(10).selectExpr("id AS a", "id AS b", "id AS c", "id AS 
d").createTempView("t1")
      val df = spark.sql(
        """
          |SELECT *
          |  FROM (
          |    SELECT b,
          |      sum(d) OVER (PARTITION BY a, b),
          |      rank() OVER (PARTITION BY a ORDER BY c)
          |    FROM t1
          |  ) v1
          |WHERE b = 2
          |""".stripMargin)
{code}

Current optimized plan:
{noformat}
== Optimized Logical Plan ==
Project [b#221L, sum(d) OVER (PARTITION BY a, b ROWS BETWEEN UNBOUNDED 
PRECEDING AND UNBOUNDED FOLLOWING)#231L, RANK() OVER (PARTITION BY a ORDER BY c 
ASC NULLS FIRST ROWS BETWEEN UNBOUNDED PRECEDING AND CURRENT ROW)#232]
+- Filter (b#221L = 2)
   +- Window [rank(c#222L) windowspecdefinition(a#220L, c#222L ASC NULLS FIRST, 
specifiedwindowframe(RowFrame, unboundedpreceding$(), currentrow$())) AS RANK() 
OVER (PARTITION BY a ORDER BY c ASC NULLS FIRST ROWS BETWEEN UNBOUNDED 
PRECEDING AND CURRENT ROW)#232], [a#220L], [c#222L ASC NULLS FIRST]
      +- Project [b#221L, a#220L, c#222L, sum(d) OVER (PARTITION BY a, b ROWS 
BETWEEN UNBOUNDED PRECEDING AND UNBOUNDED FOLLOWING)#231L]
         +- Window [sum(d#223L) windowspecdefinition(a#220L, b#221L, 
specifiedwindowframe(RowFrame, unboundedpreceding$(), unboundedfollowing$())) 
AS sum(d) OVER (PARTITION BY a, b ROWS BETWEEN UNBOUNDED PRECEDING AND 
UNBOUNDED FOLLOWING)#231L], [a#220L, b#221L]
            +- Project [id#218L AS b#221L, id#218L AS d#223L, id#218L AS 
a#220L, id#218L AS c#222L]
               +- Range (0, 10, step=1, splits=Some(2))
{noformat}


Expected optimized plan:
{noformat}
== Optimized Logical Plan ==
Project [b#221L, sum(d) OVER (PARTITION BY a, b ROWS BETWEEN UNBOUNDED 
PRECEDING AND UNBOUNDED FOLLOWING)#231L, RANK() OVER (PARTITION BY a ORDER BY c 
ASC NULLS FIRST ROWS BETWEEN UNBOUNDED PRECEDING AND CURRENT ROW)#232]
+- Window [sum(d#223L) windowspecdefinition(a#220L, b#221L, 
specifiedwindowframe(RowFrame, unboundedpreceding$(), unboundedfollowing$())) 
AS sum(d) OVER (PARTITION BY a, b ROWS BETWEEN UNBOUNDED PRECEDING AND 
UNBOUNDED FOLLOWING)#231L], [a#220L, b#221L]
   +- Project [b#221L, d#223L, a#220L, RANK() OVER (PARTITION BY a ORDER BY c 
ASC NULLS FIRST ROWS BETWEEN UNBOUNDED PRECEDING AND CURRENT ROW)#232]
      +- Filter (b#221L = 2)
         +- Window [rank(c#222L) windowspecdefinition(a#220L, c#222L ASC NULLS 
FIRST, specifiedwindowframe(RowFrame, unboundedpreceding$(), currentrow$())) AS 
RANK() OVER (PARTITION BY a ORDER BY c ASC NULLS FIRST ROWS BETWEEN UNBOUNDED 
PRECEDING AND CURRENT ROW)#232], [a#220L], [c#222L ASC NULLS FIRST]
            +- Project [id#218L AS b#221L, id#218L AS d#223L, id#218L AS 
a#220L, id#218L AS c#222L]
               +- Range (0, 10, step=1, splits=Some(2))
{noformat}





--
This message was sent by Atlassian Jira
(v8.3.4#803005)

---------------------------------------------------------------------
To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org
For additional commands, e-mail: issues-h...@spark.apache.org

Reply via email to