[ https://issues.apache.org/jira/browse/SPARK-37226?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]
Apache Spark reassigned SPARK-37226:
------------------------------------

    Assignee: Yuming Wang  (was: Apache Spark)

> Filter push down through window
> -------------------------------
>
>                 Key: SPARK-37226
>                 URL: https://issues.apache.org/jira/browse/SPARK-37226
>             Project: Spark
>          Issue Type: Improvement
>          Components: SQL
>    Affects Versions: 3.3.0
>            Reporter: Yuming Wang
>            Assignee: Yuming Wang
>            Priority: Major
>
> {code:scala}
> spark.sql("CREATE TABLE t1 using parquet as select id as a, id as b from range(1000)")
> spark.sql("select * from (SELECT a, count(*) cnt, row_number() over (order by a desc) as rn from t1 group by a) where rn <= 10").explain(true)
> {code}
> We can optimize this query by pushing the {{rn <= 10}} filter through the window as a limit on the window's sorted input. The optimized plan would be:
> {noformat}
> == Optimized Logical Plan ==
> Filter (rn#4 <= 10)
> +- Window [row_number() windowspecdefinition(a#7L DESC NULLS LAST, specifiedwindowframe(RowFrame, unboundedpreceding$(), currentrow$())) AS rn#4], [a#7L DESC NULLS LAST]
>    +- GlobalLimit 10
>       +- LocalLimit 10
>          +- Sort [a#7L DESC NULLS LAST], true
>             +- Aggregate [a#7L], [a#7L, count(1) AS cnt#3L]
>                +- Project [a#7L]
>                   +- Relation default.t1[a#7L,b#8L] parquet
> {noformat}
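Why the rewrite is safe can be checked with a minimal sketch that models both plans on plain Scala collections instead of Catalyst. The object `WindowLimitPushdown` and its methods are hypothetical illustrations, not Spark APIs. The sketch assumes the window input is the output of `group by a`, so every `a` is distinct and `row_number() over (order by a desc)` has no ties. Under that assumption, numbering all rows and keeping `rn <= k` yields the same rows as sorting, taking `k`, and then numbering.

```scala
// Hypothetical model of SPARK-37226's rewrite on plain Scala collections (no Spark).
object WindowLimitPushdown {
  // One output row of "SELECT a, count(*) cnt ... GROUP BY a"
  final case class Row(a: Long, cnt: Long)

  // Assign row_number() (1-based) to rows already in window order
  private def rowNumber(sorted: Seq[Row]): Seq[(Row, Int)] =
    sorted.zipWithIndex.map { case (r, i) => (r, i + 1) }

  // Original plan: Sort(a DESC) -> Window(row_number) -> Filter(rn <= k)
  def filterAfterWindow(rows: Seq[Row], k: Int): Seq[(Row, Int)] =
    rowNumber(rows.sortBy(_.a)(Ordering[Long].reverse))
      .filter { case (_, rn) => rn <= k }

  // Rewritten plan: Sort(a DESC) -> Limit k -> Window(row_number) -> Filter(rn <= k)
  // The Filter is kept to mirror the plan in the issue, even though it is now redundant.
  def limitBeforeWindow(rows: Seq[Row], k: Int): Seq[(Row, Int)] =
    rowNumber(rows.sortBy(_.a)(Ordering[Long].reverse).take(k))
      .filter { case (_, rn) => rn <= k }

  def main(args: Array[String]): Unit = {
    // Mimic range(1000) grouped by a: each a appears once, in shuffled order
    val aggregated: Seq[Row] =
      new scala.util.Random(42).shuffle((0L until 1000L).map(a => Row(a, 1L)))
    val k = 10
    val before = filterAfterWindow(aggregated, k)
    val after = limitBeforeWindow(aggregated, k)
    assert(before == after, "pushing the limit below the window changed the result")
    println(after.map { case (r, rn) => s"a=${r.a} rn=$rn" }.mkString(", "))
  }
}
```

The saving comes from the `Limit` sitting below the `Window`. Only `k` rows reach the window operator instead of the full aggregate output, and the sort itself can run as a top-k.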