[ https://issues.apache.org/jira/browse/SPARK-34622?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]
Yuming Wang updated SPARK-34622:
--------------------------------
    Summary: Push down limit through Project  (was: Fix push down limit through join)

> Push down limit through Project
> -------------------------------
>
>                 Key: SPARK-34622
>                 URL: https://issues.apache.org/jira/browse/SPARK-34622
>             Project: Spark
>          Issue Type: Bug
>          Components: SQL
>    Affects Versions: 3.2.0
>            Reporter: Yuming Wang
>            Priority: Major
>
> We insert a Project if the Join output does not match the LocalLimit. In that
> case we currently cannot push the limit down through the join. For example:
> {code:scala}
> spark.sql("create table t1(a int, b int, c int) using parquet")
> spark.sql("create table t2(x int, y int, z int) using parquet")
> spark.sql("select a from t1 left join t2 on a = x and b = y limit 5").explain("cost")
> {code}
> Current:
> {noformat}
> == Optimized Logical Plan ==
> GlobalLimit 5
> +- LocalLimit 5
>    +- Project [a#0]
>       +- Join LeftOuter, ((a#0 = x#3) AND (b#1 = y#4))
>          :- Project [a#0, b#1]
>          :  +- Relation default.t1[a#0,b#1,c#2] parquet
>          +- Project [x#3, y#4]
>             +- Filter (isnotnull(x#3) AND isnotnull(y#4))
>                +- Relation default.t2[x#3,y#4,z#5] parquet
> {noformat}
> Expected:
> {noformat}
> == Optimized Logical Plan ==
> GlobalLimit 5
> +- LocalLimit 5
>    +- Project [a#0]
>       +- Join LeftOuter, ((a#0 = x#3) AND (b#1 = y#4))
>          :- LocalLimit 5
>          :  +- Project [a#0, b#1]
>          :     +- Relation default.t1[a#0,b#1,c#2] parquet
>          +- Project [x#3, y#4]
>             +- Filter (isnotnull(x#3) AND isnotnull(y#4))
>                +- Relation default.t2[x#3,y#4,z#5] parquet
> {noformat}

--
This message was sent by Atlassian Jira
(v8.3.4#803005)
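The transformation above can be sketched with a toy plan model. This is a minimal, self-contained sketch (the `Plan`, `Relation`, `Project`, `Join`, `LocalLimit` case classes and the `pushLimitThroughProjectedJoin` rule are hypothetical stand-ins, not Spark's Catalyst classes); it only illustrates the rewrite the ticket expects: because every output row of a LEFT OUTER join corresponds to at least one left-side row, limiting the left child to n rows cannot lose any of the first n results, even when a Project sits between the LocalLimit and the Join.

```scala
// Toy logical-plan model (assumption: simplified stand-in for Spark's
// Catalyst plan nodes, for illustration only).
sealed trait Plan
case class Relation(name: String) extends Plan
case class Project(cols: Seq[String], child: Plan) extends Plan
case class Join(left: Plan, right: Plan, joinType: String) extends Plan
case class LocalLimit(n: Int, child: Plan) extends Plan

// Hypothetical rule: for LocalLimit(n, Project(_, Join(l, r, LeftOuter))),
// push an extra LocalLimit(n) onto the left (row-preserving) side of the
// join, mirroring the "Expected" plan in the ticket.
def pushLimitThroughProjectedJoin(plan: Plan): Plan = plan match {
  case LocalLimit(n, Project(cols, Join(left, right, "LeftOuter"))) =>
    LocalLimit(n, Project(cols, Join(LocalLimit(n, left), right, "LeftOuter")))
  case other => other // leave anything else unchanged
}

// The ticket's "Current" plan shape (filters elided for brevity):
val before = LocalLimit(5, Project(Seq("a"),
  Join(Project(Seq("a", "b"), Relation("t1")),
       Project(Seq("x", "y"), Relation("t2")), "LeftOuter")))
val after = pushLimitThroughProjectedJoin(before)
```

Applying the rule to `before` yields the "Expected" shape, with `LocalLimit 5` inserted above the left child's Project.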