[ https://issues.apache.org/jira/browse/SPARK-33399?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]
Takeshi Yamamuro resolved SPARK-33399. -------------------------------------- Fix Version/s: 3.1.0 Assignee: Prakhar Jain Resolution: Fixed Resolved by https://github.com/apache/spark/pull/30300 > Normalize output partitioning and sortorder with respect to aliases to avoid > unneeded exchange/sort nodes > --------------------------------------------------------------------------------------------------------- > > Key: SPARK-33399 > URL: https://issues.apache.org/jira/browse/SPARK-33399 > Project: Spark > Issue Type: Task > Components: SQL > Affects Versions: 2.4.7, 3.0.0, 3.0.1 > Reporter: Prakhar Jain > Assignee: Prakhar Jain > Priority: Major > Fix For: 3.1.0 > > > Spark introduces unneeded exchanges if there is a Project after Inner join. > Example: > > {noformat} > spark.range(10).repartition($"id").createTempView("t1") > spark.range(20).repartition($"id").createTempView("t2") > spark.range(30).repartition($"id").createTempView("t3") > val planned = sql( > """ > |SELECT t2id, t3.id as t3id > |FROM ( > | SELECT t1.id as t1id, t2.id as t2id > | FROM t1, t2 > | WHERE t1.id = t2.id > |) t12, t3 > |WHERE t1id = t3.id > """.stripMargin).queryExecution.executedPlan > *(9) Project [t2id#1034L, id#1004L AS t3id#1035L] > +- *(9) SortMergeJoin [t1id#1033L], [id#1004L], Inner > :- *(6) Sort [t1id#1033L ASC NULLS FIRST], false, 0 > : +- Exchange hashpartitioning(t1id#1033L, 5), true, [id=#1343] > <--------------- > : +- *(5) Project [id#996L AS t1id#1033L, id#1000L AS t2id#1034L] > : +- *(5) SortMergeJoin [id#996L], [id#1000L], Inner > : :- *(2) Sort [id#996L ASC NULLS FIRST], false, 0 > : : +- Exchange hashpartitioning(id#996L, 5), true, [id=#1329] > : : +- *(1) Range (0, 10, step=1, splits=2) > : +- *(4) Sort [id#1000L ASC NULLS FIRST], false, 0 > : +- Exchange hashpartitioning(id#1000L, 5), true, [id=#1335] > : +- *(3) Range (0, 20, step=1, splits=2) > +- *(8) Sort [id#1004L ASC NULLS FIRST], false, 0 > +- Exchange hashpartitioning(id#1004L, 5), true, [id=#1349] > +- *(7) Range (0, 30, step=1, splits=2){noformat} > The marked exchange in the above plan can be removed. > -- This message was sent by Atlassian Jira (v8.3.4#803005) --------------------------------------------------------------------- To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org For additional commands, e-mail: issues-h...@spark.apache.org