[ https://issues.apache.org/jira/browse/SPARK-34808?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17305432#comment-17305432 ]
Apache Spark commented on SPARK-34808:
--------------------------------------

User 'wangyum' has created a pull request for this issue:
https://github.com/apache/spark/pull/31908

> Removes outer join if it only has distinct on streamed side
> -----------------------------------------------------------
>
>                 Key: SPARK-34808
>                 URL: https://issues.apache.org/jira/browse/SPARK-34808
>             Project: Spark
>          Issue Type: Improvement
>          Components: SQL
>    Affects Versions: 3.2.0
>            Reporter: Yuming Wang
>            Priority: Major
>
> For example:
> {code:scala}
> spark.range(200L).selectExpr("id AS a").createTempView("t1")
> spark.range(300L).selectExpr("id AS b").createTempView("t2")
> spark.sql("SELECT DISTINCT a FROM t1 LEFT JOIN t2 ON a = b").explain(true)
> {code}
> Current optimized plan:
> {noformat}
> == Optimized Logical Plan ==
> Aggregate [a#2L], [a#2L]
> +- Project [a#2L]
>    +- Join LeftOuter, (a#2L = b#6L)
>       :- Project [id#0L AS a#2L]
>       :  +- Range (0, 200, step=1, splits=Some(2))
>       +- Project [id#4L AS b#6L]
>          +- Range (0, 300, step=1, splits=Some(2))
> {noformat}
> Expected optimized plan:
> {noformat}
> == Optimized Logical Plan ==
> Aggregate [a#2L], [a#2L]
> +- Project [id#0L AS a#2L]
>    +- Range (0, 200, step=1, splits=Some(2))
> {noformat}
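
The rewrite proposed above relies on a property of LEFT OUTER JOIN: every row of the streamed (left) side survives the join at least once, so taking DISTINCT over left-side columns only gives the same result with or without the join. The following is a minimal sketch of that argument in plain Scala collections, not Spark's optimizer rule and not the code from the pull request. The object name DistinctOverLeftJoin and the helper leftOuterJoin are hypothetical names introduced here for illustration, and the duplicate values added to t2 are an assumption meant to show that a row-multiplying join still does not change the DISTINCT result.

```scala
// Toy model in plain Scala collections; no Spark dependency is assumed.
// DistinctOverLeftJoin and leftOuterJoin are hypothetical names for illustration.
object DistinctOverLeftJoin {

  // Models "left LEFT OUTER JOIN right ON a = b".
  // Every left row is kept: unmatched rows pair with None,
  // matched rows are emitted once per matching right row.
  def leftOuterJoin(left: Seq[Long], right: Seq[Long]): Seq[(Long, Option[Long])] =
    left.flatMap { a =>
      val matches = right.filter(_ == a)
      if (matches.isEmpty) Seq((a, Option.empty[Long]))
      else matches.map(b => (a, Option(b)))
    }

  def main(args: Array[String]): Unit = {
    val t1 = (0L until 200L).toSeq
    // Assumption for the demo: extra duplicates on the right side,
    // so the join multiplies some left rows.
    val t2 = (0L until 300L).toSeq ++ Seq(5L, 5L)

    // SELECT DISTINCT a FROM t1 LEFT JOIN t2 ON a = b
    val viaJoin = leftOuterJoin(t1, t2).map(_._1).toSet

    // SELECT DISTINCT a FROM t1
    val direct = t1.toSet

    // The join can add duplicates of a, but never new values and never drops one.
    assert(viaJoin == direct)
    println(s"distinct via join = ${viaJoin.size}, distinct without join = ${direct.size}")
  }
}
```

The same reasoning would not hold for an inner join, which can drop unmatched left rows, or when the DISTINCT output references columns from the right side.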