Yuming Wang created SPARK-34808:
-----------------------------------

Summary: Removes outer join if it only has distinct on streamed side
Key: SPARK-34808
URL: https://issues.apache.org/jira/browse/SPARK-34808
Project: Spark
Issue Type: Improvement
Components: SQL
Affects Versions: 3.2.0
Reporter: Yuming Wang
Assignee: Yuming Wang

For example:
{code:scala}
spark.range(200L).selectExpr("id AS a").createTempView("t1")
spark.range(300L).selectExpr("id AS b").createTempView("t2")
spark.sql("SELECT DISTINCT a FROM t1 LEFT JOIN t2 ON a = b").explain(true)
{code}

Current optimized plan:
{noformat}
== Optimized Logical Plan ==
Aggregate [a#2L], [a#2L]
+- Project [a#2L]
   +- Join LeftOuter, (a#2L = b#6L)
      :- Project [id#0L AS a#2L]
      :  +- Range (0, 200, step=1, splits=Some(2))
      +- Project [id#4L AS b#6L]
         +- Range (0, 300, step=1, splits=Some(2))
{noformat}

Expected optimized plan:
{noformat}
== Optimized Logical Plan ==
Aggregate [a#2L], [a#2L]
+- Project [id#0L AS a#2L]
   +- Range (0, 200, step=1, splits=Some(2))
{noformat}
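
Why the join can be dropped, as a minimal sketch: a left outer join emits every streamed-side (left) row at least once, and a DISTINCT that reads only streamed-side columns collapses any rows the join multiplied. The plain-Scala model below illustrates that equivalence without Spark. It is not the optimizer rule itself; `DistinctOuterJoinSketch` and `leftOuterJoin` are hypothetical names introduced here, and the duplicate keys appended to `t2` are an assumption added to show that row multiplication does not change the result.

```scala
// Plain-Scala model of the proposed rewrite; no Spark dependency.
// All names here are hypothetical and exist only for illustration.
object DistinctOuterJoinSketch {

  // Left outer equi-join: every left row appears at least once,
  // paired with None when the right side has no matching key.
  def leftOuterJoin(left: Seq[Long], right: Seq[Long]): Seq[(Long, Option[Long])] =
    left.flatMap { a =>
      val matches = right.filter(_ == a)
      if (matches.isEmpty) Seq((a, None)) else matches.map(b => (a, Some(b)))
    }

  def main(args: Array[String]): Unit = {
    // Mirrors spark.range(200L) AS a.
    val t1: Seq[Long] = (0L until 200L).toSeq
    // Mirrors spark.range(300L) AS b, plus duplicate keys (an assumption
    // added here) so that the join multiplies some left rows.
    val t2: Seq[Long] = (0L until 300L).toSeq ++ Seq(5L, 5L)

    // SELECT DISTINCT a FROM t1 LEFT JOIN t2 ON a = b
    val withJoin = leftOuterJoin(t1, t2).map(_._1).distinct
    // SELECT DISTINCT a FROM t1
    val withoutJoin = t1.distinct

    // Both queries return the same set of values.
    assert(withJoin.toSet == withoutJoin.toSet)
    println(s"distinct values with join: ${withJoin.size}, without join: ${withoutJoin.size}")
  }
}
```

The same argument would not hold for an inner join, which can drop left rows that have no match, or when the DISTINCT output references a column from the right side.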