Wan Kun created SPARK-36478:
-------------------------------

             Summary: Removes outer join if all grouping and aggregate expressions are from the streamed side
                 Key: SPARK-36478
                 URL: https://issues.apache.org/jira/browse/SPARK-36478
             Project: Spark
          Issue Type: Improvement
          Components: SQL
    Affects Versions: 3.2.0
            Reporter: Wan Kun

The outer join can be removed if all grouping and aggregate expressions come from the streamed side.

For example:
{code:java}
spark.range(200L).selectExpr("id AS a").createTempView("t1")
spark.range(300L).selectExpr("id AS b").createTempView("t2")
spark.sql("SELECT DISTINCT a FROM t1 LEFT JOIN t2 ON a = b").explain(true)
{code}

Current optimized plan:
{code:java}
== Optimized Logical Plan ==
Aggregate [b#3L], [b#3L, max(c#4L) AS c#20L]
+- Project [b#3L, c#4L]
   +- Join LeftOuter, (a#2L = a#10L)
      :- Project [id#0L AS a#2L, id#0L AS b#3L, id#0L AS c#4L]
      :  +- Range (0, 200, step=1, splits=Some(1))
      +- Project [id#8L AS a#10L]
         +- Range (0, 300, step=1, splits=Some(1))
{code}

Expected optimized plan:
{code:java}
== Optimized Logical Plan ==
Aggregate [b#277L], [b#277L, max(c#278L) AS c#290L]
+- Project [id#274L AS b#277L, id#274L AS c#278L]
   +- Range (0, 200, step=1, splits=Some(2))
{code}
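
To illustrate why dropping the join can be safe, here is a minimal sketch on plain Scala collections rather than Spark. It rests on two points: a left outer join keeps every streamed-side row at least once, and aggregates such as DISTINCT or max ignore duplicates. The object name, the helper leftOuterJoin, and the sample data are hypothetical and only model the semantics. The issue text does not say which aggregates qualify, so the restriction to duplicate-insensitive ones is an assumption here. The count check shows a case where removal would change the result.

```scala
object OuterJoinRemovalSketch {
  // Hypothetical helper modelling LEFT OUTER JOIN ... ON a = b.
  // Every left (streamed-side) row is emitted at least once, padded
  // with None when no right-side row matches.
  def leftOuterJoin(left: Seq[Long], right: Seq[Long]): Seq[(Long, Option[Long])] =
    left.flatMap { a =>
      val matches = right.filter(_ == a)
      if (matches.isEmpty) Seq((a, Option.empty[Long]))
      else matches.map(b => (a, Option(b)))
    }

  def main(args: Array[String]): Unit = {
    val t1 = Seq(1L, 2L, 2L, 3L) // streamed side
    val t2 = Seq(2L, 2L, 5L)     // other side; duplicates multiply matching rows

    val joined = leftOuterJoin(t1, t2)
    val streamedCols = joined.map(_._1)

    // Duplicate-insensitive results over streamed-side columns are unchanged.
    assert(streamedCols.distinct == t1.distinct)
    assert(streamedCols.max == t1.max)

    // A duplicate-sensitive aggregate such as count differs, so the join
    // could not be removed for it under this sketch's assumption.
    assert(streamedCols.size != t1.size)

    println(s"distinct with join:    ${streamedCols.distinct}")
    println(s"distinct without join: ${t1.distinct}")
  }
}
```

The join can only add copies of streamed-side rows, never drop them, so any aggregate whose result does not depend on row multiplicity sees the same input values either way.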