[ https://issues.apache.org/jira/browse/SPARK-36478?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]
Wan Kun updated SPARK-36478:
----------------------------
    Description:
Removes outer join if all grouping and aggregate expressions are from the streamed side.

For example:
{code:java}
spark.range(200L).selectExpr("id AS a", "id as b", "id as c").createTempView("t1")
spark.range(300L).selectExpr("id AS a", "id as b", "id as c").createTempView("t2")
spark.sql("SELECT t1.b, max(t1.c) as c FROM t1 LEFT JOIN t2 ON t1.a = t2.a GROUP BY t1.b").explain(true)
{code}
Current optimized plan:
{code:java}
== Optimized Logical Plan ==
Aggregate [b#3L], [b#3L, max(c#4L) AS c#20L]
+- Project [b#3L, c#4L]
   +- Join LeftOuter, (a#2L = a#10L)
      :- Project [id#0L AS a#2L, id#0L AS b#3L, id#0L AS c#4L]
      :  +- Range (0, 200, step=1, splits=Some(1))
      +- Project [id#8L AS a#10L]
         +- Range (0, 300, step=1, splits=Some(1))
{code}
Expected optimized plan:
{code:java}
== Optimized Logical Plan ==
Aggregate [b#277L], [b#277L, max(c#278L) AS c#290L]
+- Project [id#274L AS b#277L, id#274L AS c#278L]
   +- Range (0, 200, step=1, splits=Some(2))
{code}

  was:
Removes outer join if all grouping and aggregate expressions are from the streamed side.
For example:
{code:java}
spark.range(200L).selectExpr("id AS a").createTempView("t1")
spark.range(300L).selectExpr("id AS b").createTempView("t2")
spark.sql("SELECT DISTINCT a FROM t1 LEFT JOIN t2 ON a = b").explain(true)
{code}
Current optimized plan:
{code:java}
== Optimized Logical Plan ==
Aggregate [b#3L], [b#3L, max(c#4L) AS c#20L]
+- Project [b#3L, c#4L]
   +- Join LeftOuter, (a#2L = a#10L)
      :- Project [id#0L AS a#2L, id#0L AS b#3L, id#0L AS c#4L]
      :  +- Range (0, 200, step=1, splits=Some(1))
      +- Project [id#8L AS a#10L]
         +- Range (0, 300, step=1, splits=Some(1))
{code}
Expected optimized plan:
{code:java}
== Optimized Logical Plan ==
Aggregate [b#277L], [b#277L, max(c#278L) AS c#290L]
+- Project [id#274L AS b#277L, id#274L AS c#278L]
   +- Range (0, 200, step=1, splits=Some(2))
{code}

> Removes outer join if all grouping and aggregate expressions are from the streamed side
> ---------------------------------------------------------------------------------------
>
>                 Key: SPARK-36478
>                 URL: https://issues.apache.org/jira/browse/SPARK-36478
>             Project: Spark
>          Issue Type: Improvement
>          Components: SQL
>    Affects Versions: 3.2.0
>            Reporter: Wan Kun
>            Priority: Minor
>
> Removes outer join if all grouping and aggregate expressions are from the streamed side.
> For example:
> {code:java}
> spark.range(200L).selectExpr("id AS a", "id as b", "id as c").createTempView("t1")
> spark.range(300L).selectExpr("id AS a", "id as b", "id as c").createTempView("t2")
> spark.sql("SELECT t1.b, max(t1.c) as c FROM t1 LEFT JOIN t2 ON t1.a = t2.a GROUP BY t1.b").explain(true)
> {code}
> Current optimized plan:
> {code:java}
> == Optimized Logical Plan ==
> Aggregate [b#3L], [b#3L, max(c#4L) AS c#20L]
> +- Project [b#3L, c#4L]
>    +- Join LeftOuter, (a#2L = a#10L)
>       :- Project [id#0L AS a#2L, id#0L AS b#3L, id#0L AS c#4L]
>       :  +- Range (0, 200, step=1, splits=Some(1))
>       +- Project [id#8L AS a#10L]
>          +- Range (0, 300, step=1, splits=Some(1))
> {code}
> Expected optimized plan:
> {code:java}
> == Optimized Logical Plan ==
> Aggregate [b#277L], [b#277L, max(c#278L) AS c#290L]
> +- Project [id#274L AS b#277L, id#274L AS c#278L]
>    +- Range (0, 200, step=1, splits=Some(2))
> {code}
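
The proposed rewrite rests on the fact that a left outer join keeps every streamed-side row at least once and can only repeat it. The following is a minimal sketch in plain Scala collections, not Spark code and not the optimizer rule itself; `Rec`, `leftOuterJoin`, `maxCByB`, and `sumCByB` are hypothetical names introduced for illustration. It suggests why dropping the join leaves `max` unchanged, and why the same shortcut would presumably not be safe for a duplicate-sensitive aggregate such as `sum` when the other side has repeated join keys:

```scala
// Hypothetical in-memory stand-in for a table row with columns a, b, c.
final case class Rec(a: Long, b: Long, c: Long)

object OuterJoinRemovalSketch {
  // Left outer join on column a: each left row appears once if it has no
  // match, otherwise once per matching right row.
  def leftOuterJoin(left: Seq[Rec], right: Seq[Rec]): Seq[(Rec, Option[Rec])] =
    left.flatMap { l =>
      val matches = right.filter(_.a == l.a)
      if (matches.isEmpty) Seq((l, Option.empty[Rec]))
      else matches.map(r => (l, Option(r)))
    }

  // Models: SELECT b, max(c) ... GROUP BY b
  def maxCByB(rows: Seq[Rec]): Map[Long, Long] =
    rows.groupBy(_.b).map { case (b, rs) => b -> rs.map(_.c).max }

  // Models: SELECT b, sum(c) ... GROUP BY b
  def sumCByB(rows: Seq[Rec]): Map[Long, Long] =
    rows.groupBy(_.b).map { case (b, rs) => b -> rs.map(_.c).sum }

  def main(args: Array[String]): Unit = {
    // Assumption for illustration: b = a % 10 so that groups hold several rows.
    val t1 = (0L until 200L).map(i => Rec(i, i % 10, i))
    // Assumption for illustration: every join key occurs twice in t2,
    // so the join duplicates each matching t1 row.
    val t2 = (0L until 300L).flatMap(i => Seq(Rec(i, i, i), Rec(i, i, i)))

    // Keep only the streamed-side (left) columns after the join.
    val joinedLeft = leftOuterJoin(t1, t2).map(_._1)

    // max ignores duplicates, so the join can be dropped.
    println(maxCByB(joinedLeft) == maxCByB(t1)) // true
    // sum counts duplicates, so dropping the join changes the result.
    println(sumCByB(joinedLeft) == sumCByB(t1)) // false
  }
}
```

In the example from the description, `t2.a` comes from `spark.range(300L)` and is therefore unique, so the join never duplicates `t1` rows there; the sketch duplicates keys on purpose to cover the general case.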