[ https://issues.apache.org/jira/browse/SPARK-37502?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]
XiDuo You updated SPARK-37502: ------------------------------ Description: If a `Cast` is up cast then it should be without any truncating or precision lose or possible runtime failures. So the output partitioning should be same with/without `Cast` if the `Cast` is up cast. Let's say we have a query: {code:java} -- v1: c1 int -- v2: c2 long SELECT * FROM v2 JOIN (SELECT c1, count(*) FROM v1 GROUP BY c1) v1 ON v1.c1 = v2.c2 {code} The executed plan contains three shuffle nodes which looks like: {code:java} SortMergeJoin Exchange(cast(c1 as bigint)) HashAggregate Exchange(c1) Scan v1 Exchange(c2) Scan v2 {code} We can simplify the plan using two shuffle nodes: {code:java} SortMergeJoin HashAggregate Exchange(c1) Scan v1 Exchange(c2) Scan v2 {code} was: if a `Cast` is up cast then it should be without any truncating or precision lose or possible runtime failures. So the output partitioning should be same with/without `Cast` if the `Cast` is up cast. Let's say we have a query: {code:java} -- v1: c1 int -- v2: c2 long SELECT * FROM v2 JOIN (SELECT c1, count(*) FROM v1 GROUP BY c1) v1 ON v1.c1 = v2.c2 {code} The executed plan contains three shuffle nodes which looks like: {code:java} SortMergeJoin Exchange(cast(c1 as bigint)) HashAggregate Exchange(c1) Scan v1 Exchange(c2) Scan v2 {code} We can simply the plan using two shuffle nodes: {code:java} SortMergeJoin HashAggregate Exchange(c1) Scan v1 Exchange(c2) Scan v2 {code} > Support cast aware output partitioning and required if it can up cast > --------------------------------------------------------------------- > > Key: SPARK-37502 > URL: https://issues.apache.org/jira/browse/SPARK-37502 > Project: Spark > Issue Type: Improvement > Components: SQL > Affects Versions: 3.3.0 > Reporter: XiDuo You > Priority: Major > > If a `Cast` is up cast then it should be without any truncating or precision > lose or possible runtime failures. So the output partitioning should be same > with/without `Cast` if the `Cast` is up cast. > Let's say we have a query: > {code:java} > -- v1: c1 int > -- v2: c2 long > SELECT * FROM v2 JOIN (SELECT c1, count(*) FROM v1 GROUP BY c1) v1 ON v1.c1 = > v2.c2 > {code} > The executed plan contains three shuffle nodes which looks like: > {code:java} > SortMergeJoin > Exchange(cast(c1 as bigint)) > HashAggregate > Exchange(c1) > Scan v1 > Exchange(c2) > Scan v2 > {code} > We can simplify the plan using two shuffle nodes: > {code:java} > SortMergeJoin > HashAggregate > Exchange(c1) > Scan v1 > Exchange(c2) > Scan v2 > {code} -- This message was sent by Atlassian Jira (v8.20.1#820001) --------------------------------------------------------------------- To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org For additional commands, e-mail: issues-h...@spark.apache.org