[ https://issues.apache.org/jira/browse/SPARK-34283?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]
Apache Spark reassigned SPARK-34283:
------------------------------------

    Assignee:     (was: Apache Spark)

> Combines all adjacent 'Union' operators into a single 'Union' when using 'Dataset.union.distinct'
> -------------------------------------------------------------------------------------------------
>
>                 Key: SPARK-34283
>                 URL: https://issues.apache.org/jira/browse/SPARK-34283
>             Project: Spark
>          Issue Type: Improvement
>          Components: SQL
>    Affects Versions: 3.2.0
>            Reporter: Zhichao Zhang
>            Priority: Major
>         Attachments: image-2021-01-29-11-12-44-112.png, image-2021-01-29-11-13-42-055.png, image-2021-01-29-11-14-08-822.png, image-2021-01-29-11-14-42-700.png
>
>
> Problem:
> Currently, when 'Dataset.union.distinct' is used to union several datasets, the Optimizer cannot combine all adjacent 'Union' operators into a single 'Union', although it does combine them when the same query is written in SQL.
> For example:
> !image-2021-01-29-11-12-44-112.png!
> The 'Physical Plan' is shown below:
> !image-2021-01-29-11-13-42-055.png!
> But using SQL:
> !image-2021-01-29-11-14-08-822.png!
> The 'Physical Plan' is shown below:
> !image-2021-01-29-11-14-42-700.png!
>
> Root cause:
> When 'Dataset.union.distinct' is used, the resulting operator is 'Deduplicate(Columns, Union)', whereas AstBuilder transforms a SQL 'Union' into the operator 'Distinct(Union)'. The 'CombineUnions' rule in the Optimizer handles only the 'Distinct(Union)' operator, not 'Deduplicate(Columns, Union)'.
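
The root cause described in this issue can be illustrated with a small toy model. The sketch below is not Catalyst code: the classes `Relation`, `Union`, `Distinct`, and `Deduplicate`, the helper functions, and the `handle_deduplicate` flag are all hypothetical stand-ins that only borrow the operator names from the report. It assumes that a `Deduplicate` whose keys cover every output column of the `Union` can be treated like `Distinct`, and it applies the rule at the root of the plan only.

```python
from dataclasses import dataclass


# Toy plan nodes (hypothetical; they only mirror the operator names in the report).
@dataclass(frozen=True)
class Relation:
    name: str
    columns: tuple


@dataclass(frozen=True)
class Union:
    children: tuple


@dataclass(frozen=True)
class Distinct:
    child: object


@dataclass(frozen=True)
class Deduplicate:
    keys: tuple
    child: object


def output(plan):
    """Output columns of a toy plan (a Union takes them from its first child)."""
    if isinstance(plan, Relation):
        return plan.columns
    if isinstance(plan, Union):
        return output(plan.children[0])
    return output(plan.child)


def is_distinct_union(plan, handle_deduplicate):
    """True if plan is a duplicate-removing node sitting directly on a Union."""
    if isinstance(plan, Distinct):
        return isinstance(plan.child, Union)
    if handle_deduplicate and isinstance(plan, Deduplicate):
        # Assumption: Deduplicate over every output column behaves like Distinct.
        return (isinstance(plan.child, Union)
                and set(plan.keys) == set(output(plan.child)))
    return False


def flatten_union(union, flatten_distinct, handle_deduplicate):
    """Merge adjacent Union nodes (and, optionally, distinct Unions) into one."""
    flat = []
    for child in union.children:
        if isinstance(child, Union):
            flat.extend(
                flatten_union(child, flatten_distinct, handle_deduplicate).children)
        elif flatten_distinct and is_distinct_union(child, handle_deduplicate):
            flat.extend(
                flatten_union(child.child, True, handle_deduplicate).children)
        else:
            flat.append(child)
    return Union(tuple(flat))


def combine_unions(plan, handle_deduplicate=False):
    """Toy version of a union-combining rule, applied at the plan root only."""
    if isinstance(plan, Union):
        return flatten_union(plan, False, handle_deduplicate)
    if is_distinct_union(plan, handle_deduplicate):
        flat = flatten_union(plan.child, True, handle_deduplicate)
        if isinstance(plan, Distinct):
            return Distinct(flat)
        return Deduplicate(plan.keys, flat)
    return plan


a = Relation("a", ("id",))
b = Relation("b", ("id",))
c = Relation("c", ("id",))

# Shape produced by SQL: Distinct(Union(Distinct(Union(a, b)), c))
sql_plan = Distinct(Union((Distinct(Union((a, b))), c)))
# Shape produced by Dataset.union.distinct, per the report.
ds_plan = Deduplicate(("id",), Union((Deduplicate(("id",), Union((a, b))), c)))

print(combine_unions(sql_plan))
print(combine_unions(ds_plan))
print(combine_unions(ds_plan, handle_deduplicate=True))
```

With `handle_deduplicate=False` the toy rule flattens only the SQL-shaped plan and leaves the `Deduplicate` plan nested, which mirrors the behaviour reported here. Setting the flag to `True` shows one possible way the rule could be extended so that both shapes collapse into a single `Union`.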