[ https://issues.apache.org/jira/browse/SPARK-34283?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]
Wenchen Fan resolved SPARK-34283.
---------------------------------
    Fix Version/s: 3.2.0
       Resolution: Fixed

Issue resolved by pull request 31404
[https://github.com/apache/spark/pull/31404]

> Combines all adjacent 'Union' operators into a single 'Union' when using
> 'Dataset.union.distinct.union.distinct'
> ----------------------------------------------------------------------------------------------------------------
>
>                 Key: SPARK-34283
>                 URL: https://issues.apache.org/jira/browse/SPARK-34283
>             Project: Spark
>          Issue Type: Improvement
>          Components: SQL
>    Affects Versions: 3.2.0
>            Reporter: Zhichao Zhang
>            Priority: Major
>             Fix For: 3.2.0
>
>         Attachments: image-2021-01-29-11-12-44-112.png,
> image-2021-01-29-11-13-42-055.png, image-2021-01-29-11-14-08-822.png,
> image-2021-01-29-11-14-42-700.png
>
>
> Problem:
> Currently, when 'Dataset.union.distinct.union.distinct' is used to union
> several datasets, the Optimizer cannot combine all adjacent 'Union'
> operators into a single 'Union', although it does handle the equivalent
> query written in SQL.
> For example:
> !image-2021-01-29-11-12-44-112.png!
> The 'Physical Plan' is shown below:
> !image-2021-01-29-11-13-42-055.png!
> But using SQL:
> !image-2021-01-29-11-14-08-822.png!
> The 'Physical Plan' is shown below:
> !image-2021-01-29-11-14-42-700.png!
>
> Root cause:
> 'Dataset.union.distinct.union.distinct' produces the operator
> 'Deduplicate(Keys, Union)', whereas AstBuilder transforms a SQL 'UNION'
> into the operator 'Distinct(Union)'. The 'CombineUnions' rule in the
> Optimizer only handles 'Distinct(Union)' and not
> 'Deduplicate(Keys, Union)'.
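
The root cause described above can be illustrated with a small, self-contained toy model. This is only a sketch, not Spark's Catalyst code: the classes `Relation`, `Union`, `Distinct` and `Deduplicate` and the functions `flatten_union` and `combine_unions` are hypothetical stand-ins written for this illustration. The condition that a `Deduplicate` may be flattened only when its keys cover every output column of the `Union` is likewise an assumption of the sketch, chosen because that is the case in which it behaves like `Distinct`.

```python
from dataclasses import dataclass
from typing import Tuple


# Hypothetical stand-ins for logical plan nodes (not Spark classes).
@dataclass(frozen=True)
class Relation:
    name: str
    columns: Tuple[str, ...]


@dataclass(frozen=True)
class Union:
    children: tuple

    @property
    def columns(self):
        # All children of a union share the same output columns.
        return self.children[0].columns


@dataclass(frozen=True)
class Distinct:
    child: object

    @property
    def columns(self):
        return self.child.columns


@dataclass(frozen=True)
class Deduplicate:
    keys: Tuple[str, ...]
    child: object

    @property
    def columns(self):
        return self.child.columns


def is_dedup_over_union(node, handle_deduplicate):
    """True if `node` removes duplicate rows directly on top of a Union."""
    if isinstance(node, Distinct) and isinstance(node.child, Union):
        return True
    # Assumption: Deduplicate acts like Distinct only when its keys
    # are exactly the Union's output columns.
    return (
        handle_deduplicate
        and isinstance(node, Deduplicate)
        and isinstance(node.child, Union)
        and set(node.keys) == set(node.child.columns)
    )


def flatten_union(union, handle_deduplicate):
    """Pull the children of nested unions up into one flat Union."""
    flat = []
    for child in union.children:
        if isinstance(child, Union):
            flat.extend(flatten_union(child, handle_deduplicate).children)
        elif is_dedup_over_union(child, handle_deduplicate):
            # The enclosing duplicate removal makes the inner one redundant.
            flat.extend(flatten_union(child.child, handle_deduplicate).children)
        else:
            flat.append(child)
    return Union(tuple(flat))


def combine_unions(plan, handle_deduplicate):
    """Toy version of the rule, applied to the root of the plan only."""
    if is_dedup_over_union(plan, handle_deduplicate):
        flat = flatten_union(plan.child, handle_deduplicate)
        if isinstance(plan, Distinct):
            return Distinct(flat)
        return Deduplicate(plan.keys, flat)
    return plan


t1, t2, t3 = (Relation(n, ("id",)) for n in ("t1", "t2", "t3"))

# Shape produced by SQL: Distinct(Union(Distinct(Union(t1, t2)), t3))
sql_plan = Distinct(Union((Distinct(Union((t1, t2))), t3)))

# Shape produced by Dataset.union.distinct.union.distinct
dataset_plan = Deduplicate(
    ("id",), Union((Deduplicate(("id",), Union((t1, t2))), t3))
)

print(combine_unions(sql_plan, handle_deduplicate=False))
print(combine_unions(dataset_plan, handle_deduplicate=False))
print(combine_unions(dataset_plan, handle_deduplicate=True))
```

With `handle_deduplicate=False` the SQL-shaped plan collapses to a single three-way `Union`, while the Dataset-shaped plan is returned unchanged, which mirrors the behaviour reported in the issue. With `handle_deduplicate=True` the Dataset-shaped plan is flattened as well.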