[ https://issues.apache.org/jira/browse/SPARK-35703?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]
Chao Sun updated SPARK-35703:
-----------------------------
    Description: 
Currently Spark has {{HashClusteredDistribution}} and {{ClusteredDistribution}}. The only difference between the two is that the former is stricter when deciding whether a bucket join is allowed to avoid a shuffle: compared to the latter, it requires an *exact* match between the clustering keys from the output partitioning (i.e., {{HashPartitioning}}) and the join keys. However, this is unnecessary, as we should be able to avoid a shuffle when the set of clustering keys is a subset of the join keys, just like {{ClusteredDistribution}}.

  (was: Currently Spark has {{HashClusteredDistribution}} and {{ClusteredDistribution}}. The only difference between the two is that the former is stricter when deciding whether a bucket join is allowed to avoid a shuffle: compared to the latter, it requires an *exact* match between the clustering keys from the output partitioning and the join keys. However, this is unnecessary, as we should be able to avoid a shuffle when the set of clustering keys is a subset of the join keys, just like {{ClusteredDistribution}}.
)

> Remove HashClusteredDistribution
> --------------------------------
>
>                 Key: SPARK-35703
>                 URL: https://issues.apache.org/jira/browse/SPARK-35703
>             Project: Spark
>          Issue Type: Improvement
>          Components: SQL
>    Affects Versions: 3.2.0
>            Reporter: Chao Sun
>            Priority: Major
>
> Currently Spark has {{HashClusteredDistribution}} and
> {{ClusteredDistribution}}. The only difference between the two is that the
> former is stricter when deciding whether a bucket join is allowed to avoid a
> shuffle: compared to the latter, it requires an *exact* match between the
> clustering keys from the output partitioning (i.e., {{HashPartitioning}}) and
> the join keys. However, this is unnecessary, as we should be able to avoid a
> shuffle when the set of clustering keys is a subset of the join keys, just like
> {{ClusteredDistribution}}.
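To make the difference between the two checks concrete, here is a minimal sketch in Python. It is a toy model, not Spark's Catalyst code: the class `HashPartitioning` below and the functions `satisfies_clustered` and `satisfies_hash_clustered` are hypothetical names chosen for illustration. The sketch looks at one side of a join only and ignores partition counts.

```python
from dataclasses import dataclass
from typing import Tuple


@dataclass(frozen=True)
class HashPartitioning:
    """Toy stand-in for an output partitioning hashed on `keys` (hypothetical model)."""
    keys: Tuple[str, ...]


def satisfies_clustered(p: HashPartitioning, join_keys: Tuple[str, ...]) -> bool:
    # Subset rule: rows that agree on every join key also agree on every
    # partitioning key, so they already hash to the same partition.
    return len(p.keys) > 0 and set(p.keys) <= set(join_keys)


def satisfies_hash_clustered(p: HashPartitioning, join_keys: Tuple[str, ...]) -> bool:
    # Strict rule: the partitioning keys must match the join keys exactly,
    # including their order.
    return p.keys == tuple(join_keys)


# A table bucketed on "a" only, joined on ("a", "b").
bucketed_on_a = HashPartitioning(keys=("a",))
join_keys = ("a", "b")

print(satisfies_clustered(bucketed_on_a, join_keys))       # True
print(satisfies_hash_clustered(bucketed_on_a, join_keys))  # False
```

Under the strict rule the table bucketed on "a" would be reshuffled for a join on ("a", "b"), while the subset rule accepts the existing partitioning.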