[ https://issues.apache.org/jira/browse/SPARK-17570?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]
Tejas Patil updated SPARK-17570:
--------------------------------
    Issue Type: Sub-task  (was: Improvement)
        Parent: SPARK-19256

> Avoid Hash and Exchange in Sort Merge join if bucketing factor is multiple for tables
> -------------------------------------------------------------------------------------
>
>                 Key: SPARK-17570
>                 URL: https://issues.apache.org/jira/browse/SPARK-17570
>             Project: Spark
>          Issue Type: Sub-task
>          Components: SQL
>    Affects Versions: 2.0.0
>            Reporter: Tejas Patil
>            Priority: Minor
>
> In the case of bucketed tables, Spark avoids `Sort` and `Exchange` if
> the input tables and the output table have the same number of buckets.
> However, unequal bucket counts always lead to `Sort` and `Exchange`. If the
> number of buckets in the output table is a factor of the number of buckets
> in each input table, we should be able to avoid `Sort` and `Exchange` and
> join the tables directly.
>
> e.g.
> Assume Input1, Input2 and Output are bucketed and sorted tables over the same
> columns but with different numbers of buckets. Input1 has 8 buckets, Input2
> has 12 buckets and Output has 4 buckets. Since hash partitioning is done using
> modulus, if we JOIN buckets (0, 4) of Input1 and buckets (0, 4, 8) of Input2
> in the same task, it would give bucket 0 of the output table.
> {noformat}
> Input1   (0, 4)      (1, 5)      (2, 6)       (3, 7)
> Input2   (0, 4, 8)   (1, 5, 9)   (2, 6, 10)   (3, 7, 11)
> Output   (0)         (1)         (2)          (3)
> {noformat}
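
The bucket grouping described in the issue can be sketched in a few lines of Python. This is only an illustration of the modulus argument, not Spark's planner code: it assumes a row's bucket id is `hash % num_buckets`, as the description states, and uses bucket counts of 8 and 12 for the inputs and 4 for the output. The function names `can_coalesce` and `group_buckets` are hypothetical. If an input has N = k * M buckets, a row in input bucket b satisfies `hash % M == b % M`, so input bucket b can only contribute to output bucket `b % M`.

```python
from typing import List


def can_coalesce(input_buckets: int, output_buckets: int) -> bool:
    """Return True when the output bucket count evenly divides the input bucket count."""
    return output_buckets > 0 and input_buckets % output_buckets == 0


def group_buckets(input_buckets: int, output_buckets: int) -> List[List[int]]:
    """List, for each output bucket, the input buckets that feed it.

    Assumes bucket id = hash % num_buckets (hypothetical simplification).
    """
    if not can_coalesce(input_buckets, output_buckets):
        raise ValueError("output bucket count must divide the input bucket count")
    # Input bucket b maps to output bucket b % output_buckets, so output
    # bucket `out` collects out, out + M, out + 2M, ... below input_buckets.
    return [
        list(range(out, input_buckets, output_buckets))
        for out in range(output_buckets)
    ]


input1_groups = group_buckets(8, 4)   # Input1: 8 buckets, Output: 4 buckets
input2_groups = group_buckets(12, 4)  # Input2: 12 buckets, Output: 4 buckets

for out, (g1, g2) in enumerate(zip(input1_groups, input2_groups)):
    print(f"Output bucket {out}: Input1 {g1} joined with Input2 {g2}")
```

A task that reads the listed buckets from both inputs sees every row whose hash is congruent to the output bucket id modulo 4, which is why no further `Exchange` would be needed under these assumptions. Whether `Sort` can also be skipped depends on how the sorted buckets are merged within the task, which this sketch does not model.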