[GitHub] spark pull request #19001: [SPARK-19256][SQL] Hive bucketing support

cloud-fan Tue, 24 Apr 2018 23:25:49 -0700

Github user cloud-fan commented on a diff in the pull request:

    https://github.com/apache/spark/pull/19001#discussion_r183949990
  
    --- Diff: sql/core/src/main/scala/org/apache/spark/sql/execution/joins/SortMergeJoinExec.scala ---
    @@ -39,7 +40,10 @@ case class SortMergeJoinExec(
         joinType: JoinType,
         condition: Option[Expression],
         left: SparkPlan,
    -    right: SparkPlan) extends BinaryExecNode with CodegenSupport {
    +    right: SparkPlan,
    +    requiredNumPartitions: Option[Int] = None,
    +    hashingFunctionClass: Class[_ <: HashExpression[Int]] = classOf[Murmur3Hash])
    --- End diff --
    
    I think this can be done in a follow-up. For the first version, we can just
    add a `HiveHashPartitioning`, which can satisfy `ClusteredDistribution` (saving
    the shuffle for aggregates) but not `HashClusteredDistribution` (so it can't
    save the shuffle for joins).
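    A minimal sketch of the distinction, under stated assumptions: the classes
    below are a hypothetical, simplified model that only borrows the names from
    the comment. They are not Spark's real `Partitioning`/`Distribution` classes,
    and the `satisfies` rules are reduced to plain key lists. The point is that
    key co-location is enough for an aggregate regardless of hash function, while
    a join needs both sides hashed the same way, which Hive-hashed buckets can't
    guarantee against a Murmur3-shuffled side.

```scala
// Hypothetical, simplified model; NOT Spark's actual classes or rules.
sealed trait Distribution
// Rows with equal keys must land in the same partition (enough for an aggregate).
final case class ClusteredDistribution(keys: Seq[String]) extends Distribution
// Rows must be placed by the same hash of the keys on both sides (needed for a join).
final case class HashClusteredDistribution(keys: Seq[String]) extends Distribution

sealed trait Partitioning {
  def numPartitions: Int
  def satisfies(required: Distribution): Boolean
}

// Models the default shuffle partitioning (Murmur3-based in Spark).
final case class HashPartitioning(keys: Seq[String], numPartitions: Int) extends Partitioning {
  def satisfies(required: Distribution): Boolean = required match {
    // Partitioning keys being a subset of the clustering keys keeps equal keys together.
    case ClusteredDistribution(reqKeys)     => keys.forall(reqKeys.contains)
    // Same hash function on both sides, so exact key match is sufficient here.
    case HashClusteredDistribution(reqKeys) => keys == reqKeys
  }
}

// Models Hive-bucketed data, hashed with Hive's hash instead of Murmur3.
final case class HiveHashPartitioning(keys: Seq[String], numPartitions: Int) extends Partitioning {
  def satisfies(required: Distribution): Boolean = required match {
    // Co-location by key holds no matter which hash function produced it.
    case ClusteredDistribution(reqKeys) => keys.forall(reqKeys.contains)
    // The other join side would be shuffled with Murmur3, so partitions would not line up.
    case HashClusteredDistribution(_)   => false
  }
}

val bucketed = HiveHashPartitioning(Seq("user_id"), numPartitions = 16)
// Aggregate on user_id: this model says no shuffle is needed.
println(bucketed.satisfies(ClusteredDistribution(Seq("user_id"))))     // true
// Join on user_id: the bucketed side would still have to be shuffled.
println(bucketed.satisfies(HashClusteredDistribution(Seq("user_id")))) // false
```

    Making joins benefit as well would require the join to agree on a hash
    function for both sides, which is roughly what the `hashingFunctionClass`
    parameter in the diff was reaching for and what the comment defers.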


