Github user cloud-fan commented on a diff in the pull request:

https://github.com/apache/spark/pull/19001#discussion_r183949990

--- Diff: sql/core/src/main/scala/org/apache/spark/sql/execution/joins/SortMergeJoinExec.scala ---
@@ -39,7 +40,10 @@ case class SortMergeJoinExec(
     joinType: JoinType,
     condition: Option[Expression],
     left: SparkPlan,
-    right: SparkPlan) extends BinaryExecNode with CodegenSupport {
+    right: SparkPlan,
+    requiredNumPartitions: Option[Int] = None,
+    hashingFunctionClass: Class[_ <: HashExpression[Int]] = classOf[Murmur3Hash])
--- End diff --

I think this can be done in a followup. For the first version we can just add a `HiveHashPartitioning`, which can satisfy `ClusteredDistribution` (saves the shuffle for aggregates) but not `HashClusteredDistribution` (can't save the shuffle for joins).
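
To make the suggestion concrete, here is a rough sketch of the intended `satisfies` behaviour. It is a simplified model and not Spark's Catalyst code: the traits below are stand-ins, keys are plain strings instead of `Expression`s, and the exact matching rules are assumptions made for illustration.

```scala
// Simplified stand-ins for Spark's Distribution / Partitioning hierarchy.
// These are NOT the real Catalyst classes; keys are plain column names.
sealed trait Distribution

// Rows with equal keys must land in the same partition; any hash function works.
final case class ClusteredDistribution(keys: Seq[String]) extends Distribution

// Stricter: both join sides must be hashed the same way on the same keys.
final case class HashClusteredDistribution(keys: Seq[String]) extends Distribution

sealed trait Partitioning {
  def numPartitions: Int
  def satisfies(required: Distribution): Boolean
}

// Models Spark's native (Murmur3-based) hash partitioning.
final case class HashPartitioning(keys: Seq[String], numPartitions: Int)
    extends Partitioning {
  def satisfies(required: Distribution): Boolean = required match {
    // Partitioning on a subset of the required keys still co-locates equal keys.
    case ClusteredDistribution(requiredKeys) =>
      keys.nonEmpty && keys.forall(requiredKeys.contains)
    // Joins need the same keys in the same order.
    case HashClusteredDistribution(requiredKeys) =>
      keys == requiredKeys
  }
}

// Hypothetical first version of the partitioning proposed in the comment.
final case class HiveHashPartitioning(keys: Seq[String], numPartitions: Int)
    extends Partitioning {
  def satisfies(required: Distribution): Boolean = required match {
    // Good enough for aggregates: equal keys are already co-located.
    case ClusteredDistribution(requiredKeys) =>
      keys.nonEmpty && keys.forall(requiredKeys.contains)
    // Never claimed for joins, so the planner would still insert a shuffle.
    case HashClusteredDistribution(_) => false
  }
}
```

Under this model, a table bucketed with the Hive hash on `k` could feed an aggregate on `k` without a shuffle, while a sort-merge join on `k` would still be re-shuffled, which keeps `SortMergeJoinExec` unchanged for the first version.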