Github user yhuai commented on a diff in the pull request:

    https://github.com/apache/spark/pull/11788#discussion_r56696068
  
    --- Diff: 
sql/core/src/main/scala/org/apache/spark/sql/execution/SparkStrategies.scala ---
    @@ -101,10 +100,41 @@ private[sql] abstract class SparkStrategies extends 
QueryPlanner[SparkPlan] {
        *     [[org.apache.spark.sql.functions.broadcast()]] function to a 
DataFrame), then that side
        *     of the join will be broadcasted and the other side will be 
streamed, with no shuffling
        *     performed. If both sides of the join are eligible to be 
broadcasted then the
    +   * - Shuffle hash join: if single partition is small enough to build a 
hash table.
        * - Sort merge: if the matching join keys are sortable.
        */
       object EquiJoinSelection extends Strategy with PredicateHelper {
     
    +    /**
    +     * Matches a plan whose single partition should be small enough to 
build a hash table.
    +     */
    +    def canBuildHashMap(plan: LogicalPlan): Boolean = {
    +      plan.statistics.sizeInBytes < conf.autoBroadcastJoinThreshold * 
conf.numShufflePartitions
    --- End diff --
    
    `conf.numShufflePartitions` only works if the number of shuffle partitions 
is a constant value for a query, right? Can you add a comment at here?


---
If your project is set up for it, you can reply to this email and have your
reply appear on GitHub as well. If your project does not have this feature
enabled and wishes so, or if the feature is enabled but not working, please
contact infrastructure at infrastruct...@apache.org or file a JIRA ticket
with INFRA.
---

---------------------------------------------------------------------
To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org
For additional commands, e-mail: reviews-h...@spark.apache.org

Reply via email to