Github user yhuai commented on a diff in the pull request: https://github.com/apache/spark/pull/11788#discussion_r56696068 --- Diff: sql/core/src/main/scala/org/apache/spark/sql/execution/SparkStrategies.scala --- @@ -101,10 +100,41 @@ private[sql] abstract class SparkStrategies extends QueryPlanner[SparkPlan] { * [[org.apache.spark.sql.functions.broadcast()]] function to a DataFrame), then that side * of the join will be broadcasted and the other side will be streamed, with no shuffling * performed. If both sides of the join are eligible to be broadcasted then the + * - Shuffle hash join: if single partition is small enough to build a hash table. * - Sort merge: if the matching join keys are sortable. */ object EquiJoinSelection extends Strategy with PredicateHelper { + /** + * Matches a plan whose single partition should be small enough to build a hash table. + */ + def canBuildHashMap(plan: LogicalPlan): Boolean = { + plan.statistics.sizeInBytes < conf.autoBroadcastJoinThreshold * conf.numShufflePartitions --- End diff -- `conf.numShufflePartitions` only works if the number of shuffle partitions is a constant value for a query, right? Can you add a comment at here?
--- If your project is set up for it, you can reply to this email and have your reply appear on GitHub as well. If your project does not have this feature enabled and wishes so, or if the feature is enabled but not working, please contact infrastructure at infrastruct...@apache.org or file a JIRA ticket with INFRA. --- --------------------------------------------------------------------- To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org For additional commands, e-mail: reviews-h...@spark.apache.org