nchammas commented on code in PR #45374: URL: https://github.com/apache/spark/pull/45374#discussion_r1512030136
##########
sql/catalyst/src/main/scala/org/apache/spark/sql/internal/SQLConf.scala:
##########

@@ -582,11 +582,7 @@ object SQLConf {
   val AUTO_BROADCASTJOIN_THRESHOLD = buildConf("spark.sql.autoBroadcastJoinThreshold")
     .doc("Configures the maximum size in bytes for a table that will be broadcast to all worker " +
-      "nodes when performing a join. By setting this value to -1 broadcasting can be disabled. " +
-      "Note that currently statistics are only supported for Hive Metastore tables where the " +

Review Comment:
Fair question. I removed it because I don't think it explains anything. Across all of Spark, statistics come from one of the three sources I described in this PR: data source, catalog, and runtime. And this applies to all cost-based optimizations, not just to auto-broadcast. Isn't that so?

So I thought it would be better to remove this note, since it indirectly suggests that there is something special about auto-broadcast and statistics, when that isn't the case.

But I confess I am concluding this based on a high-level understanding of the optimizer. I didn't dig into the details of this particular optimization to see if there is anything really special about it.

--
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the URL above to go to the specific comment.

To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org

For queries about this service, please contact Infrastructure at:
us...@infra.apache.org
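For context on the doc string being edited: it states that setting the threshold to -1 disables broadcasting. As a quick sketch (session-level config, not part of this PR's change), that looks like:

```sql
-- Disable automatic broadcast joins for the current session
SET spark.sql.autoBroadcastJoinThreshold=-1;
```

The same setting can also be supplied at submit time or in spark-defaults.conf; -1 simply makes every table exceed the broadcast size limit.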