GitHub user cloud-fan commented on a diff in the pull request:

    https://github.com/apache/spark/pull/20072#discussion_r160077189

    --- Diff: sql/catalyst/src/main/scala/org/apache/spark/sql/internal/SQLConf.scala ---
    @@ -263,6 +263,17 @@ object SQLConf {
         .booleanConf
         .createWithDefault(false)

    +  val DISK_TO_MEMORY_SIZE_FACTOR = buildConf(
    +    "spark.sql.sources.compressionFactor")
    +    .internal()
    +    .doc("The result of multiplying this factor with the size of data source files is propagated " +
    +      "to serve as the stats to choose the best execution plan. In the case where the " +
    --- End diff --

    `When estimating the output data size of a table scan, multiply the file size by this factor to get the estimated data size, in case the data is compressed in the file and leads to a heavily underestimated result.`
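
To make the suggested wording concrete, here is a minimal sketch of the arithmetic it describes: the on-disk file size is scaled by the factor to get the estimated scan output size. The object name `ScanSizeEstimate` and the method `estimatedSizeInBytes` are hypothetical, chosen only for illustration; this is not Spark's implementation and does not show how the PR wires the factor into statistics.

```scala
// Hypothetical helper for illustration only; not part of Spark or of this PR.
object ScanSizeEstimate {

  /**
   * Scales the on-disk size of the data source files by `factor` to
   * approximate the size of the data once it is read (e.g. decompressed).
   */
  def estimatedSizeInBytes(fileSizeInBytes: Long, factor: Double): Long = {
    require(fileSizeInBytes >= 0, "file size must be non-negative")
    require(factor > 0, "factor must be positive")
    // Double-to-Long conversion saturates at Long.MaxValue rather than wrapping.
    (fileSizeInBytes * factor).toLong
  }
}
```

With a factor of 1.0 the estimate equals the file size. A factor above 1.0 raises the estimate for compressed files, which is the underestimation case the comment refers to.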