GitHub user CodingCat opened a pull request: https://github.com/apache/spark/pull/20072
[SPARK-22790][SQL] add a configurable factor to describe HadoopFsRelation's size

## What changes were proposed in this pull request?

As per the discussion in https://github.com/apache/spark/pull/19864#discussion_r156847927, the current size estimate of HadoopFsRelation is based purely on the underlying file size. This is not accurate (for example, when files are compressed, the data read into memory can be much larger than the on-disk size) and makes execution vulnerable to errors such as OOM.

Users can enable CBO with the functionality in https://github.com/apache/spark/pull/19864 to avoid this issue. This JIRA proposes adding a configurable factor to the sizeInBytes method of the HadoopFsRelation class so that users can mitigate this problem without CBO.

## How was this patch tested?

Existing tests.

You can merge this pull request into a Git repository by running:

    $ git pull https://github.com/CodingCat/spark SPARK-22790

Alternatively, you can review and apply these changes as the patch at:

    https://github.com/apache/spark/pull/20072.patch

To close this pull request, make a commit to your master/trunk branch with (at least) the following in the commit message:

    This closes #20072

----
commit b02d857f20f594d87c8c48991bfbbe95a71b364a
Author: CodingCat <zhunansjtu@...>
Date: 2016-03-07T14:37:37Z

    improve the doc for "spark.memory.offHeap.size"

commit e09d60f3dd212dc0ce9687b112970c7cf1e4c83b
Author: CodingCat <zhunansjtu@...>
Date: 2016-03-07T14:37:37Z

    improve the doc for "spark.memory.offHeap.size"

commit 2ebc6caab7c540b43a50b7e0f27b8f4c278e5611
Author: CodingCat <zhunansjtu@...>
Date: 2016-03-07T19:00:16Z

    fix

commit 9b87ba8830d102c2568e338787e8b49b284dd8b1
Author: CodingCat <zhunansjtu@...>
Date: 2016-03-07T19:00:16Z

    fix

commit e6065c75015b8a2c0eff9f3c6e1ebfe148b28e65
Author: CodingCat <zhunansjtu@...>
Date: 2017-12-25T03:21:02Z

    add a configurable factor to describe HadoopFsRelation's size
----
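The idea proposed for SPARK-22790, scaling HadoopFsRelation's file-based size by a configurable factor, can be sketched as below. This is a minimal illustration assuming the estimate is simply the total file size multiplied by a user-supplied factor; the object `SizeEstimate`, its `sizeInBytes` helper, and the `factor` parameter are hypothetical names, not the code in this PR or Spark's actual API.

```scala
// Hypothetical illustration only: not Spark's HadoopFsRelation implementation.
object SizeEstimate {
  // Scale the raw on-disk size by a configurable factor (assumed to be > 0).
  def sizeInBytes(fileSizes: Seq[Long], factor: Double): Long = {
    require(factor > 0, s"factor must be positive, got $factor")
    val scaled = BigDecimal(fileSizes.sum) * BigDecimal(factor)
    // Clamp so that a large factor cannot overflow Long.
    if (scaled > BigDecimal(Long.MaxValue)) Long.MaxValue else scaled.toLong
  }

  def main(args: Array[String]): Unit = {
    // 300 bytes on disk with an assumed expansion factor of 3.0
    println(sizeInBytes(Seq(100L, 200L), 3.0)) // prints 900
  }
}
```

With a factor of 1.0 the estimate equals the raw file size, so a default of 1.0 would keep the existing behavior, while a larger factor makes the planner more conservative about treating a relation as small.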