GitHub user CodingCat opened a pull request:

    https://github.com/apache/spark/pull/20072

    [SPARK-22790][SQL] Add a configurable factor to describe HadoopFsRelation's size

    ## What changes were proposed in this pull request?
    
    As per the discussion in https://github.com/apache/spark/pull/19864#discussion_r156847927:
    
    The current size estimate for HadoopFsRelation is based purely on the underlying file size, which is not accurate and makes execution vulnerable to errors like OOM.
    
    Users can enable CBO with the functionality in https://github.com/apache/spark/pull/19864 to avoid this issue.
    
    This JIRA proposes adding a configurable factor to the sizeInBytes method of the HadoopFsRelation class so that users can mitigate this problem without CBO.
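
A minimal sketch of the idea, not the code from this patch: a size estimate that multiplies the raw on-disk file size by a user-supplied factor. The names `FileEntry`, `RelationSizeEstimator`, and `sizeFactor` are hypothetical and chosen only for illustration. The real change lives in HadoopFsRelation and reads its factor from a Spark SQL configuration entry.

```scala
// Hypothetical, simplified model of a file-based relation's size estimate.
// None of these names come from the Spark code base.
final case class FileEntry(path: String, lengthInBytes: Long)

final case class RelationSizeEstimator(files: Seq[FileEntry], sizeFactor: Double = 1.0) {
  require(sizeFactor > 0.0, "sizeFactor must be positive")

  // Sum of on-disk file sizes: the estimate used when no factor is applied.
  def rawSizeInBytes: Long = files.map(_.lengthInBytes).sum

  // Scaled estimate. Double-to-Long conversion on the JVM saturates at
  // Long.MaxValue, so a very large product cannot wrap around to a negative size.
  def sizeInBytes: Long = (rawSizeInBytes * sizeFactor).toLong
}

object SizeFactorExample {
  def main(args: Array[String]): Unit = {
    val files = Seq(
      FileEntry("part-00000", 100L),
      FileEntry("part-00001", 50L)
    )
    // Default factor of 1.0 reproduces the purely file-size-based estimate.
    println(RelationSizeEstimator(files).sizeInBytes)
    // A factor above 1.0 makes the planner-facing estimate more conservative.
    println(RelationSizeEstimator(files, sizeFactor = 3.0).sizeInBytes)
  }
}
```

With a default factor of 1.0 the behavior is unchanged, so only users who know their files understate the in-memory size would need to raise it.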
    
    ## How was this patch tested?
    
    Existing tests


You can merge this pull request into a Git repository by running:

    $ git pull https://github.com/CodingCat/spark SPARK-22790

Alternatively you can review and apply these changes as the patch at:

    https://github.com/apache/spark/pull/20072.patch

To close this pull request, make a commit to your master/trunk branch
with (at least) the following in the commit message:

    This closes #20072
    
----
commit b02d857f20f594d87c8c48991bfbbe95a71b364a
Author: CodingCat <zhunansjtu@...>
Date:   2016-03-07T14:37:37Z

    improve the doc for "spark.memory.offHeap.size"

commit e09d60f3dd212dc0ce9687b112970c7cf1e4c83b
Author: CodingCat <zhunansjtu@...>
Date:   2016-03-07T14:37:37Z

    improve the doc for "spark.memory.offHeap.size"

commit 2ebc6caab7c540b43a50b7e0f27b8f4c278e5611
Author: CodingCat <zhunansjtu@...>
Date:   2016-03-07T19:00:16Z

    fix

commit 9b87ba8830d102c2568e338787e8b49b284dd8b1
Author: CodingCat <zhunansjtu@...>
Date:   2016-03-07T19:00:16Z

    fix

commit e6065c75015b8a2c0eff9f3c6e1ebfe148b28e65
Author: CodingCat <zhunansjtu@...>
Date:   2017-12-25T03:21:02Z

    add a configurable factor to describe HadoopFsRelation's size

----


