GitHub user CodingCat opened a pull request: https://github.com/apache/spark/pull/19864
[SPARK-22673][SQL] InMemoryRelation should utilize on-disk table stats whenever possible

## What changes were proposed in this pull request?

The current implementation of InMemoryRelation always uses the most expensive execution plan when writing cache. With CBO enabled, we can actually have a more exact estimation of the underlying table size...

## How was this patch tested?

Existing tests.

You can merge this pull request into a Git repository by running:

    $ git pull https://github.com/CodingCat/spark SPARK-22673

Alternatively you can review and apply these changes as the patch at:

    https://github.com/apache/spark/pull/19864.patch

To close this pull request, make a commit to your master/trunk branch
with (at least) the following in the commit message:

    This closes #19864

----
commit b2fb1d25804b7bdbe1a767306a319dc748965bce
Author: CodingCat <zhunans...@gmail.com>
Date:   2016-03-07T14:37:37Z

    improve the doc for "spark.memory.offHeap.size"

commit 0971900d562cb1a18af6f7de02bb8eb95637a640
Author: CodingCat <zhunans...@gmail.com>
Date:   2016-03-07T19:00:16Z

    fix

commit 32f7c74a9b5cf4f19e7d14357bb87064383e11b3
Author: CodingCat <zhunans...@gmail.com>
Date:   2017-12-01T23:05:35Z

    use cbo stats in inmemoryrelation

----