GitHub user tejasapatil opened a pull request: https://github.com/apache/spark/pull/14726
[SPARK-16862] Configurable buffer size in `UnsafeSorterSpillReader`

## What changes were proposed in this pull request?

Jira: https://issues.apache.org/jira/browse/SPARK-16862

The `BufferedInputStream` in `UnsafeSorterSpillReader` uses the default 8 KB buffer to read spill data off disk. This PR makes the buffer size configurable to improve on-disk read performance. I have set the default value to 1 MB because I observed improved performance with that value.

## How was this patch tested?

I am relying on the existing unit tests.

## Performance

After deploying this change to production with the config set to 1 MB, there was a 12% reduction in CPU time and a 19.5% reduction in CPU reservation time.

You can merge this pull request into a Git repository by running:

    $ git pull https://github.com/tejasapatil/spark spill_buffer_2

Alternatively you can review and apply these changes as the patch at:

    https://github.com/apache/spark/pull/14726.patch

To close this pull request, make a commit to your master/trunk branch
with (at least) the following in the commit message:

    This closes #14726

----
commit c4f37b6c8d3f1a8a565b1f215f55a501edece778
Author: Tejas Patil <tej...@fb.com>
Date:   2016-08-20T05:06:03Z

    [SPARK-16862] Configurable buffer size in `UnsafeSorterSpillReader`

----
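The change comes down to passing an explicit, configured size to `BufferedInputStream` instead of relying on its 8 KB default. Below is a minimal Java sketch of that idea. It is not the code from this patch. The config key `spill.reader.buffer.size.bytes`, the plain `Map<String, String>` standing in for Spark's configuration, the 16 MB upper bound, and the class and method names are all hypothetical. Only the 1 MB default comes from the PR description.

```java
import java.io.BufferedInputStream;
import java.io.File;
import java.io.FileInputStream;
import java.io.IOException;
import java.io.InputStream;
import java.util.Map;

// Hypothetical sketch: how a spill reader could pick its read-buffer size from config.
final class SpillReaderBufferSketch {
  // Hypothetical config key; not necessarily the name used by Spark.
  static final String BUFFER_SIZE_KEY = "spill.reader.buffer.size.bytes";
  // 1 MB default, matching the value proposed in the PR.
  static final int DEFAULT_BUFFER_SIZE_BYTES = 1024 * 1024;
  // Hypothetical upper bound so a misconfigured value cannot allocate a huge buffer.
  static final int MAX_BUFFER_SIZE_BYTES = 16 * 1024 * 1024;

  private SpillReaderBufferSketch() {}

  /** Returns the configured size in bytes, falling back to the default when missing or invalid. */
  static int resolveBufferSize(Map<String, String> conf) {
    String raw = conf.get(BUFFER_SIZE_KEY);
    if (raw == null) {
      return DEFAULT_BUFFER_SIZE_BYTES;
    }
    try {
      long size = Long.parseLong(raw.trim());
      if (size <= 0 || size > MAX_BUFFER_SIZE_BYTES) {
        return DEFAULT_BUFFER_SIZE_BYTES; // out of range: use the default
      }
      return (int) size;
    } catch (NumberFormatException e) {
      return DEFAULT_BUFFER_SIZE_BYTES; // not a number: use the default
    }
  }

  /** Opens a spill file wrapped in a BufferedInputStream of the resolved size. */
  static InputStream openSpillFile(File file, Map<String, String> conf) throws IOException {
    return new BufferedInputStream(new FileInputStream(file), resolveBufferSize(conf));
  }
}
```

A larger buffer means fewer `read` system calls when the sorter streams a large spill file back from disk. The trade-off is extra heap memory for every spill reader that is open at the same time, which is one reason to bound the value.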