GitHub user cloud-fan opened a pull request: https://github.com/apache/spark/pull/20205
[SPARK-16060][SQL][follow-up] add a wrapper solution for vectorized orc reader

## What changes were proposed in this pull request?

This is mostly from https://github.com/apache/spark/pull/13775

The wrapper solution works well for string/binary types. The ORC column vector does not keep the bytes in a contiguous memory region, so copying the data into a Spark columnar batch has significant overhead. For other types, the wrapper solution performs about the same as the current solution.

I think we can treat the wrapper solution as a baseline and keep improving the solution that writes the data into Spark's columnar batch.

## How was this patch tested?

Existing tests.

You can merge this pull request into a Git repository by running:

$ git pull https://github.com/cloud-fan/spark orc

Alternatively, you can review and apply these changes as the patch at:

https://github.com/apache/spark/pull/20205.patch

To close this pull request, make a commit to your master/trunk branch with (at least) the following in the commit message: This closes #20205

----
commit bdf9dbfa807d3b6840f3133889d9c8ba7abc475f
Author: Wenchen Fan <wenchen@...>
Date: 2018-01-09T16:01:47Z

add a wrapper solution for vectorized orc reader

----
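The following minimal Java sketch illustrates the string/binary trade-off described in this pull request. All classes here are hypothetical stand-ins, not the real ORC or Spark classes. `OrcBytesVector` is loosely modeled on the `vector`/`start`/`length` layout of ORC's `BytesColumnVector`, in which each value is a slice of some `byte[]` that may differ from row to row. `WrappedStringVector` exposes those slices as views without copying. `CopiedStringVector` gathers every value into one contiguous array, which is the per-value copy the description identifies as the expensive part.

```java
import java.nio.ByteBuffer;
import java.nio.charset.StandardCharsets;

/** Hypothetical, simplified model of wrapping vs. copying; NOT the real ORC/Spark classes. */
public class OrcWrapperSketch {

  /** Loosely mimics ORC's BytesColumnVector: each value is a slice of some byte[]. */
  static final class OrcBytesVector {
    final byte[][] vector; // backing arrays, possibly different per row (non-contiguous)
    final int[] start;     // offset of each value within its backing array
    final int[] length;    // length of each value in bytes

    OrcBytesVector(byte[][] vector, int[] start, int[] length) {
      this.vector = vector;
      this.start = start;
      this.length = length;
    }
  }

  /** Wrapper approach: read through to the ORC data without copying any bytes. */
  static final class WrappedStringVector {
    private final OrcBytesVector orc;

    WrappedStringVector(OrcBytesVector orc) { this.orc = orc; }

    /** Returns a view that shares the original backing array. */
    ByteBuffer getBytes(int rowId) {
      return ByteBuffer.wrap(orc.vector[rowId], orc.start[rowId], orc.length[rowId]).slice();
    }
  }

  /** Copy approach: gather all values into one contiguous array plus an offsets array. */
  static final class CopiedStringVector {
    final byte[] data;
    final int[] offsets; // row i occupies data[offsets[i] .. offsets[i + 1])

    CopiedStringVector(OrcBytesVector orc, int numRows) {
      offsets = new int[numRows + 1];
      for (int i = 0; i < numRows; i++) {
        offsets[i + 1] = offsets[i] + orc.length[i];
      }
      data = new byte[offsets[numRows]];
      for (int i = 0; i < numRows; i++) {
        // One copy per value: this is the work the wrapper approach avoids.
        System.arraycopy(orc.vector[i], orc.start[i], data, offsets[i], orc.length[i]);
      }
    }

    ByteBuffer getBytes(int rowId) {
      return ByteBuffer.wrap(data, offsets[rowId], offsets[rowId + 1] - offsets[rowId]).slice();
    }
  }

  /** Decodes the remaining bytes of a buffer as UTF-8 without moving its position. */
  static String utf8(ByteBuffer buf) {
    byte[] out = new byte[buf.remaining()];
    buf.duplicate().get(out);
    return new String(out, StandardCharsets.UTF_8);
  }

  /** Three rows whose bytes live in two different, partially used arrays. */
  static OrcBytesVector sample() {
    byte[] a = "xxhelloyy".getBytes(StandardCharsets.UTF_8);
    byte[] b = "spark".getBytes(StandardCharsets.UTF_8);
    return new OrcBytesVector(new byte[][] {a, b, a}, new int[] {2, 0, 7}, new int[] {5, 5, 2});
  }

  public static void main(String[] args) {
    OrcBytesVector orc = sample();
    WrappedStringVector wrapped = new WrappedStringVector(orc);
    CopiedStringVector copied = new CopiedStringVector(orc, 3);
    for (int i = 0; i < 3; i++) {
      // Both approaches expose the same values; only the cost of building them differs.
      System.out.println(utf8(wrapped.getBytes(i)) + " " + utf8(copied.getBytes(i)));
    }
  }
}
```

Building the wrapper costs nothing per byte, whereas the copy touches every byte of every value once per batch. For fixed-width types, the ORC data is already held in a primitive array (for example, the `long[] vector` of `LongColumnVector`), so the gap between the two approaches should be much smaller. This matches the description's observation that the two solutions perform about the same for non-string types.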