[GitHub] spark pull request #20205: [SPARK-16060][SQL][follow-up] add a wrapper solut...

cloud-fan Tue, 09 Jan 2018 08:09:01 -0800

GitHub user cloud-fan opened a pull request:

    https://github.com/apache/spark/pull/20205


    [SPARK-16060][SQL][follow-up] add a wrapper solution for vectorized orc 
reader

    ## What changes were proposed in this pull request?
    
    This is mostly from https://github.com/apache/spark/pull/13775
    
    The wrapper solution works well for string/binary types: the ORC 
column vector doesn't keep bytes in a contiguous memory region, so copying 
the data to a Spark columnar batch has significant overhead. For other 
cases, the wrapper solution is almost the same as the current solution.
    
    I think we can treat the wrapper solution as a baseline and keep improving 
the solution that writes data into the Spark columnar batch.
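
    As a rough illustration of the trade-off described above, here is a minimal 
sketch of the two approaches for string data. It does not use the real ORC or 
Spark classes: `ScatteredBytesColumn`, `WrappedStringColumn` and 
`CopiedStringColumn` are hypothetical stand-ins, and null handling is left out.

```java
import java.nio.charset.StandardCharsets;

// Hypothetical stand-in for an ORC-style bytes column: each row points into
// its own backing array, so the values are not contiguous in memory.
final class ScatteredBytesColumn {
    final byte[][] vector; // backing array for each row
    final int[] start;     // offset of the value inside vector[row]
    final int[] length;    // length of the value in bytes

    ScatteredBytesColumn(byte[][] vector, int[] start, int[] length) {
        this.vector = vector;
        this.start = start;
        this.length = length;
    }
}

// Wrapper approach (hypothetical class): no bytes are moved up front;
// each read is delegated to the source column.
final class WrappedStringColumn {
    private final ScatteredBytesColumn source;

    WrappedStringColumn(ScatteredBytesColumn source) {
        this.source = source;
    }

    String getString(int rowId) {
        return new String(source.vector[rowId], source.start[rowId],
                source.length[rowId], StandardCharsets.UTF_8);
    }
}

// Copy approach (hypothetical class): every value is copied once into a
// single contiguous buffer, which is the extra cost for string/binary data.
final class CopiedStringColumn {
    private final byte[] data;
    private final int[] offsets; // row i occupies data[offsets[i] .. offsets[i + 1])

    CopiedStringColumn(ScatteredBytesColumn source, int numRows) {
        offsets = new int[numRows + 1];
        for (int i = 0; i < numRows; i++) {
            offsets[i + 1] = offsets[i] + source.length[i];
        }
        data = new byte[offsets[numRows]];
        for (int i = 0; i < numRows; i++) {
            System.arraycopy(source.vector[i], source.start[i],
                    data, offsets[i], source.length[i]);
        }
    }

    String getString(int rowId) {
        int len = offsets[rowId + 1] - offsets[rowId];
        return new String(data, offsets[rowId], len, StandardCharsets.UTF_8);
    }
}
```

    Both classes return the same values for the same rows. The difference is 
that the copy approach pays one `System.arraycopy` per row before any value 
can be read, while the wrapper reads in place.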
    
    ## How was this patch tested?
    
    Existing tests.

You can merge this pull request into a Git repository by running:

    $ git pull https://github.com/cloud-fan/spark orc

Alternatively you can review and apply these changes as the patch at:

    https://github.com/apache/spark/pull/20205.patch

To close this pull request, make a commit to your master/trunk branch
with (at least) the following in the commit message:

    This closes #20205
    
----
commit bdf9dbfa807d3b6840f3133889d9c8ba7abc475f
Author: Wenchen Fan <wenchen@...>
Date:   2018-01-09T16:01:47Z

    add a wrapper solution for vectorized orc reader

----


---

---------------------------------------------------------------------
To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org
For additional commands, e-mail: reviews-h...@spark.apache.org
