[GitHub] spark pull request #22750: [SPARK-25747][SQL] remove ColumnarBatchScan.needs...

cloud-fan Tue, 16 Oct 2018 09:06:02 -0700

GitHub user cloud-fan opened a pull request:

    https://github.com/apache/spark/pull/22750


    [SPARK-25747][SQL] remove ColumnarBatchScan.needsUnsafeRowConversion

    ## What changes were proposed in this pull request?
    
    `needsUnsafeRowConversion` is used in 2 places:
    1. `ColumnarBatchScan.produceRows`
    2. `FileSourceScanExec.doExecute`
    
    When execution reaches `ColumnarBatchScan.produceRows`, whole-stage codegen
    is on but the vectorized reader is off. The vectorized reader can be off for
    several reasons:
    1. the file format doesn't have a vectorized reader (json, csv, etc.)
    2. the vectorized reader config is off
    3. the schema is not supported
    
    In all of these cases the vectorized reader is off, and the file format
    reader then always returns unsafe rows, so
    `ColumnarBatchScan.needsUnsafeRowConversion` is not needed.
    
    When execution reaches `FileSourceScanExec.doExecute`, whole-stage codegen
    is off. In this case we still need `needsUnsafeRowConversion` to convert
    `ColumnarRow` to `UnsafeRow` if the file format reader returns batches.
    
    This PR removes `ColumnarBatchScan.needsUnsafeRowConversion` and keeps the
    flag only in `FileSourceScanExec`.
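
To make the case analysis above easier to check, here is a minimal sketch in Python that models only the reasoning in this description, not Spark's actual code. The `ScanState` class, its two boolean fields, the helper names, and the "codegen batch path" label are all hypothetical and chosen for illustration.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class ScanState:
    """Hypothetical, simplified inputs; this is not Spark's real API."""
    whole_stage_codegen: bool   # assumption: whole-stage codegen is on
    reader_returns_batch: bool  # assumption: the file reader returns batches


def entry_point(state: ScanState) -> str:
    """Name the code path described in the PR text for a given state."""
    if not state.whole_stage_codegen:
        # Codegen off: the scan goes through the non-codegen execute path.
        return "FileSourceScanExec.doExecute"
    if state.reader_returns_batch:
        # Codegen on and batches available: rows are never materialized here.
        return "codegen batch path"
    # Codegen on, vectorized reader off: the reader yields unsafe rows.
    return "ColumnarBatchScan.produceRows"


def needs_unsafe_row_conversion(state: ScanState) -> bool:
    """Conversion is only needed when codegen is off and batches come back."""
    return (not state.whole_stage_codegen) and state.reader_returns_batch


if __name__ == "__main__":
    for codegen in (True, False):
        for batch in (True, False):
            s = ScanState(codegen, batch)
            print(s, entry_point(s), needs_unsafe_row_conversion(s))
```

In this model the conversion is never required on the `ColumnarBatchScan.produceRows` path, which is why the flag can live in `FileSourceScanExec` alone.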
    
    ## How was this patch tested?
    
    Existing tests.


You can merge this pull request into a Git repository by running:

    $ git pull https://github.com/cloud-fan/spark minor

Alternatively you can review and apply these changes as the patch at:

    https://github.com/apache/spark/pull/22750.patch

To close this pull request, make a commit to your master/trunk branch
with (at least) the following in the commit message:

    This closes #22750
    
----
commit 27e6b974192596baa86ac8b38b28c56e65e3c184
Author: Wenchen Fan <wenchen@...>
Date:   2018-10-16T15:55:06Z

    remove ColumnarBatchScan.needsUnsafeRowConversion

----


---

---------------------------------------------------------------------
To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org
For additional commands, e-mail: reviews-h...@spark.apache.org
