GitHub user cloud-fan opened a pull request: https://github.com/apache/spark/pull/22750
[SPARK-25747][SQL] remove ColumnarBatchScan.needsUnsafeRowConversion

## What changes were proposed in this pull request?

`needsUnsafeRowConversion` is used in 2 places:
1. `ColumnarBatchScan.produceRows`
2. `FileSourceScanExec.doExecute`

When we go to `ColumnarBatchScan.produceRows`, it means whole stage codegen is on but the vectorized reader is off. The vectorized reader can be off for several reasons:
1. the file format doesn't have a vectorized reader (json, csv, etc.)
2. the vectorized reader config is off
3. the schema is not supported

In all of these cases the file format reader always returns unsafe rows, so `ColumnarBatchScan.needsUnsafeRowConversion` is not needed.

When we go to `FileSourceScanExec.doExecute`, it means whole stage codegen is off. In this case we still need `needsUnsafeRowConversion` to convert `ColumnarRow` to `UnsafeRow` if the file format reader returns batches.

This PR removes `ColumnarBatchScan.needsUnsafeRowConversion` and keeps the flag only in `FileSourceScanExec`.

## How was this patch tested?

existing tests

You can merge this pull request into a Git repository by running:

    $ git pull https://github.com/cloud-fan/spark minor

Alternatively you can review and apply these changes as the patch at:

    https://github.com/apache/spark/pull/22750.patch

To close this pull request, make a commit to your master/trunk branch
with (at least) the following in the commit message:

    This closes #22750

----
commit 27e6b974192596baa86ac8b38b28c56e65e3c184
Author: Wenchen Fan <wenchen@...>
Date:   2018-10-16T15:55:06Z

    remove ColumnarBatchScan.needsUnsafeRowConversion

----
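The reasoning in the description can be sketched as a small decision function. This is only an illustration of the argument as stated in this PR, not Spark's actual code: `ScanConfig`, `ReaderOutput`, `readerOutput`, and the standalone `needsUnsafeRowConversion` function below are hypothetical names invented for the sketch, and the model assumes the reader returns batches exactly when all three vectorized-reader conditions hold.

```scala
// Hypothetical, simplified model of the PR description's reasoning.
// None of these types exist in Spark under these names.
object ScanConversionSketch {
  sealed trait ReaderOutput
  case object UnsafeRows extends ReaderOutput      // row-based reader output
  case object ColumnarBatches extends ReaderOutput // vectorized reader output

  final case class ScanConfig(
      formatHasVectorizedReader: Boolean, // false for json, csv, etc.
      vectorizedReaderEnabled: Boolean,   // the vectorized reader config
      schemaSupported: Boolean,           // schema usable by the vectorized reader
      wholeStageCodegen: Boolean)

  // The vectorized reader is "off" if any of the three conditions fails;
  // in that case the file format reader returns unsafe rows.
  def readerOutput(c: ScanConfig): ReaderOutput =
    if (c.formatHasVectorizedReader && c.vectorizedReaderEnabled && c.schemaSupported) {
      ColumnarBatches
    } else {
      UnsafeRows
    }

  // Conversion is only needed on the non-codegen path (doExecute in the PR),
  // and only when the reader hands back batches rather than unsafe rows.
  def needsUnsafeRowConversion(c: ScanConfig): Boolean =
    !c.wholeStageCodegen && readerOutput(c) == ColumnarBatches
}
```

Under this model, every configuration that reaches the `produceRows` case (codegen on, vectorized reader off) yields `false`, which is the argument for dropping the flag from `ColumnarBatchScan`.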