tokoko opened a new pull request, #3520:

URL: https://github.com/apache/datafusion-comet/pull/3520
Closes #3518

## What changes are included in this PR?

- Introduces a new `tryZeroCopyConvert` method in `CometArrowConverters`. It accepts a `ColumnarBatch` of any type. If the input is composed of `ArrowColumnVector` objects, it returns a `ColumnarBatch` of `CometVector` objects without copying the data. Otherwise it returns `None`.
- The columnar conversion path in `CometSparkToColumnarExec` always tries `tryZeroCopyConvert` first. It falls back to the existing flow when zero-copy conversion is not possible.
- The implementation **ignores the `batchSize` configuration**. Honoring it with zero-copy would be considerably more involved, and I think zero-copy matters more here, especially if you assume that whichever operator produces the input has a similar configuration of its own. Happy to change the implementation if you disagree, though.

## How are these changes tested?

- Added tests that convert hand-crafted `ColumnarBatch` objects, since no out-of-the-box data source in Spark produces a `ColumnarBatch` of `ArrowColumnVector` objects.
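The zero-copy-first conversion described above can be modeled in a few lines of plain Scala. This is a minimal sketch of the control flow, not the PR's code. Only the name `tryZeroCopyConvert` comes from the PR. `ColumnVec`, `ArrowVec`, `HeapVec`, `CometVec`, `Batch`, `copyConvert` and `convert` are hypothetical stand-ins for Spark's `ColumnVector`, `ArrowColumnVector` and `ColumnarBatch`, Comet's `CometVector`, and the existing copying path. The stand-ins are used so that the example runs without Spark or Arrow on the classpath.

```scala
// Hypothetical stand-ins, NOT the real Spark/Comet classes:
//   ColumnVec ~ Spark's ColumnVector
//   ArrowVec  ~ Spark's ArrowColumnVector (column backed by an Arrow buffer)
//   HeapVec   ~ any other ColumnVector implementation
//   CometVec  ~ Comet's CometVector
//   Batch     ~ Spark's ColumnarBatch
sealed trait ColumnVec
final class ArrowVec(val buffer: Array[Int]) extends ColumnVec
final class HeapVec(val values: Array[Int]) extends ColumnVec
final class CometVec(val buffer: Array[Int]) extends ColumnVec

final class Batch(val columns: Seq[ColumnVec], val numRows: Int)

object ZeroCopySketch {
  // Returns Some(batch of CometVec) only if every column is Arrow-backed.
  // Each CometVec reuses the Arrow buffer by reference (no copy), and the
  // row count passes through unchanged (no re-batching to a batch size).
  def tryZeroCopyConvert(batch: Batch): Option[Batch] = {
    val wrapped: Seq[ColumnVec] =
      batch.columns.collect { case a: ArrowVec => new CometVec(a.buffer) }
    if (wrapped.size == batch.columns.size) Some(new Batch(wrapped, batch.numRows))
    else None
  }

  // Hypothetical stand-in for the existing (copying) conversion path:
  // every column is copied into a fresh buffer.
  def copyConvert(batch: Batch): Batch = {
    val copied: Seq[ColumnVec] = batch.columns.map {
      case a: ArrowVec => new CometVec(a.buffer.clone())
      case h: HeapVec  => new CometVec(h.values.clone())
      case c: CometVec => new CometVec(c.buffer.clone())
    }
    new Batch(copied, batch.numRows)
  }

  // Columnar path: try zero-copy first, fall back to the copying path.
  def convert(batch: Batch): Batch =
    tryZeroCopyConvert(batch).getOrElse(copyConvert(batch))
}
```

In this model, an all-Arrow batch comes out holding the very same array references and the same row count. This mirrors the choice to ignore `batchSize`. A batch with even one non-Arrow column takes the copying fallback.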
