andygrove opened a new issue, #745: URL: https://github.com/apache/datafusion-comet/issues/745
### What is the problem the feature request solves?

It is very common to have scan -> filter as inputs to a join. Copying data in the filter can be expensive when the batch contains strings and complex types, and the result of the filter is discarded after the join.

I believe it would be more efficient for the join to use a selection vector to read its inputs directly from the scanned batch, rather than performing a separate filter.

This issue tracks the work to create a small prototype to demonstrate this. If successful, we can discuss making changes in upstream DataFusion to add support for a new `ColumnarValue::ArrayWithSelectionVector`, and then add a specialization in SortMergeJoin to take advantage of it.

### Describe the potential solution

_No response_

### Additional context

_No response_
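
To make the idea concrete, here is a minimal sketch in plain Rust using only the standard library. It does not use the Arrow or DataFusion APIs, and `Batch`, `SelectedBatch`, `filter_copy`, `selection_vector`, and `join_selected` are all hypothetical names invented for illustration. `SelectedBatch` is only a rough analogue of the proposed `ColumnarValue::ArrayWithSelectionVector`, which does not exist upstream yet. The nested-loop join stands in for SortMergeJoin purely to keep the example short.

```rust
/// Toy columnar batch: an i64 key column and a string payload column.
/// Hypothetical type for illustration; not an Arrow RecordBatch.
#[derive(Debug, Clone, PartialEq)]
struct Batch {
    keys: Vec<i64>,
    payload: Vec<String>,
}

/// Eager filter: copies every surviving row, including the string data.
fn filter_copy(batch: &Batch, predicate: impl Fn(i64) -> bool) -> Batch {
    let mut out = Batch { keys: Vec::new(), payload: Vec::new() };
    for (k, p) in batch.keys.iter().zip(&batch.payload) {
        if predicate(*k) {
            out.keys.push(*k);
            out.payload.push(p.clone()); // string copy for every kept row
        }
    }
    out
}

/// Selection vector: records only the indices of surviving rows.
fn selection_vector(batch: &Batch, predicate: impl Fn(i64) -> bool) -> Vec<usize> {
    batch
        .keys
        .iter()
        .enumerate()
        .filter(|(_, k)| predicate(**k))
        .map(|(i, _)| i)
        .collect()
}

/// A borrowed batch plus the indices of the rows that passed the filter.
struct SelectedBatch<'a> {
    batch: &'a Batch,
    selection: Vec<usize>,
}

impl<'a> SelectedBatch<'a> {
    /// Iterate over selected rows without copying any payload strings.
    fn rows(&self) -> impl Iterator<Item = (i64, &'a str)> + '_ {
        let batch = self.batch;
        self.selection
            .iter()
            .map(move |&i| (batch.keys[i], batch.payload[i].as_str()))
    }
}

/// Equi-join that reads both inputs through their selection vectors.
/// Strings are copied only for rows that appear in the join output.
fn join_selected(left: &SelectedBatch<'_>, right: &SelectedBatch<'_>) -> Vec<(i64, String, String)> {
    let mut out = Vec::new();
    for (lk, lp) in left.rows() {
        for (rk, rp) in right.rows() {
            if lk == rk {
                out.push((lk, lp.to_string(), rp.to_string()));
            }
        }
    }
    out
}

fn main() {
    let strings = |v: &[&str]| v.iter().map(|s| s.to_string()).collect::<Vec<_>>();
    let left = Batch { keys: vec![1, 2, 3, 4], payload: strings(&["a", "b", "c", "d"]) };
    let right = Batch { keys: vec![2, 4, 6], payload: strings(&["x", "y", "z"]) };

    // Selection-vector path: filters produce indices only.
    let l = SelectedBatch { batch: &left, selection: selection_vector(&left, |k| k % 2 == 0) };
    let r = SelectedBatch { batch: &right, selection: selection_vector(&right, |k| k > 2) };
    let lazy = join_selected(&l, &r);

    // Eager path: filters copy the rows, then the join reads the copies.
    let lf = filter_copy(&left, |k| k % 2 == 0);
    let rf = filter_copy(&right, |k| k > 2);
    let all = |b: &Batch| (0..b.keys.len()).collect::<Vec<_>>();
    let eager = join_selected(
        &SelectedBatch { batch: &lf, selection: all(&lf) },
        &SelectedBatch { batch: &rf, selection: all(&rf) },
    );

    assert_eq!(lazy, eager);
    println!("{:?}", lazy);
}
```

Both paths produce the same join output. The difference is that the eager path clones the payload string of every row that survives the filter, while the selection-vector path clones strings only for rows that actually match in the join.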