karlovnv commented on issue #7000: URL: https://github.com/apache/datafusion/issues/7000#issuecomment-2096605072
> I wonder if we could combine this with something like #7955 🤔

It's quite a good idea! But I think it's tricky to push the ON condition down. The main reason is the following: we know the list of ids (in terms of the column index) only at the JOIN stage, not at the stage where we filter and read data from the source.

So the second approach:

> 2\. Introduce row-based table provider

is about adding the ability to get data directly from the HASH stream by a list of ids, like so:

<img width="662" alt="image" src="https://github.com/apache/datafusion/assets/3950601/d3bb46bf-ece3-4519-95e2-6b6b0f33e505">

Or, even better, to get only the offsets by ids (an Arrow `take` index for the `take_record_batch()` kernel).

This idea is very similar to indices in DuckDB.
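To make the "offsets by ids" idea concrete, here is a minimal sketch in plain Rust. It is only a toy model: `Batch`, `RowIndex`, `offsets_for` and `take` are hypothetical names made up for illustration, plain `Vec`s stand in for Arrow arrays, and ids are assumed to be unique. It is not the DataFusion or Arrow API, just the shape of the lookup: resolve ids to row offsets first, then gather the rows at those offsets.

```rust
use std::collections::HashMap;

/// Toy stand-in for a record batch: two parallel columns (hypothetical type).
struct Batch {
    ids: Vec<u64>,
    names: Vec<String>,
}

/// Hypothetical row-based index: maps an id to its row offset in the batch.
/// Assumes ids are unique; with duplicates, the last row would win.
struct RowIndex {
    offsets: HashMap<u64, usize>,
}

impl RowIndex {
    /// Build the id -> offset map with one pass over the id column.
    fn build(batch: &Batch) -> Self {
        let offsets: HashMap<u64, usize> = batch
            .ids
            .iter()
            .enumerate()
            .map(|(row, &id)| (id, row))
            .collect();
        RowIndex { offsets }
    }

    /// Resolve ids to row offsets, in the order the ids were requested.
    /// Ids that are not present in the index are skipped.
    fn offsets_for(&self, ids: &[u64]) -> Vec<usize> {
        ids.iter()
            .filter_map(|id| self.offsets.get(id).copied())
            .collect()
    }
}

/// Gather the rows at the given offsets into a new batch.
/// This plays the role that a `take`-style kernel would play on real arrays.
fn take(batch: &Batch, offsets: &[usize]) -> Batch {
    Batch {
        ids: offsets.iter().map(|&i| batch.ids[i]).collect(),
        names: offsets.iter().map(|&i| batch.names[i].clone()).collect(),
    }
}
```

In this sketch the join side would call `offsets_for` with the ids it discovered at the JOIN stage and then pass the result to `take`, so only the matching rows are materialized instead of scanning and filtering the whole source.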
