returnString opened a new pull request #8917: URL: https://github.com/apache/arrow/pull/8917
I've got a use case for this with a custom TableProvider implementation, so thought I'd give this a go :)

This PR allows TableProviders to optionally indicate that they support handling filter expressions either:

- Inexactly, to simply optimise data retrieval in an approximate fashion; e.g. pruning in your classic chunked storage system with min/max column metadata stored per chunk
- Exactly, in which case the relevant filter plan nodes can be optimised out entirely

Some preemptive concerns from my side:

- Most of these concepts could probably have better names, open to suggestions here.
- I'm not sure whether expressions are the correct thing to be pushing down to the provider.
- I've had to update quite a few `scan` callsites with empty filter lists. Could this be handled in a better way?
- Currently, only table scans using TableSource::FromProvider are supported, because we need a reference to the provider at optimisation time. #8910 removes the provider/name-based reference distinction entirely, so I can rebase this once that's merged and add an extra test using an ordinary SQL statement, rather than just a `ctx.read_table(provider)` call.

I'd appreciate any thoughts or feedback!

----------------------------------------------------------------
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

For queries about this service, please contact Infrastructure at:
[email protected]
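
To make the inexact/exact distinction described in the PR concrete, here is a minimal, self-contained sketch. It does not use the real DataFusion API: `FilterSupport`, `InRange`, `Provider`, `ChunkedProvider`, `ExactProvider`, and `run_query` are all hypothetical names invented for illustration, and the "filter" is a toy inclusive range predicate rather than a real expression.

```rust
/// Hypothetical: how well a provider can handle a pushed-down filter.
#[allow(dead_code)]
#[derive(Debug, Clone, Copy, PartialEq)]
enum FilterSupport {
    Unsupported, // provider ignores the filter
    Inexact,     // provider may return extra rows; the filter node must stay
    Exact,       // provider returns only matching rows; the filter node can go
}

/// Hypothetical toy filter: `lo <= value <= hi`.
#[derive(Debug, Clone, Copy, PartialEq)]
struct InRange {
    lo: i64,
    hi: i64,
}

/// Hypothetical stand-in for a table provider over a single i64 column.
trait Provider {
    fn filter_support(&self, filter: &InRange) -> FilterSupport;
    fn scan(&self, filters: &[InRange]) -> Vec<i64>;
}

/// A chunk of rows with min/max metadata for the column.
struct Chunk {
    min: i64,
    max: i64,
    values: Vec<i64>,
}

/// Inexact support: prunes whole chunks by min/max, never individual rows.
struct ChunkedProvider {
    chunks: Vec<Chunk>,
}

impl Provider for ChunkedProvider {
    fn filter_support(&self, _filter: &InRange) -> FilterSupport {
        FilterSupport::Inexact
    }

    fn scan(&self, filters: &[InRange]) -> Vec<i64> {
        self.chunks
            .iter()
            // Keep a chunk only if its [min, max] overlaps every filter range.
            .filter(|c| filters.iter().all(|f| c.max >= f.lo && c.min <= f.hi))
            .flat_map(|c| c.values.iter().copied())
            .collect()
    }
}

/// Exact support: applies the filter to every row itself.
struct ExactProvider {
    values: Vec<i64>,
}

impl Provider for ExactProvider {
    fn filter_support(&self, _filter: &InRange) -> FilterSupport {
        FilterSupport::Exact
    }

    fn scan(&self, filters: &[InRange]) -> Vec<i64> {
        self.values
            .iter()
            .copied()
            .filter(|v| filters.iter().all(|f| *v >= f.lo && *v <= f.hi))
            .collect()
    }
}

/// Stand-in for the optimiser plus executor: pushes the filter down when the
/// provider supports it, and keeps the filter step unless support is exact.
/// Returns the resulting rows and whether the filter step was kept.
fn run_query(provider: &dyn Provider, filter: InRange) -> (Vec<i64>, bool) {
    let support = provider.filter_support(&filter);
    let pushed = match support {
        FilterSupport::Unsupported => vec![],
        _ => vec![filter],
    };
    let rows = provider.scan(&pushed);
    let keep_filter_node = support != FilterSupport::Exact;
    let rows = if keep_filter_node {
        rows.into_iter()
            .filter(|v| *v >= filter.lo && *v <= filter.hi)
            .collect()
    } else {
        rows
    };
    (rows, keep_filter_node)
}
```

In this sketch, both providers yield the same final rows for the same filter; the only difference is whether the separate filter step still has to run after the scan, which mirrors the "optimised out entirely" case in the description above.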
