alamb opened a new pull request, #7537: URL: https://github.com/apache/arrow-rs/pull/7537
Draft until:
- [ ] Change from RowSelection to an `enum`
- [ ] Write unit tests

# Which issue does this PR close?

This is a step towards implementing adaptive parquet filter selections:
- #5523

# Rationale for this change

Part of the idea of adaptive decoding is the need for different read strategies based on the pattern of rows selected.

The current code mixes:
1. The determination of the exact read/skip pattern
2. The actual decoding of the rows

This makes it hard to add complexity to determining the read/skip pattern. For example, @zhuqi-lucas had to put the bitmap selection logic in the middle of the decoder here:
- https://github.com/apache/arrow-rs/pull/7524

Similarly to the way the `filter` kernel decides up front how to scan, I think we should also change the parquet reader to determine what to do up front and then just do it during decode. Splitting the planning from the execution also gives us a place to generate (and unit test) various heuristics for the plan.

Changes:
1. Move the calculation of when to read/emit rows into `ReadPlan` construction
2. Decode simply

There is no change in behavior intended: the selection evaluation is not yet adaptive. This is meant to be a pure refactoring. I have added tests and a test framework to make it easier to make this adaptive in the future.

# What changes are included in this PR?

# Are there any user-facing changes?
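As an illustration of the plan/execute split described in the rationale, here is a minimal, self-contained sketch. It is not code from this PR and does not use the parquet crate: `Step`, `Plan`, `build_plan`, and `execute` are hypothetical names, a plain `&[i32]` slice stands in for a column decoder, and the batching rule (cut a batch once `batch_size` rows have been read) is an assumption made for the example.

```rust
/// Hypothetical stand-in for one run of a row selection.
#[derive(Debug, Clone, Copy, PartialEq)]
enum Step {
    /// Decode and emit this many rows.
    Read(usize),
    /// Skip this many rows without emitting them.
    Skip(usize),
}

/// Hypothetical plan: all read/skip decisions are made before decoding starts.
#[derive(Debug, PartialEq)]
struct Plan {
    /// Each inner Vec holds the steps that produce one output batch.
    batches: Vec<Vec<Step>>,
}

/// Planning phase: split a selection into per-batch steps so that each
/// batch reads at most `batch_size` rows. No data is touched here, which
/// is what makes the result easy to unit test on its own.
fn build_plan(selection: &[Step], batch_size: usize) -> Plan {
    assert!(batch_size > 0, "batch_size must be non-zero");
    let mut batches = Vec::new();
    let mut current = Vec::new();
    let mut rows_in_batch = 0;
    for &step in selection {
        match step {
            Step::Skip(n) => {
                if n > 0 {
                    current.push(Step::Skip(n));
                }
            }
            Step::Read(mut n) => {
                // A long read run may span several output batches.
                while n > 0 {
                    let take = n.min(batch_size - rows_in_batch);
                    current.push(Step::Read(take));
                    rows_in_batch += take;
                    n -= take;
                    if rows_in_batch == batch_size {
                        batches.push(std::mem::take(&mut current));
                        rows_in_batch = 0;
                    }
                }
            }
        }
    }
    // Emit a final partial batch; trailing skips with no reads are dropped.
    if rows_in_batch > 0 {
        batches.push(current);
    }
    Plan { batches }
}

/// Execution phase: follow the plan step by step, with no decisions left.
fn execute(plan: &Plan, data: &[i32]) -> Vec<Vec<i32>> {
    let mut pos = 0;
    let mut out = Vec::new();
    for steps in &plan.batches {
        let mut batch = Vec::new();
        for step in steps {
            match *step {
                Step::Skip(n) => pos += n,
                Step::Read(n) => {
                    batch.extend_from_slice(&data[pos..pos + n]);
                    pos += n;
                }
            }
        }
        out.push(batch);
    }
    out
}
```

With this shape, a heuristic for choosing a different strategy would live entirely in `build_plan` and could be tested by asserting on the returned `Plan`, without running any decoder.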