alamb opened a new pull request, #7537:
URL: https://github.com/apache/arrow-rs/pull/7537

   Draft until
   - [ ] Change from RowSelection to an `enum`
   - [ ] Write unit tests
   
   # Which issue does this PR close?
   
   
   This is a step towards implementing Adaptive parquet filter selections:
    - #5523
    
   # Rationale for this change
    
   Adaptive decoding needs different read strategies depending on the pattern 
of rows selected.
   
   The current code mixes two concerns:
   1. Determining the exact read/skip pattern
   2. Actually decoding the rows
   
   This makes it hard to add complexity to determining the read/skip pattern. 
For example, @zhuqi-lucas had to put the Bitmap selection logic in the middle 
of the decoder here:
   - https://github.com/apache/arrow-rs/pull/7524
   
   Similarly to the way the `filter` kernel decides up front how to scan, I 
think we should also change the parquet reader to determine what to do up front 
and then just do it during decode.
   
   Splitting the planning from the execution also gives us a place to generate 
(and unit test) various heuristics for the plan.
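
   To illustrate what a unit-testable planning heuristic could look like, here 
is a minimal sketch. Everything in it is hypothetical: the `ReadStrategy` enum, 
the `choose_strategy` function, the `(row_count, skip)` tuple representation, 
and the run-length threshold of 32 are assumptions made for the example, not 
code from this PR or the parquet crate.

```rust
/// Hypothetical read strategies (illustrative names, not the parquet crate's API).
#[derive(Debug, PartialEq, Clone, Copy)]
enum ReadStrategy {
    /// No rows are selected, so nothing needs to be decoded.
    SkipAll,
    /// Every row is selected, so decode everything.
    ReadAll,
    /// Runs are long: alternate between skipping and reading ranges.
    Ranges,
    /// Runs are short and scattered: decode densely and apply a mask.
    Mask,
}

/// Assumed cutoff for "long" runs; a real heuristic would be tuned by benchmarks.
const LONG_RUN_THRESHOLD: usize = 32;

/// Choose a strategy from a list of `(row_count, skip)` runs.
/// This is a pure function of the selection, so it can be tested in isolation.
fn choose_strategy(runs: &[(usize, bool)]) -> ReadStrategy {
    let total: usize = runs.iter().map(|r| r.0).sum();
    let selected: usize = runs.iter().filter(|r| !r.1).map(|r| r.0).sum();

    if selected == 0 {
        return ReadStrategy::SkipAll;
    }
    if selected == total {
        return ReadStrategy::ReadAll;
    }

    // `runs` is non-empty here because `selected > 0`.
    let average_run_length = total / runs.len();
    if average_run_length >= LONG_RUN_THRESHOLD {
        ReadStrategy::Ranges
    } else {
        ReadStrategy::Mask
    }
}
```

   Because the decision depends only on the selection, a test can feed in 
hand-written patterns and check the chosen strategy without decoding any data.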
    
   
   Changes:
   
   1. Move the calculation of when to read/emit rows into `ReadPlan` construction
   2. Decoding simply follows the plan
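
   The two steps above can be sketched roughly as follows. This is a 
simplified stand-in, not the PR's actual code: the `Step` enum, the fields of 
this `ReadPlan`, the `(row_count, skip)` tuples, and the `decode` function over 
a plain `i32` slice are all hypothetical and exist only to show planning done 
up front and decoding reduced to following the plan.

```rust
use std::collections::VecDeque;

/// One instruction in a plan (hypothetical representation).
#[derive(Debug, PartialEq, Clone, Copy)]
enum Step {
    /// Skip this many rows without decoding them.
    Skip(usize),
    /// Decode this many rows into the current batch.
    Read(usize),
    /// Emit the rows buffered so far as one output batch.
    Emit,
}

/// A precomputed sequence of steps (a stand-in for the PR's `ReadPlan`).
struct ReadPlan {
    steps: VecDeque<Step>,
}

impl ReadPlan {
    /// Work out every read/skip/emit decision before any decoding happens.
    /// `runs` holds `(row_count, skip)` pairs; batches hold at most `batch_size` rows.
    fn new(runs: &[(usize, bool)], batch_size: usize) -> Self {
        assert!(batch_size > 0, "batch_size must be non-zero");
        let mut steps = VecDeque::new();
        let mut buffered = 0;
        for &(mut count, skip) in runs {
            if skip {
                if count > 0 {
                    steps.push_back(Step::Skip(count));
                }
                continue;
            }
            // Split a selected run wherever it crosses a batch boundary.
            while count > 0 {
                let n = count.min(batch_size - buffered);
                steps.push_back(Step::Read(n));
                buffered += n;
                count -= n;
                if buffered == batch_size {
                    steps.push_back(Step::Emit);
                    buffered = 0;
                }
            }
        }
        if buffered > 0 {
            steps.push_back(Step::Emit); // final, possibly short, batch
        }
        ReadPlan { steps }
    }
}

/// Decoding makes no decisions of its own; it only executes the steps.
fn decode(plan: ReadPlan, rows: &[i32]) -> Vec<Vec<i32>> {
    let mut batches = Vec::new();
    let mut current = Vec::new();
    let mut pos = 0;
    for step in plan.steps {
        match step {
            Step::Skip(n) => pos += n,
            Step::Read(n) => {
                current.extend_from_slice(&rows[pos..pos + n]);
                pos += n;
            }
            Step::Emit => batches.push(std::mem::take(&mut current)),
        }
    }
    batches
}
```

   In this sketch the plan can be inspected and asserted on by itself, which is 
where adaptive heuristics could later be added without touching `decode`.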
   
   There is no change in behavior intended -- the selection evaluation is not 
yet adaptive. This is meant to be a pure refactoring. I have added tests and a 
test framework to make it easier to make this adaptive in the future.
   
   # What changes are included in this PR?
   
   
   # Are there any user-facing changes?
   
   
   

