alamb opened a new issue, #8844:
URL: https://github.com/apache/arrow-rs/issues/8844

   **Is your feature request related to a problem or challenge? Please describe 
what you are trying to do.**
   
   After the great work from @hhhizzz in 
https://github.com/apache/arrow-rs/pull/8733, we will (finally) have the 
ability to use a Bitmask filter representation when applying filters *during* 
Parquet decode. 
   
   https://github.com/apache/arrow-rs/pull/8733 automatically converts an 
existing 
[`RowSelection`](https://docs.rs/parquet/latest/parquet/arrow/arrow_reader/struct.RowSelection.html)
 (aka a `Vec<RowSelector>` of ranges) into a bitmask for evaluation. 
   
   However, at the moment, when a filter is initially evaluated, it is *always* 
converted from Bitmask --> 
[`RowSelection`](https://docs.rs/parquet/latest/parquet/arrow/arrow_reader/struct.RowSelection.html)
 here:
   
https://github.com/apache/arrow-rs/blob/911331aafa13f5e230440cf5d02feb245985c64e/parquet/src/arrow/arrow_reader/read_plan.rs#L168-L167
   
   This is inefficient when a Bitmask is converted to a RowSelection only to 
be turned back into a Bitmask for evaluation.
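
   To make the round trip concrete, here is a minimal sketch of the two 
conversions. It is not the parquet crate's code: `Run` is a hypothetical 
stand-in for `RowSelector`, and a plain `&[bool]` stands in for the Arrow 
boolean mask.

```rust
/// Hypothetical stand-in for parquet's `RowSelector`: a run of consecutive
/// rows that are either all selected or all skipped.
#[derive(Debug, Clone, PartialEq, Eq)]
pub struct Run {
    pub row_count: usize,
    pub skip: bool,
}

/// Mask --> Selection: run-length encode the mask.
pub fn mask_to_runs(mask: &[bool]) -> Vec<Run> {
    let mut runs: Vec<Run> = Vec::new();
    for &selected in mask {
        match runs.last_mut() {
            // Same kind as the current run: extend it.
            Some(last) if last.skip == !selected => last.row_count += 1,
            // Otherwise start a new run.
            _ => runs.push(Run { row_count: 1, skip: !selected }),
        }
    }
    runs
}

/// Selection --> Mask: expand each run back into one bool per row.
pub fn runs_to_mask(runs: &[Run]) -> Vec<bool> {
    let total: usize = runs.iter().map(|r| r.row_count).sum();
    let mut mask = Vec::with_capacity(total);
    for run in runs {
        mask.extend(std::iter::repeat(!run.skip).take(run.row_count));
    }
    mask
}
```

   Calling `runs_to_mask(&mask_to_runs(&mask))` returns the original mask, so 
when the decoder wants a mask anyway, both passes are pure overhead.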
   
   **Describe the solution you'd like**
   Add a way to avoid converting the result of evaluating predicates from a 
Mask --> Selection.
   
   I think the tricky bit will be to quickly look at a Mask and determine if it 
should be turned back into a Selection (probably we can use the same heuristics 
that @hhhizzz added in https://github.com/apache/arrow-rs/pull/8733 for going 
the other way).
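
   As a rough illustration of such a check, the sketch below counts runs in a 
mask in a single pass and keeps the mask when the average run is short. This is 
only an assumption about what a heuristic could look like: the names 
`Representation`, `count_runs`, and `choose_representation` are hypothetical, 
the threshold is a caller-supplied placeholder, and none of it is taken from 
https://github.com/apache/arrow-rs/pull/8733.

```rust
/// Which representation to hand to the decoder (hypothetical type).
#[derive(Debug, PartialEq, Eq)]
pub enum Representation {
    Mask,
    Selection,
}

/// Number of maximal runs of equal values in the mask.
pub fn count_runs(mask: &[bool]) -> usize {
    if mask.is_empty() {
        return 0;
    }
    // Every position where adjacent values differ starts a new run.
    1 + mask.windows(2).filter(|w| w[0] != w[1]).count()
}

/// Keep the mask when runs are short (fragmented selection); convert to a
/// run-length selection only when the average run length reaches
/// `min_avg_run_len`. The threshold is an assumed tuning knob.
pub fn choose_representation(mask: &[bool], min_avg_run_len: usize) -> Representation {
    let runs = count_runs(mask);
    // An empty mask needs no conversion. Otherwise compare without integer
    // division: len / runs >= threshold  <=>  len >= runs * threshold.
    if runs > 0 && mask.len() >= runs * min_avg_run_len {
        Representation::Selection
    } else {
        Representation::Mask
    }
}
```

   A real implementation would presumably work on the packed Arrow bitmap 
rather than a `&[bool]`, but the decision logic would have the same shape.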
   
   **Describe alternatives you've considered**
   
   **Additional context**
   

