ethe opened a new issue, #7348: URL: https://github.com/apache/arrow-rs/issues/7348
**Is your feature request related to a problem or challenge? Please describe what you are trying to do.**

So far, the Parquet Arrow reader provides two kinds of conditional retrieval/filtering:

- Row selection: offers select and skip operations based on row offsets, which can be pushed down to in-memory row group fetches.
- Row filter: only applies to record batches that have been read into memory and cannot be pushed down to in-memory row group fetches. Therefore, it cannot be used to skip fetching column chunks / pages that do not match the filter conditions.

The Parquet format's statistics include min/max values, and the optionally enabled sparse index information can be used to accelerate random reads and avoid unnecessary disk fetches. However, the row selection mechanism only supports operations on row offsets. The reader lacks an API that lets users declare filter conditions that can be pushed down into the fetch behavior, and skipping column chunks / pages that do not match the filter conditions based on the index has not been implemented.

**Describe the solution you'd like**

Add a third kind of method in addition to selection and filter. This new method would allow users to specify an exact match for a column's value or a range of values, and would use the index during in-memory row group fetches. This would avoid reading column chunks / pages that do not meet the filter conditions and improve random read efficiency.
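To make the request concrete, here is a minimal sketch of the pruning step such a method might perform. It assumes an integer column and an inclusive range condition. `PageStats`, `Run`, and `prune_pages` are hypothetical names made up for this illustration; they are not part of the `parquet` crate, and a real implementation would take its min/max values from the file's index metadata.

```rust
/// Hypothetical per-page statistics (illustration only, not a parquet crate type).
#[derive(Debug, Clone, Copy)]
struct PageStats {
    min: i64,
    max: i64,
    row_count: usize,
}

/// Hypothetical run of consecutive rows that is either read or skipped.
#[derive(Debug, Clone, Copy, PartialEq, Eq)]
struct Run {
    row_count: usize,
    skip: bool,
}

/// Turn the condition `lo <= value <= hi` into select/skip runs,
/// using only the min/max statistics of each page.
fn prune_pages(pages: &[PageStats], lo: i64, hi: i64) -> Vec<Run> {
    let mut runs: Vec<Run> = Vec::new();
    for page in pages {
        if page.row_count == 0 {
            continue;
        }
        // A page can hold a match only if [min, max] overlaps [lo, hi].
        let skip = page.max < lo || page.min > hi;
        // Merge with the previous run when it has the same kind.
        if let Some(last) = runs.last_mut() {
            if last.skip == skip {
                last.row_count += page.row_count;
                continue;
            }
        }
        runs.push(Run { row_count: page.row_count, skip });
    }
    runs
}

fn main() {
    let pages = [
        PageStats { min: 0, max: 9, row_count: 100 },
        PageStats { min: 10, max: 19, row_count: 100 },
        PageStats { min: 20, max: 29, row_count: 50 },
        PageStats { min: 30, max: 39, row_count: 100 },
    ];
    // Condition: value between 12 and 25 (inclusive).
    for run in prune_pages(&pages, 12, 25) {
        let kind = if run.skip { "skip" } else { "select" };
        println!("{kind} {} rows", run.row_count);
    }
    // Prints: skip 100 rows, select 150 rows, skip 100 rows.
}
```

The resulting runs have the same shape as the offset-based select/skip operations that row selection already supports, so the pages covered by a skip run would not need to be fetched. Selected pages may still contain non-matching rows, so a row filter would still be applied after decoding.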