alamb commented on code in PR #7850:
URL: https://github.com/apache/arrow-rs/pull/7850#discussion_r2208531957

##########
parquet/src/arrow/async_reader/mod.rs:
##########
@@ -613,8 +623,18 @@ where
     .fetch(&mut self.input, predicate.projection(), selection)
     .await?;

+    let mut cache_projection = predicate.projection().clone();
+    cache_projection.intersect(&projection);

Review Comment:
So one thing I didn't understand after reading this PR in detail was how the relative row positions are updated after applying a filter.

For example, if we are applying multiple filters, the first may reduce the original `RowSelection` down to `[100->200]`. When the second filter runs, it is evaluated only on rows 100->200, not on the original selection, so the row positions it produces are relative to that narrowed selection.

In other words, I think there needs to be some sort of function equivalent to `RowSelection::and_then` that applies to the cache:

```rust
// Narrow the cache so that it only retains the results of evaluating the predicate
let row_group_cache = row_group_cache.and_then(resulting_selection);
```

Maybe this is the root cause of https://github.com/apache/datafusion/actions/runs/16302299778/job/46039904381?pr=16711
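
To make the concern concrete, here is a minimal, self-contained sketch of the narrowing I have in mind. It does not use the parquet crate: `Mask`, the free function `and_then`, and `RowGroupCache` with its `values` field are hypothetical stand-ins invented for illustration, and a plain `Vec<bool>` replaces the run-length `RowSelection`. The only thing it borrows from the real API is the contract of `RowSelection::and_then`, where the second selection is expressed relative to the rows the first one selected.

```rust
/// Hypothetical, simplified stand-in for a row selection: `true` means selected.
type Mask = Vec<bool>;

/// Compose `outer` with `inner`, where `inner` has one entry per row that
/// `outer` selected (the same contract as `RowSelection::and_then`).
fn and_then(outer: &[bool], inner: &[bool]) -> Mask {
    let mut inner_iter = inner.iter();
    let mut combined = Vec::with_capacity(outer.len());
    for &selected in outer {
        // Only rows that `outer` selected consume an entry from `inner`.
        let keep = if selected {
            *inner_iter
                .next()
                .expect("inner must cover every row selected by outer")
        } else {
            false
        };
        combined.push(keep);
    }
    assert!(inner_iter.next().is_none(), "inner has extra entries");
    combined
}

/// Hypothetical cache holding one decoded value per row selected so far.
struct RowGroupCache {
    values: Vec<i64>,
}

impl RowGroupCache {
    /// Keep only the cached values whose rows survive `inner`, which is
    /// expressed relative to the rows currently held in the cache.
    fn and_then(self, inner: &[bool]) -> Self {
        assert_eq!(self.values.len(), inner.len());
        let values = self
            .values
            .into_iter()
            .zip(inner.iter())
            .filter_map(|(value, &keep)| if keep { Some(value) } else { None })
            .collect();
        RowGroupCache { values }
    }
}

fn main() {
    // A row group of 10 rows; the first predicate keeps rows 2..6.
    let first: Mask = (0..10).map(|row| (2..6).contains(&row)).collect();
    // The cache holds one value per surviving row (here, row index * 10).
    let cache = RowGroupCache {
        values: (2..6).map(|row| row * 10).collect(),
    };

    // The second predicate sees only the 4 surviving rows, so its result
    // is relative to them, not to the original 10 rows.
    let second = vec![false, true, true, false];

    let combined = and_then(&first, &second);
    let cache = cache.and_then(&second);

    // The selection and the cache stay aligned after narrowing.
    assert_eq!(combined.iter().filter(|&&s| s).count(), cache.values.len());
    println!("selection: {:?}", combined);
    println!("cache: {:?}", cache.values);
}
```

If the cache were not narrowed with the same relative selection, the next predicate would index into cached values using positions that no longer line up with the rows it is actually evaluating.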