Re: [PR] Allow swapping projections / row filters at row group boundaries in Parquet reader [arrow-rs]

via GitHub Fri, 15 May 2026 12:33:48 -0700


alamb commented on code in PR #9968:
URL: https://github.com/apache/arrow-rs/pull/9968#discussion_r3250552491



##########
parquet/src/arrow/push_decoder/mod.rs:
##########
@@ -377,6 +379,100 @@ impl ParquetPushDecoder {
     pub fn clear_all_ranges(&mut self) {
         self.state.clear_all_ranges();
     }
+
+    /// True iff the decoder is at a row-group boundary and a
+    /// [`Self::swap_strategy`] call would succeed.
+    ///
+    /// A boundary is "between row groups": the previous row group's
+    /// [`ParquetRecordBatchReader`] has been fully extracted (via
+    /// [`Self::try_next_reader`]) or fully drained (via [`Self::try_decode`]),
+    /// and the next row group has not yet been planned. While
+    /// [`Self::try_decode`] is iterating an active row group's reader this
+    /// returns `false`; with [`Self::try_next_reader`] there is a clean
+    /// window between two consecutive returns where this is `true`.
+    pub fn can_swap_strategy(&self) -> bool {
+        self.state.can_swap_strategy()
+    }
+
+    /// Number of row groups left to decode after the one currently in flight.
+    /// Useful as a "should I bother computing a new strategy?" signal.
+    pub fn row_groups_remaining(&self) -> usize {
+        self.state.row_groups_remaining()
+    }
+
+    /// Replace projection / row filter / row selection policy for subsequent
+    /// row groups.
+    ///
+    /// Returns `Err(ParquetError::General)` when called outside a row-group
+    /// boundary; check [`Self::can_swap_strategy`] first.
+    ///
+    /// The decoder's internal `PushBuffers` are preserved across the swap.
+    /// Bytes that have already been fetched for columns the new strategy
+    /// still needs will not be re-requested. Bytes for columns the new
+    /// strategy no longer needs remain buffered until
+    /// [`Self::clear_all_ranges`] is called or the decoder is dropped.
+    ///
+    /// Limit, offset, batch size, metadata, fields, and predicate-cache size
+    /// are unchanged by a swap.
+    pub fn swap_strategy(&mut self, swap: StrategySwap) -> Result<(), 
ParquetError> {
+        self.state.swap_strategy(swap)
+    }
+}
+
+/// Description of a strategy swap to apply via
+/// [`ParquetPushDecoder::swap_strategy`].
+///
+/// Each field is `Option`-wrapped so callers only override what they intend
+/// to change. Fields left as `None` carry their previous value through the
+/// swap.
+///
+/// [`Self::filter`] is doubly-`Option`-wrapped on purpose: the outer
+/// `Option` is "do you want to change the filter?", the inner is "set or
+/// clear?". `Some(None)` clears any previously-installed filter.
+#[derive(Debug, Default)]
+#[non_exhaustive]

Review Comment:
   Our comments crossed
   
   I think the way you handled it looks good to me (and the `into_parts` API is 
a bit wacky but it is an internal implementation detail)



-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: [email protected]

For queries about this service, please contact Infrastructure at:
[email protected]

Re: [PR] Allow swapping projections / row filters at row group boundaries in Parquet reader [arrow-rs]

Reply via email to