HippoBaro commented on code in PR #9653:
URL: https://github.com/apache/arrow-rs/pull/9653#discussion_r3042910286
##########
parquet/src/encodings/rle.rs:
##########
@@ -122,6 +122,27 @@ impl RleEncoder {
bit_packed_max_size.max(rle_max_size)
}
+ /// Returns `true` if the encoder is currently in RLE accumulation mode
+ /// for the given value (i.e., `repeat_count > 8` and `current_value == value`).
+ ///
+ /// When this returns `true`, callers may use [`extend_run`](Self::extend_run)
+ /// to add more repetitions without per-element overhead.
+ #[inline]
+ pub fn is_accumulating(&self, value: u64) -> bool {
+ self.repeat_count > 8 && self.current_value == value
Review Comment:
> should this be '>= 8'?
No, `> 8` is intended. The RLE encoder transitions to accumulation mode only **after** the 8th
value has been buffered and `flush_buffered_values()` has committed the RLE decision.
> Also, given the discussion in
https://github.com/apache/arrow-rs/issues/7739, I think it's time to at least
replace the magic 8 with a constant.
I agree! Happy to add that at the end of this series.
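
To make the `> 8` vs `>= 8` point concrete, here is a minimal toy model of the repeat-counting state only. It is a sketch, not the real `RleEncoder`: it writes no bytes, omits the bit-packed path and the RLE flush when a run ends, and assumes the run starts on an empty buffer. `MIN_RLE_RUN_LEN`, `ToyEncoder`, and `first_accumulating_put` are hypothetical names for illustration, not identifiers from this PR.

```rust
/// Hypothetical name for the "magic 8"; not an identifier from the PR.
const MIN_RLE_RUN_LEN: usize = 8;

/// Toy model of the repeat-tracking state. Not the real `RleEncoder`.
struct ToyEncoder {
    current_value: u64,
    repeat_count: usize,
    num_buffered_values: usize,
}

impl ToyEncoder {
    fn new() -> Self {
        Self { current_value: 0, repeat_count: 0, num_buffered_values: 0 }
    }

    /// Same predicate as in the diff, with the constant in place of the literal.
    fn is_accumulating(&self, value: u64) -> bool {
        self.repeat_count > MIN_RLE_RUN_LEN && self.current_value == value
    }

    fn put(&mut self, value: u64) {
        if self.current_value == value {
            self.repeat_count += 1;
            if self.repeat_count > MIN_RLE_RUN_LEN {
                // Continuation of an already committed run: nothing is buffered.
                return;
            }
        } else {
            // A new value starts a new candidate run (flushing is omitted here).
            self.repeat_count = 1;
            self.current_value = value;
        }
        // The first MIN_RLE_RUN_LEN values of a run still go through the buffer.
        self.num_buffered_values += 1;
        if self.num_buffered_values == MIN_RLE_RUN_LEN {
            self.flush_buffered_values();
        }
    }

    /// Stand-in for the real flush: this toy only empties the buffer.
    fn flush_buffered_values(&mut self) {
        self.num_buffered_values = 0;
    }
}

/// Feeds `n` copies of `value` into a fresh toy encoder and returns the
/// 1-based index of the first `put` after which `is_accumulating` is true.
fn first_accumulating_put(value: u64, n: usize) -> Option<usize> {
    let mut enc = ToyEncoder::new();
    for i in 1..=n {
        enc.put(value);
        if enc.is_accumulating(value) {
            return Some(i);
        }
    }
    None
}
```

In this model the 8th repeated value still takes the buffered path, and the predicate first holds after the 9th `put`, which is why the check reads `> 8`.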