Re: [PR] GH-45847: [C++] Optimize Parquet column reader by fusing decoding and counting [arrow]

via GitHub Thu, 18 Dec 2025 02:01:50 -0800


pitrou commented on code in PR #48549:
URL: https://github.com/apache/arrow/pull/48549#discussion_r2630368777



##########
cpp/src/arrow/util/rle_encoding_internal.h:
##########
@@ -377,6 +394,19 @@ class BitPackedRunDecoder {
     return steps;
   }
 
+  /// Get a batch of values and count how many equal match_value
+  /// Note: For bit-packed runs, we use std::count after GetBatch since it's
+  /// highly optimized by the compiler. The fused approach is only beneficial
+  /// for RLE runs where counting is O(1).
+  [[nodiscard]] rle_size_t GetBatchWithCount(value_type* out, rle_size_t batch_size,
+                                             rle_size_t value_bit_width,
+                                             value_type match_value, int64_t* out_count) {
+    const auto steps = GetBatch(out, batch_size, value_bit_width);
+    // std::count is highly optimized (SIMD) by modern compilers
+    *out_count += std::count(out, out + steps, match_value);

Review Comment:
   The typical batch size for levels is probably small, so a decoded batch would
   fit at least in L2 cache and perhaps in L1. I'm not sure it's worth trying to
   do the counting while decoding.
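
   To make the trade-off concrete, here is a minimal sketch of the two options
   for a bit-packed run. It is not Arrow's `BitPackedRunDecoder`: `ReadValue`,
   `DecodeThenCount` and `DecodeAndCountFused` are hypothetical names, and the
   LSB-first bit layout and `int16_t` value type are assumptions made only for
   illustration.

```cpp
#include <algorithm>
#include <cassert>
#include <cstddef>
#include <cstdint>
#include <vector>

// Hypothetical helper (not Arrow's API): reads the `index`-th value of
// `bit_width` bits from `data`, assuming an LSB-first bit-packed layout.
inline int16_t ReadValue(const std::vector<uint8_t>& data, int bit_width,
                         std::size_t index) {
  assert(bit_width >= 1 && bit_width <= 15);
  std::size_t bit_pos = index * static_cast<std::size_t>(bit_width);
  uint32_t v = 0;
  for (int b = 0; b < bit_width; ++b, ++bit_pos) {
    const uint32_t bit = (data[bit_pos / 8] >> (bit_pos % 8)) & 1u;
    v |= bit << b;
  }
  return static_cast<int16_t>(v);
}

// Two-pass variant: decode the whole batch, then count matches with
// std::count over the freshly written (and likely still cached) buffer.
inline int64_t DecodeThenCount(const std::vector<uint8_t>& data, int bit_width,
                               std::size_t n, int16_t match, int16_t* out) {
  for (std::size_t i = 0; i < n; ++i) {
    out[i] = ReadValue(data, bit_width, i);
  }
  return static_cast<int64_t>(std::count(out, out + n, match));
}

// Fused variant: count matches inside the decode loop itself.
inline int64_t DecodeAndCountFused(const std::vector<uint8_t>& data,
                                   int bit_width, std::size_t n, int16_t match,
                                   int16_t* out) {
  int64_t count = 0;
  for (std::size_t i = 0; i < n; ++i) {
    out[i] = ReadValue(data, bit_width, i);
    count += (out[i] == match) ? 1 : 0;
  }
  return count;
}
```

   Both variants produce the same output buffer and the same count. The second
   pass in `DecodeThenCount` rereads data that was just written, so if the
   batch fits in cache the fused loop mostly saves loop overhead, not memory
   traffic. Whether that is measurable would need a benchmark.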



-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: [email protected]

For queries about this service, please contact Infrastructure at:
[email protected]
