pitrou commented on code in PR #47896:
URL: https://github.com/apache/arrow/pull/47896#discussion_r2452398998


##########
cpp/src/arrow/util/bpacking_dispatch_internal.h:
##########
@@ -47,7 +49,113 @@ int unpack_full(const uint8_t* in, Uint* out, int 
batch_size) {
       out[k] = FromLittleEndian(SafeLoadAs<Uint>(in + (k * sizeof(Uint))));
     }
   }
-  return batch_size;
+}
+
+/// Compute the maximum spread in bytes that a packed integer can cover.
+///
+/// This is assuming contiguous packed integer starting with the given bit 
offset away
+/// from a byte boundary.
+/// This function is non-monotonic, for instance with zero offset, three bit 
integers
+/// will be split on the first byte boundary (hence having a spread of two 
bytes) while
+/// four bit integer will be well behaved and never spread over byte boundary 
(hence
+/// having a spread of one).
+constexpr int PackedMaxSpreadBytes(int width, int bit_offset) {

Review Comment:
   This function looks complicated, but if I recode it in Python it looks like 
it can be reduced to a much simpler form that's more amenable to compile-time 
execution:
   ```python
   >>> def spread(width, bit_offset):
   ...:     max_spread = (width >> 3) + ((width & 7) != 0);
   ...:     start = bit_offset
   ...:     while True:
   ...:         byte_start = start // 8
   ...:         byte_end = (start + width - 1) // 8
   ...:         spread = byte_end - byte_start + 1
   ...:         max_spread = max(spread, max_spread)
   ...:         start += width
   ...:         if start % 8 == bit_offset:
   ...:             return max_spread
   ...: 
   >>> [spread(8, b) for b in range(8)]
   [1, 2, 2, 2, 2, 2, 2, 2]
   >>> [spread(16, b) for b in range(8)]
   [2, 3, 3, 3, 3, 3, 3, 3]
   >>> [spread(32, b) for b in range(8)]
   [4, 5, 5, 5, 5, 5, 5, 5]
   ```
   



##########
cpp/src/arrow/util/bpacking_dispatch_internal.h:
##########
@@ -47,7 +49,113 @@ int unpack_full(const uint8_t* in, Uint* out, int 
batch_size) {
       out[k] = FromLittleEndian(SafeLoadAs<Uint>(in + (k * sizeof(Uint))));
     }
   }
-  return batch_size;
+}
+
+/// Compute the maximum spread in bytes that a packed integer can cover.
+///
+/// This is assuming contiguous packed integer starting with the given bit 
offset away
+/// from a byte boundary.
+/// This function is non-monotonic, for instance with zero offset, three bit 
integers
+/// will be split on the first byte boundary (hence having a spread of two 
bytes) while
+/// four bit integer will be well behaved and never spread over byte boundary 
(hence
+/// having a spread of one).
+constexpr int PackedMaxSpreadBytes(int width, int bit_offset) {

Review Comment:
   And, as noted below, calling with `bit_offset` equal to 8 or larger results 
in an infinite loop...



-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: [email protected]

For queries about this service, please contact Infrastructure at:
[email protected]

Reply via email to