jhorstmann commented on code in PR #7748:
URL: https://github.com/apache/arrow-rs/pull/7748#discussion_r2166381695
##########
arrow-array/src/array/byte_view_array.rs:
##########
@@ -537,17 +538,46 @@ impl<T: ByteViewType + ?Sized> GenericByteViewArray<T> {
left_idx: usize,
right: &GenericByteViewArray<T>,
right_idx: usize,
- ) -> std::cmp::Ordering {
+ ) -> Ordering {
let l_view = left.views().get_unchecked(left_idx);
let l_len = *l_view as u32;
let r_view = right.views().get_unchecked(right_idx);
let r_len = *r_view as u32;
if l_len <= 12 && r_len <= 12 {
- let l_data = unsafe { GenericByteViewArray::<T>::inline_value(l_view, l_len as usize) };
- let r_data = unsafe { GenericByteViewArray::<T>::inline_value(r_view, r_len as usize) };
- return l_data.cmp(r_data);
+ // Directly load the 16-byte view as an u128 (little-endian)
+ let l_bits: u128 = unsafe { *left.views().get_unchecked(left_idx) };
+ let r_bits: u128 = unsafe { *right.views().get_unchecked(right_idx) };
+
+ // The lower 32 bits encode the length (little-endian),
+ // the upper 96 bits hold the actual data
+ let l_len = l_bits as u32;
+ let r_len = r_bits as u32;
+
+ // Remove the length bits, leaving only the data
+ let l_data = l_bits >> 32;
+ let r_data = r_bits >> 32;
+
+ // The data is stored in little-endian order. To compare lexicographically,
+ // convert to big-endian:
+ let l_be = l_data.swap_bytes();
+ let r_be = r_data.swap_bytes();
+
+ // Compare only the first min_len bytes
+ let min_len = l_len.min(r_len);
+ // We have all 12 bytes in the high bits, but only want the top min_len
+ let shift = (12 - min_len) * 8;
+ let l_partial = l_be >> shift;
Review Comment:
It might be possible to OR the length back into the lower bits, which would
then allow getting a result with a single u128 comparison. I think it would
also be beneficial to extract this code block into a shared helper function,
and add some unit tests for it. The generic code here might not be well
covered by tests because of the fast path for inline buffers elsewhere.
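
A rough sketch of what I mean, not a drop-in implementation. The names `inline_key`, `compare_inline` and `make_inline_view` are hypothetical and not part of the current API. It assumes both views are inline (length <= 12) and that unused inline bytes are zero-padded, as the view layout requires. With zero padding, the byte-swapped data already orders correctly wherever the data differs, and the length in the low bits only breaks ties such as `"a"` vs `"a\0"`:

```rust
use std::cmp::Ordering;

/// Hypothetical test helper: builds an inline view from at most 12 bytes.
/// Layout assumed here: 4-byte little-endian length, then zero-padded data.
fn make_inline_view(data: &[u8]) -> u128 {
    assert!(data.len() <= 12, "only inline values are supported");
    let mut bytes = [0u8; 16];
    bytes[..4].copy_from_slice(&(data.len() as u32).to_le_bytes());
    bytes[4..4 + data.len()].copy_from_slice(data);
    u128::from_le_bytes(bytes)
}

/// Hypothetical helper: maps an inline view to a key whose integer order
/// matches the lexicographic order of the inline bytes.
fn inline_key(view: u128) -> u128 {
    let len = view as u32;
    // Drop the length, then reverse the bytes so the first data byte becomes
    // the most significant one. The low 32 bits are zero after the swap.
    let data_be = (view >> 32).swap_bytes();
    // OR the length back into the free low bits as a tie-breaker.
    data_be | (len as u128)
}

/// Compares two inline views with a single u128 comparison.
fn compare_inline(left: u128, right: u128) -> Ordering {
    inline_key(left).cmp(&inline_key(right))
}
```

A unit test for the shared helper could then compare `compare_inline` against plain slice comparison for a few edge cases (empty values, prefixes, trailing zero bytes, full 12-byte values).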