asubiotto commented on code in PR #9621:
URL: https://github.com/apache/arrow-rs/pull/9621#discussion_r3001146354
##########
arrow-ord/src/cmp.rs:
##########
@@ -701,10 +828,37 @@ pub fn compare_byte_view<T: ByteViewType>(
    unsafe { GenericByteViewArray::compare_unchecked(left, left_idx, right, right_idx) }
 }
+/// Run-end encoding metadata for one side of a comparison. Only stores the
+/// physical run-end positions (m elements), not a logical-length index vector.
+struct ReeInfo {
+ run_ends: Vec<usize>,
+ offset: usize,
+ start_physical: usize,
+ len: usize,
+}
+
+/// If `array` is RunEndEncoded, return its physical values array and run metadata.
+fn ree_unwrap(array: &dyn Array) -> Option<(&dyn Array, ReeInfo)> {
+ downcast_run_array!(
+ array => {
+ let run_ends = array.run_ends();
+ let info = ReeInfo {
+            run_ends: run_ends.values().iter().map(|v| v.as_usize()).collect(),
Review Comment:
   Ok, so adding a downcast at every leaf that needs access to the indices
resulted in 8% `arrow-ord` library binary bloat. Instead, I chose to keep the
dyn run ends and lazily expand them where the physical indices are needed (as
usize, to avoid the aforementioned bloat). This is particularly helpful in the
REE-vs-scalar case, where we can avoid allocating the physical run ends
altogether when materializing the resulting mask (we do downcast in that path
now). This results in a 5% perf improvement with only a 0.4% additional
increase in size.
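   To illustrate the idea (a minimal standalone sketch, not the actual
arrow-rs code: `ReeInfo`, `expand_run_ends`, and `logical_to_physical` here
are simplified stand-ins): the run ends stay in their native integer type,
are widened to `usize` only at the one point they are consumed, and logical
indices map to physical runs via binary search on the widened run ends.

```rust
/// Simplified run-end metadata (illustrative, not the arrow-rs struct).
struct ReeInfo {
    run_ends: Vec<usize>,
    offset: usize,
}

/// Lazily widen typed run ends (e.g. i16/i32/i64) to usize at the point
/// of use, so comparison leaves need not be monomorphized per run-end type.
fn expand_run_ends<T: Copy + TryInto<usize>>(ends: &[T]) -> Vec<usize> {
    ends.iter()
        .map(|&e| e.try_into().ok().expect("run ends are non-negative"))
        .collect()
}

/// Map a logical index to its physical run index: the first run whose
/// end is strictly greater than the logical index.
fn logical_to_physical(run_ends: &[usize], logical: usize) -> usize {
    run_ends.partition_point(|&end| end <= logical)
}

fn main() {
    // Three runs covering logical indices [0..3), [3..5), [5..9).
    let info = ReeInfo {
        run_ends: expand_run_ends::<i32>(&[3, 5, 9]),
        offset: 0,
    };
    assert_eq!(logical_to_physical(&info.run_ends, 0), 0);
    assert_eq!(logical_to_physical(&info.run_ends, 4), 1);
    assert_eq!(logical_to_physical(&info.run_ends, 8 - info.offset), 2);
}
```

In the REE-vs-scalar path the widened `Vec<usize>` can be skipped entirely:
the mask is materialized run by run directly from the downcast run-ends
buffer, which is where the allocation saving comes from.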
--
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.
To unsubscribe, e-mail: [email protected]
For queries about this service, please contact Infrastructure at:
[email protected]