mbutrovich commented on code in PR #9542:
URL: https://github.com/apache/arrow-rs/pull/9542#discussion_r2935498510
##########
arrow-select/src/interleave.rs:
##########
@@ -154,13 +154,51 @@ fn interleave_primitive<T: ArrowPrimitiveType>(
data_type: &DataType,
) -> Result<ArrayRef, ArrowError> {
let interleaved = Interleave::<'_, PrimitiveArray<T>>::new(values, indices);
+ let arrays = &interleaved.arrays;
+ let len = indices.len();
+
+ let mut output = Vec::with_capacity(len);
+ let dst: *mut T::Native = output.as_mut_ptr();
+ let mut base = 0;
+
+ // Process 8 elements at a time to issue multiple independent loads
+ // and increase memory-level parallelism for random access patterns.
+ let chunks = indices.chunks_exact(8);
+ let remainder = chunks.remainder();
+ for chunk in chunks {
+ let v0 = arrays[chunk[0].0].value(chunk[0].1);
Review Comment:
Even architecture-specific knowledge would not necessarily help a user tune
this. If all (or many) of the targets being gathered sit on the same cache
line, you won't get a huge benefit from the unrolling. Skew and cardinality per
batch could also affect how effective it is. I think the current approach makes
sense, and I don't see a good way to expose knobs here.
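
To make the point concrete, here is a minimal sketch of the 8-wide gather
pattern from the hunk above. It is not the PR's code: it uses plain slices and
safe indexing instead of `PrimitiveArray` and raw pointer writes. The names
`gather_chunked`, `distinct_cache_lines`, and `CACHE_LINE_BYTES` are
hypothetical, and the 64-byte line size plus line-aligned sources are
assumptions.

```rust
use std::collections::HashSet;

/// Assumed cache line size in bytes; real hardware may differ.
const CACHE_LINE_BYTES: usize = 64;

/// Gathers `sources[a][i]` for every `(a, i)` in `indices`, eight at a time.
fn gather_chunked<T: Copy>(sources: &[&[T]], indices: &[(usize, usize)]) -> Vec<T> {
    let mut out = Vec::with_capacity(indices.len());
    let chunks = indices.chunks_exact(8);
    let remainder = chunks.remainder();
    for chunk in chunks {
        // Eight loads with no data dependency between them, so the CPU is
        // free to keep several cache misses in flight at once.
        let v0 = sources[chunk[0].0][chunk[0].1];
        let v1 = sources[chunk[1].0][chunk[1].1];
        let v2 = sources[chunk[2].0][chunk[2].1];
        let v3 = sources[chunk[3].0][chunk[3].1];
        let v4 = sources[chunk[4].0][chunk[4].1];
        let v5 = sources[chunk[5].0][chunk[5].1];
        let v6 = sources[chunk[6].0][chunk[6].1];
        let v7 = sources[chunk[7].0][chunk[7].1];
        out.extend_from_slice(&[v0, v1, v2, v3, v4, v5, v6, v7]);
    }
    // Fewer than eight indices left: fall back to one load per iteration.
    for &(a, i) in remainder {
        out.push(sources[a][i]);
    }
    out
}

/// Rough count of distinct cache lines a chunk of indices touches, assuming
/// every source starts on a line boundary. A count near 1 means the chunk is
/// skewed onto one line, so overlapping the loads gains little.
fn distinct_cache_lines<T>(chunk: &[(usize, usize)]) -> usize {
    let per_line = (CACHE_LINE_BYTES / std::mem::size_of::<T>().max(1)).max(1);
    chunk
        .iter()
        .map(|&(a, i)| (a, i / per_line))
        .collect::<HashSet<_>>()
        .len()
}
```

How many lines a chunk touches depends on the index distribution of each
batch, which is data the caller usually cannot predict up front. That is why
a tuning knob would be hard to set meaningfully.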