tustvold commented on code in PR #2740:
URL: https://github.com/apache/arrow-rs/pull/2740#discussion_r973717499
##########
arrow/src/compute/kernels/arity.rs:
##########
@@ -298,36 +297,41 @@ where
let len = a.len();
if a.null_count() == 0 && b.null_count() == 0 {
- let values = a.values().iter().zip(b.values()).map(|(l, r)| op(*l,
*r));
- let buffer = unsafe { Buffer::try_from_trusted_len_iter(values) }?;
- // JUSTIFICATION
- // Benefit
- // ~75% speedup
- // Soundness
- // `values` is an iterator with a known size from a PrimitiveArray
- return Ok(unsafe { build_primitive_array(len, buffer, 0, None) });
+ let mut buffer = BufferBuilder::<O::Native>::new(len);
+ buffer.append_n_zeroed(len);
Review Comment:
So this appears to boil down to the whims of LLVMs inlining heuristics,
try_from_trusted_len_iter doesn't appear to get fully inlined and therefore
gets optimised properly (although I have a really hard time understanding the
generated code). Potentially rustc is doing something to help LLVM here.
However, when the loop is defined in the function body of try_binary LLVM
doesn't optimize it properly. If you split it out into a free function with
inline(never), it optimises correctly and is actually marginally faster than
the iterator.
Annoyingly using try_from_trusted_len_iter but doing something like
`(0..a.len()).map(|idx| ...)` to construct the iterator also doesn't work,
unless split into a free function.
I would therefore suggest splitting the no null variant into a free function
marked inline never, as this appears to work...
FYI @dandan as I wonder if we're running into this elsewhere.
--
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.
To unsubscribe, e-mail: [email protected]
For queries about this service, please contact Infrastructure at:
[email protected]