yordan-pavlov edited a comment on pull request #7037: URL: https://github.com/apache/arrow/pull/7037#issuecomment-621983970
Hi, I thought I would do some profiling yesterday (to help make sure packed_simd is not removed prematurely) and noticed that a lot of time in `simd_compare_op` is spent in this loop here: https://github.com/apache/arrow/blob/master/rust/arrow/src/compute/kernels/comparison.rs#L236 ``` for i in 0..lanes { result.append(T::mask_get(&simd_result, i))?; } ``` I attempted to change this to use `mask.bitmask().to_byte_slice()` instead of `mask_get`; this was looking promising performance-wise before I attempted to change the arrow code by adding ``` fn bitmask(mask: &Self::SimdMask) -> &[u8] { unsafe { let bits = mask.bitmask(); bits.to_byte_slice() } } ``` to the `impl ArrowNumericType` section of the `make_numeric_type` macro and changing `simd_compare_op` to ``` //... // notice this has changed to a MutableBuffer let mut result = MutableBuffer::new(left.len() * mem::size_of::<bool>()); for i in (0..left.len()).step_by(lanes) { let simd_left = T::load(left.value_slice(i, lanes)); let simd_right = T::load(right.value_slice(i, lanes)); let simd_result = op(simd_left, simd_right); // this line is added result.write(T::bitmask(&simd_result)); // this is the old code commented out // for i in 0..lanes { // result.append(T::mask_get(&simd_result, i))?; // } } //... ``` ~~but this doesn't compile because of error `returns a value referencing data owned by the current function`; I have been doing mostly C# recently so was hoping you might be able to help.~~ I got it to compile by changing to: ``` fn bitmask<F>(mask: &Self::SimdMask, mut action: F) where F: FnMut(&[u8]) { action(mask.bitmask().to_byte_slice()); } ``` and ``` for i in (0..left.len()).step_by(lanes) { let simd_left = T::load(left.value_slice(i, lanes)); let simd_right = T::load(right.value_slice(i, lanes)); let simd_result = op(simd_left, simd_right); T::bitmask(&simd_result, |b| { result.write(b); }); // for i in 0..lanes { // result.append(T::mask_get(&simd_result, i))?; // } } ``` this led to a significant performance improvement: ``` filter with arrow SIMD time: [2.0026 ms 2.0091 ms 2.0172 ms] change: [-70.479% -68.776% -67.260%] (p = 0.00 < 0.05) Performance has improved. ``` @nevi-me @paddyhoran any thoughts how to improve this further? ---------------------------------------------------------------- This is an automated message from the Apache Git Service. To respond to the message, please log on to GitHub and use the URL above to go to the specific comment. For queries about this service, please contact Infrastructure at: [email protected]
