[GitHub] [arrow] yordan-pavlov edited a comment on pull request #7037: ARROW-6718: [Rust] Remove packed_simd

GitBox Thu, 30 Apr 2020 14:12:13 -0700


yordan-pavlov edited a comment on pull request #7037:
URL: https://github.com/apache/arrow/pull/7037#issuecomment-621983970



   Hi,
   I thought I would do some profiling yesterday (to help make sure packed_simd 
is not removed prematurely) and noticed that a lot of time in `simd_compare_op` 
is spent in this loop here: 
https://github.com/apache/arrow/blob/master/rust/arrow/src/compute/kernels/comparison.rs#L236
   ```
           for i in 0..lanes {
               result.append(T::mask_get(&simd_result, i))?;
           }
   ``` 
   I attempted to change this to use `mask.bitmask().to_byte_slice()` instead 
of `mask_get`;
   this was looking promising performance-wise before I attempted to change the 
arrow code by adding
   ```
   fn bitmask(mask: &Self::SimdMask) -> &[u8] {
                   unsafe { let bits = mask.bitmask(); bits.to_byte_slice() }
               }
   ```
   to the `impl ArrowNumericType` section of the `make_numeric_type` macro and 
changing `simd_compare_op` to 
   ```
   //...
   // notice this has changed to a MutableBuffer
   let mut result = MutableBuffer::new(left.len() * mem::size_of::<bool>());
   for i in (0..left.len()).step_by(lanes) {
           let simd_left = T::load(left.value_slice(i, lanes));
           let simd_right = T::load(right.value_slice(i, lanes));
           let simd_result = op(simd_left, simd_right);
           // this line is added
           result.write(T::bitmask(&simd_result));
           // this is the old code commented out
           // for i in 0..lanes {
           //     result.append(T::mask_get(&simd_result, i))?;
           // }
       }
   //...
   
   ```
   
    ~~but this doesn't compile because of error `returns a value referencing 
data owned by the current function`;  I have been doing mostly C# recently so 
was hoping you might be able to help.~~
   I got it to compile by changing to:
   ```
   fn bitmask<F>(mask: &Self::SimdMask, mut action: F) where F: FnMut(&[u8]) {
                   action(mask.bitmask().to_byte_slice());
               }
   ```
   and
   ```
   for i in (0..left.len()).step_by(lanes) {
           let simd_left = T::load(left.value_slice(i, lanes));
           let simd_right = T::load(right.value_slice(i, lanes));
           let simd_result = op(simd_left, simd_right);
           T::bitmask(&simd_result, |b| { result.write(b); });
           // for i in 0..lanes {
           //     result.append(T::mask_get(&simd_result, i))?;
           // }
       }
   ```
   
   this led to a significant performance improvement:
   ```
   filter with arrow SIMD
                           time:   [2.0026 ms 2.0091 ms 2.0172 ms]
                           change: [-70.479% -68.776% -67.260%] (p = 0.00 < 
0.05)
                           Performance has improved.
   ```
   
   
   @nevi-me  @paddyhoran any thoughts how to improve this further?


----------------------------------------------------------------
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

For queries about this service, please contact Infrastructure at:
[email protected]

[GitHub] [arrow] yordan-pavlov edited a comment on pull request #7037: ARROW-6718: [Rust] Remove packed_simd

Reply via email to