yordan-pavlov commented on pull request #7037:
URL: https://github.com/apache/arrow/pull/7037#issuecomment-624864888


   I agree with @paddyhoran - if the goal is just to enable the use of arrow 
with stable rust, it would be reasonable to just not enable the SIMD feature by 
default, but still keep it so it is available as a choice for those users who 
need the best performance possible.
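
   As a rough illustration of the opt-in approach, here is a minimal sketch. 
It assumes a Cargo feature named `simd` that is declared under `[features]` 
but left out of `default`; `sum_f32` is a hypothetical kernel used only for 
illustration, not part of arrow's API, and the "SIMD" branch is a stand-in 
written in plain Rust rather than a real vectorized kernel.

```rust
// Sketch only: `sum_f32` is a hypothetical kernel, not arrow's API.
// The feature name `simd` is an assumption; it would be declared in
// Cargo.toml under [features] and left out of `default`.

/// Compiled only when the crate is built with `--features simd`.
#[cfg(feature = "simd")]
pub fn sum_f32(values: &[f32]) -> f32 {
    // Stand-in for a real vectorized kernel: accumulate in 8 independent
    // lanes, then add the lanes and whatever did not fill a full chunk.
    let mut lanes = [0.0f32; 8];
    let chunks = values.chunks_exact(8);
    let tail: f32 = chunks.remainder().iter().sum();
    for chunk in chunks {
        for (lane, v) in lanes.iter_mut().zip(chunk) {
            *lane += *v;
        }
    }
    lanes.iter().sum::<f32>() + tail
}

/// Default build: plain scalar code that compiles on stable Rust.
#[cfg(not(feature = "simd"))]
pub fn sum_f32(values: &[f32]) -> f32 {
    values.iter().sum()
}

fn main() {
    let data = [1.0f32, 2.0, 3.0, 4.0];
    println!("sum = {}", sum_f32(&data));
}
```

   With this layout, callers see the same function signature either way, and 
only users who explicitly enable the feature need a toolchain that supports 
the SIMD path.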
   
   A lot of work has gone into the SIMD feature already and it would be a shame 
to remove it prematurely, without doing enough benchmarking.
   
   Furthermore, I think Rust could have a great future in the big data space 
and I think this project could play an important part in that. SIMD matters 
for big data workloads, so we should be looking to have SIMD stabilized (in 
Rust) rather than removing it. If SIMD is removed from arrow, what killer 
feature would motivate its stabilization in Rust? 
   
   For convenience, here are the results from my filtering benchmarks:

   | Benchmark                       | Time      |
   | ------------------------------- | --------- |
   | filter with loop                | 567.78 us |
   | filter with iter                | 671.40 us |
   | filter with arrow loop          | 1.2900 ms |
   | filter with arrow NO SIMD       | 8.5939 ms |
   | filter with arrow SIMD (array)  | 599.05 us |
   | filter with arrow SIMD (scalar) | 381.38 us |
   
   In the table above we can see that SIMD filtering (against scalar values) 
runs about 1.49x as fast as the plain loop and 1.76x as fast as the iterator 
implementation, which means it takes roughly 33% and 43% less time 
respectively. This could mean the difference between waiting 12h or 7h for a 
job to complete. So I think more benchmarking, profiling and performance 
improvements have to be done before it can be decided with confidence whether 
to remove SIMD or not.
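
   As a rough check of the comparison against the table, the ratios can be 
recomputed from the timings. This is only a sketch: the helper names 
`speedup` and `time_saved` are made up for illustration, and the numbers are 
copied from the table above.

```rust
/// How many times faster the candidate is: baseline time / candidate time.
fn speedup(baseline_us: f64, candidate_us: f64) -> f64 {
    baseline_us / candidate_us
}

/// Fraction of the baseline's running time that the candidate saves.
fn time_saved(baseline_us: f64, candidate_us: f64) -> f64 {
    1.0 - candidate_us / baseline_us
}

fn main() {
    // Timings in microseconds, taken from the benchmark table.
    let simd_scalar = 381.38;
    let baselines = [("filter with loop", 567.78), ("filter with iter", 671.40)];

    for (name, t) in baselines {
        println!(
            "{}: {:.2}x speedup, {:.0}% less time",
            name,
            speedup(t, simd_scalar),
            100.0 * time_saved(t, simd_scalar)
        );
    }
}
```

   The "49%" and "76%" figures correspond to the speedup ratios (1.49x and 
1.76x); measured as time saved, the same timings give about 33% and 43%.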
   
   The source code for the benchmarks used to produce the results above is here:
   
https://github.com/yordan-pavlov/arrow-benchmark/blob/master/rust/arrow_benchmark/src/main.rs
   
   I am happy to contribute benchmarks; I just have to figure out how, or 
whether, they would fit in the main arrow repo.
   

