Re: [PR] Minor: Document SIMD rationale and tips [arrow-rs]

via GitHub Wed, 16 Oct 2024 03:43:09 -0700


alamb commented on code in PR #6554:
URL: https://github.com/apache/arrow-rs/pull/6554#discussion_r1802840595



##########
arrow/CONTRIBUTING.md:
##########
@@ -109,6 +109,36 @@ specific JIRA issues and reference them in these code 
comments. For example:
 //      This is not sound because .... see 
https://issues.apache.org/jira/browse/ARROW-nnnnn
 ```
 
+### Usage if SIMD / Auto vectorization
+
+This create does not use SIMD intrinsics (e.g. [`std::simd`] directly, but
+instead relies on LLVM's auto-vectorization.
+
+SIMD intrinsics are difficult to maintain and can be difficult to reason about.
+The auto-vectorizer in LLVM is quite good and often produces better code than
+hand-written manual uses of SIMD. In fact, this crate used to to have a fair

Review Comment:
   fixed



##########
arrow/CONTRIBUTING.md:
##########
@@ -109,6 +109,36 @@ specific JIRA issues and reference them in these code 
comments. For example:
 //      This is not sound because .... see 
https://issues.apache.org/jira/browse/ARROW-nnnnn
 ```
 
+### Usage if SIMD / Auto vectorization
+
+This create does not use SIMD intrinsics (e.g. [`std::simd`] directly, but
+instead relies on LLVM's auto-vectorization.

Review Comment:
   I changed the docs to say "the Rust compiler's auto-vectorization", as I 
think that is the high-level description of what is going on.
   
   In this context, I think the use of `llvm` is an "implementation detail" 
(albeit an important one) of how that auto-vectorization is accomplished. 



##########
arrow/CONTRIBUTING.md:
##########
@@ -109,6 +109,36 @@ specific JIRA issues and reference them in these code 
comments. For example:
 //      This is not sound because .... see 
https://issues.apache.org/jira/browse/ARROW-nnnnn
 ```
 
+### Usage if SIMD / Auto vectorization
+
+This create does not use SIMD intrinsics (e.g. [`std::simd`] directly, but
+instead relies on LLVM's auto-vectorization.
+
+SIMD intrinsics are difficult to maintain and can be difficult to reason about.
+The auto-vectorizer in LLVM is quite good and often produces better code than
+hand-written manual uses of SIMD. In fact, this crate used to to have a fair
+amount of manual SIMD, and over time we've removed it as the auto-vectorized
+code was faster.
+
+[`std::simd`]: https://doc.rust-lang.org/std/simd/index.html
+
+LLVM is relatively good at vectorizing vertical operations provided:
+
+1. No conditionals within the loop body
+2. Not too much inlining , as the vectorizer gives up if the code is too 
complex
+3. No bitwise horizontal reductions or masking

Review Comment:
   > Perhaps we could link to 
https://rust-lang.github.io/packed_simd/perf-guide/vert-hor-ops.html
   
   
   TIL: That is a nice description
   
   I reworded this item to
   
   > 3. No [horizontal reductions] or data dependencies
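
For readers unfamiliar with the terminology, here is a minimal sketch of the distinction (not taken from the PR; the function names `add_vertical` and `sum_horizontal` are hypothetical). A vertical operation computes each output element from the matching input elements only, while a horizontal reduction folds all elements into one value, so each step depends on the previous one. Whether either loop is actually vectorized depends on the compiler version and flags, so it is worth checking the generated assembly.

```rust
/// Vertical operation: `out[i]` depends only on `a[i]` and `b[i]`,
/// so iterations are independent of each other.
pub fn add_vertical(a: &[f32], b: &[f32], out: &mut [f32]) {
    // `zip` stops at the shortest slice, so no index can go out of bounds.
    for ((o, x), y) in out.iter_mut().zip(a).zip(b) {
        *o = *x + *y;
    }
}

/// Horizontal reduction: every iteration reads and writes `acc`,
/// which is a loop-carried data dependency.
pub fn sum_horizontal(a: &[f32]) -> f32 {
    let mut acc = 0.0f32;
    for x in a {
        acc += *x;
    }
    acc
}
```

Floating-point addition is not associative, so under default semantics the compiler generally has to keep the additions in `sum_horizontal` in their written order, which is one reason such loops tend to vectorize less readily than `add_vertical`.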
   
   



##########
arrow/CONTRIBUTING.md:
##########
@@ -109,6 +109,36 @@ specific JIRA issues and reference them in these code 
comments. For example:
 //      This is not sound because .... see 
https://issues.apache.org/jira/browse/ARROW-nnnnn
 ```
 
+### Usage if SIMD / Auto vectorization
+
+This create does not use SIMD intrinsics (e.g. [`std::simd`] directly, but
+instead relies on LLVM's auto-vectorization.
+
+SIMD intrinsics are difficult to maintain and can be difficult to reason about.
+The auto-vectorizer in LLVM is quite good and often produces better code than
+hand-written manual uses of SIMD. In fact, this crate used to to have a fair
+amount of manual SIMD, and over time we've removed it as the auto-vectorized
+code was faster.
+
+[`std::simd`]: https://doc.rust-lang.org/std/simd/index.html
+
+LLVM is relatively good at vectorizing vertical operations provided:
+
+1. No conditionals within the loop body
+2. Not too much inlining , as the vectorizer gives up if the code is too 
complex
+3. No bitwise horizontal reductions or masking
+4. You've enabled SIMD instructions in the target ISA (e.g. `target-cpu` 
`RUSTFLAGS` flag)
+
+The last point is especially important as the default `target-cpu` doesn't
+support many SIMD instructions. See the Performance Tips section at the
+end of <https://crates.io/crates/arrow>
+
+To ensure your code is fully vectorized, we recommend getting familiar with
+tools like <https://rust.godbolt.org/> (again being sure to set `RUSTFLAGS`) 
and

Review Comment:
   done



##########
arrow/CONTRIBUTING.md:
##########
@@ -109,6 +109,36 @@ specific JIRA issues and reference them in these code 
comments. For example:
 //      This is not sound because .... see 
https://issues.apache.org/jira/browse/ARROW-nnnnn
 ```
 
+### Usage if SIMD / Auto vectorization
+
+This create does not use SIMD intrinsics (e.g. [`std::simd`] directly, but
+instead relies on LLVM's auto-vectorization.
+
+SIMD intrinsics are difficult to maintain and can be difficult to reason about.
+The auto-vectorizer in LLVM is quite good and often produces better code than
+hand-written manual uses of SIMD. In fact, this crate used to to have a fair
+amount of manual SIMD, and over time we've removed it as the auto-vectorized
+code was faster.

Review Comment:
   I rephrased the sentence so that it is hopefully clearer now:
   
   "In fact, this crate used to contain several manual SIMD implementations, 
which were removed after discovering the auto-vectorized code was faster."



##########
arrow/CONTRIBUTING.md:
##########
@@ -109,6 +109,36 @@ specific JIRA issues and reference them in these code 
comments. For example:
 //      This is not sound because .... see 
https://issues.apache.org/jira/browse/ARROW-nnnnn
 ```
 
+### Usage if SIMD / Auto vectorization
+
+This create does not use SIMD intrinsics (e.g. [`std::simd`] directly, but
+instead relies on LLVM's auto-vectorization.
+
+SIMD intrinsics are difficult to maintain and can be difficult to reason about.
+The auto-vectorizer in LLVM is quite good and often produces better code than
+hand-written manual uses of SIMD. In fact, this crate used to to have a fair
+amount of manual SIMD, and over time we've removed it as the auto-vectorized
+code was faster.
+
+[`std::simd`]: https://doc.rust-lang.org/std/simd/index.html
+
+LLVM is relatively good at vectorizing vertical operations provided:
+
+1. No conditionals within the loop body
+2. Not too much inlining , as the vectorizer gives up if the code is too 
complex
+3. No bitwise horizontal reductions or masking
+4. You've enabled SIMD instructions in the target ISA (e.g. `target-cpu` 
`RUSTFLAGS` flag)

Review Comment:
   Changed to "Suitable SIMD instructions available in the target ISA (e.g. 
`target-cpu` `RUSTFLAGS` flag)"
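
As a rough illustration of this item (a sketch, not part of the PR; the helper name `enabled_simd_features` and the choice of features are assumptions), the `cfg!(target_feature = "...")` macro reports which target features were enabled when the code was compiled. Building with something like `RUSTFLAGS="-C target-cpu=native"` typically enables more of them than the default target does.

```rust
/// Returns the names of a few x86 SIMD features that were enabled at
/// compile time. The list of features checked here is illustrative only.
pub fn enabled_simd_features() -> Vec<&'static str> {
    let mut features = Vec::new();
    // `cfg!` is evaluated at compile time, so this reflects the build
    // flags (e.g. `-C target-cpu=...`), not the CPU the binary runs on.
    if cfg!(target_feature = "sse2") {
        features.push("sse2");
    }
    if cfg!(target_feature = "avx2") {
        features.push("avx2");
    }
    if cfg!(target_feature = "avx512f") {
        features.push("avx512f");
    }
    features
}
```

Printing the result of `enabled_simd_features()` once with and once without the `target-cpu` flag is a quick way to confirm that the flag actually reached the compiler.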



-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: [email protected]

For queries about this service, please contact Infrastructure at:
[email protected]
