| Field | Value |
| --- | --- |
| Issue | 182306 |
| Summary | [SLPVectorizer] SLP Vectorizer Miscalculates Vectorization Benefit When the Vectorized Values Are Used in Other Basic Blocks |
| Labels | new issue |
| Assignees | |
| Reporter | ibogosavljevic |

The SLP vectorizer miscalculates the vectorization benefit when the vectorized values are used in another basic block. In C/C++ this does not happen very often, but in Java it happens all the time, because LLVM inserts deoptimization points and thereby breaks one big basic block into two smaller ones.
### To Reproduce
Attached are the C repro files. The issue reproduces with clang 21.1.8 and also with older versions, e.g. clang 18.
[bench_with_loop.c](https://github.com/user-attachments/files/25418325/bench_with_loop.c)
[repro_with_loop.c](https://github.com/user-attachments/files/25418326/repro_with_loop.c)
```
clang -O3 -march=znver2 -static bench_with_loop.c -o bench-loop-slp
clang -O3 -march=znver2 -fno-slp-vectorize -static bench_with_loop.c -o bench-loop-no-slp
```
We use `-march=znver2` here, but the issue reproduces on other architectures as well. Here are the runtimes on native Zen 2 hardware with the above configuration; the SLP-vectorized binary is roughly 9% slower:
```
Binary              Per encryption round   Per mainStep call
bench-loop-slp      178 ns                 5.55 ns
bench-loop-no-slp   163 ns                 5.07 ns
```
### Investigation
Compile the example to LLVM IR just before the slp-vectorizer pass:
```
clang -O3 -march=znver2 -S -emit-llvm -mllvm -print-before=slp-vectorizer -mllvm -print-module-scope bench_with_loop.c -o /dev/null 2> before_slp.ll.txt
```
This produces the file `before_slp.ll.txt`, which contains the LLVM IR just before the SLP pass. The file holds several copies of the module (each begins with `source_filename = "bench_with_loop.c"`); remove all of them except the first. I did that and am attaching the resulting .ll file:
[before_slp.ll.txt](https://github.com/user-attachments/files/25420446/before_slp.ll.txt)
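The manual trimming step above can be sketched as a small shell helper. This is only a sketch: the function name `keep_first_module` is hypothetical, and it assumes each module copy contains exactly one line starting with `source_filename = `, as described above.

```shell
# Keep only the first module copy from the -print-before dump.
# Assumption: every copy has exactly one line that starts with
# "source_filename = ", so the second such line marks the next copy.
keep_first_module() {
  # Count source_filename lines; print lines only while at most one was seen.
  awk '/^source_filename = / { copies++ } copies <= 1' "$1"
}
```

For example, `keep_first_module raw_dump.txt > before_slp.ll.txt`, where `raw_dump.txt` is a hypothetical name for the untrimmed dump. Any `;` comment lines that precede the second `source_filename` line are kept at the end of the output, so the result is worth a quick look before passing it to `opt`.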
If you open this file, notice the three blocks `if.end:`, `if.then87:` and `if.end88:`. If it weren't for `if.then87:`, all of this would be a single block.
We run the slp-vectorizer on `before_slp.ll.txt` like this:
```
opt -passes=slp-vectorizer -debug-only=SLP -mcpu=znver2 before_slp.ll.txt -S -o after_slp.ll.txt 2>slp-output.txt
```
This applies the SLP vectorization step. The SLP vectorizer decides that this vectorization is beneficial (but we know from the measurements that it is not). It also produces the debug log slp-output.txt.
I asked an AI to take the input before_slp.ll.txt and remove the block `if.then87:`. Now the file has one big block:
[before_slp_one_block.ll.txt](https://github.com/user-attachments/files/25421172/before_slp_one_block.ll.txt)
We run SLP vectorization on this file:
```
opt -passes=slp-vectorizer -debug-only=SLP -mcpu=znver2 before_slp_one_block.ll.txt -S -o after_slp_one_block.ll.txt 2>slp-output_one_block.txt
```
With this trivial change, SLP didn't vectorize this function.
I am attaching all the artifacts of the investigation. The SLP debug log suggests that the difference between the original version and the one-block version is that the cost model accounts differently for the cost of uses in external blocks.
[after_slp.ll.txt](https://github.com/user-attachments/files/25421240/after_slp.ll.txt)
[after_slp_one_block.ll.txt](https://github.com/user-attachments/files/25421239/after_slp_one_block.ll.txt)
[slp-output.txt](https://github.com/user-attachments/files/25421237/slp-output.txt)
[slp-output_one_block.txt](https://github.com/user-attachments/files/25421238/slp-output_one_block.txt)
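To compare what the cost model reported in the two runs, one option is to reduce each debug log to its cost-related lines first. The helper below is a rough sketch: the name `slp_cost_lines` is hypothetical, and it assumes that the relevant lines contain the word "cost" (in any case), since the exact wording of the debug output varies between LLVM versions.

```shell
# Print the cost-related lines of an SLP debug log, with line numbers.
# Assumption: lines written by the cost model contain the word "cost"
# (case-insensitive); the exact message format depends on the LLVM version.
slp_cost_lines() {
  grep -in 'cost' "$1"
}
```

Running `slp_cost_lines slp-output.txt` and `slp_cost_lines slp-output_one_block.txt` and reading the two outputs side by side may make it easier to see where the two runs diverge.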