Issue | 91370
Summary | [X86] Worse runtime performance on Zen 4 CPU when optimizing for `znver4` or `skylake`
Labels | new issue
Assignees |
Reporter | Systemcluster

The following code runs around 300% slower (roughly 4x the runtime) on Zen 4 when compiled with `-Ctarget-cpu=znver4` or `-Ctarget-cpu=skylake` than when compiled for `znver3` or the other targets tested.
```rust
pub fn sum(a: &[i64]) -> i64 {
    let mut sum = 0;
    a.chunks_exact(8).for_each(|x| {
        for i in x {
            sum += i;
        }
    });
    sum
}
```
<details>
<summary>Full code</summary>
```rust
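// Sums the slice in chunks of 8; a trailing remainder shorter than 8 is skipped.
pub fn sum(a: &[i64]) -> i64 {
    let mut sum = 0;
    a.chunks_exact(8).for_each(|x| {
        for i in x {
            sum += i;
        }
    });
    sum
}

fn main() {
    // black_box keeps the compiler from constant-folding the input.
    let nums = std::hint::black_box(generate());
    let now = std::time::Instant::now();
    let sum = sum(&nums);
    println!("{:?} / {}", now.elapsed(), sum);
}

// Builds the input 0, 1, ..., 999_999_999.
fn generate() -> Vec<i64> {
    let mut v = Vec::new();
    for i in 0..1_000_000_000 {
        v.push(i);
    }
    v
}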
```
</details>
Running on a Ryzen 9 7950X:
```cmd
> rustc.exe -Ctarget-cpu=x86-64-v4 -Copt-level=3 .\src\main.rs && ./main.exe
138.7342ms / 499999999500000000
> rustc.exe -Ctarget-cpu=x86-64-v3 -Copt-level=3 .\src\main.rs && ./main.exe
136.2689ms / 499999999500000000
> rustc.exe -Ctarget-cpu=x86-64 -Copt-level=3 .\src\main.rs && ./main.exe
136.0648ms / 499999999500000000
> rustc.exe -Ctarget-cpu=znver4 -Copt-level=3 .\src\main.rs && ./main.exe
543.1562ms / 499999999500000000
> rustc.exe -Ctarget-cpu=znver3 -Copt-level=3 .\src\main.rs && ./main.exe
137.4426ms / 499999999500000000
> rustc.exe -Ctarget-cpu=skylake -Copt-level=3 .\src\main.rs && ./main.exe
588.4743ms / 499999999500000000
> rustc.exe -Ctarget-cpu=haswell -Copt-level=3 .\src\main.rs && ./main.exe
138.5313ms / 499999999500000000
```
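The timings above are single runs. To rule out run-to-run noise, the measurement could be repeated and the fastest run reported. The following is only a minimal sketch of that idea: `best_of` is a hypothetical helper that is not part of the original reproducer, and the input is shrunk to one million elements so it runs quickly.

```rust
use std::hint::black_box;
use std::time::{Duration, Instant};

// Same function as in the report.
pub fn sum(a: &[i64]) -> i64 {
    let mut sum = 0;
    a.chunks_exact(8).for_each(|x| {
        for i in x {
            sum += i;
        }
    });
    sum
}

// Hypothetical helper: calls `sum` `runs` times and returns the
// fastest wall-clock time together with the last result.
fn best_of(runs: u32, data: &[i64]) -> (Duration, i64) {
    let mut best = Duration::MAX;
    let mut result = 0;
    for _ in 0..runs {
        let start = Instant::now();
        // black_box hides the input from the optimizer on every run.
        result = sum(black_box(data));
        best = best.min(start.elapsed());
    }
    (best, result)
}

fn main() {
    // Smaller input than the report, chosen only to keep the sketch fast.
    let nums: Vec<i64> = (0..1_000_000).collect();
    let (best, total) = best_of(10, &nums);
    println!("{:?} / {}", best, total);
}
```

Built once per `-Ctarget-cpu` value, this should show whether the gap between `znver4`/`skylake` and the other targets holds across repeated runs.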
Disassembly here: https://godbolt.org/z/fzaGhGdWW
The tested optimization targets all generate different assembly with different levels of unrolling, but the `znver4` and `skylake` targets seem to be the outliers.
I don't know whether the `skylake` target has the same underlying issue or whether its slowdown is just caused by the mismatch between optimization target and CPU, but both targets produce the long list of constant values in the disassembly and show similar runtime performance. I also didn't test any targets other than those listed above.
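When comparing targets like this, it can help to record which vector extensions the host CPU actually reports, so that a target/CPU mismatch is easier to spot. This is a sketch only: `detected_features` is a hypothetical helper, the set of features checked is an arbitrary choice, and it says nothing about which instructions the compiler emits for a given target.

```rust
// Hypothetical helper: lists a few x86 vector extensions that the
// host CPU reports at runtime. Returns an empty list on other architectures.
fn detected_features() -> Vec<&'static str> {
    #[allow(unused_mut)]
    let mut found = Vec::new();
    #[cfg(any(target_arch = "x86", target_arch = "x86_64"))]
    {
        if std::arch::is_x86_feature_detected!("sse2") {
            found.push("sse2");
        }
        if std::arch::is_x86_feature_detected!("avx") {
            found.push("avx");
        }
        if std::arch::is_x86_feature_detected!("avx2") {
            found.push("avx2");
        }
        if std::arch::is_x86_feature_detected!("avx512f") {
            found.push("avx512f");
        }
    }
    found
}

fn main() {
    println!("host CPU features: {:?}", detected_features());
}
```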
Split from https://github.com/llvm/llvm-project/issues/90985#issuecomment-2096057259