Issue |
106011
|
Summary |
Lack of or poor autovectorization of loops over references into contiguous memory
|
Labels |
new issue
|
Assignees |
|
Reporter |
the8472
|
Rust iterators over slices yield references to each item. Even when iterating over references to primitives and applying simple, pure functions LLVM only manages to unroll the loops but fails to vectorize them.
## Case 1 - slice.iter().fold()
https://rust.godbolt.org/z/fn3W8PGE3
The second function from https://github.com/rust-lang/rust/issues/113789
```rust
pub fn check_range_fold(keys: &[u32], max: u32) -> bool {
keys.iter().fold(true, |a, x| a && *x < max)
}
```
results in this loop body
```assembly
.LBB0_5:
cmp dword ptr [rdi + 4*r8], edx
setb r9b
and r9b, al
cmp dword ptr [rdi + 4*r8 + 4], edx
setb al
cmp dword ptr [rdi + 4*r8 + 8], edx
setb r10b
and r10b, al
and r10b, r9b
cmp dword ptr [rdi + 4*r8 + 12], edx
setb al
and al, r10b
add r8, 4
cmp rsi, r8
jne .LBB0_5
```
This is unrolled and the values are loaded unconditionally, so it's not an issue of dereferencability or profitability of loading the data.
Expected behavior: Vectorizes with `vpcmpud`
## Case 2 - manually unrolled slice.iter().all()
https://rust.godbolt.org/z/dxEsqj3fx
This is derived from an attempt to implement manual unrolling for `iterator.all()` - which is short-circuiting - to convince LLVM that vectorization would be profitable. Again the unrolling happens and the values are loaded unconditionally but they're not vectorized.
```rust
#![feature(slice_as_chunks)]
pub fn all_auto(a: &[u32], b: u32) -> bool {
all_generic_simple(a, |&x| x == b)
}
fn all_generic_simple<'a>(a: &'a [u32], mut b: impl FnMut(&'a u32) -> bool) -> bool {
const UNROLL: usize = 4;
let (chunks, remainder) = a.as_chunks::<UNROLL>();
for chunk in chunks {
let chunk: &[u32; 4] = chunk;
if !(b(&chunk[0]) && b(&chunk[1]) && b(&chunk[2]) && b(&chunk[3])) {
return false;
}
}
remainder.iter().all(b)
}
```assembly
.LBB0_1:
test rsi, rsi
je .LBB0_2
cmp dword ptr [rax], edx
jne .LBB0_12
cmp dword ptr [rax + 4], edx
jne .LBB0_12
cmp dword ptr [rax + 8], edx
jne .LBB0_12
add rsi, -16
lea r9, [rax + 16]
cmp dword ptr [rax + 12], edx
mov rax, r9
je .LBB0_1
```
Expected behavior: Vectorizes with `vpcmpeqd`
Curiously putting the unrolled short-circuiting comparison in a function - which still gets inlined - *and* bloading its return type with an additional dummy value causes it to vectorize.
```assembly
.LBB0_2:
vpcmpeqd k0, xmm0, xmmword ptr [rdi + rcx]
kmovd r8d, k0
cmp r8b, 15
jne .LBB0_10
add rcx, 16
cmp rax, rcx
jne .LBB0_2
```
Rust issue similar to this case: https://github.com/rust-lang/rust/issues/105259
## Case 3 - manually unrolled slice.iter().max()
https://github.com/rust-lang/rust/issues/106539
https://rust.godbolt.org/z/3W5ne4hsK
This does attempt to vectorize, but the code looks pretty bad. The inner loop still seems to contain a bunch of individual loads. I was expecting some use of `vpcmpud` in there.
Though this case is less important I realize AVX/AVX2 don't have unsigned comparisons and I'd actually have to write this differently to get something based on `vpmaxud` instead.
```rust
#![feature(slice_as_chunks)]
pub fn iter_max<'a>(a: &'a [u32]) -> Option<&u32> {
const UNROLL: usize = 8;
let (chunks, remainder) = a.as_chunks::<UNROLL>();
let mut cur_max = unsafe { chunks.get(0)?.iter().max().unwrap_unchecked() };
let mut cur = *cur_max;
for chunk in chunks {
// this should be the fast path that vectorizes
if chunk.map(|e| e >= cur) == [false ; UNROLL] {
continue;
}
for i in 0..UNROLL {
if chunk[i] >= cur {
cur_max = &chunk[i];
}
}
cur = *cur_max;
}
for r in remainder {
cur_max = if r >= cur_max { r } else { cur_max };
}
Some(cur_max)
}
```
```assembly
.LBB0_3:
lea r9, [rdi + 32]
cmp r9, r8
je .LBB0_4
mov ebp, dword ptr [rdi + 32]
mov r10d, dword ptr [rdi + 52]
xor r14d, r14d
cmp r10d, esi
setae r14b
mov r11d, dword ptr [rdi + 56]
xor r12d, r12d
cmp r11d, esi
setae r12b
mov ebx, dword ptr [rdi + 60]
xor r13d, r13d
cmp ebx, esi
setae r13b
xor r15d, r15d
cmp ebp, esi
setae r15b
```
_______________________________________________
llvm-bugs mailing list
llvm-bugs@lists.llvm.org
https://lists.llvm.org/cgi-bin/mailman/listinfo/llvm-bugs