https://gcc.gnu.org/bugzilla/show_bug.cgi?id=117112
Bug ID: 117112
Summary: missed vectorization opportunity: "not vectorized: no
grouped stores in basic block"
Product: gcc
Version: 15.0
Status: UNCONFIRMED
Severity: normal
Priority: P3
Component: tree-optimization
Assignee: unassigned at gcc dot gnu.org
Reporter: 652023330028 at smail dot nju.edu.cn
Target Milestone: ---
Hello, we noticed that GCC appears to miss a vectorization opportunity in the
second loop of the code below (at line 10).
Reduced code:
https://godbolt.org/z/K9WhWrjno
int data[20];
void f(int * __restrict arr1, int * __restrict arr2)
{
    for (int i = 0; i < 20; i++) {
        arr1[i] = 1;
    }
    for (int i = 0; i < 20; i++) {
        arr2[i] = 2 * arr1[i];
        int k = arr2[i] % arr1[i];
        data[i] = data[i - k] + 1; // line 10, can be vectorized
    }
}
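To illustrate why line 10 looks vectorizable, here is a minimal hand-simplified
sketch. It assumes the compiler can propagate arr1[i] == 1 from the first loop
into the second (which __restrict should permit): arr2[i] is then 2, k is
2 % 1 == 0, and data[i - k] is just data[i], so no iteration depends on
another. The name f_simplified is ours and is not part of the reduced test
case.

```cpp
int data[20];

// Hypothetical hand-simplified form of f(), assuming arr1[i] == 1 is known
// in the second loop, so that k == 0 and data[i - k] is data[i].
void f_simplified(int * __restrict arr1, int * __restrict arr2)
{
    for (int i = 0; i < 20; i++) {
        arr1[i] = 1;
    }
    for (int i = 0; i < 20; i++) {
        arr2[i] = 2;            // 2 * arr1[i] with arr1[i] == 1
        data[i] = data[i] + 1;  // k == 0: no cross-iteration dependence
    }
}
```

Comparing the vectorizer output for this form with the original may help show
whether the missed step is the simplification of k or the vectorization itself.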
GCC -O3 -fno-vect-cost-model:
f(int*, int*):
        pcmpeqd xmm0, xmm0
        mov     r8, rsi
        xor     ecx, ecx
        psrld   xmm0, 31
        movups  XMMWORD PTR [rdi], xmm0
        movups  XMMWORD PTR [rdi+16], xmm0
        movups  XMMWORD PTR [rdi+32], xmm0
        movups  XMMWORD PTR [rdi+48], xmm0
        movups  XMMWORD PTR [rdi+64], xmm0
.L2:
        mov     esi, DWORD PTR [rdi+rcx*4]
        lea     eax, [rsi+rsi]
        cdq
        mov     DWORD PTR [r8+rcx*4], eax
        idiv    esi
        mov     eax, ecx
        sub     eax, edx
        cdqe
        mov     eax, DWORD PTR data[0+rax*4]
        add     eax, 1
        mov     DWORD PTR data[0+rcx*4], eax
        add     rcx, 1
        cmp     rcx, 20
        jne     .L2
        ret
Vectorizer missed-optimization output:
<source>:12:1: missed: not consecutive access _7 = *_6;
<source>:12:1: missed: not consecutive access *_8 = _9;
<source>:12:1: missed: not consecutive access _11 = data[_10];
<source>:12:1: missed: not consecutive access data[i_31] = _12;
<source>:12:1: missed: not vectorized: no grouped stores in basic block.
Thank you very much for your time and effort! We look forward to hearing from
you.