https://gcc.gnu.org/bugzilla/show_bug.cgi?id=117717
Andrew Pinski <pinskia at gcc dot gnu.org> changed:
           What    |Removed                     |Added
----------------------------------------------------------------------------
             Status|UNCONFIRMED                 |NEW
     Ever confirmed|0                           |1
   Last reconfirmed|                            |2024-11-20
--- Comment #3 from Andrew Pinski <pinskia at gcc dot gnu.org> ---
.L4:
movq (%rax), %xmm0
pshufd $0xe5, %xmm0, %xmm1
movd %xmm0, %edx
movd %xmm1, %ecx
cmpl %edx, %ecx
jnb .L3
pshufd $225, %xmm0, %xmm0
movl $1, %edi
movq %xmm0, (%rax)
.L3:
addq $4, %rax
cmpq %rsi, %rax
jne .L4
vs. the scalar version:
.L4:
movl 4(%rax), %edx
movl (%rax), %ecx
cmpl %ecx, %edx
jnb .L3
movl %ecx, 4(%rax)
movl $1, %edi
movl %edx, (%rax)
.L3:
addq $4, %rax
cmpq %rsi, %rax
jne .L4
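For reference, both listings are consistent with a loop of roughly the following shape. This is only a sketch reconstructed from the assembly, not the reporter's test case: the function name `bubble_pass`, the parameter names and the `unsigned` element type (inferred from the `jnb` unsigned branch) are assumptions.

```cpp
#include <cstddef>
#include <utility>

// Hypothetical reconstruction of the loop behind the listings above:
// one bubble-sort pass over adjacent unsigned elements.
// Returns true if at least one pair was swapped.
bool bubble_pass(unsigned *a, std::size_t n)
{
    bool swapped = false;
    for (std::size_t i = 0; i + 1 < n; ++i) {
        // Unsigned compare of a[i+1] against a[i] (cmpl + jnb).
        if (a[i + 1] < a[i]) {
            // Scalar code: two 32-bit stores.
            // Vectorized code: pshufd swaps the two low lanes, then one 64-bit store.
            std::swap(a[i], a[i + 1]);
            swapped = true;  // movl $1, %edi
        }
    }
    return swapped;
}
```

In the vectorized listing the pair is loaded once with movq, but both elements are then moved back to general registers (movd) for the compare, so the only saving is a single 64-bit store in place of two 32-bit stores on the swap path.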
Note that the aarch64 cost model rejects this vectorization.
The x86_64 cost model (on trunk) says:
/app/example.cpp:10:26: note: Cost model analysis for part in loop 2:
Vector cost: 44
Scalar cost: 48
whereas the aarch64 cost model says:
/app/example.cpp:10:26: note: Cost model analysis for part in loop 2:
Vector cost: 12
Scalar cost: 4