https://gcc.gnu.org/bugzilla/show_bug.cgi?id=123244
Bug ID: 123244
Summary: Vectorized loop falls back to unvectorized loop instead of using “count trailing zeros” instruction
Product: gcc
Version: 16.0
Status: UNCONFIRMED
Severity: normal
Priority: P3
Component: tree-optimization
Assignee: unassigned at gcc dot gnu.org
Reporter: me at manueljacob dot de
Target Milestone: ---
The following C code:
const unsigned char *search_nonascii(const unsigned char *p, const unsigned char *e) {
    for (const unsigned char *s = p; s < e; s++) {
        if (*s & 0x80)
            return s;
    }
    return 0;
}
compiled with GCC 16.0.0 20251221 using options `-O3 -march=x86-64-v4` contains
the following vectorized loop:
        vpxor   xmm1, xmm1, xmm1
        <...>
.L7:
        add     rax, 64
        cmp     rax, rcx
        je      <...>
.L8:
        vmovdqa64       zmm0, ZMMWORD PTR [rdx+rax]
        vpcmpb  k0, zmm0, zmm1, 1
        kortestq        k0, k0
        je      .L7
        <jump to unvectorized loop>
Instead of falling back to the unvectorized loop once a match is detected, the
code could move k0 into a GPR (e.g. with kmovq) and get the offset of the first
matching byte within the 64-byte block using tzcnt.
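
For illustration, here is a minimal scalar sketch of the suggested code shape. It is not what GCC emits and not a proposed patch. The names `nonascii_mask64` and `search_nonascii_ctz` are hypothetical. The helper stands in for the vpcmpb + mask-to-GPR step by building the 64-bit mask one byte at a time, and `__builtin_ctzll` (a GCC/Clang builtin) plays the role of tzcnt.

```c
#include <stddef.h>
#include <stdint.h>

/* Hypothetical helper: scalar stand-in for the vector compare followed by
   a move of the mask register into a GPR. Bit i of the result is set when
   block[i] has its high bit (0x80) set. */
static uint64_t nonascii_mask64(const unsigned char *block)
{
    uint64_t mask = 0;
    for (int i = 0; i < 64; i++)
        mask |= (uint64_t)(block[i] >> 7) << i;
    return mask;
}

/* Hypothetical variant of search_nonascii: same result as the original,
   but a hit inside a 64-byte block is resolved with a count-trailing-zeros
   on the mask instead of rescanning the block byte by byte. */
const unsigned char *search_nonascii_ctz(const unsigned char *p,
                                         const unsigned char *e)
{
    const unsigned char *s = p;

    /* Main loop: one full 64-byte block per iteration. */
    for (; e - s >= 64; s += 64) {
        uint64_t mask = nonascii_mask64(s);
        if (mask != 0)
            /* The lowest set bit is the first matching byte in the block. */
            return s + __builtin_ctzll(mask);
    }

    /* Scalar tail for the remaining (fewer than 64) bytes. */
    for (; s < e; s++) {
        if (*s & 0x80)
            return s;
    }
    return 0;
}
```

The mask is only passed to `__builtin_ctzll` when it is nonzero, since the builtin's result is undefined for a zero argument.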