https://gcc.gnu.org/bugzilla/show_bug.cgi?id=122533
Bug ID: 122533
Summary: x86 with -O2 --- uses mov+and instead of test, moves
register back and forth, might omit memcpy()
Product: gcc
Version: 15.2.0
Status: UNCONFIRMED
Severity: normal
Priority: P3
Component: target
Assignee: unassigned at gcc dot gnu.org
Reporter: zero at smallinteger dot com
Target Milestone: ---
Created attachment 62692
--> https://gcc.gnu.org/bugzilla/attachment.cgi?id=62692&action=edit
Sample code
Consider the attached code, compiled with gcc 15.2 at -O2 (via Godbolt). The
resulting assembly is as follows.
"test":
movdqu xmm0, XMMWORD PTR [rdi]
mov ecx, esi
mov edx, 1
xor eax, eax
sal rdx, cl
lea rcx, [rsp-56]
movaps XMMWORD PTR [rsp-56], xmm0
movdqu xmm0, XMMWORD PTR [rdi+16]
movaps XMMWORD PTR [rsp-40], xmm0
movdqu xmm0, XMMWORD PTR [rdi+32]
movaps XMMWORD PTR [rsp-24], xmm0
.L3:
mov rsi, rdx
and rsi, QWORD PTR [rcx+rax*8]
jne .L4
add rax, 1
cmp rax, 6
jne .L3
mov rax, rsi
ret
.L4:
mov rsi, rax
mov rax, rsi
ret
Observe that the first two instructions at .L3 (mov rsi, rdx followed by
and rsi, QWORD PTR [rcx+rax*8]) could be combined into a single test
instruction. Using a test instruction is preferred per the manual because
test sets the flags without needing to store a result. Also, fewer
instructions are generally better.
Moreover, the code at .L4 first moves rax to rsi, then moves rsi straight back
to rax, so both moves are redundant. Note that changing return 0; to return 6;
in the sample code eliminates this seemingly unnecessary work.
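Finally, it is tempting to consider eliminating memcpy(), i.e. the 48-byte copy
to the stack done by the movdqu/movaps pairs before the loop, and reading the
data directly through rdi instead. Note that clang with -O3 does eliminate the
memcpy().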
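
For readers without access to the attachment, the following is a hypothetical
reconstruction of a function that could plausibly produce assembly of this
shape. It is not the attached sample code: the parameter names, the parameter
and return types, and the local array are assumptions inferred from the
assembly above (a 48-byte copy, a 1 << n mask, a loop over 6 quadwords).

```c
#include <stdint.h>
#include <string.h>

/* Hypothetical reconstruction, NOT the attachment from the report.
 * Names and types are assumptions inferred from the assembly. */
uint64_t test(const void *src, int bit)
{
    uint64_t words[6];

    /* The 48-byte copy that shows up as three movdqu/movaps pairs. */
    memcpy(words, src, sizeof words);

    /* Corresponds to "mov edx, 1" / "sal rdx, cl"; bit must be 0..63. */
    const uint64_t mask = (uint64_t)1 << bit;

    for (uint64_t i = 0; i < 6; i++) {
        /* Only the zero/non-zero outcome is needed here, which is why
         * a flag-setting test would be enough for this check. */
        if (words[i] & mask)
            return i;
    }

    /* The report notes that returning 6 here instead of 0 removes the
     * back-and-forth register moves at .L4. */
    return 0;
}
```

If this guess is close to the attachment, the local copy is only ever read, so
a compiler may load each quadword directly from src instead, which would be
consistent with the clang behavior mentioned above.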