https://gcc.gnu.org/bugzilla/show_bug.cgi?id=122533

            Bug ID: 122533
           Summary: x86 with -O2 --- uses mov+and instead of test, moves
                    register back and forth, might omit memcpy()
           Product: gcc
           Version: 15.2.0
            Status: UNCONFIRMED
          Severity: normal
          Priority: P3
         Component: target
          Assignee: unassigned at gcc dot gnu.org
          Reporter: zero at smallinteger dot com
  Target Milestone: ---

Created attachment 62692
  --> https://gcc.gnu.org/bugzilla/attachment.cgi?id=62692&action=edit
Sample code

Consider the attached code, compiled with -O2 using gcc 15.2 on Godbolt.  The
resulting assembly is as follows.

"test":
        movdqu  xmm0, XMMWORD PTR [rdi]
        mov     ecx, esi
        mov     edx, 1
        xor     eax, eax
        sal     rdx, cl
        lea     rcx, [rsp-56]
        movaps  XMMWORD PTR [rsp-56], xmm0
        movdqu  xmm0, XMMWORD PTR [rdi+16]
        movaps  XMMWORD PTR [rsp-40], xmm0
        movdqu  xmm0, XMMWORD PTR [rdi+32]
        movaps  XMMWORD PTR [rsp-24], xmm0
.L3:
        mov     rsi, rdx
        and     rsi, QWORD PTR [rcx+rax*8]
        jne     .L4
        add     rax, 1
        cmp     rax, 6
        jne     .L3
        mov     rax, rsi
        ret
.L4:
        mov     rsi, rax
        mov     rax, rsi
        ret

Observe that the first two instructions at .L3 (mov rsi, rdx followed by
and rsi, QWORD PTR [rcx+rax*8]) could be combined into a single test
instruction, such as test QWORD PTR [rcx+rax*8], rdx.  Using a test instruction
is preferred per the manual because test sets the flags exactly as and does but
does not need to store a result.  Also, fewer instructions are generally better.

Moreover, the code at .L4 first moves rax to rsi, then moves rsi back to rax,
so both moves are redundant.  Note that changing "return 0;" to "return 6;" in
the sample code eliminates this seemingly unnecessary work.

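Finally, it is tempting to consider eliminating the memcpy() altogether.  The
generated code copies 48 bytes from [rdi] to the stack and then only reads the
copy in the loop.  Note that clang with -O3 does eliminate the memcpy().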
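
For readers without access to the attachment, the memcpy() point can be
illustrated with a sketch.  The code below is NOT the attached sample; it is a
hypothetical reconstruction inferred from the assembly above (a 48-byte copy,
a loop over six 64-bit words, a mask of 1 << bit).  The function names, the
parameter types, and the assert() on the shift count are all assumptions.  The
second function reads each word straight from the source, which is roughly what
a compiler would do if it dropped the copy.

```c
#include <assert.h>
#include <stdint.h>
#include <string.h>

/* Hypothetical reconstruction, not the attached sample: copy six 64-bit
   words to a local array, then return the index of the first word that
   has the given bit set, or 0 if no word has it set. */
uint64_t first_word_with_bit_copy(const void *src, unsigned bit)
{
    assert(bit < 64);               /* shifting by >= 64 is undefined in C */
    uint64_t words[6];
    memcpy(words, src, sizeof words);
    uint64_t mask = (uint64_t)1 << bit;
    for (uint64_t i = 0; i < 6; i++)
        if (words[i] & mask)
            return i;
    return 0;
}

/* Same result without the 48-byte local copy: each word is loaded from
   the source when it is needed.  The per-word memcpy() keeps the load
   valid for unaligned input. */
uint64_t first_word_with_bit_direct(const void *src, unsigned bit)
{
    assert(bit < 64);
    const unsigned char *bytes = src;
    uint64_t mask = (uint64_t)1 << bit;
    for (uint64_t i = 0; i < 6; i++) {
        uint64_t word;
        memcpy(&word, bytes + i * sizeof word, sizeof word);
        if (word & mask)
            return i;
    }
    return 0;
}
```

Both functions return the same value for any input, because the source is only
read, never written, between the copy and the loop.  As in the assembly, a
return value of 0 does not distinguish "bit found in word 0" from "bit not
found".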
