https://gcc.gnu.org/bugzilla/show_bug.cgi?id=105429

            Bug ID: 105429
           Summary: Unnecessary moves generated by the compiler.
           Product: gcc
           Version: 13.0
            Status: UNCONFIRMED
          Severity: normal
          Priority: P3
         Component: other
          Assignee: unassigned at gcc dot gnu.org
          Reporter: mareksz1958 at wp dot pl
  Target Milestone: ---

The following C code:

>>>
#include <nmmintrin.h>
#include <stdint.h>

uint32_t crc(uint32_t current, const uint8_t *buffer, size_t size) {
    for(size_t i = 0; i < size; i++)
        current = _mm_crc32_u64(current, buffer[i]);
    return current;
}
<<<

Generates inefficient assembly on all optimisation presets due to the extra
`mov eax, eax' - Os and O3 below:

>>>
crc:
        movl    %edi, %eax
        xorl    %ecx, %ecx
.L2:
        cmpq    %rdx, %rcx
        je      .L5
        movzbl  (%rsi,%rcx), %edi
        movl    %eax, %eax
        incq    %rcx
        crc32q  %rdi, %rax
        jmp     .L2
.L5:
        ret

crc:
        movl    %edi, %eax
        testq   %rdx, %rdx
        je      .L6
        leaq    (%rsi,%rdx), %rcx
.L3:
        movzbl  (%rsi), %edx
        movl    %eax, %eax
        addq    $1, %rsi
        crc32q  %rdx, %rax
        cmpq    %rsi, %rcx
        jne     .L3
.L6:
        ret
<<<

The problem seems to be present in all GCC versions I have access to. The
redundant move greatly worsens the performance of the generated code.

When `_mm_crc32_u64' is replaced by any other function, the problem seems to
disappear.
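
A possible reading of the extra move, offered as an assumption rather than a confirmed diagnosis: `_mm_crc32_u64' takes and returns a 64-bit value, while `current' is a `uint32_t', so each iteration narrows the result to 32 bits and widens it again for the next call. The `movl %eax, %eax' would then be that zero-extension. The sketch below puts the reproducer next to a hypothetical variant that keeps a 64-bit accumulator and narrows only once on return. The names `crc_narrow', `crc_wide' and `CRC_TARGET' are illustrative and not part of the report, and the `target' attribute merely stands in for building with -msse4.2. Whether the variant actually drops the move would need to be checked against the compiler output.

```c
#include <nmmintrin.h>
#include <stddef.h>
#include <stdint.h>

/* Assumption: this attribute replaces -msse4.2 on the command line so the
   snippet builds on x86-64 with GCC or Clang without extra flags. */
#define CRC_TARGET __attribute__((target("sse4.2")))

/* Reproducer from the report, renamed: the accumulator is 32 bits wide, so
   every call widens `current' to 64 bits and narrows the result back. */
CRC_TARGET
uint32_t crc_narrow(uint32_t current, const uint8_t *buffer, size_t size) {
    for (size_t i = 0; i < size; i++)
        current = (uint32_t)_mm_crc32_u64(current, buffer[i]);
    return current;
}

/* Hypothetical variant: the accumulator stays 64 bits wide inside the loop
   and is narrowed to 32 bits once, on return. */
CRC_TARGET
uint32_t crc_wide(uint32_t current, const uint8_t *buffer, size_t size) {
    uint64_t acc = current;
    for (size_t i = 0; i < size; i++)
        acc = _mm_crc32_u64(acc, buffer[i]);
    return (uint32_t)acc;
}
```

Both functions should return the same value for the same input, since the 64-bit CRC32 instruction reads only the low 32 bits of its accumulator and clears the upper 32 bits of the result. Comparing the two with `gcc -O3 -S' is one way to see whether the per-iteration move comes from the type conversion.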