https://gcc.gnu.org/bugzilla/show_bug.cgi?id=102483
Bug ID: 102483
Summary: Regression in codegen of reduction of 4 chars
Product: gcc
Version: unknown
Status: UNCONFIRMED
Severity: normal
Priority: P3
Component: tree-optimization
Assignee: unassigned at gcc dot gnu.org
Reporter: david.bolvansky at gmail dot com
Target Milestone: ---

char foo (char* p)
{
  char sum = 0;
  for (int i = 0; i != 4; i++)
    sum += p[i];
  return sum;
}

-O3 -march=x86-64

GCC trunk:
foo:
        mov     edx, DWORD PTR [rdi]
        movzx   eax, dh
        mov     ecx, edx
        add     eax, edx
        shr     ecx, 16
        add     eax, ecx
        shr     edx, 24
        add     eax, edx
        ret

GCC 11 (much better):
foo:
        movzx   eax, BYTE PTR [rdi+1]
        add     al, BYTE PTR [rdi]
        add     al, BYTE PTR [rdi+2]
        add     al, BYTE PTR [rdi+3]
        ret

Best? llvm-mca says so..
foo:                                    # @foo
        movd    xmm0, dword ptr [rdi]   # xmm0 = mem[0],zero,zero,zero
        pxor    xmm1, xmm1
        psadbw  xmm1, xmm0
        movd    eax, xmm1
        ret

https://godbolt.org/z/sT9svvj7W
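For readers comparing the listings, here is a rough C sketch of what the trunk code sequence appears to compute: one 32-bit load followed by shifts and adds, with only the low byte of the result being meaningful. The names foo_loop, foo_word and foo_variants_agree are hypothetical and not part of the report. The sketch assumes GCC's modular behaviour when converting an out-of-range value to char.

```c
#include <assert.h>
#include <stdint.h>
#include <string.h>

/* The loop from the report, unchanged apart from the name and const. */
char foo_loop(const char *p)
{
    char sum = 0;
    for (int i = 0; i != 4; i++)
        sum += p[i];
    return sum;
}

/* Sketch of the trunk sequence: load all four bytes at once, then
   add shifted copies. The upper bits of the accumulator hold junk;
   only the low 8 bits (al in the listing) are returned. */
char foo_word(const char *p)
{
    uint32_t w;
    memcpy(&w, p, sizeof w);            /* mov   edx, DWORD PTR [rdi] */
    uint32_t s = (w >> 8) & 0xff;       /* movzx eax, dh              */
    s += w;                             /* add   eax, edx             */
    s += w >> 16;                       /* shr ecx, 16 ; add eax, ecx */
    s += w >> 24;                       /* shr edx, 24 ; add eax, edx */
    return (char)(unsigned char)s;      /* keep the low byte only     */
}

/* Returns 1 when both variants produce the same low byte for p[0..3]. */
int foo_variants_agree(const char *p)
{
    return (unsigned char)foo_loop(p) == (unsigned char)foo_word(p);
}
```

Because the low byte of the sum does not depend on which byte of the word sits in which position, the sketch should give the same result regardless of endianness. It only illustrates that the two GCC listings are equivalent; it says nothing about which is faster.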