https://gcc.gnu.org/bugzilla/show_bug.cgi?id=101822

            Bug ID: 101822
           Summary: Codegen bug for popcount
           Product: gcc
           Version: 12.0
            Status: UNCONFIRMED
          Severity: normal
          Priority: P3
         Component: tree-optimization
          Assignee: unassigned at gcc dot gnu.org
          Reporter: llvm at rifkin dot dev
  Target Milestone: ---

GCC cleverly optimizes the following loop into a popcount intrinsic:

#include <stdint.h>

uint32_t foo(uint32_t n) {
    uint32_t count = 0;
    while(n) {
        n &= n - 1;
        count++;
    }
    return count;
}

But the generated assembly is highly redundant (https://godbolt.org/z/nbGb13G5W):

foo(unsigned int):
        xor     eax, eax
        xor     edx, edx
        popcnt  eax, edi
        test    edi, edi
        cmove   eax, edx
        ret

if(n == 0) __builtin_unreachable(); does seem to help the compiler's analysis.
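
For reference, a minimal sketch of that workaround. The report does not say
where the hint goes, so placing it at the top of the function is an
assumption, and the name foo_nonzero is hypothetical. Note that the hint
changes the contract: calling the function with n == 0 becomes undefined
behavior.

```c
#include <stdint.h>

/* Same loop as foo(), plus a hint that n is never zero.
   Assumption: the hint is placed before the loop.
   Calling this with n == 0 is undefined behavior. */
uint32_t foo_nonzero(uint32_t n) {
    if (n == 0) __builtin_unreachable();
    uint32_t count = 0;
    while (n) {
        n &= n - 1;   /* clear the lowest set bit */
        count++;
    }
    return count;
}
```

This only fits callers that can guarantee a nonzero argument, so it is a
workaround for inspecting the codegen rather than a general fix.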

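It seems the compiler does not realize that both the loop and the popcnt
intrinsic are well-defined for n == 0: popcnt already yields 0 for a zero
input, so the test/cmove pair guarding that case is unnecessary. This is
closely related to another bug:
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=101821.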
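
The n == 0 claim can be checked directly. The sketch below puts the loop next
to GCC's __builtin_popcount so the two can be compared, including at zero. The
function names popcount_loop and popcount_builtin are hypothetical, and the
cast assumes uint32_t is unsigned int, as on the x86-64 target shown above.

```c
#include <stdint.h>

/* The loop from the report: runs zero iterations for n == 0. */
uint32_t popcount_loop(uint32_t n) {
    uint32_t count = 0;
    while (n) {
        n &= n - 1;   /* clear the lowest set bit */
        count++;
    }
    return count;
}

/* GCC builtin: returns the number of 1-bits in its argument,
   which is 0 for a zero argument. */
uint32_t popcount_builtin(uint32_t n) {
    return (uint32_t)__builtin_popcount(n);
}
```

Both functions return 0 for n == 0, so no separate zero check is needed for
the two to agree.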
