https://gcc.gnu.org/bugzilla/show_bug.cgi?id=90178

            Bug ID: 90178
           Summary: Missed optimization: duplicated terminal basic block
           Product: gcc
           Version: 9.0
            Status: UNCONFIRMED
          Severity: normal
          Priority: P3
         Component: c++
          Assignee: unassigned at gcc dot gnu.org
          Reporter: alex.reinking at gmail dot com
  Target Milestone: ---

The following short C++ function,

int* find_ptr(int* mem, int sz, int val) {
    for (int i = 0; i < sz; i++) {
        if (mem[i] == val) {
            return &mem[i];
        }
    }
    return nullptr;
}

compiles to the following with GCC (trunk) at -O3 -march=skylake on Godbolt:

find_ptr(int*, int, int):
        mov     rax, rdi
        test    esi, esi
        jle     .L4                  ## Why not .L8?
        lea     ecx, [rsi-1]
        lea     rcx, [rdi+4+rcx*4]
        jmp     .L3
.L9:
        add     rax, 4
        cmp     rax, rcx
        je      .L8
.L3:
        cmp     DWORD PTR [rax], edx
        jne     .L9
        ret
.L8:
        xor     eax, eax
        ret
.L4:
        xor     eax, eax
        ret

Godbolt link: https://godbolt.org/z/WczJ3J

Here the terminal basic blocks .L8 and .L4 are identical. It seems to me that
there is no benefit to keeping .L4 around, and that the jle at the top of the
function should target .L8 instead. Disabling AVX via -mno-avx eliminates the
duplicate. However, Clang shows a similar code generation quirk for this
program, so I apologize if there is a microarchitectural subtlety I'm missing
here.

Godbolt link for Clang comparison: https://godbolt.org/z/2uVZ8v
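
For reference, here is a minimal sketch of the three ways the function can
exit, to make it easier to see why .L4 and .L8 compute the same result. The
mapping in the comments is my reading of the assembly above: .L4 is reached
by the initial jle when sz <= 0, and .L8 is reached when the loop runs off the
end without a match. ExitResults and exercise_exits are hypothetical helpers
introduced only for this illustration; they are not part of the reported
test case.

```cpp
#include <cassert>

// Same function as in the report, unchanged.
int* find_ptr(int* mem, int sz, int val) {
    for (int i = 0; i < sz; i++) {
        if (mem[i] == val) {
            return &mem[i];
        }
    }
    return nullptr;
}

// Hypothetical helper type: one result per exit path of find_ptr.
struct ExitResults {
    int* empty_input;  // sz <= 0: the initial "jle .L4" is taken
    int* not_found;    // loop exhausted: "je .L8" is taken
    int* found;        // match: the plain "ret" after .L3
};

// Hypothetical helper: calls find_ptr once per exit path.
// Assumes `present` occurs in data[0..n) and `absent` does not.
ExitResults exercise_exits(int* data, int n, int present, int absent) {
    return ExitResults{
        find_ptr(data, 0, present),  // empty range, nothing is searched
        find_ptr(data, n, absent),   // full scan, no match
        find_ptr(data, n, present),  // scan stops at the first match
    };
}
```

Calling exercise_exits on a small array with one value that is present and
one that is not yields a null pointer for both of the first two fields. That
is the source-level reason both blocks reduce to "xor eax, eax; ret".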
