https://gcc.gnu.org/bugzilla/show_bug.cgi?id=90178
Bug ID: 90178
Summary: Missed optimization: duplicated terminal basic block
Product: gcc
Version: 9.0
Status: UNCONFIRMED
Severity: normal
Priority: P3
Component: c++
Assignee: unassigned at gcc dot gnu.org
Reporter: alex.reinking at gmail dot com
Target Milestone: ---

The following short C++ program,

int* find_ptr(int* mem, int sz, int val) {
    for (int i = 0; i < sz; i++) {
        if (mem[i] == val) {
            return &mem[i];
        }
    }
    return nullptr;
}

compiles to the following on GCC (trunk) with -O3 -march=skylake on Godbolt.

find_ptr(int*, int, int):
        mov     rax, rdi
        test    esi, esi
        jle     .L4                     ## Why not .L8?
        lea     ecx, [rsi-1]
        lea     rcx, [rdi+4+rcx*4]
        jmp     .L3
.L9:
        add     rax, 4
        cmp     rax, rcx
        je      .L8
.L3:
        cmp     DWORD PTR [rax], edx
        jne     .L9
        ret
.L8:
        xor     eax, eax
        ret
.L4:
        xor     eax, eax
        ret

Godbolt link: https://godbolt.org/z/WczJ3J

Here the terminal basic blocks .L8 and .L4 are identical. It seems to me that
there is no benefit to keeping .L4 around, and that the jump to .L4 should be
redirected to .L8. Disabling AVX via -mno-avx eliminates the duplicate.

However, a similar code generation quirk exists in Clang for this program, so I
apologize if there is a microarchitectural subtlety I'm missing here.

Godbolt link for Clang comparison: https://godbolt.org/z/2uVZ8v
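
To make the proposed redirect concrete, here is a minimal sketch of merging identical terminal blocks on a toy, text-based model of the listing. Everything in it (the Block struct, find_duplicate_terminals, retarget, check_report_example) is hypothetical and written only for illustration. It is not how GCC represents or transforms code; GCC's real passes work on its internal IR rather than on assembly text.

```cpp
#include <cassert>
#include <map>
#include <string>
#include <vector>

// Hypothetical toy model of a basic block: a label plus its instructions.
struct Block {
    std::string label;
    std::vector<std::string> insns;
};

// For every block that ends in "ret" and whose body is identical to an
// earlier such block, map its label to the label of that earlier block.
std::map<std::string, std::string>
find_duplicate_terminals(const std::vector<Block>& blocks) {
    std::map<std::vector<std::string>, std::string> first_seen;  // body -> label
    std::map<std::string, std::string> redirect;                 // dup -> kept
    for (const Block& b : blocks) {
        if (b.insns.empty() || b.insns.back() != "ret") continue;
        auto res = first_seen.emplace(b.insns, b.label);
        if (!res.second) redirect[b.label] = res.first->second;
    }
    return redirect;
}

// If the last token of an instruction is a redirected label, rewrite it.
// Assumes jumps look like "<mnemonic> <label>", as in the listing above.
std::string retarget(const std::string& insn,
                     const std::map<std::string, std::string>& redirect) {
    std::string::size_type sp = insn.rfind(' ');
    if (sp == std::string::npos) return insn;
    auto it = redirect.find(insn.substr(sp + 1));
    if (it == redirect.end()) return insn;
    return insn.substr(0, sp + 1) + it->second;
}

// Applies the sketch to the three returning blocks from the report.
inline void check_report_example() {
    std::vector<Block> blocks = {
        {".L3", {"cmp DWORD PTR [rax], edx", "jne .L9", "ret"}},
        {".L8", {"xor eax, eax", "ret"}},
        {".L4", {"xor eax, eax", "ret"}},
    };
    std::map<std::string, std::string> redirect = find_duplicate_terminals(blocks);
    assert(redirect.size() == 1);
    assert(redirect.at(".L4") == ".L8");
    assert(retarget("jle .L4", redirect) == "jle .L8");
}
```

In this model the only change to the listing is that "jle .L4" becomes "jle .L8", after which .L4 has no remaining references and could be dropped. Whether that is actually profitable on Skylake is the open question of the report, and the sketch does not address it.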