https://gcc.gnu.org/bugzilla/show_bug.cgi?id=94617
            Bug ID: 94617
           Summary: Simple if condition not optimized
           Product: gcc
           Version: 10.0
            Status: UNCONFIRMED
          Severity: normal
          Priority: P3
         Component: tree-optimization
          Assignee: unassigned at gcc dot gnu.org
          Reporter: soap at gentoo dot org
  Target Milestone: ---

Given the following C++ snippet:

const char* vanilla_bandpass(int a, int b, int x, const char* low,
                             const char* high)
{
    const bool within_interval { (a <= x) && (x < b) };
    return (within_interval ? high : low);
}

GCC trunk with -O3 -march=znver2 yields the following assembly:

vanilla_bandpass(int, int, int, char const*, char const*):
        mov     rax, r8
        cmp     edi, edx
        jg      .L4
        cmp     edx, esi
        jge     .L4
        ret
.L4:
        mov     rax, rcx
        ret

which is terrible. On the other hand, Clang emits:

vanilla_bandpass(int, int, int, char const*, char const*):
        cmp     edx, esi
        cmovge  r8, rcx
        cmp     edi, edx
        cmovg   r8, rcx
        mov     rax, r8
        ret

which is a lot better. There is also a branchless version, which I'm not
100% certain is free of UB:

#include <cstdint>

const char* funky_bandpass(int a, int b, int x, const char* low,
                           const char* high)
{
    const bool within_interval { (a <= x) && (x < b) };
    const auto low_ptr  = reinterpret_cast<uintptr_t>(low) * (!within_interval);
    const auto high_ptr = reinterpret_cast<uintptr_t>(high) * within_interval;
    const auto ptr_sum  = low_ptr + high_ptr;
    const auto* result  = reinterpret_cast<const char*>(ptr_sum);
    return result;
}

It yields:

funky_bandpass(int, int, int, char const*, char const*):
        cmp     edi, edx
        setle   al
        cmp     edx, esi
        setl    dl
        and     eax, edx
        mov     edx, eax
        xor     edx, 1
        movzx   edx, dl
        movzx   eax, al
        imul    rcx, rdx
        imul    rax, r8
        add     rax, rcx
        ret

which is jump-free and in practice executes at the same observable rate as
Clang's assembly, but it still looks needlessly complex. Clang compiles this
code to the same assembly as vanilla_bandpass. Any chance of getting the
optimizer ironed out for this?