https://gcc.gnu.org/bugzilla/show_bug.cgi?id=82580
            Bug ID: 82580
           Summary: Optimize comparisons for __int128 on x86-64
           Product: gcc
           Version: 7.2.0
            Status: UNCONFIRMED
          Severity: normal
          Priority: P3
         Component: target
          Assignee: unassigned at gcc dot gnu.org
          Reporter: morwenn29 at hotmail dot fr
  Target Milestone: ---

Given the following simple code:

bool foobar(unsigned __int128 lhs, unsigned __int128 rhs)
{
    return lhs < rhs;
}

GCC generates branchful code for x86-64 at the -O3 optimization level:

foobar(unsigned __int128, unsigned __int128):
        cmp     rsi, rcx
        mov     eax, 1
        jb      .L2
        jbe     .L6
.L3:
        xor     eax, eax
.L2:
        rep ret
.L6:
        cmp     rdi, rdx
        jnb     .L3
        rep ret

On the other hand, Clang is able to generate branchless code with just a few
instructions at the same optimization level:

foobar(unsigned __int128, unsigned __int128): # @foobar(unsigned __int128, unsigned __int128)
        cmp     rdi, rdx
        sbb     rsi, rcx
        setb    al
        ret

The codegen is similar for the other comparison operators. Would it be
possible to optimize these comparisons the same way in GCC?
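For reference, here is a rough C sketch (not part of the original report; the
helper name u128_less and the explicit split into 64-bit halves are purely
illustrative) of the borrow-chain idiom that Clang's cmp/sbb/setb sequence
corresponds to: the 128-bit comparison is a full-width subtraction in which
only the final borrow out of the high word is kept.

#include <stdbool.h>
#include <stdint.h>

/* Illustrative sketch: lhs < rhs computed as the borrow of a 128-bit
 * subtraction, mirroring cmp (low halves) / sbb (high halves) / setb. */
static bool u128_less(unsigned __int128 lhs, unsigned __int128 rhs)
{
    uint64_t lhs_lo = (uint64_t)lhs, lhs_hi = (uint64_t)(lhs >> 64);
    uint64_t rhs_lo = (uint64_t)rhs, rhs_hi = (uint64_t)(rhs >> 64);

    uint64_t lo, hi;
    /* cmp rdi, rdx: subtract the low halves, keeping only the borrow */
    bool borrow_lo   = __builtin_sub_overflow(lhs_lo, rhs_lo, &lo);
    /* sbb rsi, rcx: subtract the high halves and propagate the low borrow */
    bool borrow_hi   = __builtin_sub_overflow(lhs_hi, rhs_hi, &hi);
    bool borrow_prop = __builtin_sub_overflow(hi, (uint64_t)borrow_lo, &hi);
    /* setb al: lhs < rhs iff a borrow comes out of the high word */
    return borrow_hi | borrow_prop;
}

This formulation is not guaranteed to make GCC 7.2 emit the cmp/sbb/setb
sequence; it only spells out the borrow-chain computation that the requested
optimization would map lhs < rhs onto.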