https://gcc.gnu.org/bugzilla/show_bug.cgi?id=82580

            Bug ID: 82580
           Summary: Optimize comparisons for __int128 on x86-64
           Product: gcc
           Version: 7.2.0
            Status: UNCONFIRMED
          Severity: normal
          Priority: P3
         Component: target
          Assignee: unassigned at gcc dot gnu.org
          Reporter: morwenn29 at hotmail dot fr
  Target Milestone: ---

Given the following simple code:

    bool foobar(unsigned __int128 lhs, unsigned __int128 rhs) {
        return lhs < rhs;
    }

GCC generates branchful code for x86-64 at the -O3 optimization level:

    foobar(unsigned __int128, unsigned __int128):
        cmp     rsi, rcx
        mov     eax, 1
        jb      .L2
        jbe     .L6
    .L3:
        xor     eax, eax
    .L2:
        rep ret
    .L6:
        cmp     rdi, rdx
        jnb     .L3
        rep ret
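
For clarity, here is a C++ sketch (an illustrative paraphrase, not code from this report) of the control flow GCC emits: it compares the high halves first and falls through to the low halves only on equality, which costs two conditional branches:

    // Hypothetical paraphrase of the branchy sequence above, split into
    // the 64-bit halves in which the SysV ABI passes unsigned __int128.
    bool less_u128_branchy(unsigned long long lo_l, unsigned long long hi_l,
                           unsigned long long lo_r, unsigned long long hi_r) {
        if (hi_l < hi_r) return true;    // cmp rsi, rcx / jb .L2
        if (hi_l > hi_r) return false;   // falls past jbe .L6 into .L3
        return lo_l < lo_r;              // cmp rdi, rdx / jnb .L3
    }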

On the other hand, Clang is able to generate branchless code with just a few
instructions at the same optimization level:

    foobar(unsigned __int128, unsigned __int128): # @foobar(unsigned __int128, unsigned __int128)
        cmp     rdi, rdx
        sbb     rsi, rcx
        setb    al
        ret
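
The trick is the classic wide-subtraction borrow chain: CMP subtracts the low halves and records the borrow in CF, SBB then computes hi(lhs) - hi(rhs) - CF, and the carry flag after SBB is set exactly when the full 128-bit subtraction lhs - rhs would borrow, i.e. when lhs < rhs. A minimal C++ sketch of the same computation (the function name is illustrative, not from the report):

    // Hypothetical equivalent of the cmp/sbb/setb sequence: the final
    // borrow out of (hi_l - hi_r - borrow_lo) is exactly lhs < rhs.
    bool less_u128_branchless(unsigned long long lo_l, unsigned long long hi_l,
                              unsigned long long lo_r, unsigned long long hi_r) {
        bool borrow_lo = lo_l < lo_r;              // CF after cmp rdi, rdx
        // CF after sbb rsi, rcx: set iff hi_l < hi_r + borrow_lo
        return hi_l < hi_r || (hi_l == hi_r && borrow_lo);
    }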

The same codegen difference appears for the other comparison operators. Would
it be possible for GCC to optimize these comparisons in the same branchless way?
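
For what it's worth, the remaining orderings reduce to the same primitive, so a branchless lowering of unsigned operator< would cover them all (plain operator identities, nothing compiler-specific):

    // a >  b  ==  b < a          (swap the operands)
    // a <= b  ==  !(b < a)       (negated swapped result, e.g. setae)
    // a >= b  ==  !(a < b)
    bool greater   (unsigned __int128 a, unsigned __int128 b) { return b < a; }
    bool less_eq   (unsigned __int128 a, unsigned __int128 b) { return !(b < a); }
    bool greater_eq(unsigned __int128 a, unsigned __int128 b) { return !(a < b); }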
