https://gcc.gnu.org/bugzilla/show_bug.cgi?id=68000

--- Comment #3 from Tim Haines <thaines.astro at gmail dot com> ---
(In reply to Andrew Pinski from comment #1)
> Note I think the trunk already has improved code generation.

Here is the codegen from the latest trunk build using the same options as
before.

foo_manual_hoist:
    movzx eax, BYTE PTR [rdi]
    mov   edx, 0
    inc   eax
    cmp   al, BYTE PTR [rdi+1]
    cmove eax, edx
    ret

foo:
    movzx edx, BYTE PTR [rdi]
    movzx ecx, BYTE PTR [rdi+1]
    mov   eax, edx
    inc   edx
    cmp   edx, ecx
    je    .L6
    inc   eax
    ret
.L6:
    xor   eax, eax
    ret

foo_if:
    movzx eax, BYTE PTR [rdi]
    mov   edx, 0
    inc   eax
    cmp   al, BYTE PTR [rdi+1]
    cmove eax, edx
    ret

Going from cmove to cmp/je doesn't change the instruction latency (I couldn't
find a latency figure for je on Intel; I assume it's 1 cycle, as on AMD), but
the branch predictor is now involved, and a misprediction costs a pipeline
flush. I don't mean to be overly critical, but I wouldn't call this an
improvement over the previous code, especially since the other two versions of
the function use cmove. As Marc noted, we are still missing CSE here: foo
increments both edx and eax from the same loaded byte.
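
For anyone reading without the testcase at hand, here is a rough C sketch of
source that would plausibly produce the three listings above. It is
reconstructed from the assembly only and is not the attachment from this bug;
the struct name, the field names a and b, and the exact shape of foo_if are
my assumptions.

```c
#include <stdint.h>

/* Hypothetical layout: two adjacent bytes, matching [rdi] and [rdi+1]. */
struct pair {
    uint8_t a;
    uint8_t b;
};

/* Ternary form: a + 1 is written twice and compared as a promoted int. */
uint8_t foo(const struct pair *p)
{
    return (p->a + 1 == p->b) ? 0 : p->a + 1;
}

/* Increment hoisted by hand into a uint8_t temporary (byte compare). */
uint8_t foo_manual_hoist(const struct pair *p)
{
    uint8_t next = p->a + 1;
    return (next == p->b) ? 0 : next;
}

/* Same logic as the hoisted version, written with an if statement. */
uint8_t foo_if(const struct pair *p)
{
    uint8_t next = p->a + 1;
    if (next == p->b)
        return 0;
    return next;
}
```

With a uint8_t return type these three agree for every input, including the
a == 255 wraparound, so nothing in the source forces foo to be compiled
differently from the other two.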

NB: The structure offsets are different here because the assembly I originally
posted was poorly anonymized by me. Mea culpa!
