--- Comment #3 from Tim Haines <thaines.astro at gmail dot com> ---
(In reply to Andrew Pinski from comment #1)
> Note I think the trunk already has improved code generation.

Here is the codegen from the latest trunk build using the same options as

    movzx eax, BYTE PTR [rdi]
    mov   edx, 0
    inc   eax
    cmp   al, BYTE PTR [rdi+1]
    cmove eax, edx

    movzx edx, BYTE PTR [rdi]
    movzx ecx, BYTE PTR [rdi+1]
    mov   eax, edx
    inc   edx
    cmp   edx, ecx
    je    .L6
    inc   eax
    xor   eax, eax

    movzx eax, BYTE PTR [rdi]
    mov   edx, 0
    inc   eax
    cmp   al, BYTE PTR [rdi+1]
    cmove eax, edx

Changing from a cmove to a cmp/je pair doesn't change the instruction latency
(although I couldn't find the latency for je on Intel; I assume it's 1, as on
AMD), but now the branch predictor is involved, bringing possible pipeline
hazards. I don't mean to be overly critical, but I wouldn't consider this an
improvement over the previous code, especially since the other two versions of
the function use cmove. As Marc noted, we are still missing CSE here.
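
For illustration only, here is a minimal C sketch of the two source shapes being
contrasted, assuming a struct with two adjacent byte fields. All names here
(struct ring, pos, limit, next_select, next_branch, check_agree) are
hypothetical, and this is not the test case from the report. Whether a given
GCC build emits cmove or a branch for either form is not guaranteed; the sketch
only shows where the repeated increment (the missing CSE) comes from.

```c
#include <assert.h>
#include <stdint.h>

/* Hypothetical stand-in for the anonymized struct: two adjacent bytes. */
struct ring {
    uint8_t pos;   /* would be loaded from [rdi]   */
    uint8_t limit; /* would be loaded from [rdi+1] */
};

/* Select form: the increment is computed once and then either kept or
 * replaced by zero, which is the shape a cmp + cmove sequence expresses. */
static unsigned next_select(const struct ring *r)
{
    unsigned next = r->pos + 1u;
    return next == r->limit ? 0u : next;
}

/* Branching form: the increment is written on both paths, so without CSE
 * it can be computed twice, as in the je-based listing. */
static unsigned next_branch(const struct ring *r)
{
    if (r->pos + 1u == r->limit)
        return 0u;
    return r->pos + 1u;
}

/* Both forms must agree for every input; only the codegen should differ. */
static void check_agree(uint8_t pos, uint8_t limit)
{
    struct ring r = { pos, limit };
    assert(next_select(&r) == next_branch(&r));
}
```

Since the two functions are semantically identical, any difference in the
generated code is purely a code generation choice and not something the source
forces.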

NB: The structure offsets are different here because the assembly I originally
posted was poorly anonymized by me. Mea culpa!
