https://gcc.gnu.org/bugzilla/show_bug.cgi?id=68000
--- Comment #3 from Tim Haines <thaines.astro at gmail dot com> ---
(In reply to Andrew Pinski from comment #1)
> Note I think the trunk already has improved code generation.

Here is the codegen from the latest trunk build using the same options as
before.

foo_manual_hoist:
        movzx   eax, BYTE PTR [rdi]
        mov     edx, 0
        inc     eax
        cmp     al, BYTE PTR [rdi+1]
        cmove   eax, edx
        ret
foo:
        movzx   edx, BYTE PTR [rdi]
        movzx   ecx, BYTE PTR [rdi+1]
        mov     eax, edx
        inc     edx
        cmp     edx, ecx
        je      .L6
        inc     eax
        ret
.L6:
        xor     eax, eax
        ret
foo_if:
        movzx   eax, BYTE PTR [rdi]
        mov     edx, 0
        inc     eax
        cmp     al, BYTE PTR [rdi+1]
        cmove   eax, edx
        ret

Changing from a cmove to a cmp/jmp doesn't change the instruction latency
(although I couldn't find the latency for je on Intel; I assume it's 1, like on
AMD), but now the branch predictor will be invoked, bringing possible pipeline
hazards. I don't mean to be overly critical, but I wouldn't consider this to be
an improvement over the previous code, especially since the other two versions
of the function use cmove.

As Marc noted, we are still missing CSE here: foo increments the loaded value
twice (inc edx before the compare, inc eax on the fall-through path) instead of
reusing a single result.

NB: The structure offsets are different here because the assembly I originally
posted was poorly anonymized by me. Mea culpa!
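
For readers without the original test case at hand, the following is a rough C sketch of the kind of source that could produce listings like the ones above. It is a guess reconstructed from the assembly, not the code attached to this bug: the struct `ring`, its fields `head` and `size`, and the function names `next_ternary` and `next_hoisted` are all hypothetical. It only illustrates the difference between repeating the `head + 1` subexpression and hoisting it into a temporary by hand.

```c
#include <stdint.h>

/* Hypothetical layout: two adjacent bytes, matching the [rdi] and
   [rdi+1] operands in the listings. Not the struct from the bug. */
struct ring {
    uint8_t head;
    uint8_t size;
};

/* The subexpression head + 1 appears twice. With common subexpression
   elimination the compiler would compute it once and reuse it. */
uint8_t next_ternary(const struct ring *r)
{
    return (r->head + 1 == r->size) ? 0 : (uint8_t)(r->head + 1);
}

/* The same logic with the increment hoisted by hand into a temporary,
   so there is only one increment for the compiler to emit. */
uint8_t next_hoisted(const struct ring *r)
{
    uint8_t next = (uint8_t)(r->head + 1);
    return (next == r->size) ? 0 : next;
}
```

Under these assumptions the two functions return the same value for every pair of byte inputs, so a difference in the generated code reflects the optimizer rather than the semantics.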