https://gcc.gnu.org/bugzilla/show_bug.cgi?id=80854
Bug ID: 80854
Summary: hot path is slowed down when the cold return path is merged into it
Product: gcc
Version: 7.1.0
Status: UNCONFIRMED
Severity: normal
Priority: P3
Component: tree-optimization
Assignee: unassigned at gcc dot gnu.org
Reporter: nsz at gcc dot gnu.org
Target Milestone: ---

I see suboptimal code generation for

float foo (float x)
{
  if (__builtin_expect (x > 0, 0))
    if (x>2) return 0;
  return x*x;
}

because merging the return paths causes an extra register move in the hot path.

https://godbolt.org/g/AZxxrR

x86_64:

foo:
        pxor    %xmm1, %xmm1
        ucomiss %xmm1, %xmm0
        ja      .L8
.L2:
        movaps  %xmm0, %xmm1    // extra reg move
        mulss   %xmm0, %xmm1
.L1:
        movaps  %xmm1, %xmm0    // extra reg move
        ret
.L8:
        ucomiss .LC1(%rip), %xmm0
        jbe     .L2
        jmp     .L1             // need not jmp back
.LC1:
        .long   1073741824

aarch64:

foo:
        fcmpe   s0, #0.0
        bgt     .L8
.L2:
        fmul    s1, s0, s0
.L1:
        fmov    s0, s1          // extra reg move
        ret
        .p2align 3
.L8:
        fmov    s2, 2.0e+0
        movi    v1.2s, #0
        fcmpe   s0, s2
        ble     .L2
        b       .L1             // need not jmp back

I wonder if GCC could do better when information about hot/cold paths is available (by not merging the hot and cold return paths in some cases).
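
For anyone who wants to experiment with the idea of keeping the hot and cold return paths apart at the source level, here is a minimal sketch. It is not a fix and not something proposed in this report: the names foo_cold and foo_split are hypothetical, and whether the extra register move actually disappears depends on the GCC version, target, and flags, which has not been verified here. The sketch only moves the unlikely branch into a separate function marked with GCC's cold and noinline attributes, so its return is not in the same function body as the hot return.

```c
/* Original function from the report: the x > 0 branch is marked unlikely. */
float foo(float x)
{
    if (__builtin_expect(x > 0, 0))
        if (x > 2)
            return 0;
    return x * x;
}

/* Hypothetical helper holding the whole cold path.
   cold + noinline ask GCC to keep it out of line; this is an assumption
   about how one might prevent the return-path merge, not a guarantee. */
__attribute__((cold, noinline))
static float foo_cold(float x)
{
    return (x > 2) ? 0.0f : x * x;
}

/* Hypothetical variant: same results as foo, but the cold path is a call
   to foo_cold, so the hot path's only return is the x * x one. */
float foo_split(float x)
{
    if (__builtin_expect(x > 0, 0))
        return foo_cold(x);
    return x * x;
}
```

Compiling both functions with the same options (for example -O2) and comparing the assembly for foo and foo_split would show whether the hot path still carries the extra move on a given compiler.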