https://gcc.gnu.org/bugzilla/show_bug.cgi?id=66573

Martin Sebor <msebor at gcc dot gnu.org> changed:

           What    |Removed                     |Added
----------------------------------------------------------------------------
           Keywords|                            |missed-optimization
          Component|c++                         |rtl-optimization

--- Comment #3 from Martin Sebor <msebor at gcc dot gnu.org> ---
Since this isn't C++-specific but rather the doing of the RTL optimizer, I
changed the component to rtl-optimization.

On powerpc64, which historically has implemented the same static branch
prediction strategy, and where IBM XLC emits the same code as Clang, GCC emits
the following at -O1:

foo:
        ...
        cmpdi 7,3,0
        beq 7,.L2
        bl bar1
        ...
        b .L1
.L2:
        bl bar2
        ...
.L1:
        ...
        blr

while the following at -O2:

foo:
        ...
        cmpdi 7,3,0
        ...
        bne 7,.L6
        bl bar2
        ...
        blr
.L6:
        bl bar1
        ...
        blr
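
For readers without the test case at hand, here is a minimal sketch of a
source file that would have this shape. It is a reconstruction from the
assembly above, not the attached t.c: the argument type (long, guessed from
the doubleword compare cmpdi) and the bodies of bar1 and bar2 are
assumptions, and last_called is a hypothetical helper added only so the
sketch can be run and checked.

```c
/* Hypothetical stand-ins for the external functions called in the
   assembly listings; in the real test case they are presumably only
   declared, not defined. */
static int last_called = 0;   /* hypothetical: records which callee ran */

void bar1(void) { last_called = 1; }
void bar2(void) { last_called = 2; }

/* Assumed shape of foo: the first argument is compared with zero
   (cmpdi 7,3,0); the nonzero path calls bar1, the zero path calls bar2. */
void foo(long x)
{
    if (x)
        bar1();
    else
        bar2();
}
```

At -O1 the bar1 call is the fall-through path and bar2 is reached by the
taken beq; at -O2 the roles are swapped.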

Comparing the RTL dumps between -O1 and -O2, it looks like the change is
introduced in the basic block reordering (bbro) pass, which runs only at -O2.
There, the if_then_else eq instruction

(jump_insn 7 6 8 2 (set (pc)
        (if_then_else (eq (reg:CC 75 7 [156])
                (const_int 0 [0]))
            (label_ref 12)
            (pc))) t.c:5 693 {*rs6000.md:12637}
     (expr_list:REG_DEAD (reg:CC 75 7 [156])
        (int_list:REG_BR_PROB 6100 (nil)))
 -> 12)
...
(call_insn 14 13 17 3 (parallel [
            (call (mem:SI (symbol_ref:DI ("bar2") [flags 0x41]  <function_decl
0x3fff84018c10 bar2>) [0 bar2 S4 A8])

is replaced with

(jump_insn 7 6 13 2 (set (pc)
        (if_then_else (ne (reg:CC 75 7 [156])
                (const_int 0 [0]))
            (label_ref:DI 53)
            (pc))) t.c:5 693 {*rs6000.md:12637}
     (expr_list:REG_DEAD (reg:CC 75 7 [156])
        (int_list:REG_BR_PROB 3900 (nil)))
 -> 53)
...
(call_insn 14 13 34 3 (parallel [
            (call (mem:SI (symbol_ref:DI ("bar2") [flags 0x41]  <function_decl
0x3fff84018c10 bar2>) [0 bar2 S4 A8])
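
The REG_BR_PROB notes in the two dumps are consistent with a plain inversion
of the branch. Assuming the values are expressed out of 10000 (which, as far
as I know, matches GCC's REG_BR_PROB_BASE), the eq branch to the bar2 block
is predicted taken with probability 6100/10000, so the inverted ne branch to
the bar1 block gets the complement, 3900/10000. A minimal sketch of that
arithmetic, where PROB_BASE and inverted_prob are illustrative names and not
GCC identifiers:

```c
/* Assumed scale of the REG_BR_PROB values seen in the dumps. */
#define PROB_BASE 10000

/* Probability of the taken edge after the branch condition is inverted:
   the old fall-through edge becomes the new taken edge. */
static int inverted_prob(int prob)
{
    return PROB_BASE - prob;
}
```

In other words, the reordering keeps the edge predicted as more likely (the
bar2 call) on the fall-through path and branches to the less likely one.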

I would expect this to then be corrected, if necessary, according to the
processor's static branch prediction strategy, but that clearly doesn't happen
on powerpc64 or, apparently, x86_64.
