https://gcc.gnu.org/bugzilla/show_bug.cgi?id=78200

--- Comment #11 from Venkataramanan <venkataramanan.kumar at amd dot com> ---
Hi Richard 


On a Haswell machine, the original run time for -O3 -mavx2 -mprefer-avx2 is:

real    2m35.325s
user    2m35.257s
sys     0m0.070s

Changing the assembly from  

.L98:
        jle     .L97
        cmpl    $2, %r9d
        jne     .L97
.L99:

to

.L98:
        cmpl    $2, %r9d
        jne     .L97
        cmpq    $0, %rdi
        jle     .L97
.L99:

real    2m27.224s
user    2m27.138s
sys     0m0.087s

improves the run time by about 8 seconds (roughly 5%).
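
For reference, the two branch orders can be sketched at the C level. This is only an illustrative sketch: the function names, the parameter names `red_cost` and `ident`, and the constant name `AT_UPPER` (with the value 2 taken from `cmpl $2, %r9d`) are assumptions chosen to mirror the assembly above, not code copied from pbeampp.c. In the original snippet the `jle` presumably consumes flags set by an earlier compare that is not shown. Both orders compute the same result; only the order of the tests differs.

```c
#include <assert.h>

enum { AT_UPPER = 2 };  /* assumed name; 2 mirrors the constant in `cmpl $2, %r9d` */

/* Original order: sign test on red_cost first, then the ident test. */
static int cost_first(long red_cost, int ident)
{
    if (red_cost <= 0)        /* jle  .L97 (flags from an earlier compare) */
        return 0;
    if (ident != AT_UPPER)    /* cmpl $2, %r9d ; jne .L97 */
        return 0;
    return 1;                 /* fall through to .L99 */
}

/* Hand-edited order: ident test first, then the sign test on red_cost. */
static int ident_first(long red_cost, int ident)
{
    if (ident != AT_UPPER)    /* cmpl $2, %r9d ; jne .L97 */
        return 0;
    if (red_cost <= 0)        /* cmpq $0, %rdi ; jle .L97 */
        return 0;
    return 1;                 /* fall through to .L99 */
}

/* Asserts that both orders agree on a small grid of inputs. */
static void check_orders_agree(void)
{
    for (long cost = -2; cost <= 2; cost++)
        for (int ident = 0; ident <= 3; ident++)
            assert(cost_first(cost, ident) == ident_first(cost, ident));
}
```

Calling `check_orders_agree()` confirms the two versions are functionally equivalent on the tested inputs, so any run-time difference would come from how the branches behave on the hardware, not from a change in semantics.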


> -----Original Message-----
> From: rguenth at gcc dot gnu.org [mailto:gcc-bugzi...@gcc.gnu.org]
> Sent: Wednesday, November 9, 2016 6:02 PM
> To: Kumar, Venkataramanan <venkataramanan.ku...@amd.com>
> Subject: [Bug rtl-optimization/78200] [7 Regression] 429.mcf of cpu2006
> regresses in GCC trunk for avx2 target.
> 
> https://gcc.gnu.org/bugzilla/show_bug.cgi?id=78200
> 
> --- Comment #10 from Richard Biener <rguenth at gcc dot gnu.org> ---
> OTOH we _do_ have initial RTL
> 
> (insn 167 166 168 20 (set (reg:CCGOC 17 flags)
>         (compare:CCGOC (reg/v:DI 217 [ red_cost ])
>             (const_int 0 [0]))) "pbeampp.c":42 -1
>      (nil))
> (jump_insn 168 167 169 20 (set (pc)
>         (if_then_else (ge (reg:CCGOC 17 flags)
>                 (const_int 0 [0]))
>             (label_ref 175)
>             (pc))) "pbeampp.c":42 -1
>      (int_list:REG_BR_PROB 6400 (nil))
>  -> 175)
> ;;  succ:       21 [36.0%]  (FALLTHRU)
> ;;              23 [64.0%]
> 
> ;; basic block 23, loop depth 2, count 0, freq 1067, maybe hot
> ;; Invalid sum of incoming frequencies 1216, should be 1067
> ;;  prev block 22, next block 24, flags: (NEW, REACHABLE, RTL, MODIFIED, VISITED)
> ;;  pred:       20 [64.0%]
> (code_label 175 173 176 23 98 "" [1 uses])
> (note 176 175 177 23 [bb 23] NOTE_INSN_BASIC_BLOCK)
> (insn 177 176 178 23 (set (reg:CCNO 17 flags)
>         (compare:CCNO (reg/v:DI 217 [ red_cost ])
>             (const_int 0 [0]))) "pbeampp.c":42 -1
>      (nil))
> (insn 178 177 179 23 (set (reg:QI 273)
>         (gt:QI (reg:CCNO 17 flags)
>             (const_int 0 [0]))) "pbeampp.c":42 -1
>      (nil))
> (insn 179 178 180 23 (set (reg:CCZ 17 flags)
>         (compare:CCZ (reg:QI 273)
>             (const_int 0 [0]))) "pbeampp.c":42 -1
>      (nil))
> (jump_insn 180 179 587 23 (set (pc)
>         (if_then_else (eq (reg:CCZ 17 flags)
>                 (const_int 0 [0]))
>             (label_ref 196)
>             (pc))) "pbeampp.c":42 -1
>      (int_list:REG_BR_PROB 3300 (nil))
>  -> 196)
> 
> that is, it compares in a sensible order, allowing for combining (which
> apparently is what causes the code to run slower, for reasons not yet explored).
> 
> Expanding the other way around does not have any justification IMHO and thus
> the "fix" would be to the later stage where we combine the compare with the
> one on the backedge.
> 
> The issue is CSE2 which does
> 
> (insn 167 166 168 21 (set (reg:CC 17 flags)
>         (compare:CC (reg/v:DI 217 [ red_cost ])
>             (const_int 0 [0]))) "pbeampp.c":42 8 {*cmpdi_1}
>      (nil))
> (jump_insn 168 167 169 21 (set (pc)
>         (if_then_else (ge (reg:CC 17 flags)
>                 (const_int 0 [0]))
>             (label_ref 175)
>             (pc))) "pbeampp.c":42 635 {*jcc_1}
>      (expr_list:REG_DEAD (reg:CC 17 flags)
>         (int_list:REG_BR_PROB 6400 (nil)))
>  -> 175)
> ...
> (insn 178 176 179 24 (set (reg:QI 273)
>         (gt:QI (reg:CC 17 flags)
>             (const_int 0 [0]))) "pbeampp.c":42 631 {*setcc_qi}
>      (expr_list:REG_DEAD (reg:CC 17 flags)
>         (nil)))
> 
> thus changes the earlier compare to CC and re-uses that CCmode.  Note it's still
> a mystery to me why this is slower (and I have not reproduced that myself yet).
> 
> Then we combine it to
> 
> (insn 167 166 168 18 (set (reg:CC 17 flags)
>         (compare:CC (reg/v:DI 217 [ red_cost ])
>             (const_int 0 [0]))) "pbeampp.c":42 8 {*cmpdi_1}
>      (nil))
> (jump_insn 168 167 169 18 (set (pc)
>         (if_then_else (ge (reg:CC 17 flags)
>                 (const_int 0 [0]))
>             (label_ref 175)
>             (pc))) "pbeampp.c":42 635 {*jcc_1}
>      (int_list:REG_BR_PROB 6400 (nil))
>  -> 175)
> ;;  succ:       19 [36.0%]  (FALLTHRU)
> ;;              20 [64.0%]
> 
> 
> ;; basic block 20, loop depth 0, count 0, freq 1067, maybe hot
> ;; Invalid sum of incoming frequencies 1216, should be 1067
> (jump_insn 180 179 587 20 (set (pc)
>         (if_then_else (le (reg:CC 17 flags)
>                 (const_int 0 [0]))
>             (label_ref:DI 196)
>             (pc))) "pbeampp.c":42 635 {*jcc_1}
>      (int_list:REG_BR_PROB 3300 (expr_list:REG_DEAD (reg:CCZ 17 flags)
>             (nil)))
> 
> --
> You are receiving this mail because:
> You reported the bug.
