On 26/08/16 11:14, Kyrill Tkachov wrote:
> Hi all,
> 
> The scheduling automata sizes are getting a bit out of control (as the
> PR complains about), and the Cortex-A8 one is among the largest
> offenders. An easy, low-hanging fruit here is the set of FP/NEON
> operations that have very large reservation durations specified for
> them. They bloat the state space considerably, and it's unlikely that
> there is enough parallelism in the program to fill the (for example)
> 64 cycles that are modelled for double-precision division. In the past
> we've dealt with this by decreasing the modelled reservation duration
> to keep the state space down.
> 
> This patch does that for the cortex_a8_neon automaton and caps the
> duration of any single reservation at 15 cycles. This should be plenty
> to show the scheduler that these are high-latency instructions.
> With this patch the number of NDFA states drops by more than 70%
> (26796 -> 6020).
> 
> As I don't have access to reasonable Cortex-A8 hardware, I benchmarked
> the patch with SPEC2000 on a Cortex-A15.
> The idea (from Ramana) is that, since Cortex-A8 tuning is the default
> tuning for armv7-a, the patch shouldn't hurt the more widely accessible
> Cortex-A15 targets. There were no performance regressions there.
> 
> Bootstrapped and tested on arm-none-linux-gnueabihf.
> Ok for trunk?
> 
> Thanks,
> Kyrill
> 
> 2016-08-26  Kyrylo Tkachov  <kyrylo.tkac...@arm.com>
> 
>     PR target/70473
>     * config/arm/cortex-a8-neon.md (cortex_a8_vfp_muld): Reduce
>     reservation duration to 15 cycles.
>     (cortex_a8_vfp_macs): Likewise.
>     (cortex_a8_vfp_macd): Likewise.
>     (cortex_a8_vfp_divs): Likewise.
>     (cortex_a8_vfp_divd): Likewise.
> 

OK.

R.

> arm-a8-automaton.patch
> 
> 
> diff --git a/gcc/config/arm/cortex-a8-neon.md b/gcc/config/arm/cortex-a8-neon.md
> index 45f861f6c6f840bd113e468eeec5373e06058f6d..b16c29974a7278e70d64dc83b5b388aebb51718b 100644
> --- a/gcc/config/arm/cortex-a8-neon.md
> +++ b/gcc/config/arm/cortex-a8-neon.md
> @@ -357,30 +357,34 @@ (define_insn_reservation "cortex_a8_vfp_muls" 12
>         (eq_attr "type" "fmuls"))
>    "cortex_a8_vfp,cortex_a8_vfplite*11")
>  
> +;; Don't model a reservation for more than 15 cycles as this explodes the
> +;; state space of the automaton for little gain.  It is unlikely that the
> +;; scheduler will find enough instructions to hide the full latency of the
> +;; instructions.
>  (define_insn_reservation "cortex_a8_vfp_muld" 17
>    (and (eq_attr "tune" "cortexa8")
>         (eq_attr "type" "fmuld"))
> -  "cortex_a8_vfp,cortex_a8_vfplite*16")
> +  "cortex_a8_vfp,cortex_a8_vfplite*15")
>  
>  (define_insn_reservation "cortex_a8_vfp_macs" 21
>    (and (eq_attr "tune" "cortexa8")
>         (eq_attr "type" "fmacs,ffmas"))
> -  "cortex_a8_vfp,cortex_a8_vfplite*20")
> +  "cortex_a8_vfp,cortex_a8_vfplite*15")
>  
>  (define_insn_reservation "cortex_a8_vfp_macd" 26
>    (and (eq_attr "tune" "cortexa8")
>         (eq_attr "type" "fmacd,ffmad"))
> -  "cortex_a8_vfp,cortex_a8_vfplite*25")
> +  "cortex_a8_vfp,cortex_a8_vfplite*15")
>  
>  (define_insn_reservation "cortex_a8_vfp_divs" 37
>    (and (eq_attr "tune" "cortexa8")
>         (eq_attr "type" "fdivs, fsqrts"))
> -  "cortex_a8_vfp,cortex_a8_vfplite*36")
> +  "cortex_a8_vfp,cortex_a8_vfplite*15")
>  
>  (define_insn_reservation "cortex_a8_vfp_divd" 65
>    (and (eq_attr "tune" "cortexa8")
>         (eq_attr "type" "fdivd, fsqrtd"))
> -  "cortex_a8_vfp,cortex_a8_vfplite*64")
> +  "cortex_a8_vfp,cortex_a8_vfplite*15")
>  
>  ;; Comparisons can actually take 7 cycles sometimes instead of four,
>  ;; but given all the other instructions lumped into type=ffarith that
> 
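For illustration only (this is not part of the patch): a minimal Python sketch of the capping rule the diff appears to apply, assuming the repeat count on cortex_a8_vfplite is the instruction latency minus one, limited to 15. The helper names (capped_repeat, reservation, reduction_percent) are hypothetical; the latencies and state counts are copied from the mail above.

```python
# Illustrative sketch only; helper names are hypothetical and not GCC code.
# Latencies are the define_insn_reservation values shown in the diff.
CAP = 15

LATENCIES = {
    "cortex_a8_vfp_muls": 12,  # unchanged by the patch (repeat 11 is under the cap)
    "cortex_a8_vfp_muld": 17,
    "cortex_a8_vfp_macs": 21,
    "cortex_a8_vfp_macd": 26,
    "cortex_a8_vfp_divs": 37,
    "cortex_a8_vfp_divd": 65,
}


def capped_repeat(latency, cap=CAP):
    """Repeat count for cortex_a8_vfplite: latency - 1, limited to cap."""
    return min(latency - 1, cap)


def reservation(latency, cap=CAP):
    """Build the reservation string in the form used in cortex-a8-neon.md."""
    return f"cortex_a8_vfp,cortex_a8_vfplite*{capped_repeat(latency, cap)}"


def reduction_percent(before, after):
    """Percentage reduction going from `before` states to `after` states."""
    return 100.0 * (before - after) / before


if __name__ == "__main__":
    for name, latency in LATENCIES.items():
        print(f"{name:20s} latency={latency:2d}  {reservation(latency)}")
    # NDFA state counts quoted in the mail: 26796 -> 6020
    print(f"state reduction: {reduction_percent(26796, 6020):.1f}%")
```

Note that the declared latencies (17, 21, 26, 37, 65) are left untouched by the patch; only the modelled unit occupancy is shortened, so the scheduler still sees these as long-latency instructions.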
