On 26/08/16 11:14, Kyrill Tkachov wrote:
> Hi all,
>
> The scheduling automata sizes are getting a bit out of control (as the
> PR complains about) and the Cortex-A8 one is one of the largest
> offenders. An easy, low-hanging fruit in dealing with this are some of
> the FP/NEON operations that have very large reservation durations
> specified for them. They bloat the state space by quite a lot and it's
> not likely that there is enough parallelism present in the program to
> fill the (for example) 64 cycles that are modelled for the
> double-precision division. In the past we've dealt with this by
> decreasing the modelled reservation duration to keep the state space
> down.
>
> This patch does that for the cortex_a8_neon automaton and caps the
> reservation duration for a particular reservation to 15 cycles. This
> should be plenty to demonstrate that these are high latency
> instructions.
> With this patch the number of NDFA states is massively reduced by more
> than 70% (26796 -> 6020).
>
> As I don't have access to reasonable Cortex-A8 hardware I benchmarked it
> on SPEC2000 on a Cortex-A15.
> The idea (from Ramana) is that since Cortex-A8 tuning is the default
> tuning for armv7-a the patch shouldn't hurt the more widely accessible
> Cortex-A15 targets. There were no regressions in performance there.
>
> Bootstrapped and tested on arm-none-linux-gnueabihf.
> Ok for trunk?
>
> Thanks,
> Kyrill
>
> 2016-08-26  Kyrylo Tkachov  <kyrylo.tkac...@arm.com>
>
>     PR target/70473
>     * config/arm/cortex-a8-neon.md (cortex_a8_vfp_muld): Reduce
>     reservation duration to 15 cycles.
>     (cortex_a8_vfp_macs): Likewise.
>     (cortex_a8_vfp_macd): Likewise.
>     (cortex_a8_vfp_divs): Likewise.
>     (cortex_a8_vfp_divd): Likewise.
>
OK.

R.

> arm-a8-automaton.patch
>
>
> diff --git a/gcc/config/arm/cortex-a8-neon.md b/gcc/config/arm/cortex-a8-neon.md
> index 45f861f6c6f840bd113e468eeec5373e06058f6d..b16c29974a7278e70d64dc83b5b388aebb51718b 100644
> --- a/gcc/config/arm/cortex-a8-neon.md
> +++ b/gcc/config/arm/cortex-a8-neon.md
> @@ -357,30 +357,34 @@ (define_insn_reservation "cortex_a8_vfp_muls" 12
>         (eq_attr "type" "fmuls"))
>    "cortex_a8_vfp,cortex_a8_vfplite*11")
>
> +;; Don't model a reservation for more than 15 cycles as this explodes the
> +;; state space of the automaton for little gain.  It is unlikely that the
> +;; scheduler will find enough instructions to hide the full latency of the
> +;; instructions.
>  (define_insn_reservation "cortex_a8_vfp_muld" 17
>    (and (eq_attr "tune" "cortexa8")
>         (eq_attr "type" "fmuld"))
> -  "cortex_a8_vfp,cortex_a8_vfplite*16")
> +  "cortex_a8_vfp,cortex_a8_vfplite*15")
>
>  (define_insn_reservation "cortex_a8_vfp_macs" 21
>    (and (eq_attr "tune" "cortexa8")
>         (eq_attr "type" "fmacs,ffmas"))
> -  "cortex_a8_vfp,cortex_a8_vfplite*20")
> +  "cortex_a8_vfp,cortex_a8_vfplite*15")
>
>  (define_insn_reservation "cortex_a8_vfp_macd" 26
>    (and (eq_attr "tune" "cortexa8")
>         (eq_attr "type" "fmacd,ffmad"))
> -  "cortex_a8_vfp,cortex_a8_vfplite*25")
> +  "cortex_a8_vfp,cortex_a8_vfplite*15")
>
>  (define_insn_reservation "cortex_a8_vfp_divs" 37
>    (and (eq_attr "tune" "cortexa8")
>         (eq_attr "type" "fdivs, fsqrts"))
> -  "cortex_a8_vfp,cortex_a8_vfplite*36")
> +  "cortex_a8_vfp,cortex_a8_vfplite*15")
>
>  (define_insn_reservation "cortex_a8_vfp_divd" 65
>    (and (eq_attr "tune" "cortexa8")
>         (eq_attr "type" "fdivd, fsqrtd"))
> -  "cortex_a8_vfp,cortex_a8_vfplite*64")
> +  "cortex_a8_vfp,cortex_a8_vfplite*15")
>
>  ;; Comparisons can actually take 7 cycles sometimes instead of four,
>  ;; but given all the other instructions lumped into type=ffarith that
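
For anyone reading the hunk above, the rule the patch applies can be sketched in a few lines of Python. This is only an illustration: `capped_reservation`, `MAX_RESERVATION_CYCLES` and `LATENCIES` are hypothetical names, not anything in GCC. It assumes the pre-patch pattern visible in this hunk, where cortex_a8_vfplite is reserved for (latency - 1) cycles, and it leaves the latency operand itself untouched, as the patch does.

```python
# Hypothetical helper, not part of GCC: it only mirrors the arithmetic
# visible in the hunk above.
MAX_RESERVATION_CYCLES = 15  # the cap chosen in the patch


def capped_reservation(latency: int, cap: int = MAX_RESERVATION_CYCLES) -> str:
    """Build the reservation string for an insn with the given latency.

    Assumption: before the patch, cortex_a8_vfplite was reserved for
    (latency - 1) cycles; the patch caps that repeat count at `cap`.
    """
    repeat = min(latency - 1, cap)
    return f"cortex_a8_vfp,cortex_a8_vfplite*{repeat}"


# Latencies copied from the define_insn_reservation entries in the hunk.
LATENCIES = {
    "cortex_a8_vfp_muls": 12,
    "cortex_a8_vfp_muld": 17,
    "cortex_a8_vfp_macs": 21,
    "cortex_a8_vfp_macd": 26,
    "cortex_a8_vfp_divs": 37,
    "cortex_a8_vfp_divd": 65,
}

if __name__ == "__main__":
    for name, latency in LATENCIES.items():
        print(f"{name:20s} latency={latency:2d}  {capped_reservation(latency)}")
```

Only cortex_a8_vfp_muls keeps its original reservation (11 cycles is already under the cap); the other five all collapse to the same 15-cycle reservation, which is where the state-space saving comes from.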