On Fri, Jan 16, 2015 at 3:06 PM, Maxim Kuvyrkov <maxim.kuvyr...@linaro.org> wrote: > On Nov 19, 2014, at 12:27 PM, Ramana Radhakrishnan > <ramana.radhakrish...@arm.com> wrote: > >> > > Hi Ramana, > Hi Vladimir, > > I still don't have SPEC2000/SPEC2006 benchmark numbers for this patch. Since > stage3 is about to finish, I'm going to commit the target independent part of > the patch now (it was approved by Vladimir a while ago). > > I'll commit ARM part of the patch once Ramana is happy with it. > > I've addressed all review comments and updated patch is attached. > > Any objections? > >> On 14/11/14 15:12, Maxim Kuvyrkov wrote: >>> On Nov 14, 2014, at 8:38 AM, Jeff Law <l...@redhat.com> wrote: >>> >>>> On 10/20/14 22:06, Maxim Kuvyrkov wrote: >>>>> Hi, >>>>> Ramana, this change requires benchmarking, which I can't easily do >>>>> at >>>> the moment. I would appreciate any benchmarking results that you can >>>> share. In particular, the value of PARAM_SCHED_AUTOPREF_QUEUE_DEPTH >>>> needs to be tuned/confirmed for Cortex-A15. >>>> What were the results of that benchmarking? IIRC I tabled reviewing this >>>> work waiting for those results (and I probably should have let you know >>>> that. Sorry, my bad there). >>> >>> I don't have the benchmarking results yet, and I was hoping for ARM to help >>> with getting the numbers. The arm maintainers still need to OK the >>> arm-specific portion of the patch, which, I imagine, will happen only of >>> benchmark scores improve. >> >> >> I tried benchmarking 78f367cfcfdc9f0a422a362cd85ecc122834d96f from the trees >> you gave me links to against the equivalent version on trunk. >> >> The results with SPEC2k on A15 were in the noise with the default value for >> PARAM_SCHED_AUTOPREF_QUEUE_DEPTH which is 2 in the backend. I'm still >> waiting on results for values 0, 1 and 3 and hopefully something will come >> back soon for SPEC2k. > > After going back to the original benchmark that exposed the autoprefetcher > issue (STREAMS) the required value of the parameter to fix the regression > turned out to be 5. > > I have also discovered that Cortex-A57 also benefits from this optimization. > Running STREAMS benchmark on Juno in AArch32 mode showed increased > performance by 10-15%, which is not as big as Cortex-A15's 250-300%, but > still significant. > > In this updated patch the parameter is set to maximum -- look through all > scheduling queue of instructions in search of memory ops that are relevant to > L2 autoprefetcher. In practice the setting to maximum will not change much, > since there are rarely many instructions in the queue. > >> >> >>> >>> @@ -29903,6 +29915,20 @@ arm_first_cycle_multipass_dfa_lookahead (void) >>> return issue_rate > 1 ? issue_rate : 0; >>> } >>> >>> +/* Enable modeling of Cortex-A15 L2 auto-prefetcher. */ >>> +static int >>> +arm_first_cycle_multipass_dfa_lookahead_guard (rtx insn, int ready_index) >>> +{ >>> + switch (arm_tune) >>> + { >>> + case cortexa15: >>> + return autopref_multipass_dfa_lookahead_guard (insn, ready_index); >>> + >>> + default: >>> + return 0; >>> + } >>> +} >>> + >> >> >> It would be better to have this as a flag in the tuning tables rather than >> hardcoding for a core here. The backend has been moving in that direction >> for all core centric information and it is preferable that be continued. >> >> So this logic here should just be >> >> if (current_tune->multipass_lookahead) >> return autopref_multipass_lookahead_guard (insn, ready_index); >> else >> return 0; >> > > Done. The autoprefetcher is enabled in Cortex-A15 and Cortex-A57 tuning > tables.
The ARM changes look sane. As long as you've tested this with a bootstrap, I'm ok with it - we probably need a similar change for AArch64 and Cortex-A57 but I think we should do that for the next stage1. Thanks for persevering with this. Ramana > > -- > Maxim Kuvyrkov > www.linaro.org > >