On Fri, Feb 03, 2017 at 02:58:23PM +0300, Maxim Kuvyrkov wrote:
> > On Jan 30, 2017, at 5:50 PM, Maxim Kuvyrkov <maxim.kuvyr...@linaro.org> 
> > wrote:
> > 
> >> On Jan 30, 2017, at 3:23 PM, Kyrill Tkachov <kyrylo.tkac...@foss.arm.com> 
> >> wrote:
> >> 
> >> Hi Maxim,
> >> 
> >> On 30/01/17 12:06, Maxim Kuvyrkov wrote:
> >>> This patch enables prefetching at -O3 for aarch64 cores that set 
> >>> "simultaneous prefetches" parameter above 0.  There are currently no such 
> >>> settings, so this patch doesn't change default code generation.
> >>> 
> >>> I'm now working on improvements to -fprefetch-loop-arrays pass to make it 
> >>> suitable for -O2. I'll post this work in the next month.
> >>> 
> >>> Bootstrapped and regtested on x86_64-linux-gnu and aarch64-linux-gnu.
> >>> 
> >> 
> >> Are you aiming to get this in for GCC 8?
> >> I have one small comment on this patch:
> >> 
> >> +  /* Enable sw prefetching at -O3 for CPUS that have prefetch, and we
> >> +     have deemed it beneficial (signified by setting
> >> +     prefetch.num_slots to 1 or more).  */
> >> +  if (flag_prefetch_loop_arrays < 0
> >> +      && HAVE_prefetch
> >> 
> >> HAVE_prefetch will always be true on aarch64.
> >> I imagine midend code that had logic like this would need this check, but 
> >> aarch64-specific code shouldn't need it.
> > 
> > Agree, I'll remove HAVE_prefetch.
> > 
> > This pattern was copied from other backends, and HAVE_prefetch is most 
> > likely a historical artifact.
> 
> Andrew raised a good point in the review of his patch that it is a bad idea
> to use one of prefetching parameters (simultaneous_prefetches) as indicator
> for whether to enable prefetching pass by default.  Indeed there are cases
> when we want to set simultaneous_prefetch according to HW documentation (or
> experimental results), but not enable prefetching pass by default.
> 
> This update to the patch addresses it.  The patch adds a new explicit field
> to prefetch tuning structure "default_opt_level" that sets optimization level
> from which prefetching should be enabled by default.  The current value is to
> enable prefetching at -O3; additionally, this parameter will come handy for
> enabling prefetching at -O2 [when it is ready].

I really don't like the scheme of changing the optimisation threshold when
profiling data is used.

I've seen too many reports and presentations by the uninitiated who believe
that the use of profiling data has made the difference, when in reality
it is just GCC changing behaviour on which passes run. It is very
misleading!

With that line removed, and any rebasing needed over changes to the macro,
I'm happy with this patch.

Thanks,
James

Reply via email to