https://gcc.gnu.org/bugzilla/show_bug.cgi?id=61915

--- Comment #17 from Andrew Pinski <pinskia at gcc dot gnu.org> ---
(In reply to Wilco from comment #16)
> (In reply to Andrew Pinski from comment #13)
> > (In reply to Wilco from comment #9)
> > > I committed a workaround
> > > (http://gcc.gnu.org/ml/gcc-patches/2014-09/msg00362.html) by increasing 
> > > the
> > > int<->fp move cost. Can you try this and check the issue has indeed gone?
> > > You need -mcpu=cortex-a57.
> > 
> > Note when I submitted ThunderX support I used a base of 2 instead of a base
> > of 1 due to 2 being the default and all values are relative to that.  This
> > is mentioned in https://gcc.gnu.org/onlinedocs/gccint/Costs.html .  In fact
> > a value of 2 means reload will not look at the constraints of a move
> > instruction.
> > 
> > So I think the cortex* cpus should also re-base these values based on 2
> > being gpr-to-gpr value.
> 
> You mean only use multiples of 2? That's interesting as I've not seen that
> done elsewhere. Are these costs in any way related to real issue and latency
> cycles? Most targets have complex tables with all the exact latencies for
> every little uarch detail, but from what I've seen in the allocator these
> costs have almost no meaning.

Not necessarily multiples of 2, though in the case of ThunderX they happen to be.  The
costs are not directly tied to latency cycles; they are only meaningful relative to one
another, with 2 being the base for a GP-to-GP move.  So I could have used 2, 3, 4
(meaning relative latencies of 1, 2, 3) instead.  I used a factor of 2 rather than 1 for
ThunderX because the costs then stay additive: with increments of 1, 2 + 3 gives 5
rather than 4, whereas with multiples of 2 the sum of two move costs matches the cost
of the combined latency.
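
For illustration, a minimal sketch of what such a per-CPU table looks like in the
aarch64 backend (the struct layout follows cpu_regmove_cost as declared in
gcc/config/aarch64/aarch64-protos.h; the numbers are made-up examples of base-2
scaling, not the committed ThunderX or Cortex-A57 values):

  /* Illustrative only: relative register-move costs with a GP-to-GP base
     of 2, scaled by 2 so that costs add up consistently.  These are
     example numbers, not actual tuning values for any CPU.  */
  static const struct cpu_regmove_cost example_regmove_cost =
  {
    2, /* GP2GP: base cost, relative latency 1.  */
    4, /* GP2FP: relative latency 2.  */
    6, /* FP2GP: relative latency 3.  */
    4  /* FP2FP: relative latency 2.  */
  };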

> 
> So did you find that setting the FP move cost so low actually works alright
> on ThunderX? I'd like to figure out a setting for the generic target that
> works out well across all AArch64 implementations.

Yes, it seems to, at least on the things we have benchmarked, though we have not yet
run larger benchmarks such as SPEC.
