On Tue, Jul 30, 2019 at 4:16 PM Wilco Dijkstra <wilco.dijks...@arm.com> wrote:
>
> Hi all,
>
>  >On 30/07/2019 10:31, Ramana Radhakrishnan wrote:
>  >> On 30/07/2019 10:08, Christophe Lyon wrote:
>
>  >>> Hi Wilco,
>  >>>
>  >>> Do you know which benchmarks were used when this was checked-in?
>  >>> It isn't clear from
>  >>> https://gcc.gnu.org/ml/gcc-patches/2012-07/msg00706.html
>  >>
>  >> It was from my time in Linaro and thus would have been a famous embedded
>  >> benchmark, coremark , spec2000 - all tested probably on cortex-a9 and
>  >> Cortex-A15. In addition to this I would like to see what the impact of
>  >> this is on something like Cortex-A53 as the issue rates are likely to be
>  >> different on the schedulers causing different behaviour.
>
> Obviously there are differences between various schedulers, but the general
> issue is that register pressure is increased many times beyond the spilling 
> limit
> (a few cases I looked at had a pressure well over 120 when there are only 14
> integer registers - this causes panic spilling in the register allocator).
>
> In fact the spilling overhead between the 2 algorithms is almost identical on
> Cortex-A53 and Cortex-A57, so the issue isn't directly related to the pipeline
> model used. It seems more related to the scheduler being too aggressive
> and not caring about register pressure at all (for example lifting a load 100
> instructions before its use so it must be spilled).

In those days it would have been the Cortex-A8, Cortex-A9 schedulers
and the Cortex-A15
schedulers and IIRC the benchmarking would have been mostly on a
Cortex-A9 board or on some
Cortex-A15 boards we had (long gone now) inside Arm.

Can you see what happens with the Cortex-A8 or Cortex-A9 schedulers to
spread the range
across some v7-a CPUs as well ? While they aren't that popular today I
would suggest
you look at them because the defaults for v7-a are still to use the
Cortex-A8 scheduler and
the Cortex-A9 scheduler might well also get used in places given the
availability of hardware.


>
>  >> I don't have all the notes today for that - maybe you can look into the
>  >> linaro wiki.
>  >>
>  >> I am concerned about taking this patch in without some more data across
>  >> a variety of cores.
>  >>
>  >
>  > My concern is the original patch
>  > (https://gcc.gnu.org/ml/gcc-patches/2012-07/msg00706.html) is lacking in
>  > any real detail as to the reasons for the choice of the second algorithm
>  > over the first.
>  >
>  > - It's not clear what the win was
>  > - It's not clear what outliers there were and whether they were 
> significant.
>  >
>  > And finally, it's not clear if, 7 years later, this is still the best
>  > choice.
>  >
>  > If the second algorithm really is better, why is no other target using
>  > it by default?
>  >
>  > I think we need a bit more information (both ways).  In particular I'm
>  > concerned not just by the overall benchmark average, but also the amount
>  > of variance across the benchmarks.  I think the default needs to avoid
>  > significant outliers if at all possible, even if it is marginally less
>  > good on the average.
>
> The results clearly show that algorithm 1 works best on Arm today - I haven't
> seen a single benchmark where algorithm 2 results in less spilling. We could
> tune algorithm 2 so it switches back to algorithm 1 when register pressure is
> high or a basic block is large. However until it is fixed, the evidence is 
> that
> algorithm 1 is the best choice for current cores.

I'd be happy to move this forward if you could show if there is no
*increase* in spills
for the same range of benchmarks that you are doing for the Cortex-A8
and Cortex-A9
schedulers.

Sorry about the time it has taken. I've been a bit otherwise occupied recently.

regards
Ramana

>
> Wilco

Reply via email to