On Tue, Jul 30, 2019 at 4:16 PM Wilco Dijkstra <wilco.dijks...@arm.com> wrote:
>
> Hi all,
>
> >On 30/07/2019 10:31, Ramana Radhakrishnan wrote:
> >> On 30/07/2019 10:08, Christophe Lyon wrote:
>
> >>> Hi Wilco,
> >>>
> >>> Do you know which benchmarks were used when this was checked-in?
> >>> It isn't clear from
> >>> https://gcc.gnu.org/ml/gcc-patches/2012-07/msg00706.html
> >>
> >> It was from my time in Linaro and thus would have been a famous embedded
> >> benchmark, coremark, spec2000 - all tested probably on cortex-a9 and
> >> Cortex-A15. In addition to this I would like to see what the impact of
> >> this is on something like Cortex-A53 as the issue rates are likely to be
> >> different on the schedulers causing different behaviour.
>
> Obviously there are differences between various schedulers, but the general
> issue is that register pressure is increased many times beyond the spilling
> limit (a few cases I looked at had a pressure well over 120 when there are
> only 14 integer registers - this causes panic spilling in the register
> allocator).
>
> In fact the spilling overhead between the 2 algorithms is almost identical on
> Cortex-A53 and Cortex-A57, so the issue isn't directly related to the pipeline
> model used. It seems more related to the scheduler being too aggressive
> and not caring about register pressure at all (for example lifting a load 100
> instructions before its use so it must be spilled).

In those days it would have been the Cortex-A8, Cortex-A9 and Cortex-A15
schedulers, and IIRC the benchmarking would have been mostly on a Cortex-A9
board or on some Cortex-A15 boards we had (long gone now) inside Arm.

Can you see what happens with the Cortex-A8 or Cortex-A9 schedulers, to
spread the range across some v7-a CPUs as well? While they aren't that
popular today, I would suggest you look at them because the default for
v7-a is still to use the Cortex-A8 scheduler, and the Cortex-A9 scheduler
might well also get used in places given the availability of hardware.

>
> >> I don't have all the notes today for that - maybe you can look into the
> >> linaro wiki.
> >>
> >> I am concerned about taking this patch in without some more data across
> >> a variety of cores.
> >>
> >
> > My concern is the original patch
> > (https://gcc.gnu.org/ml/gcc-patches/2012-07/msg00706.html) is lacking in
> > any real detail as to the reasons for the choice of the second algorithm
> > over the first.
> >
> > - It's not clear what the win was
> > - It's not clear what outliers there were and whether they were
> >   significant.
> >
> > And finally, it's not clear if, 7 years later, this is still the best
> > choice.
> >
> > If the second algorithm really is better, why is no other target using
> > it by default?
> >
> > I think we need a bit more information (both ways). In particular I'm
> > concerned not just by the overall benchmark average, but also the amount
> > of variance across the benchmarks. I think the default needs to avoid
> > significant outliers if at all possible, even if it is marginally less
> > good on the average.
>
> The results clearly show that algorithm 1 works best on Arm today - I haven't
> seen a single benchmark where algorithm 2 results in less spilling. We could
> tune algorithm 2 so it switches back to algorithm 1 when register pressure is
> high or a basic block is large.
> However until it is fixed, the evidence is that
> algorithm 1 is the best choice for current cores.

I'd be happy to move this forward if you could show that there is no
*increase* in spills for the same range of benchmarks when using the
Cortex-A8 and Cortex-A9 schedulers.

Sorry about the time it has taken. I've been a bit otherwise occupied
recently.

regards
Ramana

>
> Wilco