On Tue, Mar 22, 2016 at 11:22 AM, Bin.Cheng <amker.ch...@gmail.com> wrote:
> On Wed, Mar 16, 2016 at 10:06 AM, Richard Biener
> <richard.guent...@gmail.com> wrote:
>>
>> On Wed, Mar 16, 2016 at 10:48 AM, Bin Cheng <bin.ch...@arm.com> wrote:
>> > Hi,
>> > When I tried to decrease # of IV candidates, I removed code that adds IV
>> > candidates for use with constant offset stripped in use->base. This is
>> > kind of too aggressive and triggers PR69042. So here is a patch adding
>> > back the missing candidates. Honestly, this patch doesn't truly fix the
>> > issue, it just brings back the original behavior in IVOPT part (Which is
>> > still a right thing to do I think). The issue still depends on PIC_OFFSET
>> > register used on x86 target. As discussed in
>> > https://gcc.gnu.org/ml/gcc/2016-02/msg00040.html. Furthermore, the real
>> > problem could be in register pressure modeling about PIC_OFFSET symbol in
>> > IVOPT.
>> >
>> > On AArch64, overall spec2k number isn't changed, though 173.applu is
>> > regressed by ~4% because couple of loops' # of candidates now hits
>> > "--param iv-consider-all-candidates-bound=30". For spec2k6 data on
>> > AArch64, INT is not affected; FP overall is not changed, as for specific
>> > case: 459.GemsFDTD is regressed by 2%, 433.milc is improved by 2%. To
>> > address the regression, I will send another patch increasing the parameter
>> > bound.
>> >
>> > Bootstrap&test on x86_64 and AArch64, is it OK? In the meantime, I will
>> > collect spec2k6 data on x86_64.
>>
>> Ok.
> Hi Richard,
> Hmm, I got spec2k6 data on my x86_64, it (along with patch increasing
> param iv-consider-all-candidates-bound) causes 1% regression for
> 436.cactusADM in my run. I looked into the code, for function
> bench_staggeredleapfrog2_ (takes 99% running time after patching),
> IVOPT chooses one fewer candidates for outer loop, but it does result
> in couple of more instructions there.
You mean IVOPTs chooses one fewer IVs for the outer loop?

> For this case, register
> pressure is a more interesting issue (36 candidates chosen in outer
> loop, many stack accesses), not sure if this 1% regression blocks the
> patch at this stage, or not?

Is this with or without the increase of the param? What compiler options
and on what sub-architecture was this?

I think if the IVO choice looks optimal before the patch and not optimal
after then it's worth blocking but it sounds like the IVO choice is a mess
anyway? [can you maybe check IV choice by ICC?]

Thanks,
Richard.

> Thanks,
> bin