On Tue, Mar 22, 2016 at 11:01 AM, Richard Biener
<richard.guent...@gmail.com> wrote:
> On Tue, Mar 22, 2016 at 11:22 AM, Bin.Cheng <amker.ch...@gmail.com> wrote:
>> On Wed, Mar 16, 2016 at 10:06 AM, Richard Biener
>> <richard.guent...@gmail.com> wrote:
>>>
>>> On Wed, Mar 16, 2016 at 10:48 AM, Bin Cheng <bin.ch...@arm.com> wrote:
>>> > Hi,
>>> > When I tried to decrease the # of IV candidates, I removed the code that
>>> > adds IV candidates for a use with the constant offset stripped from
>>> > use->base.  That turned out to be too aggressive and triggers PR69042.
>>> > So here is a patch adding back the missing candidates.  Honestly, this
>>> > patch doesn't truly fix the issue; it just restores the original
>>> > behavior in the IVOPT part (which I still think is the right thing to
>>> > do).  The issue also depends on the PIC_OFFSET register used on the x86
>>> > target, as discussed in
>>> > https://gcc.gnu.org/ml/gcc/2016-02/msg00040.html.  Furthermore, the real
>>> > problem may be in how IVOPT models register pressure for the PIC_OFFSET
>>> > symbol.
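
(To illustrate in source terms, a hypothetical C sketch, not the testcase
from the PR: the load a[i].y below is an address use whose base contains
the constant offset 4, and the restored code adds the candidate with that
offset stripped, {&a[0], step}, alongside {&a[0] + 4, step}.)

  /* Hypothetical sketch only.  */
  struct point { int x, y; };

  int sum_y (struct point *a, int n)
  {
    int s = 0;
    for (int i = 0; i < n; i++)
      s += a[i].y;   /* address use: base &a[0] + 4, step sizeof (struct point) */
    return s;
  }
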
>>> >
>>> > On AArch64, the overall spec2k number is unchanged, though 173.applu
>>> > regresses by ~4% because the # of candidates in a couple of loops now
>>> > hits "--param iv-consider-all-candidates-bound=30".  For spec2k6 on
>>> > AArch64, INT is not affected and FP is unchanged overall; for specific
>>> > cases, 459.GemsFDTD regresses by 2% and 433.milc improves by 2%.  To
>>> > address the regression, I will send another patch increasing the
>>> > parameter bound.
>>> >
>>> > Bootstrapped & tested on x86_64 and AArch64, is it OK?  In the meantime,
>>> > I will collect spec2k6 data on x86_64.
>>>
>>> Ok.
>> Hi Richard,
>> Hmm, I now have spec2k6 data on my x86_64 box: this patch (along with the
>> patch increasing param iv-consider-all-candidates-bound) causes a 1%
>> regression for 436.cactusADM in my run.  I looked into the code: for
>> function bench_staggeredleapfrog2_ (which takes 99% of the running time
>> after patching), IVOPT chooses one fewer candidate for the outer loop,
>> but that does result in a couple more instructions there.
>
> You mean IVOPTs chooses one fewer IV for the outer loop?
Yes.
>
>>  For this case, register
>> pressure is the more interesting issue (36 candidates are chosen in the
>> outer loop, with many stack accesses).  I'm not sure whether this 1%
>> regression blocks the patch at this stage or not?
>
> Is this with or without the increase of the param?  What compiler options and
> on what sub-architecture was this?
It's not really about the param.  We have both address and generic
iv_uses of the form {base + 4, step}, and there is another address
iv_use {base + 8, step}.  With the patch we now add the candidate
{base, step}, which ends up being used for all of these uses.  Before
the patch, the candidates {base + 4, step} and {base + 8, step} were
chosen for these iv_uses respectively.
As a matter of fact, {base + 4, step} would be the best choice, but we
can't make it because a candidate added for one iv_use is not related
to the other iv_uses, and we exceed the bound of param
iv-consider-all-candidates-bound = 30.
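
Roughly the shape of the problem, as a hypothetical C sketch (not the
actual loop in bench_staggeredleapfrog2_):

  /* Hypothetical sketch only.  The loads p->a and p->b are address
     iv_uses {base + 4, step} and {base + 8, step}; the value &p->a
     passed to use () is a generic iv_use {base + 4, step}.  With the
     patch the common candidate {base, step} is chosen for all three
     uses; before the patch, {base + 4, step} and {base + 8, step}
     were chosen respectively.  */
  struct s { int pad, a, b; };
  extern void use (int *);

  int f (struct s *p, int n)
  {
    int sum = 0;
    for (int i = 0; i < n; i++, p++)
      {
        sum += p->a + p->b;   /* address uses at base + 4 and base + 8 */
        use (&p->a);          /* generic use of the value base + 4 */
      }
    return sum;
  }
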
>
> I think if the IVO choice looks optimal before the patch and not optimal after
> then it's worth blocking but it sounds like the IVO choice is a mess anyway?
Yes, it's a mess here.  For the three outermost loops in 436, IVO
chooses more than 32 candidates for each loop.  I found that at least
one bug in the register pressure calculation is responsible for this.
By fixing it (plus one other local patch), the # of candidates for
these loops is reduced from 36/36/32 to 12/12/7, and the # of lines of
assembly drops from ~7150 to ~6850.  The interesting thing is that the
performance isn't improved...  That may be because they are outermost
loops.

> [can you maybe check IV choice by ICC?]
I don't have ICC at hand, but from past experiments, ICC tends to
reuse common IVs across uses more than GCC does.

Thanks,
bin
