On Fri, Jun 20, 2014 at 5:01 PM, Bingfeng Mei <b...@broadcom.com> wrote:
>
>
>> -----Original Message-----
>> From: Bin.Cheng [mailto:amker.ch...@gmail.com]
>> Sent: 20 June 2014 06:25
>> To: Bingfeng Mei
>> Cc: gcc@gcc.gnu.org
>> Subject: Re: regs_used estimation in IVOPTS seriously flawed
>>
>> On Tue, Jun 17, 2014 at 10:59 PM, Bingfeng Mei <b...@broadcom.com> wrote:
>> > Hi,
>> > I am looking at a performance regression in our code. A big loop
>> > produces and uses a lot of temporary variables inside the loop body.
>> > The problem appears that IVOPTS pass creates even more induction
>> > variables (from original 2 to 27). It causes a lot of register
>> > spilling later and performance
>> Do you have a simplified case which can be posted here?  I guess it
>> affects some other targets too.
>>
>> > take a severe hit. I looked into tree-ssa-loop-ivopts.c, it does call
>> > estimate_reg_pressure_cost function to take # of registers into
>> > consideration. The second parameter passed as data->regs_used is
>> > supposed to represent old register usage before IVOPTS.
>> >
>> >   return size + estimate_reg_pressure_cost (size, data->regs_used,
>> >                                             data->speed,
>> >                                             data->body_includes_call);
>> >
>> > In this case, it is mere 2 by following calculation. Essentially, it
>> > only counts all loop invariant registers, ignoring all registers
>> > produced/used inside the loop.
>> There are two kinds of registers produced/used inside the loop.  One
>> is induction variable irrelevant, it includes non-linear uses as
>> mentioned by Richard.  The other kind relates to induction variable
>> rewrite, and one issue with this kind is expression generated during
>> iv use rewriting is not reflecting the estimated one in ivopt very
>> well.
>>
>
> As a short term solution, I tried some simple non-linear functions as
> Richard suggested

Oh, I misread the non-linear way as non-linear iv uses.

> to penalize using too many IVs. For example, the following cost in
> ivopts_global_cost_for_size fixed my regression and actually improves
> performance slightly over a set of benchmarks we usually use.

Great, I will try to tweak it on ARM.

>
>   return size * (1 + size * 0.2)
>           + estimate_reg_pressure_cost (size, data->regs_used, data->speed,
>                                         data->body_includes_call);
>
> The trouble is that the choice of this non-linear function could be highly
> target dependent (# of registers?). I don't have a setup to prove the
> performance gain for other targets.
>
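
To make the effect of the quoted formula concrete, here is a minimal
standalone sketch comparing the linear size term with the non-linear
one.  It is not the GCC code: reg_pressure_cost_stub is a hypothetical
stand-in for estimate_reg_pressure_cost and always returns 0, so only
the size term differs, and the speed and body_includes_call arguments
are left out.

```c
/* Hypothetical stand-in for GCC's estimate_reg_pressure_cost.  The real
   function depends on target register counts and spill costs; this stub
   returns 0 so that only the size term is compared.  */
static unsigned
reg_pressure_cost_stub (unsigned n_new, unsigned n_old)
{
  (void) n_new;
  (void) n_old;
  return 0;
}

/* Linear form: cost grows in proportion to the number of IVs.  */
static unsigned
cost_linear (unsigned size, unsigned regs_used)
{
  return size + reg_pressure_cost_stub (size, regs_used);
}

/* Non-linear form from the experiment: size * (1 + size * 0.2).
   The double result is truncated when converted to unsigned.  */
static unsigned
cost_nonlinear (unsigned size, unsigned regs_used)
{
  return (unsigned) (size * (1 + size * 0.2))
         + reg_pressure_cost_stub (size, regs_used);
}
```

With regs_used = 2, five IVs cost 5 in the linear form and 10 in the
non-linear one; 27 IVs cost 27 versus 172.  The 0.2 factor is the value
from the quoted experiment and would presumably need tuning per target.
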
> I also tried counting all SSA names and divide it by a factor. It does seem
> to work

So the number currently computed is only a lower bound, and it is too
small.  Maybe some relatively cheap analysis could raise the estimate
without restricting IVOPTS for loops with low register pressure.

Thanks,
bin

> so well.
>
> Long term, if we have infrastructure to analyze maximal live variable in a
> loop at tree-level, that would be great for many loop optimizations.
>
> Thanks,
> Bingfeng



-- 
Best Regards.
