> -----Original Message-----
> From: Bin.Cheng [mailto:amker.ch...@gmail.com]
> Sent: 20 June 2014 06:25
> To: Bingfeng Mei
> Cc: gcc@gcc.gnu.org
> Subject: Re: regs_used estimation in IVOPTS seriously flawed
> 
> On Tue, Jun 17, 2014 at 10:59 PM, Bingfeng Mei <b...@broadcom.com> wrote:
> > Hi,
> > I am looking at a performance regression in our code. A big loop produces
> > and uses a lot of temporary variables inside the loop body. The problem
> > appears that IVOPTS pass creates even more induction variables (from original
> > 2 to 27). It causes a lot of register spilling later and performance
> Do you have a simplified case which can be posted here?  I guess it
> affects some other targets too.
> 
> > take a severe hit. I looked into tree-ssa-loop-ivopts.c, it does call
> > estimate_reg_pressure_cost function to take # of registers into
> > consideration. The second parameter passed as data->regs_used is supposed
> > to represent old register usage before IVOPTS.
> >
> >   return size + estimate_reg_pressure_cost (size, data->regs_used, data->speed,
> >                                             data->body_includes_call);
> >
> > In this case, it is mere 2 by following calculation. Essentially, it only counts
> > all loop invariant registers, ignoring all registers produced/used inside the loop.
> There are two kinds of registers produced/used inside the loop.  One
> is induction variable irrelevant, it includes non-linear uses as
> mentioned by Richard.  The other kind relates to induction variable
> rewrite, and one issue with this kind is expression generated during
> iv use rewriting is not reflecting the estimated one in ivopt very
> well.
> 

As a short-term solution, I tried some simple non-linear functions, as Richard
suggested, to penalize using too many IVs. For example, the following cost in
ivopts_global_cost_for_size fixed my regression and actually improved
performance slightly over a set of benchmarks we usually use.

  return size * (1 + size * 0.2)
          + estimate_reg_pressure_cost (size, data->regs_used, data->speed,
                                        data->body_includes_call);

The trouble is that the choice of this non-linear function could be highly
target dependent (# of registers?). I don't have a setup to prove a performance
gain for other targets.
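
To make the effect of the extra term concrete, here is a small standalone
sketch comparing the two forms. It is an illustration only:
toy_reg_pressure_cost is a hypothetical stand-in, not GCC's
estimate_reg_pressure_cost, and avail and spill_cost are made-up target
parameters. It also uses double throughout, which the code in ivopts may not.

```c
#include <assert.h>

/* Hypothetical stand-in for estimate_reg_pressure_cost.  This is NOT GCC's
   implementation: it charges nothing while everything fits in `avail`
   registers and a flat `spill_cost` per new register otherwise.  */
static double
toy_reg_pressure_cost (unsigned n_new, unsigned n_old,
                       unsigned avail, unsigned spill_cost)
{
  assert (avail > 0);
  if (n_new + n_old <= avail)
    return 0.0;
  return (double) spill_cost * n_new;
}

/* Linear form: mirrors `size + estimate_reg_pressure_cost (...)`.  */
static double
linear_cost (unsigned size, unsigned regs_used,
             unsigned avail, unsigned spill_cost)
{
  return size + toy_reg_pressure_cost (size, regs_used, avail, spill_cost);
}

/* Non-linear form: mirrors `size * (1 + size * 0.2) + ...`.  */
static double
nonlinear_cost (unsigned size, unsigned regs_used,
                unsigned avail, unsigned spill_cost)
{
  return size * (1 + size * 0.2)
         + toy_reg_pressure_cost (size, regs_used, avail, spill_cost);
}
```

With the assumed avail = 32 and regs_used = 2, the toy pressure term is zero
for both 2 and 27 IVs, so the linear form sees costs of 2 and 27 while the
non-linear form sees roughly 2.8 and 172.8. The penalty grows much faster
than the IV count, and the 0.2 factor is exactly the knob that would need
tuning per target.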

I also tried counting all SSA names and dividing the count by a factor. It does
seem to work so well.

Long term, if we had infrastructure to analyze the maximal number of live
variables in a loop at tree level, that would be great for many loop
optimizations.
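
As a rough idea of what such an analysis would report, the sketch below
computes the peak number of simultaneously live values over a straight-line
loop body. Everything here is hypothetical (struct live_range, max_live, and
the statement numbering are made up for illustration), and it ignores control
flow and loop-carried values, which a real tree-level analysis would have to
handle.

```c
#include <stddef.h>

/* Hypothetical live range: a value defined at statement index `def` and
   last used at statement index `last_use` (both inclusive).  */
struct live_range
{
  int def;
  int last_use;
};

/* Return the largest number of ranges that overlap at any single statement
   index in [0, n_stmts).  Simple O(n_stmts * n) scan, fine for a sketch.  */
static int
max_live (const struct live_range *r, size_t n, int n_stmts)
{
  int best = 0;
  for (int p = 0; p < n_stmts; p++)
    {
      int live = 0;
      for (size_t i = 0; i < n; i++)
        if (r[i].def <= p && p <= r[i].last_use)
          live++;
      if (live > best)
        best = live;
    }
  return best;
}
```

A number like this, rather than a count of loop invariants, is what one would
want to feed in as regs_used.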

Thanks,
Bingfeng
