> -----Original Message-----
> From: Richard Biener [mailto:richard.guent...@gmail.com]
> Sent: 18 June 2014 12:36
> To: Bingfeng Mei
> Cc: gcc@gcc.gnu.org
> Subject: Re: regs_used estimation in IVOPTS seriously flawed
> 
> On Tue, Jun 17, 2014 at 4:59 PM, Bingfeng Mei <b...@broadcom.com> wrote:
> > Hi,
> > I am looking at a performance regression in our code. A big loop
> produces
> > and uses a lot of temporary variables inside the loop body. The
> problem
> > appears that IVOPTS pass creates even more induction variables (from
> original
> > 2 to 27). It causes a lot of register spilling later and performance
> > take a severe hit. I looked into tree-ssa-loop-ivopts.c, it does call
> > estimate_reg_pressure_cost function to take # of registers into
> > consideration. The second parameter passed as data->regs_used is
> supposed
> > to represent old register usage before IVOPTS.
> >
> >   return size + estimate_reg_pressure_cost (size, data->regs_used,
> data->speed,
> >                                             data->body_includes_call);
> >
> > In this case, it is mere 2 by following calculation. Essentially, it
> only counts
> > all loop invariant registers, ignoring all registers produced/used
> inside the loop.
> >
> >   n = 0;
> >   for (psi = gsi_start_phis (loop->header); !gsi_end_p (psi); gsi_next
> (&psi))
> >     {
> >       phi = gsi_stmt (psi);
> >       op = PHI_RESULT (phi);
> >
> >       if (virtual_operand_p (op))
> >         continue;
> >
> >       if (get_iv (data, op))
> >         continue;
> >
> >       n++;
> >     }
> >
> >   EXECUTE_IF_SET_IN_BITMAP (data->relevant, 0, j, bi)
> >     {
> >       struct version_info *info = ver_info (data, j);
> >
> >       if (info->inv_id && info->has_nonlin_use)
> >         n++;
> >     }
> >
> >   data->regs_used = n;
> >
> > I believe how regs_used is calculated is seriously flawed,
> > or estimate_reg_pressure_cost is problematic if n_old is
> > only supposed to be loop invariant registers. Either way,
> > it affects how IVOPTS makes decision and could result in
> > worse code. What do you think? Any idea on how to improve
> > this?
> 
> Well, it's certainly a lower bound on the number of registers
> live through the whole loop execution (thus over the backedge).
> So they have the same cost as an induction variable as far
> as register pressure is concerned.
> 
> What it doesn't account for is the maximum number of live
> registers anywhere in the loop body - but that is hard to
> estimate at this point in the compilation.  You could compute
> the maximum number of live SSA names which could be
> an upper bound on the register pressure - but that needs
> liveness analysis which is expensive also that upper bound
> is probably way too high.
> 
Yes, I agree it is hard and probably expensive at this stage of
compilation to do accurate analysis. But it could be quite useful
for many tree-level loop optimizations, even just a half-accurate
estimation for register pressure, as also discussed in another 
thread a few days ago.


> So I think the current logic is sensible and simple.  It's just
> not perfect.
> 
> Maybe it's just the cost function of the IV set choosen that
> needs to be adjusted to account for the number of IVs
> in a non-linear way?  That is, adjust ivopts_global_cost_for_size
> which just adds size to sth that pessimizes more IVs even
> more like size * (1 + size / (1 + data->regs_used)) or
> simply size ** (1. + eps) with a suitable eps < 2.
> 
I am going to try a few cost functions as you suggested. Maybe also
just count all SSA together and divide it by a factor.

Thanks,
Bingfeng

Reply via email to