On Mon, 11 Jun 2012, William J. Schmidt wrote:

> On Mon, 2012-06-11 at 13:40 +0200, Richard Guenther wrote:
> > On Fri, 8 Jun 2012, William J. Schmidt wrote:
> > 
> > > This patch adds a heuristic to the vectorizer when estimating the
> > > minimum profitable number of iterations.  The heuristic is
> > > target-dependent, and is currently disabled for all targets except
> > > PowerPC.  However, the intent is to make it general enough to be useful
> > > for other targets that want to opt in.
> > > 
> > > A previous patch addressed some PowerPC SPEC degradations by modifying
> > > the vector cost model values for vec_perm and vec_promote_demote.  The
> > > values were set a little higher than their natural values because the
> > > natural values were not sufficient to prevent a poor vectorization
> > > choice.  However, this is not the right long-term solution, since it can
> > > unnecessarily constrain other vectorization choices involving permute
> > > instructions.
> > > 
> > > Analysis of the badly vectorized loop (in sphinx3) showed that the
> > > problem was overcommitment of vector resources -- too many vector
> > > instructions issued without enough non-vector instructions available to
> > > cover the delays.  The vector cost model assumes that instructions
> > > always have a constant cost, and doesn't have a way of judging this kind
> > > of "density" of vector instructions.
> > > 
> > > The present patch adds a heuristic to recognize when a loop is likely to
> > > overcommit resources, and adds a small penalty to the inside-loop cost
> > > to account for the expected stalls.  The heuristic is parameterized with
> > > three target-specific values:
> > > 
> > >  * Density threshold: The heuristic will apply only when the
> > >    percentage of inside-loop cost attributable to vectorized
> > >    instructions exceeds this value.
> > > 
> > >  * Size threshold: The heuristic will apply only when the
> > >    inside-loop cost exceeds this value.
> > > 
> > >  * Penalty: The inside-loop cost will be increased by this
> > >    percentage value when the heuristic applies.
> > > 
> > > Thus only reasonably large loop bodies that are mostly vectorized
> > > instructions will be affected.
> > > 
> > > By applying only a small percentage bump to the inside-loop cost, the
> > > heuristic will only turn off vectorization for loops that were
> > > considered "barely profitable" to begin with (such as the sphinx3 loop).
> > > So the heuristic is quite conservative and should not affect the vast
> > > majority of vectorization decisions.
> > > 
> > > Together with the new heuristic, this patch reduces the vec_perm and
> > > vec_promote_demote costs for PowerPC to their natural values.
> > > 
> > > I've regstrapped this with no regressions on powerpc64-unknown-linux-gnu
> > > and verified that no performance regressions occur on SPEC cpu2006.  Is
> > > this ok for trunk?
> > 
> > Hmm.  I don't like this patch or its general idea too much.  Instead
> > I'd like us to move more of the cost model detail to the target, giving
> > it a chance to look at the whole loop before deciding on a cost.  ISTR
> > posting the overall idea at some point, but let me repeat it here instead
> > of trying to find that e-mail.
> > 
> > The basic interface of the cost model should be, in targetm.vectorize
> > 
> >   /* Tell the target to start cost analysis of a loop or a basic-block
> >      (if the loop argument is NULL).  Returns an opaque pointer to
> >      target-private data.  */
> >   void *init_cost (struct loop *loop);
> > 
> >   /* Add cost for N vectorized-stmt-kind statements in vector_mode.  */
> >   void add_stmt_cost (void *data, unsigned n,
> >                       vectorized-stmt-kind,
> >                       enum machine_mode vector_mode);
> > 
> >   /* Tell the target to compute and return the cost of the accumulated
> >      statements and free any target-private data.  */
> >   unsigned finish_cost (void *data);
> > 
> > with possibly slightly different signatures for add_stmt_cost
> > (like passing in the original scalar stmt?).
> > 
> > It allows the target, at finish_cost time, to evaluate things like
> > register pressure and resource utilization.
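For concreteness, default implementations of these hooks that just sum
per-statement costs could look roughly like the sketch below.  The
statement kinds, the cost table, and the function names are invented
here for illustration; they are not actual GCC interfaces.

```c
#include <stdlib.h>

/* Hypothetical statement kinds and per-kind costs.  */
enum stmt_kind { SCALAR_STMT, VECTOR_STMT, VEC_PERM_STMT };

static unsigned
builtin_stmt_cost (enum stmt_kind kind)
{
  switch (kind)
    {
    case VEC_PERM_STMT: return 3;
    default:            return 1;
    }
}

/* Start cost analysis; the private data is just an accumulator.  */
static void *
default_init_cost (void)
{
  return calloc (1, sizeof (unsigned));
}

/* Add the cost of N statements of KIND to the running sum.  */
static void
default_add_stmt_cost (void *data, unsigned n, enum stmt_kind kind)
{
  *(unsigned *) data += n * builtin_stmt_cost (kind);
}

/* Return the accumulated cost and free the private data.  */
static unsigned
default_finish_cost (void *data)
{
  unsigned total = *(unsigned *) data;
  free (data);
  return total;
}
```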
> 
> OK, I'm trying to understand how you would want this built into the
> present structure.  Taking just the loop case for now:
> 
> Judging by your suggested API, we would have to call add_stmt_cost ()
> everywhere that we now call stmt_vinfo_set_inside_of_loop_cost ().  For
> now this would be an additional call, not a replacement, though maybe
> the other goes away eventually.  This allows the target to save more
> data about the vectorized instructions than just an accumulated cost
> number (order and quantity of various kinds of instructions can be
> maintained for better modeling).  Presumably the call to finish_cost
> would be done within vect_estimate_min_profitable_iters () to produce
> the final value of inside_cost for the loop.

Yes.  I didn't look in detail at the difference between inner/outer
costs though.  Maybe that complicates things, maybe not.

> The default target hook for add_stmt_cost would duplicate what we
> currently do for calculating the inside_cost of a statement, and the
> default target hook for finish_cost would just return the sum.

Right.  We should be able to remove STMT_VINFO_INSIDE_OF_LOOP_COST
and maybe STMT_VINFO_OUTSIDE_OF_LOOP_COST and the target would track
cost in whatever way it wants - either by summing on-the-fly or
queuing info from the add_stmt_cost calls to be able to look at the
"whole" thing (plus, if it gets the scalar stmt as an argument to
add_stmt_cost, it knows some of the dependencies of the vector
instructions, too).
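A target that wants to look at the whole loop body could queue the
add_stmt_cost calls instead of summing on the fly, something like the
sketch below (all names are illustrative, and each statement is
counted with unit cost just to keep the example short):

```c
#include <stdlib.h>

enum stmt_kind { SCALAR_STMT, VECTOR_STMT, VEC_PERM_STMT };

struct cost_entry
{
  unsigned count;
  enum stmt_kind kind;
};

/* Target-private data: a growable queue of recorded statements.  */
struct cost_queue
{
  struct cost_entry *entries;
  unsigned num, cap;
};

static void *
queue_init_cost (void)
{
  return calloc (1, sizeof (struct cost_queue));
}

/* Record N statements of KIND for later analysis.  */
static void
queue_add_stmt_cost (void *data, unsigned n, enum stmt_kind kind)
{
  struct cost_queue *q = data;
  if (q->num == q->cap)
    {
      q->cap = q->cap ? q->cap * 2 : 8;
      q->entries = realloc (q->entries,
			    q->cap * sizeof (struct cost_entry));
    }
  q->entries[q->num].count = n;
  q->entries[q->num].kind = kind;
  q->num++;
}

/* Compute the cost with a view of the whole queue.  VEC is tallied
   so that a density check (or ordering/pressure analysis) could be
   applied here; this sketch just returns the plain sum.  */
static unsigned
queue_finish_cost (void *data)
{
  struct cost_queue *q = data;
  unsigned total = 0, vec = 0;
  for (unsigned i = 0; i < q->num; i++)
    {
      total += q->entries[i].count;
      if (q->entries[i].kind != SCALAR_STMT)
	vec += q->entries[i].count;
    }
  (void) vec;
  free (q->entries);
  free (q);
  return total;
}
```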

> I'll have to go hunting where the similar code would fit for SLP in a
> basic block.
> 
> If I read you correctly, you don't object to a density heuristic such as
> the one I implemented here, but you want to see such heuristics confined
> to a specific target rather than parameterized for all targets.

Correct.

> How'm I doing?  I want to be sure we're on the same page before I delve
> into this.

Thanks,
Richard.
