On Mon, 11 Jun 2012, William J. Schmidt wrote:

> On Mon, 2012-06-11 at 13:40 +0200, Richard Guenther wrote:
> > On Fri, 8 Jun 2012, William J. Schmidt wrote:
> >
> > > This patch adds a heuristic to the vectorizer when estimating the
> > > minimum profitable number of iterations. The heuristic is
> > > target-dependent, and is currently disabled for all targets except
> > > PowerPC. However, the intent is to make it general enough to be useful
> > > for other targets that want to opt in.
> > >
> > > A previous patch addressed some PowerPC SPEC degradations by modifying
> > > the vector cost model values for vec_perm and vec_promote_demote. The
> > > values were set a little higher than their natural values because the
> > > natural values were not sufficient to prevent a poor vectorization
> > > choice. However, this is not the right long-term solution, since it can
> > > unnecessarily constrain other vectorization choices involving permute
> > > instructions.
> > >
> > > Analysis of the badly vectorized loop (in sphinx3) showed that the
> > > problem was overcommitment of vector resources -- too many vector
> > > instructions issued without enough non-vector instructions available to
> > > cover the delays. The vector cost model assumes that instructions
> > > always have a constant cost, and doesn't have a way of judging this kind
> > > of "density" of vector instructions.
> > >
> > > The present patch adds a heuristic to recognize when a loop is likely to
> > > overcommit resources, and adds a small penalty to the inside-loop cost
> > > to account for the expected stalls. The heuristic is parameterized with
> > > three target-specific values:
> > >
> > >  * Density threshold: The heuristic will apply only when the
> > >    percentage of inside-loop cost attributable to vectorized
> > >    instructions exceeds this value.
> > >
> > >  * Size threshold: The heuristic will apply only when the
> > >    inside-loop cost exceeds this value.
> > >
> > >  * Penalty: The inside-loop cost will be increased by this
> > >    percentage value when the heuristic applies.
> > >
> > > Thus only reasonably large loop bodies that are mostly vectorized
> > > instructions will be affected.
> > >
> > > By applying only a small percentage bump to the inside-loop cost, the
> > > heuristic will only turn off vectorization for loops that were
> > > considered "barely profitable" to begin with (such as the sphinx3 loop).
> > > So the heuristic is quite conservative and should not affect the vast
> > > majority of vectorization decisions.
> > >
> > > Together with the new heuristic, this patch reduces the vec_perm and
> > > vec_promote_demote costs for PowerPC to their natural values.
> > >
> > > I've regstrapped this with no regressions on powerpc64-unknown-linux-gnu
> > > and verified that no performance regressions occur on SPEC cpu2006. Is
> > > this ok for trunk?
> >
> > Hmm. I don't like this patch or its general idea too much. Instead
> > I'd like us to move more of the cost model detail to the target, giving
> > it a chance to look at the whole loop before deciding on a cost. ISTR
> > posting the overall idea at some point, but let me repeat it here instead
> > of trying to find that e-mail.
> >
> > The basic interface of the cost model should be, in targetm.vectorize
> >
> >   /* Tell the target to start cost analysis of a loop or a basic-block
> >      (if the loop argument is NULL). Returns an opaque pointer to
> >      target-private data. */
> >   void *init_cost (struct loop *loop);
> >
> >   /* Add cost for N vectorized-stmt-kind statements in vector_mode. */
> >   void add_stmt_cost (void *data, unsigned n,
> >                       vectorized-stmt-kind,
> >                       enum machine_mode vector_mode);
> >
> >   /* Tell the target to compute and return the cost of the accumulated
> >      statements and free any target-private data. */
> >   unsigned finish_cost (void *data);
> >
> > with eventually slightly different signatures for add_stmt_cost
> > (like pass in the original scalar stmt?).
> >
> > It allows the target, at finish_cost time, to evaluate things like
> > register pressure and resource utilization.
>
> OK, I'm trying to understand how you would want this built into the
> present structure. Taking just the loop case for now:
>
> Judging by your suggested API, we would have to call add_stmt_cost ()
> everywhere that we now call stmt_vinfo_set_inside_of_loop_cost (). For
> now this would be an additional call, not a replacement, though maybe
> the other goes away eventually. This allows the target to save more
> data about the vectorized instructions than just an accumulated cost
> number (order and quantity of various kinds of instructions can be
> maintained for better modeling). Presumably the call to finish_cost
> would be done within vect_estimate_min_profitable_iters () to produce
> the final value of inside_cost for the loop.

Yes. I didn't look in detail at the difference between inner/outer
costs though. Maybe that complicates things, maybe not.

> The default target hook for add_stmt_cost would duplicate what we
> currently do for calculating the inside_cost of a statement, and the
> default target hook for finish_cost would just return the sum.

Right. We should be able to remove STMT_VINFO_INSIDE_OF_LOOP_COST and
maybe STMT_VINFO_OUTSIDE_OF_LOOP_COST and the target would track cost
in whatever way it wants - either by summing on-the-fly or queuing info
from the add_stmt_cost calls to be able to look at the "whole" thing
(plus if it gets the scalar stmt as argument to add_stmt_cost it knows
some of the dependencies of the vector instructions, too).

> I'll have to go hunting where the similar code would fit for SLP in a
> basic block.
>
> If I read you correctly, you don't object to a density heuristic such as
> the one I implemented here, but you want to see such heuristics confined
> to a specific target rather than parameterized for all targets.

Correct.

> How'm I doing? I want to be sure we're on the same page before I delve
> into this.

Thanks,
Richard.
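
For illustration, the three hooks and the density heuristic discussed
in this thread can be sketched in standalone C. This is not GCC code
and is only one possible reading of the proposal: stmt_kind, cost_data,
stmt_cost, the two finish_cost_* variants, and the DENSITY_* values are
all hypothetical placeholders (the values are not taken from the patch),
the vector_mode argument is dropped, and struct loop is left opaque.
The default finish hook returns the running sum; the target-specific
one applies the density/size/penalty test first.

```c
#include <stdlib.h>

struct loop;  /* opaque here; only passed through, never inspected */

/* Hypothetical stand-in for the "vectorized-stmt-kind" argument.  */
enum stmt_kind { SCALAR_STMT, VECTOR_STMT, VEC_PERM };

/* Target-private data handed back by init_cost.  */
struct cost_data
{
  unsigned total_cost;   /* accumulated inside-loop cost */
  unsigned vector_cost;  /* part of total_cost due to vector stmts */
};

/* Placeholder per-statement costs, not real target numbers.  */
static unsigned
stmt_cost (enum stmt_kind kind)
{
  return kind == VEC_PERM ? 3 : 1;
}

/* Start cost analysis; callers are assumed to check for NULL.  */
void *
init_cost (struct loop *loop)
{
  (void) loop;
  return calloc (1, sizeof (struct cost_data));
}

/* Add cost for N statements of KIND, tracking the vector share.  */
void
add_stmt_cost (void *data, unsigned n, enum stmt_kind kind)
{
  struct cost_data *d = data;
  unsigned c = n * stmt_cost (kind);

  d->total_cost += c;
  if (kind != SCALAR_STMT)
    d->vector_cost += c;
}

/* Default behaviour: return the sum and free the private data.  */
unsigned
finish_cost_default (void *data)
{
  struct cost_data *d = data;
  unsigned cost = d->total_cost;

  free (d);
  return cost;
}

/* Illustrative parameters only.  */
enum
{
  DENSITY_PCT_THRESHOLD = 85,  /* percent of cost from vector stmts */
  DENSITY_SIZE_THRESHOLD = 70, /* minimum inside-loop cost */
  DENSITY_PENALTY = 10         /* percent added when both are exceeded */
};

/* Target-specific variant: penalize large, vector-dense loop bodies.  */
unsigned
finish_cost_density (void *data)
{
  struct cost_data *d = data;
  unsigned cost = d->total_cost;

  if (cost > 0)
    {
      unsigned density_pct = d->vector_cost * 100 / cost;

      if (density_pct > DENSITY_PCT_THRESHOLD
          && cost > DENSITY_SIZE_THRESHOLD)
        cost += cost * DENSITY_PENALTY / 100;
    }

  free (d);
  return cost;
}
```

With these placeholder numbers, a body costing 100 of which 90 is vector
work exceeds both thresholds and comes back as 110, while a body costing
10 with the same density stays below the size threshold and is returned
unchanged.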