On 21/03/12 13:40, William J. Schmidt wrote: > On Wed, 2012-03-21 at 10:33 +0100, Richard Guenther wrote: >> On Mon, Mar 19, 2012 at 2:19 AM, Andrew Pinski <pins...@gmail.com> wrote: >>> On Sun, Mar 18, 2012 at 6:12 PM, William J. Schmidt >>> <wschm...@linux.vnet.ibm.com> wrote: >>>> Greetings, >>>> >>>> Now that we're into stage 1 again, I'd like to submit the first round of >>>> changes for dominator-based strength reduction, which will address >>>> issues from PR22586, PR35308, PR46556, and perhaps others. I'm >>>> attaching two patches: the smaller (slsr-part1) is the patch I'm >>>> submitting for approval today, while the larger (slsr-fyi) is for >>>> reference only, but may be useful if questions arise about how the small >>>> patch fits into the intended whole. >>>> >>>> This patch contains the logic for identifying strength reduction >>>> candidates, and makes replacements only for those candidates where the >>>> stride is a fixed constant. Replacement for candidates with fixed but >>>> unknown strides are not implemented herein, but that logic can be viewed >>>> in the larger patch. This patch does not address strength reduction of >>>> data reference expressions, or candidates with conditional increments; >>>> those issues will be dealt with in future patches. >>>> >>>> The cost model is built on the one used by tree-ssa-ivopts.c, and I've >>>> added some new instruction costs to that model in place. It might >>>> eventually be good to divorce that modeling code from IVOPTS, but that's >>>> an orthogonal patch and somewhat messy. >>> >>> I think this is the wrong way to do straight line strength reduction >>> considering we have a nice value numbering system which should be easy >>> to extended to support it. >> >> Well, it is easy to handle very specific easy cases like >> >> a = i * 2; >> b = i * 3; >> c = i * 4; >> >> to transform it to >> >> a = i * 2; >> b = a + i; >> c = b + i; >> >> but already >> >> a = i * 2; >> b = i * 4; >> c = i * 6; >> >> would need extra special code. The easy case could be handled in eliminate >> () >> by, when seeing A * CST, looking up A * (CST - 1) and if that >> succeeds, transform >> it to VAL + A. Cost issues are increasing the lifetime of VAL. I've done >> this >> simple case at some point, but it failed to handle the common associated >> cases, >> when we transform (a + 1) * 2, (a + 1) * 3, etc. to a * 2 + 2, a * 3 + >> 3, etc. I think >> it is the re-association in case of a strength-reduction opportunity >> that makes the >> separate pass better? How would you suggest handling this case in the >> VN framework? Detect the a * 3 + 3 pattern and then do two lookups, one for >> a * 2 and one for val + 2? But then we still don't have a value for a + 1 >> to re-use ... > > And it becomes even more difficult with more complex scenarios. > Consider: > > a = x + (3 * s); > b = x + (5 * s); > c = x + (7 * s); > > The framework I've developed recognizes that this group of instructions > is related, and that it is profitable to replace them as follows: > > a = x + (3 * s); > t = 2 * s; > b = a + t; > c = b + t; >
Given that CPUs often have shift+add, that's not necessarily best either. Also, on pipelined super-scalar systems you're serializing a problem when it might be better to improve the parallelism. The best sequence on ARM would probably be something like a = x + (3 * s); b = a + (2 * s); (ADD b, a, s, LSL #1) c = a + (4 * s); (ADD b, a, s, LSL #2). R.