On Fri, Apr 12, 2013 at 7:42 PM, Bill Schmidt <wschm...@linux.vnet.ibm.com> wrote: > On Fri, 2013-04-12 at 11:18 -0500, Bill Schmidt wrote: >> On Fri, 2013-04-12 at 15:51 +0100, Sofiane Naci wrote: >> > Hi, >> > >> > Consider the following sequence, which computes 2 addresses to access an >> > array: >> > >> > _2 = (long unsigned int) i_1(D); >> > _3 = _2 * 200; >> > _4 = _3 + 1000; >> > _6 = A2_5(D) + _4; >> > *_6[0] = 1; >> > _9 = _3 + 2000; >> > _10 = A2_5(D) + _9; >> > _11 = _2 * 4; >> > _13 = A1_12(D) + _11; >> > _14 = *_13; >> > *_10[0] = _14; >> > >> > >> > There is an opportunity for optimization here that the compiler misses, >> > probably due to the order of Gimple statements. If we rewrite >> > >> > _3 = _2 * 200; >> > _4 = _3 + 1000; >> > _6 = A2_5(D) + _4; >> > ... >> > _9 = _3 + 2000; >> > _10 = A2_5(D) + _9; >> > >> > as >> > >> > _3 = _2 * 200; >> > _4 = _3 + A2_5(D); >> > _6 = 1000 + _4; >> > ... >> > _9 = _3 + A2_5(D); >> > _10 = 1000 + _9; >> > >> > We can clearly omit instruction _9. >> > >> > As the widening multiply pass has been improved to consider constant >> > operands [1], this opportunity for optimization is lost as the widening >> > multiply pass converts the sequence into: >> > >> > _3 = i_1(D) w* 200; >> > _4 = WIDEN_MULT_PLUS_EXPR <i_1(D), 200, 1000>; >> > _6 = A2_5(D) + _4; >> > ... >> > _9 = WIDEN_MULT_PLUS_EXPR <i_1(D), 200, 2000>; >> > _10 = A2_5(D) + _9; >> > >> > >> > With this particular example, this causes a Dhrystone regression at the >> > AArch64 back end. >> > >> > Where in the front end could such an optimization take place? >> > >> > Bill, is this something that your Strength Reduction work [2] could be >> > addressing? >> >> Hm, no, this isn't really a strength reduction issue. You're not >> wanting to remove an unwanted multiply. This is more of a >> reassociation/value numbering concern. Essentially you have: >> >> = A2 + ((i * 200) + 1000) >> = A2 + ((i * 200) + 2000) >> >> which you'd like to reassociate into >> >> = (A2 + (i * 200)) + 1000 >> = (A2 + (i * 200)) + 2000 >> >> so the parenthesized expression can be seen as available by value >> numbering: >> >> T = A2 + (i * 200) >> = T + 1000 >> = T + 2000 >> >> But reassociation just looks at a single expression tree and doesn't >> know about the potential optimization. I'm not sure this fits well into >> any of the existing passes, but others will have more authoritative >> answers than me... > > All this said, it's not completely foreign to how the strength reduction > pass is structured. The problem is that strength reduction looks for > candidates of very restricted patterns, which keeps compile time down > and avoids deep searching: (a * x) + b or a * (x + b). Your particular > case adds only one more addend, but the number of ways that can be > reassociated immediately adds a fair amount of complexity. If your > example is extended to a two-dimensional array case, it becomes more > complex still. So the methods used by strength reduction don't scale > well to these more general problems. > > I imagine the existing canonical form for address calculations is good > for some things, but not for others. Hopefully someone with more > history in that area can suggest something.
Nothing. All canonicalizations will have cases where they lose. It's a know thing. Solutions welcome ;) Richard.