------- Comment #1 from amonakov at gcc dot gnu dot org  2010-03-01 17:43 -------
Created an attachment (id=20001)
 --> (http://gcc.gnu.org/bugzilla/attachment.cgi?id=20001&action=view)
Simplify increments in IVopts using final values of inner loop IVs
A quick & dirty attempt to implement register pressure reduction in outer loops by using the final values of inner loop IVs.  Currently, given

    for (i = 0; i < N; i++)
      for (j = 0; j < N; j++)
        s += a[i][j];

we generate something like

    <bb1>
    L1:
      s.0 = PHI(0, s.2)
      i.0 = PHI(0, i.1)
      ivtmp.0 = &a[i.0][0]
    <bb2>
    L2:
      s.1 = PHI(s.0, s.2)
      j.0 = PHI(122, j.1)
      ivtmp.1 = PHI(ivtmp.0, ivtmp.2)
      s.2 = s.1 + MEM(ivtmp.1)
      ivtmp.2 = ivtmp.1 + 4
      j.1 = j.0 - 1
      if (j.1 >= 0) goto L2
    <bb3>
      i.1 = i.0 + 1
      if (i.1 <= 122) goto L1

This patch, together with the patch mentioned in the previous comment, makes it possible to generate:

      ivtmp.0 = &a[0][0]
    <bb1>
    L1:
      s.0 = PHI(0, s.2)
      i.0 = PHI(122, i.1)
      ivtmp.1 = PHI(ivtmp.0, ivtmp.4)
    <bb2>
    L2:
      s.1 = PHI(s.0, s.2)
      j.0 = PHI(122, j.1)
      ivtmp.2 = PHI(ivtmp.1, ivtmp.3)
      s.2 = s.1 + MEM(ivtmp.2)
      ivtmp.3 = ivtmp.2 + 4
      j.1 = j.0 - 1
      if (j.1 >= 0) goto L2
    <bb3>
      ivtmp.4 = ivtmp.3  // would be ivtmp.4 = ivtmp.1 + stride
      i.1 = i.0 - 1
      if (i.1 >= 0) goto L1

The improvement is that ivtmp.1 is not live across the inner loop.  Without the patch the outer update would be ivtmp.4 = ivtmp.1 + stride, which keeps ivtmp.1 alive until the end of the outer loop body; here the final value of the inner IV, ivtmp.3, already equals that address, so it is used instead.

The approach is to store the final values of IVs in a hashtable, mapping the SSA_NAME of the initial value in the preheader to an aff_tree with the final value, and then to try to replace increments of new IVs with uses of IVs from inner loops (currently I have just implemented a brute-force loop over all IV uses to find a useful entry in that hashtable).

Does this make sense, and does it sound acceptable?

-- 
http://gcc.gnu.org/bugzilla/show_bug.cgi?id=43174
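A source-level C illustration of the two loop shapes may help. This is a hand-written sketch, not output of the patch. It assumes a flat `int` array of N*N elements with N = 123 to match the 0..122 bounds in the dumps, and `sum_before`/`sum_after` are hypothetical names. A flat array is used because walking one pointer across row boundaries is only well-defined within a single array object. In `sum_before` the outer loop keeps `i` and recomputes the row address from it. In `sum_after` the outer pointer is simply continued from the inner pointer's final value, so no outer pointer has to stay live during the inner loop. A compiler may still transform either version differently; the comments only map statements to the names in the dumps.

```c
#define N 123 /* rows and columns; the dumps above use bounds 0..122 */

/* Current shape: the row address is recomputed from i on every outer
   iteration, so i stays live across the inner loop. */
static long sum_before(const int *a)
{
    long s = 0;
    for (int i = 0; i <= N - 1; i++) {        /* i.1 = i.0 + 1; if (i.1 <= 122) */
        const int *q = a + i * N;             /* ivtmp.0 = &a[i.0][0] */
        for (int j = N - 1; j >= 0; j--) {    /* j.0 = PHI(122, j.1) */
            s += *q;                          /* s.2 = s.1 + MEM(ivtmp.1) */
            q++;                              /* ivtmp.2 = ivtmp.1 + 4 */
        }
    }
    return s;
}

/* Patched shape: the outer IV p is updated from the inner IV's final
   value, so p is dead while the inner loop runs. */
static long sum_after(const int *a)
{
    long s = 0;
    const int *p = a;                         /* ivtmp.0 = &a[0][0] */
    for (int i = N - 1; i >= 0; i--) {        /* i.1 = i.0 - 1; if (i.1 >= 0) */
        const int *q = p;                     /* ivtmp.2 = PHI(ivtmp.1, ivtmp.3) */
        for (int j = N - 1; j >= 0; j--) {
            s += *q;                          /* s.2 = s.1 + MEM(ivtmp.2) */
            q++;                              /* ivtmp.3 = ivtmp.2 + 4 */
        }
        p = q;                                /* ivtmp.4 = ivtmp.3, not p + N */
    }
    return s;
}
```

Both functions visit the same N*N elements in the same order, so they return the same sum. After the last outer iteration `p` points one past the end of the array, which C permits as long as it is not dereferenced.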
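The matching step can be modelled with a toy table. All types and names here are hypothetical: `struct final_value` only stands in for the SSA_NAME to aff_tree hashtable and is not GCC's API, SSA names are reduced to version numbers, and a single linear scan replaces both the hashtable and the brute-force loop over IV uses. Each entry records, for one inner IV, the SSA name of its initial value in the preheader, the SSA name holding its value on exit, and its final value as a byte offset from the initial value. An outer increment `outer + step` can reuse the inner exit value when some entry starts at `outer` and ends exactly `step` bytes later. In the example, ivtmp.2 starts at ivtmp.1 and is incremented by 4 on each of 123 iterations, so it ends 492 bytes later, which is the row stride.

```c
#include <stddef.h>

/* Toy record for one inner-loop IV (hypothetical; not GCC's aff_tree).
   SSA names are represented by their version numbers. */
struct final_value {
    int init_ssa;   /* SSA name of the IV's initial value in the preheader */
    int final_ssa;  /* SSA name holding the IV's value on loop exit */
    long offset;    /* final value = init_ssa + offset (bytes) */
};

/* Record an IV that starts at init_ssa and is incremented niter times by step. */
static struct final_value record_final(int init_ssa, int final_ssa,
                                       long step, long niter)
{
    struct final_value fv = { init_ssa, final_ssa, step * niter };
    return fv;
}

/* Find an entry whose final value equals outer_ssa + outer_step, i.e. an
   inner IV whose exit value can replace the outer IV's increment.
   Returns the entry's index, or -1 if none matches. */
static int find_replacement(const struct final_value *tab, size_t n,
                            int outer_ssa, long outer_step)
{
    for (size_t k = 0; k < n; k++)
        if (tab[k].init_ssa == outer_ssa && tab[k].offset == outer_step)
            return (int)k;
    return -1;
}
```

With ivtmp.1 as version 1 and ivtmp.3 as version 3, `record_final(1, 3, 4, 123)` gives an offset of 492. `find_replacement(tab, n, 1, 492)` then returns that entry, and the outer update can become `ivtmp.4 = ivtmp.3`.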