------- Comment #1 from amonakov at gcc dot gnu dot org  2010-03-01 17:43 -------
Created an attachment (id=20001)
 --> (http://gcc.gnu.org/bugzilla/attachment.cgi?id=20001&action=view)
Simplify increments in IVopts using final values of inner loop IVs

A quick & dirty attempt to implement register pressure reduction in outer loops
by using final values of inner loop IVs.  Currently, given
for (i = 0; i < N; i++)
  for (j = 0; j < N; j++)
    s += a[i][j];
we generate something like
<bb1>
L1:
s.0 = PHI(0, s.2)
i.0 = PHI(0, i.1)
ivtmp.0 = &a[i.0][0]
<bb2>
L2:
s.1 = PHI(s.0, s.2)
j.0 = PHI(122, j.1)
ivtmp.1 = PHI(ivtmp.0, ivtmp.2)
s.2 = s.1 + MEM(ivtmp.1)
ivtmp.2 = ivtmp.1 + 4
j.1 = j.0 - 1
if (j.1 >= 0) goto L2
<bb3>
i.1 = i.0 + 1
if (i.1 <= 122) goto L1

This, together with the patch mentioned in the previous comment, allows us to
generate:
ivtmp.0 = &a[0][0]
<bb1>
L1:
s.0 = PHI(0, s.2)
i.0 = PHI(122, i.1)
ivtmp.1 = PHI(ivtmp.0, ivtmp.4)
<bb2>
L2:
s.1 = PHI(s.0, s.2)
j.0 = PHI(122, j.1)
ivtmp.2 = PHI(ivtmp.1, ivtmp.3)
s.2 = s.1 + MEM(ivtmp.2)
ivtmp.3 = ivtmp.2 + 4
j.1 = j.0 - 1
if (j.1 >= 0) goto L2
<bb3>
ivtmp.4 = ivtmp.3 // would be ivtmp.4 = ivtmp.1 + stride
i.1 = i.0 - 1
if (i.1 >= 0) goto L1

The improvement is that ivtmp.1 is not live across the inner loop.
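
For illustration only (this is not the patch), the two ways of stepping the
outer pointer IV look roughly like this at the C level.  The sketch assumes a
flattened N*N int array and N = 123 to match the dumps; the function names
sum_stride and sum_final_value are made up for this example.

```c
#include <assert.h>
#include <stddef.h>

enum { N = 123 };  /* assumed trip count, matching the 122..0 counters above */

/* Outer IV stepped by the row stride ("ivtmp.4 = ivtmp.1 + stride").
   p is read after the inner loop, so it stays live across it. */
int sum_stride(const int *a)
{
    assert(a != NULL);
    int s = 0;
    const int *p = a;                     /* ivtmp.1 */
    for (int i = N - 1; i >= 0; i--) {
        const int *q = p;                 /* ivtmp.2 starts at ivtmp.1 */
        for (int j = N - 1; j >= 0; j--)
            s += *q++;
        p += N;                           /* needs the old p after the loop */
    }
    return s;
}

/* Outer IV taken from the inner IV's final value ("ivtmp.4 = ivtmp.3").
   Only q is needed across the inner loop; p is dead once q is initialized. */
int sum_final_value(const int *a)
{
    assert(a != NULL);
    int s = 0;
    const int *p = a;                     /* ivtmp.1 */
    for (int i = N - 1; i >= 0; i--) {
        const int *q = p;                 /* ivtmp.2 starts at ivtmp.1 */
        for (int j = N - 1; j >= 0; j--)
            s += *q++;
        p = q;                            /* q already points at the next row */
    }
    return s;
}
```

Both functions visit the same elements in the same order; they differ only in
which pointer has to be kept in a register while the inner loop runs.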

The approach is to store the final values of IVs in a hash table that maps the
SSA_NAME of each IV's initial value in the preheader to an aff_tree holding its
final value, and then to try to replace increments of new IVs with uses of IVs
from inner loops (currently I have just implemented a brute-force loop over all
IV uses to find a useful entry in that hash table).
Does this make sense and sound acceptable?
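
To make the lookup concrete, here is a toy model of the idea, not the code from
the attachment.  Everything in it is hypothetical: SSA names are plain ints,
the affine final value is reduced to "base name + constant offset", and a
linear search stands in for the hash table.  None of these types correspond to
GCC's real SSA_NAME or aff_tree interfaces.

```c
#include <assert.h>
#include <stddef.h>

/* Hypothetical stand-in for an affine combination: base_name + offset. */
struct toy_affine { int base_name; long offset; };

/* Table entry: initial value of an inner-loop IV (as seen in the
   preheader) mapped to that IV's final value. */
struct final_value_entry { int init_name; struct toy_affine final_value; };

/* An inner-loop IV use: the name it starts from, and the name that
   holds its value once the inner loop has finished. */
struct toy_use { int init_name; int final_name; };

/* Linear search standing in for the hash table lookup. */
const struct toy_affine *
lookup_final_value(const struct final_value_entry *tab, size_t n, int init_name)
{
    assert(tab != NULL || n == 0);
    for (size_t i = 0; i < n; i++)
        if (tab[i].init_name == init_name)
            return &tab[i].final_value;
    return NULL;
}

/* Brute-force scan over all IV uses.  Returns the name that can replace
   the increment "cand_name + step", or -1 if no inner IV ends there. */
int find_replacement(const struct final_value_entry *tab, size_t n,
                     const struct toy_use *uses, size_t nuses,
                     int cand_name, long step)
{
    for (size_t i = 0; i < nuses; i++) {
        if (uses[i].init_name != cand_name)
            continue;
        const struct toy_affine *fv =
            lookup_final_value(tab, n, uses[i].init_name);
        if (fv && fv->base_name == cand_name && fv->offset == step)
            return uses[i].final_name;
    }
    return -1;
}
```

With the example above, and assuming 4-byte elements as the +4 step suggests,
the inner IV starts at ivtmp.1 and ends at ivtmp.1 + 492 (123 * 4), which is
exactly the outer stride, so a query for name 1 with step 492 would return
name 3, i.e. the increment can be replaced by ivtmp.3.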


-- 


http://gcc.gnu.org/bugzilla/show_bug.cgi?id=43174
