http://gcc.gnu.org/bugzilla/show_bug.cgi?id=39838

Jeffrey A. Law <law at redhat dot com> changed:

           What    |Removed                     |Added
----------------------------------------------------------------------------
                 CC|                            |law at redhat dot com
         AssignedTo|unassigned at gcc dot       |rakdver at gcc dot gnu.org
                   |gnu.org                     |

--- Comment #8 from Jeffrey A. Law <law at redhat dot com> 2011-01-13 15:17:50 UTC ---
I'm by no means an expert on the current IV code, so please take this with a
grain of salt.

If we look at the .ivcanon dump, we have the following critical blocks:

  # BLOCK 3 freq:9100
  # PRED: 4 [91.0%]  (true,exec)
  # VUSE <.MEM_21>
  D.1969_8 = p_4(D)->data;
  D.1972_11 = D.1969_8 + D.1971_10;
  # VUSE <.MEM_21>
  D.1973_12 = *D.1972_11;
  D.1977_17 = D.1969_8 + D.1976_16;
  # VUSE <.MEM_21>
  D.1978_18 = *D.1977_17;
  # .MEM_24 = VDEF <.MEM_21>
  func (D.1973_12, D.1978_18);
  j_19 = j_2 + 1;
  # SUCC: 4 [100.0%]  (fallthru,exec)

  # BLOCK 4 freq:10000
  # PRED: 7 [100.0%]  (fallthru,exec) 3 [100.0%]  (fallthru,exec)
  # j_2 = PHI <0(7), j_19(3)>
  # .MEM_21 = PHI <.MEM_22(7), .MEM_24(3)>
  if (j_2 < count_7(D))
    goto <bb 3>;
  else
    goto <bb 5>;
  # SUCC: 3 [91.0%]  (true,exec) 5 [9.0%]  (false,exec)

  # BLOCK 5 freq:900
  # PRED: 4 [9.0%]  (false,exec)
  i_20 = i_1 + 1;
  # SUCC: 6 [100.0%]  (fallthru,exec)

  # BLOCK 6 freq:989
  # PRED: 2 [100.0%]  (fallthru,exec) 5 [100.0%]  (fallthru,exec)
  # i_1 = PHI <0(2), i_20(5)>
  # .MEM_22 = PHI <.MEM_23(D)(2), .MEM_21(5)>
  # VUSE <.MEM_22>
  D.1979_5 = p_4(D)->count;
  if (i_1 < D.1979_5)
    goto <bb 7>;
  else
    goto <bb 8>;
  # SUCC: 7 [91.0%]  (true,exec) 8 [9.0%]  (false,exec)

  # BLOCK 7 freq:900
  # PRED: 6 [91.0%]  (true,exec)
  i.0_9 = (unsigned int) i_1;
  D.1971_10 = i.0_9 * 4;
  D.1975_15 = i.0_9 + 1;
  D.1976_16 = D.1975_15 * 4;
  goto <bb 4>;
  # SUCC: 4 [100.0%]  (fallthru,exec)

Note carefully how we use D.1971_10 and D.1976_16 to build addresses for the two
memory references in block #3 (p->data[i] and p->data[i+1] respectively).
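
For orientation, here is a rough C loop nest that would plausibly lower to
blocks like the ones above. It is a reconstruction from the dump, not the
testcase attached to the PR: the struct name, the field types, the name
`walk`, and the body of `func` are all assumptions. In the dump `func` is an
opaque call that may clobber memory (hence the VDEF and the reload of p->data
inside the inner loop); here it just records its arguments so the calls can
be observed.

```c
/* Hypothetical layout: the dump only tells us that fields named
   `count` and `data` exist and that elements are 4 bytes wide. */
struct vec {
    int count;
    int *data;
};

/* Stand-in for the opaque callee `func` (assumed body). */
static long func_sum;
static int func_calls;

static void func(int a, int b)
{
    func_sum += (long)a + b;
    func_calls++;
}

/* Assumed shape of the source loop nest. */
void walk(struct vec *p, int count)
{
    int i, j;

    for (i = 0; i < p->count; i++)        /* outer loop: blocks 5, 6, 7 */
        for (j = 0; j < count; j++)       /* inner loop: blocks 3, 4 */
            /* Both operands depend only on i, so the offsets i*4 and
               (i+1)*4 are invariant in the inner loop. */
            func(p->data[i], p->data[i + 1]);
}
```

Note that data[i+1] is read on the last outer iteration as well, so `data`
needs at least p->count + 1 elements for this sketch to be well defined.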

After IVopts we have:


  # BLOCK 3 freq:9100
  # PRED: 4 [91.0%]  (true,exec)
  # VUSE <.MEM_21>
  D.1969_8 = p_4(D)->data;
  D.1972_11 = D.1969_8 + D.1971_10;
  # VUSE <.MEM_21>
  D.1973_12 = *D.1972_11;
  D.1977_17 = D.1969_8 + D.1976_16;
  # VUSE <.MEM_21>
  D.1978_18 = *D.1977_17;
  # .MEM_24 = VDEF <.MEM_21>
  func (D.1973_12, D.1978_18);
  j_19 = j_2 + 1;
  # SUCC: 4 [100.0%]  (fallthru,exec)

  # BLOCK 4 freq:10000
  # PRED: 7 [100.0%]  (fallthru,exec) 3 [100.0%]  (fallthru,exec)
  # j_2 = PHI <0(7), j_19(3)>
  # .MEM_21 = PHI <.MEM_22(7), .MEM_24(3)>
  if (j_2 < count_7(D))
    goto <bb 3>;
  else
    goto <bb 5>;
  # SUCC: 3 [91.0%]  (true,exec) 5 [9.0%]  (false,exec)

  # BLOCK 5 freq:900
  # PRED: 4 [9.0%]  (false,exec)
  i_20 = i_1 + 1;
  ivtmp.12_14 = ivtmp.12_13 + 4;
  # SUCC: 6 [100.0%]  (fallthru,exec)

  # BLOCK 6 freq:989
  # PRED: 2 [100.0%]  (fallthru,exec) 5 [100.0%]  (fallthru,exec)
  # i_1 = PHI <0(2), i_20(5)>
  # .MEM_22 = PHI <.MEM_23(D)(2), .MEM_21(5)>
  # ivtmp.12_13 = PHI <4(2), ivtmp.12_14(5)>
  # VUSE <.MEM_22>
  D.1979_5 = p_4(D)->count;
  if (i_1 < D.1979_5)
    goto <bb 7>;
  else
    goto <bb 8>;
  # SUCC: 7 [91.0%]  (true,exec) 8 [9.0%]  (false,exec)

  # BLOCK 7 freq:900
  # PRED: 6 [91.0%]  (true,exec) 
  D.1992_6 = (unsigned int) i_1;
  D.1993_26 = D.1992_6 * 4; 
  D.1971_10 = D.1993_26;
  D.1976_16 = ivtmp.12_13; 
  goto <bb 4>; 
  # SUCC: 4 [100.0%]  (fallthru,exec)

UGH.  Everything involving ivtmp.12 is a waste of time.  We really just need to
realize that D.1976_16 is D.1971_10 + 4, which avoids all the nonsense with
ivtmp.12 and, I *think*, would restore the quality of this code.  I don't know
enough about the current ivopts code to prototype this and verify it.
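
To make the point concrete, here is a small C sketch of the two ways of
producing the pair of offsets once per outer iteration. It only models the
arithmetic; it is not ivopts code, and the function names and output arrays
are made up for illustration. The first function mirrors what the post-IVopts
dump does (a separate induction variable that starts at 4 and steps by 4),
the second derives the second offset from the first.

```c
/* What the post-IVopts dump does: a second induction variable
   (ivtmp.12 in the dump) carried around the outer loop. */
void offsets_ivtmp(unsigned int n, unsigned int *off_i, unsigned int *off_i1)
{
    unsigned int ivtmp = 4;               /* PHI <4(2), ...> in block 6 */
    unsigned int i;

    for (i = 0; i < n; i++) {
        off_i[i] = i * 4;                 /* D.1971_10 */
        off_i1[i] = ivtmp;                /* D.1976_16 = ivtmp.12_13 */
        ivtmp += 4;                       /* ivtmp.12_14 in block 5 */
    }
}

/* The suggested form: no extra induction variable, just
   D.1976_16 = D.1971_10 + 4. */
void offsets_derived(unsigned int n, unsigned int *off_i, unsigned int *off_i1)
{
    unsigned int i;

    for (i = 0; i < n; i++) {
        unsigned int d = i * 4;           /* D.1971_10 */
        off_i[i] = d;
        off_i1[i] = d + 4;                /* D.1976_16 */
    }
}
```

Since the offsets are unsigned, (i + 1) * 4 and i * 4 + 4 agree even on
wraparound, so the two forms compute the same values.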

Zdenek, can you take a look?
