http://gcc.gnu.org/bugzilla/show_bug.cgi?id=39838
Jeffrey A. Law <law at redhat dot com> changed:

           What    |Removed                       |Added
----------------------------------------------------------------------------
                 CC|                              |law at redhat dot com
         AssignedTo|unassigned at gcc dot gnu.org |rakdver at gcc dot gnu.org

--- Comment #8 from Jeffrey A. Law <law at redhat dot com> 2011-01-13 15:17:50 UTC ---
I'm by no means an expert on the current IV code, so please take this with a grain of salt.

If we look at the .ivcanon dump, these are the critical blocks:

# BLOCK 3 freq:9100
# PRED: 4 [91.0%] (true,exec)
  # VUSE <.MEM_21>
  D.1969_8 = p_4(D)->data;
  D.1972_11 = D.1969_8 + D.1971_10;
  # VUSE <.MEM_21>
  D.1973_12 = *D.1972_11;
  D.1977_17 = D.1969_8 + D.1976_16;
  # VUSE <.MEM_21>
  D.1978_18 = *D.1977_17;
  # .MEM_24 = VDEF <.MEM_21>
  func (D.1973_12, D.1978_18);
  j_19 = j_2 + 1;
# SUCC: 4 [100.0%] (fallthru,exec)

# BLOCK 4 freq:10000
# PRED: 7 [100.0%] (fallthru,exec) 3 [100.0%] (fallthru,exec)
  # j_2 = PHI <0(7), j_19(3)>
  # .MEM_21 = PHI <.MEM_22(7), .MEM_24(3)>
  if (j_2 < count_7(D))
    goto <bb 3>;
  else
    goto <bb 5>;
# SUCC: 3 [91.0%] (true,exec) 5 [9.0%] (false,exec)

# BLOCK 5 freq:900
# PRED: 4 [9.0%] (false,exec)
  i_20 = i_1 + 1;
# SUCC: 6 [100.0%] (fallthru,exec)

# BLOCK 6 freq:989
# PRED: 2 [100.0%] (fallthru,exec) 5 [100.0%] (fallthru,exec)
  # i_1 = PHI <0(2), i_20(5)>
  # .MEM_22 = PHI <.MEM_23(D)(2), .MEM_21(5)>
  # VUSE <.MEM_22>
  D.1979_5 = p_4(D)->count;
  if (i_1 < D.1979_5)
    goto <bb 7>;
  else
    goto <bb 8>;
# SUCC: 7 [91.0%] (true,exec) 8 [9.0%] (false,exec)

# BLOCK 7 freq:900
# PRED: 6 [91.0%] (true,exec)
  i.0_9 = (unsigned int) i_1;
  D.1971_10 = i.0_9 * 4;
  D.1975_15 = i.0_9 + 1;
  D.1976_16 = D.1975_15 * 4;
  goto <bb 4>;
# SUCC: 4 [100.0%] (fallthru,exec)

Note carefully how we use D.1971_10 and D.1976_16 to build the addresses for the two memory references in block #3 (p->data[i] and p->data[i+1], respectively).
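For readers without the testcase at hand, the dump above is consistent with a loop nest along the following lines. This is a hypothetical reconstruction from the GIMPLE, not the attached testcase: the names `struct s`, `foo` and `sum`, the `int` element type (4 bytes, matching the `* 4` scaling in block 7) and the body of `func` are all assumptions. `func` is defined here only so the sketch runs; in the real test it is presumably an external call, which is why GCC must reload `p->data` and `p->count`.

```c
#include <stddef.h>

/* Hypothetical layout inferred from the dump (not the real testcase).
   Elements are assumed to be 4-byte ints, matching "i * 4" in block 7. */
struct s {
    int *data;  /* p_4(D)->data, reloaded in block 3 on every inner iteration */
    int count;  /* p_4(D)->count, reloaded in block 6 on every outer iteration */
};

/* Hypothetical stand-in for the opaque callee; it just records the pairs. */
long long sum;
void func(int a, int b) { sum += (long long)a * 10 + b; }

/* Hypothetical function name; the loop structure follows blocks 3-7. */
void foo(struct s *p, int count)
{
    for (int i = 0; i < p->count; i++)         /* i_1 / i_20, blocks 5-7 */
        for (int j = 0; j < count; j++)        /* j_2 / j_19, blocks 3-4 */
            func(p->data[i], p->data[i + 1]);  /* offsets D.1971_10, D.1976_16 */
}
```

With data {1, 2, 3, 4}, p->count = 3 and count = 2, foo calls func with (1,2), (2,3) and (3,4), twice each.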
After IVopts we have:

# BLOCK 3 freq:9100
# PRED: 4 [91.0%] (true,exec)
  # VUSE <.MEM_21>
  D.1969_8 = p_4(D)->data;
  D.1972_11 = D.1969_8 + D.1971_10;
  # VUSE <.MEM_21>
  D.1973_12 = *D.1972_11;
  D.1977_17 = D.1969_8 + D.1976_16;
  # VUSE <.MEM_21>
  D.1978_18 = *D.1977_17;
  # .MEM_24 = VDEF <.MEM_21>
  func (D.1973_12, D.1978_18);
  j_19 = j_2 + 1;
# SUCC: 4 [100.0%] (fallthru,exec)

# BLOCK 4 freq:10000
# PRED: 7 [100.0%] (fallthru,exec) 3 [100.0%] (fallthru,exec)
  # j_2 = PHI <0(7), j_19(3)>
  # .MEM_21 = PHI <.MEM_22(7), .MEM_24(3)>
  if (j_2 < count_7(D))
    goto <bb 3>;
  else
    goto <bb 5>;
# SUCC: 3 [91.0%] (true,exec) 5 [9.0%] (false,exec)

# BLOCK 5 freq:900
# PRED: 4 [9.0%] (false,exec)
  i_20 = i_1 + 1;
  ivtmp.12_14 = ivtmp.12_13 + 4;
# SUCC: 6 [100.0%] (fallthru,exec)

# BLOCK 6 freq:989
# PRED: 2 [100.0%] (fallthru,exec) 5 [100.0%] (fallthru,exec)
  # i_1 = PHI <0(2), i_20(5)>
  # .MEM_22 = PHI <.MEM_23(D)(2), .MEM_21(5)>
  # ivtmp.12_13 = PHI <4(2), ivtmp.12_14(5)>
  # VUSE <.MEM_22>
  D.1979_5 = p_4(D)->count;
  if (i_1 < D.1979_5)
    goto <bb 7>;
  else
    goto <bb 8>;
# SUCC: 7 [91.0%] (true,exec) 8 [9.0%] (false,exec)

# BLOCK 7 freq:900
# PRED: 6 [91.0%] (true,exec)
  D.1992_6 = (unsigned int) i_1;
  D.1993_26 = D.1992_6 * 4;
  D.1971_10 = D.1993_26;
  D.1976_16 = ivtmp.12_13;
  goto <bb 4>;
# SUCC: 4 [100.0%] (fallthru,exec)

UGH. Everything involving ivtmp.12 is a waste of time. ivtmp.12 starts at 4 and is bumped by 4 on every outer iteration, so it is just (i + 1) * 4 carried in an induction variable of its own, while i * 4 is still recomputed from i in block 7. We really just need to realize that D.1976_16 is D.1971_10 + 4, which avoids all the nonsense with ivtmp.12 and, I *think*, would restore the quality of this code.

I don't know the current ivopts code well enough to prototype this and verify that such a change would actually help here. Zdenek, can you take a look?
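At the source level, the simplification suggested above amounts to computing one byte offset per outer iteration and deriving the second from it, instead of keeping a separate induction variable. The sketch below only illustrates that equivalence in C; it is not the IVopts change itself. The names `struct s2`, `func2`, `trace` and `foo_derived` are hypothetical, and 4-byte `int` elements are assumed (`sizeof(int)` is used so the arithmetic stays correct on any target). `p->data` is still reloaded inside the inner loop, as in block 3, because the call may change it.

```c
#include <assert.h>
#include <stddef.h>

/* Hypothetical layout, same assumptions as the dump: int elements. */
struct s2 {
    int *data;
    int count;
};

/* Hypothetical opaque callee that records the pairs it is passed. */
long long trace;
void func2(int a, int b) { trace += (long long)a * 10 + b; }

/* One offset per outer iteration (like D.1971_10); the second offset is
   derived from it (D.1976_16 = D.1971_10 + 4) rather than kept in a
   separate induction variable such as ivtmp.12. */
void foo_derived(struct s2 *p, int count)
{
    for (int i = 0; i < p->count; i++) {
        size_t off_lo = (size_t)i * sizeof(int);  /* i * 4 */
        size_t off_hi = off_lo + sizeof(int);     /* (i + 1) * 4, derived */
        assert(off_hi == ((size_t)i + 1) * sizeof(int));
        for (int j = 0; j < count; j++) {
            /* Reload p->data each time: the call may have changed it. */
            const char *base = (const char *)p->data;
            func2(*(const int *)(base + off_lo), *(const int *)(base + off_hi));
        }
    }
}
```

With data {1, 2, 3, 4}, p->count = 3 and count = 2, this visits the same pairs as indexing p->data[i] and p->data[i + 1] directly: (1,2), (2,3) and (3,4), twice each.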