------- Comment #16 from steven at gcc dot gnu dot org 2010-02-05 13:33 -------
I'm trying to coerce IVOPTS into producing the following, optimal code in the GIMPLE optimizers (without much luck, so far):
<bb 2>:
  pretmp.11_26 = (int) s_11(D);
  ivtmp.20_28 = (long unsigned int) b_inout_5(D);
  D.1848_31 = (long unsigned int) b_inout_5(D);
  D.1849_32 = D.1848_31 + 256;

<bb 3>:
  # ivtmp.20_25 = PHI <ivtmp.20_24(4), ivtmp.20_28(2)>
  D.1846_29 = (void *) ivtmp.20_25;
  D.1814_10 = MEM[base: D.1846_29]{*D.1813};
  D.1816_13 = D.1814_10 * pretmp.11_26;
  D.1847_30 = (void *) ivtmp.20_25;
  MEM[base: D.1847_30]{*D.1813} = D.1816_13;
  ivtmp.20_24 = ivtmp.20_25 + 4;
  if (ivtmp.20_24 != D.1849_32)
    goto <bb 4>;
  else
    goto <bb 5>;

<bb 4>:
  goto <bb 3>;

<bb 5>:
  return;

If we can get the compiler to generate the above code in IVOPTS, then we should get the same code as the hand-unrolled example (although it also looks like a bit more scheduler look-ahead freedom is necessary).

I asked Zdenek for help. For ARM9 he found the following problem:

/quote/
Address costs:
  index costs 6
  cst + index costs 2

arm_arm_address_cost pretends that having reg + cst as an address is cheaper than having reg by itself. Ivopts are happy to make this happen :-) There used to be the same problem on x86, which was eventually fixed by making address_cost reflect the real cost of addresses, rather than "which addressing modes should CSE prefer" metrics.
/quote/

Of course, every case where the cost of "index" is greater than the cost of "index + cst" results in code like that of an unpatched compiler. But if I adjust the cost to make "index" cost only 1 or 2, I get this:

<bb 2>:
  pretmp.11_26 = (int) s_11(D);
  ivtmp.25_28 = (long unsigned int) b_inout_5(D);

<bb 3>:
  # i_20 = PHI <i_14(4), 0(2)>
  # ivtmp.25_25 = PHI <ivtmp.25_24(4), ivtmp.25_28(2)>
  D.1846_29 = (void *) ivtmp.25_25;
  D.1814_10 = MEM[base: D.1846_29]{*D.1813};
  D.1816_13 = D.1814_10 * pretmp.11_26;
  D.1847_30 = (void *) ivtmp.25_25;
  MEM[base: D.1847_30]{*D.1813} = D.1816_13;
  ivtmp.25_24 = ivtmp.25_25 + 4;
  i_14 = i_20 + 1;
  if (i_14 != 64)
    goto <bb 4>;
  else
    goto <bb 5>;

<bb 4>:
  goto <bb 3>;

<bb 5>:
  return;

That is still not optimal: we get an extra IV (i_20, kept alongside ivtmp.25 only for the exit test) for some reason.
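For readers who find the dumps hard to compare, the two loop shapes can be approximated in C. This is only a sketch reconstructed from the GIMPLE above, not the testcase from this PR: the function names, the `int` element type, and passing `s` as an `int` are assumptions, and the trip count of 64 is taken from the `i_14 != 64` test (equivalently 256 bytes at a 4-byte stride).

```c
#include <assert.h>

#define N 64  /* trip count seen in the dumps: 256 bytes / 4-byte stride */

/* Approximates the first dump: one pointer IV, exit test against an
   end address computed before the loop (b_inout + 256 bytes). */
void scale_one_iv(int *b_inout, int s)
{
    int *p = b_inout;
    int *end = b_inout + N;
    do {
        *p = *p * s;
        p++;
    } while (p != end);
}

/* Approximates the second dump: the same pointer IV, plus a separate
   counter that exists only to drive the exit test. */
void scale_two_ivs(int *b_inout, int s)
{
    int *p = b_inout;
    int i = 0;
    do {
        *p = *p * s;
        p++;
        i++;
    } while (i != N);
}

/* Hypothetical self-check: both shapes must compute the same result. */
void check_forms_agree(void)
{
    int a[N], b[N];
    for (int i = 0; i < N; i++)
        a[i] = b[i] = i - 10;
    scale_one_iv(a, 3);
    scale_two_ivs(b, 3);
    for (int i = 0; i < N; i++) {
        assert(a[i] == 3 * (i - 10));
        assert(b[i] == a[i]);
    }
}
```

The second form needs one more live register and one more add per iteration, which is the cost of the extra IV noted above.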
-- 

steven at gcc dot gnu dot org changed:

           What    |Removed                     |Added
----------------------------------------------------------------------------
                 CC|                            |rakdver at gcc dot gnu dot
                   |                            |org

http://gcc.gnu.org/bugzilla/show_bug.cgi?id=36712