http://gcc.gnu.org/bugzilla/show_bug.cgi?id=56049
--- Comment #5 from Jan Hubicka <hubicka at gcc dot gnu.org> 2013-02-11 22:55:44 UTC ---

Well, I think we should try to tame the fantasy of our optimizers here. What the
unroller sees at -O3 -fno-tree-vectorize is quite ugly:

  <bb 2>:
  a = {};

  <bb 3>:
  # i_1 = PHI <1(2), i_7(7)>
  # prephitmp_99 = PHI <0(2), pretmp_98(7)>
  # prephitmp_102 = PHI <0(2), pretmp_101(7)>
  # prephitmp_105 = PHI <0(2), pretmp_104(7)>
  # prephitmp_108 = PHI <0(2), pretmp_107(7)>
  # prephitmp_111 = PHI <0(2), pretmp_110(7)>
  # prephitmp_114 = PHI <0(2), pretmp_113(7)>
  # prephitmp_117 = PHI <0(2), pretmp_116(7)>
  # prephitmp_120 = PHI <0(2), pretmp_119(7)>
  # ivtmp_57 = PHI <10000000(2), ivtmp_64(7)>

  <bb 4>:
  # S.0_90 = PHI <S.0_36(5), 1(3)>
  # prephitmp_126 = PHI <pretmp_125(5), prephitmp_99(3)>
  # prephitmp_129 = PHI <pretmp_128(5), prephitmp_102(3)>
  # prephitmp_132 = PHI <pretmp_131(5), prephitmp_105(3)>
  # prephitmp_135 = PHI <pretmp_134(5), prephitmp_108(3)>
  # prephitmp_138 = PHI <pretmp_137(5), prephitmp_111(3)>
  # prephitmp_141 = PHI <pretmp_140(5), prephitmp_114(3)>
  # prephitmp_144 = PHI <pretmp_143(5), prephitmp_117(3)>
  # prephitmp_147 = PHI <pretmp_146(5), prephitmp_120(3)>
  # ivtmp_43 = PHI <ivtmp_50(5), 8(3)>
  _29 = S.0_90 * 8;
  _42 = _29 + -8;
  _44 = prephitmp_126 + 1;
  b[_42] = _44;
  _49 = _29 + -7;
  _51 = prephitmp_129 + 1;
  b[_49] = _51;
  _56 = _29 + -6;
  _58 = prephitmp_132 + 1;
  b[_56] = _58;
  _63 = _29 + -5;
  _65 = prephitmp_135 + 1;
  b[_63] = _65;
  _70 = _29 + -4;
  _72 = prephitmp_138 + 1;
  b[_70] = _72;
  _77 = _29 + -3;
  _79 = prephitmp_141 + 1;
  b[_77] = _79;
  _84 = _29 + -2;
  _86 = prephitmp_144 + 1;
  b[_84] = _86;
  _91 = _29 + -1;
  _93 = prephitmp_147 + 1;
  b[_91] = _93;
  S.0_36 = S.0_90 + 1;
  ivtmp_50 = ivtmp_43 - 1;
  if (ivtmp_50 == 0)
    goto <bb 6>;
  else
    goto <bb 5>;

  <bb 5>:
  pretmp_122 = S.0_36 * 8;
  pretmp_124 = pretmp_122 + -8;
  pretmp_125 = a[pretmp_124];
  pretmp_127 = pretmp_122 + -7;
  pretmp_128 = a[pretmp_127];
  pretmp_130 = pretmp_122 + -6;
  pretmp_131 = a[pretmp_130];
  pretmp_133 = pretmp_122 + -5;
  pretmp_134 = a[pretmp_133];
  pretmp_136 = pretmp_122 + -4;
  pretmp_137 = a[pretmp_136];
  pretmp_139 = pretmp_122 + -3;
  pretmp_140 = a[pretmp_139];
  pretmp_142 = pretmp_122 + -2;
  pretmp_143 = a[pretmp_142];
  pretmp_145 = pretmp_122 + -1;
  pretmp_146 = a[pretmp_145];
  goto <bb 4>;

With vectorization we actually unroll the inner loop, but the outer one gets so
ugly that we don't do much about it...

So what about keeping it as an enhancement request? I will try to poke at it,
but there is a bug about PR overactivity of this type already, right?
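
To make the shape of the dump easier to follow, here is a rough C rendering of the inner loop only. It is a sketch under assumptions, not the testcase: the function names, the `int` element type, and the 64-element array size (8 iterations times 8 stores) are all hypothetical, and the outer loop body in <bb 6>/<bb 7> is not shown in the dump, so it is left out. `inner_plain` is the obvious form; `inner_rotated` mimics what the unroller sees, with the loads of `a` moved into the latch block and carried into the next iteration through PHI-like temporaries.

```c
/* Assumed sizes: the dump runs the inner loop 8 times and does 8 stores
   per iteration, so 64 elements are touched.  The names are hypothetical. */
enum { GROUPS = 8, WIDTH = 8, N = GROUPS * WIDTH };

/* Obvious form: b[k] = a[k] + 1 for every element. */
void inner_plain(const int a[N], int b[N])
{
    for (int s = 1; s <= GROUPS; s++)
        for (int k = 0; k < WIDTH; k++)
            b[s * WIDTH - WIDTH + k] = a[s * WIDTH - WIDTH + k] + 1;
}

/* Shape in the dump: the values stored in <bb 4> were loaded one
   iteration earlier, in <bb 5>, and arrive through the prephitmp PHIs.
   The k loops stand in for the eight straight-line copies in the dump. */
void inner_rotated(const int a[N], int b[N])
{
    int pre[WIDTH];

    /* Stand-in for the values entering from <bb 3>. */
    for (int k = 0; k < WIDTH; k++)
        pre[k] = a[k];

    int s = 1;              /* S.0 in the dump */
    int remaining = GROUPS; /* ivtmp_43 in the dump */
    for (;;) {
        /* <bb 4>: add one to the carried values and store them. */
        for (int k = 0; k < WIDTH; k++)
            b[s * WIDTH - WIDTH + k] = pre[k] + 1;
        s++;
        if (--remaining == 0)
            break;
        /* <bb 5>: load the values the next iteration will store. */
        for (int k = 0; k < WIDTH; k++)
            pre[k] = a[s * WIDTH - WIDTH + k];
    }
}

/* Returns 1 when both forms fill b identically for a sample input. */
int rotated_matches_plain(void)
{
    int a[N], b1[N], b2[N];

    for (int i = 0; i < N; i++) {
        a[i] = 3 * i - 7;
        b1[i] = 0;
        b2[i] = -1;
    }
    inner_plain(a, b1);
    inner_rotated(a, b2);
    for (int i = 0; i < N; i++)
        if (b1[i] != b2[i])
            return 0;
    return 1;
}
```

Calling `rotated_matches_plain()` checks that the two forms agree on one sample input. The rotated form carries eight extra live values across the back edge, which is the part that makes the loop look so unwieldy to the unroller.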