http://gcc.gnu.org/bugzilla/show_bug.cgi?id=56049



--- Comment #5 from Jan Hubicka <hubicka at gcc dot gnu.org> 2013-02-11 22:55:44 UTC ---

Well, I think we should try to tame the fantasy of our optimizers here.  What
the unroller sees at -O3 -fno-tree-vectorize is quite ugly:



  <bb 2>:

  a = {};



  <bb 3>:

  # i_1 = PHI <1(2), i_7(7)>

  # prephitmp_99 = PHI <0(2), pretmp_98(7)>

  # prephitmp_102 = PHI <0(2), pretmp_101(7)>

  # prephitmp_105 = PHI <0(2), pretmp_104(7)>

  # prephitmp_108 = PHI <0(2), pretmp_107(7)>

  # prephitmp_111 = PHI <0(2), pretmp_110(7)>

  # prephitmp_114 = PHI <0(2), pretmp_113(7)>

  # prephitmp_117 = PHI <0(2), pretmp_116(7)>

  # prephitmp_120 = PHI <0(2), pretmp_119(7)>

  # ivtmp_57 = PHI <10000000(2), ivtmp_64(7)>



  <bb 4>:

  # S.0_90 = PHI <S.0_36(5), 1(3)>

  # prephitmp_126 = PHI <pretmp_125(5), prephitmp_99(3)>

  # prephitmp_129 = PHI <pretmp_128(5), prephitmp_102(3)>

  # prephitmp_132 = PHI <pretmp_131(5), prephitmp_105(3)>

  # prephitmp_135 = PHI <pretmp_134(5), prephitmp_108(3)>

  # prephitmp_138 = PHI <pretmp_137(5), prephitmp_111(3)>

  # prephitmp_141 = PHI <pretmp_140(5), prephitmp_114(3)>

  # prephitmp_144 = PHI <pretmp_143(5), prephitmp_117(3)>

  # prephitmp_147 = PHI <pretmp_146(5), prephitmp_120(3)>

  # ivtmp_43 = PHI <ivtmp_50(5), 8(3)>

  _29 = S.0_90 * 8;

  _42 = _29 + -8;

  _44 = prephitmp_126 + 1;

  b[_42] = _44;

  _49 = _29 + -7;

  _51 = prephitmp_129 + 1;

  b[_49] = _51;

  _56 = _29 + -6;

  _58 = prephitmp_132 + 1;

  b[_56] = _58;

  _63 = _29 + -5;

  _65 = prephitmp_135 + 1;

  b[_63] = _65;

  _70 = _29 + -4;

  _72 = prephitmp_138 + 1;

  b[_70] = _72;

  _77 = _29 + -3;

  _79 = prephitmp_141 + 1;

  b[_77] = _79;

  _84 = _29 + -2;

  _86 = prephitmp_144 + 1;

  b[_84] = _86;

  _91 = _29 + -1;

  _93 = prephitmp_147 + 1;

  b[_91] = _93;

  S.0_36 = S.0_90 + 1;

  ivtmp_50 = ivtmp_43 - 1;

  if (ivtmp_50 == 0)

    goto <bb 6>;

  else

    goto <bb 5>;



  <bb 5>:

  pretmp_122 = S.0_36 * 8;

  pretmp_124 = pretmp_122 + -8;

  pretmp_125 = a[pretmp_124];

  pretmp_127 = pretmp_122 + -7;

  pretmp_128 = a[pretmp_127];

  pretmp_130 = pretmp_122 + -6;

  pretmp_131 = a[pretmp_130];

  pretmp_133 = pretmp_122 + -5;

  pretmp_134 = a[pretmp_133];

  pretmp_136 = pretmp_122 + -4;

  pretmp_137 = a[pretmp_136];

  pretmp_139 = pretmp_122 + -3;

  pretmp_140 = a[pretmp_139];

  pretmp_142 = pretmp_122 + -2;

  pretmp_143 = a[pretmp_142];

  pretmp_145 = pretmp_122 + -1;

  pretmp_146 = a[pretmp_145];

  goto <bb 4>;



With vectorization we actually unroll the inner loop, but the outer one gets so
ugly that we don't do much about it...
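
For readers who do not want to trace the PHI nodes, here is a rough C rendering of the loop nest as it appears in the dump above. It is a reconstruction from the visible blocks only, not the testcase attached to the PR: the name run_passes, the int element type, and the 64-element array size are assumptions, and blocks 6 and 7 are not shown in the dump, so whatever happens there is left out. In the dump the outer counter ivtmp_57 starts at 10000000; the sketch takes the count as a parameter instead.

```c
#include <string.h>

enum { N = 64 };                /* assumed: 8 inner iterations x 8 elements */

int a[N];
int b[N];

/* Hypothetical rendering of bb 2..5 from the dump; "outer" stands in for
   the outer trip count (10000000 in the dump). */
void run_passes(long outer)
{
    memset(a, 0, sizeof a);                 /* bb 2: a = {}; */

    for (long i = 0; i < outer; i++) {      /* outer loop header, bb 3 */
        for (int s = 1; s <= 8; s++) {      /* inner loop, bb 4 and bb 5 */
            int base = s * 8;               /* _29 = S.0_90 * 8 */

            /* Eight stores per inner iteration, as in bb 4.  The loads
               from a[] are the pretmp_* values that PRE moved into bb 5
               and carried around the loop through PHI nodes. */
            b[base - 8] = a[base - 8] + 1;
            b[base - 7] = a[base - 7] + 1;
            b[base - 6] = a[base - 6] + 1;
            b[base - 5] = a[base - 5] + 1;
            b[base - 4] = a[base - 4] + 1;
            b[base - 3] = a[base - 3] + 1;
            b[base - 2] = a[base - 2] + 1;
            b[base - 1] = a[base - 1] + 1;
        }
    }
}
```

In the blocks shown, a[] is zeroed once and then only read, so every pass stores the same values into b[]. That is the simplification the optimizers would ideally discover, and the eight PHI nodes per loop header are what stands in the way.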



So what about keeping it as an enhancement request? I will try to poke at it,
but there is already a bug about PRE overactivity of this type, right?
