[Bug tree-optimization/105053] Wrong loop count for scalar code from vectorizer

rguenth at gcc dot gnu.org via Gcc-bugs Fri, 25 Mar 2022 05:53:47 -0700

https://gcc.gnu.org/bugzilla/show_bug.cgi?id=105053


Richard Biener <rguenth at gcc dot gnu.org> changed:

           What    |Removed                     |Added
----------------------------------------------------------------------------
           Assignee|unassigned at gcc dot gnu.org |rguenth at gcc dot gnu.org
             Status|NEW                         |ASSIGNED

--- Comment #4 from Richard Biener <rguenth at gcc dot gnu.org> ---
One notable difference is that the first loop is detected to require peeling
for gaps while the second one is not (probably an artifact of the low trip
count).
The second difference is that the first loop is detected as a reduction path
while the second one is detected as a reduction chain.

OK, so I think I see what goes wrong.  We elided the load permutation but
the load is still biased wrongly.

  vectp.67_112 = _93 + 8;

  <bb 12> [local count: 405853744]:
  # i_98 = PHI <i_44(21), 0(11)>
  # prephitmp_7 = PHI <prephitmp_97(21), 0(11)>
  # ivtmp_31 = PHI <ivtmp_37(21), 4(11)>
  # vectp.66_105 = PHI <vectp.66_68(21), vectp.67_112(11)>
  # vect_prephitmp_7.71_61 = PHI <vect__26.72_62(21), { 0, 0, 0, 0 }(11)>
  # ivtmp_58 = PHI <ivtmp_81(21), 0(11)>
  _3 = (long unsigned int) i_98;
  _59 = _3 * 16;
  _60 = _93 + _59;
  _106 = MEM <vector(2) int> [(const int &)vectp.66_105];
  vect__54.68_113 = {_106, { 0, 0 }};
  vectp.66_95 = vectp.66_105 + 16;
  _89 = MEM <vector(2) int> [(const int &)vectp.66_95];
  vect__54.69_90 = {_89, { 0, 0 }};
  vect__51.70_78 = VEC_PERM_EXPR <vect__54.68_113, vect__54.69_90, { 0, 1, 4, 5 }>;

possibly because the SLP representative is unchanged when we transform

t.C:17:16: note:   node 0x3382280 (max_nunits=4, refcnt=2) const vector(4) int
t.C:17:16: note:   op template: _54 = MEM[(const int &)_60 + 12];
t.C:17:16: note:        stmt 0 _54 = MEM[(const int &)_60 + 12];
t.C:17:16: note:        stmt 1 _51 = MEM[(const int &)_60 + 8];
t.C:17:16: note:        load permutation { 1 0 }

into

t.C:17:16: note:   node 0x3382280 (max_nunits=4, refcnt=1) const vector(4) int
t.C:17:16: note:   op template: _54 = MEM[(const int &)_60 + 12];
t.C:17:16: note:        stmt 0 _51 = MEM[(const int &)_60 + 8];
t.C:17:16: note:        stmt 1 _54 = MEM[(const int &)_60 + 12];
t.C:17:16: note:        load permutation { 0 1 }

during SLP optimize.
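
To make the suspected mismatch concrete, here is a minimal C++ sketch of the
two dumps above.  It is not GCC code: LoadNode, elide_load_permutation and
representative_is_lane0 are hypothetical names, and the node is reduced to
the three things the dump prints (lane statements as byte offsets from _60,
the load permutation, and the op template).  It only shows that reordering
the lanes while leaving the representative alone leaves the representative
pointing at what is now lane 1 (+12) rather than lane 0 (+8).

```cpp
#include <cassert>
#include <cstddef>
#include <vector>

// Hypothetical, simplified stand-in for an SLP load node (not GCC's type).
struct LoadNode {
  std::vector<int> stmt_offsets;    // byte offset of each lane's scalar load
  std::vector<unsigned> load_perm;  // lane i reads group element load_perm[i]
  int representative_offset;        // offset used by the "op template" stmt
};

// Reorder the lanes into group order so the permutation becomes the
// identity.  The representative is deliberately left untouched, which is
// what the second dump shows.  Assumes load_perm is a valid permutation of
// 0..n-1 with the same length as stmt_offsets.
void elide_load_permutation(LoadNode& n) {
  std::vector<int> reordered(n.stmt_offsets.size());
  for (std::size_t lane = 0; lane < n.load_perm.size(); ++lane) {
    assert(n.load_perm[lane] < reordered.size());
    reordered[n.load_perm[lane]] = n.stmt_offsets[lane];
  }
  n.stmt_offsets = reordered;
  for (std::size_t lane = 0; lane < n.load_perm.size(); ++lane)
    n.load_perm[lane] = static_cast<unsigned>(lane);
}

// True when the representative is the statement in lane 0.
bool representative_is_lane0(const LoadNode& n) {
  return !n.stmt_offsets.empty() &&
         n.representative_offset == n.stmt_offsets[0];
}
```

Starting from LoadNode{{12, 8}, {1, 0}, 12}, the representative is lane 0
before the transform.  Afterwards the lanes are {8, 12} with permutation
{ 0 1 }, but the representative still says +12, so anything that derives the
load's start address from the representative would be off by one element.
Whether that is the actual code path at fault is an assumption here.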
