https://gcc.gnu.org/bugzilla/show_bug.cgi?id=108677

Andrew Pinski <pinskia at gcc dot gnu.org> changed:

           What    |Removed                     |Added
----------------------------------------------------------------------------
           Keywords|wrong-code                  |missed-optimization

--- Comment #1 from Andrew Pinski <pinskia at gcc dot gnu.org> ---
>gcc vectorize the loop even if a dependency is present...[1]

What dependency? TrigArr cannot point to tp.

Also, it is just doing SLP; the vectorization is even fine here.
It is only doing the basic-block form.


IR:

  _2 = jp_31 + 4294967295;
  _3 = (long unsigned int) _2;
  _4 = _3 * 16;
  _5 = TrigArr.0_1 + _4;
  vect__22.19_48 = MEM <vector(2) double> [(double *)_5];
  vect__22.23_43 = VEC_PERM_EXPR <vect__22.19_48, vect__22.19_48, { 1, 0 }>;
  vect__20.15_54 = MEM <const vector(2) double> [(double *)tp_14(D)];
  vect__20.16_53 = VEC_PERM_EXPR <vect__20.15_54, vect__20.15_54, { 0, 0 }>;
  vect__20.27_38 = VEC_PERM_EXPR <vect__20.15_54, vect__20.15_54, { 1, 1 }>;
  vect__27.28_37 = vect__20.27_38 * vect__22.23_43;
  vect__57.29_36 = .VEC_FMADDSUB (vect__20.16_53, vect__22.19_48, vect__27.28_37);
  _7 = (long unsigned int) jp_31;
  _8 = _7 * 16;
  _9 = TrigArr.0_1 + _8;
  MEM <vector(2) double> [(double *)_9] = vect__57.29_36;

_5 is &TrigArr[jp-1], _9 is &TrigArr[jp]
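
To make the IR easier to check by hand, here is a rough scalar model of the basic block. It is only a sketch: the testcase source is not shown here, so the type V2 and the helpers perm, fmaddsub, slp_step and complex_mul are hypothetical names, and reading the two lanes as the real and imaginary parts of a complex value is an assumption. The lane rule for .VEC_FMADDSUB (even lanes subtract, odd lanes add) follows the description of the vec_fmaddsub optab.

```cpp
#include <array>
#include <cstddef>

// Hypothetical stand-in for GIMPLE's "vector(2) double".
using V2 = std::array<double, 2>;

// VEC_PERM_EXPR <v, v, { i, j }> with both inputs equal: select lanes i and j of v.
inline V2 perm(const V2& v, std::size_t i, std::size_t j) {
    return {v[i], v[j]};
}

// .VEC_FMADDSUB (a, b, c): even lane computes a*b - c, odd lane computes a*b + c.
// (Not fused here; this model only mirrors the lane structure.)
inline V2 fmaddsub(const V2& a, const V2& b, const V2& c) {
    return {a[0] * b[0] - c[0], a[1] * b[1] + c[1]};
}

// One execution of the basic block: prev models the load from _5
// (&TrigArr[jp-1]), tp models the load from tp_14; the result is the
// value stored to _9 (&TrigArr[jp]).
inline V2 slp_step(const V2& prev, const V2& tp) {
    V2 swapped = perm(prev, 1, 0);                               // vect__22.23_43
    V2 tp_lo   = perm(tp, 0, 0);                                 // vect__20.16_53
    V2 tp_hi   = perm(tp, 1, 1);                                 // vect__20.27_38
    V2 cross   = {tp_hi[0] * swapped[0], tp_hi[1] * swapped[1]}; // vect__27.28_37
    return fmaddsub(tp_lo, prev, cross);                         // vect__57.29_36
}

// Scalar reference under the assumption above: complex product a * b
// with lane 0 as the real part and lane 1 as the imaginary part.
inline V2 complex_mul(const V2& a, const V2& b) {
    return {a[0] * b[0] - a[1] * b[1], a[0] * b[1] + a[1] * b[0]};
}
```

In this model, slp_step(prev, tp) gives the same two lanes as complex_mul(tp, prev), and the only memory read besides tp is TrigArr[jp-1], which is written by the previous step and not by this one.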

This looks fine to me.

As for not doing SLP without the copy constructor, it seems like the IR changes
and forces the load from tp_14 outside of the loop, which causes SLP not to do
the load there ...
