https://gcc.gnu.org/bugzilla/show_bug.cgi?id=92177

Richard Biener <rguenth at gcc dot gnu.org> changed:

           What    |Removed                     |Added
----------------------------------------------------------------------------
             Status|UNCONFIRMED                 |ASSIGNED
   Last reconfirmed|                            |2019-10-22
           Assignee|unassigned at gcc dot gnu.org      |rguenth at gcc dot gnu.org
     Ever confirmed|0                           |1

--- Comment #1 from Richard Biener <rguenth at gcc dot gnu.org> ---
Ah, thanks - that might be an undesired side-effect of r277241.

We now vectorize

      out[0] = a0 * x;
      out[1] = a1 * y;
      out[2] = a2 * x;
      out[3] = a3 * y;

as

  _5 = a0_40 * x_44(D);
  _6 = a1_41 * y_45(D);
  _7 = a2_42 * x_44(D);
  _8 = a3_43 * y_45(D);
  _66 = {_7, _8};
  vect_cst__67 = _66;
  _68 = {_5, _6};
  vect_cst__69 = _68;
  MEM <vector(2) unsigned int> [(unsigned int *)&out] = vect_cst__69;
  _71 = &out[0] + 8;
  MEM <vector(2) unsigned int> [(unsigned int *)_71] = vect_cst__67;

since we're no longer fencing the build-from-scalar code via
&& !SLP_TREE_CHILDREN (child).is_empty () (previously we had no
SLP children nodes for the SLP node representing the multiplication).
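
To make the difference concrete, here is a rough C sketch of the two shapes
using the GCC/Clang vector_size extension. The type v2ui and all three
function names are made up for illustration and are not taken from the
testcase or from the vectorizer. The first function mirrors the dump above
(scalar multiplies, then vectors built from the scalar results); the second
is roughly what the test was meant to check (the multiplication itself done
as a vector operation).

```c
#include <assert.h>
#include <string.h>

/* Two-lane vector of unsigned int (GCC/Clang vector extension).
   The typedef name is hypothetical.  */
typedef unsigned int v2ui
  __attribute__ ((vector_size (2 * sizeof (unsigned int))));

/* Shape of the dump above: multiply in scalar code, build each
   vector from the scalar results, then store the vectors.  */
void
mul_scalar_then_pack (unsigned int *out,
                      unsigned int a0, unsigned int a1,
                      unsigned int a2, unsigned int a3,
                      unsigned int x, unsigned int y)
{
  v2ui lo = { a0 * x, a1 * y };
  v2ui hi = { a2 * x, a3 * y };
  memcpy (out, &lo, sizeof lo);
  memcpy (out + 2, &hi, sizeof hi);
}

/* Intended shape: build the operand vectors from scalars and do
   the multiplication as a vector operation.  */
void
mul_pack_then_vector (unsigned int *out,
                      unsigned int a0, unsigned int a1,
                      unsigned int a2, unsigned int a3,
                      unsigned int x, unsigned int y)
{
  v2ui xy = { x, y };
  v2ui a_lo = { a0, a1 };
  v2ui a_hi = { a2, a3 };
  v2ui lo = a_lo * xy;
  v2ui hi = a_hi * xy;
  memcpy (out, &lo, sizeof lo);
  memcpy (out + 2, &hi, sizeof hi);
}

/* Both shapes compute the same four products; only where the
   multiplication happens differs.  */
void
check_same (unsigned int a0, unsigned int a1, unsigned int a2,
            unsigned int a3, unsigned int x, unsigned int y)
{
  unsigned int r1[4], r2[4];
  mul_scalar_then_pack (r1, a0, a1, a2, a3, x, y);
  mul_pack_then_vector (r2, a0, a1, a2, a3, x, y);
  assert (memcmp (r1, r2, sizeof r1) == 0);
}
```

Both variants store the same values, so a test that only scans for
"basic block vectorized" cannot tell them apart.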

So the test is no longer testing vectorization of multiplications.

It also shows that BB vectorizing this function at strict basic-block
boundaries is suboptimal.

I'll see what is best to do here; clearly a short-term fix would be to
change the code to make vectorization of the multiplication more
obviously profitable.
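
As a purely hypothetical illustration of that idea (not necessarily the
change that will be made to the testcase), one way to make the vector
multiply look cheaper is to take both operands from contiguous memory, so
that they can be loaded as vectors rather than built from scalars. The
function name and parameters below are assumptions, and whether a given
target's cost model actually considers this profitable is not guaranteed.

```c
/* Hypothetical variant: both multiplication operands come from
   contiguous arrays, so each operand group could be a vector load
   and no vector needs to be constructed from scalars.  */
void
mul_from_memory (unsigned int *restrict out,
                 const unsigned int *restrict a,
                 const unsigned int *restrict b)
{
  out[0] = a[0] * b[0];
  out[1] = a[1] * b[1];
  out[2] = a[2] * b[2];
  out[3] = a[3] * b[3];
}
```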
