https://gcc.gnu.org/bugzilla/show_bug.cgi?id=92177
Richard Biener <rguenth at gcc dot gnu.org> changed:

           What    |Removed                       |Added
----------------------------------------------------------------------------
             Status|UNCONFIRMED                   |ASSIGNED
   Last reconfirmed|                              |2019-10-22
           Assignee|unassigned at gcc dot gnu.org |rguenth at gcc dot gnu.org
     Ever confirmed|0                             |1

--- Comment #1 from Richard Biener <rguenth at gcc dot gnu.org> ---
Ah, thanks - that might be an undesired side-effect of r277241.

We now vectorize

  out[0] = a0 * x;
  out[1] = a1 * y;
  out[2] = a2 * x;
  out[3] = a3 * y;

as

  _5 = a0_40 * x_44(D);
  _6 = a1_41 * y_45(D);
  _7 = a2_42 * x_44(D);
  _8 = a3_43 * y_45(D);
  _66 = {_7, _8};
  vect_cst__67 = _66;
  _68 = {_5, _6};
  vect_cst__69 = _68;
  MEM <vector(2) unsigned int> [(unsigned int *)&out] = vect_cst__69;
  _71 = &out[0] + 8;
  MEM <vector(2) unsigned int> [(unsigned int *)_71] = vect_cst__67;

since we're no longer fencing the build-from-scalar code via

      && !SLP_TREE_CHILDREN (child).is_empty ()

(previously we had no SLP children nodes for the SLP node representing the
multiplication). So the multiplications stay scalar and only the stores are
vectorized, which means the test is no longer testing vectorization of
multiplications. It also shows that BB vectorizing this function at strict
basic-block boundaries is suboptimal.

I'll see what is best to do here; clearly a short-term fix would be to change
the code to make vectorization of the multiplication more obviously profitable.
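For illustration only, the difference described in the comment can be sketched in C with the GCC vector extension. This is not the actual testcase: the function names, the v2su typedef, and passing a0..a3 as an array are assumptions made for the sketch. mul_build_from_scalars mirrors the quoted GIMPLE (scalar multiplies, vectors built from the scalar results, vector stores), while mul_vector shows roughly what the test presumably intends to exercise, with the multiplication itself done on vectors.

```c
#include <assert.h>
#include <string.h>

/* Two-lane vector of unsigned int (GCC/Clang vector extension).
   The typedef name is hypothetical.  */
typedef unsigned int v2su
  __attribute__ ((vector_size (2 * sizeof (unsigned int))));

/* Scalar form of the four statements quoted in the comment.  */
static void
mul_scalar (unsigned int *out, const unsigned int *a,
            unsigned int x, unsigned int y)
{
  out[0] = a[0] * x;
  out[1] = a[1] * y;
  out[2] = a[2] * x;
  out[3] = a[3] * y;
}

/* Hand-written analogue of the quoted GIMPLE: the multiplications
   remain scalar, the vectors are built from the scalar results, and
   only the two stores operate on vectors.  */
static void
mul_build_from_scalars (unsigned int *out, const unsigned int *a,
                        unsigned int x, unsigned int y)
{
  unsigned int p0 = a[0] * x;
  unsigned int p1 = a[1] * y;
  unsigned int p2 = a[2] * x;
  unsigned int p3 = a[3] * y;
  v2su lo = { p0, p1 };
  v2su hi = { p2, p3 };
  memcpy (out, &lo, sizeof lo);      /* stores out[0], out[1] */
  memcpy (out + 2, &hi, sizeof hi);  /* stores out[2], out[3] */
}

/* Analogue of vectorizing the multiplication itself: the operands are
   gathered into vectors first and each multiply handles two lanes.  */
static void
mul_vector (unsigned int *out, const unsigned int *a,
            unsigned int x, unsigned int y)
{
  v2su a_lo = { a[0], a[1] };
  v2su a_hi = { a[2], a[3] };
  v2su xy = { x, y };
  v2su lo = a_lo * xy;
  v2su hi = a_hi * xy;
  memcpy (out, &lo, sizeof lo);
  memcpy (out + 2, &hi, sizeof hi);
}

/* Sanity check: all three forms must produce the same four results.  */
static void
check_all_agree (const unsigned int *a, unsigned int x, unsigned int y)
{
  unsigned int r0[4], r1[4], r2[4];
  mul_scalar (r0, a, x, y);
  mul_build_from_scalars (r1, a, x, y);
  mul_vector (r2, a, x, y);
  assert (memcmp (r0, r1, sizeof r0) == 0);
  assert (memcmp (r0, r2, sizeof r0) == 0);
}
```

All three functions compute the same values; they differ only in where the vector operations sit, which is the property a test of multiplication vectorization has to distinguish.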