https://gcc.gnu.org/bugzilla/show_bug.cgi?id=101178
Bug ID: 101178
Summary: SLP permute propagation doesn't handle VEC_PERM
Product: gcc
Version: 12.0
Status: UNCONFIRMED
Severity: normal
Priority: P3
Component: tree-optimization
Assignee: unassigned at gcc dot gnu.org
Reporter: rguenth at gcc dot gnu.org
Target Milestone: ---

The current permute propagation code simply treats VEC_PERM nodes as
materialization points (they can consume incoming permutes), but it neither
handles them as sources for permutes nor considers propagating a common
source permute through the VEC_PERM node itself. The latter can be seen for

    double x[2], y[2], z[2], w[2];

    void foo ()
    {
      double tem0 = x[1] + y[1];
      double tem1 = x[0] - y[0];
      double tem2 = z[1] * tem0;
      double tem3 = z[0] * tem1;
      z[0] = tem2 - w[0];
      z[1] = tem3 + w[1];
    }

where we do not end up materializing the x[], y[] and w[] permute at the
last +- node but instead materialize at the first +- node, and thus end up
with incoming permute differences at the second +- one:

    <bb 2> [local count: 1073741824]:
    _21 = &x[1] + 18446744073709551608;
    vect__3.9_22 = MEM <vector(2) double> [(double *)_21];
    _1 = x[1];
    _23 = &y[1] + 18446744073709551608;
    vect__4.12_24 = MEM <vector(2) double> [(double *)_23];
    vect_tem1_13.14_26 = vect__3.9_22 - vect__4.12_24;
    vect_tem0_12.13_25 = vect__3.9_22 + vect__4.12_24;
    _27 = VEC_PERM_EXPR <vect_tem0_12.13_25, vect_tem1_13.14_26, { 1, 2 }>;
    _2 = y[1];
    tem0_12 = _1 + _2;
    _3 = x[0];
    _4 = y[0];
    tem1_13 = _3 - _4;
    _18 = &z[1] + 18446744073709551608;
    vect__5.5_19 = MEM <vector(2) double> [(double *)_18];
    vect__6.6_20 = VEC_PERM_EXPR <vect__5.5_19, vect__5.5_19, { 1, 0 }>;
    vect_tem2_14.15_28 = vect__6.6_20 * _27;
    _5 = z[1];
    tem2_14 = _5 * tem0_12;
    _6 = z[0];
    tem3_15 = _6 * tem1_13;
    vect__7.18_29 = MEM <vector(2) double> [(double *)&w];
    vect__10.20_31 = vect_tem2_14.15_28 + vect__7.18_29;
    vect__8.19_30 = vect_tem2_14.15_28 - vect__7.18_29;
    _32 = VEC_PERM_EXPR <vect__8.19_30, vect__10.20_31, { 0, 3 }>;
    _7 = w[0];
    _8 = tem2_14 - _7;
    _9 = w[1];
    _10 = _9 + tem3_15;
    MEM <vector(2) double> [(double *)&z] = _32;

The permute

    vect__6.6_20 = VEC_PERM_EXPR <vect__5.5_19, vect__5.5_19, { 1, 0 }>

could have been elided.