http://gcc.gnu.org/bugzilla/show_bug.cgi?id=37021
--- Comment #14 from Richard Biener <rguenth at gcc dot gnu.org> 2013-02-13 15:58:31 UTC ---

The following testcase shows the issue well:

_Complex double self[1024];
_Complex double a[1024][1024];
_Complex double b[1024];

void foo (void)
{
  int i, j;
  for (i = 0; i < 1024; i+=3)
    for (j = 0; j < 1024; j+=3)
      self[i] = self[i] + a[i][j]*b[j];
}

We have to get the complex multiplication pattern recognized by SLP, which
looks like this (without PRE):

  <bb 3>:

  <bb 4>:
  # j_21 = PHI <j_13(3), 0(7)>
  # self_I_RE_lsm.2_12 = PHI <_26(3), self_I_RE_lsm.2_7(7)>
  # self_I_IM_lsm.3_28 = PHI <_27(3), self_I_IM_lsm.3_8(7)>
  # ivtmp_16 = PHI <ivtmp_1(3), 342(7)>
  _2 = REALPART_EXPR <a[i_20][j_21]>;
  _18 = IMAGPART_EXPR <a[i_20][j_21]>;
  _19 = REALPART_EXPR <b[j_21]>;
  _17 = IMAGPART_EXPR <b[j_21]>;
  _4 = _19 * _2;
  _3 = _18 * _17;
  _6 = _17 * _2;
  _23 = _19 * _18;
  _24 = _4 - _3;
  _25 = _23 + _6;
  _26 = _24 + self_I_RE_lsm.2_12;
  _27 = _25 + self_I_IM_lsm.3_28;
  j_13 = j_21 + 3;
  ivtmp_1 = ivtmp_16 - 1;
  if (ivtmp_1 != 0)
    goto <bb 3>;

We fail to build the SLP tree for

  _25 = _23 + _6

because the matching stmt is

  _24 = _4 - _3

which has a different operation (SSE3 addsub would support vectorizing
this). I don't see how we can easily make this supported with the current
pattern support ... it doesn't allow tying together two SLP group members.

Simply allowing analysis to proceed here reveals that the interleaving has
a gap of 6, which makes the analysis fail. Allowing it to proceed for
ncopies == 1 (thus no actual interleaving required) reveals that the next
check is slightly bogus in that case. Fixing that leaves us with

t.c:9: note: Load permutation 0 0 1 0 1 1 0 1
t.c:9: note: Build SLP failed: unsupported load permutation _27 = _25 + self_I_IM_lsm.3_28;

... (to be continued)
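As a rough illustration of the operation mismatch described above, here is a small C sketch. It is not GCC code: the function names and the two-element {re, im} array layout are assumptions made purely for illustration. The first function repeats the scalar statements from the dump; the second groups them into two lanes, where lane 0 needs a subtraction and lane 1 an addition, which is the addsub shape that an SLP group requiring a single operation cannot express.

```c
#include <assert.h>

/* Scalar form, mirroring the statement names in the GIMPLE dump.
   acc, a and b each hold { real part, imaginary part }. */
static void cmul_acc_scalar(double acc[2], const double a[2], const double b[2])
{
    double t4  = b[0] * a[0];   /* _4  = _19 * _2  */
    double t3  = a[1] * b[1];   /* _3  = _18 * _17 */
    double t6  = b[1] * a[0];   /* _6  = _17 * _2  */
    double t23 = b[0] * a[1];   /* _23 = _19 * _18 */
    acc[0] += t4 - t3;          /* _24, then _26 */
    acc[1] += t23 + t6;         /* _25, then _27 */
}

/* Two-lane form: the same arithmetic grouped the way a 2-wide vector
   would hold it. The multiplies and the final accumulate use one
   operation per group; only the middle step mixes minus and plus. */
static void cmul_acc_lanes(double acc[2], const double a[2], const double b[2])
{
    double lhs[2] = { b[0] * a[0], b[0] * a[1] };   /* { _4, _23 } */
    double rhs[2] = { a[1] * b[1], b[1] * a[0] };   /* { _3, _6 }  */
    /* addsub-style step: lane 0 subtracts, lane 1 adds */
    double r[2] = { lhs[0] - rhs[0], lhs[1] + rhs[1] };
    acc[0] += r[0];
    acc[1] += r[1];
}

/* Hypothetical self-check: both forms agree on exactly representable
   inputs, here (1+2i)*(3+4i) accumulated onto 10+20i. */
static void check_forms_agree(void)
{
    const double a[2] = { 1.0, 2.0 }, b[2] = { 3.0, 4.0 };
    double s[2] = { 10.0, 20.0 }, v[2] = { 10.0, 20.0 };
    cmul_acc_scalar(s, a, b);
    cmul_acc_lanes(v, a, b);
    assert(s[0] == v[0] && s[1] == v[1]);
}
```

Note that the two-lane form also reads the parts of a and b in different orders for lhs and rhs, so even with an addsub operation available the loads would still need permuting.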