https://gcc.gnu.org/bugzilla/show_bug.cgi?id=68558
Bug ID: 68558
Summary: Fails to SLP loop
Product: gcc
Version: 6.0
Status: UNCONFIRMED
Keywords: missed-optimization
Severity: normal
Priority: P3
Component: tree-optimization
Assignee: unassigned at gcc dot gnu.org
Reporter: rguenth at gcc dot gnu.org
Blocks: 53947
Target Milestone: ---

void IMB_double_fast_x (int *destf, int *dest, int y, int *p1f)
{
  int i;
  for (i = y; i > 0; i--)
    {
      *dest++ = 0;
      destf[0] = p1f[0];
      destf[1] = p1f[1];
      destf[2] = p1f[2];
      destf[3] = p1f[3];
      destf[4] = p1f[8];
      destf[5] = p1f[9];
      destf[6] = p1f[10];
      destf[7] = p1f[11];
      destf += 8;
      p1f += 12;
    }
}

fails to SLP because of

t.c:4:3: note: Detected interleaving store of size 8 starting with *destf_37 = _13;
t.c:4:3: note: Detected interleaving load of size 12 starting with _13 = *p1f_39;
t.c:4:3: note: Data access with gaps requires scalar epilogue loop
...
t.c:4:3: note: Build SLP failed: the number of interleaved loads is greater than the SLP group size _13 = *p1f_39;

Splitting the load group doesn't help, because then we hit

t.c:4:3: note: Build SLP failed: different interleaving chains in one node

Splitting the store group into vector-size pieces would generally make sense, but it may have interesting effects on SLP discovery: for example, without also splitting the loads we will hit the first issue above.

The best fix would be to lift the above restrictions and let the permutation support decide whether it can create the required loads or not.

Referenced Bugs:
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=53947
[Bug 53947] [meta-bug] vectorizer missed-optimizations
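To make the access shapes in this report concrete, here is a minimal hand-written sketch of what a successful SLP of the loop could look like with 4 x int vectors: the 8-element store group is split into two vector-size halves, and each half is fed by one contiguous 4-element load (p1f[0..3] and p1f[8..11]), so the gap p1f[4..7] is never read. It assumes the GCC/Clang generic vector extension (the vector_size attribute); IMB_double_fast_x_v4 and versions_agree are hypothetical names introduced for illustration, not code GCC emits or plans to emit.

```c
#include <assert.h>
#include <string.h>

/* The scalar loop from the report. */
void IMB_double_fast_x (int *destf, int *dest, int y, int *p1f)
{
  int i;
  for (i = y; i > 0; i--)
    {
      *dest++ = 0;
      destf[0] = p1f[0];
      destf[1] = p1f[1];
      destf[2] = p1f[2];
      destf[3] = p1f[3];
      destf[4] = p1f[8];
      destf[5] = p1f[9];
      destf[6] = p1f[10];
      destf[7] = p1f[11];
      destf += 8;
      p1f += 12;
    }
}

/* Hypothetical hand-vectorized equivalent (illustration only).
   v4si is a generic vector of four ints (16 bytes). */
typedef int v4si __attribute__ ((vector_size (16)));

void IMB_double_fast_x_v4 (int *destf, int *dest, int y, int *p1f)
{
  for (int i = y; i > 0; i--)
    {
      v4si lo, hi;
      *dest++ = 0;
      /* memcpy avoids assuming 16-byte alignment of destf or p1f. */
      memcpy (&lo, p1f, sizeof lo);        /* load  p1f[0..3]   */
      memcpy (&hi, p1f + 8, sizeof hi);    /* load  p1f[8..11]  */
      memcpy (destf, &lo, sizeof lo);      /* store destf[0..3] */
      memcpy (destf + 4, &hi, sizeof hi);  /* store destf[4..7] */
      destf += 8;
      p1f += 12;
    }
}

/* Hypothetical helper: runs both versions for n iterations on the same
   input and returns 1 if all outputs agree. */
int versions_agree (int n)
{
  enum { MAXN = 16 };
  int src[12 * MAXN], a[8 * MAXN], b[8 * MAXN], da[MAXN], db[MAXN];
  assert (n > 0 && n <= MAXN);
  for (int k = 0; k < 12 * n; k++)
    src[k] = 3 * k + 1;
  /* Fill outputs with the same pattern so untouched slots compare equal. */
  memset (a, 0x55, sizeof a);
  memset (b, 0x55, sizeof b);
  memset (da, 0x55, sizeof da);
  memset (db, 0x55, sizeof db);
  IMB_double_fast_x (a, da, n, src);
  IMB_double_fast_x_v4 (b, db, n, src);
  return memcmp (a, b, sizeof a) == 0 && memcmp (da, db, sizeof da) == 0;
}
```

With 4-lane vectors no permutation is needed at all, which is why splitting the store group into vector-size pieces looks attractive. With 8-lane vectors, destf[0..7] would instead need the low halves of two different 8-element loads, i.e. a two-input permutation, which is the kind of decision the report suggests leaving to the permutation support.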