https://gcc.gnu.org/bugzilla/show_bug.cgi?id=66051
            Bug ID: 66051
           Summary: can't vectorize reductions inside an SLP group
           Product: gcc
           Version: 6.0
            Status: UNCONFIRMED
          Keywords: missed-optimization
          Severity: normal
          Priority: P3
         Component: tree-optimization
          Assignee: unassigned at gcc dot gnu.org
          Reporter: rguenth at gcc dot gnu.org
            Blocks: 53947
  Target Milestone: ---

    void foo (int *p, short *q, int n)
    {
      int i;
      for (i = 0; i < n; ++i)
        {
          p[i*4+0] = q[i*8+0] + q[i*8+4];
          p[i*4+1] = q[i*8+1] + q[i*8+5];
          p[i*4+2] = q[i*8+2] + q[i*8+6];
          p[i*4+3] = q[i*8+3] + q[i*8+7];
        }
    }

is vectorized by unrolling the loop 4 times instead of using SLP, because:

    t3.c:4:3: note: Build SLP for _15 = *_14;
    t3.c:4:3: note: Build SLP failed: grouped loads have gaps _15 = *_14;

That isn't the whole story (I don't think we support this kind of
"reductions"). The SLP build runs into two load children, one loading
[0, 3] and one loading [4, 7] of a single group, and thus producing gaps.
Ideally we would vectorize this with a single load of [0, 7], a shuffle,
an add, and an unpack of only the low part.

Referenced Bugs:

https://gcc.gnu.org/bugzilla/show_bug.cgi?id=53947
[Bug 53947] [meta-bug] vectorizer missed-optimizations
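The single-load strategy suggested in the report can be modeled in plain C to show the intended data flow. This is a minimal, hypothetical sketch: `foo_group_load` and `same_as_reference` are illustrative names that do not appear in the report, and the small local arrays only stand in for vector registers, so it says nothing about the code GCC actually emits. One assumption to flag: the sketch widens the lanes to int before adding, so the sums match the int arithmetic of the original loop exactly (short + short can exceed the range of short).

```c
#include <assert.h>
#include <string.h>

/* Original loop from the report: lanes k and k+4 of each 8-short
   group of q are summed into 4 consecutive ints of p.  */
void foo (int *p, short *q, int n)
{
  int i;
  for (i = 0; i < n; ++i)
    {
      p[i*4+0] = q[i*8+0] + q[i*8+4];
      p[i*4+1] = q[i*8+1] + q[i*8+5];
      p[i*4+2] = q[i*8+2] + q[i*8+6];
      p[i*4+3] = q[i*8+3] + q[i*8+7];
    }
}

/* Hypothetical scalar model of the suggested strategy: one contiguous
   load of the whole group [0, 7] per iteration instead of two gapped
   loads of [0, 3] and [4, 7].  Arrays stand in for vector registers.  */
void foo_group_load (int *p, const short *q, int n)
{
  for (int i = 0; i < n; ++i)
    {
      short v[8];                      /* "vector" of q[i*8+0 .. i*8+7] */
      int lo[4], hi[4];
      memcpy (v, q + i*8, sizeof v);   /* single load of the full group */
      for (int k = 0; k < 4; ++k)
        {
          lo[k] = v[k];                /* lanes 0..3, widened to int */
          hi[k] = v[k + 4];            /* lanes 4..7 shuffled down, widened */
        }
      for (int k = 0; k < 4; ++k)
        p[i*4 + k] = lo[k] + hi[k];    /* lane-wise add, store 4 ints */
    }
}

#define MAX_N 16

/* Hypothetical check: run both versions on the same input, including
   values near the short limits, and compare the first 4*n results.  */
int same_as_reference (int n)
{
  short q[MAX_N * 8];
  int a[MAX_N * 4], b[MAX_N * 4];
  assert (n >= 0 && n <= MAX_N);
  for (int j = 0; j < n * 8; ++j)
    q[j] = (short) (j % 2 ? 32767 - j : -32768 + 3 * j);
  foo (a, q, n);
  foo_group_load (b, q, n);
  return memcmp (a, b, sizeof (int) * 4 * (size_t) n) == 0;
}
```

For any n from 0 to MAX_N, `same_as_reference (n)` returns 1: both functions read the same eight shorts per iteration and produce the same int sums; only the shape of the loads differs.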