https://gcc.gnu.org/bugzilla/show_bug.cgi?id=110062
Richard Biener <rguenth at gcc dot gnu.org> changed:

           What    |Removed                      |Added
----------------------------------------------------------------------------
           Assignee|unassigned at gcc dot gnu.org |rguenth at gcc dot gnu.org
             Status|NEW                          |ASSIGNED

--- Comment #4 from Richard Biener <rguenth at gcc dot gnu.org> ---
So we fail to vectorize the outer loop (with double reduction) because of

t.c:7:25: note:   === vect_analyze_data_ref_accesses ===
t.c:7:25: note:   Detected interleaving load _7->red and _7->green
t.c:7:25: note:   Detected interleaving load _7->red and _7->blue
t.c:7:25: note:   grouped access in outer loop.
t.c:7:25: missed:   not vectorized: complicated access pattern.

For vectorizing the inner loop, SLP discovery fails because of a non-grouped
load - r[u].{red,green,blue} is handled but k[u] is not - I think this is a
well-known limitation (that ought to be fixed).

We then vectorize the loop with interleaving and peeling for gaps, but
profitability says 'width' needs to be 16.  We also vectorize the epilog.
I suppose the vectorized body isn't entered?

Note that outer loop vectorization likely isn't profitable even if
implemented, so the SLP failure is the thing to fix (which should be easy).
Need to find the duplicate bug for this.
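
For readers without the attachment at hand, the loop shape discussed in the comment can be sketched roughly as below. This is not the test case from the report, only a guess at its structure: the names r, k, u, width and the fields red, green, blue are taken from the comment, while the struct layout, the float element type, the function name weighted_sum, and the parameters v and height are assumptions made for illustration. The three field loads from r[...] are adjacent in memory and form an interleaved group, whereas k[u] is a lone scalar load that belongs to no group.

```c
/* Hypothetical pixel layout: only the field names appear in the comment,
   the float type is an assumption. */
struct pixel {
    float red, green, blue;
};

/* Sums every pixel of a width x height image, weighting each column u
   by k[u].  The accumulators are carried across both loops, which is
   the "double reduction" shape mentioned for the outer loop. */
struct pixel weighted_sum(const struct pixel *r, const float *k,
                          int width, int height)
{
    struct pixel acc = { 0.0f, 0.0f, 0.0f };

    for (int v = 0; v < height; v++) {
        for (int u = 0; u < width; u++) {
            const struct pixel *p = &r[v * width + u];

            /* Grouped (interleaved) loads: red, green, blue are adjacent. */
            /* Non-grouped load: k[u] is reused for all three lanes.       */
            acc.red   += p->red   * k[u];
            acc.green += p->green * k[u];
            acc.blue  += p->blue  * k[u];
        }
    }
    return acc;
}
```

Compiling a file like this with `gcc -O3 -fopt-info-vec-missed` should print vectorizer diagnostics in the same `file:line:col: missed:` form as quoted above, though the exact messages depend on the GCC version and target and may differ from those in this report.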