https://gcc.gnu.org/bugzilla/show_bug.cgi?id=83008
Richard Biener <rguenth at gcc dot gnu.org> changed:

           What    |Removed |Added
----------------------------------------------------------------------------
           Keywords|        |missed-optimization
             Target|        |x86_64-*-* i?86-*-*
                 CC|        |hubicka at gcc dot gnu.org,
                   |        |rguenth at gcc dot gnu.org

--- Comment #2 from Richard Biener <rguenth at gcc dot gnu.org> ---
The strange code appears because we perform basic-block vectorization, resulting in

  vect_cst__249 = {_251, _251, _251, _251, _334, _334, _334, _334,
                   _417, _417, _417, _417, _48, _48, _48, _48};
  MEM[(unsigned int *)&tmp] = vect_cst__249;
  _186 = tmp[0][0];
  _185 = tmp[1][0];
  ...

which for some reason is deemed profitable:

t.c:32:12: note: Cost model analysis:
  Vector inside of basic block cost: 24
  Vector prologue cost: 64
  Vector epilogue cost: 0
  Scalar cost of basic block: 192
t.c:32:12: note: Basic block will be vectorized using SLP

What is odd is that the single vector store is costed 24 while the 16 scalar
int stores are costed 192.  The vector build from scalars is costed 64, so the
total vector cost of 24 + 64 + 0 = 88 beats the scalar cost of 192.  I guess
Honza's cost-model tweaks might have gone wrong here, or we're hitting an
oddity in the SLP costing.  Even if it looks strange, maybe the sequence _is_
profitable?

The second loop would be vectorized if 'sum' was unsigned.
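For reference, the GIMPLE above suggests a block of 16 straight-line stores into a contiguous 4x4 array, fed by only four distinct scalars. The following is a hypothetical reconstruction of that shape (the actual t.c is not shown in this comment; the function name and parameters are made up for illustration) -- it is exactly the kind of loop-free block that basic-block SLP turns into one vector constructor plus one vector store:

```c
#include <assert.h>

/* Hypothetical sketch of the store pattern: 16 contiguous scalar
   stores with only 4 distinct values.  Basic-block SLP can replace
   them with a single {a,a,a,a, b,b,b,b, c,c,c,c, d,d,d,d} vector
   constructor and one vector store, as seen in the GIMPLE dump. */
static void fill(unsigned int tmp[4][4],
                 unsigned int a, unsigned int b,
                 unsigned int c, unsigned int d)
{
    /* Straight-line code, no loop: this is basic-block (SLP)
       vectorization, not loop vectorization. */
    tmp[0][0] = a; tmp[0][1] = a; tmp[0][2] = a; tmp[0][3] = a;
    tmp[1][0] = b; tmp[1][1] = b; tmp[1][2] = b; tmp[1][3] = b;
    tmp[2][0] = c; tmp[2][1] = c; tmp[2][2] = c; tmp[2][3] = c;
    tmp[3][0] = d; tmp[3][1] = d; tmp[3][2] = d; tmp[3][3] = d;
}
```

In the cost-model terms above, the 16 scalar stores are the "scalar cost of basic block" (192), while the vector replacement is the inside cost (24, the store) plus the prologue cost (64, building the vector from the four scalars).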
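The closing remark refers to a second loop that performs a sum reduction. A hypothetical sketch of that shape (again, t.c is not shown here, so the function name and bounds are assumptions; the declared type of 'sum' is the detail the comment says matters):

```c
#include <assert.h>

/* Hypothetical sketch: a reduction over a 4x4 block.  Per the
   comment, declaring 'sum' as unsigned (rather than signed int)
   would allow this loop to be vectorized. */
static unsigned int sum_block(const unsigned int tmp[4][4])
{
    unsigned int sum = 0;  /* the 'unsigned' variant from the comment */
    for (int i = 0; i < 4; i++)
        for (int j = 0; j < 4; j++)
            sum += tmp[i][j];
    return sum;
}
```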