http://gcc.gnu.org/bugzilla/show_bug.cgi?id=58497

Richard Biener <rguenth at gcc dot gnu.org> changed:

           What    |Removed                       |Added
----------------------------------------------------------------------------
           Keywords|                              |missed-optimization
             Target|                              |x86_64-*-*
             Status|UNCONFIRMED                   |ASSIGNED
   Last reconfirmed|                              |2013-09-23
         Depends on|                              |53947
           Assignee|unassigned at gcc dot gnu.org |rguenth at gcc dot gnu.org
     Ever confirmed|0                             |1

--- Comment #1 from Richard Biener <rguenth at gcc dot gnu.org> ---
Heh ;)  I suppose this started with BIT_FIELD_REF support in SLP, 4.8 didn't
vectorize this at all.

Note that with for example

typedef float float4 __attribute__((vector_size(16)));

float4 g(int x)
{
  float4 W;
  W[0]=W[1]=x+1;
  W[2]=x+2;
  W[3]=x+3;
  return W;
}

vectorizing two same operations may be profitable.  But yes, if all scalars
are the same there is no point to do it.  And the cost model should have
disabled it as well (though likely the four "stores" made it profitable
in the end).

I will have a look at some point.

OTOH generated code is

g:
.LFB0:
        .cfi_startproc
        movl    %edi, -12(%rsp)
        movd    -12(%rsp), %xmm1
        pshufd  $0, %xmm1, %xmm0
        paddd   .LC0(%rip), %xmm0
        cvtdq2ps        %xmm0, %xmm0
        ret

vs. -fno-tree-vectorize:

g:
.LFB0:
        .cfi_startproc
        xorps   %xmm1, %xmm1
        addl    $1, %edi
        xorps   %xmm0, %xmm0
        cvtsi2ss        %edi, %xmm1
        movaps  %xmm0, %xmm2
        movss   %xmm1, %xmm2
        shufps  $36, %xmm2, %xmm0
        movaps  %xmm0, %xmm2
        movss   %xmm1, %xmm2
        shufps  $196, %xmm2, %xmm0
        movaps  %xmm0, %xmm2
        unpcklps        %xmm0, %xmm0
        movss   %xmm1, %xmm0
        shufps  $225, %xmm2, %xmm0
        movss   %xmm1, %xmm0
        ret

so clearly a win, but improvable to sth like

        addl    $1, %edi
        cvtsi2ss        %edi, %xmm1
        pshufd  $0, %xmm1, %xmm0

the above also shows that vector init by BIT_FIELD_REF is not expanded
very well (sth for a generalized vector shuffle recognition in the bswap
pass).
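
The distinction drawn above (all lanes equal versus only some lanes equal) can be sketched in C roughly as follows. This is only an illustration, assuming a compiler with GCC-style vector extensions; the names g_lanewise, g_splat, g_mixed, lanes_equal and check_equivalence are hypothetical and do not come from the bug report. It does not reproduce the vectorizer's decisions; it only shows that the lane-by-lane form and a hand-written "convert once, then broadcast" form compute the same values, which is what the suggested three-instruction sequence relies on.

```c
#include <assert.h>

/* Same typedef as in the comment: four packed floats (GCC vector extension). */
typedef float float4 __attribute__((vector_size(16)));

/* All four lanes receive the same scalar, written lane by lane.
   This is the "all scalars are the same" situation. */
float4 g_lanewise(int x)
{
    float4 W;
    W[0] = W[1] = W[2] = W[3] = x + 1;
    return W;
}

/* Hand-written equivalent (an assumption about the desired shape):
   one integer add, one int-to-float conversion, then a broadcast. */
float4 g_splat(int x)
{
    float s = (float)(x + 1);
    float4 W = { s, s, s, s };
    return W;
}

/* The mixed example from the comment: only two lanes share a value. */
float4 g_mixed(int x)
{
    float4 W;
    W[0] = W[1] = x + 1;
    W[2] = x + 2;
    W[3] = x + 3;
    return W;
}

/* Returns 1 if every lane of a equals the matching lane of b, else 0. */
int lanes_equal(float4 a, float4 b)
{
    for (int i = 0; i < 4; i++)
        if (a[i] != b[i])
            return 0;
    return 1;
}

/* Asserts that the lane-wise and broadcast forms agree for one input. */
void check_equivalence(int x)
{
    assert(lanes_equal(g_lanewise(x), g_splat(x)));
}
```

Compiling such a file with "gcc -O3 -S", once as is and once with -fno-tree-vectorize added, should allow a comparison along the lines of the listings above; the exact assembly will depend on the compiler version and target.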