https://gcc.gnu.org/bugzilla/show_bug.cgi?id=58497
Richard Biener <rguenth at gcc dot gnu.org> changed:

           What    |Removed                     |Added
----------------------------------------------------------------------------
             Status|ASSIGNED                    |NEW
           Assignee|rguenth at gcc dot gnu.org  |unassigned at gcc dot gnu.org

--- Comment #12 from Richard Biener <rguenth at gcc dot gnu.org> ---
We now generate

g:
.LFB0:
        .cfi_startproc
        pxor    %xmm1, %xmm1
        addl    $1, %edi
        movaps  %xmm1, %xmm0
        cvtsi2ss        %edi, %xmm0
        shufps  $36, %xmm0, %xmm1
        movaps  %xmm1, %xmm0
        cvtsi2ss        %edi, %xmm0
        shufps  $196, %xmm0, %xmm1
        movaps  %xmm1, %xmm0
        unpcklps        %xmm1, %xmm0
        cvtsi2ss        %edi, %xmm0
        shufps  $225, %xmm1, %xmm0
        cvtsi2ss        %edi, %xmm0
        ret

or, with SSE4,

g:
.LFB0:
        .cfi_startproc
        addl    $1, %edi
        pxor    %xmm1, %xmm1
        pxor    %xmm0, %xmm0
        cvtsi2ss        %edi, %xmm1
        insertps        $48, %xmm1, %xmm0
        insertps        $32, %xmm1, %xmm0
        insertps        $16, %xmm1, %xmm0
        movss   %xmm1, %xmm0
        ret

On GIMPLE we end up with

g (int x)
{
  float4 W;
  int _1;
  float _2;

  <bb 2> [local count: 1073741824]:
  _1 = x_3(D) + 1;
  _2 = (float) _1;
  W_6 = BIT_INSERT_EXPR <W_5(D), _2, 96 (32 bits)>;
  W_7 = BIT_INSERT_EXPR <W_6, _2, 64 (32 bits)>;
  W_8 = BIT_INSERT_EXPR <W_7, _2, 32 (32 bits)>;
  W_9 = BIT_INSERT_EXPR <W_8, _2, 0 (32 bits)>;
  return W_9;
}

so we fail to recognize the splat. The GIMPLE already looks like this very
early (after update-address-taken and forwprop). SLP vectorization doesn't
treat a BIT_INSERT_EXPR "reduction" as a sink, but we could probably
pattern-match the above into a VEC_DUPLICATE_EXPR.
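The pattern match suggested above boils down to one check: do the chained BIT_INSERT_EXPRs overwrite every lane of the vector with the same scalar? If so, the undefined starting value W_5(D) is irrelevant and the chain is equivalent to a VEC_DUPLICATE_EXPR of that scalar. The following is a hypothetical, self-contained C model of that check, not GCC code: `struct bit_insert` and `chain_is_splat` are invented names, SSA names are reduced to integer ids, and a real implementation would operate on GIMPLE statements instead.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>

/* Hypothetical toy model (not GCC's data structures) of one statement
     dest = BIT_INSERT_EXPR <src, value, bitpos (bitsize bits)>
   from a chain such as W_6 ... W_9.  The inserted scalar SSA name
   (e.g. _2) is reduced to an integer id.  */
struct bit_insert
{
  int value_id;      /* id of the inserted scalar SSA name */
  unsigned bitpos;   /* bit position of the inserted element */
  unsigned bitsize;  /* width of the inserted element in bits */
};

/* Return true and store the scalar's id in *SPLAT_VALUE if the N inserts
   in CHAIN together overwrite every one of the NLANES lanes (each ELT_BITS
   wide) with the same scalar.  The starting vector then does not matter
   and the chain is equivalent to a splat (VEC_DUPLICATE_EXPR) of that
   scalar.  Conservatively, every insert must use the same scalar and
   cover exactly one whole lane.  */
bool
chain_is_splat (const struct bit_insert *chain, size_t n,
                unsigned nlanes, unsigned elt_bits, int *splat_value)
{
  assert (elt_bits != 0);  /* precondition: lanes have a nonzero width */
  if (n == 0 || nlanes == 0 || nlanes > 64)
    return false;

  unsigned long long covered = 0;  /* bit I set <=> lane I was written */
  for (size_t i = 0; i < n; i++)
    {
      if (chain[i].value_id != chain[0].value_id
          || chain[i].bitsize != elt_bits
          || chain[i].bitpos % elt_bits != 0
          || chain[i].bitpos / elt_bits >= nlanes)
        return false;
      covered |= 1ULL << (chain[i].bitpos / elt_bits);
    }

  /* Every lane must have been written, otherwise part of the
     (undefined) input vector survives.  */
  unsigned long long all_lanes
    = nlanes == 64 ? ~0ULL : (1ULL << nlanes) - 1;
  if (covered != all_lanes)
    return false;

  *splat_value = chain[0].value_id;
  return true;
}
```

For the dump above, the four inserts of _2 at bit positions 96, 64, 32 and 0, each 32 bits wide, cover all four float4 lanes, so the check succeeds. Dropping any one insert, or inserting a different scalar into one lane, makes it fail.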