https://gcc.gnu.org/bugzilla/show_bug.cgi?id=58497

Richard Biener <rguenth at gcc dot gnu.org> changed:

           What    |Removed                     |Added
----------------------------------------------------------------------------
             Status|ASSIGNED                    |NEW
           Assignee|rguenth at gcc dot gnu.org  |unassigned at gcc dot gnu.org

--- Comment #12 from Richard Biener <rguenth at gcc dot gnu.org> ---
We now generate

g:
.LFB0:
        .cfi_startproc
        pxor    %xmm1, %xmm1
        addl    $1, %edi
        movaps  %xmm1, %xmm0
        cvtsi2ss        %edi, %xmm0
        shufps  $36, %xmm0, %xmm1
        movaps  %xmm1, %xmm0
        cvtsi2ss        %edi, %xmm0
        shufps  $196, %xmm0, %xmm1
        movaps  %xmm1, %xmm0
        unpcklps        %xmm1, %xmm0
        cvtsi2ss        %edi, %xmm0
        shufps  $225, %xmm1, %xmm0
        cvtsi2ss        %edi, %xmm0
        ret

or with SSE4

g:
.LFB0:
        .cfi_startproc
        addl    $1, %edi
        pxor    %xmm1, %xmm1
        pxor    %xmm0, %xmm0
        cvtsi2ss        %edi, %xmm1
        insertps        $48, %xmm1, %xmm0
        insertps        $32, %xmm1, %xmm0
        insertps        $16, %xmm1, %xmm0
        movss   %xmm1, %xmm0
        ret

on GIMPLE we end up with

g (int x)
{
  float4 W;
  int _1;
  float _2;

  <bb 2> [local count: 1073741824]:
  _1 = x_3(D) + 1;
  _2 = (float) _1;
  W_6 = BIT_INSERT_EXPR <W_5(D), _2, 96 (32 bits)>;
  W_7 = BIT_INSERT_EXPR <W_6, _2, 64 (32 bits)>;
  W_8 = BIT_INSERT_EXPR <W_7, _2, 32 (32 bits)>;
  W_9 = BIT_INSERT_EXPR <W_8, _2, 0 (32 bits)>;
  return W_9;
}

so we fail to recognize the splat.  The GIMPLE already looks like this very
early (update-address-taken + forwprop).  SLP vectorization
doesn't treat a BIT_INSERT_EXPR "reduction" as a sink, but we could probably
pattern-match a VEC_DUPLICATE_EXPR for the above.
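
For illustration, here is a minimal sketch of the equivalence such a pattern
match would rely on.  It is not the testcase from this PR: the function names
g_inserts and g_splat are hypothetical, and the lane order (highest lane
first) is only assumed from the BIT_INSERT_EXPR offsets 96, 64, 32, 0 in the
dump above.  The float4 typedef uses the GCC vector extension.

```c
/* GCC/Clang vector extension; float4 matches the type name in the dump. */
typedef float float4 __attribute__((vector_size(16)));

/* Assumed reconstruction: four lane stores of the same scalar, highest
   lane first, mirroring the chain of BIT_INSERT_EXPRs at bit offsets
   96, 64, 32 and 0.  */
float4 g_inserts(int x)
{
  float4 W;
  float f = (float) (x + 1);
  W[3] = f;
  W[2] = f;
  W[1] = f;
  W[0] = f;
  return W;
}

/* The same value written as an explicit splat of one scalar, which is
   the form a VEC_DUPLICATE_EXPR would express.  */
float4 g_splat(int x)
{
  float f = (float) (x + 1);
  return (float4) { f, f, f, f };
}
```

Both functions return the same vector for every x, so comparing the code
generated for each (for example at -O2, with and without -msse4) is one way
to see how far the insert chain is from the splat form.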
