[Bug tree-optimization/58497] SLP vectorizes identical operations

rguenth at gcc dot gnu.org Mon, 23 Sep 2013 01:34:47 -0700

http://gcc.gnu.org/bugzilla/show_bug.cgi?id=58497


Richard Biener <rguenth at gcc dot gnu.org> changed:

           What    |Removed                     |Added
----------------------------------------------------------------------------
           Keywords|                            |missed-optimization
             Target|                            |x86_64-*-*
             Status|UNCONFIRMED                 |ASSIGNED
   Last reconfirmed|                            |2013-09-23
         Depends on|                            |53947
           Assignee|unassigned at gcc dot gnu.org      |rguenth at gcc dot gnu.org
     Ever confirmed|0                           |1

--- Comment #1 from Richard Biener <rguenth at gcc dot gnu.org> ---
Heh ;)  I suppose this started with BIT_FIELD_REF support in SLP; 4.8 didn't
vectorize this at all.

Note that with for example

typedef float float4 __attribute__((vector_size(16)));

float4 g(int x)
{
  float4 W;
  W[0]=W[1]=x+1;
  W[2]=x+2;
  W[3]=x+3;
  return W;
}

vectorizing two identical operations may be profitable.  But yes, if all
scalars are the same there is no point in doing it.  The cost model
should have disabled it as well (though the four "stores" likely
made it look profitable in the end).
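
To make the two cases concrete, here is a minimal sketch that puts them
side by side.  g_mixed is the example above under a different name;
g_splat is a hypothetical all-identical variant written for illustration
(it is not claimed to be the test case attached to this PR), and
check_lanes is a hypothetical helper that only verifies lane values, it
says nothing about the code the vectorizer generates.

```c
#include <assert.h>

typedef float float4 __attribute__((vector_size(16)));

/* Mixed case from the comment: lanes 0 and 1 share x+1,
   lanes 2 and 3 hold different values.  */
float4 g_mixed(int x)
{
  float4 W;
  W[0] = W[1] = x + 1;
  W[2] = x + 2;
  W[3] = x + 3;
  return W;
}

/* Hypothetical all-identical case: every lane holds x+1, so a
   vector add only repeats the same scalar operation four times.  */
float4 g_splat(int x)
{
  float4 W;
  W[0] = W[1] = W[2] = W[3] = x + 1;
  return W;
}

/* Sanity check of the lane values only.  */
void check_lanes(void)
{
  float4 m = g_mixed(1);
  float4 s = g_splat(1);
  assert(m[0] == 2.0f && m[1] == 2.0f && m[2] == 3.0f && m[3] == 4.0f);
  assert(s[0] == 2.0f && s[1] == 2.0f && s[2] == 2.0f && s[3] == 2.0f);
}
```

Compiling both functions to assembly (for instance with -O3 -S, with and
without -fno-tree-vectorize) is one way to compare what SLP does with
each; the exact output will depend on the GCC version and target.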

I will have a look at some point.

OTOH the generated code is

g:
.LFB0:
        .cfi_startproc
        movl    %edi, -12(%rsp)
        movd    -12(%rsp), %xmm1
        pshufd  $0, %xmm1, %xmm0
        paddd   .LC0(%rip), %xmm0
        cvtdq2ps        %xmm0, %xmm0
        ret

vs. -fno-tree-vectorize:

g:
.LFB0:
        .cfi_startproc
        xorps   %xmm1, %xmm1
        addl    $1, %edi
        xorps   %xmm0, %xmm0
        cvtsi2ss        %edi, %xmm1
        movaps  %xmm0, %xmm2
        movss   %xmm1, %xmm2
        shufps  $36, %xmm2, %xmm0
        movaps  %xmm0, %xmm2
        movss   %xmm1, %xmm2
        shufps  $196, %xmm2, %xmm0
        movaps  %xmm0, %xmm2
        unpcklps        %xmm0, %xmm0
        movss   %xmm1, %xmm0
        shufps  $225, %xmm2, %xmm0
        movss   %xmm1, %xmm0
        ret

so clearly a win, but improvable to something like

        addl    $1, %edi
        cvtsi2ss        %edi, %xmm1
        pshufd  $0, %xmm1, %xmm0

The above also shows that vector init by BIT_FIELD_REF is not expanded
very well (something for a generalized vector shuffle recognition in the
bswap pass).
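
In source terms, the shorter sequence suggested above (one scalar add,
one conversion, one broadcast) corresponds roughly to the sketch below.
The names g_broadcast and check_broadcast are hypothetical and used only
for illustration; whether a given GCC version actually emits the
three-instruction sequence for this form is not guaranteed.

```c
#include <assert.h>

typedef float float4 __attribute__((vector_size(16)));

/* Hypothetical hand-written form: do the add and the int->float
   conversion once on the scalar, then broadcast the result.  */
float4 g_broadcast(int x)
{
  float s = (float)(x + 1);    /* one scalar add, one conversion */
  float4 W = { s, s, s, s };   /* same value in all four lanes */
  return W;
}

/* Sanity check of the lane values only.  */
void check_broadcast(void)
{
  float4 w = g_broadcast(1);
  assert(w[0] == 2.0f && w[1] == 2.0f && w[2] == 2.0f && w[3] == 2.0f);
}
```

Writing the initializer as a single vector constructor avoids the
element-by-element stores, which is the part that currently expands
poorly.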
