http://gcc.gnu.org/bugzilla/show_bug.cgi?id=54855



             Bug #: 54855
           Summary: Unnecessary duplication when performing scalar
                    operation on vector element
    Classification: Unclassified
           Product: gcc
           Version: 4.8.0
            Status: UNCONFIRMED
          Severity: normal
          Priority: P3
         Component: tree-optimization
        AssignedTo: unassig...@gcc.gnu.org
        ReportedBy: drepper....@gmail.com





Take the following code:





#include <stdio.h>

typedef double v2df __attribute__((vector_size(16)));

int
main(int argc, char *argv[])
{
  v2df v = { 2.0, 2.0 };
  v2df v2 = { 2.0, 2.0 };
  while (argc-- > 1)
    {
      v[0] -= 1.0;
      v *= v2;
    }
  printf("%g\n", v[0] + v[1]);
  return 0;
}



It compiles as both C and C++, and both compilers behave the same.



When compiled for x86-64 (and therefore with SSE enabled), GCC generates this
code for the loop:





  4003f0:       66 0f 28 c1             movapd %xmm1,%xmm0
  4003f4:       83 e8 01                sub    $0x1,%eax
  4003f7:       f2 0f 5c c2             subsd  %xmm2,%xmm0
  4003fb:       f2 0f 10 c8             movsd  %xmm0,%xmm1
  4003ff:       66 0f 58 c9             addpd  %xmm1,%xmm1
  400403:       75 eb                   jne    4003f0 <main+0x20>





That is, the whole vector is copied to a second register, the subtraction is
performed on element zero of the copy, and the scalar result is then merged
back into the original vector.



Instead, the following sequence would have been sufficient:



sub    $0x1,%eax
subsd  %xmm2,%xmm1
addpd  %xmm1,%xmm1
jne    ...back



The subsd instruction leaves the upper 64 bits of the destination register
unchanged, so the high element of the vector survives the scalar subtraction.
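
The equivalence of the two sequences can be sketched with SSE2 intrinsics.
This is only an illustration, not part of the original report: the function
names are hypothetical, it assumes that %xmm2 holds 1.0 in its low element,
and a compiler is free to emit different instructions for these intrinsics
than the ones named in the comments.

```c
#include <emmintrin.h>  /* SSE2 intrinsics; x86 and x86-64 only */

/* Mirrors the instruction sequence GCC emitted for the loop body. */
static __m128d body_generated(__m128d v, __m128d one)
{
    __m128d tmp = v;              /* movapd %xmm1,%xmm0: copy the vector     */
    tmp = _mm_sub_sd(tmp, one);   /* subsd  %xmm2,%xmm0: lane 0 of the copy  */
    v = _mm_move_sd(v, tmp);      /* movsd  %xmm0,%xmm1: merge lane 0 back   */
    return _mm_add_pd(v, v);      /* addpd  %xmm1,%xmm1: v *= { 2.0, 2.0 }   */
}

/* Mirrors the shorter sequence proposed in the report. */
static __m128d body_proposed(__m128d v, __m128d one)
{
    v = _mm_sub_sd(v, one);       /* subsd  %xmm2,%xmm1: lane 1 is untouched */
    return _mm_add_pd(v, v);      /* addpd  %xmm1,%xmm1                      */
}

/* Hypothetical helper: 1 if both bodies give equal results in both lanes. */
static int same_result(double lo, double hi)
{
    __m128d v = _mm_set_pd(hi, lo);   /* _mm_set_pd takes (high, low) */
    __m128d one = _mm_set_sd(1.0);    /* low lane 1.0, high lane 0.0  */
    __m128d eq = _mm_cmpeq_pd(body_generated(v, one), body_proposed(v, one));
    return _mm_movemask_pd(eq) == 3;  /* one mask bit per equal lane  */
}

/* Hypothetical helper: high lane after one pass of the proposed body. */
static double high_lane_after_proposed(double lo, double hi)
{
    __m128d r = body_proposed(_mm_set_pd(hi, lo), _mm_set_sd(1.0));
    return _mm_cvtsd_f64(_mm_unpackhi_pd(r, r));
}
```

Calling same_result(2.0, 2.0) exercises the starting values from the test
program. high_lane_after_proposed shows that the high element is only doubled
by the addition and is not affected by the scalar subtraction.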





I know this is a special case: it only works if the scalar operation is on
element zero of the vector.  But code can be designed that way, and I have
some code that would benefit from this.  I don't know whether this translates
to other architectures as well.
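
To look at the special case in isolation, the loop body can be reduced to a
pair of small functions whose assembly (for example from gcc -O2 -S) can be
compared side by side.  This is a sketch under assumptions: step_lane1 and
lane are hypothetical additions that do not appear in the original test case,
and no particular compiler output is implied.

```c
typedef double v2df __attribute__((vector_size(16)));

/* Loop body from the report: scalar update of element 0, then a vector op. */
v2df step_lane0(v2df v)
{
    const v2df v2 = { 2.0, 2.0 };
    v[0] -= 1.0;
    v *= v2;
    return v;
}

/* Hypothetical variant: the same update applied to element 1, where a
   low-lane scalar instruction such as subsd cannot be used directly. */
v2df step_lane1(v2df v)
{
    const v2df v2 = { 2.0, 2.0 };
    v[1] -= 1.0;
    v *= v2;
    return v;
}

/* Hypothetical helper: read one element of a vector value. */
double lane(v2df v, int i)
{
    return v[i];
}
```

Both functions rely only on the GCC vector extension, so they should also
build on non-x86 targets, which makes them a possible starting point for
checking how other architectures handle the same pattern.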
