http://gcc.gnu.org/bugzilla/show_bug.cgi?id=56676



             Bug #: 56676

           Summary: unnecessary split load when using avx2

    Classification: Unclassified

           Product: gcc

           Version: 4.7.1

            Status: UNCONFIRMED

          Severity: normal

          Priority: P3

         Component: target

        AssignedTo: unassig...@gcc.gnu.org

        ReportedBy: nel...@seznam.cz





Compile this well-known example:

int foo(int *a, int *b) {
  int i;
  int r = 0;
  for (i = 0; i < 32; i++) r += a[i] * b[i];
  return r;
}

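with -O3 -mavx2. GCC generates code that is suboptimal in several ways.

The part relevant to this bug is that a 32-byte load is split into two 16-byte loads: a vmovdqu of the low half into %xmm1, followed by a vinserti128 of the high half into %ymm1.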



.L5:
  vmovdqu (%r8,%rdx), %xmm1
  addl  $1, %ecx
  vinserti128 $0x1, 16(%r8,%rdx), %ymm1, %ymm1
  vpmulld (%rbx,%rdx), %ymm1, %ymm1
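
For comparison, here is a rough sketch of the same loop written by hand with AVX2 intrinsics, where each group of eight ints is fetched with one unaligned 32-byte load request (_mm256_loadu_si256). This is only an illustration of the load pattern the report asks for, not GCC's vectorizer output. The names foo_scalar, foo_avx2_impl, foo_wide and check_foo_wide are made up for this example, and the runtime check with __builtin_cpu_supports assumes a reasonably recent GCC or Clang on x86.

```c
#include <assert.h>

#if defined(__x86_64__) || defined(__i386__)
#include <immintrin.h>
#define FOO_HAVE_X86 1
#endif

/* Scalar reference: same computation as foo() in the report. */
int foo_scalar(const int *a, const int *b)
{
    int r = 0;
    for (int i = 0; i < 32; i++)
        r += a[i] * b[i];
    return r;
}

#ifdef FOO_HAVE_X86
/* Hypothetical hand-written AVX2 version: one 32-byte unaligned load
   per operand per iteration, eight ints at a time. */
__attribute__((target("avx2")))
static int foo_avx2_impl(const int *a, const int *b)
{
    __m256i acc = _mm256_setzero_si256();
    for (int i = 0; i < 32; i += 8) {
        __m256i va = _mm256_loadu_si256((const __m256i *)(a + i));
        __m256i vb = _mm256_loadu_si256((const __m256i *)(b + i));
        acc = _mm256_add_epi32(acc, _mm256_mullo_epi32(va, vb));
    }
    /* Horizontal sum of the eight 32-bit lanes. */
    __m128i lo = _mm256_castsi256_si128(acc);
    __m128i hi = _mm256_extracti128_si256(acc, 1);
    __m128i s = _mm_add_epi32(lo, hi);
    s = _mm_add_epi32(s, _mm_shuffle_epi32(s, 0x4E)); /* swap 64-bit halves */
    s = _mm_add_epi32(s, _mm_shuffle_epi32(s, 0xB1)); /* swap adjacent lanes */
    return _mm_cvtsi128_si32(s);
}
#endif

/* Uses the AVX2 path when the CPU supports it, else the scalar loop. */
int foo_wide(const int *a, const int *b)
{
#ifdef FOO_HAVE_X86
    if (__builtin_cpu_supports("avx2"))
        return foo_avx2_impl(a, b);
#endif
    return foo_scalar(a, b);
}

/* Small self-check: both versions must agree on the same input. */
void check_foo_wide(void)
{
    int a[32], b[32];
    for (int i = 0; i < 32; i++) {
        a[i] = i;
        b[i] = 2 * i + 1;
    }
    assert(foo_wide(a, b) == foo_scalar(a, b));
}
```

Whether the compiler turns _mm256_loadu_si256 into a single 32-byte vmovdqu or still splits it depends on the GCC version and the tuning in effect. GCC documents an x86 option, -mavx256-split-unaligned-load, that controls this splitting, so comparing the output of gcc -O3 -mavx2 -S with and without -mno-avx256-split-unaligned-load may be a useful experiment. That this option is what causes the code shown above is an assumption, not something the report states.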
