[Bug middle-end/30201] gcc doesn't unroll nested loops

bangerth at dealii dot org Thu, 14 Dec 2006 07:36:07 -0800


------- Comment #8 from bangerth at dealii dot org  2006-12-14 15:35 -------
Here is an analysis of the assembler code we get with the first
command line from my previous comment, i.e. without hand unrolling.
I'm using gcc 4.1.0, btw.


The main loop looks like this:
--------------------------
.L2:
        pushl   %edx            // push 'factor'
        xorl    %eax, %eax      // eax=0
        fildl   (%esp)          // st(0)=(double)factor
        addl    $1, %edx        // ++factor
        fstl    data            // data[0]=factor
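        movl    %eax, (%esp)    // overwrite the stack slot with 0
        fildl   (%esp)          // st(0)=0, st(1)=factor
        addl    $4, %esp        // release the stack slot
        cmpl    $1000000000, %edx       // loop test; flags used by jne below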
        fstl    data+24         // data[3]=0
        fstl    data+48         // data[6]=0
        fstl    data+8          // data[1]=0
        fxch    %st(1)          // st(0)=factor
        fstl    data+32         // data[4]=factor
        fxch    %st(1)          // st(0)=0
        fstl    data+56         // data[7]=0
        fstl    data+16         // data[2]=0
        fstpl   data+40         // data[5]=0; st(0)=factor
        fstpl   data+64         // data[8]=factor
        jne     .L2
---------------------
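
Read back into C, the listing appears to do roughly the following. This
is a reconstruction from the annotated assembly only, not the original
test case: the names fill and run are made up, the start value of
factor is not visible in the listing (so it is a parameter here), and
data is assumed to be a global array of 9 doubles holding a row-major
3x3 matrix.

```c
#include <assert.h>

/* Assumed layout: 9 doubles, row-major 3x3 matrix. */
double data[9];

/* One iteration of the loop body, with the stores in the same order
   as in the listing: the diagonal gets `factor`, everything else 0. */
void fill(int factor)
{
    double f = (double)factor;  /* pushl %edx ; fildl (%esp)  */
    double z = 0.0;             /* xorl ; movl ; fildl (%esp) */

    data[0] = f;                /* fstl  data     */
    data[3] = z;                /* fstl  data+24  */
    data[6] = z;                /* fstl  data+48  */
    data[1] = z;                /* fstl  data+8   */
    data[4] = f;                /* fxch ; fstl data+32 */
    data[7] = z;                /* fxch ; fstl data+56 */
    data[2] = z;                /* fstl  data+16  */
    data[5] = z;                /* fstpl data+40  */
    data[8] = f;                /* fstpl data+64  */
}

/* The enclosing loop: factor is incremented once per iteration and
   the loop exits when it reaches `limit` (1000000000 in the listing). */
void run(int start, int limit)
{
    assert(start < limit);
    for (int factor = start; factor != limit; ++factor)
        fill(factor);
}
```

The point is that the nine stores are already laid out straight-line,
with no inner loop control left.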

I can find several things wrong with this:
a/ the sequence
    xorl        %eax, %eax
    movl        %eax, (%esp)
    fildl       (%esp)
   could be replaced by a single fldz, which loads 0.0 directly.
b/ I find the use of fstpl at the end of the loop quite ingenious, since
   it avoids another fxch. However, the two uses of fxch in the middle
   could nevertheless be avoided if gcc realized that the stores are
   independent of each other and can be reordered.
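
To illustrate point b/, here is a small C sketch (hypothetical function
names, not taken from the test case) that checks that the store order
does not matter: writing all zeros first and the diagonal afterwards
leaves the same memory contents as the order in the listing. The order
of assignments in C source does not dictate what the compiler emits;
the sketch only shows that the grouped order is a legal one. With the
x87 stack, such a grouping would let all copies of one value be stored
before switching to the other, so no fxch would be needed.

```c
#include <assert.h>
#include <string.h>

/* Store order as in the listing: the two values are interleaved. */
void fill_as_emitted(double *d, int factor)
{
    double f = (double)factor;
    assert(d != NULL);
    d[0] = f;   d[3] = 0.0; d[6] = 0.0;
    d[1] = 0.0; d[4] = f;   d[7] = 0.0;
    d[2] = 0.0; d[5] = 0.0; d[8] = f;
}

/* Grouped order: all six zeros first, then the three diagonal
   entries. Each store hits a distinct element, so the result is
   the same as above. */
void fill_grouped(double *d, int factor)
{
    double f = (double)factor;
    assert(d != NULL);
    d[1] = 0.0; d[2] = 0.0; d[3] = 0.0;
    d[5] = 0.0; d[6] = 0.0; d[7] = 0.0;
    d[0] = f;   d[4] = f;   d[8] = f;
}

/* Returns 1 if both orders leave identical array contents. */
int same_result(int factor)
{
    double a[9], b[9];
    fill_as_emitted(a, factor);
    fill_grouped(b, factor);
    return memcmp(a, b, sizeof a) == 0;
}
```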

So, in summary, it is not that gcc doesn't realize that it can unroll
these loops -- it actually does that. The slowdown comes from other places.

W.


-- 


http://gcc.gnu.org/bugzilla/show_bug.cgi?id=30201
