Martin v. Löwis <mar...@v.loewis.de> added the comment:

Marc-Andre: gcc will normally not unroll loops unless -funroll-loops is given 
on the command line. With that option, it unrolls many loops, placing 8 copies 
of the loop body in each iteration of the unrolled loop. This typically causes 
significant code bloat, which is why unrolling is normally disabled and left 
to the programmer.

For those who want to experiment with this, I have attached a C file with just 
the code in question. Compile it with your favorite compiler settings, and see 
what the compiler generates. clang, on an x64 system, compiles the original 
loop into


LBB0_2:                                 ## =>This Inner Loop Header: Depth=1
        movzbl  (%rdi), %eax
        movw    %ax, (%rdx)
        incq    %rdi
        addq    $2, %rdx
        decq    %rsi
        jne     LBB0_2

and the unrolled loop into

LBB1_2:                                 ## %.lr.ph6
                                        ## =>This Inner Loop Header: Depth=1
        movzbl  (%rdi,%rcx), %r8d
        movw    %r8w, (%rdx)
        movzbl  1(%rdi,%rcx), %r8d
        movw    %r8w, 2(%rdx)
        movzbl  2(%rdi,%rcx), %r8d
        movw    %r8w, 4(%rdx)
        movzbl  3(%rdi,%rcx), %r8d
        movw    %r8w, 6(%rdx)
        addq    $8, %rdx
        addq    $4, %rcx
        cmpq    %rax, %rcx
        jl      LBB1_2
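
For readers without the attachment, the two loops might look roughly like the 
following. This is only a sketch inferred from the assembly above (a byte-to-
16-bit widening copy with source, count and destination arguments); it is not 
the content of unroll.c, and the function names widen_simple and 
widen_unrolled are hypothetical.

```c
#include <stddef.h>
#include <stdint.h>

/* Plain loop: widen each 8-bit unit to 16 bits, one element per iteration.
   Hypothetical name; corresponds in spirit to the first listing. */
void widen_simple(const uint8_t *src, size_t n, uint16_t *dst)
{
    const uint8_t *end = src + n;
    while (src < end)
        *dst++ = *src++;
}

/* Manually unrolled by four: each iteration of the main loop copies four
   elements; a short tail loop handles the remaining 0..3 elements.
   Hypothetical name; corresponds in spirit to the second listing. */
void widen_unrolled(const uint8_t *src, size_t n, uint16_t *dst)
{
    size_t i = 0;
    for (; i + 4 <= n; i += 4) {
        dst[i]     = src[i];
        dst[i + 1] = src[i + 1];
        dst[i + 2] = src[i + 2];
        dst[i + 3] = src[i + 3];
    }
    for (; i < n; i++)
        dst[i] = src[i];
}
```

Compiling such a file once with -S alone and once with -S -funroll-loops 
(at the same optimization level) should make the difference in the generated 
inner loops visible; the exact output depends on the compiler and version.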

----------
nosy: +loewis
Added file: http://bugs.python.org/file23353/unroll.c

_______________________________________
Python tracker <rep...@bugs.python.org>
<http://bugs.python.org/issue13136>
_______________________________________