Martin v. Löwis <mar...@v.loewis.de> added the comment:

Marc-Andre: gcc will normally not unroll loops unless -funroll-loops is given on the command line. With that flag, it will unroll many loops, with 8 iterations per outer loop. This typically causes significant code bloat, which is why unrolling is normally disabled and left to the programmer.
For those who want to experiment with this, I attach a C file with just the code in question. Compile it with your favorite compiler settings and see what the compiler generates. clang, on an x64 system, compiles the original loop into

LBB0_2:                                 ## =>This Inner Loop Header: Depth=1
        movzbl  (%rdi), %eax
        movw    %ax, (%rdx)
        incq    %rdi
        addq    $2, %rdx
        decq    %rsi
        jne     LBB0_2

and the unrolled loop into

LBB1_2:                                 ## %.lr.ph6
                                        ## =>This Inner Loop Header: Depth=1
        movzbl  (%rdi,%rcx), %r8d
        movw    %r8w, (%rdx)
        movzbl  1(%rdi,%rcx), %r8d
        movw    %r8w, 2(%rdx)
        movzbl  2(%rdi,%rcx), %r8d
        movw    %r8w, 4(%rdx)
        movzbl  3(%rdi,%rcx), %r8d
        movw    %r8w, 6(%rdx)
        addq    $8, %rdx
        addq    $4, %rcx
        cmpq    %rax, %rcx
        jl      LBB1_2

----------
nosy: +loewis
Added file: http://bugs.python.org/file23353/unroll.c

_______________________________________
Python tracker <rep...@bugs.python.org>
<http://bugs.python.org/issue13136>
_______________________________________