http://gcc.gnu.org/bugzilla/show_bug.cgi?id=46847
--- Comment #2 from Richard Guenther <rguenth at gcc dot gnu.org> 2010-12-08 12:02:09 UTC ---
GCC 4.5 doesn't move the end computation inside the loop. What do you expect to be "good" code? I get at -O[23]:

foo:
.LFB0:
        mov     %esi, %ecx
        subl    %edi, %esi
        mov     %edi, %eax
        andl    $7, %esi
        leaq    (%rdx,%rax,8), %rax
        leaq    (%rdx,%rcx,8), %rdx
        jmp     *.L10(,%rsi,8)
        .section        .rodata
        .align 8
        .align 4
.L10:
        .quad   .L2
        .quad   .L3
        .quad   .L4
        .quad   .L5
        .quad   .L6
        .quad   .L7
        .quad   .L8
        .quad   .L9
        .text
        .p2align 4,,10
        .p2align 3
.L9:
        movq    $0, (%rax)
        addq    $8, %rax
.L8:
        movq    $0, (%rax)
        addq    $8, %rax
.L7:
        movq    $0, (%rax)
        addq    $8, %rax
.L6:
        movq    $0, (%rax)
        addq    $8, %rax
.L5:
        movq    $0, (%rax)
        addq    $8, %rax
.L4:
        movq    $0, (%rax)
        addq    $8, %rax
.L3:
        movq    $0, (%rax)
        addq    $8, %rax
.L2:
        movq    $0, (%rax)
        addq    $8, %rax
        cmpq    %rax, %rdx
        jae     .L9
        rep ret

ICC 11.1 generates similar code. Duff's device may be a fun thing from a C language perspective, but in general it is a bad idea: it defeats most loop optimizations because it is a loop with multiple entries, which means GCC does not recognize it as a loop at all.
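To illustrate the multiple-entry problem, here is a minimal sketch of the two shapes in C. The testcase behind foo is not reproduced in this comment, so zero_duff and zero_plain are hypothetical names, and the signature (a long array, unsigned start/end indices, start <= end assumed) is only inferred from the register usage in the listing. In zero_duff the switch jumps into the middle of the unrolled body, giving the loop eight entry points; zero_plain has a single entry and exit.

```c
#include <stddef.h>

/* Hypothetical example, not the bug's testcase: zero a[start..end)
   with Duff's device. Assumes start <= end. The switch enters the
   do/while body at one of eight labels, so the loop has multiple
   entries. */
static void zero_duff(long *a, unsigned start, unsigned end)
{
    long *p = a + start;
    size_t n = end - start;        /* number of elements to clear */
    if (n == 0)
        return;
    size_t rounds = (n + 7) / 8;   /* passes through the 8-way body */
    switch (n % 8) {
    case 0: do { *p++ = 0;
    case 7:      *p++ = 0;
    case 6:      *p++ = 0;
    case 5:      *p++ = 0;
    case 4:      *p++ = 0;
    case 3:      *p++ = 0;
    case 2:      *p++ = 0;
    case 1:      *p++ = 0;
            } while (--rounds > 0);
    }
}

/* Hypothetical single-entry equivalent: a plain counted loop that the
   compiler's loop passes can recognize and transform on their own. */
static void zero_plain(long *a, unsigned start, unsigned end)
{
    for (unsigned i = start; i < end; i++)
        a[i] = 0;
}
```

Both functions clear the same range; the difference is only in control flow. With the plain loop, the compiler is free to choose its own unrolling factor or vectorize it, and may even replace it with a memset call, none of which is possible once the loop has been hand-unrolled with multiple entries.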