[Bug other/46847] Missed optimization for variant of Duff's device
http://gcc.gnu.org/bugzilla/show_bug.cgi?id=46847

--- Comment #1 from Jens Kilian jjk at acm dot org 2010-12-08 09:12:07 UTC ---
Created attachment 22681
  --> http://gcc.gnu.org/bugzilla/attachment.cgi?id=22681
Source code and generated assembly
http://gcc.gnu.org/bugzilla/show_bug.cgi?id=46847

--- Comment #2 from Richard Guenther rguenth at gcc dot gnu.org 2010-12-08 12:02:09 UTC ---
GCC 4.5 doesn't move the end computation inside the loop. What do you expect
to be good code? I get at -O[23]:

foo:
.LFB0:
        mov     %esi, %ecx
        subl    %edi, %esi
        mov     %edi, %eax
        andl    $7, %esi
        leaq    (%rdx,%rax,8), %rax
        leaq    (%rdx,%rcx,8), %rdx
        jmp     *.L10(,%rsi,8)
        .section        .rodata
        .align 8
        .align 4
.L10:
        .quad   .L2
        .quad   .L3
        .quad   .L4
        .quad   .L5
        .quad   .L6
        .quad   .L7
        .quad   .L8
        .quad   .L9
        .text
        .p2align 4,,10
        .p2align 3
.L9:
        movq    $0, (%rax)
        addq    $8, %rax
.L8:
        movq    $0, (%rax)
        addq    $8, %rax
.L7:
        movq    $0, (%rax)
        addq    $8, %rax
.L6:
        movq    $0, (%rax)
        addq    $8, %rax
.L5:
        movq    $0, (%rax)
        addq    $8, %rax
.L4:
        movq    $0, (%rax)
        addq    $8, %rax
.L3:
        movq    $0, (%rax)
        addq    $8, %rax
.L2:
        movq    $0, (%rax)
        addq    $8, %rax
        cmpq    %rax, %rdx
        jae     .L9
        rep ret

Similar code is generated by ICC 11.1.

Duff's device may be a fun thing from a C language perspective, but it is
a bad thing in general because it defeats most loop optimizations: it is a
loop with multiple entries (which means the loop isn't recognized as a loop
by GCC).
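For readers who have not met the construct, the "multiple entries" point can be illustrated with a minimal sketch. This is not the code from attachment 22681, which is not reproduced in this thread; the names zero_plain, zero_duff, fills_agree and MAX_N are hypothetical and chosen only for illustration. The sketch assumes a classic Duff's device that zeroes n elements, next to the single-entry loop that does the same job.

```c
#include <assert.h>
#include <stddef.h>
#include <string.h>

/* Plain loop: one entry point at the top, so a compiler can treat it
   as an ordinary counted loop. */
void zero_plain(long *p, size_t n)
{
    for (size_t i = 0; i < n; i++)
        p[i] = 0;
}

/* Classic Duff's device (illustrative only). The switch jumps into the
   middle of the do-while body, so the loop has eight possible entry
   points. Falling through the case labels is intentional. */
void zero_duff(long *p, size_t n)
{
    if (n == 0)
        return;                     /* the do-while body runs at least once */

    size_t passes = (n + 7) / 8;    /* trips through the unrolled body */
    switch (n % 8) {
    case 0: do { *p++ = 0;
    case 7:      *p++ = 0;
    case 6:      *p++ = 0;
    case 5:      *p++ = 0;
    case 4:      *p++ = 0;
    case 3:      *p++ = 0;
    case 2:      *p++ = 0;
    case 1:      *p++ = 0;
            } while (--passes > 0);
    }
}

enum { MAX_N = 64 };                /* arbitrary bound for the check below */

/* Returns 1 if both versions leave identical buffers, including the
   sentinel elements past index n that neither should touch. */
int fills_agree(size_t n)
{
    long a[MAX_N + 1], b[MAX_N + 1];

    assert(n <= MAX_N);
    for (size_t i = 0; i <= MAX_N; i++)
        a[i] = b[i] = -1;           /* sentinel value */
    zero_plain(a, n);
    zero_duff(b, n);
    return memcmp(a, b, sizeof a) == 0;
}
```

Both functions store the same values; the difference is only in control flow. In zero_duff the case labels sit inside the loop body, which is the property the comment above describes as preventing the loop from being recognized as a loop.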
http://gcc.gnu.org/bugzilla/show_bug.cgi?id=46847

--- Comment #3 from Jens Kilian jjk at acm dot org 2010-12-08 12:10:02 UTC ---
(In reply to comment #2)
> GCC 4.5 doesn't move end computation inside the loop. What do you expect
> to be good code? I get at -O[23]:
[snip]

The code generated by 4.5 is what I would have expected. Feel free to close
this as fixed in the latest version.

> Duff's device may be a fun thing from a C language perspective, but it is
> a bad thing in general because you defy most loop optimizations as it is
> a loop with multiple entries (which means the loop isn't recognized as a
> loop by GCC).

I usually wouldn't consider using it; I just noticed the problem while trying
to tune some heavily used parts of our code.

Thanks,
Jens.