[Bug other/46847] Missed optimization for variant of Duff's device

2010-12-08 Thread jjk at acm dot org
http://gcc.gnu.org/bugzilla/show_bug.cgi?id=46847

--- Comment #1 from Jens Kilian jjk at acm dot org 2010-12-08 09:12:07 UTC ---
Created attachment 22681
  -- http://gcc.gnu.org/bugzilla/attachment.cgi?id=22681
Source code and generated assembly


2010-12-08 Thread rguenth at gcc dot gnu.org
http://gcc.gnu.org/bugzilla/show_bug.cgi?id=46847

--- Comment #2 from Richard Guenther rguenth at gcc dot gnu.org 2010-12-08 12:02:09 UTC ---
GCC 4.5 doesn't move end computation inside the loop.  What do you expect
to be good code?  I get at -O[23]:

foo:
.LFB0:
	mov	%esi, %ecx
	subl	%edi, %esi
	mov	%edi, %eax
	andl	$7, %esi
	leaq	(%rdx,%rax,8), %rax
	leaq	(%rdx,%rcx,8), %rdx
	jmp	*.L10(,%rsi,8)
	.section	.rodata
	.align 8
	.align 4
.L10:
	.quad	.L2
	.quad	.L3
	.quad	.L4
	.quad	.L5
	.quad	.L6
	.quad	.L7
	.quad	.L8
	.quad	.L9
	.text
	.p2align 4,,10
	.p2align 3
.L9:
	movq	$0, (%rax)
	addq	$8, %rax
.L8:
	movq	$0, (%rax)
	addq	$8, %rax
.L7:
	movq	$0, (%rax)
	addq	$8, %rax
.L6:
	movq	$0, (%rax)
	addq	$8, %rax
.L5:
	movq	$0, (%rax)
	addq	$8, %rax
.L4:
	movq	$0, (%rax)
	addq	$8, %rax
.L3:
	movq	$0, (%rax)
	addq	$8, %rax
.L2:
	movq	$0, (%rax)
	addq	$8, %rax
	cmpq	%rax, %rdx
	jae	.L9
	rep
	ret

Similar code is generated by ICC 11.1.

Duff's device may be a fun thing from a C language perspective, but it
is a bad thing in general: it is a loop with multiple entries, which
means GCC doesn't recognize it as a loop, so you defy most loop
optimizations.
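
To make the "multiple entries" point concrete, here is a minimal sketch in C. It is not the code from the attachment; the function names zero_fill_duff and zero_fill_plain and the (pointer, count) interface are made up for illustration. The first function jumps into the middle of a do/while body through the switch, so the loop has eight entry points. The second does the same work with a single loop header, which is the shape loop optimizers normally expect.

```c
#include <assert.h>
#include <stddef.h>

/* Hypothetical illustration: zero n elements with a Duff's-device-style
 * loop.  The switch jumps into the middle of the do/while body, so the
 * loop can be entered at any of the eight case labels. */
void zero_fill_duff(long *p, size_t n)
{
    assert(p != NULL || n == 0);
    if (n == 0)
        return;

    size_t passes = (n + 7) / 8;    /* number of trips through the body */

    switch (n % 8) {
    case 0: do { *p++ = 0;
    case 7:      *p++ = 0;
    case 6:      *p++ = 0;
    case 5:      *p++ = 0;
    case 4:      *p++ = 0;
    case 3:      *p++ = 0;
    case 2:      *p++ = 0;
    case 1:      *p++ = 0;
            } while (--passes > 0);
    }
}

/* Same effect with a single-entry loop.  The compiler is free to unroll
 * or otherwise transform this form itself. */
void zero_fill_plain(long *p, size_t n)
{
    assert(p != NULL || n == 0);
    for (size_t i = 0; i < n; i++)
        p[i] = 0;
}
```

Both functions store zero to exactly the first n elements; only the control flow differs. Whether the plain version ends up faster depends on the compiler version, the flags, and the target.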


2010-12-08 Thread jjk at acm dot org
http://gcc.gnu.org/bugzilla/show_bug.cgi?id=46847

--- Comment #3 from Jens Kilian jjk at acm dot org 2010-12-08 12:10:02 UTC ---
(In reply to comment #2)
> GCC 4.5 doesn't move end computation inside the loop.  What do you expect
> to be good code?  I get at -O[23]:

[snip]

The code generated by 4.5 is what I would have expected.
Feel free to close this as fixed in the latest version.

> Duff's device may be a fun thing from a C language perspective, but it
> is a bad thing in general because you defy most
> loop optimizations as it is a loop with multiple entries (which means
> the loop isn't recognized as a loop by GCC).

I usually wouldn't consider using it; I just noticed the problem while
trying to tune some heavily used parts of our code.

Thanks,
Jens.