------- Comment #4 from steven at gcc dot gnu dot org 2006-08-28 13:59 ------- >From the hammer branch for AMD64:
.globl f .type f, @function f: .LFB4: testl %edi, %edi movl %esi, %eax jne .L3 movl %edx, %esi movl %edx, %eax .L3: leal (%rax,%rsi), %eax ret .LFE4: .size f, .-f .p2align 4,,15 .globl f1 .type f1, @function f1: .LFB5: testl %edi, %edi cmove %edx, %esi leal (%rsi,%rsi), %eax ret .LFE5: .size f1, .-f1 And from gcc 4.2 20060818: .globl f .type f, @function f: .LFB2: testl %edi, %edi movl %esi, %eax cmove %edx, %esi cmove %esi, %eax addl %esi, %eax ret .LFE2: .size f, .-f .p2align 4,,15 .globl f1 .type f1, @function f1: .LFB3: testl %edi, %edi cmove %edx, %esi leal (%rsi,%rsi), %eax ret .LFE3: .size f1, .-f1 So not all gcc3 releases do so well. Are there GCC releases that optimize the two functions to identical code? In any case, this is a missed optimization. I suppose the trick in this case is to recognise that "c + d" == "c + c" (perhaps during value numbering?), but the first step to analyze this bug would be to figure out where gcc3 (supposedly) performs this optimization. -- http://gcc.gnu.org/bugzilla/show_bug.cgi?id=28868