http://gcc.gnu.org/bugzilla/show_bug.cgi?id=45903

           Summary: unnecessary load of 32/64bit variable when only 8 bits
                    are needed
           Product: gcc
           Version: 4.6.0
            Status: UNCONFIRMED
          Severity: enhancement
          Priority: P3
         Component: tree-optimization
        AssignedTo: unassig...@gcc.gnu.org
        ReportedBy: zso...@seznam.cz


Created attachment 21967
  --> http://gcc.gnu.org/bugzilla/attachment.cgi?id=21967
testcase for both cases

For the following testcase:

#include <stdint.h>

uint8_t f64(uint64_t a, uint64_t b)
{
    return (a >> 8) + (b >> 8);
}

gcc (r164716) generates this code:

$ gcc -O3 -S -m32 -o tst10.s tst10.c -masm=intel

f64:
        push    ebx
        mov     ecx, DWORD PTR [esp+8]
        mov     ebx, DWORD PTR [esp+12]
        mov     eax, DWORD PTR [esp+16]
        mov     edx, DWORD PTR [esp+20]
        shrd    ecx, ebx, 8
        pop     ebx
        shrd    eax, edx, 8
        add     eax, ecx
        ret

while it could load only the one byte it needs from each argument
(a is at [esp+4] and b at [esp+12] on entry), something like:

f64:
        mov     al, BYTE PTR [esp+5]
        add     al, BYTE PTR [esp+13]
        ret
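
The narrowing is valid because truncation to 8 bits commutes with addition: only bits 8..15 of each argument can affect the result. A minimal C sketch of that equivalence follows. The names f64_ref, f64_narrow and check_f64 are hypothetical and are not part of the attached testcase.

```c
#include <assert.h>
#include <stdint.h>

/* Reference form, written as in the testcase (truncation made explicit). */
uint8_t f64_ref(uint64_t a, uint64_t b)
{
    return (uint8_t)((a >> 8) + (b >> 8));
}

/* Narrowed form: truncate each operand to bits 8..15 first, then add. */
uint8_t f64_narrow(uint64_t a, uint64_t b)
{
    uint8_t a1 = (uint8_t)(a >> 8);
    uint8_t b1 = (uint8_t)(b >> 8);
    return (uint8_t)(a1 + b1);
}

/* Hypothetical helper: aborts if the two forms disagree for (a, b). */
void check_f64(uint64_t a, uint64_t b)
{
    assert(f64_ref(a, b) == f64_narrow(a, b));
}
```

Since both forms agree modulo 256, a compiler is free to shrink the 64-bit loads and shifts to single-byte loads.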


The situation is better for the 32-bit case:

uint8_t f32(uint32_t a, uint32_t b)
{
    return (a >> 8) + (b >> 8);
}

where gcc generates:
$ gcc -O3 -S -m32 -o tst10.s tst10.c -masm=intel

f32:
        mov     eax, DWORD PTR [esp+4]
        mov     edx, DWORD PTR [esp+8]
        shr     eax, 8
        shr     edx, 8
        add     eax, edx
        ret

while it could again use byte-sized loads (here a is at [esp+4] and
b at [esp+8], so the needed bytes are at [esp+5] and [esp+9]):

f32:
        mov     al, BYTE PTR [esp+5]
        add     al, BYTE PTR [esp+9]
        ret
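
The byte loads at offset +1 rely on x86 being little-endian: the byte holding bits 8..15 of a value is stored one byte past its start. The following C sketch reads that byte from the object representation instead of shifting. It is an illustration only; byte1_of_u32, f32_bytes and check_f32 are hypothetical names, and the runtime endianness probe is there just to keep the sketch portable.

```c
#include <assert.h>
#include <stdint.h>
#include <string.h>

/* Hypothetical helper: returns bits 8..15 of x by reading a single byte
   of its in-memory representation rather than shifting the whole value. */
uint8_t byte1_of_u32(uint32_t x)
{
    unsigned char bytes[sizeof x];
    const uint16_t probe = 1;
    unsigned char low;

    memcpy(bytes, &x, sizeof x);
    memcpy(&low, &probe, 1);   /* low == 1 on little-endian targets */

    /* Little-endian (x86): offset 1. Big-endian: second-to-last byte. */
    return low ? bytes[1] : bytes[sizeof x - 2];
}

/* Byte-load version of f32: two one-byte reads and an 8-bit add. */
uint8_t f32_bytes(uint32_t a, uint32_t b)
{
    return (uint8_t)(byte1_of_u32(a) + byte1_of_u32(b));
}

/* Hypothetical helper: aborts if the byte-load form differs from the
   shift form used in the testcase. */
void check_f32(uint32_t a, uint32_t b)
{
    assert(f32_bytes(a, b) == (uint8_t)((a >> 8) + (b >> 8)));
}
```

On x86 the two reads correspond to the BYTE PTR operands at [esp+5] and [esp+9].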
