https://gcc.gnu.org/bugzilla/show_bug.cgi?id=115690

            Bug ID: 115690
           Summary: Strange codegen for small fixed-size `memcpy` when
                    targeting `-march=i486`
           Product: gcc
           Version: 14.1.0
            Status: UNCONFIRMED
          Severity: normal
          Priority: P3
         Component: rtl-optimization
          Assignee: unassigned at gcc dot gnu.org
          Reporter: arcata at gmail dot com
  Target Milestone: ---

Given the following C code:

```
void *memcpy(void *a, const void *b, unsigned long c);

void foo(unsigned *x, unsigned *y) {
    memcpy(x, y, 16);
}
```

Using gcc 14.1, `gcc -m32 -march=i486 -O2` produces the following assembly:

```
foo:
        push    edi
        push    esi
        mov     ecx, DWORD PTR [esp+12]
        mov     esi, DWORD PTR [esp+16]
        mov     eax, DWORD PTR [esi]
        mov     DWORD PTR [ecx], eax
        mov     eax, DWORD PTR [esi+12]
        mov     DWORD PTR [ecx+12], eax
        lea     edi, [ecx+4]
        and     edi, -4
        sub     ecx, edi
        sub     esi, ecx
        add     ecx, 16
        shr     ecx, 2
        rep movsd
        pop     esi
        pop     edi
        ret
```

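While not wrong, this seems suboptimal. It copies the first and last dwords by
hand, rounds the destination up to a 4-byte boundary, and then runs `rep movsd`
for three more dwords. Either using `rep movsd` for the entire memcpy or
breaking it down into four 32-bit loads and stores would be simpler.
`-march=i386` does the former: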

```
foo:
        push    edi
        push    esi
        mov     esi, DWORD PTR [esp+16]
        mov     ecx, 4
        mov     edi, DWORD PTR [esp+12]
        rep movsd
        pop     esi
        pop     edi
        ret
```

and `-march=i586` does the latter:

```
foo:
        mov     edx, DWORD PTR [esp+8]
        mov     eax, DWORD PTR [esp+4]
        mov     ecx, DWORD PTR [edx]
        mov     DWORD PTR [eax], ecx
        mov     ecx, DWORD PTR [edx+4]
        mov     DWORD PTR [eax+4], ecx
        mov     ecx, DWORD PTR [edx+8]
        mov     DWORD PTR [eax+8], ecx
        mov     edx, DWORD PTR [edx+12]
        mov     DWORD PTR [eax+12], edx
        ret
```

either of which would seem to suit the i486 microarchitecture better than the
hybrid approach GCC currently takes for it.
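
For reference, here is a rough C model of what the `-march=i486` sequence
appears to do. It is my reading of the assembly above, not a description of
GCC's expander. The function name `copy16_i486_model` is made up for
illustration, and the inner `memcpy` calls only stand in for single 32-bit
loads and stores.

```c
#include <stddef.h>
#include <stdint.h>
#include <string.h>

/* Hypothetical model of the i486 output for a 16-byte copy. */
static void copy16_i486_model(unsigned char *dst, const unsigned char *src)
{
    uint32_t tmp;

    /* mov eax,[esi] / mov [ecx],eax: first dword, copied unconditionally. */
    memcpy(&tmp, src, 4);
    memcpy(dst, &tmp, 4);

    /* mov eax,[esi+12] / mov [ecx+12],eax: last dword, also unconditional. */
    memcpy(&tmp, src + 12, 4);
    memcpy(dst + 12, &tmp, 4);

    /* lea edi,[ecx+4] / and edi,-4: next 4-byte boundary strictly above dst. */
    uintptr_t aligned = ((uintptr_t)dst + 4) & ~(uintptr_t)3;

    /* sub ecx,edi / sub esi,ecx: advance src by the same amount, 1 to 4 bytes. */
    size_t skip = (size_t)(aligned - (uintptr_t)dst);

    /* add ecx,16 / shr ecx,2: (16 - skip) / 4, which is 3 for every skip. */
    size_t dwords = (16 - skip) >> 2;

    /* rep movsd: copy the middle with an aligned destination. */
    for (size_t i = 0; i < dwords; i++) {
        memcpy(&tmp, src + skip + 4 * i, 4);
        memcpy(dst + skip + 4 * i, &tmp, 4);
    }
}
```

Since `skip` is between 1 and 4, the count passed to `rep movsd` is always 3,
so the alignment arithmetic only decides which bytes are written twice. When
`x` is already 4-byte aligned, the three dwords after the first are each
stored once by `rep movsd` and the last dword is stored twice.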
