https://gcc.gnu.org/bugzilla/show_bug.cgi?id=115690
Bug ID: 115690
Summary: Strange codegen for small fixed-size `memcpy` when targeting `-march=i486`
Product: gcc
Version: 14.1.0
Status: UNCONFIRMED
Severity: normal
Priority: P3
Component: rtl-optimization
Assignee: unassigned at gcc dot gnu.org
Reporter: arcata at gmail dot com
Target Milestone: ---

Given the following C code:

```
void *memcpy(void *a, const void *b, unsigned long c);

void foo(unsigned *x, unsigned *y) {
    memcpy(x, y, 16);
}
```

Using gcc 14.1, `gcc -m32 -march=i486 -O2` produces the following assembly:

```
foo:
        push    edi
        push    esi
        mov     ecx, DWORD PTR [esp+12]
        mov     esi, DWORD PTR [esp+16]
        mov     eax, DWORD PTR [esi]
        mov     DWORD PTR [ecx], eax
        mov     eax, DWORD PTR [esi+12]
        mov     DWORD PTR [ecx+12], eax
        lea     edi, [ecx+4]
        and     edi, -4
        sub     ecx, edi
        sub     esi, ecx
        add     ecx, 16
        shr     ecx, 2
        rep movsd
        pop     esi
        pop     edi
        ret
```

While not wrong, this seems suboptimal compared to either using `rep movsd` to do the entire memcpy or breaking it down into four 32-bit loads and stores.

`-march=i386` does the former:

```
foo:
        push    edi
        push    esi
        mov     esi, DWORD PTR [esp+16]
        mov     ecx, 4
        mov     edi, DWORD PTR [esp+12]
        rep movsd
        pop     esi
        pop     edi
        ret
```

and `-march=i586` does the latter:

```
foo:
        mov     edx, DWORD PTR [esp+8]
        mov     eax, DWORD PTR [esp+4]
        mov     ecx, DWORD PTR [edx]
        mov     DWORD PTR [eax], ecx
        mov     ecx, DWORD PTR [edx+4]
        mov     DWORD PTR [eax+4], ecx
        mov     ecx, DWORD PTR [edx+8]
        mov     DWORD PTR [eax+8], ecx
        mov     edx, DWORD PTR [edx+12]
        mov     DWORD PTR [eax+12], edx
        ret
```

either of which seems like it would better suit the i486 microarchitecture than the hybrid approach it seems to be taking.
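
For reference, here is a rough C model of what the `-march=i486` sequence appears to compute, based only on reading the assembly above. It is a sketch, not anything taken from GCC's sources: the names `copy16_hybrid` and `copy16_selfcheck` are hypothetical, and each `memcpy` call stands in for the instruction group named in the comment beside it.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>
#include <string.h>

/* Hypothetical model of the i486 output for memcpy(dst, src, 16). */
static void copy16_hybrid(unsigned char *dst, const unsigned char *src)
{
    /* mov eax,[esi] / mov [ecx],eax: copy the first dword unaligned. */
    memcpy(dst, src, 4);
    /* mov eax,[esi+12] / mov [ecx+12],eax: copy the last dword unaligned. */
    memcpy(dst + 12, src + 12, 4);

    /* lea edi,[ecx+4] / and edi,-4: next 4-byte boundary above dst. */
    uintptr_t aligned = ((uintptr_t)dst + 4) & ~(uintptr_t)3;
    /* Distance from dst to that boundary: 4, 3, 2 or 1 bytes. */
    size_t skip = (size_t)(aligned - (uintptr_t)dst);

    /* add ecx,16 / shr ecx,2: dword count for the aligned middle part. */
    size_t dwords = (16 - skip) >> 2;

    /* rep movsd: copy the middle part to the aligned destination. */
    memcpy(dst + skip, src + skip, dwords * 4);
}

/* Runs the model at all four destination misalignments; returns 1 on success. */
static int copy16_selfcheck(void)
{
    unsigned char src[16];
    for (int i = 0; i < 16; i++)
        src[i] = (unsigned char)(i + 1);

    for (size_t off = 0; off < 4; off++) {
        unsigned char buf[24] = {0};
        copy16_hybrid(buf + off, src);
        assert(memcmp(buf + off, src, 16) == 0);
    }
    return 1;
}
```

In this model `dwords` works out to 3 for every possible `skip` (1 through 4), so the count computed at run time by `sub`/`add`/`shr` is effectively a constant, and the aligned `rep movsd` always overlaps the head and tail stores. If that reading is right, it supports the impression that the alignment prologue buys little for a fixed 16-byte copy.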