https://gcc.gnu.org/bugzilla/show_bug.cgi?id=102877
Bug ID: 102877
Summary: missed optimization: memcpy produces lots more asm than otherwise
Product: gcc
Version: 11.2.1
Status: UNCONFIRMED
Severity: normal
Priority: P3
Component: c++
Assignee: unassigned at gcc dot gnu.org
Reporter: jengelh at inai dot de
Target Milestone: ---

Input (C++)
===========
struct GLOBCNT {
        unsigned char ab[6];
};
unsigned long long gc_to_num(GLOBCNT gc)
{
        unsigned long long value;
        auto v = reinterpret_cast<unsigned char *>(&value);
        v[0] = 0;
        v[1] = 0;
#ifdef WITH_MEMCPY
        __builtin_memcpy(v + 2, gc.ab, 6);
#else
        v[2] = gc.ab[0];
        v[3] = gc.ab[1];
        v[4] = gc.ab[2];
        v[5] = gc.ab[3];
        v[6] = gc.ab[4];
        v[7] = gc.ab[5];
#endif
        if (__BYTE_ORDER__ == __ORDER_LITTLE_ENDIAN__)
                value = __builtin_bswap64(value);
        return value;
}

I hope this is UB-free.

Observed behavior
=================
The use of memcpy/__builtin_memcpy produces a function with 28
instructions/0x5c bytes long.

► g++ -O2 -c t3.cpp -Wall -DWITH_MEMCPY -v
Target: x86_64-suse-linux
gcc version 11.2.1 20210816 [revision 056e324ce46a7924b5cf10f61010cf9dd2ca10e9] (SUSE Linux)
► objdump -Mintel -d t3.o
0000000000000000 <_Z9gc_to_num7GLOBCNT>:
   0:   89 f8                   mov    eax,edi
   2:   89 f9                   mov    ecx,edi
   4:   89 fa                   mov    edx,edi
   6:   44 0f b6 c7             movzx  r8d,dil
   a:   c1 e9 10                shr    ecx,0x10
   d:   0f b6 f4                movzx  esi,ah
...
  5c:   c3                      ret

Expected behavior
=================
► g++ -O2 -c t3.cpp -Wall -UWITH_MEMCPY
► objdump -Mintel -d t3.o
0000000000000000 <_Z9gc_to_num7GLOBCNT>:
   0:   0f b7 c7                movzx  eax,di
   3:   48 c1 ef 10             shr    rdi,0x10
   7:   48 c1 e7 20             shl    rdi,0x20
   b:   48 c1 e0 10             shl    rax,0x10
   f:   48 09 f8                or     rax,rdi
  12:   48 0f c8                bswap  rax
  15:   c3                      ret

Other notes
===========
In a twist, clang 13.0.0 produces the short version for memcpy (even shorter
than gcc), and produces a long version for non-memcpy case (even longer than
gcc).

► clang++ -O2 -c t3.cpp -Wall -DWITH_MEMCPY; objdump -Mintel -d t3.o
0000000000000000 <_Z9gc_to_num7GLOBCNT>:
   0:   48 89 f8                mov    rax,rdi
   3:   48 c1 e0 10             shl    rax,0x10
   7:   48 0f c8                bswap  rax
   a:   c3                      ret

► clang++ -O2 -c t3.cpp -Wall -UWITH_MEMCPY; objdump -Mintel -d t3.o
0000000000000000 <_Z9gc_to_num7GLOBCNT>:
   0:   48 89 f8                mov    rax,rdi
   3:   48 b9 ff ff ff ff ff    movabs rcx,0xffffffffffff
   a:   ff 00 00
...
  6c:   c3                      ret
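
Since the report compares two preprocessor variants of the same function, it may help to confirm that both compute the same value before comparing their code size. The following is a minimal sketch for that, not part of the original report. The names gc_to_num_memcpy, gc_to_num_bytewise, gc_to_num_shift and check_variants are hypothetical, and gc_to_num_shift is an assumed endian-independent reference added only for comparison. It assumes a GCC-compatible compiler (for the builtins and __BYTE_ORDER__) and an 8-byte unsigned long long.

```cpp
#include <cassert>

struct GLOBCNT {
    unsigned char ab[6];
};

// Variant from the report as built with -DWITH_MEMCPY.
unsigned long long gc_to_num_memcpy(GLOBCNT gc)
{
    unsigned long long value;
    auto v = reinterpret_cast<unsigned char *>(&value);
    v[0] = 0;
    v[1] = 0;
    __builtin_memcpy(v + 2, gc.ab, 6);
    if (__BYTE_ORDER__ == __ORDER_LITTLE_ENDIAN__)
        value = __builtin_bswap64(value);
    return value;
}

// Variant from the report as built with -UWITH_MEMCPY.
unsigned long long gc_to_num_bytewise(GLOBCNT gc)
{
    unsigned long long value;
    auto v = reinterpret_cast<unsigned char *>(&value);
    v[0] = 0;
    v[1] = 0;
    v[2] = gc.ab[0];
    v[3] = gc.ab[1];
    v[4] = gc.ab[2];
    v[5] = gc.ab[3];
    v[6] = gc.ab[4];
    v[7] = gc.ab[5];
    if (__BYTE_ORDER__ == __ORDER_LITTLE_ENDIAN__)
        value = __builtin_bswap64(value);
    return value;
}

// Hypothetical reference (not in the report): fold the six bytes into a
// 48-bit big-endian number using shifts only, with no dependence on the
// host byte order.
unsigned long long gc_to_num_shift(GLOBCNT gc)
{
    unsigned long long value = 0;
    for (unsigned char b : gc.ab)
        value = (value << 8) | b;
    return value;
}

// Self-check: all three variants should agree on a sample input.
void check_variants()
{
    const GLOBCNT gc{{0x01, 0x02, 0x03, 0x04, 0x05, 0x06}};
    assert(gc_to_num_memcpy(gc) == 0x010203040506ULL);
    assert(gc_to_num_bytewise(gc) == 0x010203040506ULL);
    assert(gc_to_num_shift(gc) == 0x010203040506ULL);
}
```

Calling check_variants() from a small main() and building with -O2 should pass silently if the variants agree. This only checks functional equivalence on one input and says nothing about which form the optimizer handles better.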