https://gcc.gnu.org/bugzilla/show_bug.cgi?id=70140
            Bug ID: 70140
           Summary: Inefficient expansion of __builtin_mempcpy
           Product: gcc
           Version: 6.0
            Status: UNCONFIRMED
          Severity: normal
          Priority: P3
         Component: middle-end
          Assignee: unassigned at gcc dot gnu.org
          Reporter: wdijkstr at arm dot com
  Target Milestone: ---

The expansion of __builtin_mempcpy is inefficient on many targets (e.g.
AArch64, ARM, PPC). The cause is that builtins.c does not use the same
expansion options for mempcpy that it uses for memcpy. As a result, GCC 6
emits a library call for __builtin_mempcpy(x, y, 32) instead of expanding
the copy inline:

PPC:
   0:   38 a0 00 20     li      r5,32
   4:   48 00 00 00     b       4 <foo+0x4>
                        4: R_PPC_REL24  mempcpy
   8:   60 00 00 00     nop
   c:   60 42 00 00     ori     r2,r2,0

AArch64:
        mov     x2, 32
        b       mempcpy

A second issue is that GCC always calls mempcpy. mempcpy is not supported, or
not implemented efficiently, in many (if not most) library/target
combinations. GLIBC has only 3 targets that implement an optimized mempcpy,
so GLIBC currently inlines mempcpy into a memcpy call by default unless a
target explicitly disables this. It seems better to do this in GCC so that it
works for all libraries.
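
For illustration, the rewrite suggested above (expressing mempcpy in terms of memcpy) could look roughly like the sketch below. The names mempcpy_via_memcpy and foo_expanded are hypothetical and do not appear in GCC or GLIBC, and the signature of foo_expanded is an assumption, since the report only shows the call __builtin_mempcpy(x, y, 32) inside a function named foo.

```c
#include <stddef.h>
#include <string.h>

/* Hypothetical helper, not part of GCC or GLIBC: mempcpy written in
   terms of memcpy. memcpy returns dst, so adding n yields the pointer
   one past the last byte written, which is what mempcpy returns. */
static inline void *mempcpy_via_memcpy(void *dst, const void *src, size_t n)
{
    return (char *)memcpy(dst, src, n) + n;
}

/* Assumed shape of the reproducer from the report, rewritten to use the
   helper. With a constant size of 32, the compiler is free to apply its
   usual memcpy expansion here rather than emit a library call. */
void *foo_expanded(void *x, const void *y)
{
    return mempcpy_via_memcpy(x, y, 32);
}
```

Written this way, the code depends only on memcpy, so it also works with C libraries that do not provide mempcpy at all.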