https://gcc.gnu.org/bugzilla/show_bug.cgi?id=70140

            Bug ID: 70140
           Summary: Inefficient expansion of __builtin_mempcpy
           Product: gcc
           Version: 6.0
            Status: UNCONFIRMED
          Severity: normal
          Priority: P3
         Component: middle-end
          Assignee: unassigned at gcc dot gnu.org
          Reporter: wdijkstr at arm dot com
  Target Milestone: ---

The expansion of __builtin_mempcpy is inefficient on many targets (e.g. AArch64,
ARM, PPC) because it does not use the same expansion options that memcpy uses
in builtins.c. As a result, GCC 6 emits a tail call to mempcpy for
__builtin_mempcpy(x, y, 32):

PPC:
   0:   38 a0 00 20     li      r5,32
   4:   48 00 00 00     b       4 <foo+0x4>
                        4: R_PPC_REL24  mempcpy
   8:   60 00 00 00     nop
   c:   60 42 00 00     ori     r2,r2,0

AArch64:
        mov     x2, 32
        b       mempcpy
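
For reference, a test case along the following lines should reproduce the
listings above. This is a sketch: only the function name foo appears in the
report (in the PPC relocation); the signature and the optimization level
(e.g. -O2) are assumptions.

```c
/* Assumed reproducer: the parameter types are a guess, the name foo is
   taken from the PPC listing.  */
void *foo(void *x, const void *y)
{
    /* Constant size of 32 bytes, as in the report.  */
    return __builtin_mempcpy(x, y, 32);
}
```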

A second issue is that GCC always calls mempcpy, which is unsupported or not
implemented efficiently in many (if not most) library/target combinations.
GLIBC has only 3 targets that implement an optimized mempcpy, so by default it
currently inlines mempcpy as a call to memcpy unless a target explicitly
disables this. It seems better to do this in GCC so that it works for all
libraries.
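
The mapping is straightforward because mempcpy differs from memcpy only in
its return value: it returns the address just past the last byte written.
A minimal sketch of the equivalence in C, where mempcpy_via_memcpy is a
hypothetical name used for illustration and not an existing GCC or GLIBC
function:

```c
#include <stddef.h>
#include <string.h>

/* Hypothetical helper: mempcpy expressed through memcpy.
   memcpy returns dst, so adding n gives the end pointer that
   mempcpy would return.  */
static void *mempcpy_via_memcpy(void *dst, const void *src, size_t n)
{
    return (char *)memcpy(dst, src, n) + n;
}
```

With this form the copy itself can go through whatever expansion memcpy
already gets, and the only extra work is one pointer addition.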
