GCC does not currently do inline expansion of overlapping memmove, nor does it have an expansion pattern specific to non-overlapping memcpy, so I plan to add patterns and support to implement this in the GCC 10 timeframe.
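To make the overlapping case concrete, here is a minimal C sketch of what an inline expansion of a small memmove() amounts to. It is not GCC code; the helper name move16_inline and the fixed 16-byte size are assumptions chosen for illustration. The point is that every load finishes before any store, so overlapping arguments are safe without a runtime overlap test.

```c
#include <stdint.h>
#include <string.h>

/* Hypothetical helper (not part of GCC or glibc): move 16 bytes from
   src to dst, where the two regions may overlap.  The whole block is
   read into temporaries before anything is written back, so no store
   can clobber a byte that still has to be loaded.  */
static void move16_inline(void *dst, const void *src)
{
    uint64_t lo, hi;

    /* Load phase: read the full source block.  The fixed-size memcpy
       calls into locals are typically compiled to plain loads.  */
    memcpy(&lo, src, 8);
    memcpy(&hi, (const char *)src + 8, 8);

    /* Store phase: only now write the destination.  */
    memcpy(dst, &lo, 8);
    memcpy((char *)dst + 8, &hi, 8);
}
```

For a block this small the result should match memmove(dst, src, 16) for any overlap, without the cost of a library call.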
At present memcpy and memmove are kind of entangled. Here's the current state of play:

  memcpy               -> expand with movmem pattern
  memmove (no overlap) -> transform to memcpy -> expand with movmem pattern
  memmove (overlap)    -> remains memmove -> glibc call

There are several problems with this:

If the memmove() arguments are in fact overlapping, the expansion is not used at all. That makes no sense, and for small blocks it costs performance because we call a library function instead of expanding memmove() inline.

There is currently no way to have a separate memcpy pattern. I know from experience with expansion of memcmp on Power that lengths on the order of hundreds of bytes are needed before the function call overhead is overcome by optimized glibc code. But we need the memcpy guarantee of non-overlapping arguments to make that happen, as we don't want to do a runtime overlap test.

There is some analysis in gimple_fold_builtin_memory_op() that determines when a memmove() call cannot have an overlap between its arguments and converts it into memcpy(), which is nice. However, in builtins.c, expand_builtin_memmove() does not actually do the expansion using the movmem pattern. This is why a memmove() call that cannot be converted to memcpy() by gimple_fold_builtin_memory_op() is not expanded, and we call glibc memmove() instead. Only expand_builtin_memcpy() actually uses the movmem pattern.

So here's my proposed set of fixes:

* Add new optab entries for the nonoverlapping_memcpy and overlapping_memmove cases.
* The movmem optab will continue to be treated exactly as it is today, so that ports that might have a broken movmem pattern that doesn't actually handle the overlap cases will continue to work.
* expand_builtin_memmove() needs to actually do the memmove() expansion.
* expand_builtin_memcpy() needs to use the new non-overlapping copy pattern (nonoverlapping_memcpy). Currently this happens down in emit_block_move_via_movmem(), so some functions might need to be renamed.
* Ports can then add the new overlapping move and non-overlapping copy expanders and will get better expansion of both memmove and memcpy.

I'd be interested in any comments about pieces of this machinery that need to work a certain way, or other related issues that should be addressed in between expand_builtin_memcpy() and emit_block_move_via_movmem().

Thanks!
   Aaron

--
Aaron Sawdey, Ph.D. acsaw...@linux.vnet.ibm.com
050-2/C113 (507) 253-7520 home: 507/263-0782
IBM Linux Technology Center - PPC Toolchain