MOVE_MAX on x86* used to accept up to 16 bytes, even without SSE, which enabled inlining of small memmove by loading and then storing the entire range. After the "x86: Update piecewise move and store" r12-2666 change, memmove of more than 4 bytes would not be inlined in gimple_fold_builtin_memory_op, failing the expectations of a few tests.
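
As a rough illustration of the transformation at stake, here is a hand-written sketch, assuming a 32-bit non-SSE target where a word is 4 bytes. It is not GCC output, and the names move8_call and move8_inline are hypothetical. It shows why loading the whole range before storing any of it makes the inline form safe for overlapping operands:

```c
#include <stdint.h>
#include <string.h>

/* What the source asks for: an 8-byte memmove whose operands may
   overlap.  Without inlining, this is an out-of-line call that
   needs three arguments set up.  */
static void
move8_call (void *dst, const void *src)
{
  memmove (dst, src, 8);
}

/* Hand-written equivalent of the inline form (hypothetical, for
   illustration only): load the entire range into two 32-bit
   temporaries first, and only then store it, so that overlap
   between SRC and DST cannot clobber unread bytes.  */
static void
move8_inline (void *dst, const void *src)
{
  uint32_t lo, hi;

  memcpy (&lo, src, 4);
  memcpy (&hi, (const char *) src + 4, 4);
  memcpy (dst, &lo, 4);
  memcpy ((char *) dst + 4, &hi, 4);
}
```

Both functions should leave the destination in the same state, including when the two ranges overlap.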

I can see how lowering it for MOVE_MAX_PIECES can get us better codegen decisions overall, but surely inlining memmove with 2 32-bit loads and stores is better than an outline call that requires setting up 3 arguments. I suppose even 3 or 4 could do better. But maybe it is gimple_fold_builtin_memory_op that needs tweaking?

Anyhow, this patch raises MOVE_MAX back a little for non-SSE targets, while preserving the new value for MOVE_MAX_PIECES.

Bootstrapped on x86_64-linux-gnu. Also tested on ppc- and x86-vx7r2 with gcc-12.


for  gcc/ChangeLog

	* config/i386/i386.h (MOVE_MAX): Rename to...
	(MOVE_MAX_VEC): ... this.  Add NONVEC parameter, and use it as
	the last resort, instead of UNITS_PER_WORD.
	(MOVE_MAX): Reintroduce in terms of MOVE_MAX_VEC, with
	2*UNITS_PER_WORD.
	(MOVE_MAX_PIECES): Likewise, but with UNITS_PER_WORD.
---
 gcc/config/i386/i386.h |    6 ++++--
 1 file changed, 4 insertions(+), 2 deletions(-)

diff --git a/gcc/config/i386/i386.h b/gcc/config/i386/i386.h
index c7439f89bdf92..5293a332a969a 100644
--- a/gcc/config/i386/i386.h
+++ b/gcc/config/i386/i386.h
@@ -1801,7 +1801,9 @@ typedef struct ix86_args {
    is the number of bytes at a time which we can move efficiently.
    MOVE_MAX_PIECES defaults to MOVE_MAX.  */
 
-#define MOVE_MAX \
+#define MOVE_MAX MOVE_MAX_VEC (2 * UNITS_PER_WORD)
+#define MOVE_MAX_PIECES MOVE_MAX_VEC (UNITS_PER_WORD)
+#define MOVE_MAX_VEC(NONVEC) \
   ((TARGET_AVX512F \
     && (ix86_move_max == PVW_AVX512 \
         || ix86_store_max == PVW_AVX512)) \
@@ -1813,7 +1815,7 @@ typedef struct ix86_args {
    : ((TARGET_SSE2 \
        && TARGET_SSE_UNALIGNED_LOAD_OPTIMAL \
        && TARGET_SSE_UNALIGNED_STORE_OPTIMAL) \
-      ? 16 : UNITS_PER_WORD)))
+      ? 16 : (NONVEC))))
 
 /* STORE_MAX_PIECES is the number of bytes at a time that we can store
    efficiently.  Allow 16/32/64 bytes only if inter-unit move is enabled

-- 
Alexandre Oliva, happy hacker                https://FSFLA.org/blogs/lxo/
   Free Software Activist                       GNU Toolchain Engineer
Disinformation flourishes because many people care deeply about injustice
but very few check the facts.  Ask me about <https://stallmansupport.org>