For op_by_pieces operations between two areas of memory on non-strict-alignment targets, add -foverlap-op-by-pieces=[off|on|max-memset] to generate overlapping operations, which minimizes the number of operations. Stack pushes are excluded, since they must not overlap.
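To make the overlapping idea concrete, here is a minimal stand-alone sketch of how the pieces could be planned. It is not the code in gcc/expr.c: the names plan_pieces, widest_size_upto and smallest_size_for are hypothetical, and it assumes every power-of-two size up to MAX_SIZE is usable and that unaligned accesses are allowed.

```c
#include <assert.h>
#include <stddef.h>

/* One planned operation: SIZE bytes starting at byte OFFSET.  */
struct piece { size_t offset; size_t size; };

/* Widest power of two that is <= LIMIT (LIMIT must be >= 1).  */
static size_t widest_size_upto(size_t limit)
{
    size_t s = 1;
    while (s * 2 <= limit)
        s *= 2;
    return s;
}

/* Smallest power of two that is >= NEED (NEED must be >= 1).  */
static size_t smallest_size_for(size_t need)
{
    size_t s = 1;
    while (s < need)
        s *= 2;
    return s;
}

/* Plan operations covering LEN bytes with sizes no wider than MAX_SIZE.
   With OVERLAP set, the tail is covered by one operation that is moved
   back over bytes the previous operation already handled.  Returns the
   number of entries written to OUT, which has room for CAP entries.  */
static size_t plan_pieces(size_t len, size_t max_size, int overlap,
                          struct piece *out, size_t cap)
{
    assert(len > 0 && max_size > 0);
    size_t n = 0, offset = 0, remaining = len;
    size_t size = widest_size_upto(len < max_size ? len : max_size);

    for (;;) {
        /* Emit as many operations of the current size as fit.  */
        while (remaining >= size) {
            assert(n < cap);
            out[n].offset = offset;
            out[n].size = size;
            n++;
            offset += size;
            remaining -= size;
        }
        if (remaining == 0)
            return n;
        if (overlap) {
            /* Finish with the smallest size that covers the tail and
               step back by the excess so the access ends at LEN.  */
            size = smallest_size_for(remaining);
            size_t gap = size - remaining;
            offset -= gap;
            remaining += gap;
        } else {
            /* No overlap: fall back to the next narrower size.  */
            size = widest_size_upto(remaining);
        }
    }
}
```

Under these assumptions, planning 7 bytes with MAX_SIZE 8 gives two 4-byte operations at offsets 0 and 3 when overlap is enabled, versus three operations (4, 2 and 1 bytes) without it.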
When operating on LENGTH bytes of memory, -foverlap-op-by-pieces=on starts with the widest usable integer size, MAX_SIZE, for LENGTH bytes and finishes with the smallest usable integer size, MIN_SIZE, for the remaining bytes, where MAX_SIZE >= MIN_SIZE. If MIN_SIZE is greater than the number of remaining bytes, the last operation is performed on MIN_SIZE bytes and overlaps memory already covered by the previous operation.

For memset with a non-zero byte, -foverlap-op-by-pieces=max-memset generates an overlapping fill with MAX_SIZE if the number of remaining bytes is greater than one.

Code size is reduced slightly for glibc and GCC. The performance impact on SPEC CPU 2017 on Intel Xeon is within the noise range.

H.J. Lu (2):
  Generate overlapping operations between two areas of memory
  x86: Enable -foverlap-op-by-pieces by default

 gcc/common.opt                             |  19 +++
 gcc/config/i386/i386-options.c             |   3 +
 gcc/config/i386/i386.h                     |   2 +
 gcc/config/i386/x86-tune.def               |   6 +
 gcc/doc/invoke.texi                        |  14 ++
 gcc/expr.c                                 | 159 ++++++++++++++++-----
 gcc/testsuite/g++.dg/pr90773-1.h           |  14 ++
 gcc/testsuite/g++.dg/pr90773-1a.C          |  13 ++
 gcc/testsuite/g++.dg/pr90773-1b.C          |   5 +
 gcc/testsuite/g++.dg/pr90773-1c.C          |   5 +
 gcc/testsuite/g++.dg/pr90773-1d.C          |  19 +++
 gcc/testsuite/gcc.target/i386/pr90773-1.c  |  17 +++
 gcc/testsuite/gcc.target/i386/pr90773-10.c |  13 ++
 gcc/testsuite/gcc.target/i386/pr90773-11.c |  13 ++
 gcc/testsuite/gcc.target/i386/pr90773-12.c |  11 ++
 gcc/testsuite/gcc.target/i386/pr90773-13.c |  11 ++
 gcc/testsuite/gcc.target/i386/pr90773-2.c  |  20 +++
 gcc/testsuite/gcc.target/i386/pr90773-3.c  |  23 +++
 gcc/testsuite/gcc.target/i386/pr90773-4.c  |  13 ++
 gcc/testsuite/gcc.target/i386/pr90773-5.c  |  13 ++
 gcc/testsuite/gcc.target/i386/pr90773-6.c  |  11 ++
 gcc/testsuite/gcc.target/i386/pr90773-7.c  |  11 ++
 gcc/testsuite/gcc.target/i386/pr90773-8.c  |  13 ++
 gcc/testsuite/gcc.target/i386/pr90773-9.c  |  13 ++
 gcc/toplev.c                               |   8 ++
 25 files changed, 416 insertions(+), 33 deletions(-)
 create mode 100644 gcc/testsuite/g++.dg/pr90773-1.h
 create mode 100644 gcc/testsuite/g++.dg/pr90773-1a.C
 create mode 100644 gcc/testsuite/g++.dg/pr90773-1b.C
 create mode 100644 gcc/testsuite/g++.dg/pr90773-1c.C
 create mode 100644 gcc/testsuite/g++.dg/pr90773-1d.C
 create mode 100644 gcc/testsuite/gcc.target/i386/pr90773-1.c
 create mode 100644 gcc/testsuite/gcc.target/i386/pr90773-10.c
 create mode 100644 gcc/testsuite/gcc.target/i386/pr90773-11.c
 create mode 100644 gcc/testsuite/gcc.target/i386/pr90773-12.c
 create mode 100644 gcc/testsuite/gcc.target/i386/pr90773-13.c
 create mode 100644 gcc/testsuite/gcc.target/i386/pr90773-2.c
 create mode 100644 gcc/testsuite/gcc.target/i386/pr90773-3.c
 create mode 100644 gcc/testsuite/gcc.target/i386/pr90773-4.c
 create mode 100644 gcc/testsuite/gcc.target/i386/pr90773-5.c
 create mode 100644 gcc/testsuite/gcc.target/i386/pr90773-6.c
 create mode 100644 gcc/testsuite/gcc.target/i386/pr90773-7.c
 create mode 100644 gcc/testsuite/gcc.target/i386/pr90773-8.c
 create mode 100644 gcc/testsuite/gcc.target/i386/pr90773-9.c

-- 
2.30.2