For op_by_pieces operations between two areas of memory on targets that
do not require strict alignment, add
-foverlap-op-by-pieces=[off|on|max-memset] to generate overlapping
operations, which reduces the number of operations.  Stack pushes are
excluded, since they must not overlap.

When operating on LENGTH bytes of memory, -foverlap-op-by-pieces=on
starts with the widest usable integer size, MAX_SIZE, for LENGTH bytes
and finishes with the smallest usable integer size, MIN_SIZE, for the
remaining bytes, where MAX_SIZE >= MIN_SIZE.  If MIN_SIZE is greater
than the number of remaining bytes, the last operation covers MIN_SIZE
bytes and overlaps the memory covered by the previous operation.
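
The piece selection described above can be sketched as follows.  This
is only an illustration, not the code in gcc/expr.c: the names
plan_pieces, widest_size, smallest_cover and struct piece are
hypothetical, and the sketch assumes that every power-of-two size up to
a given limit is a usable integer size.

```c
#include <assert.h>
#include <stddef.h>

/* One operation: SIZE bytes starting at OFFSET.  Hypothetical type.  */
struct piece { size_t offset; size_t size; };

/* Widest power-of-two size that is <= LIMIT and <= LENGTH.
   Assumes LENGTH >= 1 and LIMIT >= 1.  */
static size_t widest_size(size_t length, size_t limit)
{
    size_t size = 1;
    while (size * 2 <= limit && size * 2 <= length)
        size *= 2;
    return size;
}

/* Smallest power-of-two size that covers REMAINING bytes.  */
static size_t smallest_cover(size_t remaining)
{
    size_t size = 1;
    while (size < remaining)
        size *= 2;
    return size;
}

/* Fill OUT with the operations for LENGTH bytes and return their count.
   With OVERLAP set, the tail is handled by one operation of MIN_SIZE
   bytes that ends exactly at LENGTH and may overlap the previous
   operation.  Without it, the size is halved until the tail fits.  */
static size_t plan_pieces(size_t length, size_t limit, int overlap,
                          struct piece *out)
{
    size_t n = 0, offset = 0;
    if (length == 0)
        return 0;

    size_t size = widest_size(length, limit);   /* MAX_SIZE */
    while (offset < length) {
        size_t remaining = length - offset;
        if (remaining >= size) {
            out[n++] = (struct piece){ offset, size };
            offset += size;
        } else if (overlap) {
            size_t min_size = smallest_cover(remaining);   /* MIN_SIZE */
            /* At least one MAX_SIZE piece precedes the tail, so
               stepping back from the end stays inside the buffer.  */
            assert(min_size <= length);
            out[n++] = (struct piece){ length - min_size, min_size };
            offset = length;
        } else {
            size /= 2;
        }
    }
    return n;
}
```

Under these assumptions, LENGTH = 15 with a limit of 8 yields two
operations with overlap enabled (8 bytes at offset 0 and 8 bytes at
offset 7), compared with four operations of 8, 4, 2 and 1 bytes
without it.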

For memset with a non-zero byte, -foverlap-op-by-pieces=max-memset
generates an overlapping fill with MAX_SIZE if the number of remaining
bytes is greater than one.
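
A rough model of the max-memset behaviour in plain C is shown below.
It is an assumption-laden sketch rather than what the compiler emits:
fill_max_overlap is a hypothetical helper, MAX_SIZE is fixed at 8
bytes, and lengths shorter than MAX_SIZE simply fall back to byte
stores.

```c
#include <stddef.h>
#include <stdint.h>
#include <string.h>

/* Fill DST[0..LENGTH) with BYTE and return the number of stores used.
   The tail is written with one more MAX_SIZE store that ends exactly
   at LENGTH when more than one byte remains.  */
static size_t fill_max_overlap(unsigned char *dst, size_t length,
                               unsigned char byte)
{
    enum { MAX_SIZE = 8 };
    /* BYTE replicated into every byte of a 64-bit value.  */
    uint64_t pattern = 0x0101010101010101ULL * byte;
    size_t stores = 0, offset = 0;

    /* Non-overlapping MAX_SIZE stores while a full piece fits.  */
    while (length - offset >= MAX_SIZE) {
        memcpy(dst + offset, &pattern, MAX_SIZE);
        offset += MAX_SIZE;
        stores++;
    }

    size_t remaining = length - offset;
    if (remaining > 1 && offset > 0) {
        /* Overlapping MAX_SIZE store covering the last MAX_SIZE bytes.  */
        memcpy(dst + length - MAX_SIZE, &pattern, MAX_SIZE);
        stores++;
    } else {
        /* Zero or one byte left, or a buffer shorter than MAX_SIZE.  */
        for (; offset < length; offset++) {
            dst[offset] = byte;
            stores++;
        }
    }
    return stores;
}
```

In this model a 13-byte fill takes two 8-byte stores, at offsets 0 and
5, while a 9-byte fill takes one 8-byte store followed by a single
byte store.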

Code size is reduced slightly for glibc and GCC.  The performance
impact on SPEC CPU 2017 on Intel Xeon is within the noise range.

H.J. Lu (2):
  Generate overlapping operations between two areas of memory
  x86: Enable -foverlap-op-by-pieces by default

 gcc/common.opt                             |  19 +++
 gcc/config/i386/i386-options.c             |   3 +
 gcc/config/i386/i386.h                     |   2 +
 gcc/config/i386/x86-tune.def               |   6 +
 gcc/doc/invoke.texi                        |  14 ++
 gcc/expr.c                                 | 159 ++++++++++++++++-----
 gcc/testsuite/g++.dg/pr90773-1.h           |  14 ++
 gcc/testsuite/g++.dg/pr90773-1a.C          |  13 ++
 gcc/testsuite/g++.dg/pr90773-1b.C          |   5 +
 gcc/testsuite/g++.dg/pr90773-1c.C          |   5 +
 gcc/testsuite/g++.dg/pr90773-1d.C          |  19 +++
 gcc/testsuite/gcc.target/i386/pr90773-1.c  |  17 +++
 gcc/testsuite/gcc.target/i386/pr90773-10.c |  13 ++
 gcc/testsuite/gcc.target/i386/pr90773-11.c |  13 ++
 gcc/testsuite/gcc.target/i386/pr90773-12.c |  11 ++
 gcc/testsuite/gcc.target/i386/pr90773-13.c |  11 ++
 gcc/testsuite/gcc.target/i386/pr90773-2.c  |  20 +++
 gcc/testsuite/gcc.target/i386/pr90773-3.c  |  23 +++
 gcc/testsuite/gcc.target/i386/pr90773-4.c  |  13 ++
 gcc/testsuite/gcc.target/i386/pr90773-5.c  |  13 ++
 gcc/testsuite/gcc.target/i386/pr90773-6.c  |  11 ++
 gcc/testsuite/gcc.target/i386/pr90773-7.c  |  11 ++
 gcc/testsuite/gcc.target/i386/pr90773-8.c  |  13 ++
 gcc/testsuite/gcc.target/i386/pr90773-9.c  |  13 ++
 gcc/toplev.c                               |   8 ++
 25 files changed, 416 insertions(+), 33 deletions(-)
 create mode 100644 gcc/testsuite/g++.dg/pr90773-1.h
 create mode 100644 gcc/testsuite/g++.dg/pr90773-1a.C
 create mode 100644 gcc/testsuite/g++.dg/pr90773-1b.C
 create mode 100644 gcc/testsuite/g++.dg/pr90773-1c.C
 create mode 100644 gcc/testsuite/g++.dg/pr90773-1d.C
 create mode 100644 gcc/testsuite/gcc.target/i386/pr90773-1.c
 create mode 100644 gcc/testsuite/gcc.target/i386/pr90773-10.c
 create mode 100644 gcc/testsuite/gcc.target/i386/pr90773-11.c
 create mode 100644 gcc/testsuite/gcc.target/i386/pr90773-12.c
 create mode 100644 gcc/testsuite/gcc.target/i386/pr90773-13.c
 create mode 100644 gcc/testsuite/gcc.target/i386/pr90773-2.c
 create mode 100644 gcc/testsuite/gcc.target/i386/pr90773-3.c
 create mode 100644 gcc/testsuite/gcc.target/i386/pr90773-4.c
 create mode 100644 gcc/testsuite/gcc.target/i386/pr90773-5.c
 create mode 100644 gcc/testsuite/gcc.target/i386/pr90773-6.c
 create mode 100644 gcc/testsuite/gcc.target/i386/pr90773-7.c
 create mode 100644 gcc/testsuite/gcc.target/i386/pr90773-8.c
 create mode 100644 gcc/testsuite/gcc.target/i386/pr90773-9.c

-- 
2.30.2
