Re: [PATCH 1/2] Generate overlapping operations between two areas of memory
On Fri, Apr 23, 2021 at 1:35 AM H.J. Lu via Gcc-patches wrote:
>
> For op_by_pieces operations between two areas of memory on non-strict
> alignment targets, add -foverlap-op-by-pieces=[off|on|max-memset] to
> generate overlapping operations to minimize the number of operations,
> unless the operation is a stack push, which must not overlap.
>
> When operating on LENGTH bytes of memory, -foverlap-op-by-pieces=on
> starts with the widest usable integer size, MAX_SIZE, for LENGTH bytes
> and finishes with the smallest usable integer size, MIN_SIZE, for the
> remaining bytes where MAX_SIZE >= MIN_SIZE.  If MIN_SIZE > the remaining
> bytes, the last operation is performed on MIN_SIZE bytes of overlapping
> memory from the previous operation.
>
> For memset with a non-zero byte, -foverlap-op-by-pieces=max-memset
> generates an overlapping fill with MAX_SIZE if the number of remaining
> bytes is greater than one.
>
> Tested on Linux/x86-64 with both -foverlap-op-by-pieces enabled and
> disabled by default.

Neither the user documentation nor the patch description tells me
what "generate overlapping operations" does.

I _suspect_ it's doing an offset-adjusted read/write of the last piece
of a memory region to avoid doing more than one smaller operation.
Thus for a region of size 7 and 4-byte granular ops you'd do operations
at offset 0 and 3 rather than one at 0, a two-byte at offset 4 and a
one-byte at offset 6.

When the tail is of power-of-two size you still generate
non-overlapping ops?

For memmove there's a correctness issue, so you have to make sure to
first load the last two ops before performing the stores, which
increases register pressure.

I'm not sure we want a -f option to control this - not all targets will
be able to support this.  So I'd use a target hook, or rather extend the
existing use_by_pieces_infrastructure_p hook with an alternate return
(some flags bitmask I guess).  We do have one extra target hook,
compare_by_pieces_branch_ratio, so going by that, using an alternate
hook might also be OK.  Adding a -m option in targets that want this
user-controllable would be OK of course.

Richard.

> gcc/
>
>         PR middle-end/90773
>         * common.opt (-foverlap-op-by-pieces): New.
>         * expr.c (by_pieces_ninsns): If -foverlap-op-by-pieces is enabled,
>         round up size and alignment to the widest integer mode for maximum
>         size.
>         (op_by_pieces_d): Add get_usable_mode, m_push and
>         m_non_zero_memset.
>         (op_by_pieces_d::op_by_pieces_d): Add 2 bool arguments to
>         initialize m_push and m_non_zero_memset.
>         (op_by_pieces_d::get_usable_mode): New.
>         (op_by_pieces_d::run): Use get_usable_mode to get the largest
>         usable integer mode and generate overlapping operations for
>         -foverlap-op-by-pieces.
>         (PUSHG_P): New.
>         (move_by_pieces_d::move_by_pieces_d): Updated for op_by_pieces_d
>         change.
>         (store_by_pieces_d::store_by_pieces_d): Likewise.
>         (clear_by_pieces): Likewise.
>         * toplev.c (process_options): Issue an error when
>         -foverlap-op-by-pieces is used for strict alignment target.
>         * doc/invoke.texi: Document -foverlap-op-by-pieces.
>
> gcc/testsuite/
>
>         PR middle-end/90773
>         * g++.dg/pr90773-1.h: New test.
>         * g++.dg/pr90773-1a.C: Likewise.
>         * g++.dg/pr90773-1b.C: Likewise.
>         * g++.dg/pr90773-1c.C: Likewise.
>         * g++.dg/pr90773-1d.C: Likewise.
>         * gcc.target/i386/pr90773-1.c: Likewise.
>         * gcc.target/i386/pr90773-2.c: Likewise.
>         * gcc.target/i386/pr90773-3.c: Likewise.
>         * gcc.target/i386/pr90773-4.c: Likewise.
>         * gcc.target/i386/pr90773-5.c: Likewise.
>         * gcc.target/i386/pr90773-6.c: Likewise.
>         * gcc.target/i386/pr90773-7.c: Likewise.
>         * gcc.target/i386/pr90773-8.c: Likewise.
>         * gcc.target/i386/pr90773-9.c: Likewise.
>         * gcc.target/i386/pr90773-10.c: Likewise.
>         * gcc.target/i386/pr90773-11.c: Likewise.
> ---
>  gcc/common.opt                             |  19 +++
>  gcc/doc/invoke.texi                        |  14 ++
>  gcc/expr.c                                 | 159 -
>  gcc/testsuite/g++.dg/pr90773-1.h           |  14 ++
>  gcc/testsuite/g++.dg/pr90773-1a.C          |  13 ++
>  gcc/testsuite/g++.dg/pr90773-1b.C          |   5 +
>  gcc/testsuite/g++.dg/pr90773-1c.C          |   5 +
>  gcc/testsuite/g++.dg/pr90773-1d.C          |  19 +++
>  gcc/testsuite/gcc.target/i386/pr90773-1.c  |  17 +++
>  gcc/testsuite/gcc.target/i386/pr90773-10.c |  13 ++
>  gcc/testsuite/gcc.target/i386/pr90773-11.c |  13 ++
>  gcc/testsuite/gcc.target/i386/pr90773-2.c  |  20 +++
>  gcc/testsuite/gcc.target/i386/pr90773-3.c  |  23 +++
>  gcc/testsuite/gcc.target/i386/pr90773-4.c  |  13 ++
>  gcc/testsuite/gcc.target/i386/pr90773-5.c  |  13 ++
>  gcc/testsuite/gcc.target/i386/pr90773-6.c  |  11 ++
>
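
To make the reading suggested in the reply concrete (a 7-byte region handled by two 4-byte operations at offsets 0 and 3, with both loads issued before either store), here is a minimal C sketch. It is not taken from the patch: the function names `copy_4_to_8_overlapping` and `count_pieces_no_overlap` are hypothetical, the 4-byte piece size is an assumption chosen to match the example in the reply, and the code only illustrates the idea at source level rather than what expr.c would emit as RTL.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>
#include <string.h>

/* Hypothetical helper: copy N bytes (4 <= N <= 8) using exactly two
   4-byte operations.  The second piece starts at offset N - 4, so it
   overlaps the first piece by 8 - N bytes.  Both pieces are loaded
   before either is stored, so the result is still correct when SRC and
   DST overlap (the memmove case), at the cost of holding two values
   live at once.  */
static void
copy_4_to_8_overlapping (void *dst, const void *src, size_t n)
{
  uint32_t head, tail;

  assert (n >= 4 && n <= 8);

  /* Loads first: bytes [0, 4) and bytes [n - 4, n).  */
  memcpy (&head, src, 4);
  memcpy (&tail, (const unsigned char *) src + n - 4, 4);

  /* Stores second.  */
  memcpy (dst, &head, 4);
  memcpy ((unsigned char *) dst + n - 4, &tail, 4);
}

/* Hypothetical helper: number of power-of-two sized pieces needed for
   N bytes when pieces must not overlap and the widest piece is
   MAX_SIZE bytes (MAX_SIZE is assumed to be a power of two).  */
static unsigned
count_pieces_no_overlap (size_t n, size_t max_size)
{
  unsigned count = 0;

  for (size_t size = max_size; size >= 1; size /= 2)
    while (n >= size)
      {
        n -= size;
        count++;
      }
  return count;
}
```

For a 7-byte region, `count_pieces_no_overlap (7, 4)` gives three pieces (4 + 2 + 1), whereas `copy_4_to_8_overlapping` covers the same region with two 4-byte pieces at offsets 0 and 3. Each small `memcpy` here copies between the buffer and a local variable, so the sketch itself never calls `memcpy` on overlapping objects.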
[PATCH 1/2] Generate overlapping operations between two areas of memory
For op_by_pieces operations between two areas of memory on non-strict
alignment targets, add -foverlap-op-by-pieces=[off|on|max-memset] to
generate overlapping operations to minimize the number of operations,
unless the operation is a stack push, which must not overlap.

When operating on LENGTH bytes of memory, -foverlap-op-by-pieces=on
starts with the widest usable integer size, MAX_SIZE, for LENGTH bytes
and finishes with the smallest usable integer size, MIN_SIZE, for the
remaining bytes where MAX_SIZE >= MIN_SIZE.  If MIN_SIZE > the remaining
bytes, the last operation is performed on MIN_SIZE bytes of overlapping
memory from the previous operation.

For memset with a non-zero byte, -foverlap-op-by-pieces=max-memset
generates an overlapping fill with MAX_SIZE if the number of remaining
bytes is greater than one.

Tested on Linux/x86-64 with both -foverlap-op-by-pieces enabled and
disabled by default.

gcc/

        PR middle-end/90773
        * common.opt (-foverlap-op-by-pieces): New.
        * expr.c (by_pieces_ninsns): If -foverlap-op-by-pieces is enabled,
        round up size and alignment to the widest integer mode for maximum
        size.
        (op_by_pieces_d): Add get_usable_mode, m_push and
        m_non_zero_memset.
        (op_by_pieces_d::op_by_pieces_d): Add 2 bool arguments to
        initialize m_push and m_non_zero_memset.
        (op_by_pieces_d::get_usable_mode): New.
        (op_by_pieces_d::run): Use get_usable_mode to get the largest
        usable integer mode and generate overlapping operations for
        -foverlap-op-by-pieces.
        (PUSHG_P): New.
        (move_by_pieces_d::move_by_pieces_d): Updated for op_by_pieces_d
        change.
        (store_by_pieces_d::store_by_pieces_d): Likewise.
        (clear_by_pieces): Likewise.
        * toplev.c (process_options): Issue an error when
        -foverlap-op-by-pieces is used for strict alignment target.
        * doc/invoke.texi: Document -foverlap-op-by-pieces.

gcc/testsuite/

        PR middle-end/90773
        * g++.dg/pr90773-1.h: New test.
        * g++.dg/pr90773-1a.C: Likewise.
        * g++.dg/pr90773-1b.C: Likewise.
        * g++.dg/pr90773-1c.C: Likewise.
        * g++.dg/pr90773-1d.C: Likewise.
        * gcc.target/i386/pr90773-1.c: Likewise.
        * gcc.target/i386/pr90773-2.c: Likewise.
        * gcc.target/i386/pr90773-3.c: Likewise.
        * gcc.target/i386/pr90773-4.c: Likewise.
        * gcc.target/i386/pr90773-5.c: Likewise.
        * gcc.target/i386/pr90773-6.c: Likewise.
        * gcc.target/i386/pr90773-7.c: Likewise.
        * gcc.target/i386/pr90773-8.c: Likewise.
        * gcc.target/i386/pr90773-9.c: Likewise.
        * gcc.target/i386/pr90773-10.c: Likewise.
        * gcc.target/i386/pr90773-11.c: Likewise.
---
 gcc/common.opt                             |  19 +++
 gcc/doc/invoke.texi                        |  14 ++
 gcc/expr.c                                 | 159 -
 gcc/testsuite/g++.dg/pr90773-1.h           |  14 ++
 gcc/testsuite/g++.dg/pr90773-1a.C          |  13 ++
 gcc/testsuite/g++.dg/pr90773-1b.C          |   5 +
 gcc/testsuite/g++.dg/pr90773-1c.C          |   5 +
 gcc/testsuite/g++.dg/pr90773-1d.C          |  19 +++
 gcc/testsuite/gcc.target/i386/pr90773-1.c  |  17 +++
 gcc/testsuite/gcc.target/i386/pr90773-10.c |  13 ++
 gcc/testsuite/gcc.target/i386/pr90773-11.c |  13 ++
 gcc/testsuite/gcc.target/i386/pr90773-2.c  |  20 +++
 gcc/testsuite/gcc.target/i386/pr90773-3.c  |  23 +++
 gcc/testsuite/gcc.target/i386/pr90773-4.c  |  13 ++
 gcc/testsuite/gcc.target/i386/pr90773-5.c  |  13 ++
 gcc/testsuite/gcc.target/i386/pr90773-6.c  |  11 ++
 gcc/testsuite/gcc.target/i386/pr90773-7.c  |  11 ++
 gcc/testsuite/gcc.target/i386/pr90773-8.c  |  13 ++
 gcc/testsuite/gcc.target/i386/pr90773-9.c  |  13 ++
 gcc/toplev.c                               |   8 ++
 20 files changed, 383 insertions(+), 33 deletions(-)
 create mode 100644 gcc/testsuite/g++.dg/pr90773-1.h
 create mode 100644 gcc/testsuite/g++.dg/pr90773-1a.C
 create mode 100644 gcc/testsuite/g++.dg/pr90773-1b.C
 create mode 100644 gcc/testsuite/g++.dg/pr90773-1c.C
 create mode 100644 gcc/testsuite/g++.dg/pr90773-1d.C
 create mode 100644 gcc/testsuite/gcc.target/i386/pr90773-1.c
 create mode 100644 gcc/testsuite/gcc.target/i386/pr90773-10.c
 create mode 100644 gcc/testsuite/gcc.target/i386/pr90773-11.c
 create mode 100644 gcc/testsuite/gcc.target/i386/pr90773-2.c
 create mode 100644 gcc/testsuite/gcc.target/i386/pr90773-3.c
 create mode 100644 gcc/testsuite/gcc.target/i386/pr90773-4.c
 create mode 100644 gcc/testsuite/gcc.target/i386/pr90773-5.c
 create mode 100644 gcc/testsuite/gcc.target/i386/pr90773-6.c
 create mode 100644 gcc/testsuite/gcc.target/i386/pr90773-7.c
 create mode 100644 gcc/testsuite/gcc.target/i386/pr90773-8.c
 create mode 100644 gcc/testsuite/gcc.target/i386/pr90773-9.c

diff --git a/gcc/common.opt b/gcc/common.opt
index a75b44ee47e..7f5b38c7810 100644
--- a/gcc/common.opt
+++ b/gcc/common.opt
@@ -2123,6 +2123,25 @@