Re: [PATCH 1/2] Generate overlapping operations between two areas of memory

2021-04-23 Richard Biener via Gcc-patches
On Fri, Apr 23, 2021 at 1:35 AM H.J. Lu via Gcc-patches wrote:
>
> For op_by_pieces operations between two areas of memory on non-strict
> alignment targets, add -foverlap-op-by-pieces=[off|on|max-memset] to
> generate overlapping operations that minimize the number of operations,
> except for stack pushes, which must not overlap.
>
> When operating on LENGTH bytes of memory, -foverlap-op-by-pieces=on
> starts with the widest usable integer size, MAX_SIZE, for LENGTH bytes
> and finishes with the smallest usable integer size, MIN_SIZE, for the
> remaining bytes, where MAX_SIZE >= MIN_SIZE.  If MIN_SIZE is greater
> than the number of remaining bytes, the last operation is performed on
> MIN_SIZE bytes that overlap the memory of the previous operation.
>
> For memset with a non-zero byte, -foverlap-op-by-pieces=max-memset
> generates an overlapping fill with MAX_SIZE if more than one byte
> remains.
>
> Tested on Linux/x86-64 with both -foverlap-op-by-pieces enabled and
> disabled by default.

Neither the user documentation nor the patch description tells me what
"generate overlapping operations" does.  I _suspect_ it's doing an
offset-adjusted read/write of the last piece of a memory region to
avoid doing more than one smaller operation.  Thus for a region
of size 7 and 4-byte granular ops you'd do operations at
offsets 0 and 3 rather than a 4-byte one at offset 0, a two-byte one
at offset 4 and a one-byte one at offset 6.
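To make the 7-byte case concrete, here is a minimal C sketch contrasting the two strategies for a plain copy. The function names `copy7_split` and `copy7_overlap` are hypothetical, and this is not literally what GCC emits; `memcpy` with a constant size is used only as a portable way to express a single 4-, 2- or 1-byte load or store. It assumes the source and destination do not overlap (memcpy semantics).

```c
#include <stdint.h>
#include <string.h>

/* Non-overlapping pieces: a 4-byte, a 2-byte and a 1-byte copy
   at offsets 0, 4 and 6 (three loads and three stores).  */
static void
copy7_split (unsigned char *dst, const unsigned char *src)
{
  uint32_t w;
  uint16_t h;
  memcpy (&w, src, 4);
  memcpy (dst, &w, 4);
  memcpy (&h, src + 4, 2);
  memcpy (dst + 4, &h, 2);
  dst[6] = src[6];
}

/* Overlapping pieces: two 4-byte copies at offsets 0 and 3 (two loads
   and two stores).  Byte 3 is written twice with the same value.
   Loads and stores are interleaved, so SRC and DST must not overlap.  */
static void
copy7_overlap (unsigned char *dst, const unsigned char *src)
{
  uint32_t w;
  memcpy (&w, src, 4);
  memcpy (dst, &w, 4);
  memcpy (&w, src + 3, 4);
  memcpy (dst + 3, &w, 4);
}
```

Both functions produce the same seven bytes; the overlapping version simply needs one piece fewer.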

When the tail is of power-of-two size you still generate non-overlapping
ops?

For memmove there's a correctness issue, so you have to make sure
to load the data for the last two ops before performing the stores,
which increases register pressure.
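A minimal sketch of that memmove ordering, assuming the same 7-byte case with two 4-byte pieces at offsets 0 and 3; `move7_overlap` is a hypothetical name, not GCC code. Both pieces are loaded before either is stored, which keeps the result correct when the buffers overlap, at the cost of keeping two values live at once.

```c
#include <stdint.h>
#include <string.h>

/* Move 7 bytes between possibly overlapping buffers using two
   overlapping 4-byte pieces.  All loads happen before any store,
   so no store can clobber a source byte that is still unread.  */
static void
move7_overlap (unsigned char *dst, const unsigned char *src)
{
  uint32_t head, tail;
  memcpy (&head, src, 4);       /* Load bytes [0, 4).  */
  memcpy (&tail, src + 3, 4);   /* Load bytes [3, 7).  */
  memcpy (dst, &head, 4);       /* Store bytes [0, 4).  */
  memcpy (dst + 3, &tail, 4);   /* Store bytes [3, 7).  */
}
```

For example, `move7_overlap (buf + 1, buf)` shifts the first seven bytes of `buf` up by one, matching `memmove (buf + 1, buf, 7)`. Interleaving the loads and stores as in a plain copy would instead read bytes that the first store had already overwritten.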

I'm not sure we want a -f option to control this - not all targets will
be able to support this.  So I'd use a target hook, or rather extend
the existing use_by_pieces_infrastructure_p hook with an alternate
return value (some flags bitmask, I guess).  We already have one extra
target hook, compare_by_pieces_branch_ratio, so by that precedent
using an alternate hook might also be OK.

Adding a -m option in targets that want this user-controllable would
be OK of course.

Richard.


[PATCH 1/2] Generate overlapping operations between two areas of memory

2021-04-22 H.J. Lu via Gcc-patches
For op_by_pieces operations between two areas of memory on non-strict
alignment targets, add -foverlap-op-by-pieces=[off|on|max-memset] to
generate overlapping operations that minimize the number of operations,
except for stack pushes, which must not overlap.

When operating on LENGTH bytes of memory, -foverlap-op-by-pieces=on
starts with the widest usable integer size, MAX_SIZE, for LENGTH bytes
and finishes with the smallest usable integer size, MIN_SIZE, for the
remaining bytes, where MAX_SIZE >= MIN_SIZE.  If MIN_SIZE is greater
than the number of remaining bytes, the last operation is performed on
MIN_SIZE bytes that overlap the memory of the previous operation.
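The splitting rule described above can be sketched in C as follows. This is an illustrative reading of the description, not the code in expr.c: it assumes piece sizes are powers of two, that the last piece uses the smallest power-of-two size covering the remaining bytes, and that this piece is shifted back to overlap the previous one. `struct piece` and `plan_overlapping_pieces` are hypothetical names.

```c
#include <assert.h>
#include <stddef.h>

/* One memory operation: SIZE bytes starting at byte OFFSET.  */
struct piece
{
  size_t offset;
  size_t size;
};

/* Hypothetical helper: split a LENGTH-byte operation into pieces no
   wider than MAX_SIZE (a power of two) and store them in OUT, which
   must have room for every piece.  Returns the number of pieces.  */
static size_t
plan_overlapping_pieces (size_t length, size_t max_size, struct piece *out)
{
  assert (max_size != 0 && (max_size & (max_size - 1)) == 0);

  size_t n = 0, offset = 0, size = max_size;

  /* Start with the widest power-of-two piece that fits in LENGTH.  */
  while (size > length)
    size /= 2;

  while (length > 0)
    {
      /* Emit as many SIZE-byte pieces as fit.  */
      while (length >= size)
        {
          out[n].offset = offset;
          out[n].size = size;
          n++;
          offset += size;
          length -= size;
        }
      if (length == 0)
        break;

      /* Smallest power of two covering the remaining bytes.  Since
         LENGTH < SIZE here, TAIL is never wider than the last piece.  */
      size_t tail = 1;
      while (tail < length)
        tail *= 2;

      /* Shift the final piece back by the gap so it overlaps the
         previous piece; the gap is zero when the tail is already a
         power of two.  */
      offset -= tail - length;
      length = tail;
      size = tail;
    }
  return n;
}
```

With LENGTH = 7 and MAX_SIZE = 4 this yields 4-byte pieces at offsets 0 and 3. With LENGTH = 6 the 2-byte tail is already a power of two, so the second piece (offset 4, size 2) does not overlap the first.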

For memset with a non-zero byte, -foverlap-op-by-pieces=max-memset
generates an overlapping fill with MAX_SIZE if more than one byte
remains.
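A minimal C sketch of the max-memset idea, assuming MAX_SIZE = 4 and a fill length of at least 4 bytes; `memset_overlap4` is a hypothetical name and not part of the patch. For a non-zero byte the replicated 4-byte pattern has to be built anyway, so a tail of two or three bytes can reuse it in one overlapping 4-byte store, while a single trailing byte is stored on its own, as the description implies.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>
#include <string.h>

/* Fill N bytes (N >= 4) at DST with byte C.  Whole 4-byte stores come
   first; a 2- or 3-byte tail is covered by one more 4-byte store that
   overlaps the previous one, and a 1-byte tail by a byte store.  */
static void
memset_overlap4 (unsigned char *dst, unsigned char c, size_t n)
{
  assert (n >= 4);
  uint32_t pattern = 0x01010101u * c;   /* C replicated into every byte.  */
  size_t off = 0;

  for (; n - off >= 4; off += 4)
    memcpy (dst + off, &pattern, 4);

  size_t rest = n - off;
  if (rest > 1)
    memcpy (dst + n - 4, &pattern, 4);  /* Overlaps the previous store.  */
  else if (rest == 1)
    dst[off] = c;
}
```

Because every byte of the pattern is C, the overlapping store rewrites the shared bytes with the same value, independent of byte order.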

Tested on Linux/x86-64 with both -foverlap-op-by-pieces enabled and
disabled by default.

gcc/

PR middle-end/90773
* common.opt (-foverlap-op-by-pieces): New.
* expr.c (by_pieces_ninsns): If -foverlap-op-by-pieces is enabled,
round up size and alignment to the widest integer mode for maximum
size.
(op_by_pieces_d): Add get_usable_mode, m_push and
m_non_zero_memset.
(op_by_pieces_d::op_by_pieces_d): Add 2 bool arguments to
initialize m_push and m_non_zero_memset.
(op_by_pieces_d::get_usable_mode): New.
(op_by_pieces_d::run): Use get_usable_mode to get the largest
usable integer mode and generate overlapping operations for
-foverlap-op-by-pieces.
(PUSHG_P): New.
(move_by_pieces_d::move_by_pieces_d): Updated for op_by_pieces_d
change.
(store_by_pieces_d::store_by_pieces_d): Likewise.
(clear_by_pieces): Likewise.
* toplev.c (process_options): Issue an error when
-foverlap-op-by-pieces is used for strict alignment target.
* doc/invoke.texi: Document -foverlap-op-by-pieces.

gcc/testsuite/

PR middle-end/90773
* g++.dg/pr90773-1.h: New test.
* g++.dg/pr90773-1a.C: Likewise.
* g++.dg/pr90773-1b.C: Likewise.
* g++.dg/pr90773-1c.C: Likewise.
* g++.dg/pr90773-1d.C: Likewise.
* gcc.target/i386/pr90773-1.c: Likewise.
* gcc.target/i386/pr90773-2.c: Likewise.
* gcc.target/i386/pr90773-3.c: Likewise.
* gcc.target/i386/pr90773-4.c: Likewise.
* gcc.target/i386/pr90773-5.c: Likewise.
* gcc.target/i386/pr90773-6.c: Likewise.
* gcc.target/i386/pr90773-7.c: Likewise.
* gcc.target/i386/pr90773-8.c: Likewise.
* gcc.target/i386/pr90773-9.c: Likewise.
* gcc.target/i386/pr90773-10.c: Likewise.
* gcc.target/i386/pr90773-11.c: Likewise.
---
 gcc/common.opt                             |  19 +++
 gcc/doc/invoke.texi                        |  14 ++
 gcc/expr.c                                 | 159 -
 gcc/testsuite/g++.dg/pr90773-1.h           |  14 ++
 gcc/testsuite/g++.dg/pr90773-1a.C          |  13 ++
 gcc/testsuite/g++.dg/pr90773-1b.C          |   5 +
 gcc/testsuite/g++.dg/pr90773-1c.C          |   5 +
 gcc/testsuite/g++.dg/pr90773-1d.C          |  19 +++
 gcc/testsuite/gcc.target/i386/pr90773-1.c  |  17 +++
 gcc/testsuite/gcc.target/i386/pr90773-10.c |  13 ++
 gcc/testsuite/gcc.target/i386/pr90773-11.c |  13 ++
 gcc/testsuite/gcc.target/i386/pr90773-2.c  |  20 +++
 gcc/testsuite/gcc.target/i386/pr90773-3.c  |  23 +++
 gcc/testsuite/gcc.target/i386/pr90773-4.c  |  13 ++
 gcc/testsuite/gcc.target/i386/pr90773-5.c  |  13 ++
 gcc/testsuite/gcc.target/i386/pr90773-6.c  |  11 ++
 gcc/testsuite/gcc.target/i386/pr90773-7.c  |  11 ++
 gcc/testsuite/gcc.target/i386/pr90773-8.c  |  13 ++
 gcc/testsuite/gcc.target/i386/pr90773-9.c  |  13 ++
 gcc/toplev.c                               |   8 ++
 20 files changed, 383 insertions(+), 33 deletions(-)
 create mode 100644 gcc/testsuite/g++.dg/pr90773-1.h
 create mode 100644 gcc/testsuite/g++.dg/pr90773-1a.C
 create mode 100644 gcc/testsuite/g++.dg/pr90773-1b.C
 create mode 100644 gcc/testsuite/g++.dg/pr90773-1c.C
 create mode 100644 gcc/testsuite/g++.dg/pr90773-1d.C
 create mode 100644 gcc/testsuite/gcc.target/i386/pr90773-1.c
 create mode 100644 gcc/testsuite/gcc.target/i386/pr90773-10.c
 create mode 100644 gcc/testsuite/gcc.target/i386/pr90773-11.c
 create mode 100644 gcc/testsuite/gcc.target/i386/pr90773-2.c
 create mode 100644 gcc/testsuite/gcc.target/i386/pr90773-3.c
 create mode 100644 gcc/testsuite/gcc.target/i386/pr90773-4.c
 create mode 100644 gcc/testsuite/gcc.target/i386/pr90773-5.c
 create mode 100644 gcc/testsuite/gcc.target/i386/pr90773-6.c
 create mode 100644 gcc/testsuite/gcc.target/i386/pr90773-7.c
 create mode 100644 gcc/testsuite/gcc.target/i386/pr90773-8.c
 create mode 100644 gcc/testsuite/gcc.target/i386/pr90773-9.c

diff --git a/gcc/common.opt b/gcc/common.opt
index a75b44ee47e..7f5b38c7810 100644
--- a/gcc/common.opt
+++ b/gcc/common.opt
@@ -2123,6 +2123,25 @@