[Bug middle-end/90773] Improve piecewise operation

2021-10-06 Thread hjl.tools at gmail dot com via Gcc-bugs
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=90773

H.J. Lu  changed:

   What|Removed |Added

 CC||crazylht at gmail dot com

--- Comment #20 from H.J. Lu  ---
*** Bug 98442 has been marked as a duplicate of this bug. ***

[Bug middle-end/90773] Improve piecewise operation

2021-08-21 Thread hjl.tools at gmail dot com via Gcc-bugs
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=90773

H.J. Lu  changed:

   What|Removed |Added

 CC||guillaume.melquiond at inria dot fr

--- Comment #19 from H.J. Lu  ---
*** Bug 33103 has been marked as a duplicate of this bug. ***

[Bug middle-end/90773] Improve piecewise operation

2021-08-07 Thread hjl.tools at gmail dot com via Gcc-bugs
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=90773

--- Comment #18 from H.J. Lu  ---
*** Bug 74113 has been marked as a duplicate of this bug. ***

[Bug middle-end/90773] Improve piecewise operation

2021-08-05 Thread hjl.tools at gmail dot com via Gcc-bugs
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=90773

H.J. Lu  changed:

   What|Removed |Added

 CC||jyasskin at gcc dot gnu.org

--- Comment #17 from H.J. Lu  ---
*** Bug 56511 has been marked as a duplicate of this bug. ***

[Bug middle-end/90773] Improve piecewise operation

2021-08-05 Thread hjl.tools at gmail dot com via Gcc-bugs
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=90773

H.J. Lu  changed:

   What|Removed |Added

 CC||barry.revzin at gmail dot com

--- Comment #16 from H.J. Lu  ---
*** Bug 89226 has been marked as a duplicate of this bug. ***

[Bug middle-end/90773] Improve piecewise operation

2021-08-02 Thread hjl.tools at gmail dot com via Gcc-bugs
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=90773

H.J. Lu  changed:

   What|Removed |Added

 CC||vincenzo.innocente at cern dot ch

--- Comment #15 from H.J. Lu  ---
*** Bug 80566 has been marked as a duplicate of this bug. ***

[Bug middle-end/90773] Improve piecewise operation

2021-08-02 Thread hjl.tools at gmail dot com via Gcc-bugs
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=90773

--- Comment #14 from H.J. Lu  ---
*** Bug 90599 has been marked as a duplicate of this bug. ***

[Bug middle-end/90773] Improve piecewise operation

2021-08-02 Thread hjl.tools at gmail dot com via Gcc-bugs
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=90773

H.J. Lu  changed:

   What|Removed |Added

 Resolution|--- |FIXED
 Status|NEW |RESOLVED

--- Comment #13 from H.J. Lu  ---
Fixed for GCC 12.

[Bug middle-end/90773] Improve piecewise operation

2021-08-02 Thread cvs-commit at gcc dot gnu.org via Gcc-bugs
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=90773

--- Comment #12 from CVS Commits  ---
The master branch has been updated by H.J. Lu :

https://gcc.gnu.org/g:bf159e5e124838ddfdb91e0688b1df60645d4ba9

commit r12-2667-gbf159e5e124838ddfdb91e0688b1df60645d4ba9
Author: H.J. Lu 
Date:   Mon Aug 2 10:01:46 2021 -0700

x86: Add AVX2 tests for PR middle-end/90773

PR middle-end/90773
* gcc.target/i386/pr90773-20.c: New test.
* gcc.target/i386/pr90773-21.c: Likewise.
* gcc.target/i386/pr90773-22.c: Likewise.
* gcc.target/i386/pr90773-23.c: Likewise.
* gcc.target/i386/pr90773-26.c: Likewise.

[Bug middle-end/90773] Improve piecewise operation

2021-08-02 Thread cvs-commit at gcc dot gnu.org via Gcc-bugs
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=90773

--- Comment #11 from CVS Commits  ---
The master branch has been updated by H.J. Lu :

https://gcc.gnu.org/g:1bee034e012d1146d34b0d767fe28a485c210e4b

commit r12-2664-g1bee034e012d1146d34b0d767fe28a485c210e4b
Author: H.J. Lu 
Date:   Mon Aug 2 10:01:46 2021 -0700

x86: Add TARGET_GEN_MEMSET_SCRATCH_RTX

Define TARGET_GEN_MEMSET_SCRATCH_RTX to ix86_gen_scratch_sse_rtx to
return a scratch SSE register for memset.

gcc/

PR middle-end/90773
* config/i386/i386.c (TARGET_GEN_MEMSET_SCRATCH_RTX): New.

gcc/testsuite/

PR middle-end/90773
* gcc.target/i386/pr90773-5.c: Updated to expect XMM register.
* gcc.target/i386/pr90773-14.c: Likewise.
* gcc.target/i386/pr90773-15.c: New test.
* gcc.target/i386/pr90773-16.c: Likewise.
* gcc.target/i386/pr90773-17.c: Likewise.
* gcc.target/i386/pr90773-18.c: Likewise.
* gcc.target/i386/pr90773-19.c: Likewise.

[Bug middle-end/90773] Improve piecewise operation

2021-08-01 Thread hjl.tools at gmail dot com via Gcc-bugs
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=90773

--- Comment #10 from H.J. Lu  ---
*** Bug 89252 has been marked as a duplicate of this bug. ***

[Bug middle-end/90773] Improve piecewise operation

2021-07-30 Thread cvs-commit at gcc dot gnu.org via Gcc-bugs
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=90773

--- Comment #9 from CVS Commits  ---
The master branch has been updated by H.J. Lu :

https://gcc.gnu.org/g:e5e164effa30fd2b5c5bc3e6883d63889e96d8da

commit r12-2633-ge5e164effa30fd2b5c5bc3e6883d63889e96d8da
Author: H.J. Lu 
Date:   Sun Mar 6 06:38:21 2016 -0800

Add QI vector mode support to by-pieces for memset

1. Replace scalar_int_mode with fixed_size_mode in the by-pieces
infrastructure to allow non-integer mode.
2. Rename widest_int_mode_for_size to widest_fixed_size_mode_for_size
to return QI vector mode for memset.
3. Add op_by_pieces_d::smallest_fixed_size_mode_for_size to return the
smallest integer or QI vector mode.
4. Remove clear_by_pieces_1 and use builtin_memset_read_str in
clear_by_pieces to support vector mode broadcast.
5. Add lowpart_subreg_regno, a wrapper around simplify_subreg_regno that
uses subreg_lowpart_offset (mode, prev_mode) as the offset.
6. Add TARGET_GEN_MEMSET_SCRATCH_RTX to allow the backend to use a hard
scratch register to avoid stack realignment when expanding memset.
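
To make the "QI vector mode" idea in the list above concrete, here is a
minimal standalone C sketch. It is not GCC code: the function name
memset_by_vector_pieces and the 16-byte width (standing in for a mode such
as V16QI) are assumptions made for illustration. The byte value is
broadcast once into a vector-sized buffer, which is then stored in
vector-sized pieces. Because every byte is the same, the last store can
simply overlap the previous one.

```c
#include <assert.h>
#include <stddef.h>
#include <string.h>

enum { VEC_BYTES = 16 };  /* assumed vector width for this sketch */

/* Hypothetical illustration, not GCC code.  Fill LEN bytes at DST with
   the byte C, where LEN >= VEC_BYTES, using only VEC_BYTES-wide stores.  */
void
memset_by_vector_pieces (unsigned char *dst, int c, size_t len)
{
  unsigned char vec[VEC_BYTES];
  size_t offset = 0;

  assert (dst != NULL && len >= VEC_BYTES);

  /* Broadcast the byte once; every later store reuses this value.  */
  for (size_t i = 0; i < VEC_BYTES; i++)
    vec[i] = (unsigned char) c;

  /* Full-width stores while a whole vector still fits.  */
  while (len - offset >= VEC_BYTES)
    {
      memcpy (dst + offset, vec, VEC_BYTES);
      offset += VEC_BYTES;
    }

  /* Cover any tail with one store that ends exactly at DST + LEN.  */
  if (offset < len)
    memcpy (dst + len - VEC_BYTES, vec, VEC_BYTES);
}
```

For len = 37 this issues three 16-byte stores, at offsets 0, 16 and 21.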

gcc/

PR middle-end/90773
* builtins.c (builtin_memcpy_read_str): Change the mode argument
from scalar_int_mode to fixed_size_mode.
(builtin_strncpy_read_str): Likewise.
(gen_memset_value_from_prev): New function.
(builtin_memset_read_str): Change the mode argument from
scalar_int_mode to fixed_size_mode.  Use gen_memset_value_from_prev
and support CONST_VECTOR.
(builtin_memset_gen_str): Likewise.
(try_store_by_multiple_pieces): Use by_pieces_constfn to declare
constfun.
* builtins.h (builtin_strncpy_read_str): Replace scalar_int_mode
with fixed_size_mode.
(builtin_memset_read_str): Likewise.
* expr.c (widest_int_mode_for_size): Renamed to ...
(widest_fixed_size_mode_for_size): Add a bool argument to
indicate if QI vector mode can be used.
(by_pieces_ninsns): Call widest_fixed_size_mode_for_size
instead of widest_int_mode_for_size.
(pieces_addr::adjust): Change the mode argument from
scalar_int_mode to fixed_size_mode.
(op_by_pieces_d): Make m_len read-only.  Add a bool member,
m_qi_vector_mode, to indicate that QI vector mode can be used.
(op_by_pieces_d::op_by_pieces_d): Add a bool argument to
initialize m_qi_vector_mode.  Call widest_fixed_size_mode_for_size
instead of widest_int_mode_for_size.
(op_by_pieces_d::get_usable_mode): Change the mode argument from
scalar_int_mode to fixed_size_mode.  Call
widest_fixed_size_mode_for_size instead of
widest_int_mode_for_size.
(op_by_pieces_d::smallest_fixed_size_mode_for_size): New member
function to return the smallest integer or QI vector mode.
(op_by_pieces_d::run): Call widest_fixed_size_mode_for_size
instead of widest_int_mode_for_size.  Call
smallest_fixed_size_mode_for_size instead of
smallest_int_mode_for_size.
(store_by_pieces_d::store_by_pieces_d): Add a bool argument to
indicate that QI vector mode can be used and pass it to
op_by_pieces_d::op_by_pieces_d.
(can_store_by_pieces): Call widest_fixed_size_mode_for_size
instead of widest_int_mode_for_size.  Pass memsetp to
widest_fixed_size_mode_for_size to support QI vector mode.
Allow all CONST_VECTORs for memset if vec_duplicate is supported.
(store_by_pieces): Pass memsetp to
store_by_pieces_d::store_by_pieces_d.
(clear_by_pieces_1): Removed.
(clear_by_pieces): Replace clear_by_pieces_1 with
builtin_memset_read_str and pass true to store_by_pieces_d to
support vector mode broadcast.
(string_cst_read_str): Change the mode argument from
scalar_int_mode to fixed_size_mode.
* expr.h (by_pieces_constfn): Change scalar_int_mode to
fixed_size_mode.
(by_pieces_prev): Likewise.
* rtl.h (lowpart_subreg_regno): New.
* rtlanal.c (lowpart_subreg_regno): New.  A wrapper around
simplify_subreg_regno.
* target.def (gen_memset_scratch_rtx): New hook.
* doc/tm.texi.in: Add TARGET_GEN_MEMSET_SCRATCH_RTX.
* doc/tm.texi: Regenerated.

gcc/testsuite/

* gcc.target/i386/pr100865-3.c: Expect vmovdqu8 instead of
vmovdqu.
* gcc.target/i386/pr100865-4b.c: Likewise.

[Bug middle-end/90773] Improve piecewise operation

2021-05-21 Thread cvs-commit at gcc dot gnu.org via Gcc-bugs
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=90773

--- Comment #8 from CVS Commits  ---
The master branch has been updated by H.J. Lu :

https://gcc.gnu.org/g:53fb833d635da04f5b44af16bcea1082e7b59e75

commit r12-978-g53fb833d635da04f5b44af16bcea1082e7b59e75
Author: H.J. Lu 
Date:   Fri May 21 05:16:20 2021 -0700

Elide expand_constructor if move by pieces is preferred

Elide expand_constructor when the constructor is in static storage, is
not mostly zeros, and can be moved by pieces.  Prefer the by-pieces move,
since that is usually more efficient than performing a series of stores
from immediates.
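
As an illustration of the kind of source this concerns, here is a small C
sketch. It is not taken from the GCC testsuite: struct config and
init_config are hypothetical names, and whether GCC actually copies the
initializer from static storage by pieces depends on the target, the
options and the compiler version.

```c
#include <assert.h>
#include <string.h>

/* Hypothetical aggregate with a constant, mostly non-zero initializer.  */
struct config
{
  int a, b, c, d;
  short e, f;
  char tag[8];
};

/* Hypothetical illustration, not a GCC test.  The compiler may keep one
   copy of DEFAULTS in static storage and block-copy it into *OUT, rather
   than emitting a separate immediate store for each field.  */
void
init_config (struct config *out)
{
  const struct config defaults = { 1, 2, 3, 4, 5, 6, "default" };

  assert (out != NULL);
  *out = defaults;
}
```

Comparing the generated assembly with and without the change (for example
with -O2 -S) is one way to see which strategy was chosen.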

2021-05-21  Richard Biener  
H.J. Lu  

gcc/

PR middle-end/90773
* expr.c (expand_constructor): Elide expand_constructor if
move by pieces is preferred.

gcc/testsuite/

* gcc.target/i386/pr90773-24.c: New test.
* gcc.target/i386/pr90773-25.c: Likewise.

[Bug middle-end/90773] Improve piecewise operation

2021-04-30 Thread cvs-commit at gcc dot gnu.org via Gcc-bugs
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=90773

--- Comment #7 from CVS Commits  ---
The master branch has been updated by H.J. Lu :

https://gcc.gnu.org/g:86c77c52f7b812adccf9620860f7c392f9a16cfc

commit r12-313-g86c77c52f7b812adccf9620860f7c392f9a16cfc
Author: H.J. Lu 
Date:   Thu Apr 29 11:12:09 2021 -0700

Don't use nullptr return from simplify_gen_subreg

Check nullptr return from simplify_gen_subreg.  Don't use it if it is
nullptr.

PR middle-end/90773
* builtins.c (builtin_memset_gen_str): Don't use a nullptr return
from simplify_gen_subreg.

[Bug middle-end/90773] Improve piecewise operation

2021-04-29 Thread cvs-commit at gcc dot gnu.org via Gcc-bugs
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=90773

--- Comment #6 from CVS Commits  ---
The master branch has been updated by H.J. Lu :

https://gcc.gnu.org/g:985b3a6837dee7001e6b618f073ed74f0edf5787

commit r12-285-g985b3a6837dee7001e6b618f073ed74f0edf5787
Author: H.J. Lu 
Date:   Mon Jun 10 09:57:15 2019 -0700

Generate offset adjusted operation for op_by_pieces operations

Add an overlap_op_by_pieces_p target hook for op_by_pieces operations
between two areas of memory.  When it returns true, the last piece
operation of a memory region covers the remaining bytes with one
offset-adjusted operation in the smallest integer mode that fits,
instead of several smaller operations.
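
What "one offset adjusted operation" means can be shown with a fixed-size
copy. The following is a minimal standalone C sketch, not GCC code; the
name copy15_overlap is hypothetical. A 15-byte copy would otherwise need
pieces of 8, 4, 2 and 1 bytes. With overlap, the tail is covered by a
second 8-byte move that starts one byte earlier.

```c
#include <assert.h>
#include <stdint.h>
#include <string.h>

/* Hypothetical illustration, not GCC code.  Copy exactly 15 bytes with
   two 8-byte moves instead of 8 + 4 + 2 + 1.  */
void
copy15_overlap (unsigned char *dst, const unsigned char *src)
{
  uint64_t lo, hi;

  assert (dst != NULL && src != NULL);

  /* Load both pieces before storing anything.  */
  memcpy (&lo, src, 8);      /* bytes 0..7 */
  memcpy (&hi, src + 7, 8);  /* bytes 7..14; byte 7 is read twice */

  memcpy (dst, &lo, 8);
  memcpy (dst + 7, &hi, 8);  /* byte 7 is written twice, same value */
}
```

The fixed-size memcpy calls stand in for unaligned 8-byte loads and
stores, which is how compilers commonly lower them on targets that allow
unaligned access.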

Pass the RTL information from the previous iteration to m_constfn in
op_by_pieces operation so that builtin_memset_[read|gen]_str can
generate the new RTL from the previous RTL.

Tested on Linux/x86-64.

gcc/

PR middle-end/90773
* builtins.c (builtin_memcpy_read_str): Add a dummy argument.
(builtin_strncpy_read_str): Likewise.
(builtin_memset_read_str): Add an argument for the previous RTL
information and generate the new RTL from the previous RTL info.
(builtin_memset_gen_str): Likewise.
* builtins.h (builtin_strncpy_read_str): Update the prototype.
(builtin_memset_read_str): Likewise.
* expr.c (by_pieces_ninsns): If targetm.overlap_op_by_pieces_p()
returns true, round up size and alignment to the widest integer
mode for maximum size.
(pieces_addr::adjust): Add a pointer to by_pieces_prev argument
and pass it to m_constfn.
(op_by_pieces_d): Add m_push and m_overlap_op_by_pieces.
(op_by_pieces_d::op_by_pieces_d): Add a bool argument to
initialize m_push.  Initialize m_overlap_op_by_pieces with
targetm.overlap_op_by_pieces_p ().
(op_by_pieces_d::run): Pass the previous RTL information to
pieces_addr::adjust and generate overlapping operations if
m_overlap_op_by_pieces is true.
(PUSHG_P): New.
(move_by_pieces_d::move_by_pieces_d): Updated for op_by_pieces_d
change.
(store_by_pieces_d::store_by_pieces_d): Updated for op_by_pieces_d
change.
(can_store_by_pieces): Use by_pieces_constfn on constfun.
(store_by_pieces): Use by_pieces_constfn on constfun.  Updated
for op_by_pieces_d change.
(clear_by_pieces_1): Add a dummy argument.
(clear_by_pieces): Updated for op_by_pieces_d change.
(compare_by_pieces_d::compare_by_pieces_d): Likewise.
(string_cst_read_str): Add a dummy argument.
* expr.h (by_pieces_constfn): Add a dummy argument.
(by_pieces_prev): New.
* target.def (overlap_op_by_pieces_p): New target hook.
* config/i386/i386.c (TARGET_OVERLAP_OP_BY_PIECES_P): New.
* doc/tm.texi.in: Add TARGET_OVERLAP_OP_BY_PIECES_P.
* doc/tm.texi: Regenerated.

gcc/testsuite/

PR middle-end/90773
* g++.dg/pr90773-1.h: New test.
* g++.dg/pr90773-1a.C: Likewise.
* g++.dg/pr90773-1b.C: Likewise.
* g++.dg/pr90773-1c.C: Likewise.
* g++.dg/pr90773-1d.C: Likewise.
* gcc.target/i386/pr90773-1.c: Likewise.
* gcc.target/i386/pr90773-2.c: Likewise.
* gcc.target/i386/pr90773-3.c: Likewise.
* gcc.target/i386/pr90773-4.c: Likewise.
* gcc.target/i386/pr90773-5.c: Likewise.
* gcc.target/i386/pr90773-6.c: Likewise.
* gcc.target/i386/pr90773-7.c: Likewise.
* gcc.target/i386/pr90773-8.c: Likewise.
* gcc.target/i386/pr90773-9.c: Likewise.
* gcc.target/i386/pr90773-10.c: Likewise.
* gcc.target/i386/pr90773-11.c: Likewise.
* gcc.target/i386/pr90773-12.c: Likewise.
* gcc.target/i386/pr90773-13.c: Likewise.
* gcc.target/i386/pr90773-14.c: Likewise.

[Bug middle-end/90773] Improve piecewise operation

2021-04-27 Thread cvs-commit at gcc dot gnu.org via Gcc-bugs
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=90773

--- Comment #5 from CVS Commits  ---
The master branch has been updated by H.J. Lu :

https://gcc.gnu.org/g:3bb41228d76b3a3cbd9923d57388f0903f7683de

commit r12-162-g3bb41228d76b3a3cbd9923d57388f0903f7683de
Author: H.J. Lu 
Date:   Mon Apr 26 15:36:18 2021 -0700

op_by_pieces_d::run: Change a while loop to a do-while loop

Change a while loop in op_by_pieces_d::run to a do-while loop to prepare
for offset adjusted operation for the remaining bytes on the last piece
operation of a memory region.

PR middle-end/90773
* expr.c (op_by_pieces_d::get_usable_mode): New member function.
(op_by_pieces_d::run): Change a while loop to a do-while loop.

[Bug middle-end/90773] Improve piecewise operation

2019-07-03 Thread law at redhat dot com
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=90773

Jeffrey A. Law  changed:

   What|Removed |Added

 CC||law at redhat dot com

--- Comment #4 from Jeffrey A. Law  ---
But isn't -Os a better choice if we care about this?   It gives more compact
code.  I guess I'm just not sure optimizing this using an overlapping store is
all that important.

[Bug middle-end/90773] Improve piecewise operation

2019-06-10 Thread hjl.tools at gmail dot com
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=90773

--- Comment #3 from H.J. Lu  ---
Created attachment 46476
  --> https://gcc.gnu.org/bugzilla/attachment.cgi?id=46476&action=edit
I am testing this patch.

[Bug middle-end/90773] Improve piecewise operation

2019-06-07 Thread hjl.tools at gmail dot com
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=90773

--- Comment #2 from H.J. Lu  ---
Something like this:

diff --git a/gcc/expr.c b/gcc/expr.c
index c78bc74c0d9..4412aa7518c 100644
--- a/gcc/expr.c
+++ b/gcc/expr.c
@@ -1090,10 +1090,13 @@ op_by_pieces_d::op_by_pieces_d (rtx to, bool to_load,
 void
 op_by_pieces_d::run ()
 {
+  bool started = false;
+
   while (m_max_size > 1 && m_len > 0)
     {
       scalar_int_mode mode = widest_int_mode_for_size (m_max_size);

+repeat:
       if (prepare_mode (mode, m_align))
         {
           unsigned int size = GET_MODE_SIZE (mode);
@@ -1101,6 +1104,8 @@ op_by_pieces_d::run ()

           while (m_len >= size)
             {
+              started = true;
+
               if (m_reverse)
                 m_offset -= size;

@@ -1124,6 +1129,24 @@ op_by_pieces_d::run ()
           finish_mode (mode);
         }

+      if (m_len == 0)
+        break;
+
+      if (started)
+        {
+          mode = smallest_int_mode_for_size (m_len * BITS_PER_UNIT);
+          unsigned int last_gap = GET_MODE_SIZE (mode) - m_len;
+          if (last_gap)
+            {
+              if (m_reverse)
+                m_offset += last_gap;
+              else
+                m_offset -= last_gap;
+              m_len += last_gap;
+              goto repeat;
+            }
+        }
+
       m_max_size = GET_MODE_SIZE (mode);
     }

which should be opt-in by target.
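
The offset arithmetic in the patch can be tried outside GCC. The following
is a minimal standalone C sketch in the spirit of that loop, not a
transliteration of it: plan_pieces, smallest_size_for and struct piece are
hypothetical names, piece sizes are assumed to be powers of two, and only
the forward (non-reverse) direction is modelled. It records which
(offset, size) pieces would be emitted.

```c
#include <assert.h>
#include <stddef.h>

/* Hypothetical illustration, not GCC code.  */
struct piece
{
  size_t offset;
  size_t size;
};

/* Smallest power-of-two size that covers LEN bytes.  */
static size_t
smallest_size_for (size_t len)
{
  size_t size = 1;
  while (size < len)
    size *= 2;
  return size;
}

/* Plan forward pieces for LEN bytes, with MAX_SIZE (a power of two) as
   the widest piece.  Writes at most CAP pieces to OUT and returns the
   number written.  */
size_t
plan_pieces (size_t len, size_t max_size, struct piece *out, size_t cap)
{
  size_t n = 0, offset = 0, size = max_size;
  int started = 0;

  assert (max_size >= 1 && (max_size & (max_size - 1)) == 0);

  while (len > 0)
    {
      /* Emit as many pieces of the current size as fit.  */
      while (len >= size)
        {
          assert (n < cap);
          out[n].offset = offset;
          out[n].size = size;
          n++;
          offset += size;
          len -= size;
          started = 1;
        }

      if (len == 0)
        break;

      if (started)
        {
          /* Round the tail up to one piece and back the offset up by
             the gap, so the piece ends exactly at the end.  */
          size_t tail = smallest_size_for (len);
          size_t last_gap = tail - len;
          offset -= last_gap;
          len += last_gap;
          size = tail;
        }
      else
        size /= 2;  /* nothing emitted yet: try the next narrower size */
    }

  return n;
}
```

For len = 15 and max_size = 8 this yields the pieces (0, 8) and (7, 8),
where a non-overlapping plan would need 8 + 4 + 2 + 1.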

[Bug middle-end/90773] Improve piecewise operation

2019-06-07 Thread rguenth at gcc dot gnu.org
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=90773

Richard Biener  changed:

   What|Removed |Added

   Keywords||missed-optimization
 Status|UNCONFIRMED |NEW
   Last reconfirmed||2019-06-07
 Ever confirmed|0   |1

--- Comment #1 from Richard Biener  ---
Confirmed.  Note that both the unaligned load and the unaligned store may
have issues with preceding stores or following loads and store-to-load
forwarding.  I'm not sure the speculative forwarding of aligned parts
will be retired successfully because of the aliased unaligned parts, even
if they have the same value.

But definitely an improvement for -Os.

Probably needs some target tuning knobs if implemented in the middle-end
block-copy machinery.