[Bug middle-end/90773] Improve piecewise operation
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=90773 H.J. Lu changed: CC added crazylht at gmail dot com --- Comment #20 from H.J. Lu --- *** Bug 98442 has been marked as a duplicate of this bug. ***
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=90773 H.J. Lu changed: CC added guillaume.melquiond at inria dot fr --- Comment #19 from H.J. Lu --- *** Bug 33103 has been marked as a duplicate of this bug. ***
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=90773 --- Comment #18 from H.J. Lu --- *** Bug 74113 has been marked as a duplicate of this bug. ***
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=90773 H.J. Lu changed: CC added jyasskin at gcc dot gnu.org --- Comment #17 from H.J. Lu --- *** Bug 56511 has been marked as a duplicate of this bug. ***
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=90773 H.J. Lu changed: CC added barry.revzin at gmail dot com --- Comment #16 from H.J. Lu --- *** Bug 89226 has been marked as a duplicate of this bug. ***
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=90773 H.J. Lu changed: CC added vincenzo.innocente at cern dot ch --- Comment #15 from H.J. Lu --- *** Bug 80566 has been marked as a duplicate of this bug. ***
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=90773 --- Comment #14 from H.J. Lu --- *** Bug 90599 has been marked as a duplicate of this bug. ***
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=90773 H.J. Lu changed: Resolution set to FIXED; Status changed from NEW to RESOLVED --- Comment #13 from H.J. Lu --- Fixed for GCC 12.
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=90773 --- Comment #12 from CVS Commits ---
The master branch has been updated by H.J. Lu:

https://gcc.gnu.org/g:bf159e5e124838ddfdb91e0688b1df60645d4ba9

commit r12-2667-gbf159e5e124838ddfdb91e0688b1df60645d4ba9
Author: H.J. Lu
Date: Mon Aug 2 10:01:46 2021 -0700

x86: Add AVX2 tests for PR middle-end/90773

PR middle-end/90773
* gcc.target/i386/pr90773-20.c: New test.
* gcc.target/i386/pr90773-21.c: Likewise.
* gcc.target/i386/pr90773-22.c: Likewise.
* gcc.target/i386/pr90773-23.c: Likewise.
* gcc.target/i386/pr90773-26.c: Likewise.
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=90773 --- Comment #11 from CVS Commits ---
The master branch has been updated by H.J. Lu:

https://gcc.gnu.org/g:1bee034e012d1146d34b0d767fe28a485c210e4b

commit r12-2664-g1bee034e012d1146d34b0d767fe28a485c210e4b
Author: H.J. Lu
Date: Mon Aug 2 10:01:46 2021 -0700

x86: Add TARGET_GEN_MEMSET_SCRATCH_RTX

Define TARGET_GEN_MEMSET_SCRATCH_RTX to ix86_gen_scratch_sse_rtx to return a scratch SSE register for memset.

gcc/

PR middle-end/90773
* config/i386/i386.c (TARGET_GEN_MEMSET_SCRATCH_RTX): New.

gcc/testsuite/

PR middle-end/90773
* gcc.target/i386/pr90773-5.c: Updated to expect XMM register.
* gcc.target/i386/pr90773-14.c: Likewise.
* gcc.target/i386/pr90773-15.c: New test.
* gcc.target/i386/pr90773-16.c: Likewise.
* gcc.target/i386/pr90773-17.c: Likewise.
* gcc.target/i386/pr90773-18.c: Likewise.
* gcc.target/i386/pr90773-19.c: Likewise.
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=90773 --- Comment #10 from H.J. Lu --- *** Bug 89252 has been marked as a duplicate of this bug. ***
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=90773 --- Comment #9 from CVS Commits ---
The master branch has been updated by H.J. Lu:

https://gcc.gnu.org/g:e5e164effa30fd2b5c5bc3e6883d63889e96d8da

commit r12-2633-ge5e164effa30fd2b5c5bc3e6883d63889e96d8da
Author: H.J. Lu
Date: Sun Mar 6 06:38:21 2016 -0800

Add QI vector mode support to by-pieces for memset

1. Replace scalar_int_mode with fixed_size_mode in the by-pieces infrastructure to allow non-integer mode.
2. Rename widest_int_mode_for_size to widest_fixed_size_mode_for_size to return QI vector mode for memset.
3. Add op_by_pieces_d::smallest_fixed_size_mode_for_size to return the smallest integer or QI vector mode.
4. Remove clear_by_pieces_1 and use builtin_memset_read_str in clear_by_pieces to support vector mode broadcast.
5. Add lowpart_subreg_regno, a wrapper around simplify_subreg_regno that uses subreg_lowpart_offset (mode, prev_mode) as the offset.
6. Add TARGET_GEN_MEMSET_SCRATCH_RTX to allow the backend to use a hard scratch register to avoid stack realignment when expanding memset.

gcc/

PR middle-end/90773
* builtins.c (builtin_memcpy_read_str): Change the mode argument from scalar_int_mode to fixed_size_mode.
(builtin_strncpy_read_str): Likewise.
(gen_memset_value_from_prev): New function.
(builtin_memset_read_str): Change the mode argument from scalar_int_mode to fixed_size_mode. Use gen_memset_value_from_prev and support CONST_VECTOR.
(builtin_memset_gen_str): Likewise.
(try_store_by_multiple_pieces): Use by_pieces_constfn to declare constfun.
* builtins.h (builtin_strncpy_read_str): Replace scalar_int_mode with fixed_size_mode.
(builtin_memset_read_str): Likewise.
* expr.c (widest_int_mode_for_size): Renamed to ...
(widest_fixed_size_mode_for_size): Add a bool argument to indicate if QI vector mode can be used.
(by_pieces_ninsns): Call widest_fixed_size_mode_for_size instead of widest_int_mode_for_size.
(pieces_addr::adjust): Change the mode argument from scalar_int_mode to fixed_size_mode.
(op_by_pieces_d): Make m_len read-only. Add a bool member, m_qi_vector_mode, to indicate that QI vector mode can be used.
(op_by_pieces_d::op_by_pieces_d): Add a bool argument to initialize m_qi_vector_mode. Call widest_fixed_size_mode_for_size instead of widest_int_mode_for_size.
(op_by_pieces_d::get_usable_mode): Change the mode argument from scalar_int_mode to fixed_size_mode. Call widest_fixed_size_mode_for_size instead of widest_int_mode_for_size.
(op_by_pieces_d::smallest_fixed_size_mode_for_size): New member function to return the smallest integer or QI vector mode.
(op_by_pieces_d::run): Call widest_fixed_size_mode_for_size instead of widest_int_mode_for_size. Call smallest_fixed_size_mode_for_size instead of smallest_int_mode_for_size.
(store_by_pieces_d::store_by_pieces_d): Add a bool argument to indicate that QI vector mode can be used and pass it to op_by_pieces_d::op_by_pieces_d.
(can_store_by_pieces): Call widest_fixed_size_mode_for_size instead of widest_int_mode_for_size. Pass memsetp to widest_fixed_size_mode_for_size to support QI vector mode. Allow all CONST_VECTORs for memset if vec_duplicate is supported.
(store_by_pieces): Pass memsetp to store_by_pieces_d::store_by_pieces_d.
(clear_by_pieces_1): Removed.
(clear_by_pieces): Replace clear_by_pieces_1 with builtin_memset_read_str and pass true to store_by_pieces_d to support vector mode broadcast.
(string_cst_read_str): Change the mode argument from scalar_int_mode to fixed_size_mode.
* expr.h (by_pieces_constfn): Change scalar_int_mode to fixed_size_mode.
(by_pieces_prev): Likewise.
* rtl.h (lowpart_subreg_regno): New.
* rtlanal.c (lowpart_subreg_regno): New. A wrapper around simplify_subreg_regno.
* target.def (gen_memset_scratch_rtx): New hook.
* doc/tm.texi.in: Add TARGET_GEN_MEMSET_SCRATCH_RTX.
* doc/tm.texi: Regenerated.

gcc/testsuite/

* gcc.target/i386/pr100865-3.c: Expect vmovdqu8 instead of vmovdqu.
* gcc.target/i386/pr100865-4b.c: Likewise.
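At the source level, the idea behind using a QI vector mode for memset can be sketched with the GCC/Clang vector extension: broadcast the fill byte into every lane of a 16-byte vector once, then store that vector. The sketch below assumes 16 <= n <= 32 and uses two possibly overlapping unaligned 16-byte stores. The helper name memset_16_32 and the v16qi typedef are hypothetical and not GCC internals; which instructions are actually emitted is up to the compiler and target.

```c
#include <assert.h>
#include <stddef.h>
#include <string.h>

/* 16 bytes treated as one vector (GCC/Clang vector extension), a
   source-level stand-in for a QI vector mode such as V16QImode.  */
typedef unsigned char v16qi __attribute__ ((vector_size (16)));

/* Hypothetical helper: set n bytes (16 <= n <= 32) of dst to c.  */
static void
memset_16_32 (unsigned char *dst, unsigned char c, size_t n)
{
  assert (n >= 16 && n <= 32);

  /* Broadcast the fill byte into all 16 lanes.  */
  v16qi v = { 0 };
  for (int i = 0; i < 16; i++)
    v[i] = c;

  /* Two unaligned 16-byte stores; when n < 32 they overlap in the
     middle and rewrite some bytes with the same value.  */
  memcpy (dst, &v, sizeof v);
  memcpy (dst + n - sizeof v, &v, sizeof v);
}
```

Because every lane holds the same byte, the overlapping second store is harmless, which is what lets a single broadcast value cover any length in the range.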
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=90773 --- Comment #8 from CVS Commits ---
The master branch has been updated by H.J. Lu:

https://gcc.gnu.org/g:53fb833d635da04f5b44af16bcea1082e7b59e75

commit r12-978-g53fb833d635da04f5b44af16bcea1082e7b59e75
Author: H.J. Lu
Date: Fri May 21 05:16:20 2021 -0700

Elide expand_constructor if move by pieces is preferred

Elide expand_constructor when the constructor is in static storage, is not mostly zeros, and can be moved by pieces: moving it by pieces is usually more efficient than performing a series of stores from immediates.

2021-05-21 Richard Biener, H.J. Lu

gcc/

PR middle-end/90773
* expr.c (expand_constructor): Elide expand_constructor if move by pieces is preferred.

gcc/testsuite/

* gcc.target/i386/pr90773-24.c: New test.
* gcc.target/i386/pr90773-25.c: Likewise.
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=90773 --- Comment #7 from CVS Commits ---
The master branch has been updated by H.J. Lu:

https://gcc.gnu.org/g:86c77c52f7b812adccf9620860f7c392f9a16cfc

commit r12-313-g86c77c52f7b812adccf9620860f7c392f9a16cfc
Author: H.J. Lu
Date: Thu Apr 29 11:12:09 2021 -0700

Don't use nullptr return from simplify_gen_subreg

Check the return value of simplify_gen_subreg and don't use it if it is nullptr.

PR middle-end/90773
* builtins.c (builtin_memset_gen_str): Don't use a nullptr return from simplify_gen_subreg.
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=90773 --- Comment #6 from CVS Commits ---
The master branch has been updated by H.J. Lu:

https://gcc.gnu.org/g:985b3a6837dee7001e6b618f073ed74f0edf5787

commit r12-285-g985b3a6837dee7001e6b618f073ed74f0edf5787
Author: H.J. Lu
Date: Mon Jun 10 09:57:15 2019 -0700

Generate offset adjusted operation for op_by_pieces operations

Add an overlap_op_by_pieces_p target hook for op_by_pieces operations between two areas of memory to generate one offset adjusted operation in the smallest integer mode for the remaining bytes on the last piece operation of a memory region, to avoid doing more than one smaller operation.

Pass the RTL information from the previous iteration to m_constfn in op_by_pieces operation so that builtin_memset_[read|gen]_str can generate the new RTL from the previous RTL.

Tested on Linux/x86-64.

gcc/

PR middle-end/90773
* builtins.c (builtin_memcpy_read_str): Add a dummy argument.
(builtin_strncpy_read_str): Likewise.
(builtin_memset_read_str): Add an argument for the previous RTL information and generate the new RTL from the previous RTL info.
(builtin_memset_gen_str): Likewise.
* builtins.h (builtin_strncpy_read_str): Update the prototype.
(builtin_memset_read_str): Likewise.
* expr.c (by_pieces_ninsns): If targetm.overlap_op_by_pieces_p() returns true, round up size and alignment to the widest integer mode for maximum size.
(pieces_addr::adjust): Add a pointer to by_pieces_prev argument and pass it to m_constfn.
(op_by_pieces_d): Add m_push and m_overlap_op_by_pieces.
(op_by_pieces_d::op_by_pieces_d): Add a bool argument to initialize m_push. Initialize m_overlap_op_by_pieces with targetm.overlap_op_by_pieces_p ().
(op_by_pieces_d::run): Pass the previous RTL information to pieces_addr::adjust and generate overlapping operations if m_overlap_op_by_pieces is true.
(PUSHG_P): New.
(move_by_pieces_d::move_by_pieces_d): Updated for op_by_pieces_d change.
(store_by_pieces_d::store_by_pieces_d): Updated for op_by_pieces_d change.
(can_store_by_pieces): Use by_pieces_constfn on constfun.
(store_by_pieces): Use by_pieces_constfn on constfun. Updated for op_by_pieces_d change.
(clear_by_pieces_1): Add a dummy argument.
(clear_by_pieces): Updated for op_by_pieces_d change.
(compare_by_pieces_d::compare_by_pieces_d): Likewise.
(string_cst_read_str): Add a dummy argument.
* expr.h (by_pieces_constfn): Add a dummy argument.
(by_pieces_prev): New.
* target.def (overlap_op_by_pieces_p): New target hook.
* config/i386/i386.c (TARGET_OVERLAP_OP_BY_PIECES_P): New.
* doc/tm.texi.in: Add TARGET_OVERLAP_OP_BY_PIECES_P.
* doc/tm.texi: Regenerated.

gcc/testsuite/

PR middle-end/90773
* g++.dg/pr90773-1.h: New test.
* g++.dg/pr90773-1a.C: Likewise.
* g++.dg/pr90773-1b.C: Likewise.
* g++.dg/pr90773-1c.C: Likewise.
* g++.dg/pr90773-1d.C: Likewise.
* gcc.target/i386/pr90773-1.c: Likewise.
* gcc.target/i386/pr90773-2.c: Likewise.
* gcc.target/i386/pr90773-3.c: Likewise.
* gcc.target/i386/pr90773-4.c: Likewise.
* gcc.target/i386/pr90773-5.c: Likewise.
* gcc.target/i386/pr90773-6.c: Likewise.
* gcc.target/i386/pr90773-7.c: Likewise.
* gcc.target/i386/pr90773-8.c: Likewise.
* gcc.target/i386/pr90773-9.c: Likewise.
* gcc.target/i386/pr90773-10.c: Likewise.
* gcc.target/i386/pr90773-11.c: Likewise.
* gcc.target/i386/pr90773-12.c: Likewise.
* gcc.target/i386/pr90773-13.c: Likewise.
* gcc.target/i386/pr90773-14.c: Likewise.
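A C-level sketch of the offset-adjusted last piece for a copy between two memory areas, assuming 8-byte pieces and n >= 8. Whole 8-byte pieces are copied first; any 1 to 7 remaining bytes are handled by a single 8-byte piece moved back so that it ends exactly at byte n. That piece overlaps the previous one instead of being split into 4-, 2- and 1-byte operations. The helper name copy_overlap is hypothetical; the memcpy calls through a uint64_t stand in for unaligned 8-byte loads and stores, and the source and destination buffers must not overlap each other.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>
#include <string.h>

/* Hypothetical helper: copy n >= 8 bytes from src to dst (non-overlapping
   buffers) using only 8-byte pieces.  Returns the number of pieces.  */
static size_t
copy_overlap (void *dst, const void *src, size_t n)
{
  assert (n >= 8);
  unsigned char *d = dst;
  const unsigned char *s = src;
  uint64_t t;
  size_t off = 0, pieces = 0;

  /* Whole 8-byte pieces.  */
  for (; n - off >= 8; off += 8, pieces++)
    {
      memcpy (&t, s + off, 8);
      memcpy (d + off, &t, 8);
    }

  /* 1 to 7 bytes left: one 8-byte piece ending at byte n, overlapping
     bytes already copied by the previous piece.  */
  if (off < n)
    {
      memcpy (&t, s + n - 8, 8);
      memcpy (d + n - 8, &t, 8);
      pieces++;
    }
  return pieces;
}
```

With this scheme a 15-byte copy takes two 8-byte pieces (at offsets 0 and 7), whereas splitting the tail by powers of two takes four (8, 4, 2 and 1 bytes).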
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=90773 --- Comment #5 from CVS Commits ---
The master branch has been updated by H.J. Lu:

https://gcc.gnu.org/g:3bb41228d76b3a3cbd9923d57388f0903f7683de

commit r12-162-g3bb41228d76b3a3cbd9923d57388f0903f7683de
Author: H.J. Lu
Date: Mon Apr 26 15:36:18 2021 -0700

op_by_pieces_d::run: Change a while loop to a do-while loop

Change a while loop in op_by_pieces_d::run to a do-while loop to prepare for offset adjusted operation for the remaining bytes on the last piece operation of a memory region.

PR middle-end/90773
* expr.c (op_by_pieces_d::get_usable_mode): New member function.
(op_by_pieces_d::run): Change a while loop to a do-while loop.
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=90773 Jeffrey A. Law changed: CC added law at redhat dot com --- Comment #4 from Jeffrey A. Law --- But isn't -Os a better choice if we care about this? It gives more compact code. I guess I'm just not sure optimizing this using an overlapping store is all that important.
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=90773 --- Comment #3 from H.J. Lu --- Created attachment 46476 --> https://gcc.gnu.org/bugzilla/attachment.cgi?id=46476&action=edit I am testing this patch.
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=90773 --- Comment #2 from H.J. Lu --- Something like this:

```diff
diff --git a/gcc/expr.c b/gcc/expr.c
index c78bc74c0d9..4412aa7518c 100644
--- a/gcc/expr.c
+++ b/gcc/expr.c
@@ -1090,10 +1090,13 @@ op_by_pieces_d::op_by_pieces_d (rtx to, bool to_load,
 void
 op_by_pieces_d::run ()
 {
+  bool started = false;
+
   while (m_max_size > 1 && m_len > 0)
     {
       scalar_int_mode mode = widest_int_mode_for_size (m_max_size);
 
+repeat:
       if (prepare_mode (mode, m_align))
         {
           unsigned int size = GET_MODE_SIZE (mode);
@@ -1101,6 +1104,8 @@ op_by_pieces_d::run ()
 
           while (m_len >= size)
             {
+              started = true;
+
               if (m_reverse)
                 m_offset -= size;
 
@@ -1124,6 +1129,24 @@ op_by_pieces_d::run ()
           finish_mode (mode);
         }
 
+      if (m_len == 0)
+        break;
+
+      if (started)
+        {
+          mode = smallest_int_mode_for_size (m_len * BITS_PER_UNIT);
+          unsigned int last_gap = GET_MODE_SIZE (mode) - m_len;
+          if (last_gap)
+            {
+              if (m_reverse)
+                m_offset += last_gap;
+              else
+                m_offset -= last_gap;
+              m_len += last_gap;
+              goto repeat;
+            }
+        }
+
       m_max_size = GET_MODE_SIZE (mode);
     }
 
```

which should be opt-in by target.
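The piece schedule that the patch above aims for can be modelled outside the compiler. The sketch below assumes power-of-two piece sizes up to max_piece and forward order only (no m_reverse). struct piece and plan_pieces are hypothetical names, not GCC internals, and the model follows the idea of the patch (once a piece has been emitted, cover the remainder with one piece of the smallest power-of-two size, shifted back by last_gap) rather than its exact control flow.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>

struct piece { size_t offset, size; };

/* Hypothetical model: split n bytes into power-of-two pieces no wider
   than max_piece (itself a power of two).  With overlap, once at least
   one piece has been emitted, the remainder becomes a single piece of
   the smallest power-of-two size covering it, moved back so that it
   ends at byte n.  OUT must have room for the returned count.  */
static size_t
plan_pieces (size_t n, size_t max_piece, bool overlap, struct piece *out)
{
  assert (max_piece != 0 && (max_piece & (max_piece - 1)) == 0);
  size_t count = 0, off = 0, len = n;

  for (size_t size = max_piece; size >= 1 && len > 0; size /= 2)
    {
      while (len >= size)
        {
          out[count++] = (struct piece) { off, size };
          off += size;
          len -= size;
        }
      if (overlap && count > 0 && len > 0)
        {
          size_t last = 1;
          while (last < len)
            last *= 2;
          /* last - len corresponds to last_gap in the patch.  */
          out[count++] = (struct piece) { n - last, last };
          len = 0;
        }
    }
  return count;
}
```

For n = 15 and 8-byte pieces this yields pieces at offsets 0 and 7 instead of four pieces of 8, 4, 2 and 1 bytes; for n = 7 it yields two 4-byte pieces at offsets 0 and 3.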
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=90773 Richard Biener changed: Keywords added missed-optimization; Status changed from UNCONFIRMED to NEW; Last reconfirmed set to 2019-06-07; Ever confirmed changed from 0 to 1 --- Comment #1 from Richard Biener --- Confirmed. Note that both the unaligned load and the unaligned store may have issues with preceding stores or following loads and store-to-load forwarding. I'm not sure the speculative forwarding of aligned parts will be retired successfully because of the aliased unaligned parts, even if they have the same value. But definitely an improvement for -Os. Probably needs some target tuning knobs if implemented in the middle-end block-copy machinery.