On November 3, 2017 8:17:30 PM GMT+01:00, Jakub Jelinek <ja...@redhat.com> wrote:
>On Fri, Nov 03, 2017 at 03:04:18PM +0100, Jakub Jelinek wrote:
>> single-use vs. multiple uses is something I've thought about, but don't
>> know whether it is better to require single-use or not (or sometimes,
>> under some condition?).  Say if we have:
>
>So, here is what I've committed in the end after bootstrapping/regtesting
>it on x86_64-linux and i686-linux; the only changes from the earlier patch
>were comments and the addition of has_single_use checks.
>
>In those bootstraps/regtests, the number of integer_cst stores was, as
>expected, the same, and so was the number of bit_*_expr cases, but it
>apparently matters a lot for memory copying (rhs_code MEM_REF).
>Without this patch, new/orig stores:
>16943 35369
>and with the patch:
>12111 24911
>So, perhaps we'll need to do something smarter (approximate how many
>original loads would be kept and how many new loads/stores we'd need to
>add in order to get rid of how many original stores).
>Or allow multiple uses for the MEM_REF rhs_code only, and for anything
>else require single use.
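If I read the numbers right, the MEM_REF merges lost to the has_single_use
checks are the cases where a loaded value has a second use, something like
the following made-up reduced example (not one of the new tests):

  char
  f (char *p, const char *q)
  {
    char c = q[0];  /* q[0] has two uses: the store below and the return.  */
    p[0] = c;
    p[1] = q[1];
    p[2] = q[2];
    p[3] = q[3];
    return c;  /* The extra use of q[0] makes has_single_use fail, which
                  now blocks merging the four byte stores into a single
                  4-byte load plus 4-byte store.  */
  }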
Probably interesting to look at the individual cases. But yes, it should be factored into the cost model somehow. It's possibly also increasing register pressure.

Richard.

>2017-11-03  Jakub Jelinek  <ja...@redhat.com>
>
>        PR tree-optimization/78821
>        * gimple-ssa-store-merging.c: Update the file comment.
>        (MAX_STORE_ALIAS_CHECKS): Define.
>        (struct store_operand_info): New type.
>        (store_operand_info::store_operand_info): New constructor.
>        (struct store_immediate_info): Add rhs_code and ops data members.
>        (store_immediate_info::store_immediate_info): Add rhscode, op0r
>        and op1r arguments to the ctor, initialize corresponding data members.
>        (struct merged_store_group): Add load_align_base and load_align
>        data members.
>        (merged_store_group::merged_store_group): Initialize them.
>        (merged_store_group::do_merge): Update them.
>        (merged_store_group::apply_stores): Pick the constant for
>        encode_tree_to_bitpos from one of the two operands, or skip
>        encode_tree_to_bitpos if neither operand is a constant.
>        (class pass_store_merging): Add process_store method decl.  Remove
>        bool argument from terminate_all_aliasing_chains method decl.
>        (pass_store_merging::terminate_all_aliasing_chains): Remove
>        var_offset_p argument and corresponding handling.
>        (stmts_may_clobber_ref_p): New function.
>        (compatible_load_p): New function.
>        (imm_store_chain_info::coalesce_immediate_stores): Terminate group
>        if there is overlap and rhs_code is not INTEGER_CST.  For
>        non-overlapping stores terminate group if rhs is not mergeable.
>        (get_alias_type_for_stmts): Change first argument from
>        auto_vec<gimple *> & to vec<gimple *> &.  Add IS_LOAD, CLIQUEP and
>        BASEP arguments.  If IS_LOAD is true, look at rhs1 of the stmts
>        instead of lhs.  Compute *CLIQUEP and *BASEP in addition to the
>        alias type.
>        (get_location_for_stmts): Change first argument from
>        auto_vec<gimple *> & to vec<gimple *> &.
>        (struct split_store): Remove orig_stmts data member, add orig_stores.
>        (split_store::split_store): Create orig_stores rather than orig_stmts.
>        (find_constituent_stmts): Renamed to ...
>        (find_constituent_stores): ... this.  Change second argument from
>        vec<gimple *> * to vec<store_immediate_info *> *, push pointers
>        to info structures rather than the statements.
>        (split_group): Rename ALLOW_UNALIGNED argument to
>        ALLOW_UNALIGNED_STORE, add ALLOW_UNALIGNED_LOAD argument and handle
>        it.  Adjust find_constituent_stores caller.
>        (imm_store_chain_info::output_merged_store): Handle rhs_code other
>        than INTEGER_CST, adjust split_group, get_alias_type_for_stmts and
>        get_location_for_stmts callers.  Set MR_DEPENDENCE_CLIQUE and
>        MR_DEPENDENCE_BASE on the MEM_REFs if they are the same in all stores.
>        (mem_valid_for_store_merging): New function.
>        (handled_load): New function.
>        (pass_store_merging::process_store): New method.
>        (pass_store_merging::execute): Use process_store method.  Adjust
>        terminate_all_aliasing_chains caller.
>
>        * gcc.dg/store_merging_13.c: New test.
>        * gcc.dg/store_merging_14.c: New test.
>
>--- gcc/gimple-ssa-store-merging.c.jj	2017-11-03 15:37:02.869561500 +0100
>+++ gcc/gimple-ssa-store-merging.c	2017-11-03 16:15:15.059282459 +0100
>@@ -19,7 +19,8 @@
>    <http://www.gnu.org/licenses/>.  */
>
> /* The purpose of this pass is to combine multiple memory stores of
>-   constant values to consecutive memory locations into fewer wider stores.
>+   constant values, values loaded from memory or bitwise operations
>+   on those, to consecutive memory locations into fewer wider stores.
>    For example, if we have a sequence performing four byte stores to
>    consecutive memory locations:
>    [p     ] := imm1;
>@@ -29,21 +30,49 @@
>    we can transform this into a single 4-byte store if the target supports it:
>    [p] := imm1:imm2:imm3:imm4 //concatenated immediates according to endianness.
>
>+   Or:
>+   [p     ] := [q     ];
>+   [p + 1B] := [q + 1B];
>+   [p + 2B] := [q + 2B];
>+   [p + 3B] := [q + 3B];
>+   if there is no overlap, this can be transformed into a single 4-byte
>+   load followed by a single 4-byte store.
>+
>+   Or:
>+   [p     ] := [q     ] ^ imm1;
>+   [p + 1B] := [q + 1B] ^ imm2;
>+   [p + 2B] := [q + 2B] ^ imm3;
>+   [p + 3B] := [q + 3B] ^ imm4;
>+   if there is no overlap, this can be transformed into a single 4-byte
>+   load, xored with imm1:imm2:imm3:imm4 and stored using a single 4-byte store.
>+
>    The algorithm is applied to each basic block in three phases:
>
>-   1) Scan through the basic block recording constant assignments to
>+   1) Scan through the basic block recording assignments to
>    destinations that can be expressed as a store to memory of a certain size
>-   at a certain bit offset.  Record store chains to different bases in a
>-   hash_map (m_stores) and make sure to terminate such chains when appropriate
>-   (for example when when the stored values get used subsequently).
>+   at a certain bit offset from expressions we can handle.  For bit-fields
>+   we also note the surrounding bit region, bits that could be stored in
>+   a read-modify-write operation when storing the bit-field.  Record store
>+   chains to different bases in a hash_map (m_stores) and make sure to
>+   terminate such chains when appropriate (for example when the stored
>+   values get used subsequently).
>    These stores can be a result of structure element initializers, array stores
>    etc.  A store_immediate_info object is recorded for every such store.
>    Record as many such assignments to a single base as possible until a
>    statement that interferes with the store sequence is encountered.
>+   Each store has up to 2 operands, which can be an immediate constant
>+   or a memory load, from which the value to be stored can be computed.
>+   At most one of the operands can be a constant.  The operands are recorded
>+   in a store_operand_info struct.
>
>    2) Analyze the chain of stores recorded in phase 1) (i.e. the vector of
>    store_immediate_info objects) and coalesce contiguous stores into
>-   merged_store_group objects.
>+   merged_store_group objects.  For bit-field stores, we don't need to
>+   require the stores to be contiguous, just their surrounding bit regions
>+   have to be contiguous.  If the expression being stored is different
>+   between adjacent stores, such as one store storing a constant and the
>+   following one storing a value loaded from memory, or if the loaded memory
>+   objects are not adjacent, a new merged_store_group is created as well.
>
>    For example, given the stores:
>    [p     ] := 0;
>@@ -134,8 +163,35 @@
> #define MAX_STORE_BITSIZE (BITS_PER_WORD)
> #define MAX_STORE_BYTES (MAX_STORE_BITSIZE / BITS_PER_UNIT)
>
>+/* Limit to bound the number of aliasing checks for loads with the same
>+   vuse as the corresponding store.  */
>+#define MAX_STORE_ALIAS_CHECKS 64
>+
> namespace {
>
>+/* Struct recording one operand for the store, which is either a constant,
>+   in which case VAL represents the constant and all the other fields are
>+   zero, or a memory load, in which case VAL represents the reference,
>+   BASE_ADDR is non-NULL and the other fields also reflect the memory load.  */
>+
>+struct store_operand_info
>+{
>+  tree val;
>+  tree base_addr;
>+  unsigned HOST_WIDE_INT bitsize;
>+  unsigned HOST_WIDE_INT bitpos;
>+  unsigned HOST_WIDE_INT bitregion_start;
>+  unsigned HOST_WIDE_INT bitregion_end;
>+  gimple *stmt;
>+  store_operand_info ();
>+};
>+
>+store_operand_info::store_operand_info ()
>+  : val (NULL_TREE), base_addr (NULL_TREE), bitsize (0), bitpos (0),
>+    bitregion_start (0), bitregion_end (0), stmt (NULL)
>+{
>+}
>+
> /* Struct recording the information about a single store of an immediate
>    to memory.  These are created in the first phase and coalesced into
>    merged_store_group objects in the second phase.  */
>@@ -149,9 +205,17 @@ struct store_immediate_info
>   unsigned HOST_WIDE_INT bitregion_end;
>   gimple *stmt;
>   unsigned int order;
>+  /* INTEGER_CST for constant stores, MEM_REF for memory copy or
>+     BIT_*_EXPR for logical bitwise operation.  */
>+  enum tree_code rhs_code;
>+  /* Operands.  For BIT_*_EXPR rhs_code both operands are used, otherwise
>+     just the first one.  */
>+  store_operand_info ops[2];
>   store_immediate_info (unsigned HOST_WIDE_INT, unsigned HOST_WIDE_INT,
>                         unsigned HOST_WIDE_INT, unsigned HOST_WIDE_INT,
>-                        gimple *, unsigned int);
>+                        gimple *, unsigned int, enum tree_code,
>+                        const store_operand_info &,
>+                        const store_operand_info &);
> };
>
> store_immediate_info::store_immediate_info (unsigned HOST_WIDE_INT bs,
>@@ -159,11 +223,22 @@ store_immediate_info::store_immediate_in
>                                             unsigned HOST_WIDE_INT brs,
>                                             unsigned HOST_WIDE_INT bre,
>                                             gimple *st,
>-                                            unsigned int ord)
>+                                            unsigned int ord,
>+                                            enum tree_code rhscode,
>+                                            const store_operand_info &op0r,
>+                                            const store_operand_info &op1r)
>   : bitsize (bs), bitpos (bp), bitregion_start (brs), bitregion_end (bre),
>-    stmt (st), order (ord)
>+    stmt (st), order (ord), rhs_code (rhscode)
>+#if __cplusplus >= 201103L
>+    , ops { op0r, op1r }
>+{
>+}
>+#else
> {
>+  ops[0] = op0r;
>+  ops[1] = op1r;
> }
>+#endif
>
> /* Struct representing a group of stores to contiguous memory locations.
>    These are produced by the second phase (coalescing) and consumed in the
>@@ -178,8 +253,10 @@ struct merged_store_group
>   /* The size of the allocated memory for val and mask.  */
>   unsigned HOST_WIDE_INT buf_size;
>   unsigned HOST_WIDE_INT align_base;
>+  unsigned HOST_WIDE_INT load_align_base[2];
>
>   unsigned int align;
>+  unsigned int load_align[2];
>   unsigned int first_order;
>   unsigned int last_order;
>
>@@ -576,6 +653,20 @@ merged_store_group::merged_store_group (
>   get_object_alignment_1 (gimple_assign_lhs (info->stmt),
>                           &align, &align_bitpos);
>   align_base = start - align_bitpos;
>+  for (int i = 0; i < 2; ++i)
>+    {
>+      store_operand_info &op = info->ops[i];
>+      if (op.base_addr == NULL_TREE)
>+        {
>+          load_align[i] = 0;
>+          load_align_base[i] = 0;
>+        }
>+      else
>+        {
>+          get_object_alignment_1 (op.val, &load_align[i], &align_bitpos);
>+          load_align_base[i] = op.bitpos - align_bitpos;
>+        }
>+    }
>   stores.create (1);
>   stores.safe_push (info);
>   last_stmt = info->stmt;
>@@ -608,6 +699,19 @@ merged_store_group::do_merge (store_imme
>       align = this_align;
>       align_base = info->bitpos - align_bitpos;
>     }
>+  for (int i = 0; i < 2; ++i)
>+    {
>+      store_operand_info &op = info->ops[i];
>+      if (!op.base_addr)
>+        continue;
>+
>+      get_object_alignment_1 (op.val, &this_align, &align_bitpos);
>+      if (this_align > load_align[i])
>+        {
>+          load_align[i] = this_align;
>+          load_align_base[i] = op.bitpos - align_bitpos;
>+        }
>+    }
>
>   gimple *stmt = info->stmt;
>   stores.safe_push (info);
>@@ -682,16 +786,21 @@ merged_store_group::apply_stores ()
>   FOR_EACH_VEC_ELT (stores, i, info)
>     {
>       unsigned int pos_in_buffer = info->bitpos - bitregion_start;
>-      bool ret = encode_tree_to_bitpos (gimple_assign_rhs1 (info->stmt),
>-                                        val, info->bitsize,
>-                                        pos_in_buffer, buf_size);
>-      if (dump_file && (dump_flags & TDF_DETAILS))
>+      tree cst = NULL_TREE;
>+      if (info->ops[0].val && info->ops[0].base_addr == NULL_TREE)
>+        cst = info->ops[0].val;
>+      else if (info->ops[1].val && info->ops[1].base_addr == NULL_TREE)
>+        cst = info->ops[1].val;
>+      bool ret = true;
>+      if (cst)
>+        ret = encode_tree_to_bitpos (cst, val, info->bitsize,
>+                                     pos_in_buffer, buf_size);
>+      if (cst && dump_file && (dump_flags & TDF_DETAILS))
>         {
>           if (ret)
>             {
>               fprintf (dump_file, "After writing ");
>-              print_generic_expr (dump_file,
>-                                  gimple_assign_rhs1 (info->stmt), 0);
>+              print_generic_expr (dump_file, cst, 0);
>               fprintf (dump_file, " of size " HOST_WIDE_INT_PRINT_DEC
>                        " at position %d the merged region contains:\n",
>                        info->bitsize, pos_in_buffer);
>@@ -799,9 +908,10 @@ private:
>      decisions when going out of SSA).  */
>   imm_store_chain_info *m_stores_head;
>
>+  void process_store (gimple *);
>   bool terminate_and_process_all_chains ();
>   bool terminate_all_aliasing_chains (imm_store_chain_info **,
>-                                      bool, gimple *);
>+                                      gimple *);
>   bool terminate_and_release_chain (imm_store_chain_info *);
> }; // class pass_store_merging
>
>@@ -831,7 +941,6 @@ pass_store_merging::terminate_and_proces
> bool
> pass_store_merging::terminate_all_aliasing_chains (imm_store_chain_info
>                                                      **chain_info,
>-                                                   bool var_offset_p,
>                                                    gimple *stmt)
> {
>   bool ret = false;
>@@ -845,37 +954,21 @@ pass_store_merging::terminate_all_aliasi
>      of a chain.  */
>   if (chain_info)
>     {
>-      /* We have a chain at BASE and we're writing to [BASE + <variable>].
>-         This can interfere with any of the stores so terminate
>-         the chain.  */
>-      if (var_offset_p)
>-        {
>-          terminate_and_release_chain (*chain_info);
>-          ret = true;
>-        }
>-      /* Otherwise go through every store in the chain to see if it
>-         aliases with any of them.  */
>-      else
>+      store_immediate_info *info;
>+      unsigned int i;
>+      FOR_EACH_VEC_ELT ((*chain_info)->m_store_info, i, info)
>         {
>-          store_immediate_info *info;
>-          unsigned int i;
>-          FOR_EACH_VEC_ELT ((*chain_info)->m_store_info, i, info)
>+          if (ref_maybe_used_by_stmt_p (stmt, gimple_assign_lhs (info->stmt))
>+              || stmt_may_clobber_ref_p (stmt, gimple_assign_lhs (info->stmt)))
>             {
>-              if (ref_maybe_used_by_stmt_p (stmt,
>-                                            gimple_assign_lhs (info->stmt))
>-                  || stmt_may_clobber_ref_p (stmt,
>-                                             gimple_assign_lhs (info->stmt)))
>+              if (dump_file && (dump_flags & TDF_DETAILS))
>                 {
>-                  if (dump_file && (dump_flags & TDF_DETAILS))
>-                    {
>-                      fprintf (dump_file,
>-                               "stmt causes chain termination:\n");
>-                      print_gimple_stmt (dump_file, stmt, 0);
>-                    }
>-                  terminate_and_release_chain (*chain_info);
>-                  ret = true;
>-                  break;
>+                  fprintf (dump_file, "stmt causes chain termination:\n");
>+                  print_gimple_stmt (dump_file, stmt, 0);
>                 }
>+              terminate_and_release_chain (*chain_info);
>+              ret = true;
>+              break;
>             }
>         }
>     }
>@@ -920,6 +1013,125 @@ pass_store_merging::terminate_and_releas
>   return ret;
> }
>
>+/* Return true if stmts in between FIRST (inclusive) and LAST (exclusive)
>+   may clobber REF.  FIRST and LAST must be in the same basic block and
>+   have non-NULL vdef.  */
>+
>+bool
>+stmts_may_clobber_ref_p (gimple *first, gimple *last, tree ref)
>+{
>+  ao_ref r;
>+  ao_ref_init (&r, ref);
>+  unsigned int count = 0;
>+  tree vop = gimple_vdef (last);
>+  gimple *stmt;
>+
>+  gcc_checking_assert (gimple_bb (first) == gimple_bb (last));
>+  do
>+    {
>+      stmt = SSA_NAME_DEF_STMT (vop);
>+      if (stmt_may_clobber_ref_p_1 (stmt, &r))
>+        return true;
>+      /* Avoid quadratic compile time by bounding the number of checks
>+         we perform.  */
>+      if (++count > MAX_STORE_ALIAS_CHECKS)
>+        return true;
>+      vop = gimple_vuse (stmt);
>+    }
>+  while (stmt != first);
>+  return false;
>+}
>+
>+/* Return true if INFO->ops[IDX] is mergeable with the
>+   corresponding loads already in MERGED_STORE group.
>+   BASE_ADDR is the base address of the whole store group.  */
>+
>+bool
>+compatible_load_p (merged_store_group *merged_store,
>+                   store_immediate_info *info,
>+                   tree base_addr, int idx)
>+{
>+  store_immediate_info *infof = merged_store->stores[0];
>+  if (!info->ops[idx].base_addr
>+      || (info->ops[idx].bitpos - infof->ops[idx].bitpos
>+          != info->bitpos - infof->bitpos)
>+      || !operand_equal_p (info->ops[idx].base_addr,
>+                           infof->ops[idx].base_addr, 0))
>+    return false;
>+
>+  store_immediate_info *infol = merged_store->stores.last ();
>+  tree load_vuse = gimple_vuse (info->ops[idx].stmt);
>+  /* In this case all vuses should be the same, e.g.
>+     _1 = s.a; _2 = s.b; _3 = _1 | 1; t.a = _3; _4 = _2 | 2; t.b = _4;
>+     or
>+     _1 = s.a; _2 = s.b; t.a = _1; t.b = _2;
>+     and we can emit the coalesced load next to any of those loads.  */
>+  if (gimple_vuse (infof->ops[idx].stmt) == load_vuse
>+      && gimple_vuse (infol->ops[idx].stmt) == load_vuse)
>+    return true;
>+
>+  /* Otherwise, at least for now require that the load has the same
>+     vuse as the store.  See following examples.  */
>+  if (gimple_vuse (info->stmt) != load_vuse)
>+    return false;
>+
>+  if (gimple_vuse (infof->stmt) != gimple_vuse (infof->ops[idx].stmt)
>+      || (infof != infol
>+          && gimple_vuse (infol->stmt) != gimple_vuse (infol->ops[idx].stmt)))
>+    return false;
>+
>+  /* If the load is from the same location as the store, already
>+     the construction of the immediate chain info guarantees no intervening
>+     stores, so no further checks are needed.  Example:
>+     _1 = s.a; _2 = _1 & -7; s.a = _2; _3 = s.b; _4 = _3 & -7; s.b = _4;  */
>+  if (info->ops[idx].bitpos == info->bitpos
>+      && operand_equal_p (info->ops[idx].base_addr, base_addr, 0))
>+    return true;
>+
>+  /* Otherwise, we need to punt if any of the loads can be clobbered by any
>+     of the stores in the group, or any other stores in between those.
>+     Previous calls to compatible_load_p ensured that for all the
>+     merged_store->stores IDX loads, no stmts starting with
>+     merged_store->first_stmt and ending right before merged_store->last_stmt
>+     clobbers those loads.  */
>+  gimple *first = merged_store->first_stmt;
>+  gimple *last = merged_store->last_stmt;
>+  unsigned int i;
>+  store_immediate_info *infoc;
>+  /* The stores are sorted by increasing store bitpos, so if the info->stmt
>+     store comes before the first load so far, we'll be changing
>+     merged_store->first_stmt.  In that case we need to give up if
>+     any of the stmts in the new range clobber any of the earlier
>+     processed loads.  */
>+  if (info->order < merged_store->first_order)
>+    {
>+      FOR_EACH_VEC_ELT (merged_store->stores, i, infoc)
>+        if (stmts_may_clobber_ref_p (info->stmt, first, infoc->ops[idx].val))
>+          return false;
>+      first = info->stmt;
>+    }
>+  /* Similarly, we could change merged_store->last_stmt, so ensure
>+     in that case no stmts in the new range clobber any of the earlier
>+     processed loads.  */
>+  else if (info->order > merged_store->last_order)
>+    {
>+      FOR_EACH_VEC_ELT (merged_store->stores, i, infoc)
>+        if (stmts_may_clobber_ref_p (last, info->stmt, infoc->ops[idx].val))
>+          return false;
>+      last = info->stmt;
>+    }
>+  /* And finally, we'd be adding a new load to the set, ensure it isn't
>+     clobbered in the new range.  */
>+  if (stmts_may_clobber_ref_p (first, last, info->ops[idx].val))
>+    return false;
>+
>+  /* Otherwise, we are looking for:
>+     _1 = s.a; _2 = _1 ^ 15; t.a = _2; _3 = s.b; _4 = _3 ^ 15; t.b = _4;
>+     or
>+     _1 = s.a; t.a = _1; _2 = s.b; t.b = _2;  */
>+  return true;
>+}
>+
> /* Go through the candidate stores recorded in m_store_info and merge them
>    into merged_store_group objects recorded into m_merged_store_groups
>    representing the widened stores.  Return true if coalescing was successful
>@@ -967,32 +1179,56 @@ imm_store_chain_info::coalesce_immediate
>       if (IN_RANGE (start, merged_store->start,
>                     merged_store->start + merged_store->width - 1))
>         {
>-          merged_store->merge_overlapping (info);
>-          continue;
>+          /* Only allow overlapping stores of constants.  */
>+          if (info->rhs_code == INTEGER_CST
>+              && merged_store->stores[0]->rhs_code == INTEGER_CST)
>+            {
>+              merged_store->merge_overlapping (info);
>+              continue;
>+            }
>+        }
>+      /* |---store 1---||---store 2---|
>+         This store is consecutive to the previous one.
>+         Merge it into the current store group.  There can be gaps in between
>+         the stores, but there can't be gaps in between bitregions.  */
>+      else if (info->bitregion_start <= merged_store->bitregion_end
>+               && info->rhs_code == merged_store->stores[0]->rhs_code)
>+        {
>+          store_immediate_info *infof = merged_store->stores[0];
>+
>+          /* All the rhs_code ops that take 2 operands are commutative,
>+             swap the operands if that could make them compatible.  */
>+          if (infof->ops[0].base_addr
>+              && infof->ops[1].base_addr
>+              && info->ops[0].base_addr
>+              && info->ops[1].base_addr
>+              && (info->ops[1].bitpos - infof->ops[0].bitpos
>+                  == info->bitpos - infof->bitpos)
>+              && operand_equal_p (info->ops[1].base_addr,
>+                                  infof->ops[0].base_addr, 0))
>+            std::swap (info->ops[0], info->ops[1]);
>+          if ((!infof->ops[0].base_addr
>+               || compatible_load_p (merged_store, info, base_addr, 0))
>+              && (!infof->ops[1].base_addr
>+                  || compatible_load_p (merged_store, info, base_addr, 1)))
>+            {
>+              merged_store->merge_into (info);
>+              continue;
>+            }
>         }
>
>       /* |---store 1---| <gap> |---store 2---|.
>-         Gap between stores.  Start a new group if there are any gaps
>-         between bitregions.  */
>-      if (info->bitregion_start > merged_store->bitregion_end)
>-        {
>-          /* Try to apply all the stores recorded for the group to determine
>-             the bitpattern they write and discard it if that fails.
>-             This will also reject single-store groups.  */
>-          if (!merged_store->apply_stores ())
>-            delete merged_store;
>-          else
>-            m_merged_store_groups.safe_push (merged_store);
>-
>-          merged_store = new merged_store_group (info);
>+         Gap between stores or the rhs not compatible.  Start a new group.  */
>
>-          continue;
>-        }
>+      /* Try to apply all the stores recorded for the group to determine
>+         the bitpattern they write and discard it if that fails.
>+         This will also reject single-store groups.  */
>+      if (!merged_store->apply_stores ())
>+        delete merged_store;
>+      else
>+        m_merged_store_groups.safe_push (merged_store);
>
>-      /* |---store 1---||---store 2---|
>-         This store is consecutive to the previous one.
>-         Merge it into the current store group.  */
>-      merged_store->merge_into (info);
>+      merged_store = new merged_store_group (info);
>     }
>
>   /* Record or discard the last store group.  */
>@@ -1014,35 +1250,57 @@ imm_store_chain_info::coalesce_immediate
>   return success;
> }
>
>-/* Return the type to use for the merged stores described by STMTS.
>-   This is needed to get the alias sets right.  */
>+/* Return the type to use for the merged stores or loads described by STMTS.
>+   This is needed to get the alias sets right.  If IS_LOAD, look for rhs,
>+   otherwise lhs.  Additionally set *CLIQUEP and *BASEP to MR_DEPENDENCE_*
>+   of the MEM_REFs if any.  */
>
> static tree
>-get_alias_type_for_stmts (auto_vec<gimple *> &stmts)
>+get_alias_type_for_stmts (vec<gimple *> &stmts, bool is_load,
>+                          unsigned short *cliquep, unsigned short *basep)
> {
>   gimple *stmt;
>   unsigned int i;
>-  tree lhs = gimple_assign_lhs (stmts[0]);
>-  tree type = reference_alias_ptr_type (lhs);
>+  tree type = NULL_TREE;
>+  tree ret = NULL_TREE;
>+  *cliquep = 0;
>+  *basep = 0;
>
>   FOR_EACH_VEC_ELT (stmts, i, stmt)
>     {
>-      if (i == 0)
>-        continue;
>+      tree ref = is_load ? gimple_assign_rhs1 (stmt)
>+                         : gimple_assign_lhs (stmt);
>+      tree type1 = reference_alias_ptr_type (ref);
>+      tree base = get_base_address (ref);
>
>-      lhs = gimple_assign_lhs (stmt);
>-      tree type1 = reference_alias_ptr_type (lhs);
>+      if (i == 0)
>+        {
>+          if (TREE_CODE (base) == MEM_REF)
>+            {
>+              *cliquep = MR_DEPENDENCE_CLIQUE (base);
>+              *basep = MR_DEPENDENCE_BASE (base);
>+            }
>+          ret = type = type1;
>+          continue;
>+        }
>       if (!alias_ptr_types_compatible_p (type, type1))
>-        return ptr_type_node;
>+        ret = ptr_type_node;
>+      if (TREE_CODE (base) != MEM_REF
>+          || *cliquep != MR_DEPENDENCE_CLIQUE (base)
>+          || *basep != MR_DEPENDENCE_BASE (base))
>+        {
>+          *cliquep = 0;
>+          *basep = 0;
>+        }
>     }
>-  return type;
>+  return ret;
> }
>
> /* Return the location_t information we can find among the statements
>    in STMTS.  */
>
> static location_t
>-get_location_for_stmts (auto_vec<gimple *> &stmts)
>+get_location_for_stmts (vec<gimple *> &stmts)
> {
>   gimple *stmt;
>   unsigned int i;
>@@ -1062,7 +1320,7 @@ struct split_store
>   unsigned HOST_WIDE_INT bytepos;
>   unsigned HOST_WIDE_INT size;
>   unsigned HOST_WIDE_INT align;
>-  auto_vec<gimple *> orig_stmts;
>+  auto_vec<store_immediate_info *> orig_stores;
>   /* True if there is a single orig stmt covering the whole split store.  */
>   bool orig;
>   split_store (unsigned HOST_WIDE_INT, unsigned HOST_WIDE_INT,
>@@ -1076,21 +1334,20 @@ split_store::split_store (unsigned HOST_
>                           unsigned HOST_WIDE_INT al)
>   : bytepos (bp), size (sz), align (al), orig (false)
> {
>-  orig_stmts.create (0);
>+  orig_stores.create (0);
> }
>
>-/* Record all statements corresponding to stores in GROUP that write to
>-   the region starting at BITPOS and is of size BITSIZE.  Record such
>-   statements in STMTS if non-NULL.  The stores in GROUP must be sorted by
>-   bitposition.  Return INFO if there is exactly one original store
>-   in the range.  */
>+/* Record all stores in GROUP that write to the region starting at BITPOS
>+   that is of size BITSIZE.  Record infos for such statements in STORES if
>+   non-NULL.  The stores in GROUP must be sorted by bitposition.  Return INFO
>+   if there is exactly one original store in the range.  */
>
> static store_immediate_info *
>-find_constituent_stmts (struct merged_store_group *group,
>-                        vec<gimple *> *stmts,
>-                        unsigned int *first,
>-                        unsigned HOST_WIDE_INT bitpos,
>-                        unsigned HOST_WIDE_INT bitsize)
>+find_constituent_stores (struct merged_store_group *group,
>+                         vec<store_immediate_info *> *stores,
>+                         unsigned int *first,
>+                         unsigned HOST_WIDE_INT bitpos,
>+                         unsigned HOST_WIDE_INT bitsize)
> {
>   store_immediate_info *info, *ret = NULL;
>   unsigned int i;
>@@ -1119,9 +1376,9 @@ find_constituent_stmts (struct merged_st
>       if (stmt_start >= end)
>         return ret;
>
>-      if (stmts)
>+      if (stores)
>         {
>-          stmts->safe_push (info->stmt);
>+          stores->safe_push (info);
>           if (ret)
>             {
>               ret = NULL;
>@@ -1143,11 +1400,14 @@ find_constituent_stmts (struct merged_st
>    This is to separate the splitting strategy from the statement
>    building/emission/linking done in output_merged_store.
>    Return number of new stores.
>+   If ALLOW_UNALIGNED_STORE is false, then all stores must be aligned.
>+   If ALLOW_UNALIGNED_LOAD is false, then all loads must be aligned.
>    If SPLIT_STORES is NULL, it is just a dry run to count number of
>    new stores.  */
>
> static unsigned int
>-split_group (merged_store_group *group, bool allow_unaligned,
>+split_group (merged_store_group *group, bool allow_unaligned_store,
>+             bool allow_unaligned_load,
>              vec<struct split_store *> *split_stores)
> {
>   unsigned HOST_WIDE_INT pos = group->bitregion_start;
>@@ -1155,6 +1415,7 @@ split_group (merged_store_group *group,
>   unsigned HOST_WIDE_INT bytepos = pos / BITS_PER_UNIT;
>   unsigned HOST_WIDE_INT group_align = group->align;
>   unsigned HOST_WIDE_INT align_base = group->align_base;
>+  unsigned HOST_WIDE_INT group_load_align = group_align;
>
>   gcc_assert ((size % BITS_PER_UNIT == 0) && (pos % BITS_PER_UNIT == 0));
>
>@@ -1162,9 +1423,14 @@ split_group (merged_store_group *group,
>   unsigned HOST_WIDE_INT try_pos = bytepos;
>   group->stores.qsort (sort_by_bitpos);
>
>+  if (!allow_unaligned_load)
>+    for (int i = 0; i < 2; ++i)
>+      if (group->load_align[i])
>+        group_load_align = MIN (group_load_align, group->load_align[i]);
>+
>   while (size > 0)
>     {
>-      if ((allow_unaligned || group_align <= BITS_PER_UNIT)
>+      if ((allow_unaligned_store || group_align <= BITS_PER_UNIT)
>           && group->mask[try_pos - bytepos] == (unsigned char) ~0U)
>         {
>           /* Skip padding bytes.  */
>@@ -1180,10 +1446,34 @@ split_group (merged_store_group *group,
>       unsigned HOST_WIDE_INT align = group_align;
>       if (align_bitpos)
>         align = least_bit_hwi (align_bitpos);
>-      if (!allow_unaligned)
>+      if (!allow_unaligned_store)
>         try_size = MIN (try_size, align);
>+      if (!allow_unaligned_load)
>+        {
>+          /* If we can't do or don't want to do unaligned stores
>+             as well as loads, we need to take the loads into account
>+             as well.  */
>+          unsigned HOST_WIDE_INT load_align = group_load_align;
>+          align_bitpos = (try_bitpos - align_base) & (load_align - 1);
>+          if (align_bitpos)
>+            load_align = least_bit_hwi (align_bitpos);
>+          for (int i = 0; i < 2; ++i)
>+            if (group->load_align[i])
>+              {
>+                align_bitpos = try_bitpos - group->stores[0]->bitpos;
>+                align_bitpos += group->stores[0]->ops[i].bitpos;
>+                align_bitpos -= group->load_align_base[i];
>+                align_bitpos &= (group_load_align - 1);
>+                if (align_bitpos)
>+                  {
>+                    unsigned HOST_WIDE_INT a = least_bit_hwi (align_bitpos);
>+                    load_align = MIN (load_align, a);
>+                  }
>+              }
>+          try_size = MIN (try_size, load_align);
>+        }
>       store_immediate_info *info
>-        = find_constituent_stmts (group, NULL, &first, try_bitpos, try_size);
>+        = find_constituent_stores (group, NULL, &first, try_bitpos, try_size);
>       if (info)
>         {
>           /* If there is just one original statement for the range, see if
>@@ -1191,8 +1481,8 @@ split_group (merged_store_group *group,
>              than try_size.  */
>           unsigned HOST_WIDE_INT stmt_end
>             = ROUND_UP (info->bitpos + info->bitsize, BITS_PER_UNIT);
>-          info = find_constituent_stmts (group, NULL, &first, try_bitpos,
>-                                         stmt_end - try_bitpos);
>+          info = find_constituent_stores (group, NULL, &first, try_bitpos,
>+                                          stmt_end - try_bitpos);
>           if (info && info->bitpos >= try_bitpos)
>             {
>               try_size = stmt_end - try_bitpos;
>@@ -1221,7 +1511,7 @@ split_group (merged_store_group *group,
>       nonmasked *= BITS_PER_UNIT;
>       while (nonmasked <= try_size / 2)
>         try_size /= 2;
>-      if (!allow_unaligned && group_align > BITS_PER_UNIT)
>+      if (!allow_unaligned_store && group_align > BITS_PER_UNIT)
>         {
>           /* Now look for whole padding bytes at the start of that bitsize.  */
>           unsigned int try_bytesize = try_size / BITS_PER_UNIT, masked;
>@@ -1252,8 +1542,8 @@ split_group (merged_store_group *group,
>         {
>           struct split_store *store
>             = new split_store (try_pos, try_size, align);
>-          info = find_constituent_stmts (group, &store->orig_stmts,
>-                                         &first, try_bitpos, try_size);
>+          info = find_constituent_stores (group, &store->orig_stores,
>+                                          &first, try_bitpos, try_size);
>           if (info
>               && info->bitpos >= try_bitpos
>               && info->bitpos + info->bitsize <= try_bitpos + try_size)
>@@ -1288,19 +1578,23 @@ imm_store_chain_info::output_merged_stor
>
>   auto_vec<struct split_store *, 32> split_stores;
>   split_stores.create (0);
>-  bool allow_unaligned
>+  bool allow_unaligned_store
>     = !STRICT_ALIGNMENT && PARAM_VALUE (PARAM_STORE_MERGING_ALLOW_UNALIGNED);
>-  if (allow_unaligned)
>+  bool allow_unaligned_load = allow_unaligned_store;
>+  if (allow_unaligned_store)
>     {
>       /* If unaligned stores are allowed, see how many stores we'd emit
>          for unaligned and how many stores we'd emit for aligned stores.
>          Only use unaligned stores if it allows fewer stores than aligned.  */
>-      unsigned aligned_cnt = split_group (group, false, NULL);
>-      unsigned unaligned_cnt = split_group (group, true, NULL);
>+      unsigned aligned_cnt
>+        = split_group (group, false, allow_unaligned_load, NULL);
>+      unsigned unaligned_cnt
>+        = split_group (group, true,