On Fri, 3 Nov 2017, Jakub Jelinek wrote:

> On Fri, Nov 03, 2017 at 02:14:39PM +0100, Richard Biener wrote:
> > > +/* Return true if stmts in between FIRST (inclusive) and LAST (exclusive)
> > > +   may clobber REF.  FIRST and LAST must be in the same basic block and
> > > +   have non-NULL vdef.  */
> > > +
> > > +bool
> > > +stmts_may_clobber_ref_p (gimple *first, gimple *last, tree ref)
> > > +{
> > > +  ao_ref r;
> > > +  ao_ref_init (&r, ref);
> > > +  unsigned int count = 0;
> > > +  tree vop = gimple_vdef (last);
> > > +  gimple *stmt;
> > > +
> > > +  gcc_checking_assert (gimple_bb (first) == gimple_bb (last));
> >
> > EBB would probably work as well, thus we should assert we do not
> > end up visiting a PHI in the loop?
>
> For a general-purpose routine, sure; this one is in the anonymous
> namespace and meant for use only in this pass.  And there it is only
> checking stores from the same store group and any other stores intermixed
> between those.  The pass, at least right now, resets all of its state at
> the end of each basic block, so gimple_bb (first) == gimple_bb (last) is
> indeed always guaranteed.  If we ever extend it such that we don't have
> this guarantee, then this assert would fail, and then of course it should
> be adjusted to handle whatever is needed.  But do we need to do that
> right now?

No, we don't.  Just wondered about the assert and the real limitation
of the implementation.
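
The loop body of stmts_may_clobber_ref_p isn't quoted above, so for
concreteness, a minimal hand-written sketch of such a backwards walk over
the virtual use-def chain -- not the patch's actual code, just what the
declared locals suggest, assuming the usual stmt_may_clobber_ref_p_1 and
gimple_vuse helpers:

  /* Starting from LAST's virtual definition, visit each stmt from
     LAST (exclusive) back to FIRST (inclusive) and check whether it
     may clobber REF; give up conservatively on very long chains.  */
  do
    {
      stmt = SSA_NAME_DEF_STMT (vop);
      if (stmt_may_clobber_ref_p_1 (stmt, &r))
        return true;
      if (++count > 64)
        return true;
      vop = gimple_vuse (stmt);
    }
  while (stmt != first);
  return false;

With the single-BB guarantee, the chain from LAST back to FIRST never
reaches a virtual PHI, which is exactly what an EBB variant would
additionally have to assert.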

> Note that extending store-merging to handle groups of stores within an
> EBB doesn't look useful, since then not all stores would be
> unconditional.

Yes.

> What could make sense is if we have e.g. a diamond
>        |
>       bb1
>      /   \
>    bb2   bb3
>      \   /
>       bb4
>        |
> and stores are in bb1 and bb4 and no stores in bb2 or bb3 can alias
> with those.  But then we'd likely need a full-blown walk_aliased_vdefs
> for this...

Yes.

> > > +	  gimple *first = merged_store->first_stmt;
> > > +	  gimple *last = merged_store->last_stmt;
> > > +	  unsigned int i;
> > > +	  store_immediate_info *infoc;
> > > +	  if (info->order < merged_store->first_order)
> > > +	    {
> > > +	      FOR_EACH_VEC_ELT (merged_store->stores, i, infoc)
> > > +		if (stmts_may_clobber_ref_p (info->stmt, first, infoc->ops[idx].val))
> > > +		  return false;
> > > +	      first = info->stmt;
> > > +	    }
> > > +	  else if (info->order > merged_store->last_order)
> > > +	    {
> > > +	      FOR_EACH_VEC_ELT (merged_store->stores, i, infoc)
> > > +		if (stmts_may_clobber_ref_p (last, info->stmt, infoc->ops[idx].val))
> > > +		  return false;
> > > +	      last = info->stmt;
> > > +	    }
> > > +	  if (stmts_may_clobber_ref_p (first, last, info->ops[idx].val))
> > > +	    return false;
> >
> > Can you comment on what you check in this block?  It first checks
> > all stmts (but not info->stmt itself if it is after last!?) against
> > all stores that would be added when adding 'info'.  Then it checks
> > from the new first to last against the newly added stmt (again
> > excluding that stmt if it was added last).
>
> The stmts_may_clobber_ref_p routine doesn't check aliasing on the last
> stmt, only on the first stmt and the stmts in between.
>
> Previous iterations have checked, FOR_EACH_VEC_ELT (merged_store->stores,
> i, infoc), that merged_store->first_stmt and the stmts between it and
> merged_store->last_stmt don't clobber any of the infoc->ops[idx].val
> references, and we want to maintain that invariant if we add another
> store to the group.  So, if we are about to extend the range between
> first_stmt and last_stmt, then we need to check all the earlier refs
> against the stmts we've added to the range.  Note that the stores are
> sorted by bitpos, not by their order within the basic block, at this
> point, so it is possible that a store with a higher bitpos extends the
> range to earlier stmts or to later stmts.
>
> And finally the
>   if (stmts_may_clobber_ref_p (first, last, info->ops[idx].val))
> is checking the reference we are adding against the whole new range.
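
To make the bitpos-vs-order point concrete, a made-up example (purely
illustrative, assuming 32-bit fields):

  _1 = t.b;
  s.b = _1;   /* BB order 0, bitpos 32 */
  _2 = t.a;
  s.a = _2;   /* BB order 1, bitpos 0 */

Sorted by bitpos, the s.a store is considered first, so when the s.b
store joins the group we have info->order < merged_store->first_order:
the range extends backwards to earlier stmts, and the t.a load of the
already-recorded store has to be re-checked against the stmts that just
entered the range.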

> > > +      if (offset != NULL_TREE)
> > > +	{
> > > +	  /* If the access is variable offset then a base decl has to be
> > > +	     address-taken to be able to emit pointer-based stores to it.
> > > +	     ??? We might be able to get away with re-using the original
> > > +	     base up to the first variable part and then wrapping that inside
> > > +	     a BIT_FIELD_REF.  */
> >
> > Yes, that's what I'd generally recommend...  OTOH it can get quite
> > fugly, but it only has to survive until RTL expansion...
>
> This is a preexisting comment I've just moved from the
> pass_store_merging::execute method into a helper function (the method
> grew too much and needed too deep indentation, and furthermore I wanted
> to use the code for the loads too).  I haven't really changed anything
> there.
>
> > As an extra sanity check I'd rather verify that all refs share a common
> > base (operand-equal-p'ish).  But I guess that's what will usually
> > happen anyway.  The alias-ptr-type trick will be tricky to do
> > here as well (you have to go down to the base MEM_REF, wrap
> > a decl if there's no MEM_REF, and adjust the offset type).
>
> For the aliasing, I have an untested incremental patch; I need to finish
> testcases for that, then test it, and then I can post it.
>
> > Given the style of processing, we can end up doing more work than
> > necessary when following non-single-use chains here, no?
> > Would it make sense to restrict this to single uses, or do we
> > handle any cases of non-single uses?  When extending things to
> > allow an intermediate swap or general permute we could handle
> > a byte/word splat, for example.
>
> Single use vs. multiple uses is something I've thought about, but I don't
> know whether it is better to require single use or not (or only
> sometimes, under some condition?).  Say if we have:
>   _1 = t.a;
>   _2 = t.b;
>   _3 = t.c;
>   _4 = t.d;
>   s.a = _1;
>   s.b = _2;
>   s.c = _3;
>   s.d = _4;
>   use (_1, _2, _3, _4);
> it will likely be shorter and maybe faster if we do:
>   _1 = t.a;
>   _2 = t.b;
>   _3 = t.c;
>   _4 = t.d;
>   _5 = load[t.a...t.d];
>   store[s.a...s.d] = _5;
>   use (_1, _2, _3, _4);
> If there are just 2 stores combined together, having
>   _1 = t.a; _2 = t.b; _5 = load[t.a...t.b]; store[s.a...s.b] = _5;
>   use (_1, _2);
> will be as many stmts as before.  And if there is a BIT_*_EXPR involved,
> we would be adding not just the loads, but also the bitwise ops.
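
To make the BIT_*_EXPR case concrete, a made-up C-level example
(illustrative only, not from the patch or its testcases):

  /* Each narrow store ORs a loaded field with a constant.  Merging the
     group means emitting one wide load from t, one wide bitwise OR with
     the combined constant, and one wide store into s, i.e. the bitwise
     op is synthesized in addition to the load.  */
  s.a = t.a | 0x01;
  s.b = t.b | 0x02;
  s.c = t.c | 0x04;
  s.d = t.d | 0x08;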

> So, if you want, I can at least for now add has_single_use checks
> in all the spots where it follows SSA_NAME_DEF_STMT.
>
> > Otherwise the patch looks good.  Quite a few parts of the changes
> > seem to be due to splitting out stuff into functions and refactoring.
>
> Indeed, the mem_valid_for_store_merging and
> pass_store_merging::process_store functions are mostly code movement +
> reindentation + tweaks.
>
> Here are hand-made diffs (with -upbd) between the old portions of
> pass_store_merging::execute corresponding to the above two functions and
> those new functions, if it is of any help.
>
> @@ -1,33 +1,36 @@
> -	  HOST_WIDE_INT bitsize, bitpos;
> +static tree
> +mem_valid_for_store_merging (tree mem, unsigned HOST_WIDE_INT *pbitsize,
> +			     unsigned HOST_WIDE_INT *pbitpos,
> +			     unsigned HOST_WIDE_INT *pbitregion_start,
> +			     unsigned HOST_WIDE_INT *pbitregion_end)
> +{
> +  HOST_WIDE_INT bitsize;
> +  HOST_WIDE_INT bitpos;
>    unsigned HOST_WIDE_INT bitregion_start = 0;
>    unsigned HOST_WIDE_INT bitregion_end = 0;
>    machine_mode mode;
>    int unsignedp = 0, reversep = 0, volatilep = 0;
> -  tree offset, base_addr;
> -  base_addr
> -    = get_inner_reference (lhs, &bitsize, &bitpos, &offset, &mode,
> +  tree offset;
> +  tree base_addr = get_inner_reference (mem, &bitsize, &bitpos, &offset, &mode,
> 					 &unsignedp, &reversep, &volatilep);
> -  if (TREE_CODE (lhs) == COMPONENT_REF
> -      && DECL_BIT_FIELD_TYPE (TREE_OPERAND (lhs, 1)))
> +  *pbitsize = bitsize;
> +  if (bitsize == 0)
> +    return NULL_TREE;
> +
> +  if (TREE_CODE (mem) == COMPONENT_REF
> +      && DECL_BIT_FIELD_TYPE (TREE_OPERAND (mem, 1)))
>      {
> -      get_bit_range (&bitregion_start, &bitregion_end, lhs,
> -		     &bitpos, &offset);
> +      get_bit_range (&bitregion_start, &bitregion_end, mem, &bitpos, &offset);
>        if (bitregion_end)
> 	 ++bitregion_end;
>      }
> -  if (bitsize == 0)
> -    continue;
>
> -  /* As a future enhancement we could handle stores with the same
> -     base and offset.  */
> -  bool invalid = reversep
> -    || ((bitsize > MAX_BITSIZE_MODE_ANY_INT)
> -	&& (TREE_CODE (rhs) != INTEGER_CST))
> -    || !rhs_valid_for_store_merging_p (rhs);
> +  if (reversep)
> +    return NULL_TREE;
>
>    /* We do not want to rewrite TARGET_MEM_REFs.  */
>    if (TREE_CODE (base_addr) == TARGET_MEM_REF)
> -    invalid = true;
> +    return NULL_TREE;
>    /* In some cases get_inner_reference may return a
>       MEM_REF [ptr + byteoffset].  For the purposes of this pass
>       canonicalize the base_addr to MEM_REF [ptr] and take
> @@ -60,7 +63,7 @@
> 	 }
>      }
>    else
> -    invalid = true;
> +    return NULL_TREE;
>    base_addr = TREE_OPERAND (base_addr, 0);
>  }
>  /* get_inner_reference returns the base object, get at its
> @@ -68,7 +71,7 @@
>    else
>      {
>        if (bitpos < 0)
> -	 invalid = true;
> +	 return NULL_TREE;
>        base_addr = build_fold_addr_expr (base_addr);
>      }
>
> @@ -78,23 +81,25 @@
>      bitregion_end = ROUND_UP (bitpos + bitsize, BITS_PER_UNIT);
>    }
>
> -  if (! invalid
> -      && offset != NULL_TREE)
> +  if (offset != NULL_TREE)
>      {
> -      /* If the access is variable offset then a base
> -	  decl has to be address-taken to be able to
> -	  emit pointer-based stores to it.
> -	  ??? We might be able to get away with
> -	  re-using the original base up to the first
> -	  variable part and then wrapping that inside
> +      /* If the access is variable offset then a base decl has to be
> +	  address-taken to be able to emit pointer-based stores to it.
> +	  ??? We might be able to get away with re-using the original
> +	  base up to the first variable part and then wrapping that inside
> 	  a BIT_FIELD_REF.  */
>        tree base = get_base_address (base_addr);
>        if (! base
> -	   || (DECL_P (base)
> -	       && ! TREE_ADDRESSABLE (base)))
> -	 invalid = true;
> -      else
> -	 base_addr = build2 (POINTER_PLUS_EXPR,
> -			     TREE_TYPE (base_addr),
> +	   || (DECL_P (base) && ! TREE_ADDRESSABLE (base)))
> +	 return NULL_TREE;
> +
> +      base_addr = build2 (POINTER_PLUS_EXPR, TREE_TYPE (base_addr),
> 			  base_addr, offset);
>      }
> +
> +  *pbitsize = bitsize;
> +  *pbitpos = bitpos;
> +  *pbitregion_start = bitregion_start;
> +  *pbitregion_end = bitregion_end;
> +  return base_addr;
> +}
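
As a worked example of the MEM_REF canonicalization mentioned in the
quoted comment (my numbers, just for illustration): if
get_inner_reference returns base MEM_REF [p + 8] with bitpos 16, the
pass treats the access as base MEM_REF [p] with bitpos
8 * BITS_PER_UNIT + 16 = 80, so stores through differently offsetted
pointers into the same object are keyed on the same base and can join
one chain.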

> ------
>
> @@ -1,71 +1,129 @@
> +void
> +pass_store_merging::process_store (gimple *stmt)
> +{
>    tree lhs = gimple_assign_lhs (stmt);
>    tree rhs = gimple_assign_rhs1 (stmt);
> +  unsigned HOST_WIDE_INT bitsize, bitpos;
> +  unsigned HOST_WIDE_INT bitregion_start;
> +  unsigned HOST_WIDE_INT bitregion_end;
> +  tree base_addr
> +    = mem_valid_for_store_merging (lhs, &bitsize, &bitpos,
> +				   &bitregion_start, &bitregion_end);
> +  if (bitsize == 0)
> +    return;
>
> -  /* As a future enhancement we could handle stores with the same
> -     base and offset.  */
> -  bool invalid = reversep
> +  bool invalid = (base_addr == NULL_TREE
> 		   || ((bitsize > MAX_BITSIZE_MODE_ANY_INT)
> -		       && (TREE_CODE (rhs) != INTEGER_CST))
> -    || !rhs_valid_for_store_merging_p (rhs);
> +		       && (TREE_CODE (rhs) != INTEGER_CST)));
> +  enum tree_code rhs_code = ERROR_MARK;
> +  store_operand_info ops[2];
> +  if (invalid)
> +    ;
> +  else if (rhs_valid_for_store_merging_p (rhs))
> +    {
> +      rhs_code = INTEGER_CST;
> +      ops[0].val = rhs;
> +    }
> +  else if (TREE_CODE (rhs) != SSA_NAME)
> +    invalid = true;
> +  else
> +    {
> +      gimple *def_stmt = SSA_NAME_DEF_STMT (rhs), *def_stmt1, *def_stmt2;
> +      if (!is_gimple_assign (def_stmt))
> +	 invalid = true;
> +      else if (handled_load (def_stmt, &ops[0], bitsize, bitpos,
> +			     bitregion_start, bitregion_end))
> +	 rhs_code = MEM_REF;
> +      else
> +	 switch ((rhs_code = gimple_assign_rhs_code (def_stmt)))
> +	   {
> +	   case BIT_AND_EXPR:
> +	   case BIT_IOR_EXPR:
> +	   case BIT_XOR_EXPR:
> +	     tree rhs1, rhs2;
> +	     rhs1 = gimple_assign_rhs1 (def_stmt);
> +	     rhs2 = gimple_assign_rhs2 (def_stmt);
> +	     invalid = true;
> +	     if (TREE_CODE (rhs1) != SSA_NAME)
> +	       break;
> +	     def_stmt1 = SSA_NAME_DEF_STMT (rhs1);
> +	     if (!is_gimple_assign (def_stmt1)
> +		 || !handled_load (def_stmt1, &ops[0], bitsize, bitpos,
> +				   bitregion_start, bitregion_end))
> +	       break;
> +	     if (rhs_valid_for_store_merging_p (rhs2))
> +	       ops[1].val = rhs2;
> +	     else if (TREE_CODE (rhs2) != SSA_NAME)
> +	       break;
> +	     else
> +	       {
> +		 def_stmt2 = SSA_NAME_DEF_STMT (rhs2);
> +		 if (!is_gimple_assign (def_stmt2))
> +		   break;
> +		 else if (!handled_load (def_stmt2, &ops[1], bitsize, bitpos,
> +					 bitregion_start, bitregion_end))
> +		   break;
> +	       }
> +	     invalid = false;
> +	     break;
> +	   default:
> +	     invalid = true;
> +	     break;
> +	   }
> +    }
>
> -  struct imm_store_chain_info **chain_info
> -    = m_stores.get (base_addr);
> +  struct imm_store_chain_info **chain_info = NULL;
> +  if (base_addr)
> +    chain_info = m_stores.get (base_addr);
>
> -  if (!invalid)
> +  if (invalid)
>      {
> +      terminate_all_aliasing_chains (chain_info, stmt);
> +      return;
> +    }
> +
>    store_immediate_info *info;
>    if (chain_info)
>      {
>        unsigned int ord = (*chain_info)->m_store_info.length ();
> -      info = new store_immediate_info (bitsize, bitpos,
> -				       bitregion_start,
> -				       bitregion_end,
> -				       stmt, ord);
> +      info = new store_immediate_info (bitsize, bitpos, bitregion_start,
> +				       bitregion_end, stmt, ord, rhs_code,
> +				       ops[0], ops[1]);
>        if (dump_file && (dump_flags & TDF_DETAILS))
> 	 {
> -	   fprintf (dump_file,
> -		    "Recording immediate store from stmt:\n");
> +	   fprintf (dump_file, "Recording immediate store from stmt:\n");
> 	   print_gimple_stmt (dump_file, stmt, 0);
> 	 }
>        (*chain_info)->m_store_info.safe_push (info);
> -      /* If we reach the limit of stores to merge in a chain
> -	  terminate and process the chain now.  */
> +      /* If we reach the limit of stores to merge in a chain terminate and
> +	  process the chain now.  */
>        if ((*chain_info)->m_store_info.length ()
> -	   == (unsigned int)
> -	      PARAM_VALUE (PARAM_MAX_STORES_TO_MERGE))
> +	   == (unsigned int) PARAM_VALUE (PARAM_MAX_STORES_TO_MERGE))
> 	 {
> 	   if (dump_file && (dump_flags & TDF_DETAILS))
> 	     fprintf (dump_file,
> -		      "Reached maximum number of statements"
> -		      " to merge:\n");
> +		      "Reached maximum number of statements to merge:\n");
> 	   terminate_and_release_chain (*chain_info);
> 	 }
> -      continue;
> +      return;
>      }
>
>    /* Store aliases any existing chain?  */
> -  terminate_all_aliasing_chains (chain_info, false, stmt);
> +  terminate_all_aliasing_chains (chain_info, stmt);
>    /* Start a new chain.  */
>    struct imm_store_chain_info *new_chain
>      = new imm_store_chain_info (m_stores_head, base_addr);
> -  info = new store_immediate_info (bitsize, bitpos,
> -				   bitregion_start,
> -				   bitregion_end,
> -				   stmt, 0);
> +  info = new store_immediate_info (bitsize, bitpos, bitregion_start,
> +				   bitregion_end, stmt, 0, rhs_code,
> +				   ops[0], ops[1]);
>    new_chain->m_store_info.safe_push (info);
>    m_stores.put (base_addr, new_chain);
>    if (dump_file && (dump_flags & TDF_DETAILS))
>      {
> -      fprintf (dump_file,
> -	       "Starting new chain with statement:\n");
> +      fprintf (dump_file, "Starting new chain with statement:\n");
>        print_gimple_stmt (dump_file, stmt, 0);
>        fprintf (dump_file, "The base object is:\n");
>        print_generic_expr (dump_file, base_addr);
>        fprintf (dump_file, "\n");
>      }
> -    }
> -  else
> -    terminate_all_aliasing_chains (chain_info,
> -				   offset != NULL_TREE, stmt);
> -
> -  continue;
> +}
>
> 	Jakub

-- 
Richard Biener <rguent...@suse.de>
SUSE LINUX GmbH, GF: Felix Imendoerffer, Jane Smithard, Graham Norton,
HRB 21284 (AG Nuernberg)