On Fri, 3 Nov 2017, Jakub Jelinek wrote:

> On Fri, Nov 03, 2017 at 02:14:39PM +0100, Richard Biener wrote:
> > > +/* Return true if stmts in between FIRST (inclusive) and LAST (exclusive)
> > > +   may clobber REF.  FIRST and LAST must be in the same basic block and
> > > +   have non-NULL vdef.  */
> > > +
> > > +bool
> > > +stmts_may_clobber_ref_p (gimple *first, gimple *last, tree ref)
> > > +{
> > > +  ao_ref r;
> > > +  ao_ref_init (&r, ref);
> > > +  unsigned int count = 0;
> > > +  tree vop = gimple_vdef (last);
> > > +  gimple *stmt;
> > > +
> > > +  gcc_checking_assert (gimple_bb (first) == gimple_bb (last));
> >
> > EBB would probably work as well, thus we should assert we do not
> > end up visiting a PHI in the loop?
>
> For a general-purpose routine, sure; this one is in the anonymous
> namespace and meant for use only in this pass.  And there it is only
> checking stores from the same store group and any other stores intermixed
> between those.  The pass, at least right now, resets all of its state at
> the end of each basic block, so gimple_bb (first) == gimple_bb (last) is
> indeed always guaranteed.  If we ever extend it such that we don't have
> this guarantee, then this assert would fail, and then of course it should
> be adjusted to handle whatever is needed.  But do we need to do that
> right now?

No, we don't.  Just wondered about the assert and the real limitation
of the implementation.
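
The loop body of stmts_may_clobber_ref_p isn't quoted above, so for
concreteness, a minimal hand-written sketch of such a backwards walk over
the virtual use-def chain -- not the patch's actual code, just what the
declared locals suggest, assuming the usual stmt_may_clobber_ref_p_1 and
gimple_vuse helpers:

  /* Starting from LAST's virtual definition, visit each stmt from
     LAST (exclusive) back to FIRST (inclusive) and check whether it
     may clobber REF; give up conservatively on very long chains.  */
  do
    {
      stmt = SSA_NAME_DEF_STMT (vop);
      if (stmt_may_clobber_ref_p_1 (stmt, &r))
        return true;
      if (++count > 64)
        return true;
      vop = gimple_vuse (stmt);
    }
  while (stmt != first);
  return false;

With the single-BB guarantee, the chain from LAST back to FIRST never
reaches a virtual PHI, which is exactly what an EBB variant would
additionally have to assert.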

> Note that extending store-merging to handle groups of stores within an
> EBB doesn't look useful, since then not all stores would be
> unconditional.

Yes.

> What could make sense is if we have e.g. a diamond
>        |
>       bb1
>      /   \
>    bb2   bb3
>      \   /
>       bb4
>        |
> and stores are in bb1 and bb4 and no stores in bb2 or bb3 can alias
> with those.  But then we'd likely need a full-blown walk_aliased_vdefs
> for this...

Yes.

> > > +	  gimple *first = merged_store->first_stmt;
> > > +	  gimple *last = merged_store->last_stmt;
> > > +	  unsigned int i;
> > > +	  store_immediate_info *infoc;
> > > +	  if (info->order < merged_store->first_order)
> > > +	    {
> > > +	      FOR_EACH_VEC_ELT (merged_store->stores, i, infoc)
> > > +		if (stmts_may_clobber_ref_p (info->stmt, first, infoc->ops[idx].val))
> > > +		  return false;
> > > +	      first = info->stmt;
> > > +	    }
> > > +	  else if (info->order > merged_store->last_order)
> > > +	    {
> > > +	      FOR_EACH_VEC_ELT (merged_store->stores, i, infoc)
> > > +		if (stmts_may_clobber_ref_p (last, info->stmt, infoc->ops[idx].val))
> > > +		  return false;
> > > +	      last = info->stmt;
> > > +	    }
> > > +	  if (stmts_may_clobber_ref_p (first, last, info->ops[idx].val))
> > > +	    return false;
> >
> > Can you comment on what you check in this block?  It first checks
> > all stmts (but not info->stmt itself if it is after last!?) against
> > all stores that would be added when adding 'info'.  Then it checks
> > from the new first to last against the newly added stmt (again
> > excluding that stmt if it was added last).
>
> The stmts_may_clobber_ref_p routine doesn't check aliasing on the last
> stmt, only on the first stmt and the stmts in between.
>
> Previous iterations have checked, FOR_EACH_VEC_ELT (merged_store->stores,
> i, infoc), that merged_store->first_stmt and the stmts between it and
> merged_store->last_stmt don't clobber any of the infoc->ops[idx].val
> references, and we want to maintain that invariant if we add another
> store to the group.  So, if we are about to extend the range between
> first_stmt and last_stmt, then we need to check all the earlier refs
> against the stmts we've added to the range.  Note that the stores are
> sorted by bitpos, not by their order within the basic block, at this
> point, so it is possible that a store with a higher bitpos extends the
> range to earlier stmts or to later stmts.
>
> And finally the
>   if (stmts_may_clobber_ref_p (first, last, info->ops[idx].val))
> is checking the reference we are adding against the whole new range.
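
To make the bitpos-vs-order point concrete, a made-up example (purely
illustrative, assuming 32-bit fields):

  _1 = t.b;
  s.b = _1;   /* BB order 0, bitpos 32 */
  _2 = t.a;
  s.a = _2;   /* BB order 1, bitpos 0 */

Sorted by bitpos, the s.a store is considered first, so when the s.b
store joins the group we have info->order < merged_store->first_order:
the range extends backwards to earlier stmts, and the t.a load of the
already-recorded store has to be re-checked against the stmts that just
entered the range.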

> > > +      if (offset != NULL_TREE)
> > > +	{
> > > +	  /* If the access is variable offset then a base decl has to be
> > > +	     address-taken to be able to emit pointer-based stores to it.
> > > +	     ??? We might be able to get away with re-using the original
> > > +	     base up to the first variable part and then wrapping that inside
> > > +	     a BIT_FIELD_REF.  */
> >
> > Yes, that's what I'd generally recommend...  OTOH it can get quite
> > fugly, but it only has to survive until RTL expansion...
>
> This is a preexisting comment I've just moved from the
> pass_store_merging::execute method into a helper function (the method
> grew too much and needed too deep indentation, and furthermore I wanted
> to use the code for the loads too).  I haven't really changed anything
> there.
>
> > As an extra sanity check I'd rather verify that all refs share a common
> > base (operand-equal-p'ish).  But I guess that's what will usually
> > happen anyway.  The alias-ptr-type trick will be tricky to do
> > here as well (you have to go down to the base MEM_REF, wrap
> > a decl if there's no MEM_REF, and adjust the offset type).
>
> For the aliasing, I have an untested incremental patch; I need to finish
> testcases for that, then test it, and then I can post it.
>
> > Given the style of processing, we can end up doing more work than
> > necessary when following non-single-use chains here, no?
> > Would it make sense to restrict this to single uses, or do we
> > handle any cases of non-single uses?  When extending things to
> > allow an intermediate swap or general permute we could handle
> > a byte/word splat, for example.
>
> Single use vs. multiple uses is something I've thought about, but I don't
> know whether it is better to require single use or not (or only
> sometimes, under some condition?).  Say if we have:
>   _1 = t.a;
>   _2 = t.b;
>   _3 = t.c;
>   _4 = t.d;
>   s.a = _1;
>   s.b = _2;
>   s.c = _3;
>   s.d = _4;
>   use (_1, _2, _3, _4);
> it will likely be shorter and maybe faster if we do:
>   _1 = t.a;
>   _2 = t.b;
>   _3 = t.c;
>   _4 = t.d;
>   _5 = load[t.a...t.d];
>   store[s.a...s.d] = _5;
>   use (_1, _2, _3, _4);
> If there are just 2 stores combined together, having
>   _1 = t.a; _2 = t.b; _5 = load[t.a...t.b]; store[s.a...s.b] = _5;
>   use (_1, _2);
> will be as many stmts as before.  And if there is a BIT_*_EXPR involved,
> we would be adding not just the loads, but also the bitwise ops.
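
To make the BIT_*_EXPR case concrete, a made-up C-level example
(illustrative only, not from the patch or its testcases):

  /* Each narrow store ORs a loaded field with a constant.  Merging the
     group means emitting one wide load from t, one wide bitwise OR with
     the combined constant, and one wide store into s, i.e. the bitwise
     op is synthesized in addition to the load.  */
  s.a = t.a | 0x01;
  s.b = t.b | 0x02;
  s.c = t.c | 0x04;
  s.d = t.d | 0x08;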

> So, if you want, I can at least for now add has_single_use checks
> in all the spots where it follows SSA_NAME_DEF_STMT.
>
> > Otherwise the patch looks good.  Quite a few parts of the changes
> > seem to be due to splitting out stuff into functions and refactoring.
>
> Indeed, the mem_valid_for_store_merging and
> pass_store_merging::process_store functions are mostly code movement +
> reindentation + tweaks.
>
> Here are hand-made diffs (with -upbd) between the old portions of
> pass_store_merging::execute corresponding to the above two functions and
> those new functions, if it is of any help.
>
> @@ -1,33 +1,36 @@
> -	  HOST_WIDE_INT bitsize, bitpos;
> +static tree
> +mem_valid_for_store_merging (tree mem, unsigned HOST_WIDE_INT *pbitsize,
> +			     unsigned HOST_WIDE_INT *pbitpos,
> +			     unsigned HOST_WIDE_INT *pbitregion_start,
> +			     unsigned HOST_WIDE_INT *pbitregion_end)
> +{
> +  HOST_WIDE_INT bitsize;
> +  HOST_WIDE_INT bitpos;
>    unsigned HOST_WIDE_INT bitregion_start = 0;
>    unsigned HOST_WIDE_INT bitregion_end = 0;
>    machine_mode mode;
>    int unsignedp = 0, reversep = 0, volatilep = 0;
> -  tree offset, base_addr;
> -  base_addr
> -    = get_inner_reference (lhs, &bitsize, &bitpos, &offset, &mode,
> +  tree offset;
> +  tree base_addr = get_inner_reference (mem, &bitsize, &bitpos, &offset, &mode,
> 					 &unsignedp, &reversep, &volatilep);
> -  if (TREE_CODE (lhs) == COMPONENT_REF
> -      && DECL_BIT_FIELD_TYPE (TREE_OPERAND (lhs, 1)))
> +  *pbitsize = bitsize;
> +  if (bitsize == 0)
> +    return NULL_TREE;
> +
> +  if (TREE_CODE (mem) == COMPONENT_REF
> +      && DECL_BIT_FIELD_TYPE (TREE_OPERAND (mem, 1)))
>      {
> -      get_bit_range (&bitregion_start, &bitregion_end, lhs,
> -		     &bitpos, &offset);
> +      get_bit_range (&bitregion_start, &bitregion_end, mem, &bitpos, &offset);
>        if (bitregion_end)
> 	 ++bitregion_end;
>      }
> -  if (bitsize == 0)
> -    continue;
>
> -  /* As a future enhancement we could handle stores with the same
> -     base and offset.  */
> -  bool invalid = reversep
> -    || ((bitsize > MAX_BITSIZE_MODE_ANY_INT)
> -	&& (TREE_CODE (rhs) != INTEGER_CST))
> -    || !rhs_valid_for_store_merging_p (rhs);
> +  if (reversep)
> +    return NULL_TREE;
>
>    /* We do not want to rewrite TARGET_MEM_REFs.  */
>    if (TREE_CODE (base_addr) == TARGET_MEM_REF)
> -    invalid = true;
> +    return NULL_TREE;
>    /* In some cases get_inner_reference may return a
>       MEM_REF [ptr + byteoffset].  For the purposes of this pass
>       canonicalize the base_addr to MEM_REF [ptr] and take
> @@ -60,7 +63,7 @@
> 	 }
>      }
>    else
> -    invalid = true;
> +    return NULL_TREE;
>    base_addr = TREE_OPERAND (base_addr, 0);
>  }
>  /* get_inner_reference returns the base object, get at its
> @@ -68,7 +71,7 @@
>    else
>      {
>        if (bitpos < 0)
> -	 invalid = true;
> +	 return NULL_TREE;
>        base_addr = build_fold_addr_expr (base_addr);
>      }
>
> @@ -78,23 +81,25 @@
>      bitregion_end = ROUND_UP (bitpos + bitsize, BITS_PER_UNIT);
>    }
>
> -  if (! invalid
> -      && offset != NULL_TREE)
> +  if (offset != NULL_TREE)
>      {
> -      /* If the access is variable offset then a base
> -	  decl has to be address-taken to be able to
> -	  emit pointer-based stores to it.
> -	  ??? We might be able to get away with
> -	  re-using the original base up to the first
> -	  variable part and then wrapping that inside
> +      /* If the access is variable offset then a base decl has to be
> +	  address-taken to be able to emit pointer-based stores to it.
> +	  ??? We might be able to get away with re-using the original
> +	  base up to the first variable part and then wrapping that inside
> 	  a BIT_FIELD_REF.  */
>        tree base = get_base_address (base_addr);
>        if (! base
> -	   || (DECL_P (base)
> -	       && ! TREE_ADDRESSABLE (base)))
> -	 invalid = true;
> -      else
> -	 base_addr = build2 (POINTER_PLUS_EXPR,
> -			     TREE_TYPE (base_addr),
> +	   || (DECL_P (base) && ! TREE_ADDRESSABLE (base)))
> +	 return NULL_TREE;
> +
> +      base_addr = build2 (POINTER_PLUS_EXPR, TREE_TYPE (base_addr),
> 			  base_addr, offset);
>      }
> +
> +  *pbitsize = bitsize;
> +  *pbitpos = bitpos;
> +  *pbitregion_start = bitregion_start;
> +  *pbitregion_end = bitregion_end;
> +  return base_addr;
> +}
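
As a worked example of the MEM_REF canonicalization mentioned in the
quoted comment (my numbers, just for illustration): if
get_inner_reference returns base MEM_REF [p + 8] with bitpos 16, the
pass treats the access as base MEM_REF [p] with bitpos
8 * BITS_PER_UNIT + 16 = 80, so stores through differently offsetted
pointers into the same object are keyed on the same base and can join
one chain.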

> ------
>
> @@ -1,71 +1,129 @@
> +void
> +pass_store_merging::process_store (gimple *stmt)
> +{
>    tree lhs = gimple_assign_lhs (stmt);
>    tree rhs = gimple_assign_rhs1 (stmt);
> +  unsigned HOST_WIDE_INT bitsize, bitpos;
> +  unsigned HOST_WIDE_INT bitregion_start;
> +  unsigned HOST_WIDE_INT bitregion_end;
> +  tree base_addr
> +    = mem_valid_for_store_merging (lhs, &bitsize, &bitpos,
> +				   &bitregion_start, &bitregion_end);
> +  if (bitsize == 0)
> +    return;
>
> -  /* As a future enhancement we could handle stores with the same
> -     base and offset.  */
> -  bool invalid = reversep
> +  bool invalid = (base_addr == NULL_TREE
> 		   || ((bitsize > MAX_BITSIZE_MODE_ANY_INT)
> -		       && (TREE_CODE (rhs) != INTEGER_CST))
> -    || !rhs_valid_for_store_merging_p (rhs);
> +		       && (TREE_CODE (rhs) != INTEGER_CST)));
> +  enum tree_code rhs_code = ERROR_MARK;
> +  store_operand_info ops[2];
> +  if (invalid)
> +    ;
> +  else if (rhs_valid_for_store_merging_p (rhs))
> +    {
> +      rhs_code = INTEGER_CST;
> +      ops[0].val = rhs;
> +    }
> +  else if (TREE_CODE (rhs) != SSA_NAME)
> +    invalid = true;
> +  else
> +    {
> +      gimple *def_stmt = SSA_NAME_DEF_STMT (rhs), *def_stmt1, *def_stmt2;
> +      if (!is_gimple_assign (def_stmt))
> +	 invalid = true;
> +      else if (handled_load (def_stmt, &ops[0], bitsize, bitpos,
> +			     bitregion_start, bitregion_end))
> +	 rhs_code = MEM_REF;
> +      else
> +	 switch ((rhs_code = gimple_assign_rhs_code (def_stmt)))
> +	   {
> +	   case BIT_AND_EXPR:
> +	   case BIT_IOR_EXPR:
> +	   case BIT_XOR_EXPR:
> +	     tree rhs1, rhs2;
> +	     rhs1 = gimple_assign_rhs1 (def_stmt);
> +	     rhs2 = gimple_assign_rhs2 (def_stmt);
> +	     invalid = true;
> +	     if (TREE_CODE (rhs1) != SSA_NAME)
> +	       break;
> +	     def_stmt1 = SSA_NAME_DEF_STMT (rhs1);
> +	     if (!is_gimple_assign (def_stmt1)
> +		 || !handled_load (def_stmt1, &ops[0], bitsize, bitpos,
> +				   bitregion_start, bitregion_end))
> +	       break;
> +	     if (rhs_valid_for_store_merging_p (rhs2))
> +	       ops[1].val = rhs2;
> +	     else if (TREE_CODE (rhs2) != SSA_NAME)
> +	       break;
> +	     else
> +	       {
> +		 def_stmt2 = SSA_NAME_DEF_STMT (rhs2);
> +		 if (!is_gimple_assign (def_stmt2))
> +		   break;
> +		 else if (!handled_load (def_stmt2, &ops[1], bitsize, bitpos,
> +					 bitregion_start, bitregion_end))
> +		   break;
> +	       }
> +	     invalid = false;
> +	     break;
> +	   default:
> +	     invalid = true;
> +	     break;
> +	   }
> +    }
>
> -  struct imm_store_chain_info **chain_info
> -    = m_stores.get (base_addr);
> +  struct imm_store_chain_info **chain_info = NULL;
> +  if (base_addr)
> +    chain_info = m_stores.get (base_addr);
>
> -  if (!invalid)
> +  if (invalid)
>      {
> +      terminate_all_aliasing_chains (chain_info, stmt);
> +      return;
> +    }
> +
>    store_immediate_info *info;
>    if (chain_info)
>      {
>        unsigned int ord = (*chain_info)->m_store_info.length ();
> -      info = new store_immediate_info (bitsize, bitpos,
> -				       bitregion_start,
> -				       bitregion_end,
> -				       stmt, ord);
> +      info = new store_immediate_info (bitsize, bitpos, bitregion_start,
> +				       bitregion_end, stmt, ord, rhs_code,
> +				       ops[0], ops[1]);
>        if (dump_file && (dump_flags & TDF_DETAILS))
> 	 {
> -	   fprintf (dump_file,
> -		    "Recording immediate store from stmt:\n");
> +	   fprintf (dump_file, "Recording immediate store from stmt:\n");
> 	   print_gimple_stmt (dump_file, stmt, 0);
> 	 }
>        (*chain_info)->m_store_info.safe_push (info);
> -      /* If we reach the limit of stores to merge in a chain
> -	  terminate and process the chain now.  */
> +      /* If we reach the limit of stores to merge in a chain terminate and
> +	  process the chain now.  */
>        if ((*chain_info)->m_store_info.length ()
> -	   == (unsigned int)
> -	      PARAM_VALUE (PARAM_MAX_STORES_TO_MERGE))
> +	   == (unsigned int) PARAM_VALUE (PARAM_MAX_STORES_TO_MERGE))
> 	 {
> 	   if (dump_file && (dump_flags & TDF_DETAILS))
> 	     fprintf (dump_file,
> -		      "Reached maximum number of statements"
> -		      " to merge:\n");
> +		      "Reached maximum number of statements to merge:\n");
> 	   terminate_and_release_chain (*chain_info);
> 	 }
> -      continue;
> +      return;
>      }
>
>    /* Store aliases any existing chain?  */
> -  terminate_all_aliasing_chains (chain_info, false, stmt);
> +  terminate_all_aliasing_chains (chain_info, stmt);
>    /* Start a new chain.  */
>    struct imm_store_chain_info *new_chain
>      = new imm_store_chain_info (m_stores_head, base_addr);
> -  info = new store_immediate_info (bitsize, bitpos,
> -				   bitregion_start,
> -				   bitregion_end,
> -				   stmt, 0);
> +  info = new store_immediate_info (bitsize, bitpos, bitregion_start,
> +				   bitregion_end, stmt, 0, rhs_code,
> +				   ops[0], ops[1]);
>    new_chain->m_store_info.safe_push (info);
>    m_stores.put (base_addr, new_chain);
>    if (dump_file && (dump_flags & TDF_DETAILS))
>      {
> -      fprintf (dump_file,
> -	       "Starting new chain with statement:\n");
> +      fprintf (dump_file, "Starting new chain with statement:\n");
>        print_gimple_stmt (dump_file, stmt, 0);
>        fprintf (dump_file, "The base object is:\n");
>        print_generic_expr (dump_file, base_addr);
>        fprintf (dump_file, "\n");
>      }
> -    }
> -  else
> -    terminate_all_aliasing_chains (chain_info,
> -				   offset != NULL_TREE, stmt);
> -
> -  continue;
> +}
>
> 	Jakub

-- 
Richard Biener <rguent...@suse.de>
SUSE LINUX GmbH, GF: Felix Imendoerffer, Jane Smithard, Graham Norton,
HRB 21284 (AG Nuernberg)