On Tue, 28 Nov 2023, Jiufu Guo wrote:

> 
> Hi,
> 
> Thanks so much for your helpful review!
> 
> Richard Biener <richard.guent...@gmail.com> writes:
> 
> > On Fri, Oct 27, 2023 at 3:51 AM Jiufu Guo <guoji...@linux.ibm.com> wrote:
> >>
> >> Hi,
> >>
> >> Compare with previous version:
> >> https://gcc.gnu.org/pipermail/gcc-patches/2023-October/632399.html
> >> This version supports TI/VEC modes for the accesses.
> >>
> >> There are a few PRs (meta-bug PR101926) on various targets
> >> with similar root causes: the aggregate params/returns are
> >> passed in multiple registers, but they are first stored from
> >> the registers to the stack, and the parameter is then
> >> accessed through the stack slot.
> >>
> >> A general idea to enhance this is to access the aggregate
> >> parameters/returns directly through the incoming/outgoing
> >> scalar registers.  This is a kind of SRA.
> >>
> >> This experimental patch for light-expander-sra contains
> >> below parts:
> >>
> >> a. Check if the parameters/returns are ok/profitable to
> >>    scalarize, and set the incoming/outgoing registers
> >>    (pseudos) for the parameter/return.
> >>   - This is done in "expand_function_start", after the
> >>     incoming/outgoing hard registers are determined for the
> >>     parameter/return.
> >>     The scalarized registers are recorded in DECL_RTL for
> >>     the parameter/return in parallel form.
> >>   - At the time when setting DECL_RTL, "scalarizable_aggregate"
> >>     is called to check whether the accesses are ok/profitable
> >>     to scalarize.
> >>     We can continue to enhance this function, to support
> >>     more cases.  For example:
> >>     - 'reverse storage order'.
> >>     - 'writing to parameter'/'overlap accesses'.
> >>
> >> b. When expanding the accesses of the parameters/returns,
> >>    the scalars (pseudos) to use can be figured out from the
> >>    access info (e.g. bitpos, bitsize, mode).  This may happen
> >>    when expanding the accesses below:
> >>   - The component access of a parameter: "_1 = arg.f1",
> >>     or a whole-parameter access: the rhs of "_2 = arg".
> >>   - The assignment to a return val:
> >>     "D.xx = yy" or "D.xx.f = zz", where D.xx occurs on a
> >>     return stmt.
> >>   - This is mainly done in expr.cc(expand_expr_real_1, and
> >>     expand_assignment).  Function "extract_sub_member" is
> >>     used to figure out the scalar rtxs(pseudos).
> >>
> >> Besides the above two parts, some work is done in the GIMPLE
> >> tree: collect the SRA candidates for parameters/returns, and
> >> collect the SRA access info.
> >> This is mainly done at the beginning of the expander pass.
> >> Below are the two major items of this part.
> >>  - Collect light-expand-sra candidates.
> >>   Each parameter is checked for a suitable aggregate type.
> >>   The return val (VAR_P) on each return stmt is collected if
> >>   the function returns via registers.
> >>   This is implemented in expand_sra::collect_sra_candidates.
> >>
> >>  - Build/collect/manage all the accesses on the candidates.
> >>   The function "scan_function" does this work: it goes
> >>   through all basic blocks, and all interesting stmts
> >>   (phi, return, assign, call, asm) are checked.
> >>   If there is an interesting expression (e.g. COMPONENT_REF
> >>   or PARM_DECL), the required info for the access is recorded
> >>   (e.g. pos, size, type, base).
> >>   And if SRA would be risky, the candidates may be removed,
> >>   e.g. when address-taken and accessed via memory:
> >>   "foo(struct S arg) {bar (&arg);}"
> >>
> >> This patch is tested on ppc64{,le} and x86_64.
> >> Is this ok for trunk?
> >>
> >> BR,
> >> Jeff (Jiufu Guo)
> >>
> >>         PR target/65421
> >>
> >> gcc/ChangeLog:
> >>
> >>         * cfgexpand.cc (struct access): New class.
> >>         (struct expand_sra): New class.
> >>         (expand_sra::collect_sra_candidates): New member function.
> >>         (expand_sra::add_sra_candidate): Likewise.
> >>         (expand_sra::build_access): Likewise.
> >>         (expand_sra::analyze_phi): Likewise.
> >>         (expand_sra::analyze_assign): Likewise.
> >>         (expand_sra::visit_base): Likewise.
> >>         (expand_sra::protect_mem_access_in_stmt): Likewise.
> >>         (expand_sra::expand_sra):  Class constructor.
> >>         (expand_sra::~expand_sra): Class destructor.
> >>         (expand_sra::scalarizable_access):  New member function.
> >>         (expand_sra::scalarizable_accesses):  Likewise.
> >>         (scalarizable_aggregate):  New function.
> >>         (set_scalar_rtx_for_returns):  New function.
> >>         (expand_value_return): Updated.
> >>         (expand_debug_expr): Updated.
> >>         (pass_expand::execute): Updated to use expand_sra.
> >>         * cfgexpand.h (scalarizable_aggregate): New declare.
> >>         (set_scalar_rtx_for_returns): New declare.
> >>         * expr.cc (expand_assignment): Updated.
> >>         (expand_constructor): Updated.
> >>         (query_position_in_parallel): New function.
> >>         (extract_sub_member): New function.
> >>         (expand_expr_real_1): Updated.
> >>         * expr.h (query_position_in_parallel): New declare.
> >>         * function.cc (assign_parm_setup_block): Updated.
> >>         (assign_parms): Updated.
> >>         (expand_function_start): Updated.
> >>         * tree-sra.h (struct sra_base_access): New class.
> >>         (struct sra_default_analyzer): New class.
> >>         (scan_function): New template function.
> >>         * var-tracking.cc (track_loc_p): Updated.
> >>
> >> gcc/testsuite/ChangeLog:
> >>
> >>         * g++.target/powerpc/pr102024.C: Updated.
> >>         * gcc.target/powerpc/pr108073.c: New test.
> >>         * gcc.target/powerpc/pr65421-1.c: New test.
> >>         * gcc.target/powerpc/pr65421-2.c: New test.
> >>
> >> ---
> >>  gcc/cfgexpand.cc                             | 352 ++++++++++++++++++-
> >>  gcc/cfgexpand.h                              |   2 +
> >>  gcc/expr.cc                                  | 179 +++++++++-
> >>  gcc/expr.h                                   |   3 +
> >>  gcc/function.cc                              |  36 +-
> >>  gcc/tree-sra.h                               |  76 ++++
> >>  gcc/var-tracking.cc                          |   3 +-
> >>  gcc/testsuite/g++.target/powerpc/pr102024.C  |   2 +-
> >>  gcc/testsuite/gcc.target/i386/pr20020-2.c    |   5 +
> >>  gcc/testsuite/gcc.target/powerpc/pr108073.c  |  29 ++
> >>  gcc/testsuite/gcc.target/powerpc/pr65421-1.c |   6 +
> >>  gcc/testsuite/gcc.target/powerpc/pr65421-2.c |  32 ++
> >>  12 files changed, 718 insertions(+), 7 deletions(-)
> >>  create mode 100644 gcc/testsuite/gcc.target/powerpc/pr108073.c
> >>  create mode 100644 gcc/testsuite/gcc.target/powerpc/pr65421-1.c
> >>  create mode 100644 gcc/testsuite/gcc.target/powerpc/pr65421-2.c
> >>
> >> diff --git a/gcc/cfgexpand.cc b/gcc/cfgexpand.cc
> >> index 4262703a138..ef99ca8ac13 100644
> >> --- a/gcc/cfgexpand.cc
> >> +++ b/gcc/cfgexpand.cc
> >> @@ -74,6 +74,7 @@ along with GCC; see the file COPYING3.  If not see
> >>  #include "output.h"
> >>  #include "builtins.h"
> >>  #include "opts.h"
> >> +#include "tree-sra.h"
> >>
> >>  /* Some systems use __main in a way incompatible with its use in gcc, in these
> >>     cases use the macros NAME__MAIN to give a quoted symbol and SYMBOL__MAIN to
> >> @@ -97,6 +98,343 @@ static bool defer_stack_allocation (tree, bool);
> >>
> >>  static void record_alignment_for_reg_var (unsigned int);
> >>
> >> +/* For light SRA in expander about parameters and returns.  */
> >> +struct access : public sra_base_access
> >> +{
> >> +};
> >> +
> >> +typedef struct access *access_p;
> >> +
> >> +struct expand_sra : public sra_default_analyzer
> >> +{
> >> +  /* Construct/destruct resources, e.g. sra candidates.  */
> >> +  expand_sra ();
> >> +  ~expand_sra ();
> >> +
> >> +  /* No actions for pre_analyze_stmt, analyze_return.  */
> >> +
> >> +  /* Override phi, call, asm analyses.  */
> >> +  void analyze_phi (gphi *s);
> >> +
> >> +  /* TODO: Check accesses on call/asm.  */
> >> +  void analyze_call (gcall *s) { protect_mem_access_in_stmt (s); };
> >> +  void analyze_asm (gasm *s) { protect_mem_access_in_stmt (s); };
> >> +
> >> +  /* Check access of SRA on assignment.  */
> >> +  void analyze_assign (gassign *);
> >> +
> >> +  /* Check if the accesses of BASE (parameter or return) are
> >> +     scalarizable, according to the incoming/outgoing REGS. */
> >> +  bool scalarizable_accesses (tree base, rtx regs);
> >> +
> >> +private:
> >> +  /* Collect the parameter and returns to check if they are suitable for
> >> +     scalarization.  */
> >> +  bool collect_sra_candidates (void);
> >> +
> >> +  /* Return true if VAR is added as a candidate for SRA.  */
> >> +  bool add_sra_candidate (tree var);
> >> +
> >> +  /* Return true if EXPR has interesting sra access, and created access,
> >> +     return false otherwise.  */
> >> +  access_p build_access (tree expr, bool write);
> >> +
> >> +  /* Check if the access ACC is scalarizable.  REGS is the incoming/outgoing
> >> +     registers which the access is based on. */
> >> +  bool scalarizable_access (access_p acc, rtx regs, bool is_parm);
> >> +
> >> +  /* If there is risk (stored/loaded or addr taken),
> >> +     disqualify the sra candidates in the un-interesting STMT. */
> >> +  void protect_mem_access_in_stmt (gimple *stmt);
> >> +
> >> +  /* Callback of walk_stmt_load_store_addr_ops, used to remove
> >> +     unscalarizable accesses.  */
> >> +  static bool visit_base (gimple *, tree op, tree, void *data);
> >> +
> >> +  /* Base (tree) -> Vector (vec<access_p> *) map.  */
> >> +  hash_map<tree, auto_vec<access_p> > *base_access_vec;
> >> +};
> >> +
> >> +bool
> >> +expand_sra::collect_sra_candidates (void)
> >> +{
> >> +  bool ret = false;
> >> +
> >> +  /* Collect parameters.  */
> >> +  for (tree parm = DECL_ARGUMENTS (current_function_decl); parm;
> >> +       parm = DECL_CHAIN (parm))
> >> +    ret |= add_sra_candidate (parm);
> >> +
> >> +  /* Collect VARs on returns.  */
> >> +  if (DECL_RESULT (current_function_decl))
> >> +    {
> >> +      edge_iterator ei;
> >> +      edge e;
> >> +      FOR_EACH_EDGE (e, ei, EXIT_BLOCK_PTR_FOR_FN (cfun)->preds)
> >> +       if (greturn *r = safe_dyn_cast<greturn *> (*gsi_last_bb (e->src)))
> >> +         {
> >> +           tree val = gimple_return_retval (r);
> >> +           /* To scalarize the return, the return value should only be
> >> +              written (except on this return stmt).  Use 'true (write)'
> >> +              to pretend the access is only 'written'. */
> >> +           if (val && VAR_P (val))
> >> +             ret |= add_sra_candidate (val) && build_access (val, true);
> >> +         }
> >> +    }
> >> +
> >> +  return ret;
> >> +}
> >> +
> >> +bool
> >> +expand_sra::add_sra_candidate (tree var)
> >> +{
> >> +  tree type = TREE_TYPE (var);
> >> +
> >> +  if (!AGGREGATE_TYPE_P (type) || !tree_fits_shwi_p (TYPE_SIZE (type))
> >> +      || tree_to_shwi (TYPE_SIZE (type)) == 0 || TREE_THIS_VOLATILE (var)
> >> +      || is_va_list_type (type))
> >> +    return false;
> >> +
> >> +  base_access_vec->get_or_insert (var);
> >> +
> >> +  return true;
> >> +}
> >> +
> >> +access_p
> >> +expand_sra::build_access (tree expr, bool write)
> >> +{
> >> +  enum tree_code code = TREE_CODE (expr);
> >> +  if (code != VAR_DECL && code != PARM_DECL && code != COMPONENT_REF
> >> +      && code != ARRAY_REF && code != ARRAY_RANGE_REF)
> >> +    return NULL;
> >> +
> >> +  HOST_WIDE_INT offset, size;
> >> +  bool reverse;
> >> +  tree base = get_ref_base_and_extent_hwi (expr, &offset, &size, &reverse);
> >> +  if (!base || !DECL_P (base))
> >> +    return NULL;
> >> +
> >> +  vec<access_p> *access_vec = base_access_vec->get (base);
> >> +  if (!access_vec)
> >> +    return NULL;
> >> +
> >> +  /* TODO: support reverse. */
> >> +  if (reverse || size <= 0 || offset + size > tree_to_shwi (DECL_SIZE (base)))
> >> +    {
> >> +      base_access_vec->remove (base);
> >> +      return NULL;
> >> +    }
> >> +
> >> +  struct access *access = XNEWVEC (struct access, 1);
> >> +
> >> +  memset (access, 0, sizeof (struct access));
> >> +  access->offset = offset;
> >> +  access->size = size;
> >> +  access->expr = expr;
> >> +  access->write = write;
> >> +  access->reverse = reverse;
> >> +
> >> +  access_vec->safe_push (access);
> >> +
> >> +  return access;
> >> +}
> >
> > I still think there's SRA code we can share for producing
> > accesses from a stmt.
> 
> Yes, there is similar code in the existing SRAs; factoring out the
> common code is a todo item.
> One potential issue: for things like "analyze/scan expr and
> build/create access", small scattered differences (e.g. the order of
> if conditions) may cause different behavior, which could make the
> common code less clean.
> 
> >
> >> +
> >> +/* Function protect_mem_access_in_stmt removes the SRA candidates if
> >> +   there is addr-taken on the candidate in the STMT.  */
> >> +
> >> +void
> >> +expand_sra::analyze_phi (gphi *stmt)
> >> +{
> >> +  if (base_access_vec && !base_access_vec->is_empty ())
> >> +    walk_stmt_load_store_addr_ops (stmt, this, NULL, NULL, visit_base);
> >> +}
> >> +
> >> +void
> >> +expand_sra::analyze_assign (gassign *stmt)
> >> +{
> >> +  if (!base_access_vec || base_access_vec->is_empty ())
> >> +    return;
> >> +
> >> +  if (gimple_assign_single_p (stmt) && !gimple_clobber_p (stmt))
> >> +    {
> >> +      tree rhs = gimple_assign_rhs1 (stmt);
> >> +      tree lhs = gimple_assign_lhs (stmt);
> >> +      bool res_r = build_access (rhs, false);
> >> +      bool res_l = build_access (lhs, true);
> >> +
> >> +      if (res_l || res_r)
> >> +       return;
> >> +    }
> >> +
> >> +  protect_mem_access_in_stmt (stmt);
> >> +}
> >> +
> >> +/* Callback of walk_stmt_load_store_addr_ops, used to remove
> >> +   unscalarizable accesses.  Called by protect_mem_access_in_stmt.  */
> >> +
> >> +bool
> >> +expand_sra::visit_base (gimple *, tree op, tree, void *data)
> >> +{
> >> +  op = get_base_address (op);
> >> +  if (op && DECL_P (op))
> >> +    {
> >> +      expand_sra *p = (expand_sra *) data;
> >> +      p->base_access_vec->remove (op);
> >> +    }
> >> +  return false;
> >> +}
> >> +
> >> +/* Function protect_mem_access_in_stmt removes the SRA candidates if
> >> +   there is store/load/addr-taken on the candidate in the STMT.
> >> +
> >> +   For some statements, which SRA does not care about, if there are
> >> +   possible memory operation on the SRA candidates, it would be risky
> >> +   to scalarize it.  */
> >> +
> >> +void
> >> +expand_sra::protect_mem_access_in_stmt (gimple *stmt)
> >> +{
> >> +  if (base_access_vec && !base_access_vec->is_empty ())
> >> +    walk_stmt_load_store_addr_ops (stmt, this, visit_base, visit_base,
> >> +                                  visit_base);
> >> +}
> >> +
> >> +expand_sra::expand_sra () : base_access_vec (NULL)
> >> +{
> >> +  if (optimize <= 0)
> >> +    return;
> >> +
> >> +  base_access_vec = new hash_map<tree, auto_vec<access_p> >;
> >> +  collect_sra_candidates ();
> >> +}
> >> +
> >> +expand_sra::~expand_sra ()
> >> +{
> >> +  if (optimize <= 0)
> >> +    return;
> >> +
> >> +  delete base_access_vec;
> >> +}
> >> +
> >> +bool
> >> +expand_sra::scalarizable_access (access_p acc, rtx regs, bool is_parm)
> >> +{
> >> +  /* Now only support reading from parms
> >> +     or writing to returns.  */
> >> +  if (is_parm && acc->write)
> >> +    return false;
> >> +  if (!is_parm && !acc->write)
> >> +    return false;
> >> +
> >> +  /* Compute the position of the access in the parallel regs.  */
> >> +  int start_index = -1;
> >> +  int end_index = -1;
> >> +  HOST_WIDE_INT left_bits = 0;
> >> +  HOST_WIDE_INT right_bits = 0;
> >> +  query_position_in_parallel (acc->offset, acc->size, regs, start_index,
> >> +                             end_index, left_bits, right_bits);
> >> +
> >> +  /* Invalid access position: padding or out of bounds.  */
> >> +  if (start_index < 0 || end_index < 0)
> >> +    return false;
> >> +
> >> +  machine_mode expr_mode = TYPE_MODE (TREE_TYPE (acc->expr));
> >> +  /* Need multi-registers in a parallel for the access.  */
> >> +  if (expr_mode == BLKmode || end_index > start_index)
> >> +    {
> >> +      if (left_bits || right_bits)
> >> +       return false;
> >> +      if (expr_mode == BLKmode)
> >> +       return true;
> >> +
> >> +      /* For large modes, only support TI/VECTOR in multi-registers. */
> >> +      if (known_gt (acc->size, GET_MODE_BITSIZE (word_mode)))
> >> +       return expr_mode == TImode || VECTOR_MODE_P (expr_mode);
> >> +      return true;
> >> +    }
> >> +
> >> +  gcc_assert (end_index == start_index);
> >> +
> >> +  /* Just need one reg for the access.  */
> >> +  if (left_bits == 0 && right_bits == 0)
> >> +    return true;
> >> +
> >> +  scalar_int_mode imode;
> >> +  /* Need to extract bits from the reg for the access.  */
> >> +  return !acc->write && int_mode_for_mode (expr_mode).exists (&imode);
> >> +}
> >> +
> >> +/* Now, the base (parm/return) is scalarizable only if all
> >> +   accesses of the BASE are scalarizable.
> >> +
> >> +   This function needs to be updated to support more complicated
> >> +   cases, like:
> >> +   - Some accesses are scalarizable, but some are not.
> >> +   - An access is writing to a parameter.
> >> +   - Writing accesses overlap with multiple accesses.   */
> >> +
> >> +bool
> >> +expand_sra::scalarizable_accesses (tree base, rtx regs)
> >> +{
> >> +  if (!base_access_vec)
> >> +    return false;
> >> +  vec<access_p> *access_vec = base_access_vec->get (base);
> >> +  if (!access_vec)
> >> +    return false;
> >> +  if (access_vec->is_empty ())
> >> +    return false;
> >> +
> >> +  bool is_parm = TREE_CODE (base) == PARM_DECL;
> >> +  int n = access_vec->length ();
> >> +  int cur_access_index = 0;
> >> +  for (; cur_access_index < n; cur_access_index++)
> >> +    if (!scalarizable_access ((*access_vec)[cur_access_index], regs, is_parm))
> >> +      break;
> >> +
> >> +  /* It is ok if all access are scalarizable.  */
> >> +  if (cur_access_index == n)
> >> +    return true;
> >> +
> >> +  base_access_vec->remove (base);
> >> +  return false;
> >> +}
> >> +
> >> +static expand_sra *current_sra = NULL;
> >> +
> >> +/* Check if the PARAM (or return) is scalarizable.
> >> +
> >> +   This interface is used in expand_function_start
> >> +   to check sra possibility for parameters. */
> >> +
> >> +bool
> >> +scalarizable_aggregate (tree parm, rtx regs)
> >> +{
> >> +  if (!current_sra)
> >> +    return false;
> >> +  return current_sra->scalarizable_accesses (parm, regs);
> >> +}
> >> +
> >> +/* Check if interesting returns, and if they are scalarizable,
> >> +   set DECL_RTL as scalar registers.
> >> +
> >> +   This interface is used in expand_function_start
> >> +   when outgoing registers are determined for DECL_RESULT.  */
> >> +
> >> +void
> >> +set_scalar_rtx_for_returns ()
> >> +{
> >> +  rtx res = DECL_RTL (DECL_RESULT (current_function_decl));
> >> +  edge_iterator ei;
> >> +  edge e;
> >> +  FOR_EACH_EDGE (e, ei, EXIT_BLOCK_PTR_FOR_FN (cfun)->preds)
> >> +    if (greturn *r = safe_dyn_cast<greturn *> (*gsi_last_bb (e->src)))
> >> +      {
> >> +       tree val = gimple_return_retval (r);
> >> +       if (val && VAR_P (val) && scalarizable_aggregate (val, res))
> >> +         SET_DECL_RTL (val, res);
> >> +      }
> >> +}
> >> +
> >>  /* Return an expression tree corresponding to the RHS of GIMPLE
> >>     statement STMT.  */
> >>
> >> @@ -3707,7 +4045,7 @@ expand_value_return (rtx val)
> >>
> >>    tree decl = DECL_RESULT (current_function_decl);
> >>    rtx return_reg = DECL_RTL (decl);
> >> -  if (return_reg != val)
> >> +  if (!rtx_equal_p (return_reg, val))
> >>      {
> >>        tree funtype = TREE_TYPE (current_function_decl);
> >>        tree type = TREE_TYPE (decl);
> >> @@ -4423,6 +4761,12 @@ expand_debug_expr (tree exp)
> >>    addr_space_t as;
> >>    scalar_int_mode op0_mode, op1_mode, addr_mode;
> >>
> >> +  /* TODO: Enable to debug expand-sra optimized parm/returns.  */
> >> +  tree base = get_base_address (exp);
> >> +  if ((TREE_CODE (base) == PARM_DECL || (VAR_P (base) && DECL_RTL_SET_P (base)))
> >> +      && GET_CODE (DECL_RTL (base)) == PARALLEL)
> >> +    return NULL_RTX;
> >> +
> >>    switch (TREE_CODE_CLASS (TREE_CODE (exp)))
> >>      {
> >>      case tcc_expression:
> >> @@ -6628,6 +6972,10 @@ pass_expand::execute (function *fun)
> >>    auto_bitmap forced_stack_vars;
> >>    discover_nonconstant_array_refs (forced_stack_vars);
> >>
> >> +  /* Enable light-expander-sra.  */
> >> +  current_sra = new expand_sra;
> >> +  scan_function (cfun, *current_sra);
> >> +
> >>    /* Make sure all values used by the optimization passes have sane
> >>       defaults.  */
> >>    reg_renumber = 0;
> >> @@ -7056,6 +7404,8 @@ pass_expand::execute (function *fun)
> >>        loop_optimizer_finalize ();
> >>      }
> >>
> >> +  delete current_sra;
> >> +  current_sra = NULL;
> >>    timevar_pop (TV_POST_EXPAND);
> >>
> >>    return 0;
> >> diff --git a/gcc/cfgexpand.h b/gcc/cfgexpand.h
> >> index 0e551f6cfd3..3415c217708 100644
> >> --- a/gcc/cfgexpand.h
> >> +++ b/gcc/cfgexpand.h
> >> @@ -24,5 +24,7 @@ extern tree gimple_assign_rhs_to_tree (gimple *);
> >>  extern HOST_WIDE_INT estimated_stack_frame_size (struct cgraph_node *);
> >>  extern void set_parm_rtl (tree, rtx);
> >>
> >> +extern bool scalarizable_aggregate (tree, rtx);
> >> +extern void set_scalar_rtx_for_returns ();
> >>
> >>  #endif /* GCC_CFGEXPAND_H */
> >> diff --git a/gcc/expr.cc b/gcc/expr.cc
> >> index 763bd82c59f..5ba26e0ef52 100644
> >> --- a/gcc/expr.cc
> >> +++ b/gcc/expr.cc
> >> @@ -5618,7 +5618,10 @@ expand_assignment (tree to, tree from, bool nontemporal)
> >>       Assignment of an array element at a constant index, and assignment of
> >>       an array element in an unaligned packed structure field, has the same
> >>       problem.  Same for (partially) storing into a non-memory object.  */
> >> -  if (handled_component_p (to)
> >> +  if ((handled_component_p (to)
> >> +       && !(VAR_P (get_base_address (to))
> >
> > get_base_address is pretty expensive so please cache the result.
> 
> Thanks for pointing out this!
> 
> >
> >> +           && DECL_RTL_SET_P (get_base_address (to))
> >> +           && GET_CODE (DECL_RTL (get_base_address (to))) == PARALLEL))
> >
> > I think that's possibly fragile in case a PARALLEL can also appear by
> > other means here.  I also think it's the wrong thing to map a structural
> 
> I understand your concern. The current trunk code uses a PARALLEL to
> carry the incoming/outgoing registers for an aggregate, so this patch
> reuses that property. It is indeed possible that a PARALLEL appears by
> other means; to make it safe, adding an additional flag may be an idea.
> (This would not be needed when using your suggestions below.)
> 
> > GIMPLE accordingly with artificial decls which would also be the way
> > to preserve debug info.
> >
> > In principle "expander SRA" isn't different from regular SRA apart from
> > the fact that it handles parameter and return variables and heuristically
> > includes the ABI (does it?) in decomposition.
> >
> > So the only important difference is that we cannot represent the
> > change for the incoming arguments or the outgoing regs in GIMPLE
> > but we should only have to special case _those_ bits, not all the
> > other accesses we can rewrite.
> >
> > We could even use the actual SRA pass in a special mode right
> > before RTL expansion (or close to it) and invent a special (but ugly)
> > representation for the incoming/outgoing part, say by using
> > new internal functions present between the SRA pass and RTL
> > expansion:
> >
> Previously, I tried rerunning tree-sra just before the "optimized"
> pass, and found that some enhancements would be needed to make it
> work: for example, the parameter analysis does not seem to work for
> some cases.  The two major reasons for combining the 'light-sra'
> inside the expander were:
> 1. We can just use the regs/pseudos based on the incoming/outgoing
> registers to access the struct.  Then there is no need to create
> replacements as tree-sra and ipa-sra do.
> 
> 2. For small structures which may have int mode, the RTL passes
> (starting from the expander) use the int reg to access the struct.
> It would be better to avoid struct/member accesses at the beginning
> of the expander.
> It seems these issues can also be addressed by the idea of introducing
> new internal functions: we just need to expand the internal functions
> in the expected way.
> 
> > struct X foo (struct X x)
> > {
> >   int D.1234 = .ARG_PART (x, offset, size);
> >   int D.1235 = .ARG_PART (x, offset, size);
> >   D.2345 = .SET_ARG_PARTS (D.1234, offset, size, D.1235, offset, size);
> >   return D.2345;
> > }
> 
> This code expresses two major functionalities: one is extracting fields
> from an aggregate (mainly for parameters); the other is constructing
> an aggregate from the values of its elements (mainly for returns).

Yes.  In principle BIT_FIELD_REF could be used for the former and
a CONSTRUCTOR used for the latter but I've used internal functions
to tightly control how we expand these bits - in fact I think, for
all practical cases we care about, the SRA pass should have made
sure they expand to plain (register) moves, that is, the pieces
1:1 correspond to incoming argument registers or outgoing return
value registers.

Richard.
