On Tue, 28 Nov 2023, Jiufu Guo wrote: > > Hi, > > Thanks so much for your helpful review! > > Richard Biener <richard.guent...@gmail.com> writes: > > > On Fri, Oct 27, 2023 at 3:51 AM Jiufu Guo <guoji...@linux.ibm.com> wrote: > >> > >> Hi, > >> > >> Compared with the previous version: > >> https://gcc.gnu.org/pipermail/gcc-patches/2023-October/632399.html > >> this version supports TI/VEC modes of the access. > >> > >> There are a few PRs (meta-bug PR101926) on various targets. > >> Their root causes are similar: the aggregate params/ > >> returns are passed in multiple registers, but they are first > >> stored to the stack from those registers; and then the > >> parameter is accessed through the stack slot. > >> > >> A general idea to enhance this: access the aggregate > >> parameters/returns directly through the incoming/outgoing > >> scalar registers. This idea is a kind of SRA. > >> > >> This experimental patch for light-expander-sra contains > >> the parts below: > >> > >> a. Check whether the parameters/returns are ok/profitable to > >> scalarize, and set the incoming/outgoing registers ( > >> pseudos) for the parameter/return. > >> - This is done in "expand_function_start", after the > >> incoming/outgoing hard registers are determined for the > >> parameter/return. > >> The scalarized registers are recorded in DECL_RTL for > >> the parameter/return in parallel form. > >> - At the time when setting DECL_RTL, "scalarizable_aggregate" > >> is called to check whether the accesses are ok/profitable to > >> scalarize. > >> We can continue to enhance this function to support > >> more cases. For example: > >> - 'reverse storage order'. > >> - 'writing to parameter'/'overlapping accesses'. > >> > >> b. When expanding the accesses of the parameters/returns, > >> according to the info of the access (e.g. bitpos, bitsize, > >> mode), the scalars (pseudos) can be figured out to expand > >> the access. This may happen when expanding the accesses below: > >> - The component access of a parameter: "_1 = arg.f1".
> >> Or whole parameter access: the rhs of "_2 = arg". > >> - The assignment to a return val: > >> "D.xx = yy;" or "D.xx.f = zz;" where D.xx occurs on a return > >> stmt. > >> - This is mainly done in expr.cc (expand_expr_real_1 and > >> expand_assignment). Function "extract_sub_member" is > >> used to figure out the scalar rtxs (pseudos). > >> > >> Besides the above two parts, some work is done on the GIMPLE > >> tree: collect sra candidates for parameters/returns, and > >> collect the SRA access info. > >> This is mainly done at the beginning of the expander pass. > >> Below are the two major items of this part. > >> - Collect light-expand-sra candidates. > >> Each parameter is checked for whether it has a proper aggregate > >> type. Collect the return val (VAR_P) of each return stmt if > >> the function is returning via registers. > >> This is implemented in expand_sra::collect_sra_candidates. > >> > >> - Build/collect/manage all the accesses on the candidates. > >> The function "scan_function" is used to do this work; it > >> goes through all basic blocks, and all interesting stmts ( > >> phi, return, assign, call, asm) are checked. > >> If there is an interesting expression (e.g. COMPONENT_REF > >> or PARM_DECL), the required info for the access is recorded > >> (e.g. pos, size, type, base). > >> And if it is risky to do SRA, the candidates may be removed, > >> e.g. when address-taken and accessed via memory: > >> "foo(struct S arg) {bar (&arg);}" > >> > >> This patch is tested on ppc64{,le} and x86_64. > >> Is this ok for trunk? > >> > >> BR, > >> Jeff (Jiufu Guo) > >> > >> PR target/65421 > >> > >> gcc/ChangeLog: > >> > >> * cfgexpand.cc (struct access): New class. > >> (struct expand_sra): New class. > >> (expand_sra::collect_sra_candidates): New member function. > >> (expand_sra::add_sra_candidate): Likewise. > >> (expand_sra::build_access): Likewise. > >> (expand_sra::analyze_phi): Likewise. > >> (expand_sra::analyze_assign): Likewise. > >> (expand_sra::visit_base): Likewise.
> >> (expand_sra::protect_mem_access_in_stmt): Likewise. > >> (expand_sra::expand_sra): Class constructor. > >> (expand_sra::~expand_sra): Class destructor. > >> (expand_sra::scalarizable_access): New member function. > >> (expand_sra::scalarizable_accesses): Likewise. > >> (scalarizable_aggregate): New function. > >> (set_scalar_rtx_for_returns): New function. > >> (expand_value_return): Updated. > >> (expand_debug_expr): Updated. > >> (pass_expand::execute): Updated to use expand_sra. > >> * cfgexpand.h (scalarizable_aggregate): Declare. > >> (set_scalar_rtx_for_returns): Declare. > >> * expr.cc (expand_assignment): Updated. > >> (expand_constructor): Updated. > >> (query_position_in_parallel): New function. > >> (extract_sub_member): New function. > >> (expand_expr_real_1): Updated. > >> * expr.h (query_position_in_parallel): Declare. > >> * function.cc (assign_parm_setup_block): Updated. > >> (assign_parms): Updated. > >> (expand_function_start): Updated. > >> * tree-sra.h (struct sra_base_access): New class. > >> (struct sra_default_analyzer): New class. > >> (scan_function): New template function. > >> * var-tracking.cc (track_loc_p): Updated. > >> > >> gcc/testsuite/ChangeLog: > >> > >> * g++.target/powerpc/pr102024.C: Updated. > >> * gcc.target/powerpc/pr108073.c: New test. > >> * gcc.target/powerpc/pr65421-1.c: New test. > >> * gcc.target/powerpc/pr65421-2.c: New test.
> >> > >> --- > >> gcc/cfgexpand.cc | 352 ++++++++++++++++++- > >> gcc/cfgexpand.h | 2 + > >> gcc/expr.cc | 179 +++++++++- > >> gcc/expr.h | 3 + > >> gcc/function.cc | 36 +- > >> gcc/tree-sra.h | 76 ++++ > >> gcc/var-tracking.cc | 3 +- > >> gcc/testsuite/g++.target/powerpc/pr102024.C | 2 +- > >> gcc/testsuite/gcc.target/i386/pr20020-2.c | 5 + > >> gcc/testsuite/gcc.target/powerpc/pr108073.c | 29 ++ > >> gcc/testsuite/gcc.target/powerpc/pr65421-1.c | 6 + > >> gcc/testsuite/gcc.target/powerpc/pr65421-2.c | 32 ++ > >> 12 files changed, 718 insertions(+), 7 deletions(-) > >> create mode 100644 gcc/testsuite/gcc.target/powerpc/pr108073.c > >> create mode 100644 gcc/testsuite/gcc.target/powerpc/pr65421-1.c > >> create mode 100644 gcc/testsuite/gcc.target/powerpc/pr65421-2.c > >> > >> diff --git a/gcc/cfgexpand.cc b/gcc/cfgexpand.cc > >> index 4262703a138..ef99ca8ac13 100644 > >> --- a/gcc/cfgexpand.cc > >> +++ b/gcc/cfgexpand.cc > >> @@ -74,6 +74,7 @@ along with GCC; see the file COPYING3. If not see > >> #include "output.h" > >> #include "builtins.h" > >> #include "opts.h" > >> +#include "tree-sra.h" > >> > >> /* Some systems use __main in a way incompatible with its use in gcc, in > >> these > >> cases use the macros NAME__MAIN to give a quoted symbol and > >> SYMBOL__MAIN to > >> @@ -97,6 +98,343 @@ static bool defer_stack_allocation (tree, bool); > >> > >> static void record_alignment_for_reg_var (unsigned int); > >> > >> +/* For light SRA in the expander for parameters and returns. */ > >> +struct access : public sra_base_access > >> +{ > >> +}; > >> + > >> +typedef struct access *access_p; > >> + > >> +struct expand_sra : public sra_default_analyzer > >> +{ > >> + /* Construct/destruct resources, e.g. sra candidates. */ > >> + expand_sra (); > >> + ~expand_sra (); > >> + > >> + /* No actions for pre_analyze_stmt, analyze_return. */ > >> + > >> + /* Override the phi, call, and asm analyses.
*/ > >> + void analyze_phi (gphi *s); > >> + > >> + /* TODO: Check accesses on call/asm. */ > >> + void analyze_call (gcall *s) { protect_mem_access_in_stmt (s); }; > >> + void analyze_asm (gasm *s) { protect_mem_access_in_stmt (s); }; > >> + > >> + /* Check SRA accesses on an assignment. */ > >> + void analyze_assign (gassign *); > >> + > >> + /* Check if the accesses of BASE (parameter or return) are > >> + scalarizable, according to the incoming/outgoing REGS. */ > >> + bool scalarizable_accesses (tree base, rtx regs); > >> + > >> +private: > >> + /* Collect the parameters and returns to check if they are suitable for > >> + scalarization. */ > >> + bool collect_sra_candidates (void); > >> + > >> + /* Return true if VAR is added as a candidate for SRA. */ > >> + bool add_sra_candidate (tree var); > >> + > >> + /* Return the created access if EXPR has an interesting SRA access, > >> + return NULL otherwise. */ > >> + access_p build_access (tree expr, bool write); > >> + > >> + /* Check if the access ACC is scalarizable. REGS is the > >> incoming/outgoing > >> + registers which the access is based on. */ > >> + bool scalarizable_access (access_p acc, rtx regs, bool is_parm); > >> + > >> + /* If there is risk (stored/loaded or addr taken), > >> + disqualify the sra candidates in the un-interesting STMT. */ > >> + void protect_mem_access_in_stmt (gimple *stmt); > >> + > >> + /* Callback of walk_stmt_load_store_addr_ops, used to remove > >> + unscalarizable accesses. */ > >> + static bool visit_base (gimple *, tree op, tree, void *data); > >> + > >> + /* Base (tree) -> Vector (vec<access_p> *) map. */ > >> + hash_map<tree, auto_vec<access_p> > *base_access_vec; > >> +}; > >> + > >> +bool > >> +expand_sra::collect_sra_candidates (void) > >> +{ > >> + bool ret = false; > >> + > >> + /* Collect parameters.
*/ > >> + for (tree parm = DECL_ARGUMENTS (current_function_decl); parm; > >> + parm = DECL_CHAIN (parm)) > >> + ret |= add_sra_candidate (parm); > >> + > >> + /* Collect VARs on returns. */ > >> + if (DECL_RESULT (current_function_decl)) > >> + { > >> + edge_iterator ei; > >> + edge e; > >> + FOR_EACH_EDGE (e, ei, EXIT_BLOCK_PTR_FOR_FN (cfun)->preds) > >> + if (greturn *r = safe_dyn_cast<greturn *> (*gsi_last_bb (e->src))) > >> + { > >> + tree val = gimple_return_retval (r); > >> + /* To scalarize the return, the return value should only be > >> + written (except by this return stmt). Use 'true (write)' to > >> + pretend the access is only a write. */ > >> + if (val && VAR_P (val)) > >> + ret |= add_sra_candidate (val) && build_access (val, true); > >> + } > >> + } > >> + > >> + return ret; > >> +} > >> + > >> +bool > >> +expand_sra::add_sra_candidate (tree var) > >> +{ > >> + tree type = TREE_TYPE (var); > >> + > >> + if (!AGGREGATE_TYPE_P (type) || !tree_fits_shwi_p (TYPE_SIZE (type)) > >> + || tree_to_shwi (TYPE_SIZE (type)) == 0 || TREE_THIS_VOLATILE (var) > >> + || is_va_list_type (type)) > >> + return false; > >> + > >> + base_access_vec->get_or_insert (var); > >> + > >> + return true; > >> +} > >> + > >> +access_p > >> +expand_sra::build_access (tree expr, bool write) > >> +{ > >> + enum tree_code code = TREE_CODE (expr); > >> + if (code != VAR_DECL && code != PARM_DECL && code != COMPONENT_REF > >> + && code != ARRAY_REF && code != ARRAY_RANGE_REF) > >> + return NULL; > >> + > >> + HOST_WIDE_INT offset, size; > >> + bool reverse; > >> + tree base = get_ref_base_and_extent_hwi (expr, &offset, &size, > >> &reverse); > >> + if (!base || !DECL_P (base)) > >> + return NULL; > >> + > >> + vec<access_p> *access_vec = base_access_vec->get (base); > >> + if (!access_vec) > >> + return NULL; > >> + > >> + /* TODO: support reverse.
*/ > >> + if (reverse || size <= 0 || offset + size > tree_to_shwi (DECL_SIZE > >> (base))) > >> + { > >> + base_access_vec->remove (base); > >> + return NULL; > >> + } > >> + > >> + struct access *access = XNEWVEC (struct access, 1); > >> + > >> + memset (access, 0, sizeof (struct access)); > >> + access->offset = offset; > >> + access->size = size; > >> + access->expr = expr; > >> + access->write = write; > >> + access->reverse = reverse; > >> + > >> + access_vec->safe_push (access); > >> + > >> + return access; > >> +} > > > > I still think there's SRA code we can share for producing > > accesses from a stmt. > > Yes, there is similar code in the existing SRAs; commonizing the code is a > todo item. > One potential issue: for example, in "analyze/scan expr and > build/create access", some small scattered differences (e.g. the order of > if conditions) may cause different behavior. This may make > the common code un-pretty. > > > > >> + > >> +/* Function analyze_phi removes the SRA candidates if the address > >> + of a candidate is taken in the PHI. */ > >> + > >> +void > >> +expand_sra::analyze_phi (gphi *stmt) > >> +{ > >> + if (base_access_vec && !base_access_vec->is_empty ()) > >> + walk_stmt_load_store_addr_ops (stmt, this, NULL, NULL, visit_base); > >> +} > >> + > >> +void > >> +expand_sra::analyze_assign (gassign *stmt) > >> +{ > >> + if (!base_access_vec || base_access_vec->is_empty ()) > >> + return; > >> + > >> + if (gimple_assign_single_p (stmt) && !gimple_clobber_p (stmt)) > >> + { > >> + tree rhs = gimple_assign_rhs1 (stmt); > >> + tree lhs = gimple_assign_lhs (stmt); > >> + bool res_r = build_access (rhs, false); > >> + bool res_l = build_access (lhs, true); > >> + > >> + if (res_l || res_r) > >> + return; > >> + } > >> + > >> + protect_mem_access_in_stmt (stmt); > >> +} > >> + > >> +/* Callback of walk_stmt_load_store_addr_ops, used to remove > >> + unscalarizable accesses. Called by protect_mem_access_in_stmt.
*/ > >> + > >> +bool > >> +expand_sra::visit_base (gimple *, tree op, tree, void *data) > >> +{ > >> + op = get_base_address (op); > >> + if (op && DECL_P (op)) > >> + { > >> + expand_sra *p = (expand_sra *) data; > >> + p->base_access_vec->remove (op); > >> + } > >> + return false; > >> +} > >> + > >> +/* Function protect_mem_access_in_stmt removes the SRA candidates if > >> + there is a store/load/addr-taken on a candidate in the STMT. > >> + > >> + For statements which SRA does not care about, if there are > >> + possible memory operations on the SRA candidates, it would be risky > >> + to scalarize them. */ > >> + > >> +void > >> +expand_sra::protect_mem_access_in_stmt (gimple *stmt) > >> +{ > >> + if (base_access_vec && !base_access_vec->is_empty ()) > >> + walk_stmt_load_store_addr_ops (stmt, this, visit_base, visit_base, > >> + visit_base); > >> +} > >> + > >> +expand_sra::expand_sra () : base_access_vec (NULL) > >> +{ > >> + if (optimize <= 0) > >> + return; > >> + > >> + base_access_vec = new hash_map<tree, auto_vec<access_p> >; > >> + collect_sra_candidates (); > >> +} > >> + > >> +expand_sra::~expand_sra () > >> +{ > >> + if (optimize <= 0) > >> + return; > >> + > >> + delete base_access_vec; > >> +} > >> + > >> +bool > >> +expand_sra::scalarizable_access (access_p acc, rtx regs, bool is_parm) > >> +{ > >> + /* Now only support reading from parms > >> + or writing to returns. */ > >> + if (is_parm && acc->write) > >> + return false; > >> + if (!is_parm && !acc->write) > >> + return false; > >> + > >> + /* Compute the position of the access in the parallel regs. */ > >> + int start_index = -1; > >> + int end_index = -1; > >> + HOST_WIDE_INT left_bits = 0; > >> + HOST_WIDE_INT right_bits = 0; > >> + query_position_in_parallel (acc->offset, acc->size, regs, start_index, > >> + end_index, left_bits, right_bits); > >> + > >> + /* Invalid access position: padding or out of bounds.
*/ > >> + if (start_index < 0 || end_index < 0) > >> + return false; > >> + > >> + machine_mode expr_mode = TYPE_MODE (TREE_TYPE (acc->expr)); > >> + /* Need multiple registers in a parallel for the access. */ > >> + if (expr_mode == BLKmode || end_index > start_index) > >> + { > >> + if (left_bits || right_bits) > >> + return false; > >> + if (expr_mode == BLKmode) > >> + return true; > >> + > >> + /* For large modes, only support TI/VECTOR modes in multiple registers. */ > >> + if (known_gt (acc->size, GET_MODE_BITSIZE (word_mode))) > >> + return expr_mode == TImode || VECTOR_MODE_P (expr_mode); > >> + return true; > >> + } > >> + > >> + gcc_assert (end_index == start_index); > >> + > >> + /* Just need one reg for the access. */ > >> + if (left_bits == 0 && right_bits == 0) > >> + return true; > >> + > >> + scalar_int_mode imode; > >> + /* Need to extract bits from the reg for the access. */ > >> + return !acc->write && int_mode_for_mode (expr_mode).exists (&imode); > >> +} > >> + > >> +/* The base (parm/return) is scalarizable only if all > >> + accesses of the BASE are scalarizable. > >> + > >> + This function needs to be updated to support more complicated > >> + cases, like: > >> + - Some accesses are scalarizable, but some are not. > >> + - An access writes to a parameter. > >> + - Writing accesses overlap with multiple accesses.
*/ > >> + > >> +bool > >> +expand_sra::scalarizable_accesses (tree base, rtx regs) > >> +{ > >> + if (!base_access_vec) > >> + return false; > >> + vec<access_p> *access_vec = base_access_vec->get (base); > >> + if (!access_vec) > >> + return false; > >> + if (access_vec->is_empty ()) > >> + return false; > >> + > >> + bool is_parm = TREE_CODE (base) == PARM_DECL; > >> + int n = access_vec->length (); > >> + int cur_access_index = 0; > >> + for (; cur_access_index < n; cur_access_index++) > >> + if (!scalarizable_access ((*access_vec)[cur_access_index], regs, > >> is_parm)) > >> + break; > >> + > >> + /* It is ok if all accesses are scalarizable. */ > >> + if (cur_access_index == n) > >> + return true; > >> + > >> + base_access_vec->remove (base); > >> + return false; > >> +} > >> + > >> +static expand_sra *current_sra = NULL; > >> + > >> +/* Check if the PARAM (or return) is scalarizable. > >> + > >> + This interface is used in expand_function_start > >> + to check the SRA possibility for parameters. */ > >> + > >> +bool > >> +scalarizable_aggregate (tree parm, rtx regs) > >> +{ > >> + if (!current_sra) > >> + return false; > >> + return current_sra->scalarizable_accesses (parm, regs); > >> +} > >> + > >> +/* Check for interesting returns, and if they are scalarizable, > >> + set DECL_RTL as scalar registers. > >> + > >> + This interface is used in expand_function_start > >> + when outgoing registers are determined for DECL_RESULT.
*/ > >> + > >> +void > >> +set_scalar_rtx_for_returns () > >> +{ > >> + rtx res = DECL_RTL (DECL_RESULT (current_function_decl)); > >> + edge_iterator ei; > >> + edge e; > >> + FOR_EACH_EDGE (e, ei, EXIT_BLOCK_PTR_FOR_FN (cfun)->preds) > >> + if (greturn *r = safe_dyn_cast<greturn *> (*gsi_last_bb (e->src))) > >> + { > >> + tree val = gimple_return_retval (r); > >> + if (val && VAR_P (val) && scalarizable_aggregate (val, res)) > >> + SET_DECL_RTL (val, res); > >> + } > >> +} > >> + > >> /* Return an expression tree corresponding to the RHS of GIMPLE > >> statement STMT. */ > >> > >> @@ -3707,7 +4045,7 @@ expand_value_return (rtx val) > >> > >> tree decl = DECL_RESULT (current_function_decl); > >> rtx return_reg = DECL_RTL (decl); > >> - if (return_reg != val) > >> + if (!rtx_equal_p (return_reg, val)) > >> { > >> tree funtype = TREE_TYPE (current_function_decl); > >> tree type = TREE_TYPE (decl); > >> @@ -4423,6 +4761,12 @@ expand_debug_expr (tree exp) > >> addr_space_t as; > >> scalar_int_mode op0_mode, op1_mode, addr_mode; > >> > >> + /* TODO: Enable debugging of expand-sra-optimized parms/returns. */ > >> + tree base = get_base_address (exp); > >> + if ((TREE_CODE (base) == PARM_DECL || (VAR_P (base) && DECL_RTL_SET_P > >> (base))) > >> + && GET_CODE (DECL_RTL (base)) == PARALLEL) > >> + return NULL_RTX; > >> + > >> switch (TREE_CODE_CLASS (TREE_CODE (exp))) > >> { > >> case tcc_expression: > >> @@ -6628,6 +6972,10 @@ pass_expand::execute (function *fun) > >> auto_bitmap forced_stack_vars; > >> discover_nonconstant_array_refs (forced_stack_vars); > >> > >> + /* Enable light-expander-sra. */ > >> + current_sra = new expand_sra; > >> + scan_function (cfun, *current_sra); > >> + > >> /* Make sure all values used by the optimization passes have sane > >> defaults.
*/ > >> reg_renumber = 0; > >> @@ -7056,6 +7404,8 @@ pass_expand::execute (function *fun) > >> loop_optimizer_finalize (); > >> } > >> > >> + delete current_sra; > >> + current_sra = NULL; > >> timevar_pop (TV_POST_EXPAND); > >> > >> return 0; > >> diff --git a/gcc/cfgexpand.h b/gcc/cfgexpand.h > >> index 0e551f6cfd3..3415c217708 100644 > >> --- a/gcc/cfgexpand.h > >> +++ b/gcc/cfgexpand.h > >> @@ -24,5 +24,7 @@ extern tree gimple_assign_rhs_to_tree (gimple *); > >> extern HOST_WIDE_INT estimated_stack_frame_size (struct cgraph_node *); > >> extern void set_parm_rtl (tree, rtx); > >> > >> +extern bool scalarizable_aggregate (tree, rtx); > >> +extern void set_scalar_rtx_for_returns (); > >> > >> #endif /* GCC_CFGEXPAND_H */ > >> diff --git a/gcc/expr.cc b/gcc/expr.cc > >> index 763bd82c59f..5ba26e0ef52 100644 > >> --- a/gcc/expr.cc > >> +++ b/gcc/expr.cc > >> @@ -5618,7 +5618,10 @@ expand_assignment (tree to, tree from, bool > >> nontemporal) > >> Assignment of an array element at a constant index, and assignment of > >> an array element in an unaligned packed structure field, has the same > >> problem. Same for (partially) storing into a non-memory object. */ > >> - if (handled_component_p (to) > >> + if ((handled_component_p (to) > >> + && !(VAR_P (get_base_address (to)) > > > > get_base_address is pretty expensive so please cache the result. > > Thanks for pointing this out! > > > > >> + && DECL_RTL_SET_P (get_base_address (to)) > >> + && GET_CODE (DECL_RTL (get_base_address (to))) == PARALLEL)) > > > > I think that's possibly fragile in case a PARALLEL can also appear by > > other means here. I also think it's the wrong thing to map a structural > > I understand your concern. The current trunk code uses PARALLEL to carry > the incoming/outgoing registers for an aggregate, so this patch reuses this > characteristic, while it is possible that a PARALLEL is used by other means. > To make it safe, adding an additional flag may be an idea.
(While this > would not be needed if using your suggestions below.) > > > GIMPLE accordingly with artificial decls which would also be the way > > to preserve debug info. > > > > In principle "expander SRA" isn't different from regular SRA apart from > > the fact that it handles parameter and return variables and heuristically > > includes the ABI (does it?) in decomposition. > > > > So the only important difference is that we cannot represent the > > change for the incoming arguments or the outgoing regs in GIMPLE > > but we should only have to special case _those_ bits, not all the > > other accesses we can rewrite. > > > > We could even use the actual SRA pass in a special mode right > > before RTL expansion (or close to it) and invent a special (but ugly) > > representation for the incoming/outgoing part, say by using > > new internal functions present between the SRA pass and RTL > > expansion: > > > Previously, I tried rerunning tree-sra once just before the > "optimized" pass, and found that some enhancements would be needed to > make it work: for example, the parameter analysis seems not to work for > some cases. The two major reasons for me to combine the > 'light-sra' inside the expander are: > 1. We may just use the regs/pseudos based on the incoming/outgoing registers > to access the struct. Then there is no need to create replacements as > tree-sra and ipa-sra do. > > 2. For small structures which may have int mode, RTL passes (starting from > the expander) use the int reg to access the struct. It would be > better to avoid struct/member accesses at the beginning of the expander. > It seems these issues can also be relieved by the idea of introducing > "new internal functions". We just need to expand the internal functions > in the expected way.
> > struct X foo (struct X x) > > { > > int D.1234 = .ARG_PART (x, offset, size); > > int D.1235 = .ARG_PART (x, offset, size); > > D.2345 = .SET_ARG_PARTS (D.1234, offset, size, D.1235, offset, size); > > return D.2345; > > } > > This code expresses two major functionalities: one is extracting fields > from an aggregate (mainly for parameters); the other is constructing > an aggregate from the values of its elements (mainly for returns).
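[For concreteness, a minimal C sketch of those two directions of data movement: reading the fields of an aggregate parameter, and building an aggregate return value. The struct and function names here are invented for illustration and are not taken from the patch or its testsuite; whether such a small aggregate actually travels in registers is ABI-dependent (it does on ppc64le ELFv2, for example).]

```c
/* Hypothetical illustration of the two access patterns above.  On ABIs
   that pass/return this aggregate in registers, the goal of the patch
   is to expand the field accesses below directly from/to those
   registers instead of going through a stack slot.  */
struct pair
{
  double a;
  double b;
};

/* Reads like "_1 = arg.a": extracting fields from an aggregate.  */
double
sum_fields (struct pair arg)
{
  return arg.a + arg.b;
}

/* Writes like "D.xx.a = x": constructing an aggregate return value.  */
struct pair
make_pair (double x, double y)
{
  struct pair r;
  r.a = x;
  r.b = y;
  return r;
}
```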
Yes. In principle BIT_FIELD_REF could be used for the former and a CONSTRUCTOR used for the latter, but I've used internal functions to tightly control how we expand these bits - in fact I think, for all practical cases we care about, the SRA pass should have made sure they expand to plain (register) moves; that is, the pieces correspond 1:1 to incoming argument registers or outgoing return value registers. Richard.
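[The scalarizability check discussed in the patch hinges on where an access falls within the row of registers. The following is a simplified standalone model of that bookkeeping, written for this discussion: it assumes contiguous, equal-sized registers with no padding, whereas GCC's actual query_position_in_parallel works on a PARALLEL rtx and is more general.]

```c
/* Simplified model (illustration only, not GCC's implementation) of
   mapping an access at bit offset BITPOS with width BITSIZE onto a
   row of equal-sized REGBITS-bit registers laid out back to back.  */
static void
position_in_regs (long bitpos, long bitsize, long regbits,
                  int *start_index, int *end_index,
                  long *left_bits, long *right_bits)
{
  /* First and last register touched by the access.  */
  *start_index = (int) (bitpos / regbits);
  *end_index = (int) ((bitpos + bitsize - 1) / regbits);
  /* Bits skipped at the start of the first register.  */
  *left_bits = bitpos % regbits;
  /* Bits left over past the end of the access in the last register.  */
  *right_bits = (*end_index + 1) * regbits - (bitpos + bitsize);
}
```

As in the patch's scalarizable_access, an access is easy to handle when it covers whole registers (left_bits and right_bits both zero) or lands inside a single register (start_index == end_index), in which case the value can be extracted with a shift and mask.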