Hi Martin,
I am not sure what your current opinion is on re-using the ipa-sra code in the light expander-sra. If there is anything I can contribute, please let me know.

I have been thinking about the differences between expander-sra, ipa-sra and tree-sra:
1. Statement walking: expander-sra has special handling for return statements and, to a lesser extent, for assign statements. Also, PHI statements are not checked by ipa-sra/tree-sra.
2. Access structure: I am also wondering whether we need a tree structure. It would be useful for checking overlaps, but expander-sra does not use one yet.

Between ipa-sra and tree-sra there is some similar code, but of course there are differences, and they seem intended. For example:
1. When creating accesses, 'size != max_size' is accepted in tree-sra but not in ipa-sra.
2. The 'AGGREGATE_TYPE_P' check: some cases are accepted by ipa-sra but not by tree-sra.
I am wondering whether these slight differences block re-using code between ipa-sra and tree-sra.

The expander-sra may stay lighter. For example, we could perhaps use FOR_EACH_IMM_USE_STMT to check the uses of each parameter instead of walking all the statements.

BR,
Jeff (Jiufu Guo)

Jiufu Guo via Gcc-patches <gcc-patches@gcc.gnu.org> writes:

> Hi Martin,
>
> Jiufu Guo via Gcc-patches <gcc-patches@gcc.gnu.org> writes:
>
>> Hi,
>>
>> Martin Jambor <mjam...@suse.cz> writes:
>>
>>> Hi,
>>>
>>> On Tue, May 30 2023, Richard Biener wrote:
>>>> On Mon, 29 May 2023, Jiufu Guo wrote:
>>>>
>>>>> Hi,
>>>>>
>>>>> Previously, I was investigating some PRs related to struct parameters
>>>>> and returns: PRs 69143/65421/108073.
>>>>>
>>>>> I investigated the issues case by case and drafted a patch for each of
>>>>> them. This helps us enhance the code incrementally. However, this way
>>>>> the patches would interact with each other and implement different code
>>>>> for similar issues (because of the different paths in gimple/rtl). We
>>>>> may want a common fix for those issues.
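To make the access-structure comparison above concrete, here is a minimal standalone sketch of the 'flat' bookkeeping the expander-sra uses instead of an access tree: one vector of accesses per candidate base. It is not GCC code. FlatAccessTable, Access and the string-keyed bases are hypothetical stand-ins for base_access_vec, struct access and the PARM_DECL/VAR_DECL keys; the sketch only mirrors the range checks of create_access and the rule in check_access_vec that a base is kept only when its accesses are all reads or all writes.

```cpp
#include <cassert>
#include <cstdint>
#include <map>
#include <string>
#include <vector>

// Hypothetical, simplified stand-in for the patch's 'struct access'.
struct Access {
  std::int64_t offset;  // bit offset from the start of the base
  std::int64_t size;    // size in bits
  bool writing;
};

// Flat table: base -> vector of accesses; no tree, no overlap checks.
class FlatAccessTable {
 public:
  // Register BASE (a parameter or returned variable) of DECL_SIZE bits.
  void add_candidate(const std::string& base, std::int64_t decl_size) {
    assert(decl_size > 0);
    decl_size_[base] = decl_size;
    accesses_[base];  // create an empty access vector
  }

  // Like create_access: only record accesses to live candidates, and
  // silently skip empty or out-of-range ones (the base stays a candidate).
  bool create_access(const std::string& base, std::int64_t offset,
                     std::int64_t size, bool write) {
    auto it = accesses_.find(base);
    if (it == accesses_.end()) return false;
    if (size <= 0 || offset < 0 || offset + size > decl_size_.at(base))
      return false;
    it->second.push_back(Access{offset, size, write});
    return true;
  }

  // E.g. when the address of BASE is taken.
  void disqualify(const std::string& base) { accesses_.erase(base); }

  // Like check_access_vec + analyze_accesses: drop bases whose accesses
  // are mixed reads/writes, or which have no accesses at all.
  void analyze() {
    std::vector<std::string> unqualified;
    for (const auto& entry : accesses_) {
      bool read = false;
      bool write = false;
      for (const Access& a : entry.second) {
        if (a.writing)
          write = true;
        else
          read = true;
      }
      if (read == write) unqualified.push_back(entry.first);
    }
    for (const std::string& base : unqualified) disqualify(base);
  }

  bool is_candidate(const std::string& base) const {
    return accesses_.count(base) != 0;
  }

 private:
  std::map<std::string, std::vector<Access>> accesses_;
  std::map<std::string, std::int64_t> decl_size_;
};
```

As in the patch, a base that is never accessed is dropped too, and an out-of-range access is simply not recorded rather than disqualifying its base. Anything that needs overlap information would require the tree structure discussed above.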
>>>>>
>>>>> We know that a few other related PRs exist (such as the meta-bug
>>>>> PR101926). Those PRs are on different targets with different symptoms
>>>>> (and different root causes). I would expect one method to help some of
>>>>> them, but it may be hard to handle all of them in one fix.
>>>>>
>>>>> While investigating and reading the discussions of those issues, I
>>>>> remembered a suggestion from Richard: it would be nice to perform some
>>>>> SRA-like analysis for the accesses on the structs (parameters/returns).
>>>>> https://gcc.gnu.org/pipermail/gcc-patches/2022-November/605117.html
>>>>> This may be a 'fairly common method' for those issues. Based on this
>>>>> idea, I drafted the patch below.
>>>>>
>>>>> I also thought about using tree-sra.cc directly, e.g. enhancing it and
>>>>> rerunning it at the end of the GIMPLE passes. However, since some issues
>>>>> are introduced inside the expander, the patch below also works together
>>>>> with other parts of the expander. And since tree-sra already runs in the
>>>>> GIMPLE passes, this patch only needs to take care of parameters and
>>>>> returns; other decls are handled well by tree-sra.
>>>>>
>>>>> The steps of this patch are:
>>>>> 1. Collect the struct-typed parameters and returns, then scan the
>>>>>    function to get the accesses on them, and figure out which accesses
>>>>>    would be profitable to scalarize (using the registers of the
>>>>>    parameter/return). Currently, only reads of parameters and writes
>>>>>    to returns are handled.
>>>>> 2. When/after the scalar registers for the return or parameters are
>>>>>    determined/expanded, compute the corresponding scalar register(s)
>>>>>    for each access of the return/parameter, and prepare the scalar
>>>>>    RTLs for those accesses.
>>>>> 3. When expanding the access expressions, use the prepared scalars
>>>>>    directly.
>>>>>
>>>>> This patch is tested on ppc64 both LE and BE.
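Step 2 above (mapping each access onto the incoming or returning registers) can be illustrated with a simplified, self-contained C++ model of the register search in set_scalar_rtx_for_aggregate_access. This is a hedged sketch, not the patch itself: RegSlot, Mapping, AccessKind and map_access are hypothetical names; registers are reduced to (byte offset, bit width) pairs like the EXPR_LIST entries of a PARALLEL; 8-bit bytes are assumed; and the mode checks of the real code (BLKmode, int_mode_for_size) are omitted.

```cpp
#include <cassert>
#include <cstdint>
#include <vector>

// One register slot: byte offset inside the aggregate and width in bits.
struct RegSlot {
  std::int64_t byte_offset;
  std::int64_t bit_size;
};

enum class AccessKind {
  kUnsupported,     // padding/out of range, or a shape that is rejected
  kWholeRegister,   // exactly one register
  kPartOfRegister,  // shift + lowpart of one register (reads only)
  kMultiRegister    // a PARALLEL over several registers
};

struct Mapping {
  AccessKind kind;
  int start_index;
  int end_index;
  std::int64_t left_margin_bits;
  std::int64_t right_margin_bits;
};

// ACC_OFFSET and ACC_SIZE are in bits, relative to the aggregate.
Mapping map_access(const std::vector<RegSlot>& regs, std::int64_t acc_offset,
                   std::int64_t acc_size, bool writing) {
  assert(acc_size > 0);
  Mapping m{AccessKind::kUnsupported, -1, -1, 0, 0};
  for (int i = 0; i < static_cast<int>(regs.size()); ++i) {
    std::int64_t off = regs[i].byte_offset * 8;  // assumes 8-bit bytes
    std::int64_t size = regs[i].bit_size;
    // The register containing the first bit of the access.
    if (off <= acc_offset && off + size > acc_offset) {
      m.start_index = i;
      m.left_margin_bits = acc_offset - off;
    }
    // The first register reaching the last bit of the access.
    if (off + size >= acc_offset + acc_size) {
      m.end_index = i;
      m.right_margin_bits = off + size - (acc_offset + acc_size);
      break;
    }
  }
  if (m.start_index < 0 || m.end_index < 0) return m;

  if (m.end_index > m.start_index) {
    // Must start on a register boundary; a partially used last register
    // is only handled for reads.
    if (m.left_margin_bits == 0 && !(writing && m.right_margin_bits != 0))
      m.kind = AccessKind::kMultiRegister;
    return m;
  }
  if (m.left_margin_bits == 0 && m.right_margin_bits == 0)
    m.kind = AccessKind::kWholeRegister;
  else if (!writing)
    m.kind = AccessKind::kPartOfRegister;
  return m;
}
```

For two 64-bit GPRs at byte offsets 0 and 8, a 16-bit read at bit offset 96 falls in the second register with a 32-bit left margin, i.e. the "shift and take the lowpart" case. The same access as a write is rejected, consistent with returns only supporting the whole-register and PARALLEL cases.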
>>>>> To continue, I would like to ask for comments and suggestions first,
>>>>> and then update/enhance the patch accordingly. Thanks in advance!
>>>>
>>>> Thanks for working on this - the description above sounds exactly like
>>>> what should be done.
>>>>
>>>> Now - I'd like the code to re-use the access tree data structure from
>>>> SRA plus at least the worker creating the accesses from a stmt.
>>>
>
> I'm thinking about which parts of the code can be re-used from
> ipa-sra and tree-sra.
> There seem to be some similar concepts between them:
> "access with offset/size", "collect and check candidates",
> "analyze accesses"...
>
> However, because their purposes are different, the logic and behavior
> of ipa-sra, tree-sra and expander-sra differ, even for similar
> concepts.
>
> Identical behavior and similar concepts may be reusable. The list
> below covers some of them:
> *. allocate and maintain accesses
>    basic access structure: offset, size, reverse
> *. type or expr checking
> *. disqualify
> *. scan and build expr accesses
> *. scan and walk stmts (return/assign/call/asm)
> *. collect candidates
> *. initialize/deinitialize
> *. access dump
>
> There are different behaviors for similar concepts.
> For example:
> *. An access has grp_* flags/queues in tree-sra and a nonarg flag in
>    ipa-sra, while expander-sra does not check an access's
>    child/sibling yet.
> *. For the same stmt (assign/call), each SRA checks different logic.
> *. Candidates are checked differently: ipa-sra checks more things.
>
> Does this align with your thoughts? Thanks for your comments!
>
> BR,
> Jeff (Jiufu Guo)
>
>> Thanks Martin for your reply and thanks for your time!
>>
>>> I have had a first look at the patch but still need to look into it more
>>> to understand how it uses the information it gathers.
>>>
>>> My plan is to make the access-tree infrastructure of IPA-SRA more
>>> generic and hopefully usable even for this purpose, rather than the one
>>> in tree-sra.cc.
>>> But that really builds a tree of accesses, bailing out
>>> on any partial overlaps, for example, which may not be the right thing
>>> here since I don't see any tree-building here.
>>
>> Yes, both tree-sra and ipa-sra have the concepts of "access" and
>> "scan functions/stmts", and this light-sra uses them too. As you may
>> have noticed, ipa-sra and tree-sra contain more logic than the current
>> 'light-expand-sra'.
>>
>> Currently, the 'light-expand-sra' only takes care of a few things:
>> reads from parameters, writes to returns, and disabling SRA if the
>> address is taken. As you noticed, the "access" in this patch is not
>> organized as a tree; it is just 'flat' (a map and a vector). Overlaps
>> between accesses are not checked because they are all reads (for parms).
>>
>> When we handle more cases (passing as a call argument, occurring in a
>> memory assignment, occurring in inline asm...), this light-expander-sra
>> would become more and more like tree-sra and ipa-sra, and it would be
>> good to leverage more capabilities from them. So I agree that it would
>> be a great idea to share and reuse the same structure.
>>
>>> But I still need to
>>> properly read set_scalar_rtx_for_aggregate_access function in the patch,
>>> which I plan to do next week.
>>
>> set_scalar_rtx_for_aggregate_access is another key part of this patch.
>> Unlike tree-sra/ipa-sra (which create new scalar SSA names for each
>> access), this patch calls "set_scalar_rtx_for_aggregate_access" to
>> create an rtx expression for each access. This part may not be shared
>> with tree-sra and ipa-sra.
>>
>> This function is invoked for each parameter of aggregate type that is
>> passed in registers.
>> For each access to the parameter, the function creates an rtx
>> according to the offset/size/mode of the access. The created rtx may be:
>> 1. a pseudo corresponding to an incoming reg,
>> 2. a pseudo assigned from part of an incoming reg after a shift and
>>    mode adjustment,
>> 3. a PARALLEL containing a few pseudos corresponding to the incoming
>>    registers.
>> For returns, only 1 and 3 are supported.
>>
>> BR,
>> Jeff (Jiufu Guo)
>>
>>>
>>> Thanks,
>>>
>>> Martin
>>>
>>>>
>>>> The RTL expansion code already does a sweep over stmts in
>>>> discover_nonconstant_array_refs which makes sure RTL expansion doesn't
>>>> scalarize (aka assign non-stack) to variables which have accesses
>>>> that would later eventually FAIL to expand when operating on registers.
>>>> That's very much related to the task at hand so we should try to
>>>> at least merge the CFG walks of both (it produces a forced_stack_vars
>>>> bitmap).
>>>>
>>>> Can you work together with Martin to split out the access tree
>>>> data structure and share it?
>>>>
>>>> I didn't look in detail as of how you make use of the information
>>>> yet.
>>>>
>>>> Thanks,
>>>> Richard.
>>>>
>>>>>
>>>>> BR,
>>>>> Jeff (Jiufu)
>>>>>
>>>>>
>>>>> ---
>>>>>  gcc/cfgexpand.cc                             | 567 ++++++++++++++++++-
>>>>>  gcc/expr.cc                                  |  15 +-
>>>>>  gcc/function.cc                              |  26 +-
>>>>>  gcc/opts.cc                                  |   8 +-
>>>>>  gcc/testsuite/g++.target/powerpc/pr102024.C  |   2 +-
>>>>>  gcc/testsuite/gcc.target/powerpc/pr108073.c  |  29 +
>>>>>  gcc/testsuite/gcc.target/powerpc/pr65421-1.c |   6 +
>>>>>  gcc/testsuite/gcc.target/powerpc/pr65421-2.c |  32 ++
>>>>>  8 files changed, 675 insertions(+), 10 deletions(-)
>>>>>  create mode 100644 gcc/testsuite/gcc.target/powerpc/pr108073.c
>>>>>  create mode 100644 gcc/testsuite/gcc.target/powerpc/pr65421-1.c
>>>>>  create mode 100644 gcc/testsuite/gcc.target/powerpc/pr65421-2.c
>>>>>
>>>>> diff --git a/gcc/cfgexpand.cc b/gcc/cfgexpand.cc
>>>>> index 85a93a547c0..95c29b6b6fe 100644
>>>>> --- a/gcc/cfgexpand.cc
>>>>> +++ b/gcc/cfgexpand.cc
>>>>> @@ -97,6 +97,564 @@ static bool defer_stack_allocation (tree, bool);
>>>>>
>>>>>  static void record_alignment_for_reg_var (unsigned int);
>>>>>
>>>>> +/* For light SRA in
>>>>> expander about parameters and returns.  */
>>>>> +namespace {
>>>>> +
>>>>> +struct access
>>>>> +{
>>>>> +  /* Each access on the aggregate is about OFFSET/SIZE and BASE.  */
>>>>> +  HOST_WIDE_INT offset;
>>>>> +  HOST_WIDE_INT size;
>>>>> +  tree base;
>>>>> +  bool writing;
>>>>> +
>>>>> +  /* The context expression of this access.  */
>>>>> +  tree expr;
>>>>> +
>>>>> +  /* The rtx for the access: link to incoming/returning register(s).  */
>>>>> +  rtx rtx_val;
>>>>> +};
>>>>> +
>>>>> +typedef struct access *access_p;
>>>>> +
>>>>> +/* Expr (tree) -> Access (access_p) map.  */
>>>>> +static hash_map<tree, access_p> *expr_access_vec;
>>>>> +
>>>>> +/* Base (tree) -> Vector (vec<access_p> *) map.  */
>>>>> +static hash_map<tree, auto_vec<access_p> > *base_access_vec;
>>>>> +
>>>>> +/* Return a vector of pointers to accesses for the variable given in BASE or
>>>>> +   NULL if there is none.  */
>>>>> +
>>>>> +static vec<access_p> *
>>>>> +get_base_access_vector (tree base)
>>>>> +{
>>>>> +  return base_access_vec->get (base);
>>>>> +}
>>>>> +
>>>>> +/* Remove DECL from candidates for SRA.  */
>>>>> +static void
>>>>> +disqualify_candidate (tree decl)
>>>>> +{
>>>>> +  decl = get_base_address (decl);
>>>>> +  base_access_vec->remove (decl);
>>>>> +}
>>>>> +
>>>>> +/* Create and insert an access for EXPR.  Return the created access, or NULL
>>>>> +   if it is not possible.  */
>>>>> +static struct access *
>>>>> +create_access (tree expr, bool write)
>>>>> +{
>>>>> +  poly_int64 poffset, psize, pmax_size;
>>>>> +  bool reverse;
>>>>> +
>>>>> +  tree base
>>>>> +    = get_ref_base_and_extent (expr, &poffset, &psize, &pmax_size, &reverse);
>>>>> +
>>>>> +  if (!DECL_P (base))
>>>>> +    return NULL;
>>>>> +
>>>>> +  vec<access_p> *access_vec = get_base_access_vector (base);
>>>>> +  if (!access_vec)
>>>>> +    return NULL;
>>>>> +
>>>>> +  /* TODO: support reverse.
>>>>> */
>>>>> +  if (reverse)
>>>>> +    {
>>>>> +      disqualify_candidate (expr);
>>>>> +      return NULL;
>>>>> +    }
>>>>> +
>>>>> +  HOST_WIDE_INT offset, size, max_size;
>>>>> +  if (!poffset.is_constant (&offset) || !psize.is_constant (&size)
>>>>> +      || !pmax_size.is_constant (&max_size))
>>>>> +    return NULL;
>>>>> +
>>>>> +  if (size != max_size || size == 0 || offset < 0 || size < 0
>>>>> +      || offset + size > tree_to_shwi (DECL_SIZE (base)))
>>>>> +    return NULL;
>>>>> +
>>>>> +  struct access *access = XNEWVEC (struct access, 1);
>>>>> +
>>>>> +  memset (access, 0, sizeof (struct access));
>>>>> +  access->base = base;
>>>>> +  access->offset = offset;
>>>>> +  access->size = size;
>>>>> +  access->expr = expr;
>>>>> +  access->writing = write;
>>>>> +  access->rtx_val = NULL_RTX;
>>>>> +
>>>>> +  access_vec->safe_push (access);
>>>>> +
>>>>> +  return access;
>>>>> +}
>>>>> +
>>>>> +/* Return true if VAR is a candidate for SRA.  */
>>>>> +static bool
>>>>> +add_sra_candidate (tree var)
>>>>> +{
>>>>> +  tree type = TREE_TYPE (var);
>>>>> +
>>>>> +  if (!AGGREGATE_TYPE_P (type) || TREE_THIS_VOLATILE (var)
>>>>> +      || !COMPLETE_TYPE_P (type) || !tree_fits_shwi_p (TYPE_SIZE (type))
>>>>> +      || tree_to_shwi (TYPE_SIZE (type)) == 0
>>>>> +      || TYPE_MAIN_VARIANT (type) == TYPE_MAIN_VARIANT (va_list_type_node))
>>>>> +    return false;
>>>>> +
>>>>> +  base_access_vec->get_or_insert (var);
>>>>> +
>>>>> +  return true;
>>>>> +}
>>>>> +
>>>>> +/* Callback of walk_stmt_load_store_addr_ops visit_addr used to remove
>>>>> +   operands with address taken.  */
>>>>> +static tree
>>>>> +visit_addr (tree *tp, int *, void *)
>>>>> +{
>>>>> +  tree op = *tp;
>>>>> +  if (op && DECL_P (op))
>>>>> +    disqualify_candidate (op);
>>>>> +
>>>>> +  return NULL;
>>>>> +}
>>>>> +
>>>>> +/* Scan expression EXPR and create access structures for all accesses to
>>>>> +   candidates for scalarization.  Return the created access or NULL if none is
>>>>> +   created.
>>>>> */
>>>>> +static struct access *
>>>>> +build_access_from_expr (tree expr, bool write)
>>>>> +{
>>>>> +  if (TREE_CODE (expr) == VIEW_CONVERT_EXPR)
>>>>> +    expr = TREE_OPERAND (expr, 0);
>>>>> +
>>>>> +  if (TREE_CODE (expr) == BIT_FIELD_REF || storage_order_barrier_p (expr)
>>>>> +      || TREE_THIS_VOLATILE (expr))
>>>>> +    {
>>>>> +      disqualify_candidate (expr);
>>>>> +      return NULL;
>>>>> +    }
>>>>> +
>>>>> +  switch (TREE_CODE (expr))
>>>>> +    {
>>>>> +    case MEM_REF: {
>>>>> +        tree op = TREE_OPERAND (expr, 0);
>>>>> +        if (TREE_CODE (op) == ADDR_EXPR)
>>>>> +          disqualify_candidate (TREE_OPERAND (op, 0));
>>>>> +        break;
>>>>> +      }
>>>>> +    case ADDR_EXPR:
>>>>> +    case IMAGPART_EXPR:
>>>>> +    case REALPART_EXPR:
>>>>> +      disqualify_candidate (TREE_OPERAND (expr, 0));
>>>>> +      break;
>>>>> +    case VAR_DECL:
>>>>> +    case PARM_DECL:
>>>>> +    case RESULT_DECL:
>>>>> +    case COMPONENT_REF:
>>>>> +    case ARRAY_REF:
>>>>> +    case ARRAY_RANGE_REF:
>>>>> +      return create_access (expr, write);
>>>>> +      break;
>>>>> +    default:
>>>>> +      break;
>>>>> +    }
>>>>> +
>>>>> +  return NULL;
>>>>> +}
>>>>> +
>>>>> +/* Scan function and look for interesting expressions and create access
>>>>> +   structures for them.
>>>>> */
>>>>> +static void
>>>>> +scan_function (void)
>>>>> +{
>>>>> +  basic_block bb;
>>>>> +
>>>>> +  FOR_EACH_BB_FN (bb, cfun)
>>>>> +    {
>>>>> +      for (gphi_iterator gsi = gsi_start_phis (bb); !gsi_end_p (gsi);
>>>>> +           gsi_next (&gsi))
>>>>> +        {
>>>>> +          gphi *phi = gsi.phi ();
>>>>> +          for (size_t i = 0; i < gimple_phi_num_args (phi); i++)
>>>>> +            {
>>>>> +              tree t = gimple_phi_arg_def (phi, i);
>>>>> +              walk_tree (&t, visit_addr, NULL, NULL);
>>>>> +            }
>>>>> +        }
>>>>> +
>>>>> +      for (gimple_stmt_iterator gsi = gsi_start_nondebug_after_labels_bb (bb);
>>>>> +           !gsi_end_p (gsi); gsi_next_nondebug (&gsi))
>>>>> +        {
>>>>> +          gimple *stmt = gsi_stmt (gsi);
>>>>> +          switch (gimple_code (stmt))
>>>>> +            {
>>>>> +            case GIMPLE_RETURN: {
>>>>> +                tree r = gimple_return_retval (as_a<greturn *> (stmt));
>>>>> +                if (r && VAR_P (r) && r != DECL_RESULT (current_function_decl))
>>>>> +                  build_access_from_expr (r, true);
>>>>> +              }
>>>>> +              break;
>>>>> +            case GIMPLE_ASSIGN:
>>>>> +              if (gimple_assign_single_p (stmt) && !gimple_clobber_p (stmt))
>>>>> +                {
>>>>> +                  tree lhs = gimple_assign_lhs (stmt);
>>>>> +                  tree rhs = gimple_assign_rhs1 (stmt);
>>>>> +                  if (TREE_CODE (rhs) == CONSTRUCTOR)
>>>>> +                    disqualify_candidate (lhs);
>>>>> +                  else
>>>>> +                    {
>>>>> +                      build_access_from_expr (rhs, false);
>>>>> +                      build_access_from_expr (lhs, true);
>>>>> +                    }
>>>>> +                }
>>>>> +              break;
>>>>> +            default:
>>>>> +              walk_gimple_op (stmt, visit_addr, NULL);
>>>>> +              break;
>>>>> +            }
>>>>> +        }
>>>>> +    }
>>>>> +}
>>>>> +
>>>>> +/* Collect the parameters and returns whose types are suitable for
>>>>> +   scalarization.  */
>>>>> +static bool
>>>>> +collect_light_sra_candidates (void)
>>>>> +{
>>>>> +  bool ret = false;
>>>>> +
>>>>> +  /* Collect parameters.  */
>>>>> +  for (tree parm = DECL_ARGUMENTS (current_function_decl); parm;
>>>>> +       parm = DECL_CHAIN (parm))
>>>>> +    ret |= add_sra_candidate (parm);
>>>>> +
>>>>> +  /* Collect VARs on returns.
>>>>> */
>>>>> +  if (DECL_RESULT (current_function_decl))
>>>>> +    {
>>>>> +      edge_iterator ei;
>>>>> +      edge e;
>>>>> +      FOR_EACH_EDGE (e, ei, EXIT_BLOCK_PTR_FOR_FN (cfun)->preds)
>>>>> +        if (greturn *r = safe_dyn_cast<greturn *> (*gsi_last_bb (e->src)))
>>>>> +          {
>>>>> +            tree val = gimple_return_retval (r);
>>>>> +            if (val && VAR_P (val))
>>>>> +              ret |= add_sra_candidate (val);
>>>>> +          }
>>>>> +    }
>>>>> +
>>>>> +  return ret;
>>>>> +}
>>>>> +
>>>>> +/* For now, only scalarize parms that are only read,
>>>>> +   or returns that are only written.  */
>>>>> +bool
>>>>> +check_access_vec (tree const &base, auto_vec<access_p> const &access_vec,
>>>>> +                  auto_vec<tree> *unqualify_vec)
>>>>> +{
>>>>> +  bool read = false;
>>>>> +  bool write = false;
>>>>> +  for (unsigned int j = 0; j < access_vec.length (); j++)
>>>>> +    {
>>>>> +      struct access *access = access_vec[j];
>>>>> +      if (access->writing)
>>>>> +        write = true;
>>>>> +      else
>>>>> +        read = true;
>>>>> +
>>>>> +      if (write && read)
>>>>> +        break;
>>>>> +    }
>>>>> +  if ((write && read) || (!write && !read))
>>>>> +    unqualify_vec->safe_push (base);
>>>>> +
>>>>> +  return true;
>>>>> +}
>>>>> +
>>>>> +/* Analyze all the accesses and remove unprofitable candidates.
>>>>> +   And build the expr->access map.
>>>>> */
>>>>> +static void
>>>>> +analyze_accesses ()
>>>>> +{
>>>>> +  auto_vec<tree> unqualify_vec;
>>>>> +  base_access_vec->traverse<auto_vec<tree> *, check_access_vec> (
>>>>> +    &unqualify_vec);
>>>>> +
>>>>> +  tree base;
>>>>> +  unsigned i;
>>>>> +  FOR_EACH_VEC_ELT (unqualify_vec, i, base)
>>>>> +    disqualify_candidate (base);
>>>>> +}
>>>>> +
>>>>> +static void
>>>>> +prepare_expander_sra ()
>>>>> +{
>>>>> +  if (optimize <= 0)
>>>>> +    return;
>>>>> +
>>>>> +  base_access_vec = new hash_map<tree, auto_vec<access_p> >;
>>>>> +  expr_access_vec = new hash_map<tree, access_p>;
>>>>> +
>>>>> +  if (collect_light_sra_candidates ())
>>>>> +    {
>>>>> +      scan_function ();
>>>>> +      analyze_accesses ();
>>>>> +    }
>>>>> +}
>>>>> +
>>>>> +static void
>>>>> +free_expander_sra ()
>>>>> +{
>>>>> +  if (optimize <= 0 || !expr_access_vec)
>>>>> +    return;
>>>>> +  delete expr_access_vec;
>>>>> +  expr_access_vec = 0;
>>>>> +  delete base_access_vec;
>>>>> +  base_access_vec = 0;
>>>>> +}
>>>>> +} /* namespace */
>>>>> +
>>>>> +/* Check if there is an SRA access for the expr.
>>>>> +   Return the corresponding scalar sym for the access.  */
>>>>> +rtx
>>>>> +get_scalar_rtx_for_aggregate_expr (tree expr)
>>>>> +{
>>>>> +  if (!expr_access_vec)
>>>>> +    return NULL_RTX;
>>>>> +  access_p *access = expr_access_vec->get (expr);
>>>>> +  return access ? (*access)->rtx_val : NULL_RTX;
>>>>> +}
>>>>> +
>>>>> +extern rtx
>>>>> +expand_shift (enum tree_code, machine_mode, rtx, poly_int64, rtx, int);
>>>>> +
>>>>> +/* Compute/Set RTX registers for those accesses on BASE.  */
>>>>> +void
>>>>> +set_scalar_rtx_for_aggregate_access (tree base, rtx regs)
>>>>> +{
>>>>> +  if (!base_access_vec)
>>>>> +    return;
>>>>> +  vec<access_p> *access_vec = get_base_access_vector (base);
>>>>> +  if (!access_vec)
>>>>> +    return;
>>>>> +
>>>>> +  /* Go through each access, compute corresponding rtx(regs or subregs)
>>>>> +     for the expression.
>>>>> */
>>>>> +  int n = access_vec->length ();
>>>>> +  int cur_access_index = 0;
>>>>> +  for (; cur_access_index < n; cur_access_index++)
>>>>> +    {
>>>>> +      access_p acc = (*access_vec)[cur_access_index];
>>>>> +      machine_mode expr_mode = TYPE_MODE (TREE_TYPE (acc->expr));
>>>>> +      /* Non-BLKmode access spanning multiple registers.  */
>>>>> +      if (expr_mode != BLKmode
>>>>> +          && known_gt (acc->size, GET_MODE_BITSIZE (word_mode)))
>>>>> +        break;
>>>>> +
>>>>> +      int start_index = -1;
>>>>> +      int end_index = -1;
>>>>> +      HOST_WIDE_INT left_margin_bits = 0;
>>>>> +      HOST_WIDE_INT right_margin_bits = 0;
>>>>> +      int cur_index = XEXP (XVECEXP (regs, 0, 0), 0) ? 0 : 1;
>>>>> +      for (; cur_index < XVECLEN (regs, 0); cur_index++)
>>>>> +        {
>>>>> +          rtx slot = XVECEXP (regs, 0, cur_index);
>>>>> +          HOST_WIDE_INT off = UINTVAL (XEXP (slot, 1)) * BITS_PER_UNIT;
>>>>> +          HOST_WIDE_INT size
>>>>> +            = GET_MODE_BITSIZE (GET_MODE (XEXP (slot, 0))).to_constant ();
>>>>> +          if (off <= acc->offset && off + size > acc->offset)
>>>>> +            {
>>>>> +              start_index = cur_index;
>>>>> +              left_margin_bits = acc->offset - off;
>>>>> +            }
>>>>> +          if (off + size >= acc->offset + acc->size)
>>>>> +            {
>>>>> +              end_index = cur_index;
>>>>> +              right_margin_bits = off + size - (acc->offset + acc->size);
>>>>> +              break;
>>>>> +            }
>>>>> +        }
>>>>> +      /* Accessing padding or out of bounds.  */
>>>>> +      if (start_index < 0 || end_index < 0)
>>>>> +        break;
>>>>> +
>>>>> +      /* Need a parallel for possible multi-registers.  */
>>>>> +      if (expr_mode == BLKmode || end_index > start_index)
>>>>> +        {
>>>>> +          /* Cannot support starting from the middle of a register.  */
>>>>> +          if (left_margin_bits != 0)
>>>>> +            break;
>>>>> +
>>>>> +          int len = end_index - start_index + 1;
>>>>> +          const int margin = 3; /* more space for SI, HI, QI.  */
>>>>> +          rtx *tmps = XALLOCAVEC (rtx, len + (right_margin_bits ?
>>>>> margin : 0));
>>>>> +
>>>>> +          HOST_WIDE_INT start_off
>>>>> +            = UINTVAL (XEXP (XVECEXP (regs, 0, start_index), 1));
>>>>> +          int pos = 0;
>>>>> +          for (; pos < len - (right_margin_bits ? 1 : 0); pos++)
>>>>> +            {
>>>>> +              int index = start_index + pos;
>>>>> +              rtx orig_reg = XEXP (XVECEXP (regs, 0, index), 0);
>>>>> +              machine_mode mode = GET_MODE (orig_reg);
>>>>> +              rtx reg = NULL_RTX;
>>>>> +              if (HARD_REGISTER_P (orig_reg))
>>>>> +                {
>>>>> +                  /* A param hard reg that is read needs to be moved to a temp.  */
>>>>> +                  gcc_assert (!acc->writing);
>>>>> +                  reg = gen_reg_rtx (mode);
>>>>> +                  emit_move_insn (reg, orig_reg);
>>>>> +                }
>>>>> +              else
>>>>> +                reg = orig_reg;
>>>>> +
>>>>> +              HOST_WIDE_INT off = UINTVAL (XEXP (XVECEXP (regs, 0, index), 1));
>>>>> +              tmps[pos]
>>>>> +                = gen_rtx_EXPR_LIST (mode, reg, GEN_INT (off - start_off));
>>>>> +            }
>>>>> +
>>>>> +          /* Some fields occupy only part of a register.  */
>>>>> +          if (right_margin_bits != 0)
>>>>> +            {
>>>>> +              if (acc->writing)
>>>>> +                break;
>>>>> +
>>>>> +              gcc_assert ((right_margin_bits % BITS_PER_UNIT) == 0);
>>>>> +              HOST_WIDE_INT off_byte
>>>>> +                = UINTVAL (XEXP (XVECEXP (regs, 0, end_index), 1)) - start_off;
>>>>> +              rtx orig_reg = XEXP (XVECEXP (regs, 0, end_index), 0);
>>>>> +              machine_mode orig_mode = GET_MODE (orig_reg);
>>>>> +              gcc_assert (GET_MODE_CLASS (orig_mode) == MODE_INT);
>>>>> +
>>>>> +              machine_mode mode_aux[] = {SImode, HImode, QImode};
>>>>> +              HOST_WIDE_INT reg_size
>>>>> +                = GET_MODE_BITSIZE (orig_mode).to_constant ();
>>>>> +              HOST_WIDE_INT off_bits = 0;
>>>>> +              for (unsigned long j = 0;
>>>>> +                   j < sizeof (mode_aux) / sizeof (mode_aux[0]); j++)
>>>>> +                {
>>>>> +                  HOST_WIDE_INT submode_bitsize
>>>>> +                    = GET_MODE_BITSIZE (mode_aux[j]).to_constant ();
>>>>> +                  if (reg_size - right_margin_bits - off_bits
>>>>> +                      >= submode_bitsize)
>>>>> +                    {
>>>>> +                      rtx reg = gen_reg_rtx (orig_mode);
>>>>> +                      emit_move_insn (reg, orig_reg);
>>>>> +
>>>>> +                      poly_uint64 lowpart_off
>>>>> +                        = subreg_lowpart_offset (mode_aux[j],
>>>>> orig_mode);
>>>>> +                      int lowpart_off_bits
>>>>> +                        = lowpart_off.to_constant () * BITS_PER_UNIT;
>>>>> +                      int shift_bits = lowpart_off_bits >= off_bits
>>>>> +                                         ? (lowpart_off_bits - off_bits)
>>>>> +                                         : (off_bits - lowpart_off_bits);
>>>>> +                      if (shift_bits > 0)
>>>>> +                        reg = expand_shift (RSHIFT_EXPR, orig_mode, reg,
>>>>> +                                            shift_bits, NULL, 1);
>>>>> +                      rtx subreg = gen_lowpart (mode_aux[j], reg);
>>>>> +                      rtx off = GEN_INT (off_byte);
>>>>> +                      tmps[pos++]
>>>>> +                        = gen_rtx_EXPR_LIST (mode_aux[j], subreg, off);
>>>>> +                      off_byte += submode_bitsize / BITS_PER_UNIT;
>>>>> +                      off_bits += submode_bitsize;
>>>>> +                    }
>>>>> +                }
>>>>> +            }
>>>>> +
>>>>> +          /* Currently, PARALLELs with register elements for param/returns
>>>>> +             are using BLKmode.  */
>>>>> +          acc->rtx_val = gen_rtx_PARALLEL (TYPE_MODE (TREE_TYPE (acc->expr)),
>>>>> +                                           gen_rtvec_v (pos, tmps));
>>>>> +          continue;
>>>>> +        }
>>>>> +
>>>>> +      /* The access corresponds to one reg.  */
>>>>> +      if (end_index == start_index && left_margin_bits == 0
>>>>> +          && right_margin_bits == 0)
>>>>> +        {
>>>>> +          rtx orig_reg = XEXP (XVECEXP (regs, 0, start_index), 0);
>>>>> +          rtx reg = NULL_RTX;
>>>>> +          if (HARD_REGISTER_P (orig_reg))
>>>>> +            {
>>>>> +              /* A param hard reg that is read needs to be moved to a temp.  */
>>>>> +              gcc_assert (!acc->writing);
>>>>> +              reg = gen_reg_rtx (GET_MODE (orig_reg));
>>>>> +              emit_move_insn (reg, orig_reg);
>>>>> +            }
>>>>> +          else
>>>>> +            reg = orig_reg;
>>>>> +          if (GET_MODE (orig_reg) != expr_mode)
>>>>> +            reg = gen_lowpart (expr_mode, reg);
>>>>> +
>>>>> +          acc->rtx_val = reg;
>>>>> +          continue;
>>>>> +        }
>>>>> +
>>>>> +      /* It is accessing a field which is part of a register.  */
>>>>> +      scalar_int_mode imode;
>>>>> +      if (!acc->writing && end_index == start_index
>>>>> +          && int_mode_for_size (acc->size, 1).exists (&imode))
>>>>> +        {
>>>>> +          /* Get and copy the original register inside the param.
>>>>> */
>>>>> +          rtx orig_reg = XEXP (XVECEXP (regs, 0, start_index), 0);
>>>>> +          machine_mode mode = GET_MODE (orig_reg);
>>>>> +          gcc_assert (GET_MODE_CLASS (mode) == MODE_INT);
>>>>> +          rtx reg = gen_reg_rtx (mode);
>>>>> +          emit_move_insn (reg, orig_reg);
>>>>> +
>>>>> +          /* Shift to the expected part.  */
>>>>> +          poly_uint64 lowpart_off = subreg_lowpart_offset (imode, mode);
>>>>> +          int lowpart_off_bits = lowpart_off.to_constant () * BITS_PER_UNIT;
>>>>> +          int shift_bits = lowpart_off_bits >= left_margin_bits
>>>>> +                             ? (lowpart_off_bits - left_margin_bits)
>>>>> +                             : (left_margin_bits - lowpart_off_bits);
>>>>> +          if (shift_bits > 0)
>>>>> +            reg = expand_shift (RSHIFT_EXPR, mode, reg, shift_bits, NULL, 1);
>>>>> +
>>>>> +          /* Move the corresponding part (subreg) to the result.  */
>>>>> +          rtx subreg = gen_lowpart (imode, reg);
>>>>> +          rtx result = gen_reg_rtx (imode);
>>>>> +          emit_move_insn (result, subreg);
>>>>> +
>>>>> +          if (expr_mode != imode)
>>>>> +            result = gen_lowpart (expr_mode, result);
>>>>> +
>>>>> +          acc->rtx_val = result;
>>>>> +          continue;
>>>>> +        }
>>>>> +
>>>>> +      break;
>>>>> +    }
>>>>> +
>>>>> +  /* Some access expr(s) are not scalarized.  */
>>>>> +  if (cur_access_index != n)
>>>>> +    disqualify_candidate (base);
>>>>> +  else
>>>>> +    {
>>>>> +      /* Add elements to expr->access map.
>>>>> */
>>>>> +      for (int j = 0; j < n; j++)
>>>>> +        {
>>>>> +          access_p access = (*access_vec)[j];
>>>>> +          expr_access_vec->put (access->expr, access);
>>>>> +        }
>>>>> +    }
>>>>> +}
>>>>> +
>>>>> +void
>>>>> +set_scalar_rtx_for_returns ()
>>>>> +{
>>>>> +  tree res = DECL_RESULT (current_function_decl);
>>>>> +  gcc_assert (res);
>>>>> +  edge_iterator ei;
>>>>> +  edge e;
>>>>> +  FOR_EACH_EDGE (e, ei, EXIT_BLOCK_PTR_FOR_FN (cfun)->preds)
>>>>> +    if (greturn *r = safe_dyn_cast<greturn *> (*gsi_last_bb (e->src)))
>>>>> +      {
>>>>> +        tree val = gimple_return_retval (r);
>>>>> +        if (val && VAR_P (val))
>>>>> +          set_scalar_rtx_for_aggregate_access (val, DECL_RTL (res));
>>>>> +      }
>>>>> +}
>>>>> +
>>>>>  /* Return an expression tree corresponding to the RHS of GIMPLE
>>>>>     statement STMT.  */
>>>>>
>>>>> @@ -3778,7 +4336,8 @@ expand_return (tree retval)
>>>>>
>>>>>    /* If we are returning the RESULT_DECL, then the value has already
>>>>>       been stored into it, so we don't have to do anything special.  */
>>>>> -  if (TREE_CODE (retval_rhs) == RESULT_DECL)
>>>>> +  if (TREE_CODE (retval_rhs) == RESULT_DECL
>>>>> +      || get_scalar_rtx_for_aggregate_expr (retval_rhs))
>>>>>      expand_value_return (result_rtl);
>>>>>
>>>>>    /* If the result is an aggregate that is being returned in one (or more)
>>>>> @@ -4422,6 +4981,9 @@ expand_debug_expr (tree exp)
>>>>>    int unsignedp = TYPE_UNSIGNED (TREE_TYPE (exp));
>>>>>    addr_space_t as;
>>>>>    scalar_int_mode op0_mode, op1_mode, addr_mode;
>>>>> +  rtx x = get_scalar_rtx_for_aggregate_expr (exp);
>>>>> +  if (x)
>>>>> +    return NULL_RTX;/* optimized out.  */
>>>>>
>>>>>    switch (TREE_CODE_CLASS (TREE_CODE (exp)))
>>>>>      {
>>>>> @@ -6630,6 +7192,8 @@ pass_expand::execute (function *fun)
>>>>>              avoid_deep_ter_for_debug (gsi_stmt (gsi), 0);
>>>>>        }
>>>>>
>>>>> +  prepare_expander_sra ();
>>>>> +
>>>>>    /* Mark arrays indexed with non-constant indices with
>>>>>       TREE_ADDRESSABLE.
>>>>> */
>>>>>    auto_bitmap forced_stack_vars;
>>>>>    discover_nonconstant_array_refs (forced_stack_vars);
>>>>> @@ -7062,6 +7626,7 @@ pass_expand::execute (function *fun)
>>>>>        loop_optimizer_finalize ();
>>>>>      }
>>>>>
>>>>> +  free_expander_sra ();
>>>>>    timevar_pop (TV_POST_EXPAND);
>>>>>
>>>>>    return 0;
>>>>> diff --git a/gcc/expr.cc b/gcc/expr.cc
>>>>> index 56b51876f80..b970f98e689 100644
>>>>> --- a/gcc/expr.cc
>>>>> +++ b/gcc/expr.cc
>>>>> @@ -100,6 +100,7 @@ static void do_tablejump (rtx, machine_mode, rtx, rtx, rtx,
>>>>>  static rtx const_vector_from_tree (tree);
>>>>>  static tree tree_expr_size (const_tree);
>>>>>  static void convert_mode_scalar (rtx, rtx, int);
>>>>> +rtx get_scalar_rtx_for_aggregate_expr (tree);
>>>>>
>>>>>
>>>>>  /* This is run to set up which modes can be used
>>>>> @@ -5623,11 +5624,12 @@ expand_assignment (tree to, tree from, bool nontemporal)
>>>>>       Assignment of an array element at a constant index, and assignment of
>>>>>       an array element in an unaligned packed structure field, has the same
>>>>>       problem.  Same for (partially) storing into a non-memory object.  */
>>>>> -  if (handled_component_p (to)
>>>>> -      || (TREE_CODE (to) == MEM_REF
>>>>> -          && (REF_REVERSE_STORAGE_ORDER (to)
>>>>> -              || mem_ref_refers_to_non_mem_p (to)))
>>>>> -      || TREE_CODE (TREE_TYPE (to)) == ARRAY_TYPE)
>>>>> +  if (!get_scalar_rtx_for_aggregate_expr (to)
>>>>> +      && (handled_component_p (to)
>>>>> +          || (TREE_CODE (to) == MEM_REF
>>>>> +              && (REF_REVERSE_STORAGE_ORDER (to)
>>>>> +                  || mem_ref_refers_to_non_mem_p (to)))
>>>>> +          || TREE_CODE (TREE_TYPE (to)) == ARRAY_TYPE))
>>>>>      {
>>>>>        machine_mode mode1;
>>>>>        poly_int64 bitsize, bitpos;
>>>>> @@ -8995,6 +8997,9 @@ expand_expr_real (tree exp, rtx target, machine_mode tmode,
>>>>>        ret = CONST0_RTX (tmode);
>>>>>        return ret ?
>>>>> ret : const0_rtx;
>>>>>      }
>>>>> +  rtx x = get_scalar_rtx_for_aggregate_expr (exp);
>>>>> +  if (x)
>>>>> +    return x;
>>>>>
>>>>>    ret = expand_expr_real_1 (exp, target, tmode, modifier, alt_rtl,
>>>>>                              inner_reference_p);
>>>>> diff --git a/gcc/function.cc b/gcc/function.cc
>>>>> index 82102ed78d7..262d3f17e72 100644
>>>>> --- a/gcc/function.cc
>>>>> +++ b/gcc/function.cc
>>>>> @@ -2742,6 +2742,9 @@ assign_parm_find_stack_rtl (tree parm, struct assign_parm_data_one *data)
>>>>>    data->stack_parm = stack_parm;
>>>>>  }
>>>>>
>>>>> +extern void
>>>>> +set_scalar_rtx_for_aggregate_access (tree, rtx);
>>>>> +
>>>>>  /* A subroutine of assign_parms.  Adjust DATA->ENTRY_RTL such that it's
>>>>>     always valid and contiguous.  */
>>>>>
>>>>> @@ -3117,8 +3120,21 @@ assign_parm_setup_block (struct assign_parm_data_all *all,
>>>>>            emit_move_insn (mem, entry_parm);
>>>>>          }
>>>>>        else
>>>>> -        move_block_from_reg (REGNO (entry_parm), mem,
>>>>> -                             size_stored / UNITS_PER_WORD);
>>>>> +        {
>>>>> +          int regno = REGNO (entry_parm);
>>>>> +          int nregs = size_stored / UNITS_PER_WORD;
>>>>> +          move_block_from_reg (regno, mem, nregs);
>>>>> +
>>>>> +          rtx *tmps = XALLOCAVEC (rtx, nregs);
>>>>> +          machine_mode mode = word_mode;
>>>>> +          for (int i = 0; i < nregs; i++)
>>>>> +            tmps[i] = gen_rtx_EXPR_LIST (
>>>>> +              VOIDmode, gen_rtx_REG (mode, regno + i),
>>>>> +              GEN_INT (GET_MODE_SIZE (mode).to_constant () * i));
>>>>> +
>>>>> +          rtx regs = gen_rtx_PARALLEL (BLKmode, gen_rtvec_v (nregs, tmps));
>>>>> +          set_scalar_rtx_for_aggregate_access (parm, regs);
>>>>> +        }
>>>>>      }
>>>>>    else if (data->stack_parm == 0 && !TYPE_EMPTY_P (data->arg.type))
>>>>>      {
>>>>> @@ -3718,6 +3734,10 @@ assign_parms (tree fndecl)
>>>>>        else
>>>>>          set_decl_incoming_rtl (parm, data.entry_parm, false);
>>>>>
>>>>> +      rtx incoming = DECL_INCOMING_RTL (parm);
>>>>> +      if (GET_CODE (incoming) == PARALLEL)
>>>>> +        set_scalar_rtx_for_aggregate_access (parm, incoming);
>>>>> +
>>>>>        assign_parm_adjust_stack_rtl (&data);
>>>>>
if (assign_parm_setup_block_p (&data)) >>>>> @@ -5037,6 +5057,7 @@ stack_protect_epilogue (void) >>>>> the function's parameters, which must be run at any return statement. >>>>> */ >>>>> >>>>> bool currently_expanding_function_start; >>>>> +extern void set_scalar_rtx_for_returns (); >>>>> void >>>>> expand_function_start (tree subr) >>>>> { >>>>> @@ -5138,6 +5159,7 @@ expand_function_start (tree subr) >>>>> { >>>>> gcc_assert (GET_CODE (hard_reg) == PARALLEL); >>>>> set_parm_rtl (res, gen_group_rtx (hard_reg)); >>>>> + set_scalar_rtx_for_returns (); >>>>> } >>>>> } >>>>> >>>>> diff --git a/gcc/opts.cc b/gcc/opts.cc >>>>> index 86b94d62b58..5e129a1cc49 100644 >>>>> --- a/gcc/opts.cc >>>>> +++ b/gcc/opts.cc >>>>> @@ -1559,6 +1559,10 @@ public: >>>>> vec<const char *> m_values; >>>>> }; >>>>> >>>>> +#ifdef __GNUC__ >>>>> +#pragma GCC diagnostic push >>>>> +#pragma GCC diagnostic ignored "-Wformat-truncation" >>>>> +#endif >>>>> /* Print help for a specific front-end, etc. */ >>>>> static void >>>>> print_filtered_help (unsigned int include_flags, >>>>> @@ -1913,7 +1917,9 @@ print_filtered_help (unsigned int include_flags, >>>>> printf ("\n\n"); >>>>> } >>>>> } >>>>> - >>>>> +#ifdef __GNUC__ >>>>> +#pragma GCC diagnostic pop >>>>> +#endif >>>>> /* Display help for a specified type of option. >>>>> The options must have ALL of the INCLUDE_FLAGS set >>>>> ANY of the flags in the ANY_FLAGS set >>>>> diff --git a/gcc/testsuite/g++.target/powerpc/pr102024.C >>>>> b/gcc/testsuite/g++.target/powerpc/pr102024.C >>>>> index 769585052b5..c8995cae707 100644 >>>>> --- a/gcc/testsuite/g++.target/powerpc/pr102024.C >>>>> +++ b/gcc/testsuite/g++.target/powerpc/pr102024.C >>>>> @@ -5,7 +5,7 @@ >>>>> // Test that a zero-width bit field in an otherwise homogeneous aggregate >>>>> // generates a psabi warning and passes arguments in GPRs. 
>>>>> >>>>> -// { dg-final { scan-assembler-times {\mstd\M} 4 } } >>>>> +// { dg-final { scan-assembler-times {\mmtvsrd\M} 4 } } >>>>> >>>>> struct a_thing >>>>> { >>>>> diff --git a/gcc/testsuite/gcc.target/powerpc/pr108073.c >>>>> b/gcc/testsuite/gcc.target/powerpc/pr108073.c >>>>> new file mode 100644 >>>>> index 00000000000..7dd1a4a326a >>>>> --- /dev/null >>>>> +++ b/gcc/testsuite/gcc.target/powerpc/pr108073.c >>>>> @@ -0,0 +1,29 @@ >>>>> +/* { dg-do run } */ >>>>> +/* { dg-options "-O2 -save-temps" } */ >>>>> + >>>>> +typedef struct DF {double a[4]; short s1; short s2; short s3; short s4; >>>>> } DF; >>>>> +typedef struct SF {float a[4]; int i1; int i2; } SF; >>>>> + >>>>> +/* { dg-final { scan-assembler-times {\mmtvsrd\M} 3 {target { >>>>> has_arch_ppc64 && has_arch_pwr8 } } } } */ >>>>> +/* { dg-final { scan-assembler-not {\mlwz\M} {target { has_arch_ppc64 && >>>>> has_arch_pwr8 } } } } */ >>>>> +/* { dg-final { scan-assembler-not {\mlhz\M} {target { has_arch_ppc64 && >>>>> has_arch_pwr8 } } } } */ >>>>> +short __attribute__ ((noipa)) foo_hi (DF a, int flag){if (flag == >>>>> 2)return a.s2+a.s3;return 0;} >>>>> +int __attribute__ ((noipa)) foo_si (SF a, int flag){if (flag == >>>>> 2)return a.i2+a.i1;return 0;} >>>>> +double __attribute__ ((noipa)) foo_df (DF arg, int flag){if (flag == >>>>> 2)return arg.a[3];else return 0.0;} >>>>> +float __attribute__ ((noipa)) foo_sf (SF arg, int flag){if (flag == >>>>> 2)return arg.a[2]; return 0;} >>>>> +float __attribute__ ((noipa)) foo_sf1 (SF arg, int flag){if (flag == >>>>> 2)return arg.a[1];return 0;} >>>>> + >>>>> +DF gdf = {{1.0,2.0,3.0,4.0}, 1, 2, 3, 4}; >>>>> +SF gsf = {{1.0f,2.0f,3.0f,4.0f}, 1, 2}; >>>>> + >>>>> +int main() >>>>> +{ >>>>> + if (!(foo_hi (gdf, 2) == 5 && foo_si (gsf, 2) == 3 && foo_df (gdf, 2) >>>>> == 4.0 >>>>> + && foo_sf (gsf, 2) == 3.0 && foo_sf1 (gsf, 2) == 2.0)) >>>>> + __builtin_abort (); >>>>> + if (!(foo_hi (gdf, 1) == 0 && foo_si (gsf, 1) == 0 && foo_df (gdf, 1) >>>>> == 0 >>>>> + && 
foo_sf (gsf, 1) == 0 && foo_sf1 (gsf, 1) == 0)) >>>>> + __builtin_abort (); >>>>> + return 0; >>>>> +} >>>>> + >>>>> diff --git a/gcc/testsuite/gcc.target/powerpc/pr65421-1.c >>>>> b/gcc/testsuite/gcc.target/powerpc/pr65421-1.c >>>>> new file mode 100644 >>>>> index 00000000000..4e1f87f7939 >>>>> --- /dev/null >>>>> +++ b/gcc/testsuite/gcc.target/powerpc/pr65421-1.c >>>>> @@ -0,0 +1,6 @@ >>>>> +/* PR target/65421 */ >>>>> +/* { dg-options "-O2" } */ >>>>> + >>>>> +typedef struct LARGE {double a[4]; int arr[32];} LARGE; >>>>> +LARGE foo (LARGE a){return a;} >>>>> +/* { dg-final { scan-assembler-times {\mmemcpy\M} 1 } } */ >>>>> diff --git a/gcc/testsuite/gcc.target/powerpc/pr65421-2.c >>>>> b/gcc/testsuite/gcc.target/powerpc/pr65421-2.c >>>>> new file mode 100644 >>>>> index 00000000000..8a8e1a0e996 >>>>> --- /dev/null >>>>> +++ b/gcc/testsuite/gcc.target/powerpc/pr65421-2.c >>>>> @@ -0,0 +1,32 @@ >>>>> +/* PR target/65421 */ >>>>> +/* { dg-options "-O2" } */ >>>>> +/* { dg-require-effective-target powerpc_elfv2 } */ >>>>> +/* { dg-require-effective-target has_arch_ppc64 } */ >>>>> + >>>>> +typedef struct FLOATS >>>>> +{ >>>>> + double a[3]; >>>>> +} FLOATS; >>>>> + >>>>> +/* 3 lfd after returns also optimized */ >>>>> +/* FLOATS ret_arg_pt (FLOATS *a){return *a;} */ >>>>> + >>>>> +/* 3 stfd */ >>>>> +void st_arg (FLOATS a, FLOATS *p) {*p = a;} >>>>> +/* { dg-final { scan-assembler-times {\mstfd\M} 3 } } */ >>>>> + >>>>> +/* blr */ >>>>> +FLOATS ret_arg (FLOATS a) {return a;} >>>>> + >>>>> +typedef struct MIX >>>>> +{ >>>>> + double a[2]; >>>>> + long l; >>>>> +} MIX; >>>>> + >>>>> +/* std 3 param regs to return slot */ >>>>> +MIX ret_arg1 (MIX a) {return a;} >>>>> +/* { dg-final { scan-assembler-times {\mstd\M} 3 } } */ >>>>> + >>>>> +/* count insns */ >>>>> +/* { dg-final { scan-assembler-times {(?n)^\s+[a-z]} 9 } } */ >>>>> >>>> >>>> -- >>>> Richard Biener <rguent...@suse.de> >>>> SUSE Software Solutions Germany GmbH, Frankenstrasse 146, 90461 Nuernberg, >>>> 
Germany; GF: Ivo Totev, Andrew Myers, Andrew McDonald, Boudien Moerman; >>>> HRB 36809 (AG Nuernberg)