Hi Richard,
Thanks a lot for your comments! Richard Biener <rguent...@suse.de> writes: > On Wed, 23 Nov 2022, Jiufu Guo wrote: > >> Hi Jeff, >> >> Thanks a lot for your comments! > > Sorry for the late response ... > >> Jeff Law <jeffreya...@gmail.com> writes: >> >> > On 11/20/22 20:07, Jiufu Guo wrote: >> >> Jiufu Guo <guoji...@linux.ibm.com> writes: >> >> >> >>> Hi, >> >>> >> >>> As mentioned in the previous version patch: >> >>> https://gcc.gnu.org/pipermail/gcc-patches/2022-October/604646.html >> >>> The suboptimal code is generated for "assigning from parameter" or >> >>> "assigning to return value". >> >>> This patch enhances the assignment from parameters like the below >> >>> cases: >> >>> /////case1.c >> >>> typedef struct SA {double a[3];long l; } A; >> >>> A ret_arg (A a) {return a;} >> >>> void st_arg (A a, A *p) {*p = a;} >> >>> >> >>> ////case2.c >> >>> typedef struct SA {double a[3];} A; >> >>> A ret_arg (A a) {return a;} >> >>> void st_arg (A a, A *p) {*p = a;} >> >>> >> >>> For this patch, bootstrap and regtest pass on ppc64{,le} >> >>> and x86_64. >> >>> * Besides asking for help reviewing this patch, I would like to >> >>> consult comments about enhancing for "assigning to returns". >> >> I updated the patch to fix the issue for returns. This patch >> >> adds a flag DECL_USEDBY_RETURN_P to indicate if a var is used >> >> by a return stmt. This patch fix the issue in expand pass only, >> >> so, we would try to update the patch to avoid this flag. >> >> >> >> diff --git a/gcc/cfgexpand.cc b/gcc/cfgexpand.cc >> >> index dd29ffffc03..09b8ec64cea 100644 >> >> --- a/gcc/cfgexpand.cc >> >> +++ b/gcc/cfgexpand.cc >> >> @@ -2158,6 +2158,20 @@ expand_used_vars (bitmap forced_stack_vars) >> >> frame_phase = off ? align - off : 0; >> >> } >> >> + /* Collect VARs on returns. */ >> >> + if (DECL_RESULT (current_function_decl)) >> >> + { >> >> + edge_iterator ei; >> >> + edge e; >> >> + FOR_EACH_EDGE (e, ei, EXIT_BLOCK_PTR_FOR_FN (cfun)->preds) >> >> + if (greturn *ret = safe_dyn_cast<greturn *> (last_stmt (e->src))) >> >> + { >> >> + tree val = gimple_return_retval (ret); >> >> + if (val && VAR_P (val)) >> >> + DECL_USEDBY_RETURN_P (val) = 1; > > you probably want to check && auto_var_in_fn (val, ...) since val > might be global? Since we are collecting the return vals, it would be built in function gimplify_return_expr. In this function, create_tmp_reg is used and a local temp. So it would not be a global var here. code piece in gimplify_return_expr: if (!result_decl) result = NULL_TREE; else if (aggregate_value_p (result_decl, TREE_TYPE (current_function_decl))) { .... result = result_decl; } else if (gimplify_ctxp->return_temp) result = gimplify_ctxp->return_temp; else { result = create_tmp_reg (TREE_TYPE (result_decl)); Here, for "typedef struct SA {double a[3];}", aggregate_value_p returns false for target like ppc64le, because result of "hard_function_value" is a "rtx with PARALLELL code". And then a DECL_VAR is built via "create_tmp_reg" (actually it is not a reg here. it built a DECL_VAR with RECORD type and BLK mode). I also tried the way to use RESULT_DECL for this kind of type, or let aggregate_value_p accept this kind of type. But it seems not easy, because "<retval> (RESULT_DECL with PARALLEL)" is not ok for address operations. > >> >> + } >> >> + } >> >> + >> >> /* Set TREE_USED on all variables in the local_decls. */ >> >> FOR_EACH_LOCAL_DECL (cfun, i, var) >> >> TREE_USED (var) = 1; >> >> diff --git a/gcc/expr.cc b/gcc/expr.cc >> >> index d9407432ea5..20973649963 100644 >> >> --- a/gcc/expr.cc >> >> +++ b/gcc/expr.cc >> >> @@ -6045,6 +6045,52 @@ expand_assignment (tree to, tree from, bool >> >> nontemporal) >> >> return; >> >> } > > I miss an explanatory comment here on that the following is heuristics > and its reasoning. > >> >> + if ((TREE_CODE (from) == PARM_DECL && DECL_INCOMING_RTL (from) >> >> + && TYPE_MODE (TREE_TYPE (from)) == BLKmode > > Why check TYPE_MODE here? Do you want AGGREGATE_TYPE_P on the type > instead? Checking BLK, because I want make sure the param should occur on register and stack (saved from register). Actualy, my intention is checking: GET_MODE (DECL_INCOMING_RTL (from)) == BLKmode For code: GET_MODE (DECL_INCOMING_RTL (from)) == BLKmode && (GET_CODE (DECL_INCOMING_RTL (from)) == PARALLEL || REG_P (DECL_INCOMING_RTL (from))) This checking indicates if the param may be passing via 2 or more registers. Using "AGGREGATE_TYPE_P && (PARALLEL || REG_P)" may be ok and more readable. I would have a test. > >> >> + && (GET_CODE (DECL_INCOMING_RTL (from)) == PARALLEL >> >> + || REG_P (DECL_INCOMING_RTL (from)))) >> >> + || (VAR_P (to) && DECL_USEDBY_RETURN_P (to) >> >> + && TYPE_MODE (TREE_TYPE (to)) == BLKmode > > Likewise. > >> >> + && GET_CODE (DECL_RTL (DECL_RESULT (current_function_decl))) >> >> + == PARALLEL)) > > Not REG_P here? REG_P with BLK on return would means return in memory, instead return through registers, so, REG_P was not added here, and let it use orignal behavior. > >> >> + { >> >> + push_temp_slots (); >> >> + rtx par_ret; >> >> + machine_mode mode; >> >> + par_ret = TREE_CODE (from) == PARM_DECL >> >> + ? DECL_INCOMING_RTL (from) >> >> + : DECL_RTL (DECL_RESULT (current_function_decl)); >> >> + mode = GET_CODE (par_ret) == PARALLEL >> >> + ? GET_MODE (XEXP (XVECEXP (par_ret, 0, 0), 0)) >> >> + : word_mode; >> >> + int mode_size = GET_MODE_SIZE (mode).to_constant (); >> >> + int size = INTVAL (expr_size (from)); >> >> + >> >> + /* If/How the parameter using submode, it dependes on the size and >> >> + position of the parameter. Here using heurisitic number. */ >> >> + int hurstc_num = 8; >> > >> > Where did this come from and what does it mean? >> Sorry for does not make this clear. We know that an aggregate arg may be >> on registers partially or totally, as assign_parm_adjust_entry_rtl. >> For an example, if a parameter with 12 words and the target/ABI only >> allow 8 gprs for arguments, then the parameter could use 8 regs at most >> and left part in stack. > > I also wonder about the exact semantics of the parallels we get here. > > + int size = INTVAL (expr_size (from)); > > esp. when you use sth as simple as this. Shouldn't you instead look > at to_rtx since that's already expanded? For returns that should Yes, I use "expr_size (from)" is just because the size should be same between "from", "to" and "to_rtx" for these cases. > be the desired layout to match 'from' to, no? Maybe it's better > to not try sharing the code for both incoming and return copies > for clarity? OK, thanks, good sugguestion. > > Also, what happens if there's a copy from a PARM_DECL to a > DECL_USEDBY_RETURN_P decl? Which heuristic takes precedent? I thinking below code reflects this senario: typedef struct SA {double a[3];} A; A ret_arg (A a) {return a;} The gimple seq: D.3951 = a; return D.3951; For this patch, since "PARM_DECL" is checked before "VAR_P", so, PARM_DECL is checked as true first. > > I think that at least the place of the copy improvement and the > way you compute DECL_USEDBY_RETURN_P is reasonable. I'm also thinking to mark this flag for return variables during it was built in gimplify_return_expr. This would be more straightforward and no need to query/walk return stmts. One possible concern: it is too far between "gimplify" phase and "expand". I will update the patch accordingly. Thanks for suggestions and comments! BR, Jeff (Jiufu) > > Thanks, > Richard. > >> > >> > >> > Note that BLKmode subword values passed in registers can be either >> > right or left justified. I think you also need to worry about >> > endianness here. >> Since the subword is used to move block(read from source mem and then >> store to destination mem with register mode), and this would keep to use >> the same endianness on reg like move_block_from_reg. So, the patch does >> not check the endianness. >> >> If any concerns and sugguestions, please point out, thanks! >> >> BR, >> Jeff (Jiufu) >> > >> > >> > Jeff >>