Segher Boessenkool <seg...@kernel.crashing.org> writes: Hi!
> Hi! > > On Thu, Feb 24, 2022 at 03:48:54PM +0800, Jiufu Guo wrote: >> Segher Boessenkool <seg...@kernel.crashing.org> writes: >> > That is the problem yes. You need insns to call insn_cost on. You can >> > look in combine.c:combine_validate_cost to see how this can be done; but >> > you need to have some code to generate in the first place, and for CSE >> > it isn't always clear what code to generate, it really is based on RTL >> > expressions having a cost. >> >> Hi Segher, >> >> Thanks! combine_validate_cost is useful to help me on >> evaluating the costs of several instructions or replacements. >> >> As you pointed out, at CSE, it may not be clear to know what >> extact insn sequences will be generated. Actually, the same >> issue also exists on RTL expression. At CSE, it may not clear >> the exact cost, since the real instructions maybe emitted in >> very late passes. > > But there will be RTL insns already. Those may not correspond 1-1 to > the eventual machine insns (ideally they do most of the time though), > but it should be possible to estimate pretty accurate costs for them. > > Costs are never exact anyway, it is only one number, while in reality > there are many dimensions. > >> To get the accurate cost, we may analyze the constant in the >> hook(insn_cost or rtx_cost) and estimate the possible final >> instructions and then calculate the costs. > > Yes. > >> We discussed one idea: let the hook insn_cost accept >> any interim instruction, and estimate the real instruction >> base on the interim insn, and then return the estimated >> costs. > > No. insn_cost is only for correct, existing instructions, not for > made-up nonsense. I created insn_cost precisely to get away from that > aspect of rtx_cost (and some other issues, like, it is incredibly hard > and cumbersome to write a correct rtx_cost). Thanks! The implementations of hook insn_cost are align with this design, they are checking insn's attributes and COSTS_N_INSNS. One question on the speciall case: For instruction: "r119:DI=0x100803004101001" Would we treat it as valid instruction? A patch, which is attached the end of this mail, accepts "r119:DI=0x100803004101001" as input of insn_cost. In this patch, - A tmp instruction is generated via make_insn_raw. - A few calls to rtx_cost (in cse_insn) is replaced by insn_cost. - In hook of insn_cost, checking the special 'constant' instruction. Are these make sense? > >> For example: input insn "r119:DI=0x100803004101001" to >> insn_cost; and in rs6000_insn_cost (for ppc), analyze >> constant "0x100803004101001" which would need 5 insns; >> then rs6000_insn_cost sumarize the cost of 5 insns. >> >> A minor concern: because we know that reading this >> constant from the pool is faster than building it by insns, >> we will generate instructions to load constant from the pool >> finally, do not emit 5 real instructions to build the value. >> So, we are more interested in if it is faster to load from >> pool or not. > > That is one reason why it is better to generate (close to) machine > insns as early as possible: it makes it much easier to estimate > realistic costs. (Another important reason is it allows other > optimisations, without us having to do any work for it!) Get it! In the middle of an optimization pass, 'interim' instruction maybe acceptable. While it would better to outputs only contains 'valid machine insn' from any RTL passes. BR, Jiufu > > > Segher diff --git a/gcc/config/rs6000/rs6000.cc b/gcc/config/rs6000/rs6000.cc index d7a7cfe860f..1184e6ad65b 100644 --- a/gcc/config/rs6000/rs6000.cc +++ b/gcc/config/rs6000/rs6000.cc @@ -22131,6 +22131,16 @@ rs6000_debug_rtx_costs (rtx x, machine_mode mode, int outer_code, static int rs6000_insn_cost (rtx_insn *insn, bool speed) { + /* Handle special 'constant int' insn. */ + rtx set = PATTERN (insn); + if (GET_CODE (set) == SET && CONSTANT_P (SET_SRC (set))) + { + rtx src = SET_SRC (set); + machine_mode mode = GET_MODE (SET_DEST (set)); + if (CONST_INT_P (src) || CONST_WIDE_INT_P (src)) + return COSTS_N_INSNS (num_insns_constant (src, mode)); + } + if (recog_memoized (insn) < 0) return 0; diff --git a/gcc/cse.cc b/gcc/cse.cc index a18b599d324..4705ad82084 100644 --- a/gcc/cse.cc +++ b/gcc/cse.cc @@ -456,6 +456,8 @@ struct table_elt (REG_P (X) ? 0 : notreg_cost (X, MODE, SET, 1)) #define COST_IN(X, MODE, OUTER, OPNO) \ (REG_P (X) ? 0 : notreg_cost (X, MODE, OUTER, OPNO)) +#define COST_SRC(I, X, MODE) \ + (REG_P (X) ? 0 : notreg_cost (X, MODE, SET, 1, I)) /* Get the number of times this register has been updated in this basic block. */ @@ -523,7 +525,8 @@ static bitmap cse_ebb_live_in, cse_ebb_live_out; static sbitmap cse_visited_basic_blocks; static bool fixed_base_plus_p (rtx x); -static int notreg_cost (rtx, machine_mode, enum rtx_code, int); +static int notreg_cost (rtx, machine_mode, enum rtx_code, int, rtx_insn*); +static int insn_cost_x (rtx_insn *, rtx); static int preferable (int, int, int, int); static void new_basic_block (void); static void make_new_qty (unsigned int, machine_mode); @@ -698,7 +701,8 @@ preferable (int cost_a, int regcost_a, int cost_b, int regcost_b) from COST macro to keep it simple. */ static int -notreg_cost (rtx x, machine_mode mode, enum rtx_code outer, int opno) +notreg_cost (rtx x, machine_mode mode, enum rtx_code outer, int opno, + rtx_insn *insn = NULL) { scalar_int_mode int_mode, inner_mode; return ((GET_CODE (x) == SUBREG @@ -709,9 +713,21 @@ notreg_cost (rtx x, machine_mode mode, enum rtx_code outer, int opno) && subreg_lowpart_p (x) && TRULY_NOOP_TRUNCATION_MODES_P (int_mode, inner_mode)) ? 0 + : insn != NULL ? insn_cost_x (insn, x) : rtx_cost (x, mode, outer, opno, optimize_this_for_speed_p) * 2); } +/* Internal function, to get cost when use X to replace source of insn + which is a SET. */ + +static int +insn_cost_x (rtx_insn *insn, rtx x) +{ + INSN_CODE (insn) = -1; + SET_SRC (PATTERN (insn)) = x; + return insn_cost (insn, optimize_this_for_speed_p); +} + /* Initialize CSE_REG_INFO_TABLE. */ @@ -4603,6 +4619,7 @@ cse_insn (rtx_insn *insn) Nothing in this loop changes the hash table or the register chains. */ + rtx_insn *tmp_insn = NULL; for (i = 0; i < n_sets; i++) { bool repeat = false; @@ -4638,6 +4655,10 @@ cse_insn (rtx_insn *insn) mode = GET_MODE (src) == VOIDmode ? GET_MODE (dest) : GET_MODE (src); sets[i].mode = mode; + if (tmp_insn == NULL_RTX && src && dest && dest != pc_rtx + && src != pc_rtx) + tmp_insn = make_insn_raw (gen_rtx_SET (copy_rtx (dest), copy_rtx(src))); + if (src_eqv) { machine_mode eqvmode = mode; @@ -5103,7 +5124,7 @@ cse_insn (rtx_insn *insn) src_cost = src_regcost = -1; else { - src_cost = COST (src, mode); + src_cost = COST_SRC (tmp_insn, src, mode); src_regcost = approx_reg_cost (src); } } @@ -5114,7 +5135,7 @@ cse_insn (rtx_insn *insn) src_eqv_cost = src_eqv_regcost = -1; else { - src_eqv_cost = COST (src_eqv_here, mode); + src_eqv_cost = COST_SRC (tmp_insn, src_eqv_here, mode); src_eqv_regcost = approx_reg_cost (src_eqv_here); } } @@ -5125,7 +5146,7 @@ cse_insn (rtx_insn *insn) src_folded_cost = src_folded_regcost = -1; else { - src_folded_cost = COST (src_folded, mode); + src_folded_cost = COST_SRC (tmp_insn, src_folded, mode); src_folded_regcost = approx_reg_cost (src_folded); } } @@ -5136,7 +5157,7 @@ cse_insn (rtx_insn *insn) src_related_cost = src_related_regcost = -1; else { - src_related_cost = COST (src_related, mode); + src_related_cost = COST_SRC (tmp_insn, src_related, mode); src_related_regcost = approx_reg_cost (src_related); /* If a const-anchor is used to synthesize a constant that @@ -5406,7 +5427,7 @@ cse_insn (rtx_insn *insn) src_folded = force_const_mem (mode, trial); if (src_folded) { - src_folded_cost = COST (src_folded, mode); + src_folded_cost = COST_SRC (tmp_insn, src_folded, mode); src_folded_regcost = approx_reg_cost (src_folded); } } @@ -5460,7 +5481,9 @@ cse_insn (rtx_insn *insn) /* If we had a constant that is cheaper than what we are now setting SRC to, use that constant. We ignored it when we thought we could make this into a no-op. */ - if (src_const && COST (src_const, mode) < COST (src, mode) + if (src_const + && COST_SRC (tmp_insn, src_const, mode) + < COST_SRC (tmp_insn, src, mode) && validate_change (insn, &SET_SRC (sets[i].rtl), src_const, 0)) src = src_const;