On Wed, 23 Feb 2022, guojiufu wrote: > > > On 2/22/22 PM3:26, Richard Biener wrote: > > On Tue, 22 Feb 2022, Jiufu Guo wrote: > > > >> Hi, > >> > >> For constants, there are some codes to check: if it is able to put > >> to instruction as an immediate operand or it is profitable to load from > >> mem. There are still some places that could be improved for platforms. > >> > >> This patch could handle PR63281/57836. This patch does not change > >> too much on the code like force_const_mem and legitimate_constant_p. > >> We may integrate these APIs for passes like expand/cse/combine > >> as a whole solution in the future (maybe better for stage1?). > >> > >> Bootstrap and regtest pass on ppc64le and x86_64. Is this ok for trunk? > >> Thanks for comments! > > > > I'm not sure whether we need a new hook here, but iff, then I think > > whether loading a constant (from memory?) is faster or not depends > > on the context. So what's the exact situation and the two variants > > you are costing against each other? I assume (since you are > > touching CSE) you are costing > > Hi Richard, > > Thanks for your review! > > In some contexts, it may be faster to load from memory for some > constant value, and for some constant value, it would be faster > to put into immediate of few (1 or 2) instructions. > > For example 0x1234567812345678, on ppc64, we may need 3 instructions > to build it, and then it would be better to put it in .rodata, and > then load it from memory. > > Currently, we already have hooks TARGET_CANNOT_FORCE_CONST_MEM and > TARGET_LEGITIMATE_CONSTANT_P. > > TARGET_CANNOT_FORCE_CONST_MEM is used to check if one 'rtx' can be > store into the constant pool. > On some targets (e.g. alpha), TARGET_LEGITIMATE_CONSTANT_P does the > behavior like what we expect: > > I once thought to use TARGET_LEGITIMATE_CONSTANT_P too. > But in general, it seems this hook is designed to check if one > 'rtx' could be used as an immediate instruction. This hook is used > in RTL passes: ira/reload. It is also used in recog.cc and expr.cc. > > In other words, I feel, whether putting a constant in the constant > pool, we could check: > - If TARGET_CANNOT_FORCE_CONST_MEM returns true, we should not put > the 'constant' in the constant pool. > - If TARGET_LEGITIMATE_CONSTANT_P returns true, then the 'constant' > would be immediate of **one** instruction, and not put to constant > pool. > - If the new hook TARGET_FASTER_LOADING_CONSTANT returns true, then > the 'constant' would be stored in the constant pool. > Otherwise, it would be better to use an instructions-seq to build the > 'constant'. > This is why I introduce a new hook.
I agree TARGET_CANNOT_FORCE_CONST_MEM and TARGET_LEGITIMATE_CONSTANT_P are not the correct vehicle for a cost based consideration, they are used for correctness checks. > We may also use the new hook at other places, e.g. expand/combining... > where is calling force_const_mem. > > Any suggestions? > > > > > (set (...) (mem)) (before CSE) > > > > against > > > > (set (...) (immediate)) (what CSE does now) > > > > vs. > > > > (set (...) (mem)) (original, no CSE) > > > > ? With the new hook you are skipping _all_ of the following loops > > logic which does look like a quite bad design and hack (not that > > I am very familiar with the candidate / costing logic in there). > > At cse_insn, in the following loop of the code, it is also testing > the constant and try to put into memory: > > else if (crtl->uses_const_pool > && CONSTANT_P (trial) > && !CONST_INT_P (trial) > && (src_folded == 0 || !MEM_P (src_folded)) > && GET_MODE_CLASS (mode) != MODE_CC > && mode != VOIDmode) > { > src_folded = force_const_mem (mode, trial); > if (src_folded) > { > src_folded_cost = COST (src_folded, mode); > src_folded_regcost = approx_reg_cost (src_folded); > } > } > > This code is at the end of the loop, it would only be tested for > the next iteration. It may be better to test "if need to put the > constant into memory" for all iterations. > > The current patch is adding an additional test before the loop. > I will update the patch to integrate these two places! I'm assuming we're always dealing with (set (reg:MODE ..) <src_folded>) here and CSE is not substituting into random places of an instruction(?). I don't know what 'rtx_cost' should evaluate to for a constant, if it should implicitely evaluate the cost of putting the result into a register for example. The RTX_COST target hook at least has some context (outer_code and opno) and COST passes SET and 1 here. So why is adjusting the RTX_COST hook not enough then for this case? Using RTX_COST with SET and 1 at least looks no worse than using your proposed new target hook and comparing it with the original unfolded src (again with SET and 1). Richard. > > > > We already have TARGET_INSN_COST which you could ask for a cost. > > Like if we'd have a single_set then just temporarily substitute > > the RHS with the candidate and cost the insns and compare against > > the original insn cost. So why exactly do you need a new hook > > for this particular situation? > > Thanks for pointing out this! Segher also mentioned this before. > Currently, CSE is using rtx_cost. Using insn_cost to replace > rtx_cost would be a good idea for all necessary places including CSE. > > For this particular case: check the cost for constants. > I did not use insn_cost. Because to use insn_cost, we may need > to create a recognizable insn temporarily, and for some kind of > constants we may need to create a sequence instructions on some > platform, e.g. "li xx; ori ; sldi .." on ppc64, and check the > sum cost of those instructions. If only create one fake > instruction, the insn_cost may not return the accurate cost either. > > BR, > Jiufu > > > > > Thanks, > > Richard. > > > > > >> > >> BR, > >> Jiufu > >> > >> gcc/ChangeLog: > >> > >> PR target/94393 > >> PR rtl-optimization/63281 > >> * config/rs6000/rs6000.cc (TARGET_FASTER_LOADING_CONSTANT): > >> New hook. > >> (rs6000_faster_loading_const): New hook. > >> (rs6000_cannot_force_const_mem): Update. > >> (rs6000_emit_move): Use rs6000_faster_loading_const. > >> * cse.cc (cse_insn): Use new hook. > >> * doc/tm.texi.in: New hook TARGET_FASTER_LOADING_CONSTANT. > >> * doc/tm.texi: Regenerate. > >> * target.def: New hook faster_loading_constant. > >> * targhooks.cc (default_faster_loading_constant): New. > >> * targhooks.h (default_faster_loading_constant): Declare it. > >> > >> gcc/testsuite/ChangeLog: > >> > >> * gcc.target/powerpc/medium_offset.c: Update. > >> * gcc.target/powerpc/pr93012.c: Update. > >> * gcc.target/powerpc/pr63281.c: New test. > >> > >> --- > >> gcc/config/rs6000/rs6000.cc | 38 ++++++++++++++----- > >> gcc/cse.cc | 5 +++ > >> gcc/doc/tm.texi | 5 +++ > >> gcc/doc/tm.texi.in | 2 + > >> gcc/target.def | 8 ++++ > >> gcc/targhooks.cc | 6 +++ > >> .../gcc.target/powerpc/medium_offset.c | 2 +- > >> gcc/testsuite/gcc.target/powerpc/pr63281.c | 14 +++++++ > >> gcc/testsuite/gcc.target/powerpc/pr93012.c | 2 +- > >> 9 files changed, 71 insertions(+), 11 deletions(-) > >> create mode 100644 gcc/testsuite/gcc.target/powerpc/pr63281.c > >> > >> diff --git a/gcc/config/rs6000/rs6000.cc b/gcc/config/rs6000/rs6000.cc > >> index d7a7cfe860f..beb21a0bb2b 100644 > >> --- a/gcc/config/rs6000/rs6000.cc > >> +++ b/gcc/config/rs6000/rs6000.cc > >> @@ -1361,6 +1361,9 @@ static const struct attribute_spec > >> rs6000_attribute_table[] = > >> #undef TARGET_CANNOT_FORCE_CONST_MEM > >> #define TARGET_CANNOT_FORCE_CONST_MEM rs6000_cannot_force_const_mem > >> > >> +#undef TARGET_FASTER_LOADING_CONSTANT > >> +#define TARGET_FASTER_LOADING_CONSTANT rs6000_faster_loading_const > >> + > >> #undef TARGET_DELEGITIMIZE_ADDRESS > >> #define TARGET_DELEGITIMIZE_ADDRESS rs6000_delegitimize_address > >> > >> @@ -9684,8 +9687,7 @@ rs6000_init_stack_protect_guard (void) > >> static bool > >> rs6000_cannot_force_const_mem (machine_mode mode ATTRIBUTE_UNUSED, rtx x) > >> { > >> - if (GET_CODE (x) == HIGH > >> - && GET_CODE (XEXP (x, 0)) == UNSPEC) > >> + if (GET_CODE (x) == HIGH) > >> return true; > >> > >> /* A TLS symbol in the TOC cannot contain a sum. */ > >> @@ -10483,6 +10485,30 @@ rs6000_emit_move_si_sf_subreg (rtx dest, rtx > >> source, machine_mode mode) > >> return false; > >> } > >> > >> + > >> +/* Implement TARGET_FASTER_LOADING_CONSTANT. */ > >> + > >> +static bool > >> +rs6000_faster_loading_const (machine_mode mode, rtx dest, rtx src) > >> +{ > >> + gcc_assert (CONSTANT_P (src)); > >> + > >> + if (GET_MODE_CLASS (mode) == MODE_CC || mode == VOIDmode) > >> + return false; > >> + if (GET_CODE (src) == HIGH) > >> + return false; > >> + if (toc_relative_expr_p (src, false, NULL, NULL)) > >> + return false; > >> + if (rs6000_cannot_force_const_mem (mode, src)) > >> + return false; > >> + > >> + if (REG_P (dest) && FP_REGNO_P (REGNO (dest))) > >> + return true; > >> + if (!CONST_INT_P (src)) > >> + return true; > >> + return num_insns_constant (src, mode) > (TARGET_PCREL ? 1 : 2); > >> +} > >> + > >> /* Emit a move from SOURCE to DEST in mode MODE. */ > >> void > >> rs6000_emit_move (rtx dest, rtx source, machine_mode mode) > >> @@ -10860,13 +10886,7 @@ rs6000_emit_move (rtx dest, rtx source, > >> machine_mode mode) > >> operands[1] = create_TOC_reference (operands[1], operands[0]); > >> else if (mode == Pmode > >> && CONSTANT_P (operands[1]) > >> - && GET_CODE (operands[1]) != HIGH > >> - && ((REG_P (operands[0]) > >> - && FP_REGNO_P (REGNO (operands[0]))) > >> - || !CONST_INT_P (operands[1]) > >> - || (num_insns_constant (operands[1], mode) > >> - > (TARGET_CMODEL != CMODEL_SMALL ? 3 : 2))) > >> - && !toc_relative_expr_p (operands[1], false, NULL, NULL) > >> + && rs6000_faster_loading_const (mode, operands[0], operands[1]) > >> && (TARGET_CMODEL == CMODEL_SMALL > >> || can_create_pseudo_p () > >> || (REG_P (operands[0]) > >> diff --git a/gcc/cse.cc b/gcc/cse.cc > >> index a18b599d324..c33d3c7e911 100644 > >> --- a/gcc/cse.cc > >> +++ b/gcc/cse.cc > >> @@ -5158,6 +5158,11 @@ cse_insn (rtx_insn *insn) > >> if (dest == pc_rtx && src_const && GET_CODE (src_const) == > >> LABEL_REF) > >> src_folded = src_const, src_folded_cost = src_folded_regcost = -1; > >> + /* Check if src_foled is contant which is fastster to load pool. > >> */ > >> + if (src_folded && CONSTANT_P (src_folded) && src && MEM_P (src) > >> + && targetm.faster_loading_constant (mode, dest, src_folded)) > >> + src_folded = NULL_RTX, src_folded_cost = src_folded_regcost = -1; > >> + > >> /* Terminate loop when replacement made. This must terminate since > >> the current contents will be tested and will always be valid. > >> */ > >> while (1) > >> diff --git a/gcc/doc/tm.texi b/gcc/doc/tm.texi > >> index 962bbb8caaf..65685311249 100644 > >> --- a/gcc/doc/tm.texi > >> +++ b/gcc/doc/tm.texi > >> @@ -6014,6 +6014,11 @@ holding the constant. This restriction is often > >> true of addresses > >> of TLS symbols for various targets. > >> @end deftypefn > >> > >> +@deftypefn {Target Hook} bool TARGET_FASTER_LOADING_CONSTANT (machine_mode > >> @var{mode}, rtx @var{y}, rtx @var{x}) > >> +It returns true if loading contant @var{x} is faster than building > >> +from scratch, and set to @var{y}. @var{mode} is the mode of @var{x}. > >> +@end deftypefn > >> + > >> @deftypefn {Target Hook} bool TARGET_USE_BLOCKS_FOR_CONSTANT_P > >> (machine_mode @var{mode}, const_rtx @var{x}) > >> This hook should return true if pool entries for constant @var{x} can > >> be placed in an @code{object_block} structure. @var{mode} is the mode > >> diff --git a/gcc/doc/tm.texi.in b/gcc/doc/tm.texi.in > >> index 394b59e70a6..918914f0e30 100644 > >> --- a/gcc/doc/tm.texi.in > >> +++ b/gcc/doc/tm.texi.in > >> @@ -4148,6 +4148,8 @@ address; but often a machine-dependent strategy can > >> generate better code. > >> > >> @hook TARGET_CANNOT_FORCE_CONST_MEM > >> > >> +@hook TARGET_FASTER_LOADING_CONSTANT > >> + > >> @hook TARGET_USE_BLOCKS_FOR_CONSTANT_P > >> > >> @hook TARGET_USE_BLOCKS_FOR_DECL_P > >> diff --git a/gcc/target.def b/gcc/target.def > >> index 57e64b20eef..8b007aca9dc 100644 > >> --- a/gcc/target.def > >> +++ b/gcc/target.def > >> @@ -2987,6 +2987,14 @@ locally. The default is to return false.", > >> bool, (void), > >> hook_bool_void_false) > >> +/* Check if it is profitable to load the constant from constant pool. > >> */ > >> +DEFHOOK > >> +(faster_loading_constant, > >> + "It returns true if loading contant @var{x} is faster than building\n\ > >> +from scratch, and set to @var{y}. @var{mode} is the mode of @var{x}.", > >> + bool, (machine_mode mode, rtx y, rtx x), > >> + default_faster_loading_constant) > >> + > >> /* True if it is OK to do sibling call optimization for the specified > >> call expression EXP. DECL will be the called function, or NULL if > >> this is an indirect call. */ > >> diff --git a/gcc/targhooks.cc b/gcc/targhooks.cc > >> index fc49235eb38..1d9340d1c14 100644 > >> --- a/gcc/targhooks.cc > >> +++ b/gcc/targhooks.cc > >> @@ -173,6 +173,12 @@ default_return_in_memory (const_tree type, > >> return (TYPE_MODE (type) == BLKmode); > >> } > >> > >> +bool > >> +default_faster_loading_constant (machine_mode mode, rtx dest, rtx src) > >> +{ > >> + return false; > >> +} > >> + > >> rtx > >> default_legitimize_address (rtx x, rtx orig_x ATTRIBUTE_UNUSED, > >> machine_mode mode ATTRIBUTE_UNUSED) > >> diff --git a/gcc/testsuite/gcc.target/powerpc/medium_offset.c > >> b/gcc/testsuite/gcc.target/powerpc/medium_offset.c > >> index f29eba08c38..4889e8fa8ec 100644 > >> --- a/gcc/testsuite/gcc.target/powerpc/medium_offset.c > >> +++ b/gcc/testsuite/gcc.target/powerpc/medium_offset.c > >> @@ -1,7 +1,7 @@ > >> /* { dg-do compile { target { powerpc*-*-* } } } */ > >> /* { dg-require-effective-target lp64 } */ > >> /* { dg-options "-O" } */ > >> -/* { dg-final { scan-assembler-not "\\+4611686018427387904" } } */ > >> +/* { dg-final { scan-assembler-times {\msldi|pld\M} 1 } } */ > >> > >> static int x; > >> > >> diff --git a/gcc/testsuite/gcc.target/powerpc/pr63281.c > >> b/gcc/testsuite/gcc.target/powerpc/pr63281.c > >> new file mode 100644 > >> index 00000000000..58ba074d427 > >> --- /dev/null > >> +++ b/gcc/testsuite/gcc.target/powerpc/pr63281.c > >> @@ -0,0 +1,14 @@ > >> +/* { dg-do compile } */ > >> +/* { dg-options "-O2" } */ > >> + > >> +#define CONST1 0x100803004101001 > >> +#define CONST2 0x000406002105007 > >> + > >> +void __attribute__ ((noinline)) > >> +foo (unsigned long long *a, unsigned long long *b) > >> +{ > >> + *a = CONST1; > >> + *b = CONST2; > >> +} > >> + > >> +/* { dg-final { scan-assembler-times {\mld|pld\M} 2 } } */ > >> diff --git a/gcc/testsuite/gcc.target/powerpc/pr93012.c > >> b/gcc/testsuite/gcc.target/powerpc/pr93012.c > >> index 4f764d0576f..5afb4f79c45 100644 > >> --- a/gcc/testsuite/gcc.target/powerpc/pr93012.c > >> +++ b/gcc/testsuite/gcc.target/powerpc/pr93012.c > >> @@ -10,4 +10,4 @@ unsigned long long mskh1() { return > >> 0xffff9234ffff9234ULL; } > >> unsigned long long mskl1() { return 0x2bcdffff2bcdffffULL; } > >> unsigned long long mskse() { return 0xffff1234ffff1234ULL; } > >> > >> -/* { dg-final { scan-assembler-times {\mrldimi\M} 7 } } */ > >> +/* { dg-final { scan-assembler-times {\mrldimi|ld|pld\M} 7 } } */ > >> > > > > -- Richard Biener <rguent...@suse.de> SUSE Software Solutions Germany GmbH, Maxfeldstrasse 5, 90409 Nuernberg, Germany; GF: Ivo Totev; HRB 36809 (AG Nuernberg)