Ping.

Aaron Sawdey, Ph.D. saw...@linux.ibm.com
IBM Linux on POWER Toolchain

> On Jan 3, 2021, at 2:42 PM, Aaron Sawdey <acsaw...@linux.ibm.com> wrote:
>
> Ping.
>
> I assume we’re going to want a separate patch for the new instruction type.
>
> Aaron Sawdey, Ph.D. saw...@linux.ibm.com
> IBM Linux on POWER Toolchain
>
>
>> On Dec 4, 2020, at 1:19 PM, acsaw...@linux.ibm.com wrote:
>>
>> From: Aaron Sawdey <acsaw...@linux.ibm.com>
>>
>> This patch adds the first batch of patterns to support p10 fusion. These
>> will allow combine to create a single insn for a pair of instructions
>> that power10 can fuse and execute. These particular ones have the
>> requirement that only cr0 can be used when fusing a load with a compare
>> immediate of -1/0/1 (if signed) or 0/1 (if unsigned), so we want combine
>> to put that requirement in, and if it doesn't work out later the splitter
>> can get used.
>>
>> The patterns are generated by a script genfusion.pl and live in a new file,
>> fusion.md. This script will be expanded to generate more patterns for
>> fusion.
>>
>> This also adds option -mpower10-fusion which defaults on for power10 and
>> will gate all these fusion patterns. In addition I have added an
>> undocumented option -mpower10-fusion-ld-cmpi (which may be removed later)
>> that just controls the load+compare-immediate patterns. I have made
>> these default on for power10 but they are not disallowed for earlier
>> processors because it is still valid code. This allows us to test the
>> correctness of fusion code generation by turning it on explicitly.
>>
>> If bootstrap/regtest is clean, ok for trunk?
>>
>> Thanks!
>>
>> Aaron
>>
>> gcc/ChangeLog:
>>
>>         * config/rs6000/genfusion.pl: New file, script to generate
>>         define_insn_and_split patterns so combine can arrange fused
>>         instructions next to each other.
>>         * config/rs6000/fusion.md: New file, generated fused instruction
>>         patterns for combine.
>>         * config/rs6000/predicates.md (const_m1_to_1_operand): New predicate.
>>         (non_update_memory_operand): New predicate.
>>         * config/rs6000/rs6000-cpus.def: Add OPTION_MASK_P10_FUSION and
>>         OPTION_MASK_P10_FUSION_LD_CMPI to ISA_3_1_MASKS_SERVER and
>>         POWERPC_MASKS.
>>         * config/rs6000/rs6000-protos.h (address_is_non_pfx_d_or_x): Add
>>         prototype.
>>         * config/rs6000/rs6000.c (rs6000_option_override_internal):
>>         Automatically set -mpower10-fusion and -mpower10-fusion-ld-cmpi
>>         if target is power10. (rs6000_opt_masks): Allow -mpower10-fusion
>>         in function attributes. (address_is_non_pfx_d_or_x): New function.
>>         * config/rs6000/rs6000.h: Add MASK_P10_FUSION.
>>         * config/rs6000/rs6000.md: Include fusion.md.
>>         * config/rs6000/rs6000.opt: Add -mpower10-fusion
>>         and -mpower10-fusion-ld-cmpi.
>>         * config/rs6000/t-rs6000: Add dependencies involving fusion.md.
>> ---
>>  gcc/config/rs6000/fusion.md       | 357 ++++++++++++++++++++++++++++++
>>  gcc/config/rs6000/genfusion.pl    | 144 ++++++++++++
>>  gcc/config/rs6000/predicates.md   |  14 ++
>>  gcc/config/rs6000/rs6000-cpus.def |   6 +-
>>  gcc/config/rs6000/rs6000-protos.h |   2 +
>>  gcc/config/rs6000/rs6000.c        |  51 +++++
>>  gcc/config/rs6000/rs6000.h        |   1 +
>>  gcc/config/rs6000/rs6000.md       |   1 +
>>  gcc/config/rs6000/rs6000.opt      |   8 +
>>  gcc/config/rs6000/t-rs6000        |   6 +-
>>  10 files changed, 588 insertions(+), 2 deletions(-)
>>  create mode 100644 gcc/config/rs6000/fusion.md
>>  create mode 100755 gcc/config/rs6000/genfusion.pl
>>
>> diff --git a/gcc/config/rs6000/fusion.md b/gcc/config/rs6000/fusion.md
>> new file mode 100644
>> index 00000000000..a4d3a6ae7f3
>> --- /dev/null
>> +++ b/gcc/config/rs6000/fusion.md
>> @@ -0,0 +1,357 @@
>> +;; -*- buffer-read-only: t -*-
>> +;; Generated automatically by genfusion.pl
>> +
>> +;; Copyright (C) 2020 Free Software Foundation, Inc.
>> +;;
>> +;; This file is part of GCC.
>> +;;
>> +;; GCC is free software; you can redistribute it and/or modify it under
>> +;; the terms of the GNU General Public License as published by the Free
>> +;; Software Foundation; either version 3, or (at your option) any later
>> +;; version.
>> +;; >> +;; GCC is distributed in the hope that it will be useful, but WITHOUT ANY >> +;; WARRANTY; without even the implied warranty of MERCHANTABILITY or >> +;; FITNESS FOR A PARTICULAR PURPOSE. See the GNU General Public License >> +;; for more details. >> +;; >> +;; You should have received a copy of the GNU General Public License >> +;; along with GCC; see the file COPYING3. If not see >> +;; <http://www.gnu.org/licenses/>. >> + >> +;; load-cmpi fusion pattern generated by gen_ld_cmpi_p10 >> +;; load mode is DI result mode is clobber compare mode is CC extend is none >> +(define_insn_and_split "*ld_cmpdi_cr0_DI_clobber_CC_none" >> + [(set (match_operand:CC 2 "cc_reg_operand" "=x") >> + (compare:CC (match_operand:DI 1 "non_update_memory_operand" "m") >> + (match_operand:DI 3 "const_m1_to_1_operand" "n"))) >> + (clobber (match_scratch:DI 0 "=r"))] >> + "(TARGET_P10_FUSION && TARGET_P10_FUSION_LD_CMPI)" >> + "ld%X1 %0,%1\;cmpdi 0,%0,%3" >> + "&& reload_completed >> + && (cc_reg_not_cr0_operand (operands[2], CCmode) >> + || !address_is_non_pfx_d_or_x (XEXP (operands[1],0), DImode, >> NON_PREFIXED_DS))" >> + [(set (match_dup 0) (match_dup 1)) >> + (set (match_dup 2) >> + (compare:CC (match_dup 0) >> + (match_dup 3)))] >> + "" >> + [(set_attr "type" "load") >> + (set_attr "cost" "8") >> + (set_attr "length" "8")]) >> + >> +;; load-cmpi fusion pattern generated by gen_ld_cmpi_p10 >> +;; load mode is DI result mode is clobber compare mode is CCUNS extend is >> none >> +(define_insn_and_split "*ld_cmpldi_cr0_DI_clobber_CCUNS_none" >> + [(set (match_operand:CCUNS 2 "cc_reg_operand" "=x") >> + (compare:CCUNS (match_operand:DI 1 "non_update_memory_operand" "m") >> + (match_operand:DI 3 "const_0_to_1_operand" "n"))) >> + (clobber (match_scratch:DI 0 "=r"))] >> + "(TARGET_P10_FUSION && TARGET_P10_FUSION_LD_CMPI)" >> + "ld%X1 %0,%1\;cmpldi 0,%0,%3" >> + "&& reload_completed >> + && (cc_reg_not_cr0_operand (operands[2], CCmode) >> + || !address_is_non_pfx_d_or_x (XEXP 
(operands[1],0), DImode, >> NON_PREFIXED_DS))" >> + [(set (match_dup 0) (match_dup 1)) >> + (set (match_dup 2) >> + (compare:CCUNS (match_dup 0) >> + (match_dup 3)))] >> + "" >> + [(set_attr "type" "load") >> + (set_attr "cost" "8") >> + (set_attr "length" "8")]) >> + >> +;; load-cmpi fusion pattern generated by gen_ld_cmpi_p10 >> +;; load mode is DI result mode is DI compare mode is CC extend is none >> +(define_insn_and_split "*ld_cmpdi_cr0_DI_DI_CC_none" >> + [(set (match_operand:CC 2 "cc_reg_operand" "=x") >> + (compare:CC (match_operand:DI 1 "non_update_memory_operand" "m") >> + (match_operand:DI 3 "const_m1_to_1_operand" "n"))) >> + (set (match_operand:DI 0 "gpc_reg_operand" "=r") (match_dup 1))] >> + "(TARGET_P10_FUSION && TARGET_P10_FUSION_LD_CMPI)" >> + "ld%X1 %0,%1\;cmpdi 0,%0,%3" >> + "&& reload_completed >> + && (cc_reg_not_cr0_operand (operands[2], CCmode) >> + || !address_is_non_pfx_d_or_x (XEXP (operands[1],0), DImode, >> NON_PREFIXED_DS))" >> + [(set (match_dup 0) (match_dup 1)) >> + (set (match_dup 2) >> + (compare:CC (match_dup 0) >> + (match_dup 3)))] >> + "" >> + [(set_attr "type" "load") >> + (set_attr "cost" "8") >> + (set_attr "length" "8")]) >> + >> +;; load-cmpi fusion pattern generated by gen_ld_cmpi_p10 >> +;; load mode is DI result mode is DI compare mode is CCUNS extend is none >> +(define_insn_and_split "*ld_cmpldi_cr0_DI_DI_CCUNS_none" >> + [(set (match_operand:CCUNS 2 "cc_reg_operand" "=x") >> + (compare:CCUNS (match_operand:DI 1 "non_update_memory_operand" "m") >> + (match_operand:DI 3 "const_0_to_1_operand" "n"))) >> + (set (match_operand:DI 0 "gpc_reg_operand" "=r") (match_dup 1))] >> + "(TARGET_P10_FUSION && TARGET_P10_FUSION_LD_CMPI)" >> + "ld%X1 %0,%1\;cmpldi 0,%0,%3" >> + "&& reload_completed >> + && (cc_reg_not_cr0_operand (operands[2], CCmode) >> + || !address_is_non_pfx_d_or_x (XEXP (operands[1],0), DImode, >> NON_PREFIXED_DS))" >> + [(set (match_dup 0) (match_dup 1)) >> + (set (match_dup 2) >> + (compare:CCUNS (match_dup 
0) >> + (match_dup 3)))] >> + "" >> + [(set_attr "type" "load") >> + (set_attr "cost" "8") >> + (set_attr "length" "8")]) >> + >> +;; load-cmpi fusion pattern generated by gen_ld_cmpi_p10 >> +;; load mode is SI result mode is clobber compare mode is CC extend is none >> +(define_insn_and_split "*lwa_cmpdi_cr0_SI_clobber_CC_none" >> + [(set (match_operand:CC 2 "cc_reg_operand" "=x") >> + (compare:CC (match_operand:SI 1 "non_update_memory_operand" "m") >> + (match_operand:SI 3 "const_m1_to_1_operand" "n"))) >> + (clobber (match_scratch:SI 0 "=r"))] >> + "(TARGET_P10_FUSION && TARGET_P10_FUSION_LD_CMPI)" >> + "lwa%X1 %0,%1\;cmpdi 0,%0,%3" >> + "&& reload_completed >> + && (cc_reg_not_cr0_operand (operands[2], CCmode) >> + || !address_is_non_pfx_d_or_x (XEXP (operands[1],0), SImode, >> NON_PREFIXED_DS))" >> + [(set (match_dup 0) (match_dup 1)) >> + (set (match_dup 2) >> + (compare:CC (match_dup 0) >> + (match_dup 3)))] >> + "" >> + [(set_attr "type" "load") >> + (set_attr "cost" "8") >> + (set_attr "length" "8")]) >> + >> +;; load-cmpi fusion pattern generated by gen_ld_cmpi_p10 >> +;; load mode is SI result mode is clobber compare mode is CCUNS extend is >> none >> +(define_insn_and_split "*lwz_cmpldi_cr0_SI_clobber_CCUNS_none" >> + [(set (match_operand:CCUNS 2 "cc_reg_operand" "=x") >> + (compare:CCUNS (match_operand:SI 1 "non_update_memory_operand" "m") >> + (match_operand:SI 3 "const_0_to_1_operand" "n"))) >> + (clobber (match_scratch:SI 0 "=r"))] >> + "(TARGET_P10_FUSION && TARGET_P10_FUSION_LD_CMPI)" >> + "lwz%X1 %0,%1\;cmpldi 0,%0,%3" >> + "&& reload_completed >> + && (cc_reg_not_cr0_operand (operands[2], CCmode) >> + || !address_is_non_pfx_d_or_x (XEXP (operands[1],0), SImode, >> NON_PREFIXED_D))" >> + [(set (match_dup 0) (match_dup 1)) >> + (set (match_dup 2) >> + (compare:CCUNS (match_dup 0) >> + (match_dup 3)))] >> + "" >> + [(set_attr "type" "load") >> + (set_attr "cost" "8") >> + (set_attr "length" "8")]) >> + >> +;; load-cmpi fusion pattern generated by 
gen_ld_cmpi_p10 >> +;; load mode is SI result mode is SI compare mode is CC extend is none >> +(define_insn_and_split "*lwa_cmpdi_cr0_SI_SI_CC_none" >> + [(set (match_operand:CC 2 "cc_reg_operand" "=x") >> + (compare:CC (match_operand:SI 1 "non_update_memory_operand" "m") >> + (match_operand:SI 3 "const_m1_to_1_operand" "n"))) >> + (set (match_operand:SI 0 "gpc_reg_operand" "=r") (match_dup 1))] >> + "(TARGET_P10_FUSION && TARGET_P10_FUSION_LD_CMPI)" >> + "lwa%X1 %0,%1\;cmpdi 0,%0,%3" >> + "&& reload_completed >> + && (cc_reg_not_cr0_operand (operands[2], CCmode) >> + || !address_is_non_pfx_d_or_x (XEXP (operands[1],0), SImode, >> NON_PREFIXED_DS))" >> + [(set (match_dup 0) (match_dup 1)) >> + (set (match_dup 2) >> + (compare:CC (match_dup 0) >> + (match_dup 3)))] >> + "" >> + [(set_attr "type" "load") >> + (set_attr "cost" "8") >> + (set_attr "length" "8")]) >> + >> +;; load-cmpi fusion pattern generated by gen_ld_cmpi_p10 >> +;; load mode is SI result mode is SI compare mode is CCUNS extend is none >> +(define_insn_and_split "*lwz_cmpldi_cr0_SI_SI_CCUNS_none" >> + [(set (match_operand:CCUNS 2 "cc_reg_operand" "=x") >> + (compare:CCUNS (match_operand:SI 1 "non_update_memory_operand" "m") >> + (match_operand:SI 3 "const_0_to_1_operand" "n"))) >> + (set (match_operand:SI 0 "gpc_reg_operand" "=r") (match_dup 1))] >> + "(TARGET_P10_FUSION && TARGET_P10_FUSION_LD_CMPI)" >> + "lwz%X1 %0,%1\;cmpldi 0,%0,%3" >> + "&& reload_completed >> + && (cc_reg_not_cr0_operand (operands[2], CCmode) >> + || !address_is_non_pfx_d_or_x (XEXP (operands[1],0), SImode, >> NON_PREFIXED_D))" >> + [(set (match_dup 0) (match_dup 1)) >> + (set (match_dup 2) >> + (compare:CCUNS (match_dup 0) >> + (match_dup 3)))] >> + "" >> + [(set_attr "type" "load") >> + (set_attr "cost" "8") >> + (set_attr "length" "8")]) >> + >> +;; load-cmpi fusion pattern generated by gen_ld_cmpi_p10 >> +;; load mode is SI result mode is EXTSI compare mode is CC extend is sign >> +(define_insn_and_split 
"*lwa_cmpdi_cr0_SI_EXTSI_CC_sign" >> + [(set (match_operand:CC 2 "cc_reg_operand" "=x") >> + (compare:CC (match_operand:SI 1 "non_update_memory_operand" "m") >> + (match_operand:SI 3 "const_m1_to_1_operand" "n"))) >> + (set (match_operand:EXTSI 0 "gpc_reg_operand" "=r") (sign_extend:EXTSI >> (match_dup 1)))] >> + "(TARGET_P10_FUSION && TARGET_P10_FUSION_LD_CMPI)" >> + "lwa%X1 %0,%1\;cmpdi 0,%0,%3" >> + "&& reload_completed >> + && (cc_reg_not_cr0_operand (operands[2], CCmode) >> + || !address_is_non_pfx_d_or_x (XEXP (operands[1],0), SImode, >> NON_PREFIXED_DS))" >> + [(set (match_dup 0) (sign_extend:EXTSI (match_dup 1))) >> + (set (match_dup 2) >> + (compare:CC (match_dup 0) >> + (match_dup 3)))] >> + "" >> + [(set_attr "type" "load") >> + (set_attr "cost" "8") >> + (set_attr "length" "8")]) >> + >> +;; load-cmpi fusion pattern generated by gen_ld_cmpi_p10 >> +;; load mode is SI result mode is EXTSI compare mode is CCUNS extend is zero >> +(define_insn_and_split "*lwz_cmpldi_cr0_SI_EXTSI_CCUNS_zero" >> + [(set (match_operand:CCUNS 2 "cc_reg_operand" "=x") >> + (compare:CCUNS (match_operand:SI 1 "non_update_memory_operand" "m") >> + (match_operand:SI 3 "const_0_to_1_operand" "n"))) >> + (set (match_operand:EXTSI 0 "gpc_reg_operand" "=r") (zero_extend:EXTSI >> (match_dup 1)))] >> + "(TARGET_P10_FUSION && TARGET_P10_FUSION_LD_CMPI)" >> + "lwz%X1 %0,%1\;cmpldi 0,%0,%3" >> + "&& reload_completed >> + && (cc_reg_not_cr0_operand (operands[2], CCmode) >> + || !address_is_non_pfx_d_or_x (XEXP (operands[1],0), SImode, >> NON_PREFIXED_D))" >> + [(set (match_dup 0) (zero_extend:EXTSI (match_dup 1))) >> + (set (match_dup 2) >> + (compare:CCUNS (match_dup 0) >> + (match_dup 3)))] >> + "" >> + [(set_attr "type" "load") >> + (set_attr "cost" "8") >> + (set_attr "length" "8")]) >> + >> +;; load-cmpi fusion pattern generated by gen_ld_cmpi_p10 >> +;; load mode is HI result mode is clobber compare mode is CC extend is sign >> +(define_insn_and_split 
"*lha_cmpdi_cr0_HI_clobber_CC_sign" >> + [(set (match_operand:CC 2 "cc_reg_operand" "=x") >> + (compare:CC (match_operand:HI 1 "non_update_memory_operand" "m") >> + (match_operand:HI 3 "const_m1_to_1_operand" "n"))) >> + (clobber (match_scratch:GPR 0 "=r"))] >> + "(TARGET_P10_FUSION && TARGET_P10_FUSION_LD_CMPI)" >> + "lha%X1 %0,%1\;cmpdi 0,%0,%3" >> + "&& reload_completed >> + && (cc_reg_not_cr0_operand (operands[2], CCmode) >> + || !address_is_non_pfx_d_or_x (XEXP (operands[1],0), HImode, >> NON_PREFIXED_D))" >> + [(set (match_dup 0) (sign_extend:GPR (match_dup 1))) >> + (set (match_dup 2) >> + (compare:CC (match_dup 0) >> + (match_dup 3)))] >> + "" >> + [(set_attr "type" "load") >> + (set_attr "cost" "8") >> + (set_attr "length" "8")]) >> + >> +;; load-cmpi fusion pattern generated by gen_ld_cmpi_p10 >> +;; load mode is HI result mode is clobber compare mode is CCUNS extend is >> zero >> +(define_insn_and_split "*lhz_cmpldi_cr0_HI_clobber_CCUNS_zero" >> + [(set (match_operand:CCUNS 2 "cc_reg_operand" "=x") >> + (compare:CCUNS (match_operand:HI 1 "non_update_memory_operand" "m") >> + (match_operand:HI 3 "const_0_to_1_operand" "n"))) >> + (clobber (match_scratch:GPR 0 "=r"))] >> + "(TARGET_P10_FUSION && TARGET_P10_FUSION_LD_CMPI)" >> + "lhz%X1 %0,%1\;cmpldi 0,%0,%3" >> + "&& reload_completed >> + && (cc_reg_not_cr0_operand (operands[2], CCmode) >> + || !address_is_non_pfx_d_or_x (XEXP (operands[1],0), HImode, >> NON_PREFIXED_D))" >> + [(set (match_dup 0) (zero_extend:GPR (match_dup 1))) >> + (set (match_dup 2) >> + (compare:CCUNS (match_dup 0) >> + (match_dup 3)))] >> + "" >> + [(set_attr "type" "load") >> + (set_attr "cost" "8") >> + (set_attr "length" "8")]) >> + >> +;; load-cmpi fusion pattern generated by gen_ld_cmpi_p10 >> +;; load mode is HI result mode is EXTHI compare mode is CC extend is sign >> +(define_insn_and_split "*lha_cmpdi_cr0_HI_EXTHI_CC_sign" >> + [(set (match_operand:CC 2 "cc_reg_operand" "=x") >> + (compare:CC (match_operand:HI 1 
"non_update_memory_operand" "m") >> + (match_operand:HI 3 "const_m1_to_1_operand" "n"))) >> + (set (match_operand:EXTHI 0 "gpc_reg_operand" "=r") (sign_extend:EXTHI >> (match_dup 1)))] >> + "(TARGET_P10_FUSION && TARGET_P10_FUSION_LD_CMPI)" >> + "lha%X1 %0,%1\;cmpdi 0,%0,%3" >> + "&& reload_completed >> + && (cc_reg_not_cr0_operand (operands[2], CCmode) >> + || !address_is_non_pfx_d_or_x (XEXP (operands[1],0), HImode, >> NON_PREFIXED_D))" >> + [(set (match_dup 0) (sign_extend:EXTHI (match_dup 1))) >> + (set (match_dup 2) >> + (compare:CC (match_dup 0) >> + (match_dup 3)))] >> + "" >> + [(set_attr "type" "load") >> + (set_attr "cost" "8") >> + (set_attr "length" "8")]) >> + >> +;; load-cmpi fusion pattern generated by gen_ld_cmpi_p10 >> +;; load mode is HI result mode is EXTHI compare mode is CCUNS extend is zero >> +(define_insn_and_split "*lhz_cmpldi_cr0_HI_EXTHI_CCUNS_zero" >> + [(set (match_operand:CCUNS 2 "cc_reg_operand" "=x") >> + (compare:CCUNS (match_operand:HI 1 "non_update_memory_operand" "m") >> + (match_operand:HI 3 "const_0_to_1_operand" "n"))) >> + (set (match_operand:EXTHI 0 "gpc_reg_operand" "=r") (zero_extend:EXTHI >> (match_dup 1)))] >> + "(TARGET_P10_FUSION && TARGET_P10_FUSION_LD_CMPI)" >> + "lhz%X1 %0,%1\;cmpldi 0,%0,%3" >> + "&& reload_completed >> + && (cc_reg_not_cr0_operand (operands[2], CCmode) >> + || !address_is_non_pfx_d_or_x (XEXP (operands[1],0), HImode, >> NON_PREFIXED_D))" >> + [(set (match_dup 0) (zero_extend:EXTHI (match_dup 1))) >> + (set (match_dup 2) >> + (compare:CCUNS (match_dup 0) >> + (match_dup 3)))] >> + "" >> + [(set_attr "type" "load") >> + (set_attr "cost" "8") >> + (set_attr "length" "8")]) >> + >> +;; load-cmpi fusion pattern generated by gen_ld_cmpi_p10 >> +;; load mode is QI result mode is clobber compare mode is CCUNS extend is >> zero >> +(define_insn_and_split "*lbz_cmpldi_cr0_QI_clobber_CCUNS_zero" >> + [(set (match_operand:CCUNS 2 "cc_reg_operand" "=x") >> + (compare:CCUNS (match_operand:QI 1 
"non_update_memory_operand" "m") >> + (match_operand:QI 3 "const_0_to_1_operand" "n"))) >> + (clobber (match_scratch:GPR 0 "=r"))] >> + "(TARGET_P10_FUSION && TARGET_P10_FUSION_LD_CMPI)" >> + "lbz%X1 %0,%1\;cmpldi 0,%0,%3" >> + "&& reload_completed >> + && (cc_reg_not_cr0_operand (operands[2], CCmode) >> + || !address_is_non_pfx_d_or_x (XEXP (operands[1],0), QImode, >> NON_PREFIXED_D))" >> + [(set (match_dup 0) (zero_extend:GPR (match_dup 1))) >> + (set (match_dup 2) >> + (compare:CCUNS (match_dup 0) >> + (match_dup 3)))] >> + "" >> + [(set_attr "type" "load") >> + (set_attr "cost" "8") >> + (set_attr "length" "8")]) >> + >> +;; load-cmpi fusion pattern generated by gen_ld_cmpi_p10 >> +;; load mode is QI result mode is GPR compare mode is CCUNS extend is zero >> +(define_insn_and_split "*lbz_cmpldi_cr0_QI_GPR_CCUNS_zero" >> + [(set (match_operand:CCUNS 2 "cc_reg_operand" "=x") >> + (compare:CCUNS (match_operand:QI 1 "non_update_memory_operand" "m") >> + (match_operand:QI 3 "const_0_to_1_operand" "n"))) >> + (set (match_operand:GPR 0 "gpc_reg_operand" "=r") (zero_extend:GPR >> (match_dup 1)))] >> + "(TARGET_P10_FUSION && TARGET_P10_FUSION_LD_CMPI)" >> + "lbz%X1 %0,%1\;cmpldi 0,%0,%3" >> + "&& reload_completed >> + && (cc_reg_not_cr0_operand (operands[2], CCmode) >> + || !address_is_non_pfx_d_or_x (XEXP (operands[1],0), QImode, >> NON_PREFIXED_D))" >> + [(set (match_dup 0) (zero_extend:GPR (match_dup 1))) >> + (set (match_dup 2) >> + (compare:CCUNS (match_dup 0) >> + (match_dup 3)))] >> + "" >> + [(set_attr "type" "load") >> + (set_attr "cost" "8") >> + (set_attr "length" "8")]) >> + >> diff --git a/gcc/config/rs6000/genfusion.pl b/gcc/config/rs6000/genfusion.pl >> new file mode 100755 >> index 00000000000..494537c9439 >> --- /dev/null >> +++ b/gcc/config/rs6000/genfusion.pl >> @@ -0,0 +1,144 @@ >> +#!/usr/bin/perl -w >> +# Generate fusion.md >> +# Copyright (C) 2020 Free Software Foundation, Inc. >> +# >> +# This file is part of GCC. 
>> +# >> +# GCC is free software; you can redistribute it and/or modify >> +# it under the terms of the GNU General Public License as published by >> +# the Free Software Foundation; either version 3, or (at your option) >> +# any later version. >> +# >> +# GCC is distributed in the hope that it will be useful, >> +# but WITHOUT ANY WARRANTY; without even the implied warranty of >> +# MERCHANTABILITY or FITNESS FOR A PARTICULAR PURPOSE. See the >> +# GNU General Public License for more details. >> +# >> +# You should have received a copy of the GNU General Public License >> +# along with GCC; see the file COPYING3. If not see >> +# <http://www.gnu.org/licenses/>. >> + >> +my $copyright = <<'EOF'; >> +;; -*- buffer-read-only: t -*- >> +;; Generated automatically by genfusion.pl >> + >> +;; Copyright (C) 2020 Free Software Foundation, Inc. >> +;; >> +;; This file is part of GCC. >> +;; >> +;; GCC is free software; you can redistribute it and/or modify it under >> +;; the terms of the GNU General Public License as published by the Free >> +;; Software Foundation; either version 3, or (at your option) any later >> +;; version. >> +;; >> +;; GCC is distributed in the hope that it will be useful, but WITHOUT ANY >> +;; WARRANTY; without even the implied warranty of MERCHANTABILITY or >> +;; FITNESS FOR A PARTICULAR PURPOSE. See the GNU General Public License >> +;; for more details. >> +;; >> +;; You should have received a copy of the GNU General Public License >> +;; along with GCC; see the file COPYING3. If not see >> +;; <http://www.gnu.org/licenses/>. 
>> + >> +EOF >> + >> +print $copyright; >> + >> +sub mode_to_ldst_char >> +{ >> + my ($mode) = @_; >> + if ($mode eq 'DI') { return 'd'; } >> + if ($mode eq 'SI') { return 'w'; } >> + if ($mode eq 'HI') { return 'h'; } >> + if ($mode eq 'QI') { return 'b'; } >> + return '?'; >> +} >> + >> +sub gen_ld_cmpi_p10 >> +{ >> + LMODE: foreach $lmode ('DI','SI','HI','QI') { >> + $ldst = mode_to_ldst_char($lmode); >> + $clobbermode = $lmode; >> + # For clobber, we need a SI/DI reg in case we split because we have >> to sign/zero extend. >> + if ( $lmode eq 'HI' || $lmode eq 'QI' ) { $clobbermode = "GPR"; } >> + RESULT: foreach $result ('clobber', $lmode, "EXT".$lmode) { >> + # EXTDI does not exist, and we cannot directly produce HI/QI results. >> + next RESULT if $result eq "EXTDI" || $result eq "HI" || $result eq "QI"; >> + # Don't allow EXTQI because that would allow HI result which we can't >> do. >> + if ( $result eq "EXTQI" ) { $result = "GPR"; } >> + CCMODE: foreach $ccmode ('CC','CCUNS') { >> + $np = "NON_PREFIXED_D"; >> + if ( $ccmode eq 'CC' ) { >> + next CCMODE if $lmode eq 'QI'; >> + if ( $lmode eq 'DI' || $lmode eq 'SI' ) { >> + # ld and lwa are both DS-FORM. >> + $np = "NON_PREFIXED_DS"; >> + } >> + $cmpl = ""; >> + $echr = "a"; >> + $constpred = "const_m1_to_1_operand"; >> + } else { >> + if ( $lmode eq 'DI' ) { >> + # ld is DS-form, but lwz is not. >> + $np = "NON_PREFIXED_DS"; >> + } >> + $cmpl = "l"; >> + $echr = "z"; >> + $constpred = "const_0_to_1_operand"; >> + } >> + if ($lmode eq 'DI') { $echr = ""; } >> + if ($result =~ m/EXT/ || $result eq 'GPR' || $clobbermode eq 'GPR') { >> + # We always need extension if result > lmode. >> + if ( $ccmode eq 'CC' ) { >> + $extend = "sign"; >> + } else { >> + $extend = "zero"; >> + } >> + } else { >> + # Result of SI/DI does not need sign extension. 
>> + $extend = "none"; >> + } >> + print ";; load-cmpi fusion pattern generated by gen_ld_cmpi_p10\n"; >> + print ";; load mode is $lmode result mode is $result compare mode is >> $ccmode extend is $extend\n"; >> + >> + print "(define_insn_and_split >> \"*l${ldst}${echr}_cmp${cmpl}di_cr0_${lmode}_${result}_${ccmode}_${extend}\"\n"; >> + print " [(set (match_operand:${ccmode} 2 \"cc_reg_operand\" >> \"=x\")\n"; >> + print " (compare:${ccmode} (match_operand:${lmode} 1 >> \"non_update_memory_operand\" \"m\")\n"; >> + print " (match_operand:${lmode} 3 \"${constpred}\" >> \"n\")))\n"; >> + if ($result eq 'clobber') { >> + print " (clobber (match_scratch:${clobbermode} 0 \"=r\"))]\n"; >> + } elsif ($result eq $lmode) { >> + print " (set (match_operand:${result} 0 \"gpc_reg_operand\" >> \"=r\") (match_dup 1))]\n"; >> + } else { >> + print " (set (match_operand:${result} 0 \"gpc_reg_operand\" >> \"=r\") (${extend}_extend:${result} (match_dup 1)))]\n"; >> + } >> + print " \"(TARGET_P10_FUSION && TARGET_P10_FUSION_LD_CMPI)\"\n"; >> + print " \"l${ldst}${echr}%X1 %0,%1\\;cmp${cmpl}di 0,%0,%3\"\n"; >> + print " \"&& reload_completed\n"; >> + print " && (cc_reg_not_cr0_operand (operands[2], CCmode)\n"; >> + print " || !address_is_non_pfx_d_or_x (XEXP (operands[1],0), >> ${lmode}mode, ${np}))\"\n"; >> + if ($extend eq "none") { >> + print " [(set (match_dup 0) (match_dup 1))\n"; >> + } else { >> + $resultmode = $result; >> + if ( $result eq 'clobber' ) { $resultmode = $clobbermode } >> + print " [(set (match_dup 0) (${extend}_extend:${resultmode} >> (match_dup 1)))\n"; >> + } >> + print " (set (match_dup 2)\n"; >> + print " (compare:${ccmode} (match_dup 0)\n"; >> + print " (match_dup 3)))]\n"; >> + print " \"\"\n"; >> + print " [(set_attr \"type\" \"load\")\n"; >> + print " (set_attr \"cost\" \"8\")\n"; >> + print " (set_attr \"length\" \"8\")])\n"; >> + print "\n"; >> + } >> + } >> + } >> +} >> + >> + >> +gen_ld_cmpi_p10(); >> + >> +exit(0); >> + >> diff --git 
a/gcc/config/rs6000/predicates.md >> b/gcc/config/rs6000/predicates.md >> index 9ad5ae67302..78de8102f44 100644 >> --- a/gcc/config/rs6000/predicates.md >> +++ b/gcc/config/rs6000/predicates.md >> @@ -297,6 +297,11 @@ (define_predicate "const_0_to_1_operand" >> (and (match_code "const_int") >> (match_test "IN_RANGE (INTVAL (op), 0, 1)"))) >> >> +;; Match op = -1, op = 0, or op = 1. >> +(define_predicate "const_m1_to_1_operand" >> + (and (match_code "const_int") >> + (match_test "IN_RANGE (INTVAL (op), -1, 1)"))) >> + >> ;; Match op = 0..3. >> (define_predicate "const_0_to_3_operand" >> (and (match_code "const_int") >> @@ -847,6 +852,15 @@ (define_special_predicate "update_address_mem" >> || GET_CODE (XEXP (op, 0)) == PRE_DEC >> || GET_CODE (XEXP (op, 0)) == PRE_MODIFY))")) >> >> +;; Anything that matches memory_operand but does not update the address. >> +(define_predicate "non_update_memory_operand" >> + (match_code "mem") >> +{ >> + if (update_address_mem (op, mode)) >> + return 0; >> + return memory_operand (op, mode); >> +}) >> + >> ;; Return 1 if the operand is a MEM with an indexed-form address. >> (define_special_predicate "indexed_address_mem" >> (match_test "(MEM_P (op) >> diff --git a/gcc/config/rs6000/rs6000-cpus.def >> b/gcc/config/rs6000/rs6000-cpus.def >> index 8d2c1ffd6cf..3e65289d8df 100644 >> --- a/gcc/config/rs6000/rs6000-cpus.def >> +++ b/gcc/config/rs6000/rs6000-cpus.def >> @@ -82,7 +82,9 @@ >> >> #define ISA_3_1_MASKS_SERVER (ISA_3_0_MASKS_SERVER \ >> | OPTION_MASK_POWER10 \ >> - | OTHER_POWER10_MASKS) >> + | OTHER_POWER10_MASKS \ >> + | OPTION_MASK_P10_FUSION \ >> + | OPTION_MASK_P10_FUSION_LD_CMPI) >> >> /* Flags that need to be turned off if -mno-power9-vector. 
*/ >> #define OTHER_P9_VECTOR_MASKS (OPTION_MASK_FLOAT128_HW >> \ >> @@ -129,6 +131,8 @@ >> | OPTION_MASK_FLOAT128_KEYWORD \ >> | OPTION_MASK_FPRND \ >> | OPTION_MASK_POWER10 \ >> + | OPTION_MASK_P10_FUSION \ >> + | OPTION_MASK_P10_FUSION_LD_CMPI \ >> | OPTION_MASK_HTM \ >> | OPTION_MASK_ISEL \ >> | OPTION_MASK_MFCRF \ >> diff --git a/gcc/config/rs6000/rs6000-protos.h >> b/gcc/config/rs6000/rs6000-protos.h >> index 3c4682b0e26..cd644083558 100644 >> --- a/gcc/config/rs6000/rs6000-protos.h >> +++ b/gcc/config/rs6000/rs6000-protos.h >> @@ -191,6 +191,8 @@ enum non_prefixed_form { >> >> extern enum insn_form address_to_insn_form (rtx, machine_mode, >> enum non_prefixed_form); >> +extern bool address_is_non_pfx_d_or_x (rtx addr, machine_mode mode, >> + enum non_prefixed_form >> non_prefix_format); >> extern bool prefixed_load_p (rtx_insn *); >> extern bool prefixed_store_p (rtx_insn *); >> extern bool prefixed_paddi_p (rtx_insn *); >> diff --git a/gcc/config/rs6000/rs6000.c b/gcc/config/rs6000/rs6000.c >> index 517467ebc63..759551d07ec 100644 >> --- a/gcc/config/rs6000/rs6000.c >> +++ b/gcc/config/rs6000/rs6000.c >> @@ -4423,6 +4423,12 @@ rs6000_option_override_internal (bool global_init_p) >> if (TARGET_POWER10 && (rs6000_isa_flags_explicit & OPTION_MASK_MMA) == 0) >> rs6000_isa_flags |= OPTION_MASK_MMA; >> >> + if (TARGET_POWER10 && (rs6000_isa_flags_explicit & >> OPTION_MASK_P10_FUSION) == 0) >> + rs6000_isa_flags |= OPTION_MASK_P10_FUSION; >> + >> + if (TARGET_POWER10 && (rs6000_isa_flags_explicit & >> OPTION_MASK_P10_FUSION_LD_CMPI) == 0) >> + rs6000_isa_flags |= OPTION_MASK_P10_FUSION_LD_CMPI; >> + >> /* Turn off vector pair/mma options on non-power10 systems. 
*/ >> else if (!TARGET_POWER10 && TARGET_MMA) >> { >> @@ -23614,6 +23620,7 @@ static struct rs6000_opt_mask const >> rs6000_opt_masks[] = >> { "power9-minmax", OPTION_MASK_P9_MINMAX, false, true }, >> { "power9-misc", OPTION_MASK_P9_MISC, false, true }, >> { "power9-vector", OPTION_MASK_P9_VECTOR, false, true }, >> + { "power10-fusion", OPTION_MASK_P10_FUSION, false, >> true }, >> { "powerpc-gfxopt", OPTION_MASK_PPC_GFXOPT, false, true }, >> { "powerpc-gpopt", OPTION_MASK_PPC_GPOPT, false, true }, >> { "prefixed", OPTION_MASK_PREFIXED, false, >> true }, >> @@ -25705,6 +25712,50 @@ address_to_insn_form (rtx addr, >> return INSN_FORM_BAD; >> } >> >> +/* Given address rtx ADDR for a load of MODE, is this legitimate for a >> + non-prefixed D-form or X-form instruction? NON_PREFIXED_FORMAT is >> + given NON_PREFIXED_D or NON_PREFIXED_DS to indicate whether we want >> + a D-form or DS-form instruction. X-form and base_reg are always >> + allowed. */ >> +bool >> +address_is_non_pfx_d_or_x (rtx addr, machine_mode mode, >> + enum non_prefixed_form non_prefixed_format) >> +{ >> + enum insn_form result_form; >> + >> + result_form = address_to_insn_form (addr, mode, non_prefixed_format); >> + >> + switch (non_prefixed_format) >> + { >> + case NON_PREFIXED_D: >> + switch (result_form) >> + { >> + case INSN_FORM_X: >> + case INSN_FORM_D: >> + case INSN_FORM_DS: >> + case INSN_FORM_BASE_REG: >> + return true; >> + default: >> + break; >> + } >> + break; >> + case NON_PREFIXED_DS: >> + switch (result_form) >> + { >> + case INSN_FORM_X: >> + case INSN_FORM_DS: >> + case INSN_FORM_BASE_REG: >> + return true; >> + default: >> + break; >> + } >> + break; >> + default: >> + break; >> + } >> + return false; >> +} >> + >> /* Helper function to see if we're potentially looking at lfs/stfs. 
>> - PARALLEL containing a SET and a CLOBBER >> - stfs: >> diff --git a/gcc/config/rs6000/rs6000.h b/gcc/config/rs6000/rs6000.h >> index 5bf9c83fc1e..307c0b200bd 100644 >> --- a/gcc/config/rs6000/rs6000.h >> +++ b/gcc/config/rs6000/rs6000.h >> @@ -539,6 +539,7 @@ extern int rs6000_vector_align[]; >> #define MASK_UPDATE OPTION_MASK_UPDATE >> #define MASK_VSX OPTION_MASK_VSX >> #define MASK_POWER10 OPTION_MASK_POWER10 >> +#define MASK_P10_FUSION OPTION_MASK_P10_FUSION >> >> #ifndef IN_LIBGCC2 >> #define MASK_POWERPC64 OPTION_MASK_POWERPC64 >> diff --git a/gcc/config/rs6000/rs6000.md b/gcc/config/rs6000/rs6000.md >> index b89990f46bf..c39b7098978 100644 >> --- a/gcc/config/rs6000/rs6000.md >> +++ b/gcc/config/rs6000/rs6000.md >> @@ -14926,3 +14926,4 @@ (define_insn "*cmpeqb_internal" >> (include "dfp.md") >> (include "crypto.md") >> (include "htm.md") >> +(include "fusion.md") >> diff --git a/gcc/config/rs6000/rs6000.opt b/gcc/config/rs6000/rs6000.opt >> index 2888172cb27..008a318b98d 100644 >> --- a/gcc/config/rs6000/rs6000.opt >> +++ b/gcc/config/rs6000/rs6000.opt >> @@ -479,6 +479,14 @@ mpower8-vector >> Target Report Mask(P8_VECTOR) Var(rs6000_isa_flags) >> Use vector and scalar instructions added in ISA 2.07. >> >> +mpower10-fusion >> +Target Report Mask(P10_FUSION) Var(rs6000_isa_flags) >> +Fuse certain integer operations together for better performance on power10. >> + >> +mpower10-fusion-ld-cmpi >> +Target Undocumented Mask(P10_FUSION_LD_CMPI) Var(rs6000_isa_flags) >> +Fuse certain integer operations together for better performance on power10. >> + >> mcrypto >> Target Report Mask(CRYPTO) Var(rs6000_isa_flags) >> Use ISA 2.07 Category:Vector.AES and Category:Vector.SHA2 instructions. 
>> diff --git a/gcc/config/rs6000/t-rs6000 b/gcc/config/rs6000/t-rs6000 >> index 1ddb5729cb2..bcc71a9e21b 100644 >> --- a/gcc/config/rs6000/t-rs6000 >> +++ b/gcc/config/rs6000/t-rs6000 >> @@ -47,6 +47,9 @@ rs6000-call.o: $(srcdir)/config/rs6000/rs6000-call.c >> $(COMPILE) $< >> $(POSTCOMPILE) >> >> +$(srcdir)/config/rs6000/fusion.md: $(srcdir)/config/rs6000/genfusion.pl >> + $(srcdir)/config/rs6000/genfusion.pl > $(srcdir)/config/rs6000/fusion.md >> + >> $(srcdir)/config/rs6000/rs6000-tables.opt: $(srcdir)/config/rs6000/genopt.sh >> \ >> $(srcdir)/config/rs6000/rs6000-cpus.def >> $(SHELL) $(srcdir)/config/rs6000/genopt.sh $(srcdir)/config/rs6000 > \ >> @@ -86,4 +89,5 @@ MD_INCLUDES = $(srcdir)/config/rs6000/rs64.md \ >> $(srcdir)/config/rs6000/mma.md \ >> $(srcdir)/config/rs6000/crypto.md \ >> $(srcdir)/config/rs6000/htm.md \ >> - $(srcdir)/config/rs6000/dfp.md >> + $(srcdir)/config/rs6000/dfp.md \ >> + $(srcdir)/config/rs6000/fusion.md >> -- >> 2.27.0 >> >
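
For anyone wanting to exercise the load+compare-immediate patterns quoted above, here is a minimal sketch of C inputs with the right shape. The function names and the model_* helpers are made up for illustration and are not part of the patch. Whether combine actually forms the fused insn is an assumption that depends on the options in effect (power10, or -mpower10-fusion with -mpower10-fusion-ld-cmpi on an earlier processor) and on the compare ending up in cr0.

```c
#include <assert.h>
#include <stdbool.h>
#include <stdint.h>

/* Plain C models of the immediate ranges the patch describes:
   -1/0/1 for signed compares, 0/1 for unsigned compares.  The real
   checks are RTL predicates in predicates.md; these are stand-ins.  */
static bool model_const_m1_to_1 (long v) { return v >= -1 && v <= 1; }
static bool model_const_0_to_1 (long v) { return v >= 0 && v <= 1; }

/* Signed 64-bit load compared with -1: the shape the ld + cmpdi
   pattern is meant to match.  */
int
load_eq_m1 (const int64_t *p)
{
  return *p == -1;
}

/* Unsigned 32-bit load compared with 1: the shape the lwz + cmpldi
   pattern is meant to match.  */
int
load_gtu_1 (const uint32_t *p)
{
  return *p > 1;
}

/* Immediate 2 is outside both ranges, so none of the ld-cmpi
   patterns should apply to this one.  */
int
load_eq_2 (const int64_t *p)
{
  return *p == 2;
}

/* Hypothetical self-check of the C semantics only; it says nothing
   about which instructions the compiler emits.  */
void
fusion_examples_selfcheck (void)
{
  int64_t d = -1;
  uint32_t w = 5;

  assert (load_eq_m1 (&d) == 1);
  assert (load_gtu_1 (&w) == 1);
  assert (load_eq_2 (&d) == 0);
  assert (model_const_m1_to_1 (-1) && !model_const_0_to_1 (-1));
  assert (!model_const_m1_to_1 (2) && !model_const_0_to_1 (2));
}
```

Compiling something like this to assembly and looking for the load and the compare sitting next to each other, with the compare on cr0, is one way to see whether the patterns took effect. That is only a suggestion for manual inspection, not a testcase from the patch.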