Ping.

Aaron Sawdey, Ph.D. saw...@linux.ibm.com
IBM Linux on POWER Toolchain
 

> On Jan 3, 2021, at 2:42 PM, Aaron Sawdey <acsaw...@linux.ibm.com> wrote:
> 
> Ping.
> 
> I assume we’re going to want a separate patch for the new instruction type.
> 
> Aaron Sawdey, Ph.D. saw...@linux.ibm.com
> IBM Linux on POWER Toolchain
> 
> 
>> On Dec 4, 2020, at 1:19 PM, acsaw...@linux.ibm.com wrote:
>> 
>> From: Aaron Sawdey <acsaw...@linux.ibm.com>
>> 
>> This patch adds the first batch of patterns to support p10 fusion. These
>> will allow combine to create a single insn for a pair of instructions
>> that power10 can fuse and execute. These particular ones have the
>> requirement that only cr0 can be used when fusing a load with a compare
>> immediate of -1/0/1 (if signed) or 0/1 (if unsigned), so we want combine
>> to put that requirement in, and if it doesn't work out later the splitter
>> can get used.
>> 
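For readers following the cr0 requirement, here is a minimal C sketch of the decision the splitter condition makes after reload. It is a standalone model, not GCC code: struct fused_pair, must_split and the field names are hypothetical, and the address check is reduced to a boolean.

```c
#include <stdbool.h>

/* Hypothetical model of one load + compare-immediate pair after
   register allocation.  None of these names exist in GCC.  */
struct fused_pair
{
  int cr_field;         /* condition register field chosen, 0..7 */
  bool addr_is_d_or_x;  /* address fits a non-prefixed D/DS/X-form load */
};

/* Return true if the pair has to be split back into a separate load
   and compare, mirroring the split condition in the patterns: the
   compare result is not in cr0, or the address is not a non-prefixed
   D/DS/X-form address.  */
bool
must_split (const struct fused_pair *p)
{
  return p->cr_field != 0 || !p->addr_is_d_or_x;
}
```

A pair that landed in cr0 with a suitable address stays as one insn; anything else goes back to two separate instructions.
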
>> The patterns are generated by the script genfusion.pl and live in the new file
>> fusion.md. This script will be expanded to generate more patterns for
>> fusion.
>> 
>> This also adds option -mpower10-fusion which defaults on for power10 and
>> will gate all these fusion patterns. In addition I have added an
>> undocumented option -mpower10-fusion-ld-cmpi (which may be removed later)
>> that just controls the load+compare-immediate patterns. I have made
>> these default on for power10 but they are not disallowed for earlier
>> processors because it is still valid code. This allows us to test the
>> correctness of fusion code generation by turning it on explicitly.
>> 
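As a sketch of how the explicit options might be exercised on an earlier processor, the functions below are the kind of source that loads a value and compares it against a small immediate. Whether combine actually forms the fused pattern for them is an assumption that would need to be checked in the generated assembly. The command line in the comment is also an assumption; it uses only the two options named in the patch description plus standard GCC flags.

```c
/* Assumed invocation (not taken from the patch):
     gcc -O2 -mcpu=power9 -mpower10-fusion -mpower10-fusion-ld-cmpi -S t.c
   then look for load/compare pairs that use cr0.  */

/* 64-bit load compared against 0: a candidate for ld + cmpdi.  */
int
is_zero (const long long *p)
{
  return *p == 0;
}

/* 64-bit load compared against -1: inside the signed -1/0/1 range.  */
int
is_minus_one (const long long *p)
{
  return *p == -1;
}

/* 32-bit load with an unsigned compare against 1: inside the
   unsigned 0/1 range, a candidate for lwz + cmpldi.  */
int
is_above_one (const unsigned int *p)
{
  return *p > 1;
}
```
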
>> If bootstrap/regtest is clean, ok for trunk?
>> 
>> Thanks!
>> 
>>  Aaron
>> 
>> gcc/ChangeLog:
>> 
>>      * config/rs6000/genfusion.pl: New file, script to generate
>>      define_insn_and_split patterns so combine can arrange fused
>>      instructions next to each other.
>>      * config/rs6000/fusion.md: New file, generated fused instruction
>>      patterns for combine.
>>      * config/rs6000/predicates.md (const_m1_to_1_operand): New predicate.
>>      (non_update_memory_operand): New predicate.
>>      * config/rs6000/rs6000-cpus.def: Add OPTION_MASK_P10_FUSION and
>>      OPTION_MASK_P10_FUSION_LD_CMPI to ISA_3_1_MASKS_SERVER and
>>      POWERPC_MASKS.
>>      * config/rs6000/rs6000-protos.h (address_is_non_pfx_d_or_x): Add
>>      prototype.
>>      * config/rs6000/rs6000.c (rs6000_option_override_internal):
>>      Automatically set -mpower10-fusion and -mpower10-fusion-ld-cmpi
>>      if target is power10.
>>      (rs6000_opt_masks): Allow -mpower10-fusion in function attributes.
>>      (address_is_non_pfx_d_or_x): New function.
>>      * config/rs6000/rs6000.h: Add MASK_P10_FUSION.
>>      * config/rs6000/rs6000.md: Include fusion.md.
>>      * config/rs6000/rs6000.opt: Add -mpower10-fusion
>>      and -mpower10-fusion-ld-cmpi.
>>      * config/rs6000/t-rs6000: Add dependencies involving fusion.md.
>> ---
>> gcc/config/rs6000/fusion.md       | 357 ++++++++++++++++++++++++++++++
>> gcc/config/rs6000/genfusion.pl    | 144 ++++++++++++
>> gcc/config/rs6000/predicates.md   |  14 ++
>> gcc/config/rs6000/rs6000-cpus.def |   6 +-
>> gcc/config/rs6000/rs6000-protos.h |   2 +
>> gcc/config/rs6000/rs6000.c        |  51 +++++
>> gcc/config/rs6000/rs6000.h        |   1 +
>> gcc/config/rs6000/rs6000.md       |   1 +
>> gcc/config/rs6000/rs6000.opt      |   8 +
>> gcc/config/rs6000/t-rs6000        |   6 +-
>> 10 files changed, 588 insertions(+), 2 deletions(-)
>> create mode 100644 gcc/config/rs6000/fusion.md
>> create mode 100755 gcc/config/rs6000/genfusion.pl
>> 
>> diff --git a/gcc/config/rs6000/fusion.md b/gcc/config/rs6000/fusion.md
>> new file mode 100644
>> index 00000000000..a4d3a6ae7f3
>> --- /dev/null
>> +++ b/gcc/config/rs6000/fusion.md
>> @@ -0,0 +1,357 @@
>> +;; -*- buffer-read-only: t -*-
>> +;; Generated automatically by genfusion.pl
>> +
>> +;; Copyright (C) 2020 Free Software Foundation, Inc.
>> +;;
>> +;; This file is part of GCC.
>> +;;
>> +;; GCC is free software; you can redistribute it and/or modify it under
>> +;; the terms of the GNU General Public License as published by the Free
>> +;; Software Foundation; either version 3, or (at your option) any later
>> +;; version.
>> +;;
>> +;; GCC is distributed in the hope that it will be useful, but WITHOUT ANY
>> +;; WARRANTY; without even the implied warranty of MERCHANTABILITY or
>> +;; FITNESS FOR A PARTICULAR PURPOSE.  See the GNU General Public License
>> +;; for more details.
>> +;;
>> +;; You should have received a copy of the GNU General Public License
>> +;; along with GCC; see the file COPYING3.  If not see
>> +;; <http://www.gnu.org/licenses/>.
>> +
>> +;; load-cmpi fusion pattern generated by gen_ld_cmpi_p10
>> +;; load mode is DI result mode is clobber compare mode is CC extend is none
>> +(define_insn_and_split "*ld_cmpdi_cr0_DI_clobber_CC_none"
>> +  [(set (match_operand:CC 2 "cc_reg_operand" "=x")
>> +        (compare:CC (match_operand:DI 1 "non_update_memory_operand" "m")
>> +                 (match_operand:DI 3 "const_m1_to_1_operand" "n")))
>> +   (clobber (match_scratch:DI 0 "=r"))]
>> +  "(TARGET_P10_FUSION && TARGET_P10_FUSION_LD_CMPI)"
>> +  "ld%X1 %0,%1\;cmpdi 0,%0,%3"
>> +  "&& reload_completed
>> +   && (cc_reg_not_cr0_operand (operands[2], CCmode)
>> +       || !address_is_non_pfx_d_or_x (XEXP (operands[1],0), DImode, NON_PREFIXED_DS))"
>> +  [(set (match_dup 0) (match_dup 1))
>> +   (set (match_dup 2)
>> +        (compare:CC (match_dup 0)
>> +                (match_dup 3)))]
>> +  ""
>> +  [(set_attr "type" "load")
>> +   (set_attr "cost" "8")
>> +   (set_attr "length" "8")])
>> +
>> +;; load-cmpi fusion pattern generated by gen_ld_cmpi_p10
>> +;; load mode is DI result mode is clobber compare mode is CCUNS extend is none
>> +(define_insn_and_split "*ld_cmpldi_cr0_DI_clobber_CCUNS_none"
>> +  [(set (match_operand:CCUNS 2 "cc_reg_operand" "=x")
>> +        (compare:CCUNS (match_operand:DI 1 "non_update_memory_operand" "m")
>> +                 (match_operand:DI 3 "const_0_to_1_operand" "n")))
>> +   (clobber (match_scratch:DI 0 "=r"))]
>> +  "(TARGET_P10_FUSION && TARGET_P10_FUSION_LD_CMPI)"
>> +  "ld%X1 %0,%1\;cmpldi 0,%0,%3"
>> +  "&& reload_completed
>> +   && (cc_reg_not_cr0_operand (operands[2], CCmode)
>> +       || !address_is_non_pfx_d_or_x (XEXP (operands[1],0), DImode, NON_PREFIXED_DS))"
>> +  [(set (match_dup 0) (match_dup 1))
>> +   (set (match_dup 2)
>> +        (compare:CCUNS (match_dup 0)
>> +                (match_dup 3)))]
>> +  ""
>> +  [(set_attr "type" "load")
>> +   (set_attr "cost" "8")
>> +   (set_attr "length" "8")])
>> +
>> +;; load-cmpi fusion pattern generated by gen_ld_cmpi_p10
>> +;; load mode is DI result mode is DI compare mode is CC extend is none
>> +(define_insn_and_split "*ld_cmpdi_cr0_DI_DI_CC_none"
>> +  [(set (match_operand:CC 2 "cc_reg_operand" "=x")
>> +        (compare:CC (match_operand:DI 1 "non_update_memory_operand" "m")
>> +                 (match_operand:DI 3 "const_m1_to_1_operand" "n")))
>> +   (set (match_operand:DI 0 "gpc_reg_operand" "=r") (match_dup 1))]
>> +  "(TARGET_P10_FUSION && TARGET_P10_FUSION_LD_CMPI)"
>> +  "ld%X1 %0,%1\;cmpdi 0,%0,%3"
>> +  "&& reload_completed
>> +   && (cc_reg_not_cr0_operand (operands[2], CCmode)
>> +       || !address_is_non_pfx_d_or_x (XEXP (operands[1],0), DImode, NON_PREFIXED_DS))"
>> +  [(set (match_dup 0) (match_dup 1))
>> +   (set (match_dup 2)
>> +        (compare:CC (match_dup 0)
>> +                (match_dup 3)))]
>> +  ""
>> +  [(set_attr "type" "load")
>> +   (set_attr "cost" "8")
>> +   (set_attr "length" "8")])
>> +
>> +;; load-cmpi fusion pattern generated by gen_ld_cmpi_p10
>> +;; load mode is DI result mode is DI compare mode is CCUNS extend is none
>> +(define_insn_and_split "*ld_cmpldi_cr0_DI_DI_CCUNS_none"
>> +  [(set (match_operand:CCUNS 2 "cc_reg_operand" "=x")
>> +        (compare:CCUNS (match_operand:DI 1 "non_update_memory_operand" "m")
>> +                 (match_operand:DI 3 "const_0_to_1_operand" "n")))
>> +   (set (match_operand:DI 0 "gpc_reg_operand" "=r") (match_dup 1))]
>> +  "(TARGET_P10_FUSION && TARGET_P10_FUSION_LD_CMPI)"
>> +  "ld%X1 %0,%1\;cmpldi 0,%0,%3"
>> +  "&& reload_completed
>> +   && (cc_reg_not_cr0_operand (operands[2], CCmode)
>> +       || !address_is_non_pfx_d_or_x (XEXP (operands[1],0), DImode, NON_PREFIXED_DS))"
>> +  [(set (match_dup 0) (match_dup 1))
>> +   (set (match_dup 2)
>> +        (compare:CCUNS (match_dup 0)
>> +                (match_dup 3)))]
>> +  ""
>> +  [(set_attr "type" "load")
>> +   (set_attr "cost" "8")
>> +   (set_attr "length" "8")])
>> +
>> +;; load-cmpi fusion pattern generated by gen_ld_cmpi_p10
>> +;; load mode is SI result mode is clobber compare mode is CC extend is none
>> +(define_insn_and_split "*lwa_cmpdi_cr0_SI_clobber_CC_none"
>> +  [(set (match_operand:CC 2 "cc_reg_operand" "=x")
>> +        (compare:CC (match_operand:SI 1 "non_update_memory_operand" "m")
>> +                 (match_operand:SI 3 "const_m1_to_1_operand" "n")))
>> +   (clobber (match_scratch:SI 0 "=r"))]
>> +  "(TARGET_P10_FUSION && TARGET_P10_FUSION_LD_CMPI)"
>> +  "lwa%X1 %0,%1\;cmpdi 0,%0,%3"
>> +  "&& reload_completed
>> +   && (cc_reg_not_cr0_operand (operands[2], CCmode)
>> +       || !address_is_non_pfx_d_or_x (XEXP (operands[1],0), SImode, NON_PREFIXED_DS))"
>> +  [(set (match_dup 0) (match_dup 1))
>> +   (set (match_dup 2)
>> +        (compare:CC (match_dup 0)
>> +                (match_dup 3)))]
>> +  ""
>> +  [(set_attr "type" "load")
>> +   (set_attr "cost" "8")
>> +   (set_attr "length" "8")])
>> +
>> +;; load-cmpi fusion pattern generated by gen_ld_cmpi_p10
>> +;; load mode is SI result mode is clobber compare mode is CCUNS extend is none
>> +(define_insn_and_split "*lwz_cmpldi_cr0_SI_clobber_CCUNS_none"
>> +  [(set (match_operand:CCUNS 2 "cc_reg_operand" "=x")
>> +        (compare:CCUNS (match_operand:SI 1 "non_update_memory_operand" "m")
>> +                 (match_operand:SI 3 "const_0_to_1_operand" "n")))
>> +   (clobber (match_scratch:SI 0 "=r"))]
>> +  "(TARGET_P10_FUSION && TARGET_P10_FUSION_LD_CMPI)"
>> +  "lwz%X1 %0,%1\;cmpldi 0,%0,%3"
>> +  "&& reload_completed
>> +   && (cc_reg_not_cr0_operand (operands[2], CCmode)
>> +       || !address_is_non_pfx_d_or_x (XEXP (operands[1],0), SImode, NON_PREFIXED_D))"
>> +  [(set (match_dup 0) (match_dup 1))
>> +   (set (match_dup 2)
>> +        (compare:CCUNS (match_dup 0)
>> +                (match_dup 3)))]
>> +  ""
>> +  [(set_attr "type" "load")
>> +   (set_attr "cost" "8")
>> +   (set_attr "length" "8")])
>> +
>> +;; load-cmpi fusion pattern generated by gen_ld_cmpi_p10
>> +;; load mode is SI result mode is SI compare mode is CC extend is none
>> +(define_insn_and_split "*lwa_cmpdi_cr0_SI_SI_CC_none"
>> +  [(set (match_operand:CC 2 "cc_reg_operand" "=x")
>> +        (compare:CC (match_operand:SI 1 "non_update_memory_operand" "m")
>> +                 (match_operand:SI 3 "const_m1_to_1_operand" "n")))
>> +   (set (match_operand:SI 0 "gpc_reg_operand" "=r") (match_dup 1))]
>> +  "(TARGET_P10_FUSION && TARGET_P10_FUSION_LD_CMPI)"
>> +  "lwa%X1 %0,%1\;cmpdi 0,%0,%3"
>> +  "&& reload_completed
>> +   && (cc_reg_not_cr0_operand (operands[2], CCmode)
>> +       || !address_is_non_pfx_d_or_x (XEXP (operands[1],0), SImode, NON_PREFIXED_DS))"
>> +  [(set (match_dup 0) (match_dup 1))
>> +   (set (match_dup 2)
>> +        (compare:CC (match_dup 0)
>> +                (match_dup 3)))]
>> +  ""
>> +  [(set_attr "type" "load")
>> +   (set_attr "cost" "8")
>> +   (set_attr "length" "8")])
>> +
>> +;; load-cmpi fusion pattern generated by gen_ld_cmpi_p10
>> +;; load mode is SI result mode is SI compare mode is CCUNS extend is none
>> +(define_insn_and_split "*lwz_cmpldi_cr0_SI_SI_CCUNS_none"
>> +  [(set (match_operand:CCUNS 2 "cc_reg_operand" "=x")
>> +        (compare:CCUNS (match_operand:SI 1 "non_update_memory_operand" "m")
>> +                 (match_operand:SI 3 "const_0_to_1_operand" "n")))
>> +   (set (match_operand:SI 0 "gpc_reg_operand" "=r") (match_dup 1))]
>> +  "(TARGET_P10_FUSION && TARGET_P10_FUSION_LD_CMPI)"
>> +  "lwz%X1 %0,%1\;cmpldi 0,%0,%3"
>> +  "&& reload_completed
>> +   && (cc_reg_not_cr0_operand (operands[2], CCmode)
>> +       || !address_is_non_pfx_d_or_x (XEXP (operands[1],0), SImode, NON_PREFIXED_D))"
>> +  [(set (match_dup 0) (match_dup 1))
>> +   (set (match_dup 2)
>> +        (compare:CCUNS (match_dup 0)
>> +                (match_dup 3)))]
>> +  ""
>> +  [(set_attr "type" "load")
>> +   (set_attr "cost" "8")
>> +   (set_attr "length" "8")])
>> +
>> +;; load-cmpi fusion pattern generated by gen_ld_cmpi_p10
>> +;; load mode is SI result mode is EXTSI compare mode is CC extend is sign
>> +(define_insn_and_split "*lwa_cmpdi_cr0_SI_EXTSI_CC_sign"
>> +  [(set (match_operand:CC 2 "cc_reg_operand" "=x")
>> +        (compare:CC (match_operand:SI 1 "non_update_memory_operand" "m")
>> +                 (match_operand:SI 3 "const_m1_to_1_operand" "n")))
>> +   (set (match_operand:EXTSI 0 "gpc_reg_operand" "=r") (sign_extend:EXTSI (match_dup 1)))]
>> +  "(TARGET_P10_FUSION && TARGET_P10_FUSION_LD_CMPI)"
>> +  "lwa%X1 %0,%1\;cmpdi 0,%0,%3"
>> +  "&& reload_completed
>> +   && (cc_reg_not_cr0_operand (operands[2], CCmode)
>> +       || !address_is_non_pfx_d_or_x (XEXP (operands[1],0), SImode, NON_PREFIXED_DS))"
>> +  [(set (match_dup 0) (sign_extend:EXTSI (match_dup 1)))
>> +   (set (match_dup 2)
>> +        (compare:CC (match_dup 0)
>> +                (match_dup 3)))]
>> +  ""
>> +  [(set_attr "type" "load")
>> +   (set_attr "cost" "8")
>> +   (set_attr "length" "8")])
>> +
>> +;; load-cmpi fusion pattern generated by gen_ld_cmpi_p10
>> +;; load mode is SI result mode is EXTSI compare mode is CCUNS extend is zero
>> +(define_insn_and_split "*lwz_cmpldi_cr0_SI_EXTSI_CCUNS_zero"
>> +  [(set (match_operand:CCUNS 2 "cc_reg_operand" "=x")
>> +        (compare:CCUNS (match_operand:SI 1 "non_update_memory_operand" "m")
>> +                 (match_operand:SI 3 "const_0_to_1_operand" "n")))
>> +   (set (match_operand:EXTSI 0 "gpc_reg_operand" "=r") (zero_extend:EXTSI (match_dup 1)))]
>> +  "(TARGET_P10_FUSION && TARGET_P10_FUSION_LD_CMPI)"
>> +  "lwz%X1 %0,%1\;cmpldi 0,%0,%3"
>> +  "&& reload_completed
>> +   && (cc_reg_not_cr0_operand (operands[2], CCmode)
>> +       || !address_is_non_pfx_d_or_x (XEXP (operands[1],0), SImode, NON_PREFIXED_D))"
>> +  [(set (match_dup 0) (zero_extend:EXTSI (match_dup 1)))
>> +   (set (match_dup 2)
>> +        (compare:CCUNS (match_dup 0)
>> +                (match_dup 3)))]
>> +  ""
>> +  [(set_attr "type" "load")
>> +   (set_attr "cost" "8")
>> +   (set_attr "length" "8")])
>> +
>> +;; load-cmpi fusion pattern generated by gen_ld_cmpi_p10
>> +;; load mode is HI result mode is clobber compare mode is CC extend is sign
>> +(define_insn_and_split "*lha_cmpdi_cr0_HI_clobber_CC_sign"
>> +  [(set (match_operand:CC 2 "cc_reg_operand" "=x")
>> +        (compare:CC (match_operand:HI 1 "non_update_memory_operand" "m")
>> +                 (match_operand:HI 3 "const_m1_to_1_operand" "n")))
>> +   (clobber (match_scratch:GPR 0 "=r"))]
>> +  "(TARGET_P10_FUSION && TARGET_P10_FUSION_LD_CMPI)"
>> +  "lha%X1 %0,%1\;cmpdi 0,%0,%3"
>> +  "&& reload_completed
>> +   && (cc_reg_not_cr0_operand (operands[2], CCmode)
>> +       || !address_is_non_pfx_d_or_x (XEXP (operands[1],0), HImode, NON_PREFIXED_D))"
>> +  [(set (match_dup 0) (sign_extend:GPR (match_dup 1)))
>> +   (set (match_dup 2)
>> +        (compare:CC (match_dup 0)
>> +                (match_dup 3)))]
>> +  ""
>> +  [(set_attr "type" "load")
>> +   (set_attr "cost" "8")
>> +   (set_attr "length" "8")])
>> +
>> +;; load-cmpi fusion pattern generated by gen_ld_cmpi_p10
>> +;; load mode is HI result mode is clobber compare mode is CCUNS extend is zero
>> +(define_insn_and_split "*lhz_cmpldi_cr0_HI_clobber_CCUNS_zero"
>> +  [(set (match_operand:CCUNS 2 "cc_reg_operand" "=x")
>> +        (compare:CCUNS (match_operand:HI 1 "non_update_memory_operand" "m")
>> +                 (match_operand:HI 3 "const_0_to_1_operand" "n")))
>> +   (clobber (match_scratch:GPR 0 "=r"))]
>> +  "(TARGET_P10_FUSION && TARGET_P10_FUSION_LD_CMPI)"
>> +  "lhz%X1 %0,%1\;cmpldi 0,%0,%3"
>> +  "&& reload_completed
>> +   && (cc_reg_not_cr0_operand (operands[2], CCmode)
>> +       || !address_is_non_pfx_d_or_x (XEXP (operands[1],0), HImode, NON_PREFIXED_D))"
>> +  [(set (match_dup 0) (zero_extend:GPR (match_dup 1)))
>> +   (set (match_dup 2)
>> +        (compare:CCUNS (match_dup 0)
>> +                (match_dup 3)))]
>> +  ""
>> +  [(set_attr "type" "load")
>> +   (set_attr "cost" "8")
>> +   (set_attr "length" "8")])
>> +
>> +;; load-cmpi fusion pattern generated by gen_ld_cmpi_p10
>> +;; load mode is HI result mode is EXTHI compare mode is CC extend is sign
>> +(define_insn_and_split "*lha_cmpdi_cr0_HI_EXTHI_CC_sign"
>> +  [(set (match_operand:CC 2 "cc_reg_operand" "=x")
>> +        (compare:CC (match_operand:HI 1 "non_update_memory_operand" "m")
>> +                 (match_operand:HI 3 "const_m1_to_1_operand" "n")))
>> +   (set (match_operand:EXTHI 0 "gpc_reg_operand" "=r") (sign_extend:EXTHI (match_dup 1)))]
>> +  "(TARGET_P10_FUSION && TARGET_P10_FUSION_LD_CMPI)"
>> +  "lha%X1 %0,%1\;cmpdi 0,%0,%3"
>> +  "&& reload_completed
>> +   && (cc_reg_not_cr0_operand (operands[2], CCmode)
>> +       || !address_is_non_pfx_d_or_x (XEXP (operands[1],0), HImode, NON_PREFIXED_D))"
>> +  [(set (match_dup 0) (sign_extend:EXTHI (match_dup 1)))
>> +   (set (match_dup 2)
>> +        (compare:CC (match_dup 0)
>> +                (match_dup 3)))]
>> +  ""
>> +  [(set_attr "type" "load")
>> +   (set_attr "cost" "8")
>> +   (set_attr "length" "8")])
>> +
>> +;; load-cmpi fusion pattern generated by gen_ld_cmpi_p10
>> +;; load mode is HI result mode is EXTHI compare mode is CCUNS extend is zero
>> +(define_insn_and_split "*lhz_cmpldi_cr0_HI_EXTHI_CCUNS_zero"
>> +  [(set (match_operand:CCUNS 2 "cc_reg_operand" "=x")
>> +        (compare:CCUNS (match_operand:HI 1 "non_update_memory_operand" "m")
>> +                 (match_operand:HI 3 "const_0_to_1_operand" "n")))
>> +   (set (match_operand:EXTHI 0 "gpc_reg_operand" "=r") (zero_extend:EXTHI (match_dup 1)))]
>> +  "(TARGET_P10_FUSION && TARGET_P10_FUSION_LD_CMPI)"
>> +  "lhz%X1 %0,%1\;cmpldi 0,%0,%3"
>> +  "&& reload_completed
>> +   && (cc_reg_not_cr0_operand (operands[2], CCmode)
>> +       || !address_is_non_pfx_d_or_x (XEXP (operands[1],0), HImode, NON_PREFIXED_D))"
>> +  [(set (match_dup 0) (zero_extend:EXTHI (match_dup 1)))
>> +   (set (match_dup 2)
>> +        (compare:CCUNS (match_dup 0)
>> +                (match_dup 3)))]
>> +  ""
>> +  [(set_attr "type" "load")
>> +   (set_attr "cost" "8")
>> +   (set_attr "length" "8")])
>> +
>> +;; load-cmpi fusion pattern generated by gen_ld_cmpi_p10
>> +;; load mode is QI result mode is clobber compare mode is CCUNS extend is zero
>> +(define_insn_and_split "*lbz_cmpldi_cr0_QI_clobber_CCUNS_zero"
>> +  [(set (match_operand:CCUNS 2 "cc_reg_operand" "=x")
>> +        (compare:CCUNS (match_operand:QI 1 "non_update_memory_operand" "m")
>> +                 (match_operand:QI 3 "const_0_to_1_operand" "n")))
>> +   (clobber (match_scratch:GPR 0 "=r"))]
>> +  "(TARGET_P10_FUSION && TARGET_P10_FUSION_LD_CMPI)"
>> +  "lbz%X1 %0,%1\;cmpldi 0,%0,%3"
>> +  "&& reload_completed
>> +   && (cc_reg_not_cr0_operand (operands[2], CCmode)
>> +       || !address_is_non_pfx_d_or_x (XEXP (operands[1],0), QImode, NON_PREFIXED_D))"
>> +  [(set (match_dup 0) (zero_extend:GPR (match_dup 1)))
>> +   (set (match_dup 2)
>> +        (compare:CCUNS (match_dup 0)
>> +                (match_dup 3)))]
>> +  ""
>> +  [(set_attr "type" "load")
>> +   (set_attr "cost" "8")
>> +   (set_attr "length" "8")])
>> +
>> +;; load-cmpi fusion pattern generated by gen_ld_cmpi_p10
>> +;; load mode is QI result mode is GPR compare mode is CCUNS extend is zero
>> +(define_insn_and_split "*lbz_cmpldi_cr0_QI_GPR_CCUNS_zero"
>> +  [(set (match_operand:CCUNS 2 "cc_reg_operand" "=x")
>> +        (compare:CCUNS (match_operand:QI 1 "non_update_memory_operand" "m")
>> +                 (match_operand:QI 3 "const_0_to_1_operand" "n")))
>> +   (set (match_operand:GPR 0 "gpc_reg_operand" "=r") (zero_extend:GPR (match_dup 1)))]
>> +  "(TARGET_P10_FUSION && TARGET_P10_FUSION_LD_CMPI)"
>> +  "lbz%X1 %0,%1\;cmpldi 0,%0,%3"
>> +  "&& reload_completed
>> +   && (cc_reg_not_cr0_operand (operands[2], CCmode)
>> +       || !address_is_non_pfx_d_or_x (XEXP (operands[1],0), QImode, NON_PREFIXED_D))"
>> +  [(set (match_dup 0) (zero_extend:GPR (match_dup 1)))
>> +   (set (match_dup 2)
>> +        (compare:CCUNS (match_dup 0)
>> +                (match_dup 3)))]
>> +  ""
>> +  [(set_attr "type" "load")
>> +   (set_attr "cost" "8")
>> +   (set_attr "length" "8")])
>> +
>> diff --git a/gcc/config/rs6000/genfusion.pl b/gcc/config/rs6000/genfusion.pl
>> new file mode 100755
>> index 00000000000..494537c9439
>> --- /dev/null
>> +++ b/gcc/config/rs6000/genfusion.pl
>> @@ -0,0 +1,144 @@
>> +#!/usr/bin/perl -w
>> +# Generate fusion.md 
>> +# Copyright (C) 2020 Free Software Foundation, Inc.
>> +#
>> +# This file is part of GCC.
>> +#
>> +# GCC is free software; you can redistribute it and/or modify
>> +# it under the terms of the GNU General Public License as published by
>> +# the Free Software Foundation; either version 3, or (at your option)
>> +# any later version.
>> +#
>> +# GCC is distributed in the hope that it will be useful,
>> +# but WITHOUT ANY WARRANTY; without even the implied warranty of
>> +# MERCHANTABILITY or FITNESS FOR A PARTICULAR PURPOSE.  See the
>> +# GNU General Public License for more details.
>> +#
>> +# You should have received a copy of the GNU General Public License
>> +# along with GCC; see the file COPYING3.  If not see
>> +# <http://www.gnu.org/licenses/>.
>> +
>> +my $copyright =  <<'EOF';
>> +;; -*- buffer-read-only: t -*-
>> +;; Generated automatically by genfusion.pl
>> +
>> +;; Copyright (C) 2020 Free Software Foundation, Inc.
>> +;;
>> +;; This file is part of GCC.
>> +;;
>> +;; GCC is free software; you can redistribute it and/or modify it under
>> +;; the terms of the GNU General Public License as published by the Free
>> +;; Software Foundation; either version 3, or (at your option) any later
>> +;; version.
>> +;;
>> +;; GCC is distributed in the hope that it will be useful, but WITHOUT ANY
>> +;; WARRANTY; without even the implied warranty of MERCHANTABILITY or
>> +;; FITNESS FOR A PARTICULAR PURPOSE.  See the GNU General Public License
>> +;; for more details.
>> +;;
>> +;; You should have received a copy of the GNU General Public License
>> +;; along with GCC; see the file COPYING3.  If not see
>> +;; <http://www.gnu.org/licenses/>.
>> +
>> +EOF
>> +
>> +print $copyright;
>> +
>> +sub mode_to_ldst_char
>> +{
>> +    my ($mode) = @_;
>> +    if ($mode eq 'DI') { return 'd'; }
>> +    if ($mode eq 'SI') { return 'w'; }
>> +    if ($mode eq 'HI') { return 'h'; }
>> +    if ($mode eq 'QI') { return 'b'; }
>> +    return '?';
>> +}
>> +
>> +sub gen_ld_cmpi_p10
>> +{
>> +  LMODE: foreach $lmode ('DI','SI','HI','QI') {
>> +      $ldst = mode_to_ldst_char($lmode);
>> +      $clobbermode = $lmode;
>> +      # For clobber, we need a SI/DI reg in case we split because we have to sign/zero extend.
>> +      if ( $lmode eq 'HI' || $lmode eq 'QI' ) { $clobbermode = "GPR"; }
>> +    RESULT: foreach $result ('clobber', $lmode,  "EXT".$lmode) {
>> +    # EXTDI does not exist, and we cannot directly produce HI/QI results.
>> +    next RESULT if $result eq "EXTDI" || $result eq "HI" || $result eq "QI";
>> +    # Don't allow EXTQI because that would allow HI result which we can't do.
>> +    if ( $result eq "EXTQI" ) { $result = "GPR"; }
>> +      CCMODE: foreach $ccmode ('CC','CCUNS') {
>> +      $np = "NON_PREFIXED_D";
>> +      if ( $ccmode eq 'CC' ) {
>> +          next CCMODE if $lmode eq 'QI';
>> +          if ( $lmode eq 'DI' || $lmode eq 'SI' ) {
>> +              # ld and lwa are both DS-FORM.
>> +              $np = "NON_PREFIXED_DS";
>> +          }
>> +          $cmpl = "";
>> +          $echr = "a";
>> +          $constpred = "const_m1_to_1_operand";
>> +      } else {
>> +          if ( $lmode eq 'DI' ) {
>> +              # ld is DS-form, but lwz is not.
>> +              $np = "NON_PREFIXED_DS";
>> +          }
>> +          $cmpl = "l";
>> +          $echr = "z";
>> +          $constpred = "const_0_to_1_operand";
>> +      }
>> +      if ($lmode eq 'DI') { $echr = ""; }
>> +      if ($result =~ m/EXT/ || $result eq 'GPR' || $clobbermode eq 'GPR') {
>> +          # We always need extension if result > lmode.
>> +          if ( $ccmode eq 'CC' ) {
>> +              $extend = "sign";
>> +          } else {
>> +              $extend = "zero";
>> +          }
>> +      } else {
>> +          # Result of SI/DI does not need sign extension.
>> +          $extend = "none";
>> +      }
>> +      print ";; load-cmpi fusion pattern generated by gen_ld_cmpi_p10\n";
>> +      print ";; load mode is $lmode result mode is $result compare mode is $ccmode extend is $extend\n";
>> +
>> +      print "(define_insn_and_split \"*l${ldst}${echr}_cmp${cmpl}di_cr0_${lmode}_${result}_${ccmode}_${extend}\"\n";
>> +      print "  [(set (match_operand:${ccmode} 2 \"cc_reg_operand\" \"=x\")\n";
>> +      print "        (compare:${ccmode} (match_operand:${lmode} 1 \"non_update_memory_operand\" \"m\")\n";
>> +      print "                 (match_operand:${lmode} 3 \"${constpred}\" \"n\")))\n";
>> +      if ($result eq 'clobber') {
>> +          print "   (clobber (match_scratch:${clobbermode} 0 \"=r\"))]\n";
>> +      } elsif ($result eq $lmode) {
>> +          print "   (set (match_operand:${result} 0 \"gpc_reg_operand\" \"=r\") (match_dup 1))]\n";
>> +      } else {
>> +          print "   (set (match_operand:${result} 0 \"gpc_reg_operand\" \"=r\") (${extend}_extend:${result} (match_dup 1)))]\n";
>> +      }
>> +      print "  \"(TARGET_P10_FUSION && TARGET_P10_FUSION_LD_CMPI)\"\n";
>> +      print "  \"l${ldst}${echr}%X1 %0,%1\\;cmp${cmpl}di 0,%0,%3\"\n";
>> +      print "  \"&& reload_completed\n";
>> +      print "   && (cc_reg_not_cr0_operand (operands[2], CCmode)\n";
>> +      print "       || !address_is_non_pfx_d_or_x (XEXP (operands[1],0), ${lmode}mode, ${np}))\"\n";
>> +      if ($extend eq "none") {
>> +          print "  [(set (match_dup 0) (match_dup 1))\n";
>> +      } else {
>> +          $resultmode = $result;
>> +          if ( $result eq 'clobber' ) { $resultmode = $clobbermode }
>> +          print "  [(set (match_dup 0) (${extend}_extend:${resultmode} (match_dup 1)))\n";
>> +      }
>> +      print "   (set (match_dup 2)\n";
>> +      print "        (compare:${ccmode} (match_dup 0)\n";
>> +      print "                   (match_dup 3)))]\n";
>> +      print "  \"\"\n";
>> +      print "  [(set_attr \"type\" \"load\")\n";
>> +      print "   (set_attr \"cost\" \"8\")\n";
>> +      print "   (set_attr \"length\" \"8\")])\n";
>> +      print "\n";
>> +      }
>> +    }
>> +  }
>> +}
>> +
>> +
>> +gen_ld_cmpi_p10();
>> +
>> +exit(0);
>> +
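To make the naming scheme in the script easier to follow, here is a small C restatement of how the load mnemonic is put together from the load mode and the compare signedness ('l', then the size letter, then the extension letter). The function load_mnemonic is hypothetical and exists only for illustration; the generator itself is the Perl script above.

```c
#include <stddef.h>
#include <string.h>

/* Hypothetical C restatement of how genfusion.pl picks the load
   mnemonic.  SIGNED_CMP is nonzero for the CC (signed) patterns and
   zero for the CCUNS patterns.  Returns NULL for combinations the
   script skips or does not know about.  */
const char *
load_mnemonic (const char *lmode, int signed_cmp)
{
  if (strcmp (lmode, "DI") == 0)
    return "ld";                        /* no extension letter for DI */
  if (strcmp (lmode, "SI") == 0)
    return signed_cmp ? "lwa" : "lwz";
  if (strcmp (lmode, "HI") == 0)
    return signed_cmp ? "lha" : "lhz";
  if (strcmp (lmode, "QI") == 0)
    return signed_cmp ? NULL : "lbz";   /* QI is only generated for CCUNS */
  return NULL;
}
```
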
>> diff --git a/gcc/config/rs6000/predicates.md b/gcc/config/rs6000/predicates.md
>> index 9ad5ae67302..78de8102f44 100644
>> --- a/gcc/config/rs6000/predicates.md
>> +++ b/gcc/config/rs6000/predicates.md
>> @@ -297,6 +297,11 @@ (define_predicate "const_0_to_1_operand"
>>  (and (match_code "const_int")
>>       (match_test "IN_RANGE (INTVAL (op), 0, 1)")))
>> 
>> +;; Match op = -1, op = 0, or op = 1.
>> +(define_predicate "const_m1_to_1_operand"
>> +  (and (match_code "const_int")
>> +       (match_test "IN_RANGE (INTVAL (op), -1, 1)")))
>> +
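The two immediate ranges can be restated in plain C as follows. This is only an illustration of the ranges the predicates accept; the helper names imm_ok_signed and imm_ok_unsigned are hypothetical and the real checks remain the predicates in predicates.md.

```c
#include <stdbool.h>

/* Range accepted by const_m1_to_1_operand: -1, 0 or 1.  */
bool
imm_ok_signed (long long value)
{
  return value >= -1 && value <= 1;
}

/* Range accepted by const_0_to_1_operand: 0 or 1.  */
bool
imm_ok_unsigned (long long value)
{
  return value >= 0 && value <= 1;
}
```
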
>> ;; Match op = 0..3.
>> (define_predicate "const_0_to_3_operand"
>>  (and (match_code "const_int")
>> @@ -847,6 +852,15 @@ (define_special_predicate "update_address_mem"
>>                  || GET_CODE (XEXP (op, 0)) == PRE_DEC
>>                  || GET_CODE (XEXP (op, 0)) == PRE_MODIFY))"))
>> 
>> +;; Anything that matches memory_operand but does not update the address.
>> +(define_predicate "non_update_memory_operand"
>> +  (match_code "mem")
>> +{
>> +  if (update_address_mem (op, mode))
>> +    return 0;
>> +  return memory_operand (op, mode);
>> +})
>> +
>> ;; Return 1 if the operand is a MEM with an indexed-form address.
>> (define_special_predicate "indexed_address_mem"
>>  (match_test "(MEM_P (op)
>> diff --git a/gcc/config/rs6000/rs6000-cpus.def b/gcc/config/rs6000/rs6000-cpus.def
>> index 8d2c1ffd6cf..3e65289d8df 100644
>> --- a/gcc/config/rs6000/rs6000-cpus.def
>> +++ b/gcc/config/rs6000/rs6000-cpus.def
>> @@ -82,7 +82,9 @@
>> 
>> #define ISA_3_1_MASKS_SERVER (ISA_3_0_MASKS_SERVER                   \
>>                               | OPTION_MASK_POWER10                  \
>> -                             | OTHER_POWER10_MASKS)
>> +                             | OTHER_POWER10_MASKS                  \
>> +                             | OPTION_MASK_P10_FUSION               \
>> +                             | OPTION_MASK_P10_FUSION_LD_CMPI)
>> 
>> /* Flags that need to be turned off if -mno-power9-vector.  */
>> #define OTHER_P9_VECTOR_MASKS        (OPTION_MASK_FLOAT128_HW                \
>> @@ -129,6 +131,8 @@
>>                               | OPTION_MASK_FLOAT128_KEYWORD         \
>>                               | OPTION_MASK_FPRND                    \
>>                               | OPTION_MASK_POWER10                  \
>> +                             | OPTION_MASK_P10_FUSION               \
>> +                             | OPTION_MASK_P10_FUSION_LD_CMPI       \
>>                               | OPTION_MASK_HTM                      \
>>                               | OPTION_MASK_ISEL                     \
>>                               | OPTION_MASK_MFCRF                    \
>> diff --git a/gcc/config/rs6000/rs6000-protos.h b/gcc/config/rs6000/rs6000-protos.h
>> index 3c4682b0e26..cd644083558 100644
>> --- a/gcc/config/rs6000/rs6000-protos.h
>> +++ b/gcc/config/rs6000/rs6000-protos.h
>> @@ -191,6 +191,8 @@ enum non_prefixed_form {
>> 
>> extern enum insn_form address_to_insn_form (rtx, machine_mode,
>>                                          enum non_prefixed_form);
>> +extern bool address_is_non_pfx_d_or_x (rtx addr, machine_mode mode,
>> +                                   enum non_prefixed_form non_prefix_format);
>> extern bool prefixed_load_p (rtx_insn *);
>> extern bool prefixed_store_p (rtx_insn *);
>> extern bool prefixed_paddi_p (rtx_insn *);
>> diff --git a/gcc/config/rs6000/rs6000.c b/gcc/config/rs6000/rs6000.c
>> index 517467ebc63..759551d07ec 100644
>> --- a/gcc/config/rs6000/rs6000.c
>> +++ b/gcc/config/rs6000/rs6000.c
>> @@ -4423,6 +4423,12 @@ rs6000_option_override_internal (bool global_init_p)
>>  if (TARGET_POWER10 && (rs6000_isa_flags_explicit & OPTION_MASK_MMA) == 0)
>>    rs6000_isa_flags |= OPTION_MASK_MMA;
>> 
>> +  if (TARGET_POWER10 && (rs6000_isa_flags_explicit & OPTION_MASK_P10_FUSION) == 0)
>> +    rs6000_isa_flags |= OPTION_MASK_P10_FUSION;
>> +
>> +  if (TARGET_POWER10 && (rs6000_isa_flags_explicit & OPTION_MASK_P10_FUSION_LD_CMPI) == 0)
>> +    rs6000_isa_flags |= OPTION_MASK_P10_FUSION_LD_CMPI;
>> +
>>  /* Turn off vector pair/mma options on non-power10 systems.  */
>>  else if (!TARGET_POWER10 && TARGET_MMA)
>>    {
>> @@ -23614,6 +23620,7 @@ static struct rs6000_opt_mask const rs6000_opt_masks[] =
>>  { "power9-minmax",          OPTION_MASK_P9_MINMAX,          false, true  },
>>  { "power9-misc",            OPTION_MASK_P9_MISC,            false, true  },
>>  { "power9-vector",          OPTION_MASK_P9_VECTOR,          false, true  },
>> +  { "power10-fusion",               OPTION_MASK_P10_FUSION,         false, 
>> true  },
>>  { "powerpc-gfxopt",         OPTION_MASK_PPC_GFXOPT,         false, true  },
>>  { "powerpc-gpopt",          OPTION_MASK_PPC_GPOPT,          false, true  },
>>  { "prefixed",                       OPTION_MASK_PREFIXED,           false, true  },
>> @@ -25705,6 +25712,50 @@ address_to_insn_form (rtx addr,
>>  return INSN_FORM_BAD;
>> }
>> 
>> +/* Given address rtx ADDR for a load of MODE, is this legitimate for a
>> +   non-prefixed D-form or X-form instruction?  NON_PREFIXED_FORMAT is
>> +   given NON_PREFIXED_D or NON_PREFIXED_DS to indicate whether we want
>> +   a D-form or DS-form instruction.  X-form and base_reg are always
>> +   allowed.  */
>> +bool
>> +address_is_non_pfx_d_or_x (rtx addr, machine_mode mode,
>> +                       enum non_prefixed_form non_prefixed_format)
>> +{
>> +  enum insn_form result_form;
>> +
>> +  result_form = address_to_insn_form (addr, mode, non_prefixed_format);
>> +
>> +  switch (non_prefixed_format)
>> +    {
>> +    case NON_PREFIXED_D:
>> +      switch (result_form)
>> +    {
>> +    case INSN_FORM_X:
>> +    case INSN_FORM_D:
>> +    case INSN_FORM_DS:
>> +    case INSN_FORM_BASE_REG:
>> +      return true;
>> +    default:
>> +      break;
>> +    }
>> +      break;
>> +    case NON_PREFIXED_DS:
>> +      switch (result_form)
>> +    {
>> +    case INSN_FORM_X:
>> +    case INSN_FORM_DS:
>> +    case INSN_FORM_BASE_REG:
>> +      return true;
>> +    default:
>> +      break;
>> +    }
>> +      break;
>> +    default:
>> +      break;
>> +    }
>> +  return false;
>> +}
>> +
>> /* Helper function to see if we're potentially looking at lfs/stfs.
>>   - PARALLEL containing a SET and a CLOBBER
>>   - stfs:
>> diff --git a/gcc/config/rs6000/rs6000.h b/gcc/config/rs6000/rs6000.h
>> index 5bf9c83fc1e..307c0b200bd 100644
>> --- a/gcc/config/rs6000/rs6000.h
>> +++ b/gcc/config/rs6000/rs6000.h
>> @@ -539,6 +539,7 @@ extern int rs6000_vector_align[];
>> #define MASK_UPDATE                  OPTION_MASK_UPDATE
>> #define MASK_VSX                     OPTION_MASK_VSX
>> #define MASK_POWER10                 OPTION_MASK_POWER10
>> +#define MASK_P10_FUSION                     OPTION_MASK_P10_FUSION
>> 
>> #ifndef IN_LIBGCC2
>> #define MASK_POWERPC64                       OPTION_MASK_POWERPC64
>> diff --git a/gcc/config/rs6000/rs6000.md b/gcc/config/rs6000/rs6000.md
>> index b89990f46bf..c39b7098978 100644
>> --- a/gcc/config/rs6000/rs6000.md
>> +++ b/gcc/config/rs6000/rs6000.md
>> @@ -14926,3 +14926,4 @@ (define_insn "*cmpeqb_internal"
>> (include "dfp.md")
>> (include "crypto.md")
>> (include "htm.md")
>> +(include "fusion.md")
>> diff --git a/gcc/config/rs6000/rs6000.opt b/gcc/config/rs6000/rs6000.opt
>> index 2888172cb27..008a318b98d 100644
>> --- a/gcc/config/rs6000/rs6000.opt
>> +++ b/gcc/config/rs6000/rs6000.opt
>> @@ -479,6 +479,14 @@ mpower8-vector
>> Target Report Mask(P8_VECTOR) Var(rs6000_isa_flags)
>> Use vector and scalar instructions added in ISA 2.07.
>> 
>> +mpower10-fusion
>> +Target Report Mask(P10_FUSION) Var(rs6000_isa_flags)
>> +Fuse certain integer operations together for better performance on power10.
>> +
>> +mpower10-fusion-ld-cmpi
>> +Target Undocumented Mask(P10_FUSION_LD_CMPI) Var(rs6000_isa_flags)
>> +Fuse a load with a compare immediate for better performance on power10.
>> +
>> mcrypto
>> Target Report Mask(CRYPTO) Var(rs6000_isa_flags)
>> Use ISA 2.07 Category:Vector.AES and Category:Vector.SHA2 instructions.
>> diff --git a/gcc/config/rs6000/t-rs6000 b/gcc/config/rs6000/t-rs6000
>> index 1ddb5729cb2..bcc71a9e21b 100644
>> --- a/gcc/config/rs6000/t-rs6000
>> +++ b/gcc/config/rs6000/t-rs6000
>> @@ -47,6 +47,9 @@ rs6000-call.o: $(srcdir)/config/rs6000/rs6000-call.c
>>      $(COMPILE) $<
>>      $(POSTCOMPILE)
>> 
>> +$(srcdir)/config/rs6000/fusion.md: $(srcdir)/config/rs6000/genfusion.pl
>> +    $(srcdir)/config/rs6000/genfusion.pl > $(srcdir)/config/rs6000/fusion.md
>> +
>> $(srcdir)/config/rs6000/rs6000-tables.opt: $(srcdir)/config/rs6000/genopt.sh \
>>  $(srcdir)/config/rs6000/rs6000-cpus.def
>>      $(SHELL) $(srcdir)/config/rs6000/genopt.sh $(srcdir)/config/rs6000 > \
>> @@ -86,4 +89,5 @@ MD_INCLUDES = $(srcdir)/config/rs6000/rs64.md \
>>      $(srcdir)/config/rs6000/mma.md \
>>      $(srcdir)/config/rs6000/crypto.md \
>>      $(srcdir)/config/rs6000/htm.md \
>> -    $(srcdir)/config/rs6000/dfp.md
>> +    $(srcdir)/config/rs6000/dfp.md \
>> +    $(srcdir)/config/rs6000/fusion.md
>> -- 
>> 2.27.0
>> 
> 
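
For anyone reading along, the accept/reject decision in address_is_non_pfx_d_or_x can be summarized as a small table. The sketch below is only an illustration, not part of the patch: the enums are cut-down stand-ins for the real ones in rs6000-protos.h, and form_ok_for_fusion is a hypothetical helper that takes an already-classified form instead of calling address_to_insn_form.

```c
#include <stdbool.h>

/* Stand-ins for the GCC enums (assumption: only a subset of the real
   values is listed, and the numeric values are arbitrary).  */
enum insn_form
{
  INSN_FORM_BAD,
  INSN_FORM_BASE_REG,
  INSN_FORM_D,
  INSN_FORM_DS,
  INSN_FORM_DQ,
  INSN_FORM_X,
  INSN_FORM_PREFIXED_NUMERIC
};

enum non_prefixed_form
{
  NON_PREFIXED_DEFAULT,
  NON_PREFIXED_D,
  NON_PREFIXED_DS,
  NON_PREFIXED_DQ,
  NON_PREFIXED_X
};

/* Hypothetical helper: same decision as the nested switch in the patch,
   applied to a form that has already been computed.  */
static bool
form_ok_for_fusion (enum insn_form form, enum non_prefixed_form wanted)
{
  switch (form)
    {
    case INSN_FORM_X:
    case INSN_FORM_DS:
    case INSN_FORM_BASE_REG:
      /* Accepted whether the caller asked for D-form or DS-form.  */
      return wanted == NON_PREFIXED_D || wanted == NON_PREFIXED_DS;
    case INSN_FORM_D:
      /* A plain D-form offset is only accepted for a D-form request.  */
      return wanted == NON_PREFIXED_D;
    default:
      /* Prefixed, DQ, and bad addresses are never accepted.  */
      return false;
    }
}
```

So form_ok_for_fusion (INSN_FORM_D, NON_PREFIXED_DS) is false while form_ok_for_fusion (INSN_FORM_DS, NON_PREFIXED_D) is true, which matches the fact that a DS-form offset (a multiple of 4) is always a valid D-form offset but not the other way around.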

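The option handling in rs6000_option_override_internal follows the usual "default on unless the user said otherwise" pattern. A minimal stand-alone sketch of that pattern, assuming made-up bit positions for the two masks and a hypothetical struct isa_state in place of the real rs6000_isa_flags / rs6000_isa_flags_explicit globals:

```c
#include <stdbool.h>
#include <stdint.h>

/* Assumed bit positions, for illustration only.  */
#define OPTION_MASK_P10_FUSION         (UINT64_C (1) << 0)
#define OPTION_MASK_P10_FUSION_LD_CMPI (UINT64_C (1) << 1)

/* Hypothetical container for the state the real code keeps in globals.  */
struct isa_state
{
  uint64_t flags;          /* Options currently enabled.  */
  uint64_t explicit_flags; /* Options the user set or cleared by hand.  */
  bool power10;            /* Stand-in for TARGET_POWER10.  */
};

/* Turn MASK on for power10 unless the user touched it explicitly.  */
static void
default_on_for_power10 (struct isa_state *s, uint64_t mask)
{
  if (s->power10 && (s->explicit_flags & mask) == 0)
    s->flags |= mask;
}
```

With this shape, -mcpu=power10 alone ends up with both masks set, -mcpu=power10 -mno-power10-fusion leaves OPTION_MASK_P10_FUSION clear, and an earlier -mcpu only gets fusion if the option was passed explicitly.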