Re: [PATCH] Prevent TYPE_PRECISION on VECTOR_TYPEs

2023-06-28 Thread Jeff Law via Gcc-patches




On 6/28/23 22:04, Li, Pan2 wrote:

It seems this patch may result in many test ICE failures on the RISC-V backend. 
Could you help double-check it by following the possible reproduction steps 
below?  Thank you!
I've hit one ICE due to this change as well, but it wasn't in the 
tree-ssa-math-opts.cc code like this one is.  In my case we're in a place 
where it doesn't look like we expect a vector type to show up, but one 
does, and we can likely just prune it away.


Anyway, your fault is in here:



divmod_candidate_p:

 if (TYPE_PRECISION (type) <= HOST_BITS_PER_WIDE_INT
  && TYPE_PRECISION (type) <= BITS_PER_WORD)
return false;

TYPE is almost certainly a vector type.  The question we need to answer 
(and I'm not likely to get to it tomorrow) would be whether or not TYPE 
can legitimately be a vector type here.
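If pruning turns out to be the right answer, the guard can be sketched like this (a hypothetical illustration only: the struct and constants below merely mock the GCC tree accessors VECTOR_TYPE_P and TYPE_PRECISION, this is not the actual fix):

```c
#include <assert.h>
#include <stdbool.h>

/* Mock of the tree-type queries; in GCC these would be VECTOR_TYPE_P
   and TYPE_PRECISION, and the limits come from the host configuration.  */
struct type_info { bool is_vector; unsigned precision; };
enum { MOCK_HOST_BITS_PER_WIDE_INT = 64, MOCK_BITS_PER_WORD = 64 };

/* Sketch of an early-out in divmod_candidate_p: reject vector types
   before the precision field is ever read, so the new TYPE_PRECISION
   check cannot fire.  */
static bool divmod_precision_ok (const struct type_info *t)
{
  if (t->is_vector)   /* prune vector types up front */
    return false;
  if (t->precision <= MOCK_HOST_BITS_PER_WIDE_INT
      && t->precision <= MOCK_BITS_PER_WORD)
    return false;
  return true;
}
```

The interesting design question remains the one raised above: whether the vector type is legitimate input (guard it) or should never reach this point (fix the caller).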


The whole point of Richi's change is to detect invalid uses of 
TYPE_PRECISION.  So it's not a big surprise that we're finding a few as 
the change gets wider testing.


jeff



Re: [PATCH] i386: refactor macros.

2023-06-28 Thread Hongtao Liu via Gcc-patches
On Thu, Jun 29, 2023 at 10:51 AM Hu, Lin1 via Gcc-patches
 wrote:
>
> Hi, all
>
> This patch aims to refactor macros in case something else is added to
> AMX_TILE_SET in the future.  OK for trunk?
Ok, thanks.
>
> BRs,
> Lin
>
> gcc/ChangeLog:
>
> * common/config/i386/i386-common.cc (OPTION_MASK_ISA2_AMX_INT8_SET):
> Change OPTION_MASK_ISA2_AMX_TILE to OPTION_MASK_ISA2_AMX_TILE_SET.
> (OPTION_MASK_ISA2_AMX_FP16_SET): Ditto.
> (OPTION_MASK_ISA2_AMX_COMPLEX_SET): Ditto.
> (OPTION_MASK_ISA_ABM_SET):
> Change OPTION_MASK_ISA_POPCNT to OPTION_MASK_ISA_POPCNT_SET.
> ---
>  gcc/common/config/i386/i386-common.cc | 10 +-
>  1 file changed, 5 insertions(+), 5 deletions(-)
>
> diff --git a/gcc/common/config/i386/i386-common.cc 
> b/gcc/common/config/i386/i386-common.cc
> index bf126f14073..4f79afba917 100644
> --- a/gcc/common/config/i386/i386-common.cc
> +++ b/gcc/common/config/i386/i386-common.cc
> @@ -107,18 +107,18 @@ along with GCC; see the file COPYING3.  If not see
>  #define OPTION_MASK_ISA2_AVX512VP2INTERSECT_SET 
> OPTION_MASK_ISA2_AVX512VP2INTERSECT
>  #define OPTION_MASK_ISA2_AMX_TILE_SET OPTION_MASK_ISA2_AMX_TILE
>  #define OPTION_MASK_ISA2_AMX_INT8_SET \
> -  (OPTION_MASK_ISA2_AMX_TILE | OPTION_MASK_ISA2_AMX_INT8)
> +  (OPTION_MASK_ISA2_AMX_TILE_SET | OPTION_MASK_ISA2_AMX_INT8)
>  #define OPTION_MASK_ISA2_AMX_BF16_SET \
> -  (OPTION_MASK_ISA2_AMX_TILE | OPTION_MASK_ISA2_AMX_BF16)
> +  (OPTION_MASK_ISA2_AMX_TILE_SET | OPTION_MASK_ISA2_AMX_BF16)
>  #define OPTION_MASK_ISA2_AVXVNNIINT8_SET OPTION_MASK_ISA2_AVXVNNIINT8
>  #define OPTION_MASK_ISA2_AVXNECONVERT_SET OPTION_MASK_ISA2_AVXNECONVERT
>  #define OPTION_MASK_ISA2_CMPCCXADD_SET OPTION_MASK_ISA2_CMPCCXADD
>  #define OPTION_MASK_ISA2_AMX_FP16_SET \
> -  (OPTION_MASK_ISA2_AMX_TILE | OPTION_MASK_ISA2_AMX_FP16)
> +  (OPTION_MASK_ISA2_AMX_TILE_SET | OPTION_MASK_ISA2_AMX_FP16)
>  #define OPTION_MASK_ISA2_PREFETCHI_SET OPTION_MASK_ISA2_PREFETCHI
>  #define OPTION_MASK_ISA2_RAOINT_SET OPTION_MASK_ISA2_RAOINT
>  #define OPTION_MASK_ISA2_AMX_COMPLEX_SET \
> -  (OPTION_MASK_ISA2_AMX_TILE | OPTION_MASK_ISA2_AMX_COMPLEX)
> +  (OPTION_MASK_ISA2_AMX_TILE_SET | OPTION_MASK_ISA2_AMX_COMPLEX)
>
>  /* SSE4 includes both SSE4.1 and SSE4.2. -msse4 should be the same
> as -msse4.2.  */
> @@ -143,7 +143,7 @@ along with GCC; see the file COPYING3.  If not see
>(OPTION_MASK_ISA_PCLMUL | OPTION_MASK_ISA_SSE2_SET)
>
>  #define OPTION_MASK_ISA_ABM_SET \
> -  (OPTION_MASK_ISA_ABM | OPTION_MASK_ISA_POPCNT)
> +  (OPTION_MASK_ISA_ABM | OPTION_MASK_ISA_POPCNT_SET)
>
>  #define OPTION_MASK_ISA2_PCONFIG_SET OPTION_MASK_ISA2_PCONFIG
>  #define OPTION_MASK_ISA2_WBNOINVD_SET OPTION_MASK_ISA2_WBNOINVD
> --
> 2.31.1
>


-- 
BR,
Hongtao


RE: [PATCH] Prevent TYPE_PRECISION on VECTOR_TYPEs

2023-06-28 Thread Li, Pan2 via Gcc-patches
Sorry for the disturbance; cc'ing Kito, Juzhe and Robin for awareness.

Pan

-Original Message-
From: Li, Pan2 
Sent: Thursday, June 29, 2023 12:05 PM
To: Jakub Jelinek ; Richard Biener 
Cc: gcc-patches@gcc.gnu.org; jeffreya...@gmail.com
Subject: RE: [PATCH] Prevent TYPE_PRECISION on VECTOR_TYPEs

It seems this patch may result in many test ICE failures on the RISC-V backend. 
Could you help double-check it by following the possible reproduction steps 
below?  Thank you!

cd gcc && mkdir __BUILD__ && cd __BUILD__
../configure \
  --target=riscv64-unknown-elf \
  --prefix= \
  --disable-shared \
  --enable-threads \
  --enable-tls \
  --enable-languages=c,c++ \
  --with-system-zlib \
  --with-newlib \
  --disable-libmudflap \
  --disable-libssp \
  --disable-libquadmath \
  --disable-libgomp \
  --enable-nls \
  --disable-tm-clone-registry \
  --enable-multilib \
  --src=`pwd`/../ \
  --with-abi=lp64d \
  --with-arch=rv64imafdcv \
  --with-tune=rocket \
  --with-isa-spec=20191213 \
  --enable-werror \
  --enable-bootstrap \
  CFLAGS_FOR_BUILD="-O0 -g" \
  CXXFLAGS_FOR_BUILD="-O0 -g" \
  CFLAGS_FOR_TARGET="-O0 -g" \
  CXXFLAGS_FOR_TARGET="-O0 -g" \
  BOOT_CFLAGS="-O0 -g" \
  CFLAGS="-O0 -g" \
  CXXFLAGS="-O0 -g" \
  GM2FLAGS_FOR_TARGET="-O0 -g" \
  GOCFLAGS_FOR_TARGET="-O0 -g" \
  GDCFLAGS_FOR_TARGET="-O0 -g"
make -j $(nproc) all-gcc && make install-gcc

Then build one of the test files as below, and you may see an ICE similar to 
the following.

../__RISC-V_INSTALL_/bin/riscv64-unknown-elf-gcc -O2 --param 
riscv-autovec-preference=fixed-vlmax 
gcc/testsuite/gcc.target/riscv/rvv/autovec/partial/multiple_rgroup_run-3.c
during GIMPLE pass: widening_mul
In file included from 
gcc/testsuite/gcc.target/riscv/rvv/autovec/partial/multiple_rgroup_run-3.c:4:
gcc/testsuite/gcc.target/riscv/rvv/autovec/partial/multiple_rgroup-3.c: In 
function 'f3_init':
gcc/testsuite/gcc.target/riscv/rvv/autovec/partial/multiple_rgroup-3.c:249:1: 
internal compiler error: tree check: expected none of vector_type, have 
vector_type in divmod_candidate_p, at tree-ssa-math-opts.cc:4998
  249 | f3_init (int8_t *__restrict x, int8_t *__restrict x2, int64_t 
*__restrict y,
  | ^~~
0x1b1584e tree_not_check_failed(tree_node const*, char const*, int, char 
const*, ...)

/home/pli/repos/gcc/444/riscv-gnu-toolchain/gcc/__RISC-V_BUILD__/../gcc/tree.cc:8936
0xd74e9e tree_not_check(tree_node*, char const*, int, char const*, tree_code)

/home/pli/repos/gcc/444/riscv-gnu-toolchain/gcc/__RISC-V_BUILD__/../gcc/tree.h:3581
0x196150c divmod_candidate_p

/home/pli/repos/gcc/444/riscv-gnu-toolchain/gcc/__RISC-V_BUILD__/../gcc/tree-ssa-math-opts.cc:4998
0x196164f convert_to_divmod

/home/pli/repos/gcc/444/riscv-gnu-toolchain/gcc/__RISC-V_BUILD__/../gcc/tree-ssa-math-opts.cc:5041
0x196383d after_dom_children

/home/pli/repos/gcc/444/riscv-gnu-toolchain/gcc/__RISC-V_BUILD__/../gcc/tree-ssa-math-opts.cc:5580
0x299bcb4 dom_walker::walk(basic_block_def*)

/home/pli/repos/gcc/444/riscv-gnu-toolchain/gcc/__RISC-V_BUILD__/../gcc/domwalk.cc:354
0x1963d09 execute

/home/pli/repos/gcc/444/riscv-gnu-toolchain/gcc/__RISC-V_BUILD__/../gcc/tree-ssa-math-opts.cc:5666
Please submit a full bug report, with preprocessed source (by using 
-freport-bug).
Please include the complete backtrace with any bug report.
See  for instructions.

Pan

-Original Message-
From: Gcc-patches  On Behalf 
Of Jakub Jelinek via Gcc-patches
Sent: Tuesday, June 27, 2023 5:47 PM
To: Richard Biener 
Cc: gcc-patches@gcc.gnu.org; jeffreya...@gmail.com
Subject: Re: [PATCH] Prevent TYPE_PRECISION on VECTOR_TYPEs

On Tue, Jun 27, 2023 at 11:45:33AM +0200, Richard Biener wrote:
> The following makes sure that using TYPE_PRECISION on VECTOR_TYPE
> ICEs when tree checking is enabled.  This should avoid wrong-code
> in cases like PR110182 and instead ICE.
> 
> It also introduces a TYPE_PRECISION_RAW accessor and adjusts
> places I found that are eligible to use that.
> 
> Bootstrapped and tested on x86_64-unknown-linux-gnu with all
> languages enabled.
> 
> OK for trunk?  There is definitely going to be fallout but it
> should be straight-forward to fix with quick fixes using
> TYPE_PRECISION_RAW possible.
> 
> Thanks,
> Richard.
> 
>   * tree.h (TYPE_PRECISION): Check for non-VECTOR_TYPE.
>   (TYPE_PRECISION_RAW): Provide raw access to the precision
>   field.
>   * tree.cc (verify_type_variant): Compare TYPE_PRECISION_RAW.
>   (gimple_canonical_types_compatible_p): Likewise.
>   * tree-streamer-out.cc (pack_ts_type_common_value_fields):
>   Stream TYPE_PRECISION_RAW.
>   * tree-streamer-in.cc (unpack_ts_type_common_value_fields):
>   Likewise.
>   * lto-streamer-out.cc (hash_tree): Hash TYPE_PRECISION_RAW.
> 
> gcc/lto/
>   * lto-common.cc (compare_tree_sccs_1): Use TYPE_PRECISION_RAW.

LGTM.

Jakub



RE: [PATCH v1] RISC-V: Allow rounding mode control for RVV floating-point add

2023-06-28 Thread Li, Pan2 via Gcc-patches
Committed, thanks Kito and Juzhe.

Pan

-Original Message-
From: Kito Cheng  
Sent: Thursday, June 29, 2023 10:35 AM
To: Li, Pan2 
Cc: juzhe.zh...@rivai.ai; gcc-patches ; Wang, Yanzhang 
; jeffreyalaw 
Subject: Re: [PATCH v1] RISC-V: Allow rounding mode control for RVV 
floating-point add

LGTM, thanks!

On Tue, Jun 27, 2023 at 3:02 PM Li, Pan2  wrote:
>
> Ack, thanks Juzhe.
>
>
>
> Pan
>
>
>
> From: juzhe.zh...@rivai.ai 
> Sent: Tuesday, June 27, 2023 3:00 PM
> To: Li, Pan2 ; gcc-patches 
> Cc: Kito.cheng ; Li, Pan2 ; Wang, 
> Yanzhang ; jeffreyalaw 
> Subject: Re: [PATCH v1] RISC-V: Allow rounding mode control for RVV 
> floating-point add
>
>
>
> LGTM.
>
> You can go ahead and implement the floating-point rounding mode via 
> mode switching:
>
>
>
> I suggest you implement the rounding mode for floating-point as follows:
>
>
>
> 1st step: Implement mode switching for the floating-point rounding modes except 
> DYNAMIC, which should be exactly the same as for fixed-point.
>
> 2nd step: Support the DYNAMIC rounding mode in mode switching, which may need 
> modifications to the mode-switching PASS.
>
>
>
> Thanks.
>
> 
>
> juzhe.zh...@rivai.ai
>
>
>
> From: pan2.li
>
> Date: 2023-06-27 14:06
>
> To: gcc-patches
>
> CC: juzhe.zhong; kito.cheng; pan2.li; yanzhang.wang; jeffreyalaw
>
> Subject: [PATCH v1] RISC-V: Allow rounding mode control for RVV 
> floating-point add
>
> From: Pan Li 
>
>
>
> According to the doc below, we need to support the rounding mode of
>
> the RVV floating-point, both the static and dynamic frm.
>
>
>
> https://github.com/riscv-non-isa/rvv-intrinsic-doc/pull/226
>
>
>
> To keep tracking and development friendly, we will take several steps to support
>
> all the RVV floating-point rounding modes.
>
>
>
> 1. Allow rounding mode control by one intrinsic (aka this patch), vfadd.
>
> 2. Support static rounding mode control by mode switch, like fixed-point.
>
> 3. Support dynamic rounding mode control by mode switch.
>
> 4. Support the rest floating-point instructions for frm.
>
>
>
> Please *NOTE* this patch only allows rounding mode control for the
>
> vfadd intrinsic API; the related frm handling will be covered by step 2.
>
>
>
> Signed-off-by: Pan Li 
>
> Co-Authored by: Juzhe-Zhong 
>
>
>
> gcc/ChangeLog:
>
>
>
> * config/riscv/riscv-protos.h (enum floating_point_rounding_mode):
>
> Add macro for static frm min and max.
>
> * config/riscv/riscv-vector-builtins-bases.cc
>
> (class binop_frm): New class for floating-point with frm.
>
> (BASE): Add vfadd for frm.
>
> * config/riscv/riscv-vector-builtins-bases.h: Likewise.
>
> * config/riscv/riscv-vector-builtins-functions.def
>
> (vfadd_frm): Likewise.
>
> * config/riscv/riscv-vector-builtins-shapes.cc
>
> (struct alu_frm_def): New struct for alu with frm.
>
> (SHAPE): Add alu with frm.
>
> * config/riscv/riscv-vector-builtins-shapes.h: Likewise.
>
> * config/riscv/riscv-vector-builtins.cc
>
> (function_checker::report_out_of_range_and_not): New function
>
> for report out of range and not val.
>
> (function_checker::require_immediate_range_or): New function
>
> for checking in range or one val.
>
> * config/riscv/riscv-vector-builtins.h: Add function decl.
>
>
>
> gcc/testsuite/ChangeLog:
>
>
>
> * gcc.target/riscv/rvv/base/float-point-frm-error.c: New test.
>
> * gcc.target/riscv/rvv/base/float-point-frm.c: New test.
>
> ---
>
> gcc/config/riscv/riscv-protos.h   |  2 +
>
> .../riscv/riscv-vector-builtins-bases.cc  | 25 +++
>
> .../riscv/riscv-vector-builtins-bases.h   |  1 +
>
> .../riscv/riscv-vector-builtins-functions.def |  2 +
>
> .../riscv/riscv-vector-builtins-shapes.cc | 68 +++
>
> .../riscv/riscv-vector-builtins-shapes.h  |  1 +
>
> gcc/config/riscv/riscv-vector-builtins.cc | 41 +++
>
> gcc/config/riscv/riscv-vector-builtins.h  |  4 ++
>
> .../riscv/rvv/base/float-point-frm-error.c| 15 
>
> .../riscv/rvv/base/float-point-frm.c  | 30 
>
> 10 files changed, 189 insertions(+)
>
> create mode 100644 
> gcc/testsuite/gcc.target/riscv/rvv/base/float-point-frm-error.c
>
> create mode 100644 gcc/testsuite/gcc.target/riscv/rvv/base/float-point-frm.c
>
>
>
> diff --git a/gcc/config/riscv/riscv-protos.h b/gcc/config/riscv/riscv-protos.h
>
> index f686edab3d1..bee64eee504 100644
>
> --- a/gcc/config/riscv/riscv-protos.h
>
> +++ b/gcc/config/riscv/riscv-protos.h
>
> @@ -278,6 +278,8 @@ enum floating_point_rounding_mode
>
>FRM_RUP = 3, /* Aka 0b011.  */
>
>FRM_RMM = 4, /* Aka 0b100.  */
>
>FRM_DYN = 7, /* Aka 0b111.  */
>
> +  FRM_STATIC_MIN = FRM_RNE,
>
> +  FRM_STATIC_MAX = FRM_RMM,
>
> };
>
> opt_machine_mode vectorize_related_mode (machine_mode, scalar_mode,
>
> diff --git a/gcc/config/riscv/riscv-vector-builtins-bases.cc 
> b/gcc/config/riscv/riscv-vector-builtins-bases.cc
>
> index 5c8deda900d..1b4c2c6ad66 100644
>
> --- a/gcc/config/riscv/riscv-vector-builtins-bases.cc
>
> +++ 

RE: [PATCH v1] RISC-V: Support vfadd static rounding mode by mode switching

2023-06-28 Thread Li, Pan2 via Gcc-patches
Committed, thanks Kito and Juzhe.

pan

-Original Message-
From: Kito Cheng  
Sent: Thursday, June 29, 2023 10:34 AM
To: juzhe.zh...@rivai.ai
Cc: Li, Pan2 ; gcc-patches ; Wang, 
Yanzhang ; jeffreyalaw 
Subject: Re: [PATCH v1] RISC-V: Support vfadd static rounding mode by mode 
switching

LGTM, thanks :)

On Thu, Jun 29, 2023 at 10:24 AM juzhe.zh...@rivai.ai
 wrote:
>
> LGTM
>
> 
> juzhe.zh...@rivai.ai
>
>
> From: pan2.li
> Date: 2023-06-29 09:40
> To: gcc-patches
> CC: juzhe.zhong; kito.cheng; pan2.li; yanzhang.wang; jeffreyalaw
> Subject: [PATCH v1] RISC-V: Support vfadd static rounding mode by mode 
> switching
> From: Pan Li 
>
> This patch would like to support the vfadd static rounding mode, similar to
> the fixed-point support.  The related fsrm instructions will then be inserted
> accordingly.
>
> Please *NOTE* this PATCH doesn't cover anything about the FRM dynamic mode;
> it will be implemented in subsequent PATCH(s).
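The static-rounding-mode discipline the patch implements — write frm via fsrm before the instruction, restore it afterwards — can be modeled as a toy in plain C (illustration only, not the RVV codegen; the FRM values mirror the enum in riscv-protos.h):

```c
#include <assert.h>

/* Toy model of the frm CSR and the fsrm save/restore discipline the
   patch inserts around vfadd.  Values match the RISC-V frm encoding.  */
enum { FRM_RNE = 0, FRM_RTZ = 1, FRM_RDN = 2, FRM_RUP = 3,
       FRM_RMM = 4, FRM_DYN = 7 };

static int frm_csr = FRM_DYN;      /* the live (dynamic) rounding mode */

static int fsrm (int new_mode)     /* swap frm, return the old value */
{
  int old = frm_csr;
  frm_csr = new_mode;
  return old;
}

/* A static-rounding-mode operation: set the requested mode, perform
   the work, then restore whatever mode was live before.  */
static int op_with_static_rm (int mode)
{
  int saved = fsrm (mode);         /* fsrm: write and save previous frm */
  int result = frm_csr;            /* the "operation" observes MODE */
  fsrm (saved);                    /* restore the previous frm */
  return result;
}
```

After the call, the operation has seen the requested static mode while the ambient dynamic mode is untouched — exactly the invariant mode switching must maintain.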
>
> Signed-off-by: Pan Li 
>
> gcc/ChangeLog:
>
> * config/riscv/riscv.cc (riscv_emit_mode_set): Add emit for FRM.
> (riscv_mode_needed): Likewise.
> (riscv_entity_mode_after): Likewise.
> (riscv_mode_after): Likewise.
> (riscv_mode_entry): Likewise.
> (riscv_mode_exit): Likewise.
> * config/riscv/riscv.h (NUM_MODES_FOR_MODE_SWITCHING): Add number
> for FRM.
> * config/riscv/riscv.md: Add FRM register.
> * config/riscv/vector-iterators.md: Add FRM type.
> * config/riscv/vector.md (frm_mode): Define new attr for FRM mode.
> (fsrm): Define new insn for fsrm instruction.
>
> gcc/testsuite/ChangeLog:
>
> * gcc.target/riscv/rvv/base/float-point-frm-insert-1.c: New test.
> * gcc.target/riscv/rvv/base/float-point-frm-insert-2.c: New test.
> * gcc.target/riscv/rvv/base/float-point-frm-insert-3.c: New test.
> * gcc.target/riscv/rvv/base/float-point-frm-insert-4.c: New test.
> * gcc.target/riscv/rvv/base/float-point-frm-insert-5.c: New test.
> ---
> gcc/config/riscv/riscv.cc | 52 ++
> gcc/config/riscv/riscv.h  |  4 +-
> gcc/config/riscv/riscv.md |  4 +-
> gcc/config/riscv/vector-iterators.md  |  2 +
> gcc/config/riscv/vector.md| 53 +++
> .../riscv/rvv/base/float-point-frm-insert-1.c | 31 +++
> .../riscv/rvv/base/float-point-frm-insert-2.c | 14 +
> .../riscv/rvv/base/float-point-frm-insert-3.c | 14 +
> .../riscv/rvv/base/float-point-frm-insert-4.c | 23 
> .../riscv/rvv/base/float-point-frm-insert-5.c | 23 
> 10 files changed, 206 insertions(+), 14 deletions(-)
> create mode 100644 
> gcc/testsuite/gcc.target/riscv/rvv/base/float-point-frm-insert-1.c
> create mode 100644 
> gcc/testsuite/gcc.target/riscv/rvv/base/float-point-frm-insert-2.c
> create mode 100644 
> gcc/testsuite/gcc.target/riscv/rvv/base/float-point-frm-insert-3.c
> create mode 100644 
> gcc/testsuite/gcc.target/riscv/rvv/base/float-point-frm-insert-4.c
> create mode 100644 
> gcc/testsuite/gcc.target/riscv/rvv/base/float-point-frm-insert-5.c
>
> diff --git a/gcc/config/riscv/riscv.cc b/gcc/config/riscv/riscv.cc
> index 280aa0b33b9..e4dc8115e69 100644
> --- a/gcc/config/riscv/riscv.cc
> +++ b/gcc/config/riscv/riscv.cc
> @@ -7669,6 +7669,16 @@ riscv_emit_mode_set (int entity, int mode, int 
> prev_mode,
>if (mode != VXRM_MODE_NONE && mode != prev_mode)
> emit_insn (gen_vxrmsi (gen_int_mode (mode, SImode)));
>break;
> +case RISCV_FRM:
> +  if (mode != FRM_MODE_NONE && mode != prev_mode)
> + {
> +   rtx scaler = gen_reg_rtx (SImode);
> +   rtx imm = gen_int_mode (mode, SImode);
> +
> +   emit_insn (gen_movsi (scaler, imm));
> +   emit_insn (gen_fsrm (scaler, scaler));
> + }
> +  break;
>  default:
>gcc_unreachable ();
>  }
> @@ -7680,11 +7690,14 @@ riscv_emit_mode_set (int entity, int mode, int 
> prev_mode,
> static int
> riscv_mode_needed (int entity, rtx_insn *insn)
> {
> +  int code = recog_memoized (insn);
> +
>switch (entity)
>  {
>  case RISCV_VXRM:
> -  return recog_memoized (insn) >= 0 ? get_attr_vxrm_mode (insn)
> - : VXRM_MODE_NONE;
> +  return code >= 0 ? get_attr_vxrm_mode (insn) : VXRM_MODE_NONE;
> +case RISCV_FRM:
> +  return code >= 0 ? get_attr_frm_mode (insn) : FRM_MODE_NONE;
>  default:
>gcc_unreachable ();
>  }
> @@ -7715,6 +7728,21 @@ global_state_unknown_p (rtx_insn *insn, unsigned int 
> regno)
>return false;
> }
> +static int
> +riscv_entity_mode_after (int regnum, rtx_insn *insn, int mode,
> + int (*get_attr_mode) (rtx_insn *), int default_mode)
> +{
> +  if (global_state_unknown_p (insn, regnum))
> +return default_mode;
> +  else if (recog_memoized (insn) < 0)
> +return mode;
> +
> +  rtx reg = gen_rtx_REG (SImode, regnum);
> +  bool mentioned_p = reg_mentioned_p (reg, PATTERN (insn));
> +
> +  return mentioned_p ? get_attr_mode (insn): mode;
> +}
> +
> /* Return the mode that an insn results in.  */
> static int
> @@ -7723,15 

RE: [PATCH] Prevent TYPE_PRECISION on VECTOR_TYPEs

2023-06-28 Thread Li, Pan2 via Gcc-patches
It seems this patch may result in many test ICE failures on the RISC-V backend. 
Could you help double-check it by following the possible reproduction steps 
below?  Thank you!

cd gcc && mkdir __BUILD__ && cd __BUILD__
../configure \
  --target=riscv64-unknown-elf \
  --prefix= \
  --disable-shared \
  --enable-threads \
  --enable-tls \
  --enable-languages=c,c++ \
  --with-system-zlib \
  --with-newlib \
  --disable-libmudflap \
  --disable-libssp \
  --disable-libquadmath \
  --disable-libgomp \
  --enable-nls \
  --disable-tm-clone-registry \
  --enable-multilib \
  --src=`pwd`/../ \
  --with-abi=lp64d \
  --with-arch=rv64imafdcv \
  --with-tune=rocket \
  --with-isa-spec=20191213 \
  --enable-werror \
  --enable-bootstrap \
  CFLAGS_FOR_BUILD="-O0 -g" \
  CXXFLAGS_FOR_BUILD="-O0 -g" \
  CFLAGS_FOR_TARGET="-O0 -g" \
  CXXFLAGS_FOR_TARGET="-O0 -g" \
  BOOT_CFLAGS="-O0 -g" \
  CFLAGS="-O0 -g" \
  CXXFLAGS="-O0 -g" \
  GM2FLAGS_FOR_TARGET="-O0 -g" \
  GOCFLAGS_FOR_TARGET="-O0 -g" \
  GDCFLAGS_FOR_TARGET="-O0 -g"
make -j $(nproc) all-gcc && make install-gcc

Then build one of the test files as below, and you may see an ICE similar to 
the following.

../__RISC-V_INSTALL_/bin/riscv64-unknown-elf-gcc -O2 --param 
riscv-autovec-preference=fixed-vlmax 
gcc/testsuite/gcc.target/riscv/rvv/autovec/partial/multiple_rgroup_run-3.c
during GIMPLE pass: widening_mul
In file included from 
gcc/testsuite/gcc.target/riscv/rvv/autovec/partial/multiple_rgroup_run-3.c:4:
gcc/testsuite/gcc.target/riscv/rvv/autovec/partial/multiple_rgroup-3.c: In 
function 'f3_init':
gcc/testsuite/gcc.target/riscv/rvv/autovec/partial/multiple_rgroup-3.c:249:1: 
internal compiler error: tree check: expected none of vector_type, have 
vector_type in divmod_candidate_p, at tree-ssa-math-opts.cc:4998
  249 | f3_init (int8_t *__restrict x, int8_t *__restrict x2, int64_t 
*__restrict y,
  | ^~~
0x1b1584e tree_not_check_failed(tree_node const*, char const*, int, char 
const*, ...)

/home/pli/repos/gcc/444/riscv-gnu-toolchain/gcc/__RISC-V_BUILD__/../gcc/tree.cc:8936
0xd74e9e tree_not_check(tree_node*, char const*, int, char const*, tree_code)

/home/pli/repos/gcc/444/riscv-gnu-toolchain/gcc/__RISC-V_BUILD__/../gcc/tree.h:3581
0x196150c divmod_candidate_p

/home/pli/repos/gcc/444/riscv-gnu-toolchain/gcc/__RISC-V_BUILD__/../gcc/tree-ssa-math-opts.cc:4998
0x196164f convert_to_divmod

/home/pli/repos/gcc/444/riscv-gnu-toolchain/gcc/__RISC-V_BUILD__/../gcc/tree-ssa-math-opts.cc:5041
0x196383d after_dom_children

/home/pli/repos/gcc/444/riscv-gnu-toolchain/gcc/__RISC-V_BUILD__/../gcc/tree-ssa-math-opts.cc:5580
0x299bcb4 dom_walker::walk(basic_block_def*)

/home/pli/repos/gcc/444/riscv-gnu-toolchain/gcc/__RISC-V_BUILD__/../gcc/domwalk.cc:354
0x1963d09 execute

/home/pli/repos/gcc/444/riscv-gnu-toolchain/gcc/__RISC-V_BUILD__/../gcc/tree-ssa-math-opts.cc:5666
Please submit a full bug report, with preprocessed source (by using 
-freport-bug).
Please include the complete backtrace with any bug report.
See  for instructions.

Pan

-Original Message-
From: Gcc-patches  On Behalf 
Of Jakub Jelinek via Gcc-patches
Sent: Tuesday, June 27, 2023 5:47 PM
To: Richard Biener 
Cc: gcc-patches@gcc.gnu.org; jeffreya...@gmail.com
Subject: Re: [PATCH] Prevent TYPE_PRECISION on VECTOR_TYPEs

On Tue, Jun 27, 2023 at 11:45:33AM +0200, Richard Biener wrote:
> The following makes sure that using TYPE_PRECISION on VECTOR_TYPE
> ICEs when tree checking is enabled.  This should avoid wrong-code
> in cases like PR110182 and instead ICE.
> 
> It also introduces a TYPE_PRECISION_RAW accessor and adjusts
> places I found that are eligible to use that.
> 
> Bootstrapped and tested on x86_64-unknown-linux-gnu with all
> languages enabled.
> 
> OK for trunk?  There is definitely going to be fallout but it
> should be straight-forward to fix with quick fixes using
> TYPE_PRECISION_RAW possible.
> 
> Thanks,
> Richard.
> 
>   * tree.h (TYPE_PRECISION): Check for non-VECTOR_TYPE.
>   (TYPE_PRECISION_RAW): Provide raw access to the precision
>   field.
>   * tree.cc (verify_type_variant): Compare TYPE_PRECISION_RAW.
>   (gimple_canonical_types_compatible_p): Likewise.
>   * tree-streamer-out.cc (pack_ts_type_common_value_fields):
>   Stream TYPE_PRECISION_RAW.
>   * tree-streamer-in.cc (unpack_ts_type_common_value_fields):
>   Likewise.
>   * lto-streamer-out.cc (hash_tree): Hash TYPE_PRECISION_RAW.
> 
> gcc/lto/
>   * lto-common.cc (compare_tree_sccs_1): Use TYPE_PRECISION_RAW.

LGTM.

Jakub



Re: [PATCH V3 1/4] rs6000: build constant via li;rotldi

2023-06-28 Thread Jiufu Guo via Gcc-patches


Hi,

Jiufu Guo via Gcc-patches  writes:

> Hi!
>
> Segher Boessenkool  writes:
>
>> Hi!
>>
>> On Fri, Jun 16, 2023 at 04:34:12PM +0800, Jiufu Guo wrote:
>>> +/* Check if value C can be built by 2 instructions: one is 'li', another is
>>> +   rotldi.
>>> +
>>> +   If so, *SHIFT is set to the shift operand of rotldi(rldicl), and *MASK
>>> +   is set to -1, and return true.  Return false otherwise.  */
>>
>> Don't say "is set to -1", the point of having this is so you say "is set
>> to the "li" value".  Just like you describe what SHIFT is for.
> Yes, thanks!
>>
>>> +static bool
>>> +can_be_built_by_li_and_rotldi (HOST_WIDE_INT c, int *shift,
>>> +  HOST_WIDE_INT *mask)
>>> +{
>>> +  int n;
>>
>> Put shis later, like:
> Thanks!
>>
>>> +  /* Check if C can be rotated to a positive or negative value
>>> +  which 'li' instruction is able to load.  */
>>   int n;
>>> +  if (can_be_rotated_to_lowbits (c, 15, &n)
>>> +  || can_be_rotated_to_lowbits (~c, 15, &n))
>>> +{
>>> +  *mask = HOST_WIDE_INT_M1;
>>> +  *shift = HOST_BITS_PER_WIDE_INT - n;
>>> +  return true;
>>> +}
>>
>> It is tricky to see ~c will always work, since what is really done is -c
>> instead.  Can you just use that here?
>
> Some explanation: 
> A negative value for 'li' is:
> 0b11..11xxx: there are 49 leading '1's, and the other 15 trailing bits can
> be 0 or 1.  With the '~' operation, there are 49 '0's.
> After the value is rotated, there are still 49 '1's. (The xxx bits may also
> wrap around the head/tail.) 
> For the rotated value, with the '~' operation, there are still 49 '0's.
>
> So, for a value with 49 successive '1's (possibly crossing the head/tail),
> it should be possible to rotate it into the low 15 bits after the '~'
> operation.
>
> It would not be enough to use the '-' operation, since '-x = ~x + 1' in
> bit terms.  See the case 'li_rotldi_3' below: 0x8531LL
> (rotate left 0x8531 32bit).
> The '~c' is 0x7ace, which can be rotated from 0x7ace. (~0x8531).
> But '-c' is 0x7ace0001; this value is not good.
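The 49-ones argument can be checked mechanically. The sketch below is an independent illustration (the helper names are made up, not the GCC code): it enumerates rotations and confirms a case where '~c' fits the 15-bit 'li' range after rotation while '-c' never can.

```c
#include <assert.h>
#include <stdint.h>

static uint64_t rotl64 (uint64_t v, unsigned n)
{
  n &= 63;
  return n ? (v << n) | (v >> (64 - n)) : v;
}

/* Can V be rotated so that it fits in the low 15 bits?  This mirrors
   the spirit of can_be_rotated_to_lowbits (v, 15, &n).  */
static int rotatable_to_low15 (uint64_t v)
{
  for (unsigned n = 0; n < 64; n++)
    if (rotl64 (v, n) < 0x8000u)
      return 1;
  return 0;
}
```

For example, take c = 0xFFFFFFFEFFFFFFFF (the 'li -2' pattern rotated left by 32): ~c = 0x0000000100000000 rotates down to 1, but -c = 0x0000000100000001 has two set bits 32 positions apart, so no rotation can squeeze it into 15 bits — matching the explanation that '~' works where '-' does not.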
>
>>
>>> @@ -10266,15 +10291,14 @@ static void
>>>  rs6000_emit_set_long_const (rtx dest, HOST_WIDE_INT c)
>>>  {
>>>rtx temp;
>>> +  int shift;
>>> +  HOST_WIDE_INT mask;
>>>HOST_WIDE_INT ud1, ud2, ud3, ud4;
>>>  
>>>ud1 = c & 0x;
>>> -  c = c >> 16;
>>> -  ud2 = c & 0x;
>>> -  c = c >> 16;
>>> -  ud3 = c & 0x;
>>> -  c = c >> 16;
>>> -  ud4 = c & 0x;
>>> +  ud2 = (c >> 16) & 0x;
>>> +  ud3 = (c >> 32) & 0x;
>>> +  ud4 = (c >> 48) & 0x;
>>>  
>>>if ((ud4 == 0x && ud3 == 0x && ud2 == 0x && (ud1 & 0x8000))
>>>|| (ud4 == 0 && ud3 == 0 && ud2 == 0 && ! (ud1 & 0x8000)))
>>> @@ -10305,6 +10329,17 @@ rs6000_emit_set_long_const (rtx dest, 
>>> HOST_WIDE_INT c)
>>>emit_move_insn (dest, gen_rtx_XOR (DImode, temp,
>>>  GEN_INT ((ud2 ^ 0x) << 16)));
>>>  }
>>> +  else if (can_be_built_by_li_and_rotldi (c, &shift, &mask))
>>> +{
>>> +  temp = !can_create_pseudo_p () ? dest : gen_reg_rtx (DImode);
>>> +  unsigned HOST_WIDE_INT imm = (c | ~mask);
>>> +  imm = (imm >> shift) | (imm << (HOST_BITS_PER_WIDE_INT - shift));
>>> +
>>> +  emit_move_insn (temp, GEN_INT (imm));
>>> +  if (shift != 0)
>>> +   temp = gen_rtx_ROTATE (DImode, temp, GEN_INT (shift));
>>> +  emit_move_insn (dest, temp);
>>> +}
>>
>> If you would rewrite so it isn't such a run-on thing with "else if",
>> instead using early outs, or even some factoring, you could declare the
>> variable used only in a tiny scope in that tiny scope instead.
>
> Yes! Early returning is better for a lot of cases.  I would like
> to have a refactor patch.
>
>>
>>> --- /dev/null
>>> +++ b/gcc/testsuite/gcc.target/powerpc/const-build.c
>>> @@ -0,0 +1,54 @@
>>> +/* { dg-do run } */
>>> +/* { dg-options "-O2 -save-temps" } */
>>> +/* { dg-require-effective-target has_arch_ppc64 } */
>>
>> Please put a tiny comment here saying what this test is *for*?  The file
>> name is a bit of a hint already, but you can indicate much more in one or
>> two lines :-)
>
> Oh, yes, thanks for point out this!
>
>>
>> With those adjustments, okay for trunk.  Thanks!
>>
>> (If -c doesn't work, it needs more explanation).

The patch is updated, and attached below.
If ok, I would like to commit the patch accordingly.

BR,
Jeff (Jiufu Guo)


If a constant can be rotated to/from a positive or negative value that
"li" is able to load, then "li;rotldi" can be used to build the constant.

gcc/ChangeLog:

* config/rs6000/rs6000.cc (can_be_built_by_li_and_rotldi): New function.
(rs6000_emit_set_long_const): Call can_be_built_by_li_and_rotldi.

gcc/testsuite/ChangeLog:

* gcc.target/powerpc/const-build.c: New test.

---
 gcc/config/rs6000/rs6000.cc   | 47 +--
 .../gcc.target/powerpc/const-build.c  | 57 +++
 2 files changed, 98 insertions(+), 6 deletions(-)
 create mode 100644 

Re: [PATCH V3] Optimize '(X - N * M) / N' to 'X / N - M' if valid

2023-06-28 Thread Jiufu Guo via Gcc-patches


Hi,

Jiufu Guo  writes:

> Hi,
>
> Integer expression "(X - N * M) / N" can be optimized to "X / N - M" if
> there is no wrap/overflow/underflow and "X - N * M" has the same sign
> as "X".
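The same-sign precondition is what makes the rewrite exact under C-style truncating division; it can be brute-force checked with a standalone sketch (my own verification harness, not the match.pd pattern):

```c
#include <assert.h>

/* Return 1 when the rewrite "(x - n*m) / n  ->  x/n - m" is exact for
   this triple, or when its precondition does not hold (so the
   transformation would not fire).  The precondition: n*m and x - n*m do
   not overflow, and x - n*m has the same sign as x.  A wider type is
   used so the checker itself has no intermediate overflow.  */
static int rewrite_ok (int x, int n, int m)
{
  long long prod = (long long) n * m;
  long long diff = (long long) x - prod;
  int same_sign = (x >= 0 && diff >= 0) || (x <= 0 && diff <= 0);
  if (!same_sign)
    return 1;   /* precondition unmet: the rewrite is not claimed */
  return (int) (diff / n) == x / n - m;
}
```

When the signs differ the identity genuinely fails — e.g. (2 - 3*1) / 3 is 0 under truncation while 2/3 - 1 is -1 — which is exactly why the pattern must test the sign condition.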
>
> Compare with the previous version:
> https://gcc.gnu.org/pipermail/gcc-patches/2023-June/620896.html
> This version changes:
> 1. Remove the behavior of converting 'm' to '-m' for unsigned variables.
>This kind of case is rare, and it makes the code ambiguous.
> 2. Use the 'capture' expression and avoid building new expressions.
> 3. Add APIs like get_range and nonpositive/nonnegative.
> 4. Refactor patterns in match.pd and function names and signatures.
>
> Some APIs are still in gimple-fold.cc/h; I tried to add them
> to other files, but did not find a better place.
> Thanks for comments/suggestions!

Saving and propagating overflow information in range-op and value-range
may be one idea, though I wonder whether it is a good method from
the aspect of compile time and memory usage.
In the attached patch below, an m_ovf field is added to irange and
maintained in range-op/value-range-storage.

BR,
Jeff (Jiufu Guo)

diff --git a/gcc/range-op.cc b/gcc/range-op.cc
index 3ab2c665901..7c287aed8b8 100644
--- a/gcc/range-op.cc
+++ b/gcc/range-op.cc
@@ -261,6 +261,7 @@ range_operator::fold_range (irange &r, tree type,
    const irange &lh, const irange &rh,
relation_trio trio) const
 {
   gcc_checking_assert (r.supports_type_p (type));
+  r.set_overflow (lh.with_overflow () || rh.with_overflow ());
   if (empty_range_varying (r, type, lh, rh))
 return true;
 
@@ -433,6 +434,10 @@ value_range_with_overflow (irange &r, tree type,
   const unsigned int prec = TYPE_PRECISION (type);
   const bool overflow_wraps = TYPE_OVERFLOW_WRAPS (type);
 
+  if (!TYPE_OVERFLOW_UNDEFINED (type)
+  && (min_ovf != wi::OVF_NONE || max_ovf != wi::OVF_NONE))
+r.set_overflow (true);
+
   // For one bit precision if max != min, then the range covers all
   // values.
   if (prec == 1 && wi::ne_p (wmax, wmin))
@@ -2050,10 +2055,15 @@ operator_mult::wi_fold (irange , tree type,
 
   // Sort the 4 products so that min is in prod0 and max is in
   // prod3.
-  widest2_int prod0 = min0 * min1;
-  widest2_int prod1 = min0 * max1;
-  widest2_int prod2 = max0 * min1;
-  widest2_int prod3 = max0 * max1;
+  wi::overflow_type ovf1, ovf2, ovf3, ovf4;
+  widest2_int prod0 = wi::mul (min0, min1, sign, &ovf1);
+  widest2_int prod1 = wi::mul (min0, max1, sign, &ovf2);
+  widest2_int prod2 = wi::mul (max0, min1, sign, &ovf3);
+  widest2_int prod3 = wi::mul (max0, max1, sign, &ovf4);
+  if (!TYPE_OVERFLOW_UNDEFINED (type)
+  && (ovf1 != wi::OVF_NONE || ovf2 != wi::OVF_NONE || ovf3 != wi::OVF_NONE
+ || ovf4 != wi::OVF_NONE))
+r.set_overflow (true);
 
   // min0min1 > max0max1
   if (prod0 > prod3)
diff --git a/gcc/value-range-storage.cc b/gcc/value-range-storage.cc
index 2f82739680c..a541c31bde2 100644
--- a/gcc/value-range-storage.cc
+++ b/gcc/value-range-storage.cc
@@ -277,6 +277,7 @@ void
irange_storage::set_irange (const irange &r)
 {
   gcc_checking_assert (fits_p (r));
+  m_ovf = r.with_overflow ();
 
   if (r.undefined_p ())
 {
read_wide_int (wide_int &w,
 void
irange_storage::get_irange (irange &r, tree type) const
 {
+  r.set_overflow (m_ovf);
   if (m_kind == VR_UNDEFINED)
 {
   r.set_undefined ();
diff --git a/gcc/value-range-storage.h b/gcc/value-range-storage.h
index 99fb815cdc2..fc19009e566 100644
--- a/gcc/value-range-storage.h
+++ b/gcc/value-range-storage.h
@@ -90,6 +90,7 @@ private:
   unsigned char m_num_ranges;
 
   enum value_range_kind m_kind : 3;
+  bool m_ovf;
 
   // The length of this is m_num_ranges * 2 + 1 to accomodate the nonzero bits.
   HOST_WIDE_INT m_val[1];
diff --git a/gcc/value-range.h b/gcc/value-range.h
index 4dad4666a32..468d48547e1 100644
--- a/gcc/value-range.h
+++ b/gcc/value-range.h
@@ -147,6 +147,8 @@ public:
   bool contains_p (const wide_int &) const;
   bool nonnegative_p () const;
   bool nonpositive_p () const;
+  bool with_overflow () const { return m_ovf; }
+  void set_overflow (bool ovf) { m_ovf = ovf; }
 
   // In-place operators.
   virtual bool union_ (const vrange &) override;
@@ -199,6 +201,7 @@ private:
   unsigned char m_max_ranges;
   tree m_type;
   wide_int m_nonzero_mask;
+  bool m_ovf;
 protected:
   wide_int *m_base;
 };
@@ -842,6 +845,7 @@ irange::irange (wide_int *base, unsigned nranges, bool 
resizable)
 {
   m_base = base;
   set_undefined ();
+  m_ovf = false;
 }
 
 // Constructors for int_range<>.
>
> Bootstrap & regtest pass on ppc64{,le} and x86_64.
> Is this patch ok for trunk?
>
> BR,
> Jeff (Jiufu Guo)
>
>
>   PR tree-optimization/108757
>
> gcc/ChangeLog:
>
>   * gimple-fold.cc (mult_without_overflow_p): New function.
>   (plus_without_overflow_p): New function.
>   (minus_without_overflow_p): New function.
>   (same_sign_p): New function.
>   * gimple-fold.h (mult_without_overflow_p): New declare.
>   (plus_without_overflow_p): New 

RE: [PATCH] x86: Update model values for Alderlake, Rocketlake and Raptorlake.

2023-06-28 Thread Cui, Lili via Gcc-patches
I will commit this patch directly, since it can be considered obvious.

Thanks,
Lili.

> -Original Message-
> From: Gcc-patches  On
> Behalf Of Cui, Lili via Gcc-patches
> Sent: Wednesday, June 28, 2023 6:52 PM
> To: gcc-patches@gcc.gnu.org
> Cc: Liu, Hongtao 
> Subject: [PATCH] x86: Update model values for Alderlake, Rocketlake and
> Raptorlake.
> 
> Hi Hongtao,
> 
> This patch is to update model values for Alderlake, Rocketlake and
> Raptorlake according to SDM.
> 
> Ok for trunk?
> 
> Thanks.
> Lili.
> 
> Update model values for Alderlake, Rocketlake and Raptorlake according to
> SDM.
> 
> gcc/ChangeLog
> 
>   * common/config/i386/cpuinfo.h (get_intel_cpu): Remove model
> value 0xa8
>   from Rocketlake, move model value 0xbf from Alderlake to
> Raptorlake.
> ---
>  gcc/common/config/i386/cpuinfo.h | 3 +--
>  1 file changed, 1 insertion(+), 2 deletions(-)
> 
> diff --git a/gcc/common/config/i386/cpuinfo.h
> b/gcc/common/config/i386/cpuinfo.h
> index 61559ed9de2..ae48bc17771 100644
> --- a/gcc/common/config/i386/cpuinfo.h
> +++ b/gcc/common/config/i386/cpuinfo.h
> @@ -463,7 +463,6 @@ get_intel_cpu (struct __processor_model
> *cpu_model,
>cpu_model->__cpu_subtype = INTEL_COREI7_SKYLAKE;
>break;
>  case 0xa7:
> -case 0xa8:
>/* Rocket Lake.  */
>cpu = "rocketlake";
>CHECK___builtin_cpu_is ("corei7"); @@ -536,9 +535,9 @@ get_intel_cpu
> (struct __processor_model *cpu_model,
>break;
>  case 0x97:
>  case 0x9a:
> -case 0xbf:
>/* Alder Lake.  */
>  case 0xb7:
> +case 0xbf:
>/* Raptor Lake.  */
>  case 0xaa:
>  case 0xac:
> --
> 2.25.1



RE: Re: [PATCH v1] RISC-V: Allow rounding mode control for RVV floating-point add

2023-06-28 Thread Li, Pan2 via Gcc-patches
Sure thing; echoing the part below. I think we need one place to hold a 
summary of this, for example a table indicating which APIs need a rounding 
mode and which do not. I will try to draft such a summary.

> Check the SPIKE implementation and make sure which APIs need a rounding 
> mode and which do not.
> Do not trust the rvv-intrinsic-doc, since it is often wrong.
> You should check the doc too; if the doc is wrong, you should not only 
> correct the GCC implementation but also submit a fix PR to the doc.

Pan

From: juzhe.zh...@rivai.ai 
Sent: Thursday, June 29, 2023 10:44 AM
To: Kito.cheng ; Li, Pan2 
Cc: gcc-patches ; Wang, Yanzhang 
; jeffreyalaw 
Subject: Re: Re: [PATCH v1] RISC-V: Allow rounding mode control for RVV 
floating-point add

Hi, Pan.

I think the last step is to support dynamic mode switching, which may need 
changes to the mode-switching PASS.

Once that is done, I suggest you go over all the rounding mode APIs (both 
fixed-point and floating-point).

Check the SPIKE implementation and make sure which APIs need a rounding mode 
and which do not.
Do not trust the rvv-intrinsic-doc, since it is often wrong.
You should check the doc too; if the doc is wrong, you should not only correct 
the GCC implementation but also submit a fix PR to the doc.

Thanks.

juzhe.zh...@rivai.ai

From: Kito Cheng
Date: 2023-06-29 10:35
To: Li, Pan2
CC: juzhe.zh...@rivai.ai; gcc-patches; Wang, Yanzhang; jeffreyalaw
Subject: Re: [PATCH v1] RISC-V: Allow rounding mode control for RVV 
floating-point add
LGTM, thanks!

On Tue, Jun 27, 2023 at 3:02 PM Li, Pan2 <pan2...@intel.com> wrote:
>
> Ack, thanks Juzhe.
>
>
>
> Pan
>
>
>
> From: juzhe.zh...@rivai.ai <juzhe.zh...@rivai.ai>
> Sent: Tuesday, June 27, 2023 3:00 PM
> To: Li, Pan2 <pan2...@intel.com>; gcc-patches <gcc-patches@gcc.gnu.org>
> Cc: Kito.cheng <kito.ch...@sifive.com>; Li, Pan2 <pan2...@intel.com>; Wang, 
> Yanzhang <yanzhang.w...@intel.com>; jeffreyalaw <jeffreya...@gmail.com>
> Subject: Re: [PATCH v1] RISC-V: Allow rounding mode control for RVV 
> floating-point add
>
>
>
> LGTM.
>
> You can go ahead to implement rounding mode of floating-point by 
> mode-switching:
>
>
>
> Suggest you implement rounding mode for floating-point as follows:
>
>
>
> 1st step: Implement mode-switching for floating-point rounding mode except 
> DYNAMIC which should be totally same as fixed-point.
>
> 2nd step: Support DYNAMIC rounding mode on mode-switching which may need to 
> modify the mode-switching PASS.
>
>
>
> Thanks.
>
> 
>
> juzhe.zh...@rivai.ai
>
>
>
> From: pan2.li
>
> Date: 2023-06-27 14:06
>
> To: gcc-patches
>
> CC: juzhe.zhong; kito.cheng; pan2.li; yanzhang.wang; jeffreyalaw
>
> Subject: [PATCH v1] RISC-V: Allow rounding mode control for RVV 
> floating-point add
>
> From: Pan Li <pan2...@intel.com>
>
>
>
> According to the doc as below, we need to support the rounding mode of
>
> the RVV floating-point, both the static and dynamic frm.
>
>
>
> https://github.com/riscv-non-isa/rvv-intrinsic-doc/pull/226
>
>
>
> To keep tracking and development friendly, we will take some steps to support
>
> all rounding modes for the RVV floating-point rounding modes.
>
>
>
> 1. Allow rounding mode control by one intrinsic (aka this patch), vfadd.
>
> 2. Support static rounding mode control by mode switch, like fixed-point.
>
> 3. Support dynamic rounding mode control by mode switch.
>
> 4. Support the rest floating-point instructions for frm.
>
>
>
> Please *NOTE* this patch only allows rounding mode control for the
>
> vfadd intrinsic API; the related frm will be covered by step 2.
>
>
>
> Signed-off-by: Pan Li <pan2...@intel.com>
>
> Co-Authored-by: Juzhe-Zhong <juzhe.zh...@rivai.ai>
>
>
>
> gcc/ChangeLog:
>
>
>
> * config/riscv/riscv-protos.h (enum floating_point_rounding_mode):
>
> Add macro for static frm min and max.
>
> * config/riscv/riscv-vector-builtins-bases.cc
>
> (class binop_frm): New class for floating-point with frm.
>
> (BASE): Add vfadd for frm.
>
> * config/riscv/riscv-vector-builtins-bases.h: Likewise.
>
> * config/riscv/riscv-vector-builtins-functions.def
>
> (vfadd_frm): Likewise.
>
> * config/riscv/riscv-vector-builtins-shapes.cc
>
> (struct alu_frm_def): New struct for alu with frm.
>
> (SHAPE): Add alu with frm.
>
> * config/riscv/riscv-vector-builtins-shapes.h: Likewise.
>
> * config/riscv/riscv-vector-builtins.cc
>
> (function_checker::report_out_of_range_and_not): New function
>
> for report out of range and not val.
>
> (function_checker::require_immediate_range_or): New function
>
> for checking in 

[PATCH] i386: refactor macros.

2023-06-28 Thread Hu, Lin1 via Gcc-patches
Hi, all

This patch aims to refactor the macros in case something else is added to
AMX_TILE_SET in the future. OK for trunk?

BRs,
Lin

gcc/ChangeLog:

* common/config/i386/i386-common.cc (OPTION_MASK_ISA2_AMX_INT8_SET):
Change OPTION_MASK_ISA2_AMX_TILE to OPTION_MASK_ISA2_AMX_TILE_SET.
(OPTION_MASK_ISA2_AMX_FP16_SET): Ditto.
(OPTION_MASK_ISA2_AMX_COMPLEX_SET): Ditto.
(OPTION_MASK_ISA_ABM_SET):
Change OPTION_MASK_ISA_POPCNT to OPTION_MASK_ISA_POPCNT_SET.
---
 gcc/common/config/i386/i386-common.cc | 10 +-
 1 file changed, 5 insertions(+), 5 deletions(-)

diff --git a/gcc/common/config/i386/i386-common.cc 
b/gcc/common/config/i386/i386-common.cc
index bf126f14073..4f79afba917 100644
--- a/gcc/common/config/i386/i386-common.cc
+++ b/gcc/common/config/i386/i386-common.cc
@@ -107,18 +107,18 @@ along with GCC; see the file COPYING3.  If not see
 #define OPTION_MASK_ISA2_AVX512VP2INTERSECT_SET 
OPTION_MASK_ISA2_AVX512VP2INTERSECT
 #define OPTION_MASK_ISA2_AMX_TILE_SET OPTION_MASK_ISA2_AMX_TILE
 #define OPTION_MASK_ISA2_AMX_INT8_SET \
-  (OPTION_MASK_ISA2_AMX_TILE | OPTION_MASK_ISA2_AMX_INT8)
+  (OPTION_MASK_ISA2_AMX_TILE_SET | OPTION_MASK_ISA2_AMX_INT8)
 #define OPTION_MASK_ISA2_AMX_BF16_SET \
-  (OPTION_MASK_ISA2_AMX_TILE | OPTION_MASK_ISA2_AMX_BF16)
+  (OPTION_MASK_ISA2_AMX_TILE_SET | OPTION_MASK_ISA2_AMX_BF16)
 #define OPTION_MASK_ISA2_AVXVNNIINT8_SET OPTION_MASK_ISA2_AVXVNNIINT8
 #define OPTION_MASK_ISA2_AVXNECONVERT_SET OPTION_MASK_ISA2_AVXNECONVERT
 #define OPTION_MASK_ISA2_CMPCCXADD_SET OPTION_MASK_ISA2_CMPCCXADD
 #define OPTION_MASK_ISA2_AMX_FP16_SET \
-  (OPTION_MASK_ISA2_AMX_TILE | OPTION_MASK_ISA2_AMX_FP16)
+  (OPTION_MASK_ISA2_AMX_TILE_SET | OPTION_MASK_ISA2_AMX_FP16)
 #define OPTION_MASK_ISA2_PREFETCHI_SET OPTION_MASK_ISA2_PREFETCHI
 #define OPTION_MASK_ISA2_RAOINT_SET OPTION_MASK_ISA2_RAOINT
 #define OPTION_MASK_ISA2_AMX_COMPLEX_SET \
-  (OPTION_MASK_ISA2_AMX_TILE | OPTION_MASK_ISA2_AMX_COMPLEX)
+  (OPTION_MASK_ISA2_AMX_TILE_SET | OPTION_MASK_ISA2_AMX_COMPLEX)
 
 /* SSE4 includes both SSE4.1 and SSE4.2. -msse4 should be the same
as -msse4.2.  */
@@ -143,7 +143,7 @@ along with GCC; see the file COPYING3.  If not see
   (OPTION_MASK_ISA_PCLMUL | OPTION_MASK_ISA_SSE2_SET)
 
 #define OPTION_MASK_ISA_ABM_SET \
-  (OPTION_MASK_ISA_ABM | OPTION_MASK_ISA_POPCNT)
+  (OPTION_MASK_ISA_ABM | OPTION_MASK_ISA_POPCNT_SET)
 
 #define OPTION_MASK_ISA2_PCONFIG_SET OPTION_MASK_ISA2_PCONFIG
 #define OPTION_MASK_ISA2_WBNOINVD_SET OPTION_MASK_ISA2_WBNOINVD
-- 
2.31.1



Re: Re: [PATCH v1] RISC-V: Allow rounding mode control for RVV floating-point add

2023-06-28 Thread juzhe.zh...@rivai.ai
Hi, Pan.

I think the last step is to support dynamic mode switching, which may need 
changes to the mode-switching PASS.

Once that is done, I suggest you go over all the rounding mode APIs (both 
fixed-point and floating-point).

Check the SPIKE implementation and make sure which APIs need a rounding mode 
and which do not.
Do not trust the rvv-intrinsic-doc, since it is often wrong.
You should check the doc too; if the doc is wrong, you should not only correct 
the GCC implementation but also submit a fix PR to the doc.

Thanks.


juzhe.zh...@rivai.ai
 
From: Kito Cheng
Date: 2023-06-29 10:35
To: Li, Pan2
CC: juzhe.zh...@rivai.ai; gcc-patches; Wang, Yanzhang; jeffreyalaw
Subject: Re: [PATCH v1] RISC-V: Allow rounding mode control for RVV 
floating-point add
LGTM, thanks!
 
On Tue, Jun 27, 2023 at 3:02 PM Li, Pan2  wrote:
>
> Ack, thanks Juzhe.
>
>
>
> Pan
>
>
>
> From: juzhe.zh...@rivai.ai 
> Sent: Tuesday, June 27, 2023 3:00 PM
> To: Li, Pan2 ; gcc-patches 
> Cc: Kito.cheng ; Li, Pan2 ; Wang, 
> Yanzhang ; jeffreyalaw 
> Subject: Re: [PATCH v1] RISC-V: Allow rounding mode control for RVV 
> floating-point add
>
>
>
> LGTM.
>
> You can go ahead to implement rounding mode of floating-point by 
> mode-switching:
>
>
>
> Suggest you implement rounding mode for floating-point as follows:
>
>
>
> 1st step: Implement mode-switching for floating-point rounding mode except 
> DYNAMIC which should be totally same as fixed-point.
>
> 2nd step: Support DYNAMIC rounding mode on mode-switching which may need to 
> modify the mode-switching PASS.
>
>
>
> Thanks.
>
> 
>
> juzhe.zh...@rivai.ai
>
>
>
> From: pan2.li
>
> Date: 2023-06-27 14:06
>
> To: gcc-patches
>
> CC: juzhe.zhong; kito.cheng; pan2.li; yanzhang.wang; jeffreyalaw
>
> Subject: [PATCH v1] RISC-V: Allow rounding mode control for RVV 
> floating-point add
>
> From: Pan Li 
>
>
>
> According to the doc as below, we need to support the rounding mode of
>
> the RVV floating-point, both the static and dynamic frm.
>
>
>
> https://github.com/riscv-non-isa/rvv-intrinsic-doc/pull/226
>
>
>
> To keep tracking and development friendly, we will take some steps to support
>
> all rounding modes for the RVV floating-point rounding modes.
>
>
>
> 1. Allow rounding mode control by one intrinsic (aka this patch), vfadd.
>
> 2. Support static rounding mode control by mode switch, like fixed-point.
>
> 3. Support dynamic rounding mode control by mode switch.
>
> 4. Support the rest floating-point instructions for frm.
>
>
>
> Please *NOTE* this patch only allows rounding mode control for the
>
> vfadd intrinsic API; the related frm will be covered by step 2.
>
>
>
> Signed-off-by: Pan Li 
>
> Co-Authored by: Juzhe-Zhong 
>
>
>
> gcc/ChangeLog:
>
>
>
> * config/riscv/riscv-protos.h (enum floating_point_rounding_mode):
>
> Add macro for static frm min and max.
>
> * config/riscv/riscv-vector-builtins-bases.cc
>
> (class binop_frm): New class for floating-point with frm.
>
> (BASE): Add vfadd for frm.
>
> * config/riscv/riscv-vector-builtins-bases.h: Likewise.
>
> * config/riscv/riscv-vector-builtins-functions.def
>
> (vfadd_frm): Likewise.
>
> * config/riscv/riscv-vector-builtins-shapes.cc
>
> (struct alu_frm_def): New struct for alu with frm.
>
> (SHAPE): Add alu with frm.
>
> * config/riscv/riscv-vector-builtins-shapes.h: Likewise.
>
> * config/riscv/riscv-vector-builtins.cc
>
> (function_checker::report_out_of_range_and_not): New function
>
> for report out of range and not val.
>
> (function_checker::require_immediate_range_or): New function
>
> for checking in range or one val.
>
> * config/riscv/riscv-vector-builtins.h: Add function decl.
>
>
>
> gcc/testsuite/ChangeLog:
>
>
>
> * gcc.target/riscv/rvv/base/float-point-frm-error.c: New test.
>
> * gcc.target/riscv/rvv/base/float-point-frm.c: New test.
>
> ---
>
> gcc/config/riscv/riscv-protos.h   |  2 +
>
> .../riscv/riscv-vector-builtins-bases.cc  | 25 +++
>
> .../riscv/riscv-vector-builtins-bases.h   |  1 +
>
> .../riscv/riscv-vector-builtins-functions.def |  2 +
>
> .../riscv/riscv-vector-builtins-shapes.cc | 68 +++
>
> .../riscv/riscv-vector-builtins-shapes.h  |  1 +
>
> gcc/config/riscv/riscv-vector-builtins.cc | 41 +++
>
> gcc/config/riscv/riscv-vector-builtins.h  |  4 ++
>
> .../riscv/rvv/base/float-point-frm-error.c| 15 
>
> .../riscv/rvv/base/float-point-frm.c  | 30 
>
> 10 files changed, 189 insertions(+)
>
> create mode 100644 
> gcc/testsuite/gcc.target/riscv/rvv/base/float-point-frm-error.c
>
> create mode 100644 gcc/testsuite/gcc.target/riscv/rvv/base/float-point-frm.c
>
>
>
> diff --git a/gcc/config/riscv/riscv-protos.h b/gcc/config/riscv/riscv-protos.h
>
> index f686edab3d1..bee64eee504 100644
>
> --- a/gcc/config/riscv/riscv-protos.h
>
> +++ b/gcc/config/riscv/riscv-protos.h
>
> @@ -278,6 +278,8 @@ enum floating_point_rounding_mode
>
>FRM_RUP = 3, /* Aka 0b011.  */
>
>FRM_RMM = 4, 

Re: [PATCH v1] RISC-V: Allow rounding mode control for RVV floating-point add

2023-06-28 Thread Kito Cheng via Gcc-patches
LGTM, thanks!

On Tue, Jun 27, 2023 at 3:02 PM Li, Pan2  wrote:
>
> Ack, thanks Juzhe.
>
>
>
> Pan
>
>
>
> From: juzhe.zh...@rivai.ai 
> Sent: Tuesday, June 27, 2023 3:00 PM
> To: Li, Pan2 ; gcc-patches 
> Cc: Kito.cheng ; Li, Pan2 ; Wang, 
> Yanzhang ; jeffreyalaw 
> Subject: Re: [PATCH v1] RISC-V: Allow rounding mode control for RVV 
> floating-point add
>
>
>
> LGTM.
>
> You can go ahead to implement rounding mode of floating-point by 
> mode-switching:
>
>
>
> Suggest you implement rounding mode for floating-point as follows:
>
>
>
> 1st step: Implement mode-switching for floating-point rounding mode except 
> DYNAMIC which should be totally same as fixed-point.
>
> 2nd step: Support DYNAMIC rounding mode on mode-switching which may need to 
> modify the mode-switching PASS.
>
>
>
> Thanks.
>
> 
>
> juzhe.zh...@rivai.ai
>
>
>
> From: pan2.li
>
> Date: 2023-06-27 14:06
>
> To: gcc-patches
>
> CC: juzhe.zhong; kito.cheng; pan2.li; yanzhang.wang; jeffreyalaw
>
> Subject: [PATCH v1] RISC-V: Allow rounding mode control for RVV 
> floating-point add
>
> From: Pan Li 
>
>
>
> According to the doc as below, we need to support the rounding mode of
>
> the RVV floating-point, both the static and dynamic frm.
>
>
>
> https://github.com/riscv-non-isa/rvv-intrinsic-doc/pull/226
>
>
>
> To keep tracking and development friendly, we will take some steps to support
>
> all rounding modes for the RVV floating-point rounding modes.
>
>
>
> 1. Allow rounding mode control by one intrinsic (aka this patch), vfadd.
>
> 2. Support static rounding mode control by mode switch, like fixed-point.
>
> 3. Support dynamic rounding mode control by mode switch.
>
> 4. Support the rest floating-point instructions for frm.
>
>
>
> Please *NOTE* this patch only allows rounding mode control for the
>
> vfadd intrinsic API; the related frm will be covered by step 2.
>
>
>
> Signed-off-by: Pan Li 
>
> Co-Authored by: Juzhe-Zhong 
>
>
>
> gcc/ChangeLog:
>
>
>
> * config/riscv/riscv-protos.h (enum floating_point_rounding_mode):
>
> Add macro for static frm min and max.
>
> * config/riscv/riscv-vector-builtins-bases.cc
>
> (class binop_frm): New class for floating-point with frm.
>
> (BASE): Add vfadd for frm.
>
> * config/riscv/riscv-vector-builtins-bases.h: Likewise.
>
> * config/riscv/riscv-vector-builtins-functions.def
>
> (vfadd_frm): Likewise.
>
> * config/riscv/riscv-vector-builtins-shapes.cc
>
> (struct alu_frm_def): New struct for alu with frm.
>
> (SHAPE): Add alu with frm.
>
> * config/riscv/riscv-vector-builtins-shapes.h: Likewise.
>
> * config/riscv/riscv-vector-builtins.cc
>
> (function_checker::report_out_of_range_and_not): New function
>
> for report out of range and not val.
>
> (function_checker::require_immediate_range_or): New function
>
> for checking in range or one val.
>
> * config/riscv/riscv-vector-builtins.h: Add function decl.
>
>
>
> gcc/testsuite/ChangeLog:
>
>
>
> * gcc.target/riscv/rvv/base/float-point-frm-error.c: New test.
>
> * gcc.target/riscv/rvv/base/float-point-frm.c: New test.
>
> ---
>
> gcc/config/riscv/riscv-protos.h   |  2 +
>
> .../riscv/riscv-vector-builtins-bases.cc  | 25 +++
>
> .../riscv/riscv-vector-builtins-bases.h   |  1 +
>
> .../riscv/riscv-vector-builtins-functions.def |  2 +
>
> .../riscv/riscv-vector-builtins-shapes.cc | 68 +++
>
> .../riscv/riscv-vector-builtins-shapes.h  |  1 +
>
> gcc/config/riscv/riscv-vector-builtins.cc | 41 +++
>
> gcc/config/riscv/riscv-vector-builtins.h  |  4 ++
>
> .../riscv/rvv/base/float-point-frm-error.c| 15 
>
> .../riscv/rvv/base/float-point-frm.c  | 30 
>
> 10 files changed, 189 insertions(+)
>
> create mode 100644 
> gcc/testsuite/gcc.target/riscv/rvv/base/float-point-frm-error.c
>
> create mode 100644 gcc/testsuite/gcc.target/riscv/rvv/base/float-point-frm.c
>
>
>
> diff --git a/gcc/config/riscv/riscv-protos.h b/gcc/config/riscv/riscv-protos.h
>
> index f686edab3d1..bee64eee504 100644
>
> --- a/gcc/config/riscv/riscv-protos.h
>
> +++ b/gcc/config/riscv/riscv-protos.h
>
> @@ -278,6 +278,8 @@ enum floating_point_rounding_mode
>
>FRM_RUP = 3, /* Aka 0b011.  */
>
>FRM_RMM = 4, /* Aka 0b100.  */
>
>FRM_DYN = 7, /* Aka 0b111.  */
>
> +  FRM_STATIC_MIN = FRM_RNE,
>
> +  FRM_STATIC_MAX = FRM_RMM,
>
> };
>
> opt_machine_mode vectorize_related_mode (machine_mode, scalar_mode,
>
> diff --git a/gcc/config/riscv/riscv-vector-builtins-bases.cc 
> b/gcc/config/riscv/riscv-vector-builtins-bases.cc
>
> index 5c8deda900d..1b4c2c6ad66 100644
>
> --- a/gcc/config/riscv/riscv-vector-builtins-bases.cc
>
> +++ b/gcc/config/riscv/riscv-vector-builtins-bases.cc
>
> @@ -281,6 +281,29 @@ public:
>
>}
>
> };
>
> +/* Implements below instructions for now.
>
> +   - vfadd
>
> +*/
>
> +template
>
> +class binop_frm : public function_base
>
> +{
>
> +public:
>
> +  bool has_rounding_mode_operand_p () const override { return true; }
>
> +
>
> +  

Re: [PATCH v1] RISC-V: Support vfadd static rounding mode by mode switching

2023-06-28 Thread Kito Cheng via Gcc-patches
LGTM, thanks :)

On Thu, Jun 29, 2023 at 10:24 AM juzhe.zh...@rivai.ai wrote:
>
> LGTM
>
> 
> juzhe.zh...@rivai.ai
>
>
> From: pan2.li
> Date: 2023-06-29 09:40
> To: gcc-patches
> CC: juzhe.zhong; kito.cheng; pan2.li; yanzhang.wang; jeffreyalaw
> Subject: [PATCH v1] RISC-V: Support vfadd static rounding mode by mode 
> switching
> From: Pan Li 
>
> This patch would like to support the vfadd static rounding mode, similar to
> the fixed-point case. The related fsrm instructions will then be inserted
> accordingly.
>
> Please *NOTE* this PATCH doesn't cover anything about the FRM dynamic mode;
> it will be implemented in follow-up PATCH(s).
>
> Signed-off-by: Pan Li 
>
> gcc/ChangeLog:
>
> * config/riscv/riscv.cc (riscv_emit_mode_set): Add emit for FRM.
> (riscv_mode_needed): Likewise.
> (riscv_entity_mode_after): Likewise.
> (riscv_mode_after): Likewise.
> (riscv_mode_entry): Likewise.
> (riscv_mode_exit): Likewise.
> * config/riscv/riscv.h (NUM_MODES_FOR_MODE_SWITCHING): Add number
> for FRM.
> * config/riscv/riscv.md: Add FRM register.
> * config/riscv/vector-iterators.md: Add FRM type.
> * config/riscv/vector.md (frm_mode): Define new attr for FRM mode.
> (fsrm): Define new insn for fsrm instruction.
>
> gcc/testsuite/ChangeLog:
>
> * gcc.target/riscv/rvv/base/float-point-frm-insert-1.c: New test.
> * gcc.target/riscv/rvv/base/float-point-frm-insert-2.c: New test.
> * gcc.target/riscv/rvv/base/float-point-frm-insert-3.c: New test.
> * gcc.target/riscv/rvv/base/float-point-frm-insert-4.c: New test.
> * gcc.target/riscv/rvv/base/float-point-frm-insert-5.c: New test.
> ---
> gcc/config/riscv/riscv.cc | 52 ++
> gcc/config/riscv/riscv.h  |  4 +-
> gcc/config/riscv/riscv.md |  4 +-
> gcc/config/riscv/vector-iterators.md  |  2 +
> gcc/config/riscv/vector.md| 53 +++
> .../riscv/rvv/base/float-point-frm-insert-1.c | 31 +++
> .../riscv/rvv/base/float-point-frm-insert-2.c | 14 +
> .../riscv/rvv/base/float-point-frm-insert-3.c | 14 +
> .../riscv/rvv/base/float-point-frm-insert-4.c | 23 
> .../riscv/rvv/base/float-point-frm-insert-5.c | 23 
> 10 files changed, 206 insertions(+), 14 deletions(-)
> create mode 100644 
> gcc/testsuite/gcc.target/riscv/rvv/base/float-point-frm-insert-1.c
> create mode 100644 
> gcc/testsuite/gcc.target/riscv/rvv/base/float-point-frm-insert-2.c
> create mode 100644 
> gcc/testsuite/gcc.target/riscv/rvv/base/float-point-frm-insert-3.c
> create mode 100644 
> gcc/testsuite/gcc.target/riscv/rvv/base/float-point-frm-insert-4.c
> create mode 100644 
> gcc/testsuite/gcc.target/riscv/rvv/base/float-point-frm-insert-5.c
>
> diff --git a/gcc/config/riscv/riscv.cc b/gcc/config/riscv/riscv.cc
> index 280aa0b33b9..e4dc8115e69 100644
> --- a/gcc/config/riscv/riscv.cc
> +++ b/gcc/config/riscv/riscv.cc
> @@ -7669,6 +7669,16 @@ riscv_emit_mode_set (int entity, int mode, int 
> prev_mode,
>if (mode != VXRM_MODE_NONE && mode != prev_mode)
> emit_insn (gen_vxrmsi (gen_int_mode (mode, SImode)));
>break;
> +case RISCV_FRM:
> +  if (mode != FRM_MODE_NONE && mode != prev_mode)
> + {
> +   rtx scaler = gen_reg_rtx (SImode);
> +   rtx imm = gen_int_mode (mode, SImode);
> +
> +   emit_insn (gen_movsi (scaler, imm));
> +   emit_insn (gen_fsrm (scaler, scaler));
> + }
> +  break;
>  default:
>gcc_unreachable ();
>  }
> @@ -7680,11 +7690,14 @@ riscv_emit_mode_set (int entity, int mode, int 
> prev_mode,
> static int
> riscv_mode_needed (int entity, rtx_insn *insn)
> {
> +  int code = recog_memoized (insn);
> +
>switch (entity)
>  {
>  case RISCV_VXRM:
> -  return recog_memoized (insn) >= 0 ? get_attr_vxrm_mode (insn)
> - : VXRM_MODE_NONE;
> +  return code >= 0 ? get_attr_vxrm_mode (insn) : VXRM_MODE_NONE;
> +case RISCV_FRM:
> +  return code >= 0 ? get_attr_frm_mode (insn) : FRM_MODE_NONE;
>  default:
>gcc_unreachable ();
>  }
> @@ -7715,6 +7728,21 @@ global_state_unknown_p (rtx_insn *insn, unsigned int 
> regno)
>return false;
> }
> +static int
> +riscv_entity_mode_after (int regnum, rtx_insn *insn, int mode,
> + int (*get_attr_mode) (rtx_insn *), int default_mode)
> +{
> +  if (global_state_unknown_p (insn, regnum))
> +return default_mode;
> +  else if (recog_memoized (insn) < 0)
> +return mode;
> +
> +  rtx reg = gen_rtx_REG (SImode, regnum);
> +  bool mentioned_p = reg_mentioned_p (reg, PATTERN (insn));
> +
> +  return mentioned_p ? get_attr_mode (insn): mode;
> +}
> +
> /* Return the mode that an insn results in.  */
> static int
> @@ -7723,15 +7751,13 @@ riscv_mode_after (int entity, int mode, rtx_insn 
> *insn)
>switch (entity)
>  {
>  case RISCV_VXRM:
> -  if (global_state_unknown_p (insn, VXRM_REGNUM))
> - return VXRM_MODE_NONE;
> -  else if (recog_memoized (insn) >= 0)
> - return reg_mentioned_p (gen_rtx_REG 

Re: [PATCH v1] RISC-V: Support vfadd static rounding mode by mode switching

2023-06-28 Thread juzhe.zh...@rivai.ai
LGTM



juzhe.zh...@rivai.ai
 
From: pan2.li
Date: 2023-06-29 09:40
To: gcc-patches
CC: juzhe.zhong; kito.cheng; pan2.li; yanzhang.wang; jeffreyalaw
Subject: [PATCH v1] RISC-V: Support vfadd static rounding mode by mode switching
From: Pan Li 
 
This patch would like to support the vfadd static rounding mode, similar to
the fixed-point case. The related fsrm instructions will then be inserted
accordingly.
 
Please *NOTE* this PATCH doesn't cover anything about the FRM dynamic mode;
it will be implemented in follow-up PATCH(s).
 
Signed-off-by: Pan Li 
 
gcc/ChangeLog:
 
* config/riscv/riscv.cc (riscv_emit_mode_set): Add emit for FRM.
(riscv_mode_needed): Likewise.
(riscv_entity_mode_after): Likewise.
(riscv_mode_after): Likewise.
(riscv_mode_entry): Likewise.
(riscv_mode_exit): Likewise.
* config/riscv/riscv.h (NUM_MODES_FOR_MODE_SWITCHING): Add number
for FRM.
* config/riscv/riscv.md: Add FRM register.
* config/riscv/vector-iterators.md: Add FRM type.
* config/riscv/vector.md (frm_mode): Define new attr for FRM mode.
(fsrm): Define new insn for fsrm instruction.
 
gcc/testsuite/ChangeLog:
 
* gcc.target/riscv/rvv/base/float-point-frm-insert-1.c: New test.
* gcc.target/riscv/rvv/base/float-point-frm-insert-2.c: New test.
* gcc.target/riscv/rvv/base/float-point-frm-insert-3.c: New test.
* gcc.target/riscv/rvv/base/float-point-frm-insert-4.c: New test.
* gcc.target/riscv/rvv/base/float-point-frm-insert-5.c: New test.
---
gcc/config/riscv/riscv.cc | 52 ++
gcc/config/riscv/riscv.h  |  4 +-
gcc/config/riscv/riscv.md |  4 +-
gcc/config/riscv/vector-iterators.md  |  2 +
gcc/config/riscv/vector.md| 53 +++
.../riscv/rvv/base/float-point-frm-insert-1.c | 31 +++
.../riscv/rvv/base/float-point-frm-insert-2.c | 14 +
.../riscv/rvv/base/float-point-frm-insert-3.c | 14 +
.../riscv/rvv/base/float-point-frm-insert-4.c | 23 
.../riscv/rvv/base/float-point-frm-insert-5.c | 23 
10 files changed, 206 insertions(+), 14 deletions(-)
create mode 100644 
gcc/testsuite/gcc.target/riscv/rvv/base/float-point-frm-insert-1.c
create mode 100644 
gcc/testsuite/gcc.target/riscv/rvv/base/float-point-frm-insert-2.c
create mode 100644 
gcc/testsuite/gcc.target/riscv/rvv/base/float-point-frm-insert-3.c
create mode 100644 
gcc/testsuite/gcc.target/riscv/rvv/base/float-point-frm-insert-4.c
create mode 100644 
gcc/testsuite/gcc.target/riscv/rvv/base/float-point-frm-insert-5.c
 
diff --git a/gcc/config/riscv/riscv.cc b/gcc/config/riscv/riscv.cc
index 280aa0b33b9..e4dc8115e69 100644
--- a/gcc/config/riscv/riscv.cc
+++ b/gcc/config/riscv/riscv.cc
@@ -7669,6 +7669,16 @@ riscv_emit_mode_set (int entity, int mode, int prev_mode,
   if (mode != VXRM_MODE_NONE && mode != prev_mode)
emit_insn (gen_vxrmsi (gen_int_mode (mode, SImode)));
   break;
+case RISCV_FRM:
+  if (mode != FRM_MODE_NONE && mode != prev_mode)
+ {
+   rtx scaler = gen_reg_rtx (SImode);
+   rtx imm = gen_int_mode (mode, SImode);
+
+   emit_insn (gen_movsi (scaler, imm));
+   emit_insn (gen_fsrm (scaler, scaler));
+ }
+  break;
 default:
   gcc_unreachable ();
 }
@@ -7680,11 +7690,14 @@ riscv_emit_mode_set (int entity, int mode, int 
prev_mode,
static int
riscv_mode_needed (int entity, rtx_insn *insn)
{
+  int code = recog_memoized (insn);
+
   switch (entity)
 {
 case RISCV_VXRM:
-  return recog_memoized (insn) >= 0 ? get_attr_vxrm_mode (insn)
- : VXRM_MODE_NONE;
+  return code >= 0 ? get_attr_vxrm_mode (insn) : VXRM_MODE_NONE;
+case RISCV_FRM:
+  return code >= 0 ? get_attr_frm_mode (insn) : FRM_MODE_NONE;
 default:
   gcc_unreachable ();
 }
@@ -7715,6 +7728,21 @@ global_state_unknown_p (rtx_insn *insn, unsigned int 
regno)
   return false;
}
+static int
+riscv_entity_mode_after (int regnum, rtx_insn *insn, int mode,
+ int (*get_attr_mode) (rtx_insn *), int default_mode)
+{
+  if (global_state_unknown_p (insn, regnum))
+return default_mode;
+  else if (recog_memoized (insn) < 0)
+return mode;
+
+  rtx reg = gen_rtx_REG (SImode, regnum);
+  bool mentioned_p = reg_mentioned_p (reg, PATTERN (insn));
+
+  return mentioned_p ? get_attr_mode (insn): mode;
+}
+
/* Return the mode that an insn results in.  */
static int
@@ -7723,15 +7751,13 @@ riscv_mode_after (int entity, int mode, rtx_insn *insn)
   switch (entity)
 {
 case RISCV_VXRM:
-  if (global_state_unknown_p (insn, VXRM_REGNUM))
- return VXRM_MODE_NONE;
-  else if (recog_memoized (insn) >= 0)
- return reg_mentioned_p (gen_rtx_REG (SImode, VXRM_REGNUM),
- PATTERN (insn))
- ? get_attr_vxrm_mode (insn)
- : mode;
-  else
- return mode;
+  return riscv_entity_mode_after (VXRM_REGNUM, insn, mode,
+   (int (*)(rtx_insn *)) get_attr_vxrm_mode,
+   VXRM_MODE_NONE);
+case RISCV_FRM:
+  return riscv_entity_mode_after (FRM_REGNUM, insn, mode,
+   (int (*)(rtx_insn *)) 

[PATCH] PR gcc/110148:Avoid adding loop-carried ops to long chains

2023-06-28 Thread Cui, Lili via Gcc-patches
From: Lili Cui 

Hi Maintainer

This patch fixes the TSVC s242 regression related to loop-carried ops.

Bootstrapped and regtested. Ok for trunk?

Regards
Lili.

Avoid adding loop-carried ops to long chains; otherwise the whole chain will
have dependencies across loop iterations. Instead, keep loop-carried ops in a
separate chain.
   E.g.
   x_1 = phi(x_0, x_2)
   y_1 = phi(y_0, y_2)

   a + b + c + d + e + x1 + y1

   SSA1 = a + b;
   SSA2 = c + d;
   SSA3 = SSA1 + e;
   SSA4 = SSA3 + SSA2;
   SSA5 = x1 + y1;
   SSA6 = SSA4 + SSA5;

With the patch applied, these test cases improved by 32%~100%.

S242:
for (int i = 1; i < LEN_1D; ++i) {
a[i] = a[i - 1] + s1 + s2 + b[i] + c[i] + d[i];}

Case 1:
for (int i = 1; i < LEN_1D; ++i) {
a[i] = a[i - 1] + s1 + s2 + b[i] + c[i] + d[i] + e[i];}

Case 2:
for (int i = 1; i < LEN_1D; ++i) {
a[i] = a[i - 1] + b[i - 1] + s1 + s2 + b[i] + c[i] + d[i] + e[i];}

The values are execution times:
A: original version
B: with the FMA patch g:e5405f065bace0685cb3b8878d1dfc7a6e7ef409 (based on A)
C: with the current patch (based on B)

          A       B       C       B/A          C/A
s242      2.859   5.152   2.859   1.802028681  1
case 1    5.489   5.488   3.511   0.999818     0.64
case 2    7.216   7.499   4.885   1.039218     0.68

gcc/ChangeLog:
PR tree-optimization/110148
* tree-ssa-reassoc.cc (rewrite_expr_tree_parallel): Handle loop-carried
ops in this function.
---
 gcc/tree-ssa-reassoc.cc | 236 
 1 file changed, 167 insertions(+), 69 deletions(-)

diff --git a/gcc/tree-ssa-reassoc.cc b/gcc/tree-ssa-reassoc.cc
index 96c88ec003e..f5da385e0b2 100644
--- a/gcc/tree-ssa-reassoc.cc
+++ b/gcc/tree-ssa-reassoc.cc
@@ -5471,37 +5471,62 @@ get_reassociation_width (int ops_num, enum tree_code 
opc,
   return width;
 }
 
+#define SPECIAL_BIASED_END_STMT 0 /* It is the end stmt of all ops.  */
+#define BIASED_END_STMT 1 /* It is the end stmt of normal or biased ops.  */
+#define NORMAL_END_STMT 2 /* It is the end stmt of normal ops.  */
+
 /* Rewrite statements with dependency chain with regard the chance to generate
FMA.
For the chain with FMA: Try to keep fma opportunity as much as possible.
For the chain without FMA: Putting the computation in rank order and trying
to allow operations to be executed in parallel.
E.g.
-   e + f + g + a * b + c * d;
+   e + f + a * b + c * d;
 
-   ssa1 = e + f;
-   ssa2 = g + a * b;
-   ssa3 = ssa1 + c * d;
-   ssa4 = ssa2 + ssa3;
+   ssa1 = e + a * b;
+   ssa2 = f + c * d;
+   ssa3 = ssa1 + ssa2;
 
This reassociation approach preserves the chance of fma generation as much
-   as possible.  */
+   as possible.
+
+   Another thing is to avoid adding loop-carried ops to long chains, otherwise
+   the whole chain will have dependencies across the loop iteration. Just keep
+   loop-carried ops in a separate chain.
+   E.g.
+   x_1 = phi(x_0, x_2)
+   y_1 = phi(y_0, y_2)
+
+   a + b + c + d + e + x1 + y1
+
+   SSA1 = a + b;
+   SSA2 = c + d;
+   SSA3 = SSA1 + e;
+   SSA4 = SSA3 + SSA2;
+   SSA5 = x1 + y1;
+   SSA6 = SSA4 + SSA5;
+ */
 static void
 rewrite_expr_tree_parallel (gassign *stmt, int width, bool has_fma,
-			    const vec<operand_entry *> &ops)
+			    const vec<operand_entry *> &ops)
 {
   enum tree_code opcode = gimple_assign_rhs_code (stmt);
   int op_num = ops.length ();
+  int op_normal_num = op_num;
   gcc_assert (op_num > 0);
   int stmt_num = op_num - 1;
   gimple **stmts = XALLOCAVEC (gimple *, stmt_num);
-  int op_index = op_num - 1;
-  int width_count = width;
   int i = 0, j = 0;
   tree tmp_op[2], op1;
   operand_entry *oe;
   gimple *stmt1 = NULL;
   tree last_rhs1 = gimple_assign_rhs1 (stmt);
+  int last_rhs1_stmt_index = 0, last_rhs2_stmt_index = 0; 
+  int width_active = 0, width_count = 0;
+  bool has_biased = false, ops_changed = false;
+  auto_vec<operand_entry *> ops_normal;
+  auto_vec<operand_entry *> ops_biased;
+  vec<operand_entry *> *ops1;
 
   /* We start expression rewriting from the top statements.
  So, in this loop we create a full list of statements
@@ -5510,83 +5535,155 @@ rewrite_expr_tree_parallel (gassign *stmt, int width, bool has_fma,
   for (i = stmt_num - 2; i >= 0; i--)
 stmts[i] = SSA_NAME_DEF_STMT (gimple_assign_rhs1 (stmts[i+1]));
 
-  /* Width should not be larger than op_num / 2, since we can not create
+  /* Avoid adding loop-carried ops to long chains, first filter out the
+ loop-carried.  But we need to make sure that the length of the remainder
+ is not less than 4, which is the smallest ops length we can break the
+ dependency.  */
+  FOR_EACH_VEC_ELT (ops, i, oe)
+{
+  if (TREE_CODE (oe->op) == SSA_NAME
+ && bitmap_bit_p (biased_names, SSA_NAME_VERSION (oe->op))
+ && op_normal_num > 4)
+   {
+ ops_biased.safe_push (oe);
+ has_biased = true;
+ op_normal_num --;
+   }
+  else
+   ops_normal.safe_push (oe);
+}
+
+  /* Width should not be larger than ops length 

[PATCH v1] RISC-V: Support vfadd static rounding mode by mode switching

2023-06-28 Thread Pan Li via Gcc-patches
From: Pan Li 

This patch would like to support the vfadd static rounding mode, similar to
the fixed-point handling.  The related fsrm instructions are then inserted
accordingly.

Please *NOTE* this PATCH doesn't cover anything about FRM dynamic mode,
it will be implemented in the underlying PATCH(s).

Signed-off-by: Pan Li 

gcc/ChangeLog:

* config/riscv/riscv.cc (riscv_emit_mode_set): Add emit for FRM.
(riscv_mode_needed): Likewise.
(riscv_entity_mode_after): Likewise.
(riscv_mode_after): Likewise.
(riscv_mode_entry): Likewise.
(riscv_mode_exit): Likewise.
* config/riscv/riscv.h (NUM_MODES_FOR_MODE_SWITCHING): Add number
for FRM.
* config/riscv/riscv.md: Add FRM register.
* config/riscv/vector-iterators.md: Add FRM type.
* config/riscv/vector.md (frm_mode): Define new attr for FRM mode.
(fsrm): Define new insn for fsrm instruction.

gcc/testsuite/ChangeLog:

* gcc.target/riscv/rvv/base/float-point-frm-insert-1.c: New test.
* gcc.target/riscv/rvv/base/float-point-frm-insert-2.c: New test.
* gcc.target/riscv/rvv/base/float-point-frm-insert-3.c: New test.
* gcc.target/riscv/rvv/base/float-point-frm-insert-4.c: New test.
* gcc.target/riscv/rvv/base/float-point-frm-insert-5.c: New test.
---
 gcc/config/riscv/riscv.cc | 52 ++
 gcc/config/riscv/riscv.h  |  4 +-
 gcc/config/riscv/riscv.md |  4 +-
 gcc/config/riscv/vector-iterators.md  |  2 +
 gcc/config/riscv/vector.md| 53 +++
 .../riscv/rvv/base/float-point-frm-insert-1.c | 31 +++
 .../riscv/rvv/base/float-point-frm-insert-2.c | 14 +
 .../riscv/rvv/base/float-point-frm-insert-3.c | 14 +
 .../riscv/rvv/base/float-point-frm-insert-4.c | 23 
 .../riscv/rvv/base/float-point-frm-insert-5.c | 23 
 10 files changed, 206 insertions(+), 14 deletions(-)
 create mode 100644 gcc/testsuite/gcc.target/riscv/rvv/base/float-point-frm-insert-1.c
 create mode 100644 gcc/testsuite/gcc.target/riscv/rvv/base/float-point-frm-insert-2.c
 create mode 100644 gcc/testsuite/gcc.target/riscv/rvv/base/float-point-frm-insert-3.c
 create mode 100644 gcc/testsuite/gcc.target/riscv/rvv/base/float-point-frm-insert-4.c
 create mode 100644 gcc/testsuite/gcc.target/riscv/rvv/base/float-point-frm-insert-5.c

diff --git a/gcc/config/riscv/riscv.cc b/gcc/config/riscv/riscv.cc
index 280aa0b33b9..e4dc8115e69 100644
--- a/gcc/config/riscv/riscv.cc
+++ b/gcc/config/riscv/riscv.cc
@@ -7669,6 +7669,16 @@ riscv_emit_mode_set (int entity, int mode, int prev_mode,
   if (mode != VXRM_MODE_NONE && mode != prev_mode)
emit_insn (gen_vxrmsi (gen_int_mode (mode, SImode)));
   break;
+case RISCV_FRM:
+  if (mode != FRM_MODE_NONE && mode != prev_mode)
+   {
+ rtx scaler = gen_reg_rtx (SImode);
+ rtx imm = gen_int_mode (mode, SImode);
+
+ emit_insn (gen_movsi (scaler, imm));
+ emit_insn (gen_fsrm (scaler, scaler));
+   }
+  break;
 default:
   gcc_unreachable ();
 }
@@ -7680,11 +7690,14 @@ riscv_emit_mode_set (int entity, int mode, int prev_mode,
 static int
 riscv_mode_needed (int entity, rtx_insn *insn)
 {
+  int code = recog_memoized (insn);
+
   switch (entity)
 {
 case RISCV_VXRM:
-  return recog_memoized (insn) >= 0 ? get_attr_vxrm_mode (insn)
-   : VXRM_MODE_NONE;
+  return code >= 0 ? get_attr_vxrm_mode (insn) : VXRM_MODE_NONE;
+case RISCV_FRM:
+  return code >= 0 ? get_attr_frm_mode (insn) : FRM_MODE_NONE;
 default:
   gcc_unreachable ();
 }
@@ -7715,6 +7728,21 @@ global_state_unknown_p (rtx_insn *insn, unsigned int regno)
   return false;
 }
 
+static int
+riscv_entity_mode_after (int regnum, rtx_insn *insn, int mode,
+int (*get_attr_mode) (rtx_insn *), int default_mode)
+{
+  if (global_state_unknown_p (insn, regnum))
+return default_mode;
+  else if (recog_memoized (insn) < 0)
+return mode;
+
+  rtx reg = gen_rtx_REG (SImode, regnum);
+  bool mentioned_p = reg_mentioned_p (reg, PATTERN (insn));
+
+  return mentioned_p ? get_attr_mode (insn) : mode;
+}
+
 /* Return the mode that an insn results in.  */
 
 static int
@@ -7723,15 +7751,13 @@ riscv_mode_after (int entity, int mode, rtx_insn *insn)
   switch (entity)
 {
 case RISCV_VXRM:
-  if (global_state_unknown_p (insn, VXRM_REGNUM))
-   return VXRM_MODE_NONE;
-  else if (recog_memoized (insn) >= 0)
-   return reg_mentioned_p (gen_rtx_REG (SImode, VXRM_REGNUM),
-   PATTERN (insn))
-? get_attr_vxrm_mode (insn)
-: mode;
-  else
-   return mode;
+  return riscv_entity_mode_after (VXRM_REGNUM, insn, mode,
+ (int (*)(rtx_insn *)) get_attr_vxrm_mode,
+   

Re: [PATCH] analyzer: Fix regression bug after r14-1632-g9589a46ddadc8b [pr110198]

2023-06-28 Thread David Malcolm via Gcc-patches
On Thu, 2023-06-22 at 21:55 +0200, priour...@gmail.com wrote:
> From: benjamin priour 
> 
> Resend with proper subject line ...
> 
> Hi,

Hi Benjamin

> 
> Below is the fix to regression bug
> https://gcc.gnu.org/bugzilla/show_bug.cgi?id=110198
> Was bootstrapped and regtested successfully on x86_64-linux-gnu
> Considering the mishap from the last patch, I'd appreciate it if you could
> also regtest it, to be sure :)

I tried this, but it didn't apply cleanly to my working copy.  Which
version of master was this against / when did you last rebase this?  I
see in comment #5 of PR 110198 that the results have been changing.

[...snip...]

> g++.dg/analyzer/pr100244.C was failing after a patch of PR109439.
> The reason was a spurious preemptive return of get_store_value upon 
> out-of-bounds read that
> was preventing further checks. Now instead, a boolean value check_poisoned 
> goes to false when
> a OOB is detected, and is later on given to get_or_create_initial_value.
> 
> gcc/analyzer/ChangeLog:
> 
> * region-model-manager.cc 
> (region_model_manager::get_or_create_initial_value): Take an
> optional boolean value to bypass poisoning checks
> * region-model-manager.h: Update declaration of the above function.
> * region-model.cc (region_model::get_store_value): No longer
> returns on OOB, but rather gives a boolean to 
> get_or_create_initial_value.
> (region_model::check_region_access): Update docstring.
> (region_model::check_region_for_write): Update docstring.

Something's gone a bit wrong with the formatting of the ChangeLog
entries.  Ideally they shouldn't go wider than 74 columns, so they need
a few newlines.  Also, some of the lines have too many leading tabs.

[...snip...]

The content of the patch itself looks reasonable.

Thanks

Dave



[r14-2159 Regression] FAIL: gcc.target/i386/pieces-memcmp-2.c scan-assembler-times vptest[ \\t]*%xmm 2 on Linux/x86_64

2023-06-28 Thread haochen.jiang via Gcc-patches
On Linux/x86_64,

4afbebcdc5780d28e52b7d65643e462c7c3882ce is the first bad commit
commit 4afbebcdc5780d28e52b7d65643e462c7c3882ce
Author: Roger Sayle 
Date:   Wed Jun 28 11:11:34 2023 +0100

i386: Add cbranchti4 pattern to i386.md (for -m32 compare_by_pieces).

caused

FAIL: gcc.target/i386/pieces-memcmp-2.c scan-assembler-not vptest[ \\t]*%ymm
FAIL: gcc.target/i386/pieces-memcmp-2.c scan-assembler-times vptest[ \\t]*%xmm 2

with GCC configured with

../../gcc/configure 
--prefix=/export/users/haochenj/src/gcc-bisect/master/master/r14-2159/usr 
--enable-clocale=gnu --with-system-zlib --with-demangler-in-ld 
--with-fpmath=sse --enable-languages=c,c++,fortran --enable-cet --without-isl 
--enable-libmpx x86_64-linux --disable-bootstrap

To reproduce:

$ cd {build_dir}/gcc && make check 
RUNTESTFLAGS="i386.exp=gcc.target/i386/pieces-memcmp-2.c 
--target_board='unix{-m32\ -march=cascadelake}'"

(Please do not reply to this email, for question about this report, contact me 
at haochen dot jiang at intel.com)


Re: [PATCH] RISC-V: Support vfwnmacc/vfwmsac/vfwnmsac combine lowering

2023-06-28 Thread Jeff Law via Gcc-patches




On 6/28/23 16:10, 钟居哲 wrote:

Sure.

https://godbolt.org/z/8857KzTno 

Failed to match this instruction:
(set (reg:VNx2DF 134 [ vect__31.47 ])
     (fma:VNx2DF (neg:VNx2DF (float_extend:VNx2DF (reg:VNx2SF 136 [ vect__28.44 ])))

         (reg:VNx2DF 150 [ vect__8.12 ])
         (reg:VNx2DF 171 [ vect__29.45 ])))
Please attach the full dump.  I would expect to see additional attempts 
with more operands replaced.


jeff


Re: Re: [PATCH] RISC-V: Support vfwnmacc/vfwmsac/vfwnmsac combine lowering

2023-06-28 Thread 钟居哲
Sure.

https://godbolt.org/z/8857KzTno 

Failed to match this instruction:
(set (reg:VNx2DF 134 [ vect__31.47 ])
(fma:VNx2DF (neg:VNx2DF (float_extend:VNx2DF (reg:VNx2SF 136 [ vect__28.44 ])))
(reg:VNx2DF 150 [ vect__8.12 ])
(reg:VNx2DF 171 [ vect__29.45 ])))



juzhe.zh...@rivai.ai
 
From: Jeff Law
Date: 2023-06-29 02:16
To: Juzhe-Zhong; gcc-patches
CC: kito.cheng; kito.cheng; palmer; palmer; rdapp.gcc
Subject: Re: [PATCH] RISC-V: Support vfwnmacc/vfwmsac/vfwnmsac combine lowering
 
 
On 6/28/23 05:55, Juzhe-Zhong wrote:
> Similar to vfwmacc. Add combine patterns as follows:
> 
> For vfwnmsac:
> 1. (set (reg) (fma (neg (float_extend (reg))) (float_extend (reg))) (reg) )))
> 2. (set (reg) (fma (neg (float_extend (reg))) (reg) (reg) )))
> 
> For vfwmsac:
> 1. (set (reg) (fma (float_extend (reg)) (float_extend (reg))) (neg (reg)) )))
> 2. (set (reg) (fma (float_extend (reg)) (reg) (neg (reg)) )))
> 
> For vfwnmacc:
> 1. (set (reg) (fma (neg (float_extend (reg))) (float_extend (reg))) (neg 
> (reg)) )))
> 2. (set (reg) (fma (neg (float_extend (reg))) (reg) (neg (reg)) )))
> 
> gcc/ChangeLog:
> 
>  * config/riscv/autovec-opt.md (*double_widen_fnma): New 
> pattern.
>  (*single_widen_fnma): Ditto.
>  (*double_widen_fms): Ditto.
>  (*single_widen_fms): Ditto.
>  (*double_widen_fnms): Ditto.
>  (*single_widen_fnms): Ditto.
> 
 
> +
> +;; This helps to match ext + fnma.
> +(define_insn_and_split "*single_widen_fnma<mode>"
> +  [(set (match_operand:VWEXTF 0 "register_operand")
> +	(fma:VWEXTF
> +	  (neg:VWEXTF
> +	    (float_extend:VWEXTF
> +	      (match_operand:<V_DOUBLE_TRUNC> 2 "register_operand")))
> +	  (match_operand:VWEXTF 3 "register_operand")
> +	  (match_operand:VWEXTF 1 "register_operand")))]
I'd like to understand this better.  It looks like it's meant to be a 
bridge to another pattern.  However, it looks like it would be a 4->1 
pattern without needing a bridge.  So I'd like to know why that code 
isn't working.
 
Can you send the before/after combine dumps which show this bridge 
pattern being used?
 
The same concern exists with the other bridge patterns, but I don't 
think I need to see the before/after for each of them.
 
 
 
Thanks,
Jeff
 
 


Re: Re: [PATCH] RISC-V: Support vfwmul.vv combine lowering

2023-06-28 Thread 钟居哲
You can see here:

https://godbolt.org/z/d78646hWb 

The first case can't genreate vfwmul.vv but second case succeed.

Failed to match this instruction:
(set (reg:VNx2DF 150 [ vect__11.50 ])
(if_then_else:VNx2DF (unspec:VNx2BI [
(const_vector:VNx2BI repeat [
(const_int 1 [0x1])
])
(reg:DI 153)
(const_int 2 [0x2]) repeated x2
(const_int 1 [0x1])
(const_int 7 [0x7])
(reg:SI 66 vl)
(reg:SI 67 vtype)
(reg:SI 69 N/A)
] UNSPEC_VPREDICATE)
(mult:VNx2DF (float_extend:VNx2DF (reg:VNx2SF 149 [ vect__5.45 ]))
(reg:VNx2DF 148 [ vect__8.49 ]))
(unspec:VNx2DF [
(reg:SI 0 zero)
] UNSPEC_VUNDEF)))


This patch is adding this combine pattern.


juzhe.zh...@rivai.ai
 
From: Jeff Law
Date: 2023-06-29 00:24
To: Juzhe-Zhong; gcc-patches
CC: kito.cheng; kito.cheng; palmer; palmer; rdapp.gcc
Subject: Re: [PATCH] RISC-V: Support vfwmul.vv combine lowering
 
 
On 6/27/23 22:15, Juzhe-Zhong wrote:
> Consider the following complicate case:
> #define TEST_TYPE(TYPE1, TYPE2)                                             \
>   __attribute__ ((noipa)) void vwadd_##TYPE1_##TYPE2 (                      \
>     TYPE1 *__restrict dst, TYPE1 *__restrict dst2, TYPE1 *__restrict dst3,  \
>     TYPE1 *__restrict dst4, TYPE2 *__restrict a, TYPE2 *__restrict b,       \
>     TYPE2 *__restrict a2, TYPE2 *__restrict b2, int n)                      \
>   {                                                                         \
>     for (int i = 0; i < n; i++)                                             \
>       {                                                                     \
>         dst[i] = (TYPE1) a[i] * (TYPE1) b[i];                               \
>         dst2[i] = (TYPE1) a2[i] * (TYPE1) b[i];                             \
>         dst3[i] = (TYPE1) a2[i] * (TYPE1) a[i];                             \
>         dst4[i] = (TYPE1) a[i] * (TYPE1) b2[i];                             \
>       }                                                                     \
>   }
> 
> TEST_TYPE (double, float)
> 
> Such complicate situation, Combine PASS can not combine extension of both 
> operands on the fly.
> So the combine PASS will first try to combine one of the combine extension, 
> and then combine
> the other. The combine flow is as follows:
> 
> Original IR:
> (set (reg 0) (float_extend: (reg 1))
> (set (reg 3) (float_extend: (reg 2))
> (set (reg 4) (mult: (reg 0) (reg 3))
> 
> First step of combine:
> (set (reg 3) (float_extend: (reg 2))
> (set (reg 4) (mult: (float_extend: (reg 1) (reg 3))
> 
> Second step of combine:
> (set (reg 4) (mult: (float_extend: (reg 1) (float_extend: (reg 2))
> 
> So, to enhance the combine optimization, we add a "pseudo vwfmul.wv" RTL 
> pattern in autovec-opt.md
> which is (set (reg 0) (mult (float_extend (reg 1) (reg 2.
Hmm, something doesn't make sense here.  Combine knows how to do a 3->1 
combination.  I would expect to see the first step fail (substituting 
just one operand), then a later step try to combine all three 
instructions, substituting the extension for both input operands.
 
Can you pass along the .combine dump from the failing case?
 
Jeff
 


Re: Re: [PATCH V3] RISC-V: Fix bug of pre-calculated const vector mask for VNx1BI, VNx2BI and VNx4BI

2023-06-28 Thread 钟居哲
Ok. Plz go ahead commit this change with the testcases.
Then it won't block the following patches.

Thanks.


juzhe.zh...@rivai.ai
 
From: Richard Sandiford
Date: 2023-06-29 04:42
To: Robin Dapp via Gcc-patches
CC: 钟居哲; Jeff Law; Robin Dapp; kito.cheng; kito.cheng; palmer; palmer
Subject: Re: [PATCH V3] RISC-V: Fix bug of pre-calculated const vector mask for 
VNx1BI, VNx2BI and VNx4BI
Robin Dapp via Gcc-patches  writes:
> Hi Juzhe,
>
> I find the bug description rather confusing.  What I can see is that
> the constant in the literal pool is indeed wrong but how would DSE or
> so play a role there?  Particularly only for the smaller modes?
>
> My suspicion would be that the constant in the literal/constant pool
> is wrong from start to finish.
>
> I just played around with the following hunk:
>
> diff --git a/gcc/varasm.cc b/gcc/varasm.cc
> index 542315f88cd..5223c08924f 100644
> --- a/gcc/varasm.cc
> +++ b/gcc/varasm.cc
> @@ -4061,7 +4061,7 @@ output_constant_pool_2 (fixed_size_mode mode, rtx x, unsigned int align)
>whole element.  Often this is byte_mode and contains more
>than one element.  */
> unsigned int nelts = GET_MODE_NUNITS (mode);
> -   unsigned int elt_bits = GET_MODE_BITSIZE (mode) / nelts;
> +   unsigned int elt_bits = GET_MODE_PRECISION (mode) / nelts;
> unsigned int int_bits = MAX (elt_bits, BITS_PER_UNIT);
> scalar_int_mode int_mode = int_mode_for_size (int_bits, 0).require ();
>
> With this all your examples pass for me.  We then pack e.g. 16 VNx2BI elements
> into an int and not just 8.  It would also explain why it works for modes
> where PRECISION == BITSIZE.  Now it will certainly require a more thorough
> analysis but maybe it's a start?
 
Yeah.  Preapproved for trunk & any necessary branches.
 
Thanks,
Richard
 


[PATCH] A couple of va_gc_atomic tweaks

2023-06-28 Thread Richard Sandiford via Gcc-patches
The only current user of va_gc_atomic is Ada's:

vec<Entity_Id, va_gc_atomic>

It uses the generic gt_pch_nx routines (with gt_pch_nx being the
“note pointers” hooks), such as:

template<typename T>
void
gt_pch_nx (vec<T, va_gc> *v)
{
  extern void gt_pch_nx (T &);
  for (unsigned i = 0; i < v->length (); i++)
gt_pch_nx ((*v)[i]);
}

It then defines gt_pch_nx routines for Entity_Id &.

The problem is that if we wanted to take the same approach for
an array of unsigned ints, we'd need to define:

inline void gt_pch_nx (unsigned int &) { }

which would then be ambiguous with:

inline void gt_pch_nx (unsigned int) { }

The point of va_gc_atomic is that the elements don't need to be GCed,
and so we have:

template<typename T>
void
gt_ggc_mx (vec<T, va_gc_atomic> *v ATTRIBUTE_UNUSED)
{
  /* Nothing to do.  Vectors of atomic types wrt GC do not need to
 be traversed.  */
}

I think it's therefore reasonable to assume that no pointers will
need to be processed for PCH either.

The patch also relaxes the array_slice constructor for vec<T, va_gc> *
so that it handles all embedded vectors.

Bootstrapped & regression-tested on aarch64-linux-gnu (all languages).
OK to install?

Richard


gcc/
* vec.h (gt_pch_nx): Add overloads for va_gc_atomic.
(array_slice): Relax va_gc constructor to handle all vectors
with a vl_embed layout.

gcc/ada/
* gcc-interface/decl.cc (gt_pch_nx): Remove overloads for Entity_Id.
---
 gcc/ada/gcc-interface/decl.cc | 11 ---
 gcc/vec.h | 22 ++
 2 files changed, 18 insertions(+), 15 deletions(-)

diff --git a/gcc/ada/gcc-interface/decl.cc b/gcc/ada/gcc-interface/decl.cc
index 494b24e2111..ee913a017d2 100644
--- a/gcc/ada/gcc-interface/decl.cc
+++ b/gcc/ada/gcc-interface/decl.cc
@@ -163,17 +163,6 @@ struct GTY((for_user)) tree_entity_vec_map
   vec<Entity_Id, va_gc_atomic> *to;
 };
 
-void
-gt_pch_nx (Entity_Id &)
-{
-}
-
-void
-gt_pch_nx (Entity_Id *x, gt_pointer_operator op, void *cookie)
-{
-  op (x, NULL, cookie);
-}
-
 struct dummy_type_hasher : ggc_cache_ptr_hash<tree_entity_vec_map>
 {
   static inline hashval_t
diff --git a/gcc/vec.h b/gcc/vec.h
index 36918915701..6f7b0487eb6 100644
--- a/gcc/vec.h
+++ b/gcc/vec.h
@@ -1390,6 +1390,13 @@ gt_pch_nx (vec<T, va_gc> *v)
 gt_pch_nx ((*v)[i]);
 }
 
+template<typename T>
+void
+gt_pch_nx (vec<T, va_gc_atomic> *)
+{
+  /* No pointers to note.  */
+}
+
 template<typename T>
 void
 gt_pch_nx (vec<T, va_gc> *v, gt_pointer_operator op, void *cookie)
@@ -1407,6 +1414,13 @@ gt_pch_nx (vec<T, va_gc> *v, gt_pointer_operator op, void *cookie)
 gt_pch_nx (&((*v)[i]), op, cookie);
 }
 
+template<typename T>
+void
+gt_pch_nx (vec<T, va_gc_atomic> *, gt_pointer_operator, void *)
+{
+  /* No pointers to note.  */
+}
+
 
 /* Space efficient vector.  These vectors can grow dynamically and are
allocated together with their control data.  They are suited to be
@@ -2286,12 +2300,12 @@ public:
   array_slice (vec<T> &v)
 : m_base (v.address ()), m_size (v.length ()) {}
 
-  template<typename OtherT>
-  array_slice (const vec<OtherT, va_gc> *v)
+  template<typename OtherT, typename A>
+  array_slice (const vec<OtherT, A, vl_embed> *v)
 : m_base (v ? v->address () : nullptr), m_size (v ? v->length () : 0) {}
 
-  template<typename OtherT>
-  array_slice (vec<OtherT, va_gc> *v)
+  template<typename OtherT, typename A>
+  array_slice (vec<OtherT, A, vl_embed> *v)
 : m_base (v ? v->address () : nullptr), m_size (v ? v->length () : 0) {}
 
   iterator begin () { return m_base; }
-- 
2.25.1



[committed] testsuite: check_effective_target_lra: CRIS is LRA

2023-06-28 Thread Hans-Peter Nilsson via Gcc-patches
Left-over from r14-383-gfaf8bea79b6256.

* lib/target-supports.exp (check_effective_target_lra): Remove
cris-*-* from expression for exceptions to LRA.
---
 gcc/testsuite/lib/target-supports.exp | 6 +++---
 1 file changed, 3 insertions(+), 3 deletions(-)

diff --git a/gcc/testsuite/lib/target-supports.exp b/gcc/testsuite/lib/target-supports.exp
index 184fafb020f8..bad97d8c26b9 100644
--- a/gcc/testsuite/lib/target-supports.exp
+++ b/gcc/testsuite/lib/target-supports.exp
@@ -12278,7 +12278,7 @@ proc check_effective_target_o_flag_in_section { } {
 # return 1 if LRA is supported.
 
 proc check_effective_target_lra { } {
-if { [istarget hppa*-*-*] || [istarget cris-*-*] || [istarget avr-*-*] } {
+if { [istarget hppa*-*-*] || [istarget avr-*-*] } {
return 0
 }
 return 1
-- 
2.30.2



[committed] CRIS: Don't apply PATTERN to insn before validation (PR 110144)

2023-06-28 Thread Hans-Peter Nilsson via Gcc-patches
Oops.  The validation was there, but PATTERN was applied
before that.  Noticeable only with rtl-checking (for example
as in the report: "--enable-checking=yes,rtl") as this
statement was only a (one of many) straggling olde-C
declare-and-initialize-at-beginning-of-block thing.

PR target/110144
* config/cris/cris.cc (cris_postdbr_cmpelim): Don't apply PATTERN
to insn before validating it.
---
 gcc/config/cris/cris.cc | 2 +-
 1 file changed, 1 insertion(+), 1 deletion(-)

diff --git a/gcc/config/cris/cris.cc b/gcc/config/cris/cris.cc
index 7fca2af085a7..f04f501326e7 100644
--- a/gcc/config/cris/cris.cc
+++ b/gcc/config/cris/cris.cc
@@ -375,7 +375,6 @@ cris_postdbr_cmpelim ()
   for (insn = get_insns (); insn; insn = next)
 {
   rtx_insn *outer_insn = insn;
-  rtx pat = PATTERN (insn);
 
   next = NEXT_INSN (outer_insn);
 
@@ -389,6 +388,7 @@ cris_postdbr_cmpelim ()
 
   if (!NONDEBUG_INSN_P (insn))
continue;
+  rtx pat = PATTERN (insn);
 
   /* Consider filled delay slots; there might be a comparison there.
 It's only the second insn in a sequence that is interesting.  */
-- 
2.30.2



Re: PR82943 - Suggested patch to fix

2023-06-28 Thread Harald Anlauf via Gcc-patches

Hi Alex,

welcome to the gfortran community.  It is great that you are trying
to get actively involved.

You already did quite a few things right: patches shall be sent to
the gcc-patches ML, but Fortran reviewers usually notice them only
when they are copied to the fortran ML.

There are some general recommendations on the formatting of C code,
like indentation, of the patches, and of the commit log entries.

Regarding coding standards, see https://www.gnu.org/prep/standards/ .

Regarding testcases, a recommendation is to have a look at
existing testcases, e.g. in gcc/testsuite/gfortran.dg/, and then
decide if the testcase shall test the compile-time or run-time
behaviour, and add the necessary dejagnu directives.

You should also verify if your patch passes regression testing.
For changes to gfortran, it is usually sufficient to run

make check-fortran -j <N>

where <N> is the number of parallel tests.
You would need to report also the platform where you tested on.

There is also a legal issue to consider before non-trivial patches can
be accepted for incorporation: https://gcc.gnu.org/contribute.html#legal

If your patch is accepted and if you do not have write-access to the
repository, one of the maintainers will likely take care of it.
If you become a regular contributor, you will probably want to consider
getting write access.

Cheers,
Harald



On 6/24/23 19:17, Alexander Westbrooks via Gcc-patches wrote:

Hello,

I am new to the GFortran community. Over the past two weeks I created a
patch that should fix PR82943 for GFortran. I have attached it to this
email. The patch allows the code below to compile successfully. I am
working on creating test cases next, but I am new to the process so it may
take me some time. After I make test cases, do I email them to you as well?
Do I need to make a pull-request on github in order to get the patch
reviewed?

Thank you,

Alexander Westbrooks

module testmod

 public :: foo

 type, public :: tough_lvl_0(a, b)
 integer, kind :: a = 1
 integer, len :: b
 contains
 procedure :: foo
 end type

 type, public, EXTENDS(tough_lvl_0) :: tough_lvl_1 (c)
 integer, len :: c
 contains
 procedure :: bar
 end type

 type, public, EXTENDS(tough_lvl_1) :: tough_lvl_2 (d)
 integer, len :: d
 contains
 procedure :: foobar
 end type

contains
 subroutine foo(this)
 class(tough_lvl_0(1,*)), intent(inout) :: this
 end subroutine

 subroutine bar(this)
 class(tough_lvl_1(1,*,*)), intent(inout) :: this
 end subroutine

 subroutine foobar(this)
 class(tough_lvl_2(1,*,*,*)), intent(inout) :: this
 end subroutine

end module

PROGRAM testprogram
 USE testmod

 TYPE(tough_lvl_0(1,5)) :: test_pdt_0
 TYPE(tough_lvl_1(1,5,6))   :: test_pdt_1
 TYPE(tough_lvl_2(1,5,6,7)) :: test_pdt_2

 CALL test_pdt_0%foo()

 CALL test_pdt_1%foo()
 CALL test_pdt_1%bar()

 CALL test_pdt_2%foo()
 CALL test_pdt_2%bar()
 CALL test_pdt_2%foobar()


END PROGRAM testprogram




Enable early inlining into always_inline functions

2023-06-28 Thread Jan Hubicka via Gcc-patches
Hi,
The early inliner currently skips always_inline functions, and moreover we
ignore calls from always_inline functions in ipa_reverse_postorder.  This
disables most of the propagation done by early optimization, which is quite
bad when always_inline functions are not leaf functions, as is now quite
common in libstdc++.

This patch, instead of fully disabling early inlining here, checks the
calls in the callee.  I am quite conservative about what can be inlined,
as this patch is a bit touchy anyway.  To avoid problems with always_inline
being optimized after early inlining, I extended
inline_always_inline_functions to lazily compute the fnsummary when needed.

Bootstrapped/regtested x86_64-linux, will commit it shortly.

gcc/ChangeLog:

PR middle-end/110334
* ipa-fnsummary.h (ipa_fn_summary): Add
safe_to_inline_to_always_inline.
* ipa-inline.cc (can_early_inline_edge_p): ICE
if SSA is not built; do cycle checking for
always_inline functions.
(inline_always_inline_functions): Be recursive;
watch for cycles; do not update overall summary.
(early_inliner): Do not give up on always_inlines.
* ipa-utils.cc (ipa_reverse_postorder): Do not skip
always inlines.

gcc/testsuite/ChangeLog:

PR middle-end/110334
* g++.dg/opt/pr66119.C: Disable early inlining.
* gcc.c-torture/compile/pr110334.c: New test.
* gcc.dg/tree-ssa/pr110334.c: New test.

diff --git a/gcc/ipa-fnsummary.h b/gcc/ipa-fnsummary.h
index fcc01167d0d..0c5a81e2dca 100644
--- a/gcc/ipa-fnsummary.h
+++ b/gcc/ipa-fnsummary.h
@@ -126,8 +126,8 @@ public:
   ipa_fn_summary ()
 : min_size (0),
   inlinable (false), single_caller (false),
-  fp_expressions (false), target_info (0),
-  estimated_stack_size (false),
+  fp_expressions (false), safe_to_inline_to_always_inline (0),
+  target_info (0), estimated_stack_size (false),
   time (0), conds (NULL),
   size_time_table (), call_size_time_table (vNULL),
   loop_iterations (NULL), loop_strides (NULL),
@@ -165,6 +165,8 @@ public:
   unsigned int single_caller : 1;
   /* True if function contains any floating point expressions.  */
   unsigned int fp_expressions : 1;
+  /* Cache for analysis of can_early_inline_edge_p.  */
+  unsigned int safe_to_inline_to_always_inline : 2;
   /* Like fp_expressions field above, but it's to hold some target specific
  information, such as some target specific isa flags.  Note that for
  offloading target compilers, this field isn't streamed.  */
diff --git a/gcc/ipa-inline.cc b/gcc/ipa-inline.cc
index efc8df7d4e0..71a1c6ca68e 100644
--- a/gcc/ipa-inline.cc
+++ b/gcc/ipa-inline.cc
@@ -680,28 +680,60 @@ can_early_inline_edge_p (struct cgraph_edge *e)
   e->inline_failed = CIF_BODY_NOT_AVAILABLE;
   return false;
 }
-  /* In early inliner some of callees may not be in SSA form yet
- (i.e. the callgraph is cyclic and we did not process
- the callee by early inliner, yet).  We don't have CIF code for this
- case; later we will re-do the decision in the real inliner.  */
-  if (!gimple_in_ssa_p (DECL_STRUCT_FUNCTION (e->caller->decl))
-  || !gimple_in_ssa_p (DECL_STRUCT_FUNCTION (callee->decl)))
-{
-  if (dump_enabled_p ())
-   dump_printf_loc (MSG_MISSED_OPTIMIZATION, e->call_stmt,
-"  edge not inlinable: not in SSA form\n");
-  return false;
-}
-  else if (profile_arc_flag
-  && ((lookup_attribute ("no_profile_instrument_function",
-DECL_ATTRIBUTES (caller->decl)) == NULL_TREE)
-  != (lookup_attribute ("no_profile_instrument_function",
-DECL_ATTRIBUTES (callee->decl)) == 
NULL_TREE)))
+  gcc_assert (gimple_in_ssa_p (DECL_STRUCT_FUNCTION (e->caller->decl))
+ && gimple_in_ssa_p (DECL_STRUCT_FUNCTION (callee->decl)));
+  if (profile_arc_flag
+  && ((lookup_attribute ("no_profile_instrument_function",
+   DECL_ATTRIBUTES (caller->decl)) == NULL_TREE)
+ != (lookup_attribute ("no_profile_instrument_function",
+   DECL_ATTRIBUTES (callee->decl)) == NULL_TREE)))
 return false;
 
   if (!can_inline_edge_p (e, true, true)
   || !can_inline_edge_by_limits_p (e, true, false, true))
 return false;
+  /* When inlining regular functions into always-inline functions
+ during early inlining watch for possible inline cycles.  */
+  if (DECL_DISREGARD_INLINE_LIMITS (caller->decl)
+  && lookup_attribute ("always_inline", DECL_ATTRIBUTES (caller->decl))
+  && (!DECL_DISREGARD_INLINE_LIMITS (callee->decl)
+ || !lookup_attribute ("always_inline", DECL_ATTRIBUTES 
(callee->decl
+{
+  /* If there are indirect calls, inlining may produce direct call.
+TODO: We may lift this restriction if we avoid errors on formerly
+indirect calls to always_inline functions.  Taking address
+of 

[PATCH, part3, committed] Fortran: ABI for scalar CHARACTER(LEN=1),VALUE dummy argument [PR110360]

2023-06-28 Thread Harald Anlauf via Gcc-patches
Dear all,

the previous patches to this PR unfortunately caused a regression,
seen on Power big-endian systems/-m32 (pr110419), and while trying
to investigate on x86 also showed a regression (ICE) on cases that
were not covered in the testsuite before.

The original fix did not properly handle the dereferencing of
string arguments that were not constant, and it was lacking the
truncation of strings to length one that is needed when passing
a character on the stack.

This patch has been regtested on x86_64-pc-linux-gnu,
and the extended testcase was scrutinized with -m64 and -m32.

Pushed after discussion in the PR with Mikael as
commit r14-2171-g8736d6b14a4dfdfb58c80ccd398981b0fb5d00aa

https://gcc.gnu.org/g:8736d6b14a4dfdfb58c80ccd398981b0fb5d00aa

Will keep the PR open as long as the issues on Power big-endian
are not confirmed resolved.

Thanks,
Harald

From 8736d6b14a4dfdfb58c80ccd398981b0fb5d00aa Mon Sep 17 00:00:00 2001
From: Harald Anlauf 
Date: Wed, 28 Jun 2023 22:16:18 +0200
Subject: [PATCH] Fortran: ABI for scalar CHARACTER(LEN=1),VALUE dummy argument
 [PR110360]

gcc/fortran/ChangeLog:

	PR fortran/110360
	* trans-expr.cc (gfc_conv_procedure_call): For non-constant string
	argument passed to CHARACTER(LEN=1),VALUE dummy, ensure proper
	dereferencing and truncation of string to length 1.

gcc/testsuite/ChangeLog:

	PR fortran/110360
	* gfortran.dg/value_9.f90: Add tests for intermediate regression.
---
 gcc/fortran/trans-expr.cc | 15 ++-
 gcc/testsuite/gfortran.dg/value_9.f90 | 23 +++
 2 files changed, 33 insertions(+), 5 deletions(-)

diff --git a/gcc/fortran/trans-expr.cc b/gcc/fortran/trans-expr.cc
index ad0cdf902ba..30946ba3f63 100644
--- a/gcc/fortran/trans-expr.cc
+++ b/gcc/fortran/trans-expr.cc
@@ -6395,7 +6395,7 @@ gfc_conv_procedure_call (gfc_se * se, gfc_symbol * sym,

 		/* ABI: actual arguments to CHARACTER(len=1),VALUE
 		   dummy arguments are actually passed by value.
-		   Constant strings are truncated to length 1.
+		   Strings are truncated to length 1.
 		   The BIND(C) case is handled elsewhere.  */
 		if (fsym->ts.type == BT_CHARACTER
 			&& !fsym->ts.is_c_interop
@@ -6405,10 +6405,15 @@ gfc_conv_procedure_call (gfc_se * se, gfc_symbol * sym,
 			(fsym->ts.u.cl->length->value.integer, 1) == 0))
 		  {
 			if (e->expr_type != EXPR_CONSTANT)
-			  parmse.expr = gfc_string_to_single_character
-			(build_int_cst (gfc_charlen_type_node, 1),
-			 parmse.expr,
-			 e->ts.kind);
+			  {
+			tree slen1 = build_int_cst (gfc_charlen_type_node, 1);
+			gfc_conv_string_parameter ();
+			parmse.expr = gfc_string_to_single_character (slen1,
+	  parmse.expr,
+	  e->ts.kind);
+			/* Truncate resulting string to length 1.  */
+			parmse.string_length = slen1;
+			  }
 			else if (e->value.character.length > 1)
 			  {
 			e->value.character.length = 1;
diff --git a/gcc/testsuite/gfortran.dg/value_9.f90 b/gcc/testsuite/gfortran.dg/value_9.f90
index f6490645e27..1a2fa80ed0d 100644
--- a/gcc/testsuite/gfortran.dg/value_9.f90
+++ b/gcc/testsuite/gfortran.dg/value_9.f90
@@ -9,7 +9,12 @@ program p
   character  (kind=4), allocatable :: ca4
   character  (kind=4), pointer :: cp4
   character(len=:,kind=4), allocatable :: cd4
+  character:: c  =   "1"
+  character  (kind=4)  :: c4 = 4_"4"
+  character(len=3) :: d  =   "210"
+  character(len=3,kind=4)  :: d4 = 4_"321"
   integer :: a = 65
+  integer :: l = 2
   allocate (ca, cp, ca4, cp4)

   ! Check len=1 actual argument cases first
@@ -20,15 +25,21 @@ program p
   call val  ("A",char(a))
   call val  ("A",mychar(65))
   call val  ("A",mychar(a))
+  call val  ("1",c)
+  call val  ("1",(c))
   call val4 (4_"C",4_"C")
   call val4 (4_"A",char(65,kind=4))
   call val4 (4_"A",char(a, kind=4))
+  call val4 (4_"4",c4)
+  call val4 (4_"4",(c4))
   call val  (ca,ca)
   call val  (cp,cp)
   call val  (cd,cd)
+  call val  (ca,(ca))
   call val4 (ca4,ca4)
   call val4 (cp4,cp4)
   call val4 (cd4,cd4)
+  call val4 (cd4,(cd4))
   call sub  ("S")
   call sub4 (4_"T")

@@ -37,6 +48,18 @@ program p
   call val4 (4_"V**",4_"V//")
   call sub  (  "WTY")
   call sub4 (4_"ZXV")
+  call val  (  "234",  d)
+  call val4 (4_"345",  d4   )
+  call val  (  "234", (d)   )
+  call val4 (4_"345", (d4)  )
+  call val  (  "234",  d (1:2))
+  call val4 (4_"345",  d4(1:2))
+  call val  (  "234",  d (1:l))
+  call val4 (4_"345",  d4(1:l))
+  call val  ("1",c // d)
+  call val  ("1",trim (c // d))
+  call val4 (4_"4",c4 // d4)
+  call val4 (4_"4",trim (c4 // d4))
   cd = "gkl"; cd4 = 4_"hmn"
   call val  (cd,cd)
   call val4 (cd4,cd4)
--
2.35.3



Re: [PATCH V3] RISC-V: Fix bug of pre-calculated const vector mask for VNx1BI, VNx2BI and VNx4BI

2023-06-28 Thread Richard Sandiford via Gcc-patches
Robin Dapp via Gcc-patches  writes:
> Hi Juzhe,
>
> I find the bug description rather confusing.  What I can see is that
> the constant in the literal pool is indeed wrong but how would DSE or
> so play a role there?  Particularly only for the smaller modes?
>
> My suspicion would be that the constant in the literal/constant pool
> is wrong from start to finish.
>
> I just played around with the following hunk:
>
> diff --git a/gcc/varasm.cc b/gcc/varasm.cc
> index 542315f88cd..5223c08924f 100644
> --- a/gcc/varasm.cc
> +++ b/gcc/varasm.cc
> @@ -4061,7 +4061,7 @@ output_constant_pool_2 (fixed_size_mode mode, rtx x, unsigned int align)
>whole element.  Often this is byte_mode and contains more
>than one element.  */
> unsigned int nelts = GET_MODE_NUNITS (mode);
> -   unsigned int elt_bits = GET_MODE_BITSIZE (mode) / nelts;
> +   unsigned int elt_bits = GET_MODE_PRECISION (mode) / nelts;
> unsigned int int_bits = MAX (elt_bits, BITS_PER_UNIT);
> scalar_int_mode int_mode = int_mode_for_size (int_bits, 0).require ();
>
> With this all your examples pass for me.  We then pack e.g. 16 VNx2BI elements
> into an int and not just 8.  It would also explain why it works for modes
> where PRECISION == BITSIZE.  Now it will certainly require a more thorough
> analysis but maybe it's a start?

Yeah.  Preapproved for trunk & any necessary branches.

Thanks,
Richard


Re: [PATCH] c++: cache partial template specialization selection

2023-06-28 Thread Jason Merrill via Gcc-patches

On 6/28/23 12:51, Patrick Palka wrote:

There's currently no cheap way to obtain the partial template
specialization (and arguments relative to it) that was selected for a
class or variable template specialization.  Our only option is to
compute the result from scratch via most_specialized_partial_spec.

For class templates this isn't really an issue because we usually need
this information just once, upon instantiation.  But for variable
templates we need it upon specialization and later upon instantiation.
It'd be good for this information to be readily available in general
however.

To that end, this patch adds a TI_PARTIAL_INFO field to TEMPLATE_INFO
that holds another TEMPLATE_INFO consisting of the partial template and
arguments relative to it, which most_specialized_partial_spec then
uses to transparently cache its (now TEMPLATE_INFO) result.

Similarly, there's no easy way to go from the DECL_TEMPLATE_RESULT of a
partial TEMPLATE_DECL back to the TEMPLATE_DECL.  (Our best option is to
walk the DECL_TEMPLATE_SPECIALIZATIONS list of the primary TEMPLATE_DECL.)
So this patch also uses this new field to link these entities in this
other direction.


You had talked about this possibly replacing the deferred_access_checks; 
could they share the same slot by anonymous union?  Your second use 
would conflict, but perhaps the checks could move to the TEMPLATE_INFO 
of the TEMPLATE_DECL now that we have a way to get at it?


In any case, this patch is OK.

Jason



Re: [PATCH] c++: redundant targ coercion for var/alias tmpls

2023-06-28 Thread Patrick Palka via Gcc-patches
On Wed, Jun 28, 2023 at 11:50 AM Jason Merrill  wrote:
>
> On 6/23/23 12:23, Patrick Palka wrote:
> > On Fri, 23 Jun 2023, Jason Merrill wrote:
> >
> >> On 6/21/23 13:19, Patrick Palka wrote:
> >>> When stepping through the variable/alias template specialization code
> >>> paths, I noticed we perform template argument coercion twice: first from
> >>> instantiate_alias_template / finish_template_variable and again from
> >>> tsubst_decl (during instantiate_template).  It should suffice to perform
> >>> coercion once.
> >>>
> >>> To that end patch elides this second coercion from tsubst_decl when
> >>> possible.  We can't get rid of it completely because we don't always
> >>> specialize a variable template from finish_template_variable: we could
> >>> also be doing so directly from instantiate_template during variable
> >>> template partial specialization selection, in which case the coercion
> >>> from tsubst_decl would be the first and only coercion.
> >>
> >> Perhaps we should be coercing in lookup_template_variable rather than
> >> finish_template_variable?
> >
> > Ah yes, there's a patch for that at
> > https://gcc.gnu.org/pipermail/gcc-patches/2023-May/617377.html :)
>
> So after that patch, can we get rid of the second coercion completely?

On second thought it should be possible to get rid of it, if we
rearrange things to always pass the primary arguments to tsubst_decl,
and perform partial specialization selection from there instead of
instantiate_template.  Let me try...

>
> Jason
>



Re: [PATCH V3] RISC-V: Fix bug of pre-calculated const vector mask for VNx1BI, VNx2BI and VNx4BI

2023-06-28 Thread Robin Dapp via Gcc-patches
Hi Juzhe,

I find the bug description rather confusing.  What I can see is that
the constant in the literal pool is indeed wrong but how would DSE or
so play a role there?  Particularly only for the smaller modes?

My suspicion would be that the constant in the literal/constant pool
is wrong from start to finish.

I just played around with the following hunk:

diff --git a/gcc/varasm.cc b/gcc/varasm.cc
index 542315f88cd..5223c08924f 100644
--- a/gcc/varasm.cc
+++ b/gcc/varasm.cc
@@ -4061,7 +4061,7 @@ output_constant_pool_2 (fixed_size_mode mode, rtx x, unsigned int align)
   whole element.  Often this is byte_mode and contains more
   than one element.  */
unsigned int nelts = GET_MODE_NUNITS (mode);
-   unsigned int elt_bits = GET_MODE_BITSIZE (mode) / nelts;
+   unsigned int elt_bits = GET_MODE_PRECISION (mode) / nelts;
unsigned int int_bits = MAX (elt_bits, BITS_PER_UNIT);
scalar_int_mode int_mode = int_mode_for_size (int_bits, 0).require ();

With this all your examples pass for me.  We then pack e.g. 16 VNx2BI elements
into an int and not just 8.  It would also explain why it works for modes
where PRECISION == BITSIZE.  Now it will certainly require a more thorough
analysis but maybe it's a start?

Regards
 Robin



Re: Re: [PATCH V3] RISC-V: Fix bug of pre-calculated const vector mask for VNx1BI, VNx2BI and VNx4BI

2023-06-28 Thread 钟居哲
Try this:
https://godbolt.org/z/x7bM5Pr84 




juzhe.zh...@rivai.ai
 
From: Jeff Law
Date: 2023-06-29 02:11
To: Juzhe-Zhong; gcc-patches
CC: kito.cheng; kito.cheng; palmer; palmer; rdapp.gcc
Subject: Re: [PATCH V3] RISC-V: Fix bug of pre-calculated const vector mask for 
VNx1BI, VNx2BI and VNx4BI
 
 
On 6/28/23 03:47, Juzhe-Zhong wrote:
> This bug blocks the following patches.
> 
> GCC doesn't know RVV is using compact mask model.
> Consider this following case:
> 
> #define N 16
> 
> int
> main ()
> {
>int8_t mask[N] = {0, 1, 0, 1, 0, 1, 0, 1, 0, 1, 0, 1, 0, 1, 0, 1};
>int8_t out[N] = {0};
>for (int8_t i = 0; i < N; ++i)
>  if (mask[i])
>out[i] = i;
>for (int8_t i = 0; i < N; ++i)
>  {
>if (mask[i])
> assert (out[i] == i);
>else
> assert (out[i] == 0);
>  }
> }
> 
> Before this patch, the pre-calculated mask in constant memory pool:
> .LC1:
>  .byte   68 > 0b01000100
> 
> This is incorrect, such case failed in execution.
> 
> After this patch:
> .LC1:
> .byte 10 > 0b1010
So I don't get anything like this in my testing.  What are the precise 
arguments you're using to build the testcase?
 
I'm compiling the test use a trunk compiler with
 
  -O3 --param riscv-autovec-preference=fixed-vlmax -march=rv64gcv
 
I get the attached code both before and after your patch.  Clearly I'm 
doing something different/wrong.  So my request is for the precise 
command line you're using and the before/after resulting assembly code.
 
Jeff


[committed] d: Fix wrong code-gen when returning structs by value.

2023-06-28 Thread Iain Buclaw via Gcc-patches
Hi,

Since r13-1104, structs in the D front end have had compute_record_mode called
too early on them, causing them to be returned differently depending on the
order that types are generated in, and whether there are forward references.

This patch moves the call to compute_record_mode into its own function,
and calls it after all fields have been given a size.

Bootstrapped on i686-apple-darwin17 - previously it failed at stage2 -
as well as bootstrapped and regression tested on x86_64-linux-gnu/-m32.
Committed to mainline, and backported to releases/gcc-13.

Regards,
Iain.

---
PR d/106977
PR target/110406

gcc/d/ChangeLog:

* types.cc (finish_aggregate_mode): New function.
(finish_incomplete_fields): Call finish_aggregate_mode.
(finish_aggregate_type): Replace call to compute_record_mode with
finish_aggregate_mode.

gcc/testsuite/ChangeLog:

* gdc.dg/torture/pr110406.d: New test.
---
 gcc/d/types.cc  | 39 ++---
 gcc/testsuite/gdc.dg/torture/pr110406.d | 25 
 2 files changed, 60 insertions(+), 4 deletions(-)
 create mode 100644 gcc/testsuite/gdc.dg/torture/pr110406.d

diff --git a/gcc/d/types.cc b/gcc/d/types.cc
index bdf07f83d4b..ef2d80e5bd4 100644
--- a/gcc/d/types.cc
+++ b/gcc/d/types.cc
@@ -573,6 +573,35 @@ layout_aggregate_type (AggregateDeclaration *decl, tree type,
 }
 }
 
+/* Given a record type TYPE compute the finalized record mode if all fields have
+   had their types resolved and sizes determined.  */
+
+void
+finish_aggregate_mode (tree type)
+{
+  for (tree field = TYPE_FIELDS (type); field; field = DECL_CHAIN (field))
+{
+  /* Fields of type `typeof(*null)' have no size, so let them force the
+record type mode to be computed as BLKmode.  */
+  if (TYPE_MAIN_VARIANT (TREE_TYPE (field)) == noreturn_type_node)
+   break;
+
+  if (DECL_SIZE (field) == NULL_TREE)
+   return;
+}
+
+  compute_record_mode (type);
+
+  /* Propagate computed mode to all variants of this aggregate type.  */
+  for (tree t = TYPE_MAIN_VARIANT (type); t; t = TYPE_NEXT_VARIANT (t))
+{
+  if (t == type)
+   continue;
+
+  SET_TYPE_MODE (t, TYPE_MODE (type));
+}
+}
+
 /* If the aggregate type TYPE completes the type of any previous field
declarations, lay them out now.  */
 
@@ -596,6 +625,9 @@ finish_incomplete_fields (tree type)
}
 
   relayout_decl (field);
+
+  /* Relayout of field may change the mode of its RECORD_TYPE.  */
+  finish_aggregate_mode (DECL_FIELD_CONTEXT (field));
 }
 
   /* No more forward references to process.  */
@@ -615,9 +647,6 @@ finish_aggregate_type (unsigned structsize, unsigned alignsize, tree type)
   SET_TYPE_ALIGN (type, alignsize * BITS_PER_UNIT);
   TYPE_PACKED (type) = (alignsize == 1);
 
-  /* Set the back-end type mode.  */
-  compute_record_mode (type);
-
   /* Layout all fields now the type is complete.  */
   for (tree field = TYPE_FIELDS (type); field; field = DECL_CHAIN (field))
 {
@@ -662,6 +691,9 @@ finish_aggregate_type (unsigned structsize, unsigned alignsize, tree type)
}
 }
 
+  /* Set the back-end type mode after all fields have had their size set.  */
+  finish_aggregate_mode (type);
+
   /* Fix up all forward-referenced variants of this aggregate type.  */
   for (tree t = TYPE_MAIN_VARIANT (type); t; t = TYPE_NEXT_VARIANT (t))
 {
@@ -673,7 +705,6 @@ finish_aggregate_type (unsigned structsize, unsigned alignsize, tree type)
   TYPE_SIZE (t) = TYPE_SIZE (type);
   TYPE_SIZE_UNIT (t) = TYPE_SIZE_UNIT (type);
   TYPE_PACKED (type) = TYPE_PACKED (type);
-  SET_TYPE_MODE (t, TYPE_MODE (type));
   SET_TYPE_ALIGN (t, TYPE_ALIGN (type));
   TYPE_USER_ALIGN (t) = TYPE_USER_ALIGN (type);
 }
diff --git a/gcc/testsuite/gdc.dg/torture/pr110406.d b/gcc/testsuite/gdc.dg/torture/pr110406.d
new file mode 100644
index 000..c380e4bdec8
--- /dev/null
+++ b/gcc/testsuite/gdc.dg/torture/pr110406.d
@@ -0,0 +1,25 @@
+// https://gcc.gnu.org/bugzilla/show_bug.cgi?id=110406
+// { dg-do compile { target i?86-*-* x86_64-*-* } }
+// { dg-options "-fdump-tree-optimized" }
+struct cpuid_abcd_t
+{
+uint eax;
+uint ebx;
+uint ecx;
+uint edx;
+};
+
+cpuid_abcd_t cpuid_insn(const uint in_eax)
+{
+cpuid_abcd_t ret = void;
+asm { "cpuid"
+: "=a" (ret.eax),
+  "=b" (ret.ebx),
+  "=c" (ret.ecx),
+  "=d" (ret.edx)
+: "a"  (in_eax)
+:;
+}
+return ret;
+}
+// { dg-final { scan-tree-dump-not "MEM " "optimized" } }
-- 
2.39.2



Re: [PATCH] RISC-V: Fix out of range memory access of machine mode table

2023-06-28 Thread Jeff Law via Gcc-patches




On 6/21/23 18:19, Li, Pan2 wrote:

Hi there,

I tried to verify the offloading support following the doc below.

https://gcc.gnu.org/wiki/Offloading#How_to_build_an_offloading-enabled_GCC

with these steps:

1. Build nvptx-tools.
2. Symlink nvptx-newlib into the gcc source code.
3. Build the Nvidia PTX accel compiler.
4. Build the host compiler with nvptx as offload target; since I don't have a
GPU, I dropped the --with-cuda-driver=xxx option.
5. Run the build command, e.g. ./nvptx-tools/usr/local/bin/gcc -O0 -fopenmp 
test.c -o test.elf.

The build completes successfully, but it looks like I cannot run it without a
GPU, and I am not very sure whether this is good enough for validation.

If you don't have a suitable GPU for offloading, you could instead just 
compare the offloaded binary before/after your change.  I would expect 
them to be 100% identical.


If we take that route for verification, I think the question turns into 
how to do that for the testsuite.  ie, I think Jakub wants to verify 
that check-target-libgomp still passes when offloading is enabled.  I 
don't think there's an easy way to capture the resulting binaries for 
comparison purposes.  But that's what I'd suggest given the lack of a 
suitable GPU for testing.  So you might need to hack up the libgomp 
testsuite's .exp files to capture the binaries.


Before going to those extremes, I would suggest verifying that you do in 
fact get identical binaries before/after your change on a simple 
offloading test.




jeff


Re: [PATCH 10/11] riscv: thead: Add support for the XTheadMemIdx ISA extension

2023-06-28 Thread Jeff Law via Gcc-patches




On 6/28/23 06:39, Christoph Müllner wrote:


+;; XTheadMemIdx overview:
+;; All peephole passes attempt to improve the operand utilization of
+;; XTheadMemIdx instructions, where one sign or zero extended
+;; register-index-operand can be shifted left by a 2-bit immediate.
+;;
+;; The basic idea is the following optimization:
+;; (set (reg 0) (op (reg 1) (imm 2)))
+;; (set (reg 3) (mem (plus (reg 0) (reg 4)))
+;; ==>
+;; (set (reg 3) (mem (plus (reg 4) (op2 (reg 1) (imm 2
+;; This optimization is only valid if (reg 0) has no further uses.

Couldn't this be done by combine if you created define_insn patterns
rather than define_peephole2 patterns?  Similarly for the other cases
handled here.


I was inspired by XTheadMemPair, which merges two memory accesses
into a mem-pair instruction (and which got inspiration from
gcc/config/aarch64/aarch64-ldpstp.md).
Right.  I'm pretty familiar with those.  They cover a different case, 
specifically the two insns being optimized don't have a true data 
dependency between them.  ie, the first instruction does not produce a 
result used in the second insn.



In the case above there is a data dependency on reg0.  ie, the first 
instruction generates a result used in the second instruction.  combine 
is usually the best place to handle the data dependency case.





I don't see the benefit of using combine or peephole, but I can change
if necessary. At least for the provided test cases, the implementation
works quite well.
Peepholes require the instructions to be consecutive in the stream while 
combine relies on data dependence links and can thus find these 
opportunities even when the two insn we care about are separated by 
unrelated other insns.



Jeff


Re: [PATCH] RISC-V: Support vfwnmacc/vfwmsac/vfwnmsac combine lowering

2023-06-28 Thread Jeff Law via Gcc-patches




On 6/28/23 05:55, Juzhe-Zhong wrote:

Similar to vfwmacc. Add combine patterns as follows:

For vfwnmsac:
1. (set (reg) (fma (neg (float_extend (reg))) (float_extend (reg))) (reg) )))
2. (set (reg) (fma (neg (float_extend (reg))) (reg) (reg) )))

For vfwmsac:
1. (set (reg) (fma (float_extend (reg)) (float_extend (reg))) (neg (reg)) )))
2. (set (reg) (fma (float_extend (reg)) (reg) (neg (reg)) )))

For vfwnmacc:
1. (set (reg) (fma (neg (float_extend (reg))) (float_extend (reg))) (neg (reg)) )))
2. (set (reg) (fma (neg (float_extend (reg))) (reg) (neg (reg)) )))

gcc/ChangeLog:

 * config/riscv/autovec-opt.md (*double_widen_fnma): New pattern.
 (*single_widen_fnma): Ditto.
 (*double_widen_fms): Ditto.
 (*single_widen_fms): Ditto.
 (*double_widen_fnms): Ditto.
 (*single_widen_fnms): Ditto.




+
+;; This helps to match ext + fnma.
+(define_insn_and_split "*single_widen_fnma"
+  [(set (match_operand:VWEXTF 0 "register_operand")
+   (fma:VWEXTF
+ (neg:VWEXTF
+   (float_extend:VWEXTF
+ (match_operand: 2 "register_operand")))
+ (match_operand:VWEXTF 3 "register_operand")
+ (match_operand:VWEXTF 1 "register_operand")))]
I'd like to understand this better.  It looks like it's meant to be a 
bridge to another pattern.  However, it looks like it would be a 4->1 
pattern without needing a bridge.  So I'd like to know why that code 
isn't working.


Can you send the before/after combine dumps which show this bridge 
pattern being used?


The same concern exists with the other bridge patterns, but I don't 
think I need to see the before/after for each of them.




Thanks,
Jeff



Re: [PATCH V3] RISC-V: Fix bug of pre-calculated const vector mask for VNx1BI, VNx2BI and VNx4BI

2023-06-28 Thread Jeff Law via Gcc-patches



On 6/28/23 03:47, Juzhe-Zhong wrote:

This bug blocks the following patches.

GCC doesn't know RVV is using compact mask model.
Consider this following case:

#define N 16

int
main ()
{
   int8_t mask[N] = {0, 1, 0, 1, 0, 1, 0, 1, 0, 1, 0, 1, 0, 1, 0, 1};
   int8_t out[N] = {0};
   for (int8_t i = 0; i < N; ++i)
 if (mask[i])
   out[i] = i;
   for (int8_t i = 0; i < N; ++i)
 {
   if (mask[i])
assert (out[i] == i);
   else
assert (out[i] == 0);
 }
}

Before this patch, the pre-calculated mask in constant memory pool:
.LC1:
 .byte   68 > 0b01000100

This is incorrect, such case failed in execution.

After this patch:
.LC1:
.byte   10 > 0b1010
So I don't get anything like this in my testing.  What are the precise 
arguments you're using to build the testcase?


I'm compiling the test use a trunk compiler with

 -O3 --param riscv-autovec-preference=fixed-vlmax -march=rv64gcv

I get the attached code both before and after your patch.  Clearly I'm 
doing something different/wrong.  So my request is for the precise 
command line you're using and the before/after resulting assembly code.


Jeff

.file   "j.c"
.option nopic
.attribute arch, "rv64i2p1_m2p0_a2p1_f2p2_d2p2_c2p0_v1p0_zicsr2p0_zifencei2p0_zve32f1p0_zve32x1p0_zve64d1p0_zve64f1p0_zve64x1p0_zvl128b1p0_zvl32b1p0_zvl64b1p0"
.attribute unaligned_access, 0
.attribute stack_align, 16
.text
.section .rodata.str1.8,"aMS",@progbits,1
.align  3
.LC1:
.string "j.c"
.align  3
.LC2:
.string "out[i] == i"
.align  3
.LC3:
.string "out[i] == 0"
.section .text.startup,"ax",@progbits
.align  1
.globl  main
.type   main, @function
main:
.LFB0:
.cfi_startproc
lui a5,%hi(.LANCHOR0)
addi    a5,a5,%lo(.LANCHOR0)
ld  a4,0(a5)
ld  a5,8(a5)
addi    sp,sp,-48
.cfi_def_cfa_offset 48
vsetivli        zero,16,e8,m1,ta,ma
sd  zero,16(sp)
sd  a4,0(sp)
sd  a5,8(sp)
sd  ra,40(sp)
.cfi_offset 1, -8
addi    a5,sp,16
sd  zero,24(sp)
vid.v   v1
vl1re8.v        v0,0(sp)
vmsne.vi        v0,v0,0
vsetvli a4,zero,e8,m1,ta,ma
vse8.v  v1,0(a5),v0.t
lbu a5,16(sp)
bne a5,zero,.L2
lbu a4,17(sp)
li  a5,1
bne a4,a5,.L3
lbu a5,18(sp)
bne a5,zero,.L2
lbu a4,19(sp)
li  a5,3
bne a4,a5,.L3
lbu a5,20(sp)
bne a5,zero,.L2
lbu a4,21(sp)
li  a5,5
bne a4,a5,.L3
lbu a5,22(sp)
bne a5,zero,.L2
lbu a4,23(sp)
li  a5,7
bne a4,a5,.L3
lbu a5,24(sp)
bne a5,zero,.L2
lbu a4,25(sp)
li  a5,9
bne a4,a5,.L3
lbu a5,26(sp)
bne a5,zero,.L2
lbu a4,27(sp)
li  a5,11
bne a4,a5,.L3
lbu a5,28(sp)
bne a5,zero,.L2
lbu a4,29(sp)
li  a5,13
bne a4,a5,.L3
lbu a5,30(sp)
bne a5,zero,.L2
lbu a4,31(sp)
li  a5,15
bne a4,a5,.L3
ld  ra,40(sp)
.cfi_remember_state
.cfi_restore 1
li  a0,0
addi    sp,sp,48
.cfi_def_cfa_offset 0
jr  ra
.L2:
.cfi_restore_state
lui a3,%hi(__PRETTY_FUNCTION__.0)
lui a1,%hi(.LC1)
lui a0,%hi(.LC3)
addi    a3,a3,%lo(__PRETTY_FUNCTION__.0)
li      a2,18
addi    a1,a1,%lo(.LC1)
addi    a0,a0,%lo(.LC3)
call    __assert_fail
.L3:
lui a3,%hi(__PRETTY_FUNCTION__.0)
lui a1,%hi(.LC1)
lui a0,%hi(.LC2)
addi    a3,a3,%lo(__PRETTY_FUNCTION__.0)
li      a2,16
addi    a1,a1,%lo(.LC1)
addi    a0,a0,%lo(.LC2)
call    __assert_fail
.cfi_endproc
.LFE0:
.size   main, .-main
.section .rodata
.align  3
.set    .LANCHOR0,. + 0
.LC0:
.string ""
.string "\001"
.string "\001"
.string "\001"
.string "\001"
.string "\001"
.string "\001"
.string "\001"
.ascii  "\001"
.section .srodata,"a"
    .align  3
.type   __PRETTY_FUNCTION__.0, @object
.size   __PRETTY_FUNCTION__.0, 5
__PRETTY_FUNCTION__.0:
.string "main"
.ident  "GCC: (GNU) 14.0.0 20230628 (experimental)"
.section .note.GNU-stack,"",@progbits


Re: [PATCH] Relax type-printer regexp in libstdc++ test suite

2023-06-28 Thread Jonathan Wakely via Gcc-patches
On Wed, 28 Jun 2023 at 16:58, Tom Tromey via Libstdc++ <libstd...@gcc.gnu.org> wrote:

> The libstdc++ test suite checks whether gdb type printers are
> available like so:
>
> set do_whatis_tests [gdb_batch_check "python print(gdb.type_printers)" \
>"\\\[\\\]"]
>
> This regexp assumes that the list of printers is empty.  However,
> sometimes it's convenient to ship a gdb that comes with some default
> printers, causing this to erroneously report that gdb is "too old".
>
> I believe the intent of this check is to ensure that gdb.type_printers
> exists -- not to check its starting value.  This patch changes the
> check to accept any Python list as output.
>
> Note that the patch doesn't look for the trailing "]".  I tried this
> but in my case the output was too long for expect.  It seemed fine to
> just check the start, as the point really is to reject the case where
> the command prints an error message.
>


Looks good. OK for trunk, and OK to backport after some soak time on trunk.
Thanks.



> * testsuite/lib/gdb-test.exp (gdb-test): Relax type-printer
> regexp.
> ---
>  libstdc++-v3/testsuite/lib/gdb-test.exp | 6 +-
>  1 file changed, 5 insertions(+), 1 deletion(-)
>
> diff --git a/libstdc++-v3/testsuite/lib/gdb-test.exp b/libstdc++-v3/testsuite/lib/gdb-test.exp
> index 3728a060aa4..d8e572ef7b3 100644
> --- a/libstdc++-v3/testsuite/lib/gdb-test.exp
> +++ b/libstdc++-v3/testsuite/lib/gdb-test.exp
> @@ -107,8 +107,12 @@ proc gdb-test { marker {selector {}} {load_xmethods 0} } {
> }
>  }
>
> +# A very old version of gdb will not have the type_printers
> +# global.  Some organizations may ship a gdb that has some default
> +# type printers, so accept any list output as indication that the
> +# global exists.
>  set do_whatis_tests [gdb_batch_check "python print(gdb.type_printers)" \
> -  "\\\[\\\]"]
> +  "\\\[.+"]
>  if {!$do_whatis_tests} {
> send_log "skipping 'whatis' tests - gdb too old"
>  }
> --
> 2.40.1
>
>


Re: [PATCH] c++: Fix ICE with parameter pack of decltype(auto) [PR103497]

2023-06-28 Thread Patrick Palka via Gcc-patches
On Sat, Jun 24, 2023 at 9:24 AM Nathaniel Shead wrote:
>
> On Fri, Jun 23, 2023 at 11:59:51AM -0400, Patrick Palka wrote:
> > Hi,
> >
> > On Sat, 22 Apr 2023, Nathaniel Shead via Gcc-patches wrote:
> >
> > > Bootstrapped and tested on x86_64-pc-linux-gnu.
> > >
> > > -- 8< --
> > >
> > > This patch raises an error early when the decltype(auto) specifier is
> > > used as a parameter of a function. This prevents any issues with an
> > > unexpected tree type later on when performing the call.
> >
> > Thanks very much for the patch!  Some minor comments below.
> >
> > >
> > > PR 103497
> >
> > We should include the bug component name when referring to the PR in the
> > commit message (i.e. PR c++/103497) so that upon pushing the patch the
> > post-commit hook automatically adds a comment to the PR reffering to the
> > commit.  I could be wrong but AFAIK the hook only performs this when the
> > component name is included.
>
> Thanks for the review! Fixed.
>
> > >
> > > gcc/cp/ChangeLog:
> > >
> > > * parser.cc (cp_parser_simple_type_specifier): Add check for
> > > decltype(auto) as function parameter.
> > >
> > > gcc/testsuite/ChangeLog:
> > >
> > > * g++.dg/pr103497.C: New test.
> > >
> > > Signed-off-by: Nathaniel Shead 
> > > ---
> > >  gcc/cp/parser.cc| 10 ++
> > >  gcc/testsuite/g++.dg/pr103497.C |  7 +++
> > >  2 files changed, 17 insertions(+)
> > >  create mode 100644 gcc/testsuite/g++.dg/pr103497.C
> > >
> > > diff --git a/gcc/cp/parser.cc b/gcc/cp/parser.cc
> > > index e5f032f2330..1415e07e152 100644
> > > --- a/gcc/cp/parser.cc
> > > +++ b/gcc/cp/parser.cc
> > > @@ -19884,6 +19884,16 @@ cp_parser_simple_type_specifier (cp_parser* parser,
> > >&& cp_lexer_peek_nth_token (parser->lexer, 2)->type != CPP_SCOPE)
> > >  {
> > >type = saved_checks_value (token->u.tree_check_value);
> > > +  /* Within a function parameter declaration, decltype(auto) is always an
> > > +error.  */
> > > +  if (parser->auto_is_implicit_function_template_parm_p
> > > + && TREE_CODE (type) == TEMPLATE_TYPE_PARM
> >
> > We could check is_auto (type) here instead, to avoid any confusion with
> > checking AUTO_IS_DECLTYPE for a non-auto TEMPLATE_TYPE_PARM.
> >
> > > + && AUTO_IS_DECLTYPE (type))
> > > +   {
> > > + error_at (token->location,
> > > +   "cannot declare a parameter with %");
> > > + type = error_mark_node;
> > > +   }
> > >if (decl_specs)
> > > {
> > >   cp_parser_set_decl_spec_type (decl_specs, type,
> > > diff --git a/gcc/testsuite/g++.dg/pr103497.C b/gcc/testsuite/g++.dg/pr103497.C
> > > new file mode 100644
> > > index 000..bcd421c2907
> > > --- /dev/null
> > > +++ b/gcc/testsuite/g++.dg/pr103497.C
> > > @@ -0,0 +1,7 @@
> > > +// { dg-do compile { target c++14 } }
> > > +
> > > +void foo(decltype(auto)... args);  // { dg-error "parameter with .decltype.auto..|no parameter packs" }
> >
> > I noticed for
> >
> >   void foo(decltype(auto) arg);
> >
> > we already issue an identical error from grokdeclarator.  Perhaps we could
> > instead extend the error handling there to detect decltype(auto)... as well,
> > rather than adding new error handling in cp_parser_simple_type_specifier?
>
> Ah thanks, I didn't notice this; this simplifies the change a fair bit.
> How about this patch instead?

LGTM! Though I can't approve the patch myself.

>
> Regtested on x86_64-pc-linux-gnu.
>
> -- 8< --
>
> This patch ensures that checks for usages of 'auto' in function
> parameters also consider parameter packs, since 'type_uses_auto' does
> not seem to consider this case.
>
> PR c++/103497
>
> gcc/cp/ChangeLog:
>
> * decl.cc (grokdeclarator): Check for decltype(auto) in
> parameter pack.
>
> gcc/testsuite/ChangeLog:
>
> * g++.dg/cpp1y/decltype-auto-103497.C: New test.
>
> Signed-off-by: Nathaniel Shead 
> ---
>  gcc/cp/decl.cc| 3 +++
>  gcc/testsuite/g++.dg/cpp1y/decltype-auto-103497.C | 8 
>  2 files changed, 11 insertions(+)
>  create mode 100644 gcc/testsuite/g++.dg/cpp1y/decltype-auto-103497.C
>
> diff --git a/gcc/cp/decl.cc b/gcc/cp/decl.cc
> index 60f107d50c4..aaf691fce68 100644
> --- a/gcc/cp/decl.cc
> +++ b/gcc/cp/decl.cc
> @@ -14044,6 +14044,9 @@ grokdeclarator (const cp_declarator *declarator,
> error ("cannot use %<::%> in parameter declaration");
>
>tree auto_node = type_uses_auto (type);
> +  if (!auto_node && parameter_pack_p)
> +   auto_node = type_uses_auto (PACK_EXPANSION_PATTERN (type));
> +
>if (auto_node && !(cxx_dialect >= cxx17 && template_parm_flag))
> {
>   if (cxx_dialect >= cxx14)
> diff --git a/gcc/testsuite/g++.dg/cpp1y/decltype-auto-103497.C b/gcc/testsuite/g++.dg/cpp1y/decltype-auto-103497.C
> new file mode 100644
> index 000..cedd661710c
> --- /dev/null
> +++ b/gcc/testsuite/g++.dg/cpp1y/decltype-auto-103497.C
> @@ 

[PATCH] c++: cache partial template specialization selection

2023-06-28 Thread Patrick Palka via Gcc-patches
There's currently no cheap way to obtain the partial template
specialization (and arguments relative to it) that was selected for a
class or variable template specialization.  Our only option is to
compute the result from scratch via most_specialized_partial_spec.

For class templates this isn't really an issue because we usually need
this information just once, upon instantiation.  But for variable
templates we need it upon specialization and later upon instantiation.
It'd be good for this information to be readily available in general
however.

To that end, this patch adds a TI_PARTIAL_INFO field to TEMPLATE_INFO
that holds another TEMPLATE_INFO consisting of the partial template and
arguments relative to it, which most_specialized_partial_spec then
uses to transparently cache its (now TEMPLATE_INFO) result.

Similarly, there's no easy way to go from the DECL_TEMPLATE_RESULT of a
partial TEMPLATE_DECL back to the TEMPLATE_DECL.  (Our best option is to
walk the DECL_TEMPLATE_SPECIALIZATIONS list of the primary TEMPLATE_DECL.)
So this patch also uses this new field to link these entities in this
other direction.

Bootstrapped and regtested on x86_64-pc-linux-gnu, does this look OK for
trunk?  Memory usage increases by ~0.2% overall with this patch (due to
the larger TEMPLATE_INFO, which now is the same size as TREE_LIST),
which seems acceptable.
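As a user-level illustration (a hypothetical example, not part of the patch), this is the kind of variable template specialization for which the selected partial specialization is now cached:

```cpp
// Primary variable template and a partial specialization for pointers.
// Specializing is_ptr_v<int*> requires the compiler to select the
// partial specialization and deduce its arguments (T = int) -- the
// result that most_specialized_partial_spec computes and that this
// patch records in the new TI_PARTIAL_INFO field.
template <typename T> constexpr bool is_ptr_v = false;
template <typename T> constexpr bool is_ptr_v<T*> = true;

static_assert(is_ptr_v<int*>, "partial specialization selected");
static_assert(!is_ptr_v<int>, "primary template selected");
```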

gcc/cp/ChangeLog:

* cp-tree.h (tree_template_info::partial): New data member.
(TI_PARTIAL_INFO): New tree accessor.
(most_specialized_partial_spec): Add defaulted bool parameter.
* module.cc (trees_out::core_vals) :
Stream TI_PARTIAL_INFO.
(trees_in::core_vals) : Likewise.
* parser.cc (specialization_of): Adjust after making
most_specialized_partial_spec return TEMPLATE_INFO instead
of TREE_LIST.
* pt.cc (process_partial_specialization): Set TI_PARTIAL_INFO
of 'decl' to point back to the partial TEMPLATE_DECL.  Likewise
(and pass rechecking=true to most_specialized_partial_spec).
(instantiate_class_template): Likewise.
(instantiate_template): Set TI_PARTIAL_INFO to the result of
most_specialized_partial_spec after forming a variable
template specialization.
(most_specialized_partial_spec): Add 'rechecking' parameter.
Exit early if the template is not primary.  Use the TI_PARTIAL_INFO
of the corresponding TEMPLATE_INFO as a cache unless 'rechecking'
is true.  Don't bother setting TREE_TYPE of each TREE_LIST.
(instantiate_decl): Adjust after making
most_specialized_partial_spec return TEMPLATE_INFO instead of
TREE_LIST.
* ptree.cc (cxx_print_xnode) : Dump
TI_PARTIAL_INFO.
---
 gcc/cp/cp-tree.h | 11 ++-
 gcc/cp/module.cc |  2 ++
 gcc/cp/parser.cc |  6 ++--
 gcc/cp/pt.cc | 75 +++-
 gcc/cp/ptree.cc  |  3 ++
 5 files changed, 66 insertions(+), 31 deletions(-)

diff --git a/gcc/cp/cp-tree.h b/gcc/cp/cp-tree.h
index 83982233111..fe94af46346 100644
--- a/gcc/cp/cp-tree.h
+++ b/gcc/cp/cp-tree.h
@@ -1564,6 +1564,7 @@ struct GTY(()) tree_template_info {
   struct tree_base base;
   tree tmpl;
   tree args;
+  tree partial;
   vec *deferred_access_checks;
 };
 
@@ -3755,6 +3756,14 @@ struct GTY(()) lang_decl {
   ((struct tree_template_info*)TEMPLATE_INFO_CHECK (NODE))->args
 #define TI_PENDING_TEMPLATE_FLAG(NODE) \
   TREE_LANG_FLAG_1 (TEMPLATE_INFO_CHECK (NODE))
+
+/* For a class or variable template specialization, this contains the
+   TEMPLATE_INFO result of most_specialized_partial_spec, i.e. the selected
+   partial template specialization and arguments relative to it.  */
+#define TI_PARTIAL_INFO(NODE) \
+  (gcc_checking_assert (PRIMARY_TEMPLATE_P (TI_TEMPLATE (NODE))), \
+   ((struct tree_template_info*)NODE)->partial)
+
 /* For a given TREE_VEC containing a template argument list,
this property contains the number of arguments that are not
defaulted.  */
@@ -7397,7 +7406,7 @@ extern bool comp_template_args(tree, 
tree, tree * = NULL,
 extern int template_args_equal  (tree, tree, bool = false);
 extern tree maybe_process_partial_specialization (tree);
 extern tree most_specialized_instantiation (tree);
-extern tree most_specialized_partial_spec   (tree, tsubst_flags_t);
+extern tree most_specialized_partial_spec   (tree, tsubst_flags_t, bool = 
false);
 extern void print_candidates   (tree);
 extern void instantiate_pending_templates  (int);
 extern tree tsubst_default_argument(tree, int, tree, tree,
diff --git a/gcc/cp/module.cc b/gcc/cp/module.cc
index ecde98d69b4..ea362bdffa4 100644
--- a/gcc/cp/module.cc
+++ b/gcc/cp/module.cc
@@ -6364,6 +6364,7 @@ trees_out::core_vals (tree t)
   {
WT (((lang_tree_node *)t)->template_info.tmpl);
WT (((lang_tree_node *)t)->template_info.args);
+   WT 

Re: [PATCH] RISC-V: Support vfwmul.vv combine lowering

2023-06-28 Thread Jeff Law via Gcc-patches




On 6/27/23 22:15, Juzhe-Zhong wrote:

Consider the following complicated case:
#define TEST_TYPE(TYPE1, TYPE2)                                            \
  __attribute__ ((noipa)) void vwadd_##TYPE1_##TYPE2 (                     \
    TYPE1 *__restrict dst, TYPE1 *__restrict dst2, TYPE1 *__restrict dst3, \
    TYPE1 *__restrict dst4, TYPE2 *__restrict a, TYPE2 *__restrict b,      \
    TYPE2 *__restrict a2, TYPE2 *__restrict b2, int n)                     \
  {                                                                        \
    for (int i = 0; i < n; i++)                                            \
      {                                                                    \
        dst[i] = (TYPE1) a[i] * (TYPE1) b[i];                              \
        dst2[i] = (TYPE1) a2[i] * (TYPE1) b[i];                            \
        dst3[i] = (TYPE1) a2[i] * (TYPE1) a[i];                            \
        dst4[i] = (TYPE1) a[i] * (TYPE1) b2[i];                            \
      }                                                                    \
  }

TEST_TYPE (double, float)

In such a complicated situation, the combine pass cannot combine the extensions
of both operands on the fly.  So it first tries to combine one of the
extensions, and then the other.  The combine flow is as follows:

Original IR:
(set (reg 0) (float_extend: (reg 1)))
(set (reg 3) (float_extend: (reg 2)))
(set (reg 4) (mult: (reg 0) (reg 3)))

First step of combine:
(set (reg 3) (float_extend: (reg 2)))
(set (reg 4) (mult: (float_extend: (reg 1)) (reg 3)))

Second step of combine:
(set (reg 4) (mult: (float_extend: (reg 1)) (float_extend: (reg 2))))

So, to enhance the combine optimization, we add a "pseudo vwfmul.wv" RTL
pattern in autovec-opt.md,
which is (set (reg 0) (mult (float_extend (reg 1)) (reg 2))).
Hmm, something doesn't make sense here.  Combine knows how to do a 3->1 
combination.  I would expect to see the first step fail (substituting 
just one operand), then a later step try to combine all three 
instructions, substituting the extension for both input operands.


Can you pass along the .combine dump from the failing case?

Jeff


Re: [PATCH] Mark asm goto with outputs as volatile

2023-06-28 Thread Jeff Law via Gcc-patches




On 6/27/23 11:23, Andrew Pinski via Gcc-patches wrote:

On Tue, Jun 27, 2023 at 12:14 AM Richard Biener via Gcc-patches
 wrote:


On Tue, Jun 27, 2023 at 5:26 AM Andrew Pinski via Gcc-patches
 wrote:


The manual references asm goto as being implicitly volatile already
and that was done when asm goto could not have outputs. When outputs
were added to `asm goto`, only asm goto without outputs were still being
marked as volatile. Now some parts of GCC decide that removing the `asm goto`
is OK if the output is not used, without updating the CFG (this happens
on both the RTL level and the gimple level). Since the biggest user of `asm 
goto`
is the Linux kernel and they expect them to be volatile (they use them to
copy to/from userspace), we should just mark the inline-asm as volatile.

OK? Bootstrapped and tested on x86_64-linux-gnu.


OK.


Committed to GCC 12 and GCC 13 branches also.
The test should be conditional on target lra since we don't support asm 
goto on the reload targets.


It looks like gcc.dg/pr108095.c needs similar adjustment.  Consider a 
patch to make those adjustments pre-approved.


jeff


RE: FW: [PATCH v5 0/19] Support early break/return auto-vectorization

2023-06-28 Thread Tamar Christina via Gcc-patches
Hi Juzhe,

> 
> Hi, Tamar.
> 
> This is an amazing auto-vectorization flow.
> 
> I am thinking about whether RVV can also benefit from this optimization.
> IMHO, RVV should also use this flow.
> 
> So, to allow RVV  (target uses len as loop_control and mask as flow control), 
> I
> am not sure whether we can do this (Feel free to correct me if I am wrong):
> 
> +  if (LOOP_VINFO_CAN_USE_PARTIAL_VECTORS_P (loop_vinfo))
> + vect_record_loop_mask (loop_vinfo, masks, ncopies, truth_type,
> NULL);
> 
> Maybe it can be ?
> 
> if (LOOP_VINFO_CAN_USE_PARTIAL_VECTORS_P (loop_vinfo)) {
>   if (mask_loop_p)
>  vect_record_loop_mask
>else
>  vect_record_loop_len
> }
> 

Yeah, that should be the only change required; I started this patch before the
loop_len change made it in and just rebased recently.

> 
> +  tree cond = gimple_assign_lhs (new_stmt);
> +  if (masked_loop_p)
> +{
> +  tree mask = vect_get_loop_mask (loop_vinfo, gsi, masks, ncopies,
> truth_type, 0);
> +  cond = prepare_vec_mask (loop_vinfo, TREE_TYPE (mask), mask, cond,
> +_gsi);
> +}
> +
> +  tree t = fold_build2 (NE_EXPR, boolean_type_node, cond,
> + build_zero_cst (truth_type));
> 
> From my understanding, you are using final_mask = loop_mask (WHILE_ULT)
> && control_mask (comparison).
> Then test final_mask using NE_EXPR. Am I right?

Yeah that's right, It's creating the mask for partial iterations.  The only 
other constraint is
being able to reduce a boolean mask using inclusive OR,  but that's optional 
and is only
needed if one side of the comparison produces more than 1 copy (so it's only 
checked then).

> 
> For RVV, I thinking whether we can have a good way to do this testing.
> Not sure whether we can have something like LEN_TEST_MASK_NE (loop_len,
> control_mask...)
> 

Hmm, is just the vect_record_loop_len change not enough? (I haven't followed the
masking implementation in RVV in detail) but I assume that it follows the general
principle that ANDing an operation with a mask creates a masked operation?

That is to say, I thought LOOP_LEN was only for the loop control, which doesn't
change here.

> I am not saying that we should support "early break" auto-vectorization for
> RVV (loop_len && control_mask).
> I am just writing some comments trying to figure out how I can adapt your
> work for RVV in the future.
> 

Yes, happy to help; the more uses it gets, the more bugs I can fix.

Cheers,
Tamar

> Thanks.
> 
> 
> juzhe.zh...@rivai.ai
> 
> From: Li, Pan2
> Date: 2023-06-28 22:21
> To: juzhe.zh...@rivai.ai
> Subject: FW: [PATCH v5 0/19] Support early break/return auto-vectorization
> FYI.
> 
> -Original Message-
> From: Gcc-patches 
> On Behalf Of Tamar Christina via Gcc-patches
> Sent: Wednesday, June 28, 2023 9:41 PM
> To: gcc-patches@gcc.gnu.org
> Cc: n...@arm.com; rguent...@suse.de; j...@ventanamicro.com
> Subject: [PATCH v5 0/19] Support early break/return auto-vectorization
> 
> Hi All,
> 
> This patch adds initial support for early break vectorization in GCC.
> The support is added for any target that implements a vector cbranch optab,
> this includes both fully masked and non-masked targets.
> 
> Depending on the operation, the vectorizer may also require support for
> boolean mask reductions using Inclusive OR.  This is however only checked
> when the comparison would produce multiple statements.
> 
> Concretely the kind of loops supported are of the forms:
> 
> for (int i = 0; i < N; i++)
> {
>
>if ()
>  {
>...
>;
>  }
>
> }
> 
> where  can be:
> - break
> - return
> - goto
> 
> Any number of statements can be used before the  occurs.
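A concrete sketch of the supported loop shape (a hypothetical example, not taken from the series), with a `return`-style early exit from fixed-size, statically allocated buffers as the limitations below require:

```cpp
// Early-break loop shape: fixed trip count N known at compile time,
// any number of statements before the exit, and an early `return`
// guarded by a comparison the vectorizer turns into a vector cbranch.
constexpr int N = 16;

int first_over(const int (&a)[N], const int (&b)[N], int limit) {
  for (int i = 0; i < N; i++) {
    int t = a[i] + b[i];   // statements before the early exit
    if (t > limit)
      return i;            // early exit
  }
  return -1;
}
```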
> 
> Since this is an initial version for GCC 14 it has the following limitations 
> and
> features:
> 
> - Only fixed sized iterations and buffers are supported.  That is to say any
>   vectors loaded or stored must be to statically allocated arrays with known
>   sizes. N must also be known.  This limitation is because our primary target
>   for this optimization is SVE.  For VLA SVE we can't easily do cross page
>   iteration checks. The result is also unlikely to be beneficial. For that
>   reason we punt support for variable buffers till we have First-Faulting
>   support in GCC.
> - any stores in  should not be to the same objects as in
>   .  Loads are fine as long as they don't have the possibility to
>   alias.  More concretely, we block RAW dependencies when the intermediate
> value
>   can't be separated from the store, or the store itself can't be moved.
> - The number of loop iterations must be known; this is just a temporary
>   limitation that I intend to address in GCC 14 itself as follow on patches.
> - Prologue peeling, alignment peeling and loop versioning are supported.
> - Fully masked loops, unmasked loops and partially masked loops are
> supported
> - Any number of loop early exits are supported.
> - The early exit must be before 

[PATCH] Relax type-printer regexp in libstdc++ test suite

2023-06-28 Thread Tom Tromey via Gcc-patches
The libstdc++ test suite checks whether gdb type printers are
available like so:

set do_whatis_tests [gdb_batch_check "python print(gdb.type_printers)" \
   "\\\[\\\]"]

This regexp assumes that the list of printers is empty.  However,
sometimes it's convenient to ship a gdb that comes with some default
printers, causing this to erroneously report that gdb is "too old".

I believe the intent of this check is to ensure that gdb.type_printers
exists -- not to check its starting value.  This patch changes the
check to accept any Python list as output.

Note that the patch doesn't look for the trailing "]".  I tried this
but in my case the output was too long for expect.  It seemed fine to
just check the start, as the point really is to reject the case where
the command prints an error message.

* testsuite/lib/gdb-test.exp (gdb-test): Relax type-printer
regexp.
---
 libstdc++-v3/testsuite/lib/gdb-test.exp | 6 +-
 1 file changed, 5 insertions(+), 1 deletion(-)

diff --git a/libstdc++-v3/testsuite/lib/gdb-test.exp 
b/libstdc++-v3/testsuite/lib/gdb-test.exp
index 3728a060aa4..d8e572ef7b3 100644
--- a/libstdc++-v3/testsuite/lib/gdb-test.exp
+++ b/libstdc++-v3/testsuite/lib/gdb-test.exp
@@ -107,8 +107,12 @@ proc gdb-test { marker {selector {}} {load_xmethods 0} } {
}
 }
 
+# A very old version of gdb will not have the type_printers
+# global.  Some organizations may ship a gdb that has some default
+# type printers, so accept any list output as indication that the
+# global exists.
 set do_whatis_tests [gdb_batch_check "python print(gdb.type_printers)" \
-  "\\\[\\\]"]
+  "\\\[.+"]
 if {!$do_whatis_tests} {
send_log "skipping 'whatis' tests - gdb too old"
 }
-- 
2.40.1



[committed] d: Fix d_signed_or_unsigned_type is invoked for vector types (PR110193)

2023-06-28 Thread Iain Buclaw via Gcc-patches
Hi,

The function being changed in this patch can be invoked on VECTOR_TYPE,
but the implementation assumes it works on integer types only.

To fix, added a check whether the type passed is any `__vector(T)' or
non-integral type, and return early by calling
`signed_or_unsigned_type_for()' instead.

Problem was found by instrumenting TYPE_PRECISION and ICEing when
applied on VECTOR_TYPEs.

Bootstrapped and regression tested on x86_64-linux-gnu/-m32, committed
to mainline.

Regards,
Iain.

---
PR d/110193

gcc/d/ChangeLog:

* types.cc (d_signed_or_unsigned_type): Handle being called with any
vector or non-integral type.
---
 gcc/d/types.cc | 4 ++--
 1 file changed, 2 insertions(+), 2 deletions(-)

diff --git a/gcc/d/types.cc b/gcc/d/types.cc
index a4c05bfb75f..bdf07f83d4b 100644
--- a/gcc/d/types.cc
+++ b/gcc/d/types.cc
@@ -49,8 +49,8 @@ along with GCC; see the file COPYING3.  If not see
 static tree
 d_signed_or_unsigned_type (int unsignedp, tree type)
 {
-  if (TYPE_UNSIGNED (type) == (unsigned) unsignedp)
-return type;
+  if (VECTOR_TYPE_P (type) || !ANY_INTEGRAL_TYPE_P (type))
+return signed_or_unsigned_type_for (unsignedp, type);
 
   if (TYPE_PRECISION (type) == TYPE_PRECISION (d_cent_type))
 return unsignedp ? d_ucent_type : d_cent_type;
-- 
2.39.2



Re: [PATCH] c++: fix error reporting routines re-entered ICE [PR110175]

2023-06-28 Thread Jason Merrill via Gcc-patches

On 6/23/23 18:25, Marek Polacek wrote:

Here we get the "error reporting routines re-entered" ICE because
of an unguarded use of warning_at.  While at it, I added a check
for a warning_at just above it.

Bootstrapped/regtested on x86_64-pc-linux-gnu, ok for trunk?


OK.


PR c++/110175

gcc/cp/ChangeLog:

* typeck.cc (cp_build_unary_op): Check tf_warning before warning.

gcc/testsuite/ChangeLog:

* g++.dg/cpp0x/decltype-110175.C: New test.
---
  gcc/cp/typeck.cc | 5 +++--
  gcc/testsuite/g++.dg/cpp0x/decltype-110175.C | 6 ++
  2 files changed, 9 insertions(+), 2 deletions(-)
  create mode 100644 gcc/testsuite/g++.dg/cpp0x/decltype-110175.C

diff --git a/gcc/cp/typeck.cc b/gcc/cp/typeck.cc
index da591dafc8f..859b133a18d 100644
--- a/gcc/cp/typeck.cc
+++ b/gcc/cp/typeck.cc
@@ -7561,7 +7561,8 @@ cp_build_unary_op (enum tree_code code, tree xarg, bool 
noconvert,
/* [depr.volatile.type] "Postfix ++ and -- expressions and
   prefix ++ and -- expressions of volatile-qualified arithmetic
   and pointer types are deprecated."  */
-   if (TREE_THIS_VOLATILE (arg) || CP_TYPE_VOLATILE_P (TREE_TYPE (arg)))
+   if ((TREE_THIS_VOLATILE (arg) || CP_TYPE_VOLATILE_P (TREE_TYPE (arg)))
+   && (complain & tf_warning))
  warning_at (location, OPT_Wvolatile,
  "%qs expression of %-qualified type is "
  "deprecated",
@@ -7592,7 +7593,7 @@ cp_build_unary_op (enum tree_code code, tree xarg, bool 
noconvert,
return error_mark_node;
  }
/* Otherwise, [depr.incr.bool] says this is deprecated.  */
-   else
+   else if (complain & tf_warning)
  warning_at (location, OPT_Wdeprecated,
  "use of an operand of type %qT "
  "in % is deprecated",
diff --git a/gcc/testsuite/g++.dg/cpp0x/decltype-110175.C 
b/gcc/testsuite/g++.dg/cpp0x/decltype-110175.C
new file mode 100644
index 000..39643cafcf8
--- /dev/null
+++ b/gcc/testsuite/g++.dg/cpp0x/decltype-110175.C
@@ -0,0 +1,6 @@
+// PR c++/110175
+// { dg-do compile { target c++11 } }
+
+template auto f(T t) -> decltype(++t) { return t; } // { dg-warning "reference" 
"" { target c++14_down } }
+void f(...) {}
+void g() { f(true); }

base-commit: 5388a43f6a3f348929292998bd6d0c1da6f006de




Re: [PATCH] c++: redundant targ coercion for var/alias tmpls

2023-06-28 Thread Jason Merrill via Gcc-patches

On 6/23/23 12:23, Patrick Palka wrote:

On Fri, 23 Jun 2023, Jason Merrill wrote:


On 6/21/23 13:19, Patrick Palka wrote:

When stepping through the variable/alias template specialization code
paths, I noticed we perform template argument coercion twice: first from
instantiate_alias_template / finish_template_variable and again from
tsubst_decl (during instantiate_template).  It should suffice to perform
coercion once.

To that end patch elides this second coercion from tsubst_decl when
possible.  We can't get rid of it completely because we don't always
specialize a variable template from finish_template_variable: we could
also be doing so directly from instantiate_template during variable
template partial specialization selection, in which case the coercion
from tsubst_decl would be the first and only coercion.


Perhaps we should be coercing in lookup_template_variable rather than
finish_template_variable?


Ah yes, there's a patch for that at
https://gcc.gnu.org/pipermail/gcc-patches/2023-May/617377.html :)


So after that patch, can we get rid of the second coercion completely?

Jason



Re: [PATCH] c++: ahead of time variable template-id coercion [PR89442]

2023-06-28 Thread Jason Merrill via Gcc-patches

On 6/28/23 11:28, Jason Merrill wrote:

On 5/3/23 09:50, Patrick Palka wrote:

This patch makes us coerce the arguments of a variable template-id ahead
of time, as we do for other template-ids, which allows us to immediately
diagnose template parameter/argument kind mismatches and arity 
mismatches.


Unfortunately this causes a regression in cpp1z/constexpr-if20.C: 
coercing

the variable template-id m ahead of time means we strip it of
typedefs, yielding m::q, typename C::q>, but in this
stripped form we're directly using 'i' and so we expect to have captured
it.  This is PR107437 but with a variable template instead of a class
template.  I'm not sure how to fix this :(


Hmm, it does seem like strip_typedefs needs to be more conservative in a 
lambda.  We can probably stop stripping dependent function-scope 
typedefs in general without breaking things like cpp0x/decltype62.C.


@@ -22097,7 +22099,7 @@ instantiate_template (tree tmpl, tree 
orig_args, tsubst_flags_t complain)

    /* We need to determine if we're using a partial or explicit
   specialization now, because the type of the variable could be
   different.  */
-  tree tid = lookup_template_variable (tmpl, targ_ptr);
+  tree tid = build2 (TEMPLATE_ID_EXPR, NULL_TREE, tmpl, targ_ptr);


Why?  I'd think we want to consider partial specializations based on the 
coerced arguments.


...ah, but presumably we would have already come through 
lookup_template_variable, so we don't need to call it again here.  The 
patch is OK.


Jason



Re: [PATCH] c++: ahead of time variable template-id coercion [PR89442]

2023-06-28 Thread Jason Merrill via Gcc-patches

On 5/3/23 09:50, Patrick Palka wrote:

This patch makes us coerce the arguments of a variable template-id ahead
of time, as we do for other template-ids, which allows us to immediately
diagnose template parameter/argument kind mismatches and arity mismatches.

Unfortunately this causes a regression in cpp1z/constexpr-if20.C: coercing
the variable template-id m ahead of time means we strip it of
typedefs, yielding m::q, typename C::q>, but in this
stripped form we're directly using 'i' and so we expect to have captured
it.  This is PR107437 but with a variable template instead of a class
template.  I'm not sure how to fix this :(


Hmm, it does seem like strip_typedefs needs to be more conservative in a 
lambda.  We can probably stop stripping dependent function-scope 
typedefs in general without breaking things like cpp0x/decltype62.C.



@@ -22097,7 +22099,7 @@ instantiate_template (tree tmpl, tree orig_args, 
tsubst_flags_t complain)
/* We need to determine if we're using a partial or explicit
 specialization now, because the type of the variable could be
 different.  */
-  tree tid = lookup_template_variable (tmpl, targ_ptr);
+  tree tid = build2 (TEMPLATE_ID_EXPR, NULL_TREE, tmpl, targ_ptr);


Why?  I'd think we want to consider partial specializations based on the 
coerced arguments.


Jason



[PATCH 2/2] AArch64: New RTL for ABDL

2023-06-28 Thread Oluwatamilore Adebayo via Gcc-patches
From: oluade01 

This patch adds new RTL for ABDL (sabdl, sabdl2, uabdl, uabdl2).

gcc/ChangeLog:

* config/aarch64/aarch64-simd.md
(vec_widen_abdl_lo_, vec_widen_abdl_hi_):
Expansions for abd vec widen optabs.
(aarch64_abdl_insn): VQW based abdl RTL.
* config/aarch64/iterators.md (USMAX_EXT): Code attributes
that give the appropriate extend RTL for the max RTL.

gcc/testsuite/ChangeLog:

* gcc.target/aarch64/abd_2.c: Added ABDL testcases.
* gcc.target/aarch64/abd_3.c: Added ABDL testcases.
* gcc.target/aarch64/abd_4.c: Added ABDL testcases.
* gcc.target/aarch64/abd_none_2.c: Added ABDL testcases.
* gcc.target/aarch64/abd_none_3.c: Added ABDL testcases.
* gcc.target/aarch64/abd_none_4.c: Added ABDL testcases.
* gcc.target/aarch64/abd_run_1.c: Added ABDL testcases.
* gcc.target/aarch64/sve/abd_1.c: Added ABDL testcases.
* gcc.target/aarch64/sve/abd_2.c: Added ABDL testcases.
* gcc.target/aarch64/sve/abd_none_1.c: Added ABDL testcases.
* gcc.target/aarch64/sve/abd_none_2.c: Added ABDL testcases.
---
 gcc/config/aarch64/aarch64-simd.md| 65 ++
 gcc/config/aarch64/iterators.md   |  3 +
 gcc/testsuite/gcc.target/aarch64/abd_2.c  | 33 +---
 gcc/testsuite/gcc.target/aarch64/abd_3.c  | 36 +---
 gcc/testsuite/gcc.target/aarch64/abd_4.c  | 34 
 gcc/testsuite/gcc.target/aarch64/abd_none_2.c | 73 
 gcc/testsuite/gcc.target/aarch64/abd_none_3.c | 73 
 gcc/testsuite/gcc.target/aarch64/abd_none_4.c | 84 +++
 gcc/testsuite/gcc.target/aarch64/abd_run_1.c  | 29 +++
 .../gcc.target/aarch64/abd_widen_2.c  | 62 ++
 .../gcc.target/aarch64/abd_widen_3.c  | 62 ++
 .../gcc.target/aarch64/abd_widen_4.c  | 56 +
 gcc/testsuite/gcc.target/aarch64/sve/abd_1.c  | 57 +++--
 gcc/testsuite/gcc.target/aarch64/sve/abd_2.c  | 47 +--
 .../gcc.target/aarch64/sve/abd_none_1.c   | 73 
 .../gcc.target/aarch64/sve/abd_none_2.c   | 80 ++
 16 files changed, 811 insertions(+), 56 deletions(-)
 create mode 100644 gcc/testsuite/gcc.target/aarch64/abd_widen_2.c
 create mode 100644 gcc/testsuite/gcc.target/aarch64/abd_widen_3.c
 create mode 100644 gcc/testsuite/gcc.target/aarch64/abd_widen_4.c

diff --git a/gcc/config/aarch64/aarch64-simd.md 
b/gcc/config/aarch64/aarch64-simd.md
index 
bf90202ba2ad3f62f2020486d21256f083effb07..9acf0ab3067a76c0ba49d61e2857558c8482e77d
 100644
--- a/gcc/config/aarch64/aarch64-simd.md
+++ b/gcc/config/aarch64/aarch64-simd.md
@@ -975,6 +975,71 @@ (define_expand "aarch64_abdl2"
   }
 )
 
+(define_insn "aarch64_abdl_hi_internal"
+  [(set (match_operand: 0 "register_operand" "=w")
+   (abs:
+ (minus:
+   (ANY_EXTEND:
+ (vec_select:
+   (match_operand:VQW 1 "register_operand" "w")
+   (match_operand:VQW 3 "vect_par_cnst_hi_half" "")))
+   (ANY_EXTEND:
+ (vec_select:
+   (match_operand:VQW 2 "register_operand" "w")
+   (match_dup 3))]
+  "TARGET_SIMD"
+  "abdl2\t%0., %1., %2."
+  [(set_attr "type" "neon_abd_long")]
+)
+
+(define_insn "aarch64_abdl_lo_internal"
+  [(set (match_operand: 0 "register_operand" "=w")
+   (minus:
+ (USMAX:
+   (:
+ (vec_select:
+   (match_operand:VQW 1 "register_operand" "w")
+   (match_operand:VQW 3 "vect_par_cnst_lo_half" "")))
+   (:
+ (vec_select:
+   (match_operand:VQW 2 "register_operand" "w")
+   (match_dup 3
+ (:
+   (:
+ (vec_select: (match_dup 1) (match_dup 3)))
+   (:
+ (vec_select: (match_dup 2) (match_dup 3))]
+  "TARGET_SIMD"
+  "abdl\t%0., %1., %2."
+  [(set_attr "type" "neon_abd_long")]
+)
+
+(define_expand "vec_widen_abd_hi_"
+  [(match_operand: 0 "register_operand")
+   (ANY_EXTEND: (match_operand:VQW 1 "register_operand"))
+   (ANY_EXTEND: (match_operand:VQW 2 "register_operand"))]
+  "TARGET_SIMD"
+  {
+rtx p = aarch64_simd_vect_par_cnst_half (mode, , true);
+emit_insn (gen_aarch64_abdl_hi_internal (operands[0], 
operands[1],
+  operands[2], p));
+DONE;
+  }
+)
+
+(define_expand "vec_widen_abd_lo_"
+  [(match_operand: 0 "register_operand")
+   (ANY_EXTEND: (match_operand:VQW 1 "register_operand"))
+   (ANY_EXTEND: (match_operand:VQW 2 "register_operand"))]
+  "TARGET_SIMD"
+  {
+rtx p = aarch64_simd_vect_par_cnst_half (mode, , false);
+emit_insn (gen_aarch64_abdl_lo_internal (operands[0], 
operands[1],
+  operands[2], p));
+DONE;
+  }
+)
+
 (define_insn "aarch64_abal"
   [(set (match_operand: 0 "register_operand" "=w")
(plus:
diff 

Re: [PATCH 2/2] AArch64: New RTL for ABDL

2023-06-28 Thread Oluwatamilore Adebayo via Gcc-patches
> > +(define_insn "aarch64_abdl_hi_internal"
> > +  [(set (match_operand: 0 "register_operand" "=w")
> > +   (minus:
> > + (USMAX:
> > +   (:
> > + (vec_select:
> > +   (match_operand:VQW 1 "register_operand" "w")
> > +   (match_operand:VQW 3 "vect_par_cnst_hi_half" "")))
> > +   (:
> > + (vec_select:
> > +   (match_operand:VQW 2 "register_operand" "w")
> > +   (match_dup 3
> > + (:
> > +   (:
> > + (vec_select: (match_dup 1) (match_dup 3)))
> > +   (:
> > + (vec_select: (match_dup 2) (match_dup 3))]
> > +  "TARGET_SIMD"
> > +  "abdl2\t%0., %1., %2."
> > +  [(set_attr "type" "neon_abd_long")]
> > +)
> 
> We don't need the (minus (max…) (min…)) thing when widening is
> involved.  It should be enough to do something like:
> 
>   (abs:
> (minus:
>   (ANY_EXTEND:
>   (vec_select:…))
>   (ANY_EXTEND:
>   (vec_select:…

Change made.

> Sorry to be awkward, but could you put the widening cases in a separate
> file?  It's not very easy as things stand to work out which tests are
> matched against widening ops and which aren't.

Done.

Patch in next email.


[PATCH 1/2] Mid engine setup [SU]ABDL

2023-06-28 Thread Oluwatamilore Adebayo via Gcc-patches
From: oluade01 

This updates vect_recog_abd_pattern to recognize the widening
variant of absolute difference (ABDL, ABDL2).

gcc/ChangeLog:

* internal-fn.cc (widening_fn_p, decomposes_to_hilo_fn_p):
Add IFN_VEC_WIDEN_ABD to the switch statement.
* internal-fn.def (VEC_WIDEN_ABD): New internal hilo optab.
* optabs.def (vec_widen_sabd_optab,
vec_widen_sabd_hi_optab, vec_widen_sabd_lo_optab,
vec_widen_sabd_odd_even, vec_widen_sabd_even_optab,
vec_widen_uabd_optab,
vec_widen_uabd_hi_optab, vec_widen_uabd_lo_optab,
vec_widen_uabd_odd_even, vec_widen_uabd_even_optab):
New optabs.
* tree-vect-patterns.cc (vect_recog_abd_pattern): Update to
build a VEC_WIDEN_ABD call if the input precision is smaller
than the precision of the output.
(vect_recog_widen_abd_pattern): Should an ABD expression be
found preceding an extension, replace the two with a
VEC_WIDEN_ABD.
---
 gcc/doc/md.texi   |  11 ++
 gcc/internal-fn.def   |   5 +
 gcc/optabs.def|  10 ++
 gcc/tree-vect-patterns.cc | 205 +-
 4 files changed, 183 insertions(+), 48 deletions(-)

diff --git a/gcc/doc/md.texi b/gcc/doc/md.texi
index 
e11b10d2fca11016232921bc85e47975f700e6c6..2ae6182b925d0cf8950dc830d083cf93baf2eaa1
 100644
--- a/gcc/doc/md.texi
+++ b/gcc/doc/md.texi
@@ -5617,6 +5617,17 @@ signed/unsigned elements of size S@.  Subtract the 
high/low elements of 2 from
 1 and widen the resulting elements. Put the N/2 results of size 2*S in the
 output vector (operand 0).
 
+@cindex @code{vec_widen_sabdl_hi_@var{m}} instruction pattern
+@cindex @code{vec_widen_sabdl_lo_@var{m}} instruction pattern
+@cindex @code{vec_widen_uabdl_hi_@var{m}} instruction pattern
+@cindex @code{vec_widen_uabdl_lo_@var{m}} instruction pattern
+@item @samp{vec_widen_uabdl_hi_@var{m}}, @samp{vec_widen_uabdl_lo_@var{m}}
+@itemx @samp{vec_widen_sabdl_hi_@var{m}}, @samp{vec_widen_sabdl_lo_@var{m}}
+Signed/Unsigned widening absolute difference long.  Operands 1 and 2 are
+vectors with N signed/unsigned elements of size S@.  Find the absolute
+difference between 1 and 2 and widen the resulting elements.  Put the N/2
+results of size 2*S in the output vector (operand 0).
+
 @cindex @code{vec_addsub@var{m}3} instruction pattern
 @item @samp{vec_addsub@var{m}3}
 Alternating subtract, add with even lanes doing subtract and odd
diff --git a/gcc/internal-fn.def b/gcc/internal-fn.def
index 
116965f4830cec8f60642ff011a86b6562e2c509..d67274d68b49943a88c531e903fd03b42343ab97
 100644
--- a/gcc/internal-fn.def
+++ b/gcc/internal-fn.def
@@ -352,6 +352,11 @@ DEF_INTERNAL_WIDENING_OPTAB_FN (VEC_WIDEN_MINUS,
first,
vec_widen_ssub, vec_widen_usub,
binary)
+DEF_INTERNAL_WIDENING_OPTAB_FN (VEC_WIDEN_ABD,
+   ECF_CONST | ECF_NOTHROW,
+   first,
+   vec_widen_sabd, vec_widen_uabd,
+   binary)
 DEF_INTERNAL_OPTAB_FN (VEC_FMADDSUB, ECF_CONST, vec_fmaddsub, ternary)
 DEF_INTERNAL_OPTAB_FN (VEC_FMSUBADD, ECF_CONST, vec_fmsubadd, ternary)
 
diff --git a/gcc/optabs.def b/gcc/optabs.def
index 
35b835a6ac56d72417dac8ddfd77a8a7e2475e65..68dfa1550f791a2fe833012157601ecfa68f1e09
 100644
--- a/gcc/optabs.def
+++ b/gcc/optabs.def
@@ -418,6 +418,11 @@ OPTAB_D (vec_widen_sadd_hi_optab, "vec_widen_sadd_hi_$a")
 OPTAB_D (vec_widen_sadd_lo_optab, "vec_widen_sadd_lo_$a")
 OPTAB_D (vec_widen_sadd_odd_optab, "vec_widen_sadd_odd_$a")
 OPTAB_D (vec_widen_sadd_even_optab, "vec_widen_sadd_even_$a")
+OPTAB_D (vec_widen_sabd_optab, "vec_widen_sabd_$a")
+OPTAB_D (vec_widen_sabd_hi_optab, "vec_widen_sabd_hi_$a")
+OPTAB_D (vec_widen_sabd_lo_optab, "vec_widen_sabd_lo_$a")
+OPTAB_D (vec_widen_sabd_odd_optab, "vec_widen_sabd_odd_$a")
+OPTAB_D (vec_widen_sabd_even_optab, "vec_widen_sabd_even_$a")
 OPTAB_D (vec_widen_sshiftl_hi_optab, "vec_widen_sshiftl_hi_$a")
 OPTAB_D (vec_widen_sshiftl_lo_optab, "vec_widen_sshiftl_lo_$a")
 OPTAB_D (vec_widen_umult_even_optab, "vec_widen_umult_even_$a")
@@ -436,6 +441,11 @@ OPTAB_D (vec_widen_uadd_hi_optab, "vec_widen_uadd_hi_$a")
 OPTAB_D (vec_widen_uadd_lo_optab, "vec_widen_uadd_lo_$a")
 OPTAB_D (vec_widen_uadd_odd_optab, "vec_widen_uadd_odd_$a")
 OPTAB_D (vec_widen_uadd_even_optab, "vec_widen_uadd_even_$a")
+OPTAB_D (vec_widen_uabd_optab, "vec_widen_uabd_$a")
+OPTAB_D (vec_widen_uabd_hi_optab, "vec_widen_uabd_hi_$a")
+OPTAB_D (vec_widen_uabd_lo_optab, "vec_widen_uabd_lo_$a")
+OPTAB_D (vec_widen_uabd_odd_optab, "vec_widen_uabd_odd_$a")
+OPTAB_D (vec_widen_uabd_even_optab, "vec_widen_uabd_even_$a")
 OPTAB_D (vec_addsub_optab, "vec_addsub$a3")
 OPTAB_D (vec_fmaddsub_optab, "vec_fmaddsub$a4")
 OPTAB_D (vec_fmsubadd_optab, "vec_fmsubadd$a4")
diff --git a/gcc/tree-vect-patterns.cc b/gcc/tree-vect-patterns.cc
index 

Re: [PATCH 1/2] Mid engine setup [SU]ABDL

2023-06-28 Thread Oluwatamilore Adebayo via Gcc-patches
> The new optabs need to be documented in doc/md.texi.

Done.

> “Long” is a bit of an architecture-specific term.  Maybe just:
> 
>Try to find the following ABsolute Difference (ABD) or
>widening ABD (WIDEN_ABD) pattern:

Change made.

> >> - VTYPE x, y, out;
> >> + VTYPE x, y;
> >> + WTYPE out;
> >>   type diff;
> >> loop i in range:
> >>   S1 diff = x[i] - y[i]
> >>   S2 out[i] = ABS_EXPR ;
> >>  
> >> -   where 'type' is a integer and 'VTYPE' is a vector of integers
> >> -   the same size as 'type'
> >> +   where 'VTYPE' and 'WTYPE' are vectors of integers.
> >> +   'WTYPE' may be wider than 'VTYPE'.
> >> +   'type' is as wide as 'WTYPE'.
> >
> > I don't think the existing comment is right about the types.  What we're
> > matching is scalar code, so VTYPE and (now) WTYPE are integers rather
> > than vectors of integers.
> 
> Gah, sorry, I realise now that the point was that VTYPE and WTYPE
> are sequences rather than scalars.  But patterns are used for SLP
> as well as loops, and the inputs and outputs might not be memory
> objects.  So:
> 
> > I think it would be clearer to write:
> >
> >S1 diff = (type) x[i] - (type) y[i]
> >S2 out[i] = ABS_EXPR <(WTYPE) diff>;
> >
> > since the promotions happen on the operands.
> >
> > It'd be good to keep the part about 'type' being an integer.
> >
> > Rather than:
> >
> >  'WTYPE' may be wider than 'VTYPE'.
> >  'type' is as wide as 'WTYPE'.
> >
> > maybe:
> >
> >  'type' is no narrower than 'VTYPE' (but may be wider)
> >  'WTYPE' is no narrower than 'type' (but may be wider)
> 
> ...how about:
> 
>   TYPE1 x;
>   TYPE2 y;
>   TYPE3 x_cast = (TYPE3) x;  // widening or no-op
>   TYPE3 y_cast = (TYPE3) y;  // widening or no-op
>   TYPE3 diff = x_cast - y_cast;
>   TYPE4 diff_cast = (TYPE4) diff;// widening or no-op
> >   TYPE5 abs = ABS(U)_EXPR <diff_cast>;
> 
> (based on the comment above vect_recog_widen_op_pattern).

Done.
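For a concrete scalar instance of the TYPE1..TYPE5 shape above (illustrative function and type choices, not GCC code): the inputs are narrow, the subtraction happens in a wider type, and the absolute result is stored to elements wider than the inputs, which is exactly when WIDEN_ABD pays off.

```c
#include <stdlib.h>

/* Scalar shape of the ABD / WIDEN_ABD pattern: unsigned char inputs,
   the difference computed in int, the absolute value stored to a
   wider (unsigned short) element.  */
void
widen_abd (const unsigned char *x, const unsigned char *y,
	   unsigned short *out, int n)
{
  for (int i = 0; i < n; i++)
    {
      int diff = x[i] - y[i];	/* TYPE3 diff = x_cast - y_cast */
      out[i] = abs (diff);	/* TYPE5 abs = ABS_EXPR <diff_cast> */
    }
}
```

With [SU]ABDL support this loop can be vectorized without first widening the inputs to the output element size.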

> WTYPE can't be narrower than VTYPE though.  I think with the changes
> suggested above, the text before this block describes the conditions
> in enough detail, and so we can just say:
> 
>WIDEN_ABD exists to optimize the case where WTYPE is at least twice as
>wide as VTYPE.

Change made.

> SABD_EXPR/UABD_EXPR should be IFN_ABD
> SABDL_EXPR/UABDL_EXPR should be IFN_WIDEN_ABD

Change made.

> Maybe it would be easier to remove this comment, since I think the
> comment above the function says enough.

Done.

> Rather than have the "extend" variable, how about:
> 
> >  
> > -  vect_pattern_detected ("vect_recog_abd_pattern", last_stmt);
> > +  tree vectype_in = get_vectype_for_scalar_type (vinfo, abd_in_type);
> > +  tree vectype_out = get_vectype_for_scalar_type (vinfo, abd_out_type);
> > +  if (!vectype_in || !vectype_out)
> > +return NULL;
> >  
> > -  if (!vectype
> > -  || !direct_internal_fn_supported_p (IFN_ABD, vectype,
> > +  if (ifn == IFN_VEC_WIDEN_ABD)
> > +{
> > +  code_helper dummy_code;
> > +  int dummy_int;
> > +  auto_vec<tree> dummy_vec;
> > +  if (!supportable_widening_operation (vinfo, ifn, stmt_vinfo,
> > +  vectype_out, vectype_in,
> > +  &dummy_code, &dummy_code,
> > +  &dummy_int, &dummy_vec))
> > +   {
> > + /* There are architectures that have the ABD instructions
> > +but not the ABDL instructions.  If we just return NULL here
> > +we will miss an occasion where we should have used ABD.
> > +So we change back to ABD and try again.  */
> > + ifn = IFN_ABD;
> > + abd_out_type = abd_in_type;
> > + extend = true;
> > +   }
> > +}
> 
> making this:
> 
>   if (TYPE_PRECISION (out_type) >= TYPE_PRECISION (abd_in_type) * 2)
> {
>   tree mid_type
>   = build_nonstandard_integer_type (TYPE_PRECISION (abd_in_type) * 2,
> TYPE_UNSIGNED (abd_in_type));
>   tree mid_vectype = get_vectype_for_scalar_type (vinfo, mid_type);
>   code_helper dummy_code;
>   int dummy_int;
>   auto_vec<tree> dummy_vec;
>   if (mid_vectype
> && supportable_widening_operation (vinfo, IFN_WIDEN_ABD, stmt_vinfo,
>mid_vectype, vectype_in,
>&dummy_code, &dummy_code,
>&dummy_int, &dummy_vec))
>   {
> ifn = IFN_WIDEN_ABD;
> abd_out_type = mid_type;
> vectype_out = mid_vectype;
>   }
> }
> 
> The idea is to keep the assumption that we're using IFN_ABD
> until we've proven conclusively otherwise.
> 
> I think the later:
> 
>   if (!extend)
> return abd_stmt;
> 
> should then be deleted, since we should still use vect_convert_output
> if abd_out_type has a different sign from out_type.
> 
> .
> 
> And this condition would then be:
> 
>   if (TYPE_PRECISION (abd_out_type) == TYPE_PRECISION 

Re: [PATCH] Introduce hardbool attribute for C

2023-06-28 Thread Qing Zhao via Gcc-patches


> On Jun 28, 2023, at 3:26 AM, Alexandre Oliva  wrote:
> 
> I'd probably have arranged for the front-end to create the initializer
> value, because expansion time is too late to figure it out: we may not
> even have the front-end at hand any more, in case of lto compilation.
>>> 
 Is the hardbool attribute information available during the rtl expansion 
 phase?
>>> 
>>> It is in the sense that the attribute lives on, but c_hardbool_type_attr
>>> is a frontend function, it cannot be called from e.g. lto1.
>> does lookup_attribute work for this attribute during rtl expansion?
>> (Still a little confusing here)
> 
> Yes, the hardbool attribute would be there in C.
> But not in Ada.
> 
> And that should be fine, because Ada hardbool is handled entirely in the
> frontend, as are non-hardened booleans with representation clauses, that
> become enumeration types without any distinguishing feature.
> 
>>> The hardbool attribute is also implemented in Ada, but there it only
>>> affects validity checking in the front end: Boolean types in Ada are
>>> Enumeration types, and there is standard syntax to specify the
>>> representations for true and false.  AFAICT, once we translate GNAT IR
>>> to GNU IR, hardened booleans would not be recognizable as boolean types.
>>> Even non-hardened booleans with representation clauses would.
> 
>> So, right now, the GNU IR represents Ada’s boolean type as enumeration type? 
> 
> All Ada boolean types are defined by the language as enumeration types:
> 
>  There is a predefined enumeration type named Boolean, [declared in the
>  visible part of package Standard].  It has the two enumeration
>  literals False and True ordered with the relation False < True.  Any
>  descendant of the predefined type Boolean is called a boolean type.
> 
> However, boolean types without representation clauses are mapped to the
> language-independent boolean_type_node.  Those that do are mapped to
> enumeration types.

>>> So
>>> handling these differently from other enumeration types, to make them
>>> closer to booleans, would be a bit of a challenge,
> 
>> is there any special handling in GNU IR when representing Ada’s
>> boolean type as enumeration type?
>> Any issue right now?
> 
> Not that I'm aware of.  The front end takes care of converting between
> non-boolean_type_node enumeration types and boolean_type_node as needed,
> so that the GNU IR needs no extra information.

In summary, Ada’s Boolean variables (whether hardened or not) are represented as
enumeration types in GNU IR. The FE takes care of converting between
non-boolean_type_node enumeration types and boolean_type_node as needed,
with no special handling in the middle end.

So, is this exactly the same situation as the new hardbool attribute for C 
being implemented in 
this patch?

(Another question: for Ada’s Boolean variables, does the Ada FE also insert
BUILT_IN_TRAP when the value is neither true_value nor false_value?)

>>> and a
>>> backwards-compatibility issue (because such booleans have already been
>>> handled in the present way since the introduction of -ftrivial-* back in
>>> GCC12)
> 
>> With the new hardbool attribute added for C, an original bool type
>> becomes an enumeration type logically,
> 
> There's no change to the original bool type.
> 
> Only hardbool types are represented as enumeration types in C.
> 
> In Ada, boolean types with representation clauses are still represented
> as enumeration types, whether or not they're hardbool.
> 
>> But such information is not passed to middle end through GNU IR, So,
>> in GCC middle-end, We still treat such type as boolean, not an
>> enumeration type.
> 
> The middle-end doesn't know (and ATM cannot know) that those represented
> as enumeration types are conceptually booleans, so they are treated as
> enumeration types, not as booleans.
They should know it’s a boolean if they use lookup_attribute to get the
attribute info -:)
> 
>>> static hbool zeroinit; /* False, stored as (char)0x5a.  */
>>> auto hbool uninit; /* Undefined, may trap.  */
> 
>> For the hardbool variable "uninit", -ftrivial-auto-var-init=zero will
>> initialize it to zero, and it will trap during runtime.
>> And at the same time, -ftrivial-auto-var-init=pattern will initialize
>> it to 0xfe, and it will trap during runtime, too.
> 
>> I think these are good behaviors, just need to be documented. 
> 
> You mean more than what's in the patch posted last week?
No, the updated doc is good I think.

> 
>>> 
 And this is a very reasonable initial value for Boolean variables,
>>> 
>>> Agreed.  The all-zeros bit pattern is not so great for booleans that use
>>> alternate representations, though, such as the following standard Ada:
>>> 
>>> type MyBool is new Boolean;
>>> for MyBool use (16#5a#, 16#a5#);
>>> for MyBool'Size use 8;
>>> 
>>> or for biased variables such as:
>>> 
>>> X : Integer range 254 .. 507;
>>> for X'Size use 8; -- bits, so a biased representation is required.

Re: FW: [PATCH v5 0/19] Support early break/return auto-vectorization

2023-06-28 Thread 钟居哲
Hi, Tamar.

This is an amazing auto-vectorization flow.

I am thinking about whether RVV can also get benefits from this optimization.
IMHO, RVV should also be using this flow.

So, to allow RVV (a target that uses len as loop_control and mask as flow control),
I am not sure whether we can do this (Feel free to correct me if I am wrong):

+  if (LOOP_VINFO_CAN_USE_PARTIAL_VECTORS_P (loop_vinfo))
+   vect_record_loop_mask (loop_vinfo, masks, ncopies, truth_type, NULL);

Maybe it can be ?

if (LOOP_VINFO_CAN_USE_PARTIAL_VECTORS_P (loop_vinfo)) {
  if (mask_loop_p)
    vect_record_loop_mask
  else
    vect_record_loop_len
}


+  tree cond = gimple_assign_lhs (new_stmt);
+  if (masked_loop_p)
+{
+  tree mask = vect_get_loop_mask (loop_vinfo, gsi, masks, ncopies, 
truth_type, 0);
+  cond = prepare_vec_mask (loop_vinfo, TREE_TYPE (mask), mask, cond,
+  _gsi);
+}
+
+  tree t = fold_build2 (NE_EXPR, boolean_type_node, cond,
+   build_zero_cst (truth_type));

From my understanding, you are using final_mask = loop_mask (WHILE_ULT) && 
control_mask (comparison).
Then test final_mask using NE_EXPR. Am I right?

For RVV, I am thinking about whether we can have a good way to do this testing.
Not sure whether we can have something like LEN_TEST_MASK_NE (loop_len, 
control_mask...)
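My reading of that mask combination, written out as scalar pseudo-lanes (illustrative only; names are mine, not from the patch):

```c
#include <stdbool.h>

/* final_mask = loop_mask && control_mask per lane; the early-break test
   then asks whether any lane of final_mask is set, i.e. the NE_EXPR
   against the all-zeros vector.  */
bool
any_lane_set (const bool *loop_mask, const bool *control_mask, int vf)
{
  for (int i = 0; i < vf; i++)
    if (loop_mask[i] && control_mask[i])
      return true;
  return false;
}
```

A len-based target would replace loop_mask with an "i < len" bound, which is the shape a LEN_TEST_MASK_NE-style operation would have to test.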

I am not saying that we should support "early break" auto-vectorization for RVV 
(loop_len && control_mask).
I am just writing some comments trying to figure out how I can adapt your work
for RVV in the future.

Thanks.


juzhe.zh...@rivai.ai
 
From: Li, Pan2
Date: 2023-06-28 22:21
To: juzhe.zh...@rivai.ai
Subject: FW: [PATCH v5 0/19] Support early break/return auto-vectorization
FYI.
 
-Original Message-
From: Gcc-patches  On Behalf 
Of Tamar Christina via Gcc-patches
Sent: Wednesday, June 28, 2023 9:41 PM
To: gcc-patches@gcc.gnu.org
Cc: n...@arm.com; rguent...@suse.de; j...@ventanamicro.com
Subject: [PATCH v5 0/19] Support early break/return auto-vectorization
 
Hi All,
 
This patch adds initial support for early break vectorization in GCC.
The support is added for any target that implements a vector cbranch optab,
this includes both fully masked and non-masked targets.
 
Depending on the operation, the vectorizer may also require support for boolean
mask reductions using Inclusive OR.  This is however only checked then the
comparison would produce multiple statements.
 
Concretely the kind of loops supported are of the forms:
 
for (int i = 0; i < N; i++)
{
   <statements1>
   if (<condition>)
 {
   ...
   <action>;
 }
   <statements2>
}
 
where <action> can be:
- break
- return
- goto
 
Any number of statements can be used before the <action> occurs.
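Concretely, a loop of this shape (an illustrative sketch using fixed-size buffers and a known trip count, matching the limitations listed below):

```c
/* Early-break loop: statically sized arrays, exit via 'return'.
   The comparison becomes a vector compare plus cbranch after
   vectorization.  */
#define N 1024
int a[N], b[N];

int
find_mismatch (void)
{
  for (int i = 0; i < N; i++)
    if (a[i] != b[i])	/* the early-exit condition */
      return i;		/* break out before the natural latch */
  return -1;
}
```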
 
Since this is an initial version for GCC 14 it has the following limitations and
features:
 
- Only fixed sized iterations and buffers are supported.  That is to say any
  vectors loaded or stored must be to statically allocated arrays with known
  sizes. N must also be known.  This limitation is because our primary target
  for this optimization is SVE.  For VLA SVE we can't easily do cross page
  iteration checks. The result is likely to also not be beneficial. For that
  reason we punt support for variable buffers till we have First-Faulting
  support in GCC.
- any stores in <statements1> should not be to the same objects as in
  <condition>.  Loads are fine as long as they don't have the possibility to
  alias.  More concretely, we block RAW dependencies when the intermediate value
  can't be separated from the store, or the store itself can't be moved.
- The number of loop iterations must be known,  this is just a temporary
  limitation that I intend to address in GCC 14 itself as follow on patches.
- Prologue peeling, alignment peeling and loop versioning are supported.
- Fully masked loops, unmasked loops and partially masked loops are supported
- Any number of loop early exits are supported.
- The early exit must be before the natural loop exit/latch.  The vectorizer is
  designed in a way to propagate phi-nodes downwards.  As such supporting this
  inverted control flow is hard.
- No support for epilogue vectorization.  The only epilogue supported is the
  scalar final one.  Epilogue vectorization would also not be profitable.
- Early breaks are only supported for inner loop vectorization.
 
I have pushed a branch to refs/users/tnfchris/heads/gcc-14-early-break
 
With the help of IPA and LTO this still gets hit quite often.  During bootstrap
it hit rather frequently.  Additionally TSVC s332, s481 and s482 all pass now
since these are tests for support for early exit vectorization.
 
This implementation does not support completely handling the early break inside
the vector loop itself but instead supports adding checks such that if we know
that we have to exit in the current iteration then we branch to scalar code to
actually do the final VF iterations which handles all the code in <action>.
 
niters analysis and the majority of the vectorizer with hardcoded single_exit
have been updated 

Re: [testsuite] tolerate enabled but missing language frontends

2023-06-28 Thread Jeff Law via Gcc-patches




On 6/28/23 05:25, Alexandre Oliva via Gcc-patches wrote:


When a language is enabled but we run the testsuite against a tree in
which the frontend compiler is not present, help.exp fails.  It
recognizes the output pattern for a disabled language, but not a
missing frontend.  Extend the pattern so that it covers both cases.

Tested on x86_64-linux-gnu.  Ok to install?


for  gcc/testsuite/ChangeLog

* lib/options.exp (check_for_options_with_filter): Handle
missing frontend compiler like disabled language.

ok
Jeff


[committed] final+varasm: Change return type of predicate functions from int to bool

2023-06-28 Thread Uros Bizjak via Gcc-patches
Also change some internal variables to bool and change return type of
compute_alignments to void.

gcc/ChangeLog:

* output.h (leaf_function_p): Change return type from int to bool.
(final_forward_branch_p): Ditto.
(only_leaf_regs_used): Ditto.
(maybe_assemble_visibility): Ditto.
* varasm.h (supports_one_only): Ditto.
* rtl.h (compute_alignments): Change return type from int to void.
* final.cc (app_on): Change return type from int to bool.
(compute_alignments): Change return type from int to void
and adjust function body accordingly.
(shorten_branches):  Change "something_changed" variable
type from int to bool.
(leaf_function_p):  Change return type from int to bool
and adjust function body accordingly.
(final_forward_branch_p): Ditto.
(only_leaf_regs_used): Ditto.
* varasm.cc (contains_pointers_p): Change return type from
int to bool and adjust function body accordingly.
(compare_constant): Ditto.
(maybe_assemble_visibility): Ditto.
(supports_one_only): Ditto.

Bootstrapped and regression tested on x86_64-linux-gnu {,-m32}.

Uros.
diff --git a/gcc/final.cc b/gcc/final.cc
index e614491a69a..dd3e22547ac 100644
--- a/gcc/final.cc
+++ b/gcc/final.cc
@@ -163,9 +163,9 @@ static int insn_counter = 0;
 
 static int block_depth;
 
-/* Nonzero if have enabled APP processing of our assembler output.  */
+/* True if have enabled APP processing of our assembler output.  */
 
-static int app_on;
+static bool app_on;
 
 /* If we are outputting an insn sequence, this contains the sequence rtx.
Zero otherwise.  */
@@ -603,7 +603,7 @@ insn_current_reference_address (rtx_insn *branch)
 
 /* Compute branch alignments based on CFG profile.  */
 
-unsigned int
+void
 compute_alignments (void)
 {
   basic_block bb;
@@ -617,7 +617,7 @@ compute_alignments (void)
 
   /* If not optimizing or optimizing for size, don't assign any alignments.  */
   if (! optimize || optimize_function_for_size_p (cfun))
-return 0;
+return;
 
   if (dump_file)
 {
@@ -721,7 +721,6 @@ compute_alignments (void)
 
   loop_optimizer_finalize ();
   free_dominance_info (CDI_DOMINATORS);
-  return 0;
 }
 
 /* Grow the LABEL_ALIGN array after new labels are created.  */
@@ -790,7 +789,8 @@ public:
   /* opt_pass methods: */
   unsigned int execute (function *) final override
   {
-return compute_alignments ();
+compute_alignments ();
+return 0;
   }
 
 }; // class pass_compute_alignments
@@ -822,7 +822,7 @@ shorten_branches (rtx_insn *first)
   int max_uid;
   int i;
   rtx_insn *seq;
-  int something_changed = 1;
+  bool something_changed = true;
   char *varying_length;
   rtx body;
   int uid;
@@ -1103,7 +1103,7 @@ shorten_branches (rtx_insn *first)
 
   while (something_changed)
 {
-  something_changed = 0;
+  something_changed = false;
   insn_current_align = MAX_CODE_ALIGN - 1;
   for (insn_current_address = 0, insn = first;
   insn != 0;
@@ -1136,7 +1136,7 @@ shorten_branches (rtx_insn *first)
{
  log = newlog;
  LABEL_TO_ALIGNMENT (insn) = log;
- something_changed = 1;
+ something_changed = true;
}
}
}
@@ -1274,7 +1274,7 @@ shorten_branches (rtx_insn *first)
   * GET_MODE_SIZE (table->get_data_mode ()));
  insn_current_address += insn_lengths[uid];
  if (insn_lengths[uid] != old_length)
-   something_changed = 1;
+   something_changed = true;
}
 
  continue;
@@ -1332,7 +1332,7 @@ shorten_branches (rtx_insn *first)
  if (!increasing || inner_length > insn_lengths[inner_uid])
{
  insn_lengths[inner_uid] = inner_length;
- something_changed = 1;
+ something_changed = true;
}
  else
inner_length = insn_lengths[inner_uid];
@@ -1358,7 +1358,7 @@ shorten_branches (rtx_insn *first)
  && (!increasing || new_length > insn_lengths[uid]))
{
  insn_lengths[uid] = new_length;
- something_changed = 1;
+ something_changed = true;
}
  else
insn_current_address += insn_lengths[uid] - new_length;
@@ -4043,9 +4043,9 @@ asm_fprintf (FILE *file, const char *p, ...)
   va_end (argptr);
 }
 
-/* Return nonzero if this function has no function calls.  */
+/* Return true if this function has no function calls.  */
 
-int
+bool
 leaf_function_p (void)
 {
   rtx_insn *insn;
@@ -4056,29 +4056,29 @@ leaf_function_p (void)
   /* Some back-ends (e.g. s390) want leaf functions to stay leaf
  functions even if they call mcount.  */
   if (crtl->profile && 

Re: [PATCH] tree-optimization/110434 - avoid ={v} {CLOBBER} from NRV

2023-06-28 Thread Jeff Law via Gcc-patches




On 6/28/23 04:21, Richard Biener via Gcc-patches wrote:

When NRV replaces a local variable with <retval> it also replaces
occurrences in clobbers.  This leads to <retval> being clobbered
before the return of it which is strictly invalid but harmless in
practice since there's no pass after NRV which would remove
earlier stores.

The following fixes this nevertheless.

Bootstrapped and tested on x86_64-unknown-linux-gnu, OK?

Thanks,
Richard.

PR tree-optimization/110434
* tree-nrv.cc (pass_nrv::execute): Remove CLOBBERs of
VAR we replace with <retval>.

OK.
jeff


Re: [PATCH] cprop_hardreg: fix ORIGINAL_REGNO/REG_ATTRS/REG_POINTER handling

2023-06-28 Thread Philipp Tomsich
Thanks! Applied to master with the requested changes as
417b8379b32945d61f1ce3d8281bee063eea1937.
Note that the final version factors out the duplicated logic, so we
now have a single place to add the comments.

Philipp.


On Sun, 25 Jun 2023 at 06:09, Jeff Law  wrote:
>
>
>
> On 6/22/23 05:11, Philipp Tomsich wrote:
> > From: Manolis Tsamis 
> >
> > Fixes: 6a2e8dcbbd4bab3
> >
> > Propagation for the stack pointer in regcprop was enabled in
> > 6a2e8dcbbd4bab3, but set ORIGINAL_REGNO/REG_ATTRS/REG_POINTER for
> > stack_pointer_rtx which caused regression (e.g., PR 110313, PR 110308).
> >
> > This fix adds special handling for stack_pointer_rtx in the places
> > where maybe_mode_change is called. This also adds an check in
> > maybe_mode_change to return the stack pointer only when the requested
> > mode matches the mode of stack_pointer_rtx.
> >
> >   PR 110308
> Should be
> PR debug/110308
>
>
> >
> > gcc/ChangeLog:
> >
> >   * regcprop.cc (maybe_mode_change): Check stack_pointer_rtx mode.
> >   (find_oldest_value_reg): Special handling of stack_pointer_rtx.
> >   (copyprop_hardreg_forward_1): Ditto.
> >
> > gcc/testsuite/ChangeLog:
> >
> >   * g++.dg/torture/pr110308.C: New test.
> I don't doubt the need for the special handling of the stack pointer,
> but it's not obvious why it's needed.  So my request is that both humks
> which specialize handling of ORIGINAL_REGNO, REG_ATTRS & REG_POINTER
> have a comment indicating why we must not adjust those values when
> NEW_RTX is STACK_POINTER_RTX.
>
> OK with that change.
>
> Jeff


[COMMITTED, PR 110308] cprop_hardreg: fix ORIGINAL_REGNO/REG_ATTRS/REG_POINTER handling

2023-06-28 Thread Philipp Tomsich
From: Manolis Tsamis 

Fixes: 6a2e8dcbbd4bab3

Propagation for the stack pointer in regcprop was enabled in
6a2e8dcbbd4bab3, but set ORIGINAL_REGNO/REG_ATTRS/REG_POINTER for
stack_pointer_rtx which caused regression (e.g., PR 110313, PR 110308).

This fix adds special handling for stack_pointer_rtx in the places
where maybe_mode_change is called. This also adds a check in
maybe_mode_change to return the stack pointer only when the requested
mode matches the mode of stack_pointer_rtx.

PR debug/110308

gcc/ChangeLog:

* regcprop.cc (maybe_mode_change): Check stack_pointer_rtx mode.
(maybe_copy_reg_attrs): New function.
(find_oldest_value_reg): Use maybe_copy_reg_attrs.
(copyprop_hardreg_forward_1): Ditto.

gcc/testsuite/ChangeLog:

* g++.dg/torture/pr110308.C: New test.

Signed-off-by: Manolis Tsamis 
Signed-off-by: Philipp Tomsich 

---

 gcc/regcprop.cc | 52 +
 gcc/testsuite/g++.dg/torture/pr110308.C | 29 ++
 2 files changed, 65 insertions(+), 16 deletions(-)
 create mode 100644 gcc/testsuite/g++.dg/torture/pr110308.C

diff --git a/gcc/regcprop.cc b/gcc/regcprop.cc
index 6cbfadb181f..d28a4d5aca8 100644
--- a/gcc/regcprop.cc
+++ b/gcc/regcprop.cc
@@ -423,7 +423,7 @@ maybe_mode_change (machine_mode orig_mode, machine_mode 
copy_mode,
  It's unclear if we need to do the same for other special registers.  */
   if (regno == STACK_POINTER_REGNUM)
 {
-  if (orig_mode == new_mode)
+  if (orig_mode == new_mode && new_mode == GET_MODE (stack_pointer_rtx))
return stack_pointer_rtx;
   else
return NULL_RTX;
@@ -451,6 +451,31 @@ maybe_mode_change (machine_mode orig_mode, machine_mode 
copy_mode,
   return NULL_RTX;
 }
 
+/* Helper function to copy attributes when replacing OLD_REG with NEW_REG.
+   If the changes required for NEW_REG are invalid return NULL_RTX, otherwise
+   return NEW_REG.  This is intended to be used with maybe_mode_change.  */
+
+static rtx
+maybe_copy_reg_attrs (rtx new_reg, rtx old_reg)
+{
+  if (new_reg != stack_pointer_rtx)
+{
+  /* NEW_REG is assumed to be a register copy resulting from
+maybe_mode_change.  */
+  ORIGINAL_REGNO (new_reg) = ORIGINAL_REGNO (old_reg);
+  REG_ATTRS (new_reg) = REG_ATTRS (old_reg);
+  REG_POINTER (new_reg) = REG_POINTER (old_reg);
+}
+  else if (REG_POINTER (new_reg) != REG_POINTER (old_reg))
+{
+  /* Only a single instance of STACK_POINTER_RTX must exist and we cannot
+modify it.  Allow propagation if REG_POINTER for OLD_REG matches and
+don't touch ORIGINAL_REGNO and REG_ATTRS.  */
+  return NULL_RTX;
+}
+  return new_reg;
+}
+
 /* Find the oldest copy of the value contained in REGNO that is in
register class CL and has mode MODE.  If found, return an rtx
of that oldest register, otherwise return NULL.  */
@@ -486,12 +511,7 @@ find_oldest_value_reg (enum reg_class cl, rtx reg, struct 
value_data *vd)
 
   new_rtx = maybe_mode_change (oldmode, vd->e[regno].mode, mode, i, regno);
   if (new_rtx)
-   {
- ORIGINAL_REGNO (new_rtx) = ORIGINAL_REGNO (reg);
- REG_ATTRS (new_rtx) = REG_ATTRS (reg);
- REG_POINTER (new_rtx) = REG_POINTER (reg);
- return new_rtx;
-   }
+   return maybe_copy_reg_attrs (new_rtx, reg);
 }
 
   return NULL_RTX;
@@ -965,15 +985,15 @@ copyprop_hardreg_forward_1 (basic_block bb, struct 
value_data *vd)
 
  if (validate_change (insn, &SET_SRC (set), new_rtx, 0))
{
- ORIGINAL_REGNO (new_rtx) = ORIGINAL_REGNO (src);
- REG_ATTRS (new_rtx) = REG_ATTRS (src);
- REG_POINTER (new_rtx) = REG_POINTER (src);
- if (dump_file)
-   fprintf (dump_file,
-"insn %u: replaced reg %u with %u\n",
-INSN_UID (insn), regno, REGNO (new_rtx));
- changed = true;
- goto did_replacement;
+ if (maybe_copy_reg_attrs (new_rtx, src))
+   {
+ if (dump_file)
+   fprintf (dump_file,
+"insn %u: replaced reg %u with %u\n",
+INSN_UID (insn), regno, REGNO (new_rtx));
+ changed = true;
+ goto did_replacement;
+   }
}
  /* We need to re-extract as validate_change clobbers
 recog_data.  */
diff --git a/gcc/testsuite/g++.dg/torture/pr110308.C 
b/gcc/testsuite/g++.dg/torture/pr110308.C
new file mode 100644
index 000..36c6d382121
--- /dev/null
+++ b/gcc/testsuite/g++.dg/torture/pr110308.C
@@ -0,0 +1,29 @@
+/* { dg-do compile } */
+
+int channelCount, decodeBlock_outputLength;
+struct BlockCodec {
+  

[PATCH][vs] tree-optimization/110434 - avoid ={v} {CLOBBER} from NRV

2023-06-28 Thread Richard Biener via Gcc-patches
When NRV replaces a local variable with <retval> it also replaces
occurrences in clobbers.  This leads to <retval> being clobbered
before the return of it which is strictly invalid but harmless in
practice since there's no pass after NRV which would remove
earlier stores.

The following fixes this nevertheless.
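For illustration, a C++ function of the shape NRV transforms (a sketch with made-up names): the local is replaced by <retval>, so a clobber marking the end of the local's lifetime must be removed rather than rewritten, or <retval> would be clobbered before the return.

```cpp
// Named return value: 'b' is large enough that the return-by-value
// object is constructed in place, and NRV replaces 'b' with <retval>.
struct Big { int a[32]; };

Big
make_big ()
{
  Big b = {};
  b.a[0] = 42;
  return b;	// clobber of 'b' here must not become a clobber of <retval>
}
```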

Bootstrapped and tested on x86_64-unknown-linux-gnu, pushed.

PR tree-optimization/110434
* tree-nrv.cc (pass_nrv::execute): Remove CLOBBERs of
VAR we replace with <retval>.
---
 gcc/tree-nrv.cc | 12 +++-
 1 file changed, 11 insertions(+), 1 deletion(-)

diff --git a/gcc/tree-nrv.cc b/gcc/tree-nrv.cc
index ff47439647c..99c4e21a842 100644
--- a/gcc/tree-nrv.cc
+++ b/gcc/tree-nrv.cc
@@ -264,7 +264,17 @@ pass_nrv::execute (function *fun)
  data.modified = 0;
  walk_gimple_op (stmt, finalize_nrv_r, );
  if (data.modified)
-   update_stmt (stmt);
+   {
+ /* If this is a CLOBBER of VAR, remove it.  */
+ if (gimple_clobber_p (stmt))
+   {
+ unlink_stmt_vdef (stmt);
+ gsi_remove (, true);
+ release_defs (stmt);
+ continue;
+   }
+ update_stmt (stmt);
+   }
  gsi_next ();
}
}
-- 
2.35.3


RE: [PATCH 9/19] middle-end: refactor vectorizable_comparison to make the main body re-usable.

2023-06-28 Thread Tamar Christina via Gcc-patches
Adding proper maintainers.

> -Original Message-
> From: Tamar Christina 
> Sent: Wednesday, June 28, 2023 2:46 PM
> To: gcc-patches@gcc.gnu.org
> Cc: nd ; Richard Earnshaw ;
> Marcus Shawcroft ; Kyrylo Tkachov
> ; Richard Sandiford
> 
> Subject: [PATCH 9/19]AArch64 middle-end: refactor vectorizable_comparison
> to make the main body re-usable.
> 
> Hi All,
> 
> Vectorization of a gcond starts off essentially the same as vectorizing a
> comparison with the only difference being how the operands are extracted.
> 
> This refactors vectorizable_comparison such that we now have a generic
> function that can be used from vectorizable_early_break.  The refactoring
> splits the gassign checks and actual validation/codegen off to a helper
> function.
> 
> No change in functionality expected.
> 
> Bootstrapped Regtested on aarch64-none-linux-gnu and no issues.
> 
> Ok for master?
> 
> Thanks,
> Tamar
> 
> gcc/ChangeLog:
> 
>   * tree-vect-stmts.cc (vectorizable_comparison): Refactor, splitting
> body
>   to ...
>   (vectorizable_comparison_1): ...This.
> 
> --- inline copy of patch --
> diff --git a/gcc/tree-vect-stmts.cc b/gcc/tree-vect-stmts.cc index
> ae24f3e66e63d9bd9763284a47fb2c911335c4c1..f3e33cd4ed125b9564ca8
> 1acd197693fc3457c31 100644
> --- a/gcc/tree-vect-stmts.cc
> +++ b/gcc/tree-vect-stmts.cc
> @@ -11332,21 +11332,22 @@ vectorizable_condition (vec_info *vinfo,
> 
>  /* vectorizable_comparison.
> 
> -   Check if STMT_INFO is comparison expression that can be vectorized.
> +/* Helper of vectorizable_comparison.
> +
> +   Check if STMT_INFO is comparison expression CODE that can be vectorized.
> If VEC_STMT is also passed, vectorize STMT_INFO: create a vectorized
> comparison, put it in VEC_STMT, and insert it at GSI.
> 
> Return true if STMT_INFO is vectorizable in this way.  */
> 
>  static bool
> -vectorizable_comparison (vec_info *vinfo,
> -  stmt_vec_info stmt_info, gimple_stmt_iterator *gsi,
> -  gimple **vec_stmt,
> -  slp_tree slp_node, stmt_vector_for_cost *cost_vec)
> +vectorizable_comparison_1 (vec_info *vinfo, tree vectype,
> +stmt_vec_info stmt_info, tree_code code,
> +gimple_stmt_iterator *gsi, gimple **vec_stmt,
> +slp_tree slp_node, stmt_vector_for_cost *cost_vec)
>  {
>tree lhs, rhs1, rhs2;
>tree vectype1 = NULL_TREE, vectype2 = NULL_TREE;
> -  tree vectype = STMT_VINFO_VECTYPE (stmt_info);
>tree vec_rhs1 = NULL_TREE, vec_rhs2 = NULL_TREE;
>tree new_temp;
>loop_vec_info loop_vinfo = dyn_cast <loop_vec_info> (vinfo); @@ -11354,7
> +11355,7 @@ vectorizable_comparison (vec_info *vinfo,
>int ndts = 2;
>poly_uint64 nunits;
>int ncopies;
> -  enum tree_code code, bitop1 = NOP_EXPR, bitop2 = NOP_EXPR;
> +  enum tree_code bitop1 = NOP_EXPR, bitop2 = NOP_EXPR;
>int i;
>bb_vec_info bb_vinfo = dyn_cast <bb_vec_info> (vinfo);
>vec<tree> vec_oprnds0 = vNULL;
> @@ -11377,14 +11378,6 @@ vectorizable_comparison (vec_info *vinfo,
>  ncopies = vect_get_num_copies (loop_vinfo, vectype);
> 
>gcc_assert (ncopies >= 1);
> -  if (STMT_VINFO_DEF_TYPE (stmt_info) != vect_internal_def)
> -return false;
> -
> -  gassign *stmt = dyn_cast <gassign *> (stmt_info->stmt);
> -  if (!stmt)
> -return false;
> -
> -  code = gimple_assign_rhs_code (stmt);
> 
>if (TREE_CODE_CLASS (code) != tcc_comparison)
>  return false;
> @@ -11499,7 +11492,6 @@ vectorizable_comparison (vec_info *vinfo,
> return false;
>   }
> 
> -  STMT_VINFO_TYPE (stmt_info) = comparison_vec_info_type;
>vect_model_simple_cost (vinfo, stmt_info,
> ncopies * (1 + (bitop2 != NOP_EXPR)),
> dts, ndts, slp_node, cost_vec); @@ -11565,6
> +11557,44 @@ vectorizable_comparison (vec_info *vinfo,
>return true;
>  }
> 
> +/* vectorizable_comparison.
> +
> +   Check if STMT_INFO is comparison expression that can be vectorized.
> +   If VEC_STMT is also passed, vectorize STMT_INFO: create a vectorized
> +   comparison, put it in VEC_STMT, and insert it at GSI.
> +
> +   Return true if STMT_INFO is vectorizable in this way.  */
> +
> +static bool
> +vectorizable_comparison (vec_info *vinfo,
> +  stmt_vec_info stmt_info, gimple_stmt_iterator *gsi,
> +  gimple **vec_stmt,
> +  slp_tree slp_node, stmt_vector_for_cost *cost_vec) {
> +  bb_vec_info bb_vinfo = dyn_cast <bb_vec_info> (vinfo);
> +
> +  if (!STMT_VINFO_RELEVANT_P (stmt_info) && !bb_vinfo)
> +return false;
> +
> +  if (STMT_VINFO_DEF_TYPE (stmt_info) != vect_internal_def)
> +return false;
> +
> +  gassign *stmt = dyn_cast <gassign *> (stmt_info->stmt);  if (!stmt)
> +return false;
> +
> +  enum tree_code code = gimple_assign_rhs_code (stmt);
> +  tree vectype = STMT_VINFO_VECTYPE (stmt_info);
> +  if (!vectorizable_comparison_1 (vinfo, vectype, stmt_info, code, gsi,
> +   

RE: [PATCH 3/19]middle-end clean up vect testsuite using pragma novector

2023-06-28 Thread Tamar Christina via Gcc-patches
Resending attached only due to size limit

> -Original Message-
> From: Tamar Christina
> Sent: Wednesday, June 28, 2023 2:42 PM
> To: gcc-patches@gcc.gnu.org
> Cc: nd ; rguent...@suse.de; j...@ventanamicro.com
> Subject: [PATCH 3/19]middle-end clean up vect testsuite using pragma
> novector
> 
> Hi All,
> 
> The support for early break vectorization breaks lots of scan vect and slp
> testcases because they assume that loops with abort () in them cannot be
> vectorized.  Additionally it defeats the point of having a scalar loop to check
> the output of the vectorizer if that loop is also vectorized.
> 
> For that reason this adds
> 
> #pragma GCC novector to all tests which have a scalar loop that we would
> have
> vectorized using this patch series.
> 
> FWIW, none of these tests were failing to vectorize or run before the pragma.
> The tests that did point to some issues were copied to the early break test
> suite as well.
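For reference, the effect of the pragma can be sketched in plain C.  This is an illustrative example, not one of the actual testcases from the patch; `check` is a hypothetical name for the reference loop that compares the vectorized result against expected values and must stay scalar:

```c
#include <assert.h>

/* Illustrative sketch: the result-checking loop is annotated with
   "#pragma GCC novector" so that it is not vectorized itself and thus
   remains a valid scalar reference for the vectorized computation.
   Compilers without the pragma simply warn and ignore it.  */
#define N 16
int a[N], b[N];

int check (void)
{
#pragma GCC novector
  for (int i = 0; i < N; i++)
    if (a[i] != b[i])
      return 0;
  return 1;
}
```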
> 
> Bootstrapped Regtested on aarch64-none-linux-gnu and no issues.
> 
> Ok for master?
> 
> Thanks,
> Tamar
> 
> gcc/testsuite/ChangeLog:
> 
>   * g++.dg/vect/pr84556.cc: Add novector pragma.
>   * g++.dg/vect/simd-1.cc: Add novector pragma.
>   * g++.dg/vect/simd-2.cc: Add novector pragma.
>   * g++.dg/vect/simd-3.cc: Add novector pragma.
>   * g++.dg/vect/simd-4.cc: Add novector pragma.
>   * g++.dg/vect/simd-5.cc: Add novector pragma.
>   * g++.dg/vect/simd-6.cc: Add novector pragma.
>   * g++.dg/vect/simd-7.cc: Add novector pragma.
>   * g++.dg/vect/simd-8.cc: Add novector pragma.
>   * g++.dg/vect/simd-9.cc: Add novector pragma.
>   * g++.dg/vect/simd-clone-6.cc: Add novector pragma.
>   * gcc.dg/vect/O3-pr70130.c: Add novector pragma.
>   * gcc.dg/vect/Os-vect-95.c: Add novector pragma.
>   * gcc.dg/vect/bb-slp-1.c: Add novector pragma.
>   * gcc.dg/vect/bb-slp-16.c: Add novector pragma.
>   * gcc.dg/vect/bb-slp-2.c: Add novector pragma.
>   * gcc.dg/vect/bb-slp-24.c: Add novector pragma.
>   * gcc.dg/vect/bb-slp-25.c: Add novector pragma.
>   * gcc.dg/vect/bb-slp-26.c: Add novector pragma.
>   * gcc.dg/vect/bb-slp-27.c: Add novector pragma.
>   * gcc.dg/vect/bb-slp-28.c: Add novector pragma.
>   * gcc.dg/vect/bb-slp-29.c: Add novector pragma.
>   * gcc.dg/vect/bb-slp-42.c: Add novector pragma.
>   * gcc.dg/vect/bb-slp-cond-1.c: Add novector pragma.
>   * gcc.dg/vect/bb-slp-over-widen-1.c: Add novector pragma.
>   * gcc.dg/vect/bb-slp-over-widen-2.c: Add novector pragma.
>   * gcc.dg/vect/bb-slp-pattern-1.c: Add novector pragma.
>   * gcc.dg/vect/bb-slp-pattern-2.c: Add novector pragma.
>   * gcc.dg/vect/bb-slp-pow-1.c: Add novector pragma.
>   * gcc.dg/vect/bb-slp-pr101615-2.c: Add novector pragma.
>   * gcc.dg/vect/bb-slp-pr65935.c: Add novector pragma.
>   * gcc.dg/vect/bb-slp-subgroups-1.c: Add novector pragma.
>   * gcc.dg/vect/costmodel/i386/costmodel-vect-31.c: Add novector
> pragma.
>   * gcc.dg/vect/costmodel/i386/costmodel-vect-33.c: Add novector
> pragma.
>   * gcc.dg/vect/costmodel/i386/costmodel-vect-68.c: Add novector
> pragma.
>   * gcc.dg/vect/costmodel/ppc/costmodel-slp-12.c: Add novector
> pragma.
>   * gcc.dg/vect/costmodel/ppc/costmodel-slp-33.c: Add novector
> pragma.
>   * gcc.dg/vect/costmodel/ppc/costmodel-slp-34.c: Add novector
> pragma.
>   * gcc.dg/vect/costmodel/ppc/costmodel-vect-31a.c: Add novector
> pragma.
>   * gcc.dg/vect/costmodel/ppc/costmodel-vect-31b.c: Add novector
> pragma.
>   * gcc.dg/vect/costmodel/ppc/costmodel-vect-31c.c: Add novector
> pragma.
>   * gcc.dg/vect/costmodel/ppc/costmodel-vect-33.c: Add novector
> pragma.
>   * gcc.dg/vect/costmodel/ppc/costmodel-vect-68a.c: Add novector
> pragma.
>   * gcc.dg/vect/costmodel/ppc/costmodel-vect-68b.c: Add novector
> pragma.
>   * gcc.dg/vect/costmodel/ppc/costmodel-vect-68c.c: Add novector
> pragma.
>   * gcc.dg/vect/costmodel/ppc/costmodel-vect-76a.c: Add novector
> pragma.
>   * gcc.dg/vect/costmodel/ppc/costmodel-vect-76b.c: Add novector
> pragma.
>   * gcc.dg/vect/costmodel/ppc/costmodel-vect-76c.c: Add novector
> pragma.
>   * gcc.dg/vect/costmodel/ppc/costmodel-vect-outer-fir.c: Add
> novector pragma.
>   * gcc.dg/vect/costmodel/x86_64/costmodel-vect-31.c: Add novector
> pragma.
>   * gcc.dg/vect/costmodel/x86_64/costmodel-vect-33.c: Add novector
> pragma.
>   * gcc.dg/vect/costmodel/x86_64/costmodel-vect-68.c: Add novector
> pragma.
>   * gcc.dg/vect/fast-math-bb-slp-call-1.c: Add novector pragma.
>   * gcc.dg/vect/fast-math-bb-slp-call-2.c: Add novector pragma.
>   * gcc.dg/vect/fast-math-vect-call-1.c: Add novector pragma.
>   * gcc.dg/vect/fast-math-vect-call-2.c: Add novector pragma.
>   * gcc.dg/vect/fast-math-vect-complex-3.c: Add novector pragma.
>   * gcc.dg/vect/if-cvt-stores-vect-ifcvt-18.c: Add novector pragma.
>   * 

[PATCH 16/19]AArch64 Add optimization for vector != cbranch fed into compare with 0 for Advanced SIMD

2023-06-28 Thread Tamar Christina via Gcc-patches
Hi All,

Advanced SIMD lacks a vector != (compare not-equal) instruction, and unlike a
comparison against 0 we can't rewrite it to a cmtst.

This operation is however fairly common, especially now that we support early
break vectorization.

As such this adds a pattern to recognize the negated "any" comparison and
transform it to an "all", i.e. any(~x) => all(x), and invert the branches.
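The identity being exploited can be sketched in plain C.  This is a hypothetical scalar model, not code from the patch; lane values are the 0/all-ones masks that cmeq produces, the max reduction models umaxp, and the min reduction models uminp:

```c
#include <assert.h>
#include <stdint.h>

/* any(~m): some lane of the inverted mask is set -- modelled with a
   max reduction (umaxp on ~m, then cbnz).  */
static int any_inverted (const uint32_t m[4])
{
  uint32_t max = 0;
  for (int i = 0; i < 4; i++)
    if (~m[i] > max)
      max = ~m[i];
  return max != 0;
}

/* !all(m): some lane of the mask is clear -- modelled with a min
   reduction (uminp on m, then cbz, i.e. branch when the min is 0).  */
static int not_all (const uint32_t m[4])
{
  uint32_t min = ~0u;
  for (int i = 0; i < 4; i++)
    if (m[i] < min)
      min = m[i];
  return min == 0;
}
```

The two predicates agree on every mask, which is why the `not` and `umaxp` pair can be replaced with a single `uminp` plus an inverted branch.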

For e.g.

void f1 (int x)
{
  for (int i = 0; i < N; i++)
{
  b[i] += a[i];
  if (a[i] != x)
break;
}
}

We currently generate:

cmeqv31.4s, v30.4s, v29.4s
not v31.16b, v31.16b
umaxp   v31.4s, v31.4s, v31.4s
fmovx5, d31
cbnzx5, .L2

and after this patch:

cmeqv31.4s, v30.4s, v29.4s
uminp   v31.4s, v31.4s, v31.4s
fmovx5, d31
cbz x5, .L2

Bootstrapped Regtested on aarch64-none-linux-gnu and no issues.

Ok for master?

Thanks,
Tamar

gcc/ChangeLog:

* config/aarch64/aarch64-simd.md (*cbranchnev4si): New.

gcc/testsuite/ChangeLog:

* gcc.target/aarch64/vect-early-break-cbranch_2.c: New test.

--- inline copy of patch -- 
diff --git a/gcc/config/aarch64/aarch64-simd.md 
b/gcc/config/aarch64/aarch64-simd.md
index 
cd5ec35c3f53028f14828bd70a92924f62524c15..b1a2c617d7d4106ab725d53a5d0b5c2fb61a0c78
 100644
--- a/gcc/config/aarch64/aarch64-simd.md
+++ b/gcc/config/aarch64/aarch64-simd.md
@@ -3870,6 +3870,37 @@ (define_expand "cbranch4"
   DONE;
 })
 
+;; Advanced SIMD lacks a vector != comparison, but this is quite a common
+;; operation.  To not pay the penalty for inverting == we can map our any
+;; comparisons to all i.e. any(~x) => all(x).
+(define_insn_and_split "*cbranchnev4si"
+  [(set (pc)
+(if_then_else
+  (ne (subreg:DI
+   (unspec:V4SI
+ [(not:V4SI (match_operand:V4SI 0 "register_operand" "w"))
+  (not:V4SI (match_dup 0))]
+   UNSPEC_UMAXV) 0)
+  (const_int 0))
+   (label_ref (match_operand 1 ""))
+   (pc)))
+(clobber (match_scratch:DI 2 "=w"))]
+  "TARGET_SIMD"
+  "#"
+  "&& true"
+  [(set (match_dup 2)
+   (unspec:V4SI [(match_dup 0) (match_dup 0)] UNSPEC_UMINV))
+   (set (pc)
+(if_then_else
+  (eq (subreg:DI (match_dup 2) 0)
+ (const_int 0))
+   (label_ref (match_dup 1))
+   (pc)))]
+{
+  if (can_create_pseudo_p ())
+operands[2] = gen_reg_rtx (V4SImode);
+})
+
 ;; Patterns comparing two vectors to produce a mask.
 
 (define_expand "vec_cmp"
diff --git a/gcc/testsuite/gcc.target/aarch64/vect-early-break-cbranch_2.c 
b/gcc/testsuite/gcc.target/aarch64/vect-early-break-cbranch_2.c
new file mode 100644
index 
..e81027bb50138be627f4dfdffb1557893a5a7723
--- /dev/null
+++ b/gcc/testsuite/gcc.target/aarch64/vect-early-break-cbranch_2.c
@@ -0,0 +1,29 @@
+/* { dg-do compile } */
+/* { dg-options "-O3" } */
+/* { dg-final { check-function-bodies "**" "" "" { target lp64 } } } */
+
+#pragma GCC target "+nosve"
+
+#define N 640
+int a[N] = {0};
+int b[N] = {0};
+
+
+/*
+** f1:
+** ...
+   cmeqv[0-9]+.4s, v[0-9]+.4s, v[0-9]+.4s
+   uminp   v[0-9]+.4s, v[0-9]+.4s, v[0-9]+.4s
+   fmovx[0-9]+, d[0-9]+
+   cbz x[0-9]+, \.L[0-9]+
+** ...
+*/
+void f1 (int x)
+{
+  for (int i = 0; i < N; i++)
+{
+  b[i] += a[i];
+  if (a[i] != x)
+   break;
+}
+}




-- 
[PATCH 15/19]AArch64: Add implementation for vector cbranch for Advanced SIMD

2023-06-28 Thread Tamar Christina via Gcc-patches
Hi All,

This adds an implementation for conditional branch optab for AArch64.

For e.g.

void f1 ()
{
  for (int i = 0; i < N; i++)
{
  b[i] += a[i];
  if (a[i] > 0)
break;
}
}

For 128-bit vectors we generate:

cmgtv1.4s, v1.4s, #0
umaxp   v1.4s, v1.4s, v1.4s
fmovx3, d1
cbnzx3, .L8

and for 64-bit vectors we can omit the compression:

cmgtv1.2s, v1.2s, #0
fmovx2, d1
cbz x2, .L13
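What the single pairwise reduction buys in the 128-bit case can be sketched with a hypothetical scalar model (not code from the patch); `reduce_mask` models `umaxp v, v, v` followed by `fmov x, d`:

```c
#include <assert.h>
#include <stdint.h>

/* UMAXP folds adjacent lanes pairwise, so after one 'umaxp v, v, v'
   the low 64 bits of the vector already cover all four 32-bit lanes
   of the comparison mask; a single scalar compare then decides the
   branch.  */
static uint64_t reduce_mask (const uint32_t m[4])
{
  uint32_t p0 = m[0] > m[1] ? m[0] : m[1];   /* pairwise max, lanes 0-1 */
  uint32_t p1 = m[2] > m[3] ? m[2] : m[3];   /* pairwise max, lanes 2-3 */
  return ((uint64_t) p1 << 32) | p0;         /* fmov x, d: low 64 bits  */
}
```

The branch is taken iff `reduce_mask` is nonzero (cbnz), i.e. iff any lane of the original mask was set.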

Bootstrapped Regtested on aarch64-none-linux-gnu and no issues.

Ok for master?

Thanks,
Tamar

gcc/ChangeLog:

* config/aarch64/aarch64-simd.md (cbranch4): New.

gcc/testsuite/ChangeLog:

* gcc.target/aarch64/vect-early-break-cbranch.c: New test.

--- inline copy of patch -- 
diff --git a/gcc/config/aarch64/aarch64-simd.md 
b/gcc/config/aarch64/aarch64-simd.md
index 
90118c6348e9614bef580d1dc94c0c1841dd5204..cd5ec35c3f53028f14828bd70a92924f62524c15
 100644
--- a/gcc/config/aarch64/aarch64-simd.md
+++ b/gcc/config/aarch64/aarch64-simd.md
@@ -3830,6 +3830,46 @@ (define_expand "vcond_mask_"
   DONE;
 })
 
+;; Patterns comparing two vectors and conditionally jump
+
+(define_expand "cbranch4"
+  [(set (pc)
+(if_then_else
+  (match_operator 0 "aarch64_equality_operator"
+[(match_operand:VDQ_I 1 "register_operand")
+ (match_operand:VDQ_I 2 "aarch64_simd_reg_or_zero")])
+  (label_ref (match_operand 3 ""))
+  (pc)))]
+  "TARGET_SIMD"
+{
+  auto code = GET_CODE (operands[0]);
+  rtx tmp = operands[1];
+
+  /* If comparing against a non-zero vector we have to do a comparison first
+ so we can have a != 0 comparison with the result.  */
+  if (operands[2] != CONST0_RTX (mode))
+emit_insn (gen_vec_cmp (tmp, operands[0], operands[1],
+   operands[2]));
+
+  /* For 64-bit vectors we need no reductions.  */
+  if (known_eq (128, GET_MODE_BITSIZE (mode)))
+{
+  /* Always reduce using a V4SI.  */
+  rtx reduc = gen_lowpart (V4SImode, tmp);
+  rtx res = gen_reg_rtx (V4SImode);
+  emit_insn (gen_aarch64_umaxpv4si (res, reduc, reduc));
+  emit_move_insn (tmp, gen_lowpart (mode, res));
+}
+
+  rtx val = gen_reg_rtx (DImode);
+  emit_move_insn (val, gen_lowpart (DImode, tmp));
+
+  rtx cc_reg = aarch64_gen_compare_reg (code, val, const0_rtx);
+  rtx cmp_rtx = gen_rtx_fmt_ee (code, DImode, cc_reg, const0_rtx);
+  emit_jump_insn (gen_condjump (cmp_rtx, cc_reg, operands[3]));
+  DONE;
+})
+
 ;; Patterns comparing two vectors to produce a mask.
 
 (define_expand "vec_cmp"
diff --git a/gcc/testsuite/gcc.target/aarch64/vect-early-break-cbranch.c 
b/gcc/testsuite/gcc.target/aarch64/vect-early-break-cbranch.c
new file mode 100644
index 
..c0363c3787270507d7902bb2ac0e39faef63a852
--- /dev/null
+++ b/gcc/testsuite/gcc.target/aarch64/vect-early-break-cbranch.c
@@ -0,0 +1,124 @@
+/* { dg-do compile } */
+/* { dg-options "-O3" } */
+/* { dg-final { check-function-bodies "**" "" "" { target lp64 } } } */
+
+#pragma GCC target "+nosve"
+
+#define N 640
+int a[N] = {0};
+int b[N] = {0};
+
+
+/*
+** f1:
+** ...
+** cmgtv[0-9]+.4s, v[0-9]+.4s, #0
+** umaxp   v[0-9]+.4s, v[0-9]+.4s, v[0-9]+.4s
+** fmovx[0-9]+, d[0-9]+
+** cbnzx[0-9]+, \.L[0-9]+
+** ...
+*/
+void f1 ()
+{
+  for (int i = 0; i < N; i++)
+{
+  b[i] += a[i];
+  if (a[i] > 0)
+   break;
+}
+}
+
+/*
+** f2:
+** ...
+** cmgev[0-9]+.4s, v[0-9]+.4s, #0
+** umaxp   v[0-9]+.4s, v[0-9]+.4s, v[0-9]+.4s
+** fmovx[0-9]+, d[0-9]+
+** cbnzx[0-9]+, \.L[0-9]+
+** ...
+*/
+void f2 ()
+{
+  for (int i = 0; i < N; i++)
+{
+  b[i] += a[i];
+  if (a[i] >= 0)
+   break;
+}
+}
+
+/*
+** f3:
+** ...
+** cmeqv[0-9]+.4s, v[0-9]+.4s, #0
+** umaxp   v[0-9]+.4s, v[0-9]+.4s, v[0-9]+.4s
+** fmovx[0-9]+, d[0-9]+
+** cbnzx[0-9]+, \.L[0-9]+
+** ...
+*/
+void f3 ()
+{
+  for (int i = 0; i < N; i++)
+{
+  b[i] += a[i];
+  if (a[i] == 0)
+   break;
+}
+}
+
+/*
+** f4:
+** ...
+** cmtst   v[0-9]+.4s, v[0-9]+.4s, v[0-9]+.4s
+** umaxp   v[0-9]+.4s, v[0-9]+.4s, v[0-9]+.4s
+** fmovx[0-9]+, d[0-9]+
+** cbnzx[0-9]+, \.L[0-9]+
+** ...
+*/
+void f4 ()
+{
+  for (int i = 0; i < N; i++)
+{
+  b[i] += a[i];
+  if (a[i] != 0)
+   break;
+}
+}
+
+/*
+** f5:
+** ...
+** cmltv[0-9]+.4s, v[0-9]+.4s, #0
+** umaxp   v[0-9]+.4s, v[0-9]+.4s, v[0-9]+.4s
+** fmovx[0-9]+, d[0-9]+
+** cbnzx[0-9]+, \.L[0-9]+
+** ...
+*/
+void f5 ()
+{
+  for (int i = 0; i < N; i++)
+{
+  b[i] += a[i];
+  if (a[i] < 0)
+   break;
+}
+}
+
+/*
+** f6:
+** ...
+** cmlev[0-9]+.4s, v[0-9]+.4s, #0
+** umaxp   v[0-9]+.4s, v[0-9]+.4s, v[0-9]+.4s
+** fmovx[0-9]+, d[0-9]+
+** cbnzx[0-9]+, \.L[0-9]+
+** ...

[PATCH 18/19]Arm: Add Advanced SIMD cbranch implementation

2023-06-28 Thread Tamar Christina via Gcc-patches
Hi All,

This adds an implementation for conditional branch optab for AArch32.

For e.g.

void f1 ()
{
  for (int i = 0; i < N; i++)
{
  b[i] += a[i];
  if (a[i] > 0)
break;
}
}

For 128-bit vectors we generate:

vcgt.s32q8, q9, #0
vpmax.u32   d7, d16, d17
vpmax.u32   d7, d7, d7
vmovr3, s14 @ int
cmp r3, #0

and for 64-bit vectors we can omit one vpmax, as we still need to compress to
32 bits.
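The AArch32 reduction chain can be sketched with a hypothetical scalar model (not code from the patch).  `VPMAX.U32 d, a, b` produces `{max(a0,a1), max(b0,b1)}`, so a V4SI mask is first split into halves and then needs two pairwise steps, while a V2SI mask needs only the last one:

```c
#include <assert.h>
#include <stdint.h>

static uint32_t umax (uint32_t a, uint32_t b) { return a > b ? a : b; }

/* Model of the 128-bit case: vget_low/vget_high split the V4SI mask,
   the first vpmax combines the halves pairwise, and the second vpmax
   compresses the result to a single 32-bit lane for the scalar cmp.  */
static uint32_t reduce_v4si (const uint32_t m[4])
{
  uint32_t lane0 = umax (m[0], m[1]);   /* vpmax (low, high), lane 0 */
  uint32_t lane1 = umax (m[2], m[3]);   /* vpmax (low, high), lane 1 */
  return umax (lane0, lane1);           /* second vpmax (mask, mask) */
}
```

The result is nonzero exactly when some lane of the mask was set, which is what the final `cmp r, #0` tests.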

Bootstrapped Regtested on arm-none-linux-gnueabihf and no issues.

Ok for master?

Thanks,
Tamar

gcc/ChangeLog:

* config/arm/neon.md (cbranch4): New.

gcc/testsuite/ChangeLog:

* lib/target-supports.exp (vect_early_break): Add AArch32.
* gcc.target/arm/vect-early-break-cbranch.c: New test.

--- inline copy of patch -- 
diff --git a/gcc/config/arm/neon.md b/gcc/config/arm/neon.md
index 
d213369ffc38fb88ad0357d848cc7da5af73bab7..130efbc37cfe3128533599dfadc344d2243dcb63
 100644
--- a/gcc/config/arm/neon.md
+++ b/gcc/config/arm/neon.md
@@ -408,6 +408,45 @@ (define_insn "vec_extract"
   [(set_attr "type" "neon_store1_one_lane,neon_to_gp")]
 )
 
+;; Patterns comparing two vectors and conditionally jump.
+;; Advanced SIMD lacks a vector != comparison, but this is a quite common
+;; operation.  To not pay the penalty for inverting == we can map our any
+;; comparisons to all i.e. any(~x) => all(x).
+;;
+;; However unlike the AArch64 version, we can't optimize this further as the
+;; chain is too long for combine due to these being unspecs so it doesn't fold
+;; the operation to something simpler.
+(define_expand "cbranch4"
+  [(set (pc) (if_then_else
+ (match_operator 0 "expandable_comparison_operator"
+  [(match_operand:VDQI 1 "register_operand")
+   (match_operand:VDQI 2 "zero_operand")])
+ (label_ref (match_operand 3 "" ""))
+ (pc)))]
+  "TARGET_NEON"
+{
+  rtx mask = operands[1];
+
+  /* For 128-bit vectors we need an additional reduction.  */
+  if (known_eq (128, GET_MODE_BITSIZE (mode)))
+{
+  /* Always reduce using a V2SI.  */
+  mask = gen_reg_rtx (V2SImode);
+  rtx low = gen_reg_rtx (V2SImode);
+  rtx high = gen_reg_rtx (V2SImode);
+  emit_insn (gen_neon_vget_lowv4si (low, operands[1]));
+  emit_insn (gen_neon_vget_highv4si (high, operands[1]));
+  emit_insn (gen_neon_vpumaxv2si (mask, low, high));
+}
+
+  emit_insn (gen_neon_vpumaxv2si (mask, mask, mask));
+
+  rtx val = gen_reg_rtx (SImode);
+  emit_move_insn (val, gen_lowpart (SImode, mask));
+  emit_jump_insn (gen_cbranch_cc (operands[0], val, const0_rtx, operands[3]));
+  DONE;
+})
+
 ;; This pattern is renamed from "vec_extract" to
 ;; "neon_vec_extract" and this pattern is called
 ;; by define_expand in vec-common.md file.
diff --git a/gcc/testsuite/gcc.target/arm/vect-early-break-cbranch.c 
b/gcc/testsuite/gcc.target/arm/vect-early-break-cbranch.c
new file mode 100644
index 
..2c05aa10d26ed4ac9785672e6e3b4355cef046dc
--- /dev/null
+++ b/gcc/testsuite/gcc.target/arm/vect-early-break-cbranch.c
@@ -0,0 +1,136 @@
+/* { dg-do compile } */
+/* { dg-require-effective-target arm_neon_ok } */
+/* { dg-require-effective-target arm32 } */
+/* { dg-options "-O3 -march=armv8-a+simd -mfpu=auto -mfloat-abi=hard" } */
+/* { dg-final { check-function-bodies "**" "" "" } } */
+
+#define N 640
+int a[N] = {0};
+int b[N] = {0};
+
+/* f1:
+** ...
+** vcgt.s32q[0-9]+, q[0-9]+, #0
+** vpmax.u32   d[0-9]+, d[0-9]+, d[0-9]+
+** vpmax.u32   d[0-9]+, d[0-9]+, d[0-9]+
+** vmovr[0-9]+, s[0-9]+@ int
+** cmp r[0-9]+, #0
+** bne \.L[0-9]+
+** ...
+*/
+void f1 ()
+{
+  for (int i = 0; i < N; i++)
+{
+  b[i] += a[i];
+  if (a[i] > 0)
+   break;
+}
+}
+
+/*
+** f2:
+** ...
+** vcge.s32q[0-9]+, q[0-9]+, #0
+** vpmax.u32   d[0-9]+, d[0-9]+, d[0-9]+
+** vpmax.u32   d[0-9]+, d[0-9]+, d[0-9]+
+** vmovr[0-9]+, s[0-9]+@ int
+** cmp r[0-9]+, #0
+** bne \.L[0-9]+
+** ...
+*/
+void f2 ()
+{
+  for (int i = 0; i < N; i++)
+{
+  b[i] += a[i];
+  if (a[i] >= 0)
+   break;
+}
+}
+
+/*
+** f3:
+** ...
+** vceq.i32q[0-9]+, q[0-9]+, #0
+** vpmax.u32   d[0-9]+, d[0-9]+, d[0-9]+
+** vpmax.u32   d[0-9]+, d[0-9]+, d[0-9]+
+** vmovr[0-9]+, s[0-9]+@ int
+** cmp r[0-9]+, #0
+** bne \.L[0-9]+
+** ...
+*/
+void f3 ()
+{
+  for (int i = 0; i < N; i++)
+{
+  b[i] += a[i];
+  if (a[i] == 0)
+   break;
+}
+}
+
+/*
+** f4:
+** ...
+** vceq.i32q[0-9]+, q[0-9]+, #0
+** vmvnq[0-9]+, q[0-9]+
+** vpmax.u32   d[0-9]+, d[0-9]+, d[0-9]+
+** vpmax.u32   d[0-9]+, d[0-9]+, d[0-9]+
+** vmovr[0-9]+, s[0-9]+@ int
+** cmp r[0-9]+, #0
+** bne \.L[0-9]+
+** 

[PATCH 19/19]Arm: Add MVE cbranch implementation

2023-06-28 Thread Tamar Christina via Gcc-patches
Hi All,

This adds an implementation for conditional branch optab for MVE.

Unfortunately MVE has rather limited operations on VPT.P0; we are missing the
ability to do P0 comparisons and logical OR on P0.

For that reason we can only support cbranch against 0: comparing with a zero
predicate requires no actual comparison, we only have to check whether any bit
is set within P0.

Because we can only compare P0 with 0, the costing of the comparison was
reduced so that the compiler does not try to force the 0 into a register on
the assumption that the comparison is too expensive.  For the cbranch
implementation to be safe we must see the constant 0 vector.

The lack of logical OR on P0 we can't really work around.  This means MVE
can't support cases where the operand sizes in the comparison don't match,
i.e. when one operand has been unpacked.

For e.g.

void f1 ()
{
  for (int i = 0; i < N; i++)
{
  b[i] += a[i];
  if (a[i] > 0)
break;
}
}

For 128-bit vectors we generate:

vcmp.s32gt, q3, q1
vmrsr3, p0  @ movhi
cbnzr3, .L2

MVE does not have 64-bit vector comparisons, so those are not supported either.
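Why a plain scalar compare suffices can be sketched with a hypothetical C model of VPR.P0 (not code from the patch):

```c
#include <assert.h>
#include <stdint.h>

/* VPR.P0 is a lane predicate held in a 16-bit field.  'vmrs r, p0'
   moves it to a core register, so the only comparison we need when the
   other operand is the zero predicate -- "is any predicate bit set?" --
   reduces to a scalar compare against 0 (cbnz/cbz).  */
static int any_lane_active (uint16_t p0)
{
  return p0 != 0;
}
```

A predicate-vs-predicate compare is never needed for this case, which is exactly why only the constant-zero form of cbranch is supported.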

Bootstrapped arm-none-linux-gnueabihf and regtested with
-march=armv8.1-m.main+mve -mfpu=auto and no issues.

Ok for master?

Thanks,
Tamar

gcc/ChangeLog:

* config/arm/arm.cc (arm_rtx_costs_internal): Update costs for pred 0
compares.
* config/arm/mve.md (cbranch4): New.

gcc/testsuite/ChangeLog:

* lib/target-supports.exp (vect_early_break): Add MVE.
* gcc.target/arm/mve/vect-early-break-cbranch.c: New test.

--- inline copy of patch -- 
diff --git a/gcc/config/arm/arm.cc b/gcc/config/arm/arm.cc
index 
38f0839de1c75547c259ac3d655fcfc14e7208a2..15e65c15cb3cb6f70161787e84b255a24eb51e32
 100644
--- a/gcc/config/arm/arm.cc
+++ b/gcc/config/arm/arm.cc
@@ -11883,6 +11883,15 @@ arm_rtx_costs_internal (rtx x, enum rtx_code code, 
enum rtx_code outer_code,
   || TARGET_HAVE_MVE)
  && simd_immediate_valid_for_move (x, mode, NULL, NULL))
*cost = COSTS_N_INSNS (1);
+  else if (TARGET_HAVE_MVE
+  && outer_code == COMPARE
+  && VALID_MVE_PRED_MODE (mode))
+   /* MVE allows very limited instructions on VPT.P0,  however comparisons
+  to 0 do not require us to materialize this constant or require a
+  predicate comparison as we can go through SImode.  For that reason
+  allow P0 CMP 0 as a cheap operation such that the 0 isn't forced to
+  registers as we can't compare two predicates.  */
+   *cost = COSTS_N_INSNS (1);
   else
*cost = COSTS_N_INSNS (4);
   return true;
diff --git a/gcc/config/arm/mve.md b/gcc/config/arm/mve.md
index 
74909ce47e132c22a94f7d9cd3a0921b38e33051..95d40770ecc25f9eb251eba38306dd43cbebfb3f
 100644
--- a/gcc/config/arm/mve.md
+++ b/gcc/config/arm/mve.md
@@ -6880,6 +6880,21 @@ (define_expand "vcond_mask_"
   DONE;
 })
 
+(define_expand "cbranch4"
+  [(set (pc) (if_then_else
+ (match_operator 0 "expandable_comparison_operator"
+  [(match_operand:MVE_7 1 "register_operand")
+   (match_operand:MVE_7 2 "zero_operand")])
+ (label_ref (match_operand 3 "" ""))
+ (pc)))]
+  "TARGET_HAVE_MVE"
+{
+  rtx val = gen_reg_rtx (SImode);
+  emit_move_insn (val, gen_lowpart (SImode, operands[1]));
+  emit_jump_insn (gen_cbranchsi4 (operands[0], val, const0_rtx, operands[3]));
+  DONE;
+})
+
 ;; Reinterpret operand 1 in operand 0's mode, without changing its contents.
 (define_expand "@arm_mve_reinterpret"
   [(set (match_operand:MVE_vecs 0 "register_operand")
diff --git a/gcc/testsuite/gcc.target/arm/mve/vect-early-break-cbranch.c 
b/gcc/testsuite/gcc.target/arm/mve/vect-early-break-cbranch.c
new file mode 100644
index 
..c3b8506dca0b2b044e6869a6c8259d663c1ff930
--- /dev/null
+++ b/gcc/testsuite/gcc.target/arm/mve/vect-early-break-cbranch.c
@@ -0,0 +1,117 @@
+/* { dg-do compile } */
+/* { dg-require-effective-target arm_v8_1m_mve_ok } */
+/* { dg-add-options arm_v8_1m_mve } */
+/* { dg-options "-O3" } */
+/* { dg-final { check-function-bodies "**" "" "" } } */
+
+#define N 640
+int a[N] = {0};
+int b[N] = {0};
+
+/*
+** f1:
+** ...
+** vcmp.s32gt, q[0-9]+, q[0-9]+
+** vmrsr[0-9]+, p0 @ movhi
+** cbnzr[0-9]+, \.L[0-9]+
+** ...
+*/
+void f1 ()
+{
+  for (int i = 0; i < N; i++)
+{
+  b[i] += a[i];
+  if (a[i] > 0)
+   break;
+}
+}
+
+/*
+** f2:
+** ...
+** vcmp.s32ge, q[0-9]+, q[0-9]+
+** vmrsr[0-9]+, p0 @ movhi
+** cbnzr[0-9]+, \.L[0-9]+
+** ...
+*/
+void f2 ()
+{
+  for (int i = 0; i < N; i++)
+{
+  b[i] += a[i];
+  if (a[i] >= 0)
+   break;
+}
+}
+
+/*
+** f3:
+** ...
+** vcmp.i32eq, q[0-9]+, q[0-9]+
+** vmrsr[0-9]+, p0 @ movhi
+** cbnzr[0-9]+, \.L[0-9]+
+** 

[PATCH 17/19]AArch64 Add optimization for vector cbranch combining SVE and Advanced SIMD

2023-06-28 Thread Tamar Christina via Gcc-patches
Hi All,

Advanced SIMD lacks flag-setting vector comparisons, which SVE adds.  Since
machines with SVE also support Advanced SIMD, we can use the SVE comparisons
to perform the operation in cases where SVE codegen is allowed, but the
vectorizer has decided to generate Advanced SIMD because of loop costing.

e.g. for

void f1 (int x)
{
  for (int i = 0; i < N; i++)
{
  b[i] += a[i];
  if (a[i] != x)
break;
}
}

We currently generate:

cmeqv31.4s, v31.4s, v28.4s
uminp   v31.4s, v31.4s, v31.4s
fmovx5, d31
cbz x5, .L2

and after this patch:

ptrue   p7.b, vl16
...
cmpne   p15.s, p7/z, z31.s, z28.s
b.any   .L2

Because we need to lift the predicate creation out of the loop, we need to
expand the predicate early; however, in the cbranch expansion we don't see the
outer compare which we need to consume.

For this reason the expansion is twofold: when expanding the cbranch we emit
an SVE predicated comparison, and later during combine we match the SVE and
NEON comparison while also consuming the ptest.
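The effect of the fixed-length ptrue can be sketched with a hypothetical scalar model (not code from the patch; names are illustrative): the predicate built in the expander enables only the first 16 bytes, i.e. the Advanced SIMD register's worth of lanes, so the SVE CMPNE only tests those lanes and B.ANY branches if any of them matched:

```c
#include <assert.h>
#include <stdint.h>

/* Model of 'cmpne p, ptrue(vl16)/z, za, zb' + 'b.any': only the lanes
   covered by the vl16 ptrue (4 x 32-bit lanes) participate, regardless
   of the actual SVE vector length.  Returns 1 if the branch is taken.  */
static int sve_cmpne_any (const uint32_t *a, const uint32_t *b,
                          int sve_lanes)
{
  int active = 4;   /* vl16: 16 bytes = 4 x 32-bit lanes */
  for (int i = 0; i < active && i < sve_lanes; i++)
    if (a[i] != b[i])
      return 1;     /* b.any taken */
  return 0;
}
```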

Unfortunately *aarch64_pred_cmpne_neon_ptest is needed because
for some reason combine destroys the NOT and transforms it into a plus and -1.

For the straight SVE ones, we seem to fail to eliminate the ptest in these
cases, but that's a separate optimization.

Tests show that I'm missing a few, but before I write the patterns for them, are
these OK?

Bootstrapped Regtested on aarch64-none-linux-gnu and no issues.

Ok for master?

Thanks,
Tamar

gcc/ChangeLog:

* config/aarch64/aarch64-simd.md (cbranch4): Update with SVE.
* config/aarch64/aarch64-sve.md
(*aarch64_pred_cmp_neon_ptest,
*aarch64_pred_cmpeq_neon_ptest,
*aarch64_pred_cmpne_neon_ptest): New.
(aarch64_ptest): Rename to...
(@aarch64_ptest): ... This.
* genemit.cc: Include rtx-vector-builder.h.

gcc/testsuite/ChangeLog:

* gcc.target/aarch64/sve/vect-early-break-cbranch_1.c: New test.
* gcc.target/aarch64/sve/vect-early-break-cbranch_2.c: New test.

--- inline copy of patch -- 
diff --git a/gcc/config/aarch64/aarch64-simd.md 
b/gcc/config/aarch64/aarch64-simd.md
index 
b1a2c617d7d4106ab725d53a5d0b5c2fb61a0c78..75cb5d6f7f92b70fed8762fe64e23f0c05a99c99
 100644
--- a/gcc/config/aarch64/aarch64-simd.md
+++ b/gcc/config/aarch64/aarch64-simd.md
@@ -3843,31 +3843,59 @@ (define_expand "cbranch4"
   "TARGET_SIMD"
 {
   auto code = GET_CODE (operands[0]);
-  rtx tmp = operands[1];
 
-  /* If comparing against a non-zero vector we have to do a comparison first
- so we can have a != 0 comparison with the result.  */
-  if (operands[2] != CONST0_RTX (mode))
-emit_insn (gen_vec_cmp (tmp, operands[0], operands[1],
-   operands[2]));
-
-  /* For 64-bit vectors we need no reductions.  */
-  if (known_eq (128, GET_MODE_BITSIZE (mode)))
+  /* If SVE is available, let's borrow some instructions.  We will optimize
+ these further later in combine.  */
+  if (TARGET_SVE)
 {
-  /* Always reduce using a V4SI.  */
-  rtx reduc = gen_lowpart (V4SImode, tmp);
-  rtx res = gen_reg_rtx (V4SImode);
-  emit_insn (gen_aarch64_umaxpv4si (res, reduc, reduc));
-  emit_move_insn (tmp, gen_lowpart (mode, res));
+  machine_mode full_mode = aarch64_full_sve_mode (mode).require ();
+  rtx in1 = lowpart_subreg (full_mode, operands[1], mode);
+  rtx in2 = lowpart_subreg (full_mode, operands[2], mode);
+
+  machine_mode pred_mode = aarch64_sve_pred_mode (full_mode);
+  rtx_vector_builder builder (VNx16BImode, 16, 2);
+  for (unsigned int i = 0; i < 16; ++i)
+   builder.quick_push (CONST1_RTX (BImode));
+  for (unsigned int i = 0; i < 16; ++i)
+   builder.quick_push (CONST0_RTX (BImode));
+  rtx ptrue = force_reg (VNx16BImode, builder.build ());
+  rtx cast_ptrue = gen_lowpart (pred_mode, ptrue);
+  rtx ptrue_flag = gen_int_mode (SVE_KNOWN_PTRUE, SImode);
+
+  rtx tmp = gen_reg_rtx (pred_mode);
+  aarch64_expand_sve_vec_cmp_int (tmp, code, in1, in2);
+  emit_insn (gen_aarch64_ptest (pred_mode, ptrue, cast_ptrue, ptrue_flag, 
tmp));
+  operands[1] = gen_rtx_REG (CC_NZCmode, CC_REGNUM);
+  operands[2] = const0_rtx;
 }
+  else
+{
+  rtx tmp = operands[1];
 
-  rtx val = gen_reg_rtx (DImode);
-  emit_move_insn (val, gen_lowpart (DImode, tmp));
+  /* If comparing against a non-zero vector we have to do a comparison 
first
+so we can have a != 0 comparison with the result.  */
+  if (operands[2] != CONST0_RTX (mode))
+   emit_insn (gen_vec_cmp (tmp, operands[0], operands[1],
+   operands[2]));
 
-  rtx cc_reg = aarch64_gen_compare_reg (code, val, const0_rtx);
-  rtx cmp_rtx = gen_rtx_fmt_ee (code, DImode, cc_reg, const0_rtx);
-  emit_jump_insn (gen_condjump (cmp_rtx, cc_reg, operands[3]));
-  DONE;
+  /* For 

[PATCH 13/19]middle-end testsuite: un-xfail TSVC loops that check for exit control flow vectorization

2023-06-28 Thread Tamar Christina via Gcc-patches
Hi All,

I didn't want these to get lost in the noise of updates.

The following three tests now correctly work for targets that have an
implementation of cbranch for vectors so XFAILs are conditionally removed gated
on vect_early_break support.

Bootstrapped Regtested on aarch64-none-linux-gnu and no issues.

Ok for master?

Thanks,
Tamar

gcc/testsuite/ChangeLog:

* gcc.dg/vect/tsvc/vect-tsvc-s332.c: Remove xfail when early break
supported.
* gcc.dg/vect/tsvc/vect-tsvc-s481.c: Likewise.
* gcc.dg/vect/tsvc/vect-tsvc-s482.c: Likewise.

--- inline copy of patch -- 
diff --git a/gcc/testsuite/gcc.dg/vect/tsvc/vect-tsvc-s332.c 
b/gcc/testsuite/gcc.dg/vect/tsvc/vect-tsvc-s332.c
index 
3fd490b3797d9f033c8804b813ee6e222aa45a3b..f3227bf064856c800d3152e62d2c4921bbe0d062
 100644
--- a/gcc/testsuite/gcc.dg/vect/tsvc/vect-tsvc-s332.c
+++ b/gcc/testsuite/gcc.dg/vect/tsvc/vect-tsvc-s332.c
@@ -49,4 +49,4 @@ int main (int argc, char **argv)
   return 0;
 }
 
-/* { dg-final { scan-tree-dump "vectorized 1 loops" "vect" { xfail *-*-* } } } 
*/
+/* { dg-final { scan-tree-dump "vectorized 1 loops" "vect" { xfail { ! 
vect_early_break } } } } */
diff --git a/gcc/testsuite/gcc.dg/vect/tsvc/vect-tsvc-s481.c 
b/gcc/testsuite/gcc.dg/vect/tsvc/vect-tsvc-s481.c
index 
bf98e173d2e6315ffc45477642eab7f9441c4376..441fdb2a41969c7beaf90714474802a87c0e6d04
 100644
--- a/gcc/testsuite/gcc.dg/vect/tsvc/vect-tsvc-s481.c
+++ b/gcc/testsuite/gcc.dg/vect/tsvc/vect-tsvc-s481.c
@@ -39,4 +39,4 @@ int main (int argc, char **argv)
   return 0;
 }
 
-/* { dg-final { scan-tree-dump "vectorized 1 loops" "vect" { xfail *-*-* } } } 
*/
+/* { dg-final { scan-tree-dump "vectorized 1 loops" "vect" { xfail { ! 
vect_early_break} } } } */
diff --git a/gcc/testsuite/gcc.dg/vect/tsvc/vect-tsvc-s482.c 
b/gcc/testsuite/gcc.dg/vect/tsvc/vect-tsvc-s482.c
index 
c4e26806292af03d59d5b9dc13777ba36831c7fc..5f2d2bf96c5bfc77e7c788ceb3f6d6beb677a367
 100644
--- a/gcc/testsuite/gcc.dg/vect/tsvc/vect-tsvc-s482.c
+++ b/gcc/testsuite/gcc.dg/vect/tsvc/vect-tsvc-s482.c
@@ -37,4 +37,4 @@ int main (int argc, char **argv)
   return 0;
 }
 
-/* { dg-final { scan-tree-dump "vectorized 1 loops" "vect" { xfail *-*-* } } } 
*/
+/* { dg-final { scan-tree-dump "vectorized 1 loops" "vect" { xfail { ! 
vect_early_break } } } } */




-- 




[PATCH 12/19]middle-end: implement loop peeling and IV updates for early break.

2023-06-28 Thread Tamar Christina via Gcc-patches
Hi All,

This patch updates the peeling code to maintain LCSSA during peeling.
The rewrite also naturally takes multiple exits into account, so it didn't
make sense to split them off.

For the purposes of peeling the only change for multiple exits is that the
secondary exits are all wired to the start of the new loop preheader when doing
epilogue peeling.

When doing prologue peeling the CFG is kept intact.

For both epilogue and prologue peeling we wire through between the two loops any
PHI nodes that escape the first loop into the second loop if flow_loops is
specified.  The reason for this conditionality is because
slpeel_tree_duplicate_loop_to_edge_cfg is used in the compiler in 3 ways:
  - prologue peeling
  - epilogue peeling
  - loop distribution

for the last case the loops should remain independent, and so are not connected.
Because only used PHI nodes are propagated, get_current_def can be used
to easily find the previous definitions.  However, live statements that are
not used inside the loop itself are not propagated (since if unused, the moment
we add the guard in between the two loops the value across the bypass edge can
be wrong if the loop has been peeled.)

This is dealt with easily enough in find_guard_arg.

For multiple exits, while we are in LCSSA form, and have a correct DOM tree, the
moment we add the guard block we will change the dominators again.  To deal with
this slpeel_tree_duplicate_loop_to_edge_cfg can optionally return the blocks to
update without having to recompute the list of blocks to update again.

When doing epilogue peeling with multiple exits we will also temporarily have an
incorrect VUSE chain for the secondary exits, as it anticipates the final result
after the VDEFs have been moved.  This will thus be corrected once the code
motion is applied.

Lastly by doing things this way we can remove the helper functions that
previously did lock step iterations to update things as it went along.
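As a concrete picture of the scheme, here is a scalar C sketch (hypothetical code, not the actual GIMPLE the vectorizer produces; `find_first_ge`, `VF` and the array are invented for illustration) of a loop with an early break after epilogue peeling: the main loop runs only whole VF-sized blocks, and every exit of it, including the early break, is wired to the start of the scalar epilogue.

```c
#include <assert.h>

#define VF 4

static int sketch_a[10] = { 0, 1, 2, 3, 4, 5, 6, 7, 8, 9 };

static int
find_first_ge (const int *a, int n, int limit)
{
  int i = 0;

  /* "Vectorized" main loop: runs only whole VF-sized blocks; the inner
     loop stands in for one VF-wide vector comparison.  */
  for (; i + VF <= n; i += VF)
    for (int j = 0; j < VF; j++)
      if (a[i + j] >= limit)
	goto epilogue;	/* early break: wired to the epilogue preheader */

 epilogue:
  /* Scalar epilogue: re-checks from the start of the current block, so
     it both finds the exact matching lane and handles the remainder.  */
  for (; i < n; i++)
    if (a[i] >= limit)
      return i;
  return -1;
}
```

Note how the early break does not need to know the exact lane: restarting the epilogue at the beginning of the current block recovers it, which is why all secondary exits can share one target.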

Bootstrapped Regtested on aarch64-none-linux-gnu and no issues.

Ok for master?

Thanks,
Tamar

gcc/ChangeLog:

* tree-loop-distribution.cc (copy_loop_before): Pass flow_loops = false.
* tree-ssa-loop-niter.cc (loop_only_exit_p):  Fix bug when exit==null.
* tree-vect-loop-manip.cc (adjust_phi_and_debug_stmts): Add additional
assert.
(vect_set_loop_condition_normal): Skip modifying loop IV for multiple
exits.
(slpeel_tree_duplicate_loop_to_edge_cfg): Support multiple exit peeling.
(slpeel_can_duplicate_loop_p): Likewise.
(vect_update_ivs_after_vectorizer): Don't enter this...
(vect_update_ivs_after_early_break): ...but instead enter here.
(find_guard_arg): Update for new peeling code.
(slpeel_update_phi_nodes_for_loops): Remove.
(slpeel_update_phi_nodes_for_guard2): Remove hardcoded edge 0 checks.
(slpeel_update_phi_nodes_for_lcssa): Remove.
(vect_do_peeling): Fix VF for multiple exits and force epilogue.
* tree-vect-loop.cc (_loop_vec_info::_loop_vec_info): Initialize
non_break_control_flow and early_breaks.
(vect_need_peeling_or_partial_vectors_p): Force partial vector if
multiple exits and VLA.
(vect_analyze_loop_form): Support inner loop multiple exits.
(vect_create_loop_vinfo): Set LOOP_VINFO_EARLY_BREAKS.
(vect_create_epilog_for_reduction):  Update live phi nodes.
(vectorizable_live_operation): Ignore live operations in vector loop
when multiple exits.
(vect_transform_loop): Force unrolling for VF loops and multiple exits.
* tree-vect-stmts.cc (vect_stmt_relevant_p): Analyze ctrl statements.
(vect_mark_stmts_to_be_vectorized): Check for non-exit control flow and
analyze gcond params.
(vect_analyze_stmt): Support gcond.
* tree-vectorizer.cc (pass_vectorize::execute): Support multiple exits
in RPO pass.
* tree-vectorizer.h (enum vect_def_type): Add vect_early_exit_def.
(LOOP_VINFO_EARLY_BREAKS, LOOP_VINFO_GENERAL_CTR_FLOW): New.
(loop_vec_info_for_loop): Change to const and static.
(is_loop_header_bb_p): Drop assert.
(slpeel_can_duplicate_loop_p): Update prototype.
(class loop): Add early_breaks and non_break_control_flow.

--- inline copy of patch -- 
diff --git a/gcc/tree-loop-distribution.cc b/gcc/tree-loop-distribution.cc
index 97879498db46dd3c34181ae9aa6e5476004dd5b5..d790ce5fffab3aa3dfc40d833a968314a4442b9e 100644
--- a/gcc/tree-loop-distribution.cc
+++ b/gcc/tree-loop-distribution.cc
@@ -948,7 +948,7 @@ copy_loop_before (class loop *loop, bool redirect_lc_phi_defs)
   edge preheader = loop_preheader_edge (loop);
 
   initialize_original_copy_tables ();
-  res = slpeel_tree_duplicate_loop_to_edge_cfg (loop, NULL, preheader);
+  res = slpeel_tree_duplicate_loop_to_edge_cfg (loop, NULL, preheader, false);
   gcc_assert (res != NULL);
 
   /* When a not last partition is 

[PATCH 10/19]middle-end: implement vectorizable_early_break.

2023-06-28 Thread Tamar Christina via Gcc-patches
Hi All,

This implements vectorizable_early_exit, which is used as the codegen part of
vectorizing a gcond.

It shares the majority of the code with vectorizable_comparison, with the
addition that it needs to be able to reduce multiple resulting statements
into a single one for use in the gcond, and also needs to be able to perform
masking on the comparisons.
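To illustrate that reduction, here is a plain-C sketch (the 4-bit lane masks and all names are invented for illustration; the real code operates on boolean vector types in GIMPLE): when the comparison is unrolled into several copies (ncopies > 1), the per-copy results are OR-ed into one mask and the gcond branches on whether any lane of the combined mask is set.

```c
#include <assert.h>
#include <stdint.h>

/* One "vector" NE comparison over 4 lanes, result as a lane mask.  */
static uint8_t
cmp_ne_4 (const int *a, int x)
{
  uint8_t mask = 0;
  for (int lane = 0; lane < 4; lane++)
    mask |= (uint8_t) ((a[lane] != x) << lane);
  return mask;
}

/* Two copies of the comparison reduced with IOR; a single flag-setting
   test then feeds the branch.  */
static int
any_lane_ne_8 (const int *a, int x)
{
  uint8_t m = cmp_ne_4 (a, x) | cmp_ne_4 (a + 4, x);
  return m != 0;
}

static int sketch_b_same[8] = { 5, 5, 5, 5, 5, 5, 5, 5 };
static int sketch_b_diff[8] = { 5, 5, 5, 5, 5, 5, 9, 5 };
```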

Bootstrapped Regtested on aarch64-none-linux-gnu and no issues.

Ok for master?

Thanks,
Tamar

gcc/ChangeLog:

* tree-vect-stmts.cc (vectorizable_comparison_1): Support stmts without
lhs.
(vectorizable_early_exit): New.
(vect_analyze_stmt, vect_transform_stmt): Use it.
(vect_is_simple_use, vect_get_vector_types_for_stmt): Support gcond.

--- inline copy of patch -- 
diff --git a/gcc/tree-vect-stmts.cc b/gcc/tree-vect-stmts.cc
index f3e33cd4ed125b9564ca81acd197693fc3457c31..87c4353fa5180fcb7f60b192897456cf24f3fdbe 100644
--- a/gcc/tree-vect-stmts.cc
+++ b/gcc/tree-vect-stmts.cc
@@ -11330,8 +11330,186 @@ vectorizable_condition (vec_info *vinfo,
   return true;
 }
 
-/* vectorizable_comparison.
+static bool
+vectorizable_comparison_1 (vec_info *, tree, stmt_vec_info, tree_code,
+  gimple_stmt_iterator *, gimple **, slp_tree,
+  stmt_vector_for_cost *);
+
+/* Check to see if the current early break given in STMT_INFO is valid for
+   vectorization.  */
+
+static bool
+vectorizable_early_exit (vec_info *vinfo, stmt_vec_info stmt_info,
+gimple_stmt_iterator *gsi, gimple **vec_stmt,
+slp_tree slp_node, stmt_vector_for_cost *cost_vec)
+{
+  loop_vec_info loop_vinfo = dyn_cast <loop_vec_info> (vinfo);
+  if (!loop_vinfo
+      || !is_a <gcond *> (STMT_VINFO_STMT (stmt_info)))
+return false;
+
+  if (STMT_VINFO_DEF_TYPE (stmt_info) != vect_early_exit_def)
+return false;
+
+  if (!STMT_VINFO_RELEVANT_P (stmt_info))
+return false;
+
+  gimple_match_op op;
+  if (!gimple_extract_op (stmt_info->stmt, &op))
+gcc_unreachable ();
+  gcc_assert (op.code.is_tree_code ());
+  auto code = tree_code (op.code);
+
+  tree vectype_out = STMT_VINFO_VECTYPE (stmt_info);
+  gcc_assert (vectype_out);
+
+  stmt_vec_info operand0_info
+= loop_vinfo->lookup_stmt (SSA_NAME_DEF_STMT (op.ops[0]));
+  if (!operand0_info)
+return false;
+  /* If we're in a pattern get the type of the original statement.  */
+  if (STMT_VINFO_IN_PATTERN_P (operand0_info))
+operand0_info = STMT_VINFO_RELATED_STMT (operand0_info);
+  tree vectype_op = STMT_VINFO_VECTYPE (operand0_info);
+
+  tree truth_type = truth_type_for (vectype_op);
+  machine_mode mode = TYPE_MODE (truth_type);
+  int ncopies;
+
+  if (slp_node)
+ncopies = 1;
+  else
+ncopies = vect_get_num_copies (loop_vinfo, truth_type);
+
+  vec_loop_masks *masks = &LOOP_VINFO_MASKS (loop_vinfo);
+  bool masked_loop_p = LOOP_VINFO_FULLY_MASKED_P (loop_vinfo);
+
+  /* Analyze only.  */
+  if (!vec_stmt)
+{
+  if (direct_optab_handler (cbranch_optab, mode) == CODE_FOR_nothing)
+   {
+ if (dump_enabled_p ())
+ dump_printf_loc (MSG_MISSED_OPTIMIZATION, vect_location,
+  "can't vectorize early exit because the "
+  "target doesn't support flag setting vector "
+  "comparisons.\n");
+ return false;
+   }
+
+  if (!expand_vec_cmp_expr_p (vectype_op, truth_type, NE_EXPR))
+   {
+ if (dump_enabled_p ())
+ dump_printf_loc (MSG_MISSED_OPTIMIZATION, vect_location,
+  "can't vectorize early exit because the "
+  "target does not support boolean vector "
+  "comparisons for type %T.\n", truth_type);
+ return false;
+   }
+
+  if (ncopies > 1
+ && direct_optab_handler (ior_optab, mode) == CODE_FOR_nothing)
+   {
+ if (dump_enabled_p ())
+ dump_printf_loc (MSG_MISSED_OPTIMIZATION, vect_location,
+  "can't vectorize early exit because the "
+  "target does not support boolean vector OR for "
+  "type %T.\n", truth_type);
+ return false;
+   }
+
+  if (!vectorizable_comparison_1 (vinfo, truth_type, stmt_info, code, gsi,
+ vec_stmt, slp_node, cost_vec))
+   return false;
 
+  if (LOOP_VINFO_CAN_USE_PARTIAL_VECTORS_P (loop_vinfo))
+   vect_record_loop_mask (loop_vinfo, masks, ncopies, truth_type, NULL);
+
+  return true;
+}
+
+  /* Transform.  */
+
+  tree new_temp = NULL_TREE;
+  gimple *new_stmt = NULL;
+
+  if (dump_enabled_p ())
+dump_printf_loc (MSG_NOTE, vect_location, "transform early-exit.\n");
+
+  if (!vectorizable_comparison_1 (vinfo, truth_type, stmt_info, code, gsi,
+ vec_stmt, slp_node, cost_vec))
+gcc_unreachable ();
+
+  

[PATCH 11/19]middle-end: implement code motion for early break.

2023-06-28 Thread Tamar Christina via Gcc-patches
Hi All,

When performing early break vectorization we need to be sure that the vector
operations are safe to perform.  A simple example is e.g.

 for (int i = 0; i < N; i++)
 {
   vect_b[i] = x + i;
   if (vect_a[i]*2 != x)
 break;
   vect_a[i] = x;
 }

where the store to vect_b is not allowed to be executed unconditionally since
if we exit through the early break it wouldn't have been done for the full VF
iteration.

Effectively, the code motion determines:
  - is it safe/possible to vectorize the function
  - what updates to the VUSES should be performed if we do
  - Which statements need to be moved
  - Which statements can't be moved:
* values that are live must be reachable through all exits
* values that aren't single use and are shared by the use/def chain of the cond
  - The final insertion point of the instructions.  In the cases where we have
multiple early exit statements this should be the one closest to the loop
latch itself.

After motion the loop above is:

 for (int i = 0; i < N; i++)
 {
   ... y = x + i;
   if (vect_a[i]*2 != x)
 break;
   vect_b[i] = y;
   vect_a[i] = x;

 }

The operation is split into two, during data ref analysis we determine
validity of the operation and generate a worklist of actions to perform if we
vectorize.

After peeling and just before statement transformation we replay this worklist,
which moves the statements and updates bookkeeping only in the main loop that's
to be vectorized.  This includes updating of USES in exit blocks.
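The analyze/replay split can be pictured with a small plain-C sketch (the structures, ids and function names here are hypothetical, not GCC's internal representation): the first phase only records which statements must sink past the break; the second performs the moves once, on the loop copy that is actually vectorized.

```c
#include <assert.h>
#include <string.h>

struct stmt { int id; int must_move; };

/* Tiny driver data: statement 2 (think: the vect_b store) must sink.  */
static struct stmt sketch_body[4] = { { 1, 0 }, { 2, 1 }, { 3, 0 }, { 4, 0 } };
static int sketch_wl[4];

static int
in_list (const int *list, int m, int id)
{
  for (int j = 0; j < m; j++)
    if (list[j] == id)
      return 1;
  return 0;
}

/* Phase 1: record which statements must move; nothing is modified yet.  */
static int
analyze (const struct stmt *body, int n, int *worklist)
{
  int m = 0;
  for (int i = 0; i < n; i++)
    if (body[i].must_move)
      worklist[m++] = body[i].id;
  return m;
}

/* Phase 2: replay the worklist, stable-moving the recorded statements
   to the end (past the early-break condition).  */
static void
replay (struct stmt *body, int n, const int *worklist, int m)
{
  struct stmt out[16];	/* large enough for this sketch */
  int k = 0;
  for (int i = 0; i < n; i++)
    if (!in_list (worklist, m, body[i].id))
      out[k++] = body[i];
  for (int i = 0; i < n; i++)
    if (in_list (worklist, m, body[i].id))
      out[k++] = body[i];
  memcpy (body, out, (size_t) n * sizeof *body);
}

/* Run both phases; returns the id of the last statement afterwards.  */
static int
run_sketch (void)
{
  int m = analyze (sketch_body, 4, sketch_wl);
  replay (sketch_body, 4, sketch_wl, m);
  return sketch_body[3].id;
}
```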

Bootstrapped Regtested on aarch64-none-linux-gnu and no issues.

Ok for master?

Thanks,
Tamar

gcc/ChangeLog:

* tree-vect-data-refs.cc (validate_early_exit_stmts): New.
(vect_analyze_data_ref_dependences): Use it.
* tree-vect-loop.cc (move_early_exit_stmts): New.
(vect_transform_loop): Use it.
* tree-vectorizer.h (LOOP_VINFO_EARLY_BRK_CONFLICT_STMTS,
LOOP_VINFO_EARLY_BRK_DEST_BB, LOOP_VINFO_EARLY_BRK_VUSES): New.

--- inline copy of patch -- 
diff --git a/gcc/tree-vect-data-refs.cc b/gcc/tree-vect-data-refs.cc
index fcc950f528b2d1e044be12424c2df11f692ee8ba..240bd7a86233f6b907816f812681e4cd778ecaae 100644
--- a/gcc/tree-vect-data-refs.cc
+++ b/gcc/tree-vect-data-refs.cc
@@ -568,6 +568,278 @@ vect_analyze_data_ref_dependence (struct data_dependence_relation *ddr,
   return opt_result::success ();
 }
 
+/* This function tries to validate whether an early break vectorization
+   is possible for the current instruction sequence.  Returns True if
+   possible, otherwise False.
+
+   Requirements:
+ - Any memory access must be to a fixed size buffer.
+ - There must not be any loads and stores to the same object.
+ - Multiple loads are allowed as long as they don't alias.
+
+   NOTE:
+     This implementation is very conservative.  Any overlapping loads/stores
+     that take place before the early break statement get rejected aside from
+ WAR dependencies.
+
+ i.e.:
+
+   a[i] = 8
+   c = a[i]
+   if (b[i])
+ ...
+
+   is not allowed, but
+
+   c = a[i]
+   a[i] = 8
+   if (b[i])
+ ...
+
+   is, which is the common case.
+
+   Arguments:
+     - LOOP_VINFO: loop information for the current loop.
+     - CHAIN: Currently detected sequence of instructions that need to be moved
+	      if we are to vectorize this early break.
+     - FIXED: Sequences of SSA_NAMEs that must not be moved, they are reachable
+	      from one or more cond conditions.  If this set overlaps with CHAIN
+	      then FIXED takes precedence.  This deals with non-single use cases.
+     - LOADS: List of all loads found during traversal.
+     - BASES: List of all load data references found during traversal.
+     - GSTMT: Current position to inspect for validity.  The sequence
+	      will be moved upwards from this point.
+     - REACHING_VUSE: The dominating VUSE found so far.
+     - CURRENT_VDEF: The last VDEF we've seen.  These are updated in
+		     pre-order and updated in post-order after moving the
+		     instruction.  */
+
+static bool
+validate_early_exit_stmts (loop_vec_info loop_vinfo, hash_set<tree> *chain,
+			   hash_set<tree> *fixed, vec<gimple *> *loads,
+			   vec<data_reference *> *bases, tree *reaching_vuse,
+			   tree *current_vdef, gimple_stmt_iterator *gstmt,
+			   hash_map<tree, tree> *renames)
+{
+  if (gsi_end_p (*gstmt))
+return true;
+
+  gimple *stmt = gsi_stmt (*gstmt);
+  if (gimple_has_ops (stmt))
+{
+  tree dest = NULL_TREE;
+  /* Try to find the SSA_NAME being defined.  For Statements with an LHS
+use the LHS, if not, assume that the first argument of a call is the
+value being defined.  e.g. MASKED_LOAD etc.  */
+  if (gimple_has_lhs (stmt))
+   {
+ if (is_gimple_assign (stmt))
+   dest = gimple_assign_lhs (stmt);
+      else if (const gcall *call = dyn_cast <const gcall *> (stmt))

[PATCH 9/19]AArch64 middle-end: refactor vectorizable_comparison to make the main body re-usable.

2023-06-28 Thread Tamar Christina via Gcc-patches
Hi All,

Vectorization of a gcond starts off essentially the same as vectorizing a
comparison, with the only difference being how the operands are extracted.

This refactors vectorizable_comparison such that we now have a generic function
that can be used from vectorizable_early_break.  The refactoring splits the
gassign checks and actual validation/codegen off to a helper function.

No change in functionality expected.

Bootstrapped Regtested on aarch64-none-linux-gnu and no issues.

Ok for master?

Thanks,
Tamar

gcc/ChangeLog:

* tree-vect-stmts.cc (vectorizable_comparison): Refactor, splitting body
to ...
(vectorizable_comparison_1): ...This.

--- inline copy of patch -- 
diff --git a/gcc/tree-vect-stmts.cc b/gcc/tree-vect-stmts.cc
index ae24f3e66e63d9bd9763284a47fb2c911335c4c1..f3e33cd4ed125b9564ca81acd197693fc3457c31 100644
--- a/gcc/tree-vect-stmts.cc
+++ b/gcc/tree-vect-stmts.cc
@@ -11332,21 +11332,22 @@ vectorizable_condition (vec_info *vinfo,
 
 /* vectorizable_comparison.
 
-   Check if STMT_INFO is comparison expression that can be vectorized.
+/* Helper of vectorizable_comparison.
+
+   Check if STMT_INFO is comparison expression CODE that can be vectorized.
If VEC_STMT is also passed, vectorize STMT_INFO: create a vectorized
comparison, put it in VEC_STMT, and insert it at GSI.
 
Return true if STMT_INFO is vectorizable in this way.  */
 
 static bool
-vectorizable_comparison (vec_info *vinfo,
-stmt_vec_info stmt_info, gimple_stmt_iterator *gsi,
-gimple **vec_stmt,
-slp_tree slp_node, stmt_vector_for_cost *cost_vec)
+vectorizable_comparison_1 (vec_info *vinfo, tree vectype,
+  stmt_vec_info stmt_info, tree_code code,
+  gimple_stmt_iterator *gsi, gimple **vec_stmt,
+  slp_tree slp_node, stmt_vector_for_cost *cost_vec)
 {
   tree lhs, rhs1, rhs2;
   tree vectype1 = NULL_TREE, vectype2 = NULL_TREE;
-  tree vectype = STMT_VINFO_VECTYPE (stmt_info);
   tree vec_rhs1 = NULL_TREE, vec_rhs2 = NULL_TREE;
   tree new_temp;
   loop_vec_info loop_vinfo = dyn_cast <loop_vec_info> (vinfo);
@@ -11354,7 +11355,7 @@ vectorizable_comparison (vec_info *vinfo,
   int ndts = 2;
   poly_uint64 nunits;
   int ncopies;
-  enum tree_code code, bitop1 = NOP_EXPR, bitop2 = NOP_EXPR;
+  enum tree_code bitop1 = NOP_EXPR, bitop2 = NOP_EXPR;
   int i;
   bb_vec_info bb_vinfo = dyn_cast <bb_vec_info> (vinfo);
   vec<tree> vec_oprnds0 = vNULL;
@@ -11377,14 +11378,6 @@ vectorizable_comparison (vec_info *vinfo,
 ncopies = vect_get_num_copies (loop_vinfo, vectype);
 
   gcc_assert (ncopies >= 1);
-  if (STMT_VINFO_DEF_TYPE (stmt_info) != vect_internal_def)
-return false;
-
-  gassign *stmt = dyn_cast <gassign *> (stmt_info->stmt);
-  if (!stmt)
-return false;
-
-  code = gimple_assign_rhs_code (stmt);
 
   if (TREE_CODE_CLASS (code) != tcc_comparison)
 return false;
@@ -11499,7 +11492,6 @@ vectorizable_comparison (vec_info *vinfo,
  return false;
}
 
-  STMT_VINFO_TYPE (stmt_info) = comparison_vec_info_type;
   vect_model_simple_cost (vinfo, stmt_info,
  ncopies * (1 + (bitop2 != NOP_EXPR)),
  dts, ndts, slp_node, cost_vec);
@@ -11565,6 +11557,44 @@ vectorizable_comparison (vec_info *vinfo,
   return true;
 }
 
+/* vectorizable_comparison.
+
+   Check if STMT_INFO is comparison expression that can be vectorized.
+   If VEC_STMT is also passed, vectorize STMT_INFO: create a vectorized
+   comparison, put it in VEC_STMT, and insert it at GSI.
+
+   Return true if STMT_INFO is vectorizable in this way.  */
+
+static bool
+vectorizable_comparison (vec_info *vinfo,
+stmt_vec_info stmt_info, gimple_stmt_iterator *gsi,
+gimple **vec_stmt,
+slp_tree slp_node, stmt_vector_for_cost *cost_vec)
+{
+  bb_vec_info bb_vinfo = dyn_cast <bb_vec_info> (vinfo);
+
+  if (!STMT_VINFO_RELEVANT_P (stmt_info) && !bb_vinfo)
+return false;
+
+  if (STMT_VINFO_DEF_TYPE (stmt_info) != vect_internal_def)
+return false;
+
+  gassign *stmt = dyn_cast <gassign *> (stmt_info->stmt);
+  if (!stmt)
+return false;
+
+  enum tree_code code = gimple_assign_rhs_code (stmt);
+  tree vectype = STMT_VINFO_VECTYPE (stmt_info);
+  if (!vectorizable_comparison_1 (vinfo, vectype, stmt_info, code, gsi,
+ vec_stmt, slp_node, cost_vec))
+return false;
+
+  if (!vec_stmt)
+STMT_VINFO_TYPE (stmt_info) = comparison_vec_info_type;
+
+  return true;
+}
+
 /* If SLP_NODE is nonnull, return true if vectorizable_live_operation
can handle all live statements in the node.  Otherwise return true
if STMT_INFO is not live or if vectorizable_live_operation can handle it.





[PATCH 8/19]middle-end: updated niters analysis to handle multiple exits.

2023-06-28 Thread Tamar Christina via Gcc-patches
Hi All,

For early break vectorization we have to update niters analysis to record and
analyze all exits of the loop, and so all conds.

The niters of the loop is still determined by the main/natural exit of the loop
as this is the O(n) bounds.  For now we don't do much with the secondary conds,
but their assumptions can be used to generate versioning checks later.
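As an illustration (hypothetical C, not from the patch or testsuite), here is a loop with a natural exit and one early break; only the natural exit determines the O(n) niters bound that the analysis records, while the early break is one of the secondary conds:

```c
#include <assert.h>

static int
count_until (const int *a, int n, int stop)
{
  int c = 0;
  for (int i = 0; i < n; i++)	/* natural exit: defines the O(n) bound */
    {
      if (a[i] == stop)		/* secondary cond (early break) */
	break;
      c++;
    }
  return c;
}

static int sketch_d[6] = { 1, 2, 3, 4, 5, 6 };
```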

Bootstrapped Regtested on aarch64-none-linux-gnu and no issues.

Ok for master?

Thanks,
Tamar

gcc/ChangeLog:

* tree-vect-loop.cc (vect_get_loop_niters): Analyze all exits and return
all gconds.
(vect_analyze_loop_form): Update code checking for conds.
(vect_create_loop_vinfo): Handle having multiple conds.
(vect_analyze_loop): Release extra loop conds structures.
* tree-vectorizer.h (LOOP_VINFO_LOOP_CONDS,
LOOP_VINFO_LOOP_IV_COND): New.
(struct vect_loop_form_info): Add conds, loop_iv_cond.

--- inline copy of patch -- 
diff --git a/gcc/tree-vect-loop.cc b/gcc/tree-vect-loop.cc
index 55e69a7ca0b24e0872477141db6f74dbf90b7981..9065811b3b9c2a550baf44768603172b9e26b94b 100644
--- a/gcc/tree-vect-loop.cc
+++ b/gcc/tree-vect-loop.cc
@@ -849,80 +849,106 @@ vect_fixup_scalar_cycles_with_patterns (loop_vec_info loop_vinfo)
in NUMBER_OF_ITERATIONSM1.  Place the condition under which the
niter information holds in ASSUMPTIONS.
 
-   Return the loop exit condition.  */
+   Return the loop exit conditions.  */
 
 
-static gcond *
+static vec<gcond *>
 vect_get_loop_niters (class loop *loop, tree *assumptions,
  tree *number_of_iterations, tree *number_of_iterationsm1)
 {
-  edge exit = single_exit (loop);
+  auto_vec<edge> exits = get_loop_exit_edges (loop);
+  vec<gcond *> conds;
+  conds.create (exits.length ());
   class tree_niter_desc niter_desc;
   tree niter_assumptions, niter, may_be_zero;
-  gcond *cond = get_loop_exit_condition (loop);
 
   *assumptions = boolean_true_node;
   *number_of_iterationsm1 = chrec_dont_know;
   *number_of_iterations = chrec_dont_know;
+
   DUMP_VECT_SCOPE ("get_loop_niters");
 
-  if (!exit)
-return cond;
+  if (exits.is_empty ())
+return conds;
 
-  may_be_zero = NULL_TREE;
-  if (!number_of_iterations_exit_assumptions (loop, exit, &niter_desc, NULL)
-  || chrec_contains_undetermined (niter_desc.niter))
-return cond;
+  if (dump_enabled_p ())
+dump_printf_loc (MSG_NOTE, vect_location, "Loop has %d exits.\n",
+exits.length ());
 
-  niter_assumptions = niter_desc.assumptions;
-  may_be_zero = niter_desc.may_be_zero;
-  niter = niter_desc.niter;
+  edge exit;
+  unsigned int i;
+  FOR_EACH_VEC_ELT (exits, i, exit)
+{
+  gcond *cond = get_edge_condition (exit);
+  if (cond)
+   conds.safe_push (cond);
 
-  if (may_be_zero && integer_zerop (may_be_zero))
-may_be_zero = NULL_TREE;
+  if (dump_enabled_p ())
+   dump_printf_loc (MSG_NOTE, vect_location, "Analyzing exit %d...\n", i);
 
-  if (may_be_zero)
-{
-  if (COMPARISON_CLASS_P (may_be_zero))
+  may_be_zero = NULL_TREE;
+      if (!number_of_iterations_exit_assumptions (loop, exit, &niter_desc, NULL)
+	  || chrec_contains_undetermined (niter_desc.niter))
+   continue;
+
+  niter_assumptions = niter_desc.assumptions;
+  may_be_zero = niter_desc.may_be_zero;
+  niter = niter_desc.niter;
+
+  if (may_be_zero && integer_zerop (may_be_zero))
+   may_be_zero = NULL_TREE;
+
+  if (may_be_zero)
{
- /* Try to combine may_be_zero with assumptions, this can simplify
-computation of niter expression.  */
- if (niter_assumptions && !integer_nonzerop (niter_assumptions))
-   niter_assumptions = fold_build2 (TRUTH_AND_EXPR, boolean_type_node,
-niter_assumptions,
-fold_build1 (TRUTH_NOT_EXPR,
- boolean_type_node,
- may_be_zero));
+ if (COMPARISON_CLASS_P (may_be_zero))
+   {
+ /* Try to combine may_be_zero with assumptions, this can simplify
+computation of niter expression.  */
+ if (niter_assumptions && !integer_nonzerop (niter_assumptions))
+	     niter_assumptions = fold_build2 (TRUTH_AND_EXPR, boolean_type_node,
+niter_assumptions,
+fold_build1 (TRUTH_NOT_EXPR,
+ boolean_type_node,
+ may_be_zero));
+ else
+   niter = fold_build3 (COND_EXPR, TREE_TYPE (niter), may_be_zero,
+build_int_cst (TREE_TYPE (niter), 0),
+rewrite_to_non_trapping_overflow (niter));
+
+ may_be_zero = NULL_TREE;
+  

[PATCH 7/19]middle-end: Refactor vectorizer loop conditionals and separate out IV to new variables

2023-06-28 Thread Tamar Christina via Gcc-patches
Hi All,

This patch splits off the vectorizer's understanding of the main loop exit off
from the normal loop infrastructure.

Essentially we're relaxing the use of single_exit() in the vectorizer as we will
no longer have a single exit, and we need a well-defined split between the main
and secondary exits of loops for vectorization.

These new values were added to the loop class even though they're only used by
the vectorizer for a couple of reasons:
  - We need access to them in places where we have no loop_vinfo.
  - We only have a single loop_vinfo for each loop under consideration, however
that same loop can have different copies, e.g. peeled/versioned copies or
the scalar variant of the loop.  For each of these we still need to be able
to have a coherent exit definition.

For these reasons the placement in the loop class was the only way to keep the
book keeping together with the loops and avoid possibly expensive lookups.

For this version of the patch the `main` exit of a loop is defined as the exit
that is closest to the loop latch. This is stored in vec_loop_iv.  The remaining
exits which are relevant for the vectorizer are stored inside
vec_loop_alt_exits.
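A plain-C illustration of the distinction (the loop shape and names are invented for this sketch): in the do-while below, the `i < n` test sits right next to the latch (the backedge), so it would be the exit recorded in vec_loop_iv, while the early return inside the body would land in vec_loop_alt_exits.

```c
#include <assert.h>

static int
first_zero (const int *a, int n)	/* assumes n >= 1 */
{
  int i = 0;
  do
    {
      if (a[i] == 0)	/* alternate exit (early break) */
	return i;
      i++;
    }
  while (i < n);	/* main IV exit: closest to the loop latch */
  return -1;
}

static int sketch_e[5] = { 3, 1, 0, 2, 4 };
```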

Bootstrapped Regtested on aarch64-none-linux-gnu and no issues.

Ok for master?

Thanks,
Tamar

gcc/ChangeLog:

* cfgloop.cc (alloc_loop): Initialize vec_loop_iv.
* cfgloop.h (class loop): Add vec_loop_iv and vec_loop_alt_exits.
* doc/loop.texi: Document get_edge_condition.
* tree-loop-distribution.cc (loop_distribution::distribute_loop):
Initialize vec_loop_iv since loop distribution calls loop peeling which
only understands vec_loop_iv now.
* tree-scalar-evolution.cc (get_edge_condition): New.
(get_loop_exit_condition): Refactor into get_edge_condition.
* tree-scalar-evolution.h (get_edge_condition): New.
* tree-vect-data-refs.cc (vect_enhance_data_refs_alignment): Update use
of single_exit.
* tree-vect-loop-manip.cc (vect_set_loop_condition_partial_vectors,
vect_set_loop_condition_normal, vect_set_loop_condition,
slpeel_tree_duplicate_loop_to_edge_cfg, slpeel_can_duplicate_loop_p,
find_loop_location, vect_update_ivs_after_vectorizer,
vect_gen_vector_loop_niters_mult_vf, find_guard_arg, vect_do_peeling):
Replace usages of single_exit.
(vec_init_exit_info): New.
* tree-vect-loop.cc (vect_analyze_loop_form,
vect_create_epilog_for_reduction, vectorizable_live_operation,
scale_profile_for_vect_loop, vect_transform_loop): New.
* tree-vectorizer.h (LOOP_VINFO_IV_EXIT, LOOP_VINFO_ALT_EXITS,
vec_init_exit_info): New.

--- inline copy of patch -- 
diff --git a/gcc/cfgloop.h b/gcc/cfgloop.h
index e7ac2b5f3db55de3dbbab7bd2bfe08388f4ec533..cab82d7960e5be517bba2621f7f4888e7bf3c295 100644
--- a/gcc/cfgloop.h
+++ b/gcc/cfgloop.h
@@ -272,6 +272,14 @@ public:
  the basic-block from being collected but its index can still be
  reused.  */
   basic_block former_header;
+
+  /* The controlling loop IV for the current loop when vectorizing.  This IV
+ controls the natural exits of the loop.  */
+  edge  GTY ((skip (""))) vec_loop_iv;
+
+  /* If the loop has multiple exits this structure contains the alternate
+ exits of the loop which are relevant for vectorization.  */
+  vec<edge> GTY ((skip (""))) vec_loop_alt_exits;
 };
 
 /* Set if the loop is known to be infinite.  */
diff --git a/gcc/cfgloop.cc b/gcc/cfgloop.cc
index ccda7415d7037e26048425b5d85f3633a39fd325..98123f7dce98227c8dffe4833e159fbb05596831 100644
--- a/gcc/cfgloop.cc
+++ b/gcc/cfgloop.cc
@@ -355,6 +355,7 @@ alloc_loop (void)
   loop->nb_iterations_upper_bound = 0;
   loop->nb_iterations_likely_upper_bound = 0;
   loop->nb_iterations_estimate = 0;
+  loop->vec_loop_iv = NULL;
   return loop;
 }
 
diff --git a/gcc/doc/loop.texi b/gcc/doc/loop.texi
index b357e9de7bcb1898ab9dda25738b9f003ca6f9f5..4ba6bb2585c81f7af34943b0493b94d5c3a8bf60 100644
--- a/gcc/doc/loop.texi
+++ b/gcc/doc/loop.texi
@@ -212,6 +212,7 @@ relation, and breath-first search order, respectively.
 @code{NULL} if the loop has more than one exit.  You can only use this
 function if @code{LOOPS_HAVE_RECORDED_EXITS} is used.
 @item @code{get_loop_exit_edges}: Enumerates the exit edges of a loop.
+@item @code{get_edge_condition}: Get the condition belonging to an exit edge.
 @item @code{just_once_each_iteration_p}: Returns true if the basic block
 is executed exactly once during each iteration of a loop (that is, it
 does not belong to a sub-loop, and it dominates the latch of the loop).
diff --git a/gcc/tree-loop-distribution.cc b/gcc/tree-loop-distribution.cc
index cf7c197aaf7919a0ecd56a10db0a42f93707ca58..97879498db46dd3c34181ae9aa6e5476004dd5b5 100644
--- a/gcc/tree-loop-distribution.cc
+++ b/gcc/tree-loop-distribution.cc
@@ -3042,6 +3042,24 @@ loop_distribution::distribute_loop (class loop *loop,
   return 0;
 }
 
+ 

[PATCH 6/19]middle-end: Don't enter piecewise expansion if VF is not constant.

2023-06-28 Thread Tamar Christina via Gcc-patches
Hi All,

expand_vector_piecewise does not support VLA expansion as it has a hard assert
on the type not being VLA.

Instead of just failing to expand, and so marking the call unsupported, we ICE.
This adjusts it so we don't, and can gracefully handle the expansion in support
checks.

Bootstrapped Regtested on aarch64-none-linux-gnu and no issues.

Ok for master?

Thanks,
Tamar

gcc/ChangeLog:

* tree-vect-generic.cc (expand_vector_comparison): Skip piecewise if not
constant.

--- inline copy of patch -- 
diff --git a/gcc/tree-vect-generic.cc b/gcc/tree-vect-generic.cc
index df04a0db68da3222f43dd938f8e7adb186cd93c9..da1fd2f40d82a9fa301e6ed0b2f4c3c222d58a8d 100644
--- a/gcc/tree-vect-generic.cc
+++ b/gcc/tree-vect-generic.cc
@@ -481,7 +481,7 @@ expand_vector_comparison (gimple_stmt_iterator *gsi, tree type, tree op0,
}
  t = gimplify_build1 (gsi, VIEW_CONVERT_EXPR, type, t);
}
-  else
+  else if (TYPE_VECTOR_SUBPARTS (type).is_constant ())
t = expand_vector_piecewise (gsi, do_compare, type,
 TREE_TYPE (TREE_TYPE (op0)), op0, op1,
 code, false);









[PATCH 5/19]middle-end: Enable bit-field vectorization to work correctly when we're vectoring inside conds

2023-06-28 Thread Tamar Christina via Gcc-patches
Hi All,

The bitfield vectorization support does not currently recognize bitfields inside
gconds.  This means they can't be used as conditions for early break
vectorization, which is functionality we require.

This adds support for them by explicitly matching and handling gcond as a
source.

Testcases are added in the testsuite update patch as the only way to get there
is with early break vectorization.  See tests:

  - vect-early-break_20.c
  - vect-early-break_21.c
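A hypothetical example of the shape such tests exercise (this is illustrative C, not the actual testsuite sources): the bit-field read feeds the loop condition directly, so at GIMPLE level the BIT_FIELD_REF appears inside a gcond rather than behind a conversion.

```c
#include <assert.h>

struct rec { unsigned val : 7; unsigned flag : 1; };

static int
find_flagged (const struct rec *a, int n)
{
  for (int i = 0; i < n; i++)
    if (a[i].flag)	/* bit-field used as the early-break condition */
      return i;
  return -1;
}

static struct rec sketch_f[4]
  = { { 10, 0 }, { 20, 0 }, { 30, 1 }, { 40, 0 } };
```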

Bootstrapped Regtested on aarch64-none-linux-gnu and no issues.

Ok for master?

Thanks,
Tamar

gcc/ChangeLog:

* tree-vect-patterns.cc (vect_init_pattern_stmt): Copy STMT_VINFO_TYPE
from original statement.
(vect_recog_bitfield_ref_pattern): Support bitfields in gcond.

Co-Authored-By:  Andre Vieira 

--- inline copy of patch -- 
diff --git a/gcc/tree-vect-patterns.cc b/gcc/tree-vect-patterns.cc
index 60bc9be6819af9bd28a81430869417965ba9d82d..c221b1d64449ce3b6c8864bbec4b17ddf938c2d6 100644
--- a/gcc/tree-vect-patterns.cc
+++ b/gcc/tree-vect-patterns.cc
@@ -128,6 +128,7 @@ vect_init_pattern_stmt (vec_info *vinfo, gimple *pattern_stmt,
   STMT_VINFO_RELATED_STMT (pattern_stmt_info) = orig_stmt_info;
   STMT_VINFO_DEF_TYPE (pattern_stmt_info)
 = STMT_VINFO_DEF_TYPE (orig_stmt_info);
+  STMT_VINFO_TYPE (pattern_stmt_info) = STMT_VINFO_TYPE (orig_stmt_info);
   if (!STMT_VINFO_VECTYPE (pattern_stmt_info))
 {
   gcc_assert (!vectype
@@ -2488,27 +2489,37 @@ static gimple *
 vect_recog_bitfield_ref_pattern (vec_info *vinfo, stmt_vec_info stmt_info,
 tree *type_out)
 {
-  gassign *first_stmt = dyn_cast <gassign *> (stmt_info->stmt);
+  gassign *conv_stmt = dyn_cast <gassign *> (stmt_info->stmt);
+  gcond *cond_stmt = dyn_cast <gcond *> (stmt_info->stmt);
 
-  if (!first_stmt)
-return NULL;
-
-  gassign *bf_stmt;
-  if (CONVERT_EXPR_CODE_P (gimple_assign_rhs_code (first_stmt))
-  && TREE_CODE (gimple_assign_rhs1 (first_stmt)) == SSA_NAME)
+  gimple *bf_stmt = NULL;
+  tree cond_cst = NULL_TREE;
+  if (cond_stmt)
 {
-  gimple *second_stmt
-   = SSA_NAME_DEF_STMT (gimple_assign_rhs1 (first_stmt));
-  bf_stmt = dyn_cast  (second_stmt);
-  if (!bf_stmt
- || gimple_assign_rhs_code (bf_stmt) != BIT_FIELD_REF)
+  tree op = gimple_cond_lhs (cond_stmt);
+  if (TREE_CODE (op) != SSA_NAME)
+   return NULL;
+  bf_stmt = dyn_cast <gassign *> (SSA_NAME_DEF_STMT (op));
+  cond_cst = gimple_cond_rhs (cond_stmt);
+  if (TREE_CODE (cond_cst) != INTEGER_CST)
return NULL;
 }
-  else
+  else if (conv_stmt
+  && CONVERT_EXPR_CODE_P (gimple_assign_rhs_code (conv_stmt))
+  && TREE_CODE (gimple_assign_rhs1 (conv_stmt)) == SSA_NAME)
+{
+  gimple *second_stmt = SSA_NAME_DEF_STMT (gimple_assign_rhs1 (conv_stmt));
+  bf_stmt = dyn_cast <gassign *> (second_stmt);
+}
+
+  if (!bf_stmt
+  || gimple_assign_rhs_code (bf_stmt) != BIT_FIELD_REF)
 return NULL;
 
   tree bf_ref = gimple_assign_rhs1 (bf_stmt);
   tree container = TREE_OPERAND (bf_ref, 0);
+  tree ret_type = cond_cst ? TREE_TYPE (container)
+  : TREE_TYPE (gimple_assign_lhs (conv_stmt));
 
   if (!bit_field_offset (bf_ref).is_constant ()
   || !bit_field_size (bf_ref).is_constant ()
@@ -2522,8 +2533,6 @@ vect_recog_bitfield_ref_pattern (vec_info *vinfo, 
stmt_vec_info stmt_info,
 
   gimple *use_stmt, *pattern_stmt;
   use_operand_p use_p;
-  tree ret = gimple_assign_lhs (first_stmt);
-  tree ret_type = TREE_TYPE (ret);
   bool shift_first = true;
   tree container_type = TREE_TYPE (container);
   tree vectype = get_vectype_for_scalar_type (vinfo, container_type);
@@ -2560,7 +2569,8 @@ vect_recog_bitfield_ref_pattern (vec_info *vinfo, 
stmt_vec_info stmt_info,
   /* If the only use of the result of this BIT_FIELD_REF + CONVERT is a
  PLUS_EXPR then do the shift last as some targets can combine the shift and
  add into a single instruction.  */
-  if (single_imm_use (gimple_assign_lhs (first_stmt), &use_p, &use_stmt))
+  if (conv_stmt
+  && single_imm_use (gimple_assign_lhs (conv_stmt), &use_p, &use_stmt))
 {
   if (gimple_code (use_stmt) == GIMPLE_ASSIGN
  && gimple_assign_rhs_code (use_stmt) == PLUS_EXPR)
@@ -2620,7 +2630,21 @@ vect_recog_bitfield_ref_pattern (vec_info *vinfo, 
stmt_vec_info stmt_info,
   NOP_EXPR, result);
 }
 
-  *type_out = STMT_VINFO_VECTYPE (stmt_info);
+  if (cond_cst)
+{
+  append_pattern_def_seq (vinfo, stmt_info, pattern_stmt, vectype);
+  pattern_stmt
+   = gimple_build_cond (gimple_cond_code (cond_stmt),
+gimple_get_lhs (pattern_stmt),
+fold_convert (ret_type, cond_cst),
+gimple_cond_true_label (cond_stmt),
+gimple_cond_false_label (cond_stmt));
+  *type_out = STMT_VINFO_VECTYPE (stmt_info);
+}
+  else
+*type_out
+  = get_vectype_for_scalar_type 

[PATCH 4/19]middle-end: Fix scale_loop_frequencies segfault on multiple-exits

2023-06-28 Thread Tamar Christina via Gcc-patches
Hi All,

There's an existing bug in loop frequency scaling where the if statement checks
to see if there's a single exit, and records a dump file note but then
continues.

It then tries to access the null pointer, which of course fails.

For multiple loop exits it's not really clear how to scale the exit
probabilities as it's unknown which exit is most probable.

For that reason I ignore the exit edges during scaling but still adjust the
loop body.
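As a plain-C sketch of the control-flow issue (types and names invented here, not GCC's): single_exit() returns NULL for a multi-exit loop, so every use of the exit edge has to be guarded, which is exactly the shape of the fix below.

```c
#include <assert.h>
#include <stddef.h>

struct toy_edge { int src; };

/* Returns whether the latch count should be adjusted: only when an
   exit edge exists, a latch exists (nonzero index here), and the
   latch is not the edge's source block -- mirroring the guarded
   condition in the patch.  */
int should_adjust_latch (const struct toy_edge *e, int latch)
{
  return e != NULL && latch != 0 && latch != e->src;
}
```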

Bootstrapped and regtested on aarch64-none-linux-gnu with no issues.

Ok for master?

Thanks,
Tamar

gcc/ChangeLog:

* cfgloopmanip.cc (scale_loop_frequencies): Fix typo.
(scale_loop_profile): Don't access null pointer.

--- inline copy of patch -- 
diff --git a/gcc/cfgloopmanip.cc b/gcc/cfgloopmanip.cc
index 
6e09dcbb0b1864bc64ffd570a4b923f50c3819b5..b10ef3d2be82902ccd74e52a4318217b2db13bcb
 100644
--- a/gcc/cfgloopmanip.cc
+++ b/gcc/cfgloopmanip.cc
@@ -501,7 +501,7 @@ scale_loop_frequencies (class loop *loop, 
profile_probability p)
 /* Scale profile in LOOP by P.
If ITERATION_BOUND is non-zero, scale even further if loop is predicted
to iterate too many times.
-   Before caling this function, preheader block profile should be already
+   Before calling this function, preheader block profile should be already
scaled to final count.  This is necessary because loop iterations are
determined by comparing header edge count to latch ege count and thus
they need to be scaled synchronously.  */
@@ -597,14 +597,14 @@ scale_loop_profile (class loop *loop, profile_probability 
p,
   /* If latch exists, change its count, since we changed
 probability of exit.  Theoretically we should update everything from
 source of exit edge to latch, but for vectorizer this is enough.  */
-  if (loop->latch && loop->latch != e->src)
+  if (e && loop->latch && loop->latch != e->src)
loop->latch->count += count_delta;
 
   /* Scale the probabilities.  */
   scale_loop_frequencies (loop, p);
 
   /* Change latch's count back.  */
-  if (loop->latch && loop->latch != e->src)
+  if (e && loop->latch && loop->latch != e->src)
loop->latch->count -= count_delta;
 
   if (dump_file && (dump_flags & TDF_DETAILS))





[PATCH 2/19][front-end] C/C++ front-end: add pragma GCC novector

2023-06-28 Thread Tamar Christina via Gcc-patches
Hi All,

FORTRAN currently has a pragma NOVECTOR for indicating that vectorization should
not be applied to a particular loop.

ICC/ICX also has such a pragma for C and C++ called #pragma novector.

As part of this patch series I need a way to easily turn off vectorization of
particular loops, particularly for testsuite reasons.

This patch proposes a #pragma GCC novector that does the same for C and C++
as gfortran does for Fortran and what ICC/ICX does for C and C++.
I added only some basic tests here, but the next patch in the series uses this
in the testsuite in about 800 tests.
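A minimal usage example (my own, not from the patch); on compilers that don't know the pragma it is simply ignored with an unknown-pragma warning:

```c
#include <assert.h>

/* Sum a[0..n-1]; the pragma asks GCC not to vectorize this loop.  */
int sum_novector (const int *a, int n)
{
  int s = 0;
#pragma GCC novector
  for (int i = 0; i < n; i++)
    s += a[i];
  return s;
}
```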

Bootstrapped and regtested on aarch64-none-linux-gnu with no issues.

Ok for master?

Thanks,
Tamar

gcc/c-family/ChangeLog:

* c-pragma.h (enum pragma_kind): Add PRAGMA_NOVECTOR.
* c-pragma.cc (init_pragma): Use it.

gcc/c/ChangeLog:

* c-parser.cc (c_parser_while_statement, c_parser_do_statement,
c_parser_for_statement, c_parser_statement_after_labels,
c_parse_pragma_novector, c_parser_pragma): Wire through novector and
default to false.

gcc/cp/ChangeLog:

* cp-tree.def (RANGE_FOR_STMT): Update comment.
* cp-tree.h (RANGE_FOR_NOVECTOR): New.
(cp_convert_range_for, finish_while_stmt_cond, finish_do_stmt,
finish_for_cond): Add novector param.
* init.cc (build_vec_init): Default novector to false.
* method.cc (build_comparison_op): Likewise.
* parser.cc (cp_parser_statement): Likewise.
(cp_parser_for, cp_parser_c_for, cp_parser_range_for,
cp_convert_range_for, cp_parser_iteration_statement,
cp_parser_omp_for_loop, cp_parser_pragma): Support novector.
(cp_parser_pragma_novector): New.
* pt.cc (tsubst_expr): Likewise.
* semantics.cc (finish_while_stmt_cond, finish_do_stmt,
finish_for_cond): Likewise.

gcc/ChangeLog:

* doc/extend.texi: Document it.
* tree-core.h (struct tree_base): Add lang_flag_7 and reduce spare0.
* tree.h (TREE_LANG_FLAG_7): New.

gcc/testsuite/ChangeLog:

* g++.dg/vect/vect-novector-pragma.cc: New test.
* gcc.dg/vect/vect-novector-pragma.c: New test.

--- inline copy of patch -- 
diff --git a/gcc/c-family/c-pragma.h b/gcc/c-family/c-pragma.h
index 
9cc95ab3ee376628dbef2485b84e6008210fa8fc..99cf2e8bd1c05537c198470f1aaa0a5a9da4e576
 100644
--- a/gcc/c-family/c-pragma.h
+++ b/gcc/c-family/c-pragma.h
@@ -87,6 +87,7 @@ enum pragma_kind {
   PRAGMA_GCC_PCH_PREPROCESS,
   PRAGMA_IVDEP,
   PRAGMA_UNROLL,
+  PRAGMA_NOVECTOR,
 
   PRAGMA_FIRST_EXTERNAL
 };
diff --git a/gcc/c-family/c-pragma.cc b/gcc/c-family/c-pragma.cc
index 
0d2b333cebbed32423d5dc6fd2a3ac0ce0bf8b94..848a850b8e123ff1c6ae1ec4b7f8ccbd599b1a88
 100644
--- a/gcc/c-family/c-pragma.cc
+++ b/gcc/c-family/c-pragma.cc
@@ -1862,6 +1862,10 @@ init_pragma (void)
 cpp_register_deferred_pragma (parse_in, "GCC", "unroll", PRAGMA_UNROLL,
  false, false);
 
+  if (!flag_preprocess_only)
+cpp_register_deferred_pragma (parse_in, "GCC", "novector", PRAGMA_NOVECTOR,
+ false, false);
+
 #ifdef HANDLE_PRAGMA_PACK_WITH_EXPANSION
   c_register_pragma_with_expansion (0, "pack", handle_pragma_pack);
 #else
diff --git a/gcc/c/c-parser.cc b/gcc/c/c-parser.cc
index 
24a6eb6e4596f32c477e3f1c3f98b9792f7bc92c..9d35fe68704c8aca197bcd4805a146c655959621
 100644
--- a/gcc/c/c-parser.cc
+++ b/gcc/c/c-parser.cc
@@ -1572,9 +1572,11 @@ static tree c_parser_c99_block_statement (c_parser *, 
bool *,
  location_t * = NULL);
static void c_parser_if_statement (c_parser *, bool *, vec<tree> *);
 static void c_parser_switch_statement (c_parser *, bool *);
-static void c_parser_while_statement (c_parser *, bool, unsigned short, bool 
*);
-static void c_parser_do_statement (c_parser *, bool, unsigned short);
-static void c_parser_for_statement (c_parser *, bool, unsigned short, bool *);
+static void c_parser_while_statement (c_parser *, bool, unsigned short, bool,
+ bool *);
+static void c_parser_do_statement (c_parser *, bool, unsigned short, bool);
+static void c_parser_for_statement (c_parser *, bool, unsigned short, bool,
+   bool *);
 static tree c_parser_asm_statement (c_parser *);
 static tree c_parser_asm_operands (c_parser *);
 static tree c_parser_asm_goto_operands (c_parser *);
@@ -6644,13 +6646,13 @@ c_parser_statement_after_labels (c_parser *parser, bool 
*if_p,
  c_parser_switch_statement (parser, if_p);
  break;
case RID_WHILE:
- c_parser_while_statement (parser, false, 0, if_p);
+ c_parser_while_statement (parser, false, 0, false, if_p);
  break;
case RID_DO:
- c_parser_do_statement (parser, false, 0);
+ c_parser_do_statement (parser, false, 0, false);
  break;
case RID_FOR:
- c_parser_for_statement (parser, false, 0, if_p);

[PATCH 1/19]middle-end ifcvt: Support bitfield lowering of multiple-exit loops

2023-06-28 Thread Tamar Christina via Gcc-patches
Hi,

With the patch enabling the vectorization of early-breaks, we'd like to allow
bitfield lowering in such loops, which requires the relaxation of allowing
multiple exits when doing so.  In order to avoid a similar issue to PR107275,
the code that rejects loops with certain types of gimple_stmts was hoisted from
'if_convertible_loop_p_1' to 'get_loop_body_in_if_conv_order', to avoid trying
to lower bitfields in loops we are not going to vectorize anyway.
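For illustration (my sketch, not one of the new tests): the kind of loop this relaxation is aimed at, a bitfield read inside a loop that has a second exit besides the latch:

```c
#include <assert.h>

struct s { char a : 4; };

/* Returns the index of the first nonzero bitfield, or -1.  The
   'return i' is an exit besides the latch, which previously blocked
   bitfield lowering.  */
int find_nonzero (const struct s *p, int n)
{
  for (int i = 0; i < n; i++)
    if (p[i].a != 0)
      return i;
  return -1;
}

/* Small self-check helper over hypothetical data.  */
int find_nonzero_check (void)
{
  struct s a[8] = { {0}, {0}, {3}, {1}, {0}, {0}, {0}, {0} };
  return find_nonzero (a, 8);
}
```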

This also ensures 'ifcvt_local_dce' doesn't accidentally remove statements it
shouldn't as it will never come across them.  I made sure to add a comment to
make clear that there is a direct connection between the two and if we were to
enable vectorization of any other gimple statement we should make sure both
handle it.

NOTE: This patch accepted before but never committed because it is a no-op
without the early break patch.   This is a respun version of Andre's patch and
rebased to changes in ifcvt and updated to handle multiple exits.

Bootstrapped and regression tested on aarch64-none-linux-gnu and
x86_64-pc-linux-gnu with no issues.

gcc/ChangeLog:

* tree-if-conv.cc (if_convertible_loop_p_1): Move check from here ...
(get_loop_body_in_if_conv_order): ... to here.
(if_convertible_loop_p): Remove single_exit check.
(tree_if_conversion): Move single_exit check to if-conversion part and
support multiple exits.

gcc/testsuite/ChangeLog:

* gcc.dg/vect/vect-bitfield-read-1-not.c: New test.
* gcc.dg/vect/vect-bitfield-read-2-not.c: New test.
* gcc.dg/vect/vect-bitfield-read-8.c: New test.
* gcc.dg/vect/vect-bitfield-read-9.c: New test.

Co-Authored-By:  Andre Vieira 

--- inline copy of patch -- 
diff --git a/gcc/testsuite/gcc.dg/vect/vect-bitfield-read-1-not.c 
b/gcc/testsuite/gcc.dg/vect/vect-bitfield-read-1-not.c
new file mode 100644
index 
..0d91067ebb27b1db2b2352975c43bce8b4171e3f
--- /dev/null
+++ b/gcc/testsuite/gcc.dg/vect/vect-bitfield-read-1-not.c
@@ -0,0 +1,60 @@
+/* { dg-require-effective-target vect_shift } */
+/* { dg-require-effective-target vect_long_long } */
+/* { dg-additional-options { "-fdump-tree-ifcvt-all" } } */
+
+#include <stdarg.h>
+#include "tree-vect.h"
+
+extern void abort(void);
+
+struct s {
+char a : 4;
+};
+
+#define N 32
+#define ELT0 {0}
+#define ELT1 {1}
+#define ELT2 {2}
+#define ELT3 {3}
+#define RES 56
+struct s A[N]
+  = { ELT0, ELT1, ELT2, ELT3, ELT0, ELT1, ELT2, ELT3,
+  ELT0, ELT1, ELT2, ELT3, ELT0, ELT1, ELT2, ELT3,
+  ELT0, ELT1, ELT2, ELT3, ELT0, ELT1, ELT2, ELT3,
+  ELT0, ELT1, ELT2, ELT3, ELT0, ELT1, ELT2, ELT3};
+
+int __attribute__ ((noipa))
+f(struct s *ptr, unsigned n) {
+int res = 0;
+for (int i = 0; i < n; ++i)
+  {
+   switch (ptr[i].a)
+ {
+ case 0:
+   res += ptr[i].a + 1;
+   break;
+ case 1:
+ case 2:
+ case 3:
+   res += ptr[i].a;
+   break;
+ default:
+   return 0;
+ }
+  }
+return res;
+}
+
+int main (void)
+{
+  check_vect ();
+
+  if (f(&A[0], N) != RES)
+abort ();
+
+  return 0;
+}
+
+/* { dg-final { scan-tree-dump-not "Bitfield OK to lower." "ifcvt" } } */
+
+
diff --git a/gcc/testsuite/gcc.dg/vect/vect-bitfield-read-2-not.c 
b/gcc/testsuite/gcc.dg/vect/vect-bitfield-read-2-not.c
new file mode 100644
index 
..4ac7b3fc0dfd1c9d0b5e94a2ba6a745545577ec1
--- /dev/null
+++ b/gcc/testsuite/gcc.dg/vect/vect-bitfield-read-2-not.c
@@ -0,0 +1,49 @@
+/* { dg-require-effective-target vect_shift } */
+/* { dg-require-effective-target vect_long_long } */
+/* { dg-additional-options { "-fdump-tree-ifcvt-all" } } */
+
+#include <stdarg.h>
+#include "tree-vect.h"
+
+extern void abort(void);
+
+struct s {
+char a : 4;
+};
+
+#define N 32
+#define ELT0 {0}
+#define ELT1 {1}
+#define ELT2 {2}
+#define ELT3 {3}
+#define RES 48
+struct s A[N]
+  = { ELT0, ELT1, ELT2, ELT3, ELT0, ELT1, ELT2, ELT3,
+  ELT0, ELT1, ELT2, ELT3, ELT0, ELT1, ELT2, ELT3,
+  ELT0, ELT1, ELT2, ELT3, ELT0, ELT1, ELT2, ELT3,
+  ELT0, ELT1, ELT2, ELT3, ELT0, ELT1, ELT2, ELT3};
+
+int __attribute__ ((noipa))
+f(struct s *ptr, unsigned n) {
+int res = 0;
+for (int i = 0; i < n; ++i)
+  {
+   asm volatile ("" ::: "memory");
+   res += ptr[i].a;
+  }
+return res;
+}
+
+int main (void)
+{
+  check_vect ();
+
+  if (f(&A[0], N) != RES)
+abort ();
+
+  return 0;
+}
+
+/* { dg-final { scan-tree-dump-not "Bitfield OK to lower." "ifcvt" } } */
+
+
diff --git a/gcc/testsuite/gcc.dg/vect/vect-bitfield-read-8.c 
b/gcc/testsuite/gcc.dg/vect/vect-bitfield-read-8.c
new file mode 100644
index 
..52cfd33d937ae90f3fe9556716c90e098b768ac8
--- /dev/null
+++ b/gcc/testsuite/gcc.dg/vect/vect-bitfield-read-8.c
@@ -0,0 +1,49 @@
+/* { dg-require-effective-target vect_int } */
+/* { dg-require-effective-target vect_shift 

[PATCH v5 0/19] Support early break/return auto-vectorization

2023-06-28 Thread Tamar Christina via Gcc-patches
Hi All,

This patch adds initial support for early break vectorization in GCC.
The support is added for any target that implements a vector cbranch optab,
this includes both fully masked and non-masked targets.

Depending on the operation, the vectorizer may also require support for boolean
mask reductions using Inclusive OR.  This is however only checked when the
comparison would produce multiple statements.

Concretely the kind of loops supported are of the forms:

 for (int i = 0; i < N; i++)
 {
   <statements1>
   if (<condition>)
 {
   ...
   <action>;
 }
   <statements2>
 }

where <action> can be:
 - break
 - return
 - goto

Any number of statements can be used before the <action> occurs.
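Concretely, a loop of the supported shape (my own example: fixed-size buffer, known N, early return from inside the loop):

```c
#include <assert.h>

#define N 1024

/* Returns the index of the first element above LIMIT, or -1; the
   'return i' is the early exit the series vectorizes.  */
int first_over (const int a[N], int limit)
{
  for (int i = 0; i < N; i++)
    if (a[i] > limit)
      return i;
  return -1;
}

/* Self-check with synthetic data.  */
int first_over_check (void)
{
  int a[N];
  for (int i = 0; i < N; i++)
    a[i] = i;
  return first_over (a, 100);
}
```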

Since this is an initial version for GCC 14 it has the following limitations and
features:

- Only fixed sized iterations and buffers are supported.  That is to say any
  vectors loaded or stored must be to statically allocated arrays with known
  sizes. N must also be known.  This limitation is because our primary target
  for this optimization is SVE.  For VLA SVE we can't easily do cross-page
  iteration checks. The result is likely to also not be beneficial. For that
  reason we punt support for variable buffers till we have First-Faulting
  support in GCC.
- any stores in <statements1> should not be to the same objects as in
  <condition>.  Loads are fine as long as they don't have the possibility to
  alias.  More concretely, we block RAW dependencies when the intermediate value
  can't be separated from the store, or the store itself can't be moved.
- The number of loop iterations must be known; this is just a temporary
  limitation that I intend to address in GCC 14 itself as follow-on patches.
- Prologue peeling, alignment peeling and loop versioning are supported.
- Fully masked loops, unmasked loops and partially masked loops are supported
- Any number of loop early exits are supported.
- The early exit must be before the natural loop exit/latch.  The vectorizer is
  designed in a way that propagates phi-nodes downwards.  As such supporting this
  inverted control flow is hard.
- No support for epilogue vectorization.  The only epilogue supported is the
  scalar final one.  Epilogue vectorization would also not be profitable.
- Early breaks are only supported for inner loop vectorization.

I have pushed a branch to refs/users/tnfchris/heads/gcc-14-early-break

With the help of IPA and LTO this still gets hit quite often.  During bootstrap
it hit rather frequently.  Additionally TSVC s332, s481 and s482 all pass now
since these are tests for support for early exit vectorization.

This implementation does not support completely handling the early break inside
the vector loop itself but instead supports adding checks such that if we know
that we have to exit in the current iteration then we branch to scalar code to
actually do the final VF iterations, which handles all the code in <action>.

niters analysis and the majority of the vectorizer with hardcoded single_exit
have been updated to use a new function vec_loop_iv which returns
the exit the vectorizer wants to use as the main IV exit.

For niters this exit is what determines the overall iterations as
that is the O(iters) for the loop.

For the scalar loop we know that whatever exit you take you have to perform at
most VF iterations.  For vector code we only care about the state of fully
performed iterations and reset the scalar code to the (partially) remaining loop.
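A scalar C model of that strategy (illustrative only, names mine): process VF lanes per "vector" iteration; if any lane would take the early exit, leave the vector loop and let the scalar code redo that block plus the remainder:

```c
#include <assert.h>

#define VF 4

int first_over_blocked (const int *a, int n, int limit)
{
  int i = 0;
  /* "Vector" loop: VF lanes at a time.  */
  for (; i + VF <= n; i += VF)
    {
      int any = 0;                      /* stands in for a vector compare */
      for (int l = 0; l < VF; l++)
        any |= a[i + l] > limit;
      if (any)
        break;                          /* exit to the scalar code */
    }
  /* Scalar code: (re)does at most VF iterations plus the remainder.  */
  for (; i < n; i++)
    if (a[i] > limit)
      return i;
  return -1;
}
```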

This new version of the patch does the majority of the work in a new rewritten
loop peeling.  This new function maintains LCSSA all the way through and no
longer requires the touch-up functions the vectorizer used to incrementally
adjust them later on.  This means that aside from IV updates and guard edge
updates the early exit code is identical to the single exit cases.

When the loop is peeled during the copying I have to go to great lengths to
keep the dominators up to date.  All exits from the first loop are rewired to
the loop header of the second loop.  But this can change the immediate dominator.

The dominators can change again when we wire in the loop guard, as such peeling
now returns a list of dominators that need to be updated if a new guard edge is
added.

For the loop peeling we rewrite the loop form:


 Header
  ---
  |x|
   2
   |
   v
---3<--
 early exit |  |  |
v  v  | latch
7  4->6
|  |
|  v
|  8
|  |
|  v
-->5

into

 Header
  ---
  |x|
   2
   |
   v
---3<--
 early exit |  |  |
v  v  | latch
7  4->6
  

[PATCH] middle-end/110452 - bad code generation with AVX512 mask splat

2023-06-28 Thread Richard Biener via Gcc-patches
The following adds an alternate way of expanding a uniform
mask vector constructor like

  _55 = _2 ? -1 : 0;
  vect_cst__56 = {_55, _55, _55, _55, _55, _55, _55, _55};

when the mask mode is a scalar int mode like for AVX512 or GCN.
Instead of piecewise building the result via shifts and ors
we can take advantage of uniformity and signedness of the
component and simply sign-extend to the result.

Instead of

cmpl$3, %edi
sete%cl
movl%ecx, %esi
leal(%rsi,%rsi), %eax
leal0(,%rsi,4), %r9d
leal0(,%rsi,8), %r8d
orl %esi, %eax
orl %r9d, %eax
movl%ecx, %r9d
orl %r8d, %eax
movl%ecx, %r8d
sall$4, %r9d
sall$5, %r8d
sall$6, %esi
orl %r9d, %eax
orl %r8d, %eax
movl%ecx, %r8d
orl %esi, %eax
sall$7, %r8d
orl %r8d, %eax
kmovb   %eax, %k1

we then get

cmpl$3, %edi
sete%cl
negl%ecx
kmovb   %ecx, %k1

Code generation for non-uniform masks remains bad, but at least
I see no easy way out for the most general case here.
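The same trick in plain C (a sketch, not the expander code): for a uniform boolean vector, the k-bit integer mask is just the 0/1 boolean sign-extended (equivalently, negated), versus the old lane-by-lane shift-and-or build:

```c
#include <assert.h>
#include <stdint.h>

/* New expansion: sign-extend/negate the 0/1 boolean.  */
uint8_t splat_mask8 (_Bool b)
{
  return (uint8_t) - (int) b;     /* 0 -> 0x00, 1 -> 0xff */
}

/* Old expansion: build the mask lane by lane with shifts and ors.  */
uint8_t splat_mask8_ref (_Bool b)
{
  uint8_t m = 0;
  for (int i = 0; i < 8; i++)
    m |= (uint8_t) (b << i);
  return m;
}
```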

Bootstrapped and tested on x86_64-unknown-linux-gnu.

Will apply tomorrow after double-checking SPEC results and
if no comments appear.

Richard.

PR middle-end/110452
* expr.cc (store_constructor): Handle uniform boolean
vectors with integer mode specially.
---
 gcc/expr.cc  | 13 +
 gcc/testsuite/gcc.target/i386/pr110452.c | 13 +
 2 files changed, 26 insertions(+)
 create mode 100644 gcc/testsuite/gcc.target/i386/pr110452.c

diff --git a/gcc/expr.cc b/gcc/expr.cc
index 62cd8facf75..b7f4e2fda9e 100644
--- a/gcc/expr.cc
+++ b/gcc/expr.cc
@@ -7447,6 +7447,19 @@ store_constructor (tree exp, rtx target, int cleared, 
poly_int64 size,
  emit_move_insn (target, ops[0].value);
break;
  }
+   /* Use sign-extension for uniform boolean vectors with
+  integer modes.  */
+   if (!TREE_SIDE_EFFECTS (exp)
+   && VECTOR_BOOLEAN_TYPE_P (type)
+   && SCALAR_INT_MODE_P (mode)
+   && (elt = uniform_vector_p (exp))
+   && !VECTOR_TYPE_P (TREE_TYPE (elt)))
+ {
+   rtx op0 = force_reg (TYPE_MODE (TREE_TYPE (elt)),
+expand_normal (elt));
+   convert_move (target, op0, 0);
+   break;
+ }
 
n_elts = TYPE_VECTOR_SUBPARTS (type);
if (REG_P (target)
diff --git a/gcc/testsuite/gcc.target/i386/pr110452.c 
b/gcc/testsuite/gcc.target/i386/pr110452.c
new file mode 100644
index 000..8a3e2e560d2
--- /dev/null
+++ b/gcc/testsuite/gcc.target/i386/pr110452.c
@@ -0,0 +1,13 @@
+/* { dg-do compile } */
+/* { dg-options "-O2 -ftree-vectorize -fno-vect-cost-model -mavx512f 
-mprefer-vector-width=512" } */
+
+double a[1024], b[1024], c[1024];
+
+void foo (int flag, int n)
+{
+  _Bool x = flag == 3;
+  for (int i = 0; i < n; ++i)
+a[i] = (x ? b[i] : c[i]) * 42.;
+}
+
+/* { dg-final { scan-assembler-not "\[^x\]orl" } } */
-- 
2.35.3


Re: [PATCH 11/11] riscv: thead: Add support for the XTheadFMemIdx ISA extension

2023-06-28 Thread Christoph Müllner
On Sat, Jun 10, 2023 at 7:54 PM Jeff Law  wrote:
>
>
>
> On 4/28/23 00:23, Christoph Muellner wrote:
> > From: Christoph Müllner 
> >
> > The XTheadFMemIdx ISA extension provides additional load and store
> > instructions for floating-point registers with new addressing modes.
> >
> > The following memory accesses types are supported:
> > * ftype = [w,d] (single-precision, double-precision)
> >
> > The following addressing modes are supported:
> > * register offset with additional immediate offset (4 instructions):
> >flr, fsr
> > * zero-extended register offset with additional immediate offset
> >(4 instructions): flur, fsur
> >
> > These addressing modes are also part of the similar XTheadMemIdx
> > ISA extension support, whose code is reused and extended to support
> > floating-point registers.
> >
> > gcc/ChangeLog:
> >
> >   * config/riscv/riscv.cc (riscv_index_reg_class): Also allow
> >   for XTheadFMemIdx.
> >   (riscv_regno_ok_for_index_p): Likewise.
> >   * config/riscv/thead-peephole.md (TARGET_64BIT):
> >   Generalize peepholes for XTheadFMemIdx.
> >   * config/riscv/thead.cc (is_fmemidx_mode): New function.
> >   (th_memidx_classify_address_index): Add support for
> >   XTheadFMemIdx.
> >   (th_fmemidx_output_index): New function.
> >   (th_output_move): Add support for XTheadFMemIdx.
> >   * config/riscv/thead.md (*th_fmemidx_movsf_hardfloat): New INSN.
> >   (*th_fmemidx_movdf_hardfloat_rv64): Likewise.
> >
> > gcc/testsuite/ChangeLog:
> >
> >   * gcc.target/riscv/xtheadmemidx-helpers.h: Add helpers for
> > XTheadMemFIdx.
> >   * gcc.target/riscv/xtheadfmemidx-index-update.c: New test.
> >   * gcc.target/riscv/xtheadfmemidx-index-xtheadbb-update.c: New test.
> >   * gcc.target/riscv/xtheadfmemidx-index-xtheadbb.c: New test.
> >   * gcc.target/riscv/xtheadfmemidx-index.c: New test.
> >   * gcc.target/riscv/xtheadfmemidx-uindex-update.c: New test.
> >   * gcc.target/riscv/xtheadfmemidx-uindex-xtheadbb-update.c: New test.
> >   * gcc.target/riscv/xtheadfmemidx-uindex-xtheadbb.c: New test.
> >   * gcc.target/riscv/xtheadfmemidx-uindex.c: New test.
> Same core questions/comments as in patch #10 of this series.

The basic support for this extension is already merged.

The documentation can be found here:
  https://github.com/T-head-Semi/thead-extension-spec/tree/master

The extension's name and a link to the documentation have also been
registered here:
 
https://github.com/riscv-non-isa/riscv-toolchain-conventions#list-of-vendor-extensions

The XTheadFMemIdx extension is part of the T-Head C906 and C910 SoCs.
The C906 was launched in October 2021.

Thanks,
Christoph

>
> jeff
>


Re: [PATCH 10/11] riscv: thead: Add support for the XTheadMemIdx ISA extension

2023-06-28 Thread Christoph Müllner
On Sat, Jun 10, 2023 at 7:53 PM Jeff Law  wrote:
>
>
>
> On 4/28/23 00:23, Christoph Muellner wrote:
> > From: Christoph Müllner 
> >
> > The XTheadMemIdx ISA extension provides additional load and store
> > instructions with new addressing modes.
> >
> > The following memory accesses types are supported:
> > * ltype = [b,bu,h,hu,w,wu,d]
> > * stype = [b,h,w,d]
> >
> > The following addressing modes are supported:
> > * immediate offset with PRE_MODIFY or POST_MODIFY (22 instructions):
> >l.ia, l.ib, s.ia, s.ib
> > * register offset with additional immediate offset (11 instructions):
> >lr, sr
> > * zero-extended register offset with additional immediate offset
> >(11 instructions): lur, sur
> >
> > The RISC-V base ISA does not support index registers, so the changes
> > are kept separate from the RISC-V standard support.
> >
> > Similar like other extensions (Zbb, XTheadBb), this patch needs to
> > prevent the conversion of sign-extensions/zero-extensions into
> > shift instructions. The case of the zero-extended register offset
> > addressing mode is handled by a new peephole pass.
> >
> > Handling the different cases of extensions results in a couple of INSNs
> > that look redundant on first view, but they are just the equivalent
> > of what we already have for Zbb as well. The only difference is, that
> > we have much more load instructions.
> >
> > To fully utilize the capabilities of the instructions, there are
> > a few new peephole passes which fold shift amounts into the RTX
> > if possible. The added tests ensure that this feature won't
> > regress without notice.
> >
> > We already have a constraint with the name 'th_f_fmv', therefore,
> > the new constraints follow this pattern and have the same length
> > as required ('th_m_mia', 'th_m_mib', 'th_m_mir', 'th_m_miu').
> >
> > gcc/ChangeLog:
> >
> >   * config/riscv/constraints.md (th_m_mia): New constraint.
> >   (th_m_mib): Likewise.
> >   (th_m_mir): Likewise.
> >   (th_m_miu): Likewise.
> >   * config/riscv/riscv-protos.h (enum riscv_address_type):
> >   Add new address types ADDRESS_REG_REG, ADDRESS_REG_UREG,
> >   and ADDRESS_REG_WB and their documentation.
> >   (struct riscv_address_info): Add new field 'shift' and
> >   document the field usage for the new address types.
> >   (riscv_valid_base_register_p): New prototype.
> >   (th_memidx_legitimate_modify_p): Likewise.
> >   (th_memidx_legitimate_index_p): Likewise.
> >   (th_classify_address): Likewise.
> >   (th_output_move): Likewise.
> >   (th_print_operand_address): Likewise.
> >   * config/riscv/riscv.cc (riscv_index_reg_class):
> >   Return GR_REGS for XTheadMemIdx.
> >   (riscv_regno_ok_for_index_p): Add support for XTheadMemIdx.
> >   (riscv_classify_address): Call th_classify_address() on top.
> >   (riscv_output_move): Call th_output_move() on top.
> >   (riscv_print_operand_address): Call th_print_operand_address()
> >   on top.
> >   * config/riscv/riscv.h (HAVE_POST_MODIFY_DISP): New macro.
> >   (HAVE_PRE_MODIFY_DISP): Likewise.
> >   * config/riscv/riscv.md (zero_extendqi2): Disable
> >   for XTheadMemIdx.
> >   (*zero_extendqi2_internal): Convert to expand,
> >   create INSN with same name and disable it for XTheadMemIdx.
> >   (extendsidi2): Likewise.
> >   (*extendsidi2_internal): Disable for XTheadMemIdx.
> >   * config/riscv/thead-peephole.md: Add helper peephole passes.
> >   * config/riscv/thead.cc (valid_signed_immediate): New helper
> >   function.
> >   (th_memidx_classify_address_modify): New function.
> >   (th_memidx_legitimate_modify_p): Likewise.
> >   (th_memidx_output_modify): Likewise.
> >   (is_memidx_mode): Likewise.
> >   (th_memidx_classify_address_index): Likewise.
> >   (th_memidx_legitimate_index_p): Likewise.
> >   (th_memidx_output_index): Likewise.
> >   (th_classify_address): Likewise.
> >   (th_output_move): Likewise.
> >   (th_print_operand_address): Likewise.
> >   * config/riscv/thead.md (*th_memidx_mov2):
> >   New INSN.
> >   (*th_memidx_zero_extendqi2): Likewise.
> >   (*th_memidx_extendsidi2): Likewise
> >   (*th_memidx_zero_extendsidi2): Likewise.
> >   (*th_memidx_zero_extendhi2): Likewise.
> >   (*th_memidx_extend2): Likewise
> >   (*th_memidx_bb_zero_extendsidi2): Likewise.
> >   (*th_memidx_bb_zero_extendhi2): Likewise.
> >   (*th_memidx_bb_extendhi2): Likewise.
> >   (*th_memidx_bb_extendqi2): Likewise.
> >
> > gcc/testsuite/ChangeLog:
> >
> >   * gcc.target/riscv/xtheadmemidx-helpers.h: New test.
> >   * gcc.target/riscv/xtheadmemidx-index-update.c: New test.
> >   * gcc.target/riscv/xtheadmemidx-index-xtheadbb-update.c: New test.
> >   * gcc.target/riscv/xtheadmemidx-index-xtheadbb.c: New test.
> >   * gcc.target/riscv/xtheadmemidx-index.c: New test.
> >   * 

Re: [PATCH] tree-optimization/110434 - avoid ={v} {CLOBBER} from NRV

2023-06-28 Thread Jakub Jelinek via Gcc-patches
On Wed, Jun 28, 2023 at 12:32:51PM +, Richard Biener wrote:
> As said there's nothing run after NRV.

There is expansion but in the <retval> case I strongly doubt we are trying
to stack reuse it for other vars, so maybe it is ok.

> > On the other side, could there be partial clobbers for the var -> <retval>,
> >   var.fld = {CLOBBER};
> > ?  Or even worse, indirect clobbers (MEM_REF with SSA_NAME pointing to
> > var or parts of it)?
> 
> We know that 'var' is not address taken, not sure about the partial
> clobbers.  We could deal with this in the walk_gimple_op case and
> simply remove a clobber when data.modified.

LGTM.

Jakub



[PATCH] tree-optimization/110451 - hoist invariant compare after interchange

2023-06-28 Thread Richard Biener via Gcc-patches
The following adjusts the cost model of invariant motion to consider
[VEC_]COND_EXPRs and comparisons producing a data value as expensive.
For 503.bwaves_r this avoids an unnecessarily high vectorization
factor because of an integer comparison besides data operations on
double.

Bootstrapped and tested on x86_64-unknown-linux-gnu, pushed.

PR tree-optimization/110451
* tree-ssa-loop-im.cc (stmt_cost): [VEC_]COND_EXPR and
tcc_comparison are expensive.

* gfortran.dg/vect/pr110451.f: New testcase.
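As a C-level analogy (mine, not the Fortran testcase): the compare feeding the conditional value is invariant in the inner loop, so it can be computed once outside it, which is what treating comparisons as expensive encourages invariant motion to do:

```c
#include <assert.h>

/* The compare m != 1 only depends on values defined outside the
   loop, so it is hoisted and computed once.  */
double sum_selected (const double *a, int n, int m)
{
  int take = (m != 1);            /* invariant compare, computed once */
  double s = 0.0;
  for (int i = 0; i < n; i++)
    s += take ? a[i] : 0.0;
  return s;
}
```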
---
 gcc/testsuite/gfortran.dg/vect/pr110451.f | 51 +++
 gcc/tree-ssa-loop-im.cc   | 11 -
 2 files changed, 61 insertions(+), 1 deletion(-)
 create mode 100644 gcc/testsuite/gfortran.dg/vect/pr110451.f

diff --git a/gcc/testsuite/gfortran.dg/vect/pr110451.f 
b/gcc/testsuite/gfortran.dg/vect/pr110451.f
new file mode 100644
index 000..ba77b0dd174
--- /dev/null
+++ b/gcc/testsuite/gfortran.dg/vect/pr110451.f
@@ -0,0 +1,51 @@
+! { dg-do compile }
+! { dg-require-effective-target vect_condition }
+! { dg-require-effective-target vect_double }
+! { dg-additional-options "-ffast-math -floop-interchange 
-fdump-tree-linterchange-details -fdump-tree-vect-details" }
+! { dg-additional-options "-mprefer-vector-width=128" { target x86_64-*-* 
i?86-*-* } }
+
+subroutine mat_times_vec(y,x,a,axp,ayp,azp,axm,aym,azm,
+ $  nb,nx,ny,nz)
+implicit none
+integer nb,nx,ny,nz,i,j,k,m,l,kit,im1,ip1,jm1,jp1,km1,kp1
+
+real*8 y(nb,nx,ny,nz),x(nb,nx,ny,nz)
+
+real*8 a(nb,nb,nx,ny,nz),
+ 1  axp(nb,nb,nx,ny,nz),ayp(nb,nb,nx,ny,nz),azp(nb,nb,nx,ny,nz),
+ 2  axm(nb,nb,nx,ny,nz),aym(nb,nb,nx,ny,nz),azm(nb,nb,nx,ny,nz)
+
+
+  do k=1,nz
+ km1=mod(k+nz-2,nz)+1
+ kp1=mod(k,nz)+1
+ do j=1,ny
+jm1=mod(j+ny-2,ny)+1
+jp1=mod(j,ny)+1
+do i=1,nx
+   im1=mod(i+nx-2,nx)+1
+   ip1=mod(i,nx)+1
+   do l=1,nb
+  y(l,i,j,k)=0.0d0
+  do m=1,nb
+ y(l,i,j,k)=y(l,i,j,k)+
+ 1   a(l,m,i,j,k)*x(m,i,j,k)+
+ 2   axp(l,m,i,j,k)*x(m,ip1,j,k)+
+ 3   ayp(l,m,i,j,k)*x(m,i,jp1,k)+
+ 4   azp(l,m,i,j,k)*x(m,i,j,kp1)+
+ 5   axm(l,m,i,j,k)*x(m,im1,j,k)+
+ 6   aym(l,m,i,j,k)*x(m,i,jm1,k)+
+ 7   azm(l,m,i,j,k)*x(m,i,j,km1)
+  enddo
+   enddo
+enddo
+ enddo
+enddo  
+return
+end
+
+! loop interchange adds a conditional on m != 1 in the innermost loop
+! verify that is hoisted and thus not affecting the vectorization factor
+
+! { dg-final { scan-tree-dump-times "is interchanged" 1 "linterchange" } }
+! { dg-final { scan-tree-dump "vectorization factor = 2" "vect" { target 
x86_64-*-* i?86-*-* } } }
diff --git a/gcc/tree-ssa-loop-im.cc b/gcc/tree-ssa-loop-im.cc
index 86ce6acb023..f5b01e986ae 100644
--- a/gcc/tree-ssa-loop-im.cc
+++ b/gcc/tree-ssa-loop-im.cc
@@ -617,7 +617,8 @@ stmt_cost (gimple *stmt)
   if (gimple_code (stmt) != GIMPLE_ASSIGN)
 return 1;
 
-  switch (gimple_assign_rhs_code (stmt))
+  enum tree_code code = gimple_assign_rhs_code (stmt);
+  switch (code)
 {
 case MULT_EXPR:
 case WIDEN_MULT_EXPR:
@@ -645,6 +646,11 @@ stmt_cost (gimple *stmt)
   /* Shifts and rotates are usually expensive.  */
   return LIM_EXPENSIVE;
 
+case COND_EXPR:
+case VEC_COND_EXPR:
+  /* Conditionals are expensive.  */
+  return LIM_EXPENSIVE;
+
 case CONSTRUCTOR:
   /* Make vector construction cost proportional to the number
  of elements.  */
@@ -658,6 +664,9 @@ stmt_cost (gimple *stmt)
   return 0;
 
 default:
+  /* Comparisons are usually expensive.  */
+  if (TREE_CODE_CLASS (code) == tcc_comparison)
+   return LIM_EXPENSIVE;
   return 1;
 }
 }
-- 
2.35.3


Re: [PATCH] tree-optimization/110434 - avoid ={v} {CLOBBER} from NRV

2023-06-28 Thread Richard Biener via Gcc-patches
On Wed, 28 Jun 2023, Jakub Jelinek wrote:

> On Wed, Jun 28, 2023 at 10:21:45AM +, Richard Biener via Gcc-patches 
> wrote:
> > When NRV replaces a local variable with <retval> it also replaces
> > occurrences in clobbers.  This leads to <retval> being clobbered
> > before the return of it which is strictly invalid but harmless in
> > practice since there's no pass after NRV which would remove
> > earlier stores.
> > 
> > The following fixes this nevertheless.
> > 
> > Bootstrapped and tested on x86_64-unknown-linux-gnu, OK?
> > 
> > Thanks,
> > Richard.
> > 
> > PR tree-optimization/110434
> > * tree-nrv.cc (pass_nrv::execute): Remove CLOBBERs of
> > 	VAR we replace with <retval>.
> 
> This is in a loop over all basic blocks in a function.
> Do we want to kill all clobbers, or just the ones at the end of functions
> (i.e. after the <retval> = VAR; assignment that we also remove)?
> Complication is that doesn't necessarily have to be just the rest of
> a single basic block, but all basic blocks from that point until end of
> function.
> I mean, if we have
>   var = whatever;
>   use (var);
>   var = {CLOBBER};
>   ...
>   var = whatever_else;
>   <retval> = var;
>   var = {CLOBBER};
> killing the first clobber might result in missed optimizations later on.

As said there's nothing run after NRV.

> 
> On the other side, could there be partial clobbers for the var -> ,
>   var.fld = {CLOBBER};
> ?  Or even worse, indirect clobbers (MEM_REF with SSA_NAME pointing to
> var or parts of it)?

We know that 'var' is not address taken, not sure about the partial
clobbers.  We could deal with this in the walk_gimple_op case and
simply remove a clobber when data.modified.

I went at it under the presumption that <retval> never goes out of
scope so we shouldn't have any CLOBBER for it.  You could also say
that NRV should operate flow-sensitive, going from returns backwards
and simply stop replacing when the var it substitutes for is clobbered.

I'll do the adjustment handling var.fld = {CLOBBER};

If we don't remove earlier clobbers shouldn't that prevent the NRV
then?

Richard.


[PATCH] RISC-V: Support vfwnmacc/vfwmsac/vfwnmsac combine lowering

2023-06-28 Thread Juzhe-Zhong
Similar to vfwmacc. Add combine patterns as follows:

For vfwnmsac:
1. (set (reg) (fma (neg (float_extend (reg))) (float_extend (reg)) (reg)))
2. (set (reg) (fma (neg (float_extend (reg))) (reg) (reg)))

For vfwmsac:
1. (set (reg) (fma (float_extend (reg)) (float_extend (reg)) (neg (reg))))
2. (set (reg) (fma (float_extend (reg)) (reg) (neg (reg))))

For vfwnmacc:
1. (set (reg) (fma (neg (float_extend (reg))) (float_extend (reg)) (neg (reg))))
2. (set (reg) (fma (neg (float_extend (reg))) (reg) (neg (reg))))

gcc/ChangeLog:

* config/riscv/autovec-opt.md (*double_widen_fnma): New pattern.
(*single_widen_fnma): Ditto.
(*double_widen_fms): Ditto.
(*single_widen_fms): Ditto.
(*double_widen_fnms): Ditto.
(*single_widen_fnms): Ditto.

gcc/testsuite/ChangeLog:

* gcc.target/riscv/rvv/autovec/widen/widen-10.c: New test.
* gcc.target/riscv/rvv/autovec/widen/widen-11.c: New test.
* gcc.target/riscv/rvv/autovec/widen/widen-12.c: New test.
* gcc.target/riscv/rvv/autovec/widen/widen-complicate-7.c: New test.
* gcc.target/riscv/rvv/autovec/widen/widen-complicate-8.c: New test.
* gcc.target/riscv/rvv/autovec/widen/widen-complicate-9.c: New test.
* gcc.target/riscv/rvv/autovec/widen/widen_run-10.c: New test.
* gcc.target/riscv/rvv/autovec/widen/widen_run-11.c: New test.
* gcc.target/riscv/rvv/autovec/widen/widen_run-12.c: New test.
* gcc.target/riscv/rvv/autovec/widen/widen_run_zvfh-10.c: New test.
* gcc.target/riscv/rvv/autovec/widen/widen_run_zvfh-11.c: New test.
* gcc.target/riscv/rvv/autovec/widen/widen_run_zvfh-12.c: New test.

---
 gcc/config/riscv/autovec-opt.md   | 182 ++
 .../riscv/rvv/autovec/widen/widen-10.c|  22 +++
 .../riscv/rvv/autovec/widen/widen-11.c|  22 +++
 .../riscv/rvv/autovec/widen/widen-12.c|  22 +++
 .../rvv/autovec/widen/widen-complicate-7.c|  27 +++
 .../rvv/autovec/widen/widen-complicate-8.c|  27 +++
 .../rvv/autovec/widen/widen-complicate-9.c|  27 +++
 .../riscv/rvv/autovec/widen/widen_run-10.c|  32 +++
 .../riscv/rvv/autovec/widen/widen_run-11.c|  32 +++
 .../riscv/rvv/autovec/widen/widen_run-12.c|  32 +++
 .../rvv/autovec/widen/widen_run_zvfh-10.c |  32 +++
 .../rvv/autovec/widen/widen_run_zvfh-11.c |  32 +++
 .../rvv/autovec/widen/widen_run_zvfh-12.c |  32 +++
 13 files changed, 521 insertions(+)
 create mode 100644 gcc/testsuite/gcc.target/riscv/rvv/autovec/widen/widen-10.c
 create mode 100644 gcc/testsuite/gcc.target/riscv/rvv/autovec/widen/widen-11.c
 create mode 100644 gcc/testsuite/gcc.target/riscv/rvv/autovec/widen/widen-12.c
 create mode 100644 
gcc/testsuite/gcc.target/riscv/rvv/autovec/widen/widen-complicate-7.c
 create mode 100644 
gcc/testsuite/gcc.target/riscv/rvv/autovec/widen/widen-complicate-8.c
 create mode 100644 
gcc/testsuite/gcc.target/riscv/rvv/autovec/widen/widen-complicate-9.c
 create mode 100644 
gcc/testsuite/gcc.target/riscv/rvv/autovec/widen/widen_run-10.c
 create mode 100644 
gcc/testsuite/gcc.target/riscv/rvv/autovec/widen/widen_run-11.c
 create mode 100644 
gcc/testsuite/gcc.target/riscv/rvv/autovec/widen/widen_run-12.c
 create mode 100644 
gcc/testsuite/gcc.target/riscv/rvv/autovec/widen/widen_run_zvfh-10.c
 create mode 100644 
gcc/testsuite/gcc.target/riscv/rvv/autovec/widen/widen_run_zvfh-11.c
 create mode 100644 
gcc/testsuite/gcc.target/riscv/rvv/autovec/widen/widen_run_zvfh-12.c

diff --git a/gcc/config/riscv/autovec-opt.md b/gcc/config/riscv/autovec-opt.md
index 1a1cef0eaa5..0c0ba685d6b 100644
--- a/gcc/config/riscv/autovec-opt.md
+++ b/gcc/config/riscv/autovec-opt.md
@@ -502,3 +502,185 @@
   }
   [(set_attr "type" "vfwmuladd")
(set_attr "mode" "")])
+
+;; -
+;;  [FP] VFWNMSAC
+;; -
+;; Includes:
+;; - vfwnmsac.vv
+;; -
+
+;; Combine ext + ext + fnma ===> widen fnma.
+;; Most of circumstantces, LoopVectorizer will generate the following IR:
+;; vect__8.176_40 = (vector([2,2]) double) vect__7.175_41;
+;; vect__11.180_35 = (vector([2,2]) double) vect__10.179_36;
+;; vect__13.182_33 = .FNMA (vect__11.180_35, vect__8.176_40, vect__4.172_45);
+(define_insn_and_split "*double_widen_fnma"
+  [(set (match_operand:VWEXTF 0 "register_operand")
+   (fma:VWEXTF
+ (neg:VWEXTF
+   (float_extend:VWEXTF
+ (match_operand: 2 "register_operand")))
+ (float_extend:VWEXTF
+   (match_operand: 3 "register_operand"))
+ (match_operand:VWEXTF 1 "register_operand")))]
+  "TARGET_VECTOR && can_create_pseudo_p ()"
+  "#"
+  "&& 1"
+  [(const_int 0)]
+  {
+riscv_vector::emit_vlmax_fp_ternary_insn (code_for_pred_widen_mul_neg 
(PLUS, mode),
+

Re: [RFC] GNU Vector Extension -- Packed Boolean Vectors

2023-06-28 Thread Tejas Belagod via Gcc-patches


From: Richard Biener 
Date: Tuesday, June 27, 2023 at 12:58 PM
To: Tejas Belagod 
Cc: gcc-patches@gcc.gnu.org 
Subject: Re: [RFC] GNU Vector Extension -- Packed Boolean Vectors
On Tue, Jun 27, 2023 at 8:30 AM Tejas Belagod  wrote:
>
>
>
>
>
> From: Richard Biener 
> Date: Monday, June 26, 2023 at 2:23 PM
> To: Tejas Belagod 
> Cc: gcc-patches@gcc.gnu.org 
> Subject: Re: [RFC] GNU Vector Extension -- Packed Boolean Vectors
>
> On Mon, Jun 26, 2023 at 8:24 AM Tejas Belagod via Gcc-patches
>  wrote:
> >
> > Hi,
> >
> > Packed Boolean Vectors
> > --
> >
> > I'd like to propose a feature addition to GNU Vector extensions to add 
> > packed
> > boolean vectors (PBV).  This has been discussed in the past here[1] and a 
> > variant has
> > been implemented in Clang recently[2].
> >
> > With predication features being added to vector architectures (SVE, MVE, 
> > AVX),
> > it is a useful feature to have to model predication on targets.  This could
> > find its use in intrinsics or just used as is as a GNU vector extension 
> > being
> > mapped to underlying target features.  For example, the packed boolean 
> > vector
> > could directly map to a predicate register on SVE.
> >
> > Also, this new packed boolean type GNU extension can be used with SVE ACLE
> > intrinsics to replace a fixed-length svbool_t.
> >
> > Here are a few options to represent the packed boolean vector type.
>
> The GIMPLE frontend uses a new 'vector_mask' attribute:
>
> typedef int v8si __attribute__((vector_size(8*sizeof(int))));
> typedef v8si v8sib __attribute__((vector_mask));
>
> it gets you a vector type that's the appropriate (dependent on the
> target) vector
> mask type for the vector data type (v8si in this case).
>
>
>
> Thanks Richard.
>
> Having had a quick look at the implementation, it does seem to tick the boxes.
>
> I must admit I haven't dug deep, but if the target hook allows the mask to be
>
> defined in way that is target-friendly (and I don't know how much effort it 
> will
>
> be to migrate the attribute to more front-ends), it should do the job nicely.
>
> Let me go back and dig a bit deeper and get back with questions if any.


Let me add that the advantage of this is the compiler doesn't need
to support weird explicitly laid out packed boolean vectors that do
not match what the target supports and the user doesn't need to know
what the target supports (and thus have an #ifdef maze around explicitly
specified layouts).

Sorry for the delayed response – I spent a day experimenting with vector_mask.

Yeah, this is what option 4 in the RFC is trying to achieve – be portable enough
to avoid having to sprinkle the code with ifdefs.

It does remove some flexibility though, for example with -mavx512f -mavx512vl
you'll get AVX512 style masks for V4SImode data vectors but of course the
target sill supports SSE2/AVX2 style masks as well, but those would not be
available as "packed boolean vectors", though they are of course in fact
equal to V4SImode data vectors with -1 or 0 values, so in this particular
case it might not matter.

That said, the vector_mask attribute will get you V4SImode vectors with
signed boolean elements of 32 bits for V4SImode data vectors with
SSE2/AVX2.

This sounds very much like what the scenario would be with NEON vs SVE. Coming 
to think
of it, vector_mask resembles option 4 in the proposal with ‘n’ implied by the 
‘base’ vector type
and a ‘w’ specified for the type.

Given its current implementation, if vector_mask is exposed to the CFE, would 
there be any
major challenges wrt implementation or defining behaviour semantics? I played 
around with a
few examples from the testsuite and wrote some new ones. I mostly tried 
operations that
the new type would have to support (unary, binary bitwise, initializations etc) 
– with a couple of exceptions
most of the ops seem to be supported. I also triggered a couple of ICEs in some 
tests involving
implicit conversions to wider/narrower vector_mask types (will raise reports 
for these). Correct me
if I’m wrong here, but we’d probably have to support a couple of new ops if 
vector_mask is exposed
to the CFE – initialization and subscript operations?


Thanks,
Tejas.



Richard.

>
>
> Thanks,
>
> Tejas.
>
>
>
>
>
>
>
> > 1. __attribute__((vector_size (n))) where n represents bytes
> >
> >   typedef bool vbool __attribute__ ((vector_size (1)));
> >
> > In this approach, the shape of the boolean vector is unclear. IoW, it is not
> > clear if each bit in 'n' controls a byte or an element. On targets
> > like SVE, it would be natural to have each bit control a byte of the target
> > vector (therefore resulting in an 'unpacked' layout of the PBV) and on AVX, 
> > each
> > bit would control one element/lane on the target vector(therefore resulting 
> > in a
> > 'packed' layout with all significant bits at the LSB).
> >
> > 2. __attribute__((vector_size (n))) where n represents num of lanes
> >
> >   typedef int v4si __attribute__ 

[testsuite] tolerate enabled but missing language frontends

2023-06-28 Thread Alexandre Oliva via Gcc-patches


When a language is enabled but we run the testsuite against a tree in
which the frontend compiler is not present, help.exp fails.  It
recognizes the output pattern for a disabled language, but not a
missing frontend.  Extend the pattern so that it covers both cases.

Tested on x86_64-linux-gnu.  Ok to install?


for  gcc/testsuite/ChangeLog

* lib/options.exp (check_for_options_with_filter): Handle
missing frontend compiler like disabled language.
---
 gcc/testsuite/lib/options.exp |2 +-
 1 file changed, 1 insertion(+), 1 deletion(-)

diff --git a/gcc/testsuite/lib/options.exp b/gcc/testsuite/lib/options.exp
index 30e6e50d703dc..a4b15c14f9c6c 100644
--- a/gcc/testsuite/lib/options.exp
+++ b/gcc/testsuite/lib/options.exp
@@ -59,7 +59,7 @@ proc check_for_options_with_filter { language gcc_options 
exclude \
 set gcc_output [gcc_target_compile $srcfname $filebase.x executable 
$gcc_options]
 remote_file build delete $srcfname $filebase.x $filebase.gcno
 
-if {[regexp -- "compiler not installed on this system" $gcc_output]} {
+if {[regexp -- "compiler not installed on this system|cannot execute" 
$gcc_output]} {
unsupported "$test: $language compiler not available"
return
 }

-- 
Alexandre Oliva, happy hacker   https://FSFLA.org/blogs/lxo/
   Free Software Activist   GNU Toolchain Engineer
Disinformation flourishes because many people care deeply about injustice
but very few check the facts.  Ask me about 


Re: [PATCH] Basic asm blocks should always be volatile

2023-06-28 Thread Julian Waters via Gcc-patches
Hi all,

I've revised the change to be much neater

From 480954bc7d2b24e5d19a98260a2be0b49e112c42 Mon Sep 17 00:00:00 2001
From: TheShermanTanker 
Date: Wed, 28 Jun 2023 19:11:34 +0800
Subject: [PATCH] asm not using extended syntax should always be volatile

---
 gcc/cp/parser.cc | 2 +-
 1 file changed, 1 insertion(+), 1 deletion(-)

diff --git a/gcc/cp/parser.cc b/gcc/cp/parser.cc
index a6341b9..2d5d494 100644
--- a/gcc/cp/parser.cc
+++ b/gcc/cp/parser.cc
@@ -22355,7 +22355,7 @@ cp_parser_asm_definition (cp_parser* parser)
   /* Create the ASM_EXPR.  */
   if (parser->in_function_body)
  {
-   asm_stmt = finish_asm_stmt (asm_loc, volatile_p, string, outputs,
+   asm_stmt = finish_asm_stmt (asm_loc, !extended_p || volatile_p, string, outputs,
inputs, clobbers, labels, inline_p);
/* If the extended syntax was not used, mark the ASM_EXPR.  */
if (!extended_p)
-- 
2.35.1.windows.2


PING: Re: [PATCH] analyzer: Fix regression bug after r14-1632-g9589a46ddadc8b [pr110198]

2023-06-28 Thread Benjamin Priour via Gcc-patches

Hi,
Pinging that regression fix.
Is everything OK for trunk?

Thanks,
Benjamin

On Thu, Jun 22, 2023 at 9:57 PM  wrote:

   From: benjamin priour 

   Resend with proper subject line ...

   Hi,

   Below is the fix to regression bug
   https://gcc.gnu.org/bugzilla/show_bug.cgi?id=110198
   Was bootstrapped and regtested successfully on x86_64-linux-gnu
   Considering mishap from last patch, I would appreciate if you could
   also regtest it, to be sure :)

   Thanks,
   Benjamin.


   g++.dg/analyzer/pr100244.C was failing after a patch of PR109439.
   The reason was a spurious preemptive return of get_store_value upon
   out-of-bounds read that
   was preventing further checks. Now instead, a boolean value
   check_poisoned goes to false when
   a OOB is detected, and is later on given to get_or_create_initial_value.

   gcc/analyzer/ChangeLog:

        * region-model-manager.cc
   (region_model_manager::get_or_create_initial_value): Take an
                optional boolean value to bypass poisoning checks
        * region-model-manager.h: Update declaration of the above
   function.
        * region-model.cc (region_model::get_store_value): No longer
                returns on OOB, but rather gives a boolean to
   get_or_create_initial_value.
        (region_model::check_region_access): Update docstring.
        (region_model::check_region_for_write): Update docstring.

   Signed-off-by: benjamin priour 
   ---
 gcc/analyzer/region-model-manager.cc |  5 +++--
 gcc/analyzer/region-model-manager.h  |  3 ++-
 gcc/analyzer/region-model.cc         | 15 ---
 3 files changed, 13 insertions(+), 10 deletions(-)

   diff --git a/gcc/analyzer/region-model-manager.cc
   b/gcc/analyzer/region-model-manager.cc
   index 1453acf7bc9..4f11ef4bd29 100644
   --- a/gcc/analyzer/region-model-manager.cc
   +++ b/gcc/analyzer/region-model-manager.cc
   @@ -293,9 +293,10 @@ region_model_manager::create_unique_svalue
   (tree type)
    necessary.  */

 const svalue *
   -region_model_manager::get_or_create_initial_value (const region *reg)
   +region_model_manager::get_or_create_initial_value (const region *reg,
   +                                                  bool check_poisoned)
 {
   -  if (!reg->can_have_initial_svalue_p ())
   +  if (!reg->can_have_initial_svalue_p () && check_poisoned)
     return get_or_create_poisoned_svalue (POISON_KIND_UNINIT,
                                          reg->get_type ());

   diff --git a/gcc/analyzer/region-model-manager.h
   b/gcc/analyzer/region-model-manager.h
   index 3340c3ebd1e..ff5333bf07c 100644
   --- a/gcc/analyzer/region-model-manager.h
   +++ b/gcc/analyzer/region-model-manager.h
   @@ -49,7 +49,8 @@ public:
                                             tree type);
   const svalue *get_or_create_poisoned_svalue (enum poison_kind kind,
                                               tree type);
   -  const svalue *get_or_create_initial_value (const region *reg);
   +  const svalue *get_or_create_initial_value (const region *reg,
   +                                            bool check_poisoned =
   true);
   const svalue *get_ptr_svalue (tree ptr_type, const region *pointee);
   const svalue *get_or_create_unaryop (tree type, enum tree_code op,
                                       const svalue *arg);
   diff --git a/gcc/analyzer/region-model.cc b/gcc/analyzer/region-model.cc
   index 6bc60f89f3d..187013a37cc 100644
   --- a/gcc/analyzer/region-model.cc
   +++ b/gcc/analyzer/region-model.cc
   @@ -2373,8 +2373,9 @@ region_model::get_store_value (const region *reg,
   if (reg->empty_p ())
     return m_mgr->get_or_create_unknown_svalue (reg->get_type ());

   +  bool check_poisoned = true;
   if (check_region_for_read (reg, ctxt))
   -    return m_mgr->get_or_create_unknown_svalue(reg->get_type());
   +    check_poisoned = false;

   /* Special-case: handle var_decls in the constant pool.  */
   if (const decl_region *decl_reg = reg->dyn_cast_decl_region ())
   @@ -2427,7 +2428,7 @@ region_model::get_store_value (const region *reg,
       == RK_GLOBALS)
     return get_initial_value_for_global (reg);

   -  return m_mgr->get_or_create_initial_value (reg);
   +  return m_mgr->get_or_create_initial_value (reg, check_poisoned);
 }

 /* Return false if REG does not exist, true if it may do.
   @@ -2790,7 +2791,7 @@ region_model::get_string_size (const region
   *reg) const

 /* If CTXT is non-NULL, use it to warn about any problems
   accessing REG,
    using DIR to determine if this access is a read or write.
   -   Return TRUE if an UNKNOWN_SVALUE needs be created.
   +   Return TRUE if an OOB access was detected.
    If SVAL_HINT is non-NULL, use it as a hint in diagnostics
    about the value that would be written to REG.  */

   @@ -2804,10 +2805,10 @@ region_model::check_region_access (const
   region *reg,
   

[PATCH] x86: Update model values for Alderlake, Rocketlake and Raptorlake.

2023-06-28 Thread Cui, Lili via Gcc-patches
Hi Hongtao,

This patch is to update model values for Alderlake, Rocketlake and Raptorlake 
according to SDM.

Ok for trunk?

Thanks.
Lili.

Update model values for Alderlake, Rocketlake and Raptorlake according to SDM.

gcc/ChangeLog

* common/config/i386/cpuinfo.h (get_intel_cpu): Remove model value 0xa8
from Rocketlake, move model value 0xbf from Alderlake to Raptorlake.
---
 gcc/common/config/i386/cpuinfo.h | 3 +--
 1 file changed, 1 insertion(+), 2 deletions(-)

diff --git a/gcc/common/config/i386/cpuinfo.h b/gcc/common/config/i386/cpuinfo.h
index 61559ed9de2..ae48bc17771 100644
--- a/gcc/common/config/i386/cpuinfo.h
+++ b/gcc/common/config/i386/cpuinfo.h
@@ -463,7 +463,6 @@ get_intel_cpu (struct __processor_model *cpu_model,
   cpu_model->__cpu_subtype = INTEL_COREI7_SKYLAKE;
   break;
 case 0xa7:
-case 0xa8:
   /* Rocket Lake.  */
   cpu = "rocketlake";
   CHECK___builtin_cpu_is ("corei7");
@@ -536,9 +535,9 @@ get_intel_cpu (struct __processor_model *cpu_model,
   break;
 case 0x97:
 case 0x9a:
-case 0xbf:
   /* Alder Lake.  */
 case 0xb7:
+case 0xbf:
   /* Raptor Lake.  */
 case 0xaa:
 case 0xac:
-- 
2.25.1



Re: [PATCH] tree-optimization/110434 - avoid ={v} {CLOBBER} from NRV

2023-06-28 Thread Jakub Jelinek via Gcc-patches
On Wed, Jun 28, 2023 at 10:21:45AM +, Richard Biener via Gcc-patches wrote:
> When NRV replaces a local variable with <retval> it also replaces
> occurrences in clobbers.  This leads to <retval> being clobbered
> before the return of it which is strictly invalid but harmless in
> practice since there's no pass after NRV which would remove
> earlier stores.
> 
> The following fixes this nevertheless.
> 
> Bootstrapped and tested on x86_64-unknown-linux-gnu, OK?
> 
> Thanks,
> Richard.
> 
>   PR tree-optimization/110434
>   * tree-nrv.cc (pass_nrv::execute): Remove CLOBBERs of
>   	VAR we replace with <retval>.

This is in a loop over all basic blocks in a function.
Do we want to kill all clobbers, or just the ones at the end of functions
(i.e. after the <retval> = VAR; assignment that we also remove)?
Complication is that doesn't necessarily have to be just the rest of
a single basic block, but all basic blocks from that point until end of
function.
I mean, if we have
  var = whatever;
  use (var);
  var = {CLOBBER};
  ...
  var = whatever_else;
   <retval> = var;
  var = {CLOBBER};
killing the first clobber might result in missed optimizations later on.

On the other side, could there be partial clobbers for the var -> <retval>,
  var.fld = {CLOBBER};
?  Or even worse, indirect clobbers (MEM_REF with SSA_NAME pointing to
var or parts of it)?

Jakub



[PATCH v2][RFC] c-family: Implement __has_feature and __has_extension [PR60512]

2023-06-28 Thread Alex Coplan via Gcc-patches
Hi,

This patch implements clang's __has_feature and __has_extension in GCC.
This is a v2 of the original RFC posted here:

https://gcc.gnu.org/pipermail/gcc-patches/2023-May/617878.html

Changes since v1:
 - Follow the clang behaviour where -pedantic-errors means that
   __has_extension behaves exactly like __has_feature.
 - We're now more conservative with reporting C++ features as extensions
   available in C++98. For features where we issue a pedwarn in C++98
   mode, we no longer report these as available extensions for C++98.
 - Switch to using a hash_map to store the features. As well as ensuring
   lookup is constant time, this allows us to dynamically register
   features (right now based on frontend, but later we could allow the
   target to register additional features).
 - Also implement some Objective-C features, add a langhook to dispatch
   to each frontend to allow it to register language-specific features.

There is an outstanding question around what to do with
cxx_binary_literals in the C frontend for C2x. Should we introduce a new
c_binary_literals feature that is a feature in C2x and an extension
below that, or should we just continue using the cxx_binary_literals
feature and mark that as a standard feature in C2x? See the comment in
c_feature_table in the patch.

There is also some doubt over what to do with the undocumented "tls"
feature.  In clang this is gated on whether the target supports TLS, but
in clang (unlike GCC) it is a hard error to use TLS when the target
doesn't support it.  In GCC I believe you can always use TLS, you just
get emulated TLS in the case that the target doesn't support it
natively.  So in this patch GCC always reports having the "tls" feature.
Would appreciate if anyone has feedback on this aspect.

I know Iain was concerned that it should be possible to have
target-specific features. Hopefully it is clear that the design in this
patch is more amenable in this. I think for Darwin it should be possible
to add a targetcm hook to register additional features (either passing
through a callback to allow the target code to add to the hash_map, or
exposing a separate langhook that the target can call to register
features).

Bootstrapped/regtested on aarch64-linux-gnu and x86_64-apple-darwin. Any
thoughts?

Thanks,
Alex

--

Co-Authored-By: Iain Sandoe 

gcc/c-family/ChangeLog:

PR c++/60512
* c-common.cc (struct hf_feature_info): New.
(struct hf_table_entry): New.
(hf_generic_predicate): New.
(c_common_register_feature): New.
(init_has_feature): New.
(has_feature_p): New.
* c-common.h (c_common_has_feature): New.
(has_feature_p): New.
(c_common_register_feature): New.
(c_register_features): New.
(cp_register_features): New.
* c-lex.cc (init_c_lex): Plumb through has_feature callback.
(c_common_has_builtin): Generalize and move common part ...
(c_common_lex_availability_macro): ... here.
(c_common_has_feature): New.
* c-ppoutput.cc (init_pp_output): Plumb through has_feature.

gcc/c/ChangeLog:

PR c++/60512
* c-lang.cc (LANG_HOOKS_REGISTER_FEATURES): Implement with
c_register_features.
* c-objc-common.cc (struct c_feature_info): New.
(c_has_feature): New.
(c_register_features): New.

gcc/cp/ChangeLog:

PR c++/60512
* cp-lang.cc (LANG_HOOKS_REGISTER_FEATURES): Implement with
cp_register_features.
* cp-objcp-common.cc (struct cp_feature_selector): New.
(cp_feature_selector::has_feature): New.
(struct cp_feature_info): New.
(cp_has_feature): New.
(cp_register_features): New.

gcc/ChangeLog:

PR c++/60512
* doc/cpp.texi: Document __has_{feature,extension}.
* langhooks-def.h (LANG_HOOKS_REGISTER_FEATURES): New.
(LANG_HOOKS_INITIALIZER): Add LANG_HOOKS_REGISTER_FEATURES.
* langhooks.h (struct lang_hooks): Add register_features hook.

gcc/objc/ChangeLog:

PR c++/60512
* objc-act.cc (struct objc_feature_info): New.
(objc_nonfragile_abi_p): New.
(objc_has_feature): New.
(objc_common_register_features): New.
* objc-act.h (objc_register_features): New.
(objc_common_register_features): New.
* objc-lang.cc (LANG_HOOKS_REGISTER_FEATURES): Implement with
objc_register_features.
(objc_register_features): New.

gcc/objcp/ChangeLog:

PR c++/60512
* objcp-lang.cc (objcxx_register_features): New.
(LANG_HOOKS_REGISTER_FEATURES): Implement with
objcxx_register_features.

libcpp/ChangeLog:

PR c++/60512
* include/cpplib.h (struct cpp_callbacks): Add has_feature.
(enum cpp_builtin_type): Add BT_HAS_{FEATURE,EXTENSION}.
* init.cc: Add __has_{feature,extension}.
* macro.cc (_cpp_builtin_macro_text): Handle
BT_HAS_{FEATURE,EXTENSION}.


[PATCH] tree-optimization/110434 - avoid ={v} {CLOBBER} from NRV

2023-06-28 Thread Richard Biener via Gcc-patches
When NRV replaces a local variable with <retval> it also replaces
occurrences in clobbers.  This leads to <retval> being clobbered
before the return of it which is strictly invalid but harmless in
practice since there's no pass after NRV which would remove
earlier stores.

The following fixes this nevertheless.

Bootstrapped and tested on x86_64-unknown-linux-gnu, OK?

Thanks,
Richard.

PR tree-optimization/110434
* tree-nrv.cc (pass_nrv::execute): Remove CLOBBERs of
	VAR we replace with <retval>.
---
 gcc/tree-nrv.cc | 8 
 1 file changed, 8 insertions(+)

diff --git a/gcc/tree-nrv.cc b/gcc/tree-nrv.cc
index ff47439647c..466b491e4e7 100644
--- a/gcc/tree-nrv.cc
+++ b/gcc/tree-nrv.cc
@@ -256,6 +256,14 @@ pass_nrv::execute (function *fun)
 	  gsi_remove (&gsi, true);
  release_defs (stmt);
}
+ /* If this is a CLOBBER of VAR, remove it.  */
+ else if (gimple_clobber_p (stmt)
+  && gimple_assign_lhs (stmt) == found)
+   {
+ unlink_stmt_vdef (stmt);
+ gsi_remove (&gsi, true);
+ release_defs (stmt);
+   }
  else
{
  struct walk_stmt_info wi;
-- 
2.35.3


Re: [PATCH 2/2] [testsuite, arm]: Make mve_fp_fpu[12].c accept single or double precision FPU

2023-06-28 Thread Richard Earnshaw (lists) via Gcc-patches

On 28/06/2023 10:26, Christophe Lyon via Gcc-patches wrote:

This tests currently expect a directive containing .fpu fpv5-sp-d16
and thus may fail if the test is executed for instance with
-march=armv8.1-m.main+mve.fp+fp.dp

This patch accepts either fpv5-sp-d16 or fpv5-d16 to avoid the failure.

2023-06-28  Christophe Lyon  

gcc/testsuite/
* gcc.target/arm/mve/intrinsics/mve_fp_fpu1.c: Fix .fpu
scan-assembler.
* gcc.target/arm/mve/intrinsics/mve_fp_fpu2.c: Likewise.
---
  gcc/testsuite/gcc.target/arm/mve/intrinsics/mve_fp_fpu1.c | 2 +-
  gcc/testsuite/gcc.target/arm/mve/intrinsics/mve_fp_fpu2.c | 2 +-
  2 files changed, 2 insertions(+), 2 deletions(-)

diff --git a/gcc/testsuite/gcc.target/arm/mve/intrinsics/mve_fp_fpu1.c 
b/gcc/testsuite/gcc.target/arm/mve/intrinsics/mve_fp_fpu1.c
index e375327fb97..8358a616bb5 100644
--- a/gcc/testsuite/gcc.target/arm/mve/intrinsics/mve_fp_fpu1.c
+++ b/gcc/testsuite/gcc.target/arm/mve/intrinsics/mve_fp_fpu1.c
@@ -12,4 +12,4 @@ foo1 (int8x16_t value)
return b;
  }
  
-/* { dg-final { scan-assembler "\.fpu fpv5-sp-d16" }  } */
+/* { dg-final { scan-assembler "\.fpu fpv5(-sp|)-d16" }  } */
diff --git a/gcc/testsuite/gcc.target/arm/mve/intrinsics/mve_fp_fpu2.c 
b/gcc/testsuite/gcc.target/arm/mve/intrinsics/mve_fp_fpu2.c
index 1fca1100cf0..5dd2feefc35 100644
--- a/gcc/testsuite/gcc.target/arm/mve/intrinsics/mve_fp_fpu2.c
+++ b/gcc/testsuite/gcc.target/arm/mve/intrinsics/mve_fp_fpu2.c
@@ -12,4 +12,4 @@ foo1 (int8x16_t value)
return b;
  }
  
-/* { dg-final { scan-assembler "\.fpu fpv5-sp-d16" }  } */
+/* { dg-final { scan-assembler "\.fpu fpv5(-sp|)-d16" }  } */


OK.


Re: [PATCH 1/2] [testsuite,arm]: Make nomve_fp_1.c require arm_fp

2023-06-28 Thread Richard Earnshaw (lists) via Gcc-patches

On 28/06/2023 10:26, Christophe Lyon via Gcc-patches wrote:

If GCC is configured with the default (soft) -mfloat-abi, and we don't
override the target_board test flags appropriately,
gcc.target/arm/mve/general-c/nomve_fp_1.c fails for lack of
-mfloat-abi=softfp or -mfloat-abi=hard, because it doesn't use
dg-add-options arm_v8_1m_mve (on purpose, see comment in the test).

Require and use the options needed for arm_fp to fix this problem.

2023-06-28  Christophe Lyon  

gcc/testsuite/
* gcc.target/arm/mve/general-c/nomve_fp_1.c: Require arm_fp.
---
  gcc/testsuite/gcc.target/arm/mve/general-c/nomve_fp_1.c | 2 ++
  1 file changed, 2 insertions(+)

diff --git a/gcc/testsuite/gcc.target/arm/mve/general-c/nomve_fp_1.c 
b/gcc/testsuite/gcc.target/arm/mve/general-c/nomve_fp_1.c
index 21c2af16a61..c9d279ead68 100644
--- a/gcc/testsuite/gcc.target/arm/mve/general-c/nomve_fp_1.c
+++ b/gcc/testsuite/gcc.target/arm/mve/general-c/nomve_fp_1.c
@@ -1,9 +1,11 @@
  /* { dg-do compile } */
  /* { dg-require-effective-target arm_v8_1m_mve_ok } */
+/* { dg-require-effective-target arm_fp_ok } */
  /* Do not use dg-add-options arm_v8_1m_mve, because this might expand to "",
 which could imply mve+fp depending on the user settings. We want to make
 sure the '+fp' extension is not enabled.  */
  /* { dg-options "-mfpu=auto -march=armv8.1-m.main+mve" } */
+/* { dg-add-options arm_fp } */
  
  #include 
  


OK.

