date:20230505

Re: [PATCH] Remove type from vrange_storage::equal_p.

2023-05-05 Thread Aldy Hernandez via Gcc-patches


On 5/3/23 13:41, Aldy Hernandez wrote:

[Andrew, since you suggested this, is this what you had in mind?].


Pushed.  You can comment when you're back from vacation :).

Aldy


The equal_p method in vrange_storage is only used to compare ranges
that are the same type.  No sense passing the type if it can be
determined from the range being compared.
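A minimal C++ sketch of the contract this patch establishes. It uses hypothetical, simplified types: `int_range` and `int_range_storage` are illustrative stand-ins, not GCC classes. The storage keeps only the bounds. `equal_p` rebuilds them with the type of the range it is handed, so a type mismatch is invisible to it and must be ruled out by the caller.

```cpp
#include <cassert>
#include <string>

// Hypothetical stand-in for an integer range: a type name plus bounds.
struct int_range {
  std::string type;
  long lo, hi;
};

// Hypothetical stand-in for irange_storage: it keeps the bounds but not the
// type, so the type must be supplied whenever the range is rebuilt.
class int_range_storage {
 public:
  explicit int_range_storage(const int_range &r) : lo_(r.lo), hi_(r.hi) {
    assert(r.lo <= r.hi);
  }

  // Rebuild the stored range with the given type (cf. get_irange).
  int_range get(const std::string &type) const { return {type, lo_, hi_}; }

  // Post-patch shape of equal_p: no separate type argument.  The stored
  // bounds are rebuilt with R's own type, so only the bounds are compared;
  // checking that the types agree is left to the caller.
  bool equal_p(const int_range &r) const {
    int_range tmp = get(r.type);
    return tmp.lo == r.lo && tmp.hi == r.hi;
  }

 private:
  long lo_, hi_;
};
```

For example, storage built from an `"int"` range [1, 10] compares equal to a `"long"` range [1, 10]. That is why the updated comment makes type agreement the caller's responsibility. The caller changed in this patch, `sbr_sparse_bitmap::set_bb_range`, previously passed its `m_type` member for that purpose.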

gcc/ChangeLog:

* gimple-range-cache.cc (sbr_sparse_bitmap::set_bb_range): Do not
pass type to vrange_storage::equal_p.
* value-range-storage.cc (vrange_storage::equal_p): Remove type.
(irange_storage::equal_p): Same.
(frange_storage::equal_p): Same.
* value-range-storage.h (class frange_storage): Same.
---
  gcc/gimple-range-cache.cc  |  2 +-
  gcc/value-range-storage.cc | 28 +++-
  gcc/value-range-storage.h  |  6 +++---
  3 files changed, 15 insertions(+), 21 deletions(-)

diff --git a/gcc/gimple-range-cache.cc b/gcc/gimple-range-cache.cc
index 92622fc5000..07c69ef858a 100644
--- a/gcc/gimple-range-cache.cc
+++ b/gcc/gimple-range-cache.cc
@@ -320,7 +320,7 @@ sbr_sparse_bitmap::set_bb_range (const_basic_block bb, const vrange &r)
 
   // Loop thru the values to see if R is already present.
   for (int x = 0; x < SBR_NUM; x++)
-    if (!m_range[x] || m_range[x]->equal_p (r, m_type))
+    if (!m_range[x] || m_range[x]->equal_p (r))
       {
         if (!m_range[x])
           m_range[x] = m_range_allocator->clone (r);
diff --git a/gcc/value-range-storage.cc b/gcc/value-range-storage.cc
index 7d2de5e8384..1e06a7acc8d 100644
--- a/gcc/value-range-storage.cc
+++ b/gcc/value-range-storage.cc
@@ -206,20 +206,22 @@ vrange_storage::fits_p (const vrange &r) const
   return false;
 }
 
-// Return TRUE if the range in storage is equal to R.
+// Return TRUE if the range in storage is equal to R.  It is the
+// caller's responsibility to verify that the type of the range in
+// storage matches that of R.
 
 bool
-vrange_storage::equal_p (const vrange &r, tree type) const
+vrange_storage::equal_p (const vrange &r) const
 {
   if (is_a <irange> (r))
     {
       const irange_storage *s = static_cast <const irange_storage *> (this);
-      return s->equal_p (as_a <irange> (r), type);
+      return s->equal_p (as_a <irange> (r));
     }
   if (is_a <frange> (r))
     {
       const frange_storage *s = static_cast <const frange_storage *> (this);
-      return s->equal_p (as_a <frange> (r), type);
+      return s->equal_p (as_a <frange> (r));
     }
   gcc_unreachable ();
 }
@@ -375,21 +377,17 @@ irange_storage::get_irange (irange &r, tree type) const
 }
 
 bool
-irange_storage::equal_p (const irange &r, tree type) const
+irange_storage::equal_p (const irange &r) const
 {
   if (m_kind == VR_UNDEFINED || r.undefined_p ())
     return m_kind == r.m_kind;
   if (m_kind == VR_VARYING || r.varying_p ())
-    return m_kind == r.m_kind && types_compatible_p (r.type (), type);
-
-  tree rtype = r.type ();
-  if (!types_compatible_p (rtype, type))
-    return false;
+    return m_kind == r.m_kind;
 
   // ?? We could make this faster by doing the comparison in place,
   // without going through get_irange.
   int_range_max tmp;
-  get_irange (tmp, rtype);
+  get_irange (tmp, r.type ());
   return tmp == r;
 }
  
@@ -526,17 +524,13 @@ frange_storage::get_frange (frange &r, tree type) const
 }
 
 bool
-frange_storage::equal_p (const frange &r, tree type) const
+frange_storage::equal_p (const frange &r) const
 {
   if (r.undefined_p ())
     return m_kind == VR_UNDEFINED;
 
-  tree rtype = type;
-  if (!types_compatible_p (rtype, type))
-    return false;
-
   frange tmp;
-  get_frange (tmp, rtype);
+  get_frange (tmp, r.type ());
   return tmp == r;
 }
  
diff --git a/gcc/value-range-storage.h b/gcc/value-range-storage.h
index 4ec0da73059..f25489f32c1 100644
--- a/gcc/value-range-storage.h
+++ b/gcc/value-range-storage.h
@@ -54,7 +54,7 @@ public:
   void get_vrange (vrange &r, tree type) const;
   void set_vrange (const vrange &r);
   bool fits_p (const vrange &r) const;
-  bool equal_p (const vrange &r, tree type) const;
+  bool equal_p (const vrange &r) const;
 protected:
   // Stack initialization disallowed.
   vrange_storage () { }
@@ -68,7 +68,7 @@ public:
   static irange_storage *alloc (vrange_internal_alloc &, const irange &);
   void set_irange (const irange &r);
   void get_irange (irange &r, tree type) const;
-  bool equal_p (const irange &r, tree type) const;
+  bool equal_p (const irange &r) const;
   bool fits_p (const irange &r) const;
   void dump () const;
 private:
@@ -111,7 +111,7 @@ class frange_storage : public vrange_storage
   static frange_storage *alloc (vrange_internal_alloc &, const frange &r);
   void set_frange (const frange &r);
   void get_frange (frange &r, tree type) const;
-  bool equal_p (const frange &r, tree type) const;
+  bool equal_p (const frange &r) const;
   bool fits_p (const frange &) const;
  private:
   frange_storage (const frange &r) { set_frange (r); }

Re: [PATCH] gimple-range-op: Improve handling of sin/cos ranges

2023-05-05 Thread Aldy Hernandez via Gcc-patches





On 5/5/23 22:53, Jakub Jelinek wrote:

Hi!

Similarly to the earlier sqrt patch, this patch attempts to improve
sin/cos ranges.  As the functions are periodic, for the reverse range
there is not much we can do (but I've discovered I forgot to take
into account the boundary ulps for the discovery of impossible result
ranges).  For fold_range, we can do something only if the range is
narrow enough (narrower than 2*pi).  The patch computes the value of
the functions (taking ulps into account) and also computes the derivative
to find out if the function is growing or declining on the boundaries and
from that it figures out if the result range should be
[min (fn (lb), fn (ub)), max (fn (lb), fn (ub))] or if it needs to be
extended to 1 (actually using +Inf) and/or -1 (actually using -Inf) because
there must be a local minimum and/or maximum in the range.
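The narrowing step described above can be sketched in plain C++ `double` arithmetic. This is a hypothetical illustration, not the patch's code. It ignores the ulp widening and NaN or infinite inputs, and it clamps directly to [-1, 1] instead of using +-Inf followed by an intersection.

The derivative test at the two bounds follows the description. The third branch, for equal slopes at both ends, is this sketch's own way of spotting an interval that contains both a maximum and a minimum. With ub - lb < 2*pi there are at most two turning points.

```cpp
#include <algorithm>
#include <cassert>
#include <cmath>

struct frange_sketch {
  double lo, hi;
};

// Hypothetical sketch: bound sin (is_sin) or cos over [lb, ub] without any
// rounding-error handling.
frange_sketch sincos_bounds(bool is_sin, double lb, double ub) {
  assert(lb <= ub);
  const double pi = std::acos(-1.0);
  const frange_sketch full{-1.0, 1.0};
  // Only an interval narrower than one period can be narrowed.
  if (!(ub - lb < 2 * pi))
    return full;
  double f_lb = is_sin ? std::sin(lb) : std::cos(lb);
  double f_ub = is_sin ? std::sin(ub) : std::cos(ub);
  // Sign of the derivative at each bound: sin' = cos, cos' = -sin.
  bool rising_lb = (is_sin ? std::cos(lb) : -std::sin(lb)) >= 0;
  bool rising_ub = (is_sin ? std::cos(ub) : -std::sin(ub)) >= 0;
  frange_sketch r{std::min(f_lb, f_ub), std::max(f_lb, f_ub)};
  if (rising_lb && !rising_ub)
    r.hi = 1.0;   // rises, then falls: a maximum lies inside
  else if (!rising_lb && rising_ub)
    r.lo = -1.0;  // falls, then rises: a minimum lies inside
  else if (rising_lb ? f_ub < f_lb : f_ub > f_lb)
    r = full;     // same slope at both ends, but the value moved the
                  // "wrong" way: both a maximum and a minimum lie inside
  return r;
}
```

For instance, sin over [0, 3] rises at 0 and falls at 3, so the sketch returns [0, 1]. Over [0, 5] it rises at both ends but sin(5) < sin(0), so it returns the full [-1, 1].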

Bootstrapped/regtested on x86_64-linux and i686-linux, ok for trunk?


It's getting to the point where I think my reviews are getting less 
useful for you.  The mathematical bits are beyond my area of expertise 
and I'm just limiting myself to commenting on general style in range-op, 
etc.


I have no problem with you going ahead with this patch, but it may be 
beneficial to get someone else's opinion on the math bits.  Up to you. 
I don't want to impede your progress here.


Thanks for your work in this area.

Aldy



2023-05-05  Jakub Jelinek  

* real.h (dconst_pi): Define.
(dconst_e_ptr): Formatting fix.
(dconst_pi_ptr): Declare.
* real.cc (dconst_pi_ptr): New function.
* gimple-range-op.cc (cfn_sincos::fold_range): Intersect the generic
boundaries range with range computed from sin/cos of the particular
bounds if the argument range is shorter than 2*pi.
(cfn_sincos::op1_range): Take bulps into account when determining
which result ranges are always invalid or behave like known NAN.

* gcc.dg/tree-ssa/range-sincos-2.c: New test.

--- gcc/real.h.jj   2023-04-19 09:33:59.434350121 +0200
+++ gcc/real.h  2023-05-05 16:36:35.606611170 +0200
@@ -480,9 +480,13 @@ extern REAL_VALUE_TYPE dconstninf;
 #define dconst_sixth() (*dconst_sixth_ptr ())
 #define dconst_ninth() (*dconst_ninth_ptr ())
 #define dconst_sqrt2() (*dconst_sqrt2_ptr ())
+#define dconst_pi() (*dconst_pi_ptr ())
 
 /* Function to return the real value special constant 'e'.  */
-extern const REAL_VALUE_TYPE * dconst_e_ptr (void);
+extern const REAL_VALUE_TYPE *dconst_e_ptr (void);
+
+/* Function to return the real value special constant 'pi'.  */
+extern const REAL_VALUE_TYPE *dconst_pi_ptr (void);
 
 /* Returns a cached REAL_VALUE_TYPE corresponding to 1/n, for various n.  */
 extern const REAL_VALUE_TYPE *dconst_third_ptr (void);
--- gcc/real.cc.jj  2023-04-20 09:36:09.066376175 +0200
+++ gcc/real.cc 2023-05-05 16:39:25.244201299 +0200
@@ -2475,6 +2475,26 @@ dconst_e_ptr (void)
   return &value;
 }
 
+/* Returns the special REAL_VALUE_TYPE corresponding to 'pi'.  */
+
+const REAL_VALUE_TYPE *
+dconst_pi_ptr (void)
+{
+  static REAL_VALUE_TYPE value;
+
+  /* Initialize mathematical constants for constant folding builtins.
+     These constants need to be given to at least 160 bits precision.  */
+  if (value.cl == rvc_zero)
+    {
+      auto_mpfr m (SIGNIFICAND_BITS);
+      mpfr_set_si (m, -1, MPFR_RNDN);
+      mpfr_acos (m, m, MPFR_RNDN);
+      real_from_mpfr (&value, m, NULL_TREE, MPFR_RNDN);
+
+    }
+  return &value;
+}
+
 /* Returns a cached REAL_VALUE_TYPE corresponding to 1/n, for various n.  */
 
 #define CACHED_FRACTION(NAME, N)	\

--- gcc/gimple-range-op.cc.jj   2023-05-05 16:02:48.174419009 +0200
+++ gcc/gimple-range-op.cc  2023-05-05 19:44:27.292304968 +0200
@@ -633,6 +633,98 @@ public:
       }
     if (!lh.maybe_isnan () && !lh.maybe_isinf ())
       r.clear_nan ();
+
+    unsigned ulps
+      = targetm.libm_function_max_error (m_cfn, TYPE_MODE (type), false);
+    if (ulps == ~0U)
+      return true;
+    REAL_VALUE_TYPE lb = lh.lower_bound ();
+    REAL_VALUE_TYPE ub = lh.upper_bound ();
+    REAL_VALUE_TYPE diff;
+    real_arithmetic (&diff, MINUS_EXPR, &ub, &lb);
+    if (!real_isfinite (&diff))
+      return true;
+    REAL_VALUE_TYPE pi = dconst_pi ();
+    REAL_VALUE_TYPE pix2;
+    real_arithmetic (&pix2, PLUS_EXPR, &pi, &pi);
+    // We can only try to narrow the range further if ub-lb < 2*pi.
+    if (!real_less (&diff, &pix2))
+      return true;
+    REAL_VALUE_TYPE lb_lo, lb_hi, ub_lo, ub_hi;
+    REAL_VALUE_TYPE lb_deriv_lo, lb_deriv_hi, ub_deriv_lo, ub_deriv_hi;
+    if (!frange_mpfr_arg1 (&lb_lo, &lb_hi,
+                           m_cfn == CFN_SIN ? mpfr_sin : mpfr_cos, lb,
+                           type, ulps)
+        || !frange_mpfr_arg1 (&ub_lo, &ub_hi,
+                              m_cfn == CFN_SIN ? mpfr_sin : mpfr_cos, ub,
+                              type, ulps)
+        || !frange_mpfr_arg1 (&lb_deriv_lo, &lb_deriv_hi,
+

Re: [patch, fortran] PR109662 Namelist input with comma after name accepted

2023-05-05 Thread Steve Kargl via Gcc-patches

On Fri, May 05, 2023 at 08:41:48PM -0700, Jerry D via Fortran wrote:
> The attached patch adds a check for the invalid comma and emits a runtime
> error if -std=f95,f2003,f2018 are specified at compile time.
> 
> Attached patch includes a new test case.
> 
> Regression tested on x86_64-linux-gnu.
> 
> OK for mainline?
> 

Yes.  Thanks for the fix.  It's been a long time since
I looked at libgfortran code and couldn't quite determine
where to start to fix this.

-- 
Steve

[PATCH V7] RISC-V: Enable basic RVV auto-vectorization support.

2023-05-05 Thread juzhe . zhong

From: Juzhe-Zhong 

gcc/ChangeLog:

* config/riscv/riscv-protos.h (preferred_simd_mode): New function.
* config/riscv/riscv-v.cc (autovec_use_vlmax_p): Ditto.
(preferred_simd_mode): Ditto.
* config/riscv/riscv.cc (riscv_get_arg_info): Handle RVV type in 
function arg.
(riscv_convert_vector_bits): Adjust for RVV auto-vectorization.
(riscv_preferred_simd_mode): New function.
(TARGET_VECTORIZE_PREFERRED_SIMD_MODE): New target hook support.
* config/riscv/vector.md: Add autovec.md.
* config/riscv/autovec.md: New file.

gcc/testsuite/ChangeLog:

* gcc.target/riscv/rvv/rvv.exp: Add testcases for RVV 
auto-vectorization.
* gcc.target/riscv/rvv/autovec/fixed-vlmax-1.c: New test.
* gcc.target/riscv/rvv/autovec/partial/single_rgroup-1.c: New test.
* gcc.target/riscv/rvv/autovec/partial/single_rgroup-1.h: New test.
* gcc.target/riscv/rvv/autovec/partial/single_rgroup_run-1.c: New test.
* gcc.target/riscv/rvv/autovec/scalable-1.c: New test.
* gcc.target/riscv/rvv/autovec/template-1.h: New test.
* gcc.target/riscv/rvv/autovec/v-1.c: New test.
* gcc.target/riscv/rvv/autovec/v-2.c: New test.
* gcc.target/riscv/rvv/autovec/zve32f-1.c: New test.
* gcc.target/riscv/rvv/autovec/zve32f-2.c: New test.
* gcc.target/riscv/rvv/autovec/zve32f-3.c: New test.
* gcc.target/riscv/rvv/autovec/zve32f_zvl128b-1.c: New test.
* gcc.target/riscv/rvv/autovec/zve32f_zvl128b-2.c: New test.
* gcc.target/riscv/rvv/autovec/zve32x-1.c: New test.
* gcc.target/riscv/rvv/autovec/zve32x-2.c: New test.
* gcc.target/riscv/rvv/autovec/zve32x-3.c: New test.
* gcc.target/riscv/rvv/autovec/zve32x_zvl128b-1.c: New test.
* gcc.target/riscv/rvv/autovec/zve32x_zvl128b-2.c: New test.
* gcc.target/riscv/rvv/autovec/zve64d-1.c: New test.
* gcc.target/riscv/rvv/autovec/zve64d-2.c: New test.
* gcc.target/riscv/rvv/autovec/zve64d-3.c: New test.
* gcc.target/riscv/rvv/autovec/zve64d_zvl128b-1.c: New test.
* gcc.target/riscv/rvv/autovec/zve64d_zvl128b-2.c: New test.
* gcc.target/riscv/rvv/autovec/zve64f-1.c: New test.
* gcc.target/riscv/rvv/autovec/zve64f-2.c: New test.
* gcc.target/riscv/rvv/autovec/zve64f-3.c: New test.
* gcc.target/riscv/rvv/autovec/zve64f_zvl128b-1.c: New test.
* gcc.target/riscv/rvv/autovec/zve64f_zvl128b-2.c: New test.
* gcc.target/riscv/rvv/autovec/zve64x-1.c: New test.
* gcc.target/riscv/rvv/autovec/zve64x-2.c: New test.
* gcc.target/riscv/rvv/autovec/zve64x-3.c: New test.
* gcc.target/riscv/rvv/autovec/zve64x_zvl128b-1.c: New test.
* gcc.target/riscv/rvv/autovec/zve64x_zvl128b-2.c: New test.

---
 gcc/config/riscv/autovec.md   |  49 
 gcc/config/riscv/riscv-protos.h   |   1 +
 gcc/config/riscv/riscv-v.cc   |  51 +
 gcc/config/riscv/riscv.cc |  31 -
 gcc/config/riscv/vector.md|   4 +-
 .../riscv/rvv/autovec/fixed-vlmax-1.c |  24 
 .../rvv/autovec/partial/single_rgroup-1.c |   8 ++
 .../rvv/autovec/partial/single_rgroup-1.h | 106 ++
 .../rvv/autovec/partial/single_rgroup_run-1.c |  19 
 .../gcc.target/riscv/rvv/autovec/scalable-1.c |  17 +++
 .../gcc.target/riscv/rvv/autovec/template-1.h |  68 +++
 .../gcc.target/riscv/rvv/autovec/v-1.c|  11 ++
 .../gcc.target/riscv/rvv/autovec/v-2.c|   6 +
 .../gcc.target/riscv/rvv/autovec/zve32f-1.c   |   6 +
 .../gcc.target/riscv/rvv/autovec/zve32f-2.c   |   6 +
 .../gcc.target/riscv/rvv/autovec/zve32f-3.c   |   6 +
 .../riscv/rvv/autovec/zve32f_zvl128b-1.c  |   6 +
 .../riscv/rvv/autovec/zve32f_zvl128b-2.c  |   6 +
 .../gcc.target/riscv/rvv/autovec/zve32x-1.c   |   6 +
 .../gcc.target/riscv/rvv/autovec/zve32x-2.c   |   6 +
 .../gcc.target/riscv/rvv/autovec/zve32x-3.c   |   6 +
 .../riscv/rvv/autovec/zve32x_zvl128b-1.c  |   6 +
 .../riscv/rvv/autovec/zve32x_zvl128b-2.c  |   6 +
 .../gcc.target/riscv/rvv/autovec/zve64d-1.c   |   6 +
 .../gcc.target/riscv/rvv/autovec/zve64d-2.c   |   6 +
 .../gcc.target/riscv/rvv/autovec/zve64d-3.c   |   6 +
 .../riscv/rvv/autovec/zve64d_zvl128b-1.c  |   6 +
 .../riscv/rvv/autovec/zve64d_zvl128b-2.c  |   6 +
 .../gcc.target/riscv/rvv/autovec/zve64f-1.c   |   6 +
 .../gcc.target/riscv/rvv/autovec/zve64f-2.c   |   6 +
 .../gcc.target/riscv/rvv/autovec/zve64f-3.c   |   6 +
 .../riscv/rvv/autovec/zve64f_zvl128b-1.c  |   6 +
 .../riscv/rvv/autovec/zve64f_zvl128b-2.c  |   6 +
 .../gcc.target/riscv/rvv/autovec/zve64x-1.c   |   6 +
 .../gcc.target/riscv/rvv/autovec/zve64x-2.c   |   6 +
 .../gcc.target/riscv/rvv/autovec/zve64x-3.c   |   6 +
 .../riscv/rvv/autovec/zve64x_zvl128b-1.c  |   6 +
 .../riscv/rvv/autovec/zve64x_zvl128b-2.c  |   6 +
 gcc

[patch, fortran] PR109662 Namelist input with comma after name accepted

2023-05-05 Thread Jerry D via Gcc-patches

The attached patch adds a check for the invalid comma and emits a 
runtime error if -std=f95,f2003,f2018 are specified at compile time.


Attached patch includes a new test case.

Regression tested on x86_64-linux-gnu.

OK for mainline?

Regards,

Jerry
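The check described above can be sketched in C++ as a standalone function. Everything here is hypothetical. `check_after_group_name` and `nml_check` are illustrative names. `allow_gnu` stands in for `compile_options.allow_std & GFC_STD_GNU`, which is set under the default -std=gnu. The separator set is a simplification of libgfortran's `is_separator`.

```cpp
#include <cassert>
#include <cctype>
#include <cstddef>
#include <string>

enum class nml_check { ok, comma_rejected, not_separated };

// Hypothetical sketch: inspect the character that follows "&name" at the
// start of a namelist record, as the new list_read.c check does.
nml_check check_after_group_name(const std::string &record, bool allow_gnu) {
  assert(!record.empty() && record[0] == '&');
  std::size_t i = 1;
  while (i < record.size() &&
         (std::isalnum(static_cast<unsigned char>(record[i])) ||
          record[i] == '_'))
    ++i;
  // Treat the end of the record like a newline.
  char c = i < record.size() ? record[i] : '\n';
  // Strict standard modes reject a comma right after the group name.
  if (c == ',' && !allow_gnu)
    return nml_check::comma_rejected;
  // Simplified separator set (assumption, not libgfortran's exact list).
  if (c == ' ' || c == '\t' || c == '\n' || c == ',' || c == '/' || c == '!')
    return nml_check::ok;
  return nml_check::not_separated;
}
```

With `allow_gnu` false, the input used by pr109662.f90, `"&stuff, n = 759/"`, is rejected at the comma, while `"&stuff n = 759/"` is accepted.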

Author: Jerry DeLisle 
Date:   Fri May 5 20:12:25 2023 -0700

Fortran: Namelist read with invalid input accepted.

PR fortran/109662

libgfortran/ChangeLog:

* io/list_read.c: Add a check for a comma after a namelist
name in read input. Issue a runtime error message.

gcc/testsuite/ChangeLog:

* gfortran.dg/pr109662.f90: New test.
diff --git a/gcc/testsuite/gfortran.dg/pr109662.f90 b/gcc/testsuite/gfortran.dg/pr109662.f90
new file mode 100644
index 000..988cfab73cc
--- /dev/null
+++ b/gcc/testsuite/gfortran.dg/pr109662.f90
@@ -0,0 +1,15 @@
+! { dg-do run }
+! { dg-options "-std=f2003" }
+! PR109662 a comma after namelist name accepted on input. 
+program testnmlread
+  implicit none
+  character(16) :: list = '&stuff, n = 759/'
+  character(100)::message
+  integer   :: n, ioresult
+  namelist/stuff/n
+  message = ""
+  ioresult = 0
+  n = 99
+  read(list,nml=stuff,iostat=ioresult)
+  if (ioresult == 0) STOP 13
+end program testnmlread
diff --git a/libgfortran/io/list_read.c b/libgfortran/io/list_read.c
index 109313c15b1..78bfd9e8787 100644
--- a/libgfortran/io/list_read.c
+++ b/libgfortran/io/list_read.c
@@ -3596,8 +3596,12 @@ find_nml_name:
   if (dtp->u.p.nml_read_error)
     goto find_nml_name;
 
-  /* A trailing space is required, we give a little latitude here, 10.9.1.  */
+  /* A trailing space is required, we allow a comma with std=gnu.  */
   c = next_char (dtp);
+  if (c == ',' && !(compile_options.allow_std & GFC_STD_GNU))
+    generate_error (&dtp->common, LIBERROR_READ_VALUE,
+                    "Comma after namelist name not allowed");
+
   if (!is_separator(c) && c != '!')
     {
       unget_char (dtp, c);

New Croatian PO file for 'gcc' (version 13.1.0)

2023-05-05 Thread Translation Project Robot

Hello, gentle maintainer.

This is a message from the Translation Project robot.

A revised PO file for textual domain 'gcc' has been submitted
by the Croatian team of translators.  The file is available at:

https://translationproject.org/latest/gcc/hr.po

(This file, 'gcc-13.1.0.hr.po', has just now been sent to you in
a separate email.)

All other PO files for your package are available in:

https://translationproject.org/latest/gcc/

Please consider including all of these in your next release, whether
official or a pretest.

Whenever you have a new distribution with a new version number ready,
containing a newer POT file, please send the URL of that distribution
tarball to the address below.  The tarball may be just a pretest or a
snapshot, it does not even have to compile.  It is just used by the
translators when they need some extra translation context.

The following HTML page has been updated:

https://translationproject.org/domain/gcc.html

If any question arises, please contact the translation coordinator.

Thank you for all your work,

The Translation Project robot, in the
name of your translation coordinator.

RE: Re: [PATCH] machine_mode type size: Extend enum size from 8-bit to 16-bit

2023-05-05 Thread Li, Pan2 via Gcc-patches

I have combined all the changes mentioned previously into a single patch (attached). Please 
help review it for any mistakes.

Pan

-----Original Message-----
From: Li, Pan2 
Sent: Saturday, May 6, 2023 10:20 AM
To: Kito Cheng 
Cc: juzhe.zh...@rivai.ai; rguenther ; richard.sandiford 
; jeffreyalaw ; gcc-patches 
; palmer ; jakub 
Subject: RE: Re: [PATCH] machine_mode type size: Extend enum size from 8-bit to 
16-bit

Yes, that makes sense, will have a try and keep you posted.

Pan

-----Original Message-----
From: Kito Cheng 
Sent: Saturday, May 6, 2023 10:19 AM
To: Li, Pan2 
Cc: juzhe.zh...@rivai.ai; rguenther ; richard.sandiford 
; jeffreyalaw ; gcc-patches 
; palmer ; jakub 
Subject: Re: Re: [PATCH] machine_mode type size: Extend enum size from 8-bit to 
16-bit

I think x86 first? The main thing we want to make sure of is that this change 
does not unduly affect targets that do not really need a 16-bit machine_mode.


On Sat, May 6, 2023 at 10:12 AM Li, Pan2 via Gcc-patches 
 wrote:
>
> Sure thing, I will pick them all together and trigger the test again (I will 
> send out the overall diff before starting, to make sure my understanding is 
> correct). BTW, which target do we prefer first, x86 or RISC-V?
>
> Pan
>
> From: juzhe.zh...@rivai.ai 
> Sent: Saturday, May 6, 2023 10:00 AM
> To: kito.cheng ; Li, Pan2 
> Cc: rguenther ; richard.sandiford 
> ; jeffreyalaw ; 
> gcc-patches ; palmer ; 
> jakub 
> Subject: Re: Re: [PATCH] machine_mode type size: Extend enum size from 
> 8-bit to 16-bit
>
> Yeah, you should also swap mode and code in rtx_def according to 
> Richard's suggestion, since it will not change the rtx_def data structure.
>
> I think the only problem is the mode in tree data structure.
> 
> juzhe.zh...@rivai.ai
>
> From: Kito Cheng
> Date: 2023-05-06 09:53
> To: Li, Pan2
> CC: Richard Biener; 钟居哲; richard.sandiford; Jeff Law; gcc-patches;
> palmer; jakub
> Subject: Re: Re: [PATCH] machine_mode type size: Extend enum size from 
> 8-bit to 16-bit
> Hi Pan:
>
> Could you try to apply the following diff and measure again? This 
> keeps the size of tree_type_common unchanged.
>
>
> sizeof tree_type_common= 128 (mode = 8 bit)
> sizeof tree_type_common= 136 (mode = 16 bit)
> sizeof tree_type_common= 128 (mode = 8 bit w/ this diff)
>
> diff --git a/gcc/tree-core.h b/gcc/tree-core.h
> index af795aa81f98..b8ccfa407ed9 100644
> --- a/gcc/tree-core.h
> +++ b/gcc/tree-core.h
> @@ -1680,6 +1680,8 @@ struct GTY(()) tree_type_common {
>   tree attributes;
>   unsigned int uid;
>
> +  ENUM_BITFIELD(machine_mode) mode : 16;
> +
>   unsigned int precision : 10;
>   unsigned no_force_blk_flag : 1;
>   unsigned needs_constructing_flag : 1;
> @@ -1687,7 +1689,6 @@ struct GTY(()) tree_type_common {
>   unsigned restrict_flag : 1;
>   unsigned contains_placeholder_bits : 2;
>
> -  ENUM_BITFIELD(machine_mode) mode : 16;
>
>   /* TYPE_STRING_FLAG for INTEGER_TYPE and ARRAY_TYPE.
>  TYPE_CXX_ODR_P for RECORD_TYPE and UNION_TYPE.  */
> @@ -1712,7 +1713,7 @@ struct GTY(()) tree_type_common {
>   unsigned empty_flag : 1;
>   unsigned indivisible_p : 1;
>   unsigned no_named_args_stdarg_p : 1;
> -  unsigned spare : 15;
> +  unsigned spare : 7;
>
>   alias_set_type alias_set;
>   tree pointer_to;
>
> On Sat, May 6, 2023 at 9:10 AM Li, Pan2 via Gcc-patches 
> <gcc-patches@gcc.gnu.org> wrote:
> >
> > Yes, I totally agree that the numbers can only be accurate up to a point. 
> > Here are the corresponding memory bytes allocated for the x86 target.
> >
> > Bytes allocated with O2:
> > ----------------------------------------------------
> > Benchmark       | upstream      | with this PATCH
> > ----------------------------------------------------
> > 400.perlbench   | 25286185160   | 25176544846  ~0.0%
> > 401.bzip2       | 1429883731    | 1391040027   -2.7%
> > 403.gcc         | 55023568981   | 54798890746  ~0.0%
> > 429.mcf         | 1360975660    | 1321537710   -2.9%
> > 445.gobmk       | 12791636502   | 12666523431  -1.0%
> > 456.hmmer       | 9354433652    | 9279189174   ~0.0%
> > 458.sjeng       | 1991260562    | 1944031904   -2.4%
> > 462.libquantum  | 1725112078    | 1684213981   -2.4%
> > 464.h264ref     | 8597673515    | 8528855778   ~0.0%
> > 471.omnetpp     | 37613034778   | 37432278047  ~0.0%
> > 473.astar       | 3817295518    | 3772460508   -1.2%
> > 483.xalancbmk   | 149418776991  | 148545162207 ~0.0%
> >
> > Bytes allocated with Ofast + funroll-loops:
>
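The percentage column in the O2 table above appears to be the relative change (patched - upstream) / upstream, rounded to one decimal place. The rows for 401.bzip2, 429.mcf, 445.gobmk and 473.astar all reproduce that way. A small C++ sketch of that arithmetic follows; the function names are illustrative only.

```cpp
#include <cassert>
#include <cmath>
#include <cstdint>

// Relative change of the patched build against upstream, in percent
// (negative means fewer bytes allocated with the patch).
double percent_change(std::int64_t upstream, std::int64_t patched) {
  assert(upstream > 0);
  return 100.0 * static_cast<double>(patched - upstream) /
         static_cast<double>(upstream);
}

// Round to one decimal place, like the table's percentage column.
double round1(double pct) { return std::round(pct * 10.0) / 10.0; }
```

For 401.bzip2, `round1(percent_change(1429883731, 1391040027))` gives -2.7, matching the table. The rows marked ~0.0% are small but not zero: 400.perlbench, for example, works out to about -0.4%.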

Re: [PATCH V2] RISC-V: Fix incorrect demand info merge in local vsetvli optimization [PR109748]

2023-05-05 Thread Kito Cheng via Gcc-patches

Thanks, committed to trunk!

On Fri, May 5, 2023 at 10:13 PM  wrote:
>
> From: Juzhe-Zhong 
>
> This patch is fixing my recent optimization patch:
> https://github.com/gcc-mirror/gcc/commit/d51f2456ee51bd59a79b4725ca0e488c25260bbf
>
> In that patch, new_info = parse_insn (i) is not correct.
> Consider the following case:
>
> vsetvli a5,a4, e8,m1
> ..
> vsetvli zero,a5, e32, m4
> vle8.v
> vmacc.vv
> ...
>
> Since we have backward demand fusion in Phase 1, the real demand of 
> "vle8.v" is e32, m4.
> However, parse_insn (vle8.v) gives e8, m1, which is not correct.
>
> So this patch changes new_info = new_info.parse_insn (i)
> into:
>
> vector_insn_info new_info = m_vector_manager->vector_insn_infos[i->uid ()];
>
> This way, we can correctly optimize the code into:
>
> vsetvli a5,a4, e32, m4
> ..
> .. (vsetvli zero,a5, e32, m4 is removed)
> vle8.v
> vmacc.vv
>
> Since m_vector_manager->vector_insn_infos is a member variable of the
> pass_vsetvl class, we remove the static function "local_eliminate_vsetvl_insn"
> and make it a member function of the pass_vsetvl class.
>
> PR target/109748
>
> gcc/ChangeLog:
>
> * config/riscv/riscv-vsetvl.cc (local_eliminate_vsetvl_insn): Remove 
> it.
> (pass_vsetvl::local_eliminate_vsetvl_insn): New function.
>
> gcc/testsuite/ChangeLog:
>
> * gcc.target/riscv/rvv/vsetvl/pr109748.c: New test.
>
> ---
>  gcc/config/riscv/riscv-vsetvl.cc  | 102 ++
>  .../gcc.target/riscv/rvv/vsetvl/pr109748.c|  36 +++
>  2 files changed, 93 insertions(+), 45 deletions(-)
>  create mode 100644 gcc/testsuite/gcc.target/riscv/rvv/vsetvl/pr109748.c
>
> diff --git a/gcc/config/riscv/riscv-vsetvl.cc b/gcc/config/riscv/riscv-vsetvl.cc
> index 39b4d21210b..e1efd7b1c40 100644
> --- a/gcc/config/riscv/riscv-vsetvl.cc
> +++ b/gcc/config/riscv/riscv-vsetvl.cc
> @@ -1056,51 +1056,6 @@ change_vsetvl_insn (const insn_info *insn, const vector_insn_info &info)
>    change_insn (rinsn, new_pat);
>  }
>
> -static void
> -local_eliminate_vsetvl_insn (const vector_insn_info &dem)
> -{
> -  const insn_info *insn = dem.get_insn ();
> -  if (!insn || insn->is_artificial ())
> -    return;
> -  rtx_insn *rinsn = insn->rtl ();
> -  const bb_info *bb = insn->bb ();
> -  if (vsetvl_insn_p (rinsn))
> -    {
> -      rtx vl = get_vl (rinsn);
> -      for (insn_info *i = insn->next_nondebug_insn ();
> -           real_insn_and_same_bb_p (i, bb); i = i->next_nondebug_insn ())
> -        {
> -          if (i->is_call () || i->is_asm ()
> -              || find_access (i->defs (), VL_REGNUM)
> -              || find_access (i->defs (), VTYPE_REGNUM))
> -            return;
> -
> -          if (has_vtype_op (i->rtl ()))
> -            {
> -              if (!vsetvl_discard_result_insn_p (PREV_INSN (i->rtl ())))
> -                return;
> -              rtx avl = get_avl (i->rtl ());
> -              if (avl != vl)
> -                return;
> -              set_info *def = find_access (i->uses (), REGNO (avl))->def ();
> -              if (def->insn () != insn)
> -                return;
> -
> -              vector_insn_info new_info;
> -              new_info.parse_insn (i);
> -              if (!new_info.skip_avl_compatible_p (dem))
> -                return;
> -
> -              new_info.set_avl_info (dem.get_avl_info ());
> -              new_info = dem.merge (new_info, LOCAL_MERGE);
> -              change_vsetvl_insn (insn, new_info);
> -              eliminate_insn (PREV_INSN (i->rtl ()));
> -              return;
> -            }
> -        }
> -    }
> -}
> -
>  static bool
>  source_equal_p (insn_info *insn1, insn_info *insn2)
>  {
> @@ -2672,6 +2627,7 @@ private:
>    void pre_vsetvl (void);
>
>    /* Phase 5.  */
> +  void local_eliminate_vsetvl_insn (const vector_insn_info &) const;
>    void cleanup_insns (void) const;
>
>    /* Phase 6.  */
> @@ -3993,6 +3949,62 @@ pass_vsetvl::pre_vsetvl (void)
>  commit_edge_insertions ();
>  }
>
> +/* Local user vsetvl optimization:
> +
> +   Case 1:
> +     vsetvl a5,a4,e8,mf8
> +     ...
> +     vsetvl zero,a5,e8,mf8 --> Eliminate directly.
> +
> +   Case 2:
> +     vsetvl a5,a4,e8,mf8    --> vsetvl a5,a4,e32,mf2
> +     ...
> +     vsetvl zero,a5,e32,mf2 --> Eliminate directly.  */
> +void
> +pass_vsetvl::local_eliminate_vsetvl_insn (const vector_insn_info &dem) const
> +{
> +  const insn_info *insn = dem.get_insn ();
> +  if (!insn || insn->is_artificial ())
> +    return;
> +  rtx_insn *rinsn = insn->rtl ();
> +  const bb_info *bb = insn->bb ();
> +  if (vsetvl_insn_p (rinsn))
> +    {
> +      rtx vl = get_vl (rinsn);
> +      for (insn_info *i = insn->next_nondebug_insn ();
> +           real_insn_and_same_bb_p (i, bb); i = i->next_nondebug_insn ())
> +        {
> +          if (i->is_call () || i->is_asm ()
> +              || find_access (i->defs (), VL_REGNUM)
> +              || find_access (i->defs (), VTYPE_REGNUM))
> +            return;
> +
> +          if (has_vtype_op (i->rtl
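Case 2 in the patch comment relies on the second vsetvl becoming redundant once the first one takes over its vtype. One way to see why that is legal is sketched below in C++. The second instruction's AVL is the first one's vl output (a5), and a5 <= VLMAX of the first vtype. The RVV rules give vl = AVL whenever AVL <= VLMAX. So if both vtypes share the same SEW/LMUL ratio, and therefore the same VLMAX, the second vsetvl just recomputes a5.

The types and function names are hypothetical. The model ignores everything else the pass checks: calls, asm, other VL/VTYPE definitions, and the merged demand info. VLEN = 128 in the usage is an arbitrary assumption.

```cpp
#include <cassert>

// Hypothetical model of a vtype: SEW in bits and LMUL as the fraction
// lmul_num / lmul_den (m4 is 4/1, mf2 is 1/2).
struct vtype {
  unsigned sew;
  unsigned lmul_num;
  unsigned lmul_den;
};

// VLMAX = VLEN * LMUL / SEW.
unsigned vlmax(unsigned vlen, const vtype &t) {
  assert(t.sew > 0 && t.lmul_num > 0 && t.lmul_den > 0);
  return vlen * t.lmul_num / (t.sew * t.lmul_den);
}

// vl written by vsetvli for a given AVL.  vl = AVL whenever AVL <= VLMAX;
// for larger AVLs this sketch simply picks VLMAX, one result the spec allows.
unsigned vsetvl_vl(unsigned avl, unsigned max) { return avl <= max ? avl : max; }

// Two vtypes give the same VLMAX exactly when their SEW/LMUL ratios match.
// SEW/LMUL = sew * lmul_den / lmul_num; compare cross-multiplied to stay exact.
bool same_sew_lmul_ratio(const vtype &a, const vtype &b) {
  return a.sew * a.lmul_den * b.lmul_num == b.sew * b.lmul_den * a.lmul_num;
}
```

With these definitions, e8,m1 and e32,m4 (the PR109748 example) share a ratio, as do e8,mf8 and e32,mf2 (Case 2). With VLEN = 128, a first vsetvl at e8,m1 with AVL 100 yields 16, and re-running it at e32,m4 with AVL 16 yields 16 again. At e32,m1 it would yield 4.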

RE: Re: [PATCH] machine_mode type size: Extend enum size from 8-bit to 16-bit

2023-05-05 Thread Li, Pan2 via Gcc-patches

Yes, that makes sense, will have a try and keep you posted.

Pan

-----Original Message-----
From: Kito Cheng  
Sent: Saturday, May 6, 2023 10:19 AM
To: Li, Pan2 
Cc: juzhe.zh...@rivai.ai; rguenther ; richard.sandiford 
; jeffreyalaw ; gcc-patches 
; palmer ; jakub 
Subject: Re: Re: [PATCH] machine_mode type size: Extend enum size from 8-bit to 
16-bit

I think x86 first? The main thing we want to make sure of is that this change 
does not unduly affect targets that do not really need a 16-bit machine_mode.


On Sat, May 6, 2023 at 10:12 AM Li, Pan2 via Gcc-patches 
 wrote:
>
> Sure thing, I will pick them all together and trigger the test again (I will 
> send out the overall diff before starting, to make sure my understanding is 
> correct). BTW, which target do we prefer first, x86 or RISC-V?
>
> Pan
>
> From: juzhe.zh...@rivai.ai 
> Sent: Saturday, May 6, 2023 10:00 AM
> To: kito.cheng ; Li, Pan2 
> Cc: rguenther ; richard.sandiford 
> ; jeffreyalaw ; 
> gcc-patches ; palmer ; 
> jakub 
> Subject: Re: Re: [PATCH] machine_mode type size: Extend enum size from 
> 8-bit to 16-bit
>
> Yeah, you should also swap mode and code in rtx_def according to 
> Richard's suggestion, since it will not change the rtx_def data structure.
>
> I think the only problem is the mode in tree data structure.
> 
> juzhe.zh...@rivai.ai
>
> From: Kito Cheng
> Date: 2023-05-06 09:53
> To: Li, Pan2
> CC: Richard Biener; 钟居哲; richard.sandiford; Jeff Law; gcc-patches;
> palmer; jakub
> Subject: Re: Re: [PATCH] machine_mode type size: Extend enum size from 
> 8-bit to 16-bit
> Hi Pan:
>
> Could you try to apply the following diff and measure again? This 
> keeps the size of tree_type_common unchanged.
>
>
> sizeof tree_type_common= 128 (mode = 8 bit)
> sizeof tree_type_common= 136 (mode = 16 bit)
> sizeof tree_type_common= 128 (mode = 8 bit w/ this diff)
>
> diff --git a/gcc/tree-core.h b/gcc/tree-core.h
> index af795aa81f98..b8ccfa407ed9 100644
> --- a/gcc/tree-core.h
> +++ b/gcc/tree-core.h
> @@ -1680,6 +1680,8 @@ struct GTY(()) tree_type_common {
>   tree attributes;
>   unsigned int uid;
>
> +  ENUM_BITFIELD(machine_mode) mode : 16;
> +
>   unsigned int precision : 10;
>   unsigned no_force_blk_flag : 1;
>   unsigned needs_constructing_flag : 1;
> @@ -1687,7 +1689,6 @@ struct GTY(()) tree_type_common {
>   unsigned restrict_flag : 1;
>   unsigned contains_placeholder_bits : 2;
>
> -  ENUM_BITFIELD(machine_mode) mode : 16;
>
>   /* TYPE_STRING_FLAG for INTEGER_TYPE and ARRAY_TYPE.
>  TYPE_CXX_ODR_P for RECORD_TYPE and UNION_TYPE.  */
> @@ -1712,7 +1713,7 @@ struct GTY(()) tree_type_common {
>   unsigned empty_flag : 1;
>   unsigned indivisible_p : 1;
>   unsigned no_named_args_stdarg_p : 1;
> -  unsigned spare : 15;
> +  unsigned spare : 7;
>
>   alias_set_type alias_set;
>   tree pointer_to;
>
> On Sat, May 6, 2023 at 9:10 AM Li, Pan2 via Gcc-patches 
> <gcc-patches@gcc.gnu.org> wrote:
> >
> > Yes, I totally agree that the numbers can only be accurate up to a point. 
> > Here are the corresponding memory bytes allocated for the x86 target.
> >
> > Bytes allocated with O2:
> > ----------------------------------------------------
> > Benchmark       | upstream      | with this PATCH
> > ----------------------------------------------------
> > 400.perlbench   | 25286185160   | 25176544846  ~0.0%
> > 401.bzip2       | 1429883731    | 1391040027   -2.7%
> > 403.gcc         | 55023568981   | 54798890746  ~0.0%
> > 429.mcf         | 1360975660    | 1321537710   -2.9%
> > 445.gobmk       | 12791636502   | 12666523431  -1.0%
> > 456.hmmer       | 9354433652    | 9279189174   ~0.0%
> > 458.sjeng       | 1991260562    | 1944031904   -2.4%
> > 462.libquantum  | 1725112078    | 1684213981   -2.4%
> > 464.h264ref     | 8597673515    | 8528855778   ~0.0%
> > 471.omnetpp     | 37613034778   | 37432278047  ~0.0%
> > 473.astar       | 3817295518    | 3772460508   -1.2%
> > 483.xalancbmk   | 149418776991  | 148545162207 ~0.0%
> >
> > Bytes allocated with Ofast + funroll-loops:
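> > ----------------------------------------------------
> > Benchmark       | upstream      | with this PATCH
> > ----------------------------------------------------
> > 400.perlbench   | 30438407499   | 30574152897  ~0.0%
> > 401.bzip2       | 2277114519    | 2319432664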

Re: Re: [PATCH] machine_mode type size: Extend enum size from 8-bit to 16-bit

2023-05-05 Thread Kito Cheng via Gcc-patches

I think x86 first? The main thing we want to make sure of is that this
change does not unduly affect targets that do not really need a 16-bit
machine_mode.
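Kito's suggested tree-core.h change, quoted below, keeps tree_type_common at its old size. It does this by moving the 16-bit mode field next to uid and trimming spare from 15 to 7 bits, so the 8 bits mode gains are given back. The C++ sketch below shows the same effect on a hypothetical, much smaller struct. It is not the real tree_type_common; the field names are borrowed only for readability, and plain unsigned int bit-fields replace ENUM_BITFIELD(machine_mode).

The sketch assumes the common ABI rule (used, e.g., by the x86-64 System V ABI) that a bit-field is never split across the storage unit of its declared type. Exact sizes remain ABI-dependent; with GCC on x86-64 the three structs should come out at 24, 32 and 24 bytes.

```cpp
// Baseline: 8-bit mode.  10 + 8 + 14 = 32 bits fill one storage unit.
struct type_common_mode8 {
  void *attributes;
  unsigned int uid;
  unsigned int precision : 10;
  unsigned int mode : 8;
  unsigned int spare : 14;
  void *pointer_to;
};

// Widen mode in place: 10 + 16 = 26 bits are used, so the 14 spare bits no
// longer fit in the same 32-bit unit.  They start a new unit, and alignment
// padding before pointer_to grows the struct.
struct type_common_mode16_in_place {
  void *attributes;
  unsigned int uid;
  unsigned int precision : 10;
  unsigned int mode : 16;
  unsigned int spare : 14;
  void *pointer_to;
};

// Move mode right after uid and give back the 8 bits it gained by trimming
// spare: 16 + 10 + 6 = 32 bits, one storage unit again.
struct type_common_mode16_repacked {
  void *attributes;
  unsigned int uid;
  unsigned int mode : 16;
  unsigned int precision : 10;
  unsigned int spare : 6;
  void *pointer_to;
};
```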


On Sat, May 6, 2023 at 10:12 AM Li, Pan2 via Gcc-patches
 wrote:
>
> Sure thing, I will pick them all together and trigger the test again (I will 
> send out the overall diff before starting, to make sure my understanding is 
> correct). BTW, which target do we prefer first, x86 or RISC-V?
>
> Pan
>
> From: juzhe.zh...@rivai.ai 
> Sent: Saturday, May 6, 2023 10:00 AM
> To: kito.cheng ; Li, Pan2 
> Cc: rguenther ; richard.sandiford 
> ; jeffreyalaw ; gcc-patches 
> ; palmer ; jakub 
> 
> Subject: Re: Re: [PATCH] machine_mode type size: Extend enum size from 8-bit 
> to 16-bit
>
> Yeah, you should also swap mode and code in rtx_def according to Richard's
> suggestion, since it will not change the rtx_def data structure.
>
> I think the only problem is the mode in tree data structure.
> 
> juzhe.zh...@rivai.ai
>
> From: Kito Cheng
> Date: 2023-05-06 09:53
> To: Li, Pan2
> CC: Richard Biener; 钟居哲; richard.sandiford; Jeff Law; gcc-patches;
> palmer; jakub
> Subject: Re: Re: [PATCH] machine_mode type size: Extend enum size from 8-bit 
> to 16-bit
> Hi Pan:
>
> Could you try applying the following diff and measuring again? This
> keeps the size of tree_type_common unchanged.
>
>
> sizeof tree_type_common= 128 (mode = 8 bit)
> sizeof tree_type_common= 136 (mode = 16 bit)
> sizeof tree_type_common= 128 (mode = 16 bit w/ this diff)
>
> diff --git a/gcc/tree-core.h b/gcc/tree-core.h
> index af795aa81f98..b8ccfa407ed9 100644
> --- a/gcc/tree-core.h
> +++ b/gcc/tree-core.h
> @@ -1680,6 +1680,8 @@ struct GTY(()) tree_type_common {
>   tree attributes;
>   unsigned int uid;
>
> +  ENUM_BITFIELD(machine_mode) mode : 16;
> +
>   unsigned int precision : 10;
>   unsigned no_force_blk_flag : 1;
>   unsigned needs_constructing_flag : 1;
> @@ -1687,7 +1689,6 @@ struct GTY(()) tree_type_common {
>   unsigned restrict_flag : 1;
>   unsigned contains_placeholder_bits : 2;
>
> -  ENUM_BITFIELD(machine_mode) mode : 16;
>
>   /* TYPE_STRING_FLAG for INTEGER_TYPE and ARRAY_TYPE.
>  TYPE_CXX_ODR_P for RECORD_TYPE and UNION_TYPE.  */
> @@ -1712,7 +1713,7 @@ struct GTY(()) tree_type_common {
>   unsigned empty_flag : 1;
>   unsigned indivisible_p : 1;
>   unsigned no_named_args_stdarg_p : 1;
> -  unsigned spare : 15;
> +  unsigned spare : 7;
>
>   alias_set_type alias_set;
>   tree pointer_to;
>
> On Sat, May 6, 2023 at 9:10 AM Li, Pan2 via Gcc-patches
> <gcc-patches@gcc.gnu.org> wrote:
> >
> > Yes, I totally agree that the numbers can only be so accurate. Here are
> > the updated bytes-allocated figures for the x86 target.
> >
> > Bytes allocated with O2:
> > ------------------------------------------------------
> > Benchmark       | upstream      | with this PATCH
> > ------------------------------------------------------
> > 400.perlbench   | 25286185160   | 25176544846    ~0.0%
> > 401.bzip2       | 1429883731    | 1391040027     -2.7%
> > 403.gcc         | 55023568981   | 54798890746    ~0.0%
> > 429.mcf         | 1360975660    | 1321537710     -2.9%
> > 445.gobmk       | 12791636502   | 12666523431    -1.0%
> > 456.hmmer       | 9354433652    | 9279189174     ~0.0%
> > 458.sjeng       | 1991260562    | 1944031904     -2.4%
> > 462.libquantum  | 1725112078    | 1684213981     -2.4%
> > 464.h264ref     | 8597673515    | 8528855778     ~0.0%
> > 471.omnetpp     | 37613034778   | 37432278047    ~0.0%
> > 473.astar       | 3817295518    | 3772460508     -1.2%
> > 483.xalancbmk   | 149418776991  | 148545162207   ~0.0%
> >
> > Bytes allocated with Ofast + funroll-loops:
> > ------------------------------------------------------
> > Benchmark       | upstream      | with this PATCH
> > ------------------------------------------------------
> > 400.perlbench   | 30438407499   | 30574152897    ~0.0%
> > 401.bzip2       | 2277114519    | 2319432664     +1.9%
> > 403.gcc         | 64499664264   | 64781232731    ~0.0%
> > 429.mcf         | 1361486758    | 1399942116     +2.8%
> > 445.gobmk       | 15258056111   | 15396801542    +1.0%
> > 456.hmmer       | 10896615649   | 10936223486    ~0.0%
> > 458.sjeng       | 2592620709    | 2641687496     +1.9%
> > 46

RE: Re: [PATCH] machine_mode type size: Extend enum size from 8-bit to 16-bit

2023-05-05 Thread Li, Pan2 via Gcc-patches

Sure thing, I will pick them all up together and trigger the test again
(I will send out the overall diff before starting, to make sure my
understanding is correct). BTW, which target do we prefer first: x86 or RISC-V?

Pan

From: juzhe.zh...@rivai.ai 
Sent: Saturday, May 6, 2023 10:00 AM
To: kito.cheng ; Li, Pan2 
Cc: rguenther ; richard.sandiford 
; jeffreyalaw ; gcc-patches 
; palmer ; jakub 
Subject: Re: Re: [PATCH] machine_mode type size: Extend enum size from 8-bit to 
16-bit

Yeah, you should also swap mode and code in rtx_def, following Richard's
suggestion, since that will not change the size of the rtx_def data structure.

I think the only problem is the mode in the tree data structure.

juzhe.zh...@rivai.ai

From: Kito Cheng
Date: 2023-05-06 09:53
To: Li, Pan2
CC: Richard Biener; 钟居哲; richard.sandiford; Jeff Law; gcc-patches;
palmer; jakub
Subject: Re: Re: [PATCH] machine_mode type size: Extend enum size from 8-bit to 
16-bit
Hi Pan:

Could you try applying the following diff and measuring again? This
keeps the size of tree_type_common unchanged.


sizeof tree_type_common= 128 (mode = 8 bit)
sizeof tree_type_common= 136 (mode = 16 bit)
sizeof tree_type_common= 128 (mode = 16 bit w/ this diff)

diff --git a/gcc/tree-core.h b/gcc/tree-core.h
index af795aa81f98..b8ccfa407ed9 100644
--- a/gcc/tree-core.h
+++ b/gcc/tree-core.h
@@ -1680,6 +1680,8 @@ struct GTY(()) tree_type_common {
  tree attributes;
  unsigned int uid;

+  ENUM_BITFIELD(machine_mode) mode : 16;
+
  unsigned int precision : 10;
  unsigned no_force_blk_flag : 1;
  unsigned needs_constructing_flag : 1;
@@ -1687,7 +1689,6 @@ struct GTY(()) tree_type_common {
  unsigned restrict_flag : 1;
  unsigned contains_placeholder_bits : 2;

-  ENUM_BITFIELD(machine_mode) mode : 16;

  /* TYPE_STRING_FLAG for INTEGER_TYPE and ARRAY_TYPE.
 TYPE_CXX_ODR_P for RECORD_TYPE and UNION_TYPE.  */
@@ -1712,7 +1713,7 @@ struct GTY(()) tree_type_common {
  unsigned empty_flag : 1;
  unsigned indivisible_p : 1;
  unsigned no_named_args_stdarg_p : 1;
-  unsigned spare : 15;
+  unsigned spare : 7;

  alias_set_type alias_set;
  tree pointer_to;

On Sat, May 6, 2023 at 9:10 AM Li, Pan2 via Gcc-patches
<gcc-patches@gcc.gnu.org> wrote:
>
> Yes, I totally agree that the numbers can only be so accurate. Here are
> the updated bytes-allocated figures for the x86 target.
>
> Bytes allocated with O2:
> ------------------------------------------------------
> Benchmark       | upstream      | with this PATCH
> ------------------------------------------------------
> 400.perlbench   | 25286185160   | 25176544846    ~0.0%
> 401.bzip2       | 1429883731    | 1391040027     -2.7%
> 403.gcc         | 55023568981   | 54798890746    ~0.0%
> 429.mcf         | 1360975660    | 1321537710     -2.9%
> 445.gobmk       | 12791636502   | 12666523431    -1.0%
> 456.hmmer       | 9354433652    | 9279189174     ~0.0%
> 458.sjeng       | 1991260562    | 1944031904     -2.4%
> 462.libquantum  | 1725112078    | 1684213981     -2.4%
> 464.h264ref     | 8597673515    | 8528855778     ~0.0%
> 471.omnetpp     | 37613034778   | 37432278047    ~0.0%
> 473.astar       | 3817295518    | 3772460508     -1.2%
> 483.xalancbmk   | 149418776991  | 148545162207   ~0.0%
>
> Bytes allocated with Ofast + funroll-loops:
> ------------------------------------------------------
> Benchmark       | upstream      | with this PATCH
> ------------------------------------------------------
> 400.perlbench   | 30438407499   | 30574152897    ~0.0%
> 401.bzip2       | 2277114519    | 2319432664     +1.9%
> 403.gcc         | 64499664264   | 64781232731    ~0.0%
> 429.mcf         | 1361486758    | 1399942116     +2.8%
> 445.gobmk       | 15258056111   | 15396801542    +1.0%
> 456.hmmer       | 10896615649   | 10936223486    ~0.0%
> 458.sjeng       | 2592620709    | 2641687496     +1.9%
> 462.libquantum  | 1814487525    | 1854518500     +2.2%
> 464.h264ref     | 13528736878   | 13614517066    ~0.0%
> 471.omnetpp     | 38721066702   | 38910524667    ~0.0%
> 473.astar       | 3924015756    | 3968057027     +1.1%
> 483.xalancbmk   | 165897692838  | 166843885880   ~0.0%
>
> Pan
>
>
> -----Original Message-----
> From: Richard Biener <rguent...@suse.de>
> Sent: Frida

Re: [PATCH V6] RISC-V: Enable basic RVV auto-vectorization support.

2023-05-05 Thread Kito Cheng via Gcc-patches

> diff --git a/gcc/testsuite/gcc.target/riscv/rvv/autovec/partial/single_rgroup-1.c b/gcc/testsuite/gcc.target/riscv/rvv/autovec/partial/single_rgroup-1.c
> new file mode 100644
> index 000..6384888dd03
> --- /dev/null
> +++ b/gcc/testsuite/gcc.target/riscv/rvv/autovec/partial/single_rgroup-1.c
> @@ -0,0 +1,8 @@
> +/* { dg-do compile } */
> +/* { dg-additional-options "-march=rv32gcv -mabi=ilp32d --param riscv-autovec-preference=scalable -fno-vect-cost-model -fno-tree-loop-distribute-patterns" } */
> +
> +#include "single_rgroup-1.h"
> +
> +TEST_ALL (test_1)
> +
> +/* { dg-final { scan-assembler-times {vsetvli} 10 } } */

Why scan the number of vsetvli instructions? Would you mind explaining
this testcase in more detail?

Maybe this should check something like
{ dg-final { scan-tree-dump-times "vectorized 1 loops in function" 3 "vect" } }
instead?
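As a sketch of the kind of check suggested above, a compile-only testcase could put several independent loops in separate functions and count the vectorizer's per-function message in the `vect` dump. The function names, the loop bodies and the exact option set below are hypothetical and not part of the patch; the count of 3 only holds if every loop is actually vectorized with these options, and `-fdump-tree-vect-details` must be present for the dump scan to have a file to read.

```c
/* { dg-do compile } */
/* { dg-additional-options "-O3 -march=rv32gcv -mabi=ilp32d --param riscv-autovec-preference=scalable -fno-vect-cost-model -fno-tree-loop-distribute-patterns -fdump-tree-vect-details" } */

#include <stdint.h>

/* Three independent int32_t loops, one per function, so that a
   successful run prints "vectorized 1 loops in function" three times
   in the vect dump.  */

void
loop_add (int32_t *__restrict dst, const int32_t *__restrict a,
          const int32_t *__restrict b, int n)
{
  for (int i = 0; i < n; ++i)
    dst[i] = a[i] + b[i];
}

void
loop_sub (int32_t *__restrict dst, const int32_t *__restrict a,
          const int32_t *__restrict b, int n)
{
  for (int i = 0; i < n; ++i)
    dst[i] = a[i] - b[i];
}

void
loop_scale (int32_t *__restrict dst, const int32_t *__restrict src, int n)
{
  for (int i = 0; i < n; ++i)
    dst[i] = src[i] * 3;
}

/* { dg-final { scan-tree-dump-times "vectorized 1 loops in function" 3 "vect" } } */
```

Compared with counting vsetvli, a dump-based check states the intent (the loops were vectorized) directly and does not break when the vsetvl insertion pass changes.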

> diff --git a/gcc/testsuite/gcc.target/riscv/rvv/autovec/scalable-1.c b/gcc/testsuite/gcc.target/riscv/rvv/autovec/scalable-1.c
> new file mode 100644
> index 000..e1236e678ef
> --- /dev/null
> +++ b/gcc/testsuite/gcc.target/riscv/rvv/autovec/scalable-1.c
> @@ -0,0 +1,15 @@
> +/* { dg-do compile } */
> +/* { dg-options "-march=rv32gc_zve64d -mabi=ilp32 -O3 -fno-vect-cost-model --param=riscv-autovec-preference=scalable" } */
> +
> +#include "riscv_vector.h"
> +
> +void
> +f (int32_t *__restrict f, int32_t *__restrict d, int n)
> +{
> +  for (int i = 0; i < n; ++i)
> +{
> +  f[i * 2 + 0] = 1;
> +  f[i * 2 + 1] = 2;
> +  d[i] = 3;
> +}
> +}

This test doesn't check anything?

> diff --git a/gcc/testsuite/gcc.target/riscv/rvv/autovec/v-1.c b/gcc/testsuite/gcc.target/riscv/rvv/autovec/v-1.c
> new file mode 100644
> index 000..e5e54d08d3e
> --- /dev/null
> +++ b/gcc/testsuite/gcc.target/riscv/rvv/autovec/v-1.c
> @@ -0,0 +1,9 @@
> +/* { dg-do compile } */
> +/* { dg-options "-march=rv32gcv -mabi=ilp32d --param riscv-autovec-preference=scalable -fdump-tree-vect-details -save-temps" } */

Why -save-temps? This flag also appears in many other testcases; please
remove it if it is not necessary.

> +
> +#include "template-1.h"
> +
> +/* Currently, we don't support SLP auto-vectorization for VLA. But it's
> +   necessary that we add this testcase here to make sure such unsupported SLP
> +   auto-vectorization will not cause an ICE. We will enable "vect" checking when
> +   we support SLP auto-vectorization for VLA in the future.  */

This test doesn't check anything?


> diff --git a/gcc/testsuite/gcc.target/riscv/rvv/autovec/zve32f-1.c b/gcc/testsuite/gcc.target/riscv/rvv/autovec/zve32f-1.c
> new file mode 100644
> index 000..066d4ae7f84
> --- /dev/null
> +++ b/gcc/testsuite/gcc.target/riscv/rvv/autovec/zve32f-1.c
> @@ -0,0 +1,4 @@
> +/* { dg-do compile } */
> +/* { dg-options "-march=rv32gc_zve32f -mabi=ilp32d --param riscv-autovec-preference=scalable -fdump-tree-vect-details -save-temps" } */
> +
> +#include "template-1.h"

This test doesn't check anything?

> diff --git a/gcc/testsuite/gcc.target/riscv/rvv/autovec/zve32f-2.c b/gcc/testsuite/gcc.target/riscv/rvv/autovec/zve32f-2.c
> new file mode 100644
> index 000..9c9123d75f2
> --- /dev/null
> +++ b/gcc/testsuite/gcc.target/riscv/rvv/autovec/zve32f-2.c
> @@ -0,0 +1,5 @@
> +/* { dg-do compile } */
> +/* { dg-options "-march=rv32gc_zve32f -mabi=ilp32d --param riscv-autovec-preference=fixed-vlmax -fdump-tree-vect-details -save-temps" } */
> +
> +#include "template-1.h"
> +

This test doesn't check anything?


> diff --git a/gcc/testsuite/gcc.target/riscv/rvv/autovec/zve32f_zvl128b-1.c b/gcc/testsuite/gcc.target/riscv/rvv/autovec/zve32f_zvl128b-1.c
> new file mode 100644
> index 000..ef70c006ec5
> --- /dev/null
> +++ b/gcc/testsuite/gcc.target/riscv/rvv/autovec/zve32f_zvl128b-1.c
> @@ -0,0 +1,4 @@
> +/* { dg-do compile } */
> +/* { dg-options "-march=rv32gc_zve32f_zvl128b -mabi=ilp32d --param riscv-autovec-preference=scalable -fdump-tree-vect-details -save-temps" } */
> +
> +#include "template-1.h"

This test doesn't check anything?

> diff --git a/gcc/testsuite/gcc.target/riscv/rvv/autovec/zve32x-1.c b/gcc/testsuite/gcc.target/riscv/rvv/autovec/zve32x-1.c
> new file mode 100644
> index 000..80e69ee8e66
> --- /dev/null
> +++ b/gcc/testsuite/gcc.target/riscv/rvv/autovec/zve32x-1.c
> @@ -0,0 +1,4 @@
> +/* { dg-do compile } */
> +/* { dg-options "-march=rv32gc_zve32x -mabi=ilp32d --param riscv-autovec-preference=scalable -fdump-tree-vect-details -save-temps" } */
> +
> +#include "template-1.h"


This test doesn't check anything?

> diff --git a/gcc/testsuite/gcc.target/riscv/rvv/autovec/zve32x-2.c b/gcc/testsuite/gcc.target/riscv/rvv/autovec/zve32x-2.c
> new file mode 100644
> index 000..f7be76965ef
> --- /dev/null
> +++ b/gcc/testsuite/gcc.target/riscv/rvv/autovec/zve32x-2.c
> @@ -0,0 +1,6 @@
> +/* { dg-do compile } */
> +/* { dg-options "-march=rv32gc_zve32x -mabi=ilp32d --param riscv-autovec-preference=fixed-vlmax -fdump-tree-vect-details -save-temps" } */
> +
> +#include "template-1.h"
> +
> +

Re: Re: [PATCH] machine_mode type size: Extend enum size from 8-bit to 16-bit

2023-05-05 Thread juzhe.zh...@rivai.ai

Yeah, you should also swap mode and code in rtx_def, following Richard's
suggestion, since that will not change the size of the rtx_def data structure.

I think the only problem is the mode in the tree data structure.


juzhe.zh...@rivai.ai
 
From: Kito Cheng
Date: 2023-05-06 09:53
To: Li, Pan2
CC: Richard Biener; 钟居哲; richard.sandiford; Jeff Law; gcc-patches; palmer; jakub
Subject: Re: Re: [PATCH] machine_mode type size: Extend enum size from 8-bit to 
16-bit
Hi Pan:
 
Could you try applying the following diff and measuring again? This
keeps the size of tree_type_common unchanged.
 
 
sizeof tree_type_common= 128 (mode = 8 bit)
sizeof tree_type_common= 136 (mode = 16 bit)
sizeof tree_type_common= 128 (mode = 16 bit w/ this diff)
 
diff --git a/gcc/tree-core.h b/gcc/tree-core.h
index af795aa81f98..b8ccfa407ed9 100644
--- a/gcc/tree-core.h
+++ b/gcc/tree-core.h
@@ -1680,6 +1680,8 @@ struct GTY(()) tree_type_common {
  tree attributes;
  unsigned int uid;
 
+  ENUM_BITFIELD(machine_mode) mode : 16;
+
  unsigned int precision : 10;
  unsigned no_force_blk_flag : 1;
  unsigned needs_constructing_flag : 1;
@@ -1687,7 +1689,6 @@ struct GTY(()) tree_type_common {
  unsigned restrict_flag : 1;
  unsigned contains_placeholder_bits : 2;
 
-  ENUM_BITFIELD(machine_mode) mode : 16;
 
  /* TYPE_STRING_FLAG for INTEGER_TYPE and ARRAY_TYPE.
 TYPE_CXX_ODR_P for RECORD_TYPE and UNION_TYPE.  */
@@ -1712,7 +1713,7 @@ struct GTY(()) tree_type_common {
  unsigned empty_flag : 1;
  unsigned indivisible_p : 1;
  unsigned no_named_args_stdarg_p : 1;
-  unsigned spare : 15;
+  unsigned spare : 7;
 
  alias_set_type alias_set;
  tree pointer_to;
 
On Sat, May 6, 2023 at 9:10 AM Li, Pan2 via Gcc-patches
 wrote:
>
> Yes, I totally agree that the numbers can only be so accurate. Here are
> the updated bytes-allocated figures for the x86 target.
>
> Bytes allocated with O2:
> ------------------------------------------------------
> Benchmark       | upstream      | with this PATCH
> ------------------------------------------------------
> 400.perlbench   | 25286185160   | 25176544846    ~0.0%
> 401.bzip2       | 1429883731    | 1391040027     -2.7%
> 403.gcc         | 55023568981   | 54798890746    ~0.0%
> 429.mcf         | 1360975660    | 1321537710     -2.9%
> 445.gobmk       | 12791636502   | 12666523431    -1.0%
> 456.hmmer       | 9354433652    | 9279189174     ~0.0%
> 458.sjeng       | 1991260562    | 1944031904     -2.4%
> 462.libquantum  | 1725112078    | 1684213981     -2.4%
> 464.h264ref     | 8597673515    | 8528855778     ~0.0%
> 471.omnetpp     | 37613034778   | 37432278047    ~0.0%
> 473.astar       | 3817295518    | 3772460508     -1.2%
> 483.xalancbmk   | 149418776991  | 148545162207   ~0.0%
>
> Bytes allocated with Ofast + funroll-loops:
> ------------------------------------------------------
> Benchmark       | upstream      | with this PATCH
> ------------------------------------------------------
> 400.perlbench   | 30438407499   | 30574152897    ~0.0%
> 401.bzip2       | 2277114519    | 2319432664     +1.9%
> 403.gcc         | 64499664264   | 64781232731    ~0.0%
> 429.mcf         | 1361486758    | 1399942116     +2.8%
> 445.gobmk       | 15258056111   | 15396801542    +1.0%
> 456.hmmer       | 10896615649   | 10936223486    ~0.0%
> 458.sjeng       | 2592620709    | 2641687496     +1.9%
> 462.libquantum  | 1814487525    | 1854518500     +2.2%
> 464.h264ref     | 13528736878   | 13614517066    ~0.0%
> 471.omnetpp     | 38721066702   | 38910524667    ~0.0%
> 473.astar       | 3924015756    | 3968057027     +1.1%
> 483.xalancbmk   | 165897692838  | 166843885880   ~0.0%
>
> Pan
>
>
> -----Original Message-----
> From: Richard Biener 
> Sent: Friday, May 5, 2023 2:25 PM
> To: Li, Pan2 
> Cc: 钟居哲 ; kito.cheng ; 
> richard.sandiford ; Jeff Law 
> ; gcc-patches ; palmer 
> ; jakub 
> Subject: RE: Re: [PATCH] machine_mode type size: Extend enum size from 8-bit 
> to 16-bit
>
> On Fri, 5 May 2023, Li, Pan2 wrote:
>
> > I tried memory profiling with valgrind --tool=memcheck
> > --trace-children=yes for this change, targeting the SPEC 2006 INT part
> > with rv64gcv. Note that we only count the bytes allocated, taken from
> > valgrind log lines like this: "==2832896==   total heap usage: 208 allocs,
> > 165 frees, 123,204 bytes allocated".
> >
> > Considering valgrind's variance, the impact on bytes allocated appears
> > to be limited. However, I am still running this for x86; each iteration
> > takes more than 30 hours...
>
> I'm not sure I'd call +- 7% on memory use "limited" -

Re: Re: [PATCH] machine_mode type size: Extend enum size from 8-bit to 16-bit

2023-05-05 Thread Kito Cheng via Gcc-patches

Hi Pan:

Could you try applying the following diff and measuring again? This
keeps the size of tree_type_common unchanged.


sizeof tree_type_common= 128 (mode = 8 bit)
sizeof tree_type_common= 136 (mode = 16 bit)
sizeof tree_type_common= 128 (mode = 16 bit w/ this diff)

diff --git a/gcc/tree-core.h b/gcc/tree-core.h
index af795aa81f98..b8ccfa407ed9 100644
--- a/gcc/tree-core.h
+++ b/gcc/tree-core.h
@@ -1680,6 +1680,8 @@ struct GTY(()) tree_type_common {
  tree attributes;
  unsigned int uid;

+  ENUM_BITFIELD(machine_mode) mode : 16;
+
  unsigned int precision : 10;
  unsigned no_force_blk_flag : 1;
  unsigned needs_constructing_flag : 1;
@@ -1687,7 +1689,6 @@ struct GTY(()) tree_type_common {
  unsigned restrict_flag : 1;
  unsigned contains_placeholder_bits : 2;

-  ENUM_BITFIELD(machine_mode) mode : 16;

  /* TYPE_STRING_FLAG for INTEGER_TYPE and ARRAY_TYPE.
 TYPE_CXX_ODR_P for RECORD_TYPE and UNION_TYPE.  */
@@ -1712,7 +1713,7 @@ struct GTY(()) tree_type_common {
  unsigned empty_flag : 1;
  unsigned indivisible_p : 1;
  unsigned no_named_args_stdarg_p : 1;
-  unsigned spare : 15;
+  unsigned spare : 7;

  alias_set_type alias_set;
  tree pointer_to;
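The size effect described above can be reproduced with a toy struct. The three structs below are hypothetical, heavily simplified stand-ins for tree_type_common (a pointer, a full 32-bit uid, one word of bit-fields, another pointer), not the real definition. Bit-field layout is implementation-defined; the comments assume the common GCC/Clang rule that an `unsigned int` bit-field may not straddle a 32-bit storage unit.

```c
#include <stddef.h>

/* Hypothetical stand-ins for tree_type_common: only the shape matters.  */

/* mode as 8 bits: 8 + 8 + 16 = 32 bits, one 32-bit unit after uid.  */
struct layout_mode8
{
  void *attributes;
  unsigned int uid;
  unsigned int precision : 8;
  unsigned int mode : 8;
  unsigned int spare : 16;
  void *pointer_to;
};

/* mode widened to 16 bits, spare unchanged: 8 + 16 + 16 = 40 bits, so
   spare cannot fit in the first unit and starts a second one, which
   pushes pointer_to to the next pointer-aligned offset.  */
struct layout_mode16
{
  void *attributes;
  unsigned int uid;
  unsigned int precision : 8;
  unsigned int mode : 16;
  unsigned int spare : 16;
  void *pointer_to;
};

/* mode widened and moved first, spare shrunk by 8 bits: back to
   16 + 8 + 8 = 32 bits in total.  */
struct layout_mode16_fixed
{
  void *attributes;
  unsigned int uid;
  unsigned int mode : 16;
  unsigned int precision : 8;
  unsigned int spare : 8;
  void *pointer_to;
};

/* Store sizeof of each variant into OUT, in the order declared above.  */
void
layout_sizes (size_t out[3])
{
  out[0] = sizeof (struct layout_mode8);
  out[1] = sizeof (struct layout_mode16);
  out[2] = sizeof (struct layout_mode16_fixed);
}
```

On a typical LP64 target with GCC this should give 24, 32 and 24 bytes: widening `mode` alone adds a word of padding, and taking the same 8 bits back out of `spare` removes it, in the same way as the 128/136/128 figures above.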

On Sat, May 6, 2023 at 9:10 AM Li, Pan2 via Gcc-patches
 wrote:
>
> Yes, I totally agree that the numbers can only be so accurate. Here are
> the updated bytes-allocated figures for the x86 target.
>
> Bytes allocated with O2:
> ------------------------------------------------------
> Benchmark       | upstream      | with this PATCH
> ------------------------------------------------------
> 400.perlbench   | 25286185160   | 25176544846    ~0.0%
> 401.bzip2       | 1429883731    | 1391040027     -2.7%
> 403.gcc         | 55023568981   | 54798890746    ~0.0%
> 429.mcf         | 1360975660    | 1321537710     -2.9%
> 445.gobmk       | 12791636502   | 12666523431    -1.0%
> 456.hmmer       | 9354433652    | 9279189174     ~0.0%
> 458.sjeng       | 1991260562    | 1944031904     -2.4%
> 462.libquantum  | 1725112078    | 1684213981     -2.4%
> 464.h264ref     | 8597673515    | 8528855778     ~0.0%
> 471.omnetpp     | 37613034778   | 37432278047    ~0.0%
> 473.astar       | 3817295518    | 3772460508     -1.2%
> 483.xalancbmk   | 149418776991  | 148545162207   ~0.0%
>
> Bytes allocated with Ofast + funroll-loops:
> ------------------------------------------------------
> Benchmark       | upstream      | with this PATCH
> ------------------------------------------------------
> 400.perlbench   | 30438407499   | 30574152897    ~0.0%
> 401.bzip2       | 2277114519    | 2319432664     +1.9%
> 403.gcc         | 64499664264   | 64781232731    ~0.0%
> 429.mcf         | 1361486758    | 1399942116     +2.8%
> 445.gobmk       | 15258056111   | 15396801542    +1.0%
> 456.hmmer       | 10896615649   | 10936223486    ~0.0%
> 458.sjeng       | 2592620709    | 2641687496     +1.9%
> 462.libquantum  | 1814487525    | 1854518500     +2.2%
> 464.h264ref     | 13528736878   | 13614517066    ~0.0%
> 471.omnetpp     | 38721066702   | 38910524667    ~0.0%
> 473.astar       | 3924015756    | 3968057027     +1.1%
> 483.xalancbmk   | 165897692838  | 166843885880   ~0.0%
>
> Pan
>
>
> -----Original Message-----
> From: Richard Biener 
> Sent: Friday, May 5, 2023 2:25 PM
> To: Li, Pan2 
> Cc: 钟居哲 ; kito.cheng ; 
> richard.sandiford ; Jeff Law 
> ; gcc-patches ; palmer 
> ; jakub 
> Subject: RE: Re: [PATCH] machine_mode type size: Extend enum size from 8-bit 
> to 16-bit
>
> On Fri, 5 May 2023, Li, Pan2 wrote:
>
> > I tried memory profiling with valgrind --tool=memcheck
> > --trace-children=yes for this change, targeting the SPEC 2006 INT part
> > with rv64gcv. Note that we only count the bytes allocated, taken from
> > valgrind log lines like this: "==2832896==   total heap usage: 208 allocs,
> > 165 frees, 123,204 bytes allocated".
> >
> > Considering valgrind's variance, the impact on bytes allocated appears
> > to be limited. However, I am still running this for x86; each iteration
> > takes more than 30 hours...
>
> I'm not sure I'd call +- 7% on memory use "limited" - but I fear the numbers
> are off.  Note that since various structures reside in GC memory, there are
> also changes to GC overhead and fragmentation, so precise measurements are
> difficult.
>
> Richard.
>
> > RISC-V GCC Version:
> > >> ~/bin/test-gnu-8-bits/bin/riscv64-unknown-linux-gnu-gcc --version
> > riscv64-unknown-linux-gnu-gcc (gd7cb9720ed5) 14.0.0 20230503 (experimental)
> > Copyright (C) 2023 Free Software Foundation, Inc.
> > This is free softwa

[PATCH V6] RISC-V: Enable basic RVV auto-vectorization support.

2023-05-05 Thread juzhe . zhong

From: Juzhe-Zhong 

gcc/ChangeLog:

* config/riscv/riscv-protos.h (preferred_simd_mode): New function.
* config/riscv/riscv-v.cc (autovec_use_vlmax_p): Ditto.
(preferred_simd_mode): Ditto.
* config/riscv/riscv.cc (riscv_get_arg_info): Handle RVV type in 
function arg.
(riscv_convert_vector_bits): Adjust for RVV auto-vectorization.
(riscv_preferred_simd_mode): New function.
(TARGET_VECTORIZE_PREFERRED_SIMD_MODE): New target hook support.
* config/riscv/vector.md: Add autovec.md.
* config/riscv/autovec.md: New file.

gcc/testsuite/ChangeLog:

* gcc.target/riscv/rvv/rvv.exp: Add testcases for RVV 
auto-vectorization.
* gcc.target/riscv/rvv/autovec/fixed-vlmax-1.c: New test.
* gcc.target/riscv/rvv/autovec/partial/single_rgroup-1.c: New test.
* gcc.target/riscv/rvv/autovec/partial/single_rgroup-1.h: New test.
* gcc.target/riscv/rvv/autovec/partial/single_rgroup_run-1.c: New test.
* gcc.target/riscv/rvv/autovec/scalable-1.c: New test.
* gcc.target/riscv/rvv/autovec/template-1.h: New test.
* gcc.target/riscv/rvv/autovec/v-1.c: New test.
* gcc.target/riscv/rvv/autovec/v-2.c: New test.
* gcc.target/riscv/rvv/autovec/zve32f-1.c: New test.
* gcc.target/riscv/rvv/autovec/zve32f-2.c: New test.
* gcc.target/riscv/rvv/autovec/zve32f-3.c: New test.
* gcc.target/riscv/rvv/autovec/zve32f_zvl128b-1.c: New test.
* gcc.target/riscv/rvv/autovec/zve32f_zvl128b-2.c: New test.
* gcc.target/riscv/rvv/autovec/zve32x-1.c: New test.
* gcc.target/riscv/rvv/autovec/zve32x-2.c: New test.
* gcc.target/riscv/rvv/autovec/zve32x-3.c: New test.
* gcc.target/riscv/rvv/autovec/zve32x_zvl128b-1.c: New test.
* gcc.target/riscv/rvv/autovec/zve32x_zvl128b-2.c: New test.
* gcc.target/riscv/rvv/autovec/zve64d-1.c: New test.
* gcc.target/riscv/rvv/autovec/zve64d-2.c: New test.
* gcc.target/riscv/rvv/autovec/zve64d-3.c: New test.
* gcc.target/riscv/rvv/autovec/zve64d_zvl128b-1.c: New test.
* gcc.target/riscv/rvv/autovec/zve64d_zvl128b-2.c: New test.
* gcc.target/riscv/rvv/autovec/zve64f-1.c: New test.
* gcc.target/riscv/rvv/autovec/zve64f-2.c: New test.
* gcc.target/riscv/rvv/autovec/zve64f-3.c: New test.
* gcc.target/riscv/rvv/autovec/zve64f_zvl128b-1.c: New test.
* gcc.target/riscv/rvv/autovec/zve64f_zvl128b-2.c: New test.
* gcc.target/riscv/rvv/autovec/zve64x-1.c: New test.
* gcc.target/riscv/rvv/autovec/zve64x-2.c: New test.
* gcc.target/riscv/rvv/autovec/zve64x-3.c: New test.
* gcc.target/riscv/rvv/autovec/zve64x_zvl128b-1.c: New test.
* gcc.target/riscv/rvv/autovec/zve64x_zvl128b-2.c: New test.

---
 gcc/config/riscv/autovec.md   |  49 
 gcc/config/riscv/riscv-protos.h   |   1 +
 gcc/config/riscv/riscv-v.cc   |  51 +
 gcc/config/riscv/riscv.cc |  31 -
 gcc/config/riscv/vector.md|   4 +-
 .../riscv/rvv/autovec/fixed-vlmax-1.c |  24 
 .../rvv/autovec/partial/single_rgroup-1.c |   8 ++
 .../rvv/autovec/partial/single_rgroup-1.h | 106 ++
 .../rvv/autovec/partial/single_rgroup_run-1.c |  19 
 .../gcc.target/riscv/rvv/autovec/scalable-1.c |  15 +++
 .../gcc.target/riscv/rvv/autovec/template-1.h |  68 +++
 .../gcc.target/riscv/rvv/autovec/v-1.c|   9 ++
 .../gcc.target/riscv/rvv/autovec/v-2.c|   6 +
 .../gcc.target/riscv/rvv/autovec/zve32f-1.c   |   4 +
 .../gcc.target/riscv/rvv/autovec/zve32f-2.c   |   5 +
 .../gcc.target/riscv/rvv/autovec/zve32f-3.c   |   6 +
 .../riscv/rvv/autovec/zve32f_zvl128b-1.c  |   4 +
 .../riscv/rvv/autovec/zve32f_zvl128b-2.c  |   6 +
 .../gcc.target/riscv/rvv/autovec/zve32x-1.c   |   4 +
 .../gcc.target/riscv/rvv/autovec/zve32x-2.c   |   6 +
 .../gcc.target/riscv/rvv/autovec/zve32x-3.c   |   6 +
 .../riscv/rvv/autovec/zve32x_zvl128b-1.c  |   5 +
 .../riscv/rvv/autovec/zve32x_zvl128b-2.c  |   6 +
 .../gcc.target/riscv/rvv/autovec/zve64d-1.c   |   4 +
 .../gcc.target/riscv/rvv/autovec/zve64d-2.c   |   4 +
 .../gcc.target/riscv/rvv/autovec/zve64d-3.c   |   6 +
 .../riscv/rvv/autovec/zve64d_zvl128b-1.c  |   4 +
 .../riscv/rvv/autovec/zve64d_zvl128b-2.c  |   6 +
 .../gcc.target/riscv/rvv/autovec/zve64f-1.c   |   4 +
 .../gcc.target/riscv/rvv/autovec/zve64f-2.c   |   4 +
 .../gcc.target/riscv/rvv/autovec/zve64f-3.c   |   6 +
 .../riscv/rvv/autovec/zve64f_zvl128b-1.c  |   4 +
 .../riscv/rvv/autovec/zve64f_zvl128b-2.c  |   6 +
 .../gcc.target/riscv/rvv/autovec/zve64x-1.c   |   4 +
 .../gcc.target/riscv/rvv/autovec/zve64x-2.c   |   4 +
 .../gcc.target/riscv/rvv/autovec/zve64x-3.c   |   6 +
 .../riscv/rvv/autovec/zve64x_zvl128b-1.c  |   4 +
 .../riscv/rvv/autovec/zve64x_zvl128b-2.c  |   6 +
 gcc

[PATCH] RISC-V: Enable basic RVV auto-vectorization support.

2023-05-05 Thread juzhe . zhong

From: Juzhe-Zhong 

gcc/ChangeLog:

* config/riscv/riscv-protos.h (preferred_simd_mode): New function.
* config/riscv/riscv-v.cc (autovec_use_vlmax_p): Ditto.
(preferred_simd_mode): Ditto.
* config/riscv/riscv.cc (riscv_get_arg_info): Handle RVV type in 
function arg.
(riscv_convert_vector_bits): Adjust for RVV auto-vectorization.
(riscv_preferred_simd_mode): New function.
(TARGET_VECTORIZE_PREFERRED_SIMD_MODE): New target hook support.
* config/riscv/vector.md: Add autovec.md.
* config/riscv/autovec.md: New file.

gcc/testsuite/ChangeLog:

* gcc.target/riscv/rvv/rvv.exp: Add testcases for RVV 
auto-vectorization.
* gcc.target/riscv/rvv/autovec/fixed-vlmax-1.c: New test.
* gcc.target/riscv/rvv/autovec/partial/single_rgroup-1.c: New test.
* gcc.target/riscv/rvv/autovec/partial/single_rgroup-1.h: New test.
* gcc.target/riscv/rvv/autovec/partial/single_rgroup_run-1.c: New test.
* gcc.target/riscv/rvv/autovec/scalable-1.c: New test.
* gcc.target/riscv/rvv/autovec/template-1.h: New test.
* gcc.target/riscv/rvv/autovec/v-1.c: New test.
* gcc.target/riscv/rvv/autovec/v-2.c: New test.
* gcc.target/riscv/rvv/autovec/zve32f-1.c: New test.
* gcc.target/riscv/rvv/autovec/zve32f-2.c: New test.
* gcc.target/riscv/rvv/autovec/zve32f-3.c: New test.
* gcc.target/riscv/rvv/autovec/zve32f_zvl128b-1.c: New test.
* gcc.target/riscv/rvv/autovec/zve32f_zvl128b-2.c: New test.
* gcc.target/riscv/rvv/autovec/zve32x-1.c: New test.
* gcc.target/riscv/rvv/autovec/zve32x-2.c: New test.
* gcc.target/riscv/rvv/autovec/zve32x-3.c: New test.
* gcc.target/riscv/rvv/autovec/zve32x_zvl128b-1.c: New test.
* gcc.target/riscv/rvv/autovec/zve32x_zvl128b-2.c: New test.
* gcc.target/riscv/rvv/autovec/zve64d-1.c: New test.
* gcc.target/riscv/rvv/autovec/zve64d-2.c: New test.
* gcc.target/riscv/rvv/autovec/zve64d-3.c: New test.
* gcc.target/riscv/rvv/autovec/zve64d_zvl128b-1.c: New test.
* gcc.target/riscv/rvv/autovec/zve64d_zvl128b-2.c: New test.
* gcc.target/riscv/rvv/autovec/zve64f-1.c: New test.
* gcc.target/riscv/rvv/autovec/zve64f-2.c: New test.
* gcc.target/riscv/rvv/autovec/zve64f-3.c: New test.
* gcc.target/riscv/rvv/autovec/zve64f_zvl128b-1.c: New test.
* gcc.target/riscv/rvv/autovec/zve64f_zvl128b-2.c: New test.
* gcc.target/riscv/rvv/autovec/zve64x-1.c: New test.
* gcc.target/riscv/rvv/autovec/zve64x-2.c: New test.
* gcc.target/riscv/rvv/autovec/zve64x-3.c: New test.
* gcc.target/riscv/rvv/autovec/zve64x_zvl128b-1.c: New test.
* gcc.target/riscv/rvv/autovec/zve64x_zvl128b-2.c: New test.

---
 gcc/config/riscv/autovec.md   |  49 
 gcc/config/riscv/riscv-protos.h   |   1 +
 gcc/config/riscv/riscv-v.cc   |  51 +
 gcc/config/riscv/riscv.cc |  31 -
 gcc/config/riscv/vector.md|   4 +-
 .../riscv/rvv/autovec/fixed-vlmax-1.c |  24 
 .../rvv/autovec/partial/single_rgroup-1.c |   8 ++
 .../rvv/autovec/partial/single_rgroup-1.h | 106 ++
 .../rvv/autovec/partial/single_rgroup_run-1.c |  19 
 .../gcc.target/riscv/rvv/autovec/scalable-1.c |  15 +++
 .../gcc.target/riscv/rvv/autovec/template-1.h |  68 +++
 .../gcc.target/riscv/rvv/autovec/v-1.c|   9 ++
 .../gcc.target/riscv/rvv/autovec/v-2.c|   6 +
 .../gcc.target/riscv/rvv/autovec/zve32f-1.c   |   4 +
 .../gcc.target/riscv/rvv/autovec/zve32f-2.c   |   5 +
 .../gcc.target/riscv/rvv/autovec/zve32f-3.c   |   6 +
 .../riscv/rvv/autovec/zve32f_zvl128b-1.c  |   4 +
 .../riscv/rvv/autovec/zve32f_zvl128b-2.c  |   6 +
 .../gcc.target/riscv/rvv/autovec/zve32x-1.c   |   4 +
 .../gcc.target/riscv/rvv/autovec/zve32x-2.c   |   6 +
 .../gcc.target/riscv/rvv/autovec/zve32x-3.c   |   6 +
 .../riscv/rvv/autovec/zve32x_zvl128b-1.c  |   5 +
 .../riscv/rvv/autovec/zve32x_zvl128b-2.c  |   6 +
 .../gcc.target/riscv/rvv/autovec/zve64d-1.c   |   4 +
 .../gcc.target/riscv/rvv/autovec/zve64d-2.c   |   4 +
 .../gcc.target/riscv/rvv/autovec/zve64d-3.c   |   6 +
 .../riscv/rvv/autovec/zve64d_zvl128b-1.c  |   4 +
 .../riscv/rvv/autovec/zve64d_zvl128b-2.c  |   6 +
 .../gcc.target/riscv/rvv/autovec/zve64f-1.c   |   4 +
 .../gcc.target/riscv/rvv/autovec/zve64f-2.c   |   4 +
 .../gcc.target/riscv/rvv/autovec/zve64f-3.c   |   6 +
 .../riscv/rvv/autovec/zve64f_zvl128b-1.c  |   4 +
 .../riscv/rvv/autovec/zve64f_zvl128b-2.c  |   6 +
 .../gcc.target/riscv/rvv/autovec/zve64x-1.c   |   4 +
 .../gcc.target/riscv/rvv/autovec/zve64x-2.c   |   4 +
 .../gcc.target/riscv/rvv/autovec/zve64x-3.c   |   6 +
 .../riscv/rvv/autovec/zve64x_zvl128b-1.c  |   4 +
 .../riscv/rvv/autovec/zve64x_zvl128b-2.c  |   6 +
 gcc

RE: Re: [PATCH] machine_mode type size: Extend enum size from 8-bit to 16-bit

2023-05-05 Thread Li, Pan2 via Gcc-patches

Yes, I totally agree that the numbers can only be so accurate. Here are the
updated bytes-allocated figures for the x86 target.

Bytes allocated with O2:
------------------------------------------------------
Benchmark       | upstream      | with this PATCH
------------------------------------------------------
400.perlbench   | 25286185160   | 25176544846    ~0.0%
401.bzip2       | 1429883731    | 1391040027     -2.7%
403.gcc         | 55023568981   | 54798890746    ~0.0%
429.mcf         | 1360975660    | 1321537710     -2.9%
445.gobmk       | 12791636502   | 12666523431    -1.0%
456.hmmer       | 9354433652    | 9279189174     ~0.0%
458.sjeng       | 1991260562    | 1944031904     -2.4%
462.libquantum  | 1725112078    | 1684213981     -2.4%
464.h264ref     | 8597673515    | 8528855778     ~0.0%
471.omnetpp     | 37613034778   | 37432278047    ~0.0%
473.astar       | 3817295518    | 3772460508     -1.2%
483.xalancbmk   | 149418776991  | 148545162207   ~0.0%

Bytes allocated with Ofast + funroll-loops:
------------------------------------------------------
Benchmark       | upstream      | with this PATCH
------------------------------------------------------
400.perlbench   | 30438407499   | 30574152897    ~0.0%
401.bzip2       | 2277114519    | 2319432664     +1.9%
403.gcc         | 64499664264   | 64781232731    ~0.0%
429.mcf         | 1361486758    | 1399942116     +2.8%
445.gobmk       | 15258056111   | 15396801542    +1.0%
456.hmmer       | 10896615649   | 10936223486    ~0.0%
458.sjeng       | 2592620709    | 2641687496     +1.9%
462.libquantum  | 1814487525    | 1854518500     +2.2%
464.h264ref     | 13528736878   | 13614517066    ~0.0%
471.omnetpp     | 38721066702   | 38910524667    ~0.0%
473.astar       | 3924015756    | 3968057027     +1.1%
483.xalancbmk   | 165897692838  | 166843885880   ~0.0%
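The percentage column reads as the relative change of the patched compiler against upstream. A minimal helper to recompute it from the two byte counts; this is our reading of the table, not a script from the thread, and the tables print some small changes as ~0.0% rather than the exact value.

```c
/* Relative change of PATCHED against UPSTREAM, in percent.  Computed in
   double because the byte counts exceed 32 bits.  */
double
percent_change (unsigned long long upstream, unsigned long long patched)
{
  return ((double) patched - (double) upstream) * 100.0 / (double) upstream;
}
```

For example, `percent_change (1429883731, 1391040027)` for 401.bzip2 at O2 gives about -2.72, consistent with the -2.7% entry.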

Pan


-----Original Message-----
From: Richard Biener  
Sent: Friday, May 5, 2023 2:25 PM
To: Li, Pan2 
Cc: 钟居哲 ; kito.cheng ; 
richard.sandiford ; Jeff Law 
; gcc-patches ; palmer 
; jakub 
Subject: RE: Re: [PATCH] machine_mode type size: Extend enum size from 8-bit to 
16-bit

On Fri, 5 May 2023, Li, Pan2 wrote:

> I tried memory profiling with valgrind --tool=memcheck --trace-children=yes
> for this change, targeting the SPEC 2006 INT part with rv64gcv. Note that we
> only count the bytes allocated, taken from valgrind log lines like this:
> "==2832896==   total heap usage: 208 allocs, 165 frees, 123,204 bytes allocated".
> 
> Considering valgrind's variance, the impact on bytes allocated appears to be
> limited. However, I am still running this for x86; each iteration takes more
> than 30 hours...
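The figures are taken from valgrind's heap summary line. A minimal sketch of extracting the number from one such line, assuming exactly the wording quoted above; `parse_bytes_allocated` is a hypothetical helper, not part of valgrind or of the thread.

```c
#include <string.h>

/* Hypothetical helper: pull the "bytes allocated" figure out of a
   valgrind memcheck summary line of the form
     ==PID==   total heap usage: N allocs, M frees, 123,204 bytes allocated
   Thousands separators are skipped.  Returns 1 and stores the value in
   *BYTES on success, 0 if the line does not look like a summary.  */
int
parse_bytes_allocated (const char *line, unsigned long long *bytes)
{
  static const char marker[] = "frees, ";
  const char *p = strstr (line, marker);
  if (p == NULL || strstr (p, " bytes allocated") == NULL)
    return 0;

  p += sizeof marker - 1;            /* skip past "frees, " */
  unsigned long long value = 0;
  int digits = 0;
  for (; *p != '\0' && *p != ' '; ++p)
    {
      if (*p == ',')
        continue;                    /* thousands separator */
      if (*p < '0' || *p > '9')
        return 0;                    /* unexpected character */
      value = value * 10 + (unsigned long long) (*p - '0');
      ++digits;
    }
  if (digits == 0)
    return 0;

  *bytes = value;
  return 1;
}
```

With --trace-children=yes each traced process prints its own summary, so a per-benchmark total would presumably be the sum over all matching lines in the log.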

I'm not sure I'd call +- 7% on memory use "limited" - but I fear the numbers
are off.  Note that since various structures reside in GC memory, there are
also changes to GC overhead and fragmentation, so precise measurements are difficult.

Richard.

> RISC-V GCC Version:
> >> ~/bin/test-gnu-8-bits/bin/riscv64-unknown-linux-gnu-gcc --version
> riscv64-unknown-linux-gnu-gcc (gd7cb9720ed5) 14.0.0 20230503 (experimental)
> Copyright (C) 2023 Free Software Foundation, Inc.
> This is free software; see the source for copying conditions.  There 
> is NO warranty; not even for MERCHANTABILITY or FITNESS FOR A PARTICULAR 
> PURPOSE.
> 
> Bytes allocated with O2:
> -----------------------------------------------------
> Benchmark       | upstream      | with this PATCH
> -----------------------------------------------------
> 400.perlbench   | 29699642875   | 29949876269  ~0.0%
> 401.bzip2       | 1641041659    | 1755563972   +6.95%
> 403.gcc         | 68447500516   | 68900883291  ~0.0%
> 429.mcf         | 1433156462    | 1433253373   ~0.0%
> 445.gobmk       | 14239225210   | 14463438465  ~0.0%
> 456.hmmer       | 9635955623    | 9808534948   +1.8%
> 458.sjeng       | 2419478204    | 2545478940   +5.4%
> 462.libquantum  | 1686404489    | 1800884197   +6.8%
> 464.h264ref     | 10190413900   | 10351134161  +1.6%
> 471.omnetpp     | 40814627684   | 41185864529  ~0.0%
> 473.astar       | 3807097529    | 3928428183   +3.2%
> 483.xalancbmk   | 152959418167  | 154201738843 ~0.0%
> 
> Bytes allocated with Ofast + funroll-loops:
> ---

Re: [PATCH] RISC-V: Add bext pattern for ZBS

2023-05-05 Thread Jeff Law via Gcc-patches





On 5/4/23 11:08, Raphael Moreira Zinsly wrote:

When (a & (1 << bit_no)) is tested inside an IF we can use a bit extract.
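The sketch below models the Zbs `bext` result for RV64, rd = (rs1 >> (rs2 & (XLEN - 1))) & 1. It shows why a branch on `(a & (1 << bit_no)) != 0` can branch on the bext result instead. Both function names are hypothetical. The masking of the shift amount mirrors the instruction and also keeps the C shift well defined.

```c
#include <assert.h>
#include <stdint.h>

/* Model of the Zbs "bext rd, rs1, rs2" result for XLEN == 64:
   extract the single bit of RS1 selected by the low 6 bits of RS2.  */
static uint64_t
bext64 (uint64_t rs1, uint64_t rs2)
{
  return (rs1 >> (rs2 & 63)) & 1;
}

/* The source idiom this patch targets: a single-bit test used as a
   branch condition.  */
static int
bit_is_set (uint64_t a, unsigned bit_no)
{
  assert (bit_no < 64);
  if (a & ((uint64_t) 1 << bit_no))
    return 1;
  return 0;
}
```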

gcc/ChangeLog:

* config/riscv/bitmanip.md
(bext): Rename one to avoid name clash.
(branch_bext): New split pattern.

gcc/testsuite/ChangeLog:
* gcc.target/riscv/zbs-bext-02.c: New test.




---
  gcc/config/riscv/bitmanip.md | 24 +++-
  gcc/testsuite/gcc.target/riscv/zbs-bext-02.c | 18 +++
  2 files changed, 41 insertions(+), 1 deletion(-)
  create mode 100644 gcc/testsuite/gcc.target/riscv/zbs-bext-02.c

diff --git a/gcc/config/riscv/bitmanip.md b/gcc/config/riscv/bitmanip.md
index a27fc3e34a1..e29e2d1fa53 100644
--- a/gcc/config/riscv/bitmanip.md
+++ b/gcc/config/riscv/bitmanip.md
@@ -595,7 +595,7 @@
  ;; When performing `(a & (1UL << bitno)) ? 0 : -1` the combiner
  ;; usually has the `bitno` typed as X-mode (i.e. no further
  ;; zero-extension is performed around the bitno).
-(define_insn "*bext"
+(define_insn "*bext_2"
[(set (match_operand:X 0 "register_operand" "=r")
(zero_extract:X (match_operand:X 1 "register_operand" "r")
(const_int 1)
This doesn't make sense to me.  Is it possible this was from an earlier 
version of the patch?


In general, when we have a '*' prefix, we're allowed to have multiple 
patterns with the same name.  Essentially the pattern names are just for 
debugging purposes; no API is exposed to generate those patterns when 
there's a '*' prefix.




@@ -720,6 +720,28 @@
 operands[9] = GEN_INT (clearbit);
  })
  
+;; IF_THEN_ELSE: test for (a & (1 << BIT_NO))

+(define_insn_and_split "*branch_bext"
+  [(set (pc)
+   (if_then_else
+ (match_operator 1 "equality_operator"
+[(zero_extract:X (match_operand:X 2 "register_operand" "r")
+(const_int 1)
+(zero_extend:X (match_operand:QI 3 "register_operand" "r")))
+   (const_int 0)])
+(label_ref (match_operand 0 "" ""))
+(pc)))
+   (clobber (match_scratch:X 4 "=&r"))]
Formatting nit.  In general the operands of an rtx operator all line up 
together when we can.  So in this case the (const_int 1) should line up 
under the (match_operand:X 2).  Similarly for the (zero_extend:X).  That 
may require wrapping the zero_extend line.  The way to do that would be 
to bring its match_operand down to a new line, indented two spaces from 
the open paren of the (zero_extend.




It's been a while since we poked at this, so maybe you've already told 
me before, but would it make sense to use the GPR iterator rather than 
the X iterator?


GPR would result in two patterns that are available to match at the same 
time, one for SI, another for DI.


X also results in two patterns, but only one is available at any given 
time dependent on TARGET_64BIT.


I guess the rest are defined in terms of X, particularly the bext 
pattern.  So never mind, keep it as X.


So I think the only things we potentially need to adjust are removing the hunk 
that changes the name of the *bext pattern and the whitespace 
fix.  I think we'll be good to go after those changes.


Jeff

[committed] CRIS: peephole2 an add into two addq or subq

2023-05-05 Thread Hans-Peter Nilsson via Gcc-patches

Unfortunately, this doesn't cause a performance improvement for coremark,
but it happens a few times in newlib, just enough to affect coremark
by 0.01% in size (4 bytes, and three cycles; __fwalk_sglue and
__vfiprintf_r each two bytes).
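The arithmetic of the split can be mirrored in plain C. The sketch below reproduces the range check and the two quick immediates that the new PLUS case in cris_split_constant computes. It assumes that addq and subq take an unsigned 6-bit immediate (0..63), and the function name is hypothetical. It only models the constants and does not emit any insns.

```c
#include <assert.h>
#include <stdlib.h>

/* Mirror of the PLUS case: an addend in -126..-64 or 64..126 becomes
   two quick adds, 63 and (|ival| - 63), both with the sign of IVAL
   (negative parts are emitted as subq).  Returns the number of insns,
   2, or 0 when this split does not apply.  */
static int
split_add_constant (int ival, int *first, int *second)
{
  if (!((ival >= -63 * 2 && ival <= -63 - 1)
        || (ival >= 63 + 1 && ival <= 63 * 2)))
    return 0;

  int sign = ival < 0 ? -1 : 1;
  int aval = abs (ival);
  *first = 63 * sign;
  *second = (aval - 63) * sign;
  return 2;
}
```

This matches the expected test output, for example x + 64 becoming addq 63 followed by addq 1.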

gcc:
* config/cris/cris.md (splitop): Add PLUS.
* config/cris/cris.cc (cris_split_constant): Also handle
PLUS when a split into two insns may be useful.

gcc/testsuite:
* gcc.target/cris/peep2-addsplit1.c: New test.
---
 gcc/config/cris/cris.cc   | 25 +++-
 gcc/config/cris/cris.md   |  6 +-
 .../gcc.target/cris/peep2-addsplit1.c | 59 +++
 3 files changed, 88 insertions(+), 2 deletions(-)
 create mode 100644 gcc/testsuite/gcc.target/cris/peep2-addsplit1.c

diff --git a/gcc/config/cris/cris.cc b/gcc/config/cris/cris.cc
index 331f5908a538..561ca1b3fa92 100644
--- a/gcc/config/cris/cris.cc
+++ b/gcc/config/cris/cris.cc
@@ -2642,7 +2642,30 @@ cris_split_constant (HOST_WIDE_INT wval, enum rtx_code code,
   int32_t ival = (int32_t) wval;
   uint32_t uval = (uint32_t) wval;
 
-  if (code != AND || IN_RANGE(ival, -32, 31)
+  /* Can we do with two addq or two subq, improving chances of filling a
+ delay-slot?  At worst, we break even, both performance and
+ size-wise.  */
+  if (code == PLUS
+  && (IN_RANGE (ival, -63 * 2, -63 - 1)
+ || IN_RANGE (ival, 63 + 1, 63 * 2)))
+{
+  if (generate)
+   {
+ int sign = ival < 0 ? -1 : 1;
+ int aval = abs (ival);
+
+ if (mode != SImode)
+   {
+ dest = gen_rtx_REG (SImode, REGNO (dest));
+ op = gen_rtx_REG (SImode, REGNO (op));
+   }
+ emit_insn (gen_addsi3 (dest, op, GEN_INT (63 * sign)));
+ emit_insn (gen_addsi3 (dest, op, GEN_INT ((aval - 63) * sign)));
+   }
+  return 2;
+}
+
+  if (code != AND || IN_RANGE (ival, -32, 31)
   /* Implemented using movu.[bw] elsewhere.  */
   || ival == 255 || ival == 65535
   /* Implemented using clear.[bw] elsewhere.  */
diff --git a/gcc/config/cris/cris.md b/gcc/config/cris/cris.md
index 53fc2f2de4af..243d47748b78 100644
--- a/gcc/config/cris/cris.md
+++ b/gcc/config/cris/cris.md
@@ -209,7 +209,7 @@ (define_code_iterator plusminusumin [plus minus umin])
 (define_code_iterator plusumin [plus umin])
 
 ;; For opsplit1.
-(define_code_iterator splitop [and])
+(define_code_iterator splitop [and plus])
 
 ;; The addsubbo and nd code-attributes form a hack.  We need to output
 ;; "addu.b", "subu.b" but "bound.b" (no "u"-suffix) which means we'd
@@ -2984,6 +2984,10 @@ (define_peephole2 ; movandsplit1
 
 ;; Large (read: non-quick) numbers can sometimes be AND:ed by other means.
 ;; Testcase: gcc.target/cris/peep2-andsplit1.c
+;; 
+;; Another case is add N,rx with -126..-64,64..126: it has the same
+;; size and execution time as two addq or subq, but addq and subq can fill
+;; a delay-slot.
 (define_peephole2 ; opsplit1
   [(parallel
 [(set (match_operand 0 "register_operand")
diff --git a/gcc/testsuite/gcc.target/cris/peep2-addsplit1.c b/gcc/testsuite/gcc.target/cris/peep2-addsplit1.c
new file mode 100644
index ..7dff1d8c77c7
--- /dev/null
+++ b/gcc/testsuite/gcc.target/cris/peep2-addsplit1.c
@@ -0,0 +1,52 @@
+/* Check that "opsplit1" with PLUS does its job.  */
+/* { dg-do compile } */
+/* { dg-options "-O2 -fno-leading-underscore" } */
+/* { dg-final { check-function-bodies "**" "" "" } } */
+
+int addsi (int x)
+{
+  return x + 64;
+}
+
+char addqi (char x)
+{
+  return x + 126;
+}
+
+short addhi (short x)
+{
+  return x - 64;
+}
+
+unsigned short addhi2 (short x)
+{
+  return x - 126;
+}
+
+/*
+** addsi:
+** addq 63,.r10
+** ret
+** addq 1,.r10
+*/
+
+/*
+** addqi:
+** addq 63,.r10
+** ret
+** addq 63,.r10
+*/
+
+/*
+** addhi:
+** subq 63,.r10
+** ret
+** subq 1,.r10
+*/
+
+/*
+** addhi2:
+** subq 63,.r10
+** ret
+** subq 63,.r10
+*/
-- 
2.30.2

[committed] CRIS: peephole2 a move of constant followed by and of same register

2023-05-05 Thread Hans-Peter Nilsson via Gcc-patches

While moves of constants into registers are separately
optimizable, a combination of a move with a subsequent "and"
is slightly preferable even if the move can be generated
with the same number (and timing) of insns, as moves of
"just" registers are eliminated now and then in different
passes, loosely speaking.  This movandsplit1 pattern feeds
into the opsplit1/AND peephole2, with matching occurrences
observed in the floating point functions in libgcc.  Also, a
test-case to fit.  Coremark improvements are unimpressive:
less than 0.0003% speed, 0.1% size.

But that was pre-LRA; after the switch to LRA this peephole2
doesn't match anymore (for any of coremark, local tests,
libgcc and newlib libc) and the test-case passes with and
without the patch.  Still, there's no apparent reason why
LRA prefers "move R1,R2" "and I,R2" to "move I,R1" "and
R1,R2", or why that wouldn't "randomly" change (also seen
with other operations than "and").  Thus committed.
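The lslq/lsrq pair in the movandsplit1 comment relies on a simple identity. Keeping the low k bits of a 32-bit value is the same as shifting left by 32 - k and then shifting logically right by the same amount. A shift count of 13 therefore corresponds to a 19-bit mask. Below is a C sketch of that identity for masks of contiguous low-order bits; the function name is hypothetical.

```c
#include <assert.h>
#include <stdint.h>

/* x & ((1 << k) - 1), i.e. keep the low K bits, computed as a left
   shift followed by a logical right shift by the same amount.  */
static uint32_t
and_low_bits_by_shifts (uint32_t x, unsigned k)
{
  assert (k >= 1 && k <= 31);
  unsigned shift = 32 - k;
  return (x << shift) >> shift;
}
```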

gcc:
* config/cris/cris.md (movandsplit1): New define_peephole2.

gcc/testsuite:
* gcc.target/cris/peep2-movandsplit1.c: New test.
---
 gcc/config/cris/cris.md   | 38 +++
 .../gcc.target/cris/peep2-movandsplit1.c  | 17 +
 2 files changed, 55 insertions(+)
 create mode 100644 gcc/testsuite/gcc.target/cris/peep2-movandsplit1.c

diff --git a/gcc/config/cris/cris.md b/gcc/config/cris/cris.md
index d5aadf752e86..53fc2f2de4af 100644
--- a/gcc/config/cris/cris.md
+++ b/gcc/config/cris/cris.md
@@ -2944,6 +2944,44 @@ (define_peephole2 ; andqu
   operands[4] = GEN_INT (trunc_int_for_mode (INTVAL (operands[1]), QImode));
 })
 
+;; Somewhat similar to andqu, but a different range and expansion,
+;; intended to feed the output into opsplit1 with AND:
+;;  move.d 0x7,$r10
+;;  and.d $r11,$r10
+;; into:
+;;  move.d $r11,$r10
+;;  and.d 0x7,$r10
+;; which opsplit1/AND will change into:
+;;  move.d $r11,$r10 (unaffected by opsplit1/AND; shown only for context)
+;;  lslq 13,$r10
+;;  lsrq 13,$r10
+;; thereby winning in space, but in time only if the 0x7 happened to
+;; be unaligned in the code.
+(define_peephole2 ; movandsplit1
+  [(parallel
+[(set (match_operand 0 "register_operand")
+ (match_operand 1 "const_int_operand"))
+ (clobber (reg:CC CRIS_CC0_REGNUM))])
+   (parallel
+[(set (match_operand 2 "register_operand")
+ (and (match_operand 3 "register_operand")
+  (match_operand 4 "register_operand")))
+ (clobber (reg:CC CRIS_CC0_REGNUM))])]
+  "REGNO (operands[0]) == REGNO (operands[2])
+   && REGNO (operands[0]) == REGNO (operands[3])
+   && cris_splittable_constant_p (INTVAL (operands[1]), AND,
+ GET_MODE (operands[2]),
+ optimize_function_for_speed_p (cfun))"
+  [(parallel
+[(set (match_dup 2) (match_dup 4))
+ (clobber (reg:CC CRIS_CC0_REGNUM))])
+   (parallel
+[(set (match_dup 2) (match_dup 5))
+ (clobber (reg:CC CRIS_CC0_REGNUM))])]
+{
+  operands[5] = gen_rtx_AND (GET_MODE (operands[2]), operands[2], operands[1]);
+})
+
 ;; Large (read: non-quick) numbers can sometimes be AND:ed by other means.
 ;; Testcase: gcc.target/cris/peep2-andsplit1.c
 (define_peephole2 ; opsplit1
diff --git a/gcc/testsuite/gcc.target/cris/peep2-movandsplit1.c b/gcc/testsuite/gcc.target/cris/peep2-movandsplit1.c
new file mode 100644
index ..e4a860d966e2
--- /dev/null
+++ b/gcc/testsuite/gcc.target/cris/peep2-movandsplit1.c
@@ -0,0 +1,17 @@
+/* { dg-do compile } */
+/* { dg-final { scan-assembler-times "lsrq " 2 } } */
+/* { dg-final { scan-assembler-times "lslq " 2 } } */
+/* { dg-final { scan-assembler-times "move.d \\\$r11,\\\$r10" 2 } } */
+/* { dg-final { scan-assembler-times "\tmov" 2 } } */
+/* { dg-final { scan-assembler-not "\tand" } } */
+/* { dg-options "-O2" } */
+
+unsigned int xmovandr (unsigned int y, unsigned int x)
+{
+  return x & 0x7;
+}
+
+unsigned int xmovandl (unsigned int y, unsigned int x)
+{
+  return x & 0xfffe;
+}
-- 
2.30.2

Re: [PATCH V4] VECT: Add decrement IV iteration loop control by variable amount support

2023-05-05 Thread 钟居哲

Hi, Richards. I would like to give more information about this patch to make 
it easier for you to review.

Currently, I see 3 situations that we need to handle for the loop control IV in 
auto-vectorization:
1. Single rgroup loop control (ncopies == 1 && vec_num == 1, so
loop_len.length () == 1 or rgc->lengh () == 1)
2. Multiple rgroups for SLP.
3. Multiple rgroups for non-SLP, which Richard Sandiford pointed out previously 
(for example, VEC_PACK_TRUNC).

Before discussing this patch, let me describe the RVV LLVM implementation, which 
inspired me to send this patch:
https://reviews.llvm.org/D99750 

The LLVM implementation adds a middle-end IR construct called 
"get_vector_length", which has exactly the
same functionality as "select_vl" in this patch (I previously called it "while_len"; 
I have renamed it to "select_vl" following Richard's suggestion).

The LLVM implementation only uses "get_vector_length" to calculate the number of 
elements in single-rgroup loops.
For multiple rgroups, take a look at this example:
https://godbolt.org/z/3GP78efTY 

void
foo1 (short *__restrict f, int *__restrict d, int n)
{
  for (int i = 0; i < n; ++i)
{
  f[i * 2 + 0] = 1;
  f[i * 2 + 1] = 2;
  d[i] = 3;
}
} 

RISC-V Clang:
foo1:                                   # @foo1
# %bb.0:
        blez    a2, .LBB0_8
# %bb.1:
        li      a3, 16
        bgeu    a2, a3, .LBB0_3
# %bb.2:
        li      a6, 0
        j       .LBB0_6
.LBB0_3:
        andi    a6, a2, -16
        lui     a3, 32
        addiw   a3, a3, 1
        vsetivli zero, 8, e32, m2, ta, ma
        vmv.v.x v8, a3
        vmv.v.i v10, 3
        mv      a4, a6
        mv      a5, a1
        mv      a3, a0
.LBB0_4:                                # =>This Inner Loop Header: Depth=1
        addi    a7, a5, 32
        addi    t0, a3, 32
        vsetivli zero, 16, e16, m2, ta, ma
        vse16.v v8, (a3)
        vse16.v v8, (t0)
        vsetivli zero, 8, e32, m2, ta, ma
        vse32.v v10, (a5)
        vse32.v v10, (a7)
        addi    a3, a3, 64
        addi    a4, a4, -16
        addi    a5, a5, 64
        bnez    a4, .LBB0_4
# %bb.5:
        beq     a6, a2, .LBB0_8
.LBB0_6:
        slli    a3, a6, 2
        add     a0, a0, a3
        addi    a0, a0, 2
        add     a1, a1, a3
        sub     a2, a2, a6
        li      a3, 1
        li      a4, 2
        li      a5, 3
.LBB0_7:                                # =>This Inner Loop Header: Depth=1
        sh      a3, -2(a0)
        sh      a4, 0(a0)
        sw      a5, 0(a1)
        addi    a0, a0, 4
        addi    a2, a2, -1
        addi    a1, a1, 4
        bnez    a2, .LBB0_7
.LBB0_8:
        ret

ARM GCC:
foo1:
        cmp     w2, 0
        ble     .L1
        addvl   x4, x0, #1
        mov     x3, 0
        cntb    x7
        cntb    x6, all, mul #2
        sbfiz   x2, x2, 1, 32
        ptrue   p0.b, all
        mov     x5, x2
        adrp    x8, .LC0
        uqdech  x5
        add     x8, x8, :lo12:.LC0
        whilelo p1.h, xzr, x5
        ld1rw   z1.s, p0/z, [x8]
        mov     z0.s, #3
        whilelo p0.h, xzr, x2
.L3:
        st1h    z1.h, p0, [x0, x3, lsl 1]
        st1h    z1.h, p1, [x4, x3, lsl 1]
        st1w    z0.s, p1, [x1, #1, mul vl]
        add     x3, x3, x7
        whilelo p1.h, x3, x5
        st1w    z0.s, p0, [x1]
        add     x1, x1, x6
        whilelo p0.h, x3, x2
        b.any   .L3
.L1:
        ret

ARM GCC clearly generates much better code, since RVV LLVM just uses a 
SIMD style to handle multi-rgroup SLP auto-vectorization.

Well, I totally agree that we should support length-based loop control in 
auto-vectorization not only for single rgroups but also for multiple rgroups.
However, when I tried to implement and test multiple-rgroup lengths for both 
SLP and non-SLP, it turned out to be hard to use select_vl.
The "select_vl" pattern allows a flexible, non-VF length (length <= min 
(remain, VF)) in any iteration. As a result, adjusting the loop control IV and 
the data reference address IVs takes many more operations than just using 
"MIN_EXPR".
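The bookkeeping difference can be modelled in scalar C. The sketch below is an illustration under stated assumptions, not the vectorizer's code. The vectorization factor VF is a hypothetical constant. `len_select_vl` picks one length that the RVV vsetvli rules permit, namely splitting the last two chunks evenly when VF < remain < 2 * VF. With MIN_EXPR, every non-final iteration advances by exactly VF. With select_vl, the address IV has to be bumped by whatever length was actually chosen.

```c
#include <assert.h>
#include <stddef.h>

#define VF ((size_t) 4) /* Hypothetical vectorization factor.  */

static size_t
min_size (size_t a, size_t b)
{
  return a < b ? a : b;
}

/* MIN_EXPR-style length: every non-final iteration processes VF.  */
static size_t
len_min_expr (size_t remain)
{
  return min_size (remain, VF);
}

/* select_vl-style length: anything in [1, min (remain, VF)] is allowed.
   This model splits the tail evenly when VF < remain < 2 * VF.  */
static size_t
len_select_vl (size_t remain)
{
  if (remain > VF && remain < 2 * VF)
    return (remain + 1) / 2;
  return min_size (remain, VF);
}

/* c[i] = a[i] + b[i] with a decrementing "remain" loop-control IV.
   Returns the number of loop iterations executed.  */
static size_t
vec_add (long *c, const long *a, const long *b, size_t n,
         size_t (*get_len) (size_t))
{
  size_t offset = 0, iters = 0, remain = n;
  while (remain > 0)
    {
      size_t len = get_len (remain);
      assert (len >= 1 && len <= min_size (remain, VF));
      for (size_t j = 0; j < len; j++)   /* One "vector" operation.  */
        c[offset + j] = a[offset + j] + b[offset + j];
      offset += len;  /* Data-reference IV: advanced by the actual length.  */
      remain -= len;  /* Loop-control IV: decremented by the same amount.  */
      iters++;
    }
  return iters;
}
```

With `len_min_expr`, `offset` is always a multiple of VF before the final iteration, so it could be derived from a single counter. With `len_select_vl` it cannot, which is the extra cost that grows when several rgroups each need their own IVs.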

So, after several rounds of internal testing, I use MIN_EXPR directly instead of 
SELECT_VL for cases 2 and 3.

Given these situations, we only use "select_vl" for single rgroups; for 
multiple rgroups (both SLP and non-SLP), we just
use MIN_EXPR.

Would it be more appropriate to remove "select_vl" and just use MIN_EXPR to 
force VF elements in each non-final iteration for single rgroups as well?

Like the codegen in the RVV ISA example (shown as RVV LLVM):
https://repo.hca.bsc.es/epic/z/oynhzP 

ASM:
vec_add:                                # @vec_add
        blez    a3, .LBB0_3
        li      a4, 0
.LBB0_2:                                # %vector.body
        sub     a5, a3, a4
        vsetvli a6, a5, e64, m1, ta, mu  ==> change it into a6 = min (a5, VF) && vsetvli zero, a6, e64, m1, ta, mu
        slli    a7, a4, 3
        add     a5, a1, a7
        vle64.v v8, (a5)
        add     a5, a2

Re: [PATCH V5] Use reg mode to move sub blocks for parameters and returns

2023-05-05 Thread Jeff Law via Gcc-patches





On 5/3/23 23:49, guojiufu wrote:

Hi,

On 2023-05-01 03:00, Jeff Law wrote:

On 3/16/23 21:39, Jiufu Guo wrote:

Hi,

Consider assigning a parameter to a variable, or assigning a variable to a
return value of struct type, where the parameter/return value is passed
through registers.
For this kind of case, it would be better to use the natural mode of
the registers to move the content for the assignment.

As the example code (like code in PR65421):

typedef struct SA {double a[3];} A;
A ret_arg_pt (A *a) {return *a;} // on ppc64le, expect only 3 lfd(s)
A ret_arg (A a) {return a;} // just empty fun body
void st_arg (A a, A *p) {*p = a;} //only 3 stfd(s)

Comparing with previous version:
https://gcc.gnu.org/pipermail/gcc-patches/2023-January/609394.html
This version refines the code to eliminate redundant code in the
subroutine "move_sub_blocks".

Bootstrap and regtest pass on ppc64{,le}.
Is this ok for trunk?


...


diff --git a/gcc/expr.cc b/gcc/expr.cc
index 15be1c8db99..97a7be9542e 100644
--- a/gcc/expr.cc
+++ b/gcc/expr.cc
@@ -5559,6 +5559,41 @@ mem_ref_refers_to_non_mem_p (tree ref)
    return non_mem_decl_p (base);
  }
  +/* Sub routine of expand_assignment, invoked when assigning from a
+   parameter or assigning to a return val on struct type which may
+   be passed through registers.  The mode of register is used to
+   move the content for the assignment.
+
+   This routine generates code for expression FROM which is BLKmode,
+   and move the generated content to TO_RTX by su-blocks in SUB_MODE.  */

+
+static void
+move_sub_blocks (rtx to_rtx, tree from, machine_mode sub_mode)
+{
+  gcc_assert (MEM_P (to_rtx));
+
+  HOST_WIDE_INT size = MEM_SIZE (to_rtx).to_constant ();

Consider the case of a BLKmode return value.  Isn't TO_RTX in this
case a BLKmode object?


Thanks for this question!

Yes, the mode of TO_RTX is BLKmode.
As we know, when the function returns via registers, the mode of
the `return-rtx` could also be BLKmode.  This patch is going to
improve these kinds of cases.

For example:
```
typedef struct FLOATS
{
   double a[3];
} FLOATS;
FLOATS ret_arg_pt (FLOATS *a){return *a;}
```

D.3952 = *a_2(D); //this patch enhance this assignment
return D.3952;

The mode of the rtx for `D.3952` is BLKmode, as is the mode of the
rtx for "DECL_RESULT(current_function_decl)", which represents the
return registers.
I didn't think MEM_SIZE worked for BLKmode.  But looking at its 
definition, it's pulling the size out of the attributes rather than from 
the mode.  So I guess there's a reasonable chance it's going to work :-)


OK for the trunk.

jeff
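At the source level, the effect the patch aims for on the `ret_arg_pt` example can be pictured as one move per register-sized sub-block. With `double a[3]`, that means three double-sized moves (the three lfd on ppc64le mentioned above) rather than an opaque block copy. This is a hypothetical C illustration of the idea only. It is not GCC's move_sub_blocks, and the function name is invented.

```c
#include <assert.h>
#include <string.h>

typedef struct FLOATS
{
  double a[3];
} FLOATS;

/* Copy a struct that travels in floating-point registers one
   register-mode (double) sub-block at a time.  */
static FLOATS
ret_arg_pt_by_sub_blocks (const FLOATS *a)
{
  FLOATS r;
  size_t n = sizeof (FLOATS) / sizeof (double);  /* 3 sub-blocks.  */
  for (size_t i = 0; i < n; i++)
    r.a[i] = a->a[i];  /* One double-sized move per sub-block.  */
  return r;
}
```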

Re: [PATCH v2] Canonicalize vec_merge when mask is constant.

2023-05-05 Thread Jeff Law via Gcc-patches





On 5/3/23 21:25, liuhongt wrote:

Here's update patch with documents in md.texi.
Ok for trunk?

--
Use swap_commutative_operands_p for canonicalization. When both values
have the same operand precedence, the first bit in the mask should
select the first operand.

The canonicalization should help backends with pattern matching.  For
example, the x86 backend has lots of vec_merge patterns, and combine can
create either form of vec_merge (with the mask or the inverted mask), so
the backend needs 2 patterns to match exactly 1 instruction.  The
canonicalization reduces those 2 patterns to 1.
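Per the RTL semantics of `(vec_merge vec1 vec2 items)`, a set bit i in the mask takes element i from vec1, and a clear bit takes it from vec2. Swapping the two vectors and inverting the mask therefore yields the same value. The C model below shows why either form can appear and how one canonical choice removes the duplicate pattern. It uses a hypothetical 4-element vector. `canonicalize` is my reading of the rule stated above, which requires that bit 0 of the mask select the first operand when precedence is equal.

```c
#include <assert.h>

#define NELTS 4

/* Element-wise model of (vec_merge:V4SI vec1 vec2 (const_int items)).  */
static void
vec_merge4 (int out[NELTS], const int vec1[NELTS], const int vec2[NELTS],
            unsigned items)
{
  for (int i = 0; i < NELTS; i++)
    out[i] = ((items >> i) & 1) ? vec1[i] : vec2[i];
}

static unsigned
invert_mask (unsigned items)
{
  return ~items & ((1u << NELTS) - 1);
}

/* For operands of equal precedence: swap the operands and invert the
   mask when needed so that bit 0 of the mask selects the first one.  */
static void
canonicalize (const int **v1, const int **v2, unsigned *items)
{
  if (!(*items & 1))
    {
      const int *t = *v1;
      *v1 = *v2;
      *v2 = t;
      *items = invert_mask (*items);
    }
}
```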

gcc/ChangeLog:

* combine.cc (maybe_swap_commutative_operands): Canonicalize
vec_merge when mask is constant.
* doc/md.texi: Document vec_merge canonicalization.

OK.
jeff

Re: [PATCH 5/5] match.pd: Use splits in makefile and make configurable.

2023-05-05 Thread Jeff Law via Gcc-patches





On 5/5/23 14:46, Jakub Jelinek wrote:

On Fri, May 05, 2023 at 03:37:47PM +, Tamar Christina wrote:

2023-05-05  Jakub Jelinek  

* Makefile.in (check_p_numbers): Rename to one_to_, move
earlier with helper variables also renamed.
(MATCH_SPLUT_SEQ): Use $(wordlist 1,$(NUM_MATCH_SPLITS),$(one_to_))
instead of $(shell seq 1 $(NUM_MATCH_SPLITS)).
(check_p_subdirs): Use $(one_to_) instead of $(check_p_numbers).


Passed bootstrap/regtest on x86_64-linux and i686-linux, ok for trunk?

OK.
jeff

Re: [PATCH v8] RISC-V: Add the 'zfa' extension, version 0.2.

2023-05-05 Thread Jeff Law via Gcc-patches





On 4/19/23 03:57, Jin Ma wrote:

This patch adds the 'Zfa' extension for riscv, which is based on:
   https://github.com/riscv/riscv-isa-manual/commits/zfb
   
https://github.com/riscv/riscv-isa-manual/commit/1f038182810727f5feca311072e630d6baac51da

The binutils-gdb for 'Zfa' extension:
   https://github.com/a4lg/binutils-gdb/commits/riscv-zfa

What needs special explanation is:
1, The immediate of the FLI.H/S/D instructions is represented in assembly as a
   floating-point value, using scientific notation when rs1 is 1 or 2 and
   decimal notation for the rest.

   Related llvm link:
 https://reviews.llvm.org/D145645
   Related discussion link:
 https://github.com/riscv/riscv-isa-manual/issues/980
Right.  I think the goal right now is to get the bulk of this reviewed 
now.  Ideally we'll get to the point where the only outstanding issue is 
the interface between the assembler & gcc.




2, According to riscv-spec, "The FCVTMOD.W.D instruction was added principally to
   accelerate the processing of JavaScript Numbers.", so it seems that no
   implementation is required.
Fair enough.  There seems to be a general desire to wire up builtins 
for many things that aren't directly usable by the compiler.  So 
consider such a change as a follow-up.   I don't think something like 
this should hold up the bulk of Zfa.




3, The instructions FMINM and FMAXM correspond to the C23 library functions
   fminimum and fmaximum.  Therefore, this patch simply implements the
   fminm3 and fmaxm3 patterns in preparation for later use.

Sounds good.
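For reference, the semantics that the fminm/fmaxm patterns are meant to provide can be modelled in portable C. The C23 fminimum/fmaximum functions may not be available in every libc yet, so the helpers below are hypothetical stand-ins. Unlike C99 fmin/fmax, a NaN operand is propagated, and -0.0 is treated as less than +0.0.

```c
#include <assert.h>
#include <math.h>

/* Model of C23 fminimum: NaN if either input is NaN, and -0.0 < +0.0.  */
static double
fminimum_model (double x, double y)
{
  if (isnan (x) || isnan (y))
    return x + y;                    /* Propagate a NaN.  */
  if (x == y)
    return signbit (x) ? x : y;      /* Only differs for -0.0 vs +0.0.  */
  return x < y ? x : y;
}

/* Model of C23 fmaximum: NaN if either input is NaN, and +0.0 > -0.0.  */
static double
fmaximum_model (double x, double y)
{
  if (isnan (x) || isnan (y))
    return x + y;
  if (x == y)
    return signbit (x) ? y : x;
  return x > y ? x : y;
}
```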




gcc/ChangeLog:

* common/config/riscv/riscv-common.cc: Add zfa extension version.
* config/riscv/constraints.md (Zf): Constrain the floating point number
that the instructions FLI.H/S/D can load.
((TARGET_XTHEADFMV || TARGET_ZFA) ? FP_REGS : NO_REGS): enable FMVP.D.X
and FMVH.X.D.
* config/riscv/iterators.md (ceil): New.
* config/riscv/riscv-protos.h (riscv_float_const_rtx_index_for_fli): New.
* config/riscv/riscv.cc (find_index_in_array): New.
(riscv_float_const_rtx_index_for_fli): Get the index of the floating-point
number that the instructions FLI.H/S/D can mov.
(riscv_cannot_force_const_mem): If instruction FLI.H/S/D can be used,
memory is not applicable.
(riscv_const_insns): The cost of FLI.H/S/D is 3.
(riscv_legitimize_const_move): Likewise.
(riscv_split_64bit_move_p): If instruction FLI.H/S/D can be used, no
split is required.
(riscv_output_move): Output the mov instructions in zfa extension.
(riscv_print_operand): Output the floating-point value of the FLI.H/S/D
immediate in assembly.
* config/riscv/riscv.h (GP_REG_RTX_P): New.
* config/riscv/riscv.md (fminm3): New.




index c448e6b37e9..62d9094f966 100644
--- a/gcc/config/riscv/constraints.md
+++ b/gcc/config/riscv/constraints.md
@@ -118,6 +118,13 @@ (define_constraint "T"
(and (match_operand 0 "move_operand")
 (match_test "CONSTANT_P (op)")))
  
+;; Zfa constraints.

+
+(define_constraint "Zf"
+  "A floating point number that can be loaded using instruction `fli` in zfa."
+  (and (match_code "const_double")
+   (match_test "(riscv_float_const_rtx_index_for_fli (op) != -1)")))
+
  ;; Vector constraints.
  
  (define_register_constraint "vr" "TARGET_VECTOR ? V_REGS : NO_REGS"

@@ -183,8 +190,8 @@ (define_memory_constraint "Wdm"
  
  ;; Vendor ISA extension constraints.
  
-(define_register_constraint "th_f_fmv" "TARGET_XTHEADFMV ? FP_REGS : NO_REGS"

+(define_register_constraint "th_f_fmv" "(TARGET_XTHEADFMV || TARGET_ZFA) ? FP_REGS : NO_REGS"
"A floating-point register for XTheadFmv.")
  
-(define_register_constraint "th_r_fmv" "TARGET_XTHEADFMV ? GR_REGS : NO_REGS"

+(define_register_constraint "th_r_fmv" "(TARGET_XTHEADFMV || TARGET_ZFA) ? GR_REGS : NO_REGS"
"An integer register for XTheadFmv.")
I think Christoph had good suggestions on the constraints.  So let's go 
with his suggestions.


You might consider a follow-up patch where you use negation of one of 
the predefined constants for synthesis.  I would not be surprised at all 
if that's as efficient on some cores as loading the negated constants 
out of the constant pool.  But I don't think it has to be a part of this 
patch.
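The constraint above accepts constants for which `riscv_float_const_rtx_index_for_fli` finds a table index. The following is a hypothetical C analogue for single precision. It also covers the negation follow-up suggested in the review: a value whose negation is in the table could be synthesized as fli.s followed by fneg.s. The table is transcribed from the Zfa draft and should be checked against the specification. Index 31 (canonical NaN) is handled separately. Function names are invented for illustration and do not reflect GCC's implementation.

```c
#include <assert.h>
#include <float.h>
#include <math.h>

/* fli.s values for rs1 = 0..30 (index 31 is the canonical NaN).  */
static const float fli_s_table[31] = {
  -1.0f, FLT_MIN, 0x1p-16f, 0x1p-15f, 0x1p-8f, 0x1p-7f, 0.0625f, 0.125f,
  0.25f, 0.3125f, 0.375f, 0.4375f, 0.5f, 0.625f, 0.75f, 0.875f,
  1.0f, 1.25f, 1.5f, 1.75f, 2.0f, 2.5f, 3.0f, 4.0f,
  8.0f, 16.0f, 128.0f, 256.0f, 0x1p15f, 0x1p16f, INFINITY
};

/* Return the fli.s index for V, or -1 if a single fli.s cannot load it.
   Note that +0.0 and -0.0 are not in the table.  */
static int
fli_s_index (float v)
{
  if (isnan (v))
    return 31;
  for (int i = 0; i < 31; i++)
    if (fli_s_table[i] == v)
      return i;
  return -1;
}

/* Review follow-up idea: fall back to loading -V with fli.s and negating
   it with fneg.s.  Sets *NEEDS_FNEG accordingly; returns -1 if neither
   V nor -V is loadable.  */
static int
fli_s_index_negated (float v, int *needs_fneg)
{
  int idx = fli_s_index (v);
  *needs_fneg = 0;
  if (idx >= 0)
    return idx;
  idx = fli_s_index (-v);
  if (idx >= 0)
    *needs_fneg = 1;
  return idx;
}
```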






diff --git a/gcc/config/riscv/iterators.md b/gcc/config/riscv/iterators.md
index 9b767038452..c81b08e3cc5 100644
--- a/gcc/config/riscv/iterators.md
+++ b/gcc/config/riscv/iterators.md
@@ -288,3 +288,8 @@ (define_int_iterator QUIET_COMPARISON [UNSPEC_FLT_QUIET UNSPEC_FLE_QUIET])
  (define_int_attr quiet_pattern [(UNSPEC_FLT_QUIET "lt") (UNSPEC_FLE_QUIET "le")])
  (define_int_attr QUIET_PATTERN [(UNSPEC_FLT_QUIET "LT") (UNSPEC_FLE_QUIET "LE")])
  
+(define_int_iterator ROUND [UNSPEC_ROUND UNSPEC_FLOOR UNSPEC_CEIL UNSPEC_BTRUNC UNSPEC_ROUNDEVEN UNSPEC_NEARBYINT])

+(define_int_attr round_patter

[committed] CRIS: peephole2 a lsrq into a lslq+lsrq pair

2023-05-05 Thread Hans-Peter Nilsson via Gcc-patches

Observed after opsplit1 with AND in libgcc floating-point
functions, like the first spottings of opsplit1/AND
opportunities.  Two patterns are nominally needed, as the
peephole2 optimizer continues from the *first replacement*
insn, not from a minimum context for general matching; one
that includes it as the last match.

But the "free-standing" opportunity (three shifts) didn't
match by itself in a gcc build of libraries plus running the
test-suite, and was thus deemed uninteresting and left out.
(As expected; if it had matched, that'd have indicated a
previously missed optimization or other problem elsewhere.)
Only the one that includes the previous define_peephole2
that may generate the sequence (i.e. opsplit1/AND), matches
easily.

Coremark results aren't impressive though: 0.003%
improvement in speed and slightly less than 0.1% in size.

One test-case is added to match this pattern, and another to cover a
case of movulsr, checking that movulsr is used; it is preferable to
lsrandsplit1 when both would match.
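The fold described in the lsrandsplit1 comment turns `lsrq N; lslq M; lsrq M` (with N < M) into `lslq M-N; lsrq M`. It holds because both sequences keep bits N through N + 31 - M of the original value and move them to the bottom. A C sketch that exercises the equivalence for 32-bit unsigned values (function names hypothetical):

```c
#include <assert.h>
#include <stdint.h>

/* Sequence left behind by opsplit1/AND: lsrq N,R; lslq M,R; lsrq M,R.  */
static uint32_t
three_shifts (uint32_t r, unsigned n, unsigned m)
{
  r >>= n;
  r <<= m;
  r >>= m;
  return r;
}

/* Folded sequence produced by lsrandsplit1: lslq M-N,R; lsrq M,R.  */
static uint32_t
two_shifts (uint32_t r, unsigned n, unsigned m)
{
  assert (n < m && m < 32);
  r <<= m - n;
  r >>= m;
  return r;
}
```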

gcc:
* config/cris/cris.md (lsrandsplit1): New define_peephole2.

gcc/testsuite:
* gcc.target/cris/peep2-lsrandsplit1.c,
gcc.target/cris/peep2-movulsr2.c: New tests.
---
 gcc/config/cris/cris.md   | 53 +++
 .../gcc.target/cris/peep2-lsrandsplit1.c  | 19 +++
 .../gcc.target/cris/peep2-movulsr2.c  | 19 +++
 3 files changed, 91 insertions(+)
 create mode 100644 gcc/testsuite/gcc.target/cris/peep2-lsrandsplit1.c
 create mode 100644 gcc/testsuite/gcc.target/cris/peep2-movulsr2.c

diff --git a/gcc/config/cris/cris.md b/gcc/config/cris/cris.md
index e72943b942e5..d5aadf752e86 100644
--- a/gcc/config/cris/cris.md
+++ b/gcc/config/cris/cris.md
@@ -2690,6 +2690,59 @@ (define_peephole2 ; movulsr
 = INTVAL (operands[2]) <= 0xff ? GEN_INT (0xff) :  GEN_INT (0x);
 })
 
+;; Avoid, after opsplit1 with AND (below), sequences of:
+;;  lsrq N,R
+;;  lslq M,R
+;;  lsrq M,R
+;; (N < M), where we can fold the first lsrq into the lslq-lsrq, like:
+;;  lslq M-N,R
+;;  lsrq M,R
+;; We have to match this before opsplit1 below and before other peephole2s of
+;; lesser value, since peephole2 matching resumes at the first generated insn,
+;; and thus wouldn't match a pattern of the three shifts after opsplit1/AND.
+;; Note that this lsrandsplit1 is in turn of lesser value than movulsr, since
+;; that one doesn't require the same operand for source and destination, but
+;; they happen to be the same hard-register at peephole2 time even if
+;; naturally separated like in peep2-movulsr2.c, thus this placement.  (Source
+;; and destination will be re-separated and the move optimized out in
+;; cprop_hardreg at time of this writing.)
+;; Testcase: gcc.target/cris/peep2-lsrandsplit1.c
+(define_peephole2 ; lsrandsplit1
+  [(parallel
+[(set (match_operand:SI 0 "register_operand")
+ (lshiftrt:SI
+  (match_operand:SI 1 "register_operand")
+  (match_operand:SI 2 "const_int_operand")))
+ (clobber (reg:CC CRIS_CC0_REGNUM))])
+   (parallel
+[(set (match_operand 3 "register_operand")
+ (and
+  (match_operand 4 "register_operand")
+  (match_operand 5 "const_int_operand")))
+ (clobber (reg:CC CRIS_CC0_REGNUM))])]
+  "REGNO (operands[0]) == REGNO (operands[1])
+   && REGNO (operands[0]) == REGNO (operands[3])
+   && REGNO (operands[0]) == REGNO (operands[4])
+   && (INTVAL (operands[2])
+   < (clz_hwi (INTVAL (operands[5])) - (HOST_BITS_PER_WIDE_INT - 32)))
+   && cris_splittable_constant_p (INTVAL (operands[5]), AND, SImode,
+ optimize_function_for_speed_p (cfun)) == 2"
+  ;; We're guaranteed by the above hw_clz test (certainly non-zero) and the
+  ;; test for a two-insn return-value from cris_splittable_constant_p, that
+  ;; the cris_splittable_constant_p AND-replacement would be lslq-lsrq.
+  [(parallel
+[(set (match_dup 0) (ashift:SI (match_dup 0) (match_dup 9)))
+ (clobber (reg:CC CRIS_CC0_REGNUM))])
+   (parallel
+[(set (match_dup 0) (lshiftrt:SI (match_dup 0) (match_dup 10)))
+ (clobber (reg:CC CRIS_CC0_REGNUM))])]
+{
+  HOST_WIDE_INT shiftval
+= clz_hwi (INTVAL (operands[5])) - (HOST_BITS_PER_WIDE_INT - 32);
+  operands[9] = GEN_INT (shiftval - INTVAL (operands[2]));
+  operands[10] = GEN_INT (shiftval);
+})
+
 ;; Testcase for the following four peepholes: gcc.target/cris/peep2-xsrand.c
 
 (define_peephole2 ; asrandb
diff --git a/gcc/testsuite/gcc.target/cris/peep2-lsrandsplit1.c b/gcc/testsuite/gcc.target/cris/peep2-lsrandsplit1.c
new file mode 100644
index ..0da645358771
--- /dev/null
+++ b/gcc/testsuite/gcc.target/cris/peep2-lsrandsplit1.c
@@ -0,0 +1,19 @@
+/* { dg-do compile } */
+/* { dg-final { scan-assembler-not " and" } } */
+/* { dg-final { scan-assembler-times "lsrq " 2 } } */
+/* { dg-final { scan-assembler-times "lslq " 2 } } */
+/* { dg-options "-O2" } */
+
+/* Test the "lsrlsllsr1" peephole2 trivially.  */
+
+unsigned int
+andwls

Re: [RFC] RISC-V: Add proposed Ztso atomic mappings

2023-05-05 Thread Andrea Parri

On Fri, May 05, 2023 at 02:42:38PM -0700, Hans Boehm wrote:
> I think A.6-tso also needs to change the last line in the table from
> lr.aqrl ... sc to lr.aq ... sc.rl, otherwise I think we have problems with
> a subsequent A.7-tso generated l.aq . Otherwise I agree.

Indeed!  Thanks for the correction.

  Andrea
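The mapping tables in this thread are keyed by C11/C++11 atomic constructs. The sketch below shows a few of those constructs in a message-passing pair. The instruction sequences in the comments restate the proposals quoted later in the thread: A.6-tso for the seq_cst store and the A.7-tso subset table for the loads. They are not claimed to be what any released GCC emits, and the variable and function names are hypothetical.

```c
#include <assert.h>
#include <stdatomic.h>

static atomic_int flag;   /* Zero-initialized: nothing published yet.  */
static int payload;       /* Non-atomic data published through FLAG.  */

static void
publish (int v)
{
  payload = v;                                             /* plain sw */
  atomic_store_explicit (&flag, 1, memory_order_seq_cst);  /* A.6-tso: sw ; fence rw,rw */
}

static int
consume (void)
{
  if (atomic_load_explicit (&flag, memory_order_acquire))  /* A.7-tso subset: lw */
    return payload;                                        /* Non-atomic load: lw */
  return -1;
}
```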

Re: [RFC] RISC-V: Add proposed Ztso atomic mappings

2023-05-05 Thread Andrew Waterman via Gcc-patches

On Fri, May 5, 2023 at 2:42 PM Hans Boehm  wrote:
>
> I think A.6-tso also needs to change the last line in the table from lr.aqrl 
> ... sc to lr.aq ... sc.rl, otherwise I think we have problems with a 
> subsequent A.7-tso generated l.aq . Otherwise I agree.
>
> I certainly agree that, given the Ztso extension, there should be a standard 
> compiler-implemented mapping that leverages it. I'm personally much less 
> enthusiastic about calling it an ABI. I'd like to see clarity that the RVWMO 
> ABI is the standard we expect portable libraries to be prepared to use.

There's already a ratified sentiment that effectively implies this.
Ztso is not required by the RVA profiles, and so it follows that any
binary that's compatible across RVA-profile implementations cannot
assume the presence of Ztso.  (I agree the ABI should encode this
property for de jure purposes, too, but it's already a de facto
requirement.)

> If they want to test for and use Ztso internally, fine. But having users deal 
> with two different ABIs seems like a very high cost for avoiding some 
> (basically no-op?) fences.
>
> Hans
>
>
>
> On Fri, May 5, 2023 at 1:11 PM Andrea Parri  wrote:
>>
>> On Fri, May 05, 2023 at 12:18:12PM -0700, Palmer Dabbelt wrote:
>> > On Fri, 05 May 2023 11:55:31 PDT (-0700), Andrea Parri wrote:
>> > > On Fri, May 05, 2023 at 10:12:56AM -0700, Patrick O'Neill wrote:
>> > > > The RISC-V Ztso extension currently has no effect on generated code.
>> > > > With the additional ordering constraints guaranteed by Ztso, we can 
>> > > > emit
>> > > > more optimized atomic mappings than the RVWMO mappings.
>> > > >
>> > > > This patch implements Andrea Parri's proposed Ztso mappings ("Proposed
>> > > > Mapping").
>> > > >   https://github.com/preames/public-notes/blob/master/riscv-tso-mappings.rst
>> > > >
>> > > > LLVM has implemented this same mapping (Ztso is still behind an
>> > > > experimental flag in LLVM, so there is *not* a defined ABI for this
>> > > > yet).
>> > > >   https://reviews.llvm.org/D143076
>> > >
>> > > Given the recent patches/discussions, it seems worth pointing out that
>> > > the Ztso mappings referred to above were designed to be compatible with
>> > > the mappings in Table A.6 and that they are _not_ compatible with the
>> > > mappings in Table A.7 or with a "subset" of A.7 (even assuming RVTSO).
>> >
>> > I guess that brings up the question of what we should do about WMO/TSO
>> > compatibility.  IIUC the general plan has been that WMO binaries would be
>> > compatible with TSO binaries when run on TSO systems, and that TSO binaries
>> > would require TSO systems.
>> >
>> > I suppose it would be possible to have TSO produce binaries that would run
>> > on WMO systems by just emitting a bunch of extra fences, but I don't think
>> > anyone wants that?
>> >
>> > We've always just assumed that WMO binaries would be compatible with TSO
>> > binaries, but I don't think it's ever really been concretely discussed.
>> > Having an ABI break here wouldn't be the craziest idea as it'd let us fix
>> > some other issues, but that'd certainly need to be pretty widely discussed.
>> >
>> > Do we have an idea of what A.7-compatible TSO mappings would look like?
>>
>> As in riscv-tso-mappings.rst but with
>>
>>   atomic_store(memory_order_seq_cst)  |  s{b|h|w|d} ; fence rw,rw
>>
>> would be A.7-compatible: call the resulting mappings "A.6-tso".
>>
>> A.6-tso is (also) compatible with the following subset of A.7:
>>
>> C/C++ Construct                           | A.7-tso Mapping
>> --
>> Non-atomic load                           | l{b|h|w|d}
>> atomic_load(memory_order_relaxed)         | l{b|h|w|d}
>> atomic_load(memory_order_acquire)         | l{b|h|w|d}
>> atomic_load(memory_order_seq_cst)         | l{b|h|w|d}.aq
>> --
>> Non-atomic store                          | s{b|h|w|d}
>> atomic_store(memory_order_relaxed)        | s{b|h|w|d}
>> atomic_store(memory_order_release)        | s{b|h|w|d}
>> atomic_store(memory_order_seq_cst)        | s{b|h|w|d}.rl
>> --
>> atomic_thread_fence(memory_order_acquire) | nop
>> atomic_thread_fence(memory_order_release) | nop
>> atomic_thread_fence(memory_order_acq_rel) | nop
>> atomic_thread_fence(memory_order_seq_cst) | fence rw,rw
>> --
>> C/C++ Construct                           | RVTSO AMO Mapping
>> atomic_(memory_order_relaxed)             | amo.{w|d}
>> atomic_(memory_order_acquire)             | amo.{w|d}
>> atomic_(memory_order_release)             | amo.{w|d}
>> atomic_(memory_order_acq_rel)             | amo.{w|d}
>> atomic_(memory_order_seq_cst)             | amo.{w|d}
>

Re: [RFC] RISC-V: Add proposed Ztso atomic mappings

2023-05-05 Thread Hans Boehm via Gcc-patches

I think A.6-tso also needs to change the last line in the table from
lr.aqrl ... sc to lr.aq ... sc.rl; otherwise I think we have problems with
a subsequent A.7-tso-generated l.aq. Otherwise I agree.

I certainly agree that, given the Ztso extension, there should be a
standard compiler-implemented mapping that leverages it. I'm personally
much less enthusiastic about calling it an ABI. I'd like to see clarity
that the RVWMO ABI is the standard we expect portable libraries to be
prepared to use. If they want to test for and use Ztso internally, fine.
But having users deal with two different ABIs seems like a very high cost
for avoiding some (basically no-op?) fences.

Hans



On Fri, May 5, 2023 at 1:11 PM Andrea Parri  wrote:

> On Fri, May 05, 2023 at 12:18:12PM -0700, Palmer Dabbelt wrote:
> > On Fri, 05 May 2023 11:55:31 PDT (-0700), Andrea Parri wrote:
> > > On Fri, May 05, 2023 at 10:12:56AM -0700, Patrick O'Neill wrote:
> > > > The RISC-V Ztso extension currently has no effect on generated code.
> > > > With the additional ordering constraints guaranteed by Ztso, we can emit
> > > > more optimized atomic mappings than the RVWMO mappings.
> > > >
> > > > This patch implements Andrea Parri's proposed Ztso mappings ("Proposed
> > > > Mapping").
> > > >   https://github.com/preames/public-notes/blob/master/riscv-tso-mappings.rst
> > > >
> > > > LLVM has implemented this same mapping (Ztso is still behind an
> > > > experimental flag in LLVM, so there is *not* a defined ABI for this yet).
> > > >   https://reviews.llvm.org/D143076
> > >
> > > Given the recent patches/discussions, it seems worth pointing out that
> > > the Ztso mappings referred to above were designed to be compatible with
> > > the mappings in Table A.6 and that they are _not_ compatible with the
> > > mappings in Table A.7 or with a "subset" of A.7 (even assuming RVTSO).
> >
> > I guess that brings up the question of what we should do about WMO/TSO
> > compatibility.  IIUC the general plan has been that WMO binaries would be
> > compatible with TSO binaries when run on TSO systems, and that TSO binaries
> > would require TSO systems.
> >
> > I suppose it would be possible to have TSO produce binaries that would run
> > on WMO systems by just emitting a bunch of extra fences, but I don't think
> > anyone wants that?
> >
> > We've always just assumed that WMO binaries would be compatible with TSO
> > binaries, but I don't think it's ever really been concretely discussed.
> > Having an ABI break here wouldn't be the craziest idea as it'd let us fix
> > some other issues, but that'd certainly need to be pretty widely discussed.
> >
> > Do we have an idea of what A.7-compatible TSO mappings would look like?
>
> As in riscv-tso-mappings.rst but with
>
>   atomic_store(memory_order_seq_cst)  |  s{b|h|w|d} ; fence rw,rw
>
> would be A.7-compatible: call the resulting mappings "A.6-tso".
>
> A.6-tso is (also) compatible with the following subset of A.7:
>
> C/C++ Construct                           | A.7-tso Mapping
> --
> Non-atomic load                           | l{b|h|w|d}
> atomic_load(memory_order_relaxed)         | l{b|h|w|d}
> atomic_load(memory_order_acquire)         | l{b|h|w|d}
> atomic_load(memory_order_seq_cst)         | l{b|h|w|d}.aq
> --
> Non-atomic store                          | s{b|h|w|d}
> atomic_store(memory_order_relaxed)        | s{b|h|w|d}
> atomic_store(memory_order_release)        | s{b|h|w|d}
> atomic_store(memory_order_seq_cst)        | s{b|h|w|d}.rl
> --
> atomic_thread_fence(memory_order_acquire) | nop
> atomic_thread_fence(memory_order_release) | nop
> atomic_thread_fence(memory_order_acq_rel) | nop
> atomic_thread_fence(memory_order_seq_cst) | fence rw,rw
> --
> C/C++ Construct                           | RVTSO AMO Mapping
> atomic_(memory_order_relaxed)             | amo.{w|d}
> atomic_(memory_order_acquire)             | amo.{w|d}
> atomic_(memory_order_release)             | amo.{w|d}
> atomic_(memory_order_acq_rel)             | amo.{w|d}
> atomic_(memory_order_seq_cst)             | amo.{w|d}
> --
> C/C++ Construct                           | RVTSO LR/SC Mapping
> atomic_(memory_order_relaxed)             | loop: lr.{w|d} ;  ;
>                                           |   sc.{w|d} ; bnez loop
> atomic_(memory_order_acquire)             | loop: lr.{w|d} ;  ;
>                                           |   sc.{w|d} ; bnez loop
> atomic_(memory_order_release)             | loop: lr.{w|d}

Re: [PATCH] libffi: fix handling of homogeneous float128 structs [PR109447]

2023-05-05 Thread Jakub Jelinek via Gcc-patches

On Thu, May 04, 2023 at 02:29:34PM -0500, Peter Bergner wrote:
> I'd like to pull in Dan's upstream libffi commit into trunk to fix a
> wrong code bug/testsuite failure on powerpc64le-linux with long double
> defaulting to ieee128.  This passed bootstrap and regtesting with no
> regressions.  Ok for trunk?
> 
> This bug is also on the GCC 12 and GCC 11 release branches. Ok there too
> assuming testing is clean?  I can wait to push the gcc12 backport until
> after the release.
> 
> Peter
> 
> 
> If there is a homogeneous struct with float128 members, they should be
> copied to vector register save area. The current code incorrectly copies
> only the value of the first member, not increasing the pointer with each
> iteration. Fix this.
> 
> Merged from upstream libffi commit: 464b4b66e3cf3b5489e730c1466ee1bf825560e0
> 
> 2023-05-03  Dan Horák 
> 
> libffi/
>   PR libffi/109447
>   * src/powerpc/ffi_linux64.c (ffi_prep_args64): Update arg.f128 pointer.

Ok for 14/13.2/12.4 (i.e. after 12.3 is out)/11.4

> diff --git a/libffi/src/powerpc/ffi_linux64.c b/libffi/src/powerpc/ffi_linux64.c
> index 4d50878e402..3454dacd3d6 100644
> --- a/libffi/src/powerpc/ffi_linux64.c
> +++ b/libffi/src/powerpc/ffi_linux64.c
> @@ -680,7 +680,7 @@ ffi_prep_args64 (extended_cif *ecif, unsigned long *const stack)
>  {
>if (vecarg_count < NUM_VEC_ARG_REGISTERS64
>&& i < nfixedargs)
> - memcpy (vec_base.f128++, arg.f128, sizeof (float128));
> + memcpy (vec_base.f128++, arg.f128++, sizeof (float128));
>else
>   memcpy (next_arg.f128, arg.f128++, sizeof (float128));
>if (++next_arg.f128 == gpr_end.f128)

Jakub
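The one-line fix quoted above is a pointer-advance bug: the destination pointer moved on each iteration but the source did not. A minimal standalone sketch of the same copy loop, assuming `double` as a stand-in for libffi's `float128` and using hypothetical function names (`copy_members_buggy`, `copy_members_fixed`), shows why only the first member of a homogeneous struct reached the vector register save area:

```cpp
#include <cassert>
#include <cstddef>
#include <cstring>

// Stand-in element type (assumption); the real code copies float128 values.
using elem_t = double;

// Pre-fix pattern: vec_base advances but arg does not, so every slot
// in the save area receives a copy of the first member.
void copy_members_buggy (elem_t *vec_base, const elem_t *arg, std::size_t n)
{
  for (std::size_t i = 0; i < n; i++)
    std::memcpy (vec_base++, arg, sizeof (elem_t));
}

// Post-fix pattern: source and destination advance together.
void copy_members_fixed (elem_t *vec_base, const elem_t *arg, std::size_t n)
{
  for (std::size_t i = 0; i < n; i++)
    std::memcpy (vec_base++, arg++, sizeof (elem_t));
}
```

With members {1, 2, 3}, the buggy loop fills the save area with {1, 1, 1}, while the fixed loop produces {1, 2, 3}.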

Re: [PATCH] libffi: fix handling of homogeneous float128 structs [PR109447]

2023-05-05 Thread Peter Bergner via Gcc-patches

On 5/4/23 2:29 PM, Peter Bergner wrote:
> I'd like to pull in Dan's upstream libffi commit into trunk to fix a
> wrong code bug/testsuite failure on powerpc64le-linux with long double
> defaulting to ieee128.  This passed bootstrap and regtesting with no
> regressions.  Ok for trunk?
> 
> This bug is also on the GCC 12 and GCC 11 release branches. Ok there too
> assuming testing is clean?  I can wait to push the gcc12 backport until
> after the release.

Oops, and of course, this needs to be backported to GCC 13 as well.

Peter

[PATCH] gimple-range-op: Improve handling of sin/cos ranges

2023-05-05 Thread Jakub Jelinek via Gcc-patches

Hi!

Similarly to the earlier sqrt patch, this patch attempts to improve
sin/cos ranges.  As the functions are periodic, for the reverse range
there is not much we can do (but I've discovered I forgot to take
into account the boundary ulps for the discovery of impossible result
ranges).  For fold_range, we can do something only if the range is
narrow enough (narrower than 2*pi).  The patch computes the value of
the functions (taking ulps into account) and also computes the derivative
to find out if the function is growing or declining on the boundaries and
from that it figures out if the result range should be
[min (fn (lb), fn (ub)), max (fn (lb), fn (ub))] or if it needs to be
extended to 1 (actually using +Inf) and/or -1 (actually using -Inf) because
there must be a local minimum and/or maximum in the range.
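The boundary and derivative reasoning described above can be sketched with plain `double` arithmetic. This is a hypothetical illustration (`sin_range` is not a GCC function), not the patch itself: it skips the ulps widening the patch performs, trusts the sign of `std::cos` at each end (which real code cannot do without accounting for rounding), handles only sin (for cos the derivative is -sin), and returns the full [-1, 1] whenever it cannot narrow. Pi is computed as acos(-1), the same identity `dconst_pi_ptr` uses with MPFR.

```cpp
#include <algorithm>
#include <cassert>
#include <cmath>
#include <utility>

// Conservative result range of sin (x) for x in [lb, ub].
// Hypothetical sketch: no rounding-error (ulps) widening is done.
std::pair<double, double> sin_range (double lb, double ub)
{
  const std::pair<double, double> full (-1.0, 1.0);
  const double pi = std::acos (-1.0);
  const double diff = ub - lb;
  // Only narrow if the argument range is finite and shorter than 2*pi.
  if (!std::isfinite (diff) || !(diff < 2 * pi))
    return full;

  const double s_lb = std::sin (lb), s_ub = std::sin (ub);
  double lo = std::min (s_lb, s_ub);
  double hi = std::max (s_lb, s_ub);

  // The derivative of sin is cos; its sign says whether sin grows or
  // declines at each boundary.
  const bool lb_declines = std::cos (lb) < 0;
  const bool ub_declines = std::cos (ub) < 0;

  if (lb_declines == ub_declines)
    {
      // Same direction at both ends: monotonic when diff < pi,
      // otherwise both a maximum and a minimum lie inside.
      if (!(diff < pi))
        return full;
    }
  else if (lb_declines)
    lo = -1.0;  // declines, then grows: a local minimum lies inside
  else
    hi = 1.0;   // grows, then declines: a local maximum lies inside
  return {lo, hi};
}
```

For example, `sin_range (0.0, 3.0)` yields [0, 1] because sin grows at 0 and declines at 3, so the maximum at pi/2 must be included; `sin_range (-1.0, 1.0)` stays at [sin (-1), sin (1)].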

Bootstrapped/regtested on x86_64-linux and i686-linux, ok for trunk?

2023-05-05  Jakub Jelinek  

* real.h (dconst_pi): Define.
(dconst_e_ptr): Formatting fix.
(dconst_pi_ptr): Declare.
* real.cc (dconst_pi_ptr): New function.
* gimple-range-op.cc (cfn_sincos::fold_range): Intersect the generic
boundaries range with range computed from sin/cos of the particular
bounds if the argument range is shorter than 2*pi.
(cfn_sincos::op1_range): Take bulps into account when determining
which result ranges are always invalid or behave like known NAN.

* gcc.dg/tree-ssa/range-sincos-2.c: New test.

--- gcc/real.h.jj   2023-04-19 09:33:59.434350121 +0200
+++ gcc/real.h  2023-05-05 16:36:35.606611170 +0200
@@ -480,9 +480,13 @@ extern REAL_VALUE_TYPE dconstninf;
 #define dconst_sixth() (*dconst_sixth_ptr ())
 #define dconst_ninth() (*dconst_ninth_ptr ())
 #define dconst_sqrt2() (*dconst_sqrt2_ptr ())
+#define dconst_pi() (*dconst_pi_ptr ())
 
 /* Function to return the real value special constant 'e'.  */
-extern const REAL_VALUE_TYPE * dconst_e_ptr (void);
+extern const REAL_VALUE_TYPE *dconst_e_ptr (void);
+
+/* Function to return the real value special constant 'pi'.  */
+extern const REAL_VALUE_TYPE *dconst_pi_ptr (void);
 
 /* Returns a cached REAL_VALUE_TYPE corresponding to 1/n, for various n.  */
 extern const REAL_VALUE_TYPE *dconst_third_ptr (void);
--- gcc/real.cc.jj  2023-04-20 09:36:09.066376175 +0200
+++ gcc/real.cc 2023-05-05 16:39:25.244201299 +0200
@@ -2475,6 +2475,26 @@ dconst_e_ptr (void)
   return &value;
 }
 
+/* Returns the special REAL_VALUE_TYPE corresponding to 'pi'.  */
+
+const REAL_VALUE_TYPE *
+dconst_pi_ptr (void)
+{
+  static REAL_VALUE_TYPE value;
+
+  /* Initialize mathematical constants for constant folding builtins.
+     These constants need to be given to at least 160 bits precision.  */
+  if (value.cl == rvc_zero)
+    {
+      auto_mpfr m (SIGNIFICAND_BITS);
+      mpfr_set_si (m, -1, MPFR_RNDN);
+      mpfr_acos (m, m, MPFR_RNDN);
+      real_from_mpfr (&value, m, NULL_TREE, MPFR_RNDN);
+
+    }
+  return &value;
+}
+
 /* Returns a cached REAL_VALUE_TYPE corresponding to 1/n, for various n.  */
 
 #define CACHED_FRACTION(NAME, N)   \
--- gcc/gimple-range-op.cc.jj   2023-05-05 16:02:48.174419009 +0200
+++ gcc/gimple-range-op.cc  2023-05-05 19:44:27.292304968 +0200
@@ -633,6 +633,98 @@ public:
   }
 if (!lh.maybe_isnan () && !lh.maybe_isinf ())
   r.clear_nan ();
+
+unsigned ulps
+  = targetm.libm_function_max_error (m_cfn, TYPE_MODE (type), false);
+if (ulps == ~0U)
+  return true;
+REAL_VALUE_TYPE lb = lh.lower_bound ();
+REAL_VALUE_TYPE ub = lh.upper_bound ();
+REAL_VALUE_TYPE diff;
+real_arithmetic (&diff, MINUS_EXPR, &ub, &lb);
+if (!real_isfinite (&diff))
+  return true;
+REAL_VALUE_TYPE pi = dconst_pi ();
+REAL_VALUE_TYPE pix2;
+real_arithmetic (&pix2, PLUS_EXPR, &pi, &pi);
+// We can only try to narrow the range further if ub-lb < 2*pi.
+if (!real_less (&diff, &pix2))
+  return true;
+REAL_VALUE_TYPE lb_lo, lb_hi, ub_lo, ub_hi;
+REAL_VALUE_TYPE lb_deriv_lo, lb_deriv_hi, ub_deriv_lo, ub_deriv_hi;
+if (!frange_mpfr_arg1 (&lb_lo, &lb_hi,
+  m_cfn == CFN_SIN ? mpfr_sin : mpfr_cos, lb,
+  type, ulps)
+   || !frange_mpfr_arg1 (&ub_lo, &ub_hi,
+ m_cfn == CFN_SIN ? mpfr_sin : mpfr_cos, ub,
+ type, ulps)
+   || !frange_mpfr_arg1 (&lb_deriv_lo, &lb_deriv_hi,
+ m_cfn == CFN_SIN ? mpfr_cos : mpfr_sin, lb,
+ type, 0)
+   || !frange_mpfr_arg1 (&ub_deriv_lo, &ub_deriv_hi,
+ m_cfn == CFN_SIN ? mpfr_cos : mpfr_sin, ub,
+ type, 0))
+  return true;
+if (m_cfn == CFN_COS)
+  {
+   // Derivative of cos is -sin, so negate.
+   lb_deriv_lo.sign ^= 1;
+   lb_deriv_hi.sign ^= 1;
+   ub_deriv_lo.sign ^= 1;
+   ub_deriv_hi.si

Re: [PATCH 5/5] match.pd: Use splits in makefile and make configurable.

2023-05-05 Thread Jakub Jelinek via Gcc-patches

On Fri, May 05, 2023 at 03:37:47PM +, Tamar Christina wrote:
> > 2023-05-05  Jakub Jelinek  
> > 
> > * Makefile.in (check_p_numbers): Rename to one_to_, move
> > earlier with helper variables also renamed.
> > (MATCH_SPLITS_SEQ): Use $(wordlist 1,$(NUM_MATCH_SPLITS),$(one_to_))
> > instead of $(shell seq 1 $(NUM_MATCH_SPLITS)).
> > (check_p_subdirs): Use $(one_to_) instead of $(check_p_numbers).

Passed bootstrap/regtest on x86_64-linux and i686-linux, ok for trunk?

Jakub

Re: [RFC] RISC-V: Add proposed Ztso atomic mappings

2023-05-05 Thread Andrea Parri

On Fri, May 05, 2023 at 12:18:12PM -0700, Palmer Dabbelt wrote:
> On Fri, 05 May 2023 11:55:31 PDT (-0700), Andrea Parri wrote:
> > On Fri, May 05, 2023 at 10:12:56AM -0700, Patrick O'Neill wrote:
> > > The RISC-V Ztso extension currently has no effect on generated code.
> > > With the additional ordering constraints guaranteed by Ztso, we can emit
> > > more optimized atomic mappings than the RVWMO mappings.
> > > 
> > > This patch implements Andrea Parri's proposed Ztso mappings ("Proposed
> > > Mapping").
> > >   
> > > https://github.com/preames/public-notes/blob/master/riscv-tso-mappings.rst
> > > 
> > > LLVM has implemented this same mapping (Ztso is still behind an
> > > experimental flag in LLVM, so there is *not* a defined ABI for this yet).
> > >   https://reviews.llvm.org/D143076
> > 
> > Given the recent patches/discussions, it seems worth pointing out that
> > the Ztso mappings referred to above were designed to be compatible with
> > the mappings in Table A.6 and that they are _not_ compatible with the
> > mappings in Table A.7 or with a "subset" of A.7 (even assuming RVTSO).
> 
> I guess that brings up the question of what we should do about WMO/TSO
> compatibility.  IIUC the general plan has been that WMO binaries would be
> compatible with TSO binaries when run on TSO systems, and that TSO binaries
> would require TSO systems.
> 
> I suppose it would be possible to have TSO produce binaries that would run
> on WMO systems by just emitting a bunch of extra fences, but I don't think
> anyone wants that?
> 
> We've always just assumed that WMO binaries would be compatible with TSO
> binaries, but I don't think it's ever really been concretely discussed.
> Having an ABI break here wouldn't be the craziest idea as it'd let us fix
> some other issues, but that'd certainly need to be pretty widely discussed.
> 
> Do we have an idea of what A.7-compatible TSO mappings would look like?

As in riscv-tso-mappings.rst but with

  atomic_store(memory_order_seq_cst)  |  s{b|h|w|d} ; fence rw,rw

would be A.7-compatible: call the resulting mappings "A.6-tso".

A.6-tso is (also) compatible with the following subset of A.7:

C/C++ Construct                           | A.7-tso Mapping
--
Non-atomic load                           | l{b|h|w|d}
atomic_load(memory_order_relaxed)         | l{b|h|w|d}
atomic_load(memory_order_acquire)         | l{b|h|w|d}
atomic_load(memory_order_seq_cst)         | l{b|h|w|d}.aq
--
Non-atomic store                          | s{b|h|w|d}
atomic_store(memory_order_relaxed)        | s{b|h|w|d}
atomic_store(memory_order_release)        | s{b|h|w|d}
atomic_store(memory_order_seq_cst)        | s{b|h|w|d}.rl
--
atomic_thread_fence(memory_order_acquire) | nop
atomic_thread_fence(memory_order_release) | nop
atomic_thread_fence(memory_order_acq_rel) | nop
atomic_thread_fence(memory_order_seq_cst) | fence rw,rw
--
C/C++ Construct                           | RVTSO AMO Mapping
atomic_(memory_order_relaxed)             | amo.{w|d}
atomic_(memory_order_acquire)             | amo.{w|d}
atomic_(memory_order_release)             | amo.{w|d}
atomic_(memory_order_acq_rel)             | amo.{w|d}
atomic_(memory_order_seq_cst)             | amo.{w|d}
--
C/C++ Construct                           | RVTSO LR/SC Mapping
atomic_(memory_order_relaxed)             | loop: lr.{w|d} ;  ;
                                          |   sc.{w|d} ; bnez loop
atomic_(memory_order_acquire)             | loop: lr.{w|d} ;  ;
                                          |   sc.{w|d} ; bnez loop
atomic_(memory_order_release)             | loop: lr.{w|d} ;  ;
                                          |   sc.{w|d} ; bnez loop
atomic_(memory_order_acq_rel)             | loop: lr.{w|d} ;  ;
                                          |   sc.{w|d} ; bnez loop
atomic_(memory_order_seq_cst)             | loop: lr.{w|d}.aq ;  ;
                                          |   sc.{w|d}.rl ; bnez loop

  Andrea
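The left-hand column of the A.7-tso table above corresponds to ordinary `std::atomic` calls. The sketch below is only a hedged reading aid: the function names are hypothetical, the trailing comments repeat the table's A.7-tso column for the 32-bit case (`lw`, `sw`, `amoadd.w`), and nothing here checks what a given compiler actually emits for a Ztso target. Compare-exchange is shown separately because it is typically expanded through the LR/SC rows rather than a single AMO.

```cpp
#include <atomic>
#include <cassert>

std::atomic<int> x{0};

// Loads: plain load, except seq_cst, which the table maps to l{b|h|w|d}.aq.
int load_acquire () { return x.load (std::memory_order_acquire); }  // lw
int load_seq_cst () { return x.load (std::memory_order_seq_cst); }  // lw.aq

// Stores: plain store, except seq_cst, which the table maps to s{b|h|w|d}.rl.
void store_release (int v) { x.store (v, std::memory_order_release); }  // sw
void store_seq_cst (int v) { x.store (v, std::memory_order_seq_cst); }  // sw.rl

// Fences: only the seq_cst fence needs an instruction in the table.
void fence_acq_rel () { std::atomic_thread_fence (std::memory_order_acq_rel); }  // nop
void fence_seq_cst () { std::atomic_thread_fence (std::memory_order_seq_cst); }  // fence rw,rw

// AMO read-modify-write: the same amo.{w|d} for every memory order.
int fetch_add_seq_cst (int v)
{
  return x.fetch_add (v, std::memory_order_seq_cst);  // amoadd.w
}

// LR/SC read-modify-write: for seq_cst the table uses lr.w.aq ... sc.w.rl.
bool cas_seq_cst (int &expected, int desired)
{
  return x.compare_exchange_strong (expected, desired,
                                    std::memory_order_seq_cst);
}
```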

Re: [PATCH] Fortran: overloading of intrinsic binary operators [PR109641]

2023-05-05 Thread Harald Anlauf via Gcc-patches


Hi Mikael,

On 5/5/23 13:43, Mikael Morin wrote:

Hello,

On 01/05/2023 at 18:29, Harald Anlauf via Fortran wrote:



+/* Given two expressions, check that their rank is conformable, i.e. either
+   both have the same rank or at least one is a scalar.  */
+
+bool
+gfc_op_rank_conformable (gfc_expr *op1, gfc_expr *op2)
+{
+//  if (op1->expr_type == EXPR_VARIABLE && op1->ref)

Please remove this, and the other one below.


oops, that was a leftover from debugging sessions, which
I missed during my final pass.  Fixed and pushed as
r14-529-g185da7c2014ba41f38dd62cc719873ebf020b076.

Thanks for the review!

Harald


+  if (op1->expr_type == EXPR_VARIABLE)
+    gfc_expression_rank (op1);
+//  if (op2->expr_type == EXPR_VARIABLE && op2->ref)
+  if (op2->expr_type == EXPR_VARIABLE)
+    gfc_expression_rank (op2);
+
+  return (op1->rank == 0 || op2->rank == 0 || op1->rank == op2->rank);
+}
+
+
 static void
 add_caf_get_intrinsic (gfc_expr *e)
 {


The rest looks good.
OK for master, and backport as well.

Thanks
Mikael
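The predicate reviewed above reduces to a simple rule on ranks. A hypothetical C++ restatement, assuming an `expr` struct as a stand-in for `gfc_expr` and leaving out the `gfc_expression_rank` refresh the patch performs for variables, might look like:

```cpp
#include <cassert>

// Hypothetical stand-in for gfc_expr: only the rank is modelled.
struct expr
{
  int rank;  // 0 for a scalar
};

// Operands of an intrinsic binary operator are rank-conformable when
// either one is a scalar or both have the same rank.  Agreement of the
// actual extents is a separate check and is not modelled here.
bool op_rank_conformable (const expr &op1, const expr &op2)
{
  return op1.rank == 0 || op2.rank == 0 || op1.rank == op2.rank;
}
```

So a scalar combined with a rank-2 array is accepted, two rank-1 operands are accepted, and a rank-1 with a rank-2 operand is rejected.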

Re: [RFC] RISC-V: Add proposed Ztso atomic mappings

2023-05-05 Thread Palmer Dabbelt


On Fri, 05 May 2023 11:55:31 PDT (-0700), Andrea Parri wrote:

On Fri, May 05, 2023 at 10:12:56AM -0700, Patrick O'Neill wrote:

The RISC-V Ztso extension currently has no effect on generated code.
With the additional ordering constraints guarenteed by Ztso, we can emit
more optimized atomic mappings than the RVWMO mappings.

This patch implements Andrea Parri's proposed Ztso mappings ("Proposed
Mapping").
  https://github.com/preames/public-notes/blob/master/riscv-tso-mappings.rst

LLVM has implemented this same mapping (Ztso is still behind a
experimental flag in LLVM, so there is *not* a defined ABI for this yet).
  https://reviews.llvm.org/D143076


Given the recent patches/discussions, it seems worth pointing out the
the Ztso mappings referred to above was designed to be compatible with
the mappings in Table A.6 and that they are _not_ compatible with the
mappings in Table A.7 or with a "subset" of A.7 (even assuming RVTSO).


I guess that brings up the question of what we should do about WMO/TSO 
compatibility.  IIUC the general plan has been that WMO binaries would 
be compatible with TSO binaries when run on TSO systems, and that TSO 
binaries would require TSO systems.


I suppose it would be possible to have TSO produce binaries that would 
run on WMO systems by just emitting a bunch of extra fences, but I don't 
think anyone wants that?


We've always just assumed that WMO binaries would be compatible with TSO 
binaries, but I don't think it's ever really been concretely discussed.  
Having an ABI break here wouldn't be the craziest idea as it'd let us 
fix some other issues, but that'd certainly need to be pretty widely 
discussed.


Do we have an idea of what A.7-compatible TSO mappings would look like?

Re: [RFC] RISC-V: Add proposed Ztso atomic mappings

2023-05-05 Thread Andrea Parri

On Fri, May 05, 2023 at 10:12:56AM -0700, Patrick O'Neill wrote:
> The RISC-V Ztso extension currently has no effect on generated code.
> With the additional ordering constraints guarenteed by Ztso, we can emit
> more optimized atomic mappings than the RVWMO mappings.
> 
> This patch implements Andrea Parri's proposed Ztso mappings ("Proposed
> Mapping").
>   https://github.com/preames/public-notes/blob/master/riscv-tso-mappings.rst
> 
> LLVM has implemented this same mapping (Ztso is still behind a
> experimental flag in LLVM, so there is *not* a defined ABI for this yet).
>   https://reviews.llvm.org/D143076

Given the recent patches/discussions, it seems worth pointing out the
the Ztso mappings referred to above was designed to be compatible with
the mappings in Table A.6 and that they are _not_ compatible with the
mappings in Table A.7 or with a "subset" of A.7 (even assuming RVTSO).

  Andrea

Re: [PATCH] c++: list CTAD and resolve_nondeduced_context [PR106214]

2023-05-05 Thread Jason Merrill via Gcc-patches


On 5/5/23 13:36, Patrick Palka wrote:

This extends the PR93107 fix, which made us do resolve_nondeduced_context
on the elements of an initializer list during auto deduction, to happen
for CTAD as well.

Bootstrapped and regtested on x86_64-pc-linux-gnu, does this look OK for
trunk?

PR c++/106214
PR c++/93107

gcc/cp/ChangeLog:

* pt.cc (do_auto_deduction): Move up resolve_nondeduced_context
calls to happen before do_class_deduction.  Add some error_mark_node
tests.


Maybe move them even higher?  I suppose it shouldn't actually make a 
difference, but this seems to make sense right after the early returns.


OK either way.


gcc/testsuite/ChangeLog:

* g++.dg/cpp1z/class-deduction114.C: New test.
---
  gcc/cp/pt.cc  | 27 +-
  .../g++.dg/cpp1z/class-deduction114.C | 28 +++
  2 files changed, 41 insertions(+), 14 deletions(-)
  create mode 100644 gcc/testsuite/g++.dg/cpp1z/class-deduction114.C

diff --git a/gcc/cp/pt.cc b/gcc/cp/pt.cc
index 8d66fde9f11..94e1664d00c 100644
--- a/gcc/cp/pt.cc
+++ b/gcc/cp/pt.cc
@@ -30793,7 +30793,7 @@ do_auto_deduction (tree type, tree init, tree auto_node,
   int flags /* = LOOKUP_NORMAL */,
   tree tmpl /* = NULL_TREE */)
  {
-  if (init == error_mark_node)
+  if (type == error_mark_node || init == error_mark_node)
  return error_mark_node;
  
if (init && type_dependent_expression_p (init)
@@ -30827,6 +30827,17 @@ do_auto_deduction (tree type, tree init, tree auto_node,
/*return*/true)))
  init = r;
  
+  if (init && BRACE_ENCLOSED_INITIALIZER_P (init))
+{
+  /* We don't recurse here because we can't deduce from a nested
+initializer_list.  */
+  if (CONSTRUCTOR_ELTS (init))
+   for (constructor_elt &elt : CONSTRUCTOR_ELTS (init))
+ elt.value = resolve_nondeduced_context (elt.value, complain);
+}
+  else if (init)
+init = resolve_nondeduced_context (init, complain);
+
if (tree ctmpl = CLASS_PLACEHOLDER_TEMPLATE (auto_node))
  /* C++17 class template argument deduction.  */
  return do_class_deduction (type, ctmpl, init, flags, complain);
@@ -30861,24 +30872,12 @@ do_auto_deduction (tree type, tree init, tree auto_node,
}
  }
  
-  if (type == error_mark_node)
+  if (type == error_mark_node || init == error_mark_node)
  return error_mark_node;
  
-  if (BRACE_ENCLOSED_INITIALIZER_P (init))
-{
-  /* We don't recurse here because we can't deduce from a nested
-initializer_list.  */
-  if (CONSTRUCTOR_ELTS (init))
-   for (constructor_elt &elt : CONSTRUCTOR_ELTS (init))
- elt.value = resolve_nondeduced_context (elt.value, complain);
-}
-  else
-init = resolve_nondeduced_context (init, complain);
-
tree targs;
if (context == adc_decomp_type
&& auto_node == type
-  && init != error_mark_node
&& TREE_CODE (TREE_TYPE (init)) == ARRAY_TYPE)
  {
/* [dcl.struct.bind]/1 - if decomposition declaration has no ref-qualifiers
diff --git a/gcc/testsuite/g++.dg/cpp1z/class-deduction114.C b/gcc/testsuite/g++.dg/cpp1z/class-deduction114.C
new file mode 100644
index 000..ba6921d1b96
--- /dev/null
+++ b/gcc/testsuite/g++.dg/cpp1z/class-deduction114.C
@@ -0,0 +1,28 @@
+// PR c++/106214
+// { dg-do compile { target c++17 } }
+// A version of cpp0x/initlist-deduce3.C using list CTAD instead
+// of ordinary auto deduction from std::initializer_list.
+
+using size_t = decltype(sizeof 0);
+
+namespace std {
+  template struct initializer_list {
+const T *ptr;
+size_t n;
+initializer_list(const T*, size_t);
+  };
+}
+
+template
+void Task() {}
+
+template
+struct vector {
+  vector(std::initializer_list);
+};
+
+vector a = &Task; // { dg-error "deduction|no match" }
+vector b = { &Task };
+vector e{ &Task };
+vector f = { &Task, &Task };
+vector d = { static_cast(&Task) };
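Outside the testsuite, the same situation arises with `std::vector`. A hedged sketch: the explicitly cast form below deduces `std::vector<void (*)()>` with any C++17 compiler, while the uncast `std::vector v = { &Task<int> };` is the list-CTAD case this patch targets and should only be expected to compile with a GCC that includes the fix. `Task` mirrors the test above; `fn_ptr` and `v` are hypothetical names.

```cpp
#include <cassert>
#include <type_traits>
#include <vector>

template <typename T>
void Task () {}

using fn_ptr = void (*) ();

// List CTAD from a braced list whose only element is the address of a
// function template specialization, made unambiguous with a cast.
std::vector v = { static_cast<fn_ptr> (&Task<int>) };

static_assert (std::is_same_v<decltype (v), std::vector<fn_ptr>>,
               "CTAD should deduce vector<void (*)()>");
```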

Re: [PATCH] c++: parenthesized -> resolving to static member [PR98283]

2023-05-05 Thread Jason Merrill via Gcc-patches


On 5/5/23 14:30, Patrick Palka wrote:

On Fri, 5 May 2023, Patrick Palka wrote:


Here we're neglecting to propagate parenthesized-ness when the member
access expression (this->m) resolves to a static member (and thus
finish_class_member_access_expr yields a VAR_DECL instead of a
COMPONENT_REF).

Bootstrapped and regtested on x86_64-pc-linux-gnu, does this look OK for
trunk?

PR c++/98283

gcc/cp/ChangeLog:

* pt.cc (tsubst_copy_and_build) : Use
force_paren_expr on the result of finish_class_member_access_expr
if REF_PARENTHESIZED_P.

gcc/testsuite/ChangeLog:

* g++.dg/cpp1y/paren6.C: New test.
---
  gcc/cp/pt.cc|  4 ++--
  gcc/cp/semantics.cc |  4 ++--
  gcc/testsuite/g++.dg/cpp1y/paren6.C | 14 ++
  3 files changed, 18 insertions(+), 4 deletions(-)
  create mode 100644 gcc/testsuite/g++.dg/cpp1y/paren6.C

diff --git a/gcc/cp/pt.cc b/gcc/cp/pt.cc
index 5446b5058b7..9f5549e8f29 100644
--- a/gcc/cp/pt.cc
+++ b/gcc/cp/pt.cc
@@ -21502,8 +21502,8 @@ tsubst_copy_and_build (tree t,
r = finish_class_member_access_expr (object, member,
 /*template_p=*/false,
 complain);
-   if (TREE_CODE (r) == COMPONENT_REF)
- REF_PARENTHESIZED_P (r) = REF_PARENTHESIZED_P (t);
+   if (REF_PARENTHESIZED_P (t))
+ r = force_paren_expr (r);
RETURN (r);
}
  
diff --git a/gcc/cp/semantics.cc b/gcc/cp/semantics.cc
index 474da71bff6..c4fea2f0f0f 100644
--- a/gcc/cp/semantics.cc
+++ b/gcc/cp/semantics.cc
@@ -2069,8 +2069,8 @@ finish_mem_initializers (tree mem_inits)
 that the call to finish_decltype in do_auto_deduction will give the
 right result.  If EVEN_UNEVAL, do this even in unevaluated context.  */
  
-tree
-force_paren_expr (tree expr, bool even_uneval)
+static tree
+force_paren_expr (tree expr, bool even_uneval /* = false */)


Whoops, I managed to send the wrong amended patch (I opted to document
this default arg as a drive-by change after the fact).  The correct
patch is:

-- >8 --

Subject: [PATCH] c++: parenthesized -> resolving to static data member
  [PR98283]

Here we're neglecting to propagate parenthesized-ness when the
member access (this->m) resolves to a static data member (and
thus finish_class_member_access_expr yields a VAR_DECL instead
of a COMPONENT_REF).

Bootstrapped and regtested on x86_64-pc-linux-gnu, does this look OK for
trunk?


OK.


PR c++/98283

gcc/cp/ChangeLog:

* pt.cc (tsubst_copy_and_build) : Use
force_paren_expr on the result of finish_class_member_access_expr
if REF_PARENTHESIZED_P.
* semantics.cc (force_paren_expr): Document default argument.

gcc/testsuite/ChangeLog:

* g++.dg/cpp1y/paren6.C: New test.
---
  gcc/cp/pt.cc|  4 ++--
  gcc/cp/semantics.cc |  2 +-
  gcc/testsuite/g++.dg/cpp1y/paren6.C | 14 ++
  3 files changed, 17 insertions(+), 3 deletions(-)
  create mode 100644 gcc/testsuite/g++.dg/cpp1y/paren6.C

diff --git a/gcc/cp/pt.cc b/gcc/cp/pt.cc
index 5446b5058b7..9f5549e8f29 100644
--- a/gcc/cp/pt.cc
+++ b/gcc/cp/pt.cc
@@ -21502,8 +21502,8 @@ tsubst_copy_and_build (tree t,
r = finish_class_member_access_expr (object, member,
 /*template_p=*/false,
 complain);
-   if (TREE_CODE (r) == COMPONENT_REF)
- REF_PARENTHESIZED_P (r) = REF_PARENTHESIZED_P (t);
+   if (REF_PARENTHESIZED_P (t))
+ r = force_paren_expr (r);
RETURN (r);
}
  
diff --git a/gcc/cp/semantics.cc b/gcc/cp/semantics.cc
index 474da71bff6..13c6582b628 100644
--- a/gcc/cp/semantics.cc
+++ b/gcc/cp/semantics.cc
@@ -2070,7 +2070,7 @@ finish_mem_initializers (tree mem_inits)
 right result.  If EVEN_UNEVAL, do this even in unevaluated context.  */
  
  tree
-force_paren_expr (tree expr, bool even_uneval)
+force_paren_expr (tree expr, bool even_uneval /* = false */)
  {
/* This is only needed for decltype(auto) in C++14.  */
if (cxx_dialect < cxx14)
diff --git a/gcc/testsuite/g++.dg/cpp1y/paren6.C b/gcc/testsuite/g++.dg/cpp1y/paren6.C
new file mode 100644
index 000..812a99ca91c
--- /dev/null
+++ b/gcc/testsuite/g++.dg/cpp1y/paren6.C
@@ -0,0 +1,14 @@
+// PR c++/98283
+// { dg-do compile { target c++14 } }
+
+struct A {
+  static int m;
+};
+
+template
+struct B : T {
+  decltype(auto) f() { return (this->m);  }
+};
+
+using type = decltype(B().f());
+using type = int&;
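The distinction the new test relies on can be seen without templates. In this non-template sketch (struct `C` is hypothetical; `A` mirrors the test), `decltype(auto)` on the unparenthesized member access deduces the declared type `int`, while the parenthesized lvalue deduces `int&`. PR98283 concerns the template case in paren6.C, where the parentheses were lost during instantiation once `this->m` resolved to a static data member; that case is deliberately not asserted here, since it depends on the compiler having the fix.

```cpp
#include <cassert>
#include <type_traits>

struct A
{
  static int m;
};
int A::m = 42;

struct C : A
{
  decltype(auto) plain () { return this->m; }    // decltype (this->m) is int
  decltype(auto) paren () { return (this->m); }  // parenthesized lvalue: int&
};

static_assert (std::is_same<decltype (C ().plain ()), int>::value,
               "unparenthesized access deduces the declared type");
static_assert (std::is_same<decltype (C ().paren ()), int &>::value,
               "parenthesized access deduces an lvalue reference");
```

Because `paren ()` returns a reference to the static member, writing through it changes `A::m` itself.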

Re: [PATCH] c++: parenthesized -> resolving to static member [PR98283]

2023-05-05 Thread Patrick Palka via Gcc-patches

On Fri, 5 May 2023, Patrick Palka wrote:

> Here we're neglecting to propagate parenthesized-ness when the member
> access expression (this->m) resolves to a static member (and thus
> finish_class_member_access_expr yields a VAR_DECL instead of a
> COMPONENT_REF).
> 
> Bootstrapped and regtested on x86_64-pc-linux-gnu, does this look OK for
> trunk?
> 
>   PR c++/98283
> 
> gcc/cp/ChangeLog:
> 
>   * pt.cc (tsubst_copy_and_build) : Use
>   force_paren_expr on the result of finish_class_member_access_expr
>   if REF_PARENTHESIZED_P.
> 
> gcc/testsuite/ChangeLog:
> 
>   * g++.dg/cpp1y/paren6.C: New test.
> ---
>  gcc/cp/pt.cc|  4 ++--
>  gcc/cp/semantics.cc |  4 ++--
>  gcc/testsuite/g++.dg/cpp1y/paren6.C | 14 ++
>  3 files changed, 18 insertions(+), 4 deletions(-)
>  create mode 100644 gcc/testsuite/g++.dg/cpp1y/paren6.C
> 
> diff --git a/gcc/cp/pt.cc b/gcc/cp/pt.cc
> index 5446b5058b7..9f5549e8f29 100644
> --- a/gcc/cp/pt.cc
> +++ b/gcc/cp/pt.cc
> @@ -21502,8 +21502,8 @@ tsubst_copy_and_build (tree t,
>   r = finish_class_member_access_expr (object, member,
>/*template_p=*/false,
>complain);
> - if (TREE_CODE (r) == COMPONENT_REF)
> -   REF_PARENTHESIZED_P (r) = REF_PARENTHESIZED_P (t);
> + if (REF_PARENTHESIZED_P (t))
> +   r = force_paren_expr (r);
>   RETURN (r);
>}
>  
> diff --git a/gcc/cp/semantics.cc b/gcc/cp/semantics.cc
> index 474da71bff6..c4fea2f0f0f 100644
> --- a/gcc/cp/semantics.cc
> +++ b/gcc/cp/semantics.cc
> @@ -2069,8 +2069,8 @@ finish_mem_initializers (tree mem_inits)
> that the call to finish_decltype in do_auto_deduction will give the
> right result.  If EVEN_UNEVAL, do this even in unevaluated context.  */
>  
> -tree
> -force_paren_expr (tree expr, bool even_uneval)
> +static tree
> +force_paren_expr (tree expr, bool even_uneval /* = false */)

Whoops, I managed to send the wrong amended patch (I opted to document
this default arg as a drive-by change after the fact).  The correct
patch is:

-- >8 --

Subject: [PATCH] c++: parenthesized -> resolving to static data member
 [PR98283]

Here we're neglecting to propagate parenthesized-ness when the
member access (this->m) resolves to a static data member (and
thus finish_class_member_access_expr yields a VAR_DECL instead
of a COMPONENT_REF).
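A minimal sketch of the behavior the fix is meant to restore, assuming a C++14 compiler: outside a template, `decltype(auto)` already distinguishes `return this->m;` (the declared type, `int`) from `return (this->m);` (an lvalue expression, so `int&`), and the instantiation of `B<A>::f` in paren6.C should agree with the parenthesized case. The struct `C` and its member functions are hypothetical names chosen for illustration; they are not part of the patch or the testsuite.

```cpp
#include <type_traits>

struct A {
  static int m;
};
int A::m = 0;  // out-of-line definition, so the returned reference can be used

// Hypothetical non-template counterpart of B<A> from paren6.C.
struct C : A {
  // Unparenthesized class member access: decltype yields the declared type, int.
  decltype(auto) plain() { return this->m; }
  // Parenthesized: (this->m) is an lvalue expression, so decltype yields int&.
  decltype(auto) paren() { return (this->m); }
};

static_assert(std::is_same<decltype(C().plain()), int>::value,
              "unparenthesized member access deduces int");
static_assert(std::is_same<decltype(C().paren()), int&>::value,
              "parenthesized member access deduces int&");
```

Because `paren()` returns `int&`, an assignment such as `C().paren() = 7;` writes to `A::m`, whereas the same assignment through `plain()` would not compile.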

Bootstrapped and regtested on x86_64-pc-linux-gnu, does this look OK for
trunk?

PR c++/98283

gcc/cp/ChangeLog:

* pt.cc (tsubst_copy_and_build) : Use
force_paren_expr on the result of finish_class_member_access_expr
if REF_PARENTHESIZED_P.
* semantics.cc (force_paren_expr): Document default argument.

gcc/testsuite/ChangeLog:

* g++.dg/cpp1y/paren6.C: New test.
---
 gcc/cp/pt.cc                        |  4 ++--
 gcc/cp/semantics.cc                 |  2 +-
 gcc/testsuite/g++.dg/cpp1y/paren6.C | 14 ++++++++++++++
 3 files changed, 17 insertions(+), 3 deletions(-)
 create mode 100644 gcc/testsuite/g++.dg/cpp1y/paren6.C

diff --git a/gcc/cp/pt.cc b/gcc/cp/pt.cc
index 5446b5058b7..9f5549e8f29 100644
--- a/gcc/cp/pt.cc
+++ b/gcc/cp/pt.cc
@@ -21502,8 +21502,8 @@ tsubst_copy_and_build (tree t,
r = finish_class_member_access_expr (object, member,
 /*template_p=*/false,
 complain);
-   if (TREE_CODE (r) == COMPONENT_REF)
- REF_PARENTHESIZED_P (r) = REF_PARENTHESIZED_P (t);
+   if (REF_PARENTHESIZED_P (t))
+ r = force_paren_expr (r);
RETURN (r);
   }
 
diff --git a/gcc/cp/semantics.cc b/gcc/cp/semantics.cc
index 474da71bff6..13c6582b628 100644
--- a/gcc/cp/semantics.cc
+++ b/gcc/cp/semantics.cc
@@ -2070,7 +2070,7 @@ finish_mem_initializers (tree mem_inits)
right result.  If EVEN_UNEVAL, do this even in unevaluated context.  */
 
 tree
-force_paren_expr (tree expr, bool even_uneval)
+force_paren_expr (tree expr, bool even_uneval /* = false */)
 {
   /* This is only needed for decltype(auto) in C++14.  */
   if (cxx_dialect < cxx14)
diff --git a/gcc/testsuite/g++.dg/cpp1y/paren6.C b/gcc/testsuite/g++.dg/cpp1y/paren6.C
new file mode 100644
index 000..812a99ca91c
--- /dev/null
+++ b/gcc/testsuite/g++.dg/cpp1y/paren6.C
@@ -0,0 +1,14 @@
+// PR c++/98283
+// { dg-do compile { target c++14 } }
+
+struct A {
+  static int m;
+};
+
+template<class T>
+struct B : T {
+  decltype(auto) f() { return (this->m);  }
+};
+
+using type = decltype(B<A>().f());
+using type = int&;
-- 
2.40.1.476.g69c786637d

RE: [PATCH] Makefile.in: clean up match.pd-related dependencies

2023-05-05 Thread Tamar Christina via Gcc-patches

> -Original Message-
> From: Alexander Monakov 
> Sent: Friday, May 5, 2023 7:22 PM
> To: Tamar Christina 
> Cc: Richard Biener ; gcc-patches@gcc.gnu.org
> Subject: RE: [PATCH] Makefile.in: clean up match.pd-related dependencies
> 
> 
> On Fri, 5 May 2023, Tamar Christina wrote:
> 
> > > -Original Message-
> > > From: Alexander Monakov 
> > > Sent: Friday, May 5, 2023 6:59 PM
> > > To: Tamar Christina 
> > > Cc: Richard Biener ;
> > > gcc-patches@gcc.gnu.org
> > > Subject: RE: [PATCH] Makefile.in: clean up match.pd-related
> > > dependencies
> > >
> > >
> > > On Fri, 5 May 2023, Tamar Christina wrote:
> > >
> > > > > > On 05.05.2023 at 19:03, Alexander Monakov via
> > > > > > Gcc-patches wrote:
> > > > > >
> > > > > > Clean up confusing changes from the recent refactoring for
> > > > > > parallel match.pd build.
> > > > > >
> > > > > > gimple-match-head.o is not built. Remove related flags adjustment.
> > > > > >
> > > > > > Autogenerated gimple-match-N.o files do not depend on
> > > > > > gimple-match-exports.cc.
> > > > > >
> > > > > > {gimple,generic}-match-auto.h only depend on the prerequisites
> > > > > > of the corresponding s-{gimple,generic}-match stamp file, not any
> > > > > > .cc file.
> > > > >
> > > > > LGTM
> > > > >
> > > > > > gcc/ChangeLog:
> > > > > >
> > > > > >* Makefile.in: (gimple-match-head.o-warn): Remove.
> > > > > >(GIMPLE_MATCH_PD_SEQ_SRC): Do not depend on
> > > > > >gimple-match-exports.cc.
> > > > > >(gimple-match-auto.h): Only depend on s-gimple-match.
> > > > > >(generic-match-auto.h): Likewise.
> > > > > > ---
> > > > > >
> > > > > > Tamar, do I understand correctly that you do not have more
> > > > > > plans for match.pd and I won't collide with you if I attempt
> > > > > > more cleanups in this area? Thanks!
> > > >
> > > > No, but I'm also not sure why this change.
> > > > The idea here was that if gimple-head-export.cc changes you must
> > > > have changed genmatch.cc and so you need to regenerate the
> > > > gimple-match-* which could change the header.
> > >
> > > gimple-head-export.cc does not exist.
> > >
> > > gimple-match-exports.cc is not a generated file. It's under source
> > > control and edited independently from genmatch.cc. It is compiled
> > > separately, producing gimple-match-exports.o.
> > >
> > > gimple-match-head.cc is also not a generated file, also under source
> > > control.
> > > It is transitively included into gimple-match-N.o files. If it
> > > changes, they will be rebuilt. This is not changed by my patch.
> > >
> > > gimple-match-auto.h is a generated file. It depends on
> > > s-gimple-match stamp file, which in turn depends on genmatch and
> > > match.pd. If either changes, the rule for the stamp file triggers.
> > > gimple-match-N.o files also depend on the stamp file, so they will be
> > > rebuilt as well.
> >
> > s-gimple-match does not depend on gimple-match-head.cc. If it changes,
> > the stamp is not invalidated.
> 
> Right, this is correct: there's no need to rerun the recipe for the stamp,
> because contents of gimple-match-head.cc do not affect it.
> 
> > This happens to work because gimple-match-N.cc does depend on
> > gimple-match-head.cc, but if the gimple-match-N.cc already exists then
> > nothing changes.
> 
> No, if the gimple-match-N.cc files already exist, make notices they are
> out of date via
> 
> $(GIMPLE_MATCH_PD_SEQ_SRC): s-gimple-match gimple-match-head.cc; @true
> 
> and this triggers rebuilding gimple-match-N.o.
> 
> I tested this. After 'touch gimple-match-head.cc', all ten gimple-match-N.o
> files are rebuilt.
> 
> > So I don't think this changes anything. If anything I would say the
> > stamp file needs to depend on gimple-match-head.cc.
> 
> Is my explanation above satisfactory?

Sure,

Thanks,
Tamar

> 
> Thanks.
> Alexander
> 
> >
> > Thanks,
> > Tamar
> >
> > >
> > > Is there some problem I'm not seeing?
> > >
> > > Thanks.
> > > Alexander
> > >
> > > > So not sure I agree with this.
> > > >
> > > > Thanks,
> > > > Tamar
> > > >
> > > > > >
> > > > > > gcc/Makefile.in | 9 +++------
> > > > > > 1 file changed, 3 insertions(+), 6 deletions(-)
> > > > > >
> > > > > > diff --git a/gcc/Makefile.in b/gcc/Makefile.in
> > > > > > index 7e7ac078c5..0cc13c37d0 100644
> > > > > > --- a/gcc/Makefile.in
> > > > > > +++ b/gcc/Makefile.in
> > > > > > @@ -230,7 +230,6 @@ gengtype-lex.o-warn = -Wno-error
> > > > > >  libgcov-util.o-warn = -Wno-error
> > > > > >  libgcov-driver-tool.o-warn = -Wno-error
> > > > > >  libgcov-merge-tool.o-warn = -Wno-error
> > > > > > -gimple-match-head.o-warn = -Wno-unused
> > > > > >  gimple-match-exports.o-warn = -Wno-unused
> > > > > >  dfp.o-warn = -Wno-strict-aliasing
> > > > > >
> > > > > > @@ -2674,12 +2673,10 @@ s-tm-texi: build/genhooks$(build_exeext) $(srcdir)/doc/tm.texi.in
> > > > > >  false; \
> > > > > >fi
> > > > > >
> > > > > > -$(GIMPLE_MATCH_PD_SEQ_SRC): s-gimple-match gimple

[PATCH] c++: parenthesized -> resolving to static member [PR98283]

2023-05-05 Thread Patrick Palka via Gcc-patches

Here we're neglecting to propagate parenthesized-ness when the member
access expression (this->m) resolves to a static member (and thus
finish_class_member_access_expr yields a VAR_DECL instead of a
COMPONENT_REF).

Bootstrapped and regtested on x86_64-pc-linux-gnu, does this look OK for
trunk?

PR c++/98283

gcc/cp/ChangeLog:

* pt.cc (tsubst_copy_and_build) : Use
force_paren_expr on the result of finish_class_member_access_expr
if REF_PARENTHESIZED_P.

gcc/testsuite/ChangeLog:

* g++.dg/cpp1y/paren6.C: New test.
---
 gcc/cp/pt.cc                        |  4 ++--
 gcc/cp/semantics.cc                 |  4 ++--
 gcc/testsuite/g++.dg/cpp1y/paren6.C | 14 ++++++++++++++
 3 files changed, 18 insertions(+), 4 deletions(-)
 create mode 100644 gcc/testsuite/g++.dg/cpp1y/paren6.C

diff --git a/gcc/cp/pt.cc b/gcc/cp/pt.cc
index 5446b5058b7..9f5549e8f29 100644
--- a/gcc/cp/pt.cc
+++ b/gcc/cp/pt.cc
@@ -21502,8 +21502,8 @@ tsubst_copy_and_build (tree t,
r = finish_class_member_access_expr (object, member,
 /*template_p=*/false,
 complain);
-   if (TREE_CODE (r) == COMPONENT_REF)
- REF_PARENTHESIZED_P (r) = REF_PARENTHESIZED_P (t);
+   if (REF_PARENTHESIZED_P (t))
+ r = force_paren_expr (r);
RETURN (r);
   }
 
diff --git a/gcc/cp/semantics.cc b/gcc/cp/semantics.cc
index 474da71bff6..c4fea2f0f0f 100644
--- a/gcc/cp/semantics.cc
+++ b/gcc/cp/semantics.cc
@@ -2069,8 +2069,8 @@ finish_mem_initializers (tree mem_inits)
that the call to finish_decltype in do_auto_deduction will give the
right result.  If EVEN_UNEVAL, do this even in unevaluated context.  */
 
-tree
-force_paren_expr (tree expr, bool even_uneval)
+static tree
+force_paren_expr (tree expr, bool even_uneval /* = false */)
 {
   /* This is only needed for decltype(auto) in C++14.  */
   if (cxx_dialect < cxx14)
diff --git a/gcc/testsuite/g++.dg/cpp1y/paren6.C b/gcc/testsuite/g++.dg/cpp1y/paren6.C
new file mode 100644
index 000..812a99ca91c
--- /dev/null
+++ b/gcc/testsuite/g++.dg/cpp1y/paren6.C
@@ -0,0 +1,14 @@
+// PR c++/98283
+// { dg-do compile { target c++14 } }
+
+struct A {
+  static int m;
+};
+
+template<class T>
+struct B : T {
+  decltype(auto) f() { return (this->m);  }
+};
+
+using type = decltype(B<A>().f());
+using type = int&;
-- 
2.40.1.476.g69c786637d

RE: [PATCH] Makefile.in: clean up match.pd-related dependencies

2023-05-05 Thread Alexander Monakov via Gcc-patches



On Fri, 5 May 2023, Tamar Christina wrote:

> > -Original Message-
> > From: Alexander Monakov 
> > Sent: Friday, May 5, 2023 6:59 PM
> > To: Tamar Christina 
> > Cc: Richard Biener ; gcc-patches@gcc.gnu.org
> > Subject: RE: [PATCH] Makefile.in: clean up match.pd-related dependencies
> > 
> > 
> > On Fri, 5 May 2023, Tamar Christina wrote:
> > 
> > > > > On 05.05.2023 at 19:03, Alexander Monakov via Gcc-patches
> > > > > wrote:
> > > > >
> > > > > Clean up confusing changes from the recent refactoring for
> > > > > parallel match.pd build.
> > > > >
> > > > > gimple-match-head.o is not built. Remove related flags adjustment.
> > > > >
> > > > > Autogenerated gimple-match-N.o files do not depend on
> > > > > gimple-match-exports.cc.
> > > > >
> > > > > {gimple,generic}-match-auto.h only depend on the prerequisites of
> > > > > the corresponding s-{gimple,generic}-match stamp file, not any .cc
> > > > > file.
> > > >
> > > > LGTM
> > > >
> > > > > gcc/ChangeLog:
> > > > >
> > > > >* Makefile.in: (gimple-match-head.o-warn): Remove.
> > > > >(GIMPLE_MATCH_PD_SEQ_SRC): Do not depend on
> > > > >gimple-match-exports.cc.
> > > > >(gimple-match-auto.h): Only depend on s-gimple-match.
> > > > >(generic-match-auto.h): Likewise.
> > > > > ---
> > > > >
> > > > > Tamar, do I understand correctly that you do not have more plans
> > > > > for match.pd and I won't collide with you if I attempt more
> > > > > cleanups in this area? Thanks!
> > >
> > > No, but I'm also not sure why this change.
> > > The idea here was that if gimple-head-export.cc changes you must have
> > > changed genmatch.cc and so you need to regenerate the gimple-match-*
> > > which could change the header.
> > 
> > gimple-head-export.cc does not exist.
> > 
> > gimple-match-exports.cc is not a generated file. It's under source control
> > and edited independently from genmatch.cc. It is compiled separately,
> > producing gimple-match-exports.o.
> > 
> > gimple-match-head.cc is also not a generated file, also under source
> > control. It is transitively included into gimple-match-N.o files. If it
> > changes, they will be rebuilt. This is not changed by my patch.
> > 
> > gimple-match-auto.h is a generated file. It depends on s-gimple-match stamp
> > file, which in turn depends on genmatch and match.pd. If either changes, the
> > rule for the stamp file triggers. gimple-match-N.o files also depend on the
> > stamp file, so they will be rebuilt as well.
> 
> s-gimple-match does not depend on gimple-match-head.cc. If it changes, the
> stamp is not invalidated.

Right, this is correct: there's no need to rerun the recipe for the stamp,
because contents of gimple-match-head.cc do not affect it.

> This happens to work because gimple-match-N.cc does depend on 
> gimple-match-head.cc,
> but if the gimple-match-N.cc already exists then nothing changes.

No, if the gimple-match-N.cc files already exist, make notices they are
out of date via

$(GIMPLE_MATCH_PD_SEQ_SRC): s-gimple-match gimple-match-head.cc; @true

and this triggers rebuilding gimple-match-N.o.

I tested this. After 'touch gimple-match-head.cc' all ten gimple-match-N.o files
are rebuilt.

> So I don't think this changes anything. If anything I would say the stamp
> file needs to depend on gimple-match-head.cc.

Is my explanation above satisfactory?

Thanks.
Alexander

> 
> Thanks,
> Tamar
> 
> > 
> > Is there some problem I'm not seeing?
> > 
> > Thanks.
> > Alexander
> > 
> > > So not sure I agree with this.
> > >
> > > Thanks,
> > > Tamar
> > >
> > > > >
> > > > > gcc/Makefile.in | 9 +++------
> > > > > 1 file changed, 3 insertions(+), 6 deletions(-)
> > > > >
> > > > > diff --git a/gcc/Makefile.in b/gcc/Makefile.in
> > > > > index 7e7ac078c5..0cc13c37d0 100644
> > > > > --- a/gcc/Makefile.in
> > > > > +++ b/gcc/Makefile.in
> > > > > @@ -230,7 +230,6 @@ gengtype-lex.o-warn = -Wno-error
> > > > >  libgcov-util.o-warn = -Wno-error
> > > > >  libgcov-driver-tool.o-warn = -Wno-error
> > > > >  libgcov-merge-tool.o-warn = -Wno-error
> > > > > -gimple-match-head.o-warn = -Wno-unused
> > > > >  gimple-match-exports.o-warn = -Wno-unused
> > > > >  dfp.o-warn = -Wno-strict-aliasing
> > > > >
> > > > > @@ -2674,12 +2673,10 @@ s-tm-texi: build/genhooks$(build_exeext) $(srcdir)/doc/tm.texi.in
> > > > >  false; \
> > > > >fi
> > > > >
> > > > > -$(GIMPLE_MATCH_PD_SEQ_SRC): s-gimple-match gimple-match-head.cc \
> > > > > -gimple-match-exports.cc; @true
> > > > > -gimple-match-auto.h: s-gimple-match gimple-match-head.cc \
> > > > > -gimple-match-exports.cc; @true
> > > > > +$(GIMPLE_MATCH_PD_SEQ_SRC): s-gimple-match gimple-match-head.cc; @true
> > > > > +gimple-match-auto.h: s-gimple-match; @true
> > > > >  $(GENERIC_MATCH_PD_SEQ_SRC): s-generic-match generic-match-head.cc; @true
> > > > > -generic-match-auto.h: s-generic-match generic-match-head.cc; @true
>

Re: [PATCH] c++: goto entering scope of obj w/ non-trivial dtor [PR103091]

2023-05-05 Thread Jason Merrill via Gcc-patches


On 5/5/23 13:36, Patrick Palka wrote:

It seems DR 2256 permitted goto to cross the initialization of a
trivially initialized object with a non-trivial destructor.  We
already supported this as an -fpermissive extension, so this patch
just makes us unconditionally support this.

Bootstrapped and regtested on x86_64-pc-linux-gnu, does this look OK
for trunk?


OK.


DR 2256
PR c++/103091

gcc/cp/ChangeLog:

* decl.cc (decl_jump_unsafe): Return bool instead of int.
Don't consider TYPE_HAS_NONTRIVIAL_DESTRUCTOR.
(check_previous_goto_1): Simplify now that decl_jump_unsafe
returns bool instead of int.
(check_goto): Likewise.

gcc/testsuite/ChangeLog:

* g++.old-deja/g++.other/init9.C: Don't expect diagnostics for
goto made valid by DR 2256.
* g++.dg/init/goto4.C: New test.
---
  gcc/cp/decl.cc   | 56 ++--
  gcc/testsuite/g++.dg/init/goto4.C| 22 
  gcc/testsuite/g++.old-deja/g++.other/init9.C |  7 +--
  3 files changed, 42 insertions(+), 43 deletions(-)
  create mode 100644 gcc/testsuite/g++.dg/init/goto4.C

diff --git a/gcc/cp/decl.cc b/gcc/cp/decl.cc
index 71d33d2b7a4..23a2b2fef0b 100644
--- a/gcc/cp/decl.cc
+++ b/gcc/cp/decl.cc
@@ -69,7 +69,7 @@ enum bad_spec_place {
  
  static const char *redeclaration_error_message (tree, tree);
  
-static int decl_jump_unsafe (tree);
+static bool decl_jump_unsafe (tree);
  static void require_complete_types_for_parms (tree);
  static tree grok_reference_init (tree, tree, tree, int);
  static tree grokvardecl (tree, tree, tree, const cp_decl_specifier_seq *,
@@ -3548,10 +3548,9 @@ declare_local_label (tree id)
return ent ? ent->label_decl : NULL_TREE;
  }
  
-/* Returns nonzero if it is ill-formed to jump past the declaration of
-   DECL.  Returns 2 if it's also a real problem.  */
+/* Returns true if it is ill-formed to jump past the declaration of DECL.  */
  
-static int
+static bool
  decl_jump_unsafe (tree decl)
  {
/* [stmt.dcl]/3: A program that jumps from a point where a local variable
@@ -3562,18 +3561,11 @@ decl_jump_unsafe (tree decl)
   preceding types and is declared without an initializer (8.5).  */
tree type = TREE_TYPE (decl);
  
-  if (!VAR_P (decl) || TREE_STATIC (decl)
-  || type == error_mark_node)
-return 0;
-
-  if (DECL_NONTRIVIALLY_INITIALIZED_P (decl)
-  || variably_modified_type_p (type, NULL_TREE))
-return 2;
-
-  if (TYPE_HAS_NONTRIVIAL_DESTRUCTOR (type))
-return 1;
-
-  return 0;
+  return (type != error_mark_node
+ && VAR_P (decl)
+ && !TREE_STATIC (decl)
+ && (DECL_NONTRIVIALLY_INITIALIZED_P (decl)
+ || variably_modified_type_p (type, NULL_TREE)));
  }
  
  /* A subroutine of check_previous_goto_1 and check_goto to identify a branch
@@ -3625,27 +3617,18 @@ check_previous_goto_1 (tree decl, cp_binding_level* level, tree names,
   new_decls = (DECL_P (new_decls) ? DECL_CHAIN (new_decls)
: TREE_CHAIN (new_decls)))
{
- int problem = decl_jump_unsafe (new_decls);
+ bool problem = decl_jump_unsafe (new_decls);
  if (! problem)
continue;
  
  	  if (!identified)
{
- complained = identify_goto (decl, input_location, locus,
- problem > 1
- ? DK_ERROR : DK_PERMERROR);
+ complained = identify_goto (decl, input_location, locus, DK_ERROR);
  identified = 1;
}
  if (complained)
-   {
- if (problem > 1)
-   inform (DECL_SOURCE_LOCATION (new_decls),
-   "  crosses initialization of %q#D", new_decls);
- else
-   inform (DECL_SOURCE_LOCATION (new_decls),
-   "  enters scope of %q#D, which has "
-   "non-trivial destructor", new_decls);
-   }
+   inform (DECL_SOURCE_LOCATION (new_decls),
+   "  crosses initialization of %q#D", new_decls);
}
  
if (b == level)
@@ -3790,9 +3773,9 @@ check_goto (tree decl)
  
FOR_EACH_VEC_SAFE_ELT (ent->bad_decls, ix, bad)
  {
-  int u = decl_jump_unsafe (bad);
+  bool problem = decl_jump_unsafe (bad);
  
-  if (u > 1 && DECL_ARTIFICIAL (bad))
+  if (problem && DECL_ARTIFICIAL (bad))
{
  /* Can't skip init of __exception_info.  */
  if (identified == 1)
@@ -3806,15 +3789,8 @@ check_goto (tree decl)
  saw_catch = true;
}
else if (complained)
-   {
- if (u > 1)
-   inform (DECL_SOURCE_LOCATION (bad),
-   "  skips initialization of %q#D", bad);
- else
-   inform (DECL_SOURCE_LOCATION (bad),
-   "  enters scope of %q#D which has "
-   "non-trivial destructor", bad);
-   }
+

RE: [PATCH] Makefile.in: clean up match.pd-related dependencies

2023-05-05 Thread Tamar Christina via Gcc-patches

> -Original Message-
> From: Alexander Monakov 
> Sent: Friday, May 5, 2023 6:59 PM
> To: Tamar Christina 
> Cc: Richard Biener ; gcc-patches@gcc.gnu.org
> Subject: RE: [PATCH] Makefile.in: clean up match.pd-related dependencies
> 
> 
> On Fri, 5 May 2023, Tamar Christina wrote:
> 
> > > > On 05.05.2023 at 19:03, Alexander Monakov via Gcc-patches
> > > > wrote:
> > > >
> > > > Clean up confusing changes from the recent refactoring for
> > > > parallel match.pd build.
> > > >
> > > > gimple-match-head.o is not built. Remove related flags adjustment.
> > > >
> > > > Autogenerated gimple-match-N.o files do not depend on
> > > > gimple-match-exports.cc.
> > > >
> > > > {gimple,generic}-match-auto.h only depend on the prerequisites of
> > > > the corresponding s-{gimple,generic}-match stamp file, not any .cc file.
> > >
> > > LGTM
> > >
> > > > gcc/ChangeLog:
> > > >
> > > >* Makefile.in: (gimple-match-head.o-warn): Remove.
> > > >(GIMPLE_MATCH_PD_SEQ_SRC): Do not depend on
> > > >gimple-match-exports.cc.
> > > >(gimple-match-auto.h): Only depend on s-gimple-match.
> > > >(generic-match-auto.h): Likewise.
> > > > ---
> > > >
> > > > Tamar, do I understand correctly that you do not have more plans
> > > > for match.pd and I won't collide with you if I attempt more
> > > > cleanups in this area? Thanks!
> >
> > No, but I'm also not sure why this change.
> > The idea here was that if gimple-head-export.cc changes you must have
> > changed genmatch.cc and so you need to regenerate the gimple-match-*
> > which could change the header.
> 
> gimple-head-export.cc does not exist.
> 
> gimple-match-exports.cc is not a generated file. It's under source control and
> edited independently from genmatch.cc. It is compiled separately, producing
> gimple-match-exports.o.
> 
> gimple-match-head.cc is also not a generated file, also under source control.
> It is transitively included into gimple-match-N.o files. If it changes, they
> will be rebuilt. This is not changed by my patch.
> 
> gimple-match-auto.h is a generated file. It depends on s-gimple-match stamp
> file, which in turn depends on genmatch and match.pd. If either changes, the
> rule for the stamp file triggers. gimple-match-N.o files also depend on the
> stamp file, so they will be rebuilt as well.

s-gimple-match does not depend on gimple-match-head.cc. If it changes, the stamp
is not invalidated.

This happens to work because gimple-match-N.cc does depend on
gimple-match-head.cc, but if the gimple-match-N.cc already exists then nothing
changes.

So I don't think this changes anything. If anything I would say the stamp file
needs to depend on gimple-match-head.cc.

Thanks,
Tamar

> 
> Is there some problem I'm not seeing?
> 
> Thanks.
> Alexander
> 
> > So not sure I agree with this.
> >
> > Thanks,
> > Tamar
> >
> > > >
> > > > gcc/Makefile.in | 9 +++------
> > > > 1 file changed, 3 insertions(+), 6 deletions(-)
> > > >
> > > > diff --git a/gcc/Makefile.in b/gcc/Makefile.in
> > > > index 7e7ac078c5..0cc13c37d0 100644
> > > > --- a/gcc/Makefile.in
> > > > +++ b/gcc/Makefile.in
> > > > @@ -230,7 +230,6 @@ gengtype-lex.o-warn = -Wno-error
> > > >  libgcov-util.o-warn = -Wno-error
> > > >  libgcov-driver-tool.o-warn = -Wno-error
> > > >  libgcov-merge-tool.o-warn = -Wno-error
> > > > -gimple-match-head.o-warn = -Wno-unused
> > > >  gimple-match-exports.o-warn = -Wno-unused
> > > >  dfp.o-warn = -Wno-strict-aliasing
> > > >
> > > > @@ -2674,12 +2673,10 @@ s-tm-texi: build/genhooks$(build_exeext) $(srcdir)/doc/tm.texi.in
> > > >  false; \
> > > >fi
> > > >
> > > > -$(GIMPLE_MATCH_PD_SEQ_SRC): s-gimple-match gimple-match-head.cc \
> > > > -gimple-match-exports.cc; @true
> > > > -gimple-match-auto.h: s-gimple-match gimple-match-head.cc \
> > > > -gimple-match-exports.cc; @true
> > > > +$(GIMPLE_MATCH_PD_SEQ_SRC): s-gimple-match gimple-match-head.cc; @true
> > > > +gimple-match-auto.h: s-gimple-match; @true
> > > >  $(GENERIC_MATCH_PD_SEQ_SRC): s-generic-match generic-match-head.cc; @true
> > > > -generic-match-auto.h: s-generic-match generic-match-head.cc; @true
> > > > +generic-match-auto.h: s-generic-match; @true
> > > >
> > > > s-gimple-match: build/genmatch$(build_exeext) \
> > > >$(srcdir)/match.pd cfn-operators.pd
> > > > --
> > > > 2.39.2
> > > >
> >

RE: [PATCH] Makefile.in: clean up match.pd-related dependencies

2023-05-05 Thread Alexander Monakov via Gcc-patches



On Fri, 5 May 2023, Tamar Christina wrote:

> > > On 05.05.2023 at 19:03, Alexander Monakov via Gcc-patches wrote:
> > >
> > > Clean up confusing changes from the recent refactoring for parallel
> > > match.pd build.
> > >
> > > gimple-match-head.o is not built. Remove related flags adjustment.
> > >
> > > Autogenerated gimple-match-N.o files do not depend on
> > > gimple-match-exports.cc.
> > >
> > > {gimple,generic}-match-auto.h only depend on the prerequisites of the
> > > corresponding s-{gimple,generic}-match stamp file, not any .cc file.
> > 
> > LGTM
> > 
> > > gcc/ChangeLog:
> > >
> > >* Makefile.in: (gimple-match-head.o-warn): Remove.
> > >(GIMPLE_MATCH_PD_SEQ_SRC): Do not depend on
> > >gimple-match-exports.cc.
> > >(gimple-match-auto.h): Only depend on s-gimple-match.
> > >(generic-match-auto.h): Likewise.
> > > ---
> > >
> > > Tamar, do I understand correctly that you do not have more plans for
> > > match.pd and I won't collide with you if I attempt more cleanups in this
> > > area? Thanks!
> 
> No, but I'm also not sure why this change.
> The idea here was that if gimple-head-export.cc changes you must have changed
> genmatch.cc and so you need to regenerate the gimple-match-* which could 
> change the header.

gimple-head-export.cc does not exist.

gimple-match-exports.cc is not a generated file. It's under source control and
edited independently from genmatch.cc. It is compiled separately, producing
gimple-match-exports.o.

gimple-match-head.cc is also not a generated file, also under source control.
It is transitively included into gimple-match-N.o files. If it changes, they
will be rebuilt. This is not changed by my patch.

gimple-match-auto.h is a generated file. It depends on s-gimple-match stamp
file, which in turn depends on genmatch and match.pd. If either changes, the
rule for the stamp file triggers. gimple-match-N.o files also depend on the
stamp file, so they will be rebuilt as well.
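The dependency structure described above can be sketched as a toy reachability check over the rules as they stand after the patch. This is a simplified model under stated assumptions, not GNU make's algorithm: it ignores timestamps, recipes and automatic dependency tracking, uses `gimple-match-1.cc` as a hypothetical stand-in for one member of `$(GIMPLE_MATCH_PD_SEQ_SRC)`, and drops `$(srcdir)/` and `$(build_exeext)` from the names. It only answers which targets list a changed file as a direct or indirect prerequisite.

```cpp
#include <map>
#include <set>
#include <string>
#include <vector>

// Prerequisites from the rules in the patched Makefile.in. Simplifications:
// "gimple-match-1.cc" stands in for one member of $(GIMPLE_MATCH_PD_SEQ_SRC),
// and $(srcdir)/ and $(build_exeext) are dropped from the file names.
const std::map<std::string, std::vector<std::string>> kRules = {
    {"s-gimple-match", {"build/genmatch", "match.pd", "cfn-operators.pd"}},
    {"gimple-match-1.cc", {"s-gimple-match", "gimple-match-head.cc"}},
    {"gimple-match-auto.h", {"s-gimple-match"}},
};

// Returns every target that lists CHANGED as a direct or indirect
// prerequisite. Timestamps and recipes are ignored; this is reachability only.
std::set<std::string> affected_targets(const std::string& changed) {
  std::set<std::string> result;
  bool grew = true;
  while (grew) {  // repeat until no new target is added (a fixed point)
    grew = false;
    for (const auto& rule : kRules) {
      if (result.count(rule.first) != 0) continue;
      for (const auto& prereq : rule.second) {
        if (prereq == changed || result.count(prereq) != 0) {
          result.insert(rule.first);
          grew = true;
          break;
        }
      }
    }
  }
  return result;
}
```

Under this model, editing gimple-match-head.cc reaches gimple-match-1.cc but neither the stamp file nor gimple-match-auto.h, editing match.pd reaches all three targets, and gimple-match-exports.cc reaches nothing, in line with the patch dropping it from these rules.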

Is there some problem I'm not seeing?

Thanks.
Alexander

> So not sure I agree with this.
> 
> Thanks,
> Tamar
> 
> > >
> > > gcc/Makefile.in | 9 +++------
> > > 1 file changed, 3 insertions(+), 6 deletions(-)
> > >
> > > diff --git a/gcc/Makefile.in b/gcc/Makefile.in
> > > index 7e7ac078c5..0cc13c37d0 100644
> > > --- a/gcc/Makefile.in
> > > +++ b/gcc/Makefile.in
> > > @@ -230,7 +230,6 @@ gengtype-lex.o-warn = -Wno-error
> > >  libgcov-util.o-warn = -Wno-error
> > >  libgcov-driver-tool.o-warn = -Wno-error
> > >  libgcov-merge-tool.o-warn = -Wno-error
> > > -gimple-match-head.o-warn = -Wno-unused
> > >  gimple-match-exports.o-warn = -Wno-unused
> > >  dfp.o-warn = -Wno-strict-aliasing
> > >
> > > @@ -2674,12 +2673,10 @@ s-tm-texi: build/genhooks$(build_exeext) $(srcdir)/doc/tm.texi.in
> > >  false; \
> > >fi
> > >
> > > -$(GIMPLE_MATCH_PD_SEQ_SRC): s-gimple-match gimple-match-head.cc \
> > > -gimple-match-exports.cc; @true
> > > -gimple-match-auto.h: s-gimple-match gimple-match-head.cc \
> > > -gimple-match-exports.cc; @true
> > > +$(GIMPLE_MATCH_PD_SEQ_SRC): s-gimple-match gimple-match-head.cc; @true
> > > +gimple-match-auto.h: s-gimple-match; @true
> > >  $(GENERIC_MATCH_PD_SEQ_SRC): s-generic-match generic-match-head.cc; @true
> > > -generic-match-auto.h: s-generic-match generic-match-head.cc; @true
> > > +generic-match-auto.h: s-generic-match; @true
> > >
> > > s-gimple-match: build/genmatch$(build_exeext) \
> > >$(srcdir)/match.pd cfn-operators.pd
> > > --
> > > 2.39.2
> > >
>

Re: [PATCH] tree: Fix up save_expr [PR52339]

2023-05-05 Thread Jakub Jelinek via Gcc-patches

On Fri, May 05, 2023 at 01:32:02PM -0400, Jason Merrill wrote:
> > --- gcc/ada/gcc-interface/utils2.cc.jj  2023-01-16 23:19:05.539727388 +0100
> > +++ gcc/ada/gcc-interface/utils2.cc 2023-05-05 15:37:30.193990948 +0200
> > @@ -3332,6 +3332,7 @@ gnat_invariant_expr (tree expr)
> > case IMAGPART_EXPR:
> > case VIEW_CONVERT_EXPR:
> > CASE_CONVERT:
> > +   case SAVE_EXPR:
> 
> I guess doing this would allow gnat_invariant_expr to handle
> DECL_INVARIANT_P that save_expr doesn't know about.  But it seems that it
> makes the same assumption as tree_invariant_p_1 about the pointed-to object
> not changing:
> 
> > case INDIRECT_REF:
> >   if ((!invariant_p && !TREE_READONLY (t)) || TREE_SIDE_EFFECTS (t))
> > return NULL_TREE;
> 
> I don't know if this assumption is any more valid in Ada than in C/C++.

I think we really need Eric (as the one who, e.g., apparently introduced
DECL_INVARIANT_P for this kind of stuff) to have a look at that on the
Ada side.

The question is whether the posted tree.cc (smallest) patch, plus the 3 new
testcases and the 7 Ada testsuite workarounds, is OK for trunk if it passes
bootstrap/regtest (in which case I'd file a PR about the Ada regression and
would only consider backporting once that is dealt with), or whether we need
to wait for Eric before making progress.

Jakub
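A rough C++ illustration of the assumption Jason questions, in C/C++ terms rather than Ada and not GCC internals: roughly speaking, save_expr leaves an expression unwrapped when it considers it invariant, on the theory that evaluating it again yields the same value. For `*p` that theory can fail even though `p` itself never changes, because a store through an aliasing pointer may modify the pointee. The two hypothetical functions below contrast a single evaluation of `*p` with a re-read after such a store.

```cpp
// Evaluates *p once and reuses that value, as a SAVE_EXPR would.
int use_saved_value(const int* p, int* q) {
  int saved = *p;  // single evaluation
  *q = 5;          // may modify *p if q and p point to the same object
  return saved + saved;
}

// Re-evaluates *p after the store, which is what happens if *p is treated
// as invariant and therefore not saved.
int reread_value(const int* p, int* q) {
  int first = *p;
  *q = 5;
  return first + *p;  // second evaluation observes the store through q
}
```

Starting from `int x = 1;` in each case, `use_saved_value(&x, &x)` returns 2 while `reread_value(&x, &x)` returns 6, so the two strategies are only interchangeable when nothing can modify the pointee in between.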

RE: [PATCH] Makefile.in: clean up match.pd-related dependencies

2023-05-05 Thread Tamar Christina via Gcc-patches

> > On 05.05.2023 at 19:03, Alexander Monakov via Gcc-patches wrote:
> >
> > Clean up confusing changes from the recent refactoring for parallel
> > match.pd build.
> >
> > gimple-match-head.o is not built. Remove related flags adjustment.
> >
> > Autogenerated gimple-match-N.o files do not depend on
> > gimple-match-exports.cc.
> >
> > {gimple,generic}-match-auto.h only depend on the prerequisites of the
> > corresponding s-{gimple,generic}-match stamp file, not any .cc file.
> 
> LGTM
> 
> > gcc/ChangeLog:
> >
> >* Makefile.in: (gimple-match-head.o-warn): Remove.
> >(GIMPLE_MATCH_PD_SEQ_SRC): Do not depend on
> >gimple-match-exports.cc.
> >(gimple-match-auto.h): Only depend on s-gimple-match.
> >(generic-match-auto.h): Likewise.
> > ---
> >
> > Tamar, do I understand correctly that you do not have more plans for
> > match.pd and I won't collide with you if I attempt more cleanups in this
> > area? Thanks!

No, but I'm also not sure why this change.
The idea here was that if gimple-head-export.cc changes you must have changed
genmatch.cc and so you need to regenerate the gimple-match-* which could change 
the header.

So not sure I agree with this.

Thanks,
Tamar

> >
> > gcc/Makefile.in | 9 +++------
> > 1 file changed, 3 insertions(+), 6 deletions(-)
> >
> > diff --git a/gcc/Makefile.in b/gcc/Makefile.in
> > index 7e7ac078c5..0cc13c37d0 100644
> > --- a/gcc/Makefile.in
> > +++ b/gcc/Makefile.in
> > @@ -230,7 +230,6 @@ gengtype-lex.o-warn = -Wno-error
> >  libgcov-util.o-warn = -Wno-error
> >  libgcov-driver-tool.o-warn = -Wno-error
> >  libgcov-merge-tool.o-warn = -Wno-error
> > -gimple-match-head.o-warn = -Wno-unused
> >  gimple-match-exports.o-warn = -Wno-unused
> >  dfp.o-warn = -Wno-strict-aliasing
> >
> > @@ -2674,12 +2673,10 @@ s-tm-texi: build/genhooks$(build_exeext) $(srcdir)/doc/tm.texi.in
> >  false; \
> >fi
> >
> > -$(GIMPLE_MATCH_PD_SEQ_SRC): s-gimple-match gimple-match-head.cc \
> > -gimple-match-exports.cc; @true
> > -gimple-match-auto.h: s-gimple-match gimple-match-head.cc \
> > -gimple-match-exports.cc; @true
> > +$(GIMPLE_MATCH_PD_SEQ_SRC): s-gimple-match gimple-match-head.cc; @true
> > +gimple-match-auto.h: s-gimple-match; @true
> >  $(GENERIC_MATCH_PD_SEQ_SRC): s-generic-match generic-match-head.cc; @true
> > -generic-match-auto.h: s-generic-match generic-match-head.cc; @true
> > +generic-match-auto.h: s-generic-match; @true
> >
> > s-gimple-match: build/genmatch$(build_exeext) \
> >$(srcdir)/match.pd cfn-operators.pd
> > --
> > 2.39.2
> >

[PATCH] c++: goto entering scope of obj w/ non-trivial dtor [PR103091]

2023-05-05 Thread Patrick Palka via Gcc-patches

It seems DR 2256 permitted goto to cross the initialization of a
trivially initialized object with a non-trivial destructor.  We
already supported this as an -fpermissive extension, so this patch
just makes us unconditionally support this.
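To see that the simplification of decl_jump_unsafe drops exactly the non-trivial-destructor case, the old int result and the new bool result can be compared in a small model. `DeclInfo` and the three functions are hypothetical stand-ins written for illustration, not GCC code; each field is a plain flag for the tree predicate named in its comment. The loop inside a `constexpr` function needs C++14.

```cpp
// Hypothetical stand-in for the properties decl_jump_unsafe inspects on a
// GCC tree; the comments name the predicate each field models.
struct DeclInfo {
  bool error_type;                // type == error_mark_node
  bool is_var;                    // VAR_P (decl)
  bool is_static;                 // TREE_STATIC (decl)
  bool nontrivially_initialized;  // DECL_NONTRIVIALLY_INITIALIZED_P (decl)
  bool variably_modified;         // variably_modified_type_p (type, NULL_TREE)
  bool nontrivial_dtor;           // TYPE_HAS_NONTRIVIAL_DESTRUCTOR (type)
};

// Model of the old result: 0 = fine, 1 = only a non-trivial destructor
// (a permerror), 2 = the jump skips real initialization (a hard error).
constexpr int old_decl_jump_unsafe(const DeclInfo& d) {
  if (!d.is_var || d.is_static || d.error_type) return 0;
  if (d.nontrivially_initialized || d.variably_modified) return 2;
  if (d.nontrivial_dtor) return 1;
  return 0;
}

// Model of the new result: the destructor no longer matters.
constexpr bool new_decl_jump_unsafe(const DeclInfo& d) {
  return !d.error_type && d.is_var && !d.is_static &&
         (d.nontrivially_initialized || d.variably_modified);
}

// Checks all 64 flag combinations: new result == (old result == 2).
constexpr bool models_agree() {
  for (unsigned bits = 0; bits < 64; ++bits) {
    const DeclInfo d{(bits & 1) != 0, (bits & 2) != 0,  (bits & 4) != 0,
                     (bits & 8) != 0, (bits & 16) != 0, (bits & 32) != 0};
    if (new_decl_jump_unsafe(d) != (old_decl_jump_unsafe(d) == 2)) return false;
  }
  return true;
}

static_assert(models_agree(), "new predicate is exactly the old 'return 2' case");
```

In this model, a local variable that is trivially initialized but has a non-trivial destructor used to yield 1 and now yields false, which is the case DR 2256 makes valid.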

Bootstrapped and regtested on x86_64-pc-linux-gnu, does this look OK
for trunk?

DR 2256
PR c++/103091

gcc/cp/ChangeLog:

* decl.cc (decl_jump_unsafe): Return bool instead of int.
Don't consider TYPE_HAS_NONTRIVIAL_DESTRUCTOR.
(check_previous_goto_1): Simplify now that decl_jump_unsafe
returns bool instead of int.
(check_goto): Likewise.

gcc/testsuite/ChangeLog:

* g++.old-deja/g++.other/init9.C: Don't expect diagnostics for
goto made valid by DR 2256.
* g++.dg/init/goto4.C: New test.
---
 gcc/cp/decl.cc   | 56 ++--
 gcc/testsuite/g++.dg/init/goto4.C| 22 
 gcc/testsuite/g++.old-deja/g++.other/init9.C |  7 +--
 3 files changed, 42 insertions(+), 43 deletions(-)
 create mode 100644 gcc/testsuite/g++.dg/init/goto4.C

diff --git a/gcc/cp/decl.cc b/gcc/cp/decl.cc
index 71d33d2b7a4..23a2b2fef0b 100644
--- a/gcc/cp/decl.cc
+++ b/gcc/cp/decl.cc
@@ -69,7 +69,7 @@ enum bad_spec_place {
 
 static const char *redeclaration_error_message (tree, tree);
 
-static int decl_jump_unsafe (tree);
+static bool decl_jump_unsafe (tree);
 static void require_complete_types_for_parms (tree);
 static tree grok_reference_init (tree, tree, tree, int);
 static tree grokvardecl (tree, tree, tree, const cp_decl_specifier_seq *,
@@ -3548,10 +3548,9 @@ declare_local_label (tree id)
   return ent ? ent->label_decl : NULL_TREE;
 }
 
-/* Returns nonzero if it is ill-formed to jump past the declaration of
-   DECL.  Returns 2 if it's also a real problem.  */
+/* Returns true if it is ill-formed to jump past the declaration of DECL.  */
 
-static int
+static bool
 decl_jump_unsafe (tree decl)
 {
   /* [stmt.dcl]/3: A program that jumps from a point where a local variable
@@ -3562,18 +3561,11 @@ decl_jump_unsafe (tree decl)
  preceding types and is declared without an initializer (8.5).  */
   tree type = TREE_TYPE (decl);
 
-  if (!VAR_P (decl) || TREE_STATIC (decl)
-  || type == error_mark_node)
-return 0;
-
-  if (DECL_NONTRIVIALLY_INITIALIZED_P (decl)
-  || variably_modified_type_p (type, NULL_TREE))
-return 2;
-
-  if (TYPE_HAS_NONTRIVIAL_DESTRUCTOR (type))
-return 1;
-
-  return 0;
+  return (type != error_mark_node
+ && VAR_P (decl)
+ && !TREE_STATIC (decl)
+ && (DECL_NONTRIVIALLY_INITIALIZED_P (decl)
+ || variably_modified_type_p (type, NULL_TREE)));
 }
 
 /* A subroutine of check_previous_goto_1 and check_goto to identify a branch
@@ -3625,27 +3617,18 @@ check_previous_goto_1 (tree decl, cp_binding_level* level, tree names,
   new_decls = (DECL_P (new_decls) ? DECL_CHAIN (new_decls)
: TREE_CHAIN (new_decls)))
{
- int problem = decl_jump_unsafe (new_decls);
+ bool problem = decl_jump_unsafe (new_decls);
  if (! problem)
continue;
 
  if (!identified)
{
- complained = identify_goto (decl, input_location, locus,
- problem > 1
- ? DK_ERROR : DK_PERMERROR);
+ complained = identify_goto (decl, input_location, locus, DK_ERROR);
  identified = 1;
}
  if (complained)
-   {
- if (problem > 1)
-   inform (DECL_SOURCE_LOCATION (new_decls),
-   "  crosses initialization of %q#D", new_decls);
- else
-   inform (DECL_SOURCE_LOCATION (new_decls),
-   "  enters scope of %q#D, which has "
-   "non-trivial destructor", new_decls);
-   }
+   inform (DECL_SOURCE_LOCATION (new_decls),
+   "  crosses initialization of %q#D", new_decls);
}
 
   if (b == level)
@@ -3790,9 +3773,9 @@ check_goto (tree decl)
 
   FOR_EACH_VEC_SAFE_ELT (ent->bad_decls, ix, bad)
 {
-  int u = decl_jump_unsafe (bad);
+  bool problem = decl_jump_unsafe (bad);
 
-  if (u > 1 && DECL_ARTIFICIAL (bad))
+  if (problem && DECL_ARTIFICIAL (bad))
{
  /* Can't skip init of __exception_info.  */
  if (identified == 1)
@@ -3806,15 +3789,8 @@ check_goto (tree decl)
  saw_catch = true;
}
   else if (complained)
-   {
- if (u > 1)
-   inform (DECL_SOURCE_LOCATION (bad),
-   "  skips initialization of %q#D", bad);
- else
-   inform (DECL_SOURCE_LOCATION (bad),
-   "  enters scope of %q#D which has "
-   "non-trivial destructor", bad);
-   }
+   inform (DECL_SOURCE_LOCATION (bad),
+   "  skips initialization

[PATCH] c++: list CTAD and resolve_nondeduced_context [PR106214]

2023-05-05 Thread Patrick Palka via Gcc-patches

This extends the PR93107 fix, which made us do resolve_nondeduced_context
on the elements of an initializer list during auto deduction, to happen
for CTAD as well.

Bootstrapped and regtested on x86_64-pc-linux-gnu, does this look OK for
trunk?

PR c++/106214
PR c++/93107

gcc/cp/ChangeLog:

* pt.cc (do_auto_deduction): Move up resolve_nondeduced_context
calls to happen before do_class_deduction.  Add some error_mark_node
tests.

gcc/testsuite/ChangeLog:

* g++.dg/cpp1z/class-deduction114.C: New test.
---
 gcc/cp/pt.cc  | 27 +-
 .../g++.dg/cpp1z/class-deduction114.C | 28 +++
 2 files changed, 41 insertions(+), 14 deletions(-)
 create mode 100644 gcc/testsuite/g++.dg/cpp1z/class-deduction114.C

diff --git a/gcc/cp/pt.cc b/gcc/cp/pt.cc
index 8d66fde9f11..94e1664d00c 100644
--- a/gcc/cp/pt.cc
+++ b/gcc/cp/pt.cc
@@ -30793,7 +30793,7 @@ do_auto_deduction (tree type, tree init, tree auto_node,
   int flags /* = LOOKUP_NORMAL */,
   tree tmpl /* = NULL_TREE */)
 {
-  if (init == error_mark_node)
+  if (type == error_mark_node || init == error_mark_node)
 return error_mark_node;
 
   if (init && type_dependent_expression_p (init)
@@ -30827,6 +30827,17 @@ do_auto_deduction (tree type, tree init, tree auto_node,
/*return*/true)))
 init = r;
 
+  if (init && BRACE_ENCLOSED_INITIALIZER_P (init))
+{
+  /* We don't recurse here because we can't deduce from a nested
+initializer_list.  */
+  if (CONSTRUCTOR_ELTS (init))
+   for (constructor_elt &elt : CONSTRUCTOR_ELTS (init))
+ elt.value = resolve_nondeduced_context (elt.value, complain);
+}
+  else if (init)
+init = resolve_nondeduced_context (init, complain);
+
   if (tree ctmpl = CLASS_PLACEHOLDER_TEMPLATE (auto_node))
 /* C++17 class template argument deduction.  */
 return do_class_deduction (type, ctmpl, init, flags, complain);
@@ -30861,24 +30872,12 @@ do_auto_deduction (tree type, tree init, tree auto_node,
}
 }
 
-  if (type == error_mark_node)
+  if (type == error_mark_node || init == error_mark_node)
 return error_mark_node;
 
-  if (BRACE_ENCLOSED_INITIALIZER_P (init))
-{
-  /* We don't recurse here because we can't deduce from a nested
-initializer_list.  */
-  if (CONSTRUCTOR_ELTS (init))
-   for (constructor_elt &elt : CONSTRUCTOR_ELTS (init))
- elt.value = resolve_nondeduced_context (elt.value, complain);
-}
-  else
-init = resolve_nondeduced_context (init, complain);
-
   tree targs;
   if (context == adc_decomp_type
   && auto_node == type
-  && init != error_mark_node
   && TREE_CODE (TREE_TYPE (init)) == ARRAY_TYPE)
 {
  /* [dcl.struct.bind]/1 - if decomposition declaration has no ref-qualifiers
diff --git a/gcc/testsuite/g++.dg/cpp1z/class-deduction114.C b/gcc/testsuite/g++.dg/cpp1z/class-deduction114.C
new file mode 100644
index 000..ba6921d1b96
--- /dev/null
+++ b/gcc/testsuite/g++.dg/cpp1z/class-deduction114.C
@@ -0,0 +1,28 @@
+// PR c++/106214
+// { dg-do compile { target c++17 } }
+// A version of cpp0x/initlist-deduce3.C using list CTAD instead
+// of ordinary auto deduction from std::initializer_list.
+
+using size_t = decltype(sizeof 0);
+
+namespace std {
+  template struct initializer_list {
+const T *ptr;
+size_t n;
+initializer_list(const T*, size_t);
+  };
+}
+
+template
+void Task() {}
+
+template
+struct vector {
+  vector(std::initializer_list);
+};
+
+vector a = &Task; // { dg-error "deduction|no match" }
+vector b = { &Task };
+vector e{ &Task };
+vector f = { &Task, &Task };
+vector d = { static_cast(&Task) };
-- 
2.40.1.476.g69c786637d

Re: [PATCH] tree: Fix up save_expr [PR52339]

2023-05-05 Thread Jason Merrill via Gcc-patches


On 5/5/23 09:40, Jakub Jelinek wrote:

On Fri, May 05, 2023 at 07:38:45AM -0400, Jason Merrill wrote:

On 5/5/23 06:45, Jakub Jelinek wrote:

+  if (TREE_READONLY (t) && !TREE_SIDE_EFFECTS (t))
+{
+  /* Return true for const qualified vars, but for members or array
+elements without side-effects return true only if the base
+object is a decl.  If the base is e.g. a pointer dereference,
+what the pointer points to could be deallocated or the pointer
+could be changed.  See PR52339.  */
+  tree base = get_base_address (t);
+  if (DECL_P (base))
+   return true;


So I think the above is correct.


Ok, will test it with testsuite adjustments for the Ada testcases.
See below.


+  /* As an exception, allow pointer dereferences as long as the pointer
+is invariant.  */
+  if (TREE_CODE (base) == INDIRECT_REF
+ && tree_invariant_p_1 (get_base_address (TREE_OPERAND (base, 0
+   return true;


And this is unsafe.


Ok, idea withdrawn.

I had a further look at the vect6.adb case, but I think it is for the Ada people
to analyze.

As I said, the *.original dump differences are that, instead of using
r.P_BOUNDS->LB0
r.P_BOUNDS->UB0
x.P_BOUNDS->LB0
x.P_BOUNDS->UB0
directly, we wrap those in SAVE_EXPRs in various places (that is the
expected part, and what the patch was about), but also:
-SAVE_EXPR <x.P_BOUNDS->LB0 < r.P_BOUNDS->LB0 || x.P_BOUNDS->UB0 > r.P_BOUNDS->UB0>;
  <<< Unknown tree: loop_stmt
I.0 != (unsigned long) vect6__add__L_1__T3b___U
I.0 = I.0 + 1;
i = (vect6_pkg__index_type) I.0;
-  if ((SAVE_EXPR <x.P_BOUNDS->LB0 < r.P_BOUNDS->LB0 || x.P_BOUNDS->UB0 > r.P_BOUNDS->UB0>) && .BUILTIN_EXPECT (r.P_BOUNDS->LB0 > i || r.P_BOUNDS->UB0 < i, 0, 15))
+  if (SAVE_EXPR <r.P_BOUNDS->LB0> > i || SAVE_EXPR <r.P_BOUNDS->UB0> < i)
  {
.gnat_rcheck_CE_Index_Check ("vect6.adb", 9);
  }


From this diff it looks like the change stops looking at x.P_BOUNDS 
entirely, which seems like more of an optimization than hoisting that 
check out of the loop?



So, if the {x,r}.P_BOUNDS->{L,U}B0 expressions aren't wrapped in
SAVE_EXPRs, something in the FE decides to evaluate the
x.P_BOUNDS->LB0 < r.P_BOUNDS->LB0 || x.P_BOUNDS->UB0 > r.P_BOUNDS->UB0
expression into a temporary before the loop and to && the bounds condition
inside the loop with it, while with the patch that doesn't happen.
As a result, loop unswitching succeeds without my patch but not with it;
when it succeeds, the unswitched copy of the loop has no
.gnat_rcheck_CE_Index_Check call and can be vectorized.

Perhaps ada/gcc-interface/utils2.cc (gnat_invariant_expr) could be taught
to handle SAVE_EXPR by looking at its operand?
--- gcc/ada/gcc-interface/utils2.cc.jj  2023-01-16 23:19:05.539727388 +0100
+++ gcc/ada/gcc-interface/utils2.cc 2023-05-05 15:37:30.193990948 +0200
@@ -3332,6 +3332,7 @@ gnat_invariant_expr (tree expr)
case IMAGPART_EXPR:
case VIEW_CONVERT_EXPR:
CASE_CONVERT:
+   case SAVE_EXPR:


I guess doing this would allow gnat_invariant_expr to handle 
DECL_INVARIANT_P that save_expr doesn't know about.  But it seems that 
it makes the same assumption as tree_invariant_p_1 about the pointed-to 
object not changing:



case INDIRECT_REF:
  if ((!invariant_p && !TREE_READONLY (t)) || TREE_SIDE_EFFECTS (t))
return NULL_TREE;


I don't know if this assumption is any more valid in Ada than in C/C++.


  break;
  
  	case INDIRECT_REF:

fixes the vect{1,2,3,4,5,6}.adb regressions but not the
loop_optimization21.adb one.  But I'm afraid I really have no idea what
that code is doing.

2023-05-05  Jakub Jelinek  

PR c++/52339
* tree.cc (tree_invariant_p_1): For TREE_READONLY (t) without
side-effects, only return true if DECL_P (get_base_address (t)).

* g++.dg/opt/pr52339.C: New test.
* gcc.c-torture/execute/pr52339-1.c: New test.
* gcc.c-torture/execute/pr52339-2.c: New test.
* gnat.dg/loop_optimization21.adb: Adjust expected match count.
* gnat.dg/vect1.adb: Likewise.
* gnat.dg/vect2.adb: Likewise.
* gnat.dg/vect3.adb: Likewise.
* gnat.dg/vect4.adb: Likewise.
* gnat.dg/vect5.adb: Likewise.
* gnat.dg/vect6.adb: Likewise.

--- gcc/tree.cc.jj  2023-05-01 09:59:46.686293833 +0200
+++ gcc/tree.cc 2023-05-05 10:19:19.061827355 +0200
@@ -3876,10 +3876,21 @@ tree_invariant_p_1 (tree t)
  {
tree op;
  
-  if (TREE_CONSTANT (t)
-  || (TREE_READONLY (t) && !TREE_SIDE_EFFECTS (t)))
+  if (TREE_CONSTANT (t))
  return true;
  
+  if (TREE_READONLY (t) && !TREE_SIDE_EFFECTS (t))
+{
+  /* Return true for const qualified vars, but for members or array
+elements without side-effects return true only if the base
+object is a decl.  If the base is e.g. a pointer dereference,
+what the pointer points to could be deallocated or the pointer
+

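save_expr leaves an expression unwrapped when tree_invariant_p_1 accepts it. The premise is that evaluating the expression again at each use gives the same value as evaluating it once. The small C sketch below shows why a read-only member reached through a pointer breaks that premise. It uses hypothetical names (struct bounds, first, second, saved_minus_reread) and is not the PR52339 testcase. It also models only the "pointer could be changed" half of the comment above; deallocation is the other half.

```c
/* Hypothetical type with a read-only member.  An access such as p->lb is
   read-only, but its base is a pointer dereference, not a declaration.  */
struct bounds
{
  const int lb;
};

static const struct bounds first = { 1 };
static const struct bounds second = { 2 };

/* Evaluates (*pp)->lb, then changes the pointer the way intervening code
   might, then evaluates the same expression again.  The saved value and
   the re-evaluated one differ, so re-evaluating the expression is not a
   valid substitute for saving it.  */
int
saved_minus_reread (const struct bounds **pp)
{
  int saved = (*pp)->lb;   /* the value a SAVE_EXPR would keep */
  *pp = &second;           /* the pointer is redirected between uses */
  int reread = (*pp)->lb;  /* the value a re-evaluation would see */
  return saved - reread;
}
```

Suppose p initially points at first. The function then returns -1 (1 - 2) and leaves p pointing at second. Only when the base is a declaration, as in the new DECL_P (get_base_address (t)) check, is the member guaranteed to designate the same object at every evaluation.
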
[RFC] RISC-V: Add proposed Ztso atomic mappings

2023-05-05 Thread Patrick O'Neill

The RISC-V Ztso extension currently has no effect on generated code.
With the additional ordering constraints guaranteed by Ztso, we can emit
more optimized atomic mappings than the RVWMO mappings.

This patch implements Andrea Parri's proposed Ztso mappings ("Proposed
Mapping").
  https://github.com/preames/public-notes/blob/master/riscv-tso-mappings.rst

LLVM has implemented this same mapping (Ztso is still behind an
experimental flag in LLVM, so there is *not* a defined ABI for this yet).
  https://reviews.llvm.org/D143076
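
The minimal C11 sketch below makes the motivation concrete: a release store paired with an acquire load. The names (payload, ready, publish, consume) are hypothetical, and this is not one of the amo-table-ztso tests. Under RVWMO, GCC adds FENCE instructions around such accesses to obtain acquire/release ordering. Under Ztso, ordinary loads and stores already have that ordering, which is what lets the proposed mapping drop fences. The mapping table linked above and the new sync-ztso.md patterns define which instructions each memory order actually becomes; this sketch does not.

```c
#include <stdatomic.h>

/* Hypothetical message-passing pair.  The relaxed accesses need no
   ordering of their own; the release store and the acquire load are the
   operations whose RISC-V lowering differs between the RVWMO mappings
   and the proposed Ztso mappings.  */
static atomic_int payload;
static atomic_int ready;

void
publish (int value)
{
  atomic_store_explicit (&payload, value, memory_order_relaxed);
  /* Release: the payload store must become visible before the flag.  */
  atomic_store_explicit (&ready, 1, memory_order_release);
}

/* Returns 1 and stores the payload in *OUT if it has been published,
   otherwise returns 0 and leaves *OUT unchanged.  */
int
consume (int *out)
{
  /* Acquire: the payload load cannot be satisfied before the flag load.  */
  if (atomic_load_explicit (&ready, memory_order_acquire) == 0)
    return 0;
  *out = atomic_load_explicit (&payload, memory_order_relaxed);
  return 1;
}
```

One way to see the effect is to compile the file with -O2 -S twice and diff the assembly: once for rv64gc, and once with Ztso enabled in -march. The extension is assumed to be spelled _ztso, e.g. rv64gc_ztso. The single-threaded behaviour is identical either way.
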

2023-05-04 Patrick O'Neill 

gcc/ChangeLog:

* common/config/riscv/riscv-common.cc: Add Ztso and mark Ztso as
dependent on 'a' extension.
* config/riscv/riscv-opts.h (MASK_ZTSO): New mask.
(TARGET_ZTSO): New target.
* config/riscv/riscv.cc (riscv_memmodel_needs_amo_acquire): Add
Ztso case.
(riscv_memmodel_needs_amo_release): Add Ztso case.
(riscv_print_operand): Add Ztso case for LR/SC annotations.
* config/riscv/riscv.md: Import sync-rvwmo.md and sync-ztso.md.
* config/riscv/riscv.opt: Add Ztso target variable.
* config/riscv/sync.md (mem_thread_fence_1): Expand to RVWMO or
Ztso specific insn.
(atomic_load): Expand to RVWMO or Ztso specific insn.
(atomic_store): Expand to RVWMO or Ztso specific insn.
* config/riscv/sync-rvwmo.md: New file. Separate out RVWMO
specific load/store/fence mappings.
* config/riscv/sync-ztso.md: New file. Separate out Ztso
specific load/store/fence mappings.

gcc/testsuite/ChangeLog:

* gcc.target/riscv/amo-table-ztso-amo-add-1.c: New test.
* gcc.target/riscv/amo-table-ztso-amo-add-2.c: New test.
* gcc.target/riscv/amo-table-ztso-amo-add-3.c: New test.
* gcc.target/riscv/amo-table-ztso-amo-add-4.c: New test.
* gcc.target/riscv/amo-table-ztso-amo-add-5.c: New test.
* gcc.target/riscv/amo-table-ztso-compare-exchange-1.c: New test.
* gcc.target/riscv/amo-table-ztso-compare-exchange-2.c: New test.
* gcc.target/riscv/amo-table-ztso-compare-exchange-3.c: New test.
* gcc.target/riscv/amo-table-ztso-compare-exchange-4.c: New test.
* gcc.target/riscv/amo-table-ztso-compare-exchange-5.c: New test.
* gcc.target/riscv/amo-table-ztso-compare-exchange-6.c: New test.
* gcc.target/riscv/amo-table-ztso-compare-exchange-7.c: New test.
* gcc.target/riscv/amo-table-ztso-fence-1.c: New test.
* gcc.target/riscv/amo-table-ztso-fence-2.c: New test.
* gcc.target/riscv/amo-table-ztso-fence-3.c: New test.
* gcc.target/riscv/amo-table-ztso-fence-4.c: New test.
* gcc.target/riscv/amo-table-ztso-fence-5.c: New test.
* gcc.target/riscv/amo-table-ztso-load-1.c: New test.
* gcc.target/riscv/amo-table-ztso-load-2.c: New test.
* gcc.target/riscv/amo-table-ztso-load-3.c: New test.
* gcc.target/riscv/amo-table-ztso-store-1.c: New test.
* gcc.target/riscv/amo-table-ztso-store-2.c: New test.
* gcc.target/riscv/amo-table-ztso-store-3.c: New test.
* gcc.target/riscv/amo-table-ztso-subword-amo-add-1.c: New test.
* gcc.target/riscv/amo-table-ztso-subword-amo-add-2.c: New test.
* gcc.target/riscv/amo-table-ztso-subword-amo-add-3.c: New test.
* gcc.target/riscv/amo-table-ztso-subword-amo-add-4.c: New test.
* gcc.target/riscv/amo-table-ztso-subword-amo-add-5.c: New test.

Signed-off-by: Patrick O'Neill 
---
 gcc/common/config/riscv/riscv-common.cc   |   4 +
 gcc/config/riscv/riscv-opts.h |   4 +
 gcc/config/riscv/riscv.cc |  18 ++-
 gcc/config/riscv/riscv.md |   2 +
 gcc/config/riscv/riscv.opt|   3 +
 gcc/config/riscv/sync-rvwmo.md|  96 
 gcc/config/riscv/sync-ztso.md |  71 
 gcc/config/riscv/sync.md  | 107 ++
 .../riscv/amo-table-ztso-amo-add-1.c  |  15 +++
 .../riscv/amo-table-ztso-amo-add-2.c  |  15 +++
 .../riscv/amo-table-ztso-amo-add-3.c  |  15 +++
 .../riscv/amo-table-ztso-amo-add-4.c  |  15 +++
 .../riscv/amo-table-ztso-amo-add-5.c  |  15 +++
 .../riscv/amo-table-ztso-compare-exchange-1.c |  10 ++
 .../riscv/amo-table-ztso-compare-exchange-2.c |  10 ++
 .../riscv/amo-table-ztso-compare-exchange-3.c |  10 ++
 .../riscv/amo-table-ztso-compare-exchange-4.c |  10 ++
 .../riscv/amo-table-ztso-compare-exchange-5.c |  10 ++
 .../riscv/amo-table-ztso-compare-exchange-6.c |  10 ++
 .../riscv/amo-table-ztso-compare-exchange-7.c |  10 ++
 .../gcc.target/riscv/amo-table-ztso-fence-1.c |  14 +++
 .../gcc.target/riscv/amo-table-ztso-fence-2.c |  14 +++
 .../gcc.target/riscv/amo-table-ztso-fence-3.c |  14 +++
 .../gcc.target/riscv/amo-table-ztso-fence-4.c |  14 +++
 .../gcc.target/riscv/amo-table-ztso-fence-5.c |  15 +++
 .../gcc.target

Re: [PATCH] Makefile.in: clean up match.pd-related dependencies

2023-05-05 Thread Richard Biener via Gcc-patches




> On 05.05.2023 at 19:03, Alexander Monakov via Gcc-patches wrote:
> 
> Clean up confusing changes from the recent refactoring for
> parallel match.pd build.
> 
> gimple-match-head.o is not built. Remove related flags adjustment.
> 
> Autogenerated gimple-match-N.o files do not depend on
> gimple-match-exports.cc.
> 
> {gimple,generic}-match-auto.h only depend on the prerequisites of the
> corresponding s-{gimple,generic}-match stamp file, not any .cc file.

LGTM

> gcc/ChangeLog:
> 
>* Makefile.in: (gimple-match-head.o-warn): Remove.
>(GIMPLE_MATCH_PD_SEQ_SRC): Do not depend on
>gimple-match-exports.cc.
>(gimple-match-auto.h): Only depend on s-gimple-match.
>(generic-match-auto.h): Likewise.
> ---
> 
> Tamar, do I understand correctly that you do not have more plans for match.pd
> and I won't collide with you if I attempt more cleanups in this area? Thanks!
> 
> gcc/Makefile.in | 9 +++--
> 1 file changed, 3 insertions(+), 6 deletions(-)
> 
> diff --git a/gcc/Makefile.in b/gcc/Makefile.in
> index 7e7ac078c5..0cc13c37d0 100644
> --- a/gcc/Makefile.in
> +++ b/gcc/Makefile.in
> @@ -230,7 +230,6 @@ gengtype-lex.o-warn = -Wno-error
> libgcov-util.o-warn = -Wno-error
> libgcov-driver-tool.o-warn = -Wno-error
> libgcov-merge-tool.o-warn = -Wno-error
> -gimple-match-head.o-warn = -Wno-unused
> gimple-match-exports.o-warn = -Wno-unused
> dfp.o-warn = -Wno-strict-aliasing
> 
> @@ -2674,12 +2673,10 @@ s-tm-texi: build/genhooks$(build_exeext) $(srcdir)/doc/tm.texi.in
>  false; \
>fi
> 
> -$(GIMPLE_MATCH_PD_SEQ_SRC): s-gimple-match gimple-match-head.cc \
> -gimple-match-exports.cc; @true
> -gimple-match-auto.h: s-gimple-match gimple-match-head.cc \
> -gimple-match-exports.cc; @true
> +$(GIMPLE_MATCH_PD_SEQ_SRC): s-gimple-match gimple-match-head.cc; @true
> +gimple-match-auto.h: s-gimple-match; @true
> $(GENERIC_MATCH_PD_SEQ_SRC): s-generic-match generic-match-head.cc; @true
> -generic-match-auto.h: s-generic-match generic-match-head.cc; @true
> +generic-match-auto.h: s-generic-match; @true
> 
> s-gimple-match: build/genmatch$(build_exeext) \
>$(srcdir)/match.pd cfn-operators.pd
> -- 
> 2.39.2
>

Re: [PATCH v6 0/9] RISC-V: autovec: Add autovec support

2023-05-05 Thread Michael Collison

Because everyone was commenting that we needed vector load/store support 
(including Juzhe). Juzhe specifically pointed me to his patch for the 
load/store patterns in his review of my code. Would you like me to 
remove the patterns?


On 5/5/23 12:34, Kito Cheng wrote:

Errr, why did you mix JuZhe’s patch set into this patch set?

On Fri, May 5, 2023 at 23:47, Michael Collison wrote:

This series of patches adds foundational support for RISC-V
auto-vectorization. These patches are based on the current
upstream rvv vector intrinsic support and are not a new
implementation. Most of the implementation consists of adding the
new vector cost model, the autovectorization patterns themselves,
and target hooks. This implementation only provides support for
integer addition and subtraction as a proof of concept. This patch
set should not be construed to be feature complete. Based on
conversations with the community, these patches are intended to lay
the groundwork for feature completion and collaboration within the
RISC-V community.
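
For readers less familiar with the target, the loops this series aims to vectorize look roughly like the sketch below. The sketch assumes element-wise integer addition and subtraction with no loop-carried dependence. The function names are hypothetical, and this is not the contents of the loop-add.c or loop-sub.c tests. The C semantics are the same whether or not GCC vectorizes the loops, so vectorization itself has to be checked in the generated assembly, for example with -O3 -march=rv64gcv on a toolchain carrying these patches.

```c
#include <stddef.h>
#include <stdint.h>

/* Element-wise addition: each iteration is independent, so the loop is a
   candidate for RVV vector adds once the autovec patterns are in place.  */
void
add_arrays (int32_t *restrict dst, const int32_t *restrict a,
            const int32_t *restrict b, size_t n)
{
  for (size_t i = 0; i < n; i++)
    dst[i] = a[i] + b[i];
}

/* Element-wise subtraction, the other operation named above.  */
void
sub_arrays (int32_t *restrict dst, const int32_t *restrict a,
            const int32_t *restrict b, size_t n)
{
  for (size_t i = 0; i < n; i++)
    dst[i] = a[i] - b[i];
}
```
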

These patches are largely based on the work of Juzhe Zhong
(juzhe.zh...@rivai.ai) of RiVAI. More specifically, the rvv-next
branch at https://github.com/riscv-collab/riscv-gcc.git is the
foundation of this patch set.

As discussed on this list, if these patches are approved they will
be merged into an "auto-vectorization" branch once gcc-13 branches
for release. There are two known issues related to crashes (assert
failures) associated with tree vectorization; for one of them I have
sent a patch and received feedback.

Changes in v6:
- Incorporated upstream comments, added target hook for
TARGET_VECTORIZE_SUPPORT_VECTOR_MISALIGNMENT

Changes in v5:

- Incorporated upstream comments, largely to delete unnecessary code

Changes in v4:

- Added support for binary integer operations and test cases
- Fixed bug to support 8-bit integer vectorization
- Fixed several assert errors related to non-multiple of two
vector modes

Changes in v3:

- Removed the cost model and cost hooks based on feedback from
Richard Biener
- Used RVV_VUNDEF macro to fix failing patterns

Changes in v2:

- Updated ChangeLog entry to include RiVAI contributions
- Fixed ChangeLog email formatting
- Fixed gnu formatting issues in the code

Kevin Lee (1):
  RISC-V:autovec: This patch supports 8 bit auto-vectorization in
riscv.

Michael Collison (8):
  RISC-V: Add new predicates and function prototypes
  RISC-V: autovec: Export policy functions to global scope
  RISC-V:autovec: Add auto-vectorization support functions
  RISC-V:autovec: Add target vectorization hooks
  RISC-V:autovec: Add autovectorization patterns for binary integer &
    len_load/store
  RISC-V:autovec: Add autovectorization tests for add & sub
  vect: Verify that GET_MODE_NUNITS is a multiple of 2.
  RISC-V:autovec: Add autovectorization tests for binary integer

 gcc/config/riscv/riscv-opts.h                 |  10 ++
 gcc/config/riscv/riscv-protos.h               |   9 ++
 gcc/config/riscv/riscv-v.cc                   |  91 
 gcc/config/riscv/riscv-vector-builtins.cc     |   4 +-
 gcc/config/riscv/riscv-vector-builtins.h      |   3 +
 gcc/config/riscv/riscv.cc                     | 130 ++
 gcc/config/riscv/riscv.md                     |   1 +
 gcc/config/riscv/vector-auto.md               |  74 ++
 gcc/config/riscv/vector.md                    |   4 +-
 .../riscv/rvv/autovec/loop-add-rv32.c         |  25 
 .../gcc.target/riscv/rvv/autovec/loop-add.c   |  25 
 .../riscv/rvv/autovec/loop-and-rv32.c         |  25 
 .../gcc.target/riscv/rvv/autovec/loop-and.c   |  25 
 .../riscv/rvv/autovec/loop-div-rv32.c         |  27 
 .../gcc.target/riscv/rvv/autovec/loop-div.c   |  27 
 .../riscv/rvv/autovec/loop-max-rv32.c         |  26 
 .../gcc.target/riscv/rvv/autovec/loop-max.c   |  26 
 .../riscv/rvv/autovec/loop-min-rv32.c         |  26 
 .../gcc.target/riscv/rvv/autovec/loop-min.c   |  26 
 .../riscv/rvv/autovec/loop-mod-rv32.c         |  27 
 .../gcc.target/riscv/rvv/autovec/loop-mod.c   |  27 
 .../riscv/rvv/autovec/loop-mul-rv32.c         |  25 
 .../gcc.target/riscv/rvv/autovec/loop-mul.c   |  25 
 .../riscv/rvv/autovec/loop-or-rv32.c          |  25 
 .../gcc.target/riscv/rvv/autovec/loop-or.c    |  25 
 .../riscv/rvv/autovec/loop-sub-rv32.c         |  25 
 .../gcc.target/riscv/rvv/autovec/loop-sub.c   |  25 
 .../riscv/rvv/autovec/loop-xor-rv32.c         |  25 
 .../gcc.target/riscv/rvv/autovec/loop-xor.c   |  25 
 gcc/testsuite/gcc.tar

[PATCH] Makefile.in: clean up match.pd-related dependencies

2023-05-05 Thread Alexander Monakov via Gcc-patches

Clean up confusing changes from the recent refactoring for
parallel match.pd build.

gimple-match-head.o is not built. Remove related flags adjustment.

Autogenerated gimple-match-N.o files do not depend on
gimple-match-exports.cc.

{gimple,generic}-match-auto.h only depend on the prerequisites of the
corresponding s-{gimple,generic}-match stamp file, not any .cc file.

gcc/ChangeLog:

* Makefile.in: (gimple-match-head.o-warn): Remove.
(GIMPLE_MATCH_PD_SEQ_SRC): Do not depend on
gimple-match-exports.cc.
(gimple-match-auto.h): Only depend on s-gimple-match.
(generic-match-auto.h): Likewise.
---

Tamar, do I understand correctly that you do not have more plans for match.pd
and I won't collide with you if I attempt more cleanups in this area? Thanks!

 gcc/Makefile.in | 9 +++--
 1 file changed, 3 insertions(+), 6 deletions(-)

diff --git a/gcc/Makefile.in b/gcc/Makefile.in
index 7e7ac078c5..0cc13c37d0 100644
--- a/gcc/Makefile.in
+++ b/gcc/Makefile.in
@@ -230,7 +230,6 @@ gengtype-lex.o-warn = -Wno-error
 libgcov-util.o-warn = -Wno-error
 libgcov-driver-tool.o-warn = -Wno-error
 libgcov-merge-tool.o-warn = -Wno-error
-gimple-match-head.o-warn = -Wno-unused
 gimple-match-exports.o-warn = -Wno-unused
 dfp.o-warn = -Wno-strict-aliasing
 
@@ -2674,12 +2673,10 @@ s-tm-texi: build/genhooks$(build_exeext) $(srcdir)/doc/tm.texi.in
  false; \
fi
 
-$(GIMPLE_MATCH_PD_SEQ_SRC): s-gimple-match gimple-match-head.cc \
-   gimple-match-exports.cc; @true
-gimple-match-auto.h: s-gimple-match gimple-match-head.cc \
-   gimple-match-exports.cc; @true
+$(GIMPLE_MATCH_PD_SEQ_SRC): s-gimple-match gimple-match-head.cc; @true
+gimple-match-auto.h: s-gimple-match; @true
 $(GENERIC_MATCH_PD_SEQ_SRC): s-generic-match generic-match-head.cc; @true
-generic-match-auto.h: s-generic-match generic-match-head.cc; @true
+generic-match-auto.h: s-generic-match; @true
 
 s-gimple-match: build/genmatch$(build_exeext) \
$(srcdir)/match.pd cfn-operators.pd
-- 
2.39.2

[PATCH] ira: Don't create copies for earlyclobbered pairs

2023-05-05 Thread Richard Sandiford via Gcc-patches

This patch follows on from g:9f635bd13fe9e85872e441b6f3618947f989909a
("the previous patch").  To start by quoting that:

If an insn requires two operands to be tied, and the input operand dies
in the insn, IRA acts as though there were a copy from the input to the
output with the same execution frequency as the insn.  Allocating the
same register to the input and the output then saves the cost of a move.

If there is no such tie, but an input operand nevertheless dies
in the insn, IRA creates a similar move, but with an eighth of the
frequency.  This helps to ensure that chains of instructions reuse
registers in a natural way, rather than using arbitrarily different
registers for no reason.

This heuristic seems to work well in the vast majority of cases.
However, the problem fixed in the previous patch was that we
could create a copy for an operand pair even if, for all relevant
alternatives, the output and input register classes did not have
any registers in common.  It is then impossible for the output
operand to reuse the dying input register.

This left unfixed a further case where copies don't make sense:
there is no point trying to reuse the dying input register if,
for all relevant alternatives, the output is earlyclobbered and
the input doesn't match the output.  (Matched earlyclobbers are fine.)

Handling that case fixes several existing XFAILs and helps with
a follow-on aarch64 patch.
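
The copy heuristic described above can be summarised with a toy model. The model covers both the class-overlap condition from the previous patch and the non-matching earlyclobber condition added here. It does not reproduce IRA's data structures or the can_use_same_reg_p implementation. Alternative, can_reuse_dying_input and implicit_copy_freq are hypothetical names, and register classes are modelled as 64-bit masks of hard registers. The 1/8 scaling follows the description above.

```cpp
#include <cstdint>
#include <vector>

// Toy model of one constraint alternative for an (output, input) operand
// pair.  Register classes are represented as bitmasks of hard registers.
struct Alternative {
  std::uint64_t out_regs;  // registers the output may use
  std::uint64_t in_regs;   // registers the input may use
  bool out_earlyclobber;   // output is written before all inputs are read
  bool in_matches_out;     // input is tied to the output ("matched")
};

// True if some alternative would let the output reuse the register of an
// input that dies in the insn.
bool can_reuse_dying_input(const std::vector<Alternative>& alts) {
  for (const Alternative& alt : alts) {
    // A non-matching earlyclobbered output can never share a register
    // with the input, so this alternative gives no reason for a copy.
    if (alt.out_earlyclobber && !alt.in_matches_out)
      continue;
    // The classes must have at least one register in common.
    if ((alt.out_regs & alt.in_regs) != 0)
      return true;
  }
  return false;
}

// Frequency of the implicit copy recorded for a dying input: the full insn
// frequency when the operands are tied, an eighth of it when they are not,
// and no copy at all when no alternative allows reuse.
int implicit_copy_freq(const std::vector<Alternative>& alts, bool tied,
                       int insn_freq) {
  if (tied)
    return insn_freq;
  return can_reuse_dying_input(alts) ? insn_freq / 8 : 0;
}
```

In this model, an untied pair whose only alternative has an earlyclobbered, non-matching output gets a copy frequency of 0. The same registers without the earlyclobber get 1/8 of the insn frequency.
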

Tested on aarch64-linux-gnu and x86_64-linux-gnu.  A SPEC2017 run
on aarch64 showed no differences outside the noise.  Also, I tried
compiling gcc.c-torture, gcc.dg, and g++.dg for at least one target
per cpu directory, using the options -Os -fno-schedule-insns{,2}.
The results below summarise the tests that showed a difference in LOC:

Target               Tests  Good  Bad  Delta   Best  Worst  Median
===================  =====  ====  ===  =====  =====  =====  ======
amdgcn-amdhsa           14     7    7      3    -18     10      -1
arm-linux-gnueabihf     16    15    1    -22     -4      2      -1
csky-elf                 6     6    0    -21     -6     -2      -4
hppa64-hp-hpux11.23      5     5    0     -7     -2     -1      -1
ia64-linux-gnu          16    16    0    -70    -15     -1      -3
m32r-elf                53     1   52     64     -2      8       1
mcore-elf                2     2    0     -8     -6     -2      -6
microblaze-elf         285   283    2   -909    -68      4      -1
mmix                     7     7    0  -2101  -2091     -1      -1
msp430-elf               1     1    0     -4     -4     -4      -4
pru-elf                  8     6    2    -12     -6      2      -2
rx-elf                  22    18    4    -40     -5      6      -2
sparc-linux-gnu         15    14    1    -40     -8      1      -2
sparc-wrs-vxworks       15    14    1    -40     -8      1      -2
visium-elf               2     1    1      0     -2      2      -2
xstormy16-elf            1     1    0     -2     -2     -2      -2

with other targets showing no sensitivity to the patch.  The only
target that seems to be negatively affected is m32r-elf; otherwise
the patch seems like an extremely minor but still clear improvement.

OK to install?

Richard


gcc/
* ira-conflicts.cc (can_use_same_reg_p): Skip over non-matching
earlyclobbers.

gcc/testsuite/
* gcc.target/aarch64/sve/acle/asm/asr_wide_s16.c: Remove XFAILs.
* gcc.target/aarch64/sve/acle/asm/asr_wide_s32.c: Likewise.
* gcc.target/aarch64/sve/acle/asm/asr_wide_s8.c: Likewise.
* gcc.target/aarch64/sve/acle/asm/bic_s32.c: Likewise.
* gcc.target/aarch64/sve/acle/asm/bic_s64.c: Likewise.
* gcc.target/aarch64/sve/acle/asm/bic_u32.c: Likewise.
* gcc.target/aarch64/sve/acle/asm/bic_u64.c: Likewise.
* gcc.target/aarch64/sve/acle/asm/lsl_wide_s16.c: Likewise.
* gcc.target/aarch64/sve/acle/asm/lsl_wide_s32.c: Likewise.
* gcc.target/aarch64/sve/acle/asm/lsl_wide_s8.c: Likewise.
* gcc.target/aarch64/sve/acle/asm/lsl_wide_u16.c: Likewise.
* gcc.target/aarch64/sve/acle/asm/lsl_wide_u32.c: Likewise.
* gcc.target/aarch64/sve/acle/asm/lsl_wide_u8.c: Likewise.
* gcc.target/aarch64/sve/acle/asm/lsr_wide_u16.c: Likewise.
* gcc.target/aarch64/sve/acle/asm/lsr_wide_u32.c: Likewise.
* gcc.target/aarch64/sve/acle/asm/lsr_wide_u8.c: Likewise.
* gcc.target/aarch64/sve/acle/asm/scale_f32.c: Likewise.
* gcc.target/aarch64/sve/acle/asm/scale_f64.c: Likewise.
---
 gcc/ira-conflicts.cc | 3 +++
 gcc/testsuite/gcc.target/aarch64/sve/acle/asm/asr_wide_s16.c | 2 +-
 gcc/testsuite/gcc.target/aarch64/sve/acle/asm/asr_wide_s32.c | 2 +-
 gcc/testsuite/gcc.target/aarch64/sve/acle/asm/asr_wide_s8.c  | 2 +-
 gcc/testsuite/gcc.target/aarch64/sve/acle/asm/bic_s32.c  | 2 +-
 gcc/testsuite/gcc.target/aarc

[PATCH 07/10] arm: [MVE intrinsics] rework vmovnbq vmovntq vqmovnbq vqmovntq vqmovunbq vqmovuntq

2023-05-05 Thread Christophe Lyon via Gcc-patches

Implement vmovnbq, vmovntq, vqmovnbq, vqmovntq, vqmovunbq, vqmovuntq
using the new MVE builtins framework.

2022-09-08  Christophe Lyon  

gcc/
* config/arm/arm-mve-builtins-base.cc (vmovnbq, vmovntq, vqmovnbq)
(vqmovntq, vqmovunbq, vqmovuntq): New.
* config/arm/arm-mve-builtins-base.def (vmovnbq, vmovntq)
(vqmovnbq, vqmovntq, vqmovunbq, vqmovuntq): New.
* config/arm/arm-mve-builtins-base.h (vmovnbq, vmovntq, vqmovnbq)
(vqmovntq, vqmovunbq, vqmovuntq): New.
* config/arm/arm-mve-builtins.cc
(function_instance::has_inactive_argument): Handle vmovnbq,
vmovntq, vqmovnbq, vqmovntq, vqmovunbq, vqmovuntq.
* config/arm/arm_mve.h (vqmovntq): Remove.
(vqmovnbq): Remove.
(vqmovnbq_m): Remove.
(vqmovntq_m): Remove.
(vqmovntq_u16): Remove.
(vqmovnbq_u16): Remove.
(vqmovntq_s16): Remove.
(vqmovnbq_s16): Remove.
(vqmovntq_u32): Remove.
(vqmovnbq_u32): Remove.
(vqmovntq_s32): Remove.
(vqmovnbq_s32): Remove.
(vqmovnbq_m_s16): Remove.
(vqmovntq_m_s16): Remove.
(vqmovnbq_m_u16): Remove.
(vqmovntq_m_u16): Remove.
(vqmovnbq_m_s32): Remove.
(vqmovntq_m_s32): Remove.
(vqmovnbq_m_u32): Remove.
(vqmovntq_m_u32): Remove.
(__arm_vqmovntq_u16): Remove.
(__arm_vqmovnbq_u16): Remove.
(__arm_vqmovntq_s16): Remove.
(__arm_vqmovnbq_s16): Remove.
(__arm_vqmovntq_u32): Remove.
(__arm_vqmovnbq_u32): Remove.
(__arm_vqmovntq_s32): Remove.
(__arm_vqmovnbq_s32): Remove.
(__arm_vqmovnbq_m_s16): Remove.
(__arm_vqmovntq_m_s16): Remove.
(__arm_vqmovnbq_m_u16): Remove.
(__arm_vqmovntq_m_u16): Remove.
(__arm_vqmovnbq_m_s32): Remove.
(__arm_vqmovntq_m_s32): Remove.
(__arm_vqmovnbq_m_u32): Remove.
(__arm_vqmovntq_m_u32): Remove.
(__arm_vqmovntq): Remove.
(__arm_vqmovnbq): Remove.
(__arm_vqmovnbq_m): Remove.
(__arm_vqmovntq_m): Remove.
(vmovntq): Remove.
(vmovnbq): Remove.
(vmovnbq_m): Remove.
(vmovntq_m): Remove.
(vmovntq_u16): Remove.
(vmovnbq_u16): Remove.
(vmovntq_s16): Remove.
(vmovnbq_s16): Remove.
(vmovntq_u32): Remove.
(vmovnbq_u32): Remove.
(vmovntq_s32): Remove.
(vmovnbq_s32): Remove.
(vmovnbq_m_s16): Remove.
(vmovntq_m_s16): Remove.
(vmovnbq_m_u16): Remove.
(vmovntq_m_u16): Remove.
(vmovnbq_m_s32): Remove.
(vmovntq_m_s32): Remove.
(vmovnbq_m_u32): Remove.
(vmovntq_m_u32): Remove.
(__arm_vmovntq_u16): Remove.
(__arm_vmovnbq_u16): Remove.
(__arm_vmovntq_s16): Remove.
(__arm_vmovnbq_s16): Remove.
(__arm_vmovntq_u32): Remove.
(__arm_vmovnbq_u32): Remove.
(__arm_vmovntq_s32): Remove.
(__arm_vmovnbq_s32): Remove.
(__arm_vmovnbq_m_s16): Remove.
(__arm_vmovntq_m_s16): Remove.
(__arm_vmovnbq_m_u16): Remove.
(__arm_vmovntq_m_u16): Remove.
(__arm_vmovnbq_m_s32): Remove.
(__arm_vmovntq_m_s32): Remove.
(__arm_vmovnbq_m_u32): Remove.
(__arm_vmovntq_m_u32): Remove.
(__arm_vmovntq): Remove.
(__arm_vmovnbq): Remove.
(__arm_vmovnbq_m): Remove.
(__arm_vmovntq_m): Remove.
(vqmovuntq): Remove.
(vqmovunbq): Remove.
(vqmovunbq_m): Remove.
(vqmovuntq_m): Remove.
(vqmovuntq_s16): Remove.
(vqmovunbq_s16): Remove.
(vqmovuntq_s32): Remove.
(vqmovunbq_s32): Remove.
(vqmovunbq_m_s16): Remove.
(vqmovuntq_m_s16): Remove.
(vqmovunbq_m_s32): Remove.
(vqmovuntq_m_s32): Remove.
(__arm_vqmovuntq_s16): Remove.
(__arm_vqmovunbq_s16): Remove.
(__arm_vqmovuntq_s32): Remove.
(__arm_vqmovunbq_s32): Remove.
(__arm_vqmovunbq_m_s16): Remove.
(__arm_vqmovuntq_m_s16): Remove.
(__arm_vqmovunbq_m_s32): Remove.
(__arm_vqmovuntq_m_s32): Remove.
(__arm_vqmovuntq): Remove.
(__arm_vqmovunbq): Remove.
(__arm_vqmovunbq_m): Remove.
(__arm_vqmovuntq_m): Remove.
---
 gcc/config/arm/arm-mve-builtins-base.cc  |   6 +
 gcc/config/arm/arm-mve-builtins-base.def |   6 +
 gcc/config/arm/arm-mve-builtins-base.h   |   8 +-
 gcc/config/arm/arm-mve-builtins.cc   |   6 +
 gcc/config/arm/arm_mve.h | 788 ---
 5 files changed, 25 insertions(+), 789 deletions(-)

diff --git a/gcc/config/arm/arm-mve-builtins-base.cc b/gcc/config/arm/arm-mve-builtins-base.cc
index 4cf4464a48e..1dae12b445b 100644
--- a/gcc/config/arm/arm-mve-builtins-base.cc
+++ b/gcc/config/arm/arm-mve-builtins-base.cc
@@ -224,12 +224,18 @@ FUNCTION_WITH_M_N_NO_F (vhaddq, VHADDQ)
 FUNCTION_WITH_M_N_NO_F (vhsubq, VHSUBQ

[PATCH 04/10] arm: [MVE intrinsics] rework vrndq vrndaq vrndmq vrndnq vrndpq vrndxq

2023-05-05 Thread Christophe Lyon via Gcc-patches

Implement vrndq, vrndaq, vrndmq, vrndnq, vrndpq, vrndxq using the new
MVE builtins framework.

2022-09-08  Christophe Lyon  

gcc/
* config/arm/arm-mve-builtins-base.cc (FUNCTION_ONLY_F): New.
(vrndaq, vrndmq, vrndnq, vrndpq, vrndq, vrndxq): New.
* config/arm/arm-mve-builtins-base.def (vrndaq, vrndmq, vrndnq)
(vrndpq, vrndq, vrndxq): New.
* config/arm/arm-mve-builtins-base.h (vrndaq, vrndmq, vrndnq)
(vrndpq, vrndq, vrndxq): New.
* config/arm/arm_mve.h (vrndxq): Remove.
(vrndq): Remove.
(vrndpq): Remove.
(vrndnq): Remove.
(vrndmq): Remove.
(vrndaq): Remove.
(vrndaq_m): Remove.
(vrndmq_m): Remove.
(vrndnq_m): Remove.
(vrndpq_m): Remove.
(vrndq_m): Remove.
(vrndxq_m): Remove.
(vrndq_x): Remove.
(vrndnq_x): Remove.
(vrndmq_x): Remove.
(vrndpq_x): Remove.
(vrndaq_x): Remove.
(vrndxq_x): Remove.
(vrndxq_f16): Remove.
(vrndxq_f32): Remove.
(vrndq_f16): Remove.
(vrndq_f32): Remove.
(vrndpq_f16): Remove.
(vrndpq_f32): Remove.
(vrndnq_f16): Remove.
(vrndnq_f32): Remove.
(vrndmq_f16): Remove.
(vrndmq_f32): Remove.
(vrndaq_f16): Remove.
(vrndaq_f32): Remove.
(vrndaq_m_f16): Remove.
(vrndmq_m_f16): Remove.
(vrndnq_m_f16): Remove.
(vrndpq_m_f16): Remove.
(vrndq_m_f16): Remove.
(vrndxq_m_f16): Remove.
(vrndaq_m_f32): Remove.
(vrndmq_m_f32): Remove.
(vrndnq_m_f32): Remove.
(vrndpq_m_f32): Remove.
(vrndq_m_f32): Remove.
(vrndxq_m_f32): Remove.
(vrndq_x_f16): Remove.
(vrndq_x_f32): Remove.
(vrndnq_x_f16): Remove.
(vrndnq_x_f32): Remove.
(vrndmq_x_f16): Remove.
(vrndmq_x_f32): Remove.
(vrndpq_x_f16): Remove.
(vrndpq_x_f32): Remove.
(vrndaq_x_f16): Remove.
(vrndaq_x_f32): Remove.
(vrndxq_x_f16): Remove.
(vrndxq_x_f32): Remove.
(__arm_vrndxq_f16): Remove.
(__arm_vrndxq_f32): Remove.
(__arm_vrndq_f16): Remove.
(__arm_vrndq_f32): Remove.
(__arm_vrndpq_f16): Remove.
(__arm_vrndpq_f32): Remove.
(__arm_vrndnq_f16): Remove.
(__arm_vrndnq_f32): Remove.
(__arm_vrndmq_f16): Remove.
(__arm_vrndmq_f32): Remove.
(__arm_vrndaq_f16): Remove.
(__arm_vrndaq_f32): Remove.
(__arm_vrndaq_m_f16): Remove.
(__arm_vrndmq_m_f16): Remove.
(__arm_vrndnq_m_f16): Remove.
(__arm_vrndpq_m_f16): Remove.
(__arm_vrndq_m_f16): Remove.
(__arm_vrndxq_m_f16): Remove.
(__arm_vrndaq_m_f32): Remove.
(__arm_vrndmq_m_f32): Remove.
(__arm_vrndnq_m_f32): Remove.
(__arm_vrndpq_m_f32): Remove.
(__arm_vrndq_m_f32): Remove.
(__arm_vrndxq_m_f32): Remove.
(__arm_vrndq_x_f16): Remove.
(__arm_vrndq_x_f32): Remove.
(__arm_vrndnq_x_f16): Remove.
(__arm_vrndnq_x_f32): Remove.
(__arm_vrndmq_x_f16): Remove.
(__arm_vrndmq_x_f32): Remove.
(__arm_vrndpq_x_f16): Remove.
(__arm_vrndpq_x_f32): Remove.
(__arm_vrndaq_x_f16): Remove.
(__arm_vrndaq_x_f32): Remove.
(__arm_vrndxq_x_f16): Remove.
(__arm_vrndxq_x_f32): Remove.
(__arm_vrndxq): Remove.
(__arm_vrndq): Remove.
(__arm_vrndpq): Remove.
(__arm_vrndnq): Remove.
(__arm_vrndmq): Remove.
(__arm_vrndaq): Remove.
(__arm_vrndaq_m): Remove.
(__arm_vrndmq_m): Remove.
(__arm_vrndnq_m): Remove.
(__arm_vrndpq_m): Remove.
(__arm_vrndq_m): Remove.
(__arm_vrndxq_m): Remove.
(__arm_vrndq_x): Remove.
(__arm_vrndnq_x): Remove.
(__arm_vrndmq_x): Remove.
(__arm_vrndpq_x): Remove.
(__arm_vrndaq_x): Remove.
(__arm_vrndxq_x): Remove.
---
 gcc/config/arm/arm-mve-builtins-base.cc  |  15 +
 gcc/config/arm/arm-mve-builtins-base.def |   6 +
 gcc/config/arm/arm-mve-builtins-base.h   |   6 +
 gcc/config/arm/arm_mve.h | 655 ---
 4 files changed, 27 insertions(+), 655 deletions(-)

diff --git a/gcc/config/arm/arm-mve-builtins-base.cc b/gcc/config/arm/arm-mve-builtins-base.cc
index 627553f1784..4cf4464a48e 100644
--- a/gcc/config/arm/arm-mve-builtins-base.cc
+++ b/gcc/config/arm/arm-mve-builtins-base.cc
@@ -203,6 +203,15 @@ namespace arm_mve {
 UNSPEC##_M_S, -1, -1,  \
 -1, -1, -1))
 
+  /* Helper for builtins with only unspec codes, _m predicated
+ overrides, only floating-point.  */
+#define FUNCTION_ONLY_F(NAME, UNSPEC) FUNCTION \
+  (NAME, unspec_mve_function_exact_insn,   \
+   (-1, -1, UNSPEC##_F,

[PATCH 06/10] arm: [MVE intrinsics] factorize vmovnbq vmovntq vqmovnbq vqmovntq vqmovunbq vqmovuntq

2023-05-05 Thread Christophe Lyon via Gcc-patches

Factorize vmovnbq vmovntq vqmovnbq vqmovntq vqmovunbq vqmovuntq so
that they use the same pattern.
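A rough way to picture this factorization: the merged pattern no longer hard-codes a mnemonic. Instead, it looks up the `mve_insn` and `isu` attributes for whichever unspec it was instantiated with. The C sketch below imitates that lookup with a plain table whose values are copied from the iterators.md hunk of this patch. The struct, the function `movn_mnemonic`, and the assumption that the printed size is the element size of the wide source vector (16 or 32) are illustrative only and are not GCC code.

```c
#include <assert.h>
#include <stdio.h>
#include <string.h>

/* Hypothetical mirror of the <mve_insn> and <isu> values that
   iterators.md gives the unspecs in MVE_MOVN.  */
struct movn_attr
{
  const char *unspec;   /* unspec name */
  const char *insn;     /* <mve_insn> value */
  const char *isu;      /* <isu> value */
};

static const struct movn_attr movn_attrs[] = {
  { "VMOVNBQ_S", "vmovnb", "i" },   { "VMOVNBQ_U", "vmovnb", "i" },
  { "VMOVNTQ_S", "vmovnt", "i" },   { "VMOVNTQ_U", "vmovnt", "i" },
  { "VQMOVNBQ_S", "vqmovnb", "s" }, { "VQMOVNBQ_U", "vqmovnb", "u" },
  { "VQMOVNTQ_S", "vqmovnt", "s" }, { "VQMOVNTQ_U", "vqmovnt", "u" },
  { "VQMOVUNBQ_S", "vqmovunb", "s" },
  { "VQMOVUNTQ_S", "vqmovunt", "s" },
};

/* Write "<insn>.<isu><bits>" for UNSPEC into BUF, where BITS is the
   element size of the wide source vector.  Return 1 on success and 0
   if UNSPEC is not one of the merged unspecs.  */
static int
movn_mnemonic (char *buf, size_t len, const char *unspec, int bits)
{
  assert (bits == 16 || bits == 32);
  for (size_t i = 0; i < sizeof movn_attrs / sizeof movn_attrs[0]; i++)
    if (strcmp (movn_attrs[i].unspec, unspec) == 0)
      {
        snprintf (buf, len, "%s.%s%d", movn_attrs[i].insn,
                  movn_attrs[i].isu, bits);
        return 1;
      }
  return 0;
}
```

For example, `movn_mnemonic (buf, sizeof buf, "VQMOVUNBQ_S", 16)` produces `vqmovunb.s16`, and `"VMOVNTQ_U"` with 32 produces `vmovnt.i32`. One table-driven template thus replaces six hand-written ones.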

2022-09-08  Christophe Lyon  

gcc/
* config/arm/iterators.md (MVE_MOVN, MVE_MOVN_M): New.
(mve_insn): Add vmovnb, vmovnt, vqmovnb, vqmovnt, vqmovunb,
vqmovunt.
(isu): Likewise.
(supf): Add VQMOVUNBQ_M_S, VQMOVUNBQ_S, VQMOVUNTQ_M_S,
VQMOVUNTQ_S.
* config/arm/mve.md (mve_vmovnbq_)
(mve_vmovntq_, mve_vqmovnbq_)
(mve_vqmovntq_, mve_vqmovunbq_s)
(mve_vqmovuntq_s): Merge into ...
(@mve_q_): ... this.
(mve_vmovnbq_m_, mve_vmovntq_m_)
(mve_vqmovnbq_m_, mve_vqmovntq_m_)
(mve_vqmovunbq_m_s, mve_vqmovuntq_m_s): Merge into ...
(@mve_q_m_): ... this.
---
 gcc/config/arm/iterators.md |  46 +
 gcc/config/arm/mve.md   | 180 
 2 files changed, 64 insertions(+), 162 deletions(-)

diff --git a/gcc/config/arm/iterators.md b/gcc/config/arm/iterators.md
index 0b4f69ee874..20735284979 100644
--- a/gcc/config/arm/iterators.md
+++ b/gcc/config/arm/iterators.md
@@ -578,6 +578,24 @@ (define_int_iterator MVE_FP_CREATE_ONLY [
 VCREATEQ_F
 ])
 
+(define_int_iterator MVE_MOVN [
+VMOVNBQ_S VMOVNBQ_U
+VMOVNTQ_S VMOVNTQ_U
+VQMOVNBQ_S VQMOVNBQ_U
+VQMOVNTQ_S VQMOVNTQ_U
+VQMOVUNBQ_S
+VQMOVUNTQ_S
+])
+
+(define_int_iterator MVE_MOVN_M [
+VMOVNBQ_M_S VMOVNBQ_M_U
+VMOVNTQ_M_S VMOVNTQ_M_U
+VQMOVNBQ_M_S VQMOVNBQ_M_U
+VQMOVNTQ_M_S VQMOVNTQ_M_U
+VQMOVUNBQ_M_S
+VQMOVUNTQ_M_S
+])
+
 (define_code_attr mve_addsubmul [
 (minus "vsub")
 (mult "vmul")
@@ -613,6 +631,10 @@ (define_int_attr mve_insn [
 (VMINQ_M_S "vmin") (VMINQ_M_U "vmin")
 (VMLAQ_M_N_S "vmla") (VMLAQ_M_N_U "vmla")
 (VMLASQ_M_N_S "vmlas") (VMLASQ_M_N_U "vmlas")
+(VMOVNBQ_M_S "vmovnb") (VMOVNBQ_M_U "vmovnb")
+(VMOVNBQ_S "vmovnb") (VMOVNBQ_U "vmovnb")
+(VMOVNTQ_M_S "vmovnt") (VMOVNTQ_M_U "vmovnt")
+(VMOVNTQ_S "vmovnt") (VMOVNTQ_U "vmovnt")
 (VMULHQ_M_S "vmulh") (VMULHQ_M_U "vmulh")
 (VMULHQ_S "vmulh") (VMULHQ_U "vmulh")
 (VMULQ_M_N_S "vmul") (VMULQ_M_N_U "vmul") (VMULQ_M_N_F "vmul")
@@ -639,6 +661,14 @@ (define_int_attr mve_insn [
 (VQDMULHQ_M_S "vqdmulh")
 (VQDMULHQ_N_S "vqdmulh")
 (VQDMULHQ_S "vqdmulh")
+(VQMOVNBQ_M_S "vqmovnb") (VQMOVNBQ_M_U "vqmovnb")
+(VQMOVNBQ_S "vqmovnb") (VQMOVNBQ_U "vqmovnb")
+(VQMOVNTQ_M_S "vqmovnt") (VQMOVNTQ_M_U "vqmovnt")
+(VQMOVNTQ_S "vqmovnt") (VQMOVNTQ_U "vqmovnt")
+(VQMOVUNBQ_M_S "vqmovunb")
+(VQMOVUNBQ_S "vqmovunb")
+(VQMOVUNTQ_M_S "vqmovunt")
+(VQMOVUNTQ_S "vqmovunt")
 (VQNEGQ_M_S "vqneg")
 (VQNEGQ_S "vqneg")
 (VQRDMLADHQ_M_S "vqrdmladh")
@@ -723,8 +753,20 @@ (define_int_attr isu[
 (VCLSQ_M_S "s")
 (VCLZQ_M_S "i")
 (VCLZQ_M_U "i")
+(VMOVNBQ_M_S "i") (VMOVNBQ_M_U "i")
+(VMOVNBQ_S "i") (VMOVNBQ_U "i")
+(VMOVNTQ_M_S "i") (VMOVNTQ_M_U "i")
+(VMOVNTQ_S "i") (VMOVNTQ_U "i")
 (VNEGQ_M_S "s")
 (VQABSQ_M_S "s")
+(VQMOVNBQ_M_S "s") (VQMOVNBQ_M_U "u")
+(VQMOVNBQ_S "s") (VQMOVNBQ_U "u")
+(VQMOVNTQ_M_S "s") (VQMOVNTQ_M_U "u")
+(VQMOVNTQ_S "s") (VQMOVNTQ_U "u")
+(VQMOVUNBQ_M_S "s")
+(VQMOVUNBQ_S "s")
+(VQMOVUNTQ_M_S "s")
+(VQMOVUNTQ_S "s")
 (VQNEGQ_M_S "s")
 (VQRSHRNBQ_M_N_S "s") (VQRSHRNBQ_M_N_U "u")
 (VQRSHRNBQ_N_S "s") (VQRSHRNBQ_N_U "u")
@@ -1942,6 +1984,10 @@ (define_int_attr supf [(VCVTQ_TO_F_S "s") (VCVTQ_TO_F_U "u") (VREV16Q_S "s")
   (VCLSQ_S "s")
   (VQABSQ_S "s")
   (VQNEGQ_S "s")
+  (VQMOVUNBQ_M_S "s")
+  (VQMOVUNBQ_S "s")
+  (VQMOVUNTQ_M_S "s")
+  (VQMOVUNTQ_S "s")
   ])
 
 ;; Both kinds of return insn.
diff --git a/gcc/config/arm/mve.md b/gcc/config/arm/mve.md
index 7bf344d547a..2273078807b 100644
--- a/gcc/config/arm/mve.md
+++ b/gcc/config/arm/mve.md
@@ -1645,32 +1645,22 @@ (define_insn "mve_vmlsldavxq_s"
 ])
 
 ;;
-;; [vmovnbq_u, vmovnbq_s])
+;; [vmovnbq_u, vmovnbq_s]
+;; [vmovntq_s, vm

[PATCH 08/10] arm: [MVE intrinsics] add binary_widen_n shape

2023-05-05 Thread Christophe Lyon via Gcc-patches

This patch adds the binary_widen_n shape description.

2022-09-08  Christophe Lyon  

gcc/
* config/arm/arm-mve-builtins-shapes.cc (binary_widen_n): New.
* config/arm/arm-mve-builtins-shapes.h (binary_widen_n): New.
---
 gcc/config/arm/arm-mve-builtins-shapes.cc | 53 +++
 gcc/config/arm/arm-mve-builtins-shapes.h  |  1 +
 2 files changed, 54 insertions(+)

diff --git a/gcc/config/arm/arm-mve-builtins-shapes.cc b/gcc/config/arm/arm-mve-builtins-shapes.cc
index e26604510a2..1d43b8871bf 100644
--- a/gcc/config/arm/arm-mve-builtins-shapes.cc
+++ b/gcc/config/arm/arm-mve-builtins-shapes.cc
@@ -821,6 +821,59 @@ struct binary_rshift_narrow_unsigned_def : public overloaded_base<0>
 };
 SHAPE (binary_rshift_narrow_unsigned)
 
+/* _t vfoo[_n_t0](_t, const int)
+
+   Check that 'imm' is in the [1..#bits] range.
+
+   Example: vshllbq.
+   int16x8_t [__arm_]vshllbq[_n_s8](int8x16_t a, const int imm)
+   int16x8_t [__arm_]vshllbq_m[_n_s8](int16x8_t inactive, int8x16_t a, const int imm, mve_pred16_t p)
+   int16x8_t [__arm_]vshllbq_x[_n_s8](int8x16_t a, const int imm, mve_pred16_t p)  */
+struct binary_widen_n_def : public overloaded_base<0>
+{
+  void
+  build (function_builder &b, const function_group_info &group,
+bool preserve_user_namespace) const override
+  {
+b.add_overloaded_functions (group, MODE_n, preserve_user_namespace);
+build_all (b, "vw0,v0,s0", group, MODE_n, preserve_user_namespace);
+  }
+
+  tree
+  resolve (function_resolver &r) const override
+  {
+unsigned int i, nargs;
+type_suffix_index type;
+tree res;
+if (!r.check_gp_argument (2, i, nargs)
+   || (type = r.infer_vector_type (i - 1)) == NUM_TYPE_SUFFIXES
+   || !r.require_integer_immediate (i))
+  return error_mark_node;
+
+type_suffix_index wide_suffix
+  = find_type_suffix (type_suffixes[type].tclass,
+ type_suffixes[type].element_bits * 2);
+
+/* Check the inactive argument has the wide type.  */
+if (((r.pred == PRED_m) && (r.infer_vector_type (0) == wide_suffix))
+   || r.pred == PRED_none
+   || r.pred == PRED_x)
+  if ((res = r.lookup_form (r.mode_suffix_id, type)))
+   return res;
+
+return r.report_no_such_form (type);
+  }
+
+  bool
+  check (function_checker &c) const override
+  {
+unsigned int bits = c.type_suffix (0).element_bits;
+return c.require_immediate_range (1, 1, bits);
+  }
+
+};
+SHAPE (binary_widen_n)
+
 /* xN_t vfoo[_t0](uint64_t, uint64_t)
 
where there are N arguments in total.
diff --git a/gcc/config/arm/arm-mve-builtins-shapes.h b/gcc/config/arm/arm-mve-builtins-shapes.h
index 825e1bb2a3c..dd2597dc6f5 100644
--- a/gcc/config/arm/arm-mve-builtins-shapes.h
+++ b/gcc/config/arm/arm-mve-builtins-shapes.h
@@ -45,6 +45,7 @@ namespace arm_mve
 extern const function_shape *const binary_rshift;
 extern const function_shape *const binary_rshift_narrow;
 extern const function_shape *const binary_rshift_narrow_unsigned;
+extern const function_shape *const binary_widen_n;
 extern const function_shape *const create;
 extern const function_shape *const inherent;
 extern const function_shape *const unary;
-- 
2.34.1
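The binary_widen_n shape comment above only lists signatures. The following is a hedged scalar model in C. The helper `shll_n_s8` is hypothetical and covers only the s8 forms. In this model, vshllbq takes the even-numbered ("bottom") int8 lanes and vshlltq takes the odd-numbered ("top") ones. Each selected lane is widened to int16 and shifted left by imm. imm must lie in [1..8], which is the range that binary_widen_n_def::check enforces through require_immediate_range.

```c
#include <assert.h>
#include <stdint.h>

/* Scalar model of vshllbq[_n_s8] (TOP == 0) and vshlltq[_n_s8]
   (TOP == 1): take the even (bottom) or odd (top) int8 lanes of A,
   widen them to int16 and shift them left by IMM.  */
static void
shll_n_s8 (int16_t res[8], const int8_t a[16], int imm, int top)
{
  /* Same range that binary_widen_n_def::check enforces: [1..8].  */
  assert (imm >= 1 && imm <= 8);
  assert (top == 0 || top == 1);
  for (int i = 0; i < 8; i++)
    /* Multiply rather than << so negative lanes are well defined in C;
       the result always fits in 16 bits because IMM <= 8.  */
    res[i] = (int16_t) (a[2 * i + top] * (1 << imm));
}
```

With this model, a bottom lane holding -1 shifted by imm = 8 gives -256 (0xFF00): the sign is kept by the widening, and the low byte is filled with zeros.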

[PATCH 10/10] arm: [MVE intrinsics] rework vshllbq vshlltq

2023-05-05 Thread Christophe Lyon via Gcc-patches

Implement vshllbq and vshlltq using the new MVE builtins framework.

2022-09-08  Christophe Lyon  

gcc/
* config/arm/arm-mve-builtins-base.cc (vshllbq, vshlltq): New.
* config/arm/arm-mve-builtins-base.def (vshllbq, vshlltq): New.
* config/arm/arm-mve-builtins-base.h (vshllbq, vshlltq): New.
* config/arm/arm_mve.h (vshlltq): Remove.
(vshllbq): Remove.
(vshllbq_m): Remove.
(vshlltq_m): Remove.
(vshllbq_x): Remove.
(vshlltq_x): Remove.
(vshlltq_n_u8): Remove.
(vshllbq_n_u8): Remove.
(vshlltq_n_s8): Remove.
(vshllbq_n_s8): Remove.
(vshlltq_n_u16): Remove.
(vshllbq_n_u16): Remove.
(vshlltq_n_s16): Remove.
(vshllbq_n_s16): Remove.
(vshllbq_m_n_s8): Remove.
(vshllbq_m_n_s16): Remove.
(vshllbq_m_n_u8): Remove.
(vshllbq_m_n_u16): Remove.
(vshlltq_m_n_s8): Remove.
(vshlltq_m_n_s16): Remove.
(vshlltq_m_n_u8): Remove.
(vshlltq_m_n_u16): Remove.
(vshllbq_x_n_s8): Remove.
(vshllbq_x_n_s16): Remove.
(vshllbq_x_n_u8): Remove.
(vshllbq_x_n_u16): Remove.
(vshlltq_x_n_s8): Remove.
(vshlltq_x_n_s16): Remove.
(vshlltq_x_n_u8): Remove.
(vshlltq_x_n_u16): Remove.
(__arm_vshlltq_n_u8): Remove.
(__arm_vshllbq_n_u8): Remove.
(__arm_vshlltq_n_s8): Remove.
(__arm_vshllbq_n_s8): Remove.
(__arm_vshlltq_n_u16): Remove.
(__arm_vshllbq_n_u16): Remove.
(__arm_vshlltq_n_s16): Remove.
(__arm_vshllbq_n_s16): Remove.
(__arm_vshllbq_m_n_s8): Remove.
(__arm_vshllbq_m_n_s16): Remove.
(__arm_vshllbq_m_n_u8): Remove.
(__arm_vshllbq_m_n_u16): Remove.
(__arm_vshlltq_m_n_s8): Remove.
(__arm_vshlltq_m_n_s16): Remove.
(__arm_vshlltq_m_n_u8): Remove.
(__arm_vshlltq_m_n_u16): Remove.
(__arm_vshllbq_x_n_s8): Remove.
(__arm_vshllbq_x_n_s16): Remove.
(__arm_vshllbq_x_n_u8): Remove.
(__arm_vshllbq_x_n_u16): Remove.
(__arm_vshlltq_x_n_s8): Remove.
(__arm_vshlltq_x_n_s16): Remove.
(__arm_vshlltq_x_n_u8): Remove.
(__arm_vshlltq_x_n_u16): Remove.
(__arm_vshlltq): Remove.
(__arm_vshllbq): Remove.
(__arm_vshllbq_m): Remove.
(__arm_vshlltq_m): Remove.
(__arm_vshllbq_x): Remove.
(__arm_vshlltq_x): Remove.
---
 gcc/config/arm/arm-mve-builtins-base.cc  |   2 +
 gcc/config/arm/arm-mve-builtins-base.def |   2 +
 gcc/config/arm/arm-mve-builtins-base.h   |   2 +
 gcc/config/arm/arm_mve.h | 424 ---
 4 files changed, 6 insertions(+), 424 deletions(-)

diff --git a/gcc/config/arm/arm-mve-builtins-base.cc b/gcc/config/arm/arm-mve-builtins-base.cc
index 1dae12b445b..aafd85b293d 100644
--- a/gcc/config/arm/arm-mve-builtins-base.cc
+++ b/gcc/config/arm/arm-mve-builtins-base.cc
@@ -263,6 +263,8 @@ FUNCTION_WITH_M_N_NO_F (vrshlq, VRSHLQ)
 FUNCTION_ONLY_N_NO_F (vrshrnbq, VRSHRNBQ)
 FUNCTION_ONLY_N_NO_F (vrshrntq, VRSHRNTQ)
 FUNCTION_ONLY_N_NO_F (vrshrq, VRSHRQ)
+FUNCTION_ONLY_N_NO_F (vshllbq, VSHLLBQ)
+FUNCTION_ONLY_N_NO_F (vshlltq, VSHLLTQ)
 FUNCTION_WITH_M_N_R (vshlq, VSHLQ)
 FUNCTION_ONLY_N_NO_F (vshrnbq, VSHRNBQ)
 FUNCTION_ONLY_N_NO_F (vshrntq, VSHRNTQ)
diff --git a/gcc/config/arm/arm-mve-builtins-base.def b/gcc/config/arm/arm-mve-builtins-base.def
index f868614fb6b..78c7515b972 100644
--- a/gcc/config/arm/arm-mve-builtins-base.def
+++ b/gcc/config/arm/arm-mve-builtins-base.def
@@ -64,6 +64,8 @@ DEF_MVE_FUNCTION (vrshlq, binary_round_lshift, all_integer, mx_or_none)
 DEF_MVE_FUNCTION (vrshrnbq, binary_rshift_narrow, integer_16_32, m_or_none)
 DEF_MVE_FUNCTION (vrshrntq, binary_rshift_narrow, integer_16_32, m_or_none)
 DEF_MVE_FUNCTION (vrshrq, binary_rshift, all_integer, mx_or_none)
+DEF_MVE_FUNCTION (vshllbq, binary_widen_n, integer_8_16, mx_or_none)
+DEF_MVE_FUNCTION (vshlltq, binary_widen_n, integer_8_16, mx_or_none)
 DEF_MVE_FUNCTION (vshlq, binary_lshift, all_integer, mx_or_none)
 DEF_MVE_FUNCTION (vshlq, binary_lshift_r, all_integer, m_or_none) // "_r" forms do not support the "x" predicate
 DEF_MVE_FUNCTION (vshrnbq, binary_rshift_narrow, integer_16_32, m_or_none)
diff --git a/gcc/config/arm/arm-mve-builtins-base.h b/gcc/config/arm/arm-mve-builtins-base.h
index f4960cbbea2..e5a83466512 100644
--- a/gcc/config/arm/arm-mve-builtins-base.h
+++ b/gcc/config/arm/arm-mve-builtins-base.h
@@ -74,6 +74,8 @@ extern const function_base *const vrshlq;
 extern const function_base *const vrshrnbq;
 extern const function_base *const vrshrntq;
 extern const function_base *const vrshrq;
+extern const function_base *const vshllbq;
+extern const function_base *const vshlltq;
 extern const function_base *const vshlq;
 extern const function_base *const vshrnbq;
 extern const function_base *const vshrntq;
diff --git a/gcc/config/arm/arm_mve.h b/gcc/

[PATCH 03/10] arm: [MVE intrinsics] rework vabsq vnegq vclsq vclzq, vqabsq, vqnegq

2023-05-05 Thread Christophe Lyon via Gcc-patches

Implement vabsq, vnegq, vclsq, vclzq, vqabsq, vqnegq using the new MVE
builtins framework.
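To make the per-lane behaviour of the integer forms concrete, here is a hedged scalar C model on int8 lanes. The helper names are hypothetical and are not part of arm_mve.h. The saturating vqabsq and vqnegq map INT8_MIN to INT8_MAX, whereas the plain vabsq and vnegq wrap it back to INT8_MIN. vclsq counts how many bits below the sign bit are equal to it, and vclzq counts leading zero bits.

```c
#include <assert.h>
#include <stdint.h>

/* vqabsq lane: |x|, saturating INT8_MIN to INT8_MAX.  */
static int8_t
qabs_s8 (int8_t x)
{
  if (x == INT8_MIN)
    return INT8_MAX;
  return (int8_t) (x < 0 ? -x : x);
}

/* vqnegq lane: -x, saturating INT8_MIN to INT8_MAX.  */
static int8_t
qneg_s8 (int8_t x)
{
  return x == INT8_MIN ? INT8_MAX : (int8_t) -x;
}

/* vclsq lane: number of consecutive bits directly below the sign bit
   that are equal to it (0..7 for 8-bit lanes).  */
static int
cls_s8 (int8_t x)
{
  uint8_t u = (uint8_t) x;
  unsigned sign = u >> 7;
  int count = 0;
  for (int bit = 6; bit >= 0 && ((u >> bit) & 1u) == sign; bit--)
    count++;
  return count;
}

/* vclzq lane: number of leading zero bits (8 when x == 0).  */
static int
clz_u8 (uint8_t x)
{
  int count = 0;
  for (int bit = 7; bit >= 0 && ((x >> bit) & 1u) == 0; bit--)
    count++;
  return count;
}

/* A few lane values and what the model returns for them.  */
static void
unary_examples (void)
{
  assert (qabs_s8 (INT8_MIN) == INT8_MAX);
  assert (qneg_s8 (INT8_MIN) == INT8_MAX);
  assert (cls_s8 (1) == 6);    /* 0b00000001 */
  assert (cls_s8 (-1) == 7);   /* 0b11111111 */
  assert (clz_u8 (0) == 8);
}
```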

2022-09-08  Christophe Lyon  

gcc/
* config/arm/arm-mve-builtins-base.cc (FUNCTION_WITHOUT_N_NO_U_F): New.
(vabsq, vnegq, vclsq, vclzq, vqabsq, vqnegq): New.
* config/arm/arm-mve-builtins-base.def (vabsq, vnegq, vclsq)
(vclzq, vqabsq, vqnegq): New.
* config/arm/arm-mve-builtins-base.h (vabsq, vnegq, vclsq, vclzq)
(vqabsq, vqnegq): New.
* config/arm/arm_mve.h (vabsq): Remove.
(vabsq_m): Remove.
(vabsq_x): Remove.
(vabsq_f16): Remove.
(vabsq_f32): Remove.
(vabsq_s8): Remove.
(vabsq_s16): Remove.
(vabsq_s32): Remove.
(vabsq_m_s8): Remove.
(vabsq_m_s16): Remove.
(vabsq_m_s32): Remove.
(vabsq_m_f16): Remove.
(vabsq_m_f32): Remove.
(vabsq_x_s8): Remove.
(vabsq_x_s16): Remove.
(vabsq_x_s32): Remove.
(vabsq_x_f16): Remove.
(vabsq_x_f32): Remove.
(__arm_vabsq_s8): Remove.
(__arm_vabsq_s16): Remove.
(__arm_vabsq_s32): Remove.
(__arm_vabsq_m_s8): Remove.
(__arm_vabsq_m_s16): Remove.
(__arm_vabsq_m_s32): Remove.
(__arm_vabsq_x_s8): Remove.
(__arm_vabsq_x_s16): Remove.
(__arm_vabsq_x_s32): Remove.
(__arm_vabsq_f16): Remove.
(__arm_vabsq_f32): Remove.
(__arm_vabsq_m_f16): Remove.
(__arm_vabsq_m_f32): Remove.
(__arm_vabsq_x_f16): Remove.
(__arm_vabsq_x_f32): Remove.
(__arm_vabsq): Remove.
(__arm_vabsq_m): Remove.
(__arm_vabsq_x): Remove.
(vnegq): Remove.
(vnegq_m): Remove.
(vnegq_x): Remove.
(vnegq_f16): Remove.
(vnegq_f32): Remove.
(vnegq_s8): Remove.
(vnegq_s16): Remove.
(vnegq_s32): Remove.
(vnegq_m_s8): Remove.
(vnegq_m_s16): Remove.
(vnegq_m_s32): Remove.
(vnegq_m_f16): Remove.
(vnegq_m_f32): Remove.
(vnegq_x_s8): Remove.
(vnegq_x_s16): Remove.
(vnegq_x_s32): Remove.
(vnegq_x_f16): Remove.
(vnegq_x_f32): Remove.
(__arm_vnegq_s8): Remove.
(__arm_vnegq_s16): Remove.
(__arm_vnegq_s32): Remove.
(__arm_vnegq_m_s8): Remove.
(__arm_vnegq_m_s16): Remove.
(__arm_vnegq_m_s32): Remove.
(__arm_vnegq_x_s8): Remove.
(__arm_vnegq_x_s16): Remove.
(__arm_vnegq_x_s32): Remove.
(__arm_vnegq_f16): Remove.
(__arm_vnegq_f32): Remove.
(__arm_vnegq_m_f16): Remove.
(__arm_vnegq_m_f32): Remove.
(__arm_vnegq_x_f16): Remove.
(__arm_vnegq_x_f32): Remove.
(__arm_vnegq): Remove.
(__arm_vnegq_m): Remove.
(__arm_vnegq_x): Remove.
(vclsq): Remove.
(vclsq_m): Remove.
(vclsq_x): Remove.
(vclsq_s8): Remove.
(vclsq_s16): Remove.
(vclsq_s32): Remove.
(vclsq_m_s8): Remove.
(vclsq_m_s16): Remove.
(vclsq_m_s32): Remove.
(vclsq_x_s8): Remove.
(vclsq_x_s16): Remove.
(vclsq_x_s32): Remove.
(__arm_vclsq_s8): Remove.
(__arm_vclsq_s16): Remove.
(__arm_vclsq_s32): Remove.
(__arm_vclsq_m_s8): Remove.
(__arm_vclsq_m_s16): Remove.
(__arm_vclsq_m_s32): Remove.
(__arm_vclsq_x_s8): Remove.
(__arm_vclsq_x_s16): Remove.
(__arm_vclsq_x_s32): Remove.
(__arm_vclsq): Remove.
(__arm_vclsq_m): Remove.
(__arm_vclsq_x): Remove.
(vclzq): Remove.
(vclzq_m): Remove.
(vclzq_x): Remove.
(vclzq_s8): Remove.
(vclzq_s16): Remove.
(vclzq_s32): Remove.
(vclzq_u8): Remove.
(vclzq_u16): Remove.
(vclzq_u32): Remove.
(vclzq_m_u8): Remove.
(vclzq_m_s8): Remove.
(vclzq_m_u16): Remove.
(vclzq_m_s16): Remove.
(vclzq_m_u32): Remove.
(vclzq_m_s32): Remove.
(vclzq_x_s8): Remove.
(vclzq_x_s16): Remove.
(vclzq_x_s32): Remove.
(vclzq_x_u8): Remove.
(vclzq_x_u16): Remove.
(vclzq_x_u32): Remove.
(__arm_vclzq_s8): Remove.
(__arm_vclzq_s16): Remove.
(__arm_vclzq_s32): Remove.
(__arm_vclzq_u8): Remove.
(__arm_vclzq_u16): Remove.
(__arm_vclzq_u32): Remove.
(__arm_vclzq_m_u8): Remove.
(__arm_vclzq_m_s8): Remove.
(__arm_vclzq_m_u16): Remove.
(__arm_vclzq_m_s16): Remove.
(__arm_vclzq_m_u32): Remove.
(__arm_vclzq_m_s32): Remove.
(__arm_vclzq_x_s8): Remove.
(__arm_vclzq_x_s16): Remove.
(__arm_vclzq_x_s32): Remove.
(__arm_vclzq_x_u8): Remove.
(__arm_vclzq_x_u16): Remove.
(__arm_vclzq_x_u32): Remove.
(__arm_vclzq): Remove.
(__arm_vclzq_m): Remove.
(__arm_vclzq_x): Remove.
(vqabsq

[PATCH 09/10] arm: [MVE intrinsics] factorize vshllbq vshlltq

2023-05-05 Thread Christophe Lyon via Gcc-patches

Factorize vshllbq vshlltq so that they use the same pattern.

2022-09-08  Christophe Lyon  

gcc/
* config/arm/iterators.md (mve_insn): Add vshllb, vshllt.
(VSHLLBQ_N, VSHLLTQ_N): Remove.
(VSHLLxQ_N): New.
(VSHLLBQ_M_N, VSHLLTQ_M_N): Remove.
(VSHLLxQ_M_N): New.
* config/arm/mve.md (mve_vshllbq_n_)
(mve_vshlltq_n_): Merge into ...
(@mve_q_n_): ... this.
(mve_vshllbq_m_n_, mve_vshlltq_m_n_):
Merge into ...
(@mve_q_m_n_): ... this.
---
 gcc/config/arm/iterators.md | 10 +---
 gcc/config/arm/mve.md   | 50 -
 2 files changed, 16 insertions(+), 44 deletions(-)

diff --git a/gcc/config/arm/iterators.md b/gcc/config/arm/iterators.md
index 20735284979..e82ff0d5d9b 100644
--- a/gcc/config/arm/iterators.md
+++ b/gcc/config/arm/iterators.md
@@ -731,6 +731,10 @@ (define_int_attr mve_insn [
 (VRSHRNTQ_N_S "vrshrnt") (VRSHRNTQ_N_U "vrshrnt")
 (VRSHRQ_M_N_S "vrshr") (VRSHRQ_M_N_U "vrshr")
 (VRSHRQ_N_S "vrshr") (VRSHRQ_N_U "vrshr")
+(VSHLLBQ_M_N_S "vshllb") (VSHLLBQ_M_N_U "vshllb")
+(VSHLLBQ_N_S "vshllb") (VSHLLBQ_N_U "vshllb")
+(VSHLLTQ_M_N_S "vshllt") (VSHLLTQ_M_N_U "vshllt")
+(VSHLLTQ_N_S "vshllt") (VSHLLTQ_N_U "vshllt")
 (VSHLQ_M_N_S "vshl") (VSHLQ_M_N_U "vshl")
 (VSHLQ_M_R_S "vshl") (VSHLQ_M_R_U "vshl")
 (VSHLQ_M_S "vshl") (VSHLQ_M_U "vshl")
@@ -2133,8 +2137,7 @@ (define_int_iterator VMOVNTQ [VMOVNTQ_S VMOVNTQ_U])
 (define_int_iterator VORRQ_N [VORRQ_N_U VORRQ_N_S])
 (define_int_iterator VQMOVNBQ [VQMOVNBQ_U VQMOVNBQ_S])
 (define_int_iterator VQMOVNTQ [VQMOVNTQ_U VQMOVNTQ_S])
-(define_int_iterator VSHLLBQ_N [VSHLLBQ_N_S VSHLLBQ_N_U])
-(define_int_iterator VSHLLTQ_N [VSHLLTQ_N_U VSHLLTQ_N_S])
+(define_int_iterator VSHLLxQ_N [VSHLLBQ_N_S VSHLLBQ_N_U VSHLLTQ_N_S VSHLLTQ_N_U])
 (define_int_iterator VRMLALDAVHQ [VRMLALDAVHQ_U VRMLALDAVHQ_S])
 (define_int_iterator VBICQ_M_N [VBICQ_M_N_S VBICQ_M_N_U])
 (define_int_iterator VCVTAQ_M [VCVTAQ_M_S VCVTAQ_M_U])
@@ -2250,8 +2253,7 @@ (define_int_iterator VQSHRNBQ_M_N [VQSHRNBQ_M_N_U VQSHRNBQ_M_N_S])
 (define_int_iterator VQSHRNTQ_M_N [VQSHRNTQ_M_N_S VQSHRNTQ_M_N_U])
 (define_int_iterator VRSHRNBQ_M_N [VRSHRNBQ_M_N_U VRSHRNBQ_M_N_S])
 (define_int_iterator VRSHRNTQ_M_N [VRSHRNTQ_M_N_U VRSHRNTQ_M_N_S])
-(define_int_iterator VSHLLBQ_M_N [VSHLLBQ_M_N_U VSHLLBQ_M_N_S])
-(define_int_iterator VSHLLTQ_M_N [VSHLLTQ_M_N_U VSHLLTQ_M_N_S])
+(define_int_iterator VSHLLxQ_M_N [VSHLLBQ_M_N_U VSHLLBQ_M_N_S VSHLLTQ_M_N_U VSHLLTQ_M_N_S])
 (define_int_iterator VSHRNBQ_M_N [VSHRNBQ_M_N_S VSHRNBQ_M_N_U])
 (define_int_iterator VSHRNTQ_M_N [VSHRNTQ_M_N_S VSHRNTQ_M_N_U])
 (define_int_iterator VSTRWSBQ [VSTRWQSB_S VSTRWQSB_U])
diff --git a/gcc/config/arm/mve.md b/gcc/config/arm/mve.md
index 2273078807b..98728e6f3ef 100644
--- a/gcc/config/arm/mve.md
+++ b/gcc/config/arm/mve.md
@@ -1830,32 +1830,18 @@ (define_insn "mve_vrmlsldavhxq_sv4si"
 ])
 
 ;;
-;; [vshllbq_n_s, vshllbq_n_u])
+;; [vshllbq_n_s, vshllbq_n_u]
+;; [vshlltq_n_u, vshlltq_n_s]
 ;;
-(define_insn "mve_vshllbq_n_"
-  [
-   (set (match_operand: 0 "s_register_operand" "=w")
-   (unspec: [(match_operand:MVE_3 1 "s_register_operand" "w")
- (match_operand:SI 2 "immediate_operand" "i")]
-VSHLLBQ_N))
-  ]
-  "TARGET_HAVE_MVE"
-  "vshllb.%#\t%q0, %q1, %2"
-  [(set_attr "type" "mve_move")
-])
-
-;;
-;; [vshlltq_n_u, vshlltq_n_s])
-;;
-(define_insn "mve_vshlltq_n_"
+(define_insn "@mve_q_n_"
   [
(set (match_operand: 0 "s_register_operand" "=w")
(unspec: [(match_operand:MVE_3 1 "s_register_operand" "w")
  (match_operand:SI 2 "immediate_operand" "i")]
-VSHLLTQ_N))
+VSHLLxQ_N))
   ]
   "TARGET_HAVE_MVE"
-  "vshllt.%#\t%q0, %q1, %2"
+  ".%#\t%q0, %q1, %2"
   [(set_attr "type" "mve_move")
 ])
 
@@ -4410,36 +4396,20 @@ (define_insn "mve_vrmlaldavhaq_p_sv4si"
(set_attr "length""8")])
 
 ;;
-;; [vshllbq_m_n_u, vshllbq_m_n_s])
-;;
-(define_insn "mve_vshllbq_m_n_"
-  [
-   (set (match_operand: 0 "s_register_operand" "=w")
-   (unspec: [(match_operand: 1 "s_register_operand" "0")
-  (match_operand:MVE_3 2 "s_register_operand" "w")
-  (match_operand:SI 3 "immediate_operand" "i")
-  (match_operand: 4 "vpr_register_operand" "Up")]
-VSHLLBQ_M_N))
-  ]
-  "TARGET_HAVE_MVE"
-  "vpst\;vshllbt.%#\t%q0, %q2, %3"
-  [(set_attr "type" "mve_move")
-   (set_attr "length""8")])
-
-;;
-;; [vshlltq_m_n_u, vshlltq_m_n_s])
+;; [vshllbq_m_n_u, vshllbq_m_n_s]
+;; [vshlltq_m_n_u, vshlltq_m_n_s]
 ;;
-(define_insn "mve_vshlltq_m_n_"
+(define_insn "@mve_q_m_n_"
   [
(set (match_operand: 0 "s_register_operand" "=w")
(unspec: [(match_operand: 1 "s_register_operand" "0")
   (match_oper

[PATCH 05/10] arm: [MVE intrinsics] add binary_move_narrow and binary_move_narrow_unsigned shapes

2023-05-05 Thread Christophe Lyon via Gcc-patches

This patch adds the binary_move_narrow and binary_move_narrow_unsigned
shape descriptions.

2022-09-08  Christophe Lyon  

gcc/
* config/arm/arm-mve-builtins-shapes.cc (binary_move_narrow): New.
(binary_move_narrow_unsigned): New.
* config/arm/arm-mve-builtins-shapes.h (binary_move_narrow): New.
(binary_move_narrow_unsigned): New.
---
 gcc/config/arm/arm-mve-builtins-shapes.cc | 73 +++
 gcc/config/arm/arm-mve-builtins-shapes.h  |  2 +
 2 files changed, 75 insertions(+)

diff --git a/gcc/config/arm/arm-mve-builtins-shapes.cc b/gcc/config/arm/arm-mve-builtins-shapes.cc
index 7d39cf79aec..e26604510a2 100644
--- a/gcc/config/arm/arm-mve-builtins-shapes.cc
+++ b/gcc/config/arm/arm-mve-builtins-shapes.cc
@@ -401,6 +401,79 @@ struct binary_rshift_def : public overloaded_base<0>
 };
 SHAPE (binary_rshift)
 
+/* _t vfoo[_t0](_t, _t)
+
+   Example: vmovnbq.
+   int8x16_t [__arm_]vmovnbq[_s16](int8x16_t a, int16x8_t b)
+   int8x16_t [__arm_]vmovnbq_m[_s16](int8x16_t a, int16x8_t b, mve_pred16_t p)  */
+struct binary_move_narrow_def : public overloaded_base<0>
+{
+  void
+  build (function_builder &b, const function_group_info &group,
+bool preserve_user_namespace) const override
+  {
+b.add_overloaded_functions (group, MODE_none, preserve_user_namespace);
+build_all (b, "vh0,vh0,v0", group, MODE_none, preserve_user_namespace);
+  }
+
+  tree
+  resolve (function_resolver &r) const override
+  {
+unsigned int i, nargs;
+type_suffix_index type;
+if (!r.check_gp_argument (2, i, nargs)
+   || (type = r.infer_vector_type (1)) == NUM_TYPE_SUFFIXES)
+  return error_mark_node;
+
+type_suffix_index narrow_suffix
+  = find_type_suffix (type_suffixes[type].tclass,
+ type_suffixes[type].element_bits / 2);
+
+
+if (!r.require_matching_vector_type (0, narrow_suffix))
+  return error_mark_node;
+
+return r.resolve_to (r.mode_suffix_id, type);
+  }
+};
+SHAPE (binary_move_narrow)
+
+/* _t vfoo[_t0](_t, _t)
+
+   Example: vqmovunbq.
+   uint8x16_t [__arm_]vqmovunbq[_s16](uint8x16_t a, int16x8_t b)
+   uint8x16_t [__arm_]vqmovunbq_m[_s16](uint8x16_t a, int16x8_t b, mve_pred16_t p)  */
+struct binary_move_narrow_unsigned_def : public overloaded_base<0>
+{
+  void
+  build (function_builder &b, const function_group_info &group,
+bool preserve_user_namespace) const override
+  {
+b.add_overloaded_functions (group, MODE_none, preserve_user_namespace);
+build_all (b, "vhu0,vhu0,v0", group, MODE_none, preserve_user_namespace);
+  }
+
+  tree
+  resolve (function_resolver &r) const override
+  {
+unsigned int i, nargs;
+type_suffix_index type;
+if (!r.check_gp_argument (2, i, nargs)
+   || (type = r.infer_vector_type (1)) == NUM_TYPE_SUFFIXES)
+  return error_mark_node;
+
+type_suffix_index narrow_suffix
+  = find_type_suffix (TYPE_unsigned,
+ type_suffixes[type].element_bits / 2);
+
+if (!r.require_matching_vector_type (0, narrow_suffix))
+  return error_mark_node;
+
+return r.resolve_to (r.mode_suffix_id, type);
+  }
+};
+SHAPE (binary_move_narrow_unsigned)
+
 /* _t vfoo[_t0](_t, _t)
_t vfoo[_n_t0](_t, _t)
 
diff --git a/gcc/config/arm/arm-mve-builtins-shapes.h b/gcc/config/arm/arm-mve-builtins-shapes.h
index bd7e11b89f6..825e1bb2a3c 100644
--- a/gcc/config/arm/arm-mve-builtins-shapes.h
+++ b/gcc/config/arm/arm-mve-builtins-shapes.h
@@ -37,6 +37,8 @@ namespace arm_mve
 extern const function_shape *const binary;
 extern const function_shape *const binary_lshift;
 extern const function_shape *const binary_lshift_r;
+extern const function_shape *const binary_move_narrow;
+extern const function_shape *const binary_move_narrow_unsigned;
 extern const function_shape *const binary_opt_n;
 extern const function_shape *const binary_orrq;
 extern const function_shape *const binary_round_lshift;
-- 
2.34.1
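The binary_move_narrow and binary_move_narrow_unsigned shapes only describe types. Their signatures show that the result and the first argument share the narrow type. The following is a hedged scalar model in C, with hypothetical helper names. Lanes are handled as raw bytes, because truncation gives the same bit pattern for signed and unsigned data. In this model, vmovnbq and vmovntq keep the first operand. They overwrite its even ("bottom") or odd ("top") narrow lanes with the truncated elements of the second operand. vqmovunbq and vqmovuntq do the same, except that each signed 16-bit element is first saturated to the [0, 255] range of uint8_t.

```c
#include <assert.h>
#include <stdint.h>
#include <string.h>

/* vmovnbq (TOP == 0) / vmovntq (TOP == 1) on 16-bit elements: copy A,
   then write the low byte of each B[i] into narrow lane 2*i + TOP.  */
static void
movn_16 (uint8_t res[16], const uint8_t a[16], const uint16_t b[8], int top)
{
  assert (top == 0 || top == 1);
  memmove (res, a, 16);   /* the other half of each pair is unchanged */
  for (int i = 0; i < 8; i++)
    res[2 * i + top] = (uint8_t) b[i];
}

/* Signed-to-unsigned saturation used by vqmovunbq/vqmovuntq.  */
static uint8_t
sat_s16_to_u8 (int16_t v)
{
  if (v < 0)
    return 0;
  if (v > UINT8_MAX)
    return UINT8_MAX;
  return (uint8_t) v;
}

/* vqmovunbq (TOP == 0) / vqmovuntq (TOP == 1) on int16 elements.  */
static void
qmovun_s16 (uint8_t res[16], const uint8_t a[16], const int16_t b[8], int top)
{
  assert (top == 0 || top == 1);
  memmove (res, a, 16);
  for (int i = 0; i < 8; i++)
    res[2 * i + top] = sat_s16_to_u8 (b[i]);
}
```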

[PATCH 02/10] arm: [MVE intrinsics] factorize several unary operations

2023-05-05 Thread Christophe Lyon via Gcc-patches

Factorize vabs vcls vclz vneg vqabs vqneg vrnda vrndm vrndn vrndp vrnd
vrndx so that they use the same pattern.

This patch introduces the mve_mnemo iterator because some of the
involved intrinsics have a different name from their mnemonic: for
instance vrndq vs vrintz.
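Since the patch relies on mve_mnemo to map each rounding intrinsic to its vrint* mnemonic, here is a hedged scalar C model of what the six rounding variants compute on one float lane. Only the vrndq/vrintz pairing comes from the text above. The other pairings follow the usual Arm VRINT{A,M,N,P,X} naming. The enum and function names are hypothetical. vrintx is modelled as round to nearest, ties to even, and the Inexact exception it can signal is not modelled.

```c
#include <assert.h>
#include <stdint.h>

enum rnd_kind
{
  RND_ZERO,     /* vrndq  -> vrintz: toward zero */
  RND_AWAY,     /* vrndaq -> vrinta: nearest, ties away from zero */
  RND_MINUS,    /* vrndmq -> vrintm: toward -infinity */
  RND_EVEN,     /* vrndnq -> vrintn: nearest, ties to even */
  RND_PLUS,     /* vrndpq -> vrintp: toward +infinity */
  RND_CURRENT   /* vrndxq -> vrintx: modelled as ties to even */
};

/* Round one float lane.  Zeros, NaNs, infinities and values with
   |X| >= 2^23 (which are already integral) are returned unchanged.  */
static float
rnd_lane (float x, enum rnd_kind kind)
{
  if (x == 0.0f || !(x > -8388608.0f && x < 8388608.0f))
    return x;
  int neg = x < 0.0f;
  float mag = neg ? -x : x;
  float t = (float) (int32_t) mag;   /* truncated magnitude */
  float frac = mag - t;              /* exact: 0 <= frac < 1 */
  switch (kind)
    {
    case RND_ZERO:
      break;
    case RND_AWAY:
      if (frac >= 0.5f)
        t += 1.0f;
      break;
    case RND_EVEN:
    case RND_CURRENT:
      if (frac > 0.5f || (frac == 0.5f && ((int32_t) t & 1)))
        t += 1.0f;
      break;
    case RND_MINUS:   /* grows the magnitude only for negative lanes */
      if (neg && frac > 0.0f)
        t += 1.0f;
      break;
    case RND_PLUS:    /* grows the magnitude only for positive lanes */
      if (!neg && frac > 0.0f)
        t += 1.0f;
      break;
    }
  return neg ? -t : t;
}

/* The same lane value under each rounding kind.  */
static void
rnd_examples (void)
{
  assert (rnd_lane (-2.5f, RND_ZERO) == -2.0f);
  assert (rnd_lane (-2.5f, RND_AWAY) == -3.0f);
  assert (rnd_lane (-2.5f, RND_MINUS) == -3.0f);
  assert (rnd_lane (-2.5f, RND_EVEN) == -2.0f);
  assert (rnd_lane (-2.5f, RND_PLUS) == -2.0f);
}
```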

2022-09-08  Christophe Lyon  

gcc/
* config/arm/iterators.md (MVE_INT_M_UNARY, MVE_INT_UNARY)
(MVE_FP_UNARY, MVE_FP_M_UNARY): New.
(mve_insn): Add vabs, vcls, vclz, vneg, vqabs, vqneg, vrnda,
vrndm, vrndn, vrndp, vrnd, vrndx.
(isu): Add VABSQ_M_S, VCLSQ_M_S, VCLZQ_M_S, VCLZQ_M_U, VNEGQ_M_S,
VQABSQ_M_S, VQNEGQ_M_S.
(mve_mnemo): New.
* config/arm/mve.md (mve_vrndq_m_f, mve_vrndxq_f)
(mve_vrndq_f, mve_vrndpq_f, mve_vrndnq_f)
(mve_vrndmq_f, mve_vrndaq_f): Merge into ...
(@mve_q_f): ... this.
(mve_vnegq_f, mve_vabsq_f): Merge into ...
(mve_vq_f): ... this.
(mve_vnegq_s, mve_vabsq_s): Merge into ...
(mve_vq_s): ... this.
(mve_vclsq_s, mve_vqnegq_s, mve_vqabsq_s): Merge into 
...
(@mve_q_): ... this.
(mve_vabsq_m_s, mve_vclsq_m_s)
(mve_vclzq_m_, mve_vnegq_m_s)
(mve_vqabsq_m_s, mve_vqnegq_m_s): Merge into ...
(@mve_q_m_): ... this.
(mve_vabsq_m_f, mve_vnegq_m_f, mve_vrndaq_m_f)
(mve_vrndmq_m_f, mve_vrndnq_m_f, mve_vrndpq_m_f)
(mve_vrndxq_m_f): Merge into ...
(@mve_q_m_f): ... this.
---
 gcc/config/arm/iterators.md |  80 
 gcc/config/arm/mve.md   | 383 +---
 2 files changed, 126 insertions(+), 337 deletions(-)

diff --git a/gcc/config/arm/iterators.md b/gcc/config/arm/iterators.md
index 53873704174..0b4f69ee874 100644
--- a/gcc/config/arm/iterators.md
+++ b/gcc/config/arm/iterators.md
@@ -333,6 +333,42 @@ (define_code_iterator SSPLUSMINUS [ss_plus ss_minus])
 ;; Max/Min iterator, to factorize MVE patterns
 (define_code_iterator MAX_MIN_SU [smax umax smin umin])
 
+;; MVE integer unary operations.
+(define_int_iterator MVE_INT_M_UNARY [
+VABSQ_M_S
+VCLSQ_M_S
+VCLZQ_M_S VCLZQ_M_U
+VNEGQ_M_S
+VQABSQ_M_S
+VQNEGQ_M_S
+])
+
+(define_int_iterator MVE_INT_UNARY [
+VCLSQ_S
+VQABSQ_S
+VQNEGQ_S
+])
+
+(define_int_iterator MVE_FP_UNARY [
+VRNDQ_F
+VRNDAQ_F
+VRNDMQ_F
+VRNDNQ_F
+VRNDPQ_F
+VRNDXQ_F
+])
+
+(define_int_iterator MVE_FP_M_UNARY [
+VABSQ_M_F
+VNEGQ_M_F
+VRNDAQ_M_F
+VRNDMQ_M_F
+VRNDNQ_M_F
+VRNDPQ_M_F
+VRNDQ_M_F
+VRNDXQ_M_F
+])
+
 ;; MVE integer binary operations.
 (define_code_iterator MVE_INT_BINARY_RTX [plus minus mult])
 
@@ -551,6 +587,8 @@ (define_code_attr mve_addsubmul [
 (define_int_attr mve_insn [
 (VABDQ_M_S "vabd") (VABDQ_M_U "vabd") (VABDQ_M_F "vabd")
 (VABDQ_S "vabd") (VABDQ_U "vabd") (VABDQ_F "vabd")
+(VABSQ_M_F "vabs")
+(VABSQ_M_S "vabs")
 (VADDQ_M_N_S "vadd") (VADDQ_M_N_U "vadd") (VADDQ_M_N_F "vadd")
 (VADDQ_M_S "vadd") (VADDQ_M_U "vadd") (VADDQ_M_F "vadd")
 (VADDQ_N_S "vadd") (VADDQ_N_U "vadd") (VADDQ_N_F "vadd")
@@ -558,6 +596,9 @@ (define_int_attr mve_insn [
 (VBICQ_M_N_S "vbic") (VBICQ_M_N_U "vbic")
 (VBICQ_M_S "vbic") (VBICQ_M_U "vbic") (VBICQ_M_F "vbic")
 (VBICQ_N_S "vbic") (VBICQ_N_U "vbic")
+(VCLSQ_M_S "vcls")
+(VCLSQ_S "vcls")
+(VCLZQ_M_S "vclz") (VCLZQ_M_U "vclz")
 (VCREATEQ_S "vcreate") (VCREATEQ_U "vcreate") (VCREATEQ_F "vcreate")
 (VEORQ_M_S "veor") (VEORQ_M_U "veor") (VEORQ_M_F "veor")
 (VHADDQ_M_N_S "vhadd") (VHADDQ_M_N_U "vhadd")
@@ -577,9 +618,13 @@ (define_int_attr mve_insn [
 (VMULQ_M_N_S "vmul") (VMULQ_M_N_U "vmul") (VMULQ_M_N_F "vmul")
 (VMULQ_M_S "vmul") (VMULQ_M_U "vmul") (VMULQ_M_F "vmul")
 (VMULQ_N_S "vmul") (VMULQ_N_U "vmul") (VMULQ_N_F "vmul")
+(VNEGQ_M_F "vneg")
+(VNEGQ_M_S "vneg")
 (VORRQ_M_N_S "vorr") (VORRQ_M_N_U "vorr")
 (VORRQ_M_S "vorr") (VORRQ_M_U "vorr") (VORRQ_M_F "vorr")
 (VORRQ_N_S "vorr") (VORRQ_N_U "vorr")
+(VQABSQ_M_S "vqabs")
+(VQABSQ_S "vqabs")
 (VQADDQ_M_N_S "vqadd") (VQADDQ_M_N_U "vqadd")
 (VQADDQ_M_S "vqadd") (VQADDQ_M_U "vqadd")
 (VQADDQ_N_S "vqadd") (VQAD

[PATCH 01/10] arm: [MVE intrinsics] add unary shape

2023-05-05 Thread Christophe Lyon via Gcc-patches

This patch adds the unary shape description.

2022-09-08  Christophe Lyon  

gcc/
* config/arm/arm-mve-builtins-shapes.cc (unary): New.
* config/arm/arm-mve-builtins-shapes.h (unary): New.
---
 gcc/config/arm/arm-mve-builtins-shapes.cc | 27 +++
 gcc/config/arm/arm-mve-builtins-shapes.h  |  1 +
 2 files changed, 28 insertions(+)

diff --git a/gcc/config/arm/arm-mve-builtins-shapes.cc b/gcc/config/arm/arm-mve-builtins-shapes.cc
index 7078f7d7220..7d39cf79aec 100644
--- a/gcc/config/arm/arm-mve-builtins-shapes.cc
+++ b/gcc/config/arm/arm-mve-builtins-shapes.cc
@@ -786,6 +786,33 @@ struct inherent_def : public nonoverloaded_base
 };
 SHAPE (inherent)
 
+/* _t vfoo[_t0](_t)
+
+   i.e. the standard shape for unary operations that operate on
+   uniform types.
+
+   Example: vabsq.
+   int8x16_t [__arm_]vabsq[_s8](int8x16_t a)
+   int8x16_t [__arm_]vabsq_m[_s8](int8x16_t inactive, int8x16_t a, mve_pred16_t p)
+   int8x16_t [__arm_]vabsq_x[_s8](int8x16_t a, mve_pred16_t p)  */
+struct unary_def : public overloaded_base<0>
+{
+  void
+  build (function_builder &b, const function_group_info &group,
+         bool preserve_user_namespace) const override
+  {
+    b.add_overloaded_functions (group, MODE_none, preserve_user_namespace);
+    build_all (b, "v0,v0", group, MODE_none, preserve_user_namespace);
+  }
+
+  tree
+  resolve (function_resolver &r) const override
+  {
+    return r.resolve_unary ();
+  }
+};
+SHAPE (unary)
+
 /* _t foo_t0[_t1](_t)
 
where the target type  must be specified explicitly but the source
diff --git a/gcc/config/arm/arm-mve-builtins-shapes.h b/gcc/config/arm/arm-mve-builtins-shapes.h
index 09e00b69e63..bd7e11b89f6 100644
--- a/gcc/config/arm/arm-mve-builtins-shapes.h
+++ b/gcc/config/arm/arm-mve-builtins-shapes.h
@@ -45,6 +45,7 @@ namespace arm_mve
 extern const function_shape *const binary_rshift_narrow_unsigned;
 extern const function_shape *const create;
 extern const function_shape *const inherent;
+extern const function_shape *const unary;
 extern const function_shape *const unary_convert;
 
   } /* end namespace arm_mve::shapes */
-- 
2.34.1
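The C++ sketch below is a rough illustration of the resolve step for the unary shape. It maps the type of the vector argument to the explicit [_t0] suffix, so that an overloaded vabsq (a) with an int8x16_t argument becomes vabsq_s8. type_suffix and resolve_unary here are hypothetical stand-ins. The real function_resolver::resolve_unary also checks the inactive and predicate arguments, which this sketch ignores.

```cpp
#include <cassert>
#include <stdexcept>
#include <string>

// Hypothetical helper: map an MVE vector type name such as "int8x16_t"
// to the suffix used by the non-overloaded intrinsic ("s8").
std::string
type_suffix (const std::string &type_name)
{
  std::string prefix;
  std::string::size_type pos;
  if (type_name.rfind ("uint", 0) == 0)
    {
      prefix = "u";
      pos = 4;
    }
  else if (type_name.rfind ("int", 0) == 0)
    {
      prefix = "s";
      pos = 3;
    }
  else if (type_name.rfind ("float", 0) == 0)
    {
      prefix = "f";
      pos = 5;
    }
  else
    throw std::invalid_argument ("unsupported type: " + type_name);

  // The element width is the run of characters between the prefix and 'x'.
  std::string::size_type x = type_name.find ('x', pos);
  if (x == std::string::npos || x == pos)
    throw std::invalid_argument ("malformed type: " + type_name);
  return prefix + type_name.substr (pos, x - pos);
}

// Resolve an overloaded unary name (vabsq, vabsq_m, vabsq_x, ...) by
// appending the suffix derived from the vector argument's type.
std::string
resolve_unary (const std::string &base, const std::string &arg_type)
{
  return base + "_" + type_suffix (arg_type);
}
```

The "v0,v0" signature string in unary_def says that the return type and the single vector argument share type suffix 0. That shared suffix is why a single argument type is enough to pick the explicit form.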

Re: [PATCH v6 0/9] RISC-V: autovec: Add autovec support

2023-05-05 Thread Kito Cheng via Gcc-patches

Errr, why did you mix JuZhe’s patch set into this patch set?

On Fri, May 5, 2023 at 23:47, Michael Collison wrote:

> This series of patches adds foundational support for RISC-V
> auto-vectorization. These patches are based on the current upstream
> RVV vector intrinsic support and are not a new implementation. Most of the
> implementation consists of adding the new vector cost model, the
> autovectorization patterns themselves, and target hooks. This implementation
> only provides support for integer addition and subtraction as a proof of
> concept. This patch set should not be construed to be feature complete.
> Based on conversations with the community, these patches are intended to lay
> the groundwork for feature completion and collaboration within the RISC-V
> community.
>
> These patches are largely based on the work of Juzhe Zhong
> (juzhe.zh...@rivai.ai) of RiVAI. More specifically, the rvv-next branch at
> https://github.com/riscv-collab/riscv-gcc.git is the foundation of this
> patch set.
>
> As discussed on this list, if these patches are approved they will be
> merged into an "auto-vectorization" branch once gcc-13 branches for release.
> There are two known issues related to crashes (assert failures) associated
> with tree vectorization; I have sent a patch for one of them and have
> received feedback.
>
> Changes in v6:
> - Incorporated upstream comments, added target hook for
> TARGET_VECTORIZE_SUPPORT_VECTOR_MISALIGNMENT
>
> Changes in v5:
>
> - Incorporated upstream comments, largely to delete unnecessary code
>
> Changes in v4:
>
> - Added support for binary integer operations and test cases
> - Fixed bug to support 8-bit integer vectorization
> - Fixed several assert errors related to vector modes whose number of units
> is not a multiple of two
>
> Changes in v3:
>
> - Removed the cost model and cost hooks based on feedback from Richard
> Biener
> - Used RVV_VUNDEF macro to fix failing patterns
>
> Changes in v2:
>
> - Updated ChangeLog entry to include RiVAI contributions
> - Fixed ChangeLog email formatting
> - Fixed GNU formatting issues in the code
>
> Kevin Lee (1):
>   RISC-V:autovec: This patch supports 8 bit auto-vectorization in riscv.
>
> Michael Collison (8):
>   RISC-V: Add new predicates and function prototypes
>   RISC-V: autovec: Export policy functions to global scope
>   RISC-V:autovec: Add auto-vectorization support functions
>   RISC-V:autovec: Add target vectorization hooks
>   RISC-V:autovec: Add autovectorization patterns for binary integer &
> len_load/store
>   RISC-V:autovec: Add autovectorization tests for add & sub
>   vect: Verify that GET_MODE_NUNITS is a multiple of 2.
>   RISC-V:autovec: Add autovectorization tests for binary integer
>
>  gcc/config/riscv/riscv-opts.h |  10 ++
>  gcc/config/riscv/riscv-protos.h   |   9 ++
>  gcc/config/riscv/riscv-v.cc   |  91 
>  gcc/config/riscv/riscv-vector-builtins.cc |   4 +-
>  gcc/config/riscv/riscv-vector-builtins.h  |   3 +
>  gcc/config/riscv/riscv.cc | 130 ++
>  gcc/config/riscv/riscv.md |   1 +
>  gcc/config/riscv/vector-auto.md   |  74 ++
>  gcc/config/riscv/vector.md|   4 +-
>  .../riscv/rvv/autovec/loop-add-rv32.c |  25 
>  .../gcc.target/riscv/rvv/autovec/loop-add.c   |  25 
>  .../riscv/rvv/autovec/loop-and-rv32.c |  25 
>  .../gcc.target/riscv/rvv/autovec/loop-and.c   |  25 
>  .../riscv/rvv/autovec/loop-div-rv32.c |  27 
>  .../gcc.target/riscv/rvv/autovec/loop-div.c   |  27 
>  .../riscv/rvv/autovec/loop-max-rv32.c |  26 
>  .../gcc.target/riscv/rvv/autovec/loop-max.c   |  26 
>  .../riscv/rvv/autovec/loop-min-rv32.c |  26 
>  .../gcc.target/riscv/rvv/autovec/loop-min.c   |  26 
>  .../riscv/rvv/autovec/loop-mod-rv32.c |  27 
>  .../gcc.target/riscv/rvv/autovec/loop-mod.c   |  27 
>  .../riscv/rvv/autovec/loop-mul-rv32.c |  25 
>  .../gcc.target/riscv/rvv/autovec/loop-mul.c   |  25 
>  .../riscv/rvv/autovec/loop-or-rv32.c  |  25 
>  .../gcc.target/riscv/rvv/autovec/loop-or.c|  25 
>  .../riscv/rvv/autovec/loop-sub-rv32.c |  25 
>  .../gcc.target/riscv/rvv/autovec/loop-sub.c   |  25 
>  .../riscv/rvv/autovec/loop-xor-rv32.c |  25 
>  .../gcc.target/riscv/rvv/autovec/loop-xor.c   |  25 
>  gcc/testsuite/gcc.target/riscv/rvv/rvv.exp|   4 +
>  gcc/tree-vect-slp.cc  |   7 +-
>  31 files changed, 843 insertions(+), 6 deletions(-)
>  create mode 100644 gcc/config/riscv/vector-auto.md
>  create mode 100644
> gcc/testsuite/gcc.target/riscv/rvv/autovec/loop-add-rv32.c
>  create mode 100644 gcc/testsuite/gcc.target/riscv/rvv/autovec/loop-add.c
>  create mode 100644
> gcc/testsuite/gcc.target/riscv/rvv/autovec/loop-a

Re: [PATCH 4/5] RISC-V: Add Zcmp extension supports.

2023-05-05 Thread Sinan via Gcc-patches

> hi Jiawei
> 
> Please ignore my previous reply. I accidentally sent the email before I
> finished it.
> Sorry for that!
> 
> I downloaded your series of patches and found that in some cases
> it fails to generate Zcmp push and pop insns.
> 
> TC:
> 
> char my_getchar();
> int test_s0()
> {
>   int a = my_getchar();
>   int b = my_getchar();
>   return a+b;
> }
> 
> cc1 -fno-shrink-wrap-separate -O2 -march=rv32e_zca_zcmp -mabi=ilp32e -mcmodel=medlow test.c
> 
> -fno-shrink-wrap-separate is used here to avoid the impact of
> shrink-wrap-separate, which is enabled by default at -O2.
> 
> As I'm also interested in Zc*, I made some changes, mainly in the prologue and
> epilogue passes, quite similar to what has been done for save and restore,
> except for the CFI directives, because Zcmp pushes and pops ra and the s regs
> in the reverse order from save and restore.
> 
> I will refine and share the code soon for your review.
> 
> BR
> Fei
Hi Fei,
In the current implementation, cm.push never increases the original
stack-pointer adjustment. Because cm.push uses a minimum adjustment of 16 bytes
and the sp adjustment in your example is only 12, cm.push is not generated.
You can find the check in riscv_use_push_pop:
> > + */
> > + if (base_size > frame_size)
> > + return false;
> > +
If this check is removed, you get the output you expect:
```
 cm.push {ra,s0},-16
 call my_getchar
 mv s0,a0
 call my_getchar
 add a0,s0,a0
 cm.popret {ra,s0},16
```
As a result, cm.push cannot be generated in many rv32e scenarios. Perhaps we
can remove this check? I haven't tested whether removing it is safe, so I am
CCing Jiawei to help test it.
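A sketch of the size check, assuming the Zcmp rule that the base stack adjustment of cm.push is the register-list size rounded up to 16 bytes. That rule is my reading of the specification; it is not stated in this thread. zcmp_base_size and can_use_push_pop are hypothetical helpers modeled on the quoted base_size > frame_size test, not GCC functions.

```cpp
#include <cassert>

// Hypothetical helpers modeled on the quoted check in riscv_use_push_pop;
// not GCC code.  Assumes cm.push adjusts sp by at least the size of the
// saved register list, rounded up to a multiple of 16 bytes.
unsigned
zcmp_base_size (unsigned num_regs, unsigned xlen_bytes)
{
  unsigned bytes = num_regs * xlen_bytes;
  return (bytes + 15) / 16 * 16;
}

// Mirrors "if (base_size > frame_size) return false;": cm.push is only
// used when it does not enlarge the frame the prologue would allocate.
bool
can_use_push_pop (unsigned num_regs, unsigned xlen_bytes, unsigned frame_size)
{
  return zcmp_base_size (num_regs, xlen_bytes) <= frame_size;
}
```

With {ra, s0} on RV32 (two 4-byte registers), the base size is 16. A 12-byte frame therefore fails the check and a 16-byte frame passes it.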
BR,
Sinan
--
Sender:Fei Gao 
Sent At:2023 Apr. 25 (Tue.) 18:12
Recipient:jiawei 
Cc:gcc-patches 
Subject:[PATCH 4/5] RISC-V: Add Zcmp extension supports.
On Thu Apr 6 06:21:17 GMT 2023 Jiawei jia...@iscas.ac.cn wrote:
>
>Add support for the Zcmp extension instructions. Generate push/pop
>with the following steps:
>
> 1. preprocessing:
> 1.1. if there is no push rtx, then just return. e.g.
> (note 5 1 22 2 [bb 2] NOTE_INSN_BASIC_BLOCK)
> (insn/f 22 5 23 2 (set (reg/f:SI 2 sp)
> (plus:SI (reg/f:SI 2 sp)
> (const_int -32 [0xffe0])))
> (nil))
> (note 23 22 2 2 NOTE_INSN_PROLOGUE_END)
> 1.2. if push rtx exists, then we compute the number of
> pushed s-registers, n_sreg.
>
> The push rtx should be found before the NOTE_INSN_PROLOGUE_END note.
>
> [2 and 3 happen simultaneously]
>
> 2. find valid move pattern, mv sN, aN, where N < n_sreg,
> and aN is not used the move pattern, and sN is not
> defined before the move pattern (from prologue to the
> position of move pattern).
>
> 3. analyze the uses and reaching definitions of every instruction from
> the prologue to the position of the move pattern.
> If any sN is used, then we mark the corresponding argument list
> candidate as invalid.
> e.g.
> push {ra,s0-s3}, {}, -32
> sw s0,44(sp) # s0 is used, then argument list is invalid
> mv a0,a5 # a0 is defined, then argument list is invalid
> ...
> mv s0,a0
> mv s1,a1
> mv s2,a2
>
> 4. if there is a valid argument list, then replace the pop
> push parallel insn, and delete mv pattern.
> if not, skip.
>
>Throughout, "zcmpe" means Zcmp with the RVE extension.
>The push/pop instruction implementation was mostly done by Sinan Lin.
>
>Co-Authored by: Sinan Lin 
>Co-Authored by: Simon Cook 
>Co-Authored by: Shihua Liao 
>
>gcc/ChangeLog:
>
> * config.gcc: New object.
> * config/riscv/predicates.md (riscv_stack_push_operation):
> New predicate.
> (riscv_stack_pop_operation): Ditto.
> (pop_return_value_constant): Ditto.
> * config/riscv/riscv-passes.def (INSERT_PASS_AFTER): New pass.
> * config/riscv/riscv-protos.h (riscv_output_popret_p):
> New routine.
> (riscv_valid_stack_push_pop_p): Ditto.
> (riscv_check_regno): Ditto.
> (make_pass_zcmp_popret): New pass.
> * config/riscv/riscv.cc (struct riscv_frame_info): New variable.
> (riscv_output_popret_p): New function.
> (riscv_print_pop_size): Ditto.
> (riscv_print_reglist): Ditto.
> (riscv_print_operand): New case symbols.
> (riscv_save_push_pop_count): New function.
>
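Steps 2 and 3 of Jiawei's description can be modeled with a simple forward scan. This C++ sketch is a hypothetical simplification, not the pass in the patch. Instructions are reduced to one destination register plus a list of source registers, and each mv sN, aN is checked independently. Based on the sw s0 and mv a0,a5 examples, it also assumes that a candidate is rejected if sN was used or defined earlier, or if aN was redefined earlier.

```cpp
#include <algorithm>
#include <cassert>
#include <cctype>
#include <set>
#include <string>
#include <vector>

// Hypothetical, simplified instruction: one written register (or "")
// and the registers it reads.  Memory and other operands are ignored.
struct Insn
{
  std::string dest;
  std::vector<std::string> srcs;
};

// Return N if INSN is "mv sN, aN" with the same N, otherwise -1.
int
mv_s_from_a (const Insn &insn)
{
  if (insn.srcs.size () != 1 || insn.dest.size () < 2 || insn.dest[0] != 's')
    return -1;
  std::string n = insn.dest.substr (1);
  if (!std::all_of (n.begin (), n.end (),
                    [] (unsigned char c) { return std::isdigit (c) != 0; }))
    return -1;
  if (insn.srcs[0] != "a" + n)
    return -1;
  return std::stoi (n);
}

// Return the N of every "mv sN, aN" (N < N_SREG) that could be folded
// into the push, scanning BODY from the end of the prologue onwards.
std::vector<int>
foldable_moves (const std::vector<Insn> &body, int n_sreg)
{
  std::set<std::string> used, defined;
  std::vector<int> result;
  for (const Insn &insn : body)
    {
      int n = mv_s_from_a (insn);
      if (n >= 0 && n < n_sreg)
        {
          std::string s = "s" + std::to_string (n);
          std::string a = "a" + std::to_string (n);
          // Assumed rule: sN untouched so far and aN not redefined.
          if (!used.count (s) && !defined.count (s) && !defined.count (a))
            result.push_back (n);
        }
      // Record this instruction's effects for later candidates.
      for (const std::string &r : insn.srcs)
        used.insert (r);
      if (!insn.dest.empty ())
        defined.insert (insn.dest);
    }
  return result;
}
```

On the example sequence from step 3, the scan rejects s0, which was read by the sw and whose partner a0 was redefined by mv a0,a5. It accepts s1 and s2.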

[PATCH v6 6/9] RISC-V:autovec: Add autovectorization tests for add & sub

2023-05-05 Thread Michael Collison

2023-03-02  Michael Collison  
Vineet Gupta 

* gcc.target/riscv/rvv/autovec: New directory
for autovectorization tests.
* gcc.target/riscv/rvv/autovec/loop-add-rv32.c: New
test to verify code generation of vector add on rv32.
* gcc.target/riscv/rvv/autovec/loop-add.c: New
test to verify code generation of vector add on rv64.
* gcc.target/riscv/rvv/autovec/loop-sub-rv32.c: New
test to verify code generation of vector subtract on rv32.
* gcc.target/riscv/rvv/autovec/loop-sub.c: New
test to verify code generation of vector subtract on rv64.
---
 .../riscv/rvv/autovec/loop-add-rv32.c | 24 +++
 .../gcc.target/riscv/rvv/autovec/loop-add.c   | 24 +++
 .../riscv/rvv/autovec/loop-sub-rv32.c | 24 +++
 .../gcc.target/riscv/rvv/autovec/loop-sub.c   | 24 +++
 4 files changed, 96 insertions(+)
 create mode 100644 gcc/testsuite/gcc.target/riscv/rvv/autovec/loop-add-rv32.c
 create mode 100644 gcc/testsuite/gcc.target/riscv/rvv/autovec/loop-add.c
 create mode 100644 gcc/testsuite/gcc.target/riscv/rvv/autovec/loop-sub-rv32.c
 create mode 100644 gcc/testsuite/gcc.target/riscv/rvv/autovec/loop-sub.c
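Each scan-assembler-times count equals the number of TEST_TYPE instantiations, on the test's assumption that each loop vectorizes to exactly one vadd.vv (or vsub.vv). The C++ rendering below uses the same macro structure. It can be compiled and run to check the scalar semantics the vector code has to preserve, such as unsigned wraparound. It is an illustration only, not part of the testsuite. It includes stdint.h so that the typedefs are usable as plain identifiers inside the token paste.

```cpp
#include <cassert>
#include <stdint.h>

// Same structure as TEST_TYPE / TEST_ALL in loop-add.c, compiled as C++
// only to exercise the scalar semantics at run time.
#define TEST_TYPE(TYPE)                                        \
  void vadd_##TYPE (TYPE *dst, TYPE *a, TYPE *b, int n)        \
  {                                                            \
    for (int i = 0; i < n; i++)                                \
      dst[i] = a[i] + b[i];                                    \
  }

#define TEST_ALL()      \
  TEST_TYPE (int16_t)   \
  TEST_TYPE (uint16_t)  \
  TEST_TYPE (int32_t)   \
  TEST_TYPE (uint32_t)  \
  TEST_TYPE (int64_t)   \
  TEST_TYPE (uint64_t)

// Six instantiations: vadd_int16_t ... vadd_uint64_t.
TEST_ALL ()
```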

diff --git a/gcc/testsuite/gcc.target/riscv/rvv/autovec/loop-add-rv32.c b/gcc/testsuite/gcc.target/riscv/rvv/autovec/loop-add-rv32.c
new file mode 100644
index 000..bdc3b6892e9
--- /dev/null
+++ b/gcc/testsuite/gcc.target/riscv/rvv/autovec/loop-add-rv32.c
@@ -0,0 +1,24 @@
+/* { dg-do compile } */
+/* { dg-additional-options "-O2 -ftree-vectorize -march=rv32gcv -mabi=ilp32d" } */
+
+#include 
+
+#define TEST_TYPE(TYPE)\
+  void vadd_##TYPE (TYPE *dst, TYPE *a, TYPE *b, int n)\
+  {\
+    for (int i = 0; i < n; i++)\
+      dst[i] = a[i] + b[i];\
+  }
+
+/* *int8_t not autovec currently. */
+#define TEST_ALL() \
+ TEST_TYPE(int16_t)\
+ TEST_TYPE(uint16_t)   \
+ TEST_TYPE(int32_t)\
+ TEST_TYPE(uint32_t)   \
+ TEST_TYPE(int64_t)\
+ TEST_TYPE(uint64_t)
+
+TEST_ALL()
+
+/* { dg-final { scan-assembler-times {\tvadd\.vv} 6 } } */
diff --git a/gcc/testsuite/gcc.target/riscv/rvv/autovec/loop-add.c b/gcc/testsuite/gcc.target/riscv/rvv/autovec/loop-add.c
new file mode 100644
index 000..d7f992c7d27
--- /dev/null
+++ b/gcc/testsuite/gcc.target/riscv/rvv/autovec/loop-add.c
@@ -0,0 +1,24 @@
+/* { dg-do compile } */
+/* { dg-additional-options "-O2 -ftree-vectorize -march=rv64gcv -mabi=lp64d" } */
+
+#include 
+
+#define TEST_TYPE(TYPE)\
+  void vadd_##TYPE (TYPE *dst, TYPE *a, TYPE *b, int n)\
+  {\
+    for (int i = 0; i < n; i++)\
+      dst[i] = a[i] + b[i];\
+  }
+
+/* *int8_t not autovec currently. */
+#define TEST_ALL() \
+ TEST_TYPE(int16_t)\
+ TEST_TYPE(uint16_t)   \
+ TEST_TYPE(int32_t)\
+ TEST_TYPE(uint32_t)   \
+ TEST_TYPE(int64_t)\
+ TEST_TYPE(uint64_t)
+
+TEST_ALL()
+
+/* { dg-final { scan-assembler-times {\tvadd\.vv} 6 } } */
diff --git a/gcc/testsuite/gcc.target/riscv/rvv/autovec/loop-sub-rv32.c b/gcc/testsuite/gcc.target/riscv/rvv/autovec/loop-sub-rv32.c
new file mode 100644
index 000..7d0a40ec539
--- /dev/null
+++ b/gcc/testsuite/gcc.target/riscv/rvv/autovec/loop-sub-rv32.c
@@ -0,0 +1,24 @@
+/* { dg-do compile } */
+/* { dg-additional-options "-O2 -ftree-vectorize -march=rv32gcv -mabi=ilp32d" } */
+
+#include 
+
+#define TEST_TYPE(TYPE)\
+  void vadd_##TYPE (TYPE *dst, TYPE *a, TYPE *b, int n)\
+  {\
+    for (int i = 0; i < n; i++)\
+      dst[i] = a[i] - b[i];\
+  }
+
+/* *int8_t not autovec currently. */
+#define TEST_ALL() \
+ TEST_TYPE(int16_t)\
+ TEST_TYPE(uint16_t)   \
+ TEST_TYPE(int32_t)\
+ TEST_TYPE(uint32_t)   \
+ TEST_TYPE(int64_t)\
+ TEST_TYPE(uint64_t)
+
+TEST_ALL()
+
+/* { dg-final { scan-assembler-times {\tvsub\.vv} 6 } } */
diff --git a/gcc/testsuite/gcc.target/riscv/rvv/autovec/loop-sub.c b/gcc/testsuite/gcc.target/riscv/rvv/autovec/loop-sub.c
new file mode 100644
index 000..c8900884f83
--- /dev/null
+++ b/gcc/testsuite/gcc.target/riscv/rvv/autovec/loop-sub.c
@@ -0,0 +1,24 @@
+/* { dg-do compile } */
+/* { dg-additional-options "-O2 -ftree-vectorize -march=rv64gcv -mabi=lp64d" } */
+
+#include 
+
+#define TEST_TYPE(TYPE)\
+  void vadd_##TYPE (TYPE *dst, TYPE *a, TYPE *b, int n)\
+  {\
+    for (int i = 0; i < n; i++)\
+      dst[i] = a[i] - b[i];\
+  }
+
+/*

[PATCH v6 9/9] RISC-V:autovec: This patch supports 8 bit auto-vectorization in riscv.

2023-05-05 Thread Michael Collison

From: Kevin Lee 

2023-04-14 Kevin Lee 
gcc/testsuite/ChangeLog:

* gcc.target/riscv/rvv/autovec/loop-add-rv32.c: Support 8-bit
types.
* gcc.target/riscv/rvv/autovec/loop-add.c: Ditto
* gcc.target/riscv/rvv/autovec/loop-and-rv32.c: Ditto
* gcc.target/riscv/rvv/autovec/loop-and.c: Ditto
* gcc.target/riscv/rvv/autovec/loop-div-rv32.c: Ditto
* gcc.target/riscv/rvv/autovec/loop-div.c: Ditto
* gcc.target/riscv/rvv/autovec/loop-max-rv32.c: Ditto
* gcc.target/riscv/rvv/autovec/loop-max.c: Ditto
* gcc.target/riscv/rvv/autovec/loop-min-rv32.c: Ditto
* gcc.target/riscv/rvv/autovec/loop-min.c: Ditto
* gcc.target/riscv/rvv/autovec/loop-mod-rv32.c: Ditto
* gcc.target/riscv/rvv/autovec/loop-mod.c: Ditto
* gcc.target/riscv/rvv/autovec/loop-mul-rv32.c: Ditto
* gcc.target/riscv/rvv/autovec/loop-mul.c: Ditto
* gcc.target/riscv/rvv/autovec/loop-or-rv32.c: Ditto
* gcc.target/riscv/rvv/autovec/loop-or.c: Ditto
* gcc.target/riscv/rvv/autovec/loop-sub-rv32.c: Ditto
* gcc.target/riscv/rvv/autovec/loop-sub.c: Ditto
* gcc.target/riscv/rvv/autovec/loop-xor-rv32.c: Ditto
* gcc.target/riscv/rvv/autovec/loop-xor.c: Ditto
---
 .../gcc.target/riscv/rvv/autovec/loop-add-rv32.c   |  7 ---
 gcc/testsuite/gcc.target/riscv/rvv/autovec/loop-add.c  |  7 ---
 .../gcc.target/riscv/rvv/autovec/loop-and-rv32.c   |  7 ---
 gcc/testsuite/gcc.target/riscv/rvv/autovec/loop-and.c  |  7 ---
 .../gcc.target/riscv/rvv/autovec/loop-div-rv32.c   | 10 ++
 gcc/testsuite/gcc.target/riscv/rvv/autovec/loop-div.c  | 10 ++
 .../gcc.target/riscv/rvv/autovec/loop-max-rv32.c   |  9 +
 gcc/testsuite/gcc.target/riscv/rvv/autovec/loop-max.c  |  9 +
 .../gcc.target/riscv/rvv/autovec/loop-min-rv32.c   |  9 +
 gcc/testsuite/gcc.target/riscv/rvv/autovec/loop-min.c  |  9 +
 .../gcc.target/riscv/rvv/autovec/loop-mod-rv32.c   | 10 ++
 gcc/testsuite/gcc.target/riscv/rvv/autovec/loop-mod.c  | 10 ++
 .../gcc.target/riscv/rvv/autovec/loop-mul-rv32.c   |  7 ---
 gcc/testsuite/gcc.target/riscv/rvv/autovec/loop-mul.c  |  7 ---
 .../gcc.target/riscv/rvv/autovec/loop-or-rv32.c|  7 ---
 gcc/testsuite/gcc.target/riscv/rvv/autovec/loop-or.c   |  7 ---
 .../gcc.target/riscv/rvv/autovec/loop-sub-rv32.c   |  7 ---
 gcc/testsuite/gcc.target/riscv/rvv/autovec/loop-sub.c  |  7 ---
 .../gcc.target/riscv/rvv/autovec/loop-xor-rv32.c   |  7 ---
 gcc/testsuite/gcc.target/riscv/rvv/autovec/loop-xor.c  |  7 ---
 20 files changed, 92 insertions(+), 68 deletions(-)

diff --git a/gcc/testsuite/gcc.target/riscv/rvv/autovec/loop-add-rv32.c b/gcc/testsuite/gcc.target/riscv/rvv/autovec/loop-add-rv32.c
index bdc3b6892e9..d2765e67d0d 100644
--- a/gcc/testsuite/gcc.target/riscv/rvv/autovec/loop-add-rv32.c
+++ b/gcc/testsuite/gcc.target/riscv/rvv/autovec/loop-add-rv32.c
@@ -1,5 +1,5 @@
 /* { dg-do compile } */
-/* { dg-additional-options "-O2 -ftree-vectorize -march=rv32gcv -mabi=ilp32d" } */
+/* { dg-additional-options "-O2 -ftree-vectorize -march=rv32gcv -mabi=ilp32d --param=riscv-autovec-preference=fixed-vlmax -mno-strict-align" } */
 
 #include 
 
@@ -10,8 +10,9 @@
   dst[i] = a[i] + b[i];\
   }
 
-/* *int8_t not autovec currently. */
 #define TEST_ALL() \
+ TEST_TYPE(int8_t) \
+ TEST_TYPE(uint8_t)\
  TEST_TYPE(int16_t)\
  TEST_TYPE(uint16_t)   \
  TEST_TYPE(int32_t)\
@@ -21,4 +22,4 @@
 
 TEST_ALL()
 
-/* { dg-final { scan-assembler-times {\tvadd\.vv} 6 } } */
+/* { dg-final { scan-assembler-times {\tvadd\.vv} 8 } } */
diff --git a/gcc/testsuite/gcc.target/riscv/rvv/autovec/loop-add.c b/gcc/testsuite/gcc.target/riscv/rvv/autovec/loop-add.c
index d7f992c7d27..c43f6d3e8cb 100644
--- a/gcc/testsuite/gcc.target/riscv/rvv/autovec/loop-add.c
+++ b/gcc/testsuite/gcc.target/riscv/rvv/autovec/loop-add.c
@@ -1,5 +1,5 @@
 /* { dg-do compile } */
-/* { dg-additional-options "-O2 -ftree-vectorize -march=rv64gcv -mabi=lp64d" } */
+/* { dg-additional-options "-O2 -ftree-vectorize -march=rv64gcv -mabi=lp64d --param=riscv-autovec-preference=fixed-vlmax -mno-strict-align" } */
 
 #include 
 
@@ -10,8 +10,9 @@
   dst[i] = a[i] + b[i];\
   }
 
-/* *int8_t not autovec currently. */
 #define TEST_ALL() \
+ TEST_TYPE(int8_t) \
+ TEST_TYPE(uint8_t)\
  TEST_TYPE(int16_t)\
  TEST_TYPE(uint16_t)   \
  TEST_TYPE(int32_t)\
@@ -21,4 +22,4 @@
 
 TEST_ALL()
 
-/* { dg-final { scan-assembler-times {\tvadd\.vv} 6 } } */
+/* { dg-final { scan-assembler-times {\tvadd\.vv} 8 } } */
diff --git a/gcc/testsuite/gcc.target/riscv/rvv/autovec/loop-and-rv32.c b/gcc/testsuite/gcc.target/riscv/rvv/autovec/loop-and-rv32.c
index eb1ac5b44fd..703f4843c2b 100644
--- a/gcc/testsuite/gcc.target/riscv/rvv/autovec/loop-and-rv32.c
+++ b/gcc/testsu

[PATCH v6 7/9] RISC-V: autovec: Verify that GET_MODE_NUNITS is a multiple of 2.

2023-05-05 Thread Michael Collison

While working on autovectorization for the RISC-V port, I encountered an issue
where can_duplicate_and_interleave_p assumes that GET_MODE_NUNITS is evenly
divisible by two. The RISC-V target has vector modes (e.g. VNx1DImode) where
GET_MODE_NUNITS is equal to one.

Tested on RISC-V and x86_64-linux-gnu. Okay?
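A simplified model of a two-coefficient poly_int (c0 + c1 * x) shows why exact_div is the wrong tool here. poly2 and multiple_p below are stand-ins written for illustration. They mirror the documented behavior of GCC's multiple_p (a, b, &quotient), which succeeds only when every coefficient is divisible by b. A unit count of 1, or 1 + x, is therefore rejected, whereas exact_div simply assumes that the division is exact.

```cpp
#include <cassert>
#include <cstdint>

// Simplified stand-in for a two-coefficient poly_int64: c0 + c1 * x,
// where x is a runtime quantity (illustration only, not GCC's class).
struct poly2
{
  std::int64_t c0;
  std::int64_t c1;
};

// Modeled on GCC's multiple_p (a, b, &quotient): succeed only when A is a
// multiple of B for every x, i.e. when each coefficient is divisible by B.
bool
multiple_p (poly2 a, std::int64_t b, poly2 *quotient)
{
  if (a.c0 % b != 0 || a.c1 % b != 0)
    return false;
  *quotient = {a.c0 / b, a.c1 / b};
  return true;
}
```

Folding the check into the if condition, as the patch does, also yields half_nelts as a by-product, so the later exact_div call is no longer needed.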

2023-03-09  Michael Collison  

* tree-vect-slp.cc (can_duplicate_and_interleave_p):
Check that GET_MODE_NUNITS is a multiple of 2.
---
 gcc/tree-vect-slp.cc | 7 +--
 1 file changed, 5 insertions(+), 2 deletions(-)

diff --git a/gcc/tree-vect-slp.cc b/gcc/tree-vect-slp.cc
index b299e209b5b..3b7a21724ec 100644
--- a/gcc/tree-vect-slp.cc
+++ b/gcc/tree-vect-slp.cc
@@ -423,10 +423,13 @@ can_duplicate_and_interleave_p (vec_info *vinfo, unsigned int count,
(GET_MODE_BITSIZE (int_mode), 1);
  tree vector_type
= get_vectype_for_scalar_type (vinfo, int_type, count);
+ poly_int64 half_nelts;
  if (vector_type
  && VECTOR_MODE_P (TYPE_MODE (vector_type))
  && known_eq (GET_MODE_SIZE (TYPE_MODE (vector_type)),
-  GET_MODE_SIZE (base_vector_mode)))
+  GET_MODE_SIZE (base_vector_mode))
+ && multiple_p (GET_MODE_NUNITS (TYPE_MODE (vector_type)),
+2, &half_nelts))
{
  /* Try fusing consecutive sequences of COUNT / NVECTORS elements
 together into elements of type INT_TYPE and using the result
@@ -434,7 +437,7 @@ can_duplicate_and_interleave_p (vec_info *vinfo, unsigned int count,
  poly_uint64 nelts = GET_MODE_NUNITS (TYPE_MODE (vector_type));
  vec_perm_builder sel1 (nelts, 2, 3);
  vec_perm_builder sel2 (nelts, 2, 3);
- poly_int64 half_nelts = exact_div (nelts, 2);
+
  for (unsigned int i = 0; i < 3; ++i)
{
  sel1.quick_push (i);
-- 
2.34.1

[PATCH v6 8/9] RISC-V:autovec: Add autovectorization tests for binary integer

2023-05-05 Thread Michael Collison

2023-04-05  Michael Collison  

* gcc.target/riscv/rvv/autovec/loop-and-rv32.c: New
test to verify code generation of vector "and" on rv32.
* gcc.target/riscv/rvv/autovec/loop-and.c: New
test to verify code generation of vector "and" on rv64.
* gcc.target/riscv/rvv/autovec/loop-div-rv32.c: New
test to verify code generation of vector divide on rv32.
* gcc.target/riscv/rvv/autovec/loop-div.c: New
test to verify code generation of vector divide on rv64.
* gcc.target/riscv/rvv/autovec/loop-max-rv32.c: New
test to verify code generation of vector maximum on rv32.
* gcc.target/riscv/rvv/autovec/loop-max.c: New
test to verify code generation of vector maximum on rv64.
* gcc.target/riscv/rvv/autovec/loop-min-rv32.c: New
test to verify code generation of vector minimum on rv32.
* gcc.target/riscv/rvv/autovec/loop-min.c: New
test to verify code generation of vector minimum on rv64.
* gcc.target/riscv/rvv/autovec/loop-mod-rv32.c: New
test to verify code generation of vector modulus on rv32.
* gcc.target/riscv/rvv/autovec/loop-mod.c: New
test to verify code generation of vector modulus on rv64.
* gcc.target/riscv/rvv/autovec/loop-mul-rv32.c: New
test to verify code generation of vector multiply on rv32.
* gcc.target/riscv/rvv/autovec/loop-mul.c: New
test to verify code generation of vector multiply on rv64.
* gcc.target/riscv/rvv/autovec/loop-or-rv32.c: New
test to verify code generation of vector "or" on rv32.
* gcc.target/riscv/rvv/autovec/loop-or.c: New
test to verify code generation of vector "or" on rv64.
* gcc.target/riscv/rvv/autovec/loop-xor-rv32.c: New
test to verify code generation of vector xor on rv32.
* gcc.target/riscv/rvv/autovec/loop-xor.c: New
test to verify code generation of vector xor on rv64.
---
 .../riscv/rvv/autovec/loop-and-rv32.c | 24 ++
 .../gcc.target/riscv/rvv/autovec/loop-and.c   | 24 ++
 .../riscv/rvv/autovec/loop-div-rv32.c | 25 +++
 .../gcc.target/riscv/rvv/autovec/loop-div.c   | 25 +++
 .../riscv/rvv/autovec/loop-max-rv32.c | 25 +++
 .../gcc.target/riscv/rvv/autovec/loop-max.c   | 25 +++
 .../riscv/rvv/autovec/loop-min-rv32.c | 25 +++
 .../gcc.target/riscv/rvv/autovec/loop-min.c   | 25 +++
 .../riscv/rvv/autovec/loop-mod-rv32.c | 25 +++
 .../gcc.target/riscv/rvv/autovec/loop-mod.c   | 25 +++
 .../riscv/rvv/autovec/loop-mul-rv32.c | 24 ++
 .../gcc.target/riscv/rvv/autovec/loop-mul.c   | 24 ++
 .../riscv/rvv/autovec/loop-or-rv32.c  | 24 ++
 .../gcc.target/riscv/rvv/autovec/loop-or.c| 24 ++
 .../riscv/rvv/autovec/loop-xor-rv32.c | 24 ++
 .../gcc.target/riscv/rvv/autovec/loop-xor.c   | 24 ++
 gcc/testsuite/gcc.target/riscv/rvv/rvv.exp|  4 +++
 17 files changed, 396 insertions(+)
 create mode 100644 gcc/testsuite/gcc.target/riscv/rvv/autovec/loop-and-rv32.c
 create mode 100644 gcc/testsuite/gcc.target/riscv/rvv/autovec/loop-and.c
 create mode 100644 gcc/testsuite/gcc.target/riscv/rvv/autovec/loop-div-rv32.c
 create mode 100644 gcc/testsuite/gcc.target/riscv/rvv/autovec/loop-div.c
 create mode 100644 gcc/testsuite/gcc.target/riscv/rvv/autovec/loop-max-rv32.c
 create mode 100644 gcc/testsuite/gcc.target/riscv/rvv/autovec/loop-max.c
 create mode 100644 gcc/testsuite/gcc.target/riscv/rvv/autovec/loop-min-rv32.c
 create mode 100644 gcc/testsuite/gcc.target/riscv/rvv/autovec/loop-min.c
 create mode 100644 gcc/testsuite/gcc.target/riscv/rvv/autovec/loop-mod-rv32.c
 create mode 100644 gcc/testsuite/gcc.target/riscv/rvv/autovec/loop-mod.c
 create mode 100644 gcc/testsuite/gcc.target/riscv/rvv/autovec/loop-mul-rv32.c
 create mode 100644 gcc/testsuite/gcc.target/riscv/rvv/autovec/loop-mul.c
 create mode 100644 gcc/testsuite/gcc.target/riscv/rvv/autovec/loop-or-rv32.c
 create mode 100644 gcc/testsuite/gcc.target/riscv/rvv/autovec/loop-or.c
 create mode 100644 gcc/testsuite/gcc.target/riscv/rvv/autovec/loop-xor-rv32.c
 create mode 100644 gcc/testsuite/gcc.target/riscv/rvv/autovec/loop-xor.c

diff --git a/gcc/testsuite/gcc.target/riscv/rvv/autovec/loop-and-rv32.c b/gcc/testsuite/gcc.target/riscv/rvv/autovec/loop-and-rv32.c
new file mode 100644
index 000..eb1ac5b44fd
--- /dev/null
+++ b/gcc/testsuite/gcc.target/riscv/rvv/autovec/loop-and-rv32.c
@@ -0,0 +1,24 @@
+/* { dg-do compile } */
+/* { dg-additional-options "-O2 -ftree-vectorize -march=rv32gcv -mabi=ilp32d" } */
+
+#include 
+
+#define TEST_TYPE(TYPE)\
+  void vand_##TYPE (TYPE *dst, TYPE *a, TYPE *b, int n)\
+  {

[PATCH v6 4/9] RISC-V:autovec: Add target vectorization hooks

2023-05-05 Thread Michael Collison

2023-04-24  Michael Collison  
Juzhe Zhong  

* config/riscv/riscv.cc
(riscv_estimated_poly_value): Implement
TARGET_ESTIMATED_POLY_VALUE.
(riscv_preferred_simd_mode): Implement
TARGET_VECTORIZE_PREFERRED_SIMD_MODE.
(riscv_get_mask_mode): Implement TARGET_VECTORIZE_GET_MASK_MODE.
(riscv_empty_mask_is_expensive): Implement
TARGET_VECTORIZE_EMPTY_MASK_IS_EXPENSIVE.
(riscv_vectorize_create_costs): Implement
TARGET_VECTORIZE_CREATE_COSTS.
(riscv_support_vector_misalignment): Implement
TARGET_VECTORIZE_SUPPORT_VECTOR_MISALIGNMENT.
(TARGET_ESTIMATED_POLY_VALUE): Register target macro.
(TARGET_VECTORIZE_GET_MASK_MODE): Ditto.
(TARGET_VECTORIZE_EMPTY_MASK_IS_EXPENSIVE): Ditto.
(TARGET_VECTORIZE_SUPPORT_VECTOR_MISALIGNMENT): Ditto.
---
 gcc/config/riscv/riscv.cc | 130 ++
 1 file changed, 130 insertions(+)

diff --git a/gcc/config/riscv/riscv.cc b/gcc/config/riscv/riscv.cc
index 1e328f6a801..1425f50d80a 100644
--- a/gcc/config/riscv/riscv.cc
+++ b/gcc/config/riscv/riscv.cc
@@ -60,6 +60,15 @@ along with GCC; see the file COPYING3.  If not see
 #include "opts.h"
 #include "tm-constrs.h"
 #include "rtl-iter.h"
+#include "gimple.h"
+#include "cfghooks.h"
+#include "cfgloop.h"
+#include "cfgrtl.h"
+#include "sel-sched.h"
+#include "fold-const.h"
+#include "gimple-iterator.h"
+#include "gimple-expr.h"
+#include "tree-vectorizer.h"
 
 /* This file should be included last.  */
 #include "target-def.h"
@@ -7138,6 +7147,112 @@ riscv_dwarf_poly_indeterminate_value (unsigned int i, unsigned int *factor,
   return RISCV_DWARF_VLENB;
 }
 
+/* Implement TARGET_ESTIMATED_POLY_VALUE.
+   Look into the tuning structure for an estimate.
+   KIND specifies the type of requested estimate: min, max or likely.
+   For cores with a known RVV width all three estimates are the same.
+   For generic RVV tuning we want to distinguish the maximum estimate from
+   the minimum and likely ones.
+   The likely estimate is the same as the minimum in that case to give a
+   conservative behavior of auto-vectorizing with RVV when it is a win
+   even for 128-bit RVV.
+   When RVV width information is available VAL.coeffs[1] is multiplied by
+   the number of VQ chunks over the initial Advanced SIMD 128 bits.  */
+
+static HOST_WIDE_INT
+riscv_estimated_poly_value (poly_int64 val,
+                            poly_value_estimate_kind kind = POLY_VALUE_LIKELY)
+{
+  unsigned int width_source = BITS_PER_RISCV_VECTOR.is_constant ()
+    ? (unsigned int) BITS_PER_RISCV_VECTOR.to_constant ()
+    : (unsigned int) RVV_SCALABLE;
+
+  /* If there is no core-specific information then the minimum and likely
+     values are based on 128-bit vectors and the maximum is based on
+     the architectural maximum of 65536 bits.  */
+  if (width_source == RVV_SCALABLE)
+    switch (kind)
+      {
+      case POLY_VALUE_MIN:
+      case POLY_VALUE_LIKELY:
+        return val.coeffs[0];
+
+      case POLY_VALUE_MAX:
+        return val.coeffs[0] + val.coeffs[1] * 15;
+      }
+
+  /* Allow BITS_PER_RISCV_VECTOR to be a bitmask of different VL, treating the
+     lowest as likely.  This could be made more general if future -mtune
+     options need it to be.  */
+  if (kind == POLY_VALUE_MAX)
+    width_source = 1 << floor_log2 (width_source);
+  else
+    width_source = least_bit_hwi (width_source);
+
+  /* If the core provides width information, use that.  */
+  HOST_WIDE_INT over_128 = width_source - 128;
+  return val.coeffs[0] + val.coeffs[1] * over_128 / 128;
+}
+
+/* Implement TARGET_VECTORIZE_PREFERRED_SIMD_MODE.  */
+
+static machine_mode
+riscv_preferred_simd_mode (scalar_mode mode)
+{
+  if (TARGET_VECTOR)
+    return riscv_vector::riscv_vector_preferred_simd_mode (mode);
+
+  return word_mode;
+}
+
+/* Implement TARGET_VECTORIZE_SUPPORT_VECTOR_MISALIGNMENT.  */
+
+static bool
+riscv_support_vector_misalignment (machine_mode mode,
+                                   const_tree type,
+                                   int misalignment,
+                                   bool is_packed)
+{
+  if (TARGET_VECTOR)
+    {
+      if (STRICT_ALIGNMENT)
+        {
+          /* Return false if the movmisalign pattern is not supported for
+             this mode.  */
+          if (optab_handler (movmisalign_optab, mode) == CODE_FOR_nothing)
+            return false;
+
+          /* Misalignment factor is unknown at compile time.  */
+          if (misalignment == -1)
+            return false;
+        }
+      return true;
+    }
+
+  return default_builtin_support_vector_misalignment (mode, type, misalignment,
+                                                      is_packed);
+}
+
+/* Implement TARGET_VECTORIZE_GET_MASK_MODE.  */
+
+static opt_machine_mode
+riscv_get_mask_mode (machine_mode mode)
+{
+  machine_mode mask_mode = VOIDmode;
+  if (TARGET_VECTOR
+      && riscv_vector::riscv_vector_get_mask_mode (mode).exists (&mask_mode))
+    return mask_mode;
+
+  retu
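The estimate arithmetic in riscv_estimated_poly_value can be checked with a small standalone model. This is a sketch under stated assumptions, not GCC code: `poly2`, `estimate`, `floor_log2_u` and the `WIDTH_SCALABLE` marker are hypothetical stand-ins for `poly_int64`, the hook itself, GCC's `floor_log2`, and the `RVV_SCALABLE` check, and the width argument is assumed to be a bitmask of candidate vector lengths in bits, as the patch comment describes.

```c
#include <assert.h>
#include <stdint.h>

/* Hypothetical stand-in for a degree-1 poly_int64: c0 + c1 * x.  */
typedef struct { int64_t coeffs[2]; } poly2;

/* Hypothetical stand-in for poly_value_estimate_kind.  */
enum estimate_kind { EST_MIN, EST_MAX, EST_LIKELY };

/* Hypothetical marker meaning "no core-specific width information".  */
#define WIDTH_SCALABLE 0u

/* Position of the most significant set bit of X (X must be nonzero).  */
static int floor_log2_u (unsigned int x)
{
  assert (x != 0);
  int r = -1;
  while (x != 0)
    {
      x >>= 1;
      r++;
    }
  return r;
}

/* Mirror of the patch's arithmetic for estimating VAL.  */
static int64_t estimate (poly2 val, unsigned int width, enum estimate_kind kind)
{
  if (width == WIDTH_SCALABLE)
    /* Generic tuning: min/likely assume 128-bit vectors; max uses the
       patch's factor of 15.  */
    return kind == EST_MAX ? val.coeffs[0] + val.coeffs[1] * 15
                           : val.coeffs[0];

  /* WIDTH may be a bitmask of candidate VLs: take the widest for MAX and
     the narrowest (lowest set bit) otherwise.  */
  width = kind == EST_MAX ? 1u << floor_log2_u (width) : width & -width;

  /* Scale coeffs[1] by the number of 128-bit chunks beyond the first.  */
  int64_t over_128 = (int64_t) width - 128;
  return val.coeffs[0] + val.coeffs[1] * over_128 / 128;
}
```

For example, with val = (16, 16) bytes a known 256-bit width gives 16 + 16 * 128 / 128 = 32 bytes, while the generic maximum is 16 + 15 * 16 = 256 bytes (2048 bits).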

[PATCH v6 3/9] RISC-V:autovec: Add auto-vectorization support functions

2023-05-05 Thread Michael Collison

2023-04-24  Michael Collison  
Juzhe Zhong  

* config/riscv/riscv-v.cc
(riscv_vector_preferred_simd_mode): New function.
(get_mask_policy_no_pred): Ditto.
(get_tail_policy_no_pred): Ditto.
(riscv_vector_mask_mode_p): Ditto.
(riscv_vector_get_mask_mode): Ditto.
---
 gcc/config/riscv/riscv-v.cc | 91 +
 1 file changed, 91 insertions(+)

diff --git a/gcc/config/riscv/riscv-v.cc b/gcc/config/riscv/riscv-v.cc
index 99c414cc910..7faffb55046 100644
--- a/gcc/config/riscv/riscv-v.cc
+++ b/gcc/config/riscv/riscv-v.cc
@@ -39,9 +39,11 @@
 #include "emit-rtl.h"
 #include "tm_p.h"
 #include "target.h"
+#include "targhooks.h"
 #include "expr.h"
 #include "optabs.h"
 #include "tm-constrs.h"
+#include "riscv-vector-builtins.h"
 #include "rtx-vector-builder.h"
 
 using namespace riscv_vector;
@@ -176,6 +178,56 @@ calculate_ratio (unsigned int sew, enum vlmul_type vlmul)
   return ratio;
 }
 
+/* SCALABLE means that the vector length is agnostic (run-time invariant and
+   compile-time unknown).  FIXED means that the vector length is specific
+   (compile-time known).  Both RVV_SCALABLE and RVV_FIXED_VLMAX do
+   auto-vectorization using the VLMAX vsetvl configuration.  */
+static bool
+autovec_use_vlmax_p (void)
+{
+  return riscv_autovec_preference == RVV_SCALABLE
+         || riscv_autovec_preference == RVV_FIXED_VLMAX;
+}
+
+/* Return the vectorization machine mode for RVV according to LMUL.  */
+machine_mode
+riscv_vector_preferred_simd_mode (scalar_mode mode)
+{
+  /* We only enable auto-vectorization when TARGET_MIN_VLEN >= 128 or
+     riscv_autovec_lmul >= RVV_M2, because the GCC loop vectorizer reports
+     an ICE in 'can_duplicate_and_interleave_p' of tree-vect-slp.cc when we
+     enable -march=rv64gc_zve32* and -march=rv32gc_zve64*.  Since we have
+     VNx1SImode in -march=*zve32* and VNx1DImode in -march=*zve64*, they are
+     enabled in targetm.vector_mode_supported_p and the SLP vectorizer will
+     try to use them.  Currently, we can support auto-vectorization in
+     -march=rv32_zve32x_zvl128b, whereas -march=rv32_zve32x_zvl32b and
+     -march=rv32_zve32x_zvl64b are disabled.  */
+  if (autovec_use_vlmax_p ())
+    {
+      /* If TARGET_MIN_VLEN < 128, we don't allow LMUL < 2
+         auto-vectorization since the loop vectorizer may use VNx1SImode or
+         VNx1DImode to vectorize, which will cause an ICE in
+         'can_duplicate_and_interleave_p' of tree-vect-slp.cc.  */
+      if (TARGET_MIN_VLEN < 128 && riscv_autovec_lmul < RVV_M2)
+        return word_mode;
+      /* We use LMUL = 1 as the base byte size, which is
+         BYTES_PER_RISCV_VECTOR, and riscv_autovec_lmul as the multiply
+         factor to calculate the NUNITS of the auto-vectorization mode.  */
+      poly_uint64 nunits;
+      poly_uint64 vector_size
+        = BYTES_PER_RISCV_VECTOR * ((int) riscv_autovec_lmul);
+      poly_uint64 scalar_size = GET_MODE_SIZE (mode);
+      gcc_assert (multiple_p (vector_size, scalar_size, &nunits));
+      machine_mode rvv_mode;
+      if (get_vector_mode (mode, nunits).exists (&rvv_mode))
+        return rvv_mode;
+    }
+  /* TODO: We will support minimum length VLS auto-vectorization in the
+     future.  */
+}
+
 /* Emit an RVV unmask && vl mov from SRC to DEST.  */
 static void
 emit_pred_op (unsigned icode, rtx mask, rtx dest, rtx src, rtx len,
@@ -430,6 +482,45 @@ get_avl_type_rtx (enum avl_type type)
   return gen_int_mode (type, Pmode);
 }
 
+/* Return the mask policy for no predication.  */
+rtx
+get_mask_policy_no_pred ()
+{
+  return get_mask_policy_for_pred (PRED_TYPE_none);
+}
+
+/* Return the tail policy for no predication.  */
+rtx
+get_tail_policy_no_pred ()
+{
+  return get_tail_policy_for_pred (PRED_TYPE_none);
+}
+
+/* Return true if MODE is an RVV mask mode.  */
+bool
+riscv_vector_mask_mode_p (machine_mode mode)
+{
+  return (mode == VNx1BImode || mode == VNx2BImode || mode == VNx4BImode
+          || mode == VNx8BImode || mode == VNx16BImode || mode == VNx32BImode
+          || mode == VNx64BImode);
+}
+
+/* Return the appropriate mask mode for MODE.  */
+
+opt_machine_mode
+riscv_vector_get_mask_mode (machine_mode mode)
+{
+  machine_mode mask_mode;
+  int nf = 1;
+
+  FOR_EACH_MODE_IN_CLASS (mask_mode, MODE_VECTOR_BOOL)
+    if (GET_MODE_INNER (mask_mode) == BImode
+        && known_eq (GET_MODE_NUNITS (mask_mode) * nf, GET_MODE_NUNITS (mode))
+        && riscv_vector_mask_mode_p (mask_mode))
+      return mask_mode;
+  return default_get_mask_mode (mode);
+}
+
 /* Return the RVV vector mode that has NUNITS elements of mode INNER_MODE.
This function is not only used by builtins, but also will be used by
auto-vectorization in the future.  */
-- 
2.34.1
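To see which NUNITS riscv_vector_preferred_simd_mode asks get_vector_mode for, the poly arithmetic can be modelled in plain C. This is a sketch under stated assumptions: `poly_u64`, `poly_multiple_p` and `preferred_nunits` are hypothetical names, and a minimum VLEN of 128 bits is assumed, purely for illustration, to give BYTES_PER_RISCV_VECTOR = (16, 16) bytes, which may not match GCC's internal encoding. Where the patch asserts divisibility, the model simply reports failure.

```c
#include <assert.h>
#include <stdbool.h>
#include <stdint.h>

/* Hypothetical model of a degree-1 poly_uint64: c0 + c1 * x.  */
typedef struct { uint64_t c0, c1; } poly_u64;

/* Modelled on GCC's multiple_p (A, B, &Q) for a constant B: succeeds only
   when B divides both coefficients.  */
static bool poly_multiple_p (poly_u64 a, uint64_t b, poly_u64 *q)
{
  if (b == 0 || a.c0 % b != 0 || a.c1 % b != 0)
    return false;
  q->c0 = a.c0 / b;
  q->c1 = a.c1 / b;
  return true;
}

/* Element count the patch would request.  Returns false where the patch
   returns word_mode (or where its gcc_assert would fire).  */
static bool preferred_nunits (bool use_vlmax, unsigned min_vlen, unsigned lmul,
                              poly_u64 bytes_per_vector, uint64_t scalar_size,
                              poly_u64 *nunits)
{
  assert (lmul == 1 || lmul == 2 || lmul == 4 || lmul == 8);
  if (!use_vlmax)
    return false;               /* VLS auto-vectorization is still a TODO.  */
  if (min_vlen < 128 && lmul < 2)
    return false;               /* Avoid VNx1SImode/VNx1DImode.  */
  poly_u64 vector_size = { bytes_per_vector.c0 * lmul,
                           bytes_per_vector.c1 * lmul };
  return poly_multiple_p (vector_size, scalar_size, nunits);
}
```

For 4-byte SImode elements and LMUL = 1 this yields NUNITS = (4, 4), i.e. 4 + 4x elements; LMUL = 2 doubles it to (8, 8).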

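riscv_vector_get_mask_mode walks the MODE_VECTOR_BOOL modes and returns the first RVV mask mode whose element count is known to equal that of MODE, otherwise deferring to default_get_mask_mode. A hypothetical table-driven model of that search follows; the element counts (N + N*x for VNxNBImode) are assumptions for illustration rather than GCC's actual mode definitions, and the nf factor (always 1 in the patch) is omitted.

```c
#include <assert.h>
#include <stdint.h>

/* Hypothetical mode descriptor: name plus element count c0 + c1 * x.  */
typedef struct { const char *name; uint64_t c0, c1; } mode_info;

/* Hypothetical mask-mode table, assuming VNxNBImode has N + N*x elements.  */
static const mode_info mask_modes[] = {
  { "VNx1BI", 1, 1 },    { "VNx2BI", 2, 2 },    { "VNx4BI", 4, 4 },
  { "VNx8BI", 8, 8 },    { "VNx16BI", 16, 16 }, { "VNx32BI", 32, 32 },
  { "VNx64BI", 64, 64 },
};

/* Return the index of the first mask mode whose element count equals
   C0 + C1 * x for all x (the known_eq test), or -1 where the patch
   would fall back to default_get_mask_mode.  */
static int get_mask_mode_index (uint64_t c0, uint64_t c1)
{
  assert (c0 > 0);  /* A vector mode always has at least one element.  */
  for (int i = 0; i < (int) (sizeof mask_modes / sizeof mask_modes[0]); i++)
    if (mask_modes[i].c0 == c0 && mask_modes[i].c1 == c1)
      return i;
  return -1;
}
```

Under these assumptions a data mode with 4 + 4x elements maps to "VNx4BI", while an element count with no matching mask mode falls through to the default hook.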
[PATCH v6 2/9] RISC-V: autovec: Export policy functions to global scope

2023-05-05 Thread Michael Collison

2023-03-02  Michael Collison  
Juzhe Zhong  

* config/riscv/riscv-vector-builtins.cc (get_tail_policy_for_pred):
Remove static declaration to make it externally visible.
(get_mask_policy_for_pred): Ditto.
* config/riscv/riscv-vector-builtins.h (get_tail_policy_for_pred):
New external declaration.
(get_mask_policy_for_pred): Ditto.
---
 gcc/config/riscv/riscv-vector-builtins.cc | 4 ++--
 gcc/config/riscv/riscv-vector-builtins.h  | 3 +++
 2 files changed, 5 insertions(+), 2 deletions(-)

diff --git a/gcc/config/riscv/riscv-vector-builtins.cc b/gcc/config/riscv/riscv-vector-builtins.cc
index 434bd8e157b..f0ebc095fa7 100644
--- a/gcc/config/riscv/riscv-vector-builtins.cc
+++ b/gcc/config/riscv/riscv-vector-builtins.cc
@@ -2496,7 +2496,7 @@ use_real_merge_p (enum predication_type_index pred)
 
 /* Get TAIL policy for predication. If predication indicates TU, return the TU.
Otherwise, return the prefer default configuration.  */
-static rtx
+rtx
 get_tail_policy_for_pred (enum predication_type_index pred)
 {
   if (pred == PRED_TYPE_tu || pred == PRED_TYPE_tum || pred == PRED_TYPE_tumu)
@@ -2506,7 +2506,7 @@ get_tail_policy_for_pred (enum predication_type_index pred)
 
 /* Get MASK policy for predication. If predication indicates MU, return the MU.
Otherwise, return the prefer default configuration.  */
-static rtx
+rtx
 get_mask_policy_for_pred (enum predication_type_index pred)
 {
   if (pred == PRED_TYPE_tumu || pred == PRED_TYPE_mu)
diff --git a/gcc/config/riscv/riscv-vector-builtins.h b/gcc/config/riscv/riscv-vector-builtins.h
index 8ffb9d33e33..de3fd6ca290 100644
--- a/gcc/config/riscv/riscv-vector-builtins.h
+++ b/gcc/config/riscv/riscv-vector-builtins.h
@@ -483,6 +483,9 @@ extern rvv_builtin_types_t builtin_types[NUM_VECTOR_TYPES + 1];
 extern function_instance get_read_vl_instance (void);
 extern tree get_read_vl_decl (void);
 
+extern rtx get_tail_policy_for_pred (enum predication_type_index pred);
+extern rtx get_mask_policy_for_pred (enum predication_type_index pred);
+
 inline tree
 rvv_arg_type_info::get_scalar_type (vector_type_index type_idx) const
 {
-- 
2.34.1

[PATCH v6 5/9] RISC-V:autovec: Add autovectorization patterns for binary integer & len_load/store

2023-05-05 Thread Michael Collison

2023-04-25  Michael Collison  
Juzhe Zhong  

* config/riscv/riscv.md: Include vector-iterators.md.
* config/riscv/vector-auto.md: New file containing
autovectorization patterns.
* config/riscv/vector.md: Remove include of vector-iterators.md
and include vector-auto.md.
---
 gcc/config/riscv/riscv.md   |  1 +
 gcc/config/riscv/vector-auto.md | 74 +
 gcc/config/riscv/vector.md  |  4 +-
 3 files changed, 77 insertions(+), 2 deletions(-)
 create mode 100644 gcc/config/riscv/vector-auto.md

diff --git a/gcc/config/riscv/riscv.md b/gcc/config/riscv/riscv.md
index c508ee3ad89..e9b49eda617 100644
--- a/gcc/config/riscv/riscv.md
+++ b/gcc/config/riscv/riscv.md
@@ -140,6 +140,7 @@
 (include "predicates.md")
 (include "constraints.md")
 (include "iterators.md")
+(include "vector-iterators.md")
 
 ;; 
 ;;
diff --git a/gcc/config/riscv/vector-auto.md b/gcc/config/riscv/vector-auto.md
new file mode 100644
index 000..83d2ab6957a
--- /dev/null
+++ b/gcc/config/riscv/vector-auto.md
@@ -0,0 +1,74 @@
+;; Machine description for RISC-V 'V' Extension for GNU compiler.
+;; Copyright (C) 2022-2023 Free Software Foundation, Inc.
+;; Contributed by Juzhe Zhong (juzhe.zh...@rivai.ai), RiVAI Technologies Ltd.
+;; Contributed by Michael Collison (colli...@rivosinc.com), Rivos Inc.
+
+;; This file is part of GCC.
+
+;; GCC is free software; you can redistribute it and/or modify
+;; it under the terms of the GNU General Public License as published by
+;; the Free Software Foundation; either version 3, or (at your option)
+;; any later version.
+
+;; GCC is distributed in the hope that it will be useful,
+;; but WITHOUT ANY WARRANTY; without even the implied warranty of
+;; MERCHANTABILITY or FITNESS FOR A PARTICULAR PURPOSE.  See the
+;; GNU General Public License for more details.
+
+;; You should have received a copy of the GNU General Public License
+;; along with GCC; see the file COPYING3.  If not see
+;; .
+
+;; len_load/len_store is a sub-optimal pattern for RVV auto-vectorization
+;; support.  We will replace them when len_maskload/len_maskstore is
+;; supported in the loop vectorizer.
+(define_expand "len_load_"
+  [(match_operand:V 0 "register_operand")
+   (match_operand:V 1 "memory_operand")
+   (match_operand 2 "vector_length_operand")
+   (match_operand 3 "const_0_operand")]
+  "TARGET_VECTOR"
+{
+  riscv_vector::emit_nonvlmax_op (code_for_pred_mov (mode), operands[0],
+                                  operands[1], operands[2], mode);
+  DONE;
+})
+
+(define_expand "len_store_"
+  [(match_operand:V 0 "memory_operand")
+   (match_operand:V 1 "register_operand")
+   (match_operand 2 "vector_length_operand")
+   (match_operand 3 "const_0_operand")]
+  "TARGET_VECTOR"
+{
+  riscv_vector::emit_nonvlmax_op (code_for_pred_mov (mode), operands[0],
+                                  operands[1], operands[2], mode);
+  DONE;
+})
+
+;; -
+;;  [INT] Vector binary patterns
+;; -
+
+(define_expand "3"
+  [(set (match_operand:VI 0 "register_operand")
+        (any_int_binop:VI (match_operand:VI 1 "")
+                          (match_operand:VI 2 "")))]
+  "TARGET_VECTOR"
+{
+  using namespace riscv_vector;
+
+  rtx merge = RVV_VUNDEF (mode);
+  rtx vl = gen_reg_rtx (Pmode);
+  emit_vlmax_vsetvl (mode, vl);
+  rtx mask_policy = get_mask_policy_no_pred ();
+  rtx tail_policy = get_tail_policy_no_pred ();
+  rtx mask = CONSTM1_RTX(mode);
+  rtx vlmax_avl_p = get_avl_type_rtx (NONVLMAX);
+
+  emit_insn (gen_pred_ (operands[0], mask, merge, operands[1], operands[2],
+                        vl, tail_policy, mask_policy, vlmax_avl_p));
+
+  DONE;
+})
+
+
diff --git a/gcc/config/riscv/vector.md b/gcc/config/riscv/vector.md
index 1642822d098..5c9252c281b 100644
--- a/gcc/config/riscv/vector.md
+++ b/gcc/config/riscv/vector.md
@@ -26,8 +26,6 @@
 ;; - Auto-vectorization (TBD)
 ;; - Combine optimization (TBD)
 
-(include "vector-iterators.md")
-
 (define_constants [
(INVALID_ATTRIBUTE255)
(X0_REGNUM  0)
@@ -368,6 +366,8 @@
   (symbol_ref "INTVAL (operands[4])")]
(const_int INVALID_ATTRIBUTE)))
 
+(include "vector-auto.md")
+
 ;; -
 ;;  Miscellaneous Operations
 ;; -
-- 
2.34.1
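The len_load/len_store expanders take an explicit element count (operand 2, with the bias operand 3 required to be 0 here), which lets a vectorized loop handle its final partial strip without a scalar epilogue. A scalar C model of such a length-controlled element-wise add is sketched here; `VLMAX` and `vadd_strip_mined` are hypothetical names, and the fixed VLMAX of 4 stands in for whatever vsetvl returns at run time.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* Hypothetical hardware VLMAX (elements per vector register group).  */
#define VLMAX 4

/* Add A and B into C, processing at most VLMAX elements per iteration.
   Each iteration models: vsetvl (vl = min (n - i, VLMAX)), len_load of A
   and B, a vadd on VL elements, then len_store into C.  Returns the
   number of iterations ("strips") executed.  */
static size_t vadd_strip_mined (int32_t *c, const int32_t *a,
                                const int32_t *b, size_t n)
{
  size_t strips = 0;
  for (size_t i = 0; i < n; strips++)
    {
      size_t vl = n - i < VLMAX ? n - i : VLMAX;   /* vsetvl */
      int32_t va[VLMAX], vb[VLMAX];
      for (size_t k = 0; k < vl; k++)              /* len_load */
        {
          va[k] = a[i + k];
          vb[k] = b[i + k];
        }
      for (size_t k = 0; k < vl; k++)              /* vadd + len_store */
        c[i + k] = va[k] + vb[k];
      i += vl;
    }
  assert (strips == (n + VLMAX - 1) / VLMAX);
  return strips;
}
```

With n = 10 and VLMAX = 4, the loop runs three strips of 4, 4 and 2 elements; the last strip is exactly the case a length operand covers.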

[PATCH v6 1/9] RISC-V: autovec: Add new predicates and function prototypes

2023-05-05 Thread Michael Collison

2023-04-24  Michael Collison  
Juzhe Zhong  

* config/riscv/riscv-protos.h
(riscv_vector_preferred_simd_mode): New.
(riscv_vector_mask_mode_p): Ditto.
(riscv_vector_get_mask_mode): Ditto.
(emit_vlmax_vsetvl): Ditto.
(get_mask_policy_no_pred): Ditto.
(get_tail_policy_no_pred): Ditto.
(vlmul_field_enum): Ditto.
* config/riscv/riscv-v.cc (emit_vlmax_vsetvl):
Remove static scope.
* config/riscv/riscv-opts.h (riscv_vector_lmul_enum): New enum.
---
 gcc/config/riscv/riscv-opts.h   | 10 ++
 gcc/config/riscv/riscv-protos.h |  9 +
 2 files changed, 19 insertions(+)

diff --git a/gcc/config/riscv/riscv-opts.h b/gcc/config/riscv/riscv-opts.h
index 4207db240ea..00c4ab222ae 100644
--- a/gcc/config/riscv/riscv-opts.h
+++ b/gcc/config/riscv/riscv-opts.h
@@ -67,6 +67,7 @@ enum stack_protector_guard {
   SSP_GLOBAL   /* global canary */
 };
 
+
 /* RISC-V auto-vectorization preference.  */
 enum riscv_autovec_preference_enum {
   NO_AUTOVEC,
@@ -82,6 +83,15 @@ enum riscv_autovec_lmul_enum {
   RVV_M8 = 8
 };
 
+/* vectorization factor.  */
+enum riscv_vector_lmul_enum
+{
+  RVV_LMUL1 = 1,
+  RVV_LMUL2 = 2,
+  RVV_LMUL4 = 4,
+  RVV_LMUL8 = 8
+};
+
 #define MASK_ZICSR(1 << 0)
 #define MASK_ZIFENCEI (1 << 1)
 
diff --git a/gcc/config/riscv/riscv-protos.h b/gcc/config/riscv/riscv-protos.h
index 33eb574aadc..fb39b856735 100644
--- a/gcc/config/riscv/riscv-protos.h
+++ b/gcc/config/riscv/riscv-protos.h
@@ -243,4 +243,13 @@ th_mempair_output_move (rtx[4], bool, machine_mode, RTX_CODE);
 #endif
 
 extern bool riscv_use_divmod_expander (void);
+/* Routines implemented in riscv-v.cc.  */
+
+namespace riscv_vector {
+extern machine_mode riscv_vector_preferred_simd_mode (scalar_mode mode);
+extern bool riscv_vector_mask_mode_p (machine_mode);
+extern opt_machine_mode riscv_vector_get_mask_mode (machine_mode mode);
+extern rtx get_mask_policy_no_pred ();
+extern rtx get_tail_policy_no_pred ();
+}
 #endif /* ! GCC_RISCV_PROTOS_H */
-- 
2.34.1

[PATCH v6 0/9] RISC-V: autovec: Add autovec support

2023-05-05 Thread Michael Collison

This series of patches adds foundational support for RISC-V
auto-vectorization. These patches are based on the current upstream RVV
vector intrinsic support and are not a new implementation. Most of the
implementation consists of the autovectorization patterns themselves and
target hooks. This implementation only provides support for basic integer
binary operations as a proof of concept. This patch set should not be
construed to be feature complete. Based on conversations with the community,
these patches are intended to lay the groundwork for feature completion and
collaboration within the RISC-V community.

These patches are largely based on the work of Juzhe Zhong
(juzhe.zh...@rivai.ai) of RiVAI. More specifically, the rvv-next branch at
https://github.com/riscv-collab/riscv-gcc.git is the foundation of this
patch set.

As discussed on this list, if these patches are approved they will be merged
into an "auto-vectorization" branch once gcc-13 branches for release. There
are two known issues related to crashes (assert failures) associated with
tree vectorization; I have sent a patch for one of them and have received
feedback.

Changes in v6:

- Incorporated upstream comments; added target hook for
TARGET_VECTORIZE_SUPPORT_VECTOR_MISALIGNMENT

Changes in v5:

- Incorporated upstream comments, largely to delete unnecessary code

Changes in v4:

- Added support for binary integer operations and test cases
- Fixed bug to support 8-bit integer vectorization
- Fixed several assert errors related to non-multiple of two vector modes

Changes in v3:

- Removed the cost model and cost hooks based on feedback from Richard Biener
- Used RVV_VUNDEF macro to fix failing patterns

Changes in v2:

- Updated ChangeLog entry to include RiVAI contributions
- Fixed ChangeLog email formatting
- Fixed GNU formatting issues in the code

Kevin Lee (1):
  RISC-V:autovec: This patch supports 8 bit auto-vectorization in riscv.

Michael Collison (8):
  RISC-V: Add new predicates and function prototypes
  RISC-V: autovec: Export policy functions to global scope
  RISC-V:autovec: Add auto-vectorization support functions
  RISC-V:autovec: Add target vectorization hooks
  RISC-V:autovec: Add autovectorization patterns for binary integer & len_load/store
  RISC-V:autovec: Add autovectorization tests for add & sub
  vect: Verify that GET_MODE_NUNITS is a multiple of 2.
  RISC-V:autovec: Add autovectorization tests for binary integer

 gcc/config/riscv/riscv-opts.h |  10 ++
 gcc/config/riscv/riscv-protos.h   |   9 ++
 gcc/config/riscv/riscv-v.cc   |  91 
 gcc/config/riscv/riscv-vector-builtins.cc |   4 +-
 gcc/config/riscv/riscv-vector-builtins.h  |   3 +
 gcc/config/riscv/riscv.cc | 130 ++
 gcc/config/riscv/riscv.md |   1 +
 gcc/config/riscv/vector-auto.md   |  74 ++
 gcc/config/riscv/vector.md|   4 +-
 .../riscv/rvv/autovec/loop-add-rv32.c |  25 
 .../gcc.target/riscv/rvv/autovec/loop-add.c   |  25 
 .../riscv/rvv/autovec/loop-and-rv32.c |  25 
 .../gcc.target/riscv/rvv/autovec/loop-and.c   |  25 
 .../riscv/rvv/autovec/loop-div-rv32.c |  27 
 .../gcc.target/riscv/rvv/autovec/loop-div.c   |  27 
 .../riscv/rvv/autovec/loop-max-rv32.c |  26 
 .../gcc.target/riscv/rvv/autovec/loop-max.c   |  26 
 .../riscv/rvv/autovec/loop-min-rv32.c |  26 
 .../gcc.target/riscv/rvv/autovec/loop-min.c   |  26 
 .../riscv/rvv/autovec/loop-mod-rv32.c |  27 
 .../gcc.target/riscv/rvv/autovec/loop-mod.c   |  27 
 .../riscv/rvv/autovec/loop-mul-rv32.c |  25 
 .../gcc.target/riscv/rvv/autovec/loop-mul.c   |  25 
 .../riscv/rvv/autovec/loop-or-rv32.c  |  25 
 .../gcc.target/riscv/rvv/autovec/loop-or.c|  25 
 .../riscv/rvv/autovec/loop-sub-rv32.c |  25 
 .../gcc.target/riscv/rvv/autovec/loop-sub.c   |  25 
 .../riscv/rvv/autovec/loop-xor-rv32.c |  25 
 .../gcc.target/riscv/rvv/autovec/loop-xor.c   |  25 
 gcc/testsuite/gcc.target/riscv/rvv/rvv.exp|   4 +
 gcc/tree-vect-slp.cc  |   7 +-
 31 files changed, 843 insertions(+), 6 deletions(-)
 create mode 100644 gcc/config/riscv/vector-auto.md
 create mode 100644 gcc/testsuite/gcc.target/riscv/rvv/autovec/loop-add-rv32.c
 create mode 100644 gcc/testsuite/gcc.target/riscv/rvv/autovec/loop-add.c
 create mode 100644 gcc/testsuite/gcc.target/riscv/rvv/autovec/loop-and-rv32.c
 create mode 100644 gcc/testsuite/gcc.target/riscv/rvv/autovec/loop-and.c
 create mode 100644 gcc/testsuite/gcc.target/riscv/rvv/autovec/loop-div-rv32.c
 create mode 100644 gcc/testsuite/gcc.target/riscv/rvv/autovec/loop-div.c
 create mode 100644 gcc/testsuite/gcc.targe

Re: [PATCH v8] RISC-V: Add the 'zfa' extension, version 0.2.

2023-05-05 Thread Christoph Müllner

On Fri, May 5, 2023 at 5:13 PM Palmer Dabbelt  wrote:
>
> On Fri, 05 May 2023 08:04:53 PDT (-0700), christoph.muell...@vrull.eu wrote:
> > What I forgot to mention:
> > Zfa is frozen and in public review:
> >   https://groups.google.com/a/groups.riscv.org/g/isa-dev/c/SED4ntBkabg
>
> Thanks, I'd also forgot to send that out ;).
>
> I think the only blocker here on the specification side is the assembly
> format for FLI?  It looks like the feedback on
>  has been
> pretty minor so far.  It'd be nice to have the docs lined up before
> we merge, but we could always just call it a GNU extension -- we've
> already got a lot of that in assembler land, so I don't think it's that
> big of a deal.

I also don't think that we need to wait for that PR to land.

Nelson already gave his ok on the Binutils v4 (but after ratification,
not freeze):
  https://sourceware.org/pipermail/binutils/2023-April/127027.html

FWIW, I have meanwhile sent out a v5 for Binutils as well (there were
few changes requested).
And the v5 has been rebased and retested as well.

>
> >
> > On Fri, May 5, 2023 at 5:03 PM Christoph Müllner
> >  wrote:
> >>
> >> On Wed, Apr 19, 2023 at 11:58 AM Jin Ma  wrote:
> >> >
> >> > This patch adds the 'Zfa' extension for riscv, which is based on:
> >> >   https://github.com/riscv/riscv-isa-manual/commits/zfb
> >> >   
> >> > https://github.com/riscv/riscv-isa-manual/commit/1f038182810727f5feca311072e630d6baac51da
> >> >
> >> > The binutils-gdb for 'Zfa' extension:
> >> >   https://github.com/a4lg/binutils-gdb/commits/riscv-zfa
> >> >
> >> > What needs special explanation is:
> >> > 1. The immediate of the instructions FLI.H/S/D is represented in the
> >> >   assembly as a floating-point value, in scientific notation when rs1
> >> >   is 1 or 2, and as a decimal number for the rest.
> >> >
> >> >   Related llvm link:
> >> > https://reviews.llvm.org/D145645
> >> >   Related discussion link:
> >> > https://github.com/riscv/riscv-isa-manual/issues/980
> >> >
> >> > 2. According to the riscv-spec, "The FCVTMOD.W.D instruction was added
> >> >   principally to accelerate the processing of JavaScript Numbers.", so
> >> >   it seems that no implementation is required.
> >> >
> >> > 3. The instructions FMINM and FMAXM correspond to the C23 library
> >> >   functions fminimum and fmaximum.  Therefore, this patch has simply
> >> >   implemented the pattern of fminm3 and fmaxm3 to prepare for later.
> >> >
> >> > gcc/ChangeLog:
> >> >
> >> > * common/config/riscv/riscv-common.cc: Add zfa extension version.
> >> > * config/riscv/constraints.md (Zf): Constrain the floating point
> >> > number that the instructions FLI.H/S/D can load.
> >> > ((TARGET_XTHEADFMV || TARGET_ZFA) ? FP_REGS : NO_REGS): enable
> >> > FMVP.D.X and FMVH.X.D.
> >> > * config/riscv/iterators.md (ceil): New.
> >> > * config/riscv/riscv-protos.h (riscv_float_const_rtx_index_for_fli): New.
> >> > * config/riscv/riscv.cc (find_index_in_array): New.
> >> > (riscv_float_const_rtx_index_for_fli): Get the index of the
> >> > floating-point number that the instructions FLI.H/S/D can mov.
> >> > (riscv_cannot_force_const_mem): If instruction FLI.H/S/D can be
> >> > used, memory is not applicable.
> >> > (riscv_const_insns): The cost of FLI.H/S/D is 3.
> >> > (riscv_legitimize_const_move): Likewise.
> >> > (riscv_split_64bit_move_p): If instruction FLI.H/S/D can be
> >> > used, no split is required.
> >> > (riscv_output_move): Output the mov instructions in zfa extension.
> >> > (riscv_print_operand): Output the floating-point value of the
> >> > FLI.H/S/D immediate in assembly.
> >> > (riscv_secondary_memory_needed): Likewise.
> >> > * config/riscv/riscv.h (GP_REG_RTX_P): New.
> >> > * config/riscv/riscv.md (fminm3): New.
> >> >
> >> > gcc/testsuite/ChangeLog:
> >> >
> >> > * gcc.target/riscv/zfa-fleq-fltq-rv32.c: New test.
> >> > * gcc.target/riscv/zfa-fleq-fltq.c: New test.
> >> > * gcc.target/riscv/zfa-fli-rv32.c: New test.
> >> > * gcc.target/riscv/zfa-fli-zfh-rv32.c: New test.
> >> > * gcc.target/riscv/zfa-fli-zfh.c: New test.
> >> > * gcc.target/riscv/zfa-fli.c: New test.
> >> > * gcc.target/riscv/zfa-fmovh-fmovp-rv32.c: New test.
> >> > * gcc.target/riscv/zfa-fround-rv32.c: New test.
> >> > * gcc.target/riscv/zfa-fround.c: New test.
> >> > ---
> >> >  gcc/common/config/riscv/riscv-common.cc   |   4 +
> >> >  gcc/config/riscv/constraints.md   |  11 +-
> >> >  gcc/config/riscv/iterators.md |   5 +
> >> >  gcc/config/riscv/riscv-opts.h |   3 +
> >> >  gcc/config/riscv/riscv-protos.h   |   1 +
> >> >  gcc/config/riscv/riscv.cc
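For reference, the C23 semantics that FMINM is said to correspond to can be sketched portably. This is a minimal model under stated assumptions, not GCC's fminm expansion, and `fminimum_sketch` is a hypothetical name. Unlike the older fmin, which returns the non-NaN operand when only one input is a NaN, fminimum propagates the NaN, and it orders -0.0 below +0.0.

```c
#include <assert.h>
#include <math.h>

/* Sketch of C23 fminimum semantics: any NaN input yields a NaN, and
   -0.0 is treated as smaller than +0.0.  */
static double fminimum_sketch (double a, double b)
{
  if (isnan (a) || isnan (b))
    return NAN;
  if (a == b)                 /* Equal values, possibly -0.0 vs +0.0.  */
    return signbit (a) ? a : b;
  return a < b ? a : b;
}
```

fmaximum follows the same pattern with the comparisons reversed, preferring +0.0 over -0.0.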

Re: [PATCH 5/5] match.pd: Use splits in makefile and make configurable.

2023-05-05 Thread David Edelsohn via Gcc-patches

On Fri, May 5, 2023 at 11:38 AM Tamar Christina 
wrote:

> > -Original Message-
> > From: Jakub Jelinek 
> > Sent: Friday, May 5, 2023 4:33 PM
> > To: Tamar Christina 
> > Cc: Jeff Law ; David Edelsohn  >;
> > GCC Patches 
> > Subject: Re: [PATCH 5/5] match.pd: Use splits in makefile and make
> > configurable.
> >
> > On Fri, May 05, 2023 at 03:22:11PM +, Tamar Christina wrote:
> > > > We require GNU make, so perhaps we could use something like
> > > > $(wordlist 1,$(NUM_MATCH_SPLITS),$(check_p_numbers))
> > > > instead of
> > > > $(shell seq 1 $(NUM_MATCH_SPLITS))
> > > > provided we move the check_p_numbers definition earlier (or perhaps
> > > > better rename it to something more generic, so that it is clear
> > > > that is a variable holding numbers from 1 to .
> > >
> > > I'm currently testing
> > >
> > > NUM_MATCH_SPLITS = @DEFAULT_MATCHPD_PARTITIONS@
> > > -MATCH_SPLITS_SEQ = $(shell seq 1 $(NUM_MATCH_SPLITS))
> > > +MATCH_SPLITS_SEQ = $(shell echo {1..$(NUM_MATCH_SPLITS)})
> > >
> > > Which seems to work since it looks like we require an sh compatible shell.
> > > Question is this right? From the existing
> >
> > AIX /bin/sh certainly doesn't handle that.
>
> Wow, wonder what sh version it has..
>
> >
> > But what do I know about AIX...
>
> Same..
>

AIX defaults to Korn Shell.

I always use Bash on AIX to build GCC and recommend Bash in the GCC build
instructions for AIX.

Do we want to require Bash?  Bash is a more self-contained requirement than
seq from coreutils.

Thanks, David


RE: [PATCH 5/5] match.pd: Use splits in makefile and make configurable.

2023-05-05 Thread Tamar Christina via Gcc-patches

> -Original Message-
> From: Jakub Jelinek 
> Sent: Friday, May 5, 2023 4:33 PM
> To: Tamar Christina 
> Cc: Jeff Law ; David Edelsohn ;
> GCC Patches 
> Subject: Re: [PATCH 5/5] match.pd: Use splits in makefile and make
> configurable.
> 
> On Fri, May 05, 2023 at 03:22:11PM +, Tamar Christina wrote:
> > > We require GNU make, so perhaps we could use something like
> > > $(wordlist 1,$(NUM_MATCH_SPLITS),$(check_p_numbers))
> > > instead of
> > > $(shell seq 1 $(NUM_MATCH_SPLITS))
> > > provided we move the check_p_numbers definition earlier (or perhaps
> > > better rename it to something more generic, so that it is clear
> > > that is a variable holding numbers from 1 to .
> >
> > I'm currently testing
> >
> > NUM_MATCH_SPLITS = @DEFAULT_MATCHPD_PARTITIONS@
> > -MATCH_SPLITS_SEQ = $(shell seq 1 $(NUM_MATCH_SPLITS))
> > +MATCH_SPLITS_SEQ = $(shell echo {1..$(NUM_MATCH_SPLITS)})
> >
> > Which seems to work since it looks like we require an sh compatible shell.
> >
> > Question is this right? From the existing
> 
> AIX /bin/sh certainly doesn't handle that.

Wow, wonder what sh version it has..

> 
> But what do I know about AIX...

Same..

> 
> This seems to work and we use it already in the Makefile.
> If something else works portably, we could change both spots...
> 
> 2023-05-05  Jakub Jelinek  
> 
>   * Makefile.in (check_p_numbers): Rename to one_to_, move
>   earlier with helper variables also renamed.
>   (MATCH_SPLITS_SEQ): Use $(wordlist 1,$(NUM_MATCH_SPLITS),$(one_to_))
>   instead of $(shell seq 1 $(NUM_MATCH_SPLITS)).
>   (check_p_subdirs): Use $(one_to_) instead of $(check_p_numbers).
> 
> --- gcc/Makefile.in.jj	2023-05-05 16:02:37.180575333 +0200
> +++ gcc/Makefile.in	2023-05-05 17:20:27.923251821 +0200
> @@ -214,9 +214,19 @@ rtl-ssa-warn = $(STRICT_WARN)
>  GCC_WARN_CFLAGS = $(LOOSE_WARN) $(C_LOOSE_WARN) $($(@D)-warn) $(if $(filter-out $(STRICT_WARN),$($(@D)-warn)),,$(C_STRICT_WARN)) $(NOCOMMON_FLAG) $($@-warn)
>  GCC_WARN_CXXFLAGS = $(LOOSE_WARN) $($(@D)-warn) $(NOCOMMON_FLAG) $($@-warn)
>  
> +# 1 2 3 ...
> +one_to__0:=1 2 3 4 5 6 7 8 9
> +one_to__1:=0 $(one_to__0)
> +one_to__2:=$(foreach i,$(one_to__0),$(addprefix $(i),$(one_to__1)))
> +one_to__3:=$(addprefix 0,$(one_to__1)) $(one_to__2)
> +one_to__4:=$(foreach i,$(one_to__0),$(addprefix $(i),$(one_to__3)))
> +one_to__5:=$(addprefix 0,$(one_to__3)) $(one_to__4)
> +one_to__6:=$(foreach i,$(one_to__0),$(addprefix $(i),$(one_to__5)))
> +one_to_:=$(one_to__0) $(one_to__2) $(one_to__4) $(one_to__6)
> +
>  # The number of splits to be made for the match.pd files.
>  NUM_MATCH_SPLITS = @DEFAULT_MATCHPD_PARTITIONS@
> -MATCH_SPLITS_SEQ = $(shell seq 1 $(NUM_MATCH_SPLITS))
> +MATCH_SPLITS_SEQ = $(wordlist 1,$(NUM_MATCH_SPLITS),$(one_to_))
>  GIMPLE_MATCH_PD_SEQ_SRC = $(patsubst %, gimple-match-%.cc, $(MATCH_SPLITS_SEQ))
>  GIMPLE_MATCH_PD_SEQ_O = $(patsubst %, gimple-match-%.o, $(MATCH_SPLITS_SEQ))
>  GENERIC_MATCH_PD_SEQ_SRC = $(patsubst %, generic-match-%.cc, $(MATCH_SPLITS_SEQ))
> @@ -4234,18 +4244,10 @@ $(patsubst %,%-subtargets,$(lang_checks)
>  check_p_tool=$(firstword $(subst _, ,$*))
>  check_p_count=$(check_$(check_p_tool)_parallelize)
>  check_p_subno=$(word 2,$(subst _, ,$*))
> -check_p_numbers0:=1 2 3 4 5 6 7 8 9
> -check_p_numbers1:=0 $(check_p_numbers0)
> -check_p_numbers2:=$(foreach i,$(check_p_numbers0),$(addprefix $(i),$(check_p_numbers1)))
> -check_p_numbers3:=$(addprefix 0,$(check_p_numbers1)) $(check_p_numbers2)
> -check_p_numbers4:=$(foreach i,$(check_p_numbers0),$(addprefix $(i),$(check_p_numbers3)))
> -check_p_numbers5:=$(addprefix 0,$(check_p_numbers3)) $(check_p_numbers4)
> -check_p_numbers6:=$(foreach i,$(check_p_numbers0),$(addprefix $(i),$(check_p_numbers5)))
> -check_p_numbers:=$(check_p_numbers0) $(check_p_numbers2) $(check_p_numbers4) $(check_p_numbers6)
>  check_p_subdir=$(subst _,,$*)
>  check_p_subdirs=$(wordlist 1,$(check_p_count),$(wordlist 1, \
>  	$(if $(GCC_TEST_PARALLEL_SLOTS),$(GCC_TEST_PARALLEL_SLOTS),128), \
> -	$(check_p_numbers)))
> +	$(one_to_)))

Thanks. If it works I'm happy; I can rebase my other patches to use this.

Thank you!

Regards,
Tamar

> 
>  # For parallelized check-% targets, this decides whether parallelization
>  # is desirable (if -jN is used).  If desirable, recursive make is run with
> 
> 
>   Jakub

Re: [PATCH 5/5] match.pd: Use splits in makefile and make configurable.

2023-05-05 Thread Jakub Jelinek via Gcc-patches

On Fri, May 05, 2023 at 03:22:11PM +, Tamar Christina wrote:
> > We require GNU make, so perhaps we could use something like $(wordlist
> > 1,$(NUM_MATCH_SPLITS),$(check_p_numbers))
> > instead of
> > $(shell seq 1 $(NUM_MATCH_SPLITS))
> > provided we move the check_p_numbers definition earlier (or perhaps better
> > rename it to something more generic, so that it is clear that it is a variable
> > holding numbers from 1 to .
> 
> I'm currently testing
> 
> NUM_MATCH_SPLITS = @DEFAULT_MATCHPD_PARTITIONS@
> -MATCH_SPLITS_SEQ = $(shell seq 1 $(NUM_MATCH_SPLITS))
> +MATCH_SPLITS_SEQ = $(shell echo {1..$(NUM_MATCH_SPLITS)})
> 
> Which seems to work since it looks like we require an sh compatible shell.
> 
> Question: is this right? From the existing

AIX /bin/sh certainly doesn't handle that.

But what do I know about AIX...

This seems to work and we use it already in the Makefile.
If something else works portably, we could change both spots...

2023-05-05  Jakub Jelinek  

* Makefile.in (check_p_numbers): Rename to one_to_, move
earlier with helper variables also renamed.
(MATCH_SPLITS_SEQ): Use $(wordlist 1,$(NUM_MATCH_SPLITS),$(one_to_))
instead of $(shell seq 1 $(NUM_MATCH_SPLITS)).
(check_p_subdirs): Use $(one_to_) instead of $(check_p_numbers).

--- gcc/Makefile.in.jj  2023-05-05 16:02:37.180575333 +0200
+++ gcc/Makefile.in 2023-05-05 17:20:27.923251821 +0200
@@ -214,9 +214,19 @@ rtl-ssa-warn = $(STRICT_WARN)
 GCC_WARN_CFLAGS = $(LOOSE_WARN) $(C_LOOSE_WARN) $($(@D)-warn) $(if $(filter-out $(STRICT_WARN),$($(@D)-warn)),,$(C_STRICT_WARN)) $(NOCOMMON_FLAG) $($@-warn)
 GCC_WARN_CXXFLAGS = $(LOOSE_WARN) $($(@D)-warn) $(NOCOMMON_FLAG) $($@-warn)
 
+# 1 2 3 ... 
+one_to__0:=1 2 3 4 5 6 7 8 9
+one_to__1:=0 $(one_to__0)
+one_to__2:=$(foreach i,$(one_to__0),$(addprefix $(i),$(one_to__1)))
+one_to__3:=$(addprefix 0,$(one_to__1)) $(one_to__2)
+one_to__4:=$(foreach i,$(one_to__0),$(addprefix $(i),$(one_to__3)))
+one_to__5:=$(addprefix 0,$(one_to__3)) $(one_to__4)
+one_to__6:=$(foreach i,$(one_to__0),$(addprefix $(i),$(one_to__5)))
+one_to_:=$(one_to__0) $(one_to__2) $(one_to__4) $(one_to__6)
+
 # The number of splits to be made for the match.pd files.
 NUM_MATCH_SPLITS = @DEFAULT_MATCHPD_PARTITIONS@
-MATCH_SPLITS_SEQ = $(shell seq 1 $(NUM_MATCH_SPLITS))
+MATCH_SPLITS_SEQ = $(wordlist 1,$(NUM_MATCH_SPLITS),$(one_to_))
 GIMPLE_MATCH_PD_SEQ_SRC = $(patsubst %, gimple-match-%.cc, $(MATCH_SPLITS_SEQ))
 GIMPLE_MATCH_PD_SEQ_O = $(patsubst %, gimple-match-%.o, $(MATCH_SPLITS_SEQ))
 GENERIC_MATCH_PD_SEQ_SRC = $(patsubst %, generic-match-%.cc, $(MATCH_SPLITS_SEQ))
@@ -4234,18 +4244,10 @@ $(patsubst %,%-subtargets,$(lang_checks)
 check_p_tool=$(firstword $(subst _, ,$*))
 check_p_count=$(check_$(check_p_tool)_parallelize)
 check_p_subno=$(word 2,$(subst _, ,$*))
-check_p_numbers0:=1 2 3 4 5 6 7 8 9
-check_p_numbers1:=0 $(check_p_numbers0)
-check_p_numbers2:=$(foreach i,$(check_p_numbers0),$(addprefix $(i),$(check_p_numbers1)))
-check_p_numbers3:=$(addprefix 0,$(check_p_numbers1)) $(check_p_numbers2)
-check_p_numbers4:=$(foreach i,$(check_p_numbers0),$(addprefix $(i),$(check_p_numbers3)))
-check_p_numbers5:=$(addprefix 0,$(check_p_numbers3)) $(check_p_numbers4)
-check_p_numbers6:=$(foreach i,$(check_p_numbers0),$(addprefix $(i),$(check_p_numbers5)))
-check_p_numbers:=$(check_p_numbers0) $(check_p_numbers2) $(check_p_numbers4) $(check_p_numbers6)
 check_p_subdir=$(subst _,,$*)
 check_p_subdirs=$(wordlist 1,$(check_p_count),$(wordlist 1, \
    $(if $(GCC_TEST_PARALLEL_SLOTS),$(GCC_TEST_PARALLEL_SLOTS),128), \
-   $(check_p_numbers)))
+   $(one_to_)))
 
 # For parallelized check-% targets, this decides whether parallelization
 # is desirable (if -jN is used).  If desirable, recursive make is run with


Jakub
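The one_to_ construction produces a sequence of numbers using only GNU make built-ins. It relies on zero-padded intermediate lists so that each new digit can be prefixed in order. The small C++ model below shows why the result is the unpadded sequence 1, 2, ..., 9999. The helpers addprefix, foreach_prefix and wordlist are hypothetical stand-ins that imitate GNU make's $(addprefix), $(foreach ...) and $(wordlist). The local names n0..n6 mirror one_to__0..one_to__6 from the patch. This is a sketch for illustration, not code from GCC.

```cpp
#include <cassert>
#include <cstddef>
#include <string>
#include <vector>

using Words = std::vector<std::string>;

// Like GNU make's $(addprefix P,LIST): prepend P to every word.
Words addprefix(const std::string &p, const Words &list) {
  Words out;
  for (const auto &w : list) out.push_back(p + w);
  return out;
}

// Like writing $(A) $(B) next to each other in make: concatenate the lists.
Words concat(Words a, const Words &b) {
  a.insert(a.end(), b.begin(), b.end());
  return a;
}

// Like $(foreach i,DIGITS,$(addprefix $(i),TAIL)).
Words foreach_prefix(const Words &digits, const Words &tail) {
  Words out;
  for (const auto &d : digits) out = concat(out, addprefix(d, tail));
  return out;
}

// Like $(wordlist S,E,LIST): words S..E (1-based, inclusive), clipped to LIST.
Words wordlist(std::size_t s, std::size_t e, const Words &list) {
  assert(s >= 1 && "GNU make rejects a start index of 0");
  Words out;
  for (std::size_t i = s; i <= e && i <= list.size(); ++i)
    out.push_back(list[i - 1]);
  return out;
}

// Mirrors one_to__0 .. one_to__6 and one_to_ from the patch.
Words one_to() {
  const Words n0 = {"1", "2", "3", "4", "5", "6", "7", "8", "9"};
  const Words n1 = concat(Words{"0"}, n0);          // 0..9
  const Words n2 = foreach_prefix(n0, n1);          // 10..99
  const Words n3 = concat(addprefix("0", n1), n2);  // 00..99 (2 digits, padded)
  const Words n4 = foreach_prefix(n0, n3);          // 100..999
  const Words n5 = concat(addprefix("0", n3), n4);  // 000..999 (3 digits, padded)
  const Words n6 = foreach_prefix(n0, n5);          // 1000..9999
  return concat(concat(concat(n0, n2), n4), n6);    // 1..9999, no leading zeros
}
```

With NUM_MATCH_SPLITS set to 10, $(wordlist 1,10,$(one_to_)) therefore yields the same words as `seq 1 10`. No external command is needed, which is what matters on hosts without coreutils.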

RE: [PATCH 5/5] match.pd: Use splits in makefile and make configurable.

2023-05-05 Thread Tamar Christina via Gcc-patches

> -Original Message-
> From: Jakub Jelinek 
> Sent: Friday, May 5, 2023 4:18 PM
> To: Jeff Law 
> Cc: David Edelsohn ; Tamar Christina
> ; GCC Patches 
> Subject: Re: [PATCH 5/5] match.pd: Use splits in makefile and make
> configurable.
> 
> On Fri, May 05, 2023 at 09:04:16AM -0600, Jeff Law via Gcc-patches wrote:
> > On 5/5/23 08:59, David Edelsohn via Gcc-patches wrote:
> > > This patch has broken GCC bootstrap on AIX.  It appears to rely
> > > upon, or complain about, the command "seq":
> > >
> > > /nasfarm/edelsohn/install/GCC12/bin/g++ -std=c++11   -g -DIN_GCC
> > > -fno-exceptions -fno-rtti -fasynchronous-unwind-tables -W -Wall
> > > -Wno-narrowing -Wwrite-strings -Wcast-qual -Wno-format
> > > -Wmissing-format-attribute -Wconditionally-supported
> > > -Woverloaded-virtual -pedantic -Wno-long-long -Wno-variadic-macros
> > > -Wno-overlength-strings -fno-common  -DHAVE_CONFIG_H
> > > -DGENERATOR_FILE -static-libstdc++ -static-libgcc -Wl,-bbigtoc -Wl,-bmaxdata:0x4000 -o build/genmatch \
> > >  build/genmatch.o
> > > ../build-powerpc-ibm-aix7.2.5.0/libcpp/libcpp.a
> > > build/errors.o build/vec.o build/hash-table.o build/sort.o
> > > ../build-powerpc-ibm-aix7.2.5.0/libiberty/libiberty.a
> > > /usr/bin/bash: seq: command not found
> > > /usr/bin/bash: seq: command not found
> > > build/genmatch --gimple \
> > >  --header=tmp-gimple-match-auto.h --include=gimple-match-auto.h \
> > >  /nasfarm/edelsohn/src/src/gcc/match.pd
> > >
> > > All of the match files are dumped to stdout.
> > Sigh.  So the question is do we make seq a requirement or do we
> > implement an alternate to get the sequence or implement a fallback.
> 
> We require GNU make, so perhaps we could use something like $(wordlist
> 1,$(NUM_MATCH_SPLITS),$(check_p_numbers))
> instead of
> $(shell seq 1 $(NUM_MATCH_SPLITS))
> provided we move the check_p_numbers definition earlier (or perhaps better
> rename it to something more generic, so that it is clear that it is a variable
> holding numbers from 1 to .

I'm currently testing

NUM_MATCH_SPLITS = @DEFAULT_MATCHPD_PARTITIONS@
-MATCH_SPLITS_SEQ = $(shell seq 1 $(NUM_MATCH_SPLITS))
+MATCH_SPLITS_SEQ = $(shell echo {1..$(NUM_MATCH_SPLITS)})

Which seems to work since it looks like we require an sh compatible shell.

Question: is this right? From the existing

$(foreach header_var,$(shell sed < Makefile -n -e 's/^\([A-Z0-9_]*_H\)[ ]*=.*/\1/p'),echo $(header_var)=$(shell echo $($(header_var):$(srcdir)/%=.../%) | sed -e 's~\.\.\./config/~config/~' -e 's~\.\.\./common/config/~common/config/~' -e 's~\.\.\.[^]*/~~g') >> tmp-header-vars;)

rule, this seems to be correct.

Thanks,
Tamar

> 
>   Jakub

Re: [PATCH 5/5] match.pd: Use splits in makefile and make configurable.

2023-05-05 Thread Jakub Jelinek via Gcc-patches

On Fri, May 05, 2023 at 09:04:16AM -0600, Jeff Law via Gcc-patches wrote:
> On 5/5/23 08:59, David Edelsohn via Gcc-patches wrote:
> > This patch has broken GCC bootstrap on AIX.  It appears to rely upon, or
> > complain about, the command "seq":
> > 
> > /nasfarm/edelsohn/install/GCC12/bin/g++ -std=c++11   -g -DIN_GCC
> > -fno-exceptions -fno-rtti -fasynchronous-unwind-tables -W -Wall
> > -Wno-narrowing -Wwrite-strings -Wcast-qual -Wno-format
> > -Wmissing-format-attribute -Wconditionally-supported -Woverloaded-virtual
> > -pedantic -Wno-long-long -Wno-variadic-macros -Wno-overlength-strings
> > -fno-common  -DHAVE_CONFIG_H  -DGENERATOR_FILE -static-libstdc++
> > -static-libgcc -Wl,-bbigtoc -Wl,-bmaxdata:0x4000 -o build/genmatch \
> >  build/genmatch.o ../build-powerpc-ibm-aix7.2.5.0/libcpp/libcpp.a
> > build/errors.o build/vec.o build/hash-table.o build/sort.o
> > ../build-powerpc-ibm-aix7.2.5.0/libiberty/libiberty.a
> > /usr/bin/bash: seq: command not found
> > /usr/bin/bash: seq: command not found
> > build/genmatch --gimple \
> >  --header=tmp-gimple-match-auto.h --include=gimple-match-auto.h \
> >  /nasfarm/edelsohn/src/src/gcc/match.pd
> > 
> > All of the match files are dumped to stdout.
> Sigh.  So the question is do we make seq a requirement or do we implement an
> alternate to get the sequence or implement a fallback.

We require GNU make, so perhaps we could use something like
$(wordlist 1,$(NUM_MATCH_SPLITS),$(check_p_numbers))
instead of
$(shell seq 1 $(NUM_MATCH_SPLITS))
provided we move the check_p_numbers definition earlier (or perhaps better
rename it to something more generic, so that it is clear that it is a variable
holding numbers from 1 to .

Jakub

[PATCH] Move substitute_and_fold over to use simple_dce_from_worklist

2023-05-05 Thread Andrew Pinski via Gcc-patches

While looking into a different issue, I noticed that some forward
propagation did not happen until the second forwprop pass.  This was
because the SSA name was used more than once, but the second use was
in a "dead" statement that we don't remove until much later.

So this patch uses simple_dce_from_worklist instead of manually
removing the known-unused statements.
The propagation engine does not run a CFG cleanup afterwards either, but
it manually cleans up possible EH edges, so simple_dce_from_worklist
needs to communicate the affected blocks back to the propagation engine.
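The worklist idea behind simple_dce_from_worklist can be sketched on a toy IR. This is a minimal illustration under stated assumptions, not GCC's implementation. Stmt, Function and simple_dce_sketch are hypothetical names, SSA names are plain integers, and "can throw" is just a flag. The sketch shows the two properties the description relies on. First, deleting a dead definition can make its operands dead in turn. Second, the blocks that lost a possibly-throwing statement are reported back through a set that plays the role of the new need_eh_cleanup argument.

```cpp
#include <cassert>
#include <set>
#include <vector>

// Hypothetical toy IR (not GCC's gimple): SSA name i is defined by defs[i].
struct Stmt {
  std::vector<int> operands;      // SSA names this statement uses
  bool has_side_effects = false;  // e.g. calls or stores: never removed here
  bool can_throw = false;         // may throw, so its block may have EH edges
  int bb = 0;                     // index of the containing basic block
  bool removed = false;
};

struct Function {
  std::vector<Stmt> defs;      // defs[i] defines SSA name i
  std::vector<int> use_count;  // live uses of each SSA name
};

// Pop a name, delete its definition if it is unused and side-effect free,
// then queue every operand whose last use just disappeared.  Blocks that
// lost a throwing statement go into need_eh_cleanup for the caller.
int simple_dce_sketch(Function &fn, std::set<int> worklist,
                      std::set<int> &need_eh_cleanup) {
  int removed = 0;
  while (!worklist.empty()) {
    int name = *worklist.begin();
    worklist.erase(worklist.begin());
    Stmt &s = fn.defs[name];
    if (s.removed || s.has_side_effects || fn.use_count[name] != 0)
      continue;
    if (s.can_throw)
      need_eh_cleanup.insert(s.bb);
    s.removed = true;
    ++removed;
    for (int op : s.operands)
      if (--fn.use_count[op] == 0)
        worklist.insert(op);
  }
  return removed;
}
```

Seeding the worklist with a single dead name is enough to remove a whole chain of definitions that become dead one after another. Removing only a precomputed list of statements cannot achieve that.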

Some testcases needed to be updated because of the better optimization.
gcc.dg/pr81192.c had to be rewritten to use the GIMPLE FE so that it will
be less fragile in the future.
gcc.dg/tree-ssa/pr98737-1.c was failing because __atomic_fetch_ was being
matched, but in those cases the result was not used, so both __atomic_fetch_
and __atomic_x_and_fetch_ are valid choices and would not make a code
generation difference.
evrp7.c, evrp8.c, vrp35.c, vrp36.c: just needed a slight change because the
removal message is slightly different.
kernels-alias-8.c: ccp1 is able to remove an unused load, which leaves ealias
with one less load to analyze, so update the expected scan count.

OK? Bootstrapped and tested on x86_64-linux-gnu with no regressions.

gcc/ChangeLog:

PR tree-optimization/109691
* tree-ssa-dce.cc (simple_dce_from_worklist): Add need_eh_cleanup
argument.
If the removed statement can throw, have need_eh_cleanup
include the bb of that statement.
* tree-ssa-dce.h (simple_dce_from_worklist): Update declaration.
* tree-ssa-propagate.cc (struct prop_stats_d): Remove
num_dce.
(substitute_and_fold_dom_walker::substitute_and_fold_dom_walker):
Initialize dceworklist instead of stmts_to_remove.
(substitute_and_fold_dom_walker::~substitute_and_fold_dom_walker):
Destroy dceworklist instead of stmts_to_remove.
(substitute_and_fold_dom_walker::before_dom_children):
Set dceworklist instead of adding to stmts_to_remove.
(substitute_and_fold_engine::substitute_and_fold):
Call simple_dce_from_worklist instead of popping
from the list.
Don't update the stat on removal statements.

gcc/testsuite/ChangeLog:

* gcc.dg/tree-ssa/evrp7.c: Update for output change.
* gcc.dg/tree-ssa/evrp8.c: Likewise.
* gcc.dg/tree-ssa/vrp35.c: Likewise.
* gcc.dg/tree-ssa/vrp36.c: Likewise.
* gcc.dg/tree-ssa/pr98737-1.c: Update scan-tree-dump-not
to check for assignment too instead of just a call.
* c-c++-common/goacc/kernels-alias-8.c: Update test
for removal of load.
* gcc.dg/pr81192.c: Rewrite testcase in gimple based test.
---
 .../c-c++-common/goacc/kernels-alias-8.c  |  6 +-
 gcc/testsuite/gcc.dg/pr81192.c| 64 ---
 gcc/testsuite/gcc.dg/tree-ssa/evrp7.c |  2 +-
 gcc/testsuite/gcc.dg/tree-ssa/evrp8.c |  2 +-
 gcc/testsuite/gcc.dg/tree-ssa/pr98737-1.c |  7 +-
 gcc/testsuite/gcc.dg/tree-ssa/vrp35.c |  2 +-
 gcc/testsuite/gcc.dg/tree-ssa/vrp36.c |  2 +-
 gcc/tree-ssa-dce.cc   |  7 +-
 gcc/tree-ssa-dce.h|  2 +-
 gcc/tree-ssa-propagate.cc | 39 ++-
 10 files changed, 82 insertions(+), 51 deletions(-)

diff --git a/gcc/testsuite/c-c++-common/goacc/kernels-alias-8.c b/gcc/testsuite/c-c++-common/goacc/kernels-alias-8.c
index 69200ccf192..c3922e33241 100644
--- a/gcc/testsuite/c-c++-common/goacc/kernels-alias-8.c
+++ b/gcc/testsuite/c-c++-common/goacc/kernels-alias-8.c
@@ -16,7 +16,9 @@ foo (int *a, size_t n)
   }
 }
 
-/* Only the omp_data_i related loads should be annotated with cliques.  */
-/* { dg-final { scan-tree-dump-times "clique 1 base 1" 2 "ealias" } } */
+/* Only the omp_data_i related loads should be annotated with cliques.
+   Note ccp can remove one of the omp_data_i loads, which is why there
+   is only one clique base left.  */
+/* { dg-final { scan-tree-dump-times "clique 1 base 1" 1 "ealias" } } */
 /* { dg-final { scan-tree-dump-times "(?n)clique 1 base 0" 2 "ealias" } } */
 
diff --git a/gcc/testsuite/gcc.dg/pr81192.c b/gcc/testsuite/gcc.dg/pr81192.c
index 6cab6056558..f6d201ee71a 100644
--- a/gcc/testsuite/gcc.dg/pr81192.c
+++ b/gcc/testsuite/gcc.dg/pr81192.c
@@ -1,5 +1,58 @@
-/* { dg-options "-Os -fdump-tree-pre-details -fdisable-tree-evrp -fno-tree-dse" } */
+/* { dg-options "-Os -fgimple -fdump-tree-pre-details -fdisable-tree-evrp -fno-tree-dse" } */
 
+#if __SIZEOF_INT__ == 2
+#define unsigned __UINT32_TYPE__
+#define int __INT32_TYPE__
+#endif
+
+unsigned a;
+int b, c;
+
+void __GIMPLE(ssa, startwith("pre")) fn2   ()
+{
+  int b_lsm6;
+  int j;
+  int c0_1;
+  int iftmp2_8;
+
+  __BB(2):
+  a = _Literal (unsigned)30;
+  c0_1 = c;
+  b_lsm6_9 = b;
+  goto __BB7;
+
+  __BB(3):
+  if (j_6(

Re: [PATCH 5/5] match.pd: Use splits in makefile and make configurable.

2023-05-05 Thread Jeff Law via Gcc-patches

On 5/5/23 09:08, Tamar Christina wrote:

-Original Message-
From: Jeff Law 
Sent: Friday, May 5, 2023 4:04 PM
To: David Edelsohn ; Tamar Christina

Cc: GCC Patches 
Subject: Re: [PATCH 5/5] match.pd: Use splits in makefile and make
configurable.

On 5/5/23 08:59, David Edelsohn via Gcc-patches wrote:

This patch has broken GCC bootstrap on AIX.  It appears to rely upon,
or complain about, the command "seq":

/nasfarm/edelsohn/install/GCC12/bin/g++ -std=c++11   -g -DIN_GCC
-fno-exceptions -fno-rtti -fasynchronous-unwind-tables -W -Wall
-Wno-narrowing -Wwrite-strings -Wcast-qual -Wno-format
-Wmissing-format-attribute -Wconditionally-supported
-Woverloaded-virtual -pedantic -Wno-long-long -Wno-variadic-macros
-Wno-overlength-strings -fno-common  -DHAVE_CONFIG_H  -DGENERATOR_FILE
-static-libstdc++ -static-libgcc -Wl,-bbigtoc -Wl,-bmaxdata:0x4000 -o build/genmatch \

  build/genmatch.o ../build-powerpc-ibm-aix7.2.5.0/libcpp/libcpp.a
build/errors.o build/vec.o build/hash-table.o build/sort.o
../build-powerpc-ibm-aix7.2.5.0/libiberty/libiberty.a
/usr/bin/bash: seq: command not found
/usr/bin/bash: seq: command not found
build/genmatch --gimple \
  --header=tmp-gimple-match-auto.h --include=gimple-match-auto.h \
  /nasfarm/edelsohn/src/src/gcc/match.pd

All of the match files are dumped to stdout.

Sigh.  So the question is do we make seq a requirement or do we implement an
alternate to get the sequence or implement a fallback.

jeff

I'm looking for an alternate sequence now.

If I don't find one in a bit, since Monday is a bank holiday in the UK, I can
temporarily ignore the configure flag by defining

MATCH_SPLITS_SEQ = 1 2 3 4 5 6 7 8 9 10

Would that be OK as a temporary fix if I don't find anything else by EOD? I'm
hoping to find another way that doesn't rely on coreutils, though.

Yea, that would be a fine workaround while we sort this out.
jeff

Re: [PATCH v8] RISC-V: Add the 'zfa' extension, version 0.2.

2023-05-05 Thread Palmer Dabbelt


On Fri, 05 May 2023 08:04:53 PDT (-0700), christoph.muell...@vrull.eu wrote:

What I forgot to mention:
Zfa is frozen and in public review:
  https://groups.google.com/a/groups.riscv.org/g/isa-dev/c/SED4ntBkabg


Thanks, I'd also forgotten to send that out ;).

I think the only blocker here on the specification side is the assembly 
format for FLI?  It looks like the feedback on 
 has been 
pretty minor so far.  It'd be nice to have the docs lined up before 
we merge, but we could always just call it a GNU extension -- we've 
already got a lot of that in assembler land, so I don't think it's that 
big of a deal.
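To make the FLI discussion concrete, here is a sketch of the table lookup that the patch's riscv_float_const_rtx_index_for_fli / find_index_in_array presumably perform. It works on a plain double instead of GCC's rtx. The table is my transcription of the Zfa FLI.D constants (index = rs1) and should be checked against the spec under review. fli_index_for is a hypothetical name. Index 1 holds the format's minimum positive normal, which is DBL_MIN for FLI.D. Index 31 holds the canonical NaN. That entry is matched by bit pattern because NaN never compares equal.

```cpp
#include <array>
#include <cassert>
#include <cfloat>
#include <cmath>
#include <cstddef>
#include <cstdint>
#include <cstring>
#include <limits>

// ASSUMPTION: transcription of the Zfa FLI.D constant table, index = rs1.
// Entry 31 (canonical NaN) is handled separately in fli_index_for.
const std::array<double, 31> kFliTable = {
    -1.0, DBL_MIN, std::ldexp(1.0, -16), std::ldexp(1.0, -15),
    std::ldexp(1.0, -8), std::ldexp(1.0, -7), 0.0625, 0.125,
    0.25, 0.3125, 0.375, 0.4375, 0.5, 0.625, 0.75, 0.875,
    1.0, 1.25, 1.5, 1.75, 2.0, 2.5, 3.0, 4.0,
    8.0, 16.0, 128.0, 256.0, std::ldexp(1.0, 15), std::ldexp(1.0, 16),
    std::numeric_limits<double>::infinity()};

// Hypothetical helper: the rs1 index FLI.D would need to materialise V,
// or -1 if V is not in the table (0.0, for instance, is not listed).
int fli_index_for(double v) {
  if (std::isnan(v)) {
    // Only the canonical quiet NaN (0x7ff8000000000000) is encodable.
    std::uint64_t bits;
    std::memcpy(&bits, &v, sizeof bits);
    return bits == 0x7ff8000000000000ULL ? 31 : -1;
  }
  for (std::size_t i = 0; i < kFliTable.size(); ++i)
    if (kFliTable[i] == v)
      return static_cast<int>(i);
  return -1;
}
```

A value that returns -1 would still need a constant-pool load or another instruction sequence. This is why the ChangeLog touches riscv_cannot_force_const_mem and riscv_const_insns.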




On Fri, May 5, 2023 at 5:03 PM Christoph Müllner
 wrote:


On Wed, Apr 19, 2023 at 11:58 AM Jin Ma  wrote:
>
> This patch adds the 'Zfa' extension for riscv, which is based on:
>   https://github.com/riscv/riscv-isa-manual/commits/zfb
>   https://github.com/riscv/riscv-isa-manual/commit/1f038182810727f5feca311072e630d6baac51da
>
> The binutils-gdb for 'Zfa' extension:
>   https://github.com/a4lg/binutils-gdb/commits/riscv-zfa
>
> What needs special explanation is:
> 1. The immediate of the FLI.H/S/D instructions is represented in the
>   assembly as a floating-point value, using scientific notation when rs1
>   is 1 or 2 and decimal notation for the rest.
>
>   Related llvm link:
> https://reviews.llvm.org/D145645
>   Related discussion link:
> https://github.com/riscv/riscv-isa-manual/issues/980
>
> 2. According to the RISC-V spec, "The FCVTMOD.W.D instruction was added
>   principally to accelerate the processing of JavaScript Numbers.", so it
>   seems that no implementation is required.
>
> 3. The instructions FMINM and FMAXM correspond to the C23 library
>   functions fminimum and fmaximum.  Therefore, this patch simply
>   implements the fminm3 and fmaxm3 patterns to prepare for later.
>
> gcc/ChangeLog:
>
> * common/config/riscv/riscv-common.cc: Add zfa extension version.
> * config/riscv/constraints.md (Zf): Constrain the floating-point
> number that the instructions FLI.H/S/D can load.
> ((TARGET_XTHEADFMV || TARGET_ZFA) ? FP_REGS : NO_REGS): Enable
> FMVP.D.X and FMVH.X.D.
> * config/riscv/iterators.md (ceil): New.
> * config/riscv/riscv-protos.h (riscv_float_const_rtx_index_for_fli):
> New.
> * config/riscv/riscv.cc (find_index_in_array): New.
> (riscv_float_const_rtx_index_for_fli): Get the index of the
> floating-point number that the instructions FLI.H/S/D can move.
> (riscv_cannot_force_const_mem): If instruction FLI.H/S/D can be
> used, memory is not applicable.
> (riscv_const_insns): The cost of FLI.H/S/D is 3.
> (riscv_legitimize_const_move): Likewise.
> (riscv_split_64bit_move_p): If instruction FLI.H/S/D can be used,
> no split is required.
> (riscv_output_move): Output the mov instructions in zfa extension.
> (riscv_print_operand): Output the floating-point value of the
> FLI.H/S/D immediate in assembly.
> (riscv_secondary_memory_needed): Likewise.
> * config/riscv/riscv.h (GP_REG_RTX_P): New.
> * config/riscv/riscv.md (fminm3): New.
>
> gcc/testsuite/ChangeLog:
>
> * gcc.target/riscv/zfa-fleq-fltq-rv32.c: New test.
> * gcc.target/riscv/zfa-fleq-fltq.c: New test.
> * gcc.target/riscv/zfa-fli-rv32.c: New test.
> * gcc.target/riscv/zfa-fli-zfh-rv32.c: New test.
> * gcc.target/riscv/zfa-fli-zfh.c: New test.
> * gcc.target/riscv/zfa-fli.c: New test.
> * gcc.target/riscv/zfa-fmovh-fmovp-rv32.c: New test.
> * gcc.target/riscv/zfa-fround-rv32.c: New test.
> * gcc.target/riscv/zfa-fround.c: New test.
> ---
>  gcc/common/config/riscv/riscv-common.cc   |   4 +
>  gcc/config/riscv/constraints.md   |  11 +-
>  gcc/config/riscv/iterators.md |   5 +
>  gcc/config/riscv/riscv-opts.h |   3 +
>  gcc/config/riscv/riscv-protos.h   |   1 +
>  gcc/config/riscv/riscv.cc | 168 +-
>  gcc/config/riscv/riscv.h  |   1 +
>  gcc/config/riscv/riscv.md | 112 +---
>  .../gcc.target/riscv/zfa-fleq-fltq-rv32.c |  19 ++
>  .../gcc.target/riscv/zfa-fleq-fltq.c  |  19 ++
>  gcc/testsuite/gcc.target/riscv/zfa-fli-rv32.c |  79 
>  .../gcc.target/riscv/zfa-fli-zfh-rv32.c   |  41 +
>  gcc/testsuite/gcc.target/riscv/zfa-fli-zfh.c  |  41 +
>  gcc/testsuite/gcc.target/riscv/zfa-fli.c  |  79 
>  .../gcc.target/riscv/zfa-fmovh-fmovp-rv32.c   |  10 ++
>  .../gcc.target/riscv/zfa-fround-rv32.c|  42 +
>  gcc/testsuite/gcc.target/riscv/zfa-fround.c   |  42 +
>  17 files changed, 652 insertions(+), 25 deletions(-)
>  create mode 100644 gcc/testsuite/gcc.target/riscv/zfa-fleq-fltq-rv32.c
>  create mode 100644 gcc/testsuite/gcc.target/riscv/zfa-fle
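Item 3 of the quoted patch description ties FMINM/FMAXM to C23 fminimum/fmaximum. These differ from the older fmin/fmax in two ways. A NaN in either operand produces NaN instead of being ignored, and -0.0 is ordered below +0.0. The C++ sketch below models those semantics in software. fminimum_sketch and fmaximum_sketch are hypothetical names, not GCC or libm functions. As I understand the Zfa spec, the hardware returns the canonical NaN; this sketch simply returns the C++ quiet NaN.

```cpp
#include <cassert>
#include <cmath>
#include <limits>

// Sketch of C23 fminimum: NaN propagates, and -0.0 < +0.0.
double fminimum_sketch(double a, double b) {
  if (std::isnan(a) || std::isnan(b))
    return std::numeric_limits<double>::quiet_NaN();
  if (a == b)                       // only differs for +0.0 vs -0.0
    return std::signbit(a) ? a : b; // prefer the negative zero
  return a < b ? a : b;
}

// Sketch of C23 fmaximum: NaN propagates, and +0.0 > -0.0.
double fmaximum_sketch(double a, double b) {
  if (std::isnan(a) || std::isnan(b))
    return std::numeric_limits<double>::quiet_NaN();
  if (a == b)                       // only differs for +0.0 vs -0.0
    return std::signbit(a) ? b : a; // prefer the positive zero
  return a > b ? a : b;
}
```

By contrast, std::fmin(NaN, 1.0) returns 1.0. That is why fminm/fmaxm need their own patterns instead of reusing the fmin/fmax expansions.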

Re: [COMMITTED] Remove deprecated range_fold_{unary, binary}_expr uses from ipa-*.

2023-05-05 Thread Martin Jambor

Hello,

On Wed, Apr 26 2023, Aldy Hernandez via Gcc-patches wrote:
> gcc/ChangeLog:
>
>   * ipa-cp.cc (ipa_vr_operation_and_type_effects): Convert to ranger API.
>   (ipa_value_range_from_jfunc): Same.
>   (propagate_vr_across_jump_function): Same.
>   * ipa-fnsummary.cc (evaluate_conditions_for_known_args): Same.
>   * ipa-prop.cc (ipa_compute_jump_functions_for_edge): Same.
>   * vr-values.cc (bounds_of_var_in_loop): Same.

thanks for taking care of the value range uses in IPA.

> ---
>  gcc/ipa-cp.cc| 28 +--
>  gcc/ipa-fnsummary.cc | 45 
>  gcc/ipa-prop.cc  |  5 ++---
>  gcc/vr-values.cc |  6 --
>  4 files changed, 57 insertions(+), 27 deletions(-)
>
> diff --git a/gcc/ipa-cp.cc b/gcc/ipa-cp.cc
> index 65c49558b58..673c40b 100644
> --- a/gcc/ipa-cp.cc
> +++ b/gcc/ipa-cp.cc
> @@ -128,6 +128,7 @@ along with GCC; see the file COPYING3.  If not see
>  #include "attribs.h"
>  #include "dbgcnt.h"
>  #include "symtab-clones.h"
> +#include "gimple-range.h"
>  
>  template  class ipcp_value;
>  
> @@ -1900,10 +1901,15 @@ ipa_vr_operation_and_type_effects (value_range *dst_vr,
>  enum tree_code operation,
>  tree dst_type, tree src_type)
>  {
> -  range_fold_unary_expr (dst_vr, operation, dst_type, src_vr, src_type);
> -  if (dst_vr->varying_p () || dst_vr->undefined_p ())
> +  if (!irange::supports_p (dst_type) || !irange::supports_p (src_type))
>  return false;
> -  return true;
> +
> +  range_op_handler handler (operation, dst_type);

Would it be possible to document the range_op_handler class somewhat?

> +  return (handler
> +   && handler.fold_range (*dst_vr, dst_type,
> +  *src_vr, value_range (dst_type))
> +   && !dst_vr->varying_p ()
> +   && !dst_vr->undefined_p ());

It looks important, but the class is not documented at all.  Although the
use of fold_range is hopefully mostly clear from its uses in this patch,
the meaning of the return value of this method and what the other methods
do are less obvious.

For example, I am curious why (not in this patch, but in the code as it
is now in the repo) uses of fold_range always seem to be preceded by
a check for supports_type_p, even though the type is then also fed into
fold_range itself.  Does the return value of fold_range mean something
slightly different from "could not deduce anything"?

Thanks!

Martin

RE: [PATCH 5/5] match.pd: Use splits in makefile and make configurable.

2023-05-05 Thread Tamar Christina via Gcc-patches

> -Original Message-
> From: Jeff Law 
> Sent: Friday, May 5, 2023 4:04 PM
> To: David Edelsohn ; Tamar Christina
> 
> Cc: GCC Patches 
> Subject: Re: [PATCH 5/5] match.pd: Use splits in makefile and make
> configurable.
> 
> 
> 
> On 5/5/23 08:59, David Edelsohn via Gcc-patches wrote:
> > This patch has broken GCC bootstrap on AIX.  It appears to rely upon,
> > or complain about, the command "seq":
> >
> > /nasfarm/edelsohn/install/GCC12/bin/g++ -std=c++11   -g -DIN_GCC
> > -fno-exceptions -fno-rtti -fasynchronous-unwind-tables -W -Wall
> > -Wno-narrowing -Wwrite-strings -Wcast-qual -Wno-format
> > -Wmissing-format-attribute -Wconditionally-supported
> > -Woverloaded-virtual -pedantic -Wno-long-long -Wno-variadic-macros
> > -Wno-overlength-strings -fno-common  -DHAVE_CONFIG_H  -DGENERATOR_FILE
> > -static-libstdc++ -static-libgcc -Wl,-bbigtoc -Wl,-bmaxdata:0x4000 -o build/genmatch \
> >  build/genmatch.o ../build-powerpc-ibm-aix7.2.5.0/libcpp/libcpp.a
> > build/errors.o build/vec.o build/hash-table.o build/sort.o
> > ../build-powerpc-ibm-aix7.2.5.0/libiberty/libiberty.a
> > /usr/bin/bash: seq: command not found
> > /usr/bin/bash: seq: command not found
> > build/genmatch --gimple \
> >  --header=tmp-gimple-match-auto.h --include=gimple-match-auto.h \
> >  /nasfarm/edelsohn/src/src/gcc/match.pd
> >
> > All of the match files are dumped to stdout.
> Sigh.  So the question is do we make seq a requirement or do we implement an
> alternate to get the sequence or implement a fallback.
> 
> jeff

I'm looking for an alternate sequence now.

If I don't find one in a bit, since Monday is a bank holiday in the UK, I can
temporarily ignore the configure flag by defining

MATCH_SPLITS_SEQ = 1 2 3 4 5 6 7 8 9 10

Would that be OK as a temporary fix if I don't find anything else by EOD? I'm
hoping to find another way that doesn't rely on coreutils, though.

Cheers,
Tamar

Re: [PATCH v8] RISC-V: Add the 'zfa' extension, version 0.2.

2023-05-05 Thread Christoph Müllner

What I forgot to mention:
Zfa is frozen and in public review:
  https://groups.google.com/a/groups.riscv.org/g/isa-dev/c/SED4ntBkabg

On Fri, May 5, 2023 at 5:03 PM Christoph Müllner
 wrote:
>
> On Wed, Apr 19, 2023 at 11:58 AM Jin Ma  wrote:
> >
> > This patch adds the 'Zfa' extension for riscv, which is based on:
> >   https://github.com/riscv/riscv-isa-manual/commits/zfb
> >   https://github.com/riscv/riscv-isa-manual/commit/1f038182810727f5feca311072e630d6baac51da
> >
> > The binutils-gdb for 'Zfa' extension:
> >   https://github.com/a4lg/binutils-gdb/commits/riscv-zfa
> >
> > What needs special explanation is:
> > 1. The immediate of the FLI.H/S/D instructions is represented in the
> >   assembly as a floating-point value, using scientific notation when rs1
> >   is 1 or 2 and decimal notation for the rest.
> >
> >   Related llvm link:
> > https://reviews.llvm.org/D145645
> >   Related discussion link:
> > https://github.com/riscv/riscv-isa-manual/issues/980
> >
> > 2. According to the RISC-V spec, "The FCVTMOD.W.D instruction was added
> >   principally to accelerate the processing of JavaScript Numbers.", so it
> >   seems that no implementation is required.
> >
> > 3. The instructions FMINM and FMAXM correspond to the C23 library
> >   functions fminimum and fmaximum.  Therefore, this patch simply
> >   implements the fminm3 and fmaxm3 patterns to prepare for later.
> >
> > gcc/ChangeLog:
> >
> > * common/config/riscv/riscv-common.cc: Add zfa extension version.
> > * config/riscv/constraints.md (Zf): Constrain the floating-point
> > number that the instructions FLI.H/S/D can load.
> > ((TARGET_XTHEADFMV || TARGET_ZFA) ? FP_REGS : NO_REGS): Enable
> > FMVP.D.X and FMVH.X.D.
> > * config/riscv/iterators.md (ceil): New.
> > * config/riscv/riscv-protos.h (riscv_float_const_rtx_index_for_fli):
> > New.
> > * config/riscv/riscv.cc (find_index_in_array): New.
> > (riscv_float_const_rtx_index_for_fli): Get the index of the
> > floating-point number that the instructions FLI.H/S/D can move.
> > (riscv_cannot_force_const_mem): If instruction FLI.H/S/D can be
> > used, memory is not applicable.
> > (riscv_const_insns): The cost of FLI.H/S/D is 3.
> > (riscv_legitimize_const_move): Likewise.
> > (riscv_split_64bit_move_p): If instruction FLI.H/S/D can be used,
> > no split is required.
> > (riscv_output_move): Output the mov instructions in zfa extension.
> > (riscv_print_operand): Output the floating-point value of the
> > FLI.H/S/D immediate in assembly.
> > (riscv_secondary_memory_needed): Likewise.
> > * config/riscv/riscv.h (GP_REG_RTX_P): New.
> > * config/riscv/riscv.md (fminm3): New.
> >
> > gcc/testsuite/ChangeLog:
> >
> > * gcc.target/riscv/zfa-fleq-fltq-rv32.c: New test.
> > * gcc.target/riscv/zfa-fleq-fltq.c: New test.
> > * gcc.target/riscv/zfa-fli-rv32.c: New test.
> > * gcc.target/riscv/zfa-fli-zfh-rv32.c: New test.
> > * gcc.target/riscv/zfa-fli-zfh.c: New test.
> > * gcc.target/riscv/zfa-fli.c: New test.
> > * gcc.target/riscv/zfa-fmovh-fmovp-rv32.c: New test.
> > * gcc.target/riscv/zfa-fround-rv32.c: New test.
> > * gcc.target/riscv/zfa-fround.c: New test.
> > ---
> >  gcc/common/config/riscv/riscv-common.cc   |   4 +
> >  gcc/config/riscv/constraints.md   |  11 +-
> >  gcc/config/riscv/iterators.md |   5 +
> >  gcc/config/riscv/riscv-opts.h |   3 +
> >  gcc/config/riscv/riscv-protos.h   |   1 +
> >  gcc/config/riscv/riscv.cc | 168 +-
> >  gcc/config/riscv/riscv.h  |   1 +
> >  gcc/config/riscv/riscv.md | 112 +---
> >  .../gcc.target/riscv/zfa-fleq-fltq-rv32.c |  19 ++
> >  .../gcc.target/riscv/zfa-fleq-fltq.c  |  19 ++
> >  gcc/testsuite/gcc.target/riscv/zfa-fli-rv32.c |  79 
> >  .../gcc.target/riscv/zfa-fli-zfh-rv32.c   |  41 +
> >  gcc/testsuite/gcc.target/riscv/zfa-fli-zfh.c  |  41 +
> >  gcc/testsuite/gcc.target/riscv/zfa-fli.c  |  79 
> >  .../gcc.target/riscv/zfa-fmovh-fmovp-rv32.c   |  10 ++
> >  .../gcc.target/riscv/zfa-fround-rv32.c|  42 +
> >  gcc/testsuite/gcc.target/riscv/zfa-fround.c   |  42 +
> >  17 files changed, 652 insertions(+), 25 deletions(-)
> >  create mode 100644 gcc/testsuite/gcc.target/riscv/zfa-fleq-fltq-rv32.c
> >  create mode 100644 gcc/testsuite/gcc.target/riscv/zfa-fleq-fltq.c
> >  create mode 100644 gcc/testsuite/gcc.target/riscv/zfa-fli-rv32.c
> >  create mode 100644 gcc/testsuite/gcc.target/riscv/zfa-fli-zfh-rv32.c
> >  create mode 100644 gcc/testsuite/gcc.target/riscv/zfa-fli-zfh.c
> >  create mode 100644 gcc/testsuite/gcc.target/riscv/zfa-fli.c
> >  create mode 100644 gcc/testsuite/gcc.ta
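Item 2 of the quoted patch description cites the spec's rationale for FCVTMOD.W.D, namely speeding up JavaScript Number handling. The conversion JavaScript needs there is ECMAScript ToInt32, sketched below in C++ under that assumption. NaN and the infinities map to 0. Other values are truncated toward zero, reduced modulo 2^32 and then reinterpreted as a signed 32-bit integer. to_int32_sketch is a hypothetical name. It models only the value result, not the instruction's exception flags.

```cpp
#include <cassert>
#include <cmath>
#include <cstdint>

// Sketch of ECMAScript ToInt32 (value result only; no FP exception flags).
std::int32_t to_int32_sketch(double d) {
  if (!std::isfinite(d))
    return 0;                                  // NaN and +/-infinity become 0
  const double two32 = 4294967296.0;           // 2^32
  double m = std::fmod(std::trunc(d), two32);  // exact; |m| < 2^32, sign of d
  if (m < 0)
    m += two32;                                // now in [0, 2^32)
  std::uint32_t u = static_cast<std::uint32_t>(m);
  // Reinterpret the low 32 bits as two's complement without relying on
  // implementation-defined unsigned-to-signed conversion.
  if (u >= 0x80000000u)
    return static_cast<std::int32_t>(u - 0x80000000u) - 0x7fffffff - 1;
  return static_cast<std::int32_t>(u);
}
```

A plain FCVT.W.D saturates out-of-range inputs instead of wrapping them. That is why a dedicated modular conversion helps JavaScript engines, and also why it does not directly match any C conversion that GCC would need to expand.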

Re: [PATCH 5/5] match.pd: Use splits in makefile and make configurable.

2023-05-05 Thread Jeff Law via Gcc-patches





On 5/5/23 08:59, David Edelsohn via Gcc-patches wrote:

This patch has broken GCC bootstrap on AIX.  It appears to rely upon, or
complain about, the command "seq":

/nasfarm/edelsohn/install/GCC12/bin/g++ -std=c++11   -g -DIN_GCC
-fno-exceptions -fno-rtti -fasynchronous-unwind-tables -W -Wall
-Wno-narrowing -Wwrite-strings -Wcast-qual -Wno-format
-Wmissing-format-attribute -Wconditionally-supported -Woverloaded-virtual
-pedantic -Wno-long-long -Wno-variadic-macros -Wno-overlength-strings
-fno-common  -DHAVE_CONFIG_H  -DGENERATOR_FILE -static-libstdc++
-static-libgcc -Wl,-bbigtoc -Wl,-bmaxdata:0x4000 -o build/genmatch \
 build/genmatch.o ../build-powerpc-ibm-aix7.2.5.0/libcpp/libcpp.a
build/errors.o build/vec.o build/hash-table.o build/sort.o
../build-powerpc-ibm-aix7.2.5.0/libiberty/libiberty.a
/usr/bin/bash: seq: command not found
/usr/bin/bash: seq: command not found
build/genmatch --gimple \
 --header=tmp-gimple-match-auto.h --include=gimple-match-auto.h \
 /nasfarm/edelsohn/src/src/gcc/match.pd

All of the match files are dumped to stdout.
Sigh.  So the question is do we make seq a requirement or do we 
implement an alternate to get the sequence or implement a fallback.


jeff

Re: [PATCH v8] RISC-V: Add the 'zfa' extension, version 0.2.

2023-05-05 Thread Christoph Müllner

On Wed, Apr 19, 2023 at 11:58 AM Jin Ma  wrote:
>
> This patch adds the 'Zfa' extension for riscv, which is based on:
>   https://github.com/riscv/riscv-isa-manual/commits/zfb
>   https://github.com/riscv/riscv-isa-manual/commit/1f038182810727f5feca311072e630d6baac51da
>
> The binutils-gdb for 'Zfa' extension:
>   https://github.com/a4lg/binutils-gdb/commits/riscv-zfa
>
> What needs special explanation:
> 1. The immediate of the FLI.H/S/D instructions is represented in the
>   assembly as a floating-point value, using scientific notation when rs1
>   is 1 or 2 and decimal notation for the rest.
>
>   Related llvm link:
> https://reviews.llvm.org/D145645
>   Related discussion link:
> https://github.com/riscv/riscv-isa-manual/issues/980
>
> 2. According to the RISC-V spec, "The FCVTMOD.W.D instruction was added
>   principally to accelerate the processing of JavaScript Numbers.", so it
>   seems that no implementation is required.
>
> 3. The instructions FMINM and FMAXM correspond to the C23 library functions
>   fminimum and fmaximum.  Therefore, this patch simply implements the
>   fminm3 and fmaxm3 patterns to prepare for later use.
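The FLI.H/S/D immediate in point 1 selects one of 32 constants from a fixed table in the Zfa specification, so riscv_float_const_rtx_index_for_fli has to decide whether a given constant appears in that table. The C++ below is only a sketch of that kind of lookup for the double-precision case, not the patch's code: the function name fli_index_for_double is hypothetical, and the table is transcribed from the Zfa specification as an assumption that should be checked against the ratified text.

```cpp
#include <cassert>
#include <cmath>
#include <limits>

// Hypothetical helper (not GCC code): return the FLI.D table index
// (0..30) that encodes VALUE exactly, or -1 if FLI.D cannot load it.
// Index 31 (canonical NaN) is deliberately not matched, because NaN
// never compares equal to itself.
int fli_index_for_double (double value)
{
  // Zfa FLI constant table in rs1-index order (assumed transcription).
  // Entry 1 is the format-dependent "minimum positive normal".
  static const double table[31] = {
    -1.0,
    std::numeric_limits<double>::min (),
    std::ldexp (1.0, -16), std::ldexp (1.0, -15),
    std::ldexp (1.0, -8),  std::ldexp (1.0, -7),
    0.0625, 0.125, 0.25, 0.3125, 0.375, 0.4375,
    0.5, 0.625, 0.75, 0.875,
    1.0, 1.25, 1.5, 1.75,
    2.0, 2.5, 3.0, 4.0, 8.0, 16.0, 128.0, 256.0,
    std::ldexp (1.0, 15), std::ldexp (1.0, 16),
    std::numeric_limits<double>::infinity ()
  };

  // Linear search: the table is tiny and the values are exact.
  for (int i = 0; i < 31; i++)
    if (table[i] == value)
      return i;
  return -1;
}
```

With this table, fli_index_for_double (1.0) is 16 and fli_index_for_double (0.1) is -1. 0.0 is not an entry, so the sketch also returns -1 for it; recognising the canonical NaN would need a separate bit-pattern check, which is omitted here.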
>
> gcc/ChangeLog:
>
> * common/config/riscv/riscv-common.cc: Add zfa extension version.
> * config/riscv/constraints.md (Zf): Constrain the floating point number
> that the instructions FLI.H/S/D can load.
> ((TARGET_XTHEADFMV || TARGET_ZFA) ? FP_REGS : NO_REGS): enable FMVP.D.X
> and FMVH.X.D.
> * config/riscv/iterators.md (ceil): New.
> * config/riscv/riscv-protos.h (riscv_float_const_rtx_index_for_fli): New.
> * config/riscv/riscv.cc (find_index_in_array): New.
> (riscv_float_const_rtx_index_for_fli): Get the index of the floating-point
> number that the instructions FLI.H/S/D can load.
> (riscv_cannot_force_const_mem): If instruction FLI.H/S/D can be used,
> memory is not applicable.
> (riscv_const_insns): The cost of FLI.H/S/D is 3.
> (riscv_legitimize_const_move): Likewise.
> (riscv_split_64bit_move_p): If instruction FLI.H/S/D can be used, no split
> is required.
> (riscv_output_move): Output the mov instructions in zfa extension.
> (riscv_print_operand): Output the floating-point value of the FLI.H/S/D
> immediate in assembly.
> (riscv_secondary_memory_needed): Likewise.
> * config/riscv/riscv.h (GP_REG_RTX_P): New.
> * config/riscv/riscv.md (fminm3): New.
>
> gcc/testsuite/ChangeLog:
>
> * gcc.target/riscv/zfa-fleq-fltq-rv32.c: New test.
> * gcc.target/riscv/zfa-fleq-fltq.c: New test.
> * gcc.target/riscv/zfa-fli-rv32.c: New test.
> * gcc.target/riscv/zfa-fli-zfh-rv32.c: New test.
> * gcc.target/riscv/zfa-fli-zfh.c: New test.
> * gcc.target/riscv/zfa-fli.c: New test.
> * gcc.target/riscv/zfa-fmovh-fmovp-rv32.c: New test.
> * gcc.target/riscv/zfa-fround-rv32.c: New test.
> * gcc.target/riscv/zfa-fround.c: New test.
> ---
>  gcc/common/config/riscv/riscv-common.cc   |   4 +
>  gcc/config/riscv/constraints.md   |  11 +-
>  gcc/config/riscv/iterators.md |   5 +
>  gcc/config/riscv/riscv-opts.h |   3 +
>  gcc/config/riscv/riscv-protos.h   |   1 +
>  gcc/config/riscv/riscv.cc | 168 +-
>  gcc/config/riscv/riscv.h  |   1 +
>  gcc/config/riscv/riscv.md | 112 +---
>  .../gcc.target/riscv/zfa-fleq-fltq-rv32.c |  19 ++
>  .../gcc.target/riscv/zfa-fleq-fltq.c  |  19 ++
>  gcc/testsuite/gcc.target/riscv/zfa-fli-rv32.c |  79 
>  .../gcc.target/riscv/zfa-fli-zfh-rv32.c   |  41 +
>  gcc/testsuite/gcc.target/riscv/zfa-fli-zfh.c  |  41 +
>  gcc/testsuite/gcc.target/riscv/zfa-fli.c  |  79 
>  .../gcc.target/riscv/zfa-fmovh-fmovp-rv32.c   |  10 ++
>  .../gcc.target/riscv/zfa-fround-rv32.c|  42 +
>  gcc/testsuite/gcc.target/riscv/zfa-fround.c   |  42 +
>  17 files changed, 652 insertions(+), 25 deletions(-)
>  create mode 100644 gcc/testsuite/gcc.target/riscv/zfa-fleq-fltq-rv32.c
>  create mode 100644 gcc/testsuite/gcc.target/riscv/zfa-fleq-fltq.c
>  create mode 100644 gcc/testsuite/gcc.target/riscv/zfa-fli-rv32.c
>  create mode 100644 gcc/testsuite/gcc.target/riscv/zfa-fli-zfh-rv32.c
>  create mode 100644 gcc/testsuite/gcc.target/riscv/zfa-fli-zfh.c
>  create mode 100644 gcc/testsuite/gcc.target/riscv/zfa-fli.c
>  create mode 100644 gcc/testsuite/gcc.target/riscv/zfa-fmovh-fmovp-rv32.c
>  create mode 100644 gcc/testsuite/gcc.target/riscv/zfa-fround-rv32.c
>  create mode 100644 gcc/testsuite/gcc.target/riscv/zfa-fround.c
>
> diff --git a/gcc/common/config/riscv/riscv-common.cc b/gcc/common/config/riscv/riscv-common.cc
> index 309a52def75..f9fce6bcc38 100644
> --- a/gcc/common/config/riscv/riscv-common.cc
> +++ b/gcc/common/config/risc

Re: [PATCH 5/5] match.pd: Use splits in makefile and make configurable.

2023-05-05 Thread David Edelsohn via Gcc-patches

This patch has broken GCC bootstrap on AIX.  It appears to rely upon, or
complain about, the command "seq":

/nasfarm/edelsohn/install/GCC12/bin/g++ -std=c++11   -g -DIN_GCC
-fno-exceptions -fno-rtti -fasynchronous-unwind-tables -W -Wall
-Wno-narrowing -Wwrite-strings -Wcast-qual -Wno-format
-Wmissing-format-attribute -Wconditionally-supported -Woverloaded-virtual
-pedantic -Wno-long-long -Wno-variadic-macros -Wno-overlength-strings
-fno-common  -DHAVE_CONFIG_H  -DGENERATOR_FILE -static-libstdc++
-static-libgcc -Wl,-bbigtoc -Wl,-bmaxdata:0x4000 -o build/genmatch \
build/genmatch.o ../build-powerpc-ibm-aix7.2.5.0/libcpp/libcpp.a
build/errors.o build/vec.o build/hash-table.o build/sort.o
../build-powerpc-ibm-aix7.2.5.0/libiberty/libiberty.a
/usr/bin/bash: seq: command not found
/usr/bin/bash: seq: command not found
build/genmatch --gimple \
--header=tmp-gimple-match-auto.h --include=gimple-match-auto.h \
/nasfarm/edelsohn/src/src/gcc/match.pd

All of the match files are dumped to stdout.

Thanks, David

[PATCH V5] RISC-V: Enable basic RVV auto-vectorization support.

2023-05-05 Thread juzhe . zhong

From: Juzhe-Zhong 

Address comments from Robin.

gcc/ChangeLog:

* config/riscv/riscv-v.cc (preferred_simd_mode): Fix comments.
* config/riscv/riscv.cc (riscv_get_arg_info): Ditto.

gcc/testsuite/ChangeLog:

* gcc.target/riscv/rvv/autovec/fixed-vlmax-1.c: Fix function name.
* gcc.target/riscv/rvv/autovec/v-1.c: Remove -O3 -ftree-vectorize.
* gcc.target/riscv/rvv/autovec/v-2.c: Ditto.
* gcc.target/riscv/rvv/autovec/zve32f-1.c: Ditto.
* gcc.target/riscv/rvv/autovec/zve32f-2.c: Ditto.
* gcc.target/riscv/rvv/autovec/zve32f-3.c: Ditto.
* gcc.target/riscv/rvv/autovec/zve32f_zvl128b-1.c: Ditto.
* gcc.target/riscv/rvv/autovec/zve32f_zvl128b-2.c: Ditto.
* gcc.target/riscv/rvv/autovec/zve32x-1.c: Ditto.
* gcc.target/riscv/rvv/autovec/zve32x-2.c: Ditto.
* gcc.target/riscv/rvv/autovec/zve32x-3.c: Ditto.
* gcc.target/riscv/rvv/autovec/zve32x_zvl128b-1.c: Ditto.
* gcc.target/riscv/rvv/autovec/zve32x_zvl128b-2.c: Ditto.
* gcc.target/riscv/rvv/autovec/zve64d-1.c: Ditto.
* gcc.target/riscv/rvv/autovec/zve64d-2.c: Ditto.
* gcc.target/riscv/rvv/autovec/zve64d-3.c: Ditto.
* gcc.target/riscv/rvv/autovec/zve64d_zvl128b-1.c: Ditto.
* gcc.target/riscv/rvv/autovec/zve64d_zvl128b-2.c: Ditto.
* gcc.target/riscv/rvv/autovec/zve64f-1.c: Ditto.
* gcc.target/riscv/rvv/autovec/zve64f-2.c: Ditto.
* gcc.target/riscv/rvv/autovec/zve64f-3.c: Ditto.
* gcc.target/riscv/rvv/autovec/zve64f_zvl128b-1.c: Ditto.
* gcc.target/riscv/rvv/autovec/zve64f_zvl128b-2.c: Ditto.
* gcc.target/riscv/rvv/autovec/zve64x-1.c: Ditto.
* gcc.target/riscv/rvv/autovec/zve64x-2.c: Ditto.
* gcc.target/riscv/rvv/autovec/zve64x-3.c: Ditto.
* gcc.target/riscv/rvv/autovec/zve64x_zvl128b-1.c: Ditto.
* gcc.target/riscv/rvv/autovec/zve64x_zvl128b-2.c: Ditto.
* gcc.target/riscv/rvv/rvv.exp: Add -O3 -ftree-vectorize.

---
 gcc/config/riscv/riscv-v.cc  | 2 +-
 gcc/config/riscv/riscv.cc| 9 -
 .../gcc.target/riscv/rvv/autovec/fixed-vlmax-1.c | 4 +++-
 gcc/testsuite/gcc.target/riscv/rvv/autovec/v-1.c | 7 ++-
 gcc/testsuite/gcc.target/riscv/rvv/autovec/v-2.c | 2 +-
 gcc/testsuite/gcc.target/riscv/rvv/autovec/zve32f-1.c| 2 +-
 gcc/testsuite/gcc.target/riscv/rvv/autovec/zve32f-2.c| 2 +-
 gcc/testsuite/gcc.target/riscv/rvv/autovec/zve32f-3.c| 2 +-
 .../gcc.target/riscv/rvv/autovec/zve32f_zvl128b-1.c  | 2 +-
 .../gcc.target/riscv/rvv/autovec/zve32f_zvl128b-2.c  | 2 +-
 gcc/testsuite/gcc.target/riscv/rvv/autovec/zve32x-1.c| 2 +-
 gcc/testsuite/gcc.target/riscv/rvv/autovec/zve32x-2.c| 2 +-
 gcc/testsuite/gcc.target/riscv/rvv/autovec/zve32x-3.c| 2 +-
 .../gcc.target/riscv/rvv/autovec/zve32x_zvl128b-1.c  | 2 +-
 .../gcc.target/riscv/rvv/autovec/zve32x_zvl128b-2.c  | 2 +-
 gcc/testsuite/gcc.target/riscv/rvv/autovec/zve64d-1.c| 2 +-
 gcc/testsuite/gcc.target/riscv/rvv/autovec/zve64d-2.c| 2 +-
 gcc/testsuite/gcc.target/riscv/rvv/autovec/zve64d-3.c| 2 +-
 .../gcc.target/riscv/rvv/autovec/zve64d_zvl128b-1.c  | 2 +-
 .../gcc.target/riscv/rvv/autovec/zve64d_zvl128b-2.c  | 2 +-
 gcc/testsuite/gcc.target/riscv/rvv/autovec/zve64f-1.c| 2 +-
 gcc/testsuite/gcc.target/riscv/rvv/autovec/zve64f-2.c| 2 +-
 gcc/testsuite/gcc.target/riscv/rvv/autovec/zve64f-3.c| 2 +-
 .../gcc.target/riscv/rvv/autovec/zve64f_zvl128b-1.c  | 2 +-
 .../gcc.target/riscv/rvv/autovec/zve64f_zvl128b-2.c  | 2 +-
 gcc/testsuite/gcc.target/riscv/rvv/autovec/zve64x-1.c| 2 +-
 gcc/testsuite/gcc.target/riscv/rvv/autovec/zve64x-2.c| 2 +-
 gcc/testsuite/gcc.target/riscv/rvv/autovec/zve64x-3.c| 2 +-
 .../gcc.target/riscv/rvv/autovec/zve64x_zvl128b-1.c  | 2 +-
 .../gcc.target/riscv/rvv/autovec/zve64x_zvl128b-2.c  | 2 +-
 gcc/testsuite/gcc.target/riscv/rvv/rvv.exp   | 2 +-
 31 files changed, 41 insertions(+), 35 deletions(-)

diff --git a/gcc/config/riscv/riscv-v.cc b/gcc/config/riscv/riscv-v.cc
index 82510743eb8..1f887f7e747 100644
--- a/gcc/config/riscv/riscv-v.cc
+++ b/gcc/config/riscv/riscv-v.cc
@@ -940,7 +940,7 @@ autovec_use_vlmax_p (void)
 machine_mode
 preferred_simd_mode (scalar_mode mode)
 {
-  /* We only enable auto-vectorization when TARGET_MIN_VLEN < 128 &&
+  /* We will disable auto-vectorization when TARGET_MIN_VLEN < 128 &&
  riscv_autovec_lmul < RVV_M2. Since GCC loop vectorizer report ICE when we
  enable -march=rv64gc_zve32* and -march=rv32gc_zve64*. in the
  'can_duplicate_and_interleave_p' of tree-vect-slp.cc. Since we have
diff --git a/gcc/config/riscv/riscv.cc b/gcc/config/riscv/riscv.cc
index 8d3cd4261d2..aa985c2f456 100644
--- a/gcc/config/riscv/riscv.cc
+++ b/gcc/config/riscv/riscv.cc
@@ -3791,12 +3791,11 @@ riscv_get_arg_info (struct riscv_arg_info *info, co

RE: [PATCH v2] RISC-V: Legitimise the const0_rtx for RVV indexed load/store

2023-05-05 Thread Li, Pan2 via Gcc-patches

Thank you!

-Original Message-
From: Kito Cheng  
Sent: Friday, May 5, 2023 10:52 PM
To: Li, Pan2 
Cc: gcc-patches@gcc.gnu.org; juzhe.zh...@rivai.ai; kito.ch...@sifive.com; Wang, Yanzhang
Subject: Re: [PATCH v2] RISC-V: Legitimise the const0_rtx for RVV indexed load/store

pushed to trunk, thanks :)

On Thu, May 4, 2023 at 5:12 PM Pan Li via Gcc-patches wrote:
>
> From: Pan Li 
>
> This patch tries to legitimise the const0_rtx (aka zero register)
> as the base register for the RVV indexed load/store instructions
> by allowing the const as the operand of the indexed RTL pattern.
> The combine pass will then try to perform the const propagation.
>
> For example:
> vint32m1_t
> test_vluxei32_v_i32m1_shortcut (vuint32m1_t bindex, size_t vl)
> {
>   return __riscv_vluxei32_v_i32m1 ((int32_t *)0, bindex, vl);
> }
>
> Before this patch:
> li a5,0 <- can be eliminated.
> vl1re32.v  v1,0(a1)
> vsetvli    zero,a2,e32,m1,ta,ma
> vluxei32.v v1,(a5),v1   <- can propagate the const 0 to a5 here.
> vs1r.v v1,0(a0)
> ret
>
> After this patch:
> test_vluxei32_v_i32m1_shortcut:
> vl1re32.v   v1,0(a1)
> vsetvli zero,a2,e32,m1,ta,ma
> vluxei32.v  v1,(0),v1
> vs1r.v  v1,0(a0)
> ret
>
> As shown above, this patch allows the combine pass to propagate the
> const 0 (aka zero register) into the base register of the RVV indexed
> load.  This may benefit the underlying RVV auto-vectorization.
>
> gcc/ChangeLog:
>
> * config/riscv/vector.md: Allow const as the operand of RVV
>   indexed load/store.
>
> gcc/testsuite/ChangeLog:
>
> * gcc.target/riscv/rvv/base/zero_base_load_store_optimization.c:
>   Adjust indexed load/store check condition.
>
> Signed-off-by: Pan Li 
> Co-authored-by: Ju-Zhe Zhong 
> ---
>  gcc/config/riscv/vector.md| 62 +--
>  .../base/zero_base_load_store_optimization.c  |  3 +-
>  2 files changed, 33 insertions(+), 32 deletions(-)
>
> diff --git a/gcc/config/riscv/vector.md b/gcc/config/riscv/vector.md 
> index 92115e3935f..dc05e9fc713 100644
> --- a/gcc/config/riscv/vector.md
> +++ b/gcc/config/riscv/vector.md
> @@ -1511,12 +1511,12 @@ (define_insn "@pred_indexed_load_same_eew"
>  (reg:SI VL_REGNUM)
>  (reg:SI VTYPE_REGNUM)] UNSPEC_VPREDICATE)
>   (unspec:V
> -   [(match_operand 3 "pmode_register_operand""  r,  r, r,  r")
> +   [(match_operand 3 "pmode_reg_or_0_operand"" rJ, rJ,rJ, rJ")
>  (mem:BLK (scratch))
>  (match_operand: 4 "register_operand" " vr, vr,vr, vr")] ORDER)
>   (match_operand:V 2 "vector_merge_operand"   " vu, vu, 0,  0")))]
>"TARGET_VECTOR"
> -  "vlxei.v\t%0,(%3),%4%p1"
> +  "vlxei.v\t%0,(%z3),%4%p1"
>[(set_attr "type" "vldx")
> (set_attr "mode" "")])
>
> @@ -1533,12 +1533,12 @@ (define_insn "@pred_indexed_load_x2_greater_eew"
>  (reg:SI VL_REGNUM)
>  (reg:SI VTYPE_REGNUM)] UNSPEC_VPREDICATE)
>   (unspec:VEEWEXT2
> -   [(match_operand 3 "pmode_register_operand" "r,r")
> +   [(match_operand 3 "pmode_reg_or_0_operand" "   rJ,   rJ")
>  (mem:BLK (scratch))
>  (match_operand: 4 "register_operand" "   vr,   vr")] ORDER)
>   (match_operand:VEEWEXT2 2 "vector_merge_operand" "   vu,0")))]
>"TARGET_VECTOR"
> -  "vlxei.v\t%0,(%3),%4%p1"
> +  "vlxei.v\t%0,(%z3),%4%p1"
>[(set_attr "type" "vldx")
> (set_attr "mode" "")])
>
> @@ -1554,12 +1554,12 @@ (define_insn "@pred_indexed_load_x4_greater_eew"
>  (reg:SI VL_REGNUM)
>  (reg:SI VTYPE_REGNUM)] UNSPEC_VPREDICATE)
>   (unspec:VEEWEXT4
> -   [(match_operand 3 "pmode_register_operand" "r,r")
> +   [(match_operand 3 "pmode_reg_or_0_operand" "   rJ,   rJ")
>  (mem:BLK (scratch))
>  (match_operand: 4 "register_operand"   "   vr,   vr")] ORDER)
>   (match_operand:VEEWEXT4 2 "vector_merge_operand" "   vu,0")))]
>"TARGET_VECTOR"
> -  "vlxei.v\t%0,(%3),%4%p1"
> +  "vlxei.v\t%0,(%z3),%4%p1"
>[(set_attr "type" "vldx")
> (set_attr "mode" "")])
>
> @@ -1575,12 +1575,12 @@ (define_insn "@pred_indexed_load_x8_greater_eew"
>  (reg:SI VL_REGNUM)
>  (reg:SI VTYPE_REGNUM)] UNSPEC_VPREDICATE)
>   (unspec:VEEWEXT8
> -   [(match_operand 3 "pmode_register_operand" "r,r")
> +   [(match_operand 3 "pmode_reg_or_0_operand" "   rJ,   rJ")
>  (mem:BLK (scratch))
>  (match_operand: 4 "register_operand""   vr,   vr")] ORDER)
>   (match_operand:VEEWEXT8 2 "vector_merge_operand" "   vu,0")))]
>"TARGET_VECTOR"
> -  "vlxei.v\t%0,(%3),%4%p1"
> +  "vlxei.v\t%0,(%z3),%4%p1"
>[(s

Re: [PATCH] RISC-V: Allow RVV VMS{Compare}(V1, V1) simplify to VMSET

2023-05-05 Thread Kito Cheng via Gcc-patches

pushed v1 to trunk

On Fri, May 5, 2023 at 8:46 PM Li, Pan2 via Gcc-patches
 wrote:
>
> Ok, sounds good. Thank you!
>
> Pan
>
> -Original Message-
> From: Kito Cheng 
> Sent: Friday, May 5, 2023 8:37 PM
> To: Li, Pan2 
> Cc: gcc-patches@gcc.gnu.org; juzhe.zh...@rivai.ai; Wang, Yanzhang 
> 
> Subject: Re: [PATCH] RISC-V: Allow RVV VMS{Compare}(V1, V1) simplify to VMSET
>
> I will take V1 and commit to trunk after my local test is done :)
>
> On Fri, May 5, 2023 at 8:30 PM Li, Pan2  wrote:
> >
> > Hi kito,
> >
> > Could you please share any suggestions about the patch, comparing
> > V1 and V2?
> >
> > Pan
> >
> >
> > -Original Message-
> > From: Li, Pan2
> > Sent: Wednesday, May 3, 2023 7:18 PM
> > To: Jeff Law ; Kito Cheng
> > 
> > Cc: gcc-patches@gcc.gnu.org; juzhe.zh...@rivai.ai; Wang, Yanzhang
> > ; Andrew Waterman 
> > Subject: RE: [PATCH] RISC-V: Allow RVV VMS{Compare}(V1, V1) simplify
> > to VMSET
> >
> > Thanks all for the comments; I will work with Kito to make it happen.
> >
> > Pan
> >
> > -Original Message-
> > From: Jeff Law 
> > Sent: Wednesday, May 3, 2023 12:28 AM
> > To: Kito Cheng 
> > Cc: Li, Pan2 ; gcc-patches@gcc.gnu.org;
> > juzhe.zh...@rivai.ai; Wang, Yanzhang ; Andrew
> > Waterman 
> > Subject: Re: [PATCH] RISC-V: Allow RVV VMS{Compare}(V1, V1) simplify
> > to VMSET
> >
> >
> >
> > On 4/29/23 19:40, Kito Cheng wrote:
> > > Hi Jeff:
> > >
> > > The RTL pattern already models the tail elements and vector length
> > > well, so I don't think the first version of Pan's patch has a problem.
> > >
> > > Input RTL pattern:
> > >
> > > #(insn 10 7 12 2 (set (reg:VNx2BI 134 [ _1 ])
> > > #(if_then_else:VNx2BI (unspec:VNx2BI [
> > > #(const_vector:VNx2BI repeat [
> > > #(const_int 1 [0x1])
> > > #])  # all-1 mask
> > > #(reg:DI 143)  # AVL reg, or vector length
> > > #(const_int 2 [0x2]) # mask policy
> > > #(const_int 0 [0])   # avl type
> > > #(reg:SI 66 vl)
> > > #(reg:SI 67 vtype)
> > > #] UNSPEC_VPREDICATE)
> > > #(geu:VNx2BI (reg/v:VNx2QI 137 [ v1 ])
> > > #(reg/v:VNx2QI 137 [ v1 ]))
> > > #(unspec:VNx2BI [
> > > #(reg:SI 0 zero)
> > > #] UNSPEC_VUNDEF))) # maskoff and tail operand
> > > # (expr_list:REG_DEAD (reg:DI 143)
> > > #(expr_list:REG_DEAD (reg/v:VNx2QI 137 [ v1 ])
> > > #(nil
> > >
> > > And the split pattern only applies when the tail/maskoff elements
> > > have an undefined value:
> > >
> > > (define_split
> > >   [(set (match_operand:VB  0 "register_operand")
> > > (if_then_else:VB
> > >   (unspec:VB
> > > [(match_operand:VB 1 "vector_all_trues_mask_operand")
> > >  (match_operand4 "vector_length_operand")
> > >  (match_operand5 "const_int_operand")
> > >  (match_operand6 "const_int_operand")
> > >  (reg:SI VL_REGNUM)
> > >  (reg:SI VTYPE_REGNUM)] UNSPEC_VPREDICATE)
> > >   (match_operand:VB3 "vector_move_operand")
> > >   (match_operand:VB2 "vector_undef_operand")))] # maskoff and tail operand, only match undef value
> > >
> > > Then it turns into vmset and also discards the mask policy operand
> > > (since an undef maskoff means don't care, IMO):
> > >
> > > (insn 10 7 12 2 (set (reg:VNx2BI 134 [ _1 ])
> > > (if_then_else:VNx2BI (unspec:VNx2BI [
> > > (const_vector:VNx2BI repeat [
> > > (const_int 1 [0x1])
> > > ])  # all-1 mask
> > > (reg:DI 143) # AVL reg, or vector length
> > > (const_int 2 [0x2]) # mask policy
> > > (reg:SI 66 vl)
> > > (reg:SI 67 vtype)
> > > ] UNSPEC_VPREDICATE)
> > > (const_vector:VNx2BI repeat [
> > > (const_int 1 [0x1])
> > > ])# all-1
> > > (unspec:VNx2BI [
> > > (reg:SI 0 zero)
> > > ] UNSPEC_VUNDEF))) # still vundef
> > >  (expr_list:REG_DEAD (reg:DI 143)
> > > (nil)))
> > Right.  My concern is that when we call relational_result it's going to 
> > return -1 (as a vector of bools) which bubbles up through the call
> > chain.   If that doesn't match the actual register state after the
> > instruction (irrespective of the tail policy), then we have the potential 
> > to generate incorrect code.
> >
> > For example, if a subsequent instruction tries to set a vector register
> > to -1, it could just copy from the destination of the vmset to the new
> > target.  But if the vmset didn't set all the bits to 1, then the code
> > is wrong.
> >
> > With all the UNSPECs in place, this may not be a problem in practice.
> > Un
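
The simplification under discussion rests on the fact that, for integer operands, comparing a value with itself has a fixed answer for every comparison code. A minimal C++ sketch of that fact follows; cmp_code and self_compare_result are hypothetical names, not GCC's relational_result. The sketch only covers the active elements: what the tail and masked-off elements hold after the instruction is the separate question Jeff raises.

```cpp
#include <cassert>

// Integer comparison codes, named after GCC's RTL codes.
enum class cmp_code { eq, ne, lt, ltu, le, leu, gt, gtu, ge, geu };

// Result of "x CODE x" for any integer x: it never depends on x.
// A "true" answer is why a compare of a register with itself (such as
// the geu in the RTL above) can become an all-ones mask (vmset).
// Not valid for floating point, where a NaN makes x == x false.
bool self_compare_result (cmp_code code)
{
  switch (code)
    {
    case cmp_code::eq: case cmp_code::le: case cmp_code::leu:
    case cmp_code::ge: case cmp_code::geu:
      return true;
    case cmp_code::ne: case cmp_code::lt: case cmp_code::ltu:
    case cmp_code::gt: case cmp_code::gtu:
      return false;
    }
  return false;  // Not reached: every enumerator is handled above.
}
```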

Re: [PATCH v2] RISC-V: Legitimise the const0_rtx for RVV indexed load/store

2023-05-05 Thread Kito Cheng via Gcc-patches

pushed to trunk, thanks :)

On Thu, May 4, 2023 at 5:12 PM Pan Li via Gcc-patches
 wrote:
>
> From: Pan Li 
>
> This patch tries to legitimise the const0_rtx (aka zero register)
> as the base register for the RVV indexed load/store instructions
> by allowing the const as the operand of the indexed RTL pattern.
> The combine pass will then try to perform the const propagation.
>
> For example:
> vint32m1_t
> test_vluxei32_v_i32m1_shortcut (vuint32m1_t bindex, size_t vl)
> {
>   return __riscv_vluxei32_v_i32m1 ((int32_t *)0, bindex, vl);
> }
>
> Before this patch:
> li a5,0 <- can be eliminated.
> vl1re32.v  v1,0(a1)
> vsetvli    zero,a2,e32,m1,ta,ma
> vluxei32.v v1,(a5),v1   <- can propagate the const 0 to a5 here.
> vs1r.v v1,0(a0)
> ret
>
> After this patch:
> test_vluxei32_v_i32m1_shortcut:
> vl1re32.v   v1,0(a1)
> vsetvli zero,a2,e32,m1,ta,ma
> vluxei32.v  v1,(0),v1
> vs1r.v  v1,0(a0)
> ret
>
> As shown above, this patch allows the combine pass to propagate the
> const 0 (aka zero register) into the base register of the RVV indexed
> load.  This may benefit the underlying RVV auto-vectorization.
>
> gcc/ChangeLog:
>
> * config/riscv/vector.md: Allow const as the operand of RVV
>   indexed load/store.
>
> gcc/testsuite/ChangeLog:
>
> * gcc.target/riscv/rvv/base/zero_base_load_store_optimization.c:
>   Adjust indexed load/store check condition.
>
> Signed-off-by: Pan Li 
> Co-authored-by: Ju-Zhe Zhong 
> ---
>  gcc/config/riscv/vector.md| 62 +--
>  .../base/zero_base_load_store_optimization.c  |  3 +-
>  2 files changed, 33 insertions(+), 32 deletions(-)
>
> diff --git a/gcc/config/riscv/vector.md b/gcc/config/riscv/vector.md
> index 92115e3935f..dc05e9fc713 100644
> --- a/gcc/config/riscv/vector.md
> +++ b/gcc/config/riscv/vector.md
> @@ -1511,12 +1511,12 @@ (define_insn "@pred_indexed_load_same_eew"
>  (reg:SI VL_REGNUM)
>  (reg:SI VTYPE_REGNUM)] UNSPEC_VPREDICATE)
>   (unspec:V
> -   [(match_operand 3 "pmode_register_operand""  r,  r, r,  r")
> +   [(match_operand 3 "pmode_reg_or_0_operand"" rJ, rJ,rJ, rJ")
>  (mem:BLK (scratch))
>  (match_operand: 4 "register_operand" " vr, vr,vr, vr")] ORDER)
>   (match_operand:V 2 "vector_merge_operand"   " vu, vu, 0,  0")))]
>"TARGET_VECTOR"
> -  "vlxei.v\t%0,(%3),%4%p1"
> +  "vlxei.v\t%0,(%z3),%4%p1"
>[(set_attr "type" "vldx")
> (set_attr "mode" "")])
>
> @@ -1533,12 +1533,12 @@ (define_insn "@pred_indexed_load_x2_greater_eew"
>  (reg:SI VL_REGNUM)
>  (reg:SI VTYPE_REGNUM)] UNSPEC_VPREDICATE)
>   (unspec:VEEWEXT2
> -   [(match_operand 3 "pmode_register_operand" "r,r")
> +   [(match_operand 3 "pmode_reg_or_0_operand" "   rJ,   rJ")
>  (mem:BLK (scratch))
>  (match_operand: 4 "register_operand" "   vr,   vr")] ORDER)
>   (match_operand:VEEWEXT2 2 "vector_merge_operand" "   vu,0")))]
>"TARGET_VECTOR"
> -  "vlxei.v\t%0,(%3),%4%p1"
> +  "vlxei.v\t%0,(%z3),%4%p1"
>[(set_attr "type" "vldx")
> (set_attr "mode" "")])
>
> @@ -1554,12 +1554,12 @@ (define_insn "@pred_indexed_load_x4_greater_eew"
>  (reg:SI VL_REGNUM)
>  (reg:SI VTYPE_REGNUM)] UNSPEC_VPREDICATE)
>   (unspec:VEEWEXT4
> -   [(match_operand 3 "pmode_register_operand" "r,r")
> +   [(match_operand 3 "pmode_reg_or_0_operand" "   rJ,   rJ")
>  (mem:BLK (scratch))
>  (match_operand: 4 "register_operand"   "   vr,   vr")] ORDER)
>   (match_operand:VEEWEXT4 2 "vector_merge_operand" "   vu,0")))]
>"TARGET_VECTOR"
> -  "vlxei.v\t%0,(%3),%4%p1"
> +  "vlxei.v\t%0,(%z3),%4%p1"
>[(set_attr "type" "vldx")
> (set_attr "mode" "")])
>
> @@ -1575,12 +1575,12 @@ (define_insn "@pred_indexed_load_x8_greater_eew"
>  (reg:SI VL_REGNUM)
>  (reg:SI VTYPE_REGNUM)] UNSPEC_VPREDICATE)
>   (unspec:VEEWEXT8
> -   [(match_operand 3 "pmode_register_operand" "r,r")
> +   [(match_operand 3 "pmode_reg_or_0_operand" "   rJ,   rJ")
>  (mem:BLK (scratch))
>  (match_operand: 4 "register_operand""   vr,   vr")] ORDER)
>   (match_operand:VEEWEXT8 2 "vector_merge_operand" "   vu,0")))]
>"TARGET_VECTOR"
> -  "vlxei.v\t%0,(%3),%4%p1"
> +  "vlxei.v\t%0,(%z3),%4%p1"
>[(set_attr "type" "vldx")
> (set_attr "mode" "")])
>
> @@ -1597,12 +1597,12 @@ (define_insn "@pred_indexed_load_x2_smaller_eew"
>  (reg:SI VL_REGNUM)
>  (reg:SI VTYPE_REGNUM)] UNSPEC_VPREDICATE)
>   (unspec:VEEWTRUNC2
> -   [(match_operand 3 "
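
To make the predicate and operand-printing change concrete, here is a toy C++17 model, not GCC code: operand, reg, const_int, reg_or_0_p and print_z are hypothetical stand-ins for an RTL operand, the relaxed pmode_reg_or_0_operand predicate with its "rJ" alternative, and the %z3 output modifier. Registers are printed as xN for simplicity, whereas GCC prints ABI names such as a5.

```cpp
#include <cassert>
#include <string>
#include <variant>

// Toy operand model (hypothetical): either a general-purpose register
// number or an integer constant.
struct reg { int regno; };
struct const_int { long value; };
using operand = std::variant<reg, const_int>;

// Analogue of the relaxed predicate: accept any register, or the
// constant zero, which RISC-V can encode as the hard-wired zero register.
bool reg_or_0_p (const operand &op)
{
  if (std::holds_alternative<reg> (op))
    return true;
  return std::get<const_int> (op).value == 0;
}

// Analogue of a 'z'-style output modifier: print a register normally,
// and print the constant 0 as the zero register.
std::string print_z (const operand &op)
{
  if (const reg *r = std::get_if<reg> (&op))
    return "x" + std::to_string (r->regno);
  // Callers must have checked reg_or_0_p, so this must be constant 0.
  assert (std::get<const_int> (op).value == 0);
  return "zero";
}
```

In this model the base operand of the indexed load no longer needs a separate register holding 0 (the "li a5,0" in the example), because the constant itself satisfies the predicate and is printed as the zero register.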

Re: [PATCH V3] RISC-V: Enable basic RVV auto-vectorization support.

2023-05-05 Thread Robin Dapp via Gcc-patches

Hi Juzhe,

I haven't been able to check this locally yet, so just some minor comment nits:

> +/* Return the vectorization machine mode for RVV according to LMUL.  */
> +machine_mode
> +preferred_simd_mode (scalar_mode mode)
> +{
> +  /* We only enable auto-vectorization when TARGET_MIN_VLEN < 128 &&
> + riscv_autovec_lmul < RVV_M2. Since GCC loop vectorizer report ICE when we
> + enable -march=rv64gc_zve32* and -march=rv32gc_zve64*. in the

I believe Kito mentioned this in the last iteration but the comment
here doesn't match the code below.  You want >= 128 instead of < 128.
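
Taking the later wording of that comment ("disable auto-vectorization when TARGET_MIN_VLEN < 128 && riscv_autovec_lmul < RVV_M2") at face value, the gate can be sketched in C++ as below. The enum and function names are hypothetical stand-ins for TARGET_MIN_VLEN and riscv_autovec_lmul, and the real preferred_simd_mode may check more than this.

```cpp
#include <cassert>

// Hypothetical stand-in for the LMUL chosen for auto-vectorization.
enum class autovec_lmul { m1 = 1, m2 = 2, m4 = 4, m8 = 8 };

// Disabled exactly when MIN_VLEN < 128 and LMUL < M2; equivalently,
// enabled when MIN_VLEN >= 128 or LMUL >= M2.
bool autovec_enabled_p (int min_vlen, autovec_lmul lmul)
{
  bool disabled = min_vlen < 128
                  && static_cast<int> (lmul) < static_cast<int> (autovec_lmul::m2);
  return !disabled;
}
```

Read this way, the ">= 128" in the comment describes when auto-vectorization stays on for the default LMUL of m1.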

> +  /* TODO: Currently, it will produce ICE for --param
> +  riscv-autovec-preference=fixed-vlmax. So, we just return NULL_RTX here
> +  let GCC genearte loads/stores. Ideally, GCC should either report
> +  Warning message to tell user do not use RVV vector type in function
> +  arg, or GCC just support function arg calling convention for RVV
> +  directly.  */
> +  if (riscv_v_ext_mode_p (mode))
> + return NULL_RTX;

will produce -> will cause an ICE

genearte -> generate

GCC should either... -> we should either warn the user not to use
an RVV vector type as function argument ... or support the calling
convention

> +++ b/gcc/testsuite/gcc.target/riscv/rvv/autovec/fixed-vlmax-1.c
> @@ -0,0 +1,22 @@
> +/* { dg-do compile } */
> +/* { dg-options "-march=rv32gcv -mabi=ilp32 -mpreferred-stack-boundary=3 -fno-schedule-insns -fno-schedule-insns2 -O3 --param riscv-autovec-preference=fixed-vlmax" } */
> +
> +#include "riscv_vector.h"
> +
> +void f (char*);
> +
> +void stach_check_alloca_1 (vuint8m1_t data, uint8_t *base, int y, ...)
> +{

Shouldn't that be stack rather than stach?

> +++ b/gcc/testsuite/gcc.target/riscv/rvv/autovec/partial/single_rgroup-1.h
> @@ -0,0 +1,106 @@
> +#include 
> +#include 
> +
> +#define N 777
> +
> +#define test_1(TYPE) \
> +  TYPE a_##TYPE[N];  \
> +  TYPE b_##TYPE[N];  \
> +  void __attribute__ ((noinline, noclone)) test_1_##TYPE (unsigned int n) \

Just FYI, you can use ((noipa)) to cover all cases of unwanted "inlining".  Not
needed here though.

> diff --git a/gcc/testsuite/gcc.target/riscv/rvv/autovec/v-1.c b/gcc/testsuite/gcc.target/riscv/rvv/autovec/v-1.c
> new file mode 100644
> index 000..7ff84f60749
> --- /dev/null
> +++ b/gcc/testsuite/gcc.target/riscv/rvv/autovec/v-1.c
> @@ -0,0 +1,4 @@
> +/* { dg-do compile } */
> +/* { dg-options "-march=rv32gcv -mabi=ilp32d --param riscv-autovec-preference=scalable -O3 -ftree-vectorize -fdump-tree-vect-details -save-temps" } */
> +
> +#include "template-1.h"

I'm a bit wary of these tests not checking anything.  Of course we will see
whether we ICE or not, but that is what I would expect from a "new feature".
Couldn't we at least check something else that gives a clue as to what is
supposed to happen?  Last time I tried some of those locally, we would not
vectorize.  If that's intended, we could check for e.g. "vectorized 0 loops
in function".  If not, a comment would still help.

Do we actually need -ftree-vectorize at -O3?  In general I would prefer to
split off common options and set them in rvv.exp already, only giving
dg-additional-options for each test.  Here the tests don't share much apart
from -O3 -ftree-vectorize, so that's not yet necessary.

Regards
 Robin

[PATCH V2] RISC-V: Fix incorrect demand info merge in local vsetvli optimization [PR109748]

2023-05-05 Thread juzhe . zhong

From: Juzhe-Zhong 

This patch fixes my recent optimization patch:
https://github.com/gcc-mirror/gcc/commit/d51f2456ee51bd59a79b4725ca0e488c25260bbf

In that patch, new_info = parse_insn (i) is not correct.
Consider the following case:

vsetvli a5,a4, e8,m1
..
vsetvli zero,a5, e32, m4
vle8.v
vmacc.vv
...

Since backward demand fusion has already been done in Phase 1, the real demand
of "vle8.v" is e32, m4.  However, parse_insn (vle8.v) gives e8, m1, which is
not correct.

So this patch changes new_info = new_info.parse_insn (i)
into:

vector_insn_info new_info = m_vector_manager->vector_insn_infos[i->uid ()];

This way, we can correctly optimize the code into:

vsetvli a5,a4, e32, m4
..
.. (vsetvli zero,a5, e32, m4 is removed)
vle8.v
vmacc.vv
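
The fusion above is only safe because both vtypes give the same VLMAX: VLMAX = VLEN * LMUL / SEW, so e8,m1 and e32,m4 share the ratio SEW/LMUL = 8 and vsetvli returns the same vl for the same AVL. The C++ below is a minimal sketch of that check, assuming an equal SEW/LMUL ratio is the deciding condition; the real compatibility test in riscv-vsetvl.cc (skip_avl_compatible_p and the merge) may check more, and vtype_info, sew_lmul_ratio and can_fuse_p are hypothetical names.

```cpp
#include <cassert>

// Hypothetical, simplified vtype: SEW in bits and LMUL as the fraction
// lmul_num / lmul_den (m4 = 4/1, mf8 = 1/8).
struct vtype_info
{
  int sew;
  int lmul_num;
  int lmul_den;
};

// SEW / LMUL, computed as SEW * lmul_den / lmul_num.  The division is
// exact for the power-of-two SEW (>= 8) and LMUL (<= 8) values RVV uses.
int sew_lmul_ratio (const vtype_info &v)
{
  return v.sew * v.lmul_den / v.lmul_num;
}

// A later "vsetvli zero,a5,<later>" that reuses the vl produced by an
// earlier "vsetvli a5,a4,<first>" can be folded into the earlier one
// (which then takes the later vtype) when both give the same VLMAX,
// i.e. when their SEW/LMUL ratios agree.
bool can_fuse_p (const vtype_info &first, const vtype_info &later)
{
  return sew_lmul_ratio (first) == sew_lmul_ratio (later);
}
```

Here can_fuse_p ({8, 1, 1}, {32, 4, 1}) is true, and so is the e8,mf8 / e32,mf2 pair from the new comment in riscv-vsetvl.cc (both ratios are 64). Note that both e8,m1 and e32,m4 pass this check; the bug fixed here is about which vtype is written back into the first vsetvli, which must be the recorded e32,m4 demand rather than the e8,m1 re-parsed from vle8.v.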

Since m_vector_manager->vector_insn_infos belongs to the pass_vsetvl class,
we remove the static function "local_eliminate_vsetvl_insn" and make it a
member function of the pass_vsetvl class.

PR target/109748

gcc/ChangeLog:

* config/riscv/riscv-vsetvl.cc (local_eliminate_vsetvl_insn): Remove it.
(pass_vsetvl::local_eliminate_vsetvl_insn): New function.

gcc/testsuite/ChangeLog:

* gcc.target/riscv/rvv/vsetvl/pr109748.c: New test.

---
 gcc/config/riscv/riscv-vsetvl.cc  | 102 ++
 .../gcc.target/riscv/rvv/vsetvl/pr109748.c|  36 +++
 2 files changed, 93 insertions(+), 45 deletions(-)
 create mode 100644 gcc/testsuite/gcc.target/riscv/rvv/vsetvl/pr109748.c

diff --git a/gcc/config/riscv/riscv-vsetvl.cc b/gcc/config/riscv/riscv-vsetvl.cc
index 39b4d21210b..e1efd7b1c40 100644
--- a/gcc/config/riscv/riscv-vsetvl.cc
+++ b/gcc/config/riscv/riscv-vsetvl.cc
@@ -1056,51 +1056,6 @@ change_vsetvl_insn (const insn_info *insn, const vector_insn_info &info)
   change_insn (rinsn, new_pat);
 }
 
-static void
-local_eliminate_vsetvl_insn (const vector_insn_info &dem)
-{
-  const insn_info *insn = dem.get_insn ();
-  if (!insn || insn->is_artificial ())
-return;
-  rtx_insn *rinsn = insn->rtl ();
-  const bb_info *bb = insn->bb ();
-  if (vsetvl_insn_p (rinsn))
-{
-  rtx vl = get_vl (rinsn);
-  for (insn_info *i = insn->next_nondebug_insn ();
-  real_insn_and_same_bb_p (i, bb); i = i->next_nondebug_insn ())
-   {
- if (i->is_call () || i->is_asm ()
- || find_access (i->defs (), VL_REGNUM)
- || find_access (i->defs (), VTYPE_REGNUM))
-   return;
-
- if (has_vtype_op (i->rtl ()))
-   {
- if (!vsetvl_discard_result_insn_p (PREV_INSN (i->rtl (
-   return;
- rtx avl = get_avl (i->rtl ());
- if (avl != vl)
-   return;
- set_info *def = find_access (i->uses (), REGNO (avl))->def ();
- if (def->insn () != insn)
-   return;
-
- vector_insn_info new_info;
- new_info.parse_insn (i);
- if (!new_info.skip_avl_compatible_p (dem))
-   return;
-
- new_info.set_avl_info (dem.get_avl_info ());
- new_info = dem.merge (new_info, LOCAL_MERGE);
- change_vsetvl_insn (insn, new_info);
- eliminate_insn (PREV_INSN (i->rtl ()));
- return;
-   }
-   }
-}
-}
-
 static bool
 source_equal_p (insn_info *insn1, insn_info *insn2)
 {
@@ -2672,6 +2627,7 @@ private:
   void pre_vsetvl (void);
 
   /* Phase 5.  */
+  void local_eliminate_vsetvl_insn (const vector_insn_info &) const;
   void cleanup_insns (void) const;
 
   /* Phase 6.  */
@@ -3993,6 +3949,62 @@ pass_vsetvl::pre_vsetvl (void)
 commit_edge_insertions ();
 }
 
+/* Local user vsetvl optimizaiton:
+
+ Case 1:
+   vsetvl a5,a4,e8,mf8
+   ...
+   vsetvl zero,a5,e8,mf8 --> Eliminate directly.
+
+ Case 2:
+   vsetvl a5,a4,e8,mf8--> vsetvl a5,a4,e32,mf2
+   ...
+   vsetvl zero,a5,e32,mf2 --> Eliminate directly.  */
+void
+pass_vsetvl::local_eliminate_vsetvl_insn (const vector_insn_info &dem) const
+{
+  const insn_info *insn = dem.get_insn ();
+  if (!insn || insn->is_artificial ())
+return;
+  rtx_insn *rinsn = insn->rtl ();
+  const bb_info *bb = insn->bb ();
+  if (vsetvl_insn_p (rinsn))
+{
+  rtx vl = get_vl (rinsn);
+  for (insn_info *i = insn->next_nondebug_insn ();
+  real_insn_and_same_bb_p (i, bb); i = i->next_nondebug_insn ())
+   {
+ if (i->is_call () || i->is_asm ()
+ || find_access (i->defs (), VL_REGNUM)
+ || find_access (i->defs (), VTYPE_REGNUM))
+   return;
+
+ if (has_vtype_op (i->rtl ()))
+   {
+ if (!vsetvl_discard_result_insn_p (PREV_INSN (i->rtl (
+   return;
+ rtx avl = get_avl (i->rtl ());
+ if (avl != vl)
+   return;
+ set_info *def = find_access (i->uses (), REGNO (avl))->def ();
+ if (def->insn () != insn)
+   return;
+

[PATCH V4] RISC-V: Enable basic RVV auto-vectorization support.

2023-05-05 Thread juzhe . zhong

From: Juzhe-Zhong 

This patch depends on
https://patchwork.sourceware.org/project/gcc/patch/20230504054544.203366-1-juzhe.zh...@rivai.ai/
Fix code according to Kito's comments.

gcc/ChangeLog:

* config/riscv/riscv.cc (riscv_get_arg_info): Move RVV type argument 
handling outside.

---
 gcc/config/riscv/riscv.cc | 16 
 1 file changed, 8 insertions(+), 8 deletions(-)

diff --git a/gcc/config/riscv/riscv.cc b/gcc/config/riscv/riscv.cc
index 1a35e02796d..8d3cd4261d2 100644
--- a/gcc/config/riscv/riscv.cc
+++ b/gcc/config/riscv/riscv.cc
@@ -3791,16 +3791,16 @@ riscv_get_arg_info (struct riscv_arg_info *info, const CUMULATIVE_ARGS *cum,
   info->gpr_offset = cum->num_gprs;
   info->fpr_offset = cum->num_fprs;
 
+  /* TODO: Currently, it will produce ICE for --param
+ riscv-autovec-preference=fixed-vlmax. So, we just return NULL_RTX here
+ let GCC genearte loads/stores. Ideally, GCC should either report
+ Warning message to tell user do not use RVV vector type in function
+ arg, or GCC just support function arg calling convention for RVV
+ directly.  */
+  if (riscv_v_ext_mode_p (mode))
+return NULL_RTX;
   if (named)
 {
-  /* TODO: Currently, it will produce ICE for --param
-riscv-autovec-preference=fixed-vlmax. So, we just return NULL_RTX here
-let GCC genearte loads/stores. Ideally, GCC should either report
-Warning message to tell user do not use RVV vector type in function
-arg, or GCC just support function arg calling convention for RVV
-directly.  */
-  if (riscv_v_ext_mode_p (mode))
-   return NULL_RTX;
   riscv_aggregate_field fields[2];
   unsigned fregno = fpr_base + info->fpr_offset;
   unsigned gregno = gpr_base + info->gpr_offset;
-- 
2.36.3
