date:20230515

Re: [PATCH V7] VECT: Add decrement IV support in Loop Vectorizer

2023-05-15 Thread Richard Sandiford via Gcc-patches

"juzhe.zh...@rivai.ai"  writes:
>>> The examples are good, but this one made me wonder: why is the
>>> adjustment made to the limit (namely 16, the gap between _39 and _41)
>>> different from the limits imposed by the MIN_EXPR (32)?  And I think
>>> the answer is that:
>
>>> - _47 counts the number of elements processed by the loop in total,
>>>   including the vectors under the control of _44
>
>>> - _44 counts the number of elements controlled by _47 in the next
>>>   iteration of the vector loop (if there is one)
>
>>> And that's needed to allow the IVs to be updated independently.
>
>>> The difficulty with this is that the len_load* and len_store*
>>> optabs currently say that the behaviour is undefined if the
>>> length argument is greater than the length of a vector.
>>> So I think using these values of _47 and _44 in the .LEN_STOREs
>>> is relying on undefined behaviour.
>
>>> Haven't had time to think about the consequences of that yet,
>>> but wanted to send something out sooner rather than later.
>
> Hi, Richard. I totally understand your concern now. I think the undefined
> behavior is more appropriate for RVV since we have the vsetvli instruction
> that guarantee this will cause potential issues. However, for some other
> target, we may need to use an additional MIN_EXPR to guard the length so
> that it is never over VF. I think it can be addressed in the future when
> it is needed.

But we can't generate (vector) gimple that has undefined behaviour from
(scalar) gimple that had defined behaviour.  So something needs to change.
Either we need to generate a different sequence, or we need to define
what the behaviour of len_load/store/etc. is when the length is out of
range (perhaps under a target hook?).

We also need to be consistent.  If case 2 is allowed to use length
parameters that are greater than the vector length, then there's no
reason for case 1 to use the result of the MIN_EXPR as the length
parameter.  It could just use the loop IV directly.  (I realise the
select_vl patch will change case 1 for RVV anyway.  But the principle
still holds.)

What does the riscv backend's implementation of the len_load and
len_store guarantee?  Is any length greater than the vector length
capped to the vector length?  Or is it more complicated than that?

Thanks,
Richard
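
To make the question above concrete, here is a minimal scalar model of a
length-controlled store under the "capped" interpretation that Richard asks
about. It is only a sketch: VEC_ELEMS and len_store_capped are hypothetical
names chosen for illustration, not GCC optabs or RVV intrinsics, and whether
the real len_store behaves this way is exactly what the thread leaves open.

```c
#include <assert.h>
#include <stddef.h>

/* Hypothetical number of elements in one vector; not taken from the thread.  */
#define VEC_ELEMS 4

/* Scalar model of a length-controlled store under the assumed "capped"
   semantics: a LEN greater than VEC_ELEMS behaves like VEC_ELEMS.
   Returns the number of elements actually written.  */
static size_t
len_store_capped (int *dest, const int *vec, size_t len)
{
  assert (dest != NULL && vec != NULL);
  size_t active = len < VEC_ELEMS ? len : VEC_ELEMS;  /* MIN (len, VEC_ELEMS) */
  for (size_t i = 0; i < active; ++i)
    dest[i] = vec[i];
  return active;
}
```

Under this model an out-of-range length is harmless, which is why the
MIN_EXPR in case 1 would become redundant if the semantics were defined
this way.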

[PATCH] RISC-V: Adjust stdint.h to stdint-gcc.h for rvv tests

2023-05-15 Thread Pan Li via Gcc-patches

From: Pan Li 

This patch would like to align the stdint.h to the stdint-gcc.h for all
the RVV test files. Aka:

stdint.h => stdint-gcc.h

Signed-off-by: Pan Li 

gcc/testsuite/ChangeLog:

* gcc.target/riscv/rvv/autovec/binop/shift-scalar-template.h:
Replace stdint.h with stdint-gcc.h.
* gcc.target/riscv/rvv/autovec/binop/shift-template.h: Ditto.
* gcc.target/riscv/rvv/autovec/binop/vadd-template.h: Ditto.
* gcc.target/riscv/rvv/autovec/binop/vand-template.h: Ditto.
* gcc.target/riscv/rvv/autovec/binop/vdiv-template.h: Ditto.
* gcc.target/riscv/rvv/autovec/binop/vmax-template.h: Ditto.
* gcc.target/riscv/rvv/autovec/binop/vmin-template.h: Ditto.
* gcc.target/riscv/rvv/autovec/binop/vmul-template.h: Ditto.
* gcc.target/riscv/rvv/autovec/binop/vor-template.h: Ditto.
* gcc.target/riscv/rvv/autovec/binop/vrem-template.h: Ditto.
* gcc.target/riscv/rvv/autovec/binop/vsub-template.h: Ditto.
* gcc.target/riscv/rvv/autovec/binop/vxor-template.h: Ditto.
* gcc.target/riscv/rvv/autovec/series-1.c: Ditto.
* gcc.target/riscv/rvv/autovec/vmv-imm-run.c: Ditto.
* gcc.target/riscv/rvv/autovec/vmv-imm-template.h: Ditto.
---
 .../gcc.target/riscv/rvv/autovec/binop/shift-scalar-template.h  | 2 +-
 .../gcc.target/riscv/rvv/autovec/binop/shift-template.h | 2 +-
 .../gcc.target/riscv/rvv/autovec/binop/vadd-template.h  | 2 +-
 .../gcc.target/riscv/rvv/autovec/binop/vand-template.h  | 2 +-
 .../gcc.target/riscv/rvv/autovec/binop/vdiv-template.h  | 2 +-
 .../gcc.target/riscv/rvv/autovec/binop/vmax-template.h  | 2 +-
 .../gcc.target/riscv/rvv/autovec/binop/vmin-template.h  | 2 +-
 .../gcc.target/riscv/rvv/autovec/binop/vmul-template.h  | 2 +-
 gcc/testsuite/gcc.target/riscv/rvv/autovec/binop/vor-template.h | 2 +-
 .../gcc.target/riscv/rvv/autovec/binop/vrem-template.h  | 2 +-
 .../gcc.target/riscv/rvv/autovec/binop/vsub-template.h  | 2 +-
 .../gcc.target/riscv/rvv/autovec/binop/vxor-template.h  | 2 +-
 gcc/testsuite/gcc.target/riscv/rvv/autovec/series-1.c   | 2 +-
 gcc/testsuite/gcc.target/riscv/rvv/autovec/vmv-imm-run.c| 2 +-
 gcc/testsuite/gcc.target/riscv/rvv/autovec/vmv-imm-template.h   | 2 +-
 15 files changed, 15 insertions(+), 15 deletions(-)

diff --git 
a/gcc/testsuite/gcc.target/riscv/rvv/autovec/binop/shift-scalar-template.h 
b/gcc/testsuite/gcc.target/riscv/rvv/autovec/binop/shift-scalar-template.h
index a0ddc00849d..8d1cefdca85 100644
--- a/gcc/testsuite/gcc.target/riscv/rvv/autovec/binop/shift-scalar-template.h
+++ b/gcc/testsuite/gcc.target/riscv/rvv/autovec/binop/shift-scalar-template.h
@@ -2,7 +2,7 @@
 /* { dg-do run } */
 /* { dg-additional-options "-std=c99 --param=riscv-autovec-preference=scalable 
-fno-vect-cost-model --save-temps" } */
 
-#include <stdint.h>
+#include <stdint-gcc.h>
 #include 
 
 #define SHIFTL(TYPE,VAL)   \
diff --git a/gcc/testsuite/gcc.target/riscv/rvv/autovec/binop/shift-template.h 
b/gcc/testsuite/gcc.target/riscv/rvv/autovec/binop/shift-template.h
index 64e0a386b06..16ae48c8ede 100644
--- a/gcc/testsuite/gcc.target/riscv/rvv/autovec/binop/shift-template.h
+++ b/gcc/testsuite/gcc.target/riscv/rvv/autovec/binop/shift-template.h
@@ -1,4 +1,4 @@
-#include <stdint.h>
+#include <stdint-gcc.h>
 
 #define TEST1_TYPE(TYPE)   \
   __attribute__((noipa))   \
diff --git a/gcc/testsuite/gcc.target/riscv/rvv/autovec/binop/vadd-template.h 
b/gcc/testsuite/gcc.target/riscv/rvv/autovec/binop/vadd-template.h
index 5ed79329138..cd945d471d2 100644
--- a/gcc/testsuite/gcc.target/riscv/rvv/autovec/binop/vadd-template.h
+++ b/gcc/testsuite/gcc.target/riscv/rvv/autovec/binop/vadd-template.h
@@ -1,4 +1,4 @@
-#include <stdint.h>
+#include <stdint-gcc.h>
 
 #define TEST_TYPE(TYPE)\
   __attribute__((noipa))   \
diff --git a/gcc/testsuite/gcc.target/riscv/rvv/autovec/binop/vand-template.h 
b/gcc/testsuite/gcc.target/riscv/rvv/autovec/binop/vand-template.h
index 7d02c83d164..5cabe073097 100644
--- a/gcc/testsuite/gcc.target/riscv/rvv/autovec/binop/vand-template.h
+++ b/gcc/testsuite/gcc.target/riscv/rvv/autovec/binop/vand-template.h
@@ -1,4 +1,4 @@
-#include <stdint.h>
+#include <stdint-gcc.h>
 
 #define TEST_TYPE(TYPE)\
   __attribute__((noipa))   \
diff --git a/gcc/testsuite/gcc.target/riscv/rvv/autovec/binop/vdiv-template.h 
b/gcc/testsuite/gcc.target/riscv/rvv/autovec/binop/vdiv-template.h
index 7fbba7b4133..12a1de32874 100644
--- a/gcc/testsuite/gcc.target/riscv/rvv/autovec/binop/vdiv-template.h
+++ b/gcc/testsuite/gcc.target/riscv/rvv/autovec/binop/vdiv-template.h
@@ -1,4 +1,4 @@
-#include <stdint.h>
+#include <stdint-gcc.h>
 
 #define TEST_TYPE(TYPE)\
   __attribute__((noipa))   \
diff --git a/gcc/testsuite/gcc.target/riscv/rvv/autove

[PATCH] s390: Implement TARGET_ATOMIC_ALIGN_FOR_MODE

2023-05-15 Thread Stefan Schulze Frielinghaus via Gcc-patches

So far atomic objects are aligned according to their default alignment.
For 128 bit scalar types like int128 or long double this results in an
8 byte alignment which is wrong and must be 16 byte.

libstdc++ already computes a correct alignment, though, still adding a
test case in order to make sure that both implementations are
compatible.

Bootstrapped and regtested.  Ok for mainline?  Since this is an ABI
break, is a backport to GCC 13 reasonable?

gcc/ChangeLog:

* config/s390/s390.cc (TARGET_ATOMIC_ALIGN_FOR_MODE):
New.
(s390_atomic_align_for_mode): New.

gcc/testsuite/ChangeLog:

* g++.target/s390/atomic-align-1.C: New test.
* gcc.target/s390/atomic-align-1.c: New test.
* gcc.target/s390/atomic-align-2.c: New test.
---
 gcc/config/s390/s390.cc   |  8 ++
 .../g++.target/s390/atomic-align-1.C  | 25 +++
 .../gcc.target/s390/atomic-align-1.c  | 23 +
 .../gcc.target/s390/atomic-align-2.c  | 18 +
 4 files changed, 74 insertions(+)
 create mode 100644 gcc/testsuite/g++.target/s390/atomic-align-1.C
 create mode 100644 gcc/testsuite/gcc.target/s390/atomic-align-1.c
 create mode 100644 gcc/testsuite/gcc.target/s390/atomic-align-2.c

diff --git a/gcc/config/s390/s390.cc b/gcc/config/s390/s390.cc
index 505de995da8..4813bf91dc4 100644
--- a/gcc/config/s390/s390.cc
+++ b/gcc/config/s390/s390.cc
@@ -450,6 +450,14 @@ s390_preserve_fpr_arg_p (int regno)
  && regno >= FPR0_REGNUM);
 }
 
+#undef TARGET_ATOMIC_ALIGN_FOR_MODE
+#define TARGET_ATOMIC_ALIGN_FOR_MODE s390_atomic_align_for_mode
+static unsigned int
+s390_atomic_align_for_mode (machine_mode mode)
+{
+  return GET_MODE_BITSIZE (mode);
+}
+
 /* A couple of shortcuts.  */
 #define CONST_OK_FOR_J(x) \
CONST_OK_FOR_CONSTRAINT_P((x), 'J', "J")
diff --git a/gcc/testsuite/g++.target/s390/atomic-align-1.C 
b/gcc/testsuite/g++.target/s390/atomic-align-1.C
new file mode 100644
index 000..43aa0bc39ed
--- /dev/null
+++ b/gcc/testsuite/g++.target/s390/atomic-align-1.C
@@ -0,0 +1,25 @@
+/* { dg-do compile { target int128 } } */
+/* { dg-options "-std=c++11" } */
+/* { dg-final { scan-assembler-times {\.align\t2} 2 } } */
+/* { dg-final { scan-assembler-times {\.align\t4} 2 } } */
+/* { dg-final { scan-assembler-times {\.align\t8} 3 } } */
+/* { dg-final { scan-assembler-times {\.align\t16} 2 } } */
+
+#include <atomic>
+
+// 2
+std::atomic<char> var_char;
+std::atomic<short> var_short;
+// 4
+std::atomic<int> var_int;
+// 8
+std::atomic<long> var_long;
+std::atomic<long long> var_long_long;
+// 16
+std::atomic<__int128> var_int128;
+// 4
+std::atomic<float> var_float;
+// 8
+std::atomic<double> var_double;
+// 16
+std::atomic<long double> var_long_double;
diff --git a/gcc/testsuite/gcc.target/s390/atomic-align-1.c 
b/gcc/testsuite/gcc.target/s390/atomic-align-1.c
new file mode 100644
index 000..b2e1233e3ee
--- /dev/null
+++ b/gcc/testsuite/gcc.target/s390/atomic-align-1.c
@@ -0,0 +1,23 @@
+/* { dg-do compile { target int128 } } */
+/* { dg-options "-std=c11" } */
+/* { dg-final { scan-assembler-times {\.align\t2} 2 } } */
+/* { dg-final { scan-assembler-times {\.align\t4} 2 } } */
+/* { dg-final { scan-assembler-times {\.align\t8} 3 } } */
+/* { dg-final { scan-assembler-times {\.align\t16} 2 } } */
+
+// 2
+_Atomic char var_char;
+_Atomic short var_short;
+// 4
+_Atomic int var_int;
+// 8
+_Atomic long var_long;
+_Atomic long long var_long_long;
+// 16
+_Atomic __int128 var_int128;
+// 4
+_Atomic float var_float;
+// 8
+_Atomic double var_double;
+// 16
+_Atomic long double var_long_double;
diff --git a/gcc/testsuite/gcc.target/s390/atomic-align-2.c 
b/gcc/testsuite/gcc.target/s390/atomic-align-2.c
new file mode 100644
index 000..0bf17341bf8
--- /dev/null
+++ b/gcc/testsuite/gcc.target/s390/atomic-align-2.c
@@ -0,0 +1,18 @@
+/* { dg-do compile { target int128 } } */
+/* { dg-options "-O -std=c11" } */
+/* { dg-final { scan-assembler-not {abort} } } */
+
+/* The stack is 8 byte aligned which means GCC has to manually align a 16 byte
+   aligned object.  This is done by allocating not 16 but rather 24 bytes for
+   variable X and then manually aligning a pointer inside the memory block.
+   Validate this by ensuring that the if-statement is optimized out.  */
+
+void bar (_Atomic unsigned __int128 *ptr);
+
+void foo (void) {
+  _Atomic unsigned __int128 x;
+  unsigned long n = (unsigned long)&x;
+  if (n % 16 != 0)
+    __builtin_abort ();
+  bar (&x);
+}
-- 
2.39.2
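
The comment in atomic-align-2.c describes over-allocating a block and then
aligning a pointer inside it. A minimal sketch of that arithmetic follows;
align_up and align16_in_block are hypothetical helper names for illustration,
not functions from the s390 backend, and the 8-byte alignment of the block is
an assumption stated in the test comment rather than something this code
checks.

```c
#include <assert.h>
#include <stdint.h>

/* Round ADDR up to the next multiple of ALIGN, which must be a power of two.  */
static uintptr_t
align_up (uintptr_t addr, uintptr_t align)
{
  assert (align != 0 && (align & (align - 1)) == 0);
  return (addr + align - 1) & ~(align - 1);
}

/* Return a 16-byte-aligned pointer inside BLOCK, a 24-byte buffer that is
   assumed to be at least 8-byte aligned.  At most 8 bytes are skipped, so
   16 usable bytes always remain.  */
static void *
align16_in_block (unsigned char *block)
{
  return (void *) align_up ((uintptr_t) block, 16);
}
```

With an 8-byte-aligned start, the offset into the block is either 0 or 8,
which is why 24 bytes are enough to hold one 16-byte object.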

[PATCH v5 1/4] rs6000: Enable REE pass by default

2023-05-15 Thread Ajit Agarwal via Gcc-patches

Hello All:

This patch enables the REE pass by default for the rs6000 target.
Bootstrapped and regtested for powerpc64-linux-gnu.

Thanks & Regards
Ajit

rs6000: Enable REE pass by default

Add the REE pass as a default pass for the rs6000 target at
-O2 and above.

2023-05-16  Ajit Kumar Agarwal  

gcc/ChangeLog:

* common/config/rs6000/rs6000-common.cc: Add REE pass as a
default rs6000 target pass for O2 and above.
* doc/invoke.texi: Document -free
---
 gcc/common/config/rs6000/rs6000-common.cc | 2 ++
 gcc/doc/invoke.texi   | 4 ++--
 2 files changed, 4 insertions(+), 2 deletions(-)

diff --git a/gcc/common/config/rs6000/rs6000-common.cc 
b/gcc/common/config/rs6000/rs6000-common.cc
index 2140c442ba9..968db215028 100644
--- a/gcc/common/config/rs6000/rs6000-common.cc
+++ b/gcc/common/config/rs6000/rs6000-common.cc
@@ -34,6 +34,8 @@ static const struct default_options 
rs6000_option_optimization_table[] =
 { OPT_LEVELS_ALL, OPT_fsplit_wide_types_early, NULL, 1 },
 /* Enable -fsched-pressure for first pass instruction scheduling.  */
 { OPT_LEVELS_1_PLUS, OPT_fsched_pressure, NULL, 1 },
+/* Enable -free for zero extension and sign extension elimination.*/
+{ OPT_LEVELS_2_PLUS, OPT_free, NULL, 1 },
 /* Enable -munroll-only-small-loops with -funroll-loops to unroll small
loops at -O2 and above by default.  */
 { OPT_LEVELS_2_PLUS_SPEED_ONLY, OPT_funroll_loops, NULL, 1 },
diff --git a/gcc/doc/invoke.texi b/gcc/doc/invoke.texi
index b92b8576027..168fcc88b1d 100644
--- a/gcc/doc/invoke.texi
+++ b/gcc/doc/invoke.texi
@@ -12455,8 +12455,8 @@ Attempt to remove redundant extension instructions.  
This is especially
 helpful for the x86-64 architecture, which implicitly zero-extends in 64-bit
 registers after writing to their lower 32-bit half.
 
-Enabled for Alpha, AArch64 and x86 at levels @option{-O2},
-@option{-O3}, @option{-Os}.
+Enabled for Alpha, AArch64, RS/6000, RISC-V, SPARC, h83000 and x86 at levels 
+@option{-O2}, @option{-O3}, @option{-Os}.
 
 @opindex fno-lifetime-dse
 @opindex flifetime-dse
-- 
2.31.1

Re: [Testsuite] Skip -fdelete-null-pointer-check tests if target keeps_null_pointer_checks

2023-05-15 Thread Jeff Law via Gcc-patches





On 5/14/23 23:06, SenthilKumar.Selvaraj--- via Gcc-patches wrote:

Hi,

When running regression tests related to 
https://gcc.gnu.org/pipermail/gcc-patches/2023-April/616772.html,
I noticed a bunch of failures because some tests explicitly pass in
-fdelete-null-pointer-checks, even if the target is configured to keep them.

This patch skips such failing tests by adding a dg-skip-if for 
keeps_null_pointer_checks.
Ok to commit?

Regards
Senthil

gcc/testsuite/ChangeLog:

* gcc.dg/attr-returns-nonnull.c: Skip if
keeps_null_pointer_checks.
* gcc.dg/init-compare-1.c: Likewise.
* gcc.dg/ipa/pr85734.c: Likewise.
* gcc.dg/ipa/propmalloc-1.c: Likewise.
* gcc.dg/ipa/propmalloc-2.c: Likewise.
* gcc.dg/ipa/propmalloc-3.c: Likewise.
* gcc.dg/ipa/propmalloc-4.c: Likewise.
* gcc.dg/tree-ssa/evrp11.c: Likewise.
* gcc.dg/tree-ssa/pr83648.c: Likewise.

OK.
jeff

Re: Re: [PATCH V7] VECT: Add decrement IV support in Loop Vectorizer

2023-05-15 Thread juzhe.zh...@rivai.ai

>> The examples are good, but this one made me wonder: why is the
>> adjustment made to the limit (namely 16, the gap between _39 and _41)
>> different from the limits imposed by the MIN_EXPR (32)?  And I think
>> the answer is that:

>> - _47 counts the number of elements processed by the loop in total,
>>   including the vectors under the control of _44

>> - _44 counts the number of elements controlled by _47 in the next
>>   iteration of the vector loop (if there is one)

>> And that's needed to allow the IVs to be updated independently.

>> The difficulty with this is that the len_load* and len_store*
>> optabs currently say that the behaviour is undefined if the
>> length argument is greater than the length of a vector.
>> So I think using these values of _47 and _44 in the .LEN_STOREs
>> is relying on undefined behaviour.

>> Haven't had time to think about the consequences of that yet,
>> but wanted to send something out sooner rather than later.

Hi, Richard. I totally understand your concern now. I think the undefined
behavior is more appropriate for RVV since we have the vsetvli instruction
that guarantee this will cause potential issues. However, for some other
target, we may need to use an additional MIN_EXPR to guard the length so
that it is never over VF. I think it can be addressed in the future when
it is needed.

For now, is the V9 patch OK for trunk?
https://gcc.gnu.org/pipermail/gcc-patches/2023-May/618638.html
It fixes the comments as you suggested.

Besides, we are going to add more patterns that take a length:
len_mask_load/len_mask_stores, len_mask_gather_load/len_cond... etc.
They all have undefined behavior for a length larger than the vector length.

Thanks.


juzhe.zh...@rivai.ai
 
From: Richard Sandiford
Date: 2023-05-16 03:44
To: juzhe.zhong
CC: gcc-patches; rguenther
Subject: Re: [PATCH V7] VECT: Add decrement IV support in Loop Vectorizer
juzhe.zh...@rivai.ai writes:
> From: Juzhe-Zhong 
>
> This patch implements a decrementing IV for the length approach in loop
> control.
>
> Address comment from Kewen to incorporate the implementation inside
> "vect_set_loop_controls_directly" instead of a standalone function.
>
> Address comment from Richard to use MIN_EXPR to handle the following
> 3 cases:
> 1. single rgroup.
> 2. multiple rgroup for SLP.
> 3. multiple rgroup for non-SLP (tested on vec_pack_trunc).
 
Thanks, this looks pretty reasonable to me FWIW, but some comments below:
 
> Bootstrapped and regression-tested on x86.
>
> Ok for trunk ?
>
> gcc/ChangeLog:
>
> * tree-vect-loop-manip.cc (vect_adjust_loop_lens): New function.
> (vect_set_loop_controls_directly): Add decrement IV support.
> (vect_set_loop_condition_partial_vectors): Ditto.
> * tree-vect-loop.cc (_loop_vec_info::_loop_vec_info): Add a new 
> variable.
> (vect_get_loop_len): Add decrement IV support.
> * tree-vect-stmts.cc (vectorizable_store): Ditto.
> (vectorizable_load): Ditto.
> * tree-vectorizer.h (LOOP_VINFO_USING_DECREMENTING_IV_P): New macro.
> (vect_get_loop_len): Add decrement IV support.
>
> ---
>  gcc/tree-vect-loop-manip.cc | 177 +++-
>  gcc/tree-vect-loop.cc   |  38 +++-
>  gcc/tree-vect-stmts.cc  |   9 +-
>  gcc/tree-vectorizer.h   |  13 ++-
>  4 files changed, 224 insertions(+), 13 deletions(-)
>
> diff --git a/gcc/tree-vect-loop-manip.cc b/gcc/tree-vect-loop-manip.cc
> index ff6159e08d5..1baac7b1b52 100644
> --- a/gcc/tree-vect-loop-manip.cc
> +++ b/gcc/tree-vect-loop-manip.cc
> @@ -385,6 +385,58 @@ vect_maybe_permute_loop_masks (gimple_seq *seq, 
> rgroup_controls *dest_rgm,
>return false;
>  }
>  
> +/* Try to use adjust loop lens for non-SLP multiple-rgroups.
> +
> + _36 = MIN_EXPR ;
> +
> + First length (MIN (X, VF/N)):
> +   loop_len_15 = MIN_EXPR <_36, VF/N>;
> +
> + Second length:
> +   tmp = _36 - loop_len_15;
> +   loop_len_16 = MIN (tmp, VF/N);
> +
> + Third length:
> +   tmp2 = tmp - loop_len_16;
> +   loop_len_17 = MIN (tmp2, VF/N);
> +
> + Fourth length:
> +   tmp3 = tmp2 - loop_len_17;
> +   loop_len_18 = MIN (tmp3, VF/N);  */
> +
> +static void
> +vect_adjust_loop_lens (tree iv_type, gimple_seq *seq, rgroup_controls 
> *dest_rgm,
> +rgroup_controls *src_rgm)
> +{
> +  tree ctrl_type = dest_rgm->type;
> +  poly_uint64 nitems_per_ctrl
> += TYPE_VECTOR_SUBPARTS (ctrl_type) * dest_rgm->factor;
> +
> +  for (unsigned int i = 0; i < dest_rgm->controls.length (); ++i)
> +{
> +  tree src = src_rgm->controls[i / dest_rgm->controls.length ()];
> +  tree dest = dest_rgm->controls[i];
> +  tree length_limit = build_int_cst (iv_type, nitems_per_ctrl);
> +  gassign *stmt;
> +  if (i == 0)
> + {
> +   /* MIN (X, VF*I/N) capped to the range [0, VF/N].  */
> +   stmt = gimple_build_assign (dest, MIN_EXPR, src, length_limit);
> +   gimple_seq_add_stmt (seq, stmt);
> + }
> +  else
> + {
> +   /* (MIN (remain, VF*I/N))
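
The length splitting described in the vect_adjust_loop_lens comment above
can be modelled in plain C. This is a sketch of the arithmetic only, not the
gimple the patch builds: split_loop_lens and min_size are hypothetical
names, total stands for the MIN_EXPR result the comment calls _36, and
per_ctrl stands for VF/N.

```c
#include <assert.h>
#include <stddef.h>

static size_t
min_size (size_t a, size_t b)
{
  return a < b ? a : b;
}

/* Split TOTAL into N per-vector lengths, each capped at PER_CTRL.
   lens[0] = MIN (total, per_ctrl); every later length is the MIN of the
   elements still remaining and per_ctrl.  */
static void
split_loop_lens (size_t total, size_t per_ctrl, size_t n, size_t *lens)
{
  assert (lens != NULL);
  size_t remain = total;
  for (size_t i = 0; i < n; ++i)
    {
      lens[i] = min_size (remain, per_ctrl);
      remain -= lens[i];
    }
}
```

For example, with total = 10, per_ctrl = 4 and n = 4 the lengths are 4, 4, 2
and 0, so every individual length stays within one vector.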

Re: [gcc13 backport] RISCV: Inline subword atomic ops

2023-05-15 Thread Jeff Law via Gcc-patches





On 5/9/23 10:01, Patrick O'Neill wrote:

Ping.

OK for backporting.  Sorry for the delay.

jeff

Re: [PATCH] riscv: Add autovectorization tests for binary integer

2023-05-15 Thread Jeff Law via Gcc-patches





On 5/15/23 03:15, juzhe.zh...@rivai.ai wrote:

I think it is an issue with the include file.

Kito may know a better solution than changing stdint.h into
stdint-gcc.h.

I think that's the only solution right now.  I'm not keen to open up the
multilib can of worms.


Consider a patch that changes stdint.h -> stdint-gcc.h in the RVV 
testsuite pre-approved.


jeff

Re: [PATCH v9] RISC-V: Add the 'zfa' extension, version 0.2

2023-05-15 Thread Jeff Law via Gcc-patches





On 5/15/23 07:16, Jin Ma wrote:

This patch adds the 'Zfa' extension for riscv, which is based on:
https://github.com/riscv/riscv-isa-manual/commits/zfb

The binutils-gdb for 'Zfa' extension:
https://sourceware.org/pipermail/binutils/2023-April/127060.html

What needs special explanation is:
1, The immediate of the instructions FLI.H/S/D is represented in the assembly
   as a floating-point value, in scientific notation when rs1 is 2 or 3, and
   as a decimal number for the rest.

   Related llvm link:
 https://reviews.llvm.org/D145645
   Related discussion link:
 https://github.com/riscv/riscv-isa-manual/issues/980

2, According to riscv-spec, "The FCVTMOD.W.D instruction was added principally
   to accelerate the processing of JavaScript Numbers.", so it seems that no
   implementation is required.

3, The instructions FMINM and FMAXM correspond to the C23 library functions
   fminimum and fmaximum. Therefore, this patch has simply implemented the
   patterns fminm3 and fmaxm3 to prepare for later.
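
Point 1 above, together with the "return the table index or -1" contract of
riscv_float_const_rtx_index_for_fli, can be illustrated with a small lookup
over doubles. This is a sketch under stated assumptions: fli_values,
fli_index and fli_value are hypothetical names, the code works on host
doubles rather than on rtx constants, and the 32 table entries are written
from my reading of the Zfa draft, so they should be checked against the
specification before being relied on.

```c
#include <assert.h>
#include <float.h>
#include <math.h>

/* Assumed FLI.D immediates, indexed by the rs1 field.  Entry 1 is the
   minimum positive normal value, entry 30 is +infinity, entry 31 is NaN.  */
static const double fli_values[32] = {
  -1.0, DBL_MIN, 0x1p-16, 0x1p-15, 0x1p-8, 0x1p-7, 0.0625, 0.125,
  0.25, 0.3125, 0.375, 0.4375, 0.5, 0.625, 0.75, 0.875,
  1.0, 1.25, 1.5, 1.75, 2.0, 2.5, 3.0, 4.0,
  8.0, 16.0, 128.0, 256.0, 0x1p15, 0x1p16, INFINITY, NAN
};

/* Return the table index whose value equals X, or -1 if X cannot be
   loaded by a single FLI.  NaN never compares equal, so handle it first.  */
static int
fli_index (double x)
{
  if (x != x)
    return 31;
  for (int i = 0; i < 31; ++i)
    if (fli_values[i] == x)
      return i;
  return -1;
}

/* Inverse mapping, as needed when printing the operand as a value.  */
static double
fli_value (int idx)
{
  assert (idx >= 0 && idx < 32);
  return fli_values[idx];
}
```

Note that 0.0 is not in the table, so fli_index (0.0) reports -1 in this
model.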

gcc/ChangeLog:

* common/config/riscv/riscv-common.cc: Add zfa extension version.
* config/riscv/constraints.md (zfli): Constrain the floating point 
number that the
instructions FLI.H/S/D can load.
* config/riscv/iterators.md (ceil): New.
(rup): New.
* config/riscv/riscv-opts.h (MASK_ZFA): New.
(TARGET_ZFA): New.
* config/riscv/riscv-protos.h (riscv_float_const_rtx_index_for_fli): 
New.
* config/riscv/riscv.cc (riscv_float_const_rtx_index_for_fli): New.
(riscv_cannot_force_const_mem): If instruction FLI.H/S/D can be used, 
memory is not applicable.
(riscv_const_insns): Likewise.
(riscv_legitimize_const_move): Likewise.
(riscv_split_64bit_move_p): If instruction FLI.H/S/D can be used, no 
split is required.
(riscv_split_doubleword_move): Likewise.
(riscv_output_move): Output the mov instructions in zfa extension.
(riscv_print_operand): Output the floating-point value of the FLI.H/S/D 
immediate in assembly
(riscv_secondary_memory_needed): Likewise.
* config/riscv/riscv.md (fminm3): New.
(fmaxm3): New.
(movsidf2_low_rv32): New.
(movsidf2_high_rv32): New.
(movdfsisi3_rv32): New.
(f_quiet4_zfa): Likewise.

gcc/testsuite/ChangeLog:

* gcc.target/riscv/zfa-fleq-fltq-rv32.c: New test.
* gcc.target/riscv/zfa-fleq-fltq.c: New test.
* gcc.target/riscv/zfa-fli-rv32.c: New test.
* gcc.target/riscv/zfa-fli-zfh-rv32.c: New test.
* gcc.target/riscv/zfa-fli-zfh.c: New test.
* gcc.target/riscv/zfa-fli.c: New test.
* gcc.target/riscv/zfa-fmovh-fmovp-rv32.c: New test.
* gcc.target/riscv/zfa-fround-rv32.c: New test.
* gcc.target/riscv/zfa-fround.c: New test.
---
  gcc/common/config/riscv/riscv-common.cc   |   4 +
  gcc/config/riscv/constraints.md   |  21 +-
  gcc/config/riscv/iterators.md |   5 +
  gcc/config/riscv/riscv-opts.h |   3 +
  gcc/config/riscv/riscv-protos.h   |   1 +
  gcc/config/riscv/riscv.cc | 204 +-
  gcc/config/riscv/riscv.md | 145 +++--
  .../gcc.target/riscv/zfa-fleq-fltq-rv32.c |  19 ++
  .../gcc.target/riscv/zfa-fleq-fltq.c  |  19 ++
  gcc/testsuite/gcc.target/riscv/zfa-fli-rv32.c |  79 +++
  .../gcc.target/riscv/zfa-fli-zfh-rv32.c   |  41 
  gcc/testsuite/gcc.target/riscv/zfa-fli-zfh.c  |  41 
  gcc/testsuite/gcc.target/riscv/zfa-fli.c  |  79 +++
  .../gcc.target/riscv/zfa-fmovh-fmovp-rv32.c   |  10 +
  .../gcc.target/riscv/zfa-fround-rv32.c|  42 
  gcc/testsuite/gcc.target/riscv/zfa-fround.c   |  42 
  16 files changed, 719 insertions(+), 36 deletions(-)
  create mode 100644 gcc/testsuite/gcc.target/riscv/zfa-fleq-fltq-rv32.c
  create mode 100644 gcc/testsuite/gcc.target/riscv/zfa-fleq-fltq.c
  create mode 100644 gcc/testsuite/gcc.target/riscv/zfa-fli-rv32.c
  create mode 100644 gcc/testsuite/gcc.target/riscv/zfa-fli-zfh-rv32.c
  create mode 100644 gcc/testsuite/gcc.target/riscv/zfa-fli-zfh.c
  create mode 100644 gcc/testsuite/gcc.target/riscv/zfa-fli.c
  create mode 100644 gcc/testsuite/gcc.target/riscv/zfa-fmovh-fmovp-rv32.c
  create mode 100644 gcc/testsuite/gcc.target/riscv/zfa-fround-rv32.c
  create mode 100644 gcc/testsuite/gcc.target/riscv/zfa-fround.c





+
+/* Return index of the FLI instruction table if rtx X is an immediate constant 
that can
+   be moved using a single FLI instruction in zfa extension. Return -1 if not 
found.  */
+
+int
+riscv_float_const_rtx_index_for_fli (rtx x)
+{
+  unsigned HOST_WIDE_INT *fli_value_array;
+
+  machine_mode mode = GET_MODE (x);
+
+  if (!TARGET_ZFA
+  || !CONST_DOUBLE_P(x)
+  || mode == VOIDmode
+  || (mode == HFmode && !TARGET_ZFH)
+  || (mode == SFmode && !TARGET_HARD_FLOAT)
+  || (mode == D

Re: [PATCH v9] RISC-V: Add the 'zfa' extension, version 0.2

2023-05-15 Thread Jeff Law via Gcc-patches





On 5/15/23 07:30, jinma wrote:

According to Jeff's review feedback, the issues regarding UNSPEC's 
implementation of round, ceil, nearbyint, etc. still need to be determined:
https://gcc.gnu.org/pipermail/gcc-patches/2023-May/617706.html

source:
https://github.com/majin2020/gcc-mirror/commit/93d7a2d995cee588d494d1839f56e8151c6cb057
After double-checking I was incorrect.  We have named patterns for those 
operations, but the RTL for them are UNSPECs.  So this is a non-issue 
for this patch.


jeff

Re: [PATCH v8] RISC-V: Add the 'zfa' extension, version 0.2.

2023-05-15 Thread Jeff Law via Gcc-patches





On 5/6/23 06:53, jinma wrote:

diff --git a/gcc/config/riscv/iterators.md b/gcc/config/riscv/iterators.md
index 9b767038452..c81b08e3cc5 100644
--- a/gcc/config/riscv/iterators.md
+++ b/gcc/config/riscv/iterators.md
@@ -288,3 +288,8 @@ (define_int_iterator QUIET_COMPARISON [UNSPEC_FLT_QUIET 
UNSPEC_FLE_QUIET])
   (define_int_attr quiet_pattern [(UNSPEC_FLT_QUIET "lt") (UNSPEC_FLE_QUIET 
"le")])
   (define_int_attr QUIET_PATTERN [(UNSPEC_FLT_QUIET "LT") (UNSPEC_FLE_QUIET 
"LE")])
   
+(define_int_iterator ROUND [UNSPEC_ROUND UNSPEC_FLOOR UNSPEC_CEIL UNSPEC_BTRUNC UNSPEC_ROUNDEVEN UNSPEC_NEARBYINT])

+(define_int_attr round_pattern [(UNSPEC_ROUND "round") (UNSPEC_FLOOR "floor") 
(UNSPEC_CEIL "ceil")
+   (UNSPEC_BTRUNC "btrunc") (UNSPEC_ROUNDEVEN "roundeven") 
(UNSPEC_NEARBYINT "nearbyint")])
+(define_int_attr round_rm [(UNSPEC_ROUND "rmm") (UNSPEC_FLOOR "rdn") (UNSPEC_CEIL 
"rup")
+  (UNSPEC_BTRUNC "rtz") (UNSPEC_ROUNDEVEN "rne") 
(UNSPEC_NEARBYINT "dyn")])

Do we really need to use unspecs for all these cases?  I would expect
some correspond to the trunc, round, ceil, nearbyint, etc well known RTX
codes.

In general, we should try to avoid unspecs when there is a clear
semantic match between the instruction and GCC's RTX opcodes.  So please
review the existing RTX code semantics to see if any match the new
instructions.  If there are matches, use those RTX codes rather than
UNSPECs.


I'll try, thanks.



I encountered some confusion about this. I checked gcc's documents and
found no RTX codes that can correspond to round, ceil, nearbyint, etc.
Only "(fix:m x)" seems to correspond to trunc, which can be expressed
as rounding towards zero, while others have not yet been found.
You're largely correct.  My bad.  There are named patterns for round to
integer, nearbyint, etc., but no RTX codes.  So they need to be handled
as unspecs.  Sorry for the confusion.


Jeff

Re: [PATCH] RISC-V: Add rounding mode operand for floating point instructions

2023-05-15 Thread Jeff Law via Gcc-patches





On 5/15/23 07:54, 钟居哲 wrote:
I don't know why we should not add frm to vfsqrt.v; I saw topper (LLVM
maintainer) say we should not add frm into vsqrt.v. Maybe Kito knows the
reason?

I'm pretty sure this is referring to the estimator.  The documentation
is very clear that the sqrt estimator is independent of the rounding mode.


While it's not as explicit in the RV manual, a real sqrt instruction 
must round to be usable in an IEEE compliant way.  If it didn't honor 
rounding modes we would largely be unable to use the [v]fsqrt instructions.


Jeff

RE: [PATCH V2] RISC-V: Add FRM and rounding mode operand into floating point intrinsics

2023-05-15 Thread Li, Pan2 via Gcc-patches

Committed, thanks Jeff.

Pan

-Original Message-
From: Gcc-patches  On Behalf 
Of Jeff Law via Gcc-patches
Sent: Tuesday, May 16, 2023 11:27 AM
To: juzhe.zh...@rivai.ai; gcc-patches 
Cc: Kito.cheng ; palmer ; Robin 
Dapp 
Subject: Re: [PATCH V2] RISC-V: Add FRM and rounding mode operand into floating 
point intrinsics



On 5/15/23 19:02, juzhe.zh...@rivai.ai wrote:
> Ping.
> 
> Is it Ok for trunk ? I have double checked the floating-point 
> instructions needed FRM.
Yes, this is OK for the trunk.

Thanks,
jeff

Re: [PATCH V2] RISC-V: Add FRM and rounding mode operand into floating point intrinsics

2023-05-15 Thread Jeff Law via Gcc-patches





On 5/15/23 19:02, juzhe.zh...@rivai.ai wrote:

Ping.

Is it Ok for trunk ? I have double checked the floating-point 
instructions needed FRM.

Yes, this is OK for the trunk.

Thanks,
jeff

Re: [PATCH] MATCH: [PR109424] Simplify min/max of boolean arguments

2023-05-15 Thread Jeff Law via Gcc-patches





On 5/15/23 19:36, Andrew Pinski via Gcc-patches wrote:

This is version 2 of
https://gcc.gnu.org/pipermail/gcc-patches/2021-August/577394.html
which does not depend on adding gimple_truth_valued_p at this point.
Instead it uses zero_one_valued_p, which is already used for mult
simplifications, to make sure that we only have [0,1] rather than
mistakenly allowing [-1,0] as the range for signed bools.

This shows up in a few places in GCC itself, but only at -O1; we miss the
min/max conversion because of PR 107888 (which I will be testing separately).

OK? Bootstrapped and tested on x86_64-linux-gnu with no regressions.

Thanks,
Andrew Pinski

PR tree-optimization/109424

gcc/ChangeLog:

* match.pd: Add patterns for min/max of zero_one_valued
values to `&`/`|`.
Not sure it buys us a whole lot.  I guess the strongest argument is 
probably that turning it into a logical helps on targets without min/max 
support.


OK.

jeff

[PATCH] MATCH: [PR109424] Simplify min/max of boolean arguments

2023-05-15 Thread Andrew Pinski via Gcc-patches

This is version 2 of
https://gcc.gnu.org/pipermail/gcc-patches/2021-August/577394.html
which does not depend on adding gimple_truth_valued_p at this point.
Instead it uses zero_one_valued_p, which is already used for mult
simplifications, to make sure that we only have [0,1] rather than
mistakenly allowing [-1,0] as the range for signed bools.
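
The reason the range matters can be checked exhaustively in plain C. The
sketch below is only an illustration of the arithmetic, not of match.pd
itself; minmax_matches_logic, max_int and min_int are hypothetical helper
names. It tests whether max equals `|` and min equals `&` for every pair
drawn from a two-value set.

```c
#include <assert.h>

static int max_int (int a, int b) { return a > b ? a : b; }
static int min_int (int a, int b) { return a < b ? a : b; }

/* Return 1 if MAX == | and MIN == & hold for every pair drawn from the
   two-element value set {LO, HI}, else 0.  */
static int
minmax_matches_logic (int lo, int hi)
{
  assert (lo < hi);
  const int vals[2] = { lo, hi };
  for (int i = 0; i < 2; ++i)
    for (int j = 0; j < 2; ++j)
      {
        int a = vals[i], b = vals[j];
        if (max_int (a, b) != (a | b) || min_int (a, b) != (a & b))
          return 0;
      }
  return 1;
}
```

For the set {0, 1} the identities hold, while for {-1, 0} they fail, since
max (-1, 0) is 0 but -1 | 0 is -1.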

This shows up in a few places in GCC itself, but only at -O1; we miss the
min/max conversion because of PR 107888 (which I will be testing separately).

OK? Bootstrapped and tested on x86_64-linux-gnu with no regressions.

Thanks,
Andrew Pinski

PR tree-optimization/109424

gcc/ChangeLog:

* match.pd: Add patterns for min/max of zero_one_valued
values to `&`/`|`.

gcc/testsuite/ChangeLog:

* gcc.dg/tree-ssa/bool-12.c: New test.
* gcc.dg/tree-ssa/bool-13.c: New test.
* gcc.dg/tree-ssa/minmax-20.c: New test.
* gcc.dg/tree-ssa/minmax-21.c: New test.
---
 gcc/match.pd  |  8 +
 gcc/testsuite/gcc.dg/tree-ssa/bool-12.c   | 44 +++
 gcc/testsuite/gcc.dg/tree-ssa/bool-13.c   | 38 
 gcc/testsuite/gcc.dg/tree-ssa/minmax-20.c | 27 ++
 gcc/testsuite/gcc.dg/tree-ssa/minmax-21.c | 28 +++
 5 files changed, 145 insertions(+)
 create mode 100644 gcc/testsuite/gcc.dg/tree-ssa/bool-12.c
 create mode 100644 gcc/testsuite/gcc.dg/tree-ssa/bool-13.c
 create mode 100644 gcc/testsuite/gcc.dg/tree-ssa/minmax-20.c
 create mode 100644 gcc/testsuite/gcc.dg/tree-ssa/minmax-21.c

diff --git a/gcc/match.pd b/gcc/match.pd
index b025fb8facf..30ffdfcf8bb 100644
--- a/gcc/match.pd
+++ b/gcc/match.pd
@@ -7439,6 +7439,14 @@ DEFINE_INT_AND_FLOAT_ROUND_FN (RINT)
&& TREE_CODE (@0) != INTEGER_CST)
(op @0 (ext @1 @2)
 
+/* Max -> bool0 | bool1
+   Min -> bool0 & bool1 */
+(for op(max min)
+ logic (bit_ior bit_and)
+ (simplify
+  (op zero_one_valued_p@0 zero_one_valued_p@1)
+  (logic @0 @1)))
+
 /* signbit(x) != 0 ? -x : x -> abs(x)
signbit(x) == 0 ? -x : x -> -abs(x) */
 (for sign (SIGNBIT)
diff --git a/gcc/testsuite/gcc.dg/tree-ssa/bool-12.c 
b/gcc/testsuite/gcc.dg/tree-ssa/bool-12.c
new file mode 100644
index 000..e62594e1dad
--- /dev/null
+++ b/gcc/testsuite/gcc.dg/tree-ssa/bool-12.c
@@ -0,0 +1,44 @@
+/* { dg-do compile } */
+/* { dg-options "-O1 -fdump-tree-optimized -fdump-tree-original 
-fdump-tree-phiopt1 -fdump-tree-forwprop2" } */
+#define bool _Bool
+int maxbool(bool ab, bool bb)
+{
+  int a = ab;
+  int b = bb;
+  int c;
+  if (a > b)
+c = a;
+  else
+c = b;
+  return c;
+}
+int minbool(bool ab, bool bb)
+{
+  int a = ab;
+  int b = bb;
+  int c;
+  if (a < b)
+c = a;
+  else
+c = b;
+  return c;
+}
+/* In Original, we should still have the if form as that is what is written. */
+/* { dg-final { scan-tree-dump-times "MAX_EXPR" 0 "original" } } */
+/* { dg-final { scan-tree-dump-times "MIN_EXPR" 0 "original" } } */
+/* { dg-final { scan-tree-dump-times "if " 2 "original" } } */
+
+/* PHI-OPT1 should have converted it into min/max */
+/* { dg-final { scan-tree-dump-times "MAX_EXPR" 1 "phiopt1" } } */
+/* { dg-final { scan-tree-dump-times "MIN_EXPR" 1 "phiopt1" } } */
+/* { dg-final { scan-tree-dump-times "if " 0 "phiopt1" } } */
+
+/* Forwprop2 (after ccp) will convert it into &\| */
+/* { dg-final { scan-tree-dump-times "MAX_EXPR" 0 "forwprop2" } } */
+/* { dg-final { scan-tree-dump-times "MIN_EXPR" 0 "forwprop2" } } */
+/* { dg-final { scan-tree-dump-times "if " 0 "forwprop2" } } */
+
+/* By optimize there should be no min/max nor if  */
+/* { dg-final { scan-tree-dump-times "MAX_EXPR" 0 "optimized" } } */
+/* { dg-final { scan-tree-dump-times "MIN_EXPR" 0 "optimized" } } */
+/* { dg-final { scan-tree-dump-times "if " 0 "optimized" } } */
diff --git a/gcc/testsuite/gcc.dg/tree-ssa/bool-13.c 
b/gcc/testsuite/gcc.dg/tree-ssa/bool-13.c
new file mode 100644
index 000..438f15a484a
--- /dev/null
+++ b/gcc/testsuite/gcc.dg/tree-ssa/bool-13.c
@@ -0,0 +1,38 @@
+/* { dg-do compile } */
+/* { dg-options "-O1 -fdump-tree-optimized -fdump-tree-original 
-fdump-tree-phiopt1 -fdump-tree-forwprop2" } */
+#define bool _Bool
+int maxbool(bool ab, bool bb)
+{
+  int a = ab;
+  int b = bb;
+  int c;
+  c = a > b ? a : b;
+  return c;
+}
+int minbool(bool ab, bool bb)
+{
+  int a = ab;
+  int b = bb;
+  int c;
+  c = a < b ? a : b;
+  return c;
+}
+/* In Original, we should still have the min/max form as that is what is 
written. */
+/* { dg-final { scan-tree-dump-times "MAX_EXPR" 1 "original" } } */
+/* { dg-final { scan-tree-dump-times "MIN_EXPR" 1 "original" } } */
+/* { dg-final { scan-tree-dump-times "if " 0 "original" } } */
+
+/* PHI-OPT1 should have kept it as min/max. */
+/* { dg-final { scan-tree-dump-times "MAX_EXPR" 1 "phiopt1" } } */
+/* { dg-final { scan-tree-dump-times "MIN_EXPR" 1 "phiopt1" } } */
+/* { dg-final { scan-tree-dump-times "if " 0 "phiopt1" } } */
+
+/* Forwprop2 (afte

RE: [PATCH v3] Machine_Mode: Extend machine_mode from 8 to 16 bits

2023-05-15 Thread Li, Pan2 via Gcc-patches

Kindly ping for this PATCH v3.

Pan

-----Original Message-----
From: Li, Pan2  
Sent: Saturday, May 13, 2023 9:13 PM
To: gcc-patches@gcc.gnu.org
Cc: juzhe.zh...@rivai.ai; kito.ch...@sifive.com; Li, Pan2 ; 
Wang, Yanzhang ; jeffreya...@gmail.com; 
rguent...@suse.de; richard.sandif...@arm.com
Subject: [PATCH v3] Machine_Mode: Extend machine_mode from 8 to 16 bits

From: Pan Li 

We are running out of machine_mode values (8 bits) in the RISC-V backend, so we
would like to extend the machine_mode bit size from 8 to 16 bits.
However, increasing the memory size of common structures such as tree or rtx
is sensitive. This patch therefore extends machine_mode to 16 bits by
shrinking or rearranging other fields:

* Swap the bit size of code and machine mode in rtx_def.
* Adjust the machine_mode location and spare bits in tree.

The memory impact of this patch on the affected structures is shown below:

+-------------------+----------+---------+------+
| struct/bytes      | upstream | patched | diff |
+-------------------+----------+---------+------+
| rtx_obj_reference |        8 |      12 |   +4 |
| ext_modified      |        2 |       4 |   +2 |
| ira_allocno       |      192 |     184 |   -8 |
| qty_table_elem    |       40 |      40 |    0 |
| reg_stat_type     |       64 |      64 |    0 |
| rtx_def           |       40 |      40 |    0 |
| table_elt         |       80 |      80 |    0 |
| tree_decl_common  |      112 |     112 |    0 |
| tree_type_common  |      128 |     128 |    0 |
+-------------------+----------+---------+------+

The tree- and rtx-related structs do not change in size with this patch, and
machine_mode is now 16 bits wide.
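
As a rough illustration of why swapping two bit-field widths can widen the mode field without growing the struct, here is a minimal C sketch. The struct names and the 8-bit flags field are hypothetical simplifications; the real rtx_def has more fields and uses ENUM_BITFIELD, and bit-field layout is implementation-defined.

```c
/* Hypothetical header before the swap: 16-bit code, 8-bit mode.  */
struct hdr_before
{
  unsigned int code : 16;
  unsigned int mode : 8;
  unsigned int flags : 8;
};

/* Hypothetical header after the swap: 16-bit mode, 8-bit code.  */
struct hdr_after
{
  unsigned int mode : 16;
  unsigned int code : 8;
  unsigned int flags : 8;
};

/* Store VALUE in the 8-bit mode field and return what was kept.  */
static unsigned int
stored_mode_before (unsigned int value)
{
  struct hdr_before h = { 0, 0, 0 };
  h.mode = value;	/* Truncated modulo 256.  */
  return h.mode;
}

/* Store VALUE in the 16-bit mode field and return what was kept.  */
static unsigned int
stored_mode_after (unsigned int value)
{
  struct hdr_after h = { 0, 0, 0 };
  h.mode = value;	/* Truncated modulo 65536.  */
  return h.mode;
}
```

On common ABIs both structs occupy a single unsigned int, but only the second can represent a mode number above 255.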

Signed-off-by: Pan Li 
Co-authored-by: Ju-Zhe Zhong 
Co-authored-by: Kito Cheng 
Co-Authored-By: Richard Biener 
Co-Authored-By: Richard Sandiford 

gcc/ChangeLog:

* combine.cc (struct reg_stat_type): Extend machine_mode to 16 bits.
* cse.cc (struct qty_table_elem): Extend machine_mode to 16 bits
(struct table_elt): Extend machine_mode to 16 bits.
(struct set): Ditto.
* genmodes.cc (emit_mode_wider): Extend type from char to short.
(emit_mode_complex): Ditto.
(emit_mode_inner): Ditto.
(emit_class_narrowest_mode): Ditto.
* genopinit.cc (main): Extend the machine_mode limit.
* ira-int.h (struct ira_allocno): Extend machine_mode to 16 bits and
re-ordered the struct fields for padding.
* machmode.h (MACHINE_MODE_BITSIZE): New macro.
(GET_MODE_2XWIDER_MODE): Extend type from char to short.
(get_mode_alignment): Extend type from char to short.
* ree.cc (struct ext_modified): Extend machine_mode to 16 bits and
removed the ATTRIBUTE_PACKED.
* rtl-ssa/accesses.h: Extend machine_mode to 16 bits.
* rtl.h (RTX_CODE_BITSIZE): New macro.
(struct rtx_def): Swap both the bit size and location between the
rtx_code and the machine_mode.
(subreg_shape::unique_id): Extend the machine_mode limit.
* rtlanal.h: Extend machine_mode to 16 bits.
* tree-core.h (struct tree_type_common): Extend machine_mode to 16
bits and re-ordered the struct fields for padding.
(struct tree_decl_common): Extend machine_mode to 16 bits.
---
 gcc/combine.cc |  4 +--
 gcc/cse.cc | 16 
 gcc/genmodes.cc| 16 ++--
 gcc/genopinit.cc   |  3 ++-
 gcc/ira-int.h  | 56 +-
 gcc/machmode.h | 27 +++-
 gcc/ree.cc |  4 +--
 gcc/rtl-ssa/accesses.h |  2 +-
 gcc/rtl.h  | 12 +
 gcc/rtlanal.h  |  2 +-
 gcc/tree-core.h|  9 ---
 11 files changed, 82 insertions(+), 69 deletions(-)

diff --git a/gcc/combine.cc b/gcc/combine.cc
index 5aa0ec5c45a..a23caeed96f 100644
--- a/gcc/combine.cc
+++ b/gcc/combine.cc
@@ -200,7 +200,7 @@ struct reg_stat_type {
 
   unsigned HOST_WIDE_INT   last_set_nonzero_bits;
   char last_set_sign_bit_copies;
-  ENUM_BITFIELD(machine_mode)  last_set_mode : 8;
+  ENUM_BITFIELD(machine_mode)  last_set_mode : MACHINE_MODE_BITSIZE;
 
   /* Set nonzero if references to register n in expressions should not be
  used.  last_set_invalid is set nonzero when this register is being
@@ -235,7 +235,7 @@ struct reg_stat_type {
  truncation if we know that value already contains a truncated
  value.  */
 
-  ENUM_BITFIELD(machine_mode)  truncated_to_mode : 8;
+  ENUM_BITFIELD(machine_mode)  truncated_to_mode : MACHINE_MODE_BITSIZE;
 };
 
 
diff --git a/gcc/cse.cc b/gcc/cse.cc
index b10c9b0c94d..86403b95938 100644
--- a/gcc/cse.cc
+++ b/gcc/cse.cc
@@ -248,10 +248,8 @@ struct qty_table_elem
   rtx comparison_const;
   int comparison_qty;
   unsigned int first_reg, last_reg;
-  /* The sizes of these fields should match the sizes of the
- code and mode fields of struct rtx_def (see rtl.h).  */
-  ENUM_BITFIELD(rtx_code) compar

RE: [PATCH] RISC-V: Support RVV VREINTERPRET from v{u}int*_t to vbool1_t

2023-05-15 Thread Li, Pan2 via Gcc-patches

Kindly ping for this PATCH, 😉.

Pan

From: Li, Pan2
Sent: Monday, May 15, 2023 11:25 AM
To: juzhe.zh...@rivai.ai; gcc-patches 
Cc: Kito.cheng ; Wang, Yanzhang 
Subject: RE: [PATCH] RISC-V: Support RVV VREINTERPRET from v{u}int*_t to 
vbool1_t

Thanks Juzhe. Let’s wait for Kito’s suggestion.

Pan

From: juzhe.zh...@rivai.ai <juzhe.zh...@rivai.ai>
Sent: Monday, May 15, 2023 11:20 AM
To: Li, Pan2 <pan2...@intel.com>; gcc-patches <gcc-patches@gcc.gnu.org>
Cc: Kito.cheng <kito.ch...@sifive.com>; Li, Pan2 <pan2...@intel.com>; Wang, Yanzhang <yanzhang.w...@intel.com>
Subject: Re: [PATCH] RISC-V: Support RVV VREINTERPRET from v{u}int*_t to 
vbool1_t

The implementation LGTM.
But I am not sure about the testcase, since we don't include any intrinsic API
testcases in the GCC testsuite.
I think it needs Kito's decision.

Thanks.

juzhe.zh...@rivai.ai

From: pan2.li
Date: 2023-05-15 11:14
To: gcc-patches
CC: juzhe.zhong; 
kito.cheng; pan2.li; 
yanzhang.wang
Subject: [PATCH] RISC-V: Support RVV VREINTERPRET from v{u}int*_t to vbool1_t
From: Pan Li <pan2...@intel.com>

This patch supports the RVV VREINTERPRET from the integer types to vbool1_t, i.e.:

vbool1_t __riscv_vreinterpret_xx_xx(v{u}int[8|16|32|64]_t);

These APIs help users convert an LMUL=1 integer vector to vbool1_t.
According to the RVV intrinsic spec linked below, the reinterpret intrinsics
only change the types of the underlying contents.

https://github.com/riscv-non-isa/rvv-intrinsic-doc/blob/master/rvv-intrinsic-rfc.md#reinterpret-vbool-o-vintm1

For example, given the code below:
vbool1_t test_vreinterpret_v_i8m1_b1(vint8m1_t src) {
  return __riscv_vreinterpret_v_i8m1_b1(src);
}

it will generate assembly code similar to the following:
vsetvli a5,zero,e8,m8,ta,ma
vlm.v   v1,0(a1)
vsm.v   v1,0(a0)
ret
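
Since the reinterpret only changes the type and not the contents, the integer vector's raw bytes become the mask bits. A minimal scalar sketch of that reading in C follows, assuming the RVV mask layout in which the mask bit for element i is bit i of the register, least-significant bit first. The helper names are hypothetical and do not use the intrinsic API.

```c
#include <stddef.h>
#include <stdint.h>

/* Return mask bit I when the raw vector BYTES are read as a mask.  */
static int
mask_bit (const uint8_t *bytes, size_t i)
{
  return (bytes[i / 8] >> (i % 8)) & 1;
}

/* Count the active elements among the first NBITS mask bits.  */
static size_t
count_active (const uint8_t *bytes, size_t nbits)
{
  size_t active = 0;
  for (size_t i = 0; i < nbits; i++)
    active += (size_t) mask_bit (bytes, i);
  return active;
}
```

For instance, the two bytes 0x05 and 0x80 read as a 16-bit mask have elements 0, 2 and 15 active.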

The remaining bool-size intrinsic APIs will be added in other patches.

Signed-off-by: Pan Li <pan2...@intel.com>

gcc/ChangeLog:

* config/riscv/genrvv-type-indexer.cc (BOOL_SIZE_LIST): New
  macro.
(main): Add bool1 to the type indexer.
* config/riscv/riscv-vector-builtins-functions.def
(vreinterpret): Register vbool1 interpret function.
* config/riscv/riscv-vector-builtins-types.def
(DEF_RVV_BOOL1_INTERPRET_OPS): New macro.
(vint8m1_t): Add the type to bool1_interpret_ops.
(vint16m1_t): Ditto.
(vint32m1_t): Ditto.
(vint64m1_t): Ditto.
(vuint8m1_t): Ditto.
(vuint16m1_t): Ditto.
(vuint32m1_t): Ditto.
(vuint64m1_t): Ditto.
* config/riscv/riscv-vector-builtins.cc
(DEF_RVV_BOOL1_INTERPRET_OPS): New macro.
(required_extensions_p): Add bool1 interpret case.
* config/riscv/riscv-vector-builtins.def
(bool1_interpret): Add bool1 interpret to base type.
* config/riscv/vector.md (@vreinterpret): Add new expand
with VB dest for vreinterpret.

gcc/testsuite/ChangeLog:

* gcc.target/riscv/rvv/base/misc_vreinterpret_vbool_vint.c: New test.
---
gcc/config/riscv/genrvv-type-indexer.cc   | 19 ++
.../riscv/riscv-vector-builtins-functions.def |  1 +
.../riscv/riscv-vector-builtins-types.def | 17 +
gcc/config/riscv/riscv-vector-builtins.cc | 18 +
gcc/config/riscv/riscv-vector-builtins.def|  2 +
gcc/config/riscv/vector.md| 10 +
.../rvv/base/misc_vreinterpret_vbool_vint.c   | 38 +++
7 files changed, 105 insertions(+)
create mode 100644 
gcc/testsuite/gcc.target/riscv/rvv/base/misc_vreinterpret_vbool_vint.c

diff --git a/gcc/config/riscv/genrvv-type-indexer.cc 
b/gcc/config/riscv/genrvv-type-indexer.cc
index 9bf6a82601d..2f0375568a8 100644
--- a/gcc/config/riscv/genrvv-type-indexer.cc
+++ b/gcc/config/riscv/genrvv-type-indexer.cc
@@ -23,6 +23,8 @@ along with GCC; see the file COPYING3.  If not see
#include 
#include 
+#define BOOL_SIZE_LIST {1}
+
std::string
to_lmul (int lmul_log2)
{
@@ -218,6 +220,9 @@ main (int argc, const char **argv)
   for (unsigned eew : {8, 16, 32, 64})
fprintf (fp, "  /*EEW%d_INTERPRET*/ INVALID,\n", eew);
+  for (unsigned boolsize : BOOL_SIZE_LIST)
+ fprintf (fp, "  /*BOOL%d_INTERPRET*/ INVALID,\n", boolsize);
+
   for (unsigned lmul_log2_offset : {1, 2, 3, 4, 5, 6})
{
  unsigned multiple_of_lmul = 1 << lmul_log2_offset;
@@ -297,6 +302,16 @@ main (int argc, const char **argv)
   inttype (eew, lmul_log2, unsigned_p).c_str ());
  }
+ for (unsigned boolsize : BOOL_SIZE_LIST)
+   {
+ std::stringstream mode;
+ mode << "vbool" << boolsize << "_t";
+
+ fprintf (fp, "  /*BOOL%d_INTERPRET*/ %s,\n", boolsize,
+ nf == 1 && lmul_log2 == 0 ? mode.str ().c_str ()
+: "INVALID");
+   }
+
for (unsigned lmul_log2_offset : {1, 2, 3, 4, 5, 6})
  {
unsigned multiple_of_lmul = 1 << lmul_log2_offset;
@@ -355,6 +370,10 @@ main (int argc, const char **argv)
   floattype (sew

Re: [PATCH V2] RISC-V: Add FRM and rounding mode operand into floating point intrinsics

2023-05-15 Thread juzhe.zh...@rivai.ai

Ping.

Is it OK for trunk? I have double-checked which floating-point instructions
need FRM.

Thanks.


juzhe.zh...@rivai.ai
 
From: juzhe.zhong
Date: 2023-05-15 22:53
To: gcc-patches
CC: kito.cheng; palmer; rdapp.gcc; jeffreyalaw; Juzhe-Zhong
Subject: [PATCH V2] RISC-V: Add FRM and rounding mode operand into floating 
point intrinsics
From: Juzhe-Zhong 
 
This patch adds a rounding mode operand and an FRM_REGNUM dependency
to the floating-point instructions.
 
The floating-point instructions to which we added the FRM and rounding mode operand:
1. vfadd/vfsub
2. vfwadd/vfwsub
3. vfmul
4. vfdiv
5. vfwmul
6. vfwmacc/vfwnmacc/vfwmsac/vfwnmsac
7. vfsqrt
8. floating-point conversions.
9. floating-point reductions.
10. floating-point ternary.
 
The floating-point instructions to which we did NOT add the FRM and rounding mode operand:
1. vfabs/vfneg/vfsqrt7/vfrec7
2. vfmin/vfmax
3. comparisons
4. vfclass
5. vfsgnj/vfsgnjn/vfsgnjx
6. vfmerge
7. vfmv.v.f
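
The split above follows from whether an instruction's result depends on the rounding mode. As a rough illustration of what the static modes mean, the C sketch below rounds a value given in half-units to an integer under each mode. The enum values match the frm_field_enum the patch adds; round_halves is a hypothetical helper and not part of the patch.

```c
/* Same encodings as the frm_field_enum added by the patch.  */
enum frm_field_enum
{
  FRM_RNE = 0,	/* Round to nearest, ties to even.  */
  FRM_RTZ = 1,	/* Round towards zero.  */
  FRM_RDN = 2,	/* Round down (towards minus infinity).  */
  FRM_RUP = 3,	/* Round up (towards plus infinity).  */
  FRM_RMM = 4,	/* Round to nearest, ties to max magnitude.  */
  DYN = 7	/* Take the mode from the frm register.  */
};

/* Round HALVES / 2 to an integer under MODE and store it in *OUT.
   Return 0 on success, -1 if MODE is not a static rounding mode.  */
static int
round_halves (long halves, enum frm_field_enum mode, long *out)
{
  long floor_q = halves / 2;
  long rem = halves % 2;
  long result;

  /* C division truncates; adjust to a true floor for negatives.  */
  if (rem < 0)
    {
      floor_q -= 1;
      rem += 2;
    }

  switch (mode)
    {
    case FRM_RNE: result = floor_q % 2 == 0 ? floor_q : floor_q + 1; break;
    case FRM_RTZ: result = halves / 2; break;
    case FRM_RDN: result = floor_q; break;
    case FRM_RUP: result = floor_q + 1; break;
    case FRM_RMM: result = halves > 0 ? floor_q + 1 : floor_q; break;
    default: return -1;
    }

  /* An exactly representable value needs no rounding at all.  */
  *out = rem == 0 ? floor_q : result;
  return 0;
}
```

For 2.5 (five halves) the five static modes give 2, 2, 2, 3 and 3 in encoding order; an operation such as a minimum or a sign injection never reaches the rounding step, which is why it needs no FRM operand.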
 
gcc/ChangeLog:
 
* config/riscv/riscv-protos.h (enum frm_field_enum): New enum.
* config/riscv/riscv-vector-builtins.cc 
(function_expander::use_ternop_insn): Add default rounding mode.
(function_expander::use_widen_ternop_insn): Ditto.
* config/riscv/riscv.cc (riscv_hard_regno_nregs): Add FRM REGNUM.
(riscv_hard_regno_mode_ok): Ditto.
(riscv_conditional_register_usage): Ditto.
* config/riscv/riscv.h (DWARF_FRAME_REGNUM): Ditto.
(FRM_REG_P): Ditto.
(RISCV_DWARF_FRM): Ditto.
* config/riscv/riscv.md: Ditto.
* config/riscv/vector-iterators.md: split no frm and has frm operations.
* config/riscv/vector.md (@pred__scalar): New pattern.
(@pred_): Ditto.
 
---
gcc/config/riscv/riscv-protos.h   |  10 +
gcc/config/riscv/riscv-vector-builtins.cc |  14 ++
gcc/config/riscv/riscv.cc |   7 +-
gcc/config/riscv/riscv.h  |   7 +-
gcc/config/riscv/riscv.md |   1 +
gcc/config/riscv/vector-iterators.md  |   9 +-
gcc/config/riscv/vector.md| 258 ++
7 files changed, 251 insertions(+), 55 deletions(-)
 
diff --git a/gcc/config/riscv/riscv-protos.h b/gcc/config/riscv/riscv-protos.h
index 835bb802fc6..12634d0ac1a 100644
--- a/gcc/config/riscv/riscv-protos.h
+++ b/gcc/config/riscv/riscv-protos.h
@@ -231,6 +231,16 @@ enum vxrm_field_enum
   VXRM_RDN,
   VXRM_ROD
};
+/* Rounding mode bitfield for floating point FRM.  */
+enum frm_field_enum
+{
+  FRM_RNE = 0b000,
+  FRM_RTZ = 0b001,
+  FRM_RDN = 0b010,
+  FRM_RUP = 0b011,
+  FRM_RMM = 0b100,
+  DYN = 0b111
+};
}
/* We classify builtin types into two classes:
diff --git a/gcc/config/riscv/riscv-vector-builtins.cc 
b/gcc/config/riscv/riscv-vector-builtins.cc
index 1de075fb90d..b7458aaace6 100644
--- a/gcc/config/riscv/riscv-vector-builtins.cc
+++ b/gcc/config/riscv/riscv-vector-builtins.cc
@@ -3460,6 +3460,13 @@ function_expander::use_ternop_insn (bool vd_accum_p, 
insn_code icode)
   add_input_operand (Pmode, get_tail_policy_for_pred (pred));
   add_input_operand (Pmode, get_mask_policy_for_pred (pred));
   add_input_operand (Pmode, get_avl_type_rtx (avl_type::NONVLMAX));
+
+  /* TODO: Currently, we don't support intrinsic that is modeling rounding 
mode.
+ We add default rounding mode for the intrinsics that didn't model rounding
+ mode yet.  */
+  if (opno != insn_data[icode].n_generator_args)
+add_input_operand (Pmode, const0_rtx);
+
   return generate_insn (icode);
}
@@ -3482,6 +3489,13 @@ function_expander::use_widen_ternop_insn (insn_code 
icode)
   add_input_operand (Pmode, get_tail_policy_for_pred (pred));
   add_input_operand (Pmode, get_mask_policy_for_pred (pred));
   add_input_operand (Pmode, get_avl_type_rtx (avl_type::NONVLMAX));
+
+  /* TODO: Currently, we don't support intrinsic that is modeling rounding 
mode.
+ We add default rounding mode for the intrinsics that didn't model rounding
+ mode yet.  */
+  if (opno != insn_data[icode].n_generator_args)
+add_input_operand (Pmode, const0_rtx);
+
   return generate_insn (icode);
}
diff --git a/gcc/config/riscv/riscv.cc b/gcc/config/riscv/riscv.cc
index b52e613c629..de5b87b1a87 100644
--- a/gcc/config/riscv/riscv.cc
+++ b/gcc/config/riscv/riscv.cc
@@ -6082,7 +6082,8 @@ riscv_hard_regno_nregs (unsigned int regno, machine_mode 
mode)
   /* mode for VL or VTYPE are just a marker, not holding value,
  so it always consume one register.  */
-  if (VTYPE_REG_P (regno) || VL_REG_P (regno) || VXRM_REG_P (regno))
+  if (VTYPE_REG_P (regno) || VL_REG_P (regno) || VXRM_REG_P (regno)
+  || FRM_REG_P (regno))
 return 1;
   /* Assume every valid non-vector mode fits in one vector register.  */
@@ -6150,7 +6151,8 @@ riscv_hard_regno_mode_ok (unsigned int regno, 
machine_mode mode)
   if (lmul != 1)
return ((regno % lmul) == 0);
 }
-  else if (VTYPE_REG_P (regno) || VL_REG_P (regno) || VXRM_REG_P (regno))
+  else if (VTYPE_REG_P (regno) || VL_REG_P (regno) || VXRM_REG_P (regno)
+

[committed] c: Ignore _Atomic on function return type for C2x

2023-05-15 Thread Joseph Myers

For C2x it was decided that _Atomic would be completely ignored on
function return types (just as was done for qualifiers in C11 DR#423),
to eliminate the potential for an rvalue returned by a function having
_Atomic-qualified type when an rvalue resulting from lvalue-to-rvalue
conversion could not have such a type.  Implement this for GCC.
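
The rule the change implements can be modelled as a small standalone C function. This is only a sketch of the decision made in the hunk below: the QUAL_* constants and the function name are hypothetical stand-ins, not GCC's TYPE_QUAL_* values or its actual code.

```c
/* Hypothetical qualifier bits, standing in for GCC's TYPE_QUAL_*.  */
enum
{
  QUAL_CONST = 1,
  QUAL_VOLATILE = 2,
  QUAL_RESTRICT = 4,
  QUAL_ATOMIC = 8
};

/* Return the qualifiers that still count as used on a function return
   type: none in C2x, only _Atomic in C11, all of them before that.  */
static int
quals_used_on_return (int type_quals, int isoc11, int isoc2x)
{
  if (isoc2x)
    return 0;
  if (isoc11)
    return type_quals & QUAL_ATOMIC;
  return type_quals;
}
```

Under this model, an _Atomic const return type keeps only the _Atomic bit in C11 mode and nothing in C2x mode.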

Bootstrapped with no regressions for x86_64-pc-linux-gnu.

gcc/c/
* c-decl.cc (grokdeclarator): Ignore _Atomic on function return
type for C2x.

gcc/testsuite/
* gcc.dg/qual-return-9.c, gcc.dg/qual-return-10.c: New tests.

diff --git a/gcc/c/c-decl.cc b/gcc/c/c-decl.cc
index 1b53f2d0785..90d7cd27cd5 100644
--- a/gcc/c/c-decl.cc
+++ b/gcc/c/c-decl.cc
@@ -7412,9 +7412,12 @@ grokdeclarator (const struct c_declarator *declarator,
   them for noreturn functions.  The resolution of C11
   DR#423 means qualifiers (other than _Atomic) are
   actually removed from the return type when
-  determining the function type.  */
+  determining the function type.  For C2X, _Atomic is
+  removed as well.  */
int quals_used = type_quals;
-   if (flag_isoc11)
+   if (flag_isoc2x)
+ quals_used = 0;
+   else if (flag_isoc11)
  quals_used &= TYPE_QUAL_ATOMIC;
if (quals_used && VOID_TYPE_P (type) && really_funcdef)
  pedwarn (specs_loc, 0,
diff --git a/gcc/testsuite/gcc.dg/qual-return-10.c 
b/gcc/testsuite/gcc.dg/qual-return-10.c
new file mode 100644
index 000..c7dd6adc4c6
--- /dev/null
+++ b/gcc/testsuite/gcc.dg/qual-return-10.c
@@ -0,0 +1,12 @@
+/* Test qualifiers on function return types in C2X (C2X version of
+   qual-return-6.c): those qualifiers are now ignored for all purposes,
+   including _Atomic, but should still get warnings.  */
+/* { dg-do compile } */
+/* { dg-options "-std=c2x -Wignored-qualifiers" } */
+
+const int f1 (void); /* { dg-warning "qualifiers ignored" } */
+volatile int f2 (void) { return 0; } /* { dg-warning "qualifiers ignored" } */
+const volatile void f3 (void) { } /* { dg-warning "qualifiers ignored" } */
+const void f4 (void); /* { dg-warning "qualifiers ignored" } */
+_Atomic int f5 (void); /* { dg-warning "qualifiers ignored" } */
+_Atomic int f6 (void) { return 0; } /* { dg-warning "qualifiers ignored" } */
diff --git a/gcc/testsuite/gcc.dg/qual-return-9.c 
b/gcc/testsuite/gcc.dg/qual-return-9.c
new file mode 100644
index 000..7762782edf0
--- /dev/null
+++ b/gcc/testsuite/gcc.dg/qual-return-9.c
@@ -0,0 +1,32 @@
+/* Test qualifiers on function return types in C2X (C2X version of
+   qual-return-5.c): those qualifiers are now ignored for all purposes,
+   including _Atomic.  */
+/* { dg-do compile } */
+/* { dg-options "-std=c2x -pedantic-errors" } */
+
+int f1 (void);
+const int f1 (void);
+volatile int f1 (void) { return 0; }
+
+int *restrict f2 (void) { return 0; }
+int *f2 (void);
+
+const volatile long f3 (void);
+long f3 (void);
+
+const volatile void f4 (void) { }
+void f4 (void);
+
+_Atomic int f5 (void);
+int f5 (void);
+
+int f6 (void);
+_Atomic int f6 (void) { return 0; }
+
+/* The standard seems unclear regarding the case where restrict is
+   applied to a function return type that may not be
+   restrict-qualified; assume here that it is disallowed.  */
+restrict int f7 (void); /* { dg-error "restrict" } */
+
+typedef void FT (void);
+FT *restrict f8 (void); /* { dg-error "restrict" } */

-- 
Joseph S. Myers
jos...@codesourcery.com

[committed] c: Update __has_c_attribute values for C2x

2023-05-15 Thread Joseph Myers

WG14 decided that __has_c_attribute should return the same value
(equal to the intended __STDC_VERSION__ value) for all standard
attributes in C2x, with values associated with when an attribute was
added to the working draft (or had semantics added or changed in the
working draft) only being used in earlier stages of development of
that draft.  The intent is that the values for existing attributes
increase in future standard versions only if there are new features /
semantic changes for those attributes.  Implement this change for GCC.
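
For user code, one consequence is that a plain nonzero test of __has_c_attribute is unaffected by the value change, while a comparison against one of the old per-attribute values would now behave differently. A hedged sketch of the nonzero style follows; HYP_NODISCARD and digit_value are hypothetical names, and the fallback assumes a GNU-compatible compiler.

```c
/* Prefer the standard attribute when the compiler reports it in a
   post-C17 mode.  Any nonzero value counts, so the exact number that
   __has_c_attribute returns does not matter here.  */
#if defined (__has_c_attribute) && defined (__STDC_VERSION__)
# if __STDC_VERSION__ > 201710L && __has_c_attribute (nodiscard)
#  define HYP_NODISCARD [[nodiscard]]
# endif
#endif

/* Otherwise fall back to the GNU attribute, or to nothing.  */
#ifndef HYP_NODISCARD
# if defined (__GNUC__)
#  define HYP_NODISCARD __attribute__ ((warn_unused_result))
# else
#  define HYP_NODISCARD
# endif
#endif

/* Return the value of decimal digit C, or -1 if C is not a digit.  */
HYP_NODISCARD static int
digit_value (char c)
{
  return (c >= '0' && c <= '9') ? c - '0' : -1;
}
```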

Bootstrapped with no regressions for x86_64-pc-linux-gnu.

gcc/c-family/
* c-lex.cc (c_common_has_attribute): Use 202311 as
__has_c_attribute return for all C2x attributes.

gcc/testsuite/
* gcc.dg/c2x-has-c-attribute-2.c: Expect 202311L return value from
__has_c_attribute for all C2x attributes.

diff --git a/gcc/c-family/c-lex.cc b/gcc/c-family/c-lex.cc
index 6eb0fae2f53..dcd061c7cb1 100644
--- a/gcc/c-family/c-lex.cc
+++ b/gcc/c-family/c-lex.cc
@@ -392,17 +392,13 @@ c_common_has_attribute (cpp_reader *pfile, bool 
std_syntax)
}
  else
{
- if (is_attribute_p ("deprecated", attr_name))
-   result = 201904;
- else if (is_attribute_p ("fallthrough", attr_name))
-   result = 201910;
- else if (is_attribute_p ("nodiscard", attr_name))
-   result = 202003;
- else if (is_attribute_p ("maybe_unused", attr_name))
-   result = 202106;
- else if (is_attribute_p ("noreturn", attr_name)
-  || is_attribute_p ("_Noreturn", attr_name))
-   result = 202202;
+ if (is_attribute_p ("deprecated", attr_name)
+ || is_attribute_p ("fallthrough", attr_name)
+ || is_attribute_p ("maybe_unused", attr_name)
+ || is_attribute_p ("nodiscard", attr_name)
+ || is_attribute_p ("noreturn", attr_name)
+ || is_attribute_p ("_Noreturn", attr_name))
+   result = 202311;
}
  if (result)
attr_name = NULL_TREE;
diff --git a/gcc/testsuite/gcc.dg/c2x-has-c-attribute-2.c 
b/gcc/testsuite/gcc.dg/c2x-has-c-attribute-2.c
index 3c34ab6cbd9..dc92b95e907 100644
--- a/gcc/testsuite/gcc.dg/c2x-has-c-attribute-2.c
+++ b/gcc/testsuite/gcc.dg/c2x-has-c-attribute-2.c
@@ -2,56 +2,56 @@
 /* { dg-do preprocess } */
 /* { dg-options "-std=c2x -pedantic-errors" } */
 
-#if __has_c_attribute ( nodiscard ) != 202003L
+#if __has_c_attribute ( nodiscard ) != 202311L
 #error "bad result for nodiscard"
 #endif
 
-#if __has_c_attribute ( __nodiscard__ ) != 202003L
+#if __has_c_attribute ( __nodiscard__ ) != 202311L
 #error "bad result for __nodiscard__"
 #endif
 
-#if __has_c_attribute(maybe_unused) != 202106L
+#if __has_c_attribute(maybe_unused) != 202311L
 #error "bad result for maybe_unused"
 #endif
 
-#if __has_c_attribute(__maybe_unused__) != 202106L
+#if __has_c_attribute(__maybe_unused__) != 202311L
 #error "bad result for __maybe_unused__"
 #endif
 
-#if __has_c_attribute (deprecated) != 201904L
+#if __has_c_attribute (deprecated) != 202311L
 #error "bad result for deprecated"
 #endif
 
-#if __has_c_attribute (__deprecated__) != 201904L
+#if __has_c_attribute (__deprecated__) != 202311L
 #error "bad result for __deprecated__"
 #endif
 
-#if __has_c_attribute (fallthrough) != 201910L
+#if __has_c_attribute (fallthrough) != 202311L
 #error "bad result for fallthrough"
 #endif
 
-#if __has_c_attribute (__fallthrough__) != 201910L
+#if __has_c_attribute (__fallthrough__) != 202311L
 #error "bad result for __fallthrough__"
 #endif
 
-#if __has_c_attribute (noreturn) != 202202L
+#if __has_c_attribute (noreturn) != 202311L
 #error "bad result for noreturn"
 #endif
 
-#if __has_c_attribute (__noreturn__) != 202202L
+#if __has_c_attribute (__noreturn__) != 202311L
 #error "bad result for __noreturn__"
 #endif
 
-#if __has_c_attribute (_Noreturn) != 202202L
+#if __has_c_attribute (_Noreturn) != 202311L
 #error "bad result for _Noreturn"
 #endif
 
-#if __has_c_attribute (___Noreturn__) != 202202L
+#if __has_c_attribute (___Noreturn__) != 202311L
 #error "bad result for ___Noreturn__"
 #endif
   
 /* Macros in the attribute name are expanded.  */
 #define foo deprecated
-#if __has_c_attribute (foo) != 201904L
+#if __has_c_attribute (foo) != 202311L
 #error "bad result for foo"
 #endif

-- 
Joseph S. Myers
jos...@codesourcery.com

Re: Re: [PATCH V7] VECT: Add decrement IV support in Loop Vectorizer

2023-05-15 Thread 钟居哲

Hi, Richard.

>> Easier to read as:

 >>  _41 = _40 - 16

>> (which might not be valid gimple, but pseudocode is good enough).

OK.


>> The difficulty with this is that the len_load* and len_store*
>>optabs currently say that the behaviour is undefined if the
>>length argument is greater than the length of a vector.
>>So I think using these values of _47 and _44 in the .LEN_STOREs
>>is relying on undefined behaviour.

>>Haven't had time to think about the consequences of that yet,
>>but wanted to send something out sooner rather than later.

Yes, we have tail agnostic (TA) in vsetvli, which makes the tail elements
undefined values. The current optabs behavior matches the RVV specification.
I think we can leave this to be carefully solved in the future. So far,
I don't see any issue.

>>It would be better to use known_le here, without checking whether the
>>VF is constant.
Ok

Thank you so much for your patience in helping with this patch.
I have sent the V8 patch with the fixes you suggested:
https://gcc.gnu.org/pipermail/gcc-patches/2023-May/618638.html 

Can I merge this patch?

I am going to post the next patch with select_vl included.

Thanks.


juzhe.zh...@rivai.ai
 
From: Richard Sandiford
Date: 2023-05-16 03:44
To: juzhe.zhong
CC: gcc-patches; rguenther
Subject: Re: [PATCH V7] VECT: Add decrement IV support in Loop Vectorizer
juzhe.zh...@rivai.ai writes:
> From: Juzhe-Zhong 
>
> This patch implement decrement IV for length approach in loop control.
>
> Address comment from kewen that incorporate the implementation inside
> "vect_set_loop_controls_directly" instead of a standalone function.
>
> Address comment from Richard using MIN_EXPR to handle these 3 following
> cases
> 1. single rgroup.
> 2. multiple rgroup for SLP.
> 3. multiple rgroup for non-SLP (tested on vec_pack_trunc).
 
Thanks, this looks pretty reasonable to me FWIW, but some comments below:
 
> Bootstraped && Regression on x86.
>
> Ok for trunk ?
>
> gcc/ChangeLog:
>
> * tree-vect-loop-manip.cc (vect_adjust_loop_lens): New function.
> (vect_set_loop_controls_directly): Add decrement IV support.
> (vect_set_loop_condition_partial_vectors): Ditto.
> * tree-vect-loop.cc (_loop_vec_info::_loop_vec_info): Add a new 
> variable.
> (vect_get_loop_len): Add decrement IV support.
> * tree-vect-stmts.cc (vectorizable_store): Ditto.
> (vectorizable_load): Ditto.
> * tree-vectorizer.h (LOOP_VINFO_USING_DECREMENTING_IV_P): New macro.
> (vect_get_loop_len): Add decrement IV support.
>
> ---
>  gcc/tree-vect-loop-manip.cc | 177 +++-
>  gcc/tree-vect-loop.cc   |  38 +++-
>  gcc/tree-vect-stmts.cc  |   9 +-
>  gcc/tree-vectorizer.h   |  13 ++-
>  4 files changed, 224 insertions(+), 13 deletions(-)
>
> diff --git a/gcc/tree-vect-loop-manip.cc b/gcc/tree-vect-loop-manip.cc
> index ff6159e08d5..1baac7b1b52 100644
> --- a/gcc/tree-vect-loop-manip.cc
> +++ b/gcc/tree-vect-loop-manip.cc
> @@ -385,6 +385,58 @@ vect_maybe_permute_loop_masks (gimple_seq *seq, 
> rgroup_controls *dest_rgm,
>return false;
>  }
>  
> +/* Try to use adjust loop lens for non-SLP multiple-rgroups.
> +
> + _36 = MIN_EXPR ;
> +
> + First length (MIN (X, VF/N)):
> +   loop_len_15 = MIN_EXPR <_36, VF/N>;
> +
> + Second length:
> +   tmp = _36 - loop_len_15;
> +   loop_len_16 = MIN (tmp, VF/N);
> +
> + Third length:
> +   tmp2 = tmp - loop_len_16;
> +   loop_len_17 = MIN (tmp2, VF/N);
> +
> + Forth length:
> +   tmp3 = tmp2 - loop_len_17;
> +   loop_len_18 = MIN (tmp3, VF/N);  */
> +
> +static void
> +vect_adjust_loop_lens (tree iv_type, gimple_seq *seq, rgroup_controls 
> *dest_rgm,
> +rgroup_controls *src_rgm)
> +{
> +  tree ctrl_type = dest_rgm->type;
> +  poly_uint64 nitems_per_ctrl
> += TYPE_VECTOR_SUBPARTS (ctrl_type) * dest_rgm->factor;
> +
> +  for (unsigned int i = 0; i < dest_rgm->controls.length (); ++i)
> +{
> +  tree src = src_rgm->controls[i / dest_rgm->controls.length ()];
> +  tree dest = dest_rgm->controls[i];
> +  tree length_limit = build_int_cst (iv_type, nitems_per_ctrl);
> +  gassign *stmt;
> +  if (i == 0)
> + {
> +   /* MIN (X, VF*I/N) capped to the range [0, VF/N].  */
> +   stmt = gimple_build_assign (dest, MIN_EXPR, src, length_limit);
> +   gimple_seq_add_stmt (seq, stmt);
> + }
> +  else
> + {
> +   /* (MIN (remain, VF*I/N)) capped to the range [0, VF/N].  */
> +   tree temp = make_ssa_name (iv_type);
> +   stmt = gimple_build_assign (temp, MINUS_EXPR, src,
> +   dest_rgm->controls[i - 1]);
> +   gimple_seq_add_stmt (seq, stmt);
> +   stmt = gimple_build_assign (dest, MIN_EXPR, temp, length_limit);
> +   gimple_seq_add_stmt (seq, stmt);
> + }
> +}
> +}
> +
>  /* Helper for vect_set_loop_condition_partial_vectors.  Generate definitions
> for all the rgroup controls in RGC and return a control that is nonzero
> when the loop needs to itera

[PATCH V9] VECT: Add decrement IV support in Loop Vectorizer

2023-05-15 Thread juzhe . zhong

From: Ju-Zhe Zhong 

This patch implements a decrementing IV for the length approach in loop control.
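
As a rough scalar model of the idea, the C sketch below counts the remaining elements down and processes MIN (remaining, VF) of them per iteration. The function name and the plain inner loop are hypothetical stand-ins for the length-controlled loads and stores; this is not the code the vectorizer generates.

```c
#include <stddef.h>

/* Copy N ints from SRC to DST in chunks of at most VF elements,
   driven by a counter that decrements towards zero.  Return the
   number of chunked iterations executed.  */
static size_t
copy_with_decrementing_iv (int *dst, const int *src, size_t n, size_t vf)
{
  size_t iterations = 0;
  size_t remaining = n;		/* The decrementing IV.  */

  if (vf == 0)
    return 0;			/* Nothing sensible to do.  */

  while (remaining > 0)
    {
      /* Length for this iteration: MIN (remaining, VF).  */
      size_t len = remaining < vf ? remaining : vf;

      /* Stands in for a length-controlled load and store.  */
      for (size_t i = 0; i < len; i++)
	dst[i] = src[i];

      dst += len;
      src += len;
      remaining -= len;
      iterations++;
    }
  return iterations;
}
```

With 10 elements and a hypothetical VF of 4, the lengths per iteration are 4, 4 and 2, and the loop exits when the counter reaches zero.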

Address the comment from Kewen to incorporate the implementation into
"vect_set_loop_controls_directly" instead of a standalone function.

Address the comment from Richard to use MIN_EXPR to handle the following
3 cases:
1. single rgroup.
2. multiple rgroup for SLP.
3. multiple rgroup for non-SLP (tested on vec_pack_trunc).
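
For the third case, the patch's own comment on vect_adjust_loop_lens describes splitting one overall length into per-control lengths, each capped at VF/N, with the remainder carried to the next control. A minimal scalar sketch of that arithmetic in C, with hypothetical names:

```c
#include <stddef.h>

/* Split a total of X elements across N controls, each handling at most
   PER_CTRL elements (VF/N in the patch comment).  Store the individual
   lengths in LENS[0..N-1].  */
static void
split_loop_lens (size_t x, size_t per_ctrl, size_t n, size_t *lens)
{
  size_t remain = x;
  for (size_t i = 0; i < n; i++)
    {
      /* This control's length: MIN (remain, VF/N).  */
      lens[i] = remain < per_ctrl ? remain : per_ctrl;
      /* What is left over for the following controls.  */
      remain -= lens[i];
    }
}
```

With X = 10, four controls and a cap of 4 elements each, the lengths come out as 4, 4, 2 and 0.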

Bootstrapped and regression-tested on x86.

OK for trunk?

gcc/ChangeLog:

* tree-vect-loop-manip.cc (vect_adjust_loop_lens): New function.
(vect_set_loop_controls_directly): Add decrement IV support.
(vect_set_loop_condition_partial_vectors): Ditto.
* tree-vect-loop.cc (_loop_vec_info::_loop_vec_info): Add a new 
variable.
(vect_get_loop_len): Add decrement IV support.
* tree-vect-stmts.cc (vectorizable_store): Ditto.
(vectorizable_load): Ditto.
* tree-vectorizer.h (LOOP_VINFO_USING_DECREMENTING_IV_P): New macro.
(vect_get_loop_len): Add decrement IV support.

---
 gcc/tree-vect-loop-manip.cc | 177 +++-
 gcc/tree-vect-loop.cc   |  37 +++-
 gcc/tree-vect-stmts.cc  |   9 +-
 gcc/tree-vectorizer.h   |  13 ++-
 4 files changed, 223 insertions(+), 13 deletions(-)

diff --git a/gcc/tree-vect-loop-manip.cc b/gcc/tree-vect-loop-manip.cc
index ff6159e08d5..aae2e122b1a 100644
--- a/gcc/tree-vect-loop-manip.cc
+++ b/gcc/tree-vect-loop-manip.cc
@@ -385,6 +385,58 @@ vect_maybe_permute_loop_masks (gimple_seq *seq, 
rgroup_controls *dest_rgm,
   return false;
 }
 
+/* Try to use adjust loop lens for non-SLP multiple-rgroups.
+
+ _36 = MIN_EXPR ;
+
+ First length (MIN (X, VF/N)):
+   loop_len_15 = MIN_EXPR <_36, VF/N>;
+
+ Second length:
+   tmp = _36 - loop_len_15;
+   loop_len_16 = MIN (tmp, VF/N);
+
+ Third length:
+   tmp2 = tmp - loop_len_16;
+   loop_len_17 = MIN (tmp2, VF/N);
+
+ Forth length:
+   tmp3 = tmp2 - loop_len_17;
+   loop_len_18 = MIN (tmp3, VF/N);  */
+
+static void
+vect_adjust_loop_lens (tree iv_type, gimple_seq *seq, rgroup_controls *dest_rgm,
+                       rgroup_controls *src_rgm)
+{
+  tree ctrl_type = dest_rgm->type;
+  poly_uint64 nitems_per_ctrl
+    = TYPE_VECTOR_SUBPARTS (ctrl_type) * dest_rgm->factor;
+
+  for (unsigned int i = 0; i < dest_rgm->controls.length (); ++i)
+    {
+      tree src = src_rgm->controls[i / dest_rgm->controls.length ()];
+      tree dest = dest_rgm->controls[i];
+      tree length_limit = build_int_cst (iv_type, nitems_per_ctrl);
+      gassign *stmt;
+      if (i == 0)
+        {
+          /* MIN (X, VF*I/N) capped to the range [0, VF/N].  */
+          stmt = gimple_build_assign (dest, MIN_EXPR, src, length_limit);
+          gimple_seq_add_stmt (seq, stmt);
+        }
+      else
+        {
+          /* (MIN (remain, VF*I/N)) capped to the range [0, VF/N].  */
+          tree temp = make_ssa_name (iv_type);
+          stmt = gimple_build_assign (temp, MINUS_EXPR, src,
+                                      dest_rgm->controls[i - 1]);
+          gimple_seq_add_stmt (seq, stmt);
+          stmt = gimple_build_assign (dest, MIN_EXPR, temp, length_limit);
+          gimple_seq_add_stmt (seq, stmt);
+        }
+    }
+}
+
 /* Helper for vect_set_loop_condition_partial_vectors.  Generate definitions
for all the rgroup controls in RGC and return a control that is nonzero
when the loop needs to iterate.  Add any new preheader statements to
@@ -468,9 +520,10 @@ vect_set_loop_controls_directly (class loop *loop, loop_vec_info loop_vinfo,
   gimple_stmt_iterator incr_gsi;
   bool insert_after;
   standard_iv_increment_position (loop, &incr_gsi, &insert_after);
-  create_iv (build_int_cst (iv_type, 0), PLUS_EXPR, nitems_step, NULL_TREE,
-             loop, &incr_gsi, insert_after, &index_before_incr,
-             &index_after_incr);
+  if (!LOOP_VINFO_USING_DECREMENTING_IV_P (loop_vinfo))
+    create_iv (build_int_cst (iv_type, 0), PLUS_EXPR, nitems_step, NULL_TREE,
+               loop, &incr_gsi, insert_after, &index_before_incr,
+               &index_after_incr);
 
   tree zero_index = build_int_cst (compare_type, 0);
   tree test_index, test_limit, first_limit;
@@ -552,8 +605,13 @@ vect_set_loop_controls_directly (class loop *loop, loop_vec_info loop_vinfo,
   /* Convert the IV value to the comparison type (either a no-op or
      a demotion).  */
   gimple_seq test_seq = NULL;
-  test_index = gimple_convert (&test_seq, compare_type, test_index);
-  gsi_insert_seq_before (test_gsi, test_seq, GSI_SAME_STMT);
+  if (LOOP_VINFO_USING_DECREMENTING_IV_P (loop_vinfo))
+    test_limit = gimple_convert (preheader_seq, iv_type, nitems_total);
+  else
+    {
+      test_index = gimple_convert (&test_seq, compare_type, test_index);
+      gsi_insert_seq_before (test_gsi, test_seq, GSI_SAME_STMT);
+    }
 
   /* Provide a definition of each control in the group.  */
   tree next_ctrl = NULL_TREE;
@@ -587,6 +645,1

Re: [PATCH] Turn on LRA on all targets

2023-05-15 Thread Sam James via Gcc-patches


"Maciej W. Rozycki"  writes:

> On Sun, 23 Apr 2023, Segher Boessenkool wrote:
>
>> >  There are extra ICEs in regression testing and code quality is poor; cf. 
>> > .  
>> 
>> Do you have something you can show for this?  Maybe in a PR?
>
>  I have filed no PRs as I didn't assess the collateral damage at the time 
> I looked at it.  I only ran regression-testing with `-mlra' shortly after 
> I completed MODE_CC conversion and added the option, to see what lies 
> beyond.  And I only added `-mlra' and made minimal changes to make the 
> compiler build again just to make it easier to proceed towards LRA.

I think before moving forward with the plan in general, a PR is ideally
needed for each target anyway. Not all machine maintainers actively watch the
MLs.



Re: [PATCH] Turn on LRA on all targets

2023-05-15 Thread Maciej W. Rozycki

On Sun, 23 Apr 2023, Segher Boessenkool wrote:

> >  There are extra ICEs in regression testing and code quality is poor; cf. 
> > .  
> 
> Do you have something you can show for this?  Maybe in a PR?

 I have filed no PRs as I didn't assess the collateral damage at the time 
I looked at it.  I only ran regression-testing with `-mlra' shortly after 
I completed MODE_CC conversion and added the option, to see what lies 
beyond.  And I only added `-mlra' and made minimal changes to make the 
compiler build again just to make it easier to proceed towards LRA.

> And, are the ICEs in the generic code, or something vax-specific?

 At least some were in generic code, e.g.:

during RTL pass: combine
.../gcc/testsuite/gcc.c-torture/compile/pr101562.c: In function 'foo':
.../gcc/testsuite/gcc.c-torture/compile/pr101562.c:12:1: internal compiler 
error: in insert, at wide-int.cc:682
Please submit a full bug report,
with preprocessed source if appropriate.
See  for instructions.
compiler exited with status 1
FAIL: gcc.c-torture/compile/pr101562.c   -O1  (internal compiler error)
FAIL: gcc.c-torture/compile/pr101562.c   -O1  (test for excess errors)

(coming from `gcc_checking_assert (precision >= width)'), or:

In file included from .../gcc/testsuite/g++.dg/modules/xtreme-header-2.h:10,
 from .../gcc/testsuite/g++.dg/modules/xtreme-header-2_a.H:4:
.../vax-netbsdelf/libstdc++-v3/include/regex:42: internal compiler error: in 
set_filename, at cp/module.cc:19134
Please submit a full bug report,
with preprocessed source if appropriate.
See  for instructions.
compiler exited with status 1
FAIL: g++.dg/modules/xtreme-header-2_a.H -std=c++2b (internal compiler error)
FAIL: g++.dg/modules/xtreme-header-2_a.H -std=c++2b (test for excess errors)

(from `gcc_checking_assert (!filename)').  As I say, I did not assess this 
at all back then and the logs are dated Nov 2021 (I had to chase them).

 Also I'm not going to dedicate any time now to switch the VAX backend to 
LRA, because old reload continues working while we have a non-functional 
exception unwinder that never ever worked, as I have recently discovered, 
which breaks lots of C++ code, including in particular native VAX/NetBSD 
GDB and `gdbserver' (my newly ported implementation), which is a bit of 
a problem (native VAX/NetBSD GCC has been spared owing to the decision not 
to use exceptions).

 And fixing the unwinder is going to be a major effort due to how the VAX 
CALLS machine instruction works and the stack frame has been consequently 
structured; it is unlike any other ELF target, and even if it can be 
expressed in DWARF terms (which I'm not entirely sure about), it is going 
to require a dedicated handler like with ARM or IA64.

 I may choose to implement a non-DWARF unwinder instead, as the VAX stack 
frame is always fully described by the hardware and there is never ever a 
need for debug information to be able to decode any VAX stack frame (the 
RET machine instruction uses the stack frame information to restore the 
previous PC, FP, SP, AP and any static registers saved by CALLS).

 So implementing a working exception unwinder has to take precedence over 
LRA and I do hope to complete it during this release cycle, but I may not 
have any time left for LRA.

 Please keep this in mind with any plans to drop old reload.  I'd highly 
appreciate that.  I do keep LRA on my radar as the next item to address 
after the unwinder; it has by no means been lost.

  Maciej

Re: [PATCH 0/3] Refactor memory block operations

2023-05-15 Thread Andreas Krebbel via Gcc-patches

On 5/15/23 09:17, Stefan Schulze Frielinghaus wrote:
> Bootstrapped and regtested.  Ok for mainline?
> 
> Stefan Schulze Frielinghaus (3):
>   s390: Refactor block operation cpymem
>   s390: Add block operation movmem
>   s390: Refactor block operation setmem
> 
>  gcc/config/s390/s390-protos.h|   5 +-
>  gcc/config/s390/s390.cc  | 301 ---
>  gcc/config/s390/s390.md  |  61 -
>  gcc/testsuite/gcc.target/s390/memset-1.c |   7 +-
>  4 files changed, 331 insertions(+), 43 deletions(-)
> 

Ok. Thanks!

Andreas

Re: [PATCH] aarch64: Add SVE instruction types

2023-05-15 Thread Evandro Menezes via Gcc-patches

Hi, Kyrill.

I wasn’t aware of your previous patch.  Could you clarify why you considered 
creating an SVE specific type attribute instead of reusing the common one?  I 
really liked the iterators that you created; I’d like to use them.

Do you have specific examples which you might want to mention with regards to 
granularity?

Yes, my intent for this patch is to enable modeling the SVE instructions on V1. 
The patch that implements it yields some performance improvements, but results 
are mostly flat, as expected.

Thank you,

-- 
Evandro Menezes



> On May 15, 2023, at 04:49, Kyrylo Tkachov wrote:
> 
> 
> 
>> -Original Message-
>> From: Richard Sandiford
>> Sent: Monday, May 15, 2023 10:01 AM
>> To: Evandro Menezes via Gcc-patches
>> Cc: evandro+...@gcc.gnu.org; Evandro Menezes <ebah...@icloud.com>;
>> Kyrylo Tkachov <kyrylo.tkac...@arm.com>;
>> Tamar Christina <tamar.christ...@arm.com>
>> Subject: Re: [PATCH] aarch64: Add SVE instruction types
>> Evandro Menezes via Gcc-patches  writes:
>>> This patch adds the attribute `type` to most SVE1 instructions, as in the
>>> other instructions.
>> 
>> Thanks for doing this.
>> 
>> Could you say what criteria you used for picking the granularity?  Other
>> maintainers might disagree, but personally I'd prefer to distinguish two
>> instructions only if:
>> 
>> (a) a scheduling description really needs to distinguish them or
>> (b) grouping them together would be very artificial (because they're
>>logically unrelated)
>> 
>> It's always possible to split types later if new scheduling descriptions
>> require it.  Because of that, I don't think we should try to predict ahead
>> of time what future scheduling descriptions will need.
>> 
>> Of course, this depends on having results that show that scheduling
>> makes a significant difference on an SVE core.  I think one of the
>> problems here is that, when a different scheduling model changes the
>> performance of a particular test, it's difficult to tell whether
>> the gain/loss is caused by the model being more/less accurate than
>> the previous one, or if it's due to important "secondary" effects
>> on register live ranges.  Instinctively, I'd have expected these
>> secondary effects to dominate on OoO cores.
> 
> I agree with Richard on these points. The key here is getting the granularity 
> right without having to maintain too many types that aren't useful in the 
> models.
> FWIW I had posted 
> https://gcc.gnu.org/pipermail/gcc-patches/2022-November/607101.html in 
> November. It adds annotations to SVE2 patterns as well as for base SVE.
> Feel free to reuse it if you'd like.
> I see you had posted a Neoverse V1 scheduling model. Does that give an 
> improvement on SVE code when combined with the scheduling attributes somehow?
> Thanks,
> Kyrill

Re: [PATCH] aarch64: Add SVE instruction types

2023-05-15 Thread Evandro Menezes via Gcc-patches

Hi, Richard.

My criteria were very much (a).  In some cases though, a particular instruction 
could have variations that others in its natural group didn’t, when it seemed 
sensible to create a specific description for this instruction, even if its 
base form shares resources with other instructions in its group.

Do you have specific instances in mind?

Thank you,

-- 
Evandro Menezes



> On May 15, 2023, at 04:00, Richard Sandiford wrote:
> 
> Evandro Menezes via Gcc-patches  writes:
>> This patch adds the attribute `type` to most SVE1 instructions, as in the 
>> other
>> instructions.
> 
> Thanks for doing this.
> 
> Could you say what criteria you used for picking the granularity?  Other
> maintainers might disagree, but personally I'd prefer to distinguish two
> instructions only if:
> 
> (a) a scheduling description really needs to distinguish them or
> (b) grouping them together would be very artificial (because they're
>logically unrelated)
> 
> It's always possible to split types later if new scheduling descriptions
> require it.  Because of that, I don't think we should try to predict ahead
> of time what future scheduling descriptions will need.
> 
> Of course, this depends on having results that show that scheduling
> makes a significant difference on an SVE core.  I think one of the
> problems here is that, when a different scheduling model changes the
> performance of a particular test, it's difficult to tell whether
> the gain/loss is caused by the model being more/less accurate than
> the previous one, or if it's due to important "secondary" effects
> on register live ranges.  Instinctively, I'd have expected these
> secondary effects to dominate on OoO cores.
> 
> Richard


-- 
Evandro Menezes ◊ evan...@yahoo.com ◊ Austin, TX
Άγιος ο Θεός ⁂ ܩܕܝܫܐ ܐܢ̱ܬ ܠܐ ܡܝܘܬܐ ⁂ Sanctus Deus

Re: [PATCH v2] libstdc++: Do not use pthread_mutex_clocklock with ThreadSanitizer

2023-05-15 Thread Thomas Rodgers via Gcc-patches

On Thu, May 11, 2023 at 1:52 PM Jonathan Wakely  wrote:

> On Thu, 11 May 2023 at 13:42, Jonathan Wakely  wrote:
>
>>
>>
>> On Thu, 11 May 2023 at 13:19, Mike Crowe  wrote:
>>
>>> However, ...
>>>
>>> > > diff --git a/libstdc++-v3/acinclude.m4 b/libstdc++-v3/acinclude.m4
>>> > > index 89e7f5f5f45..e2700b05ec3 100644
>>> > > --- a/libstdc++-v3/acinclude.m4
>>> > > +++ b/libstdc++-v3/acinclude.m4
>>> > > @@ -4284,7 +4284,7 @@
>>> AC_DEFUN([GLIBCXX_CHECK_PTHREAD_COND_CLOCKWAIT], [
>>> > >[glibcxx_cv_PTHREAD_COND_CLOCKWAIT=no])
>>> > >])
>>> > >if test $glibcxx_cv_PTHREAD_COND_CLOCKWAIT = yes; then
>>> > > -AC_DEFINE(_GLIBCXX_USE_PTHREAD_COND_CLOCKWAIT, 1, [Define if
>>> > > pthread_cond_clockwait is available in .])
>>> > > +AC_DEFINE(_GLIBCXX_USE_PTHREAD_COND_CLOCKWAIT,
>>> (_GLIBCXX_TSAN==0),
>>> > > [Define if pthread_cond_clockwait is available in .])
>>> > >fi
>>>
>>> TSan does appear to have an interceptor for pthread_cond_clockwait, even
>>> if
>>> it lacks the others. Does this mean that this part is unnecessary?
>>>
>>
>> Ah good point, thanks. I grepped for clocklock but not clockwait.
>>
>
> In fact it seems like we don't need to change
> _GLIBCXX_USE_PTHREAD_RWLOCK_CLOCKLOCK either, because I don't get any tsan
> warnings for that. It doesn't have interceptors for
> pthread_rwlock_{rd,wr}lock, but it doesn't complain anyway (maybe it's
> simply not instrumenting the rwlock functions at all?!)
>
> So I'm now retesting with this version of the patch, which only touches
> the USE_PTHREAD_MUTEX_CLOCKLOCK macro.
>
> Please take another look, thanks.
>
LGTM.

Re: [PATCH V7] VECT: Add decrement IV support in Loop Vectorizer

2023-05-15 Thread Richard Sandiford via Gcc-patches

juzhe.zh...@rivai.ai writes:
> From: Juzhe-Zhong 
>
> This patch implements a decrementing IV for the length approach in loop
> control.
>
> Address the comment from Kewen by incorporating the implementation into
> "vect_set_loop_controls_directly" instead of a standalone function.
>
> Address comment from Richard using MIN_EXPR to handle these 3 following
> cases
> 1. single rgroup.
> 2. multiple rgroup for SLP.
> 3. multiple rgroup for non-SLP (tested on vec_pack_trunc).

Thanks, this looks pretty reasonable to me FWIW, but some comments below:

> Bootstrapped and regression-tested on x86.
>
> Ok for trunk ?
>
> gcc/ChangeLog:
>
> * tree-vect-loop-manip.cc (vect_adjust_loop_lens): New function.
> (vect_set_loop_controls_directly): Add decrement IV support.
> (vect_set_loop_condition_partial_vectors): Ditto.
> * tree-vect-loop.cc (_loop_vec_info::_loop_vec_info): Add a new 
> variable.
> (vect_get_loop_len): Add decrement IV support.
> * tree-vect-stmts.cc (vectorizable_store): Ditto.
> (vectorizable_load): Ditto.
> * tree-vectorizer.h (LOOP_VINFO_USING_DECREMENTING_IV_P): New macro.
> (vect_get_loop_len): Add decrement IV support.
>
> ---
>  gcc/tree-vect-loop-manip.cc | 177 +++-
>  gcc/tree-vect-loop.cc   |  38 +++-
>  gcc/tree-vect-stmts.cc  |   9 +-
>  gcc/tree-vectorizer.h   |  13 ++-
>  4 files changed, 224 insertions(+), 13 deletions(-)
>
> diff --git a/gcc/tree-vect-loop-manip.cc b/gcc/tree-vect-loop-manip.cc
> index ff6159e08d5..1baac7b1b52 100644
> --- a/gcc/tree-vect-loop-manip.cc
> +++ b/gcc/tree-vect-loop-manip.cc
> @@ -385,6 +385,58 @@ vect_maybe_permute_loop_masks (gimple_seq *seq, rgroup_controls *dest_rgm,
>return false;
>  }
>  
> +/* Try to adjust loop lens for non-SLP multiple-rgroups.
> +
> + _36 = MIN_EXPR ;
> +
> + First length (MIN (X, VF/N)):
> +   loop_len_15 = MIN_EXPR <_36, VF/N>;
> +
> + Second length:
> +   tmp = _36 - loop_len_15;
> +   loop_len_16 = MIN (tmp, VF/N);
> +
> + Third length:
> +   tmp2 = tmp - loop_len_16;
> +   loop_len_17 = MIN (tmp2, VF/N);
> +
> + Fourth length:
> +   tmp3 = tmp2 - loop_len_17;
> +   loop_len_18 = MIN (tmp3, VF/N);  */
> +
> +static void
> +vect_adjust_loop_lens (tree iv_type, gimple_seq *seq, rgroup_controls *dest_rgm,
> +                       rgroup_controls *src_rgm)
> +{
> +  tree ctrl_type = dest_rgm->type;
> +  poly_uint64 nitems_per_ctrl
> +    = TYPE_VECTOR_SUBPARTS (ctrl_type) * dest_rgm->factor;
> +
> +  for (unsigned int i = 0; i < dest_rgm->controls.length (); ++i)
> +    {
> +      tree src = src_rgm->controls[i / dest_rgm->controls.length ()];
> +      tree dest = dest_rgm->controls[i];
> +      tree length_limit = build_int_cst (iv_type, nitems_per_ctrl);
> +      gassign *stmt;
> +      if (i == 0)
> +        {
> +          /* MIN (X, VF*I/N) capped to the range [0, VF/N].  */
> +          stmt = gimple_build_assign (dest, MIN_EXPR, src, length_limit);
> +          gimple_seq_add_stmt (seq, stmt);
> +        }
> +      else
> +        {
> +          /* (MIN (remain, VF*I/N)) capped to the range [0, VF/N].  */
> +          tree temp = make_ssa_name (iv_type);
> +          stmt = gimple_build_assign (temp, MINUS_EXPR, src,
> +                                      dest_rgm->controls[i - 1]);
> +          gimple_seq_add_stmt (seq, stmt);
> +          stmt = gimple_build_assign (dest, MIN_EXPR, temp, length_limit);
> +          gimple_seq_add_stmt (seq, stmt);
> +        }
> +    }
> +}
> +
>  /* Helper for vect_set_loop_condition_partial_vectors.  Generate definitions
> for all the rgroup controls in RGC and return a control that is nonzero
> when the loop needs to iterate.  Add any new preheader statements to
> @@ -468,9 +520,10 @@ vect_set_loop_controls_directly (class loop *loop, loop_vec_info loop_vinfo,
>    gimple_stmt_iterator incr_gsi;
>    bool insert_after;
>    standard_iv_increment_position (loop, &incr_gsi, &insert_after);
> -  create_iv (build_int_cst (iv_type, 0), PLUS_EXPR, nitems_step, NULL_TREE,
> -             loop, &incr_gsi, insert_after, &index_before_incr,
> -             &index_after_incr);
> +  if (!LOOP_VINFO_USING_DECREMENTING_IV_P (loop_vinfo))
> +    create_iv (build_int_cst (iv_type, 0), PLUS_EXPR, nitems_step, NULL_TREE,
> +               loop, &incr_gsi, insert_after, &index_before_incr,
> +               &index_after_incr);
>  
>tree zero_index = build_int_cst (compare_type, 0);
>tree test_index, test_limit, first_limit;
> @@ -552,8 +605,13 @@ vect_set_loop_controls_directly (class loop *loop, loop_vec_info loop_vinfo,
>    /* Convert the IV value to the comparison type (either a no-op or
>       a demotion).  */
>    gimple_seq test_seq = NULL;
> -  test_index = gimple_convert (&test_seq, compare_type, test_index);
> -  gsi_insert_seq_before (test_gsi, test_seq, GSI_SAME_STMT);
> +  if (LOOP_VINFO_USING_DECREMENTING_IV_P (loop_vinfo))
> +    test_limit = gimple_convert (pre

Back to requiring "Perl version 5.6.1 (or later)" [PR82856] (was: Update GCC to autoconf 2.69, automake 1.15.1)

2023-05-15 Thread Thomas Schwinge

Hi!

On 2018-10-31T17:04:46+, Joseph Myers  wrote:
> On Wed, 31 Oct 2018, Thomas Koenig wrote:
>> Am 31.10.18 um 04:26 schrieb Joseph Myers:
>> > This patch (diffs to generated files omitted below) updates GCC to use
>> > autoconf 2.69 and automake 1.15.1.
>>
>> I think this should fix PR 82856.  Maybe you could confirm that this
>> restores automake functionality with perl 5.6.26, and mention the PR
>> in the ChangeLog.

(Perl 5.26, not 5.6.26, is what was meant there; see
.
I remember well, as I once chased down that patch...)

> At least, the warnings I saw with an older perl version and automake
> 1.11.x are gone when using 1.15.1.

ACK.

> I've committed this revised patch version

> A reference to PR
> bootstrap/82856 has been added.  gcc/doc/install.texi has been updated
> to mention the new versions required.

..., but the Perl upper version limit ("5.6.25", meaning 5.26) was not removed.  OK to push the
attached "Back to requiring "Perl version 5.6.1 (or later)" [PR82856]"?
(Later then to be backported to all relevant release branches, too.)


Regards
 Thomas


From 18153b349eb0062d73e3b2da3a2721dd44884b94 Mon Sep 17 00:00:00 2001
From: Thomas Schwinge 
Date: Mon, 15 May 2023 20:55:11 +0200
Subject: [PATCH] Back to requiring "Perl version 5.6.1 (or later)" [PR82856]

With Subversion r265695 (Git commit 22e052725189a472e4e86ebb6595278a49f4bcdd)
"Update GCC to autoconf 2.69, automake 1.15.1 (PR bootstrap/82856)" we're back
to normal; per Automake 1.15.1 'configure.ac' still "[...] perl 5.6 or better
is required [...]".

	PR bootstrap/82856
	gcc/
	* doc/install.texi (Perl): Back to requiring "Perl version 5.6.1 (or
	later)".
---
 gcc/doc/install.texi | 2 +-
 1 file changed, 1 insertion(+), 1 deletion(-)

diff --git a/gcc/doc/install.texi b/gcc/doc/install.texi
index fa91ce1953d..dfab47dac96 100644
--- a/gcc/doc/install.texi
+++ b/gcc/doc/install.texi
@@ -366,7 +366,7 @@ Necessary (only on some platforms) to untar the source code.  Many
 systems' @command{tar} programs will also work, only try GNU
 @command{tar} if you have problems.
 
-@item Perl version between 5.6.1 and 5.6.24
+@item Perl version 5.6.1 (or later)
 
 Necessary when targeting Darwin, building @samp{libstdc++},
 and not using @option{--disable-symvers}.
-- 
2.34.1

Re: [aarch64] Code-gen for vector initialization involving constants

2023-05-15 Thread Richard Sandiford via Gcc-patches

Prathamesh Kulkarni  writes:
> Hi Richard,
> After committing the interleave+zip1 patch for vector initialization,
> it seems to regress the s32 case for this patch:
>
> int32x4_t f_s32(int32_t x)
> {
>   return (int32x4_t) { x, x, x, 1 };
> }
>
> code-gen:
> f_s32:
>         movi    v30.2s, 0x1
>         fmov    s31, w0
>         dup     v0.2s, v31.s[0]
>         ins     v30.s[0], v31.s[0]
>         zip1    v0.4s, v0.4s, v30.4s
>         ret
>
> instead of expected code-gen:
> f_s32:
>         movi    v31.2s, 0x1
>         dup     v0.4s, w0
>         ins     v0.s[3], v31.s[0]
>         ret
>
> Cost for fallback sequence: 16
> Cost for interleave and zip sequence: 12
>
> For the above case, the cost for interleave+zip1 sequence is computed as:
> halves[0]:
> (set (reg:V2SI 96)
> (vec_duplicate:V2SI (reg/v:SI 93 [ x ])))
> cost = 8
>
> halves[1]:
> (set (reg:V2SI 97)
> (const_vector:V2SI [
> (const_int 1 [0x1]) repeated x2
> ]))
> (set (reg:V2SI 97)
> (vec_merge:V2SI (vec_duplicate:V2SI (reg/v:SI 93 [ x ]))
> (reg:V2SI 97)
> (const_int 1 [0x1])))
> cost = 8
>
> followed by:
> (set (reg:V4SI 95)
> (unspec:V4SI [
> (subreg:V4SI (reg:V2SI 96) 0)
> (subreg:V4SI (reg:V2SI 97) 0)
> ] UNSPEC_ZIP1))
> cost = 4
>
> So the total cost becomes
> max(costs[0], costs[1]) + zip1_insn_cost
> = max(8, 8) + 4
> = 12
>
> While the fallback rtl sequence is:
> (set (reg:V4SI 95)
> (vec_duplicate:V4SI (reg/v:SI 93 [ x ])))
> cost = 8
> (set (reg:SI 98)
> (const_int 1 [0x1]))
> cost = 4
> (set (reg:V4SI 95)
> (vec_merge:V4SI (vec_duplicate:V4SI (reg:SI 98))
> (reg:V4SI 95)
> (const_int 8 [0x8])))
> cost = 4
>
> So total cost = 8 + 4 + 4 = 16, and we choose the interleave+zip1 sequence.
>
> I think the issue is probably that for the interleave+zip1 sequence we take
> max(costs[0], costs[1]) to reflect that both halves are interleaved,
> but for the fallback seq we use seq_cost, which assumes serial execution
> of insns in the sequence.
> For the above fallback sequence,
> (set (reg:V4SI 95)
> (vec_duplicate:V4SI (reg/v:SI 93 [ x ])))
> and
> (set (reg:SI 98)
> (const_int 1 [0x1]))
> could be executed in parallel, which would make its cost max(8, 4) + 4 = 12.

Agreed.

A good-enough substitute for this might be to ignore scalar moves
(for both alternatives) when costing for speed.
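
To make the arithmetic in the quoted analysis concrete, here is a minimal sketch of the two costing models being compared. It is not the aarch64 backend code: `serial_cost` and `parallel_cost` are hypothetical helper names, and the insn costs are simply the numbers quoted above.

```cpp
#include <algorithm>
#include <numeric>
#include <vector>

// Serial model: every insn in the sequence adds to the total,
// which is the behaviour attributed to seq_cost above.
int
serial_cost (const std::vector<int> &insn_costs)
{
  return std::accumulate (insn_costs.begin (), insn_costs.end (), 0);
}

// Parallel model: independent parts overlap, so only the most expensive
// one counts, followed by the insn that combines them.
// INDEPENDENT_COSTS must not be empty.
int
parallel_cost (const std::vector<int> &independent_costs, int combine_cost)
{
  return *std::max_element (independent_costs.begin (),
                            independent_costs.end ()) + combine_cost;
}
```

With the quoted numbers, serial_cost ({8, 4, 4}) gives 16 for the fallback sequence and parallel_cost ({8, 8}, 4) gives 12 for interleave+zip1. Costing the fallback with the parallel model, parallel_cost ({8, 4}, 4), also gives 12, and dropping the scalar move from the serial sum (8 + 4) would reach the same figure, assuming the move is the cost-4 set of (reg:SI 98).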

> I was wondering if we should make the cost for the interleave+zip1 sequence
> more conservative by not taking max, but summing up costs[0] + costs[1]
> even for speed?  For this case, that would be 8 + 8 + 4 = 20.
>
> It generates the fallback sequence for other cases (s8, s16, s64) from
> the test-case.

What does it do for the tests in the interleave+zip1 patch?  If it doesn't
make a difference there then it sounds like we don't have enough tests. :)

Summing is only conservative if the fallback sequence is somehow "safer".
But I don't think it is.   Building an N-element vector from N scalars
can be done using N instructions in the fallback case and N+1 instructions
in the interleave+zip1 case.  But the interleave+zip1 case is still
better (speedwise) for N==16.

Thanks,
Richard

Ping: [PATCH V5] PR target/105325: Fix constraint issue with power10 fusion

2023-05-15 Thread Michael Meissner via Gcc-patches

Ping both patches:

Patch #1, rewrite genfusion.pl's code for load and compare immediate fusion to
be more readable.  This patch produces the same output as the current sources.

| Date: Wed, 10 May 2023 11:38:55 -0400
| Subject: Re: [PATCH V5, 1/2] PR target/105325: Rewrite genfusion.pl's 
gen_ld_cmpi_p10 function.
| Message-ID: 

Patch #2, implement the fix for PR target/105325:

| Date: Wed, 10 May 2023 11:40:00 -0400
| Subject: [PATCH V5, 2/2] PR target/105325: Fix memory constraints for power10 
fusion.
| Message-ID: 

-- 
Michael Meissner, IBM
PO Box 98, Ayer, Massachusetts, USA, 01432
email: meiss...@linux.ibm.com

Re: Question on patch -fprofile-partial-training

2023-05-15 Thread Qing Zhao via Gcc-patches



> On May 11, 2023, at 12:08 PM, Qing Zhao via Gcc-patches 
>  wrote:
> 
> 
> 
>> On May 10, 2023, at 9:15 AM, Jan Hubicka  wrote:
>> 
>>> Honza,
>>>> Main motivation for this was profiling programs that contain specific
>>>> code paths for different CPUs (such as graphics library in Firefox or Linux
>>>> kernel). When the training machine differs from the machine the program
>>>> is later run on, we end up optimizing for size all code paths
>>>> except the ones taken by the specific CPU.  This patch essentially tells gcc
>>>> to consider every non-trained function as built without profile
>>>> feedback.
>>> Make sense.
>>>>
>>>> For Firefox it had important impact on graphics rendering tests back
>>>> then since the build machine had AVX while the benchmarking one did not.
>>>> Some benchmarks improved several times which is not a surprise if you
>>>> consider tight graphics rendering loop optimized for size versus
>>>> vectorized one.
>>> 
>>> That’s a lot of improvement. So, without -fprofile-partial-training, the 
>>> PGO hurt the performance for those cases? 
>> 
>> Yes, to get code size improvements we assume that the non-trained part
>> of code is cold and with -Os we are very aggressive to optimize for
>> size.  We now have two-level optimize_for size, so I think we could
>> make this more fine grained this stage1.
> 
> Okay. I see. 
> 
> Thanks a lot for the info.
> 
> Another question (which is confusing us very much right now):
> 
> When we lower the following  parameter from 999 to 950: (in GCC8)
> 
> DEFPARAM(HOT_BB_COUNT_WS_PERMILLE,
> "hot-bb-count-ws-permille",
> "A basic block profile count is considered hot if it contributes to "
> "the given permillage of the entire profiled execution.",
> 999, 0, 1000)
> 
> The size of the "text.hot" section is 4x SMALLER than the default one. 
> Is this expected behavior? 

After further study of GCC 8: yes, this is the expected behavior. :-)
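
A simplified model shows why a smaller permille value can shrink the hot section. This is only a sketch under my own assumptions, not GCC's implementation: `hot_threshold` and `count_hot` are hypothetical helpers, and the block counts are made up for illustration. The threshold is taken to be the smallest count that is still needed for the hottest blocks to cover the requested share of the total execution count.

```cpp
#include <algorithm>
#include <cstddef>
#include <cstdint>
#include <functional>
#include <numeric>
#include <vector>

// Walk the counts from hottest to coldest and return the count at which
// the blocks seen so far cover PERMILLE/1000 of the total.
uint64_t
hot_threshold (std::vector<uint64_t> counts, unsigned permille)
{
  std::sort (counts.begin (), counts.end (), std::greater<uint64_t> ());
  uint64_t total = std::accumulate (counts.begin (), counts.end (),
                                    uint64_t{0});
  uint64_t covered = 0;
  for (uint64_t c : counts)
    {
      covered += c;
      if (covered * 1000 >= total * permille)
        return c;
    }
  return 0;
}

// Number of blocks whose count reaches the threshold.
std::size_t
count_hot (const std::vector<uint64_t> &counts, uint64_t threshold)
{
  return static_cast<std::size_t> (
    std::count_if (counts.begin (), counts.end (),
                   [threshold] (uint64_t c) { return c >= threshold; }));
}
```

For the made-up counts {900, 50, 30, 15, 5}, a permille of 999 yields a threshold of 5, so all five blocks count as hot, while 950 yields a threshold of 50 and only two blocks remain hot. In this model a lower permille raises the threshold, which is consistent with a smaller text.hot section.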

Qing
> (From my reading of the GCC8 source code, when this parameter is getting 
> smaller, more basic blocks and functions will
> Be considered as HOT by GCC, then the text.hot section should be larger, not 
> smaller, do I miss anything here?)
> 
> Thanks a lot for your help.
> 
> Qing
> 
>> 
>> Honza
>>> 
>>>> The patch has bad effect on code size which in turn
>>>> impacts performance too, so I think it makes sense to use
>>>> -fprofile-partial-training with a bit of care (i.e. only on code where
>>>> such scenarios are likely).
>>> 
>>> Right. 
>>>>
>>>> As for backporting, I do not have checkout of GCC 8 right now. It
>>>> depends on profile infrastructure that was added in 2017 (so stage1 of
>>>> GCC 8), so the patch may backport quite easily.  I am not 100% sure
>>>> what shape the infrastructure was in the first version, but I am quite
>>>> convinced it had the necessary bits - it was able to make the difference
>>>> between 0 profile count and missing profile feedback.
>>> 
>>> This is good to know, I will try to back port to GCC8 and let them test to 
>>> see any good impact.
>>> 
>>> Qing
>>>>
>>>> Honza

Re: [PATCH] Add auto-resizing capability to irange's [PR109695]

2023-05-15 Thread Aldy Hernandez via Gcc-patches


On 5/15/23 17:07, Aldy Hernandez wrote:



On 5/15/23 12:42, Jakub Jelinek wrote:

On Mon, May 15, 2023 at 12:35:23PM +0200, Aldy Hernandez wrote:

gcc/ChangeLog:

PR tree-optimization/109695
* value-range.cc (irange::operator=): Resize range.
(irange::union_): Same.
(irange::intersect): Same.
(irange::invert): Same.
(int_range_max): Default to 3 sub-ranges and resize as needed.
* value-range.h (irange::maybe_resize): New.
(~int_range): New.
(int_range::int_range): Adjust for resizing.
(int_range::operator=): Same.


LGTM.

One question is if we shouldn't do it for GCC13/GCC12 as well, perhaps
changing it to some larger number than 3 when the members aren't 
wide_ints

in there but just trees.  Sure, in 13/12 the problem is 10x less severe
than in current trunk, but still we have some cases where we run out of
stack because of it on some hosts.


Sure, but that would require messing around with the gt_* GTY functions, 
and making sure we're allocating the trees from a sensible place, etc 
etc.  I'm less confident in my ability to mess with GTY stuff this late 
in the game.


Hmmm, maybe backporting this isn't too bad.  The only time we'd have a 
chunk on the heap is for int_range_max, which will never live in GC 
space.  So I don't think we need to worry about GC at all.
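
The general idea of a range that starts with a few inline sub-ranges and moves to the heap only when it grows can be sketched as follows. This is a generic small-buffer container written for illustration, not the irange code: `small_vec` and its members are hypothetical names, and N is assumed to be greater than zero.

```cpp
#include <algorithm>

// Holds up to N elements inline; switches to a heap chunk when it grows.
template <typename T, unsigned N>
class small_vec
{
public:
  small_vec () : m_data (m_inline), m_capacity (N), m_size (0) {}
  small_vec (const small_vec &) = delete;
  small_vec &operator= (const small_vec &) = delete;
  ~small_vec () { if (on_heap ()) delete[] m_data; }

  void push_back (const T &value)
  {
    if (m_size == m_capacity)
      maybe_resize (m_capacity * 2);
    m_data[m_size++] = value;
  }

  bool on_heap () const { return m_data != m_inline; }
  unsigned size () const { return m_size; }
  const T &operator[] (unsigned i) const { return m_data[i]; }

private:
  // Grow to at least NEEDED elements, copying the existing ones over.
  void maybe_resize (unsigned needed)
  {
    if (needed <= m_capacity)
      return;
    T *bigger = new T[needed];
    std::copy (m_data, m_data + m_size, bigger);
    if (on_heap ())
      delete[] m_data;
    m_data = bigger;
    m_capacity = needed;
  }

  T m_inline[N];        // inline storage, no allocation needed
  T *m_data;            // points at m_inline or at a heap chunk
  unsigned m_capacity;
  unsigned m_size;
};
```

A `small_vec<int, 3>` stays entirely in its inline storage for the first three elements and allocates only on the fourth. The heap chunk is owned and freed by the object itself, which is why such an object is a poor fit for garbage-collected memory.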


Although, legacy mode in GCC13 does get in the way a bit.  Sigh.

And unrelated, but speaking of GC... we should remove all GTY markers 
from vrange.  It should never live in GC.  That's what we have 
vrange_storage for, and that is what we put in the tree_ssa_name.


  /* Value range information.  */
  union ssa_name_info_type {
    /* Range and aliasing info for pointers.  */
    struct GTY ((tag ("0"))) ptr_info_def *ptr_info;
    /* Range info for everything else.  */
    struct GTY ((tag ("1"))) vrange_storage * range_info;
  } GTY ((desc ("%1.typed.type ?" \
                "!POINTER_TYPE_P (TREE_TYPE ((tree)&%1)) : 2"))) info;

That should have been the only use of range GC stuff, but alas another 
one crept in... IPA:


struct GTY (()) ipa_jump_func
{
...
  /* Information about value range, containing valid data only when vr_known is
     true.  The pointed to structure is shared between different jump
     functions.  Use ipa_set_jfunc_vr to set this field.  */
  value_range *m_vr;
...
};

This means that we can't nuke int_range and default to an always 
resizable range just yet, because we'll end up with the value_range in 
GC memory, and resizable part in the heap.


That m_vr pointer should be a pointer to vrange_storage.  Meh...I'm 
bumping against my IPA work yet again.  I think it's time to start 
dusting off those patches.


Aldy

[patch,avr] Fix PR109650 wrong code

2023-05-15 Thread Georg-Johann Lay


This patch fixes a wrong-code bug in the wake of PR92729, the transition
that turned the AVR backend from cc0 to CCmode.  In cc0, the insn that
uses cc0 like a conditional branch always follows the cc0 setter, which
is no longer the case with CCmode, where the set and use of REG_CC might be in
different basic blocks.

This patch removes the machine-dependent reorg pass in avr_reorg entirely.

It is replaced by a new, AVR specific mini-pass that runs prior to
split2. Canonicalization of comparisons away from the "difficult"
codes GT[U] and LE[U] is now mostly performed by implementing
TARGET_CANONICALIZE_COMPARISON.
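
As an illustration of what moving away from GT and LE can look like for comparisons against a constant, here is the textbook rewrite x > c to x >= c + 1 and x <= c to x < c + 1. This is only a sketch: the types and names are hypothetical, it models signed 8-bit constants only, and whether avr_canonicalize_comparison performs exactly this transformation is an assumption on my part.

```cpp
#include <cstdint>
#include <limits>

enum class cmp { GT, LE, GE, LT };

struct comparison
{
  cmp code;
  int8_t constant;
};

// Rewrite "x > c" as "x >= c + 1" and "x <= c" as "x < c + 1".
// Returns false and leaves C alone when no rewrite is possible.
bool
canonicalize (comparison &c)
{
  if (c.code != cmp::GT && c.code != cmp::LE)
    return false;
  if (c.constant == std::numeric_limits<int8_t>::max ())
    return false;  // c + 1 would not fit in the mode
  c.constant = static_cast<int8_t> (c.constant + 1);
  c.code = (c.code == cmp::GT) ? cmp::GE : cmp::LT;
  return true;
}

// Evaluate the comparison for a concrete value of x.
bool
holds (const comparison &c, int x)
{
  switch (c.code)
    {
    case cmp::GT: return x > c.constant;
    case cmp::LE: return x <= c.constant;
    case cmp::GE: return x >= c.constant;
    case cmp::LT: return x < c.constant;
    }
  return false;
}
```

The rewrite keeps the truth value for every x in the mode, and the only constant it must leave alone is the maximum value, where c + 1 would overflow.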

Moreover:

* Text peephole conditions get "dead_or_set_regno_p (*, REG_CC)" as
needed.

* RTL peephole conditions get "peep2_regno_dead_p (*, REG_CC)" as
needed.

* Conditional branches no longer clobber REG_CC.

* insn output for compares looks ahead to determine the branch mode in
use. This needs also "dead_or_set_regno_p (*, REG_CC)".

* Add RTL peepholes for decrement-and-branch detection.

Finally, it fixes some of the many indentation glitches left over from
PR92729.

Ok?

I'd also backport this one because all of v12+ is affected by the wrong 
code.


Johann

--

gcc/
PR/target 109650
PR/target 97279

* config/avr/avr-passes.def (avr_pass_ifelse): Insert new pass.
* config/avr/avr.cc (avr_pass_ifelse): New RTL pass.
(avr_pass_data_ifelse): New pass_data for it.
(make_avr_pass_ifelse, avr_redundant_compare, avr_cbranch_cost)
(avr_canonicalize_comparison, avr_out_plus_set_ZN): New functions.
(compare_condition): Make sure REG_CC dies in the branch insn.
(avr_rtx_costs_1): Add computation of cbranch costs.
(avr_adjust_insn_length) [ADJUST_LEN_ADD_SET_ZN]: Handle case.
(TARGET_CANONICALIZE_COMPARISON): New define.
(avr_simplify_comparison_p, compare_diff_p, avr_compare_pattern)
(avr_reorg_remove_redundant_compare, avr_reorg): Remove functions.
(TARGET_MACHINE_DEPENDENT_REORG): Remove define.

* avr-protos.h (avr_simplify_comparison_p): Remove proto.
(make_avr_pass_ifelse, avr_out_plus_set_ZN, cc_reg_rtx): New Protos

* config/avr/avr.md (branch, difficult_branch): Don't split insns.
(*swapped_tst, *add.for.eqne.): New insns.
(*cbranch4): Rename to cbranch4_insn.
(cbranch4): Try to canonicalize comparisons at expand.
(define_peephole): Add dead_or_set_regno_p(insn,REG_CC) as needed.
(define_peephole2): Add peep2_regno_dead_p(*,REG_CC) as needed.
Add new RTL peepholes for decrement-and-branch and *swapped_tst.
(adjust_len) [add_set_ZN]: New.
(rvbranch, *rvbranch, difficult_rvbranch, *difficult_rvbranch)
(branch_unspec, *negated_tst, *reversed_tst): Remove insns.
(define_c_enum "unspec") [UNSPEC_IDENTITY]: Remove.

* config/avr/avr-dimode.md (cbranch4): Canonicalize comparisons.
* config/avr/predicates.md (scratch_or_d_register_operand): New.
* config/avr/constraints.md (Yxx): New constraint.

gcc/testsuite/
PR/target 109650
* config/avr/torture/pr109650-1.c: New test.

diff --git a/gcc/config/avr/avr-dimode.md b/gcc/config/avr/avr-dimode.md
index c0bb04ff9e0..91f0d395761 100644
--- a/gcc/config/avr/avr-dimode.md
+++ b/gcc/config/avr/avr-dimode.md
@@ -455,12 +455,18 @@ (define_expand "conditional_jump"
 (define_expand "cbranch4"
   [(set (pc)
 (if_then_else (match_operator 0 "ordered_comparison_operator"
-[(match_operand:ALL8 1 "register_operand"  "")
- (match_operand:ALL8 2 "nonmemory_operand" "")])
- (label_ref (match_operand 3 "" ""))
- (pc)))]
+[(match_operand:ALL8 1 "register_operand")
+ (match_operand:ALL8 2 "nonmemory_operand")])
+  (label_ref (match_operand 3))
+  (pc)))]
   "avr_have_dimode"
{
+int icode = (int) GET_CODE (operands[0]);
+
+targetm.canonicalize_comparison (&icode, &operands[1], &operands[2], false);
+operands[0] = gen_rtx_fmt_ee ((enum rtx_code) icode,
+  VOIDmode, operands[1], operands[2]);
+
 rtx acc_a = gen_rtx_REG (mode, ACC_A);
 
 avr_fix_inputs (operands, 1 << 2, regmask (mode, ACC_A));
@@ -490,8 +496,8 @@ (define_insn_and_split "cbranch_2_split"
 (if_then_else (match_operator 0 "ordered_comparison_operator"
 [(reg:ALL8 ACC_A)
  (reg:ALL8 ACC_B)])
- (label_ref (match_operand 1 "" ""))
- (pc)))]
+  (label_ref (match_operand 1))
+  (pc)))]
   "avr_have_dimode"
   "#"
   "&& reload_completed"
@@ -544,8 +550,8 @@ (define_insn_and_split "cbranch_const_2_split"
 (if_then_else (match_operator 0 "ordered_comparison_operator"
 [(reg:ALL8 ACC_A)
  (match_op

Re: [COMMITTED] Remove deprecated range_fold_{unary, binary}_expr uses from ipa-*.

2023-05-15 Thread Aldy Hernandez via Gcc-patches





On 5/5/23 17:10, Martin Jambor wrote:

Hello,

On Wed, Apr 26 2023, Aldy Hernandez via Gcc-patches wrote:

gcc/ChangeLog:

* ipa-cp.cc (ipa_vr_operation_and_type_effects): Convert to ranger API.
(ipa_value_range_from_jfunc): Same.
(propagate_vr_across_jump_function): Same.
* ipa-fnsummary.cc (evaluate_conditions_for_known_args): Same.
* ipa-prop.cc (ipa_compute_jump_functions_for_edge): Same.
* vr-values.cc (bounds_of_var_in_loop): Same.


thanks for taking care of the value range uses in IPA.


---
  gcc/ipa-cp.cc| 28 +--
  gcc/ipa-fnsummary.cc | 45 
  gcc/ipa-prop.cc  |  5 ++---
  gcc/vr-values.cc |  6 --
  4 files changed, 57 insertions(+), 27 deletions(-)

diff --git a/gcc/ipa-cp.cc b/gcc/ipa-cp.cc
index 65c49558b58..673c40b 100644
--- a/gcc/ipa-cp.cc
+++ b/gcc/ipa-cp.cc
@@ -128,6 +128,7 @@ along with GCC; see the file COPYING3.  If not see
  #include "attribs.h"
  #include "dbgcnt.h"
  #include "symtab-clones.h"
+#include "gimple-range.h"
  
  template  class ipcp_value;
  
@@ -1900,10 +1901,15 @@ ipa_vr_operation_and_type_effects (value_range *dst_vr,

   enum tree_code operation,
   tree dst_type, tree src_type)
  {
-  range_fold_unary_expr (dst_vr, operation, dst_type, src_vr, src_type);
-  if (dst_vr->varying_p () || dst_vr->undefined_p ())
+  if (!irange::supports_p (dst_type) || !irange::supports_p (src_type))
  return false;
-  return true;
+
+  range_op_handler handler (operation, dst_type);


Would it be possible to document the range_op_handler class somewhat?


+  return (handler
+ && handler.fold_range (*dst_vr, dst_type,
+*src_vr, value_range (dst_type))
+ && !dst_vr->varying_p ()
+ && !dst_vr->undefined_p ());


It looks important but the class is not documented at all.  Although the
use of fold_range is hopefully mostly clear from its uses in
this patch, the meaning of the return value of this method and what
other methods do is less obvious.

For example, I am curious why (not in this patch, but in the code as it
is now in the repo), uses of fold_range seem to be always preceded by
a check for supports_type_p, even though the type is then also fed into
fold_range itself.  Does the return value of fold_range mean something
slightly different from "could not deduce anything?"


Oh, I see what you mean.

Take for instance this bit in ipa-cp:

  if (!irange::supports_p (dst_type) || !irange::supports_p (src_type))
return false;

  range_op_handler handler (operation, dst_type);
  return (handler
  && handler.fold_range (*dst_vr, dst_type,
 *src_vr, value_range (dst_type))
  && !dst_vr->varying_p ()
  && !dst_vr->undefined_p ());

range_op_handler::fold_range() takes a type agnostic vrange (from which 
irange inherits).  If you pass it an irange, but the type is say a 
float, you'll get an ICE downstream.


Ranger itself is type agnostic and takes a vrange almost everywhere. 
It's up to the user to make sure the range type and the type of the 
operation match.
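
As a rough illustration of why the caller-side check is needed, here is a toy sketch.  None of these are GCC's real classes: type_kind, toy_irange, toy_fold_negate and safe_negate are hypothetical stand-ins.  The type-agnostic entry point trusts its caller and asserts (the toy analogue of an ICE) on a mismatch, so the caller guards with supports_p first.

```cpp
#include <cassert>
#include <optional>

// Hypothetical stand-in for a tree type.
enum class type_kind { INTEGER, FLOAT };

// Toy integer range [lo, hi].
struct toy_irange
{
  long lo, hi;
  // An integer range can only describe integer-typed values.
  static bool supports_p (type_kind t) { return t == type_kind::INTEGER; }
};

// Type-agnostic style entry point: it assumes DST and SRC are the right
// kind of range for TYPE and asserts if they are not.
inline bool
toy_fold_negate (toy_irange &dst, type_kind type, const toy_irange &src)
{
  assert (toy_irange::supports_p (type));
  dst.lo = -src.hi;
  dst.hi = -src.lo;
  return true;
}

// Caller-side guard: refuse unsupported types before folding, then treat
// a failed fold as "nothing known".
inline std::optional<toy_irange>
safe_negate (type_kind type, const toy_irange &src)
{
  if (!toy_irange::supports_p (type))
    return std::nullopt;
  toy_irange dst{};
  if (!toy_fold_negate (dst, type, src))
    return std::nullopt;
  return dst;
}
```

In this sketch the two answers are distinct: the supports_p check says "this range class cannot represent that type at all", while the fold's return value says "the operation could (or could not) be folded".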


Eventually we should convert all those value_range arguments in IPA to 
vrange and have it work in a type agnostic manner.  I have patches for 
this, but I still have to flush out all this preliminary stuff :).


Aldy

Re: [PATCH] Add auto-resizing capability to irange's [PR109695]

2023-05-15 Thread Aldy Hernandez via Gcc-patches

On Mon, May 15, 2023 at 5:03 PM Aldy Hernandez  wrote:
>
>
>
> On 5/15/23 13:08, Richard Biener wrote:
> > On Mon, May 15, 2023 at 12:35 PM Aldy Hernandez  wrote:
> >>
> >> 
> >> We can now have int_range<N, true> for automatically
> >> resizable ranges.  int_range_max is now int_range<3, true>
> >> for a 69X reduction in size from current trunk, and 6.9X reduction from
> >> GCC12.  This incurs a 5% performance penalty for VRP that is more than
> >> covered by our > 13% improvements recently.
> >> 
> >>
> >> int_range_max is the temporary range object we use in the ranger for
> >> integers.  With the conversion to wide_int, this structure bloated up
> >> significantly because wide_ints are huge (80 bytes a piece) and are
> >> about 10 times as big as a plain tree.  Since the temporary object
> >> requires 255 sub-ranges, that's 255 * 80 * 2, plus the control word.
> >> This means the structure grew from 4112 bytes to 40912 bytes.
> >>
> >> This patch adds the ability to resize ranges as needed, defaulting to
> >> no resizing, while int_range_max now defaults to 3 sub-ranges (instead
> >> of 255) and grows to 255 when the range being calculated does not fit.
> >>
> >> For example:
> >>
> >> int_range<1> foo;   // 1 sub-range with no resizing.
> >> int_range<5> foo;   // 5 sub-ranges with no resizing.
> >> int_range<5, true> foo; // 5 sub-ranges with resizing.
> >>
> >> I ran some tests and found that 3 sub-ranges cover 99% of cases, so
> >> I've set the int_range_max default to that:
> >>
> >>  typedef int_range<3, /*RESIZABLE=*/true> int_range_max;
> >>
> >> We don't bother growing incrementally, since the default covers most
> >> cases and we have a 255 hard-limit.  This hard limit could be reduced
> >> to 128, since my tests never saw a range needing more than 124, but we
> >> could do that as a follow-up if needed.
> >>
> >> With 3-subranges, int_range_max is now 592 bytes versus 40912 for
> >> trunk, and versus 4112 bytes for GCC12!  The penalty is 5.04% for VRP
> >> and 3.02% for threading, with no noticeable change in overall
> >> compilation (0.27%).  This is more than covered by our 13.26%
> >> improvements for the legacy removal + wide_int conversion.
> >
> > Thanks for doing this.
> >
> >> I think this approach is a good alternative, while providing us with
> >> flexibility going forward.  For example, we could try defaulting to a
> >> 8 sub-ranges for a noticeable improvement in VRP.  We could also use
> >> large sub-ranges for switch analysis to avoid resizing.
> >>
> >> Another approach I tried was always resizing.  With this, we could
> >> drop the whole int_range nonsense, and have irange just hold a
> >> resizable range.  This simplified things, but incurred a 7% penalty on
> >> ipa_cp.  This was hard to pinpoint, and I'm not entirely convinced
> >> this wasn't some artifact of valgrind.  However, until we're sure,
> >> let's avoid massive changes, especially since IPA changes are coming
> >> up.
> >>
> >> For the curious, a particular hot spot for IPA in this area was:
> >>
> >> ipcp_vr_lattice::meet_with_1 (const value_range *other_vr)
> >> {
> >> ...
> >> ...
> >>value_range save (m_vr);
> >>m_vr.union_ (*other_vr);
> >>return m_vr != save;
> >> }
> >>
> >> The problem isn't the resizing (since we do that at most once) but the
> >> fact that for some functions with lots of callers we end up a huge
> >> range that gets copied and compared for every meet operation.  Maybe
> >> the IPA algorithm could be adjusted somehow??.
> >
> > Well, the above just wants to know whether the union_ operation changed
> > the range.  I suppose that would be an interesting (and easy to compute?)
> > secondary output of union_ and it seems it already computes that (but
> > maybe not correctly?).  So I suggest to change the above to
>
> union_ returns a value specifically for that, which Andrew uses for
> cache optimization.  For that matter, your suggestion was my first
> approach, but I quickly found out we were being overly pessimistic in
> some cases, and I was too lazy to figure out why.
>
> >
> >bool res;
> >if (flag_checking)
> > {
> >value_range save (m_vr);
> >res = m_vr.union_ (*other_vr);
> >gcc_assert (res == (m_vr != save));
> > }
> >   else
> > >  res = m_vr.union_ (*other_vr);
> >   return res;
>
> With your suggested sanity check I chased the problem to a minor
> inconsistency when unioning nonzero masks.  The issue wasn't a bug, just
> a pessimization.  I'm attaching a patch that corrects the oversight
> (well, not oversight, everything was more expensive with trees)... It
> yields a 6.89% improvement to the ipa-cp pass!!!  Thanks.
>
> I'll push it if it passes tests.

Tests passed.  Pushed patch.

I've also pushed the original patch in this email.  We can address
anything else as a follow-up.

Thanks for everyone's feedback.
Aldy

Re: For GCC, newlib combined tree, newlib build-tree testing, use standard search paths

2023-05-15 Thread Jeff Johnston via Gcc-patches

Sounds fine Thomas.  Thanks.

-- Jeff J.

On Mon, May 15, 2023 at 4:01 AM Thomas Schwinge 
wrote:

> Hi!
>
> On 2023-05-08T21:50:56+0200, I wrote:
> > Ping: OK to push to newlib main branch the attached
> > "For GCC, newlib combined tree, newlib build-tree testing, use standard
> search paths"?
> > Or, has anybody got adverse comments/insight into this?
>
> Given that nobody has any comments, I'll push this later this week.
> (..., and be available to address any issue this, unlikely, may cause.)
>
>
> Grüße
>  Thomas
>
>
> > On 2023-04-14T22:03:28+0200, I wrote:
> >> Hi!
> >>
> >> OK to push to newlib main branch the attached
> >> "For GCC, newlib combined tree, newlib build-tree testing, use standard
> search paths"
> >> -- or is something else wrong here, or should this be done differently?
> >> (I mean, I'm confused why this doesn't just work; I'm certainly not the
> >> first person to be testing such a setup?)
> >>
> >> I'm not doing anything special here: just symlink 'newlib' into the GCC
> >> source directory, build the combined tree, and then run 'make check', as
> >> mentioned in the attached Git commit log.
> >>
> >>
> >> Grüße
> >>  Thomas
>
>
> -
> Siemens Electronic Design Automation GmbH; Anschrift: Arnulfstraße 201,
> 80634 München; Gesellschaft mit beschränkter Haftung; Geschäftsführer:
> Thomas Heurung, Frank Thürauf; Sitz der Gesellschaft: München;
> Registergericht München, HRB 106955
>

Re: [PATCH] c++: add feature-test macro for auto(x)

2023-05-15 Thread Patrick Palka via Gcc-patches

On Mon, May 15, 2023 at 11:43 AM Jakub Jelinek  wrote:
>
> On Mon, May 15, 2023 at 11:41:46AM -0400, Jason Merrill via Gcc-patches wrote:
> > On 5/15/23 11:24, Patrick Palka wrote:
> > > This adds the feature-test macro for P0849R8, as per
> > > https://github.com/cplusplus/CWG/issues/281.
> > >
> > > Tested on x86_64-pc-linux-gnu, does this look OK for trunk/13?
> >
> > OK.
>
> https://gcc.gnu.org/projects/cxx-status.html#cxx23 lists it already in 12,
> shouldn't that go to 12 branch as well?

D'oh, I misremembered when Marek implemented this!  I can backport it
to 12 as well.

> >
> > > gcc/c-family/ChangeLog:
> > >
> > > * c-cppbuiltin.cc (c_cpp_builtins): Predefine __cpp_auto_cast
> > > for C++23.
> > >
> > > gcc/testsuite/ChangeLog:
> > >
> > > * g++.dg/cpp23/feat-cxx2b.C: Test __cpp_auto_cast.
>
> Jakub
>

Re: [PATCH] c++: add feature-test macro for auto(x)

2023-05-15 Thread Jakub Jelinek via Gcc-patches

On Mon, May 15, 2023 at 11:41:46AM -0400, Jason Merrill via Gcc-patches wrote:
> On 5/15/23 11:24, Patrick Palka wrote:
> > This adds the feature-test macro for P0849R8, as per
> > https://github.com/cplusplus/CWG/issues/281.
> > 
> > Tested on x86_64-pc-linux-gnu, does this look OK for trunk/13?
> 
> OK.

https://gcc.gnu.org/projects/cxx-status.html#cxx23 lists it already in 12,
shouldn't that go to 12 branch as well?
> 
> > gcc/c-family/ChangeLog:
> > 
> > * c-cppbuiltin.cc (c_cpp_builtins): Predefine __cpp_auto_cast
> > for C++23.
> > 
> > gcc/testsuite/ChangeLog:
> > 
> > * g++.dg/cpp23/feat-cxx2b.C: Test __cpp_auto_cast.

Jakub

Re: [PATCH] c++: add feature-test macro for auto(x)

2023-05-15 Thread Jason Merrill via Gcc-patches


On 5/15/23 11:24, Patrick Palka wrote:

This adds the feature-test macro for P0849R8, as per
https://github.com/cplusplus/CWG/issues/281.

Tested on x86_64-pc-linux-gnu, does this look OK for trunk/13?


OK.


gcc/c-family/ChangeLog:

* c-cppbuiltin.cc (c_cpp_builtins): Predefine __cpp_auto_cast
for C++23.

gcc/testsuite/ChangeLog:

* g++.dg/cpp23/feat-cxx2b.C: Test __cpp_auto_cast.
---
  gcc/c-family/c-cppbuiltin.cc| 1 +
  gcc/testsuite/g++.dg/cpp23/feat-cxx2b.C | 6 ++
  2 files changed, 7 insertions(+)

diff --git a/gcc/c-family/c-cppbuiltin.cc b/gcc/c-family/c-cppbuiltin.cc
index 98f5aef2af9..5d64625fcd7 100644
--- a/gcc/c-family/c-cppbuiltin.cc
+++ b/gcc/c-family/c-cppbuiltin.cc
@@ -1074,6 +1074,7 @@ c_cpp_builtins (cpp_reader *pfile)
  /* Set feature test macros for C++23.  */
  cpp_define (pfile, "__cpp_size_t_suffix=202011L");
  cpp_define (pfile, "__cpp_if_consteval=202106L");
+ cpp_define (pfile, "__cpp_auto_cast=202110L");
  cpp_define (pfile, "__cpp_constexpr=202211L");
  cpp_define (pfile, "__cpp_multidimensional_subscript=202211L");
  cpp_define (pfile, "__cpp_named_character_escapes=202207L");
diff --git a/gcc/testsuite/g++.dg/cpp23/feat-cxx2b.C 
b/gcc/testsuite/g++.dg/cpp23/feat-cxx2b.C
index 6f4f6bcaad0..9e29b01adc1 100644
--- a/gcc/testsuite/g++.dg/cpp23/feat-cxx2b.C
+++ b/gcc/testsuite/g++.dg/cpp23/feat-cxx2b.C
@@ -578,6 +578,12 @@
  #  error "__cpp_implicit_move != 202207"
  #endif
  
+#ifndef __cpp_auto_cast
+#  error "__cpp_auto_cast"
+#elif __cpp_auto_cast != 202110
+#  error "__cpp_auto_cast != 202110"
+#endif
+
  //  C++23 attributes:
  
  #ifdef __has_cpp_attribute

[PATCH] c++: add feature-test macro for auto(x)

2023-05-15 Thread Patrick Palka via Gcc-patches

This adds the feature-test macro for P0849R8, as per
https://github.com/cplusplus/CWG/issues/281.

Tested on x86_64-pc-linux-gnu, does this look OK for trunk/13?
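
For context, a sketch of how user code might consume the macro.  It assumes nothing beyond the macro name and value added by this patch; decay_copy_fallback, DECAY_COPY and erase_all_equal_to_front are hypothetical names for this illustration.  When auto(x) is available it is used to take a prvalue copy, otherwise a by-value helper does the same job.

```cpp
#include <algorithm>
#include <cassert>
#include <vector>

// Fallback for compilers without auto(x): passing by value makes the copy.
template <typename T>
T decay_copy_fallback (T x) { return x; }

#if defined (__cpp_auto_cast) && __cpp_auto_cast >= 202110L
#  define DECAY_COPY(x) auto (x)
#else
#  define DECAY_COPY(x) decay_copy_fallback (x)
#endif

// Remove every element equal to the first one.  std::remove takes its
// value by reference, so passing v.front () directly would alias an
// element that is overwritten during the pass; a copy avoids that.
inline void
erase_all_equal_to_front (std::vector<int> &v)
{
  if (v.empty ())
    return;
  v.erase (std::remove (v.begin (), v.end (), DECAY_COPY (v.front ())),
           v.end ());
}
```

Either branch of the #if gives the same behaviour; the macro only lets the code pick the language feature when the compiler advertises it.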

gcc/c-family/ChangeLog:

* c-cppbuiltin.cc (c_cpp_builtins): Predefine __cpp_auto_cast
for C++23.

gcc/testsuite/ChangeLog:

* g++.dg/cpp23/feat-cxx2b.C: Test __cpp_auto_cast.
---
 gcc/c-family/c-cppbuiltin.cc| 1 +
 gcc/testsuite/g++.dg/cpp23/feat-cxx2b.C | 6 ++
 2 files changed, 7 insertions(+)

diff --git a/gcc/c-family/c-cppbuiltin.cc b/gcc/c-family/c-cppbuiltin.cc
index 98f5aef2af9..5d64625fcd7 100644
--- a/gcc/c-family/c-cppbuiltin.cc
+++ b/gcc/c-family/c-cppbuiltin.cc
@@ -1074,6 +1074,7 @@ c_cpp_builtins (cpp_reader *pfile)
  /* Set feature test macros for C++23.  */
  cpp_define (pfile, "__cpp_size_t_suffix=202011L");
  cpp_define (pfile, "__cpp_if_consteval=202106L");
+ cpp_define (pfile, "__cpp_auto_cast=202110L");
  cpp_define (pfile, "__cpp_constexpr=202211L");
  cpp_define (pfile, "__cpp_multidimensional_subscript=202211L");
  cpp_define (pfile, "__cpp_named_character_escapes=202207L");
diff --git a/gcc/testsuite/g++.dg/cpp23/feat-cxx2b.C 
b/gcc/testsuite/g++.dg/cpp23/feat-cxx2b.C
index 6f4f6bcaad0..9e29b01adc1 100644
--- a/gcc/testsuite/g++.dg/cpp23/feat-cxx2b.C
+++ b/gcc/testsuite/g++.dg/cpp23/feat-cxx2b.C
@@ -578,6 +578,12 @@
 #  error "__cpp_implicit_move != 202207"
 #endif
 
+#ifndef __cpp_auto_cast
+#  error "__cpp_auto_cast"
+#elif __cpp_auto_cast != 202110
+#  error "__cpp_auto_cast != 202110"
+#endif
+
 //  C++23 attributes:
 
 #ifdef __has_cpp_attribute
-- 
2.40.1.552.g91428f078b

Re: [PATCH] Add auto-resizing capability to irange's [PR109695]

2023-05-15 Thread Aldy Hernandez via Gcc-patches





On 5/15/23 12:42, Jakub Jelinek wrote:

On Mon, May 15, 2023 at 12:35:23PM +0200, Aldy Hernandez wrote:

gcc/ChangeLog:

PR tree-optimization/109695
* value-range.cc (irange::operator=): Resize range.
(irange::union_): Same.
(irange::intersect): Same.
(irange::invert): Same.
(int_range_max): Default to 3 sub-ranges and resize as needed.
* value-range.h (irange::maybe_resize): New.
(~int_range): New.
(int_range::int_range): Adjust for resizing.
(int_range::operator=): Same.


LGTM.

One question is if we shouldn't do it for GCC13/GCC12 as well, perhaps
changing it to some larger number than 3 when the members aren't wide_ints
in there but just trees.  Sure, in 13/12 the problem is 10x less severe
than in current trunk, but still we have some cases where we run out of
stack because of it on some hosts.


Sure, but that would require messing around with the gt_* GTY functions, 
and making sure we're allocating the trees from a sensible place, etc 
etc.  I'm less confident in my ability to mess with GTY stuff this late 
in the game.


Thoughts?
Aldy

Re: [PATCH] Add auto-resizing capability to irange's [PR109695]

2023-05-15 Thread Aldy Hernandez via Gcc-patches




On 5/15/23 13:08, Richard Biener wrote:

On Mon, May 15, 2023 at 12:35 PM Aldy Hernandez  wrote:



We can now have int_range<N, true> for automatically
resizable ranges.  int_range_max is now int_range<3, true>
for a 69X reduction in size from current trunk, and 6.9X reduction from
GCC12.  This incurs a 5% performance penalty for VRP that is more than
covered by our > 13% improvements recently.


int_range_max is the temporary range object we use in the ranger for
integers.  With the conversion to wide_int, this structure bloated up
significantly because wide_ints are huge (80 bytes a piece) and are
about 10 times as big as a plain tree.  Since the temporary object
requires 255 sub-ranges, that's 255 * 80 * 2, plus the control word.
This means the structure grew from 4112 bytes to 40912 bytes.

This patch adds the ability to resize ranges as needed, defaulting to
no resizing, while int_range_max now defaults to 3 sub-ranges (instead
of 255) and grows to 255 when the range being calculated does not fit.

For example:

int_range<1> foo;   // 1 sub-range with no resizing.
int_range<5> foo;   // 5 sub-ranges with no resizing.
int_range<5, true> foo; // 5 sub-ranges with resizing.

I ran some tests and found that 3 sub-ranges cover 99% of cases, so
I've set the int_range_max default to that:

 typedef int_range<3, /*RESIZABLE=*/true> int_range_max;

We don't bother growing incrementally, since the default covers most
cases and we have a 255 hard-limit.  This hard limit could be reduced
to 128, since my tests never saw a range needing more than 124, but we
could do that as a follow-up if needed.
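
The growth policy described above can be sketched with a toy container.  This is not the irange implementation: toy_range, push and maybe_resize here are hypothetical, and running out of room simply fails in this sketch, which is an assumption made to keep it short.  It starts with N inline slots and, if resizable, jumps once to the 255 hard limit instead of growing incrementally.

```cpp
#include <cassert>
#include <cstddef>
#include <memory>
#include <utility>

template <std::size_t N, bool RESIZABLE = false>
class toy_range
{
public:
  using pair_t = std::pair<long, long>;
  static constexpr std::size_t HARD_MAX = 255;

  std::size_t capacity () const { return m_heap ? HARD_MAX : N; }
  std::size_t size () const { return m_size; }

  // Append the sub-range [lo, hi].  Returns false if there is no room
  // left and the object cannot (or can no longer) grow.
  bool push (long lo, long hi)
  {
    if (m_size == capacity () && !maybe_resize ())
      return false;
    data ()[m_size++] = pair_t (lo, hi);
    return true;
  }

private:
  pair_t *data () { return m_heap ? m_heap.get () : m_inline; }

  // Grow at most once, straight to HARD_MAX, copying the inline slots.
  bool maybe_resize ()
  {
    if (!RESIZABLE || m_heap)
      return false;
    m_heap = std::make_unique<pair_t[]> (HARD_MAX);
    for (std::size_t i = 0; i < m_size; ++i)
      m_heap[i] = m_inline[i];
    return true;
  }

  pair_t m_inline[N] = {};
  std::unique_ptr<pair_t[]> m_heap;
  std::size_t m_size = 0;
};
```

A toy_range<3, true> behaves like the new int_range_max default in this sketch: the first three sub-ranges live inline, and the fourth triggers the single jump to 255 slots.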

With 3-subranges, int_range_max is now 592 bytes versus 40912 for
trunk, and versus 4112 bytes for GCC12!  The penalty is 5.04% for VRP
and 3.02% for threading, with no noticeable change in overall
compilation (0.27%).  This is more than covered by our 13.26%
improvements for the legacy removal + wide_int conversion.
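
The size figures can be cross-checked with a few lines of arithmetic.  The sketch below uses only the numbers quoted in this message; the 112-byte remainder is inferred from those totals (bounds plus whatever else the object carries, such as the control word) and is not taken from the source.

```cpp
#include <cassert>
#include <cstddef>

// Numbers quoted in the message.
constexpr std::size_t WIDE_INT_BYTES = 80;
constexpr std::size_t OLD_SUBRANGES = 255;
constexpr std::size_t NEW_SUBRANGES = 3;
constexpr std::size_t TRUNK_BYTES = 40912;
constexpr std::size_t GCC12_BYTES = 4112;
constexpr std::size_t NEW_BYTES = 592;

// Bytes taken by the bounds alone: two wide_ints per sub-range.
constexpr std::size_t
bounds_bytes (std::size_t subranges)
{
  return subranges * WIDE_INT_BYTES * 2;
}

constexpr double
size_ratio (std::size_t before, std::size_t after)
{
  return static_cast<double> (before) / static_cast<double> (after);
}

// 255 * 80 * 2 = 40800, leaving 112 bytes for everything else.
static_assert (TRUNK_BYTES - bounds_bytes (OLD_SUBRANGES) == 112,
               "trunk overhead");
// 3 * 80 * 2 = 480, and 592 - 480 is the same 112 bytes.
static_assert (NEW_BYTES - bounds_bytes (NEW_SUBRANGES) == 112,
               "new overhead");
```

size_ratio (TRUNK_BYTES, NEW_BYTES) is roughly 69.1 and size_ratio (GCC12_BYTES, NEW_BYTES) is roughly 6.9, which matches the 69X and 6.9X reductions stated at the top.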


Thanks for doing this.


I think this approach is a good alternative, while providing us with
flexibility going forward.  For example, we could try defaulting to a
8 sub-ranges for a noticeable improvement in VRP.  We could also use
large sub-ranges for switch analysis to avoid resizing.

Another approach I tried was always resizing.  With this, we could
drop the whole int_range nonsense, and have irange just hold a
resizable range.  This simplified things, but incurred a 7% penalty on
ipa_cp.  This was hard to pinpoint, and I'm not entirely convinced
this wasn't some artifact of valgrind.  However, until we're sure,
let's avoid massive changes, especially since IPA changes are coming
up.

For the curious, a particular hot spot for IPA in this area was:

ipcp_vr_lattice::meet_with_1 (const value_range *other_vr)
{
...
...
   value_range save (m_vr);
   m_vr.union_ (*other_vr);
   return m_vr != save;
}

The problem isn't the resizing (since we do that at most once) but the
fact that for some functions with lots of callers we end up a huge
range that gets copied and compared for every meet operation.  Maybe
the IPA algorithm could be adjusted somehow??.


Well, the above just wants to know whether the union_ operation changed
the range.  I suppose that would be an interesting (and easy to compute?)
secondary output of union_ and it seems it already computes that (but
maybe not correctly?).  So I suggest to change the above to


union_ returns a value specifically for that, which Andrew uses for 
cache optimization.  For that matter, your suggestion was my first 
approach, but I quickly found out we were being overly pessimistic in 
some cases, and I was too lazy to figure out why.




   bool res;
   if (flag_checking)
{
   value_range save (m_vr);
   res = m_vr.union_ (*other_vr);
   gcc_assert (res == (m_vr != save));
}
  else
 res = m_vr.union_ (*other_vr);
  return res;


With your suggested sanity check I chased the problem to a minor 
inconsistency when unioning nonzero masks.  The issue wasn't a bug, just 
a pessimization.  I'm attaching a patch that corrects the oversight 
(well, not oversight, everything was more expensive with trees)... It 
yields a 6.89% improvement to the ipa-cp pass!!!  Thanks.
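
To make the two approaches concrete, here is a toy sketch.  It is not the irange code: toy_range is a hypothetical single interval plus a mask of possibly-nonzero bits.  The point is that union_ can report "changed" only when a bound or the mask really changed, which lets the caller skip the copy and compare.

```cpp
#include <algorithm>
#include <cassert>
#include <cstdint>

struct toy_range
{
  long lo, hi;
  std::uint64_t nonzero;   // Bits that may be set.

  bool operator== (const toy_range &o) const
  {
    return lo == o.lo && hi == o.hi && nonzero == o.nonzero;
  }

  // Union OTHER into *this.  Return true only if *this actually changed.
  bool union_ (const toy_range &other)
  {
    long new_lo = std::min (lo, other.lo);
    long new_hi = std::max (hi, other.hi);
    std::uint64_t new_nonzero = nonzero | other.nonzero;
    bool changed = new_lo != lo || new_hi != hi || new_nonzero != nonzero;
    lo = new_lo;
    hi = new_hi;
    nonzero = new_nonzero;
    return changed;
  }
};

// Copy-and-compare version: one extra copy and one comparison per meet.
inline bool
meet_with_copy (toy_range &vr, const toy_range &other)
{
  toy_range save = vr;
  vr.union_ (other);
  return !(vr == save);
}

// Flag version: rely on the return value of union_.
inline bool
meet_with_flag (toy_range &vr, const toy_range &other)
{
  return vr.union_ (other);
}
```

The two versions agree only if union_ never reports a change when the mask ends up the same, which is the kind of pessimization the sanity check exposes.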


I'll push it if it passes tests.

BTW, without the annoying IPA-cp performance regression, this paves the 
way for nuking int_range in favor of just irange, and have everything 
resize as needed.  I'll wait for Andrew to chime in when he returns from 
PTO, since we may want to leave int_range around since it does 
provide flexibility (at the expense of fugly looking declarations).


Aldy

From 6a7354d3494665d46f8cbfc71c58f784c02142ff Mon Sep 17 00:00:00 2001
From: Aldy Hernandez 
Date: Mon, 15 May 2023 15:10:11 +0200
Subject: [PATCH] Only return changed=true in union_nonzero when appropriate.

irange::union_ was being overly pessimistic in its return value.  It
was returning false when the nonzero mask was possibly the same.

The reason for this is because the nonzero

Re: Re: [PATCH] RISC-V: Add rounding mode operand for floating point instructions

2023-05-15 Thread 钟居哲

Thanks.
https://gcc.gnu.org/pipermail/gcc-patches/2023-May/618614.html 
here is the V2 patch.
It describes which instructions get FRM and which do not.
Would you mind checking it again?



juzhe.zh...@rivai.ai
 
From: Kito Cheng
Date: 2023-05-15 22:41
To: 钟居哲
CC: Jeff Law; gcc-patches; kito.cheng; palmer; palmer; rdapp.gcc
Subject: Re: Re: [PATCH] RISC-V: Add rounding mode operand for floating point 
instructions
All sign-injection operations (vfsgnj/vfsgnjn/vfsgnjx and their
friends) involve no rounding, so vfneg.v and
vfabs.v don't need FRM.
 
On Mon, May 15, 2023 at 10:38 PM 钟居哲  wrote:
>
> And what about vfabs?  I guess it also needs FRM?
> vfneg/vfabs/vfsgnj/vfsgnjn/vfsgnjx
> vfneg.v vd,vs = vfsgnjn.vv vd,vs,vs
> vfabs.v vd,vs = vfsgnjx.vv vd,vs,vs
>
> Those are all the questions I have; please double-check for me.
> Thanks.
>
>
> juzhe.zh...@rivai.ai
>
> From: Kito Cheng
> Date: 2023-05-15 22:22
> To: 钟居哲
> CC: Jeff Law; gcc-patches; kito.cheng; palmer; palmer; rdapp.gcc
> Subject: Re: Re: [PATCH] RISC-V: Add rounding mode operand for floating point 
> instructions
> > Oh, do you mean vfsqrt7/vfrec7 don't have frm, but vfsqrt/vfneg should
> > have frm?
> > Is that right? If yes, I am going to send a patch to fix it immediately.
>
> Yes, and I also double checked spike implementation :P
>
> and it seems like you're not committed yet, so let's send V2 :)
>
> On Mon, May 15, 2023 at 10:12 PM 钟居哲  wrote:
> >
> > > Oh, do you mean vfsqrt7/vfrec7 don't have frm, but vfsqrt/vfneg should
> > > have frm?
> > > Is that right? If yes, I am going to send a patch to fix it immediately.
> >
> >
> >
> > juzhe.zh...@rivai.ai
> >
> > From: Kito Cheng
> > Date: 2023-05-15 22:07
> > To: 钟居哲
> > CC: Jeff Law; gcc-patches; kito.cheng; palmer; palmer; rdapp.gcc
> > Subject: Re: Re: [PATCH] RISC-V: Add rounding mode operand for floating 
> > point instructions
> > > Oh, Craig says vfrsqrt7.v does not have frm but vfsqrt.v does, and I
> > > checked that spike matches that.
> >
> > On Mon, May 15, 2023 at 9:55 PM 钟居哲  wrote:
> > >
> > > > I don't know why we should not add frm to vfsqrt.v; I saw topper (LLVM
> > > > maintainer) say we should
> > > > not add frm to vsqrt.v. Maybe Kito knows the reason?
> > >
> > > https://github.com/riscv-non-isa/rvv-intrinsic-doc/pull/226
> > >
> > >
> > >
> > >
> > > juzhe.zh...@rivai.ai
> > >
> > > From: Jeff Law
> > > Date: 2023-05-15 21:52
> > > To: juzhe.zhong; gcc-patches
> > > CC: kito.cheng; kito.cheng; palmer; palmer; rdapp.gcc
> > > Subject: Re: [PATCH] RISC-V: Add rounding mode operand for floating point 
> > > instructions
> > >
> > >
> > > On 5/15/23 05:49, juzhe.zh...@rivai.ai wrote:
> > > > From: Juzhe-Zhong 
> > > >
> > > > This patch is adding rounding mode operand and FRM_REGNUM dependency
> > > > into floating-point instructions.
> > > >
> > > > The floating-point instructions we added FRM and rounding mode operand:
> > > > 1. vfadd/vfsub
> > > > 2. vfwadd/vfwsub
> > > > 3. vfmul
> > > > 4. vfdiv
> > > > 5. vfwmul
> > > > 6. vfwmacc/vfwnmacc/vfwmsac/vfwnmsac
> > > > 7. vfsqrt7/vfrec7
> > > > 8. floating-point conversions.
> > > > 9. floating-point reductions.
> > > >
> > > > The floating-point instructions we did NOT add FRM and rounding mode 
> > > > operand:
> > > > 1. vfsqrt/vfneg
> > > Assuming vfsqrt is actually an estimator the best place to handle
> > > rounding modes is at the last step(s) after N-R or Goldschmidt
> > > refinement steps.  I haven't paid too much attention to FP yet, but this
> > > is an area I've got fairly extensive experience.
> > >
> > > Sadly RISC-V's estimator is fairly poor and the single instance FMACs
> > > are going to result in an implementation that may not actually be any
> > > better than what glibc can do.
> > >
> > > Jeff
> > >
> >
>

[PATCH V2] RISC-V: Add FRM and rounding mode operand into floating point intrinsics

2023-05-15 Thread juzhe . zhong

From: Juzhe-Zhong 

This patch adds a rounding mode operand and an FRM_REGNUM dependency
to floating-point instructions.

The floating-point instructions that get FRM and a rounding mode operand:
1. vfadd/vfsub
2. vfwadd/vfwsub
3. vfmul
4. vfdiv
5. vfwmul
6. vfwmacc/vfwnmacc/vfwmsac/vfwnmsac
7. vfsqrt
8. floating-point conversions.
9. floating-point reductions.
10. floating-point ternary.

The floating-point instructions that do NOT get FRM or a rounding mode operand:
1. vfabs/vfneg/vfsqrt7/vfrec7
2. vfmin/vfmax
3. comparisons
4. vfclass
5. vfsgnj/vfsgnjn/vfsgnjx
6. vfmerge
7. vfmv.v.f

gcc/ChangeLog:

* config/riscv/riscv-protos.h (enum frm_field_enum): New enum.
* config/riscv/riscv-vector-builtins.cc 
(function_expander::use_ternop_insn): Add default rounding mode.
(function_expander::use_widen_ternop_insn): Ditto.
* config/riscv/riscv.cc (riscv_hard_regno_nregs): Add FRM REGNUM.
(riscv_hard_regno_mode_ok): Ditto.
(riscv_conditional_register_usage): Ditto.
* config/riscv/riscv.h (DWARF_FRAME_REGNUM): Ditto.
(FRM_REG_P): Ditto.
(RISCV_DWARF_FRM): Ditto.
* config/riscv/riscv.md: Ditto.
* config/riscv/vector-iterators.md: Split no frm and has frm operations.
* config/riscv/vector.md (@pred__scalar): New pattern.
(@pred_): Ditto.

---
 gcc/config/riscv/riscv-protos.h   |  10 +
 gcc/config/riscv/riscv-vector-builtins.cc |  14 ++
 gcc/config/riscv/riscv.cc |   7 +-
 gcc/config/riscv/riscv.h  |   7 +-
 gcc/config/riscv/riscv.md |   1 +
 gcc/config/riscv/vector-iterators.md  |   9 +-
 gcc/config/riscv/vector.md| 258 ++
 7 files changed, 251 insertions(+), 55 deletions(-)

diff --git a/gcc/config/riscv/riscv-protos.h b/gcc/config/riscv/riscv-protos.h
index 835bb802fc6..12634d0ac1a 100644
--- a/gcc/config/riscv/riscv-protos.h
+++ b/gcc/config/riscv/riscv-protos.h
@@ -231,6 +231,16 @@ enum vxrm_field_enum
   VXRM_RDN,
   VXRM_ROD
 };
+/* Rounding mode bitfield for floating point FRM.  */
+enum frm_field_enum
+{
+  FRM_RNE = 0b000,
+  FRM_RTZ = 0b001,
+  FRM_RDN = 0b010,
+  FRM_RUP = 0b011,
+  FRM_RMM = 0b100,
+  DYN = 0b111
+};
 }
 
 /* We classify builtin types into two classes:
diff --git a/gcc/config/riscv/riscv-vector-builtins.cc 
b/gcc/config/riscv/riscv-vector-builtins.cc
index 1de075fb90d..b7458aaace6 100644
--- a/gcc/config/riscv/riscv-vector-builtins.cc
+++ b/gcc/config/riscv/riscv-vector-builtins.cc
@@ -3460,6 +3460,13 @@ function_expander::use_ternop_insn (bool vd_accum_p, 
insn_code icode)
   add_input_operand (Pmode, get_tail_policy_for_pred (pred));
   add_input_operand (Pmode, get_mask_policy_for_pred (pred));
   add_input_operand (Pmode, get_avl_type_rtx (avl_type::NONVLMAX));
+
+  /* TODO: Currently, we don't support intrinsic that is modeling rounding 
mode.
+ We add default rounding mode for the intrinsics that didn't model rounding
+ mode yet.  */
+  if (opno != insn_data[icode].n_generator_args)
+add_input_operand (Pmode, const0_rtx);
+
   return generate_insn (icode);
 }
 
@@ -3482,6 +3489,13 @@ function_expander::use_widen_ternop_insn (insn_code 
icode)
   add_input_operand (Pmode, get_tail_policy_for_pred (pred));
   add_input_operand (Pmode, get_mask_policy_for_pred (pred));
   add_input_operand (Pmode, get_avl_type_rtx (avl_type::NONVLMAX));
+
+  /* TODO: Currently, we don't support intrinsic that is modeling rounding 
mode.
+ We add default rounding mode for the intrinsics that didn't model rounding
+ mode yet.  */
+  if (opno != insn_data[icode].n_generator_args)
+add_input_operand (Pmode, const0_rtx);
+
   return generate_insn (icode);
 }
 
diff --git a/gcc/config/riscv/riscv.cc b/gcc/config/riscv/riscv.cc
index b52e613c629..de5b87b1a87 100644
--- a/gcc/config/riscv/riscv.cc
+++ b/gcc/config/riscv/riscv.cc
@@ -6082,7 +6082,8 @@ riscv_hard_regno_nregs (unsigned int regno, machine_mode 
mode)
 
   /* mode for VL or VTYPE are just a marker, not holding value,
  so it always consume one register.  */
-  if (VTYPE_REG_P (regno) || VL_REG_P (regno) || VXRM_REG_P (regno))
+  if (VTYPE_REG_P (regno) || VL_REG_P (regno) || VXRM_REG_P (regno)
+  || FRM_REG_P (regno))
 return 1;
 
   /* Assume every valid non-vector mode fits in one vector register.  */
@@ -6150,7 +6151,8 @@ riscv_hard_regno_mode_ok (unsigned int regno, 
machine_mode mode)
   if (lmul != 1)
return ((regno % lmul) == 0);
 }
-  else if (VTYPE_REG_P (regno) || VL_REG_P (regno) || VXRM_REG_P (regno))
+  else if (VTYPE_REG_P (regno) || VL_REG_P (regno) || VXRM_REG_P (regno)
+  || FRM_REG_P (regno))
 return true;
   else
 return false;
@@ -6587,6 +6589,7 @@ riscv_conditional_register_usage (void)
   fixed_regs[VTYPE_REGNUM] = call_used_regs[VTYPE_REGNUM] = 1;
   fixed_regs[VL_REGNUM] = call_used_regs[VL_REGNUM] = 1;
   fixed_regs[VXRM_REGNUM] = call_used_regs[VXR

Re: Re: [PATCH] RISC-V: Add rounding mode operand for floating point instructions

2023-05-15 Thread Kito Cheng via Gcc-patches

All sign-injection operations (vfsgnj/vfsgnjn/vfsgnjx and their
friends) involve no rounding, so vfneg.v and vfabs.v don't need FRM.

On Mon, May 15, 2023 at 10:38 PM 钟居哲  wrote:
>
> And what about vfabs ? I guess it also need FRM ?
> vfneg/vfabs/vfsgnj/vfsgnj/vfsgnjx
> vfneg.v vd,vs = vfsgnjn.vv vd,vs,vs
> vfabs.v vd,vs = vfsgnjx.vv vd,vs,vs
>
> That's all questions I have, plz double check for me.
> Thanks.
>
>
> juzhe.zh...@rivai.ai
>
> From: Kito Cheng
> Date: 2023-05-15 22:22
> To: 钟居哲
> CC: Jeff Law; gcc-patches; kito.cheng; palmer; palmer; rdapp.gcc
> Subject: Re: Re: [PATCH] RISC-V: Add rounding mode operand for floating point 
> instructions
> > Oh, do you mean vfsqrt7/vfrec7 doesn't have frm, but vfsqrt/vfneg should 
> > have frm.
> > Is that right? If yes, I am gonna send a patch to fix it immediately.
>
> Yes, and I also double checked spike implementation :P
>
> and it seems like you're not committed yet, so let's send V2 :)
>
> On Mon, May 15, 2023 at 10:12 PM 钟居哲  wrote:
> >
> > Oh, do you mean vfsqrt7/vfrec7 doesn't have frm, but vfsqrt/vfneg should 
> > have frm.
> > > Is that right? If yes, I am gonna send a patch to fix it immediately.
> >
> >
> >
> > juzhe.zh...@rivai.ai
> >
> > From: Kito Cheng
> > Date: 2023-05-15 22:07
> > To: 钟居哲
> > CC: Jeff Law; gcc-patches; kito.cheng; palmer; palmer; rdapp.gcc
> > Subject: Re: Re: [PATCH] RISC-V: Add rounding mode operand for floating 
> > point instructions
> > Oh, Craig says vfrsqrt7.v not have frm but vsqrt.v have frm, and
> > checked spike that match that.
> >
> > On Mon, May 15, 2023 at 9:55 PM 钟居哲  wrote:
> > >
> > > I don't know why we should not add frm vfsqrt.v since I saw topper (LLVM 
> > > maintainer) said we should
> > > not add frm into vsqrt.v. Maybe kito knows the reason ?
> > >
> > > https://github.com/riscv-non-isa/rvv-intrinsic-doc/pull/226
> > >
> > >
> > >
> > >
> > > juzhe.zh...@rivai.ai
> > >
> > > From: Jeff Law
> > > Date: 2023-05-15 21:52
> > > To: juzhe.zhong; gcc-patches
> > > CC: kito.cheng; kito.cheng; palmer; palmer; rdapp.gcc
> > > Subject: Re: [PATCH] RISC-V: Add rounding mode operand for floating point 
> > > instructions
> > >
> > >
> > > On 5/15/23 05:49, juzhe.zh...@rivai.ai wrote:
> > > > From: Juzhe-Zhong 
> > > >
> > > > This patch is adding rounding mode operand and FRM_REGNUM dependency
> > > > into floating-point instructions.
> > > >
> > > > The floating-point instructions we added FRM and rounding mode operand:
> > > > 1. vfadd/vfsub
> > > > 2. vfwadd/vfwsub
> > > > 3. vfmul
> > > > 4. vfdiv
> > > > 5. vfwmul
> > > > 6. vfwmacc/vfwnmacc/vfwmsac/vfwnmsac
> > > > 7. vfsqrt7/vfrec7
> > > > 8. floating-point conversions.
> > > > 9. floating-point reductions.
> > > >
> > > > The floating-point instructions we did NOT add FRM and rounding mode 
> > > > operand:
> > > > 1. vfsqrt/vfneg
> > > Assuming vfsqrt is actually an estimator the best place to handle
> > > rounding modes is at the last step(s) after N-R or Goldschmidt
> > > refinement steps.  I haven't paid too much attention to FP yet, but this
> > > is an area I've got fairly extensive experience.
> > >
> > > Sadly RISC-V's estimator is fairly poor and the single instance FMACs
> > > are going to result in an implementation that may not actually be any
> > > better than what glibc can do.
> > >
> > > Jeff
> > >
> >
>

Re: [pushed] c++: fix TTP level reduction cache

2023-05-15 Thread Patrick Palka via Gcc-patches

On Wed, 3 May 2023, Jason Merrill via Gcc-patches wrote:

> Tested x86_64-pc-linux-gnu, applying to trunk.
> 
> -- 8< --
> 
> We try to cache the result of reduce_template_parm_level so that when we
> reduce the same parm multiple times we get the same result, but this wasn't
> working for template template parms because in that case TYPE is a
> TEMPLATE_TEMPLATE_PARM, and so same_type_p was false because of the same
> level mismatch that we're trying to adjust for.  So in that case compare the
> template parms of the template template parms instead.
> 
> The result can be seen in nontype12.C, where we previously gave three
> duplicate errors on line 7 and now give only one because subsequent
> substitutions use the cache.
> 
> gcc/cp/ChangeLog:
> 
>   * pt.cc (reduce_template_parm_level): Fix comparison of
>   template template parm to cached version.
> 
> gcc/testsuite/ChangeLog:
> 
>   * g++.dg/template/nontype12.C: Check for duplicate error.
> ---
>  gcc/cp/pt.cc  | 7 ++-
>  gcc/testsuite/g++.dg/template/nontype12.C | 3 ++-
>  2 files changed, 8 insertions(+), 2 deletions(-)
> 
> diff --git a/gcc/cp/pt.cc b/gcc/cp/pt.cc
> index 471fc20bc5b..5446b5058b7 100644
> --- a/gcc/cp/pt.cc
> +++ b/gcc/cp/pt.cc
> @@ -4550,7 +4550,12 @@ reduce_template_parm_level (tree index, tree type, int 
> levels, tree args,
>if (TEMPLATE_PARM_DESCENDANTS (index) == NULL_TREE
>|| (TEMPLATE_PARM_LEVEL (TEMPLATE_PARM_DESCENDANTS (index))
> != TEMPLATE_PARM_LEVEL (index) - levels)
> -  || !same_type_p (type, TREE_TYPE (TEMPLATE_PARM_DESCENDANTS (index
> +  || !(TREE_CODE (type) == TEMPLATE_TEMPLATE_PARM
> +? (comp_template_parms
> +   (DECL_TEMPLATE_PARMS (TYPE_NAME (type)),
> +DECL_TEMPLATE_PARMS (TEMPLATE_PARM_DECL
> + (TEMPLATE_PARM_DESCENDANTS (index)

Isn't this comparing the unsubstituted/unlowered template parameters of
the ttp vs the substituted/lowered template parameters?  So this test
should always return false because the depth of the two sets of tparms
will always be different.

I'm surprised then that this has an effect for g++.dg/template/nontype12.C.
Ah, it seems that's because comp_template_parms returns true if either set
of template parameters contains error_mark_node, for the sake of error
recovery.  In this case, the two sets of template parameters for the
ttp in

  template<template<T> class> int bar();

are

  3 D.2778, 2 , 1 T  [original template parameters]
  2 <<< error >>>, 1 [substituted/lowered via T=double]

where the error_mark_node is due to double not being a valid template
parameter before C++20.

Perhaps we should be comparing only the innermost parameters instead?
That way the test will work in non-erroneous cases, at least for ttps with
non-dependent template parameters.
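
As a rough illustration of the depth mismatch, here is a toy model.  It does
not use GCC's tree representation: the types Level and ParmLevels and both
functions are hypothetical, and each parameter is reduced to a string naming
its kind.  It only shows why a whole-list comparison cannot succeed once one
level has been dropped, while an innermost-only comparison can:

```cpp
#include <string>
#include <vector>

// Toy stand-ins (assumptions, not GCC data structures):
// one template parameter level is a list of parameter kinds, and a
// parameter list is a stack of levels, outermost first.
using Level = std::vector<std::string>;
using ParmLevels = std::vector<Level>;

// Compares every level; fails whenever the depths differ, which is
// always the case when one side has been lowered by a level.
bool same_all_levels(const ParmLevels& a, const ParmLevels& b) {
  return a == b;
}

// Compares only the innermost level, ignoring how deeply it is nested.
bool same_innermost(const ParmLevels& a, const ParmLevels& b) {
  return !a.empty() && !b.empty() && a.back() == b.back();
}
```

In this model, a ttp whose own parameters are non-dependent keeps the same
innermost level before and after lowering, so only same_innermost reports a
match.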

We might also want to consider caching the TEMPLATE_TEMPLATE_PARM node as well,
by adjusting the hunk in 'tsubst':

pt.cc:16234
if (TREE_CODE (t) == TEMPLATE_TYPE_PARM
&& (arg = TEMPLATE_TYPE_PARM_INDEX (t),
r = TEMPLATE_PARM_DESCENDANTS (arg))
&& (TEMPLATE_PARM_LEVEL (r)
== TEMPLATE_PARM_LEVEL (arg) - levels))
  /* Cache the simple case of lowering a type parameter.  */
  r = TREE_TYPE (r);

> +: same_type_p (type, TREE_TYPE (TEMPLATE_PARM_DESCENDANTS (index)
>  {
>tree orig_decl = TEMPLATE_PARM_DECL (index);
>  
> diff --git a/gcc/testsuite/g++.dg/template/nontype12.C 
> b/gcc/testsuite/g++.dg/template/nontype12.C
> index e37cf8f7646..6642ffd0a13 100644
> --- a/gcc/testsuite/g++.dg/template/nontype12.C
> +++ b/gcc/testsuite/g++.dg/template/nontype12.C
> @@ -4,7 +4,8 @@
>  template struct A
>  {
>template int foo();// { dg-error "double" "" { 
> target c++17_down } }
> -  template class> int bar();// { dg-error "double" "" { 
> target c++17_down } }
> +  template class> int bar();// { dg-bogus 
> {double.*C:7:[^\n]*double} }
> +  // { dg-error "double" "" { target c++17_down } .-1 }
>template struct X; // { dg-error "double" "" { 
> target c++17_down } }
>  };
>  
> 
> base-commit: d7cb9720ed54687bd1135c5e6ef90776a9db0bd5
> -- 
> 2.31.1
> 
>

Re: Re: [PATCH] RISC-V: Add rounding mode operand for floating point instructions

2023-05-15 Thread 钟居哲

And what about vfabs? I guess it also needs FRM?
vfneg/vfabs/vfsgnj/vfsgnjn/vfsgnjx
vfneg.v vd,vs = vfsgnjn.vv vd,vs,vs
vfabs.v vd,vs = vfsgnjx.vv vd,vs,vs

Those are all the questions I have; please double-check for me.
Thanks.


juzhe.zh...@rivai.ai
 
From: Kito Cheng
Date: 2023-05-15 22:22
To: 钟居哲
CC: Jeff Law; gcc-patches; kito.cheng; palmer; palmer; rdapp.gcc
Subject: Re: Re: [PATCH] RISC-V: Add rounding mode operand for floating point 
instructions
> Oh, do you mean vfsqrt7/vfrec7 doesn't have frm, but vfsqrt/vfneg should have 
> frm.
> Is that right? If yes, I am gonna send a patch to fix it immediately.
 
Yes, and I also double checked spike implementation :P
 
and it seems like you're not committed yet, so let's send V2 :)
 
On Mon, May 15, 2023 at 10:12 PM 钟居哲  wrote:
>
> Oh, do you mean vfsqrt7/vfrec7 doesn't have frm, but vfsqrt/vfneg should have 
> frm.
> Is that right? If yes, I am gonna send a patch to fix it immediately.
>
>
>
> juzhe.zh...@rivai.ai
>
> From: Kito Cheng
> Date: 2023-05-15 22:07
> To: 钟居哲
> CC: Jeff Law; gcc-patches; kito.cheng; palmer; palmer; rdapp.gcc
> Subject: Re: Re: [PATCH] RISC-V: Add rounding mode operand for floating point 
> instructions
> Oh, Craig says vfrsqrt7.v not have frm but vsqrt.v have frm, and
> checked spike that match that.
>
> On Mon, May 15, 2023 at 9:55 PM 钟居哲  wrote:
> >
> > I don't know why we should not add frm vfsqrt.v since I saw topper (LLVM 
> > maintainer) said we should
> > not add frm into vsqrt.v. Maybe kito knows the reason ?
> >
> > https://github.com/riscv-non-isa/rvv-intrinsic-doc/pull/226
> >
> >
> >
> >
> > juzhe.zh...@rivai.ai
> >
> > From: Jeff Law
> > Date: 2023-05-15 21:52
> > To: juzhe.zhong; gcc-patches
> > CC: kito.cheng; kito.cheng; palmer; palmer; rdapp.gcc
> > Subject: Re: [PATCH] RISC-V: Add rounding mode operand for floating point 
> > instructions
> >
> >
> > On 5/15/23 05:49, juzhe.zh...@rivai.ai wrote:
> > > From: Juzhe-Zhong 
> > >
> > > This patch is adding rounding mode operand and FRM_REGNUM dependency
> > > into floating-point instructions.
> > >
> > > The floating-point instructions we added FRM and rounding mode operand:
> > > 1. vfadd/vfsub
> > > 2. vfwadd/vfwsub
> > > 3. vfmul
> > > 4. vfdiv
> > > 5. vfwmul
> > > 6. vfwmacc/vfwnmacc/vfwmsac/vfwnmsac
> > > 7. vfsqrt7/vfrec7
> > > 8. floating-point conversions.
> > > 9. floating-point reductions.
> > >
> > > The floating-point instructions we did NOT add FRM and rounding mode 
> > > operand:
> > > 1. vfsqrt/vfneg
> > Assuming vfsqrt is actually an estimator the best place to handle
> > rounding modes is at the last step(s) after N-R or Goldschmidt
> > refinement steps.  I haven't paid too much attention to FP yet, but this
> > is an area I've got fairly extensive experience.
> >
> > Sadly RISC-V's estimator is fairly poor and the single instance FMACs
> > are going to result in an implementation that may not actually be any
> > better than what glibc can do.
> >
> > Jeff
> >
>

Re: [PATCH 2/2] ivopts: Revert register pressure cost when there are enough registers.

2023-05-15 Thread Jovan Dmitrovic

Hi Richard,
I had pinged the community about this problem back in March, and I will be 
taking Dimitrije's place, considering he isn't working on these patches anymore.

Your solution for 2/2 seems reasonable, I don't exactly know why 
target_reg_cost hasn't been accounted for in the first case, nor do I know why 
that particular case was specified at all.

I will get back to you when I have researched 1/2 a bit more thoroughly.

Regards,
Jovan


From: Richard Biener 
Sent: Monday, May 15, 2023 2:23 PM
To: Dimitrije Milošević 
Cc: gcc-patches@gcc.gnu.org ; Djordje Todorovic 
; jeffreya...@gmail.com 
Subject: Re: [PATCH 2/2] ivopts: Revert register pressure cost when there are 
enough registers.

On Mon, May 15, 2023 at 12:44 PM Richard Biener
 wrote:
>
> On Wed, Dec 21, 2022 at 2:12 PM Dimitrije Milošević
>  wrote:
> >
> > When there are enough registers, the register pressure cost is
> > unnecessarily bumped by adding another n_cands.
> >
> > This behavior may result in register pressure costs for the case
> > when there are enough registers being higher than for other cases.
> >
> > When there are enough registers, the register pressure cost should be
> > equal to n_invs + n_cands.
> >
> > This used to be the case before c18101f.
> >
> > gcc/ChangeLog:
> >
> > * tree-ssa-loop-ivopts.cc (ivopts_estimate_reg_pressure): Adjust.
> >
> > Signed-off-by: Dimitrije Milosevic 
> > ---
> >  gcc/tree-ssa-loop-ivopts.cc | 2 +-
> >  1 file changed, 1 insertion(+), 1 deletion(-)
> >
> > diff --git a/gcc/tree-ssa-loop-ivopts.cc b/gcc/tree-ssa-loop-ivopts.cc
> > index 60c61dc9e49..3176482d0d9 100644
> > --- a/gcc/tree-ssa-loop-ivopts.cc
> > +++ b/gcc/tree-ssa-loop-ivopts.cc
> > @@ -6092,7 +6092,7 @@ ivopts_estimate_reg_pressure (struct ivopts_data 
> > *data, unsigned n_invs,
> >
> >/* If we have enough registers.  */
> >if (regs_needed + target_res_regs < available_regs)
> > -cost = n_new;
> > +return n_new;
>
> This still doesn't make much sense (before nor after).  We're
> comparing apples and oranges.
>
> I think it would make most sense to merge this case with the following
> and thus do
> the following.  The distinction between the cases should be preserved
> and attenuated
> by the adding of n_cands at the end (as tie-breaker).
>
> Does this help the mips case?  I'm going to throw it at x86_64-linux
> bootstrap/regtest.
>
> Btw, I don't think using address complexity makes much sense for a port that
> has only one addressing mode so I guess a better approach for 1/2 would be
> to make sure it is consistently the same value (I suppose it is not, otherwise
> you wouldn't have changed it).  Oh, and we're adding the
> reg-pressure cost to the same bucket as well, and there we don't really know
> how many times we're going to spill.  That said, I think ->complexity should
> rather go away - we are asking for address-cost already and IVOPTs uses
> built RTX to query the target.
>
> But yes, I agree ivopts_estimate_reg_pressure has an issue.
>
> Sorry for the very long delay,
> Richard.

The patch below bootstraps and regtests ok on x86_64-unknown-linux-gnu,
but I guess that doesn't mean much.

Richard.

> diff --git a/gcc/tree-ssa-loop-ivopts.cc b/gcc/tree-ssa-loop-ivopts.cc
> index 6fbd2d59318..bc8493622de 100644
> --- a/gcc/tree-ssa-loop-ivopts.cc
> +++ b/gcc/tree-ssa-loop-ivopts.cc
> @@ -6077,8 +6077,9 @@ ivopts_estimate_reg_pressure (struct ivopts_data
> *data, unsigned n_invs,
>   unsigned n_cands)
>  {
>unsigned cost;
> -  unsigned n_old = data->regs_used, n_new = n_invs + n_cands;
> -  unsigned regs_needed = n_new + n_old, available_regs = target_avail_regs;
> +  unsigned n_old = data->regs_used;
> +  unsigned regs_needed = n_invs + n_cands + n_old;
> +  unsigned available_regs = target_avail_regs;
>bool speed = data->speed;
>
>/* If there is a call in the loop body, the call-clobbered registers
> @@ -6087,10 +6088,7 @@ ivopts_estimate_reg_pressure (struct
> ivopts_data *data, unsigned n_invs,
>  available_regs = available_regs - target_clobbered_regs;
>
>/* If we have enough registers.  */
> -  if (regs_needed + target_res_regs < available_regs)
> -cost = n_new;
> -  /* If close to running out of registers, try to preserve them.  */
> -  else if (regs_needed <= available_regs)
> +  if (regs_needed <= available_regs)
>  cost = target_reg_cost [speed] * regs_needed;
>/* If we run out of available registers but the number of candidates
>   does not, we penalize extra registers using target_spill_cost.  */
>
>
> >/* If close to running out of registers, try to preserve them.  */
> >else if (regs_needed <= available_regs)
> >  cost = target_reg_cost [speed] * regs_needed;
> > --
> > 2.25.1
> >
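
For readers following along, here is a simplified, self-contained model of
the two no-spill branches before and after the change above.  The Target
struct and the function names are hypothetical stand-ins for
target_avail_regs, target_res_regs and target_reg_cost[speed]; the spill
branches and the call-clobbered adjustment are deliberately left out, and the
final "+ n_cands" models the tie-breaker mentioned in the thread:

```cpp
#include <cassert>

// Hypothetical stand-ins for the target parameters.
struct Target {
  unsigned avail_regs;  // models target_avail_regs
  unsigned res_regs;    // models target_res_regs
  unsigned reg_cost;    // models target_reg_cost[speed]
};

// Shape of the code before the patch: the "enough registers" branch
// yields a bare register count, the next branch a weighted cost.
unsigned cost_before(const Target& t, unsigned n_old, unsigned n_invs,
                     unsigned n_cands) {
  unsigned n_new = n_invs + n_cands;
  unsigned regs_needed = n_new + n_old;
  assert(regs_needed <= t.avail_regs);  // spill branches are not modeled
  unsigned cost = (regs_needed + t.res_regs < t.avail_regs)
                      ? n_new
                      : t.reg_cost * regs_needed;
  return cost + n_cands;  // tie-breaker added at the end
}

// Shape after the patch: both cases use the same weighted formula.
unsigned cost_after(const Target& t, unsigned n_old, unsigned n_invs,
                    unsigned n_cands) {
  unsigned regs_needed = n_invs + n_cands + n_old;
  assert(regs_needed <= t.avail_regs);  // spill branches are not modeled
  return t.reg_cost * regs_needed + n_cands;
}
```

With avail_regs = 20, res_regs = 3 and reg_cost = 2, a candidate set needing
12 registers costs 3 in the old model and 25 in the new one, while a set
needing 18 registers costs 40 in both.  The old jump from 3 to 40 comes from
switching units between branches; the merged formula grows with regs_needed
throughout.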

Re: Re: [PATCH] RISC-V: Add rounding mode operand for floating point instructions

2023-05-15 Thread 钟居哲

The reason I ask about vfsgnjn is that, according to the RVV ISA:
vfneg.v vd,vs = vfsgnjn.vv vd,vs,vs.

It's confusing that the document has FRM for vfneg but no FRM for vfsgnjn.
That seems odd.


juzhe.zh...@rivai.ai
 
From: Kito Cheng
Date: 2023-05-15 22:22
To: 钟居哲
CC: Jeff Law; gcc-patches; kito.cheng; palmer; palmer; rdapp.gcc
Subject: Re: Re: [PATCH] RISC-V: Add rounding mode operand for floating point 
instructions
> Oh, do you mean vfsqrt7/vfrec7 doesn't have frm, but vfsqrt/vfneg should have 
> frm.
> Is that right? If yes, I am gonna send a patch to fix it immediately.
 
Yes, and I also double checked spike implementation :P
 
and it seems like you're not committed yet, so let's send V2 :)
 
On Mon, May 15, 2023 at 10:12 PM 钟居哲  wrote:
>
> Oh, do you mean vfsqrt7/vfrec7 doesn't have frm, but vfsqrt/vfneg should have 
> frm.
> Is that right? If yes, I am gonna send a patch to fix it immediately.
>
>
>
> juzhe.zh...@rivai.ai
>
> From: Kito Cheng
> Date: 2023-05-15 22:07
> To: 钟居哲
> CC: Jeff Law; gcc-patches; kito.cheng; palmer; palmer; rdapp.gcc
> Subject: Re: Re: [PATCH] RISC-V: Add rounding mode operand for floating point 
> instructions
> Oh, Craig says vfrsqrt7.v not have frm but vsqrt.v have frm, and
> checked spike that match that.
>
> On Mon, May 15, 2023 at 9:55 PM 钟居哲  wrote:
> >
> > I don't know why we should not add frm vfsqrt.v since I saw topper (LLVM 
> > maintainer) said we should
> > not add frm into vsqrt.v. Maybe kito knows the reason ?
> >
> > https://github.com/riscv-non-isa/rvv-intrinsic-doc/pull/226
> >
> >
> >
> >
> > juzhe.zh...@rivai.ai
> >
> > From: Jeff Law
> > Date: 2023-05-15 21:52
> > To: juzhe.zhong; gcc-patches
> > CC: kito.cheng; kito.cheng; palmer; palmer; rdapp.gcc
> > Subject: Re: [PATCH] RISC-V: Add rounding mode operand for floating point 
> > instructions
> >
> >
> > On 5/15/23 05:49, juzhe.zh...@rivai.ai wrote:
> > > From: Juzhe-Zhong 
> > >
> > > This patch is adding rounding mode operand and FRM_REGNUM dependency
> > > into floating-point instructions.
> > >
> > > The floating-point instructions we added FRM and rounding mode operand:
> > > 1. vfadd/vfsub
> > > 2. vfwadd/vfwsub
> > > 3. vfmul
> > > 4. vfdiv
> > > 5. vfwmul
> > > 6. vfwmacc/vfwnmacc/vfwmsac/vfwnmsac
> > > 7. vfsqrt7/vfrec7
> > > 8. floating-point conversions.
> > > 9. floating-point reductions.
> > >
> > > The floating-point instructions we did NOT add FRM and rounding mode 
> > > operand:
> > > 1. vfsqrt/vfneg
> > Assuming vfsqrt is actually an estimator the best place to handle
> > rounding modes is at the last step(s) after N-R or Goldschmidt
> > refinement steps.  I haven't paid too much attention to FP yet, but this
> > is an area I've got fairly extensive experience.
> >
> > Sadly RISC-V's estimator is fairly poor and the single instance FMACs
> > are going to result in an implementation that may not actually be any
> > better than what glibc can do.
> >
> > Jeff
> >
>

Re: [PATCH] Add auto-resizing capability to irange's [PR109695]

2023-05-15 Thread Aldy Hernandez via Gcc-patches





On 5/15/23 16:24, Bernhard Reutner-Fischer wrote:
> On Mon, 15 May 2023 12:35:23 +0200
> Aldy Hernandez via Gcc-patches wrote:
>
>> +// For resizable ranges, resize the range up to HARD_MAX_RANGES if the
>> +// NEEDED pairs is greater than the current capacity of the range.
>> +
>> +inline void
>> +irange::maybe_resize (int needed)
>> +{
>> +  if (!m_resizable || m_max_ranges == HARD_MAX_RANGES)
>> +return;
>> +
>> +  if (needed > m_max_ranges)
>> +{
>> +  m_max_ranges = HARD_MAX_RANGES;
>> +  wide_int *newmem = new wide_int[m_max_ranges * 2];
>> +  memcpy (newmem, m_base, sizeof (wide_int) * num_pairs () * 2);
>> +  m_base = newmem;
>
> Please excuse my ignorance, but where's the old m_base freed? I think
> the assignment above does not call the destructor, or does it?

The old m_base is never freed because it points to m_ranges, a static
array in int_range:


template<unsigned N, bool RESIZABLE>
class GTY((user)) int_range : public irange
{
...
...
private:
  wide_int m_ranges[N*2];
};
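
The ownership rule described above can be sketched with a toy container.
This is not the irange code: small_buffer and its members are hypothetical,
int stands in for wide_int, and unlike maybe_resize (which grows once, to
HARD_MAX_RANGES) this version may grow repeatedly, so it also frees earlier
heap blocks.  The point it shares with the patch is that the inline array is
never deleted; only memory obtained from new[] is:

```cpp
#include <algorithm>
#include <cstddef>

// Toy model (assumption): fixed inline storage that spills to the heap.
template <std::size_t N>
class small_buffer {
 public:
  small_buffer() : m_base(m_inline), m_capacity(N), m_size(0) {}

  // Free only heap memory; m_inline is part of the object itself.
  ~small_buffer() {
    if (m_base != m_inline)
      delete[] m_base;
  }

  small_buffer(const small_buffer&) = delete;
  small_buffer& operator=(const small_buffer&) = delete;

  void push_back(int v) {
    if (m_size == m_capacity)
      grow(m_capacity * 2);
    m_base[m_size++] = v;
  }

  int operator[](std::size_t i) const { return m_base[i]; }
  std::size_t size() const { return m_size; }
  bool on_heap() const { return m_base != m_inline; }

 private:
  void grow(std::size_t new_capacity) {
    int* newmem = new int[new_capacity];
    std::copy(m_base, m_base + m_size, newmem);
    if (m_base != m_inline)
      delete[] m_base;  // the old block came from new[], so release it
    m_base = newmem;
    m_capacity = new_capacity;
  }

  int m_inline[N];         // plays the role of m_ranges
  int* m_base;             // points at m_inline or at a heap block
  std::size_t m_capacity;
  std::size_t m_size;
};
```

Memory allocated with new[] is released with delete[] here, since the two
forms have to match.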

Aldy

RE: [PATCH V3] RISC-V: Add rounding mode operand for fixed-point patterns

2023-05-15 Thread Li, Pan2 via Gcc-patches

Committed, thanks Jeff.

Pan

-Original Message-
From: Gcc-patches  On Behalf 
Of Jeff Law via Gcc-patches
Sent: Monday, May 15, 2023 9:42 PM
To: juzhe.zh...@rivai.ai; gcc-patches@gcc.gnu.org
Cc: kito.ch...@gmail.com; kito.ch...@sifive.com; pal...@dabbelt.com; 
pal...@rivosinc.com; rdapp@gmail.com
Subject: Re: [PATCH V3] RISC-V: Add rounding mode operand for fixed-point 
patterns



On 5/15/23 04:25, juzhe.zh...@rivai.ai wrote:
> From: Juzhe-Zhong 
> 
> Since we are going to have fixed-point intrinsics that are modeling 
> rounding mode
> https://github.com/riscv-non-isa/rvv-intrinsic-doc/pull/222
> 
> We should have operand to specify rounding mode in fixed-point instructions.
> We don't support these modeling rounding mode intrinsics yet but we 
> will definitely support them later.
> 
> This is the preparing patch for new coming intrinsics.
> 
> gcc/ChangeLog:
> 
>  * config/riscv/riscv-protos.h (enum vxrm_field_enum): New enum.
>  * config/riscv/riscv-vector-builtins.cc 
> (function_expander::use_exact_insn): Add default rounding mode operand.
>  * config/riscv/riscv.cc (riscv_hard_regno_nregs): Add VXRM_REGNUM.
>  (riscv_hard_regno_mode_ok): Ditto.
>  (riscv_conditional_register_usage): Ditto.
>  * config/riscv/riscv.h (DWARF_FRAME_REGNUM): Ditto.
>  (VXRM_REG_P): Ditto.
>  (RISCV_DWARF_VXRM): Ditto.
>  * config/riscv/riscv.md: Ditto.
>  * config/riscv/vector.md: Ditto.
OK.
jeff

Re: [PATCH] Add auto-resizing capability to irange's [PR109695]

2023-05-15 Thread Bernhard Reutner-Fischer via Gcc-patches

On Mon, 15 May 2023 12:35:23 +0200
Aldy Hernandez via Gcc-patches  wrote:

> +// For resizable ranges, resize the range up to HARD_MAX_RANGES if the
> +// NEEDED pairs is greater than the current capacity of the range.
> +
> +inline void
> +irange::maybe_resize (int needed)
> +{
> +  if (!m_resizable || m_max_ranges == HARD_MAX_RANGES)
> +return;
> +
> +  if (needed > m_max_ranges)
> +{
> +  m_max_ranges = HARD_MAX_RANGES;
> +  wide_int *newmem = new wide_int[m_max_ranges * 2];
> +  memcpy (newmem, m_base, sizeof (wide_int) * num_pairs () * 2);
> +  m_base = newmem;

Please excuse my ignorance, but where's the old m_base freed? I think
the assignment above does not call the destructor, or does it?

thanks,

> +}
> +}
> +
> +template
> +inline
> +int_range::~int_range ()
> +{
> +  if (RESIZABLE && m_base != m_ranges)
> +delete m_base;
> +}

Re: [PATCH 2/6] aarch64: Allow moves after tied-register intrinsics

2023-05-15 Thread Richard Sandiford via Gcc-patches

Kyrylo Tkachov  writes:
>> -Original Message-
>> From: Richard Sandiford 
>> Sent: Monday, May 15, 2023 3:18 PM
>> To: Kyrylo Tkachov 
>> Cc: gcc-patches@gcc.gnu.org
>> Subject: Re: [PATCH 2/6] aarch64: Allow moves after tied-register intrinsics
>> 
>> Kyrylo Tkachov  writes:
>> > Hi Richard,
>> >
>> >> -Original Message-
>> >> From: Gcc-patches > >> bounces+kyrylo.tkachov=arm@gcc.gnu.org> On Behalf Of Richard
>> >> Sandiford via Gcc-patches
>> >> Sent: Tuesday, May 9, 2023 7:48 AM
>> >> To: gcc-patches@gcc.gnu.org
>> >> Cc: Richard Sandiford 
>> >> Subject: [PATCH 2/6] aarch64: Allow moves after tied-register intrinsics
>> >>
>> >> Some ACLE intrinsics map to instructions that tie the output
>> >> operand to an input operand.  If all the operands are allocated
>> >> to different registers, and if MOVPRFX can't be used, we will need
>> >> a move either before the instruction or after it.  Many tests only
>> >> matched the "before" case; this patch makes them accept the "after"
>> >> case too.
>> >>
>> >> gcc/testsuite/
>> >>   * gcc.target/aarch64/advsimd-intrinsics/bfcvtnq2-untied.c: Allow
>> >>   moves to occur after the intrinsic instruction, rather than 
>> >> requiring
>> >>   them to happen before.
>> >>   * gcc.target/aarch64/advsimd-intrinsics/bfdot-1.c: Likewise.
>> >>   * gcc.target/aarch64/advsimd-intrinsics/vdot-3-1.c: Likewise.
>> >
>> > I'm seeing some dot-product intrinsics failures:
>> > FAIL: gcc.target/aarch64/advsimd-intrinsics/bfdot-2.c   -O1   check-function-bodies ufoo_untied
>> > FAIL: gcc.target/aarch64/advsimd-intrinsics/bfdot-2.c   -O1   check-function-bodies ufooq_lane_untied
>> > FAIL: gcc.target/aarch64/advsimd-intrinsics/bfdot-2.c   -O2   check-function-bodies ufoo_untied
>> > FAIL: gcc.target/aarch64/advsimd-intrinsics/bfdot-2.c   -O2   check-function-bodies ufooq_lane_untied
>> > FAIL: gcc.target/aarch64/advsimd-intrinsics/bfdot-2.c   -O2 -flto -fno-use-linker-plugin -flto-partition=none   check-function-bodies ufoo_untied
>> > FAIL: gcc.target/aarch64/advsimd-intrinsics/bfdot-2.c   -O2 -flto -fno-use-linker-plugin -flto-partition=none   check-function-bodies ufooq_lane_untied
>> > FAIL: gcc.target/aarch64/advsimd-intrinsics/bfdot-2.c   -O3 -g   check-function-bodies ufoo_untied
>> > FAIL: gcc.target/aarch64/advsimd-intrinsics/bfdot-2.c   -O3 -g   check-function-bodies ufooq_lane_untied
>> > FAIL: gcc.target/aarch64/advsimd-intrinsics/bfdot-2.c   -Og -g   check-function-bodies ufoo_untied
>> > FAIL: gcc.target/aarch64/advsimd-intrinsics/bfdot-2.c   -Og -g   check-function-bodies ufooq_lane_untied
>> > FAIL: gcc.target/aarch64/advsimd-intrinsics/bfdot-2.c   -Os   check-function-bodies ufoo_untied
>> > FAIL: gcc.target/aarch64/advsimd-intrinsics/bfdot-2.c   -Os   check-function-bodies ufooq_lane_untied
>> > FAIL: gcc.target/aarch64/advsimd-intrinsics/vdot-3-2.c   -O1   check-function-bodies ufoo_untied
>> > FAIL: gcc.target/aarch64/advsimd-intrinsics/vdot-3-2.c   -O1   check-function-bodies ufooq_laneq_untied
>> > FAIL: gcc.target/aarch64/advsimd-intrinsics/vdot-3-2.c   -O2   check-function-bodies ufoo_untied
>> > FAIL: gcc.target/aarch64/advsimd-intrinsics/vdot-3-2.c   -O2   check-function-bodies ufooq_laneq_untied
>> > FAIL: gcc.target/aarch64/advsimd-intrinsics/vdot-3-2.c   -O2 -flto -fno-use-linker-plugin -flto-partition=none   check-function-bodies ufoo_untied
>> > FAIL: gcc.target/aarch64/advsimd-intrinsics/vdot-3-2.c   -O2 -flto -fno-use-linker-plugin -flto-partition=none   check-function-bodies ufooq_laneq_untied
>> > FAIL: gcc.target/aarch64/advsimd-intrinsics/vdot-3-2.c   -O3 -g   check-function-bodies ufoo_untied
>> > FAIL: gcc.target/aarch64/advsimd-intrinsics/vdot-3-2.c   -O3 -g   check-function-bodies ufooq_laneq_untied
>> > FAIL: gcc.target/aarch64/advsimd-intrinsics/vdot-3-2.c   -Og -g   check-function-bodies ufoo_untied
>> > FAIL: gcc.target/aarch64/advsimd-intrinsics/vdot-3-2.c   -Og -g   check-function-bodies ufooq_laneq_untied
>> > FAIL: gcc.target/aarch64/advsimd-intrinsics/vdot-3-2.c   -Os   check-function-bodies ufoo_untied
>> > FAIL: gcc.target/aarch64/advsimd-intrinsics/vdot-3-2.c   -Os   check-function-bodies ufooq_laneq_untied
>> 
>> Ugh.  Big-endian.  Hadn't thought about that being an issue.
>> Was testing natively on little-endian aarch64-linux-gnu and
>> didn't see these.
>
> FWIW this is on a little-endian aarch64-none-elf configuration.

Yeah, but the tests force big-endian, and require a  that
supports big-endian.  Newlib supports both endiannesses, but a given
glibc installation doesn't.  So the tests will be exercised on *-elf
of any endianness, but will only be exercised on *-linux-gnu for
big-endian.

Richard

Re: Re: [PATCH] RISC-V: Add rounding mode operand for floating point instructions

2023-05-15 Thread 钟居哲

What about vfsgnjn? Does it have FRM? I want to double-check since I don't
trust the document.



juzhe.zh...@rivai.ai
 
From: Kito Cheng
Date: 2023-05-15 22:22
To: 钟居哲
CC: Jeff Law; gcc-patches; kito.cheng; palmer; palmer; rdapp.gcc
Subject: Re: Re: [PATCH] RISC-V: Add rounding mode operand for floating point 
instructions
> Oh, do you mean vfsqrt7/vfrec7 doesn't have frm, but vfsqrt/vfneg should have 
> frm.
> Is that right? If yes, I am gonna send a patch to fix it immediately.
 
Yes, and I also double checked spike implementation :P
 
and it seems like you're not committed yet, so let's send V2 :)
 
On Mon, May 15, 2023 at 10:12 PM 钟居哲  wrote:
>
> Oh, do you mean vfsqrt7/vfrec7 doesn't have frm, but vfsqrt/vfneg should have 
> frm.
> Is that right? If yes, I am gonna send a patch to fix it immediately.
>
>
>
> juzhe.zh...@rivai.ai
>
> From: Kito Cheng
> Date: 2023-05-15 22:07
> To: 钟居哲
> CC: Jeff Law; gcc-patches; kito.cheng; palmer; palmer; rdapp.gcc
> Subject: Re: Re: [PATCH] RISC-V: Add rounding mode operand for floating point 
> instructions
> Oh, Craig says vfrsqrt7.v not have frm but vsqrt.v have frm, and
> checked spike that match that.
>
> On Mon, May 15, 2023 at 9:55 PM 钟居哲  wrote:
> >
> > I don't know why we should not add frm vfsqrt.v since I saw topper (LLVM 
> > maintainer) said we should
> > not add frm into vsqrt.v. Maybe kito knows the reason ?
> >
> > https://github.com/riscv-non-isa/rvv-intrinsic-doc/pull/226
> >
> >
> >
> >
> > juzhe.zh...@rivai.ai
> >
> > From: Jeff Law
> > Date: 2023-05-15 21:52
> > To: juzhe.zhong; gcc-patches
> > CC: kito.cheng; kito.cheng; palmer; palmer; rdapp.gcc
> > Subject: Re: [PATCH] RISC-V: Add rounding mode operand for floating point 
> > instructions
> >
> >
> > On 5/15/23 05:49, juzhe.zh...@rivai.ai wrote:
> > > From: Juzhe-Zhong 
> > >
> > > This patch is adding rounding mode operand and FRM_REGNUM dependency
> > > into floating-point instructions.
> > >
> > > The floating-point instructions we added FRM and rounding mode operand:
> > > 1. vfadd/vfsub
> > > 2. vfwadd/vfwsub
> > > 3. vfmul
> > > 4. vfdiv
> > > 5. vfwmul
> > > 6. vfwmacc/vfwnmacc/vfwmsac/vfwnmsac
> > > 7. vfsqrt7/vfrec7
> > > 8. floating-point conversions.
> > > 9. floating-point reductions.
> > >
> > > The floating-point instructions we did NOT add FRM and rounding mode 
> > > operand:
> > > 1. vfsqrt/vfneg
> > Assuming vfsqrt is actually an estimator the best place to handle
> > rounding modes is at the last step(s) after N-R or Goldschmidt
> > refinement steps.  I haven't paid too much attention to FP yet, but this
> > is an area I've got fairly extensive experience.
> >
> > Sadly RISC-V's estimator is fairly poor and the single instance FMACs
> > are going to result in an implementation that may not actually be any
> > better than what glibc can do.
> >
> > Jeff
> >
>

Re: Re: [PATCH] RISC-V: Add rounding mode operand for floating point instructions

2023-05-15 Thread Kito Cheng via Gcc-patches

> Oh, do you mean vfsqrt7/vfrec7 doesn't have frm, but vfsqrt/vfneg should have 
> frm.
> Is that right? If yes, I am gonna send a patch to fix it immediately.

Yes, and I also double-checked the spike implementation :P

It seems you haven't committed yet, so let's send a V2 :)

On Mon, May 15, 2023 at 10:12 PM 钟居哲  wrote:
>
> Oh, do you mean vfsqrt7/vfrec7 doesn't have frm, but vfsqrt/vfneg should have 
> frm.
> Is that right? If yes, I am gonna send a patch to fix it immediately.
>
>
>
> juzhe.zh...@rivai.ai
>
> From: Kito Cheng
> Date: 2023-05-15 22:07
> To: 钟居哲
> CC: Jeff Law; gcc-patches; kito.cheng; palmer; palmer; rdapp.gcc
> Subject: Re: Re: [PATCH] RISC-V: Add rounding mode operand for floating point 
> instructions
> Oh, Craig says vfrsqrt7.v not have frm but vsqrt.v have frm, and
> checked spike that match that.
>
> On Mon, May 15, 2023 at 9:55 PM 钟居哲  wrote:
> >
> > I don't know why we should not add frm vfsqrt.v since I saw topper (LLVM 
> > maintainer) said we should
> > not add frm into vsqrt.v. Maybe kito knows the reason ?
> >
> > https://github.com/riscv-non-isa/rvv-intrinsic-doc/pull/226
> >
> >
> >
> >
> > juzhe.zh...@rivai.ai
> >
> > From: Jeff Law
> > Date: 2023-05-15 21:52
> > To: juzhe.zhong; gcc-patches
> > CC: kito.cheng; kito.cheng; palmer; palmer; rdapp.gcc
> > Subject: Re: [PATCH] RISC-V: Add rounding mode operand for floating point 
> > instructions
> >
> >
> > On 5/15/23 05:49, juzhe.zh...@rivai.ai wrote:
> > > From: Juzhe-Zhong 
> > >
> > > This patch is adding rounding mode operand and FRM_REGNUM dependency
> > > into floating-point instructions.
> > >
> > > The floating-point instructions we added FRM and rounding mode operand:
> > > 1. vfadd/vfsub
> > > 2. vfwadd/vfwsub
> > > 3. vfmul
> > > 4. vfdiv
> > > 5. vfwmul
> > > 6. vfwmacc/vfwnmacc/vfwmsac/vfwnmsac
> > > 7. vfsqrt7/vfrec7
> > > 8. floating-point conversions.
> > > 9. floating-point reductions.
> > >
> > > The floating-point instructions we did NOT add FRM and rounding mode 
> > > operand:
> > > 1. vfsqrt/vfneg
> > Assuming vfsqrt is actually an estimator the best place to handle
> > rounding modes is at the last step(s) after N-R or Goldschmidt
> > refinement steps.  I haven't paid too much attention to FP yet, but this
> > is an area I've got fairly extensive experience.
> >
> > Sadly RISC-V's estimator is fairly poor and the single instance FMACs
> > are going to result in an implementation that may not actually be any
> > better than what glibc can do.
> >
> > Jeff
> >
>

RE: [PATCH 2/6] aarch64: Allow moves after tied-register intrinsics

2023-05-15 Thread Kyrylo Tkachov via Gcc-patches




> -Original Message-
> From: Richard Sandiford 
> Sent: Monday, May 15, 2023 3:18 PM
> To: Kyrylo Tkachov 
> Cc: gcc-patches@gcc.gnu.org
> Subject: Re: [PATCH 2/6] aarch64: Allow moves after tied-register intrinsics
> 
> Kyrylo Tkachov  writes:
> > Hi Richard,
> >
> >> -Original Message-
> >> From: Gcc-patches  >> bounces+kyrylo.tkachov=arm@gcc.gnu.org> On Behalf Of Richard
> >> Sandiford via Gcc-patches
> >> Sent: Tuesday, May 9, 2023 7:48 AM
> >> To: gcc-patches@gcc.gnu.org
> >> Cc: Richard Sandiford 
> >> Subject: [PATCH 2/6] aarch64: Allow moves after tied-register intrinsics
> >>
> >> Some ACLE intrinsics map to instructions that tie the output
> >> operand to an input operand.  If all the operands are allocated
> >> to different registers, and if MOVPRFX can't be used, we will need
> >> a move either before the instruction or after it.  Many tests only
> >> matched the "before" case; this patch makes them accept the "after"
> >> case too.
> >>
> >> gcc/testsuite/
> >>   * gcc.target/aarch64/advsimd-intrinsics/bfcvtnq2-untied.c: Allow
> >>   moves to occur after the intrinsic instruction, rather than requiring
> >>   them to happen before.
> >>   * gcc.target/aarch64/advsimd-intrinsics/bfdot-1.c: Likewise.
> >>   * gcc.target/aarch64/advsimd-intrinsics/vdot-3-1.c: Likewise.
> >
> > I'm seeing some dot-product intrinsics failures:
> > FAIL: gcc.target/aarch64/advsimd-intrinsics/bfdot-2.c   -O1   
> > check-function-
> bodies ufoo_untied
> > FAIL: gcc.target/aarch64/advsimd-intrinsics/bfdot-2.c   -O1   
> > check-function-
> bodies ufooq_lane_untied
> > FAIL: gcc.target/aarch64/advsimd-intrinsics/bfdot-2.c   -O2   
> > check-function-
> bodies ufoo_untied
> > FAIL: gcc.target/aarch64/advsimd-intrinsics/bfdot-2.c   -O2   
> > check-function-
> bodies ufooq_lane_untied
> > FAIL: gcc.target/aarch64/advsimd-intrinsics/bfdot-2.c   -O2 -flto -fno-use-
> linker-plugin -flto-partition=none   check-function-bodies ufoo_untied
> > FAIL: gcc.target/aarch64/advsimd-intrinsics/bfdot-2.c   -O2 -flto -fno-use-
> linker-plugin -flto-partition=none   check-function-bodies ufooq_lane_untied
> > FAIL: gcc.target/aarch64/advsimd-intrinsics/bfdot-2.c   -O3 -g   check-
> function-bodies ufoo_untied
> > FAIL: gcc.target/aarch64/advsimd-intrinsics/bfdot-2.c   -O3 -g   check-
> function-bodies ufooq_lane_untied
> > FAIL: gcc.target/aarch64/advsimd-intrinsics/bfdot-2.c   -Og -g   check-
> function-bodies ufoo_untied
> > FAIL: gcc.target/aarch64/advsimd-intrinsics/bfdot-2.c   -Og -g   check-
> function-bodies ufooq_lane_untied
> > FAIL: gcc.target/aarch64/advsimd-intrinsics/bfdot-2.c   -Os   
> > check-function-
> bodies ufoo_untied
> > FAIL: gcc.target/aarch64/advsimd-intrinsics/bfdot-2.c   -Os   
> > check-function-
> bodies ufooq_lane_untied
> > FAIL: gcc.target/aarch64/advsimd-intrinsics/vdot-3-2.c   -O1   check-
> function-bodies ufoo_untied
> > FAIL: gcc.target/aarch64/advsimd-intrinsics/vdot-3-2.c   -O1   check-
> function-bodies ufooq_laneq_untied
> > FAIL: gcc.target/aarch64/advsimd-intrinsics/vdot-3-2.c   -O2   check-
> function-bodies ufoo_untied
> > FAIL: gcc.target/aarch64/advsimd-intrinsics/vdot-3-2.c   -O2   check-
> function-bodies ufooq_laneq_untied
> > FAIL: gcc.target/aarch64/advsimd-intrinsics/vdot-3-2.c   -O2 -flto -fno-use-
> linker-plugin -flto-partition=none   check-function-bodies ufoo_untied
> > FAIL: gcc.target/aarch64/advsimd-intrinsics/vdot-3-2.c   -O2 -flto -fno-use-
> linker-plugin -flto-partition=none   check-function-bodies
> ufooq_laneq_untied
> > FAIL: gcc.target/aarch64/advsimd-intrinsics/vdot-3-2.c   -O3 -g   check-
> function-bodies ufoo_untied
> > FAIL: gcc.target/aarch64/advsimd-intrinsics/vdot-3-2.c   -O3 -g   check-
> function-bodies ufooq_laneq_untied
> > FAIL: gcc.target/aarch64/advsimd-intrinsics/vdot-3-2.c   -Og -g   check-
> function-bodies ufoo_untied
> > FAIL: gcc.target/aarch64/advsimd-intrinsics/vdot-3-2.c   -Og -g   check-
> function-bodies ufooq_laneq_untied
> > FAIL: gcc.target/aarch64/advsimd-intrinsics/vdot-3-2.c   -Os   
> > check-function-
> bodies ufoo_untied
> > FAIL: gcc.target/aarch64/advsimd-intrinsics/vdot-3-2.c   -Os   
> > check-function-
> bodies ufooq_laneq_untied
> 
> Ugh.  Big-endian.  Hadn't thought about that being an issue.
> Was testing natively on little-endian aarch64-linux-gnu and
> didn't see these.

FWIW this is on a little-endian aarch64-none-elf configuration.
Maybe some defaults are different on bare-metal from Linux...

> 
> > From a quick inspection it looks like it's just an alternative regalloc that
> moves the mov + dot instructions around, similar to what you fixed in bfdot-
> 2.c and vdot-3-2.c.
> > I guess they need a similar adjustment?
> 
> Yeah, will fix.

Thanks!
Kyrill

> 
> Thanks,
> Richard

Re: [PATCH 2/6] aarch64: Allow moves after tied-register intrinsics

2023-05-15 Thread Richard Sandiford via Gcc-patches

Kyrylo Tkachov  writes:
> Hi Richard,
>
>> -Original Message-
>> From: Gcc-patches > bounces+kyrylo.tkachov=arm@gcc.gnu.org> On Behalf Of Richard
>> Sandiford via Gcc-patches
>> Sent: Tuesday, May 9, 2023 7:48 AM
>> To: gcc-patches@gcc.gnu.org
>> Cc: Richard Sandiford 
>> Subject: [PATCH 2/6] aarch64: Allow moves after tied-register intrinsics
>>
>> Some ACLE intrinsics map to instructions that tie the output
>> operand to an input operand.  If all the operands are allocated
>> to different registers, and if MOVPRFX can't be used, we will need
>> a move either before the instruction or after it.  Many tests only
>> matched the "before" case; this patch makes them accept the "after"
>> case too.
>>
>> gcc/testsuite/
>>   * gcc.target/aarch64/advsimd-intrinsics/bfcvtnq2-untied.c: Allow
>>   moves to occur after the intrinsic instruction, rather than requiring
>>   them to happen before.
>>   * gcc.target/aarch64/advsimd-intrinsics/bfdot-1.c: Likewise.
>>   * gcc.target/aarch64/advsimd-intrinsics/vdot-3-1.c: Likewise.
>
> I'm seeing some dot-product intrinsics failures:
> FAIL: gcc.target/aarch64/advsimd-intrinsics/bfdot-2.c   -O1   
> check-function-bodies ufoo_untied
> FAIL: gcc.target/aarch64/advsimd-intrinsics/bfdot-2.c   -O1   
> check-function-bodies ufooq_lane_untied
> FAIL: gcc.target/aarch64/advsimd-intrinsics/bfdot-2.c   -O2   
> check-function-bodies ufoo_untied
> FAIL: gcc.target/aarch64/advsimd-intrinsics/bfdot-2.c   -O2   
> check-function-bodies ufooq_lane_untied
> FAIL: gcc.target/aarch64/advsimd-intrinsics/bfdot-2.c   -O2 -flto 
> -fno-use-linker-plugin -flto-partition=none   check-function-bodies 
> ufoo_untied
> FAIL: gcc.target/aarch64/advsimd-intrinsics/bfdot-2.c   -O2 -flto 
> -fno-use-linker-plugin -flto-partition=none   check-function-bodies 
> ufooq_lane_untied
> FAIL: gcc.target/aarch64/advsimd-intrinsics/bfdot-2.c   -O3 -g   
> check-function-bodies ufoo_untied
> FAIL: gcc.target/aarch64/advsimd-intrinsics/bfdot-2.c   -O3 -g   
> check-function-bodies ufooq_lane_untied
> FAIL: gcc.target/aarch64/advsimd-intrinsics/bfdot-2.c   -Og -g   
> check-function-bodies ufoo_untied
> FAIL: gcc.target/aarch64/advsimd-intrinsics/bfdot-2.c   -Og -g   
> check-function-bodies ufooq_lane_untied
> FAIL: gcc.target/aarch64/advsimd-intrinsics/bfdot-2.c   -Os   
> check-function-bodies ufoo_untied
> FAIL: gcc.target/aarch64/advsimd-intrinsics/bfdot-2.c   -Os   
> check-function-bodies ufooq_lane_untied
> FAIL: gcc.target/aarch64/advsimd-intrinsics/vdot-3-2.c   -O1   
> check-function-bodies ufoo_untied
> FAIL: gcc.target/aarch64/advsimd-intrinsics/vdot-3-2.c   -O1   
> check-function-bodies ufooq_laneq_untied
> FAIL: gcc.target/aarch64/advsimd-intrinsics/vdot-3-2.c   -O2   
> check-function-bodies ufoo_untied
> FAIL: gcc.target/aarch64/advsimd-intrinsics/vdot-3-2.c   -O2   
> check-function-bodies ufooq_laneq_untied
> FAIL: gcc.target/aarch64/advsimd-intrinsics/vdot-3-2.c   -O2 -flto 
> -fno-use-linker-plugin -flto-partition=none   check-function-bodies 
> ufoo_untied
> FAIL: gcc.target/aarch64/advsimd-intrinsics/vdot-3-2.c   -O2 -flto 
> -fno-use-linker-plugin -flto-partition=none   check-function-bodies 
> ufooq_laneq_untied
> FAIL: gcc.target/aarch64/advsimd-intrinsics/vdot-3-2.c   -O3 -g   
> check-function-bodies ufoo_untied
> FAIL: gcc.target/aarch64/advsimd-intrinsics/vdot-3-2.c   -O3 -g   
> check-function-bodies ufooq_laneq_untied
> FAIL: gcc.target/aarch64/advsimd-intrinsics/vdot-3-2.c   -Og -g   
> check-function-bodies ufoo_untied
> FAIL: gcc.target/aarch64/advsimd-intrinsics/vdot-3-2.c   -Og -g   
> check-function-bodies ufooq_laneq_untied
> FAIL: gcc.target/aarch64/advsimd-intrinsics/vdot-3-2.c   -Os   
> check-function-bodies ufoo_untied
> FAIL: gcc.target/aarch64/advsimd-intrinsics/vdot-3-2.c   -Os   
> check-function-bodies ufooq_laneq_untied

Ugh.  Big-endian.  Hadn't thought about that being an issue.
Was testing natively on little-endian aarch64-linux-gnu and
didn't see these.

> From a quick inspection it looks like it's just an alternative regalloc that 
> moves the mov + dot instructions around, similar to what you fixed in 
> bfdot-2.c and vdot-3-2.c.
> I guess they need a similar adjustment?

Yeah, will fix.

Thanks,
Richard

RE: [PATCH] OPTABS: Extend the number of expanding instructions pattern.

2023-05-15 Thread Li, Pan2 via Gcc-patches

Committed, thanks Richard.

Pan

-Original Message-
From: Gcc-patches  On Behalf 
Of Richard Biener via Gcc-patches
Sent: Monday, May 15, 2023 8:52 PM
To: Juzhe-Zhong 
Cc: gcc-patches@gcc.gnu.org; richard.guent...@gmail.com
Subject: Re: [PATCH] OPTABS: Extend the number of expanding instructions 
pattern.

On Mon, 15 May 2023, juzhe.zh...@rivai.ai wrote:

> From: Juzhe-Zhong 
> 
> Hi, Richi.
> 
> We (RVV) are going to add a rounding mode operand to floating-point
> instructions, which will then have 11 operands.
> 
> This is because we are going to have intrinsics that add a rounding mode argument:
> https://github.com/riscv-non-isa/rvv-intrinsic-doc/pull/226
> 
> This is the patch that is adding rounding mode operand in RISC-V port:
> https://gcc.gnu.org/pipermail/gcc-patches/2023-May/618573.html
> You can see there are 11 operands in these patterns.
> 
> Is it Ok for trunk ?

OK.

Richard.

> Thanks
> 
> gcc/ChangeLog:
> 
> * optabs.cc (maybe_gen_insn): Add case to generate instruction that 
> has 11 operands.
> 
> ---
>  gcc/optabs.cc | 5 +
>  1 file changed, 5 insertions(+)
> 
> diff --git a/gcc/optabs.cc b/gcc/optabs.cc
> index c8e39c82d57..a12333c7169 100644
> --- a/gcc/optabs.cc
> +++ b/gcc/optabs.cc
> @@ -8139,6 +8139,11 @@ maybe_gen_insn (enum insn_code icode, unsigned int nops,
> ops[3].value, ops[4].value, ops[5].value,
> ops[6].value, ops[7].value, ops[8].value,
> ops[9].value);
> +case 11:
> +  return GEN_FCN (icode) (ops[0].value, ops[1].value, ops[2].value,
> +   ops[3].value, ops[4].value, ops[5].value,
> +   ops[6].value, ops[7].value, ops[8].value,
> +   ops[9].value, ops[10].value);
>  }
>gcc_unreachable ();
>  }
> 

--
Richard Biener 
SUSE Software Solutions Germany GmbH, Frankenstrasse 146, 90461 Nuernberg, 
Germany; GF: Ivo Totev, Andrew Myers, Andrew McDonald, Boudien Moerman; HRB 
36809 (AG Nuernberg)
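
For readers following the patch above: the reason a new "case 11" is needed is that each generator is called with its operands as separate arguments, so every supported operand count needs its own call site. The following is only a toy sketch of that shape, not GCC code; toy_gen_fcn, call_with and toy_maybe_gen are hypothetical names, and the "generator" simply sums its operands.

```cpp
#include <cassert>
#include <cstddef>
#include <utility>
#include <vector>

// Hypothetical stand-in for a generated pattern function: it just sums
// however many operands it is handed.
inline int toy_gen_fcn() { return 0; }

template <typename... Rest>
int toy_gen_fcn(int first, Rest... rest) {
  return first + toy_gen_fcn(rest...);
}

// Spread ops[0..N-1] out as N separate arguments.
template <std::size_t... I>
int call_with(const std::vector<int> &ops, std::index_sequence<I...>) {
  return toy_gen_fcn(ops[I]...);
}

// Toy dispatcher: an operand count without its own case is not handled.
// (Assumption: returns -1 here, where the real function would reach
// gcc_unreachable.)
int toy_maybe_gen(const std::vector<int> &ops) {
  switch (ops.size()) {
    case 10:
      return call_with(ops, std::make_index_sequence<10>{});
    case 11:
      return call_with(ops, std::make_index_sequence<11>{});
  }
  return -1;
}
```

With only the "case 10" arm present, an 11-operand pattern would fall through to the unsupported path, which is the situation the patch addresses.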

Re: Re: [PATCH] RISC-V: Add rounding mode operand for floating point instructions

2023-05-15 Thread 钟居哲

Oh, do you mean vfsqrt7/vfrec7 don't have frm, but vfsqrt/vfneg should have
frm?
Is that right? If yes, I am going to send a patch to fix it immediately.



juzhe.zh...@rivai.ai
 
From: Kito Cheng
Date: 2023-05-15 22:07
To: 钟居哲
CC: Jeff Law; gcc-patches; kito.cheng; palmer; palmer; rdapp.gcc
Subject: Re: Re: [PATCH] RISC-V: Add rounding mode operand for floating point 
instructions
Oh, Craig says vfrsqrt7.v not have frm but vsqrt.v have frm, and
checked spike that match that.
 
On Mon, May 15, 2023 at 9:55 PM 钟居哲  wrote:
>
> I don't know why we should not add frm vfsqrt.v since I saw topper (LLVM 
> maintainer) said we should
> not add frm into vsqrt.v. Maybe kito knows the reason ?
>
> https://github.com/riscv-non-isa/rvv-intrinsic-doc/pull/226
>
>
>
>
> juzhe.zh...@rivai.ai
>
> From: Jeff Law
> Date: 2023-05-15 21:52
> To: juzhe.zhong; gcc-patches
> CC: kito.cheng; kito.cheng; palmer; palmer; rdapp.gcc
> Subject: Re: [PATCH] RISC-V: Add rounding mode operand for floating point 
> instructions
>
>
> On 5/15/23 05:49, juzhe.zh...@rivai.ai wrote:
> > From: Juzhe-Zhong 
> >
> > This patch is adding rounding mode operand and FRM_REGNUM dependency
> > into floating-point instructions.
> >
> > The floating-point instructions we added FRM and rounding mode operand:
> > 1. vfadd/vfsub
> > 2. vfwadd/vfwsub
> > 3. vfmul
> > 4. vfdiv
> > 5. vfwmul
> > 6. vfwmacc/vfwnmacc/vfwmsac/vfwnmsac
> > 7. vfsqrt7/vfrec7
> > 8. floating-point conversions.
> > 9. floating-point reductions.
> >
> > The floating-point instructions we did NOT add FRM and rounding mode 
> > operand:
> > 1. vfsqrt/vfneg
> Assuming vfsqrt is actually an estimator the best place to handle
> rounding modes is at the last step(s) after N-R or Goldschmidt
> refinement steps.  I haven't paid too much attention to FP yet, but this
> is an area I've got fairly extensive experience.
>
> Sadly RISC-V's estimator is fairly poor and the single instance FMACs
> are going to result in an implementation that may not actually be any
> better than what glibc can do.
>
> Jeff
>

Re: [aarch64] Code-gen for vector initialization involving constants

2023-05-15 Thread Prathamesh Kulkarni via Gcc-patches

On Fri, 12 May 2023 at 00:45, Richard Sandiford
 wrote:
>
> Prathamesh Kulkarni  writes:
>
> > On Tue, 2 May 2023 at 18:22, Richard Sandiford
> >  wrote:
> >>
> >> Prathamesh Kulkarni  writes:
> >> > On Tue, 2 May 2023 at 17:32, Richard Sandiford
> >> >  wrote:
> >> >>
> >> >> Prathamesh Kulkarni  writes:
> >> >> > On Tue, 2 May 2023 at 14:56, Richard Sandiford
> >> >> >  wrote:
> >> >> >> > [aarch64] Improve code-gen for vector initialization with single 
> >> >> >> > constant element.
> >> >> >> >
> >> >> >> > gcc/ChangeLog:
> >> >> >> >   * config/aarch64/aarch64.cc (aarch64_expand_vector_init): 
> >> >> >> > Tweak condition
> >> >> >> >   if (n_var == n_elts && n_elts <= 16) to allow a single 
> >> >> >> > constant,
> >> >> >> >   and if maxv == 1, use constant element for duplicating into 
> >> >> >> > register.
> >> >> >> >
> >> >> >> > gcc/testsuite/ChangeLog:
> >> >> >> >   * gcc.target/aarch64/vec-init-single-const.c: New test.
> >> >> >> >
> >> >> >> > diff --git a/gcc/config/aarch64/aarch64.cc 
> >> >> >> > b/gcc/config/aarch64/aarch64.cc
> >> >> >> > index 2b0de7ca038..f46750133a6 100644
> >> >> >> > --- a/gcc/config/aarch64/aarch64.cc
> >> >> >> > +++ b/gcc/config/aarch64/aarch64.cc
> >> >> >> > @@ -22167,7 +22167,7 @@ aarch64_expand_vector_init (rtx target, 
> >> >> >> > rtx vals)
> >> >> >> >   and matches[X][1] with the count of duplicate elements (if X 
> >> >> >> > is the
> >> >> >> >   earliest element which has duplicates).  */
> >> >> >> >
> >> >> >> > -  if (n_var == n_elts && n_elts <= 16)
> >> >> >> > +  if ((n_var >= n_elts - 1) && n_elts <= 16)
> >> >> >> >  {
> >> >> >> >int matches[16][2] = {0};
> >> >> >> >for (int i = 0; i < n_elts; i++)
> >> >> >> > @@ -7,6 +7,18 @@ aarch64_expand_vector_init (rtx target, 
> >> >> >> > rtx vals)
> >> >> >> >vector register.  For big-endian we want that position 
> >> >> >> > to hold
> >> >> >> >the last element of VALS.  */
> >> >> >> > maxelement = BYTES_BIG_ENDIAN ? n_elts - 1 : 0;
> >> >> >> > +
> >> >> >> > +   /* If we have a single constant element, use that for 
> >> >> >> > duplicating
> >> >> >> > +  instead.  */
> >> >> >> > +   if (n_var == n_elts - 1)
> >> >> >> > + for (int i = 0; i < n_elts; i++)
> >> >> >> > +   if (CONST_INT_P (XVECEXP (vals, 0, i))
> >> >> >> > +   || CONST_DOUBLE_P (XVECEXP (vals, 0, i)))
> >> >> >> > + {
> >> >> >> > +   maxelement = i;
> >> >> >> > +   break;
> >> >> >> > + }
> >> >> >> > +
> >> >> >> > rtx x = force_reg (inner_mode, XVECEXP (vals, 0, 
> >> >> >> > maxelement));
> >> >> >> > aarch64_emit_move (target, lowpart_subreg (mode, x, 
> >> >> >> > inner_mode));
> >> >> >>
> >> >> >> We don't want to force the constant into a register though.
> >> >> > OK right, sorry.
> >> >> > With the attached patch, for the following test-case:
> >> >> > int64x2_t f_s64(int64_t x)
> >> >> > {
> >> >> >   return (int64x2_t) { x, 1 };
> >> >> > }
> >> >> >
> >> >> > it loads constant from memory (same code-gen as without patch).
> >> >> > f_s64:
> >> >> > adrp x1, .LC0
> >> >> > ldr q0, [x1, #:lo12:.LC0]
> >> >> > ins v0.d[0], x0
> >> >> > ret
> >> >> >
> >> >> > Does the patch look OK ?
> >> >> >
> >> >> > Thanks,
> >> >> > Prathamesh
> >> >> > [...]
> >> >> > [aarch64] Improve code-gen for vector initialization with single 
> >> >> > constant element.
> >> >> >
> >> >> > gcc/ChangeLog:
> >> >> >   * config/aarch64/aarch64.cc (aarch64_expand_vector_init): Tweak 
> >> >> > condition
> >> >> >   if (n_var == n_elts && n_elts <= 16) to allow a single constant,
> >> >> >   and if maxv == 1, use constant element for duplicating into 
> >> >> > register.
> >> >> >
> >> >> > gcc/testsuite/ChangeLog:
> >> >> >   * gcc.target/aarch64/vec-init-single-const.c: New test.
> >> >> >
> >> >> > diff --git a/gcc/config/aarch64/aarch64.cc 
> >> >> > b/gcc/config/aarch64/aarch64.cc
> >> >> > index 2b0de7ca038..97309ddec4f 100644
> >> >> > --- a/gcc/config/aarch64/aarch64.cc
> >> >> > +++ b/gcc/config/aarch64/aarch64.cc
> >> >> > @@ -22167,7 +22167,7 @@ aarch64_expand_vector_init (rtx target, rtx 
> >> >> > vals)
> >> >> >   and matches[X][1] with the count of duplicate elements (if X is 
> >> >> > the
> >> >> >   earliest element which has duplicates).  */
> >> >> >
> >> >> > -  if (n_var == n_elts && n_elts <= 16)
> >> >> > +  if ((n_var >= n_elts - 1) && n_elts <= 16)
> >> >>
> >> >> No need for the extra brackets.
> >> > Adjusted, thanks. Sorry if this sounds like a silly question, but why
> >> > do we need the n_elts <= 16 check ?
> >> > Won't n_elts be always <= 16 since max number of elements in a vector
> >> > would be 16 for V16QI ?
> >>
> >> Was wondering the same thing :)
> >>
> >> Let's leave it though.
> >>
> >> >> >  {
> >> >> >int matches[16][2] = {0};

Re: Re: [PATCH] RISC-V: Add rounding mode operand for floating point instructions

2023-05-15 Thread 钟居哲

So, you mean I also need to add frm to vfsqrt?
If yes, I will send another patch to add it now.



juzhe.zh...@rivai.ai
 
From: Kito Cheng
Date: 2023-05-15 22:07
To: 钟居哲
CC: Jeff Law; gcc-patches; kito.cheng; palmer; palmer; rdapp.gcc
Subject: Re: Re: [PATCH] RISC-V: Add rounding mode operand for floating point 
instructions
Oh, Craig says vfrsqrt7.v not have frm but vsqrt.v have frm, and
checked spike that match that.
 
On Mon, May 15, 2023 at 9:55 PM 钟居哲  wrote:
>
> I don't know why we should not add frm vfsqrt.v since I saw topper (LLVM 
> maintainer) said we should
> not add frm into vsqrt.v. Maybe kito knows the reason ?
>
> https://github.com/riscv-non-isa/rvv-intrinsic-doc/pull/226
>
>
>
>
> juzhe.zh...@rivai.ai
>
> From: Jeff Law
> Date: 2023-05-15 21:52
> To: juzhe.zhong; gcc-patches
> CC: kito.cheng; kito.cheng; palmer; palmer; rdapp.gcc
> Subject: Re: [PATCH] RISC-V: Add rounding mode operand for floating point 
> instructions
>
>
> On 5/15/23 05:49, juzhe.zh...@rivai.ai wrote:
> > From: Juzhe-Zhong 
> >
> > This patch is adding rounding mode operand and FRM_REGNUM dependency
> > into floating-point instructions.
> >
> > The floating-point instructions we added FRM and rounding mode operand:
> > 1. vfadd/vfsub
> > 2. vfwadd/vfwsub
> > 3. vfmul
> > 4. vfdiv
> > 5. vfwmul
> > 6. vfwmacc/vfwnmacc/vfwmsac/vfwnmsac
> > 7. vfsqrt7/vfrec7
> > 8. floating-point conversions.
> > 9. floating-point reductions.
> >
> > The floating-point instructions we did NOT add FRM and rounding mode 
> > operand:
> > 1. vfsqrt/vfneg
> Assuming vfsqrt is actually an estimator the best place to handle
> rounding modes is at the last step(s) after N-R or Goldschmidt
> refinement steps.  I haven't paid too much attention to FP yet, but this
> is an area I've got fairly extensive experience.
>
> Sadly RISC-V's estimator is fairly poor and the single instance FMACs
> are going to result in an implementation that may not actually be any
> better than what glibc can do.
>
> Jeff
>

Re: Re: [PATCH] RISC-V: Add rounding mode operand for floating point instructions

2023-05-15 Thread Kito Cheng via Gcc-patches

Oh, Craig says vfrsqrt7.v does not have frm but vfsqrt.v does, and I
checked that spike matches that.

On Mon, May 15, 2023 at 9:55 PM 钟居哲  wrote:
>
> I don't know why we should not add frm vfsqrt.v since I saw topper (LLVM 
> maintainer) said we should
> not add frm into vsqrt.v. Maybe kito knows the reason ?
>
> https://github.com/riscv-non-isa/rvv-intrinsic-doc/pull/226
>
>
>
>
> juzhe.zh...@rivai.ai
>
> From: Jeff Law
> Date: 2023-05-15 21:52
> To: juzhe.zhong; gcc-patches
> CC: kito.cheng; kito.cheng; palmer; palmer; rdapp.gcc
> Subject: Re: [PATCH] RISC-V: Add rounding mode operand for floating point 
> instructions
>
>
> On 5/15/23 05:49, juzhe.zh...@rivai.ai wrote:
> > From: Juzhe-Zhong 
> >
> > This patch is adding rounding mode operand and FRM_REGNUM dependency
> > into floating-point instructions.
> >
> > The floating-point instructions we added FRM and rounding mode operand:
> > 1. vfadd/vfsub
> > 2. vfwadd/vfwsub
> > 3. vfmul
> > 4. vfdiv
> > 5. vfwmul
> > 6. vfwmacc/vfwnmacc/vfwmsac/vfwnmsac
> > 7. vfsqrt7/vfrec7
> > 8. floating-point conversions.
> > 9. floating-point reductions.
> >
> > The floating-point instructions we did NOT add FRM and rounding mode 
> > operand:
> > 1. vfsqrt/vfneg
> Assuming vfsqrt is actually an estimator the best place to handle
> rounding modes is at the last step(s) after N-R or Goldschmidt
> refinement steps.  I haven't paid too much attention to FP yet, but this
> is an area I've got fairly extensive experience.
>
> Sadly RISC-V's estimator is fairly poor and the single instance FMACs
> are going to result in an implementation that may not actually be any
> better than what glibc can do.
>
> Jeff
>
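
As an illustration of the refinement Jeff mentions in the quoted text: a low-precision reciprocal-square-root estimate can be sharpened with Newton-Raphson steps, each of which roughly doubles the number of correct bits. This is only a sketch under stated assumptions. rough_rsqrt is a hypothetical stand-in that truncates the exact value to about 7 mantissa bits; it does not reproduce the real vfrsqrt7 lookup table, and the sketch does not model rounding modes at all.

```cpp
#include <cassert>
#include <cmath>

// Hypothetical coarse estimate of 1/sqrt(x): take the exact value and keep
// only about 7 bits of mantissa. (Assumption: this merely mimics a
// low-precision estimator.)
double rough_rsqrt(double x) {
  double exact = 1.0 / std::sqrt(x);
  int exp = 0;
  double m = std::frexp(exact, &exp);   // m is in [0.5, 1)
  m = std::floor(m * 128.0) / 128.0;    // truncate to 7 bits
  return std::ldexp(m, exp);
}

// Newton-Raphson iteration for y ~ 1/sqrt(x):
//   y_next = y * (1.5 - 0.5 * x * y * y)
// The relative error e becomes roughly 1.5 * e * e after each step.
double refine_rsqrt(double x, double y, int steps) {
  for (int i = 0; i < steps; ++i)
    y = y * (1.5 - 0.5 * x * y * y);
  return y;
}
```

Starting from about 7 correct bits, three steps are enough to get close to double precision in this toy setting; how the final result is rounded is a separate question, which is the part Jeff's remark is about.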

RE: [PATCH 2/6] aarch64: Allow moves after tied-register intrinsics

2023-05-15 Thread Kyrylo Tkachov via Gcc-patches

Hi Richard,

> -Original Message-
> From: Gcc-patches  bounces+kyrylo.tkachov=arm@gcc.gnu.org> On Behalf Of Richard
> Sandiford via Gcc-patches
> Sent: Tuesday, May 9, 2023 7:48 AM
> To: gcc-patches@gcc.gnu.org
> Cc: Richard Sandiford 
> Subject: [PATCH 2/6] aarch64: Allow moves after tied-register intrinsics
> 
> Some ACLE intrinsics map to instructions that tie the output
> operand to an input operand.  If all the operands are allocated
> to different registers, and if MOVPRFX can't be used, we will need
> a move either before the instruction or after it.  Many tests only
> matched the "before" case; this patch makes them accept the "after"
> case too.
> 
> gcc/testsuite/
>   * gcc.target/aarch64/advsimd-intrinsics/bfcvtnq2-untied.c: Allow
>   moves to occur after the intrinsic instruction, rather than requiring
>   them to happen before.
>   * gcc.target/aarch64/advsimd-intrinsics/bfdot-1.c: Likewise.
>   * gcc.target/aarch64/advsimd-intrinsics/vdot-3-1.c: Likewise.

I'm seeing some dot-product intrinsics failures:
FAIL: gcc.target/aarch64/advsimd-intrinsics/bfdot-2.c   -O1   
check-function-bodies ufoo_untied
FAIL: gcc.target/aarch64/advsimd-intrinsics/bfdot-2.c   -O1   
check-function-bodies ufooq_lane_untied
FAIL: gcc.target/aarch64/advsimd-intrinsics/bfdot-2.c   -O2   
check-function-bodies ufoo_untied
FAIL: gcc.target/aarch64/advsimd-intrinsics/bfdot-2.c   -O2   
check-function-bodies ufooq_lane_untied
FAIL: gcc.target/aarch64/advsimd-intrinsics/bfdot-2.c   -O2 -flto 
-fno-use-linker-plugin -flto-partition=none   check-function-bodies ufoo_untied
FAIL: gcc.target/aarch64/advsimd-intrinsics/bfdot-2.c   -O2 -flto 
-fno-use-linker-plugin -flto-partition=none   check-function-bodies 
ufooq_lane_untied
FAIL: gcc.target/aarch64/advsimd-intrinsics/bfdot-2.c   -O3 -g   
check-function-bodies ufoo_untied
FAIL: gcc.target/aarch64/advsimd-intrinsics/bfdot-2.c   -O3 -g   
check-function-bodies ufooq_lane_untied
FAIL: gcc.target/aarch64/advsimd-intrinsics/bfdot-2.c   -Og -g   
check-function-bodies ufoo_untied
FAIL: gcc.target/aarch64/advsimd-intrinsics/bfdot-2.c   -Og -g   
check-function-bodies ufooq_lane_untied
FAIL: gcc.target/aarch64/advsimd-intrinsics/bfdot-2.c   -Os   
check-function-bodies ufoo_untied
FAIL: gcc.target/aarch64/advsimd-intrinsics/bfdot-2.c   -Os   
check-function-bodies ufooq_lane_untied
FAIL: gcc.target/aarch64/advsimd-intrinsics/vdot-3-2.c   -O1   
check-function-bodies ufoo_untied
FAIL: gcc.target/aarch64/advsimd-intrinsics/vdot-3-2.c   -O1   
check-function-bodies ufooq_laneq_untied
FAIL: gcc.target/aarch64/advsimd-intrinsics/vdot-3-2.c   -O2   
check-function-bodies ufoo_untied
FAIL: gcc.target/aarch64/advsimd-intrinsics/vdot-3-2.c   -O2   
check-function-bodies ufooq_laneq_untied
FAIL: gcc.target/aarch64/advsimd-intrinsics/vdot-3-2.c   -O2 -flto 
-fno-use-linker-plugin -flto-partition=none   check-function-bodies ufoo_untied
FAIL: gcc.target/aarch64/advsimd-intrinsics/vdot-3-2.c   -O2 -flto 
-fno-use-linker-plugin -flto-partition=none   check-function-bodies 
ufooq_laneq_untied
FAIL: gcc.target/aarch64/advsimd-intrinsics/vdot-3-2.c   -O3 -g   
check-function-bodies ufoo_untied
FAIL: gcc.target/aarch64/advsimd-intrinsics/vdot-3-2.c   -O3 -g   
check-function-bodies ufooq_laneq_untied
FAIL: gcc.target/aarch64/advsimd-intrinsics/vdot-3-2.c   -Og -g   
check-function-bodies ufoo_untied
FAIL: gcc.target/aarch64/advsimd-intrinsics/vdot-3-2.c   -Og -g   
check-function-bodies ufooq_laneq_untied
FAIL: gcc.target/aarch64/advsimd-intrinsics/vdot-3-2.c   -Os   
check-function-bodies ufoo_untied
FAIL: gcc.target/aarch64/advsimd-intrinsics/vdot-3-2.c   -Os   
check-function-bodies ufooq_laneq_untied

From a quick inspection it looks like it's just an alternative regalloc that
moves the mov + dot instructions around, similar to what you fixed in
bfdot-2.c and vdot-3-2.c.
I guess they need a similar adjustment?
Thanks,
Kyrill

>   * gcc.target/aarch64/sve/acle/asm/adda_f16.c: Likewise.
>   * gcc.target/aarch64/sve/acle/asm/adda_f32.c: Likewise.
>   * gcc.target/aarch64/sve/acle/asm/adda_f64.c: Likewise.
>   * gcc.target/aarch64/sve/acle/asm/brka_b.c: Likewise.
>   * gcc.target/aarch64/sve/acle/asm/brkb_b.c: Likewise.
>   * gcc.target/aarch64/sve/acle/asm/brkn_b.c: Likewise.
>   * gcc.target/aarch64/sve/acle/asm/clasta_bf16.c: Likewise.
>   * gcc.target/aarch64/sve/acle/asm/clasta_f16.c: Likewise.
>   * gcc.target/aarch64/sve/acle/asm/clasta_f32.c: Likewise.
>   * gcc.target/aarch64/sve/acle/asm/clasta_f64.c: Likewise.
>   * gcc.target/aarch64/sve/acle/asm/clastb_bf16.c: Likewise.
>   * gcc.target/aarch64/sve/acle/asm/clastb_f16.c: Likewise.
>   * gcc.target/aarch64/sve/acle/asm/clastb_f32.c: Likewise.
>   * gcc.target/aarch64/sve/acle/asm/clastb_f64.c: Likewise.
>   * gcc.target/aarch64/sve/acle/asm/pfirst_b.c: Likewise.
>   * gcc.target/aarch64/sve/acle/asm/pnext_b16.c: Likewise.
>   * gcc.tar

Re: [PATCH RFC] c-family: make -fno-permissive upgrade pedwarns

2023-05-15 Thread Jason Merrill via Gcc-patches


On 5/15/23 03:32, Richard Biener wrote:

On Fri, May 12, 2023 at 10:54 PM Jason Merrill via Gcc-patches
 wrote:


In the context of the recent discussion, it occurred to me that this semantic
would be useful, but currently there is no easy way to access it.  Bikeshedding
welcome; the use of this flag is a bit odd, but it has the advantage of being
accepted without error going back at least to 4.3.

-- 8< --

Currently there is no flag to use to upgrade all currently-enabled pedwarns
from warning to error.  -pedantic-errors also enables the -Wpedantic
pedwarns, while -Werror=pedantic uselessly makes only the -Wpedantic
pedwarns errors.

I suggest that since -fpermissive lowers some diagnostics from error to
warning, -fno-permissive could do the reverse.


Hmm, but that makes '-fno-permissive' different from '-fpermissive
-fno-permissive'?
What about '-fpermissive -fno-permissive -fno-permissive' then?

So I think overloading -fno-permissive with different semantics from negating
the option is bad.


Fair enough.  Any other thoughts?  It occurs to me now that it is 
already possible to specify this behavior with -pedantic-errors 
-Wno-pedantic, maybe that's sufficient if a bit cumbersome.



gcc/ChangeLog:

 * doc/invoke.texi: Document -fno-permissive.

gcc/c-family/ChangeLog:

 * c.opt (fpermissive): Accept in C and ObjC as well.
 * c-opts.cc (c_common_post_options): -fno-permissive sets
 global_dc->pedantic_errors.
---
  gcc/doc/invoke.texi| 7 +++
  gcc/c-family/c.opt | 2 +-
  gcc/c-family/c-opts.cc | 4 
  3 files changed, 12 insertions(+), 1 deletion(-)

diff --git a/gcc/doc/invoke.texi b/gcc/doc/invoke.texi
index b92b8576027..6198df14382 100644
--- a/gcc/doc/invoke.texi
+++ b/gcc/doc/invoke.texi
@@ -3438,11 +3438,18 @@ issue.  Currently, the only such diagnostic issued by G++ is the one for
  a name having multiple meanings within a class.

  @opindex fpermissive
+@opindex fno-permissive
  @item -fpermissive
  Downgrade some diagnostics about nonconformant code from errors to
  warnings.  Thus, using @option{-fpermissive} allows some
  nonconforming code to compile.

+Conversely, @option{-fno-permissive} can be used to upgrade some
+diagnostics about nonconformant code from warnings to errors.  This
+differs from @option{-pedantic-errors} in that the latter also implies
+@option{-Wpedantic}; this option does not enable additional
+diagnostics, only upgrades the severity of those that are enabled.
+
  @opindex fno-pretty-templates
  @opindex fpretty-templates
  @item -fno-pretty-templates
diff --git a/gcc/c-family/c.opt b/gcc/c-family/c.opt
index cddeece..07165d2bbe8 100644
--- a/gcc/c-family/c.opt
+++ b/gcc/c-family/c.opt
@@ -2075,7 +2075,7 @@ C ObjC C++ ObjC++
  Look for and use PCH files even when preprocessing.

  fpermissive
-C++ ObjC++ Var(flag_permissive)
+C ObjC C++ ObjC++ Var(flag_permissive)
  Downgrade conformance errors to warnings.

  fplan9-extensions
diff --git a/gcc/c-family/c-opts.cc b/gcc/c-family/c-opts.cc
index c68a2a27469..1973c068d59 100644
--- a/gcc/c-family/c-opts.cc
+++ b/gcc/c-family/c-opts.cc
@@ -1021,6 +1021,10 @@ c_common_post_options (const char **pfilename)
SET_OPTION_IF_UNSET (&global_options, &global_options_set,
flag_delete_dead_exceptions, true);

+  if (!global_options_set.x_flag_pedantic_errors
+  && global_options_set.x_flag_permissive)
+global_dc->pedantic_errors = !flag_permissive;
+
if (cxx_dialect >= cxx11)
  {
/* If we're allowing C++0x constructs, don't warn about C++98

base-commit: 62c4d34ec005e95f000ffabd34da440dc62ac346
--
2.31.1
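
For readers following along, the rule in the c-opts.cc hunk above can be
looked at in isolation.  This is only a minimal sketch, not GCC code:
struct opts_sketch and resolve_pedantic_errors are hypothetical names, and
the four booleans merely stand in for
global_options_set.x_flag_pedantic_errors,
global_options_set.x_flag_permissive, flag_permissive and
global_dc->pedantic_errors.

```c
#include <stdbool.h>

/* Hypothetical stand-ins for the GCC globals touched by the hunk.  */
struct opts_sketch
{
  bool pedantic_errors_set; /* -pedantic-errors given explicitly?  */
  bool permissive_set;      /* -fpermissive or -fno-permissive given?  */
  bool flag_permissive;     /* value of the -fpermissive flag.  */
  bool pedantic_errors;     /* setting before the rule is applied.  */
};

/* Return the pedantic_errors value after applying the hunk's rule:
   an explicit -f[no-]permissive decides, unless -pedantic-errors was
   itself given explicitly.  */
static bool
resolve_pedantic_errors (const struct opts_sketch *o)
{
  if (!o->pedantic_errors_set && o->permissive_set)
    return !o->flag_permissive;
  return o->pedantic_errors;
}
```

Under these assumptions, -fno-permissive alone turns pedantic errors on,
-fpermissive alone turns them off, and an explicit -pedantic-errors is
left untouched either way.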

Re: Re: [PATCH] RISC-V: Add rounding mode operand for floating point instructions

2023-05-15 Thread 钟居哲

I don't know why we should not add frm to vfsqrt.v; I only saw that topper
(LLVM maintainer) said we should not add frm to vfsqrt.v.
Maybe kito knows the reason?

https://github.com/riscv-non-isa/rvv-intrinsic-doc/pull/226 




juzhe.zh...@rivai.ai
 
From: Jeff Law
Date: 2023-05-15 21:52
To: juzhe.zhong; gcc-patches
CC: kito.cheng; kito.cheng; palmer; palmer; rdapp.gcc
Subject: Re: [PATCH] RISC-V: Add rounding mode operand for floating point 
instructions
 
 
On 5/15/23 05:49, juzhe.zh...@rivai.ai wrote:
> From: Juzhe-Zhong 
> 
> This patch is adding rounding mode operand and FRM_REGNUM dependency
> into floating-point instructions.
> 
> The floating-point instructions we added FRM and rounding mode operand:
> 1. vfadd/vfsub
> 2. vfwadd/vfwsub
> 3. vfmul
> 4. vfdiv
> 5. vfwmul
> 6. vfwmacc/vfwnmacc/vfwmsac/vfwnmsac
> 7. vfsqrt7/vfrec7
> 8. floating-point conversions.
> 9. floating-point reductions.
> 
> The floating-point instructions we did NOT add FRM and rounding mode operand:
> 1. vfsqrt/vfneg
Assuming vfsqrt is actually an estimator, the best place to handle
rounding modes is at the last step(s) after N-R or Goldschmidt
refinement steps.  I haven't paid too much attention to FP yet, but this
is an area where I've got fairly extensive experience.
 
Sadly RISC-V's estimator is fairly poor and the single instance FMACs 
are going to result in an implementation that may not actually be any 
better than what glibc can do.
 
Jeff

Re: [PATCH] RISC-V: Add rounding mode operand for floating point instructions

2023-05-15 Thread Jeff Law via Gcc-patches





On 5/15/23 07:44, Kito Cheng wrote:

LGTM

Agreed.
jeff

Re: [PATCH] RISC-V: Add rounding mode operand for floating point instructions

2023-05-15 Thread Jeff Law via Gcc-patches





On 5/15/23 05:49, juzhe.zh...@rivai.ai wrote:

> From: Juzhe-Zhong
>
> This patch is adding rounding mode operand and FRM_REGNUM dependency
> into floating-point instructions.
>
> The floating-point instructions we added FRM and rounding mode operand:
> 1. vfadd/vfsub
> 2. vfwadd/vfwsub
> 3. vfmul
> 4. vfdiv
> 5. vfwmul
> 6. vfwmacc/vfwnmacc/vfwmsac/vfwnmsac
> 7. vfsqrt7/vfrec7
> 8. floating-point conversions.
> 9. floating-point reductions.
>
> The floating-point instructions we did NOT add FRM and rounding mode operand:
> 1. vfsqrt/vfneg
Assuming vfsqrt is actually an estimator, the best place to handle
rounding modes is at the last step(s) after N-R or Goldschmidt
refinement steps.  I haven't paid too much attention to FP yet, but this
is an area where I've got fairly extensive experience.


Sadly RISC-V's estimator is fairly poor and the single instance FMACs 
are going to result in an implementation that may not actually be any 
better than what glibc can do.


Jeff
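
To make the refinement idea concrete, here is a minimal sketch of
Newton-Raphson (N-R) refinement of a reciprocal square root estimate.  It
is an illustration under stated assumptions, not the RVV or glibc
implementation: rsqrt_step and sqrt_from_estimate are hypothetical names,
and the starting estimate y0 is passed in by the caller as a stand-in for
whatever a hardware estimate instruction would return.

```c
/* One Newton-Raphson step for y ~= 1/sqrt(x):
   y' = y * (1.5 - 0.5 * x * y * y).  */
static double
rsqrt_step (double x, double y)
{
  return y * (1.5 - 0.5 * x * y * y);
}

/* Refine a coarse reciprocal-square-root estimate Y0 (hypothetical
   stand-in for a hardware estimate) and form sqrt(x) = x * (1/sqrt(x)).
   The result is close to sqrt(x) but is not guaranteed to be correctly
   rounded; a real implementation needs a final correction step, and
   that last step is where the rounding mode would matter.  */
static double
sqrt_from_estimate (double x, double y0, int steps)
{
  double y = y0;
  for (int i = 0; i < steps; i++)
    y = rsqrt_step (x, y);
  return x * y;
}
```

For example, sqrt_from_estimate (2.0, 0.7, 4) starts from a roughly 1%
accurate estimate of 1/sqrt(2) and converges quadratically; the sketch
deliberately stops before the final rounding-sensitive correction.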

Re: [PATCH] RISC-V: Add rounding mode operand for floating point instructions

2023-05-15 Thread Kito Cheng via Gcc-patches

LGTM

Wrote on Mon, 15 May 2023, 19:50:

> From: Juzhe-Zhong 
>
> This patch is adding rounding mode operand and FRM_REGNUM dependency
> into floating-point instructions.
>
> The floating-point instructions we added FRM and rounding mode operand:
> 1. vfadd/vfsub
> 2. vfwadd/vfwsub
> 3. vfmul
> 4. vfdiv
> 5. vfwmul
> 6. vfwmacc/vfwnmacc/vfwmsac/vfwnmsac
> 7. vfsqrt7/vfrec7
> 8. floating-point conversions.
> 9. floating-point reductions.
>
> The floating-point instructions we did NOT add FRM and rounding mode
> operand:
> 1. vfsqrt/vfneg
> 2. vfmin/vfmax
> 3. comparisons
> 4. vfclass
> 5. vfsgnj/vfsgnjn/vfsgnjx
> 6. vfmerge
> 7. vfmv.v.f
>
> TODO: floating-point ternary: FRM and rounding mode operand should be
> added, but they are not added in this patch since that would exceed the
> number of operands that can be handled in optabs.cc. They will be added
> in the next patch.
>
> gcc/ChangeLog:
>
> * config/riscv/riscv-protos.h (enum frm_field_enum): New enum.
> * config/riscv/riscv-vector-builtins.cc
> (function_expander::use_widen_ternop_insn): Add default rounding mode.
> * config/riscv/riscv.cc (riscv_hard_regno_nregs): Add FRM_REGNUM.
> (riscv_hard_regno_mode_ok): Ditto.
> (riscv_conditional_register_usage): Ditto.
> * config/riscv/riscv.h (DWARF_FRAME_REGNUM): Ditto.
> (FRM_REG_P): Ditto.
> (RISCV_DWARF_FRM): Ditto.
> * config/riscv/riscv.md: Ditto.
> * config/riscv/vector-iterators.md: split smax/smin and plus/mult
> since smax/smin doesn't need FRM.
> * config/riscv/vector.md (@pred__scalar): Splitted
> pattern.
> (@pred_): Ditto.
>
> ---
>  gcc/config/riscv/riscv-protos.h   |  10 ++
>  gcc/config/riscv/riscv-vector-builtins.cc |   7 +
>  gcc/config/riscv/riscv.cc |   7 +-
>  gcc/config/riscv/riscv.h  |   7 +-
>  gcc/config/riscv/riscv.md |   1 +
>  gcc/config/riscv/vector-iterators.md  |   6 +-
>  gcc/config/riscv/vector.md| 171 ++
>  7 files changed, 171 insertions(+), 38 deletions(-)
>
> diff --git a/gcc/config/riscv/riscv-protos.h b/gcc/config/riscv/riscv-protos.h
> index 835bb802fc6..12634d0ac1a 100644
> --- a/gcc/config/riscv/riscv-protos.h
> +++ b/gcc/config/riscv/riscv-protos.h
> @@ -231,6 +231,16 @@ enum vxrm_field_enum
>VXRM_RDN,
>VXRM_ROD
>  };
> +/* Rounding mode bitfield for floating point FRM.  */
> +enum frm_field_enum
> +{
> +  FRM_RNE = 0b000,
> +  FRM_RTZ = 0b001,
> +  FRM_RDN = 0b010,
> +  FRM_RUP = 0b011,
> +  FRM_RMM = 0b100,
> +  DYN = 0b111
> +};
>  }
>
>  /* We classify builtin types into two classes:
> diff --git a/gcc/config/riscv/riscv-vector-builtins.cc b/gcc/config/riscv/riscv-vector-builtins.cc
> index 1de075fb90d..f10f38f6425 100644
> --- a/gcc/config/riscv/riscv-vector-builtins.cc
> +++ b/gcc/config/riscv/riscv-vector-builtins.cc
> @@ -3482,6 +3482,13 @@ function_expander::use_widen_ternop_insn (insn_code icode)
>add_input_operand (Pmode, get_tail_policy_for_pred (pred));
>add_input_operand (Pmode, get_mask_policy_for_pred (pred));
>add_input_operand (Pmode, get_avl_type_rtx (avl_type::NONVLMAX));
> +
> +  /* TODO: Currently, we don't support intrinsic that is modeling rounding mode.
> + We add default rounding mode for the intrinsics that didn't model rounding
> + mode yet.  */
> +  if (opno != insn_data[icode].n_generator_args)
> +add_input_operand (Pmode, const0_rtx);
> +
>return generate_insn (icode);
>  }
>
> diff --git a/gcc/config/riscv/riscv.cc b/gcc/config/riscv/riscv.cc
> index b52e613c629..de5b87b1a87 100644
> --- a/gcc/config/riscv/riscv.cc
> +++ b/gcc/config/riscv/riscv.cc
> @@ -6082,7 +6082,8 @@ riscv_hard_regno_nregs (unsigned int regno, machine_mode mode)
>
>/* mode for VL or VTYPE are just a marker, not holding value,
>   so it always consume one register.  */
> -  if (VTYPE_REG_P (regno) || VL_REG_P (regno) || VXRM_REG_P (regno))
> +  if (VTYPE_REG_P (regno) || VL_REG_P (regno) || VXRM_REG_P (regno)
> +  || FRM_REG_P (regno))
>  return 1;
>
>/* Assume every valid non-vector mode fits in one vector register.  */
> @@ -6150,7 +6151,8 @@ riscv_hard_regno_mode_ok (unsigned int regno, machine_mode mode)
>if (lmul != 1)
> return ((regno % lmul) == 0);
>  }
> -  else if (VTYPE_REG_P (regno) || VL_REG_P (regno) || VXRM_REG_P (regno))
> +  else if (VTYPE_REG_P (regno) || VL_REG_P (regno) || VXRM_REG_P (regno)
> +  || FRM_REG_P (regno))
>  return true;
>else
>  return false;
> @@ -6587,6 +6589,7 @@ riscv_conditional_register_usage (void)
>fixed_regs[VTYPE_REGNUM] = call_used_regs[VTYPE_REGNUM] = 1;
>fixed_regs[VL_REGNUM] = call_used_regs[VL_REGNUM] = 1;
>fixed_regs[VXRM_REGNUM] = call_used_regs[VXRM_REGNUM] = 1;
> +  fixed_regs[FRM_REGNUM] = call_used_regs[FRM_REGNUM] = 1;
>  }
>  }
>
> diff --git a/gcc/config/riscv/riscv.h b/gcc/config/riscv/riscv.h
> index

Re: [PATCH] RISC-V: Add FRM and rounding mode operand into floating-point ternary instructions

2023-05-15 Thread Jeff Law via Gcc-patches





On 5/15/23 06:22, juzhe.zh...@rivai.ai wrote:

> From: Juzhe-Zhong
>
> This patch is adding FRM and rounding mode into floating-point ternary
> instructions.
> This patch should be merged after optabs.cc patch.
>
> gcc/ChangeLog:
>
>  * config/riscv/riscv-vector-builtins.cc
> (function_expander::use_ternop_insn): Add default rounding mode.
>  * config/riscv/vector.md: Add rounding mode operand and FRM_REGNUM.

OK
jeff

Re: [PATCH V3] RISC-V: Add rounding mode operand for fixed-point patterns

2023-05-15 Thread Jeff Law via Gcc-patches





On 5/15/23 04:25, juzhe.zh...@rivai.ai wrote:

> From: Juzhe-Zhong
>
> Since we are going to have fixed-point intrinsics that model the rounding
> mode
> https://github.com/riscv-non-isa/rvv-intrinsic-doc/pull/222
>
> we should have an operand to specify the rounding mode in fixed-point
> instructions.
> We don't support these rounding-mode intrinsics yet, but we will definitely
> support them later.
>
> This is the preparatory patch for the upcoming intrinsics.
>
> gcc/ChangeLog:
>
>  * config/riscv/riscv-protos.h (enum vxrm_field_enum): New enum.
>  * config/riscv/riscv-vector-builtins.cc
> (function_expander::use_exact_insn): Add default rounding mode operand.
>  * config/riscv/riscv.cc (riscv_hard_regno_nregs): Add VXRM_REGNUM.
>  (riscv_hard_regno_mode_ok): Ditto.
>  (riscv_conditional_register_usage): Ditto.
>  * config/riscv/riscv.h (DWARF_FRAME_REGNUM): Ditto.
>  (VXRM_REG_P): Ditto.
>  (RISCV_DWARF_VXRM): Ditto.
>  * config/riscv/riscv.md: Ditto.
>  * config/riscv/vector.md: Ditto.

OK.
jeff

Re: [PATCH] Fix assertion for unwind-dw2-fde.c btree changes

2023-05-15 Thread Jeff Law via Gcc-patches





On 5/15/23 07:05, Thomas Neumann via Gcc-patches wrote:
> > Hello, this patch breaks the build on targets where range is not
> > declared i.e. where the #ifdef ATOMIC_FDE_FAST_PATH path is not taken.
>
> argh, I did not realize I tested the patch only on atomic fast path
> platforms. The patch below fixes that by moving the check inside the
> #ifdef.
>
> I will check that everything works on atomic and non-atomic platforms
> and commit the trivial move then. Sorry for the breakage.
>
> Best
>
> Thomas
>
> From 550dc27f547a067e96137adeb85148d8a84c81a0 Mon Sep 17 00:00:00 2001
> From: Thomas Neumann
> Date: Mon, 15 May 2023 14:59:22 +0200
> Subject: [PATCH] fix assert in non-atomic path
>
> The non-atomic path does not have range information,
> we have to adjust the assert to handle that case, too.
>
> libgcc/ChangeLog:
>  * unwind-dw2-fde.c: Fix assert in non-atomic path.

OK for the trunk.
jeff

Re: [PATCH v9] RISC-V: Add the 'zfa' extension, version 0.2

2023-05-15 Thread jinma via Gcc-patches

According to Jeff's review feedback, the issues regarding UNSPEC's 
implementation of round, ceil, nearbyint, etc. still need to be determined:
https://gcc.gnu.org/pipermail/gcc-patches/2023-May/617706.html

source: 
https://github.com/majin2020/gcc-mirror/commit/93d7a2d995cee588d494d1839f56e8151c6cb057

RE: [PATCH] Fix assertion for unwind-dw2-fde.c btree changes

2023-05-15 Thread Kyrylo Tkachov via Gcc-patches



> -Original Message-
> From: Thomas Neumann 
> Sent: Monday, May 15, 2023 2:06 PM
> To: Kyrylo Tkachov ; Richard Biener
> 
> Cc: Sören Tempel ; gcc-patches@gcc.gnu.org;
> al...@ayaya.dev
> Subject: Re: [PATCH] Fix assertion for unwind-dw2-fde.c btree changes
> 
> > Hello, this patch breaks the build on targets where range is not declared
> > i.e. where the #ifdef ATOMIC_FDE_FAST_PATH path is not taken.
> 
> argh, I did not realize I tested the patch only on atomic fast path
> platforms. The patch below fixes that by moving the check inside the #ifdef.
> 
> I will check that everything works on atomic and non-atomic platforms
> and commit the trivial move then. Sorry for the breakage.

Thanks for the quick fix. I can confirm the aarch64 build succeeds now.
Kyrill

> 
> Best
> 
> Thomas
> 
> 
> 
>  From 550dc27f547a067e96137adeb85148d8a84c81a0 Mon Sep 17 00:00:00 2001
> From: Thomas Neumann 
> Date: Mon, 15 May 2023 14:59:22 +0200
> Subject: [PATCH] fix assert in non-atomic path
> 
> The non-atomic path does not have range information,
> we have to adjust the assert to handle that case, too.
> 
> libgcc/ChangeLog:
>   * unwind-dw2-fde.c: Fix assert in non-atomic path.
> ---
>   libgcc/unwind-dw2-fde.c | 4 +++-
>   1 file changed, 3 insertions(+), 1 deletion(-)
> 
> diff --git a/libgcc/unwind-dw2-fde.c b/libgcc/unwind-dw2-fde.c
> index 8683a65aa02..df461a1527d 100644
> --- a/libgcc/unwind-dw2-fde.c
> +++ b/libgcc/unwind-dw2-fde.c
> @@ -240,6 +240,7 @@ __deregister_frame_info_bases (const void *begin)
> 
> // And remove
> ob = btree_remove (&registered_frames, range[0]);
> +  bool empty_table = (range[1] - range[0]) == 0;
>   #else
> init_object_mutex_once ();
> __gthread_mutex_lock (&object_mutex);
> @@ -276,11 +277,12 @@ __deregister_frame_info_bases (const void *begin)
> 
>out:
> __gthread_mutex_unlock (&object_mutex);
> +  bool empty_table = false;
>   #endif
> 
> // If we didn't find anything in the lookup data structures then they
> // were either already destroyed or we tried to remove an empty range.
> -  gcc_assert (in_shutdown || ((range[1] - range[0]) == 0 || ob));
> +  gcc_assert (in_shutdown || (empty_table || ob));
> return (void *) ob;
>   }
> 
> --
> 2.39.2
>

[PATCH v9] RISC-V: Add the 'zfa' extension, version 0.2

2023-05-15 Thread Jin Ma via Gcc-patches

This patch adds the 'Zfa' extension for riscv, which is based on:
https://github.com/riscv/riscv-isa-manual/commits/zfb

The binutils-gdb for 'Zfa' extension:
https://sourceware.org/pipermail/binutils/2023-April/127060.html

What needs special explanation is:
1, The immediate operand of the instructions FLI.H/S/D is represented in the
  assembly as a floating-point value, using scientific notation when rs1 is
  2 or 3, and decimal notation for the rest.

  Related llvm link:
https://reviews.llvm.org/D145645
  Related discussion link:
https://github.com/riscv/riscv-isa-manual/issues/980

2, According to riscv-spec, "The FCVTMOD.W.D instruction was added principally
  to accelerate the processing of JavaScript Numbers.", so it seems that no
  implementation is required.

3, The instructions FMINM and FMAXM correspond to the C23 library functions
  fminimum and fmaximum.  Therefore, this patch simply implements the
  patterns fminm3 and fmaxm3 to prepare for later use.
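
As a rough illustration of point 3, the semantics that distinguish C23
fminimum from the older fmin can be sketched in plain C.  This is a
minimal sketch under stated assumptions, not the patch's implementation:
fminimum_sketch, is_nan_sketch and is_negative_zero are hypothetical
helper names, and the sketch only models the two properties usually cited
for fminimum (a NaN operand yields NaN, and -0 is ordered below +0).

```c
#include <math.h>     /* NAN macro only */
#include <stdbool.h>

/* A NaN is the only value that compares unequal to itself.  */
static bool
is_nan_sketch (double x)
{
  return x != x;
}

/* True for -0.0 only: both zeros compare equal, so inspect the sign
   of 1/x instead (-inf for -0.0, +inf for +0.0).  */
static bool
is_negative_zero (double x)
{
  return x == 0.0 && 1.0 / x < 0.0;
}

/* Sketch of fminimum-style semantics.  */
static double
fminimum_sketch (double a, double b)
{
  if (is_nan_sketch (a) || is_nan_sketch (b))
    return NAN;                          /* NaN in, NaN out.  */
  if (a == 0.0 && b == 0.0)
    return is_negative_zero (a) ? a : b; /* -0 is treated as less than +0.  */
  return a < b ? a : b;
}
```

By contrast, fmin returns the non-NaN operand when exactly one operand is
NaN, which is why a separate pattern is needed.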

gcc/ChangeLog:

* common/config/riscv/riscv-common.cc: Add zfa extension version.
* config/riscv/constraints.md (zfli): Constrain the floating point 
number that the
instructions FLI.H/S/D can load.
* config/riscv/iterators.md (ceil): New.
(rup): New.
* config/riscv/riscv-opts.h (MASK_ZFA): New.
(TARGET_ZFA): New.
* config/riscv/riscv-protos.h (riscv_float_const_rtx_index_for_fli): 
New.
* config/riscv/riscv.cc (riscv_float_const_rtx_index_for_fli): New.
(riscv_cannot_force_const_mem): If instruction FLI.H/S/D can be used, 
memory is not applicable.
(riscv_const_insns): Likewise.
(riscv_legitimize_const_move): Likewise.
(riscv_split_64bit_move_p): If instruction FLI.H/S/D can be used, no 
split is required.
(riscv_split_doubleword_move): Likewise.
(riscv_output_move): Output the mov instructions in zfa extension.
(riscv_print_operand): Output the floating-point value of the FLI.H/S/D 
immediate in assembly
(riscv_secondary_memory_needed): Likewise.
* config/riscv/riscv.md (fminm3): New.
(fmaxm3): New.
(movsidf2_low_rv32): New.
(movsidf2_high_rv32): New.
(movdfsisi3_rv32): New.
(f_quiet4_zfa): Likewise.

gcc/testsuite/ChangeLog:

* gcc.target/riscv/zfa-fleq-fltq-rv32.c: New test.
* gcc.target/riscv/zfa-fleq-fltq.c: New test.
* gcc.target/riscv/zfa-fli-rv32.c: New test.
* gcc.target/riscv/zfa-fli-zfh-rv32.c: New test.
* gcc.target/riscv/zfa-fli-zfh.c: New test.
* gcc.target/riscv/zfa-fli.c: New test.
* gcc.target/riscv/zfa-fmovh-fmovp-rv32.c: New test.
* gcc.target/riscv/zfa-fround-rv32.c: New test.
* gcc.target/riscv/zfa-fround.c: New test.
---
 gcc/common/config/riscv/riscv-common.cc   |   4 +
 gcc/config/riscv/constraints.md   |  21 +-
 gcc/config/riscv/iterators.md |   5 +
 gcc/config/riscv/riscv-opts.h |   3 +
 gcc/config/riscv/riscv-protos.h   |   1 +
 gcc/config/riscv/riscv.cc | 204 +-
 gcc/config/riscv/riscv.md | 145 +++--
 .../gcc.target/riscv/zfa-fleq-fltq-rv32.c |  19 ++
 .../gcc.target/riscv/zfa-fleq-fltq.c  |  19 ++
 gcc/testsuite/gcc.target/riscv/zfa-fli-rv32.c |  79 +++
 .../gcc.target/riscv/zfa-fli-zfh-rv32.c   |  41 
 gcc/testsuite/gcc.target/riscv/zfa-fli-zfh.c  |  41 
 gcc/testsuite/gcc.target/riscv/zfa-fli.c  |  79 +++
 .../gcc.target/riscv/zfa-fmovh-fmovp-rv32.c   |  10 +
 .../gcc.target/riscv/zfa-fround-rv32.c|  42 
 gcc/testsuite/gcc.target/riscv/zfa-fround.c   |  42 
 16 files changed, 719 insertions(+), 36 deletions(-)
 create mode 100644 gcc/testsuite/gcc.target/riscv/zfa-fleq-fltq-rv32.c
 create mode 100644 gcc/testsuite/gcc.target/riscv/zfa-fleq-fltq.c
 create mode 100644 gcc/testsuite/gcc.target/riscv/zfa-fli-rv32.c
 create mode 100644 gcc/testsuite/gcc.target/riscv/zfa-fli-zfh-rv32.c
 create mode 100644 gcc/testsuite/gcc.target/riscv/zfa-fli-zfh.c
 create mode 100644 gcc/testsuite/gcc.target/riscv/zfa-fli.c
 create mode 100644 gcc/testsuite/gcc.target/riscv/zfa-fmovh-fmovp-rv32.c
 create mode 100644 gcc/testsuite/gcc.target/riscv/zfa-fround-rv32.c
 create mode 100644 gcc/testsuite/gcc.target/riscv/zfa-fround.c

diff --git a/gcc/common/config/riscv/riscv-common.cc b/gcc/common/config/riscv/riscv-common.cc
index 3a285dfbff0..550f6796e98 100644
--- a/gcc/common/config/riscv/riscv-common.cc
+++ b/gcc/common/config/riscv/riscv-common.cc
@@ -217,6 +217,8 @@ static const struct riscv_ext_version riscv_ext_version_table[] =
   {"zfh",   ISA_SPEC_CLASS_NONE, 1, 0},
   {"zfhmin",ISA_SPEC_CLASS_NONE, 1, 0},
 
+  {"zfa", ISA_SPEC_CLASS_NONE, 0, 2},
+
   {"zmmul", ISA_SPEC_CLASS_NONE, 1, 0},
 
   {"svinval", ISA_SPEC_CLASS_NONE, 1, 0},
@@ -1260,6 +1262,8 @@ static const riscv_ext_flag_

RE: middle-end: Support early break/return auto-vectorization.

2023-05-15 Thread Tamar Christina via Gcc-patches

Hi,

Yes, I hope to upstream it this year.  I'm busy cleaning up a new version of the
patch and hope to send it up for review again next week if all tests pass.

Cheers,
Tamar

From: juzhe.zh...@rivai.ai 
Sent: Monday, May 15, 2023 6:20 AM
To: gcc-patches 
Cc: rguenther ; Tamar Christina ; 
Richard Sandiford 
Subject: middle-end: Support early break/return auto-vectorization.

Hi, this is a very interesting patch, and I found it very beneficial after
applying it to my downstream RVV GCC.
However, this patch has not been updated for a long time.
Is it possible that this patch will be refined and merged into trunk in the
future?

Thanks

juzhe.zh...@rivai.ai

Re: [PATCH 1/2] PR gcc/98350:Add a param to control the length of the chain with FMA in reassoc pass

2023-05-15 Thread Richard Biener via Gcc-patches

On Fri, May 12, 2023 at 11:05 AM Cui, Lili  wrote:
>
> > ISTR there were no sufficient comments in the code explaining why
> > rewrite_expr_tree_parallel_for_fma is better by design.  In fact ...
> >
> > >
> > > >
> > > > >   if (!reassoc_insert_powi_p
> > > > > - && ops.length () > 3
> > > > > + && len > 3
> > > > > + && (!keep_fma_chain
> > > > > + || (keep_fma_chain
> > > > > + && len >
> > > > > + param_reassoc_max_chain_length_with_fma))
> > > >
> > > > in the case len < param_reassoc_max_chain_length_with_fma we have
> > > > the chain re-sorted but fall through to non-parallel rewrite.  I
> > > > wonder if we do not want to instead adjust the reassociation width?
> > > > I'd say it depends on the number of mult cases in the chain (sth the
> > > > re-sorting could have computed).
> > > > Why do we have two completely independent --params here?  Can you
> > > > give an example --param value combination that makes "sense" and
> > > > show how it is beneficial?
> > >
> > > For this small case https://godbolt.org/z/Pxczrre8P
> > > a * b + c * d + e * f + j
> > >
> > > GCC trunk: ops_num = 4, targetm.sched.reassociation_width is 4 (scalar fp
> > > cost is 4). Calculated: Width = 2. we can get 2 FMAs.
> > > --
> > >   _1 = a_6(D) * b_7(D);
> > >   _2 = c_8(D) * d_9(D);
> > >   _5 = _1 + _2;
> > >   _4 = e_10(D) * f_11(D);
> > >   _3 = _4 + j_12(D);
> > >   _13 = _3 + _5;
> > > 
> > >   _2 = c_8(D) * d_9(D);
> > >   _5 = .FMA (a_6(D), b_7(D), _2);
> > >   _3 = .FMA (e_10(D), f_11(D), j_12(D));
> > >   _13 = _3 + _5;
> > > 
> > > New patch: If just rearrange ops and fall through to parallel rewrite to
> > > break the chain with width = 2.
> > >
> > > -
> > >   _1 = a_6(D) * b_7(D);
> > >   _2 = j + _1;  -> put j at the first.
> > >   _3 = c_8(D) * d_9(D);
> > >   _4 = e_10(D) * f_11(D);
> > >   _5 = _3 + _4;   -> break chain with width = 2. we lost a FMA here.
> > >   _13 = _2 + _5;
> > >
> > > ---
> > >   _3 = c_8(D) * d_9(D);
> > >   _2 = .FMA (a_6(D), b_7(D), j);
> > >   _5 = .FMA (e_10(D), f_11(D), _3);
> > >   _13 = _2 + _5;
> > > 
> > > Sometimes break chain will lose FMA( break chain needs put two
> > > mult-ops together, which will lose one FMA ), we can only get 2 FMAs
> > > here, if we want to get 3 FMAs, we need to keep the chain and not
> > > break it. So I added a param to control chain length
> > > "param_reassoc_max_chain_length_with_fma = 4" (For the small case in
> > > Bugzilla 98350, we need to keep the chain to generate 6 FMAs.)
> > > ---
> > >   _1 = a_6(D) * b_7(D);
> > >   _2 = c_8(D) * d_9(D);
> > >   _4 = e_10(D) * f_11(D);
> > >   _15 = _4 + j_12(D);
> > >   _16 = _15 + _2;
> > >   _13 = _16 + _1;
> > > ---
> > >   _15 = .FMA (e_10(D), f_11(D), j_12(D));
> > >   _16 = .FMA (c_8(D), d_9(D), _15);
> > >   _13 = .FMA (a_6(D), b_7(D), _16);
> > > ---
> > > In some case we want to break the chain with width, we can set
> > > "param_reassoc_max_chain_length_with_fma = 2", it will rearrange ops and
> > > break the chain with width.
> >
> > ... it sounds like the problem could be fully addressed by sorting the chain
> > with reassoc-width in mind?
> > Wouldn't it be preferable if rewrite_expr_tree_parallel would get a vector of
> > mul and a vector of non-mul ops so it can pick from the optimal candidate?
> >
> > That said, I think rewrite_expr_tree_parallel_for_fma at least needs more
> > comments.
> >
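
The trade-off in the quoted example (a * b + c * d + e * f + j) can be
written out in C to compare the two shapes.  This is only a sketch to
count operations and show dependencies: fma_sketch is a hypothetical,
non-fused stand-in for the .FMA internal function, and chain_kept and
chain_broken are illustrative names, not reassoc code.

```c
/* Stand-in for .FMA: a * b + c.  It is not fused (two roundings),
   since only the shape of the expression matters here.  */
static double
fma_sketch (double a, double b, double c)
{
  return a * b + c;
}

/* Keep the chain: 3 FMAs, but each depends on the previous result.  */
static double
chain_kept (double a, double b, double c, double d,
            double e, double f, double j)
{
  double t15 = fma_sketch (e, f, j);
  double t16 = fma_sketch (c, d, t15);
  return fma_sketch (a, b, t16);
}

/* Break the chain with width = 2: 2 FMAs, 1 multiply and 1 add,
   with the two FMAs independent of each other.  */
static double
chain_broken (double a, double b, double c, double d,
              double e, double f, double j)
{
  double t3 = c * d;
  double t2 = fma_sketch (a, b, j);
  double t5 = fma_sketch (e, f, t3);
  return t2 + t5;
}
```

Both functions compute the same sum for exactly representable inputs; the
first has a longer dependency chain, the second has one FMA fewer.  Which
is faster depends on the target, which is the reason given for the param.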
> Sorry for not writing the notes clearly enough, I'll add more.
> I have two places that need to be clarified.
>
> 1. For some cases we need to keep the chain to generate more FMAs, because
> breaking the chain will lose FMAs.
>    For example, g + a * b + c * d + e * f:
>    keeping the chain gets 3 FMAs, breaking the chain gets 2 FMAs. It's hard
> to say which one is better, so we provide a param for users to customize.
>
> 2. When the chain has FMAs and we need to break the chain with width,
> for example l + a * b + c * d + e * f + g * h + j * k; (we already put non-mul
> first)
> rewrite_expr_tree_parallel:
> when width = 2, it will break the chain like this. Actually it breaks the
> chain into 3. It ignores the width and adds all ops two by two, so it will
> lose FMAs.
>
> ssa1 = l + a * b;
> ssa2 = c * d + e * f;
> ssa3 = g * h + j * k;
> ssa4 = ssa1 + ssa2;
> ssa5 = ssa4 + ssa3;
>
> rewrite_expr_tree_parallel_for_fma
> when width = 2, we break the chain into two like this.
>
> ssa1 = l + a * b;
> ssa2 = c * d +

Re: [PATCH] Fix assertion for unwind-dw2-fde.c btree changes

2023-05-15 Thread Thomas Neumann via Gcc-patches


> Hello, this patch breaks the build on targets where range is not declared i.e.
> where the #ifdef ATOMIC_FDE_FAST_PATH path is not taken.


argh, I did not realize I tested the patch only on atomic fast path 
platforms. The patch below fixes that by moving the check inside the #ifdef.


I will check that everything works on atomic and non-atomic platforms 
and commit the trivial move then. Sorry for the breakage.


Best

Thomas



From 550dc27f547a067e96137adeb85148d8a84c81a0 Mon Sep 17 00:00:00 2001
From: Thomas Neumann 
Date: Mon, 15 May 2023 14:59:22 +0200
Subject: [PATCH] fix assert in non-atomic path

The non-atomic path does not have range information,
we have to adjust the assert to handle that case, too.

libgcc/ChangeLog:
* unwind-dw2-fde.c: Fix assert in non-atomic path.
---
 libgcc/unwind-dw2-fde.c | 4 +++-
 1 file changed, 3 insertions(+), 1 deletion(-)

diff --git a/libgcc/unwind-dw2-fde.c b/libgcc/unwind-dw2-fde.c
index 8683a65aa02..df461a1527d 100644
--- a/libgcc/unwind-dw2-fde.c
+++ b/libgcc/unwind-dw2-fde.c
@@ -240,6 +240,7 @@ __deregister_frame_info_bases (const void *begin)

   // And remove
   ob = btree_remove (&registered_frames, range[0]);
+  bool empty_table = (range[1] - range[0]) == 0;
 #else
   init_object_mutex_once ();
   __gthread_mutex_lock (&object_mutex);
@@ -276,11 +277,12 @@ __deregister_frame_info_bases (const void *begin)

  out:
   __gthread_mutex_unlock (&object_mutex);
+  bool empty_table = false;
 #endif

   // If we didn't find anything in the lookup data structures then they
   // were either already destroyed or we tried to remove an empty range.
-  gcc_assert (in_shutdown || ((range[1] - range[0]) == 0 || ob));
+  gcc_assert (in_shutdown || (empty_table || ob));
   return (void *) ob;
 }

--
2.39.2
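
The shape of the fix can be shown outside libgcc.  The following is a
minimal sketch, not the unwinder code: SKETCH_FAST_PATH is a hypothetical
stand-in for ATOMIC_FDE_FAST_PATH, and removal_is_consistent is an
illustrative name.  The point is that a flag is defined in both
preprocessor branches, so the shared check after the #endif never refers
to a variable (range) that exists in only one of them.

```c
#include <stdbool.h>
#include <stddef.h>

/* Hypothetical stand-in for ATOMIC_FDE_FAST_PATH; left undefined here so
   the fallback branch is the one that gets compiled.  */
/* #define SKETCH_FAST_PATH 1 */

/* Return the condition the shared assertion would check.  */
static bool
removal_is_consistent (bool in_shutdown, const void *ob,
                       const unsigned long *range)
{
#ifdef SKETCH_FAST_PATH
  /* Only this path has range information.  */
  bool empty_table = (range[1] - range[0]) == 0;
#else
  /* No range information here, so never claim the table was empty.  */
  (void) range;
  bool empty_table = false;
#endif
  return in_shutdown || empty_table || ob != NULL;
}
```

With the macro undefined, range is never dereferenced, which mirrors why
the original assertion failed to compile on non-atomic targets.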

Re: [PATCH] OPTABS: Extend the number of expanding instructions pattern.

2023-05-15 Thread Richard Biener via Gcc-patches

On Mon, 15 May 2023, juzhe.zh...@rivai.ai wrote:

> From: Juzhe-Zhong 
> 
> Hi, Richi.
> 
> We (RVV) are going to add a rounding mode operand to floating-point
> instructions, which will then have 11 operands.
> 
> This is because we are going to have intrinsics that add a rounding mode argument:
> https://github.com/riscv-non-isa/rvv-intrinsic-doc/pull/226
> 
> This is the patch that is adding rounding mode operand in RISC-V port:
> https://gcc.gnu.org/pipermail/gcc-patches/2023-May/618573.html
> You can see there are 11 operands in these patterns.
> 
> Is it Ok for trunk ?

OK.

Richard.

> Thanks
> 
> gcc/ChangeLog:
> 
> * optabs.cc (maybe_gen_insn): Add case to generate instruction that 
> has 11 operands.
> 
> ---
>  gcc/optabs.cc | 5 +
>  1 file changed, 5 insertions(+)
> 
> diff --git a/gcc/optabs.cc b/gcc/optabs.cc
> index c8e39c82d57..a12333c7169 100644
> --- a/gcc/optabs.cc
> +++ b/gcc/optabs.cc
> @@ -8139,6 +8139,11 @@ maybe_gen_insn (enum insn_code icode, unsigned int nops,
> ops[3].value, ops[4].value, ops[5].value,
> ops[6].value, ops[7].value, ops[8].value,
> ops[9].value);
> +case 11:
> +  return GEN_FCN (icode) (ops[0].value, ops[1].value, ops[2].value,
> +   ops[3].value, ops[4].value, ops[5].value,
> +   ops[6].value, ops[7].value, ops[8].value,
> +   ops[9].value, ops[10].value);
>  }
>gcc_unreachable ();
>  }
> 

-- 
Richard Biener 
SUSE Software Solutions Germany GmbH, Frankenstrasse 146, 90461 Nuernberg,
Germany; GF: Ivo Totev, Andrew Myers, Andrew McDonald, Boudien Moerman;
HRB 36809 (AG Nuernberg)

RE: [PATCH] Fix assertion for unwind-dw2-fde.c btree changes

2023-05-15 Thread Kyrylo Tkachov via Gcc-patches



> -Original Message-
> From: Gcc-patches  bounces+kyrylo.tkachov=arm@gcc.gnu.org> On Behalf Of Richard Biener
> via Gcc-patches
> Sent: Monday, May 15, 2023 8:59 AM
> To: Thomas Neumann 
> Cc: Sören Tempel ; gcc-patches@gcc.gnu.org;
> al...@ayaya.dev
> Subject: Re: [PATCH] Fix assertion for unwind-dw2-fde.c btree changes
> 
> On Sun, May 14, 2023 at 9:00 PM Thomas Neumann via Gcc-patches
>  wrote:
> >
> > Dear Sören,
> >
> > > we ran into a regression introduced by these changes. The regression
> > > manifests itself in a failing assertion in __deregister_frame_info_bases.
> > > The assertion failure was observed while using Chromium's `flatc` build
> > > system tool. The failing assertion is:
> > >
> > >   unwind-dw2-fde.c:281  gcc_assert (in_shutdown || ob);
> > > [snip]
> > > However, I believe there is one more edge case that isn't being accounted
> > > for presently: If the inserted entry has a size of 0 (i.e. if range[1] -
> > > range[0] == 0) then the btree_insert call in __register_frame_info_bases
> > > will not insert anything. This is not accounted for in
> > > [snip]
> > >
> > > Would be cool if this could be fixed on the GCC trunk.
> >
> > thanks for the detailed analysis and the patch, it looks obviously
> > correct to me. I can apply it to trunk, but we need approval from a gcc
> > maintainer first.
> 
> The patch is OK for trunk and affected branches.
> 

Hello, this patch breaks the build on targets where range is not declared i.e. 
where the #ifdef ATOMIC_FDE_FAST_PATH path is not taken.

Thanks,
Kyrill

> Thanks,
> Richard.
> 
> > But independent of your patch, do you have the test case available in
> > some easily accessible form, for example a docker image or an automated
> > build script? I ask because something odd is happening here, somebody
> > registered a non-empty EH that does not contain a single unwind range. I
> > am puzzled why anybody would do that, I would like to double check that
> > this is indeed the intended behavior and not a bug somewhere else. Or if
> > you have the test case at hand, it would be great if you could do a
> > quick step through get_pc_range for the affected frame to double-check
> > that the table is indeed empty and we don't miss an entry for some
> > strange reason.
> >
> > Best
> >
> > Thomas
> >
> >
> >

RE: [PATCH] vect: Missed opportunity to use [SU]ABD

2023-05-15 Thread Oluwatamilore Adebayo via Gcc-patches



From: Oluwatamilore Adebayo 
Sent: Wednesday, May 10, 2023 14:29
To: Richard Biener ; gcc-patches@gcc.gnu.org; 
Richard Sandiford 
Subject: Re: [PATCH] vect: Missed opportunity to use [SU]ABD

When using inputs of 0x7fff and 0x8000 the result yielded is -1.
When using inputs of -1 and 0x7fff the result yielded is 0x8000.

Tami

From: Richard Biener <richard.guent...@gmail.com>
Sent: Wednesday, May 10, 2023 10:49 AM
To: Oluwatamilore Adebayo <oluwatamilore.adeb...@arm.com>;
gcc-patches@gcc.gnu.org; richard.guent...@gmail.com;
Richard Sandiford <richard.sandif...@arm.com>
Subject: Re: [PATCH] vect: Missed opportunity to use [SU]ABD

On Wed, May 10, 2023 at 11:01 AM Richard Sandiford
<richard.sandif...@arm.com> wrote:
>
> Oluwatamilore Adebayo <oluwatamilore.adeb...@arm.com> writes:
> > From 0b5f469171c340ef61a48a31877d495bb77bd35f Mon Sep 17 00:00:00 2001
> > From: oluade01 <oluwatamilore.adeb...@arm.com>
> > Date: Fri, 14 Apr 2023 10:24:43 +0100
> > Subject: [PATCH 1/4] Missed opportunity to use [SU]ABD
> >
> > This adds a recognition pattern for the non-widening
> > absolute difference (ABD).
> >
> > gcc/ChangeLog:
> >
> > * doc/md.texi (sabd, uabd): Document them.
> > * internal-fn.def (ABD): Use new optab.
> > * optabs.def (sabd_optab, uabd_optab): New optabs,
> > * tree-vect-patterns.cc (vect_recog_absolute_difference):
> > Recognize the following idiom abs (a - b).
> > (vect_recog_sad_pattern): Refactor to use
> > vect_recog_absolute_difference.
> > (vect_recog_abd_pattern): Use patterns found by
> > vect_recog_absolute_difference to build a new ABD
> > internal call.
> > ---
> >  gcc/doc/md.texi   |  10 ++
> >  gcc/internal-fn.def   |   3 +
> >  gcc/optabs.def|   2 +
> >  gcc/tree-vect-patterns.cc | 250 +-
> >  4 files changed, 234 insertions(+), 31 deletions(-)
> >
> > diff --git a/gcc/doc/md.texi b/gcc/doc/md.texi
> > index 07bf8bdebffb2e523f25a41f2b57e43c0276b745..0ad546c63a8deebb4b6db894f437d1e21f0245a8 100644
> > --- a/gcc/doc/md.texi
> > +++ b/gcc/doc/md.texi
> > @@ -5778,6 +5778,16 @@ Other shift and rotate instructions, analogous to the
> >  Vector shift and rotate instructions that take vectors as operand 2
> >  instead of a scalar type.
> >
> > +@cindex @code{uabd@var{m}} instruction pattern
> > +@cindex @code{sabd@var{m}} instruction pattern
> > +@item @samp{uabd@var{m}}, @samp{sabd@var{m}}
> > +Signed and unsigned absolute difference instructions.  These
> > +instructions find the difference between operands 1 and 2
> > +then return the absolute value.  A C code equivalent would be:
> > +@smallexample
> > +op0 = abs (op0 - op1)
>
> op0 = abs (op1 - op2)
>
> But that isn't the correct calculation for unsigned (where abs doesn't
> really work).  It also doesn't handle some cases correctly for signed.
>
> I think it's more:
>
>   op0 = op1 > op2 ? (unsigned type) op1 - op2 : (unsigned type) op2 - op1
>
> or (conceptually) max minus min.
>
> E.g. for 16-bit values, the absolute difference between signed 0x7fff
> and signed -0x8000 is 0xffff (reinterpreted as -1 if you cast back
> to signed).  But, ignoring undefined behaviour:
>
>   0x7fff - 0x8000 = -1
>   abs(-1) = 1
>
> which gives the wrong answer.
>
> We might still be able to fold C abs(a - b) to abd for signed a and b
> by relying on undefined behaviour (TYPE_OVERFLOW_UNDEFINED).  But we
> can't do it for -fwrapv.
>
> Richi knows better than me what would be appropriate here.
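A minimal sketch of the 16-bit case above, in C++.  The names abd_s16 and
naive_abs_diff_wrapv are made up for illustration and are not GCC or target
interfaces.  The first follows the "max minus min" formulation computed in
the unsigned type; the second models abs (a - b) with the subtraction
wrapping to 16 bits, as it would under -fwrapv:

```cpp
#include <cassert>
#include <cstdint>

// Absolute difference as "max minus min", computed in the unsigned type
// so that the subtraction itself can never overflow.
static uint16_t
abd_s16 (int16_t a, int16_t b)
{
  uint16_t ua = static_cast<uint16_t> (a);
  uint16_t ub = static_cast<uint16_t> (b);
  return a > b ? static_cast<uint16_t> (ua - ub)
	       : static_cast<uint16_t> (ub - ua);
}

// abs (a - b) where the difference is first wrapped to 16 bits
// (a model of -fwrapv semantics, not of any particular instruction).
static int16_t
naive_abs_diff_wrapv (int16_t a, int16_t b)
{
  int16_t d = static_cast<int16_t> (static_cast<uint16_t> (a)
				    - static_cast<uint16_t> (b));
  return d < 0 ? static_cast<int16_t> (-d) : d;
}
```

For a = 0x7fff and b = -0x8000 the first function yields 0xffff while the
second yields 1, which is the mismatch described above.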

The question is what does the hardware do?  For the widening [us]sad it's
obvious since the difference is computed in a wider signed mode and the
absolute value always fits.

So what does it actually do, esp. when the difference yields 0x8000?

Richard.

>
> Thanks,
> Richard
>
> > +@end smallexample
> > +
> >  @cindex @code{avg@var{m}3_floor} instruction pattern
> >  @cindex @code{uavg@var{m}3_floor} instruction pattern
> >  @item @samp{avg@var{m}3_floor}
> > diff --git a/gcc/internal-fn.def b/gcc/internal-fn.def
> > index 
> > 7fe742c2ae713e7152ab05cfdfba86e4e0aa3456..0f1724ecf37a31c231572edf90b5577e2d82f468
> >  100644
> > --- a/gcc/internal-fn.def
> > +++ b/gcc/internal-fn.def
> > @@ -167,6 +167,9 @@ DEF_INTERNAL_OPTAB_FN (FMS, ECF_CONST, fms, ternary)
> >  DEF_INTERNAL_OPTAB_FN (FNMA, ECF_CONST, fnma, ternary)
> >  DEF_INTERNAL_OPTAB_FN (FNMS, ECF_CONST, fnms, ternary)
> >
> > +DEF_INTERNAL_SIGNED_OPTAB_FN (ABD, ECF_CONST | ECF_NOTHROW, first,
> > + sabd, uabd, binary)
> > +
> >  DEF_INTERNAL_SIGNED_OPTAB_FN (AVG_FLOOR, ECF_CONST | ECF_NOTHROW, first,
> >   savg_floor, uavg_floor, binary)
> >  DEF_INTERNAL_SIGNED_OPTAB_FN (AVG_CEIL,

Re: [PATCH 2/2] ivopts: Revert register pressure cost when there are enough registers.

2023-05-15 Thread Richard Biener via Gcc-patches

On Mon, May 15, 2023 at 12:44 PM Richard Biener
 wrote:
>
> On Wed, Dec 21, 2022 at 2:12 PM Dimitrije Milošević
>  wrote:
> >
> > When there are enough registers, the register pressure cost is
> > unnecessarily bumped by adding another n_cands.
> >
> > This behavior may result in register pressure costs for the case
> > when there are enough registers being higher than for other cases.
> >
> > When there are enough registers, the register pressure cost should be
> > equal to n_invs + n_cands.
> >
> > This used to be the case before c18101f.
> >
> > gcc/ChangeLog:
> >
> > * tree-ssa-loop-ivopts.cc (ivopts_estimate_reg_pressure): Adjust.
> >
> > Signed-off-by: Dimitrije Milosevic 
> > ---
> >  gcc/tree-ssa-loop-ivopts.cc | 2 +-
> >  1 file changed, 1 insertion(+), 1 deletion(-)
> >
> > diff --git a/gcc/tree-ssa-loop-ivopts.cc b/gcc/tree-ssa-loop-ivopts.cc
> > index 60c61dc9e49..3176482d0d9 100644
> > --- a/gcc/tree-ssa-loop-ivopts.cc
> > +++ b/gcc/tree-ssa-loop-ivopts.cc
> > @@ -6092,7 +6092,7 @@ ivopts_estimate_reg_pressure (struct ivopts_data 
> > *data, unsigned n_invs,
> >
> >/* If we have enough registers.  */
> >if (regs_needed + target_res_regs < available_regs)
> > -cost = n_new;
> > +return n_new;
>
> This still doesn't make much sense (neither before nor after).  We're
> comparing apples and oranges.
>
> I think it would make most sense to merge this case with the following
> one and thus do the following.  The distinction between the cases should
> be preserved and attenuated by the adding of n_cands at the end (as
> tie-breaker).
>
> Does this help the mips case?  I'm going to throw it at x86_64-linux
> bootstrap/regtest.
>
> Btw, I don't think using address complexity makes much sense for a port that
> has only one addressing mode so I guess a better approach for 1/2 would be
> to make sure it is consistently the same value (I suppose it is not, otherwise
> you wouldn't have changed it).  Oh, and we're adding the
> reg-pressure cost to the same bucket as well, and there we don't really know
> how many times we're going to spill.  That said, I think ->complexity should
> rather go away - we are asking for address-cost already and IVOPTs uses
> built RTX to query the target.
>
> But yes, I agree ivopts_estimate_reg_pressure has an issue.
>
> Sorry for the very long delay,
> Richard.

The patch below bootstraps and regtests ok on x86_64-unknown-linux-gnu,
but I guess that doesn't mean much.

Richard.

> diff --git a/gcc/tree-ssa-loop-ivopts.cc b/gcc/tree-ssa-loop-ivopts.cc
> index 6fbd2d59318..bc8493622de 100644
> --- a/gcc/tree-ssa-loop-ivopts.cc
> +++ b/gcc/tree-ssa-loop-ivopts.cc
> @@ -6077,8 +6077,9 @@ ivopts_estimate_reg_pressure (struct ivopts_data
> *data, unsigned n_invs,
>   unsigned n_cands)
>  {
>unsigned cost;
> -  unsigned n_old = data->regs_used, n_new = n_invs + n_cands;
> -  unsigned regs_needed = n_new + n_old, available_regs = target_avail_regs;
> +  unsigned n_old = data->regs_used;
> +  unsigned regs_needed = n_invs + n_cands + n_old;
> +  unsigned available_regs = target_avail_regs;
>bool speed = data->speed;
>
>/* If there is a call in the loop body, the call-clobbered registers
> @@ -6087,10 +6088,7 @@ ivopts_estimate_reg_pressure (struct
> ivopts_data *data, unsigned n_invs,
>  available_regs = available_regs - target_clobbered_regs;
>
>/* If we have enough registers.  */
> -  if (regs_needed + target_res_regs < available_regs)
> -cost = n_new;
> -  /* If close to running out of registers, try to preserve them.  */
> -  else if (regs_needed <= available_regs)
> +  if (regs_needed <= available_regs)
>  cost = target_reg_cost [speed] * regs_needed;
>/* If we run out of available registers but the number of candidates
>   does not, we penalize extra registers using target_spill_cost.  */
>
>
> >/* If close to running out of registers, try to preserve them.  */
> >else if (regs_needed <= available_regs)
> >  cost = target_reg_cost [speed] * regs_needed;
> > --
> > 2.25.1
> >
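A rough standalone model of the two shapes discussed above, restricted to
the non-spilling cases (regs_needed <= available_regs).  The struct and
function names are hypothetical, the target parameters are plain numbers
chosen for illustration, and the spill branches of the real function are
deliberately left out:

```cpp
#include <cassert>

// Hypothetical stand-ins for the target parameters used by ivopts.
struct target_params
{
  unsigned avail_regs;	// models target_avail_regs
  unsigned res_regs;	// models target_res_regs
  unsigned reg_cost;	// models target_reg_cost[speed]
};

// Shape before the change: a bare register count in one branch,
// a cost-weighted count in the other.
static unsigned
pressure_before (const target_params &t, unsigned n_old,
		 unsigned n_invs, unsigned n_cands)
{
  unsigned n_new = n_invs + n_cands;
  unsigned regs_needed = n_new + n_old;
  assert (regs_needed <= t.avail_regs);	// spill cases not modelled
  unsigned cost;
  if (regs_needed + t.res_regs < t.avail_regs)
    cost = n_new;
  else
    cost = t.reg_cost * regs_needed;
  return cost + n_cands;	// n_cands added as tie-breaker
}

// Shape with the two branches merged, as in the diff above.
static unsigned
pressure_after (const target_params &t, unsigned n_old,
		unsigned n_invs, unsigned n_cands)
{
  unsigned regs_needed = n_invs + n_cands + n_old;
  assert (regs_needed <= t.avail_regs);	// spill cases not modelled
  return t.reg_cost * regs_needed + n_cands;
}
```

With 16 available registers, 3 reserved and a register cost of 2, the old
shape returns 6 for (n_old, n_invs, n_cands) = (4, 2, 2) but 30 for
(10, 2, 2); the merged shape returns 18 and 30, so both results are on the
same scale.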

[PATCH] OPTABS: Extend the number of expanding instructions pattern.

2023-05-15 Thread juzhe . zhong

From: Juzhe-Zhong 

Hi, Richi.

We (RVV) are going to add a rounding mode operand to floating-point
instructions, which gives patterns with 11 operands.

This is because we are going to have intrinsics that add a rounding mode
argument:
https://github.com/riscv-non-isa/rvv-intrinsic-doc/pull/226

This is the patch that adds the rounding mode operand in the RISC-V port:
https://gcc.gnu.org/pipermail/gcc-patches/2023-May/618573.html
You can see there are 11 operands in these patterns.

Is it OK for trunk?

Thanks

gcc/ChangeLog:

* optabs.cc (maybe_gen_insn): Add case to generate instruction that has 
11 operands.

---
 gcc/optabs.cc | 5 +
 1 file changed, 5 insertions(+)

diff --git a/gcc/optabs.cc b/gcc/optabs.cc
index c8e39c82d57..a12333c7169 100644
--- a/gcc/optabs.cc
+++ b/gcc/optabs.cc
@@ -8139,6 +8139,11 @@ maybe_gen_insn (enum insn_code icode, unsigned int nops,
  ops[3].value, ops[4].value, ops[5].value,
  ops[6].value, ops[7].value, ops[8].value,
  ops[9].value);
+case 11:
+  return GEN_FCN (icode) (ops[0].value, ops[1].value, ops[2].value,
+ ops[3].value, ops[4].value, ops[5].value,
+ ops[6].value, ops[7].value, ops[8].value,
+ ops[9].value, ops[10].value);
 }
   gcc_unreachable ();
 }
-- 
2.36.1
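The reason a new case is needed can be sketched with a small model of a
switch on the operand count.  Everything here (operand, gen2, gen3,
model_gen_insn) is a hypothetical stand-in, and only two arities are shown
where the real function handles many more:

```cpp
#include <cassert>
#include <cstdlib>
#include <vector>

// Hypothetical stand-in for an expanded operand (ops[i].value above).
struct operand
{
  int value;
};

// Hypothetical generators, one per arity; stand-ins for GEN_FCN (icode).
static int gen2 (int a, int b) { return a + b; }
static int gen3 (int a, int b, int c) { return a + b + c; }

// Each supported operand count needs its own case, because the
// generator is called with a fixed number of arguments.
static int
model_gen_insn (const std::vector<operand> &ops)
{
  switch (ops.size ())
    {
    case 2:
      return gen2 (ops[0].value, ops[1].value);
    case 3:
      return gen3 (ops[0].value, ops[1].value, ops[2].value);
    }
  // Models gcc_unreachable (): a count without a case cannot be handled.
  std::abort ();
}
```

In this model a pattern with four operands would reach the abort until a
"case 4" is added, which is the situation the patch addresses for 11.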

[PATCH] RISC-V: Add FRM and rounding mode operand into floating-point ternary instructions

2023-05-15 Thread juzhe . zhong

From: Juzhe-Zhong 

This patch adds FRM and a rounding mode operand to floating-point ternary
instructions.
This patch should be merged after the optabs.cc patch.

gcc/ChangeLog:

* config/riscv/riscv-vector-builtins.cc 
(function_expander::use_ternop_insn): Add default rounding mode.
* config/riscv/vector.md: Add rounding mode operand and FRM_REGNUM.

---
 gcc/config/riscv/riscv-vector-builtins.cc |  7 +++
 gcc/config/riscv/vector.md| 64 +--
 2 files changed, 55 insertions(+), 16 deletions(-)

diff --git a/gcc/config/riscv/riscv-vector-builtins.cc 
b/gcc/config/riscv/riscv-vector-builtins.cc
index f10f38f6425..b7458aaace6 100644
--- a/gcc/config/riscv/riscv-vector-builtins.cc
+++ b/gcc/config/riscv/riscv-vector-builtins.cc
@@ -3460,6 +3460,13 @@ function_expander::use_ternop_insn (bool vd_accum_p, 
insn_code icode)
   add_input_operand (Pmode, get_tail_policy_for_pred (pred));
   add_input_operand (Pmode, get_mask_policy_for_pred (pred));
   add_input_operand (Pmode, get_avl_type_rtx (avl_type::NONVLMAX));
+
+  /* TODO: Currently, we don't support intrinsic that is modeling rounding 
mode.
+ We add default rounding mode for the intrinsics that didn't model rounding
+ mode yet.  */
+  if (opno != insn_data[icode].n_generator_args)
+add_input_operand (Pmode, const0_rtx);
+
   return generate_insn (icode);
 }
 
diff --git a/gcc/config/riscv/vector.md b/gcc/config/riscv/vector.md
index 0929d19d5ec..80f9ba9bd28 100644
--- a/gcc/config/riscv/vector.md
+++ b/gcc/config/riscv/vector.md
@@ -5902,8 +5902,10 @@
 (match_operand 7 "const_int_operand")
 (match_operand 8 "const_int_operand")
 (match_operand 9 "const_int_operand")
+(match_operand 10 "const_int_operand")
 (reg:SI VL_REGNUM)
-(reg:SI VTYPE_REGNUM)] UNSPEC_VPREDICATE)
+(reg:SI VTYPE_REGNUM)
+(reg:SI FRM_REGNUM)] UNSPEC_VPREDICATE)
  (plus_minus:VF
(mult:VF
  (match_operand:VF 2 "register_operand")
@@ -5927,8 +5929,10 @@
 (match_operand 6 "const_int_operand""  i,i,  i,i")
 (match_operand 7 "const_int_operand""  i,i,  i,i")
 (match_operand 8 "const_int_operand""  i,i,  i,i")
+(match_operand 9 "const_int_operand""  i,i,  i,i")
 (reg:SI VL_REGNUM)
-(reg:SI VTYPE_REGNUM)] UNSPEC_VPREDICATE)
+(reg:SI VTYPE_REGNUM)
+(reg:SI FRM_REGNUM)] UNSPEC_VPREDICATE)
  (plus_minus:VF
(mult:VF
  (match_operand:VF 2 "register_operand" "  0,   vr,  0,   vr")
@@ -5958,8 +5962,10 @@
 (match_operand 6 "const_int_operand""  i,i,  i,i")
 (match_operand 7 "const_int_operand""  i,i,  i,i")
 (match_operand 8 "const_int_operand""  i,i,  i,i")
+(match_operand 9 "const_int_operand""  i,i,  i,i")
 (reg:SI VL_REGNUM)
-(reg:SI VTYPE_REGNUM)] UNSPEC_VPREDICATE)
+(reg:SI VTYPE_REGNUM)
+(reg:SI FRM_REGNUM)] UNSPEC_VPREDICATE)
  (plus_minus:VF
(mult:VF
  (match_operand:VF 2 "register_operand" " vr,   vr, vr,   vr")
@@ -5989,8 +5995,10 @@
 (match_operand 7 "const_int_operand""i,i")
 (match_operand 8 "const_int_operand""i,i")
 (match_operand 9 "const_int_operand""i,i")
+(match_operand 10 "const_int_operand"   "i,i")
 (reg:SI VL_REGNUM)
-(reg:SI VTYPE_REGNUM)] UNSPEC_VPREDICATE)
+(reg:SI VTYPE_REGNUM)
+(reg:SI FRM_REGNUM)] UNSPEC_VPREDICATE)
  (plus_minus:VF
(mult:VF
  (match_operand:VF 2 "register_operand" "   vr,   vr")
@@ -6024,8 +6032,10 @@
 (match_operand 7 "const_int_operand")
 (match_operand 8 "const_int_operand")
 (match_operand 9 "const_int_operand")
+(match_operand 10 "const_int_operand")
 (reg:SI VL_REGNUM)
-(reg:SI VTYPE_REGNUM)] UNSPEC_VPREDICATE)
+(reg:SI VTYPE_REGNUM)
+(reg:SI FRM_REGNUM)] UNSPEC_VPREDICATE)
  (plus_minus:VF
(mult:VF
  (vec_duplicate:VF
@@ -6045,8 +6055,10 @@
 (match_operand 6 "const_int_operand" "  i,i,  i,i")
 (match_operand 7 "const_int_operand" "  i,i,  i,i")
 (match_operand 8 "const_int_operand" "  i,i,  i,i")
+(match_operand 9 "const_int_operand" "  i,i,  i,i")
 (reg:SI VL_REGNUM)
-(reg:SI VTYPE_REGNUM)] UNSPEC_VPREDICATE)
+(reg:SI VTYPE_REGNUM)
+(reg:SI FRM_REGNUM)] UNSPEC_VPREDICATE)
  (plus_minus:VF
(m

Re: [PATCH 2/3] Refactor widen_plus as internal_fn

2023-05-15 Thread Richard Biener via Gcc-patches

On Mon, 15 May 2023, Andre Vieira (lists) wrote:

> 
> 
> On 15/05/2023 12:01, Richard Biener wrote:
> > On Mon, 15 May 2023, Richard Sandiford wrote:
> > 
> >> Richard Biener  writes:
> >>> On Fri, 12 May 2023, Richard Sandiford wrote:
> >>>
>  Richard Biener  writes:
> > On Fri, 12 May 2023, Andre Vieira (lists) wrote:
> >
> >> I have dealt with, I think..., most of your comments. There's quite a
> >> few
> >> changes, I think it's all a bit simpler now. I made some other changes
> >> to the
> >> costing in tree-inline.cc and gimple-range-op.cc in which I try to
> >> preserve
> >> the same behaviour as we had with the tree codes before. Also added
> >> some extra
> >> checks to tree-cfg.cc that made sense to me.
> >>
> >> I am still regression testing the gimple-range-op change, as that was a
> >> last
> >> minute change, but the rest survived a bootstrap and regression test on
> >> aarch64-unknown-linux-gnu.
> >>
> >> cover letter:
> >>
> >> This patch replaces the existing tree_code widen_plus and widen_minus
> >> patterns with internal_fn versions.
> >>
> >> DEF_INTERNAL_OPTAB_WIDENING_HILO_FN and
> >> DEF_INTERNAL_OPTAB_NARROWING_HILO_FN
> >> are like DEF_INTERNAL_SIGNED_OPTAB_FN and DEF_INTERNAL_OPTAB_FN
> >> respectively
> >> except they provide convenience wrappers for defining conversions that
> >> require
> >> a hi/lo split.  Each definition for  will require optabs for _hi
> >> and _lo
> >> and each of those will also require a signed and unsigned version in
> >> the case
> >> of widening. The hi/lo pair is necessary because the widening and
> >> narrowing
> >> operations take n narrow elements as inputs and return n/2 wide
> >> elements as
> >> outputs. The 'lo' operation operates on the first n/2 elements of
> >> input. The
> >> 'hi' operation operates on the second n/2 elements of input. Defining
> >> an
> >> internal_fn along with hi/lo variations allows a single internal
> >> function to
> >> be returned from a vect_recog function that will later be expanded to
> >> hi/lo.
> >>
> >>
> >>   For example:
> >>   IFN_VEC_WIDEN_PLUS -> IFN_VEC_WIDEN_PLUS_HI, IFN_VEC_WIDEN_PLUS_LO
> >> for aarch64: IFN_VEC_WIDEN_PLUS_HI   -> vec_widen_add_hi_ ->
> >> (u/s)addl2
> >> IFN_VEC_WIDEN_PLUS_LO  ->
> >> vec_widen_add_lo_
> >> -> (u/s)addl
> >>
> >> This gives the same functionality as the previous
> >> WIDEN_PLUS/WIDEN_MINUS tree
> >> codes which are expanded into VEC_WIDEN_PLUS_LO, VEC_WIDEN_PLUS_HI.
> >
> > What I still don't understand is how we are so narrowly focused on
> > HI/LO?  We need a combined scalar IFN for pattern selection (not
> > sure why that's now called _HILO, I expected no suffix).  Then there's
> > three possibilities the target can implement this:
> >
> >   1) with a widen_[su]add instruction - I _think_ that's what
> >  RISCV is going to offer since it is a target where vector modes
> >  have "padding" (aka you cannot subreg a V2SI to get V4HI).  Instead
> >  RVV can do a V4HI to V4SI widening and widening add/subtract
> >  using vwadd[u] and vwsub[u] (the HI->SI widening is actually
> >  done with a widening add of zero - eh).
> >  IIRC GCN is the same here.
> 
>  SVE currently does this too, but the addition and widening are
>  separate operations.  E.g. in principle there's no reason why
>  you can't sign-extend one operand, zero-extend the other, and
>  then add the result together.  Or you could extend them from
>  different sizes (QI and HI).  All of those are supported
>  (if the costing allows them).
> >>>
> >>> I see.  So why does the target expose widen_[su]add at all?
> >>
> >> It shouldn't (need to) do that.  I don't think we should have an optab
> >> for the unsplit operation.
> >>
> >> At least on SVE, we really want the extensions to be fused with loads
> >> (where possible) rather than with arithmetic.
> >>
> >> We can still do the widening arithmetic in one go.  It's just that
> >> fusing with the loads works for the mixed-sign and mixed-size cases,
> >> and can handle more than just doubling the element size.
> >>
>  If the target has operations to do combined extending and adding (or
>  whatever), then at the moment we rely on combine to generate them.
> 
>  So I think this case is separate from Andre's work.  The addition
>  itself is just an ordinary addition, and any widening happens by
>  vectorising a CONVERT/NOP_EXPR.
> 
> >   2) with a widen_[su]add{_lo,_hi} combo - that's what the tree
> >  codes currently support (exclusively)
> >   3) similar, but widen_[su]add{_even,_odd}
> >
> > that said, things like decomposes_to_hilo_fn_p look to paint u

Re: [PATCH v2] libstdc++: Do not use pthread_mutex_clocklock with ThreadSanitizer

2023-05-15 Thread Mike Crowe via Gcc-patches

On Friday 12 May 2023 at 11:32:56 +0100, Jonathan Wakely wrote:
> On Fri, 12 May 2023 at 11:30, Mike Crowe  wrote:
> > On Thursday 11 May 2023 at 21:52:22 +0100, Jonathan Wakely wrote:
> > > On Thu, 11 May 2023 at 13:42, Jonathan Wakely 
> > wrote:
> > > > On Thu, 11 May 2023 at 13:19, Mike Crowe  wrote:
> > > >> However, ...
> > > >>
> > > >> > > diff --git a/libstdc++-v3/acinclude.m4 b/libstdc++-v3/acinclude.m4
> > > >> > > index 89e7f5f5f45..e2700b05ec3 100644
> > > >> > > --- a/libstdc++-v3/acinclude.m4
> > > >> > > +++ b/libstdc++-v3/acinclude.m4
> > > >> > > @@ -4284,7 +4284,7 @@
> > > >> AC_DEFUN([GLIBCXX_CHECK_PTHREAD_COND_CLOCKWAIT], [
> > > >> > >[glibcxx_cv_PTHREAD_COND_CLOCKWAIT=no])
> > > >> > >])
> > > >> > >if test $glibcxx_cv_PTHREAD_COND_CLOCKWAIT = yes; then
> > > >> > > -AC_DEFINE(_GLIBCXX_USE_PTHREAD_COND_CLOCKWAIT, 1, [Define if
> > > >> > > pthread_cond_clockwait is available in .])
> > > >> > > +AC_DEFINE(_GLIBCXX_USE_PTHREAD_COND_CLOCKWAIT,
> > > >> (_GLIBCXX_TSAN==0),
> > > >> > > [Define if pthread_cond_clockwait is available in .])
> > > >> > >fi
> > > >>
> > > >> TSan does appear to have an interceptor for pthread_cond_clockwait,
> > even
> > > >> if
> > > >> it lacks the others. Does this mean that this part is unnecessary?
> > > >>
> > > >
> > > > Ah good point, thanks. I grepped for clocklock but not clockwait.
> > > >
> > >
> > > In fact it seems like we don't need to change
> > > _GLIBCXX_USE_PTHREAD_RWLOCK_CLOCKLOCK either, because I don't get any
> > tsan
> > > warnings for that. It doesn't have interceptors for
> > > pthread_rwlock_{rd,wr}lock, but it doesn't complain anyway (maybe it's
> > > simply not instrumenting the rwlock functions at all?!)
> >
> > It looks like TSan does have interceptors for pthread_rwlock_timedrdlock
> > etc. I can't explain why this doesn't cause problems when libstdc++ uses
> > pthread_rwlock_clockrdlock etc.
> >
> 
> I think glibc has renamed the rwlock functions, and so the interceptors no
> longer work.
> 
> # ifdef __USE_XOPEN2K
> /* Try to acquire read lock for RWLOCK or return after specfied time.  */
> #  ifndef __USE_TIME_BITS64
> extern int pthread_rwlock_timedrdlock (pthread_rwlock_t *__restrict
> __rwlock,
>   const struct timespec *__restrict
>   __abstime) __THROWNL __nonnull ((1, 2));
> #  else
> #   ifdef __REDIRECT_NTHNL
> extern int __REDIRECT_NTHNL (pthread_rwlock_timedrdlock,
>  (pthread_rwlock_t *__restrict __rwlock,
>   const struct timespec *__restrict __abstime),
>  __pthread_rwlock_timedrdlock64)
> __nonnull ((1, 2));
> #   else
> #define pthread_rwlock_timedrdlock __pthread_rwlock_timedrdlock64
> #   endif
> #  endif
> # endif
> 
> If glibc is really providing a function called
> __pthread_rwlock_timedrdlock64 then will tsan be able to intercept that?

I'm by no means an expert, but I would guess not. I suspect that the
renaming was introduced as part of the Y2038 fixes and TSan hasn't caught
up with them either.

Mike.
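A toy model of why a rename could defeat interception by name, assuming
(as guessed above, not confirmed) that the interceptor only hooks the
original name.  The demo_* functions are hypothetical and are not the
glibc or TSan symbols; the #define has the same shape as the fallback
branch of the header quoted earlier:

```cpp
#include <cassert>

int plain_calls = 0;	// calls that arrive under the original name
int renamed_calls = 0;	// calls that arrive under the renamed symbol

// Hypothetical stand-ins for the two entry points.
int demo_timedrdlock () { ++plain_calls; return 0; }
int demo_timedrdlock64 () { ++renamed_calls; return 0; }

// From here on, every use of the old name refers to the new symbol.
#define demo_timedrdlock demo_timedrdlock64

// User code still spells the old name, but calls the renamed function.
int
caller ()
{
  return demo_timedrdlock ();
}
```

After one call to caller (), renamed_calls is 1 and plain_calls is still
0, so anything watching only the original name sees nothing.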

Re: [PATCH 2/3] Refactor widen_plus as internal_fn

2023-05-15 Thread Andre Vieira (lists) via Gcc-patches




On 15/05/2023 12:01, Richard Biener wrote:

On Mon, 15 May 2023, Richard Sandiford wrote:


Richard Biener  writes:

On Fri, 12 May 2023, Richard Sandiford wrote:


Richard Biener  writes:

On Fri, 12 May 2023, Andre Vieira (lists) wrote:


I have dealt with, I think..., most of your comments. There's quite a few
changes, I think it's all a bit simpler now. I made some other changes to the
costing in tree-inline.cc and gimple-range-op.cc in which I try to preserve
the same behaviour as we had with the tree codes before. Also added some extra
checks to tree-cfg.cc that made sense to me.

I am still regression testing the gimple-range-op change, as that was a last
minute change, but the rest survived a bootstrap and regression test on
aarch64-unknown-linux-gnu.

cover letter:

This patch replaces the existing tree_code widen_plus and widen_minus
patterns with internal_fn versions.

DEF_INTERNAL_OPTAB_WIDENING_HILO_FN and DEF_INTERNAL_OPTAB_NARROWING_HILO_FN
are like DEF_INTERNAL_SIGNED_OPTAB_FN and DEF_INTERNAL_OPTAB_FN respectively
except they provide convenience wrappers for defining conversions that require
a hi/lo split.  Each definition for  will require optabs for _hi and _lo
and each of those will also require a signed and unsigned version in the case
of widening. The hi/lo pair is necessary because the widening and narrowing
operations take n narrow elements as inputs and return n/2 wide elements as
outputs. The 'lo' operation operates on the first n/2 elements of input. The
'hi' operation operates on the second n/2 elements of input. Defining an
internal_fn along with hi/lo variations allows a single internal function to
be returned from a vect_recog function that will later be expanded to hi/lo.


  For example:
  IFN_VEC_WIDEN_PLUS -> IFN_VEC_WIDEN_PLUS_HI, IFN_VEC_WIDEN_PLUS_LO
for aarch64: IFN_VEC_WIDEN_PLUS_HI   -> vec_widen_add_hi_ ->
(u/s)addl2
IFN_VEC_WIDEN_PLUS_LO  -> vec_widen_add_lo_
-> (u/s)addl

This gives the same functionality as the previous WIDEN_PLUS/WIDEN_MINUS tree
codes which are expanded into VEC_WIDEN_PLUS_LO, VEC_WIDEN_PLUS_HI.
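
A scalar sketch of the hi/lo split described in the cover letter, using
16-bit inputs and 32-bit outputs.  The names widen_add_lo and widen_add_hi
are made up for illustration; they model the element selection only (lo =
first n/2 elements, hi = second n/2 elements, as stated above) and not the
optabs or any target instruction:

```cpp
#include <array>
#include <cassert>
#include <cstddef>
#include <cstdint>

// Widening add of the first N/2 element pairs.
template <std::size_t N>
std::array<int32_t, N / 2>
widen_add_lo (const std::array<int16_t, N> &a,
	      const std::array<int16_t, N> &b)
{
  std::array<int32_t, N / 2> r{};
  for (std::size_t i = 0; i < N / 2; ++i)
    r[i] = int32_t (a[i]) + int32_t (b[i]);
  return r;
}

// Widening add of the second N/2 element pairs.
template <std::size_t N>
std::array<int32_t, N / 2>
widen_add_hi (const std::array<int16_t, N> &a,
	      const std::array<int16_t, N> &b)
{
  std::array<int32_t, N / 2> r{};
  for (std::size_t i = 0; i < N / 2; ++i)
    r[i] = int32_t (a[i + N / 2]) + int32_t (b[i + N / 2]);
  return r;
}
```

Together the two results cover all n inputs, and each sum is computed in
the wider type so that, for example, 32767 + 32767 does not wrap.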


What I still don't understand is how we are so narrowly focused on
HI/LO?  We need a combined scalar IFN for pattern selection (not
sure why that's now called _HILO, I expected no suffix).  Then there's
three possibilities the target can implement this:

  1) with a widen_[su]add instruction - I _think_ that's what
 RISCV is going to offer since it is a target where vector modes
 have "padding" (aka you cannot subreg a V2SI to get V4HI).  Instead
 RVV can do a V4HI to V4SI widening and widening add/subtract
 using vwadd[u] and vwsub[u] (the HI->SI widening is actually
 done with a widening add of zero - eh).
 IIRC GCN is the same here.


SVE currently does this too, but the addition and widening are
separate operations.  E.g. in principle there's no reason why
you can't sign-extend one operand, zero-extend the other, and
then add the result together.  Or you could extend them from
different sizes (QI and HI).  All of those are supported
(if the costing allows them).


I see.  So why does the target expose widen_[su]add at all?


It shouldn't (need to) do that.  I don't think we should have an optab
for the unsplit operation.

At least on SVE, we really want the extensions to be fused with loads
(where possible) rather than with arithmetic.

We can still do the widening arithmetic in one go.  It's just that
fusing with the loads works for the mixed-sign and mixed-size cases,
and can handle more than just doubling the element size.


If the target has operations to do combined extending and adding (or
whatever), then at the moment we rely on combine to generate them.

So I think this case is separate from Andre's work.  The addition
itself is just an ordinary addition, and any widening happens by
vectorising a CONVERT/NOP_EXPR.


  2) with a widen_[su]add{_lo,_hi} combo - that's what the tree
 codes currently support (exclusively)
  3) similar, but widen_[su]add{_even,_odd}

that said, things like decomposes_to_hilo_fn_p look to paint us into
a 2) corner without good reason.
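
For comparison with option 2), option 3) can be sketched the same way:
the pair of operations still covers all n inputs, but selects even-indexed
and odd-indexed elements instead of the first and second halves.  The
names widen_add_even and widen_add_odd are hypothetical scalar models, not
existing optabs or internal functions:

```cpp
#include <array>
#include <cassert>
#include <cstddef>
#include <cstdint>

// Widening add of the even-indexed element pairs (0, 2, 4, ...).
template <std::size_t N>
std::array<int32_t, N / 2>
widen_add_even (const std::array<int16_t, N> &a,
		const std::array<int16_t, N> &b)
{
  std::array<int32_t, N / 2> r{};
  for (std::size_t i = 0; i < N / 2; ++i)
    r[i] = int32_t (a[2 * i]) + int32_t (b[2 * i]);
  return r;
}

// Widening add of the odd-indexed element pairs (1, 3, 5, ...).
template <std::size_t N>
std::array<int32_t, N / 2>
widen_add_odd (const std::array<int16_t, N> &a,
	       const std::array<int16_t, N> &b)
{
  std::array<int32_t, N / 2> r{};
  for (std::size_t i = 0; i < N / 2; ++i)
    r[i] = int32_t (a[2 * i + 1]) + int32_t (b[2 * i + 1]);
  return r;
}
```

Either pairing groups two halves of the work; only the element selection
differs.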


I suppose one question is: how much of the patch is really specific
to HI/LO, and how much is just grouping two halves together?


Yep, that I don't know for sure.


  The nice
thing about the internal-fn grouping macros is that, if (3) is
implemented in future, the structure will strongly encourage even/odd
pairs to be supported for all operations that support hi/lo.  That is,
I would expect the grouping macros to be extended to define even/odd
ifns alongside hi/lo ones, rather than adding separate definitions
for even/odd functions.

If so, at least from the internal-fn.* side of things, I think the question
is whether it's OK to stick with hilo names for now, or whether we should
use more forward-looking names.


I think for parts that are independent we could use a more
forward-looking name.  Maybe _halves?


Using _h

[PATCH] RISC-V: Add rounding mode operand for floating point instructions

2023-05-15 Thread juzhe . zhong

From: Juzhe-Zhong 

This patch adds a rounding mode operand and an FRM_REGNUM dependency
to floating-point instructions.

The floating-point instructions that gain FRM and a rounding mode operand:
1. vfadd/vfsub
2. vfwadd/vfwsub
3. vfmul
4. vfdiv
5. vfwmul
6. vfwmacc/vfwnmacc/vfwmsac/vfwnmsac
7. vfsqrt7/vfrec7
8. floating-point conversions.
9. floating-point reductions.

The floating-point instructions that do NOT gain FRM and a rounding mode operand:
1. vfsqrt/vfneg
2. vfmin/vfmax
3. comparisons
4. vfclass
5. vfsgnj/vfsgnjn/vfsgnjx
6. vfmerge
7. vfmv.v.f

TODO: floating-point ternary instructions: FRM and a rounding mode operand
should be added, but they are not added in this patch since that would exceed
the number of operands that can be handled in optabs.cc. They will be added in
the next patch.
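
A small sketch of the default-operand step, using the encodings from the
frm_field_enum hunk of this patch.  The helper add_default_rounding_mode
and the use of a plain vector of integers are assumptions made for
illustration; the patch itself appends const0_rtx, which is numerically
equal to FRM_RNE in that enum, and whether that reading is intended is
also an assumption here:

```cpp
#include <cassert>
#include <vector>

// Encodings as listed in the frm_field_enum hunk of this patch.
enum frm_field_enum
{
  FRM_RNE = 0b000,
  FRM_RTZ = 0b001,
  FRM_RDN = 0b010,
  FRM_RUP = 0b011,
  FRM_RMM = 0b100,
  DYN = 0b111
};

// Hypothetical model of the expander step: if the pattern expects one
// more operand than the intrinsic supplied, append a default value.
static void
add_default_rounding_mode (std::vector<int> &operands,
			   unsigned n_generator_args)
{
  if (operands.size () != n_generator_args)
    operands.push_back (0);	// const0_rtx in the patch
}
```

An intrinsic that supplies 10 operands to an 11-operand pattern ends up
with a trailing 0; one that already supplies all of them is left alone.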

gcc/ChangeLog:

* config/riscv/riscv-protos.h (enum frm_field_enum): New enum.
* config/riscv/riscv-vector-builtins.cc 
(function_expander::use_widen_ternop_insn): Add default rounding mode.
* config/riscv/riscv.cc (riscv_hard_regno_nregs): Add FRM_REGNUM.
(riscv_hard_regno_mode_ok): Ditto.
(riscv_conditional_register_usage): Ditto.
* config/riscv/riscv.h (DWARF_FRAME_REGNUM): Ditto.
(FRM_REG_P): Ditto.
(RISCV_DWARF_FRM): Ditto.
* config/riscv/riscv.md: Ditto.
* config/riscv/vector-iterators.md: split smax/smin and plus/mult since 
smax/smin doesn't need FRM.
* config/riscv/vector.md (@pred__scalar): Splitted pattern.
(@pred_): Ditto.

---
 gcc/config/riscv/riscv-protos.h   |  10 ++
 gcc/config/riscv/riscv-vector-builtins.cc |   7 +
 gcc/config/riscv/riscv.cc |   7 +-
 gcc/config/riscv/riscv.h  |   7 +-
 gcc/config/riscv/riscv.md |   1 +
 gcc/config/riscv/vector-iterators.md  |   6 +-
 gcc/config/riscv/vector.md| 171 ++
 7 files changed, 171 insertions(+), 38 deletions(-)

diff --git a/gcc/config/riscv/riscv-protos.h b/gcc/config/riscv/riscv-protos.h
index 835bb802fc6..12634d0ac1a 100644
--- a/gcc/config/riscv/riscv-protos.h
+++ b/gcc/config/riscv/riscv-protos.h
@@ -231,6 +231,16 @@ enum vxrm_field_enum
   VXRM_RDN,
   VXRM_ROD
 };
+/* Rounding mode bitfield for floating point FRM.  */
+enum frm_field_enum
+{
+  FRM_RNE = 0b000,
+  FRM_RTZ = 0b001,
+  FRM_RDN = 0b010,
+  FRM_RUP = 0b011,
+  FRM_RMM = 0b100,
+  DYN = 0b111
+};
 }
 
 /* We classify builtin types into two classes:
diff --git a/gcc/config/riscv/riscv-vector-builtins.cc 
b/gcc/config/riscv/riscv-vector-builtins.cc
index 1de075fb90d..f10f38f6425 100644
--- a/gcc/config/riscv/riscv-vector-builtins.cc
+++ b/gcc/config/riscv/riscv-vector-builtins.cc
@@ -3482,6 +3482,13 @@ function_expander::use_widen_ternop_insn (insn_code 
icode)
   add_input_operand (Pmode, get_tail_policy_for_pred (pred));
   add_input_operand (Pmode, get_mask_policy_for_pred (pred));
   add_input_operand (Pmode, get_avl_type_rtx (avl_type::NONVLMAX));
+
+  /* TODO: Currently, we don't support intrinsic that is modeling rounding 
mode.
+ We add default rounding mode for the intrinsics that didn't model rounding
+ mode yet.  */
+  if (opno != insn_data[icode].n_generator_args)
+add_input_operand (Pmode, const0_rtx);
+
   return generate_insn (icode);
 }
 
diff --git a/gcc/config/riscv/riscv.cc b/gcc/config/riscv/riscv.cc
index b52e613c629..de5b87b1a87 100644
--- a/gcc/config/riscv/riscv.cc
+++ b/gcc/config/riscv/riscv.cc
@@ -6082,7 +6082,8 @@ riscv_hard_regno_nregs (unsigned int regno, machine_mode mode)
 
   /* mode for VL or VTYPE are just a marker, not holding value,
  so it always consume one register.  */
-  if (VTYPE_REG_P (regno) || VL_REG_P (regno) || VXRM_REG_P (regno))
+  if (VTYPE_REG_P (regno) || VL_REG_P (regno) || VXRM_REG_P (regno)
+  || FRM_REG_P (regno))
 return 1;
 
   /* Assume every valid non-vector mode fits in one vector register.  */
@@ -6150,7 +6151,8 @@ riscv_hard_regno_mode_ok (unsigned int regno, machine_mode mode)
   if (lmul != 1)
return ((regno % lmul) == 0);
 }
-  else if (VTYPE_REG_P (regno) || VL_REG_P (regno) || VXRM_REG_P (regno))
+  else if (VTYPE_REG_P (regno) || VL_REG_P (regno) || VXRM_REG_P (regno)
+  || FRM_REG_P (regno))
 return true;
   else
 return false;
@@ -6587,6 +6589,7 @@ riscv_conditional_register_usage (void)
   fixed_regs[VTYPE_REGNUM] = call_used_regs[VTYPE_REGNUM] = 1;
   fixed_regs[VL_REGNUM] = call_used_regs[VL_REGNUM] = 1;
   fixed_regs[VXRM_REGNUM] = call_used_regs[VXRM_REGNUM] = 1;
+  fixed_regs[FRM_REGNUM] = call_used_regs[FRM_REGNUM] = 1;
 }
 }
 
diff --git a/gcc/config/riscv/riscv.h b/gcc/config/riscv/riscv.h
index f74b70de562..f55bd6112a8 100644
--- a/gcc/config/riscv/riscv.h
+++ b/gcc/config/riscv/riscv.h
@@ -121,8 +121,9 @@ ASM_MISA_SPEC
 
 /* The mapping from gcc register number to DWARF 2 CFA column number.  */
 #define DWARF_FRAME_REGNUM(REGNO)

Re: [PATCH] Add auto-resizing capability to irange's [PR109695]

2023-05-15 Thread Jakub Jelinek via Gcc-patches

On Mon, May 15, 2023 at 01:08:51PM +0200, Richard Biener wrote:
> Btw, why's there a trailing underscore for union but not intersect?

Because union is a C++ keyword, while intersect is not.

Jakub

Re: [PATCH 2/3] Refactor widen_plus as internal_fn

2023-05-15 Thread Richard Sandiford via Gcc-patches

Richard Biener  writes:
> On Mon, 15 May 2023, Richard Sandiford wrote:
>
>> Richard Biener  writes:
>> > But I'm also not sure
>> > how much of that is really needed (it seems to be tied around
>> > optimizing optabs space?)
>> 
>> Not sure what you mean by "this".  Optabs space shouldn't be a problem
>> though.  The optab encoding gives us a full int to play with, and it
>> could easily go up to 64 bits if necessary/convenient.
>> 
>> At least on the internal-fn.* side, the aim is really just to establish
>> a regular structure, so that we don't have arbitrary differences between
>> different widening operations, or too much cut-&-paste.
>
> Hmm, I'm looking at the need for the std::map and 
> internal_fn_hilo_keys_array and internal_fn_hilo_values_array.
> The vectorizer pieces contain
>
> +  if (code.is_fn_code ())
> + {
> +  internal_fn ifn = as_internal_fn ((combined_fn) code);
> +  gcc_assert (decomposes_to_hilo_fn_p (ifn));
> +
> +  internal_fn lo, hi;
> +  lookup_hilo_internal_fn (ifn, &lo, &hi);
> +  *code1 = as_combined_fn (lo);
> +  *code2 = as_combined_fn (hi);
> +  optab1 = lookup_hilo_ifn_optab (lo, !TYPE_UNSIGNED (vectype));
> +  optab2 = lookup_hilo_ifn_optab (hi, !TYPE_UNSIGNED (vectype));
>
> so that tries to automatically associate the scalar widening IFN
> with the set(s) of IFN pairs we can split to.  But then this
> list should be static and there's no need to create a std::map?
> Maybe gencfn-macros.cc can be enhanced to output these static
> cases?  Or the vectorizer could (as it did previously) simply
> open-code the handled cases (I guess since we deal with two
> cases only now I'd prefer that).

Ah, yeah, I pushed back against that too.  I think it should be possible
to do it using the preprocessor, if the macros are defined appropriately.
But if it isn't possible to do it with macros then I agree that a
generator would be better than initialisation within the compiler.

Thanks,
Richard

Re: [PATCH] Add auto-resizing capability to irange's [PR109695]

2023-05-15 Thread Richard Biener via Gcc-patches

On Mon, May 15, 2023 at 12:35 PM Aldy Hernandez  wrote:
>
> 
> We can now have int_range<N, true> for automatically
> resizable ranges.  int_range_max is now int_range<3, true>
> for a 69X reduction in size from current trunk, and 6.9X reduction from
> GCC12.  This incurs a 5% performance penalty for VRP that is more than
> covered by our > 13% improvements recently.
> 
>
> int_range_max is the temporary range object we use in the ranger for
> integers.  With the conversion to wide_int, this structure bloated up
> significantly because wide_ints are huge (80 bytes a piece) and are
> about 10 times as big as a plain tree.  Since the temporary object
> requires 255 sub-ranges, that's 255 * 80 * 2, plus the control word.
> This means the structure grew from 4112 bytes to 40912 bytes.
>
> This patch adds the ability to resize ranges as needed, defaulting to
> no resizing, while int_range_max now defaults to 3 sub-ranges (instead
> of 255) and grows to 255 when the range being calculated does not fit.
>
> For example:
>
> int_range<1> foo;   // 1 sub-range with no resizing.
> int_range<5> foo;   // 5 sub-ranges with no resizing.
> int_range<5, true> foo; // 5 sub-ranges with resizing.
>
> I ran some tests and found that 3 sub-ranges cover 99% of cases, so
> I've set the int_range_max default to that:
>
> typedef int_range<3, /*RESIZABLE=*/true> int_range_max;
>
> We don't bother growing incrementally, since the default covers most
> cases and we have a 255 hard-limit.  This hard limit could be reduced
> to 128, since my tests never saw a range needing more than 124, but we
> could do that as a follow-up if needed.
>
> With 3-subranges, int_range_max is now 592 bytes versus 40912 for
> trunk, and versus 4112 bytes for GCC12!  The penalty is 5.04% for VRP
> and 3.02% for threading, with no noticeable change in overall
> compilation (0.27%).  This is more than covered by our 13.26%
> improvements for the legacy removal + wide_int conversion.

Thanks for doing this.
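
For readers following along, here is a minimal sketch of the storage scheme
described in the quoted text: a few sub-range slots inline, and a single jump
to the 255 hard limit on overflow.  It is a toy model only.  The class name,
member names and the use of long for bounds are assumptions made for
illustration, not the members of the actual patch, and where this toy reports
failure GCC may handle overflow differently.

```cpp
// Toy model of the idea: N sub-range slots live inline, and a resizable
// object switches once to a heap buffer of HARD_MAX slots on overflow.
// All names here are hypothetical, not the patch's actual members.
template <unsigned N, bool RESIZABLE = false>
class toy_int_range
{
public:
  static const unsigned HARD_MAX = 255;

  toy_int_range () : m_base (m_inline), m_max (N), m_num (0) {}
  ~toy_int_range () { if (m_base != m_inline) delete[] m_base; }
  toy_int_range (const toy_int_range &) = delete;
  toy_int_range &operator= (const toy_int_range &) = delete;

  // Append one [lo, hi] sub-range; return false if it cannot be stored.
  bool push (long lo, long hi)
  {
    if (m_num == m_max && !maybe_resize ())
      return false;
    m_base[m_num * 2] = lo;
    m_base[m_num * 2 + 1] = hi;
    ++m_num;
    return true;
  }

  unsigned num_pairs () const { return m_num; }
  unsigned capacity () const { return m_max; }

private:
  // Grow straight to the hard limit, at most once; no incremental growth.
  bool maybe_resize ()
  {
    if (!RESIZABLE || m_max >= HARD_MAX)
      return false;
    long *bigger = new long[HARD_MAX * 2];
    for (unsigned i = 0; i < m_num * 2; ++i)
      bigger[i] = m_base[i];
    m_base = bigger;
    m_max = HARD_MAX;
    return true;
  }

  long m_inline[N * 2];   // inline storage for N [lo, hi] pairs
  long *m_base;           // points at m_inline or at the heap buffer
  unsigned m_max;         // current capacity in sub-ranges
  unsigned m_num;         // sub-ranges currently stored
};
```

The size figures quoted above are consistent with this layout: 255 * 80 * 2 =
40800 bytes of bounds versus 3 * 80 * 2 = 480 bytes, with the remainder of the
40912 and 592 byte totals being the fixed part of the object.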

> I think this approach is a good alternative, while providing us with
> flexibility going forward.  For example, we could try defaulting to
> 8 sub-ranges for a noticeable improvement in VRP.  We could also use
> large sub-ranges for switch analysis to avoid resizing.
>
> Another approach I tried was always resizing.  With this, we could
> drop the whole int_range nonsense, and have irange just hold a
> resizable range.  This simplified things, but incurred a 7% penalty on
> ipa_cp.  This was hard to pinpoint, and I'm not entirely convinced
> this wasn't some artifact of valgrind.  However, until we're sure,
> let's avoid massive changes, especially since IPA changes are coming
> up.
>
> For the curious, a particular hot spot for IPA in this area was:
>
> ipcp_vr_lattice::meet_with_1 (const value_range *other_vr)
> {
> ...
> ...
>   value_range save (m_vr);
>   m_vr.union_ (*other_vr);
>   return m_vr != save;
> }
>
> The problem isn't the resizing (since we do that at most once) but the
> fact that for some functions with lots of callers we end up with a huge
> range that gets copied and compared for every meet operation.  Maybe
> the IPA algorithm could be adjusted somehow?

Well, the above just wants to know whether the union_ operation changed
the range.  I suppose that would be an interesting (and easy to compute?)
secondary output of union_ and it seems it already computes that (but
maybe not correctly?).  So I suggest to change the above to

  bool res;
  if (flag_checking)
    {
      value_range save (m_vr);
      res = m_vr.union_ (*other_vr);
      gcc_assert (res == (m_vr != save));
    }
  else
    res = m_vr.union_ (*other_vr);
  return res;
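
A self-contained sketch of the same idea, for anyone who wants to try it
outside the compiler.  toy_range, meet_with and the checking flag are
hypothetical stand-ins for value_range, ipcp_vr_lattice::meet_with_1 and
flag_checking; the only point is that union_ reports whether it changed the
range, so the copy and compare is needed only as a cross-check.

```cpp
#include <algorithm>
#include <cassert>

// Toy single-interval range; a hypothetical stand-in for value_range.
struct toy_range
{
  int lo, hi;

  bool operator!= (const toy_range &o) const
  { return lo != o.lo || hi != o.hi; }

  // Widen *this to cover O; return true if *this changed.
  bool union_ (const toy_range &o)
  {
    int new_lo = std::min (lo, o.lo);
    int new_hi = std::max (hi, o.hi);
    bool changed = new_lo != lo || new_hi != hi;
    lo = new_lo;
    hi = new_hi;
    return changed;
  }
};

// Hypothetical stand-in for GCC's flag_checking.
static bool checking = true;

// Meet M_VR with OTHER; the copy and compare only happens when checking.
bool meet_with (toy_range &m_vr, const toy_range &other)
{
  bool res;
  if (checking)
    {
      toy_range save (m_vr);
      res = m_vr.union_ (other);
      assert (res == (m_vr != save));
    }
  else
    res = m_vr.union_ (other);
  return res;
}
```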

Btw, why's there a trailing underscore for union but not intersect?

Richard.

> Anywhooo... for now there is nothing to worry about, since value_range
> still has 2 subranges and is not resizable.  But we should probably
> think what if anything we want to do here, as I envision IPA using
> infinite ranges here (well, int_range_max) and handling frange's, etc.
>
> I'll hold off a day or two, as I'd appreciate feedback here.
>
> gcc/ChangeLog:
>
> PR tree-optimization/109695
> * value-range.cc (irange::operator=): Resize range.
> (irange::union_): Same.
> (irange::intersect): Same.
> (irange::invert): Same.
> (int_range_max): Default to 3 sub-ranges and resize as needed.
> * value-range.h (irange::maybe_resize): New.
> (~int_range): New.
> (int_range::int_range): Adjust for resizing.
> (int_range::operator=): Same.
> ---
>  gcc/value-range.cc | 14 +++
>  gcc/value-range.h  | 98 --
>  2 files changed, 82 insertions(+), 30 deletions(-)
>
> diff --git a/gcc/value-range.cc b/gcc/value-range.cc
> index def9299dc0e..cea4ff59254 100644
> --- a/gcc/value-range.cc
> +++ b/gcc/value-range.cc
> @@ -901,6 +901,9 @@ frange::set_nonnegative (tree type)
>  irange &
>  irange::operator= (

[PATCH][committed] aarch64: Cost vector comparisons more accurately

2023-05-15 Thread Kyrylo Tkachov via Gcc-patches

Hi all,

We are missing cases for combining of FACGE/FACGT instructions. In the testcase 
of the patch we generate:
foo:
        fabs    v3.4s, v0.4s
        fabs    v0.4s, v1.4s
        fabs    v1.4s, v2.4s
        fcmgt   v0.4s, v3.4s, v0.4s
        fcmgt   v1.4s, v3.4s, v1.4s
        b       g

This is because combine is rejecting the pattern due to costs:
Successfully matched this instruction:
(set (reg:V4SI 106)
    (neg:V4SI (lt:V4SI (abs:V4SF (reg:V4SF 113))
            (abs:V4SF (reg:V4SF 111)))))
rejecting combination of insns 8, 9 and 10
original costs 8 + 8 + 12 = 28
replacement costs 8 + 28 = 36

It is obviously recursing in the various arms of the RTX and such.
This patch teaches the aarch64 rtx costs routine that our vector comparisons
are represented as a NEG of compare operators, with the FACGE/FACGT operations
in particular having ABS on each arm. With this patch we get the much more
reasonable dump:
original costs 8 + 8 + 8 = 24
replacement costs 8 + 8 = 16
and generate the optimal assembly:
foo:
        mov     v31.16b, v0.16b
        facgt   v0.4s, v0.4s, v1.4s
        facgt   v1.4s, v31.4s, v2.4s
        b       g

Bootstrapped and tested on aarch64-none-linux-gnu.
Pushing to trunk.
Thanks,
Kyrill

gcc/ChangeLog:

* config/aarch64/aarch64.cc (aarch64_rtx_costs, NEG case): Add costing
logic for vector modes.

gcc/testsuite/ChangeLog:

* gcc.target/aarch64/facg_1.c: New test.


vcmpcst.patch
Description: vcmpcst.patch

Re: [PATCH 0/7] openmp: OpenMP 5.1 loop transformation directives

2023-05-15 Thread Jakub Jelinek via Gcc-patches

On Mon, May 15, 2023 at 12:19:00PM +0200, Jakub Jelinek via Gcc-patches wrote:
> For C++ in templates we obviously need to defer that until instantiations,
> the constants in the clauses etc. could be template parameters etc.

Even in C++, the question of how many canonical loop nest form loops this
transformation generates can probably be answered during parsing, at least
for the 5.1/5.2 loop transformations.
I think we don't really allow
template <int... args>
void foo ()
{
  #pragma omp for collapse(2)
  #pragma omp tile sizes(args...)
  for (int i = 0; i < 64; i++)
    for (int j = 0; j < 64; j++)
      for (int k = 0; k < 64; k++)
        ;
}
because there the number of arguments in the sizes clause would be determined
only after instantiation.  Of course, we don't know the exact values...
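
To illustrate why a pack such as args... would be a problem there, here is a
small sketch without any OpenMP in it.  num_sizes is a hypothetical helper,
not anything in GCC; it only shows that the length of a non-type parameter
pack is fixed per instantiation, so a parser looking at the template body
alone could not know how many loops a sizes(args...) clause would cover.

```cpp
#include <cstddef>

// Hypothetical helper: the number of values in a non-type pack, which is
// what a sizes(args...) clause would receive.
template <int... args>
constexpr std::size_t num_sizes ()
{
  return sizeof... (args);
}

// The same template yields a different count for each instantiation.
static_assert (num_sizes<4> () == 1, "one tile size");
static_assert (num_sizes<4, 8> () == 2, "two tile sizes");
static_assert (num_sizes<> () == 0, "empty pack");
```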

Jakub

Re: [PATCH 2/3] Refactor widen_plus as internal_fn

2023-05-15 Thread Richard Biener via Gcc-patches

On Mon, 15 May 2023, Richard Sandiford wrote:

> Richard Biener  writes:
> > On Fri, 12 May 2023, Richard Sandiford wrote:
> >
> >> Richard Biener  writes:
> >> > On Fri, 12 May 2023, Andre Vieira (lists) wrote:
> >> >
> >> >> I have dealt with, I think..., most of your comments. There's quite a few
> >> >> changes, I think it's all a bit simpler now. I made some other changes to the
> >> >> costing in tree-inline.cc and gimple-range-op.cc in which I try to preserve
> >> >> the same behaviour as we had with the tree codes before. Also added some extra
> >> >> checks to tree-cfg.cc that made sense to me.
> >> >> 
> >> >> I am still regression testing the gimple-range-op change, as that was a last
> >> >> minute change, but the rest survived a bootstrap and regression test on
> >> >> aarch64-unknown-linux-gnu.
> >> >> 
> >> >> cover letter:
> >> >> 
> >> >> This patch replaces the existing tree_code widen_plus and widen_minus
> >> >> patterns with internal_fn versions.
> >> >> 
> >> >> DEF_INTERNAL_OPTAB_WIDENING_HILO_FN and DEF_INTERNAL_OPTAB_NARROWING_HILO_FN
> >> >> are like DEF_INTERNAL_SIGNED_OPTAB_FN and DEF_INTERNAL_OPTAB_FN respectively
> >> >> except they provide convenience wrappers for defining conversions that require
> >> >> a hi/lo split.  Each definition for  will require optabs for _hi and _lo
> >> >> and each of those will also require a signed and unsigned version in the case
> >> >> of widening. The hi/lo pair is necessary because the widening and narrowing
> >> >> operations take n narrow elements as inputs and return n/2 wide elements as
> >> >> outputs. The 'lo' operation operates on the first n/2 elements of input. The
> >> >> 'hi' operation operates on the second n/2 elements of input. Defining an
> >> >> internal_fn along with hi/lo variations allows a single internal function to
> >> >> be returned from a vect_recog function that will later be expanded to hi/lo.
> >> >> 
> >> >> 
> >> >>  For example:
> >> >>  IFN_VEC_WIDEN_PLUS -> IFN_VEC_WIDEN_PLUS_HI, IFN_VEC_WIDEN_PLUS_LO
> >> >> for aarch64: IFN_VEC_WIDEN_PLUS_HI   -> vec_widen_add_hi_ -> (u/s)addl2
> >> >>              IFN_VEC_WIDEN_PLUS_LO   -> vec_widen_add_lo_ -> (u/s)addl
> >> >> 
> >> >> This gives the same functionality as the previous WIDEN_PLUS/WIDEN_MINUS tree
> >> >> codes which are expanded into VEC_WIDEN_PLUS_LO, VEC_WIDEN_PLUS_HI.
> >> >
> >> > What I still don't understand is how we are so narrowly focused on
> >> > HI/LO?  We need a combined scalar IFN for pattern selection (not
> >> > sure why that's now called _HILO, I expected no suffix).  Then there's
> >> > three possibilities the target can implement this:
> >> >
> >> >  1) with a widen_[su]add instruction - I _think_ that's what
> >> > RISCV is going to offer since it is a target where vector modes
> >> > have "padding" (aka you cannot subreg a V2SI to get V4HI).  Instead
> >> > RVV can do a V4HI to V4SI widening and widening add/subtract
> >> > using vwadd[u] and vwsub[u] (the HI->SI widening is actually
> >> > done with a widening add of zero - eh).
> >> > IIRC GCN is the same here.
> >> 
> >> SVE currently does this too, but the addition and widening are
> >> separate operations.  E.g. in principle there's no reason why
> >> you can't sign-extend one operand, zero-extend the other, and
> >> then add the result together.  Or you could extend them from
> >> different sizes (QI and HI).  All of those are supported
> >> (if the costing allows them).
> >
> > I see.  So why does the target expose widen_[su]add at all?
> 
> It shouldn't (need to) do that.  I don't think we should have an optab
> for the unsplit operation.
> 
> At least on SVE, we really want the extensions to be fused with loads
> (where possible) rather than with arithmetic.
> 
> We can still do the widening arithmetic in one go.  It's just that
> fusing with the loads works for the mixed-sign and mixed-size cases,
> and can handle more than just doubling the element size.
> 
> >> If the target has operations to do combined extending and adding (or
> >> whatever), then at the moment we rely on combine to generate them.
> >> 
> >> So I think this case is separate from Andre's work.  The addition
> >> itself is just an ordinary addition, and any widening happens by
> >> vectorising a CONVERT/NOP_EXPR.
> >> 
> >> >  2) with a widen_[su]add{_lo,_hi} combo - that's what the tree
> >> > codes currently support (exclusively)
> >> >  3) similar, but widen_[su]add{_even,_odd}
> >> >
> >> > that said, things like decomposes_to_hilo_fn_p look to paint us into
> >> > a 2) corner without good reason.
> >> 
> >> I suppose one question is: how much of the patch is really specific
> >> to HI/LO, and how much is just grouping two halves together?
> >
> > Yep, that I don't know for sure.
> >
> >>  The n

Re: [PATCH 2/3] Refactor widen_plus as internal_fn

2023-05-15 Thread Richard Sandiford via Gcc-patches

Richard Biener  writes:
> On Fri, 12 May 2023, Richard Sandiford wrote:
>
>> Richard Biener  writes:
>> > On Fri, 12 May 2023, Andre Vieira (lists) wrote:
>> >
>> >> I have dealt with, I think..., most of your comments. There's quite a few
>> >> changes, I think it's all a bit simpler now. I made some other changes to the
>> >> costing in tree-inline.cc and gimple-range-op.cc in which I try to preserve
>> >> the same behaviour as we had with the tree codes before. Also added some extra
>> >> checks to tree-cfg.cc that made sense to me.
>> >> 
>> >> I am still regression testing the gimple-range-op change, as that was a last
>> >> minute change, but the rest survived a bootstrap and regression test on
>> >> aarch64-unknown-linux-gnu.
>> >> 
>> >> cover letter:
>> >> 
>> >> This patch replaces the existing tree_code widen_plus and widen_minus
>> >> patterns with internal_fn versions.
>> >> 
>> >> DEF_INTERNAL_OPTAB_WIDENING_HILO_FN and DEF_INTERNAL_OPTAB_NARROWING_HILO_FN
>> >> are like DEF_INTERNAL_SIGNED_OPTAB_FN and DEF_INTERNAL_OPTAB_FN respectively
>> >> except they provide convenience wrappers for defining conversions that require
>> >> a hi/lo split.  Each definition for  will require optabs for _hi and _lo
>> >> and each of those will also require a signed and unsigned version in the case
>> >> of widening. The hi/lo pair is necessary because the widening and narrowing
>> >> operations take n narrow elements as inputs and return n/2 wide elements as
>> >> outputs. The 'lo' operation operates on the first n/2 elements of input. The
>> >> 'hi' operation operates on the second n/2 elements of input. Defining an
>> >> internal_fn along with hi/lo variations allows a single internal function to
>> >> be returned from a vect_recog function that will later be expanded to hi/lo.
>> >> 
>> >> 
>> >>  For example:
>> >>  IFN_VEC_WIDEN_PLUS -> IFN_VEC_WIDEN_PLUS_HI, IFN_VEC_WIDEN_PLUS_LO
>> >> for aarch64: IFN_VEC_WIDEN_PLUS_HI   -> vec_widen_add_hi_ -> (u/s)addl2
>> >>              IFN_VEC_WIDEN_PLUS_LO   -> vec_widen_add_lo_ -> (u/s)addl
>> >> 
>> >> This gives the same functionality as the previous WIDEN_PLUS/WIDEN_MINUS tree
>> >> codes which are expanded into VEC_WIDEN_PLUS_LO, VEC_WIDEN_PLUS_HI.
>> >
>> > What I still don't understand is how we are so narrowly focused on
>> > HI/LO?  We need a combined scalar IFN for pattern selection (not
>> > sure why that's now called _HILO, I expected no suffix).  Then there's
>> > three possibilities the target can implement this:
>> >
>> >  1) with a widen_[su]add instruction - I _think_ that's what
>> > RISCV is going to offer since it is a target where vector modes
>> > have "padding" (aka you cannot subreg a V2SI to get V4HI).  Instead
>> > RVV can do a V4HI to V4SI widening and widening add/subtract
>> > using vwadd[u] and vwsub[u] (the HI->SI widening is actually
>> > done with a widening add of zero - eh).
>> > IIRC GCN is the same here.
>> 
>> SVE currently does this too, but the addition and widening are
>> separate operations.  E.g. in principle there's no reason why
>> you can't sign-extend one operand, zero-extend the other, and
>> then add the result together.  Or you could extend them from
>> different sizes (QI and HI).  All of those are supported
>> (if the costing allows them).
>
> I see.  So why does the target expose widen_[su]add at all?

It shouldn't (need to) do that.  I don't think we should have an optab
for the unsplit operation.

At least on SVE, we really want the extensions to be fused with loads
(where possible) rather than with arithmetic.

We can still do the widening arithmetic in one go.  It's just that
fusing with the loads works for the mixed-sign and mixed-size cases,
and can handle more than just doubling the element size.
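
For reference, the hi/lo split described in the quoted cover letter can be
modelled in scalar code roughly as follows.  This is only a sketch of the
semantics stated there (n narrow inputs, n/2 wide outputs per half).  The
function names are hypothetical, the element types are fixed at 16 and 32
bits for brevity, and both inputs are assumed to have the same even length.

```cpp
#include <cstddef>
#include <cstdint>
#include <vector>

// Scalar model of a widening add split into lo/hi halves.  The inputs hold
// n narrow (16-bit) elements; each half yields n/2 wide (32-bit) elements.
static std::vector<int32_t>
widen_plus_half (const std::vector<int16_t> &a,
                 const std::vector<int16_t> &b, std::size_t start)
{
  std::size_t half = a.size () / 2;
  std::vector<int32_t> out (half);
  for (std::size_t i = 0; i < half; ++i)
    // Extend first, then add, so the sum cannot wrap in 16 bits.
    out[i] = int32_t (a[start + i]) + int32_t (b[start + i]);
  return out;
}

// 'lo' operates on the first n/2 input elements.
std::vector<int32_t>
widen_plus_lo (const std::vector<int16_t> &a, const std::vector<int16_t> &b)
{
  return widen_plus_half (a, b, 0);
}

// 'hi' operates on the second n/2 input elements.
std::vector<int32_t>
widen_plus_hi (const std::vector<int16_t> &a, const std::vector<int16_t> &b)
{
  return widen_plus_half (a, b, a.size () / 2);
}
```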

>> If the target has operations to do combined extending and adding (or
>> whatever), then at the moment we rely on combine to generate them.
>> 
>> So I think this case is separate from Andre's work.  The addition
>> itself is just an ordinary addition, and any widening happens by
>> vectorising a CONVERT/NOP_EXPR.
>> 
>> >  2) with a widen_[su]add{_lo,_hi} combo - that's what the tree
>> > codes currently support (exclusively)
>> >  3) similar, but widen_[su]add{_even,_odd}
>> >
>> > that said, things like decomposes_to_hilo_fn_p look to paint us into
>> > a 2) corner without good reason.
>> 
>> I suppose one question is: how much of the patch is really specific
>> to HI/LO, and how much is just grouping two halves together?
>
> Yep, that I don't know for sure.
>
>>  The nice
>> thing about the internal-fn grouping macros is that, if (3) is
>> implemented in future, the structure will strongly encourage even/odd
>> pairs to be supported for all operations that support hi/lo.  That is,
>> I would expect the grouping macros to be extended to define

Re: [PATCH 2/2] ivopts: Revert register pressure cost when there are enough registers.

2023-05-15 Thread Richard Biener via Gcc-patches

On Wed, Dec 21, 2022 at 2:12 PM Dimitrije Milošević
 wrote:
>
> When there are enough registers, the register pressure cost is
> unnecessarily bumped by adding another n_cands.
>
> This behavior may result in register pressure costs for the case
> when there are enough registers being higher than for other cases.
>
> When there are enough registers, the register pressure cost should be
> equal to n_invs + n_cands.
>
> This used to be the case before c18101f.
>
> gcc/ChangeLog:
>
> * tree-ssa-loop-ivopts.cc (ivopts_estimate_reg_pressure): Adjust.
>
> Signed-off-by: Dimitrije Milosevic 
> ---
>  gcc/tree-ssa-loop-ivopts.cc | 2 +-
>  1 file changed, 1 insertion(+), 1 deletion(-)
>
> diff --git a/gcc/tree-ssa-loop-ivopts.cc b/gcc/tree-ssa-loop-ivopts.cc
> index 60c61dc9e49..3176482d0d9 100644
> --- a/gcc/tree-ssa-loop-ivopts.cc
> +++ b/gcc/tree-ssa-loop-ivopts.cc
> @@ -6092,7 +6092,7 @@ ivopts_estimate_reg_pressure (struct ivopts_data *data, unsigned n_invs,
>
>/* If we have enough registers.  */
>if (regs_needed + target_res_regs < available_regs)
> -cost = n_new;
> +return n_new;

This still doesn't make much sense (before nor after).  We're
comparing apples and oranges.

I think it would make most sense to merge this case with the following
and thus do the following.  The distinction between the cases should be
preserved and attenuated by the adding of n_cands at the end (as
tie-breaker).

Does this help the mips case?  I'm going to throw it at x86_64-linux
bootstrap/regtest.

Btw, I don't think using address complexity makes much sense for a port that
has only one addressing mode so I guess a better approach for 1/2 would be
to make sure it is consistently the same value (I suppose it is not, otherwise
you wouldn't have changed it).  Oh, and we're adding the
reg-pressure cost to the same bucket as well, and there we don't really know
how many times we're going to spill.  That said, I think ->complexity should
rather go away - we are asking for address-cost already and IVOPTs uses
built RTX to query the target.

But yes, I agree ivopts_estimate_reg_pressure has an issue.

Sorry for the very long delay,
Richard.

diff --git a/gcc/tree-ssa-loop-ivopts.cc b/gcc/tree-ssa-loop-ivopts.cc
index 6fbd2d59318..bc8493622de 100644
--- a/gcc/tree-ssa-loop-ivopts.cc
+++ b/gcc/tree-ssa-loop-ivopts.cc
@@ -6077,8 +6077,9 @@ ivopts_estimate_reg_pressure (struct ivopts_data *data, unsigned n_invs,
  unsigned n_cands)
 {
   unsigned cost;
-  unsigned n_old = data->regs_used, n_new = n_invs + n_cands;
-  unsigned regs_needed = n_new + n_old, available_regs = target_avail_regs;
+  unsigned n_old = data->regs_used;
+  unsigned regs_needed = n_invs + n_cands + n_old;
+  unsigned available_regs = target_avail_regs;
   bool speed = data->speed;

   /* If there is a call in the loop body, the call-clobbered registers
@@ -6087,10 +6088,7 @@ ivopts_estimate_reg_pressure (struct ivopts_data *data, unsigned n_invs,
 available_regs = available_regs - target_clobbered_regs;

   /* If we have enough registers.  */
-  if (regs_needed + target_res_regs < available_regs)
-cost = n_new;
-  /* If close to running out of registers, try to preserve them.  */
-  else if (regs_needed <= available_regs)
+  if (regs_needed <= available_regs)
 cost = target_reg_cost [speed] * regs_needed;
   /* If we run out of available registers but the number of candidates
  does not, we penalize extra registers using target_spill_cost.  */
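
To make the "apples and oranges" point concrete, here is a standalone sketch
of the estimate before and after the change above.  It is not the GCC
function: toy_target and its fields are hypothetical stand-ins for the
target_* values, the call-clobber adjustment is omitted, and the two spill
cases of the real code are collapsed into one simplified branch.

```cpp
// Hypothetical target parameters, chosen only for illustration.
struct toy_target
{
  unsigned avail_regs;   // stands in for target_avail_regs
  unsigned res_regs;     // stands in for target_res_regs
  unsigned reg_cost;     // stands in for target_reg_cost[speed]
  unsigned spill_cost;   // stands in for target_spill_cost[speed]
};

// Shape of the estimate before the change: the first case returns a plain
// register count, the second a count scaled by a target cost.
unsigned cost_before (const toy_target &t, unsigned n_old,
                      unsigned n_invs, unsigned n_cands)
{
  unsigned n_new = n_invs + n_cands;
  unsigned regs_needed = n_new + n_old;
  unsigned cost;
  if (regs_needed + t.res_regs < t.avail_regs)
    cost = n_new;
  else if (regs_needed <= t.avail_regs)
    cost = t.reg_cost * regs_needed;
  else  // simplified: a single spill case
    cost = t.reg_cost * t.avail_regs
           + t.spill_cost * (regs_needed - t.avail_regs);
  return cost + n_cands;  // n_cands added at the end as tie-breaker
}

// Shape after merging the first two cases, so both use the same unit.
unsigned cost_after (const toy_target &t, unsigned n_old,
                     unsigned n_invs, unsigned n_cands)
{
  unsigned regs_needed = n_invs + n_cands + n_old;
  unsigned cost;
  if (regs_needed <= t.avail_regs)
    cost = t.reg_cost * regs_needed;
  else  // simplified: a single spill case
    cost = t.reg_cost * t.avail_regs
           + t.spill_cost * (regs_needed - t.avail_regs);
  return cost + n_cands;
}
```

With these made-up parameters the two versions differ only while there are
plenty of registers; once regs_needed + res_regs reaches avail_regs they
agree.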


>/* If close to running out of registers, try to preserve them.  */
>else if (regs_needed <= available_regs)
>  cost = target_reg_cost [speed] * regs_needed;
> --
> 2.25.1
>

Re: [PATCH] Add auto-resizing capability to irange's [PR109695]

2023-05-15 Thread Jakub Jelinek via Gcc-patches

On Mon, May 15, 2023 at 12:35:23PM +0200, Aldy Hernandez wrote:
> gcc/ChangeLog:
> 
>   PR tree-optimization/109695
>   * value-range.cc (irange::operator=): Resize range.
>   (irange::union_): Same.
>   (irange::intersect): Same.
>   (irange::invert): Same.
>   (int_range_max): Default to 3 sub-ranges and resize as needed.
>   * value-range.h (irange::maybe_resize): New.
>   (~int_range): New.
>   (int_range::int_range): Adjust for resizing.
>   (int_range::operator=): Same.

LGTM.

One question is if we shouldn't do it for GCC13/GCC12 as well, perhaps
changing it to some larger number than 3 when the members aren't wide_ints
in there but just trees.  Sure, in 13/12 the problem is 10x less severe
than in current trunk, but still we have some cases where we run out of
stack because of it on some hosts.

Jakub
