Re: [PATCH v2] RISC-V: Error early with V and no M extension.

2024-07-24 Thread Palmer Dabbelt

On Wed, 24 Jul 2024 08:25:30 PDT (-0700), Robin Dapp wrote:

Hi,

now with proper diff...

For calculating the value of a poly_int at runtime we use a
multiplication instruction that requires the M extension.
Instead of just asserting and ICEing this patch emits an early
error at option-parsing time.
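
A minimal sketch of the idea, not the patch itself: a runtime-sized value is modelled as c0 + c1 * x, where x is only known at run time, so materialising it takes a multiply. The struct and function names below are hypothetical stand-ins for GCC's option state, which is not modelled here.

```c
#include <stdbool.h>
#include <stddef.h>

/* Hypothetical stand-in for the parsed -march state; GCC's real
   TARGET_VECTOR / TARGET_MUL machinery is not modelled here.  */
struct arch_opts
{
  bool has_m;  /* M extension (integer multiply/divide) enabled.  */
  bool has_v;  /* V extension (vectors) enabled.  */
};

/* Value of a two-coefficient polynomial c0 + c1 * x.  X stands for a
   quantity that is only known at run time, which is why a multiply
   is needed to compute the result.  */
static long
poly_value (long c0, long c1, long x)
{
  return c0 + c1 * x;
}

/* Return NULL if the combination is usable, otherwise a diagnostic.
   Checking while options are parsed replaces a later assertion
   failure with a message the user can act on.  */
static const char *
check_arch_opts (const struct arch_opts *opts)
{
  if (opts->has_v && !opts->has_m)
    return "vector support needs the M extension in this sketch";
  return NULL;
}
```

A caller would run check_arch_opts once after parsing and stop on a non-NULL result, before any code that relies on poly_value is generated.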

We have several tests that use only "i" (without "m") and I adjusted all of
them to "im".  For now, I haven't verified whether the original error still
occurs with just "i"; I simply added "m".

Tested on rv64gcv_zvfh_zvbb.

Regards
 Robin

gcc/ChangeLog:

PR target/116036

* config/riscv/riscv.cc (riscv_override_options_internal): Error
with TARGET_VECTOR && !TARGET_MUL.

gcc/testsuite/ChangeLog:

* gcc.target/riscv/arch-31.c: Add m to arch string and expect it.
* gcc.target/riscv/arch-32.c: Ditto.
* gcc.target/riscv/arch-37.c: Ditto.
* gcc.target/riscv/arch-38.c: Ditto.
* gcc.target/riscv/predef-14.c: Ditto.
* gcc.target/riscv/predef-15.c: Ditto.
* gcc.target/riscv/predef-16.c: Ditto.
* gcc.target/riscv/predef-26.c: Ditto.
* gcc.target/riscv/predef-27.c: Ditto.
* gcc.target/riscv/predef-32.c: Ditto.
* gcc.target/riscv/predef-33.c: Ditto.
* gcc.target/riscv/predef-36.c: Ditto.
* gcc.target/riscv/predef-37.c: Ditto.
* gcc.target/riscv/rvv/autovec/pr111486.c: Add m to arch string.
* gcc.target/riscv/compare-debug-1.c: Ditto.
* gcc.target/riscv/compare-debug-2.c: Ditto.
* gcc.target/riscv/rvv/base/pr116036.c: New test.
---
 gcc/config/riscv/riscv.cc |  5 +
 gcc/testsuite/gcc.target/riscv/arch-31.c  |  2 +-
 gcc/testsuite/gcc.target/riscv/arch-32.c  |  2 +-
 gcc/testsuite/gcc.target/riscv/arch-37.c  |  2 +-
 gcc/testsuite/gcc.target/riscv/arch-38.c  |  2 +-
 gcc/testsuite/gcc.target/riscv/compare-debug-1.c  |  2 +-
 gcc/testsuite/gcc.target/riscv/compare-debug-2.c  |  2 +-
 gcc/testsuite/gcc.target/riscv/predef-14.c|  6 +++---
 gcc/testsuite/gcc.target/riscv/predef-15.c|  4 ++--
 gcc/testsuite/gcc.target/riscv/predef-16.c|  4 ++--
 gcc/testsuite/gcc.target/riscv/predef-26.c|  6 +-
 gcc/testsuite/gcc.target/riscv/predef-27.c|  6 +-
 gcc/testsuite/gcc.target/riscv/predef-32.c|  6 +-
 gcc/testsuite/gcc.target/riscv/predef-33.c|  6 +-
 gcc/testsuite/gcc.target/riscv/predef-36.c|  6 +-
 gcc/testsuite/gcc.target/riscv/predef-37.c|  6 +-
 gcc/testsuite/gcc.target/riscv/rvv/autovec/pr111486.c |  2 +-
 gcc/testsuite/gcc.target/riscv/rvv/base/pr116036.c| 11 +++
 18 files changed, 60 insertions(+), 20 deletions(-)
 create mode 100644 gcc/testsuite/gcc.target/riscv/rvv/base/pr116036.c

diff --git a/gcc/config/riscv/riscv.cc b/gcc/config/riscv/riscv.cc
index 7016a33cce3..fcdb7ab08dd 100644
--- a/gcc/config/riscv/riscv.cc
+++ b/gcc/config/riscv/riscv.cc
@@ -9691,6 +9691,11 @@ riscv_override_options_internal (struct gcc_options *opts)
   else if (!TARGET_MUL_OPTS_P (opts) && TARGET_DIV_OPTS_P (opts))
 error ("%<-mdiv%> requires %<-march%> to subsume the %<M%> extension");
 
+  /* We might use a multiplication to calculate the scalable vector length at
+ runtime.  Therefore, require the M extension.  */
+  if (TARGET_VECTOR && !TARGET_MUL)
+sorry ("the %<V%> extension requires the %<M%> extension");

It's really GCC's implementation of the V extension that requires M, not
the actual ISA V extension.  So I think the wording could be a little
confusing for users here, but no big deal either way on my end so


Reviewed-by: Palmer Dabbelt 

Thanks!


+
   /* Likewise floating-point division and square root.  */
   if ((TARGET_HARD_FLOAT_OPTS_P (opts) || TARGET_ZFINX_OPTS_P (opts))
   && ((target_flags_explicit & MASK_FDIV) == 0))
diff --git a/gcc/testsuite/gcc.target/riscv/arch-31.c 
b/gcc/testsuite/gcc.target/riscv/arch-31.c
index 5180753b905..9b867c5ecd2 100644
--- a/gcc/testsuite/gcc.target/riscv/arch-31.c
+++ b/gcc/testsuite/gcc.target/riscv/arch-31.c
@@ -1,5 +1,5 @@
 /* { dg-do compile } */
-/* { dg-options "-march=rv32i_zvfbfmin -mabi=ilp32f" } */
+/* { dg-options "-march=rv32im_zvfbfmin -mabi=ilp32f" } */
 int foo()
 {
 }
diff --git a/gcc/testsuite/gcc.target/riscv/arch-32.c 
b/gcc/testsuite/gcc.target/riscv/arch-32.c
index 49616832512..49a3db79489 100644
--- a/gcc/testsuite/gcc.target/riscv/arch-32.c
+++ b/gcc/testsuite/gcc.target/riscv/arch-32.c
@@ -1,5 +1,5 @@
 /* { dg-do compile } */
-/* { dg-options "-march=rv64iv_zvfbfmin -mabi=lp64d" } */
+/* { dg-options "-march=rv64imv_zvfbfmin -mabi=lp64d" } */
 int foo()
 {
 }
diff --git a/gcc/testsuite/gcc.target/riscv/arch-37.c 
b/gcc/

Re: [PATCH v3] RISC-V: Implement __init_riscv_feature_bits, __riscv_feature_bits, and __riscv_vendor_feature_bits

2024-07-23 Thread Palmer Dabbelt

On Mon, 22 Jul 2024 07:16:28 PDT (-0700), kito.ch...@sifive.com wrote:

This provides a common abstraction layer to probe the available extensions at
run-time. These functions can be used to implement function multi-versioning or
to detect available extensions.

The advantages of providing this abstraction layer are:
- Easy to port to other new platforms.
- Easier to maintain in GCC for function multi-versioning.
  - For example, maintaining platform-dependent code in C code/libgcc is much
easier than maintaining it in GCC by creating GIMPLEs...

This API is intended to provide the capability to query minimal common 
available extensions on the system.
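
A minimal sketch of how a caller might test one bit in this layout. The struct here is a local mirror for illustration, not the libgcc object, and has_feature is a hypothetical helper; only the M and V group/bit values are taken from the patch below.

```c
#include <stdbool.h>

/* Local mirror of the layout in the patch; the real object is
   __riscv_feature_bits in libgcc.  */
#define FEATURE_BITS_LENGTH 1

struct feature_bits
{
  unsigned length;
  unsigned long long features[FEATURE_BITS_LENGTH];
};

/* Group/bit positions copied from the patch.  */
#define M_GROUPID 0
#define M_BITMASK (1ULL << 12)
#define V_GROUPID 0
#define V_BITMASK (1ULL << 21)

/* True if the bit is set.  A group at or beyond LENGTH reads as
   "absent", so an uninitialised table reports no extensions.  */
static bool
has_feature (const struct feature_bits *fb, unsigned group,
             unsigned long long mask)
{
  if (group >= fb->length)
    return false;
  return (fb->features[group] & mask) != 0;
}
```

Treating an out-of-range group as absent is a design choice of this sketch; it keeps callers safe if a later table grows beyond what an older runtime provides.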

Proposal in riscv-c-api-doc: 
https://github.com/riscv-non-isa/riscv-c-api-doc/pull/74


That's not merged, but I'm not sure what the rules are on stability for 
the C API doc.



Full function multi-versioning implementation will come later. We are posting
this first because we intend to backport it to the GCC 14 branch to unblock
LLVM 19 to use this with GCC 14.2, rather than waiting for GCC 15.

Changes since v2:
- Prevent it from being initialized more than once.

Changes since v1:
- Fix the format.
- Prevented race conditions by introducing a local variable to avoid load/store
  operations during the computation of the feature bit.
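
The two changelog items above can be sketched roughly as follows. This is an illustration under assumptions, not the code from the patch: probe_features is a hypothetical probe, the struct is a local mirror, and no atomics are used, so the sketch makes no claim of full thread safety.

```c
#define FEATURE_BITS_LENGTH 1

struct feature_bits
{
  unsigned length;
  unsigned long long features[FEATURE_BITS_LENGTH];
};

static struct feature_bits feature_state;  /* Zero-initialised.  */
static unsigned probe_calls;               /* Counts probes, for illustration.  */

/* Hypothetical probe; real code would ask the operating system.  */
static unsigned long long
probe_features (void)
{
  probe_calls++;
  return (1ULL << 8) | (1ULL << 12);  /* I and M, using the patch's bit numbers.  */
}

static void
init_feature_bits (void)
{
  /* A non-zero length marks the state as already initialised.  */
  if (feature_state.length != 0)
    return;

  /* Build the value in a local so the shared word is stored once,
     rather than being updated bit by bit while it is computed.  */
  unsigned long long bits = probe_features ();

  feature_state.features[0] = bits;
  feature_state.length = FEATURE_BITS_LENGTH;
}
```

Calling init_feature_bits twice probes only once, because the second call returns at the length check.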

libgcc/ChangeLog:

* config/riscv/feature_bits.c: New.
* config/riscv/t-elf (LIB2ADD): Add feature_bits.c.
---
 libgcc/config/riscv/feature_bits.c | 313 +
 libgcc/config/riscv/t-elf  |   1 +
 2 files changed, 314 insertions(+)
 create mode 100644 libgcc/config/riscv/feature_bits.c

diff --git a/libgcc/config/riscv/feature_bits.c 
b/libgcc/config/riscv/feature_bits.c
new file mode 100644
index 000..cce4fbfa6be
--- /dev/null
+++ b/libgcc/config/riscv/feature_bits.c
@@ -0,0 +1,313 @@
+/* Helper function for function multi-versioning for RISC-V.
+
+   Copyright (C) 2024 Free Software Foundation, Inc.
+
+This file is part of GCC.
+
+GCC is free software; you can redistribute it and/or modify it under
+the terms of the GNU General Public License as published by the Free
+Software Foundation; either version 3, or (at your option) any later
+version.
+
+GCC is distributed in the hope that it will be useful, but WITHOUT ANY
+WARRANTY; without even the implied warranty of MERCHANTABILITY or
+FITNESS FOR A PARTICULAR PURPOSE.  See the GNU General Public License
+for more details.
+
+Under Section 7 of GPL version 3, you are granted additional
+permissions described in the GCC Runtime Library Exception, version
+3.1, as published by the Free Software Foundation.
+
+You should have received a copy of the GNU General Public License and
+a copy of the GCC Runtime Library Exception along with this program;
+see the files COPYING3 and COPYING.RUNTIME respectively.  If not, see
+<http://www.gnu.org/licenses/>.  */
+
+#define RISCV_FEATURE_BITS_LENGTH 1
+struct {
+  unsigned length;
+  unsigned long long features[RISCV_FEATURE_BITS_LENGTH];
+} __riscv_feature_bits __attribute__((visibility("hidden"), nocommon));
+
+#define RISCV_VENDOR_FEATURE_BITS_LENGTH 1
+
+struct {
+  unsigned vendorID;
+  unsigned length;
+  unsigned long long features[RISCV_VENDOR_FEATURE_BITS_LENGTH];
+} __riscv_vendor_feature_bits __attribute__((visibility("hidden"), nocommon));
+
+#define A_GROUPID 0
+#define A_BITMASK (1ULL << 0)
+#define C_GROUPID 0
+#define C_BITMASK (1ULL << 2)
+#define D_GROUPID 0
+#define D_BITMASK (1ULL << 3)
+#define F_GROUPID 0
+#define F_BITMASK (1ULL << 5)
+#define I_GROUPID 0
+#define I_BITMASK (1ULL << 8)
+#define M_GROUPID 0
+#define M_BITMASK (1ULL << 12)
+#define V_GROUPID 0
+#define V_BITMASK (1ULL << 21)
+#define ZACAS_GROUPID 0
+#define ZACAS_BITMASK (1ULL << 26)
+#define ZBA_GROUPID 0
+#define ZBA_BITMASK (1ULL << 27)
+#define ZBB_GROUPID 0
+#define ZBB_BITMASK (1ULL << 28)
+#define ZBC_GROUPID 0
+#define ZBC_BITMASK (1ULL << 29)
+#define ZBKB_GROUPID 0
+#define ZBKB_BITMASK (1ULL << 30)
+#define ZBKC_GROUPID 0
+#define ZBKC_BITMASK (1ULL << 31)
+#define ZBKX_GROUPID 0
+#define ZBKX_BITMASK (1ULL << 32)
+#define ZBS_GROUPID 0
+#define ZBS_BITMASK (1ULL << 33)
+#define ZFA_GROUPID 0
+#define ZFA_BITMASK (1ULL << 34)
+#define ZFH_GROUPID 0
+#define ZFH_BITMASK (1ULL << 35)
+#define ZFHMIN_GROUPID 0
+#define ZFHMIN_BITMASK (1ULL << 36)
+#define ZICBOZ_GROUPID 0
+#define ZICBOZ_BITMASK (1ULL << 37)
+#define ZICOND_GROUPID 0
+#define ZICOND_BITMASK (1ULL << 38)
+#define ZIHINTNTL_GROUPID 0
+#define ZIHINTNTL_BITMASK (1ULL << 39)
+#define ZIHINTPAUSE_GROUPID 0
+#define ZIHINTPAUSE_BITMASK (1ULL << 40)
+#define ZKND_GROUPID 0
+#define ZKND_BITMASK (1ULL << 41)
+#define ZKNE_GROUPID 0
+#define ZKNE_BITMASK (1ULL << 42)
+#define ZKNH_GROUPID 0
+#define ZKNH_BITMASK (1ULL << 43)
+#define ZKSED_GROUPID 0
+#define ZKSED_BITMASK (1ULL << 44)
+#define ZKSH_GROUPID 0
+#define ZKSH_BITMASK (1ULL << 45)
+#define ZKT_GROUPID 0
+#define ZKT_BITMASK (1ULL << 46)
+#define ZTSO_GROUPID 0
+#define 

Re: [PATCH] RISC-V: Implement __init_riscv_features_bits, __riscv_feature_bits, and __riscv_vendor_feature_bits

2024-07-16 Thread Palmer Dabbelt

On Tue, 16 Jul 2024 07:49:13 PDT (-0700), kito.ch...@sifive.com wrote:

This provides a common abstraction layer to probe the available extensions at
run-time. These functions can be used to implement function multi-versioning or
to detect available extensions.

The advantages of providing this abstraction layer are:
- Easy to port to other new platforms.
- Easier to maintain in GCC for function multi-versioning.
  - For example, maintaining platform-dependent code in C code/libgcc is much
easier than maintaining it in GCC by creating GIMPLEs...

This API is intended to provide the capability to query minimal common 
available extensions on the system.

Proposal in riscv-c-api-doc: 
https://github.com/riscv-non-isa/riscv-c-api-doc/pull/74

Full function multi-versioning implementation will come later. We are posting
this first because we intend to backport it to the GCC 14 branch to unblock
LLVM 19 to use this with GCC 14.2, rather than waiting for GCC 15.


Is LLVM actually going to use this?  Last I heard the plan was to just 
wait until we're much closer to the glibc release so the hwprobe() libc 
ABI is frozen and then just call that directly.




libgcc/ChangeLog:

* config/riscv/feature_bits.c: New.
* config/riscv/t-elf (LIB2ADD): Add feature_bits.c.
---
 libgcc/config/riscv/feature_bits.c | 298 +
 libgcc/config/riscv/t-elf  |   1 +
 2 files changed, 299 insertions(+)
 create mode 100644 libgcc/config/riscv/feature_bits.c

diff --git a/libgcc/config/riscv/feature_bits.c 
b/libgcc/config/riscv/feature_bits.c
new file mode 100644
index 000..c207dbba67c
--- /dev/null
+++ b/libgcc/config/riscv/feature_bits.c
@@ -0,0 +1,298 @@
+/* Helper function for function multi-versioning for RISC-V.
+
+   Copyright (C) 2024 Free Software Foundation, Inc.
+
+This file is part of GCC.
+
+GCC is free software; you can redistribute it and/or modify it under
+the terms of the GNU General Public License as published by the Free
+Software Foundation; either version 3, or (at your option) any later
+version.
+
+GCC is distributed in the hope that it will be useful, but WITHOUT ANY
+WARRANTY; without even the implied warranty of MERCHANTABILITY or
+FITNESS FOR A PARTICULAR PURPOSE.  See the GNU General Public License
+for more details.
+
+Under Section 7 of GPL version 3, you are granted additional
+permissions described in the GCC Runtime Library Exception, version
+3.1, as published by the Free Software Foundation.
+
+You should have received a copy of the GNU General Public License and
+a copy of the GCC Runtime Library Exception along with this program;
+see the files COPYING3 and COPYING.RUNTIME respectively.  If not, see
+<http://www.gnu.org/licenses/>.  */
+
+#define RISCV_FEATURE_BITS_LENGTH 1
+struct {
+  unsigned length;
+  unsigned long long features[RISCV_FEATURE_BITS_LENGTH];
+} __riscv_feature_bits __attribute__((visibility("hidden"), nocommon));
+
+#define RISCV_VENDOR_FEATURE_BITS_LENGTH 1
+
+struct {
+  unsigned vendorID;
+  unsigned length;
+  unsigned long long features[RISCV_VENDOR_FEATURE_BITS_LENGTH];
+} __riscv_vendor_feature_bits __attribute__((visibility("hidden"), nocommon));
+
+#define A_GROUPID 0
+#define A_BITMASK (1ULL << 0)
+#define C_GROUPID 0
+#define C_BITMASK (1ULL << 2)
+#define D_GROUPID 0
+#define D_BITMASK (1ULL << 3)
+#define F_GROUPID 0
+#define F_BITMASK (1ULL << 5)
+#define I_GROUPID 0
+#define I_BITMASK (1ULL << 8)
+#define M_GROUPID 0
+#define M_BITMASK (1ULL << 12)
+#define V_GROUPID 0
+#define V_BITMASK (1ULL << 21)
+#define ZACAS_GROUPID 0
+#define ZACAS_BITMASK (1ULL << 26)
+#define ZBA_GROUPID 0
+#define ZBA_BITMASK (1ULL << 27)
+#define ZBB_GROUPID 0
+#define ZBB_BITMASK (1ULL << 28)
+#define ZBC_GROUPID 0
+#define ZBC_BITMASK (1ULL << 29)
+#define ZBKB_GROUPID 0
+#define ZBKB_BITMASK (1ULL << 30)
+#define ZBKC_GROUPID 0
+#define ZBKC_BITMASK (1ULL << 31)
+#define ZBKX_GROUPID 0
+#define ZBKX_BITMASK (1ULL << 32)
+#define ZBS_GROUPID 0
+#define ZBS_BITMASK (1ULL << 33)
+#define ZFA_GROUPID 0
+#define ZFA_BITMASK (1ULL << 34)
+#define ZFH_GROUPID 0
+#define ZFH_BITMASK (1ULL << 35)
+#define ZFHMIN_GROUPID 0
+#define ZFHMIN_BITMASK (1ULL << 36)
+#define ZICBOZ_GROUPID 0
+#define ZICBOZ_BITMASK (1ULL << 37)
+#define ZICOND_GROUPID 0
+#define ZICOND_BITMASK (1ULL << 38)
+#define ZIHINTNTL_GROUPID 0
+#define ZIHINTNTL_BITMASK (1ULL << 39)
+#define ZIHINTPAUSE_GROUPID 0
+#define ZIHINTPAUSE_BITMASK (1ULL << 40)
+#define ZKND_GROUPID 0
+#define ZKND_BITMASK (1ULL << 41)
+#define ZKNE_GROUPID 0
+#define ZKNE_BITMASK (1ULL << 42)
+#define ZKNH_GROUPID 0
+#define ZKNH_BITMASK (1ULL << 43)
+#define ZKSED_GROUPID 0
+#define ZKSED_BITMASK (1ULL << 44)
+#define ZKSH_GROUPID 0
+#define ZKSH_BITMASK (1ULL << 45)
+#define ZKT_GROUPID 0
+#define ZKT_BITMASK (1ULL << 46)
+#define ZTSO_GROUPID 0
+#define ZTSO_BITMASK (1ULL << 47)
+#define ZVBB_GROUPID 0
+#define ZVBB_BITMASK (1ULL << 48)
+#define ZVBC_GROUPID 0
+#define 

[RFC PATCH] cse: Add another CSE pass after split1

2024-06-27 Thread Palmer Dabbelt

This is really more of a question than a patch.

Looking at PR/115687 I managed to convince myself there's a general
class of problems here: splitting might produce constant subexpressions,
but as far as I can tell there's nothing to eliminate those constant
subexpressions.  So I very quickly threw together a CSE that doesn't
fold expressions, and it does eliminate the high-part constants in
question.

At that point I realized the implementation here is bogus: it's not the
folding that's the problem, but introducing new expressions post-split
would break things -- or at least I think it would, we'd end up with
insns the backends don't expect to have that late.  I'm not sure if
split2 would end up cleaning all that up at a functional level, but it
certainly seems like optimization would be pretty far off the rails at
that point and thus doesn't seem like a good idea.  I'm also not sure
how effective this would be without doing the folding, as without
folding we can only eliminate the last insn in the constant sequence --
that's fine here, but it wouldn't work for more complicated stuff.

So I think if this was to go anywhere we'd want to have a CSE that
really only eliminates expressions (ie, doesn't do any of the other
juggling to try and produce more constant subexpressions).  There's a
few places where new expressions can be introduced, so it'd probably be
better done as a new cse_insn-type function instead of just a flag.  It
seems somewhat manageable to write, though.
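
To make "only eliminates expressions" concrete, here is a toy sketch on a made-up three-operation IR. It is not GCC's cse.cc: the insn struct, the opcodes, and the assumption that every register is written exactly once are all inventions for illustration.

```c
#include <stddef.h>

enum op { OP_CONST, OP_ADD, OP_COPY };

/* Toy instruction: DEST = OP (A, B).  For OP_CONST, A is the literal
   and B is unused; for OP_COPY, A is the source register.  Every
   register is assumed to be written exactly once.  */
struct insn
{
  enum op op;
  int dest;
  long a;
  long b;
};

/* Replace each insn that recomputes an earlier expression with a copy
   of the earlier result.  Nothing is folded: no operands are
   evaluated and no new expression shapes are created.  Returns the
   number of insns rewritten.  */
static size_t
eliminate_only (struct insn *insns, size_t n)
{
  size_t rewritten = 0;

  for (size_t i = 0; i < n; i++)
    {
      if (insns[i].op == OP_COPY)
        continue;

      for (size_t j = 0; j < i; j++)
        if (insns[j].op == insns[i].op
            && insns[j].a == insns[i].a
            && insns[j].b == insns[i].b)
          {
            insns[i].op = OP_COPY;
            insns[i].a = insns[j].dest;
            insns[i].b = 0;
            rewritten++;
            break;
          }
    }
  return rewritten;
}
```

With two identical constant loads each followed by an add of its own result, the toy rewrites the second load into a copy but leaves the second add alone, since it does not look through copies when comparing operands. That shows how limited a pure elimination pass is without the rest of the CSE machinery.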

That said, I really don't know what I'm doing here.  So I figured I'd
just send out what I'd put together, mostly as a way to ask if it's
worth putting time into this?
---
 gcc/common.opt  |   4 ++
 gcc/cse.cc  | 112 ++--
 gcc/opts.cc |   1 +
 gcc/passes.def  |   1 +
 gcc/tree-pass.h |   1 +
 5 files changed, 105 insertions(+), 14 deletions(-)

diff --git a/gcc/common.opt b/gcc/common.opt
index 327230967ea..efc4b8ddaf3 100644
--- a/gcc/common.opt
+++ b/gcc/common.opt
@@ -2695,6 +2695,10 @@ frerun-cse-after-loop
 Common Var(flag_rerun_cse_after_loop) Optimization
 Add a common subexpression elimination pass after loop optimizations.
 
+frerun-cse-after-split
+Common Var(flag_rerun_cse_after_split) Optimization
+Add a common subexpression elimination pass after splitting instructions.
+
 frerun-loop-opt
 Common Ignore
 Does nothing.  Preserved for backward compatibility.
diff --git a/gcc/cse.cc b/gcc/cse.cc
index c53deecbe54..d3955001ce7 100644
--- a/gcc/cse.cc
+++ b/gcc/cse.cc
@@ -543,11 +543,11 @@ static rtx fold_rtx (rtx, rtx_insn *);
 static rtx equiv_constant (rtx);
 static void record_jump_equiv (rtx_insn *, bool);
 static void record_jump_cond (enum rtx_code, machine_mode, rtx, rtx);
-static void cse_insn (rtx_insn *);
+static void cse_insn (rtx_insn *, int);
 static void cse_prescan_path (struct cse_basic_block_data *);
 static void invalidate_from_clobbers (rtx_insn *);
 static void invalidate_from_sets_and_clobbers (rtx_insn *);
-static void cse_extended_basic_block (struct cse_basic_block_data *);
+static void cse_extended_basic_block (struct cse_basic_block_data *, int);
 extern void dump_class (struct table_elt*);
 static void get_cse_reg_info_1 (unsigned int regno);
 static struct cse_reg_info * get_cse_reg_info (unsigned int regno);
@@ -4511,12 +4511,13 @@ canonicalize_insn (rtx_insn *insn, vec 
*psets)
 
 /* Main function of CSE.
First simplify sources and addresses of all assignments
-   in the instruction, using previously-computed equivalents values.
+   in the instruction, using previously-computed equivalents values when
+   simplification is allowed.
Then install the new sources and destinations in the table
of available values.  */
 
 static void
-cse_insn (rtx_insn *insn)
+cse_insn (rtx_insn *insn, int simplify)
 {
   rtx x = PATTERN (insn);
   int i;
@@ -4662,9 +4663,15 @@ cse_insn (rtx_insn *insn)
   else
src_eqv_here = src_eqv;
 
-  /* Simplify and foldable subexpressions in SRC.  Then get the fully-
-simplified result, which may not necessarily be valid.  */
-  src_folded = fold_rtx (src, NULL);
+  /* If simplification is enabled, then simplify and foldable
+subexpressions in SRC.  Then get the fully-simplified result, which
+may not necessarily be valid.
+
+Otherwise, just leave SRC alone.  */
+  if (simplify)
+   src_folded = fold_rtx (src, NULL);
+  else
+   src_folded = src;
 
 #if 0
   /* ??? This caused bad code to be generated for the m68k port with -O2.
@@ -6504,7 +6511,7 @@ check_for_label_ref (rtx_insn *insn)
 /* Process a single extended basic block described by EBB_DATA.  */
 
 static void
-cse_extended_basic_block (struct cse_basic_block_data *ebb_data)
+cse_extended_basic_block (struct cse_basic_block_data *ebb_data, int simplify)
 {
   int path_size = ebb_data->path_size;
   int path_entry;
@@ -6571,7 +6578,7 @@ cse_extended_basic_block (struct 

Re: [PATCH] RISC-V: Add support for Zabha extension

2024-06-26 Thread Palmer Dabbelt

On Wed, 26 Jun 2024 08:50:57 PDT (-0700), Andrea Parri wrote:

Tested using amo.exp with rv64gc_zalrsc, rv64id_zaamo, rv64id_zalrsc,
rv64id_zabha (using tip-of-tree qemu w/ zabha patches [2] applied for
execution tests).


My interpretation of the Zabha specification, in particular of "The Zabha
extension depends upon the Zaamo standard extension", is that rv64id_zabha
should result in a dependency violation (some compiler warning).

The changes at stake seem instead to make the Zabha extension "select" the
Zaamo extension; IOW, these changes seem to make rv64id_zabha an alias of
rv64id_zaamo_zabha: I am wondering whether this was intentional?


I think your interpretation of "depends on" is reasonable, but it's not 
the way we've handled it for other extension dependencies.  For the 
others we're treating "depends on" the way this code does, ie enabling 
the dependent extensions implicitly.  IIRC that's how the RISC-V specs
want it to be.


That said, we do call it "implied" in the sources because that's really 
the right word for it.  So we should probably add something to the docs 
that describes how/why things are this way, as I don't think it's the 
first time someone's been confused.


Maybe just something like

diff --git a/gcc/doc/invoke.texi b/gcc/doc/invoke.texi
index 23d90db2925..429275d56df 100644
--- a/gcc/doc/invoke.texi
+++ b/gcc/doc/invoke.texi
@@ -31037,6 +31037,10 @@ If both @option{-march} and @option{-mcpu=} are not 
specified, the default for
this argument is system dependent, users who want a specific architecture
extensions should specify one explicitly.

+When the RISC-V specifications define an extension as depending on other
+extensions, GCC will implicitly add the dependant extensions to the enabled
+extension set if they weren't added explicitly.
+
@opindex mcpu
@item -mcpu=@var{processor-string}
Use architecture of and optimize the output for the given processor, specified

would do it?
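
The "implied" behaviour described above can be sketched as a closure over an implication table. This is an illustration, not GCC's riscv-common.cc: the types and helpers are hypothetical, and only the zabha -> zaamo pair comes from this thread.

```c
#include <stdbool.h>
#include <stddef.h>
#include <string.h>

/* Illustrative implication table; only the zabha -> zaamo pair comes
   from the discussion above.  */
struct implied { const char *ext; const char *implies; };

static const struct implied implied_table[] = {
  { "zabha", "zaamo" },
};

#define MAX_EXTS 16

struct ext_set { const char *names[MAX_EXTS]; size_t count; };

static bool
ext_set_has (const struct ext_set *s, const char *name)
{
  for (size_t i = 0; i < s->count; i++)
    if (strcmp (s->names[i], name) == 0)
      return true;
  return false;
}

static void
ext_set_add (struct ext_set *s, const char *name)
{
  if (!ext_set_has (s, name) && s->count < MAX_EXTS)
    s->names[s->count++] = name;
}

/* Add every implied extension, repeating until nothing changes so
   that chains of implications are followed.  */
static void
add_implied (struct ext_set *s)
{
  bool changed = true;
  while (changed)
    {
      changed = false;
      for (size_t i = 0; i < sizeof implied_table / sizeof implied_table[0]; i++)
        if (ext_set_has (s, implied_table[i].ext)
            && !ext_set_has (s, implied_table[i].implies)
            && s->count < MAX_EXTS)
          {
            ext_set_add (s, implied_table[i].implies);
            changed = true;
          }
    }
}
```

Under this model rv64id_zabha and rv64id_zaamo_zabha end up with the same extension set, which is the aliasing behaviour asked about above.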



  Andrea


Re: [PATCH] RISC-V: Support -m[no-]unaligned-access

2024-06-24 Thread Palmer Dabbelt

On Fri, 22 Dec 2023 01:23:13 PST (-0800), wangpengcheng...@bytedance.com wrote:

These two options are negative aliases of -m[no-]strict-align.

This matches the LLVM implementation.
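
A minimal sketch of what "negative alias" means here, assuming a single boolean for the strict-alignment state. The function is hypothetical and does not reflect how GCC's option machinery is implemented; it only shows the mapping between the four spellings.

```c
#include <stdbool.h>
#include <string.h>

/* Apply one of the four spellings to *STRICT_ALIGN.  Returns false
   for an option this sketch does not know.  */
static bool
apply_align_option (const char *opt, bool *strict_align)
{
  if (strcmp (opt, "-mstrict-align") == 0
      || strcmp (opt, "-mno-unaligned-access") == 0)
    *strict_align = true;
  else if (strcmp (opt, "-mno-strict-align") == 0
           || strcmp (opt, "-munaligned-access") == 0)
    *strict_align = false;
  else
    return false;
  return true;
}
```

So -munaligned-access behaves like -mno-strict-align, and -mno-unaligned-access like -mstrict-align.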

gcc/ChangeLog:

* config/riscv/riscv.opt: Add option alias.

gcc/testsuite/ChangeLog:

* gcc.target/riscv/predef-align-10.c: New test.
* gcc.target/riscv/predef-align-7.c: New test.
* gcc.target/riscv/predef-align-8.c: New test.
* gcc.target/riscv/predef-align-9.c: New test.

Signed-off-by: Wang Pengcheng


Sorry for being slow here.  With the scalar/vector alignment split we're 
cleaning up a bunch of these LLVM/GCC differences, and we're waiting for 
the LLVM folks to decide how these are going to behave.  LLVM will 
release well before GCC does, so we've got some time.


So this isn't lost, just slow.


---
gcc/config/riscv/riscv.opt | 4 
gcc/testsuite/gcc.target/riscv/predef-align-10.c | 16 
gcc/testsuite/gcc.target/riscv/predef-align-7.c | 15 +++
gcc/testsuite/gcc.target/riscv/predef-align-8.c | 16 
gcc/testsuite/gcc.target/riscv/predef-align-9.c | 15 +++
5 files changed, 66 insertions(+)
create mode 100644 gcc/testsuite/gcc.target/riscv/predef-align-10.c
create mode 100644 gcc/testsuite/gcc.target/riscv/predef-align-7.c
create mode 100644 gcc/testsuite/gcc.target/riscv/predef-align-8.c
create mode 100644 gcc/testsuite/gcc.target/riscv/predef-align-9.c

diff --git a/gcc/config/riscv/riscv.opt b/gcc/config/riscv/riscv.opt
index cf207d4dcdf..1e22998ce6e 100644
--- a/gcc/config/riscv/riscv.opt
+++ b/gcc/config/riscv/riscv.opt
@@ -116,6 +116,10 @@ mstrict-align
Target Mask(STRICT_ALIGN) Save
Do not generate unaligned memory accesses.

+munaligned-access
+Target Alias(mstrict-align) NegativeAlias
+Enable unaligned memory accesses.
+
Enum
Name(code_model) Type(enum riscv_code_model)
Known code models (for use with the -mcmodel= option):
diff --git a/gcc/testsuite/gcc.target/riscv/predef-align-10.c
b/gcc/testsuite/gcc.target/riscv/predef-align-10.c
new file mode 100644
index 000..c86b2c7a5ed
--- /dev/null
+++ b/gcc/testsuite/gcc.target/riscv/predef-align-10.c
@@ -0,0 +1,16 @@
+/* { dg-do compile } */
+/* { dg-options "-mtune=rocket -munaligned-access" } */
+
+int main() {
+
+/* rocket default is cpu tune param misaligned access slow */
+#if !defined(__riscv_misaligned_slow)
+#error "__riscv_misaligned_slow is not set"
+#endif
+
+#if defined(__riscv_misaligned_avoid) || defined(__riscv_misaligned_fast)
+#error "__riscv_misaligned_avoid or __riscv_misaligned_fast is unexpectedly set"
+#endif
+
+ return 0;
+}
diff --git a/gcc/testsuite/gcc.target/riscv/predef-align-7.c
b/gcc/testsuite/gcc.target/riscv/predef-align-7.c
new file mode 100644
index 000..405f3686c2e
--- /dev/null
+++ b/gcc/testsuite/gcc.target/riscv/predef-align-7.c
@@ -0,0 +1,15 @@
+/* { dg-do compile } */
+/* { dg-options "-mtune=thead-c906 -mno-unaligned-access" } */
+
+int main() {
+
+#if !defined(__riscv_misaligned_avoid)
+#error "__riscv_misaligned_avoid is not set"
+#endif
+
+#if defined(__riscv_misaligned_fast) || defined(__riscv_misaligned_slow)
+#error "__riscv_misaligned_fast or __riscv_misaligned_slow is unexpectedly set"
+#endif
+
+ return 0;
+}
diff --git a/gcc/testsuite/gcc.target/riscv/predef-align-8.c
b/gcc/testsuite/gcc.target/riscv/predef-align-8.c
new file mode 100644
index 000..64072c04a47
--- /dev/null
+++ b/gcc/testsuite/gcc.target/riscv/predef-align-8.c
@@ -0,0 +1,16 @@
+/* { dg-do compile } */
+/* { dg-options "-mtune=thead-c906 -munaligned-access" } */
+
+int main() {
+
+/* thead-c906 default is cpu tune param misaligned access fast */
+#if !defined(__riscv_misaligned_fast)
+#error "__riscv_misaligned_fast is not set"
+#endif
+
+#if defined(__riscv_misaligned_avoid) || defined(__riscv_misaligned_slow)
+#error "__riscv_misaligned_avoid or __riscv_misaligned_slow is unexpectedly set"
+#endif
+
+ return 0;
+}
diff --git a/gcc/testsuite/gcc.target/riscv/predef-align-9.c
b/gcc/testsuite/gcc.target/riscv/predef-align-9.c
new file mode 100644
index 000..f5418de87cf
--- /dev/null
+++ b/gcc/testsuite/gcc.target/riscv/predef-align-9.c
@@ -0,0 +1,15 @@
+/* { dg-do compile } */
+/* { dg-options "-mtune=rocket -mno-unaligned-access" } */
+
+int main() {
+
+#if !defined(__riscv_misaligned_avoid)
+#error "__riscv_misaligned_avoid is not set"
+#endif
+
+#if defined(__riscv_misaligned_fast) || defined(__riscv_misaligned_slow)
+#error "__riscv_misaligned_fast or __riscv_misaligned_slow is unexpectedly set"
+#endif
+
+ return 0;
+}


Re: [PATCH] RISC-V: Add configure check for Zaamo/Zalrsc assembler support

2024-06-12 Thread Palmer Dabbelt

On Wed, 12 Jun 2024 16:56:09 PDT (-0700), Patrick O'Neill wrote:


On 6/12/24 16:49, Sam James wrote:

Palmer Dabbelt  writes:


On Wed, 12 Jun 2024 16:20:26 PDT (-0700), Patrick O'Neill wrote:

Binutils 2.42 and before don't support Zaamo/Zalrsc. Add a configure
check to prevent emitting Zaamo/Zalrsc in the arch string when the
assembler does not support it.

Should we just rewrite these to A when binutils doesn't support the
subsets?  That'd avoid a forced binutils bump, but really users should
be upgrading anyway...  Either way

Acked-by: Palmer Dabbelt  # RISC-V
Reviewed-by: Palmer Dabbelt  # RISC-V

though I'm not sure if the configure churn is sane, it looks like a
version mismatch of some sort.  Hopefully someone who knows those bits
better can chime in?

Your instinct is right!


gcc/ChangeLog:

* common/config/riscv/riscv-common.cc
  (riscv_subset_list::to_string): Skip zaamo/zalrsc when not
  supported by the assembler.
* config.in: Regenerate.
* configure: Regenerate.
* configure.ac: Add zaamo/zalrsc assembler check.

Signed-off-by: Patrick O'Neill 
---
Tested using newlib rv64gc with binutils tip-of-tree and 2.42.

This results in calls being emitted when compiling for _zaamo_zalrsc
when the assembler does not support these extensions.


cat amo.c

void foo (int* bar, int* baz)
{
   __atomic_add_fetch(bar, baz, __ATOMIC_RELAXED);
}

gcc -march=rv64id_zaamo_zalrsc -O3 amo.c

results in:
foo:
 sext.w  a1,a1
 li  a2,0
 tail    __atomic_fetch_add_4

As a result there are some testsuite failures on zalrsc specific
testcases and when using an old version of binutils on non-a targets.
Not a cause for concern imo but worth calling out.
Also testcases that check for the default isa string will fail with
the old binutils since zaamo/zalrsc aren't emitted anymore.
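
The effect of the skip on the emitted arch string can be sketched as below. This is a simplified illustration, not riscv_subset_list::to_string: the function name, the fixed "rv64" prefix, and the flat array of subset names are assumptions made for the example, and versions are ignored.

```c
#include <stdbool.h>
#include <stddef.h>
#include <stdio.h>
#include <string.h>

/* Write "rv64" followed by the subset names into OUT.  Multi-letter
   names get a leading underscore.  When SKIP_ZAAMO_ZALRSC is set,
   those two subsets are left out, as for an assembler that does not
   recognise them.  */
static void
arch_to_string (const char *const *subsets, size_t n, bool skip_zaamo_zalrsc,
                char *out, size_t out_size)
{
  size_t used = (size_t) snprintf (out, out_size, "rv64");

  for (size_t i = 0; i < n && used < out_size; i++)
    {
      const char *name = subsets[i];

      if (skip_zaamo_zalrsc
          && (strcmp (name, "zaamo") == 0 || strcmp (name, "zalrsc") == 0))
        continue;

      const char *sep = strlen (name) > 1 ? "_" : "";
      used += (size_t) snprintf (out + used, out_size - used, "%s%s", sep, name);
    }
}
```

Dropping both names also drops the atomic instructions from what the assembler is told to accept, which fits the note above that atomics become library calls in that configuration.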
---
  gcc/common/config/riscv/riscv-common.cc | 11 +++
  gcc/config.in   |  6 
  gcc/configure   | 41 ++---
  gcc/configure.ac|  5 +++
  4 files changed, 58 insertions(+), 5 deletions(-)

diff --git a/gcc/common/config/riscv/riscv-common.cc 
b/gcc/common/config/riscv/riscv-common.cc
index 78dfd6b1470..1dc1d9904c7 100644
--- a/gcc/common/config/riscv/riscv-common.cc
+++ b/gcc/common/config/riscv/riscv-common.cc
@@ -916,6 +916,7 @@ riscv_subset_list::to_string (bool version_p) const
riscv_subset_t *subset;

bool skip_zifencei = false;
+  bool skip_zaamo_zalrsc = false;
bool skip_zicsr = false;
bool i2p0 = false;

@@ -943,6 +944,10 @@ riscv_subset_list::to_string (bool version_p) const
   a mistake in that binutils 2.35 supports zicsr but not zifencei.  */
skip_zifencei = true;
  #endif
+#ifndef HAVE_AS_MARCH_ZAAMO_ZALRSC
+  /* Skip since binutils 2.42 and earlier don't recognize zaamo/zalrsc.  */
+  skip_zaamo_zalrsc = true;
+#endif

for (subset = m_head; subset != NULL; subset = subset->next)
  {
@@ -954,6 +959,12 @@ riscv_subset_list::to_string (bool version_p) const
  subset->name == "zicsr")
continue;

+  if (skip_zaamo_zalrsc && subset->name == "zaamo")
+   continue;
+
+  if (skip_zaamo_zalrsc && subset->name == "zalrsc")
+   continue;
+
/* For !version_p, we only separate extension with underline for
 multi-letter extension.  */
if (!first &&
diff --git a/gcc/config.in b/gcc/config.in
index e41b6dc97cd..acab3c0f126 100644
--- a/gcc/config.in
+++ b/gcc/config.in
@@ -629,6 +629,12 @@
  #endif


+/* Define if the assembler understands -march=rv*_zaamo_zalrsc. */
+#ifndef USED_FOR_TARGET
+#undef HAVE_AS_MARCH_ZAAMO_ZALRSC
+#endif
+
+
  /* Define if the assembler understands -march=rv*_zifencei. */
  #ifndef USED_FOR_TARGET
  #undef HAVE_AS_MARCH_ZIFENCEI
diff --git a/gcc/configure b/gcc/configure
index aaf5899cc03..09b794c1225 100755
--- a/gcc/configure
+++ b/gcc/configure
@@ -6228,7 +6228,7 @@ else
  We can't simply define LARGE_OFF_T to be 9223372036854775807,
  since some C++ compilers masquerading as C compilers
  incorrectly reject 9223372036854775807.  */
-#define LARGE_OFF_T (((off_t) 1 << 62) - 1 + ((off_t) 1 << 62))
+#define LARGE_OFF_T ((((off_t) 1 << 31) << 31) - 1 + (((off_t) 1 << 31) << 31))
int off_t_is_large[(LARGE_OFF_T % 2147483629 == 721

I think you may be using patched autoconf which fixes
http://bugs.debian.org/742780.

The fix landed in 2.70: 
https://git.savannah.gnu.org/gitweb/?p=autoconf.git;a=commit;h=a1d8293f3bfa2516f9a0424e3a6e63c2f8e93c6e.

Please drop those hunks.


I thought I could get away with using the apt autoconf2.69 package
directly ;)

Thanks. I'll regenerate without those hunks for v2.


FWIW, I just use the distro packages, toss the hunks I don't like, and
then re-build things to make sure it doesn't fall

Re: [PATCH] RISC-V: Add configure check for Zaamo/Zalrsc assembler support

2024-06-12 Thread Palmer Dabbelt

On Wed, 12 Jun 2024 16:20:26 PDT (-0700), Patrick O'Neill wrote:

Binutils 2.42 and before don't support Zaamo/Zalrsc. Add a configure
check to prevent emitting Zaamo/Zalrsc in the arch string when the
assembler does not support it.


Should we just rewrite these to A when binutils doesn't support the
subsets?  That'd avoid a forced binutils bump, but really users should be
upgrading anyway...  Either way


Acked-by: Palmer Dabbelt  # RISC-V
Reviewed-by: Palmer Dabbelt  # RISC-V

though I'm not sure if the configure churn is sane, it looks like a
version mismatch of some sort.  Hopefully someone who knows those bits
better can chime in?



gcc/ChangeLog:

* common/config/riscv/riscv-common.cc
  (riscv_subset_list::to_string): Skip zaamo/zalrsc when not
  supported by the assembler.
* config.in: Regenerate.
* configure: Regenerate.
* configure.ac: Add zaamo/zalrsc assembler check.

Signed-off-by: Patrick O'Neill 
---
Tested using newlib rv64gc with binutils tip-of-tree and 2.42.

This results in calls being emitted when compiling for _zaamo_zalrsc
when the assembler does not support these extensions.


cat amo.c

void foo (int* bar, int* baz)
{
  __atomic_add_fetch(bar, baz, __ATOMIC_RELAXED);
}

gcc -march=rv64id_zaamo_zalrsc -O3 amo.c

results in:
foo:
sext.w  a1,a1
li  a2,0
tail    __atomic_fetch_add_4

As a result there are some testsuite failures on zalrsc specific
testcases and when using an old version of binutils on non-a targets.
Not a cause for concern imo but worth calling out.
Also testcases that check for the default isa string will fail with
the old binutils since zaamo/zalrsc aren't emitted anymore.
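
The skipping behaviour described above can be sketched in plain C.  This
is only an illustration under stated assumptions: arch_to_string is a
hypothetical helper, not GCC's riscv_subset_list::to_string, and the skip
decision is passed in as a run-time flag here, whereas the patch derives it
at build time from HAVE_AS_MARCH_ZAAMO_ZALRSC.

```c
#include <assert.h>
#include <stddef.h>
#include <string.h>

/* Hypothetical stand-in for the arch-string builder; not GCC code.
   Appends each subset name to BASE, separating multi-letter extensions
   with '_', and drops zaamo/zalrsc when SKIP_ZAAMO_ZALRSC is nonzero.
   Returns 0 on success, -1 if OUT is too small.  */
static int
arch_to_string (const char *base, const char *const *subsets, size_t n,
                int skip_zaamo_zalrsc, char *out, size_t outsz)
{
  assert (base != NULL && out != NULL);

  size_t len = strlen (base);
  if (len + 1 > outsz)
    return -1;
  memcpy (out, base, len + 1);

  for (size_t i = 0; i < n; i++)
    {
      const char *name = subsets[i];
      size_t nlen = strlen (name);
      size_t sep = nlen > 1 ? 1 : 0;

      /* The assembler would reject these, so leave them out entirely.  */
      if (skip_zaamo_zalrsc
          && (strcmp (name, "zaamo") == 0 || strcmp (name, "zalrsc") == 0))
        continue;

      if (len + sep + nlen + 1 > outsz)
        return -1;
      if (sep)
        out[len++] = '_';
      memcpy (out + len, name, nlen + 1);
      len += nlen;
    }
  return 0;
}
```

With the subsets i, d, zaamo, zalrsc and zicsr on an rv64 base, the sketch
yields rv64id_zicsr when skipping and rv64id_zaamo_zalrsc_zicsr otherwise,
which is why tests that match the default ISA string can fail with an old
binutils.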
---
 gcc/common/config/riscv/riscv-common.cc | 11 +++
 gcc/config.in   |  6 
 gcc/configure   | 41 ++---
 gcc/configure.ac|  5 +++
 4 files changed, 58 insertions(+), 5 deletions(-)

diff --git a/gcc/common/config/riscv/riscv-common.cc 
b/gcc/common/config/riscv/riscv-common.cc
index 78dfd6b1470..1dc1d9904c7 100644
--- a/gcc/common/config/riscv/riscv-common.cc
+++ b/gcc/common/config/riscv/riscv-common.cc
@@ -916,6 +916,7 @@ riscv_subset_list::to_string (bool version_p) const
   riscv_subset_t *subset;

   bool skip_zifencei = false;
+  bool skip_zaamo_zalrsc = false;
   bool skip_zicsr = false;
   bool i2p0 = false;

@@ -943,6 +944,10 @@ riscv_subset_list::to_string (bool version_p) const
  a mistake in that binutils 2.35 supports zicsr but not zifencei.  */
   skip_zifencei = true;
 #endif
+#ifndef HAVE_AS_MARCH_ZAAMO_ZALRSC
+  /* Skip since binutils 2.42 and earlier don't recognize zaamo/zalrsc.  */
+  skip_zaamo_zalrsc = true;
+#endif

   for (subset = m_head; subset != NULL; subset = subset->next)
 {
@@ -954,6 +959,12 @@ riscv_subset_list::to_string (bool version_p) const
  subset->name == "zicsr")
continue;

+  if (skip_zaamo_zalrsc && subset->name == "zaamo")
+   continue;
+
+  if (skip_zaamo_zalrsc && subset->name == "zalrsc")
+   continue;
+
   /* For !version_p, we only separate extension with underline for
 multi-letter extension.  */
   if (!first &&
diff --git a/gcc/config.in b/gcc/config.in
index e41b6dc97cd..acab3c0f126 100644
--- a/gcc/config.in
+++ b/gcc/config.in
@@ -629,6 +629,12 @@
 #endif


+/* Define if the assembler understands -march=rv*_zaamo_zalrsc. */
+#ifndef USED_FOR_TARGET
+#undef HAVE_AS_MARCH_ZAAMO_ZALRSC
+#endif
+
+
 /* Define if the assembler understands -march=rv*_zifencei. */
 #ifndef USED_FOR_TARGET
 #undef HAVE_AS_MARCH_ZIFENCEI
diff --git a/gcc/configure b/gcc/configure
index aaf5899cc03..09b794c1225 100755
--- a/gcc/configure
+++ b/gcc/configure
@@ -6228,7 +6228,7 @@ else
 We can't simply define LARGE_OFF_T to be 9223372036854775807,
 since some C++ compilers masquerading as C compilers
 incorrectly reject 9223372036854775807.  */
-#define LARGE_OFF_T (((off_t) 1 << 62) - 1 + ((off_t) 1 << 62))
+#define LARGE_OFF_T ((((off_t) 1 << 31) << 31) - 1 + (((off_t) 1 << 31) << 31))
   int off_t_is_large[(LARGE_OFF_T % 2147483629 == 721
   && LARGE_OFF_T % 2147483647 == 1)
  ? 1 : -1];
@@ -6274,7 +6274,7 @@ else
 We can't simply define LARGE_OFF_T to be 9223372036854775807,
 since some C++ compilers masquerading as C compilers
 incorrectly reject 9223372036854775807.  */
-#define LARGE_OFF_T (((off_t) 1 << 62) - 1 + ((off_t) 1 << 62))
+#define LARGE_OFF_T ((((off_t) 1 << 31) << 31) - 1 + (((off_t) 1 << 31) << 31))
   int off_t_is_large[(LARGE_OFF_T % 2147483629 == 721
   && LARGE_OFF_T % 2147483647 == 1)
  ? 1 : -1];
@@ -6298,7 +6298,7 @@ rm -f core conftest.err conftest

Re: [Committed] RISC-V: Add basic Zaamo and Zalrsc support

2024-06-12 Thread Palmer Dabbelt

On Wed, 12 Jun 2024 10:09:06 PDT (-0700), Patrick O'Neill wrote:


On 6/12/24 04:21, Andreas Schwab wrote:

On Jun 12 2024, Li, Pan2 wrote:


Do we need to upgrade the binutils of the riscv-gnu-toolchain repo? Or we may 
have unknown prefixed ISA extension `zaamo' when building.

There needs to be a configure check if binutils can grok the extension.


Ack. I'll make a patch for that.


Thanks.  That's how we usually handle this stuff, it keeps the world 
building.




In the meantime bumping binutils to tip-of-tree will resolve the build
issue.

Patrick


Re: [PATCH] RISC-V: Add min/max patterns for ifcvt.

2024-06-03 Thread Palmer Dabbelt

On Mon, 03 Jun 2024 11:50:54 PDT (-0700), jeffreya...@gmail.com wrote:



On 6/3/24 11:03 AM, Palmer Dabbelt wrote:



+;; Provide a minmax pattern for ifcvt to match.
+(define_insn "*_cmp_3"
+  [(set (match_operand:X 0 "register_operand" "=r")
+    (if_then_else:X
+    (bitmanip_minmax_cmp_op
+    (match_operand:X 1 "register_operand" "r")
+    (match_operand:X 2 "register_operand" "r"))
+    (match_dup 1)
+    (match_dup 2)))]
+  "TARGET_ZBB"
+  "\t%0,%1,%z2"
+  [(set_attr "type" "")])


This is a bit different than how we're handling the other min/max type
attributes

    (define_insn "*3"
  [(set (match_operand:X 0 "register_operand" "=r")
    (bitmanip_minmax:X (match_operand:X 1 "register_operand" "r")
   (match_operand:X 2 "reg_or_0_operand" "rJ")))]
  "TARGET_ZBB"
  "\t%0,%1,%z2"
  [(set_attr "type" "")])

but it looks like it ends up with the same types after all the iterators
(there's some "max vs smax" and "smax vs maxs" juggling, but IIUC it
ends up in the same place).  I think it'd be clunkier to try and use all
the same iterators, though, so

Reviewed-by: Palmer Dabbelt 

[I was wondering if we need the reversed, Jeff on the call says we
don't.  I couldn't figure out how to write a test for it.]

Right.  I had managed to convince myself that we didn't need the
reversed case.  I'm less sure now than I was earlier, but I'm also
confident that if the need arises we can trivially handle it.  At some
point there's canonicalization of the condition and that's almost
certainly what's making it hard to synthesize a testcase for the
reversed pattern.


Ya, no reason to block merging this on the reversed cases.  I just 
couldn't figure out if we should add them.



The other thing I pondered was whether or not we should support SImode
min/max on rv64.  It was critical for simplifying that abs2 routine in
x264, but I couldn't convince myself it was needed here.  So I just set
it aside and didn't mention it.


I think because we only have the DI compares that we wouldn't need the 
smaller modes?  Maybe we should add them, though, as the heuristics
around DImode-ifying compares are fragile so we might trip over some 
long-tail performance issues that would be wacky for users to reason 
about.


Either way, also doesn't seem like a reason to block this one.



jeff


Re: [PATCH] RISC-V: Add min/max patterns for ifcvt.

2024-06-03 Thread Palmer Dabbelt

On Fri, 31 May 2024 08:07:11 PDT (-0700), Robin Dapp wrote:

Hi,

ifcvt likes to emit

(set
  (if_then_else)
(ge (reg 1) (reg2))
(reg 1)
(reg 2))

which can be recognized as min/max patterns in the backend.
This patch adds such patterns and the respective iterators as well as a
test.
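
To make the equivalence concrete, here is a small C sketch of the eight
compare-and-select shapes and the min/max each one should correspond to,
following the lt/le to min and ge/gt to max mapping in the patch's code
attribute.  The function names are invented for this illustration; the
sketch says nothing about what ifcvt actually emits for a given source.

```c
#include <assert.h>

/* Each function is the C shape of (if_then_else (CMP a b) a b).  */
static long sel_lt (long a, long b) { return a < b ? a : b; }    /* min  */
static long sel_le (long a, long b) { return a <= b ? a : b; }   /* min  */
static long sel_ge (long a, long b) { return a >= b ? a : b; }   /* max  */
static long sel_gt (long a, long b) { return a > b ? a : b; }    /* max  */

static unsigned long
sel_ltu (unsigned long a, unsigned long b) { return a < b ? a : b; }   /* minu */
static unsigned long
sel_leu (unsigned long a, unsigned long b) { return a <= b ? a : b; }  /* minu */
static unsigned long
sel_geu (unsigned long a, unsigned long b) { return a >= b ? a : b; }  /* maxu */
static unsigned long
sel_gtu (unsigned long a, unsigned long b) { return a > b ? a : b; }   /* maxu */

/* Returns nonzero if every select form agrees with an independently
   computed signed or unsigned minimum/maximum of A and B.  */
static int
selects_match_minmax (long a, long b)
{
  long mn = a, mx = a;
  if (b < a)
    mn = b;
  if (b > a)
    mx = b;

  unsigned long ua = (unsigned long) a, ub = (unsigned long) b;
  unsigned long umn = ua, umx = ua;
  if (ub < ua)
    umn = ub;
  if (ub > ua)
    umx = ub;

  return sel_lt (a, b) == mn && sel_le (a, b) == mn
         && sel_ge (a, b) == mx && sel_gt (a, b) == mx
         && sel_ltu (ua, ub) == umn && sel_leu (ua, ub) == umn
         && sel_geu (ua, ub) == umx && sel_gtu (ua, ub) == umx;
}
```

The strict and non-strict comparisons give the same result because when
both operands are equal it does not matter which one is selected.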

This depends on the generic ifcvt change.


https://inbox.sourceware.org/gcc-patches/57bb6ce5-79c3-4b08-b524-e807b9ac4...@gmail.com/T/#u

in case anyone's looking for it.


Regtested on rv64gcv_zvfh_zicond_zbb_zvbb.

Regards
 Robin

gcc/ChangeLog:

* config/riscv/bitmanip.md (*_cmp_3):
New min/max ifcvt pattern.
* config/riscv/iterators.md (minu): New iterator.
* config/riscv/riscv.cc (riscv_noce_conversion_profitable_p):
Remove riscv-specific adjustment.

gcc/testsuite/ChangeLog:

* gcc.target/riscv/zbb-min-max-04.c: New test.
---
 gcc/config/riscv/bitmanip.md  | 13 +
 gcc/config/riscv/iterators.md |  8 
 gcc/config/riscv/riscv.cc |  3 --
 .../gcc.target/riscv/zbb-min-max-04.c | 47 +++
 4 files changed, 68 insertions(+), 3 deletions(-)
 create mode 100644 gcc/testsuite/gcc.target/riscv/zbb-min-max-04.c

diff --git a/gcc/config/riscv/bitmanip.md b/gcc/config/riscv/bitmanip.md
index 8769a6b818b..11102985796 100644
--- a/gcc/config/riscv/bitmanip.md
+++ b/gcc/config/riscv/bitmanip.md
@@ -547,6 +547,19 @@ (define_insn "*3"
   "\t%0,%1,%z2"
   [(set_attr "type" "")])

+;; Provide a minmax pattern for ifcvt to match.
+(define_insn "*_cmp_3"
+  [(set (match_operand:X 0 "register_operand" "=r")
+   (if_then_else:X
+   (bitmanip_minmax_cmp_op
+   (match_operand:X 1 "register_operand" "r")
+   (match_operand:X 2 "register_operand" "r"))
+   (match_dup 1)
+   (match_dup 2)))]
+  "TARGET_ZBB"
+  "\t%0,%1,%z2"
+  [(set_attr "type" "")])


This is a bit different than how we're handling the other min/max type 
attributes


   (define_insn "*3"
 [(set (match_operand:X 0 "register_operand" "=r")
   (bitmanip_minmax:X (match_operand:X 1 "register_operand" "r")
  (match_operand:X 2 "reg_or_0_operand" "rJ")))]
 "TARGET_ZBB"
 "\t%0,%1,%z2"
 [(set_attr "type" "")])

but it looks like it ends up with the same types after all the iterators 
(there's some "max vs smax" and "smax vs maxs" juggling, but IIUC it 
ends up in the same place).  I think it'd be clunkier to try and use all 
the same iterators, though, so


Reviewed-by: Palmer Dabbelt 

[I was wondering if we need the reversed, Jeff on the call says we 
don't.  I couldn't figure out how to write a test for it.]



+
 ;; Optimize the common case of a SImode min/max against a constant
 ;; that is safe both for sign- and zero-extension.
 (define_insn_and_split "*minmax"
diff --git a/gcc/config/riscv/iterators.md b/gcc/config/riscv/iterators.md
index 8a9d1986b4a..2f7be6e83c1 100644
--- a/gcc/config/riscv/iterators.md
+++ b/gcc/config/riscv/iterators.md
@@ -202,6 +202,14 @@ (define_code_iterator bitmanip_bitwise [and ior])

 (define_code_iterator bitmanip_minmax [smin umin smax umax])

+(define_code_iterator bitmanip_minmax_cmp_op [lt ltu le leu ge geu gt gtu])
+
+; Map a comparison operator to a min or max.
+(define_code_attr bitmanip_minmax_cmp_insn [(lt "min") (ltu "minu")
+   (le "min") (leu "minu")
+   (ge "max") (geu "maxu")
+   (gt "max") (gtu "maxu")])
+
 (define_code_iterator clz_ctz_pcnt [clz ctz popcount])

 (define_code_iterator bitmanip_rotate [rotate rotatert])
diff --git a/gcc/config/riscv/riscv.cc b/gcc/config/riscv/riscv.cc
index 13cd61a4a22..d17c0a260a2 100644
--- a/gcc/config/riscv/riscv.cc
+++ b/gcc/config/riscv/riscv.cc
@@ -4009,9 +4009,6 @@ riscv_noce_conversion_profitable_p (rtx_insn *seq,
 {
   struct noce_if_info riscv_if_info = *if_info;

-  riscv_if_info.original_cost -= COSTS_N_INSNS (2);
-  riscv_if_info.original_cost += insn_cost (if_info->jump, if_info->speed_p);
-
   /* Hack alert!  When `noce_try_store_flag_mask' uses `cstore4'
  to emit a conditional set operation on DImode output it comes up
  with a sequence such as:
diff --git a/gcc/testsuite/gcc.target/riscv/zbb-min-max-04.c 
b/gcc/testsuite/gcc.target/riscv/zbb-min-max-04.c
new file mode 100644
index 000..ebf1889075d
--- /dev/null
+++ b/gcc/testsuite/gcc.target/riscv/zbb-min-max-04.c
@@ -0,0 +1,47 @@
+/* { dg-do compile } */
+/* { dg-options "-O2 -march=rv64

Re: [PATCH v2] RISC-V: Introduce -mrvv-allow-misalign.

2024-05-24 Thread Palmer Dabbelt

On Fri, 24 May 2024 16:50:52 PDT (-0700), jeffreya...@gmail.com wrote:



On 5/24/24 5:43 PM, Palmer Dabbelt wrote:



I'm only reading Zicclsm as saying both scalar and vector misaligned
accesses are supported, but nothing about the performance.

I think it was in the vector docs.  It didn't say anything about
performance, just a note that scalar & vector behavior could differ.


Either way, the split naming scheme seems clearer to me.  It also avoids
getting mixed up by the no-scalar-misaligned, yes-vector-misaligned
systems if they ever show up.

So if Robin's OK with re-spinning things, let's just go that way?

Works for me.  Hopefully he's offline until Monday as it's rather late
for him :-)  So we'll pick it back up in the Tuesday meeting.


Cool, no rush on my end.



jeff


Re: [PATCH v2] RISC-V: Introduce -mrvv-allow-misalign.

2024-05-24 Thread Palmer Dabbelt

On Fri, 24 May 2024 16:41:39 PDT (-0700), jeffreya...@gmail.com wrote:



On 5/24/24 5:39 PM, Palmer Dabbelt wrote:

On Fri, 24 May 2024 16:31:48 PDT (-0700), jeffreya...@gmail.com wrote:



On 5/24/24 11:14 AM, Palmer Dabbelt wrote:

On Fri, 24 May 2024 09:19:09 PDT (-0700), Robin Dapp wrote:

We should have something in doc/invoke too, this one is going to be
tricky for users.  We'll also have to define how this interacts with
the existing -mstrict-align.


Addressed the rest in the attached v2 which also fixes tests.
I'm really not sure about -mstrict-align.  I would have hoped that
using
-mstrict-align we'd never run into any movmisalign situation but that
might be wishful thinking.  Do we need to specify an
interaction, though?  For now the new option disables movmisalign so
if we hit that despite -mstrict-align we'd still not vectorize it.


I think we just need to write it down.  I think there's two ways to
encode this: either we treat scalar and vector as independent, or we
couple them.  If we treat them independently then we end up with four
cases, it's not clear if they're all interesting.  IIUC with this patch
we'd be able to encode

Given the ISA documents them as independent, I think we should follow
suit and allow them to vary independently.


I'm only reading Zicclsm as saying both scalar and vector misaligned
accesses are supported, but nothing about the performance.

I think it was in the vector docs.  It didn't say anything about
performance, just a note that scalar & vector behavior could differ.


Either way, the split naming scheme seems clearer to me.  It also avoids 
getting mixed up by the no-scalar-misaligned, yes-vector-misaligned 
systems if they ever show up.


So if Robin's OK with re-spinning things, let's just go that way?


Seems reasonable to me.  Just having a regular naming scheme for the
scalar/vector makes it clear what we're doing, and it's not like having
the extra name for -mscalar-strict-align really costs anything.

That was my thinking -- get the names right should help avoid confusion.

Jeff


Re: [PATCH v2] RISC-V: Introduce -mrvv-allow-misalign.

2024-05-24 Thread Palmer Dabbelt

On Fri, 24 May 2024 16:31:48 PDT (-0700), jeffreya...@gmail.com wrote:



On 5/24/24 11:14 AM, Palmer Dabbelt wrote:

On Fri, 24 May 2024 09:19:09 PDT (-0700), Robin Dapp wrote:

We should have something in doc/invoke too, this one is going to be
tricky for users.  We'll also have to define how this interacts with
the existing -mstrict-align.


Addressed the rest in the attached v2 which also fixes tests.
I'm really not sure about -mstrict-align.  I would have hoped that using
-mstrict-align we'd never run into any movmisalign situation but that
might be wishful thinking.  Do we need to specify an
interaction, though?  For now the new option disables movmisalign so
if we hit that despite -mstrict-align we'd still not vectorize it.


I think we just need to write it down.  I think there's two ways to
encode this: either we treat scalar and vector as independent, or we
couple them.  If we treat them independently then we end up with four
cases, it's not clear if they're all interesting.  IIUC with this patch
we'd be able to encode

Given the ISA documents them as independent, I think we should follow
suit and allow them to vary independently.


I'm only reading Zicclsm as saying both scalar and vector misaligned 
accesses are supported, but nothing about the performance.



* -mstrict-align: Both scalar and vector misaligned accesses are
  unsupported (-mrvv-allow-misalign doesn't matter).  I'm not sure if
  there's hardware there, but given we have systems that don't support
  scalar misaligned accesses it seems reasonable to assume they'll also
  not support vector misaligned accesses.
* -mno-strict-align -mno-rvv-allow-misalign: Scalar misaligned are
  supported, vector misaligned aren't supported.  This matches our best
  theory of how the k230 and k1 behave, so it also seems reasonable to
  support.
* -mno-strict-align -mrvv-allow-misalign: Both scalar and vector
  misaligned accesses are supported.  This seems reasonable to support
  as it's how I'd hope big cores end up being designed, though again
  there's no hardware.

I'd almost lean towards -m[no-]scalar-strict-align and
-m[no-]vector-strict-align and deprecate -mstrict-align (aliasing it to
the scalar alignment option).  But I'll go with consensus here.


Seems reasonable to me.  Just having a regular naming scheme for the 
scalar/vector makes it clear what we're doing, and it's not like having 
the extra name for -mscalar-strict-align really costs anything.



The fourth case is kind of wacky: scalar misaligned is unsupported,
vector misaligned is supported.  I'm not really sure why we'd end up
with a system like that, but HW vendors do wacky things so it's kind of
hard to predict.

I've worked on one of these :-)  The thinking from the designers was
unaligned scalar access just wasn't that important, particularly with
mem* and str* using the vector rather than scalar ops.


OK then ;)


Re: [PATCH v2] RISC-V: Introduce -mrvv-allow-misalign.

2024-05-24 Thread Palmer Dabbelt

On Fri, 24 May 2024 09:19:09 PDT (-0700), Robin Dapp wrote:

We should have something in doc/invoke too, this one is going to be
tricky for users.  We'll also have to define how this interacts with
the existing -mstrict-align.


Addressed the rest in the attached v2 which also fixes tests.
I'm really not sure about -mstrict-align.  I would have hoped that using
-mstrict-align we'd never run into any movmisalign situation but that
might be wishful thinking.  Do we need to specify an
interaction, though?  For now the new option disables movmisalign so
if we hit that despite -mstrict-align we'd still not vectorize it.


I think we just need to write it down.  I think there's two ways to 
encode this: either we treat scalar and vector as independent, or we 
couple them.  If we treat them independently then we end up with four 
cases, it's not clear if they're all interesting.  IIUC with this patch 
we'd be able to encode


* -mstrict-align: Both scalar and vector misaligned accesses are 
 unsupported (-mrvv-allow-misalign doesn't matter).  I'm not sure if 
 there's hardware there, but given we have systems that don't support 
 scalar misaligned accesses it seems reasonable to assume they'll also 
 not support vector misaligned accesses.
* -mno-strict-align -mno-rvv-allow-misalign: Scalar misaligned are 
 supported, vector misaligned aren't supported.  This matches our best 
 theory of how the k230 and k1 behave, so it also seems reasonable to 
 support.
* -mno-strict-align -mrvv-allow-misalign: Both scalar and vector 
 misaligned accesses are supported.  This seems reasonable to support 
 as it's how I'd hope big cores end up being designed, though again 
 there's no hardware.


The fourth case is kind of wacky: scalar misaligned is unsupported, 
vector misaligned is supported.  I'm not really sure why we'd end up 
with a system like that, but HW vendors do wacky things so it's kind of 
hard to predict.


IMO it's fine if we're defining that as an unencodeable case,
we can always add something later.  We should just write it down so 
nobody's confused.
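
One way to write the coupling down is as a small decode function.  The
following is a sketch of the reading given in the list above, not of what
the patch implements: the struct and function names are hypothetical, and
the rule "strict-align wins over rvv-allow-misalign" is taken from the
first bullet.

```c
#include <assert.h>
#include <stdbool.h>

/* What the compiler may assume about misaligned accesses.  */
struct misalign_support
{
  bool scalar_ok;
  bool vector_ok;
};

/* Decode the two flags as described in the list: -mstrict-align turns
   both off regardless of -mrvv-allow-misalign.  */
static struct misalign_support
decode_misalign (bool strict_align, bool rvv_allow_misalign)
{
  struct misalign_support s;
  s.scalar_ok = !strict_align;
  s.vector_ok = !strict_align && rvv_allow_misalign;
  return s;
}

/* Returns true if some flag combination yields the requested support.  */
static bool
encodable (bool scalar_ok, bool vector_ok)
{
  for (int strict = 0; strict <= 1; strict++)
    for (int allow = 0; allow <= 1; allow++)
      {
        struct misalign_support s = decode_misalign (strict, allow);
        if (s.scalar_ok == scalar_ok && s.vector_ok == vector_ok)
          return true;
      }
  return false;
}
```

Under this reading encodable (false, true) is the only combination that
comes back false, which is the fourth case discussed above.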



Regtested on rv64gcv_zvfh_zvbb.

Regards
 Robin

This patch changes the default from always enabling movmisalign to
not enabling it.  It adds an option to override the default and adds
generic-ooo to the uarchs that support misaligned vector access.

It also adds a check_effective_target_riscv_v_misalign_ok to the
testsuite which enables or disables the vector misalignment tests
depending on whether the target under test can execute a misaligned
vle32.

gcc/ChangeLog:

* config/riscv/riscv-opts.h (TARGET_VECTOR_MISALIGN_SUPPORTED):
Move from here...
* config/riscv/riscv.h (TARGET_VECTOR_MISALIGN_SUPPORTED):
...to here and make dependent on uarch and rvv_allow_misalign.
* config/riscv/riscv.opt: Add -mrvv-allow-misalign.

gcc/testsuite/ChangeLog:

* lib/target-supports.exp: Add
check_effective_target_riscv_v_misalign_ok.
* gcc.dg/vect/costmodel/riscv/rvv/dynamic-lmul2-7.c: Add
-mrvv-allow-misalign.
* gcc.dg/vect/costmodel/riscv/rvv/vla_vs_vls-10.c: Ditto.
* gcc.dg/vect/costmodel/riscv/rvv/vla_vs_vls-11.c: Ditto.
* gcc.dg/vect/costmodel/riscv/rvv/vla_vs_vls-12.c: Ditto.
* gcc.dg/vect/costmodel/riscv/rvv/vla_vs_vls-8.c: Ditto.
* gcc.dg/vect/costmodel/riscv/rvv/vla_vs_vls-9.c: Ditto.
* gcc.target/riscv/rvv/autovec/vls/misalign-1.c:
---
 gcc/config/riscv/riscv-opts.h |  3 --
 gcc/config/riscv/riscv.cc | 18 ++
 gcc/config/riscv/riscv.h  |  6 
 gcc/config/riscv/riscv.opt|  5 +++
 gcc/doc/invoke.texi   |  5 +++
 .../costmodel/riscv/rvv/dynamic-lmul2-7.c |  2 +-
 .../vect/costmodel/riscv/rvv/vla_vs_vls-10.c  |  2 +-
 .../vect/costmodel/riscv/rvv/vla_vs_vls-11.c  |  2 +-
 .../vect/costmodel/riscv/rvv/vla_vs_vls-12.c  |  2 +-
 .../vect/costmodel/riscv/rvv/vla_vs_vls-8.c   |  2 +-
 .../vect/costmodel/riscv/rvv/vla_vs_vls-9.c   |  2 +-
 .../riscv/rvv/autovec/vls/misalign-1.c|  2 +-
 gcc/testsuite/lib/target-supports.exp | 34 +--
 13 files changed, 73 insertions(+), 12 deletions(-)

diff --git a/gcc/config/riscv/riscv-opts.h b/gcc/config/riscv/riscv-opts.h
index 1b2dd5757a8..f58a07abffc 100644
--- a/gcc/config/riscv/riscv-opts.h
+++ b/gcc/config/riscv/riscv-opts.h
@@ -147,9 +147,6 @@ enum rvv_vector_bits_enum {
  ? 0   
\
  : 32 << (__builtin_popcount (opts->x_riscv_zvl_flags) - 1))

-/* TODO: Enable RVV movmisalign by default for now.  */
-#define TARGET_VECTOR_MISALIGN_SUPPORTED 1
-
 /* The maximmum LMUL according to user configuration.  */
 #define TARGET_MAX_LMUL
\
   (int) (rvv_max_lmul == RVV_DYNAMIC ? RVV_M8 : rvv_max_lmul)
diff --git 

Re: [PATCH] RISC-V: Introduce -mrvv-allow-misalign.

2024-05-24 Thread Palmer Dabbelt

On Fri, 24 May 2024 07:30:20 PDT (-0700), Robin Dapp wrote:

Hi,

this patch changes the default from always enabling movmisalign to
disabling it.  It adds an option to override the default and adds
generic-ooo to the uarchs that support misaligned vector access.

It also adds a check_effective_target_riscv_v_misalign_ok to the
testsuite which enables or disables the vector misalignment tests
depending on whether the target under test can execute a misaligned
vle32.  I haven't actually tested it on a target that does not
support misaligned vector loads, though.

Regtested on rv64gcv_zvfh_zvbb.  There are a few additional
failures in the rvv testsuite.  They are caused by us overwriting
the default vectorizer flags rather than appending.  I'm going to
fix them in a subsequent patch but for now I'd rather get things
rolling.

Regards
 Robin

gcc/ChangeLog:

* config/riscv/riscv-opts.h (TARGET_VECTOR_MISALIGN_SUPPORTED):
Move from here...
* config/riscv/riscv.h (TARGET_VECTOR_MISALIGN_SUPPORTED):
...to here and make dependent on uarch and rvv_allow_misalign.
* config/riscv/riscv.opt: Add -mrvv-allow-misalign.


We should have something in doc/invoke too, this one is going to be 
tricky for users.  We'll also have to define how this interacts with the 
existing -mstrict-align.



gcc/testsuite/ChangeLog:

* lib/target-supports.exp: Add
check_effective_target_riscv_v_misalign_ok.
---
 gcc/config/riscv/riscv-opts.h |  3 ---
 gcc/config/riscv/riscv.h  |  5 
 gcc/config/riscv/riscv.opt|  5 
 gcc/testsuite/lib/target-supports.exp | 34 +--
 4 files changed, 42 insertions(+), 5 deletions(-)

diff --git a/gcc/config/riscv/riscv-opts.h b/gcc/config/riscv/riscv-opts.h
index 1b2dd5757a8..f58a07abffc 100644
--- a/gcc/config/riscv/riscv-opts.h
+++ b/gcc/config/riscv/riscv-opts.h
@@ -147,9 +147,6 @@ enum rvv_vector_bits_enum {
  ? 0   
\
  : 32 << (__builtin_popcount (opts->x_riscv_zvl_flags) - 1))

-/* TODO: Enable RVV movmisalign by default for now.  */
-#define TARGET_VECTOR_MISALIGN_SUPPORTED 1
-
 /* The maximmum LMUL according to user configuration.  */
 #define TARGET_MAX_LMUL
\
   (int) (rvv_max_lmul == RVV_DYNAMIC ? RVV_M8 : rvv_max_lmul)
diff --git a/gcc/config/riscv/riscv.h b/gcc/config/riscv/riscv.h
index d6b14c4d620..8434e5677b6 100644
--- a/gcc/config/riscv/riscv.h
+++ b/gcc/config/riscv/riscv.h
@@ -934,6 +934,11 @@ extern enum riscv_cc get_riscv_cc (const rtx use);
   || (riscv_microarchitecture == sifive_p400) \
   || (riscv_microarchitecture == sifive_p600))

+/* True if the target supports misaligned vector loads and stores.  */
+#define TARGET_VECTOR_MISALIGN_SUPPORTED \
+  (rvv_allow_misalign == 1 \
+   || riscv_microarchitecture == generic_ooo)


We should probably just stick it in a tune struct instead?  That seems 
cleaner than matching on the exact uarch.
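
Roughly what the tune-struct approach could look like, as a standalone C
sketch.  Everything here is hypothetical: the struct, its
vector_misalign_ok field, the table contents and the helper names are
invented for illustration and do not correspond to the real riscv tune
tables.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>
#include <string.h>

/* Hypothetical per-uarch tuning record carrying the capability bit.  */
struct tune_sketch
{
  const char *name;
  bool vector_misalign_ok;
};

/* Illustrative table; only generic-ooo opts in, as in the patch.  */
static const struct tune_sketch tune_table[] = {
  { "generic", false },
  { "generic-ooo", true },
};

/* Look up a tuning record by name; returns NULL if unknown.  */
static const struct tune_sketch *
find_tune (const char *name)
{
  for (size_t i = 0; i < sizeof tune_table / sizeof tune_table[0]; i++)
    if (strcmp (tune_table[i].name, name) == 0)
      return &tune_table[i];
  return NULL;
}

/* The predicate asks the record instead of comparing uarch enums, so a
   new core only needs a table entry.  The option still overrides.  */
static bool
vector_misalign_supported (const struct tune_sketch *tune,
                           bool rvv_allow_misalign)
{
  return rvv_allow_misalign || tune->vector_misalign_ok;
}
```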



+
 #define LOGICAL_OP_NON_SHORT_CIRCUIT 0

 /* Control the assembler format that we output.  */
diff --git a/gcc/config/riscv/riscv.opt b/gcc/config/riscv/riscv.opt
index 87f58332016..cff34eee8c9 100644
--- a/gcc/config/riscv/riscv.opt
+++ b/gcc/config/riscv/riscv.opt
@@ -628,3 +628,8 @@ Specify TLS dialect.
 mfence-tso
 Target Var(TARGET_FENCE_TSO) Init(1)
 Specifies whether the fence.tso instruction should be used.
+
+mrvv-allow-misalign
+Target Var(rvv_allow_misalign) Init(0)
+Allow the creation of misaligned vector loads and stores irrespective of the
+current uarch. The default is off.


IMO we should be explicit here about these being element-misaligned 
accesses, not register-misaligned accesses.  I don't want to get roped 
into handling register-misaligned accesses under the same flag, that 
would be a whole different flavor of codegen.



diff --git a/gcc/testsuite/lib/target-supports.exp 
b/gcc/testsuite/lib/target-supports.exp
index f0f6da52275..ebb908f5c8f 100644
--- a/gcc/testsuite/lib/target-supports.exp
+++ b/gcc/testsuite/lib/target-supports.exp
@@ -2034,7 +2034,7 @@ proc check_effective_target_riscv_zvfh_ok { } {
 # check if we can execute vector insns with the given hardware or
 # simulator
 set gcc_march [regsub {[[:alnum:]]*} [riscv_get_arch] ]
-if { [check_runtime ${gcc_march}_exec {
+if { [check_runtime ${gcc_march}_zvfh_exec {
int main()
{
asm ("vsetivli zero,8,e16,m1,ta,ma");
@@ -2047,6 +2047,29 @@ proc check_effective_target_riscv_zvfh_ok { } {
 return 0
 }

+# Return 1 if we can load a vector from a 1-byte aligned address.
+
+proc check_effective_target_riscv_v_misalign_ok { } {
+
+if { ![check_effective_target_riscv_v_ok] } {
+   return 0
+}
+
+set gcc_march [riscv_get_arch]
+if { [check_runtime ${gcc_march}_misalign_exec {
+ int main() {
+ unsigned char a[16]
+   = 

Re: RISC-V: Fix round_32.c test on RV32

2024-05-22 Thread Palmer Dabbelt

On Wed, 22 May 2024 12:02:26 PDT (-0700), jeffreya...@gmail.com wrote:



On 5/22/24 12:15 PM, Palmer Dabbelt wrote:

On Wed, 22 May 2024 11:01:16 PDT (-0700), jeffreya...@gmail.com wrote:



On 5/22/24 6:47 AM, Jivan Hakobyan wrote:

After 8367c996e55b2 commit several checks on round_32.c test started to
fail.
The reason is that we prevent rounding DF->SI->DF on RV32 and instead of
a conversion sequence we get calls to appropriate library functions.


gcc/testsuite/ChangeLog:
         * testsuite/gcc.target/riscv/round_32.c: Fixed test

I wonder if this test even makes sense for rv32 anymore given we can't
do a DF->DI as a single instruction and DF->SI is going to give
incorrect results.  So the underlying optimization to improve those
rounding cases just doesn't apply to DF mode objects for rv32.

Thoughts?


Unless I'm missing something, we should still be able to do the float
roundings on rv32?

I initially thought that as well.  The problem is we don't have a DF->DI
conversion instruction for rv32.  We can't use DF->SI as the range of
representable values is wrong.
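
The range problem can be illustrated in portable C.  This is a sketch of
the idea only, using truncation as the rounding flavor; the function names
are made up, it ignores the sign of zero, and it is not the sequence GCC
emits.  It assumes that a double with magnitude of at least 2^52 has no
fractional bits, so only smaller values need the integer round trip.

```c
#include <assert.h>
#include <stdbool.h>
#include <stdint.h>

/* Truncate X by a round trip through a 32-bit integer.  Only valid when
   the truncated value fits in int32_t; returns false otherwise.  */
static bool
trunc_via_int32 (double x, double *out)
{
  if (!(x > -2147483649.0 && x < 2147483648.0))
    return false;               /* Out of range, or NaN.  */
  *out = (double) (int32_t) x;
  return true;
}

/* Truncate X by a round trip through a 64-bit integer.  Values with
   magnitude >= 2^52 are already integral and are returned unchanged,
   as is NaN.  */
static double
trunc_via_int64 (double x)
{
  if (!(x > -4503599627370496.0 && x < 4503599627370496.0))
    return x;
  return (double) (int64_t) x;
}
```

A value such as 3000000000.75 still has a fractional part but does not fit
in 32 bits, so the 32-bit round trip cannot handle it while the 64-bit one
can.  That gap between 2^31 and 2^52 is what the initial range test would
have to exclude when only the 32-bit conversion is available.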


Ya, right.  I guess we'd need to be calling roundf(), not round(), for 
those?  So maybe we should adjust the tests to do that?



I think with Zfa we'd also have testable sequences for the double/double
and float/float roundings, which could be useful to test.  I'm not
entirely sure there, though, as I always get a bit lost in which FP
rounding flavors map down.

Zfa is a different story as it has instructions with the proper
semantics ;-)  We'd just emit those new instructions and wouldn't have
to worry about the initial range test.


and I guess that'd just be an entirely different set of scan-assembly 
sets than round_32 or round_64, so maybe it's not a reason to keep these 
around.



I'd also kicked off some run trying to promote these to executable
tests.   IIRC it was just DG stuff (maybe just adding a `dg-do run`?)
but I don't know where I stashed the results...

Not a bad idea, particularly if we test the border cases.


Ya, makes sense -- I guess the current values aren't that exciting for 
execution, but we could just add some more interesting ones...



jeff


Re: RISC-V: Fix round_32.c test on RV32

2024-05-22 Thread Palmer Dabbelt

On Wed, 22 May 2024 11:01:16 PDT (-0700), jeffreya...@gmail.com wrote:



On 5/22/24 6:47 AM, Jivan Hakobyan wrote:

After 8367c996e55b2 commit several checks on round_32.c test started to
fail.
The reason is that we prevent rounding DF->SI->DF on RV32 and instead of
a conversion sequence we get calls to appropriate library functions.


gcc/testsuite/ChangeLog:
         * testsuite/gcc.target/riscv/round_32.c: Fixed test

I wonder if this test even makes sense for rv32 anymore given we can't
do a DF->DI as a single instruction and DF->SI is going to give
incorrect results.  So the underlying optimization to improve those
rounding cases just doesn't apply to DF mode objects for rv32.

Thoughts?


Unless I'm missing something, we should still be able to do the 
float roundings on rv32?


I think with Zfa we'd also have testable sequences for the double/double 
and float/float roundings, which could be useful to test.  I'm not 
entirely sure there, though, as I always get a bit lost in which FP 
rounding flavors map down.


I'd also kicked off some run trying to promote these to executable 
tests.   IIRC it was just DG stuff (maybe just adding a `dg-do run`?) 
but I don't know where I stashed the results...



Jeff


Re: [PATCH] RISC-V: Split vwadd.wx and vwsub.wx and add helpers.

2024-05-17 Thread Palmer Dabbelt

On Fri, 17 May 2024 15:37:43 PDT (-0700), juzhe.zh...@rivai.ai wrote:

I think it should be backported to GCC-14 since it is a bug.


Seems reasonable to me -- I guess in theory those extended scalar 
patterns aren't bug fixes and we should split them out, but I don't 
think it's all that big of a deal.  We'd likely just backport them to 
the performance branch anyway, so it's essentially the same on my end.






juzhe.zh...@rivai.ai
 
From: Robin Dapp

Date: 2024-05-17 23:24
To: gcc-patches
CC: palmer; Kito Cheng; juzhe.zh...@rivai.ai; jeffreyalaw; rdapp.gcc
Subject: [PATCH] RISC-V: Split vwadd.wx and vwsub.wx and add helpers.
Hi,
 
vwadd.wx and vwsub.wx have the same problem vfwadd.wf had.  This patch

splits the insn pattern in the same way vfwadd.wf was split.
 
It also adds two patterns to recognize extended scalars.  In practice

those do not provide a lot of improvement over what we already have but
in some instances we can get rid of redundant extensions.  If somebody
considers the patterns excessive, I'd be open to not add them.
 
Regtested on rv64gcv_zvfh_zvbb.
 
Regards

Robin
 
gcc/ChangeLog:
 
* config/riscv/vector.md: Split vwadd.wx/vwsub.wx pattern and

add extended_scalar patterns.
 
gcc/testsuite/ChangeLog:
 
* gcc.target/riscv/rvv/base/pr115068.c: Add vwadd.wx/vwsub.wx

tests.
* gcc.target/riscv/rvv/base/pr115068-run.c: Include pr115068.c.
* gcc.target/riscv/rvv/base/vwaddsub-1.c: New test.
---
gcc/config/riscv/vector.md| 62 ---
.../gcc.target/riscv/rvv/base/pr115068-run.c  | 24 +--
.../gcc.target/riscv/rvv/base/pr115068.c  | 26 
.../gcc.target/riscv/rvv/base/vwaddsub-1.c| 47 ++
4 files changed, 127 insertions(+), 32 deletions(-)
create mode 100644 gcc/testsuite/gcc.target/riscv/rvv/base/vwaddsub-1.c
 
diff --git a/gcc/config/riscv/vector.md b/gcc/config/riscv/vector.md

index 107914afa3a..248461302dd 100644
--- a/gcc/config/riscv/vector.md
+++ b/gcc/config/riscv/vector.md
@@ -3900,27 +3900,71 @@ (define_insn 
"@pred_single_widen_add"
(set_attr "mode" "")])
(define_insn "@pred_single_widen__scalar"
-  [(set (match_operand:VWEXTI 0 "register_operand"   "=vr,   
vr")
+  [(set (match_operand:VWEXTI 0 "register_operand" "=vd,vd, vr, 
vr")
(if_then_else:VWEXTI
  (unspec:
- [(match_operand: 1 "vector_mask_operand"   "vmWc1,vmWc1")
-  (match_operand 5 "vector_length_operand"  "   rK,   rK")
-  (match_operand 6 "const_int_operand"  "i,i")
-  (match_operand 7 "const_int_operand"  "i,i")
-  (match_operand 8 "const_int_operand"  "i,i")
+ [(match_operand: 1 "vector_mask_operand"" vm,vm,Wc1,Wc1")
+  (match_operand 5 "vector_length_operand"  " rK,rK, rK, rK")
+  (match_operand 6 "const_int_operand"  "  i, i,  i,  i")
+  (match_operand 7 "const_int_operand"  "  i, i,  i,  i")
+  (match_operand 8 "const_int_operand"  "  i, i,  i,  i")
 (reg:SI VL_REGNUM)
 (reg:SI VTYPE_REGNUM)] UNSPEC_VPREDICATE)
  (plus_minus:VWEXTI
- (match_operand:VWEXTI 3 "register_operand" "   vr,   vr")
+ (match_operand:VWEXTI 3 "register_operand" " vr,vr, vr, vr")
(any_extend:VWEXTI
  (vec_duplicate:
- (match_operand: 4 "reg_or_0_operand"   "   rJ,   rJ"
-   (match_operand:VWEXTI 2 "vector_merge_operand"   "   vu,0")))]
+ (match_operand: 4 "reg_or_0_operand"   " rJ,rJ, rJ, rJ"
+   (match_operand:VWEXTI 2 "vector_merge_operand"   " vu, 0, vu,  
0")))]
   "TARGET_VECTOR"
   "vw.wx\t%0,%3,%z4%p1"
   [(set_attr "type" "vi")
(set_attr "mode" "")])
+(define_insn "@pred_single_widen_add_extended_scalar"
+  [(set (match_operand:VWEXTI 0 "register_operand" "=vd,vd, vr, 
vr")
+ (if_then_else:VWEXTI
+   (unspec:
+ [(match_operand: 1 "vector_mask_operand"" vm,vm,Wc1,Wc1")
+  (match_operand 5 "vector_length_operand"  " rK,rK, rK, rK")
+  (match_operand 6 "const_int_operand"  "  i, i,  i,  i")
+  (match_operand 7 "const_int_operand"  "  i, i,  i,  i")
+  (match_operand 8 "const_int_operand"  "  i, i,  i,  i")
+  (reg:SI VL_REGNUM)
+  (reg:SI VTYPE_REGNUM)] UNSPEC_VPREDICATE)
+   (plus:VWEXTI
+ (vec_duplicate:VWEXTI
+   (any_extend:
+ (match_operand: 4 "reg_or_0_operand"   " rJ,rJ, rJ, rJ")))
+ (match_operand:VWEXTI 3 "register_operand" " vr,vr, vr, vr"))
+   (match_operand:VWEXTI 2 "vector_merge_operand"   " vu, 0, vu,  
0")))]
+  "TARGET_VECTOR"
+  "vwadd.wx\t%0,%3,%z4%p1"
+  [(set_attr "type" "viwalu")
+   (set_attr "mode" "")])
+
+(define_insn "@pred_single_widen_sub_extended_scalar"
+  [(set (match_operand:VWEXTI 0 "register_operand" "=vd,vd, vr, 
vr")
+ (if_then_else:VWEXTI
+   (unspec:
+ 

Re: [committed][wwwdocs] gcc-12/changes.html: Document RISC-V changes

2024-05-17 Thread Palmer Dabbelt

On Fri, 17 May 2024 14:30:49 PDT (-0700), ger...@pfeifer.com wrote:

On Thu, 28 Apr 2022, Kito Cheng wrote:

---
 htdocs/gcc-12/changes.html | 13 -

:

+New ISA extension support for vector and scalar crypto was added, only
+   support architecture testing marco and -march= 
parsing.


I realized I'm not sure I understand what the second part ("only
support...") means.

That for the time being (back then) only the macros and -march parsing
were supported?


Ya, I guess it's kind of an odd phrasing.  Maybe it should be something
like


   The vector and scalar crypto extensions are now accepted in ISA
   strings via the -march argument.  Note that enabling these
   extensions will only set the corresponding feature test macros and
   enable assembler support; the compiler does not yet generate
   binaries with the instructions added in these extensions.
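
To make the "feature test macros" part concrete, here is a minimal sketch of how user code might check for one of these extensions at compile time.  The macro name `__riscv_zknd` is an assumption based on the usual `__riscv_<extension>` naming convention and is not taken from this thread; the function name is hypothetical.

```c
/* Sketch only: report whether the compiler was told (via -march) that
   the scalar crypto AES-decrypt extension is available.  The macro name
   __riscv_zknd is assumed from the __riscv_<ext> convention.  */
int have_zknd(void)
{
#ifdef __riscv_zknd
  return 1;   /* -march included zknd; the macro is defined */
#else
  return 0;   /* extension not enabled, or not a RISC-V target */
#endif
}
```

On a compiler with only macro and -march support, a result of 1 says the extension was requested; it does not mean the compiler itself will emit the new instructions.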



Gerald


[PATCH gcc-13] Fix RISC-V missing stack tie

2024-05-16 Thread Palmer Dabbelt
From: Jeff Law 

As some of you know, Raphael has been working on stack-clash support for the
RISC-V port.  A little while ago Florian reached out to us with an issue where
glibc was failing its smoke test due to referencing an unallocated stack slot.

Without diving into the code in detail I (incorrectly) concluded it was a
problem with the fallback of using Ada's stack-check paths due to not having
stack-clash support.

Once enough stack-clash bits were ready I had Raphael review the code generated
for Florian's test and we concluded that the original case from Florian was just
wrong irrespective of stack clash/stack check.  While Raphael's stack-clash
work will indirectly fix Florian's case, it really should also work without
stack-clash.

In particular this code was called out by valgrind:

> 0003cb5e :
> __GI___realpath():
>    3cb5e:   81010113    addi    sp,sp,-2032
>    3cb62:   7d313423    sd      s3,1992(sp)
>    3cb66:   79fd        lui     s3,0xf
>    3cb68:   7e813023    sd      s0,2016(sp)
>    3cb6c:   7c913c23    sd      s1,2008(sp)
>    3cb70:   7f010413    addi    s0,sp,2032
>    3cb74:   35098793    addi    a5,s3,848 # f350 <__libc_initial+0xffe8946a>
>    3cb78:   74fd        lui     s1,0xf
>    3cb7a:   008789b3    add     s3,a5,s0
>    3cb7e:   f9048793    addi    a5,s1,-112 # ef90 <__libc_initial+0xffe890aa>
>    3cb82:   008784b3    add     s1,a5,s0
>    3cb86:   77fd        lui     a5,0xf
>    3cb88:   7d413023    sd      s4,1984(sp)
>    3cb8c:   7b513c23    sd      s5,1976(sp)
>    3cb90:   7e113423    sd      ra,2024(sp)
>    3cb94:   7d213823    sd      s2,2000(sp)
>    3cb98:   7b613823    sd      s6,1968(sp)
>    3cb9c:   7b713423    sd      s7,1960(sp)
>    3cba0:   7b813023    sd      s8,1952(sp)
>    3cba4:   79913c23    sd      s9,1944(sp)
>    3cba8:   79a13823    sd      s10,1936(sp)
>    3cbac:   79b13423    sd      s11,1928(sp)
>    3cbb0:   34878793    addi    a5,a5,840 # f348 <__libc_initial+0xffe89462>
>    3cbb4:   4713        li      a4,1024
>    3cbb8:   00132a17    auipc   s4,0x132
>    3cbbc:   ae0a3a03    ld      s4,-1312(s4) # 16e698 <__stack_chk_guard>
>    3cbc0:   01098893    addi    a7,s3,16
>    3cbc4:   42098693    addi    a3,s3,1056
>    3cbc8:   b8040a93    addi    s5,s0,-1152
>    3cbcc:   97a2        add     a5,a5,s0
>    3cbce:   000a3603    ld      a2,0(s4)
>    3cbd2:   f8c43423    sd      a2,-120(s0)
>    3cbd6:   4601        li      a2,0
>    3cbd8:   3d14b023    sd      a7,960(s1)
>    3cbdc:   3ce4b423    sd      a4,968(s1)
>    3cbe0:   7cd4b823    sd      a3,2000(s1)
>    3cbe4:   7ce4bc23    sd      a4,2008(s1)
>    3cbe8:   b7543823    sd      s5,-1168(s0)
>    3cbec:   b6e43c23    sd      a4,-1160(s0)
>    3cbf0:   e38c        sd      a1,0(a5)
>    3cbf2:   b0010113    addi    sp,sp,-1280

In particular note the store at 0x3cbd8.  That's hitting (s1 + 960). If you
chase the values around, you'll find it's a bit more than 1k into unallocated
stack space.  It's also worth noting the final stack adjustment at 0x3cbf2.
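
The "chase the values around" step can be written out as plain arithmetic.  The sketch below is not part of the patch.  It assumes that `lui s1,0xf` in the listing is the compressed form whose immediate is sign-extended, so that it materializes -4096 (the encoding 74fd is consistent with that reading).  All offsets are relative to the stack pointer on function entry, and the names are hypothetical.

```c
/* Offsets relative to sp on entry to realpath (call that value 0).  */
enum {
  FIRST_ADJUST = -2032,  /* 3cb5e: addi sp,sp,-2032                 */
  S0_FROM_SP   =  2032,  /* 3cb70: addi s0,sp,2032, so s0 == entry  */
  LUI_S1       = -4096,  /* 3cb78: lui s1,... (assumed sign-extended) */
  ADDI_A5_S1   =  -112,  /* 3cb7e: addi a5,s1,-112                  */
  STORE_DISP   =   960,  /* 3cbd8: sd a7,960(s1)                    */
  FINAL_ADJUST = -1280   /* 3cbf2: addi sp,sp,-1280                 */
};

/* sp at the time of the store at 0x3cbd8 (only the first adjustment
   has happened so far).  */
long sp_at_store(void) { return FIRST_ADJUST; }

/* s1 after 3cb82 (add s1,a5,s0), then the effective store address.  */
long s1_value(void)      { return LUI_S1 + ADDI_A5_S1 + (FIRST_ADJUST + S0_FROM_SP); }
long store_address(void) { return s1_value() + STORE_DISP; }

/* How far below the current sp the store lands.  */
long bytes_below_sp(void) { return sp_at_store() - store_address(); }

/* sp once the final adjustment at 0x3cbf2 has executed.  */
long sp_after_final_adjust(void) { return sp_at_store() + FINAL_ADJUST; }
```

Under those assumptions the store lands 1216 bytes below the stack pointer that is live at 0x3cbd8, which matches "a bit more than 1k", and it only becomes part of the allocated frame once the adjustment at 0x3cbf2 has run.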

While I haven't reproduced Florian's code exactly, I was able to get reasonably
close and verify my suspicion that everything was fine before sched2 and
incorrect after sched2.  It was also obvious at that point what had gone wrong
-- we were missing a stack tie after the final stack pointer adjustment.

This patch adds the missing stack tie.

While not technically a regression, I shudder at the thought of chasing one of
these issues down again in the wild.  Been there, done that.

Regression tested on rv64gc.  Verified the scheduler no longer mucked up
realpath by hand.  Pushing to the trunk.

gcc/
* config/riscv/riscv.cc (riscv_expand_prologue): Add missing stack
tie for scalable and final stack adjustment if needed.

Co-authored-by: Raphael Zinsly 

(cherry picked from commit c65046ff2ef0a9a46e59bc0b3369b2d226f6a239)
---
I've only build tested this one, but it's tripping up some of the Fedora
folks here https://bugzilla.redhat.com/show_bug.cgi?id=2242327 so I
figured it's worth backporting.
---
 gcc/config/riscv/riscv.cc | 17 -
 1 file changed, 16 insertions(+), 1 deletion(-)

diff --git a/gcc/config/riscv/riscv.cc b/gcc/config/riscv/riscv.cc
index 

[PATCH] RISC-V: Implement -m{,no}fence-tso

2024-05-14 Thread Palmer Dabbelt
Some processors from T-Head don't implement the `fence.tso` instruction
natively and instead trap to firmware.  This breaks some users who
haven't yet updated the firmware and one could imagine it breaking users
who are trying to build firmware if they're using the C memory model.

So just add an option to disable emitting it, in a similar fashion to
how we allow users to forbid other instructions.

gcc/ChangeLog:

* config/riscv/riscv.opt: Add -mno-fence-tso.
* config/riscv/sync-rvwmo.md (mem_thread_fence_rvwmo): Respect
-mno-fence-tso.
* doc/invoke.texi (RISC-V): Document -mno-fence-tso.

Link: https://bugs.debian.org/cgi-bin/bugreport.cgi?bug=1070959
---
I've just smoke tested this one, but

void func(void) { __atomic_thread_fence(__ATOMIC_ACQ_REL); }

generates `fence.tso` without the argument and `fence rw,rw` with
`-mno-fence-tso`, so it seems to be at least mostly there.  I figured
I'd just send it up for comments before putting together the DG bits:
it's kind of a pain to carry around these workarounds for unimplemented
instructions, but it's in HW so there's not much we can do about that.
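
As a rough illustration of the selection logic (a standalone sketch, not the machine-description code itself), the mapping from memory model to fence instruction could be modelled as below.  The enum and function names are hypothetical stand-ins, and only the cases visible in the sync-rvwmo.md hunk of this patch are modelled.

```c
#include <stddef.h>

/* Hypothetical stand-in for GCC's memory model enumeration.  */
enum model { MODEL_ACQUIRE, MODEL_RELEASE, MODEL_ACQ_REL, MODEL_SEQ_CST };

/* Return the fence to emit; use_fence_tso plays the role of
   TARGET_FENCE_TSO (1 by default, 0 with -mno-fence-tso).  */
const char *thread_fence_insn(enum model m, int use_fence_tso)
{
  switch (m) {
  case MODEL_SEQ_CST:
    return "fence\trw,rw";
  case MODEL_ACQ_REL:
    /* Fall back to the stronger full fence when fence.tso is disabled.  */
    return use_fence_tso ? "fence.tso" : "fence\trw,rw";
  case MODEL_ACQUIRE:
    return "fence\tr,rw";
  default:
    return NULL;  /* other models are not shown in the quoted hunk */
  }
}
```

The fallback is safe because `fence rw,rw` orders everything `fence.tso` does and more, at some cost in performance.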
---
 gcc/config/riscv/riscv.opt | 4 
 gcc/config/riscv/sync-rvwmo.md | 2 +-
 gcc/doc/invoke.texi| 8 
 3 files changed, 13 insertions(+), 1 deletion(-)

diff --git a/gcc/config/riscv/riscv.opt b/gcc/config/riscv/riscv.opt
index 1252834aec5..fb8dac3df80 100644
--- a/gcc/config/riscv/riscv.opt
+++ b/gcc/config/riscv/riscv.opt
@@ -622,3 +622,7 @@ Enum(tls_type) String(desc) Value(TLS_DESCRIPTORS)
 mtls-dialect=
 Target RejectNegative Joined Enum(tls_type) Var(riscv_tls_dialect) 
Init(TLS_TRADITIONAL) Save
 Specify TLS dialect.
+
+mfence-tso
+Target Var(TARGET_FENCE_TSO) Init(1)
+Specifies whether the fence.tso instruction should be used.
diff --git a/gcc/config/riscv/sync-rvwmo.md b/gcc/config/riscv/sync-rvwmo.md
index d4fd26069f7..e639a1e2392 100644
--- a/gcc/config/riscv/sync-rvwmo.md
+++ b/gcc/config/riscv/sync-rvwmo.md
@@ -33,7 +33,7 @@ (define_insn "mem_thread_fence_rvwmo"
 if (model == MEMMODEL_SEQ_CST)
return "fence\trw,rw";
 else if (model == MEMMODEL_ACQ_REL)
-   return "fence.tso";
+   return TARGET_FENCE_TSO ? "fence.tso" : "fence\trw,rw";
 else if (model == MEMMODEL_ACQUIRE)
return "fence\tr,rw";
 else if (model == MEMMODEL_RELEASE)
diff --git a/gcc/doc/invoke.texi b/gcc/doc/invoke.texi
index ddcd5213f06..90b329b674b 100644
--- a/gcc/doc/invoke.texi
+++ b/gcc/doc/invoke.texi
@@ -1243,6 +1243,7 @@ See RS/6000 and PowerPC Options.
 -mplt  -mno-plt
 -mabi=@var{ABI-string}
 -mfdiv  -mno-fdiv
+-mfence-tso  -mno-fence-tso
 -mdiv  -mno-div
 -misa-spec=@var{ISA-spec-string}
 -march=@var{ISA-string}
@@ -30384,6 +30385,13 @@ Do or don't use hardware floating-point divide and 
square root instructions.
 This requires the F or D extensions for floating-point registers.  The default
 is to use them if the specified architecture has these instructions.
 
+@opindex mfence-tso
+@item -mfence-tso
+@itemx -mno-fence-tso
+Do or don't use the @samp{fence.tso} instruction, which is unimplemented on
+some processors (including those from T-Head).  If the @samp{fence.tso}
+instruction is not available then a stronger fence will be used instead.
+
 @opindex mdiv
 @item -mdiv
 @itemx -mno-div
-- 
2.45.0



Re: Follow up #1 (was Re: [PATCH v2 1/2] RISC-V: avoid LUI based const materialization ... [part of PR/106265])

2024-05-14 Thread Palmer Dabbelt

On Mon, 13 May 2024 16:08:21 PDT (-0700), Vineet Gupta wrote:



On 5/13/24 15:47, Jeff Law wrote:

On 5/13/24 11:49, Vineet Gupta wrote:

  500.perlbench_r-0 |  1,214,534,029,025 | 1,212,887,959,387 |
  500.perlbench_r-1 |740,383,419,739 |   739,280,308,163 |
  500.perlbench_r-2 |692,074,638,817 |   691,118,734,547 |
  502.gcc_r-0   |190,820,141,435 |   190,857,065,988 |
  502.gcc_r-1   |225,747,660,839 |   225,809,444,357 | <- -0.02%
  502.gcc_r-2   |220,370,089,641 |   220,406,367,876 | <- -0.03%
  502.gcc_r-3   |179,111,460,458 |   179,135,609,723 | <- -0.02%
  502.gcc_r-4   |219,301,546,340 |   219,320,416,956 | <- -0.01%
  503.bwaves_r-0|278,733,324,691 |   278,733,323,575 | <- -0.01%
  503.bwaves_r-1|442,397,521,282 |   442,397,519,616 |
  503.bwaves_r-2|344,112,218,206 |   344,112,216,760 |
  503.bwaves_r-3|417,561,469,153 |   417,561,467,597 |
  505.mcf_r |669,319,257,525 |   669,318,763,084 |
  507.cactuBSSN_r   |  2,852,767,394,456 | 2,564,736,063,742 | <+ 10.10%

The small gcc regression seems like a tooling issue of some sort.
Looking at the topblocks, the insn sequences are exactly the same, only
the counts differ and it's not obvious why.
Here's gcc_r-1.


 > Block 0 @ 0x170ca, 12 insns, 87854493 times, 0.47%:

 000170ca :
    170ca:    7179        add    sp,sp,-48
    170cc:    ec26        sd    s1,24(sp)
    170ce:    e84a        sd    s2,16(sp)
    170d0:    e44e        sd    s3,8(sp)
    170d2:    f406        sd    ra,40(sp)
    170d4:    f022        sd    s0,32(sp)
    170d6:    84aa        mv    s1,a0
    170d8:    03200913      li    s2,50
    170dc:    03d00993      li    s3,61
    170e0:    8526        mv    a0,s1
    170e2:    001cd097      auipc    ra,0x1cd
    170e6:    bac080e7      jalr    -1108(ra) # 1e3c8e
 

 > Block 1 @ 0x706d0a, 3 insns, 274713936 times, 0.37%:
 >  Block 2 @ 0x1e3c8e, 9 insns, 88507109 times, 0.35%:
 ...

 < Block 0 @ 0x170ca, 12 insns, 87869602 times, 0.47%:
 < Block 1 @ 0x706d42, 3 insns, 274608893 times, 0.36%:
 < Block 2 @ 0x1e3c94, 9 insns, 88526354 times, 0.35%:


FWIW, Greg has been looking at some of this internally and found some
issues in the bbv tooling, but I wish all of this were shared upstream
(QEMU bbv plugin) so people could compare notes and not discover and fix
the same issues over and over again.

Yea, we all meant to coordinate on those plugins.  The one we've got had
some problems with hash collisions and when there's a hash collision it
just produces total junk data.  I chased a few of these down and fixed
them about a year ago.

The other thing is qemu will split up blocks based on its internal
notion of a translation page.   So if you're looking at block level data
you'll stumble over that as well.  This aspect is the most troublesome
problem I'm aware of right now.


And these two are exactly what Greg fixed, among others :-)


IIRC the plan was for Jeff to send his version to the QEMU lists so we 
can talk about it over there.  Do you want us to just send Greg's 
version instead?  It's all based on the same original patch from the 
QEMU lists, just with possibly-different set of fixes.




-Vineet


[PATCH gcc-13-backport] RISCV: Add -m(no)-omit-leaf-frame-pointer support.

2024-05-08 Thread Palmer Dabbelt
From: Yanzhang Wang 

gcc/ChangeLog:

* config/riscv/riscv.cc (riscv_save_reg_p): Save ra for leaf
when enabling -mno-omit-leaf-frame-pointer
(riscv_option_override): Override omit-frame-pointer.
(riscv_frame_pointer_required): Save s0 for non-leaf function
(TARGET_FRAME_POINTER_REQUIRED): Override definition
* config/riscv/riscv.opt: Add option support.

gcc/testsuite/ChangeLog:

* gcc.target/riscv/omit-frame-pointer-1.c: New test.
* gcc.target/riscv/omit-frame-pointer-2.c: New test.
* gcc.target/riscv/omit-frame-pointer-3.c: New test.
* gcc.target/riscv/omit-frame-pointer-4.c: New test.
* gcc.target/riscv/omit-frame-pointer-test.c: New test.

Signed-off-by: Yanzhang Wang 
(cherry picked from commit 39663298b5934831a0125e12f113ebd83248c3be)
---
I haven't tested this (just an all-gcc build), but I figured I'd just
send it now as it's kind of a grey area for backports: the flag itself
is a new feature, but it also fixes a compatibility issue with the psABI
-- which itself is a grey area, as the psABI change was a retrofit and is
marked as optional.  I'd test it before pushing it, but this is one of
those things where I'm not really sure what the backporting rules
indicate we should do.

There's more discussion on this LKML thread:
https://lore.kernel.org/linux-riscv/527dd4d8-f1e5-4581-b1e3-aa315fea8...@sifive.com/T/#mf15ccc659b7b8b838b88959fbea460210875eb9c

That also has a much smaller fix, but having the whole argument seems
like a nicer user interface to me -- then users who really want
compatibility with the psABI's section on frame records can just ask for
it directly (via the odd spelling `-fno-omit-frame-pointer
-mno-omit-leaf-frame-pointer`, but too late to change that).
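
For readers who want the rule without reading the diff, here is a small model of the return-address condition that the riscv_save_reg_p hunk in this patch changes.  It is a sketch: the struct and function names are hypothetical, and the real function checks other conditions as well that are not modelled here.

```c
#include <stdbool.h>

/* Hypothetical stand-in for the per-function state GCC consults.  */
struct fn_state {
  bool frame_pointer_needed;     /* a frame pointer is in use        */
  bool is_leaf;                  /* the function makes no calls      */
  bool calls_eh_return;          /* uses __builtin_eh_return         */
  bool omit_leaf_frame_pointer;  /* -momit-leaf-frame-pointer active */
};

/* Mirrors only the condition touched by the patch: ra is saved for
   eh_return users, and additionally for leaf functions that keep a
   frame pointer when -mno-omit-leaf-frame-pointer is given.  */
bool must_save_ra(const struct fn_state *f)
{
  bool keep_leaf_ra = f->frame_pointer_needed && f->is_leaf
                      && !f->omit_leaf_frame_pointer;
  return f->calls_eh_return || keep_leaf_ra;
}
```

With `-fno-omit-frame-pointer -mno-omit-leaf-frame-pointer`, a leaf function therefore stores both s0 and ra, giving the complete frame record the psABI section describes.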

Thoughts on this for 13?

We'd probably also want it all the way back to 11, but I assume that's
going to be the same discussion.
---
 gcc/config/riscv/riscv.cc | 34 ++-
 gcc/config/riscv/riscv.opt|  4 +++
 .../gcc.target/riscv/omit-frame-pointer-1.c   |  7 
 .../gcc.target/riscv/omit-frame-pointer-2.c   |  7 
 .../gcc.target/riscv/omit-frame-pointer-3.c   |  7 
 .../gcc.target/riscv/omit-frame-pointer-4.c   |  7 
 .../riscv/omit-frame-pointer-test.c   | 13 +++
 7 files changed, 78 insertions(+), 1 deletion(-)
 create mode 100644 gcc/testsuite/gcc.target/riscv/omit-frame-pointer-1.c
 create mode 100644 gcc/testsuite/gcc.target/riscv/omit-frame-pointer-2.c
 create mode 100644 gcc/testsuite/gcc.target/riscv/omit-frame-pointer-3.c
 create mode 100644 gcc/testsuite/gcc.target/riscv/omit-frame-pointer-4.c
 create mode 100644 gcc/testsuite/gcc.target/riscv/omit-frame-pointer-test.c

diff --git a/gcc/config/riscv/riscv.cc b/gcc/config/riscv/riscv.cc
index cefd3b7b2b2..e8572f8739d 100644
--- a/gcc/config/riscv/riscv.cc
+++ b/gcc/config/riscv/riscv.cc
@@ -408,6 +408,10 @@ static const struct riscv_tune_info 
riscv_tune_info_table[] = {
 #include "riscv-cores.def"
 };
 
+/* Global variable to distinguish whether we should save and restore s0/fp for
+   function.  */
+static bool riscv_save_frame_pointer;
+
 void riscv_frame_info::reset(void)
 {
   total_size = 0;
@@ -4786,7 +4790,11 @@ riscv_save_reg_p (unsigned int regno)
   if (regno == HARD_FRAME_POINTER_REGNUM && frame_pointer_needed)
 return true;
 
-  if (regno == RETURN_ADDR_REGNUM && crtl->calls_eh_return)
+  /* Need not to use ra for leaf when frame pointer is turned off by option
+ whatever the omit-leaf-frame's value.  */
+  bool keep_leaf_ra = frame_pointer_needed && crtl->is_leaf
+&& !TARGET_OMIT_LEAF_FRAME_POINTER;
+  if (regno == RETURN_ADDR_REGNUM && (crtl->calls_eh_return || keep_leaf_ra))
 return true;
 
   /* If this is an interrupt handler, then must save extra registers.  */
@@ -6316,6 +6324,21 @@ riscv_option_override (void)
   if (flag_pic)
 riscv_cmodel = CM_PIC;
 
+  /* We need to save the fp with ra for non-leaf functions with no fp and ra
+ for leaf functions while no-omit-frame-pointer with
+ omit-leaf-frame-pointer.  The x_flag_omit_frame_pointer has the first
+ priority to determine whether the frame pointer is needed.  If we do not
+ override it, the fp and ra will be stored for leaf functions, which is not
+ our wanted.  */
+  riscv_save_frame_pointer = false;
+  if (TARGET_OMIT_LEAF_FRAME_POINTER_P (global_options.x_target_flags))
+{
+  if (!global_options.x_flag_omit_frame_pointer)
+   riscv_save_frame_pointer = true;
+
+  global_options.x_flag_omit_frame_pointer = 1;
+}
+
   /* We get better code with explicit relocs for CM_MEDLOW, but
  worse code for the others (for now).  Pick the best default.  */
   if ((target_flags_explicit & MASK_EXPLICIT_RELOCS) == 0)
@@ -7235,6 +7258,12 @@ riscv_lshift_subword (machine_mode mode, rtx value, rtx 
shift,
  gen_lowpart (QImode, shift)));

Re: [committed] [RISC-V] Allow uarchs to set TARGET_OVERLAP_OP_BY_PIECES_P

2024-05-07 Thread Palmer Dabbelt
On Tue, 07 May 2024 14:18:36 PDT (-0700), Jeff Law wrote:
> This is almost exclusively work from the VRULL team.
>
> As we've discussed in the Tuesday meeting in the past, we'd like to have
> a knob in the tuning structure to indicate that overlapped stores during
> move_by_pieces expansion of memcpy & friends are acceptable.
>
> This patch adds that capability in our tuning structure.  It's off
> for all the uarchs upstream, but we have been using it inside Ventana
> for our uarch with success.  So technically it's NFC upstream, but puts
> in the infrastructure multiple organizations likely need.
>
>
> Built and tested rv64gc.  Pushing to the trunk shortly.
> jeff
> commit 300393484dbfa9fd3891174ea47aa3fb41915abc
> Author: Christoph Müllner 
> Date:   Tue May 7 15:16:21 2024 -0600
>
> [committed] [RISC-V] Allow uarchs to set TARGET_OVERLAP_OP_BY_PIECES_P
>
> This is almost exclusively work from the VRULL team.
>
> As we've discussed in the Tuesday meeting in the past, we'd like to have 
> a knob
> in the tuning structure to indicate that overlapped stores during
> move_by_pieces expansion of memcpy & friends are acceptable.
>
> This patch adds that capability in our tuning structure.  It's off for all
> the uarchs upstream, but we have been using it inside Ventana for our 
> uarch
> with success.  So technically it's NFC upstream, but puts in the 
> infrastructure
> multiple organizations likely need.
>
> gcc/
>
> * config/riscv/riscv.cc (struct riscv_tune_param): Add new
> "overlap_op_by_pieces" field.
> (rocket_tune_info, sifive_7_tune_info): Set it.
> (sifive_p400_tune_info, sifive_p600_tune_info): Likewise.
> (thead_c906_tune_info, xiangshan_nanhu_tune_info): Likewise.
> (generic_ooo_tune_info, optimize_size_tune_info): Likewise.
> (riscv_overlap_op_by_pieces): New function.
> (TARGET_OVERLAP_OP_BY_PIECES_P): define.
>
> gcc/testsuite/
>
> * gcc.target/riscv/memcpy-nonoverlapping.c: New test.
> * gcc.target/riscv/memset-nonoverlapping.c: New test.
>
> diff --git a/gcc/config/riscv/riscv.cc b/gcc/config/riscv/riscv.cc
> index 545e68566dc..a9b57d41184 100644
> --- a/gcc/config/riscv/riscv.cc
> +++ b/gcc/config/riscv/riscv.cc
> @@ -288,6 +288,7 @@ struct riscv_tune_param
>unsigned short fmv_cost;
>bool slow_unaligned_access;
>bool use_divmod_expansion;
> +  bool overlap_op_by_pieces;
>unsigned int fusible_ops;
>const struct cpu_vector_cost *vec_costs;
>  };
> @@ -427,6 +428,7 @@ static const struct riscv_tune_param rocket_tune_info = {
>8, /* fmv_cost */
>true,  /* 
> slow_unaligned_access */
>false, /* use_divmod_expansion */
> +  false, /* overlap_op_by_pieces */
>RISCV_FUSE_NOTHING,   /* fusible_ops */
>NULL,  /* vector cost */
>  };
> @@ -444,6 +446,7 @@ static const struct riscv_tune_param sifive_7_tune_info = 
> {
>8, /* fmv_cost */
>true,  /* 
> slow_unaligned_access */
>false, /* use_divmod_expansion */
> +  false, /* overlap_op_by_pieces */
>RISCV_FUSE_NOTHING,   /* fusible_ops */
>NULL,  /* vector cost */
>  };
> @@ -461,6 +464,7 @@ static const struct riscv_tune_param 
> sifive_p400_tune_info = {
>4, /* fmv_cost */
>true,  /* 
> slow_unaligned_access */
>false, /* use_divmod_expansion */
> +  false, /* overlap_op_by_pieces */
>RISCV_FUSE_LUI_ADDI | RISCV_FUSE_AUIPC_ADDI,  /* fusible_ops */
>_vector_cost,  /* vector cost */
>  };
> @@ -478,6 +482,7 @@ static const struct riscv_tune_param 
> sifive_p600_tune_info = {
>4, /* fmv_cost */
>true,  /* 
> slow_unaligned_access */
>false, /* use_divmod_expansion */
> +  false, /* overlap_op_by_pieces */
>RISCV_FUSE_LUI_ADDI | RISCV_FUSE_AUIPC_ADDI,  /* fusible_ops */
>_vector_cost,  /* vector cost */
>  };
> @@ -495,6 +500,7 @@ static const struct riscv_tune_param thead_c906_tune_info 
> = {
>8, /* fmv_cost */
>false,/* slow_unaligned_access */
>false, /* use_divmod_expansion */
> +  false, /* 

Re: [PATCH][risc-v] libstdc++: Preserve signbit of nan when converting float to double [PR113578]

2024-05-07 Thread Palmer Dabbelt
[+Adhemerval and Letu, who handled the glibc side of things, in case 
they have any more context.]


On Tue, 07 May 2024 07:11:08 PDT (-0700), jwak...@redhat.com wrote:

On Tue, 7 May 2024 at 15:06, Jonathan Wakely wrote:


On Tue, 7 May 2024 at 14:57, Jeff Law wrote:
>
>
>
> On 5/7/24 7:49 AM, Jonathan Wakely wrote:
> > Do we want this change for RISC-V, to fix PR113578?
> >
> > I haven't tested it on RISC-V, only on x86_64-linux (where it doesn't do
> > anything).
> >
> > -- >8 --
> >
> > libstdc++-v3/ChangeLog:
> >
> >   PR libstdc++/113578
> >   * include/std/ostream (operator<<(basic_ostream&, float)):
> >   Restore signbit after converting to double.
> No strong opinion. One could argue that the existence of a
> conditional like that inherently implies the generic code is dependent
> on specific processor behavior which probably is unwise.  But again, no
> strong opinion.

Yes, but I'm not aware of any other processors that lose the signbit
like this, so in practice it's always worked fine to cast the float to
double.


The similar glibc fix for strfrom is specific to RISC-V:
https://sourceware.org/git/gitweb.cgi?p=glibc.git;h=0cc0033ef19bd3378445c2b851e53d7255cb1b1e


I missed the glibc patch, but IIUC the issue here is NaN 
canonicalization losing sign bits.  Presumably it's OK to lose the other 
bits?  Otherwise we'd need some different twiddling.


Either way, I think having the signed-NaN-preserving conversion is 
reasonable as it's what users are going to expect (even if it's only 
recommended by IEEE).  So


Reviewed-by: Palmer Dabbelt 
Acked-by: Palmer Dabbelt 

in case you want to pick it up.  I guess we should backport this too?

Maybe we should also have some sort of arch-independent 
`double __builtin_float_to_double_with_nan_sign_bits(float)` sort of 
thing?  Then we could just use it everywhere rather than duplicating 
this logic all over the place.



My patch uses copysign unconditionally, to avoid branching on isnan. I
don't know if that's the right choice.


IMO it's fine: it looks like this can get inlined so having the slightly 
shorter code sequence would help, and it's on an IO path so I doubt 
unconditionally executing the extra conversion instructions really 
matters.
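
For reference, the unconditional-copysign idea can be sketched in plain C as below.  This is an illustration of the technique under discussion, not the libstdc++ patch itself, and the function name is hypothetical.

```c
#include <math.h>

/* Widen a float to double while keeping the sign bit, including for NaN.
   The sign is read from the float before the conversion, so a conversion
   that canonicalizes NaN and drops the sign cannot lose it.  */
double widen_preserving_sign(float f)
{
  return copysign((double)f, signbit(f) ? -1.0 : 1.0);
}
```

For every non-NaN input the copysign is a no-op, since the conversion already preserves the sign, which is why doing it unconditionally is harmless.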


Re: [PATCH v1] RISC-V: Adjust overlap attr after revert d3544cea63d and e65aaf8efe1

2024-04-23 Thread Palmer Dabbelt

On Tue, 23 Apr 2024 07:45:03 PDT (-0700), Patrick O'Neill wrote:

Hi Pan,

Sorry about that. It looks like there was a difference between my local
machine and the CI machine.

From the CI it looks like we're back to the failure list we had on Friday.

I'll do some local testing to manually confirm this.


Awesome, thanks. In the patchwork meeting, Kito was mentioning possibly 
wanting to revert some more of these widening ops?  If that's still the 
case we should get something on the lists as soon as we can, it's really 
late in the cycle already.




Thanks,
Patrick

On 4/22/24 23:50, Li, Pan2 wrote:


Hi Patrick,

After some investigation and double-checking, I think the
gcc.dg/graphite/pr111878.c ICE may have nothing to do with the patches in the
revert series, as it has existed for quite a while. It may be related to the
commit below:

2e7abd09621a4401d44f4513adf126bce4b4828b RISC-V: Block VLSmodes according to TARGET_MAX_LMUL and BITS_PER_RISCV_VECTOR

Could you please help double-check this *manually*? Here are my steps for
your reference; I will take care of this failure soon.

../__RISC-V_INSTALL___RV64/bin/riscv64-unknown-elf-gcc --version
riscv64-unknown-elf-gcc (GCC) 14.0.0 20231205 (experimental)
Copyright (C) 2023 Free Software Foundation, Inc.
This is free software; see the source for copying conditions.  There is NO
warranty; not even for MERCHANTABILITY or FITNESS FOR A PARTICULAR PURPOSE.

1. download isl-0.24, let isl -> /some-where/riscv-gnu-toolchain/gcc/isl-0.24
2. mkdir __BUILD__ && cd __BUILD__ && ../configure \
   --target=riscv64-unknown-elf \
   --prefix=${INSTALL_DIR} \
   --disable-shared \
   --enable-threads \
   --enable-tls \
   --enable-languages=c,c++,fortran \
   --with-system-zlib \
   --with-newlib \
   --disable-libmudflap \
   --disable-libssp \
   --disable-libquadmath \
   --disable-libgomp \
   --enable-nls \
   --disable-tm-clone-registry \
   --src=`pwd`/../ \
   --with-abi=lp64d \
   --with-arch=rv64gcv \
   --with-tune=rocket \
   --with-isa-spec=20191213 \
   CFLAGS_FOR_BUILD="-O0 -g" \
   CXXFLAGS_FOR_BUILD="-O0 -g" \
   CFLAGS_FOR_TARGET="-O0  -g" \
   CXXFLAGS_FOR_TARGET="-O0 -g" \
   BOOT_CFLAGS="-O0 -g" \
   CFLAGS="-O0 -g" \
   CXXFLAGS="-O0 -g" \
   GM2FLAGS_FOR_TARGET="-O0 -g" \
   GOCFLAGS_FOR_TARGET="-O0 -g" \
   GDCFLAGS_FOR_TARGET="-O0 -g"
make -j $(nproc) all-gcc && make install-gcc
3. ../__RISC-V_INSTALL___RV64/bin/riscv64-unknown-elf-gcc 
gcc/testsuite/gcc.dg/graphite/pr111878.c -O3 -fgraphite-identity 
-fsave-optimization-record -march=rv64gcv -mabi=lp64d -c -S -o -

Pan

-Original Message-
From: Li, Pan2
Sent: Tuesday, April 23, 2024 10:32 AM
To: Patrick O'Neill ; gcc-patches@gcc.gnu.org
Cc: juzhe.zh...@rivai.ai; kito.ch...@gmail.com; rdapp@gmail.com
Subject: RE: [PATCH v1] RISC-V: Adjust overlap attr after revert d3544cea63d 
and e65aaf8efe1

Thanks Patrick.

It turns out that the make report does not surface the error below, so the
graphite.exp tests never ran.
That explains why I missed the test failures; I will take care of it ASAP.

sorry, unimplemented: Graphite loop optimizations cannot be used (isl is not 
available)

Pan

-Original Message-
From: Patrick O'Neill 
Sent: Tuesday, April 23, 2024 8:32 AM
To: Li, Pan2 ; gcc-patches@gcc.gnu.org
Cc: juzhe.zh...@rivai.ai; kito.ch...@gmail.com; rdapp@gmail.com
Subject: Re: [PATCH v1] RISC-V: Adjust overlap attr after revert d3544cea63d 
and e65aaf8efe1

This patch in particular does not cause any more regressions. It's one
of the other reverts from the weekend.

Before the reverts [1]:
                        |     gcc     |     g++     |  gfortran   |
rv64gcv/ lp64d/ medlow  |  48 /  32   |  12 /   3   |  12 /   2   |

After the reverts:
                        |     gcc     |     g++     |  gfortran   |
rv64gcv/ lp64d/ medlow  |  50 /  33   |  12 /   3   |  26 /   7   |


gcc new fails:
FAIL: gcc.dg/graphite/pr111878.c (internal compiler error: in
extract_insn, at recog.cc:2812)
FAIL: gcc.dg/graphite/pr111878.c (test for excess errors)

gfortran new fails:
FAIL: gfortran.dg/graphite/id-27.f90   -O  (internal compiler error: in
extract_insn, at recog.cc:2812)
FAIL: gfortran.dg/graphite/id-27.f90   -O  (test for excess errors)
FAIL: gfortran.dg/graphite/pr14741.f90   -O  (internal compiler error:
in extract_insn, at recog.cc:2812)
FAIL: gfortran.dg/graphite/pr14741.f90   -O  (test for excess errors)
FAIL: gfortran.dg/graphite/pr29581.f90   -O3 -fomit-frame-pointer
-funroll-loops -fpeel-loops -ftracer -finline-functions  (internal
compiler error: in extract_insn, at recog.cc:2812)
FAIL: gfortran.dg/graphite/pr29581.f90   -O3 -fomit-frame-pointer
-funroll-loops -fpeel-loops -ftracer -finline-functions  (test for
excess errors)
FAIL: gfortran.dg/graphite/pr29581.f90   -O3 -g  (internal compiler
error: in extract_insn, at recog.cc:2812)
FAIL: gfortran.dg/graphite/pr29581.f90   -O3 -g  (test for excess errors)
FAIL: 

Re: Re: [PATCH v1] RISC-V: Revert RVV wv instructions overlap and xfail tests

2024-04-22 Thread Palmer Dabbelt

On Mon, 22 Apr 2024 15:07:59 PDT (-0700), juzhe.zh...@rivai.ai wrote:

Apologies that we (Kito, Li Pan and I) didn't post our discussions.


Some amount of off-list discussion is inevitable, but as far as I can 
tell here we ended up with some code committed that wasn't even posted 
to the lists and didn't even build.  I don't know exactly where the bar 
for public discussions is, but it's got to be somewhere higher than 
that.



This is the story:
We found that my previous patches, which support highpart register overlap with
a register filter for instructions like vwadd.wv,
cause the ICE reported in:
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=114714
and this is obviously a regression (no ICE on GCC 13.2, but an ICE on GCC 14).

We tried several fixes to work around this ICE, but they all failed.
I also found that my previous patches are quite wrong and are not the right
solution to support register group overlap
for vwadd.wv.
So we finally decided to revert those patches.


Sure, reverting something that has a bug is reasonable.  The problem is 
how this happened: this code clearly did not get tested, as it doesn't 
even build and re-introduces a bunch of ICEs.  We're very late in stage 
4 and this is the second time the entire port has been broken in as many 
weeks.  That's a really bad time to be breaking things.


I think we've really set the wrong precedent on what the bar is for 
review here.  We had a lot of breakages early on in the 14 development 
cycle and never really dug into that, I was hoping that getting CI set 
up would be a strong enough hint to stop the breakages.  Clearly that 
didn't work, though, so:


Please stop breaking the port.

It's exceedingly rare that any patch needs to be committed minutes after 
it's posted.  These port-wide breakages are really the only thing where 
it could be agreed that's the way to go, but as we can see here rushing 
is as likely to dig a bigger hole as it is to fix things.  I get the 
testsuite can be kind of hard to run, but if you don't want to run it 
locally you can just wait for the CI to do it for you.  That's not 
really asking for very much.



Kito knows the details of this story and can share more in the GNU patches
meeting.


Ya, we can talk tomorrow morning.

Do you guys have a fix for the regressions that showed up over the 
weekend?


Either way I'd prefer to go with reverting all this and then taking 
Robin's more self-contained fix.  If you guys want to do a bigger change 
later that's fine, we're just really close to the release and it's not a 
good time to risk breaking things.  We've only had a few days of a 
functioning port over the last week or two, that's already put us behind 
on the distro prereleases/RCs so I'm kind of worried something else has 
slipped in.




Thanks.


juzhe.zh...@rivai.ai
 
From: Patrick O'Neill

Date: 2024-04-23 01:20
To: Li, Pan2; Robin Dapp; gcc-patches@gcc.gnu.org
CC: juzhe.zh...@rivai.ai; kito.ch...@gmail.com
Subject: Re: [PATCH v1] RISC-V: Revert RVV wv instructions overlap and xfail 
tests
Hi Pan,
I'm not sure I'm following.  Did we miss something that should have been
covered?  Like only an overlap on the srcs but not the dest?
Are there testcases that fail?  If so we should definitely have one.
Can you give some additional information on why these reverts are needed?
+1 to the request for a failing testcase if there is one.
Patrick
If something is broken then indeed we should revert it.
Yes, we may need to support these in gcc-15.
... why not just revert everything and xfail all the tests in a
follow up?  Your patch is essentially a revert but doesn't look like
it.  I'd rather we let a revert be a revert and adjust the tests
separately so it becomes clear.
Sure, will revert b3b2799b872 and then file the patch for the xfail tests.
Pan



Re: [PATCH] Spelling fixes for translatable strings

2024-04-22 Thread Palmer Dabbelt

On Mon, 22 Apr 2024 14:30:21 PDT (-0700), ja...@redhat.com wrote:

Hi!

I've run aspell on gcc.pot (just quickly skimming, so pressing
I key hundreds of times and just stopping when I catch something that
looks like a misspelling).

I plan to commit this tomorrow as obvious unless somebody finds some
issues in it, you know, I'm not a native English speaker.
Yes, I know favour is valid UK spelling, but we spell the US way I think.
I've left some *ise* -> *ize* cases (recognise, initialise), those
had too many hits, though in translatable strings just 4, so maybe
worth changing too:
msgid "recognise the specified suffix as a definition module filename"
msgid "recognise the specified suffix as implementation and module filenames"
"initialiser for a dylib."
msgid "%qE attribute argument %qE is not recognised"

2024-04-22  Jakub Jelinek  

* config/epiphany/epiphany.opt (may-round-for-trunc): Spelling fix:
floatig -> floating.
* config/riscv/riscv.opt (mcsr-check): Spelling fix: CRS -> CSR.
* params.opt (-param=ipa-cp-profile-count-base=): Spelling fix:
frequncy -> frequency.
gcc/c-family/
* c.opt (Wstrict-flex-arrays): Spelling fix: inproper -> improper.
gcc/cp/
* parser.cc (cp_parser_using_declaration): Spelling fix: favour
-> favor.
gcc/m2/
* lang.opt (fuse-list=): Spelling fix: finalializations ->
finalizations.

--- gcc/config/epiphany/epiphany.opt.jj 2024-01-03 11:51:47.489509710 +0100
+++ gcc/config/epiphany/epiphany.opt2024-04-22 22:53:56.581549745 +0200
@@ -105,7 +105,7 @@ Enum(attr_fp_mode) String(int) Value(FP_

 may-round-for-trunc
 Target Mask(MAY_ROUND_FOR_TRUNC)
-A floatig point to integer truncation may be replaced with rounding to save mode switching.
+A floating point to integer truncation may be replaced with rounding to save mode switching.

 mvect-double
 Target Mask(VECT_DOUBLE)
--- gcc/config/riscv/riscv.opt.jj   2024-04-09 08:12:29.379449739 +0200
+++ gcc/config/riscv/riscv.opt  2024-04-22 22:52:06.780080712 +0200
@@ -152,7 +152,7 @@ required to materialize symbol addresses

 mcsr-check
 Target Var(riscv_mcsr_check) Init(0)
-Enable the CSR checking for the ISA-dependent CRS and the read-only CSR.
+Enable the CSR checking for the ISA-dependent CSR and the read-only CSR.
 The ISA-dependent CSR are only valid when the specific ISA is set.  The
 read-only CSR can not be written by the CSR instructions.


This came up on IRC.

Acked-by: Palmer Dabbelt 
Reviewed-by: Palmer Dabbelt 

In case you want to merge it with the rest of this, but I think it 
should be something more like


   mcsr-check
   Target Var(riscv_mcsr_check) Init(0)
   Turn on (or off) stricter checking for CSR accesses.  With CSR 
   checking enabled a warning will be raised when a CSR access that is 
   not allowed by the currently enabled ISA is performed.  Checks 
   include writing a read-only CSR, and accessing a CSR that requires a 
   currently disabled base ISA or extension.  CSR checking is performed 
   at assembler-time, see the assembler documentation for more 
   information.


I can send that along later if you merge this, no big deal either way on 
my end.  Looks like the binutils docs are the same, so we should update 
those as well...


Thanks!


--- gcc/cp/parser.cc.jj 2024-04-22 18:12:35.326282135 +0200
+++ gcc/cp/parser.cc2024-04-22 23:14:11.928605442 +0200
@@ -22431,7 +22431,7 @@ cp_parser_using_declaration (cp_parser*
   if (access_declaration_p && errorcount == oldcount)
 warning_at (diag_token->location, OPT_Wdeprecated,
"access declarations are deprecated "
-   "in favour of using-declarations; "
+   "in favor of using-declarations; "
"suggestion: add the % keyword");

   return true;
--- gcc/m2/lang.opt.jj  2024-04-03 09:58:33.538770735 +0200
+++ gcc/m2/lang.opt 2024-04-22 22:47:59.120533842 +0200
@@ -260,7 +260,7 @@ optimize non var unbounded parameters by

 fuse-list=
 Modula-2 Joined
-orders the initialization/finalializations for scaffold-static or force linking of modules if scaffold-dynamic
+orders the initialization/finalizations for scaffold-static or force linking of modules if scaffold-dynamic

 fversion
 Modula-2
--- gcc/c-family/c.opt.jj   2024-04-15 10:16:58.571245875 +0200
+++ gcc/c-family/c.opt  2024-04-22 22:41:48.188705755 +0200
@@ -1320,7 +1320,7 @@ C ObjC C++ ObjC++ LangEnabledBy(C ObjC C

 Wstrict-flex-arrays
 C C++ Var(warn_strict_flex_arrays) Warning
-Warn about inproper usages of flexible array members
+Warn about improper usages of flexible array members
 according to the level of -fstrict-flex-arrays.

 Wstrict-null-sentinel
--- gcc/params.opt.jj   2024-01-03 11:51:22.563855655 +0100
+++ gcc/params.opt  2024-04-22 23:06:47.466804309 +0200
@@ -263,7 +263,7 @@ Maximum size of a list of values 

[PATCH] RISC-V: Revert this weekend's changes

2024-04-22 Thread Palmer Dabbelt
Looks like we had a bunch of commits over the weekend that didn't get
tested/reviewed.  Some didn't even make it to the lists so it's hard to
tell exactly what happened, but the result was a trunk that doesn't even
build and a bunch of ICEs after some trivial fix ups landed on the
lists.

So let's just go back to what worked.  We're bumping up right next to a
release, it's a really bad time to be breaking stuff this badly.  It's
still not clear exactly what was broken here, so if something's wrong
then we should still fix it -- let's just at least build things until
GCC-14 branches.

gcc/ChangeLog:

* config/riscv/constraints.md (TARGET_VECTOR ? V_REGS :
NO_REGS): Revert to 90ded7512e1 ("Daily bump.").
* config/riscv/riscv.md (none,W21,W42,W84,W43,W86,W87,W0):
Likewise.
(no,W21,W42,W84,W41,W81,W82): Likewise.
(no,yes): Likewise.
* config/riscv/vector.md: Likewise.

gcc/testsuite/ChangeLog:

* gcc.dg/vect/costmodel/riscv/rvv/dynamic-lmul8-11.c: Revert to
90ded7512e1 ("Daily bump.").
* gcc.target/riscv/rvv/base/pr112431-10.c: Likewise.
* gcc.target/riscv/rvv/base/pr112431-11.c: Likewise.
* gcc.target/riscv/rvv/base/pr112431-12.c: Likewise.
* gcc.target/riscv/rvv/base/pr112431-13.c: Likewise.
* gcc.target/riscv/rvv/base/pr112431-16.c: Likewise.
* gcc.target/riscv/rvv/base/pr112431-17.c: Likewise.
* gcc.target/riscv/rvv/base/pr112431-18.c: Likewise.
* gcc.target/riscv/rvv/base/pr112431-22.c: Likewise.
* gcc.target/riscv/rvv/base/pr112431-23.c: Likewise.
* gcc.target/riscv/rvv/base/pr112431-24.c: Likewise.
* gcc.target/riscv/rvv/base/pr112431-25.c: Likewise.
* gcc.target/riscv/rvv/base/pr112431-26.c: Likewise.
* gcc.target/riscv/rvv/base/pr112431-27.c: Likewise.
* gcc.target/riscv/rvv/base/pr112431-28.c: Likewise.
* gcc.target/riscv/rvv/base/pr112431-29.c: Likewise.
* gcc.target/riscv/rvv/base/pr112431-30.c: Likewise.
* gcc.target/riscv/rvv/base/pr112431-31.c: Likewise.
* gcc.target/riscv/rvv/base/pr112431-32.c: Likewise.
* gcc.target/riscv/rvv/base/pr112431-33.c: Likewise.
* gcc.target/riscv/rvv/base/pr112431-37.c: Likewise.
* gcc.target/riscv/rvv/base/pr112431-38.c: Likewise.
* gcc.target/riscv/rvv/base/pr112431-39.c: Likewise.
* gcc.target/riscv/rvv/base/pr112431-40.c: Likewise.
* gcc.target/riscv/rvv/base/pr112431-41.c: Likewise.
* gcc.target/riscv/rvv/base/pr112431-42.c: Likewise.
* gcc.target/riscv/rvv/base/pr112431-7.c: Likewise.
* gcc.target/riscv/rvv/base/pr112431-8.c: Likewise.
* gcc.target/riscv/rvv/base/pr112431-9.c: Likewise.

Fixes: cacc55a4c0b ("Revert "RISC-V: Rename vconstraint into group_overlap"")
Fixes: b78c88438cf ("Revert "RISC-V: Robostify the W43, W86, W87 constraint 
enabled attribute"")
Fixes: b991193eb8a ("RISC-V: Add xfail test case for highpart overlap 
floating-point widen insn")
Fixes: 4df96b4ec78 ("Revert "RISC-V: Support highpart overlap for 
floating-point widen instructions"")
Fixes: a367b99f916 ("RISC-V: Add xfail test case for indexed load overlap with 
SRC EEW < DEST EEW")
Fixes: 9257c7a7205 ("Revert "RISC-V: Support highpart overlap for indexed load 
with SRC EEW < DEST EEW"")
Fixes: c7506847c02 ("RISC-V: Add xfail test case for highest-number regno 
ternary overlap")
Fixes: cc46b6d4f3b ("Revert "RISC-V: Support highest-number regno overlap for 
widen ternary"")
Fixes: c4fdbdac122 ("RISC-V: Add xfail test case for widening register overlap 
of vf4/vf8")
Fixes: ec78916bb37 ("Revert "RISC-V: Support widening register overlap for 
vf4/vf8"")
Fixes: 338640fbee2 ("RISC-V: Add xfail test case for highpart register overlap 
of vx/vf widen")
Fixes: ef2392236ec ("Revert "RISC-V: Support highpart register overlap for 
widen vx/vf instructions"")
Fixes: d37b34fe82e ("RISC-V: Add xfail test case for incorrect overlap on v0")
Fixes: 3afcb04bd7d ("Revert "RISC-V: Fix overlap group incorrect overlap on 
v0"")
Fixes: 1690e47e101 ("RISC-V: Add xfail test case for wv insn highest overlap")
Fixes: f5447eae72f ("Revert "RISC-V: Support highest overlap for wv 
instructions"")
Fixes: 9f10005dbc9 ("RISC-V: Add xfail test case for wv insn register overlap")
Fixes: 0cbeafe2651 ("Revert "RISC-V: Support one more overlap for wv 
instructions"")
---

I haven't even built this one myself so I'm definitely not going to
commit it, but I figured it'd be best to get something on the lists as
we're pretty broken.  If someone has a patch stack that gets things
building and fixes the ICEs then I'm happy to look at that, but it's not
even clear what we were trying to fix in the first place.

Either way, I think we've found a pretty major issue with our
development process here.
---
 gcc/config/riscv/constraints.md   |  12 +-
 gcc/config/riscv/riscv.md |  40 +-
 

Re: [PATCH v1] RISC-V: Adjust overlap attr after revert d3544cea63d and e65aaf8efe1

2024-04-22 Thread Palmer Dabbelt

On Mon, 22 Apr 2024 06:47:34 PDT (-0700), pan2...@intel.com wrote:

From: Pan Li 

After we reverted the 2 commits below, the references to the attr need some
adjustment as the group_overlap is no longer available.

* RISC-V: Robostify the W43, W86, W87 constraint enabled attribute
* RISC-V: Rename vconstraint into group_overlap


That landed as cacc55a4c0b ("Revert "RISC-V: Rename vconstraint into 
group_overlap""), but I can't find a review on the lists and the commit 
appears to not even build.


Did I miss the reviews or something?


The tests below passed for this patch.

* The rv64gcv full regression tests.

gcc/ChangeLog:

* config/riscv/vector-crypto.md:

Signed-off-by: Pan Li 
---
 gcc/config/riscv/vector-crypto.md | 2 +-
 1 file changed, 1 insertion(+), 1 deletion(-)

diff --git a/gcc/config/riscv/vector-crypto.md b/gcc/config/riscv/vector-crypto.md
index 519c6a10d94..23dc549e5b8 100755
--- a/gcc/config/riscv/vector-crypto.md
+++ b/gcc/config/riscv/vector-crypto.md
@@ -322,7 +322,7 @@ (define_insn "@pred_vwsll_scalar"
   "vwsll.v%o4\t%0,%3,%4%p1"
   [(set_attr "type" "vwsll")
(set_attr "mode" "")
-   (set_attr "group_overlap" "W21,W21,W21,W21,W42,W42,W42,W42,W84,W84,W84,W84,none,none")])
+   (set_attr "vconstraint" "W21,W21,W21,W21,W42,W42,W42,W42,W84,W84,W84,W84,no,no")])

 ;; vbrev.v vbrev8.v vrev8.v
 (define_insn "@pred_v"


Re: [PATCH] Regenerate opt.urls

2024-04-12 Thread Palmer Dabbelt

On Fri, 12 Apr 2024 12:25:42 PDT (-0700), tschwi...@baylibre.com wrote:

Hi!

After having received around a dozen more buildbot notifications...

On 2024-04-10T06:46:04-0700, Palmer Dabbelt  wrote:

On Tue, 09 Apr 2024 07:57:24 PDT (-0700), ishitatsuy...@gmail.com wrote:

Fixes: 97069657c4e ("RISC-V: Implement TLS Descriptors.")

gcc/ChangeLog:
* config/riscv/riscv.opt.urls: Regenerated.
---
 gcc/config/riscv/riscv.opt.urls | 2 ++
 1 file changed, 2 insertions(+)

diff --git a/gcc/config/riscv/riscv.opt.urls b/gcc/config/riscv/riscv.opt.urls
index da31820e234..351f7f0dda2 100644
--- a/gcc/config/riscv/riscv.opt.urls
+++ b/gcc/config/riscv/riscv.opt.urls
@@ -89,3 +89,5 @@ UrlSuffix(gcc/RISC-V-Options.html#index-minline-strncmp)
 minline-strlen
 UrlSuffix(gcc/RISC-V-Options.html#index-minline-strlen)

+; skipping UrlSuffix for 'mtls-dialect=' due to finding no URLs
+


Thanks.  I had another one over here 
<https://inbox.sourceware.org/gcc-patches/2024041338.3926-1-pal...@rivosinc.com/>, 
but let's go with yours -- I think the actual contents are the same, but 
I didn't actually run the regenerate script.  So


Reviewed-by: Palmer Dabbelt 
Acked-by: Palmer Dabbelt 


..., I've now pushed this to trunk branch in
commit c9500083073ff5e0f5c1c9db92d7ce6e51a62919
"Regenerate opt.urls".


Thanks, and sorry I forgot about this one.




Grüße
 Thomas


Re: [PATCH] wwwdocs: gcc-14: Add RISC-V changes

2024-04-10 Thread Palmer Dabbelt
ough the -mcpu
+  option (GCC identifiers in parentheses).
+
+  SiFive's X280 (sifive-x280).
+  SiFive's P450 (sifive-p450).
+  SiFive's P670 (sifive-p670).
+
+  
+  The following new CPUs are supported through the -mtune
+  option (GCC identifiers in parentheses).
+
+  Generic out-of-order core (generic-ooo).
+  SiFive's P400 series (sifive-p400-series).
+  SiFive's P600 series (sifive-p600-series).
+  XiangShan's Nanhu microarchitecture 
(xiangshan-nanhu).
+
+  
+

 


Thanks for doing this.  This is all pretty minor wording stuff, so

Reviewed-by: Palmer Dabbelt 
Acked-by: Palmer Dabbelt 

Maybe next year we'll remember to ask submitters for these ;)


Re: [PATCH] Regenerate opt.urls

2024-04-10 Thread Palmer Dabbelt

On Wed, 10 Apr 2024 00:57:59 PDT (-0700), sch...@suse.de wrote:

On Apr 09 2024, Palmer Dabbelt wrote:


I didn't actually regenerate this as I can't figure out how,


make regenerate-opt-urls


Ya, that's what the CI says too.  I think I might just have a broken 
build tree, something is mixed up and it picked up a host binutils.  
Looks like there's already a patch over here 
<https://inbox.sourceware.org/gcc-patches/20240409145724.9640-1-ishitatsuy...@gmail.com/>, 
so we should be good.


Re: [PATCH] Regenerate opt.urls

2024-04-10 Thread Palmer Dabbelt

On Tue, 09 Apr 2024 07:57:24 PDT (-0700), ishitatsuy...@gmail.com wrote:

Fixes: 97069657c4e ("RISC-V: Implement TLS Descriptors.")

gcc/ChangeLog:
* config/riscv/riscv.opt.urls: Regenerated.
---
 gcc/config/riscv/riscv.opt.urls | 2 ++
 1 file changed, 2 insertions(+)

diff --git a/gcc/config/riscv/riscv.opt.urls b/gcc/config/riscv/riscv.opt.urls
index da31820e234..351f7f0dda2 100644
--- a/gcc/config/riscv/riscv.opt.urls
+++ b/gcc/config/riscv/riscv.opt.urls
@@ -89,3 +89,5 @@ UrlSuffix(gcc/RISC-V-Options.html#index-minline-strncmp)
 minline-strlen
 UrlSuffix(gcc/RISC-V-Options.html#index-minline-strlen)

+; skipping UrlSuffix for 'mtls-dialect=' due to finding no URLs
+


Thanks.  I had another one over here 
<https://inbox.sourceware.org/gcc-patches/2024041338.3926-1-pal...@rivosinc.com/>, 
but let's go with yours -- I think the actual contents are the same, but 
I didn't actually run the regenerate script.  So


Reviewed-by: Palmer Dabbelt 
Acked-by: Palmer Dabbelt 


Re: [PATCH v2 2/3] aarch64: Add support for aarch64-gnu (GNU/Hurd on AArch64)

2024-04-09 Thread Palmer Dabbelt

On Tue, 09 Apr 2024 09:57:11 PDT (-0700), buga...@gmail.com wrote:

On Tue, Apr 9, 2024 at 7:24 PM Palmer Dabbelt  wrote:

> I assume the buildbot failure that I just got an email about is
> unrelated; it's failing on some RISC-V thing.

Sorry if I missed something here, do you have a pointer?


The buildbot failure emails reference [0] and [1].

[0]: https://builder.sourceware.org/buildbot/#/builders/269/builds/4216
[1]: https://builder.sourceware.org/buildbot/#/builders/269/builds/4218

 Specifically, the "git diff_1" step seems to be failing with

diff --git a/gcc/config/riscv/riscv.opt.urls b/gcc/config/riscv/riscv.opt.urls
index da31820e234..351f7f0dda2 100644
--- a/gcc/config/riscv/riscv.opt.urls
+++ b/gcc/config/riscv/riscv.opt.urls
@@ -89,3 +89,5 @@ UrlSuffix(gcc/RISC-V-Options.html#index-minline-strncmp)
 minline-strlen
 UrlSuffix(gcc/RISC-V-Options.html#index-minline-strlen)

+; skipping UrlSuffix for 'mtls-dialect=' due to finding no URLs
+

I don't know what to make of that, but it seems unrelated to my
aarch64-gnu changes.


I'd never heard of it either, looks like a new thing we're supposed to 
do when touching the options.  I just sent 
<https://inbox.sourceware.org/gcc-patches/2024041338.3926-1-pal...@rivosinc.com/>, 
I'm sure there's a better way to do it...




Sergey


[PATCH] Regenerate opt.urls

2024-04-09 Thread Palmer Dabbelt
I didn't actually regenerate this as I can't figure out how, I've just
pasted in the diff from the sourceware buildbot (which is complaining
about post-regeneration diff).

Fixes: 97069657c4e ("RISC-V: Implement TLS Descriptors.")

gcc/ChangeLog:

* config/riscv/riscv.opt.urls: Regenerated.
---
 gcc/config/riscv/riscv.opt.urls | 2 ++
 1 file changed, 2 insertions(+)

diff --git a/gcc/config/riscv/riscv.opt.urls b/gcc/config/riscv/riscv.opt.urls
index da31820e234..351f7f0dda2 100644
--- a/gcc/config/riscv/riscv.opt.urls
+++ b/gcc/config/riscv/riscv.opt.urls
@@ -89,3 +89,5 @@ UrlSuffix(gcc/RISC-V-Options.html#index-minline-strncmp)
 minline-strlen
 UrlSuffix(gcc/RISC-V-Options.html#index-minline-strlen)
 
+; skipping UrlSuffix for 'mtls-dialect=' due to finding no URLs
+
-- 
2.44.0



Re: [PATCH v2 2/3] aarch64: Add support for aarch64-gnu (GNU/Hurd on AArch64)

2024-04-09 Thread Palmer Dabbelt

On Tue, 09 Apr 2024 01:04:34 PDT (-0700), buga...@gmail.com wrote:

On Tue, Apr 9, 2024 at 10:27 AM Thomas Schwinge  wrote:

Thanks, pushed to trunk branch:

  - commit 532c57f8c3a15b109a46d3e2b14d60a5c40979d5 "Move GNU/Hurd startfile spec 
from config/i386/gnu.h to config/gnu.h"
  - commit 9670a2326333caa8482377c00beb65723b7b4b26 "aarch64: Add support for 
aarch64-gnu (GNU/Hurd on AArch64)"
  - commit 46c91665f4bceba19aed56f5bd6e934c548b84ff "libgcc: Add basic support for 
aarch64-gnu (GNU/Hurd on AArch64)"


\o/ Thanks a lot!

This will unblock merging the aarch64-gnu glibc port upstream.

I assume the buildbot failure that I just got an email about is
unrelated; it's failing on some RISC-V thing.


Sorry if I missed something here, do you have a pointer?



Sergey

P.S. https://gcc.gnu.org/bugzilla/show_bug.cgi?id=114629


Re: [gcc-13 backport PATCH] RISC-V: Fix C23 (...) functions returning large aggregates [PR114175]

2024-04-04 Thread Palmer Dabbelt

On Thu, 04 Apr 2024 07:37:56 PDT (-0700), ja...@redhat.com wrote:

On Thu, Apr 04, 2024 at 07:28:40AM -0700, Palmer Dabbelt wrote:

I'm not sure if we need release maintainer approval,


For cherry-picking one's own non-risky bugfixes for regression or
documentation bugs from trunk to release branches no special approval
is needed, or maintainer of the corresponding code can approve that,
release manager approval is only needed when a branch is frozen before a
release.


Ya, I'm just never sure when the branch is frozen...


all I can find is the
13.2.1 status report saying 13.3 is expected in the spring
<https://inbox.sourceware.org/gcc/ZMJeq%2FY5SN+7i8a+@tucnak/>.  My allergies
certainly indicate it's spring, but that's kind of a wide time window...

Maybe Jakub knows?


Most likely some short time after 14.1 is released, so that one can still
cherry-pick whatever was fixed on the 14 branch and there is time for those
cherry-picks and testing.
https://gcc.gnu.org/releases.html#timeline gives some hints...


OK, so sounds like it's not frozen now and Edwin's OK to commit this on 
the 13 branch.  Thanks.




Jakub


Re: [gcc-13 backport PATCH] RISC-V: Fix C23 (...) functions returning large aggregates [PR114175]

2024-04-04 Thread Palmer Dabbelt

On Wed, 03 Apr 2024 13:17:36 PDT (-0700), e...@rivosinc.com wrote:

We assume that TYPE_NO_NAMED_ARGS_STDARG_P functions don't have any named
arguments and that there is nothing to advance, but that is not the case for
(...) functions returning by hidden reference, which have one such artificial
argument.  This causes gcc.dg/c23-stdarg-[68].c to fail.

Fix the issue by checking if arg.type is NULL, as r14-9503-g218d1749612
explains.
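
The condition this patch changes can be sketched as a small truth table in plain C. This is only a model of the decision, not the code in riscv.cc: `struct arg_info` is a hypothetical stand-in for the part of the argument descriptor that matters here, and the boolean parameter stands for the result of TYPE_NO_NAMED_ARGS_STDARG_P.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>

/* Hypothetical stand-in: only the "type" field matters for this decision.
   A non-NULL type models the artificial hidden-reference argument.  */
struct arg_info {
    const void *type;
};

/* Should the local copy of the cumulative-args state be advanced past the
   last named argument before counting the leftover registers?  */
bool should_advance(bool no_named_args_stdarg, const struct arg_info *arg)
{
    assert(arg != NULL);
    /* Before the fix only the first operand was tested, so a (...) function
       returning by hidden reference skipped its artificial argument.  */
    return !no_named_args_stdarg || arg->type != NULL;
}
```

An ordinary variadic function always advances; a (...) function advances only when the artificial argument is present, which is the case the failing tests exercise.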

Tested on linux rv64gcv.

gcc/ChangeLog:

PR target/114175
* config/riscv/riscv.cc (riscv_setup_incoming_varargs): Only skip
riscv_function_arg_advance for TYPE_NO_NAMED_ARGS_STDARG_P functions
if arg.type is NULL

(cherry picked from commit 60586710b0646efdbbd77a7f53b93fb5edb87a61)
---
 gcc/config/riscv/riscv.cc | 3 ++-
 1 file changed, 2 insertions(+), 1 deletion(-)

diff --git a/gcc/config/riscv/riscv.cc b/gcc/config/riscv/riscv.cc
index 01eebc83cc5..cefd3b7b2b2 100644
--- a/gcc/config/riscv/riscv.cc
+++ b/gcc/config/riscv/riscv.cc
@@ -3961,7 +3961,8 @@ riscv_setup_incoming_varargs (cumulative_args_t cum,
  argument.  Advance a local copy of CUM past the last "real" named
  argument, to find out how many registers are left over.  */
   local_cum = *get_cumulative_args (cum);
-  if (!TYPE_NO_NAMED_ARGS_STDARG_P (TREE_TYPE (current_function_decl)))
+  if (!TYPE_NO_NAMED_ARGS_STDARG_P (TREE_TYPE (current_function_decl))
+  || arg.type != NULL_TREE)
 riscv_function_arg_advance (pack_cumulative_args (&local_cum), arg);

   /* Found out how many registers we need to save.  */


Acked-by: Palmer Dabbelt 

I'm not sure if we need release maintainer approval, all I can find is 
the 13.2.1 status report saying 13.3 is expected in the spring 
<https://inbox.sourceware.org/gcc/ZMJeq%2FY5SN+7i8a+@tucnak/>.  My 
allergies certainly indicate it's spring, but that's kind of a wide time 
window...


Maybe Jakub knows?


Re:[PATCH v2 1/1] [RISC-V] Add support for _Bfloat16

2024-04-02 Thread Palmer Dabbelt

On Tue, 02 Apr 2024 20:19:16 PDT (-0700), ji...@linux.alibaba.com wrote:

gcc/testsuite/ChangeLog:

* gcc.target/riscv/bf16_arithmetic.c: New test.
* gcc.target/riscv/bf16_call.c: New test.
* gcc.target/riscv/bf16_comparison.c: New test.
* gcc.target/riscv/bf16_float_libcall_convert.c: New test.
* gcc.target/riscv/bf16_integer_libcall_convert.c: New test.


  Hi, I have tested this patch and it is very good. I think we need to add some
runnable tests to ensure that the results are right for various types of
conversions, operations, and libfuncs.


Sorry I forgot to reply earlier, a few of us were talking about this in 
the patchwork meeting this morning.  We think this is too big for GCC-14 
this late in the cycle.  Folks are still looking at bugs, so it might 
take a bit to get reviewed for GCC-15.


I took a look and don't see anything wrong, but I'm not a floating-point 
person so I'd want to try and talk to someone who is before committing 
it.


More testing never hurts, though ;)


BR,
Jin


Re: [committed] RISC-V: Add missing insn types to XiangShan Nanhu scheduler model

2024-03-31 Thread Palmer Dabbelt

On Sun, 31 Mar 2024 09:53:46 PDT (-0700), Jeff Law wrote:

The test for the recently added XiangShan Nanhu microarchitecture is
failing because the scheduler description does not have entries for
certain insn types.

I'm adding  branch, jalr, ret and sfb_alu to the scheduler description,
that's enough to get the trivial test to pass.  However, I strongly
suspect running any significant code through the compiler when
scheduling for this microarchitecture will trigger faults.


We should probably add a build with this as the default pipeline model to 
the list of tests; even if it's just something we run every few weeks 
it'd still be good.



Basically we have checking now that will fault if we have an insn in the
IL without an associated type or if we have an insn in the IL that does
not map to an insn reservation in the scheduler model.  We were tripping
the latter assertion for one of those branch types.  My suspicion is
many insn types aren't handled by that DFA.

The branch insns were pretty obvious and easy to fix.  But someone with
more experience with the uarch needs to do an audit to ensure that all
insn types map to an insn reservation.
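
The audit asked for here amounts to checking that every insn type name appears in the type list of at least one reservation. A rough sketch of that check, assuming the reservation type lists are available as the comma-separated strings used in the .md files; both function names are hypothetical and this is not how GCC's own checking is implemented.

```c
#include <assert.h>
#include <stdbool.h>
#include <string.h>

/* True when `type` is a whole entry of the comma-separated `list`.  */
bool type_in_list(const char *list, const char *type)
{
    assert(list != NULL && type != NULL);
    size_t len = strlen(type);
    if (len == 0)
        return false;
    for (const char *p = list; *p != '\0'; ) {
        const char *end = strchr(p, ',');
        size_t n = end ? (size_t)(end - p) : strlen(p);
        if (n == len && strncmp(p, type, len) == 0)
            return true;            /* exact match, not just a prefix */
        if (!end)
            break;
        p = end + 1;
    }
    return false;
}

/* Count the insn types that no reservation's type list mentions.  */
size_t count_unmapped(const char *const *types, size_t ntypes,
                      const char *const *reservations, size_t nres)
{
    size_t unmapped = 0;
    for (size_t i = 0; i < ntypes; i++) {
        bool found = false;
        for (size_t j = 0; j < nres && !found; j++)
            found = type_in_list(reservations[j], types[i]);
        if (!found)
            unmapped++;
    }
    return unmapped;
}
```

Run against the old xiangshan_jump list "jump,call,auipc,unknown", the types branch, jalr, ret and sfb_alu all come back unmapped; with the extended list from this commit none of them do.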

Pushing this to the trunk.

Jeff



commit 08eaafadd5beaa56beb2d1fceca9f97eeb0219ba
Author: Jeff Law 
Date:   Sun Mar 31 10:51:17 2024 -0600

[committed] RISC-V: Add missing insn types to XiangShan Nanhu scheduler model

The test for the recently added XiangShan Nanhu microarchitecture is failing
because the scheduler description does not have entries for certain insn types.

I'm adding branch, jalr, ret and sfb_alu to the scheduler description, that's
enough to get the trivial test to pass.  However, I strongly suspect running
any significant code through the compiler when scheduling for this
microarchitecture will trigger faults.

Basically we have checking now that will fault if we have an insn in the IL
without an associated type or if we have an insn in the IL that does not map to
an insn reservation in the scheduler model.  We were tripping the latter
assertion for one of those branch types.  My suspicion is many insn types
aren't handled by that DFA.

The branch insns were pretty obvious and easy to fix.  But someone with more
experience with the uarch needs to do an audit to ensure that all insn types
map to an insn reservation.

gcc/
* config/riscv/xiangshan.md (xiangshan_jump): Add branch, jalr, ret
and sfb_alu.

diff --git a/gcc/config/riscv/xiangshan.md b/gcc/config/riscv/xiangshan.md
index 381c3ce1428..76539d332b8 100644
--- a/gcc/config/riscv/xiangshan.md
+++ b/gcc/config/riscv/xiangshan.md
@@ -70,7 +70,7 @@ (define_insn_reservation "xiangshan_fpstore" 1

 (define_insn_reservation "xiangshan_jump" 1
   (and (eq_attr "tune" "xiangshan")
-   (eq_attr "type" "jump,call,auipc,unknown"))
+   (eq_attr "type" "jump,call,auipc,unknown,branch,jalr,ret,sfb_alu"))
   "xs_jmp_rs")

 (define_insn_reservation "xiangshan_i2f" 3


[PATCH] RISC-V: Add vxsat as a register

2024-03-27 Thread Palmer Dabbelt
We aren't doing anything with vxsat right now, but I'd like to add it as
an accepted register to the clobber list.  If we get this into GCC-14
then we'll avoid some preprocessor-based twiddling if we ever start
using vxsat in the future.
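
To make the effect concrete, here is a small model of a name-to-number lookup over the one row of REGISTER_NAMES that this patch touches. It assumes the row starts at register number 64, since the 32 integer and 32 floating-point names precede it; the macro and function names are hypothetical, and treating "N/A" as unmatched padding is a choice of this sketch rather than a statement about GCC's lookup.

```c
#include <assert.h>
#include <stddef.h>
#include <string.h>

/* Assumption: 32 integer + 32 FP register names come before this row.  */
#define FIRST_SPECIAL_REGNO 64

/* The row from riscv.h after the patch: "vxsat" replaces one "N/A".  */
static const char *const special_reg_names[] = {
    "arg", "frame", "vl", "vtype", "vxrm", "frm", "vxsat", "N/A",
};

/* Return the register number for `name`, or -1 when it is not a usable
   name in this row.  */
int lookup_special_reg(const char *name)
{
    assert(name != NULL);
    if (strcmp(name, "N/A") == 0)
        return -1;                          /* padding slot, never a match */
    size_t n = sizeof special_reg_names / sizeof special_reg_names[0];
    for (size_t i = 0; i < n; i++)
        if (strcmp(special_reg_names[i], name) == 0)
            return FIRST_SPECIAL_REGNO + (int)i;
    return -1;
}
```

Before the patch the slot after "frm" held "N/A", so a lookup of "vxsat" in this model would have returned -1; after it, the name resolves to the slot right after frm.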

gcc/ChangeLog:

* config/riscv/riscv.h (REGISTER_NAMES): Add vxsat.
---
IIUC we aren't using these N/A regnos for anything, they're just there to pad
out the types.  So I think this is safe, but Juzhe would likely know best here.

See
https://inbox.sourceware.org/libc-alpha/20240327193601.28903-2-pal...@rivosinc.com/
for a use of this.
---
 gcc/config/riscv/riscv.h | 2 +-
 1 file changed, 1 insertion(+), 1 deletion(-)

diff --git a/gcc/config/riscv/riscv.h b/gcc/config/riscv/riscv.h
index da089a03e9d..d5779512994 100644
--- a/gcc/config/riscv/riscv.h
+++ b/gcc/config/riscv/riscv.h
@@ -933,7 +933,7 @@ extern enum riscv_cc get_riscv_cc (const rtx use);
   "fs0", "fs1", "fa0", "fa1", "fa2", "fa3", "fa4", "fa5",  \
   "fa6", "fa7", "fs2", "fs3", "fs4", "fs5", "fs6", "fs7",  \
   "fs8", "fs9", "fs10","fs11","ft8", "ft9", "ft10","ft11", \
-  "arg", "frame", "vl", "vtype", "vxrm", "frm", "N/A", "N/A",   \
+  "arg", "frame", "vl", "vtype", "vxrm", "frm", "vxsat", "N/A", \
   "N/A", "N/A", "N/A", "N/A", "N/A", "N/A", "N/A", "N/A",  \
   "N/A", "N/A", "N/A", "N/A", "N/A", "N/A", "N/A", "N/A",  \
   "N/A", "N/A", "N/A", "N/A", "N/A", "N/A", "N/A", "N/A",  \
-- 
2.44.0



Re: TARGET_RTX_COSTS and pipeline latency vs. variable-latency instructions (was Re: [PATCH] RISC-V: Add XiangShan Nanhu microarchitecture.)

2024-03-25 Thread Palmer Dabbelt

On Mon, 25 Mar 2024 13:49:18 PDT (-0700), jeffreya...@gmail.com wrote:



On 3/25/24 2:31 PM, Palmer Dabbelt wrote:

On Mon, 25 Mar 2024 13:27:34 PDT (-0700), Jeff Law wrote:



I'd doubt it's worth the complexity.  Picking some reasonable value gets
you the vast majority of the benefit.   Something like
COSTS_N_INSNS(6) is enough to get CSE to trigger.  So what's left is a
reasonable cost, particularly for the division-by-constant case where we
need a ceiling for synth_mult.


Ya, makes sense.  I noticed our multi-word multiply costs are a bit odd
too (they really only work for 64-bit mul on 32-bit targets), but that's
probably not worth worrying about either.

We do have changes locally that adjust various costs.  One of which is
highpart multiply.  One of the many things to start working through once
gcc-15 opens for development.  Hence my desire to help keep gcc-14 on
track for an on-time release.


Cool.  LMK if there's anything we can do to help on that front.



Jeff


Re: TARGET_RTX_COSTS and pipeline latency vs. variable-latency instructions (was Re: [PATCH] RISC-V: Add XiangShan Nanhu microarchitecture.)

2024-03-25 Thread Palmer Dabbelt

On Mon, 25 Mar 2024 13:27:34 PDT (-0700), Jeff Law wrote:



On 3/25/24 2:13 PM, Palmer Dabbelt wrote:

On Mon, 25 Mar 2024 12:59:14 PDT (-0700), Jeff Law wrote:



On 3/25/24 1:48 PM, Xi Ruoyao wrote:

On Mon, 2024-03-18 at 20:54 -0600, Jeff Law wrote:

+/* Costs to use when optimizing for xiangshan nanhu.  */
+static const struct riscv_tune_param xiangshan_nanhu_tune_info = {
+  {COSTS_N_INSNS (3), COSTS_N_INSNS (3)},    /* fp_add */
+  {COSTS_N_INSNS (3), COSTS_N_INSNS (3)},    /* fp_mul */
+  {COSTS_N_INSNS (10), COSTS_N_INSNS (20)},    /* fp_div */
+  {COSTS_N_INSNS (3), COSTS_N_INSNS (3)},    /* int_mul */
+  {COSTS_N_INSNS (6), COSTS_N_INSNS (6)},    /* int_div */
+  6,    /* issue_rate */
+  3,    /* branch_cost */
+  3,    /* memory_cost */
+  3,    /* fmv_cost */
+  true,    /* slow_unaligned_access */
+  false,    /* use_divmod_expansion */
+  RISCV_FUSE_ZEXTW | RISCV_FUSE_ZEXTH,  /* fusible_ops */
+  NULL,    /* vector cost */



Is your integer division really that fast?  The table above essentially
says that your cpu can do integer division in 6 cycles.


Hmm, I've just seen that I've coded some even smaller value for LoongArch CPUs
so forgive me for "hijacking" this thread...

The problem seems to be that integer division may take a different number of
cycles for different inputs: on LoongArch LA664 I've observed 5 cycles for some
inputs and 39 cycles for other inputs.

So should we use the minimal value, the maximum value, or something in-
between for TARGET_RTX_COSTS and pipeline descriptions?

Yea, early outs are relatively common in the actual hardware
implementation.

The biggest reason to refine the cost of a division is so that we've got
a reasonably accurate cost for division by a constant -- which can often
be done with multiplication by reciprocal sequence.  The multiplication
by reciprocal sequence will use mult, add, sub, shadd insns and you need
a reasonable cost model for those so you can compare against the cost of
a hardware division.
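
As an illustration of the multiplication-by-reciprocal idea, here is the well-known sequence for unsigned 32-bit division by 10: one widening multiply by a precomputed constant followed by a shift. This is a hand-written sketch of the technique, not output from GCC, and the function name is made up for the example.

```c
#include <assert.h>
#include <stdint.h>

/* Divide a 32-bit unsigned value by 10 without a division instruction.
   0xCCCCCCCD is ceil(2^35 / 10), so the high bits of the 64-bit product
   hold the quotient for every 32-bit input.  */
uint32_t div10_by_reciprocal(uint32_t x)
{
    uint64_t product = (uint64_t)x * UINT64_C(0xCCCCCCCD);
    return (uint32_t)(product >> 35);
}
```

The sequence costs one multiply plus one shift, which is what the cost model has to weigh against a single hardware divide.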

So to answer your question: choose something sensible.  You probably
don't want the fastest case and you may not want the slowest case.


Maybe we should have some sort of per-bit-set cost hook for mul/div?
Without that we're kind of just guessing at whether the implementation
has early outs based on heuristics used to implicitly generate the cost
models.

Not sure that's really worth the complexity, though...

I'd doubt it's worth the complexity.  Picking some reasonable value gets
you the vast majority of the benefit.   Something like
COSTS_N_INSNS(6) is enough to get CSE to trigger.  So what's left is a
reasonable cost, particularly for the division-by-constant case where we
need a ceiling for synth_mult.
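
To illustrate the division-by-constant case discussed above, here is a minimal sketch (not code from the thread or from GCC) of replacing an unsigned 32-bit division by 3 with a multiplication by a reciprocal. The function name is hypothetical; the constant 0xAAAAAAAB is ceil(2^33 / 3).

```c
#include <stdint.h>

/* Unsigned 32-bit division by the constant 3 without a divide instruction.
   0xAAAAAAAB is ceil(2^33 / 3); the widening multiply followed by a shift
   by 33 yields floor(x / 3) for every 32-bit x.  */
uint32_t div3_by_reciprocal(uint32_t x)
{
    uint64_t product = (uint64_t)x * 0xAAAAAAABu;  /* one multiply */
    return (uint32_t)(product >> 33);              /* one shift */
}
```

Whether a compiler prefers a sequence like this over a hardware divide depends on the relative costs it has been given for multiply, shift and divide, which is why a plausible int_div cost matters even when the real latency varies.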


Ya, makes sense.  I noticed our multi-word multiply costs are a bit odd 
too (they really only work for 64-bit mul on 32-bit targets), but that's 
probably not worth worrying about either.




Jeff


Re: TARGET_RTX_COSTS and pipeline latency vs. variable-latency instructions (was Re: [PATCH] RISC-V: Add XiangShan Nanhu microarchitecture.)

2024-03-25 Thread Palmer Dabbelt

On Mon, 25 Mar 2024 12:59:14 PDT (-0700), Jeff Law wrote:



On 3/25/24 1:48 PM, Xi Ruoyao wrote:

On Mon, 2024-03-18 at 20:54 -0600, Jeff Law wrote:

+/* Costs to use when optimizing for xiangshan nanhu.  */
+static const struct riscv_tune_param xiangshan_nanhu_tune_info = {
+  {COSTS_N_INSNS (3), COSTS_N_INSNS (3)},  /* fp_add */
+  {COSTS_N_INSNS (3), COSTS_N_INSNS (3)},  /* fp_mul */
+  {COSTS_N_INSNS (10), COSTS_N_INSNS (20)},/* fp_div */
+  {COSTS_N_INSNS (3), COSTS_N_INSNS (3)},  /* int_mul */
+  {COSTS_N_INSNS (6), COSTS_N_INSNS (6)},  /* int_div */
+  6,   /* issue_rate */
+  3,   /* branch_cost */
+  3,   /* memory_cost */
+  3,   /* fmv_cost */
+  true,    /* slow_unaligned_access */
+  false,   /* use_divmod_expansion */
+  RISCV_FUSE_ZEXTW | RISCV_FUSE_ZEXTH,  /* fusible_ops */
+  NULL,/* vector cost */



Is your integer division really that fast?  The table above essentially
says that your cpu can do integer division in 6 cycles.


Hmm, I've just seen that I've coded some even smaller values for LoongArch
CPUs, so forgive me for "hijacking" this thread...

The problem seems to be that integer division may take a different number of
cycles for different inputs: on LoongArch LA664 I've observed 5 cycles for some
inputs and 39 cycles for other inputs.

So should we use the minimal value, the maximum value, or something in-
between for TARGET_RTX_COSTS and pipeline descriptions?

Yea, early outs are relatively common in the actual hardware
implementation.

The biggest reason to refine the cost of a division is so that we've got
a reasonably accurate cost for division by a constant -- which can often
be done with multiplication by reciprocal sequence.  The multiplication
by reciprocal sequence will use mult, add, sub, shadd insns and you need
a reasonable cost model for those so you can compare against the cost of
a hardware division.

So to answer your question: choose something sensible.  You probably
don't want the fastest case and you may not want the slowest case.


Maybe we should have some sort of per-bit-set cost hook for mul/div?  
Without that we're kind of just guessing at whether the implementation 
has early outs based on heuristics used to implicitly generate the cost 
models.


Not sure that's really worth the complexity, though...


Jeff


Re: [PATCH] RISC-V: Require a extension for ztso testcases with atomic insns

2024-03-22 Thread Palmer Dabbelt
diff --git a/gcc/testsuite/gcc.target/riscv/amo-table-ztso-compare-exchange-7.c b/gcc/testsuite/gcc.target/riscv/amo-table-ztso-compare-exchange-7.c
index b5c42e1df1d..33928c0eac4 100644
--- a/gcc/testsuite/gcc.target/riscv/amo-table-ztso-compare-exchange-7.c
+++ b/gcc/testsuite/gcc.target/riscv/amo-table-ztso-compare-exchange-7.c
@@ -1,5 +1,6 @@
 /* { dg-do compile } */
 /* Verify that compare exchange mappings match the Ztso suggested mapping.  */
+/* { dg-add-options riscv_a } */
 /* { dg-add-options riscv_ztso } */
 /* { dg-final { scan-assembler-times "lr.w.aqrl\t" 1 } } */
 /* { dg-final { scan-assembler-times "sc.w.rl\t" 1 } } */
diff --git a/gcc/testsuite/gcc.target/riscv/amo-table-ztso-subword-amo-add-1.c b/gcc/testsuite/gcc.target/riscv/amo-table-ztso-subword-amo-add-1.c
index 3ba69ebc325..2a40d6b1376 100644
--- a/gcc/testsuite/gcc.target/riscv/amo-table-ztso-subword-amo-add-1.c
+++ b/gcc/testsuite/gcc.target/riscv/amo-table-ztso-subword-amo-add-1.c
@@ -1,5 +1,6 @@
 /* { dg-do compile } */
 /* Verify that subword atomic op mappings match the Ztso suggested mapping.  */
+/* { dg-add-options riscv_a } */
 /* { dg-add-options riscv_ztso } */
 /* { dg-final { scan-assembler-times "lr.w\t" 1 } } */
 /* { dg-final { scan-assembler-times "sc.w\t" 1 } } */
diff --git a/gcc/testsuite/gcc.target/riscv/amo-table-ztso-subword-amo-add-2.c b/gcc/testsuite/gcc.target/riscv/amo-table-ztso-subword-amo-add-2.c
index 4f38ed3015c..c79380f2611 100644
--- a/gcc/testsuite/gcc.target/riscv/amo-table-ztso-subword-amo-add-2.c
+++ b/gcc/testsuite/gcc.target/riscv/amo-table-ztso-subword-amo-add-2.c
@@ -1,5 +1,6 @@
 /* { dg-do compile } */
 /* Verify that subword atomic op mappings match the Ztso suggested mapping.  */
+/* { dg-add-options riscv_a } */
 /* { dg-add-options riscv_ztso } */
 /* { dg-final { scan-assembler-times "lr.w\t" 1 } } */
 /* { dg-final { scan-assembler-times "sc.w\t" 1 } } */
diff --git a/gcc/testsuite/gcc.target/riscv/amo-table-ztso-subword-amo-add-3.c b/gcc/testsuite/gcc.target/riscv/amo-table-ztso-subword-amo-add-3.c
index e5bcb127552..d1a94eccfa8 100644
--- a/gcc/testsuite/gcc.target/riscv/amo-table-ztso-subword-amo-add-3.c
+++ b/gcc/testsuite/gcc.target/riscv/amo-table-ztso-subword-amo-add-3.c
@@ -1,5 +1,6 @@
 /* { dg-do compile } */
 /* Verify that subword atomic op mappings match the Ztso suggested mapping.  */
+/* { dg-add-options riscv_a } */
 /* { dg-add-options riscv_ztso } */
 /* { dg-final { scan-assembler-times "lr.w\t" 1 } } */
 /* { dg-final { scan-assembler-times "sc.w\t" 1 } } */
diff --git a/gcc/testsuite/gcc.target/riscv/amo-table-ztso-subword-amo-add-4.c b/gcc/testsuite/gcc.target/riscv/amo-table-ztso-subword-amo-add-4.c
index 316183c268b..3d65bc2f64a 100644
--- a/gcc/testsuite/gcc.target/riscv/amo-table-ztso-subword-amo-add-4.c
+++ b/gcc/testsuite/gcc.target/riscv/amo-table-ztso-subword-amo-add-4.c
@@ -1,5 +1,6 @@
 /* { dg-do compile } */
 /* Verify that subword atomic op mappings match the Ztso suggested mapping.  */
+/* { dg-add-options riscv_a } */
 /* { dg-add-options riscv_ztso } */
 /* { dg-final { scan-assembler-times "lr.w\t" 1 } } */
 /* { dg-final { scan-assembler-times "sc.w\t" 1 } } */
diff --git a/gcc/testsuite/gcc.target/riscv/amo-table-ztso-subword-amo-add-5.c b/gcc/testsuite/gcc.target/riscv/amo-table-ztso-subword-amo-add-5.c
index fc1aa8d94f1..10354387a13 100644
--- a/gcc/testsuite/gcc.target/riscv/amo-table-ztso-subword-amo-add-5.c
+++ b/gcc/testsuite/gcc.target/riscv/amo-table-ztso-subword-amo-add-5.c
@@ -1,5 +1,6 @@
 /* { dg-do compile } */
 /* Verify that subword atomic op mappings match the Ztso suggested mapping.  */
+/* { dg-add-options riscv_a } */
 /* { dg-add-options riscv_ztso } */
 /* { dg-final { scan-assembler-times "lr.w.aqrl\t" 1 } } */
 /* { dg-final { scan-assembler-times "sc.w.rl\t" 1 } } */


Presumably these trip up on the non-A targets that Edwin's just adding to the
testers?  They'd also trip up anyone running newlib/multilib tests.

Either way they look right to me, so

Reviewed-by: Palmer Dabbelt 
Acked-by: Palmer Dabbelt 

Thanks!


Re: RISC-V: Use convert instructions instead of calling library functions

2024-03-20 Thread Palmer Dabbelt

On Wed, 20 Mar 2024 11:54:34 PDT (-0700), Jeff Law wrote:



On 3/19/24 10:23 AM, Palmer Dabbelt wrote:

On Mon, 18 Mar 2024 20:50:14 PDT (-0700), jeffreya...@gmail.com wrote:



On 3/18/24 3:09 AM, Jivan Hakobyan wrote:

As RV has round instructions it is reasonable to use them instead of
calling the library functions.

With my patch for the following C code:
double foo(double a) {
     return ceil(a);
}

GCC generates the following ASM code (before it was tail call)
foo:
         fabs.d  fa4,fa0
         lui     a5,%hi(.LC0)
         fld     fa3,%lo(.LC0)(a5)
         flt.d   a5,fa4,fa3
         beq     a5,zero,.L3
         fcvt.l.d a5,fa0,rup


I'm not sure exactly what context this is in, but my reading of
"according to the current rounding mode" means we'd usually use the
dynamic rounding mode.

As Andrew W. noted, we're dealing with ceil and thus rup is the correct
rounding mode to use here.





My only worry here is that when we were doing the other patterns we
decided not to do rint.  I can't remember exactly why, but from reading
the docs we probably just skipped it due to the inexact handling and Zfa
having an instruction that just does this.  FP stuff is always a bit of
a time sink, so at that point it probably just fell off the priority list.

rint is supposed to raise FE_INEXACT, so it's actually a good match for
RISC-V fcvt semantics as they appropriately raise FE_INEXACT.

nearbyint* do not raise FE_INEXACT and thus would rely on the new Zfa
instructions where we have ones that do not raise FE_INEXACT or they
need to be conditional on flag_fp_int_builtin_inexact.  One could
reasonably argue that when flag_fp_int_builtin_inexact is enabled that a
call to nearbyint* ought to be converted into a call to rint*.




I'm not really an FP guy, so I usually just poke around what the other
ports generate and try to figure out what's going on.  arm64 has the Zfa
instruction and x86 FP is complicated, so I'm not sure exactly who else
to look at for this sort of stuff.  From just looking at the code,
though, I think there's two issues -- I'm not really an FP person,
though, so take this with a grain of salt:

Right.  And the condition under which we use the new sequence for
ceil/round actually borrows from x86.  Essentially we only use the new
sequence when we've been told we don't care about FE_INEXACT or fp
exceptions in general.



IIUC what we've got here doesn't actually set the inexact flag for the
bounds check failure case, as we're just loading up an exact constant
and doing the bounds check.  We're also not clamping to INT_MAX-type
values, but not sure if we're supposed to.  I think we could fix both of
those by adjusting the expansion to something like

The state of FE_INEXACT is a don't care here due to the condition on the
expansion code.




          fabs.d  fa4,fa0
          lui     a5,%hi(.LC0)
          fld     fa3,%lo(.LC0)(a5)
          flt.d   a5,fa4,fa3
          bne     a5,zero,.L3
          mv      fa0, fa3
     .L3:
          fcvt.l.d a5,fa0,rup
          fcvt.d.l        fa4,a5
          fsgnj.d fa0,fa4,fa0
          ret

and then adjusting the constant to be an epsilon larger than INT_MAX so
it'll still trigger the clamping but also inexact.

I think Jivan's sequence is more correct.  It's not just INT_MAX here
that's concerning, there's a whole class of values that cause problems.


Ya, sorry, I thought I'd replied to Andrew's email somewhere -- I'd just 
managed to confuse myself about how the FP stuff works, I also think 
Jivan's code is correct now.



There's also a pair of changes to the ISA in 2020 that added the
conversion inexact handling requirement, it was a grey area before.  I
don't remember exactly what happened there, but I did remember it
happening.  I don't think anyone cares all that much about the
performance of systems that target the older ISAs, so maybe we just
restrict the non-libcall expansion to ISAs that contain the new wording?

I think all this got sufficiently cleaned up.  The spec is explicit
about when FE_INEXACT gets raised on the fcvt instructions.  I referred
to it repeatedly when analyzing Jivan's work.


We still have support for stuff like -misa-spec=2.2 (and some 2019 
releases with clunky version numbers).  Those all predate the 
convert/inexact wording that got added.


Though if FE_INEXACT is a don't care here, then I think it doesn't 
matter if the wording got changed.  In that case I think this is fine, 
so 


Reviewed-by: Palmer Dabbelt 


We can hash through the final items in a few weeks once the trunk
re-opens for development.


I think those were all the issues on my end ;)




Jeff


Re: [gcc-15 2/3] RISC-V: avoid LUI based const mat: keep stack offsets aligned

2024-03-19 Thread Palmer Dabbelt

On Tue, 19 Mar 2024 13:05:54 PDT (-0700), Vineet Gupta wrote:



On 3/19/24 06:10, Jeff Law wrote:

On 3/19/24 12:48 AM, Andrew Waterman wrote:

On Mon, Mar 18, 2024 at 5:28 PM Vineet Gupta  wrote:

On 3/16/24 13:21, Jeff Law wrote:

|   59944:  add s0,sp,2047  <---
|   59948:  mv  a2,a0
|   5994c:  mv  a3,a1
|   59950:  mv  a0,sp
|   59954:  li  a4,1
|   59958:  lui a1,0x1
|   5995c:  add s0,s0,1     <---
|   59960:  jal 59a3c

SP here becomes unaligned, even if transitively which is undesirable as
well as incorrect:
   - ABI requires stack to be 8 byte aligned


It's 16-byte aligned in the default ABI, for Q, but that doesn't really 
matter.



   - asm code looks weird and unexpected
   - to the user it might falsely seem like a compiler bug even when not,
     especially when staring at asm while debugging an unrelated issue.

It's not ideal, but I think it's still ABI compliant as-is.  If it
wasn't, then I suspect things like virtual origins in Ada couldn't be
made ABI compliant.

To be clear, are you suggesting that ADD sp, sp, 2047 is ABI compliant?
I'd still like to avoid it as I'm sure someone will complain about it.


With the patch, we get following correct code instead:

| ..
| 59944: add s0,sp,2032
| ..
| 5995c: add s0,s0,16

Alternately you could tighten the positive side of the range of the
splitter from patch 1/3 so that you could always use 2032 rather than
2047 on the first addi.   ie instead of allowing 2048..4094, allow
2048..4064.

2033..4064 vs. 2048..4094

Yeah I was a bit split about this as well. Since you are OK with either,
I'll keep them as-is and perhaps add this observation to commitlog.
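
The two ranges mentioned above (2048..4094 versus 2033..4064) fall out of how a large offset is split across two addi instructions. The following is a rough sketch with hypothetical names, not the splitter from the patch: the first addi uses either the largest 12-bit immediate (2047) or the largest multiple of 16 that fits (2032), and the second addi carries the remainder.

```c
#include <stdbool.h>

#define ADDI_MAX      2047  /* largest positive 12-bit signed immediate */
#define ADDI_MAX_AL16 2032  /* largest multiple of 16 that is <= 2047 */

/* Split OFFSET into two addi immediates.  Returns false when OFFSET is
   outside the range that two addis can cover under the chosen policy:
   2048..4094 without alignment, 2033..4064 with 16-byte alignment.  */
bool split_offset(int offset, bool keep_aligned, int *first, int *second)
{
    int step = keep_aligned ? ADDI_MAX_AL16 : ADDI_MAX;
    if (offset <= step || offset > 2 * step)
        return false;
    *first = step;
    *second = offset - step;
    return true;
}
```

For an offset of 2048 this gives 2047 + 1 in the unaligned policy and 2032 + 16 in the aligned one, matching the before/after listings quoted in the thread.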

There's a subset of embedded use cases where an interrupt service
routine continues on the same stack as the interrupted thread,
requiring sp to always have an ABI-compliant value (i.e. 16B aligned,
and with no important data on the stack at an address below sp).

Although not all use cases care about this property, it seems more
straightforward to maintain the invariant everywhere, rather than
selectively enforce it.

Just to be clear, the changes don't misalign the stack pointer at all.
They merely have the potential to create *another* pointer into the
stack which may or may not be aligned.  Which is totally normal, it's no
different than taking the address of a char on the stack.


IIRC the "always"-ness of the stack pointer alignment has come up 
before, and we decided to keep it aligned for these embedded 
interrupt-related reasons that Andrew points out.  That's a bit 
different than most other ABI requirements, where we're just enforcing 
them on function boundaries.


That said, this one sounds like just a terminology issue: it's not SP 
that's misaligned, but intermediate SP-based addressing calculations.  
I'm not sure if there's a word for these SP-based intermediate values, 
during last week's team meeting we came up with "stack anchors".



Right I never saw any sp,sp,2047 getting generated - not even in the
first version of patch which lacked any filtering of stack regs via
riscv_reg_frame_related () and obviously didn't have the stack variant
of splitter. I don't know if that is just being lucky and not enough
testing exposure (I only spot checked buildroot libc, vmlinux) or
something somewhere enforces that.


I guess we could run the tests with something like

   diff --git a/target/riscv/translate.c b/target/riscv/translate.c
   index ab18899122..e87cc83067 100644
   --- a/target/riscv/translate.c
   +++ b/target/riscv/translate.c
   @@ -320,12 +320,18 @@ static void gen_goto_tb(DisasContext *ctx, int n, target_long diff)
     */
    static TCGv get_gpr(DisasContext *ctx, int reg_num, DisasExtend ext)
    {
   -    TCGv t;
   +    TCGv t, gpr;
    
        if (reg_num == 0) {
            return ctx->zero;
        }
    
   +    if (reg_num == 2) {
   +        gpr = tcg_gen_temp_new();
   +        tcg_gen_andi_tl(gpr, cpu_gpr[reg_num], 0xF);
   +    } else
   +        gpr = cpu_gpr[reg_num];
   +
        switch (get_ol(ctx)) {
        case MXL_RV32:
            switch (ext) {
   @@ -333,11 +339,11 @@ static TCGv get_gpr(DisasContext *ctx, int reg_num, DisasExtend ext)
                break;
            case EXT_SIGN:
                t = tcg_temp_new();
   -            tcg_gen_ext32s_tl(t, cpu_gpr[reg_num]);
   +            tcg_gen_ext32s_tl(t, gpr);
                return t;
            case EXT_ZERO:
                t = tcg_temp_new();
   -            tcg_gen_ext32u_tl(t, cpu_gpr[reg_num]);
   +            tcg_gen_ext32u_tl(t, gpr);
                return t;
            default:
                g_assert_not_reached();
   @@ -349,7 +355,7 @@ static TCGv get_gpr(DisasContext *ctx, int reg_num, DisasExtend ext)
        default:
            g_assert_not_reached();
        }
   -    return cpu_gpr[reg_num];
   +    return gpr;
    }
    
    static TCGv get_gprh(DisasContext *ctx, int reg_num)


and see if anything blows up?  

Re: RISC-V: Use convert instructions instead of calling library functions

2024-03-19 Thread Palmer Dabbelt

On Tue, 19 Mar 2024 12:58:41 PDT (-0700), Andrew Waterman wrote:

On Tue, Mar 19, 2024 at 9:23 AM Palmer Dabbelt  wrote:


On Mon, 18 Mar 2024 20:50:14 PDT (-0700), jeffreya...@gmail.com wrote:
>
>
> On 3/18/24 3:09 AM, Jivan Hakobyan wrote:
>> As RV has round instructions it is reasonable to use them instead of
>> calling the library functions.
>>
>> With my patch for the following C code:
>> double foo(double a) {
>>  return ceil(a);
>> }
>>
>> GCC generates the following ASM code (before it was tail call)
>> foo:
>>  fabs.d  fa4,fa0
>>  lui a5,%hi(.LC0)
>>  fld fa3,%lo(.LC0)(a5)
>>  flt.d   a5,fa4,fa3
>>  beq a5,zero,.L3
>>  fcvt.l.d a5,fa0,rup

I'm not sure exactly what context this is in, but my reading of
"according to the current rounding mode" means we'd usually use the
dynamic rounding mode.



ceil doesn't depend on the current rounding mode; rup is correct.  For
rint, you'd be correct.


Ya, right, that's pretty obvious -- I must have just been falling asleep 
this morning.



>>  fcvt.d.l fa4,a5
>>  fsgnj.d fa0,fa4,fa0
>> .L3:
>>  ret
>>
>> .LC0:
>>  .word   0
>>  .word   1127219200 // 0x43300000
>>
>>
>> I evaluated the patch on SPEC2017, counting dynamic instructions, and got
>> the following improvements:
>>
>> 510.parest_r     262 m    -
>> 511.povray_r     2.1 b    0.04%
>> 521.wrf_r        269 m    -
>> 526.blender_r    3 b      0.1%
>> 527.cam4_r       15 b     0.6%
>> 538.imagick_r    365 b    7.6%
>>
>> Overall, 385 billion fewer instructions were executed, which is 0.5%.
> A few more notes.
>
> The sequence Jivan is using is derived from LLVM.  The condition in the
> generated code tests for those values which are supposed to pass through
> unaltered.  The condition in the pattern ensures we do something
> sensible WRT FE_INEXACT and mirrors how other ports handle these insns.

My only worry here is that when we were doing the other patterns we
decided not to do rint.  I can't remember exactly why, but from reading
the docs we probably just skipped it due to the inexact handling and Zfa
having an instruction that just does this.  FP stuff is always a bit of
a time sink, so at that point it probably just fell off the priority
list.

I'm not really an FP guy, so I usually just poke around what the other
ports generate and try to figure out what's going on.  arm64 has the Zfa
instruction and x86 FP is complicated, so I'm not sure exactly who else
to look at for this sort of stuff.  From just looking at the code,
though, I think there's two issues -- I'm not really an FP person,
though, so take this with a grain of salt:

IIUC what we've got here doesn't actually set the inexact flag for the
bounds check failure case



I think the original code was correct.  If you exceed the bounds, then by
construction you know that the input was already an integer, and so not
setting NX is correct.  If you didn't exceed the bounds, then fcvt.l.d will
set NX when appropriate.

, as we're just loading up an exact constant

and doing the bounds check.  We're also not clamping to INT_MAX-type
values, but not sure if we're supposed to.  I think we could fix both of
those by adjusting the expansion to something like

  fabs.d  fa4,fa0
  lui a5,%hi(.LC0)
  fld fa3,%lo(.LC0)(a5)
  flt.d   a5,fa4,fa3
  bne a5,zero,.L3
  mv  fa0, fa3
 .L3:
  fcvt.l.d a5,fa0,rup
  fcvt.d.l fa4,a5
  fsgnj.d fa0,fa4,fa0
  ret

and then adjusting the constant to be an epsilon larger than INT_MAX so
it'll still trigger the clamping but also inexact.



ceil/rint/etc. are supposed to work for the whole range of their
floating-point type; the range of the integral types isn't supposed to
affect the result.  The original code was correct in this regard, too.


Ya, sorry, I think I just misunderstood what was going on here -- it's 
not INT_MAX, it's just the largest FP values for which mantissas might 
produce non-integral values.
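
As a sanity check on the point above, here is a C sketch of what the quoted sequence computes. It is an illustration under assumptions, not the GCC expansion itself: the name `ceil_sketch` is made up, and since plain C cannot request the `rup` rounding mode on a conversion, the sketch truncates and then adjusts upward. The constant is 2^52, which is what `.LC0` encodes (`.word 0` followed by `.word 1127219200`, i.e. 0x43300000 in the high half).

```c
#include <math.h>
#include <stdint.h>

/* 2^52: any double whose magnitude is at least this large is already
   integral, so it can be returned unchanged.  */
static const double TWO_POW_52 = 4503599627370496.0;

double ceil_sketch(double a)
{
    if (!(fabs(a) < TWO_POW_52))   /* fabs.d + flt.d + beq */
        return a;                  /* large values, infinities, NaN */

    int64_t i = (int64_t)a;        /* C truncates toward zero ... */
    double r = (double)i;          /* fcvt.d.l */
    if (r < a)                     /* ... so bump up to mimic "rup" */
        r += 1.0;
    return copysign(r, a);         /* fsgnj.d: keeps the sign, so inputs
                                      in (-1, 0) and -0.0 give -0.0 */
}
```

The range check has nothing to do with the limits of the integer type as such; it only separates inputs that may have a fractional part from inputs that cannot.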



There's also a pair of changes to the ISA in 2020 that added the
conversion inexact handling requirement, it was a grey area before.  I
don't remember exactly what happened there, but I did remember it
happening.  I don't think anyone cares all that much about the
performance of systems that target the older ISAs, so maybe we just
restrict the non-libcall expansion to ISAs that contain the new wording?

commit 5890a1a702abf4157d5879717a39d8ecdae0de68 (tag:
draft-20201229-5890a1a)
Author: Andrew Waterman 
Date:   Mon Dec 28 18:19:44 

Re: RISC-V: Use convert instructions instead of calling library functions

2024-03-19 Thread Palmer Dabbelt

On Mon, 18 Mar 2024 20:50:14 PDT (-0700), jeffreya...@gmail.com wrote:



On 3/18/24 3:09 AM, Jivan Hakobyan wrote:

As RV has round instructions it is reasonable to use them instead of
calling the library functions.

With my patch for the following C code:
double foo(double a) {
     return ceil(a);
}

GCC generates the following ASM code (before it was tail call)
foo:
         fabs.d  fa4,fa0
         lui     a5,%hi(.LC0)
         fld     fa3,%lo(.LC0)(a5)
         flt.d   a5,fa4,fa3
         beq     a5,zero,.L3
         fcvt.l.d a5,fa0,rup


I'm not sure exactly what context this is in, but my reading of 
"according to the current rounding mode" means we'd usually use the 
dynamic rounding mode.



         fcvt.d.l        fa4,a5
         fsgnj.d fa0,fa4,fa0
.L3:
         ret

.LC0:
         .word   0
         .word   1127219200     // 0x43300000


I evaluated the patch on SPEC2017, counting dynamic instructions, and got
the following improvements:

510.parest_r     262 m    -
511.povray_r     2.1 b    0.04%
521.wrf_r        269 m    -
526.blender_r    3 b      0.1%
527.cam4_r       15 b     0.6%
538.imagick_r    365 b    7.6%

Overall, 385 billion fewer instructions were executed, which is 0.5%.

A few more notes.

The sequence Jivan is using is derived from LLVM.  The condition in the
generated code tests for those values which are supposed to pass through
unaltered.  The condition in the pattern ensures we do something
sensible WRT FE_INEXACT and mirrors how other ports handle these insns.


My only worry here is that when we were doing the other patterns we 
decided not to do rint.  I can't remember exactly why, but from reading 
the docs we probably just skipped it due to the inexact handling and Zfa 
having an instruction that just does this.  FP stuff is always a bit of 
a time sink, so at that point it probably just fell off the priority 
list.


I'm not really an FP guy, so I usually just poke around what the other 
ports generate and try to figure out what's going on.  arm64 has the Zfa 
instruction and x86 FP is complicated, so I'm not sure exactly who else 
to look at for this sort of stuff.  From just looking at the code, 
though, I think there's two issues -- I'm not really an FP person, 
though, so take this with a grain of salt:


IIUC what we've got here doesn't actually set the inexact flag for the 
bounds check failure case, as we're just loading up an exact constant 
and doing the bounds check.  We're also not clamping to INT_MAX-type 
values, but not sure if we're supposed to.  I think we could fix both of 
those by adjusting the expansion to something like


         fabs.d  fa4,fa0
         lui     a5,%hi(.LC0)
         fld     fa3,%lo(.LC0)(a5)
         flt.d   a5,fa4,fa3
         bne     a5,zero,.L3
         mv      fa0, fa3
.L3:
         fcvt.l.d a5,fa0,rup
         fcvt.d.l        fa4,a5
         fsgnj.d fa0,fa4,fa0
         ret

and then adjusting the constant to be an epsilon larger than INT_MAX so 
it'll still trigger the clamping but also inexact.


There's also a pair of changes to the ISA in 2020 that added the 
conversion inexact handling requirement, it was a grey area before.  I 
don't remember exactly what happened there, but I did remember it 
happening.  I don't think anyone cares all that much about the 
performance of systems that target the older ISAs, so maybe we just 
restrict the non-libcall expansion to ISAs that contain the new wording?


   commit 5890a1a702abf4157d5879717a39d8ecdae0de68 (tag: draft-20201229-5890a1a)
   Author: Andrew Waterman 
   Date:   Mon Dec 28 18:19:44 2020 -0800
   
   Clarify when FP conversions raise the Inexact flag
   
   diff --git a/src/f.tex b/src/f.tex
   index 8649d75..97c253b 100644
   --- a/src/f.tex
   +++ b/src/f.tex
   @@ -594,9 +594,9 @@ instructions round according to the {\em rm} field.  A floating-point register
    can be initialized to floating-point positive zero using FCVT.S.W {\em rd},
    {\tt x0}, which will never set any exception flags.
    
   -All floating-point conversion instructions set the Inexact exception flag if the
   -result differs from its operand value, yet is representable in the destination
   -format.
   +All floating-point conversion instructions set the Inexact exception flag if
   +the rounded result differs from the operand value and the Invalid exception
   +flag is not set.
    
    \vspace{-0.2in}
    \begin{center}
   
   commit 27b40fbc798357fcb4b1deaba4553646fe677576 (tag: draft-20200229-27b40fb)
   Author: Andrew Waterman 
   Date:   Fri Feb 28 18:12:47 2020 -0800
   
   Clarify that FCVT instructions signal inexact
   
   diff --git a/src/f.tex b/src/f.tex
   index 7680347..a9022c4 100644
   --- a/src/f.tex
   +++ b/src/f.tex
   @@ -582,6 +582,10 @@ instructions round according to the {\em rm} field.  A floating-point register
    can be initialized to floating-point positive 
Re: [PATCH] RISC-V: Update test expectancies with recent scheduler change

2024-02-29 Thread Palmer Dabbelt

On Wed, 28 Feb 2024 02:24:40 PST (-0800), Robin Dapp wrote:

I suggest specifying -fno-schedule-insns so that the tests' assembly never
changes, whatever the scheduling model.


We already do that and that's the point - as I mentioned before, no
scheduling is worse than default scheduling here (for some definition
of worse).  The way to reduce the number of vsetvls is to set the
load latency to a low value.


I think -fno-schedule-insns is a perfectly reasonable way to get rid of 
the test failures in the short term.


Using -fno-schedule-insns doesn't really fix the core fragility of the 
tests, though: what the pass does depends very much on the order of 
instructions it sees, so anything that reorders RTL is going to cause 
churn in the tests.  Sure getting rid of scheduling will get rid of a 
big cause for reordering, but any pass could reorder RTL and thus change 
the expected vsetvl counts.


Maybe the right thing to do here is to rewrite these as RTL tests?  That 
way we can very tightly control the input ordering.  It's kind of the 
opposite of Jeff's suggestion to add more debug output to the pass, but 
I think that wouldn't actually solve the issue: we're not having trouble 
matching assembly, the fragility comes from the input side.


That might be a "grass is always greener" thing, though, as I don't 
think I've managed to write a useful RTL test yet...




Regards
 Robin


Re: [PATCH] RISC-V: Fix __atomic_compare_exchange with 32 bit value on RV64

2024-02-28 Thread Palmer Dabbelt

On Wed, 28 Feb 2024 09:36:38 PST (-0800), Patrick O'Neill wrote:


On 2/28/24 07:02, Palmer Dabbelt wrote:

On Wed, 28 Feb 2024 06:57:53 PST (-0800), jeffreya...@gmail.com wrote:



On 2/28/24 05:23, Kito Cheng wrote:

atomic_compare_and_swapsi will use lr.w and sc.w to do the atomic operation on
RV64, however lr.w is doing sign extend to DI and compare instruction only have
DI mode on RV64, so the expected value should be sign extend before compare as
well, so that we can get right compare result.

gcc/ChangeLog:

PR target/114130
* config/riscv/sync.md (atomic_compare_and_swap): Sign
extend the expected value if needed.

gcc/testsuite/ChangeLog:

* gcc.target/riscv/pr114130.c: New.

Nearly rejected this as I think the description was a bit ambiguous and
I thought you were extending the result of the lr.w.  But it's actually
the other value you're ensuring gets properly extended.


I had the same response, but after reading it I'm not quite sure how
to say it better.


Maybe something like

   atomic_compare_and_swapsi will use lr.w to obtain the original value, 
   which sign extends to DI.  RV64 only has DI comparisons, so we also need 
   to sign extend the expected value to DI as otherwise the comparison will 
   fail when the expected value has the 32nd bit set.


would do it?  Either way

Reviewed-by: Palmer Dabbelt 

as I've managed to convince myself it's correct.  We should probably 
backport this one, the bug has likely been around for a while.
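
The failure mode described in the proposed wording can be modelled in a few lines of C. This is only a sketch under assumptions (the function name and the boolean switch are invented for illustration, and the narrowing casts rely on the usual two's-complement behaviour): it mimics a 64-bit register compare between a sign-extended `lr.w` result and an expected value that may or may not have been sign-extended.

```c
#include <stdbool.h>
#include <stdint.h>

/* Model of the RV64 comparison inside a 32-bit compare-and-swap.
   MEM_VALUE is the word in memory; EXPECTED_REG is the 64-bit register
   holding the expected value, whose upper half may not be a sign
   extension of bit 31.  */
bool cas32_compare(uint32_t mem_value, uint64_t expected_reg,
                   bool sign_extend_expected)
{
    /* lr.w: load 32 bits and sign-extend to 64 bits. */
    int64_t loaded = (int64_t)(int32_t)mem_value;

    int64_t expected = sign_extend_expected
        ? (int64_t)(int32_t)(uint32_t)expected_reg   /* with the fix */
        : (int64_t)expected_reg;                     /* register used as-is */

    return loaded == expected;   /* bne compares all 64 bits */
}
```

With memory holding 0x80000000 and a zero-extended expected value of 0x80000000, the comparison fails unless the expected value is sign-extended first, which is the case the patch addresses.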





OK.


I was looking at the code to try and ask if we have the same bug for
the short inline CAS routines, but I've got to run to some meetings...


I don't think subword AMO CAS is impacted.

As part of the CAS we mask both the expected value [2] and the retrieved
value[1] before comparing.


I'm always a bit lost when it comes to bit arithmetic, but I think it's 
OK.  It smells like it's being a little loose with the 
extensions/comparisons, but just looking at some generated code for this 
simple case:


   void foo(uint16_t *p, uint16_t *e, uint16_t *d) {
       __atomic_compare_exchange(p, e, d, 0, __ATOMIC_RELAXED, __ATOMIC_RELAXED);
   }

   foo:
       lhu     a3,0(a2)
       lhu     a2,0(a1)
       andi    a4,a0,3
       li      a5,65536
       slliw   a4,a4,3
       addiw   a5,a5,-1
       sllw    a5,a5,a4
       sllw    a3,a3,a4
       sllw    a7,a2,a4
       andi    a0,a0,-4
       and     a3,a3,a5
       not     t1,a5
       and     a7,a7,a5
   1:
       lr.w    a6, 0(a0)
       and     t3, a6, a5    // Both a6 (from the lr.w) and a5
                             // (from the sllw) are sign extended,
                             // so the result in t3 is sign extended.
       bne     t3, a7, 1f    // a7 is also sign extended (before
                             // and after the masking above), so
                             // it's safe for comparison

       and     t3, a6, t1
       or      t3, t3, a3
       sc.w    t3, t3, 0(a0) // The top bits of t3 end up polluted
                             // with sign extension, but it doesn't
                             // matter because of the sc.w.

       bnez    t3, 1b
   1:
       sraw    a6,a6,a4
       slliw   a2,a2,16
       slliw   a5,a6,16
       sraiw   a2,a2,16
       sraiw   a5,a5,16
       subw    a5,a5,a2
       beq     a5,zero,.L1
       sh      a6,0(a1)
   .L1:
       ret

So I think we're OK -- that masking of a7 looks redundant here, but I 
don't think we could get away with just


   diff --git a/gcc/config/riscv/sync.md b/gcc/config/riscv/sync.md
   index 54bb0a66518..15956940032 100644
   --- a/gcc/config/riscv/sync.md
   +++ b/gcc/config/riscv/sync.md
   @@ -456,7 +456,6 @@ (define_expand "atomic_cas_value_strong"
      riscv_lshift_subword (mode, o, shift, &shifted_o);
      riscv_lshift_subword (mode, n, shift, &shifted_n);
    
   -  emit_move_insn (shifted_o, gen_rtx_AND (SImode, shifted_o, mask));
      emit_move_insn (shifted_n, gen_rtx_AND (SImode, shifted_n, mask));
    
      enum memmodel model_success = (enum memmodel) INTVAL (operands[4]);


because we'd need the masking for when we don't know the high bits are 
safe pre-shift.  So maybe some sort of simplify call could help out 
there, but I bet it's not really worth bothering -- the bookkeeping 
doesn't generally matter that much around AMOs.
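
To make the "high bits pre-shift" concern concrete, here is a small C model of the subword comparison. It is a sketch with invented names, not the sync.md expansion: it shifts a 16-bit expected value into position within the containing 32-bit word and compares it against the masked word, with the masking of the expected value made optional.

```c
#include <stdbool.h>
#include <stdint.h>

/* Compare the halfword at byte offset BYTE_OFF inside WORD against
   EXPECTED.  EXPECTED may carry junk above bit 15 (for example if it was
   sign-extended); MASK_EXPECTED controls whether that junk is cleared
   after shifting.  */
bool subword_matches(uint32_t word, unsigned byte_off, uint32_t expected,
                     bool mask_expected)
{
    unsigned shift = (byte_off & 3u) * 8u;       /* andi + slliw */
    uint32_t mask = (uint32_t)0xFFFFu << shift;  /* li/addiw + sllw */
    uint32_t shifted_e = expected << shift;      /* sllw */
    if (mask_expected)
        shifted_e &= mask;                       /* the "redundant" and */
    return (word & mask) == shifted_e;
}
```

When the expected value comes from a zero-extending load such as `lhu`, the mask changes nothing; it only matters once the upper bits are not known to be clean.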



- Patrick

[1]:
https://gcc.gnu.org/git/?p=gcc.git;a=blob;f=gcc/config/riscv/sync.md;h=54bb0a66518ae353fa4ed640339213bf5da6682c;hb=refs/heads/master#l495
[2]:
https://gcc.gnu.org/git/?p=gcc.git;a=blob;f=gcc/config/riscv/sync.md;h=54bb0a66518ae353fa4ed640339213bf5da6682c;hb=refs/heads/master#l459





Jeff


Re: [PATCH] RISC-V: Fix __atomic_compare_exchange with 32 bit value on RV64

2024-02-28 Thread Palmer Dabbelt

On Wed, 28 Feb 2024 06:57:53 PST (-0800), jeffreya...@gmail.com wrote:



On 2/28/24 05:23, Kito Cheng wrote:

atomic_compare_and_swapsi uses lr.w and sc.w to do the atomic operation on
RV64.  However, lr.w sign-extends its result to DI, and the compare instruction
only has a DI mode on RV64, so the expected value should be sign-extended before
the compare as well, so that we get the right compare result.

gcc/ChangeLog:

PR target/114130
* config/riscv/sync.md (atomic_compare_and_swap): Sign
extend the expected value if needed.

gcc/testsuite/ChangeLog:

* gcc.target/riscv/pr114130.c: New.

Nearly rejected this as I think the description was a bit ambiguous and
I thought you were extending the result of the lr.w.  But it's actually
the other value you're ensuring gets properly extended.


I had the same response, but after reading it I'm not quite sure how to 
say it better.
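One way to say it with code, as a sketch rather than anything from the patch: model what lr.w leaves in a 64-bit register and compare it against a zero-extended and a sign-extended copy of the expected value.  All the helper names below are made up for illustration.

```c
#include <assert.h>
#include <stdbool.h>
#include <stdint.h>

/* What lr.w leaves in a 64-bit register: the 32-bit word, sign-extended.
   Converting an out-of-range value to int32_t is implementation-defined
   before C23; GCC wraps modulo 2^32, which is the behaviour relied on.  */
static int64_t lr_w_model (uint32_t mem) { return (int64_t) (int32_t) mem; }

/* Two ways the 32-bit expected value could sit in a 64-bit register.  */
static int64_t zero_extended (uint32_t v) { return (int64_t) (uint64_t) v; }
static int64_t sign_extended (uint32_t v) { return (int64_t) (int32_t) v; }

/* RV64 only has a full-width register compare (what bne does).  */
static bool regs_equal (int64_t a, int64_t b) { return a == b; }

/* With bit 31 set, only the sign-extended expected value matches.  */
static void
check_bit31_case (void)
{
  uint32_t v = 0x80000000u;
  assert (!regs_equal (lr_w_model (v), zero_extended (v)));
  assert (regs_equal (lr_w_model (v), sign_extended (v)));
}
```

Values with bit 31 clear compare equal either way, which is presumably why this sort of bug can hide for a while.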



OK.


I was looking at the code to try and ask if we have the same bug for the 
short inline CAS routines, but I've got to run to some meetings...




Jeff


Re: [PATCH] RISC-V: Update test expectancies with recent scheduler change

2024-02-27 Thread Palmer Dabbelt

On Tue, 27 Feb 2024 15:53:19 PST (-0800), jeffreya...@gmail.com wrote:



On 2/27/24 15:56, 钟居哲 wrote:

 >> I don't think it's that simple.  On some uarchs vsetvls are nearly free

while on others they can be fairly expensive.  It's not clear (to me)
yet if one approach or the other is going to be the more common.


That's uarch dependent, which is not what I am talking about.
What I want to say is that this patch breaks those testcases I added 
for VSETVL PASS testing.

And those testcases are uarch independent.

No, uarch impacts things like latency, which in turn impacts scheduling, 
which in turn impacts vsetvl generation/optimization.


Ya, and I think that's just what's expected for this sort of approach.  
Edwin and I were working through that possibility in the office earlier, 
but we didn't have the code up.  So I figured I'd just go through one in 
more detail to see if what we were talking about was sane.  Grabbing 
some arbitrary function in the changed set:


   void
   test_vbool1_then_vbool64(int8_t * restrict in, int8_t * restrict out) {
   vbool1_t v1 = *(vbool1_t*)in;
   vbool64_t v2 = *(vbool64_t*)in;
   
   *(vbool1_t*)(out + 100) = v1;

   *(vbool64_t*)(out + 200) = v2;
   }

we currently get (from generic-ooo)

   test_vbool1_then_vbool64:
   vsetvli a4,zero,e8,m8,ta,ma
   vlm.v   v2,0(a0)
   vsetvli a5,zero,e8,mf8,ta,ma
   vlm.v   v1,0(a0)
   addi    a3,a1,100
   vsetvli a4,zero,e8,m8,ta,ma
   addi    a1,a1,200
   vsm.v   v2,0(a3)
   vsetvli a5,zero,e8,mf8,ta,ma
   vsm.v   v1,0(a1)
   ret

but we could generate correct code with 2, 3, or 4 vsetvli instructions 
depending on how things are scheduled.  For example, with 
-fno-schedule-insns I happen to get 3


   test_vbool1_then_vbool64:
   vsetvli a5,zero,e8,mf8,ta,ma
   vlm.v   v1,0(a0)
   vsetvli a4,zero,e8,m8,ta,ma
   vlm.v   v2,0(a0)
   addi    a3,a1,100
   addi    a1,a1,200
   vsm.v   v2,0(a3)
   vsetvli a5,zero,e8,mf8,ta,ma
   vsm.v   v1,0(a1)
   ret

because the load/store with the same vcfg end up scheduled back-to-back.  
I don't see any reason why something along the lines of


   test_vbool1_then_vbool64:
   vsetvli a4,zero,e8,m8,ta,ma
   vlm.v   v2,0(a0)
   addi    a3,a1,100
   vsm.v   v2,0(a3)
   vsetvli a5,zero,e8,mf8,ta,ma
   vlm.v   v1,0(a0)
   addi    a1,a1,200
   vsm.v   v1,0(a1)
   ret

wouldn't be correct (though I just reordered the loads/stores and then 
removed the redundant vsetvlis, so I might have some address calculation 
wrong in there).  The validity of removing a vsetvli depends on how the 
dependent instructions get scheduled, which is very much under the 
control of the pipeline model -- it's entirely possible the code with 
more vsetvlis is faster, if vsetvli is cheap and scheduling ends up 
hiding latency better.
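To make the 2/3/4 counts above concrete, here is a small sketch that counts how many vsetvli instructions a straight-line ordering needs, assuming one is required at the start and at every change of configuration.  The enum labels and the function name are hypothetical, and this is a toy model, not what the VSETVL pass actually does.

```c
#include <assert.h>
#include <stddef.h>

/* Hypothetical labels for the two vector configurations in the example.  */
enum vcfg { E8_M8, E8_MF8 };

/* Count the vsetvli instructions a straight-line sequence needs: one for
   the first vector instruction, plus one each time the required
   configuration differs from the previous instruction's.  */
static size_t
count_vsetvli (const enum vcfg *seq, size_t n)
{
  assert (seq != NULL || n == 0);

  size_t count = 0;
  for (size_t i = 0; i < n; i++)
    if (i == 0 || seq[i] != seq[i - 1])
      count++;
  return count;
}
```

Feeding it the three orderings from the listings (m8/mf8/m8/mf8, mf8/m8/m8/mf8, and m8/m8/mf8/mf8) gives 4, 3, and 2, so the count is purely a function of how the scheduler ordered the loads and stores.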


So IMO it's completely reasonable to have vsetvli count ranges for a 
test like this.  I haven't looked at the others in any detail, but I 
remember seeing similar things elsewhere last time I was poking around 
these tests.  We should probably double-check all these and write some 
comments, just to make sure we're not missing any bugs, but I'd bet 
there's a bunch of valid testsuite changes.


Like we talked about in the call this morning we should probably make 
the tests more precise, but that's a larger effort.  After working 
through this I'm thinking it's a bit higher priority, though, as in this 
case the bounds are so wide we're not even really testing the pass any 
more.




jeff


Re: [PATCH] RISC-V: Point our Python scripts at python3

2024-02-23 Thread Palmer Dabbelt

On Thu, 22 Feb 2024 20:29:37 PST (-0800), Kito Cheng wrote:

I guess Palmer is too busy, so committed to trunk :P


Thanks, I got distracted with some work stuff ;)



On Tue, Feb 13, 2024 at 11:55 PM Jeff Law  wrote:




On 2/9/24 09:53, Palmer Dabbelt wrote:
> This builds for me, and I frequently have python-is-python3 type
> packages installed so I think I've been implicitly testing it for a
> while.  Looks like Kito's tested similar configurations, and the
> bugzilla indicates we should be moving over.
>
> gcc/ChangeLog:
>
>   PR 109668
>   * config/riscv/arch-canonicalize: Move to python3
>   * config/riscv/multilib-generator: Likewise
Just to summarize from the coordination call this morning.  We've agreed
this should go forward.  While there is minor risk (this code is rarely
run), it's something we're prepared to handle if there is fallout.

Jeff


Re: [PATCH v1] RISC-V: Upgrade RVV intrinsic version to 0.12

2024-02-22 Thread Palmer Dabbelt

On Wed, 21 Feb 2024 16:02:50 PST (-0800), Kito Cheng wrote:

On Thu, Feb 22, 2024 at 07:42 Palmer Dabbelt wrote:


On Wed, 21 Feb 2024 15:34:32 PST (-0800), Kito Cheng wrote:
> LGTM for the patch
>
> On Wed, Feb 21, 2024 at 12:31 Li, Pan2 wrote:
>
>> Hi kito and juzhe.
>>
>> There may be 2 items for double-confirm. Thanks a lot.
>>
>> 1. Not very sure if we need to upgrade the version for
>> __riscv_th_v_intrinsic.
>>
>
> Yes, since 0.11 and 0.12 are not really compatible

Where are the incompatibilities?  The whole reason we accepted the
intrinsics in the first place is because the RVI folks said they
wouldn't break compatibility, if that's changed then just dropping the
old version is going to break users.



0.12 has an interface for segment load/store and new fixed-point intrinsics
compared to 0.11.  The first item is not an incompatible change, since it is
newly added and GCC 13 didn't implement the legacy one; the latter is kind of
broken on both LLVM and GCC, which made it not really useful in practice.

Other than that, everything is the same.  It's not 100% compatible, so I don't
intend to cheat myself by saying it's compatible, but we do think it's a
necessary evil since the fixed-point stuff was not the right design and
implementation.


OK, those don't seem so scary.  So maybe let's just put it in a NEWS 
entry or something?  It's mildly interesting to users, but I agree the 
earlier intrinsics spec was vague enough in some areas we can get away 
with the diffs I've seen.
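For users who need to cope with both versions, the macro value can be decoded as sketched below.  The encoding (major * 1000000 + minor * 1000) is an assumption on my part, inferred from the 11000 and 12000 values in the tests, and the helper names are hypothetical.

```c
#include <assert.h>

/* Assumed encoding, inferred from 0.11 -> 11000 and 0.12 -> 12000:
   major * 1000000 + minor * 1000.  The helper name is hypothetical.  */
static unsigned
intrinsic_version_value (unsigned major, unsigned minor)
{
  assert (minor < 1000);  /* keep minor from spilling into major */
  return major * 1000000u + minor * 1000u;
}

/* Nonzero when the advertised version is at least major.minor.  */
static int
intrinsic_at_least (unsigned advertised, unsigned major, unsigned minor)
{
  return advertised >= intrinsic_version_value (major, minor);
}
```

In real code the comparison would be done in the preprocessor against __riscv_v_intrinsic, as the pr114017-1.c test does with `==`; a `>=` check is one way to avoid breaking again when the value moves past 0.12.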



Anyway, it's now in frozen mode: 1.0 rc0 has been tagged, and no API will be
changed or removed.


OK, so I guess we should move to 1.0, then?  Are you guys going to pick 
that up?






>> 2. Do we need to upgrade to an even newer version (like 1.0) for the GCC 14
>> release, or can we do it later?
>>
>
> Yeah, the ideal case is that we can update that before the release is made :p
>
>
>
>
>> Pan
>>
>> -Original Message-
>> From: Li, Pan2 
>> Sent: Wednesday, February 21, 2024 12:27 PM
>> To: gcc-patches@gcc.gnu.org
>> Cc: juzhe.zh...@rivai.ai; Li, Pan2 ; Wang, Yanzhang <yanzhang.w...@intel.com>;
>> kito.ch...@gmail.com
>> Subject: [PATCH v1] RISC-V: Upgrade RVV intrinsic version to 0.12
>>
>> From: Pan Li 
>>
>> Upgrade the version of RVV intrinsic from 0.11 to 0.12.
>>
>> PR target/114017
>>
>> gcc/ChangeLog:
>>
>> * config/riscv/riscv-c.cc (riscv_cpu_cpp_builtins): Upgrade
>> the version to 0.12.
>>
>> gcc/testsuite/ChangeLog:
>>
>> * gcc.target/riscv/predef-__riscv_v_intrinsic.c: Update the
>> version to 0.12.
>> * gcc.target/riscv/rvv/base/pr114017-1.c: New test.
>>
>> Signed-off-by: Pan Li 
>> ---
>>  gcc/config/riscv/riscv-c.cc   |  2 +-
>>  .../riscv/predef-__riscv_v_intrinsic.c|  2 +-
>>  .../gcc.target/riscv/rvv/base/pr114017-1.c| 19 +++
>>  3 files changed, 21 insertions(+), 2 deletions(-)
>>  create mode 100644 gcc/testsuite/gcc.target/riscv/rvv/base/pr114017-1.c
>>
>> diff --git a/gcc/config/riscv/riscv-c.cc b/gcc/config/riscv/riscv-c.cc
>> index 3ef06dcfd2d..3755ec0b8ef 100644
>> --- a/gcc/config/riscv/riscv-c.cc
>> +++ b/gcc/config/riscv/riscv-c.cc
>> @@ -139,7 +139,7 @@ riscv_cpu_cpp_builtins (cpp_reader *pfile)
>>  {
>>builtin_define ("__riscv_vector");
>>builtin_define_with_int_value ("__riscv_v_intrinsic",
>> -riscv_ext_version_value (0, 11));
>> +riscv_ext_version_value (0, 12));
>>  }
>>
>> if (TARGET_XTHEADVECTOR)
>> diff --git a/gcc/testsuite/gcc.target/riscv/predef-__riscv_v_intrinsic.c
>> b/gcc/testsuite/gcc.target/riscv/predef-__riscv_v_intrinsic.c
>> index dbbedf54f87..07f1f159a8f 100644
>> --- a/gcc/testsuite/gcc.target/riscv/predef-__riscv_v_intrinsic.c
>> +++ b/gcc/testsuite/gcc.target/riscv/predef-__riscv_v_intrinsic.c
>> @@ -3,7 +3,7 @@
>>
>>  int main () {
>>
>> -#if __riscv_v_intrinsic != 11000
>> +#if __riscv_v_intrinsic != 12000
>>  #error "__riscv_v_intrinsic"
>>  #endif
>>
>> diff --git a/gcc/testsuite/gcc.target/riscv/rvv/base/pr114017-1.c
>> b/gcc/testsuite/gcc.target/riscv/rvv/base/pr114017-1.c
>> new file mode 100644
>> index 000..8eee7c68f71
>> --- /dev/null
>> +++ b/gcc/testsuite/gcc.target/riscv/rvv/base/pr114017-1.c
>> @@ -0,0 +1,19 @@
>> +/* { dg-do compile } */
>> +/* { dg-options "-march=rv64gcv -mabi=lp64d -O3" } */
>> +
>> +#include "riscv_vector.h"
>> +
>> +vuint8mf2_t
>> +test (vuint16m1_t val, size_t shift, size_t vl)
>> +{
>> +#if __riscv_v_intrinsic == 11000
>> +  #warning "RVV Intrinsics v0.11"
>> +  return __riscv_vnclipu (val, shift, vl);
>> +#endif
>> +
>> +#if __riscv_v_intrinsic == 12000
>> +  #warning "RVV Intrinsics v0.12" /* { dg-warning "RVV Intrinsics v0.12" } */
>> +  return __riscv_vnclipu (val, shift, 0, vl);
>> +#endif
>> +}
>> +
>> --
>> 2.34.1
>>
>>



Re: [PATCH v1] RISC-V: Upgrade RVV intrinsic version to 0.12

2024-02-21 Thread Palmer Dabbelt

On Wed, 21 Feb 2024 15:34:32 PST (-0800), Kito Cheng wrote:

LGTM for the patch

On Wed, Feb 21, 2024 at 12:31 Li, Pan2 wrote:


Hi kito and juzhe.

There may be 2 items for double-confirm. Thanks a lot.

1. Not very sure if we need to upgrade the version for
__riscv_th_v_intrinsic.



Yes, since 0.11 and 0.12 are not really compatible


Where are the incompatibilities?  The whole reason we accepted the 
intrinsics in the first place is because the RVI folks said they 
wouldn't break compatibility, if that's changed then just dropping the 
old version is going to break users.



2. Do we need to upgrade to an even newer version (like 1.0) for the GCC 14
release, or can we do it later?



Yeah, the ideal case is that we can update that before the release is made :p





Pan

-Original Message-
From: Li, Pan2 
Sent: Wednesday, February 21, 2024 12:27 PM
To: gcc-patches@gcc.gnu.org
Cc: juzhe.zh...@rivai.ai; Li, Pan2 ; Wang, Yanzhang <
yanzhang.w...@intel.com>; kito.ch...@gmail.com
Subject: [PATCH v1] RISC-V: Upgrade RVV intrinsic version to 0.12

From: Pan Li 

Upgrade the version of RVV intrinsic from 0.11 to 0.12.

PR target/114017

gcc/ChangeLog:

* config/riscv/riscv-c.cc (riscv_cpu_cpp_builtins): Upgrade
the version to 0.12.

gcc/testsuite/ChangeLog:

* gcc.target/riscv/predef-__riscv_v_intrinsic.c: Update the
version to 0.12.
* gcc.target/riscv/rvv/base/pr114017-1.c: New test.

Signed-off-by: Pan Li 
---
 gcc/config/riscv/riscv-c.cc   |  2 +-
 .../riscv/predef-__riscv_v_intrinsic.c|  2 +-
 .../gcc.target/riscv/rvv/base/pr114017-1.c| 19 +++
 3 files changed, 21 insertions(+), 2 deletions(-)
 create mode 100644 gcc/testsuite/gcc.target/riscv/rvv/base/pr114017-1.c

diff --git a/gcc/config/riscv/riscv-c.cc b/gcc/config/riscv/riscv-c.cc
index 3ef06dcfd2d..3755ec0b8ef 100644
--- a/gcc/config/riscv/riscv-c.cc
+++ b/gcc/config/riscv/riscv-c.cc
@@ -139,7 +139,7 @@ riscv_cpu_cpp_builtins (cpp_reader *pfile)
 {
   builtin_define ("__riscv_vector");
   builtin_define_with_int_value ("__riscv_v_intrinsic",
-riscv_ext_version_value (0, 11));
+riscv_ext_version_value (0, 12));
 }

if (TARGET_XTHEADVECTOR)
diff --git a/gcc/testsuite/gcc.target/riscv/predef-__riscv_v_intrinsic.c
b/gcc/testsuite/gcc.target/riscv/predef-__riscv_v_intrinsic.c
index dbbedf54f87..07f1f159a8f 100644
--- a/gcc/testsuite/gcc.target/riscv/predef-__riscv_v_intrinsic.c
+++ b/gcc/testsuite/gcc.target/riscv/predef-__riscv_v_intrinsic.c
@@ -3,7 +3,7 @@

 int main () {

-#if __riscv_v_intrinsic != 11000
+#if __riscv_v_intrinsic != 12000
 #error "__riscv_v_intrinsic"
 #endif

diff --git a/gcc/testsuite/gcc.target/riscv/rvv/base/pr114017-1.c
b/gcc/testsuite/gcc.target/riscv/rvv/base/pr114017-1.c
new file mode 100644
index 000..8eee7c68f71
--- /dev/null
+++ b/gcc/testsuite/gcc.target/riscv/rvv/base/pr114017-1.c
@@ -0,0 +1,19 @@
+/* { dg-do compile } */
+/* { dg-options "-march=rv64gcv -mabi=lp64d -O3" } */
+
+#include "riscv_vector.h"
+
+vuint8mf2_t
+test (vuint16m1_t val, size_t shift, size_t vl)
+{
+#if __riscv_v_intrinsic == 11000
+  #warning "RVV Intrinsics v0.11"
+  return __riscv_vnclipu (val, shift, vl);
+#endif
+
+#if __riscv_v_intrinsic == 12000
+  #warning "RVV Intrinsics v0.12" /* { dg-warning "RVV Intrinsics v0.12" } */
+  return __riscv_vnclipu (val, shift, 0, vl);
+#endif
+}
+
--
2.34.1




[PATCH] doc: RISC-V: Document that -mcpu doesn't override -march or -mtune

2024-02-20 Thread Palmer Dabbelt
This came up recently as Edwin was looking through the test suite.  A
few of us were talking about this during the patchwork meeting and were
surprised.  Looks like this is the desired behavior, so let's at least
document it.

gcc/ChangeLog:

* doc/invoke.texi: Document -mcpu.

Signed-off-by: Palmer Dabbelt 
---
 gcc/doc/invoke.texi | 2 ++
 1 file changed, 2 insertions(+)

diff --git a/gcc/doc/invoke.texi b/gcc/doc/invoke.texi
index 6ec56493e59..4a4bba9f1cd 100644
--- a/gcc/doc/invoke.texi
+++ b/gcc/doc/invoke.texi
@@ -30670,6 +30670,8 @@ Permissible values for this option are: 
@samp{sifive-e20}, @samp{sifive-e21},
 @samp{sifive-s21}, @samp{sifive-s51}, @samp{sifive-s54}, @samp{sifive-s76},
 @samp{sifive-u54}, @samp{sifive-u74}, and @samp{sifive-x280}.
 
+Note that @option{-mcpu} does not override @option{-march} or @option{-mtune}.
+
 @opindex mtune
 @item -mtune=@var{processor-string}
 Optimize the output for the given processor, specified by microarchitecture or
-- 
2.43.0
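As a rough illustration of the documented behavior, the sketch below resolves the effective arch and tune from the three options: an explicit -march or -mtune wins, and -mcpu only fills in whichever one is missing.  The table entry, struct, and function names are all hypothetical and this is not how the driver is actually implemented.

```c
#include <assert.h>
#include <stddef.h>
#include <string.h>

/* Hypothetical CPU description: the arch and tune a -mcpu value stands for.  */
struct cpu_info { const char *name; const char *arch; const char *tune; };

/* Illustrative table; the entry is made up, not GCC's real data.  */
static const struct cpu_info cpus[] = {
  { "example-cpu", "rv64imac", "example-tune" },
};

static const struct cpu_info *
find_cpu (const char *mcpu)
{
  assert (mcpu != NULL);
  for (size_t i = 0; i < sizeof cpus / sizeof cpus[0]; i++)
    if (strcmp (cpus[i].name, mcpu) == 0)
      return &cpus[i];
  return NULL;
}

/* Explicit -march wins; otherwise fall back to -mcpu's arch, then default.  */
static const char *
effective_arch (const char *march, const char *mcpu, const char *dflt)
{
  if (march != NULL)
    return march;
  const struct cpu_info *c = mcpu != NULL ? find_cpu (mcpu) : NULL;
  return c != NULL ? c->arch : dflt;
}

/* Same rule for tuning: explicit -mtune wins over -mcpu.  */
static const char *
effective_tune (const char *mtune, const char *mcpu, const char *dflt)
{
  if (mtune != NULL)
    return mtune;
  const struct cpu_info *c = mcpu != NULL ? find_cpu (mcpu) : NULL;
  return c != NULL ? c->tune : dflt;
}
```

So passing both -mcpu and -march would leave -mcpu contributing only the tuning, which is the part that surprised people in the meeting.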



[PATCH] RISC-V: Point our Python scripts at python3

2024-02-09 Thread Palmer Dabbelt
This builds for me, and I frequently have python-is-python3 type
packages installed so I think I've been implicitly testing it for a
while.  Looks like Kito's tested similar configurations, and the
bugzilla indicates we should be moving over.

gcc/ChangeLog:

PR 109668
* config/riscv/arch-canonicalize: Move to python3
* config/riscv/multilib-generator: Likewise
---
I am in no way a Python expert, but I think this is functionally a NOP
for the configurations I've been building/testing.  It's passing a
simple cross build.

---
 gcc/config/riscv/arch-canonicalize  | 2 +-
 gcc/config/riscv/multilib-generator | 2 +-
 2 files changed, 2 insertions(+), 2 deletions(-)

diff --git a/gcc/config/riscv/arch-canonicalize 
b/gcc/config/riscv/arch-canonicalize
index 629bed85347..8f7d040cdeb 100755
--- a/gcc/config/riscv/arch-canonicalize
+++ b/gcc/config/riscv/arch-canonicalize
@@ -1,4 +1,4 @@
-#!/usr/bin/env python
+#!/usr/bin/env python3
 
 # Tool for canonical RISC-V architecture string.
 # Copyright (C) 2011-2024 Free Software Foundation, Inc.
diff --git a/gcc/config/riscv/multilib-generator 
b/gcc/config/riscv/multilib-generator
index 1a957878d0c..25cb6762ea7 100755
--- a/gcc/config/riscv/multilib-generator
+++ b/gcc/config/riscv/multilib-generator
@@ -1,4 +1,4 @@
-#!/usr/bin/env python
+#!/usr/bin/env python3
 
 # RISC-V multilib list generator.
 # Copyright (C) 2011-2024 Free Software Foundation, Inc.
-- 
2.43.0



Re: [PATCH] RISC-V: Fix rvv intrinsic pragma tests dejagnu selector

2024-01-30 Thread Palmer Dabbelt

On Mon, 29 Jan 2024 11:38:12 PST (-0800), e...@rivosinc.com wrote:

Adding rvv related flags (i.e. --param=riscv-autovec-preference) to
non vector targets bypassed the dejagnu skip test directive. Change the
target selector to skip unless rvv is enabled

gcc/testsuite/ChangeLog:

* gcc.target/riscv/rvv/base/abi-1.c: change selector
* gcc.target/riscv/rvv/base/pragma-2.c: ditto
* gcc.target/riscv/rvv/base/pragma-3.c: ditto

Signed-off-by: Edwin Lu 
---
 gcc/testsuite/gcc.target/riscv/rvv/base/abi-1.c| 2 +-
 gcc/testsuite/gcc.target/riscv/rvv/base/pragma-2.c | 2 +-
 gcc/testsuite/gcc.target/riscv/rvv/base/pragma-3.c | 2 +-
 3 files changed, 3 insertions(+), 3 deletions(-)

diff --git a/gcc/testsuite/gcc.target/riscv/rvv/base/abi-1.c 
b/gcc/testsuite/gcc.target/riscv/rvv/base/abi-1.c
index 2eef9e1e1a8..a072bdd47bf 100644
--- a/gcc/testsuite/gcc.target/riscv/rvv/base/abi-1.c
+++ b/gcc/testsuite/gcc.target/riscv/rvv/base/abi-1.c
@@ -1,5 +1,5 @@
 /* { dg-do compile { target { ! riscv_xtheadvector } } } */
-/* { dg-skip-if "test rvv intrinsic" { *-*-* } { "*" } { "-march=rv*v*" } } */
+/* { dg-skip-if "test rvv intrinsic" { ! riscv_v } } */

 void foo0 () {__rvv_bool64_t t;}
 void foo1 () {__rvv_bool32_t t;}
diff --git a/gcc/testsuite/gcc.target/riscv/rvv/base/pragma-2.c 
b/gcc/testsuite/gcc.target/riscv/rvv/base/pragma-2.c
index fd2aa3066cd..fc1bb13c53d 100644
--- a/gcc/testsuite/gcc.target/riscv/rvv/base/pragma-2.c
+++ b/gcc/testsuite/gcc.target/riscv/rvv/base/pragma-2.c
@@ -1,4 +1,4 @@
 /* { dg-do compile } */
-/* { dg-skip-if "test rvv intrinsic" { *-*-* } { "*" } { "-march=rv*v*" } } */
+/* { dg-skip-if "test rvv intrinsic" { ! riscv_v } } */

 #pragma riscv intrinsic "vector"
diff --git a/gcc/testsuite/gcc.target/riscv/rvv/base/pragma-3.c 
b/gcc/testsuite/gcc.target/riscv/rvv/base/pragma-3.c
index 96a0e051a29..45580bb2faa 100644
--- a/gcc/testsuite/gcc.target/riscv/rvv/base/pragma-3.c
+++ b/gcc/testsuite/gcc.target/riscv/rvv/base/pragma-3.c
@@ -1,4 +1,4 @@
 /* { dg-do compile } */
-/* { dg-skip-if "test rvv intrinsic" { *-*-* } { "*" } { "-march=rv*v*" } } */
+/* { dg-skip-if "test rvv intrinsic" { ! riscv_v } } */

 #pragma riscv intrinsic "report-error" /* { dg-error {unknown '#pragma riscv 
intrinsic' option 'report-error'} } */


Reviewed-by: Palmer Dabbelt 


Re: [PATCH] RISC-V: Don't make Ztso imply A

2024-01-24 Thread Palmer Dabbelt

On Wed, 24 Jan 2024 16:19:06 PST (-0800), jeffreya...@gmail.com wrote:



On 1/24/24 17:07, Patrick O'Neill wrote:

On 12/16/23 10:58, Jeff Law wrote:



On 12/15/23 17:14, Andrew Waterman wrote:

On Fri, Dec 15, 2023 at 1:38 PM Jeff Law  wrote:




On 12/12/23 20:54, Palmer Dabbelt wrote:

I can't actually find anything in the ISA manual that makes Ztso imply
A.  In theory the memory ordering is just a different thing than the set
of available instructions (ie, Ztso without A would still imply TSO for
loads and stores).  It also seems like a configuration that could be
sane to build: without A it's all but impossible to write any meaningful
multi-core code, and TSO is really cheap for a single core.

That said, I think it's kind of reasonable to provide A to users asking
for Ztso.  So maybe even if this was a mistake it's the right thing to
do?

gcc/ChangeLog:

   * common/config/riscv/riscv-common.cc (riscv_implied_info):
   Remove {"ztso", "a"}.

I'd tend to think step #1 is to determine what the ISA intent is,
meaning engagement with RVI.

We've got time for that engagement and to adjust based on the result.
So I'd tend to defer until we know if Ztso should imply A or not.


Palmer is correct.  There is no coupling between Ztso and A. (And
there are uncontrived examples of such systems: e.g. embedded
processors without caches that don't support the LR/SC instructions,
but happen to be TSO.)
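To show what dropping the entry means in practice, here is a sketch of an implied-extension lookup over a small table.  The table is an illustrative subset I picked (D implies F, F implies Zicsr), the struct and function names are hypothetical, and GCC's actual riscv_implied_info handling is more involved.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>
#include <string.h>

/* Hypothetical pair: having `ext` pulls in `implied_ext`.  */
struct implied { const char *ext; const char *implied_ext; };

/* Illustrative subset only.  Note there is no { "ztso", "a" } entry.  */
static const struct implied implied_info[] = {
  { "d", "f" },
  { "f", "zicsr" },
};

/* True if `ext` implies `target`, directly or through other entries.
   Assumes the table has no cycles.  */
static bool
implies (const char *ext, const char *target)
{
  assert (ext != NULL && target != NULL);

  for (size_t i = 0; i < sizeof implied_info / sizeof implied_info[0]; i++)
    if (strcmp (implied_info[i].ext, ext) == 0)
      {
        if (strcmp (implied_info[i].implied_ext, target) == 0)
          return true;
        if (implies (implied_info[i].implied_ext, target))
          return true;
      }
  return false;
}
```

With the entry gone, asking for Ztso alone no longer drags in A, so a user who wants both has to name both.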

Thanks for the confirmation.  Palmer, commit whenever is convenient
for you.

jeff


I was going to commit on behalf of Palmer and saw this was marked as
Deferred in patchworks:
https://patchwork.sourceware.org/project/gcc/patch/20231213035405.2118-1-pal...@rivosinc.com/

Is this an old marking from before Andrew confirmed that they are
independent?

Yea, I put into deferred before Andrew chimed in.


OK, so I think we can just commit it?


Re: [PATCH] match: Do not select to branchless expression when target has movcc pattern [PR113095]

2024-01-17 Thread Palmer Dabbelt

On Wed, 17 Jan 2024 19:19:58 PST (-0800), monk.chi...@sifive.com wrote:

Thanks for your advice!! I agree it should be fixed in the RISC-V backend
during expansion.


On Wed, Jan 17, 2024 at 10:37 PM Jeff Law  wrote:




On 1/17/24 05:14, Richard Biener wrote:
> On Wed, 17 Jan 2024, Monk Chiang wrote:
>
>> This allows the backend to generate movcc instructions, if target
>> machine has movcc pattern.
>>
>> branchless-cond.c needs to be updated since some target machines have
>> conditional move instructions, and the expression will not change to
>> branchless expression.
>
> While I agree this pattern should possibly be applied during RTL
> expansion or instruction selection on x86 which also has movcc
> the multiplication is cheaper.  So I don't think this is the way to go.
>
> I'd rather revert the change than trying to "fix" it this way?
WRT reverting -- the patch in question's sole purpose was to enable
branchless sequences for that very same code.  Reverting would regress
performance on a variety of micro-architectures.  IIUC, the issue is
that the SiFive part in question has a fusion which allows it to do the
branchy sequence cheaply.

ISTM this really needs to be addressed during expansion and most likely
with a RISC-V target twiddle for the micro-archs which have
short-forward-branch optimizations.


IIRC I ran into some of these middle-end interactions a year or two ago 
and determined that we'd need middle-end changes to get this working 
smoothly -- essentially replacing the expander checks for a MOVCC insn  
with some sort of costing.


Without that, we're just going to end up with some missed optimizations 
that favor one way or the other.
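For reference, the two shapes being traded off look roughly like the sketch below.  The thread doesn't spell out the exact expression, so this assumes the `(c == 0) ? y : (z | y)` form with c restricted to 0 or 1 and its multiply-based rewrite; the function names are made up.

```c
#include <assert.h>
#include <stdint.h>

/* Branchy (or conditional-move) form.  `c` must be 0 or 1.  */
static uint32_t
select_branchy (uint32_t c, uint32_t y, uint32_t z)
{
  assert (c <= 1);
  return c == 0 ? y : (z | y);
}

/* Branchless form: scale z by the 0/1 condition, then apply the OR.
   When c is 0 the product is 0 and OR-ing it in leaves y unchanged.  */
static uint32_t
select_branchless (uint32_t c, uint32_t y, uint32_t z)
{
  assert (c <= 1);
  return (c * z) | y;
}
```

Both compute the same value; which one is cheaper depends on whether the target has a fast multiply, a conditional move, or a short-forward-branch fusion, which is the costing question above.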




jeff



Re: [committed] RISC-V: Add crypto vector builtin function.

2024-01-04 Thread Palmer Dabbelt

On Thu, 04 Jan 2024 19:17:21 PST (-0800), juzhe.zh...@rivai.ai wrote:

Hi, Wang Feng.

Your patch has some ICEs:
FAIL: gcc.target/riscv/rvv/base/zvbc-intrinsic.c (internal compiler error: RTL 
check: expected code 'const_int', have 'reg' in vlmax_avl_type_p, at 
config/riscv/riscv-v.cc:4930)
FAIL: gcc.target/riscv/rvv/base/zvbc-intrinsic.c (test for excess errors)
FAIL: gcc.target/riscv/rvv/base/zvbc_vx_constraint-1.c (internal compiler 
error: RTL check: expected code 'const_int', have 'reg' in vlmax_avl_type_p, at 
config/riscv/riscv-v.cc:4930)
FAIL: gcc.target/riscv/rvv/base/zvbc_vx_constraint-1.c (test for excess errors)
FAIL: gcc.target/riscv/rvv/base/zvbc_vx_constraint-2.c (internal compiler 
error: RTL check: expected code 'const_int', have 'reg' in vlmax_avl_type_p, at 
config/riscv/riscv-v.cc:4930)
FAIL: gcc.target/riscv/rvv/base/zvbc_vx_constraint-2.c (test for excess errors)


So let's just revert it, it doesn't even look like it was reviewed.  
We've set a really bad precedent here where we're just merging a bunch 
of unreviewed code and sorting out the regressions in trunk, that's not 
the right way to do things.




I suspect you didn't enable rtl check in the regression:

../../configure --enable-checking=rtl
Please enable RTL checking in the regression tests.



juzhe.zh...@rivai.ai


Re: [PATCH]middle-end: Don't apply copysign optimization if target does not implement optab [PR112468]

2024-01-04 Thread Palmer Dabbelt
On Thu, 04 Jan 2024 10:20:25 PST (-0800), tamar.christ...@arm.com wrote:
> Hi All,
>
> currently GCC does not treat IFN_COPYSIGN the same as the copysign tree expr.
> The latter has a libcall fallback and the IFN can only do optabs.
>
> Because of this the change I made to optimize copysign only works if the
> target has implemented the optab, but it should work for those that have the
> libcall too.
>
> More annoyingly if a target has vector versions of ABS and NEG but not
> COPYSIGN then the change made them lose vectorization.
>
> The proper fix for this is to treat the IFN the same as the tree EXPR and to
> enhance expand_COPYSIGN to also support vector calls.
>
> I have such a patch for GCC 15 but it's quite big and too invasive for
> stage-4.  As such this is a minimal fix, just don't apply the transformation
> and leave targets which don't have the optab unoptimized.
>
> Targets list for check_effective_target_ifn_copysign was gotten by grepping
> for copysign and looking at the optab.
>
> Bootstrapped Regtested on aarch64-none-linux-gnu and no issues.
> Tests ran in x86_64-pc-linux-gnu -m64/-m32 and tests no longer fail.
>
> Ok for master?
>
> Thanks,
> Tamar
>
> gcc/ChangeLog:
>
>   PR tree-optimization/112468
>   * doc/sourcebuild.texi: Document ifn_copysign.
>   * match.pd: Only apply transformation if target supports the IFN.
>
> gcc/testsuite/ChangeLog:
>
>   PR tree-optimization/112468
>   * gcc.dg/fold-copysign-1.c: Modify tests based on if target supports
>   IFN_COPYSIGN.
>   * gcc.dg/pr55152-2.c: Likewise.
>   * gcc.dg/tree-ssa/abs-4.c: Likewise.
>   * gcc.dg/tree-ssa/backprop-6.c: Likewise.
>   * gcc.dg/tree-ssa/copy-sign-2.c: Likewise.
>   * gcc.dg/tree-ssa/mult-abs-2.c: Likewise.
>   * lib/target-supports.exp (check_effective_target_ifn_copysign): New.
>
> --- inline copy of patch --
> diff --git a/gcc/doc/sourcebuild.texi b/gcc/doc/sourcebuild.texi
> index 
> 4be67daedb20d394857c02739389cabf23c0d533..f4847dafe65cbbf8c9de34905f614ef6957658b4
>  100644
> --- a/gcc/doc/sourcebuild.texi
> +++ b/gcc/doc/sourcebuild.texi
> @@ -2664,6 +2664,10 @@ Target requires a command line argument to enable a 
> SIMD instruction set.
>  @item xorsign
>  Target supports the xorsign optab expansion.
>
> +@item ifn_copysign
> +Target supports the IFN_COPYSIGN optab expansion for both scalar and vector
> +types.
> +
>  @end table
>
>  @subsubsection Environment attributes
> diff --git a/gcc/match.pd b/gcc/match.pd
> index 
> d57e29bfe1d68afd4df4dda20fecc2405ff05332..87d13e7e3e1aa6d89119142b614890dc4729b521
>  100644
> --- a/gcc/match.pd
> +++ b/gcc/match.pd
> @@ -1159,13 +1159,22 @@ DEFINE_INT_AND_FLOAT_ROUND_FN (RINT)
>   (simplify
>(copysigns @0 REAL_CST@1)
>(if (!REAL_VALUE_NEGATIVE (TREE_REAL_CST (@1)))
> -   (abs @0
> +   (abs @0)
> +#if GIMPLE
> +   (if (!direct_internal_fn_supported_p (IFN_COPYSIGN, type,
> +  OPTIMIZE_FOR_BOTH))
> +(negate (abs @0)))
> +#endif
> +   )))
>
> +#if GIMPLE
>  /* Transform fneg (fabs (X)) -> copysign (X, -1).  */
>  (simplify
>   (negate (abs @0))
> - (IFN_COPYSIGN @0 { build_minus_one_cst (type); }))
> -
> + (if (direct_internal_fn_supported_p (IFN_COPYSIGN, type,
> +   OPTIMIZE_FOR_BOTH))
> +   (IFN_COPYSIGN @0 { build_minus_one_cst (type); })))
> +#endif
>  /* copysign(copysign(x, y), z) -> copysign(x, z).  */
>  (for copysigns (COPYSIGN_ALL)
>   (simplify
> diff --git a/gcc/testsuite/gcc.dg/fold-copysign-1.c 
> b/gcc/testsuite/gcc.dg/fold-copysign-1.c
> index 
> f9cafd14ab05f5e8ab2f6f68e62801d21c2df6a6..96b80c733794fffada1b08274ef39cc8f6e442ce
>  100644
> --- a/gcc/testsuite/gcc.dg/fold-copysign-1.c
> +++ b/gcc/testsuite/gcc.dg/fold-copysign-1.c
> @@ -1,5 +1,6 @@
>  /* { dg-do compile } */
>  /* { dg-options "-O -fdump-tree-cddce1" } */
> +/* { dg-additional-options "-msse -mfpmath=sse" { target { { i?86-*-* 
> x86_64-*-* } && ilp32 } } } */
>
>  double foo (double x)
>  {
> @@ -12,5 +13,7 @@ double bar (double x)
>return __builtin_copysign (x, minuszero);
>  }
>
> -/* { dg-final { scan-tree-dump-times "__builtin_copysign" 1 "cddce1" } } */
> -/* { dg-final { scan-tree-dump-times "= ABS_EXPR" 1 "cddce1" } } */
> +/* { dg-final { scan-tree-dump-times "__builtin_copysign" 1 "cddce1" { 
> target ifn_copysign } } } */
> +/* { dg-final { scan-tree-dump-times "= ABS_EXPR" 1 "cddce1" { target 
> ifn_copysign } } } */
> +/* { dg-final { scan-tree-dump-times "= -" 1 "cddce1" { target { ! 
> ifn_copysign } } } } */
> +/* { dg-final { scan-tree-dump-times "= ABS_EXPR" 2 "cddce1" { target { ! 
> ifn_copysign } } } } */
> diff --git a/gcc/testsuite/gcc.dg/pr55152-2.c 
> b/gcc/testsuite/gcc.dg/pr55152-2.c
> index 
> 605f202ed6bc7aa8fe921457b02ff0b88cc63ce6..24068cffa4a8e2807ba7d16c4ed3def4f736e797
>  100644
> --- a/gcc/testsuite/gcc.dg/pr55152-2.c
> +++ b/gcc/testsuite/gcc.dg/pr55152-2.c
> @@ -1,5 +1,6 @@
>  /* { dg-do compile } */
>  /* { 
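The fneg (fabs (X)) -> copysign (X, -1) equivalence from the quoted match.pd hunk can be checked at the bit level.  The sketch below assumes IEEE 754 binary64 doubles and uses hand-rolled helpers (all names hypothetical) instead of libm, so it only illustrates the identity, not how GCC expands either form.

```c
#include <assert.h>
#include <stdint.h>
#include <string.h>

/* Sign bit of an IEEE 754 binary64 value (assumed format of double).  */
#define SIGN_BIT (UINT64_C (1) << 63)

static uint64_t
bits_of (double x)
{
  uint64_t b;
  assert (sizeof b == sizeof x);
  memcpy (&b, &x, sizeof b);
  return b;
}

static double
from_bits (uint64_t b)
{
  double x;
  memcpy (&x, &b, sizeof x);
  return x;
}

/* fabs clears the sign bit, negate flips it.  */
static double abs_bits (double x) { return from_bits (bits_of (x) & ~SIGN_BIT); }
static double neg_bits (double x) { return from_bits (bits_of (x) ^ SIGN_BIT); }

/* copysign takes the magnitude of x and the sign of y.  */
static double
copysign_bits (double x, double y)
{
  return from_bits ((bits_of (x) & ~SIGN_BIT) | (bits_of (y) & SIGN_BIT));
}
```

Negating an absolute value always ends with the sign bit set, which is exactly what copying the sign of -1 does, and a positive constant sign reduces copysign to a plain abs.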

Re: [PATCH] RISC-V: Add --with-cmodel configure-time argument

2023-12-21 Thread Palmer Dabbelt

On Thu, 21 Dec 2023 11:18:22 PST (-0800), jeffreya...@gmail.com wrote:



On 12/20/23 11:41, Palmer Dabbelt wrote:

I couldn't find another way to set the default code model.

gcc/ChangeLog:

* config.gcc (RISC-V): Add --with-cmodel
* config/riscv/riscv.h (TARGET_DEFAULT_CMODEL): Use
TARGET_RISCV_DEFAULT_CMODEL

OK once it's sniff tested.


Thanks.  A few of us were chatting in the office yesterday, looks like 
it should be pretty manageable to get the large code model stuff into CI 
for testing.  With the holidays stuff might be a little clunky, but 
Patrick or Edwin should be able to get this going eventually.


So I'm going to do nothing for now ;)



jeff


Re: [RFC][V2] RISC-V: Support -mcmodel=large.

2023-12-20 Thread Palmer Dabbelt

On Wed, 20 Dec 2023 10:25:00 PST (-0800), jeffreya...@gmail.com wrote:



On 12/20/23 11:21, Palmer Dabbelt wrote:


Yea, the implementation relies largely on just pushing stuff into the
constant pool, so we're largely independent of ABI stuff with the likely
exception being relocations.


Ya, but I think we'd only need the relocations if we were going to try
relaxing stuff.  We'd kicked around some ideas there: we could
de-duplicate constant pools or inline smaller constants.  That's all way
too complex to try and get into this upcoming binutils release, though
(doubly so with this LEB128 ABI break we're still trying to deal with).

Agreed.  And note that de-duplication is mostly implemented without need
for the target to do anything.  I was kind of amazed to see some of the
places it kicked in on other ports I've worked with.


I think all we'd need from GCC is some way to get the "this load is a 
constant pool address that can be messed with" relocation in there, the 
linker would do all the heavy lifting.  That's probably just a new 
assembler pseudo, so pretty much nothing on the compiler side of things.



In theory (and I did not test this), it should be possible to use large
code model codegen in a smaller mode and it should interoperate.  I
seriously pondered doing that as an additional test, then figured I had
other higher priority items on my list.


IMO we should test that.  At least the common case of a medlow libc
linked into medany programs should be easy.

+Patrick: let's add some configs to the CI for this?

I was pondering a one-off by turning on the large code model by default,
then doing a bootstrap & regression test in QEMU.  But integrated into
CI is even better.


OK, let's just add it to CI -- it'd be essentially the same testing, 
just it'll stick around.




Jeff


[PATCH] RISC-V: Add --with-cmodel configure-time argument

2023-12-20 Thread Palmer Dabbelt
I couldn't find another way to set the default code model.

gcc/ChangeLog:

* config.gcc (RISC-V): Add --with-cmodel
* config/riscv/riscv.h (TARGET_DEFAULT_CMODEL): Use
TARGET_RISCV_DEFAULT_CMODEL
---
I thought we had this already, but I figured I'd double-check my "ya,
that's easy we'll just add a --with-cmodel=large test to the CI" reply.
So maybe there's another way to do this, it's also entirely untested...
---
 gcc/config.gcc   | 15 +--
 gcc/config/riscv/riscv.h |  2 +-
 2 files changed, 14 insertions(+), 3 deletions(-)

diff --git a/gcc/config.gcc b/gcc/config.gcc
index f0676c830e8..9162e793b7d 100644
--- a/gcc/config.gcc
+++ b/gcc/config.gcc
@@ -4659,7 +4659,7 @@ case "${target}" in
;;
 
riscv*-*-*)
-   supported_defaults="abi arch tune riscv_attribute isa_spec"
+   supported_defaults="abi arch tune riscv_attribute isa_spec cmodel"
 
case "${target}" in
riscv-* | riscv32*) xlen=32 ;;
@@ -4700,6 +4700,17 @@ case "${target}" in
;;
esac
 
+   case "${with_cmodel}" in
+   ""|default|"medlow")
+   with_cmodel="CM_MEDLOW"
+   ;;
+   "medany")
+   with_cmodel="CM_MEDANY"
+   ;;
+   "large")
+   with_cmodel="CM_LARGE"
+   ;;
+   esac
 
# Infer arch from --with-arch, --target, and --with-abi.
case "${with_arch}" in
@@ -4725,7 +4736,7 @@ case "${target}" in
if test "x${PYTHON}" != x; then
with_arch=`${PYTHON} ${srcdir}/config/riscv/arch-canonicalize -misa-spec=${with_isa_spec} ${with_arch}`
fi
-   tm_defines="${tm_defines} TARGET_RISCV_DEFAULT_ARCH=${with_arch}"
+   tm_defines="${tm_defines} TARGET_RISCV_DEFAULT_ARCH=${with_arch} TARGET_RISCV_DEFAULT_CMODEL=${with_cmodel}"
 
# Make sure --with-abi is valid.  If it was not specified,
# pick a default based on the ISA, preferring soft-float
diff --git a/gcc/config/riscv/riscv.h b/gcc/config/riscv/riscv.h
index 6df9ec73c5e..2d69b5276ef 100644
--- a/gcc/config/riscv/riscv.h
+++ b/gcc/config/riscv/riscv.h
@@ -112,7 +112,7 @@ ASM_MISA_SPEC
 "%{march=*:%:riscv_expand_arch(%*)} "  \
 "%{!march=*:%{mcpu=*:%:riscv_expand_arch_from_cpu(%*)}} "
 
-#define TARGET_DEFAULT_CMODEL CM_MEDLOW
+#define TARGET_DEFAULT_CMODEL TARGET_RISCV_DEFAULT_CMODEL
 
 #define LOCAL_LABEL_PREFIX "."
 #define USER_LABEL_PREFIX  ""
-- 
2.43.0
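
The case statement in the patch has no fallback arm, so an unrecognized
value (say, --with-cmodel=huge) would appear to pass through unchanged into
the define.  A minimal sketch of a stricter mapping, written in C rather than
shell so it can be tested in isolation; parse_with_cmodel, cmodel_define and
CM_INVALID are hypothetical names for illustration, not part of GCC:

```c
#include <assert.h>
#include <stddef.h>
#include <string.h>

/* Hypothetical stand-in for the CM_* values named in the patch, plus
   CM_INVALID for rejected input.  */
enum cmodel { CM_MEDLOW, CM_MEDANY, CM_LARGE, CM_INVALID };

/* Map a --with-cmodel argument to a code model.  NULL, "" and "default"
   select medlow, matching the first arm of the case statement.  */
enum cmodel
parse_with_cmodel (const char *arg)
{
  if (arg == NULL || *arg == '\0' || strcmp (arg, "default") == 0
      || strcmp (arg, "medlow") == 0)
    return CM_MEDLOW;
  if (strcmp (arg, "medany") == 0)
    return CM_MEDANY;
  if (strcmp (arg, "large") == 0)
    return CM_LARGE;
  /* Anything else is rejected rather than passed through.  */
  return CM_INVALID;
}

/* Name to place in TARGET_RISCV_DEFAULT_CMODEL; callers must reject
   CM_INVALID first.  */
const char *
cmodel_define (enum cmodel m)
{
  static const char *const names[] = { "CM_MEDLOW", "CM_MEDANY", "CM_LARGE" };
  assert (m != CM_INVALID);
  return names[m];
}
```

In config.gcc itself the equivalent would presumably be a `*)` arm that
reports an error, as the neighbouring option checks do.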



Re: [PATCH] RISC-V: Document -mcmodel=large

2023-12-20 Thread Palmer Dabbelt
On Wed, 20 Dec 2023 10:13:06 PST (-0800), jeffreya...@gmail.com wrote:
>
>
> On 12/20/23 11:08, Palmer Dabbelt wrote:
>> This slipped through the cracks.  Probably also NEWS-worthy.
>>
>> gcc/ChangeLog:
>>
>>  * doc/invoke.texi (RISC-V): Add -mcmodel=large.
> OK.
>
> And yes, I think we're going to need to do a new/changes update for the
> port as a whole as part of the gcc-14 process.

Sent.  Looks like we had no NEWS for 14, so we'll need to deal with 
that.  If nobody else does I'll scrub the history, but I need to deal 
with binutils, glibc, and Linux first...

>
> jeff


Re: [RFC][V2] RISC-V: Support -mcmodel=large.

2023-12-20 Thread Palmer Dabbelt

On Wed, 20 Dec 2023 10:12:04 PST (-0800), jeffreya...@gmail.com wrote:



On 12/20/23 11:05, Palmer Dabbelt wrote:

On Wed, 20 Dec 2023 09:55:48 PST (-0800), jeffreya...@gmail.com wrote:



On 12/18/23 00:46, KuanLin Chen wrote:

Hi Jeff,

Sorry for missing this.
I've removed riscv_asm_output_pool_epilogue because the pool
beginning is always aligned from FUNCTION_BOUNDARY.
Please find attached. Thank you.

Thanks. I regression tested this on rv64gc without any issues and fixed
up the ChangeLog a bit.  Pushed to the trunk.  Thanks for your patience.


Looks like the psABI PR is still open
<https://github.com/riscv-non-isa/riscv-elf-psabi-doc/pull/388>?

I guess there's no hard psABI dependency here, we're just doing constant
pools so there's no new assembler/linker stuff that's strictly
necessary.  I'm fine just ignoring the psABI as it's a pretty miserable
place to try and get things done, we've started doing that where we can
elsewhere as well.

Yea, the implementation relies largely on just pushing stuff into the
constant pool, so we're largely independent of ABI stuff with the likely
exception being relocations.


Ya, but I think we'd only need the relocations if we were going to try 
relaxing stuff.  We'd kicked around some ideas there: we could 
de-duplicate constant pools or inline smaller constants.  That's all way 
too complex to try and get into this upcoming binutils release, though 
(doubly so with this LEB128 ABI break we're still trying to deal with).


So I think we can just punt on all that for a bit.  We've got bigger 
fish to fry.



In theory (and I did not test this), it should be possible to use large
code model codegen in a smaller mode and it should interoperate.  I
seriously pondered doing that as an additional test, then figured I had
other higher priority items on my list.


IMO we should test that.  At least the common case of a medlow libc 
linked into medany programs should be easy.


+Patrick: let's add some configs to the CI for this?


So maybe we should just close that PR?

I'll let Kito chime in on that.


WFM, I try to stay as far away from that as possible ;)



jeff


[PATCH, wwwdocs] RISC-V: Add -mcmodel=large for GCC-14

2023-12-20 Thread Palmer Dabbelt
This was just merged.  Looks like we forgot to add any other NEWS items,
so I've added the header as well.
---
 htdocs/gcc-14/changes.html | 6 +-
 1 file changed, 5 insertions(+), 1 deletion(-)

diff --git a/htdocs/gcc-14/changes.html b/htdocs/gcc-14/changes.html
index 24e6409a..2a7432a7 100644
--- a/htdocs/gcc-14/changes.html
+++ b/htdocs/gcc-14/changes.html
@@ -395,7 +395,11 @@ a work-in-progress.
 
 
 
-
+RISC-V
+
+
+  Support for -mcmodel=large has been added.
+
 
 
 
-- 
2.43.0



[PATCH] RISC-V: Document -mcmodel=large

2023-12-20 Thread Palmer Dabbelt
This slipped through the cracks.  Probably also NEWS-worthy.

gcc/ChangeLog:

* doc/invoke.texi (RISC-V): Add -mcmodel=large.
---
 gcc/doc/invoke.texi | 7 ++-
 1 file changed, 6 insertions(+), 1 deletion(-)

diff --git a/gcc/doc/invoke.texi b/gcc/doc/invoke.texi
index 5af978b0a67..d8b355627d9 100644
--- a/gcc/doc/invoke.texi
+++ b/gcc/doc/invoke.texi
@@ -1245,7 +1245,7 @@ See RS/6000 and PowerPC Options.
 -msave-restore  -mno-save-restore
 -mshorten-memrefs  -mno-shorten-memrefs
 -mstrict-align  -mno-strict-align
--mcmodel=medlow  -mcmodel=medany
+-mcmodel=medlow  -mcmodel=medany -mcmodel=large
 -mexplicit-relocs  -mno-explicit-relocs
 -mrelax  -mno-relax
 -mriscv-attribute  -mno-riscv-attribute
@@ -30158,6 +30158,11 @@ The code generated by the medium-any code model is 
position-independent, but is
 not guaranteed to function correctly when linked into position-independent
 executables or libraries.
 
+@opindex -mcmodel=large
+@item -mcmodel=large
+Generate code for a large code model, which has no restrictions on size or
+placement of symbols.
+
 @item -mexplicit-relocs
 @itemx -mno-exlicit-relocs
 Use or do not use assembler relocation operators when dealing with symbolic
-- 
2.43.0
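
As a rough illustration of what "no restrictions on size or placement"
means relative to the other two models, the sketch below checks whether a
symbol is addressable from a given PC.  It assumes the usual descriptions
(medlow: within about 2 GiB of address zero; medany: within about 2 GiB of
the code; large: address loaded from a constant pool) and ignores the small
asymmetry caused by sign-extended 12-bit immediates.  symbol_reachable is a
hypothetical helper, not a GCC function:

```c
#include <assert.h>
#include <stdbool.h>
#include <stdint.h>

enum cmodel { CM_MEDLOW, CM_MEDANY, CM_LARGE };

#define TWO_GIB ((int64_t) 1 << 31)

/* Approximate reachability of ADDR from code at PC under MODEL.  */
bool
symbol_reachable (enum cmodel model, int64_t pc, int64_t addr)
{
  switch (model)
    {
    case CM_MEDLOW:
      /* Absolute addressing: the symbol must be near address zero.  */
      return addr >= -TWO_GIB && addr < TWO_GIB;
    case CM_MEDANY:
      {
        /* PC-relative addressing: the symbol must be near the code.  */
        int64_t off = addr - pc;
        return off >= -TWO_GIB && off < TWO_GIB;
      }
    case CM_LARGE:
      /* Full address comes from a constant pool entry: no range limit.  */
      return true;
    }
  assert (!"unknown code model");
  return false;
}
```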



Re: [RFC][V2] RISC-V: Support -mcmodel=large.

2023-12-20 Thread Palmer Dabbelt

On Wed, 20 Dec 2023 09:55:48 PST (-0800), jeffreya...@gmail.com wrote:



On 12/18/23 00:46, KuanLin Chen wrote:

Hi Jeff,

Sorry for missing this.
I've removed riscv_asm_output_pool_epilogue because the pool
beginning is always aligned from FUNCTION_BOUNDARY.
Please find attached. Thank you.

Thanks. I regression tested this on rv64gc without any issues and fixed
up the ChangeLog a bit.  Pushed to the trunk.  Thanks for your patience.


Looks like the psABI PR is still open 
<https://github.com/riscv-non-isa/riscv-elf-psabi-doc/pull/388>?


I guess there's no hard psABI dependency here, we're just doing constant 
pools so there's no new assembler/linker stuff that's strictly 
necessary.  I'm fine just ignoring the psABI as it's a pretty miserable 
place to try and get things done, we've started doing that where we can 
elsewhere as well.


So maybe we should just close that PR?


jeff


Re: [PATCH] RISC-V: Block VLSmodes according to TARGET_MAX_LMUL and BITS_PER_RISCV_VECTOR

2023-12-20 Thread Palmer Dabbelt

On Tue, 05 Dec 2023 04:57:27 PST (-0800), juzhe.zh...@rivai.ai wrote:

This patch fixes ICE mentioned on PR112851 and PR112852.
Actually these ICEs happen many times in full coverage testing.

The ICE happens on:

bug.c:84:1: internal compiler error: in partial_subreg_p, at rtl.h:3187
   84 | }
  | ^
0x11a7271 partial_subreg_p(machine_mode, machine_mode)
../../../../gcc/gcc/rtl.h:3187

gcc_checking_assert (ordered_p (outer_prec, inner_prec));

outer_prec is the PRECISION of RVVM1SImode
inner_prec is the PRECISION of V64SImode

when it is zvl512b.

outer_prec is VLA mode with size (512, 512)
inner_prec is VLS mode with size (2048, 0)

Their precision/size relationship is not certain.
So block VLS modes according to TARGET_MAX_LMUL and BITS_PER_RISCV_VECTOR, so that 
we never reach the situation of comparing the precision/size of a VLA mode with 
that of a VLS mode whose size is greater than coeffs[0] of the VLA mode.
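
To see why (512, 512) and (2048, 0) cannot be ordered, treat each size as
c0 + c1 * x for an unknown x >= 0: at x = 0 the VLA size is 512 < 2048, at
x = 4 it is 2560 > 2048.  A minimal sketch of that check, using a simplified
two-coefficient model; struct poly2 and the helpers are hypothetical
stand-ins for GCC's poly_int machinery, and vls_bits_allowed is only an
assumed form of the blocking rule, not the code in vls_mode_valid_p:

```c
#include <assert.h>
#include <stdbool.h>

/* A size of the form c0 + c1 * x, where x >= 0 is only known at run time.  */
struct poly2 { long c0, c1; };

/* True if A <= B for every x >= 0.  */
bool
known_le (struct poly2 a, struct poly2 b)
{
  return a.c0 <= b.c0 && a.c1 <= b.c1;
}

/* True if A and B compare the same way for every x >= 0.  */
bool
ordered_p (struct poly2 a, struct poly2 b)
{
  return known_le (a, b) || known_le (b, a);
}

/* Assumed blocking rule: reject a fixed-length mode wider than MAX_LMUL
   registers of MIN_VLEN bits each.  */
bool
vls_bits_allowed (long bits, long min_vlen, int max_lmul)
{
  assert (min_vlen > 0 && max_lmul > 0);
  return bits <= min_vlen * max_lmul;
}
```

Under this model a 2048-bit VLS mode is rejected for zvl512b at LMUL = 1, so
the unordered comparison against RVVM1SImode would not arise.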

Note this patch causes the following regressions:

FAIL: gcc.target/riscv/rvv/autovec/pr111751.c -O3 -ftree-vectorize  
scan-assembler-not vset
FAIL: gcc.target/riscv/rvv/autovec/pr111751.c -O3 -ftree-vectorize  
scan-assembler-times li\\s+[a-x0-9]+,0\\s+ret 2

FAIL: gcc.target/riscv/rvv/base/cpymem-1.c check-function-bodies f3
FAIL: gcc.target/riscv/rvv/base/cpymem-2.c check-function-bodies f2
FAIL: gcc.target/riscv/rvv/base/cpymem-2.c check-function-bodies f3

1. The cpymem check-function-bodies FAILs should be fixed in the testcases, since 
the tests are fragile and should be made more robust.

2. pr111751.c is a vector cost model issue, and I will fix it in a following 
patch.

For now, we should land this patch first (highest priority) since it fixes 
an ICE.

PR target/112851
PR target/112852


I know I'm pretty late here, but this has happened a bunch of times 
before and I keep getting stuck on other stuff and thus don't get the 
time to say anything.  So I figured I'd say something anyway:


Please stop committing code that introduces new test failures, even if 
you don't think those failures are important.  We've got a lot of people 
trying to push through the test failures with the hope of getting larger 
code bases compiling correctly, having to chase down a churn in the test 
suite just isn't productive.




gcc/ChangeLog:

* config/riscv/riscv-v.cc (vls_mode_valid_p): Block VLS modes according to 
TARGET_MAX_LMUL and BITS_PER_RISCV_VECTOR.

gcc/testsuite/ChangeLog:

* gcc.target/riscv/rvv/autovec/vls/consecutive-1.c: Add LMUL = 8 option.
* gcc.target/riscv/rvv/autovec/vls/consecutive-2.c: Ditto.
* gcc.target/riscv/rvv/autovec/vls/mod-1.c: Ditto.
* gcc.target/riscv/rvv/autovec/vls/mov-1.c: Ditto.
* gcc.target/riscv/rvv/autovec/vls/mov-10.c: Ditto.
* gcc.target/riscv/rvv/autovec/vls/mov-11.c: Ditto.
* gcc.target/riscv/rvv/autovec/vls/mov-12.c: Ditto.
* gcc.target/riscv/rvv/autovec/vls/mov-13.c: Ditto.
* gcc.target/riscv/rvv/autovec/vls/mov-14.c: Ditto.
* gcc.target/riscv/rvv/autovec/vls/mov-15.c: Ditto.
* gcc.target/riscv/rvv/autovec/vls/mov-16.c: Ditto.
* gcc.target/riscv/rvv/autovec/vls/mov-17.c: Ditto.
* gcc.target/riscv/rvv/autovec/vls/mov-3.c: Ditto.
* gcc.target/riscv/rvv/autovec/vls/mov-5.c: Ditto.
* gcc.target/riscv/rvv/autovec/vls/mov-7.c: Ditto.
* gcc.target/riscv/rvv/autovec/vls/mov-8.c: Ditto.
* gcc.target/riscv/rvv/autovec/vls/mov-9.c: Ditto.
* gcc.target/riscv/rvv/autovec/vls/spill-1.c: Ditto.
* gcc.target/riscv/rvv/autovec/vls/spill-2.c: Ditto.
* gcc.target/riscv/rvv/autovec/vls/spill-3.c: Ditto.
* gcc.target/riscv/rvv/autovec/vls/spill-5.c: Ditto.
* gcc.target/riscv/rvv/autovec/vls/spill-6.c: Ditto.
* gcc.target/riscv/rvv/autovec/zve32f-1.c: Adapt test.
* gcc.target/riscv/rvv/autovec/pr112851.c: New test.
* gcc.target/riscv/rvv/autovec/pr112852.c: New test.

---
 gcc/config/riscv/riscv-v.cc   | 16 +++-
 .../gcc.target/riscv/rvv/autovec/pr112851.c   | 21 +
 .../gcc.target/riscv/rvv/autovec/pr112852.c   | 87 +++
 .../riscv/rvv/autovec/vls/consecutive-1.c |  2 +-
 .../riscv/rvv/autovec/vls/consecutive-2.c |  2 +-
 .../gcc.target/riscv/rvv/autovec/vls/mod-1.c  |  2 +-
 .../gcc.target/riscv/rvv/autovec/vls/mov-1.c  |  2 +-
 .../gcc.target/riscv/rvv/autovec/vls/mov-10.c |  2 +-
 .../gcc.target/riscv/rvv/autovec/vls/mov-11.c |  2 +-
 .../gcc.target/riscv/rvv/autovec/vls/mov-12.c |  2 +-
 .../gcc.target/riscv/rvv/autovec/vls/mov-13.c |  2 +-
 .../gcc.target/riscv/rvv/autovec/vls/mov-14.c |  2 +-
 .../gcc.target/riscv/rvv/autovec/vls/mov-15.c |  2 +-
 .../gcc.target/riscv/rvv/autovec/vls/mov-16.c |  2 +-
 .../gcc.target/riscv/rvv/autovec/vls/mov-17.c |  2 +-
 .../gcc.target/riscv/rvv/autovec/vls/mov-3.c  |  2 +-
 .../gcc.target/riscv/rvv/autovec/vls/mov-5.c  |  2 +-
 .../gcc.target/riscv/rvv/autovec/vls/mov-7.c  |  2 +-
 

Re: [PATCH v2] RISC-V: Supports RISC-V Profiles in '-march' option.

2023-12-19 Thread Palmer Dabbelt

On Tue, 12 Dec 2023 04:08:09 PST (-0800), jia...@iscas.ac.cn wrote:

Support RISC-V profiles[1] in the -march option.

By default, the profile is expected to come before any other extensions in the input.

V2: Fix some format errors and add code comments for the parse function.
Thanks to Jeff Law for the review and comments.

[1]https://github.com/riscv/riscv-profiles/blob/main/profiles.adoc


IMO we should wait on the profiles.  They're all minor releases so it's 
not like there's any value for users.  Between the churn still in 
profiles and all these barely-defined extensions this all looks to have 
been very rushed, and I just don't trust RISC-V's stance on 
compatibility enough to try and support this sort of thing -- we've 
gotten burned enough times trying to do that.



gcc/ChangeLog:

* common/config/riscv/riscv-common.cc (struct riscv_profiles):
  New struct.
(riscv_subset_list::parse_profiles): New function.
(riscv_subset_list::parse): New table.
* config/riscv/riscv-subset.h: New prototype.

gcc/testsuite/ChangeLog:

* gcc.target/riscv/arch-31.c: New test.
* gcc.target/riscv/arch-32.c: New test.
* gcc.target/riscv/arch-33.c: New test.
* gcc.target/riscv/arch-34.c: New test.

---
 gcc/common/config/riscv/riscv-common.cc  | 83 +++-
 gcc/config/riscv/riscv-subset.h  |  2 +
 gcc/testsuite/gcc.target/riscv/arch-31.c |  5 ++
 gcc/testsuite/gcc.target/riscv/arch-32.c |  5 ++
 gcc/testsuite/gcc.target/riscv/arch-33.c |  5 ++
 gcc/testsuite/gcc.target/riscv/arch-34.c |  7 ++
 6 files changed, 106 insertions(+), 1 deletion(-)
 create mode 100644 gcc/testsuite/gcc.target/riscv/arch-31.c
 create mode 100644 gcc/testsuite/gcc.target/riscv/arch-32.c
 create mode 100644 gcc/testsuite/gcc.target/riscv/arch-33.c
 create mode 100644 gcc/testsuite/gcc.target/riscv/arch-34.c

diff --git a/gcc/common/config/riscv/riscv-common.cc 
b/gcc/common/config/riscv/riscv-common.cc
index 4d5a2f874a2..8b674a4a280 100644
--- a/gcc/common/config/riscv/riscv-common.cc
+++ b/gcc/common/config/riscv/riscv-common.cc
@@ -195,6 +195,12 @@ struct riscv_ext_version
   int minor_version;
 };

+struct riscv_profiles
+{
+  const char *profile_name;
+  const char *profile_string;
+};
+
 /* All standard extensions defined in all supported ISA spec.  */
 static const struct riscv_ext_version riscv_ext_version_table[] =
 {
@@ -379,6 +385,42 @@ static const struct riscv_ext_version riscv_combine_info[] 
=
   {NULL, ISA_SPEC_CLASS_NONE, 0, 0}
 };

+/* This table records the mapping from RISC-V Profiles into march string.  */
+static const riscv_profiles riscv_profiles_table[] =
+{
+  /* RVI20U only contains the base extension 'i' as mandatory extension.  */
+  {"RVI20U64", "rv64i"},
+  {"RVI20U32", "rv32i"},
+
+  /* RVA20U contains the 'i,m,a,f,d,c,zicsr' as mandatory extensions.
+ Currently we don't have zicntr,ziccif,ziccrse,ziccamoa,
+ zicclsm,za128rs yet.   */
+  {"RVA20U64", "rv64imafdc_zicsr"},
+
+  /* RVA20S64 includes all the mandatory extensions of RVA20U64 and
+ additionally 'zifencei' as a mandatory extension.
+ Note that ss1p11, svbare, sv39, svade, sscptr, ssvecd, sstvala should
+ be controlled by binutils.  */
+  {"RVA20S64", "rv64imafdc_zicsr_zifencei"},
+
+  /* RVA22U contains the 'i,m,a,f,d,c,zicsr,zihintpause,zba,zbb,zbs,
+ zicbom,zicbop,zicboz,zfhmin,zkt' as mandatory extensions.
+ Currently we don't have zicntr,zihpm,ziccif,ziccrse,ziccamoa,
+ zicclsm,zic64b,za64rs yet.  */
+  {"RVA22U64", "rv64imafdc_zicsr_zihintpause_zba_zbb_zbs" \
+   "_zicbom_zicbop_zicboz_zfhmin_zkt"},
+
+  /* RVA22S64 includes all the mandatory extensions of RVA22U64 and
+ additionally 'zifencei,svpbmt,svinval' as mandatory extensions.
+ Note that the ss1p12, svbare, sv39, svade, sscptr, ssvecd, sstvala,
+ scounterenw extensions should be controlled by binutils.  */
+  {"RVA22S64","rv64imafdc_zicsr_zifencei_zihintpause" \
+   "_zba_zbb_zbs_zicbom_zicbop_zicboz_zfhmin_zkt_svpbmt_svinval"},
+
+  /* Terminate the list.  */
+  {NULL, NULL}
+};
+
 static const riscv_cpu_info riscv_cpu_tables[] =
 {
 #define RISCV_CORE(CORE_NAME, ARCH, TUNE) \
@@ -958,6 +1000,42 @@ riscv_subset_list::parsing_subset_version (const char 
*ext,
   return p;
 }

+/* Parsing RISC-V Profiles in -march string.
+   Return string with mandatory extensions of Profiles.  */
+const char *
+riscv_subset_list::parse_profiles (const char * p){
+  /* Check whether the input string contains a profile.
+ There are two ways to use profiles in the -march option:
+
+   1. Only a profile as the -march input
+   2. A profile mixed with other extensions
+
+ Use '+' to separate the profile from the other extensions.  */
+  for (int i = 0; riscv_profiles_table[i].profile_name != NULL; ++i) {
+const char* match = strstr(p, riscv_profiles_table[i].profile_name);
+const char* plus_ext = strchr(p, '+');
+/* Find profile at the begin.  */
+if (match != NULL 
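
The patch text is cut off here, so the following is only a sketch of the two
cases described in the comment (a profile alone, or a profile followed by '+'
and further extensions), not the submitted implementation.  The table reuses
three entries from riscv_profiles_table; expand_profile and the rule of
joining the trailing extensions with '_' are assumptions made for
illustration:

```c
#include <assert.h>
#include <stdio.h>
#include <string.h>

struct profile { const char *name; const char *arch; };

static const struct profile profiles[] = {
  { "RVI20U64", "rv64i" },
  { "RVI20U32", "rv32i" },
  { "RVA20U64", "rv64imafdc_zicsr" },
  { NULL, NULL }
};

/* Expand a leading profile name in ARCH into OUT (size N).  Text after a
   '+' is appended with '_' as the separator (an assumption of this sketch).
   Returns 0 on success, -1 if ARCH does not start with a known profile,
   is malformed, or does not fit in OUT.  */
int
expand_profile (const char *arch, char *out, size_t n)
{
  assert (arch != NULL && out != NULL);
  for (int i = 0; profiles[i].name != NULL; i++)
    {
      size_t len = strlen (profiles[i].name);
      if (strncmp (arch, profiles[i].name, len) != 0)
        continue;
      const char *rest = arch + len;
      int written;
      if (*rest == '\0')
        /* Case 1: the profile is the whole -march input.  */
        written = snprintf (out, n, "%s", profiles[i].arch);
      else if (*rest == '+' && rest[1] != '\0')
        /* Case 2: profile mixed with other extensions.  */
        written = snprintf (out, n, "%s_%s", profiles[i].arch, rest + 1);
      else
        return -1;
      return (written >= 0 && (size_t) written < n) ? 0 : -1;
    }
  return -1;
}
```

Matching only at the start of the string, rather than with strstr, is a
deliberate simplification here.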

Re: [PATCH] RISC-V: Add Zvfbfmin extension to the -march= option

2023-12-12 Thread Palmer Dabbelt
_zvfbfmin -mabi=lp64d" } */
+int foo()
+{
+}
diff --git a/gcc/testsuite/gcc.target/riscv/predef-32.c 
b/gcc/testsuite/gcc.target/riscv/predef-32.c
new file mode 100644
index 000..7417e0d996f
--- /dev/null
+++ b/gcc/testsuite/gcc.target/riscv/predef-32.c
@@ -0,0 +1,43 @@
+/* { dg-do compile } */
+/* { dg-options "-O2 -march=rv32i_zvfbfmin -mabi=ilp32f -mcmodel=medlow 
-misa-spec=20191213" } */
+
+int main () {
+
+#ifndef __riscv_arch_test
+#error "__riscv_arch_test"
+#endif
+
+#if __riscv_xlen != 32
+#error "__riscv_xlen"
+#endif
+
+#if !defined(__riscv_i)
+#error "__riscv_i"
+#endif
+
+#if !defined(__riscv_f)
+#error "__riscv_f"
+#endif
+
+#if !defined(__riscv_zvfbfmin)
+#error "__riscv_zvfbfmin"
+#endif
+
+#if defined(__riscv_v)
+#error "__riscv_v"
+#endif
+
+#if defined(__riscv_d)
+#error "__riscv_d"
+#endif
+
+#if defined(__riscv_c)
+#error "__riscv_c"
+#endif
+
+#if defined(__riscv_a)
+#error "__riscv_a"
+#endif
+
+  return 0;
+}
diff --git a/gcc/testsuite/gcc.target/riscv/predef-33.c 
b/gcc/testsuite/gcc.target/riscv/predef-33.c
new file mode 100644
index 000..74d05bc9719
--- /dev/null
+++ b/gcc/testsuite/gcc.target/riscv/predef-33.c
@@ -0,0 +1,43 @@
+/* { dg-do compile } */
+/* { dg-options "-O2 -march=rv64iv_zvfbfmin -mabi=lp64d -mcmodel=medlow 
-misa-spec=20191213" } */
+
+int main () {
+
+#ifndef __riscv_arch_test
+#error "__riscv_arch_test"
+#endif
+
+#if __riscv_xlen != 64
+#error "__riscv_xlen"
+#endif
+
+#if !defined(__riscv_i)
+#error "__riscv_i"
+#endif
+
+#if !defined(__riscv_f)
+#error "__riscv_f"
+#endif
+
+#if !defined(__riscv_d)
+#error "__riscv_d"
+#endif
+
+#if !defined(__riscv_v)
+#error "__riscv_v"
+#endif
+
+#if !defined(__riscv_zvfbfmin)
+#error "__riscv_zvfbfmin"
+#endif
+
+#if defined(__riscv_c)
+#error "__riscv_c"
+#endif
+
+#if defined(__riscv_a)
+#error "__riscv_a"
+#endif
+
+  return 0;
+}


Reviewed-by: Palmer Dabbelt 


[PATCH] RISC-V: Don't make Ztso imply A

2023-12-12 Thread Palmer Dabbelt
I can't actually find anything in the ISA manual that makes Ztso imply
A.  In theory the memory ordering is just a different thing than the set
of available instructions (ie, Ztso without A would still imply TSO for
loads and stores).  It also seems like a configuration that could be
sane to build: without A it's all but impossible to write any meaningful
multi-core code, and TSO is really cheap for a single core.

That said, I think it's kind of reasonable to provide A to users asking
for Ztso.  So maybe even if this was a mistake it's the right thing to
do?

gcc/ChangeLog:

* common/config/riscv/riscv-common.cc (riscv_implied_info):
Remove {"ztso", "a"}.
---
 gcc/common/config/riscv/riscv-common.cc | 2 --
 1 file changed, 2 deletions(-)

diff --git a/gcc/common/config/riscv/riscv-common.cc 
b/gcc/common/config/riscv/riscv-common.cc
index f142212f2ed..5f39e5ea462 100644
--- a/gcc/common/config/riscv/riscv-common.cc
+++ b/gcc/common/config/riscv/riscv-common.cc
@@ -71,8 +71,6 @@ static const riscv_implied_info_t riscv_implied_info[] =
   {"zks", "zksed"},
   {"zks", "zksh"},
 
-  {"ztso", "a"},
-
   {"v", "zvl128b"},
   {"v", "zve64d"},
 
-- 
2.42.1
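
A small sketch of what dropping the rule changes, assuming the implied-info
table is applied repeatedly until no new extension appears.  The rule table
copies a few entries visible in the hunk; struct ext_set and the helper
functions are hypothetical and much simpler than riscv_subset_list:

```c
#include <assert.h>
#include <stdbool.h>
#include <string.h>

struct implied { const char *ext; const char *implies; };

/* The {"ztso", "a"} rule is kept last so it can be left out by passing
   NRULES - 1 to expand_implied.  */
static const struct implied rules[] = {
  { "zks", "zksed" },
  { "zks", "zksh" },
  { "v", "zvl128b" },
  { "v", "zve64d" },
  { "ztso", "a" },   /* the rule the patch removes */
};
#define NRULES ((int) (sizeof rules / sizeof rules[0]))

#define MAX_EXTS 16
struct ext_set { const char *names[MAX_EXTS]; int count; };

bool
has_ext (const struct ext_set *s, const char *name)
{
  for (int i = 0; i < s->count; i++)
    if (strcmp (s->names[i], name) == 0)
      return true;
  return false;
}

void
add_ext (struct ext_set *s, const char *name)
{
  if (has_ext (s, name))
    return;
  assert (s->count < MAX_EXTS);
  s->names[s->count++] = name;
}

/* Apply the first NRULES entries of R until no new extension is added.  */
void
expand_implied (struct ext_set *s, const struct implied *r, int nrules)
{
  bool changed = true;
  while (changed)
    {
      changed = false;
      for (int i = 0; i < nrules; i++)
        if (has_ext (s, r[i].ext) && !has_ext (s, r[i].implies))
          {
            add_ext (s, r[i].implies);
            changed = true;
          }
    }
}
```

With all rules, a set containing "ztso" gains "a"; with the last rule
omitted it does not, which is the behavior change the commit message
describes.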



Re: [PATCH] RISC-V: Remove xfail from ssa-fre-3.c testcase

2023-12-06 Thread Palmer Dabbelt

On Wed, 06 Dec 2023 10:48:30 PST (-0800), Vineet Gupta wrote:


On 12/6/23 08:22, Palmer Dabbelt wrote:

Ran the test case at 122e7b4f9d0c2d54d865272463a1d812002d0a5c where the xfail

That's the original port submission, I'm actually kind of surprised it
still builds/works at all.


A full toolchain build would have been a stretch (pairing it with a
matching binutils, etc).
So I'd asked Edwin to just do a minimal cc1 build.


Ah, good idea.  I've gotten hung up a bunch of times trying to reproduce 
old stuff.  I'd always been trying full toolchain builds, I bet cc1 
would have a better chance of building for me.


Re: [PATCH] RISC-V: Remove xfail from ssa-fre-3.c testcase

2023-12-06 Thread Palmer Dabbelt

On Tue, 05 Dec 2023 16:39:06 PST (-0800), e...@rivosinc.com wrote:

Ran the test case at 122e7b4f9d0c2d54d865272463a1d812002d0a5c where the xfail


That's the original port submission, I'm actually kind of surprised it 
still builds/works at all.



was introduced. The test did pass at that hash and has continued to pass since
then. Remove the xfail

gcc/testsuite/ChangeLog:

* gcc.dg/tree-ssa/ssa-fre-3.c: Remove xfail

Signed-off-by: Edwin Lu 
---
 gcc/testsuite/gcc.dg/tree-ssa/ssa-fre-3.c | 2 +-
 1 file changed, 1 insertion(+), 1 deletion(-)

diff --git a/gcc/testsuite/gcc.dg/tree-ssa/ssa-fre-3.c 
b/gcc/testsuite/gcc.dg/tree-ssa/ssa-fre-3.c
index 224dd4f72ef..b2924837a22 100644
--- a/gcc/testsuite/gcc.dg/tree-ssa/ssa-fre-3.c
+++ b/gcc/testsuite/gcc.dg/tree-ssa/ssa-fre-3.c
@@ -18,4 +18,4 @@ foo (int a, int b)
   return aa + bb;
 }

-/* { dg-final { scan-tree-dump "Replaced \\\(int\\\) aa_.*with a_" "fre1" { xfail { 
riscv*-*-* && lp64 } } } } */
+/* { dg-final { scan-tree-dump "Replaced \\\(int\\\) aa_.*with a_" "fre1" } } 
*/


Reviewed-by: Palmer Dabbelt 

Though Kito did all the test suite stuff back then, so not sure if he 
happens to remember anything specific about what was going on.


Thanks!


Re: RISC-V: Support XTheadVector extensions

2023-11-28 Thread Palmer Dabbelt

On Fri, 17 Nov 2023 16:01:27 PST (-0800), jeffreya...@gmail.com wrote:



On 11/17/23 16:16, 钟居哲 wrote:

 >> I assume this hunk is meant for riscv_output_operand in riscv.cc.  We

may also need to add '^' to the punct_valid_p hook.  But yes, this is
the preferred way to go when all we need to do is prefix the instruction
with "th.".


No. I don't think we need to add '^'. I don't want theadvector to touch
any of the code
in vector.md.
Mixing up theadvector with RVV 1.0 is a nightmare for RVV maintenance.
People like me don't want to touch anything related to T-Head.
But anyway, I will take care of that in GCC-15.

I suspect it's going to be even worse if we have multiple patterns
with the same underlying RTL, but just different output strings.

The standard way to handle that has been with an output modifier and/or
ASSEMBLER_DIALECT.  If you look at the PA port for example, the
assembler syntax changed dramatically between the PA1.0/PA1.1 era and
the PA2.0 era.  But we support both variants trivially without
duplicating all the patterns.


IMO we're just stuck between a rock and a hard place here.  
Specifically, this isn't just an assembly syntax change but also comes 
with a bunch of behavioral changes to the instructions in question -- 
I'm specifically thinking of things like the register packing, which 
IIRC changed a ton between 0.7 and 0.8 (and then again more for 1.0).  
That's the kind of stuff that tends to have non-local implications on 
the port, and thus can trip people up.


So if we model this as just assembly syntax then we risk people tripping 
over the differences, but if we try to model it as a whole different 
extension then we have more code to manage.  I'd start with the assembly 
syntax approach, as it should be the option with less code which is 
always nice.  If that turns out to be a problem then we can always just 
duplicate the patterns, but it's way harder to merge them back together 
if we start out with things duplicated.


During the patchwork call we also ended up talking about the P extension 
(and the likely vendor flavors).  Nothing's appeared for there yet, but 
the theory is that the RZ/Five (Renesas' line of RISC-V chips that came 
out earlier this year) has some P-related extension.  There's also some 
SIMD in CORE-V, as well as a bunch of low-hanging fruit missing from V 
that we'll probably see more vendor extensions for.


So I think if the goal is to have a single vector target for RISC-V then 
we've probably lost already.



But we've got time to sort this out.  I don't think the code in question
was targeted towards gcc-14.


[In case anyone else is watching: see the forked thread, it might be 
aimed for 14 now...]





jeff


T-Head Vector for GCC-14? (was Re: RISC-V: Support XTheadVector extensions)

2023-11-28 Thread Palmer Dabbelt

On Wed, 22 Nov 2023 14:27:50 PST (-0800), jeffreya...@gmail.com wrote:

...


[Trimming everything else, as this is a big change.  I'm also making it 
a new subject/thread, so folks can see.]



More generally, I think I need to soften my prior statement about
deferring this to gcc-15.  This code was submitted in time for the
gcc-14 deadline, so it should be evaluated just like we do anything else
that makes the deadline.  There are various criteria we use to evaluate
if something should get integrated and we should just work through this
series like we always do and not treat it specially in any way.


We talked about this some in the patchwork meeting today.  There's a lot 
of moving parts here, so here's my best bet at summarizing:

It seems like folks broadly agree: I think the only reason everyone was 
so quick to defer to 15 was because we thought the Vrull guys even wanted 
to, but it sounds like there's some interest in getting this into 14.  
That's obviously a risky thing to do given it was sent right at the end 
of the window, but it meets the rules.


Folks in the call seemed generally amenable to at least trying for 14, 
so unless anyone's opposed on the lists it seems like the way to go.  
IIRC we ended up with the following TODO list:


* Make sure this doesn't regress on the targets we already support.  
 From the sounds of things there's been test suite runs that look fine, 
 so hopefully that's all manageable.  Christoph said he'd send 
 something out, we've had a bunch of test skew so there might be a bit 
 lurking but it should be generally manageable.
* We agree on some sort of support lifecycle.  There seemed to be 
 basically two proposals: merge for 14 with the aim of quickly 
 deprecating it (maybe even for 15), or merge for 14 with the aim of 
 keeping it until it ends up un-tested (ie, requiring test results are 
 published for every release).
* We actually find some time to sit down and do the code review.  
 That'll be a chunk of work and time is tight since most of us are 
 focusing on V-1.0, but hopefully we've got time to fit things in.
* There's some options for testing without hardware: QEMU dropped 
 support for V-0.7.1 a while ago, but there's a patch set that's not 
 yet on the lists to bring that back.


So I think unless anyone's opposed, we can at least start looking into 
getting this into GCC-14 -- there's obviously still a ton of review work 
to do and we might find something problematic, but we won't know until 
we actually sit down and do the reviews.


---

Then for my opinions:

The only policy worry I have is the support lifecycle: IMO merging 
something we're going to quickly deprecate is going to lead to headaches 
for users, so we should only merge this if we're going to plan on 
supporting it for the life of the hardware.  That's always hard to 
define, but we talked through the option of pushing this onto the users: 
we'd require test results published for every GCC release, and if no 
reasonably clean test results are published then we'll assume the HW is 
defunct and support for it can be deprecated.  That's sort of patterned 
on how glibc documents deprecating ports.


IIRC we didn't really end up with any deprecation policy when merging 
the other vendor support, so I'd argue we should just make that the 
general plan for supporting vendor extensions.  It pushes a little more 
work to the vendors/users than we have before, but I think it's a good 
balance.  It's also a pretty easy policy for vendors to understand: if 
they want their custom stuff supported, they need to demonstrate it 
works. 


[PATCH 2/2] testsuite/unroll-8: Disable vectorization for variable-factor targets

2023-11-21 Thread Palmer Dabbelt
The vectorizer picks up these loops and disables unrolling on targets
with variable vector factors.  That results in better code here, but it
trips up the unrolling tests.  So just disable vectorization for these.

gcc/testsuite/ChangeLog:

PR target/112531
* gcc.dg/unroll-8.c: Disable vectorization on arm64 and riscv.
---
This also isn't tested.
---
 gcc/testsuite/gcc.dg/unroll-8.c | 9 +
 1 file changed, 9 insertions(+)

diff --git a/gcc/testsuite/gcc.dg/unroll-8.c b/gcc/testsuite/gcc.dg/unroll-8.c
index 06d32e56893..4465c620800 100644
--- a/gcc/testsuite/gcc.dg/unroll-8.c
+++ b/gcc/testsuite/gcc.dg/unroll-8.c
@@ -1,6 +1,15 @@
 /* { dg-do compile } */
 /* { dg-options "-O2 -fdump-rtl-loop2_unroll-details-blocks -funroll-loops" } 
*/
+
+/*
+ * Targets that support variable-length vectorization don't unroll loops (see
+ * the "Disabling unrolling due to variable-length vectorization factor" out in
+ * tree-vect-loop).  So disable tree vectorization for these targets, as it will
+ * interfere with the unrolling we're looking for below.
+ */
+/* { dg-additional-options "-fno-tree-vectorize" { target aarch64-*-* } } */
 /* { dg-additional-options "-fno-tree-vectorize" { target amdgcn-*-* } } */
+/* { dg-additional-options "-fno-tree-vectorize" { target riscv*-*-* } } */
 
 struct a {int a[7];};
 void t(struct a *a, int n)
-- 
2.42.1



[PATCH 1/2] testsuite/unroll-8: Avoid triggering undefined behavior

2023-11-21 Thread Palmer Dabbelt
I was poking around with this test failure and noticed it was exercising
undefined behavior.  The return type doesn't matter for what's being
tested, so just mark it as void.

gcc/testsuite/ChangeLog:

* gcc.dg/unroll-8.c: Remove UB.
---
I didn't test this, but it seems trivial enough that I'm just going to
throw it at the bots and hope I'm right.
---
 gcc/testsuite/gcc.dg/unroll-8.c | 2 +-
 1 file changed, 1 insertion(+), 1 deletion(-)

diff --git a/gcc/testsuite/gcc.dg/unroll-8.c b/gcc/testsuite/gcc.dg/unroll-8.c
index 4388f47d4c7..06d32e56893 100644
--- a/gcc/testsuite/gcc.dg/unroll-8.c
+++ b/gcc/testsuite/gcc.dg/unroll-8.c
@@ -3,7 +3,7 @@
 /* { dg-additional-options "-fno-tree-vectorize" { target amdgcn-*-* } } */
 
 struct a {int a[7];};
-int t(struct a *a, int n)
+void t(struct a *a, int n)
 {
   int i;
   for (i=0;i

Re: RISC-V: Support XTheadVector extensions

2023-11-17 Thread Palmer Dabbelt

On Fri, 17 Nov 2023 03:39:48 PST (-0800), juzhe.zh...@rivai.ai wrote:

90% of the theadvector extension reuses the current RVV 1.0 instruction patterns
and just changes the ASM.  For example:

@@ -2923,7 +2923,7 @@ (define_insn "*pred_mulh_scalar"
 (match_operand:VFULLI_D 3 "register_operand"  "vr,vr, vr, vr")] 
VMULH)
  (match_operand:VFULLI_D 2 "vector_merge_operand" "vu, 0, vu,  0")))]
   "TARGET_VECTOR"
-  "vmulh.vx\t%0,%3,%z4%p1"
+  "%^vmulh.vx\t%0,%3,%z4%p1"
   [(set_attr "type" "vimul")
(set_attr "mode" "")])
+  if (letter == '^')
+{
+  if (TARGET_XTHEADVECTOR)
+   fputs ("th.", file);
+  return;
+}

For almost all patterns, you simply prepend "th." to the ASM mnemonic,
e.g. change "vmulh.vv" -> "th.vmulh.vv".
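
A rough stand-alone sketch of the prefixing idea, assuming a plain string
template rather than GCC's real print-operand machinery.  The function name
expand_prefix and its xtheadvector parameter are made up for the example;
only the "%^" marker and the "th." prefix come from the snippet above.

```c
#include <stddef.h>
#include <string.h>

/* Expand the "%^" marker in an output template: emit "th." when the
   (hypothetical) xtheadvector flag is set, nothing otherwise.  All
   other characters are copied through unchanged.  Returns the length
   written, or -1 if OUT is too small.  */
int expand_prefix (char *out, size_t outsz, const char *tmpl, int xtheadvector)
{
  size_t n = 0;

  if (outsz == 0)
    return -1;

  while (*tmpl)
    {
      if (tmpl[0] == '%' && tmpl[1] == '^')
        {
          const char *pfx = xtheadvector ? "th." : "";
          size_t len = strlen (pfx);
          if (n + len >= outsz)
            return -1;
          memcpy (out + n, pfx, len);
          n += len;
          tmpl += 2;
        }
      else
        {
          if (n + 1 >= outsz)
            return -1;
          out[n++] = *tmpl++;
        }
    }
  out[n] = '\0';
  return (int) n;
}
```

With the flag set, the template "%^vmulh.vx" comes out as "th.vmulh.vx";
with it clear, the same template yields the unchanged RVV 1.0 mnemonic.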

Almost all theadvector instructions are not new features; they are the same as RVV 1.0.
Why invent an ISA extension that doesn't include any features that RVV 1.0 doesn't
already provide?

I am not explicitly objecting to this patch, but I would like to know the reason.


There's some more in the later threads, but with the top posting it kind 
of got lost so I'm just replying here.


This really isn't T-Head's fault: we announced V-0.7 as a stable draft 
that was being implemented, and then T-Head went and implemented it.  
Most of that history has been scrubbed by RVI, but you can still find 
some stuff like this old talk on YouTube 
.


In general we've just figured out a way to make things work when HW 
vendors end up in a grey area in RISC-V land.  That obviously results in 
a bunch of pain for the SW people, but this stuff is only useful if we 
can run on real HW and that always involves some amount of pain.  
Hopefully we can get to a point where we make fewer problems for 
ourselves, but we've got a long history to dig out from and there's 
going to be a lot more of this in the future.


So I don't like this XTHeadV stuff, but I think we're best to take it: 
these guys tried to do the right thing and got thrown under the bus by 
RVI, we should help them.  This is almost certainly going to be a lot 
more pain than we're used to, just given the size of the extensions in 
question, but I still think it's the right way to go.


The other option is to essentially just tell them to fork the ISA, which 
isn't good for anyone.



Btw, stage 1 will close soon.  So I will review this patch on GCC-15 as long as 
all other RISC-V maintainers agree.


I agree this is gcc-15 material: there's a lot of subtle differences in 
behavior between 0.7 and 1.0, even when the mnemonics are the same.  
We're already pretty buried in testing for 14, so trying to pick up 
another target is going to be a huge headache (particularly one that's a 
bit special).







juzhe.zh...@rivai.ai


RISC-V GCC Patchwork Sync on Nov 14th and 21st

2023-11-09 Thread Palmer Dabbelt
I'm going to be traveling for the next two weeks (Plumbers and then 
Thanksgiving), so I won't be at the patchwork syncs.


Re: [PATCH] RISC-V: Use stdint-gcc.h in rvv testsuite

2023-11-07 Thread Palmer Dabbelt

On Tue, 07 Nov 2023 01:45:19 PST (-0800), christoph.muell...@vrull.eu wrote:

From: Christoph Müllner 

stdint.h can be replaced with stdint-gcc.h to resolve some missing
system headers in non-multilib installations.

gcc/testsuite/ChangeLog:

* gcc.target/riscv/xtheadmemidx-helpers.h:
Replace stdint.h with stdint-gcc.h.

Signed-off-by: Christoph Müllner 
---
 gcc/testsuite/gcc.target/riscv/xtheadmemidx-helpers.h | 2 +-
 1 file changed, 1 insertion(+), 1 deletion(-)

diff --git a/gcc/testsuite/gcc.target/riscv/xtheadmemidx-helpers.h 
b/gcc/testsuite/gcc.target/riscv/xtheadmemidx-helpers.h
index a97f08c5cc1..9d8ce124a93 100644
--- a/gcc/testsuite/gcc.target/riscv/xtheadmemidx-helpers.h
+++ b/gcc/testsuite/gcc.target/riscv/xtheadmemidx-helpers.h
@@ -1,7 +1,7 @@
 #ifndef XTHEADMEMIDX_HELPERS_H
 #define XTHEADMEMIDX_HELPERS_H

-#include <stdint.h>
+#include <stdint-gcc.h>

 #define intX_t long
 #define uintX_t unsigned long



Presumably this still passes the tests?  If so it LGTM so

Reviewed-by: Palmer Dabbelt 

Thanks!
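
For context, a minimal sketch of why the swap helps: stdint-gcc.h builds the
fixed-width types from macros the compiler predefines, so no C library header
is needed.  This assumes a GCC-compatible compiler, and the my_* names and
the widen helper are invented for the example.

```c
/* Fixed-width types taken from macros the compiler predefines
   (GCC and Clang), with no libc header involved.  */
typedef __INT32_TYPE__  my_int32_t;
typedef __UINT64_TYPE__ my_uint64_t;

/* Compile-time checks that the widths are what the names promise.  */
_Static_assert (sizeof (my_int32_t) == 4, "my_int32_t must be 32 bits");
_Static_assert (sizeof (my_uint64_t) == 8, "my_uint64_t must be 64 bits");

my_uint64_t widen (my_int32_t x)
{
  /* Converting a negative value to an unsigned type wraps modulo 2^64.  */
  return (my_uint64_t) x;
}
```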


Re: RISC-V patchworks call tomorrow ?

2023-11-06 Thread Palmer Dabbelt

On Mon, 06 Nov 2023 18:47:24 PST (-0800), jeffreya...@gmail.com wrote:



On 11/6/23 18:19, Vineet Gupta wrote:

Do we have call tomorrow, given some folks are traveling for RV Summit ?

I'll be in the air, so "no" from me.


I'll be on the ground, but not sure that counts for much.  IIRC Kito 
isn't going to the summit, so he should be as awake as he ever is (IIRC 
it's around midnight there).


So sounds like you might need to hit the 10 minute rule yourself?


Re: [PATCH] RISC-V: fix TARGET_PROMOTE_FUNCTION_MODE hook for libcalls

2023-10-31 Thread Palmer Dabbelt

On Tue, 31 Oct 2023 16:18:35 PDT (-0700), jeffreya...@gmail.com wrote:



On 10/31/23 12:35, Vineet Gupta wrote:

riscv_promote_function_mode doesn't promote a SI to DI for libcalls
case.

The fix is what generic promote_mode () in explow.cc does. I really
don't understand why the old code didn't work, but stepping thru the
debugger shows old code didn't and fixed does.

This showed up when testing Ajit's REE ABI extension series which probes
the ABI (using a NULL tree type) and ends up hitting the libcall code path.

[Usual caveat, I'll wait for Pre-commit CI to run the tests and report]

gcc/ChangeLog:
* config/riscv/riscv.cc (riscv_promote_function_mode): Fix mode
  returned for libcall case.

Hmm.  There may be dragons in here.  I'll need to find and review an old
conversation in this space (libcalls and argument promotions).


We also have a non-orthogonality in the ABI sign extension rules between 
SI and DI, a few of us were talking about it on the internal slack 
(though the specifics were for a different patch, Vineet has a few in 
flight).
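
To make the SI-to-DI promotion concrete, here is a small sketch.  It assumes
the RV64 calling-convention rule that 32-bit scalars are held in registers
sign-extended to 64 bits, unsigned ones included; the function name is
hypothetical and this is not the code in riscv_promote_function_mode.

```c
#include <stdint.h>

/* Return the 64-bit register image of a 32-bit value under the
   assumed RV64 rule: bits 63..32 are copies of bit 31, regardless
   of whether the C type is signed or unsigned.  */
uint64_t rv64_reg_image (uint32_t v)
{
  /* Portable sign extension: flip bit 31, widen, then subtract 2^31.  */
  int64_t wide = (int64_t) (v ^ UINT32_C (0x80000000)) - INT64_C (0x80000000);
  return (uint64_t) wide;
}
```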


Re: [PATCH] RISC-V/testsuite: Fix ILP32 RVV failures from missing

2023-09-27 Thread Palmer Dabbelt

On Wed, 27 Sep 2023 10:28:55 PDT (-0700), jeffreya...@gmail.com wrote:



On 9/25/23 15:17, Maciej W. Rozycki wrote:

On Mon, 25 Sep 2023, Maciej W. Rozycki wrote:


  NB the use of this specific  header, still in place elsewhere,
seems gratuitous to me.  We don't need or indeed want to print anything in
the test cases (unless verifying something specific to the print facility)
and if we want to avoid minor code duplication (i.e. not to have explicit:

   if (...)
 __builtin_abort ();

replicated across test cases), we can easily implement this via a local
header, there's no need to pull in a complex system facility.
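
A minimal sketch of such a local header's contents, assuming a
GCC-compatible compiler for __builtin_abort.  The CHECK macro and the two
functions are hypothetical names for this example, not existing testsuite
helpers.

```c
/* Hypothetical local test helper: abort on failure without pulling
   in any system header, and unaffected by -DNDEBUG.  */
#define CHECK(cond)             \
  do                            \
    {                           \
      if (!(cond))              \
        __builtin_abort ();     \
    }                           \
  while (0)

/* Something small to verify.  */
int sum_to (int n)
{
  int i, s = 0;
  for (i = 1; i <= n; i++)
    s += i;
  return s;
}

/* Returns 0 when every check passes; aborts otherwise.  */
int run_checks (void)
{
  CHECK (sum_to (0) == 0);
  CHECK (sum_to (4) == 10);
  return 0;
}
```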


  Overall we ought not to require any system headers in compile tests and
then link and run tests need a functional target environment anyway.  So
maybe the use of  in run tests isn't as bad after all if not for
the -DNDEBUG peculiarity.  However I still think the less we depend in
verification on external components the better, that's one variable to
exclude.

Certainly we don't want extraneous #includes.   We can often avoid them
with a few judicious prototypes, like for abort ().

But we also need to get to the point where we can run tests which have
#include directives that reference system headers.  Many tests in the
various GCC testsuites have those directives and we don't want to be
continually trying to eradicate #includes from those tests.

The standard way to deal with this is single tree builds which are
deprecated or to have an install tree with the suitable multilib headers
and libraries.  The latter seems like the only viable solution to me.


IMO this is one of those places where we should just be as normal as 
possible.  So if the other big ports allow system headers then we 
should; otherwise we should move everyone over to testing in some way 
that catches these before commit.




jeff


Re: On a Plane During Tomorrow's RISC-V GCC Patchwork Meeting

2023-09-25 Thread Palmer Dabbelt

On Mon, 18 Sep 2023 15:13:04 PDT (-0700), Vineet Gupta wrote:

On 9/18/23 09:11, Jeff Law wrote:



On 9/18/23 09:24, Kito Cheng wrote:

I may miss that one too; I'm not on a plane yet, but I need to go to bed
early because my flight is early the next morning...

I'm unavailable as well, though I don't get on a plane until Wednesday
evening.


This is one meeting I really look forward to :-)
I'll be on a plane Wednesday evening as  well - see you all soon.


Looks like I'll also be traveling for this week's meeting, so I'll have 
to skip again.




-Vineet


Re: [Committed] RISC-V: Support VLS unary floating-point patterns

2023-09-21 Thread Palmer Dabbelt

On Thu, 21 Sep 2023 04:24:48 PDT (-0700), kito.ch...@sifive.com wrote:

GCC has built-in functions[1] for those math functions, e.g.
__builtin_ceilf, so we don't really need math.h :)

[1] https://gcc.gnu.org/onlinedocs/gcc/Other-Builtins.html
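
A small sketch of a header-free test body along those lines.  It uses
__builtin_fabsf rather than __builtin_ceilf purely as an assumption of
convenience: fabsf normally expands inline, so the example links without
libm, while __builtin_ceilf is written the same way in a compile-only test.
The function name abs_all is made up.

```c
/* No <math.h>: the builtin is known to the compiler itself.
   __builtin_fabsf stands in for __builtin_ceilf only so that the
   sketch does not depend on libm at link time.  */
void abs_all (float *dst, const float *src, int n)
{
  int i;
  for (i = 0; i < n; i++)
    dst[i] = __builtin_fabsf (src[i]);
}
```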


That's probably the right way to go for the test suite.  Something's 
still wrong somewhere with Patrick's builds, though...




On Thu, Sep 21, 2023 at 11:20 AM Palmer Dabbelt  wrote:


On Wed, 20 Sep 2023 10:47:23 PDT (-0700), Patrick O'Neill wrote:
> Juzhe,
>
> On a more general note, are we expecting #include  to cause a
> testcase to fail?
>
> My motivation is to make the testsuite less noisy when checking for
> regressions. For example, a patch like this one:
> 
https://patchwork.sourceware.org/project/gcc/patch/20230920023059.1728132-1-pan2...@intel.com/
> is showing 4 new failures on rv32gcv from the {dg-do compile} testcases
> that #include . I might be wrong, but those don't look like real
> failures to me [1][2][3].
>
> On glibc rv64gcv I'm seeing tests like:
> gcc.target/riscv/rvv/autovec/unop/vnot-rv32gcv.c
> fail with similar missing stubs-ilp32d.h errors.
>
> I want to sanity-check with other people that they are seeing similar
> errors and that these errors indicate something wrong with the testsuite.
> If nobody else is seeing these errors, I'd like to hear how you're
> running the testsuite so I can debug the riscv-gnu-toolchain repo.
>
> Patrick
>
> [1]:
> Executing on host:
> 
/github/ewlu-runner-2/_work/riscv-gnu-toolchain/riscv-gnu-toolchain/build/build-gcc-linux-stage2/gcc/xgcc
> 
-B/github/ewlu-runner-2/_work/riscv-gnu-toolchain/riscv-gnu-toolchain/build/build-gcc-linux-stage2/gcc/
> 
/github/ewlu-runner-2/_work/riscv-gnu-toolchain/riscv-gnu-toolchain/gcc/gcc/testsuite/gcc.target/riscv/rvv/autovec/math-ceil-1.c
> -march=rv32gcv -mabi=ilp32d -mcmodel=medlow -fdiagnostics-plain-output
> -O3 -ftree-vectorize -march=rv64gcv -mabi=lp64d -O3 -ftree-vectorize
> -fno-vect-cost-model -ffast-math -fno-schedule-insns
> -fno-schedule-insns2 -S   -o math-ceil-1.s (timeout = 600)
> spawn -ignore SIGHUP
> 
/github/ewlu-runner-2/_work/riscv-gnu-toolchain/riscv-gnu-toolchain/build/build-gcc-linux-stage2/gcc/xgcc
> 
-B/github/ewlu-runner-2/_work/riscv-gnu-toolchain/riscv-gnu-toolchain/build/build-gcc-linux-stage2/gcc/
> 
/github/ewlu-runner-2/_work/riscv-gnu-toolchain/riscv-gnu-toolchain/gcc/gcc/testsuite/gcc.target/riscv/rvv/autovec/math-ceil-1.c
> -march=rv32gcv -mabi=ilp32d -mcmodel=medlow -fdiagnostics-plain-output
> -O3 -ftree-vectorize -march=rv64gcv -mabi=lp64d -O3 -ftree-vectorize
> -fno-vect-cost-model -ffast-math -fno-schedule-insns
> -fno-schedule-insns2 -S -o math-ceil-1.s
> In file included from
> 
/github/ewlu-runner-2/_work/riscv-gnu-toolchain/riscv-gnu-toolchain/build/sysroot/usr/include/features.h:515,
>   from
> 
/github/ewlu-runner-2/_work/riscv-gnu-toolchain/riscv-gnu-toolchain/build/sysroot/usr/include/bits/libc-header-start.h:33,
>   from
> 
/github/ewlu-runner-2/_work/riscv-gnu-toolchain/riscv-gnu-toolchain/build/sysroot/usr/include/math.h:27,
>   from
> 
/github/ewlu-runner-2/_work/riscv-gnu-toolchain/riscv-gnu-toolchain/gcc/gcc/testsuite/gcc.target/riscv/rvv/autovec/test-math.h:1,
>   from
> 
/github/ewlu-runner-2/_work/riscv-gnu-toolchain/riscv-gnu-toolchain/gcc/gcc/testsuite/gcc.target/riscv/rvv/autovec/math-ceil-1.c:5:
> 
/github/ewlu-runner-2/_work/riscv-gnu-toolchain/riscv-gnu-toolchain/build/sysroot/usr/include/gnu/stubs.h:17:11:
> fatal error: gnu/stubs-lp64d.h: No such file or directory

That looks like a toolchain build/configuration issue, not a test issue.
IIRC this comes up from time to time, something's probably broken in
riscv-gnu-toolchain but I'm not sure what's wrong.

I get a working setup with just `./configure --enable-linux
--disable-multilib` and the latest riscv-gnu-toolchain master.  How are
you building things?

> compilation terminated.
> compiler exited with status 1
> FAIL: gcc.target/riscv/rvv/autovec/math-ceil-1.c -O3 -ftree-vectorize
> (test for excess errors)
>
> [2]:
> https://github.com/ewlu/riscv-gnu-toolchain/issues/170
>
> [3]:
> This also extends beyond math.h. I'm seeing similar failures for
> testcases like
> gcc.target/riscv/rvv/autovec/cond/cond_convert_int2float-rv64-1.c that
> #include .
>
>
> On 9/19/23 18:12, Patrick O'Neill wrote:
>>
>> I'll let it run overnight and see if this helps. Even before this patch,
>> I was seeing 233 stubs related failures for rv32gcv and 7 for rv64gcv so
>> this won't fix all the issues.
>>
>> It's easily replicated using upstream riscv-gnu-toolchain
>> git clone https://github.com/riscv-collab/riscv-gnu-toolchain
>> cd riscv-gnu-toolchain

Re: [Committed] RISC-V: Support VLS unary floating-point patterns

2023-09-21 Thread Palmer Dabbelt

On Wed, 20 Sep 2023 10:47:23 PDT (-0700), Patrick O'Neill wrote:

Juzhe,

On a more general note, are we expecting #include  to cause a
testcase to fail?

My motivation is to make the testsuite less noisy when checking for
regressions. For example, a patch like this one:
https://patchwork.sourceware.org/project/gcc/patch/20230920023059.1728132-1-pan2...@intel.com/
is showing 4 new failures on rv32gcv from the {dg-do compile} testcases
that #include . I might be wrong, but those don't look like real
failures to me [1][2][3].

On glibc rv64gcv I'm seeing tests like:
gcc.target/riscv/rvv/autovec/unop/vnot-rv32gcv.c
fail with similar missing stubs-ilp32d.h errors.

I want to sanity-check with other people that they are seeing similar
errors and that these errors indicate something wrong with the testsuite.
If nobody else is seeing these errors, I'd like to hear how you're
running the testsuite so I can debug the riscv-gnu-toolchain repo.

Patrick

[1]:
Executing on host:
/github/ewlu-runner-2/_work/riscv-gnu-toolchain/riscv-gnu-toolchain/build/build-gcc-linux-stage2/gcc/xgcc
-B/github/ewlu-runner-2/_work/riscv-gnu-toolchain/riscv-gnu-toolchain/build/build-gcc-linux-stage2/gcc/
/github/ewlu-runner-2/_work/riscv-gnu-toolchain/riscv-gnu-toolchain/gcc/gcc/testsuite/gcc.target/riscv/rvv/autovec/math-ceil-1.c
-march=rv32gcv -mabi=ilp32d -mcmodel=medlow -fdiagnostics-plain-output 
-O3 -ftree-vectorize -march=rv64gcv -mabi=lp64d -O3 -ftree-vectorize
-fno-vect-cost-model -ffast-math -fno-schedule-insns
-fno-schedule-insns2 -S   -o math-ceil-1.s (timeout = 600)
spawn -ignore SIGHUP
/github/ewlu-runner-2/_work/riscv-gnu-toolchain/riscv-gnu-toolchain/build/build-gcc-linux-stage2/gcc/xgcc
-B/github/ewlu-runner-2/_work/riscv-gnu-toolchain/riscv-gnu-toolchain/build/build-gcc-linux-stage2/gcc/
/github/ewlu-runner-2/_work/riscv-gnu-toolchain/riscv-gnu-toolchain/gcc/gcc/testsuite/gcc.target/riscv/rvv/autovec/math-ceil-1.c
-march=rv32gcv -mabi=ilp32d -mcmodel=medlow -fdiagnostics-plain-output
-O3 -ftree-vectorize -march=rv64gcv -mabi=lp64d -O3 -ftree-vectorize
-fno-vect-cost-model -ffast-math -fno-schedule-insns
-fno-schedule-insns2 -S -o math-ceil-1.s
In file included from
/github/ewlu-runner-2/_work/riscv-gnu-toolchain/riscv-gnu-toolchain/build/sysroot/usr/include/features.h:515,
  from
/github/ewlu-runner-2/_work/riscv-gnu-toolchain/riscv-gnu-toolchain/build/sysroot/usr/include/bits/libc-header-start.h:33,
  from
/github/ewlu-runner-2/_work/riscv-gnu-toolchain/riscv-gnu-toolchain/build/sysroot/usr/include/math.h:27,
  from
/github/ewlu-runner-2/_work/riscv-gnu-toolchain/riscv-gnu-toolchain/gcc/gcc/testsuite/gcc.target/riscv/rvv/autovec/test-math.h:1,
  from
/github/ewlu-runner-2/_work/riscv-gnu-toolchain/riscv-gnu-toolchain/gcc/gcc/testsuite/gcc.target/riscv/rvv/autovec/math-ceil-1.c:5:
/github/ewlu-runner-2/_work/riscv-gnu-toolchain/riscv-gnu-toolchain/build/sysroot/usr/include/gnu/stubs.h:17:11:
fatal error: gnu/stubs-lp64d.h: No such file or directory


That looks like a toolchain build/configuration issue, not a test issue.  
IIRC this comes up from time to time, something's probably broken in 
riscv-gnu-toolchain but I'm not sure what's wrong.


I get a working setup with just `./configure --enable-linux 
--disable-multilib` and the latest riscv-gnu-toolchain master.  How are 
you building things?



compilation terminated.
compiler exited with status 1
FAIL: gcc.target/riscv/rvv/autovec/math-ceil-1.c -O3 -ftree-vectorize
(test for excess errors)

[2]:
https://github.com/ewlu/riscv-gnu-toolchain/issues/170

[3]:
This also extends beyond math.h. I'm seeing similar failures for
testcases like
gcc.target/riscv/rvv/autovec/cond/cond_convert_int2float-rv64-1.c that
#include .


On 9/19/23 18:12, Patrick O'Neill wrote:


I'll let it run overnight and see if this helps. Even before this patch,
I was seeing 233 stubs related failures for rv32gcv and 7 for rv64gcv so
this won't fix all the issues.

It's easily replicated using upstream riscv-gnu-toolchain
git clone https://github.com/riscv-collab/riscv-gnu-toolchain
cd riscv-gnu-toolchain
git submodule update --init gcc
cd gcc
git pull master
cd ..
mkdir build
cd build
../configure --prefix=$(pwd) --with-arch=rv32gcv --with-abi=ilp32d
make report-linux -j32

Then search for "stubs" in the debug logs
(/build-gcc-linux-stage2/gcc/testsuite/*.log)

Patrick

On 9/19/23 17:54, juzhe.zh...@rivai.ai wrote:

I think we could remove math.h.

Hi, @Patrick. Could you verify it?

diff --git a/gcc/testsuite/gcc.target/riscv/rvv/autovec/vls/def.h
b/gcc/testsuite/gcc.target/riscv/rvv/autovec/vls/def.h
index 2292372d7a3..674098e9ba6 100644
--- a/gcc/testsuite/gcc.target/riscv/rvv/autovec/vls/def.h
+++ b/gcc/testsuite/gcc.target/riscv/rvv/autovec/vls/def.h
@@ -1,5 +1,4 @@
 #include 
-#include 

and commit it.

Thanks.



Re: [Committed V4] internal-fn: Support undefined rtx for uninitialized SSA_NAME[PR110751]

2023-09-20 Thread Palmer Dabbelt

On Wed, 20 Sep 2023 07:58:49 PDT (-0700), juzhe.zh...@rivai.ai wrote:

According to PR: https://gcc.gnu.org/bugzilla/show_bug.cgi?id=110751

As Richard and Richi suggested, we recognize uninitialized SSA_NAME and convert it
into SCRATCH rtx if the target predicate allows SCRATCH.

It can help to reduce redundant data move instructions of targets like RISC-V.

Bootstrap and Regression on x86 passed.

gcc/ChangeLog:

* internal-fn.cc (expand_fn_using_insn): Support undefined rtx value.
* optabs.cc (maybe_legitimize_operand): Ditto.
(can_reuse_operands_p): Ditto.
* optabs.h (enum expand_operand_type): Ditto.
(create_undefined_input_operand): Ditto.


It's somewhat common to put the PR at the top of the ChangeLog (though I 
pretty frequently forget as well).




---
 gcc/internal-fn.cc |  4 
 gcc/optabs.cc  | 13 -
 gcc/optabs.h   | 13 -
 3 files changed, 28 insertions(+), 2 deletions(-)

diff --git a/gcc/internal-fn.cc b/gcc/internal-fn.cc
index 0fd34359247..61d5a9e4772 100644
--- a/gcc/internal-fn.cc
+++ b/gcc/internal-fn.cc
@@ -247,6 +247,10 @@ expand_fn_using_insn (gcall *stmt, insn_code icode, unsigned int noutputs,
create_convert_operand_from ([opno], rhs_rtx,
 TYPE_MODE (rhs_type),
 TYPE_UNSIGNED (rhs_type));
+  else if (TREE_CODE (rhs) == SSA_NAME
+  && SSA_NAME_IS_DEFAULT_DEF (rhs)
+  && VAR_P (SSA_NAME_VAR (rhs)))
+   create_undefined_input_operand ([opno], TYPE_MODE (rhs_type));
   else
create_input_operand ([opno], rhs_rtx, TYPE_MODE (rhs_type));
   opno += 1;
diff --git a/gcc/optabs.cc b/gcc/optabs.cc
index 32ff379ffc3..8b96f23aec0 100644
--- a/gcc/optabs.cc
+++ b/gcc/optabs.cc
@@ -8102,6 +8102,16 @@ maybe_legitimize_operand (enum insn_code icode, unsigned int opno,
  goto input;
}
   break;
+
+case EXPAND_UNDEFINED_INPUT:
+  /* See if the predicate accepts a SCRATCH rtx, which in this context
+indicates an undefined value.  Use an uninitialized register if not. */
+  if (!insn_operand_matches (icode, opno, op->value))
+   {
+ op->value = gen_reg_rtx (op->mode);
+ goto input;
+   }
+  return true;
 }
   return insn_operand_matches (icode, opno, op->value);
 }
@@ -8140,7 +8150,8 @@ can_reuse_operands_p (enum insn_code icode,
   switch (op1->type)
 {
 case EXPAND_OUTPUT:
-  /* Outputs must remain distinct.  */
+case EXPAND_UNDEFINED_INPUT:
+  /* Outputs and undefined inputs must remain distinct.  */
   return false;

 case EXPAND_FIXED:
diff --git a/gcc/optabs.h b/gcc/optabs.h
index c80b7f4dc1b..9b78d40a46c 100644
--- a/gcc/optabs.h
+++ b/gcc/optabs.h
@@ -37,7 +37,8 @@ enum expand_operand_type {
   EXPAND_CONVERT_TO,
   EXPAND_CONVERT_FROM,
   EXPAND_ADDRESS,
-  EXPAND_INTEGER
+  EXPAND_INTEGER,
+  EXPAND_UNDEFINED_INPUT
 };

 /* Information about an operand for instruction expansion.  */
@@ -117,6 +118,16 @@ create_input_operand (class expand_operand *op, rtx value,
   create_expand_operand (op, EXPAND_INPUT, value, mode, false);
 }

+/* Make OP describe an undefined input operand of mode MODE.  MODE cannot
+   be null.  */
+
+inline void
+create_undefined_input_operand (class expand_operand *op, machine_mode mode)
+{
+  create_expand_operand (op, EXPAND_UNDEFINED_INPUT, gen_rtx_SCRATCH (mode),
+mode, false);
+}
+
 /* Like create_input_operand, except that VALUE must first be converted
to mode MODE.  UNSIGNED_P says whether VALUE is unsigned.  */
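
The decision made in the EXPAND_UNDEFINED_INPUT case can be modelled in a
few lines.  This is a toy sketch under stated assumptions: the types, the
counter and the pred_accepts_scratch flag are all hypothetical stand-ins,
not GCC's rtx, gen_reg_rtx or insn_operand_matches.

```c
#include <stdbool.h>

enum rtx_kind { KIND_REG, KIND_SCRATCH };

struct operand
{
  enum rtx_kind kind;
  int regno;            /* meaningful only for KIND_REG */
};

static int next_pseudo = 100;   /* hypothetical pseudo-register counter */

/* Toy version of the undefined-input case: keep the SCRATCH if the
   instruction's predicate accepts one, otherwise fall back to a
   fresh register that is never initialized.  */
void legitimize_undefined_input (struct operand *op, bool pred_accepts_scratch)
{
  if (op->kind == KIND_SCRATCH && !pred_accepts_scratch)
    {
      op->kind = KIND_REG;
      op->regno = next_pseudo++;
    }
}
```

Either way no value is ever moved into the operand, which is where the
saving in data-move instructions comes from.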


On a Plane During Tomorrow's RISC-V GCC Patchwork Meeting

2023-09-18 Thread Palmer Dabbelt
My flight to the Cauldron lands in the middle of the meeting, so I'm 
going to miss it.  In theory it's all set up such that anyone can 
join/run the meeting.


Re: [PATCH] [RISC-V] fix PR 111259 invalid zcmp mov predicate.

2023-09-15 Thread Palmer Dabbelt

On Fri, 15 Sep 2023 09:37:48 PDT (-0700), Patrick O'Neill wrote:

On 9/15/23 01:49, Kito Cheng via Gcc-patches wrote:


I guess another solution is using reg_or_subregno instead of REGNO, but
that should not catch any more cases and just adds more run-time checking,
so this version is LGTM.

I tested an equivalent patch (without the comment changes).
This patch resolves the build errors on glibc rv64gc with
--enable-checking=rtl.
Tested for regressions (without --enable-checking=rtl) using rv64gc &
rv32gc glibc.

This patch does not cause any regressions on those targets.


Reviewed-by: Palmer Dabbelt
Acked-by: Palmer Dabbelt 

Thanks!



Patrick


Re: [PATCH 3/3] [V2] [RISC-V] support cm.mva01s cm.mvsa01 in zcmp

2023-09-07 Thread Palmer Dabbelt

On Thu, 07 Sep 2023 13:16:36 PDT (-0700), dimi...@dinux.eu wrote:

Hi,

This patch appears to have caused PR 111259.


Thanks.  Looks like we're not running our tests with RTL checking, 
Patrick is going to try and see if we've got compute time left for some 
builds -- even just having builds with checking would be a good one, we 
get bit by these bugs from time to time.


I'm spinning up a --enable-checking=yes build.  Maybe we just need 
something like


   diff --git a/gcc/config/riscv/predicates.md b/gcc/config/riscv/predicates.md
   index 53e7c1d03aa..aa4f02c67d5 100644
   --- a/gcc/config/riscv/predicates.md
   +++ b/gcc/config/riscv/predicates.md
   @@ -172,11 +172,11 @@ (define_predicate "stack_pop_up_to_s11_operand"

;; ZCMP predicates

(define_predicate "a0a1_reg_operand"
   -  (and (match_operand 0 "register_operand")
   +  (and (match_code "reg")
   (match_test "IN_RANGE (REGNO (op), A0_REGNUM, A1_REGNUM)")))

(define_predicate "zcmp_mv_sreg_operand"

   -  (and (match_operand 0 "register_operand")
   +  (and (match_code "reg")
   (match_test "TARGET_RVE ? IN_RANGE (REGNO (op), S0_REGNUM, S1_REGNUM)
: IN_RANGE (REGNO (op), S0_REGNUM, S1_REGNUM)
|| IN_RANGE (REGNO (op), S2_REGNUM, S7_REGNUM)")))
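
   Why the order of the checks matters, as a toy model: register_operand also
   accepts a SUBREG, and a checking build rejects REGNO on anything that is
   not a REG.  The types and function names below are hypothetical; only the
   A0/A1 register numbers are taken from riscv.md.

```c
#include <stdlib.h>

enum rtx_code { RTX_REG, RTX_SUBREG };

struct rtx
{
  enum rtx_code code;
  int regno;                 /* valid only when code == RTX_REG */
  const struct rtx *inner;   /* valid only when code == RTX_SUBREG */
};

#define A0_REGNUM 10
#define A1_REGNUM 11

/* Checked accessor, in the spirit of an RTL-checking build: reading
   the register number of something that is not a REG aborts.  */
int checked_regno (const struct rtx *x)
{
  if (x->code != RTX_REG)
    abort ();
  return x->regno;
}

/* Toy a0a1 predicate: test the code first, so the accessor is only
   reached for a real REG.  A SUBREG is simply rejected.  */
int a0a1_reg_p (const struct rtx *x)
{
  return x->code == RTX_REG
         && checked_regno (x) >= A0_REGNUM
         && checked_regno (x) <= A1_REGNUM;
}
```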


Regards,
Dimitar

On Tue, Aug 29, 2023 at 08:37:46AM +, Fei Gao wrote:

From: Die Li 

Signed-off-by: Die Li 
Co-Authored-By: Fei Gao 

gcc/ChangeLog:

* config/riscv/peephole.md: New pattern.
* config/riscv/predicates.md (a0a1_reg_operand): New predicate.
(zcmp_mv_sreg_operand): New predicate.
* config/riscv/riscv.md: New predicate.
* config/riscv/zc.md (*mva01s): New pattern.
(*mvsa01): New pattern.

gcc/testsuite/ChangeLog:

* gcc.target/riscv/cm_mv_rv32.c: New test.
---
 gcc/config/riscv/peephole.md| 28 +
 gcc/config/riscv/predicates.md  | 11 
 gcc/config/riscv/riscv.md   |  1 +
 gcc/config/riscv/zc.md  | 22 
 gcc/testsuite/gcc.target/riscv/cm_mv_rv32.c | 23 +
 5 files changed, 85 insertions(+)
 create mode 100644 gcc/testsuite/gcc.target/riscv/cm_mv_rv32.c

diff --git a/gcc/config/riscv/peephole.md b/gcc/config/riscv/peephole.md
index 0ef0c04410b..92e57f9a447 100644
--- a/gcc/config/riscv/peephole.md
+++ b/gcc/config/riscv/peephole.md
@@ -38,3 +38,31 @@
 {
   operands[5] = GEN_INT (INTVAL (operands[2]) - INTVAL (operands[5]));
 })
+
+;; ZCMP
+(define_peephole2
+  [(set (match_operand:X 0 "a0a1_reg_operand")
+(match_operand:X 1 "zcmp_mv_sreg_operand"))
+   (set (match_operand:X 2 "a0a1_reg_operand")
+(match_operand:X 3 "zcmp_mv_sreg_operand"))]
+  "TARGET_ZCMP
+   && (REGNO (operands[2]) != REGNO (operands[0]))"
+  [(parallel [(set (match_dup 0)
+   (match_dup 1))
+  (set (match_dup 2)
+   (match_dup 3))])]
+)
+
+(define_peephole2
+  [(set (match_operand:X 0 "zcmp_mv_sreg_operand")
+(match_operand:X 1 "a0a1_reg_operand"))
+   (set (match_operand:X 2 "zcmp_mv_sreg_operand")
+(match_operand:X 3 "a0a1_reg_operand"))]
+  "TARGET_ZCMP
+   && (REGNO (operands[0]) != REGNO (operands[2]))
+   && (REGNO (operands[1]) != REGNO (operands[3]))"
+  [(parallel [(set (match_dup 0)
+   (match_dup 1))
+  (set (match_dup 2)
+   (match_dup 3))])]
+)
diff --git a/gcc/config/riscv/predicates.md b/gcc/config/riscv/predicates.md
index 3ef09996a85..772f45df65c 100644
--- a/gcc/config/riscv/predicates.md
+++ b/gcc/config/riscv/predicates.md
@@ -165,6 +165,17 @@
   (and (match_code "const_int")
(match_test "riscv_zcmp_valid_stack_adj_bytes_p (INTVAL (op), 13)")))

+;; ZCMP predicates
+(define_predicate "a0a1_reg_operand"
+  (and (match_operand 0 "register_operand")
+   (match_test "IN_RANGE (REGNO (op), A0_REGNUM, A1_REGNUM)")))
+
+(define_predicate "zcmp_mv_sreg_operand"
+  (and (match_operand 0 "register_operand")
+   (match_test "TARGET_RVE ? IN_RANGE (REGNO (op), S0_REGNUM, S1_REGNUM)
+: IN_RANGE (REGNO (op), S0_REGNUM, S1_REGNUM)
+|| IN_RANGE (REGNO (op), S2_REGNUM, S7_REGNUM)")))
+
 ;; Only use branch-on-bit sequences when the mask is not an ANDI immediate.
 (define_predicate "branch_on_bit_operand"
   (and (match_code "const_int")
diff --git a/gcc/config/riscv/riscv.md b/gcc/config/riscv/riscv.md
index 8e09df6ff63..aa2b5b960dc 100644
--- a/gcc/config/riscv/riscv.md
+++ b/gcc/config/riscv/riscv.md
@@ -132,6 +132,7 @@
(S0_REGNUM  8)
(S1_REGNUM  9)
(A0_REGNUM  10)
+   (A1_REGNUM  11)
(S2_REGNUM  18)
(S3_REGNUM  19)
(S4_REGNUM  20)
diff --git a/gcc/config/riscv/zc.md b/gcc/config/riscv/zc.md
index 8d7de97daad..77b28adde95 100644
--- 

Re: [PATCH v2 1/2] riscv: Add support for strlen inline expansion

2023-09-06 Thread Palmer Dabbelt

On Wed, 06 Sep 2023 09:47:05 PDT (-0700), jeffreya...@gmail.com wrote:



On 9/6/23 10:22, Palmer Dabbelt wrote:

On Wed, 06 Sep 2023 09:07:33 PDT (-0700), christoph.muell...@vrull.eu
wrote:

From: Christoph Müllner 

This patch implements the expansion of the strlen builtin for RV32/RV64
for xlen-aligned strings if Zbb or XTheadBb instructions are
available.
The inserted sequences are:

rv32gc_zbb (RV64 is similar):
  add a3,a0,4
  li  a4,-1
.L1:  lw  a5,0(a0)
  add a0,a0,4
  orc.b   a5,a5
  beq a5,a4,.L1
  not a5,a5
  ctz a5,a5
  srl a5,a5,0x3
  add a0,a0,a5
  sub a0,a0,a3

rv64gc_xtheadbb (RV32 is similar):
  add   a4,a0,8
.L2:  ld    a5,0(a0)
  add   a0,a0,8
  th.tstnbz a5,a5
  beqz  a5,.L2
  th.rev    a5,a5
  th.ff1    a5,a5
  srl   a5,a5,0x3
  add   a0,a0,a5
  sub   a0,a0,a4

This allows to inline calls to strlen(), with optimized code for
xlen-aligned strings, resulting in the following benefits over
a call to libc:
* no call/ret instructions
* no stack frame allocation
* no register saving/restoring
* no alignment test

The inlining mechanism is gated by a new switch ('-minline-strlen')
and by the variable 'optimize_size'.


Maybe this is more of a Jeff question, but this looks to me like
something that should be target-agnostic -- maybe we need some backend
work to actually emit the special instruction, but IIRC this is a
somewhat common flavor of instruction and is in other ISAs as well.  It
looks like there's already a strlen insn, so I guess the core issue is
why we need that unspec?

Sorry if I'm just missing something, though...


The generic strlen expansion in GCC doesn't really expand a strlen loop.
  It really just calls into the target code and forces the target to
handle everything.


OK, that explains it.


We could have generic strlen expansion code that kicks in if the target
expander fails.  And we could probably create the necessary opcodes to
express the optimized end-of-string comparison instructions that exist
on various architectures.  I'm not sure it's worth that much effort
given targets are already doing their own strlen expansions.



If everyone does it this way then I don't think we need to worry about 
it.




jeff


Re: [PATCH v2 1/2] riscv: Add support for strlen inline expansion

2023-09-06 Thread Palmer Dabbelt

On Wed, 06 Sep 2023 09:07:33 PDT (-0700), christoph.muell...@vrull.eu wrote:

From: Christoph Müllner 

This patch implements the expansion of the strlen builtin for RV32/RV64
for xlen-aligned strings if Zbb or XTheadBb instructions are available.
The inserted sequences are:

rv32gc_zbb (RV64 is similar):
  add a3,a0,4
  li  a4,-1
.L1:  lw  a5,0(a0)
  add a0,a0,4
  orc.b   a5,a5
  beq a5,a4,.L1
  not a5,a5
  ctz a5,a5
  srl a5,a5,0x3
  add a0,a0,a5
  sub a0,a0,a3

rv64gc_xtheadbb (RV32 is similar):
  add   a4,a0,8
.L2:  ld    a5,0(a0)
  add   a0,a0,8
  th.tstnbz a5,a5
  beqz  a5,.L2
  th.rev    a5,a5
  th.ff1    a5,a5
  srl   a5,a5,0x3
  add   a0,a0,a5
  sub   a0,a0,a4

This allows to inline calls to strlen(), with optimized code for
xlen-aligned strings, resulting in the following benefits over
a call to libc:
* no call/ret instructions
* no stack frame allocation
* no register saving/restoring
* no alignment test

The inlining mechanism is gated by a new switch ('-minline-strlen')
and by the variable 'optimize_size'.
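
The rv32 Zbb sequence above corresponds roughly to the following C.  This is
a sketch under assumptions, not the expander's output: orc_b is a software
stand-in for the orc.b instruction, the function names are made up, the host
is taken to be little-endian, and __builtin_ctz needs a GCC-compatible
compiler.

```c
#include <stddef.h>
#include <stdint.h>
#include <string.h>

/* Software stand-in for orc.b: every nonzero byte becomes 0xff,
   every zero byte stays 0x00.  */
static uint32_t orc_b (uint32_t w)
{
  uint32_t r = 0;
  int i;
  for (i = 0; i < 4; i++)
    if ((w >> (8 * i)) & 0xff)
      r |= (uint32_t) 0xff << (8 * i);
  return r;
}

/* strlen for a 4-byte-aligned string, one word per iteration.
   Reads up to the end of the aligned word holding the terminator.
   Assumes a little-endian host.  */
size_t strlen_aligned32 (const char *s)
{
  const char *p = s;
  uint32_t w, m;

  for (;;)
    {
      memcpy (&w, p, sizeof w);   /* lw   a5,0(a0) */
      p += 4;                     /* add  a0,a0,4  */
      m = orc_b (w);              /* orc.b a5,a5   */
      if (m != UINT32_MAX)        /* beq  a5,a4,.L1 loops while all ones */
        break;
    }

  m = ~m;                         /* zero bytes are now 0xff */
  /* ctz gives the bit index of the first NUL byte; >> 3 turns it into
     a byte index within the last word read.  */
  return (size_t) (p - s) - 4 + (size_t) (__builtin_ctz (m) >> 3);
}
```

Because whole words are read, the caller has to guarantee that the aligned
word containing the terminator is readable, which is what the alignment
requirement in the patch description provides.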


Maybe this is more of a Jeff question, but this looks to me like 
something that should be target-agnostic -- maybe we need some backend 
work to actually emit the special instruction, but IIRC this is a 
somewhat common flavor of instruction and is in other ISAs as well.  It 
looks like there's already a strlen insn, so I guess the core issue is 
why we need that unspec?


Sorry if I'm just missing something, though...


Tested using the glibc string tests.

Signed-off-by: Christoph Müllner 

gcc/ChangeLog:

* config.gcc: Add new object riscv-string.o.
riscv-string.cc.
* config/riscv/riscv-protos.h (riscv_expand_strlen):
New function.
* config/riscv/riscv.md (strlen): New expand INSN.
* config/riscv/riscv.opt: New flag 'minline-strlen'.
* config/riscv/t-riscv: Add new object riscv-string.o.
* config/riscv/thead.md (th_rev2): Export INSN name.
(th_rev2): Likewise.
(th_tstnbz2): New INSN.
* doc/invoke.texi: Document '-minline-strlen'.
* emit-rtl.cc (emit_likely_jump_insn): New helper function.
(emit_unlikely_jump_insn): Likewise.
* rtl.h (emit_likely_jump_insn): New prototype.
(emit_unlikely_jump_insn): Likewise.
* config/riscv/riscv-string.cc: New file.

gcc/testsuite/ChangeLog:

* gcc.target/riscv/xtheadbb-strlen-unaligned.c: New test.
* gcc.target/riscv/xtheadbb-strlen.c: New test.
* gcc.target/riscv/zbb-strlen-disabled-2.c: New test.
* gcc.target/riscv/zbb-strlen-disabled.c: New test.
* gcc.target/riscv/zbb-strlen-unaligned.c: New test.
* gcc.target/riscv/zbb-strlen.c: New test.
---
 gcc/config.gcc|   3 +-
 gcc/config/riscv/riscv-protos.h   |   3 +
 gcc/config/riscv/riscv-string.cc  | 183 ++
 gcc/config/riscv/riscv.md |  28 +++
 gcc/config/riscv/riscv.opt|   4 +
 gcc/config/riscv/t-riscv  |   6 +
 gcc/config/riscv/thead.md |   9 +-
 gcc/doc/invoke.texi   |  11 +-
 gcc/emit-rtl.cc   |  24 +++
 gcc/rtl.h |   2 +
 .../riscv/xtheadbb-strlen-unaligned.c |  14 ++
 .../gcc.target/riscv/xtheadbb-strlen.c|  19 ++
 .../gcc.target/riscv/zbb-strlen-disabled-2.c  |  15 ++
 .../gcc.target/riscv/zbb-strlen-disabled.c|  15 ++
 .../gcc.target/riscv/zbb-strlen-unaligned.c   |  14 ++
 gcc/testsuite/gcc.target/riscv/zbb-strlen.c   |  19 ++
 16 files changed, 366 insertions(+), 3 deletions(-)
 create mode 100644 gcc/config/riscv/riscv-string.cc
 create mode 100644 gcc/testsuite/gcc.target/riscv/xtheadbb-strlen-unaligned.c
 create mode 100644 gcc/testsuite/gcc.target/riscv/xtheadbb-strlen.c
 create mode 100644 gcc/testsuite/gcc.target/riscv/zbb-strlen-disabled-2.c
 create mode 100644 gcc/testsuite/gcc.target/riscv/zbb-strlen-disabled.c
 create mode 100644 gcc/testsuite/gcc.target/riscv/zbb-strlen-unaligned.c
 create mode 100644 gcc/testsuite/gcc.target/riscv/zbb-strlen.c

diff --git a/gcc/config.gcc b/gcc/config.gcc
index b2fe7c7ceef..aff6b6a5601 100644
--- a/gcc/config.gcc
+++ b/gcc/config.gcc
@@ -530,7 +530,8 @@ pru-*-*)
;;
 riscv*)
cpu_type=riscv
-   extra_objs="riscv-builtins.o riscv-c.o riscv-sr.o riscv-shorten-memrefs.o riscv-selftests.o riscv-v.o riscv-vsetvl.o riscv-vector-costs.o"
+   extra_objs="riscv-builtins.o riscv-c.o riscv-sr.o riscv-shorten-memrefs.o riscv-selftests.o riscv-string.o"
+   extra_objs="${extra_objs} riscv-v.o riscv-vsetvl.o riscv-vector-costs.o"
extra_objs="${extra_objs} riscv-vector-builtins.o riscv-vector-builtins-shapes.o riscv-vector-builtins-bases.o"
   

Re: [PATCH v3 1/1] RISC-V: Add support for 'XVentanaCondOps' reusing 'Zicond' support

2023-09-05 Thread Palmer Dabbelt

On Tue, 05 Sep 2023 20:07:16 PDT (-0700), gcc-patches@gcc.gnu.org wrote:



On 9/5/23 20:33, Tsukasa OI wrote:


Internally we have this as:


(TARGET_ZICOND || TARGET_XVENTANACONDOPS)

I don't really care, so I'm happy to go with yours.



Because XVentanaCondOps instructions are only available on 64-bit target
(I wanted to prevent misuses because we don't reject XVentanaCondOps on
RV32), the target expression would be:

(a) (TARGET_ZICOND || (TARGET_XVENTANACONDOPS && TARGET_64BIT))

and I had three options to deal with it.

1.  Use the plain expression (a)
2.  Name the expression (a)
3.  Enable TARGET_XVENTANACONDOPS only on 64-bit target

I think option 2 is the simplest yet understandable.
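
Option 2 could look roughly like the sketch below.  The macro name
TARGET_ZICOND_LIKE and the lowercase bool stand-ins for the target flags
are assumptions made for illustration; in GCC the TARGET_* names are
macros derived from the option machinery, not plain variables.

```c
#include <assert.h>
#include <stdbool.h>

/* Stand-ins for the real target flags (hypothetical; in GCC these are
   macros generated from the option definitions).  */
static bool target_zicond;
static bool target_xventanacondops;
static bool target_64bit;

/* Option 2: give expression (a) a name so that every pattern condition
   can share it.  The name is an assumption for this sketch.  */
#define TARGET_ZICOND_LIKE \
  (target_zicond || (target_xventanacondops && target_64bit))

/* Helper for experimenting: set the flags, then evaluate the condition.  */
static bool
zicond_like_p (bool zicond, bool xvcondops, bool is_64bit)
{
  target_zicond = zicond;
  target_xventanacondops = xvcondops;
  target_64bit = is_64bit;
  return TARGET_ZICOND_LIKE;
}
```

With the condition named once, XVentanaCondOps on RV32 is rejected in a
single place instead of in every pattern that wants the conditional-zero
instructions.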

Sure.  It may also give us the option to roll in some of the thead code
at some point.  Their conditional move support seems to line up pretty
well with zicond/xventanacondops too, though I haven't looked at it very
deeply yet.


IIUC the T-Head stuff is actually a conditional move, so it's a little 
different than the conditional move/zero extensions (which IIUC have 
exactly the same semantics, just different encodings).   Hopefully the 
cmov fits in a bit easier, we shouldn't need to juggle the extra 0 
input.



I'm happy to hear that because I wasn't confident about whether we can use
#include to share common parts.  I haven't tried yet but I believe we
have to #include only common parts (not including dg instructions
containing -march=..._zicond) so I will likely be required to modify zicond
tests as well.

You actually don't even have to break out the common parts.  The dg-
directives in an included file aren't parsed by the dg framework.




I'll submit PATCH v4 (not committing directly) as changes will be a bit
larger (and Jeff's words seem "near approval" even after fixing the
tests, not complete approval).

Sounds perfect.  Given the bulk of the review work is already done, the
final review ack will be easy.

jeff


Re: [PATCH] RISC-V: zicond: remove bogus opt2 pattern

2023-09-01 Thread Palmer Dabbelt

On Thu, 31 Aug 2023 10:57:52 PDT (-0700), Vineet Gupta wrote:



On 8/31/23 06:51, Jeff Law wrote:



On 8/30/23 15:57, Vineet Gupta wrote:

This was tripping up gcc.c-torture/execute/pr60003.c at -O1 since the
pattern semantics can't be expressed by zicond instructions.

This involves test code snippet:

   if (a == 0)
     return 0;
   else
     return x;
 }

which is equivalent to:  "x = (a != 0) ? x : a"

Isn't it

x = (a == 0) ? 0 : x

Which seems like it ought to fit zicond just fine.


Logically they are equivalent, but 



If we take yours;

x = (a != 0) ? x : a

And simplify with the known value of a on the false arm we get:

x = (a != 0 ) ? x : 0;

Which is equivalent to

x = (a == 0) ? 0 : x;

So ISTM this does fit zicond just fine.
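
The chain of rewrites above can be checked mechanically.  The following
is only a sketch: the function names are made up for illustration, and
the small input grid is an arbitrary choice rather than a proof.

```c
#include <assert.h>
#include <stdbool.h>

/* The form from the patch description.  */
static long form_ne_a (long a, long x) { return (a != 0) ? x : a; }

/* The same with the known value of a (zero) on the false arm.  */
static long form_ne_0 (long a, long x) { return (a != 0) ? x : 0; }

/* The arms swapped and the condition inverted.  */
static long form_eq_0 (long a, long x) { return (a == 0) ? 0 : x; }

/* Return true if all three forms agree over a small grid of inputs.  */
static bool
forms_agree (void)
{
  for (long a = -3; a <= 3; a++)
    for (long x = -3; x <= 3; x++)
      if (form_ne_a (a, x) != form_ne_0 (a, x)
          || form_ne_0 (a, x) != form_eq_0 (a, x))
        return false;
  return true;
}
```

This only covers the source-level equivalence; whether the opt2 pattern
emits an instruction with the same meaning is a separate question.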


I could very well be mistaken, but define_insn is a pattern match and
opt2 has *ne* so the expression has to be in != form and thus needs to
work with that condition. No ?


and matches define_insn "*czero.nez..opt2"

| (insn 41 20 38 3 (set (reg/v:DI 136 [ x ])
|    (if_then_else:DI (ne (reg/v:DI 134 [ a ])
|    (const_int 0 [0]))
|    (reg/v:DI 136 [ x ])
|    (reg/v:DI 134 [ a ]))) {*czero.nez.didi.opt2}

The corresponding asm pattern generates
 czero.nez x, x, a   ; %0, %2, %1
implying
 "x = (a != 0) ? 0 : a"

I get this from the RTL pattern:

x = (a != 0) ? x : a
x = (a != 0) ? x : 0


This is the issue, for ne, czero.nez can only express
x = (a != 0) ? 0 : x



I think you got the arms reversed.


Just working through this in email, as there's a lot of 
double-negatives and I managed to screw up my Linux PR this morning so I 
may not be thinking that well...


The docs say "(if_then_else test true-value false-value)".  So in this 
case it's


   test:  (ne (match_operand:X 1 "register_operand" "r") (const_int 0))
   true:  (match_operand:GPR 2 "register_operand" "r")
   false: (match_operand:GPR 3 "register_operand" "1") == (match_operand:X 1 
"register_operand" "r")

and we're encoding it as

   czero.nez %0,%2,%1

so that's

   rd:  output
   rs1: on-true
   rs2: condition (the value inside the ne in RTL)

That looks correct to me: the instruction's condition source register is 
inside a "(ne ... 0)", but we're doing the czero.nez so it looks OK.


The rest of the zero juggling looks sane as well -- I'm not sure if the 
X vs GPR mismatch will confuse something else, but it should be caught 
by the rtx_equal_p() and thus should at least be safe.



What I meant was czero.nez as specified in RTL pattern would generate x
= (a != 0) ? 0 : a, whereas pattern's desired semantics is (a != 0) ? x : 0
And that is a problem because after all equivalents/simplifications, a
ternary operation's middle operand has to be zero to map to czero*, but
it doesn't for the opt2 RTL semantics.

I've sat on this for 2 days, trying to convince myself I was wrong, but
as it stands, it was generating wrong code in the test which is fixed
after the patch.
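
One way to make the disagreement concrete is to model both sides in
plain C and compare them.  This is a sketch, assuming the Zicond
definitions (czero.eqz/czero.nez write zero when rs2 is zero/non-zero
and rs1 otherwise); the helper names are invented for illustration and
the input grid is arbitrary.

```c
#include <assert.h>

/* Zicond semantics, as assumed in the lead-in:
   czero.eqz rd, rs1, rs2  ->  rd = (rs2 == 0) ? 0 : rs1
   czero.nez rd, rs1, rs2  ->  rd = (rs2 != 0) ? 0 : rs1  */
static long czero_eqz (long rs1, long rs2) { return rs2 == 0 ? 0 : rs1; }
static long czero_nez (long rs1, long rs2) { return rs2 != 0 ? 0 : rs1; }

/* What the RTL of insn 41 asks for: x = (a != 0) ? x : a.  */
static long opt2_rtl (long x, long a) { return a != 0 ? x : a; }

/* What the emitted "czero.nez x, x, a" computes under the model above.  */
static long opt2_asm (long x, long a) { return czero_nez (x, a); }

/* Count the inputs on a small grid where RTL and assembly disagree.  */
static int
count_mismatches (void)
{
  int mismatches = 0;
  for (long a = -2; a <= 2; a++)
    for (long x = -2; x <= 2; x++)
      if (opt2_rtl (x, a) != opt2_asm (x, a))
        mismatches++;
  return mismatches;
}
```

In this model the two only agree when x is zero, while
czero_eqz (x, a) matches opt2_rtl (x, a) for every input, which is
consistent with the czero.eqz output shown for the small reproducer
later in the thread.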


It might be easier for everyone to understand if you add a specific 
testcase for just the broken codegen.  I'm not having luck constructing 
a small reproducer (though I don't have a clean tree lying around, so I 
might have screwed something up here).


IIUC something like

   long func(long x, long a) {
       if (a != 0)
           return x;
       return 0;
   }

should do it, but I'm getting

   func:
   czero.eqz   a0,a0,a1
   ret

which looks right to me -- though it's not triggering this pattern, so 
not sure that means much.




Thx,
-Vineet

