[PATCH v5] RISC-V: Implement TLS Descriptors.

2024-03-28 Thread Tatsuyuki Ishi
This implements TLS Descriptors (TLSDESC) as specified in [1].

The 4-instruction sequence is implemented as a single RTX insn for
simplicity, but this can be revisited later if instruction scheduling or
more flexible RA is desired.

The default remains the traditional TLS model, but it can be changed at
configure time with --with-tls={trad,desc}. The default can be revisited
once toolchain and libc support ships.

[1]: https://github.com/riscv-non-isa/riscv-elf-psabi-doc/pull/373.
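
For illustration only, here is a minimal C sketch of the kind of access this
affects. It assumes -mtls-dialect accepts the same trad/desc values as
--with-tls, and that the file is built as PIC so the variable is accessed
through the general-dynamic model. The names tls_counter and bump_tls_counter
are hypothetical and not part of the patch.

```c
/* Hypothetical example, not taken from the patch or its test cases.
   A non-static thread-local variable in PIC code is normally accessed
   with the general-dynamic TLS model, which is the case TLSDESC targets.  */
__thread int tls_counter;

/* Increment this thread's copy of the counter and return the new value.  */
int
bump_tls_counter (void)
{
  return ++tls_counter;
}
```

Building this with something like -fPIC -mtls-dialect=desc on a RISC-V target
should make the access go through the TLSDESC sequence rather than a
__tls_get_addr call, provided the assembler and linker support it.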

gcc/ChangeLog:
* config/riscv/riscv.opt: Add -mtls-dialect to configure TLS flavor.
* config.gcc: Add --with-tls configuration option to change the
default TLS flavor.
* config/riscv/riscv.h: Add TARGET_TLSDESC determined from
-mtls-dialect and with_tls defaults.
* config/riscv/riscv-opts.h: Define enum riscv_tls_type for the
two TLS flavors.
* config/riscv/riscv-protos.h: Define SYMBOL_TLSDESC symbol type.
* config/riscv/riscv.md: Add instruction sequence for TLSDESC.
* config/riscv/riscv.cc (riscv_symbol_insns): Add instruction
sequence length data for TLSDESC.
(riscv_legitimize_tls_address): Add lowering of TLSDESC.
* doc/install.texi: Document --with-tls for RISC-V.
* doc/invoke.texi: Document -mtls-dialect for RISC-V.
* testsuite/gcc.target/riscv/tls_1.x: Add TLSDESC GD test case.
* testsuite/gcc.target/riscv/tlsdesc.c: Same as above.
---
No regression in gcc tests for rv32gcv and rv64gcv, tested alongside
the binutils and glibc implementation. Tested with --with-tls=desc.

v2: Add with_tls configuration option, and a few readability improvements.
Added Changelog.
v3: Add documentation per Kito's suggestion.
Fix minor issues pointed out by Kito and Jeff.
Thanks Kito Cheng and Jeff Law for review.
v4: Add TLSDESC GD assembly test.
Rebase on top of trunk.
v5: Trivial rebase on top of trunk.

I have recently addressed relaxation concerns on binutils and RVV
register save/restore on glibc, so I'm sending out a trivial rebase
with the hope that the full set can be merged soon.

 gcc/config.gcc   | 15 ++-
 gcc/config/riscv/riscv-opts.h|  6 ++
 gcc/config/riscv/riscv-protos.h  |  5 +++--
 gcc/config/riscv/riscv.cc| 24 
 gcc/config/riscv/riscv.h |  9 +++--
 gcc/config/riscv/riscv.md| 20 +++-
 gcc/config/riscv/riscv.opt   | 14 ++
 gcc/doc/install.texi |  3 +++
 gcc/doc/invoke.texi  | 13 -
 gcc/testsuite/gcc.target/riscv/tls_1.x   |  5 +
 gcc/testsuite/gcc.target/riscv/tlsdesc.c | 12 
 11 files changed, 115 insertions(+), 11 deletions(-)
 create mode 100644 gcc/testsuite/gcc.target/riscv/tls_1.x
 create mode 100644 gcc/testsuite/gcc.target/riscv/tlsdesc.c

diff --git a/gcc/config.gcc b/gcc/config.gcc
index 17873ac2103..1a5870672d2 100644
--- a/gcc/config.gcc
+++ b/gcc/config.gcc
@@ -2492,6 +2492,7 @@ riscv*-*-linux*)
# Force .init_array support.  The configure script cannot always
# automatically detect that GAS supports it, yet we require it.
gcc_cv_initfini_array=yes
+   with_tls=${with_tls:-trad}
;;
 riscv*-*-elf* | riscv*-*-rtems*)
tm_file="elfos.h newlib-stdint.h ${tm_file} riscv/elf.h"
@@ -2534,6 +2535,7 @@ riscv*-*-freebsd*)
# Force .init_array support.  The configure script cannot always
# automatically detect that GAS supports it, yet we require it.
gcc_cv_initfini_array=yes
+   with_tls=${with_tls:-trad}
;;
 
 loongarch*-*-linux*)
@@ -4671,7 +4673,7 @@ case "${target}" in
;;
 
riscv*-*-*)
-   supported_defaults="abi arch tune riscv_attribute isa_spec"
+   supported_defaults="abi arch tune riscv_attribute isa_spec tls"
 
case "${target}" in
riscv-* | riscv32*) xlen=32 ;;
@@ -4801,6 +4803,17 @@ case "${target}" in
;;
esac
fi
+   # Handle --with-tls.
+   case "$with_tls" in
+   "" \
+   | trad | desc)
+   # OK
+   ;;
+   *)
+   echo "Unknown TLS method used in --with-tls=$with_tls" 1>&2
+   exit 1
+   ;;
+   esac
 
# Handle --with-multilib-list.
if test "x${with_multilib_list}" != xdefault; then
diff --git a/gcc/config/riscv/riscv-opts.h b/gcc/config/riscv/riscv-opts.h
index 392b9169240..1b2dd5757a8 100644
--- a/gcc/config/riscv/riscv-opts.h
+++ b/gcc/config/riscv/riscv-opts.h
@@ -154,4 +154,10 @@ enum rvv_vector_bits_enum {
 #define TARGET_MAX_LMUL
\
   (int) (rvv_max_lmul 

Re: [PATCH] mips: Fix C23 (...) functions returning large aggregates [PR114175]

2024-03-28 Thread YunQiang Su
On Wed, Mar 20, 2024 at 15:12, Xi Ruoyao wrote:
>
> We were assuming TYPE_NO_NAMED_ARGS_STDARG_P functions don't have any
> named arguments and there is nothing to advance, but that is not the case
> for (...) functions returning by hidden reference which have one such
> artificial argument.  This is causing gcc.dg/c23-stdarg-{6,8,9}.c to
> fail.
>
> Fix the issue by checking if arg.type is NULL, as r14-9503 explains.
>
> gcc/ChangeLog:
>
> PR target/114175
> * config/mips/mips.cc (mips_setup_incoming_varargs): Only skip
> mips_function_arg_advance for TYPE_NO_NAMED_ARGS_STDARG_P
> functions if arg.type is NULL.
> ---
>
> Bootstrapped and regtested on mips64el-linux-gnuabi64.  Ok for trunk?
>

Thanks. LGTM.

>  gcc/config/mips/mips.cc | 8 +++-
>  1 file changed, 7 insertions(+), 1 deletion(-)
>
> diff --git a/gcc/config/mips/mips.cc b/gcc/config/mips/mips.cc
> index 68e2ae8d8fa..ce764a5cb35 100644
> --- a/gcc/config/mips/mips.cc
> +++ b/gcc/config/mips/mips.cc
> @@ -6834,7 +6834,13 @@ mips_setup_incoming_varargs (cumulative_args_t cum,
>   argument.  Advance a local copy of CUM past the last "real" named
>   argument, to find out how many registers are left over.  */
>local_cum = *get_cumulative_args (cum);
> -  if (!TYPE_NO_NAMED_ARGS_STDARG_P (TREE_TYPE (current_function_decl)))
> +
> +  /* For a C23 variadic function w/o any named argument, and w/o an
> + artificial argument for large return value, skip advancing args.
> + There is such an artificial argument iff. arg.type is non-NULL
> + (PR 114175).  */
> +  if (!TYPE_NO_NAMED_ARGS_STDARG_P (TREE_TYPE (current_function_decl))
> +  || arg.type != NULL_TREE)
>  mips_function_arg_advance (pack_cumulative_args (&local_cum), arg);
>
>/* Found out how many registers we need to save.  */
> --
> 2.44.0
>


[COMMITTED] Use fatal_error instead of internal_error for when ZSTD is not enabled

2024-03-28 Thread Andrew Pinski
This changes an internal error into a fatal error for the case where ZSTD
support is not enabled but the section was compressed with ZSTD.

Committed as approved after bootstrap/test on x86_64-linux-gnu.

gcc/ChangeLog:

* lto-compress.cc (lto_end_uncompression): Use
fatal_error instead of internal_error when ZSTD
is not enabled.

Signed-off-by: Andrew Pinski 
---
 gcc/lto-compress.cc | 2 +-
 1 file changed, 1 insertion(+), 1 deletion(-)

diff --git a/gcc/lto-compress.cc b/gcc/lto-compress.cc
index c167ac967aa..bebf0277ef6 100644
--- a/gcc/lto-compress.cc
+++ b/gcc/lto-compress.cc
@@ -408,7 +408,7 @@ lto_end_uncompression (struct lto_compression_stream *stream,
 }
 #endif
   if (compression == ZSTD)
-internal_error ("compiler does not support ZSTD LTO compression");
+fatal_error (UNKNOWN_LOCATION, "compiler does not support ZSTD LTO compression");
 
   lto_uncompression_zlib (stream);
 }
-- 
2.43.0



[PATCH] c++: Keep DECL_SAVED_TREE of destructor instantiations in modules [PR104040]

2024-03-28 Thread Nathaniel Shead
Bootstrapped and regtested on x86_64-pc-linux-gnu, OK for trunk?

-- >8 --

A template instantiation still needs to have its DECL_SAVED_TREE so that
its definition is emitted into the CMI. This way it can be emitted in
the object file of any importers that use it, in case it doesn't end up
getting emitted in this TU.

PR c++/104040

gcc/cp/ChangeLog:

* semantics.cc (expand_or_defer_fn_1): Also keep DECL_SAVED_TREE
for template instantiations.

gcc/testsuite/ChangeLog:

* g++.dg/modules/pr104040_a.C: New test.
* g++.dg/modules/pr104040_b.C: New test.

Signed-off-by: Nathaniel Shead 
---
 gcc/cp/semantics.cc   |  7 +--
 gcc/testsuite/g++.dg/modules/pr104040_a.C | 14 ++
 gcc/testsuite/g++.dg/modules/pr104040_b.C |  8 
 3 files changed, 27 insertions(+), 2 deletions(-)
 create mode 100644 gcc/testsuite/g++.dg/modules/pr104040_a.C
 create mode 100644 gcc/testsuite/g++.dg/modules/pr104040_b.C

diff --git a/gcc/cp/semantics.cc b/gcc/cp/semantics.cc
index adb1ba48d29..84e9901509a 100644
--- a/gcc/cp/semantics.cc
+++ b/gcc/cp/semantics.cc
@@ -5033,9 +5033,12 @@ expand_or_defer_fn_1 (tree fn)
   /* We don't want to process FN again, so pretend we've written
 it out, even though we haven't.  */
   TREE_ASM_WRITTEN (fn) = 1;
-  /* If this is a constexpr function, keep DECL_SAVED_TREE.  */
+  /* If this is a constexpr function, or the body might need to be
+exported from a module CMI, keep DECL_SAVED_TREE.  */
   if (!DECL_DECLARED_CONSTEXPR_P (fn)
- && !(modules_p () && DECL_DECLARED_INLINE_P (fn)))
+ && !(modules_p ()
+  && (DECL_DECLARED_INLINE_P (fn)
+  || DECL_TEMPLATE_INSTANTIATION (fn))))
DECL_SAVED_TREE (fn) = NULL_TREE;
   return false;
 }
diff --git a/gcc/testsuite/g++.dg/modules/pr104040_a.C b/gcc/testsuite/g++.dg/modules/pr104040_a.C
new file mode 100644
index 000..ea36ce0a798
--- /dev/null
+++ b/gcc/testsuite/g++.dg/modules/pr104040_a.C
@@ -0,0 +1,14 @@
+// PR c++/104040
+// { dg-additional-options "-fmodules-ts" }
+// { dg-module-cmi test }
+
+export module test;
+
+export template 
+struct test {
+  ~test() {}
+};
+
+test use() {
+  return {};
+}
diff --git a/gcc/testsuite/g++.dg/modules/pr104040_b.C b/gcc/testsuite/g++.dg/modules/pr104040_b.C
new file mode 100644
index 000..efe014673fb
--- /dev/null
+++ b/gcc/testsuite/g++.dg/modules/pr104040_b.C
@@ -0,0 +1,8 @@
+// PR c++/104040
+// { dg-additional-options "-fmodules-ts" }
+
+import test;
+
+int main() {
+  test t{};
+}
-- 
2.43.2



Re: [PATCH] LoongArch: Increase division costs

2024-03-28 Thread chenglulu



On 2024/3/27 at 8:42 PM, Xi Ruoyao wrote:

On Wed, 2024-03-27 at 18:39 +0800, Xi Ruoyao wrote:

On Wed, 2024-03-27 at 10:38 +0800, chenglulu wrote:

On 2024/3/26 at 5:48 PM, Xi Ruoyao wrote:

The latency of LA464 and LA664 division instructions depends on the
input.  When I updated the costs in r14-6642, I unintentionally set the
division costs to the best-case latency (when the first operand is 0).
Per a recent discussion [1] we should use "something sensible" instead
of it.

Use the average of the minimum and maximum latency observed instead.
This enables the reduction of division to a reciprocal-multiplication
sequence and speeds up the following test case by about 30%:

  int
  main (void)
  {
    unsigned long stat = 0xdeadbeef;
    for (int i = 0; i < 1; i++)
  stat = (stat * stat + stat * 114514 + 1919810) % 17;
    asm(""::"r"(stat));
  }

[1]: https://gcc.gnu.org/pipermail/gcc-patches/2024-March/648348.html

The test case div-const-reduction.c is modified to assemble the instruction
sequence as follows:
lu12i.w $r12,999997440>>12 # 0x3b9ac000
ori $r12,$r12,2567
mod.w   $r13,$r13,$r12

This sequence of instructions takes 5 clock cycles.

It actually may take 5 to 8 cycles depending on the input.  And
multiplication is fully pipelined while division is not, so the
reciprocal sequence should still produce a better throughput.
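
As a rough sketch of what "reciprocal sequence reduction" means here, the
following C code computes n % 17 for 32-bit unsigned n with a multiply and a
shift instead of a division. It is a hand-written illustration, not the code
GCC emits; the helper names and the 32-bit operand width are assumptions (the
test case above uses unsigned long). The constant is ceil(2^36 / 17), and
17 * 0xF0F0F0F1 - 2^36 = 1, which is small enough for the quotient to be
exact for every 32-bit input.

```c
#include <stdint.h>

/* ceil (2^36 / 17) = 4042322161 = 0xF0F0F0F1.  */
#define RECIP17_MAGIC UINT64_C (0xF0F0F0F1)
#define RECIP17_SHIFT 36

/* Hypothetical helper: n / 17 using a 64-bit multiply and a shift.  */
static uint32_t
div17_by_reciprocal (uint32_t n)
{
  return (uint32_t) (((uint64_t) n * RECIP17_MAGIC) >> RECIP17_SHIFT);
}

/* Hypothetical helper: n % 17 derived from the quotient above.  */
static uint32_t
mod17_by_reciprocal (uint32_t n)
{
  return n - 17u * div17_by_reciprocal (n);
}
```

The multiply, shift, multiply and subtract are all pipelined operations, which
is why this form can win on throughput even when a single division has a
comparable latency.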


Hmm indeed, it seems a waste to do this reduction for int / 17.
I'll try to make a better heuristic as Richard suggests...

Oops, it seems impossible (w/o refactoring the generic code).  See my
reply to Richi :(.

Can you also try benchmarking with the costs of SI and DI division
increased to (10, 10) instead of (14, 22) - allowing more CSE but not
reciprocal sequence reduction, and (10, 22) - only allowing reduction
for DI but not SI?


I tested SPEC2006. For the floating-point programs, after removing the test
items with large fluctuations, the rest is basically unchanged.

For the fixed-point 464.h264ref, (10,10) was 6.7% higher than (5,5) and (10,22).



Ping: [PATCH] mips: Fix C23 (...) functions returning large aggregates [PR114175]

2024-03-28 Thread Xi Ruoyao
Ping.

On Wed, 2024-03-20 at 15:10 +0800, Xi Ruoyao wrote:
> We were assuming TYPE_NO_NAMED_ARGS_STDARG_P functions don't have any
> named arguments and there is nothing to advance, but that is not the case
> for (...) functions returning by hidden reference which have one such
> artificial argument.  This is causing gcc.dg/c23-stdarg-{6,8,9}.c to
> fail.
> 
> Fix the issue by checking if arg.type is NULL, as r14-9503 explains.
> 
> gcc/ChangeLog:
> 
>   PR target/114175
>   * config/mips/mips.cc (mips_setup_incoming_varargs): Only skip
>   mips_function_arg_advance for TYPE_NO_NAMED_ARGS_STDARG_P
>   functions if arg.type is NULL.
> ---
> 
> Bootstrapped and regtested on mips64el-linux-gnuabi64.  Ok for trunk?
> 
>  gcc/config/mips/mips.cc | 8 +++-
>  1 file changed, 7 insertions(+), 1 deletion(-)
> 
> diff --git a/gcc/config/mips/mips.cc b/gcc/config/mips/mips.cc
> index 68e2ae8d8fa..ce764a5cb35 100644
> --- a/gcc/config/mips/mips.cc
> +++ b/gcc/config/mips/mips.cc
> @@ -6834,7 +6834,13 @@ mips_setup_incoming_varargs (cumulative_args_t cum,
>   argument.  Advance a local copy of CUM past the last "real" named
>   argument, to find out how many registers are left over.  */
>    local_cum = *get_cumulative_args (cum);
> -  if (!TYPE_NO_NAMED_ARGS_STDARG_P (TREE_TYPE (current_function_decl)))
> +
> +  /* For a C23 variadic function w/o any named argument, and w/o an
> + artificial argument for large return value, skip advancing args.
> + There is such an artificial argument iff. arg.type is non-NULL
> + (PR 114175).  */
> +  if (!TYPE_NO_NAMED_ARGS_STDARG_P (TREE_TYPE (current_function_decl))
> +  || arg.type != NULL_TREE)
>  mips_function_arg_advance (pack_cumulative_args (&local_cum), arg);
>  
>    /* Found out how many registers we need to save.  */

-- 
Xi Ruoyao 
School of Aerospace Science and Technology, Xidian University


[committed] Provide suitable output template for zero_extendqihi2 on H8

2024-03-28 Thread Jeff Law
Segher's recent combine change, quite unexpectedly, triggered a 
regression on the H8 port.  It failed to build newlib.


The zero_extendqihi2 pattern provided two alternatives.  One where the 
source and destination matched.  That turns into a suitable instruction 
trivially.   The second alternative was actually meant to capture cases 
where the value is coming from memory.


What was missing here was the reg->reg case where the source and 
destination do not match.  That fell into the second case which was 
requested to be split by the pattern's output template.


The splitter had a suitable condition to make sure it only triggered in
the right cases.  Unfortunately, the pattern requested a split in a
case where the splitter was going to fail, which led to the fault.


So regardless of what's going on in the combiner, this code was just 
wrong.  Fixed thusly by providing a suitable output template for the 
reg->reg case.


Regression tested on h8300-elf.  Pushing to the trunk.

gcc/

* config/h8300/extensions.md (zero_extendqihi*): Add output
template for reg->reg case where the regs don't match.

Jeff

commit c1e66532cbb424bd7ea8c3b2c1ffea4bb5233309
Author: Jeff Law 
Date:   Thu Mar 28 16:56:53 2024 -0600

[committed] Provide suitable output template for zero_extendqihi2 on H8

Segher's recent combine change, quite unexpectedly, triggered a regression
on the H8 port.  It failed to build newlib.

The zero_extendqihi2 pattern provided two alternatives.  One where the
source and destination matched.  That turns into a suitable instruction
trivially.  The second alternative was actually meant to capture cases
where the value is coming from memory.

What was missing here was the reg->reg case where the source and
destination do not match.  That fell into the second case which was
requested to be split by the pattern's output template.

The splitter had a suitable condition to make sure it only triggered in the
right cases.  Unfortunately, the pattern requested a split in a case where
the splitter was going to fail, which led to the fault.

So regardless of what's going on in the combiner, this code was just wrong.
Fixed thusly by providing a suitable output template for the reg->reg case.

Regression tested on h8300-elf.  Pushing to the trunk.

gcc/

* config/h8300/extensions.md (zero_extendqihi*): Add output
template for reg->reg case where the regs don't match.

diff --git a/gcc/config/h8300/extensions.md b/gcc/config/h8300/extensions.md
index 7149dc0ac52..a1e8c4abd37 100644
--- a/gcc/config/h8300/extensions.md
+++ b/gcc/config/h8300/extensions.md
@@ -12,8 +12,8 @@ (define_expand "zero_extendqi2"
   })
 
 (define_insn_and_split "*zero_extendqihi2"
-  [(set (match_operand:HI 0 "register_operand" "=r,r")
-   (zero_extend:HI (match_operand:QI 1 "general_operand_src" "0,g>")))]
+  [(set (match_operand:HI 0 "register_operand" "=r,r,r")
+   (zero_extend:HI (match_operand:QI 1 "general_operand_src" "0,r,g>")))]
   ""
   "#"
   "&& reload_completed"
@@ -21,14 +21,15 @@ (define_insn_and_split "*zero_extendqihi2"
  (clobber (reg:CC CC_REG))])])
 
 (define_insn "*zero_extendqihi2"
-  [(set (match_operand:HI 0 "register_operand" "=r,r")
-   (zero_extend:HI (match_operand:QI 1 "general_operand_src" "0,g>")))
+  [(set (match_operand:HI 0 "register_operand" "=r,r,r")
+   (zero_extend:HI (match_operand:QI 1 "general_operand_src" "0,r,g>")))
(clobber (reg:CC CC_REG))]
   ""
   "@
   extu.w   %T0
+  mov.b\t%X1,%R0\;extu.w\t%T0
   #"
-  [(set_attr "length" "2,10")])
+  [(set_attr "length" "2,4,10")])
 
 ;; Split the zero extension of a general operand (actually a memory
 ;; operand) into a load of the operand and the actual zero extension


[PATCH] Allow `gcc_jit_type_get_size` to work with pointers

2024-03-28 Thread Guillaume Gomez
Hi,

Here's a little fix to allow the `gcc_jit_type_get_size` function to
work on pointer types as well.

Cordially.
From 21e6e2d5ea897fc74d0e3194973093c58157e6fa Mon Sep 17 00:00:00 2001
From: Guillaume Gomez 
Date: Tue, 26 Mar 2024 17:56:36 +0100
Subject: [PATCH] Allow `gcc_jit_type_get_size` to work with pointers

gcc/jit/ChangeLog:

	* libgccjit.cc (gcc_jit_type_get_size): Add pointer support.
---
 gcc/jit/libgccjit.cc |  4 ++--
 gcc/testsuite/jit.dg/test-pointer_size.c | 27 
 2 files changed, 29 insertions(+), 2 deletions(-)
 create mode 100644 gcc/testsuite/jit.dg/test-pointer_size.c

diff --git a/gcc/jit/libgccjit.cc b/gcc/jit/libgccjit.cc
index a2cdc01a3a4..58d47723e38 100644
--- a/gcc/jit/libgccjit.cc
+++ b/gcc/jit/libgccjit.cc
@@ -575,8 +575,8 @@ gcc_jit_type_get_size (gcc_jit_type *type)
 {
   RETURN_VAL_IF_FAIL (type, -1, NULL, NULL, "NULL type");
   RETURN_VAL_IF_FAIL
-(type->is_int () || type->is_float (), -1, NULL, NULL,
- "only getting the size of integer or floating-point types is supported for now");
+(type->is_int () || type->is_float () || type->is_pointer (), -1, NULL, NULL,
+ "only getting the size of integer or floating-point or pointer types is supported for now");
   return type->get_size ();
 }
 
diff --git a/gcc/testsuite/jit.dg/test-pointer_size.c b/gcc/testsuite/jit.dg/test-pointer_size.c
new file mode 100644
index 000..337796acc2a
--- /dev/null
+++ b/gcc/testsuite/jit.dg/test-pointer_size.c
@@ -0,0 +1,27 @@
+/* { dg-do compile { target x86_64-*-* } } */
+
+#include 
+#include "libgccjit.h"
+
+#include "harness.h"
+
+void
+create_code (gcc_jit_context *ctxt, void *user_data)
+{}
+
+void
+verify_code (gcc_jit_context *ctxt, gcc_jit_result *result)
+{
+  gcc_jit_type *int_type =
+gcc_jit_context_get_type (ctxt, GCC_JIT_TYPE_INT);
+  gcc_jit_type *int_ptr_type = gcc_jit_type_get_pointer (int_type);
+
+  int int_ptr_size = gcc_jit_type_get_size (int_ptr_type);
+  CHECK_VALUE (int_ptr_size, 8);
+
+  gcc_jit_type *void_type =
+gcc_jit_context_get_type (ctxt, GCC_JIT_TYPE_VOID);
+  gcc_jit_type *void_ptr_type = gcc_jit_type_get_pointer (void_type);
+
+  CHECK_VALUE (int_ptr_size, gcc_jit_type_get_size (void_ptr_type));
+}
-- 
2.24.1.2762.gfe2e4819b8



Re: [Patch, fortran] PR110987 and PR113885 - gimplifier ICEs and wrong results in finalization

2024-03-28 Thread Harald Anlauf

Hi Paul,

On 28.03.24 at 16:39, Paul Richard Thomas wrote:

Hi All,

The attached patch has two elements:

(i) A fix for gimplifier ICEs with derived type having no components. The
reporter himself suggested (thanks Kirill!):

-  if (derived && derived->attr.zero_comp)
+  if (derived && (derived->components == NULL))

As far as I can tell, this is the correct fix. I tried setting
attr.zero_comp in resolve.cc for all the OK types without components but
this caused all sorts of fallout.

(ii) Final calls were occurring in the wrong place for finalizable
elemental function calls within scalarizer loops. This caused incorrect
results even for derived types with components. This is also fixed.


yes, this looks good here.


It should be noted that finalizer calls from the rhs of an assignment are
occurring at the wrong time, since F2018/24-7.5.6.3 requires:
"If an executable construct references a nonpointer function, the result is
finalized after execution of the innermost executable construct containing
the reference.", while in the present implementation, this happens just
before assignment to the lhs temporary. Fixing this is going to be really
tough and invasive, so I decided that getting the right results and the
correct number of finalizations should be sufficient for the 14-branch
release. As it happens, I had been mulling over how to do this for
finalizations hidden in constructors and other contexts than assignment
(eg. write statements or allocation with source). It's a few months away
and will be appropriate for stage 1.

Regtests on x86_64 - OK for mainline and then, after a bit, for backporting
to 13-branch?


The patch looks rather "conservative" (read: safe) and appears to
fix the regressions very well, so go ahead as planned.

Thanks for the patch!

Harald


Regards to all

Paul

Fortran: Fix a gimplifier ICE/wrong result with finalization [PR104555]

2024-03-28  Paul Thomas  

gcc/fortran
PR fortran/36337
PR fortran/110987
PR fortran/113885
* trans-expr.cc (gfc_trans_assignment_1): Place finalization
block before rhs post block for elemental rhs.
* trans.cc (gfc_finalize_tree_expr): Check directly if a type
has no components, rather than the zero components attribute.
Treat elemental zero component expressions in the same way as
scalars.


gcc/testsuite/
PR fortran/113885
* gfortran.dg/finalize_54.f90: New test.
* gfortran.dg/finalize_55.f90: New test.

gcc/testsuite/
PR fortran/110987
* gfortran.dg/finalize_56.f90: New test.





Re: No rule to make target '../libbacktrace/libbacktrace.la', needed by 'libgo.la'. [PR106472]

2024-03-28 Thread Andrew Pinski
On Thu, Mar 28, 2024 at 3:15 PM Дилян Палаузов
 wrote:
>
> Hello Ian,
>
> when I add in gcc/go/config-lang.in the line
>   boot_language=yes
>
> then on stage3 x86_64-pc-linux-gnu/libbacktrace is compiled before 
> x86_64-pc-linux-gnu/libgo and this error is gone.
>
> But then Makefile.def has
>   target_modules = { module= libatomic; bootstrap=true; lib_path=.libs; };
>
> and in x86_64-pc-linux-gnu libatomic is not compiled before 
> x86_64-pc-linux-gnu/libgo .  Linking the latter fails
>
> make[2]: Entering directory '/git/gcc/build/x86_64-pc-linux-gnu/libgo'
> /bin/sh ./libtool --tag=CC --mode=link /git/gcc/build/./gcc/xgcc 
> -B/git/gcc/build/./gcc/ -B/usr/local/x86_64-pc-linux-gnu/bin/ …long text… 
> golang.org/x/sys/cpu_gccgo_x86.lo ../libbacktrace/libbacktrace.la 
> ../libatomic/libatomic_convenience.la ../libffi/libffi_convenience.la 
> -lpthread -lm
> ./libtool: line 5195: cd: ../libatomic/.libs: No such file or directory
> libtool: link: cannot determine absolute directory name of 
> `../libatomic/.libs'
>
> So either lib_path=.libs interferes (when gcc/go/config-lang.in contains 
> “boot_language=yes”), I have made the semi-serial build, trying to save a lot 
> of time waiting to get on stage3, somehow wrong, or libatomic must be 
> mentioned in gcc/go/config-lang.in . I have the feeling that ./configure 
> --enable-languages=all works, because gcc/d/config-lang.in contains
> boot_language=yes, and then in some way libphobos or d depend on libatomic.
>
> That said bootstrap=true might only be relevant when boot_language=yes is
> present.
>
> In addition gcc/go/config-lang.in:boot_language=yes implies that on stage2
> (thus in prev-x86_64-pc-linux-gnu/) libbacktrace is built, which I do not
> want, as libbacktrace is needed only by libgo on stage3.
>
> Can someone explain, why is libbacktrace built once in the build-root, as
> stage1-libbacktrace, prev-libbacktrace and libbacktrace (for stage3) and once 
> again in stage1-x86_64-pc-linux-gnu/libbacktrace, 
> prev-x86_64-pc-linux-gnu/libbacktrace/ and in 
> x86_64-pc-linux-gnu/libbacktrace ? My precise question is why libbacktrace is 
> built once in the build-root directory and once in the x86_64-pc-linux-gnu 
> directory?

Because it is both a target library and a host library. Take a cross
compiler that is being built on, say, host A and is targeting target B.
libbacktrace will be built as a host library to be linked into
cc1/cc1plus/etc., and as a target library that will be used by
libsanitizer (and libgo). The GCC build does not link cc1/cc1plus
against the target library; it only uses the host library version.
Does that make sense now?

Thanks,
Andrew Pinski

>
> Kind regards Дилян
>
>
> On 26 March 2024 16:37:40 UTC, Ian Lance Taylor wrote:
>>
>> On Tue, Mar 26, 2024 at 9:33 AM Дилян Палаузов
>>  wrote:
>>>
>>>
>>>  Makefile.def contains already:
>>>
>>>  host_modules= { module= libbacktrace; bootstrap=true; }; // since 
>>> eff02e4f84 - "libbacktrace/: * Initial implementation" year 2012
>>>
>>>  host_modules= { module= libcpp; bootstrap=true; }; // since 
>>> 4f4e53dd8517c0b2 - year 2004
>>
>>
>> Yes.  I was just trying to answer your question.
>>
>> Ian
>>
>>> On 25 March 2024 23:59:52 UTC, Ian Lance Taylor wrote:


>>>>
>>>> On Sat, Mar 23, 2024 at 4:32 AM Дилян Палаузов
>>>>  wrote:
>>>>>
>>>>> Can the build experts say what needs to be changed?  The dependencies I
>>>>> added are missing in the build configuration (@if gcc-bootstrap).
>>>>>
>>>>> I cannot say if libbacktrace should or should not be a bootstrap=true
>>>>> module.
>>>>
>>>> I don't count as a build expert these days, but since GCC itself links
>>>> against libbacktrace, my understanding is that the libbacktrace
>>>> host_module should be bootstrap=true, just like, say, libcpp.
>>>>
>>>> Ian


Re: No rule to make target '../libbacktrace/libbacktrace.la', needed by 'libgo.la'. [PR106472]

2024-03-28 Thread Дилян Палаузов
Hello Ian,

when I add in gcc/go/config-lang.in the line
  boot_language=yes

then on stage3 x86_64-pc-linux-gnu/libbacktrace is compiled before 
x86_64-pc-linux-gnu/libgo and this error is gone.

But then Makefile.def has
  target_modules = { module= libatomic; bootstrap=true; lib_path=.libs; };

and in x86_64-pc-linux-gnu libatomic is not compiled before 
x86_64-pc-linux-gnu/libgo .  Linking the latter fails

make[2]: Entering directory '/git/gcc/build/x86_64-pc-linux-gnu/libgo'
/bin/sh ./libtool --tag=CC --mode=link /git/gcc/build/./gcc/xgcc 
-B/git/gcc/build/./gcc/ -B/usr/local/x86_64-pc-linux-gnu/bin/ …long text… 
golang.org/x/sys/cpu_gccgo_x86.lo ../libbacktrace/libbacktrace.la 
../libatomic/libatomic_convenience.la ../libffi/libffi_convenience.la -lpthread 
-lm
./libtool: line 5195: cd: ../libatomic/.libs: No such file or directory
libtool: link: cannot determine absolute directory name of `../libatomic/.libs'

So either lib_path=.libs interferes (when gcc/go/config-lang.in contains 
“boot_language=yes”), I have made the semi-serial build, trying to save a lot 
of time waiting to get on stage3, somehow wrong, or libatomic must be mentioned 
in gcc/go/config-lang.in . I have the feeling that ./configure 
--enable-languages=all works, because gcc/d/config-lang.in contains
boot_language=yes, and then in some way libphobos or d depend on libatomic.

That said bootstrap=true might only be relevant when boot_language=yes is
present.

In addition gcc/go/config-lang.in:boot_language=yes implies that on stage2
(thus in prev-x86_64-pc-linux-gnu/) libbacktrace is built, which I do not want,
as libbacktrace is needed only by libgo on stage3.

Can someone explain, why is libbacktrace built once in the build-root, as
stage1-libbacktrace, prev-libbacktrace and libbacktrace (for stage3) and once 
again in stage1-x86_64-pc-linux-gnu/libbacktrace, 
prev-x86_64-pc-linux-gnu/libbacktrace/ and in x86_64-pc-linux-gnu/libbacktrace 
? My precise question is why libbacktrace is built once in the build-root 
directory and once in the x86_64-pc-linux-gnu directory?

Kind regards Дилян

On 26 March 2024 16:37:40 UTC, Ian Lance Taylor wrote:
>On Tue, Mar 26, 2024 at 9:33 AM Дилян Палаузов
> wrote:
>>
>> Makefile.def contains already:
>>
>> host_modules= { module= libbacktrace; bootstrap=true; }; // since eff02e4f84 
>> - "libbacktrace/: * Initial implementation" year 2012
>>
>> host_modules= { module= libcpp; bootstrap=true; }; // since 4f4e53dd8517c0b2 
>> - year 2004
>
>Yes.  I was just trying to answer your question.
>
>Ian
>
>> On 25 March 2024 23:59:52 UTC, Ian Lance Taylor wrote:
>>>
>>> On Sat, Mar 23, 2024 at 4:32 AM Дилян Палаузов
>>>  wrote:


>>>>
>>>> Can the build experts say what needs to be changed?  The dependencies I
>>>> added are missing in the build configuration (@if gcc-bootstrap).
>>>>
>>>> I cannot say if libbacktrace should or should not be a bootstrap=true
>>>> module.
>>>
>>>
>>> I don't count as a build expert these days, but since GCC itself links
>>> against libbacktrace, my understanding is that the libbacktrace
>>> host_module should be bootstrap=true, just like, say, libcpp.
>>>
>>> Ian


[PATCH] Fortran: fix NULL pointer dereference on overlapping initialization [PR50410]

2024-03-28 Thread Harald Anlauf
Dear all,

the attached simple, obvious and ancient patch from the PR fixes a
NULL pointer dereference that occurs on overlapping initializations
of derived types/DT components in DATA statements.
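
To illustrate the shape of the fix, here is a small standalone C sketch of
walking two linked lists in lockstep while guarding both cursors. The struct
and function names are hypothetical stand-ins, not gfortran's constructor or
component types; the only point is the loop condition, which mirrors the
"c && cm" check added in the patch.

```c
#include <stddef.h>

/* Hypothetical stand-in for a constructor element.  */
struct demo_ctor
{
  int value;
  struct demo_ctor *next;
};

/* Hypothetical stand-in for a derived-type component.  */
struct demo_comp
{
  const char *name;
  struct demo_comp *next;
};

/* Count the constructor/component pairs that can be visited safely.
   Checking only C would dereference a NULL CM as soon as the
   constructor list is longer than the component list.  */
static int
demo_count_pairs (const struct demo_ctor *c, const struct demo_comp *cm)
{
  int pairs = 0;
  for (; c && cm; c = c->next, cm = cm->next)
    pairs++;
  return pairs;
}
```

With two constructor elements and one component, the loop stops after the
first pair instead of advancing cm past the end of its list.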

Gfortran currently does not detect or report overlapping initializations
in such cases, and some other compilers also do not (Intel) or give only
a warning (e.g. Nvidia).  For this reason I decided to add -std=legacy
to the options in the testcase.  Detecting the overlapping initializations
appears to require deeper changes in the way we look up DT components when
handling DATA statements, which is beyond the current stage.

Regtested on x86_64-pc-linux-gnu.

I intend to commit soon unless there are objections.

Thanks,
Harald

From b3970a30679959eed159dffa816899e4430e9da5 Mon Sep 17 00:00:00 2001
From: Harald Anlauf 
Date: Thu, 28 Mar 2024 22:34:40 +0100
Subject: [PATCH] Fortran: fix NULL pointer dereference on overlapping
 initialization [PR50410]

gcc/fortran/ChangeLog:

	PR fortran/50410
	* trans-expr.cc (gfc_conv_structure): Check for NULL pointer.

gcc/testsuite/ChangeLog:

	PR fortran/50410
	* gfortran.dg/data_initialized_4.f90: New test.
---
 gcc/fortran/trans-expr.cc|  2 +-
 gcc/testsuite/gfortran.dg/data_initialized_4.f90 | 16 
 2 files changed, 17 insertions(+), 1 deletion(-)
 create mode 100644 gcc/testsuite/gfortran.dg/data_initialized_4.f90

diff --git a/gcc/fortran/trans-expr.cc b/gcc/fortran/trans-expr.cc
index 76bed9830c4..7ce798ab8a5 100644
--- a/gcc/fortran/trans-expr.cc
+++ b/gcc/fortran/trans-expr.cc
@@ -9650,7 +9650,7 @@ gfc_conv_structure (gfc_se * se, gfc_expr * expr, int init)
   cm = expr->ts.u.derived->components;

   for (c = gfc_constructor_first (expr->value.constructor);
-   c; c = gfc_constructor_next (c), cm = cm->next)
+   c && cm; c = gfc_constructor_next (c), cm = cm->next)
 {
   /* Skip absent members in default initializers and allocatable
 	 components.  Although the latter have a default initializer
diff --git a/gcc/testsuite/gfortran.dg/data_initialized_4.f90 b/gcc/testsuite/gfortran.dg/data_initialized_4.f90
new file mode 100644
index 000..156b6607edf
--- /dev/null
+++ b/gcc/testsuite/gfortran.dg/data_initialized_4.f90
@@ -0,0 +1,16 @@
+! { dg-do compile }
+! { dg-additional-options "-std=legacy" }
+!
+! PR fortran/50410
+!
+! Silently allow overlapping initialization in legacy mode (used to ICE)
+
+program p
+  implicit none
+  type t
+ integer :: g = 1
+  end type t
+  type(t) :: u = t(2)
+  data u%g /3/
+  print *, u! this might print "2"
+end
--
2.35.3



[PATCH] Prettify output of debug_dwarf_die

2024-03-28 Thread Tom Tromey
When debugging gcc, I tried calling debug_dwarf_die and I saw this
output:

  DW_AT_location: location descriptor:
(0x7fffe9c2e870) DW_OP_dup 0, 0
(0x7fffe9c2e8c0) DW_OP_bra location descriptor (0x7fffe9c2e640)
, 0
(0x7fffe9c2e820) DW_OP_lit4 4, 0
(0x7fffe9c2e910) DW_OP_skip location descriptor (0x7fffe9c2e9b0)
, 0
(0x7fffe9c2e640) DW_OP_dup 0, 0

I think those ", 0" should not appear on their own lines.  The issue
seems to be that print_dw_val should not generally emit a newline,
except when recursing.
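
The convention described above (an operand printer never ends the line itself unless it recursed, and the caller terminates the line) can be sketched in isolation. This is only an illustration under stated assumptions: `Operand`, `print_operand` and `print_op_line` are hypothetical names, not the dwarf2out.cc data structures or functions.

```cpp
#include <cassert>
#include <string>
#include <vector>

// Hypothetical stand-in for a DWARF operand; not GCC's dw_val_node.
struct Operand {
  std::string text;             // e.g. "location descriptor (0x10)"
  std::vector<Operand> nested;  // non-empty only for nested descriptors
};

// Appends the operand with no trailing newline, unless it recursed into
// nested operands, in which case every nested line is terminated.
void print_operand(const Operand& op, bool recurse, std::string& out) {
  out += op.text;
  if (recurse && !op.nested.empty()) {
    out += ":\n";
    for (const Operand& child : op.nested) {
      out += "  ";
      print_operand(child, recurse, out);
      out += "\n";
    }
  }
}

// The caller prints both operands on one line and ends the line itself.
std::string print_op_line(const std::string& name,
                          const Operand& a, const Operand& b) {
  assert(!name.empty());
  std::string out = name + " ";
  print_operand(a, false, out);
  out += ", ";
  print_operand(b, false, out);
  out += "\n";
  return out;
}
```

With this split, the second operand always lands on the same line as the first, which is the behaviour the patch is after.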

gcc/ChangeLog

* dwarf2out.cc (print_dw_val) : Don't
print newline when not recursing.
---
 gcc/dwarf2out.cc | 6 +++---
 1 file changed, 3 insertions(+), 3 deletions(-)

diff --git a/gcc/dwarf2out.cc b/gcc/dwarf2out.cc
index 8f18bc4fe64..1b0e8b5a5b2 100644
--- a/gcc/dwarf2out.cc
+++ b/gcc/dwarf2out.cc
@@ -6651,7 +6651,7 @@ print_dw_val (dw_val_node *val, bool recurse, FILE *outfile)
 case dw_val_class_loc:
   fprintf (outfile, "location descriptor");
   if (val->v.val_loc == NULL)
-   fprintf (outfile, " -> \n");
+   fprintf (outfile, " -> ");
   else if (recurse)
{
  fprintf (outfile, ":\n");
@@ -6662,9 +6662,9 @@ print_dw_val (dw_val_node *val, bool recurse, FILE *outfile)
   else
{
  if (flag_dump_noaddr || flag_dump_unnumbered)
-   fprintf (outfile, " #\n");
+   fprintf (outfile, " #");
  else
-   fprintf (outfile, " (%p)\n", (void *) val->v.val_loc);
+   fprintf (outfile, " (%p)", (void *) val->v.val_loc);
}
   break;
 case dw_val_class_loc_list:
-- 
2.43.0



[committed] Fix failure of c-c++-common/analyzer/stdarg-pr111289-int.c on hpux

2024-03-28 Thread John David Anglin
Fixes conflicting declarations of mode_t.

Tested on hppa-unknown-linux-gnu and hppa64-hp-hpux11.11.
Committed to trunk.

Dave
---

Fix failure of c-c++-common/analyzer/stdarg-pr111289-int.c on hpux

2024-03-28  John David Anglin  

gcc/testsuite/ChangeLog:

PR analyzer/111289
* c-c++-common/analyzer/stdarg-pr111289-int.c: Don't include
.

diff --git a/gcc/testsuite/c-c++-common/analyzer/stdarg-pr111289-int.c b/gcc/testsuite/c-c++-common/analyzer/stdarg-pr111289-int.c
index 33d83169c3e..8faa58c9480 100644
--- a/gcc/testsuite/c-c++-common/analyzer/stdarg-pr111289-int.c
+++ b/gcc/testsuite/c-c++-common/analyzer/stdarg-pr111289-int.c
@@ -1,6 +1,5 @@
 #include 
 #include 
-#include 
 
 typedef unsigned int mode_t;
 




[PATCH] c++: __is_constructible ref binding [PR100667]

2024-03-28 Thread Jason Merrill
Tested x86_64-pc-linux-gnu, applying to trunk.

-- 8< --

The requirement that a type argument be complete is excessive in the case of
direct reference binding to the same type, which does not rely on any
properties of the type.  This is LWG 2939.
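
As a small illustration of why completeness is not needed here, the sketch below binds references to an object whose type is still incomplete when the binding functions are defined. `Widget` and the helper names are hypothetical and are not part of the patch or its test; the snippet only relies on ordinary reference binding, not on the traits themselves.

```cpp
#include <cassert>

struct Widget;  // hypothetical type, deliberately incomplete at this point

// Binding a reference to an lvalue or xvalue of the same type is a direct
// binding: no constructor, size, or layout of Widget is consulted.
const Widget& as_const_ref(Widget& w) { return w; }
Widget&& as_rvalue_ref(Widget& w) { return static_cast<Widget&&>(w); }

// Completing the type afterwards does not affect the functions above.
struct Widget { int id; };

// Both references alias the original object; nothing was copied.
void check_aliasing(Widget& w) {
  assert(&as_const_ref(w) == &w);
  Widget&& r = as_rvalue_ref(w);
  assert(&r == &w);
}
```

Since the binding functions compile before `Widget` is complete, a trait that only asks whether such a binding is possible has no reason to demand a complete type either.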

PR c++/100667

gcc/cp/ChangeLog:

* semantics.cc (same_type_ref_bind_p): New.
(finish_trait_expr): Use it.

gcc/testsuite/ChangeLog:

* g++.dg/ext/is_constructible8.C: New test.
---
 gcc/cp/semantics.cc  | 54 +---
 gcc/testsuite/g++.dg/ext/is_constructible8.C | 31 +++
 2 files changed, 78 insertions(+), 7 deletions(-)
 create mode 100644 gcc/testsuite/g++.dg/ext/is_constructible8.C

diff --git a/gcc/cp/semantics.cc b/gcc/cp/semantics.cc
index adb1ba48d29..9838331d2a9 100644
--- a/gcc/cp/semantics.cc
+++ b/gcc/cp/semantics.cc
@@ -12607,6 +12607,45 @@ check_trait_type (tree type, int kind = 1)
   return true;
 }
 
+/* True iff the conversion (if any) would be a direct reference
+   binding, not requiring complete types.  This is LWG2939.  */
+
+static bool
+same_type_ref_bind_p (cp_trait_kind kind, tree type1, tree type2)
+{
+  tree from, to;
+  switch (kind)
+{
+  /* These put the target type first.  */
+case CPTK_IS_CONSTRUCTIBLE:
+case CPTK_IS_NOTHROW_CONSTRUCTIBLE:
+case CPTK_IS_TRIVIALLY_CONSTRUCTIBLE:
+case CPTK_REF_CONSTRUCTS_FROM_TEMPORARY:
+case CPTK_REF_CONVERTS_FROM_TEMPORARY:
+  to = type1;
+  from = type2;
+  break;
+
+  /* These put it second.  */
+case CPTK_IS_CONVERTIBLE:
+case CPTK_IS_NOTHROW_CONVERTIBLE:
+  to = type2;
+  from = type1;
+  break;
+
+default:
+  gcc_unreachable ();
+}
+
+  if (TREE_CODE (to) != REFERENCE_TYPE || !from)
+return false;
+  if (TREE_CODE (from) == TREE_VEC && TREE_VEC_LENGTH (from) == 1)
+from = TREE_VEC_ELT (from, 0);
+  return (TYPE_P (from)
+ && (same_type_ignoring_top_level_qualifiers_p
+ (non_reference (to), non_reference (from))));
+}
+
 /* Process a trait expression.  */
 
 tree
@@ -12666,20 +12705,21 @@ finish_trait_expr (location_t loc, cp_trait_kind kind, tree type1, tree type2)
return error_mark_node;
   break;
 
-case CPTK_IS_ASSIGNABLE:
 case CPTK_IS_CONSTRUCTIBLE:
-  if (!check_trait_type (type1))
-   return error_mark_node;
-  break;
-
 case CPTK_IS_CONVERTIBLE:
-case CPTK_IS_NOTHROW_ASSIGNABLE:
 case CPTK_IS_NOTHROW_CONSTRUCTIBLE:
 case CPTK_IS_NOTHROW_CONVERTIBLE:
-case CPTK_IS_TRIVIALLY_ASSIGNABLE:
 case CPTK_IS_TRIVIALLY_CONSTRUCTIBLE:
 case CPTK_REF_CONSTRUCTS_FROM_TEMPORARY:
 case CPTK_REF_CONVERTS_FROM_TEMPORARY:
+  /* Don't check completeness for direct reference binding.  */;
+  if (same_type_ref_bind_p (kind, type1, type2))
+   break;
+  gcc_fallthrough ();
+
+case CPTK_IS_ASSIGNABLE:
+case CPTK_IS_NOTHROW_ASSIGNABLE:
+case CPTK_IS_TRIVIALLY_ASSIGNABLE:
   if (!check_trait_type (type1)
  || !check_trait_type (type2))
return error_mark_node;
diff --git a/gcc/testsuite/g++.dg/ext/is_constructible8.C b/gcc/testsuite/g++.dg/ext/is_constructible8.C
new file mode 100644
index 000..a27ec6eddd7
--- /dev/null
+++ b/gcc/testsuite/g++.dg/ext/is_constructible8.C
@@ -0,0 +1,31 @@
+// PR c++/100667
+// { dg-do compile { target c++11 } }
+
+struct T;
+
+#define SA(X) static_assert ((X), #X);
+
+SA (__is_constructible(T&&, T));
+SA (__is_constructible(const T&, T));
+SA (!__is_constructible(T&, T));
+SA (__is_nothrow_constructible(T&&, T));
+SA (__is_nothrow_constructible(const T&, T));
+SA (!__is_nothrow_constructible(T&, T));
+SA (__is_trivially_constructible(T&&, T));
+SA (__is_trivially_constructible(const T&, T));
+SA (!__is_trivially_constructible(T&, T));
+
+SA (__is_convertible(T, T&&));
+SA (__is_convertible(T, const T&));
+SA (!__is_convertible(T, T&));
+SA (__is_nothrow_convertible(T, T&&));
+SA (__is_nothrow_convertible(T, const T&));
+SA (!__is_nothrow_convertible(T, T&));
+
+// All false because either the conversion fails or it doesn't bind a temporary
+SA (!__reference_constructs_from_temporary (T&&, T));
+SA (!__reference_constructs_from_temporary (const T&, T));
+SA (!__reference_constructs_from_temporary (T&, T));
+SA (!__reference_converts_from_temporary (T&&, T));
+SA (!__reference_converts_from_temporary (const T&, T));
+SA (!__reference_converts_from_temporary (T&, T));

base-commit: bbb7c513dddc5c9b2d5e9b78bc1c2f85a0cfe07e
-- 
2.44.0



[oops pushed] aarch64: Fix vld1/st1_x4 intrinsic definitions

2024-03-28 Thread Richard Sandiford
Gah.  As mentioned on irc, I'd written this patch to fix PR114521.
The bug was fixed properly by Jonathan's struct rework in GCC 12,
but that's much too invasive to backport.  The attached patch therefore
deals with the bug directly.

Since it's new work, and since there's only one GCC 11 release to go,
I was originally planning to attach the patch to the PR for any distros
that wanted to take it.  But due to bad use of git, I accidentally
committed the patch while backporting the fix for PR97696.

Andrew suggested that we leave the patch in, so I'll do that unless
anyone objects.  (Please let me know if you do object though!)

Bootstrapped & regression-tested on aarch64-linux-gnu.  The PR contains
a patch to the tests that shows up the problem.

Sorry for the mistake.

Richard

---

The vld1_x4 and vst1_x4 patterns use XI registers for both 64-bit and
128-bit vectors.  This has the nice property that each individual
vector is within a separate 16-byte subreg of the XI, which should
reduce the number of memory spills needed.  However, it means that the
64-bit vector forms must convert between the native 4x64-bit structure
layout and the padded 4x128-bit XI layout.

The vld4 and vst4 functions did this correctly.  But the vld1x4 and
vst1x4 functions used a union between the native and padded layouts,
even though the layouts are different sizes.

This patch makes vld1x4 and vst1x4 use the same approach as vld4
and vst4.  It also fixes some uses of variables in the user namespace.
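
The layout mismatch can be modelled in portable C++ without any NEON types. The sketch below is only an analogy under stated assumptions: `Native4x64`, `Padded4x128`, `get_dreg`, `unpack`, `pack` and `prefix_view` are hypothetical names, and the byte-prefix copy merely stands in for what a union of two differently sized layouts would expose.

```cpp
#include <cassert>
#include <cstdint>
#include <cstring>

// Packed layout: four 64-bit vectors back to back (32 bytes).
struct Native4x64 { std::uint64_t val[4]; };
// Padded layout: each 64-bit vector sits in its own 16-byte slot (64 bytes).
struct Padded4x128 { std::uint64_t slot[4][2]; };

static_assert(sizeof(Native4x64) == 32, "packed layout is 4 x 8 bytes");
static_assert(sizeof(Padded4x128) == 64, "padded layout is 4 x 16 bytes");

// Hypothetical analogue of extracting one 64-bit vector from the padded form.
std::uint64_t get_dreg(const Padded4x128& p, int i) {
  assert(i >= 0 && i < 4);
  return p.slot[i][0];
}

// Element-wise conversion, the approach the patch adopts.
Native4x64 unpack(const Padded4x128& p) {
  Native4x64 r;
  for (int i = 0; i < 4; ++i)
    r.val[i] = get_dreg(p, i);
  return r;
}

Padded4x128 pack(const Native4x64& n) {
  Padded4x128 p = {};
  for (int i = 0; i < 4; ++i)
    p.slot[i][0] = n.val[i];
  return p;
}

// Models the size-mismatched reinterpretation: only the first 32 bytes of the
// padded form are seen, so padding is read where data was expected.
Native4x64 prefix_view(const Padded4x128& p) {
  Native4x64 r;
  std::memcpy(&r, &p, sizeof r);
  return r;
}
```

For input values 1, 2, 3, 4, the element-wise path round-trips them unchanged, while the prefix view yields 1, 0, 2, 0 in this model.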

gcc/
* config/aarch64/arm_neon.h (vld1_s8_x4, vld1_s16_x4, vld1_s32_x4):
(vld1_u8_x4, vld1_u16_x4, vld1_u32_x4, vld1_f16_x4, vld1_f32_x4):
(vld1_p8_x4, vld1_p16_x4, vld1_s64_x4, vld1_u64_x4, vld1_p64_x4):
(vld1_f64_x4): Avoid using a union of a 256-bit structure and 512-bit
XImode integer.  Instead use the same approach as the vld4 intrinsics.
(vst1_s8_x4, vst1_s16_x4, vst1_s32_x4, vst1_u8_x4, vst1_u16_x4):
(vst1_u32_x4, vst1_f16_x4, vst1_f32_x4, vst1_p8_x4, vst1_p16_x4):
(vst1_s64_x4, vst1_u64_x4, vst1_p64_x4, vst1_f64_x4, vld1_bf16_x4):
(vst1_bf16_x4): Likewise for stores.
(vst1q_s8_x4, vst1q_s16_x4, vst1q_s32_x4, vst1q_u8_x4, vst1q_u16_x4):
(vst1q_u32_x4, vst1q_f16_x4, vst1q_f32_x4, vst1q_p8_x4, vst1q_p16_x4):
(vst1q_s64_x4, vst1q_u64_x4, vst1q_p64_x4, vst1q_f64_x4)
(vst1q_bf16_x4): Rename val parameter to __val.
---
 gcc/config/aarch64/arm_neon.h | 469 --
 1 file changed, 334 insertions(+), 135 deletions(-)

diff --git a/gcc/config/aarch64/arm_neon.h b/gcc/config/aarch64/arm_neon.h
index baa30bd5a9d..8f53f4e1559 100644
--- a/gcc/config/aarch64/arm_neon.h
+++ b/gcc/config/aarch64/arm_neon.h
@@ -16498,10 +16498,14 @@ __extension__ extern __inline int8x8x4_t
 __attribute__ ((__always_inline__, __gnu_inline__, __artificial__))
 vld1_s8_x4 (const int8_t *__a)
 {
-  union { int8x8x4_t __i; __builtin_aarch64_simd_xi __o; } __au;
-  __au.__o
-= __builtin_aarch64_ld1x4v8qi ((const __builtin_aarch64_simd_qi *) __a);
-  return __au.__i;
+  int8x8x4_t ret;
+  __builtin_aarch64_simd_xi __o;
+  __o = __builtin_aarch64_ld1x4v8qi ((const __builtin_aarch64_simd_qi *) __a);
+  ret.val[0] = (int8x8_t) __builtin_aarch64_get_dregxiv8qi (__o, 0);
+  ret.val[1] = (int8x8_t) __builtin_aarch64_get_dregxiv8qi (__o, 1);
+  ret.val[2] = (int8x8_t) __builtin_aarch64_get_dregxiv8qi (__o, 2);
+  ret.val[3] = (int8x8_t) __builtin_aarch64_get_dregxiv8qi (__o, 3);
+  return ret;
 }
 
 __extension__ extern __inline int8x16x4_t
@@ -16518,10 +16522,14 @@ __extension__ extern __inline int16x4x4_t
 __attribute__ ((__always_inline__, __gnu_inline__, __artificial__))
 vld1_s16_x4 (const int16_t *__a)
 {
-  union { int16x4x4_t __i; __builtin_aarch64_simd_xi __o; } __au;
-  __au.__o
-= __builtin_aarch64_ld1x4v4hi ((const __builtin_aarch64_simd_hi *) __a);
-  return __au.__i;
+  int16x4x4_t ret;
+  __builtin_aarch64_simd_xi __o;
+  __o = __builtin_aarch64_ld1x4v4hi ((const __builtin_aarch64_simd_hi *) __a);
+  ret.val[0] = (int16x4_t) __builtin_aarch64_get_dregxiv4hi (__o, 0);
+  ret.val[1] = (int16x4_t) __builtin_aarch64_get_dregxiv4hi (__o, 1);
+  ret.val[2] = (int16x4_t) __builtin_aarch64_get_dregxiv4hi (__o, 2);
+  ret.val[3] = (int16x4_t) __builtin_aarch64_get_dregxiv4hi (__o, 3);
+  return ret;
 }
 
 __extension__ extern __inline int16x8x4_t
@@ -16538,10 +16546,14 @@ __extension__ extern __inline int32x2x4_t
 __attribute__ ((__always_inline__, __gnu_inline__, __artificial__))
 vld1_s32_x4 (const int32_t *__a)
 {
-  union { int32x2x4_t __i; __builtin_aarch64_simd_xi __o; } __au;
-  __au.__o
-  = __builtin_aarch64_ld1x4v2si ((const __builtin_aarch64_simd_si *) __a);
-  return __au.__i;
+  int32x2x4_t ret;
+  __builtin_aarch64_simd_xi __o;
+  __o = __builtin_aarch64_ld1x4v2si ((const __builtin_aarch64_simd_si *) __a);
+  ret.val[0] = (int32x2_t) __builtin_aarch64_get_dregxiv2si (__o, 0);
+  ret.val[1] = (int32x2_t) 

[COMMITTED] RISC-V: testsuite: ensure vtype is call clobbered

2024-03-28 Thread Vineet Gupta
Per classic Vector calling convention ABI, vtype is call clobbered,
so ensure gcc regenerates a VSETVLI in following cases:
 - after a function call.
 - after an inline asm fragment which clobbers vtype.

ATM gcc seems to be doing the right thing, but a test can never hurt.

gcc/testsuite/ChangeLog:
* gcc.target/riscv/rvv/vtype-call-clobbered.c: New Test.

Signed-off-by: Vineet Gupta 
---
 .../riscv/rvv/vtype-call-clobbered.c  | 47 +++
 1 file changed, 47 insertions(+)
 create mode 100644 gcc/testsuite/gcc.target/riscv/rvv/vtype-call-clobbered.c

diff --git a/gcc/testsuite/gcc.target/riscv/rvv/vtype-call-clobbered.c b/gcc/testsuite/gcc.target/riscv/rvv/vtype-call-clobbered.c
new file mode 100644
index ..be9f312aa508
--- /dev/null
+++ b/gcc/testsuite/gcc.target/riscv/rvv/vtype-call-clobbered.c
@@ -0,0 +1,47 @@
+/* { dg-do compile } */
+/* { dg-options "-march=rv64gcv -mabi=lp64 -O2" } */
+/* { dg-skip-if "" { *-*-* } { "-O0" "-Og" "-Os" "-Oz" } } */
+
+#include "riscv_vector.h"
+
+extern void can_clobber_vtype();
+
+static inline void v_loop (void * restrict in, void * restrict out, int n)
+{
+  for (int i = 0; i < n; i++)
+{
+  vuint8mf8_t v = *(vuint8mf8_t*)(in + i);
+  *(vuint8mf8_t*)(out + i) = v;
+}
+}
+
+/* Two V instructions back-back.
+   Only 1 vsetvli insn.  */
+void
+vec1 (void * restrict in, void * restrict out1,  void * restrict out2, int n)
+{
+ v_loop(in, out1, n);
+ v_loop(in, out2, n);
+}
+
+/* Two V instructions separated by a function call.
+   Both need to have a corresponding vsetvli insn.  */
+void
+vec2 (void * restrict in, void * restrict out1,  void * restrict out2, int n)
+{
+ v_loop(in, out1, n);
+ can_clobber_vtype();
+ v_loop(in, out2, n);
+}
+
+/* Two V instructions separated by an inline asm with vtype clobber.
+   Both need to have a corresponding vsetvli insn.  */
+void
+vec3 (void * restrict in, void * restrict out1,  void * restrict out2, int n)
+{
+ v_loop(in, out1, n);
+ asm volatile("":::"vtype");
+ v_loop(in, out2, n);
+}
+
+/* { dg-final { scan-assembler-times {vsetvli} 5 } } */
-- 
2.34.1



[Patch, fortran] PR110987 and PR113885 - gimplifier ICEs and wrong results in finalization

2024-03-28 Thread Paul Richard Thomas
Hi All,

The attached patch has two elements:

(i) A fix for gimplifier ICEs with derived type having no components. The
reporter himself suggested (thanks Kirill!):

-  if (derived && derived->attr.zero_comp)
+  if (derived && (derived->components == NULL))

As far as I can tell, this is the correct fix. I tried setting
attr.zero_comp in resolve.cc for all the OK types without components but
this caused all sorts of fallout.

(ii) Final calls were occurring in the wrong place for finalizable
elemental function calls within scalarizer loops. This caused incorrect
results even for derived types with components. This is also fixed.

It should be noted that finalizer calls from the rhs of an assignment are
occurring at the wrong time, since F2018/24-7.5.6.3 requires:
"If an executable construct references a nonpointer function, the result is
finalized after execution of the innermost executable construct containing
the reference.", while in the present implementation this happens just
before assignment to the lhs temporary. Fixing this is going to be really
tough and invasive, so I decided that getting the right results and the
correct number of finalizations should be sufficient for the 14-branch
release. As it happens, I had been mulling over how to do this for
finalizations hidden in constructors and other contexts than assignment
(e.g. write statements or allocation with source). It's a few months away
and will be appropriate for stage 1.

Regtests on x86_64 - OK for mainline and then, after a bit, for backporting
to 13-branch?

Regards to all

Paul

Fortran: Fix a gimplifier ICE/wrong result with finalization [PR104555]

2024-03-28  Paul Thomas  

gcc/fortran
PR fortran/36337
PR fortran/110987
PR fortran/113885
* trans-expr.cc (gfc_trans_assignment_1): Place finalization
block before rhs post block for elemental rhs.
* trans.cc (gfc_finalize_tree_expr): Check directly if a type
has no components, rather than the zero components attribute.
Treat elemental zero component expressions in the same way as
scalars.


gcc/testsuite/
PR fortran/113885
* gfortran.dg/finalize_54.f90: New test.
* gfortran.dg/finalize_55.f90: New test.

gcc/testsuite/
PR fortran/110987
* gfortran.dg/finalize_56.f90: New test.
diff --git a/gcc/fortran/trans-expr.cc b/gcc/fortran/trans-expr.cc
index 76bed9830c4..079ac93aa8a 100644
--- a/gcc/fortran/trans-expr.cc
+++ b/gcc/fortran/trans-expr.cc
@@ -12511,11 +12511,14 @@ gfc_trans_assignment_1 (gfc_expr * expr1, gfc_expr * expr2, bool init_flag,
   gfc_add_block_to_block (, );
   gfc_add_expr_to_block (, tmp);

-  /* Add the post blocks to the body.  */
-  if (!l_is_temp)
+  /* Add the post blocks to the body.  Scalar finalization must appear before
+ the post block in case any deallocations are done.  */
+  if (rse.finalblock.head
+  && (!l_is_temp || (expr2->expr_type == EXPR_FUNCTION
+			 && gfc_expr_attr (expr2).elemental)))
 {
-  gfc_add_block_to_block (, );
   gfc_add_block_to_block (, );
+  gfc_add_block_to_block (, );
 }
   else
 gfc_add_block_to_block (, );
diff --git a/gcc/fortran/trans.cc b/gcc/fortran/trans.cc
index 7f50b16aee9..badad6ae892 100644
--- a/gcc/fortran/trans.cc
+++ b/gcc/fortran/trans.cc
@@ -1624,7 +1624,7 @@ gfc_finalize_tree_expr (gfc_se *se, gfc_symbol *derived,
 }
   else if (derived && gfc_is_finalizable (derived, NULL))
 {
-  if (derived->attr.zero_comp && !rank)
+  if (!derived->components && (!rank || attr.elemental))
 	{
 	  /* Any attempt to assign zero length entities, causes the gimplifier
 	 all manner of problems. Instead, a variable is created to act as
@@ -1675,7 +1675,7 @@ gfc_finalize_tree_expr (gfc_se *se, gfc_symbol *derived,
 	  final_fndecl);
   if (!GFC_DESCRIPTOR_TYPE_P (TREE_TYPE (desc)))
 {
-  if (is_class)
+  if (is_class || attr.elemental)
 	desc = gfc_conv_scalar_to_descriptor (se, desc, attr);
   else
 	{
@@ -1685,7 +1685,7 @@ gfc_finalize_tree_expr (gfc_se *se, gfc_symbol *derived,
 	}
 }

-  if (derived && derived->attr.zero_comp)
+  if (derived && !derived->components)
 {
   /* All the conditions below break down for zero length derived types.  */
   tmp = build_call_expr_loc (input_location, final_fndecl, 3,
diff --git a/gcc/testsuite/gfortran.dg/finalize_54.f90 b/gcc/testsuite/gfortran.dg/finalize_54.f90
new file mode 100644
index 000..73d32b1b333
--- /dev/null
+++ b/gcc/testsuite/gfortran.dg/finalize_54.f90
@@ -0,0 +1,47 @@
+! { dg-do compile }
+! Test the fix for PR113885, where not only was there a gimplifier ICE
+! for a derived type 't' with no components but, with a component, gfortran
+! gave wrong results.
+! Contributed by David Binderman  
+!
+module types
+  type t
+   contains
+ final :: finalize
+  end type t
+contains
+  pure subroutine finalize(x)
+type(t), intent(inout) :: x
+  end subroutine finalize
+end module types
+
+subroutine test1(x)
+  use types
+  interface
+ elemental function elem(x)
+  

Re: [PATCH] ipa: Avoid duplicate replacements in IPA-SRA transformation phase

2024-03-28 Thread Martin Jambor
Hello,

and ping, please.  (In my copy I have fixed the formatting issue spotted
by Jakub.)

Martin

On Fri, Mar 15 2024, Martin Jambor wrote:
> Hi,
>
> when the analysis part of IPA-SRA figures out that it would split out
> a scalar part of an aggregate which is known by IPA-CP to contain a
> known constant, it skips it knowing that the transformation part looks
> at IPA-CP aggregate results too and does the right thing (which can
> include doing the propagation in GIMPLE because that is the last
> moment the parameter exists).
>
> However, when IPA-SRA wants to split out a smaller non-aggregate out
> of an aggregate, which happens to be of the same size as a known
> scalar constant at the same offset, the transformation bit fails to
> recognize the situation, tries to do both splitting and constant
> propagation and in PR 111571 testcase creates a nonsensical call
> statement on which the call redirection then ICEs.
>
> Fixed by making sure we don't try to do two replacements of the same
> part of the same parameter.
>
> The look-up among replacements requires these are sorted and this
> patch just sorts them if they are not already sorted before each new
> look-up.  The worst number of sortings that can happen is number of
> parameters which are both split and have aggregate constants times
> param_ipa_max_agg_items (default 16).  I don't think complicating the
> source code to optimize for this unlikely case is worth it but if need
> be, it can of course be done.
>
> Bootstrapped and tested on x86_64-linux.  OK for master and eventually
> also the gcc-13 branch?
>
> Thanks,
>
> Martin
>
>
>
> gcc/ChangeLog:
>
> 2024-03-15  Martin Jambor  
>
>   PR ipa/111571
>   * ipa-param-manipulation.cc
>   (ipa_param_body_adjustments::common_initialization): Avoid creating
>   duplicate replacement entries.
>
> gcc/testsuite/ChangeLog:
>
> 2024-03-15  Martin Jambor  
>
>   PR ipa/111571
>   * gcc.dg/ipa/pr111571.c: New test.
> ---
>  gcc/ipa-param-manipulation.cc   | 16 
>  gcc/testsuite/gcc.dg/ipa/pr111571.c | 29 +
>  2 files changed, 45 insertions(+)
>  create mode 100644 gcc/testsuite/gcc.dg/ipa/pr111571.c
>
> diff --git a/gcc/ipa-param-manipulation.cc b/gcc/ipa-param-manipulation.cc
> index 3e0df6a6f77..4c6337cc563 100644
> --- a/gcc/ipa-param-manipulation.cc
> +++ b/gcc/ipa-param-manipulation.cc
> @@ -1525,6 +1525,22 @@ ipa_param_body_adjustments::common_initialization (tree old_fndecl,
>replacement with a constant (for split aggregates passed
>by value).  */
>  
> +   if (split[parm_num])
> + {
> +   /* We must be careful not to add a duplicate
> +  replacement. */
> +   sort_replacements ();
> +   ipa_param_body_replacement *pbr =
> + lookup_replacement_1 (m_oparms[parm_num],
> +   av.unit_offset);
> +   if (pbr)
> + {
> +   /* Otherwise IPA-SRA should have bailed out.  */
> +   gcc_assert (AGGREGATE_TYPE_P (TREE_TYPE (pbr->repl)));
> +   continue;
> + }
> + }
> +
> tree repl;
> if (av.by_ref)
>   repl = av.value;
> diff --git a/gcc/testsuite/gcc.dg/ipa/pr111571.c b/gcc/testsuite/gcc.dg/ipa/pr111571.c
> new file mode 100644
> index 000..2a4adc608db
> --- /dev/null
> +++ b/gcc/testsuite/gcc.dg/ipa/pr111571.c
> @@ -0,0 +1,29 @@
> +/* { dg-do compile } */
> +/* { dg-options "-O2"  } */
> +
> +struct a {
> +  int b;
> +};
> +struct c {
> +  long d;
> +  struct a e;
> +  long f;
> +};
> +int g, h, i;
> +int j() {return 0;}
> +static void k(struct a l, int p) {
> +  if (h)
> +g = 0;
> +  for (; g; g = j())
> +if (l.b)
> +  break;
> +}
> +static void m(struct c l) {
> +  k(l.e, l.f);
> +  for (;; --i)
> +;
> +}
> +int main() {
> +  struct c n = {10, 9};
> +  m(n);
> +}
> -- 
> 2.44.0


Re: [PATCHv2 2/2] aarch64: Add support for _BitInt

2024-03-28 Thread Richard Sandiford
Jakub Jelinek  writes:
> On Thu, Mar 28, 2024 at 03:00:46PM +, Richard Sandiford wrote:
>> >* gcc.target/aarch64/bitint-alignments.c: New test.
>> >* gcc.target/aarch64/bitint-args.c: New test.
>> >* gcc.target/aarch64/bitint-sizes.c: New test.
>> >* gcc.target/aarch64/bitfield-bitint-abi.h: New header.
>> >* gcc.target/aarch64/bitfield-bitint-abi-align16.c: New test.
>> >* gcc.target/aarch64/bitfield-bitint-abi-align8.c: New test.
>> 
>> Since we don't support big-endian yet, I assume the tests should be
>> conditional on aarch64_little_endian.
>
> Perhaps better on bitint effective target, then they'll become available
> automatically as soon as big endian aarch64 _BitInt support is turned on.

Ah, yeah, good point.

Richard


Re: [PATCHv2 2/2] aarch64: Add support for _BitInt

2024-03-28 Thread Jakub Jelinek
On Thu, Mar 28, 2024 at 03:00:46PM +, Richard Sandiford wrote:
> > * gcc.target/aarch64/bitint-alignments.c: New test.
> > * gcc.target/aarch64/bitint-args.c: New test.
> > * gcc.target/aarch64/bitint-sizes.c: New test.
> > * gcc.target/aarch64/bitfield-bitint-abi.h: New header.
> > * gcc.target/aarch64/bitfield-bitint-abi-align16.c: New test.
> > * gcc.target/aarch64/bitfield-bitint-abi-align8.c: New test.
> 
> Since we don't support big-endian yet, I assume the tests should be
> conditional on aarch64_little_endian.

Perhaps better on bitint effective target, then they'll become available
automatically as soon as big endian aarch64 _BitInt support is turned on.

Jakub



Re: [PATCHv2 2/2] aarch64: Add support for _BitInt

2024-03-28 Thread Richard Sandiford
"Andre Vieira (lists)"  writes:
> This patch adds support for C23's _BitInt for the AArch64 port when 
> compiling for little endianness.  Big endianness requires further 
> target-agnostic support and we therefore disable it for now.
>
> The tests expose some suboptimal codegen for which I'll create PRs for 
> optimizations after this goes in.
>
> gcc/ChangeLog:
>
>   * config/aarch64/aarch64.cc (TARGET_C_BITINT_TYPE_INFO): Declare MACRO.
>   (aarch64_bitint_type_info): New function.
>   (aarch64_return_in_memory_1): Return large _BitInt's in memory.
>   (aarch64_function_arg_alignment): Adapt to correctly return the ABI
>   mandated alignment of _BitInt(N) where N > 128 as the alignment of
>   TImode.
>   (aarch64_composite_type_p): Return true for _BitInt(N), where N > 128.
>
> libgcc/ChangeLog:
>
>   * config/aarch64/t-softfp (softfp_extras): Add floatbitinthf,
>   floatbitintbf, floatbitinttf and fixtfbitint.
>   * config/aarch64/libgcc-softfp.ver (GCC_14.0.0): Add __floatbitinthf,
>   __floatbitintbf, __floatbitinttf and __fixtfbitint.
>
> gcc/testsuite/ChangeLog:
>
>   * gcc.target/aarch64/bitint-alignments.c: New test.
>   * gcc.target/aarch64/bitint-args.c: New test.
>   * gcc.target/aarch64/bitint-sizes.c: New test.
>   * gcc.target/aarch64/bitfield-bitint-abi.h: New header.
>   * gcc.target/aarch64/bitfield-bitint-abi-align16.c: New test.
>   * gcc.target/aarch64/bitfield-bitint-abi-align8.c: New test.

Since we don't support big-endian yet, I assume the tests should be
conditional on aarch64_little_endian.

> [...]
> diff --git a/gcc/testsuite/gcc.target/aarch64/bitfield-bitint-abi-align16.c b/gcc/testsuite/gcc.target/aarch64/bitfield-bitint-abi-align16.c
> new file mode 100644
> index 
> ..048d04e4c1bf90215892aa0173f6246a097d
> --- /dev/null
> +++ b/gcc/testsuite/gcc.target/aarch64/bitfield-bitint-abi-align16.c
> @@ -0,0 +1,378 @@
> +/* { dg-do compile } */
> +/* { dg-options "-O2 -fno-stack-protector -save-temps -fno-schedule-insns -fno-schedule-insns2" } */
> +/* { dg-final { check-function-bodies "**" "" "" } } */
> +
> +#define ALIGN 16
> +#include "bitfield-bitint-abi.h"
> +
> +// f1-f16 are all the same
> +
> +/*
> +** f1:
> +**   and x0, x2, 1
> +**   ret
> +*/
> +/*
> +** f8:
> +**   and x0, x2, 1
> +**   ret
> +*/
> +/*
> +** f16:
> +**   and x0, x2, 1
> +**   ret
> +*/
> +
> +/* fp seems to be unable to optimize away stack-usage, TODO: to fix.  */
> +
> +/*
> +** fp:
> +**...
> +**   and x0, x1, 1
> +**...
> +**   ret
> +*/
> +
> +// all other f1p-f8p generate the same code, for f16p the value comes from x2
> +/*
> +** f1p:
> +**   and x0, x1, 1
> +**   ret
> +*/
> +/*
> +** f8p:
> +**   and x0, x1, 1
> +**   ret
> +*/
> +/*
> +** f16p:
> +**   and x0, x2, 1
> +**   ret
> +*/
> +
> +// g1-g16 are all the same
> +/*
> +** g1:
> +**   mov (x[0-9]+), x0
> +**   mov w0, w1
> +**   and x4, \1, 9223372036854775807
> +**   and x2, \1, 1
> +**   mov x3, 0
> +**   b   f1
> +*/
> +
> +/*
> +** g8:
> +**   mov (x[0-9]+), x0
> +**   mov w0, w1
> +**   and x4, \1, 9223372036854775807
> +**   and x2, \1, 1
> +**   mov x3, 0
> +**   b   f8
> +*/
> +/*
> +** g16:
> +**   mov (x[0-9]+), x0
> +**   mov w0, w1
> +**   and x4, \1, 9223372036854775807
> +**   and x2, \1, 1
> +**   mov x3, 0
> +**   b   f16
> +*/
> +
> +// again gp different from the rest
> +
> +/*
> +** gp:
> +**   sub sp, sp, #16
> +**   mov (x[0-9]+), x0
> +**   mov w0, w1
> +**   sbfxx([0-9]+), \1, 0, 63
> +**   mov (w[0-9]+), 0
> +**   bfi \3, w\2, 0, 1
> +**   and x3, x\2, 9223372036854775807
> +**   mov x2, 0
> +**   str xzr, \[sp\]
> +**   strb\3, \[sp\]
> +**   ldr x1, \[sp\]
> +**   add sp, sp, 16
> +**   b   fp
> +*/
> +
> +// g1p-g8p are all the same, g16p uses x2 to pass parameter to f16p
> +
> +/*
> +** g1p:
> +**   mov (w[0-9]+), w1
> +**   and x3, x0, 9223372036854775807
> +**   and x1, x0, 1
> +**   mov x2, 0
> +**   mov w0, \1
> +**   b   f1p
> +*/
> +/*
> +** g8p:
> +**   mov (w[0-9]+), w1
> +**   and x3, x0, 9223372036854775807
> +**   and x1, x0, 1
> +**   mov x2, 0
> +**   mov w0, \1
> +**   b   f8p
> +*/
> +/*
> +** g16p:
> +**   mov (x[0-9]+), x0
> +**   mov w0, w1
> +**   and x4, \1, 9223372036854775807
> +**   and x2, \1, 1
> +**   mov x3, 0
> +**   b   f16p
> +*/
> +
> +// f*_stack are all the same
> +/*
> +** f1_stack:
> +**   ldr (x[0-9]+), \[sp, 16\]
> +**   and x0, \1, 1
> +**   ret
> +*/
> +/*
> +** f8_stack:
> +**   ldr (x[0-9]+), \[sp, 16\]
> +**   and x0, \1, 1
> +**   ret
> +*/
> +/*
> +** f16_stack:
> +**   ldr (x[0-9]+), \[sp, 16\]
> +**   and x0, \1, 1
> +**   ret
> +*/
> +
> +// fp{,1,8}_stack are all the same but fp16_stack loads from 

Re: [PATCH] libstdc++: add ARM SVE support to std::experimental::simd

2024-03-28 Thread Matthias Kretz
On Mittwoch, 27. März 2024 14:34:52 CET Richard Sandiford wrote:
> Matthias Kretz  writes:
> > The big issue here is that, IIUC, a user (and the simd library) cannot do
> > the right thing at the moment. There simply isn't enough context
> > information available when parsing the  header. I.e.
> > on definition of the class template there's no facility to take
> > target_clones or SME "streaming" mode into account. Consequently, if we
> > want the library to be fit for SME, then we need more language
> > extension(s) to make it work.
> 
> Yeah.  I think the same applies to plain SVE.

With "plain SVE" you mean the *scalable* part of it, right? BTW, I've 
experimented with implementing simd basically as

template 
class simd
{
  alignas(bit_ceil(sizeof(T) * N)) T data[N];

See here: https://compiler-explorer.com/z/WW6KqanTW

Maybe the compiler can get better at optimizing this approach. But for now 
it's not a solution for a *scalable* variant, because every code is going to 
be load/store bound from the get go.
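
For readers without access to the Compiler Explorer link, here is a minimal self-contained sketch of the array-backed idea. It is not the linked code: the template parameter list, the name `simd_sketch`, the members beyond `data`, and the helper `ceil_pow2` (standing in for `bit_ceil` so that the snippet does not require C++20) are all assumptions made for illustration.

```cpp
#include <cassert>
#include <cstddef>

// Smallest power of two >= n; stands in for std::bit_ceil.
constexpr std::size_t ceil_pow2(std::size_t n) {
  std::size_t p = 1;
  while (p < n)
    p <<= 1;
  return p;
}

// Hypothetical array-backed SIMD-like type: N elements of T, over-aligned to
// the next power of two so the compiler may use full-width loads and stores.
template <typename T, std::size_t N>
class simd_sketch {
  alignas(ceil_pow2(sizeof(T) * N)) T data[N];

public:
  explicit simd_sketch(T v) {
    for (T& x : data)
      x = v;
  }

  T operator[](std::size_t i) const {
    assert(i < N);
    return data[i];
  }

  // Element-wise addition written as a plain loop; whether it is vectorized
  // is left entirely to the optimizer.
  friend simd_sketch operator+(simd_sketch a, const simd_sketch& b) {
    for (std::size_t i = 0; i < N; ++i)
      a.data[i] += b.data[i];
    return a;
  }
};
```

Because every operation goes through the array in memory, the optimizer has to prove it can keep the data in registers, which is where the load/store concern above comes from.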

@Srinivas: See the guard variables for __index0123? They need to go. I believe 
you can and should declare them `constexpr`.

>  It seems reasonable to
> have functions whose implementation is specialised for a specific SVE
> length, with that function being selected at runtime where appropriate.
> Those functions needn't (in principle) be in separate TUs.  The “best”
> definition of native then becomes a per-function property rather
> than a per-TU property.

Hmm, I never considered this; but can one actually write fixed-length SVE code 
if -msve-vector-bits is not given? Then it's certainly possible to write a 
single TU with a runtime dispatch for all different SVE-widths. (This is less 
interesting on x86 where we need to dispatch on ISA extensions *and* vector 
width. It's much simpler (and safer) to compile a TU multiple times, 
restricted to a certain set of ISA extensions and then dispatch to the right 
translation from some general code section.)
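
The dispatch pattern discussed here can be sketched in plain C++. This is a rough illustration only: the kernels are ordinary templates rather than separately compiled, ISA-restricted translations, and `sum_kernel`, `SumFn` and `select_sum` are hypothetical names. How the vector width is detected at runtime is deliberately left out.

```cpp
#include <cassert>
#include <cstddef>

// One kernel per assumed lane count. In a real build each instantiation could
// live in its own TU compiled with different target flags.
template <std::size_t Lanes>
float sum_kernel(const float* x, std::size_t n) {
  assert(x != nullptr || n == 0);
  float acc[Lanes] = {};
  std::size_t i = 0;
  for (; i + Lanes <= n; i += Lanes)      // full blocks, one partial sum per lane
    for (std::size_t l = 0; l < Lanes; ++l)
      acc[l] += x[i + l];
  float total = 0.0f;
  for (std::size_t l = 0; l < Lanes; ++l)  // combine the per-lane partial sums
    total += acc[l];
  for (; i < n; ++i)                       // scalar tail
    total += x[i];
  return total;
}

using SumFn = float (*)(const float*, std::size_t);

// Hypothetical selector: maps a vector width in bits (however the program
// obtained it) to the matching kernel, falling back to the narrowest one.
SumFn select_sum(unsigned vector_bits) {
  switch (vector_bits) {
    case 512: return sum_kernel<16>;
    case 256: return sum_kernel<8>;
    default:  return sum_kernel<4>;
  }
}
```

The general code section would call `select_sum` once and keep the returned pointer, so the per-call cost is a single indirect call.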

> As you note later, I think the same thing would apply to x86_64.

Yes. I don't think "same" is the case (yet) but it's very similar. Once ARM is 
at SVE9 and binaries need to support HW from SVE2 up to SVE9 it gets closer 
to "same".

> > The big issue I see here is that currently all of std::* is declared
> > without a arm_streaming or arm_streaming_compatible. Thus, IIUC, you
> > can't use anything from the standard library in streaming mode. Since
> > that also applies to std::experimental::simd, we're not creating a new
> > footgun, only missing out on potential users?
> 
> Kind-of.  However, we can inline a non-streaming function into a streaming
> function if that doesn't change defined behaviour.  And that's important
> in practice for C++, since most trivial inline functions will not be
> marked streaming-compatible despite being so in practice.

Ah good to know that it takes a pragmatic approach here. But I imagine this 
could become a source of confusion to users.

> > [...]
> > the compiler *must* virally apply target_clones to all functions it calls.
> > And member functions must either also get cloned as functions, or the
> > whole type must be cloned (as in the std::simd case, where the sizeof
> > needs to change). 
> Yeah, tricky :)
> 
> It's also not just about vector widths.  The target-clones case also has
> the problem that you cannot detect at include time which features are
> available.  E.g. “do I have SVE2-specific instructions?” becomes a
> contextual question rather than a global question.
> 
> Fortunately, this should just be a missed optimisation.  But it would be
> nice if uses of std::simd in SVE2 clones could take advantage of SVE2-only
> instructions, even if SVE2 wasn't enabled at include time.

Exactly. Even if we solve the scalable vector-length question, the 
target_clones question stays relevant.

So far my best answer, for x86 at least, is to compile the SIMD code multiple 
times into different shared libraries. And then let the dynamic linker pick 
the right library variant depending on the CPU. I'd be happy to have something 
simpler and working right out of the box.

Best,
  Matthias

-- 
──
 Dr. Matthias Kretz   https://mattkretz.github.io
 GSI Helmholtz Center for Heavy Ion Research   https://gsi.de
 std::simd
──





Re: [PATCH] RISC-V: Refine the condition for add additional vars in RVV cost model

2024-03-28 Thread Jeff Law




On 3/28/24 4:31 AM, demin.han wrote:

adjacent_dr_p is a sufficient but not necessary condition for contiguous access,
so unnecessary live ranges are added and result in spills.

This patch uses MEMORY_ACCESS_TYPE as condition and constrains segment
load/store.
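
The refined condition can be sketched on a toy model. This is not the GCC code: `stmt_model`, `access_kind` and `max_lmul` are hypothetical stand-ins, and `max_lmul = 8` is assumed to correspond to RVV_M8:

```cpp
#include <cassert>

// Toy stand-ins for the vectorizer's statement info.
enum class access_kind { contiguous, gather_scatter, load_store_lanes };

struct stmt_model
{
  bool is_load_or_store;
  access_kind access;
  unsigned group_size;  // number of fields in a segment access
  unsigned lmul;        // register group multiplier of the vector mode
};

constexpr unsigned max_lmul = 8;  // assumed to correspond to RVV_M8

// True if the statement is expected to need an extra vector variable:
// either a gather/scatter, or a segment access too wide for one group.
inline bool needs_additional_vector_vars(const stmt_model& s)
{
  assert(s.lmul >= 1 && s.lmul <= max_lmul);
  if (!s.is_load_or_store)
    return false;
  if (s.access == access_kind::gather_scatter)
    return true;
  return s.group_size * s.lmul > max_lmul;
}
```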

Tested on RV64 and no regression.

PR target/114506

gcc/ChangeLog:

* config/riscv/riscv-vector-costs.cc (non_contiguous_memory_access_p): 
Rename
(need_additional_vector_vars_p): Rename and refine condition

gcc/testsuite/ChangeLog:

* gcc.dg/vect/costmodel/riscv/rvv/pr114506.c: New test.

Note I think this should defer to gcc-15.  It doesn't affect code
correctness AFAICT and it's not a regression relative to gcc-13.


jeff



[committed] predict: Fix comment typo

2024-03-28 Thread Jakub Jelinek
Hi!

I've noticed a typo in a comment.

Fixed thusly, committed to trunk as obvious.

2024-03-28  Jakub Jelinek  

* predict.cc (estimate_bb_frequencies): Fix comment typo,
scalling -> scaling.

--- gcc/predict.cc.jj   2024-01-18 08:44:33.593917768 +0100
+++ gcc/predict.cc  2024-03-28 14:20:22.642724959 +0100
@@ -4035,7 +4035,7 @@ estimate_bb_frequencies ()
 
   /* Scaling frequencies up to maximal profile count may result in
  frequent overflows especially when inlining loops.
- Small scalling results in unnecesary precision loss.  Stay in
+ Small scaling results in unnecesary precision loss.  Stay in
  the half of the (exponential) range.  */
   freq_max = (sreal (1) << (profile_count::n_bits / 2)) / freq_max;
   if (freq_max < 16)

Jakub



Re: [PATCH] profile-count: Avoid overflows into uninitialized [PR112303]

2024-03-28 Thread Jan Hubicka
Hi,
so what goes wrong with the testcase is the fact that after recursive
inlining we have a large loop nest and consequently an invalid profile, since
every loop is predicted to iterate quite a lot.  rebuild_frequencies
should take care of the problem, but it doesn't since there is:
  if (freq_max < 16)
    freq_max = 16;
Removing this check solves the testcase.  Looking how it went in, I made
it in 2017 when dropping the original code to scale into range 0...1
https://gcc.gnu.org/pipermail/gcc-patches/2017-November/488115.html

I have no recollection of inventing that check, but I suppose one can
argue that we do not want to scale most of the CFG to 0 since the branch
prediction is likely wrong and we do not know if the code with
an unrealistic BB profile is important at all.  So perhaps it is safer to
cap rather than scale most of the function body to 0.
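
The effect of that check can be sketched with plain doubles in place of sreal. This is an illustrative model only: `n_bits = 61` is an assumed value for profile_count::n_bits, and `scale_factor` and `hottest_block_overflows` are hypothetical names:

```cpp
#include <cassert>
#include <cmath>

// Assumed value of profile_count::n_bits; treat it as illustrative.
constexpr int n_bits = 61;

// Model of the scaling factor: aim for the middle of the exponential range,
// optionally refusing to go below 16.
inline double scale_factor(double freq_max, bool clamp_to_16)
{
  assert(freq_max > 0.0);
  double scale = std::ldexp(1.0, n_bits / 2) / freq_max;
  if (clamp_to_16 && scale < 16.0)
    scale = 16.0;
  return scale;
}

// True if the hottest block, once scaled, no longer fits in n_bits bits.
inline bool hottest_block_overflows(double freq_max, bool clamp_to_16)
{
  return freq_max * scale_factor(freq_max, clamp_to_16)
         >= std::ldexp(1.0, n_bits);
}
```

With a deep loop nest the estimated maximum frequency is astronomically large, so the clamp keeps the scale at 16 and the hottest block overflows; without the clamp the hottest block lands near 2^30 and most other blocks round down to 0, which is the trade-off described above.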

profile_count arithmetic is indeed supposed to be saturating; it is bad that
I managed to miss the check for such a common operation as + :(
> 
> 2024-03-27  Jakub Jelinek  
> 
>   PR tree-optimization/112303
>   * profile-count.h (profile_count::operator+): Perform
>   addition in uint64_t variable and set m_val to MIN of that
>   val and max_count.
>   (profile_count::operator+=): Likewise.
>   (profile_count::operator-=): Formatting fix.

These two changes are OK.
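
For reference, the saturating addition described in the ChangeLog can be sketched as follows. The constant is an assumption standing in for profile_count::max_count, and `saturating_add` is a hypothetical free function rather than the actual member operator:

```cpp
#include <algorithm>
#include <cassert>
#include <cstdint>

// Assumed ceiling, standing in for profile_count::max_count.
constexpr std::uint64_t max_count = (std::uint64_t{1} << 61) - 2;

// Add in a full 64-bit variable, then clamp to the ceiling.
inline std::uint64_t saturating_add(std::uint64_t a, std::uint64_t b)
{
  assert(a <= max_count && b <= max_count);
  // Both operands are below 2^61, so the 64-bit sum cannot wrap.
  return std::min(a + b, max_count);
}
```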

In apply_probability
> @@ -1127,7 +1129,9 @@ public:
>if (!initialized_p ())
>   return uninitialized ();
>profile_count ret;
> -  ret.m_val = RDIV (m_val * prob, REG_BR_PROB_BASE);
> +  uint64_t tmp;
> +  safe_scale_64bit (m_val, prob, REG_BR_PROB_BASE, &tmp);
> +  ret.m_val = MIN (tmp, max_count);

This exists only for old code that still uses REG_BR_PROB_BASE integers.
A valid prob always satisfies prob <= REG_BR_PROB_BASE :)
So we need safe_scale_64bit to watch for overflow, but the result does not
need MIN.
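
A small sketch of why the MIN is redundant: when prob <= base, the rounded quotient can never exceed the input value. This is not safe_scale_64bit itself; `apply_probability` here is a hypothetical free function, and `prob_base = 10000` is assumed to match REG_BR_PROB_BASE:

```cpp
#include <cassert>
#include <cstdint>

// Assumed to match REG_BR_PROB_BASE.
constexpr std::uint32_t prob_base = 10000;

// Computes round(val * prob / prob_base) without forming the full product,
// by splitting val into quotient and remainder with respect to prob_base.
inline std::uint64_t apply_probability(std::uint64_t val, std::uint32_t prob)
{
  assert(prob <= prob_base);
  std::uint64_t q = val / prob_base;
  std::uint64_t r = val % prob_base;
  // q * prob <= q * prob_base <= val, and r * prob < prob_base^2, so neither
  // term can overflow 64 bits.
  return q * prob + (r * prob + prob_base / 2) / prob_base;
}
```

Since the result is bounded by `val`, a value that already respects the ceiling stays within it.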

>ret.m_quality = MIN (m_quality, ADJUSTED);
>return ret;
>  }
> @@ -1145,7 +1149,7 @@ public:
>uint64_t tmp;
>safe_scale_64bit (m_val, prob.m_val, 
> profile_probability::max_probability,
>   &tmp);
> -  ret.m_val = tmp;
> +  ret.m_val = MIN (tmp, max_count);

Same here, it is unnecessary to do MIN.

OK with this change.

Thanks for looking into this,
Honza


Re: [PATCH] RISC-V: testsuite: ensure vtype is call clobbered

2024-03-28 Thread Jeff Law




On 3/27/24 4:14 PM, Vineet Gupta wrote:

Per classic Vector calling convention ABI, vtype is call clobbered,
so ensure gcc generates a fresh VSETVLI after a function call or an
inline asm which clobbers vtype.

ATM gcc seems to be doing the right thing, but a test can never be
harmful.

gcc/testsuite/ChangeLog:
* gcc.target/riscv/rvv/vtype-call-clobbered.c: New Test.

OK
jeff



Re: [PATCH] profile-count: Avoid overflows into uninitialized [PR112303]

2024-03-28 Thread Jan Hubicka
> __attribute__((noipa)) void
> bar (void)
> {
>   __builtin_exit (0);
> }
> 
> __attribute__((noipa)) void
> foo (void)
> {
>   for (int i = 0; i < 1000; ++i)
>   for (int j = 0; j < 1000; ++j)
>   for (int k = 0; k < 1000; ++k)
>   for (int l = 0; l < 1000; ++l)
>   for (int m = 0; m < 1000; ++m)
>   for (int n = 0; n < 1000; ++n)
>   for (int o = 0; o < 1000; ++o)
>   for (int p = 0; p < 1000; ++p)
>   for (int q = 0; q < 1000; ++q)
>   for (int r = 0; r < 1000; ++r)
>   for (int s = 0; s < 1000; ++s)
>   for (int t = 0; t < 1000; ++t)
>   for (int u = 0; u < 1000; ++u)
>   for (int v = 0; v < 1000; ++v)
>   for (int w = 0; w < 1000; ++w)
>   for (int x = 0; x < 1000; ++x)
>   for (int y = 0; y < 1000; ++y)
>   for (int z = 0; z < 1000; ++z)
>   for (int a = 0; a < 1000; ++a)
>   for (int b = 0; b < 1000; ++b)
> bar ();
> }
> 
> int
> main ()
> {
>   foo ();
> }
> reaches the maximum count already on the 11th loop.

This should not happen - this is the reason why we estimate in sreals and
convert to profile_counts only later.  In this case we should push down
the profile_count of the entry block (to 0).

  freq_max = 0;
  FOR_EACH_BB_FN (bb, cfun)
    if (freq_max < BLOCK_INFO (bb)->frequency)
      freq_max = BLOCK_INFO (bb)->frequency;

  /* Scaling frequencies up to maximal profile count may result in
 frequent overflows especially when inlining loops.
 Small scalling results in unnecesary precision loss.  Stay in
 the half of the (exponential) range.  */
  freq_max = (sreal (1) << (profile_count::n_bits / 2)) / freq_max;
  if (freq_max < 16)
    freq_max = 16;

I am looking into what goes wrong here.
Honza


Re: [PATCHv2 1/2] aarch64: Do not give ABI change diagnostics for _BitInt(N)

2024-03-28 Thread Richard Sandiford
"Andre Vieira (lists)"  writes:
> This patch makes sure we do not give ABI change diagnostics for the ABI 
> breaks of GCC 9, 13 and 14 for any type involving _BitInt(N), since that 
> type did not exist before this GCC version.
>
> ChangeLog:
>
>   * config/aarch64/aarch64.cc (bitint_or_aggr_of_bitint_p): New function.
>   (aarch64_layout_arg): Don't emit diagnostics for types involving
>   _BitInt(N).
>
> diff --git a/gcc/config/aarch64/aarch64.cc b/gcc/config/aarch64/aarch64.cc
> index 
> 1ea84c8bd7386e399f6ffa3a5e36408cf8831fc6..b68cf3e7cb9a6fa89b4e5826a39ffa11f64ca20a
>  100644
> --- a/gcc/config/aarch64/aarch64.cc
> +++ b/gcc/config/aarch64/aarch64.cc
> @@ -6744,6 +6744,33 @@ aarch64_function_arg_alignment (machine_mode mode, 
> const_tree type,
>return alignment;
>  }
>  
> +/* Return true if TYPE describes a _BitInt(N) or an angreggate that uses the
> +   _BitInt(N) type.  These include ARRAY_TYPE's with an element that is a
> +   _BitInt(N) or an aggregate that uses it, and a RECORD_TYPE or a UNION_TYPE
> +   with a field member that is a _BitInt(N) or an aggregate that uses it.
> +   Return false otherwise.  */
> +
> +static bool
> +bitint_or_aggr_of_bitint_p (tree type)
> +{
> +  if (!type)
> +return false;
> +
> +  if (TREE_CODE (type) == BITINT_TYPE)
> +return true;
> +
> +  /* If ARRAY_TYPE, check it's element type.  */
> +  if (TREE_CODE (type) == ARRAY_TYPE)
> +return bitint_or_aggr_of_bitint_p (TREE_TYPE (type));
> +
> +  /* If RECORD_TYPE or UNION_TYPE, check the fields' types.  */
> +  if (RECORD_OR_UNION_TYPE_P (type))
> +for (tree field = TYPE_FIELDS (type); field; field = TREE_CHAIN (field))
> +  if (bitint_or_aggr_of_bitint_p (TREE_TYPE (field)))
> + return true;
> +  return false;
> +}
> +
>  /* Layout a function argument according to the AAPCS64 rules.  The rule
> numbers refer to the rule numbers in the AAPCS64.  ORIG_MODE is the
> mode that was originally given to us by the target hook, whereas the
> @@ -6767,12 +6794,6 @@ aarch64_layout_arg (cumulative_args_t pcum_v, const 
> function_arg_info &arg)
>if (pcum->aapcs_arg_processed)
>  return;
>  
> -  bool warn_pcs_change
> -= (warn_psabi
> -   && !pcum->silent_p
> -   && (currently_expanding_function_start
> -|| currently_expanding_gimple_stmt));
> -
>/* HFAs and HVAs can have an alignment greater than 16 bytes.  For example:
>  
> typedef struct foo {
> @@ -6907,6 +6928,18 @@ aarch64_layout_arg (cumulative_args_t pcum_v, const 
> function_arg_info &arg)
> && (!alignment || abi_break_gcc_9 < alignment)
> && (!abi_break_gcc_13 || alignment < abi_break_gcc_13));
>  
> +
> +  bool warn_pcs_change
> += (warn_psabi
> +   && !pcum->silent_p
> +   && (currently_expanding_function_start
> +|| currently_expanding_gimple_stmt)
> +  /* warn_pcs_change is currently used to gate diagnostics in case of
> +  abi_break_gcc_{9,13,14}.  These however, do not apply to _BitInt(N)
> +  types as they were only introduced in GCC 14.  */
> +   && (!type || !bitint_or_aggr_of_bitint_p (type)));

How about making this a new variable such as:

  /* _BitInt(N) was only added in GCC 14.  */
  bool warn_pcs_change_le_gcc14
    = (warn_psabi && !bitint_or_aggr_of_bitint_p (type));

(and keeping warn_pcs_change where it is).  In principle, warn_pcs_change
is meaningful for any future ABI breaks, and we might forget that it
excludes bitints.  The name is just a suggestion.
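
The recursive check and the separate flag can be sketched on a toy type tree. Everything here is a hypothetical model (`type_model`, `contains_bitint`, `warn_for_pre_gcc14_break`), not GCC's tree API:

```cpp
#include <cassert>
#include <vector>

enum class type_kind { integer, bitint, array, record };

// Toy type node: arrays use `element`, records/unions use `fields`.
struct type_model
{
  type_kind kind;
  const type_model* element;
  std::vector<const type_model*> fields;
};

// True if T is a bit-precise integer or an aggregate that contains one.
inline bool contains_bitint(const type_model* t)
{
  if (!t)
    return false;
  switch (t->kind)
    {
    case type_kind::bitint:
      return true;
    case type_kind::array:
      return contains_bitint(t->element);
    case type_kind::record:
      for (const type_model* field : t->fields)
        if (contains_bitint(field))
          return true;
      return false;
    default:
      return false;
    }
}

// Separate flag for the older ABI breaks, leaving the general flag untouched.
inline bool warn_for_pre_gcc14_break(bool warn_psabi, const type_model* t)
{
  return warn_psabi && !contains_bitint(t);
}
```

Keeping the bit-precise exclusion in its own flag means a future ABI break can still use the general flag without silently skipping these types.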

OK with that change, thanks.

Richard

> +
> +
>/* allocate_ncrn may be false-positive, but allocate_nvrn is quite 
> reliable.
>   The following code thus handles passing by SIMD/FP registers first.  */
>  
> @@ -21266,19 +21299,25 @@ aarch64_gimplify_va_arg_expr (tree valist, tree 
> type, gimple_seq *pre_p,
>rsize = ROUND_UP (size, UNITS_PER_WORD);
>nregs = rsize / UNITS_PER_WORD;
>  
> -  if (align <= 8 && abi_break_gcc_13 && warn_psabi)
> +  if (align <= 8
> +   && abi_break_gcc_13
> +   && warn_psabi
> +   && !bitint_or_aggr_of_bitint_p (type))
>   inform (input_location, "parameter passing for argument of type "
>   "%qT changed in GCC 13.1", type);
>  
>if (warn_psabi
> && abi_break_gcc_14
> -   && (abi_break_gcc_14 > 8 * BITS_PER_UNIT) != (align > 8))
> +   && (abi_break_gcc_14 > 8 * BITS_PER_UNIT) != (align > 8)
> +   && !bitint_or_aggr_of_bitint_p (type))
>   inform (input_location, "parameter passing for argument of type "
>   "%qT changed in GCC 14.1", type);
>  
>if (align > 8)
>   {
> -   if (abi_break_gcc_9 && warn_psabi)
> +   if (abi_break_gcc_9
> +   && warn_psabi
> +   && !bitint_or_aggr_of_bitint_p (type))
>   inform (input_location, "parameter passing for argument of type "
>   "%qT changed in GCC 9.1", type);
> dw_align = true;


[PATCH] c++/modules: Prefer partition indexes when installing imported entities [PR99377]

2024-03-28 Thread Nathaniel Shead
Bootstrapped and regtested on x86_64-pc-linux-gnu, OK for trunk?

-- >8 --

The testcase in comment 15 of the linked PR fails because the
following assumption in depset::hash::make_dependency doesn't hold:

  if (DECL_LANG_SPECIFIC (not_tmpl)
      && DECL_MODULE_IMPORT_P (not_tmpl))
    {
      /* Store the module number and index in cluster/section,
         so we don't have to look them up again.  */
      unsigned index = import_entity_index (decl);
      module_state *from = import_entity_module (index);
      /* Remap will be zero for imports from partitions, which
         we want to treat as-if declared in this TU.  */
      if (from->remap)
        {
          dep->cluster = index - from->entity_lwm;
          dep->section = from->remap;
          dep->set_flag_bit ();
        }
    }

This is because at least for template specialisations, we first see the
declaration in the header unit imported from the partition, and then the
instantiation provided by the partition itself.  This means that the
'import_entity_index' lookup doesn't report that the specialisation was
declared in the partition and thus should be considered as-if it was
part of the TU, and get exported.

To fix this, this patch allows, as a special case for installing an
entity from a partition, to overwrite the entity_map entry with the
(later) index into the partition so that this assumption holds again.

We only do this for the first time we override with a partition, so that
entities are at least still reported as originating from the first
imported partition that declares them (rather than the last); existing
tests check for this and this seems to be a friendlier approach to go
for, albeit slightly more expensive.
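
The overwrite rule can be sketched with an ordinary hash map. This is a simplified model, not the module.cc data structures; `import_slot`, `entity_map_model` and this `install_entity` are hypothetical:

```cpp
#include <cassert>
#include <unordered_map>
#include <utility>

// What the map remembers per declaration: the entity index and whether that
// index came from a partition of the current module.
struct import_slot
{
  unsigned index;
  bool from_partition;
};

using entity_map_model = std::unordered_map<unsigned, import_slot>;

// Installs DECL_UID and returns the index recorded for it afterwards.
inline unsigned install_entity(entity_map_model& map, unsigned decl_uid,
                               unsigned index, bool from_partition)
{
  auto res = map.insert(std::make_pair(decl_uid,
                                       import_slot{index, from_partition}));
  import_slot& slot = res.first->second;
  // Already present: only the first partition may replace a non-partition
  // entry, so later partitions do not change the reported origin.
  if (!res.second && from_partition && !slot.from_partition)
    slot = import_slot{index, true};
  return slot.index;
}
```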

PR c++/99377

gcc/cp/ChangeLog:

* module.cc (trees_in::install_entity): Overwrite entity map
index if installing from a partition.

gcc/testsuite/ChangeLog:

* g++.dg/modules/pr99377-3_a.H: New test.
* g++.dg/modules/pr99377-3_b.C: New test.
* g++.dg/modules/pr99377-3_c.C: New test.
* g++.dg/modules/pr99377-3_d.C: New test.

Signed-off-by: Nathaniel Shead 
---
 gcc/cp/module.cc   | 13 +
 gcc/testsuite/g++.dg/modules/pr99377-3_a.H | 17 +
 gcc/testsuite/g++.dg/modules/pr99377-3_b.C | 10 ++
 gcc/testsuite/g++.dg/modules/pr99377-3_c.C |  5 +
 gcc/testsuite/g++.dg/modules/pr99377-3_d.C |  8 
 5 files changed, 53 insertions(+)
 create mode 100644 gcc/testsuite/g++.dg/modules/pr99377-3_a.H
 create mode 100644 gcc/testsuite/g++.dg/modules/pr99377-3_b.C
 create mode 100644 gcc/testsuite/g++.dg/modules/pr99377-3_c.C
 create mode 100644 gcc/testsuite/g++.dg/modules/pr99377-3_d.C

diff --git a/gcc/cp/module.cc b/gcc/cp/module.cc
index 8aab9ea0bae..55ca17a88da 100644
--- a/gcc/cp/module.cc
+++ b/gcc/cp/module.cc
@@ -7649,6 +7649,19 @@ trees_in::install_entity (tree decl)
   gcc_checking_assert (!existed);
   slot = ident;
 }
+  else if (state->is_partition ())
+{
+  /* The decl is already in the entity map but we see it again now from a
+partition: we want to overwrite if the original decl wasn't also from
+a (possibly different) partition.  Otherwise, for things like template
+instantiations, make_dependency might not realise that this is also
+provided from a partition and should be considered part of this module
+(and thus always exported).  */
+  unsigned *slot = entity_map->get (DECL_UID (decl));
+  module_state *imp = import_entity_module (*slot);
+  if (!imp->is_partition ())
+   *slot = ident;
+}
 
   return true;
 }
diff --git a/gcc/testsuite/g++.dg/modules/pr99377-3_a.H 
b/gcc/testsuite/g++.dg/modules/pr99377-3_a.H
new file mode 100644
index 000..580a7631ae1
--- /dev/null
+++ b/gcc/testsuite/g++.dg/modules/pr99377-3_a.H
@@ -0,0 +1,17 @@
+// { dg-additional-options "-fmodule-header" }
+// { dg-module-cmi {} }
+
+template
+struct Widget
+{
+  Widget (int) { }
+
+  bool First() const { return true; }
+
+  bool Second () const { return true;}
+};
+
+inline void Frob (const Widget& w) noexcept
+{
+  w.First ();
+}
diff --git a/gcc/testsuite/g++.dg/modules/pr99377-3_b.C 
b/gcc/testsuite/g++.dg/modules/pr99377-3_b.C
new file mode 100644
index 000..5cbce7b3544
--- /dev/null
+++ b/gcc/testsuite/g++.dg/modules/pr99377-3_b.C
@@ -0,0 +1,10 @@
+// { dg-additional-options "-fmodules-ts" }
+// { dg-module-cmi Foo:check }
+
+export module Foo:check;
+import "pr99377-3_a.H";
+
+export inline bool Check (const Widget& w)
+{
+  return w.Second ();
+}
diff --git a/gcc/testsuite/g++.dg/modules/pr99377-3_c.C 
b/gcc/testsuite/g++.dg/modules/pr99377-3_c.C
new file mode 100644
index 000..fa7c24203bd
--- /dev/null
+++ b/gcc/testsuite/g++.dg/modules/pr99377-3_c.C
@@ -0,0 +1,5 @@
+// { dg-additional-options "-fmodules-ts" }
+// { dg-module-cmi Foo }
+
+export module Foo;
+export import :check;
diff --git 

RE: RE: [PATCH] RISC-V: Refine the condition for add additional vars in RVV cost model

2024-03-28 Thread Demin Han
OK, I will split them.

Thanks.

From: juzhe.zh...@rivai.ai
Sent: March 28, 2024 19:11
To: Demin Han; gcc-patches
Cc: kito.cheng; pan2.li; jeffreyalaw; Robin Dapp
Subject: Re: RE: [PATCH] RISC-V: Refine the condition for add additional vars
in RVV cost model

OK. It's an obvious fix but it seems to be unrelated to the PR.

Could you split it into 2 separate patches?

Thanks.


juzhe.zh...@rivai.ai

From: Demin Han
Sent: 2024-03-28 19:06
To: juzhe.zh...@rivai.ai; gcc-patches
Cc: kito.cheng; pan2.li; jeffreyalaw; Robin Dapp
Subject: RE: [PATCH] RISC-V: Refine the condition for add additional vars in RVV
cost model
Hi,
the points start from 1, so max_point should be equal to length ().

Should I prepare an individual patch for this?

From: juzhe.zh...@rivai.ai
Sent: March 28, 2024 18:45
To: Demin Han; gcc-patches
Cc: kito.cheng; pan2.li; jeffreyalaw; Robin Dapp
Subject: Re: [PATCH] RISC-V: Refine the condition for add additional vars in
RVV cost model

Thanks a lot for trying to optimize the dynamic LMUL cost model.

The need_additional_vector_vars_p looks good to me.



But

-  = (*program_points_per_bb.get (bb)).length () - 1;

+  = (*program_points_per_bb.get (bb)).length ();
I wonder why you removed the - 1?


juzhe.zh...@rivai.ai

From: demin.han
Date: 2024-03-28 18:31
To: gcc-patches
CC: juzhe.zhong; kito.cheng; pan2.li; jeffreyalaw; rdapp.gcc
Subject: [PATCH] RISC-V: Refine the condition for add additional vars in RVV 
cost model
adjacent_dr_p is a sufficient but not necessary condition for contiguous access,
so unnecessary live ranges are added and result in spills.

This patch uses MEMORY_ACCESS_TYPE as condition and constrains segment
load/store.

Tested on RV64 and no regression.

PR target/114506

gcc/ChangeLog:

* config/riscv/riscv-vector-costs.cc (non_contiguous_memory_access_p): Rename
(need_additional_vector_vars_p): Rename and refine condition

gcc/testsuite/ChangeLog:

* gcc.dg/vect/costmodel/riscv/rvv/pr114506.c: New test.

Signed-off-by: demin.han 
---
gcc/config/riscv/riscv-vector-costs.cc| 25 ---
.../vect/costmodel/riscv/rvv/pr114506.c   | 23 +
2 files changed, 39 insertions(+), 9 deletions(-)
create mode 100644 gcc/testsuite/gcc.dg/vect/costmodel/riscv/rvv/pr114506.c

diff --git a/gcc/config/riscv/riscv-vector-costs.cc 
b/gcc/config/riscv/riscv-vector-costs.cc
index f462c272a6e..9f7fe936a29 100644
--- a/gcc/config/riscv/riscv-vector-costs.cc
+++ b/gcc/config/riscv/riscv-vector-costs.cc
@@ -563,14 +563,24 @@ get_store_value (gimple *stmt)
 return gimple_assign_rhs1 (stmt);
}
-/* Return true if it is non-contiguous load/store.  */
+/* Return true if addtional vector vars needed.  */
static bool
-non_contiguous_memory_access_p (stmt_vec_info stmt_info)
+need_additional_vector_vars_p (stmt_vec_info stmt_info)
{
   enum stmt_vec_info_type type
 = STMT_VINFO_TYPE (vect_stmt_to_vectorize (stmt_info));
-  return ((type == load_vec_info_type || type == store_vec_info_type)
-   && !adjacent_dr_p (STMT_VINFO_DATA_REF (stmt_info)));
+  if (type == load_vec_info_type || type == store_vec_info_type)
+{
+  if (STMT_VINFO_GATHER_SCATTER_P (stmt_info)
+   && STMT_VINFO_MEMORY_ACCESS_TYPE (stmt_info) == VMAT_GATHER_SCATTER)
+ return true;
+
+  machine_mode mode = TYPE_MODE (STMT_VINFO_VECTYPE (stmt_info));
+  int lmul = riscv_get_v_regno_alignment (mode);
+  if (DR_GROUP_SIZE (stmt_info) * lmul > RVV_M8)
+ return true;
+}
+  return false;
}
/* Return the LMUL of the current analysis.  */
@@ -739,10 +749,7 @@ update_local_live_ranges (
  stmt_vec_info stmt_info = vinfo->lookup_stmt (gsi_stmt (si));
  enum stmt_vec_info_type type
= STMT_VINFO_TYPE (vect_stmt_to_vectorize (stmt_info));
-   if (non_contiguous_memory_access_p (stmt_info)
-   /* LOAD_LANES/STORE_LANES doesn't need a perm indice.  */
-   && STMT_VINFO_MEMORY_ACCESS_TYPE (stmt_info)
-!= VMAT_LOAD_STORE_LANES)
+   if (need_additional_vector_vars_p (stmt_info))
{
  /* For non-adjacent load/store STMT, we will potentially
convert it into:
@@ -752,7 +759,7 @@ update_local_live_ranges (
We will be likely using one more vector 

[PATCH 1/1] [RISCV] Add support for _Bfloat16

2024-03-28 Thread Xiao Zeng
1 At point ,
  BF16 has already been completed "post public review".

2 LLVM has also added support for RISCV BF16 in
   and
  .

3 According to the discussion 
,
  this patch uses __bf16 and uses DF16b in riscv_mangle_type, like x86.

The following tests pass with this patch:
* The full riscv regression test.

gcc/ChangeLog:

* config/riscv/iterators.md: New mode iterator HFBF.
* config/riscv/riscv-builtins.cc (riscv_init_builtin_types):
Initialize data type _Bfloat16.
* config/riscv/riscv-modes.def (FLOAT_MODE): New.
(ADJUST_FLOAT_FORMAT): New.
* config/riscv/riscv.cc (riscv_mangle_type): Support for BFmode.
(riscv_scalar_mode_supported_p): Ditto.
(riscv_libgcc_floating_mode_supported_p): Ditto.
(riscv_init_libfuncs): Set the conversion method for BFmode and
HFmode.
(riscv_block_arith_comp_libfuncs_for_mode): Set the arithmetic
and comparison libfuncs for the mode.
* config/riscv/riscv.md (mode" ): Add BF.
(movhf): Support for BFmode.
(mov): Ditto.
(*movhf_softfloat): Ditto.
(*mov_softfloat): Ditto.

libgcc/ChangeLog:

* config/riscv/sfp-machine.h (_FP_NANFRAC_B): New.
(_FP_NANSIGN_B): Ditto.
* config/riscv/t-softfp32: Add support for BF16 libfuncs.
* config/riscv/t-softfp64: Ditto.
* soft-fp/floatsibf.c: For si -> bf16.
* soft-fp/floatunsibf.c: For unsi -> bf16.

gcc/testsuite/ChangeLog:

* gcc.target/riscv/bf16_arithmetic.c: New test.
* gcc.target/riscv/bf16_call.c: New test.
* gcc.target/riscv/bf16_comparison.c: New test.
* gcc.target/riscv/bf16_float_libcall_convert.c: New test.
* gcc.target/riscv/bf16_integer_libcall_convert.c: New test.

Co-authored-by: Jin Ma 
---
 gcc/config/riscv/iterators.md |  2 +
 gcc/config/riscv/riscv-builtins.cc| 16 
 gcc/config/riscv/riscv-modes.def  |  3 +
 gcc/config/riscv/riscv.cc | 64 ++-
 gcc/config/riscv/riscv.md | 24 +++---
 .../gcc.target/riscv/bf16_arithmetic.c| 42 ++
 gcc/testsuite/gcc.target/riscv/bf16_call.c| 12 +++
 .../gcc.target/riscv/bf16_comparison.c| 36 +
 .../riscv/bf16_float_libcall_convert.c| 57 +
 .../riscv/bf16_integer_libcall_convert.c  | 81 +++
 libgcc/config/riscv/sfp-machine.h |  3 +
 libgcc/config/riscv/t-softfp32| 10 ++-
 libgcc/config/riscv/t-softfp64|  3 +-
 libgcc/soft-fp/floatsibf.c| 45 +++
 libgcc/soft-fp/floatunsibf.c  | 45 +++
 15 files changed, 407 insertions(+), 36 deletions(-)
 create mode 100644 gcc/testsuite/gcc.target/riscv/bf16_arithmetic.c
 create mode 100644 gcc/testsuite/gcc.target/riscv/bf16_call.c
 create mode 100644 gcc/testsuite/gcc.target/riscv/bf16_comparison.c
 create mode 100644 gcc/testsuite/gcc.target/riscv/bf16_float_libcall_convert.c
 create mode 100644 
gcc/testsuite/gcc.target/riscv/bf16_integer_libcall_convert.c
 create mode 100644 libgcc/soft-fp/floatsibf.c
 create mode 100644 libgcc/soft-fp/floatunsibf.c

diff --git a/gcc/config/riscv/iterators.md b/gcc/config/riscv/iterators.md
index a7694137685..40bf20f42bb 100644
--- a/gcc/config/riscv/iterators.md
+++ b/gcc/config/riscv/iterators.md
@@ -75,6 +75,8 @@
 ;; Iterator for floating-point modes that can be loaded into X registers.
 (define_mode_iterator SOFTF [SF (DF "TARGET_64BIT") (HF "TARGET_ZFHMIN")])
 
+;; Iterator for floating-point modes of BF16
+(define_mode_iterator HFBF [HF BF])
 
 ;; ---
 ;; Mode attributes
diff --git a/gcc/config/riscv/riscv-builtins.cc 
b/gcc/config/riscv/riscv-builtins.cc
index d457e306dd1..4c08834288a 100644
--- a/gcc/config/riscv/riscv-builtins.cc
+++ b/gcc/config/riscv/riscv-builtins.cc
@@ -230,6 +230,7 @@ static GTY(()) int riscv_builtin_decl_index[NUM_INSN_CODES];
   riscv_builtin_decls[riscv_builtin_decl_index[(CODE)]]
 
 tree riscv_float16_type_node = NULL_TREE;
+tree riscv_bfloat16_type_node = NULL_TREE;
 
 /* Return the function type associated with function prototype TYPE.  */
 
@@ -273,6 +274,21 @@ riscv_init_builtin_types (void)
   if (!maybe_get_identifier ("_Float16"))
 lang_hooks.types.register_builtin_type (riscv_float16_type_node,
"_Float16");
+
+  /* Provide the _Bfloat16 type and bfloat16_type_node if needed.  */
+  if (!bfloat16_type_node)
+{
+  riscv_bfloat16_type_node = make_node (REAL_TYPE);
+  TYPE_PRECISION (riscv_bfloat16_type_node) = 16;
+  SET_TYPE_MODE (riscv_bfloat16_type_node, BFmode);
+  layout_type (riscv_bfloat16_type_node);
+ 

[PATCH 0/1] [RISCV] Add support for _Bfloat16

2024-03-28 Thread Xiao Zeng
Hi all RISC-V folks:

This patch completes the support for the bf16 data type in the
riscv architecture.  On this basis, there will be a series of
patches in the future to strengthen support for BF16.

I recommend starting the review with the testcases, which explain the
flow of data type conversion in detail.
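
As background for the conversion flow, bf16 keeps the sign, the 8 exponent bits and the top 7 mantissa bits of an IEEE binary32 value. The sketch below shows one common way to convert in software with round-to-nearest-even; it is illustrative only and does not claim to mirror the libgcc soft-fp routines touched by this patch. The function names are hypothetical:

```cpp
#include <cassert>
#include <cstdint>
#include <cstring>

static_assert(sizeof(float) == 4, "sketch assumes IEEE binary32 float");

// Narrow a float to bf16 bits, rounding to nearest, ties to even.
inline std::uint16_t float_to_bf16(float f)
{
  std::uint32_t bits;
  std::memcpy(&bits, &f, sizeof bits);
  // NaN: keep the top bits and force a quiet-NaN mantissa bit so the
  // result cannot collapse to infinity.
  if ((bits & 0x7fffffffu) > 0x7f800000u)
    return static_cast<std::uint16_t>((bits >> 16) | 0x0040u);
  // Add 0x7fff plus the lowest kept bit, then drop the low 16 bits.
  std::uint32_t rounding = 0x7fffu + ((bits >> 16) & 1u);
  return static_cast<std::uint16_t>((bits + rounding) >> 16);
}

// Widen bf16 bits back to float; this direction is exact.
inline float bf16_to_float(std::uint16_t h)
{
  std::uint32_t bits = static_cast<std::uint32_t>(h) << 16;
  float f;
  std::memcpy(&f, &bits, sizeof f);
  return f;
}
```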

The basis of this patch is: 


Xiao Zeng (1):
  [RISCV] Add support for _Bfloat16

 gcc/config/riscv/iterators.md |  2 +
 gcc/config/riscv/riscv-builtins.cc| 16 
 gcc/config/riscv/riscv-modes.def  |  3 +
 gcc/config/riscv/riscv.cc | 64 ++-
 gcc/config/riscv/riscv.md | 24 +++---
 .../gcc.target/riscv/bf16_arithmetic.c| 42 ++
 gcc/testsuite/gcc.target/riscv/bf16_call.c| 12 +++
 .../gcc.target/riscv/bf16_comparison.c| 36 +
 .../riscv/bf16_float_libcall_convert.c| 57 +
 .../riscv/bf16_integer_libcall_convert.c  | 81 +++
 libgcc/config/riscv/sfp-machine.h |  3 +
 libgcc/config/riscv/t-softfp32| 10 ++-
 libgcc/config/riscv/t-softfp64|  3 +-
 libgcc/soft-fp/floatsibf.c| 45 +++
 libgcc/soft-fp/floatunsibf.c  | 45 +++
 15 files changed, 407 insertions(+), 36 deletions(-)
 create mode 100644 gcc/testsuite/gcc.target/riscv/bf16_arithmetic.c
 create mode 100644 gcc/testsuite/gcc.target/riscv/bf16_call.c
 create mode 100644 gcc/testsuite/gcc.target/riscv/bf16_comparison.c
 create mode 100644 gcc/testsuite/gcc.target/riscv/bf16_float_libcall_convert.c
 create mode 100644 
gcc/testsuite/gcc.target/riscv/bf16_integer_libcall_convert.c
 create mode 100644 libgcc/soft-fp/floatsibf.c
 create mode 100644 libgcc/soft-fp/floatunsibf.c

-- 
2.17.1



Re: RE: [PATCH] RISC-V: Refine the condition for add additional vars in RVV cost model

2024-03-28 Thread juzhe.zh...@rivai.ai
OK. It's an obvious fix but it seems to be unrelated to the PR.

Could you split it into 2 separate patches?

Thanks.



juzhe.zh...@rivai.ai
 
From: Demin Han
Sent: 2024-03-28 19:06
To: juzhe.zh...@rivai.ai; gcc-patches
Cc: kito.cheng; pan2.li; jeffreyalaw; Robin Dapp
Subject: RE: [PATCH] RISC-V: Refine the condition for add additional vars in RVV
cost model
Hi,
the points start from 1, so max_point should be equal to length ().
 
Should I prepare an individual patch for this?
 
From: juzhe.zh...@rivai.ai
Sent: March 28, 2024 18:45
To: Demin Han; gcc-patches
Cc: kito.cheng; pan2.li; jeffreyalaw; Robin Dapp
Subject: Re: [PATCH] RISC-V: Refine the condition for add additional vars in
RVV cost model
 
Thanks a lot for trying to optimize the dynamic LMUL cost model.
 
The need_additional_vector_vars_p looks good to me.


But
-  = (*program_points_per_bb.get (bb)).length () - 1;
+  = (*program_points_per_bb.get (bb)).length ();
I wonder why you removed the - 1?
 


juzhe.zh...@rivai.ai
 
From: demin.han
Date: 2024-03-28 18:31
To: gcc-patches
CC: juzhe.zhong; kito.cheng; pan2.li; jeffreyalaw; rdapp.gcc
Subject: [PATCH] RISC-V: Refine the condition for add additional vars in RVV 
cost model
adjacent_dr_p is a sufficient but not necessary condition for contiguous access,
so unnecessary live ranges are added and result in spills.
 
This patch uses MEMORY_ACCESS_TYPE as condition and constrains segment
load/store.
 
Tested on RV64 and no regression.
 
PR target/114506
 
gcc/ChangeLog:
 
* config/riscv/riscv-vector-costs.cc (non_contiguous_memory_access_p): Rename
(need_additional_vector_vars_p): Rename and refine condition
 
gcc/testsuite/ChangeLog:
 
* gcc.dg/vect/costmodel/riscv/rvv/pr114506.c: New test.
 
Signed-off-by: demin.han 
---
gcc/config/riscv/riscv-vector-costs.cc| 25 ---
.../vect/costmodel/riscv/rvv/pr114506.c   | 23 +
2 files changed, 39 insertions(+), 9 deletions(-)
create mode 100644 gcc/testsuite/gcc.dg/vect/costmodel/riscv/rvv/pr114506.c
 
diff --git a/gcc/config/riscv/riscv-vector-costs.cc 
b/gcc/config/riscv/riscv-vector-costs.cc
index f462c272a6e..9f7fe936a29 100644
--- a/gcc/config/riscv/riscv-vector-costs.cc
+++ b/gcc/config/riscv/riscv-vector-costs.cc
@@ -563,14 +563,24 @@ get_store_value (gimple *stmt)
 return gimple_assign_rhs1 (stmt);
}
-/* Return true if it is non-contiguous load/store.  */
+/* Return true if addtional vector vars needed.  */
static bool
-non_contiguous_memory_access_p (stmt_vec_info stmt_info)
+need_additional_vector_vars_p (stmt_vec_info stmt_info)
{
   enum stmt_vec_info_type type
 = STMT_VINFO_TYPE (vect_stmt_to_vectorize (stmt_info));
-  return ((type == load_vec_info_type || type == store_vec_info_type)
-   && !adjacent_dr_p (STMT_VINFO_DATA_REF (stmt_info)));
+  if (type == load_vec_info_type || type == store_vec_info_type)
+{
+  if (STMT_VINFO_GATHER_SCATTER_P (stmt_info)
+   && STMT_VINFO_MEMORY_ACCESS_TYPE (stmt_info) == VMAT_GATHER_SCATTER)
+ return true;
+
+  machine_mode mode = TYPE_MODE (STMT_VINFO_VECTYPE (stmt_info));
+  int lmul = riscv_get_v_regno_alignment (mode);
+  if (DR_GROUP_SIZE (stmt_info) * lmul > RVV_M8)
+ return true;
+}
+  return false;
}
/* Return the LMUL of the current analysis.  */
@@ -739,10 +749,7 @@ update_local_live_ranges (
  stmt_vec_info stmt_info = vinfo->lookup_stmt (gsi_stmt (si));
  enum stmt_vec_info_type type
= STMT_VINFO_TYPE (vect_stmt_to_vectorize (stmt_info));
-   if (non_contiguous_memory_access_p (stmt_info)
-   /* LOAD_LANES/STORE_LANES doesn't need a perm indice.  */
-   && STMT_VINFO_MEMORY_ACCESS_TYPE (stmt_info)
-!= VMAT_LOAD_STORE_LANES)
+   if (need_additional_vector_vars_p (stmt_info))
{
  /* For non-adjacent load/store STMT, we will potentially
convert it into:
@@ -752,7 +759,7 @@ update_local_live_ranges (
We will be likely using one more vector variable.  */
  unsigned int max_point
- = (*program_points_per_bb.get (bb)).length () - 1;
+ = (*program_points_per_bb.get (bb)).length ();
  auto *live_ranges = live_ranges_per_bb.get (bb);
  bool existed_p = false;
  tree var = type == load_vec_info_type
diff --git a/gcc/testsuite/gcc.dg/vect/costmodel/riscv/rvv/pr114506.c 
b/gcc/testsuite/gcc.dg/vect/costmodel/riscv/rvv/pr114506.c
new file mode 100644
index 000..a88d24b2d2d
--- /dev/null
+++ b/gcc/testsuite/gcc.dg/vect/costmodel/riscv/rvv/pr114506.c
@@ -0,0 +1,23 @@
+/* { dg-do compile } */
+/* { dg-options "-march=rv64gcv -mabi=lp64d -O3 -ftree-vectorize 
-mrvv-max-lmul=dynamic -fdump-tree-vect-details" } */
+
+float a[32000], b[32000], c[32000], d[32000];
+float aa[256][256], bb[256][256], cc[256][256];
+
+void
+s2275 ()
+{
+  for (int i = 0; i < 256; i++)
+{
+  for (int j = 0; j < 256; j++)
+ {
+   aa[j][i] = aa[j][i] + bb[j][i] * cc[j][i];
+ }
+  a[i] = b[i] + c[i] * d[i];
+}
+}
+
+/* { dg-final { scan-assembler-times {e32,m8} 1 } } */
+/* { dg-final { 

RE: [PATCH] RISC-V: Refine the condition for add additional vars in RVV cost model

2024-03-28 Thread Demin Han
Hi,
The program points start from 1, so max_point should be equal to length ().
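
A small illustration of that off-by-one, assuming (as stated above) that the points are numbered from 1 and pushed in order. The names here are hypothetical and are not taken from riscv-vector-costs.cc:

```cpp
#include <cassert>
#include <vector>

// Number the statements of a block 1, 2, ..., n and record each point.
inline std::vector<unsigned>
number_points (unsigned n_stmts)
{
  std::vector<unsigned> points;
  for (unsigned point = 1; point <= n_stmts; ++point)
    points.push_back (point);
  return points;
}

// With 1-based numbering the largest point equals the vector length,
// so subtracting 1 would name the second-to-last point instead.
inline unsigned
max_point (const std::vector<unsigned> &points)
{
  return static_cast<unsigned> (points.size ());
}
```

If the numbering were 0-based instead, length () - 1 would be the right value, so the fix depends entirely on where the counter starts.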

Should I prepare an individual patch for this?

From: juzhe.zh...@rivai.ai 
Sent: March 28, 2024 18:45
To: Demin Han; gcc-patches
Cc: kito.cheng; pan2.li; jeffreyalaw; Robin Dapp
Subject: Re: [PATCH] RISC-V: Refine the condition for add additional vars in 
RVV cost model

Thanks a lot for trying to optimize the dynamic LMUL cost model.

The need_additional_vector_vars_p looks good to me.


But

-  = (*program_points_per_bb.get (bb)).length () - 1;

+  = (*program_points_per_bb.get (bb)).length ();
I wonder why you remove - 1?


juzhe.zh...@rivai.ai

From: demin.han
Date: 2024-03-28 18:31
To: gcc-patches
CC: juzhe.zhong; kito.cheng; pan2.li; jeffreyalaw; rdapp.gcc
Subject: [PATCH] RISC-V: Refine the condition for add additional vars in RVV 
cost model
adjacent_dr_p is a sufficient but not a necessary condition for contiguous access,
so unnecessary live ranges are added and result in spills.

This patch uses MEMORY_ACCESS_TYPE as the condition and adds a constraint for
segment load/store.

Tested on RV64 with no regressions.

PR target/114506

gcc/ChangeLog:

* config/riscv/riscv-vector-costs.cc (non_contiguous_memory_access_p): Rename
(need_additional_vector_vars_p): Rename and refine condition

gcc/testsuite/ChangeLog:

* gcc.dg/vect/costmodel/riscv/rvv/pr114506.c: New test.

Signed-off-by: demin.han 
---
gcc/config/riscv/riscv-vector-costs.cc| 25 ---
.../vect/costmodel/riscv/rvv/pr114506.c   | 23 +
2 files changed, 39 insertions(+), 9 deletions(-)
create mode 100644 gcc/testsuite/gcc.dg/vect/costmodel/riscv/rvv/pr114506.c

diff --git a/gcc/config/riscv/riscv-vector-costs.cc 
b/gcc/config/riscv/riscv-vector-costs.cc
index f462c272a6e..9f7fe936a29 100644
--- a/gcc/config/riscv/riscv-vector-costs.cc
+++ b/gcc/config/riscv/riscv-vector-costs.cc
@@ -563,14 +563,24 @@ get_store_value (gimple *stmt)
 return gimple_assign_rhs1 (stmt);
}
-/* Return true if it is non-contiguous load/store.  */
+/* Return true if additional vector vars are needed.  */
static bool
-non_contiguous_memory_access_p (stmt_vec_info stmt_info)
+need_additional_vector_vars_p (stmt_vec_info stmt_info)
{
   enum stmt_vec_info_type type
 = STMT_VINFO_TYPE (vect_stmt_to_vectorize (stmt_info));
-  return ((type == load_vec_info_type || type == store_vec_info_type)
-   && !adjacent_dr_p (STMT_VINFO_DATA_REF (stmt_info)));
+  if (type == load_vec_info_type || type == store_vec_info_type)
+{
+  if (STMT_VINFO_GATHER_SCATTER_P (stmt_info)
+   && STMT_VINFO_MEMORY_ACCESS_TYPE (stmt_info) == VMAT_GATHER_SCATTER)
+ return true;
+
+  machine_mode mode = TYPE_MODE (STMT_VINFO_VECTYPE (stmt_info));
+  int lmul = riscv_get_v_regno_alignment (mode);
+  if (DR_GROUP_SIZE (stmt_info) * lmul > RVV_M8)
+ return true;
+}
+  return false;
}
/* Return the LMUL of the current analysis.  */
@@ -739,10 +749,7 @@ update_local_live_ranges (
  stmt_vec_info stmt_info = vinfo->lookup_stmt (gsi_stmt (si));
  enum stmt_vec_info_type type
= STMT_VINFO_TYPE (vect_stmt_to_vectorize (stmt_info));
-   if (non_contiguous_memory_access_p (stmt_info)
-   /* LOAD_LANES/STORE_LANES doesn't need a perm indice.  */
-   && STMT_VINFO_MEMORY_ACCESS_TYPE (stmt_info)
-!= VMAT_LOAD_STORE_LANES)
+   if (need_additional_vector_vars_p (stmt_info))
{
  /* For non-adjacent load/store STMT, we will potentially
convert it into:
@@ -752,7 +759,7 @@ update_local_live_ranges (
We will be likely using one more vector variable.  */
  unsigned int max_point
- = (*program_points_per_bb.get (bb)).length () - 1;
+ = (*program_points_per_bb.get (bb)).length ();
  auto *live_ranges = live_ranges_per_bb.get (bb);
  bool existed_p = false;
  tree var = type == load_vec_info_type
diff --git a/gcc/testsuite/gcc.dg/vect/costmodel/riscv/rvv/pr114506.c 
b/gcc/testsuite/gcc.dg/vect/costmodel/riscv/rvv/pr114506.c
new file mode 100644
index 000..a88d24b2d2d
--- /dev/null
+++ b/gcc/testsuite/gcc.dg/vect/costmodel/riscv/rvv/pr114506.c
@@ -0,0 +1,23 @@
+/* { dg-do compile } */
+/* { dg-options "-march=rv64gcv -mabi=lp64d -O3 -ftree-vectorize 
-mrvv-max-lmul=dynamic -fdump-tree-vect-details" } */
+
+float a[32000], b[32000], c[32000], d[32000];
+float aa[256][256], bb[256][256], cc[256][256];
+
+void
+s2275 ()
+{
+  for (int i = 0; i < 256; i++)
+{
+  for (int j = 0; j < 256; j++)
+ {
+   aa[j][i] = aa[j][i] + bb[j][i] * cc[j][i];
+ }
+  a[i] = b[i] + c[i] * d[i];
+}
+}
+
+/* { dg-final { scan-assembler-times {e32,m8} 1 } } */
+/* { dg-final { scan-assembler-not {e32,m4} } } */
+/* { dg-final { 

[committed] testsuite: Add testcase for already fixed PR [PR109925]

2024-03-28 Thread Jakub Jelinek
Hi!

This testcase was made latent by r14-4089 and got fixed both
on the trunk and 13 branch with PR113372 fix.

Adding testcase to the testsuite and will close the PR as a dup.

Bootstrapped/regtested on x86_64-linux and i686-linux, committed to trunk as
obvious.

2024-03-28  Jakub Jelinek  

PR tree-optimization/109925
* gcc.c-torture/execute/pr109925.c: New test.

--- gcc/testsuite/gcc.c-torture/execute/pr109925.c.jj   2024-03-27 
17:54:32.836301831 +0100
+++ gcc/testsuite/gcc.c-torture/execute/pr109925.c  2024-03-27 
17:54:18.602494494 +0100
@@ -0,0 +1,30 @@
+/* PR tree-optimization/109925 */
+
+int a, c, f;
+
+int
+main ()
+{
+  int g[2];
+  for (c = 0; c < 2; c++)
+{
+  {
+   char h[20], *b = h;
+   int d = 48, e = 0;
+   while (d && e < 5)
+ b[e++] = d /= 10;
+   f = e;
+  }
+  g[f - 2 + c] = 0;
+}
+  for (;;)
+{
+  for (; a <= 4; a++)
+   if (g[0])
+ break;
+  break;
+}
+  if (a != 5)
+__builtin_abort ();
+  return 0;
+}

Jakub



[PATCH] profile-count: Avoid overflows into uninitialized [PR112303]

2024-03-28 Thread Jakub Jelinek
Hi!

The testcase in the patch ICEs with
--- gcc/tree-scalar-evolution.cc
+++ gcc/tree-scalar-evolution.cc
@@ -3881,7 +3881,7 @@ final_value_replacement_loop (class loop *loop)
 
   /* Propagate constants immediately, but leave an unused initialization
 around to avoid invalidating the SCEV cache.  */
-  if (CONSTANT_CLASS_P (def) && !SSA_NAME_OCCURS_IN_ABNORMAL_PHI (rslt))
+  if (0 && CONSTANT_CLASS_P (def) && !SSA_NAME_OCCURS_IN_ABNORMAL_PHI (rslt))
replace_uses_by (rslt, def);
 
   /* Create the replacement statements.  */
(the addition of the above made the ICE latent), because profile_count
addition doesn't check for overflows and if unlucky, we can even overflow
into the uninitialized value.
Getting really huge profile counts is very easy even when not using
recursive inlining in loops, e.g.
__attribute__((noipa)) void
bar (void)
{
  __builtin_exit (0);
}

__attribute__((noipa)) void
foo (void)
{
  for (int i = 0; i < 1000; ++i)
  for (int j = 0; j < 1000; ++j)
  for (int k = 0; k < 1000; ++k)
  for (int l = 0; l < 1000; ++l)
  for (int m = 0; m < 1000; ++m)
  for (int n = 0; n < 1000; ++n)
  for (int o = 0; o < 1000; ++o)
  for (int p = 0; p < 1000; ++p)
  for (int q = 0; q < 1000; ++q)
  for (int r = 0; r < 1000; ++r)
  for (int s = 0; s < 1000; ++s)
  for (int t = 0; t < 1000; ++t)
  for (int u = 0; u < 1000; ++u)
  for (int v = 0; v < 1000; ++v)
  for (int w = 0; w < 1000; ++w)
  for (int x = 0; x < 1000; ++x)
  for (int y = 0; y < 1000; ++y)
  for (int z = 0; z < 1000; ++z)
  for (int a = 0; a < 1000; ++a)
  for (int b = 0; b < 1000; ++b)
bar ();
}

int
main ()
{
  foo ();
}
reaches the maximum count already on the 11th loop.

Some other methods of profile_count like apply_scale already
do use MIN (val, max_count) before assignment to m_val, this patch
just extends that to other methods.
Furthermore, one overload of apply_probability wasn't using
safe_scale_64bit and so could very easily overflow as well
- prob is required to be [0, 1] and if m_val is near the max_count,
it can overflow even with multiplications by 8.
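
As a rough illustration of the clamping described above, here is a minimal standalone sketch. The type count_t, the constant names and the 61-bit width are assumptions made for the example; this is not the profile_count code itself:

```cpp
#include <cassert>
#include <cstdint>

// Hypothetical stand-in for a counter that stores its value in fewer than
// 64 bits and reserves the all-ones pattern for "uninitialized".
struct count_t
{
  static constexpr int n_bits = 61;  // assumed width for this sketch
  static constexpr uint64_t uninitialized_val = (uint64_t{1} << n_bits) - 1;
  static constexpr uint64_t max_val = uninitialized_val - 1;

  uint64_t val;

  bool initialized_p () const { return val != uninitialized_val; }
};

// Add two initialized counters, clamping to max_val so that the result can
// never land on (or beyond) the reserved "uninitialized" pattern.
inline count_t
saturating_add (count_t a, count_t b)
{
  assert (a.initialized_p () && b.initialized_p ());
  // Both operands are at most max_val < 2^61, so the 64-bit sum cannot wrap.
  uint64_t sum = a.val + b.val;
  uint64_t cap = count_t::max_val;
  return count_t{sum < cap ? sum : cap};
}
```

The point of doing the addition in a full uint64_t first is that the comparison against the cap sees the true sum rather than a value already truncated to the narrower field.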

Bootstrapped/regtested on x86_64-linux and i686-linux, ok for trunk?

I wonder if we also shouldn't do
if (safe_scale_64bit (..., &tmp)) tmp = max_count;
in the existing as well as new spots, because if safe_scale_64bit returns
true, it just wraps around I think.
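
To make the question concrete, here is a sketch of what clamping on overflow could look like. scale_or_saturate and max_count_sketch are hypothetical names, and the helper uses the GCC/Clang builtins __builtin_mul_overflow and __builtin_add_overflow directly rather than safe_scale_64bit, whose exact return convention should be checked in the sources before copying this pattern:

```cpp
#include <cassert>
#include <cstdint>

// Assumed cap for the sketch; the real constant lives in profile-count.h.
constexpr uint64_t max_count_sketch = (uint64_t{1} << 61) - 2;

// Compute a * b / c rounded to nearest, returning the cap whenever the
// intermediate product or the rounding addition does not fit in 64 bits.
// Note this is conservative: it saturates even in cases where the exact
// quotient would have fit.
inline uint64_t
scale_or_saturate (uint64_t a, uint64_t b, uint64_t c)
{
  assert (c != 0);
  uint64_t tmp;
  if (__builtin_mul_overflow (a, b, &tmp)
      || __builtin_add_overflow (tmp, c / 2, &tmp))
    return max_count_sketch;
  tmp /= c;
  return tmp < max_count_sketch ? tmp : max_count_sketch;
}
```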

2024-03-27  Jakub Jelinek  

PR tree-optimization/112303
* profile-count.h (profile_count::operator+): Perform
addition in uint64_t variable and set m_val to MIN of that
val and max_count.
(profile_count::operator+=): Likewise.
(profile_count::operator-=): Formatting fix.
(profile_count::apply_probability): Use safe_scale_64bit
even in the int overload.  Set m_val to MIN of tmp and
max_count.

* gcc.c-torture/compile/pr112303.c: New test.

--- gcc/profile-count.h.jj  2024-02-22 13:07:22.870344133 +0100
+++ gcc/profile-count.h 2024-03-27 15:16:16.461774110 +0100
@@ -910,7 +910,8 @@ public:
 
   profile_count ret;
   gcc_checking_assert (compatible_p (other));
-  ret.m_val = m_val + other.m_val;
+  uint64_t ret_val = m_val + other.m_val;
+  ret.m_val = MIN (ret_val, max_count);
   ret.m_quality = MIN (m_quality, other.m_quality);
   return ret;
 }
@@ -929,7 +930,8 @@ public:
   else
{
   gcc_checking_assert (compatible_p (other));
- m_val += other.m_val;
+ uint64_t ret_val = m_val + other.m_val;
+ m_val = MIN (ret_val, max_count);
  m_quality = MIN (m_quality, other.m_quality);
}
   return *this;
@@ -957,7 +959,7 @@ public:
   else
{
   gcc_checking_assert (compatible_p (other));
- m_val = m_val >= other.m_val ? m_val - other.m_val: 0;
+ m_val = m_val >= other.m_val ? m_val - other.m_val : 0;
  m_quality = MIN (m_quality, other.m_quality);
}
   return *this;
@@ -1127,7 +1129,9 @@ public:
   if (!initialized_p ())
return uninitialized ();
   profile_count ret;
-  ret.m_val = RDIV (m_val * prob, REG_BR_PROB_BASE);
+  uint64_t tmp;
+  safe_scale_64bit (m_val, prob, REG_BR_PROB_BASE, &tmp);
+  ret.m_val = MIN (tmp, max_count);
   ret.m_quality = MIN (m_quality, ADJUSTED);
   return ret;
 }
@@ -1145,7 +1149,7 @@ public:
   uint64_t tmp;
   safe_scale_64bit (m_val, prob.m_val, profile_probability::max_probability,
                     &tmp);
-  ret.m_val = tmp;
+  ret.m_val = MIN (tmp, max_count);
   ret.m_quality = MIN (m_quality, prob.m_quality);
   return ret;
 }
--- gcc/testsuite/gcc.c-torture/compile/pr112303.c.jj   2024-03-27 
15:16:57.873214557 +0100
+++ gcc/testsuite/gcc.c-torture/compile/pr112303.c  2024-03-26 
12:04:18.741670482 +0100
@@ -0,0 +1,25 @@
+/* PR tree-optimization/112303 */
+
+int a, b, d, e, f, **g, h;
+char c;
+
+int *
+foo (void)
+{
+  for (int i = 0; i < 3; i++)
+ 

Re: [PATCH] RISC-V: Refine the condition for add additional vars in RVV cost model

2024-03-28 Thread juzhe.zh...@rivai.ai
Thanks a lot for trying to optimize the dynamic LMUL cost model.

The need_additional_vector_vars_p looks good to me.

But
-   = (*program_points_per_bb.get (bb)).length () - 1;
+   = (*program_points_per_bb.get (bb)).length ();
I wonder why you remove - 1?



juzhe.zh...@rivai.ai
 
From: demin.han
Date: 2024-03-28 18:31
To: gcc-patches
CC: juzhe.zhong; kito.cheng; pan2.li; jeffreyalaw; rdapp.gcc
Subject: [PATCH] RISC-V: Refine the condition for add additional vars in RVV 
cost model
adjacent_dr_p is a sufficient but not a necessary condition for contiguous access,
so unnecessary live ranges are added and result in spills.

This patch uses MEMORY_ACCESS_TYPE as the condition and adds a constraint for
segment load/store.

Tested on RV64 with no regressions.
 
PR target/114506
 
gcc/ChangeLog:
 
* config/riscv/riscv-vector-costs.cc (non_contiguous_memory_access_p): Rename
(need_additional_vector_vars_p): Rename and refine condition
 
gcc/testsuite/ChangeLog:
 
* gcc.dg/vect/costmodel/riscv/rvv/pr114506.c: New test.
 
Signed-off-by: demin.han 
---
gcc/config/riscv/riscv-vector-costs.cc| 25 ---
.../vect/costmodel/riscv/rvv/pr114506.c   | 23 +
2 files changed, 39 insertions(+), 9 deletions(-)
create mode 100644 gcc/testsuite/gcc.dg/vect/costmodel/riscv/rvv/pr114506.c
 
diff --git a/gcc/config/riscv/riscv-vector-costs.cc 
b/gcc/config/riscv/riscv-vector-costs.cc
index f462c272a6e..9f7fe936a29 100644
--- a/gcc/config/riscv/riscv-vector-costs.cc
+++ b/gcc/config/riscv/riscv-vector-costs.cc
@@ -563,14 +563,24 @@ get_store_value (gimple *stmt)
 return gimple_assign_rhs1 (stmt);
}
-/* Return true if it is non-contiguous load/store.  */
+/* Return true if additional vector vars are needed.  */
static bool
-non_contiguous_memory_access_p (stmt_vec_info stmt_info)
+need_additional_vector_vars_p (stmt_vec_info stmt_info)
{
   enum stmt_vec_info_type type
 = STMT_VINFO_TYPE (vect_stmt_to_vectorize (stmt_info));
-  return ((type == load_vec_info_type || type == store_vec_info_type)
-   && !adjacent_dr_p (STMT_VINFO_DATA_REF (stmt_info)));
+  if (type == load_vec_info_type || type == store_vec_info_type)
+{
+  if (STMT_VINFO_GATHER_SCATTER_P (stmt_info)
+   && STMT_VINFO_MEMORY_ACCESS_TYPE (stmt_info) == VMAT_GATHER_SCATTER)
+ return true;
+
+  machine_mode mode = TYPE_MODE (STMT_VINFO_VECTYPE (stmt_info));
+  int lmul = riscv_get_v_regno_alignment (mode);
+  if (DR_GROUP_SIZE (stmt_info) * lmul > RVV_M8)
+ return true;
+}
+  return false;
}
/* Return the LMUL of the current analysis.  */
@@ -739,10 +749,7 @@ update_local_live_ranges (
  stmt_vec_info stmt_info = vinfo->lookup_stmt (gsi_stmt (si));
  enum stmt_vec_info_type type
= STMT_VINFO_TYPE (vect_stmt_to_vectorize (stmt_info));
-   if (non_contiguous_memory_access_p (stmt_info)
-   /* LOAD_LANES/STORE_LANES doesn't need a perm indice.  */
-   && STMT_VINFO_MEMORY_ACCESS_TYPE (stmt_info)
-!= VMAT_LOAD_STORE_LANES)
+   if (need_additional_vector_vars_p (stmt_info))
{
  /* For non-adjacent load/store STMT, we will potentially
convert it into:
@@ -752,7 +759,7 @@ update_local_live_ranges (
We will be likely using one more vector variable.  */
  unsigned int max_point
- = (*program_points_per_bb.get (bb)).length () - 1;
+ = (*program_points_per_bb.get (bb)).length ();
  auto *live_ranges = live_ranges_per_bb.get (bb);
  bool existed_p = false;
  tree var = type == load_vec_info_type
diff --git a/gcc/testsuite/gcc.dg/vect/costmodel/riscv/rvv/pr114506.c 
b/gcc/testsuite/gcc.dg/vect/costmodel/riscv/rvv/pr114506.c
new file mode 100644
index 000..a88d24b2d2d
--- /dev/null
+++ b/gcc/testsuite/gcc.dg/vect/costmodel/riscv/rvv/pr114506.c
@@ -0,0 +1,23 @@
+/* { dg-do compile } */
+/* { dg-options "-march=rv64gcv -mabi=lp64d -O3 -ftree-vectorize 
-mrvv-max-lmul=dynamic -fdump-tree-vect-details" } */
+
+float a[32000], b[32000], c[32000], d[32000];
+float aa[256][256], bb[256][256], cc[256][256];
+
+void
+s2275 ()
+{
+  for (int i = 0; i < 256; i++)
+{
+  for (int j = 0; j < 256; j++)
+ {
+   aa[j][i] = aa[j][i] + bb[j][i] * cc[j][i];
+ }
+  a[i] = b[i] + c[i] * d[i];
+}
+}
+
+/* { dg-final { scan-assembler-times {e32,m8} 1 } } */
+/* { dg-final { scan-assembler-not {e32,m4} } } */
+/* { dg-final { scan-assembler-not {csrr} } } */
+/* { dg-final { scan-tree-dump-not "Preferring smaller LMUL loop because it 
has unexpected spills" "vect" } } */
-- 
2.44.0
 
 


[PATCH] RISC-V: Refine the condition for add additional vars in RVV cost model

2024-03-28 Thread demin.han
adjacent_dr_p is a sufficient but not a necessary condition for contiguous access,
so unnecessary live ranges are added and result in spills.

This patch uses MEMORY_ACCESS_TYPE as the condition and adds a constraint for
segment load/store.
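
Stripped of the vectorizer data structures, the refined condition amounts to roughly the following. This is a simplified model with invented types (access_kind, mem_access), and it assumes that LMUL and group size are plain integers with M8 equal to 8; the real predicate is need_additional_vector_vars_p in the diff below:

```cpp
#include <cassert>

// Invented, simplified description of one vectorized load/store.
enum class access_kind { contiguous, gather_scatter, load_store_lanes };

struct mem_access
{
  access_kind kind;
  int group_size;  // number of fields accessed together
  int lmul;        // registers per vector operand: 1, 2, 4 or 8
};

// In this model an extra vector variable is counted for gather/scatter
// accesses, or when group size times LMUL exceeds M8.
inline bool
needs_extra_vector_var (const mem_access &a)
{
  constexpr int m8 = 8;
  if (a.kind == access_kind::gather_scatter)
    return true;
  return a.group_size * a.lmul > m8;
}
```

A plain contiguous access at M8 therefore no longer reserves an extra live range in this model, which is the case the old adjacent_dr_p check could misclassify.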

Tested on RV64 with no regressions.

PR target/114506

gcc/ChangeLog:

* config/riscv/riscv-vector-costs.cc (non_contiguous_memory_access_p): 
Rename
(need_additional_vector_vars_p): Rename and refine condition

gcc/testsuite/ChangeLog:

* gcc.dg/vect/costmodel/riscv/rvv/pr114506.c: New test.

Signed-off-by: demin.han 
---
 gcc/config/riscv/riscv-vector-costs.cc| 25 ---
 .../vect/costmodel/riscv/rvv/pr114506.c   | 23 +
 2 files changed, 39 insertions(+), 9 deletions(-)
 create mode 100644 gcc/testsuite/gcc.dg/vect/costmodel/riscv/rvv/pr114506.c

diff --git a/gcc/config/riscv/riscv-vector-costs.cc b/gcc/config/riscv/riscv-vector-costs.cc
index f462c272a6e..9f7fe936a29 100644
--- a/gcc/config/riscv/riscv-vector-costs.cc
+++ b/gcc/config/riscv/riscv-vector-costs.cc
@@ -563,14 +563,24 @@ get_store_value (gimple *stmt)
 return gimple_assign_rhs1 (stmt);
 }
 
-/* Return true if it is non-contiguous load/store.  */
+/* Return true if additional vector vars are needed.  */
 static bool
-non_contiguous_memory_access_p (stmt_vec_info stmt_info)
+need_additional_vector_vars_p (stmt_vec_info stmt_info)
 {
   enum stmt_vec_info_type type
 = STMT_VINFO_TYPE (vect_stmt_to_vectorize (stmt_info));
-  return ((type == load_vec_info_type || type == store_vec_info_type)
- && !adjacent_dr_p (STMT_VINFO_DATA_REF (stmt_info)));
+  if (type == load_vec_info_type || type == store_vec_info_type)
+{
+  if (STMT_VINFO_GATHER_SCATTER_P (stmt_info)
+ && STMT_VINFO_MEMORY_ACCESS_TYPE (stmt_info) == VMAT_GATHER_SCATTER)
+   return true;
+
+  machine_mode mode = TYPE_MODE (STMT_VINFO_VECTYPE (stmt_info));
+  int lmul = riscv_get_v_regno_alignment (mode);
+  if (DR_GROUP_SIZE (stmt_info) * lmul > RVV_M8)
+   return true;
+}
+  return false;
 }
 
 /* Return the LMUL of the current analysis.  */
@@ -739,10 +749,7 @@ update_local_live_ranges (
  stmt_vec_info stmt_info = vinfo->lookup_stmt (gsi_stmt (si));
  enum stmt_vec_info_type type
= STMT_VINFO_TYPE (vect_stmt_to_vectorize (stmt_info));
- if (non_contiguous_memory_access_p (stmt_info)
- /* LOAD_LANES/STORE_LANES doesn't need a perm indice.  */
- && STMT_VINFO_MEMORY_ACCESS_TYPE (stmt_info)
-  != VMAT_LOAD_STORE_LANES)
+ if (need_additional_vector_vars_p (stmt_info))
{
  /* For non-adjacent load/store STMT, we will potentially
 convert it into:
@@ -752,7 +759,7 @@ update_local_live_ranges (
 
We will be likely using one more vector variable.  */
  unsigned int max_point
-   = (*program_points_per_bb.get (bb)).length () - 1;
+   = (*program_points_per_bb.get (bb)).length ();
  auto *live_ranges = live_ranges_per_bb.get (bb);
  bool existed_p = false;
  tree var = type == load_vec_info_type
diff --git a/gcc/testsuite/gcc.dg/vect/costmodel/riscv/rvv/pr114506.c b/gcc/testsuite/gcc.dg/vect/costmodel/riscv/rvv/pr114506.c
new file mode 100644
index 000..a88d24b2d2d
--- /dev/null
+++ b/gcc/testsuite/gcc.dg/vect/costmodel/riscv/rvv/pr114506.c
@@ -0,0 +1,23 @@
+/* { dg-do compile } */
+/* { dg-options "-march=rv64gcv -mabi=lp64d -O3 -ftree-vectorize -mrvv-max-lmul=dynamic -fdump-tree-vect-details" } */
+
+float a[32000], b[32000], c[32000], d[32000];
+float aa[256][256], bb[256][256], cc[256][256];
+
+void
+s2275 ()
+{
+  for (int i = 0; i < 256; i++)
+{
+  for (int j = 0; j < 256; j++)
+   {
+ aa[j][i] = aa[j][i] + bb[j][i] * cc[j][i];
+   }
+  a[i] = b[i] + c[i] * d[i];
+}
+}
+
+/* { dg-final { scan-assembler-times {e32,m8} 1 } } */
+/* { dg-final { scan-assembler-not {e32,m4} } } */
+/* { dg-final { scan-assembler-not {csrr} } } */
+/* { dg-final { scan-tree-dump-not "Preferring smaller LMUL loop because it has unexpected spills" "vect" } } */
-- 
2.44.0



Re: [PATCH] middle-end/114480 - IDF compute is slow

2024-03-28 Thread Richard Biener
On Wed, 27 Mar 2024, Michael Matz wrote:

> Hey,
> 
> On Wed, 27 Mar 2024, Jakub Jelinek wrote:
> 
> > > @@ -1712,12 +1711,9 @@ compute_idf (bitmap def_blocks, bitmap_head *dfs)
> > >gcc_checking_assert (bb_index
> > >  < (unsigned) last_basic_block_for_fn (cfun));
> > >  
> > > -  EXECUTE_IF_AND_COMPL_IN_BITMAP (&dfs[bb_index], phi_insertion_points,
> > > -   0, i, bi)
> > > - {
> > > +  EXECUTE_IF_SET_IN_BITMAP (&dfs[bb_index], 0, i, bi)
> > > + if (bitmap_set_bit (phi_insertion_points, i))
> > > bitmap_set_bit (work_set, i);
> > > -   bitmap_set_bit (phi_insertion_points, i);
> > > - }
> > >  }
> > 
> > I don't understand why the above is better.
> > Wouldn't it be best to do
> >   bitmap_ior_and_compl_into (work_set, &dfs[bb_index],
> >  phi_insertion_points);
> >   bitmap_ior_into (phi_insertion_points, &dfs[bb_index]);
> > ?
> 
> I had the same hunch, but:
> 
> 1) One would have to make work_set be non-tree-view again (which with the 
> current structure is a wash anyway, and that makes sense as accesses to 
> work_set aren't heavily random here).

The tree-view is a wash indeed (I tried many things).

> 2) But doing that and using bitmap_ior.._into is still measurably slower: 
> on a reduced testcase with -O0 -fno-checking, proposed structure 
> (tree-view or not-tree-view workset doesn't matter):
> 
>  tree SSA rewrite   :  14.93 ( 12%)   0.01 (  2%)  14.95 ( 12%)    27M (  8%)
> 
> with non-tree-view, and your suggestion:
> 
>  tree SSA rewrite   :  20.68 ( 12%)   0.02 (  4%)  20.75 ( 12%)    27M (  8%)
> 
> I can only speculate that the usually extreme sparsity of the bitmaps in 
> question makes the setup costs of the two bitmap_ior calls actually more 
> expensive than the often skipped second call to bitmap_set_bit in Richi's 
> proposed structure.  (That or cache effects)

So slightly "better" than Jakub's variant would be

  if (bitmap_ior_and_compl_into (work_set, &dfs[bb_index],
 phi_insertion_points))
bitmap_ior_into (phi_insertion_points, &dfs[bb_index]);

since phi_insertion_points grows, so that IOR becomes more expensive over 
time.

The above for me (today Zen2, yesterday Zen4) is

 tree SSA rewrite   : 181.02 ( 37%) 

with unconditional ior_into:

 tree SSA rewrite   : 180.93 ( 36%)

while my patch is

 tree SSA rewrite   :  22.04 (  6%)

Not sure what uarch Micha tested on.  I think the testcase simply has
many variables we write into SSA (many compute_idf calls) and many BBs
but very low popcount DFS[], so iterating over DFS[] only is very
beneficial here as opposed to also walking phi_insertion_points
and work_set.  I think low popcount DFS[] is quite typical
for a CFG - but for sure the popcount of DFS[] is going to be lower
than the popcount of the IDF (phi_insertion_points).
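
The shape of the loop in my patch can be sketched outside GCC. The following is a toy model only: std::set<int> stands in for the bitmaps, frontier for a single DFS[bb_index] entry, and the function name is invented for illustration. It visits only the (typically tiny) frontier and uses the result of the insert to decide whether to queue the block:

```cpp
#include <cassert>
#include <set>
#include <vector>

// Toy model: for each block in FRONTIER, add it to PHI_POINTS and, only if
// it was not already present, also queue it in WORK.  Returns how many
// blocks were newly added.  The number of loop iterations depends on the
// size of FRONTIER only; PHI_POINTS and WORK are never walked.
inline unsigned
add_frontier (const std::set<int> &frontier, std::set<int> &phi_points,
              std::vector<int> &work)
{
  unsigned added = 0;
  for (int bb : frontier)
    if (phi_points.insert (bb).second)  // true only when BB is new
      {
        work.push_back (bb);
        ++added;
      }
  return added;
}
```

The bulk IOR variants instead have to traverse the ever-growing result set on every call, which is where the time goes when the frontiers are small and the final IDF is large.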

Btw, with my patch compute_idf is completely off the profile so it's
hard to improve further (we do process blocks possibly twice for
example, but that doesn't make a difference here)

Indeed doing statistics shows the maximum popcount of a dominance
frontier is 8 but 99% have just a single block.  But the popcount
of the final IDF is more than 1 half of the time and more
than 1000 90% of the time.

I have pushed the patch now.

Richard.


RE: [PATCH v1] RISC-V: Allow RVV intrinsic for more function target

2024-03-28 Thread Li, Pan2
I see. This failure occurs because we have zve32x (TARGET_VECTOR is true) on the 
command line, so we don't do the reinit in riscv_pragma_intrinsic in v1.

As I understand it, we need something like the below, regardless of whether 
TARGET_VECTOR is true or false.

int flags_backup = flags;
int new_flags = flags | ...;
flags = new_flags;
reinit ();

flags = flags_backup;
reinit ();
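
The save/modify/reinit/restore sequence above could also be wrapped in a scope guard so that the restore cannot be forgotten. This is only a sketch: target_flags_sketch, reinit_sketch, reinit_count and scoped_flags are made-up names, not the riscv backend interface:

```cpp
#include <cassert>

// Hypothetical global state standing in for the target flags and for the
// re-initialization that must follow every flags change.
static int target_flags_sketch = 0;
static int reinit_count = 0;
static void reinit_sketch () { ++reinit_count; }

// Temporarily OR extra bits into the flags; restore and reinit on scope exit.
class scoped_flags
{
public:
  explicit scoped_flags (int extra) : m_saved (target_flags_sketch)
  {
    target_flags_sketch |= extra;
    reinit_sketch ();
  }
  ~scoped_flags ()
  {
    target_flags_sketch = m_saved;
    reinit_sketch ();
  }
  scoped_flags (const scoped_flags &) = delete;
  scoped_flags &operator= (const scoped_flags &) = delete;

private:
  int m_saved;
};
```

With this shape the guard runs the same way whether or not the extra bits were already set, which matches the "no matter TARGET_VECTOR" requirement.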

> Also I guess all zvk* and zvbb may also need to be added as well,
> but...I suspect it's not scalable way?

If zvk* and zvbb don't introduce new modes, I suspect we don't need to add 
them here; let me double-check and update in v2.

Pan

-Original Message-
From: Li, Pan2  
Sent: Thursday, March 28, 2024 3:32 PM
To: Kito Cheng 
Cc: gcc-patches@gcc.gnu.org; juzhe.zh...@rivai.ai; Wang, Yanzhang 

Subject: RE: [PATCH v1] RISC-V: Allow RVV intrinsic for more function target

Thanks Kito, it looks like I missed this case in testing; let me check it out.

Pan

-Original Message-
From: Kito Cheng  
Sent: Thursday, March 28, 2024 2:44 PM
To: Li, Pan2 
Cc: gcc-patches@gcc.gnu.org; juzhe.zh...@rivai.ai; Wang, Yanzhang 

Subject: Re: [PATCH v1] RISC-V: Allow RVV intrinsic for more function target

Just tried something interesting:

$ riscv64-unknown-linux-gnu-gcc -march=rv64gc -O target_attribute_v_with_intrinsic-9.c -S # Works
$ riscv64-unknown-linux-gnu-gcc -march=rv64gc_zve32x -O target_attribute_v_with_intrinsic-9.c -S # Does not work

Also I guess all zvk* and zvbb may need to be added as well,
but... I suspect that's not a scalable way?


RE: [PATCH v1] RISC-V: Allow RVV intrinsic for more function target

2024-03-28 Thread Li, Pan2
Thanks Kito, it looks like I missed this case in testing; let me check it out.

Pan

-Original Message-
From: Kito Cheng  
Sent: Thursday, March 28, 2024 2:44 PM
To: Li, Pan2 
Cc: gcc-patches@gcc.gnu.org; juzhe.zh...@rivai.ai; Wang, Yanzhang 

Subject: Re: [PATCH v1] RISC-V: Allow RVV intrinsic for more function target

Just tried something interesting:

$ riscv64-unknown-linux-gnu-gcc -march=rv64gc -O target_attribute_v_with_intrinsic-9.c -S # Works
$ riscv64-unknown-linux-gnu-gcc -march=rv64gc_zve32x -O target_attribute_v_with_intrinsic-9.c -S # Does not work

Also I guess all zvk* and zvbb may need to be added as well,
but... I suspect that's not a scalable way?


Re: [committed] amdgcn: Prefer V32 on RDNA devices

2024-03-28 Thread Thomas Schwinge
Hi Andrew!

On 2024-03-22T15:54:48+, Andrew Stubbs  wrote:
> This patch alters the default (preferred) vector size to 32 on RDNA devices to
> better match the actual hardware.  64-lane vectors will continue to be
> used where they are hard-coded (such as function prologues).
>
> We run these devices in wavefrontsize64 for compatibility, but they actually
> only have 32-lane vectors, natively.  If the upper part of a V64 is masked
> off (as it is in V32) then RDNA devices will skip execution of the upper part
> for most operations, so this adjustment shouldn't leave too much performance on
> the table.  One exception is memory instructions, so full wavefrontsize32
> support would be better.
>
> The advantage is that we avoid the missing V64 operations (such as permute and
> vec_extract).
>
> Committed to mainline.

In my GCN target '-march=gfx1100' testing, this commit
"amdgcn: Prefer V32 on RDNA devices" does resolve (or, make latent?) a
number of execution test FAILs (that is, regressions compared to earlier
'-march=gfx90a' etc. testing).

This commit also resolves (for my '-march=gfx1100' testing) one
pre-existing FAIL (that is, already seen in '-march=gfx90a' earlier
etc. testing):

PASS: gcc.dg/tree-ssa/scev-14.c (test for excess errors)
[-FAIL:-]{+PASS:+} gcc.dg/tree-ssa/scev-14.c scan-tree-dump ivopts 
"Overflowness wrto loop niter:\tNo-overflow"

That means, this test case specifically (or, just its 'scan-tree-dump'?)
needs to be adjusted for GCN V64 testing?

This commit, as you'd also mentioned elsewhere, however also causes a
number of regressions in 'gcc.target/gcn/gcn.exp', see list below.

Those can be "fixed" with 'dg-additional-options -march=gfx90a' (or
similar) in the affected test cases (let me know if you'd like me to
'git push' that), but I suppose something more elaborate may be in order?
(Conditionalize those on 'target { ! gcn_rdna }', and add respective
scanning for 'target gcn_rdna'?  I can help with effective-target
'gcn_rdna' (or similar), if you'd like me to.)

And/or, have a '-mpreferred-simd-mode=v64' (or similar) to be used for
such test cases, to override 'if (TARGET_RDNA2_PLUS)' etc. in
'gcn_vectorize_preferred_simd_mode'?

Best, probably, both these things, to properly test both V32 and V64?

PASS: gcc.target/gcn/cond_fmaxnm_1.c (test for excess errors)
[-PASS:-]{+FAIL:+} gcc.target/gcn/cond_fmaxnm_1.c scan-assembler-not 
\\tv_writelane_b32\\tv[0-9]+, vcc_..
[-PASS:-]{+FAIL:+} gcc.target/gcn/cond_fmaxnm_1.c scan-assembler-times 
smaxv64df3_exec 3
[-PASS:-]{+FAIL:+} gcc.target/gcn/cond_fmaxnm_1.c scan-assembler-times 
smaxv64sf3_exec 3
PASS: gcc.target/gcn/cond_fmaxnm_1_run.c (test for excess errors)
PASS: gcc.target/gcn/cond_fmaxnm_1_run.c execution test

PASS: gcc.target/gcn/cond_fmaxnm_2.c (test for excess errors)
[-PASS:-]{+FAIL:+} gcc.target/gcn/cond_fmaxnm_2.c scan-assembler-not 
\\tv_writelane_b32\\tv[0-9]+, vcc_..
[-PASS:-]{+FAIL:+} gcc.target/gcn/cond_fmaxnm_2.c scan-assembler-times 
smaxv64df3_exec 3
[-PASS:-]{+FAIL:+} gcc.target/gcn/cond_fmaxnm_2.c scan-assembler-times 
smaxv64sf3_exec 3
PASS: gcc.target/gcn/cond_fmaxnm_2_run.c (test for excess errors)
PASS: gcc.target/gcn/cond_fmaxnm_2_run.c execution test

PASS: gcc.target/gcn/cond_fmaxnm_3.c (test for excess errors)
[-PASS:-]{+FAIL:+} gcc.target/gcn/cond_fmaxnm_3.c scan-assembler-not 
\\tv_writelane_b32\\tv[0-9]+, vcc_..
[-PASS:-]{+FAIL:+} gcc.target/gcn/cond_fmaxnm_3.c scan-assembler-times 
movv64df_exec 3
[-PASS:-]{+FAIL:+} gcc.target/gcn/cond_fmaxnm_3.c scan-assembler-times 
movv64sf_exec 3
[-PASS:-]{+FAIL:+} gcc.target/gcn/cond_fmaxnm_3.c scan-assembler-times 
smaxv64sf3 3
[-PASS:-]{+FAIL:+} gcc.target/gcn/cond_fmaxnm_3.c scan-assembler-times 
smaxv64sf3 3
PASS: gcc.target/gcn/cond_fmaxnm_3_run.c (test for excess errors)
PASS: gcc.target/gcn/cond_fmaxnm_3_run.c execution test

PASS: gcc.target/gcn/cond_fmaxnm_4.c (test for excess errors)
[-PASS:-]{+FAIL:+} gcc.target/gcn/cond_fmaxnm_4.c scan-assembler-not 
\\tv_writelane_b32\\tv[0-9]+, vcc_..
[-PASS:-]{+FAIL:+} gcc.target/gcn/cond_fmaxnm_4.c scan-assembler-times 
movv64df_exec 3
[-PASS:-]{+FAIL:+} gcc.target/gcn/cond_fmaxnm_4.c scan-assembler-times 
movv64sf_exec 3
[-PASS:-]{+FAIL:+} gcc.target/gcn/cond_fmaxnm_4.c scan-assembler-times 
smaxv64sf3 3
[-PASS:-]{+FAIL:+} gcc.target/gcn/cond_fmaxnm_4.c scan-assembler-times 
smaxv64sf3 3
PASS: gcc.target/gcn/cond_fmaxnm_4_run.c (test for excess errors)
PASS: gcc.target/gcn/cond_fmaxnm_4_run.c execution test

PASS: gcc.target/gcn/cond_fmaxnm_5.c (test for excess errors)
[-PASS:-]{+FAIL:+} gcc.target/gcn/cond_fmaxnm_5.c scan-assembler-not 
\\tv_writelane_b32\\tv[0-9]+, vcc_..
[-PASS:-]{+FAIL:+} gcc.target/gcn/cond_fmaxnm_5.c scan-assembler-times 
smaxv64df3_exec 3
[-PASS:-]{+FAIL:+} gcc.target/gcn/cond_fmaxnm_5.c scan-assembler-times 
smaxv64sf3_exec 3
PASS: 

Re: [PATCH v1] RISC-V: Allow RVV intrinsic for more function target

2024-03-28 Thread Kito Cheng
Just tried something interesting:

$ riscv64-unknown-linux-gnu-gcc -march=rv64gc -O target_attribute_v_with_intrinsic-9.c -S # Works
$ riscv64-unknown-linux-gnu-gcc -march=rv64gc_zve32x -O target_attribute_v_with_intrinsic-9.c -S # Does not work

Also I guess all zvk* and zvbb may need to be added as well,
but... I suspect that's not a scalable way?


Re: [PATCH] RISC-V: Add vxsat as a register

2024-03-28 Thread Kito Cheng
LGTM, and committed to trunk :)

On Thu, Mar 28, 2024 at 5:37 AM Palmer Dabbelt  wrote:
>
> We aren't doing anything with vxsat right now, but I'd like to add it as
> an accepted register to the clobber list.  If we get this into GCC-14
> then we'll avoid some preprocessor-based twiddling if we ever start
> using vxsat in the future.
>
> gcc/ChangeLog:
>
> * config/riscv/riscv.h (REGISTER_NAMES): Add vxsat.
> ---
> IIUC we aren't using these N/A regnos for anything, they're just there to pad
> out the types.  So I think this is safe, but Juzhe would likely know best here.
>
> See
> https://inbox.sourceware.org/libc-alpha/20240327193601.28903-2-pal...@rivosinc.com/
> for a use of this.
> ---
>  gcc/config/riscv/riscv.h | 2 +-
>  1 file changed, 1 insertion(+), 1 deletion(-)
>
> diff --git a/gcc/config/riscv/riscv.h b/gcc/config/riscv/riscv.h
> index da089a03e9d..d5779512994 100644
> --- a/gcc/config/riscv/riscv.h
> +++ b/gcc/config/riscv/riscv.h
> @@ -933,7 +933,7 @@ extern enum riscv_cc get_riscv_cc (const rtx use);
>"fs0", "fs1", "fa0", "fa1", "fa2", "fa3", "fa4", "fa5",  \
>"fa6", "fa7", "fs2", "fs3", "fs4", "fs5", "fs6", "fs7",  \
>"fs8", "fs9", "fs10","fs11","ft8", "ft9", "ft10","ft11", \
> -  "arg", "frame", "vl", "vtype", "vxrm", "frm", "N/A", "N/A",   \
> +  "arg", "frame", "vl", "vtype", "vxrm", "frm", "vxsat", "N/A", \
>"N/A", "N/A", "N/A", "N/A", "N/A", "N/A", "N/A", "N/A",  \
>"N/A", "N/A", "N/A", "N/A", "N/A", "N/A", "N/A", "N/A",  \
>"N/A", "N/A", "N/A", "N/A", "N/A", "N/A", "N/A", "N/A",  \
> --
> 2.44.0
>