[PATCH] Document refactoring of the option -fcf-protection=x.

2024-01-09 Thread liuhongt
To override -fcf-protection, first add -fcf-protection=none and then
-fcf-protection=xxx.
---
 htdocs/gcc-14/changes.html | 6 ++
 1 file changed, 6 insertions(+)

diff --git a/htdocs/gcc-14/changes.html b/htdocs/gcc-14/changes.html
index e3a68998..72b0d291 100644
--- a/htdocs/gcc-14/changes.html
+++ b/htdocs/gcc-14/changes.html
@@ -40,6 +40,12 @@ a work-in-progress.
   https://gcc.gnu.org/onlinedocs/gcc/Warning-Options.html#index-Wflex-array-member-not-at-end;>-Wflex-array-member-not-at-end
 to
   identify all such cases in the source code and modify them.
   
+  https://gcc.gnu.org/onlinedocs/gcc/Instrumentation-Options.html;>-fcf-protection=[full|branch|return|none|check]
+  has been refactored. To override an earlier -fcf-protection,
+  first add -fcf-protection=none and then
+  -fcf-protection=xxx.
+  
+
 
 
 
-- 
2.31.1
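The override rule documented by this patch can be illustrated with a small C model. This is a sketch under the assumption that, after the refactoring, successive -fcf-protection= values combine bitwise and only "none" clears what came before. The enum values and the functions cf_bits and fold_cf_protection are hypothetical and are not GCC's option-handling code; the "check" value is not modeled.

```c
#include <string.h>

/* Hypothetical bit encoding of the protection levels (not GCC's).  */
enum cf_level
{
  CF_NONE = 0,
  CF_BRANCH = 1,
  CF_RETURN = 2,
  CF_FULL = CF_BRANCH | CF_RETURN
};

/* Map one option value to its bits; -1 for values this sketch does
   not model, such as "check".  */
static int
cf_bits (const char *value)
{
  if (strcmp (value, "none") == 0)
    return CF_NONE;
  if (strcmp (value, "branch") == 0)
    return CF_BRANCH;
  if (strcmp (value, "return") == 0)
    return CF_RETURN;
  if (strcmp (value, "full") == 0)
    return CF_FULL;
  return -1;
}

/* Fold -fcf-protection= values left to right, assuming the refactored
   behaviour: "none" resets the level, any other value is ORed in.  */
static int
fold_cf_protection (const char *const *values, int n)
{
  int level = CF_NONE;
  for (int i = 0; i < n; i++)
    {
      int bits = cf_bits (values[i]);
      if (bits < 0)
        continue;               /* Unmodeled value: ignore it.  */
      if (bits == CF_NONE)
        level = CF_NONE;        /* "none" discards earlier values.  */
      else
        level |= bits;          /* Other values accumulate.  */
    }
  return level;
}
```

Under this model, `-fcf-protection=full -fcf-protection=branch` still yields full protection, while `-fcf-protection=full -fcf-protection=none -fcf-protection=branch` yields branch-only protection, which is why the note says to add -fcf-protection=none first.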



[PATCH] i386: Add AVX10.1 related macros

2024-01-09 Thread Haochen Jiang
Hi all,

This patch adds AVX10.1-related macros, as requested for libgomp. The
request is here:

https://gcc.gnu.org/pipermail/gcc-patches/2024-January/642025.html

Ok for trunk?

Thx,
Haochen

gcc/ChangeLog:

PR target/113288
* config/i386/i386-c.cc (ix86_target_macros_internal):
Add __AVX10_1__, __AVX10_1_256__ and __AVX10_1_512__.
---
 gcc/config/i386/i386-c.cc | 7 +++
 1 file changed, 7 insertions(+)

diff --git a/gcc/config/i386/i386-c.cc b/gcc/config/i386/i386-c.cc
index c3ae984670b..366b560158a 100644
--- a/gcc/config/i386/i386-c.cc
+++ b/gcc/config/i386/i386-c.cc
@@ -735,6 +735,13 @@ ix86_target_macros_internal (HOST_WIDE_INT isa_flag,
     def_or_undef (parse_in, "__EVEX512__");
   if (isa_flag2 & OPTION_MASK_ISA2_USER_MSR)
     def_or_undef (parse_in, "__USER_MSR__");
+  if (isa_flag2 & OPTION_MASK_ISA2_AVX10_1_256)
+    {
+      def_or_undef (parse_in, "__AVX10_1_256__");
+      def_or_undef (parse_in, "__AVX10_1__");
+    }
+  if (isa_flag2 & OPTION_MASK_ISA2_AVX10_1_512)
+    def_or_undef (parse_in, "__AVX10_1_512__");
   if (TARGET_IAMCU)
     {
       def_or_undef (parse_in, "__iamcu");
-- 
2.31.1
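A minimal sketch of how user code might consume the macros added by this patch, assuming a compiler that includes it; on other compilers none of the macros is defined and the function reports 0. The function name avx10_1_vector_width is hypothetical.

```c
/* Widest AVX10.1 vector length, in bits, enabled at compile time as
   reported by the predefined macros; 0 if AVX10.1 is not enabled.  */
static int
avx10_1_vector_width (void)
{
#if defined (__AVX10_1_512__)
  return 512;
#elif defined (__AVX10_1__)
  /* Per the patch, __AVX10_1__ is defined together with
     __AVX10_1_256__ when the 256-bit ISA flag is set.  */
  return 256;
#else
  return 0;
#endif
}
```

Checking __AVX10_1__ is enough to know that some AVX10.1 support is enabled; __AVX10_1_512__ distinguishes the 512-bit variant.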



Re:Re:[PATCH v5] RISC-V: Handle differences between XTheadvector and Vector

2024-01-09 Thread joshua
I'm confused about why I cannot add new shapes. I think adding
new shapes is a basic part of implementing new
intrinsics.







--
From: juzhe.zh...@rivai.ai 
Sent: Wednesday, January 10, 2024 15:17
To: "cooper.joshua"; 
"gcc-patches"
Cc: Jim Wilson; palmer; 
andrew; "philipp.tomsich"; 
jeffreyalaw; 
"christoph.muellner"; 
jinma; "cooper.qu"
Subject: Re: Re:[PATCH v5] RISC-V: Handle differences between XTheadvector and Vector


Why do you need to invade existing shapes?




juzhe.zh...@rivai.ai

 
From: joshua
Sent: 2024-01-10 15:16
To: juzhe.zh...@rivai.ai; gcc-patches
Cc: Jim Wilson; palmer; andrew; philipp.tomsich; jeffreyalaw; 
christoph.muellner; jinma; cooper.qu
Subject: Re:Re:[PATCH v5] RISC-V: Handle differences between XTheadvector and Vector

These XTheadVector special intrinsics differ from RVV 1.0
in how the function name is derived from the base name, so we cannot
directly reuse the existing shapes.
 
To avoid invading the existing shapes, we add new shapes for the new
functions. We also create a new file, thead-vector-builtins.cc, for the
xtheadvector function_base implementation.
 
 
 
 
 
--
From: juzhe.zh...@rivai.ai 
Sent: Wednesday, January 10, 2024 15:01
To: "cooper.joshua"; 
"gcc-patches"
Cc: Jim Wilson; palmer; 
andrew; "philipp.tomsich"; 
jeffreyalaw; 
"christoph.muellner"; 
jinma; "cooper.qu"
Subject: Re: Re:[PATCH v5] RISC-V: Handle differences between XTheadvector and Vector
 
 
Why do you add theadvector shapes? I think you can reuse the existing 
shapes.
 
 
+thead-vector-builtins.o: \
+  $(srcdir)/config/riscv/thead-vector-builtins.cc \
+  $(CONFIG_H) $(SYSTEM_H) coretypes.h $(TM_H) $(TREE_H) $(RTL_H) \
+  $(TM_P_H) memmodel.h insn-codes.h $(OPTABS_H) $(RECOG_H) \
+  $(EXPR_H) $(BASIC_BLOCK_H) $(FUNCTION_H) fold-const.h $(GIMPLE_H) \
+  gimple-iterator.h gimplify.h explow.h $(EMIT_RTL_H) tree-vector-builder.h \
+  rtx-vector-builder.h \
+  $(srcdir)/config/riscv/riscv-vector-builtins-shapes.h \
+  $(srcdir)/config/riscv/riscv-vector-builtins-bases.h \
+  $(srcdir)/config/riscv/thead-vector-builtins.h \
+  $(RISCV_BUILTINS_H)
+	$(COMPILER) -c $(ALL_COMPILERFLAGS) $(ALL_CPPFLAGS) $(INCLUDES) \
+		$(srcdir)/config/riscv/thead-vector-builtins.cc
+
Why do you build another new object file?
 
 
+   Copyright (C) 2022-2023 Free Software Foundation, Inc.
Incorrect copyright
 
 
 
 
 
 
juzhe.zh...@rivai.ai
 
 
From: joshua
Sent: 2024-01-10 10:57
To: juzhe.zh...@rivai.ai; gcc-patches
Cc: Jim Wilson; palmer; andrew; philipp.tomsich; jeffreyalaw; 
christoph.muellner; jinma; cooper.qu
Subject: Re:[PATCH v5] RISC-V: Handle differences between XTheadvector and Vector
 
Hi Juzhe,
Thank you for all the useful comments on this patch!
 
There are several more patches that support the xtheadvector
special intrinsics, handle the register overlap issue, and
rewrite the assembly output.
 
https://gcc.gnu.org/pipermail/gcc-patches/2024-January/641774.html
 
https://gcc.gnu.org/pipermail/gcc-patches/2024-January/641732.html
 
https://gcc.gnu.org/pipermail/gcc-patches/2024-January/641733.html
 
Also, there is an earlier patch that refactors riscv-vector-builtins-bases.cc:
https://gcc.gnu.org/pipermail/gcc-patches/2023-December/641530.html
Jeff has reviewed it but has not given an LGTM yet.
 
Joshua
 
--
From: juzhe.zh...@rivai.ai 
Sent: Wednesday, January 10, 2024 10:34
To: "cooper.joshua"; 
"gcc-patches"
Cc: Jim Wilson; palmer; 
andrew; "philipp.tomsich"; 
jeffreyalaw; 
"christoph.muellner"; 
"cooper.joshua"; 
jinma; "cooper.qu"
Subject: Re: [PATCH v5] RISC-V: Handle differences between XTheadvector and Vector
 
 
Thanks for your patience.
 
 
LGTM from my side.
 
 
I think it's pretty clean now. I can imagine that, if theadvector is
someday no longer used, we can remove it very easily.
 
 
Also, theadvector won't affect our RVV 1.0 maintenance since it is
cleanly isolated.
 
 
But I'd like to wait a few more days in case somebody wants to chime in.
 
 
Before committing it, you should also:
1. Run the full-coverage RVV 1.0 API tests; the test generator can be
downloaded from the official intrinsic doc:
https://github.com/riscv-non-isa/rvv-intrinsic-doc 
 
 
2. Run the GCC testsuite regression for both RV32 and RV64.
 
 
Do you have more theadvector patches that I haven't reviewed? Please
point me to them again.
 
 
Thanks.
juzhe.zh...@rivai.ai
 
 
From: Jun Sha (Joshua)
Date: 2024-01-10 10:22
To: gcc-patches
CC: jim.wilson.gcc; palmer; andrew; philipp.tomsich; jeffreyalaw; 
christoph.muellner; juzhe.zhong; Jun Sha (Joshua); Jin Ma; Xianmiao Qu
Subject: [PATCH v5] RISC-V: Handle differences between XTheadvector and Vector
 
This patch handles the differences in instruction generation
between Vector and XTheadVector. In this version, we only support
the subset of xtheadvector instructions that can be derived directly
from current RVV 1.0 by simply adding the "th." 

Re: Re:[PATCH v5] RISC-V: Handle differences between XTheadvector and Vector

2024-01-09 Thread joshua
For example:

+/* th_loadstore_width_def class.  */
+struct th_loadstore_width_def : public build_base
+{
+  void build (function_builder &b,
+              const function_group_info &group) const override
+  {
+    /* Report an error if there is no xtheadvector.  */
+    if (!TARGET_XTHEADVECTOR)
+      return;
+
+    build_all (b, group);
+  }
+
+  char *get_name (function_builder &b, const function_instance &instance,
+                  bool overloaded_p) const override
+  {
+    /* Report an error if there is no xtheadvector.  */
+    if (!TARGET_XTHEADVECTOR)
+      return nullptr;
+
+    /* Return nullptr if it can not be overloaded.  */
+    if (overloaded_p && !instance.base->can_be_overloaded_p (instance.pred))
+      return nullptr;
+
+    b.append_base_name (instance.base_name);
+
+    /* vop_v --> vop_v_.  */
+    if (!overloaded_p)
+      {
+        /* vop --> vop_v.  */
+        b.append_name (operand_suffixes[instance.op_info->op]);
+        /* vop_v --> vop_v_.  */
+        b.append_name (type_suffixes[instance.type.index].vector);
+      }
+
+    /* According to rvv-intrinsic-doc, it does not add "_m" suffix
+       for vop_m C++ overloaded API.  */
+    if (overloaded_p && instance.pred == PRED_TYPE_m)
+      return b.finish_name ();
+    b.append_name (predication_suffixes[instance.pred]);
+    return b.finish_name ();
+  }
+};

I cannot find exactly the same shape to reuse.
Maybe loadstore_def? But we do not need to do
vop --> vop for our new intrinsics; if we reuse
this shape, we need to add some logic there.

Also, the shape "th_extract" for "ext" is a new shape that
no existing shape implements.
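The naming logic in get_name above can be modeled in plain C. This is a simplified sketch, not the GCC function_builder API: the pred_type subset, the suffix strings and build_intrinsic_name are hypothetical, and each suffix is assumed to carry its own leading underscore.

```c
#include <stdio.h>

/* Hypothetical subset of predication kinds and their suffixes.  */
enum pred_type { PRED_NONE, PRED_M, PRED_TU };
static const char *const pred_suffixes[] = { "", "_m", "_tu" };

/* Append suffixes to BASE the way th_loadstore_width_def::get_name
   does: the non-overloaded name gets operand and type suffixes; the
   overloaded name gets neither and also omits the "_m" suffix.  */
static void
build_intrinsic_name (char *buf, size_t size, const char *base,
                      const char *op_suffix, const char *type_suffix,
                      enum pred_type pred, int overloaded_p)
{
  if (overloaded_p)
    snprintf (buf, size, "%s%s", base,
              pred == PRED_M ? "" : pred_suffixes[pred]);
  else
    snprintf (buf, size, "%s%s%s%s", base, op_suffix, type_suffix,
              pred_suffixes[pred]);
}
```

With the hypothetical inputs "th_vlb", "_v" and "_i8m1", the masked non-overloaded name comes out as th_vlb_v_i8m1_m, while the masked overloaded name is just th_vlb.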





[PATCH v1] LoongArch: testsuite: Fix a bug that added a target check error.

2024-01-09 Thread chenxiaolong
After r14-6948 was committed, GCC regression testing on some
architectures produces the following error:

"error executing dg-final: unknown effective target keyword `loongarch*-*-*'"

gcc/testsuite/ChangeLog:

* lib/target-supports.exp: Remove the "loongarch*-*-*" checks that
caused "unknown effective target keyword" errors.
---
 gcc/testsuite/lib/target-supports.exp | 2 --
 1 file changed, 2 deletions(-)

diff --git a/gcc/testsuite/lib/target-supports.exp b/gcc/testsuite/lib/target-supports.exp
index 5c6bb602cc0..dbc4f016091 100644
--- a/gcc/testsuite/lib/target-supports.exp
+++ b/gcc/testsuite/lib/target-supports.exp
@@ -7994,7 +7994,6 @@ proc check_effective_target_vect_widen_mult_qi_to_hi { } {
  || ([istarget aarch64*-*-*]
  && ![check_effective_target_aarch64_sve])
  || [is-effective-target arm_neon]
- || [is-effective-target loongarch*-*-*]
  || ([istarget s390*-*-*]
  && [check_effective_target_s390_vx])) 
  || [istarget amdgcn-*-*] }}]
@@ -8019,7 +8018,6 @@ proc check_effective_target_vect_widen_mult_hi_to_si { } {
 && ![check_effective_target_aarch64_sve])
 || [istarget i?86-*-*] || [istarget x86_64-*-*]
 || [is-effective-target arm_neon]
-|| [is-effective-target loongarch*-*-*]
 || ([istarget s390*-*-*]
 && [check_effective_target_s390_vx]))
 || [istarget amdgcn-*-*] }}]
-- 
2.20.1
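The error arises because is-effective-target expects an effective-target keyword (such as arm_neon), whereas loongarch*-*-* is a target-triplet glob of the kind istarget matches; the patch resolves it by removing the two clauses. The C sketch below models that difference with POSIX fnmatch. The keyword table and both function names are hypothetical stand-ins for the DejaGnu procedures, not their implementation.

```c
#include <fnmatch.h>
#include <string.h>

/* Hypothetical table of known effective-target keywords.  */
static const char *const known_keywords[] = { "arm_neon", "vect_strided4" };

/* Model of istarget: glob-match PATTERN against the target triplet.  */
static int
istarget_model (const char *pattern, const char *triplet)
{
  return fnmatch (pattern, triplet, 0) == 0;
}

/* Model of the keyword lookup in is-effective-target: returns 1 if
   KEYWORD names a known check, -1 for an unknown keyword (the case
   that produces the error quoted above).  */
static int
keyword_known_model (const char *keyword)
{
  for (size_t i = 0; i < sizeof known_keywords / sizeof known_keywords[0]; i++)
    if (strcmp (keyword, known_keywords[i]) == 0)
      return 1;
  return -1;
}
```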



[PATCH v2] LoongArch: testsuite: Add support for loongarch.

2024-01-09 Thread chenxiaolong
This test checks that the compiler supports vectorization using SLP and
vec_{load/store/*}_lanes. However, LoongArch does not support
vec_{load/store/*}_lanes (the counterparts of the "st4"/"ld4" instructions
on aarch64).

gcc/testsuite/ChangeLog:

* gcc.dg/vect/slp-21.c: Add loongarch.
---
 gcc/testsuite/gcc.dg/vect/slp-21.c | 4 ++--
 1 file changed, 2 insertions(+), 2 deletions(-)

diff --git a/gcc/testsuite/gcc.dg/vect/slp-21.c b/gcc/testsuite/gcc.dg/vect/slp-21.c
index 712a73b69d7..58751688414 100644
--- a/gcc/testsuite/gcc.dg/vect/slp-21.c
+++ b/gcc/testsuite/gcc.dg/vect/slp-21.c
@@ -213,7 +213,7 @@ int main (void)
 
Not all vect_perm targets support that, and it's a bit too specific to have
its own effective-target selector, so we just test targets directly.  */
-/* { dg-final { scan-tree-dump-times "vectorizing stmts using SLP" 4 "vect" { target { powerpc64*-*-* s390*-*-* } } } } */
-/* { dg-final { scan-tree-dump-times "vectorizing stmts using SLP" 2 "vect" { target { vect_strided4 && { ! { powerpc64*-*-* s390*-*-* } } } } } } */
+/* { dg-final { scan-tree-dump-times "vectorizing stmts using SLP" 4 "vect" { target { powerpc64*-*-* s390*-*-* loongarch*-*-* } } } } */
+/* { dg-final { scan-tree-dump-times "vectorizing stmts using SLP" 2 "vect" { target { vect_strided4 && { ! { powerpc64*-*-* s390*-*-* loongarch*-*-* } } } } } } */
 /* { dg-final { scan-tree-dump-times "vectorizing stmts using SLP" 0 "vect"  { target { ! { vect_strided4 } } } } } */
   
-- 
2.20.1



RE: [PATCH] RISC-V: Refine unsigned avg_floor/avg_ceil

2024-01-09 Thread Li, Pan2
Committed, thanks Kito.

Pan

-Original Message-
From: Kito Cheng  
Sent: Wednesday, January 10, 2024 3:12 PM
To: Juzhe-Zhong 
Cc: gcc-patches@gcc.gnu.org; kito.ch...@sifive.com; jeffreya...@gmail.com; 
rdapp@gmail.com
Subject: Re: [PATCH] RISC-V: Refine unsigned avg_floor/avg_ceil

LGTM!


Re: Re:[PATCH v5] RISC-V: Handle differences between XTheadvector and Vector

2024-01-09 Thread juzhe.zh...@rivai.ai
Why do you need to invade existing shapes?




juzhe.zh...@rivai.ai
 

Re:Re:[PATCH v5] RISC-V: Handle differences between XTheadvector and Vector

2024-01-09 Thread joshua
These XTheadVector special intrinsics differ from RVV 1.0
in how the function name is derived from the base name, so we cannot
directly reuse the existing shapes.

To avoid invading the existing shapes, we add new shapes for the new
functions. We also create a new file, thead-vector-builtins.cc, for the
xtheadvector function_base implementation.





From: Jun Sha (Joshua)
Date: 2024-01-10 10:22
To: gcc-patches
CC: jim.wilson.gcc; palmer; andrew; philipp.tomsich; jeffreyalaw; 
christoph.muellner; juzhe.zhong; Jun Sha (Joshua); Jin Ma; Xianmiao Qu
Subject: [PATCH v5] RISC-V: Handle differences between XTheadvector and Vector
 
This patch handles the differences in instruction generation
between Vector and XTheadVector. In this version, we only support
the subset of xtheadvector instructions that can be derived directly
from current RVV 1.0 by simply adding the "th." prefix. For xtheadvector
instructions that have different names but share the same patterns as
RVV 1.0 instructions, we will use an ASM target hook to rewrite the
whole instruction string in follow-up patches.
 
For some vector patterns that cannot be avoided, we use
"!TARGET_XTHEADVECTOR" to disable them in vector.md so that we do
not generate instructions that xtheadvector does not support,
such as vmv1r and vsext.vf2.
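For the instructions that XTheadVector shares verbatim with RVV 1.0, the only change described above is the "th." mnemonic prefix. The hypothetical C helper below illustrates just that string transformation; it is not how the patch implements it, which goes through GCC's instruction patterns.

```c
#include <stdio.h>

/* Hypothetical helper: write the XTheadVector spelling of an RVV 1.0
   mnemonic that both extensions share, by prefixing "th.".  Returns 0
   on success, -1 if BUF is too small (the output is then truncated).  */
static int
th_prefixed_mnemonic (const char *rvv, char *buf, size_t size)
{
  int len = snprintf (buf, size, "th.%s", rvv);
  return (len < 0 || (size_t) len >= size) ? -1 : 0;
}
```

Instructions whose names differ between the two extensions cannot be handled this way, which is why the patch defers them to the ASM target hook.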
 
gcc/ChangeLog:
 
* config.gcc: Add files for XTheadVector intrinsics.
* config/riscv/autovec.md: Guard XTheadVector.
* config/riscv/riscv-c.cc: Add pragma for XTheadVector.
* config/riscv/riscv-string.cc (expand_block_move):
Guard XTheadVector.
* config/riscv/riscv-string.cc (vls_mode_valid_p): 
Avoid autovec.
* 

Re: [PATCH] RISC-V: Refine unsigned avg_floor/avg_ceil

2024-01-09 Thread Kito Cheng
LGTM!

On Wed, Jan 10, 2024 at 1:05 PM Juzhe-Zhong  wrote:
>
> This patch is inspired by LLVM patches:
> https://github.com/llvm/llvm-project/pull/76550
> https://github.com/llvm/llvm-project/pull/77473
>
> Use vaaddu for AVG vectorization.
>
> Before this patch:
>
> vsetivli  zero,8,e8,mf2,ta,ma
> vle8.v    v3,0(a1)
> vle8.v    v2,0(a2)
> vwaddu.vv v1,v3,v2
> vsetvli   zero,zero,e16,m1,ta,ma
> vadd.vi   v1,v1,1
> vsetvli   zero,zero,e8,mf2,ta,ma
> vnsrl.wi  v1,v1,1
> vse8.v    v1,0(a0)
> ret
>
> After this patch:
>
> vsetivli  zero,8,e8,mf2,ta,ma
> csrwi     vxrm,0
> vle8.v    v1,0(a1)
> vle8.v    v2,0(a2)
> vaaddu.vv v1,v1,v2
> vse8.v    v1,0(a0)
> ret
>
> Note on signed averaging addition
>
> According to the RVV spec, there is also a signed averaging addition
> instruction, vaadd.
> But AFAIU, regardless of the rounding mode, vaadd cannot implement
> the semantics of signed averaging addition.
> Thus this patch only introduces vaaddu.
>
> More details in:
> https://github.com/riscv/riscv-v-spec/issues/935
> https://github.com/riscv/riscv-v-spec/issues/934
>
> Tested on both RV32 and RV64 with no regressions.
>
> Ok for trunk ?
>
> gcc/ChangeLog:
>
> * config/riscv/autovec.md (avg3_floor): Remove.
> (avg3_floor): New pattern.
> (avg3_ceil): Remove.
> (avg3_ceil): New pattern.
> (uavg3_floor): Ditto.
> (uavg3_ceil): Ditto.
> * config/riscv/riscv-protos.h (enum insn_flags): Add for average 
> addition.
> (enum insn_type): Ditto.
> * config/riscv/riscv-v.cc: Ditto.
> * config/riscv/vector-iterators.md (ashiftrt): Remove.
> (ASHIFTRT): Ditto.
> * config/riscv/vector.md: Add VLS modes.
>
> gcc/testsuite/ChangeLog:
>
> * gcc.target/riscv/rvv/autovec/vls/avg-1.c: Adapt test.
> * gcc.target/riscv/rvv/autovec/vls/avg-2.c: Ditto.
> * gcc.target/riscv/rvv/autovec/vls/avg-3.c: Ditto.
> * gcc.target/riscv/rvv/autovec/vls/avg-4.c: Ditto.
> * gcc.target/riscv/rvv/autovec/vls/avg-5.c: Ditto.
> * gcc.target/riscv/rvv/autovec/vls/avg-6.c: Ditto.
> * gcc.target/riscv/rvv/autovec/widen/vec-avg-rv32gcv.c: Ditto.
> * gcc.target/riscv/rvv/autovec/widen/vec-avg-rv64gcv.c: Ditto.
>
> ---
>  gcc/config/riscv/autovec.md   | 50 ++-
>  gcc/config/riscv/riscv-protos.h   |  8 +++
>  gcc/config/riscv/riscv-v.cc   | 11 
>  gcc/config/riscv/vector-iterators.md  |  5 --
>  gcc/config/riscv/vector.md| 12 ++---
>  .../gcc.target/riscv/rvv/autovec/vls/avg-1.c  |  4 +-
>  .../gcc.target/riscv/rvv/autovec/vls/avg-2.c  |  4 +-
>  .../gcc.target/riscv/rvv/autovec/vls/avg-3.c  |  4 +-
>  .../gcc.target/riscv/rvv/autovec/vls/avg-4.c  |  6 +--
>  .../gcc.target/riscv/rvv/autovec/vls/avg-5.c  |  6 +--
>  .../gcc.target/riscv/rvv/autovec/vls/avg-6.c  |  6 +--
>  .../riscv/rvv/autovec/widen/vec-avg-rv32gcv.c |  7 +--
>  .../riscv/rvv/autovec/widen/vec-avg-rv64gcv.c |  7 +--
>  13 files changed, 86 insertions(+), 44 deletions(-)
>
> diff --git a/gcc/config/riscv/autovec.md b/gcc/config/riscv/autovec.md
> index 775eaa825b0..706cd9717cb 100644
> --- a/gcc/config/riscv/autovec.md
> +++ b/gcc/config/riscv/autovec.md
> @@ -2345,39 +2345,39 @@
>  ;;  op[0] = (narrow) ((wide) op[1] + (wide) op[2] + 1)) >> 1;
>  ;; -
>
> -(define_expand "avg3_floor"
> +(define_expand "avg3_floor"
>   [(set (match_operand: 0 "register_operand")
> (truncate:
> -(:VWEXTI
> +(ashiftrt:VWEXTI
>   (plus:VWEXTI
> -  (any_extend:VWEXTI
> +  (sign_extend:VWEXTI
> (match_operand: 1 "register_operand"))
> -  (any_extend:VWEXTI
> +  (sign_extend:VWEXTI
> (match_operand: 2 "register_operand"))]
>"TARGET_VECTOR"
>  {
>/* First emit a widening addition.  */
>rtx tmp1 = gen_reg_rtx (mode);
>rtx ops1[] = {tmp1, operands[1], operands[2]};
> -  insn_code icode = code_for_pred_dual_widen (PLUS, , mode);
> +  insn_code icode = code_for_pred_dual_widen (PLUS, SIGN_EXTEND, mode);
>riscv_vector::emit_vlmax_insn (icode, riscv_vector::BINARY_OP, ops1);
>
>/* Then a narrowing shift.  */
>rtx ops2[] = {operands[0], tmp1, const1_rtx};
> -  icode = code_for_pred_narrow_scalar (, mode);
> +  icode = code_for_pred_narrow_scalar (ASHIFTRT, mode);
>riscv_vector::emit_vlmax_insn (icode, riscv_vector::BINARY_OP, ops2);
>DONE;
>  })
>
> -(define_expand "avg3_ceil"
> +(define_expand "avg3_ceil"
>   [(set (match_operand: 0 "register_operand")
> (truncate:
> -(:VWEXTI
> +(ashiftrt:VWEXTI
>   (plus:VWEXTI
>(plus:VWEXTI
> -   (any_extend:VWEXTI
> +   (sign_extend:VWEXTI
> (match_operand: 1 "register_operand"))
> -   
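
For reference, the per-element semantics that the avg3_floor and avg3_ceil expanders above implement (widen, add, optionally add one, shift right by one, narrow, as the comment in the quoted autovec.md hunk describes) can be sketched in scalar C. This is a minimal model that assumes 8-bit elements widened to 16 bits; the function names are hypothetical and the real expanders are mode-generic. The signed variants mirror the sign_extend/ashiftrt form in the patch and rely on GCC's documented behaviour that `>>` on a negative signed integer is an arithmetic shift (implementation-defined in ISO C).

```c
#include <stdint.h>

/* Hypothetical scalar model of one vector element, assuming 8-bit
   elements that are widened to 16 bits before the addition.  */

/* Unsigned floor average: zero-extend, add, shift right by one.  */
static uint8_t
avg_floor_u8 (uint8_t a, uint8_t b)
{
  return (uint8_t) (((uint16_t) a + (uint16_t) b) >> 1);
}

/* Unsigned ceil average: as above, but add one before the shift.  */
static uint8_t
avg_ceil_u8 (uint8_t a, uint8_t b)
{
  return (uint8_t) (((uint16_t) a + (uint16_t) b + 1) >> 1);
}

/* Signed floor average: sign-extend, add, arithmetic shift right.
   Relies on GCC's arithmetic >> for negative signed values.  */
static int8_t
avg_floor_s8 (int8_t a, int8_t b)
{
  return (int8_t) (((int16_t) a + (int16_t) b) >> 1);
}

/* Signed ceil average: add one before the arithmetic shift.  */
static int8_t
avg_ceil_s8 (int8_t a, int8_t b)
{
  return (int8_t) (((int16_t) a + (int16_t) b + 1) >> 1);
}
```

Widening first is what keeps the intermediate sum from overflowing: 255 + 255 does not fit in 8 bits, yet avg_floor_u8 (255, 255) is 255.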

[PATCH][wwwdoc] gcc-14: Add arm cortex-m52 cpu support

2024-01-09 Thread Chung-Ju Wu

Hi Gerald,

The Arm Cortex-M52 CPU has been added upstream:
https://gcc.gnu.org/pipermail/gcc-patches/2024-January/642230.html

I would like to document this on the gcc-14 changes.html page.
Attached is the patch for the gcc-wwwdocs repository.

Is it OK?

Regards,
jasonwucj

From 2513e83f07490451eb4be593454afa1f513b6153 Mon Sep 17 00:00:00 2001
From: Chung-Ju Wu 
Date: Thu, 7 Dec 2023 13:33:43 +0800
Subject: [PATCH] gcc-14: Add arm cortex-m52 cpu support

---
 htdocs/gcc-14/changes.html | 7 ++-
 1 file changed, 6 insertions(+), 1 deletion(-)

diff --git a/htdocs/gcc-14/changes.html b/htdocs/gcc-14/changes.html
index e3a68998..0e3557c7 100644
--- a/htdocs/gcc-14/changes.html
+++ b/htdocs/gcc-14/changes.html
@@ -328,7 +328,12 @@ a work-in-progress.
 
 
 
-
+arm
+
+  The Cortex-M52 CPU is now supported through the cortex-m52
+  argument to the -mcpu and -mtune options.
+  
+
 
 
 
-- 
2.34.3



Re: Re:[PATCH v5] RISC-V: Handle differences between XTheadvector and Vector

2024-01-09 Thread juzhe.zh...@rivai.ai
Why do you add theadvector shapes? I think you can reuse the existing shapes.

+thead-vector-builtins.o: \
+  $(srcdir)/config/riscv/thead-vector-builtins.cc \
+  $(CONFIG_H) $(SYSTEM_H) coretypes.h $(TM_H) $(TREE_H) $(RTL_H) \
+  $(TM_P_H) memmodel.h insn-codes.h $(OPTABS_H) $(RECOG_H) \
+  $(EXPR_H) $(BASIC_BLOCK_H) $(FUNCTION_H) fold-const.h $(GIMPLE_H) \
+  gimple-iterator.h gimplify.h explow.h $(EMIT_RTL_H) tree-vector-builder.h \
+  rtx-vector-builder.h \
+  $(srcdir)/config/riscv/riscv-vector-builtins-shapes.h \
+  $(srcdir)/config/riscv/riscv-vector-builtins-bases.h \
+  $(srcdir)/config/riscv/thead-vector-builtins.h \
+  $(RISCV_BUILTINS_H)
+   $(COMPILER) -c $(ALL_COMPILERFLAGS) $(ALL_CPPFLAGS) $(INCLUDES) \
+   $(srcdir)/config/riscv/thead-vector-builtins.cc
+
Why do you build another new object?

+   Copyright (C) 2022-2023 Free Software Foundation, Inc.
Incorrect copyright





juzhe.zh...@rivai.ai
 
From: joshua
Sent: 2024-01-10 10:57
To: juzhe.zh...@rivai.ai; gcc-patches
Cc: Jim Wilson; palmer; andrew; philipp.tomsich; jeffreyalaw;
christoph.muellner; jinma; cooper.qu
Subject: Re:[PATCH v5] RISC-V: Handle differences between XTheadvector and Vector
Hi Juzhe,
Thank you for the many useful comments on this patch!
 
There are some more patches that support XTheadVector-specific
intrinsics, handle the register overlap issue and
rewrite the assembly output.
 
https://gcc.gnu.org/pipermail/gcc-patches/2024-January/641774.html
 
https://gcc.gnu.org/pipermail/gcc-patches/2024-January/641732.html
 
https://gcc.gnu.org/pipermail/gcc-patches/2024-January/641733.html
 
Also, there is a prerequisite patch that refactors riscv-vector-builtins-bases.cc:
https://gcc.gnu.org/pipermail/gcc-patches/2023-December/641530.html
Jeff has reviewed it but has not given an LGTM yet.
 
Joshua
 
--
From: juzhe.zh...@rivai.ai
Sent: Wednesday, January 10, 2024 10:34
To: "cooper.joshua"; "gcc-patches"
Cc: Jim Wilson; palmer; andrew; "philipp.tomsich"; jeffreyalaw;
"christoph.muellner"; "cooper.joshua"; jinma; "cooper.qu"
Subject: Re: [PATCH v5] RISC-V: Handle differences between XTheadvector and Vector
 
 
Thanks for your patience.
 
 
LGTM from my side.
 
 
I think it's pretty clean now. I can imagine that if some day theadvector
is no longer used, we will be able to remove it very easily.
 
 
Also, theadvector won't affect our RVV 1.0 maintenance, since it is cleanly
isolated.
 
 
But I'd like to wait a few more days in case somebody wants to chime in.
 
 
There are a few more things you should do before committing it:
1. Run the full-coverage RVV 1.0 API test; the test generator can be
downloaded from the official intrinsic doc:
https://github.com/riscv-non-isa/rvv-intrinsic-doc 
 
 
2. Also run the GCC testsuite regression on both RV32 and RV64.
 
 
Do you have more theadvector patches that I haven't reviewed? Please point
them out to me again.
 
 
Thanks.
juzhe.zh...@rivai.ai
 
 
From: Jun Sha (Joshua)
Date: 2024-01-10 10:22
To: gcc-patches
CC: jim.wilson.gcc; palmer; andrew; philipp.tomsich; jeffreyalaw; 
christoph.muellner; juzhe.zhong; Jun Sha (Joshua); Jin Ma; Xianmiao Qu
Subject: [PATCH v5] RISC-V: Handle differences between XTheadvector and Vector
 
This patch handles the differences in instruction generation
between Vector and XTheadVector. In this version, we only support
the subset of XTheadVector instructions that can be derived directly
from the current RVV 1.0 ones by simply adding the "th." prefix. For
XTheadVector instructions that have different names but share the same
patterns as RVV 1.0 instructions, we will use the ASM target hook to
rewrite the whole instruction string in follow-up patches.
 
For some vector patterns that cannot be avoided, we use
"!TARGET_XTHEADVECTOR" to disable them in vector.md so as
not to generate instructions that XTheadVector does not support,
such as vmv1r and vsext.vf2.
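To illustrate the "th." prefix case, here is a hypothetical C helper that derives the emitted mnemonic from an RVV 1.0 one by prepending "th." when XTheadVector is the target. It is not the mechanism the patch uses (the text does not spell that out); the name `emit_vector_mnemonic` and the `xtheadvector` flag are assumptions for illustration, and `vadd.vv` is only an example mnemonic.

```c
#include <stdbool.h>
#include <stdio.h>
#include <string.h>

/* Hypothetical helper, not GCC's actual mechanism: write into BUF the
   mnemonic for an RVV 1.0 instruction that XTheadVector provides under
   the same name with a "th." prefix.  Returns the length written
   (excluding the NUL), or -1 if BUF is too small.  */
static int
emit_vector_mnemonic (char *buf, size_t len, const char *rvv_mnemonic,
		      bool xtheadvector)
{
  const char *prefix = xtheadvector ? "th." : "";
  size_t need = strlen (prefix) + strlen (rvv_mnemonic);

  /* Leave room for the terminating NUL.  */
  if (need >= len)
    return -1;

  snprintf (buf, len, "%s%s", prefix, rvv_mnemonic);
  return (int) need;
}
```

Instructions whose XTheadVector names differ from RVV 1.0 cannot be handled by such a prefix and, per the description, are left to the ASM target hook in later patches.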
 
gcc/ChangeLog:
 
* config.gcc: Add files for XTheadVector intrinsics.
* config/riscv/autovec.md: Guard XTheadVector.
* config/riscv/riscv-c.cc: Add pragma for XTheadVector.
* config/riscv/riscv-string.cc (expand_block_move):
Guard XTheadVector.
* config/riscv/riscv-string.cc (vls_mode_valid_p): 
Avoid autovec.
* config/riscv/riscv-vector-builtins-shapes.cc (check_type):
(build_one): New function.
* config/riscv/riscv-vector-builtins.cc (DEF_RVV_FUNCTION):
(DEF_THEAD_RVV_FUNCTION): Add new macros.
(check_required_extensions):
(handle_pragma_vector):
* config/riscv/riscv-vector-builtins.h (RVV_REQUIRE_VECTOR):
(RVV_REQUIRE_XTHEADVECTOR):
Add RVV_REQUIRE_VECTOR and RVV_REQUIRE_XTHEADVECTOR.
(struct function_group_info):
* config/riscv/riscv-vector-switch.def (ENTRY):
Disable fractional mode for the XTheadVector extension.
(TUPLE_ENTRY): Likewise.
* config/riscv/riscv-vsetvl.cc: Add functions for xtheadvector.
* config/riscv/riscv.cc (riscv_v_ext_vls_mode_p):
Guard XTheadVector.
(riscv_v_adjust_bytesize): Likewise.
(riscv_preferred_simd_mode): Likewise.

Re: Re:[PATCH v5] RISC-V: Handle differences between XTheadvector and Vector

2024-01-09 Thread juzhe.zh...@rivai.ai
https://gcc.gnu.org/pipermail/gcc-patches/2024-January/641733.html 

This patch is ok from my side.



juzhe.zh...@rivai.ai
 
From: joshua
Sent: 2024-01-10 10:57
To: juzhe.zh...@rivai.ai; gcc-patches
Cc: Jim Wilson; palmer; andrew; philipp.tomsich; jeffreyalaw;
christoph.muellner; jinma; cooper.qu
Subject: Re:[PATCH v5] RISC-V: Handle differences between XTheadvector and Vector
Hi Juzhe,
Thank you for the many useful comments on this patch!
 
There are some more patches that support XTheadVector-specific
intrinsics, handle the register overlap issue and
rewrite the assembly output.
 
https://gcc.gnu.org/pipermail/gcc-patches/2024-January/641774.html
 
https://gcc.gnu.org/pipermail/gcc-patches/2024-January/641732.html
 
https://gcc.gnu.org/pipermail/gcc-patches/2024-January/641733.html
 
Also, there is a prerequisite patch that refactors riscv-vector-builtins-bases.cc:
https://gcc.gnu.org/pipermail/gcc-patches/2023-December/641530.html
Jeff has reviewed it but has not given an LGTM yet.
 
Joshua
 
--
From: juzhe.zh...@rivai.ai
Sent: Wednesday, January 10, 2024 10:34
To: "cooper.joshua"; "gcc-patches"
Cc: Jim Wilson; palmer; andrew; "philipp.tomsich"; jeffreyalaw;
"christoph.muellner"; "cooper.joshua"; jinma; "cooper.qu"
Subject: Re: [PATCH v5] RISC-V: Handle differences between XTheadvector and Vector
 
 
Thanks for your patience.
 
 
LGTM from my side.
 
 
I think it's pretty clean now. I can imagine that if some day theadvector
is no longer used, we will be able to remove it very easily.
 
 
Also, theadvector won't affect our RVV 1.0 maintenance, since it is cleanly
isolated.
 
 
But I'd like to wait a few more days in case somebody wants to chime in.
 
 
There are a few more things you should do before committing it:
1. Run the full-coverage RVV 1.0 API test; the test generator can be
downloaded from the official intrinsic doc:
https://github.com/riscv-non-isa/rvv-intrinsic-doc 
 
 
2. Also run the GCC testsuite regression on both RV32 and RV64.
 
 
Do you have more theadvector patches that I haven't reviewed? Please point
them out to me again.
 
 
Thanks.
juzhe.zh...@rivai.ai
 
 
From: Jun Sha (Joshua)
Date: 2024-01-10 10:22
To: gcc-patches
CC: jim.wilson.gcc; palmer; andrew; philipp.tomsich; jeffreyalaw; 
christoph.muellner; juzhe.zhong; Jun Sha (Joshua); Jin Ma; Xianmiao Qu
Subject: [PATCH v5] RISC-V: Handle differences between XTheadvector and Vector
 
This patch handles the differences in instruction generation
between Vector and XTheadVector. In this version, we only support
the subset of XTheadVector instructions that can be derived directly
from the current RVV 1.0 ones by simply adding the "th." prefix. For
XTheadVector instructions that have different names but share the same
patterns as RVV 1.0 instructions, we will use the ASM target hook to
rewrite the whole instruction string in follow-up patches.
 
For some vector patterns that cannot be avoided, we use
"!TARGET_XTHEADVECTOR" to disable them in vector.md so as
not to generate instructions that XTheadVector does not support,
such as vmv1r and vsext.vf2.
 
gcc/ChangeLog:
 
* config.gcc: Add files for XTheadVector intrinsics.
* config/riscv/autovec.md: Guard XTheadVector.
* config/riscv/riscv-c.cc: Add pragma for XTheadVector.
* config/riscv/riscv-string.cc (expand_block_move):
Guard XTheadVector.
* config/riscv/riscv-string.cc (vls_mode_valid_p): 
Avoid autovec.
* config/riscv/riscv-vector-builtins-shapes.cc (check_type):
(build_one): New function.
* config/riscv/riscv-vector-builtins.cc (DEF_RVV_FUNCTION):
(DEF_THEAD_RVV_FUNCTION): Add new macros.
(check_required_extensions):
(handle_pragma_vector):
* config/riscv/riscv-vector-builtins.h (RVV_REQUIRE_VECTOR):
(RVV_REQUIRE_XTHEADVECTOR):
Add RVV_REQUIRE_VECTOR and RVV_REQUIRE_XTHEADVECTOR.
(struct function_group_info):
* config/riscv/riscv-vector-switch.def (ENTRY):
Disable fractional mode for the XTheadVector extension.
(TUPLE_ENTRY): Likewise.
* config/riscv/riscv-vsetvl.cc: Add functions for xtheadvector.
* config/riscv/riscv.cc (riscv_v_ext_vls_mode_p):
Guard XTheadVector.
(riscv_v_adjust_bytesize): Likewise.
(riscv_preferred_simd_mode): Likewise.
(riscv_autovectorize_vector_modes): Likewise.
(riscv_vector_mode_supported_any_target_p): Likewise.
(TARGET_VECTOR_MODE_SUPPORTED_ANY_TARGET_P): Likewise.
* config/riscv/vector.md: Include thead-vector.md.
* config/riscv/riscv_th_vector.h: New file.
* config/riscv/thead-vector.md: New file.
 
gcc/testsuite/ChangeLog:
 
* gcc.target/riscv/rvv/base/pragma-1.c: Add XTheadVector.
* gcc.target/riscv/rvv/base/abi-1.c: Exclude XTheadVector.
* lib/target-supports.exp: Add target for XTheadVector.
 
Co-authored-by: Jin Ma 
Co-authored-by: Xianmiao Qu 
Co-authored-by: Christoph Müllner 
---
gcc/config.gcc|   2 +-
gcc/config/riscv/autovec.md   |   2 +-
gcc/config/riscv/predicates.md|   4 +-
gcc/config/riscv/riscv-c.cc   |   3 +-
gcc/config/riscv/riscv-string.cc  |   3 

Re: [PATCH v5] RISC-V: Fix register overlap issue for some xtheadvector instructions

2024-01-09 Thread juzhe.zh...@rivai.ai
LGTM from my side. Let's give it a few more days in case someone wants to chime in.



juzhe.zh...@rivai.ai
 
From: Jun Sha (Joshua)
Date: 2024-01-10 14:51
To: gcc-patches
CC: jim.wilson.gcc; palmer; andrew; philipp.tomsich; jeffreyalaw; 
christoph.muellner; juzhe.zhong; Jun Sha (Joshua); Jin Ma; Xianmiao Qu
Subject: [PATCH v5] RISC-V: Fix register overlap issue for some xtheadvector 
instructions
For th.vmadc/th.vmsbc as well as narrowing arithmetic instructions
and floating-point compare instructions, an illegal instruction
exception will be raised if the destination vector register overlaps
a source vector register group.
 
To handle this issue, we use the "group_overlap" and "enabled" attributes
to disable some alternatives for XTheadVector.
 
gcc/ChangeLog:
 
* config/riscv/riscv.md (none,W21,W42,W84,W43,W86,W87,W0):
(none,W21,W42,W84,W43,W86,W87,W0,thv_disabled,rvv_disabled):
Add group-overlap constraint for xtheadvector.
* config/riscv/vector.md: 
Disable alternatives that destination register overlaps
source register group for xtheadvector.
 
Co-authored-by: Jin Ma 
Co-authored-by: Xianmiao Qu 
Co-authored-by: Christoph Müllner 
---
gcc/config/riscv/riscv.md  |  12 +-
gcc/config/riscv/vector.md | 314 +
2 files changed, 190 insertions(+), 136 deletions(-)
 
diff --git a/gcc/config/riscv/riscv.md b/gcc/config/riscv/riscv.md
index 84212430dc0..411d1d17391 100644
--- a/gcc/config/riscv/riscv.md
+++ b/gcc/config/riscv/riscv.md
@@ -537,7 +537,7 @@
;; Widening instructions have group-overlap constraints.  Those are only
;; valid for certain register-group sizes.  This attribute marks the
;; alternatives not matching the required register-group size as disabled.
-(define_attr "group_overlap" "none,W21,W42,W84,W43,W86,W87,W0"
+(define_attr "group_overlap" "none,W21,W42,W84,W43,W86,W87,W0,thv_disabled,rvv_disabled"
   (const_string "none"))
(define_attr "group_overlap_valid" "no,yes"
@@ -576,7 +576,15 @@
  (and (eq_attr "group_overlap" "W0")
  (match_test "riscv_get_v_regno_alignment (GET_MODE (operands[0])) > 1"))
(const_string "no")
-]
+
+ (and (eq_attr "group_overlap" "thv_disabled")
+   (match_test "TARGET_XTHEADVECTOR"))
+ (const_string "no")
+
+ (and (eq_attr "group_overlap" "rvv_disabled")
+   (match_test "TARGET_VECTOR && !TARGET_XTHEADVECTOR"))
+ (const_string "no")
+ ]
(const_string "yes")))
;; Attribute to control enable or disable instructions.
diff --git a/gcc/config/riscv/vector.md b/gcc/config/riscv/vector.md
index 3eb6daafbc2..4748ddd34a2 100644
--- a/gcc/config/riscv/vector.md
+++ b/gcc/config/riscv/vector.md
@@ -3260,7 +3260,8 @@
   [(set_attr "type" "vicalu")
(set_attr "mode" "")
(set_attr "vl_op_idx" "4")
-   (set (attr "avl_type_idx") (const_int 5))])
+   (set (attr "avl_type_idx") (const_int 5))
+   (set_attr "group_overlap" "thv_disabled,none,none")])
(define_insn "@pred_msbc"
   [(set (match_operand: 0 "register_operand""=vr, vr, ")
@@ -3279,7 +3280,8 @@
   [(set_attr "type" "vicalu")
(set_attr "mode" "")
(set_attr "vl_op_idx" "4")
-   (set (attr "avl_type_idx") (const_int 5))])
+   (set (attr "avl_type_idx") (const_int 5))
+   (set_attr "group_overlap" "thv_disabled,thv_disabled,none")])
(define_insn "@pred_madc_scalar"
   [(set (match_operand: 0 "register_operand" "=vr, ")
@@ -3299,7 +3301,8 @@
   [(set_attr "type" "vicalu")
(set_attr "mode" "")
(set_attr "vl_op_idx" "4")
-   (set (attr "avl_type_idx") (const_int 5))])
+   (set (attr "avl_type_idx") (const_int 5))
+   (set_attr "group_overlap" "thv_disabled,none")])
(define_insn "@pred_msbc_scalar"
   [(set (match_operand: 0 "register_operand" "=vr, ")
@@ -3319,7 +3322,8 @@
   [(set_attr "type" "vicalu")
(set_attr "mode" "")
(set_attr "vl_op_idx" "4")
-   (set (attr "avl_type_idx") (const_int 5))])
+   (set (attr "avl_type_idx") (const_int 5))
+   (set_attr "group_overlap" "thv_disabled,none")])
(define_expand "@pred_madc_scalar"
   [(set (match_operand: 0 "register_operand")
@@ -3368,7 +3372,8 @@
   [(set_attr "type" "vicalu")
(set_attr "mode" "")
(set_attr "vl_op_idx" "4")
-   (set (attr "avl_type_idx") (const_int 5))])
+   (set (attr "avl_type_idx") (const_int 5))
+   (set_attr "group_overlap" "thv_disabled,none")])
(define_insn "*pred_madc_extended_scalar"
   [(set (match_operand: 0 "register_operand" "=vr, ")
@@ -3389,7 +3394,8 @@
   [(set_attr "type" "vicalu")
(set_attr "mode" "")
(set_attr "vl_op_idx" "4")
-   (set (attr "avl_type_idx") (const_int 5))])
+   (set (attr "avl_type_idx") (const_int 5))
+   (set_attr "group_overlap" "thv_disabled,none")])
(define_expand "@pred_msbc_scalar"
   [(set (match_operand: 0 "register_operand")
@@ -3438,7 +3444,8 @@
   [(set_attr "type" "vicalu")
(set_attr "mode" "")
(set_attr "vl_op_idx" "4")
-   (set (attr "avl_type_idx") (const_int 5))])
+   (set (attr "avl_type_idx") (const_int 5))
+   (set_attr "group_overlap" 

[PATCH v5] RISC-V: Fix register overlap issue for some xtheadvector instructions

2024-01-09 Thread Jun Sha (Joshua)
For th.vmadc/th.vmsbc as well as narrowing arithmetic instructions
and floating-point compare instructions, an illegal instruction
exception will be raised if the destination vector register overlaps
a source vector register group.
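The overlap condition itself is an interval test on register numbers. Below is a minimal sketch, assuming vector registers are numbered 0 to 31 and that a group of LMUL registers starting at `base` occupies `base` through `base + lmul - 1`; the function name is hypothetical. For example, a single-register mask destination written by th.vmadc overlaps an LMUL=4 source group starting at v8 whenever it is one of v8 to v11.

```c
#include <stdbool.h>

/* Hypothetical check: does the destination group [dst_base, dst_base +
   dst_lmul) share any register with the source group [src_base,
   src_base + src_lmul)?  Two half-open intervals [a, a+n) and [b, b+m)
   overlap iff a < b + m and b < a + n.  */
static bool
groups_overlap (unsigned dst_base, unsigned dst_lmul,
		unsigned src_base, unsigned src_lmul)
{
  return dst_base < src_base + src_lmul && src_base < dst_base + dst_lmul;
}
```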

To handle this issue, we use the "group_overlap" and "enabled" attributes
to disable some alternatives for XTheadVector.
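The intended effect of the two new group_overlap values can be modelled in C as follows. This is a simplified sketch, not generated GCC code: the enum and function names are hypothetical, only the new arms of group_overlap_valid are modelled (the W21 to W0 cases are omitted), and the two booleans stand in for TARGET_VECTOR and TARGET_XTHEADVECTOR.

```c
#include <stdbool.h>

/* Hypothetical mirror of the relevant group_overlap values.  */
enum group_overlap { GO_NONE, GO_THV_DISABLED, GO_RVV_DISABLED };

/* Returns true when an alternative tagged with GO stays valid (and thus
   enabled) for the given target flags.  */
static bool
alternative_enabled (enum group_overlap go, bool target_vector,
		     bool target_xtheadvector)
{
  switch (go)
    {
    case GO_THV_DISABLED:
      /* Disabled whenever XTheadVector is the target.  */
      return !target_xtheadvector;
    case GO_RVV_DISABLED:
      /* Disabled for standard RVV 1.0 only.  */
      return !(target_vector && !target_xtheadvector);
    default:
      return true;
    }
}
```

Tagging an alternative with thv_disabled therefore leaves RVV 1.0 code generation unchanged while keeping the register allocator from choosing that alternative for XTheadVector.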

gcc/ChangeLog:

* config/riscv/riscv.md (none,W21,W42,W84,W43,W86,W87,W0):
(none,W21,W42,W84,W43,W86,W87,W0,thv_disabled,rvv_disabled):
Add group-overlap constraint for xtheadvector.
* config/riscv/vector.md: 
Disable alternatives that destination register overlaps
source register group for xtheadvector.

Co-authored-by: Jin Ma 
Co-authored-by: Xianmiao Qu 
Co-authored-by: Christoph Müllner 
---
 gcc/config/riscv/riscv.md  |  12 +-
 gcc/config/riscv/vector.md | 314 +
 2 files changed, 190 insertions(+), 136 deletions(-)

diff --git a/gcc/config/riscv/riscv.md b/gcc/config/riscv/riscv.md
index 84212430dc0..411d1d17391 100644
--- a/gcc/config/riscv/riscv.md
+++ b/gcc/config/riscv/riscv.md
@@ -537,7 +537,7 @@
 ;; Widening instructions have group-overlap constraints.  Those are only
 ;; valid for certain register-group sizes.  This attribute marks the
 ;; alternatives not matching the required register-group size as disabled.
-(define_attr "group_overlap" "none,W21,W42,W84,W43,W86,W87,W0"
+(define_attr "group_overlap" "none,W21,W42,W84,W43,W86,W87,W0,thv_disabled,rvv_disabled"
   (const_string "none"))
 
 (define_attr "group_overlap_valid" "no,yes"
@@ -576,7 +576,15 @@
  (and (eq_attr "group_overlap" "W0")
  (match_test "riscv_get_v_regno_alignment (GET_MODE (operands[0])) > 1"))
 (const_string "no")
-]
+
+(and (eq_attr "group_overlap" "thv_disabled")
+ (match_test "TARGET_XTHEADVECTOR"))
+(const_string "no")
+
+(and (eq_attr "group_overlap" "rvv_disabled")
+ (match_test "TARGET_VECTOR && !TARGET_XTHEADVECTOR"))
+(const_string "no")
+   ]
(const_string "yes")))
 
 ;; Attribute to control enable or disable instructions.
diff --git a/gcc/config/riscv/vector.md b/gcc/config/riscv/vector.md
index 3eb6daafbc2..4748ddd34a2 100644
--- a/gcc/config/riscv/vector.md
+++ b/gcc/config/riscv/vector.md
@@ -3260,7 +3260,8 @@
   [(set_attr "type" "vicalu")
(set_attr "mode" "")
(set_attr "vl_op_idx" "4")
-   (set (attr "avl_type_idx") (const_int 5))])
+   (set (attr "avl_type_idx") (const_int 5))
+   (set_attr "group_overlap" "thv_disabled,none,none")])
 
 (define_insn "@pred_msbc"
   [(set (match_operand: 0 "register_operand""=vr, vr, ")
@@ -3279,7 +3280,8 @@
   [(set_attr "type" "vicalu")
(set_attr "mode" "")
(set_attr "vl_op_idx" "4")
-   (set (attr "avl_type_idx") (const_int 5))])
+   (set (attr "avl_type_idx") (const_int 5))
+   (set_attr "group_overlap" "thv_disabled,thv_disabled,none")])
 
 (define_insn "@pred_madc_scalar"
   [(set (match_operand: 0 "register_operand" "=vr, ")
@@ -3299,7 +3301,8 @@
   [(set_attr "type" "vicalu")
(set_attr "mode" "")
(set_attr "vl_op_idx" "4")
-   (set (attr "avl_type_idx") (const_int 5))])
+   (set (attr "avl_type_idx") (const_int 5))
+   (set_attr "group_overlap" "thv_disabled,none")])
 
 (define_insn "@pred_msbc_scalar"
   [(set (match_operand: 0 "register_operand" "=vr, ")
@@ -3319,7 +3322,8 @@
   [(set_attr "type" "vicalu")
(set_attr "mode" "")
(set_attr "vl_op_idx" "4")
-   (set (attr "avl_type_idx") (const_int 5))])
+   (set (attr "avl_type_idx") (const_int 5))
+   (set_attr "group_overlap" "thv_disabled,none")])
 
 (define_expand "@pred_madc_scalar"
   [(set (match_operand: 0 "register_operand")
@@ -3368,7 +3372,8 @@
   [(set_attr "type" "vicalu")
(set_attr "mode" "")
(set_attr "vl_op_idx" "4")
-   (set (attr "avl_type_idx") (const_int 5))])
+   (set (attr "avl_type_idx") (const_int 5))
+   (set_attr "group_overlap" "thv_disabled,none")])
 
 (define_insn "*pred_madc_extended_scalar"
   [(set (match_operand: 0 "register_operand" "=vr, ")
@@ -3389,7 +3394,8 @@
   [(set_attr "type" "vicalu")
(set_attr "mode" "")
(set_attr "vl_op_idx" "4")
-   (set (attr "avl_type_idx") (const_int 5))])
+   (set (attr "avl_type_idx") (const_int 5))
+   (set_attr "group_overlap" "thv_disabled,none")])
 
 (define_expand "@pred_msbc_scalar"
   [(set (match_operand: 0 "register_operand")
@@ -3438,7 +3444,8 @@
   [(set_attr "type" "vicalu")
(set_attr "mode" "")
(set_attr "vl_op_idx" "4")
-   (set (attr "avl_type_idx") (const_int 5))])
+   (set (attr "avl_type_idx") (const_int 5))
+   (set_attr "group_overlap" "thv_disabled,none")])
 
 (define_insn "*pred_msbc_extended_scalar"
   [(set (match_operand: 0 "register_operand"  "=vr, ")
@@ -3459,7 +3466,8 @@
   [(set_attr "type" "vicalu")
(set_attr "mode" "")
(set_attr "vl_op_idx" "4")
-   (set (attr "avl_type_idx") 

Re: [PATCH v5] RISC-V: Fix register overlap issue for some xtheadvector instructions

2024-01-09 Thread juzhe.zh...@rivai.ai
+ (and (eq_attr "group_overlap" "th")
+   (match_test "TARGET_XTHEADVECTOR"))
+ (const_string "no")
+
+ (and (eq_attr "group_overlap" "rvv")
+   (match_test "TARGET_VECTOR && !TARGET_XTHEADVECTOR"))
+ (const_string "no")
+ ]


Change it into:

+ (and (eq_attr "group_overlap" "thv_disabled")
+   (match_test "TARGET_XTHEADVECTOR"))
+ (const_string "no")
+
+ (and (eq_attr "group_overlap" "rvv_disabled")
+   (match_test "TARGET_VECTOR && !TARGET_XTHEADVECTOR"))
+ (const_string "no")
+ ]





juzhe.zh...@rivai.ai
 
From: Jun Sha (Joshua)
Date: 2024-01-10 14:02
To: gcc-patches
CC: jim.wilson.gcc; palmer; andrew; philipp.tomsich; jeffreyalaw; 
christoph.muellner; juzhe.zhong; Jun Sha (Joshua); Jin Ma; Xianmiao Qu
Subject: [PATCH v5] RISC-V: Fix register overlap issue for some xtheadvector 
instructions
For th.vmadc/th.vmsbc as well as narrowing arithmetic instructions
and floating-point compare instructions, an illegal instruction
exception will be raised if the destination vector register overlaps
a source vector register group.
 
To handle this issue, we use the "group_overlap" and "enabled" attributes
to disable some alternatives for XTheadVector.
 
gcc/ChangeLog:
 
* config/riscv/riscv.md (none,W21,W42,W84,W43,W86,W87,W0):
(none,W21,W42,W84,W43,W86,W87,W0,th):
Add group-overlap constraint for xtheadvector.
* config/riscv/vector.md: 
Disable alternatives that destination register overlaps
source register group for xtheadvector.
 
Co-authored-by: Jin Ma 
Co-authored-by: Xianmiao Qu 
Co-authored-by: Christoph Müllner 
---
gcc/config/riscv/riscv.md  |  12 +-
gcc/config/riscv/vector.md | 314 +
2 files changed, 190 insertions(+), 136 deletions(-)
 
diff --git a/gcc/config/riscv/riscv.md b/gcc/config/riscv/riscv.md
index 84212430dc0..2fe15fd7340 100644
--- a/gcc/config/riscv/riscv.md
+++ b/gcc/config/riscv/riscv.md
@@ -537,7 +537,7 @@
;; Widening instructions have group-overlap constraints.  Those are only
;; valid for certain register-group sizes.  This attribute marks the
;; alternatives not matching the required register-group size as disabled.
-(define_attr "group_overlap" "none,W21,W42,W84,W43,W86,W87,W0"
+(define_attr "group_overlap" "none,W21,W42,W84,W43,W86,W87,W0,th,rvv"
   (const_string "none"))
(define_attr "group_overlap_valid" "no,yes"
@@ -576,7 +576,15 @@
  (and (eq_attr "group_overlap" "W0")
  (match_test "riscv_get_v_regno_alignment (GET_MODE (operands[0])) > 1"))
(const_string "no")
-]
+
+ (and (eq_attr "group_overlap" "th")
+   (match_test "TARGET_XTHEADVECTOR"))
+ (const_string "no")
+
+ (and (eq_attr "group_overlap" "rvv")
+   (match_test "TARGET_VECTOR && !TARGET_XTHEADVECTOR"))
+ (const_string "no")
+ ]
(const_string "yes")))
;; Attribute to control enable or disable instructions.
diff --git a/gcc/config/riscv/vector.md b/gcc/config/riscv/vector.md
index 3eb6daafbc2..cd83c1f3321 100644
--- a/gcc/config/riscv/vector.md
+++ b/gcc/config/riscv/vector.md
@@ -3260,7 +3260,8 @@
   [(set_attr "type" "vicalu")
(set_attr "mode" "")
(set_attr "vl_op_idx" "4")
-   (set (attr "avl_type_idx") (const_int 5))])
+   (set (attr "avl_type_idx") (const_int 5))
+   (set_attr "group_overlap" "th,none,none")])
(define_insn "@pred_msbc"
   [(set (match_operand: 0 "register_operand""=vr, vr, ")
@@ -3279,7 +3280,8 @@
   [(set_attr "type" "vicalu")
(set_attr "mode" "")
(set_attr "vl_op_idx" "4")
-   (set (attr "avl_type_idx") (const_int 5))])
+   (set (attr "avl_type_idx") (const_int 5))
+   (set_attr "group_overlap" "th,th,none")])
(define_insn "@pred_madc_scalar"
   [(set (match_operand: 0 "register_operand" "=vr, ")
@@ -3299,7 +3301,8 @@
   [(set_attr "type" "vicalu")
(set_attr "mode" "")
(set_attr "vl_op_idx" "4")
-   (set (attr "avl_type_idx") (const_int 5))])
+   (set (attr "avl_type_idx") (const_int 5))
+   (set_attr "group_overlap" "th,none")])
(define_insn "@pred_msbc_scalar"
   [(set (match_operand: 0 "register_operand" "=vr, ")
@@ -3319,7 +3322,8 @@
   [(set_attr "type" "vicalu")
(set_attr "mode" "")
(set_attr "vl_op_idx" "4")
-   (set (attr "avl_type_idx") (const_int 5))])
+   (set (attr "avl_type_idx") (const_int 5))
+   (set_attr "group_overlap" "th,none")])
(define_expand "@pred_madc_scalar"
   [(set (match_operand: 0 "register_operand")
@@ -3368,7 +3372,8 @@
   [(set_attr "type" "vicalu")
(set_attr "mode" "")
(set_attr "vl_op_idx" "4")
-   (set (attr "avl_type_idx") (const_int 5))])
+   (set (attr "avl_type_idx") (const_int 5))
+   (set_attr "group_overlap" "th,none")])
(define_insn "*pred_madc_extended_scalar"
   [(set (match_operand: 0 "register_operand" "=vr, ")
@@ -3389,7 +3394,8 @@
   [(set_attr "type" "vicalu")
(set_attr "mode" "")
(set_attr "vl_op_idx" "4")
-   (set (attr "avl_type_idx") (const_int 5))])
+   (set (attr "avl_type_idx") (const_int 5))
+   (set_attr "group_overlap" "th,none")])
(define_expand "@pred_msbc_scalar"
   

[PATCH v5] RISC-V: Fix register overlap issue for some xtheadvector instructions

2024-01-09 Thread Jun Sha (Joshua)
For th.vmadc/th.vmsbc as well as narrowing arithmetic instructions
and floating-point compare instructions, an illegal instruction
exception will be raised if the destination vector register overlaps
a source vector register group.

To handle this issue, we use the "group_overlap" and "enabled" attributes
to disable some alternatives for XTheadVector.

gcc/ChangeLog:

* config/riscv/riscv.md (none,W21,W42,W84,W43,W86,W87,W0):
(none,W21,W42,W84,W43,W86,W87,W0,th):
Add group-overlap constraint for xtheadvector.
* config/riscv/vector.md: 
Disable alternatives that destination register overlaps
source register group for xtheadvector.

Co-authored-by: Jin Ma 
Co-authored-by: Xianmiao Qu 
Co-authored-by: Christoph Müllner 
---
 gcc/config/riscv/riscv.md  |  12 +-
 gcc/config/riscv/vector.md | 314 +
 2 files changed, 190 insertions(+), 136 deletions(-)

diff --git a/gcc/config/riscv/riscv.md b/gcc/config/riscv/riscv.md
index 84212430dc0..2fe15fd7340 100644
--- a/gcc/config/riscv/riscv.md
+++ b/gcc/config/riscv/riscv.md
@@ -537,7 +537,7 @@
 ;; Widening instructions have group-overlap constraints.  Those are only
 ;; valid for certain register-group sizes.  This attribute marks the
 ;; alternatives not matching the required register-group size as disabled.
-(define_attr "group_overlap" "none,W21,W42,W84,W43,W86,W87,W0"
+(define_attr "group_overlap" "none,W21,W42,W84,W43,W86,W87,W0,th,rvv"
   (const_string "none"))
 
 (define_attr "group_overlap_valid" "no,yes"
@@ -576,7 +576,15 @@
  (and (eq_attr "group_overlap" "W0")
  (match_test "riscv_get_v_regno_alignment (GET_MODE (operands[0])) > 1"))
 (const_string "no")
-]
+
+(and (eq_attr "group_overlap" "th")
+ (match_test "TARGET_XTHEADVECTOR"))
+(const_string "no")
+
+(and (eq_attr "group_overlap" "rvv")
+ (match_test "TARGET_VECTOR && !TARGET_XTHEADVECTOR"))
+(const_string "no")
+   ]
(const_string "yes")))
 
 ;; Attribute to control enable or disable instructions.
diff --git a/gcc/config/riscv/vector.md b/gcc/config/riscv/vector.md
index 3eb6daafbc2..cd83c1f3321 100644
--- a/gcc/config/riscv/vector.md
+++ b/gcc/config/riscv/vector.md
@@ -3260,7 +3260,8 @@
   [(set_attr "type" "vicalu")
(set_attr "mode" "")
(set_attr "vl_op_idx" "4")
-   (set (attr "avl_type_idx") (const_int 5))])
+   (set (attr "avl_type_idx") (const_int 5))
+   (set_attr "group_overlap" "th,none,none")])
 
 (define_insn "@pred_msbc"
   [(set (match_operand: 0 "register_operand""=vr, vr, ")
@@ -3279,7 +3280,8 @@
   [(set_attr "type" "vicalu")
(set_attr "mode" "")
(set_attr "vl_op_idx" "4")
-   (set (attr "avl_type_idx") (const_int 5))])
+   (set (attr "avl_type_idx") (const_int 5))
+   (set_attr "group_overlap" "th,th,none")])
 
 (define_insn "@pred_madc_scalar"
   [(set (match_operand: 0 "register_operand" "=vr, ")
@@ -3299,7 +3301,8 @@
   [(set_attr "type" "vicalu")
(set_attr "mode" "")
(set_attr "vl_op_idx" "4")
-   (set (attr "avl_type_idx") (const_int 5))])
+   (set (attr "avl_type_idx") (const_int 5))
+   (set_attr "group_overlap" "th,none")])
 
 (define_insn "@pred_msbc_scalar"
   [(set (match_operand: 0 "register_operand" "=vr, ")
@@ -3319,7 +3322,8 @@
   [(set_attr "type" "vicalu")
(set_attr "mode" "")
(set_attr "vl_op_idx" "4")
-   (set (attr "avl_type_idx") (const_int 5))])
+   (set (attr "avl_type_idx") (const_int 5))
+   (set_attr "group_overlap" "th,none")])
 
 (define_expand "@pred_madc_scalar"
   [(set (match_operand: 0 "register_operand")
@@ -3368,7 +3372,8 @@
   [(set_attr "type" "vicalu")
(set_attr "mode" "")
(set_attr "vl_op_idx" "4")
-   (set (attr "avl_type_idx") (const_int 5))])
+   (set (attr "avl_type_idx") (const_int 5))
+   (set_attr "group_overlap" "th,none")])
 
 (define_insn "*pred_madc_extended_scalar"
   [(set (match_operand: 0 "register_operand" "=vr, ")
@@ -3389,7 +3394,8 @@
   [(set_attr "type" "vicalu")
(set_attr "mode" "")
(set_attr "vl_op_idx" "4")
-   (set (attr "avl_type_idx") (const_int 5))])
+   (set (attr "avl_type_idx") (const_int 5))
+   (set_attr "group_overlap" "th,none")])
 
 (define_expand "@pred_msbc_scalar"
   [(set (match_operand: 0 "register_operand")
@@ -3438,7 +3444,8 @@
   [(set_attr "type" "vicalu")
(set_attr "mode" "")
(set_attr "vl_op_idx" "4")
-   (set (attr "avl_type_idx") (const_int 5))])
+   (set (attr "avl_type_idx") (const_int 5))
+   (set_attr "group_overlap" "th,none")])
 
 (define_insn "*pred_msbc_extended_scalar"
   [(set (match_operand: 0 "register_operand"  "=vr, ")
@@ -3459,7 +3466,8 @@
   [(set_attr "type" "vicalu")
(set_attr "mode" "")
(set_attr "vl_op_idx" "4")
-   (set (attr "avl_type_idx") (const_int 5))])
+   (set (attr "avl_type_idx") (const_int 5))
+   (set_attr "group_overlap" "th,none")])
 
 (define_insn "@pred_madc_overflow"
   

Re: [PATCH] strub: Only unbias stack point for SPARC_STACK_BOUNDARY_HACK [PR113100]

2024-01-09 Thread Kewen.Lin
on 2024/1/8 19:44, Richard Biener wrote:
> On Mon, Jan 8, 2024 at 3:35 AM Kewen.Lin  wrote:
>>
>> Hi,
>>
>> As PR113100 shows, the unbiasing introduced by r14-6737 can
>> cause the scrubbing to overrun and clobber critical data on
>> the stack, such as the saved TOC base, which in turn causes
>> segfaults on Power.
>>
>> Judging from PR112917, IMHO we should keep this unbiasing
>> guarded under SPARC_STACK_BOUNDARY_HACK (TARGET_ARCH64 &&
>> TARGET_STACK_BIAS), similar to existing code that special-cases
>> the SPARC stack bias.
>>
>> Bootstrapped and regtested on x86_64-redhat-linux and
>> powerpc64{,le}-linux-gnu.  All reported failures in
>> PR113100 are gone.  I also expect the culprit commit to
>> affect other ports with a nonzero STACK_POINTER_OFFSET.
>>
>> Is it ok for trunk?
> 
> OK

Thanks!  Pushed as r14-7089.

BR,
Kewen

> 
>> BR,
>> Kewen
>> -
>> PR middle-end/113100
>>
>> gcc/ChangeLog:
>>
>> * builtins.cc (expand_builtin_stack_address): Guard stack pointer
>> adjustment with SPARC_STACK_BOUNDARY_HACK.
>> ---
>>  gcc/builtins.cc | 5 -
>>  1 file changed, 4 insertions(+), 1 deletion(-)
>>
>> diff --git a/gcc/builtins.cc b/gcc/builtins.cc
>> index 125ea158ebf..9bad1e962b4 100644
>> --- a/gcc/builtins.cc
>> +++ b/gcc/builtins.cc
>> @@ -5450,6 +5450,7 @@ expand_builtin_stack_address ()
>>rtx ret = convert_to_mode (ptr_mode, copy_to_reg (stack_pointer_rtx),
>>  STACK_UNSIGNED);
>>
>> +#ifdef SPARC_STACK_BOUNDARY_HACK
>>/* Unbias the stack pointer, bringing it to the boundary between the
>>   stack area claimed by the active function calling this builtin,
>>   and stack ranges that could get clobbered if it called another
>> @@ -5476,7 +5477,9 @@ expand_builtin_stack_address ()
>>   (caller) function's active area as well, whereas those pushed or
>>   allocated temporarily for a call are regarded as part of the
>>   callee's stack range, rather than the caller's.  */
>> -  ret = plus_constant (ptr_mode, ret, STACK_POINTER_OFFSET);
>> +  if (SPARC_STACK_BOUNDARY_HACK)
>> +ret = plus_constant (ptr_mode, ret, STACK_POINTER_OFFSET);
>> +#endif
>>
>>return force_reg (ptr_mode, ret);
>>  }
>> --
>> 2.39.3
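
The guard pattern used in this patch can be sketched in plain C. This is a minimal, hypothetical model, not GCC code: `HYPO_STACK_BOUNDARY_HACK`, `HYPO_STACK_POINTER_OFFSET`, `hypo_biased_stack_abi` and `hypo_stack_address` are illustrative stand-ins, and the offset value is arbitrary. It shows why the patch needs both the `#ifdef` and the `if`: ports that never bias the stack leave the macro undefined, while on SPARC it expands to a condition (`TARGET_ARCH64 && TARGET_STACK_BIAS`) that need not be a compile-time constant.

```c
#include <stdint.h>

/* Hypothetical stand-ins for target macros; the value is illustrative
   and is not taken from any real port.  */
#define HYPO_STACK_POINTER_OFFSET 16

/* Simulated target flag (on SPARC the real guard is
   TARGET_ARCH64 && TARGET_STACK_BIAS).  Ports that never bias the
   stack would simply not define HYPO_STACK_BOUNDARY_HACK at all.  */
static int hypo_biased_stack_abi;
#define HYPO_STACK_BOUNDARY_HACK (hypo_biased_stack_abi)

/* Return the stack address reported for the current frame, given the
   raw stack pointer value SP.  */
static uintptr_t
hypo_stack_address (uintptr_t sp)
{
  uintptr_t ret = sp;
#ifdef HYPO_STACK_BOUNDARY_HACK
  /* Unbias only when the guard is defined AND evaluates true.  */
  if (HYPO_STACK_BOUNDARY_HACK)
    ret += HYPO_STACK_POINTER_OFFSET;
#endif
  return ret;
}
```

Leaving `HYPO_STACK_BOUNDARY_HACK` undefined compiles the adjustment out entirely, which corresponds to the behavior the patch restores for Power and the other ports that do not define the SPARC macro.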


[PATCH] RISC-V: Refine unsigned avg_floor/avg_ceil

2024-01-09 Thread Juzhe-Zhong
This patch is inspired by LLVM patches:
https://github.com/llvm/llvm-project/pull/76550
https://github.com/llvm/llvm-project/pull/77473

Use vaaddu for AVG vectorization.

Before this patch:

vsetivli zero,8,e8,mf2,ta,ma
vle8.v  v3,0(a1)
vle8.v  v2,0(a2)
vwaddu.vv v1,v3,v2
vsetvli zero,zero,e16,m1,ta,ma
vadd.vi v1,v1,1
vsetvli zero,zero,e8,mf2,ta,ma
vnsrl.wi v1,v1,1
vse8.v  v1,0(a0)
ret

After this patch:

vsetivli zero,8,e8,mf2,ta,ma
csrwi   vxrm,0
vle8.v  v1,0(a1)
vle8.v  v2,0(a2)
vaaddu.vv   v1,v1,v2
vse8.v  v1,0(a0)
ret
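
To see why `vaaddu.vv` with `csrwi vxrm,0` can replace the widen/add/narrow sequence, here is a scalar C model of one SEW=8 lane, written from my reading of the RVV 1.0 fixed-point rounding rules (vxrm 0 = rnu, 1 = rne, 2 = rdn, 3 = rod). All function and enum names are hypothetical. It checks exhaustively that rdn reproduces the floor average and rnu the ceil average, the latter being what the "Before" sequence (vwaddu, vadd.vi 1, vnsrl 1) computes.

```c
#include <stdint.h>

/* Fixed-point rounding modes selected by the vxrm CSR.  */
enum vxrm { VXRM_RNU = 0, VXRM_RNE = 1, VXRM_RDN = 2, VXRM_ROD = 3 };

/* Scalar model of one vaaddu.vv lane for SEW=8: the sum is formed with
   one extra bit, so it never overflows, then shifted right by 1 with
   rounding chosen by MODE.  */
static uint8_t
vaaddu_u8 (uint8_t a, uint8_t b, enum vxrm mode)
{
  unsigned v = (unsigned) a + b;  /* exact 9-bit sum */
  unsigned lsb = v & 1u;          /* the bit shifted out */
  unsigned bit1 = (v >> 1) & 1u;  /* lowest bit kept */
  unsigned r = 0;
  switch (mode)
    {
    case VXRM_RNU: r = lsb; break;                 /* round half up */
    case VXRM_RNE: r = lsb & bit1; break;          /* round half to even */
    case VXRM_RDN: r = 0; break;                   /* truncate */
    case VXRM_ROD: r = lsb & (bit1 ? 0u : 1u); break; /* jam to odd */
    }
  return (uint8_t) ((v >> 1) + r);
}

/* Reference semantics of the two optabs, using a widened intermediate.  */
static uint8_t
avg_floor_u8 (uint8_t a, uint8_t b)
{
  return (uint8_t) (((unsigned) a + b) >> 1);
}

static uint8_t
avg_ceil_u8 (uint8_t a, uint8_t b)
{
  return (uint8_t) (((unsigned) a + b + 1) >> 1);
}

/* Return 1 if vaaddu under MODE matches REF for every input pair.  */
static int
matches_everywhere (enum vxrm mode, uint8_t (*ref) (uint8_t, uint8_t))
{
  for (unsigned a = 0; a < 256; a++)
    for (unsigned b = 0; b < 256; b++)
      if (vaaddu_u8 ((uint8_t) a, (uint8_t) b, mode)
          != ref ((uint8_t) a, (uint8_t) b))
        return 0;
  return 1;
}
```

Because the extra bit is kept internally, even 255 + 255 averages correctly without a widening step, which is what lets the patch drop the vsetvli switches to e16.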

Note on signed averaging addition

The RVV spec also provides a signed averaging-add instruction, vaadd.
But AFAIU, in none of the rounding modes can vaadd reproduce the semantics
of signed averaging addition, so this patch only introduces vaaddu.

More details in:
https://github.com/riscv/riscv-v-spec/issues/935
https://github.com/riscv/riscv-v-spec/issues/934

Tested on both RV32 and RV64 with no regressions.

OK for trunk?

gcc/ChangeLog:

* config/riscv/autovec.md (avg3_floor): Remove.
(avg3_floor): New pattern.
(avg3_ceil): Remove.
(avg3_ceil): New pattern.
(uavg3_floor): Ditto.
(uavg3_ceil): Ditto.
* config/riscv/riscv-protos.h (enum insn_flags): Add entries for averaging
addition.
(enum insn_type): Ditto.
* config/riscv/riscv-v.cc: Ditto.
* config/riscv/vector-iterators.md (ashiftrt): Remove.
(ASHIFTRT): Ditto.
* config/riscv/vector.md: Add VLS modes.

gcc/testsuite/ChangeLog:

* gcc.target/riscv/rvv/autovec/vls/avg-1.c: Adapt test.
* gcc.target/riscv/rvv/autovec/vls/avg-2.c: Ditto.
* gcc.target/riscv/rvv/autovec/vls/avg-3.c: Ditto.
* gcc.target/riscv/rvv/autovec/vls/avg-4.c: Ditto.
* gcc.target/riscv/rvv/autovec/vls/avg-5.c: Ditto.
* gcc.target/riscv/rvv/autovec/vls/avg-6.c: Ditto.
* gcc.target/riscv/rvv/autovec/widen/vec-avg-rv32gcv.c: Ditto.
* gcc.target/riscv/rvv/autovec/widen/vec-avg-rv64gcv.c: Ditto.

---
 gcc/config/riscv/autovec.md   | 50 ++-
 gcc/config/riscv/riscv-protos.h   |  8 +++
 gcc/config/riscv/riscv-v.cc   | 11 
 gcc/config/riscv/vector-iterators.md  |  5 --
 gcc/config/riscv/vector.md| 12 ++---
 .../gcc.target/riscv/rvv/autovec/vls/avg-1.c  |  4 +-
 .../gcc.target/riscv/rvv/autovec/vls/avg-2.c  |  4 +-
 .../gcc.target/riscv/rvv/autovec/vls/avg-3.c  |  4 +-
 .../gcc.target/riscv/rvv/autovec/vls/avg-4.c  |  6 +--
 .../gcc.target/riscv/rvv/autovec/vls/avg-5.c  |  6 +--
 .../gcc.target/riscv/rvv/autovec/vls/avg-6.c  |  6 +--
 .../riscv/rvv/autovec/widen/vec-avg-rv32gcv.c |  7 +--
 .../riscv/rvv/autovec/widen/vec-avg-rv64gcv.c |  7 +--
 13 files changed, 86 insertions(+), 44 deletions(-)

diff --git a/gcc/config/riscv/autovec.md b/gcc/config/riscv/autovec.md
index 775eaa825b0..706cd9717cb 100644
--- a/gcc/config/riscv/autovec.md
+++ b/gcc/config/riscv/autovec.md
@@ -2345,39 +2345,39 @@
 ;;  op[0] = (narrow) ((wide) op[1] + (wide) op[2] + 1)) >> 1;
 ;; -
 
-(define_expand "avg3_floor"
+(define_expand "avg3_floor"
  [(set (match_operand: 0 "register_operand")
(truncate:
-(:VWEXTI
+(ashiftrt:VWEXTI
  (plus:VWEXTI
-  (any_extend:VWEXTI
+  (sign_extend:VWEXTI
(match_operand: 1 "register_operand"))
-  (any_extend:VWEXTI
+  (sign_extend:VWEXTI
(match_operand: 2 "register_operand"))]
   "TARGET_VECTOR"
 {
   /* First emit a widening addition.  */
   rtx tmp1 = gen_reg_rtx (mode);
   rtx ops1[] = {tmp1, operands[1], operands[2]};
-  insn_code icode = code_for_pred_dual_widen (PLUS, , mode);
+  insn_code icode = code_for_pred_dual_widen (PLUS, SIGN_EXTEND, mode);
   riscv_vector::emit_vlmax_insn (icode, riscv_vector::BINARY_OP, ops1);
 
   /* Then a narrowing shift.  */
   rtx ops2[] = {operands[0], tmp1, const1_rtx};
-  icode = code_for_pred_narrow_scalar (, mode);
+  icode = code_for_pred_narrow_scalar (ASHIFTRT, mode);
   riscv_vector::emit_vlmax_insn (icode, riscv_vector::BINARY_OP, ops2);
   DONE;
 })
 
-(define_expand "avg3_ceil"
+(define_expand "avg3_ceil"
  [(set (match_operand: 0 "register_operand")
(truncate:
-(:VWEXTI
+(ashiftrt:VWEXTI
  (plus:VWEXTI
   (plus:VWEXTI
-   (any_extend:VWEXTI
+   (sign_extend:VWEXTI
(match_operand: 1 "register_operand"))
-   (any_extend:VWEXTI
+   (sign_extend:VWEXTI
(match_operand: 2 "register_operand")))
   (const_int 1)]
   "TARGET_VECTOR"
@@ -2385,7 +2385,7 @@
   /* First emit a widening addition.  */
   rtx tmp1 = gen_reg_rtx (mode);
   rtx ops1[] = {tmp1, operands[1], operands[2]};
-  insn_code icode = 

RE: [PATCH v3] RISC-V: Bugfix for doesn't honor no-signed-zeros option

2024-01-09 Thread Li, Pan2
Thanks, Jeff and Richard, for the confirmation and comments.

It looks like we should first address the issues in the original commits in v4,
and then come back to whether anything needs to be done for the
-fno-signed-zeros option on RISC-V.

Pan

-Original Message-
From: Jeff Law  
Sent: Wednesday, January 10, 2024 1:46 AM
To: Richard Biener ; Li, Pan2 
Cc: gcc-patches@gcc.gnu.org; juzhe.zh...@rivai.ai; Wang, Yanzhang 
; kito.ch...@gmail.com
Subject: Re: [PATCH v3] RISC-V: Bugfix for doesn't honor no-signed-zeros option



On 1/8/24 03:45, Richard Biener wrote:
> On Tue, Jan 2, 2024 at 2:37 PM  wrote:
>>
>> From: Pan Li 
>>
>> According to the semantics of the no-signed-zeros option, a backend
>> such as RISC-V should treat minus zero (-0.0f) as plus zero (0.0f).
>>
>> Consider the example below, compiled with -fno-signed-zeros.
>>
>> void
>> test (float *a)
>> {
>>*a = -0.0;
>> }
>>
>> We currently generate the code below, which does not treat minus zero
>> as plus zero.
>>
>> test:
>>lui  a5,%hi(.LC0)
>>flw  fa5,%lo(.LC0)(a5)
>>fsw  fa5,0(a0)
>>ret
>>
>> .LC0:
>>.word -2147483648 // aka -0.0 (0x80000000 in hex)
>>
>> This patch fixes the bug by treating minus zero (-0.0) as plus
>> zero (+0.0). With this patch, the sample code above compiles to
>> the following asm.
>>
>> test:
>>sw zero,0(a0)
>>ret
>>
>> This patch also fixes the run failure of the test case pr30957-1.c. The
>> tests below pass with this patch.
> 
> We don't really expect targets to do this.  The small testcase above
> is somewhat ill-formed with -fno-signed-zeros.  Note there's no
> -0.0 in pr30957-1.c so why does that one fail for you?  Does
> the -fvariable-expansion-in-unroller code maybe not trigger for
> riscv?
Loop unrolling (and thus variable expansion) doesn't trigger on
VLA-style architectures.  aarch64 passes because its backend knows it can
translate -0.0 into 0.0.

While we don't require that of ports, I'd just as soon do the
optimization, as aarch64 does, rather than xfail or skip the test on
RISC-V.  We can load 0.0 more efficiently than -0.0.


> 
> I think we should go to PR30957 and see what that was filed originally
> for, the testcase doesn't make much sense to me.
It's got more history than I'd like :(


jeff
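
What is at stake in this thread can be shown with a small C sketch (the helper names are hypothetical): -0.0f and +0.0f compare equal, yet their encodings differ (0x80000000 vs. 0). Storing +0.0 therefore lets the target use the zero register (`sw zero,0(a0)`) instead of loading -0.0 from the constant pool, and -fno-signed-zeros permits, but does not require, the compiler to treat the two as interchangeable.

```c
#include <stdint.h>
#include <string.h>

/* Return the raw encoding of F; assumes float is IEEE 754 binary32,
   as it is on RISC-V.  memcpy avoids strict-aliasing problems.  */
static uint32_t
float_bits (float f)
{
  uint32_t u;
  memcpy (&u, &f, sizeof u);
  return u;
}

/* Nonzero if A and B compare equal yet have different encodings,
   as +0.0f and -0.0f do.  */
static int
equal_but_distinct (float a, float b)
{
  return a == b && float_bits (a) != float_bits (b);
}
```

The sketch should be compiled without -ffast-math or -fno-signed-zeros, since under those options the compiler may fold -0.0f to +0.0f and hide the difference being demonstrated.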


Re:[pushed] [PATCH v2 0/4] Adjust option handling code

2024-01-09 Thread chenglulu

Pushed to r14-7085...r14-7088

On 2024/1/8 9:14 AM, Yang Yujie wrote:

This patchset performs some code cleanup, and is bootstrapped and regtested
on loongarch64-linux-gnu.

Changes from v1 -> v2:
* Replaced all TARGET_ macros from .opt.
* Fixed definition of ISA_HAS_LAMCAS.

Yang Yujie (4):
   LoongArch: Handle ISA evolution switches along with other options
   LoongArch: Rename ISA_BASE_LA64V100 to ISA_BASE_LA64
   LoongArch: Use enums for constants
   LoongArch: Simplify -mexplicit-reloc definitions

  gcc/config/loongarch/genopts/genstr.sh|   2 +-
  .../loongarch/genopts/loongarch-strings   |   8 +-
  gcc/config/loongarch/genopts/loongarch.opt.in |  16 +--
  gcc/config/loongarch/lasx.md  |   4 +-
  gcc/config/loongarch/loongarch-builtins.cc|   6 +-
  gcc/config/loongarch/loongarch-c.cc   |   2 +-
  gcc/config/loongarch/loongarch-cpu.cc |   2 +-
  gcc/config/loongarch/loongarch-def.cc |  14 +-
  gcc/config/loongarch/loongarch-def.h  | 120 +++---
  gcc/config/loongarch/loongarch-driver.cc  |   5 +-
  gcc/config/loongarch/loongarch-opts.cc|  27 +++-
  gcc/config/loongarch/loongarch-opts.h |  26 +++-
  gcc/config/loongarch/loongarch-str.h  |   7 +-
  gcc/config/loongarch/loongarch.cc |  36 ++
  gcc/config/loongarch/loongarch.md |  12 +-
  gcc/config/loongarch/loongarch.opt|  20 +--
  gcc/config/loongarch/lsx.md   |   4 +-
  gcc/config/loongarch/sync.md  |  22 ++--
  18 files changed, 180 insertions(+), 153 deletions(-)





[PATCH V2] RISC-V: Minor tweak dynamic cost model

2024-01-09 Thread Juzhe-Zhong
v2 update: Robustify tests.

While working on the cost model, I noticed a case where the dynamic LMUL cost
model doesn't work well.

Before this patch:

foo:
lui a4,%hi(.LANCHOR0)
li  a0,1953
li  a1,63
addi a4,a4,%lo(.LANCHOR0)
li  a3,64
vsetvli a2,zero,e32,mf2,ta,ma
vmv.v.x v5,a0
vmv.v.x v4,a1
vid.v   v3
.L2:
vsetvli a5,a3,e32,mf2,ta,ma
vadd.vi v2,v3,1
vadd.vv v1,v3,v5
mv  a2,a5
vmacc.vv v1,v2,v4
slli a1,a5,2
vse32.v v1,0(a4)
sub a3,a3,a5
add a4,a4,a1
vsetvli a5,zero,e32,mf2,ta,ma
vmv.v.x v1,a2
vadd.vv v3,v3,v1
bne a3,zero,.L2
li  a0,0
ret

Unexpected: this uses scalable vectors with LMUL = MF2, which wastes
computation resources.

Ideally, we should use LMUL = M8 VLS modes.

The root cause is that the dynamic LMUL heuristic dominates the VLS heuristic.
This patch adapts the cost model heuristic accordingly.
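
The condition changed in `costs::better_main_loop_than_p` can be modeled in a few lines of C. This is a hypothetical simplification: the struct, enum and function names are invented, and only the two fields named in the diff (`m_unrolled_vls_stmts`, `m_cost_type`) are modeled, with assumed types. Before the patch, the unrolling heuristic fired whenever exactly one candidate had unrolled VLS statements; after it, the two candidates must also have different cost types.

```c
#include <stdbool.h>

/* Hypothetical stand-ins for the cost-model state; the field types
   are assumptions, only the names follow the diff.  */
enum hypo_cost_type { HYPO_VLA_COST, HYPO_VLS_COST };

struct hypo_costs
{
  unsigned unrolled_vls_stmts;    /* models m_unrolled_vls_stmts */
  enum hypo_cost_type cost_type;  /* models m_cost_type */
};

/* Condition before the patch: exactly one side has unrolled VLS stmts.  */
static bool
old_applies_unroll_heuristic (const struct hypo_costs *a,
                              const struct hypo_costs *b)
{
  return (a->unrolled_vls_stmts != 0) != (b->unrolled_vls_stmts != 0);
}

/* Condition after the patch: additionally, the cost types must differ.  */
static bool
new_applies_unroll_heuristic (const struct hypo_costs *a,
                              const struct hypo_costs *b)
{
  return old_applies_unroll_heuristic (a, b)
         && a->cost_type != b->cost_type;
}
```

When the new predicate is false, the comparison falls through to the rest of `better_main_loop_than_p` instead of being decided by the unrolling heuristic.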

After this patch:

foo:
lui a4,%hi(.LANCHOR0)
addi a4,a4,%lo(.LANCHOR0)
li  a3,4096
li  a5,32
li  a1,2016
addi a2,a4,128
addiw   a3,a3,-32
vsetvli zero,a5,e32,m8,ta,ma
li  a0,0
vid.v   v8
vsll.vi v8,v8,6
vadd.vx v16,v8,a1
vadd.vx v8,v8,a3
vse32.v v16,0(a4)
vse32.v v8,0(a2)
ret

Tested on both RV32 and RV64 with no regressions.

OK for trunk?

gcc/ChangeLog:

* config/riscv/riscv-vector-costs.cc (costs::better_main_loop_than_p):
Minor tweak.

gcc/testsuite/ChangeLog:

* gcc.dg/vect/costmodel/riscv/rvv/vla_vs_vls-10.c: Fix test.
* gcc.dg/vect/costmodel/riscv/rvv/vla_vs_vls-11.c: Ditto.
* gcc.dg/vect/costmodel/riscv/rvv/vla_vs_vls-12.c: Ditto.

---
 gcc/config/riscv/riscv-vector-costs.cc | 3 ++-
 .../gcc.dg/vect/costmodel/riscv/rvv/vla_vs_vls-10.c| 5 ++---
 .../gcc.dg/vect/costmodel/riscv/rvv/vla_vs_vls-11.c| 5 ++---
 .../gcc.dg/vect/costmodel/riscv/rvv/vla_vs_vls-12.c| 7 +++
 4 files changed, 9 insertions(+), 11 deletions(-)

diff --git a/gcc/config/riscv/riscv-vector-costs.cc 
b/gcc/config/riscv/riscv-vector-costs.cc
index f4a1a789f23..e53f4a186f3 100644
--- a/gcc/config/riscv/riscv-vector-costs.cc
+++ b/gcc/config/riscv/riscv-vector-costs.cc
@@ -994,7 +994,8 @@ costs::better_main_loop_than_p (const vector_costs 
*uncast_other) const
 vect_vf_for_cost (other_loop_vinfo));
 
   /* Apply the unrolling heuristic described above m_unrolled_vls_niters.  */
-  if (bool (m_unrolled_vls_stmts) != bool (other->m_unrolled_vls_stmts))
+  if (bool (m_unrolled_vls_stmts) != bool (other->m_unrolled_vls_stmts)
+  && m_cost_type != other->m_cost_type)
 {
   bool this_prefer_unrolled = this->prefer_unrolled_loop ();
   bool other_prefer_unrolled = other->prefer_unrolled_loop ();
diff --git a/gcc/testsuite/gcc.dg/vect/costmodel/riscv/rvv/vla_vs_vls-10.c 
b/gcc/testsuite/gcc.dg/vect/costmodel/riscv/rvv/vla_vs_vls-10.c
index 3ddffa37fe4..89a6c678960 100644
--- a/gcc/testsuite/gcc.dg/vect/costmodel/riscv/rvv/vla_vs_vls-10.c
+++ b/gcc/testsuite/gcc.dg/vect/costmodel/riscv/rvv/vla_vs_vls-10.c
@@ -3,7 +3,7 @@
 
 #include 
 
-#define N 40
+#define N 48
 
 int a[N];
 
@@ -22,7 +22,6 @@ foo (){
   return 0;
 }
 
-/* { dg-final { scan-assembler-times 
{vsetivli\s+zero,\s*8,\s*e32,\s*m2,\s*t[au],\s*m[au]} 1 } } */
 /* { dg-final { scan-assembler-times 
{vsetivli\s+zero,\s*16,\s*e32,\s*m4,\s*t[au],\s*m[au]} 1 } } */
-/* { dg-final { scan-assembler-times {vsetivli} 2 } } */
+/* { dg-final { scan-assembler-times {vsetivli} 1 } } */
 /* { dg-final { scan-assembler-not {vsetvli} } } */
diff --git a/gcc/testsuite/gcc.dg/vect/costmodel/riscv/rvv/vla_vs_vls-11.c 
b/gcc/testsuite/gcc.dg/vect/costmodel/riscv/rvv/vla_vs_vls-11.c
index 7625ec5c4b1..86732ef2ce5 100644
--- a/gcc/testsuite/gcc.dg/vect/costmodel/riscv/rvv/vla_vs_vls-11.c
+++ b/gcc/testsuite/gcc.dg/vect/costmodel/riscv/rvv/vla_vs_vls-11.c
@@ -3,7 +3,7 @@
 
 #include 
 
-#define N 40
+#define N 64
 
 int a[N];
 
@@ -22,7 +22,6 @@ foo (){
   return 0;
 }
 
-/* { dg-final { scan-assembler-times 
{vsetivli\s+zero,\s*8,\s*e32,\s*m2,\s*t[au],\s*m[au]} 1 } } */
 /* { dg-final { scan-assembler-times 
{vsetvli\s+zero,\s*[a-x0-9]+,\s*e32,\s*m8,\s*t[au],\s*m[au]} 1 } } */
-/* { dg-final { scan-assembler-times {vsetivli} 1 } } */
+/* { dg-final { scan-assembler-not {vsetivli} } } */
 /* { dg-final { scan-assembler-times {vsetvli} 1 } } */
diff --git a/gcc/testsuite/gcc.dg/vect/costmodel/riscv/rvv/vla_vs_vls-12.c 
b/gcc/testsuite/gcc.dg/vect/costmodel/riscv/rvv/vla_vs_vls-12.c
index 7625ec5c4b1..a1fcb3f3443 100644
--- a/gcc/testsuite/gcc.dg/vect/costmodel/riscv/rvv/vla_vs_vls-12.c
+++ b/gcc/testsuite/gcc.dg/vect/costmodel/riscv/rvv/vla_vs_vls-12.c
@@ -1,9 +1,9 @@
 /* { dg-do compile } */
-/* { dg-options "-march=rv64gcv -mabi=lp64d -O3 

Re: Re:[PATCH v5] RISC-V: Handle differences between XTheadvector and Vector

2024-01-09 Thread juzhe.zh...@rivai.ai
 ;; We don't use early-clobber for LMUL <= 1 to get better codegen.
 (define_insn "*pred_cmp"
-  [(set (match_operand: 0 "register_operand""=vr,   vr,   
vr,   vr")
+  [(set (match_operand: 0 "register_operand""=vr,   vr,   
vr,   vr,   ,   ,   ,   ")
(if_then_else:
  (unspec:
-   [(match_operand: 1 "vector_mask_operand"  
"vmWc1,vmWc1,vmWc1,vmWc1")
-(match_operand 6 "vector_length_operand" "   rK,   rK,   
rK,   rK")
-(match_operand 7 "const_int_operand" "i,i,
i,i")
-(match_operand 8 "const_int_operand" "i,i,
i,i")
+   [(match_operand: 1 "vector_mask_operand"  
"vmWc1,vmWc1,vmWc1,vmWc1,vmWc1,vmWc1,vmWc1,vmWc1")
+(match_operand 6 "vector_length_operand" "   rK,   rK,   
rK,   rK,   rK,   rK,   rK,   rK")
+(match_operand 7 "const_int_operand" "i,i,
i,i,i,i,i,i")
+(match_operand 8 "const_int_operand" "i,i,
i,i,i,i,i,i")
 (reg:SI VL_REGNUM)
 (reg:SI VTYPE_REGNUM)] UNSPEC_VPREDICATE)
  (match_operator: 3 "comparison_except_ltge_operator"
-[(match_operand:V_VLSI 4 "register_operand"  "   vr,   vr, 
  vr,   vr")
- (match_operand:V_VLSI 5 "vector_arith_operand"  "   vr,   vr, 
  vi,   vi")])
- (match_operand: 2 "vector_merge_operand""   vu,0,   
vu,0")))]
+[(match_operand:V_VLSI 4 "register_operand"  "   vr,   vr, 
  vr,   vr,   vr,   vr,   vr,   vr")
+ (match_operand:V_VLSI 5 "vector_arith_operand"  "   vr,   vr, 
  vi,   vi,   vr,   vr,   vi,   vi")])
+ (match_operand: 2 "vector_merge_operand""   vu,0,   
vu,0,   vu,0,   vu,0")))]
   "TARGET_VECTOR && riscv_vector::cmp_lmul_le_one (mode)"
   "vms%B3.v%o5\t%0,%4,%v5%p1"
   [(set_attr "type" "vicmp")
-   (set_attr "mode" "")])
+   (set_attr "mode" "")
+   (set_attr "group_overlap" "th,th,th,th,none,none,none,none")])

You are adding ", , , ", which will be enabled whenever TARGET_VECTOR is enabled.

You should disable these constraints when TARGET_VECTOR is enabled.


juzhe.zh...@rivai.ai
 
From: joshua
Sent: 2024-01-10 10:57
To: juzhe.zh...@rivai.ai; gcc-patches
Cc: Jim Wilson; palmer; andrew; philipp.tomsich; jeffreyalaw; 
christoph.muellner; jinma; cooper.qu
Subject: Re: [PATCH v5] RISC-V: Handle differences between XTheadvector and Vector
Hi Juzhe,
Thank you for so many useful comments for this patch!
 
There are some more patches to support XTheadVector-specific
intrinsics, handle the register overlap issue, and
rewrite the assembly output.
 
https://gcc.gnu.org/pipermail/gcc-patches/2024-January/641774.html
 
https://gcc.gnu.org/pipermail/gcc-patches/2024-January/641732.html
 
https://gcc.gnu.org/pipermail/gcc-patches/2024-January/641733.html
 
Also, there is a preceding patch that refactors riscv-vector-builtins-bases.cc:
https://gcc.gnu.org/pipermail/gcc-patches/2023-December/641530.html
Jeff has reviewed it but hasn't given an LGTM yet.
 
Joshua
 
--
From: juzhe.zh...@rivai.ai
Sent: Wednesday, January 10, 2024 10:34
To: "cooper.joshua"; 
"gcc-patches"
Cc: Jim Wilson; palmer; 
andrew; "philipp.tomsich"; 
jeffreyalaw; 
"christoph.muellner"; 
"cooper.joshua"; 
jinma; "cooper.qu"
Subject: Re: [PATCH v5] RISC-V: Handle differences between XTheadvector and Vector
 
 
Thanks for your patience.
 
 
LGTM from my side.

I think it's pretty clean now. I can imagine that in the future, when
theadvector is no longer used, we can remove it very easily.

Also, theadvector won't affect our RVV 1.0 maintenance, since it's cleanly
isolated.

But I'd like to wait a few more days in case somebody wants to chime in.

There are a few more things you should do before committing it:
1. Remember to run the full-coverage RVV 1.0 API test; the test generator
can be downloaded from the official intrinsic doc:
https://github.com/riscv-non-isa/rvv-intrinsic-doc

2. Also run the GCC testsuite regression for RV32 and RV64.

Do you have more theadvector patches that I haven't reviewed? Please point
them to me again.
 
 
Thanks.
juzhe.zh...@rivai.ai
 
 
From: Jun Sha (Joshua)
Date: 2024-01-10 10:22
To: gcc-patches
CC: jim.wilson.gcc; palmer; andrew; philipp.tomsich; jeffreyalaw; 
christoph.muellner; juzhe.zhong; Jun Sha (Joshua); Jin Ma; Xianmiao Qu
Subject: [PATCH v5] RISC-V: Handle differences between XTheadvector and Vector
 
This patch handles the differences in instruction generation
between Vector and XTheadVector. In this version, we only support
the subset of XTheadVector instructions that can be derived directly
from the current RVV 1.0 instructions by simply adding the "th." prefix.
For XTheadVector instructions that have different names but share the
same patterns as RVV 1.0 instructions, we will use the ASM target hook
to rewrite the whole string 

Re:Re: [PATCH v4] RISC-V: Adds the prefix "th." for the instructions of XTheadVector.

2024-01-09 Thread joshua
Hi Kito,

Thank you for your support again.
I believe we can get all our xtheadvector patches
ready before the end of Feb.

May I please ping the arch patch again?
https://gcc.gnu.org/pipermail/gcc-patches/2024-January/641801.html
This is the patch that all the following patches rely on.

Joshua






--
From: Kito Cheng
Sent: Monday, January 8, 2024 11:40
To: joshua
Cc: "juzhe.zh...@rivai.ai"; 
jeffreyalaw; "gcc-patches"; Jim 
Wilson; palmer; 
andrew; "philipp.tomsich"; 
"christoph.muellner"; 
jinma; "cooper.qu"
Subject: Re: Re: [PATCH v4] RISC-V: Adds the prefix "th." for the instructions of 
XTheadVector.


It depends on when you sent the v1 patch to the mailing list, not on
when it gets merged. But of course it's case by case: for a big patch
set like this, I would say no IF it's still not ready by the end of
Feb.

On Mon, Jan 8, 2024 at 11:17 AM joshua  wrote:
>
> Hi Kito,
>
> Thank you for your support.
> So even during stage 4, we can merge this for GCC 14?
>
>
>
>
>
> --
> From: Kito Cheng
> Sent: Monday, January 8, 2024 11:06
> To: joshua
> Cc: "juzhe.zh...@rivai.ai"; 
> jeffreyalaw; "gcc-patches"; 
> Jim Wilson; palmer; 
> andrew; "philipp.tomsich"; 
> "christoph.muellner"; 
> jinma; "cooper.qu"
> Subject: Re: Re: [PATCH v4] RISC-V: Adds the prefix "th." for the instructions of 
> XTheadVector.
>
>
> I am OK with merging this for GCC 14. As we discussed several times in
> the RISC-V GCC sync-up meeting, I think we have at least reached consensus
> among Jeff Law, Palmer Dabbelt and me.
>
> But please be careful: don't break anything for standard vector stuff.
>
> On Mon, Jan 8, 2024 at 10:11 AM joshua  
> wrote:
> >
> > Hi Juzhe,
> >
> > Stage 3 will close today and there are still some patches that
> > haven't been reviewed left.
> > So is it possible to get xtheadvector merged in GCC-14?
> > We emailed Kito regarding this, but haven't got any reply yet.
> >
> > Joshua
> >
> >
> >
> >
> >
> >
> > --
> > From: juzhe.zh...@rivai.ai
> > Sent: Thursday, January 4, 2024 17:18
> > To: "cooper.joshua"; 
> > jeffreyalaw; "gcc-patches"
> > Cc: Jim Wilson; palmer; 
> > andrew; "philipp.tomsich"; 
> > "christoph.muellner"; 
> > jinma; "cooper.qu"
> > Subject: Re: Re: [PATCH v4] RISC-V: Adds the prefix "th." for the instructions 
> > of XTheadVector.
> >
> >
> > \ No newline at end of file
> > Each file needs a trailing newline.
> >
> >
> > I am not able to review the arch stuff. This needs Kito.
> >
> >
> > Besides, Andrew Pinski wants us to defer theadvector to GCC 15.
> >
> >
> > I have no strong opinion here.
> >
> >
> > juzhe.zh...@rivai.ai
> >
> >
> > From: joshua
> > Sent: 2024-01-04 17:15
> > To: 钟居哲; Jeff Law; gcc-patches
> > Cc: jim.wilson.gcc; palmer; andrew; philipp.tomsich; Christoph Müllner; 
> > jinma; Cooper Qu
> > Subject: Re: Re: [PATCH v4] RISC-V: Adds the prefix "th." for the instructions of 
> > XTheadVector.
> >
> > Hi Juzhe,
> >
> > So is the following patch that this patch relies on OK to commit?
> > https://gcc.gnu.org/pipermail/gcc-patches/2023-December/641533.html
> >
> > Joshua
> >
> >
> >
> >
> > --
> > From: 钟居哲
> > Sent: Tuesday, January 2, 2024 06:57
> > To: Jeff Law; 
> > "cooper.joshua"; 
> > "gcc-patches"
> > Cc: "jim.wilson.gcc"; palmer; 
> > andrew; "philipp.tomsich"; 
> > "Christoph Müllner"; 
> > jinma; Cooper Qu
> > Subject: Re: Re: [PATCH v4] RISC-V: Adds the prefix "th." for the instructions 
> > of XTheadVector.
> >
> >
> > This is OK from my side.
> > But before committing this patch, I think we need this patch first:
> > https://gcc.gnu.org/pipermail/gcc-patches/2023-December/641533.html
> >
> >
> > I am back at work, so I will take a look at the other patches today.
> > juzhe.zh...@rivai.ai
> >
> >
> > From: Jeff Law
> > Date: 2024-01-01 01:43
> > To: Jun Sha (Joshua); gcc-patches
> > CC: jim.wilson.gcc; palmer; andrew; philipp.tomsich; christoph.muellner; 
> > juzhe.zhong; Jin Ma; Xianmiao Qu
> > Subject: Re: [PATCH v4] RISC-V: Adds the prefix "th." for the instructions 
> > of XTheadVector.
> >
> >
> >
> > On 12/28/23 21:19, Jun Sha (Joshua) wrote:
> > > This patch adds the th. prefix to all XTheadVector instructions by
> > > implementing new assembly output functions. We only check that the
> > > prefix is 'v', so no extra attribute is needed.
> > >
> > > gcc/ChangeLog:
> > >
> > >       * config/riscv/riscv-protos.h (riscv_asm_output_opcode):
> > >       New function to add assembler insn code prefix/suffix.
> > >       * config/riscv/riscv.cc (riscv_asm_output_opcode): Likewise.
> > >       * config/riscv/riscv.h (ASM_OUTPUT_OPCODE): Likewise.
> > >
> > > Co-authored-by: Jin Ma 
> > > Co-authored-by: Xianmiao Qu 
> > > Co-authored-by: Christoph Müllner 
> > > ---
> > >   gcc/config/riscv/riscv-protos.h                    |  1 +
> > >   gcc/config/riscv/riscv.cc                     

Re:[PATCH v5] RISC-V: Handle differences between XTheadvector and Vector

2024-01-09 Thread joshua
Hi Juzhe,
Thank you for so many useful comments for this patch!

There are some more patches to support XTheadVector-specific
intrinsics, handle the register overlap issue, and
rewrite the assembly output.

https://gcc.gnu.org/pipermail/gcc-patches/2024-January/641774.html

https://gcc.gnu.org/pipermail/gcc-patches/2024-January/641732.html

https://gcc.gnu.org/pipermail/gcc-patches/2024-January/641733.html

Also, there is a preceding patch that refactors riscv-vector-builtins-bases.cc:
https://gcc.gnu.org/pipermail/gcc-patches/2023-December/641530.html
Jeff has reviewed it but hasn't given an LGTM yet.

Joshua

--
From: juzhe.zh...@rivai.ai
Sent: Wednesday, January 10, 2024 10:34
To: "cooper.joshua"; 
"gcc-patches"
Cc: Jim Wilson; palmer; 
andrew; "philipp.tomsich"; 
jeffreyalaw; 
"christoph.muellner"; 
"cooper.joshua"; 
jinma; "cooper.qu"
Subject: Re: [PATCH v5] RISC-V: Handle differences between XTheadvector and Vector


Thanks for your patience.


LGTM from my side.

I think it's pretty clean now. I can imagine that in the future, when
theadvector is no longer used, we can remove it very easily.

Also, theadvector won't affect our RVV 1.0 maintenance, since it's cleanly
isolated.

But I'd like to wait a few more days in case somebody wants to chime in.

There are a few more things you should do before committing it:
1. Remember to run the full-coverage RVV 1.0 API test; the test generator
can be downloaded from the official intrinsic doc:
https://github.com/riscv-non-isa/rvv-intrinsic-doc

2. Also run the GCC testsuite regression for RV32 and RV64.

Do you have more theadvector patches that I haven't reviewed? Please point
them to me again.


Thanks.
juzhe.zh...@rivai.ai

 
From: Jun Sha (Joshua)
Date: 2024-01-10 10:22
To: gcc-patches
CC: jim.wilson.gcc; palmer; andrew; philipp.tomsich; jeffreyalaw; 
christoph.muellner; juzhe.zhong; Jun Sha (Joshua); Jin Ma; Xianmiao Qu
Subject: [PATCH v5] RISC-V: Handle differences between XTheadvector and Vector

This patch handles the differences in instruction generation
between Vector and XTheadVector. In this version, we only support
the subset of XTheadVector instructions that can be derived directly
from the current RVV 1.0 instructions by simply adding the "th." prefix.
For XTheadVector instructions that have different names but share the
same patterns as RVV 1.0 instructions, we will use the ASM target hook
to rewrite the whole instruction string in the following patches.
 
For some vector patterns that cannot be avoided, we use
"!TARGET_XTHEADVECTOR" to disable them in vector.md in order
not to generate instructions that xtheadvector does not support,
like vmv1r and vsext.vf2.
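
As a rough illustration of the prefixing approach (per the v4 description, only a leading 'v' is checked), here is a hypothetical C helper. It does not reproduce the signature or behavior of the real `riscv_asm_output_opcode` / `ASM_OUTPUT_OPCODE` hook; it simply writes a mnemonic into a buffer, adding "th." when XTheadVector is enabled.

```c
#include <stddef.h>
#include <stdio.h>

/* Hypothetical helper, not GCC's hook: copy OPCODE into BUF (of SIZE
   bytes), prepending "th." to mnemonics that start with 'v' when
   XTHEADVECTOR is nonzero.  Returns the resulting length, or -1 if the
   output did not fit.  */
static int
hypo_output_opcode (char *buf, size_t size, const char *opcode,
                    int xtheadvector)
{
  const char *prefix = (xtheadvector && opcode[0] == 'v') ? "th." : "";
  int n = snprintf (buf, size, "%s%s", prefix, opcode);
  if (n < 0 || (size_t) n >= size)
    return -1;  /* encoding error or truncation */
  return n;
}
```

A pure prefix rule like this cannot handle XTheadVector instructions whose names differ from their RVV 1.0 counterparts, which is why those are left to the ASM target hook in later patches.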
 
gcc/ChangeLog:
 
* config.gcc:  Add files for XTheadVector intrinsics.
* config/riscv/autovec.md: Guard XTheadVector.
* config/riscv/riscv-c.cc: Add pragma for XTheadVector.
* config/riscv/riscv-string.cc (expand_block_move):
Guard XTheadVector.
* config/riscv/riscv-string.cc (vls_mode_valid_p): 
Avoid autovec.
* config/riscv/riscv-vector-builtins-shapes.cc (check_type):
(build_one): New function.
* config/riscv/riscv-vector-builtins.cc (DEF_RVV_FUNCTION):
(DEF_THEAD_RVV_FUNCTION): Add new macros.
(check_required_extensions):
(handle_pragma_vector):
* config/riscv/riscv-vector-builtins.h (RVV_REQUIRE_VECTOR):
(RVV_REQUIRE_XTHEADVECTOR):
Add RVV_REQUIRE_VECTOR and RVV_REQUIRE_XTHEADVECTOR.
(struct function_group_info):
* config/riscv/riscv-vector-switch.def (ENTRY):
Disable fractional mode for the XTheadVector extension.
(TUPLE_ENTRY): Likewise.
* config/riscv/riscv-vsetvl.cc: Add functions for xtheadvector.
* config/riscv/riscv.cc (riscv_v_ext_vls_mode_p):
Guard XTheadVector.
(riscv_v_adjust_bytesize): Likewise.
(riscv_preferred_simd_mode): Likewise.
(riscv_autovectorize_vector_modes): Likewise.
(riscv_vector_mode_supported_any_target_p): Likewise.
(TARGET_VECTOR_MODE_SUPPORTED_ANY_TARGET_P): Likewise.
* config/riscv/vector.md: Include thead-vector.md.
* config/riscv/riscv_th_vector.h: New file.
* config/riscv/thead-vector.md: New file.
 
gcc/testsuite/ChangeLog:
 
* gcc.target/riscv/rvv/base/pragma-1.c: Add XTheadVector.
* gcc.target/riscv/rvv/base/abi-1.c: Exclude XTheadVector.
* lib/target-supports.exp: Add target for XTheadVector.
 
Co-authored-by: Jin Ma 
Co-authored-by: Xianmiao Qu 
Co-authored-by: Christoph Müllner 
---
 gcc/config.gcc    |   2 +-
 gcc/config/riscv/autovec.md   |   2 +-
 gcc/config/riscv/predicates.md    |   4 +-
 gcc/config/riscv/riscv-c.cc   |   3 +-
 gcc/config/riscv/riscv-string.cc  |   3 +-
 gcc/config/riscv/riscv-v.cc   |   2 +-
 .../riscv/riscv-vector-builtins-bases.cc  |  48 --
 

Re: [PATCH v5] RISC-V: Handle differences between XTheadvector and Vector

2024-01-09 Thread juzhe.zh...@rivai.ai
Thanks for your patience.

LGTM from my side.

I think it's pretty clean now. I can imagine that in the future, when
theadvector is no longer used, we can remove it very easily.

Also, theadvector won't affect our RVV 1.0 maintenance, since it's cleanly
isolated.

But I'd like to wait a few more days in case somebody wants to chime in.

There are a few more things you should do before committing it:
1. Remember to run the full-coverage RVV 1.0 API test; the test generator
can be downloaded from the official intrinsic doc:
https://github.com/riscv-non-isa/rvv-intrinsic-doc

2. Also run the GCC testsuite regression for RV32 and RV64.

Do you have more theadvector patches that I haven't reviewed? Please point
them to me again.

Thanks.


juzhe.zh...@rivai.ai
 
From: Jun Sha (Joshua)
Date: 2024-01-10 10:22
To: gcc-patches
CC: jim.wilson.gcc; palmer; andrew; philipp.tomsich; jeffreyalaw; 
christoph.muellner; juzhe.zhong; Jun Sha (Joshua); Jin Ma; Xianmiao Qu
Subject: [PATCH v5] RISC-V: Handle differences between XTheadvector and Vector
This patch handles the differences in instruction generation
between Vector and XTheadVector. In this version, we only support
the subset of XTheadVector instructions that can be derived directly
from the current RVV 1.0 instructions by simply adding the "th." prefix.
For XTheadVector instructions that have different names but share the
same patterns as RVV 1.0 instructions, we will use the ASM target hook
to rewrite the whole instruction string in the following patches.
 
For some vector patterns that cannot be avoided, we use
"!TARGET_XTHEADVECTOR" to disable them in vector.md in order
not to generate instructions that xtheadvector does not support,
like vmv1r and vsext.vf2.
 
gcc/ChangeLog:
 
* config.gcc:  Add files for XTheadVector intrinsics.
* config/riscv/autovec.md: Guard XTheadVector.
* config/riscv/riscv-c.cc: Add pragma for XTheadVector.
* config/riscv/riscv-string.cc (expand_block_move):
Guard XTheadVector.
* config/riscv/riscv-string.cc (vls_mode_valid_p): 
Avoid autovec.
* config/riscv/riscv-vector-builtins-shapes.cc (check_type):
(build_one): New function.
* config/riscv/riscv-vector-builtins.cc (DEF_RVV_FUNCTION):
(DEF_THEAD_RVV_FUNCTION): Add new macros.
(check_required_extensions):
(handle_pragma_vector):
* config/riscv/riscv-vector-builtins.h (RVV_REQUIRE_VECTOR):
(RVV_REQUIRE_XTHEADVECTOR):
Add RVV_REQUIRE_VECTOR and RVV_REQUIRE_XTHEADVECTOR.
(struct function_group_info):
* config/riscv/riscv-vector-switch.def (ENTRY):
Disable fractional mode for the XTheadVector extension.
(TUPLE_ENTRY): Likewise.
* config/riscv/riscv-vsetvl.cc: Add functions for xtheadvector.
* config/riscv/riscv.cc (riscv_v_ext_vls_mode_p):
Guard XTheadVector.
(riscv_v_adjust_bytesize): Likewise.
(riscv_preferred_simd_mode): Likewise.
(riscv_autovectorize_vector_modes): Likewise.
(riscv_vector_mode_supported_any_target_p): Likewise.
(TARGET_VECTOR_MODE_SUPPORTED_ANY_TARGET_P): Likewise.
* config/riscv/vector.md: Include thead-vector.md.
* config/riscv/riscv_th_vector.h: New file.
* config/riscv/thead-vector.md: New file.
 
gcc/testsuite/ChangeLog:
 
* gcc.target/riscv/rvv/base/pragma-1.c: Add XTheadVector.
* gcc.target/riscv/rvv/base/abi-1.c: Exclude XTheadVector.
* lib/target-supports.exp: Add target for XTheadVector.
 
Co-authored-by: Jin Ma 
Co-authored-by: Xianmiao Qu 
Co-authored-by: Christoph Müllner 
---
gcc/config.gcc|   2 +-
gcc/config/riscv/autovec.md   |   2 +-
gcc/config/riscv/predicates.md|   4 +-
gcc/config/riscv/riscv-c.cc   |   3 +-
gcc/config/riscv/riscv-string.cc  |   3 +-
gcc/config/riscv/riscv-v.cc   |   2 +-
.../riscv/riscv-vector-builtins-bases.cc  |  48 --
.../riscv/riscv-vector-builtins-shapes.cc |  23 +++
gcc/config/riscv/riscv-vector-switch.def  | 150 +-
gcc/config/riscv/riscv.cc |  20 ++-
gcc/config/riscv/riscv_th_vector.h|  49 ++
gcc/config/riscv/thead-vector.md  | 102 
gcc/config/riscv/thead.cc |  23 ++-
gcc/config/riscv/vector.md|  49 --
.../gcc.target/riscv/rvv/base/abi-1.c |   2 +-
.../gcc.target/riscv/rvv/base/pragma-1.c  |   2 +-
gcc/testsuite/lib/target-supports.exp |  12 ++
17 files changed, 383 insertions(+), 113 deletions(-)
create mode 100644 gcc/config/riscv/riscv_th_vector.h
create mode 100644 gcc/config/riscv/thead-vector.md
 
diff --git a/gcc/config.gcc b/gcc/config.gcc
index 7e583390024..047e4c02cf4 100644
--- a/gcc/config.gcc
+++ b/gcc/config.gcc
@@ -549,7 +549,7 @@ riscv*)
extra_objs="${extra_objs} riscv-vector-builtins.o 
riscv-vector-builtins-shapes.o riscv-vector-builtins-bases.o"
extra_objs="${extra_objs} thead.o riscv-target-attr.o"
d_target_objs="riscv-d.o"
- extra_headers="riscv_vector.h"
+ extra_headers="riscv_vector.h riscv_th_vector.h"
target_gtfiles="$target_gtfiles 
\$(srcdir)/config/riscv/riscv-vector-builtins.cc"

[PATCH] Add -mevex512 into invoke.texi

2024-01-09 Thread Haochen Jiang
Hi Richard,

It seems that I sent out an outdated patch. This patch is the one I
intended to send.

Thx,
Haochen

gcc/ChangeLog:

* doc/invoke.texi: Add -mevex512.
---
 gcc/doc/invoke.texi | 7 ++-
 1 file changed, 6 insertions(+), 1 deletion(-)

diff --git a/gcc/doc/invoke.texi b/gcc/doc/invoke.texi
index 68d1f364ac0..6d4f92f1101 100644
--- a/gcc/doc/invoke.texi
+++ b/gcc/doc/invoke.texi
@@ -1463,7 +1463,7 @@ See RS/6000 and PowerPC Options.
 -mamx-tile  -mamx-int8  -mamx-bf16 -muintr -mhreset -mavxvnni
 -mavx512fp16 -mavxifma -mavxvnniint8 -mavxneconvert -mcmpccxadd -mamx-fp16
 -mprefetchi -mraoint -mamx-complex -mavxvnniint16 -msm3 -msha512 -msm4 -mapxf
--musermsr -mavx10.1 -mavx10.1-256 -mavx10.1-512
+-musermsr -mavx10.1 -mavx10.1-256 -mavx10.1-512 -mevex512
 -mcldemote  -mms-bitfields  -mno-align-stringops  -minline-all-stringops
 -minline-stringops-dynamically  -mstringop-strategy=@var{alg}
 -mkl -mwidekl
@@ -35272,6 +35272,11 @@ r8-r15 registers so that the call and jmp instruction 
length is 6 bytes
 to allow them to be replaced with @samp{lfence; call *%r8-r15} or
 @samp{lfence; jmp *%r8-r15} at run-time.
 
+@opindex mevex512
+@item -mevex512
+@itemx -mno-evex512
+Enables/disables 512-bit vector. It will be default on if AVX512F is enabled.
+
 @end table
 
 These @samp{-m} switches are supported in addition to the above
-- 
2.31.1
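As a rough illustration of the documented default, C code can check which vector width the compiler was told to target through predefined macros. This is a sketch assuming GCC 14 or later, where `__EVEX512__` is predefined when 512-bit EVEX encoding is enabled (with older compilers it is never defined, so the function would under-report); the helper name `max_vector_bits` is hypothetical.

```c
#include <assert.h>

/* Hypothetical helper: widest x86 vector width enabled for this
   translation unit, derived only from predefined macros.  Assumes
   GCC 14+, which defines __EVEX512__ when 512-bit vectors are on
   (e.g. -mavx512f without -mno-evex512).  */
static int
max_vector_bits (void)
{
#if defined (__AVX512F__) && defined (__EVEX512__)
  return 512;   /* AVX512F with 512-bit EVEX enabled.  */
#elif defined (__AVX__)
  return 256;   /* AVX, or AVX512F built with -mno-evex512.  */
#elif defined (__SSE2__)
  return 128;
#else
  return 0;     /* Not x86, or no SIMD extension enabled.  */
#endif
}
```

Under that assumption, compiling with `-mavx512f` should report 512, while adding `-mno-evex512` should drop the result to 256, since AVX512F still implies AVX.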



[PATCH v5] RISC-V: Handle differences between XTheadvector and Vector

2024-01-09 Thread Jun Sha (Joshua)
This patch handles the differences in instruction generation
between Vector and XTheadVector. In this version, we only support
the subset of XTheadVector instructions that can be derived directly
from the current RVV 1.0 instructions by simply adding the "th." prefix.
For XTheadVector instructions that have different names but share the
same patterns as RVV 1.0 instructions, we will use the ASM target hook
to rewrite the whole instruction string in follow-up patches.

For some vector patterns that cannot be avoided, we use
"!TARGET_XTHEADVECTOR" to disable them in vector.md so that we do
not generate instructions that XTheadVector does not support,
like vmv1r and vsext.vf2.
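The naming rule described above can be modelled as a small string transformation. This is a hypothetical sketch, not the patch's implementation (which works through GCC's machine-description patterns and, later, an ASM target hook): it prefixes an RVV 1.0 mnemonic with `th.` and rejects only the two unsupported forms named in the message, `vmv1r` and `vsext.vf2`. A real deny list would be longer.

```c
#include <assert.h>
#include <stdbool.h>
#include <stdio.h>
#include <string.h>

/* Illustrative deny list: only the two examples the patch mentions.  */
static const char *const th_unsupported[] = { "vmv1r.v", "vsext.vf2" };

/* Hypothetical helper: write "th.<rvv>" into OUT and return true when the
   RVV 1.0 mnemonic is assumed to map directly onto XTheadVector; return
   false for the unsupported forms or if OUT is too small.  */
static bool
th_vector_mnemonic (const char *rvv, char *out, size_t out_size)
{
  assert (rvv != NULL && out != NULL && out_size > 0);
  for (size_t i = 0; i < sizeof th_unsupported / sizeof th_unsupported[0]; i++)
    if (strcmp (rvv, th_unsupported[i]) == 0)
      return false;
  int n = snprintf (out, out_size, "th.%s", rvv);
  return n >= 0 && (size_t) n < out_size;
}
```

For example, `vadd.vv` becomes `th.vadd.vv`, while `vmv1r.v` is rejected so that the compiler must avoid generating it.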

gcc/ChangeLog:

* config.gcc:  Add files for XTheadVector intrinsics.
* config/riscv/autovec.md: Guard XTheadVector.
* config/riscv/riscv-c.cc: Add pragma for XTheadVector.
* config/riscv/riscv-string.cc (expand_block_move):
Guard XTheadVector.
* config/riscv/riscv-string.cc (vls_mode_valid_p): 
Avoid autovec.
* config/riscv/riscv-vector-builtins-shapes.cc (check_type):
(build_one): New function.
* config/riscv/riscv-vector-builtins.cc (DEF_RVV_FUNCTION):
(DEF_THEAD_RVV_FUNCTION): Add new macros.
(check_required_extensions):
(handle_pragma_vector):
* config/riscv/riscv-vector-builtins.h (RVV_REQUIRE_VECTOR):
(RVV_REQUIRE_XTHEADVECTOR):
Add RVV_REQUIRE_VECTOR and RVV_REQUIRE_XTHEADVECTOR.
(struct function_group_info):
* config/riscv/riscv-vector-switch.def (ENTRY):
Disable fractional mode for the XTheadVector extension.
(TUPLE_ENTRY): Likewise.
* config/riscv/riscv-vsetvl.cc: Add functions for xtheadvector.
* config/riscv/riscv.cc (riscv_v_ext_vls_mode_p):
Guard XTheadVector.
(riscv_v_adjust_bytesize): Likewise.
(riscv_preferred_simd_mode): Likewise.
(riscv_autovectorize_vector_modes): Likewise.
(riscv_vector_mode_supported_any_target_p): Likewise.
(TARGET_VECTOR_MODE_SUPPORTED_ANY_TARGET_P): Likewise.
* config/riscv/vector.md: Include thead-vector.md.
* config/riscv/riscv_th_vector.h: New file.
* config/riscv/thead-vector.md: New file.

gcc/testsuite/ChangeLog:

* gcc.target/riscv/rvv/base/pragma-1.c: Add XTheadVector.
* gcc.target/riscv/rvv/base/abi-1.c: Exclude XTheadVector.
* lib/target-supports.exp: Add target for XTheadVector.

Co-authored-by: Jin Ma 
Co-authored-by: Xianmiao Qu 
Co-authored-by: Christoph Müllner 
---
 gcc/config.gcc|   2 +-
 gcc/config/riscv/autovec.md   |   2 +-
 gcc/config/riscv/predicates.md|   4 +-
 gcc/config/riscv/riscv-c.cc   |   3 +-
 gcc/config/riscv/riscv-string.cc  |   3 +-
 gcc/config/riscv/riscv-v.cc   |   2 +-
 .../riscv/riscv-vector-builtins-bases.cc  |  48 --
 .../riscv/riscv-vector-builtins-shapes.cc |  23 +++
 gcc/config/riscv/riscv-vector-switch.def  | 150 +-
 gcc/config/riscv/riscv.cc |  20 ++-
 gcc/config/riscv/riscv_th_vector.h|  49 ++
 gcc/config/riscv/thead-vector.md  | 102 
 gcc/config/riscv/thead.cc |  23 ++-
 gcc/config/riscv/vector.md|  49 --
 .../gcc.target/riscv/rvv/base/abi-1.c |   2 +-
 .../gcc.target/riscv/rvv/base/pragma-1.c  |   2 +-
 gcc/testsuite/lib/target-supports.exp |  12 ++
 17 files changed, 383 insertions(+), 113 deletions(-)
 create mode 100644 gcc/config/riscv/riscv_th_vector.h
 create mode 100644 gcc/config/riscv/thead-vector.md

diff --git a/gcc/config.gcc b/gcc/config.gcc
index 7e583390024..047e4c02cf4 100644
--- a/gcc/config.gcc
+++ b/gcc/config.gcc
@@ -549,7 +549,7 @@ riscv*)
extra_objs="${extra_objs} riscv-vector-builtins.o 
riscv-vector-builtins-shapes.o riscv-vector-builtins-bases.o"
extra_objs="${extra_objs} thead.o riscv-target-attr.o"
d_target_objs="riscv-d.o"
-   extra_headers="riscv_vector.h"
+   extra_headers="riscv_vector.h riscv_th_vector.h"
target_gtfiles="$target_gtfiles 
\$(srcdir)/config/riscv/riscv-vector-builtins.cc"
target_gtfiles="$target_gtfiles 
\$(srcdir)/config/riscv/riscv-vector-builtins.h"
;;
diff --git a/gcc/config/riscv/autovec.md b/gcc/config/riscv/autovec.md
index 775eaa825b0..0477781cabe 100644
--- a/gcc/config/riscv/autovec.md
+++ b/gcc/config/riscv/autovec.md
@@ -2579,7 +2579,7 @@
   [(match_operand  0 "register_operand")
(match_operand  1 "memory_operand")
(match_operand:ANYI 2 "const_int_operand")]
-  "TARGET_VECTOR"
+  "TARGET_VECTOR && !TARGET_XTHEADVECTOR"
   {
 riscv_vector::expand_rawmemchr(mode, operands[0], operands[1],
   operands[2]);
diff --git a/gcc/config/riscv/predicates.md b/gcc/config/riscv/predicates.md
index 

[PATCH] config: delete unused CYG_AC_PATH_LIBERTY macro

2024-01-09 Thread Mike Frysinger
Nothing uses this, so delete it to avoid confusion.

config/ChangeLog:

* acinclude.m4 (CYG_AC_PATH_LIBERTY): Delete.
---
 config/acinclude.m4 | 22 --
 1 file changed, 22 deletions(-)

diff --git a/config/acinclude.m4 b/config/acinclude.m4
index 0abccafa0353..f18f0d6e8c77 100644
--- a/config/acinclude.m4
+++ b/config/acinclude.m4
@@ -238,28 +238,6 @@ fi
 AC_SUBST(BFDLIB)
 ])
 
-dnl 
-dnl Find the libiberty library. This defines many commonly used C
-dnl functions that exists in various states based on the underlying OS.
-AC_DEFUN([CYG_AC_PATH_LIBERTY], [
-AC_MSG_CHECKING(for the liberty library in the build tree)
-dirlist=".. ../../ ../../../ ../../../../ ../../../../../ ../../../../../../ 
../../../../../../.. ../../../../../../../.. ../../../../../../../../.. 
../../../../../../../../../.."
-AC_CACHE_VAL(ac_cv_c_liberty,[
-for i in $dirlist; do
-if test -f "$i/libiberty/Makefile" ; then
-   ac_cv_c_liberty=`(cd $i/libiberty; ${PWDCMD-pwd})`
-fi
-done
-])
-if test x"${ac_cv_c_liberty}" != x; then
-LIBERTY="-L${ac_cv_c_liberty}"
-AC_MSG_RESULT(${ac_cv_c_liberty})
-else
-AC_MSG_RESULT(none)
-fi
-AC_SUBST(LIBERTY)
-])
-
 dnl 
 dnl Find the opcodes library. This is used to do dissasemblies.
 AC_DEFUN([CYG_AC_PATH_OPCODES], [
-- 
2.43.0



[PATCH] Update documents for fcf-protection=

2024-01-09 Thread liuhongt
After r14-2692-g1c6231c05bdcca, the option is defined as an EnumSet, and
-fcf-protection=branch won't unset any other bits since they're in
different groups. So to override -fcf-protection, an explicit
-fcf-protection=none needs to be added first, followed by
-fcf-protection=XXX.
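The override rule can be modelled with a few bit operations. This is a hypothetical sketch of EnumSet-style semantics, not GCC's option-handling code: the group layout (`none` and `full` sharing a group that covers both bits, `branch` and `return` each in their own group) is an assumption chosen to reproduce the behaviour described above, and `check` is left out.

```c
#include <assert.h>
#include <stddef.h>
#include <string.h>

/* Protection bits; full is branch | return.  */
enum { CF_NONE = 0, CF_BRANCH = 1, CF_RETURN = 2, CF_FULL = 3 };

/* Each value belongs to a group; GROUP_MASK is the set of bits that
   group is allowed to clear.  Assumed layout, for illustration only.  */
struct cf_value
{
  const char *name;
  unsigned bits;
  unsigned group_mask;
};

static const struct cf_value cf_values[] = {
  { "none",   CF_NONE,   CF_FULL },   /* Shares a group with "full".  */
  { "full",   CF_FULL,   CF_FULL },
  { "branch", CF_BRANCH, CF_BRANCH }, /* Never clears the return bit.  */
  { "return", CF_RETURN, CF_RETURN }, /* Never clears the branch bit.  */
};

/* Apply one -fcf-protection=NAME to the accumulated LEVEL.  */
static unsigned
apply_cf_protection (unsigned level, const char *name)
{
  for (size_t i = 0; i < sizeof cf_values / sizeof cf_values[0]; i++)
    if (strcmp (cf_values[i].name, name) == 0)
      return (level & ~cf_values[i].group_mask) | cf_values[i].bits;
  assert (!"unknown -fcf-protection value");
  return level;
}
```

Under these assumptions, `full` followed by `branch` stays at branch|return, whereas `full`, `none`, `branch` ends with only branch protection.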

Bootstrapped and regtested on x86_64-pc-linux-gnu{-m32,}.
Ok for trunk?

gcc/ChangeLog:

* doc/invoke.texi (fcf-protection=): Update documents.
---
 gcc/doc/invoke.texi | 3 +++
 1 file changed, 3 insertions(+)

diff --git a/gcc/doc/invoke.texi b/gcc/doc/invoke.texi
index 68d1f364ac0..d1e6fafb98c 100644
--- a/gcc/doc/invoke.texi
+++ b/gcc/doc/invoke.texi
@@ -17734,6 +17734,9 @@ function.  The value @code{full} is an alias for 
specifying both
 @code{branch} and @code{return}. The value @code{none} turns off
 instrumentation.
 
+To override @option{-fcf-protection}, @option{-fcf-protection=none}
+needs to be explicitly added and then with @option{-fcf-protection=xxx}.
+
 The value @code{check} is used for the final link with link-time
 optimization (LTO).  An error is issued if LTO object files are
 compiled with different @option{-fcf-protection} values.  The
-- 
2.31.1



Re: [PATCH 0/5] RISC-V: Relax the -march string for accept any order

2024-01-09 Thread Fangrui Song
On Tue, Jan 9, 2024 at 4:59 PM Kito Cheng  wrote:
>
> Oops, I should leave more context here:
>
> Actually, we discussed this years ago and most people agreed, but I
> guess it just got missed, and the ISA string wasn't so terribly
> long at that time. However, the number of extensions has grown so
> fast in the last year that I think it's time to move this forward.
>
> Also we (SiFive) will send patches for clang/LLVM to relax that as well :)
>
> https://github.com/riscv-non-isa/riscv-toolchain-conventions/pull/14
>
> On Wed, Jan 10, 2024 at 2:31 AM Jeff Law  wrote:
>>
>>
>>
>> On 1/8/24 06:47, Kito Cheng wrote:
>> >
>> > Do you know how to build a ISA string with following extension?
>> > - g
>> > - c
>> > - zba
>> > - zbs
>> > - svnapot
>> > - zve64d
>> > - zvl128b
>> >
>> > Don't resort to trial and error with your GCC, and don't read the RISC-V ISA spec! OK, I
>> > believe it's impossible for most people. Even though I have worked on RISC-V for many
>> > years and remember most of the rules of the canonical order, it's still
>> > hard to get the order right quickly...
>> >
>> > So I think it's time to relax that for -march string inputs, since we
>> > have so many extensions today, but we still keep the canonicalization
>> > within the compiler, because we need it to handle multilib and it
>> > also makes it easier to compare different ISA strings.
>> >
>> > This patch series is broken into several parts:
>> > 1) Small refactor patch
>> > 2) Change the way of parsing ISA string.
>> > 3) Remove unused functions
>> > 4) Update test cases
>> > 5) Update document
>> Just because something is hard doesn't necessarily mean we should avoid it.
>>
>> A great example would be strict aliasing.  I'd bet that 90% of C/C++
>> developers would get something wrong in this space.  Similarly for
>> oddities of FP arithmetic.
>>
>> My biggest worry is consistency across various tools.  It's rather lame
>> if GCC were on an island by itself either in being too strict or too loose.
>>
>> So where are the other key tools in this regard?  Are we an outlier
>> right now or will this patch make us an outlier?
>>
>> jeff

If we had fewer extensions, ensuring a canonical order is better as a
code search of a fixed string will retrieve the relevant results, and
I'd wish that we did not lose the strictness.
Now that there are a hundred extensions, I agree that enforcing a
strict order has lost its goodness...


-- 
宋方睿
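To make the ordering problem concrete, here is a simplified sketch of what canonicalization involves: single-letter extensions first in a fixed order, then `z*` extensions grouped by the canonical position of their second letter and sorted alphabetically within a group, then `s*`, then `x*`. The order string and the helper names are assumptions for illustration; the real rules (implied extensions, version numbers, the expansion of `g`) are in the RISC-V ISA manual's naming-conventions chapter and are not modelled.

```c
#include <assert.h>
#include <stdio.h>
#include <stdlib.h>
#include <string.h>

/* Assumed (simplified) canonical order of single-letter extensions.  */
static const char std_order[] = "eigmafdqlcbkjtpvh";

/* Position of C in std_order; unknown letters sort last.  */
static int
std_rank (char c)
{
  const char *p = c ? strchr (std_order, c) : NULL;
  return p ? (int) (p - std_order) : (int) sizeof std_order;
}

/* 0: single letter, 1: z*, 2: s*, 3: x*, 4: anything else.  */
static int
ext_class (const char *ext)
{
  assert (ext[0] != '\0');
  if (ext[1] == '\0')
    return 0;
  switch (ext[0])
    {
    case 'z': return 1;
    case 's': return 2;
    case 'x': return 3;
    default: return 4;
    }
}

static int
compare_ext (const void *pa, const void *pb)
{
  const char *a = *(const char *const *) pa;
  const char *b = *(const char *const *) pb;
  int ca = ext_class (a), cb = ext_class (b);
  if (ca != cb)
    return ca - cb;
  if (ca == 0)
    return std_rank (a[0]) - std_rank (b[0]);
  if (ca == 1 && std_rank (a[1]) != std_rank (b[1]))
    return std_rank (a[1]) - std_rank (b[1]);
  return strcmp (a, b);
}

/* Sort EXTS in place and write "rv<XLEN>..." into OUT (truncated if
   OUT_SIZE is too small).  */
static void
canonical_march (int xlen, const char **exts, size_t n, char *out,
                 size_t out_size)
{
  qsort (exts, n, sizeof *exts, compare_ext);
  size_t len = (size_t) snprintf (out, out_size, "rv%d", xlen);
  for (size_t i = 0; i < n && len < out_size; i++)
    {
      const char *sep = ext_class (exts[i]) == 0 ? "" : "_";
      len += (size_t) snprintf (out + len, out_size - len, "%s%s", sep,
                                exts[i]);
    }
}
```

Fed the extension list from Kito's example in shuffled order, this sketch produces `rv64gc_zba_zbs_zve64d_zvl128b_svnapot`.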


[PATCH] RISC-V: Minor tweak dynamic cost model

2024-01-09 Thread Juzhe-Zhong
While working on the cost model, I noticed a case where the dynamic LMUL
cost model doesn't work well.

Before this patch:

foo:
        lui     a4,%hi(.LANCHOR0)
        li      a0,1953
        li      a1,63
        addi    a4,a4,%lo(.LANCHOR0)
        li      a3,64
        vsetvli a2,zero,e32,mf2,ta,ma
        vmv.v.x v5,a0
        vmv.v.x v4,a1
        vid.v   v3
.L2:
        vsetvli a5,a3,e32,mf2,ta,ma
        vadd.vi v2,v3,1
        vadd.vv v1,v3,v5
        mv      a2,a5
        vmacc.vv v1,v2,v4
        slli    a1,a5,2
        vse32.v v1,0(a4)
        sub     a3,a3,a5
        add     a4,a4,a1
        vsetvli a5,zero,e32,mf2,ta,ma
        vmv.v.x v1,a2
        vadd.vv v3,v3,v1
        bne     a3,zero,.L2
        li      a0,0
        ret

Unexpected: it uses a scalable vector with LMUL = MF2, which wastes
computation resources.

Ideally, we should use LMUL = M8 VLS modes.

The root cause is that the dynamic LMUL heuristic dominates the VLS heuristic.
This patch adapts the cost model heuristic accordingly.

After this patch:

foo:
        lui     a4,%hi(.LANCHOR0)
        addi    a4,a4,%lo(.LANCHOR0)
        li      a3,4096
        li      a5,32
        li      a1,2016
        addi    a2,a4,128
        addiw   a3,a3,-32
        vsetvli zero,a5,e32,m8,ta,ma
        li      a0,0
        vid.v   v8
        vsll.vi v8,v8,6
        vadd.vx v16,v8,a1
        vadd.vx v8,v8,a3
        vse32.v v16,0(a4)
        vse32.v v8,0(a2)
        ret
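The gap between the two code sequences follows from the RVV relation VLMAX = LMUL * VLEN / SEW. Assuming VLEN = 128 (the minimum for the V extension; the test's actual configuration isn't stated here), e32 at LMUL = MF2 gives 2 elements per `vsetvli`, so the 64-element loop before the patch needs 32 iterations, while e32 at LMUL = M8 gives 32 elements, consistent with the two 32-element `vse32.v` stores (the second at byte offset 128) after the patch. The helper names below are hypothetical.

```c
#include <assert.h>

/* VLMAX = LMUL * VLEN / SEW (RVV 1.0).  LMUL is passed as the fraction
   LMUL_NUM / LMUL_DEN so that fractional LMUL such as MF2 (1/2) works.  */
static unsigned
rvv_vlmax (unsigned vlen_bits, unsigned sew_bits, unsigned lmul_num,
           unsigned lmul_den)
{
  assert (sew_bits != 0 && lmul_den != 0);
  return vlen_bits * lmul_num / (sew_bits * lmul_den);
}

/* Number of strip-mined iterations needed to process N elements.  */
static unsigned
rvv_iterations (unsigned n, unsigned vlmax)
{
  assert (vlmax != 0);
  return (n + vlmax - 1) / vlmax;
}
```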

Tested on both RV32 and RV64 with no regressions.

Ok for trunk ?

gcc/ChangeLog:

* config/riscv/riscv-vector-costs.cc (costs::better_main_loop_than_p): 
Minor tweak.

gcc/testsuite/ChangeLog:

* gcc.dg/vect/costmodel/riscv/rvv/vla_vs_vls-10.c: Fix test.
* gcc.dg/vect/costmodel/riscv/rvv/vla_vs_vls-11.c: Ditto.
* gcc.dg/vect/costmodel/riscv/rvv/vla_vs_vls-12.c: Ditto.

---
 gcc/config/riscv/riscv-vector-costs.cc   | 3 ++-
 .../gcc.dg/vect/costmodel/riscv/rvv/vla_vs_vls-10.c  | 5 ++---
 .../gcc.dg/vect/costmodel/riscv/rvv/vla_vs_vls-11.c  | 5 ++---
 .../gcc.dg/vect/costmodel/riscv/rvv/vla_vs_vls-12.c  | 2 +-
 4 files changed, 7 insertions(+), 8 deletions(-)

diff --git a/gcc/config/riscv/riscv-vector-costs.cc 
b/gcc/config/riscv/riscv-vector-costs.cc
index f4a1a789f23..e53f4a186f3 100644
--- a/gcc/config/riscv/riscv-vector-costs.cc
+++ b/gcc/config/riscv/riscv-vector-costs.cc
@@ -994,7 +994,8 @@ costs::better_main_loop_than_p (const vector_costs 
*uncast_other) const
 vect_vf_for_cost (other_loop_vinfo));
 
   /* Apply the unrolling heuristic described above m_unrolled_vls_niters.  */
-  if (bool (m_unrolled_vls_stmts) != bool (other->m_unrolled_vls_stmts))
+  if (bool (m_unrolled_vls_stmts) != bool (other->m_unrolled_vls_stmts)
+  && m_cost_type != other->m_cost_type)
 {
   bool this_prefer_unrolled = this->prefer_unrolled_loop ();
   bool other_prefer_unrolled = other->prefer_unrolled_loop ();
diff --git a/gcc/testsuite/gcc.dg/vect/costmodel/riscv/rvv/vla_vs_vls-10.c 
b/gcc/testsuite/gcc.dg/vect/costmodel/riscv/rvv/vla_vs_vls-10.c
index 3ddffa37fe4..89a6c678960 100644
--- a/gcc/testsuite/gcc.dg/vect/costmodel/riscv/rvv/vla_vs_vls-10.c
+++ b/gcc/testsuite/gcc.dg/vect/costmodel/riscv/rvv/vla_vs_vls-10.c
@@ -3,7 +3,7 @@
 
 #include 
 
-#define N 40
+#define N 48
 
 int a[N];
 
@@ -22,7 +22,6 @@ foo (){
   return 0;
 }
 
-/* { dg-final { scan-assembler-times 
{vsetivli\s+zero,\s*8,\s*e32,\s*m2,\s*t[au],\s*m[au]} 1 } } */
 /* { dg-final { scan-assembler-times 
{vsetivli\s+zero,\s*16,\s*e32,\s*m4,\s*t[au],\s*m[au]} 1 } } */
-/* { dg-final { scan-assembler-times {vsetivli} 2 } } */
+/* { dg-final { scan-assembler-times {vsetivli} 1 } } */
 /* { dg-final { scan-assembler-not {vsetvli} } } */
diff --git a/gcc/testsuite/gcc.dg/vect/costmodel/riscv/rvv/vla_vs_vls-11.c 
b/gcc/testsuite/gcc.dg/vect/costmodel/riscv/rvv/vla_vs_vls-11.c
index 7625ec5c4b1..86732ef2ce5 100644
--- a/gcc/testsuite/gcc.dg/vect/costmodel/riscv/rvv/vla_vs_vls-11.c
+++ b/gcc/testsuite/gcc.dg/vect/costmodel/riscv/rvv/vla_vs_vls-11.c
@@ -3,7 +3,7 @@
 
 #include 
 
-#define N 40
+#define N 64
 
 int a[N];
 
@@ -22,7 +22,6 @@ foo (){
   return 0;
 }
 
-/* { dg-final { scan-assembler-times 
{vsetivli\s+zero,\s*8,\s*e32,\s*m2,\s*t[au],\s*m[au]} 1 } } */
 /* { dg-final { scan-assembler-times 
{vsetvli\s+zero,\s*[a-x0-9]+,\s*e32,\s*m8,\s*t[au],\s*m[au]} 1 } } */
-/* { dg-final { scan-assembler-times {vsetivli} 1 } } */
+/* { dg-final { scan-assembler-not {vsetivli} } } */
 /* { dg-final { scan-assembler-times {vsetvli} 1 } } */
diff --git a/gcc/testsuite/gcc.dg/vect/costmodel/riscv/rvv/vla_vs_vls-12.c 
b/gcc/testsuite/gcc.dg/vect/costmodel/riscv/rvv/vla_vs_vls-12.c
index 7625ec5c4b1..505c4cd2c40 100644
--- a/gcc/testsuite/gcc.dg/vect/costmodel/riscv/rvv/vla_vs_vls-12.c
+++ b/gcc/testsuite/gcc.dg/vect/costmodel/riscv/rvv/vla_vs_vls-12.c
@@ -1,5 +1,5 @@
 /* { dg-do compile } */
-/* { dg-options "-march=rv64gcv -mabi=lp64d -O3 --param=riscv-autovec-lmul=m8 

Re: [PATCH v3] LoongArch: testsuite:Added support for vector object detection.

2024-01-09 Thread chenglulu



On 2024/1/10 at 3:51 AM, Andreas Schwab wrote:

gcc: gcc.dg/vect/vect-outer-4a-big-array.c -flto -ffat-lto-objects: error executing dg-final: unknown effective target keyword `loongarch*-*-*'
gcc: gcc.dg/vect/vect-outer-4a-big-array.c: error executing dg-final: unknown effective target keyword `loongarch*-*-*'
gcc: gcc.dg/vect/vect-outer-4a.c -flto -ffat-lto-objects: error executing dg-final: unknown effective target keyword `loongarch*-*-*'
gcc: gcc.dg/vect/vect-outer-4a.c: error executing dg-final: unknown effective target keyword `loongarch*-*-*'
gcc: gcc.dg/vect/vect-outer-4b-big-array.c -flto -ffat-lto-objects: error executing dg-final: unknown effective target keyword `loongarch*-*-*'
gcc: gcc.dg/vect/vect-outer-4b-big-array.c: error executing dg-final: unknown effective target keyword `loongarch*-*-*'
gcc: gcc.dg/vect/vect-outer-4b.c -flto -ffat-lto-objects: error executing dg-final: unknown effective target keyword `loongarch*-*-*'
gcc: gcc.dg/vect/vect-outer-4b.c: error executing dg-final: unknown effective target keyword `loongarch*-*-*'


Sorry, we'll fix it as soon as possible.



[PATCH, rs6000] Refactor expand_compare_loop and split it to two functions

2024-01-09 Thread HAO CHEN GUI
Hi,
  This patch refactors the function expand_compare_loop and splits it into
two functions: one for fixed length and another for variable length.
These two functions share some common low-level helper functions.

  Besides the above changes, the patch also does the following:
1. Don't generate the load-and-compare loop when max_bytes is less than
the loop bytes.
2. Remove do_load_mask_compare as it's no longer needed. All sub-targets
entering the function should support efficient overlapping load and
compare.
3. Implement a variable-length overlapping load and compare for the
case where the remaining bytes are fewer than the loop bytes in a
variable-length compare. The 4K boundary test and the one-byte
load-and-compare loop are removed as they're no longer needed.
4. Remove the code for "bytes > max_bytes" with fixed length, as the
case is already excluded by pre-checking.
5. Remove the run-time code for "bytes > max_bytes" with variable length,
as it should jump to the library call at the beginning.
6. Enhance do_overlap_load_compare to avoid an overlapping load and compare
when the remaining bytes can be loaded and compared by a smaller unit.
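The overlapping load-and-compare idea in items 3 and 6 can be illustrated in portable C. This is a sketch of the technique only, not the rs6000 expander: the 8-byte unit, the function names and the choice of which remainders count as "smaller units" (1, 2 or 4 bytes) are assumptions. Full 8-byte units are compared first; a remainder that fits a single smaller unit is compared with it, a block shorter than 8 bytes is compared in one go, and otherwise the last 8 bytes are reloaded so the tail overlaps bytes already known to be equal.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>
#include <string.h>

/* Compare N (<= 8) bytes as one "load" per side; on a mismatch, find
   the first differing byte to return a memcmp-style sign.  */
static int
compare_unit (const unsigned char *a, const unsigned char *b, size_t n)
{
  uint64_t wa = 0, wb = 0;
  assert (n > 0 && n <= sizeof wa);
  memcpy (&wa, a, n);
  memcpy (&wb, b, n);
  if (wa == wb)
    return 0;
  for (size_t i = 0; i < n; i++)
    if (a[i] != b[i])
      return a[i] < b[i] ? -1 : 1;
  return 0;
}

/* Sketch of a block compare using 8-byte units and an overlapping tail.  */
static int
block_compare (const void *p1, const void *p2, size_t len)
{
  const unsigned char *a = p1, *b = p2;
  size_t off = 0;
  int r;

  for (; off + 8 <= len; off += 8)
    if ((r = compare_unit (a + off, b + off, 8)) != 0)
      return r;

  size_t rem = len - off;
  if (rem == 0)
    return 0;
  /* Remainder fits one smaller unit, or the block is shorter than
     8 bytes: no overlap needed.  */
  if (rem == 1 || rem == 2 || rem == 4 || len < 8)
    return compare_unit (a + off, b + off, rem);
  /* Otherwise reload the last 8 bytes; the overlapped prefix is already
     known to be equal, so the first mismatch found is the real one.  */
  return compare_unit (a + len - 8, b + len - 8, 8);
}
```

For an 11-byte block, for example, the sketch compares bytes 0 to 7 and then bytes 3 to 10, rather than falling back to a byte-by-byte loop for the last three bytes.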

  Bootstrapped and tested on x86 and powerpc64-linux BE and LE with no
regressions. Is this OK for trunk?

Thanks
Gui Haochen


ChangeLog
rs6000: Refactor expand_compare_loop and split it to two functions

The original expand_compare_loop has complicated logic as it's
designed for both fixed and variable length.  This patch splits it into
two functions and makes these two functions share common helper functions.
Also the 4K boundary test and the corresponding one-byte load and compare
are replaced by a variable-length overlapping load and compare.  The
do_load_mask_compare is removed as all sub-targets entering the function
have efficient overlapping load and compare, so a mask load is no longer needed.

gcc/
* config/rs6000/rs6000-string.cc (do_isel): Remove.
(do_load_mask_compare): Remove.
(do_reg_compare): New.
(do_load_and_compare): New.
(do_overlap_load_compare): Do load and compare with a smaller unit
rather than overlapping load and compare when the remaining bytes can
be handled by one instruction.
(expand_compare_loop): Remove.
(get_max_inline_loop_bytes): New.
(do_load_compare_rest_of_loop): New.
(generate_6432_conversion): Make it a static function and move
it ahead of gen_diff_handle.
(gen_diff_handle): New.
(gen_load_compare_loop): New.
(gen_library_call): New.
(expand_compare_with_fixed_length): New.
(expand_compare_with_variable_length): New.
(expand_block_compare): Call expand_compare_with_variable_length
to expand block compare for variable length.  Call
expand_compare_with_fixed_length to expand block compare loop for
fixed length.

gcc/testsuite/
* gcc.target/powerpc/block-cmp-5.c: New.
* gcc.target/powerpc/block-cmp-6.c: New.
* gcc.target/powerpc/block-cmp-7.c: New.

patch.diff
diff --git a/gcc/config/rs6000/rs6000-string.cc 
b/gcc/config/rs6000/rs6000-string.cc
index f707bb2727e..018b87f2501 100644
--- a/gcc/config/rs6000/rs6000-string.cc
+++ b/gcc/config/rs6000/rs6000-string.cc
@@ -404,21 +404,6 @@ do_ifelse (machine_mode cmpmode, rtx_code comparison,
   LABEL_NUSES (true_label) += 1;
 }

-/* Emit an isel of the proper mode for DEST.
-
-   DEST is the isel destination register.
-   SRC1 is the isel source if CR is true.
-   SRC2 is the isel source if CR is false.
-   CR is the condition for the isel.  */
-static void
-do_isel (rtx dest, rtx cmp, rtx src_t, rtx src_f, rtx cr)
-{
-  if (GET_MODE (dest) == DImode)
-emit_insn (gen_isel_cc_di (dest, cmp, src_t, src_f, cr));
-  else
-emit_insn (gen_isel_cc_si (dest, cmp, src_t, src_f, cr));
-}
-
 /* Emit a subtract of the proper mode for DEST.

DEST is the destination register for the subtract.
@@ -499,65 +484,61 @@ do_rotl3 (rtx dest, rtx src1, rtx src2)
 emit_insn (gen_rotlsi3 (dest, src1, src2));
 }

-/* Generate rtl for a load, shift, and compare of less than a full word.
-
-   LOAD_MODE is the machine mode for the loads.
-   DIFF is the reg for the difference.
-   CMP_REM is the reg containing the remaining bytes to compare.
-   DCOND is the CCUNS reg for the compare if we are doing P9 code with setb.
-   SRC1_ADDR is the first source address.
-   SRC2_ADDR is the second source address.
-   ORIG_SRC1 is the original first source block's address rtx.
-   ORIG_SRC2 is the original second source block's address rtx.  */
+/* Do the compare for two registers.  */
 static void
-do_load_mask_compare (const machine_mode load_mode, rtx diff, rtx cmp_rem, rtx 
dcond,
- rtx src1_addr, rtx src2_addr, rtx orig_src1, rtx 
orig_src2)
+do_reg_compare (bool use_vec, rtx vec_result, rtx diff, rtx *dcond, rtx d1,
+   rtx d2)
 {
-  HOST_WIDE_INT load_mode_size = GET_MODE_SIZE (load_mode);
-  rtx shift_amount = gen_reg_rtx (word_mode);
-  rtx d1 = gen_reg_rtx 

[PATCH V2 3/4][RFC] RISC-V: Use default cost model for insn scheduling for tests affected in PR113249

2024-01-09 Thread Edwin Lu
Use default cost model scheduling on these test cases. All these tests
introduce scan dump failures with -mtune generic-ooo. Since the vector
cost models are the same across all three tunes, some of the tests
in PR113249 will be fixed with this patch series.

Unfortunately, 40 unique testsuite failures (scan dumps) will still be present.
I don't know how optimal the new output is compared to the old. Should I update
the testcase expected output to match the new scan dumps?

PR target/113249

gcc/testsuite/ChangeLog:

* g++.target/riscv/rvv/base/bug-1.C: use default tune scheduling
* gcc.target/riscv/rvv/autovec/reduc/reduc_call-2.c: ditto
* gcc.target/riscv/rvv/base/binop_vx_constraint-102.c: ditto
* gcc.target/riscv/rvv/base/binop_vx_constraint-108.c: ditto
* gcc.target/riscv/rvv/base/binop_vx_constraint-114.c: ditto
* gcc.target/riscv/rvv/base/binop_vx_constraint-119.c: ditto
* gcc.target/riscv/rvv/base/binop_vx_constraint-12.c: ditto
* gcc.target/riscv/rvv/base/binop_vx_constraint-16.c: ditto
* gcc.target/riscv/rvv/base/binop_vx_constraint-17.c: ditto
* gcc.target/riscv/rvv/base/binop_vx_constraint-19.c: ditto
* gcc.target/riscv/rvv/base/binop_vx_constraint-21.c: ditto
* gcc.target/riscv/rvv/base/binop_vx_constraint-23.c: ditto
* gcc.target/riscv/rvv/base/binop_vx_constraint-25.c: ditto
* gcc.target/riscv/rvv/base/binop_vx_constraint-27.c: ditto
* gcc.target/riscv/rvv/base/binop_vx_constraint-29.c: ditto
* gcc.target/riscv/rvv/base/binop_vx_constraint-31.c: ditto
* gcc.target/riscv/rvv/base/binop_vx_constraint-33.c: ditto
* gcc.target/riscv/rvv/base/binop_vx_constraint-35.c: ditto
* gcc.target/riscv/rvv/base/binop_vx_constraint-4.c: ditto
* gcc.target/riscv/rvv/base/binop_vx_constraint-40.c: ditto
* gcc.target/riscv/rvv/base/binop_vx_constraint-44.c: ditto
* gcc.target/riscv/rvv/base/binop_vx_constraint-50.c: ditto
* gcc.target/riscv/rvv/base/binop_vx_constraint-56.c: ditto
* gcc.target/riscv/rvv/base/binop_vx_constraint-62.c: ditto
* gcc.target/riscv/rvv/base/binop_vx_constraint-68.c: ditto
* gcc.target/riscv/rvv/base/binop_vx_constraint-74.c: ditto
* gcc.target/riscv/rvv/base/binop_vx_constraint-79.c: ditto
* gcc.target/riscv/rvv/base/binop_vx_constraint-8.c: ditto
* gcc.target/riscv/rvv/base/binop_vx_constraint-84.c: ditto
* gcc.target/riscv/rvv/base/binop_vx_constraint-90.c: ditto
* gcc.target/riscv/rvv/base/binop_vx_constraint-96.c: ditto
* gcc.target/riscv/rvv/base/float-point-dynamic-frm-30.c: ditto
* gcc.target/riscv/rvv/base/pr108185-1.c: ditto
* gcc.target/riscv/rvv/base/pr108185-2.c: ditto
* gcc.target/riscv/rvv/base/pr108185-3.c: ditto
* gcc.target/riscv/rvv/base/pr108185-4.c: ditto
* gcc.target/riscv/rvv/base/pr108185-5.c: ditto
* gcc.target/riscv/rvv/base/pr108185-6.c: ditto
* gcc.target/riscv/rvv/base/pr108185-7.c: ditto
* gcc.target/riscv/rvv/base/shift_vx_constraint-1.c: ditto
* gcc.target/riscv/rvv/vsetvl/pr111037-3.c: ditto
* gcc.target/riscv/rvv/vsetvl/vlmax_back_prop-28.c: ditto
* gcc.target/riscv/rvv/vsetvl/vlmax_back_prop-29.c: ditto
* gcc.target/riscv/rvv/vsetvl/vlmax_back_prop-32.c: ditto
* gcc.target/riscv/rvv/vsetvl/vlmax_back_prop-33.c: ditto
* gcc.target/riscv/rvv/vsetvl/vlmax_single_block-17.c: ditto
* gcc.target/riscv/rvv/vsetvl/vlmax_single_block-18.c: ditto
* gcc.target/riscv/rvv/vsetvl/vlmax_single_block-19.c: ditto
* gcc.target/riscv/rvv/vsetvl/vlmax_switch_vtype-10.c: ditto
* gcc.target/riscv/rvv/vsetvl/vlmax_switch_vtype-11.c: ditto
* gcc.target/riscv/rvv/vsetvl/vlmax_switch_vtype-12.c: ditto
* gcc.target/riscv/rvv/vsetvl/vlmax_switch_vtype-4.c: ditto
* gcc.target/riscv/rvv/vsetvl/vlmax_switch_vtype-5.c: ditto
* gcc.target/riscv/rvv/vsetvl/vlmax_switch_vtype-6.c: ditto
* gcc.target/riscv/rvv/vsetvl/vlmax_switch_vtype-7.c: ditto
* gcc.target/riscv/rvv/vsetvl/vlmax_switch_vtype-8.c: ditto
* gcc.target/riscv/rvv/vsetvl/vlmax_switch_vtype-9.c: ditto
* gfortran.dg/vect/vect-8.f90: ditto

Signed-off-by: Edwin Lu 
---
V2: 
- New patch
---
 gcc/testsuite/g++.target/riscv/rvv/base/bug-1.C | 2 ++
 gcc/testsuite/gcc.target/riscv/rvv/autovec/reduc/reduc_call-2.c | 2 ++
 .../gcc.target/riscv/rvv/base/binop_vx_constraint-102.c | 2 ++
 .../gcc.target/riscv/rvv/base/binop_vx_constraint-108.c | 2 ++
 .../gcc.target/riscv/rvv/base/binop_vx_constraint-114.c | 2 ++
 .../gcc.target/riscv/rvv/base/binop_vx_constraint-119.c | 2 ++
 .../gcc.target/riscv/rvv/base/binop_vx_constraint-12.c  | 2 ++
 .../gcc.target/riscv/rvv/base/binop_vx_constraint-16.c  | 2 ++
 

[PATCH V2 1/4][RFC] RISC-V: Add non-vector types to dfa pipelines

2024-01-09 Thread Edwin Lu
This patch adds non-vector related insn reservations and updates/creates
new insn reservations so all non-vector typed instructions have a reservation.

gcc/ChangeLog:

* config/riscv/generic-ooo.md (generic_ooo_sfb_alu): Add reservation
(generic_ooo_branch): ditto
* config/riscv/generic.md (generic_sfb_alu): ditto
(generic_fmul_half): ditto
* config/riscv/riscv.md: Remove cbo, pushpop, and rdfrm types
* config/riscv/sifive-7.md (sifive_7_hfma): Add reservation
(sifive_7_popcount): ditto
* config/riscv/vector.md: change rdfrm to fmove
* config/riscv/zc.md: change pushpop to load/store

Signed-off-by: Edwin Lu 
---
V2:
- Add insn reservations for HF fmul
- Remove/adjust insn types
---
 gcc/config/riscv/generic-ooo.md | 15 +-
 gcc/config/riscv/generic.md | 20 +--
 gcc/config/riscv/riscv.md   | 18 +++
 gcc/config/riscv/sifive-7.md| 17 +-
 gcc/config/riscv/vector.md  |  2 +-
 gcc/config/riscv/zc.md  | 96 -
 6 files changed, 102 insertions(+), 66 deletions(-)

diff --git a/gcc/config/riscv/generic-ooo.md b/gcc/config/riscv/generic-ooo.md
index 421a7bb929d..ef8cb96daf4 100644
--- a/gcc/config/riscv/generic-ooo.md
+++ b/gcc/config/riscv/generic-ooo.md
@@ -115,9 +115,20 @@ (define_insn_reservation "generic_ooo_vec_loadstore_seg" 10
 (define_insn_reservation "generic_ooo_alu" 1
   (and (eq_attr "tune" "generic_ooo")
(eq_attr "type" "unknown,const,arith,shift,slt,multi,auipc,nop,logical,\
-   move,bitmanip,min,max,minu,maxu,clz,ctz"))
+   move,bitmanip,rotate,min,max,minu,maxu,clz,ctz,atomic,\
+   condmove,mvpair,zicond"))
   "generic_ooo_issue,generic_ooo_ixu_alu")
 
+(define_insn_reservation "generic_ooo_sfb_alu" 2
+  (and (eq_attr "tune" "generic_ooo")
+   (eq_attr "type" "sfb_alu"))
+  "generic_ooo_issue,generic_ooo_ixu_alu")
+
+;; Branch instructions
+(define_insn_reservation "generic_ooo_branch" 1
+  (and (eq_attr "tune" "generic_ooo")
+   (eq_attr "type" "branch,jump,call,jalr,ret,trap"))
+  "generic_ooo_issue,generic_ooo_ixu_alu")
 
 ;; Float move, convert and compare.
 (define_insn_reservation "generic_ooo_float_move" 3
@@ -184,7 +195,7 @@ (define_insn_reservation "generic_ooo_popcount" 2
 (define_insn_reservation "generic_ooo_vec_alu" 3
   (and (eq_attr "tune" "generic_ooo")
(eq_attr "type" 
"vialu,viwalu,vext,vicalu,vshift,vnshift,viminmax,vicmp,\
-   vimov,vsalu,vaalu,vsshift,vnclip,vmov,vfmov"))
+   vimov,vsalu,vaalu,vsshift,vnclip,vmov,vfmov,vector"))
   "generic_ooo_vxu_issue,generic_ooo_vxu_alu")
 
 ;; Vector float comparison, conversion etc.
diff --git a/gcc/config/riscv/generic.md b/gcc/config/riscv/generic.md
index b99ae345bb3..45986cfea89 100644
--- a/gcc/config/riscv/generic.md
+++ b/gcc/config/riscv/generic.md
@@ -27,7 +27,9 @@ (define_cpu_unit "fdivsqrt" "pipe0")
 
 (define_insn_reservation "generic_alu" 1
   (and (eq_attr "tune" "generic")
-   (eq_attr "type" 
"unknown,const,arith,shift,slt,multi,auipc,nop,logical,move,bitmanip,min,max,minu,maxu,clz,ctz,cpop"))
+   (eq_attr "type" "unknown,const,arith,shift,slt,multi,auipc,nop,logical,\
+   move,bitmanip,min,max,minu,maxu,clz,ctz,rotate,atomic,\
+   condmove,crypto,mvpair,zicond"))
   "alu")
 
 (define_insn_reservation "generic_load" 3
@@ -47,12 +49,17 @@ (define_insn_reservation "generic_xfer" 3
 
 (define_insn_reservation "generic_branch" 1
   (and (eq_attr "tune" "generic")
-   (eq_attr "type" "branch,jump,call,jalr"))
+   (eq_attr "type" "branch,jump,call,jalr,ret,trap"))
+  "alu")
+
+(define_insn_reservation "generic_sfb_alu" 2
+  (and (eq_attr "tune" "generic")
+   (eq_attr "type" "sfb_alu"))
   "alu")
 
 (define_insn_reservation "generic_imul" 10
   (and (eq_attr "tune" "generic")
-   (eq_attr "type" "imul,clmul"))
+   (eq_attr "type" "imul,clmul,cpop"))
   "imuldiv*10")
 
 (define_insn_reservation "generic_idivsi" 34
@@ -67,6 +74,12 @@ (define_insn_reservation "generic_idivdi" 66
(eq_attr "mode" "DI")))
   "imuldiv*66")
 
+(define_insn_reservation "generic_fmul_half" 5
+  (and (eq_attr "tune" "generic")
+   (and (eq_attr "type" "fadd,fmul,fmadd")
+   (eq_attr "mode" "HF")))
+  "alu")
+
 (define_insn_reservation "generic_fmul_single" 5
   (and (eq_attr "tune" "generic")
(and (eq_attr "type" "fadd,fmul,fmadd")
@@ -88,3 +101,4 @@ (define_insn_reservation "generic_fsqrt" 25
   (and (eq_attr "tune" "generic")
(eq_attr "type" "fsqrt"))
   "fdivsqrt*25")
+
diff --git a/gcc/config/riscv/riscv.md b/gcc/config/riscv/riscv.md
index 84212430dc0..afa15c433d0 100644
--- a/gcc/config/riscv/riscv.md
+++ b/gcc/config/riscv/riscv.md
@@ -322,9 +322,7 @@ (define_attr "ext_enabled" "no,yes"
 ;; rotate   rotation instructions
 ;; atomic   atomic instructions
 ;; condmoveconditional 

[PATCH V2 4/4][RFC] RISC-V: Enable assert for insn_has_dfa_reservation

2024-01-09 Thread Edwin Lu
Enable the assert that every typed instruction is associated with a
DFA reservation.

gcc/ChangeLog:

* config/riscv/riscv.cc (riscv_sched_variable_issue): Enable assert.

Signed-off-by: Edwin Lu 
---
V2:
- No changes
---
 gcc/config/riscv/riscv.cc | 6 --
 1 file changed, 4 insertions(+), 2 deletions(-)

diff --git a/gcc/config/riscv/riscv.cc b/gcc/config/riscv/riscv.cc
index 32183d63180..e275fcc2245 100644
--- a/gcc/config/riscv/riscv.cc
+++ b/gcc/config/riscv/riscv.cc
@@ -8150,9 +8150,11 @@ riscv_sched_variable_issue (FILE *, int, rtx_insn *insn, int more)
 
   /* If we ever encounter an insn without an insn reservation, trip
  an assert so we can find and fix this problem.  */
-#if 0
+  if (! insn_has_dfa_reservation_p (insn)) {
+print_rtl(stderr, insn);
+fprintf(stderr, "%d", get_attr_type (insn));
+  }
   gcc_assert (insn_has_dfa_reservation_p (insn));
-#endif
 
   return more - 1;
 }
-- 
2.34.1



[PATCH V2 0/4][RFC] RISC-V: Associate typed insns to dfa reservation

2024-01-09 Thread Edwin Lu
This series is a prototype for adding all typed instructions to a DFA
scheduling pipeline.

This is what I currently have for cleaning up the cost models. As expected,
adding the vector insns to the DFA pipelines changes the expected output of
many test cases. Should I update the expected output of those test cases to
match the new cost model? I'm not sure which codegen is better. Please let me
know if I should, and I'll add a patch adjusting the expected testcase output.
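To make the goal of the series concrete, here is a small C++ sketch of the invariant that the enabled assert checks: every insn type that reaches the scheduler must be matched by some reservation for the active tune. It is a toy model, not GCC code; `Reservation`, `has_reservation` (a stand-in for `insn_has_dfa_reservation_p`) and `uncovered_types` are hypothetical names, and the type lists used with it are borrowed from the generic.md hunks in this series.

```cpp
#include <algorithm>
#include <cassert>
#include <set>
#include <string>
#include <vector>

// Toy model of a DFA reservation: the tune it applies to and the insn
// "type" attribute values it matches, loosely mirroring the
// (eq_attr "tune" ...) and (eq_attr "type" ...) tests in generic.md.
struct Reservation {
  std::string name;
  std::string tune;
  std::set<std::string> types;
};

// Stand-in for insn_has_dfa_reservation_p: true if any reservation for
// this tune lists the given type.
bool has_reservation(const std::vector<Reservation>& table,
                     const std::string& tune, const std::string& type) {
  return std::any_of(table.begin(), table.end(),
                     [&](const Reservation& r) {
                       return r.tune == tune && r.types.count(type) != 0;
                     });
}

// Every type in all_types that would trip the assert for this tune.
std::vector<std::string> uncovered_types(
    const std::vector<Reservation>& table, const std::string& tune,
    const std::vector<std::string>& all_types) {
  std::vector<std::string> missing;
  for (const auto& t : all_types)
    if (!has_reservation(table, tune, t))
      missing.push_back(t);
  return missing;
}
```

For instance, with only the `generic_branch`, `generic_sfb_alu` and `generic_imul` reservations from generic.md loaded, `uncovered_types` for the `generic` tune reports `vialu` but not `ret`, `sfb_alu` or `cpop`.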

Edwin Lu (4):
  RISC-V: Add non-vector types to dfa pipelines
  RISC-V: Add vector related reservations
  RISC-V: Use default cost model for insn scheduling for tests affected
in PR113249
  RISC-V: Enable assert for insn_has_dfa_reservation

---
V2:
- Update non-vector insn types and add new pipelines
- Add -fno-schedule-insns -fno-schedule-insns2 to some test cases
---

 gcc/config/riscv/generic-ooo.md   |  40 -
 gcc/config/riscv/generic.md   | 163 +-
 gcc/config/riscv/riscv.cc |   6 +-
 gcc/config/riscv/riscv.md |  18 +-
 gcc/config/riscv/sifive-7.md  | 161 -
 gcc/config/riscv/vector.md|   2 +-
 gcc/config/riscv/zc.md|  96 +--
 .../g++.target/riscv/rvv/base/bug-1.C |   2 +
 .../riscv/rvv/autovec/reduc/reduc_call-2.c|   2 +
 .../riscv/rvv/base/binop_vx_constraint-102.c  |   2 +
 .../riscv/rvv/base/binop_vx_constraint-108.c  |   2 +
 .../riscv/rvv/base/binop_vx_constraint-114.c  |   2 +
 .../riscv/rvv/base/binop_vx_constraint-119.c  |   2 +
 .../riscv/rvv/base/binop_vx_constraint-12.c   |   2 +
 .../riscv/rvv/base/binop_vx_constraint-16.c   |   2 +
 .../riscv/rvv/base/binop_vx_constraint-17.c   |   2 +
 .../riscv/rvv/base/binop_vx_constraint-19.c   |   2 +
 .../riscv/rvv/base/binop_vx_constraint-21.c   |   2 +
 .../riscv/rvv/base/binop_vx_constraint-23.c   |   2 +
 .../riscv/rvv/base/binop_vx_constraint-25.c   |   2 +
 .../riscv/rvv/base/binop_vx_constraint-27.c   |   2 +
 .../riscv/rvv/base/binop_vx_constraint-29.c   |   2 +
 .../riscv/rvv/base/binop_vx_constraint-31.c   |   2 +
 .../riscv/rvv/base/binop_vx_constraint-33.c   |   2 +
 .../riscv/rvv/base/binop_vx_constraint-35.c   |   2 +
 .../riscv/rvv/base/binop_vx_constraint-4.c|   2 +
 .../riscv/rvv/base/binop_vx_constraint-40.c   |   2 +
 .../riscv/rvv/base/binop_vx_constraint-44.c   |   2 +
 .../riscv/rvv/base/binop_vx_constraint-50.c   |   2 +
 .../riscv/rvv/base/binop_vx_constraint-56.c   |   2 +
 .../riscv/rvv/base/binop_vx_constraint-62.c   |   2 +
 .../riscv/rvv/base/binop_vx_constraint-68.c   |   2 +
 .../riscv/rvv/base/binop_vx_constraint-74.c   |   2 +
 .../riscv/rvv/base/binop_vx_constraint-79.c   |   2 +
 .../riscv/rvv/base/binop_vx_constraint-8.c|   2 +
 .../riscv/rvv/base/binop_vx_constraint-84.c   |   2 +
 .../riscv/rvv/base/binop_vx_constraint-90.c   |   2 +
 .../riscv/rvv/base/binop_vx_constraint-96.c   |   2 +
 .../rvv/base/float-point-dynamic-frm-30.c |   2 +
 .../gcc.target/riscv/rvv/base/pr108185-1.c|   2 +
 .../gcc.target/riscv/rvv/base/pr108185-2.c|   2 +
 .../gcc.target/riscv/rvv/base/pr108185-3.c|   2 +
 .../gcc.target/riscv/rvv/base/pr108185-4.c|   2 +
 .../gcc.target/riscv/rvv/base/pr108185-5.c|   2 +
 .../gcc.target/riscv/rvv/base/pr108185-6.c|   2 +
 .../gcc.target/riscv/rvv/base/pr108185-7.c|   2 +
 .../riscv/rvv/base/shift_vx_constraint-1.c|   2 +
 .../gcc.target/riscv/rvv/vsetvl/pr111037-3.c  |   2 +
 .../riscv/rvv/vsetvl/vlmax_back_prop-28.c |   2 +
 .../riscv/rvv/vsetvl/vlmax_back_prop-29.c |   2 +
 .../riscv/rvv/vsetvl/vlmax_back_prop-32.c |   2 +
 .../riscv/rvv/vsetvl/vlmax_back_prop-33.c |   2 +
 .../riscv/rvv/vsetvl/vlmax_single_block-17.c  |   2 +
 .../riscv/rvv/vsetvl/vlmax_single_block-18.c  |   2 +
 .../riscv/rvv/vsetvl/vlmax_single_block-19.c  |   2 +
 .../riscv/rvv/vsetvl/vlmax_switch_vtype-10.c  |   2 +
 .../riscv/rvv/vsetvl/vlmax_switch_vtype-11.c  |   2 +
 .../riscv/rvv/vsetvl/vlmax_switch_vtype-12.c  |   2 +
 .../riscv/rvv/vsetvl/vlmax_switch_vtype-4.c   |   2 +
 .../riscv/rvv/vsetvl/vlmax_switch_vtype-5.c   |   2 +
 .../riscv/rvv/vsetvl/vlmax_switch_vtype-6.c   |   2 +
 .../riscv/rvv/vsetvl/vlmax_switch_vtype-7.c   |   2 +
 .../riscv/rvv/vsetvl/vlmax_switch_vtype-8.c   |   2 +
 .../riscv/rvv/vsetvl/vlmax_switch_vtype-9.c   |   2 +
 gcc/testsuite/gfortran.dg/vect/vect-8.f90 |   2 +
 65 files changed, 532 insertions(+), 70 deletions(-)

-- 
2.34.1



[PATCH V2 2/4][RFC] RISC-V: Add vector related reservations

2024-01-09 Thread Edwin Lu
This patch copies the vector reservations from generic-ooo.md and
inserts them into generic.md and sifive-7.md. It also creates new
reservations for the vector crypto instructions.

gcc/ChangeLog:

* config/riscv/generic-ooo.md (generic_ooo_crypto_aes): Create reservation.
(generic_ooo_crypto_sha): ditto
(generic_ooo_crypto_sm): ditto
(generic_ooo_vec_vesetvl): ditto
(generic_ooo_vec_vsetvl): ditto
* config/riscv/generic.md (pipe0): ditto
(generic_vec_load): ditto
(generic_vec_store): ditto
(generic_vec_loadstore_seg): ditto
(generic_vec_alu): ditto
(generic_vec_fcmp): ditto
(generic_vec_imul): ditto
(generic_vec_fadd): ditto
(generic_vec_fmul): ditto
(generic_crypto): ditto
(generic_crypto_aes): ditto
(generic_crypto_sha): ditto
(generic_crypto_sm): ditto
(generic_perm): ditto
(generic_vec_reduction): ditto
(generic_vec_ordered_reduction): ditto
(generic_vec_idiv): ditto
(generic_vec_float_divsqrt): ditto
(generic_vec_mask): ditto
(generic_vec_vesetvl): ditto
(generic_vec_setrm): ditto
(generic_vec_readlen): ditto
* config/riscv/sifive-7.md (sifive_7): ditto
(sifive_7_vec_load): ditto
(sifive_7_vec_store): ditto
(sifive_7_vec_loadstore_seg): ditto
(sifive_7_vec_alu): ditto
(sifive_7_vec_fcmp): ditto
(sifive_7_vec_imul): ditto
(sifive_7_vec_fadd): ditto
(sifive_7_vec_fmul): ditto
(sifive_7_crypto): ditto
(sifive_7_crypto_aes): ditto
(sifive_7_crypto_sha): ditto
(sifive_7_crypto_sm): ditto
(sifive_7_perm): ditto
(sifive_7_vec_reduction): ditto
(sifive_7_vec_ordered_reduction): ditto
(sifive_7_vec_idiv): ditto
(sifive_7_vec_float_divsqrt): ditto
(sifive_7_vec_mask): ditto
(sifive_7_vec_vesetvl): ditto
(sifive_7_vec_setrm): ditto
(sifive_7_vec_readlen): ditto

Signed-off-by: Edwin Lu 
Co-authored-by: Robin Dapp 
---
V2:
- Remove unnecessary syntax changes in generic-ooo
- Add new vector crypto reservations and types to
  pipelines
---
 gcc/config/riscv/generic-ooo.md |  27 +-
 gcc/config/riscv/generic.md | 143 +++
 gcc/config/riscv/sifive-7.md| 144 
 3 files changed, 311 insertions(+), 3 deletions(-)

diff --git a/gcc/config/riscv/generic-ooo.md b/gcc/config/riscv/generic-ooo.md
index ef8cb96daf4..fb5f34c0ef2 100644
--- a/gcc/config/riscv/generic-ooo.md
+++ b/gcc/config/riscv/generic-ooo.md
@@ -195,7 +195,8 @@ (define_insn_reservation "generic_ooo_popcount" 2
 (define_insn_reservation "generic_ooo_vec_alu" 3
   (and (eq_attr "tune" "generic_ooo")
    (eq_attr "type" "vialu,viwalu,vext,vicalu,vshift,vnshift,viminmax,vicmp,\
-   vimov,vsalu,vaalu,vsshift,vnclip,vmov,vfmov,vector"))
+   vimov,vsalu,vaalu,vsshift,vnclip,vmov,vfmov,vector,\
+   vandn,vbrev,vbrev8,vrev8,vclz,vctz,vrol,vror,vwsll"))
   "generic_ooo_vxu_issue,generic_ooo_vxu_alu")
 
 ;; Vector float comparison, conversion etc.
@@ -209,7 +210,8 @@ (define_insn_reservation "generic_ooo_vec_fcmp" 3
 ;; Vector integer multiplication.
 (define_insn_reservation "generic_ooo_vec_imul" 4
   (and (eq_attr "tune" "generic_ooo")
-   (eq_attr "type" "vimul,viwmul,vimuladd,viwmuladd,vsmul"))
+   (eq_attr "type" "vimul,viwmul,vimuladd,viwmuladd,vsmul,vclmul,vclmulh,\
+   vghsh,vgmul"))
   "generic_ooo_vxu_issue,generic_ooo_vxu_alu")
 
 ;; Vector float addition.
@@ -230,6 +232,25 @@ (define_insn_reservation "generic_ooo_crypto" 4
(eq_attr "type" "crypto"))
   "generic_ooo_vxu_issue,generic_ooo_vxu_alu")
 
+;; Vector crypto, AES
+(define_insn_reservation "generic_ooo_crypto_aes" 4
+  (and (eq_attr "tune" "generic_ooo")
+   (eq_attr "type" "vaesef,vaesem,vaesdf,vaesdm,vaeskf1,vaeskf2,vaesz"))
+  "generic_ooo_vxu_issue,generic_ooo_vxu_alu")
+
+;; Vector crypto, sha
+(define_insn_reservation "generic_ooo_crypto_sha" 4
+  (and (eq_attr "tune" "generic_ooo")
+   (eq_attr "type" "vsha2ms,vsha2ch,vsha2cl"))
+  "generic_ooo_vxu_issue,generic_ooo_vxu_alu")
+
+;; Vector crypto, SM3/4
+(define_insn_reservation "generic_ooo_crypto_sm" 4
+  (and (eq_attr "tune" "generic_ooo")
+   (eq_attr "type" "vsm4k,vsm4r,vsm3me,vsm3c"))
+  "generic_ooo_vxu_issue,generic_ooo_vxu_alu")
+
+
 ;; Vector permute.
 (define_insn_reservation "generic_ooo_perm" 3
   (and (eq_attr "tune" "generic_ooo")
@@ -271,7 +292,7 @@ (define_insn_reservation "generic_ooo_vec_mask" 2
   "generic_ooo_vxu_issue,generic_ooo_vxu_alu")
 
 ;; Vector vsetvl.
-(define_insn_reservation "generic_ooo_vec_vesetvl" 1
+(define_insn_reservation "generic_ooo_vec_vsetvl" 1
   (and (eq_attr "tune" "generic_ooo")
(eq_attr "type" "vsetvl,vsetvl_pre"))
   

[Committed] RISC-V: Robustify dynamic lmul test

2024-01-09 Thread Juzhe-Zhong
While working on refining the cost model, I noticed that this test generates
unexpected scalar xor instructions if the cost model is not tuned carefully.

Add a scan-assembler-not check to catch such regressions in the future.
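The `dg-final` directives in this test behave roughly like a regular-expression search over the generated assembly. Below is a minimal C++ model, assuming ECMAScript `std::regex` is close enough to the Tcl regular expressions the testsuite actually uses for simple patterns like these; `scan_assembler` and `scan_assembler_not` are hypothetical C++ stand-ins for the testsuite procedures of the same name, not their implementation.

```cpp
#include <cassert>
#include <regex>
#include <string>

// True if the (unanchored) pattern occurs anywhere in the assembly text,
// roughly what { dg-final { scan-assembler {...} } } tests.
bool scan_assembler(const std::string& asm_text, const std::string& pattern) {
  return std::regex_search(asm_text, std::regex(pattern));
}

// Negated form, roughly what scan-assembler-not tests.
bool scan_assembler_not(const std::string& asm_text, const std::string& pattern) {
  return !scan_assembler(asm_text, pattern);
}
```

Because the pattern is unanchored, `xor` also matches inside longer mnemonics, so in this model `scan_assembler_not("\tvxor.vv\tv1,v2,v3\n", "xor")` returns false.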

Committed.

gcc/testsuite/ChangeLog:

* gcc.dg/vect/costmodel/riscv/rvv/dynamic-lmul2-7.c: Add assembler-not check.

---
 gcc/testsuite/gcc.dg/vect/costmodel/riscv/rvv/dynamic-lmul2-7.c | 1 +
 1 file changed, 1 insertion(+)

diff --git a/gcc/testsuite/gcc.dg/vect/costmodel/riscv/rvv/dynamic-lmul2-7.c b/gcc/testsuite/gcc.dg/vect/costmodel/riscv/rvv/dynamic-lmul2-7.c
index 87e963edc47..38cbefbe625 100644
--- a/gcc/testsuite/gcc.dg/vect/costmodel/riscv/rvv/dynamic-lmul2-7.c
+++ b/gcc/testsuite/gcc.dg/vect/costmodel/riscv/rvv/dynamic-lmul2-7.c
@@ -22,3 +22,4 @@ x264_pixel_8x8 (unsigned char *pix1, unsigned char *pix2, int i_stride_pix2)
 }
 
 /* { dg-final { scan-assembler {e32,m2} } } */
+/* { dg-final { scan-assembler-not {xor} } } */
-- 
2.36.3



Re: [PATCH 0/5] RISC-V: Relax the -march string for accept any order

2024-01-09 Thread Kito Cheng
Oops, I should have left more context here:

Actually, we discussed this years ago and most people agreed with it, but I
guess it just got dropped, and the ISA string wasn't terribly long at that
time either. However, the number of extensions has grown so fast in the last
year that I think it's time to move this forward.

Also we (SiFive) will send patches for clang/LLVM to relax that as well :)

https://github.com/riscv-non-isa/riscv-toolchain-conventions/pull/14

On Wed, Jan 10, 2024 at 2:31 AM Jeff Law  wrote:

>
>
> On 1/8/24 06:47, Kito Cheng wrote:
> >
> > Do you know how to build an ISA string with the following extensions?
> > - g
> > - c
> > - zba
> > - zbs
> > - svnapot
> > - zve64d
> > - zvl128b
> >
> > Don't use trial and error with your GCC, and don't read the RISC-V ISA spec!
> > OK, I believe that's impossible for most people. Even though I have worked
> > on RISC-V for many years and remember most of the canonical-order rules,
> > it's still hard to get the order right quickly...
> >
> > So I think it's time to relax that for -march string inputs, since we
> > have so many extensions today. We still keep the canonicalization within
> > the compiler, though, because we need it to handle multilib, and it also
> > makes different ISA strings easier to compare.
> >
> > This patch series is split into several parts:
> > 1) Small refactor patch
> > 2) Change the way of parsing ISA string.
> > 3) Remove unused functions
> > 4) Update test cases
> > 5) Update document
> Just because something is hard doesn't necessarily mean we should avoid it.
>
> A great example would be strict aliasing.  I'd bet that 90% of C/C++
> developers would get something wrong in this space.  Similarly for
> oddities of FP arithmetic.
>
> My biggest worry is consistency across various tools.  It would be rather lame
> if GCC were on an island by itself, either in being too strict or too loose.
>
> So where are the other key tools in this regard?  Are we an outlier
> right now or will this patch make us an outlier?
>
> jeff
>
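For reference, Kito's example can be worked through with a small C++ sketch that sorts only the multi-letter extensions. The ordering rules it encodes are an assumption based on my reading of the naming conventions in the RISC-V unprivileged ISA manual (Z extensions first, ordered by the category letter after `z` and then alphabetically, followed by S and then X extensions). It is not GCC's parser: it leaves the base and single-letter part (`rv64gc`) as given, ignores version numbers, and `ext_key`/`build_march` are hypothetical names.

```cpp
#include <algorithm>
#include <cassert>
#include <cstddef>
#include <string>
#include <tuple>
#include <vector>

// Assumed canonical order of single-letter extensions; the letter after
// "z" picks the category of a Z extension (zba -> 'b', zve64d -> 'v').
const std::string kSingleLetterOrder = "imafdqlcbkjtpvh";

// Sort key (class, category, name): class 0 = Z, 1 = S, 2 = X, 3 = other.
std::tuple<int, std::size_t, std::string> ext_key(const std::string& ext) {
  int cls = 3;
  std::size_t category = 0;
  if (!ext.empty()) {
    switch (ext[0]) {
      case 'z': {
        cls = 0;
        std::size_t pos = ext.size() > 1 ? kSingleLetterOrder.find(ext[1])
                                         : std::string::npos;
        // Unknown category letters sort after all known ones.
        category = pos == std::string::npos ? kSingleLetterOrder.size() : pos;
        break;
      }
      case 's': cls = 1; break;
      case 'x': cls = 2; break;
    }
  }
  return std::make_tuple(cls, category, ext);
}

// Append the multi-letter extensions, in the assumed canonical order, to a
// base-plus-single-letter prefix such as "rv64gc".
std::string build_march(const std::string& prefix, std::vector<std::string> exts) {
  std::sort(exts.begin(), exts.end(),
            [](const std::string& a, const std::string& b) {
              return ext_key(a) < ext_key(b);
            });
  std::string out = prefix;
  for (const auto& e : exts)
    out += "_" + e;
  return out;
}
```

With these rules, `build_march("rv64gc", {"svnapot", "zvl128b", "zbs", "zve64d", "zba"})` yields `rv64gc_zba_zbs_zve64d_zvl128b_svnapot`.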


Re: [PATCH v8 1/4] c++: P0847R7 (deducing this) - prerequisite changes. [PR102609]

2024-01-09 Thread waffl3x






On Tuesday, January 9th, 2024 at 3:52 PM, Jason Merrill wrote:


> 
> 
> On 1/9/24 17:34, waffl3x wrote:
> 
> > On Tuesday, January 9th, 2024 at 2:56 PM, Jason Merrill ja...@redhat.com 
> > wrote:
> > 
> > 
> > Is the type of an implicit object parameter specified elsewhere? I have
> > looked for it more than once and I could only find that passage, but I
> > guess [over.match.funcs-1] is what I missed here, so
> > [over.match.funcs-4] doesn't apply elsewhere.
> 
> 
> That's the only definition I know of. And indeed my statement was
> wrong, it also affects the considerations in add_method. But I still
> think it would be more complicated in more places to deal with proxy
> FUNCTION_DECLs (like we do for inheriting constructors).

You would know the code base better than I, so you're probably right.
The code in add_method probably already handles it just fine, the only
caveat being that it needs corresponding object parameters when
deciding to discard an overload introduced by a using declaration...
uhoh yeah that might not be okay. This probably results in ambiguous
overload resolution, I'll throw together a test case to see if I can
confirm this...

struct B {
  void f() {}
};

struct S : B {
  using B::f;
  void f(this S&) {}
};

int main() {
  S s{};
  s.f();
}

And yep, you're right, add_method is going to need to care about this
too. In the above test case, the call to f is incorrectly ambiguous.
The overload of f introduced by the using declaration should have been
discarded as the object parameters should correspond.

struct S;

struct B {
  void g(this S&) {}
};

struct S : B {
  using B::g;
  void g() {}
};

int main()
{
  S s{};
  s.g();
}

This case, on the other hand, works: the overload of g introduced by the
using declaration is correctly discarded.

Hopefully it's not too annoying to solve, I'm not exactly sure how to
go about it (other than the method you've already turned down) so I'll
leave it to you.

Alex



[PATCH v2] rs6000: Fix ASAN linker errors for Power ELF V1 ABI [PR113284]

2024-01-09 Thread Ilya Leoshkevich
v1: 
https://inbox.sourceware.org/gcc-patches/20240109105253.332676-1-...@linux.ibm.com/
v1 -> v2: Move the .LASANPC label to the .text section (Jakub).
  Jakub okay-ed this version in the GCC Bugzilla.

Bootstrap and regtest running on ppc64le-redhat-linux and
powerpc64-linux-gnu.  Ok for trunk when successful?



rs6000_elf_declare_function_name () outputs Power ELF V1 ABI function
entry labels without using ASM_OUTPUT_FUNCTION_LABEL ().  As a result,
.LASANPC labels are not emitted, causing linker errors.

In theory, it is possible to reuse ASM_OUTPUT_FUNCTION_LABEL () by
changing rs6000_output_function_entry () to generate label names
without outputting them, but this would be quite a large change.

Instead, factor out the .LASANPC emitting code from
ASM_OUTPUT_FUNCTION_LABEL () and call it manually.
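The shape of the refactor can be sketched in C++: the ASAN-specific tail moves into its own function so that a target which prints its own entry label can still call it. Everything here is hypothetical: `FunctionState`, `label_final`, `label_raw` and `target_declare` are stand-ins for `assemble_function_label_final ()`, `assemble_function_label_raw ()` and the rs6000 ELFv1 path, output goes to a `std::ostream` rather than an assembler file, and the condition mirrors only the two checks visible in the (truncated) varasm.cc hunk.

```cpp
#include <cassert>
#include <ostream>
#include <sstream>
#include <string>

// Hypothetical stand-ins for the compiler state the real code consults.
struct FunctionState {
  bool sanitize_address = false;     // flag_sanitize & SANITIZE_ADDRESS
  bool in_cold_section = false;      // in_cold_section_p
  bool first_block_is_cold = false;  // first_function_block_is_cold
  int funcdef_no = 0;                // numbers the .LASANPC label
};

// Shared tail: must run no matter who printed the function label.
void label_final(std::ostream& out, const FunctionState& fn) {
  // Notify ASAN only about the first function label.
  if (fn.sanitize_address && fn.in_cold_section == fn.first_block_is_cold)
    out << ".LASANPC" << fn.funcdef_no << ":\n";
}

// Generic path: print the label, then finish it.
void label_raw(std::ostream& out, const FunctionState& fn, const std::string& name) {
  out << name << ":\n";
  label_final(out, fn);
}

// Target path that prints its own (here dot-prefixed) entry label and
// therefore has to call the shared tail explicitly.
void target_declare(std::ostream& out, const FunctionState& fn, const std::string& name) {
  out << "." << name << ":\n";
  label_final(out, fn);
}
```

With address sanitizing enabled and `funcdef_no` set to 3, both `label_raw(out, fn, "foo")` and `target_declare(out, fn, "foo")` end with `.LASANPC3:`; the reported linker errors correspond to a target path that skips the `label_final` call.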

Fixes: c659dd8bfb55 ("Implement ASM_DECLARE_FUNCTION_NAME using ASM_OUTPUT_FUNCTION_LABEL")
Suggested-by: Jakub Jelinek 
Signed-off-by: Ilya Leoshkevich 

gcc/ChangeLog:

PR sanitizer/113284
* config/rs6000/rs6000.cc (rs6000_elf_declare_function_name):
Use assemble_function_label_final () for Power ELF V1 ABI.
* output.h (assemble_function_label_final): New function.
* varasm.cc (assemble_function_label_raw): Use
assemble_function_label_final ().
(assemble_function_label_final): New function.
---
 gcc/config/rs6000/rs6000.cc | 1 +
 gcc/output.h| 4 
 gcc/varasm.cc   | 9 +
 3 files changed, 14 insertions(+)

diff --git a/gcc/config/rs6000/rs6000.cc b/gcc/config/rs6000/rs6000.cc
index 94fbf46f2b6..5d975dab921 100644
--- a/gcc/config/rs6000/rs6000.cc
+++ b/gcc/config/rs6000/rs6000.cc
@@ -21357,6 +21357,7 @@ rs6000_elf_declare_function_name (FILE *file, const char *name, tree decl)
   ASM_DECLARE_RESULT (file, DECL_RESULT (decl));
   rs6000_output_function_entry (file, name);
   fputs (":\n", file);
+  assemble_function_label_final ();
   return;
 }
 
diff --git a/gcc/output.h b/gcc/output.h
index c8fe1d2643d..46b0033b221 100644
--- a/gcc/output.h
+++ b/gcc/output.h
@@ -182,6 +182,10 @@ extern const char *get_fnname_from_decl (tree);
code or data is output after the label.  */
 extern void assemble_function_label_raw (FILE *, const char *);
 
+/* Finish outputting function label.  Needs to be called when outputting
+   function label without using assemble_function_label_raw ().  */
+extern void assemble_function_label_final (void);
+
 /* Output assembler code for the constant pool of a function and associated
with defining the name of the function.  DECL describes the function.
NAME is the function's name.  For the constant pool, we use the current
diff --git a/gcc/varasm.cc b/gcc/varasm.cc
index 1a869ae458a..2b633822434 100644
--- a/gcc/varasm.cc
+++ b/gcc/varasm.cc
@@ -1843,6 +1843,15 @@ void
 assemble_function_label_raw (FILE *file, const char *name)
 {
   ASM_OUTPUT_LABEL (file, name);
+  assemble_function_label_final ();
+}
+
+/* Finish outputting function label.  Needs to be called when outputting
+   function label without using assemble_function_label_raw ().  */
+
+void
+assemble_function_label_final (void)
+{
   if ((flag_sanitize & SANITIZE_ADDRESS)
   /* Notify ASAN only about the first function label.  */
   && (in_cold_section_p == first_function_block_is_cold)
-- 
2.43.0



[committed] libstdc++: Fix Unicode property detection functions

2024-01-09 Thread Jonathan Wakely
Tested x86_64-linux. Pushed to trunk.

-- >8 --

Fix some copy & pasted logic in __is_extended_pictographic. This
function should yield false for the values before the first edge, not
true. Also add a missing boundary condition check in __incb_property.
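The boundary condition is easiest to see with a sketch of how an edge table like `__gcb_edges` can be queried. Each entry is `(code_point << shift_bits) + property`, as the comment in unicode-data.h says, and marks where a run of code points with that property begins. This is a minimal C++ sketch, not the libstdc++ implementation: `lookup_property`, `kShiftBits` and `kMask` are hypothetical names, and 0 is assumed to be the value to return for code points before the first edge.

```cpp
#include <algorithm>
#include <cassert>
#include <cstdint>
#include <iterator>

constexpr std::uint32_t kShiftBits = 4;
constexpr std::uint32_t kMask = (1u << kShiftBits) - 1;

// Entries are (first_code_point << kShiftBits) + property, sorted; each one
// starts a run of code points sharing that property.
std::uint32_t lookup_property(const std::uint32_t* first,
                              const std::uint32_t* last, char32_t c) {
  assert(std::is_sorted(first, last));
  // Setting all property bits makes every entry for code point c compare
  // <= key, so upper_bound lands on the first edge that starts after c.
  const std::uint32_t key = (std::uint32_t(c) << kShiftBits) | kMask;
  const std::uint32_t* p = std::upper_bound(first, last, key);
  if (p == first)
    return 0;  // c precedes every edge: no run covers it.
  return p[-1] & kMask;  // property of the last edge starting at or before c
}
```

For example, with the hypothetical table `{0x1001, 0x2000}` (property 1 from U+0100, property 0 from U+0200), looking up `U'A'` finds no earlier edge and returns 0; without the `p == first` check the code would read `p[-1]` before the start of the array.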

Also fix an off-by-one error in _Utf_iterator::operator++() that would
make dereferencing a past-the-end iterator undefined (where the intended
design is that the iterator is always incrementable and dereferenceable,
for better memory safety).

Also simplify the grapheme view iterator, which still contained some
remnants of an earlier design I was experimenting with.

Slightly tweak the gen_libstdcxx_unicode_data.py script so that the
_Gcb_property enumerators are in the order we encounter them in the data
file, instead of sorting them alphabetically. Start with the "Other"
property at value 0, because that's the default property for anything
not in the file. This makes no practical difference, but seems cleaner.
It causes the values in the __gcb_edges table to change, so it can only be
done now, before anybody is using this code. The enumerator values
and table entries become ABI artefacts for the function using them.
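The effect of the script change on the enumerator values can be reproduced with a small C++ sketch of the same first-seen numbering. `assign_in_order_seen` and `shift_bits` are hypothetical names; the property sequence used with them was decoded by hand from the first entries of the old `__gcb_edges` table, using the old enumerator values.

```cpp
#include <cassert>
#include <cmath>
#include <cstddef>
#include <map>
#include <string>
#include <vector>

// Number properties in order of first appearance, with "Other" pinned to 0,
// like the new gcb_props loop in gen_libstdcxx_unicode_data.py.
std::map<std::string, int> assign_in_order_seen(const std::vector<std::string>& props) {
  std::map<std::string, int> values{{"Other", 0}};
  for (const auto& p : props)
    if (values.find(p) == values.end()) {
      int next = static_cast<int>(values.size());
      values[p] = next;
    }
  return values;
}

// Low bits needed per table entry: ceil(log2(number of properties)).
int shift_bits(std::size_t count) {
  return static_cast<int>(std::ceil(std::log2(static_cast<double>(count))));
}
```

Feeding it the properties in the order they first appear in the old table (Control, LF, CR, Other, Extend, Prepend) gives Other=0, Control=1, LF=2, CR=3, Extend=4, Prepend=5, matching the regenerated enum, and `shift_bits(14)` is 4, matching `__gcb_shift_bits`.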

contrib/ChangeLog:

* unicode/gen_libstdcxx_unicode_data.py: Print out Gcb_property
enumerators in the order they're seen, not alphabetical order.

libstdc++-v3/ChangeLog:

* include/bits/unicode-data.h: Regenerate.
* include/bits/unicode.h (_Utf_iterator::operator++()): Fix off
by one error.
(__incb_property): Add missing check for values before the
first edge.
(__is_extended_pictographic): Invert return values to fix
copy logic.
(_Grapheme_cluster_view::_Iterator): Remove second iterator
member and find end of cluster lazily.
* testsuite/ext/unicode/grapheme_view.cc: New test.
* testsuite/ext/unicode/properties.cc: New test.
* testsuite/ext/unicode/view.cc: New test.
---
 contrib/unicode/gen_libstdcxx_unicode_data.py |   5 +-
 libstdc++-v3/include/bits/unicode-data.h  | 596 +-
 libstdc++-v3/include/bits/unicode.h   |  51 +-
 .../testsuite/ext/unicode/grapheme_view.cc|  95 +++
 .../testsuite/ext/unicode/properties.cc   | 128 
 libstdc++-v3/testsuite/ext/unicode/view.cc|  30 +
 6 files changed, 581 insertions(+), 324 deletions(-)
 create mode 100644 libstdc++-v3/testsuite/ext/unicode/grapheme_view.cc
 create mode 100644 libstdc++-v3/testsuite/ext/unicode/properties.cc

diff --git a/contrib/unicode/gen_libstdcxx_unicode_data.py b/contrib/unicode/gen_libstdcxx_unicode_data.py
index 14491451435..f2f2f8a8ec2 100755
--- a/contrib/unicode/gen_libstdcxx_unicode_data.py
+++ b/contrib/unicode/gen_libstdcxx_unicode_data.py
@@ -122,7 +122,10 @@ for line in open("GraphemeBreakProperty.txt", "r"):
 process_code_points(code_points, grapheme_property.strip())
 
 edges = find_edges(all_code_points)
-gcb_props = {p:i+1 for i,p in enumerate(sorted(set([x[1] for x in edges])))}
+gcb_props = {"Other":0}
+for c, p in edges:
+    if p not in gcb_props:
+        gcb_props[p] = len(gcb_props)
 shift_bits = int(math.ceil(math.log2(len(gcb_props))))
 
 # Enum definition for std::__unicode::_Gcb_property
diff --git a/libstdc++-v3/include/bits/unicode-data.h b/libstdc++-v3/include/bits/unicode-data.h
index c0c7e7d86ff..83968096499 100644
--- a/libstdc++-v3/include/bits/unicode-data.h
+++ b/libstdc++-v3/include/bits/unicode-data.h
@@ -37,20 +37,20 @@
   };
 
   enum class _Gcb_property {
-_Gcb_CR = 1,
-_Gcb_Control = 2,
-_Gcb_Extend = 3,
-_Gcb_L = 4,
-_Gcb_LF = 5,
-_Gcb_LV = 6,
-_Gcb_LVT = 7,
-_Gcb_Other = 8,
-_Gcb_Prepend = 9,
-_Gcb_Regional_Indicator = 10,
-_Gcb_SpacingMark = 11,
-_Gcb_T = 12,
-_Gcb_V = 13,
-_Gcb_ZWJ = 14,
+_Gcb_Other = 0,
+_Gcb_Control = 1,
+_Gcb_LF = 2,
+_Gcb_CR = 3,
+_Gcb_Extend = 4,
+_Gcb_Prepend = 5,
+_Gcb_SpacingMark = 6,
+_Gcb_L = 7,
+_Gcb_V = 8,
+_Gcb_T = 9,
+_Gcb_ZWJ = 10,
+_Gcb_LV = 11,
+_Gcb_LVT = 12,
+_Gcb_Regional_Indicator = 13,
   };
 
   // Values generated by contrib/unicode/gen_std_format_width.py,
@@ -58,290 +58,290 @@
   // Entries are (code_point << shift_bits) + property.
   inline constexpr int __gcb_shift_bits = 0x4;
   inline constexpr uint32_t __gcb_edges[] = {
-0x2, 0xa5, 0xb2, 0xd1, 0xe2, 0x208,
-0x7f2, 0xa08, 0xad2, 0xae8, 0x3003, 0x3708,
-0x4833, 0x48a8, 0x5913, 0x5be8, 0x5bf3, 0x5c08,
-0x5c13, 0x5c38, 0x5c43, 0x5c68, 0x5c73, 0x5c88,
-0x6009, 0x6068, 0x6103, 0x61b8, 0x61c2, 0x61d8,
-0x64b3, 0x6608, 0x6703, 0x6718, 0x6d63, 0x6dd9,
-0x6de8, 0x6df3, 0x6e58, 0x6e73, 0x6e98, 0x6ea3,
-0x6ee8, 0x70f9, 0x7108, 0x7113, 0x7128, 0x7303,
-0x74b8, 0x7a63, 0x7b18, 0x7eb3, 0x7f48, 0x7fd3,
-0x7fe8, 0x8163, 0x81a8, 0x81b3, 0x8248, 0x8253,
-0x8288, 0x8293, 0x82e8, 0x8593, 0x85c8, 0x8909,
-0x8928, 0x8983, 0x8a08, 

[r14-7033 Regression] FAIL: g++.dg/gomp/bad-array-section-4.C -std=c++98 at line 37 (test for warnings, line 35) on Linux/x86_64

2024-01-09 Thread haochen.jiang
On Linux/x86_64,

1413af02d62182bc1e19698aaa4dae406f8f13bf is the first bad commit
commit 1413af02d62182bc1e19698aaa4dae406f8f13bf
Author: Julian Brown 
Date:   Mon Sep 12 17:11:29 2022 +

OpenMP: lvalue parsing for map/to/from clauses (C++)

caused

FAIL: g++.dg/gomp/array-section-1.C  -std=c++14  scan-tree-dump original 
"map\\(tofrom:arr1\\[SAVE_EXPR \\] \\[len: [0-9]+\\]\\) 
map\\(firstprivate:arr1 \\[pointer assign, bias: \\(long int\\) 
\\[SAVE_EXPR \\] - \\(long int\\) \\]\\)"
FAIL: g++.dg/gomp/array-section-1.C  -std=c++17  scan-tree-dump original 
"map\\(tofrom:arr1\\[SAVE_EXPR \\] \\[len: [0-9]+\\]\\) 
map\\(firstprivate:arr1 \\[pointer assign, bias: \\(long int\\) 
\\[SAVE_EXPR \\] - \\(long int\\) \\]\\)"
FAIL: g++.dg/gomp/array-section-1.C  -std=c++20  scan-tree-dump original 
"map\\(tofrom:arr1\\[SAVE_EXPR \\] \\[len: [0-9]+\\]\\) 
map\\(firstprivate:arr1 \\[pointer assign, bias: \\(long int\\) 
\\[SAVE_EXPR \\] - \\(long int\\) \\]\\)"
FAIL: g++.dg/gomp/array-section-1.C  -std=c++98  scan-tree-dump original 
"map\\(tofrom:arr1\\[SAVE_EXPR \\] \\[len: [0-9]+\\]\\) 
map\\(firstprivate:arr1 \\[pointer assign, bias: \\(long int\\) 
\\[SAVE_EXPR \\] - \\(long int\\) \\]\\)"
FAIL: g++.dg/gomp/array-section-2.C  -std=c++14  scan-tree-dump original 
"map\\(tofrom:arr1\\[SAVE_EXPR \\] \\[len: \\(40 - \\(sizetype\\) SAVE_EXPR 
\\) \\* [0-9]+\\]\\) map\\(firstprivate:arr1 \\[pointer assign, bias: 
\\(long int\\) \\[SAVE_EXPR \\] - \\(long int\\) \\]\\)"
FAIL: g++.dg/gomp/array-section-2.C  -std=c++14  scan-tree-dump original 
"map\\(tofrom:arr1\\[SAVE_EXPR \\] \\[len: \\(sizetype\\) y \\* [0-9]+\\]\\) 
map\\(firstprivate:arr1 \\[pointer assign, bias: \\(long int\\) 
\\[SAVE_EXPR \\] - \\(long int\\) \\]\\)"
FAIL: g++.dg/gomp/array-section-2.C  -std=c++17  scan-tree-dump original 
"map\\(tofrom:arr1\\[SAVE_EXPR \\] \\[len: \\(40 - \\(sizetype\\) SAVE_EXPR 
\\) \\* [0-9]+\\]\\) map\\(firstprivate:arr1 \\[pointer assign, bias: 
\\(long int\\) \\[SAVE_EXPR \\] - \\(long int\\) \\]\\)"
FAIL: g++.dg/gomp/array-section-2.C  -std=c++17  scan-tree-dump original 
"map\\(tofrom:arr1\\[SAVE_EXPR \\] \\[len: \\(sizetype\\) y \\* [0-9]+\\]\\) 
map\\(firstprivate:arr1 \\[pointer assign, bias: \\(long int\\) 
\\[SAVE_EXPR \\] - \\(long int\\) \\]\\)"
FAIL: g++.dg/gomp/array-section-2.C  -std=c++20  scan-tree-dump original 
"map\\(tofrom:arr1\\[SAVE_EXPR \\] \\[len: \\(40 - \\(sizetype\\) SAVE_EXPR 
\\) \\* [0-9]+\\]\\) map\\(firstprivate:arr1 \\[pointer assign, bias: 
\\(long int\\) \\[SAVE_EXPR \\] - \\(long int\\) \\]\\)"
FAIL: g++.dg/gomp/array-section-2.C  -std=c++20  scan-tree-dump original 
"map\\(tofrom:arr1\\[SAVE_EXPR \\] \\[len: \\(sizetype\\) y \\* [0-9]+\\]\\) 
map\\(firstprivate:arr1 \\[pointer assign, bias: \\(long int\\) 
\\[SAVE_EXPR \\] - \\(long int\\) \\]\\)"
FAIL: g++.dg/gomp/array-section-2.C  -std=c++98  scan-tree-dump original 
"map\\(tofrom:arr1\\[SAVE_EXPR \\] \\[len: \\(40 - \\(sizetype\\) SAVE_EXPR 
\\) \\* [0-9]+\\]\\) map\\(firstprivate:arr1 \\[pointer assign, bias: 
\\(long int\\) \\[SAVE_EXPR \\] - \\(long int\\) \\]\\)"
FAIL: g++.dg/gomp/array-section-2.C  -std=c++98  scan-tree-dump original 
"map\\(tofrom:arr1\\[SAVE_EXPR \\] \\[len: \\(sizetype\\) y \\* [0-9]+\\]\\) 
map\\(firstprivate:arr1 \\[pointer assign, bias: \\(long int\\) 
\\[SAVE_EXPR \\] - \\(long int\\) \\]\\)"
FAIL: g++.dg/gomp/bad-array-section-4.C  -std=c++14  at line 37 (test for 
warnings, line 35)
FAIL: g++.dg/gomp/bad-array-section-4.C  -std=c++17  at line 37 (test for 
warnings, line 35)
FAIL: g++.dg/gomp/bad-array-section-4.C  -std=c++20  at line 37 (test for 
warnings, line 35)
FAIL: g++.dg/gomp/bad-array-section-4.C  -std=c++98  at line 37 (test for 
warnings, line 35)

with GCC configured with

../../gcc/configure --prefix=/export/users/haochenj/src/gcc-bisect/master/master/r14-7033/usr --enable-clocale=gnu --with-system-zlib --with-demangler-in-ld --with-fpmath=sse --enable-languages=c,c++,fortran --enable-cet --without-isl --enable-libmpx x86_64-linux --disable-bootstrap

To reproduce:

$ cd {build_dir}/gcc && make check RUNTESTFLAGS="gomp.exp=g++.dg/gomp/array-section-1.C --target_board='unix{-m32}'"
$ cd {build_dir}/gcc && make check RUNTESTFLAGS="gomp.exp=g++.dg/gomp/array-section-1.C --target_board='unix{-m32\ -march=cascadelake}'"
$ cd {build_dir}/gcc && make check RUNTESTFLAGS="gomp.exp=g++.dg/gomp/array-section-2.C --target_board='unix{-m32}'"
$ cd {build_dir}/gcc && make check RUNTESTFLAGS="gomp.exp=g++.dg/gomp/array-section-2.C --target_board='unix{-m32\ -march=cascadelake}'"
$ cd {build_dir}/gcc && make check RUNTESTFLAGS="gomp.exp=g++.dg/gomp/bad-array-section-4.C --target_board='unix{-m32}'"
$ cd {build_dir}/gcc && make check RUNTESTFLAGS="gomp.exp=g++.dg/gomp/bad-array-section-4.C --target_board='unix{-m32\ -march=cascadelake}'"

(Please do not reply to this email, for question about this report, contact me 
at haochen dot jiang at intel.com.)
(If you met problems with 

Re: [PATCH 1/8] OpenMP: lvalue parsing for map/to/from clauses (C++)

2024-01-09 Thread Thomas Schwinge
Hi Julian!

On 2024-01-07T16:04:37+0100, Tobias Burnus  wrote:
> Am 05.01.24 um 13:23 schrieb Julian Brown:
>> Here's a rebased/retested version [...]

> LGTM - [...]

Got pushed as commit r14-7033-g1413af02d62182bc1e19698aaa4dae406f8f13bf
"OpenMP: lvalue parsing for map/to/from clauses (C++)".

Some (hopefully minor) tuning in the test cases is necessary; for
example, for x86_64-pc-linux-gnu '-m32' testing, I see a few FAILs:

+PASS: g++.dg/gomp/array-section-1.C  -std=c++98  scan-tree-dump original 
"map\\(tofrom:arr1\\[1\\] [len: x != 0 ? [0-9]+ : [0-9]+\\]\\) 
map\\(firstprivate:arr1 \\[pointer assign, bias: [0-9]+\\]\\)"
+PASS: g++.dg/gomp/array-section-1.C  -std=c++98  scan-tree-dump original 
"map\\(tofrom:arr1\\[1\\] \\[len: x != 0 \\? [0-9]+ : [0-9]+\\]\\) 
map\\(firstprivate:arr1 \\[pointer assign, bias: [0-9]+\\]\\)"
+FAIL: g++.dg/gomp/array-section-1.C  -std=c++98  scan-tree-dump original 
"map\\(tofrom:arr1\\[SAVE_EXPR \\] \\[len: [0-9]+\\]\\) 
map\\(firstprivate:arr1 \\[pointer assign, bias: \\(long int\\) 
\\[SAVE_EXPR \\] - \\(long int\\) \\]\\)"
+FAIL: g++.dg/gomp/array-section-1.C  -std=c++98  scan-tree-dump original 
"map\\(tofrom:arr1\\[SAVE_EXPR \\] \\[len: [0-9]+\\]\\) 
map\\(firstprivate:arr1 \\[pointer assign, bias: \\(long int\\) 
\\[SAVE_EXPR \\] - \\(long int\\) \\]\\)"
+FAIL: g++.dg/gomp/array-section-1.C  -std=c++98  scan-tree-dump original 
"map\\(tofrom:arr1\\[SAVE_EXPR \\] \\[len: [0-9]+\\]\\) 
map\\(firstprivate:arr1 \\[pointer assign, bias: \\(long int\\) 
\\[SAVE_EXPR \\] - \\(long int\\) \\]\\)"
+FAIL: g++.dg/gomp/array-section-1.C  -std=c++98  scan-tree-dump original 
"map\\(tofrom:arr1\\[SAVE_EXPR \\] \\[len: [0-9]+\\]\\) 
map\\(firstprivate:arr1 \\[pointer assign, bias: \\(long int\\) 
\\[SAVE_EXPR \\] - \\(long int\\) \\]\\)"
+PASS: g++.dg/gomp/array-section-1.C  -std=c++98 (test for excess errors)

Etc.

+PASS: g++.dg/gomp/array-section-2.C  -std=c++98  scan-tree-dump original 
"map\\(tofrom:arr1\\[0\\] \\[len: \\(sizetype\\) y \\* [0-9]+\\]\\) 
map\\(firstprivate:arr1 \\[pointer assign, bias: 0\\]\\)"
+PASS: g++.dg/gomp/array-section-2.C  -std=c++98  scan-tree-dump original 
"map\\(tofrom:arr1\\[0\\] \\[len: \\(sizetype\\) y \\* [0-9]+\\]\\) 
map\\(firstprivate:arr1 \\[pointer assign, bias: 0\\]\\)"
+FAIL: g++.dg/gomp/array-section-2.C  -std=c++98  scan-tree-dump original 
"map\\(tofrom:arr1\\[SAVE_EXPR \\] \\[len: \\(40 - \\(sizetype\\) SAVE_EXPR 
\\) \\* [0-9]+\\]\\) map\\(firstprivate:arr1 \\[pointer assign, bias: 
\\(long int\\) \\[SAVE_EXPR \\] - \\(long int\\) \\]\\)"
+FAIL: g++.dg/gomp/array-section-2.C  -std=c++98  scan-tree-dump original 
"map\\(tofrom:arr1\\[SAVE_EXPR \\] \\[len: \\(40 - \\(sizetype\\) SAVE_EXPR 
\\) \\* [0-9]+\\]\\) map\\(firstprivate:arr1 \\[pointer assign, bias: 
\\(long int\\) \\[SAVE_EXPR \\] - \\(long int\\) \\]\\)"
+FAIL: g++.dg/gomp/array-section-2.C  -std=c++98  scan-tree-dump original 
"map\\(tofrom:arr1\\[SAVE_EXPR \\] \\[len: \\(sizetype\\) y \\* [0-9]+\\]\\) 
map\\(firstprivate:arr1 \\[pointer assign, bias: \\(long int\\) 
\\[SAVE_EXPR \\] - \\(long int\\) \\]\\)"
+FAIL: g++.dg/gomp/array-section-2.C  -std=c++98  scan-tree-dump original 
"map\\(tofrom:arr1\\[SAVE_EXPR \\] \\[len: \\(sizetype\\) y \\* [0-9]+\\]\\) 
map\\(firstprivate:arr1 \\[pointer assign, bias: \\(long int\\) 
\\[SAVE_EXPR \\] - \\(long int\\) \\]\\)"
+PASS: g++.dg/gomp/array-section-2.C  -std=c++98 (test for excess errors)

Etc.

+PASS: g++.dg/gomp/bad-array-section-4.C  -std=c++98  at line 15 (test for errors, line 14)
+PASS: g++.dg/gomp/bad-array-section-4.C  -std=c++98  at line 16 (test for errors, line 14)
+PASS: g++.dg/gomp/bad-array-section-4.C  -std=c++98  at line 17 (test for errors, line 14)
+PASS: g++.dg/gomp/bad-array-section-4.C  -std=c++98  at line 22 (test for warnings, line 21)
+PASS: g++.dg/gomp/bad-array-section-4.C  -std=c++98  at line 36 (test for errors, line 35)
+FAIL: g++.dg/gomp/bad-array-section-4.C  -std=c++98  at line 37 (test for warnings, line 35)
+PASS: g++.dg/gomp/bad-array-section-4.C  -std=c++98  at line 38 (test for errors, line 35)
+PASS: g++.dg/gomp/bad-array-section-4.C  -std=c++98  at line 39 (test for errors, line 35)
+PASS: g++.dg/gomp/bad-array-section-4.C  -std=c++98  at line 44 (test for warnings, line 43)
+PASS: g++.dg/gomp/bad-array-section-4.C  -std=c++98 (test for excess errors)

Etc.


Regards,
 Thomas
-
Siemens Electronic Design Automation GmbH; Address: Arnulfstraße 201, 80634
München; limited liability company (GmbH); Managing Directors: Thomas
Heurung, Frank Thürauf; Registered office: München; Register court:
München, HRB 106955


Re: [PATCH v8 1/4] c++: P0847R7 (deducing this) - prerequisite changes. [PR102609]

2024-01-09 Thread Jason Merrill

On 1/9/24 17:34, waffl3x wrote:

> On Tuesday, January 9th, 2024 at 2:56 PM, Jason Merrill wrote:
>
> > On 1/9/24 15:58, Jason Merrill wrote:
> >
> > > On 1/6/24 19:00, waffl3x wrote:
> > >
> > > > Bootstrapped and tested on x86_64-linux with no regressions.
> > > >
> > > > I'm considering this finished, I have CWG2586 working but I have not
> > > > included it in this version of the patch. I was not happy with the
> > > > amount of work I had done on it. I will try to get it finished before
> > > > we get cut off, and I'm pretty sure I can. I just don't want to risk
> > > > missing the boat for the whole patch just for that.
> > > >
> > > > There aren't too many changes from v7, it's mostly just cleaned up.
> > > > There are a few though, so do take a look, if there's anything severe I
> > > > can rush to fix it if necessary.
> > > >
> > > > That's all, hopefully all is good, fingers crossed.
> > >
> > > Great. Given where we are in the release cycle, I'm thinking to put
> > > these with only minimal changes and then do any further adjustments
> > > afterward.
> > >
> > > For this first one, I needed to fix the commit message to wrap at 75
> > > columns so that it fits in 80 columns with the initial padding added by
> > > 'git log'. I also needed to adjust the ChangeLog entry to please git
> > > gcc-verify. And I changed the credit note to a Co-authored-by.
> > >
> > > The other commit messages only needed wrapping.
> >
> > I just pushed your patches along with these two follow-ons:
>
> Is the type of an implicit object parameter specified elsewhere? I have
> looked for it more than once and I could only find that passage, but I
> guess [over.match.funcs-1] is what I missed here, so
> [over.match.funcs-4] doesn't apply elsewhere.


That's the only definition I know of.  And indeed my statement was 
wrong, it also affects the considerations in add_method.  But I still 
think it would be more complicated in more places to deal with proxy 
FUNCTION_DECLs (like we do for inheriting constructors).



> I have no objection to changing the comment in
> iobj_xobj_parameters_correspond if you have a better idea on how to fix
> the bug that inspired that comment. Especially since the standard is on
> your side for it.
>
> In this case, perhaps the comment should instead say something along
> the lines of "this function does not account for using declarations."
> That does get a little hairy though if both an iobj and xobj member
> function are introduced by a using declaration. Perhaps we should just
> optionally take a type that we can compare against? That sounds really
> icky, maybe the solution you have in mind will work just fine.
>
> BTW, I definitely assumed TYPE_MAIN_VARIANT was what I wanted in other
> places in the patch, so there might be latent errors waiting to happen.
> It's probably fine because those areas where I used them are pretty
> well tested, but I thought it would be worth mentioning just in case.


The uses in copy_fn_p seem a little dubious, but there were already 
similar ones there, so I think it should be OK.



> I'm very excited to see this get through, thanks again for all the help
> and patience.


Thank you for all your work!

Jason



Re: [PATCH] PR target/112886, Add %S to print_operand for vector pair support

2024-01-09 Thread Peter Bergner
On 1/5/24 4:18 PM, Michael Meissner wrote:
> @@ -14504,13 +14504,17 @@ print_operand (FILE *file, rtx x, int code)
>   print_operand (file, x, 0);
>return;
>  
> +case 'S':
>  case 'x':
> -  /* X is a FPR or Altivec register used in a VSX context.  */
> +  /* X is a FPR or Altivec register used in a VSX context.  %x prints
> +  the VSX register number, %S prints the 2nd register number for
> +  vector pair, decimal 128-bit floating and IBM 128-bit binary floating
> +  values.  */
>if (!REG_P (x) || !VSX_REGNO_P (REGNO (x)))
> - output_operand_lossage ("invalid %%x value");
> + output_operand_lossage ("invalid %%%c value", (code == 'S' ? 'S' : 'x'));
>else
>   {
> -   int reg = REGNO (x);
> +   int reg = REGNO (x) + (code == 'S' ? 1 : 0);
> int vsx_reg = (FP_REGNO_P (reg)
>? reg - 32
>: reg - FIRST_ALTIVEC_REGNO + 32);

The above looks good to me.  However:


> +: "=v" (*p)
> +: "v" (*q), "v" (*r));

These really should use "wa" rather than "v", since these are
VSX instructions... or did you use those to ensure you got
Altivec register numbers assigned?



> +/* { dg-final { scan-assembler-times {\mxvadddp (3[2-9]|[45][0-9]|6[0-3]),(3[2-9]|[45][0-9]|6[0-3]),(3[2-9]|[45][0-9]|6[0-3])\M} 2 } } */

...and this is really ugly and hard to read/understand.  Can't we use
register variables to make it simpler?  Something like the following
which tests having both FPR and Altivec reg numbers assigned?

...
void
test (__vector_pair *ptr)
{
  register __vector_pair p asm ("vs10");
  register __vector_pair q asm ("vs42");
  register __vector_pair r asm ("vs44");
  q = ptr[1];
  r = ptr[2];
  __asm__ ("xvadddp %x0,%x1,%x2\n\txvadddp %S0,%S1,%S2"
   : "=wa" (p)
   : "wa" (q), "wa" (r));
  ptr[2] = p;
}

/* { dg-final { scan-assembler-times {\mxvadddp 10,42,44\M} 1 } } */
/* { dg-final { scan-assembler-times {\mxvadddp 11,43,45\M} 1 } } */
...
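
The register-number arithmetic in the %x/%S hunk can be checked with a small stand-alone model. This is a sketch, not rs6000 code: it assumes the current rs6000 hard-register layout (FPRs f0..f31 at regnos 32..63, Altivec registers v0..v31 at regnos 64..95), and `vsx_operand_number` is a hypothetical helper.

```cpp
#include <cassert>

// Assumed rs6000 hard-register numbering (an assumption, not from the patch):
// regnos 32..63 are the FPRs f0..f31, 64..95 the Altivec registers v0..v31.
constexpr int kFirstFprRegno = 32;
constexpr int kFirstAltivecRegno = 64;
constexpr int kLastAltivecRegno = 95;

// Model of the mapping in the print_operand hunk: %x prints the VSX number
// of the operand's register, %S the VSX number of the following register,
// i.e. the second half of a register pair.
int vsx_operand_number(int regno, char code)
{
  assert(regno >= kFirstFprRegno && regno <= kLastAltivecRegno
         && "operand is not a VSX register");
  int reg = regno + (code == 'S' ? 1 : 0);
  bool is_fpr = reg < kFirstAltivecRegno;  // plays the role of FP_REGNO_P
  return is_fpr ? reg - 32 : reg - kFirstAltivecRegno + 32;
}
```

Under those assumptions, vs10 (f10, regno 42) prints as 10 with %x and 11 with %S, and vs42 (v10, regno 74) prints as 42 and 43, which is what the `xvadddp 10,42,44` and `xvadddp 11,43,45` patterns above expect.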

Peter



Re: Re: [PATCH v4] RISC-V: Adds the prefix "th." for the instructions of XTheadVector.

2024-01-09 Thread 钟居哲
Yes. I agree with you that we should wait until all the XTheadVector patches are accepted.

Thanks.



juzhe.zh...@rivai.ai
 
From: Jeff Law
Date: 2024-01-10 01:49
To: 钟居哲; cooper.joshua; gcc-patches
CC: jim.wilson.gcc; palmer; andrew; philipp.tomsich; Christoph Müllner; jinma; 
Cooper Qu
Subject: Re: [PATCH v4] RISC-V: Adds the prefix "th." for the instructions of 
XTheadVector.
 
 
On 1/8/24 16:04, 钟居哲 wrote:
> This patch looks ok from my side.
Likewise.
 
So I think the only question for this specific patch is whether or not 
it makes sense to include it now or wait for more of the thead bits to 
get to acceptance.
 
I tend to think it should wait since I don't think it has any value 
without the rest of the thead vector changes and it's not 100% clear if 
those changes are going to make it into gcc-14 or not.
 
Jeff
 


Re: [PATCH v8 1/4] c++: P0847R7 (deducing this) - prerequisite changes. [PR102609]

2024-01-09 Thread waffl3x






On Tuesday, January 9th, 2024 at 2:56 PM, Jason Merrill  
wrote:


> 
> 
> On 1/9/24 15:58, Jason Merrill wrote:
> 
> > On 1/6/24 19:00, waffl3x wrote:
> > 
> > > Bootstrapped and tested on x86_64-linux with no regressions.
> > > 
> > > I'm considering this finished, I have CWG2586 working but I have not
> > > included it in this version of the patch. I was not happy with the
> > > amount of work I had done on it. I will try to get it finished before
> > > we get cut off, and I'm pretty sure I can. I just don't want to risk
> > > missing the boat for the whole patch just for that.
> > > 
> > > There aren't too many changes from v7, it's mostly just cleaned up.
> > > There are a few though, so do take a look, if there's anything severe I
> > > can rush to fix it if necessary.
> > > 
> > > That's all, hopefully all is good, fingers crossed.
> > 
> > Great. Given where we are in the release cycle, I'm thinking to put
> > these with only minimal changes and then do any further adjustments
> > afterward.
> > 
> > For this first one, I needed to fix the commit message to wrap at 75
> > columns so that it fits in 80 columns with the initial padding added by
> > 'git log'. I also needed to adjust the ChangeLog entry to please git
> > gcc-verify. And I changed the credit note to a Co-authored-by.
> > 
> > The other commit messages only needed wrapping.
> 
> 
> I just pushed your patches along with these two follow-ons:

Is the type of an implicit object parameter specified elsewhere? I have
looked for it more than once and I could only find that passage, but I
guess [over.match.funcs-1] is what I missed here, so
[over.match.funcs-4] doesn't apply elsewhere.

I have no objection to changing the comment in
iobj_xobj_parameters_correspond if you have a better idea on how to fix
the bug that inspired that comment. Especially since the standard is on
your side for it.

In this case, perhaps the comment should instead say something along
the lines of "this function does not account for using declarations."
That does get a little hairy though if both an iobj and xobj member
function are introduced by a using declaration. Perhaps we should just
optionally take a type that we can compare against? That sounds really
icky, maybe the solution you have in mind will work just fine.

BTW, I definitely assumed TYPE_MAIN_VARIANT was what I wanted in other
places in the patch, so there might be latent errors waiting to happen.
It's probably fine because those areas where I used them are pretty
well tested, but I thought it would be worth mentioning just in case.

I'm very excited to see this get through, thanks again for all the help
and patience.

Alex


Re: [PATCH] Fix spurious match in extract_symvers

2024-01-09 Thread Jonathan Wakely
On Tue, 9 Jan 2024 at 21:47, Andreas Schwab wrote:
>
> Tighten the regex to find the start of the .dynsym symtab in the readelf
> output to avoid matching the section symbol in the normal symtab.

OK, thanks.


>
> libstdc++-v3:
> * scripts/extract_symvers.in: Require final colon to only match
> .dynsym in the header of the dynamic symtab.
> ---
>  libstdc++-v3/scripts/extract_symvers.in | 2 +-
>  1 file changed, 1 insertion(+), 1 deletion(-)
>
> diff --git a/libstdc++-v3/scripts/extract_symvers.in b/libstdc++-v3/scripts/extract_symvers.in
> index 17f0d31bd1c..6bb951c7145 100755
> --- a/libstdc++-v3/scripts/extract_symvers.in
> +++ b/libstdc++-v3/scripts/extract_symvers.in
> @@ -52,7 +52,7 @@ SunOS)
># Omit _DYNAMIC etc. for consistency with extract_symvers.pl, only
># present on Solaris.
>${readelf} ${lib} |\
> -  sed -e 's/ \[: [A-Fa-f0-9]*\] //' -e '/\.dynsym/,/^$/p;d' |\
> +  sed -e 's/ \[: [A-Fa-f0-9]*\] //' -e '/\.dynsym.*:$/,/^$/p;d' |\
>sed -e 's/ \[: [0-9]*\] //' |\
>grep -E -v ' (LOCAL|UND) ' |\
>grep -E -v ' (_DYNAMIC|_GLOBAL_OFFSET_TABLE_|_PROCEDURE_LINKAGE_TABLE_|_edata|_end|_etext)$' |\
> --
> 2.43.0
>
>
> --
> Andreas Schwab, sch...@linux-m68k.org
> GPG Key fingerprint = 7578 EB47 D4E5 4D69 2510  2552 DF73 E780 A9DA AEC1
> "And now for something completely different."
>



[PATCH] libstdc++: Prefer posix_memalign for aligned-new [PR113258]

2024-01-09 Thread Jonathan Wakely
Does anybody see any problem with making this change, so that we avoid
the problem described in the PR?

-- >8 --

As described in PR libstdc++/113258 there are old versions of tcmalloc
which replace malloc and related APIs, but do not replace aligned_alloc
because it didn't exist at the time they were released. This means that
when operator new(size_t, align_val_t) uses aligned_alloc to obtain
memory, it comes from libc's aligned_alloc not from tcmalloc. But when
operator delete(void*, size_t, align_val_t) uses free to deallocate the
memory, that goes to tcmalloc's replacement version of free, which
doesn't know how to free it.

If we give preference to the older posix_memalign instead of
aligned_alloc then we're more likely to use a function that will be
compatible with the replacement version of free. Because posix_memalign
has been around for longer, it's more likely that old third-party malloc
replacements will also replace posix_memalign alongside malloc and free.
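
A minimal sketch of an aligned allocation built on posix_memalign, assuming a POSIX system where `<cstdlib>` declares `posix_memalign`. The name `aligned_alloc_via_posix_memalign` is hypothetical and this is not the code in new_opa.cc; it only illustrates why the memory then comes from whichever allocator also supplies `free`.

```cpp
#include <cassert>
#include <cstddef>
#include <cstdint>
#include <cstdlib>  // posix_memalign (POSIX), std::free

// Obtain SZ bytes aligned to AL via posix_memalign, so that a replacement
// malloc library that also replaces posix_memalign and free serves both the
// allocation and the later deallocation.
static void* aligned_alloc_via_posix_memalign(std::size_t al, std::size_t sz)
{
  // posix_memalign requires the alignment to be a power of two that is a
  // multiple of sizeof(void*); aligned_alloc has no such lower bound.
  if (al < sizeof(void*))
    al = sizeof(void*);
  void* ptr = nullptr;
  if (posix_memalign(&ptr, al, sz) == 0)
    return ptr;
  return nullptr;  // e.g. ENOMEM, or EINVAL for a non-power-of-two alignment
}
```

Memory obtained this way is released with `std::free`, which is exactly the allocation/deallocation pairing the PR is about.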

libstdc++-v3/ChangeLog:

PR libstdc++/113258
* libsupc++/new_opa.cc: Prefer to use posix_memalign if
available.
---
 libstdc++-v3/libsupc++/new_opa.cc | 26 +++---
 1 file changed, 15 insertions(+), 11 deletions(-)

diff --git a/libstdc++-v3/libsupc++/new_opa.cc b/libstdc++-v3/libsupc++/new_opa.cc
index 8326b7497fe..35606e1c1b3 100644
--- a/libstdc++-v3/libsupc++/new_opa.cc
+++ b/libstdc++-v3/libsupc++/new_opa.cc
@@ -46,12 +46,12 @@ using std::bad_alloc;
 using std::size_t;
 extern "C"
 {
-# if _GLIBCXX_HAVE_ALIGNED_ALLOC
+# if _GLIBCXX_HAVE_POSIX_MEMALIGN
+  void *posix_memalign(void **, size_t alignment, size_t size);
+# elif _GLIBCXX_HAVE_ALIGNED_ALLOC
   void *aligned_alloc(size_t alignment, size_t size);
 # elif _GLIBCXX_HAVE__ALIGNED_MALLOC
   void *_aligned_malloc(size_t size, size_t alignment);
-# elif _GLIBCXX_HAVE_POSIX_MEMALIGN
-  void *posix_memalign(void **, size_t alignment, size_t size);
 # elif _GLIBCXX_HAVE_MEMALIGN
   void *memalign(size_t alignment, size_t size);
 # else
@@ -63,13 +63,10 @@ extern "C"
 #endif
 
 namespace __gnu_cxx {
-#if _GLIBCXX_HAVE_ALIGNED_ALLOC
-using ::aligned_alloc;
-#elif _GLIBCXX_HAVE__ALIGNED_MALLOC
-static inline void*
-aligned_alloc (std::size_t al, std::size_t sz)
-{ return _aligned_malloc(sz, al); }
-#elif _GLIBCXX_HAVE_POSIX_MEMALIGN
+// Prefer posix_memalign if available, because it's older than aligned_alloc
+// and so more likely to be provided by replacement malloc libraries that
+// predate the addition of aligned_alloc. See PR libstdc++/113258.
+#if _GLIBCXX_HAVE_POSIX_MEMALIGN
 static inline void*
 aligned_alloc (std::size_t al, std::size_t sz)
 {
@@ -83,6 +80,12 @@ aligned_alloc (std::size_t al, std::size_t sz)
 return ptr;
   return nullptr;
 }
+#elif _GLIBCXX_HAVE_ALIGNED_ALLOC
+using ::aligned_alloc;
+#elif _GLIBCXX_HAVE__ALIGNED_MALLOC
+static inline void*
+aligned_alloc (std::size_t al, std::size_t sz)
+{ return _aligned_malloc(sz, al); }
 #elif _GLIBCXX_HAVE_MEMALIGN
 static inline void*
 aligned_alloc (std::size_t al, std::size_t sz)
@@ -128,7 +131,8 @@ operator new (std::size_t sz, std::align_val_t al)
   if (__builtin_expect (sz == 0, false))
 sz = 1;
 
-#if _GLIBCXX_HAVE_ALIGNED_ALLOC
+#if _GLIBCXX_HAVE_POSIX_MEMALIGN
+#elif _GLIBCXX_HAVE_ALIGNED_ALLOC
 # if defined _AIX || defined __APPLE__
   /* AIX 7.2.0.0 aligned_alloc incorrectly has posix_memalign's requirement
* that alignment is a multiple of sizeof(void*).
-- 
2.43.0



Re: [PATCH v8 1/4] c++: P0847R7 (deducing this) - prerequisite changes. [PR102609]

2024-01-09 Thread Jason Merrill

On 1/9/24 15:58, Jason Merrill wrote:

> On 1/6/24 19:00, waffl3x wrote:
>
> > Bootstrapped and tested on x86_64-linux with no regressions.
> >
> > I'm considering this finished, I have CWG2586 working but I have not
> > included it in this version of the patch. I was not happy with the
> > amount of work I had done on it. I will try to get it finished before
> > we get cut off, and I'm pretty sure I can. I just don't want to risk
> > missing the boat for the whole patch just for that.
> >
> > There aren't too many changes from v7, it's mostly just cleaned up.
> > There are a few though, so do take a look, if there's anything severe I
> > can rush to fix it if necessary.
> >
> > That's all, hopefully all is good, fingers crossed.
>
> Great.  Given where we are in the release cycle, I'm thinking to put
> these with only minimal changes and then do any further adjustments
> afterward.
>
> For this first one, I needed to fix the commit message to wrap at 75
> columns so that it fits in 80 columns with the initial padding added by
> 'git log'.  I also needed to adjust the ChangeLog entry to please git
> gcc-verify.  And I changed the credit note to a Co-authored-by.
>
> The other commit messages only needed wrapping.


I just pushed your patches along with these two follow-ons:

From 5a6d3b1737843aa64d83ffc5d639fa0afa5d8318 Mon Sep 17 00:00:00 2001
From: Jason Merrill 
Date: Tue, 9 Jan 2024 16:00:52 -0500
Subject: [PATCH 1/2] c++: explicit object cleanups
To: gcc-patches@gcc.gnu.org

The FIXME in xobj_iobj_parameters_correspond was due to expecting
TYPE_MAIN_VARIANT to be the same for all equivalent types, which is not the
case.  And I adjusted some comments that I disagree with; the iobj parameter
adjustment only applies to overload resolution, we can handle that in
cand_parms_match (and I have WIP for that).

gcc/cp/ChangeLog:

	* call.cc (build_over_call): Refactor handle_arg lambda.
	* class.cc (xobj_iobj_parameters_correspond): Fix FIXME.
	* method.cc (defaulted_late_check): Adjust comments.
---
 gcc/cp/call.cc   | 24 
 gcc/cp/class.cc  | 40 
 gcc/cp/method.cc |  7 +--
 3 files changed, 25 insertions(+), 46 deletions(-)

diff --git a/gcc/cp/call.cc b/gcc/cp/call.cc
index dca8e5090e2..7d3d67600c8 100644
--- a/gcc/cp/call.cc
+++ b/gcc/cp/call.cc
@@ -10187,11 +10187,11 @@ build_over_call (struct z_candidate *cand, int flags, tsubst_flags_t complain)
   parm = TREE_CHAIN (parm);
 }
 
-  auto handle_arg = [fn, flags, complain](tree type,
-	  tree arg,
-	  int const param_index,
-	  conversion *conv,
-	  bool const conversion_warning)
+  auto handle_arg = [fn, flags](tree type,
+tree arg,
+int const param_index,
+conversion *conv,
+tsubst_flags_t const arg_complain)
 {
   /* Set user_conv_p on the argument conversions, so rvalue/base handling
 	 knows not to allow any more UDCs.  This needs to happen after we
@@ -10199,9 +10199,6 @@ build_over_call (struct z_candidate *cand, int flags, tsubst_flags_t complain)
   if (flags & LOOKUP_NO_CONVERSION)
 	conv->user_conv_p = true;
 
-  tsubst_flags_t const arg_complain
-	= conversion_warning ? complain : complain & ~tf_warning;
-
   if (arg_complain & tf_warning)
 	maybe_warn_pessimizing_move (arg, type, /*return_p=*/false);
 
@@ -10214,13 +10211,12 @@ build_over_call (struct z_candidate *cand, int flags, tsubst_flags_t complain)
   if (DECL_XOBJ_MEMBER_FUNCTION_P (fn))
 {
   gcc_assert (cand->num_convs > 0);
-  static constexpr bool conversion_warning = true;
   tree object_arg = consume_object_arg ();
   val = handle_arg (TREE_VALUE (parm),
 			object_arg,
 			param_index++,
 			convs[conv_index++],
-			conversion_warning);
+			complain);
 
   if (val == error_mark_node)
 	return error_mark_node;
@@ -10260,11 +10256,14 @@ build_over_call (struct z_candidate *cand, int flags, tsubst_flags_t complain)
 	&& cand->template_decl
 	&& !cand->explicit_targs);
 
+  tsubst_flags_t const arg_complain
+	= conversion_warning ? complain : complain & ~tf_warning;
+
   val = handle_arg (TREE_VALUE (parm),
 			current_arg,
 			param_index,
 			convs[conv_index],
-			conversion_warning);
+			arg_complain);
 
   if (val == error_mark_node)
 	return error_mark_node;
@@ -10273,7 +10272,8 @@ build_over_call (struct z_candidate *cand, int flags, tsubst_flags_t complain)
 }
 
   /* Default arguments */
-  for (; parm && parm != void_list_node; parm = TREE_CHAIN (parm), param_index++)
+  for (; parm && parm != void_list_node;
+   parm = TREE_CHAIN (parm), param_index++)
 {
   if (TREE_VALUE (parm) == error_mark_node)
 	return error_mark_node;
diff --git a/gcc/cp/class.cc b/gcc/cp/class.cc
index f3cfa9f0f23..e5e609badf3 100644
--- a/gcc/cp/class.cc
+++ b/gcc/cp/class.cc
@@ -1020,9 +1020,12 @@ modify_vtable_entry (tree t,
 
 
 /* Check if the object parameters of an xobj and iobj member function
-   correspond. This function assumes that the iobj parameter has been 

[PATCH] Fix spurious match in extract_symvers

2024-01-09 Thread Andreas Schwab
Tighten the regex to find the start of the .dynsym symtab in the readelf
output to avoid matching the section symbol in the normal symtab.
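
The effect of the tightened sed address can be reproduced with `std::regex`. The sample lines used with it are assumptions modelled on typical GNU `readelf -s` output: the dynamic symbol table header, which ends in a colon, and a section symbol named `.dynsym` in the normal symbol table, which does not. The function names are hypothetical.

```cpp
#include <cassert>
#include <regex>
#include <string>

// The old and new sed range-start addresses from extract_symvers.in,
// written as ECMAScript regexes.  Without the multiline flag, '$' matches
// only at the end of the string, so each call tests one readelf line.
bool starts_range_old(const std::string& line)
{
  static const std::regex re(R"(\.dynsym)");
  return std::regex_search(line, re);
}

bool starts_range_new(const std::string& line)
{
  static const std::regex re(R"(\.dynsym.*:$)");
  return std::regex_search(line, re);
}
```

A header such as `Symbol table '.dynsym' contains 12 entries:` satisfies both addresses, while a section-symbol line ending in `SECTION LOCAL  DEFAULT    3 .dynsym` satisfies only the old one, so the old range could start too early.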

libstdc++-v3:
* scripts/extract_symvers.in: Require final colon to only match
.dynsym in the header of the dynamic symtab.
---
 libstdc++-v3/scripts/extract_symvers.in | 2 +-
 1 file changed, 1 insertion(+), 1 deletion(-)

diff --git a/libstdc++-v3/scripts/extract_symvers.in b/libstdc++-v3/scripts/extract_symvers.in
index 17f0d31bd1c..6bb951c7145 100755
--- a/libstdc++-v3/scripts/extract_symvers.in
+++ b/libstdc++-v3/scripts/extract_symvers.in
@@ -52,7 +52,7 @@ SunOS)
   # Omit _DYNAMIC etc. for consistency with extract_symvers.pl, only
   # present on Solaris.
   ${readelf} ${lib} |\
-  sed -e 's/ \[: [A-Fa-f0-9]*\] //' -e '/\.dynsym/,/^$/p;d' |\
+  sed -e 's/ \[: [A-Fa-f0-9]*\] //' -e '/\.dynsym.*:$/,/^$/p;d' |\
   sed -e 's/ \[: [0-9]*\] //' |\
   grep -E -v ' (LOCAL|UND) ' |\
  grep -E -v ' (_DYNAMIC|_GLOBAL_OFFSET_TABLE_|_PROCEDURE_LINKAGE_TABLE_|_edata|_end|_etext)$' |\
-- 
2.43.0


-- 
Andreas Schwab, sch...@linux-m68k.org
GPG Key fingerprint = 7578 EB47 D4E5 4D69 2510  2552 DF73 E780 A9DA AEC1
"And now for something completely different."


Re: [PATCH v8 1/4] c++: P0847R7 (deducing this) - prerequisite changes. [PR102609]

2024-01-09 Thread waffl3x
On Tuesday, January 9th, 2024 at 1:58 PM, Jason Merrill  
wrote:


> 
> 
> On 1/6/24 19:00, waffl3x wrote:
> 
> > Bootstrapped and tested on x86_64-linux with no regressions.
> > 
> > I'm considering this finished, I have CWG2586 working but I have not
> > included it in this version of the patch. I was not happy with the
> > amount of work I had done on it. I will try to get it finished before
> > we get cut off, and I'm pretty sure I can. I just don't want to risk
> > missing the boat for the whole patch just for that.
> > 
> > There aren't too many changes from v7, it's mostly just cleaned up.
> > There are a few though, so do take a look, if there's anything severe I
> > can rush to fix it if necessary.
> > 
> > That's all, hopefully all is good, fingers crossed.
> 
> 
> Great. Given where we are in the release cycle, I'm thinking to put
> these with only minimal changes and then do any further adjustments
> afterward.
> 
> For this first one, I needed to fix the commit message to wrap at 75
> columns so that it fits in 80 columns with the initial padding added by
> 'git log'. I also needed to adjust the ChangeLog entry to please git
> gcc-verify. And I changed the credit note to a Co-authored-by.
> 
> The other commit messages only needed wrapping.
> 
> Thanks!
> 
> Jason

Sounds good to me, it's been a pleasure working with you. If there's
anything you would like me to do differently for future patches just
let me know. For now, I think I will take a break.

Alex


Re: [PATCH v8 1/4] c++: P0847R7 (deducing this) - prerequisite changes. [PR102609]

2024-01-09 Thread Jason Merrill

On 1/6/24 19:00, waffl3x wrote:

> Bootstrapped and tested on x86_64-linux with no regressions.
>
> I'm considering this finished, I have CWG2586 working but I have not
> included it in this version of the patch. I was not happy with the
> amount of work I had done on it. I will try to get it finished before
> we get cut off, and I'm pretty sure I can. I just don't want to risk
> missing the boat for the whole patch just for that.
>
> There aren't too many changes from v7, it's mostly just cleaned up.
> There are a few though, so do take a look, if there's anything severe I
> can rush to fix it if necessary.
>
> That's all, hopefully all is good, fingers crossed.


Great.  Given where we are in the release cycle, I'm thinking to put 
these with only minimal changes and then do any further adjustments 
afterward.


For this first one, I needed to fix the commit message to wrap at 75 
columns so that it fits in 80 columns with the initial padding added by 
'git log'.  I also needed to adjust the ChangeLog entry to please git 
gcc-verify.  And I changed the credit note to a Co-authored-by.


The other commit messages only needed wrapping.
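
The 75-column rule is simple arithmetic: git log indents each line of the message by four spaces, and 75 + 4 = 79 fits within 80 columns. A small checker could look like the following; it is a sketch with hypothetical names and is not part of GCC's git gcc-verify tooling.

```cpp
#include <cassert>
#include <cstddef>
#include <sstream>
#include <string>
#include <vector>

// git log indents each message line by four spaces, so wrapping at 75
// columns keeps every line inside an 80-column terminal.
constexpr std::size_t kGitLogIndent = 4;
constexpr std::size_t kMaxMessageWidth = 75;
static_assert(kGitLogIndent + kMaxMessageWidth < 80, "must fit in 80 columns");

// Return the 1-based numbers of the lines in MESSAGE longer than LIMIT.
// Width is counted in bytes: tabs and multi-byte UTF-8 characters are not
// converted to display columns.
std::vector<std::size_t> overlong_lines(const std::string& message,
                                        std::size_t limit = kMaxMessageWidth)
{
  std::vector<std::size_t> result;
  std::istringstream in(message);
  std::string line;
  for (std::size_t n = 1; std::getline(in, line); ++n)
    if (line.size() > limit)
      result.push_back(n);
  return result;
}
```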

Thanks!

Jason



Re: [PATCH v4] AArch64: Cleanup memset expansion

2024-01-09 Thread Wilco Dijkstra
Hi Richard,

>> +#define MAX_SET_SIZE(speed) (speed ? 256 : 96)
>
> Since this isn't (AFAIK) a standard macro, there doesn't seem to be
> any need to put it in the header file.  It could just go at the head
> of aarch64.cc instead.

Sure, I've moved it in v4.

>> +  if (len <= 24 || (aarch64_tune_params.extra_tuning_flags
>> +   & AARCH64_EXTRA_TUNE_NO_LDP_STP_QREGS))
>> +    set_max = 16;
>
> I think we should take the tuning parameter into account when applying
> the MAX_SET_SIZE limit for -Os.  Shouldn't it be 48 rather than 96 in
> that case?  (Alternatively, I suppose it would make sense to ignore
> the param for -Os, although we don't seem to do that elsewhere.)

That tune is only used by an obsolete core. I ran the memcpy and memset
benchmarks from Optimized Routines on xgene-1 with and without LDP/STP.
There is no measurable penalty for using LDP/STP. I'm not sure why it was
ever added given it does not do anything useful. I'll post a separate patch
to remove it to reduce the maintenance overhead.

Cheers,
Wilco


Here is v4 (move MAX_SET_SIZE definition to aarch64.cc):

Cleanup memset implementation.  Similar to memcpy/memmove, use an offset and
bytes throughout.  Simplify the complex calculations when optimizing for size
by using a fixed limit.
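
The size limits in this thread can be illustrated with a simplified model of how a byte-offset based expansion might split a fixed-length memset into stores. This is a hypothetical sketch, not the patch's code: it takes `MAX_SET_SIZE` and the 16/32-byte `set_max` choice from the quoted snippets, but the store-selection loop is an assumption and the patch's exact tail handling is not reproduced.

```cpp
#include <cassert>
#include <utility>
#include <vector>

// Limit from the patch: 256 bytes when optimizing for speed, 96 bytes
// (three 32-byte STPs) for -Os.
constexpr int max_set_size(bool speed) { return speed ? 256 : 96; }

// Hypothetical model: split a memset of LEN bytes into (offset, bytes)
// stores.  Each store uses the largest power of two that fits both the
// remaining length and SET_MAX (32 = a Q-register STP, 16 = a single
// Q-register store); the offset then advances by that many bytes.
std::vector<std::pair<int, int>> plan_memset(int len, bool speed,
                                             bool no_qreg_pairs = false)
{
  std::vector<std::pair<int, int>> stores;
  if (len > max_set_size(speed))
    return stores;  // too big to expand inline
  int set_max = (len <= 24 || no_qreg_pairs) ? 16 : 32;
  int offset = 0;
  while (len > 0)
    {
      int bytes = set_max;
      while (bytes > len)
        bytes /= 2;
      stores.emplace_back(offset, bytes);
      offset += bytes;
      len -= bytes;
    }
  return stores;
}
```

For example, a 100-byte memset at -O2 becomes stores of 32, 32, 32 and 4 bytes at offsets 0, 32, 64 and 96, while at -Os anything above 96 bytes is not expanded inline.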

Passes regress/bootstrap, OK for commit?

gcc/ChangeLog:
* config/aarch64/aarch64.cc (MAX_SET_SIZE): New define.
(aarch64_progress_pointer): Remove function.
(aarch64_set_one_block_and_progress_pointer): Simplify and clean up.
(aarch64_expand_setmem): Clean up implementation, use byte offsets,
simplify size calculation.

---

diff --git a/gcc/config/aarch64/aarch64.cc b/gcc/config/aarch64/aarch64.cc
index a5a6b52730d6c5013346d128e89915883f1707ae..62f4eee429c1c5195d54604f1d341a8a5a499d89 100644
--- a/gcc/config/aarch64/aarch64.cc
+++ b/gcc/config/aarch64/aarch64.cc
@@ -101,6 +101,10 @@
 /* Defined for convenience.  */
 #define POINTER_BYTES (POINTER_SIZE / BITS_PER_UNIT)
 
+/* Maximum bytes set for an inline memset expansion.  With -Os use 3 STP
+   and 1 MOVI/DUP (same size as a call).  */
+#define MAX_SET_SIZE(speed) (speed ? 256 : 96)
+
 /* Flags that describe how a function shares certain architectural state
with its callers.
 
@@ -26321,15 +26325,6 @@ aarch64_move_pointer (rtx pointer, poly_int64 amount)
next, amount);
 }
 
-/* Return a new RTX holding the result of moving POINTER forward by the
-   size of the mode it points to.  */
-
-static rtx
-aarch64_progress_pointer (rtx pointer)
-{
-  return aarch64_move_pointer (pointer, GET_MODE_SIZE (GET_MODE (pointer)));
-}
-
 typedef auto_vec, 12> copy_ops;
 
 /* Copy one block of size MODE from SRC to DST at offset OFFSET.  */
@@ -26484,45 +26479,21 @@ aarch64_expand_cpymem (rtx *operands, bool is_memmove)
   return true;
 }
 
-/* Like aarch64_copy_one_block_and_progress_pointers, except for memset where
-   SRC is a register we have created with the duplicated value to be set.  */
+/* Set one block of size MODE at DST at offset OFFSET to value in SRC.  */
 static void
-aarch64_set_one_block_and_progress_pointer (rtx src, rtx *dst,
-   machine_mode mode)
+aarch64_set_one_block (rtx src, rtx dst, int offset, machine_mode mode)
 {
-  /* If we are copying 128bits or 256bits, we can do that straight from
- the SIMD register we prepared.  */
-  if (known_eq (GET_MODE_BITSIZE (mode), 256))
-{
-  mode = GET_MODE (src);
-  /* "Cast" the *dst to the correct mode.  */
-  *dst = adjust_address (*dst, mode, 0);
-  /* Emit the memset.  */
-  emit_insn (aarch64_gen_store_pair (*dst, src, src));
-
-  /* Move the pointers forward.  */
-  *dst = aarch64_move_pointer (*dst, 32);
-  return;
-}
-  if (known_eq (GET_MODE_BITSIZE (mode), 128))
+  /* Emit explicit store pair instructions for 32-byte writes.  */
+  if (known_eq (GET_MODE_SIZE (mode), 32))
 {
-  /* "Cast" the *dst to the correct mode.  */
-  *dst = adjust_address (*dst, GET_MODE (src), 0);
-  /* Emit the memset.  */
-  emit_move_insn (*dst, src);
-  /* Move the pointers forward.  */
-  *dst = aarch64_move_pointer (*dst, 16);
+  mode = V16QImode;
+  rtx dst1 = adjust_address (dst, mode, offset);
+  emit_insn (aarch64_gen_store_pair (dst1, src, src));
   return;
 }
-  /* For copying less, we have to extract the right amount from src.  */
-  rtx reg = lowpart_subreg (mode, src, GET_MODE (src));
-
-  /* "Cast" the *dst to the correct mode.  */
-  *dst = adjust_address (*dst, mode, 0);
-  /* Emit the memset.  */
-  emit_move_insn (*dst, reg);
-  /* Move the pointer forward.  */
-  *dst = aarch64_progress_pointer (*dst);
+  if (known_lt (GET_MODE_SIZE (mode), 16))
+src = lowpart_subreg (mode, src, GET_MODE (src));
+  emit_move_insn (adjust_address (dst, mode, offset), src);
 }
 
 /* Expand a setmem using the MOPS instructions.  

Re: [PATCH] aarch64: Fix dwarf2cfi ICEs due to recent CFI note changes [PR113077]

2024-01-09 Thread Richard Sandiford
Alex Coplan  writes:
> Hi,
>
> In r14-6604-gd7ee988c491cde43d04fe25f2b3dbad9d85ded45 we changed the CFI notes
> attached to callee saves (in aarch64_save_callee_saves).  That patch changed
> the ldp/stp representation to use unspecs instead of PARALLEL moves.  This
> meant that we needed to attach CFI notes to all frame-related pair saves such
> that dwarf2cfi could still emit the appropriate CFI (it cannot interpret the
> unspecs directly).  The patch also attached REG_CFA_OFFSET notes to individual
> saves so that the ldp/stp pass could easily preserve them when forming stps.
>
> In that change I chose to use REG_CFA_OFFSET, but as the PR shows, that
> choice was problematic in that REG_CFA_OFFSET requires the attached
> store to be expressed in terms of the current CFA register at all times.
> This means that even scheduling of frame-related insns can break this
> invariant, leading to ICEs in dwarf2cfi.
>
> The old behaviour (before that change) allowed dwarf2cfi to interpret the RTL
> directly for sp-relative saves.  This change restores that behaviour by using
> REG_FRAME_RELATED_EXPR instead of REG_CFA_OFFSET.  REG_FRAME_RELATED_EXPR
> effectively just gives a different pattern for dwarf2cfi to look at instead of
> the main insn pattern.  That allows us to attach the old-style PARALLEL move
> representation in a REG_FRAME_RELATED_EXPR note and means we are free to 
> always
> express the save addresses in terms of the stack pointer.
>
> Since the ldp/stp fusion pass can combine frame-related stores, this patch
> also updates it to preserve REG_FRAME_RELATED_EXPR notes, and additionally
> gives it the ability to synthesize those notes when combining sp-relative
> saves into an stp (the latter always needs a note due to the unspec
> representation, the former does not).
>
> Bootstrapped/regtested on aarch64-linux-gnu, OK for trunk?
>
> Thanks,
> Alex
>
> gcc/ChangeLog:
>
>   PR target/113077
>   * config/aarch64/aarch64-ldp-fusion.cc (filter_notes): Add fr_expr param
>   to extract REG_FRAME_RELATED_EXPR notes.
>   (combine_reg_notes): Handle REG_FRAME_RELATED_EXPR notes, and
>   synthesize these if needed.  Update caller ...
>   (ldp_bb_info::fuse_pair): ... here.
>   * config/aarch64/aarch64.cc (aarch64_save_callee_saves): Use
>   REG_FRAME_RELATED_EXPR instead of REG_CFA_OFFSET.
>
> gcc/testsuite/ChangeLog:
>
>   PR target/113077
>   * gcc.target/aarch64/pr113077.c: New test.

Thanks, mostly looks good, but some comments below.

> diff --git a/gcc/config/aarch64/aarch64-ldp-fusion.cc 
> b/gcc/config/aarch64/aarch64-ldp-fusion.cc
> index 2fe1b1d4d84..00bc8b749c8 100644
> --- a/gcc/config/aarch64/aarch64-ldp-fusion.cc
> +++ b/gcc/config/aarch64/aarch64-ldp-fusion.cc
> @@ -904,9 +904,11 @@ aarch64_operand_mode_for_pair_mode (machine_mode mode)
>  // Go through the reg notes rooted at NOTE, dropping those that we should drop,
>  // and preserving those that we want to keep by prepending them to (and
>  // returning) RESULT.  EH_REGION is used to make sure we have at most one
> -// REG_EH_REGION note in the resulting list.
> +// REG_EH_REGION note in the resulting list.  FR_EXPR is used to return any
> +// REG_FRAME_RELATED_EXPR note we find, as these can need special handling in
> +// combine_reg_notes.
>  static rtx
> -filter_notes (rtx note, rtx result, bool *eh_region)
> +filter_notes (rtx note, rtx result, bool *eh_region, rtx *fr_expr)
>  {
>for (; note; note = XEXP (note, 1))
>  {
> @@ -940,6 +942,10 @@ filter_notes (rtx note, rtx result, bool *eh_region)
>  copy_rtx (XEXP (note, 0)),
>  result);
> break;
> + case REG_FRAME_RELATED_EXPR:
> +   gcc_assert (!*fr_expr);
> +   *fr_expr = copy_rtx (XEXP (note, 0));
> +   break;
>   default:
> // Unexpected REG_NOTE kind.
> gcc_unreachable ();
> @@ -951,13 +957,65 @@ filter_notes (rtx note, rtx result, bool *eh_region)
>  
>  // Return the notes that should be attached to a combination of I1 and I2, where
>  // *I1 < *I2.
> +//
> +// LOAD_P is true for loads, REVERSED is true if the insns in
> +// program order are not in offset order, BASE_REGNO is the chosen base
> +// register number for the pair, and PATS gives the final RTL patterns for the
> +// accesses.
>  static rtx
> -combine_reg_notes (insn_info *i1, insn_info *i2)
> +combine_reg_notes (insn_info *i1, insn_info *i2,
> +bool load_p, bool reversed,
> +int base_regno, rtx pats[2])
>  {
> +  // Temporary storage for REG_FRAME_RELATED_EXPR notes.
> +  rtx fr_expr[2] = {};
> +
>bool found_eh_region = false;
>rtx result = NULL_RTX;
> -  result = filter_notes (REG_NOTES (i2->rtl ()), result, &found_eh_region);
> -  return filter_notes (REG_NOTES (i1->rtl ()), result, &found_eh_region);
> +  result = filter_notes (REG_NOTES (i2->rtl ()), result,
> +			 &found_eh_region, fr_expr);

[PATCH] Fix debug info for enumeration types with reverse Scalar_Storage_Order

2024-01-09 Thread Eric Botcazou
Hi,

this is not really a regression but the patch was written last week and is 
quite straightforward, so hopefully can nevertheless be OK.  It implements the 
support of DW_AT_endianity for enumeration types because they are scalar and, 
therefore, reverse Scalar_Storage_Order is supported for them, but only when 
the -gstrict-dwarf switch is not passed because this is an extension.
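
A minimal C sketch of the kind of type this affects, assuming GCC's `scalar_storage_order` type attribute (the C-family counterpart of Ada's Scalar_Storage_Order) and a little-endian target, where big-endian order is the reversed one. Compiled with `-g`, the enumeration type of field `c` is where one would expect the new DW_AT_endianity to appear (for instance in `readelf --debug-dump=info` output), unless `-gstrict-dwarf` is passed. All names are illustrative.

```c
/* scalar_storage_order is a GCC type attribute; guard it so the sketch
   still compiles (in native order) with compilers that lack it.  */
#if defined (__GNUC__) && !defined (__clang__)
# define BIG_ENDIAN_SSO __attribute__ ((scalar_storage_order ("big-endian")))
#else
# define BIG_ENDIAN_SSO
#endif

/* An enumeration type: it is scalar, so the storage order applies to it.  */
enum color { RED = 1, GREEN = 2, MAGIC = 0x01020304 };

/* With GCC, every scalar field, the enum included, is stored big-endian.  */
struct BIG_ENDIAN_SSO be_record
{
  enum color c;
  unsigned int count;
};

/* Store and reload a value through the reversed-order field; the compiler
   inserts any byte swaps, so the value read back is unchanged.  */
enum color
round_trip (enum color value)
{
  struct be_record r;

  r.c = value;
  r.count = 0;
  return r.c;
}
```

Before this patch only `count` (a base type) could get the attribute; the debugger would otherwise have no way to know that `c` is stored byte-swapped.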

There is an associated GDB patch to be submitted by Tom to grok the new DWARF.

Tested on x86-64/Linux, OK for the mainline?  It may also help the GDB side to 
backport it for the upcoming 13.3 release.


2024-01-09  Eric Botcazou  

* dwarf2out.cc (modified_type_die): Extend the support of reverse
storage order to enumeration types if -gstrict-dwarf is not passed.
(gen_enumeration_type_die): Add REVERSE parameter and generate the
DIE immediately after the existing one if it is true.
(gen_tagged_type_die): Add REVERSE parameter and pass it in the
call to gen_enumeration_type_die.
(gen_type_die_with_usage): Add REVERSE parameter and pass it in the
first recursive call as well as the call to gen_tagged_type_die.
(gen_type_die): Add REVERSE parameter and pass it in the call to
gen_type_die_with_usage.

-- 
Eric Botcazou

diff --git a/gcc/dwarf2out.cc b/gcc/dwarf2out.cc
index 2f9010bc3cb..1c994bb8b9b 100644
--- a/gcc/dwarf2out.cc
+++ b/gcc/dwarf2out.cc
@@ -3940,7 +3940,7 @@ static void gen_descr_array_type_die (tree, struct array_descr_info *, dw_die_re
 #if 0
 static void gen_entry_point_die (tree, dw_die_ref);
 #endif
-static dw_die_ref gen_enumeration_type_die (tree, dw_die_ref);
+static dw_die_ref gen_enumeration_type_die (tree, dw_die_ref, bool);
 static dw_die_ref gen_formal_parameter_die (tree, tree, bool, dw_die_ref);
 static dw_die_ref gen_formal_parameter_pack_die  (tree, tree, dw_die_ref, tree*);
 static void gen_unspecified_parameters_die (tree, dw_die_ref);
@@ -3960,7 +3960,7 @@ static void gen_struct_or_union_type_die (tree, dw_die_ref,
 		enum debug_info_usage);
 static void gen_subroutine_type_die (tree, dw_die_ref);
 static void gen_typedef_die (tree, dw_die_ref);
-static void gen_type_die (tree, dw_die_ref);
+static void gen_type_die (tree, dw_die_ref, bool = false);
 static void gen_block_die (tree, dw_die_ref);
 static void decls_for_scope (tree, dw_die_ref, bool = true);
 static bool is_naming_typedef_decl (const_tree);
@@ -3976,8 +3976,10 @@ static struct dwarf_file_data * lookup_filename (const char *);
 static void retry_incomplete_types (void);
 static void gen_type_die_for_member (tree, tree, dw_die_ref);
 static void gen_generic_params_dies (tree);
-static void gen_tagged_type_die (tree, dw_die_ref, enum debug_info_usage);
-static void gen_type_die_with_usage (tree, dw_die_ref, enum debug_info_usage);
+static void gen_tagged_type_die (tree, dw_die_ref, enum debug_info_usage,
+ bool = false);
+static void gen_type_die_with_usage (tree, dw_die_ref, enum debug_info_usage,
+ bool = false);
 static void splice_child_die (dw_die_ref, dw_die_ref);
 static int file_info_cmp (const void *, const void *);
 static dw_loc_list_ref new_loc_list (dw_loc_descr_ref, const char *, var_loc_view,
@@ -13665,8 +13667,11 @@ modified_type_die (tree type, int cv_quals, bool reverse,
   const int cv_qual_mask = (TYPE_QUAL_CONST | TYPE_QUAL_VOLATILE
 			| TYPE_QUAL_RESTRICT | TYPE_QUAL_ATOMIC | 
 			ENCODE_QUAL_ADDR_SPACE(~0U));
-  const bool reverse_base_type
-    = need_endianity_attribute_p (reverse) && is_base_type (type);
+  /* DW_AT_endianity is specified only for base types in the standard.  */
+  const bool reverse_type
+    = need_endianity_attribute_p (reverse)
+      && (is_base_type (type)
+	  || (TREE_CODE (type) == ENUMERAL_TYPE && !dwarf_strict));
 
   if (code == ERROR_MARK)
 return NULL;
@@ -13726,9 +13731,9 @@ modified_type_die (tree type, int cv_quals, bool reverse,
 
   /* DW_AT_endianity doesn't come from a qualifier on the type, so it is
 	 dealt with specially: the DIE with the attribute, if it exists, is
-	 placed immediately after the regular DIE for the same base type.  */
+	 placed immediately after the regular DIE for the same type.  */
   if (mod_type_die
-	  && (!reverse_base_type
+	  && (!reverse_type
 	  || ((mod_type_die = mod_type_die->die_sib) != NULL
 		  && get_AT_unsigned (mod_type_die, DW_AT_endianity
 	return mod_type_die;
@@ -13745,7 +13750,7 @@ modified_type_die (tree type, int cv_quals, bool reverse,
   tree dtype = TREE_TYPE (name);
 
   /* Skip the typedef for base types with DW_AT_endianity, no big deal.  */
-  if (qualified_type == dtype && !reverse_base_type)
+  if (qualified_type == dtype && !reverse_type)
 	{
 	  tree origin = decl_ultimate_origin (name);
 
@@ -13952,7 +13957,7 @@ modified_type_die (tree type, int cv_quals, bool reverse,
 	mod_type_die = base_type_die (type, reverse);
 
   /* The DIE with DW_AT_endianity is placed right 

Re: [PATCH 6/4] libbacktrace: Add loaded dlls after initialize

2024-01-09 Thread Björn Schäpers

Am 07.01.2024 um 18:03 schrieb Eli Zaretskii:

Date: Sun, 7 Jan 2024 17:07:06 +0100
Cc: i...@google.com, gcc-patches@gcc.gnu.org, g...@gcc.gnu.org
From: Björn Schäpers 


That was about GetModuleHandle, not about GetModuleHandleEx.  For the
latter, all Windows versions that support it also support "wide" APIs.
So my suggestion is to use GetModuleHandleExW here.  However, you will
need to make sure that notification_data->dll_base is declared as
'wchar_t *', not 'char *'.  If dll_base is declared as 'char *', then
only GetModuleHandleExA will work, and you will lose the ability to
support file names with non-ASCII characters outside of the current
system codepage.


The dll_base is a PVOID. With the GET_MODULE_HANDLE_EX_FLAG_FROM_ADDRESS flag
GetModuleHandleEx does not look for a name, but uses an address in the module to
get the HMODULE, so you cast it to char* or wchar_t* depending on which function
you call. Actually one could just cast the dll_base to HMODULE; at least in
win32 on x86 the HMODULE of a dll is always its base address. But to make it
safer and future-proof I went through GetModuleHandleEx.


In that case, you can call either GetModuleHandleExA or
GetModuleHandleExW; the difference is minor.


Here an updated version without relying on TEXT or TCHAR, directly calling 
GetModuleHandleExW.


Kind regards,
Björn.

From a8e1e64ccb56158ec8a7e5de0d5228f3f6f7e5c4 Mon Sep 17 00:00:00 2001
From: Björn Schäpers
Date: Sat, 6 Jan 2024 22:53:54 +0100
Subject: [PATCH 3/3] libbacktrace: Add loaded dlls after initialize

libbacktrace/Changelog:

* pecoff.c [HAVE_WINDOWS_H]:
  (dll_notification_data): Added
  (dll_notification_context): Added
  (dll_notification): Added
  (backtrace_initialize): Use LdrRegisterDllNotification to load
  debug information of later loaded dlls.
---
 libbacktrace/pecoff.c | 104 +-
 1 file changed, 103 insertions(+), 1 deletion(-)

diff --git a/libbacktrace/pecoff.c b/libbacktrace/pecoff.c
index 647baa39640..d973a26f05a 100644
--- a/libbacktrace/pecoff.c
+++ b/libbacktrace/pecoff.c
@@ -62,6 +62,34 @@ POSSIBILITY OF SUCH DAMAGE.  */
 #undef Module32Next
 #endif
 #endif
+
+#if defined(_ARM_)
+#define NTAPI
+#else
+#define NTAPI __stdcall
+#endif
+
+/* This is a simplified (but binary compatible) version of what Microsoft
+   defines in their documentation. */
+struct dll_notifcation_data
+{
+  ULONG reserved;
+  /* The name as UNICODE_STRING struct. */
+  PVOID full_dll_name;
+  PVOID base_dll_name;
+  PVOID dll_base;
+  ULONG size_of_image;
+};
+
+#define LDR_DLL_NOTIFICATION_REASON_LOADED 1
+
+typedef LONG NTSTATUS;
+typedef VOID CALLBACK (*LDR_DLL_NOTIFICATION)(ULONG,
+ struct dll_notifcation_data*,
+ PVOID);
+typedef NTSTATUS NTAPI (*LDR_REGISTER_FUNCTION)(ULONG,
+   LDR_DLL_NOTIFICATION, PVOID,
+   PVOID*);
 #endif
 
 /* Coff file header.  */
@@ -912,7 +940,8 @@ coff_add (struct backtrace_state *state, int descriptor,
   return 0;
 }
 
-#if defined(HAVE_WINDOWS_H) && !defined(HAVE_TLHELP32_H)
+#ifdef HAVE_WINDOWS_H
+#ifndef HAVE_TLHELP32_H
 static void
 free_modules (struct backtrace_state *state,
  backtrace_error_callback error_callback, void *data,
@@ -958,6 +987,51 @@ get_all_modules (struct backtrace_state *state,
 }
 }
 #endif
+struct dll_notification_context
+{
+  struct backtrace_state *state;
+  backtrace_error_callback error_callback;
+  void *data;
+};
+
+VOID CALLBACK
+dll_notification (ULONG reason,
+                  struct dll_notifcation_data *notification_data,
+                  PVOID context)
+{
+  char module_name[MAX_PATH];
+  int descriptor;
+  struct dll_notification_context* dll_context =
+    (struct dll_notification_context*) context;
+  struct backtrace_state *state = dll_context->state;
+  void *data = dll_context->data;
+  backtrace_error_callback error_callback = dll_context->error_callback;
+  fileline fileline;
+  int found_sym;
+  int found_dwarf;
+  HMODULE module_handle;
+
+  if (reason != LDR_DLL_NOTIFICATION_REASON_LOADED)
+    return;
+
+  if (!GetModuleHandleExW (GET_MODULE_HANDLE_EX_FLAG_FROM_ADDRESS
+                           | GET_MODULE_HANDLE_EX_FLAG_UNCHANGED_REFCOUNT,
+                           (wchar_t*) notification_data->dll_base,
+                           &module_handle))
+    return;
+
+  if (!GetModuleFileNameA ((HMODULE) module_handle, module_name, MAX_PATH - 1))
+    return;
+
+  descriptor = backtrace_open (module_name, error_callback, data, NULL);
+
+  if (descriptor < 0)
+    return;
+
+  coff_add (state, descriptor, error_callback, data, &fileline, &found_sym,
+            &found_dwarf, (uintptr_t) module_handle);
+}
+#endif
 
 /* Initialize the backtrace data we need from an ELF executable.  At
the ELF level, all we need to do 

Re: [PATCH] Add support for function attributes and variable attributes

2024-01-09 Thread David Malcolm
On Wed, 2023-11-15 at 17:53 +0100, Guillaume Gomez wrote:
> Hi,
> 
> This patch adds the (incomplete) support for function and variable
> attributes. The added attributes are the ones we're using in
> rustc_codegen_gcc but all the groundwork is done to add more (and we
> will very likely add more as we didn't add all the ones we use in
> rustc_codegen_gcc yet).
> 
> The only big question with this patch is about `inline`. We currently
> handle it as an attribute because it is more convenient for us but is
> it ok or should we create a separate function to mark a function as
> inlined?
> 
> Thanks in advance for the review.

Thanks for the patch; sorry for the delay in reviewing.

At a high-level I think the API is OK as-is, but I have some nitpicks
with the implementation:

[...snip...]

> diff --git a/gcc/jit/docs/topics/types.rst b/gcc/jit/docs/topics/types.rst
> index d8c1d15d69d..6c72c99cbd9 100644
> --- a/gcc/jit/docs/topics/types.rst
> +++ b/gcc/jit/docs/topics/types.rst

[...snip...]

> +.. function::  void\
> +   gcc_jit_lvalue_add_string_attribute (gcc_jit_lvalue *variable,
> +                                        enum gcc_jit_fn_attribute attribute,
                                                ^^^^^^^^^^^^^^^^^^^^

This got out of sync with the declaration in the header file; it should
be
enum gcc_jit_variable_attribute attribute

[...snip...]

> diff --git a/gcc/jit/dummy-frontend.cc b/gcc/jit/dummy-frontend.cc
> index a729086bafb..898b4d6e7f8 100644
> --- a/gcc/jit/dummy-frontend.cc
> +++ b/gcc/jit/dummy-frontend.cc

It's unfortunate that jit/dummy-frontend.cc has its own copy of the
material in c-common/c-attribs.cc.  I glanced through this code, and it
seems that there are already various differences between the two copies
in the existing code, and the patch adds more such differences.

Bother - but I think this part of the patch is inevitable (and OK)
given the existing state of attribute handling here.

[...snip...]

I took a brief look through the handler functions and with the above
caveat I didn't see anything obviously wrong.  I'm going to assume this
code is OK given that presumably you've been testing it within rustcc,
right?

[...snip...]

> diff --git a/gcc/jit/libgccjit.cc b/gcc/jit/libgccjit.cc
> index 0451b4df7f9..337d4ea3b95 100644
> --- a/gcc/jit/libgccjit.cc
> +++ b/gcc/jit/libgccjit.cc
> @@ -3965,6 +3965,51 @@ gcc_jit_type_get_aligned (gcc_jit_type *type,
>return (gcc_jit_type *)type->get_aligned (alignment_in_bytes);
>  }
>  
> +void
> +gcc_jit_function_add_attribute (gcc_jit_function *func,
> + gcc_jit_fn_attribute attribute)
> +{
> +  RETURN_IF_FAIL (func, NULL, NULL, "NULL func");
> +
> +  func->add_attribute (attribute);

Ideally should validate parameter "attribute" here with a
RETURN_IF_FAIL.

> +}
> +
> +void
> +gcc_jit_function_add_string_attribute (gcc_jit_function *func,
> +gcc_jit_fn_attribute attribute,
> +const char* value)
> +{
> +  RETURN_IF_FAIL (func, NULL, NULL, "NULL func");

Likewise, ideally should validate parameter "attribute" here with a
RETURN_IF_FAIL.

Can "value" be NULL?  If not, then we should add a RETURN_IF_FAIL for
it here at the API boundary.

> +
> +  func->add_string_attribute (attribute, value);
> +}
> +
> +/* This function adds an attribute with multiple integer values.  For example
> +   `nonnull(1, 2)`.  The numbers in `values` are supposed to map how they
> +   should be written in C code.  So for `nonnull(1, 2)`, you should pass `1`
> +   and `2` in `values` (and set `length` to `2`). */
> +void
> +gcc_jit_function_add_integer_array_attribute (gcc_jit_function *func,
> +   gcc_jit_fn_attribute attribute,
> +   const int* values,
> +   size_t length)
> +{
> +  RETURN_IF_FAIL (func, NULL, NULL, "NULL func");

As before, ideally should validate parameter "attribute" here with a
RETURN_IF_FAIL.

> +  RETURN_IF_FAIL (values, NULL, NULL, "NULL values");
> +
> +  func->add_integer_array_attribute (attribute, values, length);
> +}
> +
> +void
> +gcc_jit_lvalue_add_string_attribute (gcc_jit_lvalue *variable,
> +  gcc_jit_variable_attribute attribute,
> +  const char* value)
> +{
> +  RETURN_IF_FAIL (variable, NULL, NULL, "NULL variable");

As before, we should validate parameters "attribute" and "value" here
with RETURN_IF_FAILs.

We should also validate here that "variable" is indeed a variable, not
some arbitrary lvalue e.g. the address of the element of an array (or
whatever).


> +
> +  variable->add_string_attribute (attribute, value);
> +}
> +
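
For reference, this is the C-level attribute that the comment on gcc_jit_function_add_integer_array_attribute maps onto: argument positions in `nonnull` are 1-based, so `nonnull(1, 2)` covers the first two parameters, and the corresponding `values` array would be `{1, 2}` with `length` set to 2. The function below is purely illustrative and not part of the patch.

```c
#include <stddef.h>
#include <string.h>

/* Parameters 1 (dst) and 2 (src) must not be null; the indices are 1-based,
   exactly as they would be passed in `values`.  */
__attribute__ ((nonnull (1, 2)))
size_t
copy_string (char *dst, const char *src, size_t cap)
{
  size_t n;

  if (cap == 0)
    return 0;
  n = strlen (src);
  if (n >= cap)
    n = cap - 1;          /* Truncate, leaving room for the terminator.  */
  memcpy (dst, src, n);
  dst[n] = '\0';
  return n;
}
```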

[...snip...]

> diff --git a/gcc/testsuite/jit.dg/jit.exp b/gcc/testsuite/jit.dg/jit.exp
> index 8bf7e51c24f..657b454a003 100644
> --- a/gcc/testsuite/jit.dg/jit.exp
> +++ 

Re: [PATCH v3] LoongArch: testsuite:Added support for vector object detection.

2024-01-09 Thread Andreas Schwab
gcc: gcc.dg/vect/vect-outer-4a-big-array.c -flto -ffat-lto-objects: error executing dg-final: unknown effective target keyword `loongarch*-*-*'
gcc: gcc.dg/vect/vect-outer-4a-big-array.c: error executing dg-final: unknown effective target keyword `loongarch*-*-*'
gcc: gcc.dg/vect/vect-outer-4a.c -flto -ffat-lto-objects: error executing dg-final: unknown effective target keyword `loongarch*-*-*'
gcc: gcc.dg/vect/vect-outer-4a.c: error executing dg-final: unknown effective target keyword `loongarch*-*-*'
gcc: gcc.dg/vect/vect-outer-4b-big-array.c -flto -ffat-lto-objects: error executing dg-final: unknown effective target keyword `loongarch*-*-*'
gcc: gcc.dg/vect/vect-outer-4b-big-array.c: error executing dg-final: unknown effective target keyword `loongarch*-*-*'
gcc: gcc.dg/vect/vect-outer-4b.c -flto -ffat-lto-objects: error executing dg-final: unknown effective target keyword `loongarch*-*-*'
gcc: gcc.dg/vect/vect-outer-4b.c: error executing dg-final: unknown effective target keyword `loongarch*-*-*'

-- 
Andreas Schwab, sch...@linux-m68k.org
GPG Key fingerprint = 7578 EB47 D4E5 4D69 2510  2552 DF73 E780 A9DA AEC1
"And now for something completely different."


Re: [PATCH] vect: Fix ICE in vect_analyze_loop_costing [PR113210]

2024-01-09 Thread Jeff Law




On 1/6/24 01:59, Jakub Jelinek wrote:

Hi!

The following testcase ICEs (on ARM/RISCV with certain options), because niters analysis
computes number of latch executions for the loop as
(short unsigned int) (a.0_1 + 255) + 1 > 256 ? ~(short unsigned int) (a.0_1 + 255) : 0
where a.0_1 is unsigned char.  This is correct, but given that a.0_1 + 255
is done in unsigned char the condition is never true and so it is actually
equivalent to 0, but the folders don't know that.
The vectorizer sets LOOP_VINFO_NITERSM1 to that expression and goes on with
computing LOOP_VINFO_NITERS by fold_build2 PLUS_EXPR of that expression
unshared and INTEGER_CST one.  In that folding we trigger various
optimizations, first it is correctly simplified into
(short unsigned int) (a.0_1 + 255) + 1 > 256 ? -(short unsigned int) (a.0_1 + 255) : 1
and next using
/* (X + 1) > Y ? -X : 1 simplifies to X >= Y ? -X : 1 when
X is unsigned, as when X + 1 overflows, X is -1, so -X == 1.  */
into
(short unsigned int) (a.0_1 + 255) >= 256 ? -(short unsigned int) (a.0_1 + 255) : 1
and for this the first COND_EXPR argument is folded and figured out to be 0
and so while LOOP_VINFO_NITERSM1 is a complex expression (unknown to be
equivalent to 0), LOOP_VINFO_NITERS is INTEGER_CST 1.
vect_analyze_loop_costing then uses LOOP_VINFO_NITERS_KNOWN_P (which checks
if LOOP_VINFO_NITERS is INTEGER_CST which fits into shwi or something like
that) and from that assumes that LOOP_VINFO_NITERSM1 will be INTEGER_CST.
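
Both arithmetic claims above can be checked exhaustively: the condition is never true because a.0_1 + 255 wraps in unsigned char, and the quoted (X + 1) > Y rule is sound for unsigned X. The sketch below models a.0_1 as uint8_t and the 16-bit arithmetic as uint16_t, and checks the rule on 8-bit X and Y only; it checks the arithmetic and does not reproduce the folder itself.

```c
#include <stdbool.h>
#include <stdint.h>

/* NITERSM1 as quoted above, with a.0_1 modelled as uint8_t.  */
static uint16_t
nitersm1 (uint8_t a)
{
  uint16_t x = (uint8_t) (a + 255);   /* (short unsigned int) (a.0_1 + 255) */
  return (uint16_t) (x + 1) > 256 ? (uint16_t) ~x : 0;
}

/* The match.pd rule quoted above, before and after simplification.  */
static uint8_t
before_fold (uint8_t x, uint8_t y)
{
  return (uint8_t) (x + 1) > y ? (uint8_t) -x : 1;
}

static uint8_t
after_fold (uint8_t x, uint8_t y)
{
  return x >= y ? (uint8_t) -x : 1;
}

/* NITERSM1 is always 0 (so NITERS is always 1), and the rule never changes
   the result.  */
static bool
check_all (void)
{
  for (unsigned a = 0; a < 256; a++)
    if (nitersm1 ((uint8_t) a) != 0
        || (uint16_t) (nitersm1 ((uint8_t) a) + 1) != 1)
      return false;
  for (unsigned x = 0; x < 256; x++)
    for (unsigned y = 0; y < 256; y++)
      if (before_fold ((uint8_t) x, (uint8_t) y)
          != after_fold ((uint8_t) x, (uint8_t) y))
        return false;
  return true;
}
```

The only interesting case for the rule is X == 255, where X + 1 wraps to 0: the left form yields 1, and the right form yields -255, which is also 1 in 8 bits.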

The following patch fixes that by adding verification for that too.

Bootstrapped/regtested on x86_64-linux and i686-linux, ok for trunk?

2024-01-06  Jakub Jelinek  

PR tree-optimization/113210
* tree-vect-loop.cc (vect_analyze_loop_costing): If LOOP_VINFO_NITERSM1
is not INTEGER_CST, don't try to use it.

* gcc.c-torture/compile/pr113210.c: New test.

OK
jeff


Re: [PATCH v2 2/2] asan: Align .LASANPC on function boundary

2024-01-09 Thread Ilya Leoshkevich
On Tue, 2024-01-09 at 11:55 -0700, Jeff Law wrote:
> 
> 
> On 1/2/24 12:41, Ilya Leoshkevich wrote:
> > GCC can emit code between the function label and the .LASANPC label,
> > making the latter unaligned.  Some architectures cannot load unaligned
> > labels directly and require literal pool entries, which is inefficient.
> > 
> > Move the invocation of asan_function_start to
> > ASM_OUTPUT_FUNCTION_LABEL, which guarantees that no additional code is
> > emitted.  This allows setting the .LASANPC label alignment to the
> > respective function alignment.
> > ---
> >   gcc/asan.cc             |  6 ++
> >   gcc/config/i386/i386.cc |  2 +-
> >   gcc/config/s390/s390.cc |  2 +-
> >   gcc/defaults.h          |  2 +-
> >   gcc/final.cc            |  3 ---
> >   gcc/output.h            |  4 
> >   gcc/varasm.cc           | 14 ++
> >   7 files changed, 23 insertions(+), 10 deletions(-)
> So this needs a ChangeLog obviously.  I assume you've tested on s390[x].
> It should also be tested on x86 since it's the only other platform
> that redefined ASM_OUTPUT_FUNCTION_LABEL.
> 
> Assuming those tests pass without regression, then this is fine for the
> trunk.
> 
> Thanks,
> Jeff

Hi Jeff,

Since Jakub already approved this 2/2, you approved 1/2, and
x86_64/ppc64le/s390x regtests were successful, I've already pushed this
series (with ChangeLogs).

Unfortunately people discovered two regressions on i686 [1] and ppc64be
[2].  The first one is already sorted out, I'm currently regtesting the
fix for the second one and will push it as soon as it's done.

Best regards,
Ilya

[1] https://gcc.gnu.org/bugzilla/show_bug.cgi?id=113251
[2] https://gcc.gnu.org/bugzilla/show_bug.cgi?id=113284


Re: [RFC] Either fix or disable SME feature for `aarch64-w64-mingw32` target?

2024-01-09 Thread Radek Barton
Hello.

I forgot to add the target maintainers to the CC. My apologies for that.

Furthermore, I am adding also relevant changes in `libgcc/config/aarch64/lse.S` 
file to the patch. Originally we wanted to submit those changes separately but 
after the feedback from Andrew Pinski, it makes sense to add them here. I 
needed to rename `HIDDEN`, `TYPE`, and `SIZE` macros to `HIDDEN_PO`, `TYPE_PO`, 
and `SIZE_PO` (pseudo-op) because there is a collision with another macro named 
`SIZE` in the `lse.S` file.

Best regards,

Radek


v3-0001-Ifdef-.hidden-.type-and-.size-pseudo-ops-for-aarc.patch
Description: v3-0001-Ifdef-.hidden-.type-and-.size-pseudo-ops-for-aarc.patch


Re: [PATCH] c++: reference variable as default targ [PR101463]

2024-01-09 Thread Jason Merrill

On 1/5/24 15:01, Patrick Palka wrote:

Bootstrapped and regtested on x86_64-pc-linux-gnu, does this
look OK for trunk?

-- >8 --

Here during default template argument substitution we wrongly consider
the (substituted) default arguments v and vt as value-dependent[1]
which ultimately leads to deduction failure for the calls.

The bogus value_dependent_expression_p result aside, I noticed
type_unification_real during default targ substitution keeps track of
whether all previous targs are known and non-dependent, as is the case
for these calls.  And in such cases it should be safe to avoid checking
dependence of the substituted default targ and just assume it's not.
This patch implements this optimization, which lets us accept both
testcases by sidestepping the value_dependent_expression_p issue
altogether.


Hmm, maybe instead of substituting and asking if it's dependent, we 
should specifically look for undeduced parameters.


Jason



Re: [PATCH v2 2/2] asan: Align .LASANPC on function boundary

2024-01-09 Thread Jeff Law




On 1/2/24 12:41, Ilya Leoshkevich wrote:

GCC can emit code between the function label and the .LASANPC label,
making the latter unaligned.  Some architectures cannot load unaligned
labels directly and require literal pool entries, which is inefficient.

Move the invocation of asan_function_start to
ASM_OUTPUT_FUNCTION_LABEL, which guarantees that no additional code is
emitted.  This allows setting the .LASANPC label alignment to the
respective function alignment.
---
  gcc/asan.cc             |  6 ++
  gcc/config/i386/i386.cc |  2 +-
  gcc/config/s390/s390.cc |  2 +-
  gcc/defaults.h          |  2 +-
  gcc/final.cc            |  3 ---
  gcc/output.h            |  4 
  gcc/varasm.cc           | 14 ++
  7 files changed, 23 insertions(+), 10 deletions(-)
So this needs a ChangeLog obviously.  I assume you've tested on s390[x]. 
 It should also be tested on x86 since it's the only other platform 
that redefined ASM_OUTPUT_FUNCTION_LABEL.


Assuming those tests pass without regression, then this is fine for the 
trunk.


Thanks,
Jeff


Re: [PATCH v3] EXPR: Emit an truncate if 31+ bits polluted for SImode

2024-01-09 Thread Jeff Law




On 1/3/24 16:39, Richard Sandiford wrote:

YunQiang Su  writes:

On TRULY_NOOP_TRUNCATION_MODES_P (DImode, SImode) == true platforms,
if bit 31 or above is polluted by a bitop, we need a truncate.
Let's emit one, using the same hard register as input and output;
the RTL may look like:

(insn 21 20 24 2 (set (subreg/s/u:SI (reg/v:DI 200 [ val ]) 0)
 (truncate:SI (reg/v:DI 200 [ val ]))) "../xx.c":7:29 -1
  (nil))
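
As background, and assuming the MIPS64 convention that a 32-bit (SImode) value held in a 64-bit register must be kept sign-extended, here is a small C model of why a wider bit operation that writes bit 31 (or above) leaves the register non-canonical, and what the emitted truncate restores. All names are hypothetical; this models the invariant, not the expander.

```c
#include <stdbool.h>
#include <stdint.h>

/* Model of (truncate:SI (reg:DI)) on such a target: keep bits 0..31 and
   replicate bit 31 into bits 32..63.  */
static uint64_t
sign_extend_si (uint64_t reg)
{
  uint64_t low = reg & UINT64_C (0xffffffff);
  return (low ^ UINT64_C (0x80000000)) - UINT64_C (0x80000000);
}

/* A register holds a valid SImode value only if it is already
   sign-extended.  */
static bool
si_canonical_p (uint64_t reg)
{
  return sign_extend_si (reg) == reg;
}

/* Model of a DImode bit insert: replace bits [POS, POS + WIDTH) of REG with
   FIELD and leave every other bit alone.  Requires 0 < WIDTH and
   POS + WIDTH <= 64.  */
static uint64_t
insert_bits (uint64_t reg, unsigned pos, unsigned width, uint64_t field)
{
  uint64_t ones = width < 64 ? (UINT64_C (1) << width) - 1 : ~UINT64_C (0);
  uint64_t mask = ones << pos;
  return (reg & ~mask) | ((field << pos) & mask);
}
```

Setting bit 31 of a zero register with `insert_bits (0, 31, 1, 1)` gives 0x80000000, which is not a valid SImode value under this convention; `sign_extend_si` turns it into 0xffffffff80000000, which is.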

We use the /s/u flags to mark it as really needed: in
combine_simplify_rtx this insn may be considered already truncated,
so let's skip this combination.

gcc/ChangeLog:
 PR: 104914.
 * combine.cc (try_combine): Skip combine with truncate if
dest is subreg and has /u/s flags on platforms
TRULY_NOOP_TRUNCATION_MODES_P (DImode, SImode) == true.
* expr.cc (expand_assignment): Emit a truncate insn, if
31+ bits are polluted for SImode.

gcc/testsuite/ChangeLog:
PR: 104914.
* gcc.target/mips/pr104914.c: New testcase.


Sorry for not looking at this earlier.  I've got a bit lost in the
various threads, so apologies if this has been discussed already
but I think the fix is:

diff --git a/gcc/expr.cc b/gcc/expr.cc
index 4f42c0ff487..9847eba19fe 100644
--- a/gcc/expr.cc
+++ b/gcc/expr.cc
@@ -6275,9 +6275,8 @@ expand_assignment (tree to, tree from, bool nontemporal)
  else
{
  rtx to_rtx1
-   = lowpart_subreg (subreg_unpromoted_mode (to_rtx),
- SUBREG_REG (to_rtx),
- subreg_promoted_mode (to_rtx));
+   = convert_to_mode (subreg_unpromoted_mode (to_rtx),
+  SUBREG_REG (to_rtx), false);
  result = store_field (to_rtx1, bitsize, bitpos,
bitregion_start, bitregion_end,
mode1, from, get_alias_set (to),

(completely untested apart from the test case).  That should still
produce a subreg on most targets, but generates the required trunc
on MIPS.

I think this is a subset of Roger's work that got committed a few days back.

jeff


Re: [PATCH] Keep track of the FUNCTION_BEG note

2024-01-09 Thread Jeff Law




On 1/5/24 09:28, Richard Sandiford wrote:

function.cc emits a NOTE_FUNCTION_BEG after all arguments have
been copied to pseudos.  It then records this note in parm_birth_insn.
Various other pieces of code use this insn as a convenient place to
insert things at the start of the function.

However, cfgexpand later changes parm_birth_insn as follows:

   /* If we emitted any instructions for setting up the variables,
  emit them before the FUNCTION_START note.  */
   if (var_seq)
 {
   emit_insn_before (var_seq, parm_birth_insn);

   /* In expand_function_end we'll insert the alloca save/restore
 before parm_birth_insn.  We've just insertted an alloca call.
 Adjust the pointer to match.  */
   parm_birth_insn = var_seq;
 }

But the FUNCTION_BEG note is still useful for things that aren't
sensitive to stack allocation, and it has the advantage that
(unlike the var_seq above) it is never deleted or combined.
This patch adds a separate variable to track it.

Tested on aarch64-linux-gnu, where it's needed for fixing PR113196.
OK to install?

Richard


gcc/
* emit-rtl.h (rtl_data::x_function_beg_note): New member variable.
(function_beg_insn): New macro.
* function.cc (expand_function_start): Initialize function_beg_insn.

OK
jeff


Re: [PATCH v5 1/1] RISC-V: Add support for XCVbi extension in CV32E40P

2024-01-09 Thread Jeff Law




On 1/8/24 06:14, Mary Bennett wrote:

Spec: github.com/openhwgroup/core-v-sw/blob/master/specifications/corev-builtin-spec.md

Contributors:
   Mary Bennett 
   Nandni Jamnadas 
   Pietra Ferreira 
   Charlie Keaney
   Jessica Mills
   Craig Blackmore 
   Simon Cook 
   Jeremy Bennett 
   Helene Chelin 

gcc/ChangeLog:
* common/config/riscv/riscv-common.cc: Create XCVbi extension
  support.
* config/riscv/riscv.opt: Likewise.
* config/riscv/corev.md: Implement cv_branch pattern
  for cv.beqimm and cv.bneimm.
* config/riscv/riscv.md: Add CORE-V branch immediate to RISC-V
  branch instruction pattern.
* config/riscv/constraints.md: Implement constraints
  cv_bi_s5 - signed 5-bit immediate.
* config/riscv/predicates.md: Implement predicate
  const_int5s_operand - signed 5 bit immediate.
* doc/sourcebuild.texi: Add XCVbi documentation.

gcc/testsuite/ChangeLog:
* gcc.target/riscv/cv-bi-beqimm-compile-1.c: New test.
* gcc.target/riscv/cv-bi-beqimm-compile-2.c: New test.
* gcc.target/riscv/cv-bi-bneimm-compile-1.c: New test.
* gcc.target/riscv/cv-bi-bneimm-compile-2.c: New test.
* lib/target-supports.exp: Add proc for XCVbi.
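
For reference, a signed 5-bit two's-complement immediate covers -16 through 15. The helper below (a hypothetical name, not the predicate from the patch) expresses the range that cv_bi_s5 and const_int5s_operand are described as accepting: an equality test against 7 would presumably be a candidate for cv.beqimm, while a test against 16 would need the constant in a register.

```c
#include <stdbool.h>
#include <stdint.h>

/* Hypothetical helper: true if VALUE fits a signed 5-bit immediate,
   i.e. lies in [-2^4, 2^4 - 1] = [-16, 15].  */
static bool
fits_simm5 (int64_t value)
{
  return value >= -16 && value <= 15;
}
```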

Assuming this has gone through a testing cycle, this is fine for the trunk.

Thanks,
jeff


Re: [PATCH 0/5] RISC-V: Relax the -march string for accept any order

2024-01-09 Thread Jeff Law




On 1/8/24 06:47, Kito Cheng wrote:


Do you know how to build an ISA string with the following extensions?
- g
- c
- zba
- zbs
- svnapot
- zve64d
- zvl128b

No trial and error with your GCC, and no reading the RISC-V ISA spec! OK, I 
believe it's impossible for most people. Even though I have worked on RISC-V for 
so many years and remember most of the rules of the canonical order, it's still 
hard to get the order right in a short time...

So I think it's time to relax that for the -march string inputs, since we have 
so many extensions today, but we still keep the canonicalization within the 
compiler, because we need that to handle multi-lib and also it's easier to 
compare different ISA strings.

This patch is broken into several parts:
1) Small refactor patch
2) Change the way of parsing ISA string.
3) Remove unused functions
4) Update test cases
5) Update document
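
To illustrate the canonicalization the compiler keeps, here is a simplified C sketch (not GCC's riscv-common.cc logic) that orders only the multi-letter extensions from the example above. As a simplifying assumption it follows the ISA manual's naming rules: Z extensions first, grouped by the category their second letter names in the single-letter order IMAFDQLCBKJTPVH and alphabetical within a category, then S extensions, then X extensions. Version numbers are ignored. Under these rules, and with an assumed RV64 base, the example becomes rv64gc_zba_zbs_zve64d_zvl128b_svnapot.

```c
#include <stddef.h>
#include <stdlib.h>
#include <string.h>

/* Canonical order of single-letter extensions, used to rank Z categories.  */
static const char single_letter_order[] = "imafdqlcbkjtpvh";

/* Z extensions sort first, then S, then X; anything else goes last.  */
static int
class_rank (const char *ext)
{
  switch (ext[0])
    {
    case 'z': return 0;
    case 's': return 1;
    case 'x': return 2;
    default: return 3;
    }
}

/* For a Z extension, the position of its second letter in the canonical
   single-letter order (unknown letters sort after known ones).  */
static int
category_rank (const char *ext)
{
  const char *p;

  if (ext[0] != 'z' || ext[1] == '\0')
    return 0;
  p = strchr (single_letter_order, ext[1]);
  return p ? (int) (p - single_letter_order) : (int) sizeof single_letter_order;
}

/* qsort comparator: class, then category, then alphabetical.  */
static int
compare_ext (const void *a, const void *b)
{
  const char *ea = *(const char *const *) a;
  const char *eb = *(const char *const *) b;
  int d = class_rank (ea) - class_rank (eb);

  if (d != 0)
    return d;
  d = category_rank (ea) - category_rank (eb);
  if (d != 0)
    return d;
  return strcmp (ea, eb);
}

/* Sort N multi-letter extension names into canonical order in place.  */
void
sort_multi_letter_exts (const char **exts, size_t n)
{
  qsort (exts, n, sizeof *exts, compare_ext);
}
```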

Just because something is hard doesn't necessarily mean we should avoid it.

A great example would be strict aliasing.  I'd bet that 90% of C/C++ 
developers would get something wrong in this space.  Similarly for 
oddities of FP arithmetic.


My biggest worry is consistency across various tools.  It's rather lame 
if GCC were on an island by itself either in being too strict or too loose.


So where are the other key tools in this regard?  Are we an outlier 
right now or will this patch make us an outlier?


jeff


Re: [PATCH] c-family: copy attribute diagnostic fixes [PR113262]

2024-01-09 Thread Jeff Law




On 1/9/24 01:52, Jakub Jelinek wrote:

Hi!

The copy attributes is allowed on decls as well as types and even has
checks whether decl (set to *node) is DECL_P or TYPE_P, but for diagnostics
unconditionally uses DECL_SOURCE_LOCATION (decl), which obviously only works
if it applies to a decl.

The following patch fixes that, bootstrapped/regtested on x86_64-linux and
i686-linux, ok for trunk?

2024-01-09  Jakub Jelinek  

PR c/113262
* c-attribs.cc (handle_copy_attribute): Don't use
DECL_SOURCE_LOCATION (decl) if decl is not DECL_P, use input_location
instead.  Formatting fixes.

* gcc.dg/pr113262.c: New test.

ok
Jeff


Re: [PATCH v2] RISC-V: T-HEAD: Add support for the XTheadInt ISA extension

2024-01-09 Thread Jeff Law




On 11/17/23 00:33, Jin Ma wrote:

The XTheadInt ISA extension provides instructions that accelerate
interrupt handling, as defined in the T-Head-specific extension spec:
* th.ipush
* th.ipop

Ref:
https://github.com/T-head-Semi/thead-extension-spec/releases/download/2.3.0/xthead-2023-11-10-2.3.0.pdf

gcc/ChangeLog:

* config/riscv/riscv-protos.h (th_int_get_mask): New prototype.
(th_int_get_save_adjustment): Likewise.
(th_int_adjust_cfi_prologue): Likewise.
* config/riscv/riscv.cc (TH_INT_INTERRUPT): New macro.
(riscv_expand_prologue): Add the processing of XTheadInt.
(riscv_expand_epilogue): Likewise.
* config/riscv/riscv.md: New unspec.
* config/riscv/thead.cc (BITSET_P): New macro.
* config/riscv/thead.md (th_int_push): New pattern.
(th_int_pop): New pattern.

gcc/testsuite/ChangeLog:

* gcc.target/riscv/xtheadint-push-pop.c: New test.
Thanks for the ping earlier today.  I've looked at this patch repeatedly 
over the last few weeks, but never enough to give it a full review.




diff --git a/gcc/config/riscv/thead.md b/gcc/config/riscv/thead.md
index 2babfafb23c..4d6e16c0edc 100644
--- a/gcc/config/riscv/thead.md
+++ b/gcc/config/riscv/thead.md



+(define_insn "th_int_pop"
+  [(unspec_volatile [(const_int 0)] UNSPECV_XTHEADINT_POP)
+   (clobber (reg:SI RETURN_ADDR_REGNUM))
+   (clobber (reg:SI T0_REGNUM))
+   (clobber (reg:SI T1_REGNUM))
+   (clobber (reg:SI T2_REGNUM))
+   (clobber (reg:SI A0_REGNUM))
+   (clobber (reg:SI A1_REGNUM))
+   (clobber (reg:SI A2_REGNUM))
+   (clobber (reg:SI A3_REGNUM))
+   (clobber (reg:SI A4_REGNUM))
+   (clobber (reg:SI A5_REGNUM))
+   (clobber (reg:SI A6_REGNUM))
+   (clobber (reg:SI A7_REGNUM))
+   (clobber (reg:SI T3_REGNUM))
+   (clobber (reg:SI T4_REGNUM))
+   (clobber (reg:SI T5_REGNUM))
+   (clobber (reg:SI T6_REGNUM))
+   (return)]
+  "TARGET_XTHEADINT && !TARGET_64BIT"
+  "th.ipop"
+  [(set_attr "type"  "ret")
+   (set_attr "mode"  "SI")])
I probably would have gone with a load type since it's the loads that are 
most likely to interact with existing code in the pipeline.  But I doubt it 
really matters in practice.
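The clobber list in th_int_pop names sixteen registers. As a quick cross-check, the sketch below encodes them as a bitmask using the standard RISC-V x-register numbers (ra = x1, t0-t2 = x5-x7, a0-a7 = x10-x17, t3-t6 = x28-x31). The helper names and the mask layout are hypothetical; this is not the patch's th_int_get_mask.

```c
#include <stdint.h>

/* Test whether bit BIT is set in VALUE (same idea as a BITSET_P-style macro). */
#define BIT_IS_SET(VALUE, BIT) ((((VALUE) >> (BIT)) & 1u) != 0)

/* Hypothetical: mask of GPRs (indexed by x-register number) that th.ipush
   saves and th.ipop restores, matching the clobbers listed in th_int_pop.  */
static uint32_t
xtheadint_saved_mask (void)
{
  uint32_t mask = 0;
  mask |= UINT32_C (1) << 1;          /* ra */
  for (int r = 5; r <= 7; r++)        /* t0-t2 */
    mask |= UINT32_C (1) << r;
  for (int r = 10; r <= 17; r++)      /* a0-a7 */
    mask |= UINT32_C (1) << r;
  for (int r = 28; r <= 31; r++)      /* t3-t6 */
    mask |= UINT32_C (1) << r;
  return mask;
}

/* Count how many registers MASK covers.  */
static int
count_saved (uint32_t mask)
{
  int n = 0;
  for (int r = 0; r < 32; r++)
    n += BIT_IS_SET (mask, r);
  return n;
}
```

Only caller-saved registers appear; callee-saved ones such as s0 (x8) are left to the normal prologue.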



OK for the trunk.  Thanks for your patience.

jeff


Re: [PATCH v4] RISC-V: Adds the prefix "th." for the instructions of XTheadVector.

2024-01-09 Thread Jeff Law




On 1/8/24 16:04, 钟居哲 wrote:

This patch looks ok from my side.

Likewise.

So I think the only question for this specific patch is whether or not 
it makes sense to include it now or wait for more of the thead bits to 
get to acceptance.


I tend to think it should wait since I don't think it has any value 
without the rest of the thead vector changes and it's not 100% clear if 
those changes are going to make it into gcc-14 or not.


Jeff


Re: [PATCH v3] RISC-V: Bugfix for doesn't honor no-signed-zeros option

2024-01-09 Thread Jeff Law




On 1/8/24 03:45, Richard Biener wrote:

On Tue, Jan 2, 2024 at 2:37 PM  wrote:


From: Pan Li 

According to the semantics of the -fno-signed-zeros option, a backend
like RISC-V should treat the minus zero -0.0f as plus zero 0.0f.

Consider below example with option -fno-signed-zeros.

void
test (float *a)
{
   *a = -0.0;
}

We will generate code as below, which doesn't treat the minus zero
as plus zero.

test:
   lui  a5,%hi(.LC0)
   flw  fa5,%lo(.LC0)(a5)
   fsw  fa5,0(a0)
   ret

.LC0:
   .word -2147483648 // aka -0.0 (0x80000000 in hex)

This patch fixes the bug by treating the minus zero -0.0 as plus
zero, aka +0.0. With this patch, the sample code above generates
the asm code below.

test:
   sw zero,0(a0)
   ret

This patch also fixes the run failure of the test case pr30957-1.c. The
tests below pass with this patch.


We don't really expect targets to do this.  The small testcase above
is somewhat ill-formed with -fno-signed-zeros.  Note there's no
-0.0 in pr30957-1.c so why does that one fail for you?  Does
the -fvariable-expansion-in-unroller code maybe not trigger for
riscv?
Loop unrolling (and thus variable expansion) doesn't trigger on the VLA-style 
architectures.  aarch64 passes because its backend knows it can 
translate -0.0 into 0.0.


While we don't require that from ports, I'd just as soon do the 
optimization the way aarch64 does rather than xfail or skip the test on 
RISC-V.  We can load 0.0 more efficiently than -0.0.
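Why +0.0 is cheaper comes down to the IEEE 754 single-precision encodings: +0.0 is the all-zero word (so a plain `sw zero` suffices), while -0.0 has only the sign bit set, 0x80000000, which is the -2147483648 in the `.word` above and must be loaded from the constant pool. A minimal C sketch, assuming `float` is IEEE 754 binary32 (true on RISC-V and common hosts); the helper names are hypothetical.

```c
#include <math.h>
#include <stdint.h>
#include <string.h>

/* Return the raw 32-bit encoding of F (assumes IEEE 754 binary32).  */
static uint32_t
float_bits (float f)
{
  uint32_t u;
  memcpy (&u, &f, sizeof u);   /* well-defined way to reinterpret the bytes */
  return u;
}

/* True only for -0.0.  Since -0.0f == 0.0f compares equal,
   signbit () is what tells the two zeros apart.  */
static int
is_negative_zero (float f)
{
  return f == 0.0f && signbit (f) != 0;
}
```

The two zeros compare equal yet differ in their bit patterns; -fno-signed-zeros is what allows the compiler to treat them as interchangeable.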





I think we should go to PR30957 and see what it was originally filed
for; the testcase doesn't make much sense to me.

It's got more history than I'd like :(


jeff


[committed] Adding missing prototype for __clzhi2 to xstormy port

2024-01-09 Thread Jeff Law
xstormy16 has failed since the c99 transition due to a missing prototype 
for __clzhi2 in the implementation of stormy16_count_leading_zeros.


This fixes the missing prototype.  Pushed to the trunk.
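C99 removed implicit function declarations, and GCC 14 rejects them by default, which is why the macro needs a block-scope `extern` for __clzhi2. The sketch below uses the same pattern in portable C. `clz16` is a hypothetical stand-in for libgcc's __clzhi2; the loop follows the shape of the stormy16 macro, and the early exit on a nonzero chunk is an assumption because the quoted diff context stops before that point.

```c
#include <stdint.h>

typedef uint16_t UHItype;  /* illustrative stand-in for longlong.h's type */

/* Count leading zeros of a 32-bit value, scanning 16-bit chunks from the top.
   Returns 32 for zero input.  */
static int
count_leading_zeros_32 (uint32_t x)
{
  int count = 0;
  for (int size = 32; size; size -= 16)
    {
      /* Block-scope prototype, as in the fix: without a visible
         declaration, C99 and later reject the call below.  */
      extern int clz16 (UHItype);
      int c = clz16 ((UHItype) (x >> (size - 16)));
      count += c;
      if (c != 16)      /* found a set bit in this chunk */
        break;
    }
  return count;
}

/* Hypothetical stand-in for __clzhi2; returns 16 for a zero input.  */
int
clz16 (UHItype x)
{
  int n = 0;
  for (UHItype bit = 0x8000; bit && !(x & bit); bit >>= 1)
    n++;
  return n;
}
```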

jeff

commit 9f7afa99c67f039e43019ebd08d14a7f01e2d89c
Author: Jeff Law 
Date:   Tue Jan 9 10:21:28 2024 -0700

[committed] Adding missing prototype for __clzhi2 to xstormy port

xstormy16 has failed since the c99 transition due to a missing prototype for
__clzhi2 in the implementation of stormy16_count_leading_zeros.

This fixes the missing prototype.  Pushed to the trunk.

include/
* longlong.h (__stormy16_count_leading_zeros): Add prototype for
__clzhi2.

diff --git a/include/longlong.h b/include/longlong.h
index e4fe1d24144..b5dec95b7ed 100644
--- a/include/longlong.h
+++ b/include/longlong.h
@@ -1573,6 +1573,7 @@ extern UHItype __stormy16_count_leading_zeros (UHItype);
   for ((count) = 0, size = W_TYPE_SIZE; size; size -= 16)  \
{   \
  UHItype c;\
+ extern UHItype __clzhi2 (UHItype);\
\
  c = __clzhi2 ((x) >> (size - 16));\
  (count) += c; \


[committed] libstdc++: Simplify some chrono formatters

2024-01-09 Thread Jonathan Wakely
Tested aarch64-linux. Pushed to trunk.

-- >8 --

I don't remember exactly why I made these bits of code reserve space in
a COW string and append to it, rather than just use the string returned
from std::format (which will undergo copy elision). The _Str_sink type
used by std::format means the string only performs a single allocation
for the formatted output, and the returned string's reference count will
be one, so won't reallocate when indexing into it. We can remove these
non-optimizations.
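_M_F formats the year into a template with blank placeholders and then overwrites the month (and presumably the day) characters at offsets counted from the end of the string. A rough C analogue using snprintf is sketched below. `format_ymd` and `put_two_digits` are hypothetical, and the day offsets are an assumption, since the quoted hunk stops after the month.

```c
#include <stdio.h>
#include <string.h>

/* Write the two decimal digits of V (0..99) into DST[0] and DST[1].  */
static void
put_two_digits (char *dst, unsigned v)
{
  dst[0] = (char) ('0' + v / 10);
  dst[1] = (char) ('0' + v % 10);
}

/* Format YYYY-MM-DD by filling a "%04d-  -  " template and patching the
   blanks in place.  Offsets are relative to the end, so years wider than
   four digits still work.  Returns the length, or -1 if BUF is too small.  */
static int
format_ymd (char *buf, size_t len, int year, unsigned month, unsigned day)
{
  int n = snprintf (buf, len, "%04d-  -  ", year);
  if (n < 0 || (size_t) n >= len)
    return -1;
  put_two_digits (buf + n - 5, month);
  put_two_digits (buf + n - 2, day);
  return n;
}
```

Indexing from the end is the reason the C++ code uses `__s.size() - 5` rather than fixed positions.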

libstdc++-v3/ChangeLog:

* include/bits/chrono_io.h (__formatter_chrono::_M_F): Simplify
handling of string returned from std::format.
(__formatter_chrono::_M_R_T): Likewise.
---
 libstdc++-v3/include/bits/chrono_io.h | 14 --
 1 file changed, 4 insertions(+), 10 deletions(-)

diff --git a/libstdc++-v3/include/bits/chrono_io.h b/libstdc++-v3/include/bits/chrono_io.h
index ec2ae9d53cc..7b5876b24e6 100644
--- a/libstdc++-v3/include/bits/chrono_io.h
+++ b/libstdc++-v3/include/bits/chrono_io.h
@@ -898,11 +898,8 @@ namespace __format
 _FormatContext&) const
{
  auto __ymd = _S_date(__t);
- basic_string<_CharT> __s;
-#if ! _GLIBCXX_USE_CXX11_ABI
- __s.reserve(11);
-#endif
- __s += std::format(_GLIBCXX_WIDEN("{:04d}-  -  "), (int)__ymd.year());
+ auto __s = std::format(_GLIBCXX_WIDEN("{:04d}-  -  "),
+(int)__ymd.year());
  auto __sv = _S_two_digits((unsigned)__ymd.month());
  __s[__s.size() - 5] = __sv[0];
  __s[__s.size() - 4] = __sv[1];
@@ -1093,11 +1090,8 @@ namespace __format
  // %T Equivalent to %H:%M:%S
  auto __hms = _S_hms(__t);
 
- basic_string<_CharT> __s;
-#if ! _GLIBCXX_USE_CXX11_ABI
- __s.reserve(11);
-#endif
- __s = std::format(_GLIBCXX_WIDEN("{:02d}:00"), __hms.hours().count());
+ auto __s = std::format(_GLIBCXX_WIDEN("{:02d}:00"),
+__hms.hours().count());
  auto __sv = _S_two_digits(__hms.minutes().count());
  __s[__s.size() - 2] = __sv[0];
  __s[__s.size() - 1] = __sv[1];
-- 
2.43.0



[committed] Fix minor bug in epiphany port

2024-01-09 Thread Jeff Law


So I consider this port dead as it semi-randomly fails in reload due to 
unrelated changes earlier in the gimple and RTL pipelines.  Regardless, 
Richard S's late-combine work did show a very obvious error in the port 
that we should go ahead and fix as long as the port is in-tree.


The epiphany add-with-immediate instruction allows an 11-bit signed 
immediate.  That gives the instruction an immediate range of -1024..1023.


The port actually allowed -8192..8191 due to the uber-weird constraint 
definition.  I've simplified the constraint to match the hardware 
documentation I was able to find.  That was enough to get the epiphany 
port to build libgcc/newlib with Richard S's late-combine work.
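The corrected constraint boils down to a signed 11-bit range test. The C sketch below shows that arithmetic on a two's-complement 64-bit type. `in_range` imitates the single-comparison, unsigned-subtraction style of GCC's IN_RANGE macro; both function names are hypothetical.

```c
#include <stdbool.h>
#include <stdint.h>

/* One unsigned comparison checks both bounds: values below LOWER wrap
   around to huge unsigned numbers and fail the test.  */
static bool
in_range (int64_t value, int64_t lower, int64_t upper)
{
  return (uint64_t) value - (uint64_t) lower
         <= (uint64_t) upper - (uint64_t) lower;
}

/* True if VALUE fits in a BITS-wide two's-complement immediate
   (1 <= BITS <= 63).  */
static bool
fits_signed_imm (int64_t value, int bits)
{
  int64_t lo = -((int64_t) 1 << (bits - 1));
  int64_t hi = ((int64_t) 1 << (bits - 1)) - 1;
  return in_range (value, lo, hi);
}
```

For comparison, the range -8192..8191 that the old constraint accepted is what a 14-bit signed field would allow.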


The testsuite is so flakey on that port (due to the reload failures) 
that my tester doesn't run it.  So no comparisons are available.


Anyway, I've pushed this to the trunk.

jeff

commit 0beb20c01cf7120c724f9882be41a77e970fe63d
Author: Jeff Law 
Date:   Tue Jan 9 10:17:54 2024 -0700

[committed] Fix minor bug in epiphany port

So I consider this port dead as it semi-randomly fails in reload due to
unrelated changes earlier in the gimple and RTL pipelines.  Regardless Richard
S's late-combine work did show a very obvious error in the port that we should
go ahead and fix as long as the port is in-tree.

The epiphany add-with-immediate instruction allows an 11 bit signed immediate.
That gives the instruction an immediate range of -1024..1023.

The port actually allowed -8192..8191 due to the uber-weird constraint
definition.  I've simplified the constraint to match the hardware documentation
I was able to find.  That was enough to get the epiphany port to build
libgcc/newlib with Richard S's late-combine work.

The testsuite is so flakey on that port (due to the reload failures) that my
tester doesn't run it.  So no comparisons are available.

gcc/
* config/epiphany/constraints.md (Car): Allow -1024..1023, no more,
no less.

diff --git a/gcc/config/epiphany/constraints.md b/gcc/config/epiphany/constraints.md
index e4fda2d34a4..5dc960175f1 100644
--- a/gcc/config/epiphany/constraints.md
+++ b/gcc/config/epiphany/constraints.md
@@ -98,12 +98,12 @@ (define_constraint "Rgs"
(match_test "REGNO (op) >= FIRST_PSEUDO_REGISTER || REGNO (op) <= 7")))
 
 ;; Constant suitable for the addsi3_r pattern.
+;; No idea why we previously used RTX_OK_FOR_OFFSET with SI, HI an QI
+;; modes.  The instruction in question accepts 11 bit signed constants.
 (define_constraint "Car"
   "addsi3_r constant."
   (and (match_code "const_int")
-   (ior (match_test "RTX_OK_FOR_OFFSET_P (SImode, op)")
-   (match_test "RTX_OK_FOR_OFFSET_P (HImode, op)")
-   (match_test "RTX_OK_FOR_OFFSET_P (QImode, op)"
+   (match_test "IN_RANGE (INTVAL (op), -1024, 1023)")))
 
 ;; The return address if it can be replaced with GPR_LR.
 (define_constraint "Rra"


[committed] Fix minor bug on mn103 port

2024-01-09 Thread Jeff Law


Richard Sandiford traced a failure on the mn103 port with his 
late-combine patches to the subdi3 pattern, which did not specify the isa 
attribute on alternatives that require newer variants of the chip family.


This patch adds the missing isa attribute and the port now works with 
his late-combine patch.  I'm pushing this to the trunk on his behalf.


Jeff


commit 9aaed2c1d7e54cee6966c632ced80e643525fe89
Author: Richard Sandiford 
Date:   Tue Jan 9 10:07:09 2024 -0700

[committed] Fix minor bug on mn103 port

Richard Sandiford debugged a failure on the mn103 port with his late-combine
patches down to the subdi3 pattern not specifying the isa on alternatives which
required newer variants of the chip family.

This patch adds the missing isa attribute and the port now works with his
late-combine patch.  I'm pushing this to the trunk on his behalf.

gcc/
* config/mn10300/mn10300.md (subdi3_degenerate): Add isa attribute.

diff --git a/gcc/config/mn10300/mn10300.md b/gcc/config/mn10300/mn10300.md
index 939f20d5a58..80b07cc4b36 100644
--- a/gcc/config/mn10300/mn10300.md
+++ b/gcc/config/mn10300/mn10300.md
@@ -957,7 +957,9 @@ (define_insn_and_split "*subdi3_degenerate"
   if (scratch)
 emit_move_insn (operands[0], scratch);
   DONE;
-})
+}
+  [(set_attr "isa" "*,am33")])
+
 
 (define_insn_and_split "negsi2"
   [(set (match_operand:SI 0 "register_operand"  "=D,")


Re: [PATCH 3/7] Lockfile.

2024-01-09 Thread Michal Jires
Hi,
> You do not implement GCOV_LINKED_WITH_LOCKING patch, does locking work
> with mingw? Or we only build gcc with cygwin emulation layer these days?

I tried to test the _locking implementation with both mingw and msys2; in both
cases fcntl was present and _locking was not. Admittedly I was unable to
finish bootstrap without errors, so I might have been doing something wrong.

So I didn't include the _locking implementation, because I was unable to test
it, and I am unsure whether we even have a supported host that would require it.
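For reference, POSIX advisory locking through fcntl, which the reply reports as available on both the mingw and msys2 setups tried, looks roughly like the sketch below. It is a generic illustration, not the lockfile code from the patch, and the function names are hypothetical.

```c
#define _POSIX_C_SOURCE 200809L
#include <fcntl.h>
#include <stdlib.h>
#include <unistd.h>

/* Block until an exclusive (write) lock on the whole of FD is held.
   Returns 0 on success, -1 with errno set on failure.  */
static int
lock_file_exclusive (int fd)
{
  struct flock fl = { 0 };
  fl.l_type = F_WRLCK;      /* exclusive lock */
  fl.l_whence = SEEK_SET;
  fl.l_start = 0;
  fl.l_len = 0;             /* 0 means "through end of file" */
  return fcntl (fd, F_SETLKW, &fl) == -1 ? -1 : 0;
}

/* Release the lock this process holds on FD.  */
static int
unlock_file (int fd)
{
  struct flock fl = { 0 };
  fl.l_type = F_UNLCK;
  fl.l_whence = SEEK_SET;
  return fcntl (fd, F_SETLK, &fl) == -1 ? -1 : 0;
}
```

fcntl locks are per-process and advisory, so they only coordinate processes that all use the same locking protocol.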

Michal


[PATCH 2/7 v2] lto: Remove random_seed from section name.

2024-01-09 Thread Michal Jires
This patch removes suffixes from section names during LTO linking.

These suffixes were originally added for ld -r to work (PR lto/44992).
They were added to all LTO object files, but are only useful before WPA.
After that they waste space, and if kept random, make LTO caching impossible.
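The suffix decision in the patched lto_get_section_name can be modeled as a small pure function. The C sketch below is simplified: the struct fields are hypothetical stand-ins for the section type, flag_ltrans, flag_wpa, the file ID and the random seed, and the real code joins additional separator pieces that are omitted here.

```c
#include <stdbool.h>
#include <stdio.h>
#include <string.h>

/* Hypothetical stand-ins for what lto_get_section_name consults.  */
struct lto_naming
{
  bool is_opts;           /* section_type == LTO_section_opts */
  bool ltrans;            /* flag_ltrans */
  bool wpa;               /* flag_wpa */
  bool have_file;         /* f != NULL */
  unsigned long file_id;  /* f->id */
  unsigned long seed;     /* get_random_seed (false) */
};

/* Join PREFIX and NAME, then append the uniqueness suffix using the
   same order of checks as the patch.  */
static void
lto_section_name_sketch (char *buf, size_t len, const char *prefix,
                         const char *name, const struct lto_naming *st)
{
  char post[32];
  if (st->is_opts || st->ltrans)
    strcpy (post, "");                              /* no suffix */
  else if (st->have_file)
    snprintf (post, sizeof post, ".%lx", st->file_id);
  else if (st->wpa)
    strcpy (post, "");                              /* deterministic WPA output */
  else
    snprintf (post, sizeof post, ".%lx", st->seed); /* random, for ld -r */
  snprintf (buf, len, "%s%s%s", prefix, name, post);
}
```

Only the last branch depends on the random seed, so output produced after WPA gets stable names that a cache can match.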

Bootstrapped/regtested on x86_64-pc-linux-gnu

gcc/ChangeLog:

* lto-streamer.cc (lto_get_section_name): Remove suffixes after WPA.

gcc/lto/ChangeLog:

* lto-common.cc (lto_section_with_id): Don't load suffix during LTRANS.
---
 gcc/lto-streamer.cc   | 11 +--
 gcc/lto/lto-common.cc |  7 +++
 2 files changed, 16 insertions(+), 2 deletions(-)

diff --git a/gcc/lto-streamer.cc b/gcc/lto-streamer.cc
index 8032bbf7108..61b5f8ed4dc 100644
--- a/gcc/lto-streamer.cc
+++ b/gcc/lto-streamer.cc
@@ -132,11 +132,18 @@ lto_get_section_name (int section_type, const char *name,
  doesn't confuse the reader with merged sections.
 
  For options don't add a ID, the option reader cannot deal with them
- and merging should be ok here. */
-  if (section_type == LTO_section_opts)
+ and merging should be ok here.
+
+ LTRANS files (output of wpa, input and output of ltrans) are handled
+ directly inside of linker/lto-wrapper, so name uniqueness for external
+ tools is not needed.
+ Randomness would inhibit incremental LTO.  */
+  if (section_type == LTO_section_opts || flag_ltrans)
 strcpy (post, "");
   else if (f != NULL) 
 sprintf (post, "." HOST_WIDE_INT_PRINT_HEX_PURE, f->id);
+  else if (flag_wpa)
+strcpy (post, "");
   else
 sprintf (post, "." HOST_WIDE_INT_PRINT_HEX_PURE, get_random_seed (false)); 
   char *res = concat (section_name_prefix, sep, add, post, NULL);
diff --git a/gcc/lto/lto-common.cc b/gcc/lto/lto-common.cc
index 11e7d63f1be..44aeeddf46f 100644
--- a/gcc/lto/lto-common.cc
+++ b/gcc/lto/lto-common.cc
@@ -2174,6 +2174,13 @@ lto_section_with_id (const char *name, unsigned 
HOST_WIDE_INT *id)
 
   if (strncmp (name, section_name_prefix, strlen (section_name_prefix)))
 return 0;
+
+  if (flag_ltrans)
+{
+  *id = 0;
+  return 1;
+}
+
   s = strrchr (name, '.');
   if (!s)
 return 0;
-- 
2.43.0



Re: [wwwdocs] gcc-14/changes.html: OpenMP - improve wording

2024-01-09 Thread Tobias Burnus

Oops, now attached. Thanks, Martin!

Martin Jambor wrote:

On Mon, Jan 08 2024, Tobias Burnus wrote:

The attached patch


there was no patch attached to your message.

Martin


makes a tiny update to the OpenMP features (AMD GCN, not only nvptx,
now also has an optimized memcpy_rect), but the main
change is some shifting around to make it more consistent and more
readable.

I intend to commit this relatively soon; like always, comments and
suggestions are welcome - be it before or after the commit.

Current version: http://gcc.gnu.org/gcc-14/changes.html

Thanks,

Tobias

gcc-14/changes.html: OpenMP - improve wording

This is mostly some shifting of items in bullet points to make it
more organized; it also improves the wording a bit. And there is one
new feature: the optimization for omp_target_memcpy_rect is now also
done for AMD GPUs and not only for nvptx.

diff --git a/htdocs/gcc-14/changes.html b/htdocs/gcc-14/changes.html
index e3a68998..2c64cf67 100644
--- a/htdocs/gcc-14/changes.html
+++ b/htdocs/gcc-14/changes.html
@@ -49,14 +49,24 @@ a work-in-progress.
 
   https://gcc.gnu.org/projects/gomp/;>OpenMP
   
+
+  The https://gcc.gnu.org/onlinedocs/libgomp/;>GNU Offloading and
+  Multi Processing Runtime Library Manual has been updated and extended,
+  improving especially the description of ICVs, memory allocation, environment variables and OpenMP
+  routines.
+
 
   The requires directive's unified_address
   requirement is now fulfilled by both AMD GCN and nvptx devices.
   AMD GCN and nvptx devices now support low-latency allocators as
   https://gcc.gnu.org/onlinedocs/libgomp/Offload-Target-Specifics.html;
   >detailed in the manual. Initial support for pinned-memory
-  allocators has been added (https://gcc.gnu.org/onlinedocs/libgomp/Memory-allocation.html;
-  >as detailed in the manual)
+  allocators has been added and, on Linux,
+  https://github.com/numactl/numactl;>libnuma is now used
+  for allocators requesting the nearest-partition trait (both are described
+  in the https://gcc.gnu.org/onlinedocs/libgomp/Memory-allocation.html;
+  >memory allocation section of the manual).
 
 OpenMP 5.0: The allocate directive is now
   supported for stack variables in C and Fortran, including the OpenMP 5.1
@@ -74,8 +84,8 @@ a work-in-progress.
   using present as map-type modifier and in
   defaultmap. The indirect clause is now supported
   for C and C++.  The performance of copying strided data from or to nvptx
-  devices using the OpenMP 5.1 routine omp_target_memcpy_rect
-  has been improved.
+  and AMD GPU devices using the OpenMP 5.1 routine
+  omp_target_memcpy_rect has been improved.
 
 
   OpenMP 5.2: The OMP_TARGET_OFFLOAD=mandatory handling has
@@ -83,23 +93,14 @@ a work-in-progress.
   For Fortran, the list of directives permitted in Fortran pure procedures
   was extended. Additionally, the spec change has been implemented for
   default implicit mapping of C/C++ pointers pointing to unmapped storage.
-  The destroy now optionally accepts the depend object as
-  argument.
+  The destroy clause now optionally accepts the depend object
+  as argument.
 
 
   OpenMP 6.0 preview (TR11/TR12): The decl attribute is now
   supported in C++ 11 and the directive, sequence
   and decl attributes are now supported in C 23.
 
-
-  The https://gcc.gnu.org/onlinedocs/libgomp/;>GNU Offloading and
-  Multi Processing Runtime Library Manual has been updated and extended,
-  improving especially the description of ICVs, memory allocation, environment variables and OpenMP
-  routines. On Linux, https://github.com/numactl/numactl;>libnuma
-  is now used for allocators requesting the nearest-partition trait as
-  detailed in the manual.
-
   
   
   https://gcc.gnu.org/wiki/OpenACC;>OpenACC


[PATCH] hwasan: Check if Intel LAM_U57 is enabled

2024-01-09 Thread H.J. Lu
When -fsanitize=hwaddress is used, libhwasan will try to enable LAM_U57
in the startup code.  Update the target check to enable hwaddress tests
if LAM_U57 is enabled.  Also compile hwaddress tests with -mlam=u57 on
x86-64 since hwasan requires LAM_U57 on x86-64.
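The bit arithmetic behind the LAM_U57 check is sketched below for x86-64 user-space pointers. LAM_U57 frees bits 57..62 for metadata, so the untag mask the patch expects from ARCH_GET_UNTAG_MASK is ~LAM_U57_MASK. The helpers are hypothetical and never call arch_prctl, so they do not query the kernel.

```c
#include <stdbool.h>
#include <stdint.h>

/* Bits 57..62, the metadata bits under LAM_U57 (same value as the
   patch's LAM_U57_MASK).  */
#define LAM_U57_MASK (UINT64_C (0x3f) << 57)

/* Mirrors the patch's test: LAM_U57 is active if the untag mask
   reported by the kernel clears exactly those bits.  */
static bool
lam_u57_enabled (uint64_t untag_mask)
{
  return untag_mask == ~LAM_U57_MASK;
}

/* Place a 6-bit TAG into the LAM_U57 bits of ADDR.  */
static uint64_t
set_tag (uint64_t addr, unsigned tag)
{
  return (addr & ~LAM_U57_MASK) | ((uint64_t) (tag & 0x3f) << 57);
}

/* Read the tag back.  */
static unsigned
get_tag (uint64_t addr)
{
  return (unsigned) ((addr & LAM_U57_MASK) >> 57);
}

/* Strip the tag; with LAM_U57 enabled, the CPU ignores these bits
   when translating the address.  */
static uint64_t
untag (uint64_t addr)
{
  return addr & ~LAM_U57_MASK;
}
```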

* lib/hwasan-dg.exp (check_effective_target_hwaddress_exec):
Return 1 if Intel LAM_U57 is enabled.
(hwasan_init): Add -mlam=u57 on x86-64.
---
 gcc/testsuite/lib/hwasan-dg.exp | 25 ++---
 1 file changed, 22 insertions(+), 3 deletions(-)

diff --git a/gcc/testsuite/lib/hwasan-dg.exp b/gcc/testsuite/lib/hwasan-dg.exp
index e9c5ef6524d..76057502ee6 100644
--- a/gcc/testsuite/lib/hwasan-dg.exp
+++ b/gcc/testsuite/lib/hwasan-dg.exp
@@ -44,11 +44,25 @@ proc check_effective_target_hwaddress_exec {} {
#ifdef __cplusplus
extern "C" {
#endif
+   extern int arch_prctl (int, unsigned long int *);
extern int prctl(int, unsigned long, unsigned long, unsigned long, 
unsigned long);
#ifdef __cplusplus
}
#endif
int main (void) {
+   #ifdef __x86_64__
+   # ifdef __LP64__
+   #  define ARCH_GET_UNTAG_MASK 0x4001
+   #  define LAM_U57_MASK (0x3fULL << 57)
+ unsigned long mask = 0;
+ if (arch_prctl(ARCH_GET_UNTAG_MASK, ) != 0)
+   return 1;
+ if (mask != ~LAM_U57_MASK)
+   return 1;
+ return 0;
+   # endif
+ return 1;
+   #else
#define PR_SET_TAGGED_ADDR_CTRL 55
#define PR_GET_TAGGED_ADDR_CTRL 56
#define PR_TAGGED_ADDR_ENABLE (1UL << 0)
@@ -58,6 +72,7 @@ proc check_effective_target_hwaddress_exec {} {
  || !prctl(PR_GET_TAGGED_ADDR_CTRL, 0, 0, 0, 0))
return 1;
  return 0;
+   #endif
}
 }] {
return 0;
@@ -102,6 +117,10 @@ proc hwasan_init { args } {
 
 setenv HWASAN_OPTIONS "random_tags=0"
 
+if [istarget x86_64-*-*] {
+  set target_hwasan_flags "-mlam=u57"
+}
+
 set link_flags ""
 if ![is_remote host] {
if [info exists TOOL_OPTIONS] {
@@ -119,12 +138,12 @@ proc hwasan_init { args } {
 if [info exists ALWAYS_CXXFLAGS] {
set hwasan_saved_ALWAYS_CXXFLAGS $ALWAYS_CXXFLAGS
set ALWAYS_CXXFLAGS [concat "{ldflags=$link_flags}" $ALWAYS_CXXFLAGS]
-   set ALWAYS_CXXFLAGS [concat "{additional_flags=-fsanitize=hwaddress 
--param hwasan-random-frame-tag=0 -g $include_flags}" $ALWAYS_CXXFLAGS]
+   set ALWAYS_CXXFLAGS [concat "{additional_flags=-fsanitize=hwaddress 
$target_hwasan_flags --param hwasan-random-frame-tag=0 -g $include_flags}" 
$ALWAYS_CXXFLAGS]
 } else {
if [info exists TEST_ALWAYS_FLAGS] {
-   set TEST_ALWAYS_FLAGS "$link_flags -fsanitize=hwaddress --param 
hwasan-random-frame-tag=0 -g $include_flags $TEST_ALWAYS_FLAGS"
+   set TEST_ALWAYS_FLAGS "$link_flags -fsanitize=hwaddress 
$target_hwasan_flags --param hwasan-random-frame-tag=0 -g $include_flags 
$TEST_ALWAYS_FLAGS"
} else {
-   set TEST_ALWAYS_FLAGS "$link_flags -fsanitize=hwaddress --param 
hwasan-random-frame-tag=0 -g $include_flags"
+   set TEST_ALWAYS_FLAGS "$link_flags -fsanitize=hwaddress 
$target_hwasan_flags --param hwasan-random-frame-tag=0 -g $include_flags"
}
 }
 }
-- 
2.43.0



Re: [PATCH] libgccjit: Add missing builtins needed by optimizations

2024-01-09 Thread David Malcolm
On Fri, 2023-12-22 at 09:39 -0500, Antoni Boucher wrote:
> Hi.
> This patch adds missing builtins needed by optimizations.
> Thanks for the review.

The patch looks good to me.

Thanks!
Dave



Re: [PATCH] libgccjit: Implement sizeof operator

2024-01-09 Thread David Malcolm
On Fri, 2023-12-22 at 10:25 -0500, Antoni Boucher wrote:
> Hi.
> This patch adds the support of the sizeof operator.
> I was wondering if this new API entrypoint should take a location as a
> parameter. What do you think?

I'd prefer it if it did (even if it's currently ignored internally),
but it's not a big deal.

> Thanks for the review.

The patch is OK as-is.

Thanks
Dave



[PATCH][committed]middle-end: removed unused variable in vectorizable_live_operation_1

2024-01-09 Thread Tamar Christina
Hi All,

It looks like the previous patch had an unused parameter.
It's odd that my bootstrap didn't catch it (I'm assuming
-Werror is still on for O3 bootstraps) but this fixes it.

Committed to fix bootstrap.

Thanks,
Tamar

gcc/ChangeLog:

* tree-vect-loop.cc (vectorizable_live_operation_1): Drop unused
restart_loop.
(vectorizable_live_operation): Likewise.

--- inline copy of patch -- 
diff --git a/gcc/tree-vect-loop.cc b/gcc/tree-vect-loop.cc
index 39b1161309d8ff8bfe88ee26df9147df0af0a58c..c218d514fe4be57fca97a85a36be7240d3e84edf 100644
--- a/gcc/tree-vect-loop.cc
+++ b/gcc/tree-vect-loop.cc
@@ -10575,13 +10575,12 @@ vectorizable_induction (loop_vec_info loop_vinfo,
 
helper function for vectorizable_live_operation.  */
 
-tree
+static tree
 vectorizable_live_operation_1 (loop_vec_info loop_vinfo,
   stmt_vec_info stmt_info, basic_block exit_bb,
   tree vectype, int ncopies, slp_tree slp_node,
   tree bitsize, tree bitstart, tree vec_lhs,
-  tree lhs_type, bool restart_loop,
-  gimple_stmt_iterator *exit_gsi)
+  tree lhs_type, gimple_stmt_iterator *exit_gsi)
 {
   gcc_assert (single_pred_p (exit_bb) || LOOP_VINFO_EARLY_BREAKS (loop_vinfo));
 
@@ -10597,7 +10596,7 @@ vectorizable_live_operation_1 (loop_vec_info loop_vinfo,
   if (integer_zerop (bitstart))
 {
   tree scalar_res = gimple_build (, BIT_FIELD_REF, TREE_TYPE 
(vectype),
-  vec_lhs_phi, bitsize, bitstart);
+ vec_lhs_phi, bitsize, bitstart);
 
   /* Convert the extracted vector element to the scalar type.  */
   new_tree = gimple_convert (, lhs_type, scalar_res);
@@ -10958,8 +10957,7 @@ vectorizable_live_operation (vec_info *vinfo, stmt_vec_info stmt_info,
 dest, vectype, ncopies,
 slp_node, bitsize,
 tmp_bitstart, tmp_vec_lhs,
-lhs_type, restart_loop,
-_gsi);
+lhs_type, _gsi);
 
  if (gimple_phi_num_args (use_stmt) == 1)
{









Re: [PATCH V3 0/3] RISC-V: Add intrinsics for Bitmanip and Scalar Crypto extensions

2024-01-09 Thread Christoph Müllner
The tests still fail.

gcc: Unexpected fails for rv64gc lp64d medlow
FAIL: gcc.target/riscv/scalar_bitmanip_intrinsic-32.c   -O0  (test for excess errors)
FAIL: gcc.target/riscv/scalar_bitmanip_intrinsic-32.c   -O1  (test for excess errors)
FAIL: gcc.target/riscv/scalar_bitmanip_intrinsic-32.c   -O2  (test for excess errors)
FAIL: gcc.target/riscv/scalar_bitmanip_intrinsic-32.c   -Os  (test for excess errors)
FAIL: gcc.target/riscv/scalar_bitmanip_intrinsic-32.c  -Oz  (test for excess errors)
FAIL: gcc.target/riscv/scalar_crypto_intrinsic-32.c   -O0  (test for excess errors)
FAIL: gcc.target/riscv/scalar_crypto_intrinsic-32.c   -O1  (test for excess errors)
FAIL: gcc.target/riscv/scalar_crypto_intrinsic-32.c   -O2  (test for excess errors)
FAIL: gcc.target/riscv/scalar_crypto_intrinsic-32.c   -Os  (test for excess errors)
FAIL: gcc.target/riscv/scalar_crypto_intrinsic-32.c  -Oz  (test for excess errors)
FAIL: gcc.target/riscv/scalar_crypto_intrinsic-64.c   -O0  (test for excess errors)
FAIL: gcc.target/riscv/scalar_crypto_intrinsic-64.c   -O1  (test for excess errors)
FAIL: gcc.target/riscv/scalar_crypto_intrinsic-64.c   -O2  (test for excess errors)
FAIL: gcc.target/riscv/scalar_crypto_intrinsic-64.c   -Os  (test for excess errors)
FAIL: gcc.target/riscv/scalar_crypto_intrinsic-64.c  -Oz  (test for excess errors)

Note that this is not only an rv32/rv64 issue, because the -64.c tests also fail.

gcc: Unexpected fails for rv32gc ilp32d medlow
FAIL: gcc.target/riscv/scalar_bitmanip_intrinsic-64-emulated.c   -O1  (test for excess errors)
FAIL: gcc.target/riscv/scalar_bitmanip_intrinsic-64-emulated.c   -O2  (test for excess errors)
FAIL: gcc.target/riscv/scalar_bitmanip_intrinsic-64-emulated.c   -Os  (test for excess errors)
FAIL: gcc.target/riscv/scalar_bitmanip_intrinsic-64-emulated.c  -Oz  (test for excess errors)
FAIL: gcc.target/riscv/scalar_bitmanip_intrinsic-64.c   -O0  (test for excess errors)
FAIL: gcc.target/riscv/scalar_bitmanip_intrinsic-64.c   -O1  (test for excess errors)
FAIL: gcc.target/riscv/scalar_bitmanip_intrinsic-64.c   -O2  (test for excess errors)
FAIL: gcc.target/riscv/scalar_bitmanip_intrinsic-64.c   -Os  (test for excess errors)
FAIL: gcc.target/riscv/scalar_bitmanip_intrinsic-64.c  -Oz  (test for excess errors)
FAIL: gcc.target/riscv/scalar_crypto_intrinsic-64.c   -O0  (test for excess errors)
FAIL: gcc.target/riscv/scalar_crypto_intrinsic-64.c   -O1  (test for excess errors)
FAIL: gcc.target/riscv/scalar_crypto_intrinsic-64.c   -O2  (test for excess errors)
FAIL: gcc.target/riscv/scalar_crypto_intrinsic-64.c   -Os  (test for excess errors)
FAIL: gcc.target/riscv/scalar_crypto_intrinsic-64.c  -Oz  (test for excess errors)




On Tue, Dec 26, 2023 at 6:47 AM Liao Shihua  wrote:
>
> Update v2 -> v3:
>   1. Change pattern mode from X to GPR in orcb, clmul, and brev8.
>   2. Add emulated testsuite.
>   3. Removed duplicate testsuite between built-in and intrinsic.
>   4. Typo fix.
>
> Update v1 -> v2:
>   1. Rename *_intrinsic-* to *_intrinsic-XLEN.
>   2. Typo fix.
>   3. Intrinsics with immediate arguments will use macros at -O0.
>
> It's a small patch that just provides a mapping from the RV intrinsics to the
> builtin names within GCC.
>
> Liao Shihua (3):
>   RISC-V: Remove the Scalar Bitmanip and Crypto Built-In function
> testsuites
>   RISC-V: Add C intrinsic for Scalar Crypto Extension
>   RISC-V: Add C intrinsic for Scalar Bitmanip Extension
>
>  gcc/config.gcc|   2 +-
>  gcc/config/riscv/bitmanip.md  |  10 +-
>  gcc/config/riscv/crypto.md|   4 +-
>  gcc/config/riscv/riscv-builtins.cc|  22 ++
>  gcc/config/riscv/riscv-cmo.def|  12 +-
>  gcc/config/riscv/riscv-ftypes.def |   2 +
>  gcc/config/riscv/riscv-scalar-crypto.def  |  22 +-
>  gcc/config/riscv/riscv_bitmanip.h | 297 +
>  gcc/config/riscv/riscv_crypto.h   | 309 ++
>  .../riscv/scalar_bitmanip_intrinsic-32.c  |  96 ++
>  .../scalar_bitmanip_intrinsic-64-emulated.c   |  32 ++
>  .../riscv/scalar_bitmanip_intrinsic-64.c  | 114 +++
>  .../riscv/scalar_crypto_intrinsic-32.c| 114 +++
>  .../riscv/scalar_crypto_intrinsic-64.c| 122 +++
>  gcc/testsuite/gcc.target/riscv/zbbw.c |  26 --
>  gcc/testsuite/gcc.target/riscv/zbc32.c|  23 --
>  gcc/testsuite/gcc.target/riscv/zbc64.c|  23 --
>  gcc/testsuite/gcc.target/riscv/zbkb32.c   |  18 -
>  gcc/testsuite/gcc.target/riscv/zbkb64.c   |   5 -
>  gcc/testsuite/gcc.target/riscv/zbkc32.c   |  17 -
>  gcc/testsuite/gcc.target/riscv/zbkc64.c   |  17 -
>  gcc/testsuite/gcc.target/riscv/zbkx32.c   |  18 -
>  gcc/testsuite/gcc.target/riscv/zbkx64.c   |  18 -
>  gcc/testsuite/gcc.target/riscv/zknd32-2.c |  28 --
>  gcc/testsuite/gcc.target/riscv/zknd64-2.c |  42 ---
>  gcc/testsuite/gcc.target/riscv/zkne32-2.c |  28 --
>  
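When checking what the failing intrinsic tests expect, a host-side reference model of the instruction semantics can help. The sketch below models orc.b (Zbb) and brev8 (Zbkb) on 32-bit values in portable C. The `ref_*` names are hypothetical and are not the `__riscv_*` intrinsics provided by riscv_bitmanip.h or riscv_crypto.h.

```c
#include <stdint.h>

/* orc.b: each result byte is 0xff if the matching input byte is nonzero,
   else 0x00.  */
static uint32_t
ref_orc_b_32 (uint32_t x)
{
  uint32_t r = 0;
  for (int i = 0; i < 32; i += 8)
    if ((x >> i) & 0xffu)
      r |= UINT32_C (0xff) << i;
  return r;
}

/* brev8: reverse the bit order within each byte; byte order is unchanged.  */
static uint32_t
ref_brev8_32 (uint32_t x)
{
  uint32_t r = 0;
  for (int i = 0; i < 32; i += 8)
    {
      uint32_t b = (x >> i) & 0xffu, rev = 0;
      for (int j = 0; j < 8; j++)
        if (b & (1u << j))
          rev |= 1u << (7 - j);
      r |= rev << i;
    }
  return r;
}
```

brev8 is its own inverse, which makes a convenient sanity check for expected-value tables.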

Re: [PATCH]middle-end: check if target can do extract first for early breaks [PR113199]

2024-01-09 Thread H.J. Lu
On Tue, Jan 9, 2024 at 8:10 AM Tamar Christina  wrote:
>
> Hmm, I'm confused as to why it didn't break mine; I just did one again. Anyway,
> I'll remove the unused variable.

Can you also make vectorizable_live_operation_1 static?

> > -Original Message-
> > From: Rainer Orth 
> > Sent: Tuesday, January 9, 2024 4:06 PM
> > To: Richard Biener 
> > Cc: Tamar Christina ; gcc-patches@gcc.gnu.org; nd
> > ; j...@ventanamicro.com
> > Subject: Re: [PATCH]middle-end: check if target can do extract first for 
> > early breaks
> > [PR113199]
> >
> > Richard Biener  writes:
> >
> > > On Tue, 9 Jan 2024, Tamar Christina wrote:
> > >
> > >> > > -
> > >> > > -  gimple_seq_add_seq (, tem);
> > >> > > -
> > >> > > -  scalar_res = gimple_build (, CFN_EXTRACT_LAST, 
> > >> > > scalar_type,
> > >> > > -   mask, vec_lhs_phi);
> > >> > > +  scalar_res = gimple_build (, CFN_VEC_EXTRACT, TREE_TYPE
> > >> > (vectype),
> > >> > > + vec_lhs_phi, bitstart);
> > >> >
> > >> > So bitstart is always zero?  I wonder why using CFN_VEC_EXTRACT over
> > >> > BIT_FIELD_REF here which wouldn't need any additional target support.
> > >> >
> > >>
> > >> Ok, how about...
> > >>
> > >> ---
> > >>
> > >> I was generating the vector reverse mask without checking if the target
> > >> actually supported such an operation.
> > >>
> > >> This patch changes it so that if bitstart is 0, BIT_FIELD_REF is used
> > >> instead to extract the first element, since this is supported by all targets.
> > >>
> > >> This is good for now since masks always come from whilelo.  But in the
> > >> future, when masks can come from other sources, we will need the old code back.
> > >>
> > >> Bootstrapped Regtested on aarch64-none-linux-gnu, x86_64-pc-linux-gnu
> > >> and no issues with --enable-checking=release --enable-lto
> > >> --with-build-config=bootstrap-O3 --enable-checking=yes,rtl,extra.
> > >> tested on cross cc1 for amdgcn-amdhsa and issue fixed.
> > >>
> > >> Ok for master?
> > >
> > > OK.
> > >
> > >> Thanks,
> > >> Tamar
> > >>
> > >> gcc/ChangeLog:
> > >>
> > >>PR tree-optimization/113199
> > >>* tree-vect-loop.cc (vectorizable_live_operation_1): Use
> > >>BIT_FIELD_REF.
> >
> > This patch broke bootstrap (everywhere, it seems; seen on
> > i386-pc-solaris2.11 and sparc-sun-solaris2.11):
> >
> > /vol/gcc/src/hg/master/local/gcc/tree-vect-loop.cc: In function 'tree_node* vectorizable_live_operation_1(loop_vec_info, stmt_vec_info, basic_block, tree, int, slp_tree, tree, tree, tree, tree, bool, gimple_stmt_iterator*)':
> > /vol/gcc/src/hg/master/local/gcc/tree-vect-loop.cc:10598:52: error: unused parameter 'restart_loop' [-Werror=unused-parameter]
> > 10598 |tree lhs_type, bool restart_loop,
> >   |   ~^~~~
> >
> >   Rainer
> >
> > --
> > -
> > Rainer Orth, Center for Biotechnology, Bielefeld University



-- 
H.J.


RE: [PATCH]middle-end: check if target can do extract first for early breaks [PR113199]

2024-01-09 Thread Tamar Christina
Hmm, I'm confused as to why it didn't break mine... I just did one again.
Anyway, I'll remove the unused variable.

> -----Original Message-----
> From: Rainer Orth
> Sent: Tuesday, January 9, 2024 4:06 PM
> To: Richard Biener
> Cc: Tamar Christina ; gcc-patches@gcc.gnu.org; nd ; j...@ventanamicro.com
> Subject: Re: [PATCH]middle-end: check if target can do extract first for early breaks [PR113199]
> 
> Richard Biener  writes:
> 
> > On Tue, 9 Jan 2024, Tamar Christina wrote:
> >
> >> > > -
> >> > > -  gimple_seq_add_seq (&stmts, tem);
> >> > > -
> >> > > -  scalar_res = gimple_build (&stmts, CFN_EXTRACT_LAST, scalar_type,
> >> > > -   mask, vec_lhs_phi);
> >> > > +  scalar_res = gimple_build (&stmts, CFN_VEC_EXTRACT, TREE_TYPE (vectype),
> >> > > + vec_lhs_phi, bitstart);
> >> >
> >> > So bitstart is always zero?  I wonder why using CFN_VEC_EXTRACT over
> >> > BIT_FIELD_REF here which wouldn't need any additional target support.
> >> >
> >>
> >> Ok, how about...
> >>
> >> ---
> >>
> >> I was generating the vector reverse mask without checking if the target
> >> actually supported such an operation.
> >>
> >> This patch changes it so that, if bitstart is 0, a BIT_FIELD_REF is used
> >> instead to extract the first element, since this is supported by all targets.
> >>
> >> This is good for now since masks always come from whilelo.  But in the future,
> >> when masks can come from other sources, we will need the old code back.
> >>
> >> Bootstrapped and regtested on aarch64-none-linux-gnu and x86_64-pc-linux-gnu,
> >> and no issues with --enable-checking=release --enable-lto
> >> --with-build-config=bootstrap-O3 --enable-checking=yes,rtl,extra.
> >> Tested on a cross cc1 for amdgcn-amdhsa; the issue is fixed.
> >>
> >> Ok for master?
> >
> > OK.
> >
> >> Thanks,
> >> Tamar
> >>
> >> gcc/ChangeLog:
> >>
> >>PR tree-optimization/113199
> >>* tree-vect-loop.cc (vectorizable_live_operation_1): Use
> >>BIT_FIELD_REF.
> 
> This patch broke bootstrap (everywhere, it seems; seen on
> i386-pc-solaris2.11 and sparc-sun-solaris2.11):
> 
> /vol/gcc/src/hg/master/local/gcc/tree-vect-loop.cc: In function 'tree_node* vectorizable_live_operation_1(loop_vec_info, stmt_vec_info, basic_block, tree, int, slp_tree, tree, tree, tree, tree, bool, gimple_stmt_iterator*)':
> /vol/gcc/src/hg/master/local/gcc/tree-vect-loop.cc:10598:52: error: unused parameter 'restart_loop' [-Werror=unused-parameter]
> 10598 |tree lhs_type, bool restart_loop,
>   |   ~^~~~
> 
>   Rainer
> 
> --
> -
> Rainer Orth, Center for Biotechnology, Bielefeld University


Re: [PATCH]middle-end: check if target can do extract first for early breaks [PR113199]

2024-01-09 Thread Rainer Orth
Richard Biener  writes:

> On Tue, 9 Jan 2024, Tamar Christina wrote:
>
>> > > -
>> > > -  gimple_seq_add_seq (&stmts, tem);
>> > > -
>> > > -  scalar_res = gimple_build (&stmts, CFN_EXTRACT_LAST, scalar_type,
>> > > - mask, vec_lhs_phi);
>> > > +scalar_res = gimple_build (&stmts, CFN_VEC_EXTRACT, TREE_TYPE (vectype),
>> > > +   vec_lhs_phi, bitstart);
>> > 
>> > So bitstart is always zero?  I wonder why using CFN_VEC_EXTRACT over
>> > BIT_FIELD_REF here which wouldn't need any additional target support.
>> > 
>> 
>> Ok, how about...
>> 
>> ---
>> 
>> I was generating the vector reverse mask without checking if the target
>> actually supported such an operation.
>> 
>> This patch changes it so that, if bitstart is 0, a BIT_FIELD_REF is used
>> instead to extract the first element, since this is supported by all targets.
>>
>> This is good for now since masks always come from whilelo.  But in the future,
>> when masks can come from other sources, we will need the old code back.
>>
>> Bootstrapped and regtested on aarch64-none-linux-gnu and x86_64-pc-linux-gnu,
>> and no issues with --enable-checking=release --enable-lto
>> --with-build-config=bootstrap-O3 --enable-checking=yes,rtl,extra.
>> Tested on a cross cc1 for amdgcn-amdhsa; the issue is fixed.
>> 
>> Ok for master?
>
> OK.
>
>> Thanks,
>> Tamar
>> 
>> gcc/ChangeLog:
>> 
>>  PR tree-optimization/113199
>>  * tree-vect-loop.cc (vectorizable_live_operation_1): Use
>>  BIT_FIELD_REF.

This patch broke bootstrap (everywhere, it seems; seen on
i386-pc-solaris2.11 and sparc-sun-solaris2.11):

/vol/gcc/src/hg/master/local/gcc/tree-vect-loop.cc: In function 'tree_node* vectorizable_live_operation_1(loop_vec_info, stmt_vec_info, basic_block, tree, int, slp_tree, tree, tree, tree, tree, bool, gimple_stmt_iterator*)':
/vol/gcc/src/hg/master/local/gcc/tree-vect-loop.cc:10598:52: error: unused parameter 'restart_loop' [-Werror=unused-parameter]
10598 |tree lhs_type, bool restart_loop,
  |   ~^~~~
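The failure above is the usual -Wunused-parameter pattern: the warning is enabled by -Wall -Wextra, and -Werror turns it into a hard error, as the log shows. A minimal C sketch of the two common remedies follows; the helper names are hypothetical and not taken from tree-vect-loop.cc. Removing the parameter outright, as proposed in the thread, is the cleaner fix; marking it unused (GCC's own sources typically spell this ATTRIBUTE_UNUSED) is the stopgap.

```c
#include <stdbool.h>

/* Hypothetical helper: the flag is no longer consulted, so it is
   explicitly marked unused to keep -Wall -Wextra -Werror builds quiet.  */
static int
first_lane_marked (const int *lanes, bool restart_loop __attribute__ ((unused)))
{
  return lanes[0];
}

/* Hypothetical helper after the cleaner fix: the unused parameter is gone,
   and the function stays static since it has no external callers.  */
static int
first_lane_removed (const int *lanes)
{
  return lanes[0];
}
```

Keeping such helpers static also means the compiler can warn (-Wunused-function) if a later change leaves them without callers.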

Rainer

-- 
-
Rainer Orth, Center for Biotechnology, Bielefeld University


Re: [PATCH]middle-end: check if target can do extract first for early breaks [PR113199]

2024-01-09 Thread H.J. Lu
On Tue, Jan 9, 2024 at 4:13 AM Richard Biener  wrote:
>
> On Tue, 9 Jan 2024, Tamar Christina wrote:
>
> > > > -
> > > > -  gimple_seq_add_seq (&stmts, tem);
> > > > -
> > > > -  scalar_res = gimple_build (&stmts, CFN_EXTRACT_LAST, scalar_type,
> > > > -  mask, vec_lhs_phi);
> > > > + scalar_res = gimple_build (&stmts, CFN_VEC_EXTRACT, TREE_TYPE (vectype),
> > > > +vec_lhs_phi, bitstart);
> > >
> > > So bitstart is always zero?  I wonder why using CFN_VEC_EXTRACT over
> > > BIT_FIELD_REF here which wouldn't need any additional target support.
> > >
> >
> > Ok, how about...
> >
> > ---
> >
> > I was generating the vector reverse mask without checking if the target
> > actually supported such an operation.
> >
> > This patch changes it so that, if bitstart is 0, a BIT_FIELD_REF is used
> > instead to extract the first element, since this is supported by all targets.
> >
> > This is good for now since masks always come from whilelo.  But in the future,
> > when masks can come from other sources, we will need the old code back.
> >
> > Bootstrapped and regtested on aarch64-none-linux-gnu and x86_64-pc-linux-gnu,
> > and no issues with --enable-checking=release --enable-lto
> > --with-build-config=bootstrap-O3 --enable-checking=yes,rtl,extra.
> > Tested on a cross cc1 for amdgcn-amdhsa; the issue is fixed.
> >
> > Ok for master?
>
> OK.
>
> > Thanks,
> > Tamar
> >
> > gcc/ChangeLog:
> >
> >   PR tree-optimization/113199
> >   * tree-vect-loop.cc (vectorizable_live_operation_1): Use
> >   BIT_FIELD_REF.
> >
> > gcc/testsuite/ChangeLog:
> >
> >   PR tree-optimization/113199
> >   * gcc.target/gcn/pr113199.c: New test.
> >
> > --- inline copy of patch ---
> >
> > diff --git a/gcc/testsuite/gcc.target/gcn/pr113199.c b/gcc/testsuite/gcc.target/gcn/pr113199.c
> > new file mode 100644
> > index ..8a641e5536e80e207ca0163cac66c0f4f6ca93f7
> > --- /dev/null
> > +++ b/gcc/testsuite/gcc.target/gcn/pr113199.c
> > @@ -0,0 +1,44 @@
> > +/* { dg-do compile } */
> > +/* { dg-additional-options "-O2" } */
> > +
> > +typedef long unsigned int size_t;
> > +typedef int wchar_t;
> > +struct tm
> > +{
> > +  int tm_mon;
> > +  int tm_year;
> > +};
> > +int abs (int);
> > +struct lc_time_T { const char *month[12]; };
> > +struct __locale_t * __get_current_locale (void) { }
> > +const struct lc_time_T * __get_time_locale (struct __locale_t *locale) { }
> > +const wchar_t * __ctloc (wchar_t *buf, const char *elem, size_t *len_ret) { return buf; }
> > +size_t
> > +__strftime (wchar_t *s, size_t maxsize, const wchar_t *format,
> > + const struct tm *tim_p, struct __locale_t *locale)
> > +{
> > +  size_t count = 0;
> > +  const wchar_t *ctloc;
> > +  wchar_t ctlocbuf[256];
> > +  size_t i, ctloclen;
> > +  const struct lc_time_T *_CurrentTimeLocale = __get_time_locale (locale);
> > +{
> > +  switch (*format)
> > + {
> > + case L'B':
> > +   (ctloc = __ctloc (ctlocbuf, _CurrentTimeLocale->month[tim_p->tm_mon], &ctloclen));
> > +   for (i = 0; i < ctloclen; i++)
> > + {
> > +   if (count < maxsize - 1)
> > +  s[count++] = ctloc[i];
> > +   else
> > +  return 0;
> > +   {
> > +  int century = tim_p->tm_year >= 0
> > +? tim_p->tm_year / 100 + 1900 / 100
> > +: abs (tim_p->tm_year + 1900) / 100;
> > +   }
> > +   }
> > + }
> > +}
> > +}
> > diff --git a/gcc/tree-vect-loop.cc b/gcc/tree-vect-loop.cc
> > index 37f1be1101ffae779214056a0886411e0683e887..39b1161309d8ff8bfe88ee26df9147df0af0a58c 100644
> > --- a/gcc/tree-vect-loop.cc
> > +++ b/gcc/tree-vect-loop.cc
> > @@ -10592,7 +10592,17 @@ vectorizable_live_operation_1 (loop_vec_info loop_vinfo,
> >
> >gimple_seq stmts = NULL;
> >tree new_tree;
> > -  if (LOOP_VINFO_FULLY_WITH_LENGTH_P (loop_vinfo))
> > +
> > +  /* If bitstart is 0 then we can use a BIT_FIELD_REF  */
> > +  if (integer_zerop (bitstart))
> > +{
> > +  tree scalar_res = gimple_build (&stmts, BIT_FIELD_REF, TREE_TYPE (vectype),
> > +vec_lhs_phi, bitsize, bitstart);
> > +
> > +  /* Convert the extracted vector element to the scalar type.  */
> > +  new_tree = gimple_convert (&stmts, lhs_type, scalar_res);
> > +}
> > +  else if (LOOP_VINFO_FULLY_WITH_LENGTH_P (loop_vinfo))
> >  {
> >/* Emit:
> >
> > @@ -10618,12 +10628,6 @@ vectorizable_live_operation_1 (loop_vec_info loop_vinfo,
> >tree last_index = gimple_build (&stmts, PLUS_EXPR, TREE_TYPE (len),
> >len, bias_minus_one);
> >
> > -  /* This needs to implement extraction of the first index, but not sure
> > -  how the LEN stuff works.  At the moment we shouldn't get here since
> > -  there's no LEN support for early breaks.  But guard this so there's
> > -  no incorrect codegen.  */
> > -  gcc_assert (!LOOP_VINFO_EARLY_BREAKS (loop_vinfo));
> > -
> >/* SCALAR_RES = VEC_EXTRACT .  */
> 

Re: [PATCH] libgccjit: Support signed char flag

2024-01-09 Thread David Malcolm
On Thu, 2023-12-21 at 08:42 -0500, Antoni Boucher wrote:
> Hi.
> This patch adds support for the -fsigned-char flag.

Thanks.  The patch looks correct to me.

> I'm not sure how to test it since I stumbled upon this bug when I
> found this other bug (https://gcc.gnu.org/bugzilla/show_bug.cgi?id=107863)
> which is now fixed.
> Any idea how I could test this patch?

We already document that GCC_JIT_TYPE_CHAR has "some signedness".  The
bug being fixed here is that gcc_jit_context compilations were always
treating "char" as unsigned, regardless of the value of -fsigned-char
(either from the target's default, or as a context option), when it
makes more sense to follow the C frontend's behavior.

So perhaps jit-written code with a context that has -fsigned-char as an
option (via gcc_jit_context_add_command_line_option), and which
promotes a negative char to a signed int, and then returns the result
as an int?  Presumably if we're erroneously forcing "char" to be
unsigned, the int will be in the range 0x80 to 0xff, rather than being
negative.
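The check described above can be pinned down in plain C first. The sketch below (hypothetical function names, assuming 8-bit chars) captures the invariant a jit test would assert: promoting the char value 0x80 to int yields a negative number exactly when plain char is signed. A libgccjit version would build the equivalent function through the jit API after calling gcc_jit_context_add_command_line_option (ctxt, "-fsigned-char"), and would then expect -128 rather than 128.

```c
#include <limits.h>
#include <stdbool.h>

/* Promote a plain char to int, as the jit-built test function would.  */
int
promote_char (char c)
{
  return c;
}

/* True when plain char is signed under the options in effect
   (the target default, -fsigned-char or -funsigned-char).  */
bool
plain_char_is_signed (void)
{
  return CHAR_MIN < 0;
}
```

Compiling this with -fsigned-char and again with -funsigned-char flips the result of plain_char_is_signed, while the relationship between the two functions stays the same.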

Dave



[PATCH] aarch64: Fix dwarf2cfi ICEs due to recent CFI note changes [PR113077]

2024-01-09 Thread Alex Coplan
Hi,

In r14-6604-gd7ee988c491cde43d04fe25f2b3dbad9d85ded45 we changed the CFI notes
attached to callee saves (in aarch64_save_callee_saves).  That patch changed
the ldp/stp representation to use unspecs instead of PARALLEL moves.  This meant
that we needed to attach CFI notes to all frame-related pair saves such that
dwarf2cfi could still emit the appropriate CFI (it cannot interpret the unspecs
directly).  The patch also attached REG_CFA_OFFSET notes to individual saves so
that the ldp/stp pass could easily preserve them when forming stps.

In that change I chose to use REG_CFA_OFFSET, but as the PR shows, that
choice was problematic in that REG_CFA_OFFSET requires the attached
store to be expressed in terms of the current CFA register at all times.
This means that even scheduling of frame-related insns can break this
invariant, leading to ICEs in dwarf2cfi.

The old behaviour (before that change) allowed dwarf2cfi to interpret the RTL
directly for sp-relative saves.  This change restores that behaviour by using
REG_FRAME_RELATED_EXPR instead of REG_CFA_OFFSET.  REG_FRAME_RELATED_EXPR
effectively just gives a different pattern for dwarf2cfi to look at instead of
the main insn pattern.  That allows us to attach the old-style PARALLEL move
representation in a REG_FRAME_RELATED_EXPR note and means we are free to always
express the save addresses in terms of the stack pointer.

Since the ldp/stp fusion pass can combine frame-related stores, this patch also
updates it to preserve REG_FRAME_RELATED_EXPR notes, and additionally gives it
the ability to synthesize those notes when combining sp-relative saves into an
stp (the combined stp always needs a note due to the unspec representation,
whereas the individual sp-relative saves do not).

Bootstrapped/regtested on aarch64-linux-gnu, OK for trunk?

Thanks,
Alex

gcc/ChangeLog:

PR target/113077
* config/aarch64/aarch64-ldp-fusion.cc (filter_notes): Add fr_expr param to
extract REG_FRAME_RELATED_EXPR notes.
(combine_reg_notes): Handle REG_FRAME_RELATED_EXPR notes, and
synthesize these if needed.  Update caller ...
(ldp_bb_info::fuse_pair): ... here.
* config/aarch64/aarch64.cc (aarch64_save_callee_saves): Use
REG_FRAME_RELATED_EXPR instead of REG_CFA_OFFSET.

gcc/testsuite/ChangeLog:

PR target/113077
* gcc.target/aarch64/pr113077.c: New test.
diff --git a/gcc/config/aarch64/aarch64-ldp-fusion.cc b/gcc/config/aarch64/aarch64-ldp-fusion.cc
index 2fe1b1d4d84..00bc8b749c8 100644
--- a/gcc/config/aarch64/aarch64-ldp-fusion.cc
+++ b/gcc/config/aarch64/aarch64-ldp-fusion.cc
@@ -904,9 +904,11 @@ aarch64_operand_mode_for_pair_mode (machine_mode mode)
 // Go through the reg notes rooted at NOTE, dropping those that we should drop,
 // and preserving those that we want to keep by prepending them to (and
 // returning) RESULT.  EH_REGION is used to make sure we have at most one
-// REG_EH_REGION note in the resulting list.
+// REG_EH_REGION note in the resulting list.  FR_EXPR is used to return any
+// REG_FRAME_RELATED_EXPR note we find, as these can need special handling in
+// combine_reg_notes.
 static rtx
-filter_notes (rtx note, rtx result, bool *eh_region)
+filter_notes (rtx note, rtx result, bool *eh_region, rtx *fr_expr)
 {
   for (; note; note = XEXP (note, 1))
 {
@@ -940,6 +942,10 @@ filter_notes (rtx note, rtx result, bool *eh_region)
   copy_rtx (XEXP (note, 0)),
   result);
  break;
+   case REG_FRAME_RELATED_EXPR:
+ gcc_assert (!*fr_expr);
+ *fr_expr = copy_rtx (XEXP (note, 0));
+ break;
default:
  // Unexpected REG_NOTE kind.
  gcc_unreachable ();
@@ -951,13 +957,65 @@ filter_notes (rtx note, rtx result, bool *eh_region)
 
 // Return the notes that should be attached to a combination of I1 and I2, where
 // *I1 < *I2.
+//
+// LOAD_P is true for loads, REVERSED is true if the insns in
+// program order are not in offset order, BASE_REGNO is the chosen base
+// register number for the pair, and PATS gives the final RTL patterns for the
+// accesses.
 static rtx
-combine_reg_notes (insn_info *i1, insn_info *i2)
+combine_reg_notes (insn_info *i1, insn_info *i2,
+  bool load_p, bool reversed,
+  int base_regno, rtx pats[2])
 {
+  // Temporary storage for REG_FRAME_RELATED_EXPR notes.
+  rtx fr_expr[2] = {};
+
   bool found_eh_region = false;
   rtx result = NULL_RTX;
-  result = filter_notes (REG_NOTES (i2->rtl ()), result, &found_eh_region);
-  return filter_notes (REG_NOTES (i1->rtl ()), result, &found_eh_region);
+  result = filter_notes (REG_NOTES (i2->rtl ()), result,
+                         &found_eh_region, fr_expr);
+  result = filter_notes (REG_NOTES (i1->rtl ()), result,
+                         &found_eh_region, fr_expr + 1);
+
+  if (!load_p)
+{
+  // Frame-related saves must either be sp-based or must already have
+  // a REG_FRAME_RELATED_EXPR note.
+

Re: [PATCH] SECURITY.txt: Drop "exploitable" in reference to hardening issues

2024-01-09 Thread Richard Biener



> On 09.01.2024 at 16:13, Siddhesh Poyarekar wrote:
> 
> On 2023-12-18 09:35, Siddhesh Poyarekar wrote:
>> The phrase "exploitable vulnerability" may lead to the misunderstanding that
>> missed hardening issues are considered vulnerabilities, just not exploitable
>> ones.  This is not true: while hardening bugs may be security-relevant, the
>> absence of hardening does not by itself make a program any more vulnerable
>> to exploits.
>> Drop the word "exploitable" to make it clear that missed hardening is not
>> considered a vulnerability.
> 
> Ping, may I commit this if there are no objections?

Go ahead.

Richard 

> Thanks,
> Sid
> 
>> diff --git a/SECURITY.txt b/SECURITY.txt
>> index b3e2bbfda90..126603d4c22 100644
>> --- a/SECURITY.txt
>> +++ b/SECURITY.txt
>> @@ -155,10 +155,10 @@ Security features implemented in GCC
>>  GCC implements a number of security features that reduce the impact
>>  of security issues in applications, such as -fstack-protector,
>>  -fstack-clash-protection, _FORTIFY_SOURCE and so on.  A failure of
>> -these features to function perfectly in all situations is not an
>> -exploitable vulnerability in itself since it does not affect the
>> -correctness of programs.  Further, they're dependent on heuristics
>> -and may not always have full coverage for protection.
>> +these features to function perfectly in all situations is not a
>> +vulnerability in itself since it does not affect the correctness of
>> +programs.  Further, they're dependent on heuristics and may not
>> +always have full coverage for protection.
>>  Similarly, GCC may transform code in a way that the correctness of
>>  the expressed algorithm is preserved, but supplementary properties


Re: [PATCH] SECURITY.txt: Drop "exploitable" in reference to hardening issues

2024-01-09 Thread Siddhesh Poyarekar

On 2023-12-18 09:35, Siddhesh Poyarekar wrote:
The phrase "exploitable vulnerability" may lead to the misunderstanding that
missed hardening issues are considered vulnerabilities, just not exploitable
ones.  This is not true: while hardening bugs may be security-relevant, the
absence of hardening does not by itself make a program any more vulnerable
to exploits.

Drop the word "exploitable" to make it clear that missed hardening is
not considered a vulnerability.


Ping, may I commit this if there are no objections?

Thanks,
Sid



diff --git a/SECURITY.txt b/SECURITY.txt
index b3e2bbfda90..126603d4c22 100644
--- a/SECURITY.txt
+++ b/SECURITY.txt
@@ -155,10 +155,10 @@ Security features implemented in GCC
  GCC implements a number of security features that reduce the impact
  of security issues in applications, such as -fstack-protector,
  -fstack-clash-protection, _FORTIFY_SOURCE and so on.  A failure of
-    these features to function perfectly in all situations is not an
-    exploitable vulnerability in itself since it does not affect the
-    correctness of programs.  Further, they're dependent on heuristics
-    and may not always have full coverage for protection.
+    these features to function perfectly in all situations is not a
+    vulnerability in itself since it does not affect the correctness of
+    programs.  Further, they're dependent on heuristics and may not
+    always have full coverage for protection.

  Similarly, GCC may transform code in a way that the correctness of
  the expressed algorithm is preserved, but supplementary properties



Re: [PATCH] RISC-V: Also handle sign extension in branch costing

2024-01-09 Thread Jeff Law




On 1/7/24 17:06, Maciej W. Rozycki wrote:

Complement commit c1e8cb3d9f94 ("RISC-V: Rework branch costing model for
if-conversion") and also handle extraneous sign extend operations that
are sometimes produced by `noce_try_cmove_arith' instead of zero extend
operations, making branch costing consistent.  It is unclear what the
condition is for the middle end to choose between the zero extend and
sign extend operation, but the test case included uses sign extension
with 64-bit targets, preventing if-conversion from triggering across all
the architectural variants.

There are further anomalies revealed by the test case, specifically the
exceedingly high branch cost of 6 required for the `-mmovcc' variant
despite the final branchless sequence using only 4 instructions, the
missed conversion at -O1 for 32-bit targets even though code is machine
word size agnostic, and the missed conversion at -Os and -Oz for 32-bit
Zicond targets even though the branchless sequence would be shorter than
the branched one.  These will have to be handled separately.

gcc/
* config/riscv/riscv.cc (riscv_noce_conversion_profitable_p):
Also handle sign extension.

gcc/testsuite/
* gcc.target/riscv/cset-sext-sfb.c: New test.
* gcc.target/riscv/cset-sext-thead.c: New test.
* gcc.target/riscv/cset-sext-ventana.c: New test.
* gcc.target/riscv/cset-sext-zicond.c: New test.
* gcc.target/riscv/cset-sext.c: New test.
---
Hi,

  This is still in regression-testing, but as this is only a branch costing
adjustment I don't expect any code correctness issues, and the performance
advantage seems very obvious, as the sign extend operation applied to the
result of a conditional set instruction is always a no-op, just as with
the zero extension.
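Why the extension is a no-op can be checked directly: a conditional-set instruction (slt and friends on RISC-V) produces only 0 or 1, and for those two values sign and zero extension from 32 to 64 bits give the same result. A small C model of that argument, with hypothetical helper names:

```c
#include <stdint.h>

/* Model of a conditional-set instruction: the result is always 0 or 1.  */
int32_t
cset_lt (int32_t a, int32_t b)
{
  return a < b;
}

/* 32-to-64-bit sign extension (what e.g. sext.w does on RV64).  */
int64_t
sign_extend (int32_t v)
{
  return (int64_t) v;
}

/* 32-to-64-bit zero extension.  */
int64_t
zero_extend (int32_t v)
{
  return (int64_t) (uint32_t) v;
}
```

Since the two extensions only differ for values with bit 31 set, any extension applied to a 0/1 result can be costed as free, whichever kind the middle end emits.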

  Depending on how you look at it, you may qualify this as a bug fix (for
the commit referred to; it's surely a rare enough case that I missed it in
original testing) or a missed optimisation.  Either way it's a narrow-scoped,
very small change, almost an obviously correct one.  I'll be very happy to get
it off my plate now, but if it has to wait for GCC 15, I'll accept the
decision.

  OK to apply then or shall I wait?

OK to apply.

jeff


Re: [Committed] RISC-V: Use MAX instead of std::max [VSETVL PASS]

2024-01-09 Thread Jeff Law




On 1/7/24 16:07, 钟居哲 wrote:
Since Robin asked me in a previous review to change std::max into MAX,
I thought the policy was to prefer MAX over std::max.

I changed the code to make it consistent, but it seems I was wrong.

So is it reasonable for me to change all the RVV-related code back to
std::max/min?

If yes, I can send a patch to adapt all of the RVV-related code.

If Robin asked for MAX, let's leave it as-is.  It's not a hard
requirement, just a general direction towards std:: when we can.  It may
be that, with other code nearby using MAX, keeping consistency is
better.


jeff


Re: [PATCH] arm/aarch64: Add bti for all functions [PR106671]

2024-01-09 Thread Andrea Corallo
Andrea Corallo  writes:

> Feng Xue OS via Gcc-patches  writes:
>
>> This patch extends the option -mbranch-protection=bti with an optional
>> argument, bti[+all], to force the compiler to unconditionally insert a bti
>> in all functions.  This is needed because a direct function call at compile
>> time might be rewritten into an indirect call through some kind of
>> linker-generated thunk stub that relays the invocation.  One instance is
>> when a direct callee is placed far from its caller: the direct BL {imm}
>> instruction cannot represent the distance, so an indirect BLR {reg} has to
>> be used.  In this case, a bti is required at the beginning of the callee.
>>
>>    caller() {
>>        bl     callee
>>    }
>>
>> =>
>>
>>    caller() {
>>        adrp   reg, 
>>        add    reg, reg, #constant
>>        blr    reg
>>    }
>> 
>> Although the issue could be fixed with a fairly recent version of ld, this
>> patch provides another means for users who have to rely on an older ld or
>> on a linker other than ld.  I also checked LLVM: by default, it implements
>> bti just like the proposed -mbranch-protection=bti+all.
>>
>> Feng
>>
>> ---
>>  gcc/config/aarch64/aarch64.cc| 12 +++-
>>  gcc/config/aarch64/aarch64.opt   |  2 +-
>>  gcc/config/arm/aarch-bti-insert.cc   |  3 ++-
>>  gcc/config/arm/aarch-common.cc   | 22 ++
>>  gcc/config/arm/aarch-common.h| 18 ++
>>  gcc/config/arm/arm.cc|  4 ++--
>>  gcc/config/arm/arm.opt   |  2 +-
>>  gcc/doc/invoke.texi  | 16 ++--
>>  gcc/testsuite/gcc.target/aarch64/bti-5.c | 17 +
>>  9 files changed, 76 insertions(+), 20 deletions(-)
>>  create mode 100644 gcc/testsuite/gcc.target/aarch64/bti-5.c
>
> [...]
>
> Hi Feng,
>
> I think this patch is missing its ChangeLog entry.  Also you should
> specify the state of the testing and regression for this patch, please
> see [1].
>
>> diff --git a/gcc/testsuite/gcc.target/aarch64/bti-5.c b/gcc/testsuite/gcc.target/aarch64/bti-5.c
>> new file mode 100644
>> index 000..654cd0cce7e
>> --- /dev/null
>> +++ b/gcc/testsuite/gcc.target/aarch64/bti-5.c
>> @@ -0,0 +1,17 @@
>> +/* { dg-do run } */
>> +/* { dg-options "-O1 -save-temps" } */
>> +/* { dg-require-effective-target lp64 } */
>> +/* { dg-additional-options "-mbranch-protection=bti+all" { target { ! default_branch_protection } } } */

Also an afterthought: given that the patch enables this feature on arm as
well, wouldn't it be better to have a test case for arm too?

Thanks

  Andrea
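The motivation quoted above rests on the limited reach of a direct call. As a rough sketch, assuming the standard AArch64 BL encoding (a signed 26-bit immediate counted in 4-byte words, i.e. about +/-128 MiB), the hypothetical helper below decides whether a direct BL can reach its target. When it cannot, the linker has to route the call through a stub like the adrp/add/blr sequence in the example, which is why the callee then needs a bti at its entry.

```c
#include <stdbool.h>
#include <stdint.h>

/* BL encodes imm26 * 4 as a signed byte offset from the BL instruction:
   a multiple of 4 in [-2^27, 2^27 - 4].  */
bool
bl_can_reach (int64_t bl_address, int64_t target_address)
{
  const int64_t limit = (int64_t) 1 << 27;  /* 128 MiB */
  int64_t offset = target_address - bl_address;

  if (offset % 4 != 0)
    return false;
  return offset >= -limit && offset <= limit - 4;
}
```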


Re: [PATCH] c-family: copy attribute diagnostic fixes [PR113262]

2024-01-09 Thread Marek Polacek
On Tue, Jan 09, 2024 at 09:52:17AM +0100, Jakub Jelinek wrote:
> Hi!
> 
> The copy attribute is allowed on decls as well as types, and even has
> checks for whether decl (set to *node) is DECL_P or TYPE_P, but for
> diagnostics it unconditionally uses DECL_SOURCE_LOCATION (decl), which
> obviously only works if it applies to a decl.
> 
> The following patch fixes that, bootstrapped/regtested on x86_64-linux and
> i686-linux, ok for trunk?

Ok, thanks!
 
> 2024-01-09  Jakub Jelinek  
> 
>   PR c/113262
>   * c-attribs.cc (handle_copy_attribute): Don't use
>   DECL_SOURCE_LOCATION (decl) if decl is not DECL_P, use input_location
>   instead.  Formatting fixes.
> 
>   * gcc.dg/pr113262.c: New test.
> 
> --- gcc/c-family/c-attribs.cc.jj  2024-01-03 12:07:02.020736256 +0100
> +++ gcc/c-family/c-attribs.cc 2024-01-08 22:10:04.789616664 +0100
> @@ -3143,13 +3143,14 @@ handle_copy_attribute (tree *node, tree
>if (ref == error_mark_node)
>  return NULL_TREE;
>  
> +  location_t loc = input_location;
> +  if (DECL_P (decl))
> +loc = DECL_SOURCE_LOCATION (decl);
>if (TREE_CODE (ref) == STRING_CST)
>  {
>/* Explicitly handle this case since using a string literal
>as an argument is a likely mistake.  */
> -  error_at (DECL_SOURCE_LOCATION (decl),
> - "%qE attribute argument cannot be a string",
> - name);
> +  error_at (loc, "%qE attribute argument cannot be a string", name);
>return NULL_TREE;
>  }
>  
> @@ -3160,10 +3161,8 @@ handle_copy_attribute (tree *node, tree
>/* Similar to the string case, since some function attributes
>accept literal numbers as arguments (e.g., alloc_size or
>nonnull) using one here is a likely mistake.  */
> -  error_at (DECL_SOURCE_LOCATION (decl),
> - "%qE attribute argument cannot be a constant arithmetic "
> - "expression",
> - name);
> +  error_at (loc, "%qE attribute argument cannot be a constant arithmetic "
> + "expression", name);
>return NULL_TREE;
>  }
>  
> @@ -3171,12 +3170,11 @@ handle_copy_attribute (tree *node, tree
>  {
>/* Another possible mistake (but indirect self-references aren't
>and diagnosed and shouldn't be).  */
> -  if (warning_at (DECL_SOURCE_LOCATION (decl), OPT_Wattributes,
> +  if (warning_at (loc, OPT_Wattributes,
> "%qE attribute ignored on a redeclaration "
> -   "of the referenced symbol",
> -   name))
> - inform (DECL_SOURCE_LOCATION (node[1]),
> - "previous declaration here");
> +   "of the referenced symbol", name)
> +   && DECL_P (node[1]))
> + inform (DECL_SOURCE_LOCATION (node[1]), "previous declaration here");
>return NULL_TREE;
>  }
>  
> @@ -3196,7 +3194,8 @@ handle_copy_attribute (tree *node, tree
>   ref = TREE_OPERAND (ref, 1);
>else
>   break;
> -} while (!DECL_P (ref));
> +}
> +  while (!DECL_P (ref));
>  
>/* For object pointer expressions, consider those to be requests
>   to copy from their type, such as in:
> @@ -3228,8 +3227,7 @@ handle_copy_attribute (tree *node, tree
>to a variable, or variable attributes to a function.  */
> if (warning (OPT_Wattributes,
>  "%qE attribute ignored on a declaration of "
> -"a different kind than referenced symbol",
> -name)
> +"a different kind than referenced symbol", name)
> && DECL_P (ref))
>   inform (DECL_SOURCE_LOCATION (ref),
>   "symbol %qD referenced by %qD declared here", ref, decl);
> @@ -3279,9 +3277,7 @@ handle_copy_attribute (tree *node, tree
>  }
>else if (!TYPE_P (decl))
>  {
> -  error_at (DECL_SOURCE_LOCATION (decl),
> - "%qE attribute must apply to a declaration",
> - name);
> +  error_at (loc, "%qE attribute must apply to a declaration", name);
>return NULL_TREE;
>  }
>  
> --- gcc/testsuite/gcc.dg/pr113262.c.jj  2024-01-08 22:19:07.414588762 +0100
> +++ gcc/testsuite/gcc.dg/pr113262.c   2024-01-08 22:18:51.327815573 +0100
> @@ -0,0 +1,6 @@
> +/* PR c/113262 */
> +/* { dg-do compile } */
> +/* { dg-options "" } */
> +
> +int [[gnu::copy ("")]] a; /* { dg-error "'copy' attribute argument cannot be a string" } */
> +
> 
>   Jakub
> 

Marek



RE: [PATCH]middle-end: Fix dominators updates when peeling with multiple exits [PR113144]

2024-01-09 Thread Richard Biener
On Tue, 9 Jan 2024, Tamar Christina wrote:

> > -----Original Message-----
> > From: Richard Biener
> > Sent: Tuesday, January 9, 2024 1:51 PM
> > To: Tamar Christina
> > Cc: gcc-patches@gcc.gnu.org; nd ; j...@ventanamicro.com
> > Subject: RE: [PATCH]middle-end: Fix dominators updates when peeling with multiple exits [PR113144]
> > 
> > On Tue, 9 Jan 2024, Richard Biener wrote:
> > 
> > > On Tue, 9 Jan 2024, Tamar Christina wrote:
> > >
> > > >
> > > >
> > > > > -----Original Message-----
> > > > > From: Richard Biener
> > > > > Sent: Tuesday, January 9, 2024 12:26 PM
> > > > > To: Tamar Christina
> > > > > Cc: gcc-patches@gcc.gnu.org; nd ; j...@ventanamicro.com
> > > > > Subject: RE: [PATCH]middle-end: Fix dominators updates when peeling with multiple exits [PR113144]
> > > > >
> > > > > On Tue, 9 Jan 2024, Tamar Christina wrote:
> > > > >
> > > > > > > This makes it quadratic in the number of vectorized early exit loops
> > > > > > > in a function.  The vectorizer CFG manipulation operates in a local
> > > > > > > enough bubble that programmatic updating of dominators should be
> > > > > > > possible (after all we manage to produce correct SSA form!), the
> > > > > > > proposed change gets us too far off to a point where recomputing
> > > > > > > dominance info is likely cheaper (but no, we shouldn't do this either).
> > > > > > >
> > > > > > > Can you instead give manual updating a try again?  I think
> > > > > > > versioning should produce up-to-date dominator info, it's only
> > > > > > > when you redirect branches during peeling that you'd need
> > > > > > > adjustments - but IIRC we're never introducing new merges?
> > > > > > >
> > > > > > > IIRC we can't wipe dominators during transform since we query them
> > > > > > > during code generation.  We possibly could code generate all
> > > > > > > CFG manipulations of all vectorized loops, recompute all dominators
> > > > > > > and then do code generation of all vectorized loops.
> > > > > > >
> > > > > > > But then we're doing a loop transform and the exits will ultimately
> > > > > > > end up in the same place, so the CFG and dominator update is bound
> > > > > > > to where the original exits went to.
> > > > > >
> > > > > > Yeah, that's a fair point; the issue is specifically with at_exit.
> > > > > > So how about:
> > > > > >
> > > > > > When we peel at_exit we are moving the new loop to the exit of the
> > > > > > previous loop.  This means that the blocks outside the loop that the
> > > > > > previous loop used to dominate are no longer being dominated by it.
> > > > >
> > > > > Hmm, indeed.  Note this does make the dominator update
> > > > > O(function-size), and when vectorizing multiple loops in a function
> > > > > this becomes quadratic.  That's quite unfortunate, so I wonder if we
> > > > > can delay the update for the parts where we do not need up-to-date
> > > > > dominators during vectorization (of course it gets fragile with
> > > > > having only partly correct dominators).
> > > >
> > > > Fair, I created https://gcc.gnu.org/bugzilla/show_bug.cgi?id=113290 and
> > > > will tackle it when I add SLP support in GCC 15.
> > > >
> > > > I think the problem, and the reason we do early dominator correction
> > > > and validation, is that the same function is used by loop distribution.
> > > >
> > > > But you're right that during vectorization we perform the dominator
> > > > update twice now.
> > >
> > > We're performing it at least once per multi-exit loop that is vectorized,
> > > covering all downstream blocks.
> > 
> > That is, consider sth like
> > 
> > int a[77];
> > 
> > int bar ();
> > void foo ()
> > {
> >   int val;
> > #define LOOP \
> >   val = bar (); \
> >   for (int i = 0; i < 77; ++i) \
> >     { \
> >       if (a[i] == val) \
> >         break; \
> >       a[i]++; \
> >     }
> > #define LOOP10 LOOP LOOP LOOP LOOP LOOP LOOP LOOP LOOP LOOP LOOP
> > #define LOOP100 LOOP10 LOOP10 LOOP10 LOOP10 LOOP10 \
> >   LOOP10 LOOP10 LOOP10 LOOP10 LOOP10
> > #define LOOP1000 LOOP100 LOOP100 LOOP100 LOOP100 LOOP100 \
> >   LOOP100 LOOP100 LOOP100 LOOP100 LOOP100
> >   LOOP1000
> > }
> > 
> > where on x86_64 with -O3 -msse4.1 -fno-gcse -fno-gcse-after-reload we're
> > currently "fine" (calling iterate_fix_dominators 2000 times).  But if
> > you get all dominated blocks, you should see every block up to the exit
> > "fixed", and dominance computation may show up in the profile.
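A rough count of why this grows quadratically, assuming (for illustration only) that the N vectorized loops sit one after another, each contributes about b basic blocks, and the fix-up after vectorizing loop k revisits every block of the N - k loops downstream of it:

```latex
\sum_{k=1}^{N} b\,(N-k) \;=\; b\,\frac{N(N-1)}{2},
\qquad N = 1000:\quad b\cdot\frac{1000 \cdot 999}{2} = 499\,500\,b
```

By comparison, a single recompute after all loops are vectorized touches on the order of N b = 1000 b blocks per sweep.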
> > 
> 
> Yeah, that makes sense.  If we can move loop distribution to a different
> method, then we can just perform the dominator update only once for all
> loops at the end of vectorization, right?

Maybe.  We'll have to see.  It would be good to have a way to mark
dominator regions as invalid (you could simply disconnect the nodes
from the tree).  What we know is that within loops 
