Re: [PATCH 2/2] arm: Add cortex-m52 doc

2024-01-08 Thread Chung-Ju Wu



On 2024/01/08 22:32 UTC+8, Kyrylo Tkachov wrote:




-----Original Message-----
From: Chung-Ju Wu 
Sent: Monday, January 8, 2024 6:17 AM
To: gcc-patches ; Kyrylo Tkachov
; Richard Earnshaw 
Cc: jason...@anshingtek.com.tw
Subject: [PATCH 2/2] arm: Add cortex-m52 doc

Hi,

This is the patch to add cortex-m52 in the Arm-related options
sections of the gcc invoke.texi documentation.

Is it OK for trunk?


In the ChangeLog entry:
gcc/ChangeLog:

* doc/invoke.texi: Update docs.

Let's be more specific and specify something like
* doc/invoke.texi (Arm Options): Document Cortex-m52 options.

Ok with a better ChangeLog entry.


Hi Kyrylo,

Thanks for the suggestion and approval.
The patch has been revised and committed as:
https://gcc.gnu.org/g:43c4f982113076ad54c3405f865cc63b0a5ba5aa

Thanks,
jasonwucj



Thanks,
Kyrill


Regards,
jasonwucj


Re: [PATCH 1/2] arm: Add cortex-m52 core

2024-01-08 Thread Chung-Ju Wu



On 2024/01/08 22:31 UTC+8, Kyrylo Tkachov wrote:

Hi jasonwucj,


-----Original Message-----
From: Chung-Ju Wu 
Sent: Monday, January 8, 2024 6:16 AM
To: gcc-patches ; Kyrylo Tkachov
; Richard Earnshaw 
Cc: jason...@anshingtek.com.tw
Subject: [PATCH 1/2] arm: Add cortex-m52 core

Hi,

Recently, Arm announced the Cortex-M52, delivering increased performance
in DSP and ML along with a range of other features and benefits.
For the completeness of the Arm ecosystem, we hope that Cortex-M52 support
can be made available in GCC 14.

Attached is the patch to support the Cortex-M52 CPU with MVE and PACBTI
enabled in GCC.
Bootstrapped and tested on arm-none-eabi.

Is it OK for trunk?


The patch looks good to me. It should be safe to include it in GCC 14 as it 
doesn’t add any new logic beyond a new entry in arm-cpus.in.
Do you have commit rights to push it?


Hi Kyrylo,

Thanks for the approval.

Yes, I have commit rights.
The patch has been committed as:
https://gcc.gnu.org/g:6e249a9ad9d26fb01b147d33be9f9bfebca85c24

Thanks,
jasonwucj



Thanks,
Kyrill



Regards,
jasonwucj


RE: [PATCH v3] RISC-V: Bugfix for doesn't honor no-signed-zeros option

2024-01-08 Thread Li, Pan2
The test case pr30957-1.c originally comes from this commit, about 19 years
ago, which expects -1.0 as the result.

https://github.com/gcc-mirror/gcc/commit/290358f770d21d9204ea621f839ee8fba606a275

Then the commit below changed the expected value from -1.0 to +1.0 for this
test file only, because the variable expansion instantiates copies of the
accumulator and initializes them with +0.0.

https://github.com/gcc-mirror/gcc/commit/ffefa9288ab95b06b1dfed95e7235f4c09619a91

According to the implementation of insert_var_expansion_initialization,
zero_init will be CONST0_RTX (mode) if HONOR_SIGNED_ZEROS is false. If my
understanding is correct, maybe the test case is not well designed for
variable expansion in the unroller? At least, it is not a good idea to mix in
or rely on HONOR_SIGNED_ZEROS when testing -fvariable-expansion-in-unroller.

CCing the original author; please feel free to correct me if I have
misunderstood anything.

Pan

-----Original Message-----
From: Li, Pan2 
Sent: Tuesday, January 9, 2024 9:22 AM
To: Richard Biener 
Cc: gcc-patches@gcc.gnu.org; juzhe.zh...@rivai.ai; Wang, Yanzhang 
; kito.ch...@gmail.com; jeffreya...@gmail.com
Subject: RE: [PATCH v3] RISC-V: Bugfix for doesn't honor no-signed-zeros option

Thanks Richard B for comments.

> We don't really expect targets to do this.  The small testcase above
> is somewhat ill-formed with -fno-signed-zeros.  Note there's no
> -0.0 in pr30957-1.c so why does that one fail for you?  Does
> the -fvariable-expansion-in-unroller code maybe not trigger for
> riscv?

Sorry, this confuses me a little about the semantics of the option
-fno-signed-zeros. I wonder what the target/backend needs to do for this
option.

The failure comes from the code below in pr30957-1.c. The 0.0 / -5.0 is
initialized to -0.0 on RISC-V but to +0.0 on AArch64.

  if (__builtin_copysignf (1.0, foo (0.0 / -5.0, 10)) != 1.0)
    abort ();

If my understanding is correct, the loop is vectorized during
vect_transform_loop with a variable vectorization factor. It does not benefit
from unrolling/peeling, so loop->unroll is set to 1, and the tree-vect dump
contains a message similar to the one below:

Disabling unrolling due to variable-length vectorization factor.

> I think we should go to PR30957 and see what that was filed originally
> for, the testcase doesn't make much sense to me.

Sure thing, I will take a look and get back to you later.

Pan

-----Original Message-----
From: Richard Biener  
Sent: Monday, January 8, 2024 6:45 PM
To: Li, Pan2 
Cc: gcc-patches@gcc.gnu.org; juzhe.zh...@rivai.ai; Wang, Yanzhang 
; kito.ch...@gmail.com; jeffreya...@gmail.com
Subject: Re: [PATCH v3] RISC-V: Bugfix for doesn't honor no-signed-zeros option

On Tue, Jan 2, 2024 at 2:37 PM  wrote:
>
> From: Pan Li 
>
> According to the semantics of the no-signed-zeros option, a backend
> like RISC-V should treat minus zero -0.0f as plus zero 0.0f.
>
> Consider the example below, compiled with -fno-signed-zeros.
>
> void
> test (float *a)
> {
>   *a = -0.0;
> }
>
> We currently generate the code below, which does not treat minus zero
> as plus zero.
>
> test:
>   lui  a5,%hi(.LC0)
>   flw  fa5,%lo(.LC0)(a5)
>   fsw  fa5,0(a0)
>   ret
>
> .LC0:
>   .word -2147483648 // aka -0.0 (0x80000000 in hex)
>
> This patch fixes the bug by treating minus zero -0.0 as plus zero,
> aka +0.0. After this patch, we generate the asm code below for the
> above sample code.
>
> test:
>   sw zero,0(a0)
>   ret
>
> This patch also fixes the run failure of the test case pr30957-1.c. The
> following tests pass with this patch.

We don't really expect targets to do this.  The small testcase above
is somewhat ill-formed with -fno-signed-zeros.  Note there's no
-0.0 in pr30957-1.c so why does that one fail for you?  Does
the -fvariable-expansion-in-unroller code maybe not trigger for
riscv?

I think we should go to PR30957 and see what that was filed originally
for, the testcase doesn't make much sense to me.

> * The riscv regression tests.
> * The pr30957-1.c run tests.
>
> gcc/ChangeLog:
>
> * config/riscv/constraints.md: Leverage func riscv_float_const_zero_rtx_p
> for predicating whether the rtx is a const zero float.
> * config/riscv/predicates.md: Ditto.
> * config/riscv/riscv.cc (riscv_const_insns): Ditto.
> (riscv_float_const_zero_rtx_p): New func impl for predicating whether the
> rtx is a const zero float.
> (riscv_const_zero_rtx_p): New func impl for predicating whether the rtx
> is a const zero (both int and fp).
> * config/riscv/riscv-protos.h (riscv_float_const_zero_rtx_p):
> New func decl.
> (riscv_const_zero_rtx_p): Ditto.
> * config/riscv/riscv.md: Make sure the operand[1] of movfp is
> CONST0_RTX when the operand[1] is a const zero float.
>
> gcc/testsuite/ChangeLog:
>
> * gcc.target/riscv/no-signed-zeros-0.c: New test.
> * gcc.target/riscv/no-signed-zeros-1.c: New test.
> * gcc.target/riscv/no-signed-zeros-2.c: New test.
> * 

[wwwdocs][PATCH] gcc-14/changes: Update APX inline asm behavior for x86_64

2024-01-08 Thread Hongyu Wang
Hi,

This patch adds the missing description of the inline asm behavior and the
related compiler switch for APX.

Ok for gcc-wwwdocs?

---
 htdocs/gcc-14/changes.html | 6 ++
 1 file changed, 6 insertions(+)

diff --git a/htdocs/gcc-14/changes.html b/htdocs/gcc-14/changes.html
index e3a68998..73a90d30 100644
--- a/htdocs/gcc-14/changes.html
+++ b/htdocs/gcc-14/changes.html
@@ -342,6 +342,12 @@ a work-in-progress.
   NDD, PPX and PUSH2POP2. APX support is available via the
   -mapxf compiler switch.
   
+  For inline asm support with APX, the EGPR feature is disabled by
+  default to prevent illegal instructions with EGPR from being generated.
+  To allow EGPR usage in inline asm, use the new compiler option
+  -mapx-inline-asm-use-gpr32; the user must ensure that the instructions
+  support EGPR.
+  
   New ISA extension support for Intel AVX10.1 was added.
   AVX10.1 intrinsics are available via the -mavx10.1 or
   -mavx10.1-256 compiler switch with 256-bit vector size
-- 
2.31.1



[PATCH] i386: [APX] Document inline asm behavior and new switch for APX

2024-01-08 Thread Hongyu Wang
Hi,

For APX, the inline asm behavior was not documented anywhere before.
This patch adds a description of it.

Ok for trunk?

gcc/ChangeLog:

* config/i386/i386.opt: Adjust documentation.
* doc/invoke.texi: Add description for
-mapx-inline-asm-use-gpr32.
---
 gcc/config/i386/i386.opt | 3 +--
 gcc/doc/invoke.texi  | 7 +++
 2 files changed, 8 insertions(+), 2 deletions(-)

diff --git a/gcc/config/i386/i386.opt b/gcc/config/i386/i386.opt
index a38e92baf92..5b4f1bff25f 100644
--- a/gcc/config/i386/i386.opt
+++ b/gcc/config/i386/i386.opt
@@ -1357,8 +1357,7 @@ Enum(apx_features) String(all) Value(apx_all) Set(1)
 
 mapx-inline-asm-use-gpr32
 Target Var(ix86_apx_inline_asm_use_gpr32) Init(0)
-Enable GPR32 in inline asm when APX_EGPR enabled, do not
-hook reg or mem constraint in inline asm to GPR16.
+Enable GPR32 in inline asm when APX_F is enabled.
 
 mevex512
 Target Mask(ISA2_EVEX512) Var(ix86_isa_flags2) Save
diff --git a/gcc/doc/invoke.texi b/gcc/doc/invoke.texi
index 68d1f364ac0..47fd96648d8 100644
--- a/gcc/doc/invoke.texi
+++ b/gcc/doc/invoke.texi
@@ -35272,6 +35272,13 @@ r8-r15 registers so that the call and jmp instruction length is 6 bytes
 to allow them to be replaced with @samp{lfence; call *%r8-r15} or
 @samp{lfence; jmp *%r8-r15} at run-time.
 
+@opindex mapx-inline-asm-use-gpr32
+@item -mapx-inline-asm-use-gpr32
+When APX_F is enabled, EGPR usage is disabled by default in inline asm
+to prevent unexpected EGPR generation in instructions that do not support
+it.  To allow EGPR usage in inline asm, use this switch; the user must
+ensure that the asm actually supports EGPR.
+
 @end table
 
 These @samp{-m} switches are supported in addition to the above
-- 
2.31.1



[PATCH] Add -mevex512 into invoke.texi

2024-01-08 Thread Haochen Jiang
Hi all,

In invoke.texi, -mevex512 is missing. This patch adds that.

Ok for trunk?

Thx,
Haochen

gcc/ChangeLog:

* doc/invoke.texi: Add -mevex512.
---
 gcc/doc/invoke.texi | 2 +-
 1 file changed, 1 insertion(+), 1 deletion(-)

diff --git a/gcc/doc/invoke.texi b/gcc/doc/invoke.texi
index 68d1f364ac0..1a92dcdc1ef 100644
--- a/gcc/doc/invoke.texi
+++ b/gcc/doc/invoke.texi
@@ -1463,7 +1463,7 @@ See RS/6000 and PowerPC Options.
 -mamx-tile  -mamx-int8  -mamx-bf16 -muintr -mhreset -mavxvnni
 -mavx512fp16 -mavxifma -mavxvnniint8 -mavxneconvert -mcmpccxadd -mamx-fp16
 -mprefetchi -mraoint -mamx-complex -mavxvnniint16 -msm3 -msha512 -msm4 -mapxf
--musermsr -mavx10.1 -mavx10.1-256 -mavx10.1-512
+-musermsr -mavx10.1 -mavx10.1-256 -mavx10.1-512 -mevex512
 -mcldemote  -mms-bitfields  -mno-align-stringops  -minline-all-stringops
 -minline-stringops-dynamically  -mstringop-strategy=@var{alg}
 -mkl -mwidekl
-- 
2.31.1



Re:[pushed] [PATCH] LoongArch: Implement vec_init<M><N> where N is a LSX vector mode

2024-01-08 Thread chenglulu

Pushed to r14-7022.

On 2024/1/5 3:38 PM, Jiahao Xu wrote:

This patch implements more vec_init optabs that can handle two LSX vectors,
producing a LASX vector by concatenating them. When an LSX vector is
concatenated with an LSX const_vector of zeroes, the vec_concatz pattern can
be used effectively. For example:

typedef short v8hi __attribute__ ((vector_size (16)));
typedef short v16hi __attribute__ ((vector_size (32)));
v8hi a, b;

v16hi vec_initv16hiv8hi ()
{
  return __builtin_shufflevector (a, b, 0, 8, 1, 9, 2, 10, 3, 11, 4, 12, 5, 13,
                                  6, 14, 7, 15);
}

Before this patch:

vec_initv16hiv8hi:
 addi.d      $r3,$r3,-64
 .cfi_def_cfa_offset 64
 xvrepli.h   $xr0,0
 la.local    $r12,.LANCHOR0
 xvst        $xr0,$r3,0
 xvst        $xr0,$r3,32
 vld         $vr0,$r12,0
 vst         $vr0,$r3,0
 vld         $vr0,$r12,16
 vst         $vr0,$r3,32
 xvld        $xr1,$r3,32
 xvld        $xr2,$r3,32
 xvld        $xr0,$r3,0
 xvilvh.h    $xr0,$xr1,$xr0
 xvld        $xr1,$r3,0
 xvilvl.h    $xr1,$xr2,$xr1
 addi.d      $r3,$r3,64
 .cfi_def_cfa_offset 0
 xvpermi.q   $xr0,$xr1,32
 jr          $r1

After this patch:

vec_initv16hiv8hi:
 la.local    $r12,.LANCHOR0
 vld         $vr0,$r12,32
 vld         $vr2,$r12,48
 xvilvh.h    $xr1,$xr2,$xr0
 xvilvl.h    $xr0,$xr2,$xr0
 xvpermi.q   $xr1,$xr0,32
 xvst        $xr1,$r4,0
 jr          $r1

gcc/ChangeLog:

* config/loongarch/lasx.md (vec_initv32qiv16qi): Rename to ..
(vec_init): .. this, and extend to mode.
(@vec_concatz): New insn pattern.
* config/loongarch/loongarch.cc (loongarch_expand_vector_group_init):
Handle VALS containing two vectors.

gcc/testsuite/ChangeLog:

* gcc.target/loongarch/vector/lasx/lasx-vec-init-2.c: New test.

diff --git a/gcc/config/loongarch/lasx.md b/gcc/config/loongarch/lasx.md
index e196613ffe4..36dc3d95eac 100644
--- a/gcc/config/loongarch/lasx.md
+++ b/gcc/config/loongarch/lasx.md
@@ -465,6 +465,11 @@
 (V16HI "w")
 (V32QI "w")])
  
+;; Half modes of all LASX vector modes, in lower-case.
+(define_mode_attr lasxhalf [(V32QI "v16qi")  (V16HI "v8hi")
+ (V8SI "v4si")  (V4DI  "v2di")
+ (V8SF  "v4sf") (V4DF  "v2df")])
+
  (define_expand "vec_init"
[(match_operand:LASX 0 "register_operand")
 (match_operand:LASX 1 "")]
@@ -474,9 +479,9 @@
DONE;
  })
  
-(define_expand "vec_initv32qiv16qi"
- [(match_operand:V32QI 0 "register_operand")
-  (match_operand:V16QI 1 "")]
+(define_expand "vec_init"
+ [(match_operand:LASX 0 "register_operand")
+  (match_operand: 1 "")]
"ISA_HAS_LASX"
  {
loongarch_expand_vector_group_init (operands[0], operands[1]);
@@ -577,6 +582,21 @@
[(set_attr "type" "simd_insert")
 (set_attr "mode" "")])
  
+(define_insn "@vec_concatz"
+  [(set (match_operand:LASX 0 "register_operand" "=f")
+        (vec_concat:LASX
+          (match_operand: 1 "nonimmediate_operand")
+          (match_operand: 2 "const_0_operand")))]
+  "ISA_HAS_LASX"
+{
+  if (MEM_P (operands[1]))
+    return "vld\t%w0,%1";
+  else
+    return "vori.b\t%w0,%w1,0";
+}
+  [(set_attr "type" "simd_splat")
+   (set_attr "mode" "")])
+
+
  (define_insn "vec_concat"
[(set (match_operand:LASX 0 "register_operand" "=f")
(vec_concat:LASX
diff --git a/gcc/config/loongarch/loongarch.cc b/gcc/config/loongarch/loongarch.cc
index 28d64135c54..b2a296a1dd9 100644
--- a/gcc/config/loongarch/loongarch.cc
+++ b/gcc/config/loongarch/loongarch.cc
@@ -9858,10 +9858,46 @@ loongarch_gen_const_int_vector_shuffle (machine_mode mode, int val)
  void
  loongarch_expand_vector_group_init (rtx target, rtx vals)
  {
-  rtx ops[2] = { force_reg (E_V16QImode, XVECEXP (vals, 0, 0)),
-  force_reg (E_V16QImode, XVECEXP (vals, 0, 1)) };
-  emit_insn (gen_rtx_SET (target, gen_rtx_VEC_CONCAT (E_V32QImode, ops[0],
- ops[1])));
+  machine_mode vmode = GET_MODE (target);
+  machine_mode half_mode = VOIDmode;
+  rtx low = XVECEXP (vals, 0, 0);
+  rtx high = XVECEXP (vals, 0, 1);
+
+  switch (vmode)
+    {
+    case E_V32QImode:
+      half_mode = V16QImode;
+      break;
+    case E_V16HImode:
+      half_mode = V8HImode;
+      break;
+    case E_V8SImode:
+      half_mode = V4SImode;
+      break;
+    case E_V4DImode:
+      half_mode = V2DImode;
+      break;
+    case E_V8SFmode:
+      half_mode = V4SFmode;
+      break;
+    case E_V4DFmode:
+      half_mode = V2DFmode;
+      break;
+    default:
+      gcc_unreachable ();
+    }
+
+  if (high == CONST0_RTX (half_mode))
+    emit_insn (gen_vec_concatz (vmode, target, low, high));
+  else
+    {
+      if (!register_operand (low, half_mode))
+        low = force_reg (half_mode, low);
+      if (!register_operand (high, half_mode))
+        high = force_reg (half_mode, high);
+      emit_insn (gen_rtx_SET (target,
+                              gen_rtx_VEC_CONCAT (vmode, low, high)));
+    }
  }
  
  /* Expand initialization of a vector which has all same 

Re:[PATCH v4] RISC-V: Handle differences between XTheadvector and Vector

2024-01-08 Thread joshua
It has been updated.
[PATCH v5] RISC-V: Handle differences between XTheadvector and Vector (gnu.org)




--
From: 钟居哲
Sent: Tuesday, January 9, 2024 07:08
To: "cooper.joshua"; "gcc-patches"
Cc: "jim.wilson.gcc"; palmer; andrew; "philipp.tomsich"; Jeff Law; "Christoph Müllner"; "cooper.joshua"; jinma; Cooper Qu
Subject: Re: [PATCH v4] RISC-V: Handle differences between XTheadvector and Vector


-  return TAIL_ANY;
+  return TARGET_XTHEADVECTOR ? TAIL_AGNOSTIC : TAIL_ANY;



-  return MASK_ANY;
+  return TARGET_XTHEADVECTOR ? MASK_UNDISTURBED : MASK_ANY;



You shouldn't change this.


-  "vset%i1vli\t%0,%1,e%2,%m3,t%p4,m%p5"
+  { return TARGET_XTHEADVECTOR ? "vsetvli\t%0,%1,e%2,%m3" : "vset%i1vli\t%0,%1,e%2,%m3,t%p4,m%p5"; }

I prefer doing it in ASM_OUTPUT.


+   Copyright (C) 2022-2023 Free Software Foundation, Inc.


Copyright is not correct.


juzhe.zh...@rivai.ai

 
From: Jun Sha (Joshua)
Date: 2024-01-03 14:15
To: gcc-patches
CC: jim.wilson.gcc; palmer; andrew; philipp.tomsich; jeffreyalaw; 
christoph.muellner; juzhe.zhong; Jun Sha (Joshua); Jin Ma; Xianmiao Qu
Subject: [PATCH v4] RISC-V: Handle differences between XTheadvector and Vector

This patch handles the differences in instruction generation between
Vector and XTheadVector. In this version, we only support the subset of
XTheadVector instructions that can be derived directly from the current
RVV 1.0 instructions by simply adding a "th." prefix. For XTheadVector
instructions that have different names but share the same patterns as
RVV 1.0 instructions, we will use the ASM target hook to rewrite the whole
instruction string in follow-up patches.

For some vector patterns that cannot be avoided, we use
"!TARGET_XTHEADVECTOR" to disable them in vector.md so as not to
generate instructions that XTheadVector does not support, such as
vmv1r and vsext.vf2.
 
gcc/ChangeLog:
 
* config.gcc:  Add files for XTheadVector intrinsics.
* config/riscv/autovec.md: Guard XTheadVector.
* config/riscv/riscv-c.cc: Add pragma for XTheadVector.
* config/riscv/riscv-string.cc (expand_block_move):
Guard XTheadVector.
(get_prefer_tail_policy): Give specific value for tail.
(get_prefer_mask_policy): Give specific value for mask.
(vls_mode_valid_p): Avoid autovec.
* config/riscv/riscv-vector-builtins-shapes.cc (check_type):
(build_one): New function.
* config/riscv/riscv-vector-builtins.cc (DEF_RVV_FUNCTION):
(DEF_THEAD_RVV_FUNCTION): Add new macros.
(check_required_extensions):
(handle_pragma_vector):
* config/riscv/riscv-vector-builtins.h (RVV_REQUIRE_VECTOR):
(RVV_REQUIRE_XTHEADVECTOR):
Add RVV_REQUIRE_VECTOR and RVV_REQUIRE_XTHEADVECTOR.
(struct function_group_info):
* config/riscv/riscv-vector-switch.def (ENTRY):
Disable fractional mode for the XTheadVector extension.
(TUPLE_ENTRY): Likewise.
* config/riscv/riscv-vsetvl.cc: Add functions for xtheadvector.
* config/riscv/riscv.cc (riscv_v_ext_vls_mode_p):
Guard XTheadVector.
(riscv_v_adjust_bytesize): Likewise.
(riscv_preferred_simd_mode): Likewise.
(riscv_autovectorize_vector_modes): Likewise.
(riscv_vector_mode_supported_any_target_p): Likewise.
(TARGET_VECTOR_MODE_SUPPORTED_ANY_TARGET_P): Likewise.
* config/riscv/vector-iterators.md: Remove fractional LMUL.
* config/riscv/vector.md: Include thead-vector.md.
* config/riscv/riscv_th_vector.h: New file.
* config/riscv/thead-vector.md: New file.
 
gcc/testsuite/ChangeLog:
 
* gcc.target/riscv/rvv/base/pragma-1.c: Add XTheadVector.
* gcc.target/riscv/rvv/base/abi-1.c: Exclude XTheadVector.
* lib/target-supports.exp: Add target for XTheadVector.
 
Co-authored-by: Jin Ma 
Co-authored-by: Xianmiao Qu 
Co-authored-by: Christoph Müllner 
---
 gcc/config.gcc    |   2 +-
 gcc/config/riscv/autovec.md   |   2 +-
 gcc/config/riscv/predicates.md    |   4 +-
 gcc/config/riscv/riscv-c.cc   |   3 +-
 gcc/config/riscv/riscv-string.cc  |   3 +-
 gcc/config/riscv/riscv-v.cc   |   6 +-
 .../riscv/riscv-vector-builtins-bases.cc  |  48 +++--
 .../riscv/riscv-vector-builtins-shapes.cc |  23 +++
 gcc/config/riscv/riscv-vector-switch.def  | 150 +++---
 gcc/config/riscv/riscv.cc |  20 +-
 gcc/config/riscv/riscv_th_vector.h    |  49 +
 gcc/config/riscv/thead-vector.md  |  69 +++
 gcc/config/riscv/vector-iterators.md  | 186 +-
 gcc/config/riscv/vector.md    |  55 --
 .../gcc.target/riscv/rvv/base/abi-1.c |   2 +-
 .../gcc.target/riscv/rvv/base/pragma-1.c  |   2 +-
 gcc/testsuite/lib/target-supports.exp |  12 ++
 17 files changed, 427 insertions(+), 209 deletions(-)
 

[PATCH v5] RISC-V: Handle differences between XTheadvector and Vector

2024-01-08 Thread Jun Sha (Joshua)
This patch handles the differences in instruction generation between
Vector and XTheadVector. In this version, we only support the subset of
XTheadVector instructions that can be derived directly from the current
RVV 1.0 instructions by simply adding a "th." prefix. For XTheadVector
instructions that have different names but share the same patterns as
RVV 1.0 instructions, we will use the ASM target hook to rewrite the whole
instruction string in follow-up patches.

For some vector patterns that cannot be avoided, we use
"!TARGET_XTHEADVECTOR" to disable them in vector.md so as not to
generate instructions that XTheadVector does not support, such as
vmv1r and vsext.vf2.

gcc/ChangeLog:

* config.gcc:  Add files for XTheadVector intrinsics.
* config/riscv/autovec.md: Guard XTheadVector.
* config/riscv/riscv-c.cc: Add pragma for XTheadVector.
* config/riscv/riscv-string.cc (expand_block_move):
Guard XTheadVector.
(get_prefer_tail_policy): Give specific value for tail.
(get_prefer_mask_policy): Give specific value for mask.
(vls_mode_valid_p): Avoid autovec.
* config/riscv/riscv-vector-builtins-shapes.cc (check_type):
(build_one): New function.
* config/riscv/riscv-vector-builtins.cc (DEF_RVV_FUNCTION):
(DEF_THEAD_RVV_FUNCTION): Add new macros.
(check_required_extensions):
(handle_pragma_vector):
* config/riscv/riscv-vector-builtins.h (RVV_REQUIRE_VECTOR):
(RVV_REQUIRE_XTHEADVECTOR):
Add RVV_REQUIRE_VECTOR and RVV_REQUIRE_XTHEADVECTOR.
(struct function_group_info):
* config/riscv/riscv-vector-switch.def (ENTRY):
Disable fractional mode for the XTheadVector extension.
(TUPLE_ENTRY): Likewise.
* config/riscv/riscv-vsetvl.cc: Add functions for xtheadvector.
* config/riscv/riscv.cc (riscv_v_ext_vls_mode_p):
Guard XTheadVector.
(riscv_v_adjust_bytesize): Likewise.
(riscv_preferred_simd_mode): Likewise.
(riscv_autovectorize_vector_modes): Likewise.
(riscv_vector_mode_supported_any_target_p): Likewise.
(TARGET_VECTOR_MODE_SUPPORTED_ANY_TARGET_P): Likewise.
* config/riscv/vector-iterators.md: Remove fractional LMUL.
* config/riscv/vector.md: Include thead-vector.md.
* config/riscv/riscv_th_vector.h: New file.
* config/riscv/thead-vector.md: New file.

gcc/testsuite/ChangeLog:

* gcc.target/riscv/rvv/base/pragma-1.c: Add XTheadVector.
* gcc.target/riscv/rvv/base/abi-1.c: Exclude XTheadVector.
* lib/target-supports.exp: Add target for XTheadVector.

Co-authored-by: Jin Ma 
Co-authored-by: Xianmiao Qu 
Co-authored-by: Christoph Müllner 
---
 gcc/config.gcc|   2 +-
 gcc/config/riscv/autovec.md   |   2 +-
 gcc/config/riscv/predicates.md|   4 +-
 gcc/config/riscv/riscv-c.cc   |   3 +-
 gcc/config/riscv/riscv-string.cc  |   3 +-
 gcc/config/riscv/riscv-v.cc   |   2 +-
 .../riscv/riscv-vector-builtins-bases.cc  |  48 +++--
 .../riscv/riscv-vector-builtins-shapes.cc |  23 +++
 gcc/config/riscv/riscv-vector-switch.def  | 150 +++---
 gcc/config/riscv/riscv.cc |  20 +-
 gcc/config/riscv/riscv_th_vector.h|  49 +
 gcc/config/riscv/thead-vector.md  | 102 ++
 gcc/config/riscv/thead.cc |  23 ++-
 gcc/config/riscv/vector-iterators.md  | 186 +-
 gcc/config/riscv/vector.md|  49 -
 .../gcc.target/riscv/rvv/base/abi-1.c |   2 +-
 .../gcc.target/riscv/rvv/base/pragma-1.c  |   2 +-
 gcc/testsuite/lib/target-supports.exp |  12 ++
 18 files changed, 476 insertions(+), 206 deletions(-)
 create mode 100644 gcc/config/riscv/riscv_th_vector.h
 create mode 100644 gcc/config/riscv/thead-vector.md

diff --git a/gcc/config.gcc b/gcc/config.gcc
index 7e583390024..047e4c02cf4 100644
--- a/gcc/config.gcc
+++ b/gcc/config.gcc
@@ -549,7 +549,7 @@ riscv*)
extra_objs="${extra_objs} riscv-vector-builtins.o riscv-vector-builtins-shapes.o riscv-vector-builtins-bases.o"
extra_objs="${extra_objs} thead.o riscv-target-attr.o"
d_target_objs="riscv-d.o"
-   extra_headers="riscv_vector.h"
+   extra_headers="riscv_vector.h riscv_th_vector.h"
target_gtfiles="$target_gtfiles \$(srcdir)/config/riscv/riscv-vector-builtins.cc"
target_gtfiles="$target_gtfiles \$(srcdir)/config/riscv/riscv-vector-builtins.h"
;;
diff --git a/gcc/config/riscv/autovec.md b/gcc/config/riscv/autovec.md
index 775eaa825b0..0477781cabe 100644
--- a/gcc/config/riscv/autovec.md
+++ b/gcc/config/riscv/autovec.md
@@ -2579,7 +2579,7 @@
   [(match_operand  0 "register_operand")
(match_operand  1 "memory_operand")
(match_operand:ANYI 2 "const_int_operand")]
-  "TARGET_VECTOR"
+  "TARGET_VECTOR && 

[Committed] RISC-V: Fix comments of segment load/store intrinsic

2024-01-08 Thread Juzhe-Zhong
Segment load/store intrinsics are already supported.

Committed as it is obvious.

gcc/ChangeLog:

* config/riscv/riscv-vector-builtins-functions.def (vleff): Move comments.
(vundefined): Ditto.

---
 gcc/config/riscv/riscv-vector-builtins-functions.def | 4 ++--
 1 file changed, 2 insertions(+), 2 deletions(-)

diff --git a/gcc/config/riscv/riscv-vector-builtins-functions.def b/gcc/config/riscv/riscv-vector-builtins-functions.def
index 96dd0d95dec..f742c98be8a 100644
--- a/gcc/config/riscv/riscv-vector-builtins-functions.def
+++ b/gcc/config/riscv/riscv-vector-builtins-functions.def
@@ -79,8 +79,6 @@ DEF_RVV_FUNCTION (vsoxei64, indexed_loadstore, none_m_preds, all_v_scalar_ptr_ee
 // 7.7. Unit-stride Fault-Only-First Loads
 DEF_RVV_FUNCTION (vleff, fault_load, full_preds, all_v_scalar_const_ptr_size_ptr_ops)
 
-// TODO: 7.8. Vector Load/Store Segment Instructions
-
 /* 11. Vector Integer Arithmetic Instructions.  */
 
 // 11.1. Vector Single-Width Integer Add and Subtract
@@ -630,6 +628,8 @@ DEF_RVV_FUNCTION (vset, vset, none_preds, all_v_vset_tuple_ops)
 DEF_RVV_FUNCTION (vget, vget, none_preds, all_v_vget_tuple_ops)
 DEF_RVV_FUNCTION (vcreate, vcreate, none_preds, all_v_vcreate_tuple_ops)
 DEF_RVV_FUNCTION (vundefined, vundefined, none_preds, all_none_void_tuple_ops)
+
+// 7.8. Vector Load/Store Segment Instructions
 DEF_RVV_FUNCTION (vlseg, seg_loadstore, full_preds, tuple_v_scalar_const_ptr_ops)
 DEF_RVV_FUNCTION (vsseg, seg_loadstore, none_m_preds, tuple_v_scalar_ptr_ops)
 DEF_RVV_FUNCTION (vlsseg, seg_loadstore, full_preds, tuple_v_scalar_const_ptr_ptrdiff_ops)
-- 
2.36.3



Re:[PATCH v4] RISC-V: Handle differences between XTheadvector and Vector

2024-01-08 Thread joshua
We discussed the vsetvl issue last week.
The riscv_asm_output function may not be able to return
instructions the way riscv_output_move does.
The simplest approach may be to add some logic to
the vsetvl patterns: only 3 patterns need to be modified,
which will not be too invasive.








[Committed] RISC-V: Fix comments of segment load/store intrinsic[NFC]

2024-01-08 Thread Juzhe-Zhong
Segment load/store intrinsics are already supported.

Committed as it is obvious.

gcc/ChangeLog:

* config/riscv/riscv-vector-builtins-functions.def (vleff): Move comments
to the real place.
(vcreate): Ditto.

---
 gcc/config/riscv/riscv-vector-builtins-functions.def | 4 +---
 1 file changed, 1 insertion(+), 3 deletions(-)

diff --git a/gcc/config/riscv/riscv-vector-builtins-functions.def b/gcc/config/riscv/riscv-vector-builtins-functions.def
index 96dd0d95dec..14560923d11 100644
--- a/gcc/config/riscv/riscv-vector-builtins-functions.def
+++ b/gcc/config/riscv/riscv-vector-builtins-functions.def
@@ -79,8 +79,6 @@ DEF_RVV_FUNCTION (vsoxei64, indexed_loadstore, none_m_preds, all_v_scalar_ptr_ee
 // 7.7. Unit-stride Fault-Only-First Loads
 DEF_RVV_FUNCTION (vleff, fault_load, full_preds, all_v_scalar_const_ptr_size_ptr_ops)
 
-// TODO: 7.8. Vector Load/Store Segment Instructions
-
 /* 11. Vector Integer Arithmetic Instructions.  */
 
 // 11.1. Vector Single-Width Integer Add and Subtract
@@ -625,7 +623,7 @@ DEF_RVV_FUNCTION (vcreate, vcreate, none_preds, all_v_vcreate_lmul2_x2_ops)
 DEF_RVV_FUNCTION (vcreate, vcreate, none_preds, all_v_vcreate_lmul2_x4_ops)
 DEF_RVV_FUNCTION (vcreate, vcreate, none_preds, all_v_vcreate_lmul4_x2_ops)
 
-// Tuple types
+// 7.8. Vector Load/Store Segment Instructions
 DEF_RVV_FUNCTION (vset, vset, none_preds, all_v_vset_tuple_ops)
 DEF_RVV_FUNCTION (vget, vget, none_preds, all_v_vget_tuple_ops)
 DEF_RVV_FUNCTION (vcreate, vcreate, none_preds, all_v_vcreate_tuple_ops)
-- 
2.36.3



Re: Re: [PATCH v7 1/2] RISC-V: Add crypto vector builtin function.

2024-01-08 Thread Feng Wang

Committed, thanks Juzhe.
 
From: 钟居哲
Sent: 2024-01-09 07:02
To: wangfeng; gcc-patches
Cc: kito.cheng; Jeff Law; wangfeng
Subject: Re: [PATCH v7 1/2] RISC-V: Add crypto vector builtin function.
LGTM.



juzhe.zh...@rivai.ai
 
From: Feng Wang
Date: 2024-01-08 17:12
To: gcc-patches
CC: kito.cheng; jeffreyalaw; juzhe.zhong; Feng Wang
Subject: [PATCH v7 1/2] RISC-V: Add crypto vector builtin function.
Patch v7: Resubmit after fixing the rtl-checking issue. Passed all the riscv regression tests.
Patch v6: Remove unused code.
Patch v5: Rebase.
Patch v4: Merge crypto vector function.def into vector.
Patch v3: Define a shape for vaesz and merge vector-crypto-types.def
 into riscv-vector-builtins-types.def.
Patch v2: Optimize function_shape class for crypto_vector.
 
This patch adds the intrinsic functions of crypto vector based on the
intrinsic doc (https://github.com/riscv-non-isa/rvv-intrinsic-doc/blob/eopc/vector-crypto/auto-generated/vector-crypto/intrinsic_funcs.md).
 
Co-Authored by: Songhe Zhu 
Co-Authored by: Ciyan Pan 
gcc/ChangeLog:
 
* config/riscv/riscv-vector-builtins-bases.cc (class vandn):
Add new function_base for crypto vector.
(class bitmanip): Ditto. 
(class b_reverse):Ditto. 
(class vwsll):   Ditto. 
(class clmul):   Ditto. 
(class vg_nhab):  Ditto. 
(class crypto_vv):Ditto. 
(class crypto_vi):Ditto. 
(class vaeskf2_vsm3c):Ditto.
(class vsm3me): Ditto.
(BASE): Add BASE declaration for crypto vector.
* config/riscv/riscv-vector-builtins-bases.h: Ditto.
* config/riscv/riscv-vector-builtins-functions.def (REQUIRED_EXTENSIONS):
Add crypto vector intrinsic definition.
(vbrev): Ditto.
(vclz): Ditto.
(vctz): Ditto.
(vwsll): Ditto.
(vandn): Ditto.
(vbrev8): Ditto.
(vrev8): Ditto.
(vrol): Ditto.
(vror): Ditto.
(vclmul): Ditto.
(vclmulh): Ditto.
(vghsh): Ditto.
(vgmul): Ditto.
(vaesef): Ditto.
(vaesem): Ditto.
(vaesdf): Ditto.
(vaesdm): Ditto.
(vaesz): Ditto.
(vaeskf1): Ditto.
(vaeskf2): Ditto.
(vsha2ms): Ditto.
(vsha2ch): Ditto.
(vsha2cl): Ditto.
(vsm4k): Ditto.
(vsm4r): Ditto.
(vsm3me): Ditto.
(vsm3c): Ditto.
* config/riscv/riscv-vector-builtins-shapes.cc (struct crypto_vv_def):
Add new function_shape for crypto vector.
(struct crypto_vi_def): Ditto.
(struct crypto_vv_no_op_type_def): Ditto.
(SHAPE): Add SHAPE declaration of crypto vector.
* config/riscv/riscv-vector-builtins-shapes.h: Ditto.
* config/riscv/riscv-vector-builtins-types.def (DEF_RVV_CRYPTO_SEW32_OPS):
Add new data type for crypto vector.
(DEF_RVV_CRYPTO_SEW64_OPS): Ditto.
(vuint32mf2_t): Ditto.
(vuint32m1_t): Ditto.
(vuint32m2_t): Ditto.
(vuint32m4_t): Ditto.
(vuint32m8_t): Ditto.
(vuint64m1_t): Ditto.
(vuint64m2_t): Ditto.
(vuint64m4_t): Ditto.
(vuint64m8_t): Ditto.
* config/riscv/riscv-vector-builtins.cc (DEF_RVV_CRYPTO_SEW32_OPS):
Add new data struct for crypto vector.
(DEF_RVV_CRYPTO_SEW64_OPS): Ditto.
(registered_function::overloaded_hash): Process size_t uimm for C overloaded func.
* config/riscv/riscv-vector-builtins.def (vi): Add vi OP_TYPE.
---
.../riscv/riscv-vector-builtins-bases.cc  | 264 +-
.../riscv/riscv-vector-builtins-bases.h   |  28 ++
.../riscv/riscv-vector-builtins-functions.def |  94 +++
.../riscv/riscv-vector-builtins-shapes.cc |  87 +-
.../riscv/riscv-vector-builtins-shapes.h  |   4 +
.../riscv/riscv-vector-builtins-types.def |  25 ++
gcc/config/riscv/riscv-vector-builtins.cc | 133 -
gcc/config/riscv/riscv-vector-builtins.def|   1 +
8 files changed, 633 insertions(+), 3 deletions(-)
 
diff --git a/gcc/config/riscv/riscv-vector-builtins-bases.cc 
b/gcc/config/riscv/riscv-vector-builtins-bases.cc
index d70468542ee..d12bb89f91c 100644
--- a/gcc/config/riscv/riscv-vector-builtins-bases.cc
+++ b/gcc/config/riscv/riscv-vector-builtins-bases.cc
@@ -2127,6 +2127,212 @@ public:
   }
};
+/* Below implements are vector crypto */
+/* Implements vandn.[vv,vx] */
+class vandn : public function_base
+{
+public:
+  rtx expand (function_expander &e) const override
+  {
+switch (e.op_info->op)
+  {
+  case OP_TYPE_vv:
+return e.use_exact_insn (code_for_pred_vandn (e.vector_mode ()));
+  case OP_TYPE_vx:
+return e.use_exact_insn (code_for_pred_vandn_scalar (e.vector_mode ()));
+  default:
+gcc_unreachable ();
+  }
+  }
+};
+
+/* Implements vrol/vror/clz/ctz.  */
+template<rtx_code CODE>
+class bitmanip : public function_base
+{
+public:
+  bool apply_tail_policy_p () const override
+  {
+return (CODE == CLZ || CODE == CTZ) ? false : true;
+  }
+  bool apply_mask_policy_p () const override
+  {
+return (CODE == CLZ || CODE == CTZ) ? false : true;
+  }
+  bool has_merge_operand_p () const override
+  {
+return (CODE == CLZ || CODE == CTZ) ? false : true;
+  }
+  
+  rtx expand (function_expander &e) const override
+  {
+switch (e.op_info->op)
+{
+  case OP_TYPE_v:
+  case OP_TYPE_vv:
+return e.use_exact_insn (code_for_pred_v (CODE, e.vector_mode ()));
+  case OP_TYPE_vx:
+return e.use_exact_insn 

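For readers new to Zvbb, the element-wise behaviour that the vandn and bitmanip expanders above map to can be sketched in scalar C. This is a minimal reference model, assuming SEW=32 and ignoring masking, tail/mask policies and vtype; the ref_* function names are hypothetical and are not part of riscv_vector.h or GCC.

```c
#include <stddef.h>
#include <stdint.h>

/* Hypothetical scalar model of vandn.vv: vd[i] = vs2[i] & ~vs1[i].  */
static void
ref_vandn_vv_u32 (uint32_t *vd, const uint32_t *vs2, const uint32_t *vs1,
                  size_t vl)
{
  for (size_t i = 0; i < vl; i++)
    vd[i] = vs2[i] & ~vs1[i];
}

/* Hypothetical scalar model of vandn.vx: the scalar rs1 is inverted
   and ANDed with every element.  */
static void
ref_vandn_vx_u32 (uint32_t *vd, const uint32_t *vs2, uint32_t rs1, size_t vl)
{
  for (size_t i = 0; i < vl; i++)
    vd[i] = vs2[i] & ~rs1;
}

/* Hypothetical model of one vrol element: rotate left by the low
   log2(SEW) = 5 bits of the shift operand.  */
static uint32_t
ref_rol_u32 (uint32_t x, uint32_t shamt)
{
  shamt &= 31u;
  return shamt ? (x << shamt) | (x >> (32u - shamt)) : x;
}
```

In this model vs2 is the operand that is kept and vs1/rs1 is the one that is complemented, matching the argument order of the __riscv_vandn_* intrinsics used in the api-testing cases.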
Re: Re: [PATCH v8 2/2] RISC-V: Add crypto vector api-testing cases.

2024-01-08 Thread Feng Wang

Committed, thanks Juzhe.

From: 钟居哲
Sent: 2024-01-09 07:02
To: wangfeng; gcc-patches
Cc: kito.cheng; Jeff Law; wangfeng
Subject: Re: [PATCH v8 2/2] RISC-V: Add crypto vector api-testing cases.
LGTM.



juzhe.zh...@rivai.ai
 
From: Feng Wang
Date: 2024-01-08 17:12
To: gcc-patches
CC: kito.cheng; jeffreyalaw; juzhe.zhong; Feng Wang
Subject: [PATCH v8 2/2] RISC-V: Add crypto vector api-testing cases.
Patch v8: Resubmit after fixing the rtl-checking issue. Passed all the riscv regression tests.
Patch v7: Add newline at the end of file.
Patch v6: Move intrinsic tests into rvv/base.
Patch v5: Rebase
Patch v4: Add some RV32 vx constraint testcase.
Patch v3: Refine crypto vector api-testing cases.
Patch v2: Update march info according to the change of riscv-common.c
 
This patch adds crypto vector api-testing cases based on
https://github.com/riscv-non-isa/rvv-intrinsic-doc/blob/eopc/vector-crypto/auto-generated/vector-crypto
gcc/testsuite/ChangeLog:
 
* gcc.target/riscv/rvv/base/zvbb-intrinsic.c: New test.
* gcc.target/riscv/rvv/base/zvbb_vandn_vx_constraint.c: New test.
* gcc.target/riscv/rvv/base/zvbc-intrinsic.c: New test.
* gcc.target/riscv/rvv/base/zvbc_vx_constraint-1.c: New test.
* gcc.target/riscv/rvv/base/zvbc_vx_constraint-2.c: New test.
* gcc.target/riscv/rvv/base/zvkg-intrinsic.c: New test.
* gcc.target/riscv/rvv/base/zvkned-intrinsic.c: New test.
* gcc.target/riscv/rvv/base/zvknha-intrinsic.c: New test.
* gcc.target/riscv/rvv/base/zvknhb-intrinsic.c: New test.
* gcc.target/riscv/rvv/base/zvksed-intrinsic.c: New test.
* gcc.target/riscv/rvv/base/zvksh-intrinsic.c: New test.
* gcc.target/riscv/zvkb.c: New test.
---
.../riscv/rvv/base/zvbb-intrinsic.c   | 179 ++
.../riscv/rvv/base/zvbb_vandn_vx_constraint.c |  15 ++
.../riscv/rvv/base/zvbc-intrinsic.c   |  62 ++
.../riscv/rvv/base/zvbc_vx_constraint-1.c |  14 ++
.../riscv/rvv/base/zvbc_vx_constraint-2.c |  14 ++
.../riscv/rvv/base/zvkg-intrinsic.c   |  24 +++
.../riscv/rvv/base/zvkned-intrinsic.c | 104 ++
.../riscv/rvv/base/zvknha-intrinsic.c |  33 
.../riscv/rvv/base/zvknhb-intrinsic.c |  33 
.../riscv/rvv/base/zvksed-intrinsic.c |  33 
.../riscv/rvv/base/zvksh-intrinsic.c  |  24 +++
gcc/testsuite/gcc.target/riscv/zvkb.c |  13 ++
12 files changed, 548 insertions(+)
create mode 100644 gcc/testsuite/gcc.target/riscv/rvv/base/zvbb-intrinsic.c
create mode 100644 
gcc/testsuite/gcc.target/riscv/rvv/base/zvbb_vandn_vx_constraint.c
create mode 100644 gcc/testsuite/gcc.target/riscv/rvv/base/zvbc-intrinsic.c
create mode 100644 
gcc/testsuite/gcc.target/riscv/rvv/base/zvbc_vx_constraint-1.c
create mode 100644 
gcc/testsuite/gcc.target/riscv/rvv/base/zvbc_vx_constraint-2.c
create mode 100644 gcc/testsuite/gcc.target/riscv/rvv/base/zvkg-intrinsic.c
create mode 100644 gcc/testsuite/gcc.target/riscv/rvv/base/zvkned-intrinsic.c
create mode 100644 gcc/testsuite/gcc.target/riscv/rvv/base/zvknha-intrinsic.c
create mode 100644 gcc/testsuite/gcc.target/riscv/rvv/base/zvknhb-intrinsic.c
create mode 100644 gcc/testsuite/gcc.target/riscv/rvv/base/zvksed-intrinsic.c
create mode 100644 gcc/testsuite/gcc.target/riscv/rvv/base/zvksh-intrinsic.c
create mode 100644 gcc/testsuite/gcc.target/riscv/zvkb.c
 
diff --git a/gcc/testsuite/gcc.target/riscv/rvv/base/zvbb-intrinsic.c 
b/gcc/testsuite/gcc.target/riscv/rvv/base/zvbb-intrinsic.c
new file mode 100644
index 000..b7e25bfe819
--- /dev/null
+++ b/gcc/testsuite/gcc.target/riscv/rvv/base/zvbb-intrinsic.c
@@ -0,0 +1,179 @@
+/* { dg-do compile } */
+/* { dg-options "-march=rv64gc_zvbb_zve64x -mabi=lp64d -Wno-psabi" } */
+#include "riscv_vector.h"
+
+vuint8mf8_t test_vandn_vv_u8mf8(vuint8mf8_t vs2, vuint8mf8_t vs1, size_t vl) {
+  return __riscv_vandn_vv_u8mf8(vs2, vs1, vl);
+}
+
+vuint32m1_t test_vandn_vx_u32m1(vuint32m1_t vs2, uint32_t rs1, size_t vl) {
+  return __riscv_vandn_vx_u32m1(vs2, rs1, vl);
+}
+
+vuint32m2_t test_vandn_vv_u32m2_m(vbool16_t mask, vuint32m2_t vs2, vuint32m2_t vs1, size_t vl) {
+  return __riscv_vandn_vv_u32m2_m(mask, vs2, vs1, vl);
+}
+
+vuint16mf2_t test_vandn_vx_u16mf2_m(vbool32_t mask, vuint16mf2_t vs2, uint16_t rs1, size_t vl) {
+  return __riscv_vandn_vx_u16mf2_m(mask, vs2, rs1, vl);
+}
+
+vuint32m4_t test_vandn_vv_u32m4_tumu(vbool8_t mask, vuint32m4_t maskedoff, vuint32m4_t vs2, vuint32m4_t vs1, size_t vl) {
+  return __riscv_vandn_vv_u32m4_tumu(mask, maskedoff, vs2, vs1, vl);
+}
+
+vuint64m4_t test_vandn_vx_u64m4_tumu(vbool16_t mask, vuint64m4_t maskedoff, vuint64m4_t vs2, uint64_t rs1, size_t vl) {
+  return __riscv_vandn_vx_u64m4_tumu(mask, maskedoff, vs2, rs1, vl);
+}
+
+vuint8m8_t test_vbrev_v_u8m8(vuint8m8_t vs2, size_t vl) {
+  return __riscv_vbrev_v_u8m8(vs2, vl);
+}
+
+vuint16m1_t test_vbrev_v_u16m1_m(vbool16_t mask, vuint16m1_t vs2, size_t vl) {
+  return __riscv_vbrev_v_u16m1_m(mask, vs2, vl);
+}
+
+vuint32m4_t test_vbrev_v_u32m4_tumu(vbool8_t mask, vuint32m4_t maskedoff, 

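The zvbb-intrinsic.c tests exercise, among others, vbrev. Its per-element semantics, together with the related vbrev8 and vrev8 listed in the ChangeLog, can be sketched in scalar C as below. This assumes SEW=32 and ignores masks and policies; the ref_* helpers are hypothetical reference models, not GCC or intrinsic API functions.

```c
#include <stdint.h>

/* vbrev.v: reverse the order of all 32 bits in the element.  */
static uint32_t
ref_brev_u32 (uint32_t x)
{
  uint32_t r = 0;
  for (int i = 0; i < 32; i++)
    {
      r = (r << 1) | (x & 1u);
      x >>= 1;
    }
  return r;
}

/* vbrev8.v: reverse the bits inside each byte; byte order is kept.  */
static uint32_t
ref_brev8_u32 (uint32_t x)
{
  uint32_t r = 0;
  for (int b = 0; b < 4; b++)
    {
      uint32_t byte = (x >> (8 * b)) & 0xFFu;
      uint32_t rev = 0;
      for (int i = 0; i < 8; i++)
        rev = (rev << 1) | ((byte >> i) & 1u);
      r |= rev << (8 * b);
    }
  return r;
}

/* vrev8.v: reverse the byte order of the element (a byte swap).  */
static uint32_t
ref_rev8_u32 (uint32_t x)
{
  return (x >> 24) | ((x >> 8) & 0x0000FF00u)
         | ((x << 8) & 0x00FF0000u) | (x << 24);
}
```

Such a scalar model can be handy when writing run tests that compare intrinsic results element by element.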
Re: Re: [PATCH] RISC-V: Teach liveness computation loop invariant shift amount[Dynamic LMUL]

2024-01-08 Thread juzhe.zh...@rivai.ai
Yes, it is sufficient. Sent a patch:
https://gcc.gnu.org/pipermail/gcc-patches/2024-January/642216.html 




juzhe.zh...@rivai.ai
 
From: Robin Dapp
Date: 2024-01-09 00:45
To: 钟居哲; gcc-patches
CC: rdapp.gcc; kito.cheng; kito.cheng; Jeff Law
Subject: Re: [PATCH] RISC-V: Teach liveness computation loop invariant shift amount[Dynamic LMUL]
> > +  if (is_gimple_min_invariant (op))
> > +return true;
> > +  if (SSA_NAME_IS_DEFAULT_DEF (op)
> > +  || !flow_bb_inside_loop_p (loop, gimple_bb (SSA_NAME_DEF_STMT 
> (op
> > +return true;
> > +  return gimple_uid (SSA_NAME_DEF_STMT (op)) & 1;
> > +}
> > +
 
Does gimple_uid ever return something useful for us here?
In tree-ssa-loop-ch it is populated beforehand and then used, but I
don't think we populate it properly.
 
So my question would be: aren't is_gimple_constant and
flow_bb_inside_loop_p sufficient for our purpose?
 
Regards
Robin
 


[PATCH] RISC-V: Fix loop invariant check

2024-01-08 Thread Juzhe-Zhong
As Robin suggested, remove the gimple_uid check; the remaining checks are sufficient for our needs.

Tested on both RV32/RV64 no regression, ok for trunk ?

gcc/ChangeLog:

* config/riscv/riscv-vector-costs.cc (loop_invariant_op_p): Fix loop 
invariant check.

---
 gcc/config/riscv/riscv-vector-costs.cc | 2 +-
 1 file changed, 1 insertion(+), 1 deletion(-)

diff --git a/gcc/config/riscv/riscv-vector-costs.cc 
b/gcc/config/riscv/riscv-vector-costs.cc
index 3bae581d6fd..f4a1a789f23 100644
--- a/gcc/config/riscv/riscv-vector-costs.cc
+++ b/gcc/config/riscv/riscv-vector-costs.cc
@@ -241,7 +241,7 @@ loop_invariant_op_p (class loop *loop,
   if (SSA_NAME_IS_DEFAULT_DEF (op)
   || !flow_bb_inside_loop_p (loop, gimple_bb (SSA_NAME_DEF_STMT (op
 return true;
-  return gimple_uid (SSA_NAME_DEF_STMT (op)) & 1;
+  return false;
 }
 
 /* Return true if the variable should be counted into liveness.  */
-- 
2.36.3
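The effect of this fix can be modelled outside GCC. The sketch below uses hypothetical stand-in types (an enum and a struct with boolean flags) in place of GCC's tree and SSA APIs, and mirrors only the checks visible in the hunk and the quoted review: after the patch, an SSA name defined inside the loop is always treated as loop-variant instead of consulting gimple_uid.

```c
#include <stdbool.h>

/* Hypothetical, simplified stand-ins for GCC operand kinds.  */
enum op_kind { OP_CONSTANT, OP_SSA_NAME };

struct operand
{
  enum op_kind kind;
  bool is_default_def;  /* models SSA_NAME_IS_DEFAULT_DEF (op) */
  bool def_inside_loop; /* models flow_bb_inside_loop_p (loop, def bb) */
};

/* Constants, default definitions and values defined outside the loop
   are invariant; anything defined inside the loop is conservatively
   treated as variant (the patched "return false").  */
static bool
model_loop_invariant_op_p (const struct operand *op)
{
  if (op->kind == OP_CONSTANT)
    return true;
  if (op->is_default_def || !op->def_inside_loop)
    return true;
  return false;
}
```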



RE: [PATCH v3] RISC-V: Bugfix for doesn't honor no-signed-zeros option

2024-01-08 Thread Li, Pan2
Thanks Richard B for comments.

> We don't really expect targets to do this.  The small testcase above
> is somewhat ill-formed with -fno-signed-zeros.  Note there's no
> -0.0 in pr30957-1.c so why does that one fail for you?  Does
> the -fvariable-expansion-in-unroller code maybe not trigger for
> riscv?

Sorry, this confuses me a little about the semantics of the option
-fno-signed-zeros. What does the target/backend need to do for this option?

About the failure: it comes from the code below in pr30957-1.c. The 0.0 / -5.0
evaluates to -0.0 on riscv but +0.0 on aarch64.

  if (__builtin_copysignf (1.0, foo (0.0 / -5.0, 10)) != 1.0)
abort ();

If my understanding is correct, the loop is vectorized during
vect_transform_loop with a variable vectorization factor.
Since it won't benefit from unrolling/peeling, loop->unroll is set to 1, and
the tree-vect dump then contains a line similar to the one below:

Disabling unrolling due to variable-length vectorization factor.

> I think we should go to PR30957 and see what that was filed originally
> for, the testcase doesn't make much sense to me.

Sure thing, will take a look and back to you later.

Pan

-Original Message-
From: Richard Biener  
Sent: Monday, January 8, 2024 6:45 PM
To: Li, Pan2 
Cc: gcc-patches@gcc.gnu.org; juzhe.zh...@rivai.ai; Wang, Yanzhang 
; kito.ch...@gmail.com; jeffreya...@gmail.com
Subject: Re: [PATCH v3] RISC-V: Bugfix for doesn't honor no-signed-zeros option

On Tue, Jan 2, 2024 at 2:37 PM  wrote:
>
> From: Pan Li 
>
> According to the semantics of the no-signed-zeros option, a backend
> like RISC-V should treat minus zero -0.0f as plus zero 0.0f.
>
> Consider below example with option -fno-signed-zeros.
>
> void
> test (float *a)
> {
>   *a = -0.0;
> }
>
> We will generate code as below, which doesn't treat the minus zero
> as plus zero.
>
> test:
>   lui  a5,%hi(.LC0)
>   flw  fa5,%lo(.LC0)(a5)
>   fsw  fa5,0(a0)
>   ret
>
> .LC0:
>   .word -2147483648 // aka -0.0 (0x80000000 in hex)
>
> This patch fixes the bug by treating minus zero -0.0 as plus
> zero (+0.0). After this patch we generate the asm code below
> for the sample code above.
>
> test:
>   sw zero,0(a0)
>   ret
>
> This patch also fixes the run failure of the test case pr30957-1.c. The
> tests below pass with this patch.

We don't really expect targets to do this.  The small testcase above
is somewhat ill-formed with -fno-signed-zeros.  Note there's no
-0.0 in pr30957-1.c so why does that one fail for you?  Does
the -fvariable-expansion-in-unroller code maybe not trigger for
riscv?

I think we should go to PR30957 and see what that was filed originally
for, the testcase doesn't make much sense to me.

> * The riscv regression tests.
> * The pr30957-1.c run tests.
>
> gcc/ChangeLog:
>
> * config/riscv/constraints.md: Leverage func 
> riscv_float_const_zero_rtx_p
> for predicating the rtx is const zero float or not.
> * config/riscv/predicates.md: Ditto.
> * config/riscv/riscv.cc (riscv_const_insns): Ditto.
> (riscv_float_const_zero_rtx_p): New func impl for predicating the rtx 
> is
> const zero float or not.
> (riscv_const_zero_rtx_p): New func impl for predicating the rtx
> is const zero (both int and fp) or not.
> * config/riscv/riscv-protos.h (riscv_float_const_zero_rtx_p):
> New func decl.
> (riscv_const_zero_rtx_p): Ditto.
> * config/riscv/riscv.md: Making sure the operand[1] of movfp is
> CONST0_RTX when the operand[1] is const zero float.
>
> gcc/testsuite/ChangeLog:
>
> * gcc.target/riscv/no-signed-zeros-0.c: New test.
> * gcc.target/riscv/no-signed-zeros-1.c: New test.
> * gcc.target/riscv/no-signed-zeros-2.c: New test.
> * gcc.target/riscv/no-signed-zeros-3.c: New test.
> * gcc.target/riscv/no-signed-zeros-4.c: New test.
> * gcc.target/riscv/no-signed-zeros-5.c: New test.
> * gcc.target/riscv/no-signed-zeros-run-0.c: New test.
> * gcc.target/riscv/no-signed-zeros-run-1.c: New test.
>
> Signed-off-by: Pan Li 
> ---
>  gcc/config/riscv/constraints.md   |  2 +-
>  gcc/config/riscv/predicates.md|  2 +-
>  gcc/config/riscv/riscv-protos.h   |  2 +
>  gcc/config/riscv/riscv.cc | 35 -
>  gcc/config/riscv/riscv.md | 49 ---
>  .../gcc.target/riscv/no-signed-zeros-0.c  | 26 ++
>  .../gcc.target/riscv/no-signed-zeros-1.c  | 28 +++
>  .../gcc.target/riscv/no-signed-zeros-2.c  | 26 ++
>  .../gcc.target/riscv/no-signed-zeros-3.c  | 28 +++
>  .../gcc.target/riscv/no-signed-zeros-4.c  | 26 ++
>  .../gcc.target/riscv/no-signed-zeros-5.c  | 28 +++
>  .../gcc.target/riscv/no-signed-zeros-run-0.c  | 36 ++
>  .../gcc.target/riscv/no-signed-zeros-run-1.c  | 36 ++
>  13 files changed, 314 insertions(+), 10 

RE: [PATCH] i386: Fix recent testcase fail

2024-01-08 Thread Liu, Hongtao



> -Original Message-
> From: Jiang, Haochen 
> Sent: Monday, January 8, 2024 4:41 PM
> To: gcc-patches@gcc.gnu.org
> Cc: Liu, Hongtao ; ubiz...@gmail.com
> Subject: [PATCH] i386: Fix recent testcase fail
> 
> After commit 01f4251b8775c832a92d55e2df57c9ac72eaceef, early break
> vectorization is supported. The two testcases need to be fixed.
Ok.
> 
> gcc/testsuite/ChangeLog:
> 
>   * gcc.target/i386/avx512fp16-xorsign-1.c: Fix testcase.
>   * gcc.target/i386/part-vect-absneghf.c: Ditto.
> ---
>  gcc/testsuite/gcc.target/i386/avx512fp16-xorsign-1.c | 2 +-
>  gcc/testsuite/gcc.target/i386/part-vect-absneghf.c   | 2 +-
>  2 files changed, 2 insertions(+), 2 deletions(-)
> 
> diff --git a/gcc/testsuite/gcc.target/i386/avx512fp16-xorsign-1.c
> b/gcc/testsuite/gcc.target/i386/avx512fp16-xorsign-1.c
> index a22a6ceabff..f5dd457c9eb 100644
> --- a/gcc/testsuite/gcc.target/i386/avx512fp16-xorsign-1.c
> +++ b/gcc/testsuite/gcc.target/i386/avx512fp16-xorsign-1.c
> @@ -35,7 +35,7 @@ do_test (void)
>abort ();
>  }
> 
> -/* { dg-final { scan-tree-dump-times "vectorized 1 loops" 1 "vect" } } */
> +/* { dg-final { scan-tree-dump-times "vectorized 2 loops" 1 "vect" } } */
>  /* { dg-final { scan-assembler "\[ \t\]xor" } } */
>  /* { dg-final { scan-assembler "\[ \t\]and" } } */
>  /* { dg-final { scan-assembler-not "copysign" } } */
> diff --git a/gcc/testsuite/gcc.target/i386/part-vect-absneghf.c b/gcc/testsuite/gcc.target/i386/part-vect-absneghf.c
> index 48aed14d604..713f0bff4dd 100644
> --- a/gcc/testsuite/gcc.target/i386/part-vect-absneghf.c
> +++ b/gcc/testsuite/gcc.target/i386/part-vect-absneghf.c
> @@ -1,5 +1,5 @@
>  /* { dg-do run { target avx512fp16 } } */
> -/* { dg-options "-O1 -mavx512fp16 -mavx512vl -ftree-vectorize -fdump-tree-slp-details -fdump-tree-optimized" } */
> +/* { dg-options "-O1 -mavx512fp16 -mavx512vl -fdump-tree-slp-details -fdump-tree-optimized" } */
> 
>  extern void abort ();
> 
> --
> 2.31.1



ping^3: [PATCH] diagnostics: Fix behavior of permerror options after diagnostic pop [PR111918]

2024-01-08 Thread Lewis Hyatt
Can I please ping this one again? It's 3 lines or so to fix the PR. Thanks!
https://gcc.gnu.org/pipermail/gcc-patches/2023-November/638692.html

On Tue, Dec 19, 2023 at 6:20 PM Lewis Hyatt  wrote:
>
> Hello-
>
> May I please ping this one? Thanks...
> https://gcc.gnu.org/pipermail/gcc-patches/2023-November/638692.html
>
> -Lewis
>
> On Wed, Nov 29, 2023 at 7:05 PM Lewis Hyatt  wrote:
> >
> > On Thu, Nov 09, 2023 at 04:16:10PM -0500, Lewis Hyatt wrote:
> > > https://gcc.gnu.org/bugzilla/show_bug.cgi?id=111918
> > >
> > > This patch fixes the behavior of `#pragma GCC diagnostic pop' for permissive
> > > error diagnostics such as -Wnarrowing (in C++11). Those currently do not
> > > return to the correct state after the last pop; they become effectively
> > > simple warnings instead. Bootstrap + regtest all languages on x86-64, does
> > > it look OK please? Thanks!
> >
> > Hello-
> >
> > May I please ping this bug fix?
> > https://gcc.gnu.org/pipermail/gcc-patches/2023-November/635871.html
> >
> > Please note, it requires a trivial rebase on top of recent changes to
> > the class diagnostic_context public interface. I attached the rebased patch
> > here as well. Thanks!
> >
> > -Lewis


[PATCH] Resolve issue with Canadian build for x86_64-w64-mingw32 multilibs

2024-01-08 Thread unlvsur
From: trcrsired 

In the case of x86_64-w64-mingw32 GCC with multilibs, a conflict arises because both the
64-bit and the 32-bit DLLs are copied into the bin/ directory, so one set of DLLs
overwrites the other.

This commit aligns the Canadian build process for gcc targeting Windows with 
cross builds. Consequently, DLLs will no longer be copied into bin/ but will 
instead reside in the lib and lib32 directories.
---
 gcc/configure   | 32 ++--
 libatomic/configure | 16 +++-
 libbacktrace/configure  | 16 +++-
 libcc1/configure| 32 ++--
 libffi/configure| 32 ++--
 libgfortran/configure   | 32 ++--
 libgm2/configure| 32 ++--
 libgo/config/libtool.m4 | 16 +++-
 libgo/configure | 16 +++-
 libgomp/configure   | 32 ++--
 libgrust/configure  | 32 ++--
 libitm/configure| 32 ++--
 libobjc/configure   | 16 +++-
 libphobos/configure | 16 +++-
 libquadmath/configure   | 16 +++-
 libsanitizer/configure  | 32 ++--
 libssp/configure| 16 +++-
 libstdc++-v3/configure  | 32 ++--
 libtool.m4  | 16 +++-
 libvtv/configure| 32 ++--
 lto-plugin/configure| 16 +++-
 zlib/configure  | 16 +++-
 22 files changed, 495 insertions(+), 33 deletions(-)

diff --git a/gcc/configure b/gcc/configure
index 996046f5198..db9a5c8f40b 100755
--- a/gcc/configure
+++ b/gcc/configure
@@ -20631,6 +20631,19 @@ cygwin* | mingw* | pw32* | cegcc*)
   yes,cygwin* | yes,mingw* | yes,pw32* | yes,cegcc*)
 library_names_spec='$libname.dll.a'
 # DLL is installed to $(libdir)/../bin by postinstall_cmds
+# If user builds GCC with multilibs enabled,
+# it should just install on $(libdir)
+# not on $(libdir)/../bin or 32 bits dlls would override 64 bit ones.
+if test ${multilib} = yes; then
+postinstall_cmds='base_file=`basename \${file}`~
+  dlpath=`$SHELL 2>&1 -c '\''. $dir/'\''\${base_file}'\''i; echo \$dlname'\''`~
+  dldir=$destdir/`dirname \$dlpath`~
+  $install_prog $dir/$dlname $destdir/$dlname~
+  chmod a+x $destdir/$dlname~
+  if test -n '\''$stripme'\'' && test -n '\''$striplib'\''; then
+   eval '\''$striplib $destdir/$dlname'\'' || exit \$?;
+  fi'
+else
 postinstall_cmds='base_file=`basename \${file}`~
   dlpath=`$SHELL 2>&1 -c '\''. $dir/'\''\${base_file}'\''i; echo \$dlname'\''`~
   dldir=$destdir/`dirname \$dlpath`~
@@ -20638,8 +20651,9 @@ cygwin* | mingw* | pw32* | cegcc*)
   $install_prog $dir/$dlname \$dldir/$dlname~
   chmod a+x \$dldir/$dlname~
   if test -n '\''$stripme'\'' && test -n '\''$striplib'\''; then
-eval '\''$striplib \$dldir/$dlname'\'' || exit \$?;
+   eval '\''$striplib \$dldir/$dlname'\'' || exit \$?;
   fi'
+fi
 postuninstall_cmds='dldll=`$SHELL 2>&1 -c '\''. $file; echo \$dlname'\''`~
   dlpath=$dir/\$dldll~
$RM \$dlpath'
@@ -24359,6 +24373,19 @@ cygwin* | mingw* | pw32* | cegcc*)
   yes,cygwin* | yes,mingw* | yes,pw32* | yes,cegcc*)
 library_names_spec='$libname.dll.a'
 # DLL is installed to $(libdir)/../bin by postinstall_cmds
+# If user builds GCC with multilibs enabled,
+# it should just install on $(libdir)
+# not on $(libdir)/../bin or 32 bits dlls would override 64 bit ones.
+if test ${multilib} = yes; then
+postinstall_cmds='base_file=`basename \${file}`~
+  dlpath=`$SHELL 2>&1 -c '\''. $dir/'\''\${base_file}'\''i; echo \$dlname'\''`~
+  dldir=$destdir/`dirname \$dlpath`~
+  $install_prog $dir/$dlname $destdir/$dlname~
+  chmod a+x $destdir/$dlname~
+  if test -n '\''$stripme'\'' && test -n '\''$striplib'\''; then
+   eval '\''$striplib $destdir/$dlname'\'' || exit \$?;
+  fi'
+else
 postinstall_cmds='base_file=`basename \${file}`~
   dlpath=`$SHELL 2>&1 -c '\''. $dir/'\''\${base_file}'\''i; echo \$dlname'\''`~
   dldir=$destdir/`dirname \$dlpath`~
@@ -24366,8 +24393,9 @@ cygwin* | mingw* | pw32* | cegcc*)
   $install_prog $dir/$dlname \$dldir/$dlname~
   chmod a+x \$dldir/$dlname~
   if test -n '\''$stripme'\'' && test -n '\''$striplib'\''; then
-eval '\''$striplib \$dldir/$dlname'\'' || exit \$?;
+   eval '\''$striplib \$dldir/$dlname'\'' || exit \$?;
   fi'
+fi
 postuninstall_cmds='dldll=`$SHELL 2>&1 -c '\''. $file; echo \$dlname'\''`~
   dlpath=$dir/\$dldll~
$RM \$dlpath'
diff --git a/libatomic/configure b/libatomic/configure
index d579bab96f8..bf5e3858f94 100755
--- a/libatomic/configure
+++ b/libatomic/configure
@@ -10518,6 +10518,19 @@ 

Re: [PATCH v4] RISC-V: Adds the prefix "th." for the instructions of XTheadVector.

2024-01-08 Thread 钟居哲
This patch looks OK from my side.



juzhe.zh...@rivai.ai
 
From: Jun Sha (Joshua)
Date: 2024-01-03 14:08
To: gcc-patches
CC: jim.wilson.gcc; palmer; andrew; philipp.tomsich; jeffreyalaw; 
christoph.muellner; juzhe.zhong; Jun Sha (Joshua); Jin Ma; Xianmiao Qu
Subject: [PATCH v4] RISC-V: Adds the prefix "th." for the instructions of XTheadVector.
This patch adds the th. prefix to all XTheadVector instructions by
implementing new assembly output functions. We only check whether the
opcode starts with 'v', so no extra attribute is needed.
 
gcc/ChangeLog:
 
* config/riscv/riscv-protos.h (riscv_asm_output_opcode):
New function to add assembler insn code prefix/suffix.
(th_asm_output_opcode):
Thead function to add assembler insn code prefix/suffix.
* config/riscv/riscv.cc (riscv_asm_output_opcode): Likewise.
* config/riscv/riscv.h (ASM_OUTPUT_OPCODE): Likewise.
* config/riscv/thead.cc (th_asm_output_opcode): Likewise.
 
gcc/testsuite/ChangeLog:
 
* gcc.target/riscv/rvv/xtheadvector/prefix.c: New test.
 
Co-authored-by: Jin Ma 
Co-authored-by: Xianmiao Qu 
Co-authored-by: Christoph Müllner 
---
gcc/config/riscv/riscv-protos.h |  2 ++
gcc/config/riscv/riscv.cc   | 11 +++
gcc/config/riscv/riscv.h|  4 
gcc/config/riscv/thead.cc   | 13 +
.../gcc.target/riscv/rvv/xtheadvector/prefix.c  | 12 
5 files changed, 42 insertions(+)
create mode 100644 gcc/testsuite/gcc.target/riscv/rvv/xtheadvector/prefix.c
 
diff --git a/gcc/config/riscv/riscv-protos.h b/gcc/config/riscv/riscv-protos.h
index 31049ef7523..71724dabdb5 100644
--- a/gcc/config/riscv/riscv-protos.h
+++ b/gcc/config/riscv/riscv-protos.h
@@ -102,6 +102,7 @@ struct riscv_address_info {
};
/* Routines implemented in riscv.cc.  */
+extern const char *riscv_asm_output_opcode (FILE *asm_out_file, const char *p);
extern enum riscv_symbol_type riscv_classify_symbolic_expression (rtx);
extern bool riscv_symbolic_constant_p (rtx, enum riscv_symbol_type *);
extern int riscv_float_const_rtx_index_for_fli (rtx);
@@ -717,6 +718,7 @@ extern void th_mempair_prepare_save_restore_operands 
(rtx[4], bool,
  int, HOST_WIDE_INT,
  int, HOST_WIDE_INT);
extern void th_mempair_save_restore_regs (rtx[4], bool, machine_mode);
+extern const char *th_asm_output_opcode (FILE *asm_out_file, const char *p);
#ifdef RTX_CODE
extern const char*
th_mempair_output_move (rtx[4], bool, machine_mode, RTX_CODE);
diff --git a/gcc/config/riscv/riscv.cc b/gcc/config/riscv/riscv.cc
index 0d1cbc5cb5f..51878797287 100644
--- a/gcc/config/riscv/riscv.cc
+++ b/gcc/config/riscv/riscv.cc
@@ -5636,6 +5636,17 @@ riscv_get_v_regno_alignment (machine_mode mode)
   return lmul;
}
+/* Define ASM_OUTPUT_OPCODE to do anything special before
+   emitting an opcode.  */
+const char *
+riscv_asm_output_opcode (FILE *asm_out_file, const char *p)
+{
+  if (TARGET_XTHEADVECTOR)
+return th_asm_output_opcode (asm_out_file, p);
+
+  return p;
+}
+
/* Implement TARGET_PRINT_OPERAND.  The RISCV-specific operand codes are:
'h' Print the high-part relocation associated with OP, after stripping
diff --git a/gcc/config/riscv/riscv.h b/gcc/config/riscv/riscv.h
index 6df9ec73c5e..c33361a254d 100644
--- a/gcc/config/riscv/riscv.h
+++ b/gcc/config/riscv/riscv.h
@@ -826,6 +826,10 @@ extern enum riscv_cc get_riscv_cc (const rtx use);
   asm_fprintf ((FILE), "%U%s", (NAME)); \
   } while (0)
+#undef ASM_OUTPUT_OPCODE
+#define ASM_OUTPUT_OPCODE(STREAM, PTR) \
+  (PTR) = riscv_asm_output_opcode(STREAM, PTR)
+
#define JUMP_TABLES_IN_TEXT_SECTION 0
#define CASE_VECTOR_MODE SImode
#define CASE_VECTOR_PC_RELATIVE (riscv_cmodel != CM_MEDLOW)
diff --git a/gcc/config/riscv/thead.cc b/gcc/config/riscv/thead.cc
index 20353995931..dc3aed3904d 100644
--- a/gcc/config/riscv/thead.cc
+++ b/gcc/config/riscv/thead.cc
@@ -883,6 +883,19 @@ th_output_move (rtx dest, rtx src)
   return NULL;
}
+/* Define ASM_OUTPUT_OPCODE to do anything special before
+   emitting an opcode.  */
+const char *
+th_asm_output_opcode (FILE *asm_out_file, const char *p)
+{
+  /* We need to add th. prefix to all the xtheadvector
+ instructions here.*/
+  if (current_output_insn != NULL && p[0] == 'v')
+fputs ("th.", asm_out_file);
+
+  return p;
+}
+
/* Implement TARGET_PRINT_OPERAND_ADDRESS for XTheadMemIdx.  */
bool
diff --git a/gcc/testsuite/gcc.target/riscv/rvv/xtheadvector/prefix.c 
b/gcc/testsuite/gcc.target/riscv/rvv/xtheadvector/prefix.c
new file mode 100644
index 000..eee727ef6b4
--- /dev/null
+++ b/gcc/testsuite/gcc.target/riscv/rvv/xtheadvector/prefix.c
@@ -0,0 +1,12 @@
+/* { dg-do compile } */
+/* { dg-options "-march=rv32gc_xtheadvector -mabi=ilp32 -O0" } */
+
+#include "riscv_vector.h"
+
+vint32m1_t
+prefix (vint32m1_t vx, vint32m1_t vy, size_t vl)
+{
+  return __riscv_vadd_vv_i32m1 (vx, vy, vl);
+}
+
+/* { dg-final { scan-assembler {\mth\.v\M} } } */
-- 
2.17.1
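The prefixing rule in th_asm_output_opcode can be tried outside GCC with a small stand-alone model. This is a sketch only: model_th_asm_output_opcode is hypothetical, and the have_insn flag stands in for the current_output_insn != NULL test. As written in the patch, the check applies to any opcode whose first character is 'v'.

```c
#include <stdio.h>

/* Hypothetical model of th_asm_output_opcode: when an instruction is
   being output and its opcode starts with 'v', write "th." first.
   The opcode text is returned unchanged for the caller to print.  */
static const char *
model_th_asm_output_opcode (FILE *out, const char *p, int have_insn)
{
  if (have_insn && p[0] == 'v')
    fputs ("th.", out);
  return p;
}
```

A caller would print the returned pointer right after the call, just as ASM_OUTPUT_OPCODE assigns the result back to PTR before the opcode is emitted.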
 
 


Re: [PATCH v8 2/2] RISC-V: Add crypto vector api-testing cases.

2024-01-08 Thread 钟居哲
LGTM.



juzhe.zh...@rivai.ai
 
From: Feng Wang
Date: 2024-01-08 17:12
To: gcc-patches
CC: kito.cheng; jeffreyalaw; juzhe.zhong; Feng Wang
Subject: [PATCH v8 2/2] RISC-V: Add crypto vector api-testing cases.
Patch v8: Resubmit after fixing the rtl-checking issue. Passed all the riscv regression tests.
Patch v7: Add newline at the end of file.
Patch v6: Move intrinsic tests into rvv/base.
Patch v5: Rebase
Patch v4: Add some RV32 vx constraint testcase.
Patch v3: Refine crypto vector api-testing cases.
Patch v2: Update march info according to the change of riscv-common.c
 
This patch adds crypto vector api-testing cases based on
https://github.com/riscv-non-isa/rvv-intrinsic-doc/blob/eopc/vector-crypto/auto-generated/vector-crypto
gcc/testsuite/ChangeLog:
 
* gcc.target/riscv/rvv/base/zvbb-intrinsic.c: New test.
* gcc.target/riscv/rvv/base/zvbb_vandn_vx_constraint.c: New test.
* gcc.target/riscv/rvv/base/zvbc-intrinsic.c: New test.
* gcc.target/riscv/rvv/base/zvbc_vx_constraint-1.c: New test.
* gcc.target/riscv/rvv/base/zvbc_vx_constraint-2.c: New test.
* gcc.target/riscv/rvv/base/zvkg-intrinsic.c: New test.
* gcc.target/riscv/rvv/base/zvkned-intrinsic.c: New test.
* gcc.target/riscv/rvv/base/zvknha-intrinsic.c: New test.
* gcc.target/riscv/rvv/base/zvknhb-intrinsic.c: New test.
* gcc.target/riscv/rvv/base/zvksed-intrinsic.c: New test.
* gcc.target/riscv/rvv/base/zvksh-intrinsic.c: New test.
* gcc.target/riscv/zvkb.c: New test.
---
.../riscv/rvv/base/zvbb-intrinsic.c   | 179 ++
.../riscv/rvv/base/zvbb_vandn_vx_constraint.c |  15 ++
.../riscv/rvv/base/zvbc-intrinsic.c   |  62 ++
.../riscv/rvv/base/zvbc_vx_constraint-1.c |  14 ++
.../riscv/rvv/base/zvbc_vx_constraint-2.c |  14 ++
.../riscv/rvv/base/zvkg-intrinsic.c   |  24 +++
.../riscv/rvv/base/zvkned-intrinsic.c | 104 ++
.../riscv/rvv/base/zvknha-intrinsic.c |  33 
.../riscv/rvv/base/zvknhb-intrinsic.c |  33 
.../riscv/rvv/base/zvksed-intrinsic.c |  33 
.../riscv/rvv/base/zvksh-intrinsic.c  |  24 +++
gcc/testsuite/gcc.target/riscv/zvkb.c |  13 ++
12 files changed, 548 insertions(+)
create mode 100644 gcc/testsuite/gcc.target/riscv/rvv/base/zvbb-intrinsic.c
create mode 100644 
gcc/testsuite/gcc.target/riscv/rvv/base/zvbb_vandn_vx_constraint.c
create mode 100644 gcc/testsuite/gcc.target/riscv/rvv/base/zvbc-intrinsic.c
create mode 100644 
gcc/testsuite/gcc.target/riscv/rvv/base/zvbc_vx_constraint-1.c
create mode 100644 
gcc/testsuite/gcc.target/riscv/rvv/base/zvbc_vx_constraint-2.c
create mode 100644 gcc/testsuite/gcc.target/riscv/rvv/base/zvkg-intrinsic.c
create mode 100644 gcc/testsuite/gcc.target/riscv/rvv/base/zvkned-intrinsic.c
create mode 100644 gcc/testsuite/gcc.target/riscv/rvv/base/zvknha-intrinsic.c
create mode 100644 gcc/testsuite/gcc.target/riscv/rvv/base/zvknhb-intrinsic.c
create mode 100644 gcc/testsuite/gcc.target/riscv/rvv/base/zvksed-intrinsic.c
create mode 100644 gcc/testsuite/gcc.target/riscv/rvv/base/zvksh-intrinsic.c
create mode 100644 gcc/testsuite/gcc.target/riscv/zvkb.c
 
diff --git a/gcc/testsuite/gcc.target/riscv/rvv/base/zvbb-intrinsic.c b/gcc/testsuite/gcc.target/riscv/rvv/base/zvbb-intrinsic.c
new file mode 100644
index 000..b7e25bfe819
--- /dev/null
+++ b/gcc/testsuite/gcc.target/riscv/rvv/base/zvbb-intrinsic.c
@@ -0,0 +1,179 @@
+/* { dg-do compile } */
+/* { dg-options "-march=rv64gc_zvbb_zve64x -mabi=lp64d -Wno-psabi" } */
+#include "riscv_vector.h"
+
+vuint8mf8_t test_vandn_vv_u8mf8(vuint8mf8_t vs2, vuint8mf8_t vs1, size_t vl) {
+  return __riscv_vandn_vv_u8mf8(vs2, vs1, vl);
+}
+
+vuint32m1_t test_vandn_vx_u32m1(vuint32m1_t vs2, uint32_t rs1, size_t vl) {
+  return __riscv_vandn_vx_u32m1(vs2, rs1, vl);
+}
+
+vuint32m2_t test_vandn_vv_u32m2_m(vbool16_t mask, vuint32m2_t vs2, vuint32m2_t vs1, size_t vl) {
+  return __riscv_vandn_vv_u32m2_m(mask, vs2, vs1, vl);
+}
+
+vuint16mf2_t test_vandn_vx_u16mf2_m(vbool32_t mask, vuint16mf2_t vs2, uint16_t rs1, size_t vl) {
+  return __riscv_vandn_vx_u16mf2_m(mask, vs2, rs1, vl);
+}
+
+vuint32m4_t test_vandn_vv_u32m4_tumu(vbool8_t mask, vuint32m4_t maskedoff, vuint32m4_t vs2, vuint32m4_t vs1, size_t vl) {
+  return __riscv_vandn_vv_u32m4_tumu(mask, maskedoff, vs2, vs1, vl);
+}
+
+vuint64m4_t test_vandn_vx_u64m4_tumu(vbool16_t mask, vuint64m4_t maskedoff, vuint64m4_t vs2, uint64_t rs1, size_t vl) {
+  return __riscv_vandn_vx_u64m4_tumu(mask, maskedoff, vs2, rs1, vl);
+}
+
+vuint8m8_t test_vbrev_v_u8m8(vuint8m8_t vs2, size_t vl) {
+  return __riscv_vbrev_v_u8m8(vs2, vl);
+}
+
+vuint16m1_t test_vbrev_v_u16m1_m(vbool16_t mask, vuint16m1_t vs2, size_t vl) {
+  return __riscv_vbrev_v_u16m1_m(mask, vs2, vl);
+}
+
+vuint32m4_t test_vbrev_v_u32m4_tumu(vbool8_t mask, vuint32m4_t maskedoff, vuint32m4_t vs2, size_t vl) {
+  return __riscv_vbrev_v_u32m4_tumu(mask, maskedoff, vs2, vl);
+}
+
+vuint16mf4_t test_vbrev8_v_u16mf4(vuint16mf4_t vs2, size_t vl) {
+  return 

Re: [PATCH v7 1/2] RISC-V: Add crypto vector builtin function.

2024-01-08 Thread 钟居哲
LGTM.



juzhe.zh...@rivai.ai
 
From: Feng Wang
Date: 2024-01-08 17:12
To: gcc-patches
CC: kito.cheng; jeffreyalaw; juzhe.zhong; Feng Wang
Subject: [PATCH v7 1/2] RISC-V: Add crypto vector builtin function.
Patch v7: Resubmit after fixing the rtl-checking issue. Passed all the riscv regression tests.
Patch v6: Remove unused code.
Patch v5: Rebase.
Patch v4: Merge crypto vector function.def into vector.
Patch v3: Define a shape for vaesz and merge vector-crypto-types.def
 into riscv-vector-builtins-types.def.
Patch v2: Optimize function_shape class for crypto_vector.
 
This patch adds the intrinsic functions of crypto vector based on the
intrinsic doc (https://github.com/riscv-non-isa/rvv-intrinsic-doc/blob/eopc/vector-crypto/auto-generated/vector-crypto/intrinsic_funcs.md).
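Most of the new operations are element-wise bit manipulations. vclmul/vclmulh (Zvbc) compute a carry-less product instead, which may be less familiar. The following C++ sketch models one 64-bit element. It assumes the Zvbc definition: vclmul returns the low 64 bits of the 128-bit carry-less product, and vclmulh returns the high 64 bits. The helper names `clmul128`, `clmul_lo` and `clmul_hi` are hypothetical.

```cpp
#include <cassert>
#include <cstdint>

// Full 128-bit carry-less product split into two 64-bit halves.
struct Clmul128 { uint64_t lo, hi; };

// Carry-less (GF(2)[x]) multiply: partial products are combined with
// XOR instead of addition, so no carries propagate between bit positions.
Clmul128 clmul128(uint64_t a, uint64_t b)
{
  Clmul128 r{0, 0};
  for (int i = 0; i < 64; ++i)
    if ((b >> i) & 1)
      {
        r.lo ^= a << i;             // bits that stay in the low half
        if (i != 0)
          r.hi ^= a >> (64 - i);    // bits shifted out into the high half
      }
  return r;
}

// Scalar models of one SEW=64 element of vclmul and vclmulh.
uint64_t clmul_lo(uint64_t a, uint64_t b) { return clmul128(a, b).lo; }
uint64_t clmul_hi(uint64_t a, uint64_t b) { return clmul128(a, b).hi; }
```

For example, `clmul_lo(3, 3)` is 5 (0b11 XOR 0b110) rather than the ordinary product 9.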
 
Co-Authored by: Songhe Zhu 
Co-Authored by: Ciyan Pan 
gcc/ChangeLog:
 
* config/riscv/riscv-vector-builtins-bases.cc (class vandn):
Add new function_base for crypto vector.
(class bitmanip): Ditto.
(class b_reverse): Ditto.
(class vwsll): Ditto.
(class clmul): Ditto.
(class vg_nhab): Ditto.
(class crypto_vv): Ditto.
(class crypto_vi): Ditto.
(class vaeskf2_vsm3c): Ditto.
(class vsm3me): Ditto.
(BASE): Add BASE declaration for crypto vector.
* config/riscv/riscv-vector-builtins-bases.h: Ditto.
* config/riscv/riscv-vector-builtins-functions.def (REQUIRED_EXTENSIONS):
Add crypto vector intrinsic definition.
(vbrev): Ditto.
(vclz): Ditto.
(vctz): Ditto.
(vwsll): Ditto.
(vandn): Ditto.
(vbrev8): Ditto.
(vrev8): Ditto.
(vrol): Ditto.
(vror): Ditto.
(vclmul): Ditto.
(vclmulh): Ditto.
(vghsh): Ditto.
(vgmul): Ditto.
(vaesef): Ditto.
(vaesem): Ditto.
(vaesdf): Ditto.
(vaesdm): Ditto.
(vaesz): Ditto.
(vaeskf1): Ditto.
(vaeskf2): Ditto.
(vsha2ms): Ditto.
(vsha2ch): Ditto.
(vsha2cl): Ditto.
(vsm4k): Ditto.
(vsm4r): Ditto.
(vsm3me): Ditto.
(vsm3c): Ditto.
* config/riscv/riscv-vector-builtins-shapes.cc (struct crypto_vv_def):
Add new function_shape for crypto vector.
(struct crypto_vi_def): Ditto.
(struct crypto_vv_no_op_type_def): Ditto.
(SHAPE): Add SHAPE declaration of crypto vector.
* config/riscv/riscv-vector-builtins-shapes.h: Ditto.
* config/riscv/riscv-vector-builtins-types.def (DEF_RVV_CRYPTO_SEW32_OPS):
Add new data type for crypto vector.
(DEF_RVV_CRYPTO_SEW64_OPS): Ditto.
(vuint32mf2_t): Ditto.
(vuint32m1_t): Ditto.
(vuint32m2_t): Ditto.
(vuint32m4_t): Ditto.
(vuint32m8_t): Ditto.
(vuint64m1_t): Ditto.
(vuint64m2_t): Ditto.
(vuint64m4_t): Ditto.
(vuint64m8_t): Ditto.
* config/riscv/riscv-vector-builtins.cc (DEF_RVV_CRYPTO_SEW32_OPS):
Add new data struct for crypto vector.
(DEF_RVV_CRYPTO_SEW64_OPS): Ditto.
(registered_function::overloaded_hash): Process size_t uimm for C overloaded func.
* config/riscv/riscv-vector-builtins.def (vi): Add vi OP_TYPE.
---
.../riscv/riscv-vector-builtins-bases.cc  | 264 +-
.../riscv/riscv-vector-builtins-bases.h   |  28 ++
.../riscv/riscv-vector-builtins-functions.def |  94 +++
.../riscv/riscv-vector-builtins-shapes.cc |  87 +-
.../riscv/riscv-vector-builtins-shapes.h  |   4 +
.../riscv/riscv-vector-builtins-types.def |  25 ++
gcc/config/riscv/riscv-vector-builtins.cc | 133 -
gcc/config/riscv/riscv-vector-builtins.def|   1 +
8 files changed, 633 insertions(+), 3 deletions(-)
 
diff --git a/gcc/config/riscv/riscv-vector-builtins-bases.cc b/gcc/config/riscv/riscv-vector-builtins-bases.cc
index d70468542ee..d12bb89f91c 100644
--- a/gcc/config/riscv/riscv-vector-builtins-bases.cc
+++ b/gcc/config/riscv/riscv-vector-builtins-bases.cc
@@ -2127,6 +2127,212 @@ public:
   }
};
+/* Below implements are vector crypto */
+/* Implements vandn.[vv,vx] */
+class vandn : public function_base
+{
+public:
+  rtx expand (function_expander &e) const override
+  {
+    switch (e.op_info->op)
+      {
+      case OP_TYPE_vv:
+        return e.use_exact_insn (code_for_pred_vandn (e.vector_mode ()));
+      case OP_TYPE_vx:
+        return e.use_exact_insn (code_for_pred_vandn_scalar (e.vector_mode ()));
+      default:
+        gcc_unreachable ();
+      }
+  }
+};
+
+/* Implements vrol/vror/clz/ctz.  */
+template<rtx_code CODE>
+class bitmanip : public function_base
+{
+public:
+  bool apply_tail_policy_p () const override
+  {
+    return (CODE == CLZ || CODE == CTZ) ? false : true;
+  }
+  bool apply_mask_policy_p () const override
+  {
+    return (CODE == CLZ || CODE == CTZ) ? false : true;
+  }
+  bool has_merge_operand_p () const override
+  {
+    return (CODE == CLZ || CODE == CTZ) ? false : true;
+  }
+
+  rtx expand (function_expander &e) const override
+  {
+    switch (e.op_info->op)
+      {
+      case OP_TYPE_v:
+      case OP_TYPE_vv:
+        return e.use_exact_insn (code_for_pred_v (CODE, e.vector_mode ()));
+      case OP_TYPE_vx:
+        return e.use_exact_insn (code_for_pred_v_scalar (CODE, e.vector_mode ()));
+      default:
+        gcc_unreachable ();
+      }
+  }
+};
+
+/* Implements vbrev/vbrev8/vrev8.  */
+template
+class b_reverse : public 

Re: [committed V3] libstdc++: Add Unicode-aware width estimation for std::format

2024-01-08 Thread Jonathan Wakely
On Mon, 8 Jan 2024 at 01:19, Jonathan Wakely  wrote:
>
> I decided to push this now, not wait for the morning.
>
> This is mostly the same as V2, but adds to the contrib/unicode/README as
> suggested by Lewis, and avoids a trailing whitespace character in the
> generated header.
>
> Tested x86_64-linux and aarch64-linux. Pushed to trunk.
>
> -- >8 --
>
>
> This implements the requirements in the following proposals, which
> dictate how std::format deals with non-ASCII strings:
> https://www.open-std.org/jtc1/sc22/wg21/docs/papers/2019/p1868r1.html
> https://www.open-std.org/jtc1/sc22/wg21/docs/papers/2023/p2572r1.html
> https://www.open-std.org/jtc1/sc22/wg21/docs/papers/2023/p2675r1.pdf
>
> There are two parts to this. The width estimation for strings must only
> count the width of the first character in an extended grapheme cluster.
> That requires implementing the algorithm for detecting cluster breaks,
> which requires a number of lookup tables of the grapheme cluster break
> properties (and Indic_Conjunct_Break and Extended_Pictographic
> properties) of every code point. Additionally, some characters have a
> field width of 2, which requires another lookup table of field widths
> for every code point.  The tables added in this commit do not contain
> entries for every code point from 0 to 0x10FFFF as that would be very
> inefficient and use too much memory. Instead the tables only contain the
> code points that form an "edge" for a property, omitting all the code
> points that have the same property as the preceding one. We can use a
> binary search to find the closest code point in the table that is not
> greater than the one we're looking for.
>
> The tables are generated by a new Python script added to the
> contrib/unicode directory, and a new data file downloaded from the
> Unicode Consortium website.
>
> The rules for extended grapheme cluster breaking are implemented for the
> latest Unicode standard, version 15.1.0.
>
> libstdc++-v3/ChangeLog:
>
> * include/Makefile.am: Add new headers.
> * include/Makefile.in: Regenerate.
> * include/bits/unicode.h: New file.
> * include/bits/unicode-data.h: New file.
> * include/std/format: Include .
> (__literal_encoding_is_utf8): Move to .
> (_Spec::_M_fill): Change type to char32_t.
> (_Spec::_M_parse_fill_and_align): Read a Unicode scalar value
> instead of a single character.
> (__write_padded): Change __fill_char parameter to char32_t and
> encode it into the output.
> (__formatter_str::format): Use new __unicode::__field_width and
> __unicode::__truncate functions.
> * include/std/ostream: Adjust namespace qualification for
> __literal_encoding_is_utf8.
> * include/std/print: Likewise.
> * src/c++23/print.cc: Add [[unlikely]] attribute to error path.
> * testsuite/ext/unicode/view.cc: New test.
> * testsuite/std/format/functions/format.cc: Add missing examples
> from the standard demonstrating alignment with non-ASCII
> characters. Add examples checking correct handling of extended
> grapheme clusters.
>
> contrib/ChangeLog:
>
> * unicode/README: Add notes about generating libstdc++ tables.
> * unicode/GraphemeBreakProperty.txt: New file.
> * unicode/emoji-data.txt: New file.
> * unicode/gen_libstdcxx_unicode_data.py: New file.
> ---


While writing some more tests I realised I'd forgotten to finish this
function, and had left it as a copy from __field_width(char32_t)
above:

> +  constexpr bool
> +  __is_extended_pictographic(char32_t __c)
> +  {
> +    if (__c < __xpicto_edges[0]) [[likely]]
> +      return 1;
> +
> +    auto* __p = std::upper_bound(__xpicto_edges, std::end(__xpicto_edges), __c);
> +    return (__p - __xpicto_edges) % 2 + 1;
> +  }

It should be:

  constexpr bool
  __is_extended_pictographic(char32_t __c)
  {
    if (__c < __xpicto_edges[0]) [[likely]]
      return false;

    auto* __p = std::upper_bound(__xpicto_edges, std::end(__xpicto_edges), __c);
    return (__p - __xpicto_edges) % 2;
  }

I'll push a fix for that (and add my new tests) tomorrow.
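The edge-table lookup described in the commit message, and the corrected parity test above, can be illustrated with a small standalone C++ sketch. The table values are made up for illustration and are not the generated Unicode data. `example_edges` and `has_property` are hypothetical names. Each entry toggles membership, so the index returned by `std::upper_bound` (the number of edges not greater than `c`) is odd exactly when `c` lies inside a run that has the property.

```cpp
#include <algorithm>
#include <cassert>
#include <iterator>

// Illustrative (not real Unicode) edge table.  Entries at even indices
// start a run of code points that have the property; entries at odd
// indices start a run that does not.  The table must be sorted.
const char32_t example_edges[] = {
  0x00A9, 0x00AA,    // U+00A9 has the property, U+00AA does not
  0x00AE, 0x00AF,    // U+00AE has the property
  0x1F300, 0x1F600,  // U+1F300..U+1F5FF have the property
};

bool has_property(char32_t c)
{
  if (c < example_edges[0])
    return false;  // common fast path: below the first edge
  // First edge greater than c; its index is the number of edges <= c,
  // which is odd exactly when c is inside a run with the property.
  const char32_t *p = std::upper_bound(std::begin(example_edges),
                                       std::end(example_edges), c);
  return (p - example_edges) % 2 == 1;
}
```

The unfinished version maps every input to 1 or 2, and both values convert to `true`. The `% 2` parity gives the intended answer.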



Re: c++/modules: Emit definitions of ODR-used static members imported from modules [PR112899]

2024-01-08 Thread Jason Merrill

On 1/8/24 04:21, Iain Sandoe wrote:

On 6 Jan 2024, at 22:30, Nathan Sidwell  wrote:

Richard Smith & I discussed whether we should use the module interface's 
capability of giving vague linkage entities a strong location. I didn't want to go 
messing with that, 'cos it was changing yet more stuff.

But, perhaps we should revisit that?  Any keyless polymorphic class in module 
purview gets its vtables etc emitted in the module's object file?  Likewise 
these kinds of entities.

cc'ing Iain, who probably knows more about Clang's state here.


I have been trying to keep up with this thread, but not sure if I can throw a 
whole lot of light on things.

There is an ongoing attempt (now some 3 or 4 papers in) to figure out 
how to handle `static inline` entities, at least at file scope, but that 
appears to be a different case (I can try and locate the latest paper on this if 
needed; the topic was discussed in Varna and Kona, but there is no new paper yet; 
perhaps Michael [Spencer] will bring a paper to Tokyo).

clang ran into some issues with vtables and that resulted in some discussion 
about whether there should be an amendment to the Itanium ABI to deal with the 
module-specific stuff.

https://github.com/itanium-cxx-abi/cxx-abi/issues/170

https://github.com/llvm/llvm-project/pull/75912#discussion_r1444150069

Sorry I cannot be much more specific at present,


That's pretty specific that vtables at least get emitted in the module 
whether or not there's a key function.  I've asked on that issue why 
this only applies to vtables.


Jason



[committed] xfail dg-final "Sunk statements: 5" on hppa*64*-*-*

2024-01-08 Thread John David Anglin
Tested on hppa64-hp-hpux11.11.  Committed to trunk.

Dave
---

xfail dg-final "Sunk statements: 5" on hppa*64*-*-*

2024-01-08  John David Anglin  

gcc/testsuite/ChangeLog:

* gcc.dg/tree-ssa/ssa-sink-18.c: xfail dg-final "Sunk statements: 5"
on hppa*64*-*-*.

diff --git a/gcc/testsuite/gcc.dg/tree-ssa/ssa-sink-18.c b/gcc/testsuite/gcc.dg/tree-ssa/ssa-sink-18.c
index 1372100882e..b199df26a0f 100644
--- a/gcc/testsuite/gcc.dg/tree-ssa/ssa-sink-18.c
+++ b/gcc/testsuite/gcc.dg/tree-ssa/ssa-sink-18.c
@@ -215,4 +215,4 @@ compute_on_bytes (uint8_t *in_data, int in_len, uint8_t *out_data, int out_len)
 base+index addressing modes, so the ip[len] address computation can't be
 made from the IV computation above.  powerpc64le similarly is affected.  */
 
- /* { dg-final { scan-tree-dump-times "Sunk statements: 5" 1 "sink2" { target lp64 xfail { riscv64-*-* powerpc64le-*-* } } } } */
+ /* { dg-final { scan-tree-dump-times "Sunk statements: 5" 1 "sink2" { target lp64 xfail { riscv64-*-* powerpc64le-*-* hppa*64*-*-* } } } } */


signature.asc
Description: PGP signature


[committed] Skip gfortran.dg/dec_math.f90 on hppa*-*-hpux*

2024-01-08 Thread John David Anglin
Tested on hppa64-hp-hpux11.11.  Committed to trunk.

Dave
---

Skip gfortran.dg/dec_math.f90 on hppa

hppa*-*-hpux* doesn't have any long double trig functions.

2024-01-08  John David Anglin  

gcc/testsuite/ChangeLog:

* gfortran.dg/dec_math.f90: Skip on hppa*-*-hpux*.

diff --git a/gcc/testsuite/gfortran.dg/dec_math.f90 b/gcc/testsuite/gfortran.dg/dec_math.f90
index d95233a5169..393e7def88e 100644
--- a/gcc/testsuite/gfortran.dg/dec_math.f90
+++ b/gcc/testsuite/gfortran.dg/dec_math.f90
@@ -1,5 +1,6 @@
 ! { dg-options "-cpp -std=gnu" }
 ! { dg-do run { xfail i?86-*-freebsd* } }
+! { dg-skip-if "No long double libc functions" { hppa*-*-hpux* } }
 !
 ! Test extra math intrinsics formerly offered by -fdec-math,
 ! now included with -std=gnu or -std=legacy.




[r14-7003 Regression] FAIL: gfortran.dg/power_8.f90 -O3 -g execution test on Linux/x86_64

2024-01-08 Thread haochen.jiang
On Linux/x86_64,

b3cc5a1efead520bc977b4ba51f1328d01b3e516 is the first bad commit
commit b3cc5a1efead520bc977b4ba51f1328d01b3e516
Author: Richard Biener 
Date:   Fri Dec 15 10:32:29 2023 +0100

tree-optimization/113026 - avoid vector epilog in more cases

caused

FAIL: gcc.c-torture/execute/950612-1.c   -O3 -fomit-frame-pointer -funroll-loops -fpeel-loops -ftracer -finline-functions  execution test
FAIL: gcc.c-torture/execute/950612-1.c   -O3 -g  execution test
FAIL: gcc.c-torture/execute/builtin-bitops-1.c   -O3 -fomit-frame-pointer -funroll-loops -fpeel-loops -ftracer -finline-functions  execution test
FAIL: gcc.c-torture/execute/builtin-bitops-1.c   -O3 -g  execution test
FAIL: gcc.dg/vect/vect-early-break_74.c execution test
FAIL: gcc.dg/vect/vect-early-break_74.c -flto -ffat-lto-objects execution test
FAIL: gcc.dg/vect/vect-early-break_78.c execution test
FAIL: gcc.dg/vect/vect-early-break_78.c -flto -ffat-lto-objects execution test
FAIL: gfortran.dg/power_8.f90   -O3 -fomit-frame-pointer -funroll-loops -fpeel-loops -ftracer -finline-functions  execution test
FAIL: gfortran.dg/power_8.f90   -O3 -g  execution test

with GCC configured with

../../gcc/configure --prefix=/export/users/haochenj/src/gcc-bisect/master/master/r14-7003/usr --enable-clocale=gnu --with-system-zlib --with-demangler-in-ld --with-fpmath=sse --enable-languages=c,c++,fortran --enable-cet --without-isl --enable-libmpx x86_64-linux --disable-bootstrap

To reproduce:

$ cd {build_dir}/gcc && make check RUNTESTFLAGS="execute.exp=gcc.c-torture/execute/950612-1.c --target_board='unix{-m32\ -march=cascadelake}'"
$ cd {build_dir}/gcc && make check RUNTESTFLAGS="execute.exp=gcc.c-torture/execute/950612-1.c --target_board='unix{-m64\ -march=cascadelake}'"
$ cd {build_dir}/gcc && make check RUNTESTFLAGS="execute.exp=gcc.c-torture/execute/builtin-bitops-1.c --target_board='unix{-m32\ -march=cascadelake}'"
$ cd {build_dir}/gcc && make check RUNTESTFLAGS="execute.exp=gcc.c-torture/execute/builtin-bitops-1.c --target_board='unix{-m64\ -march=cascadelake}'"
$ cd {build_dir}/gcc && make check RUNTESTFLAGS="vect.exp=gcc.dg/vect/vect-early-break_74.c --target_board='unix{-m32\ -march=cascadelake}'"
$ cd {build_dir}/gcc && make check RUNTESTFLAGS="vect.exp=gcc.dg/vect/vect-early-break_74.c --target_board='unix{-m64\ -march=cascadelake}'"
$ cd {build_dir}/gcc && make check RUNTESTFLAGS="vect.exp=gcc.dg/vect/vect-early-break_78.c --target_board='unix{-m32\ -march=cascadelake}'"
$ cd {build_dir}/gcc && make check RUNTESTFLAGS="vect.exp=gcc.dg/vect/vect-early-break_78.c --target_board='unix{-m64\ -march=cascadelake}'"
$ cd {build_dir}/gcc && make check RUNTESTFLAGS="dg.exp=gfortran.dg/power_8.f90 --target_board='unix{-m32\ -march=cascadelake}'"
$ cd {build_dir}/gcc && make check RUNTESTFLAGS="dg.exp=gfortran.dg/power_8.f90 --target_board='unix{-m64\ -march=cascadelake}'"

(Please do not reply to this email; for questions about this report, contact me 
at haochen dot jiang at intel.com.)
(If you run into problems related to cascadelake, disabling AVX512F on the command 
line might help.)
(However, please make sure that there are no potential problems with AVX512.)


Re: [PATCH][frontend]: don't ice with pragma NOVECTOR if loop in C has no condition [PR113267]

2024-01-08 Thread Joseph Myers
On Mon, 8 Jan 2024, Tamar Christina wrote:

> Hi All,
> 
> In C you can have loops without a condition. The original version of the patch
> rejected the use of #pragma GCC novector, however during review it was
> changed not to do this, because we didn't want to give a compile
> error in such cases.
> 
> However, because annotations seem to only be allowed on conditions (unless
> I'm mistaken?), the attached example ICEs because there's no condition.
> 
> This will have it ignore the pragma instead of ICEing.  I don't know if this is
> the best solution, but as far as I can tell we can't attach the annotation to
> anything else.
> 
> Bootstrapped Regtested on aarch64-none-linux-gnu and no issues.
> 
> Ok for master?

OK.

-- 
Joseph S. Myers
josmy...@redhat.com



Re: [Patch, fortran PR89645/99065 No IMPLICIT type error with: ASSOCIATE( X => function() )

2024-01-08 Thread Harald Anlauf

Hi Paul,

your patch looks already very impressive!

Regarding the patch as is, I am still trying to grok it, even with your
explanations at hand...

While the testcase works as advertised, I noticed that it exhibits a
runtime memleak that occurs for (likely) each case where the associate
target is an allocatable, class-valued function result.

I tried to produce a minimal testcase using class(*), which apparently
is not handled by your patch (it ICEs for me):

program p
  implicit none
  class(*), allocatable :: x(:)
  x = foo()
  call prt (x)
  deallocate (x)
  ! up to here no memleak...
  associate (var => foo())
    call prt (var)
  end associate
contains
  function foo() result(res)
    class(*), allocatable :: res(:)
    res = [42]
  end function foo
  subroutine prt (x)
    class(*), intent(in) :: x(:)
    select type (x)
    type is (integer)
       print *, x
    class default
       stop 99
    end select
  end subroutine prt
end

Traceback (truncated):

foo.f90:9:18:

9 | call prt (var)
  |  1
internal compiler error: tree check: expected record_type or union_type
or qual_union_type, have function_type in gfc_class_len_get, at
fortran/trans-expr.cc:271
0x19fd5d5 tree_check_failed(tree_node const*, char const*, int, char
const*, ...)
../../gcc-trunk/gcc/tree.cc:8952
0xe1562d tree_check3(tree_node*, char const*, int, char const*,
tree_code, tree_code, tree_code)
../../gcc-trunk/gcc/tree.h:3652
0xe3e264 gfc_class_len_get(tree_node*)
../../gcc-trunk/gcc/fortran/trans-expr.cc:271
0xecda48 trans_associate_var
../../gcc-trunk/gcc/fortran/trans-stmt.cc:2325
0xecdd09 gfc_trans_block_construct(gfc_code*)
../../gcc-trunk/gcc/fortran/trans-stmt.cc:2383
[...]

I don't see anything wrong with it: NAG groks it, like Nvidia and Flang,
while Intel crashes at runtime.

Can you have another brief look?

Thanks,
Harald


On 1/6/24 18:26, Paul Richard Thomas wrote:

These PRs come about because of gfortran's single pass parsing. If the
function in the title is parsed after the associate construct, then its
type and rank are not known. The point at which this becomes a problem is
when expressions within the associate block are parsed. primary.cc
(gfc_match_varspec) could already deal with intrinsic types and so
component references were the trigger for the problem.

The two major parts of this patch are the fixup needed in gfc_match_varspec
and the resolution of expressions with references in resolve.cc
(gfc_fixup_inferred_type_refs). The former relies on the two new functions
in symbol.cc to search for derived types with an appropriate component to
match the component reference and then set the associate name to have a
matching derived type. gfc_fixup_inferred_type_refs is called in resolution
and so the type of the selector function is known.
gfc_fixup_inferred_type_refs ensures that the component references use this
derived type and that array references occur in the right place in
expressions and match preceding array specs. Most of the work in preparing
the patch was sorting out cases where the selector was not a derived type
but, instead, a class function. If it were not for this, the patch would
have been submitted six months ago :-(

The patch is relatively safe because most of the chunks are guarded by
testing for the associate name being an inferred type, which is set in
gfc_match_varspec. For this reason, I do not think it likely that the patch
will cause regressions. However, it is more than possible that variants not
appearing in the submitted testcase will throw up new bugs.

Jerry has already given the patch a whirl and found that it applies
cleanly, regtests OK and works as advertised.

OK for trunk?

Paul

Fortran: Fix class/derived type function associate selectors [PR87477]

2024-01-06  Paul Thomas  

gcc/fortran
PR fortran/87477
PR fortran/89645
PR fortran/99065
* class.cc (gfc_change_class): New function needed for
associate names, when rank changes or a derived type is
produced by resolution.
* dump-parse-tree.cc (show_code_node): Make output for SELECT
TYPE more comprehensible.
* gfortran.h : Add 'gfc_association_list' to structure
'gfc_association_list'. Add prototypes for
'gfc_find_derived_types', 'gfc_fixup_inferred_type_refs' and
'gfc_change_class'. Add macro IS_INFERRED_TYPE.
* match.cc (copy_ts_from_selector_to_associate): Add boolean arg
'select_type' with default false. If this is a select type name
and the selector is an inferred type, build the class type and
apply it to the associate name.
(build_associate_name): Pass true to 'select_type' in call to
previous.
* parse.cc (parse_associate): If the selector is an inferred type
the associate name is too. Make sure that function selector
class and rank, if known, are passed to the associate name. If
a function result exists, pass its typespec to the associate
name.
* primary.cc (gfc_match_varspec): If a scalar derived type
select type temporary has an array reference, 

Re: [PATCH] Add a late-combine pass [PR106594]

2024-01-08 Thread Jeff Law




On 1/8/24 12:11, Richard Sandiford wrote:



Thanks.  That led me to the following, which seems a bit more plausible
than my first attempt.  I'll test it on aarch64-linux-gnu and
x86_64-linux-gnu.  Does it look OK?
It looks reasonable to me.  I'm going to send another failure (ICE in 
finalize_new_accesses on a different target) separately.


Jeff


[committed] hppa: Fix bind_c_coms.f90 and bind_c_vars.f90 tests on hppa

2024-01-08 Thread John David Anglin
Tested on hppa64-hp-hpux11.11.  Committed to trunk.

Dave
---

hppa: Fix bind_c_coms.f90 and bind_c_vars.f90 tests on hppa

Commit 6271dd98 changed the default from -fcommon to -fno-common.
This silently changed the alignment of uninitialized BSS data on
hppa where the alignment of common data must be greater than or equal
to the alignment of the largest type that will fit in the block.
For example, the alignment of `double d[2];' changed from 16 to 8
on hppa64.

The hppa architecture requires strict alignment and the linker
warns about inconsistent alignment of variables.  This change broke
the gfortran.dg/bind_c_coms.f90 and gfortran.dg/bind_c_vars.f90
tests.  These tests check whether bind_c works between fortran
and C.

Adding the -fcommon option fixes the tests.  Probably, gcc and HP
C are now by default inconsistent but that's water under the bridge.

2024-01-08  John David Anglin  

gcc/testsuite/ChangeLog:

PR testsuite/94253
* gfortran.dg/bind_c_coms.f90: Add -fcommon option on hppa*-*-*.
* gfortran.dg/bind_c_vars.f90: Likewise.

diff --git a/gcc/testsuite/gfortran.dg/bind_c_coms.f90 b/gcc/testsuite/gfortran.dg/bind_c_coms.f90
index 85ead9fb636..2f9714947c7 100644
--- a/gcc/testsuite/gfortran.dg/bind_c_coms.f90
+++ b/gcc/testsuite/gfortran.dg/bind_c_coms.f90
@@ -3,6 +3,7 @@
 ! { dg-options "-w" }
 ! the -w option is to prevent the warning about long long ints
 module bind_c_coms
+! { dg-additional-options "-fcommon" { target hppa*-*-hpux* } }
   use, intrinsic :: iso_c_binding
   implicit none
 
diff --git a/gcc/testsuite/gfortran.dg/bind_c_vars.f90 b/gcc/testsuite/gfortran.dg/bind_c_vars.f90
index 4f4a0cfd795..ede3ffd8c21 100644
--- a/gcc/testsuite/gfortran.dg/bind_c_vars.f90
+++ b/gcc/testsuite/gfortran.dg/bind_c_vars.f90
@@ -1,6 +1,7 @@
 ! { dg-do run }
 ! { dg-additional-sources bind_c_vars_driver.c }
 module bind_c_vars
+! { dg-additional-options "-fcommon" { target hppa*-*-hpux* } }
   use, intrinsic :: iso_c_binding
   implicit none
 




Re: [PATCH] match.pd: Convert {I, X}OR of two values ANDed with alien CSTs to PLUS [PR108477]

2024-01-08 Thread Uros Bizjak
On Mon, Jan 8, 2024 at 5:57 PM Andrew Pinski  wrote:
>
> On Mon, Jan 8, 2024 at 6:44 AM Uros Bizjak  wrote:
> >
> > Instead of converting XOR or PLUS of two values, ANDed with two constants that
> > have no bits in common, to IOR expression, convert IOR or XOR of said two
> > ANDed values to PLUS expression.
>
> I think this only helps targets which have a leal-like instruction. Also
> I think it is the same issue as I recorded as PR 111763.  I suspect
> BIT_IOR is more of a canonical form for GIMPLE while we should handle
> this in expand to decide if we want to use PLUS or IOR.

For the pr108477.c testcase, expand pass expands:

  r_3 = a_2(D) & 1;
  p_5 = b_4(D) & 4294967292;
  _1 = r_3 | p_5;
  _6 = _1 + 2;
  return _6;

The transformation ( | -> + ) is valid only when CST1 & CST2 == 0, so
we need to determine values of constants. Is this information
available in the expand pass?

IMO, the transformation from (ra | rb | cst) to (ra + rb + cst) as in
the shown testcase would be beneficial when constructing control
register values (see e.g. mesa-3d). We can use LEA instead of OR+ADD
sequence in this case.
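The precondition can be checked directly. When the two masks share no bits, the masked values never both have a 1 in the same position. Adding them therefore produces no carries, so `|`, `^` and `+` all give the same result. The small C++ sketch below uses the masks from the pr108477.c dump (1 and 4294967292, i.e. 0xFFFFFFFC). The function names are hypothetical.

```cpp
#include <cassert>
#include <cstdint>

// True if, for these inputs, (a & c1) | (b & c2) equals both the XOR and
// the sum of the two masked values.  Guaranteed when (c1 & c2) == 0.
bool or_xor_plus_agree(uint32_t a, uint32_t b, uint32_t c1, uint32_t c2)
{
  uint32_t x = a & c1;
  uint32_t y = b & c2;
  return (x | y) == x + y && (x ^ y) == x + y;
}

// Exhaustive check over all 8-bit inputs for a given pair of masks.
bool agree_for_all_u8(uint32_t c1, uint32_t c2)
{
  for (uint32_t a = 0; a < 256; ++a)
    for (uint32_t b = 0; b < 256; ++b)
      if (!or_xor_plus_agree(a, b, c1, c2))
        return false;
  return true;
}
```

With overlapping masks such as 3 and 6, the equivalence fails (for example a = b = 2 gives 2 for `|` but 4 for `+`). This is why the rewrite needs to know the constant values.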

Uros.


Re: [Patch] GCN: Add pre-initial support for gfx1100

2024-01-08 Thread Thomas Schwinge
Hi!

On 2024-01-08T15:30:06+0100, Tobias Burnus  wrote:
> Andrew Stubbs wrote:
>> I know there will be things that need fixing for
>> both experimental architectures.
>
> Indeed. [...]

..., like, making it even build?  ;-P

>> P.S. Apologies, but I think my commits today conflict a little; you
>> should be able to drop the hunks that patch deleted code.
>
> I did so - but I then realized that I should have also added gfx1100 to
> the new chunk.
>
> Committed as r14-7006-g97a52f69d209f6 (see attachment) - as follow up to
> the original r14-7005-g52a2c659ae6c21

Pushed to master branch commit f9290cdf4697f467fd0fb7c710f58cc12e497889
"GCN: Add pre-initial support for gfx1100: 'EF_AMDGPU_MACH_AMDGCN_GFX1100'",
see attached.


Grüße
 Thomas


From f9290cdf4697f467fd0fb7c710f58cc12e497889 Mon Sep 17 00:00:00 2001
From: Thomas Schwinge 
Date: Mon, 8 Jan 2024 20:35:27 +0100
Subject: [PATCH] GCN: Add pre-initial support for gfx1100:
 'EF_AMDGPU_MACH_AMDGCN_GFX1100'
MIME-Version: 1.0
Content-Type: text/plain; charset=UTF-8
Content-Transfer-Encoding: 8bit

../../../source-gcc/libgomp/plugin/plugin-gcn.c: In function ‘isa_hsa_name’:
../../../source-gcc/libgomp/plugin/plugin-gcn.c:1666:10: error: ‘EF_AMDGPU_MACH_AMDGCN_GFX1100’ undeclared (first use in this function); did you mean ‘EF_AMDGPU_MACH_AMDGCN_GFX1030’?
 1666 | case EF_AMDGPU_MACH_AMDGCN_GFX1100:
  |  ^
  |  EF_AMDGPU_MACH_AMDGCN_GFX1030
../../../source-gcc/libgomp/plugin/plugin-gcn.c:1666:10: note: each undeclared identifier is reported only once for each function it appears in
../../../source-gcc/libgomp/plugin/plugin-gcn.c: In function ‘isa_code’:
../../../source-gcc/libgomp/plugin/plugin-gcn.c:1711:12: error: ‘EF_AMDGPU_MACH_AMDGCN_GFX1100’ undeclared (first use in this function); did you mean ‘EF_AMDGPU_MACH_AMDGCN_GFX1030’?
 1711 | return EF_AMDGPU_MACH_AMDGCN_GFX1100;
  |^
  |EF_AMDGPU_MACH_AMDGCN_GFX1030
../../../source-gcc/libgomp/plugin/plugin-gcn.c: In function ‘max_isa_vgprs’:
../../../source-gcc/libgomp/plugin/plugin-gcn.c:1728:10: error: ‘EF_AMDGPU_MACH_AMDGCN_GFX1100’ undeclared (first use in this function); did you mean ‘EF_AMDGPU_MACH_AMDGCN_GFX1030’?
 1728 | case EF_AMDGPU_MACH_AMDGCN_GFX1100:
  |  ^
  |  EF_AMDGPU_MACH_AMDGCN_GFX1030
make[4]: *** [Makefile:813: libgomp_plugin_gcn_la-plugin-gcn.lo] Error 1

Fix-up for commit 52a2c659ae6c21f84b6acce0afcb9b93b9dc71a0
"GCN: Add pre-initial support for gfx1100".

	libgomp/
	* plugin/plugin-gcn.c (EF_AMDGPU_MACH): Add
	'EF_AMDGPU_MACH_AMDGCN_GFX1100'.
---
 libgomp/plugin/plugin-gcn.c | 3 ++-
 1 file changed, 2 insertions(+), 1 deletion(-)

diff --git a/libgomp/plugin/plugin-gcn.c b/libgomp/plugin/plugin-gcn.c
index f24a28faa22..0339848451e 100644
--- a/libgomp/plugin/plugin-gcn.c
+++ b/libgomp/plugin/plugin-gcn.c
@@ -389,7 +389,8 @@ typedef enum {
   EF_AMDGPU_MACH_AMDGCN_GFX906 = 0x02f,
   EF_AMDGPU_MACH_AMDGCN_GFX908 = 0x030,
   EF_AMDGPU_MACH_AMDGCN_GFX90a = 0x03f,
-  EF_AMDGPU_MACH_AMDGCN_GFX1030 = 0x036
+  EF_AMDGPU_MACH_AMDGCN_GFX1030 = 0x036,
+  EF_AMDGPU_MACH_AMDGCN_GFX1100 = 0x041
 } EF_AMDGPU_MACH;
 
 const static int EF_AMDGPU_MACH_MASK = 0x00ff;
-- 
2.34.1



Re: [PATCH] Add a late-combine pass [PR106594]

2024-01-08 Thread Richard Sandiford
Jeff Law  writes:
> On 1/8/24 09:59, Richard Sandiford wrote:
>> This is a bit of a hopeful stab, but is the problem that recog_data still
>> had the previous contents of insn 3674, and so extract_insn_cached wrongly
>> thought that it doesn't need to do anything?  If so, does something like:
>> 
>> diff --git a/gcc/recog.cc b/gcc/recog.cc
>> index a6799e3f5e6..8ba63c78179 100644
>> --- a/gcc/recog.cc
>> +++ b/gcc/recog.cc
>> @@ -267,6 +267,8 @@ validate_change_1 (rtx object, rtx *loc, rtx new_rtx, bool in_group,
>>   case invalid.  */
>> changes[num_changes].old_code = INSN_CODE (object);
>> INSN_CODE (object) = -1;
>> +  if (recog_data.insn == object)
>> +recog_data.insn = nullptr;
>>   }
>>   
>> num_changes++;
>> 
>> fix it?  I suppose there's an argument that this belongs in whatever code
>> sets INSN_CODE to a new nonnegative value (so recog_level2 for RTL-SSA).
>> But doing it in validate_change_1 seems more robust, since anything
>> calling that function is considering changing the insn code.
> Nope, doesn't help at all.

Yeah, in hindsight it was a dull guess.  recog resets recog_data.insn
itself, so doing it here wasn't likely to help.

> I'd briefly put a reset of the INSN_CODE 
> and a call to recog_memoized in the costing path of rtl-ssa to see if 
> that would allow things to move forward, but it failed miserably.
>
> I'll pass along the .i file separately.  Hopefully it'll fail for you 
> and you can debug.  But given failure depends on stale bits in 
> recog_data, it may not.

Thanks.  That led me to the following, which seems a bit more plausible
than my first attempt.  I'll test it on aarch64-linux-gnu and
x86_64-linux-gnu.  Does it look OK?

Richard


insn_info::calculate_cost computes the costs of unchanged insns lazily,
so that we don't waste time costing instructions that we never try to
change.  It therefore has to revert any in-progress changes, cost the
original instruction, and then reapply the in-progress changes.

However, doing that temporarily changes the INSN_CODEs, and so
temporarily invalidates any information cached about the insn.
This means that insn_cost can end up looking at stale data,
or can cache data that becomes stale once the in-progress
changes are reapplied.

This could in principle happen for any use of temporarily_undo_changes
and redo_changes.  Those functions in turn share a common subroutine,
swap_change, so that seems like the best place to fix this.

gcc/
* recog.cc (swap_change): Invalidate the cached recog_data if it
describes an insn that is being changed.
---
 gcc/recog.cc | 6 +-
 1 file changed, 5 insertions(+), 1 deletion(-)

diff --git a/gcc/recog.cc b/gcc/recog.cc
index a6799e3f5e6..56370e40e01 100644
--- a/gcc/recog.cc
+++ b/gcc/recog.cc
@@ -614,7 +614,11 @@ swap_change (int num)
   else
 std::swap (*changes[num].loc, changes[num].old);
   if (changes[num].object && !MEM_P (changes[num].object))
-std::swap (INSN_CODE (changes[num].object), changes[num].old_code);
+{
+  std::swap (INSN_CODE (changes[num].object), changes[num].old_code);
+  if (recog_data.insn == changes[num].object)
+   recog_data.insn = nullptr;
+}
 }
 
 /* Temporarily undo all the changes numbered NUM and up, with a view
-- 
2.25.1
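The stale-data problem described in the commit message follows a general single-entry cache pattern. Any code that mutates the object a cache is keyed on must also drop that cache. A minimal C sketch follows. struct insn, cached_code() and swap_code() are hypothetical stand-ins, not GCC's recog_data API.

```c
#include <assert.h>
#include <stddef.h>

/* Hypothetical model of an instruction with a recognized code.  */
struct insn { int code; };

/* Hypothetical single-entry cache, loosely modelled on recog_data.  */
static struct
{
  const struct insn *insn;  /* which insn the cached data describes */
  int cached_code;          /* data derived from that insn */
  int computations;         /* how many times the data was recomputed */
} cache;

/* Return data for INSN, recomputing only if the cache describes another insn.  */
static int
cached_code (const struct insn *insn)
{
  if (cache.insn != insn)
    {
      cache.insn = insn;
      cache.cached_code = insn->code;
      cache.computations++;
    }
  return cache.cached_code;
}

/* Swap INSN's code with *OTHER_CODE, in the spirit of swap_change, and
   invalidate the cache if it describes INSN.  */
static void
swap_code (struct insn *insn, int *other_code)
{
  int tmp = insn->code;
  insn->code = *other_code;
  *other_code = tmp;
  if (cache.insn == insn)
    cache.insn = NULL;
}
```

Without the `cache.insn = NULL` line, a cached_code() call made after a swap would keep returning the value computed before the swap.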



[committed] steering.html: Update my affiliation

2024-01-08 Thread Joseph Myers
diff --git a/htdocs/steering.html b/htdocs/steering.html
index 95d6a4a8..6039a503 100644
--- a/htdocs/steering.html
+++ b/htdocs/steering.html
@@ -36,7 +36,7 @@ place to reach them is the gcc mailing list.
 Jason Merrill (Red Hat)
 David Miller (Red Hat)
 Toon Moene (Koninklijk Nederlands Meteorologisch Instituut)
-Joseph Myers (CodeSourcery / Mentor Graphics) [co-Release Manager]
+Joseph Myers (Red Hat) [co-Release Manager]
 Gerald Pfeifer (SUSE)
 Ramana Radhakrishnan 
 Joel Sherrill (OAR Corporation)

-- 
Joseph S. Myers
josmy...@redhat.com



[committed] MAINTAINERS: Update my email address

2024-01-08 Thread Joseph Myers
* MAINTAINERS: Update my email address.

diff --git a/MAINTAINERS b/MAINTAINERS
index fe5d95ae970..882694cc47d 100644
--- a/MAINTAINERS
+++ b/MAINTAINERS
@@ -34,7 +34,7 @@ Jeff Law  

 Michael Meissner   
 Jason Merrill  
 David S. Miller
-Joseph Myers   
+Joseph Myers   
 Richard Sandiford  
 Bernd Schmidt  
 Ian Lance Taylor   
@@ -155,7 +155,7 @@ cygwin, mingw-w64   Jonathan Yong   <10wa...@gmail.com>
 
Language Front Ends Maintainers
 
-C front end/ISO C99Joseph Myers
+C front end/ISO C99Joseph Myers
 Ada front end  Arnaud Charlet  
 Ada front end  Eric Botcazou   
 Ada front end  Marc Poulhiès   
@@ -192,7 +192,7 @@ libquadmath Jakub Jelinek   

 libvtv Caroline Tice   
 libphobos  Iain Buclaw 
 line map   Dodji Seketeli  
-soft-fpJoseph Myers
+soft-fpJoseph Myers
 scheduler (+ haifa)Jim Wilson  
 scheduler (+ haifa)Michael Meissner
 scheduler (+ haifa)Jeff Law
@@ -219,7 +219,7 @@ jump.cc David S. Miller 

 web pages  Gerald Pfeifer  
 config.sub/config.guessBen Elliston
 i18n   Philipp Thomas  
-i18n   Joseph Myers
+i18n   Joseph Myers
 diagnostic messagesDodji Seketeli  
 diagnostic messagesDavid Malcolm   
 build machinery (*.in) Paolo Bonzini   
@@ -227,14 +227,14 @@ build machinery (*.in)Nathanael Nerode

 build machinery (*.in) Alexandre Oliva 
 build machinery (*.in) Ralf Wildenhues 
 docs co-maintainer Gerald Pfeifer  
-docs co-maintainer Joseph Myers
+docs co-maintainer Joseph Myers
 docs co-maintainer Sandra Loosemore
 docstring relicensing  Gerald Pfeifer  
-docstring relicensing  Joseph Myers
+docstring relicensing  Joseph Myers
 predict.defJan Hubicka 
 gcov   Jan Hubicka 
 gcov   Nathan Sidwell  
-option handlingJoseph Myers
+option handlingJoseph Myers
 middle-end Jeff Law
 middle-end Ian Lance Taylor
 middle-end Richard Biener  
@@ -278,7 +278,7 @@ CTF, BTF, bpf port  David Faust 

 dataflow   Paolo Bonzini   
 dataflow   Seongbae Park   
 dataflow   Kenneth Zadeck  
-driver Joseph Myers
+driver Joseph Myers
 FortranHarald Anlauf   
 FortranJanne Blomqvist 
 FortranTobias Burnus   


-- 
Joseph S. Myers
josmy...@redhat.com

[wwwdocs] gcc-14/changes.html: OpenMP - improve wording

2024-01-08 Thread Tobias Burnus
The attached patch makes a tiny update to the OpenMP features (AMD GCN
now also has an optimized memcpy_rect, not only nvptx), but the main
change is some reshuffling to make the text more consistent and more
readable.


I intend to commit this relatively soon; as always, comments and
suggestions are welcome, be it before or after the commit.


Current version: http://gcc.gnu.org/gcc-14/changes.html

Thanks,

Tobias


[PATCH] c++: non-dep array list-init w/ non-triv dtor [PR109899]

2024-01-08 Thread Patrick Palka
Bootstrapped and regtested on x86_64-pc-linux-gnu, does this look
OK for trunk/13/12?

-- >8 --

The get_target_expr call added in r12-7069-g119cea98f66476 causes us,
for the testcase below, to call build_vec_delete in a template context.
This builds a templated destructor call and checks expr_noexcept_p for
it, which ICEs because the call has templated form.  However, much of
the work of build_vec_delete is code generation and thus just gets
thrown away in a template context, including this expr_noexcept_p check
and the code generation guarded by it.  So this patch narrowly fixes the
ICE by assuming the expr_noexcept_p call returns true in a template
context.

PR c++/109899

gcc/cp/ChangeLog:

* init.cc (build_vec_delete_1): Assume expr_noexcept_p is true
in a template context.

gcc/testsuite/ChangeLog:

* g++.dg/cpp0x/initlist-array21.C: New test.
---
 gcc/cp/init.cc|  3 ++-
 gcc/testsuite/g++.dg/cpp0x/initlist-array21.C | 12 
 2 files changed, 14 insertions(+), 1 deletion(-)
 create mode 100644 gcc/testsuite/g++.dg/cpp0x/initlist-array21.C

diff --git a/gcc/cp/init.cc b/gcc/cp/init.cc
index 09584719ee6..aa0a35a3885 100644
--- a/gcc/cp/init.cc
+++ b/gcc/cp/init.cc
@@ -4155,7 +4155,8 @@ build_vec_delete_1 (location_t loc, tree base, tree maxindex, tree type,
 
   /* If one destructor throws, keep trying to clean up the rest, unless we're
  already in a build_vec_init cleanup.  */
-  if (flag_exceptions && !in_cleanup && !expr_noexcept_p (tmp, tf_none))
+  if (flag_exceptions && !in_cleanup && !processing_template_decl
+  && !expr_noexcept_p (tmp, tf_none))
 {
   loop = build2 (TRY_CATCH_EXPR, void_type_node, loop,
 unshare_expr (loop));
diff --git a/gcc/testsuite/g++.dg/cpp0x/initlist-array21.C b/gcc/testsuite/g++.dg/cpp0x/initlist-array21.C
new file mode 100644
index 000..5e37e3de62a
--- /dev/null
+++ b/gcc/testsuite/g++.dg/cpp0x/initlist-array21.C
@@ -0,0 +1,12 @@
+// PR c++/109899
+// { dg-do compile { target c++11 } }
+
+struct A { A(); ~A(); };
+
+template 
+using array = T[42];
+
+template
+void f() {
+  array{};
+}
-- 
2.43.0.254.ga26002b628



Re: [PATCH] btf: print string position as comment for validation and testing purposes.

2024-01-08 Thread Cupertino Miranda


Thanks! Committed.

David Faust writes:

> Hi Cupertino,
>
> On 1/8/24 02:55, Cupertino Miranda wrote:
>> Hi everyone,
>>
>> This patch adds a comment to the BTF strings regarding their position
>> within the section. This is useful for assembly inspection purposes.
>>
>> Regards,
>> Cupertino
>>
>> When using -dA, this function only printed btf_string or btf_aux_string
>> as the assembly comment.
>> This patch changes the comment to also include the position of the
>> string within the section in hexadecimal format.
>>
>> gcc/ChangeLog:
>>  * btfout.cc (output_btf_strs): Changed.
>
> Please be a little bit more expressive in the ChangeLog.
> Something along the lines of "print string offset in comment" will be
> much more useful.
>
> LGTM with that change, please apply.
> Thanks!
>
>> ---
>>  gcc/btfout.cc | 7 +--
>>  1 file changed, 5 insertions(+), 2 deletions(-)
>>
>> diff --git a/gcc/btfout.cc b/gcc/btfout.cc
>> index db4f1084f85c..04218adc9e66 100644
>> --- a/gcc/btfout.cc
>> +++ b/gcc/btfout.cc
>> @@ -1081,17 +1081,20 @@ static void
>>  output_btf_strs (ctf_container_ref ctfc)
>>  {
>>ctf_string_t * ctf_string = ctfc->ctfc_strtable.ctstab_head;
>> +  static int str_pos = 0;
>>
>>while (ctf_string)
>>  {
>> -  dw2_asm_output_nstring (ctf_string->cts_str, -1, "btf_string");
>> +  dw2_asm_output_nstring (ctf_string->cts_str, -1, "btf_string, str_pos = 0x%x", str_pos);
>> +  str_pos += strlen(ctf_string->cts_str) + 1;
>>ctf_string = ctf_string->cts_next;
>>  }
>>
>>ctf_string = ctfc->ctfc_aux_strtable.ctstab_head;
>>while (ctf_string)
>>  {
>> -  dw2_asm_output_nstring (ctf_string->cts_str, -1, "btf_aux_string");
>> +  dw2_asm_output_nstring (ctf_string->cts_str, -1, "btf_aux_string, str_pos = 0x%x", str_pos);
>> +  str_pos += strlen(ctf_string->cts_str) + 1;
>>ctf_string = ctf_string->cts_next;
>>  }
>>  }
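The offset arithmetic is simple: each string occupies its length plus one terminating NUL byte, and the running position continues from the main string table into the auxiliary one. Below is a standalone sketch using a hypothetical print_str_offsets(). The printed line only imitates the shape of the -dA comment; it is not GCC's exact assembler output.

```c
#include <assert.h>
#include <stdio.h>
#include <string.h>

/* Print each string with its byte offset in a concatenated table of
   NUL-terminated strings, starting at offset START, and return the
   offset just past the last string.  Passing the result back in as
   START continues the numbering, as the patch's single str_pos does
   for the main and auxiliary tables.  */
static size_t
print_str_offsets (const char *const *strs, size_t n, size_t start)
{
  size_t pos = start;
  for (size_t i = 0; i < n; i++)
    {
      printf ("\t.string \"%s\"\t# btf_string, str_pos = 0x%zx\n",
              strs[i], pos);
      pos += strlen (strs[i]) + 1;  /* +1 for the terminating NUL */
    }
  return pos;
}
```

For the table { "", "int", "foo" }, the offsets are 0x0, 0x1 and 0x5. An auxiliary string printed next would start at 0x9.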


Re: [PATCH] bpf: Correct BTF for kernel_helper attributed decls.

2024-01-08 Thread Cupertino Miranda


Thanks! Committed.

David Faust writes:

> Hi Cupertino,
>
> On 1/8/24 03:05, Cupertino Miranda wrote:
>> Hi everyone,
>>
>> This patch addresses the problem reported in:
>> https://gcc.gnu.org/bugzilla/show_bug.cgi?id=113225
>>
>> Looking forward to your review.
>
> LGTM, thanks. Please apply.
>
>>
>> Cheers,
>> Cupertino
>>
>>
>> This patch fixes a problem with the BTF information for the kernel_helper
>> attribute: it incorrectly generates a BTF_KIND_FUNC entry.
>> Although this BTF entry is accurate for traditional extern function
>> declarations, once the function is attributed with kernel_helper it is
>> semantically incompatible with the kernel helpers in the BPF infrastructure.
>>
>> gcc/ChangeLog:
>>  PR target/113225
>>  * btfout.cc (btf_collect_datasec): Skip creating BTF info for
>>  extern and kernel_helper attributed function decls.
>> gcc/testsuite/ChangeLog:
>>  * gcc.target/bpf/attr-kernel-helper.c: New test.
>> ---
>>  gcc/btfout.cc |  7 +++
>>  gcc/testsuite/gcc.target/bpf/attr-kernel-helper.c | 15 +++
>>  2 files changed, 22 insertions(+)
>>  create mode 100644 gcc/testsuite/gcc.target/bpf/attr-kernel-helper.c
>>
>> diff --git a/gcc/btfout.cc b/gcc/btfout.cc
>> index 04218adc9e66..39e7bec43bfb 100644
>> --- a/gcc/btfout.cc
>> +++ b/gcc/btfout.cc
>> @@ -35,6 +35,8 @@ along with GCC; see the file COPYING3.  If not see
>>  #include "diagnostic-core.h"
>>  #include "cgraph.h"
>>  #include "varasm.h"
>> +#include "stringpool.h"
>> +#include "attribs.h"
>>  #include "dwarf2out.h" /* For lookup_decl_die.  */
>>
>>  static int btf_label_num;
>> @@ -429,6 +431,11 @@ btf_collect_datasec (ctf_container_ref ctfc)
>>if (dtd == NULL)
>>  continue;
>>
>> +  if (DECL_EXTERNAL (func->decl)
>> +  && (lookup_attribute ("kernel_helper",
>> +DECL_ATTRIBUTES (func->decl))) != NULL_TREE)
>> +continue;
>> +
>>/* Functions actually get two types: a BTF_KIND_FUNC_PROTO, and
>>   also a BTF_KIND_FUNC.  But the CTF container only allocates one
>>   type per function, which matches closely with BTF_KIND_FUNC_PROTO.
>> diff --git a/gcc/testsuite/gcc.target/bpf/attr-kernel-helper.c b/gcc/testsuite/gcc.target/bpf/attr-kernel-helper.c
>> new file mode 100644
>> index ..7c5a0007c979
>> --- /dev/null
>> +++ b/gcc/testsuite/gcc.target/bpf/attr-kernel-helper.c
>> @@ -0,0 +1,15 @@
>> +/* Basic test for kernel_helper attribute BTF information.  */
>> +
>> +/* { dg-do compile } */
>> +/* { dg-options "-O0 -dA -gbtf" } */
>> +
>> +extern int foo_helper(int) __attribute((kernel_helper(42)));
>> +extern int foo_nohelper(int);
>> +
>> +int bar (int arg)
>> +{
>> +  return foo_helper (arg) + foo_nohelper (arg);
>> +}
>> +
>> +/* { dg-final { scan-assembler-times "BTF_KIND_FUNC 'foo_nohelper'" 1 } } */
>> +/* { dg-final { scan-assembler-times "BTF_KIND_FUNC 'foo_helper'" 0 } } */
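The new check in btf_collect_datasec amounts to a simple predicate over function decls. The minimal model below uses a hypothetical struct fn_decl, whose two flags stand in for DECL_EXTERNAL and the kernel_helper attribute lookup. It reproduces the expectations of the new testcase: foo_helper is skipped, while foo_nohelper and the defined function bar are kept.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>

/* Hypothetical, simplified stand-in for a function decl.  */
struct fn_decl
{
  const char *name;
  bool is_external;        /* plays the role of DECL_EXTERNAL */
  bool has_kernel_helper;  /* plays the role of lookup_attribute ("kernel_helper", ...) */
};

/* Return true if a BTF_KIND_FUNC entry should be emitted for D:
   only extern decls carrying kernel_helper are skipped.  */
static bool
emit_btf_func_p (const struct fn_decl *d)
{
  return !(d->is_external && d->has_kernel_helper);
}

/* Count how many of the N decls would get a BTF_KIND_FUNC entry.  */
static size_t
count_btf_funcs (const struct fn_decl *decls, size_t n)
{
  size_t count = 0;
  for (size_t i = 0; i < n; i++)
    if (emit_btf_func_p (&decls[i]))
      count++;
  return count;
}
```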


Re: [PATCH] OpenMP: Support accelerated 2D/3D memory copies for AMD GCN

2024-01-08 Thread Julian Brown
On Thu, 21 Dec 2023 17:05:18 +0100
Tobias Burnus  wrote:

> I think it makes sense to split this patch into two parts:
> 
> * The libgomp/plugin/plugin-gcn.c part, which is independent and would
> already be used by omp_memcpy_rect.

I will commit this version in a moment. I needed to add the
DLSYM_OPT_FN bit from one of Andrew Stubbs's patches elsewhere.
Re-tested with offloading to AMD GCN (...with a couple of patches
applied locally to get working test results, as plain mainline as of a
few days ago wasn't working too well for GCN offloading).

Thanks for review!

Julian

commit 34c6e9132b3ea33c2e15c88e127c4134a5e88b8d
Author: Julian Brown 
Date:   Thu Jan 4 16:44:18 2024 +

OpenMP: Support accelerated 2D/3D memory copies for AMD GCN

This patch adds support for 2D/3D memory copies for omp_target_memcpy_rect
using AMD extensions to the HSA API.  This is just the AMD GCN-specific
part of the following patch:

  https://gcc.gnu.org/pipermail/gcc-patches/2023-September/631001.html

2024-01-04  Julian Brown  

libgomp/
* plugin/plugin-gcn.c (hsa_runtime_fn_info): Add
hsa_amd_memory_lock_fn, hsa_amd_memory_unlock_fn,
hsa_amd_memory_async_copy_rect_fn function pointers.
(init_hsa_runtime_functions): Add above functions, with
DLSYM_OPT_FN.
(GOMP_OFFLOAD_memcpy2d, GOMP_OFFLOAD_memcpy3d): New functions.

diff --git a/libgomp/plugin/plugin-gcn.c b/libgomp/plugin/plugin-gcn.c
index e3e8b31c558..f24a28faa22 100644
--- a/libgomp/plugin/plugin-gcn.c
+++ b/libgomp/plugin/plugin-gcn.c
@@ -196,6 +196,16 @@ struct hsa_runtime_fn_info
   hsa_status_t (*hsa_code_object_deserialize_fn)
 (void *serialized_code_object, size_t serialized_code_object_size,
  const char *options, hsa_code_object_t *code_object);
+  hsa_status_t (*hsa_amd_memory_lock_fn)
+(void *host_ptr, size_t size, hsa_agent_t *agents, int num_agent,
+ void **agent_ptr);
+  hsa_status_t (*hsa_amd_memory_unlock_fn) (void *host_ptr);
+  hsa_status_t (*hsa_amd_memory_async_copy_rect_fn)
+(const hsa_pitched_ptr_t *dst, const hsa_dim3_t *dst_offset,
+ const hsa_pitched_ptr_t *src, const hsa_dim3_t *src_offset,
+ const hsa_dim3_t *range, hsa_agent_t copy_agent,
+ hsa_amd_copy_direction_t dir, uint32_t num_dep_signals,
+ const hsa_signal_t *dep_signals, hsa_signal_t completion_signal);
 };
 
 /* Structure describing the run-time and grid properties of an HSA kernel
@@ -1371,6 +1381,8 @@ init_hsa_runtime_functions (void)
   hsa_fns.function##_fn = dlsym (handle, #function); \
   if (hsa_fns.function##_fn == NULL) \
 return false;
+#define DLSYM_OPT_FN(function) \
+  hsa_fns.function##_fn = dlsym (handle, #function);
   void *handle = dlopen (hsa_runtime_lib, RTLD_LAZY);
   if (handle == NULL)
 return false;
@@ -1405,7 +1417,11 @@ init_hsa_runtime_functions (void)
   DLSYM_FN (hsa_signal_load_acquire)
   DLSYM_FN (hsa_queue_destroy)
   DLSYM_FN (hsa_code_object_deserialize)
+  DLSYM_OPT_FN (hsa_amd_memory_lock)
+  DLSYM_OPT_FN (hsa_amd_memory_unlock)
+  DLSYM_OPT_FN (hsa_amd_memory_async_copy_rect)
   return true;
+#undef DLSYM_OPT_FN
 #undef DLSYM_FN
 }
 
@@ -3933,6 +3949,352 @@ GOMP_OFFLOAD_dev2dev (int device, void *dst, const void *src, size_t n)
   return true;
 }
 
+/* Here _size refers to  multiplied by size -- i.e.
+   measured in bytes.  So we have:
+
+   dim1_size: number of bytes to copy on innermost dimension ("row")
+   dim0_len: number of rows to copy
+   dst: base pointer for destination of copy
+   dst_offset1_size: innermost row offset (for dest), in bytes
+   dst_offset0_len: offset, number of rows (for dest)
+   dst_dim1_size: whole-array dest row length, in bytes (pitch)
+   src: base pointer for source of copy
+   src_offset1_size: innermost row offset (for source), in bytes
+   src_offset0_len: offset, number of rows (for source)
+   src_dim1_size: whole-array source row length, in bytes (pitch)
+*/
+
+int
+GOMP_OFFLOAD_memcpy2d (int dst_ord, int src_ord, size_t dim1_size,
+		   size_t dim0_len, void *dst, size_t dst_offset1_size,
+		   size_t dst_offset0_len, size_t dst_dim1_size,
+		   const void *src, size_t src_offset1_size,
+		   size_t src_offset0_len, size_t src_dim1_size)
+{
+  if (!hsa_fns.hsa_amd_memory_lock_fn
+  || !hsa_fns.hsa_amd_memory_unlock_fn
+  || !hsa_fns.hsa_amd_memory_async_copy_rect_fn)
+return -1;
+
+  /* GCN hardware requires 4-byte alignment for base addresses & pitches.  Bail
+ out quietly if we have anything oddly-aligned rather than letting the
+ driver raise an error.  */
+  if ((((uintptr_t) dst) & 3) != 0 || (((uintptr_t) src) & 3) != 0)
+return -1;
+
+  if ((dst_dim1_size & 3) != 0 || (src_dim1_size & 3) != 0)
+return -1;
+
+  /* Only handle host to device or device to host transfers here.  */
+  if ((dst_ord == -1 && src_ord == -1)
+  || (dst_ord != -1 && src_ord != -1))
+return -1;
+
+  
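To make the parameter description concrete, here is a host-only reference sketch of the same 2D pitched copy in plain C. It is not the plugin's implementation, which delegates to hsa_amd_memory_async_copy_rect and only handles host/device transfers. The name memcpy2d_ref is hypothetical, and the function reuses the patch's 4-byte alignment checks as preconditions.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>
#include <string.h>

/* Copy DIM0_LEN rows of DIM1_SIZE bytes.  Row R of the copy starts at
   byte (OFFSET0_LEN + R) * DIM1_SIZE_PITCH + OFFSET1_SIZE in each array.
   Returns 0 on success, -1 if an alignment precondition fails.  */
static int
memcpy2d_ref (size_t dim1_size, size_t dim0_len,
              void *dst, size_t dst_offset1_size, size_t dst_offset0_len,
              size_t dst_dim1_size,
              const void *src, size_t src_offset1_size,
              size_t src_offset0_len, size_t src_dim1_size)
{
  /* Same 4-byte alignment checks on base pointers and pitches as the patch.  */
  if ((((uintptr_t) dst) & 3) != 0 || (((uintptr_t) src) & 3) != 0)
    return -1;
  if ((dst_dim1_size & 3) != 0 || (src_dim1_size & 3) != 0)
    return -1;

  unsigned char *d = dst;
  const unsigned char *s = src;
  for (size_t row = 0; row < dim0_len; row++)
    memcpy (d + (dst_offset0_len + row) * dst_dim1_size + dst_offset1_size,
            s + (src_offset0_len + row) * src_dim1_size + src_offset1_size,
            dim1_size);
  return 0;
}
```

For example, copying 4 bytes from each of 2 rows out of an 8-byte-pitch source, starting at row 1 and byte 2, places source bytes 10..13 and 18..21 at destination bytes 4..7 and 16..19 when the destination pitch is 12.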

[PATCH][GCC][Arm] Define __ARM_FEATURE_BF16 when +bf16 feature is enabled

2024-01-08 Thread Matthieu Longo

Hi,

The Arm GCC backend does not define __ARM_FEATURE_BF16 when +bf16 is
specified (via the -march option or a target pragma), even though the
ACLE document specifies that this macro should be tested before
including arm_bf16.h (see
https://arm-software.github.io/acle/main/acle.html#arm_bf16h).
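User code is expected to guard its use of arm_bf16.h with the feature macro, which is why the missing definition matters. A minimal sketch follows. It assumes a toolchain that provides arm_bf16.h with the bfloat16_t type whenever __ARM_FEATURE_BF16 is defined. On any other target the hypothetical function bf16_storage_size() simply reports 0.

```c
#include <assert.h>

#if defined (__ARM_FEATURE_BF16)
#include <arm_bf16.h>
#endif

/* Return the storage size of bfloat16_t if the ACLE header is usable,
   otherwise 0.  Without __ARM_FEATURE_BF16 this always takes the
   fallback path, even if the hardware supports BF16.  */
static int
bf16_storage_size (void)
{
#if defined (__ARM_FEATURE_BF16)
  return (int) sizeof (bfloat16_t);
#else
  return 0;
#endif
}
```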


gcc/ChangeLog:

* config/arm/arm-c.cc (arm_cpu_builtins): Define __ARM_FEATURE_BF16.
* config/arm/arm.h (TARGET_BF16): Define.

OK for master?

Matthieu

diff --git a/gcc/config/arm/arm-c.cc b/gcc/config/arm/arm-c.cc
index 2e181bf7f36bab1209d5358e65d9513541683632..21ca22ac71119eda4ff01709aa95002ca13b1813 100644
--- a/gcc/config/arm/arm-c.cc
+++ b/gcc/config/arm/arm-c.cc
@@ -425,12 +425,14 @@ arm_cpu_builtins (struct cpp_reader* pfile)
   arm_arch_cde_coproc);
 
   def_or_undef_macro (pfile, "__ARM_FEATURE_MATMUL_INT8", TARGET_I8MM);
+
+  def_or_undef_macro (pfile, "__ARM_FEATURE_BF16", TARGET_BF16);
+  def_or_undef_macro (pfile, "__ARM_BF16_FORMAT_ALTERNATIVE",
+ TARGET_BF16_FP);
   def_or_undef_macro (pfile, "__ARM_FEATURE_BF16_SCALAR_ARITHMETIC",
  TARGET_BF16_FP);
   def_or_undef_macro (pfile, "__ARM_FEATURE_BF16_VECTOR_ARITHMETIC",
  TARGET_BF16_SIMD);
-  def_or_undef_macro (pfile, "__ARM_BF16_FORMAT_ALTERNATIVE",
- TARGET_BF16_FP || TARGET_BF16_SIMD);
 }
 
 void
diff --git a/gcc/config/arm/arm.h b/gcc/config/arm/arm.h
index 2a2207c0ba1acef1c7082c89bf5f542b1466d033..e7a7fc47e606d2ead5f778dca2e63b2e894d0efe 100644
--- a/gcc/config/arm/arm.h
+++ b/gcc/config/arm/arm.h
@@ -252,10 +252,10 @@ emission of floating point pcs attributes.  */
 #define TARGET_I8MM (TARGET_NEON && arm_arch8_2 && arm_arch_i8mm)
 
 /* FPU supports Brain half-precision floating-point (BFloat16) extension.  */
-#define TARGET_BF16_FP (TARGET_32BIT && TARGET_HARD_FLOAT && TARGET_VFP5 \
-   && arm_arch8_2 && arm_arch_bf16)
-#define TARGET_BF16_SIMD (TARGET_NEON && TARGET_VFP5 \
- && arm_arch8_2 && arm_arch_bf16)
+#define TARGET_BF16 (TARGET_32BIT && TARGET_HARD_FLOAT && arm_arch8_2 \
+   && TARGET_VFP5 && arm_arch_bf16)
+#define TARGET_BF16_FP (TARGET_BF16)
+#define TARGET_BF16_SIMD (TARGET_BF16 && TARGET_NEON)
 
 /* Q-bit is present.  */
 #define TARGET_ARM_QBIT \


Re: [PATCH] bpf: Correct BTF for kernel_helper attributed decls.

2024-01-08 Thread David Faust
Hi Cupertino,

On 1/8/24 03:05, Cupertino Miranda wrote:
> Hi everyone,
> 
> This patch addresses the problem reported in:
> https://gcc.gnu.org/bugzilla/show_bug.cgi?id=113225
> 
> Looking forward to your review.

LGTM, thanks. Please apply.

> 
> Cheers,
> Cupertino
> 
> 
> This patch fixes a problem with the BTF information for the kernel_helper
> attribute: it incorrectly generates a BTF_KIND_FUNC entry.
> Although this BTF entry is accurate for traditional extern function
> declarations, once the function is attributed with kernel_helper it is
> semantically incompatible with the kernel helpers in the BPF infrastructure.
> 
> gcc/ChangeLog:
>   PR target/113225
>   * btfout.cc (btf_collect_datasec): Skip creating BTF info for
>   extern and kernel_helper attributed function decls.
> gcc/testsuite/ChangeLog:
>   * gcc.target/bpf/attr-kernel-helper.c: New test.
> ---
>  gcc/btfout.cc |  7 +++
>  gcc/testsuite/gcc.target/bpf/attr-kernel-helper.c | 15 +++
>  2 files changed, 22 insertions(+)
>  create mode 100644 gcc/testsuite/gcc.target/bpf/attr-kernel-helper.c
> 
> diff --git a/gcc/btfout.cc b/gcc/btfout.cc
> index 04218adc9e66..39e7bec43bfb 100644
> --- a/gcc/btfout.cc
> +++ b/gcc/btfout.cc
> @@ -35,6 +35,8 @@ along with GCC; see the file COPYING3.  If not see
>  #include "diagnostic-core.h"
>  #include "cgraph.h"
>  #include "varasm.h"
> +#include "stringpool.h"
> +#include "attribs.h"
>  #include "dwarf2out.h" /* For lookup_decl_die.  */
>  
>  static int btf_label_num;
> @@ -429,6 +431,11 @@ btf_collect_datasec (ctf_container_ref ctfc)
>if (dtd == NULL)
>   continue;
>  
> +  if (DECL_EXTERNAL (func->decl)
> +   && (lookup_attribute ("kernel_helper",
> + DECL_ATTRIBUTES (func->decl))) != NULL_TREE)
> + continue;
> +
>/* Functions actually get two types: a BTF_KIND_FUNC_PROTO, and
>also a BTF_KIND_FUNC.  But the CTF container only allocates one
>type per function, which matches closely with BTF_KIND_FUNC_PROTO.
> diff --git a/gcc/testsuite/gcc.target/bpf/attr-kernel-helper.c b/gcc/testsuite/gcc.target/bpf/attr-kernel-helper.c
> new file mode 100644
> index ..7c5a0007c979
> --- /dev/null
> +++ b/gcc/testsuite/gcc.target/bpf/attr-kernel-helper.c
> @@ -0,0 +1,15 @@
> +/* Basic test for kernel_helper attribute BTF information.  */
> +
> +/* { dg-do compile } */
> +/* { dg-options "-O0 -dA -gbtf" } */
> +
> +extern int foo_helper(int) __attribute((kernel_helper(42)));
> +extern int foo_nohelper(int);
> +
> +int bar (int arg)
> +{
> +  return foo_helper (arg) + foo_nohelper (arg);
> +}
> +
> +/* { dg-final { scan-assembler-times "BTF_KIND_FUNC 'foo_nohelper'" 1 } } */
> +/* { dg-final { scan-assembler-times "BTF_KIND_FUNC 'foo_helper'" 0 } } */


Re: [PATCH] Add a late-combine pass [PR106594]

2024-01-08 Thread Jeff Law




On 1/8/24 09:59, Richard Sandiford wrote:



This is a bit of a hopeful stab, but is the problem that recog_data still
had the previous contents of insn 3674, and so extract_insn_cached wrongly
thought that it doesn't need to do anything?  If so, does something like:

diff --git a/gcc/recog.cc b/gcc/recog.cc
index a6799e3f5e6..8ba63c78179 100644
--- a/gcc/recog.cc
+++ b/gcc/recog.cc
@@ -267,6 +267,8 @@ validate_change_1 (rtx object, rtx *loc, rtx new_rtx, bool in_group,
 case invalid.  */
changes[num_changes].old_code = INSN_CODE (object);
INSN_CODE (object) = -1;
+  if (recog_data.insn == object)
+   recog_data.insn = nullptr;
  }
  
num_changes++;


fix it?  I suppose there's an argument that this belongs in whatever code
sets INSN_CODE to a new nonnegative value (so recog_level2 for RTL-SSA).
But doing it in validate_change_1 seems more robust, since anything
calling that function is considering changing the insn code.
Nope, doesn't help at all.

I'd briefly put a reset of the INSN_CODE
and a call to recog_memoized in the costing path of rtl-ssa to see if
that would allow things to move forward, but it failed miserably.


I'll pass along the .i file separately.  Hopefully it'll fail for you 
and you can debug.  But given failure depends on stale bits in 
recog_data, it may not.


Jeff


Re: [PATCH] btf: print string position as comment for validation and testing purposes.

2024-01-08 Thread David Faust
Hi Cupertino,

On 1/8/24 02:55, Cupertino Miranda wrote:
> Hi everyone,
> 
> This patch adds a comment to the BTF strings regarding their position
> within the section. This is useful for assembly inspection purposes.
> 
> Regards,
> Cupertino
> 
> When using -dA, this function only printed btf_string or btf_aux_string
> as the assembly comment.
> This patch changes the comment to also include the position of the
> string within the section in hexadecimal format.
> 
> gcc/ChangeLog:
>   * btfout.cc (output_btf_strs): Changed.

Please be a little bit more expressive in the ChangeLog.
Something along the lines of "print string offset in comment" will be
much more useful.

LGTM with that change, please apply.
Thanks!

> ---
>  gcc/btfout.cc | 7 +--
>  1 file changed, 5 insertions(+), 2 deletions(-)
> 
> diff --git a/gcc/btfout.cc b/gcc/btfout.cc
> index db4f1084f85c..04218adc9e66 100644
> --- a/gcc/btfout.cc
> +++ b/gcc/btfout.cc
> @@ -1081,17 +1081,20 @@ static void
>  output_btf_strs (ctf_container_ref ctfc)
>  {
>ctf_string_t * ctf_string = ctfc->ctfc_strtable.ctstab_head;
> +  static int str_pos = 0;
>  
>while (ctf_string)
>  {
> -  dw2_asm_output_nstring (ctf_string->cts_str, -1, "btf_string");
> +  dw2_asm_output_nstring (ctf_string->cts_str, -1, "btf_string, str_pos = 0x%x", str_pos);
> +  str_pos += strlen(ctf_string->cts_str) + 1;
>ctf_string = ctf_string->cts_next;
>  }
>  
>ctf_string = ctfc->ctfc_aux_strtable.ctstab_head;
>while (ctf_string)
>  {
> -  dw2_asm_output_nstring (ctf_string->cts_str, -1, "btf_aux_string");
> +  dw2_asm_output_nstring (ctf_string->cts_str, -1, "btf_aux_string, str_pos = 0x%x", str_pos);
> +  str_pos += strlen(ctf_string->cts_str) + 1;
>ctf_string = ctf_string->cts_next;
>  }
>  }


Re: [PATCH] c++/modules: Prevent overwriting arguments for duplicates [PR112588]

2024-01-08 Thread Patrick Palka
On Mon, 8 Jan 2024, Nathaniel Shead wrote:

> On Sat, Jan 06, 2024 at 05:32:37PM -0500, Nathan Sidwell wrote:
> > I'm not sure about this, there was clearly a reason I did it the way it is,
> > but perhaps that reasoning became obsolete -- something about an existing
> > declaration and reading in a definition maybe?
> > 
> > nathan
> 
> So I took a bit of a closer look and this is actually a regression,
> seeming to start with r13-3134-g09df0d8b14dda6. I haven't yet looked more
> closely at the actual change to see whether this implies a
> different fix.

Interesting..  FWIW I applied your patch to the gcc 12 release branch,
which doesn't have r13-3134, and there were no modules testsuite
regressions there either, which at least suggests that this maybe_dup
logic isn't directly related to the optimization that r13-3134 removed.

Your patch also seems to fix PR99244 (which AFAICT is not a regression).

> 
> Nathaniel
> 
> > On 11/22/23 06:33, Nathaniel Shead wrote:
> > > Bootstrapped and regtested on x86_64-pc-linux-gnu. I don't have write
> > > access.
> > > 
> > > -- >8 --
> > > 
> > > When merging duplicate instantiations of function templates, currently
> > > read_function_def overwrites the arguments with that of the existing
> > > duplicate. This is problematic, however, since this means that the
> > > PARM_DECLs in the body of the function definition no longer match with
> > > the PARM_DECLs in the argument list, which causes issues when it comes
> > > to generating RTL.
> > > 
> > > There doesn't seem to be any reason to do this replacement, so this
> > > patch removes that logic.
> > > 
> > >   PR c++/112588
> > > 
> > > gcc/cp/ChangeLog:
> > > 
> > >   * module.cc (trees_in::read_function_def): Don't overwrite
> > >   arguments.
> > > 
> > > gcc/testsuite/ChangeLog:
> > > 
> > >   * g++.dg/modules/merge-16.h: New test.
> > >   * g++.dg/modules/merge-16_a.C: New test.
> > >   * g++.dg/modules/merge-16_b.C: New test.
> > > 
> > > Signed-off-by: Nathaniel Shead 
> > > ---
> > >   gcc/cp/module.cc  |  2 --
> > >   gcc/testsuite/g++.dg/modules/merge-16.h   | 10 ++
> > >   gcc/testsuite/g++.dg/modules/merge-16_a.C |  7 +++
> > >   gcc/testsuite/g++.dg/modules/merge-16_b.C |  5 +
> > >   4 files changed, 22 insertions(+), 2 deletions(-)
> > >   create mode 100644 gcc/testsuite/g++.dg/modules/merge-16.h
> > >   create mode 100644 gcc/testsuite/g++.dg/modules/merge-16_a.C
> > >   create mode 100644 gcc/testsuite/g++.dg/modules/merge-16_b.C
> > > 
> > > diff --git a/gcc/cp/module.cc b/gcc/cp/module.cc
> > > index 4f5b6e2747a..2520ab659cc 100644
> > > --- a/gcc/cp/module.cc
> > > +++ b/gcc/cp/module.cc
> > > @@ -11665,8 +11665,6 @@ trees_in::read_function_def (tree decl, tree maybe_template)
> > > DECL_RESULT (decl) = result;
> > > DECL_INITIAL (decl) = initial;
> > > DECL_SAVED_TREE (decl) = saved;
> > > -  if (maybe_dup)
> > > - DECL_ARGUMENTS (decl) = DECL_ARGUMENTS (maybe_dup);
> > > if (context)
> > >   SET_DECL_FRIEND_CONTEXT (decl, context);
> > > diff --git a/gcc/testsuite/g++.dg/modules/merge-16.h b/gcc/testsuite/g++.dg/modules/merge-16.h
> > > new file mode 100644
> > > index 000..fdb38551103
> > > --- /dev/null
> > > +++ b/gcc/testsuite/g++.dg/modules/merge-16.h
> > > @@ -0,0 +1,10 @@
> > > +// PR c++/112588
> > > +
> > > +void f(int*);
> > > +
> > > +template 
> > > +struct S {
> > > +  void g(int n) { f(); }
> > > +};
> > > +
> > > +template struct S;

If we use a partial specialization here instead (which would have disabled
the removed optimization, demonstrating how fragile/inconsistent it was)

  void f(int*);

  template 
  struct S { };

  template
  struct S {
void g(int n) { f(); }
  };

  template struct S;

then the ICE appears earlier, starting with GCC 12 instead of GCC 13.

> > > diff --git a/gcc/testsuite/g++.dg/modules/merge-16_a.C b/gcc/testsuite/g++.dg/modules/merge-16_a.C
> > > new file mode 100644
> > > index 000..c243224c875
> > > --- /dev/null
> > > +++ b/gcc/testsuite/g++.dg/modules/merge-16_a.C
> > > @@ -0,0 +1,7 @@
> > > +// PR c++/112588
> > > +// { dg-additional-options "-fmodules-ts" }
> > > +// { dg-module-cmi merge16 }
> > > +
> > > +module;
> > > +#include "merge-16.h"
> > > +export module merge16;
> > > diff --git a/gcc/testsuite/g++.dg/modules/merge-16_b.C b/gcc/testsuite/g++.dg/modules/merge-16_b.C
> > > new file mode 100644
> > > index 000..8c7b1f0511f
> > > --- /dev/null
> > > +++ b/gcc/testsuite/g++.dg/modules/merge-16_b.C
> > > @@ -0,0 +1,5 @@
> > > +// PR c++/112588
> > > +// { dg-additional-options "-fmodules-ts" }
> > > +
> > > +#include "merge-16.h"
> > > +import merge16;
> > 
> > -- 
> > Nathan Sidwell
> > 
> 
> 



Re: [PATCH] match.pd: Convert {I, X}OR of two values ANDed with alien CSTs to PLUS [PR108477]

2024-01-08 Thread Jeff Law




On 1/8/24 09:57, Andrew Pinski wrote:

On Mon, Jan 8, 2024 at 6:44 AM Uros Bizjak  wrote:


Instead of converting XOR or PLUS of two values, ANDed with two constants that
have no bits in common, to IOR expression, convert IOR or XOR of said two
ANDed values to PLUS expression.
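The rewrite is valid because, when the two constants share no set bits, (X & C1) and (Y & C2) can never both have a 1 in the same bit position. Addition therefore produces no carries, and IOR, XOR and PLUS all give the same result. The small exhaustive check over 8-bit values below illustrates this. ops_agree() is a hypothetical helper, not part of match.pd.

```c
#include <assert.h>
#include <stdint.h>

/* Return 1 if, for every 8-bit A and B, the values (A & C1) and (B & C2)
   give the same result under IOR, XOR and PLUS; otherwise return 0.  */
static int
ops_agree (uint8_t c1, uint8_t c2)
{
  for (unsigned a = 0; a < 256; a++)
    for (unsigned b = 0; b < 256; b++)
      {
        unsigned x = a & c1, y = b & c2;
        if ((x | y) != (x ^ y) || (x | y) != (x + y))
          return 0;
      }
  return 1;
}
```

Disjoint masks such as 0xF0 and 0x0F pass. Overlapping masks such as 0x01 and 0x01 fail, because 1 + 1 carries while 1 | 1 does not.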


I think this only helps targets which have a leal-like instruction. Also
I think it is the same issue as I recorded as PR 111763.  I suspect
BIT_IOR is more of a canonical form for GIMPLE, while we should handle
this in expand to decide if we want to use PLUS or IOR.

Actually, there's a benefit on RISC-V to using PLUS over IOR/XOR when
there are no bits in common.  In fact, I've been asked to do that by
Andrew W. for a case where we know ahead of time there are no bits in
common in a sequence that currently uses IOR.


Specifically it can allow more use of the compact instructions as the 
compact PLUS allows the full set of hard registers while compact IOR/XOR 
only allow a subset of registers.


jeff


Re: [PATCH] Add a late-combine pass [PR106594]

2024-01-08 Thread Richard Sandiford
Jeff Law  writes:
> On 1/8/24 04:52, Richard Sandiford wrote:
>> Jeff Law  writes:
>>> The other issue that's been in the back of my mind is costing.  But I
>>> think the model here is combine without regard to cost.
>> 
>> No, it does take costing into account.  For size, it's the usual
>> "sum up the before and after insn costs and see which one is lower".
>> For speed, the costs are weighted by execution frequency, so e.g.
>> two insns of cost 4 in the same block can be combined into a single
>> instruction of cost 8, but a hoisted invariant can only be combined
>> into a loop body instruction if the loop body instruction's cost
>> doesn't increase significantly.
>> 
>> This is done by rtl_ssa::changes_are_worthwhile.
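A simplified C model of the size/speed comparison described above may make the rule concrete. The struct and function names are hypothetical and this is not the rtl_ssa implementation; it assumes the combined-away insns are deleted and that ties are accepted, as in the "two cost-4 insns into one cost-8 insn" example.

```c
#include <stdbool.h>

/* Illustrative only: not GCC's rtl_ssa code.  */
struct insn_cost_info
{
  int cost;          /* cost reported by the target for this insn */
  double frequency;  /* relative execution frequency of its block */
};

/* Total cost of a sequence.  For size, costs are simply summed; for
   speed, each insn's cost is weighted by how often it executes.  */
static double
sequence_cost (const struct insn_cost_info *insns, int n, bool for_speed)
{
  double total = 0.0;
  for (int i = 0; i < n; i++)
    total += for_speed ? insns[i].cost * insns[i].frequency : insns[i].cost;
  return total;
}

/* Accept the change if the new sequence is no more expensive than the
   old one (ties are accepted).  */
static bool
change_is_worthwhile (const struct insn_cost_info *before, int n_before,
                      const struct insn_cost_info *after, int n_after,
                      bool for_speed)
{
  return (sequence_cost (after, n_after, for_speed)
          <= sequence_cost (before, n_before, for_speed));
}
```

With these numbers, two cost-4 insns in one block versus one cost-8 insn compare as 8 vs 8 and are accepted. Folding a cost-4 invariant (frequency 1) into a cost-4 loop insn (frequency 10) that becomes cost 8 compares as 44 vs 80 for speed and is rejected, while for size it is 8 vs 8 and accepted.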
> You're absolutely correct.  My bad.
>
> Interestingly, that's exactly where we do have a notable concern.

Gah.

> If you remember, there were a few ports that failed to build 
> newlib/libgcc that we initially ignored.  I went back and looked at one 
> (arc-elf).
>
> What appears to be happening for arc-elf is we're testing to see if the 
> change is profitable.  On arc-elf the costing model is highly dependent 
> on the length of the insns.
>
> We've got a very reasonable looking insn:
>
>> (insn 3674 753 2851 98 (set (reg/v:SI 18 r18 [orig:300 inex ] [300])
>> (ashift:SI (reg:SI 27 fp [548])
>> (const_int 4 [0x4]))) 
>> "../../../..//newlib-cygwin/newlib/libc/stdlib/gdtoa-gdtoa.c":437:10 120 
>> {*ashlsi3_insn}
>>  (nil))
>
> We call rtl_ssa::changes_are_worthwhile -> insn_cost -> arc_insn_cost -> 
> get_attr_length -> get_attr_length_1 -> insn_default_length
>
> insn_default_length grubs around looking at the operands via recog_data 
> which appears to be stale:
>
>
>
>> (gdb) p debug_rtx(recog_data.operand[0])
>> (reg/v:SI 18 r18 [orig:300 inex ] [300])
>> $4 = void
>> (gdb) p debug_rtx(recog_data.operand[1])
>> (reg/v:SI 3 r3 [orig:300 inex ] [300])
>> $5 = void
>> (gdb) p debug_rtx(recog_data.operand[2])
>> 
>> Program received signal SIGSEGV, Segmentation fault.
>> 0x01432955 in rtx_writer::print_rtx (this=0x7fffe0e0, 
>> in_rtx=0xabababababababab) at /home/jlaw/test/gcc/gcc/print-rtl.cc:809
>> 809   else if (GET_CODE (in_rtx) > NUM_RTX_CODE)
>
> Note the 0xabab...  That was accessing operand #2, which should have 
> been (const_int 4).
>
> Sure enough if I force re-recognition then look at the recog_data I get 
> the right values.
>
> After LRA we have:
>
>> (insn 753 2434 3674 98 (set (reg/v:SI 3 r3 [orig:300 inex ] [300])
>> (ashift:SI (reg:SI 27 fp [548])
>> (const_int 4 [0x4]))) 
>> "../../../..//newlib-cygwin/newlib/libc/stdlib/gdtoa-gdtoa.c":437:10 120 
>> {*ashlsi3_insn}
>>  (nil))
>> (insn 3674 753 2851 98 (set (reg/v:SI 18 r18 [orig:300 inex ] [300])
>> (reg/v:SI 3 r3 [orig:300 inex ] [300])) 
>> "../../../..//newlib-cygwin/newlib/libc/stdlib/gdtoa-gdtoa.c":437:10 3 
>> {*movsi_insn}
>>  (nil))
>
> In the emergency dump in late_combine2 (so cleanup hasn't been done):
>
>> (insn 753 2434 3674 98 (set (reg/v:SI 3 r3 [orig:300 inex ] [300])
>> (ashift:SI (reg:SI 27 fp [548])
>> (const_int 4 [0x4]))) 
>> "../../../..//newlib-cygwin/newlib/libc/stdlib/gdtoa-gdtoa.c":437:10 120 
>> {*ashlsi3_insn}
>>  (nil))
>> (insn 3674 753 2851 98 (set (reg/v:SI 18 r18 [orig:300 inex ] [300])
>> (ashift:SI (reg:SI 27 fp [548])
>> (const_int 4 [0x4]))) 
>> "../../../..//newlib-cygwin/newlib/libc/stdlib/gdtoa-gdtoa.c":437:10 120 
>> {*ashlsi3_insn}
>>  (nil))
>
>
> Which brings us to the question.  If we change the form of an insn, then 
> ask for its cost, don't we need to make sure the insn is re-recognized 
> as the costing function may do things like query the insn's length which 
> would use cached recog_data?

Yeah, this only happens once we've verified that the new instruction
is valid.  And it looks from the emergency dump above that the insn
code has been correctly updated to *ashlsi3_insn.

This is a bit of a hopeful stab, but is the problem that recog_data still
had the previous contents of insn 3674, and so extract_insn_cached wrongly
thought that it doesn't need to do anything?  If so, does something like:

diff --git a/gcc/recog.cc b/gcc/recog.cc
index a6799e3f5e6..8ba63c78179 100644
--- a/gcc/recog.cc
+++ b/gcc/recog.cc
@@ -267,6 +267,8 @@ validate_change_1 (rtx object, rtx *loc, rtx new_rtx, bool in_group,
 case invalid.  */
   changes[num_changes].old_code = INSN_CODE (object);
   INSN_CODE (object) = -1;
+  if (recog_data.insn == object)
+   recog_data.insn = nullptr;
 }
 
   num_changes++;

fix it?  I suppose there's an argument that this belongs in whatever code
sets INSN_CODE to a new nonnegative value (so recog_level2 for RTL-SSA).
But doing it in validate_change_1 seems more robust, since anything
calling that function is considering changing the insn code.
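A toy C model of the suspected failure mode and of the proposed fix: a one-entry cache keyed only on the insn pointer goes stale if the insn is rewritten in place without invalidating the cache. All names are hypothetical stand-ins for recog_data and extract_insn_cached; this is not GCC's recog.cc.

```c
#include <stddef.h>

/* Hypothetical miniature of an insn and a one-entry operand cache.  */
struct toy_insn
{
  int code;          /* like INSN_CODE */
  int n_operands;
  int operands[3];
};

static struct
{
  const struct toy_insn *insn;   /* insn the cache describes, or NULL */
  int n_operands;
  int operand[3];
} toy_recog_data;

/* Re-extract operands only when the cache describes a different insn.
   This is only safe if every in-place change invalidates the cache.  */
static void
toy_extract_cached (const struct toy_insn *insn)
{
  if (toy_recog_data.insn == insn)
    return;
  toy_recog_data.n_operands = insn->n_operands;
  for (int i = 0; i < insn->n_operands; i++)
    toy_recog_data.operand[i] = insn->operands[i];
  toy_recog_data.insn = insn;
}

/* Rewrite an insn in place (n_ops must not exceed 3) and, like the
   suggested validate_change_1 hunk, drop any cached extraction of it.  */
static void
toy_change_insn (struct toy_insn *insn, int new_code,
                 const int *ops, int n_ops)
{
  insn->code = new_code;
  insn->n_operands = n_ops;
  for (int i = 0; i < n_ops; i++)
    insn->operands[i] = ops[i];
  if (toy_recog_data.insn == insn)
    toy_recog_data.insn = NULL;
}
```

Without the final NULL assignment, re-extracting after turning a two-operand move into a three-operand shift would keep reporting two operands, which mirrors the garbage operand #2 seen in the debugger.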

Thanks for debugging the problem.

Richard


Re: breakage with: [committed] libstdc++: Implement P2909R4 ("Dude, where's my char?") for C++20

2024-01-08 Thread Jonathan Wakely
On Mon, 8 Jan 2024 at 16:25, Hans-Peter Nilsson wrote:
>
> (Sorry, never a bringer of good news...)

Regarding this bit ... even if you're reporting something I've broken,
I like to see it as an incremental step towards better portability, so
it's always good news ;-)


Re: [PATCH] match.pd: Convert {I, X}OR of two values ANDed with alien CSTs to PLUS [PR108477]

2024-01-08 Thread Andrew Pinski
On Mon, Jan 8, 2024 at 6:44 AM Uros Bizjak  wrote:
>
> Instead of converting XOR or PLUS of two values, ANDed with two constants that
> have no bits in common, to an IOR expression, convert IOR or XOR of said two
> ANDed values to a PLUS expression.

I think this only helps targets which have a leal-like instruction. Also
I think it is the same issue as I recorded as PR 111763.  I suspect
BIT_IOR is more of a canonical form for GIMPLE while we should handle
this in expand to decide if we want to use PLUS or IOR.

Thanks,
Andrew Pinski

>
> If we consider the following testcase:
>
> --cut here--
> unsigned int foo (unsigned int a, unsigned int b)
> {
>   unsigned int r = a & 0x1;
>   unsigned int p = b & ~0x3;
>
>   return r + p + 2;
> }
>
> unsigned int bar (unsigned int a, unsigned int b)
> {
>   unsigned int r = a & 0x1;
>   unsigned int p = b & ~0x3;
>
>   return r | p | 2;
> }
> --cut here--
>
> the above testcase compiles (x86_64 -O2) to:
>
> foo:
>         andl    $1, %edi
>         andl    $-4, %esi
>         orl     %esi, %edi
>         leal    2(%rdi), %eax
>         ret
>
> bar:
>         andl    $1, %edi
>         andl    $-4, %esi
>         orl     %esi, %edi
>         movl    %edi, %eax
>         orl     $2, %eax
>         ret
>
> No further simplification is possible in either case: we can't combine
> OR with a PLUS in the first case, and we don't have an OR instruction with
> multiple inputs in the second case.
>
> If we switch around the logic in the conversion and convert from IOR/XOR
> to PLUS, then the resulting assembly reads:
>
> foo:
>         andl    $-4, %esi
>         andl    $1, %edi
>         leal    2(%rsi,%rdi), %eax
>         ret
>
> bar:
>         andl    $1, %edi
>         andl    $-4, %esi
>         leal    (%rdi,%rsi), %eax
>         orl     $2, %eax
>         ret
>
> On x86, the conversion can now use the LEA instruction, which is much more
> usable than the OR instruction.  In the first case, LEA implements a
> three-input ADD, while in the second case, even though the instruction
> can't be combined with a follow-up OR, the non-destructive LEA avoids a move.
>
> PR target/108477
>
> gcc/ChangeLog:
>
> * match.pd (A & CST1 | B & CST2 -> A & CST1 + B & CST2):
> Do not convert PLUS of two values, ANDed with two constants
> that have no bits in common to IOR expression, convert
> IOR or XOR of said two ANDed values to PLUS expression.
>
> gcc/testsuite/ChangeLog:
>
> * gcc.target/i386/pr108477.c: New test.
>
> Bootstrapped and regression tested on x86_64-linux-gnu {,-m32}.
>
> OK for mainline?
>
> Uros.


Re: breakage with: [committed] libstdc++: Implement P2909R4 ("Dude, where's my char?") for C++20

2024-01-08 Thread Jonathan Wakely
On Mon, 8 Jan 2024 at 16:28, Hans-Peter Nilsson wrote:
>
> > From: Hans-Peter Nilsson 
> > Date: Mon, 8 Jan 2024 17:24:35 +0100
>
> > For some reason, this (r14-6990-g74a0dab18292be) breaks a
> > build of (newlib targets) at least cris-elf and arm-eabi:
>
> ...aaand, just now fixed in r14-7007-geb846114ed7c49.
> (Thanks!)

Yup, it got reported on IRC this morning, but I had to finish testing
the fix. Sorry for the temporary breakage.


Re: [PATCH] RISC-V: Teach liveness computation loop invariant shift amount[Dynamic LMUL]

2024-01-08 Thread Robin Dapp
> > +  if (is_gimple_min_invariant (op))
> > +    return true;
> > +  if (SSA_NAME_IS_DEFAULT_DEF (op)
> > +  || !flow_bb_inside_loop_p (loop, gimple_bb (SSA_NAME_DEF_STMT (op))))
> > +    return true;
> > +  return gimple_uid (SSA_NAME_DEF_STMT (op)) & 1;
> > +}
> > +

Does gimple_uid ever return something useful for us here?
In tree-ssa-loop-ch it is populated beforehand and then used, but I
don't think we populate it properly?

So my question would be: aren't is_gimple_constant and
flow_bb_inside_loop_p sufficient for our purpose?
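A toy C model of the simpler check suggested here (constant, default definition, or defined outside the loop), with the gimple_uid test dropped. The types and helpers are hypothetical stand-ins, not GIMPLE APIs.

```c
#include <stdbool.h>
#include <stddef.h>

/* Toy stand-ins for GIMPLE concepts; none of these are GCC APIs.  */
struct toy_loop { int first_block, last_block; };   /* blocks [first, last] */
struct toy_stmt { int block; };
struct toy_operand
{
  bool is_constant;               /* like is_gimple_min_invariant */
  const struct toy_stmt *def;     /* NULL models a default definition */
};

static bool
toy_block_in_loop (const struct toy_loop *loop, int block)
{
  return block >= loop->first_block && block <= loop->last_block;
}

/* An operand is treated as loop-invariant if it is a constant, has no
   defining statement, or is defined outside the loop.  */
static bool
toy_loop_invariant_p (const struct toy_loop *loop,
                      const struct toy_operand *op)
{
  if (op->is_constant)
    return true;
  return op->def == NULL || !toy_block_in_loop (loop, op->def->block);
}
```

This version classifies every value defined inside the loop as variant, even one computed only from invariants; presumably that is the case the gimple_uid bit was meant to cover.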

Regards
 Robin


Re: [PATCH v3 2/3] libatomic: Enable LSE128 128-bit atomics for armv9.4-a

2024-01-08 Thread Wilco Dijkstra
Hi Richard,

>> Benchmarking showed that LSE and LSE2 RMW atomics have similar performance once
>> the atomic is acquire, release or both. Given there is already a significant overhead due
>> to the function call, PLT indirection and argument setup, it doesn't make sense to add
>> extra taken branches that may mispredict or cause extra fetch cycles...
>
> Thanks for the extra context, especially wrt the LSE/LSE2 benchmarking.
> If there isn't any difference for acquire vs. the rest, is there a
> justification we can use for keeping the acquire branch, rather than
> using SWPAL for everything except relaxed?

The results showed that acquire is typically slightly faster than release (5-10%), so for the
most frequently used atomics (CAS and SWP) it makes sense to add support for acquire.
In most cases, once you have release semantics, adding acquire didn't make things
slower, so combining release/acq_rel/seq_cst avoids unnecessary extra branches and
keeps the code small.
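The ordering policy described above can be sketched in C as a dispatch on the requested memory order. The enum labels are hypothetical; the real libatomic entry points are AArch64 assembly, and the mapping to particular LSE128 instructions in the comment is an assumption.

```c
#include <stdatomic.h>

/* Hypothetical labels for the three code paths.  For LSE128 these would
   presumably correspond to SWPP, SWPPA and SWPPAL.  */
enum swap_path { PATH_RELAXED, PATH_ACQUIRE, PATH_ACQ_REL };

/* Relaxed and acquire get dedicated paths; release, acq_rel and seq_cst
   all share the strongest (acquire + release) path, trading a possibly
   unneeded acquire for fewer branches and smaller code.  */
static enum swap_path
select_swap_path (memory_order model)
{
  switch (model)
    {
    case memory_order_relaxed:
      return PATH_RELAXED;
    case memory_order_consume:   /* treated as acquire */
    case memory_order_acquire:
      return PATH_ACQUIRE;
    default:                     /* release, acq_rel, seq_cst */
      return PATH_ACQ_REL;
    }
}
```

Folding release into the acquire-release path only gives up a separate SWPPL-style path, which, per the benchmarking above, is rarely slower.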

> If so, then Victor, could you include that in the explanation above and
> add it as a source comment?  Although maybe tone down "doesn't make
> sense to add" to something like "doesn't seem worth adding". :)

Yes it's worth adding a comment to this effect.

Cheers,
Wilco

Re: breakage with: [committed] libstdc++: Implement P2909R4 ("Dude, where's my char?") for C++20

2024-01-08 Thread Hans-Peter Nilsson
> From: Hans-Peter Nilsson 
> Date: Mon, 8 Jan 2024 17:24:35 +0100

> For some reason, this (r14-6990-g74a0dab18292be) breaks a
> build of (newlib targets) at least cris-elf and arm-eabi:

...aaand, just now fixed in r14-7007-geb846114ed7c49.
(Thanks!)

brgds, H-P



breakage with: [committed] libstdc++: Implement P2909R4 ("Dude, where's my char?") for C++20

2024-01-08 Thread Hans-Peter Nilsson
(Sorry, never a bringer of good news...)

> From: Jonathan Wakely 
> Date: Mon,  8 Jan 2024 01:15:50 +

> Tested x86_64-linux and aarch64-linux. Pushed to trunk.
> 
> -- >8 --
> 
> This change ensures that char and wchar_t arguments are formatted
> consistently when using integer presentation types. This avoids
> non-portable std::format output that depends on whether char and wchar_t
> happen to be signed or unsigned on the target. Formatting '\xff' as an
> integer will now always format 255 and not sometimes -1. This was
> approved in Kona 2023 as a DR for C++20 so the change is implemented
> unconditionally.
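A small C analogue of the portability problem (not the libstdc++ code): whether plain char is signed is target-dependent, so converting through the unsigned type first is what makes '\xff' come out as 255 everywhere.

```c
/* Value produced when a char is formatted as an integer.  Converting
   through unsigned char first gives the same answer on every target,
   which is the behaviour P2909R4 asks of std::format.  */
static int
char_as_integer (char c)
{
  return (unsigned char) c;
}

/* The target-dependent alternative: typically -1 for '\xff' where plain
   char is signed, and 255 where it is unsigned.  */
static int
char_as_integer_target_dependent (char c)
{
  return c;
}
```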
> 
> Also make character formatters check for _Pres_c explicitly and call
> _M_format_character directly. This avoids the overhead of calling format
> and _S_to_character and then calling _M_format_character anyway.
> 
> libstdc++-v3/ChangeLog:
> 
>   * include/bits/version.def (format_uchar): Define.
>   * include/bits/version.h: Regenerate.
>   * include/std/format (formatter::format): Check for
>   _Pres_c and call _M_format_character directly. Cast C to its
>   unsigned equivalent for formatting as an integer.
>   (formatter::format): Likewise.
>   (basic_format_arg(T&)): Store char arguments as unsigned char
>   for formatting to a wide string.
>   * testsuite/std/format/functions/format.cc: Adjust test. Check
>   formatting of

For some reason, this (r14-6990-g74a0dab18292be) breaks a
build of (newlib targets) at least cris-elf and arm-eabi:

libtool: compile:  /obj/./gcc/xgcc -shared-libgcc -B/obj/./gcc -nostdinc++ 
-L/obj/cris-elf/libstdc++-v3/src -L/obj/cris-elf/libstdc++-v3/src/.libs 
-L/obj/cris-elf/libstdc++-v3/libsupc++/.libs -nostdinc -B/obj/cris-elf/newlib/ 
-isystem /obj/cris-elf/newlib/targ-include -isystem /x/gcc/newlib/libc/include 
-B/obj/cris-elf/libgloss/cris -L/obj/cris-elf/libgloss/libnosys 
-L/x/gcc/libgloss/cris -B/x/cris-elf/pre/cris-elf/bin/ 
-B/x/cris-elf/pre/cris-elf/lib/ -isystem /x/cris-elf/pre/cris-elf/include 
-isystem /x/cris-elf/pre/cris-elf/sys-include -I/x/gcc/libstdc++-v3/../libgcc 
-I/obj/cris-elf/libstdc++-v3/include/cris-elf 
-I/obj/cris-elf/libstdc++-v3/include -I/x/gcc/libstdc++-v3/libsupc++ 
-std=gnu++20 -fno-implicit-templates -Wall -Wextra -Wwrite-strings -Wcast-qual 
-Wabi=2 -fdiagnostics-show-location=once -ffunction-sections -fdata-sections 
-frandom-seed=tzdb.lo -fimplicit-templates -g -O2 -I. -c 
/x/gcc/libstdc++-v3/src/c++20/tzdb.cc -o tzdb.o
In file included from /x/gcc/newlib/libc/include/time.h:11,
 from /obj/cris-elf/libstdc++-v3/include/ctime:42,
 from /obj/cris-elf/libstdc++-v3/include/bits/chrono.h:40,
 from /obj/cris-elf/libstdc++-v3/include/chrono:41,
 from /x/gcc/libstdc++-v3/src/c++20/tzdb.cc:31:
/obj/cris-elf/libstdc++-v3/include/bits/unicode.h:86:37: error: declaration does not declare anything [-fpermissive]
   86 |   inline constexpr _Null_sentinel_t __null_sentinel;
      |                                     ^~~
make[5]: *** [Makefile:754: tzdb.lo] Error 1

I don't see anything immediately related to that line in the
patch, though, so the actual cause and fix isn't obvious, at
least to me.

brgds, H-P


Re: [PATCH] Add a late-combine pass [PR106594]

2024-01-08 Thread Jeff Law




On 1/8/24 04:52, Richard Sandiford wrote:

Jeff Law  writes:

The other issue that's been in the back of my mind is costing.  But I
think the model here is combine without regard to cost.


No, it does take costing into account.  For size, it's the usual
"sum up the before and after insn costs and see which one is lower".
For speed, the costs are weighted by execution frequency, so e.g.
two insns of cost 4 in the same block can be combined into a single
instruction of cost 8, but a hoisted invariant can only be combined
into a loop body instruction if the loop body instruction's cost
doesn't increase significantly.

This is done by rtl_ssa::changes_are_worthwhile.

You're absolutely correct.  My bad.

Interestingly, that's exactly where we do have a notable concern.

If you remember, there were a few ports that failed to build 
newlib/libgcc that we initially ignored.  I went back and looked at one 
(arc-elf).


What appears to be happening for arc-elf is we're testing to see if the 
change is profitable.  On arc-elf the costing model is highly dependent 
on the length of the insns.


We've got a very reasonable looking insn:


(insn 3674 753 2851 98 (set (reg/v:SI 18 r18 [orig:300 inex ] [300])
(ashift:SI (reg:SI 27 fp [548])
(const_int 4 [0x4]))) 
"../../../..//newlib-cygwin/newlib/libc/stdlib/gdtoa-gdtoa.c":437:10 120 
{*ashlsi3_insn}
 (nil))


We call rtl_ssa::changes_are_worthwhile -> insn_cost -> arc_insn_cost -> 
get_attr_length -> get_attr_length_1 -> insn_default_length


insn_default_length grubs around looking at the operands via recog_data 
which appears to be stale:





(gdb) p debug_rtx(recog_data.operand[0])
(reg/v:SI 18 r18 [orig:300 inex ] [300])
$4 = void
(gdb) p debug_rtx(recog_data.operand[1])
(reg/v:SI 3 r3 [orig:300 inex ] [300])
$5 = void
(gdb) p debug_rtx(recog_data.operand[2])

Program received signal SIGSEGV, Segmentation fault.
0x01432955 in rtx_writer::print_rtx (this=0x7fffe0e0, 
in_rtx=0xabababababababab) at /home/jlaw/test/gcc/gcc/print-rtl.cc:809
809   else if (GET_CODE (in_rtx) > NUM_RTX_CODE)


Note the 0xabab...  That was accessing operand #2, which should have 
been (const_int 4).


Sure enough if I force re-recognition then look at the recog_data I get 
the right values.


After LRA we have:


(insn 753 2434 3674 98 (set (reg/v:SI 3 r3 [orig:300 inex ] [300])
(ashift:SI (reg:SI 27 fp [548])
(const_int 4 [0x4]))) 
"../../../..//newlib-cygwin/newlib/libc/stdlib/gdtoa-gdtoa.c":437:10 120 
{*ashlsi3_insn}
 (nil))
(insn 3674 753 2851 98 (set (reg/v:SI 18 r18 [orig:300 inex ] [300])
(reg/v:SI 3 r3 [orig:300 inex ] [300])) 
"../../../..//newlib-cygwin/newlib/libc/stdlib/gdtoa-gdtoa.c":437:10 3 
{*movsi_insn}
 (nil))


In the emergency dump in late_combine2 (so cleanup hasn't been done):


(insn 753 2434 3674 98 (set (reg/v:SI 3 r3 [orig:300 inex ] [300])
(ashift:SI (reg:SI 27 fp [548])
(const_int 4 [0x4]))) 
"../../../..//newlib-cygwin/newlib/libc/stdlib/gdtoa-gdtoa.c":437:10 120 
{*ashlsi3_insn}
 (nil))
(insn 3674 753 2851 98 (set (reg/v:SI 18 r18 [orig:300 inex ] [300])
(ashift:SI (reg:SI 27 fp [548])
(const_int 4 [0x4]))) 
"../../../..//newlib-cygwin/newlib/libc/stdlib/gdtoa-gdtoa.c":437:10 120 
{*ashlsi3_insn}
 (nil))



Which brings us to the question.  If we change the form of an insn, then 
ask for its cost, don't we need to make sure the insn is re-recognized 
as the costing function may do things like query the insn's length which 
would use cached recog_data?


jeff







[committed] libstdc++: Remove std::__unicode::__null_sentinel

2024-01-08 Thread Jonathan Wakely
Tested x86_64-linux, pushed to trunk.

-- >8 --

The name __null_sentinel is defined as a macro by newlib, so we can't
use it as an identifier. That variable is not actually used by
libstdc++; it was added because P2728R6 proposes std::uc::null_sentinel.
Since we don't need it and it breaks bootstrap for newlib targets, just
remove it. A null sentinel can still be used by constructing a
_Null_sentinel_t object as needed, rather than having a named object of
that type predefined.

libstdc++-v3/ChangeLog:

* include/bits/unicode.h (__null_sentinel): Remove.
* testsuite/17_intro/names.cc: Add __null_sentinel.
---
 libstdc++-v3/include/bits/unicode.h  | 2 --
 libstdc++-v3/testsuite/17_intro/names.cc | 1 +
 2 files changed, 1 insertion(+), 2 deletions(-)

diff --git a/libstdc++-v3/include/bits/unicode.h b/libstdc++-v3/include/bits/unicode.h
index 66f8399fdfb..e49498a0531 100644
--- a/libstdc++-v3/include/bits/unicode.h
+++ b/libstdc++-v3/include/bits/unicode.h
@@ -83,8 +83,6 @@ namespace __unicode
   { return *__it == iter_value_t<_It>{}; }
   };
 
-  inline constexpr _Null_sentinel_t __null_sentinel;
-
   template _Sent = _Iter,
   typename _ErrorHandler = _Repl>
diff --git a/libstdc++-v3/testsuite/17_intro/names.cc b/libstdc++-v3/testsuite/17_intro/names.cc
index 5e77e9f2ab0..53c5aff219d 100644
--- a/libstdc++-v3/testsuite/17_intro/names.cc
+++ b/libstdc++-v3/testsuite/17_intro/names.cc
@@ -140,6 +140,7 @@
 
 // These clash with newlib so don't use them.
 # define __lockable	cannot be used as an identifier
+# define __null_sentinel   cannot be used as an identifier
 # define __packed  cannot be used as an identifier
 # define __unused  cannot be used as an identifier
 # define __used	cannot be used as an identifier
-- 
2.43.0



[libatomic PATCH] Fix testsuite regressions on ARM [raspberry pi].

2024-01-08 Thread Roger Sayle

Bootstrapping GCC on arm-linux-gnueabihf with --with-arch=armv6 currently
has a large number of FAILs in libatomic (regressions since last time I
attempted this).  The failure mode is related to IFUNC handling with the
file tas_8_2_.o containing an unresolved reference to the function
libat_test_and_set_1_i2.

Bearing in mind that I've no idea what's going on, the following one-line
change, to build tas_1_2_.o when building tas_8_2_.o, resolves the problem
for me and restores the libatomic testsuite to 44 expected passes and 5
unsupported tests [from 22 unexpected failures and 22 unresolved testcases].

If this looks like the correct fix: I'm not confident regenerating
Makefile.in with the correct version of automake, so I'd very much appreciate
it if someone (the reviewer or maintainer) could check this in for me.
Thanks in advance.


2024-01-08  Roger Sayle  

libatomic/ChangeLog
* Makefile.am: Build tas_1_2_.o on ARCH_ARM_LINUX.
* Makefile.in: Regenerate.


Roger
--

diff --git a/libatomic/Makefile.am b/libatomic/Makefile.am
index cfad90124f9..e0988a18c9a 100644
--- a/libatomic/Makefile.am
+++ b/libatomic/Makefile.am
@@ -139,6 +139,7 @@ if ARCH_ARM_LINUX
 IFUNC_OPTIONS   = -march=armv7-a+fp -DHAVE_KERNEL64
 libatomic_la_LIBADD += $(foreach s,$(SIZES),$(addsuffix _$(s)_1_.lo,$(SIZEOBJS)))
 libatomic_la_LIBADD += $(addsuffix _8_2_.lo,$(SIZEOBJS))
+libatomic_la_LIBADD += $(addsuffix _1_2_.lo,$(SIZEOBJS))
 endif
 if ARCH_I386
 IFUNC_OPTIONS   = -march=i586


Re: [RFA] [V3] new pass for sign/zero extension elimination

2024-01-08 Thread Richard Sandiford
Jeff Law  writes:
>>> +
>>> +/* Initialization of the ext-dce pass.  Primarily this means
>>> +   setting up the various bitmaps we utilize.  */
>>> +
>>> +static void
>>> +ext_dce_init (void)
>>> +{
>>> +
>> 
>> Nit: excess blank line.
> Various nits have been fixed.  I think those are all mine.  For reasons 
> I don't understand to this day, my brain thinks there should be vertical 
> whitespace between the function comment and the definition.  I'm 
> constantly having to fix that.

Yeah, I've never known whether a blank line is preferred between the
comment and function definition.  When I started (obviously somewhat
later than you :)), "yes" seemed to be much more common, but now it's
pretty mixed.  So I just do what surrounding code does.  (Personally
I slightly prefer the blank line.)

So I wasn't commenting on that part, although reading it back, I can
see how it looked like that.  It was just on the blank line immediately
above, after the opening "{".  I.e. there were some instances of:

void
f (void)
{

   ...foo...;

}

rather than:

void
f (void)
{
  ...foo...;
}

Thanks,
Richard


Re: [PATCH v3 1/3] libatomic: atomic_16.S: Improve ENTRY, END and ALIAS macro interface

2024-01-08 Thread Richard Sandiford
Victor Do Nascimento  writes:
> On 1/5/24 11:10, Richard Sandiford wrote:
>> Victor Do Nascimento  writes:
>>> The introduction of further architectural-feature dependent ifuncs
>>> for AArch64 makes hard-coding ifunc `_i' suffixes to functions
>>> cumbersome to work with.  It is awkward to remember which ifunc maps
>>> onto which arch feature and makes the code harder to maintain when new
>>> ifuncs are added and their suffixes possibly altered.
>>>
>>> This patch uses pre-processor `#define' statements to map each suffix to
>>> a descriptive feature name macro, for example:
>>>
>>>#define LSE2 _i1
>>>
>>> and reconstructs function names with the pre-processor's token
>>> concatenation feature, such that for `MACRO(_i)', we would
>>> now have `MACRO_FEAT(name, feature)' and in the macro definition body
>>> we replace `name` with `name##feature`.
>> 
>> FWIW, another way of doing this would be to have:
>> 
>> #define CORE(NAME) NAME
>> #define LSE2(NAME) NAME##_i1
>> 
>> and use feature(name) instead of name##feature.  This has the slight
>> advantage of not using ## on empty tokens, and the maybe slightly
>> better advantage of not needing the extra forwarding step in:
>> 
>> #define ENTRY_FEAT(name, feat)   \
>>  ENTRY_FEAT1(name, feat)
>> 
>> #define ENTRY_FEAT1(name, feat)  \
>> 
>> WDYT?
>> 
>> Richard
>> 
>
> While from a strictly stylistic point of view, I'm not so keen on the 
> resulting interface and its 'function call within a function call' look, 
> e.g.
>
>ENTRY (LSE2 (libat_compare_exchange_16))
>
> and
>
>ALIAS (LSE128 (libat_compare_exchange_16), \
>   LSE2 (libat_compare_exchange_16))
>
> on the implementation-side of things, I like the benefits this brings 
> about.  Namely allowing the use of the unaltered original 
> implementations of the ENTRY, END and ALIAS macros with the 
> aforementioned advantages of not having to use ## on empty tokens and 
> abolishing the need for the extra forwarding step.
>
> I'm happy enough to go with this approach.

I was thinking that the invocations would stay the same.  A C example is:

#define LSE2(NAME) NAME##_i2
#define ENTRY(NAME, FEAT) void FEAT (NAME) ()
ENTRY(foo, LSE2) {}

https://godbolt.org/z/rdn5dEMPM
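A compilable C sketch contrasting the two naming schemes discussed in this thread. The suffix macros CORE_SFX and LSE2_SFX and the function names cas_a/cas_b are hypothetical; the ENTRY/CORE/LSE2 shapes follow the example above, with LSE2 mapping to _i1 as in the patch description.

```c
/* Scheme 1 (as posted): feature names are object-like macros for a
   suffix, and a forwarding macro makes them expand before ## pastes.  */
#define CORE_SFX                 /* baseline: empty suffix */
#define LSE2_SFX _i1
#define ENTRY_FEAT(name, feat) ENTRY_FEAT1 (name, feat)
#define ENTRY_FEAT1(name, feat) int name##feat (void)

/* Scheme 2 (suggested): each feature is a function-like macro that
   builds the name, so there is no ## on empty tokens and no forwarding
   step.  */
#define CORE(NAME) NAME
#define LSE2(NAME) NAME##_i1
#define ENTRY(NAME, FEAT) int FEAT (NAME) (void)

ENTRY_FEAT (cas_a, CORE_SFX) { return 0; }   /* defines cas_a */
ENTRY_FEAT (cas_a, LSE2_SFX) { return 1; }   /* defines cas_a_i1 */
ENTRY (cas_b, CORE) { return 0; }            /* defines cas_b */
ENTRY (cas_b, LSE2) { return 1; }            /* defines cas_b_i1 */
```

Without the ENTRY_FEAT -> ENTRY_FEAT1 forwarding step, name##feat would paste the unexpanded macro name (giving cas_aLSE2_SFX), because operands of ## are not macro-expanded first. Scheme 2 avoids this because LSE2 is only expanded once it is followed by "(NAME)" on rescan.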

Thanks,
Richard


Re: [PATCH v2] c++/modules: Differentiate extern templates and TYPE_DECL_SUPPRESS_DEBUG [PR112820]

2024-01-08 Thread Patrick Palka
On Mon, 8 Jan 2024, Nathaniel Shead wrote:

> On Thu, Jan 04, 2024 at 03:39:15PM -0500, Patrick Palka wrote:
> > On Sun, 3 Dec 2023, Nathaniel Shead wrote:
> > 
> > > Bootstrapped and regtested on x86_64-pc-linux-gnu, OK for trunk?
> > > 
> > > -- >8 --
> > > 
> > > The TYPE_DECL_SUPPRESS_DEBUG and DECL_EXTERNAL flags use the same
> > > underlying bit. This is causing confusion when attempting to determine
> > > the interface for a streamed-in class type, since the modules code
> > > currently assumes that all DECL_EXTERNAL types are extern templates.
> > > However, when -g is specified then TYPE_DECL_SUPPRESS_DEBUG (and hence
> > > DECL_EXTERNAL) is marked on various other kinds of declarations, such as
> > > vtables, which causes them to never be emitted.
> > 
> > Good catch.  Maybe we should use different bits for these flags?  I wouldn't be
> > surprised if this bit sharing causes issues elsewhere in the compiler.  The
> > documentation in tree.h / tree-core.h says DECL_EXTERNAL is only valid for
> > VAR_DECL and FUNCTION_DECL, so at one point it was safe to share the same bit,
> > but that's not true anymore, it seems.
> > 
> > Looking at tree-core.h:tree_decl_common, luckily we have plenty of spare bits.
> > We could also e.g. make TYPE_DECL_SUPPRESS_DEBUG use the decl_not_flexarray bit,
> > which is otherwise only used for FIELD_DECL.
> > 
> 
> That seems like a good idea, thanks. How does this look?
> 
> Bootstrapped and regtested on x86_64-pc-linux-gnu, OK for trunk?
> 
> -- >8 --
> 
> Currently, DECL_EXTERNAL and TYPE_DECL_SUPPRESS_DEBUG share a bit. This
> causes issues with module code, which then incorrectly assumes that
> anything with suppressed debug info (such as vtables when '-g' is
> specified) is an extern template and thus prevents their emission.
> 
> This patch splits the two flags up; extern templates continue to use the
> DECL_EXTERNAL flag (and the documentation is updated to indicate this),
> but TYPE_DECL_SUPPRESS_DEBUG now uses the 'decl_not_flexarray' flag,
> which currently is only used by FIELD_DECLs.
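A toy C model of why the shared bit misled the modules code and how splitting it helps. The struct is a hypothetical miniature of tree_decl_common: the shared bit is assumed to be called decl_flag_1 here, and the OLD_/NEW_ macros are illustrative, not GCC's definitions.

```c
/* Hypothetical miniature of tree_decl_common.  */
struct toy_decl_common
{
  unsigned decl_flag_1 : 1;          /* assumed shared bit */
  unsigned decl_not_flexarray : 1;   /* otherwise only used by FIELD_DECLs */
};

/* Before the patch: two accessors for one bit, so setting either flag
   makes the other read as set.  */
#define OLD_DECL_EXTERNAL(d)            ((d)->decl_flag_1)
#define OLD_TYPE_DECL_SUPPRESS_DEBUG(d) ((d)->decl_flag_1)

/* After the patch: TYPE_DECL_SUPPRESS_DEBUG gets a bit of its own.  */
#define NEW_DECL_EXTERNAL(d)            ((d)->decl_flag_1)
#define NEW_TYPE_DECL_SUPPRESS_DEBUG(d) ((d)->decl_not_flexarray)
```

Setting the suppress-debug flag (as happens under -g) on the old layout makes the declaration read as DECL_EXTERNAL, i.e. as an extern template; on the new layout the two flags are independent, which is why mark_class_instantiated now has to set DECL_EXTERNAL explicitly.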
> 
>   PR c++/112820
>   PR c++/102607
> 
> gcc/cp/ChangeLog:
> 
>   * pt.cc (mark_class_instantiated): Set DECL_EXTERNAL explicitly.
> 
> gcc/ChangeLog:
> 
>   * tree-core.h (struct tree_decl_common): Update comments.
>   * tree.h (DECL_EXTERNAL): Update comments.
>   (TYPE_DECL_SUPPRESS_DEBUG): Use 'decl_not_flexarray' instead.
> 
> gcc/testsuite/ChangeLog:
> 
>   * g++.dg/modules/debug-2_a.C: New test.
>   * g++.dg/modules/debug-2_b.C: New test.
>   * g++.dg/modules/debug-2_c.C: New test.
>   * g++.dg/modules/debug-3_a.C: New test.
>   * g++.dg/modules/debug-3_b.C: New test.
> 
> Signed-off-by: Nathaniel Shead 
> ---
>  gcc/cp/pt.cc | 1 +
>  gcc/testsuite/g++.dg/modules/debug-2_a.C | 9 +
>  gcc/testsuite/g++.dg/modules/debug-2_b.C | 8 
>  gcc/testsuite/g++.dg/modules/debug-2_c.C | 9 +
>  gcc/testsuite/g++.dg/modules/debug-3_a.C | 8 
>  gcc/testsuite/g++.dg/modules/debug-3_b.C | 9 +
>  gcc/tree-core.h  | 6 +++---
>  gcc/tree.h   | 8 
>  8 files changed, 51 insertions(+), 7 deletions(-)
>  create mode 100644 gcc/testsuite/g++.dg/modules/debug-2_a.C
>  create mode 100644 gcc/testsuite/g++.dg/modules/debug-2_b.C
>  create mode 100644 gcc/testsuite/g++.dg/modules/debug-2_c.C
>  create mode 100644 gcc/testsuite/g++.dg/modules/debug-3_a.C
>  create mode 100644 gcc/testsuite/g++.dg/modules/debug-3_b.C
> 
> diff --git a/gcc/cp/pt.cc b/gcc/cp/pt.cc
> index e38e7a773f0..7839745035b 100644
> --- a/gcc/cp/pt.cc
> +++ b/gcc/cp/pt.cc
> @@ -26256,6 +26256,7 @@ mark_class_instantiated (tree t, int extern_p)
>SET_CLASSTYPE_EXPLICIT_INSTANTIATION (t);
>SET_CLASSTYPE_INTERFACE_KNOWN (t);
>CLASSTYPE_INTERFACE_ONLY (t) = extern_p;
> +  DECL_EXTERNAL (TYPE_NAME (t)) = extern_p;
>TYPE_DECL_SUPPRESS_DEBUG (TYPE_NAME (t)) = extern_p;
>if (! extern_p)
>  {
> diff --git a/gcc/testsuite/g++.dg/modules/debug-2_a.C b/gcc/testsuite/g++.dg/modules/debug-2_a.C
> new file mode 100644
> index 000..eed0905542b
> --- /dev/null
> +++ b/gcc/testsuite/g++.dg/modules/debug-2_a.C
> @@ -0,0 +1,9 @@
> +// PR c++/112820
> +// { dg-additional-options "-fmodules-ts -g" }
> +// { dg-module-cmi io }
> +
> +export module io;
> +
> +export struct error {
> +  virtual const char* what() const noexcept;
> +};
> diff --git a/gcc/testsuite/g++.dg/modules/debug-2_b.C b/gcc/testsuite/g++.dg/modules/debug-2_b.C
> new file mode 100644
> index 000..fc9afbc02e0
> --- /dev/null
> +++ b/gcc/testsuite/g++.dg/modules/debug-2_b.C
> @@ -0,0 +1,8 @@
> +// PR c++/112820
> +// { dg-additional-options "-fmodules-ts -g" }
> +
> +module io;
> +
> +const char* error::what() const noexcept {
> +  return "bla";
> +}
> diff --git a/gcc/testsuite/g++.dg/modules/debug-2_c.C b/gcc/testsuite/g++.dg/modules/debug-2_c.C
> 

[PATCH] match.pd: Convert {I, X}OR of two values ANDed with alien CSTs to PLUS [PR108477]

2024-01-08 Thread Uros Bizjak
Instead of converting XOR or PLUS of two values, ANDed with two constants that
have no bits in common, to an IOR expression, convert IOR or XOR of said two
ANDed values to a PLUS expression.

If we consider the following testcase:

--cut here--
unsigned int foo (unsigned int a, unsigned int b)
{
  unsigned int r = a & 0x1;
  unsigned int p = b & ~0x3;

  return r + p + 2;
}

unsigned int bar (unsigned int a, unsigned int b)
{
  unsigned int r = a & 0x1;
  unsigned int p = b & ~0x3;

  return r | p | 2;
}
--cut here--

the above testcase compiles (x86_64 -O2) to:

foo:
        andl    $1, %edi
        andl    $-4, %esi
        orl     %esi, %edi
        leal    2(%rdi), %eax
        ret

bar:
        andl    $1, %edi
        andl    $-4, %esi
        orl     %esi, %edi
        movl    %edi, %eax
        orl     $2, %eax
        ret

No further simplification is possible in either case: we can't combine
OR with a PLUS in the first case, and we don't have an OR instruction with
multiple inputs in the second case.

If we switch around the logic in the conversion and convert from IOR/XOR
to PLUS, then the resulting assembly reads:

foo:
        andl    $-4, %esi
        andl    $1, %edi
        leal    2(%rsi,%rdi), %eax
        ret

bar:
        andl    $1, %edi
        andl    $-4, %esi
        leal    (%rdi,%rsi), %eax
        orl     $2, %eax
        ret

On x86, the conversion can now use the LEA instruction, which is much more
usable than the OR instruction.  In the first case, LEA implements a
three-input ADD, while in the second case, even though the instruction
can't be combined with a follow-up OR, the non-destructive LEA avoids a move.
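Why the rewrite is valid: when the two masks share no set bits, no bit position is 1 in both masked values, so addition never carries and IOR, XOR and PLUS give the same result. A small C spot check using the masks from foo and bar (the helper name is ours, for illustration only):

```c
#include <stdbool.h>
#include <stdint.h>

/* Masks from foo/bar above: 0x1 and ~0x3 share no set bits, and the
   extra constant 2 is disjoint from both.  */
static bool
masked_forms_agree (uint32_t a, uint32_t b)
{
  uint32_t r = a & 0x1u;
  uint32_t p = b & ~0x3u;

  /* No bit position is set in both r and p, so r + p never carries and
     IOR, XOR and PLUS produce the same value.  */
  bool two_operand = (r | p) == (r ^ p) && (r | p) == r + p;

  /* For the same reason bar's "r | p | 2" equals foo's "r + p + 2".  */
  bool with_constant = (r | p | 2u) == r + p + 2u;

  return two_operand && with_constant;
}
```

This is the property the match.pd condition (wi::to_wide (@1) & wi::to_wide (@3)) == 0 guarantees, and it is also why foo and bar compute the same value.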

PR target/108477

gcc/ChangeLog:

* match.pd (A & CST1 | B & CST2 -> A & CST1 + B & CST2):
Do not convert PLUS of two values, ANDed with two constants
that have no bits in common to IOR expression, convert
IOR or XOR of said two ANDed values to PLUS expression.

gcc/testsuite/ChangeLog:

* gcc.target/i386/pr108477.c: New test.

Bootstrapped and regression tested on x86_64-linux-gnu {,-m32}.

OK for mainline?

Uros.
diff --git a/gcc/match.pd b/gcc/match.pd
index 7b4b15acc41..deac18a7635 100644
--- a/gcc/match.pd
+++ b/gcc/match.pd
@@ -1830,18 +1830,18 @@ DEFINE_INT_AND_FLOAT_ROUND_FN (RINT)
&& element_precision (type) <= element_precision (TREE_TYPE (@1)))
(bit_not (rop (convert @0) (convert @1))
 
-/* If we are XORing or adding two BIT_AND_EXPR's, both of which are and'ing
+/* If we are ORing or XORing two BIT_AND_EXPR's, both of which are and'ing
with a constant, and the two constants have no bits in common,
-   we should treat this as a BIT_IOR_EXPR since this may produce more
+   we should treat this as a PLUS_EXPR since this may produce more
simplifications.  */
-(for op (bit_xor plus)
+(for op (bit_ior bit_xor)
  (simplify
   (op (convert1? (bit_and@4 @0 INTEGER_CST@1))
   (convert2? (bit_and@5 @2 INTEGER_CST@3)))
   (if (tree_nop_conversion_p (type, TREE_TYPE (@0))
&& tree_nop_conversion_p (type, TREE_TYPE (@2))
&& (wi::to_wide (@1) & wi::to_wide (@3)) == 0)
-   (bit_ior (convert @4) (convert @5)
+   (plus (convert @4) (convert @5)
 
 /* (X | Y) ^ X -> Y & ~ X*/
 (simplify
diff --git a/gcc/testsuite/gcc.target/i386/pr108477.c b/gcc/testsuite/gcc.target/i386/pr108477.c
new file mode 100644
index 000..fb320a84c6d
--- /dev/null
+++ b/gcc/testsuite/gcc.target/i386/pr108477.c
@@ -0,0 +1,13 @@
+/* PR target/108477 */
+/* { dg-do compile } */
+/* { dg-options "-O2 -masm=att" } */
+
+unsigned int foo (unsigned int a, unsigned int b)
+{
+  unsigned int r = a & 0x1;
+  unsigned int p = b & ~0x3;
+
+  return r + p + 2;
+}
+
+/* { dg-final { scan-assembler-not "orl" } } */


Re: [PATCH v3 2/3] libatomic: Enable LSE128 128-bit atomics for armv9.4-a

2024-01-08 Thread Richard Sandiford
Wilco Dijkstra  writes:
> Hi,
>
>>> Is there no benefit to using SWPPL for RELEASE here?  Similarly for the
>>> others.
>>
>> We started off implementing all possible memory orderings available.
>> Wilco saw value in merging less restricted orderings into more
>> restricted ones - mainly to reduce codesize in less frequently used atomics.
>>
>> This saw us combine RELEASE and ACQ_REL/SEQ_CST cases to make functions
>> a little smaller.
>
> Benchmarking showed that LSE and LSE2 RMW atomics have similar performance once
> the atomic is acquire, release or both. Given there is already a significant overhead due
> to the function call, PLT indirection and argument setup, it doesn't make sense to add
> extra taken branches that may mispredict or cause extra fetch cycles...

Thanks for the extra context, especially wrt the LSE/LSE2 benchmarking.
If there isn't any difference for acquire vs. the rest, is there a
justification we can use for keeping the acquire branch, rather than
using SWPAL for everything except relaxed?

If so, then Victor, could you include that in the explanation above and
add it as a source comment?  Although maybe tone down "doesn't make
sense to add" to something like "doesn't seem worth adding". :)

Richard
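The merging described above can be modelled in portable C11: the six memory orderings collapse into three code paths, so only relaxed, acquire and a full acquire-release variant need separate instruction sequences. This is a hypothetical sketch (the names `collapse_order` and `exchange_u64` are invented here). It is not the libatomic source, and it uses 64-bit `<stdatomic.h>` operations rather than LSE128 instructions:

```c
#include <assert.h>
#include <stdatomic.h>
#include <stdint.h>

/* Collapse the C11 orderings into the three variants a hypothetical
   out-of-line helper implements: relaxed, acquire, and one full-barrier
   variant shared by release, acq_rel and seq_cst.  */
static memory_order collapse_order(memory_order mo)
{
    switch (mo) {
    case memory_order_relaxed:
        return memory_order_relaxed;
    case memory_order_consume:
    case memory_order_acquire:
        return memory_order_acquire;
    default: /* release, acq_rel, seq_cst */
        return memory_order_seq_cst;
    }
}

/* Exchange with only three branches instead of one per ordering.  */
static uint64_t exchange_u64(_Atomic uint64_t *obj, uint64_t desired,
                             memory_order mo)
{
    switch (collapse_order(mo)) {
    case memory_order_relaxed:
        return atomic_exchange_explicit(obj, desired, memory_order_relaxed);
    case memory_order_acquire:
        return atomic_exchange_explicit(obj, desired, memory_order_acquire);
    default:
        return atomic_exchange_explicit(obj, desired, memory_order_seq_cst);
    }
}
```

Routing `memory_order_release` onto the strongest path trades a possibly cheaper release-only instruction for one fewer taken branch in the helper, which is the trade-off under discussion.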


RE: [PATCH 2/2] arm: Add cortex-m52 doc

2024-01-08 Thread Kyrylo Tkachov


> -Original Message-
> From: Chung-Ju Wu 
> Sent: Monday, January 8, 2024 6:17 AM
> To: gcc-patches ; Kyrylo Tkachov
> ; Richard Earnshaw 
> Cc: jason...@anshingtek.com.tw
> Subject: [PATCH 2/2] arm: Add cortex-m52 doc
> 
> Hi,
> 
> This is the patch to add cortex-m52 in the Arm-related options
> sections of the gcc invoke.texi documentation.
> 
> Is it OK for trunk?

In the ChangeLog entry:
gcc/ChangeLog:

* doc/invoke.texi: Update docs.

Let's be more specific and specify something like
* doc/invoke.texi (Arm Options): Document Cortex-m52 options.

Ok with a better ChangeLog entry.
Thanks,
Kyrill


> 
> Regards,
> jasonwucj


RE: [PATCH 1/2] arm: Add cortex-m52 core

2024-01-08 Thread Kyrylo Tkachov
Hi jasonwucj,

> -Original Message-
> From: Chung-Ju Wu 
> Sent: Monday, January 8, 2024 6:16 AM
> To: gcc-patches ; Kyrylo Tkachov
> ; Richard Earnshaw 
> Cc: jason...@anshingtek.com.tw
> Subject: [PATCH 1/2] arm: Add cortex-m52 core
> 
> Hi,
> 
> Recently, Arm announced the Cortex-M52, delivering increased performance
> in DSP and ML along with a range of other features and benefits.
> For the completeness of the Arm ecosystem, we hope that cortex-m52 support
> could be available in gcc-14.
> 
> Attached is the patch to support cortex-m52 cpu with MVE and PACBTI enabled in
> GCC.
> Bootstrapped and tested on arm-none-eabi.
> 
> Is it OK for trunk?

The patch looks good to me. It should be safe to include it in GCC 14 as it doesn’t add any new logic beyond a new entry in arm-cpus.in.
Do you have commit rights to push it?
Thanks,
Kyrill

> 
> Regards,
> jasonwucj


Re: [Patch] GCN: Add pre-initial support for gfx1100

2024-01-08 Thread Tobias Burnus

Hi Andrew,

Andrew Stubbs wrote:
>> OK for mainline?
>
> This looks fine to me. I know there will be things that need fixing for
> both experimental architectures.

Indeed. I tried to be a bit more verbose also to avoid too high
expectations by occasional gcc-patches@ readers.

> P.S. Apologies, but I think my commits today conflict a little; you
> should be able to drop the hunks that patch deleted code.

I did so, but I then realized that I should have also added gfx1100 to
the new chunk.

Committed as r14-7006-g97a52f69d209f6 (see attachment), as a follow-up to
the original r14-7005-g52a2c659ae6c21.


Tobias

commit 97a52f69d209f69e755ffad6897c7176da9ac686
Author: Tobias Burnus 
Date:   Mon Jan 8 15:18:10 2024 +0100

amdgcn: Add gfx1100 to new XNACK defaults in mkoffload

Commit r14-6997-g78dff4c25c1b95 added an arch-dependent
SET_XNACK_OFF vs. SET_XNACK_ANY check; that was added
between writing and committing the add-gfx1100
commit r14-7005-g52a2c659ae6c21, and I forgot to add
gfx1100 there.

gcc/ChangeLog:

* config/gcn/mkoffload.cc (main): Handle gfx1100
when setting the default XNACK.
---
 gcc/config/gcn/mkoffload.cc | 1 +
 1 file changed, 1 insertion(+)

diff --git a/gcc/config/gcn/mkoffload.cc b/gcc/config/gcn/mkoffload.cc
index 2cd201d56ca..d4cd509089e 100644
--- a/gcc/config/gcn/mkoffload.cc
+++ b/gcc/config/gcn/mkoffload.cc
@@ -1018,6 +1018,7 @@ main (int argc, char **argv)
 case EF_AMDGPU_MACH_AMDGCN_GFX906:
 case EF_AMDGPU_MACH_AMDGCN_GFX908:
 case EF_AMDGPU_MACH_AMDGCN_GFX1030:
+case EF_AMDGPU_MACH_AMDGCN_GFX1100:
   SET_XNACK_OFF (elf_flags);
   break;
 case EF_AMDGPU_MACH_AMDGCN_GFX90a:


RE: [PATCH]middle-end: check if target can do extract first for early breaks [PR113199]

2024-01-08 Thread Tamar Christina
> -Original Message-
> From: Richard Biener 
> Sent: Monday, January 8, 2024 12:48 PM
> To: Tamar Christina 
> Cc: gcc-patches@gcc.gnu.org; nd ; j...@ventanamicro.com
> Subject: Re: [PATCH]middle-end: check if target can do extract first for early breaks [PR113199]
> 
> On Tue, 2 Jan 2024, Tamar Christina wrote:
> 
> > Hi All,
> >
> > I was generating the vector reverse mask without checking if the target
> > actually supported such an operation.
> >
> > It also seems like more targets implement VEC_EXTRACT than permute on mask
> > registers.
> >
> > So this adds a check for IFN_VEC_EXTRACT support when required and changes
> > the select first code to use it.
> >
> > This is good for now since masks always come from whilelo.  But in the future
> > when masks can come from other sources we will need the old code back.
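Why the mask can be ignored for now: a WHILELO-style mask activates lanes 0..k-1, so whenever any lane is active, lane 0 is. A scalar model in plain C shows that the first active lane is then always index 0. The vectorization factor `VF` and the helper names are hypothetical and not GCC internals:

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>

#define VF 8 /* hypothetical number of vector lanes */

/* Model of WHILELO: lane i is active iff base + i < limit.  */
static void whilelo(bool mask[VF], size_t base, size_t limit)
{
    for (size_t i = 0; i < VF; i++)
        mask[i] = base + i < limit;
}

/* Number of inactive lanes before the first active one (VF if none).  */
static size_t first_active_lane(const bool mask[VF])
{
    size_t i = 0;
    while (i < VF && !mask[i])
        i++;
    return i;
}

/* EXTRACT_FIRST under a mask; at least one lane must be active.  */
static int extract_first(const int vec[VF], const bool mask[VF])
{
    size_t first = first_active_lane(mask);
    assert(first < VF);
    return vec[first];
}
```

Once masks can come from other sources, such as a comparison, the first active lane is no longer fixed at 0, and the general approach of reversing the vector and mask and extracting the last element would be needed again.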
> >
> > Bootstrapped Regtested on aarch64-none-linux-gnu, x86_64-pc-linux-gnu
> > and no issues with --enable-checking=release --enable-lto
> > --with-build-config=bootstrap-O3 --enable-checking=yes,rtl,extra.
> > tested on cross cc1 for amdgcn-amdhsa and issue fixed.
> >
> > Ok for master?
> >
> > Thanks,
> > Tamar
> >
> > gcc/ChangeLog:
> >
> > PR tree-optimization/113199
> > * tree-vect-loop.cc (vectorizable_live_operation_1): Use
> > IFN_VEC_EXTRACT.
> > (vectorizable_live_operation): Check for IFN_VEC_EXTRACT support.
> >
> > gcc/testsuite/ChangeLog:
> >
> > PR tree-optimization/113199
> > * gcc.target/gcn/pr113199.c: New test.
> >
> > --- inline copy of patch --
> > diff --git a/gcc/testsuite/gcc.target/gcn/pr113199.c b/gcc/testsuite/gcc.target/gcn/pr113199.c
> > new file mode 100644
> > index
> ..8a641e5536e80e207ca01
> 63cac66c0f4f6ca93f7
> > --- /dev/null
> > +++ b/gcc/testsuite/gcc.target/gcn/pr113199.c
> > @@ -0,0 +1,44 @@
> > +/* { dg-do compile } */
> > +/* { dg-additional-options "-O2" } */
> > +
> > +typedef long unsigned int size_t;
> > +typedef int wchar_t;
> > +struct tm
> > +{
> > +  int tm_mon;
> > +  int tm_year;
> > +};
> > +int abs (int);
> > +struct lc_time_T { const char *month[12]; };
> > +struct __locale_t * __get_current_locale (void) { }
> > +const struct lc_time_T * __get_time_locale (struct __locale_t *locale) { }
> > +const wchar_t * __ctloc (wchar_t *buf, const char *elem, size_t *len_ret) { return buf; }
> > +size_t
> > +__strftime (wchar_t *s, size_t maxsize, const wchar_t *format,
> > + const struct tm *tim_p, struct __locale_t *locale)
> > +{
> > +  size_t count = 0;
> > +  const wchar_t *ctloc;
> > +  wchar_t ctlocbuf[256];
> > +  size_t i, ctloclen;
> > +  const struct lc_time_T *_CurrentTimeLocale = __get_time_locale (locale);
> > +{
> > +  switch (*format)
> > + {
> > + case L'B':
> > +   (ctloc = __ctloc (ctlocbuf, _CurrentTimeLocale->month[tim_p->tm_mon], &ctloclen));
> > +   for (i = 0; i < ctloclen; i++)
> > + {
> > +   if (count < maxsize - 1)
> > +  s[count++] = ctloc[i];
> > +   else
> > +  return 0;
> > +   {
> > +  int century = tim_p->tm_year >= 0
> > +? tim_p->tm_year / 100 + 1900 / 100
> > +: abs (tim_p->tm_year + 1900) / 100;
> > +   }
> > +   }
> > + }
> > +}
> > +}
> > diff --git a/gcc/tree-vect-loop.cc b/gcc/tree-vect-loop.cc
> > index 37f1be1101ffae779214056a0886411e0683e887..5aa92e67444e7aacf458fffa1428f1983c482374 100644
> > --- a/gcc/tree-vect-loop.cc
> > +++ b/gcc/tree-vect-loop.cc
> > @@ -10648,36 +10648,18 @@ vectorizable_live_operation_1 (loop_vec_info loop_vinfo,
> >   _VINFO_MASKS (loop_vinfo),
> >   1, vectype, 0);
> >tree scalar_res;
> > +  gimple_seq_add_seq (, tem);
> >
> >/* For an inverted control flow with early breaks we want EXTRACT_FIRST
> > -instead of EXTRACT_LAST.  Emulate by reversing the vector and mask. */
> > +instead of EXTRACT_LAST.  For now since the mask always comes from a
> > +WHILELO we can get the first element ignoring the mask since CLZ of the
> > +mask will always be zero.  */
> >if (restart_loop && LOOP_VINFO_EARLY_BREAKS (loop_vinfo))
> > -   {
> > - /* First create the permuted mask.  */
> > - tree perm_mask = perm_mask_for_reverse (TREE_TYPE (mask));
> > - tree perm_dest = copy_ssa_name (mask);
> > - gimple *perm_stmt
> > -   = gimple_build_assign (perm_dest, VEC_PERM_EXPR, mask,
> > -  mask, perm_mask);
> > - vect_finish_stmt_generation (loop_vinfo, stmt_info, perm_stmt,
> > -  );
> > - mask = perm_dest;
> > -
> > - /* Then permute the vector contents.  */
> > - tree perm_elem = perm_mask_for_reverse (vectype);
> > - perm_dest = copy_ssa_name (vec_lhs_phi);
> > - perm_stmt
> > -   = gimple_build_assign (perm_dest, VEC_PERM_EXPR, vec_lhs_phi,
> > -  vec_lhs_phi, perm_elem);
> > - 

RE: [PATCH]middle-end: maintain LCSSA form when peeled vector iterations have virtual operands

2024-01-08 Thread Tamar Christina
> -Original Message-
> From: Richard Biener 
> Sent: Monday, January 8, 2024 12:38 PM
> To: Tamar Christina 
> Cc: gcc-patches@gcc.gnu.org; nd ; j...@ventanamicro.com
> Subject: Re: [PATCH]middle-end: maintain LCSSA form when peeled vector iterations have virtual operands
> 
> On Fri, 29 Dec 2023, Tamar Christina wrote:
> 
> > Hi All,
> >
> > This patch fixes several interconnected issues.
> >
> > 1. When picking an exit we wanted to check for niter_desc.may_be_zero not true,
> >    i.e. we want to pick an exit which we know will iterate at least once.
> >    However niter_desc.may_be_zero is not a boolean.  It is a tree that encodes
> >    a boolean value.  !niter_desc.may_be_zero is just checking if we have some
> >    information, not what the information is.  This leads us to pick a more
> >    difficult-to-vectorize exit more often than we should.
> >
> > 2. Because we had this bug, we used to pick an alternative exit much more often,
> >    which showed one issue: when the loop accesses memory and we "invert it" we
> >    would corrupt the VUSE chain.  This is because on a peeled vector iteration
> >    every exit restarts the loop (i.e. they're all early) BUT since we may have
> >    performed a store, the VUSE would need to be updated.  This version maintains
> >    virtual PHIs correctly in these cases.  Note that we can't simply remove all
> >    of them and recreate them because we need the PHI nodes still in the right
> >    order for if skip_vector.
> >
> > 3. Since we're moving the stores to a safe location I don't think we actually
> >    need to analyze whether the store is in range of the memref, because if we
> >    ever get there, we know that the loads must be in range, and if the loads are
> >    in range and we get to the store we know the early breaks were not taken and
> >    so the scalar loop would have done the VF stores too.
> >
> > 4. Instead of searching for where to move stores to, they should always be in
> >    the exit belonging to the latch.  We can only ever delay stores, and even if we
> >    pick a different exit than the latch one as the main one, effects still
> >    happen in program order when vectorized.  If we don't move the stores to the
> >    latch exit but instead to wherever we pick as the "main" exit then we can
> >    perform incorrect memory accesses (luckily these are trapped by verify_ssa).
> >
> > 5. We only used to analyze loads inside the same BB as an early break, and also
> >    we'd never analyze the ones inside the block where we'd be moving memory
> >    references to.  This is obviously bogus and to fix it this patch splits apart
> >    the two constraints.  We first validate that all load memory references are
> >    in bounds and only after that do we perform the alias checks for the writes.
> >    This makes the code simpler to understand and more trivially correct.
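Point 1 is the classic mistake of testing whether a pointer is present instead of what it points to. The plain C model below uses invented types (`struct expr`, `struct niter_desc_model`) and is not GCC's niter descriptor; inside GCC one would instead inspect the tree itself, for example with a predicate such as `integer_zerop`:

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>

/* Invented stand-in for an expression that may be a boolean constant.  */
struct expr {
    bool is_constant;
    bool value; /* meaningful only when is_constant is true */
};

/* Invented stand-in for a descriptor with a may_be_zero field.  */
struct niter_desc_model {
    const struct expr *may_be_zero; /* NULL when nothing is known */
};

/* Buggy: tests whether the field is present, not what it says.  */
static bool exit_iterates_buggy(const struct niter_desc_model *d)
{
    return !d->may_be_zero;
}

/* Intended: the exit is known to run at least once only if may_be_zero
   is present and is the constant false.  */
static bool exit_iterates_fixed(const struct niter_desc_model *d)
{
    return d->may_be_zero != NULL && d->may_be_zero->is_constant
           && !d->may_be_zero->value;
}
```

With no information at all, the buggy check claims the exit is safe, and with the definite answer "may_be_zero is false" it rejects the exit, which is exactly backwards.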
> >
> > Bootstrapped Regtested on aarch64-none-linux-gnu, x86_64-pc-linux-gnu
> > and no issues with --enable-checking=release --enable-lto
> > --with-build-config=bootstrap-O3 --enable-checking=yes,rtl,extra.
> >
> > Ok for master?
> >
> > Thanks,
> > Tamar
> >
> > gcc/ChangeLog:
> >
> > PR tree-optimization/113137
> > PR tree-optimization/113136
> > PR tree-optimization/113172
> > * tree-vect-data-refs.cc (vect_analyze_early_break_dependences):
> > * tree-vect-loop-manip.cc (slpeel_tree_duplicate_loop_to_edge_cfg):
> > (vect_do_peeling): Maintain virtual PHIs on inverted loops.
> > * tree-vect-loop.cc (vec_init_loop_exit_info): Pick exit closest to
> > latch.
> > (vect_create_loop_vinfo): Record all conds instead of only alt ones.
> > * tree-vectorizer.h: Fix comment
> >
> > gcc/testsuite/ChangeLog:
> >
> > PR tree-optimization/113137
> > PR tree-optimization/113136
> > PR tree-optimization/113172
> > * g++.dg/vect/vect-early-break_4-pr113137.cc: New test.
> > * g++.dg/vect/vect-early-break_5-pr113137.cc: New test.
> > * gcc.dg/vect/vect-early-break_95-pr113137.c: New test.
> > * gcc.dg/vect/vect-early-break_96-pr113136.c: New test.
> > * gcc.dg/vect/vect-early-break_97-pr113172.c: New test.
> >
> > --- inline copy of patch --
> > diff --git a/gcc/testsuite/g++.dg/vect/vect-early-break_4-pr113137.cc b/gcc/testsuite/g++.dg/vect/vect-early-break_4-pr113137.cc
> > new file mode 100644
> > index
> ..f78db8669dcc65f1b45ea7
> 8f4433d175e1138332
> > --- /dev/null
> > +++ b/gcc/testsuite/g++.dg/vect/vect-early-break_4-pr113137.cc
> > @@ -0,0 +1,15 @@
> > +/* { dg-do compile } */
> > +/* { dg-add-options vect_early_break } */
> > +/* { dg-require-effective-target vect_early_break } */
> > +/* { dg-require-effective-target vect_int } */
> > +
> > +int b;
> > +void a() __attribute__((__noreturn__));
> > +void c() {
> > +  char *buf;
> > +  int bufsz = 64;
> > +  

Re: [PATCH] Clarify -mmovbe documentation

2024-01-08 Thread Uros Bizjak
On Mon, Jan 8, 2024 at 10:56 AM Richard Biener  wrote:
>
> It was noticed that -mmovbe doesn't use movbe for __builtin_bswap{32,64}
> when not optimizing.  The following adjusts the documentation to
> say it will be used when optimizing and applies to all byte swaps,
> not just those carried out via builtin function calls.
>
> OK?
>
> Thanks,
> Richard.
>
> * doc/invoke.texi (-mmovbe): Clarify.

OK.

Thanks,
Uros.

> ---
>  gcc/doc/invoke.texi | 4 ++--
>  1 file changed, 2 insertions(+), 2 deletions(-)
>
> diff --git a/gcc/doc/invoke.texi b/gcc/doc/invoke.texi
> index 68d1f364ac0..8cf99f395a5 100644
> --- a/gcc/doc/invoke.texi
> +++ b/gcc/doc/invoke.texi
> @@ -34708,8 +34708,8 @@ see @ref{Other Builtins} for details.
>
>  @opindex mmovbe
>  @item -mmovbe
> -This option enables use of the @code{movbe} instruction to implement
> -@code{__builtin_bswap32} and @code{__builtin_bswap64}.
> +This option enables use of the @code{movbe} instruction to optimize
> +byte swapping of four and eight byte entities.
>
>  @opindex mshstk
>  @item -mshstk
> --
> 2.35.3
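To illustrate the "all byte swaps" wording: both the builtin call and the open-coded shift-and-OR big-endian loads below perform a four- or eight-byte swap. GCC's byte-swap recognition may turn the open-coded form into a single swap, and with `-mmovbe` plus optimization the load and swap may then be emitted as `movbe`; whether that happens depends on the target and flags, so treat this as a sketch to compile and inspect rather than a guarantee. The function names are illustrative:

```c
#include <assert.h>
#include <stdint.h>

/* Builtin form: a 4-byte swap requested explicitly.  */
static uint32_t swap32(uint32_t x)
{
    return __builtin_bswap32(x);
}

/* Open-coded big-endian 4-byte load; independent of host endianness.  */
static uint32_t load_be32(const unsigned char *p)
{
    return ((uint32_t)p[0] << 24) | ((uint32_t)p[1] << 16)
         | ((uint32_t)p[2] << 8) | (uint32_t)p[3];
}

/* Open-coded big-endian 8-byte load built from two 4-byte loads.  */
static uint64_t load_be64(const unsigned char *p)
{
    return ((uint64_t)load_be32(p) << 32) | load_be32(p + 4);
}
```

Compiling with something like `gcc -O2 -mmovbe -S` and reading the generated assembly is a quick way to see what a given GCC version actually does.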


[PATCH 5/5] RISC-V: Document the syntax of -march

2024-01-08 Thread Kito Cheng
---
 gcc/doc/invoke.texi | 16 
 1 file changed, 16 insertions(+)

diff --git a/gcc/doc/invoke.texi b/gcc/doc/invoke.texi
index 68d1f364ac0..81ee7ac758a 100644
--- a/gcc/doc/invoke.texi
+++ b/gcc/doc/invoke.texi
@@ -30037,6 +30037,22 @@ Generate code for given RISC-V ISA (e.g.@: @samp{rv64im}).  ISA strings must be
 lower-case.  Examples include @samp{rv64i}, @samp{rv32g}, @samp{rv32e}, and
 @samp{rv32imaf}.
 
+The syntax of the ISA string is defined as follows:
+
+@table @code
+@item The string must start with @samp{rv32} or @samp{rv64}, followed by
+@samp{i}, @samp{e}, or @samp{g}, referred to as the base ISA.
+@item The subsequent part of the string is a list of extension names. Extension
+names can be categorized as multi-letter (e.g.@: @samp{zba}) and single-letter
+(e.g.@: @samp{v}). Single-letter extensions can appear consecutively,
+but multi-letter extensions must be separated by underscores.
+@item An underscore can appear anywhere after the base ISA. It has no specific
+effect but is used to improve readability and can act as a separator.
+@item Extension names may include an optional version number, following the
+syntax @samp{p} or @samp{}, (e.g.@: @samp{m2p1} or
+@samp{m2}).
+@end table
+
 When @option{-march=} is not specified, use the setting from @option{-mcpu}.
 
 If both @option{-march} and @option{-mcpu=} are not specified, the default for
-- 
2.34.1
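A minimal tokenizer following these rules may help when reading the table: it checks the base ISA, treats underscores as pure separators, lets a multi-letter extension run until the next underscore, and lets a single-letter extension take an optional `<major>[p<minor>]` suffix. It is a hypothetical sketch (`split_isa` and `skip_version` are invented names), not GCC's `riscv_subset_list` parser, and it does not check whether an extension is actually supported:

```c
#include <assert.h>
#include <ctype.h>
#include <stddef.h>
#include <string.h>

#define MAX_EXT_LEN 32

/* Skip an optional "<major>[p<minor>]" version suffix.  */
static const char *skip_version(const char *p)
{
    if (!isdigit((unsigned char)*p))
        return p; /* no version at all */
    while (isdigit((unsigned char)*p))
        p++;
    if (*p == 'p' && isdigit((unsigned char)p[1])) {
        p++;
        while (isdigit((unsigned char)*p))
            p++;
    }
    return p;
}

/* Split an ISA string into XLEN, base letter and extension tokens.
   Returns the number of tokens, or -1 on a malformed base or overflow.  */
static int split_isa(const char *isa, int *xlen, char *base,
                     char exts[][MAX_EXT_LEN], int max_exts)
{
    if (strncmp(isa, "rv32", 4) == 0)
        *xlen = 32;
    else if (strncmp(isa, "rv64", 4) == 0)
        *xlen = 64;
    else
        return -1;

    const char *p = isa + 4;
    if (*p != 'i' && *p != 'e' && *p != 'g')
        return -1;
    *base = *p++;
    p = skip_version(p); /* the base ISA may carry a version too */

    int n = 0;
    while (*p != '\0') {
        if (*p == '_') { /* underscores only separate */
            p++;
            continue;
        }
        const char *start = p;
        if (*p == 'z' || *p == 's' || *p == 'x') {
            /* Multi-letter: runs until the next '_' or the end.  */
            while (*p != '\0' && *p != '_')
                p++;
        } else {
            /* Single letter plus optional version.  */
            p = skip_version(p + 1);
        }
        size_t len = (size_t)(p - start);
        if (n >= max_exts || len >= MAX_EXT_LEN)
            return -1;
        memcpy(exts[n], start, len);
        exts[n][len] = '\0';
        n++;
    }
    return n;
}
```

For example, `split_isa ("rv64gc_zba_zbs_svnapot_zve64d_zvl128b", ...)` yields the tokens `c`, `zba`, `zbs`, `svnapot`, `zve64d` and `zvl128b`. Because a multi-letter token only ends at an underscore, writing `zbac` would be read as a single token, which is why the separator is mandatory after multi-letter names.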



[PATCH 4/5] RISC-V: Update testsuite due to -march string relaxation

2024-01-08 Thread Kito Cheng
We have relaxed the -march string so that it no longer requires canonical
order, so we need to update some of those testcases.

gcc/testsuite/ChangeLog:

* gcc.target/riscv/arch-23.c: Update test.
* gcc.target/riscv/arch-27.c: Ditto.
* gcc.target/riscv/arch-28.c: Ditto.
* gcc.target/riscv/attribute-10.c: Ditto.
---
 gcc/testsuite/gcc.target/riscv/arch-23.c  | 1 -
 gcc/testsuite/gcc.target/riscv/arch-27.c  | 2 +-
 gcc/testsuite/gcc.target/riscv/arch-28.c  | 2 +-
 gcc/testsuite/gcc.target/riscv/attribute-10.c | 4 +++-
 4 files changed, 5 insertions(+), 4 deletions(-)

diff --git a/gcc/testsuite/gcc.target/riscv/arch-23.c b/gcc/testsuite/gcc.target/riscv/arch-23.c
index fca5425790c..aacfc451043 100644
--- a/gcc/testsuite/gcc.target/riscv/arch-23.c
+++ b/gcc/testsuite/gcc.target/riscv/arch-23.c
@@ -4,7 +4,6 @@ int foo()
 {
 }
 
-/* { dg-error "ISA string is not in canonical order. 'c'" "" { target *-*-* } 0 } */
 /* { dg-error "extension 'w' is unsupported standard single letter extension" "" { target *-*-* } 0 } */
 /* { dg-error "extension 'zvl' starts with 'z' but is unsupported standard extension" "" { target *-*-* } 0 } */
 /* { dg-error "extension 's123' starts with 's' but is unsupported standard supervisor extension" "" { target *-*-* } 0 } */
diff --git a/gcc/testsuite/gcc.target/riscv/arch-27.c b/gcc/testsuite/gcc.target/riscv/arch-27.c
index 70143b2156f..03f07deedd1 100644
--- a/gcc/testsuite/gcc.target/riscv/arch-27.c
+++ b/gcc/testsuite/gcc.target/riscv/arch-27.c
@@ -4,4 +4,4 @@ int foo()
 {
 }
 
-/* { dg-error "ISA string is not in canonical order. 'e'" "" { target *-*-* } 0 } */
+/* { dg-error "'i', 'e' or 'g' must be the first extension" "" { target *-*-* } 0 } */
diff --git a/gcc/testsuite/gcc.target/riscv/arch-28.c b/gcc/testsuite/gcc.target/riscv/arch-28.c
index 934399a7b3a..0f83c03ad3d 100644
--- a/gcc/testsuite/gcc.target/riscv/arch-28.c
+++ b/gcc/testsuite/gcc.target/riscv/arch-28.c
@@ -4,4 +4,4 @@ int foo()
 {
 }
 
-/* { dg-error "ISA string is not in canonical order. 'e'" "" { target *-*-* } 0 } */
+/* { dg-error "'i', 'e' or 'g' must be the first extension" "" { target *-*-* } 0 } */
diff --git a/gcc/testsuite/gcc.target/riscv/attribute-10.c b/gcc/testsuite/gcc.target/riscv/attribute-10.c
index 868adef6ab7..8a7f0a8ac49 100644
--- a/gcc/testsuite/gcc.target/riscv/attribute-10.c
+++ b/gcc/testsuite/gcc.target/riscv/attribute-10.c
@@ -3,4 +3,6 @@
 int foo()
 {
 }
-/* { dg-error "unexpected ISA string at end:" "" { target { "riscv*-*-*" } } 0 } */
+/* { dg-error "extension 'u' is unsupported standard single letter extension" "" { target { "riscv*-*-*" } } 0 } */
+/* { dg-error "extension 'n' is unsupported standard single letter extension" "" { target { "riscv*-*-*" } } 0 } */
+/* { dg-error "'i', 'e' or 'g' must be the first extension" "" { target { "riscv*-*-*" } } 0 } */
-- 
2.34.1



[PATCH 3/5] RISC-V: Remove unused function in riscv_subset_list [NFC]

2024-01-08 Thread Kito Cheng
gcc/ChangeLog:

* common/config/riscv/riscv-common.cc
(riscv_subset_list::parse_std_ext): Remove.
(riscv_subset_list::parse_multiletter_ext): Remove.
* config/riscv/riscv-subset.h
(riscv_subset_list::parse_std_ext): Remove.
(riscv_subset_list::parse_multiletter_ext): Remove.
---
 gcc/common/config/riscv/riscv-common.cc | 179 
 gcc/config/riscv/riscv-subset.h |   4 -
 2 files changed, 183 deletions(-)

diff --git a/gcc/common/config/riscv/riscv-common.cc b/gcc/common/config/riscv/riscv-common.cc
index 891ecfce464..cf1c82c9f5e 100644
--- a/gcc/common/config/riscv/riscv-common.cc
+++ b/gcc/common/config/riscv/riscv-common.cc
@@ -1059,73 +1059,6 @@ riscv_subset_list::parse_base_ext (const char *p)
   return p;
 }
 
-
-/* Parsing function for standard extensions.
-
-   Return Value:
- Points to the end of extensions.
-
-   Arguments:
- `p`: Current parsing position.  */
-
-const char *
-riscv_subset_list::parse_std_ext (const char *p)
-{
-  const char *all_std_exts = riscv_supported_std_ext ();
-  const char *std_exts = all_std_exts;
-
-  unsigned major_version = 0;
-  unsigned minor_version = 0;
-  char std_ext = '\0';
-  bool explicit_version_p = false;
-
-  while (p != NULL && *p)
-{
-  char subset[2] = {0, 0};
-
-  if (*p == 'x' || *p == 's' || *p == 'z')
-   break;
-
-  if (*p == '_')
-   {
- p++;
- continue;
-   }
-
-  std_ext = *p;
-
-  /* Checking canonical order.  */
-  const char *prior_std_exts = std_exts;
-
-  while (*std_exts && std_ext != *std_exts)
-   std_exts++;
-
-  subset[0] = std_ext;
-  if (std_ext != *std_exts && standard_extensions_p (subset))
-   {
- error_at (m_loc,
-   "%<-march=%s%>: ISA string is not in canonical order. "
-   "%<%c%>",
-   m_arch, *p);
- /* Extension ordering is invalid.  Ignore this extension and keep
-searching for other issues with remaining extensions.  */
- std_exts = prior_std_exts;
- p++;
- continue;
-   }
-
-  std_exts++;
-
-  p++;
-
-  p = parsing_subset_version (subset, p, &major_version, &minor_version,
- /* std_ext_p= */ true, &explicit_version_p);
-
-  add (subset, major_version, minor_version, explicit_version_p, false);
-}
-  return p;
-}
-
 /* Parsing function for one standard extensions.
 
Return Value:
@@ -1409,118 +1342,6 @@ riscv_subset_list::parse_single_multiletter_ext (const char *p,
 
 }
 
-/* Parsing function for multi-letter extensions.
-
-   Return Value:
- Points to the end of extensions.
-
-   Arguments:
- `p`: Current parsing position.
- `ext_type`: What kind of extensions, 's', 'z' or 'x'.
- `ext_type_str`: Full name for kind of extension.  */
-
-const char *
-riscv_subset_list::parse_multiletter_ext (const char *p,
- const char *ext_type,
- const char *ext_type_str)
-{
-  unsigned major_version = 0;
-  unsigned minor_version = 0;
-  size_t ext_type_len = strlen (ext_type);
-
-  while (*p)
-{
-  if (*p == '_')
-   {
- p++;
- continue;
-   }
-
-  if (strncmp (p, ext_type, ext_type_len) != 0)
-   break;
-
-  char *subset = xstrdup (p);
-  char *q = subset;
-  const char *end_of_version;
-  bool explicit_version_p = false;
-  char *ext;
-  char backup;
-  size_t len;
-  size_t end_of_version_pos, i;
-  bool found_any_number = false;
-  bool found_minor_version = false;
-
-  /* Parse until end of this extension including version number.  */
-  while (*++q != '\0' && *q != '_')
-   ;
-
-  backup = *q;
-  *q = '\0';
-  len = q - subset;
-  *q = backup;
-
-  end_of_version_pos = len;
-  /* Find the begin of version string.  */
-  for (i = len -1; i > 0; --i)
-   {
- if (ISDIGIT (subset[i]))
-   {
- found_any_number = true;
- continue;
-   }
- /* Might be version seperator, but need to check one more char,
-we only allow p, so we could stop parsing if found
-any more `p`.  */
- if (subset[i] == 'p' &&
- !found_minor_version &&
- found_any_number && ISDIGIT (subset[i-1]))
-   {
- found_minor_version = true;
- continue;
-   }
-
- end_of_version_pos = i + 1;
- break;
-   }
-
-  backup = subset[end_of_version_pos];
-  subset[end_of_version_pos] = '\0';
-  ext = xstrdup (subset);
-  subset[end_of_version_pos] = backup;
-
-  end_of_version
-   = parsing_subset_version (ext, subset + end_of_version_pos, &major_version, &minor_version,
- /* std_ext_p= */ false, &explicit_version_p);
-  free (ext);
-
-  if (end_of_version == 

[PATCH 2/5] RISC-V: Relax the -march string for accept any order

2024-01-08 Thread Kito Cheng
-march previously required canonical order; however, that is not easy for
most users now that we have so many extensions, so this patch relaxes the
constraint: -march accepts the ISA string in any order, with only a few
requirements:

1. Must start with rv[32|64][e|i|g].
2. Multi-letter and single-letter extensions must be separated by
   at least one underscore (`_`).
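Since extensions may now appear in any order, each token is parsed on its own, and a multi-letter token such as `zfh1p0` still has to be split into its name and its optional `<major>[p<minor>]` suffix. The backward scan used for that in this thread's patches (the loop commented "Find the begin of version string") can be modelled as follows; `version_start` is an invented name, and this is a simplified sketch rather than the GCC code:

```c
#include <assert.h>
#include <ctype.h>
#include <stdbool.h>
#include <stddef.h>

/* Return the length of the extension name in ext[0..len), i.e. the index
   where a trailing "<major>[p<minor>]" suffix starts (len if none).
   Digits are skipped from the end; a single 'p' is accepted only when
   digits follow it and a digit precedes it.  */
static size_t version_start(const char *ext, size_t len)
{
    bool found_any_number = false;
    bool found_minor_version = false;
    size_t name_len = len;

    if (len < 2)
        return len;
    for (size_t i = len - 1; i > 0; --i) {
        if (isdigit((unsigned char)ext[i])) {
            found_any_number = true;
            continue;
        }
        if (ext[i] == 'p' && !found_minor_version && found_any_number
            && isdigit((unsigned char)ext[i - 1])) {
            found_minor_version = true;
            continue;
        }
        name_len = i + 1; /* first character that belongs to the name */
        break;
    }
    return name_len;
}
```

Names that themselves end in digits are inherently ambiguous under this scheme: with this scan a bare `zve64` would be split into the name `zve` and version `64`.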

gcc/ChangeLog:

* common/config/riscv/riscv-common.cc
(riscv_subset_list::parse_single_std_ext): New parameter.
(riscv_subset_list::parse_single_multiletter_ext): Ditto.
(riscv_subset_list::parse_single_ext): Ditto.
(riscv_subset_list::parse): Relax the order for the input of ISA
string.
* config/riscv/riscv-subset.h
(riscv_subset_list::parse_single_std_ext): New parameter.
(riscv_subset_list::parse_single_multiletter_ext): Ditto.
(riscv_subset_list::parse_single_ext): Ditto.

gcc/testsuite/ChangeLog:

* gcc.target/riscv/arch-33.c: New.
* gcc.target/riscv/arch-34.c: New.
---
 gcc/common/config/riscv/riscv-common.cc  | 91 ++--
 gcc/config/riscv/riscv-subset.h  |  6 +-
 gcc/testsuite/gcc.target/riscv/arch-33.c |  5 ++
 gcc/testsuite/gcc.target/riscv/arch-34.c |  5 ++
 4 files changed, 67 insertions(+), 40 deletions(-)
 create mode 100644 gcc/testsuite/gcc.target/riscv/arch-33.c
 create mode 100644 gcc/testsuite/gcc.target/riscv/arch-34.c

diff --git a/gcc/common/config/riscv/riscv-common.cc b/gcc/common/config/riscv/riscv-common.cc
index f0359380451..891ecfce464 100644
--- a/gcc/common/config/riscv/riscv-common.cc
+++ b/gcc/common/config/riscv/riscv-common.cc
@@ -1132,10 +1132,12 @@ riscv_subset_list::parse_std_ext (const char *p)
  Points to the end of extensions.
 
Arguments:
- `p`: Current parsing position.  */
+ `p`: Current parsing position.
+ `exact_single_p`: True if the input string is exactly one extension
+ ending with '\0'.  */
 
 const char *
-riscv_subset_list::parse_single_std_ext (const char *p)
+riscv_subset_list::parse_single_std_ext (const char *p, bool exact_single_p)
 {
   if (*p == 'x' || *p == 's' || *p == 'z')
 {
@@ -1146,6 +1148,11 @@ riscv_subset_list::parse_single_std_ext (const char *p)
   return nullptr;
 }
 
+  if (exact_single_p && strlen (p) > 1)
+{
+  return nullptr;
+}
+
   unsigned major_version = 0;
   unsigned minor_version = 0;
   bool explicit_version_p = false;
@@ -1296,13 +1303,16 @@ riscv_subset_list::check_conflict_ext ()
Arguments:
  `p`: Current parsing position.
  `ext_type`: What kind of extensions, 's', 'z' or 'x'.
- `ext_type_str`: Full name for kind of extension.  */
+ `ext_type_str`: Full name for kind of extension.
+ `exact_single_p`: True if the input string is exactly one extension
+ ending with '\0'.   */
 
 
 const char *
 riscv_subset_list::parse_single_multiletter_ext (const char *p,
 const char *ext_type,
-const char *ext_type_str)
+const char *ext_type_str,
+bool exact_single_p)
 {
   unsigned major_version = 0;
   unsigned minor_version = 0;
@@ -1314,6 +1324,7 @@ riscv_subset_list::parse_single_multiletter_ext (const char *p,
   char *subset = xstrdup (p);
   const char *end_of_version;
   bool explicit_version_p = false;
+  char *q = subset;
   char *ext;
   char backup;
   size_t len = strlen (p);
@@ -1321,6 +1332,17 @@ riscv_subset_list::parse_single_multiletter_ext (const char *p,
   bool found_any_number = false;
   bool found_minor_version = false;
 
+  if (!exact_single_p)
+{
+  /* The extension may not end with '\0'; it may be followed by another
+extension joined with '_'.  */
+  /* Parse until end of this extension including version number.  */
+  while (*++q != '\0' && *q != '_')
+   ;
+
+  len = q - subset;
+}
+
   end_of_version_pos = len;
   /* Find the begin of version string.  */
   for (i = len -1; i > 0; --i)
@@ -1505,21 +1527,26 @@ riscv_subset_list::parse_multiletter_ext (const char *p,
  Points to the end of extensions.
 
Arguments:
- `p`: Current parsing position.  */
+ `p`: Current parsing position.
+ `exact_single_p`: True if the input string is exactly one extension
+ ending with '\0'.  */
 
 const char *
-riscv_subset_list::parse_single_ext (const char *p)
+riscv_subset_list::parse_single_ext (const char *p, bool exact_single_p)
 {
   switch (p[0])
 {
 case 'x':
-  return parse_single_multiletter_ext (p, "x", "non-standard extension");
+  return parse_single_multiletter_ext (p, "x", "non-standard extension",
+  exact_single_p);
 case 'z':
-  return parse_single_multiletter_ext (p, "z", "sub-extension");
+  return parse_single_multiletter_ext (p, "z", "sub-extension",
+  

[PATCH 1/5] RISC-V: Extract part parsing base ISA logic into a standalone function [NFC]

2024-01-08 Thread Kito Cheng
Minor refactor, preparation for further change.

gcc/ChangeLog:

* common/config/riscv/riscv-common.cc
(riscv_subset_list::parse_base_ext): New.
(riscv_subset_list::parse): Extract part of logic into
riscv_subset_list::parse_base_ext.
* config/riscv/riscv-subset.h (riscv_subset_list::parse_base_ext):
New.
---
 gcc/common/config/riscv/riscv-common.cc | 68 -
 gcc/config/riscv/riscv-subset.h |  2 +
 2 files changed, 47 insertions(+), 23 deletions(-)

diff --git a/gcc/common/config/riscv/riscv-common.cc b/gcc/common/config/riscv/riscv-common.cc
index 0301d170a41..f0359380451 100644
--- a/gcc/common/config/riscv/riscv-common.cc
+++ b/gcc/common/config/riscv/riscv-common.cc
@@ -970,25 +970,38 @@ riscv_subset_list::parsing_subset_version (const char *ext,
   return p;
 }
 
-/* Parsing function for standard extensions.
+/* Parsing function for base extensions, rv[32|64][i|e|g]
 
Return Value:
- Points to the end of extensions.
+ Points to the end of extensions, return NULL if any error.
 
Arguments:
  `p`: Current parsing position.  */
-
 const char *
-riscv_subset_list::parse_std_ext (const char *p)
+riscv_subset_list::parse_base_ext (const char *p)
 {
-  const char *all_std_exts = riscv_supported_std_ext ();
-  const char *std_exts = all_std_exts;
-
   unsigned major_version = 0;
   unsigned minor_version = 0;
   char std_ext = '\0';
   bool explicit_version_p = false;
 
+  if (startswith (p, "rv32"))
+{
+  m_xlen = 32;
+  p += 4;
+}
+  else if (startswith (p, "rv64"))
+{
+  m_xlen = 64;
+  p += 4;
+}
+  else
+{
+  error_at (m_loc, "%<-march=%s%>: ISA string must begin with rv32 or rv64",
+   m_arch);
+  return NULL;
+}
+
   /* First letter must start with i, e or g.  */
   switch (*p)
 {
@@ -1043,6 +1056,28 @@ riscv_subset_list::parse_std_ext (const char *p)
"% or %", m_arch);
   return NULL;
 }
+  return p;
+}
+
+
+/* Parsing function for standard extensions.
+
+   Return Value:
+ Points to the end of extensions.
+
+   Arguments:
+ `p`: Current parsing position.  */
+
+const char *
+riscv_subset_list::parse_std_ext (const char *p)
+{
+  const char *all_std_exts = riscv_supported_std_ext ();
+  const char *std_exts = all_std_exts;
+
+  unsigned major_version = 0;
+  unsigned minor_version = 0;
+  char std_ext = '\0';
+  bool explicit_version_p = false;
 
   while (p != NULL && *p)
 {
@@ -1499,22 +1534,9 @@ riscv_subset_list::parse (const char *arch, location_t loc)
   riscv_subset_list *subset_list = new riscv_subset_list (arch, loc);
   riscv_subset_t *itr;
   const char *p = arch;
-  if (startswith (p, "rv32"))
-{
-  subset_list->m_xlen = 32;
-  p += 4;
-}
-  else if (startswith (p, "rv64"))
-{
-  subset_list->m_xlen = 64;
-  p += 4;
-}
-  else
-{
-  error_at (loc, "%<-march=%s%>: ISA string must begin with rv32 or rv64",
-   arch);
-  goto fail;
-}
+  p = subset_list->parse_base_ext (p);
+  if (p == NULL)
+goto fail;
 
   /* Parsing standard extension.  */
   p = subset_list->parse_std_ext (p);
diff --git a/gcc/config/riscv/riscv-subset.h b/gcc/config/riscv/riscv-subset.h
index 14461838db5..c8117d8daf2 100644
--- a/gcc/config/riscv/riscv-subset.h
+++ b/gcc/config/riscv/riscv-subset.h
@@ -67,6 +67,8 @@ private:
   const char *parsing_subset_version (const char *, const char *, unsigned *,
  unsigned *, bool, bool *);
 
+  const char *parse_base_ext (const char *);
+
   const char *parse_std_ext (const char *);
 
   const char *parse_single_std_ext (const char *);
-- 
2.34.1
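The base-ISA step that this patch moves into parse_base_ext can be sketched as a standalone C function. This is a hypothetical illustration, not GCC code: parse_base and its out-parameters are made-up names, it does not handle version suffixes such as "i2p1", and it signals errors by returning NULL instead of calling error_at.

```c
#include <assert.h>
#include <stddef.h>
#include <string.h>

/* Hypothetical model of the base-ISA step: accept "rv32" or "rv64"
   followed by one of the base letters i, e or g.  Returns a pointer
   just past the base letter, or NULL on error.  Version suffixes are
   not handled in this sketch.  */
static const char *
parse_base (const char *p, unsigned *xlen, char *base)
{
  assert (p != NULL && xlen != NULL && base != NULL);

  if (strncmp (p, "rv32", 4) == 0)
    *xlen = 32;
  else if (strncmp (p, "rv64", 4) == 0)
    *xlen = 64;
  else
    return NULL;            /* ISA string must begin with rv32 or rv64.  */
  p += 4;

  /* First letter must be i, e or g.  */
  switch (*p)
    {
    case 'i':
    case 'e':
    case 'g':
      *base = *p;
      return p + 1;         /* Standard extensions are parsed next.  */
    default:
      return NULL;
    }
}
```

Splitting it out this way lets the caller check for NULL once and then hand the remaining string to the standard-extension parser, which is the shape the refactored riscv_subset_list::parse takes.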



[PATCH 0/5] RISC-V: Relax the -march string for accept any order

2024-01-08 Thread Kito Cheng


Do you know how to build an ISA string with the following extensions?
- g
- c
- zba
- zbs
- svnapot
- zve64d
- zvl128b

No trial and error with your GCC, and no reading the RISC-V ISA spec! OK, I
believe it's impossible for most people. Even though I have worked on RISC-V
for many years and remember most of the canonical-order rules, it's still hard
to get the order right in a short time...

So I think it's time to relax the ordering requirement for -march string input,
since we have so many extensions today. We still keep the canonicalization
within the compiler, because we need it to handle multilibs, and it also makes
it easier to compare different ISA strings.

This patch series is broken into several parts:
1) Small refactor patch
2) Change the way of parsing ISA string.
3) Remove unused functions
4) Update test cases
5) Update document
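To make the ordering problem concrete, here is a hypothetical C helper that sorts extension names into a canonical order and joins them into an -march string. It is not GCC's implementation: the single-letter order string "iegmafdqlcbkjtpvh" (base letters first) and the rules for multi-letter extensions (Z grouped by the letter after 'z' and then alphabetical, followed by S and then X, each alphabetical) are assumptions based on the ISA manual's naming conventions, so check the spec before relying on them.

```c
#include <assert.h>
#include <stdio.h>
#include <stdlib.h>
#include <string.h>

/* ASSUMPTION: canonical order of single-letter extensions, with the
   base letters i/e/g placed first.  GCC's own table may differ.  */
static const char *const std_order = "iegmafdqlcbkjtpvh";

/* Rank of letter C in std_order; unknown letters sort last.  */
static int
std_rank (char c)
{
  const char *p = strchr (std_order, c);
  return (p != NULL && c != '\0') ? (int) (p - std_order) : 100;
}

/* 0: single-letter, 1: z*, 2: s*, 3: x*, 4: anything else.  */
static int
class_rank (const char *ext)
{
  if (ext[1] == '\0')
    return 0;
  switch (ext[0])
    {
    case 'z': return 1;
    case 's': return 2;
    case 'x': return 3;
    default:  return 4;
    }
}

/* qsort comparator implementing the assumed canonical order.  */
static int
cmp_ext (const void *a, const void *b)
{
  const char *x = *(const char *const *) a;
  const char *y = *(const char *const *) b;
  int cx = class_rank (x), cy = class_rank (y);
  if (cx != cy)
    return cx - cy;
  if (cx == 0)
    return std_rank (x[0]) - std_rank (y[0]);
  if (cx == 1)
    {
      /* Z extensions are grouped by the letter after 'z'.  */
      int d = std_rank (x[1]) - std_rank (y[1]);
      if (d != 0)
        return d;
    }
  return strcmp (x, y);
}

/* Sort EXTS in place and write "rv<XLEN>..." into OUT, joining
   multi-letter extensions with '_'.  Returns 0, or -1 if OUT is too
   small.  */
static int
build_isa (int xlen, const char **exts, size_t n, char *out, size_t outlen)
{
  assert (out != NULL && outlen > 0);
  qsort (exts, n, sizeof *exts, cmp_ext);
  int len = snprintf (out, outlen, "rv%d", xlen);
  for (size_t i = 0; i < n && len >= 0 && (size_t) len < outlen; i++)
    {
      const char *sep = exts[i][1] != '\0' ? "_" : "";
      len += snprintf (out + len, outlen - (size_t) len, "%s%s", sep, exts[i]);
    }
  return (len >= 0 && (size_t) len < outlen) ? 0 : -1;
}
```

Under these assumptions, sorting the seven extensions listed above for RV64 gives rv64gc_zba_zbs_zve64d_zvl128b_svnapot, which is exactly the kind of bookkeeping the relaxed parser takes off the user's hands.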






RE: [PATCH]middle-end: rejects loops with nonlinear inductions and early breaks [PR113163]

2024-01-08 Thread Tamar Christina
> -Original Message-
> From: Richard Biener 
> Sent: Monday, January 8, 2024 12:07 PM
> To: Tamar Christina 
> Cc: gcc-patches@gcc.gnu.org; nd ; j...@ventanamicro.com
> Subject: Re: [PATCH]middle-end: rejects loops with nonlinear inductions and early breaks [PR113163]
> 
> On Fri, 29 Dec 2023, Tamar Christina wrote:
> 
> > Hi All,
> >
> > We can't support nonlinear inductions other than neg when vectorizing
> > early breaks and iteration count is known.
> >
> > For early break we currently require a peeled epilog but in these cases
> > we can't compute the remaining values.
> >
> > Bootstrapped Regtested on aarch64-none-linux-gnu and no issues.
> > tested on cross cc1 for amdgcn-amdhsa and issue fixed.
> >
> > Ok for master?
> >
> > Thanks,
> > Tamar
> >
> > gcc/ChangeLog:
> >
> > PR middle-end/113163
> > * tree-vect-loop-manip.cc (vect_can_peel_nonlinear_iv_p):
> 
> Misses sth.
> 
> > gcc/testsuite/ChangeLog:
> >
> > PR middle-end/113163
> > * gcc.target/gcn/pr113163.c: New test.
> >
> > --- inline copy of patch --
> > diff --git a/gcc/testsuite/gcc.target/gcn/pr113163.c b/gcc/testsuite/gcc.target/gcn/pr113163.c
> > new file mode 100644
> > index ..99b0fdbaf3a3152ca008b5109abf6e80d8cb3d6a
> > --- /dev/null
> > +++ b/gcc/testsuite/gcc.target/gcn/pr113163.c
> > @@ -0,0 +1,30 @@
> > +/* { dg-do compile } */
> > +/* { dg-additional-options "-O2 -ftree-vectorize" } */
> > +
> > +struct _reent { union { struct { char _l64a_buf[8]; } _reent; } _new; };
> > +static const char R64_ARRAY[] = "./0123456789ABCDEFGHIJKLMNOPQRSTUVWXYZabcdefghijklmnopqrstuvwxyz";
> > +char *
> > +_l64a_r (struct _reent *rptr,
> > + long value)
> > +{
> > +  char *ptr;
> > +  char *result;
> > +  int i, index;
> > +  unsigned long tmp = (unsigned long)value & 0x;
> > +  result =
> > +  ((
> > +  rptr
> > +  )->_new._reent._l64a_buf)
> > +   ;
> > +  ptr = result;
> > +  for (i = 0; i < 6; ++i)
> > +{
> > +  if (tmp == 0)
> > + {
> > +   *ptr = '\0';
> > +   break;
> > + }
> > +  *ptr++ = R64_ARRAY[index];
> > +  tmp >>= 6;
> > +}
> > +}
> > diff --git a/gcc/tree-vect-loop-manip.cc b/gcc/tree-vect-loop-manip.cc
> > index 3810983a80c8b989be9fd9a9993642069fd39b99..f1bf43b3731868e7b053c186302fbeaf515be8cf 100644
> > --- a/gcc/tree-vect-loop-manip.cc
> > +++ b/gcc/tree-vect-loop-manip.cc
> > @@ -2075,6 +2075,22 @@ vect_can_peel_nonlinear_iv_p (loop_vec_info loop_vinfo,
> >return false;
> >  }
> >
> > +  /* We can't support partial vectors and early breaks with an induction
> > + type other than add or neg since we require the epilog and can't
> > + perform the peeling.  PR113163.  */
> > +  if (LOOP_VINFO_EARLY_BREAKS (loop_vinfo)
> > +  && LOOP_VINFO_VECT_FACTOR (loop_vinfo).is_constant ()
> 
> But why's that only for constant VF?  We might never end up here
> with variable VF but the check looks odd ...

It's mirroring the condition in vect_gen_vector_loop_niters where we
create step_vector which is not 1. This is the case which causes
niters_vector_mult_vf_var to become a tree var instead.

I'll update the comment to say this.

Thanks,
Tamar
> 
> OK with that clarified and/or the test removed.
> 
> Thanks,
> Richard.
> 
> > +  && LOOP_VINFO_USING_PARTIAL_VECTORS_P (loop_vinfo)
> > +  && induction_type != vect_step_op_neg)
> > +{
> > +  if (dump_enabled_p ())
> > +   dump_printf_loc (MSG_MISSED_OPTIMIZATION, vect_location,
> > +"Peeling for epilogue is not supported"
> > +" for nonlinear induction except neg"
> > +" when iteration count is known and early breaks.\n");
> > +  return false;
> > +}
> > +
> >return true;
> >  }
> >
> >
> >
> >
> >
> >
> 
> --
> Richard Biener 
> SUSE Software Solutions Germany GmbH,
> Frankenstrasse 146, 90461 Nuernberg, Germany;
> GF: Ivo Totev, Andrew McDonald, Werner Knoblich; (HRB 36809, AG Nuernberg)


RE: [PATCH] tree-optimization/113026 - avoid vector epilog in more cases

2024-01-08 Thread Tamar Christina
> -Original Message-
> From: Richard Biener 
> Sent: Monday, January 8, 2024 11:29 AM
> To: gcc-patches@gcc.gnu.org
> Cc: Tamar Christina 
> Subject: [PATCH] tree-optimization/113026 - avoid vector epilog in more cases
> 
> The following avoids creating a niter peeling epilog more consistently,
> matching what peeling later uses for the skip_vector condition, in
> particular when versioning is required which then also ensures the
> vector loop is entered unless the epilog is vectorized.  This should
> ideally match LOOP_VINFO_VERSIONING_THRESHOLD which is only computed
> later, some refactoring could make that better matching.
> 
> The patch also makes sure to adjust the upper bound of the epilogues
> when we do not have a skip edge around the vector loop.
> 
> Bootstrapped and tested on x86_64-unknown-linux-gnu.  Tamar, does
> that look OK wrt early-breaks?

Yeah, the value looks correct. I did find a few cases where the niters should
actually be higher for skip_vector, namely when one of the breaks forces
ncopies > 1 and we have a break condition that requires all values to be true
to continue.

The code is not wrong in that case; it just executes a completely useless
vector iteration.

But that's unrelated. This looks correct because it means bound_scalar is not
set, in which case there's no difference between one and multiple exits.

Thanks,
Tamar

> 
> Thanks,
> Richard.
> 
>   PR tree-optimization/113026
>   * tree-vect-loop.cc (vect_need_peeling_or_partial_vectors_p):
>   Avoid an epilog in more cases.
>   * tree-vect-loop-manip.cc (vect_do_peeling): Adjust the
>   epilogues niter upper bounds and estimates.
> 
>   * gcc.dg/torture/pr113026-1.c: New testcase.
>   * gcc.dg/torture/pr113026-2.c: Likewise.
> ---
>  gcc/testsuite/gcc.dg/torture/pr113026-1.c | 11 
>  gcc/testsuite/gcc.dg/torture/pr113026-2.c | 18 +
>  gcc/tree-vect-loop-manip.cc   | 32 +++
>  gcc/tree-vect-loop.cc |  6 -
>  4 files changed, 66 insertions(+), 1 deletion(-)
>  create mode 100644 gcc/testsuite/gcc.dg/torture/pr113026-1.c
>  create mode 100644 gcc/testsuite/gcc.dg/torture/pr113026-2.c
> 
> diff --git a/gcc/testsuite/gcc.dg/torture/pr113026-1.c b/gcc/testsuite/gcc.dg/torture/pr113026-1.c
> new file mode 100644
> index 000..56dfef3b36c
> --- /dev/null
> +++ b/gcc/testsuite/gcc.dg/torture/pr113026-1.c
> @@ -0,0 +1,11 @@
> +/* { dg-do compile } */
> +/* { dg-additional-options "-Wall" } */
> +
> +char dst[16];
> +
> +void
> +foo (char *src, long n)
> +{
> +  for (long i = 0; i < n; i++)
> +dst[i] = src[i]; /* { dg-bogus "" } */
> +}
> diff --git a/gcc/testsuite/gcc.dg/torture/pr113026-2.c b/gcc/testsuite/gcc.dg/torture/pr113026-2.c
> new file mode 100644
> index 000..b9d5857a403
> --- /dev/null
> +++ b/gcc/testsuite/gcc.dg/torture/pr113026-2.c
> @@ -0,0 +1,18 @@
> +/* { dg-do compile } */
> +/* { dg-additional-options "-Wall" } */
> +
> +char dst1[17];
> +void
> +foo1 (char *src, long n)
> +{
> +  for (long i = 0; i < n; i++)
> +dst1[i] = src[i]; /* { dg-bogus "" } */
> +}
> +
> +char dst2[18];
> +void
> +foo2 (char *src, long n)
> +{
> +  for (long i = 0; i < n; i++)
> +dst2[i] = src[i]; /* { dg-bogus "" } */
> +}
> diff --git a/gcc/tree-vect-loop-manip.cc b/gcc/tree-vect-loop-manip.cc
> index 9330183bfb9..927f76a0947 100644
> --- a/gcc/tree-vect-loop-manip.cc
> +++ b/gcc/tree-vect-loop-manip.cc
> @@ -3364,6 +3364,38 @@ vect_do_peeling (loop_vec_info loop_vinfo, tree niters, tree nitersm1,
>   bb_before_epilog->count = single_pred_edge (bb_before_epilog)->count ();
> bb_before_epilog = loop_preheader_edge (epilog)->src;
>   }
> +  else
> + {
> +   /* When we do not have a loop-around edge to the epilog we know
> +  the vector loop covered at least VF scalar iterations unless
> +  we have early breaks and the epilog will cover at most
> +  VF - 1 + gap peeling iterations.
> +  Update any known upper bound with this knowledge.  */
> +   if (! LOOP_VINFO_EARLY_BREAKS (loop_vinfo))
> + {
> +   if (epilog->any_upper_bound)
> + epilog->nb_iterations_upper_bound -= lowest_vf;
> +   if (epilog->any_likely_upper_bound)
> + epilog->nb_iterations_likely_upper_bound -= lowest_vf;
> +   if (epilog->any_estimate)
> + epilog->nb_iterations_estimate -= lowest_vf;
> + }
> +   unsigned HOST_WIDE_INT const_vf;
> +   if (vf.is_constant (_vf))
> + {
> +   const_vf += LOOP_VINFO_PEELING_FOR_GAPS (loop_vinfo) - 1;
> +   if (epilog->any_upper_bound)
> + epilog->nb_iterations_upper_bound
> +   = wi::umin (epilog->nb_iterations_upper_bound, const_vf);
> +   if (epilog->any_likely_upper_bound)
> + epilog->nb_iterations_likely_upper_bound
> +   = wi::umin 
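The bound update in the quoted vect_do_peeling hunk (cut off above) can be modelled with plain integers. This is a hypothetical sketch: epilog_bound and its parameters are illustrative names, BOUND is assumed to be the bound the epilogue inherited from the original loop, it uses saturating uint64_t arithmetic instead of GCC's widest_int bounds, and it applies the VF cap unconditionally, whereas the patch only does so when the vectorization factor is a compile-time constant.

```c
#include <assert.h>
#include <stdint.h>

/* Hypothetical model of the epilogue bound update when there is no
   skip edge around the vector loop.  */
static uint64_t
epilog_bound (uint64_t bound, uint64_t lowest_vf, uint64_t const_vf,
              int peeling_for_gaps, int early_breaks)
{
  assert (const_vf >= 1);

  /* Without early breaks the vector loop ran at least LOWEST_VF
     scalar iterations, so the epilogue runs that many fewer.  */
  if (!early_breaks)
    bound = bound > lowest_vf ? bound - lowest_vf : 0;

  /* The epilogue covers at most VF - 1 iterations, plus one more
     when peeling for gaps.  */
  uint64_t cap = const_vf - 1 + (peeling_for_gaps ? 1 : 0);
  return bound < cap ? bound : cap;
}
```

With early breaks only the cap applies, since the vector loop may exit before completing a full VF worth of iterations; tighter bounds like these are what let the -Wall testcases above stay free of bogus warnings.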

Re: [PATCH v4] aarch64: SVE/NEON Bridging intrinsics

2024-01-08 Thread Jakub Jelinek
On Mon, Dec 11, 2023 at 03:13:03PM +, Richard Ball wrote:
> ACLE has added intrinsics to bridge between SVE and Neon.
> 
> The NEON_SVE Bridge adds intrinsics that allow conversions between NEON and
> SVE vectors.
> 
> This patch adds support to GCC for the following 3 intrinsics:
> svset_neonq, svget_neonq and svdup_neonq

This broke PCH on aarch64; see https://gcc.gnu.org/PR113270
Given that the tree pointers are no longer GC marked, I bet it results in
random crashes elsewhere too, even when not using PCH.

Jakub



[PATCH v5 1/1] RISC-V: Add support for XCVbi extension in CV32E40P

2024-01-08 Thread Mary Bennett
Spec: github.com/openhwgroup/core-v-sw/blob/master/specifications/corev-builtin-spec.md

Contributors:
  Mary Bennett 
  Nandni Jamnadas 
  Pietra Ferreira 
  Charlie Keaney
  Jessica Mills
  Craig Blackmore 
  Simon Cook 
  Jeremy Bennett 
  Helene Chelin 

gcc/ChangeLog:
* common/config/riscv/riscv-common.cc: Create XCVbi extension
  support.
* config/riscv/riscv.opt: Likewise.
* config/riscv/corev.md: Implement cv_branch pattern
  for cv.beqimm and cv.bneimm.
* config/riscv/riscv.md: Add CORE-V branch immediate to RISC-V
  branch instruction pattern.
* config/riscv/constraints.md: Implement constraints
  cv_bi_s5 - signed 5-bit immediate.
* config/riscv/predicates.md: Implement predicate
  const_int5s_operand - signed 5 bit immediate.
* doc/sourcebuild.texi: Add XCVbi documentation.

gcc/testsuite/ChangeLog:
* gcc.target/riscv/cv-bi-beqimm-compile-1.c: New test.
* gcc.target/riscv/cv-bi-beqimm-compile-2.c: New test.
* gcc.target/riscv/cv-bi-bneimm-compile-1.c: New test.
* gcc.target/riscv/cv-bi-bneimm-compile-2.c: New test.
* lib/target-supports.exp: Add proc for XCVbi.
---
 gcc/common/config/riscv/riscv-common.cc   |  2 +
 gcc/config/riscv/constraints.md   |  6 +++
 gcc/config/riscv/corev.md | 37 ++
 gcc/config/riscv/predicates.md|  4 ++
 gcc/config/riscv/riscv.md |  2 +-
 gcc/config/riscv/riscv.opt|  2 +
 gcc/doc/sourcebuild.texi  |  3 ++
 .../gcc.target/riscv/cv-bi-beqimm-compile-1.c | 17 +++
 .../gcc.target/riscv/cv-bi-beqimm-compile-2.c | 48 +++
 .../gcc.target/riscv/cv-bi-bneimm-compile-1.c | 17 +++
 .../gcc.target/riscv/cv-bi-bneimm-compile-2.c | 48 +++
 gcc/testsuite/lib/target-supports.exp | 13 +
 12 files changed, 198 insertions(+), 1 deletion(-)
 create mode 100644 gcc/testsuite/gcc.target/riscv/cv-bi-beqimm-compile-1.c
 create mode 100644 gcc/testsuite/gcc.target/riscv/cv-bi-beqimm-compile-2.c
 create mode 100644 gcc/testsuite/gcc.target/riscv/cv-bi-bneimm-compile-1.c
 create mode 100644 gcc/testsuite/gcc.target/riscv/cv-bi-bneimm-compile-2.c

diff --git a/gcc/common/config/riscv/riscv-common.cc b/gcc/common/config/riscv/riscv-common.cc
index 0301d170a41..d61164a42b9 100644
--- a/gcc/common/config/riscv/riscv-common.cc
+++ b/gcc/common/config/riscv/riscv-common.cc
@@ -355,6 +355,7 @@ static const struct riscv_ext_version riscv_ext_version_table[] =
   {"xcvmac", ISA_SPEC_CLASS_NONE, 1, 0},
   {"xcvalu", ISA_SPEC_CLASS_NONE, 1, 0},
   {"xcvelw", ISA_SPEC_CLASS_NONE, 1, 0},
+  {"xcvbi", ISA_SPEC_CLASS_NONE, 1, 0},
 
   {"xtheadba", ISA_SPEC_CLASS_NONE, 1, 0},
   {"xtheadbb", ISA_SPEC_CLASS_NONE, 1, 0},
@@ -1730,6 +1731,7 @@ static const riscv_ext_flag_table_t riscv_ext_flag_table[] =
   {"xcvmac",_options::x_riscv_xcv_subext, MASK_XCVMAC},
   {"xcvalu",_options::x_riscv_xcv_subext, MASK_XCVALU},
   {"xcvelw",_options::x_riscv_xcv_subext, MASK_XCVELW},
+  {"xcvbi", _options::x_riscv_xcv_subext, MASK_XCVBI},
 
   {"xtheadba",  _options::x_riscv_xthead_subext, MASK_XTHEADBA},
   {"xtheadbb",  _options::x_riscv_xthead_subext, MASK_XTHEADBB},
diff --git a/gcc/config/riscv/constraints.md b/gcc/config/riscv/constraints.md
index ee1c12b2e51..e4bfa227a2f 100644
--- a/gcc/config/riscv/constraints.md
+++ b/gcc/config/riscv/constraints.md
@@ -262,3 +262,9 @@
   (and (match_code "const_int")
(and (match_test "IN_RANGE (ival, 0, 1073741823)")
 (match_test "exact_log2 (ival + 1) != -1"
+
+(define_constraint "CV_bi_sign5"
+  "@internal
+   A 5-bit signed immediate for CORE-V Immediate Branch."
+  (and (match_code "const_int")
+   (match_test "IN_RANGE (ival, -16, 15)")))
diff --git a/gcc/config/riscv/corev.md b/gcc/config/riscv/corev.md
index adad2409fb6..66e0e998e41 100644
--- a/gcc/config/riscv/corev.md
+++ b/gcc/config/riscv/corev.md
@@ -706,3 +706,40 @@
 
   [(set_attr "type" "load")
   (set_attr "mode" "SI")])
+
+;; XCVBI Instructions
+(define_insn "*cv_branch"
+  [(set (pc)
+   (if_then_else
+(match_operator 1 "equality_operator"
+[(match_operand:X 2 "register_operand" "r")
+ (match_operand:X 3 "const_int5s_operand" "CV_bi_sign5")])
+(label_ref (match_operand 0 "" ""))
+(pc)))]
+  "TARGET_XCVBI"
+{
+  if (get_attr_length (insn) == 12)
+return "cv.b%N1\t%2,%z3,1f; jump\t%l0,ra; 1:";
+
+  return "cv.b%C1imm\t%2,%3,%0";
+}
+  [(set_attr "type" "branch")
+   (set_attr "mode" "none")])
+
+(define_insn "*branch"
+  [(set (pc)
+(if_then_else
+ (match_operator 1 "ordered_comparison_operator"
+ [(match_operand:X 2 "register_operand" "r")
+  (match_operand:X 3 "reg_or_0_operand" "rJ")])
+  
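The CV_bi_sign5 constraint in this patch accepts IN_RANGE (ival, -16, 15), i.e. exactly the values a signed 5-bit immediate can hold. A small hypothetical C sketch of that range check and of the matching sign extension of a 5-bit field follows; the helper names are illustrative, and where the field sits in the cv.beqimm/cv.bneimm encoding is not covered here.

```c
#include <assert.h>
#include <stdbool.h>

/* Hypothetical helper mirroring CV_bi_sign5: IN_RANGE (ival, -16, 15).  */
static bool
fits_cv_bi_imm5 (long v)
{
  return v >= -16 && v <= 15;
}

/* Sign-extend a 5-bit FIELD: 0x00..0x0f map to 0..15 and
   0x10..0x1f map to -16..-1.  */
static long
sext5 (unsigned field)
{
  assert (field <= 0x1fu);
  return (long) (field ^ 0x10u) - 0x10;
}
```

Every 5-bit field value lands inside the accepted range, which is why the constraint and the instruction encoding agree; constants outside it have to go through the ordinary register-register branch.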

[PATCH v5 0/1] RISC-V: Support CORE-V XCVBI extension

2024-01-08 Thread Mary Bennett
Thank you for reviewing my patches and merging XCVelw.

This patch series presents the comprehensive implementation of the BI
extension for CORE-V.

Tested with riscv-gnu-toolchain on binutils, ld, gas and gcc testsuites to
ensure its correctness and compatibility with the existing codebase.
However, your input, reviews, and suggestions are invaluable in making this
extension even more robust.

The CORE-V builtins are described in the specification [1] and work can be
found in the OpenHW group's Github repository [2].

[1] github.com/openhwgroup/core-v-sw/blob/master/specifications/corev-builtin-spec.md

[2] github.com/openhwgroup/corev-gcc

Contributors:
  Mary Bennett 
  Nandni Jamnadas 
  Pietra Ferreira 
  Charlie Keaney
  Jessica Mills
  Craig Blackmore 
  Simon Cook 
  Jeremy Bennett 
  Helene Chelin 

RISC-V: Add support for XCVbi extension in CV32E40P

 gcc/common/config/riscv/riscv-common.cc   |  4 ++
 gcc/config/riscv/constraints.md   | 21 +---
 gcc/config/riscv/corev.def|  3 ++
 gcc/config/riscv/corev.md | 51 ++-
 gcc/config/riscv/predicates.md|  4 ++
 gcc/config/riscv/riscv.md |  2 +-
 gcc/config/riscv/riscv.opt|  2 +
 gcc/doc/sourcebuild.texi  |  3 ++
 .../gcc.target/riscv/cv-bi-beqimm-compile-1.c | 17 +++
 .../gcc.target/riscv/cv-bi-beqimm-compile-2.c | 48 +++
 .../gcc.target/riscv/cv-bi-bneimm-compile-1.c | 17 +++
 .../gcc.target/riscv/cv-bi-bneimm-compile-2.c | 48 +++
 gcc/testsuite/lib/target-supports.exp | 13 +
 12 files changed, 198 insertions(+), 1 deletion(-)
 create mode 100644 gcc/testsuite/gcc.target/riscv/cv-bi-beqimm-compile-1.c
 create mode 100644 gcc/testsuite/gcc.target/riscv/cv-bi-beqimm-compile-2.c
 create mode 100644 gcc/testsuite/gcc.target/riscv/cv-bi-bneimm-compile-1.c
 create mode 100644 gcc/testsuite/gcc.target/riscv/cv-bi-bneimm-compile-2.c

-- 
2.34.1



Re: [PATCH v3 2/3] libatomic: Enable LSE128 128-bit atomics for armv9.4-a

2024-01-08 Thread Wilco Dijkstra
Hi,

>> Is there no benefit to using SWPPL for RELEASE here?  Similarly for the
>> others.
>
> We started off implementing all possible memory orderings available. 
> Wilco saw value in merging less restricted orderings into more 
> restricted ones - mainly to reduce codesize in less frequently used atomics.
> 
> This saw us combine RELEASE and ACQ_REL/SEQ_CST cases to make functions 
> a little smaller.

Benchmarking showed that LSE and LSE2 RMW atomics have similar performance once
the atomic is acquire, release or both. Given there is already a significant
overhead due to the function call, PLT indirection and argument setup, it
doesn't make sense to add extra taken branches that may mispredict or cause
extra fetch cycles...

The goal for the next GCC release is to inline these instructions directly to
avoid these overheads.

Cheers,
Wilco

Re: Add -falign-all-functions

2024-01-08 Thread Richard Biener
On Thu, 4 Jan 2024, Jan Hubicka wrote:

> Hi,
> this patch adds new option -falign-all-functions which works like
> -falign-functions, but applies to all functions including those in cold
> regions.  As discussed in the PR log, this is needed for atomically
> patching function entries in the kernel.
> 
> An option would be to make -falign-function mandatory, but I think it is not a
> good idea, since original purpose of -falign-funtions is optimization of
> instruction decode and cache size.  Having -falign-all-functions is
> backwards compatible.  Richi also suggested extending syntax of the
> -falign-functions parameters (which is already non-trivial) but it seems
> to me that having separate flag is more readable.
> 
> Bootstrapped/regtested x86_64-linux, OK for master and later
> backports to release branches?
> 
> gcc/ChangeLog:
>   
>   PR middle-end/88345
>   * common.opt: Add -falign-all-functions
>   * doc/invoke.texi: Add -falign-all-functions.
>   (-falign-functions, -falign-labels, -falign-loops): Document
>   that alignment is ignored in cold code.
>   * flags.h (align_loops): Reindent.
>   (align_jumps): Reindent.
>   (align_labels): Reindent.
>   (align_functions): Reindent.
>   (align_all_functions): New macro.
>   * opts.cc (common_handle_option): Handle -falign-all-functions.
>   * toplev.cc (parse_alignment_opts): Likewise.
>   * varasm.cc (assemble_start_function): Likewise.
> 
> diff --git a/gcc/common.opt b/gcc/common.opt
> index d263a959df3..fea2c855fcf 100644
> --- a/gcc/common.opt
> +++ b/gcc/common.opt
> @@ -1033,6 +1033,13 @@ faggressive-loop-optimizations
>  Common Var(flag_aggressive_loop_optimizations) Optimization Init(1)
>  Aggressively optimize loops using language constraints.
>  
> +falign-all-functions
> +Common Var(flag_align_all_functions) Optimization
> +Align the start of functions.

all functions

or maybe "of every function."?

> +
> +falign-all-functions=
> +Common RejectNegative Joined Var(str_align_all_functions) Optimization
> +
>  falign-functions
>  Common Var(flag_align_functions) Optimization
>  Align the start of functions.
> diff --git a/gcc/doc/invoke.texi b/gcc/doc/invoke.texi
> index d272b9228dd..ad3d75d310c 100644
> --- a/gcc/doc/invoke.texi
> +++ b/gcc/doc/invoke.texi
> @@ -543,6 +543,7 @@ Objective-C and Objective-C++ Dialects}.
>  @xref{Optimize Options,,Options that Control Optimization}.
>  @gccoptlist{-faggressive-loop-optimizations
>  -falign-functions[=@var{n}[:@var{m}:[@var{n2}[:@var{m2}
> +-falign-all-functions=[@var{n}]
>  -falign-jumps[=@var{n}[:@var{m}:[@var{n2}[:@var{m2}
>  -falign-labels[=@var{n}[:@var{m}:[@var{n2}[:@var{m2}
>  -falign-loops[=@var{n}[:@var{m}:[@var{n2}[:@var{m2}
> @@ -14177,6 +14178,9 @@ Align the start of functions to the next power-of-two greater than or
>  equal to @var{n}, skipping up to @var{m}-1 bytes.  This ensures that at
>  least the first @var{m} bytes of the function can be fetched by the CPU
>  without crossing an @var{n}-byte alignment boundary.
> +This is an optimization of code performance and alignment is ignored for
> +functions considered cold.  If alignment is required for all functions,
> +use @option{-falign-all-functions}.
>  
>  If @var{m} is not specified, it defaults to @var{n}.
>  
> @@ -14210,6 +14214,12 @@ overaligning functions. It attempts to instruct the assembler to align
>  by the amount specified by @option{-falign-functions}, but not to
>  skip more bytes than the size of the function.
>  
> +@opindex falign-all-functions=@var{n}
> +@item -falign-all-functions
> +Specify minimal alignment for function entry. Unlike @option{-falign-functions}
> +this alignment is applied also to all functions (even those considered cold).
> +The alignment is also not affected by @option{-flimit-function-alignment}
> +

For functions with two entries (like on powerpc), which entry does this
apply to?  I suppose the external ABI entry, not the local one?  But
how does this then help to align the patchable entry (the common
local entry should be aligned?).  Should we align _both_ entries?

>  @opindex falign-labels
>  @item -falign-labels
>  @itemx -falign-labels=@var{n}
> @@ -14240,6 +14250,8 @@ Enabled at levels @option{-O2}, @option{-O3}.
>  Align loops to a power-of-two boundary.  If the loops are executed
>  many times, this makes up for any execution of the dummy padding
>  instructions.
> +This is an optimization of code performance and alignment is ignored for
> +loops considered cold.
>  
>  If @option{-falign-labels} is greater than this value, then its value
>  is used instead.
> @@ -14262,6 +14274,8 @@ Enabled at levels @option{-O2}, @option{-O3}.
>  Align branch targets to a power-of-two boundary, for branch targets
>  where the targets can only be reached by jumping.  In this case,
>  no dummy operations need be executed.
> +This is an optimization of code performance and alignment is ignored for
> +jumps 

[PATCH][frontend]: don't ice with pragma NOVECTOR if loop in C has no condition [PR113267]

2024-01-08 Thread Tamar Christina
Hi All,

In C you can have loops without a condition. The original version of the patch
rejected the use of #pragma GCC novector on such loops, but during review it was
changed not to do this, because we didn't want to give a compile error in such
cases.

However, because annotations seem to only be allowed on conditions (unless
I'm mistaken?), the attached example ICEs because there's no condition.

This patch makes the compiler ignore the pragma instead of ICEing.  I don't know
if this is the best solution, but as far as I can tell we can't attach the
annotation to anything else.

Bootstrapped Regtested on aarch64-none-linux-gnu and no issues.

Ok for master?

Thanks,
Tamar

gcc/c/ChangeLog:

PR c/113267
* c-parser.cc (c_parser_for_statement): Skip the pragma if there is no cond.

gcc/testsuite/ChangeLog:

PR c/113267
* gcc.dg/pr113267.c: New test.

--- inline copy of patch -- 
diff --git a/gcc/c/c-parser.cc b/gcc/c/c-parser.cc
index c3724304580cf54f52655e10d2697c68966b9a17..e8300cea8ef7cedead5871e40c2a9ba5333bf839 100644
--- a/gcc/c/c-parser.cc
+++ b/gcc/c/c-parser.cc
@@ -8442,7 +8442,7 @@ c_parser_for_statement (c_parser *parser, bool ivdep, unsigned short unroll,
   build_int_cst (integer_type_node,
  annot_expr_unroll_kind),
   build_int_cst (integer_type_node, unroll));
- if (novector && cond != error_mark_node)
+ if (novector && cond && cond != error_mark_node)
cond = build3 (ANNOTATE_EXPR, TREE_TYPE (cond), cond,
   build_int_cst (integer_type_node,
  annot_expr_no_vector_kind),
diff --git a/gcc/testsuite/gcc.dg/pr113267.c b/gcc/testsuite/gcc.dg/pr113267.c
new file mode 100644
index ..8b6fa08324eb12ad6493291cca8e80bd3a072ba8
--- /dev/null
+++ b/gcc/testsuite/gcc.dg/pr113267.c
@@ -0,0 +1,8 @@
+/* { dg-do compile } */
+
+void f (char *a, int i)
+{
+#pragma GCC novector
+  for (;;i++)
+a[i] *= 2;
+}









Re: [PATCH] lower-bitint: Fix up lowering of huge _BitInt 0 PHI args [PR113120]

2024-01-08 Thread Richard Biener
On Thu, 4 Jan 2024, Jakub Jelinek wrote:

> Hi!
> 
> The PHI argument expansion of INTEGER_CSTs where bitint_min_cst_precision
> returns significantly smaller precision than the PHI result precision is
> optimized by loading the much smaller constant (if any) from memory and
> then either setting the remaining limbs to {} or calling memset with -1.
> The case where no constant is loaded (i.e. c == NULL) is when the
> INTEGER_CST is 0 or all_ones - in that case we can just set all the limbs
> to {} or call memset with -1 on everything.
> While for the all ones extension case that is what the code was already
> doing, I missed one spot in the zero extension case: when constructing
> the offset of the MEM_REF lhs of the = {} store, it unconditionally used
> the byte size of c, which obviously doesn't work if c is NULL.  In that case
> we want to use zero offset.
> 
> Fixed thusly, bootstrapped/regtested on x86_64-linux and i686-linux, ok for
> trunk?

OK.

Richard.

> 2024-01-04  Jakub Jelinek  
> 
>   PR tree-optimization/113120
>   * gimple-lower-bitint.cc (gimple_lower_bitint): Fix handling of very
>   large _BitInt zero INTEGER_CST PHI argument.
> 
>   * gcc.dg/bitint-62.c: New test.
> 
> --- gcc/gimple-lower-bitint.cc.jj 2024-01-03 11:51:27.0 +0100
> +++ gcc/gimple-lower-bitint.cc	2024-01-03 13:53:30.699328045 +0100
> @@ -6582,8 +6582,12 @@ gimple_lower_bitint (void)
>   = build_array_type_nelts (large_huge.m_limb_type,
> nelts);
> tree ptype = build_pointer_type (TREE_TYPE (v1));
> -   tree off = fold_convert (ptype,
> -TYPE_SIZE_UNIT (TREE_TYPE (c)));
> +   tree off;
> +   if (c)
> + off = fold_convert (ptype,
> + TYPE_SIZE_UNIT (TREE_TYPE (c)));
> +   else
> + off = build_zero_cst (ptype);
> tree vd = build2 (MEM_REF, vtype,
>   build_fold_addr_expr (v1), off);
> g = gimple_build_assign (vd, build_zero_cst (vtype));
> --- gcc/testsuite/gcc.dg/bitint-62.c.jj   2024-01-03 14:11:22.332301884 +0100
> +++ gcc/testsuite/gcc.dg/bitint-62.c  2024-01-03 14:10:58.219640178 +0100
> @@ -0,0 +1,32 @@
> +/* PR tree-optimization/113120 */
> +/* { dg-do compile { target bitint } } */
> +/* { dg-options "-std=c23 -O2" } */
> +
> +_BitInt(8) a;
> +_BitInt(55) b;
> +
> +#if __BITINT_MAXWIDTH__ >= 401
> +static __attribute__((noinline, noclone)) void
> +foo (unsigned _BitInt(1) c, _BitInt(401) d)
> +{
> +  c /= d << b;
> +  a = c;
> +}
> +
> +void
> +bar (void)
> +{
> +  foo (1, 4);
> +}
> +#endif
> +
> +#if __BITINT_MAXWIDTH__ >= 6928
> +_BitInt(6928)
> +baz (int x, _BitInt(6928) y)
> +{
> +  if (x)
> +return y;
> +  else
> +return 0;
> +}
> +#endif
> 
>   Jakub
> 
> 

-- 
Richard Biener 
SUSE Software Solutions Germany GmbH,
Frankenstrasse 146, 90461 Nuernberg, Germany;
GF: Ivo Totev, Andrew McDonald, Werner Knoblich; (HRB 36809, AG Nuernberg)
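The zero-extension fix for PR113120 can be illustrated with a hypothetical C model that fills a multi-limb value from an optional smaller constant. It uses made-up names and counts the offset in limbs rather than bytes; the point is only that when no constant is loaded (c is NULL) the zero fill has to start at offset 0 instead of at the constant's size.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>
#include <string.h>

/* Hypothetical model: V has NLIMBS 64-bit limbs.  If C is non-NULL,
   copy its C_LIMBS limbs into the low end of V, then zero the rest.
   OFF plays the role of the MEM_REF offset in the patch.  */
static void
init_limbs (uint64_t *v, size_t nlimbs, const uint64_t *c, size_t c_limbs)
{
  size_t off = 0;                 /* c == NULL: the value is 0 */
  if (c != NULL)
    {
      assert (c_limbs <= nlimbs);
      memcpy (v, c, c_limbs * sizeof *v);
      off = c_limbs;              /* zero only the limbs above C */
    }
  memset (v + off, 0, (nlimbs - off) * sizeof *v);
}
```

Before the fix, the equivalent of off was always taken from the constant's size, which is meaningless when there is no constant at all.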


Re: [PATCH] lower-bitint: Punt .*_OVERFLOW optimization if cast from IMAGPART_EXPR appears before REALPART_EXPR [PR113119]

2024-01-08 Thread Richard Biener
On Thu, 4 Jan 2024, Jakub Jelinek wrote:

> Hi!
> 
> _BitInt lowering for .{ADD,SUB,MUL}_OVERFLOW calls which have both
> REALPART_EXPR and IMAGPART_EXPR used and have a cast from the IMAGPART_EXPR
> to a boolean or normal integral type lowers them at the point of
> the REALPART_EXPR statement (which is especially needed if the lhs of
> the call is complex with large/huge _BitInt element type); we emit the
> stmt to set the lhs of the cast at the same spot as well.
> Normally, the lowering of __builtin_{add,sub,mul}_overflow arranges
> the REALPART_EXPR to come before IMAGPART_EXPR, followed by cast from that,
> but as the testcase shows, a redundant __builtin_*_overflow call and VN
> can reorder those and we then ICE because the def-stmt of the former cast
> from IMAGPART_EXPR may appear after its uses.
> We already check that all of REALPART_EXPR, IMAGPART_EXPR and the cast
> from the latter appear in the same bb as the .{ADD,SUB,MUL}_OVERFLOW call
> in the optimization, the following patch just extends it to make sure
> cast appears after REALPART_EXPR; if not, we punt on the optimization and
> expand it as a store of a complex _BitInt on the location of the ifn call.
> Only the testcase in the testsuite is changed by the patch, all other
> __builtin_*_overflow* calls in the bitint* tests (and there are quite a few)
> have REALPART_EXPR first.
> 
> Bootstrapped/regtested on x86_64-linux and i686-linux, ok for trunk?

OK.

Richard.
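The ordering check added to optimizable_arith_overflow can be modelled on a plain array of statement ids. This is a hypothetical sketch: a basic block is an int array in program order, debug statements are ignored, and the names are illustrative. As in the patch, it walks backwards from the REALPART_EXPR statement, returns 2 if it reaches the overflow call first, and punts (returns 0) if it meets the cast, the start of the block, or 32 statements without finding the call.

```c
#include <assert.h>
#include <stddef.h>

/* Hypothetical model of the backward walk from the REALPART_EXPR
   statement at index REALPART_IDX in STMTS.  */
static int
check_order (const int *stmts, size_t realpart_idx, int call_id, int cast_id)
{
  assert (stmts != NULL);
  size_t i = realpart_idx;
  unsigned cnt = 0;
  for (;;)
    {
      if (i == 0)
        return 0;              /* Reached the start of the block.  */
      --i;
      if (stmts[i] == cast_id)
        return 0;              /* Cast precedes REALPART_EXPR: punt.  */
      if (stmts[i] == call_id)
        return 2;              /* Found the overflow call first.  */
      if (++cnt == 32)
        return 0;              /* Too far from the call: punt.  */
    }
}
```

The 32-statement limit keeps the check cheap; as the patch comment notes, REALPART_EXPR usually follows the call immediately.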

> 2024-01-04  Jakub Jelinek  
> 
>   PR tree-optimization/113119
>   * gimple-lower-bitint.cc (optimizable_arith_overflow): Punt if
>   both REALPART_EXPR and cast from IMAGPART_EXPR appear, but cast
>   is before REALPART_EXPR.
> 
>   * gcc.dg/bitint-61.c: New test.
> 
> --- gcc/gimple-lower-bitint.cc.jj 2023-12-22 12:27:58.497437164 +0100
> +++ gcc/gimple-lower-bitint.cc 2023-12-23 10:44:05.586522553 +0100
> @@ -305,6 +305,7 @@ optimizable_arith_overflow (gimple *stmt
>imm_use_iterator ui;
>use_operand_p use_p;
>int seen = 0;
> +  gimple *realpart = NULL, *cast = NULL;
>FOR_EACH_IMM_USE_FAST (use_p, ui, lhs)
>  {
>gimple *g = USE_STMT (use_p);
> @@ -317,6 +318,7 @@ optimizable_arith_overflow (gimple *stmt
> if ((seen & 1) != 0)
>   return 0;
> seen |= 1;
> +   realpart = g;
>   }
>else if (gimple_assign_rhs_code (g) == IMAGPART_EXPR)
>   {
> @@ -338,13 +340,35 @@ optimizable_arith_overflow (gimple *stmt
> if (!INTEGRAL_TYPE_P (TREE_TYPE (lhs2))
> || TREE_CODE (TREE_TYPE (lhs2)) == BITINT_TYPE)
>   return 0;
> +   cast = use_stmt;
>   }
>else
>   return 0;
>  }
>if ((seen & 2) == 0)
>  return 0;
> -  return seen == 3 ? 2 : 1;
> +  if (seen == 3)
> +{
> +  /* Punt if the cast stmt appears before realpart stmt, because
> +  if both appear, the lowering wants to emit all the code
> +  at the location of realpart stmt.  */
> +  gimple_stmt_iterator gsi = gsi_for_stmt (realpart);
> +  unsigned int cnt = 0;
> +  do
> + {
> +   gsi_prev_nondebug ();
> +   if (gsi_end_p (gsi) || gsi_stmt (gsi) == cast)
> + return 0;
> +   if (gsi_stmt (gsi) == stmt)
> + return 2;
> +   /* If realpart is too far from stmt, punt as well.
> +  Usually it will appear right after it.  */
> +   if (++cnt == 32)
> + return 0;
> + }
> +  while (1);
> +}
> +  return 1;
>  }
>  
>  /* If STMT is some kind of comparison (GIMPLE_COND, comparison assignment)
> --- gcc/testsuite/gcc.dg/bitint-61.c.jj   2023-12-23 10:46:17.808658852 +0100
> +++ gcc/testsuite/gcc.dg/bitint-61.c  2023-12-23 10:46:02.482874865 +0100
> @@ -0,0 +1,17 @@
> +/* PR tree-optimization/113119 */
> +/* { dg-do compile { target bitint } } */
> +/* { dg-options "-std=c23 -O2" } */
> +
> +_BitInt(8) b;
> +_Bool c;
> +#if __BITINT_MAXWIDTH__ >= 8445
> +_BitInt(8445) a;
> +
> +void
> +foo (_BitInt(4058) d)
> +{
> +  c = __builtin_add_overflow (a, 0ULL, );
> +  __builtin_add_overflow (a, 0ULL, );
> +  b = d;
> +}
> +#endif
> 
>   Jakub
> 
> 

-- 
Richard Biener 
SUSE Software Solutions Germany GmbH,
Frankenstrasse 146, 90461 Nuernberg, Germany;
GF: Ivo Totev, Andrew McDonald, Werner Knoblich; (HRB 36809, AG Nuernberg)
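The new check in `optimizable_arith_overflow` walks backwards from the REALPART_EXPR statement, skipping debug statements. It succeeds only if it reaches the `.{ADD,SUB,MUL}_OVERFLOW` call before it sees the cast, hits the start of the block, or exceeds the 32-statement limit. The sketch below is a simplified model of that walk over an array of statement kinds. The enum and function names are made up for illustration, and debug statements are not modeled.

```c
#include <stddef.h>

/* Hypothetical statement kinds for the model.  */
enum stmt_kind { OTHER, CALL, REALPART, CAST };

/* Walk backwards from the REALPART_EXPR at index REALPART_POS.
   Returns 2 (lower everything at the REALPART_EXPR) if the overflow
   call is found first, or 0 (punt) if the cast is found first, the
   start of the block is reached, or 32 statements were examined.  */
int
scan_back (const enum stmt_kind *bb, size_t realpart_pos)
{
  unsigned int cnt = 0;
  size_t i = realpart_pos;
  do
    {
      if (i == 0)
        return 0;               /* Ran off the start of the block.  */
      i--;
      if (bb[i] == CAST)
        return 0;               /* Cast precedes the REALPART_EXPR.  */
      if (bb[i] == CALL)
        return 2;
      if (++cnt == 32)
        return 0;               /* Too far from the call.  */
    }
  while (1);
}
```

The usual order (call, REALPART_EXPR, cast) yields 2. The VN-reordered order from the testcase (call, cast, REALPART_EXPR) yields 0, so the optimization is skipped.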


Re: [PATCH]middle-end: check if target can do extract first for early breaks [PR113199]

2024-01-08 Thread Richard Biener
On Tue, 2 Jan 2024, Tamar Christina wrote:

> Hi All,
> 
> I was generating the vector reverse mask without checking if the target
> actually supported such an operation.
> 
> It also seems like more targets implement VEC_EXTRACT than permute on mask
> registers.
> 
> So this adds a check for IFN_VEC_EXTRACT support when required and changes
> the select first code to use it.
> 
> This is good for now since masks always come from whilelo.  But in the future
> when masks can come from other sources we will need the old code back.
> 
> Bootstrapped Regtested on aarch64-none-linux-gnu, x86_64-pc-linux-gnu
> and no issues with --enable-checking=release --enable-lto
> --with-build-config=bootstrap-O3 --enable-checking=yes,rtl,extra.
> tested on cross cc1 for amdgcn-amdhsa and issue fixed.
> 
> Ok for master?
> 
> Thanks,
> Tamar
> 
> gcc/ChangeLog:
> 
>   PR tree-optimization/113199
>   * tree-vect-loop.cc (vectorizable_live_operation_1): Use
>   IFN_VEC_EXTRACT.
>   (vectorizable_live_operation): Check for IFN_VEC_EXTRACT support.
> 
> gcc/testsuite/ChangeLog:
> 
>   PR tree-optimization/113199
>   * gcc.target/gcn/pr113199.c: New test.
> 
> --- inline copy of patch -- 
> diff --git a/gcc/testsuite/gcc.target/gcn/pr113199.c b/gcc/testsuite/gcc.target/gcn/pr113199.c
> new file mode 100644
> index ..8a641e5536e80e207ca0163cac66c0f4f6ca93f7
> --- /dev/null
> +++ b/gcc/testsuite/gcc.target/gcn/pr113199.c
> @@ -0,0 +1,44 @@
> +/* { dg-do compile } */
> +/* { dg-additional-options "-O2" } */
> +
> +typedef long unsigned int size_t;
> +typedef int wchar_t;
> +struct tm
> +{
> +  int tm_mon;
> +  int tm_year;
> +};
> +int abs (int);
> +struct lc_time_T { const char *month[12]; };
> +struct __locale_t * __get_current_locale (void) { }
> +const struct lc_time_T * __get_time_locale (struct __locale_t *locale) { }
> +const wchar_t * __ctloc (wchar_t *buf, const char *elem, size_t *len_ret) { return buf; }
> +size_t
> +__strftime (wchar_t *s, size_t maxsize, const wchar_t *format,
> + const struct tm *tim_p, struct __locale_t *locale)
> +{
> +  size_t count = 0;
> +  const wchar_t *ctloc;
> +  wchar_t ctlocbuf[256];
> +  size_t i, ctloclen;
> +  const struct lc_time_T *_CurrentTimeLocale = __get_time_locale (locale);
> +{
> +  switch (*format)
> + {
> + case L'B':
> +   (ctloc = __ctloc (ctlocbuf, _CurrentTimeLocale->month[tim_p->tm_mon], ));
> +   for (i = 0; i < ctloclen; i++)
> + {
> +   if (count < maxsize - 1)
> +  s[count++] = ctloc[i];
> +   else
> +  return 0;
> +   {
> +  int century = tim_p->tm_year >= 0
> +? tim_p->tm_year / 100 + 1900 / 100
> +: abs (tim_p->tm_year + 1900) / 100;
> +   }
> +   }
> + }
> +}
> +}
> diff --git a/gcc/tree-vect-loop.cc b/gcc/tree-vect-loop.cc
> index 37f1be1101ffae779214056a0886411e0683e887..5aa92e67444e7aacf458fffa1428f1983c482374 100644
> --- a/gcc/tree-vect-loop.cc
> +++ b/gcc/tree-vect-loop.cc
> @@ -10648,36 +10648,18 @@ vectorizable_live_operation_1 (loop_vec_info loop_vinfo,
> _VINFO_MASKS (loop_vinfo),
> 1, vectype, 0);
>tree scalar_res;
> +  gimple_seq_add_seq (, tem);
>  
>/* For an inverted control flow with early breaks we want EXTRACT_FIRST
> -  instead of EXTRACT_LAST.  Emulate by reversing the vector and mask. */
> +  instead of EXTRACT_LAST.  For now since the mask always comes from a
> +  WHILELO we can get the first element ignoring the mask since CLZ of the
> +  mask will always be zero.  */
>if (restart_loop && LOOP_VINFO_EARLY_BREAKS (loop_vinfo))
> - {
> -   /* First create the permuted mask.  */
> -   tree perm_mask = perm_mask_for_reverse (TREE_TYPE (mask));
> -   tree perm_dest = copy_ssa_name (mask);
> -   gimple *perm_stmt
> - = gimple_build_assign (perm_dest, VEC_PERM_EXPR, mask,
> -mask, perm_mask);
> -   vect_finish_stmt_generation (loop_vinfo, stmt_info, perm_stmt,
> -);
> -   mask = perm_dest;
> -
> -   /* Then permute the vector contents.  */
> -   tree perm_elem = perm_mask_for_reverse (vectype);
> -   perm_dest = copy_ssa_name (vec_lhs_phi);
> -   perm_stmt
> - = gimple_build_assign (perm_dest, VEC_PERM_EXPR, vec_lhs_phi,
> -vec_lhs_phi, perm_elem);
> -   vect_finish_stmt_generation (loop_vinfo, stmt_info, perm_stmt,
> -);
> -   vec_lhs_phi = perm_dest;
> - }
> -
> -  gimple_seq_add_seq (, tem);
> -
> -  scalar_res = gimple_build (, CFN_EXTRACT_LAST, scalar_type,
> -  mask, vec_lhs_phi);
> + scalar_res = gimple_build (, CFN_VEC_EXTRACT, TREE_TYPE (vectype),
> +vec_lhs_phi, bitstart);

So bitstart is always zero?  I 

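The patch relies on its masks coming from WHILELO. In such a mask the active lanes always form a prefix starting at lane 0, so whenever any lane is active, the first active lane (the one EXTRACT_FIRST wants) is lane 0. The small model below illustrates that property. The fixed vectorization factor `VF` and the helper names are assumptions for illustration, not GCC code.

```c
#include <stdbool.h>

#define VF 8  /* Hypothetical vectorization factor for the model.  */

/* WHILELO-style mask: lane I is active iff BASE + I < LIMIT.  */
void
whilelo (bool mask[VF], unsigned base, unsigned limit)
{
  for (unsigned i = 0; i < VF; i++)
    mask[i] = base + i < limit;
}

/* Index of the first active lane, or -1 if no lane is active.  */
int
first_active (const bool mask[VF])
{
  for (int i = 0; i < VF; i++)
    if (mask[i])
      return i;
  return -1;
}

/* Index of the last active lane, or -1 if no lane is active.  */
int
last_active (const bool mask[VF])
{
  for (int i = VF - 1; i >= 0; i--)
    if (mask[i])
      return i;
  return -1;
}
```

A mask with a hole at lane 0, such as one produced by a comparison, would break this shortcut. That matches the patch's remark that the old reverse-then-EXTRACT_LAST code will be needed once masks can come from other sources.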
Re: [PATCH]middle-end: maintain LCSSA form when peeled vector iterations have virtual operands

2024-01-08 Thread Richard Biener
On Fri, 29 Dec 2023, Tamar Christina wrote:

> Hi All,
> 
> This patch fixes several interconnected issues.
> 
> 1. When picking an exit we wanted to check for niter_desc.may_be_zero not true,
>    i.e. we want to pick an exit which we know will iterate at least once.
>    However niter_desc.may_be_zero is not a boolean.  It is a tree that encodes
>    a boolean value.  !niter_desc.may_be_zero is just checking if we have some
>    information, not what the information is.  This leads us to pick a more
>    difficult to vectorize exit more often than we should.
>
> 2. Because we had this bug, we used to pick an alternative exit much more often,
>    which exposed one issue: when the loop accesses memory and we "invert it" we
>    would corrupt the VUSE chain.  This is because on a peeled vector iteration
>    every exit restarts the loop (i.e. they're all early) BUT since we may have
>    performed a store, the VUSE would need to be updated.  This version maintains
>    virtual PHIs correctly in these cases.  Note that we can't simply remove all
>    of them and recreate them because we need the PHI nodes still in the right
>    order for the skip_vector case.
>
> 3. Since we're moving the stores to a safe location I don't think we actually
>    need to analyze whether the store is in range of the memref, because if we
>    ever get there, we know that the loads must be in range, and if the loads are
>    in range and we get to the store we know the early breaks were not taken and
>    so the scalar loop would have done the VF stores too.
>
> 4. Instead of searching for where to move stores to, they should always be in
>    the exit belonging to the latch.  We can only ever delay stores, and even if
>    we pick a different exit than the latch one as the main one, effects still
>    happen in program order when vectorized.  If we don't move the stores to the
>    latch exit but instead to wherever we pick as the "main" exit then we can
>    perform incorrect memory accesses (luckily these are trapped by verify_ssa).
>
> 5. We only used to analyze loads inside the same BB as an early break, and also
>    we'd never analyze the ones inside the block where we'd be moving memory
>    references to.  This is obviously bogus and to fix it this patch splits apart
>    the two constraints.  We first validate that all load memory references are
>    in bounds and only after that do we perform the alias checks for the writes.
>    This makes the code simpler to understand and more trivially correct.
> 
> Bootstrapped Regtested on aarch64-none-linux-gnu, x86_64-pc-linux-gnu
> and no issues with --enable-checking=release --enable-lto
> --with-build-config=bootstrap-O3 --enable-checking=yes,rtl,extra.
> 
> Ok for master?
> 
> Thanks,
> Tamar
> 
> gcc/ChangeLog:
> 
>   PR tree-optimization/113137
>   PR tree-optimization/113136
>   PR tree-optimization/113172
>   * tree-vect-data-refs.cc (vect_analyze_early_break_dependences):
>   * tree-vect-loop-manip.cc (slpeel_tree_duplicate_loop_to_edge_cfg):
>   (vect_do_peeling): Maintain virtual PHIs on inverted loops.
>   * tree-vect-loop.cc (vec_init_loop_exit_info): Pick exit closest to
>   latch.
>   (vect_create_loop_vinfo): Record all conds instead of only alt ones.
>   * tree-vectorizer.h: Fix comment
> 
> gcc/testsuite/ChangeLog:
> 
>   PR tree-optimization/113137
>   PR tree-optimization/113136
>   PR tree-optimization/113172
>   * g++.dg/vect/vect-early-break_4-pr113137.cc: New test.
>   * g++.dg/vect/vect-early-break_5-pr113137.cc: New test.
>   * gcc.dg/vect/vect-early-break_95-pr113137.c: New test.
>   * gcc.dg/vect/vect-early-break_96-pr113136.c: New test.
>   * gcc.dg/vect/vect-early-break_97-pr113172.c: New test.
> 
> --- inline copy of patch -- 
> diff --git a/gcc/testsuite/g++.dg/vect/vect-early-break_4-pr113137.cc b/gcc/testsuite/g++.dg/vect/vect-early-break_4-pr113137.cc
> new file mode 100644
> index ..f78db8669dcc65f1b45ea78f4433d175e1138332
> --- /dev/null
> +++ b/gcc/testsuite/g++.dg/vect/vect-early-break_4-pr113137.cc
> @@ -0,0 +1,15 @@
> +/* { dg-do compile } */
> +/* { dg-add-options vect_early_break } */
> +/* { dg-require-effective-target vect_early_break } */
> +/* { dg-require-effective-target vect_int } */
> +
> +int b;
> +void a() __attribute__((__noreturn__));
> +void c() {
> +  char *buf;
> +  int bufsz = 64;
> +  while (b) {
> +!bufsz ? a(), 0 : *buf++ = bufsz--;
> +b -= 4;
> +  }
> +}
> diff --git a/gcc/testsuite/g++.dg/vect/vect-early-break_5-pr113137.cc b/gcc/testsuite/g++.dg/vect/vect-early-break_5-pr113137.cc
> new file mode 100644
> index ..dcd19fa2d2145e09de18279479b3f20fc27336ba
> --- /dev/null
> +++ b/gcc/testsuite/g++.dg/vect/vect-early-break_5-pr113137.cc
> @@ -0,0 +1,13 @@
> +/* { dg-do compile } */
> +/* { dg-add-options 

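Point 1 in the patch description above is the classic mistake of testing whether a value exists instead of testing what it says. The hypothetical C model below shows the difference. The struct stands in for the tree that encodes the condition, and none of these names are GCC's.

```c
#include <stdbool.h>
#include <stddef.h>

/* Hypothetical stand-in for a tree encoding a boolean condition.  */
struct cond_expr
{
  bool is_constant;
  bool value;   /* Only meaningful when IS_CONSTANT.  */
};

/* Buggy check: only asks whether any information was recorded.  */
bool
exit_iterates_buggy (const struct cond_expr *may_be_zero)
{
  return !may_be_zero;
}

/* Intended check: the exit is known to iterate at least once only if
   MAY_BE_ZERO is known to be false.  */
bool
exit_iterates (const struct cond_expr *may_be_zero)
{
  return may_be_zero && may_be_zero->is_constant && !may_be_zero->value;
}
```

With a condition known to be false, `exit_iterates` accepts the exit but `exit_iterates_buggy` rejects it. The exit is passed over simply because information was recorded, so a harder-to-vectorize exit gets picked instead.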
Re: [PATCH v3 1/3] libatomic: atomic_16.S: Improve ENTRY, END and ALIAS macro interface

2024-01-08 Thread Victor Do Nascimento




On 1/5/24 11:10, Richard Sandiford wrote:

Victor Do Nascimento  writes:

The introduction of further architectural-feature dependent ifuncs
for AArch64 makes hard-coding ifunc `_i' suffixes to functions
cumbersome to work with.  It is awkward to remember which ifunc maps
onto which arch feature and makes the code harder to maintain when new
ifuncs are added and their suffixes possibly altered.

This patch uses pre-processor `#define' statements to map each suffix to
a descriptive feature name macro, for example:

   #define LSE2 _i1

and reconstructs function names with the pre-processor's token
concatenation feature, such that for `MACRO(_i)', we would
now have `MACRO_FEAT(name, feature)' and in the macro definition body
we replace `name` with `name##feature`.


FWIW, another way of doing this would be to have:

#define CORE(NAME) NAME
#define LSE2(NAME) NAME##_i1

and use feature(name) instead of name##feature.  This has the slight
advantage of not using ## on empty tokens, and the maybe slightly
better advantage of not needing the extra forwarding step in:

#define ENTRY_FEAT(name, feat)  \
ENTRY_FEAT1(name, feat)

#define ENTRY_FEAT1(name, feat) \

WDYT?

Richard



While from a strictly stylistic point of view, I'm not so keen on the 
resulting interface and its 'function call within a function call' look, 
e.g.


  ENTRY (LSE2 (libat_compare_exchange_16))

and

  ALIAS (LSE128 (libat_compare_exchange_16), \
 LSE2 (libat_compare_exchange_16))

on the implementation-side of things, I like the benefits this brings 
about.  Namely allowing the use of the unaltered original 
implementations of the ENTRY, END and ALIAS macros with the 
aforementioned advantages of not having to use ## on empty tokens and 
abolishing the need for the extra forwarding step.


I'm happy enough to go with this approach.

Cheers
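The small, self-contained C preprocessor sketch below makes the two styles concrete. It shows both the object-like suffix macros with a forwarding step (as in the patch) and the function-like feature macros Richard suggested. The names `FEAT_*`, `FEAT_NAME` and `DEFINE_IMPL` are hypothetical stand-ins for the assembler macros in atomic_16.S.

```c
/* Style 1 (the patch): object-like suffix macros plus a forwarding step.
   FEAT_NAME1 applies ## directly, which suppresses expansion of FEAT,
   so FEAT_NAME forwards its arguments first to get FEAT expanded.  */
#define FEAT_CORE                    /* Expands to nothing.  */
#define FEAT_LSE2 _i1
#define FEAT_NAME(name, feat) FEAT_NAME1 (name, feat)
#define FEAT_NAME1(name, feat) name##feat

/* Style 2 (review suggestion): function-like feature macros; no ## on
   an empty token and no forwarding step.  */
#define CORE(NAME) NAME
#define LSE2(NAME) NAME##_i1

/* Stand-in for ENTRY/END: defines a function under an already
   expanded name.  */
#define DEFINE_IMPL(name, value) int name (void) { return value; }

DEFINE_IMPL (FEAT_NAME (store_16, FEAT_CORE), 0)  /* store_16     */
DEFINE_IMPL (FEAT_NAME (store_16, FEAT_LSE2), 1)  /* store_16_i1  */
DEFINE_IMPL (CORE (load_16), 2)                   /* load_16      */
DEFINE_IMPL (LSE2 (load_16), 3)                   /* load_16_i1   */
```

Without the forwarding step, `FEAT_NAME1 (store_16, FEAT_LSE2)` would paste the unexpanded macro name and produce `store_16FEAT_LSE2`. That is the extra step the function-like variant avoids.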


Consequently, for base functionality, where the ifunc suffix is
absent, the macro interface remains the same.  For example, the entry
and endpoints of `libat_store_16' remain defined by:

   - ENTRY (libat_store_16)
and
   - END (libat_store_16)

For the LSE2 implementation of the same 16-byte atomic store, we now
have:

   - ENTRY_FEAT (libat_store_16, LSE2)
and
   - END_FEAT (libat_store_16, LSE2)

For the aliasing of ifunc names, we define the following new
implementation of the ALIAS macro:

   - ALIAS (FN_BASE_NAME, FROM_SUFFIX, TO_SUFFIX)

Defining the base feature name macro to map `CORE' to the empty string,
mapping LSE2 to the base implementation, we'd alias the LSE2
`libat_exchange_16' to its base implementation with:

   - ALIAS (libat_exchange_16, LSE2, CORE)

libatomic/ChangeLog:
* config/linux/aarch64/atomic_16.S (CORE): New macro.
(LSE2): Likewise.
(ENTRY_FEAT): Likewise.
(END_FEAT): Likewise.
(ENTRY_FEAT1): Likewise.
(END_FEAT1): Likewise.
(ALIAS): Modify macro to take in `arch' arguments.
---
  libatomic/config/linux/aarch64/atomic_16.S | 83 +-
  1 file changed, 49 insertions(+), 34 deletions(-)

diff --git a/libatomic/config/linux/aarch64/atomic_16.S b/libatomic/config/linux/aarch64/atomic_16.S
index a099037179b..eb8e749b8a2 100644
--- a/libatomic/config/linux/aarch64/atomic_16.S
+++ b/libatomic/config/linux/aarch64/atomic_16.S
@@ -40,22 +40,38 @@
  
  	.arch	armv8-a+lse
  
-#define ENTRY(name)		\
-   .global name;   \
-   .hidden name;   \
-   .type name,%function;   \
-   .p2align 4; \
-name:  \
-   .cfi_startproc; \
+#define ENTRY(name) ENTRY_FEAT (name, CORE)
+
+#define ENTRY_FEAT(name, feat) \
+   ENTRY_FEAT1(name, feat)
+
+#define ENTRY_FEAT1(name, feat)\
+   .global name##feat; \
+   .hidden name##feat; \
+   .type name##feat,%function; \
+   .p2align 4; \
+name##feat:\
+   .cfi_startproc; \
  	hint	34	// bti c
  
-#define END(name)		\
-   .cfi_endproc;   \
-   .size name, .-name;
+#define END(name) END_FEAT (name, CORE)
  
-#define ALIAS(alias,name)	\
-   .global alias;  \
-   .set alias, name;
+#define END_FEAT(name, feat)   \
+   END_FEAT1(name, feat)
+
+#define END_FEAT1(name, feat)  \
+   .cfi_endproc;   \
+   .size name##feat, .-name##feat;
+
+#define ALIAS(alias, from, to) \
+   ALIAS1(alias,from,to)
+
+#define ALIAS1(alias, from, to)\
+   .global alias##from;\
+   .set alias##from, alias##to;
+
+#define CORE
+#define LSE2   _i1
  
  #define res0 x0
  #define res1 x1
@@ -108,7 +124,7 @@ ENTRY (libat_load_16)
  END (libat_load_16)
  
  
-ENTRY (libat_load_16_i1)
+ENTRY_FEAT (libat_load_16, LSE2)
  	cbnz	w1, 1f
  
  	/* RELAXED.  */
@@ -128,7 +144,7 @@ ENTRY (libat_load_16_i1)
ldp res0, res1, 

Re: [PATCH]middle-end: Fix dominators updates when peeling with multiple exits [PR113144]

2024-01-08 Thread Richard Biener
On Fri, 29 Dec 2023, Tamar Christina wrote:

> Hi All,
> 
> Only trying to update certain dominators doesn't seem to work very well
> because as the loop gets versioned, peeled, or skip_vector then we end up with
> very complicated control flow.  This means that the final merge blocks for the
> loop exit are not easy to find or update.
> 
> Instead of trying to pick which exits to update, this changes it to update all
> the blocks reachable by the new exits.  This is because they'll contain common
> blocks with e.g. the versioned loop.  It's these blocks that need an update
> most of the time.
> 
> Bootstrapped Regtested on aarch64-none-linux-gnu and no issues.
> 
> Ok for master?

This makes it quadratic in the number of vectorized early exit loops
in a function.  The vectorizer CFG manipulation operates in a local
enough bubble that programmatic updating of dominators should be
possible (after all we manage to produce correct SSA form!), the
proposed change gets us too far off to a point where re-computating
dominance info is likely cheaper (but no, we shouldn't do this either).

Can you instead give manual updating a try again?  I think
versioning should produce up-to-date dominator info, it's only
when you redirect branches during peeling that you'd need
adjustments - but IIRC we're never introducing new merges?

IIRC we can't wipe dominators during transform since we query them
during code generation.  We possibly could code generate all
CFG manipulations of all vectorized loops, recompute all dominators
and then do code generation of all vectorized loops.

But then we're doing a loop transform and the exits will ultimately
end up in the same place, so the CFG and dominator update is bound to
where the original exits went to.

Richard

> Thanks,
> Tamar
> 
> gcc/ChangeLog:
> 
>   PR middle-end/113144
>   * tree-vect-loop-manip.cc (slpeel_tree_duplicate_loop_to_edge_cfg):
>   Update all dominators reachable from exit.
> 
> gcc/testsuite/ChangeLog:
> 
>   PR middle-end/113144
>   * gcc.dg/vect/vect-early-break_94-pr113144.c: New test.
> 
> --- inline copy of patch -- 
> diff --git a/gcc/testsuite/gcc.dg/vect/vect-early-break_94-pr113144.c b/gcc/testsuite/gcc.dg/vect/vect-early-break_94-pr113144.c
> new file mode 100644
> index ..903fe7be6621e81db6f29441e4309fa213d027c5
> --- /dev/null
> +++ b/gcc/testsuite/gcc.dg/vect/vect-early-break_94-pr113144.c
> @@ -0,0 +1,41 @@
> +/* { dg-do compile } */
> +/* { dg-add-options vect_early_break } */
> +/* { dg-require-effective-target vect_early_break } */
> +/* { dg-require-effective-target vect_int } */
> +
> +/* { dg-final { scan-tree-dump "LOOP VECTORIZED" "vect" } } */
> +
> +long tar_atol256_max, tar_atol256_size, tar_atosl_min;
> +char tar_atol256_s;
> +void __errno_location();
> +
> +
> +inline static long tar_atol256(long min) {
> +  char c;
> +  int sign;
> +  c = tar_atol256_s;
> +  sign = c;
> +  while (tar_atol256_size) {
> +if (c != sign)
> +  return sign ? min : tar_atol256_max;
> +c = tar_atol256_size--;
> +  }
> +  if ((c & 128) != (sign & 128))
> +return sign ? min : tar_atol256_max;
> +  return 0;
> +}
> +
> +inline static long tar_atol(long min) {
> +  return tar_atol256(min);
> +}
> +
> +long tar_atosl() {
> +  long n = tar_atol(-1);
> +  if (tar_atosl_min) {
> +__errno_location();
> +return 0;
> +  }
> +  if (n > 0)
> +return 0;
> +  return n;
> +}
> diff --git a/gcc/tree-vect-loop-manip.cc b/gcc/tree-vect-loop-manip.cc
> index 1066ea17c5674e03412b3dcd8a62ddf4dd54cf31..3810983a80c8b989be9fd9a9993642069fd39b99 100644
> --- a/gcc/tree-vect-loop-manip.cc
> +++ b/gcc/tree-vect-loop-manip.cc
> @@ -1716,8 +1716,6 @@ slpeel_tree_duplicate_loop_to_edge_cfg (class loop *loop, edge loop_exit,
> /* Now link the alternative exits.  */
> if (multiple_exits_p)
>   {
> -   set_immediate_dominator (CDI_DOMINATORS, new_preheader,
> -main_loop_exit_block);
> for (auto gsi_from = gsi_start_phis (loop->header),
>  gsi_to = gsi_start_phis (new_preheader);
>  !gsi_end_p (gsi_from) && !gsi_end_p (gsi_to);
> @@ -1751,12 +1749,26 @@ slpeel_tree_duplicate_loop_to_edge_cfg (class loop *loop, edge loop_exit,
>  
>/* Finally after wiring the new epilogue we need to update its main exit
>to the original function exit we recorded.  Other exits are already
> -  correct.  */
> +  correct.  Because of versioning, skip vectors and others we must update
> +  the dominators of every node reachable by the new exits.  */
>if (multiple_exits_p)
>   {
> update_loop = new_loop;
> -   for (edge e : get_loop_exit_edges (loop))
> - doms.safe_push (e->dest);
> +   hash_set  visited;
> +   auto_vec  workset;
> +   edge ev;
> +   edge_iterator ei;
> +   workset.safe_splice 

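The patch replaces the per-exit dominator list with a walk that collects every block reachable from the new exits, using a visited set and a worklist. The plain-C sketch below shows that traversal over a small adjacency matrix. The fixed `MAX_BLOCKS` bound and all names are hypothetical.

```c
#include <stdbool.h>
#include <stddef.h>

#define MAX_BLOCKS 16  /* Hypothetical CFG size for the model.  */

/* Mark every block reachable from the NSTART blocks in START.
   SUCC[B][S] is true when there is an edge B -> S.  Each block is
   pushed at most once, so the worklist never exceeds NBLOCKS.
   Returns the number of blocks reached.  */
size_t
collect_reachable (bool succ[MAX_BLOCKS][MAX_BLOCKS], size_t nblocks,
                   const size_t *start, size_t nstart,
                   bool visited[MAX_BLOCKS])
{
  size_t worklist[MAX_BLOCKS];
  size_t top = 0, count = 0;

  for (size_t b = 0; b < nblocks; b++)
    visited[b] = false;
  for (size_t i = 0; i < nstart; i++)
    if (!visited[start[i]])
      {
        visited[start[i]] = true;
        worklist[top++] = start[i];
      }
  while (top > 0)
    {
      size_t b = worklist[--top];
      count++;
      for (size_t s = 0; s < nblocks; s++)
        if (succ[b][s] && !visited[s])
          {
            visited[s] = true;
            worklist[top++] = s;
          }
    }
  return count;
}
```

Each call may visit most of the function's blocks. Doing this once per vectorized early-exit loop is what makes the approach quadratic in the number of such loops, which is the objection raised in the review.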
Re: [PATCH]middle-end: rejects loops with nonlinear inductions and early breaks [PR113163]

2024-01-08 Thread Richard Biener
On Fri, 29 Dec 2023, Tamar Christina wrote:

> Hi All,
> 
> We can't support nonlinear inductions other than neg when vectorizing
> early breaks and iteration count is known.
> 
> For early break we currently require a peeled epilog but in these cases
> we can't compute the remaining values.
> 
> Bootstrapped Regtested on aarch64-none-linux-gnu and no issues.
> tested on cross cc1 for amdgcn-amdhsa and issue fixed.
> 
> Ok for master?
> 
> Thanks,
> Tamar
> 
> gcc/ChangeLog:
> 
>   PR middle-end/113163
>   * tree-vect-loop-manip.cc (vect_can_peel_nonlinear_iv_p):

Misses sth.

> gcc/testsuite/ChangeLog:
> 
>   PR middle-end/113163
>   * gcc.target/gcn/pr113163.c: New test.
> 
> --- inline copy of patch -- 
> diff --git a/gcc/testsuite/gcc.target/gcn/pr113163.c b/gcc/testsuite/gcc.target/gcn/pr113163.c
> new file mode 100644
> index ..99b0fdbaf3a3152ca008b5109abf6e80d8cb3d6a
> --- /dev/null
> +++ b/gcc/testsuite/gcc.target/gcn/pr113163.c
> @@ -0,0 +1,30 @@
> +/* { dg-do compile } */
> +/* { dg-additional-options "-O2 -ftree-vectorize" } */ 
> +
> +struct _reent { union { struct { char _l64a_buf[8]; } _reent; } _new; };
> +static const char R64_ARRAY[] = "./0123456789ABCDEFGHIJKLMNOPQRSTUVWXYZabcdefghijklmnopqrstuvwxyz";
> +char *
> +_l64a_r (struct _reent *rptr,
> + long value)
> +{
> +  char *ptr;
> +  char *result;
> +  int i, index;
> +  unsigned long tmp = (unsigned long)value & 0x;
> +  result = 
> +  ((
> +  rptr
> +  )->_new._reent._l64a_buf)
> +   ;
> +  ptr = result;
> +  for (i = 0; i < 6; ++i)
> +{
> +  if (tmp == 0)
> + {
> +   *ptr = '\0';
> +   break;
> + }
> +  *ptr++ = R64_ARRAY[index];
> +  tmp >>= 6;
> +}
> +}
> diff --git a/gcc/tree-vect-loop-manip.cc b/gcc/tree-vect-loop-manip.cc
> index 3810983a80c8b989be9fd9a9993642069fd39b99..f1bf43b3731868e7b053c186302fbeaf515be8cf 100644
> --- a/gcc/tree-vect-loop-manip.cc
> +++ b/gcc/tree-vect-loop-manip.cc
> @@ -2075,6 +2075,22 @@ vect_can_peel_nonlinear_iv_p (loop_vec_info loop_vinfo,
>return false;
>  }
>  
> +  /* We can't support partial vectors and early breaks with an induction
> + type other than add or neg since we require the epilog and can't
> + perform the peeling.  PR113163.  */
> +  if (LOOP_VINFO_EARLY_BREAKS (loop_vinfo)
> +  && LOOP_VINFO_VECT_FACTOR (loop_vinfo).is_constant ()

But why's that only for constant VF?  We might never end up here
with variable VF but the check looks odd ...

OK with that clarified and/or the test removed.

Thanks,
Richard.

> +  && LOOP_VINFO_USING_PARTIAL_VECTORS_P (loop_vinfo)
> +  && induction_type != vect_step_op_neg)
> +{
> +  if (dump_enabled_p ())
> + dump_printf_loc (MSG_MISSED_OPTIMIZATION, vect_location,
> +  "Peeling for epilogue is not supported"
> +  " for nonlinear induction except neg"
> +  " when iteration count is known and early breaks.\n");
> +  return false;
> +}
> +
>return true;
>  }
>  
> 
> 
> 
> 
> 

-- 
Richard Biener 
SUSE Software Solutions Germany GmbH,
Frankenstrasse 146, 90461 Nuernberg, Germany;
GF: Ivo Totev, Andrew McDonald, Werner Knoblich; (HRB 36809, AG Nuernberg)
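The new early return in `vect_can_peel_nonlinear_iv_p` fires only when four conditions hold at once. The hypothetical C model below covers just that conjunction. The enum captures the idea of nonlinear step kinds but is not GCC's own type.

```c
#include <stdbool.h>

/* Hypothetical nonlinear induction step kinds.  */
enum step_op { STEP_NEG, STEP_MUL, STEP_SHL, STEP_SHR };

/* Model of the added check: reject when early breaks, a constant VF,
   partial vectors and a nonlinear induction other than negation all
   coincide.  */
bool
can_peel_nonlinear_iv (bool early_breaks, bool vf_is_constant,
                       bool partial_vectors, enum step_op op)
{
  if (early_breaks && vf_is_constant && partial_vectors && op != STEP_NEG)
    return false;
  return true;
}
```

The testcase's `tmp >>= 6` is the kind of shift induction modeled here as `STEP_SHR`. With a variable VF, the same loop would pass the check, which is what the review question about the constant-VF condition is getting at.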


Re: [PATCH v2] c++/modules: Differentiate extern templates and TYPE_DECL_SUPPRESS_DEBUG [PR112820]

2024-01-08 Thread Richard Biener
On Mon, Jan 8, 2024 at 10:58 AM Nathaniel Shead
 wrote:
>
> On Thu, Jan 04, 2024 at 03:39:15PM -0500, Patrick Palka wrote:
> > On Sun, 3 Dec 2023, Nathaniel Shead wrote:
> >
> > > Bootstrapped and regtested on x86_64-pc-linux-gnu, OK for trunk?
> > >
> > > -- >8 --
> > >
> > > The TYPE_DECL_SUPPRESS_DEBUG and DECL_EXTERNAL flags use the same
> > > underlying bit. This is causing confusion when attempting to determine
> > > the interface for a streamed-in class type, since the modules code
> > > currently assumes that all DECL_EXTERNAL types are extern templates.
> > > However, when -g is specified then TYPE_DECL_SUPPRESS_DEBUG (and hence
> > > DECL_EXTERNAL) is marked on various other kinds of declarations, such as
> > > vtables, which causes them to never be emitted.
> >
> > Good catch.  Maybe we should use different bits for these flags?  I wouldn't
> > be surprised if this bit sharing causes issues elsewhere in the compiler.  The
> > documentation in tree.h / tree-core.h says DECL_EXTERNAL is only valid for
> > VAR_DECL and FUNCTION_DECL, so at one point it was safe to share the same bit,
> > but that's not true anymore, it seems.
> >
> > Looking at tree-core.h:tree_decl_common, luckily we have plenty of spare bits.
> > We could also e.g. make TYPE_DECL_SUPPRESS_DEBUG use the decl_not_flexarray bit,
> > which is otherwise only used for FIELD_DECL.
> >
>
> That seems like a good idea, thanks. How does this look?
>
> Bootstrapped and regtested on x86_64-pc-linux-gnu, OK for trunk?

OK if C++ folks are fine.

Richard.

> -- >8 --
>
> Currently, DECL_EXTERNAL and TYPE_DECL_SUPPRESS_DEBUG share a bit. This
> causes issues with module code, which then incorrectly assumes that
> anything with suppressed debug info (such as vtables when '-g' is
> specified) is an extern template and thus prevents their emission.
>
> This patch splits the two flags up; extern templates continue to use the
> DECL_EXTERNAL flag (and the documentation is updated to indicate this),
> but TYPE_DECL_SUPPRESS_DEBUG now uses the 'decl_not_flexarray' flag,
> which currently is only used by FIELD_DECLs.
>
> PR c++/112820
> PR c++/102607
>
> gcc/cp/ChangeLog:
>
> * pt.cc (mark_class_instantiated): Set DECL_EXTERNAL explicitly.
>
> gcc/ChangeLog:
>
> * tree-core.h (struct tree_decl_common): Update comments.
> * tree.h (DECL_EXTERNAL): Update comments.
> (TYPE_DECL_SUPPRESS_DEBUG): Use 'decl_not_flexarray' instead.
>
> gcc/testsuite/ChangeLog:
>
> * g++.dg/modules/debug-2_a.C: New test.
> * g++.dg/modules/debug-2_b.C: New test.
> * g++.dg/modules/debug-2_c.C: New test.
> * g++.dg/modules/debug-3_a.C: New test.
> * g++.dg/modules/debug-3_b.C: New test.
>
> Signed-off-by: Nathaniel Shead 
> ---
>  gcc/cp/pt.cc | 1 +
>  gcc/testsuite/g++.dg/modules/debug-2_a.C | 9 +
>  gcc/testsuite/g++.dg/modules/debug-2_b.C | 8 
>  gcc/testsuite/g++.dg/modules/debug-2_c.C | 9 +
>  gcc/testsuite/g++.dg/modules/debug-3_a.C | 8 
>  gcc/testsuite/g++.dg/modules/debug-3_b.C | 9 +
>  gcc/tree-core.h  | 6 +++---
>  gcc/tree.h   | 8 
>  8 files changed, 51 insertions(+), 7 deletions(-)
>  create mode 100644 gcc/testsuite/g++.dg/modules/debug-2_a.C
>  create mode 100644 gcc/testsuite/g++.dg/modules/debug-2_b.C
>  create mode 100644 gcc/testsuite/g++.dg/modules/debug-2_c.C
>  create mode 100644 gcc/testsuite/g++.dg/modules/debug-3_a.C
>  create mode 100644 gcc/testsuite/g++.dg/modules/debug-3_b.C
>
> diff --git a/gcc/cp/pt.cc b/gcc/cp/pt.cc
> index e38e7a773f0..7839745035b 100644
> --- a/gcc/cp/pt.cc
> +++ b/gcc/cp/pt.cc
> @@ -26256,6 +26256,7 @@ mark_class_instantiated (tree t, int extern_p)
>SET_CLASSTYPE_EXPLICIT_INSTANTIATION (t);
>SET_CLASSTYPE_INTERFACE_KNOWN (t);
>CLASSTYPE_INTERFACE_ONLY (t) = extern_p;
> +  DECL_EXTERNAL (TYPE_NAME (t)) = extern_p;
>TYPE_DECL_SUPPRESS_DEBUG (TYPE_NAME (t)) = extern_p;
>if (! extern_p)
>  {
> diff --git a/gcc/testsuite/g++.dg/modules/debug-2_a.C b/gcc/testsuite/g++.dg/modules/debug-2_a.C
> new file mode 100644
> index 000..eed0905542b
> --- /dev/null
> +++ b/gcc/testsuite/g++.dg/modules/debug-2_a.C
> @@ -0,0 +1,9 @@
> +// PR c++/112820
> +// { dg-additional-options "-fmodules-ts -g" }
> +// { dg-module-cmi io }
> +
> +export module io;
> +
> +export struct error {
> +  virtual const char* what() const noexcept;
> +};
> diff --git a/gcc/testsuite/g++.dg/modules/debug-2_b.C b/gcc/testsuite/g++.dg/modules/debug-2_b.C
> new file mode 100644
> index 000..fc9afbc02e0
> --- /dev/null
> +++ b/gcc/testsuite/g++.dg/modules/debug-2_b.C
> @@ -0,0 +1,8 @@
> +// PR c++/112820
> +// { dg-additional-options "-fmodules-ts -g" }
> +
> +module io;
> +
> +const char* error::what() const noexcept {
> +  return "bla";
> +}
> diff --git 

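The problem is easy to reproduce in miniature: two accessor macros that expand to the same bit-field cannot be told apart. Below is a hypothetical reduced model. The struct and macro names are invented; only the idea of moving the second flag to a spare bit such as `decl_not_flexarray` comes from the patch.

```c
#include <stdbool.h>

/* Hypothetical, heavily reduced stand-in for tree_decl_common.  */
struct decl
{
  unsigned external_flag : 1;
  unsigned decl_not_flexarray : 1;
};

/* Before the split: both accessors read and write the same bit.  */
#define OLD_DECL_EXTERNAL(d)            ((d)->external_flag)
#define OLD_TYPE_DECL_SUPPRESS_DEBUG(d) ((d)->external_flag)

/* After the split: suppress-debug gets a bit of its own.  */
#define NEW_DECL_EXTERNAL(d)            ((d)->external_flag)
#define NEW_TYPE_DECL_SUPPRESS_DEBUG(d) ((d)->decl_not_flexarray)

/* Simulate -g marking a type decl as debug-suppressed, then ask
   whether it looks like an extern template.  */
bool
looks_extern_old (void)
{
  struct decl d = { 0, 0 };
  OLD_TYPE_DECL_SUPPRESS_DEBUG (&d) = 1;
  return OLD_DECL_EXTERNAL (&d);
}

bool
looks_extern_new (void)
{
  struct decl d = { 0, 0 };
  NEW_TYPE_DECL_SUPPRESS_DEBUG (&d) = 1;
  return NEW_DECL_EXTERNAL (&d);
}
```

This is also why the pt.cc hunk now sets DECL_EXTERNAL explicitly in mark_class_instantiated. Once the bits are split, setting TYPE_DECL_SUPPRESS_DEBUG no longer sets DECL_EXTERNAL as a side effect.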
Re: [PATCH] Add a late-combine pass [PR106594]

2024-01-08 Thread Richard Sandiford
Jeff Law  writes:
> The other issue that's been in the back of my mind is costing.  But I 
> think the model here is combine without regards to cost.

No, it does take costing into account.  For size, it's the usual
"sum up the before and after insn costs and see which one is lower".
For speed, the costs are weighted by execution frequency, so e.g.
two insns of cost 4 in the same block can be combined into a single
instruction of cost 8, but a hoisted invariant can only be combined
into a loop body instruction if the loop body instruction's cost
doesn't increase significantly.

This is done by rtl_ssa::changes_are_worthwhile.

Thanks,
Richard
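The sketch below is a simplified model of the speed costing described above. It assumes each instruction's cost is weighted by its block's execution frequency and that a change is accepted when the weighted total does not increase. It illustrates the idea only and is not the actual logic of `rtl_ssa::changes_are_worthwhile`. The costs come from the explanation above; the loop frequency of 10 is an assumption.

```c
#include <stdbool.h>
#include <stddef.h>

/* Hypothetical instruction summary: its cost and the relative
   execution frequency of its block.  */
struct insn_cost
{
  unsigned cost;
  unsigned freq;
};

static unsigned long
weighted_sum (const struct insn_cost *v, size_t n)
{
  unsigned long sum = 0;
  for (size_t i = 0; i < n; i++)
    sum += (unsigned long) v[i].cost * v[i].freq;
  return sum;
}

/* Accept the change only if the frequency-weighted cost does not go up.  */
bool
worthwhile (const struct insn_cost *before, size_t nbefore,
            const struct insn_cost *after, size_t nafter)
{
  return weighted_sum (after, nafter) <= weighted_sum (before, nbefore);
}
```

Combining two cost-4 instructions in the same block into one cost-8 instruction breaks even (8 vs. 8) and is accepted. Merging a hoisted invariant (cost 4, executed once) into a loop instruction (cost 4, executed 10 times) so that the loop instruction's cost becomes 8 raises the weighted cost from 4 + 40 = 44 to 80, so it is rejected.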


Re: [PATCH] strub: Only unbias stack point for SPARC_STACK_BOUNDARY_HACK [PR113100]

2024-01-08 Thread Richard Biener
On Mon, Jan 8, 2024 at 3:35 AM Kewen.Lin  wrote:
>
> Hi,
>
> As PR113100 shows, the unbiasing introduced by r14-6737 can
> cause the scrubbing to overrun and clobber critical data on
> the stack, such as the saved TOC base, and consequently cause
> a segfault on Power.
>
> By checking PR112917, IMHO we should keep this unbiasing
> guarded under SPARC_STACK_BOUNDARY_HACK (TARGET_ARCH64 &&
> TARGET_STACK_BIAS), similar to some existing code special
> treating SPARC stack bias.
>
> Bootstrapped and regtested on x86_64-redhat-linux and
> powerpc64{,le}-linux-gnu.  All reported failures in
> PR113100 are gone.  I also expect the culprit commit can
> affect those ports with nonzero STACK_POINTER_OFFSET.
>
> Is it ok for trunk?

OK

> BR,
> Kewen
> -
> PR middle-end/113100
>
> gcc/ChangeLog:
>
> * builtins.cc (expand_builtin_stack_address): Guard stack pointer
> adjustment with SPARC_STACK_BOUNDARY_HACK.
> ---
>  gcc/builtins.cc | 5 -
>  1 file changed, 4 insertions(+), 1 deletion(-)
>
> diff --git a/gcc/builtins.cc b/gcc/builtins.cc
> index 125ea158ebf..9bad1e962b4 100644
> --- a/gcc/builtins.cc
> +++ b/gcc/builtins.cc
> @@ -5450,6 +5450,7 @@ expand_builtin_stack_address ()
>rtx ret = convert_to_mode (ptr_mode, copy_to_reg (stack_pointer_rtx),
>  STACK_UNSIGNED);
>
> +#ifdef SPARC_STACK_BOUNDARY_HACK
>/* Unbias the stack pointer, bringing it to the boundary between the
>   stack area claimed by the active function calling this builtin,
>   and stack ranges that could get clobbered if it called another
> @@ -5476,7 +5477,9 @@ expand_builtin_stack_address ()
>   (caller) function's active area as well, whereas those pushed or
>   allocated temporarily for a call are regarded as part of the
>   callee's stack range, rather than the caller's.  */
> -  ret = plus_constant (ptr_mode, ret, STACK_POINTER_OFFSET);
> +  if (SPARC_STACK_BOUNDARY_HACK)
> +ret = plus_constant (ptr_mode, ret, STACK_POINTER_OFFSET);
> +#endif
>
>return force_reg (ptr_mode, ret);
>  }
> --
> 2.39.3
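The guard in the patch combines a compile-time check (only ports that define SPARC_STACK_BOUNDARY_HACK get the code) with a run-time check (the macro expands to TARGET_ARCH64 && TARGET_STACK_BIAS, which depends on target flags). A minimal C model of that shape follows; the flag variables and the value 2047 are illustrative assumptions standing in for the real target macros, not values taken from any port.

```c
#include <assert.h>
#include <stdint.h>

/* Hypothetical stand-ins for the SPARC target flags
   (TARGET_ARCH64 and TARGET_STACK_BIAS in the real port). */
static int target_arch64 = 1;
static int target_stack_bias = 1;

/* Only a SPARC-like port would define this macro; removing the
   #define models every other target, e.g. Power. */
#define SPARC_STACK_BOUNDARY_HACK (target_arch64 && target_stack_bias)

/* Arbitrary illustrative value; each port defines its own offset. */
#define STACK_POINTER_OFFSET 2047

/* Same shape as the guarded adjustment: the #ifdef drops the code on
   ports without the macro, the if tests its run-time value. */
static uintptr_t
stack_address (uintptr_t sp)
{
  uintptr_t ret = sp;
#ifdef SPARC_STACK_BOUNDARY_HACK
  if (SPARC_STACK_BOUNDARY_HACK)
    ret += STACK_POINTER_OFFSET;
#endif
  return ret;
}
```

With the flags set, the returned address is sp + STACK_POINTER_OFFSET; clearing target_stack_bias (or deleting the #define) returns sp unchanged, which is the behavior the patch restores for non-SPARC ports.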


Re: [PATCH] sparc: Char arrays are 64-bit aligned on SPARC

2024-01-08 Thread Daniel Cederman

On 2024-01-08 10:20, Eric Botcazou wrote:

pr88077 fails on SPARC since char HeaderStr[1] in pr88077_1.c and
long HeaderStr in pr88077_0.c differ in alignment.

warning: alignment 4 of normal symbol `HeaderStr' in c_lto_pr88077_0.o is
smaller than 8 used by the common definition in c_lto_pr88077_1.o


I have never seen it though.  Is that really a warning issued by GCC?



Hello Eric! Thank you for reviewing the patches!

No, this warning is not from GCC; it comes from binutils ld. I forgot to 
mention that in the message. Older versions of ld emit a similar warning, 
so I do not think it is a new warning. It is also there with GCC 10.


For the OK'ed patches (with your changes), can I push them to 
release/gcc-13 in addition to master?


/Daniel


Re: [PATCH] gimplify: Fix ICE in recalculate_side_effects [PR113228]

2024-01-08 Thread Richard Biener
On Sat, 6 Jan 2024, Jakub Jelinek wrote:

> Hi!
> 
> The following testcase ICEs during regimplification since the addition of
> (convert (eqne zero_one_valued_p@0 INTEGER_CST@1))
> simplification.  That simplification is novel in the sense that in
> gimplify_expr it can turn an expression (comparison in particular) into
> a SSA_NAME.  Normally when gimplify_expr sees originally a SSA_NAME, it does
> case SSA_NAME:
>   /* Allow callbacks into the gimplifier during optimization.  */
>   ret = GS_ALL_DONE;
>   break;
> and doesn't try to recalculate side effects because of that, but in this
> case gimplify_expr normally enters the:
> default:
>   switch (TREE_CODE_CLASS (TREE_CODE (*expr_p)))
> {
> case tcc_comparison:
> then does
>   *expr_p = gimple_boolify (*expr_p);
> and then
>   *expr_p = fold_convert_loc (input_location,
>   org_type, *expr_p);
> and this new match.pd simplification turns that tcc_comparison
> into an SSA_NAME.  Unlike the outer SSA_NAME handling, though, this falls
> through into
>   recalculate_side_effects (*expr_p);
> 
> dont_recalculate:
>   break;
> but unfortunately recalculate_side_effects doesn't handle SSA_NAME and ICEs
> on it.
> SSA_NAMEs never have TREE_SIDE_EFFECTS set, so the following
> patch fixes it by handling them similarly to the tcc_constant case.
> 
> Bootstrapped/regtested on x86_64-linux and i686-linux, ok for trunk?

OK.

Richard.

> 2024-01-06  Jakub Jelinek  
> 
>   PR tree-optimization/113228
>   * gimplify.cc (recalculate_side_effects): Do nothing for SSA_NAMEs.
> 
>   * gcc.c-torture/compile/pr113228.c: New test.
> 
> --- gcc/gimplify.cc.jj	2024-01-03 11:51:40.744603324 +0100
> +++ gcc/gimplify.cc   2024-01-05 13:32:34.351336320 +0100
> @@ -3344,6 +3344,9 @@ recalculate_side_effects (tree t)
>return;
>  
>  default:
> +  if (code == SSA_NAME)
> + /* No side-effects.  */
> + return;
>gcc_unreachable ();
> }
>  }
> --- gcc/testsuite/gcc.c-torture/compile/pr113228.c.jj	2024-01-05 13:27:42.876330301 +0100
> +++ gcc/testsuite/gcc.c-torture/compile/pr113228.c	2024-01-05 13:27:22.503609458 +0100
> @@ -0,0 +1,17 @@
> +/* PR tree-optimization/113228 */
> +
> +int a, b, c, d, i;
> +
> +void
> +foo (void)
> +{
> +  int k[3] = {};
> +  int *l = 
> +  for (d = 0; c; c--)
> +for (i = 0; i <= 9; i++)
> +  {
> + for (b = 1; b <= 4; b++)
> +   k[0] = k[0] == 0;
> + *l |= k[d];
> +  }
> +}
> 
>   Jakub
> 
> 

-- 
Richard Biener 
SUSE Software Solutions Germany GmbH,
Frankenstrasse 146, 90461 Nuernberg, Germany;
GF: Ivo Totev, Andrew McDonald, Werner Knoblich; (HRB 36809, AG Nuernberg)
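The simplification Jakub mentions relies on simple identities for a value x already known to be 0 or 1 (the property zero_one_valued_p matches). The C sketch below, written by hand and not taken from match.pd, checks those identities. For the x != 0 and x == 1 forms the folded result is x itself, which is presumably how fold_convert_loc can hand a comparison back to gimplify_expr as a bare SSA_NAME.

```c
#include <assert.h>

/* Folded forms of (int) (x CMP c) for x known to be 0 or 1. */
static int fold_eq_zero (int x) { return x ^ 1; } /* (int) (x == 0) */
static int fold_ne_zero (int x) { return x; }     /* (int) (x != 0) */
static int fold_eq_one (int x)  { return x; }     /* (int) (x == 1) */
static int fold_ne_one (int x)  { return x ^ 1; } /* (int) (x != 1) */

/* Compare every folded form against the original comparison for both
   possible inputs; returns 1 when all four identities hold. */
static int
identities_hold (void)
{
  for (int x = 0; x <= 1; x++)
    if ((x == 0) != fold_eq_zero (x)
        || (x != 0) != fold_ne_zero (x)
        || (x == 1) != fold_eq_one (x)
        || (x != 1) != fold_ne_one (x))
      return 0;
  return 1;
}
```

The identities only hold because the input is restricted to {0, 1}; for x == 2, for instance, (x != 0) is 1 but x is 2, which is why the pattern is gated on zero_one_valued_p.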


[PATCH] tree-optimization/113026 - avoid vector epilog in more cases

2024-01-08 Thread Richard Biener
The following avoids creating a niter peeling epilog more consistently,
matching what peeling later uses for the skip_vector condition, in
particular when versioning is required, which then also ensures the
vector loop is entered unless the epilog is vectorized.  This should
ideally match LOOP_VINFO_VERSIONING_THRESHOLD, which is only computed
later; some refactoring could make the two match better.

The patch also makes sure to adjust the upper bound of the epilogues
when we do not have a skip edge around the vector loop.
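The bound adjustment can be summarized with a little arithmetic: without a skip edge, the vector loop has already run at least lowest_vf scalar iterations, so those can be subtracted from the epilogue's bound (unless there are early breaks), and the epilogue itself covers at most VF - 1 iterations, plus one when peeling for gaps. The C model below is a simplified sketch under those assumptions; it uses plain integers instead of GCC's widest_int/poly_int types, and the function name is hypothetical. The patch applies the same update to the likely upper bound and to the estimate.

```c
#include <assert.h>
#include <stdint.h>

static uint64_t
umin (uint64_t a, uint64_t b)
{
  return a < b ? a : b;
}

/* Simplified model of the epilogue bound update when no edge skips
   the vector loop.  BOUND is the known upper bound, VF a constant
   vectorization factor (>= 1) and LOWEST_VF its lower bound. */
static uint64_t
epilog_upper_bound (uint64_t bound, uint64_t lowest_vf, uint64_t vf,
                    int peeling_for_gaps, int early_breaks)
{
  if (!early_breaks)
    {
      /* The vector loop covered at least LOWEST_VF iterations.  */
      assert (bound >= lowest_vf);
      bound -= lowest_vf;
    }
  /* At most VF - 1 iterations remain, plus one with gap peeling.  */
  return umin (bound, vf - 1 + (uint64_t) (peeling_for_gaps != 0));
}
```

For example, a bound of 16 with VF 8 becomes min (16 - 8, 7) = 7, while a bound of 10 becomes min (10 - 8, 7) = 2.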

Bootstrapped and tested on x86_64-unknown-linux-gnu.  Tamar, does
that look OK wrt early-breaks?

Thanks,
Richard.

PR tree-optimization/113026
* tree-vect-loop.cc (vect_need_peeling_or_partial_vectors_p):
Avoid an epilog in more cases.
* tree-vect-loop-manip.cc (vect_do_peeling): Adjust the
epilogue's niter upper bounds and estimates.

* gcc.dg/torture/pr113026-1.c: New testcase.
* gcc.dg/torture/pr113026-2.c: Likewise.
---
 gcc/testsuite/gcc.dg/torture/pr113026-1.c | 11 
 gcc/testsuite/gcc.dg/torture/pr113026-2.c | 18 +
 gcc/tree-vect-loop-manip.cc   | 32 +++
 gcc/tree-vect-loop.cc |  6 -
 4 files changed, 66 insertions(+), 1 deletion(-)
 create mode 100644 gcc/testsuite/gcc.dg/torture/pr113026-1.c
 create mode 100644 gcc/testsuite/gcc.dg/torture/pr113026-2.c

diff --git a/gcc/testsuite/gcc.dg/torture/pr113026-1.c b/gcc/testsuite/gcc.dg/torture/pr113026-1.c
new file mode 100644
index 000..56dfef3b36c
--- /dev/null
+++ b/gcc/testsuite/gcc.dg/torture/pr113026-1.c
@@ -0,0 +1,11 @@
+/* { dg-do compile } */ 
+/* { dg-additional-options "-Wall" } */
+
+char dst[16];
+
+void
+foo (char *src, long n)
+{
+  for (long i = 0; i < n; i++)
+dst[i] = src[i]; /* { dg-bogus "" } */
+}
diff --git a/gcc/testsuite/gcc.dg/torture/pr113026-2.c b/gcc/testsuite/gcc.dg/torture/pr113026-2.c
new file mode 100644
index 000..b9d5857a403
--- /dev/null
+++ b/gcc/testsuite/gcc.dg/torture/pr113026-2.c
@@ -0,0 +1,18 @@
+/* { dg-do compile } */ 
+/* { dg-additional-options "-Wall" } */
+
+char dst1[17];
+void
+foo1 (char *src, long n)
+{
+  for (long i = 0; i < n; i++)
+dst1[i] = src[i]; /* { dg-bogus "" } */
+}
+
+char dst2[18];
+void
+foo2 (char *src, long n)
+{
+  for (long i = 0; i < n; i++)
+dst2[i] = src[i]; /* { dg-bogus "" } */
+}
diff --git a/gcc/tree-vect-loop-manip.cc b/gcc/tree-vect-loop-manip.cc
index 9330183bfb9..927f76a0947 100644
--- a/gcc/tree-vect-loop-manip.cc
+++ b/gcc/tree-vect-loop-manip.cc
@@ -3364,6 +3364,38 @@ vect_do_peeling (loop_vec_info loop_vinfo, tree niters, tree nitersm1,
bb_before_epilog->count = single_pred_edge 
(bb_before_epilog)->count ();
  bb_before_epilog = loop_preheader_edge (epilog)->src;
}
+  else
+   {
+ /* When we do not have a loop-around edge to the epilog we know
+the vector loop covered at least VF scalar iterations unless
+we have early breaks and the epilog will cover at most
+VF - 1 + gap peeling iterations.
+Update any known upper bound with this knowledge.  */
+ if (! LOOP_VINFO_EARLY_BREAKS (loop_vinfo))
+   {
+ if (epilog->any_upper_bound)
+   epilog->nb_iterations_upper_bound -= lowest_vf;
+ if (epilog->any_likely_upper_bound)
+   epilog->nb_iterations_likely_upper_bound -= lowest_vf;
+ if (epilog->any_estimate)
+   epilog->nb_iterations_estimate -= lowest_vf;
+   }
+ unsigned HOST_WIDE_INT const_vf;
+ if (vf.is_constant (&const_vf))
+   {
+ const_vf += LOOP_VINFO_PEELING_FOR_GAPS (loop_vinfo) - 1;
+ if (epilog->any_upper_bound)
+   epilog->nb_iterations_upper_bound
+ = wi::umin (epilog->nb_iterations_upper_bound, const_vf);
+ if (epilog->any_likely_upper_bound)
+   epilog->nb_iterations_likely_upper_bound
+ = wi::umin (epilog->nb_iterations_likely_upper_bound,
+ const_vf);
+ if (epilog->any_estimate)
+   epilog->nb_iterations_estimate
+ = wi::umin (epilog->nb_iterations_estimate, const_vf);
+   }
+   }
 
   /* If loop is peeled for non-zero constant times, now niters refers to
 orig_niters - prolog_peeling, it won't overflow even the orig_niters
diff --git a/gcc/tree-vect-loop.cc b/gcc/tree-vect-loop.cc
index a06771611ac..9dd573ef125 100644
--- a/gcc/tree-vect-loop.cc
+++ b/gcc/tree-vect-loop.cc
@@ -1261,7 +1261,11 @@ vect_need_peeling_or_partial_vectors_p (loop_vec_info loop_vinfo)
 the epilogue is unnecessary.  */
  && (!LOOP_REQUIRES_VERSIONING (loop_vinfo)
  || ((unsigned HOST_WIDE_INT) max_niter
- > (th / const_vf) * const_vf
+ /* We'd like to 

[PATCH] btf: print string position as comment for validation and testing purposes.

2024-01-08 Thread Cupertino Miranda
Hi everyone,

This patch adds a comment to the BTF strings regarding their position
within the section. This is useful for assembly inspection purposes.

Regards,
Cupertino

When using -dA, this function only printed btf_string or btf_aux_string
as the comment.
This patch changes the comment to also include the position of the
string within the section, in hexadecimal.
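The position printed is just a running byte offset into the string section: each entry advances it by its length plus one for the terminating NUL, and the patch carries the same counter on from the main table into the aux table. The C sketch below illustrates that computation with a hypothetical helper and an illustrative output format (not dw2_asm_output_nstring's). It takes the starting offset as a parameter and returns the end offset instead of using a function-scope static, since a static local keeps its value across calls.

```c
#include <assert.h>
#include <stdio.h>
#include <string.h>

/* Print each string of a table with its byte offset, starting at
   START; return the offset just past the last string. */
static unsigned
print_str_offsets (const char *const *strs, size_t n, unsigned start)
{
  unsigned pos = start;
  for (size_t i = 0; i < n; i++)
    {
      printf ("\"%s\"\t# str_pos = 0x%x\n", strs[i], pos);
      pos += (unsigned) strlen (strs[i]) + 1;  /* +1 for the NUL */
    }
  return pos;
}
```

For the table { "", "int", "main" } starting at 0, the strings are reported at 0x0, 0x1 and 0x5, and the function returns 10, the offset at which an aux table would continue.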

gcc/ChangeLog:
* btfout.cc (output_btf_strs): Print the string position in the
-dA comments.
---
 gcc/btfout.cc | 7 +--
 1 file changed, 5 insertions(+), 2 deletions(-)

diff --git a/gcc/btfout.cc b/gcc/btfout.cc
index db4f1084f85c..04218adc9e66 100644
--- a/gcc/btfout.cc
+++ b/gcc/btfout.cc
@@ -1081,17 +1081,20 @@ static void
 output_btf_strs (ctf_container_ref ctfc)
 {
   ctf_string_t * ctf_string = ctfc->ctfc_strtable.ctstab_head;
+  static int str_pos = 0;
 
   while (ctf_string)
 {
-  dw2_asm_output_nstring (ctf_string->cts_str, -1, "btf_string");
+  dw2_asm_output_nstring (ctf_string->cts_str, -1, "btf_string, str_pos = 0x%x", str_pos);
+  str_pos += strlen(ctf_string->cts_str) + 1;
   ctf_string = ctf_string->cts_next;
 }
 
   ctf_string = ctfc->ctfc_aux_strtable.ctstab_head;
   while (ctf_string)
 {
-  dw2_asm_output_nstring (ctf_string->cts_str, -1, "btf_aux_string");
+  dw2_asm_output_nstring (ctf_string->cts_str, -1, "btf_aux_string, str_pos = 0x%x", str_pos);
+  str_pos += strlen(ctf_string->cts_str) + 1;
   ctf_string = ctf_string->cts_next;
 }
 }
-- 
2.30.2



[PATCH] bpf: Correct BTF for kernel_helper attributed decls.

2024-01-08 Thread Cupertino Miranda
Hi everyone,

This patch addresses the problem reported in:
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=113225

Looking forward to your review.

Cheers,
Cupertino


This patch fixes a problem with the BTF information for kernel_helper
attributed declarations, for which a BTF_KIND_FUNC entry was incorrectly
generated.
Although such an entry is accurate for traditional extern function
declarations, once the function is attributed with kernel_helper it is
semantically incompatible with the kernel helpers in the BPF infrastructure.
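The new check in btf_collect_datasec skips a function only when both conditions hold: the decl is external and it carries the kernel_helper attribute. The C model below restates that predicate against a hypothetical, simplified decl struct (the two booleans stand in for DECL_EXTERNAL and the lookup_attribute test) and mirrors the expectations of the new testcase.

```c
#include <assert.h>
#include <stdbool.h>

/* Hypothetical, simplified view of a function decl. */
struct fn_decl
{
  const char *name;
  bool is_external;        /* stands in for DECL_EXTERNAL */
  bool has_kernel_helper;  /* stands in for the lookup_attribute test */
};

/* True when no BTF_KIND_FUNC entry should be emitted for D:
   an extern declaration carrying the kernel_helper attribute. */
static bool
skip_btf_func (const struct fn_decl *d)
{
  return d->is_external && d->has_kernel_helper;
}
```

So foo_helper is skipped, while foo_nohelper (extern, no attribute) and bar (a definition) still get their BTF_KIND_FUNC entries.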

gcc/ChangeLog:
PR target/113225
* btfout.cc (btf_collect_datasec): Skip creating BTF info for
extern and kernel_helper attributed function decls.
gcc/testsuite/ChangeLog:
* gcc.target/bpf/attr-kernel-helper.c: New test.
---
 gcc/btfout.cc |  7 +++
 gcc/testsuite/gcc.target/bpf/attr-kernel-helper.c | 15 +++
 2 files changed, 22 insertions(+)
 create mode 100644 gcc/testsuite/gcc.target/bpf/attr-kernel-helper.c

diff --git a/gcc/btfout.cc b/gcc/btfout.cc
index 04218adc9e66..39e7bec43bfb 100644
--- a/gcc/btfout.cc
+++ b/gcc/btfout.cc
@@ -35,6 +35,8 @@ along with GCC; see the file COPYING3.  If not see
 #include "diagnostic-core.h"
 #include "cgraph.h"
 #include "varasm.h"
+#include "stringpool.h"
+#include "attribs.h"
 #include "dwarf2out.h" /* For lookup_decl_die.  */
 
 static int btf_label_num;
@@ -429,6 +431,11 @@ btf_collect_datasec (ctf_container_ref ctfc)
   if (dtd == NULL)
continue;
 
+  if (DECL_EXTERNAL (func->decl)
+ && (lookup_attribute ("kernel_helper",
+   DECL_ATTRIBUTES (func->decl))) != NULL_TREE)
+   continue;
+
   /* Functions actually get two types: a BTF_KIND_FUNC_PROTO, and
 also a BTF_KIND_FUNC.  But the CTF container only allocates one
 type per function, which matches closely with BTF_KIND_FUNC_PROTO.
diff --git a/gcc/testsuite/gcc.target/bpf/attr-kernel-helper.c b/gcc/testsuite/gcc.target/bpf/attr-kernel-helper.c
new file mode 100644
index ..7c5a0007c979
--- /dev/null
+++ b/gcc/testsuite/gcc.target/bpf/attr-kernel-helper.c
@@ -0,0 +1,15 @@
+/* Basic test for kernel_helper attribute BTF information.  */
+
+/* { dg-do compile } */
+/* { dg-options "-O0 -dA -gbtf" } */
+
+extern int foo_helper(int) __attribute((kernel_helper(42)));
+extern int foo_nohelper(int);
+
+int bar (int arg)
+{
+  return foo_helper (arg) + foo_nohelper (arg);
+}
+
+/* { dg-final { scan-assembler-times "BTF_KIND_FUNC 'foo_nohelper'" 1 } } */
+/* { dg-final { scan-assembler-times "BTF_KIND_FUNC 'foo_helper'" 0 } } */
-- 
2.30.2



Re: [PATCH v3] RISC-V: Bugfix for doesn't honor no-signed-zeros option

2024-01-08 Thread Richard Biener
On Tue, Jan 2, 2024 at 2:37 PM  wrote:
>
> From: Pan Li 
>
> According to the semantics of the -fno-signed-zeros option, a backend
> like RISC-V should treat minus zero -0.0f as plus zero 0.0f.
>
> Consider the example below with the option -fno-signed-zeros.
>
> void
> test (float *a)
> {
>   *a = -0.0;
> }
>
> We generate the code below, which doesn't treat minus zero
> as plus zero.
>
> test:
>   lui  a5,%hi(.LC0)
>   flw  fa5,%lo(.LC0)(a5)
>   fsw  fa5,0(a0)
>   ret
>
> .LC0:
>   .word -2147483648 // aka -0.0 (0x80000000 in hex)
>
> This patch aims to fix the bug and treat minus zero -0.0 as
> plus zero, aka +0.0. Thus, after this patch, we get the asm code
> below for the above sample code.
>
> test:
>   sw zero,0(a0)
>   ret
>
> This patch also fixes the run failure of the test case pr30957-1.c. The
> following tests pass with this patch.

We don't really expect targets to do this.  The small testcase above
is somewhat ill-formed with -fno-signed-zeros.  Note there's no
-0.0 in pr30957-1.c so why does that one fail for you?  Does
the -fvariable-expansion-in-unroller code maybe not trigger for
riscv?

I think we should go to PR30957 and see what that was filed originally
for, the testcase doesn't make much sense to me.
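For reference, the constant in the first listing is the IEEE-754 single-precision encoding of -0.0f: only the sign bit is set, so the word is 0x80000000, which reads as -2147483648 as a signed 32-bit integer. The minimal C sketch below checks that encoding and models the kind of predicate the patch proposes. It is not code from the patch: float_const_zero_p and its honor_signed_zeros parameter are hypothetical stand-ins for riscv_float_const_zero_rtx_p and for whether signed zeros must be honored, and, as noted above, targets are not generally expected to do this.

```c
#include <assert.h>
#include <math.h>
#include <stdbool.h>
#include <stdint.h>
#include <string.h>

/* Raw IEEE-754 single-precision bit pattern of F. */
static uint32_t
float_bits (float f)
{
  uint32_t u;
  memcpy (&u, &f, sizeof u);
  return u;
}

/* Hypothetical model: +0.0 always counts as a zero constant, -0.0
   only when signed zeros need not be honored. */
static bool
float_const_zero_p (float f, bool honor_signed_zeros)
{
  return f == 0.0f && (!honor_signed_zeros || !signbit (f));
}
```

Note that -0.0f == 0.0f already compares true in C; the two values differ only in the sign bit, which is exactly what a store of the raw constant preserves.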

> * The riscv regression tests.
> * The pr30957-1.c run tests.
>
> gcc/ChangeLog:
>
> * config/riscv/constraints.md: Leverage func riscv_float_const_zero_rtx_p
> for predicating whether the rtx is a const zero float.
> * config/riscv/predicates.md: Ditto.
> * config/riscv/riscv.cc (riscv_const_insns): Ditto.
> (riscv_float_const_zero_rtx_p): New func impl for predicating whether
> the rtx is a const zero float.
> (riscv_const_zero_rtx_p): New func impl for predicating whether the rtx
> is a const zero (both int and fp).
> * config/riscv/riscv-protos.h (riscv_float_const_zero_rtx_p):
> New func decl.
> (riscv_const_zero_rtx_p): Ditto.
> * config/riscv/riscv.md: Make sure operand[1] of movfp is
> CONST0_RTX when operand[1] is a const zero float.
>
> gcc/testsuite/ChangeLog:
>
> * gcc.target/riscv/no-signed-zeros-0.c: New test.
> * gcc.target/riscv/no-signed-zeros-1.c: New test.
> * gcc.target/riscv/no-signed-zeros-2.c: New test.
> * gcc.target/riscv/no-signed-zeros-3.c: New test.
> * gcc.target/riscv/no-signed-zeros-4.c: New test.
> * gcc.target/riscv/no-signed-zeros-5.c: New test.
> * gcc.target/riscv/no-signed-zeros-run-0.c: New test.
> * gcc.target/riscv/no-signed-zeros-run-1.c: New test.
>
> Signed-off-by: Pan Li 
> ---
>  gcc/config/riscv/constraints.md   |  2 +-
>  gcc/config/riscv/predicates.md|  2 +-
>  gcc/config/riscv/riscv-protos.h   |  2 +
>  gcc/config/riscv/riscv.cc | 35 -
>  gcc/config/riscv/riscv.md | 49 ---
>  .../gcc.target/riscv/no-signed-zeros-0.c  | 26 ++
>  .../gcc.target/riscv/no-signed-zeros-1.c  | 28 +++
>  .../gcc.target/riscv/no-signed-zeros-2.c  | 26 ++
>  .../gcc.target/riscv/no-signed-zeros-3.c  | 28 +++
>  .../gcc.target/riscv/no-signed-zeros-4.c  | 26 ++
>  .../gcc.target/riscv/no-signed-zeros-5.c  | 28 +++
>  .../gcc.target/riscv/no-signed-zeros-run-0.c  | 36 ++
>  .../gcc.target/riscv/no-signed-zeros-run-1.c  | 36 ++
>  13 files changed, 314 insertions(+), 10 deletions(-)
>  create mode 100644 gcc/testsuite/gcc.target/riscv/no-signed-zeros-0.c
>  create mode 100644 gcc/testsuite/gcc.target/riscv/no-signed-zeros-1.c
>  create mode 100644 gcc/testsuite/gcc.target/riscv/no-signed-zeros-2.c
>  create mode 100644 gcc/testsuite/gcc.target/riscv/no-signed-zeros-3.c
>  create mode 100644 gcc/testsuite/gcc.target/riscv/no-signed-zeros-4.c
>  create mode 100644 gcc/testsuite/gcc.target/riscv/no-signed-zeros-5.c
>  create mode 100644 gcc/testsuite/gcc.target/riscv/no-signed-zeros-run-0.c
>  create mode 100644 gcc/testsuite/gcc.target/riscv/no-signed-zeros-run-1.c
>
> diff --git a/gcc/config/riscv/constraints.md b/gcc/config/riscv/constraints.md
> index de4359af00d..db1d5e1385f 100644
> --- a/gcc/config/riscv/constraints.md
> +++ b/gcc/config/riscv/constraints.md
> @@ -108,7 +108,7 @@ (define_constraint "DnS"
>  (define_constraint "G"
>"@internal"
>(and (match_code "const_double")
> -   (match_test "op == CONST0_RTX (mode)")))
> +   (match_test "riscv_float_const_zero_rtx_p (op)")))
>
>  (define_memory_constraint "A"
>"An address that is held in a general-purpose register."
> diff --git a/gcc/config/riscv/predicates.md b/gcc/config/riscv/predicates.md
> index b87a6900841..b428d842101 100644
> --- a/gcc/config/riscv/predicates.md
> +++ b/gcc/config/riscv/predicates.md
> @@ -78,7 +78,7 @@ (define_predicate "sleu_operand"
>
>  (define_predicate "const_0_operand"
>
