date:20240225

[PATCH] middle-end/114070 - folding breaking VEC_COND expansion

2024-02-25 Thread Richard Biener

The following properly guards the simplifications that move
operations into VEC_CONDs, in particular when that changes the
type constraints on this operation.

This needed a genmatch fix, since genmatch was recording spurious implicit
fors when tcc_comparison is used in a C expression.
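A scalar C model may help make the fold concrete. The sketch below is an illustration only: it represents 4-lane vectors as plain int arrays and uses hypothetical helpers (vec_cond, vec_add, distribution_holds) to check that (c ? a : b) + (c ? d : e) and c ? (a + d) : (b + e) agree in every lane. For an arithmetic op the lane type is the same on both sides. For a comparison op it is not: the comparison result has a mask-like type rather than the type of a, so after the fold the VEC_COND value operands change type, which is the case the new guard handles with types_match and expand_vec_cond_expr_p.

```c
#define LANES 4

/* Hypothetical helper: lane-wise select, res[i] = c[i] ? a[i] : b[i].  */
static void vec_cond (const int *c, const int *a, const int *b, int *res)
{
  for (int i = 0; i < LANES; i++)
    res[i] = c[i] ? a[i] : b[i];
}

/* Hypothetical helper: lane-wise addition.  */
static void vec_add (const int *x, const int *y, int *res)
{
  for (int i = 0; i < LANES; i++)
    res[i] = x[i] + y[i];
}

/* Return 1 if (c ? a : b) + (c ? d : e) equals c ? (a + d) : (b + e)
   in every lane, 0 otherwise.  */
int distribution_holds (const int *c, const int *a, const int *b,
                        const int *d, const int *e)
{
  int t1[LANES], t2[LANES], lhs[LANES];
  int ad[LANES], be[LANES], rhs[LANES];

  /* Original form: select first, then add.  */
  vec_cond (c, a, b, t1);
  vec_cond (c, d, e, t2);
  vec_add (t1, t2, lhs);

  /* Folded form: add in both arms, then select.  */
  vec_add (a, d, ad);
  vec_add (b, e, be);
  vec_cond (c, ad, be, rhs);

  for (int i = 0; i < LANES; i++)
    if (lhs[i] != rhs[i])
      return 0;
  return 1;
}
```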

Bootstrapped and tested on x86_64-unknown-linux-gnu, pushed.

PR middle-end/114070
* genmatch.cc (parser::parse_c_expr): Do not record operand
lists but only mark operators used.
* match.pd ((c ? a : b) op (c ? d : e)  -->  c ? (a op d) : (b op e)):
Properly guard the case of tcc_comparison changing the VEC_COND
value operand type.

* gcc.dg/torture/pr114070.c: New testcase.
---
 gcc/genmatch.cc                         |  6 ++----
 gcc/match.pd                            | 15 ++++++++++++---
 gcc/testsuite/gcc.dg/torture/pr114070.c | 12 ++++++++++++
 3 files changed, 26 insertions(+), 7 deletions(-)
 create mode 100644 gcc/testsuite/gcc.dg/torture/pr114070.c

diff --git a/gcc/genmatch.cc b/gcc/genmatch.cc
index 375ae90ae6c..d9ae436ce5c 100644
--- a/gcc/genmatch.cc
+++ b/gcc/genmatch.cc
@@ -4760,10 +4760,8 @@ parser::parse_c_expr (cpp_ttype start)
= (const char *)CPP_HASHNODE (token->val.node.node)->ident.str;
  if (strcmp (str, "return") == 0)
fatal_at (token, "return statement not allowed in C expression");
- id_base *idb = get_operator (str);
- user_id *p;
- if (idb && (p = dyn_cast (idb)) && p->is_oper_list)
-   record_operlist (token->src_loc, p);
+ /* Mark user operators corresponding to 'str' as used.  */
+ get_operator (str);
}
 
   /* Record the token.  */
diff --git a/gcc/match.pd b/gcc/match.pd
index c5b6540f939..67007fc2017 100644
--- a/gcc/match.pd
+++ b/gcc/match.pd
@@ -5149,15 +5149,24 @@ DEFINE_INT_AND_FLOAT_ROUND_FN (RINT)
 /* (c ? a : b) op (c ? d : e)  -->  c ? (a op d) : (b op e) */
  (simplify
   (op (vec_cond:s @0 @1 @2) (vec_cond:s @0 @3 @4))
-  (vec_cond @0 (op! @1 @3) (op! @2 @4)))
+  (if (TREE_CODE_CLASS (op) != tcc_comparison
+   || types_match (type, TREE_TYPE (@1))
+   || expand_vec_cond_expr_p (type, TREE_TYPE (@0), ERROR_MARK))
+   (vec_cond @0 (op! @1 @3) (op! @2 @4))))
 
 /* (c ? a : b) op d  -->  c ? (a op d) : (b op d) */
  (simplify
   (op (vec_cond:s @0 @1 @2) @3)
-  (vec_cond @0 (op! @1 @3) (op! @2 @3)))
+  (if (TREE_CODE_CLASS (op) != tcc_comparison
+   || types_match (type, TREE_TYPE (@1))
+   || expand_vec_cond_expr_p (type, TREE_TYPE (@0), ERROR_MARK))
+   (vec_cond @0 (op! @1 @3) (op! @2 @3))))
  (simplify
   (op @3 (vec_cond:s @0 @1 @2))
-  (vec_cond @0 (op! @3 @1) (op! @3 @2))))
+  (if (TREE_CODE_CLASS (op) != tcc_comparison
+   || types_match (type, TREE_TYPE (@1))
+   || expand_vec_cond_expr_p (type, TREE_TYPE (@0), ERROR_MARK))
+   (vec_cond @0 (op! @3 @1) (op! @3 @2)))))
 
 #if GIMPLE
 (match (nop_atomic_bit_test_and_p @0 @1 @4)
diff --git a/gcc/testsuite/gcc.dg/torture/pr114070.c b/gcc/testsuite/gcc.dg/torture/pr114070.c
new file mode 100644
index 000..cf46ec45a04
--- /dev/null
+++ b/gcc/testsuite/gcc.dg/torture/pr114070.c
@@ -0,0 +1,12 @@
+/* { dg-do compile } */
+/* { dg-additional-options "-fno-vect-cost-model" } */
+
+int unresolved(unsigned dirmask, unsigned mask, int *unresolved_n)
+{
+  for (int i = 0; i < 1024; i++) {
+    mask |= 1;
+    if (!unresolved_n[i] || unresolved_n[i] & 7)
+      dirmask |= 1;
+  }
+  return (dirmask == mask);
+}
-- 
2.35.3

RE: [PATCH v1] RTL: Bugfix ICE after allow vector type in DSE

2024-02-25 Thread Li, Pan2

> validate_subreg is a can of worms, can you try to fix the issue in DSE
> by avoiding to form the subreg in the first place?

Sure thing, will have a try in v2.

Pan

-----Original Message-----
From: Richard Biener  
Sent: Monday, February 26, 2024 3:38 PM
To: Li, Pan2 
Cc: gcc-patches@gcc.gnu.org; juzhe.zh...@rivai.ai; kito.ch...@gmail.com; Wang, Yanzhang ; rdapp@gmail.com
Subject: Re: [PATCH v1] RTL: Bugfix ICE after allow vector type in DSE

On Mon, Feb 26, 2024 at 4:26 AM  wrote:
>
> From: Pan Li 
>
> We allowed vector types for get_stored_val when the read is less than or
> equal to the previous store.  Unfortunately, we missed adjusting the
> validate_subreg part accordingly.  For vector types, we don't need to
> require the mode size to be at least the vector register size.
>
> Thus, for example, gen_lowpart from E_V2SFmode to E_V4QImode
> returns NULL_RTX (and of course ICEs after that) because the mode size
> is less than the vector register size.  That also explains why gen_lowpart
> from E_V8SFmode to E_V16QImode is valid here.
>
> This patch removes the restriction for vector modes, to get
> rid of the ICE when gen_lowpart fails because validate_subreg fails.

validate_subreg is a can of worms, can you try to fix the issue in DSE
by avoiding to form the subreg in the first place?

> The following tests pass with this patch:
>
> * The x86 bootstrap test.
> * The full riscv regression tests.
>
> gcc/ChangeLog:
>
> * emit-rtl.cc (validate_subreg): Bypass register size check
> if the mode is vector.
>
> gcc/testsuite/ChangeLog:
>
> * gcc.dg/tree-ssa/ssa-fre-44.c: Add ftree-vectorize to trigger
> the ICE.
> * gcc.target/riscv/rvv/base/bug-6.c: New test.
>
> Signed-off-by: Pan Li 
> ---
>  gcc/emit-rtl.cc   |  3 ++-
>  gcc/testsuite/gcc.dg/tree-ssa/ssa-fre-44.c|  2 +-
>  .../gcc.target/riscv/rvv/base/bug-6.c | 22 +++
>  3 files changed, 25 insertions(+), 2 deletions(-)
>  create mode 100644 gcc/testsuite/gcc.target/riscv/rvv/base/bug-6.c
>
> diff --git a/gcc/emit-rtl.cc b/gcc/emit-rtl.cc
> index 1856fa4884f..45c6301b487 100644
> --- a/gcc/emit-rtl.cc
> +++ b/gcc/emit-rtl.cc
> @@ -934,7 +934,8 @@ validate_subreg (machine_mode omode, machine_mode imode,
>  ;
>/* ??? Similarly, e.g. with (subreg:DF (reg:TI)).  Though store_bit_field
>   is the culprit here, and not the backends.  */
> -  else if (known_ge (osize, regsize) && known_ge (isize, osize))
> +  else if (known_ge (isize, osize) && (known_ge (osize, regsize)
> +|| (VECTOR_MODE_P (imode) || VECTOR_MODE_P (omode))))
>  ;
>/* Allow component subregs of complex and vector.  Though given the below
>   extraction rules, it's not always clear what that means.  */
> diff --git a/gcc/testsuite/gcc.dg/tree-ssa/ssa-fre-44.c b/gcc/testsuite/gcc.dg/tree-ssa/ssa-fre-44.c
> index f79b4c142ae..624a00a4f32 100644
> --- a/gcc/testsuite/gcc.dg/tree-ssa/ssa-fre-44.c
> +++ b/gcc/testsuite/gcc.dg/tree-ssa/ssa-fre-44.c
> @@ -1,5 +1,5 @@
>  /* { dg-do compile } */
> -/* { dg-options "-O -fdump-tree-fre1" } */
> +/* { dg-options "-O -fdump-tree-fre1 -O3 -ftree-vectorize" } */
>
>  struct A { float x, y; };
>  struct B { struct A u; };
> diff --git a/gcc/testsuite/gcc.target/riscv/rvv/base/bug-6.c b/gcc/testsuite/gcc.target/riscv/rvv/base/bug-6.c
> new file mode 100644
> index 000..5bb00b8f587
> --- /dev/null
> +++ b/gcc/testsuite/gcc.target/riscv/rvv/base/bug-6.c
> @@ -0,0 +1,22 @@
> +/* Test that we do not have ice when compile */
> +/* { dg-do compile } */
> +/* { dg-options "-march=rv64gcv -mabi=lp64d -O3 -ftree-vectorize" } */
> +
> +struct A { float x, y; };
> +struct B { struct A u; };
> +
> +extern void bar (struct A *);
> +
> +float
> +f3 (struct B *x, int y)
> +{
> +  struct A p = {1.0f, 2.0f};
> +  struct A *q = &x[y].u;
> +
> +  __builtin_memcpy (&q->x, &p.x, sizeof (float));
> +  __builtin_memcpy (&q->y, &p.y, sizeof (float));
> +
> +  bar (&p);
> +
> +  return x[y].u.x + x[y].u.y;
> +}
> --
> 2.34.1
>

Re: [PATCH v1] RTL: Bugfix ICE after allow vector type in DSE

2024-02-25 Thread Richard Biener

On Mon, Feb 26, 2024 at 4:26 AM  wrote:
>
> From: Pan Li 
>
> We allowed vector types for get_stored_val when the read is less than or
> equal to the previous store.  Unfortunately, we missed adjusting the
> validate_subreg part accordingly.  For vector types, we don't need to
> require the mode size to be at least the vector register size.
>
> Thus, for example, gen_lowpart from E_V2SFmode to E_V4QImode
> returns NULL_RTX (and of course ICEs after that) because the mode size
> is less than the vector register size.  That also explains why gen_lowpart
> from E_V8SFmode to E_V16QImode is valid here.
>
> This patch removes the restriction for vector modes, to get
> rid of the ICE when gen_lowpart fails because validate_subreg fails.

validate_subreg is a can of worms, can you try to fix the issue in DSE
by avoiding to form the subreg in the first place?

> The following tests pass with this patch:
>
> * The x86 bootstrap test.
> * The full riscv regression tests.
>
> gcc/ChangeLog:
>
> * emit-rtl.cc (validate_subreg): Bypass register size check
> if the mode is vector.
>
> gcc/testsuite/ChangeLog:
>
> * gcc.dg/tree-ssa/ssa-fre-44.c: Add ftree-vectorize to trigger
> the ICE.
> * gcc.target/riscv/rvv/base/bug-6.c: New test.
>
> Signed-off-by: Pan Li 
> ---
>  gcc/emit-rtl.cc   |  3 ++-
>  gcc/testsuite/gcc.dg/tree-ssa/ssa-fre-44.c|  2 +-
>  .../gcc.target/riscv/rvv/base/bug-6.c | 22 +++
>  3 files changed, 25 insertions(+), 2 deletions(-)
>  create mode 100644 gcc/testsuite/gcc.target/riscv/rvv/base/bug-6.c
>
> diff --git a/gcc/emit-rtl.cc b/gcc/emit-rtl.cc
> index 1856fa4884f..45c6301b487 100644
> --- a/gcc/emit-rtl.cc
> +++ b/gcc/emit-rtl.cc
> @@ -934,7 +934,8 @@ validate_subreg (machine_mode omode, machine_mode imode,
>  ;
>/* ??? Similarly, e.g. with (subreg:DF (reg:TI)).  Though store_bit_field
>   is the culprit here, and not the backends.  */
> -  else if (known_ge (osize, regsize) && known_ge (isize, osize))
> +  else if (known_ge (isize, osize) && (known_ge (osize, regsize)
> +|| (VECTOR_MODE_P (imode) || VECTOR_MODE_P (omode))))
>  ;
>/* Allow component subregs of complex and vector.  Though given the below
>   extraction rules, it's not always clear what that means.  */
> diff --git a/gcc/testsuite/gcc.dg/tree-ssa/ssa-fre-44.c b/gcc/testsuite/gcc.dg/tree-ssa/ssa-fre-44.c
> index f79b4c142ae..624a00a4f32 100644
> --- a/gcc/testsuite/gcc.dg/tree-ssa/ssa-fre-44.c
> +++ b/gcc/testsuite/gcc.dg/tree-ssa/ssa-fre-44.c
> @@ -1,5 +1,5 @@
>  /* { dg-do compile } */
> -/* { dg-options "-O -fdump-tree-fre1" } */
> +/* { dg-options "-O -fdump-tree-fre1 -O3 -ftree-vectorize" } */
>
>  struct A { float x, y; };
>  struct B { struct A u; };
> diff --git a/gcc/testsuite/gcc.target/riscv/rvv/base/bug-6.c b/gcc/testsuite/gcc.target/riscv/rvv/base/bug-6.c
> new file mode 100644
> index 000..5bb00b8f587
> --- /dev/null
> +++ b/gcc/testsuite/gcc.target/riscv/rvv/base/bug-6.c
> @@ -0,0 +1,22 @@
> +/* Test that we do not have ice when compile */
> +/* { dg-do compile } */
> +/* { dg-options "-march=rv64gcv -mabi=lp64d -O3 -ftree-vectorize" } */
> +
> +struct A { float x, y; };
> +struct B { struct A u; };
> +
> +extern void bar (struct A *);
> +
> +float
> +f3 (struct B *x, int y)
> +{
> +  struct A p = {1.0f, 2.0f};
> +  struct A *q = &x[y].u;
> +
> +  __builtin_memcpy (&q->x, &p.x, sizeof (float));
> +  __builtin_memcpy (&q->y, &p.y, sizeof (float));
> +
> +  bar (&p);
> +
> +  return x[y].u.x + x[y].u.y;
> +}
> --
> 2.34.1
>
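The change to the validate_subreg condition can be modeled as a pair of predicates. This is a simplified sketch, not the emit-rtl.cc code: it uses plain unsigned byte sizes where the real code uses poly_int sizes and known_ge, it models only the one "else if" branch touched by the patch, and the 16-byte register size is an assumption chosen to reproduce the E_V2SFmode (8 bytes) to E_V4QImode (4 bytes) case from the description.

```c
#include <stdbool.h>

/* Old branch: the outer mode must be at least one register wide and
   no wider than the inner mode.  */
bool old_condition (unsigned isize, unsigned osize, unsigned regsize)
{
  return osize >= regsize && isize >= osize;
}

/* Proposed branch: the outer mode must still fit in the inner mode,
   but the register-size requirement is waived when either mode is a
   vector mode.  */
bool new_condition (unsigned isize, unsigned osize, unsigned regsize,
                    bool imode_is_vector, bool omode_is_vector)
{
  return isize >= osize
         && (osize >= regsize || imode_is_vector || omode_is_vector);
}
```

With these assumed sizes, the old branch rejects the V2SF to V4QI lowpart while the proposed one accepts it; equally sized scalar modes are still rejected by this branch.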

[PATCH] RISC-V: add option -m(no-)autovec-segment

2024-02-25 Thread juzhe.zh...@rivai.ai

diff --git a/gcc/tree-vect-stmts.cc b/gcc/tree-vect-stmts.cc
index 1dbe1115da4..6303d82d959 100644
--- a/gcc/tree-vect-stmts.cc
+++ b/gcc/tree-vect-stmts.cc
@@ -11521,7 +11521,8 @@ vectorizable_load (vec_info *vinfo,
 - (vec_num * j + i) * nunits);
/* remain should now be > 0 and < nunits.  */
unsigned num;
-   if (constant_multiple_p (nunits, remain, &num))
+   if (known_gt (remain, 0)
+   && constant_multiple_p (nunits, remain, &num))
Why do you change the loop vectorizer code here?

Ideally, we should add a cost model for segment load/store instead of disabling
segment load/store autovectorization with a compile option.

It could use recognition like aarch64's:

/* Return true if an access of kind KIND for STMT_INFO represents one
   vector of an LD[234] or ST[234] operation.  Return the total number of
   vectors (2, 3 or 4) if so, otherwise return a value outside that range.  */
static int
aarch64_ld234_st234_vectors (vect_cost_for_stmt kind, stmt_vec_info stmt_info)
{
  if ((kind == vector_load
       || kind == unaligned_load
       || kind == vector_store
       || kind == unaligned_store)
      && STMT_VINFO_DATA_REF (stmt_info))
    {
      stmt_info = DR_GROUP_FIRST_ELEMENT (stmt_info);
      if (stmt_info
          && STMT_VINFO_MEMORY_ACCESS_TYPE (stmt_info) == VMAT_LOAD_STORE_LANES)
        return DR_GROUP_SIZE (stmt_info);
    }
  return 0;
}




juzhe.zh...@rivai.ai

[PATCH] match.pd: Guard 2 simplifications on integral TYPE_OVERFLOW_UNDEFINED [PR114090]

2024-02-25 Thread Jakub Jelinek

Hi!

These 2 patterns are incorrect on floating point, or for -fwrapv, or
for -ftrapv, or the first one for unsigned types (the second one is
mathematically correct, but we ought to just fold that to 0 instead).

So, the following patch properly guards this.

I think we don't need && !TYPE_OVERFLOW_SANITIZED (type) because
in both simplifications there would be UB before and after on
signed integer minimum.
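To see why the first pattern is wrong under -fwrapv, here is a minimal C sketch that evaluates both sides for INT_MIN with wrapping negation. The helper wrap_neg is an assumption for illustration: it emulates -fwrapv negation through unsigned arithmetic and relies on GCC's modulo conversion of out-of-range unsigned values back to int (implementation-defined in ISO C).

```c
#include <limits.h>

/* Hypothetical helper: negate with two's-complement wraparound, as
   -fwrapv would, so that wrap_neg (INT_MIN) == INT_MIN.  */
static int wrap_neg (int x)
{
  return (int) (0u - (unsigned) x);
}

static int max0 (int x)
{
  return x >= 0 ? x : 0;
}

/* Left-hand side of the pattern: max (x, 0) + max (-x, 0).  At most
   one addend is nonzero, so the addition itself cannot overflow.  */
int lhs_pattern (int x)
{
  return max0 (x) + max0 (wrap_neg (x));
}

/* What the pattern folded to: abs x, with wrapping negation.  */
int rhs_abs (int x)
{
  return x >= 0 ? x : wrap_neg (x);
}
```

This mirrors what foo in the new testcase checks: for INT_MIN the original expression yields 0 while abs yields INT_MIN, so the fold is only valid when signed overflow is undefined.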

Bootstrapped/regtested on x86_64-linux and i686-linux, ok for trunk?

2024-02-26  Jakub Jelinek  

PR tree-optimization/114090
* match.pd ((x >= 0 ? x : 0) + (x <= 0 ? -x : 0) -> abs x):
Restrict pattern to ANY_INTEGRAL_TYPE_P and TYPE_OVERFLOW_UNDEFINED
types.
((x <= 0 ? -x : 0) -> max(-x, 0)): Likewise.

* gcc.dg/pr114090.c: New test.

--- gcc/match.pd.jj 2024-02-22 10:09:48.678446435 +0100
+++ gcc/match.pd 2024-02-24 19:23:32.201014245 +0100
@@ -453,8 +453,9 @@ (define_operator_list SYNC_FETCH_AND_AND
 
 /* (x >= 0 ? x : 0) + (x <= 0 ? -x : 0) -> abs x.  */
 (simplify
-  (plus:c (max @0 integer_zerop) (max (negate @0) integer_zerop))
-  (abs @0))
+ (plus:c (max @0 integer_zerop) (max (negate @0) integer_zerop))
+ (if (ANY_INTEGRAL_TYPE_P (type) && TYPE_OVERFLOW_UNDEFINED (type))
+  (abs @0)))
 
 /* X * 1, X / 1 -> X.  */
 (for op (mult trunc_div ceil_div floor_div round_div exact_div)
@@ -4218,8 +4219,9 @@ (define_operator_list SYNC_FETCH_AND_AND
 
 /* (x <= 0 ? -x : 0) -> max(-x, 0).  */
 (simplify
-  (cond (le @0 integer_zerop@1) (negate@2 @0) integer_zerop@1)
-  (max @2 @1))
+ (cond (le @0 integer_zerop@1) (negate@2 @0) integer_zerop@1)
+ (if (ANY_INTEGRAL_TYPE_P (type) && TYPE_OVERFLOW_UNDEFINED (type))
+  (max @2 @1)))
 
 /* (zero_one == 0) ? y : z  y -> ((typeof(y))zero_one * z)  y */
 (for op (bit_xor bit_ior plus)
--- gcc/testsuite/gcc.dg/pr114090.c.jj  2024-02-24 19:38:56.301096850 +0100
+++ gcc/testsuite/gcc.dg/pr114090.c 2024-02-24 19:42:26.917153801 +0100
@@ -0,0 +1,38 @@
+/* PR tree-optimization/114090 */
+/* { dg-do run } */
+/* { dg-options "-O2 -fwrapv" } */
+
+__attribute__((noipa)) int
+foo (int x)
+{
+  int w = (x >= 0 ? x : 0);
+  int y = -x;
+  int z = (y >= 0 ? y : 0);
+  return w + z;
+}
+
+__attribute__((noipa)) int
+bar (int x)
+{
+  int w = (x >= 0 ? x : 0);
+  int z = (x <= 0 ? -x : 0);
+  return w + z;
+}
+
+__attribute__((noipa)) int
+baz (int x)
+{
+  return x <= 0 ? -x : 0;
+}
+
+int
+main ()
+{
+  int v = -__INT_MAX__ - 1;
+  if (foo (v) != 0)
+    __builtin_abort ();
+  if (bar (v) != v)
+    __builtin_abort ();
+  if (baz (v) != v)
+    __builtin_abort ();
+}

Jakub

[PATCH] fold-const: Avoid infinite recursion in +-*&|^minmax reassociation [PR114084]

2024-02-25 Thread Jakub Jelinek

Hi!

In the following testcase we infinitely recurse during BIT_IOR_EXPR
reassociation.
One operand is (unsigned _BitInt(31)) a << 4 and the other operand is
2147483647 >> 1 | 80, where both the right shift and the | 80
trees have TREE_CONSTANT set but weren't folded because of delayed
folding (unfortunately, some foldings are apparently done even in that
case).
Now, the fold_binary_loc reassociation code splits both operands into a
variable part, minus variable part, constant part, minus constant part,
literal part and minus literal part.  To prevent infinite recursion, it
punts if there are just 2 parts altogether from the 2 operands; otherwise
it goes on with reassociation, first merging the corresponding parts from
both operands and then doing some further merges.
The problem with the above expressions is that we get 3 different objects,
var0 (the left shift), con1 (the right shift) and lit1 (80), so the infinite
recursion prevention doesn't trigger.  We eventually merge con1 with lit1,
which effectively reconstructs the original op1, and then associate that
with var0, which is the original op0; associate_trees for that case
calls fold_binary.  There are some casts involved there too (the T typedef
type and the underlying _BitInt type, which are stripped with STRIP_NOPS).

The following patch attempts to prevent this infinite recursion by tracking
the origin of each of these parts (0 if it comes from nothing, 1 from op0,
2 from op1, 3 from both) and propagating it through all the associate_trees
calls which merge them.  If near the end we'd try to merge what comes
solely from op0 with what comes solely from op1 (or vice versa), the patch
punts, because then it isn't any kind of reassociation between the two
operands; if anything, it should be handled when folding the suboperands.
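The origin bookkeeping can be illustrated in isolation. The C sketch below is a simplified model rather than the fold-const.cc code: the enum and the helpers initial_origin, merge_origin and should_punt are hypothetical names. Each part carries a 2-bit mask (1 for op0, 2 for op1, 3 for both, 0 if absent), merging two parts ORs their masks, and the final merge is skipped when one side comes solely from op0 and the other solely from op1.

```c
/* Origin masks: which original operand(s) a part was built from.  */
enum origin
{
  ORIGIN_NONE = 0, /* part is absent */
  ORIGIN_OP0 = 1,  /* built only from op0 */
  ORIGIN_OP1 = 2,  /* built only from op1 */
  ORIGIN_BOTH = 3  /* mixes subtrees of op0 and op1 */
};

/* Origin of a part obtained by merging the op0 part with the op1 part,
   in the style of var0_origin = (var0 != 0) + 2 * (var1 != 0).  */
int initial_origin (int present0, int present1)
{
  return (present0 != 0) + 2 * (present1 != 0);
}

/* Merging two parts combines their origins.  */
int merge_origin (int a, int b)
{
  return a | b;
}

/* Return 1 if the final associate step would merely rebuild op0 CODE op1,
   i.e. one side comes solely from op0 and the other solely from op1.  */
int should_punt (int var0_origin, int con0_origin)
{
  return (var0_origin == ORIGIN_OP0 && con0_origin == ORIGIN_OP1)
         || (var0_origin == ORIGIN_OP1 && con0_origin == ORIGIN_OP0);
}
```

In the PR testcase, the left shift gives a var part from op0 only, while the right shift and 80 give con and lit parts from op1 only; merging those two keeps origin 2, so the final merge with origin 1 is the case that punts.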

Bootstrapped/regtested on x86_64-linux and i686-linux, ok for trunk?

2024-02-26  Jakub Jelinek  

PR middle-end/114084
* fold-const.cc (fold_binary_loc): Avoid the final associate_trees
if all subtrees of var0 come from one of the op0 or op1 operands
and all subtrees of con0 come from the other one.  Don't clear
variables which are never used afterwards.

* gcc.dg/bitint-94.c: New test.

--- gcc/fold-const.cc.jj 2024-02-24 09:49:09.098815803 +0100
+++ gcc/fold-const.cc   2024-02-24 11:08:56.036223491 +0100
@@ -11779,6 +11779,15 @@ fold_binary_loc (location_t loc, enum tr
  + (lit0 != 0) + (lit1 != 0)
  + (minus_lit0 != 0) + (minus_lit1 != 0)) > 2)
{
+ int var0_origin = (var0 != 0) + 2 * (var1 != 0);
+ int minus_var0_origin
+   = (minus_var0 != 0) + 2 * (minus_var1 != 0);
+ int con0_origin = (con0 != 0) + 2 * (con1 != 0);
+ int minus_con0_origin
+   = (minus_con0 != 0) + 2 * (minus_con1 != 0);
+ int lit0_origin = (lit0 != 0) + 2 * (lit1 != 0);
+ int minus_lit0_origin
+   = (minus_lit0 != 0) + 2 * (minus_lit1 != 0);
  var0 = associate_trees (loc, var0, var1, code, atype);
  minus_var0 = associate_trees (loc, minus_var0, minus_var1,
code, atype);
@@ -11791,15 +11800,19 @@ fold_binary_loc (location_t loc, enum tr
 
  if (minus_var0 && var0)
{
+ var0_origin |= minus_var0_origin;
  var0 = associate_trees (loc, var0, minus_var0,
  MINUS_EXPR, atype);
  minus_var0 = 0;
+ minus_var0_origin = 0;
}
  if (minus_con0 && con0)
{
+ con0_origin |= minus_con0_origin;
  con0 = associate_trees (loc, con0, minus_con0,
  MINUS_EXPR, atype);
  minus_con0 = 0;
+ minus_con0_origin = 0;
}
 
  /* Preserve the MINUS_EXPR if the negative part of the literal is
@@ -11815,15 +11828,19 @@ fold_binary_loc (location_t loc, enum tr
  /* But avoid ending up with only negated parts.  */
  && (var0 || con0))
{
+ minus_lit0_origin |= lit0_origin;
  minus_lit0 = associate_trees (loc, minus_lit0, lit0,
MINUS_EXPR, atype);
  lit0 = 0;
+ lit0_origin = 0;
}
  else
{
+ lit0_origin |= minus_lit0_origin;
  lit0 = associate_trees (loc, lit0, minus_lit0,
  MINUS_EXPR, atype);
  minus_lit0 = 0;
+ minus_lit0_origin = 0;
}
}
 
@@ -11833,37 +11850,51 @@ fold_binary_loc (location_t loc, enum tr
return NULL_TREE;

Re: [PATCH v1] RISC-V: Upgrade RVV intrinsic version to 0.12

2024-02-25 Thread Kito Cheng

On Fri, Feb 23, 2024 at 3:48 AM Palmer Dabbelt  wrote:
>
> On Wed, 21 Feb 2024 16:02:50 PST (-0800), Kito Cheng wrote:
> > Palmer Dabbelt  wrote on Thursday, February 22, 2024 at 07:42:
> >
> >> On Wed, 21 Feb 2024 15:34:32 PST (-0800), Kito Cheng wrote:
> >> > LGTM for the patch
> >> >
> >> > Li, Pan2  wrote on Wednesday, February 21, 2024 at 12:31:
> >> >
> >> >> Hi kito and juzhe.
> >> >>
> >> >> There may be 2 items for double-confirm. Thanks a lot.
> >> >>
> >> >> 1. Not very sure if we need to upgrade the version for
> >> >> __riscv_th_v_intrinsic.
> >> >>
> >> >
> >> > Yes since 0.11 and 0.12 is not really compatible
> >>
> >> Where are the incompatibilities?  The whole reason we accepted the
> >> intrinsics in the first place is because the RVI folks said they
> >> wouldn't break compatibility, if that's changed then just dropping the
> >> old version is going to break users.
> >>
> >
> > 0.12 has interfaces for segment load/store and new fixed-point intrinsics
> > compared to 0.11.  The first item is not an incompatible change, since it's
> > newly added and GCC 13 didn't implement the legacy one; the latter is
> > kinda broken in both LLVM and GCC, which makes it not really useful in
> > practice.
> >
> > Other than that, everything is the same.  It's not 100% compatible, and I
> > don't intend to fool myself by saying it is, but we do think it's a
> > necessary evil since the fixed-point stuff was not the right design and
> > implementation.
>
> OK, those don't seem so scary.  So maybe let's just put it in a NEWS
> entry or something?  It's mildly interesting to users, but I agree the
> earlier intrinsics spec was vague enough in some areas we can get away
> with the diffs I've seen.

Yeah, thanks for the reminder, I guess we need to prepare to update
more NEWS entries...

>
> > Anyway, it's now in frozen mode: 1.0 rc0 has been tagged, and no API will
> > be changed or removed.
>
> OK, so I guess we should move to 1.0, then?  Are you guys going to pick
> that up?

There is no difference between 0.12 and 1.0; what remains is just paperwork
for a process that is still ongoing.  Anyway, we will handle that.

[committed] i386: Fix up x86_function_profiler -masm=intel support [PR114094]

2024-02-25 Thread Jakub Jelinek

Hi!

In my r14-8214 changes I apparently forgot one \n at the end of an instruction.
The corresponding AT&T line looks like:
"1:\tcall\t*%s@GOTPCREL(%%rip)\n"
but the Intel variant was
"1:\tcall\t[QWORD PTR %s@GOTPCREL[rip]]"

Fixed thusly, bootstrapped/regtested on x86_64-linux and i686-linux,
committed to trunk as obvious.
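The effect of the missing \n is easy to reproduce outside GCC. This sketch uses snprintf into a buffer instead of writing to the assembly file; "mcount" and the following "\tnop\n" are placeholder values standing in for the real mcount_name and whatever the compiler prints next. Without the newline, the next instruction ends up on the same line as the call, which the assembler rejects.

```c
#include <stdio.h>

/* Count newline characters, i.e. complete output lines, in S.  */
static int count_newlines (const char *s)
{
  int n = 0;
  for (; *s; s++)
    if (*s == '\n')
      n++;
  return n;
}

/* Model of the Intel-syntax profiler call followed by the next line of
   output.  WITH_NEWLINE selects the fixed format string; returns the
   number of output lines, or -1 on truncation.  */
int profiler_output_lines (int with_newline)
{
  char buf[128];
  int n;

  if (with_newline)
    n = snprintf (buf, sizeof buf,
                  "1:\tcall\t[QWORD PTR %s@GOTPCREL[rip]]\n", "mcount");
  else
    n = snprintf (buf, sizeof buf,
                  "1:\tcall\t[QWORD PTR %s@GOTPCREL[rip]]", "mcount");
  if (n < 0 || (size_t) n >= sizeof buf)
    return -1;

  /* Hypothetical next line of assembly output.  */
  snprintf (buf + n, sizeof buf - n, "\tnop\n");
  return count_newlines (buf);
}
```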

2024-02-26  Jakub Jelinek  

PR target/114094
* config/i386/i386.cc (x86_function_profiler): Add missing new-line
to printed instruction.

* gcc.target/i386/pr114094.c: New test.

--- gcc/config/i386/i386.cc.jj  2024-02-22 10:10:18.675032283 +0100
+++ gcc/config/i386/i386.cc 2024-02-25 13:12:12.403323842 +0100
@@ -22909,7 +22909,7 @@ x86_function_profiler (FILE *file, int l
  if (!ix86_direct_extern_access)
{
  if (ASSEMBLER_DIALECT == ASM_INTEL)
-   fprintf (file, "1:\tcall\t[QWORD PTR %s@GOTPCREL[rip]]",
+   fprintf (file, "1:\tcall\t[QWORD PTR %s@GOTPCREL[rip]]\n",
 mcount_name);
  else
fprintf (file, "1:\tcall\t*%s@GOTPCREL(%%rip)\n",
--- gcc/testsuite/gcc.target/i386/pr114094.c.jj 2024-02-25 22:36:24.673320296 +0100
+++ gcc/testsuite/gcc.target/i386/pr114094.c 2024-02-25 22:17:18.254178208 +0100
@@ -0,0 +1,10 @@
+/* PR target/114094 */
+/* { dg-do assemble { target *-*-linux* } } */
+/* { dg-require-effective-target masm_intel } */
+/* { dg-require-effective-target pie } */
+/* { dg-options "-fpie -fprofile -mno-direct-extern-access -masm=intel" } */
+
+void
+foo (void)
+{
+}

Jakub

RE: [PATCH v2] Draft|Internal-fn: Introduce internal fn saturation US_PLUS

2024-02-25 Thread Li, Pan2

Got it, we need to combine the two together.
Thanks, Tamar, for the explanation.  It helps a lot, and I will give it a try in v3.

Pan

-----Original Message-----
From: Tamar Christina  
Sent: Sunday, February 25, 2024 5:02 PM
To: Li, Pan2 ; gcc-patches@gcc.gnu.org
Cc: juzhe.zh...@rivai.ai; Wang, Yanzhang ; kito.ch...@gmail.com; richard.guent...@gmail.com; richard.sandiford@arm.com2; jeffreya...@gmail.com
Subject: RE: [PATCH v2] Draft|Internal-fn: Introduce internal fn saturation US_PLUS

Hi Pan,

> From: Pan Li 
> 
> Hi Richard & Tamar,
> 
> I tried DEF_INTERNAL_INT_EXT_FN as you suggested, mapping
> us_plus$a3 to the RTL representation (us_plus:m x y) in optabs.def
> and then expanding it in expand_US_PLUS in internal-fn.cc.  I am not very
> sure whether my understanding of DEF_INTERNAL_INT_EXT_FN is correct.
> 
> I am not sure if we still need DEF_INTERNAL_SIGNED_OPTAB_FN here, given
> the RTL representation has (ss_plus:m x y) and (us_plus:m x y) already.
> 

I think a couple of things are being confused here.  So let's break it down:

The reason for DEF_INTERNAL_SIGNED_OPTAB_FN is that in GIMPLE
we only want one internal function for both signed and unsigned SAT_ADD.
With this definition we don't need SAT_UADD and SAT_SADD; instead
we will only have SAT_ADD, which will expand to us_plus or ss_plus.

Now the downside of this is that this is a direct internal optab.  This means
that for the representation to be used the target *must* have the optab
implemented.   This is a bit annoying because it doesn't allow us to generically
assume that all targets use SAT_ADD for saturating add and thus only have to
write optimization for this representation.

This is why Richi said we may need to use a new tree_code because we can
override tree code expansions.  However the same can be done with the _EXT_FN
internal functions.

So what I meant was that we want to have a combination of the two. i.e. a
DEF_INTERNAL_SIGNED_OPTAB_EXT_FN.

If Richi agrees, the below is what I meant.  It creates the infrastructure for this
and for now only allows a default fallback for unsigned saturating add, which makes
it easier for us to add the rest later.

Also, unless I'm wrong (and Richi can correct me here), us_plus and ss_plus are the
RTL expressions, but the optabs for saturation are ssadd and usadd.  So you don't
need to make new us_plus and ss_plus ones.

diff --git a/gcc/internal-fn.cc b/gcc/internal-fn.cc
index a07f25f3aee..aaf9f8991b3 100644
--- a/gcc/internal-fn.cc
+++ b/gcc/internal-fn.cc
@@ -4103,6 +4103,17 @@ direct_internal_fn_supported_p (internal_fn fn, tree_pair types,
return direct_##TYPE##_optab_supported_p (which_optab, types,   \
  opt_type);\
   }
+#define DEF_INTERNAL_SIGNED_OPTAB_EXT_FN(CODE, FLAGS, SELECTOR, SIGNED_OPTAB, \
+UNSIGNED_OPTAB, TYPE)  \
+case IFN_##CODE:   \
+  {   \
+   optab which_optab = (TYPE_UNSIGNED (types.SELECTOR) \
+? UNSIGNED_OPTAB ## _optab \
+: SIGNED_OPTAB ## _optab); \
+   return direct_##TYPE##_optab_supported_p (which_optab, types,   \
+ opt_type) \
+  || internal_##CODE##_fn_supported_p (types.SELECTOR, opt_type); \
+  }
 #include "internal-fn.def"

 case IFN_LAST:
@@ -4303,6 +4314,8 @@ set_edom_supported_p (void)
 optab which_optab = direct_internal_fn_optab (fn, types);  \
 expand_##TYPE##_optab_fn (fn, stmt, which_optab);  \
   }
+#define DEF_INTERNAL_SIGNED_OPTAB_EXT_FN(CODE, FLAGS, SELECTOR, SIGNED_OPTAB, \
+UNSIGNED_OPTAB, TYPE)
 #include "internal-fn.def"

 /* Routines to expand each internal function, indexed by function number.
@@ -5177,3 +5190,45 @@ expand_POPCOUNT (internal_fn fn, gcall *stmt)
   emit_move_insn (plhs, cmp);
 }
 }
+
+void
+expand_SAT_ADD (internal_fn fn, gcall *stmt)
+{
+  /* Check if the target supports the expansion through an IFN.  */
+  tree_pair types = direct_internal_fn_types (fn, stmt);
+  optab which_optab = direct_internal_fn_optab (fn, types);
+  if (direct_binary_optab_supported_p (which_optab, types,
+  insn_optimization_type ()))
+{
+  expand_binary_optab_fn (fn, stmt, which_optab);
+  return;
+}
+
+  /* Target does not support the optab, but we can de-compose it.  */
+  /*
+  ... decompose to a canonical representation ...
+  if (TYPE_UNSIGNED (types.SELECTOR))
+{
+  ...
+  decompose back to (X + Y) | - ((X + Y) < X)
+}
+  else
+{
+  ...
+}
+  */
+}
+
+bool internal_SAT_ADD_fn_supported_p (tree type, optimization_type /* optype */)
+{
+  /* For now, don't support deco
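
The unsigned fallback described in the comment above, (X + Y) | -((X + Y) < X), can be checked in plain C. The following is a minimal sketch for 32-bit values; the function names are hypothetical, and the signed variant is only a straightforward clamping reference, since the quoted patch leaves the signed decomposition unspecified.

```c
#include <stdint.h>

/* Unsigned saturating add: if X + Y wraps, the sum is smaller than X,
   so -(sum < x) is all ones and the OR clamps the result to UINT32_MAX.  */
uint32_t sat_add_u32 (uint32_t x, uint32_t y)
{
  uint32_t sum = x + y;
  return sum | -(uint32_t) (sum < x);
}

/* Signed saturating add, clamping to [INT32_MIN, INT32_MAX].
   The sum is computed in 64 bits so it cannot overflow.  */
int32_t sat_add_s32 (int32_t x, int32_t y)
{
  int64_t sum = (int64_t) x + y;
  if (sum > INT32_MAX)
    return INT32_MAX;
  if (sum < INT32_MIN)
    return INT32_MIN;
  return (int32_t) sum;
}
```

The unsigned form is branch-free, which is why it is a natural generic expansion when the target has no usadd pattern.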

[RESEND PATCH] C/C++: add hints for strerror

2024-02-25 Thread Oskari Pirhonen

Add proper hints for implicit declaration of strerror.

The results could be confusing depending on the other included headers.
These example messages are from compiling a trivial program to print the
string for an errno value. It only includes stdio.h (cstdio for C++).

Before:
$ /tmp/gcc-master/bin/gcc test.c -o test_c
test.c: In function ‘main’:
test.c:4:20: warning: implicit declaration of function ‘strerror’; did you mean ‘perror’? [-Wimplicit-function-declaration]
    4 |     printf("%s\n", strerror(0));
      |                    ^~~~~~~~
      |                    perror

$ /tmp/gcc-master/bin/g++ test.cpp -o test_cpp
test.cpp: In function ‘int main()’:
test.cpp:4:20: error: ‘strerror’ was not declared in this scope; did you mean ‘stderr’?
    4 |     printf("%s\n", strerror(0));
      |                    ^~~~~~~~
      |                    stderr

After:
$ /tmp/gcc-known-headers/bin/gcc test.c -o test_c
test.c: In function ‘main’:
test.c:4:20: warning: implicit declaration of function ‘strerror’ [-Wimplicit-function-declaration]
    4 |     printf("%s\n", strerror(0));
      |                    ^~~~~~~~
test.c:2:1: note: ‘strerror’ is defined in header ‘’; this is probably fixable by adding ‘#include ’
    1 | #include 
  +++ |+#include 
    2 |

$ /tmp/gcc-known-headers/bin/g++ test.cpp -o test_cpp
test.cpp: In function ‘int main()’:
test.cpp:4:20: error: ‘strerror’ was not declared in this scope
    4 |     printf("%s\n", strerror(0));
      |                    ^~~~~~~~
test.cpp:2:1: note: ‘strerror’ is defined in header ‘’; this is probably fixable by adding ‘#include ’
    1 | #include 
  +++ |+#include 
    2 |
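
For reference, here is a minimal sketch of the kind of trivial program described above with the missing header added. The exact test.c is not shown in the message, so this is an assumption; errno_message and print_errno_message are hypothetical helpers. In C, strerror is declared in string.h.

```c
#include <stdio.h>
#include <string.h> /* declares strerror in C */

/* Hypothetical helper: return the message for an errno value.  */
const char *errno_message (int errnum)
{
  return strerror (errnum);
}

/* Print the message for errno value 0, as in the example above.  */
void print_errno_message (void)
{
  printf ("%s\n", errno_message (0));
}
```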

gcc/c-family/ChangeLog:

* known-headers.cc (get_stdlib_header_for_name): Add strerror.

gcc/testsuite/ChangeLog:

* g++.dg/spellcheck-stdlib.C: Add check for strerror.
* gcc.dg/spellcheck-stdlib-2.c: New test.

Signed-off-by: Oskari Pirhonen 
---
 gcc/c-family/known-headers.cc  | 1 +
 gcc/testsuite/g++.dg/spellcheck-stdlib.C   | 2 ++
 gcc/testsuite/gcc.dg/spellcheck-stdlib-2.c | 8 
 3 files changed, 11 insertions(+)
 create mode 100644 gcc/testsuite/gcc.dg/spellcheck-stdlib-2.c

diff --git a/gcc/c-family/known-headers.cc b/gcc/c-family/known-headers.cc
index dbc42eacde1..871fd714eb5 100644
--- a/gcc/c-family/known-headers.cc
+++ b/gcc/c-family/known-headers.cc
@@ -182,6 +182,7 @@ get_stdlib_header_for_name (const char *name, enum stdlib lib)
 {"strchr", {"", ""} },
 {"strcmp", {"", ""} },
 {"strcpy", {"", ""} },
+{"strerror", {"", ""} },
 {"strlen", {"", ""} },
 {"strncat", {"", ""} },
 {"strncmp", {"", ""} },
diff --git a/gcc/testsuite/g++.dg/spellcheck-stdlib.C b/gcc/testsuite/g++.dg/spellcheck-stdlib.C
index fd0f3a9b8c9..33718b8034e 100644
--- a/gcc/testsuite/g++.dg/spellcheck-stdlib.C
+++ b/gcc/testsuite/g++.dg/spellcheck-stdlib.C
@@ -104,6 +104,8 @@ void test_cstring (char *dest, char *src)
   // { dg-message "'#include '" "" { target *-*-* } .-1 }
   strcpy(dest, "test"); // { dg-error "was not declared" }
   // { dg-message "'#include '" "" { target *-*-* } .-1 }
+  strerror(0); // { dg-error "was not declared" }
+  // { dg-message "'#include '" "" { target *-*-* } .-1 }
   strlen("test"); // { dg-error "was not declared" }
   // { dg-message "'#include '" "" { target *-*-* } .-1 }
   strncat(dest, "test", 3); // { dg-error "was not declared" }
diff --git a/gcc/testsuite/gcc.dg/spellcheck-stdlib-2.c b/gcc/testsuite/gcc.dg/spellcheck-stdlib-2.c
new file mode 100644
index 000..61c17f350cb
--- /dev/null
+++ b/gcc/testsuite/gcc.dg/spellcheck-stdlib-2.c
@@ -0,0 +1,8 @@
+/* { dg-options "-Wimplicit-function-declaration" } */
+
+/* Missing .  */
+void test_string_h (void)
+{
+  strerror (0); /* { dg-warning "implicit declaration of function 'strerror'" } */
+  /* { dg-message "'strerror' is defined in header ''" "" { target *-*-* } .-1 } */
+}
-- 
2.43.0

[PATCH] rs6000: Don't allow immediate value in the vsx_splat pattern [PR113950]

2024-02-25 Thread jeevitha

Hi All,

The following patch has been bootstrapped and regtested on powerpc64le-linux.

PowerPC has no instruction for splatting an immediate value; the value
currently needs to be stored in a register or memory first.  To address this,
I have updated the predicate for the second operand of vsx_splat to
splat_input_operand, which handles such operands appropriately.

2024-02-26  Jeevitha Palanisamy  

gcc/
PR target/113950
* config/rs6000/vsx.md (vsx_splat_): Update the predicate
for the second operand.

gcc/testsuite/
PR target/113950
* gcc.target/powerpc/pr113950.c: New testcase.

diff --git a/gcc/config/rs6000/vsx.md b/gcc/config/rs6000/vsx.md
index 6111cc90eb7..e5688ff972a 100644
--- a/gcc/config/rs6000/vsx.md
+++ b/gcc/config/rs6000/vsx.md
@@ -4660,7 +4660,7 @@
 (define_expand "vsx_splat_"
   [(set (match_operand:VSX_D 0 "vsx_register_operand")
(vec_duplicate:VSX_D
-(match_operand: 1 "input_operand")))]
+(match_operand: 1 "splat_input_operand")))]
   "VECTOR_MEM_VSX_P (mode)"
 {
   rtx op1 = operands[1];
diff --git a/gcc/testsuite/gcc.target/powerpc/pr113950.c b/gcc/testsuite/gcc.target/powerpc/pr113950.c
new file mode 100644
index 000..29ded29f683
--- /dev/null
+++ b/gcc/testsuite/gcc.target/powerpc/pr113950.c
@@ -0,0 +1,24 @@
+/* PR target/113950 */
+/* { dg-do compile } */
+/* { dg-options "-O1" } */
+
+/* Verify we do not ICE on the following.  */
+
+void abort (void);
+
+int main ()
+{
+  int i;
+  vector signed long long vsll_result, vsll_expected_result;
+  signed long long sll_arg1;
+
+  sll_arg1 = 300;
+  vsll_expected_result = (vector signed long long) {300, 300};
+  vsll_result = __builtin_vsx_splat_2di (sll_arg1);  
+
+  for (i = 0; i < 2; i++)
+if (vsll_result[i] != vsll_expected_result[i])
+  abort();
+
+  return 0;
+}

[PATCH] rs6000: load high and low part of 128bit vector independently [PR110040]

2024-02-25 Thread jeevitha

Hi All,

The following patch has been bootstrapped and regtested on powerpc64le-linux.

PR110040 exposes an issue with moves from vector registers to GPRs.
Such a move is done with two instructions, one for the upper 64 bits
and one for the lower 64 bits.  In the problematic test case, only the
lower 64 bits are stored, yet the instruction copying the upper 64 bits
is still emitted even though it is dead.  This patch adds a splitter that
breaks the move into two separate instructions so that DCE can remove
the dead one after splitting.
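Conceptually, the splitter turns one 128-bit copy into two independent 64-bit copies, so a consumer of only one half keeps only one copy alive. The portable sketch below models the two halves with GCC's `unsigned __int128` (available on 64-bit hosts); the helper names are hypothetical, and the mapping between vector lanes and high/low halves on POWER depends on endianness, which this sketch does not model.

```c
#include <stdint.h>

/* Each half is an independent 64-bit value.  */
static inline uint64_t
low_half (unsigned __int128 v)
{
  return (uint64_t) v;
}

static inline uint64_t
high_half (unsigned __int128 v)
{
  return (uint64_t) (v >> 64);
}

/* Storing only the low half depends only on low_half (); any copy of
   the high half would be dead, which is what DCE can remove once the
   128-bit move has been split.  */
static inline void
store_low (uint64_t *dst, unsigned __int128 src)
{
  *dst = low_half (src);
}
```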

2024-02-26  Jeevitha Palanisamy  

gcc/
PR target/110040
* config/rs6000/vsx.md (split pattern for V1TI to DI move): Defined.

gcc/testsuite/
PR target/110040
* gcc.target/powerpc/pr110040-1.c: New testcase.
* gcc.target/powerpc/pr110040-2.c: New testcase.


diff --git a/gcc/config/rs6000/vsx.md b/gcc/config/rs6000/vsx.md
index 6111cc90eb7..78457f8fb14 100644
--- a/gcc/config/rs6000/vsx.md
+++ b/gcc/config/rs6000/vsx.md
@@ -6706,3 +6706,19 @@
   "vmsumcud %0,%1,%2,%3"
   [(set_attr "type" "veccomplex")]
 )
+
+(define_split
+  [(set (match_operand:V1TI 0 "int_reg_operand")
+   (match_operand:V1TI 1 "vsx_register_operand"))]
+  "reload_completed
+   && TARGET_DIRECT_MOVE_64BIT"
+   [(pc)]
+{
+  rtx op0 = gen_rtx_REG (DImode, REGNO (operands[0]));
+  rtx op1 = gen_rtx_REG (V2DImode, REGNO (operands[1]));
+  rtx op2 = gen_rtx_REG (DImode, REGNO (operands[0]) + 1);
+  rtx op3 = gen_rtx_REG (V2DImode, REGNO (operands[1]));
+  emit_insn (gen_vsx_extract_v2di (op0, op1, GEN_INT (0)));
+  emit_insn (gen_vsx_extract_v2di (op2, op3, GEN_INT (1)));
+  DONE;
+})
diff --git a/gcc/testsuite/gcc.target/powerpc/pr110040-1.c b/gcc/testsuite/gcc.target/powerpc/pr110040-1.c
new file mode 100644
index 000..fb3bd254636
--- /dev/null
+++ b/gcc/testsuite/gcc.target/powerpc/pr110040-1.c
@@ -0,0 +1,14 @@
+/* PR target/110040 */
+/* { dg-do compile } */
+/* { dg-require-effective-target powerpc_p9vector_ok } */
+/* { dg-options "-O2 -mdejagnu-cpu=power9" } */
+/* { dg-final { scan-assembler-not {\mmfvsrd\M} } } */
+
+#include 
+
+void
+foo (signed long *dst, vector signed __int128 src)
+{
+  *dst = (signed long) src[0];
+}
+
diff --git a/gcc/testsuite/gcc.target/powerpc/pr110040-2.c b/gcc/testsuite/gcc.target/powerpc/pr110040-2.c
new file mode 100644
index 000..f3aa22be4e8
--- /dev/null
+++ b/gcc/testsuite/gcc.target/powerpc/pr110040-2.c
@@ -0,0 +1,13 @@
+/* PR target/110040 */
+/* { dg-do compile } */
+/* { dg-require-effective-target power10_ok } */
+/* { dg-options "-O2 -mdejagnu-cpu=power10" } */
+/* { dg-final { scan-assembler-not {\mmfvsrd\M} } } */
+
+#include 
+
+void
+foo (signed int *dst, vector signed __int128 src)
+{
+  __builtin_vec_xst_trunc (src, 0, dst);
+}

Re: [PATCH v1] RTL: Bugfix ICE after allow vector type in DSE

2024-02-25 Thread Hongtao Liu

On Mon, Feb 26, 2024 at 11:42 AM Li, Pan2  wrote:
>
> > Be Careful, It may regresses some other backend.
>
> Thanks Hongtao, how about take INNER_MODE here for regsize. Currently it will 
> be the whole vector register when comparation.
>
> poly_uint64 regsize = REGMODE_NATURAL_SIZE (imode);
>
> Pan
>
> -Original Message-
> From: Hongtao Liu 
> Sent: Monday, February 26, 2024 11:41 AM
> To: Li, Pan2 
> Cc: gcc-patches@gcc.gnu.org; juzhe.zh...@rivai.ai; kito.ch...@gmail.com; 
> richard.guent...@gmail.com; Wang, Yanzhang ; 
> rdapp@gmail.com
> Subject: Re: [PATCH v1] RTL: Bugfix ICE after allow vector type in DSE
>
> On Mon, Feb 26, 2024 at 11:26 AM  wrote:
> >
> > From: Pan Li 
> >
> > We allowed vector type for get_stored_val when read is less than or
> > equal to store in previous.  Unfortunately, we missed to adjust the
> > validate_subreg part accordingly.  For vector type, we don't need to
> > restrict the mode size is greater than the vector register size.
> >
> > Thus, for example when gen_lowpart from E_V2SFmode to E_V4QImode, it
> > will have NULL_RTX(of course ICE after that) because of the mode size
> > is less than vector register size.  That also explain that gen_lowpart
> > from E_V8SFmode to E_V16QImode is valid here.
> >
> > This patch would like to remove the the restriction for vector mode, to
> > rid of the ICE when gen_lowpart because of validate_subreg fails.
> Be Careful, It may regresses some other backend.
The related thread.
https://gcc.gnu.org/pipermail/gcc-patches/2021-August/578466.html
> >
> > The below test are passed for this patch:
> >
> > * The X86 bootstrap test.
> > * The fully riscv regression tests.
> >
> > gcc/ChangeLog:
> >
> > * emit-rtl.cc (validate_subreg): Bypass register size check
> > if the mode is vector.
> >
> > gcc/testsuite/ChangeLog:
> >
> > * gcc.dg/tree-ssa/ssa-fre-44.c: Add ftree-vectorize to trigger
> > the ICE.
> > * gcc.target/riscv/rvv/base/bug-6.c: New test.
> >
> > Signed-off-by: Pan Li 
> > ---
> >  gcc/emit-rtl.cc   |  3 ++-
> >  gcc/testsuite/gcc.dg/tree-ssa/ssa-fre-44.c|  2 +-
> >  .../gcc.target/riscv/rvv/base/bug-6.c | 22 +++
> >  3 files changed, 25 insertions(+), 2 deletions(-)
> >  create mode 100644 gcc/testsuite/gcc.target/riscv/rvv/base/bug-6.c
> >
> > diff --git a/gcc/emit-rtl.cc b/gcc/emit-rtl.cc
> > index 1856fa4884f..45c6301b487 100644
> > --- a/gcc/emit-rtl.cc
> > +++ b/gcc/emit-rtl.cc
> > @@ -934,7 +934,8 @@ validate_subreg (machine_mode omode, machine_mode imode,
> >  ;
> >/* ??? Similarly, e.g. with (subreg:DF (reg:TI)).  Though store_bit_field
> >   is the culprit here, and not the backends.  */
> > -  else if (known_ge (osize, regsize) && known_ge (isize, osize))
> > +  else if (known_ge (isize, osize) && (known_ge (osize, regsize)
> > +|| (VECTOR_MODE_P (imode) || VECTOR_MODE_P (omode))))
> >  ;
> >/* Allow component subregs of complex and vector.  Though given the below
> >   extraction rules, it's not always clear what that means.  */
> > diff --git a/gcc/testsuite/gcc.dg/tree-ssa/ssa-fre-44.c b/gcc/testsuite/gcc.dg/tree-ssa/ssa-fre-44.c
> > index f79b4c142ae..624a00a4f32 100644
> > --- a/gcc/testsuite/gcc.dg/tree-ssa/ssa-fre-44.c
> > +++ b/gcc/testsuite/gcc.dg/tree-ssa/ssa-fre-44.c
> > @@ -1,5 +1,5 @@
> >  /* { dg-do compile } */
> > -/* { dg-options "-O -fdump-tree-fre1" } */
> > +/* { dg-options "-O -fdump-tree-fre1 -O3 -ftree-vectorize" } */
> >
> >  struct A { float x, y; };
> >  struct B { struct A u; };
> > diff --git a/gcc/testsuite/gcc.target/riscv/rvv/base/bug-6.c b/gcc/testsuite/gcc.target/riscv/rvv/base/bug-6.c
> > new file mode 100644
> > index 000..5bb00b8f587
> > --- /dev/null
> > +++ b/gcc/testsuite/gcc.target/riscv/rvv/base/bug-6.c
> > @@ -0,0 +1,22 @@
> > +/* Test that we do not have ice when compile */
> > +/* { dg-do compile } */
> > +/* { dg-options "-march=rv64gcv -mabi=lp64d -O3 -ftree-vectorize" } */
> > +
> > +struct A { float x, y; };
> > +struct B { struct A u; };
> > +
> > +extern void bar (struct A *);
> > +
> > +float
> > +f3 (struct B *x, int y)
> > +{
> > +  struct A p = {1.0f, 2.0f};
> > +  struct A *q = &x[y].u;
> > +
> > +  __builtin_memcpy (&q->x, &p.x, sizeof (float));
> > +  __builtin_memcpy (&q->y, &p.y, sizeof (float));
> > +
> > +  bar (&p);
> > +
> > +  return x[y].u.x + x[y].u.y;
> > +}
> > --
> > 2.34.1
> >
>
>
> --
> BR,
> Hongtao



--
BR,
Hongtao

[PATCH] RISC-V: add option -m(no-)autovec-segment

2024-02-25 Thread Greg McGary

Add the option -m(no-)autovec-segment to allow or prevent the
autovectorizer from emitting vector segment load/store instructions.
This is useful for performance experiments.
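Segment loads and stores (the RVV `vlseg`/`vsseg` family) target interleaved accesses such as the array-of-structs loop below. A performance experiment with the new option would compile this kind of loop once with `-mautovec-segment` and once with `-mno-autovec-segment` and compare. The loop and the names `point` and `sum_fields` are hypothetical, not taken from the patch's tests, and the code generated in either configuration is not shown here.

```c
#include <stddef.h>

/* Array of structs: the x and y fields are interleaved in memory.  */
struct point { int x, y; };

/* Each iteration reads two interleaved fields, the access pattern that
   a two-field segment load can serve.  */
void
sum_fields (int *restrict out, const struct point *restrict in, size_t n)
{
  for (size_t i = 0; i < n; i++)
    out[i] = in[i].x + in[i].y;
}
```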

gcc/ChangeLog:
* config/riscv/autovec.md (vec_mask_len_load_lanes, vec_mask_len_store_lanes):
  Predicate with TARGET_VECTOR_AUTOVEC_SEGMENT.
* gcc/config/riscv/riscv-opts.h (TARGET_VECTOR_AUTOVEC_SEGMENT): New macro.
* gcc/config/riscv/riscv.opt (-m(no-)autovec-segment): New option.
* gcc/tree-vect-stmts.cc (gcc/tree-vect-stmts.cc): Prevent divide-by-zero.
* testsuite/gcc.target/riscv/rvv/autovec/struct/*_noseg*.c,
testsuite/gcc.target/riscv/rvv/autovec/no-segment.c: New tests.
---
 gcc/config/riscv/autovec.md   |  4 +-
 gcc/config/riscv/riscv-opts.h |  5 ++
 gcc/config/riscv/riscv.opt|  4 ++
 .../gcc.target/riscv/rvv/autovec/no-segment.c | 61 +++
 .../autovec/struct/mask_struct_load_noseg-1.c |  6 ++
 .../autovec/struct/mask_struct_load_noseg-2.c |  6 ++
 .../autovec/struct/mask_struct_load_noseg-3.c |  6 ++
 .../autovec/struct/mask_struct_load_noseg-4.c |  6 ++
 .../autovec/struct/mask_struct_load_noseg-5.c |  6 ++
 .../autovec/struct/mask_struct_load_noseg-6.c |  6 ++
 .../autovec/struct/mask_struct_load_noseg-7.c |  6 ++
 .../struct/mask_struct_load_noseg_run-1.c |  4 ++
 .../struct/mask_struct_load_noseg_run-2.c |  4 ++
 .../struct/mask_struct_load_noseg_run-3.c |  4 ++
 .../struct/mask_struct_load_noseg_run-4.c |  4 ++
 .../struct/mask_struct_load_noseg_run-5.c |  4 ++
 .../struct/mask_struct_load_noseg_run-6.c |  4 ++
 .../struct/mask_struct_load_noseg_run-7.c |  4 ++
 .../struct/mask_struct_store_noseg-1.c|  6 ++
 .../struct/mask_struct_store_noseg-2.c|  6 ++
 .../struct/mask_struct_store_noseg-3.c|  6 ++
 .../struct/mask_struct_store_noseg-4.c|  6 ++
 .../struct/mask_struct_store_noseg-5.c|  6 ++
 .../struct/mask_struct_store_noseg-6.c|  6 ++
 .../struct/mask_struct_store_noseg-7.c|  6 ++
 .../struct/mask_struct_store_noseg_run-1.c|  4 ++
 .../struct/mask_struct_store_noseg_run-2.c|  4 ++
 .../struct/mask_struct_store_noseg_run-3.c|  4 ++
 .../struct/mask_struct_store_noseg_run-4.c|  4 ++
 .../struct/mask_struct_store_noseg_run-5.c|  4 ++
 .../struct/mask_struct_store_noseg_run-6.c|  4 ++
 .../struct/mask_struct_store_noseg_run-7.c|  4 ++
 .../rvv/autovec/struct/struct_vect_noseg-1.c  |  8 +++
 .../rvv/autovec/struct/struct_vect_noseg-10.c |  7 +++
 .../rvv/autovec/struct/struct_vect_noseg-11.c |  7 +++
 .../rvv/autovec/struct/struct_vect_noseg-12.c |  7 +++
 .../rvv/autovec/struct/struct_vect_noseg-13.c |  6 ++
 .../rvv/autovec/struct/struct_vect_noseg-14.c |  6 ++
 .../rvv/autovec/struct/struct_vect_noseg-15.c |  6 ++
 .../rvv/autovec/struct/struct_vect_noseg-16.c |  6 ++
 .../rvv/autovec/struct/struct_vect_noseg-17.c |  6 ++
 .../rvv/autovec/struct/struct_vect_noseg-18.c |  6 ++
 .../rvv/autovec/struct/struct_vect_noseg-2.c  |  8 +++
 .../rvv/autovec/struct/struct_vect_noseg-3.c  |  8 +++
 .../rvv/autovec/struct/struct_vect_noseg-4.c  |  8 +++
 .../rvv/autovec/struct/struct_vect_noseg-5.c  |  8 +++
 .../rvv/autovec/struct/struct_vect_noseg-6.c  |  7 +++
 .../rvv/autovec/struct/struct_vect_noseg-7.c  |  7 +++
 .../rvv/autovec/struct/struct_vect_noseg-8.c  |  7 +++
 .../rvv/autovec/struct/struct_vect_noseg-9.c  |  7 +++
 .../autovec/struct/struct_vect_noseg_run-1.c  |  4 ++
 .../autovec/struct/struct_vect_noseg_run-10.c |  4 ++
 .../autovec/struct/struct_vect_noseg_run-11.c |  4 ++
 .../autovec/struct/struct_vect_noseg_run-12.c |  4 ++
 .../autovec/struct/struct_vect_noseg_run-13.c |  4 ++
 .../autovec/struct/struct_vect_noseg_run-14.c |  4 ++
 .../autovec/struct/struct_vect_noseg_run-15.c |  4 ++
 .../autovec/struct/struct_vect_noseg_run-16.c |  4 ++
 .../autovec/struct/struct_vect_noseg_run-17.c |  4 ++
 .../autovec/struct/struct_vect_noseg_run-18.c |  4 ++
 .../autovec/struct/struct_vect_noseg_run-2.c  |  4 ++
 .../autovec/struct/struct_vect_noseg_run-3.c  |  4 ++
 .../autovec/struct/struct_vect_noseg_run-4.c  |  4 ++
 .../autovec/struct/struct_vect_noseg_run-5.c  |  4 ++
 .../autovec/struct/struct_vect_noseg_run-6.c  |  4 ++
 .../autovec/struct/struct_vect_noseg_run-7.c  |  4 ++
 .../autovec/struct/struct_vect_noseg_run-8.c  |  4 ++
 .../autovec/struct/struct_vect_noseg_run-9.c  |  4 ++
 gcc/tree-vect-stmts.cc|  3 +-
 69 files changed, 411 insertions(+), 3 deletions(-)
 create mode 100644 gcc/testsuite/gcc.target/riscv/rvv/autovec/no-segment.c
 create mode 100644 gcc/testsuite/gcc.target/riscv/rvv/autovec/struct/mask_struct_load_noseg-1.c
 create mode 100644 gcc/testsuite/gcc.target/riscv/rvv/autovec/struct/mask_struct_load_noseg-2.c
 create mode 100644 gcc/testsuite/gcc.target/riscv/rvv/autovec/struct/mask_struct_load_noseg-3.c
 create mode 100644 gcc/testsuite/gcc.t

[PATCH 2/2] LoongArch: Remove unneeded sign extension after crc/crcc instructions

2024-02-25 Thread Xi Ruoyao

The specification of the crc/crcc instructions states that the output is
sign-extended to GRLEN.  Add a define_insn to tell the compiler this,
allowing it to remove the unneeded sign extension of the crc/crcc
output.  As crc/crcc instructions are usually used in tight loops, this
should give a significant performance gain.
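The fact the new define_insn encodes can be modeled in portable C: if the instruction already leaves its 32-bit result sign-extended in a 64-bit register, then truncating to 32 bits and sign-extending again yields the same value, so a separate extension instruction is redundant. `sext_w` is a hypothetical helper that models only the extension; it does not compute a CRC.

```c
#include <stdint.h>

/* Model of a 32-bit result written to a 64-bit GPR with sign extension.
   GCC defines the uint32_t -> int32_t conversion as modulo 2^32.  */
static inline int64_t
sext_w (uint32_t w)
{
  return (int64_t) (int32_t) w;
}
```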

gcc/ChangeLog:

* config/loongarch/loongarch.md
(loongarch__w__w_extended): New define_insn.

gcc/testsuite/ChangeLog:

* gcc.target/loongarch/crc-sext.c: New test.
---

Bootstrapped and regtested on loongarch64-linux-gnu.  Ok for trunk?

 gcc/config/loongarch/loongarch.md | 11 +++
 gcc/testsuite/gcc.target/loongarch/crc-sext.c | 13 +
 2 files changed, 24 insertions(+)
 create mode 100644 gcc/testsuite/gcc.target/loongarch/crc-sext.c

diff --git a/gcc/config/loongarch/loongarch.md b/gcc/config/loongarch/loongarch.md
index 4ded1b3a117..525e1e82183 100644
--- a/gcc/config/loongarch/loongarch.md
+++ b/gcc/config/loongarch/loongarch.md
@@ -4264,6 +4264,17 @@ (define_insn "loongarch__w__w"
   [(set_attr "type" "unknown")
(set_attr "mode" "")])
 
+(define_insn "loongarch__w__w_extended"
+  [(set (match_operand:DI 0 "register_operand" "=r")
+   (sign_extend:DI
+ (unspec:SI [(match_operand:QHSD 1 "register_operand" "r")
+ (match_operand:SI 2 "register_operand" "r")]
+CRC)))]
+  "TARGET_64BIT"
+  ".w..w\t%0,%1,%2"
+  [(set_attr "type" "unknown")
+   (set_attr "mode" "")])
+
 ;; With normal or medium code models, if the only use of a pc-relative
 ;; address is for loading or storing a value, then relying on linker
 ;; relaxation is not better than emitting the machine instruction directly.
diff --git a/gcc/testsuite/gcc.target/loongarch/crc-sext.c b/gcc/testsuite/gcc.target/loongarch/crc-sext.c
new file mode 100644
index 000..9ade5a8e4ca
--- /dev/null
+++ b/gcc/testsuite/gcc.target/loongarch/crc-sext.c
@@ -0,0 +1,13 @@
+/* { dg-do compile } */
+/* { dg-options "-O2 -march=loongarch64" } */
+/* { dg-final { check-function-bodies "**" "" } } */
+
+/*
+**my_crc:
+** crc.w.d.w   \$r4,\$r4,\$r5
+** jr  \$r1
+*/
+int my_crc(long long dword, int crc)
+{
+   return __builtin_loongarch_crc_w_d_w(dword, crc);
+}
-- 
2.44.0

[PATCH 1/2] LoongArch: NFC: Deduplicate crc instruction defines

2024-02-25 Thread Xi Ruoyao

Introduce an iterator for UNSPEC_CRC and UNSPEC_CRCC to make the next
change easier.

gcc/ChangeLog:

* config/loongarch/loongarch.md (CRC): New define_int_iterator.
(crc): New define_int_attr.
(loongarch_crc_w__w, loongarch_crcc_w__w): Unify
into ...
(loongarch__w__w): ... here.
---
 gcc/config/loongarch/loongarch.md | 18 +-
 1 file changed, 5 insertions(+), 13 deletions(-)

diff --git a/gcc/config/loongarch/loongarch.md b/gcc/config/loongarch/loongarch.md
index 2ce7a151880..4ded1b3a117 100644
--- a/gcc/config/loongarch/loongarch.md
+++ b/gcc/config/loongarch/loongarch.md
@@ -4251,24 +4251,16 @@ (define_peephole2
 
 
 (define_mode_iterator QHSD [QI HI SI DI])
+(define_int_iterator CRC [UNSPEC_CRC UNSPEC_CRCC])
+(define_int_attr crc [(UNSPEC_CRC "crc") (UNSPEC_CRCC "crcc")])
 
-(define_insn "loongarch_crc_w__w"
+(define_insn "loongarch__w__w"
   [(set (match_operand:SI 0 "register_operand" "=r")
(unspec:SI [(match_operand:QHSD 1 "register_operand" "r")
   (match_operand:SI 2 "register_operand" "r")]
-UNSPEC_CRC))]
+CRC))]
   ""
-  "crc.w..w\t%0,%1,%2"
-  [(set_attr "type" "unknown")
-   (set_attr "mode" "")])
-
-(define_insn "loongarch_crcc_w__w"
-  [(set (match_operand:SI 0 "register_operand" "=r")
-   (unspec:SI [(match_operand:QHSD 1 "register_operand" "r")
-  (match_operand:SI 2 "register_operand" "r")]
-UNSPEC_CRCC))]
-  ""
-  "crcc.w..w\t%0,%1,%2"
+  ".w..w\t%0,%1,%2"
   [(set_attr "type" "unknown")
(set_attr "mode" "")])
 
-- 
2.44.0

Re: [PATCH] x86: Properly implement AMX-TILE load/store intrinsics

2024-02-25 Thread H.J. Lu

On Sun, Feb 25, 2024 at 7:03 PM Hongtao Liu  wrote:
>
> On Mon, Feb 26, 2024 at 10:37 AM H.J. Lu  wrote:
> >
> > On Sun, Feb 25, 2024 at 6:03 PM Hongtao Liu  wrote:
> > >
> > > On Mon, Feb 26, 2024 at 5:11 AM H.J. Lu  wrote:
> > > >
> > > > ldtilecfg and sttilecfg take a 512-byte memory block.  With
> > > > _tile_loadconfig implemented as
> > > >
> > > > extern __inline void
> > > > __attribute__((__gnu_inline__, __always_inline__, __artificial__))
> > > > _tile_loadconfig (const void *__config)
> > > > {
> > > >   __asm__ volatile ("ldtilecfg\t%X0" :: "m" (*((const void **)__config)));
> > > > }
> > > >
> > > > GCC sees:
> > > >
> > > > (parallel [
> > > >   (asm_operands/v ("ldtilecfg   %X0") ("") 0
> > > >[(mem/f/c:DI (plus:DI (reg/f:DI 77 virtual-stack-vars)
> > > >  (const_int -64 [0xffc0])) [1 MEM[(const void * *)&tile_data]+0 S8 A128])]
> > > >[(asm_input:DI ("m"))]
> > > >(clobber (reg:CC 17 flags))])
> > > >
> > > > and the memory operand size is 1 byte.  As the result, the rest of 511
> > > > bytes is ignored by GCC.  Implement ldtilecfg and sttilecfg intrinsics
> > > > with a pointer to BLKmode to honor the 512-byte memory block.
> > > >
> > > > gcc/ChangeLog:
> > > >
> > > > PR target/114098
> > > > * config/i386/amxtileintrin.h (_tile_loadconfig): Use
> > > > __builtin_ia32_ldtilecfg.
> > > > (_tile_storeconfig): Use __builtin_ia32_sttilecfg.
> > > > * config/i386/i386-builtin.def (BDESC): Add
> > > > __builtin_ia32_ldtilecfg and __builtin_ia32_sttilecfg.
> > > > * config/i386/i386-expand.cc (ix86_expand_builtin): Handle
> > > > IX86_BUILTIN_LDTILECFG and IX86_BUILTIN_STTILECFG.
> > > > * config/i386/i386.md (ldtilecfg): New pattern.
> > > > (sttilecfg): Likewise.
> > > >
> > > > gcc/testsuite/ChangeLog:
> > > >
> > > > PR target/114098
> > > > * gcc.target/i386/amxtile-4.c: New test.
> > > > ---
> > > >  gcc/config/i386/amxtileintrin.h   |  4 +-
> > > >  gcc/config/i386/i386-builtin.def  |  4 ++
> > > >  gcc/config/i386/i386-expand.cc| 19 
> > > >  gcc/config/i386/i386.md   | 24 ++
> > > >  gcc/testsuite/gcc.target/i386/amxtile-4.c | 55 +++
> > > >  5 files changed, 104 insertions(+), 2 deletions(-)
> > > >  create mode 100644 gcc/testsuite/gcc.target/i386/amxtile-4.c
> > > >
> > > > diff --git a/gcc/config/i386/amxtileintrin.h b/gcc/config/i386/amxtileintrin.h
> > > > index d1a26e0fea5..5081b326498 100644
> > > > --- a/gcc/config/i386/amxtileintrin.h
> > > > +++ b/gcc/config/i386/amxtileintrin.h
> > > > @@ -39,14 +39,14 @@ extern __inline void
> > > >  __attribute__((__gnu_inline__, __always_inline__, __artificial__))
> > > >  _tile_loadconfig (const void *__config)
> > > >  {
> > > > -  __asm__ volatile ("ldtilecfg\t%X0" :: "m" (*((const void **)__config)));
> > > > +  __builtin_ia32_ldtilecfg (__config);
> > > >  }
> > > >
> > > >  extern __inline void
> > > >  __attribute__((__gnu_inline__, __always_inline__, __artificial__))
> > > >  _tile_storeconfig (void *__config)
> > > >  {
> > > > -  __asm__ volatile ("sttilecfg\t%X0" : "=m" (*((void **)__config)));
> > > > +  __builtin_ia32_sttilecfg (__config);
> > > >  }
> > > >
> > > >  extern __inline void
> > > > diff --git a/gcc/config/i386/i386-builtin.def b/gcc/config/i386/i386-builtin.def
> > > > index 729355230b8..88dd7f8857f 100644
> > > > --- a/gcc/config/i386/i386-builtin.def
> > > > +++ b/gcc/config/i386/i386-builtin.def
> > > > @@ -126,6 +126,10 @@ BDESC (OPTION_MASK_ISA_XSAVES | OPTION_MASK_ISA_64BIT, 0, CODE_FOR_nothing, "__b
> > > >  BDESC (OPTION_MASK_ISA_XSAVES | OPTION_MASK_ISA_64BIT, 0, CODE_FOR_nothing, "__builtin_ia32_xrstors64", IX86_BUILTIN_XRSTORS64, UNKNOWN, (int) VOID_FTYPE_PVOID_INT64)
> > > >  BDESC (OPTION_MASK_ISA_XSAVEC | OPTION_MASK_ISA_64BIT, 0, CODE_FOR_nothing, "__builtin_ia32_xsavec64", IX86_BUILTIN_XSAVEC64, UNKNOWN, (int) VOID_FTYPE_PVOID_INT64)
> > > >
> > > > +/* LDFILECFG and STFILECFG.  */
> > > > +BDESC (OPTION_MASK_ISA_64BIT, OPTION_MASK_ISA2_AMX_TILE, CODE_FOR_ldtilecfg, "__builtin_ia32_ldtilecfg", IX86_BUILTIN_LDTILECFG, UNKNOWN, (int) VOID_FTYPE_PCVOID)
> > > > +BDESC (OPTION_MASK_ISA_64BIT, OPTION_MASK_ISA2_AMX_TILE, CODE_FOR_ldtilecfg, "__builtin_ia32_sttilecfg", IX86_BUILTIN_STTILECFG, UNKNOWN, (int) VOID_FTYPE_PVOID)
> > > CODE_FOR_sttilecfg.
> >
> > It is unused.  I changed both to CODE_FOR_nothing.
> >
> > > > +
> > > >  /* SSE */
> > > >  BDESC (OPTION_MASK_ISA_SSE, 0, CODE_FOR_movv4sf_internal, "__builtin_ia32_storeups", IX86_BUILTIN_STOREUPS, UNKNOWN, (int) VOID_FTYPE_PFLOAT_V4SF)
> > > >  BDESC (OPTION_MASK_ISA_SSE, 0, CODE_FOR_sse_movntv4sf, "__builtin_ia32_movntps", IX86_BUILTIN_MOVNTPS, UNKNOWN, (int) VOID_
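The root cause in the AMX thread is that the `"m"` operand only describes `sizeof (const void *)` bytes, so GCC does not know the asm reads the rest of the block. The patch fixes this by switching to builtins. For inline asm in general, a common alternative idiom is to dereference a pointer to a fixed-size struct so the operand covers the whole block. A minimal sketch, assuming an empty asm template so it builds on any GCC target; `cfg_block`, `consume_cfg`, and the block size are hypothetical and illustrative, not part of amxtileintrin.h.

```c
/* Illustrative block size (hypothetical).  */
#define CFG_BYTES 64

/* Wrapping the bytes in a struct gives the asm operand a known size.  */
struct cfg_block { unsigned char bytes[CFG_BYTES]; };

static inline void
consume_cfg (const void *cfg)
{
  /* The "m" operand covers all CFG_BYTES bytes at CFG, so the compiler
     must treat the whole block as read by the asm: earlier stores to
     any of those bytes cannot be deferred past it or deleted.  */
  __asm__ volatile ("" : : "m" (*(const struct cfg_block *) cfg));
}
```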

RE: [PATCH v1] RTL: Bugfix ICE after allow vector type in DSE

2024-02-25 Thread Li, Pan2

> Be Careful, It may regresses some other backend.

Thanks Hongtao.  How about using the INNER_MODE for regsize here?  Currently
it is the size of the whole vector register when comparing.

poly_uint64 regsize = REGMODE_NATURAL_SIZE (imode);

Pan

-Original Message-
From: Hongtao Liu  
Sent: Monday, February 26, 2024 11:41 AM
To: Li, Pan2 
Cc: gcc-patches@gcc.gnu.org; juzhe.zh...@rivai.ai; kito.ch...@gmail.com; 
richard.guent...@gmail.com; Wang, Yanzhang ; 
rdapp@gmail.com
Subject: Re: [PATCH v1] RTL: Bugfix ICE after allow vector type in DSE

On Mon, Feb 26, 2024 at 11:26 AM  wrote:
>
> From: Pan Li 
>
> We allowed vector type for get_stored_val when read is less than or
> equal to store in previous.  Unfortunately, we missed to adjust the
> validate_subreg part accordingly.  For vector type, we don't need to
> restrict the mode size is greater than the vector register size.
>
> Thus, for example when gen_lowpart from E_V2SFmode to E_V4QImode, it
> will have NULL_RTX(of course ICE after that) because of the mode size
> is less than vector register size.  That also explain that gen_lowpart
> from E_V8SFmode to E_V16QImode is valid here.
>
> This patch would like to remove the the restriction for vector mode, to
> rid of the ICE when gen_lowpart because of validate_subreg fails.
Be Careful, It may regresses some other backend.
>
> The below test are passed for this patch:
>
> * The X86 bootstrap test.
> * The fully riscv regression tests.
>
> gcc/ChangeLog:
>
> * emit-rtl.cc (validate_subreg): Bypass register size check
> if the mode is vector.
>
> gcc/testsuite/ChangeLog:
>
> * gcc.dg/tree-ssa/ssa-fre-44.c: Add ftree-vectorize to trigger
> the ICE.
> * gcc.target/riscv/rvv/base/bug-6.c: New test.
>
> Signed-off-by: Pan Li 
> ---
>  gcc/emit-rtl.cc   |  3 ++-
>  gcc/testsuite/gcc.dg/tree-ssa/ssa-fre-44.c|  2 +-
>  .../gcc.target/riscv/rvv/base/bug-6.c | 22 +++
>  3 files changed, 25 insertions(+), 2 deletions(-)
>  create mode 100644 gcc/testsuite/gcc.target/riscv/rvv/base/bug-6.c
>
> diff --git a/gcc/emit-rtl.cc b/gcc/emit-rtl.cc
> index 1856fa4884f..45c6301b487 100644
> --- a/gcc/emit-rtl.cc
> +++ b/gcc/emit-rtl.cc
> @@ -934,7 +934,8 @@ validate_subreg (machine_mode omode, machine_mode imode,
>  ;
>/* ??? Similarly, e.g. with (subreg:DF (reg:TI)).  Though store_bit_field
>   is the culprit here, and not the backends.  */
> -  else if (known_ge (osize, regsize) && known_ge (isize, osize))
> +  else if (known_ge (isize, osize) && (known_ge (osize, regsize)
> +|| (VECTOR_MODE_P (imode) || VECTOR_MODE_P (omode))))
>  ;
>/* Allow component subregs of complex and vector.  Though given the below
>   extraction rules, it's not always clear what that means.  */
> diff --git a/gcc/testsuite/gcc.dg/tree-ssa/ssa-fre-44.c b/gcc/testsuite/gcc.dg/tree-ssa/ssa-fre-44.c
> index f79b4c142ae..624a00a4f32 100644
> --- a/gcc/testsuite/gcc.dg/tree-ssa/ssa-fre-44.c
> +++ b/gcc/testsuite/gcc.dg/tree-ssa/ssa-fre-44.c
> @@ -1,5 +1,5 @@
>  /* { dg-do compile } */
> -/* { dg-options "-O -fdump-tree-fre1" } */
> +/* { dg-options "-O -fdump-tree-fre1 -O3 -ftree-vectorize" } */
>
>  struct A { float x, y; };
>  struct B { struct A u; };
> diff --git a/gcc/testsuite/gcc.target/riscv/rvv/base/bug-6.c b/gcc/testsuite/gcc.target/riscv/rvv/base/bug-6.c
> new file mode 100644
> index 000..5bb00b8f587
> --- /dev/null
> +++ b/gcc/testsuite/gcc.target/riscv/rvv/base/bug-6.c
> @@ -0,0 +1,22 @@
> +/* Test that we do not have ice when compile */
> +/* { dg-do compile } */
> +/* { dg-options "-march=rv64gcv -mabi=lp64d -O3 -ftree-vectorize" } */
> +
> +struct A { float x, y; };
> +struct B { struct A u; };
> +
> +extern void bar (struct A *);
> +
> +float
> +f3 (struct B *x, int y)
> +{
> +  struct A p = {1.0f, 2.0f};
> +  struct A *q = &x[y].u;
> +
> +  __builtin_memcpy (&q->x, &p.x, sizeof (float));
> +  __builtin_memcpy (&q->y, &p.y, sizeof (float));
> +
> +  bar (&p);
> +
> +  return x[y].u.x + x[y].u.y;
> +}
> --
> 2.34.1
>


-- 
BR,
Hongtao

Re: [PATCH v1] RTL: Bugfix ICE after allow vector type in DSE

2024-02-25 Thread Hongtao Liu

On Mon, Feb 26, 2024 at 11:26 AM  wrote:
>
> From: Pan Li 
>
> We allowed vector type for get_stored_val when read is less than or
> equal to store in previous.  Unfortunately, we missed to adjust the
> validate_subreg part accordingly.  For vector type, we don't need to
> restrict the mode size is greater than the vector register size.
>
> Thus, for example when gen_lowpart from E_V2SFmode to E_V4QImode, it
> will have NULL_RTX(of course ICE after that) because of the mode size
> is less than vector register size.  That also explain that gen_lowpart
> from E_V8SFmode to E_V16QImode is valid here.
>
> This patch would like to remove the the restriction for vector mode, to
> rid of the ICE when gen_lowpart because of validate_subreg fails.
Be careful, it may regress some other backends.
>
> The below test are passed for this patch:
>
> * The X86 bootstrap test.
> * The fully riscv regression tests.
>
> gcc/ChangeLog:
>
> * emit-rtl.cc (validate_subreg): Bypass register size check
> if the mode is vector.
>
> gcc/testsuite/ChangeLog:
>
> * gcc.dg/tree-ssa/ssa-fre-44.c: Add ftree-vectorize to trigger
> the ICE.
> * gcc.target/riscv/rvv/base/bug-6.c: New test.
>
> Signed-off-by: Pan Li 
> ---
>  gcc/emit-rtl.cc   |  3 ++-
>  gcc/testsuite/gcc.dg/tree-ssa/ssa-fre-44.c|  2 +-
>  .../gcc.target/riscv/rvv/base/bug-6.c | 22 +++
>  3 files changed, 25 insertions(+), 2 deletions(-)
>  create mode 100644 gcc/testsuite/gcc.target/riscv/rvv/base/bug-6.c
>
> diff --git a/gcc/emit-rtl.cc b/gcc/emit-rtl.cc
> index 1856fa4884f..45c6301b487 100644
> --- a/gcc/emit-rtl.cc
> +++ b/gcc/emit-rtl.cc
> @@ -934,7 +934,8 @@ validate_subreg (machine_mode omode, machine_mode imode,
>  ;
>/* ??? Similarly, e.g. with (subreg:DF (reg:TI)).  Though store_bit_field
>   is the culprit here, and not the backends.  */
> -  else if (known_ge (osize, regsize) && known_ge (isize, osize))
> +  else if (known_ge (isize, osize) && (known_ge (osize, regsize)
> +|| (VECTOR_MODE_P (imode) || VECTOR_MODE_P (omode))))
>  ;
>/* Allow component subregs of complex and vector.  Though given the below
>   extraction rules, it's not always clear what that means.  */
> diff --git a/gcc/testsuite/gcc.dg/tree-ssa/ssa-fre-44.c b/gcc/testsuite/gcc.dg/tree-ssa/ssa-fre-44.c
> index f79b4c142ae..624a00a4f32 100644
> --- a/gcc/testsuite/gcc.dg/tree-ssa/ssa-fre-44.c
> +++ b/gcc/testsuite/gcc.dg/tree-ssa/ssa-fre-44.c
> @@ -1,5 +1,5 @@
>  /* { dg-do compile } */
> -/* { dg-options "-O -fdump-tree-fre1" } */
> +/* { dg-options "-O -fdump-tree-fre1 -O3 -ftree-vectorize" } */
>
>  struct A { float x, y; };
>  struct B { struct A u; };
> diff --git a/gcc/testsuite/gcc.target/riscv/rvv/base/bug-6.c b/gcc/testsuite/gcc.target/riscv/rvv/base/bug-6.c
> new file mode 100644
> index 000..5bb00b8f587
> --- /dev/null
> +++ b/gcc/testsuite/gcc.target/riscv/rvv/base/bug-6.c
> @@ -0,0 +1,22 @@
> +/* Test that we do not have ice when compile */
> +/* { dg-do compile } */
> +/* { dg-options "-march=rv64gcv -mabi=lp64d -O3 -ftree-vectorize" } */
> +
> +struct A { float x, y; };
> +struct B { struct A u; };
> +
> +extern void bar (struct A *);
> +
> +float
> +f3 (struct B *x, int y)
> +{
> +  struct A p = {1.0f, 2.0f};
> +  struct A *q = &x[y].u;
> +
> +  __builtin_memcpy (&q->x, &p.x, sizeof (float));
> +  __builtin_memcpy (&q->y, &p.y, sizeof (float));
> +
> +  bar (&p);
> +
> +  return x[y].u.x + x[y].u.y;
> +}
> --
> 2.34.1
>


-- 
BR,
Hongtao

[PATCH] fwprop: Avoid volatile defines to be propagated

2024-02-25 Thread HAO CHEN GUI

Hi,
  This patch tries to fix a potential problem raised by the patch for
PR111267.  With that patch, a volatile asm operand may be propagated
into a single-set insn, which is wrong.  Currently the set_src_cost
comparison rejects such a propagation, but the propagation might be
taken once set_src_cost is replaced with insn cost.  I actually found
the problem while testing my patch that replaces set_src_cost with insn
cost in fwprop.

  Bootstrapped and tested on x86 and powerpc64-linux BE and LE with no
regressions. Is it OK for the trunk?

Thanks
Gui Haochen

ChangeLog
fwprop: Avoid volatile defines to be propagated

The patch for PR111267 (commit id 86de9b66480b710202a2898cf513db105d8c432f)
introduces an exception for propagation into a single-set insn: a
propagation that might not be profitable (as checked by profitable_p) is
still allowed into a single-set insn.  This has a potential problem: a
volatile asm operand may then be propagated into a single-set insn,
although volatile asm operands were originally rejected by profitable_p.
This patch fixes the problem by skipping volatile set sources when
finding the defining set.
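The interaction can be summarized as a small decision function. This is a simplified, hypothetical model of the logic described above; the struct, field, and function names are invented for illustration and do not match fwprop.cc. Before the patch, the single-set exception bypasses the only check that rejected volatile sources; after it, volatile sources are rejected first.

```c
#include <stdbool.h>

/* Hypothetical summary of one candidate propagation.  */
struct prop_candidate
{
  bool src_is_volatile;  /* e.g. the def is a volatile asm  */
  bool profitable;       /* outcome of the cost comparison  */
  bool into_single_set;  /* the use is in a single-set insn  */
};

/* Before: volatile sources are rejected only through the profitability
   check, which the single-set exception overrides.  */
static bool
may_propagate_before (struct prop_candidate c)
{
  bool profitable = c.profitable && !c.src_is_volatile;
  return profitable || c.into_single_set;
}

/* After: volatile sources are rejected up front.  */
static bool
may_propagate_after (struct prop_candidate c)
{
  if (c.src_is_volatile)
    return false;
  return c.profitable || c.into_single_set;
}
```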

gcc/
* fwprop.cc (forward_propagate_into): Return false for volatile set
source.

gcc/testsuite/
* gcc.target/powerpc/fwprop-1.c: New.

patch.diff
diff --git a/gcc/fwprop.cc b/gcc/fwprop.cc
index 7872609b336..89dce88b43d 100644
--- a/gcc/fwprop.cc
+++ b/gcc/fwprop.cc
@@ -854,6 +854,8 @@ forward_propagate_into (use_info *use, bool reg_prop_only = false)

   rtx dest = SET_DEST (def_set);
   rtx src = SET_SRC (def_set);
+  if (volatile_insn_p (src))
+return false;

   /* Allow propagations into a loop only for reg-to-reg copies, since
  replacing one register by another shouldn't increase the cost.
diff --git a/gcc/testsuite/gcc.target/powerpc/fwprop-1.c b/gcc/testsuite/gcc.target/powerpc/fwprop-1.c
new file mode 100644
index 000..07b207f980c
--- /dev/null
+++ b/gcc/testsuite/gcc.target/powerpc/fwprop-1.c
@@ -0,0 +1,15 @@
+/* { dg-do compile } */
+/* { dg-options "-O1 -fdump-rtl-fwprop1-details" } */
+/* { dg-final { scan-rtl-dump-not "propagating insn" "fwprop1" } } */
+
+/* Verify that volatile asm operands are not propagated.  */
+long long foo ()
+{
+  long long res;
+  __asm__ __volatile__(
+""
+  : "=r" (res)
+  :
+  : "memory");
+  return res;
+}

[PATCH v1] RTL: Bugfix ICE after allow vector type in DSE

2024-02-25 Thread pan2 . li

From: Pan Li 

We allowed vector types in get_stored_val when the read is less than or
equal to the previous store.  Unfortunately, we missed adjusting the
validate_subreg part accordingly.  For vector types, we don't need to
require that the mode size be greater than or equal to the vector
register size.

Thus, for example, gen_lowpart from E_V2SFmode to E_V4QImode returns
NULL_RTX (and of course an ICE follows) because the mode size is less
than the vector register size.  This also explains why gen_lowpart from
E_V8SFmode to E_V16QImode is valid here.

This patch removes the restriction for vector modes, to get rid of the
ICE when gen_lowpart fails because validate_subreg fails.
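To make the condition change concrete, here is a minimal C sketch that models only this one branch of validate_subreg with plain byte counts. The function names are hypothetical, `vector_mode` stands in for `VECTOR_MODE_P (imode) || VECTOR_MODE_P (omode)`, and a 16-byte register size is assumed purely for illustration; the real code compares poly_int sizes against a target-defined register size, and later branches of validate_subreg may still accept a subreg that this branch rejects.

```c
#include <assert.h>
#include <stdbool.h>

/* Hypothetical model of the old check.  OSIZE is the outer (result)
   mode size, ISIZE the inner mode size, REGSIZE the register size,
   all in bytes.  A lowpart narrower than a register is rejected.  */
static bool
subreg_size_ok_before (unsigned int osize, unsigned int isize,
                       unsigned int regsize)
{
  assert (regsize > 0);
  return osize >= regsize && isize >= osize;
}

/* Hypothetical model of the new check: the register-size test is
   skipped when either mode is a vector mode.  */
static bool
subreg_size_ok_after (unsigned int osize, unsigned int isize,
                      unsigned int regsize, bool vector_mode)
{
  assert (regsize > 0);
  return isize >= osize && (osize >= regsize || vector_mode);
}
```

With these assumptions, the V2SF to V4QI lowpart (8 bytes to 4 bytes) fails the old test but passes the new one, while V8SF to V16QI (32 bytes to 16 bytes) passes both.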

The following tests pass with this patch:

* The x86 bootstrap test.
* The full RISC-V regression tests.

gcc/ChangeLog:

* emit-rtl.cc (validate_subreg): Bypass register size check
if the mode is vector.

gcc/testsuite/ChangeLog:

* gcc.dg/tree-ssa/ssa-fre-44.c: Add ftree-vectorize to trigger
the ICE.
* gcc.target/riscv/rvv/base/bug-6.c: New test.

Signed-off-by: Pan Li 
---
 gcc/emit-rtl.cc   |  3 ++-
 gcc/testsuite/gcc.dg/tree-ssa/ssa-fre-44.c|  2 +-
 .../gcc.target/riscv/rvv/base/bug-6.c | 22 +++
 3 files changed, 25 insertions(+), 2 deletions(-)
 create mode 100644 gcc/testsuite/gcc.target/riscv/rvv/base/bug-6.c

diff --git a/gcc/emit-rtl.cc b/gcc/emit-rtl.cc
index 1856fa4884f..45c6301b487 100644
--- a/gcc/emit-rtl.cc
+++ b/gcc/emit-rtl.cc
@@ -934,7 +934,8 @@ validate_subreg (machine_mode omode, machine_mode imode,
 ;
   /* ??? Similarly, e.g. with (subreg:DF (reg:TI)).  Though store_bit_field
  is the culprit here, and not the backends.  */
-  else if (known_ge (osize, regsize) && known_ge (isize, osize))
+  else if (known_ge (isize, osize) && (known_ge (osize, regsize)
+                                       || (VECTOR_MODE_P (imode) || VECTOR_MODE_P (omode))))
 ;
   /* Allow component subregs of complex and vector.  Though given the below
  extraction rules, it's not always clear what that means.  */
diff --git a/gcc/testsuite/gcc.dg/tree-ssa/ssa-fre-44.c b/gcc/testsuite/gcc.dg/tree-ssa/ssa-fre-44.c
index f79b4c142ae..624a00a4f32 100644
--- a/gcc/testsuite/gcc.dg/tree-ssa/ssa-fre-44.c
+++ b/gcc/testsuite/gcc.dg/tree-ssa/ssa-fre-44.c
@@ -1,5 +1,5 @@
 /* { dg-do compile } */
-/* { dg-options "-O -fdump-tree-fre1" } */
+/* { dg-options "-O -fdump-tree-fre1 -O3 -ftree-vectorize" } */
 
 struct A { float x, y; };
 struct B { struct A u; };
diff --git a/gcc/testsuite/gcc.target/riscv/rvv/base/bug-6.c b/gcc/testsuite/gcc.target/riscv/rvv/base/bug-6.c
new file mode 100644
index 000..5bb00b8f587
--- /dev/null
+++ b/gcc/testsuite/gcc.target/riscv/rvv/base/bug-6.c
@@ -0,0 +1,22 @@
+/* Test that we do not ICE when compiling.  */
+/* { dg-do compile } */
+/* { dg-options "-march=rv64gcv -mabi=lp64d -O3 -ftree-vectorize" } */
+
+struct A { float x, y; };
+struct B { struct A u; };
+
+extern void bar (struct A *);
+
+float
+f3 (struct B *x, int y)
+{
+  struct A p = {1.0f, 2.0f};
+  struct A *q = &x[y].u;
+
+  __builtin_memcpy (&q->x, &p.x, sizeof (float));
+  __builtin_memcpy (&q->y, &p.y, sizeof (float));
+
+  bar (&p);
+
+  return x[y].u.x + x[y].u.y;
+}
-- 
2.34.1

Re: [PATCH] x86: Properly implement AMX-TILE load/store intrinsics

2024-02-25 Thread Hongtao Liu

On Mon, Feb 26, 2024 at 10:37 AM H.J. Lu  wrote:
>
> On Sun, Feb 25, 2024 at 6:03 PM Hongtao Liu  wrote:
> >
> > On Mon, Feb 26, 2024 at 5:11 AM H.J. Lu  wrote:
> > >
> > > ldtilecfg and sttilecfg take a 512-bit (64-byte) memory block.  With
> > > _tile_loadconfig implemented as
> > >
> > > extern __inline void
> > > __attribute__((__gnu_inline__, __always_inline__, __artificial__))
> > > _tile_loadconfig (const void *__config)
> > > {
> > >   __asm__ volatile ("ldtilecfg\t%X0" :: "m" (*((const void **)__config)));
> > > }
> > >
> > > GCC sees:
> > >
> > > (parallel [
> > >   (asm_operands/v ("ldtilecfg   %X0") ("") 0
> > >[(mem/f/c:DI (plus:DI (reg/f:DI 77 virtual-stack-vars)
> > >  (const_int -64 [0xffc0])) [1 MEM[(const void * *)&tile_data]+0 S8 A128])]
> > >[(asm_input:DI ("m"))]
> > >(clobber (reg:CC 17 flags))])
> > >
> > > and the memory operand size is 8 bytes (DImode).  As a result, the
> > > remaining 56 bytes are ignored by GCC.  Implement ldtilecfg and sttilecfg
> > > intrinsics with a pointer to BLKmode to honor the 512-bit memory block.
> > >
> > > gcc/ChangeLog:
> > >
> > > PR target/114098
> > > * config/i386/amxtileintrin.h (_tile_loadconfig): Use
> > > __builtin_ia32_ldtilecfg.
> > > (_tile_storeconfig): Use __builtin_ia32_sttilecfg.
> > > * config/i386/i386-builtin.def (BDESC): Add
> > > __builtin_ia32_ldtilecfg and __builtin_ia32_sttilecfg.
> > > * config/i386/i386-expand.cc (ix86_expand_builtin): Handle
> > > IX86_BUILTIN_LDTILECFG and IX86_BUILTIN_STTILECFG.
> > > * config/i386/i386.md (ldtilecfg): New pattern.
> > > (sttilecfg): Likewise.
> > >
> > > gcc/testsuite/ChangeLog:
> > >
> > > PR target/114098
> > > * gcc.target/i386/amxtile-4.c: New test.
> > > ---
> > >  gcc/config/i386/amxtileintrin.h   |  4 +-
> > >  gcc/config/i386/i386-builtin.def  |  4 ++
> > >  gcc/config/i386/i386-expand.cc| 19 
> > >  gcc/config/i386/i386.md   | 24 ++
> > >  gcc/testsuite/gcc.target/i386/amxtile-4.c | 55 +++
> > >  5 files changed, 104 insertions(+), 2 deletions(-)
> > >  create mode 100644 gcc/testsuite/gcc.target/i386/amxtile-4.c
> > >
> > > diff --git a/gcc/config/i386/amxtileintrin.h b/gcc/config/i386/amxtileintrin.h
> > > index d1a26e0fea5..5081b326498 100644
> > > --- a/gcc/config/i386/amxtileintrin.h
> > > +++ b/gcc/config/i386/amxtileintrin.h
> > > @@ -39,14 +39,14 @@ extern __inline void
> > >  __attribute__((__gnu_inline__, __always_inline__, __artificial__))
> > >  _tile_loadconfig (const void *__config)
> > >  {
> > > -  __asm__ volatile ("ldtilecfg\t%X0" :: "m" (*((const void **)__config)));
> > > +  __builtin_ia32_ldtilecfg (__config);
> > >  }
> > >
> > >  extern __inline void
> > >  __attribute__((__gnu_inline__, __always_inline__, __artificial__))
> > >  _tile_storeconfig (void *__config)
> > >  {
> > > -  __asm__ volatile ("sttilecfg\t%X0" : "=m" (*((void **)__config)));
> > > +  __builtin_ia32_sttilecfg (__config);
> > >  }
> > >
> > >  extern __inline void
> > > diff --git a/gcc/config/i386/i386-builtin.def b/gcc/config/i386/i386-builtin.def
> > > index 729355230b8..88dd7f8857f 100644
> > > --- a/gcc/config/i386/i386-builtin.def
> > > +++ b/gcc/config/i386/i386-builtin.def
> > > @@ -126,6 +126,10 @@ BDESC (OPTION_MASK_ISA_XSAVES | OPTION_MASK_ISA_64BIT, 0, CODE_FOR_nothing, "__b
> > >  BDESC (OPTION_MASK_ISA_XSAVES | OPTION_MASK_ISA_64BIT, 0, 
> > > CODE_FOR_nothing, "__builtin_ia32_xrstors64", IX86_BUILTIN_XRSTORS64, 
> > > UNKNOWN, (int) VOID_FTYPE_PVOID_INT64)
> > >  BDESC (OPTION_MASK_ISA_XSAVEC | OPTION_MASK_ISA_64BIT, 0, 
> > > CODE_FOR_nothing, "__builtin_ia32_xsavec64", IX86_BUILTIN_XSAVEC64, 
> > > UNKNOWN, (int) VOID_FTYPE_PVOID_INT64)
> > >
> > > +/* LDTILECFG and STTILECFG.  */
> > > +BDESC (OPTION_MASK_ISA_64BIT, OPTION_MASK_ISA2_AMX_TILE, CODE_FOR_ldtilecfg, "__builtin_ia32_ldtilecfg", IX86_BUILTIN_LDTILECFG, UNKNOWN, (int) VOID_FTYPE_PCVOID)
> > > +BDESC (OPTION_MASK_ISA_64BIT, OPTION_MASK_ISA2_AMX_TILE, CODE_FOR_ldtilecfg, "__builtin_ia32_sttilecfg", IX86_BUILTIN_STTILECFG, UNKNOWN, (int) VOID_FTYPE_PVOID)
> > CODE_FOR_sttilecfg.
>
> It is unused.  I changed both to CODE_FOR_nothing.
>
> > > +
> > >  /* SSE */
> > >  BDESC (OPTION_MASK_ISA_SSE, 0, CODE_FOR_movv4sf_internal, 
> > > "__builtin_ia32_storeups", IX86_BUILTIN_STOREUPS, UNKNOWN, (int) 
> > > VOID_FTYPE_PFLOAT_V4SF)
> > >  BDESC (OPTION_MASK_ISA_SSE, 0, CODE_FOR_sse_movntv4sf, 
> > > "__builtin_ia32_movntps", IX86_BUILTIN_MOVNTPS, UNKNOWN, (int) 
> > > VOID_FTYPE_PFLOAT_V4SF)
> > > diff --git a/gcc/config/i386/i386-expand.cc b/gcc/config/i386/i386-expand.cc
> > > index a4d3369f01b..17993eb837f 100644
> > > --- a/gcc/config/i386/i386-expand.cc
> > > +++ b/gcc/config/i386/i386-expand.cc
> > > @@ -14152,6 +14152,25 @@ ix86_expand_built

Re: [PATCH] x86: Properly implement AMX-TILE load/store intrinsics

2024-02-25 Thread H.J. Lu

On Sun, Feb 25, 2024 at 6:03 PM Hongtao Liu  wrote:
>
> On Mon, Feb 26, 2024 at 5:11 AM H.J. Lu  wrote:
> >
> > ldtilecfg and sttilecfg take a 512-bit (64-byte) memory block.  With
> > _tile_loadconfig implemented as
> >
> > extern __inline void
> > __attribute__((__gnu_inline__, __always_inline__, __artificial__))
> > _tile_loadconfig (const void *__config)
> > {
> >   __asm__ volatile ("ldtilecfg\t%X0" :: "m" (*((const void **)__config)));
> > }
> >
> > GCC sees:
> >
> > (parallel [
> >   (asm_operands/v ("ldtilecfg   %X0") ("") 0
> >[(mem/f/c:DI (plus:DI (reg/f:DI 77 virtual-stack-vars)
> >  (const_int -64 [0xffc0])) [1 MEM[(const void * *)&tile_data]+0 S8 A128])]
> >[(asm_input:DI ("m"))]
> >(clobber (reg:CC 17 flags))])
> >
> > and the memory operand size is 8 bytes (DImode).  As a result, the
> > remaining 56 bytes are ignored by GCC.  Implement ldtilecfg and sttilecfg
> > intrinsics with a pointer to BLKmode to honor the 512-bit memory block.
> >
> > gcc/ChangeLog:
> >
> > PR target/114098
> > * config/i386/amxtileintrin.h (_tile_loadconfig): Use
> > __builtin_ia32_ldtilecfg.
> > (_tile_storeconfig): Use __builtin_ia32_sttilecfg.
> > * config/i386/i386-builtin.def (BDESC): Add
> > __builtin_ia32_ldtilecfg and __builtin_ia32_sttilecfg.
> > * config/i386/i386-expand.cc (ix86_expand_builtin): Handle
> > IX86_BUILTIN_LDTILECFG and IX86_BUILTIN_STTILECFG.
> > * config/i386/i386.md (ldtilecfg): New pattern.
> > (sttilecfg): Likewise.
> >
> > gcc/testsuite/ChangeLog:
> >
> > PR target/114098
> > * gcc.target/i386/amxtile-4.c: New test.
> > ---
> >  gcc/config/i386/amxtileintrin.h   |  4 +-
> >  gcc/config/i386/i386-builtin.def  |  4 ++
> >  gcc/config/i386/i386-expand.cc| 19 
> >  gcc/config/i386/i386.md   | 24 ++
> >  gcc/testsuite/gcc.target/i386/amxtile-4.c | 55 +++
> >  5 files changed, 104 insertions(+), 2 deletions(-)
> >  create mode 100644 gcc/testsuite/gcc.target/i386/amxtile-4.c
> >
> > diff --git a/gcc/config/i386/amxtileintrin.h b/gcc/config/i386/amxtileintrin.h
> > index d1a26e0fea5..5081b326498 100644
> > --- a/gcc/config/i386/amxtileintrin.h
> > +++ b/gcc/config/i386/amxtileintrin.h
> > @@ -39,14 +39,14 @@ extern __inline void
> >  __attribute__((__gnu_inline__, __always_inline__, __artificial__))
> >  _tile_loadconfig (const void *__config)
> >  {
> > -  __asm__ volatile ("ldtilecfg\t%X0" :: "m" (*((const void **)__config)));
> > +  __builtin_ia32_ldtilecfg (__config);
> >  }
> >
> >  extern __inline void
> >  __attribute__((__gnu_inline__, __always_inline__, __artificial__))
> >  _tile_storeconfig (void *__config)
> >  {
> > -  __asm__ volatile ("sttilecfg\t%X0" : "=m" (*((void **)__config)));
> > +  __builtin_ia32_sttilecfg (__config);
> >  }
> >
> >  extern __inline void
> > diff --git a/gcc/config/i386/i386-builtin.def b/gcc/config/i386/i386-builtin.def
> > index 729355230b8..88dd7f8857f 100644
> > --- a/gcc/config/i386/i386-builtin.def
> > +++ b/gcc/config/i386/i386-builtin.def
> > @@ -126,6 +126,10 @@ BDESC (OPTION_MASK_ISA_XSAVES | OPTION_MASK_ISA_64BIT, 0, CODE_FOR_nothing, "__b
> >  BDESC (OPTION_MASK_ISA_XSAVES | OPTION_MASK_ISA_64BIT, 0, 
> > CODE_FOR_nothing, "__builtin_ia32_xrstors64", IX86_BUILTIN_XRSTORS64, 
> > UNKNOWN, (int) VOID_FTYPE_PVOID_INT64)
> >  BDESC (OPTION_MASK_ISA_XSAVEC | OPTION_MASK_ISA_64BIT, 0, 
> > CODE_FOR_nothing, "__builtin_ia32_xsavec64", IX86_BUILTIN_XSAVEC64, 
> > UNKNOWN, (int) VOID_FTYPE_PVOID_INT64)
> >
> > +/* LDTILECFG and STTILECFG.  */
> > +BDESC (OPTION_MASK_ISA_64BIT, OPTION_MASK_ISA2_AMX_TILE, CODE_FOR_ldtilecfg, "__builtin_ia32_ldtilecfg", IX86_BUILTIN_LDTILECFG, UNKNOWN, (int) VOID_FTYPE_PCVOID)
> > +BDESC (OPTION_MASK_ISA_64BIT, OPTION_MASK_ISA2_AMX_TILE, CODE_FOR_ldtilecfg, "__builtin_ia32_sttilecfg", IX86_BUILTIN_STTILECFG, UNKNOWN, (int) VOID_FTYPE_PVOID)
> CODE_FOR_sttilecfg.

It is unused.  I changed both to CODE_FOR_nothing.

> > +
> >  /* SSE */
> >  BDESC (OPTION_MASK_ISA_SSE, 0, CODE_FOR_movv4sf_internal, 
> > "__builtin_ia32_storeups", IX86_BUILTIN_STOREUPS, UNKNOWN, (int) 
> > VOID_FTYPE_PFLOAT_V4SF)
> >  BDESC (OPTION_MASK_ISA_SSE, 0, CODE_FOR_sse_movntv4sf, 
> > "__builtin_ia32_movntps", IX86_BUILTIN_MOVNTPS, UNKNOWN, (int) 
> > VOID_FTYPE_PFLOAT_V4SF)
> > diff --git a/gcc/config/i386/i386-expand.cc b/gcc/config/i386/i386-expand.cc
> > index a4d3369f01b..17993eb837f 100644
> > --- a/gcc/config/i386/i386-expand.cc
> > +++ b/gcc/config/i386/i386-expand.cc
> > @@ -14152,6 +14152,25 @@ ix86_expand_builtin (tree exp, rtx target, rtx subtarget,
> > emit_insn (pat);
> >return 0;
> >
> > +case IX86_BUILTIN_LDTILECFG:
> > +case IX86_BUILTIN_STTILECFG:
> > +  arg0 = CALL_EXPR_ARG (exp, 0);
> > +  op0 = expand_normal (arg0);
> > +
> > +  if (!address_oper

[Patch, rs6000] Enable overlap memory store for block memory clear

2024-02-25 Thread HAO CHEN GUI

Hi,
  This patch enables overlapping memory stores for block memory clear,
which saves store instructions.  The expander calls
widest_fixed_size_mode_for_block_clear to get the mode for the looped
block clear and calls smallest_fixed_size_mode_for_block_clear to get
the mode for the last, overlapping clear.
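The strategy can be sketched in plain C as a planner that records (offset, width) pairs for the stores. This is a simplified model, not the rs6000 expander: the names are hypothetical, and it assumes a 64-bit target with 16-byte vector stores, 8-byte word stores and a 16-byte-aligned destination, so alignment never narrows the chosen width.

```c
#include <assert.h>
#include <stddef.h>

/* One planned store: byte offset into the block and store width.  */
struct clear_store
{
  unsigned int offset;
  unsigned int size;
};

/* Widest store that fits in SIZE bytes (assumed widths 16/8/4/2/1).  */
static unsigned int
widest_store (unsigned int size)
{
  if (size >= 16)
    return 16;
  if (size >= 8)
    return 8;
  if (size >= 4)
    return 4;
  if (size >= 2)
    return 2;
  return 1;
}

/* Narrowest store covering REM bytes: a 16-byte vector store once REM
   exceeds the 8-byte word, else the next power-of-two integer width.  */
static unsigned int
smallest_store (unsigned int rem)
{
  unsigned int width = 1;

  if (rem > 8)
    return 16;
  while (width < rem)
    width *= 2;
  return width;
}

/* Plan the stores that clear BYTES bytes into OUT (capacity CAP) and
   return how many were planned.  */
static size_t
plan_block_clear (unsigned int bytes, struct clear_store *out, size_t cap)
{
  unsigned int width = widest_store (bytes);
  unsigned int offset = 0;
  size_t n = 0;

  /* Full-width stores while at least WIDTH bytes remain.  */
  while (bytes - offset >= width)
    {
      assert (n < cap);
      out[n++] = (struct clear_store) { offset, width };
      offset += width;
    }

  /* One final store for the tail, placed so that it ends exactly at
     BYTES; it may overlap bytes that are already cleared.  */
  if (offset < bytes)
    {
      unsigned int last = smallest_store (bytes - offset);

      assert (n < cap);
      out[n++] = (struct clear_store) { bytes - last, last };
    }
  return n;
}
```

For a 23-byte clear this plans a 16-byte store at offset 0 and an 8-byte store at offset 15 that overlaps one already-cleared byte, whereas clearing the tail piecewise (4 + 2 + 1 bytes) would take four stores in total.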

Bootstrapped and tested on x86 and powerpc64-linux BE and LE with no
regressions. Is it OK for the trunk or next stage 1?

Thanks
Gui Haochen


ChangeLog
rs6000: Enable overlap memory store for block memory clear

gcc/
* config/rs6000/rs6000-string.cc
(widest_fixed_size_mode_for_block_clear): New.
(smallest_fixed_size_mode_for_block_clear): New.
(expand_block_clear): Call widest_fixed_size_mode_for_block_clear to
get the mode for looped memory stores and call
smallest_fixed_size_mode_for_block_clear to get the mode for the last
overlapped memory store.

gcc/testsuite
* gcc.target/powerpc/block-clear-1.c: New.


patch.diff
diff --git a/gcc/config/rs6000/rs6000-string.cc b/gcc/config/rs6000/rs6000-string.cc
index 133e5382af2..c2a6095a586 100644
--- a/gcc/config/rs6000/rs6000-string.cc
+++ b/gcc/config/rs6000/rs6000-string.cc
@@ -38,6 +38,49 @@
 #include "profile-count.h"
 #include "predict.h"

+/* Return the widest mode whose size is less than or equal to SIZE,
+   taking ALIGN and UNALIGNED_VSX_OK into account.  */
+static fixed_size_mode
+widest_fixed_size_mode_for_block_clear (unsigned int size, unsigned int align,
+                                        bool unaligned_vsx_ok)
+{
+  machine_mode mode;
+
+  if (TARGET_ALTIVEC
+      && size >= 16
+      && (align >= 128
+          || unaligned_vsx_ok))
+    mode = V4SImode;
+  else if (size >= 8
+           && TARGET_POWERPC64
+           && (align >= 64
+               || !STRICT_ALIGNMENT))
+    mode = DImode;
+  else if (size >= 4
+           && (align >= 32
+               || !STRICT_ALIGNMENT))
+    mode = SImode;
+  else if (size >= 2
+           && (align >= 16
+               || !STRICT_ALIGNMENT))
+    mode = HImode;
+  else
+    mode = QImode;
+
+  return as_a <fixed_size_mode> (mode);
+}
+
+/* Return the smallest mode whose size is greater than or equal to
+   SIZE.  */
+static fixed_size_mode
+smallest_fixed_size_mode_for_block_clear (unsigned int size)
+{
+  if (size > UNITS_PER_WORD)
+    return as_a <fixed_size_mode> (V4SImode);
+
+  return smallest_int_mode_for_size (size * BITS_PER_UNIT);
+}
+
 /* Expand a block clear operation, and return 1 if successful.  Return 0
if we should let the compiler generate normal code.

@@ -55,7 +98,6 @@ expand_block_clear (rtx operands[])
   HOST_WIDE_INT align;
   HOST_WIDE_INT bytes;
   int offset;
-  int clear_bytes;
   int clear_step;

   /* If this is not a fixed size move, just call memcpy */
@@ -89,62 +131,36 @@ expand_block_clear (rtx operands[])

   bool unaligned_vsx_ok = (bytes >= 32 && TARGET_EFFICIENT_UNALIGNED_VSX);

-  for (offset = 0; bytes > 0; offset += clear_bytes, bytes -= clear_bytes)
+  auto mode = widest_fixed_size_mode_for_block_clear (bytes, align,
+                                                      unaligned_vsx_ok);
+  offset = 0;
+  rtx dest;
+
+  do
 {
-  machine_mode mode = BLKmode;
-  rtx dest;
+  unsigned int size = GET_MODE_SIZE (mode);

-  if (TARGET_ALTIVEC
- && (bytes >= 16 && (align >= 128 || unaligned_vsx_ok)))
+  while (bytes >= size)
{
- clear_bytes = 16;
- mode = V4SImode;
-   }
-  else if (bytes >= 8 && TARGET_POWERPC64
-  && (align >= 64 || !STRICT_ALIGNMENT))
-   {
- clear_bytes = 8;
- mode = DImode;
- if (offset == 0 && align < 64)
-   {
- rtx addr;
+ dest = adjust_address (orig_dest, mode, offset);
+ emit_move_insn (dest, CONST0_RTX (mode));

- /* If the address form is reg+offset with offset not a
-multiple of four, reload into reg indirect form here
-rather than waiting for reload.  This way we get one
-reload, not one per store.  */
- addr = XEXP (orig_dest, 0);
- if ((GET_CODE (addr) == PLUS || GET_CODE (addr) == LO_SUM)
- && CONST_INT_P (XEXP (addr, 1))
- && (INTVAL (XEXP (addr, 1)) & 3) != 0)
-   {
- addr = copy_addr_to_reg (addr);
- orig_dest = replace_equiv_address (orig_dest, addr);
-   }
-   }
-   }
-  else if (bytes >= 4 && (align >= 32 || !STRICT_ALIGNMENT))
-   {   /* move 4 bytes */
- clear_bytes = 4;
- mode = SImode;
-   }
-  else if (bytes >= 2 && (align >= 16 || !STRICT_ALIGNMENT))
-   {   /* move 2 bytes */
- clear_bytes = 2;
- mode = HImode;
-   }
-  else /* move 1 byte at a time */
-   {
- clear_bytes = 1;
- mode = QImode;
+ offset += size;
+ bytes -= size;
}

-  dest = adjust_

Re: [PATCH] testsuite: Fix up lra effective target

2024-02-25 Thread Hans-Peter Nilsson

> Date: Fri, 16 Feb 2024 11:16:22 +0100
> From: Jakub Jelinek 

> Given the recent discussions on IRC started with Andrew P. mentioning that
> an asm goto outputs test should have { target lra } and the lra effective
> target in GCC 11/12 only returning 0 for PA and in 13/14 for PA/AVR, while
> we clearly have 14 other targets which don't support LRA and a couple of
> further ones which have an -mlra/-mno-lra switch (whatever default they
> have), seems to me the effective target is quite broken.

Definitely, good riddance to that list.

I suggested a little over a year ago to generalize
check_effective_target_lra to get rid of that flawed target
list but was effectively shut down with a review request
that'd *keep* the faulty non-lra target list. :-(
https://gcc.gnu.org/pipermail/gcc-patches/2023-February/611531.html

TL;DR: I based LRA-ness on EBB being scanned in LRA but not
for reload (same empty foo), i.e. matching the string "EBB 2
3".  I don't know which method is more stable, but that didn't
require -O2 or -fdump-rtl-reload-details.

Having said that, I'm glad there's now a generic, working
(non-target-list-dependent) effective_target lra.

brgds, H-P

Re: [PATCH] x86: Properly implement AMX-TILE load/store intrinsics

2024-02-25 Thread Hongtao Liu

On Mon, Feb 26, 2024 at 5:11 AM H.J. Lu  wrote:
>
> ldtilecfg and sttilecfg take a 512-bit (64-byte) memory block.  With
> _tile_loadconfig implemented as
>
> extern __inline void
> __attribute__((__gnu_inline__, __always_inline__, __artificial__))
> _tile_loadconfig (const void *__config)
> {
>   __asm__ volatile ("ldtilecfg\t%X0" :: "m" (*((const void **)__config)));
> }
>
> GCC sees:
>
> (parallel [
>   (asm_operands/v ("ldtilecfg   %X0") ("") 0
>[(mem/f/c:DI (plus:DI (reg/f:DI 77 virtual-stack-vars)
>  (const_int -64 [0xffc0])) [1 MEM[(const void * *)&tile_data]+0 S8 A128])]
>[(asm_input:DI ("m"))]
>(clobber (reg:CC 17 flags))])
>
> and the memory operand size is 8 bytes (DImode).  As a result, the
> remaining 56 bytes are ignored by GCC.  Implement ldtilecfg and sttilecfg
> intrinsics with a pointer to BLKmode to honor the 512-bit memory block.
>
> gcc/ChangeLog:
>
> PR target/114098
> * config/i386/amxtileintrin.h (_tile_loadconfig): Use
> __builtin_ia32_ldtilecfg.
> (_tile_storeconfig): Use __builtin_ia32_sttilecfg.
> * config/i386/i386-builtin.def (BDESC): Add
> __builtin_ia32_ldtilecfg and __builtin_ia32_sttilecfg.
> * config/i386/i386-expand.cc (ix86_expand_builtin): Handle
> IX86_BUILTIN_LDTILECFG and IX86_BUILTIN_STTILECFG.
> * config/i386/i386.md (ldtilecfg): New pattern.
> (sttilecfg): Likewise.
>
> gcc/testsuite/ChangeLog:
>
> PR target/114098
> * gcc.target/i386/amxtile-4.c: New test.
> ---
>  gcc/config/i386/amxtileintrin.h   |  4 +-
>  gcc/config/i386/i386-builtin.def  |  4 ++
>  gcc/config/i386/i386-expand.cc| 19 
>  gcc/config/i386/i386.md   | 24 ++
>  gcc/testsuite/gcc.target/i386/amxtile-4.c | 55 +++
>  5 files changed, 104 insertions(+), 2 deletions(-)
>  create mode 100644 gcc/testsuite/gcc.target/i386/amxtile-4.c
>
> diff --git a/gcc/config/i386/amxtileintrin.h b/gcc/config/i386/amxtileintrin.h
> index d1a26e0fea5..5081b326498 100644
> --- a/gcc/config/i386/amxtileintrin.h
> +++ b/gcc/config/i386/amxtileintrin.h
> @@ -39,14 +39,14 @@ extern __inline void
>  __attribute__((__gnu_inline__, __always_inline__, __artificial__))
>  _tile_loadconfig (const void *__config)
>  {
> -  __asm__ volatile ("ldtilecfg\t%X0" :: "m" (*((const void **)__config)));
> +  __builtin_ia32_ldtilecfg (__config);
>  }
>
>  extern __inline void
>  __attribute__((__gnu_inline__, __always_inline__, __artificial__))
>  _tile_storeconfig (void *__config)
>  {
> -  __asm__ volatile ("sttilecfg\t%X0" : "=m" (*((void **)__config)));
> +  __builtin_ia32_sttilecfg (__config);
>  }
>
>  extern __inline void
> diff --git a/gcc/config/i386/i386-builtin.def b/gcc/config/i386/i386-builtin.def
> index 729355230b8..88dd7f8857f 100644
> --- a/gcc/config/i386/i386-builtin.def
> +++ b/gcc/config/i386/i386-builtin.def
> @@ -126,6 +126,10 @@ BDESC (OPTION_MASK_ISA_XSAVES | OPTION_MASK_ISA_64BIT, 0, CODE_FOR_nothing, "__b
>  BDESC (OPTION_MASK_ISA_XSAVES | OPTION_MASK_ISA_64BIT, 0, CODE_FOR_nothing, 
> "__builtin_ia32_xrstors64", IX86_BUILTIN_XRSTORS64, UNKNOWN, (int) 
> VOID_FTYPE_PVOID_INT64)
>  BDESC (OPTION_MASK_ISA_XSAVEC | OPTION_MASK_ISA_64BIT, 0, CODE_FOR_nothing, 
> "__builtin_ia32_xsavec64", IX86_BUILTIN_XSAVEC64, UNKNOWN, (int) 
> VOID_FTYPE_PVOID_INT64)
>
> +/* LDTILECFG and STTILECFG.  */
> +BDESC (OPTION_MASK_ISA_64BIT, OPTION_MASK_ISA2_AMX_TILE, CODE_FOR_ldtilecfg, "__builtin_ia32_ldtilecfg", IX86_BUILTIN_LDTILECFG, UNKNOWN, (int) VOID_FTYPE_PCVOID)
> +BDESC (OPTION_MASK_ISA_64BIT, OPTION_MASK_ISA2_AMX_TILE, CODE_FOR_ldtilecfg, "__builtin_ia32_sttilecfg", IX86_BUILTIN_STTILECFG, UNKNOWN, (int) VOID_FTYPE_PVOID)
CODE_FOR_sttilecfg.
> +
>  /* SSE */
>  BDESC (OPTION_MASK_ISA_SSE, 0, CODE_FOR_movv4sf_internal, 
> "__builtin_ia32_storeups", IX86_BUILTIN_STOREUPS, UNKNOWN, (int) 
> VOID_FTYPE_PFLOAT_V4SF)
>  BDESC (OPTION_MASK_ISA_SSE, 0, CODE_FOR_sse_movntv4sf, 
> "__builtin_ia32_movntps", IX86_BUILTIN_MOVNTPS, UNKNOWN, (int) 
> VOID_FTYPE_PFLOAT_V4SF)
> diff --git a/gcc/config/i386/i386-expand.cc b/gcc/config/i386/i386-expand.cc
> index a4d3369f01b..17993eb837f 100644
> --- a/gcc/config/i386/i386-expand.cc
> +++ b/gcc/config/i386/i386-expand.cc
> @@ -14152,6 +14152,25 @@ ix86_expand_builtin (tree exp, rtx target, rtx subtarget,
> emit_insn (pat);
>return 0;
>
> +case IX86_BUILTIN_LDTILECFG:
> +case IX86_BUILTIN_STTILECFG:
> +  arg0 = CALL_EXPR_ARG (exp, 0);
> +  op0 = expand_normal (arg0);
> +
> +  if (!address_operand (op0, VOIDmode))
> +   {
> + op0 = convert_memory_address (Pmode, op0);
> + op0 = copy_addr_to_reg (op0);
> +   }
> +  op0 = gen_rtx_MEM (BLKmode, op0);
maybe we can just use XImode, and adjust the patterns with XI.
> +  if (fcode == IX86_BUILTIN_LDTILECFG)
> +   icode = CODE_FOR_ldtilecfg;
> +  else
>

Re: Ping: Re: [PATCH] libgcc: fix SEH C++ rethrow semantics [PR113337]

2024-02-25 Thread NightStrike

On Wed, Feb 7, 2024 at 4:23 AM Matteo Italia  wrote:
>
> On 06/02/24 10:17, Jonathan Yong wrote:
> > On 2/6/24 05:31, NightStrike wrote:
> >> On Mon, Feb 5, 2024, 06:53 Matteo Italia  wrote:
> >>
> >>> On 31/01/24 04:24, LIU Hao wrote:
>  On 2024-01-31 08:08, Jonathan Yong wrote:
> > On 1/24/24 15:17, Matteo Italia wrote:
> >> Ping! That's a one-line fix, and you can find all the details in the
> >> bugzilla entry. Also, I can provide executables built with the
> >> affected toolchains, demonstrating the problem and the fix.
> >>
> >> Thanks,
> >> Matteo
> >>
> >
> > I was away last week. LH, care to comment? Changes look fine to me.
> >
> 
>  The change looks good to me, too.
> 
>  I haven't tested it though. According to a similar construction around
>  'libgcc/unwind.inc:265' it should be that way.
> >>>
> >>> Hello,
> >>>
> >>> thank you for the replies, is there anything else I can do to help push
> >>> this forward?
> >>>
> >>
> >> Remember to mention the PR with the right syntax in the ChangeLog so the
> >> bot adds a comment field. I didn't see it in yours, but I might have
> >> missed it.
> >>
> >>>
> >>
> >
> > Thanks all, pushed to master branch.
>
> Thanks all :-) do you think this warrants backports? On one hand this is
> a pretty niche feature, and I am probably the first to notice the
> problem in ~12 years since that code was written, OTOH Win64/SEH was not
> super widespread for a long time, and seems like a safe enough change.

It's mostly up to you whether you want to make the patch and test it.

> Also: should I explicitly mark PR113337 as resolved? The bot added the
> reference to the commit, but the PR is still marked as "UNCONFIRMED".

Looks like Jon did that a few days ago.

Re: [PATCH v1 00/13] Add aarch64-w64-mingw32 target

2024-02-25 Thread NightStrike

On Wed, Feb 21, 2024 at 12:48 PM Evgeny Karpov
 wrote:
>
> Hello,
>
> We would like to draw your attention to the review of changes for the
> new GCC target, aarch64-w64-mingw32. The new target will be
> supported, tested, added to CI, and maintained by Linaro. This marks
> the first of three planned patch series contributing to the GCC C
> compiler's support for Windows Arm64.
>
> 1. Minimal aarch64-w64-mingw32 C implementation to cross-compile
> hello-world with libgcc for Windows Arm64 using MinGW.
> 2. Extension of the aarch64-w64-mingw32 C implementation to
> cross-compile OpenSSL, OpenBLAS, FFmpeg, and libjpeg-turbo. All
> packages successfully pass tests.
> 3. Addition of call stack support for debugging, resolution of
> optimization issues in the C compiler, and DLL export/import for the
> aarch64-w64-mingw32 target.
>
> This patch series introduces the 1st point, which involves building
> hello-world for the aarch64-w64-mingw32 target. The patch depends on
> the binutils changes for the aarch64-w64-mingw32 target that have
> already been merged.
>
> The binutils should include recent relocation fixes.
> f87eaf8ff3995a5888c6dc4996a20c770e6bcd36
> aarch64: Add new relocations and limit COFF AArch64 relocation offsets
>
> The series is structured in a way to trivially show that it should not
> affect any other targets.

To be clear, because of the refactoring, it will affect x86/x64
Windows targets.  Can you do a testsuite run before and after and see
that it doesn't get worse?  The full testsuite for all languages for
Windows isn't in great shape, but it's not awful.  Some languages,
like Rust and Fortran, have ~10 FAILs.  C and C++ have several
thousand.

In particular, there are quite a few testsuite test FAILs regarding MS
ABI that hopefully do not get worse.

Lastly, I don't think I see in the current patch series where you add
new testsuite coverage for aarch64-specific bits.  I probably missed
it, so feel free to helpfully correct me there :)  I'd be curious to
see how the tests were written to take into account target differences
(using for example the dejagnu feature procs) and other nuances.

Re: [PATCH] x86: Properly implement AMX-TILE load/store intrinsics

2024-02-25 Thread Hongyu Wang

Thanks for fixing this! Didn't notice that the pointer conversion can
cause this issue...

Would it be possible to use a 64-byte array type for the operand instead,
like

const char (*a)[64] = (const char (*)[64]) p;
__asm__ volatile ("ldtilecfg\t%X0" :: "m" (*a));

If not, for the two patterns we can use "m" instead of "jm" as APX
supports EGPR extension for AMX.
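The underlying issue is that the size of an asm "m" operand comes from the type of the lvalue passed to it. The C sketch below only compares those sizes with sizeof (which does not evaluate its operand), so nothing is dereferenced and no AMX instruction runs; the typedef and function names are hypothetical and not part of GCC's headers.

```c
#include <assert.h>
#include <stddef.h>

/* Hypothetical name for the 512-bit (64-byte) tile configuration block.  */
typedef char tilecfg_block[64];
static_assert (sizeof (tilecfg_block) * 8 == 512,
               "ldtilecfg/sttilecfg operate on 512 bits");

/* Size of the lvalue the old intrinsic passed as its "m" operand:
   *(const void **) config is a single pointer.  */
static size_t
old_operand_size (const void *config)
{
  return sizeof (*(const void **) config);
}

/* Size of the lvalue when the pointer is cast to a 64-byte array type;
   the operand then spans the whole configuration block.  */
static size_t
array_operand_size (const void *config)
{
  return sizeof (*(const tilecfg_block *) config);
}
```

On an LP64 target old_operand_size reports 8 bytes, which matches the S8 in the RTL quoted earlier in the thread, while array_operand_size reports 64.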

Re: [PATCH] Fortran: do not evaluate polymorphic functions twice in assignment [PR114012]

2024-02-25 Thread Jerry D


On 2/25/24 12:26 PM, Harald Anlauf wrote:

Dear all,

the attached simple patch fixes an issue where we evaluated
polymorphic functions twice in assignments: once for the _data
component, and once for the _vptr.  Using save_expr prevents
the double evaluation.
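The class of bug is easy to see in a C analogy: if the right-hand side is expanded once per component, the function runs once per component. The sketch below is only an illustration, not gfortran's generated code; `struct poly_value`, `make_value` and the call counter are hypothetical stand-ins for a polymorphic result with _data and _vptr components, and the temporary in assign_evaluating_once plays the role that save_expr plays in the patch.

```c
#include <assert.h>
#include <stddef.h>

/* Hypothetical stand-in for a polymorphic (class) value.  */
struct poly_value
{
  void *data;          /* like gfortran's _data component */
  const void *vptr;    /* like gfortran's _vptr component */
};

static int calls;                /* number of times make_value ran */
static int payload = 42;
static const int vtable_stub;    /* stand-in for a vtable */

static struct poly_value
make_value (void)
{
  calls++;
  return (struct poly_value) { &payload, &vtable_stub };
}

/* Buggy shape: the right-hand side is expanded once per component.  */
static void
assign_evaluating_twice (struct poly_value *lhs)
{
  assert (lhs != NULL);
  lhs->data = make_value ().data;
  lhs->vptr = make_value ().vptr;
}

/* Fixed shape: evaluate once into a temporary, then read both
   components from it.  */
static void
assign_evaluating_once (struct poly_value *lhs)
{
  struct poly_value tmp = make_value ();

  assert (lhs != NULL);
  lhs->data = tmp.data;
  lhs->vptr = tmp.vptr;
}
```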

Regtested on x86_64-pc-linux-gnu.  OK for mainline?
And a backport to 13-branch after some delay?

Thanks,
Harald



Yes, simple enough. OK.

Thanks,

Jerry

[committed] d: Merge dmd, druntime ceff48bf7d, phobos dcbfbd43a

2024-02-25 Thread Iain Buclaw

Hi,

This patch merges the D front-end and runtime library with upstream dmd
ceff48bf7d, and the standard library with phobos dcbfbd43a.

D front-end changes:

-   Import latest fixes from dmd v2.107.1-rc.1.

D runtime changes:

-   Import latest fixes from druntime v2.107.1-rc.1.

Phobos changes:

-   Import latest fixes from phobos v2.107.1-rc.1.

Bootstrapped and regression tested on x86_64-linux-gnu/-m32, committed
to mainline.

Regards,
Iain.

---
gcc/d/ChangeLog:

* dmd/MERGE: Merge upstream dmd ceff48bf7d.

libphobos/ChangeLog:

* libdruntime/MERGE: Merge upstream druntime ceff48bf7d.
* libdruntime/Makefile.am (DRUNTIME_DSOURCES_FREEBSD): Add
core/sys/freebsd/net/if_.d.
* libdruntime/Makefile.in: Regenerate.
* src/MERGE: Merge upstream phobos dcbfbd43a.
---
 gcc/d/dmd/MERGE   |   2 +-
 gcc/d/dmd/arrayop.d   |   2 +-
 gcc/d/dmd/ast_node.h  |   2 +-
 gcc/d/dmd/common/file.d   |  89 ++--
 gcc/d/dmd/common/smallbuffer.d|  30 +-
 gcc/d/dmd/cparse.d| 150 +-
 gcc/d/dmd/dimport.d   | 109 +---
 gcc/d/dmd/dmodule.d   |  32 +-
 gcc/d/dmd/dsymbolsem.d|  97 
 gcc/d/dmd/expression.d|   4 +-
 gcc/d/dmd/expression.h|   2 +-
 gcc/d/dmd/expressionsem.d |  97 
 gcc/d/dmd/func.d  | 394 +-
 gcc/d/dmd/funcsem.d   | 390 ++
 gcc/d/dmd/identifier.h|   2 +-
 gcc/d/dmd/importc.d   |   7 +-
 gcc/d/dmd/mtype.d |   1 -
 gcc/d/dmd/parse.d |  48 +-
 gcc/d/dmd/root/array.h|   3 +-
 gcc/d/dmd/root/bitarray.h |   1 -
 gcc/d/dmd/{root/object.h => rootobject.h} |   6 +-
 gcc/d/dmd/statementsem.d  |   2 +-
 gcc/d/dmd/staticcond.d| 107 
 gcc/d/dmd/template.h  |   2 +-
 .../gdc.test/compilable/imports/defines.c |  25 +
 .../gdc.test/compilable/testdefines.d |  10 +
 .../gdc.test/fail_compilation/warn13679.d |   4 +-
 libphobos/libdruntime/MERGE   |   2 +-
 libphobos/libdruntime/Makefile.am |  22 +-
 libphobos/libdruntime/Makefile.in |  31 +-
 .../libdruntime/core/sys/freebsd/ifaddrs.d|   3 +-
 .../libdruntime/core/sys/freebsd/net/if_.d| 493 ++
 .../libdruntime/core/sys/linux/sys/socket.d   |   1 -
 libphobos/libdruntime/core/thread/fiber.d |   2 +-
 libphobos/src/MERGE   |   2 +-
 libphobos/src/std/typecons.d  |  35 +-
 36 files changed, 1417 insertions(+), 792 deletions(-)
 rename gcc/d/dmd/{root/object.h => rootobject.h} (91%)
 create mode 100644 libphobos/libdruntime/core/sys/freebsd/net/if_.d

diff --git a/gcc/d/dmd/MERGE b/gcc/d/dmd/MERGE
index 021149aabc7..f11c5fbfb0b 100644
--- a/gcc/d/dmd/MERGE
+++ b/gcc/d/dmd/MERGE
@@ -1,4 +1,4 @@
-9471b25db9ed44d71e0e27956430c0c6a09c16db
+ceff48bf7db05503117f54fdc0cefcb89b711136
 
 The first line of this file holds the git revision number of the last
 merge done from the dlang/dmd repository.
diff --git a/gcc/d/dmd/arrayop.d b/gcc/d/dmd/arrayop.d
index afe6054f4aa..af3875ea6c5 100644
--- a/gcc/d/dmd/arrayop.d
+++ b/gcc/d/dmd/arrayop.d
@@ -22,7 +22,7 @@ import dmd.dsymbol;
 import dmd.errors;
 import dmd.expression;
 import dmd.expressionsem;
-import dmd.func;
+import dmd.funcsem;
 import dmd.hdrgen;
 import dmd.id;
 import dmd.identifier;
diff --git a/gcc/d/dmd/ast_node.h b/gcc/d/dmd/ast_node.h
index a24218a86d0..db8608e7cdd 100644
--- a/gcc/d/dmd/ast_node.h
+++ b/gcc/d/dmd/ast_node.h
@@ -10,7 +10,7 @@
 
 #pragma once
 
-#include "root/object.h"
+#include "rootobject.h"
 
 class Visitor;
 
diff --git a/gcc/d/dmd/common/file.d b/gcc/d/dmd/common/file.d
index 8a284241fc2..80677f66ff8 100644
--- a/gcc/d/dmd/common/file.d
+++ b/gcc/d/dmd/common/file.d
@@ -16,24 +16,37 @@ module dmd.common.file;
 
 import core.stdc.errno : errno;
 import core.stdc.stdio : fprintf, remove, rename, stderr;
-import core.stdc.stdlib : exit;
-import core.stdc.string : strerror, strlen;
-import core.sys.windows.winbase;
-import core.sys.windows.winnt;
-import core.sys.posix.fcntl;
-import core.sys.posix.unistd;
+import core.stdc.stdlib;
+import core.stdc.string : strerror, strlen, memcpy;
 
 import dmd.common.smallbuffer;
 
-nothrow:
-
 version (Windows)
 {
+import core.sys.windows.winbase;
 import core.sys.windows.winnls : CP_ACP;
+import core.sys.windows.winnt;
+
+enum CodePage = CP_ACP; // assume filenames encoded in system default Windows ANSI code page
+enum invalidHandle = INVALID_HANDLE_VALUE;
+}
+else version (Posix)
+{
+

Re: [patch, libgfortran] PR105456 Child I/O does not propage iostat

2024-02-25 Thread Jerry D


On 2/25/24 12:34 PM, Harald Anlauf wrote:
> Hi Jerry,
>
> On 2/22/24 20:11, Jerry D wrote:
>> Hi all,
>>
>> The attached fix adds a check for an error condition from a UDDTIO
>> procedure in the case where there is no actual underlying error, but the
>> user defines an error by setting the iostat variable manually before
>> returning to the parent READ.
>
> the libgfortran fix LGTM.
>
> Regarding the testcase code, the following looks like you left some
> debugging code in it:
>
> +  rewind (10)
> +  read (10,*) x
> +  print *, myerror, mymessage
> +  write (*,'(10(A))') "Read: '",x%ch,"'"

--- snip ---

I cleaned up the test case. Thanks for the review.

The master branch has been updated by Jerry DeLisle :

https://gcc.gnu.org/g:3f58f96a4e8255e222953f9856bcd6c25f7b33cd

Regards,

Jerry

Re: [PATCH v1 02/13] aarch64: The aarch64-w64-mingw32 target implements

2024-02-25 Thread Mark Harmstone


On 23/2/24 17:54, Andrew Pinski wrote:
> There is an arm64ec ABI defined for aarch64 windows which is a different
> ABI from the standard windows aarch64 ABI, though I am not sure if it is
> supported with the patches here.
> It is documented at
> https://learn.microsoft.com/en-us/cpp/build/arm64ec-windows-abi-conventions?view=msvc-170


ARM64EC would also need a lot of work in binutils, and AFAIK no-one's been 
working on that yet.

Mark

[PATCH v2] x86: Check interrupt instead of noreturn attribute

2024-02-25 Thread H.J. Lu

ix86_set_func_type checks the noreturn attribute to avoid an incompatible
attribute error in LTO1 on interrupt functions.  Since TREE_THIS_VOLATILE
is also set for _Noreturn functions that lack the noreturn attribute,
check the interrupt attribute for interrupt functions instead.
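To make the behavioral difference concrete, here is a small C model of the old and new predicate. It is a sketch only: `struct fn_flags`, `no_callee_saved_old` and `no_callee_saved_new` are hypothetical stand-ins for the tree queries in `ix86_set_func_type`, and the remaining conditions (`optimize`, `!optimize_debug`, and the `TREE_NOTHROW`/`flag_exceptions` check) are folded into one `other_conditions` flag.

```c
#include <assert.h>
#include <stdbool.h>

/* Hypothetical summary of what ix86_set_func_type looks at on FNDECL.  */
struct fn_flags
{
  bool this_volatile;    /* TREE_THIS_VOLATILE: set for noreturn functions */
  bool noreturn_attr;    /* "noreturn" present in DECL_ATTRIBUTES */
  bool interrupt_attr;   /* "interrupt" present in TYPE_ATTRIBUTES */
  bool other_conditions; /* optimize && !optimize_debug && nothrow check */
};

/* Before the patch: an explicit noreturn attribute is required.  */
static bool
no_callee_saved_old (struct fn_flags f)
{
  return f.this_volatile && f.noreturn_attr && f.other_conditions;
}

/* After the patch: any volatile (noreturn) function qualifies unless
   it is an interrupt handler.  */
static bool
no_callee_saved_new (struct fn_flags f)
{
  return f.this_volatile && !f.interrupt_attr && f.other_conditions;
}
```

In this model, a `_Noreturn` function without `__attribute__ ((noreturn))`, like `no_return_to_caller` in the new test, has `this_volatile` set but `noreturn_attr` clear: the old predicate rejects it and the new one accepts it. An interrupt function that local-pure-const marked volatile is rejected by both.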

gcc/

PR target/114097
* config/i386/i386-options.cc (ix86_set_func_type): Check
interrupt instead of noreturn attribute.

gcc/testsuite/

PR target/114097
* gcc.target/i386/pr114097-1.c: New test.
---
 gcc/config/i386/i386-options.cc|  8 ---
 gcc/testsuite/gcc.target/i386/pr114097-1.c | 26 ++
 2 files changed, 31 insertions(+), 3 deletions(-)
 create mode 100644 gcc/testsuite/gcc.target/i386/pr114097-1.c

diff --git a/gcc/config/i386/i386-options.cc b/gcc/config/i386/i386-options.cc
index 93a01146db7..1301f6b913e 100644
--- a/gcc/config/i386/i386-options.cc
+++ b/gcc/config/i386/i386-options.cc
@@ -3391,11 +3391,13 @@ ix86_set_func_type (tree fndecl)
  into a noreturn function by setting TREE_THIS_VOLATILE.  Normally
  the local-pure-const pass is run after ix86_set_func_type is called.
  When the local-pure-const pass is enabled for LTO, the interrupt
- function is marked as noreturn in the IR output, which leads the
- incompatible attribute error in LTO1.  */
+ function is marked with TREE_THIS_VOLATILE in the IR output, which
+ leads to the incompatible attribute error in LTO1.  Ignore the
+ interrupt function in this case.  */
   bool has_no_callee_saved_registers
 = ((TREE_THIS_VOLATILE (fndecl)
-   && lookup_attribute ("noreturn", DECL_ATTRIBUTES (fndecl))
+   && !lookup_attribute ("interrupt",
+ TYPE_ATTRIBUTES (TREE_TYPE (fndecl)))
&& optimize
&& !optimize_debug
&& (TREE_NOTHROW (fndecl) || !flag_exceptions))
diff --git a/gcc/testsuite/gcc.target/i386/pr114097-1.c b/gcc/testsuite/gcc.target/i386/pr114097-1.c
new file mode 100644
index 000..b14c7b6214d
--- /dev/null
+++ b/gcc/testsuite/gcc.target/i386/pr114097-1.c
@@ -0,0 +1,26 @@
+/* { dg-do compile } */
+/* { dg-options "-O2 -mtune-ctrl=^prologue_using_move,^epilogue_using_move -fomit-frame-pointer" } */
+
+#define ARRAY_SIZE 256
+
+extern int array[ARRAY_SIZE][ARRAY_SIZE][ARRAY_SIZE];
+extern int value (int, int, int)
+#ifndef __x86_64__
+__attribute__ ((regparm(3)))
+#endif
+;
+
+void
+_Noreturn
+no_return_to_caller (void)
+{
+  unsigned i, j, k;
+  for (i = ARRAY_SIZE; i > 0; --i)
+    for (j = ARRAY_SIZE; j > 0; --j)
+      for (k = ARRAY_SIZE; k > 0; --k)
+        array[i - 1][j - 1][k - 1] = value (i, j, k);
+  while (1);
+}
+
+/* { dg-final { scan-assembler-not "push" } } */
+/* { dg-final { scan-assembler-not "pop" } } */
-- 
2.43.2

Re: [PATCH v1 05/13] Reuse MinGW from i386 for AArch64

2024-02-25 Thread Mark Harmstone


On 22/2/24 11:11, Richard Earnshaw (lists) wrote:
>> Most of the free world has left COFF behind since several decades, so I won't
>> comment on that. YMMV.
>
> This isn't helpful.  Windows platforms use (a derivative of) COFF, so that's
> what the tools need to use when targeting that platform.


Also, there are relocation types needed for Windows programs that are supported 
in COFF but not in ELF object files.

Mark

[PATCH] x86: Properly implement AMX-TILE load/store intrinsics

2024-02-25 Thread H.J. Lu

ldtilecfg and sttilecfg take a 512-bit (64-byte) memory block.  With
_tile_loadconfig implemented as

extern __inline void
__attribute__((__gnu_inline__, __always_inline__, __artificial__))
_tile_loadconfig (const void *__config)
{
  __asm__ volatile ("ldtilecfg\t%X0" :: "m" (*((const void **)__config)));
}

GCC sees:

(parallel [
  (asm_operands/v ("ldtilecfg   %X0") ("") 0
   [(mem/f/c:DI (plus:DI (reg/f:DI 77 virtual-stack-vars)
         (const_int -64 [0xffc0])) [1 MEM[(const void * *)&tile_data]+0 S8 A128])]
   [(asm_input:DI ("m"))]
   (clobber (reg:CC 17 flags))])

and the memory operand is only 8 bytes (DImode, S8: the size of the
pointer the operand is cast to).  As a result, the remaining 56 bytes
are ignored by GCC.  Implement the ldtilecfg and sttilecfg intrinsics
with a pointer to BLKmode to honor the 64-byte memory block.
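The size mismatch can be checked in plain C. The struct below is an assumption for illustration: it follows the commonly documented palette-1 layout of the 64-byte tile configuration, and `tile_config` and `old_operand_size` are hypothetical names, not part of amxtileintrin.h.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* Hypothetical layout of the 64-byte (512-bit) tile configuration
   read by ldtilecfg and written by sttilecfg (palette 1).  */
struct tile_config
{
  uint8_t palette_id;
  uint8_t start_row;
  uint8_t reserved[14];
  uint16_t colsb[16]; /* bytes per row of each tile */
  uint8_t rows[16];   /* number of rows of each tile */
};

static_assert (sizeof (struct tile_config) == 64,
               "tile configuration must be 64 bytes");

/* Size of the object named by the old asm operand
   *((const void **) config): one pointer, not the whole block.
   sizeof does not evaluate its operand, so nothing is dereferenced.  */
static size_t
old_operand_size (const void *config)
{
  return sizeof (*((const void **) config));
}
```

On x86-64, `old_operand_size` yields 8, which matches the `S8` in the RTL dump, while the configuration block is 64 bytes; describing the operand as a BLKmode MEM is what lets GCC see the whole block.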

gcc/ChangeLog:

PR target/114098
* config/i386/amxtileintrin.h (_tile_loadconfig): Use
__builtin_ia32_ldtilecfg.
(_tile_storeconfig): Use __builtin_ia32_sttilecfg.
* config/i386/i386-builtin.def (BDESC): Add
__builtin_ia32_ldtilecfg and __builtin_ia32_sttilecfg.
* config/i386/i386-expand.cc (ix86_expand_builtin): Handle
IX86_BUILTIN_LDTILECFG and IX86_BUILTIN_STTILECFG.
* config/i386/i386.md (ldtilecfg): New pattern.
(sttilecfg): Likewise.

gcc/testsuite/ChangeLog:

PR target/114098
* gcc.target/i386/amxtile-4.c: New test.
---
 gcc/config/i386/amxtileintrin.h   |  4 +-
 gcc/config/i386/i386-builtin.def  |  4 ++
 gcc/config/i386/i386-expand.cc| 19 
 gcc/config/i386/i386.md   | 24 ++
 gcc/testsuite/gcc.target/i386/amxtile-4.c | 55 +++
 5 files changed, 104 insertions(+), 2 deletions(-)
 create mode 100644 gcc/testsuite/gcc.target/i386/amxtile-4.c

diff --git a/gcc/config/i386/amxtileintrin.h b/gcc/config/i386/amxtileintrin.h
index d1a26e0fea5..5081b326498 100644
--- a/gcc/config/i386/amxtileintrin.h
+++ b/gcc/config/i386/amxtileintrin.h
@@ -39,14 +39,14 @@ extern __inline void
 __attribute__((__gnu_inline__, __always_inline__, __artificial__))
 _tile_loadconfig (const void *__config)
 {
-  __asm__ volatile ("ldtilecfg\t%X0" :: "m" (*((const void **)__config)));
+  __builtin_ia32_ldtilecfg (__config);
 }
 
 extern __inline void
 __attribute__((__gnu_inline__, __always_inline__, __artificial__))
 _tile_storeconfig (void *__config)
 {
-  __asm__ volatile ("sttilecfg\t%X0" : "=m" (*((void **)__config)));
+  __builtin_ia32_sttilecfg (__config);
 }
 
 extern __inline void
diff --git a/gcc/config/i386/i386-builtin.def b/gcc/config/i386/i386-builtin.def
index 729355230b8..88dd7f8857f 100644
--- a/gcc/config/i386/i386-builtin.def
+++ b/gcc/config/i386/i386-builtin.def
@@ -126,6 +126,10 @@ BDESC (OPTION_MASK_ISA_XSAVES | OPTION_MASK_ISA_64BIT, 0, CODE_FOR_nothing, "__b
 BDESC (OPTION_MASK_ISA_XSAVES | OPTION_MASK_ISA_64BIT, 0, CODE_FOR_nothing, "__builtin_ia32_xrstors64", IX86_BUILTIN_XRSTORS64, UNKNOWN, (int) VOID_FTYPE_PVOID_INT64)
 BDESC (OPTION_MASK_ISA_XSAVEC | OPTION_MASK_ISA_64BIT, 0, CODE_FOR_nothing, "__builtin_ia32_xsavec64", IX86_BUILTIN_XSAVEC64, UNKNOWN, (int) VOID_FTYPE_PVOID_INT64)
 
+/* LDTILECFG and STTILECFG.  */
+BDESC (OPTION_MASK_ISA_64BIT, OPTION_MASK_ISA2_AMX_TILE, CODE_FOR_ldtilecfg, "__builtin_ia32_ldtilecfg", IX86_BUILTIN_LDTILECFG, UNKNOWN, (int) VOID_FTYPE_PCVOID)
+BDESC (OPTION_MASK_ISA_64BIT, OPTION_MASK_ISA2_AMX_TILE, CODE_FOR_sttilecfg, "__builtin_ia32_sttilecfg", IX86_BUILTIN_STTILECFG, UNKNOWN, (int) VOID_FTYPE_PVOID)
+
 /* SSE */
 BDESC (OPTION_MASK_ISA_SSE, 0, CODE_FOR_movv4sf_internal, "__builtin_ia32_storeups", IX86_BUILTIN_STOREUPS, UNKNOWN, (int) VOID_FTYPE_PFLOAT_V4SF)
 BDESC (OPTION_MASK_ISA_SSE, 0, CODE_FOR_sse_movntv4sf, "__builtin_ia32_movntps", IX86_BUILTIN_MOVNTPS, UNKNOWN, (int) VOID_FTYPE_PFLOAT_V4SF)
diff --git a/gcc/config/i386/i386-expand.cc b/gcc/config/i386/i386-expand.cc
index a4d3369f01b..17993eb837f 100644
--- a/gcc/config/i386/i386-expand.cc
+++ b/gcc/config/i386/i386-expand.cc
@@ -14152,6 +14152,25 @@ ix86_expand_builtin (tree exp, rtx target, rtx subtarget,
emit_insn (pat);
   return 0;
 
+case IX86_BUILTIN_LDTILECFG:
+case IX86_BUILTIN_STTILECFG:
+  arg0 = CALL_EXPR_ARG (exp, 0);
+  op0 = expand_normal (arg0);
+
+  if (!address_operand (op0, VOIDmode))
+   {
+ op0 = convert_memory_address (Pmode, op0);
+ op0 = copy_addr_to_reg (op0);
+   }
+  op0 = gen_rtx_MEM (BLKmode, op0);
+  if (fcode == IX86_BUILTIN_LDTILECFG)
+   icode = CODE_FOR_ldtilecfg;
+  else
+   icode = CODE_FOR_sttilecfg;
+  pat = GEN_FCN (icode) (op0);
+  emit_insn (pat);
+  return 0;
+
 case IX86_BUILTIN_LLWPCB:
   arg0 = CALL_EXPR_ARG (exp, 0);
   op0 = expand_normal (arg0);
diff --git a/gcc/config/i386/i386.md b/gcc/config/i386/i386.md
index 6a26d966a0e..0ede6adac2f 100644
--- a/gcc/config/i386/i386.md
+++ b/gcc/confi

[wwwdocs] Add Ada's GCC 14 changelog entry

2024-02-25 Thread Fernando Oleo Blanco

Dear all,

Just like last year, I would like to commit the changes that took place
over at GNAT for GCC v14. The patch is attached to the email. Hopefully
it is good enough to be added to master as is. If you see something wrong
or if you would like to add anything to it, feel free :) Feedback is
always welcome.

Best regards,
Fer

From 0ae94649be7f638bb4f98ba3e2ba2e1bf9770c09 Mon Sep 17 00:00:00 2001
From: Fernando Oleo Blanco 
Date: Sun, 25 Feb 2024 21:43:43 +0100
Subject: [PATCH 1/1] Add Ada changes for v14

---
 htdocs/gcc-14/changes.html | 44 +-
 1 file changed, 43 insertions(+), 1 deletion(-)

diff --git a/htdocs/gcc-14/changes.html b/htdocs/gcc-14/changes.html
index 85ccc54d..e6c96c9f 100644
--- a/htdocs/gcc-14/changes.html
+++ b/htdocs/gcc-14/changes.html
@@ -171,7 +171,49 @@ a work-in-progress.
 
 New Languages and Language specific improvements
 
-
+Ada
+
+
+  Several new aspects and contracts have been implemented:
+
+  Exceptional_Cases may be specified for procedures and
+  functions with side effects; it can be used to list exceptions that might
+  be propagated by the subprogram with side effects in the context of its
+  precondition, and associate them with a specific postcondition. For more
+  information, refer to the SPARK 2014 Reference Manual, section 6.1.9.
+  User_Aspect takes an argument that is the name of an
+  aspect defined by a User_Aspect_Definition configuration pragma.
+  Local_Restrictions is used to specify that a particular
+  subprogram does not violate one or more local restrictions, nor can it
+  call a subprogram that is not subject to the same requirements.
+  Side_Effects is equivalent to pragma
+  Side_Effects.
+  Always_Terminates is a boolean equivalent to pragma
+  Always_Terminates.
+  Ghost_Predicate
+
+  
+  The new attributes and contracts have been applied to the relevant parts
+  of the Ada library and more code has been proven to be correct.
+  Initial support for the
+  <a href="https://www.cl.cam.ac.uk/research/security/ctsrd/cheri/">CHERI</a>
+  architecture.
+  Support for the LoongArch architecture.
+  Hardening improvements:
+
+  Use of the new -fharden* options. Most
+  notably -fharden-compares,
+  -fharden-conditional-branches and
+  -fharden-control-flow-redundancy.
+  Custom bools with higher Hamming distance.
+  The strub attribute has been added for functions and
+  variables in order to automatically zero out their stack upon use or
+  return.
+
+  
+  Further cleanups and improvements to the GNAT code.
+  Support for VxWorks 7 Cert RTP has been removed.
+
 
 
 
-- 
2.43.2

Re: [PATCH] Fortran - Error compiling PDT Type-bound Procedures [PR82943/86148/86268]

2024-02-25 Thread Alexander Westbrooks

Harald,

Thank you for reviewing my code. I've been doing research and debugging to
investigate the error thrown by Intel and NAG for the deferred parameter in
the dummy variable declaration. I found where the problem was and added the
fix as part of my patch. I've attached the patch as a file, which also
includes your feedback and suggested fixes. I've updated the test case
pdt_37.f03 to check for the POINTER or ALLOCATABLE error as you suggested.

All regression tests pass, including the new ones, after including the fix
for the POINTER or ALLOCATABLE error for CLASS declarations of PDTs when
deferred length parameters are used. This was tested on WSL 2 with the
Ubuntu 20.04 distribution.

Is this okay to push to the trunk?

Thanks,

Alexander Westbrooks


On Sun, Feb 11, 2024 at 2:11 PM Harald Anlauf  wrote:

> Hi Alex,
>
> I've been unable to apply your patch to my local trunk, likely due to
> whitespace issues my newsreader handles differently from your site.
> I see it inline instead of attached.
>
> A few general remarks:
>
> Please follow the general recommendation regarding style if possible,
> see https://www.gnu.org/prep/standards/standards.html#Formatting
> regarding formatting/whitespace use (5.1) and comments (5.2)
>
> Also, when an error message text spans multiple lines, please place the
> whitespace at the end of a line, not at the beginning of the new one:
>
> > +  if ( resolve_bindings_derived->attr.pdt_template &&
> > +   !gfc_pdt_is_instance_of(resolve_bindings_derived,
> > +   CLASS_DATA(me_arg)->ts.u.derived))
> > +{
> > +  gfc_error ("Argument %qs of %qs with PASS(%s) at %L must be of"
> > +" the parametric derived-type %qs", me_arg->name, proc->name,
>
>gfc_error ("Argument %qs of %qs with PASS(%s) at %L must be of "
>   "the parametric derived-type %qs", me_arg->name, proc->name,
>
> > +me_arg->name, &where, resolve_bindings_derived->name);
> > +  goto error;
> > +}
>
> The following change is almost unreadable: the lengthy comment is split
> over three parts and almost hides the code.  Couldn't this be combined
> into one comment before the function?
>
> > diff --git a/gcc/fortran/symbol.cc b/gcc/fortran/symbol.cc
> > index fddf68f8398..11f4bac0415 100644
> > --- a/gcc/fortran/symbol.cc
> > +++ b/gcc/fortran/symbol.cc
> > @@ -5172,6 +5172,35 @@ gfc_type_is_extension_of (gfc_symbol *t1, gfc_symbol *t2)
> > return gfc_compare_derived_types (t1, t2);
> >   }
> >
> > +/* Check if a parameterized derived type t2 is an instance of a PDT template t1 */
> > +
> > +bool
> > +gfc_pdt_is_instance_of(gfc_symbol *t1, gfc_symbol *t2)
> > +{
> > +  if ( !t1->attr.pdt_template || !t2->attr.pdt_type )
> > +return false;
> > +
> > +  /*
> > +in decl.cc, gfc_get_pdt_instance, a pdt instance is given a 3 character prefix "Pdt", followed
> > +by an underscore list of the kind parameters, up to a maximum of 8.
> > +
> > +So to check if a PDT Type corresponds to the template, extract the core derive_type name,
> > +and then see if it is type compatible by name...
> > +
> > +For example:
> > +
> > +Pdtf_2_2 -> extract out the 'f' -> see if the derived type 'f' is compatible with symbol t1
> > +  */
> > +
> > +  // Starting at index 3 of the string in order to skip past the 'Pdt' prefix
> > +  // Also, here the length of the template name is used in order to avoid the
> > +  // kind parameter suffixes that are placed at the end of PDT instance names.
> > +  if ( !(strncmp(&(t2->name[3]), t1->name, strlen(t1->name)) == 0) )
> > +return false;
> > +
> > +  return true;
> > +}
> > +
> >
> >   /* Check if two typespecs are type compatible (F03:5.1.1.2):
> >  If ts1 is nonpolymorphic, ts2 must be the same type.
>
> The following testcase tests for errors.  I tried Intel and NAG on it
> after commenting the 'contains' section of the type declaration.
> Both complained about subroutine deferred_len_param, e.g.
>
> Intel:
> A colon may only be used as a type parameter value in the declaration of
> an object that has the POINTER or ALLOCATABLE attribute.   [THIS]
>  class(param_deriv_type(:)), intent(inout) :: this
>
> NAG:
> Entity THIS of type PARAM_DERIV_TYPE(A=:) has a deferred length type
> parameter but is not a data pointer or allocatable
>
> Do we detect this after your patch?  If the answer is yes,
> can we add another subroutine where we check for this error?
> (the dg-error suggests we only expect assumed len type parameters.)
> If no, maybe add a comment in the testcase that this subroutine
> may need updating later.
>
> > diff --git a/gcc/testsuite/gfortran.dg/pdt_37.f03 b/gcc/testsuite/gfortran.dg/pdt_37.f03
> > new file mode 100644
> > index 000..68d376fad25
> > --- /dev/null
> > +++ b/gcc/testsuite/gfortran.dg/pdt_37.f03
> > @@ -0,0 +1,34 @@
> > +! { dg-do compile }
> > +!
> > +! Tests the fixes for PR82943.
> > +!
> > +! This
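The quoted `gfc_pdt_is_instance_of` compares only a prefix of the instance name, so under the naming scheme its comment describes ("Pdt" + template name + `_<kind>` suffixes), a template named `f` would also match an instance such as `Pdtfoo_2`. The sketch below is one possible tightening, not code from the patch: `pdt_name_matches` is a hypothetical helper working on plain strings, and it additionally requires the template name to be followed by `_` or the end of the string. It still cannot tell a template `f` from one named `f_x`, because `_` is also the kind separator.

```c
#include <assert.h>
#include <stdbool.h>
#include <string.h>

/* Hypothetical helper modelled on the quoted gfc_pdt_is_instance_of.
   Instance names are assumed to be "Pdt" + template name, optionally
   followed by "_<kind>" suffixes, e.g. "Pdtf_2_2".  */
static bool
pdt_name_matches (const char *instance, const char *tmpl)
{
  size_t len = strlen (tmpl);

  if (strncmp (instance, "Pdt", 3) != 0)
    return false;
  if (strncmp (instance + 3, tmpl, len) != 0)
    return false;

  /* Require the template name to end here, so that "f" does not
     match the instance of a template called "foo".  */
  char next = instance[3 + len];
  return next == '\0' || next == '_';
}
```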

Re: [patch, libgfortran] PR105456 Child I/O does not propage iostat

2024-02-25 Thread Harald Anlauf


Hi Jerry,

On 2/22/24 20:11, Jerry D wrote:

> Hi all,
>
> The attached fix adds a check for an error condition from a UDDTIO
> procedure in the case where there is no actual underlying error, but the
> user defines an error by setting the iostat variable manually before
> returning to the parent READ.


the libgfortran fix LGTM.

Regarding the testcase code, the following looks like you left some
debugging code in it:

+  rewind (10)
+  read (10,*) x
+  print *, myerror, mymessage
+  write (*,'(10(A))') "Read: '",x%ch,"'"

myerror and mymessage are never set and never tested.

I suggest to either remove them or to enhance the testcase e.g. like

  rewind (10)
  read (10,*,iostat=myerror,iomsg=mymessage) x
  if (myerror /= 42 .or. mymessage /= "The users message") stop 1
  rewind (10)
  read (10,*) x
  write (*,'(10(A))') "Read: '",x%ch,"'"

I'll leave that up to you.


> I did not address the case of a formatted WRITE or unformatted
> READ/WRITE until I get some feedback on the approach. If this approach
> is OK I would like to commit and then do a separate patch for the cases
> I just mentioned.


I haven't thought about this long enough, but I do not see anything wrong
with your patch.


> Feedback appreciated.  Regression tested on x86_64. OK for trunk?


This is OK.

Thanks,
Harald


Jerry

Author: Jerry DeLisle 
Date:   Thu Feb 22 10:48:39 2024 -0800

     libgfortran: Propagate user defined iostat and iomsg.

     PR libfortran/105456

     libgfortran/ChangeLog:

     * io/list_read.c (list_formatted_read_scalar): Add checks
     for the case where a user defines their own error codes
     and error messages and generate the runtime error.

     gcc/testsuite/ChangeLog:

     * gfortran.dg/pr105456.f90: New test.
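The rule this thread describes for `list_formatted_read_scalar` can be summarized with a small model. Everything here is hypothetical (the structs, their fields and `propagate_child_error` are not libgfortran's); it only illustrates the behavior discussed above: when the UDDTIO child procedure sets iostat itself and the library saw no underlying error, the parent READ reports the user's iostat and iomsg. The values 42 and "The users message" are taken from the test suggested in the review.

```c
#include <stdio.h>
#include <string.h>

/* Hypothetical state a parent READ sees after a user-defined
   derived-type I/O child procedure returns.  */
struct child_io_result
{
  int underlying_error; /* nonzero if the library itself hit an error */
  int user_iostat;      /* value the child left in its iostat argument */
  char user_iomsg[64];  /* value the child left in its iomsg argument */
};

/* Hypothetical status reported by the parent statement.  */
struct parent_status
{
  int iostat;
  char iomsg[64];
};

/* Propagate a user-defined error to the parent when the library did
   not detect an error of its own.  Returns 1 if an error was
   propagated, 0 otherwise (library errors are handled elsewhere).  */
static int
propagate_child_error (const struct child_io_result *child,
                       struct parent_status *parent)
{
  if (child->underlying_error != 0 || child->user_iostat == 0)
    return 0;
  parent->iostat = child->user_iostat;
  snprintf (parent->iomsg, sizeof parent->iomsg, "%s", child->user_iomsg);
  return 1;
}
```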

[PATCH] Fortran: do not evaluate polymorphic functions twice in assignment [PR114012]

2024-02-25 Thread Harald Anlauf

Dear all,

the attached simple patch fixes an issue where we evaluated
polymorphic functions twice in assignments: once for the _data
component, and once for the _vptr.  Using save_expr prevents
the double evaluation.

Regtested on x86_64-pc-linux-gnu.  OK for mainline?
And a backport to 13-branch after some delay?

Thanks,
Harald

From 7a16143448ee21b716b54a94f83f9ee477af1b63 Mon Sep 17 00:00:00 2001
From: Harald Anlauf 
Date: Sun, 25 Feb 2024 21:18:23 +0100
Subject: [PATCH] Fortran: do not evaluate polymorphic functions twice in
 assignment [PR114012]

	PR fortran/114012

gcc/fortran/ChangeLog:

	* trans-expr.cc (gfc_conv_procedure_call): Evaluate non-trivial
	arguments just once before assigning to an unlimited polymorphic
	dummy variable.

gcc/testsuite/ChangeLog:

	* gfortran.dg/pr114012.f90: New test.
---
 gcc/fortran/trans-expr.cc  |  4 ++
 gcc/testsuite/gfortran.dg/pr114012.f90 | 81 ++
 2 files changed, 85 insertions(+)
 create mode 100644 gcc/testsuite/gfortran.dg/pr114012.f90

diff --git a/gcc/fortran/trans-expr.cc b/gcc/fortran/trans-expr.cc
index 118dfd7c9b2..d63c304661a 100644
--- a/gcc/fortran/trans-expr.cc
+++ b/gcc/fortran/trans-expr.cc
@@ -6691,6 +6691,10 @@ gfc_conv_procedure_call (gfc_se * se, gfc_symbol * sym,
 			{
 			  tree efield;

+			  /* Evaluate arguments just once.  */
+			  if (e->expr_type != EXPR_VARIABLE)
+parmse.expr = save_expr (parmse.expr);
+
 			  /* Set the _data field.  */
 			  tmp = gfc_class_data_get (var);
 			  efield = fold_convert (TREE_TYPE (tmp),
diff --git a/gcc/testsuite/gfortran.dg/pr114012.f90 b/gcc/testsuite/gfortran.dg/pr114012.f90
new file mode 100644
index 000..9dbb031c664
--- /dev/null
+++ b/gcc/testsuite/gfortran.dg/pr114012.f90
@@ -0,0 +1,81 @@
+! { dg-do run }
+! PR fortran/114012
+!
+! Polymorphic functions were evaluated twice in assignment
+
+program test
+  implicit none
+
+  type :: custom_int
+ integer :: val = 2
+  end type
+
+  interface assignment(=)
+ procedure assign
+  end interface
+  interface operator(-)
+ procedure neg
+  end interface
+
+  type(custom_int) :: i
+  integer  :: count_assign, count_neg
+
+  count_assign = 0
+  count_neg= 0
+
+  i = 1
+  if (count_assign /= 1 .or. count_neg /= 0) stop 1
+
+  i = -i
+  if (count_assign /= 2 .or. count_neg /= 1) stop 2
+  if (i% val /= -1) stop 3
+
+  i = neg(i)
+  if (count_assign /= 3 .or. count_neg /= 2) stop 4
+  if (i% val /=  1) stop 5
+
+  i = (neg(i))
+  if (count_assign /= 4 .or. count_neg /= 3) stop 6
+  if (i% val /= -1) stop 7
+
+  i = - neg(i)
+  if (count_assign /= 5 .or. count_neg /= 5) stop 8
+  if (i% val /= -1) stop 9
+
+contains
+
+  subroutine assign (field, val)
+type(custom_int), intent(out) :: field
+class(*), intent(in) :: val
+
+count_assign = count_assign + 1
+
+select type (val)
+type is (integer)
+!  print *, " in assign(integer)", field%val, val
+   field%val = val
+type is (custom_int)
+!  print *, " in assign(custom)", field%val, val%val
+   field%val = val%val
+class default
+   error stop
+end select
+
+  end subroutine assign
+
+  function neg (input_field) result(output_field)
+type(custom_int), intent(in), target :: input_field
+class(custom_int), allocatable :: output_field
+allocate (custom_int :: output_field)
+
+count_neg = count_neg + 1
+
+select type (output_field)
+type is (custom_int)
+!  print *, " in neg", output_field%val, input_field%val
+   output_field%val = -input_field%val
+class default
+   error stop
+end select
+  end function neg
+end program test
--
2.35.3
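In C terms, PR114012 is the classic double-evaluation problem. The sketch below is an analogy only: `struct class_var`, `compute` and the two macros are hypothetical and are not gfortran's generated code. It shows why evaluating the argument once into a temporary, which is what wrapping it in `save_expr` arranges in GENERIC, keeps the call count at one and the two fields consistent. The patch applies this only when the actual argument is not a plain variable (`e->expr_type != EXPR_VARIABLE`).

```c
#include <assert.h>

/* Counts how often the "function result" is computed.  */
static int calls;

static int
compute (void)
{
  return ++calls;
}

/* Hypothetical stand-in for a class container with _data and _vptr.  */
struct class_var
{
  int data;
  int vptr;
};

/* Unfixed shape: the argument expression is expanded once per field,
   so compute () runs twice and the fields can disagree.  */
#define FILL_TWICE(var, expr) ((var).data = (expr), (var).vptr = (expr))

/* Fixed shape, analogous to save_expr: evaluate once into a
   temporary, then read the temporary for both fields.  */
#define FILL_ONCE(var, expr)                                          \
  do                                                                  \
    {                                                                 \
      int saved_ = (expr);                                            \
      (var).data = saved_;                                            \
      (var).vptr = saved_;                                            \
    }                                                                 \
  while (0)
```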

Re: [PATCH v5 RESEND] C, ObjC: Add -Wunterminated-string-initialization

2024-02-25 Thread Alejandro Colomar

Hi Mike, Joseph,

On Sun, Feb 25, 2024 at 10:10:09AM -0800, Mike Stump wrote:
> On Feb 6, 2024, at 2:45 AM, Alejandro Colomar  wrote:
> > 
> > Warn about the following:
> > 
> >char  s[3] = "foo";
> 
> No ObjC specific impact here, so no need for ObjC review.
> 
> As a member of the peanut gallery, I like the patch.
> 
> Joseph, this has been submitted 5 times over the past year.  Any thoughts?

Thanks!  BTW, I'd like to know whether I did anything wrong that kept it
from being reviewed in all this time, or whether everyone was just busy
with other things.  Would you prefer that I ping more often?  Or something else?

Have a lovely day!
Alex

-- 

Looking for a remote C programming job at the moment.


[PATCH v1 13/13] Add aarch64-w64-mingw32 target to libgcc

2024-02-25 Thread Evgeny Karpov

The target will be adjusted to aarch64-*-mingw* in config.gcc. This
change will ensure consistency with the target in libgcc.

Regards,
Evgeny


-----Original Message-----
Thursday, February 22, 2024 2:36 PM
Richard Earnshaw (lists) wrote:

> +aarch64-*-mingw*)

This doesn't match the glob pattern you added to config.gcc in an earlier 
patch, but see my comment on that.  The two should really be consistent with 
each other or you might get build failures late on.

R.

[PATCH v1 08/13] aarch64: Add Cygwin and MinGW environments for AArch64

2024-02-25 Thread Evgeny Karpov

Thank you for the historical information regarding the introduction 
of the features. I can confirm that removing the HAVE_GAS_WEAK check
and setting HAVE_GAS_ALIGNED_COMM to 1 by default works well.
These changes will be included in v2.

Regards,
Evgeny


-----Original Message-----
Thursday, February 22, 2024 2:23 PM
Richard Earnshaw (lists) wrote:

> +/* GNU as supports weak symbols on PECOFF.  */
> +#ifdef HAVE_GAS_WEAK

Can't we assume this is true?  It was most likely needed on i386 because 
support goes back longer than the assembler had this feature, but it looks like 
it was added in 2000, or thereabouts, so significantly before aarch64 was 
supported in the assembler.

> +#ifndef HAVE_GAS_ALIGNED_COMM

And this was added to GCC in 2009, which probably means it predates 
aarch64-coff support in gas as well.

R.

Re: [PATCH v5 RESEND] C, ObjC: Add -Wunterminated-string-initialization

2024-02-25 Thread Mike Stump

On Feb 6, 2024, at 2:45 AM, Alejandro Colomar  wrote:
> 
> Warn about the following:
> 
>char  s[3] = "foo";

No ObjC specific impact here, so no need for ObjC review.

As a member of the peanut gallery, I like the patch.

Joseph, this has been submitted 5 times over the past year.  Any thoughts?

> Initializing a char array with a string literal of the same length as
> the size of the array is usually a mistake.  It is rarely the case that
> one wants to create a non-terminated character sequence from a string
> literal.
> 
> In some cases, for writing faster code, one may want to use arrays
> instead of pointers, since that removes the need for storing an array of
> pointers apart from the strings themselves.
> 
>char  *log_levels[]   = { "info", "warning", "err" };
> vs.
>char  log_levels[][7] = { "info", "warning", "err" };
> 
> This forces the programmer to specify a size, which might change if a
> new entry is later added.  Having no way to enforce null termination is
> very dangerous, however, so it is useful to have a warning for this, so
> that the compiler can make sure that the programmer didn't make any
> mistakes.  This warning catches the bug above, so that the programmer
> will be able to fix it and write:
> 
>char  log_levels[][8] = { "info", "warning", "err" };
> 
> This warning already existed as part of -Wc++-compat, but this patch
> allows enabling it separately.  It is also included in -Wextra, since
> it may not always be desired (when unterminated character sequences are
> wanted), but it's likely to be desired in most cases.
> 
> Since Wc++-compat now includes this warning, the test has to be modified
> to expect the text of the new warning too, in .
> 
> Link: 
> Link: 
> Link: 
> 
> Acked-by: Doug McIlroy 
> Cc: "G. Branden Robinson" 
> Cc: Ralph Corderoy 
> Cc: Dave Kemper 
> Cc: Larry McVoy 
> Cc: Andrew Pinski 
> Cc: Jonathan Wakely 
> Cc: Andrew Clayton 
> Cc: Martin Uecker 
> Cc: David Malcolm 
> Signed-off-by: Alejandro Colomar 
> ---
> 
> v5:
> 
> -  Fix existing C++-compat tests.  [reported by ]
> 
> 
> gcc/c-family/c.opt | 4 
> gcc/c/c-typeck.cc  | 6 +++---
> gcc/testsuite/gcc.dg/Wcxx-compat-14.c  | 2 +-
> gcc/testsuite/gcc.dg/Wunterminated-string-initialization.c | 6 ++
> 4 files changed, 14 insertions(+), 4 deletions(-)
> create mode 100644 gcc/testsuite/gcc.dg/Wunterminated-string-initialization.c
> 
> diff --git a/gcc/c-family/c.opt b/gcc/c-family/c.opt
> index 44b9c862c14..e8f6b836836 100644
> --- a/gcc/c-family/c.opt
> +++ b/gcc/c-family/c.opt
> @@ -1407,6 +1407,10 @@ Wunsuffixed-float-constants
> C ObjC Var(warn_unsuffixed_float_constants) Warning
> Warn about unsuffixed float constants.
> 
> +Wunterminated-string-initialization
> +C ObjC Var(warn_unterminated_string_initialization) Warning LangEnabledBy(C ObjC,Wextra || Wc++-compat)
> +Warn about character arrays initialized as unterminated character sequences by a string literal.
> +
> Wunused
> C ObjC C++ ObjC++ LangEnabledBy(C ObjC C++ ObjC++,Wall)
> ; documented in common.opt
> diff --git a/gcc/c/c-typeck.cc b/gcc/c/c-typeck.cc
> index e55e887da14..7df9de819ed 100644
> --- a/gcc/c/c-typeck.cc
> +++ b/gcc/c/c-typeck.cc
> @@ -8399,11 +8399,11 @@ digest_init (location_t init_loc, tree type, tree init, tree origtype,
>   pedwarn_init (init_loc, 0,
> ("initializer-string for array of %qT "
>  "is too long"), typ1);
> -   else if (warn_cxx_compat
> +   else if (warn_unterminated_string_initialization
>  && compare_tree_int (TYPE_SIZE_UNIT (type), len) < 0)
> - warning_at (init_loc, OPT_Wc___compat,
> + warning_at (init_loc, OPT_Wunterminated_string_initialization,
>   ("initializer-string for array of %qT "
> -  "is too long for C++"), typ1);
> +  "is too long"), typ1);
> if (compare_tree_int (TYPE_SIZE_UNIT (type), len) < 0)
>   {
> unsigned HOST_WIDE_INT size
> diff --git a/gcc/testsuite/gcc.dg/Wcxx-compat-14.c b/gcc/testsuite/gcc.dg/Wcxx-compat-14.c
> index 23783711be6..6df0ee197cc 100644
> --- a/gcc/testsuite/gcc.dg/Wcxx-compat-14.c
> +++ b/gcc/testsuite/gcc.dg/Wcxx-compat-14.c
> @@ -2,5 +2,5 @@
> /* { dg-options "-Wc++-compat" } */
> 
> char a1[] = "a";
> -char a2[1] = "a";/* { dg-warning "C\[+\]\[+\]" } */
> +char a2[1] = "a";/* { dg-warning "initializer-string for array of 'char' is too long" } */
> char a3[2] = "a";
> diff --git a/gcc/testsuite/gcc.dg/Wunterminated-string-initialization.c b/gcc/testsuite/gcc.dg/Wuntermin
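A compilable version of the `log_levels` table from the quoted commit message, using the corrected row size, is sketched below; `level_name_length` and `N_LEVELS` are illustrative names only. Changing `[8]` to `[7]` makes the `"warning"` row unterminated, which C accepts silently by default; with the patch, that initializer would be diagnosed under -Wunterminated-string-initialization, which the quoted c.opt hunk enables via -Wextra or -Wc++-compat.

```c
#include <assert.h>
#include <string.h>

/* "warning" is 7 characters long, so rows of 7 bytes would drop its
   terminating NUL (the case the new warning reports).  Eight bytes
   per row leave room for the terminator.  */
static const char log_levels[][8] = { "info", "warning", "err" };

/* Number of entries in the table.  */
#define N_LEVELS (sizeof log_levels / sizeof log_levels[0])

/* strlen is safe here only because every row is NUL-terminated.  */
static size_t
level_name_length (size_t i)
{
  return strlen (log_levels[i]);
}
```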

Re: [PATCH] x86: Check interrupt instead of noreturn attribute

2024-02-25 Thread H.J. Lu

On Sun, Feb 25, 2024 at 8:54 AM Uros Bizjak  wrote:
>
> On Sun, Feb 25, 2024 at 5:01 PM H.J. Lu  wrote:
> >
> > ix86_set_func_type checks noreturn attribute to avoid incompatible
> > attribute error in LTO1 on interrupt functions.  Since TREE_THIS_VOLATILE
> > is set also for _Noreturn without noreturn attribute, check interrupt
> > attribute for interrupt functions instead.
>
> Please also adjust the comment above the change. The current comment
> even explains why the "noreturn" attribute is checked instead of
> "interrupt" attribute.

How about this?

 NB: Can't use just TREE_THIS_VOLATILE to check if this is a noreturn
 function.  The local-pure-const pass turns an interrupt function
 into a noreturn function by setting TREE_THIS_VOLATILE.  Normally
 the local-pure-const pass is run after ix86_set_func_type is called.
 When the local-pure-const pass is enabled for LTO, the interrupt
 function is marked with TREE_THIS_VOLATILE in the IR output, which
 leads to the incompatible attribute error in LTO1.  Ignore the
 interrupt function in this case.

Thanks.

> Uros.
>
> >
> > gcc/
> >
> > PR target/114097
> > * config/i386/i386-options.cc (ix86_set_func_type): Check
> > interrupt instead of noreturn attribute.
> >
> > gcc/testsuite/
> >
> > PR target/114097
> > * gcc.target/i386/pr114097-1.c: New test.
> > ---
> >  gcc/config/i386/i386-options.cc|  3 ++-
> >  gcc/testsuite/gcc.target/i386/pr114097-1.c | 26 ++
> >  2 files changed, 28 insertions(+), 1 deletion(-)
> >  create mode 100644 gcc/testsuite/gcc.target/i386/pr114097-1.c
> >
> > diff --git a/gcc/config/i386/i386-options.cc b/gcc/config/i386/i386-options.cc
> > index 93a01146db7..82fe0d228cd 100644
> > --- a/gcc/config/i386/i386-options.cc
> > +++ b/gcc/config/i386/i386-options.cc
> > @@ -3395,7 +3395,8 @@ ix86_set_func_type (tree fndecl)
> >   incompatible attribute error in LTO1.  */
> >bool has_no_callee_saved_registers
> >  = ((TREE_THIS_VOLATILE (fndecl)
> > -   && lookup_attribute ("noreturn", DECL_ATTRIBUTES (fndecl))
> > +   && !lookup_attribute ("interrupt",
> > + TYPE_ATTRIBUTES (TREE_TYPE (fndecl)))
> > && optimize
> > && !optimize_debug
> > && (TREE_NOTHROW (fndecl) || !flag_exceptions))
> > diff --git a/gcc/testsuite/gcc.target/i386/pr114097-1.c b/gcc/testsuite/gcc.target/i386/pr114097-1.c
> > new file mode 100644
> > index 000..b14c7b6214d
> > --- /dev/null
> > +++ b/gcc/testsuite/gcc.target/i386/pr114097-1.c
> > @@ -0,0 +1,26 @@
> > +/* { dg-do compile } */
> > +/* { dg-options "-O2 -mtune-ctrl=^prologue_using_move,^epilogue_using_move -fomit-frame-pointer" } */
> > +
> > +#define ARRAY_SIZE 256
> > +
> > +extern int array[ARRAY_SIZE][ARRAY_SIZE][ARRAY_SIZE];
> > +extern int value (int, int, int)
> > +#ifndef __x86_64__
> > +__attribute__ ((regparm(3)))
> > +#endif
> > +;
> > +
> > +void
> > +_Noreturn
> > +no_return_to_caller (void)
> > +{
> > +  unsigned i, j, k;
> > +  for (i = ARRAY_SIZE; i > 0; --i)
> > +    for (j = ARRAY_SIZE; j > 0; --j)
> > +      for (k = ARRAY_SIZE; k > 0; --k)
> > +        array[i - 1][j - 1][k - 1] = value (i, j, k);
> > +  while (1);
> > +}
> > +
> > +/* { dg-final { scan-assembler-not "push" } } */
> > +/* { dg-final { scan-assembler-not "pop" } } */
> > --
> > 2.43.2
> >



-- 
H.J.

Re: [PATCH] x86: Check interrupt instead of noreturn attribute

2024-02-25 Thread Uros Bizjak

On Sun, Feb 25, 2024 at 5:01 PM H.J. Lu  wrote:
>
> ix86_set_func_type checks the noreturn attribute to avoid an incompatible
> attribute error in LTO1 on interrupt functions.  Since TREE_THIS_VOLATILE
> is also set for _Noreturn functions without the noreturn attribute, check
> the interrupt attribute for interrupt functions instead.

Please also adjust the comment above the change. The current comment
even explains why the "noreturn" attribute is checked instead of
"interrupt" attribute.

Uros.

>
> gcc/
>
> PR target/114097
> * config/i386/i386-options.cc (ix86_set_func_type): Check
> interrupt instead of noreturn attribute.
>
> gcc/testsuite/
>
> PR target/114097
> * gcc.target/i386/pr114097-1.c: New test.
> ---
>  gcc/config/i386/i386-options.cc|  3 ++-
>  gcc/testsuite/gcc.target/i386/pr114097-1.c | 26 ++
>  2 files changed, 28 insertions(+), 1 deletion(-)
>  create mode 100644 gcc/testsuite/gcc.target/i386/pr114097-1.c
>
> diff --git a/gcc/config/i386/i386-options.cc b/gcc/config/i386/i386-options.cc
> index 93a01146db7..82fe0d228cd 100644
> --- a/gcc/config/i386/i386-options.cc
> +++ b/gcc/config/i386/i386-options.cc
> @@ -3395,7 +3395,8 @@ ix86_set_func_type (tree fndecl)
>       incompatible attribute error in LTO1.  */
>    bool has_no_callee_saved_registers
>      = ((TREE_THIS_VOLATILE (fndecl)
> -       && lookup_attribute ("noreturn", DECL_ATTRIBUTES (fndecl))
> +       && !lookup_attribute ("interrupt",
> +                             TYPE_ATTRIBUTES (TREE_TYPE (fndecl)))
>         && optimize
>         && !optimize_debug
>         && (TREE_NOTHROW (fndecl) || !flag_exceptions))
> diff --git a/gcc/testsuite/gcc.target/i386/pr114097-1.c b/gcc/testsuite/gcc.target/i386/pr114097-1.c
> new file mode 100644
> index 000..b14c7b6214d
> --- /dev/null
> +++ b/gcc/testsuite/gcc.target/i386/pr114097-1.c
> @@ -0,0 +1,26 @@
> +/* { dg-do compile } */
> +/* { dg-options "-O2 -mtune-ctrl=^prologue_using_move,^epilogue_using_move -fomit-frame-pointer" } */
> +
> +#define ARRAY_SIZE 256
> +
> +extern int array[ARRAY_SIZE][ARRAY_SIZE][ARRAY_SIZE];
> +extern int value (int, int, int)
> +#ifndef __x86_64__
> +__attribute__ ((regparm(3)))
> +#endif
> +;
> +
> +void
> +_Noreturn
> +no_return_to_caller (void)
> +{
> +  unsigned i, j, k;
> +  for (i = ARRAY_SIZE; i > 0; --i)
> +    for (j = ARRAY_SIZE; j > 0; --j)
> +      for (k = ARRAY_SIZE; k > 0; --k)
> +        array[i - 1][j - 1][k - 1] = value (i, j, k);
> +  while (1);
> +}
> +
> +/* { dg-final { scan-assembler-not "push" } } */
> +/* { dg-final { scan-assembler-not "pop" } } */
> --
> 2.43.2
>

[PATCH] x86: Check interrupt instead of noreturn attribute

2024-02-25 Thread H.J. Lu

ix86_set_func_type checks the noreturn attribute to avoid an incompatible
attribute error in LTO1 on interrupt functions.  Since TREE_THIS_VOLATILE
is also set for _Noreturn functions without the noreturn attribute, check
the interrupt attribute for interrupt functions instead.
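To make the change concrete, here is a minimal C model of the has_no_callee_saved_registers test before and after the patch. The fn_model and opts_model structs are hypothetical stand-ins for the accessors and flags visible in the hunk (TREE_THIS_VOLATILE, the "noreturn" and "interrupt" attributes, TREE_NOTHROW, optimize, optimize_debug, flag_exceptions); the hunk is cut off after the TREE_NOTHROW term, so any further conditions in the real function are not modelled.

```c
#include <stdbool.h>

/* Hypothetical model of the decl bits ix86_set_func_type inspects.  */
struct fn_model
{
  bool this_volatile;   /* TREE_THIS_VOLATILE (fndecl) */
  bool noreturn_attr;   /* "noreturn" in DECL_ATTRIBUTES (fndecl) */
  bool interrupt_attr;  /* "interrupt" in TYPE_ATTRIBUTES (TREE_TYPE (fndecl)) */
  bool nothrow;         /* TREE_NOTHROW (fndecl) */
};

/* Hypothetical model of the global options used by the test.  */
struct opts_model
{
  int optimize;
  bool optimize_debug;
  bool flag_exceptions;
};

/* Terms shared by the old and the new condition.  */
static bool
common_terms (const struct fn_model *f, const struct opts_model *o)
{
  return o->optimize && !o->optimize_debug
         && (f->nothrow || !o->flag_exceptions);
}

/* Before the patch: the "noreturn" attribute is required, so a
   _Noreturn function without the attribute never qualifies.  */
static bool
old_no_callee_saved (const struct fn_model *f, const struct opts_model *o)
{
  return f->this_volatile && f->noreturn_attr && common_terms (f, o);
}

/* After the patch: any volatile (noreturn) function qualifies unless
   it is an interrupt handler.  */
static bool
new_no_callee_saved (const struct fn_model *f, const struct opts_model *o)
{
  return f->this_volatile && !f->interrupt_attr && common_terms (f, o);
}
```

With these definitions, a _Noreturn function that lacks the attribute is rejected by the old test and accepted by the new one, while an interrupt function that local-pure-const has marked volatile is rejected by both.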

gcc/

PR target/114097
* config/i386/i386-options.cc (ix86_set_func_type): Check
interrupt instead of noreturn attribute.

gcc/testsuite/

PR target/114097
* gcc.target/i386/pr114097-1.c: New test.
---
 gcc/config/i386/i386-options.cc|  3 ++-
 gcc/testsuite/gcc.target/i386/pr114097-1.c | 26 ++
 2 files changed, 28 insertions(+), 1 deletion(-)
 create mode 100644 gcc/testsuite/gcc.target/i386/pr114097-1.c

diff --git a/gcc/config/i386/i386-options.cc b/gcc/config/i386/i386-options.cc
index 93a01146db7..82fe0d228cd 100644
--- a/gcc/config/i386/i386-options.cc
+++ b/gcc/config/i386/i386-options.cc
@@ -3395,7 +3395,8 @@ ix86_set_func_type (tree fndecl)
      incompatible attribute error in LTO1.  */
   bool has_no_callee_saved_registers
     = ((TREE_THIS_VOLATILE (fndecl)
-       && lookup_attribute ("noreturn", DECL_ATTRIBUTES (fndecl))
+       && !lookup_attribute ("interrupt",
+                             TYPE_ATTRIBUTES (TREE_TYPE (fndecl)))
        && optimize
        && !optimize_debug
        && (TREE_NOTHROW (fndecl) || !flag_exceptions))
diff --git a/gcc/testsuite/gcc.target/i386/pr114097-1.c b/gcc/testsuite/gcc.target/i386/pr114097-1.c
new file mode 100644
index 000..b14c7b6214d
--- /dev/null
+++ b/gcc/testsuite/gcc.target/i386/pr114097-1.c
@@ -0,0 +1,26 @@
+/* { dg-do compile } */
+/* { dg-options "-O2 -mtune-ctrl=^prologue_using_move,^epilogue_using_move -fomit-frame-pointer" } */
+
+#define ARRAY_SIZE 256
+
+extern int array[ARRAY_SIZE][ARRAY_SIZE][ARRAY_SIZE];
+extern int value (int, int, int)
+#ifndef __x86_64__
+__attribute__ ((regparm(3)))
+#endif
+;
+
+void
+_Noreturn
+no_return_to_caller (void)
+{
+  unsigned i, j, k;
+  for (i = ARRAY_SIZE; i > 0; --i)
+    for (j = ARRAY_SIZE; j > 0; --j)
+      for (k = ARRAY_SIZE; k > 0; --k)
+        array[i - 1][j - 1][k - 1] = value (i, j, k);
+  while (1);
+}
+
+/* { dg-final { scan-assembler-not "push" } } */
+/* { dg-final { scan-assembler-not "pop" } } */
-- 
2.43.2

[PATCH]middle-end: delay updating of dominators until later during vectorization. [PR114081]

2024-02-25 Thread Tamar Christina

Hi All,

The testcase shows an interesting case where multiple loops share a live
value and have early exits that go to the same location.  An additional
complication is that on x86_64 with -mavx we also seem to do prologue
peeling on the loops.

We correctly identify which BBs need their dominators updated, but we do
so too early.

Instead of adding more dominator updates, we can solve this for the cases
with multiple exits by not verifying dominators at the end of peeling when
peeling for vectorization.

We can then perform the final dominator updates just before vectorization when
all loop transformations are done.

This also reduces the number of dominator updates needed by at least 50%
and fixes the ICE.
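The idea of deferring the update can be illustrated with a toy model. The sketch below is not GCC's API: it represents a CFG as predecessor bitmasks, builds a shape loosely modelled on the testcase (two loops whose early exits share one "done" block), applies a peeling-style edit, and only then computes dominators, using the textbook iterative dataflow equations. GCC itself incrementally fixes the collected blocks with iterate_fix_dominators rather than recomputing from scratch; the names toy_cfg, toy_build_two_loops, toy_peel_loop2 and toy_compute_dominators are hypothetical.

```c
#include <assert.h>
#include <stdint.h>

#define MAX_BB 32

/* Toy CFG: pred[b] is a bitmask of the predecessors of block b.
   Block 0 is the entry.  Hypothetical model, not GCC's basic_block API.  */
struct toy_cfg
{
  int n_blocks;
  uint32_t pred[MAX_BB];
};

static void
toy_add_edge (struct toy_cfg *g, int from, int to)
{
  assert (from < g->n_blocks && to < g->n_blocks);
  g->pred[to] |= (uint32_t) 1 << from;
}

static void
toy_remove_edge (struct toy_cfg *g, int from, int to)
{
  g->pred[to] &= ~((uint32_t) 1 << from);
}

/* Blocks: 0 entry, 1/2 header/latch of loop 1, 3/4 header/latch of
   loop 2, 5 the shared "done" block reached by both early exits.  */
static void
toy_build_two_loops (struct toy_cfg *g)
{
  g->n_blocks = 6;
  for (int b = 0; b < MAX_BB; b++)
    g->pred[b] = 0;
  toy_add_edge (g, 0, 1);
  toy_add_edge (g, 1, 2);
  toy_add_edge (g, 2, 1);   /* loop 1 back edge */
  toy_add_edge (g, 1, 5);   /* loop 1 early exit */
  toy_add_edge (g, 2, 3);   /* loop 1 normal exit */
  toy_add_edge (g, 3, 4);
  toy_add_edge (g, 4, 3);   /* loop 2 back edge */
  toy_add_edge (g, 3, 5);   /* loop 2 early exit */
  toy_add_edge (g, 4, 5);   /* loop 2 normal exit */
}

/* Peel one iteration of loop 2 into new block 6, which keeps its own
   early exit to the shared "done" block.  */
static void
toy_peel_loop2 (struct toy_cfg *g)
{
  g->n_blocks = 7;
  toy_remove_edge (g, 2, 3);
  toy_add_edge (g, 2, 6);
  toy_add_edge (g, 6, 3);
  toy_add_edge (g, 6, 5);
}

/* Dom(0) = {0}; Dom(b) = {b} | intersection of Dom(p) over preds p,
   iterated to a fixed point.  Run once, after all CFG edits.  */
static void
toy_compute_dominators (const struct toy_cfg *g, uint32_t dom[MAX_BB])
{
  uint32_t all = g->n_blocks == MAX_BB
                 ? UINT32_MAX : ((uint32_t) 1 << g->n_blocks) - 1;
  dom[0] = 1;
  for (int b = 1; b < g->n_blocks; b++)
    dom[b] = all;

  int changed = 1;
  while (changed)
    {
      changed = 0;
      for (int b = 1; b < g->n_blocks; b++)
        {
          uint32_t meet = all;
          for (int p = 0; p < g->n_blocks; p++)
            if (g->pred[b] & ((uint32_t) 1 << p))
              meet &= dom[p];
          uint32_t nd = meet | ((uint32_t) 1 << b);
          if (nd != dom[b])
            {
              dom[b] = nd;
              changed = 1;
            }
        }
    }
}
```

In this model the shared exit block 5 is dominated by {0, 1, 5} both before and after peeling, while loop 2's header (block 3) gains the peeled block 6 as a dominator, so any dominator information computed before toy_peel_loop2 runs would already be stale for block 3.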

Bootstrapped Regtested on aarch64-none-linux-gnu and
x86_64-pc-linux-gnu no issues.

Ok for master?

Thanks,
Tamar

gcc/ChangeLog:

PR tree-optimization/114081
PR tree-optimization/113290
* tree-vect-loop-manip.cc (slpeel_tree_duplicate_loop_to_edge_cfg):
Skip dominator update when multiple exit.
(vect_do_peeling): Remove multiple exit dominator update.
* tree-vect-loop.cc (vect_transform_loop): Update dominators when
multiple exits.
* tree-vectorizer.h (LOOP_VINFO_DOMS_NEED_UPDATE,
 dominators_needing_update): New.

gcc/testsuite/ChangeLog:

PR tree-optimization/114081
PR tree-optimization/113290
* gcc.dg/vect/vect-early-break_120-pr114081.c: New test.
* gcc.dg/vect/vect-early-break_121-pr114081.c: New test.

--- inline copy of patch -- 
diff --git a/gcc/testsuite/gcc.dg/vect/vect-early-break_120-pr114081.c b/gcc/testsuite/gcc.dg/vect/vect-early-break_120-pr114081.c
new file mode 100644
index ..2cd4ce1e4ac573ba6e41730fd2216f0ec8061376
--- /dev/null
+++ b/gcc/testsuite/gcc.dg/vect/vect-early-break_120-pr114081.c
@@ -0,0 +1,38 @@
+/* { dg-do compile } */
+/* { dg-add-options vect_early_break } */
+/* { dg-require-effective-target vect_early_break } */
+/* { dg-require-effective-target vect_int } */
+/* { dg-additional-options "-O3" } */
+
+/* { dg-final { scan-tree-dump "LOOP VECTORIZED" "vect" } } */
+
+typedef struct filter_list_entry {
+  const char *name;
+  int id;
+  void (*function)();
+} filter_list_entry;
+
+static const filter_list_entry filter_list[9] = {0};
+
+void php_zval_filter(int filter, int id1) {
+  filter_list_entry filter_func;
+
+  int size = 9;
+  for (int i = 0; i < size; ++i) {
+    if (filter_list[i].id == filter) {
+      filter_func = filter_list[i];
+      goto done;
+    }
+  }
+
+#pragma GCC novector
+  for (int i = 0; i < size; ++i) {
+    if (filter_list[i].id == 0x0204) {
+      filter_func = filter_list[i];
+      goto done;
+    }
+  }
+done:
+  if (!filter_func.id)
+    filter_func.function();
+}
diff --git a/gcc/testsuite/gcc.dg/vect/vect-early-break_121-pr114081.c b/gcc/testsuite/gcc.dg/vect/vect-early-break_121-pr114081.c
new file mode 100644
index ..feebdb7a6c9b8981d7be31dd1c741f9e36738515
--- /dev/null
+++ b/gcc/testsuite/gcc.dg/vect/vect-early-break_121-pr114081.c
@@ -0,0 +1,37 @@
+/* { dg-do compile } */
+/* { dg-add-options vect_early_break } */
+/* { dg-require-effective-target vect_early_break } */
+/* { dg-require-effective-target vect_int } */
+/* { dg-additional-options "-O3" } */
+
+/* { dg-final { scan-tree-dump "LOOP VECTORIZED" "vect" } } */
+
+typedef struct filter_list_entry {
+  const char *name;
+  int id;
+  void (*function)();
+} filter_list_entry;
+
+static const filter_list_entry filter_list[9] = {0};
+
+void php_zval_filter(int filter, int id1) {
+  filter_list_entry filter_func;
+
+  int size = 9;
+  for (int i = 0; i < size; ++i) {
+    if (filter_list[i].id == filter) {
+      filter_func = filter_list[i];
+      goto done;
+    }
+  }
+
+  for (int i = 0; i < size; ++i) {
+    if (filter_list[i].id == 0x0204) {
+      filter_func = filter_list[i];
+      goto done;
+    }
+  }
+done:
+  if (!filter_func.id)
+    filter_func.function();
+}
diff --git a/gcc/tree-vect-loop-manip.cc b/gcc/tree-vect-loop-manip.cc
index 3f974d6d839e32516ae316f28ca25316e43d7d86..b5e158bc5cfb5107d5ff461e489d306f81e090d0 100644
--- a/gcc/tree-vect-loop-manip.cc
+++ b/gcc/tree-vect-loop-manip.cc
@@ -1917,7 +1917,6 @@ slpeel_tree_duplicate_loop_to_edge_cfg (class loop *loop, edge loop_exit,
  doms.safe_push (e->dest);
}
 
-  iterate_fix_dominators (CDI_DOMINATORS, doms, false);
   if (updated_doms)
updated_doms->safe_splice (doms);
 }
@@ -1925,7 +1924,9 @@ slpeel_tree_duplicate_loop_to_edge_cfg (class loop *loop, edge loop_exit,
   free (new_bbs);
   free (bbs);
 
-  checking_verify_dominators (CDI_DOMINATORS);
+  /* If we're peeling for vectorization then delay verifying dominators.  */
+  if (!flow_loops || !multiple_exits_p)
+    checking_verify_dominators (CDI_DOMINATORS);
 
   return new_loop

New Swedish PO file for 'gcc' (version 14.1-b20240218)

2024-02-25 Thread Translation Project Robot

Hello, gentle maintainer.

This is a message from the Translation Project robot.

A revised PO file for textual domain 'gcc' has been submitted
by the Swedish team of translators.  The file is available at:

https://translationproject.org/latest/gcc/sv.po

(This file, 'gcc-14.1-b20240218.sv.po', has just now been sent to you in
a separate email.)

All other PO files for your package are available in:

https://translationproject.org/latest/gcc/

Please consider including all of these in your next release, whether
official or a pretest.

Whenever you have a new distribution with a new version number ready,
containing a newer POT file, please send the URL of that distribution
tarball to the address below.  The tarball may be just a pretest or a
snapshot, it does not even have to compile.  It is just used by the
translators when they need some extra translation context.

The following HTML page has been updated:

https://translationproject.org/domain/gcc.html

If any question arises, please contact the translation coordinator.

Thank you for all your work,

The Translation Project robot, in the
name of your translation coordinator.

RE: [PATCH v2] Draft|Internal-fn: Introduce internal fn saturation US_PLUS

2024-02-25 Thread Tamar Christina

Hi Pan,

> From: Pan Li 
> 
> Hi Richard & Tamar,
> 
> Tried DEF_INTERNAL_INT_EXT_FN as you suggested, mapping us_plus$a3 to
> the RTL representation (us_plus:m x y) in optabs.def and then expanding
> it with expand_US_PLUS in internal-fn.cc.  I am not very sure whether my
> understanding of DEF_INTERNAL_INT_EXT_FN is correct.
> 
> I am not sure if we still need DEF_INTERNAL_SIGNED_OPTAB_FN here, given
> the RTL representation has (ss_plus:m x y) and (us_plus:m x y) already.
> 

I think a couple of things are being confused here, so let's break it down:

The reason for DEF_INTERNAL_SIGNED_OPTAB_FN is that in GIMPLE we only
want one internal function for both signed and unsigned SAT_ADD.  With
this definition we don't need SAT_UADD and SAT_SADD; instead we only
have SAT_ADD, which will expand to us_plus or ss_plus.

Now the downside is that this is a direct internal optab.  This means
that for the representation to be used, the target *must* implement the
optab.  This is a bit annoying because it doesn't allow us to generically
assume that all targets use SAT_ADD for saturating add, and thus only
have to write optimizations for this one representation.

This is why Richi said we may need to use a new tree_code because we can
override tree code expansions.  However the same can be done with the _EXT_FN
internal functions.

So what I meant was that we want to have a combination of the two, i.e. a
DEF_INTERNAL_SIGNED_OPTAB_EXT_FN.
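A rough model of the query such a DEF_INTERNAL_SIGNED_OPTAB_EXT_FN would answer: pick ssadd or usadd from the signedness of the selected type, and report the IFN as supported if either the target implements that optab or the generic fallback applies. The type_model struct, the optab_model enum and the target_optabs bitmask are hypothetical; the fallback predicate mirrors the draft internal_SAT_ADD_fn_supported_p (scalar unsigned integers only).

```c
#include <stdbool.h>

/* Hypothetical summary of the type handed to the IFN support query.  */
struct type_model
{
  bool is_vector;
  bool is_unsigned;
  bool is_integer;
};

/* Hypothetical optab identifiers; bit N of target_optabs means the
   target implements optab N.  */
enum optab_model { OPTAB_SSADD = 0, OPTAB_USADD = 1 };

/* One IFN, two optabs: the signedness of the selected type decides.  */
static enum optab_model
sat_add_optab (struct type_model t)
{
  return t.is_unsigned ? OPTAB_USADD : OPTAB_SSADD;
}

/* Mirrors the draft internal_SAT_ADD_fn_supported_p: only scalar
   unsigned integer types can use the generic fallback for now.  */
static bool
sat_add_fallback_supported (struct type_model t)
{
  if (t.is_vector)
    return false;
  if (!t.is_unsigned)
    return false;
  return t.is_integer;
}

/* Supported if the target has the matching optab, or if the fallback
   expansion can handle the type.  */
static bool
sat_add_supported (struct type_model t, unsigned target_optabs)
{
  bool direct = (target_optabs >> sat_add_optab (t)) & 1u;
  return direct || sat_add_fallback_supported (t);
}
```

In this model a scalar unsigned type is always supported, a signed one only when the target provides ssadd, and a vector only when the target has the matching optab, which is the split the draft aims for.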

If Richi agrees, the patch below is what I meant.  It creates the
infrastructure for this and, for now, only allows a default fallback for
unsigned saturating add, which makes it easier for us to add the rest later.

Also, unless I'm wrong (and Richi can correct me here), us_plus and
ss_plus are the RTL expressions, but the optabs for saturation are ssadd
and usadd.  So you don't need to make new us_plus and ss_plus ones.
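For the unsigned fallback, the decomposition that the draft expand_SAT_ADD mentions in a comment, (X + Y) | -((X + Y) < X), can be checked in plain C. This is a sketch on uint32_t values only; the real expander would have to emit equivalent RTL for whatever mode the type has.

```c
#include <stdint.h>

/* Branchless unsigned saturating add.  If x + y wraps modulo 2^32, the
   wrapped sum equals x + y - 2^32, which is smaller than x, so
   (sum < x) is 1, its negation is all-ones, and the OR clamps the
   result to UINT32_MAX.  Otherwise the mask is 0 and the plain sum is
   returned.  */
static uint32_t
sat_add_u32 (uint32_t x, uint32_t y)
{
  uint32_t sum = x + y;
  uint32_t mask = -(uint32_t) (sum < x);
  return sum | mask;
}
```

The comparison against either operand works because an unsigned wrap can only ever make the sum smaller than both inputs.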

diff --git a/gcc/internal-fn.cc b/gcc/internal-fn.cc
index a07f25f3aee..aaf9f8991b3 100644
--- a/gcc/internal-fn.cc
+++ b/gcc/internal-fn.cc
@@ -4103,6 +4103,17 @@ direct_internal_fn_supported_p (internal_fn fn, tree_pair types,
return direct_##TYPE##_optab_supported_p (which_optab, types,   \
  opt_type);\
   }
+#define DEF_INTERNAL_SIGNED_OPTAB_EXT_FN(CODE, FLAGS, SELECTOR, SIGNED_OPTAB, \
+                                         UNSIGNED_OPTAB, TYPE) \
+    case IFN_##CODE: \
+      { \
+        optab which_optab = (TYPE_UNSIGNED (types.SELECTOR) \
+                             ? UNSIGNED_OPTAB ## _optab \
+                             : SIGNED_OPTAB ## _optab); \
+        return direct_##TYPE##_optab_supported_p (which_optab, types, \
+                                                  opt_type) \
+               || internal_##CODE##_fn_supported_p (types.SELECTOR, opt_type); \
+      }
 #include "internal-fn.def"
 
 case IFN_LAST:
@@ -4303,6 +4314,8 @@ set_edom_supported_p (void)
 optab which_optab = direct_internal_fn_optab (fn, types);  \
 expand_##TYPE##_optab_fn (fn, stmt, which_optab);  \
   }
+#define DEF_INTERNAL_SIGNED_OPTAB_EXT_FN(CODE, FLAGS, SELECTOR, SIGNED_OPTAB, \
+                                         UNSIGNED_OPTAB, TYPE)
 #include "internal-fn.def"
 
 /* Routines to expand each internal function, indexed by function number.
@@ -5177,3 +5190,45 @@ expand_POPCOUNT (internal_fn fn, gcall *stmt)
   emit_move_insn (plhs, cmp);
 }
 }
+
+void
+expand_SAT_ADD (internal_fn fn, gcall *stmt)
+{
+  /* Check if the target supports the expansion through an IFN.  */
+  tree_pair types = direct_internal_fn_types (fn, stmt);
+  optab which_optab = direct_internal_fn_optab (fn, types);
+  if (direct_binary_optab_supported_p (which_optab, types,
+                                       insn_optimization_type ()))
+    {
+      expand_binary_optab_fn (fn, stmt, which_optab);
+      return;
+    }
+
+  /* Target does not support the optab, but we can de-compose it.  */
+  /*
+  ... decompose to a canonical representation ...
+  if (TYPE_UNSIGNED (types.SELECTOR))
+{
+  ...
+  decompose back to (X + Y) | - ((X + Y) < X)
+}
+  else
+{
+  ...
+}
+  */
+}
+
+bool internal_SAT_ADD_fn_supported_p (tree type, optimization_type /* optype */)
+{
+  /* For now, don't support decomposing vector ops.  */
+  if (VECTOR_TYPE_P (type))
+    return false;
+
+  /* Signed saturating arithmetic is harder to decompose, so ignore it
+     for now.  */
+  if (!TYPE_UNSIGNED (type))
+    return false;
+
+  return TREE_CODE (type) == INTEGER_TYPE;
+}
\ No newline at end of file
diff --git a/gcc/internal-fn.def b/gcc/internal-fn.def
index c14d30365c1..5a2491228d5 100644
--- a/gcc/internal-fn.def
+++ b/gcc/internal-fn.def
@@ -92,6 +92,10 @@ along with GCC; see the file COP
