Re: [PATCH] rs6000, altivec-2-runnable.c should be a runnable test

2024-06-13 Thread Kewen.Lin
Hi,

on 2024/6/14 11:58, Peter Bergner wrote:
> On 6/13/24 9:34 PM, Kewen.Lin wrote:
>> on 2024/6/14 05:16, Carl Love wrote:
> 
>>>  /* { dg-options "-mvsx" } */
>>>  /* { dg-additional-options "-mdejagnu-cpu=power8" { target { ! 
>>> has_arch_pwr8 } } } */
> 
> With the above, we're going to compile and run this test case with 
> -mcpu=power8
> or higher, which means we could have P8, P9 or even P10 instructions emitted.
> 
> 
> 
>>>  /* { dg-require-effective-target powerpc_vsx } */
>>
>> Since you changed this for "run", I think you also want 
>> s/powerpc_vsx/vsx_hw/ .
> 
> ...which means we'd need p8vector_hw, p9vector_hw or ... here.

Ah, good catch!  Yes, it would require a stricter guard.

> 
> 
> Should we just always compile with -mcpu=power8 and then check for p8vector_hw
> to make our lives easier?  Ala...
> 
> 
>/* { dg-options "-mdejagnu-cpu=power8" } */
>...
>/* { dg-require-effective-target p8vector_hw } */
> 
> 
> Note I've removed -mvsx, since that is implied by -mcpu=power8 and no
> need for dg-additional-options.   Maybe we want to add -O2 as well?
> Thoughts?

Both sound reasonable to me; there seems to be no point in distinguishing p8
from p8-and-up for this test case.

BR,
Kewen


Re: [PATCH] s390: testsuite: Fix ifcvt-one-insn-bool.c

2024-06-13 Thread Andreas Krebbel

On Wed, Jun 05, 2024 at 08:00:15AM +0200, Stefan Schulze Frielinghaus wrote:

With the change of r15-787-g57e04879389f9c I forgot to also update this
test.

gcc/testsuite/ChangeLog:

* gcc.target/s390/ifcvt-one-insn-bool.c: Fix loc.


Ok. Thanks!


Andreas



---
  Ok for mainline?  Ok for GCC 14 if the corresponding backport is also
  approved?

  gcc/testsuite/gcc.target/s390/ifcvt-one-insn-bool.c | 2 +-
  1 file changed, 1 insertion(+), 1 deletion(-)

diff --git a/gcc/testsuite/gcc.target/s390/ifcvt-one-insn-bool.c b/gcc/testsuite/gcc.target/s390/ifcvt-one-insn-bool.c
index 0c8c2f879a6..4ae29dbd6b6 100644
--- a/gcc/testsuite/gcc.target/s390/ifcvt-one-insn-bool.c
+++ b/gcc/testsuite/gcc.target/s390/ifcvt-one-insn-bool.c
@@ -3,7 +3,7 @@
  /* { dg-do compile { target { s390*-*-* } } } */
  /* { dg-options "-O2 -march=z13 -mzarch" } */
  
-/* { dg-final { scan-assembler "lochinh\t%r.?,1" } } */
+/* { dg-final { scan-assembler "lochile\t%r.?,1" } } */
  #include 
  
  int foo (int *a, unsigned int n)

--
2.45.1



Re: [PATCH v2] s390: Implement TARGET_NOCE_CONVERSION_PROFITABLE_P [PR109549]

2024-06-13 Thread Andreas Krebbel

On 6/2/24 14:07, Stefan Schulze Frielinghaus wrote:

Since the patch works fine so far for mainline, ok to backport to GCC 14?


Yes please do. Thanks!


Andreas




On Fri, May 17, 2024 at 08:59:05AM +0200, Stefan Schulze Frielinghaus wrote:

I've adapted the patch as follows and will push.

Thanks,
Stefan

--

Consider a NOCE conversion as profitable if there is at least one
conditional move.

gcc/ChangeLog:

* config/s390/s390.cc (TARGET_NOCE_CONVERSION_PROFITABLE_P):
Define.
(s390_noce_conversion_profitable_p): Implement.

gcc/testsuite/ChangeLog:

* gcc.target/s390/ccor.c: Order of loads is reversed now; as a
consequence the condition has to be reversed.
---
  gcc/config/s390/s390.cc  | 32 
  gcc/testsuite/gcc.target/s390/ccor.c |  4 ++--
  2 files changed, 34 insertions(+), 2 deletions(-)

diff --git a/gcc/config/s390/s390.cc b/gcc/config/s390/s390.cc
index bf46eab2d63..7f8f1681c2a 100644
--- a/gcc/config/s390/s390.cc
+++ b/gcc/config/s390/s390.cc
@@ -78,6 +78,7 @@ along with GCC; see the file COPYING3.  If not see
  #include "tree-pass.h"
  #include "context.h"
  #include "builtins.h"
+#include "ifcvt.h"
  #include "rtl-iter.h"
  #include "intl.h"
  #include "tm-constrs.h"
@@ -18037,6 +18038,34 @@ s390_vectorize_vec_perm_const (machine_mode vmode, machine_mode op_mode,
return vectorize_vec_perm_const_1 (d);
  }
  
+/* Consider a NOCE conversion as profitable if there is at least one
+   conditional move.  */
+
+static bool
+s390_noce_conversion_profitable_p (rtx_insn *seq, struct noce_if_info *if_info)
+{
+  if (if_info->speed_p)
+    {
+      for (rtx_insn *insn = seq; insn; insn = NEXT_INSN (insn))
+        {
+          rtx set = single_set (insn);
+          if (set == NULL)
+            continue;
+          if (GET_CODE (SET_SRC (set)) != IF_THEN_ELSE)
+            continue;
+          rtx src = SET_SRC (set);
+          machine_mode mode = GET_MODE (src);
+          if (GET_MODE_CLASS (mode) != MODE_INT
+              && GET_MODE_CLASS (mode) != MODE_FLOAT)
+            continue;
+          if (GET_MODE_SIZE (mode) > UNITS_PER_WORD)
+            continue;
+          return true;
+        }
+    }
+  return default_noce_conversion_profitable_p (seq, if_info);
+}
+
  /* Initialize GCC target structure.  */
  
  #undef  TARGET_ASM_ALIGNED_HI_OP
@@ -18350,6 +18379,9 @@ s390_vectorize_vec_perm_const (machine_mode vmode, machine_mode op_mode,
  #undef TARGET_VECTORIZE_VEC_PERM_CONST
  #define TARGET_VECTORIZE_VEC_PERM_CONST s390_vectorize_vec_perm_const
  
+#undef TARGET_NOCE_CONVERSION_PROFITABLE_P
+#define TARGET_NOCE_CONVERSION_PROFITABLE_P s390_noce_conversion_profitable_p
+
  struct gcc_target targetm = TARGET_INITIALIZER;
  
  #include "gt-s390.h"

diff --git a/gcc/testsuite/gcc.target/s390/ccor.c b/gcc/testsuite/gcc.target/s390/ccor.c
index 31f30f60314..36a3c3a999a 100644
--- a/gcc/testsuite/gcc.target/s390/ccor.c
+++ b/gcc/testsuite/gcc.target/s390/ccor.c
@@ -42,7 +42,7 @@ GENFUN1(2)
  
  GENFUN1(3)
  
-/* { dg-final { scan-assembler {locrno} } } */
+/* { dg-final { scan-assembler {locro} } } */
  
  GENFUN2(0,1)
  
@@ -58,7 +58,7 @@ GENFUN2(0,3)
  
  GENFUN2(1,2)
  
-/* { dg-final { scan-assembler {locrnlh} } } */
+/* { dg-final { scan-assembler {locrlh} } } */
  
  GENFUN2(1,3)
  
--
2.45.0
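
For readers without a GCC tree at hand, the decision made by s390_noce_conversion_profitable_p can be sketched as a standalone C++ model. This is only an illustration: `ModelInsn`, `ModeClass`, `kUnitsPerWord` and `model_profitable_p` are hypothetical stand-ins for GCC's RTL structures and UNITS_PER_WORD, not real GCC APIs, and the fallback to default_noce_conversion_profitable_p is reduced to a caller-supplied boolean.

```cpp
#include <cassert>
#include <vector>

// Hypothetical stand-ins for GCC's RTL data; none of this is GCC API.
enum class ModeClass { Int, Float, Other };

struct ModelInsn {
  bool has_single_set;       // models single_set (insn) != NULL
  bool src_is_if_then_else;  // models GET_CODE (SET_SRC (set)) == IF_THEN_ELSE
  ModeClass mode_class;      // models GET_MODE_CLASS (mode)
  int mode_size;             // models GET_MODE_SIZE (mode), in bytes
};

constexpr int kUnitsPerWord = 8;  // assumption: a 64-bit word

// Returns true as soon as one word-sized (or smaller) integer or float
// conditional move is found when optimizing for speed; otherwise it
// returns DEFAULT_ANSWER, which models the generic cost-based check.
bool model_profitable_p(const std::vector<ModelInsn> &seq, bool speed_p,
                        bool default_answer) {
  if (speed_p) {
    for (const ModelInsn &insn : seq) {
      assert(insn.mode_size > 0);  // sanity check on the model input
      if (!insn.has_single_set)
        continue;
      if (!insn.src_is_if_then_else)
        continue;
      if (insn.mode_class != ModeClass::Int
          && insn.mode_class != ModeClass::Float)
        continue;
      if (insn.mode_size > kUnitsPerWord)
        continue;
      return true;
    }
  }
  return default_answer;
}
```

The point of the model is that a single qualifying conditional move short-circuits the generic cost comparison, which is why the expected condition codes in ccor.c change.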



[PATCH 30/52 v2] pdp11: Remove macro {FLOAT,DOUBLE,LONG_DOUBLE}_TYPE_SIZE

2024-06-13 Thread Kewen.Lin
Hi Paul,

on 2024/6/14 04:07, Paul Koning wrote:
> What is the effect of this change?  The original code intended to have 
> "float" mean a 32 bit value, and "double" a 64 bit value.  There aren't any 
> larger floats, so I defined the long double size as 64 also.  Is the right 
> answer not to define it?

Since sub-patch 09/52 will poison {FLOAT,DOUBLE,LONG_DOUBLE}_TYPE_SIZE, building
target code will fail if it still has these macros.  Since I'd like to squash
these target changes into 09/52, I didn't note the background/context here,
sorry about that.

> 
> That part I understand, but why does the patch also remove FLOAT_TYPE_SIZE 
> and DOUBLE_TYPE_SIZE without explanation and without mention in the changelog?

Oops, thanks for catching!  I just noticed this sub-patch has an inconsistent
subject & changelog; I should have noticed this, as it has a quite different
subject from the others. :(  With your finding, I just re-visited all the other
sub-patches; luckily they are consistent.

The below is the updated revision, hope it looks good to you.  Thanks again.

BR,
Kewen
-

Subject: [PATCH] pdp11: Remove macro {FLOAT,DOUBLE,LONG_DOUBLE}_TYPE_SIZE

This removes the {FLOAT,{,LONG_}DOUBLE}_TYPE_SIZE macro
definitions in the pdp11 port, as we want to replace these
macros with the hook mode_for_floating_type and poison them.

gcc/ChangeLog:

* config/pdp11/pdp11.h (FLOAT_TYPE_SIZE): Remove.
(DOUBLE_TYPE_SIZE): Likewise.
(LONG_DOUBLE_TYPE_SIZE): Likewise.
---
 gcc/config/pdp11/pdp11.h | 11 ---
 1 file changed, 11 deletions(-)

diff --git a/gcc/config/pdp11/pdp11.h b/gcc/config/pdp11/pdp11.h
index 2446fea0b58..6c8e045bc57 100644
--- a/gcc/config/pdp11/pdp11.h
+++ b/gcc/config/pdp11/pdp11.h
@@ -71,17 +71,6 @@ along with GCC; see the file COPYING3.  If not see
 #define LONG_TYPE_SIZE 32
 #define LONG_LONG_TYPE_SIZE 64

-/* In earlier versions, FLOAT_TYPE_SIZE was selectable as 32 or 64,
-   but that conflicts with Fortran language rules.  Since there is no
-   obvious reason why we should have that feature -- other targets
-   generally don't have float and double the same size -- I've removed
-   it.  Note that it continues to be true (for now) that arithmetic is
-   always done with 64-bit values, i.e., the FPU is always in "double"
-   mode.  */
-#define FLOAT_TYPE_SIZE 32
-#define DOUBLE_TYPE_SIZE 64
-#define LONG_DOUBLE_TYPE_SIZE 64
-
 /* machine types from ansi */
 #define SIZE_TYPE "short unsigned int" /* definition of size_t */
 #define WCHAR_TYPE "short int" /* or long int */
--
2.43.0




Re: [PATCH 6/6] vect: Optimize order of lane-reducing statements in loop def-use cycles [PR114440]

2024-06-13 Thread Feng Xue OS
Regenerate the patch due to changes on its dependent patches.

Thanks,
Feng
---
gcc/
PR tree-optimization/114440
* tree-vectorizer.h (struct _stmt_vec_info): Add a new field
reduc_result_pos.
* tree-vect-loop.cc (vect_transform_reduction): Generate lane-reducing
statements in an optimized order.
---
 gcc/tree-vect-loop.cc | 51 ++-
 gcc/tree-vectorizer.h |  6 +
 2 files changed, 51 insertions(+), 6 deletions(-)

diff --git a/gcc/tree-vect-loop.cc b/gcc/tree-vect-loop.cc
index fb9259d115c..de7a9bab990 100644
--- a/gcc/tree-vect-loop.cc
+++ b/gcc/tree-vect-loop.cc
@@ -8734,7 +8734,8 @@ vect_transform_reduction (loop_vec_info loop_vinfo,
 }

   bool single_defuse_cycle = STMT_VINFO_FORCE_SINGLE_CYCLE (reduc_info);
-  gcc_assert (single_defuse_cycle || lane_reducing_op_p (code));
+  bool lane_reducing = lane_reducing_op_p (code);
+  gcc_assert (single_defuse_cycle || lane_reducing);

   /* Create the destination vector  */
   tree scalar_dest = gimple_get_lhs (stmt_info->stmt);
@@ -8751,6 +8752,8 @@ vect_transform_reduction (loop_vec_info loop_vinfo,
 }
   else
 {
+  int result_pos = 0;
+
   /* The input vectype of the reduction PHI determines copies of
 vectorized def-use cycles, which might be more than effective copies
 of vectorized lane-reducing reduction statements.  This could be
@@ -8780,9 +8783,9 @@ vect_transform_reduction (loop_vec_info loop_vinfo,
   sum_v2 = sum_v2;  // copy
   sum_v3 = sum_v3;  // copy

-  sum_v0 = SAD (s0_v0[i: 0 ~ 7 ], s1_v0[i: 0 ~ 7 ], sum_v0);
-  sum_v1 = SAD (s0_v1[i: 8 ~ 15], s1_v1[i: 8 ~ 15], sum_v1);
-  sum_v2 = sum_v2;  // copy
+  sum_v0 = sum_v0;  // copy
+  sum_v1 = SAD (s0_v1[i: 0 ~ 7 ], s1_v1[i: 0 ~ 7 ], sum_v1);
+  sum_v2 = SAD (s0_v2[i: 8 ~ 15], s1_v2[i: 8 ~ 15], sum_v2);
   sum_v3 = sum_v3;  // copy

   sum_v0 += n_v0[i: 0  ~ 3 ];
@@ -8790,7 +8793,20 @@ vect_transform_reduction (loop_vec_info loop_vinfo,
   sum_v2 += n_v2[i: 8  ~ 11];
   sum_v3 += n_v3[i: 12 ~ 15];
 }
-   */
+
+Moreover, for a higher instruction parallelism in final vectorized
+loop, it is considered to make those effective vectorized
+lane-reducing statements be distributed evenly among all def-use
+cycles. In the above example, SADs are generated into other cycles
+rather than that of DOT_PROD.  */
+
+  if (stmt_ncopies < ncopies)
+   {
+ gcc_assert (lane_reducing);
+ result_pos = reduc_info->reduc_result_pos;
+ reduc_info->reduc_result_pos = (result_pos + stmt_ncopies) % ncopies;
+ gcc_assert (result_pos >= 0 && result_pos < ncopies);
+   }

   for (i = 0; i < MIN (3, (int) op.num_ops); i++)
{
@@ -8826,7 +8842,30 @@ vect_transform_reduction (loop_vec_info loop_vinfo,
   op.ops[i], &vec_oprnds[i], vectype);

  if (used_ncopies < ncopies)
-   vec_oprnds[i].safe_grow_cleared (ncopies);
+   {
+ vec_oprnds[i].safe_grow_cleared (ncopies);
+
+ /* Find suitable def-use cycles to generate vectorized
+statements into, and reorder operands based on the
+selection.  */
+ if (i != reduc_index && result_pos)
+   {
+ int count = ncopies - used_ncopies;
+ int start = result_pos - count;
+
+ if (start < 0)
+   {
+ count = result_pos;
+ start = 0;
+   }
+
+ for (int j = used_ncopies - 1; j >= start; j--)
+   {
+ std::swap (vec_oprnds[i][j], vec_oprnds[i][j + count]);
+ gcc_assert (!vec_oprnds[i][j]);
+   }
+   }
+   }
}
 }

diff --git a/gcc/tree-vectorizer.h b/gcc/tree-vectorizer.h
index 3f7db707d97..b9bc9d432ee 100644
--- a/gcc/tree-vectorizer.h
+++ b/gcc/tree-vectorizer.h
@@ -1402,6 +1402,12 @@ public:
   /* The vector type for performing the actual reduction.  */
   tree reduc_vectype;

+  /* For loop reduction with multiple vectorized results (ncopies > 1), a
+ lane-reducing operation participating in it may not use all of those
+ results, this field specifies result index starting from which any
+ following lane-reducing operation would be assigned to.  */
+  int reduc_result_pos;
+
   /* If IS_REDUC_INFO is true and if the vector code is performing
  N scalar reductions in parallel, this variable gives the initial
  scalar values of those N reductions.  */
--
2.17.1
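
The bookkeeping in this patch can be sketched as a small standalone C++ model, assuming plain integers in place of vector operands (0 stands for an empty slot). `ReducModel`, `next_result_pos` and `place_operands` are hypothetical names introduced for illustration only; the arithmetic mirrors the reduc_result_pos update and the swap loop in the diff above.

```cpp
#include <cassert>
#include <utility>
#include <vector>

// Hypothetical model of the per-reduction state; not a GCC type.
struct ReducModel {
  int ncopies;           // number of vectorized def-use cycles
  int reduc_result_pos;  // next cycle to hand to a lane-reducing statement
};

// Returns the first def-use cycle assigned to a lane-reducing statement
// that needs only STMT_NCOPIES effective copies, then advances the
// position round-robin.
int next_result_pos(ReducModel &info, int stmt_ncopies) {
  int result_pos = 0;
  if (stmt_ncopies < info.ncopies) {
    result_pos = info.reduc_result_pos;
    info.reduc_result_pos = (result_pos + stmt_ncopies) % info.ncopies;
    assert(result_pos >= 0 && result_pos < info.ncopies);
  }
  return result_pos;
}

// Grows OPRNDS to NCOPIES slots and shifts the USED_NCOPIES leading
// operands so that they start at RESULT_POS, as the swap loop does.
std::vector<int> place_operands(std::vector<int> oprnds, int used_ncopies,
                                int ncopies, int result_pos) {
  oprnds.resize(ncopies, 0);  // models safe_grow_cleared
  if (result_pos) {
    int count = ncopies - used_ncopies;
    int start = result_pos - count;
    if (start < 0) {
      count = result_pos;
      start = 0;
    }
    for (int j = used_ncopies - 1; j >= start; j--) {
      std::swap(oprnds[j], oprnds[j + count]);
      assert(oprnds[j] == 0);  // the vacated slot must be empty
    }
  }
  return oprnds;
}
```

With four cycles, a one-copy statement followed by a two-copy statement lands in cycle 0 and cycles 1 to 2 respectively, which matches the DOT_PROD/SAD layout shown in the patch comment.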


From: Feng Xue OS 
Sent: Thursday, May 30, 2024 10:56 PM
To: Richard Biener
Cc: Tamar Christina; gcc-patches@gcc.gnu.org
Subject: [PATCH 

Re: [PATCH] rs6000, altivec-2-runnable.c should be a runnable test

2024-06-13 Thread Peter Bergner
On 6/13/24 9:34 PM, Kewen.Lin wrote:
> on 2024/6/14 05:16, Carl Love wrote:

>>  /* { dg-options "-mvsx" } */
>>  /* { dg-additional-options "-mdejagnu-cpu=power8" { target { ! 
>> has_arch_pwr8 } } } */

With the above, we're going to compile and run this test case with -mcpu=power8
or higher, which means we could have P8, P9 or even P10 instructions emitted.



>>  /* { dg-require-effective-target powerpc_vsx } */
> 
> Since you changed this for "run", I think you also want s/powerpc_vsx/vsx_hw/ 
> .

...which means we'd need p8vector_hw, p9vector_hw or ... here.


Should we just always compile with -mcpu=power8 and then check for p8vector_hw
to make our lives easier?  Ala...


   /* { dg-options "-mdejagnu-cpu=power8" } */
   ...
   /* { dg-require-effective-target p8vector_hw } */


Note I've removed -mvsx, since that is implied by -mcpu=power8 and no
need for dg-additional-options.   Maybe we want to add -O2 as well?
Thoughts?

Peter




Re: [PATCH 3/6] vect: Set STMT_VINFO_REDUC_DEF for non-live stmt in loop reduction

2024-06-13 Thread Feng Xue OS
Updated the patch.

Thanks,
Feng
--

gcc/
* tree-vect-loop.cc (vectorizable_reduction): Set STMT_VINFO_REDUC_DEF
for non-live stmt.
* tree-vect-stmts.cc (vectorizable_condition): Treat the condition
statement that is pointed by stmt_vec_info of reduction PHI as the
real "for_reduction" statement.
---
 gcc/tree-vect-loop.cc  |  7 +--
 gcc/tree-vect-stmts.cc | 11 ++-
 2 files changed, 15 insertions(+), 3 deletions(-)

diff --git a/gcc/tree-vect-loop.cc b/gcc/tree-vect-loop.cc
index bbd5d261907..35c50eb72cb 100644
--- a/gcc/tree-vect-loop.cc
+++ b/gcc/tree-vect-loop.cc
@@ -7665,8 +7665,11 @@ vectorizable_reduction (loop_vec_info loop_vinfo,
if (STMT_VINFO_LIVE_P (s))
  STMT_VINFO_REDUC_DEF (vect_orig_stmt (s)) = phi_info;
}
-  else if (STMT_VINFO_LIVE_P (vdef))
-   STMT_VINFO_REDUC_DEF (def) = phi_info;
+
+  /* For lane-reducing operation vectorizable analysis needs the
+reduction PHI information */
+  STMT_VINFO_REDUC_DEF (def) = phi_info;
+
   gimple_match_op op;
   if (!gimple_extract_op (vdef->stmt, &op))
{
diff --git a/gcc/tree-vect-stmts.cc b/gcc/tree-vect-stmts.cc
index e32d44050e5..dbdb59054e0 100644
--- a/gcc/tree-vect-stmts.cc
+++ b/gcc/tree-vect-stmts.cc
@@ -12137,11 +12137,20 @@ vectorizable_condition (vec_info *vinfo,
   vect_reduction_type reduction_type = TREE_CODE_REDUCTION;
   bool for_reduction
 = STMT_VINFO_REDUC_DEF (vect_orig_stmt (stmt_info)) != NULL;
+  if (for_reduction)
+{
+  reduc_info = info_for_reduction (vinfo, stmt_info);
+  if (STMT_VINFO_REDUC_DEF (reduc_info) != vect_orig_stmt (stmt_info))
+   {
+ for_reduction = false;
+ reduc_info = NULL;
+   }
+}
+
   if (for_reduction)
 {
   if (slp_node && SLP_TREE_LANES (slp_node) > 1)
return false;
-  reduc_info = info_for_reduction (vinfo, stmt_info);
   reduction_type = STMT_VINFO_REDUC_TYPE (reduc_info);
   reduc_index = STMT_VINFO_REDUC_IDX (stmt_info);
   gcc_assert (reduction_type != EXTRACT_LAST_REDUCTION
--
2.17.1


From: Feng Xue OS 
Sent: Thursday, May 30, 2024 10:51 PM
To: Richard Biener
Cc: Tamar Christina; gcc-patches@gcc.gnu.org
Subject: [PATCH 3/6] vect: Set STMT_VINFO_REDUC_DEF for non-live stmt in loop reduction

Normally, the vectorizable check on a statement in a loop reduction chain does
not use the reduction PHI information.  But some special statements might
need it during vectorizable analysis, especially to support multiple
lane-reducing operations later.

Thanks,
Feng
---
gcc/
* tree-vect-loop.cc (vectorizable_reduction): Set STMT_VINFO_REDUC_DEF
for non-live stmt.
* tree-vect-stmts.cc (vectorizable_condition): Treat the condition
statement that is pointed by stmt_vec_info of reduction PHI as the
real "for_reduction" statement.
---
 gcc/tree-vect-loop.cc  |  5 +++--
 gcc/tree-vect-stmts.cc | 11 ++-
 2 files changed, 13 insertions(+), 3 deletions(-)

diff --git a/gcc/tree-vect-loop.cc b/gcc/tree-vect-loop.cc
index aa5f21ccd1a..51627c27f8a 100644
--- a/gcc/tree-vect-loop.cc
+++ b/gcc/tree-vect-loop.cc
@@ -7632,14 +7632,15 @@ vectorizable_reduction (loop_vec_info loop_vinfo,
 all lanes here - even though we only will vectorize from
 the SLP node with live lane zero the other live lanes also
 need to be identified as part of a reduction to be able
-to skip code generation for them.  */
+to skip code generation for them.  For lane-reducing operation
+vectorizable analysis needs the reduction PHI information.  */
   if (slp_for_stmt_info)
{
  for (auto s : SLP_TREE_SCALAR_STMTS (slp_for_stmt_info))
if (STMT_VINFO_LIVE_P (s))
  STMT_VINFO_REDUC_DEF (vect_orig_stmt (s)) = phi_info;
}
-  else if (STMT_VINFO_LIVE_P (vdef))
+  else
STMT_VINFO_REDUC_DEF (def) = phi_info;
   gimple_match_op op;
   if (!gimple_extract_op (vdef->stmt, &op))
diff --git a/gcc/tree-vect-stmts.cc b/gcc/tree-vect-stmts.cc
index 935d80f0e1b..2e0be763abb 100644
--- a/gcc/tree-vect-stmts.cc
+++ b/gcc/tree-vect-stmts.cc
@@ -12094,11 +12094,20 @@ vectorizable_condition (vec_info *vinfo,
   vect_reduction_type reduction_type = TREE_CODE_REDUCTION;
   bool for_reduction
 = STMT_VINFO_REDUC_DEF (vect_orig_stmt (stmt_info)) != NULL;
+  if (for_reduction)
+{
+  reduc_info = info_for_reduction (vinfo, stmt_info);
+  if (STMT_VINFO_REDUC_DEF (reduc_info) != vect_orig_stmt (stmt_info))
+   {
+ for_reduction = false;
+ reduc_info = NULL;
+   }
+}
+
   if (for_reduction)
 {
   if (slp_node)
return false;
-  reduc_info = info_for_reduction (vinfo, stmt_info);
   reduction_type = STMT_VINFO_REDUC_TYPE (reduc_info);
   reduc_index = STMT_VINFO_REDUC_IDX (stmt_info);
   gcc_assert (reduction_type 

Re: [PATCH] rs6000: Compute rop_hash_save_offset for non-Altivec compiles [PR115389]

2024-06-13 Thread Peter Bergner
On 6/13/24 9:26 PM, Kewen.Lin wrote:
> on 2024/6/13 21:24, Peter Bergner wrote:
>> On 6/13/24 12:35 AM, Kewen.Lin wrote:
>>>> @@ -826,7 +826,14 @@ rs6000_stack_info (void)
>>>>        info->ehrd_offset -= info->rop_hash_size;
>>>>      }
>>>>    else
>>>> -    info->ehrd_offset = info->gp_save_offset - ehrd_size;
>>>> +    {
>>>> +      info->ehrd_offset = info->gp_save_offset - ehrd_size;
>>>> +
>>>> +      /* Adjust for ROP protection.  */
>>>> +      info->rop_hash_save_offset
>>>> +        = info->gp_save_offset - info->rop_hash_size;
>>>> +      info->ehrd_offset -= info->rop_hash_size;
>>>> +    }
>>>
>>> I understand this is just copied from the if arm, but if I read this right, 
>>> it can be
>>> simplified as:
>>
>> Ok, I'll retest with that simplification.

So I retested a normal powerpc64le-linux build (ie, we default to Power8
with Altivec) and it bootstrapped and regtested with no regressions.
I then attempted a --with-cpu=power5 build to test the non-altivec path,
but both the unpatched and patched builds died building libgfortran with
the following error: "error: ‘_Float128’ is not supported on this target".
I believe that is related to PR113652.  I'll kick off the build again,
this time disabling Fortran, and see if the build completes.



>>>> +/* { dg-do assemble } */
>>>> +/* { dg-options "-O2 -mdejagnu-cpu=power10 -mrop-protect -mno-vsx -mno-altivec -mabi=no-altivec -save-temps" } */
>>>
>>> I'd expect -mabi=no-altivec is default for -mno-altivec, but specifying it 
>>> explicitly
>>> looks fine to me. :)
>>
>> That's what I expected too! :-)  However, I was surprised to learn that 
>> -mno-altivec
>> does *not* disable TARGET_ALTIVEC_ABI.  I had to explicitly use the -mabi= 
>> option to
>> expose the bug.
> 
> oh, it's surprising, I learn something today! :) I guess it's not intentional 
> but just no
> one noticed it, as it seems nonsense to have altivec ABI extension but not 
> using any altivec
> features.

Agreed!

Peter
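
As a rough illustration of the else arm quoted above, the offset arithmetic can be modelled in standalone C++. `StackModel` and `layout_non_altivec` are hypothetical names, the struct is only a made-up subset of the real stack info, and the numbers in the usage note are invented for the example; the three assignments are the ones from the quoted hunk.

```cpp
#include <cassert>

// Hypothetical subset of the fields touched by the quoted hunk.
struct StackModel {
  int gp_save_offset;
  int rop_hash_size;
  int rop_hash_save_offset;
  int ehrd_offset;
};

// Models the non-Altivec-ABI path: the ROP hash slot sits directly below
// the GPR save area, and the EH return data area is pushed down by the
// size of that slot.
void layout_non_altivec(StackModel &info, int ehrd_size) {
  assert(info.rop_hash_size >= 0 && ehrd_size >= 0);
  info.ehrd_offset = info.gp_save_offset - ehrd_size;

  // Adjust for ROP protection.
  info.rop_hash_save_offset = info.gp_save_offset - info.rop_hash_size;
  info.ehrd_offset -= info.rop_hash_size;
}
```

For example, with gp_save_offset = -144, rop_hash_size = 8 and ehrd_size = 32 (all invented values), the model yields rop_hash_save_offset = -152 and ehrd_offset = -184. Before the fix, rop_hash_save_offset was simply never computed on this path.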




[PATCH 3/3] [APX CFCMOV] Support APX CFCMOV in backend

2024-06-13 Thread Kong, Lingling
From: Lingling Kong 


Handle target hook TARGET_HAVE_CONDITIONAL_MOVE_MEM_NOTRAP and support
CFCMOV in backend.

gcc/ChangeLog:

* config/i386/i386-expand.cc (ix86_can_cfcmov_p): New function that
tests if the cfcmov can be generated.
(ix86_expand_int_movcc): Expand to cfcmov pattern if ix86_can_cfcmov_p
returns true.
* config/i386/i386-opts.h (enum apx_features): Add apx_cfcmov.
* config/i386/i386.cc (ix86_have_conditional_move_mem_notrap): New
function to hook TARGET_HAVE_CONDITIONAL_MOVE_MEM_NOTRAP.
(TARGET_HAVE_CONDITIONAL_MOVE_MEM_NOTRAP): Target hook define.
(ix86_rtx_costs): Add UNSPEC_APX_CFCMOV cost.
* config/i386/i386.h (TARGET_APX_CFCMOV): Define.
* config/i386/i386.md (*cfcmov_1): New define_insn to support
cfcmov.
(*cfcmov_2): Ditto.
(UNSPEC_APX_CFCMOV): New unspec for cfcmov.
* config/i386/i386.opt: Add enum value for cfcmov.
* config/i386/predicates.md (register_or_cfc_mem_operand): New
define_predicate.

gcc/testsuite/ChangeLog:

* gcc.target/i386/apx-cfcmov-1.c: New test.
* gcc.target/i386/apx-cfcmov-2.c: Ditto.
---
 gcc/config/i386/i386-expand.cc   | 63 +
 gcc/config/i386/i386-opts.h  |  4 +-
 gcc/config/i386/i386.cc  | 33 +++--
 gcc/config/i386/i386.h   |  1 +
 gcc/config/i386/i386.md  | 53 --
 gcc/config/i386/i386.opt |  3 +
 gcc/config/i386/predicates.md|  7 ++
 gcc/testsuite/gcc.target/i386/apx-cfcmov-1.c | 73 
 gcc/testsuite/gcc.target/i386/apx-cfcmov-2.c | 40 +++
 9 files changed, 265 insertions(+), 12 deletions(-)
 create mode 100644 gcc/testsuite/gcc.target/i386/apx-cfcmov-1.c
 create mode 100644 gcc/testsuite/gcc.target/i386/apx-cfcmov-2.c

diff --git a/gcc/config/i386/i386-expand.cc b/gcc/config/i386/i386-expand.cc
index 312329e550b..c02a4bcbec3 100644
--- a/gcc/config/i386/i386-expand.cc
+++ b/gcc/config/i386/i386-expand.cc
@@ -3336,6 +3336,30 @@ ix86_expand_int_addcc (rtx operands[])
   return true;
 }
 
+/* Return TRUE if we could convert "if (test) x = a; else x = b;" to cfcmov,
+   especially when the load of a or b, or the store to x, may cause memory
+   faults.  */
+bool
+ix86_can_cfcmov_p (rtx x, rtx a, rtx b)
+{
+  machine_mode mode = GET_MODE (x);
+  if (TARGET_APX_CFCMOV
+  && (mode == DImode || mode == SImode || mode == HImode))
+{
+  /* C load (r m r), (r m C), (r r m). For r m m could use
+two cfcmov. */
+  if (register_operand (x, mode)
+ && ((MEM_P (a) && register_operand (b, mode))
+ || (MEM_P (a) && b == const0_rtx)
+ || (register_operand (a, mode) && MEM_P (b))
+ || (MEM_P (a) && MEM_P (b))))
+   return true;
+  /* C store  (m r 0).  */
+  else if (MEM_P (x) && x == b && register_operand (a, mode))
+   return true;
+}
+  return false;
+}
+
 bool
 ix86_expand_int_movcc (rtx operands[])
 {
@@ -3366,6 +3390,45 @@ ix86_expand_int_movcc (rtx operands[])
 
   compare_code = GET_CODE (compare_op);
 
+  if (MEM_P (operands[0])
+  && !ix86_can_cfcmov_p (operands[0], op2, op3))
+return false;
+
+  if (may_trap_or_fault_p (op2) || may_trap_or_fault_p (op3))
+  {
+   if (ix86_can_cfcmov_p (operands[0], op2, op3))
+ {
+   if (may_trap_or_fault_p (op2))
+ op2 = gen_rtx_UNSPEC (mode, gen_rtvec (1, operands[2]),
+   UNSPEC_APX_CFCMOV);
+   if (may_trap_or_fault_p (op3))
+ op3 = gen_rtx_UNSPEC (mode, gen_rtvec (1, operands[3]),
+   UNSPEC_APX_CFCMOV);
+   emit_insn (compare_seq);
+
+   if (may_trap_or_fault_p (op2) && may_trap_or_fault_p (op3))
+ {
+   emit_insn (gen_rtx_SET (operands[0],
+   gen_rtx_IF_THEN_ELSE (mode,
+ compare_op,
+ op2,
+ operands[0])));
+   emit_insn (gen_rtx_SET (operands[0],
+   gen_rtx_IF_THEN_ELSE (mode,
+ compare_op,
+ operands[0],
+ op3)));
+ }
+   else
+ emit_insn (gen_rtx_SET (operands[0],
+ gen_rtx_IF_THEN_ELSE (mode,
+   compare_op,
+   op2, op3)));
+   return true;
+ }
+   return false;
+  }
+
   if ((op1 == const0_rtx && (code == GE || code == 

[PATCH 2/3] [APX CFCMOV] Support APX CFCMOV in if_convert pass

2024-06-13 Thread Kong, Lingling
From: Lingling Kong 

After adding the target hook TARGET_HAVE_CONDITIONAL_MOVE_MEM_NOTRAP,
the if-conversion pass can support a conditional move whose memory load
or store may trap or fault.

A fault-suppressing conditional move for a conditional memory store does
not move any arithmetic calculations.  For a conditional memory load, only
the case of one possibly-trapping MEM operand and one non-trapping,
non-MEM operand is supported for now.
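
The load restriction described above corresponds to the early checks at the top of noce_try_cmove_load_mem_notrap in the diff below. A minimal standalone model, assuming a hypothetical `ModelOperand` type and reducing the target hook to a boolean parameter, might look like this:

```cpp
// Hypothetical operand description; not a GCC type.
struct ModelOperand {
  bool is_mem;    // models MEM_P (op)
  bool may_trap;  // models may_trap_or_fault_p (op)
};

// Models the early-exit checks: the destination must not be a MEM, exactly
// one source must be a MEM, exactly one source may trap, and the trapping
// source must be a MEM that the target hook accepts.
bool load_candidate_p(bool x_is_mem, ModelOperand a, ModelOperand b,
                      bool hook_accepts) {
  if (x_is_mem)
    return false;
  if (a.is_mem == b.is_mem)      // !(MEM_P (a) ^ MEM_P (b))
    return false;
  if (a.may_trap == b.may_trap)  // !(a_trap ^ b_trap)
    return false;
  if (a.may_trap && (!a.is_mem || !hook_accepts))
    return false;
  if (b.may_trap && (!b.is_mem || !hook_accepts))
    return false;
  return true;
}
```

In other words, a trapping load paired with a plain register or constant qualifies, while two MEMs, or a non-trapping MEM paired with a trapping arithmetic expression, do not.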

gcc/ChangeLog:

* ifcvt.cc (noce_try_cmove_load_mem_notrap): Use target hook
to allow convert to cfcmov for conditional load.
(noce_try_cmove_store_mem_notrap): Convert to conditional store.
(noce_process_if_block): Ditto.
---
 gcc/ifcvt.cc | 247 ++-
 1 file changed, 246 insertions(+), 1 deletion(-)

diff --git a/gcc/ifcvt.cc b/gcc/ifcvt.cc
index 58ed42673e5..6e3e48af810 100644
--- a/gcc/ifcvt.cc
+++ b/gcc/ifcvt.cc
@@ -783,6 +783,8 @@ static rtx noce_emit_cmove (struct noce_if_info *, rtx, enum rtx_code, rtx,
rtx, rtx, rtx, rtx = NULL, rtx = NULL);
 static bool noce_try_cmove (struct noce_if_info *);
 static bool noce_try_cmove_arith (struct noce_if_info *);
+static bool noce_try_cmove_load_mem_notrap (struct noce_if_info *);
+static bool noce_try_cmove_store_mem_notrap (struct noce_if_info *, rtx *, rtx);
 static rtx noce_get_alt_condition (struct noce_if_info *, rtx, rtx_insn **);
 static bool noce_try_minmax (struct noce_if_info *);
 static bool noce_try_abs (struct noce_if_info *);
@@ -2401,6 +2403,237 @@ noce_try_cmove_arith (struct noce_if_info *if_info)
   return false;
 }
 
+/* When target support suppress memory fault, try more complex cases involving
+   conditional_move's source or dest may trap or fault.  */
+
+static bool
+noce_try_cmove_load_mem_notrap (struct noce_if_info *if_info)
+{
+  rtx a = if_info->a;
+  rtx b = if_info->b;
+  rtx x = if_info->x;
+
+  if (MEM_P (x))
+return false;
+  /* Just handle a conditional move from one trap MEM + other non_trap,
+ non mem cases.  */
+  if (!(MEM_P (a) ^ MEM_P (b)))
+  return false;
+  bool a_trap = may_trap_or_fault_p (a);
+  bool b_trap = may_trap_or_fault_p (b);
+
+  if (!(a_trap ^ b_trap))
+return false;
+  if (a_trap && (!MEM_P (a) || !targetm.have_conditional_move_mem_notrap (a)))
+return false;
+  if (b_trap && (!MEM_P (b) || !targetm.have_conditional_move_mem_notrap (b)))
+return false;
+
+  rtx orig_b;
+  rtx_insn *insn_a, *insn_b;
+  bool a_simple = if_info->then_simple;
+  bool b_simple = if_info->else_simple;
+  basic_block then_bb = if_info->then_bb;
+  basic_block else_bb = if_info->else_bb;
+  rtx target;
+  enum rtx_code code;
+  rtx cond = if_info->cond;
+  rtx_insn *ifcvt_seq;
+
+  /* if (test) x = *a; else x = c - d;
+ => x = c - d;
+   if (test)
+ x = *a;
+  */
+
+  code = GET_CODE (cond);
+  insn_a = if_info->insn_a;
+  insn_b = if_info->insn_b;
+
+  machine_mode x_mode = GET_MODE (x);
+
+  if (!can_conditionally_move_p (x_mode))
+return false;
+
+  /* Because we only handle one trap MEM + other non_trap, non mem cases,
+ just move one trap MEM always in then_bb.  */
+  if (noce_reversed_cond_code (if_info) != UNKNOWN)
+{
+  bool reversep = false;
+  if (b_trap)
+   reversep = true;
+
+  if (reversep)
+   {
+ if (if_info->rev_cond)
+   {
+ cond = if_info->rev_cond;
+ code = GET_CODE (cond);
+   }
+ else
+   code = reversed_comparison_code (cond, if_info->jump);
+ std::swap (a, b);
+ std::swap (insn_a, insn_b);
+ std::swap (a_simple, b_simple);
+ std::swap (then_bb, else_bb);
+   }
+}
+
+  if (then_bb && else_bb
+  && (!bbs_ok_for_cmove_arith (then_bb, else_bb,  if_info->orig_x)
+ || !bbs_ok_for_cmove_arith (else_bb, then_bb,  if_info->orig_x)))
+return false;
+
+  start_sequence ();
+
+  /* If one of the blocks is empty then the corresponding B or A value
+ came from the test block.  The non-empty complex block that we will
+ emit might clobber the register used by B or A, so move it to a pseudo
+ first.  */
+
+  rtx tmp_b = NULL_RTX;
+
+  /* Don't move trap mem to a pseudo. */
+  if (!may_trap_or_fault_p (b) && (b_simple || !else_bb))
+tmp_b = gen_reg_rtx (x_mode);
+
+  orig_b = b;
+
+  rtx emit_a = NULL_RTX;
+  rtx emit_b = NULL_RTX;
+  rtx_insn *tmp_insn = NULL;
+  bool modified_in_a = false;
+  bool modified_in_b = false;
+  /* If either operand is complex, load it into a register first.
+ The best way to do this is to copy the original insn.  In this
+ way we preserve any clobbers etc that the insn may have had.
+ This is of course not possible in the IS_MEM case.  */
+
+  if (! general_operand (b, GET_MODE (b)) || tmp_b)
+{
+ if (insn_b)
+   {
+ b = tmp_b ? tmp_b : gen_reg_rtx (GET_MODE (b));
+ rtx_insn *copy_of_b = as_a <rtx_insn *> (copy_rtx (insn_b));
+ rtx set 

[PATCH 1/3] [APX CFCMOV] Add a new target hook: TARGET_HAVE_CONDITIONAL_MOVE_MEM_NOTRAP

2024-06-13 Thread Kong, Lingling
From: konglin1 

The APX CFCMOV feature implements conditional faulting, which means that all
memory faults are suppressed when the condition code evaluates to false and
the instruction loads or stores a memory operand.  This lets a conditional
move load or store a memory operand that may trap or fault.

Currently the middle-end does not support a conditional move if it knows
that a load from A or B could trap or fault.

To enable CFCMOV, we add a target hook TARGET_HAVE_CONDITIONAL_MOVE_MEM_NOTRAP
to the if-conversion pass to allow conversion to cmov.

gcc/ChangeLog:

* doc/tm.texi: Regenerated.
* doc/tm.texi.in: Add TARGET_HAVE_CONDITIONAL_MOVE_MEM_NOTRAP
* target.def (bool,): New hook.
* targhooks.cc (default_have_conditional_move_mem_notrap): New
function to hook TARGET_HAVE_CONDITIONAL_MOVE_MEM_NOTRAP.
* targhooks.h (default_have_conditional_move_mem_notrap): New
target hook declaration.
---
 gcc/doc/tm.texi|  6 ++
 gcc/doc/tm.texi.in |  2 ++
 gcc/target.def | 11 +++
 gcc/targhooks.cc   |  8 
 gcc/targhooks.h|  1 +
 5 files changed, 28 insertions(+)

diff --git a/gcc/doc/tm.texi b/gcc/doc/tm.texi
index 8a7aa70d605..f8faf44ab73 100644
--- a/gcc/doc/tm.texi
+++ b/gcc/doc/tm.texi
@@ -7311,6 +7311,12 @@ candidate as a replacement for the if-convertible sequence described in
 @code{if_info}.
 @end deftypefn
 
+@deftypefn {Target Hook} bool TARGET_HAVE_CONDITIONAL_MOVE_MEM_NOTRAP (rtx @var{x})
+This hook returns true if the target supports conditional move instructions
+  that enable fault suppression of memory operands when the condition code
+  evaluates to false.
+@end deftypefn
+
 @deftypefn {Target Hook} bool TARGET_NEW_ADDRESS_PROFITABLE_P (rtx @var{memref}, rtx_insn * @var{insn}, rtx @var{new_addr})
 Return @code{true} if it is profitable to replace the address in
 @var{memref} with @var{new_addr}.  This allows targets to prevent the
diff --git a/gcc/doc/tm.texi.in b/gcc/doc/tm.texi.in
index 9e0830758ae..17c122aea43 100644
--- a/gcc/doc/tm.texi.in
+++ b/gcc/doc/tm.texi.in
@@ -4748,6 +4748,8 @@ Define this macro if a non-short-circuit operation produced by
 
 @hook TARGET_NOCE_CONVERSION_PROFITABLE_P
 
+@hook TARGET_HAVE_CONDITIONAL_MOVE_MEM_NOTRAP
+
 @hook TARGET_NEW_ADDRESS_PROFITABLE_P
 
 @hook TARGET_NO_SPECULATION_IN_DELAY_SLOTS_P
diff --git a/gcc/target.def b/gcc/target.def
index 70070caebc7..aa77737e006 100644
--- a/gcc/target.def
+++ b/gcc/target.def
@@ -3993,6 +3993,17 @@ candidate as a replacement for the if-convertible sequence described in\n\
 bool, (rtx_insn *seq, struct noce_if_info *if_info),
 default_noce_conversion_profitable_p)
 
+/* Return true if the target supports conditional move instructions that
+   enable fault suppression of memory operands when the condition code
+   evaluates to false.  */
+DEFHOOK
+(have_conditional_move_mem_notrap,
+ "This hook returns true if the target supports conditional move instructions\n\
+  that enable fault suppression of memory operands when the condition code\n\
+  evaluates to false.",
+bool, (rtx x),
+default_have_conditional_move_mem_notrap)
+
 /* Return true if new_addr should be preferred over the existing address used by
    memref in insn.  */
 DEFHOOK
diff --git a/gcc/targhooks.cc b/gcc/targhooks.cc
index fb339bf75dd..a616371b204 100644
--- a/gcc/targhooks.cc
+++ b/gcc/targhooks.cc
@@ -2816,4 +2816,12 @@ default_memtag_untagged_pointer (rtx tagged_pointer, rtx target)
   return untagged_base;
 }
 
+/* The default implementation of
+   TARGET_HAVE_CONDITIONAL_MOVE_MEM_NOTRAP.  */
+bool
+default_have_conditional_move_mem_notrap (rtx x ATTRIBUTE_UNUSED)
+{
+  return false;
+}
+
 #include "gt-targhooks.h"
diff --git a/gcc/targhooks.h b/gcc/targhooks.h
index 85f3817c176..f8ea2fde53d 100644
--- a/gcc/targhooks.h
+++ b/gcc/targhooks.h
@@ -305,5 +305,6 @@ extern rtx default_memtag_add_tag (rtx, poly_int64, uint8_t);
 extern rtx default_memtag_set_tag (rtx, rtx, rtx);
 extern rtx default_memtag_extract_tag (rtx, rtx);
 extern rtx default_memtag_untagged_pointer (rtx, rtx);
+extern bool default_have_conditional_move_mem_notrap (rtx x);
 
 #endif /* GCC_TARGHOOKS_H */
-- 
2.31.1



[PATCH 0/3] [APX CFCMOV] Support APX CFCMOV

2024-06-13 Thread Kong, Lingling
The APX CFCMOV[1] feature implements conditional faulting: when the condition
code evaluates to false, all memory faults from loading or storing the memory
operand are suppressed.  With it, a conditional move can load or store a
memory operand that may trap or fault.

Currently the middle-end does not generate a conditional move if it knows that
a load from A or B could trap or fault.

To enable CFCMOV, we add a target hook, TARGET_HAVE_CONDITIONAL_MOVE_MEM_NOTRAP,
which lets the if-conversion pass convert such sequences to cmov.
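
As a rough illustration of the kind of source this affects (a minimal sketch; the names `select_load` and `check_select_load` are hypothetical and are not taken from the patches or from the apx-cfcmov tests), consider a select whose taken arm reads memory:

```c
#include <assert.h>
#include <stddef.h>

/* Hypothetical example, not taken from the patch series.
   The load from *p only happens when cond is nonzero, so p is allowed
   to be an invalid pointer whenever cond is zero.  Rewriting this as an
   unconditional load followed by a register cmov could fault on the
   false path; a conditionally faulting move suppresses that fault.  */
int
select_load (int cond, const int *p, int fallback)
{
  return cond ? *p : fallback;
}

/* Self-check: the false path must never dereference p.  */
void
check_select_load (void)
{
  int value = 42;
  assert (select_load (1, &value, 7) == 42);
  assert (select_load (0, NULL, 7) == 7);
}
```

Whether a given compiler actually emits CFCMOV for this function depends on the target options and on the profitability checks in the series; the sketch only shows why the load must not be hoisted unconditionally.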

All the changes passed bootstrap and regtest on x86-64-pc-linux-gnu.
We also tested SPEC with SDE, and it passed the runtime test.

Ok for trunk?

[1] https://www.intel.com/content/www/us/en/developer/articles/technical/advanced-performance-extensions-apx.html

Lingling Kong (3):
  [APX CFCMOV] Add a new target hook: TARGET_HAVE_CONDITIONAL_MOVE_MEM_NOTRAP
  [APX CFCMOV] Support APX CFCMOV in if_convert pass
  [APX CFCMOV] Support APX CFCMOV in backend

 gcc/config/i386/i386-expand.cc   |  63 +
 gcc/config/i386/i386-opts.h  |   4 +-
 gcc/config/i386/i386.cc  |  33 ++-
 gcc/config/i386/i386.h   |   1 +
 gcc/config/i386/i386.md  |  53 +++-
 gcc/config/i386/i386.opt |   3 +
 gcc/config/i386/predicates.md|   7 +
 gcc/doc/tm.texi  |   6 +
 gcc/doc/tm.texi.in   |   2 +
 gcc/ifcvt.cc | 247 ++-
 gcc/target.def   |  11 +
 gcc/targhooks.cc |   8 +
 gcc/targhooks.h  |   1 +
 gcc/testsuite/gcc.target/i386/apx-cfcmov-1.c |  73 ++
 gcc/testsuite/gcc.target/i386/apx-cfcmov-2.c |  40 +++
 15 files changed, 539 insertions(+), 13 deletions(-)
 create mode 100644 gcc/testsuite/gcc.target/i386/apx-cfcmov-1.c
 create mode 100644 gcc/testsuite/gcc.target/i386/apx-cfcmov-2.c

-- 
2.31.1



RE: [PATCH v1 1/8] RISC-V: Add testcases for scalar unsigned SAT_SUB form 3

2024-06-13 Thread Li, Pan2
Thanks Juzhe, will commit the series after the middle-end patch.

Pan

From: juzhe.zh...@rivai.ai
Sent: Friday, June 14, 2024 10:24 AM
To: Li, Pan2; gcc-patches
Cc: kito.cheng; jeffreyalaw; Robin Dapp; Li, Pan2
Subject: Re: [PATCH v1 1/8] RISC-V: Add testcases for scalar unsigned SAT_SUB form 3

LGTM


juzhe.zh...@rivai.ai

From: pan2.li
Date: 2024-06-14 10:13
To: gcc-patches
CC: juzhe.zhong; kito.cheng; jeffreyalaw; rdapp.gcc; Pan Li
Subject: [PATCH v1 1/8] RISC-V: Add testcases for scalar unsigned SAT_SUB form 3
From: Pan Li <pan2...@intel.com>

After the middle-end supports form 3 of unsigned SAT_SUB and
the RISC-V backend implements the scalar .SAT_SUB, add more test
cases to cover form 3 of unsigned .SAT_SUB.

Form 3:
  #define SAT_SUB_U_3(T) \
  T sat_sub_u_3_##T (T x, T y) \
  { \
    return x > y ? x - y : 0; \
  }

Passed the rv64gcv full regression test.
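
For readers comparing the forms, the following is a minimal sketch (function names are hypothetical, not part of the patch) that instantiates form 3 for uint8_t by hand and checks it against the branchless expression used by the existing fmt_2 helper in sat_arith.h, over every pair of inputs:

```c
#include <assert.h>
#include <stdint.h>

/* Form 3 from the commit message, written out for uint8_t.  */
uint8_t
sat_sub_u8_form3 (uint8_t x, uint8_t y)
{
  return x > y ? x - y : 0;
}

/* Branchless variant, same expression as the fmt_2 helper.
   (x > y) is 1 or 0, so the negation gives an all-ones or zero mask.  */
uint8_t
sat_sub_u8_mask (uint8_t x, uint8_t y)
{
  return (x - y) & (-(uint8_t)(x > y));
}

/* Returns 1 if both variants agree on all 65536 input pairs.  */
int
forms_agree_u8 (void)
{
  for (unsigned x = 0; x <= UINT8_MAX; x++)
    for (unsigned y = 0; y <= UINT8_MAX; y++)
      if (sat_sub_u8_form3 (x, y) != sat_sub_u8_mask (x, y))
        return 0;
  return 1;
}
```

The exhaustive loop is only practical for the 8-bit type; wider types would need sampled or boundary inputs instead.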

gcc/testsuite/ChangeLog:

* gcc.target/riscv/sat_arith.h: Add helper macro for test.
* gcc.target/riscv/sat_u_sub-10.c: New test.
* gcc.target/riscv/sat_u_sub-11.c: New test.
* gcc.target/riscv/sat_u_sub-12.c: New test.
* gcc.target/riscv/sat_u_sub-9.c: New test.
* gcc.target/riscv/sat_u_sub-run-10.c: New test.
* gcc.target/riscv/sat_u_sub-run-11.c: New test.
* gcc.target/riscv/sat_u_sub-run-12.c: New test.
* gcc.target/riscv/sat_u_sub-run-9.c: New test.

Signed-off-by: Pan Li <pan2...@intel.com>
---
gcc/testsuite/gcc.target/riscv/sat_arith.h|  8 ++
gcc/testsuite/gcc.target/riscv/sat_u_sub-10.c | 19 ++
gcc/testsuite/gcc.target/riscv/sat_u_sub-11.c | 18 +
gcc/testsuite/gcc.target/riscv/sat_u_sub-12.c | 17 +
gcc/testsuite/gcc.target/riscv/sat_u_sub-9.c  | 18 +
.../gcc.target/riscv/sat_u_sub-run-10.c   | 25 +++
.../gcc.target/riscv/sat_u_sub-run-11.c   | 25 +++
.../gcc.target/riscv/sat_u_sub-run-12.c   | 25 +++
.../gcc.target/riscv/sat_u_sub-run-9.c| 25 +++
9 files changed, 180 insertions(+)
create mode 100644 gcc/testsuite/gcc.target/riscv/sat_u_sub-10.c
create mode 100644 gcc/testsuite/gcc.target/riscv/sat_u_sub-11.c
create mode 100644 gcc/testsuite/gcc.target/riscv/sat_u_sub-12.c
create mode 100644 gcc/testsuite/gcc.target/riscv/sat_u_sub-9.c
create mode 100644 gcc/testsuite/gcc.target/riscv/sat_u_sub-run-10.c
create mode 100644 gcc/testsuite/gcc.target/riscv/sat_u_sub-run-11.c
create mode 100644 gcc/testsuite/gcc.target/riscv/sat_u_sub-run-12.c
create mode 100644 gcc/testsuite/gcc.target/riscv/sat_u_sub-run-9.c

diff --git a/gcc/testsuite/gcc.target/riscv/sat_arith.h b/gcc/testsuite/gcc.target/riscv/sat_arith.h
index bc9a372b6df..50c65cdea49 100644
--- a/gcc/testsuite/gcc.target/riscv/sat_arith.h
+++ b/gcc/testsuite/gcc.target/riscv/sat_arith.h
@@ -92,8 +92,16 @@ sat_u_sub_##T##_fmt_2 (T x, T y)  \
   return (x - y) & (-(T)(x > y)); \
}
+#define DEF_SAT_U_SUB_FMT_3(T)\
+T __attribute__((noinline))   \
+sat_u_sub_##T##_fmt_3 (T x, T y)  \
+{ \
+  return x > y ? x - y : 0;   \
+}
+
#define RUN_SAT_U_SUB_FMT_1(T, x, y) sat_u_sub_##T##_fmt_1(x, y)
#define RUN_SAT_U_SUB_FMT_2(T, x, y) sat_u_sub_##T##_fmt_2(x, y)
+#define RUN_SAT_U_SUB_FMT_3(T, x, y) sat_u_sub_##T##_fmt_3(x, y)
#define DEF_VEC_SAT_U_SUB_FMT_1(T)   \
void __attribute__((noinline))   \
diff --git a/gcc/testsuite/gcc.target/riscv/sat_u_sub-10.c b/gcc/testsuite/gcc.target/riscv/sat_u_sub-10.c
new file mode 100644
index 000..6e78164865f
--- /dev/null
+++ b/gcc/testsuite/gcc.target/riscv/sat_u_sub-10.c
@@ -0,0 +1,19 @@
+/* { dg-do compile } */
+/* { dg-options "-march=rv64gc -mabi=lp64d -O3 -fdump-rtl-expand-details -fno-schedule-insns -fno-schedule-insns2" } */
+/* { dg-final { check-function-bodies "**" "" } } */
+
+#include "sat_arith.h"
+
+/*
+** sat_u_sub_uint16_t_fmt_3:
+** sub\s+[atx][0-9]+,\s*a0,\s*a1
+** sltu\s+[atx][0-9]+,\s*a0,\s*a1
+** addi\s+[atx][0-9]+,\s*[atx][0-9]+,\s*-1
+** and\s+a0,\s*[atx][0-9]+,\s*[atx][0-9]+
+** slli\s+a0,\s*a0,\s*48
+** srli\s+a0,\s*a0,\s*48
+** ret
+*/
+DEF_SAT_U_SUB_FMT_3(uint16_t)
+
+/* { dg-final { scan-rtl-dump-times ".SAT_SUB " 2 "expand" } } */
diff --git a/gcc/testsuite/gcc.target/riscv/sat_u_sub-11.c b/gcc/testsuite/gcc.target/riscv/sat_u_sub-11.c
new file mode 100644
index 000..84e34657f55
--- /dev/null
+++ b/gcc/testsuite/gcc.target/riscv/sat_u_sub-11.c
@@ -0,0 +1,18 @@
+/* { dg-do compile } */
+/* { dg-options "-march=rv64gc -mabi=lp64d -O3 -fdump-rtl-expand-details -fno-schedule-insns -fno-schedule-insns2" } */
+/* { dg-final { check-function-bodies "**" "" } } */
+
+#include 

Re: [PATCH v1 8/8] RISC-V: Add testcases for scalar unsigned SAT_SUB form 10

2024-06-13 Thread juzhe.zh...@rivai.ai
LGTM



juzhe.zh...@rivai.ai
 
From: pan2.li
Date: 2024-06-14 10:13
To: gcc-patches
CC: juzhe.zhong; kito.cheng; jeffreyalaw; rdapp.gcc; Pan Li
Subject: [PATCH v1 8/8] RISC-V: Add testcases for scalar unsigned SAT_SUB form 10
From: Pan Li 
 
After the middle-end supports form 10 of unsigned SAT_SUB and
the RISC-V backend implements the scalar .SAT_SUB, add more test
cases to cover form 10 of unsigned .SAT_SUB.

Form 10:
  #define SAT_SUB_U_10(T) \
  T sat_sub_u_10_##T (T x, T y) \
  { \
    T ret; \
    T overflow = __builtin_sub_overflow (x, y, &ret); \
    return !overflow ? ret : 0; \
  }

Passed the rv64gcv full regression test.
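
As a quick reminder of what the builtin does here, the sketch below (hypothetical names; `bool` is used for the flag instead of `T` purely for readability) writes form 10 out for uint8_t and checks the wrapped value the builtin stores when the subtraction borrows. It assumes a compiler that provides `__builtin_sub_overflow`, such as GCC or Clang:

```c
#include <assert.h>
#include <stdbool.h>
#include <stdint.h>

/* Form 10 written out for uint8_t (hypothetical name).  */
uint8_t
sat_sub_u8_form10 (uint8_t x, uint8_t y)
{
  uint8_t ret;
  /* True when x - y does not fit in uint8_t, i.e. when x < y;
     ret then holds the result wrapped modulo 256.  */
  bool overflow = __builtin_sub_overflow (x, y, &ret);
  return !overflow ? ret : 0;
}

/* Self-check of the builtin's flag and wrapped result.  */
void
check_form10 (void)
{
  uint8_t wrapped;
  bool overflow = __builtin_sub_overflow ((uint8_t) 3, (uint8_t) 5, &wrapped);
  assert (overflow);
  assert (wrapped == 254);
  assert (sat_sub_u8_form10 (3, 5) == 0);
  assert (sat_sub_u8_form10 (5, 3) == 2);
}
```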
 
gcc/testsuite/ChangeLog:
 
* gcc.target/riscv/sat_arith.h: Add helper macro for test.
* gcc.target/riscv/sat_u_sub-37.c: New test.
* gcc.target/riscv/sat_u_sub-38.c: New test.
* gcc.target/riscv/sat_u_sub-39.c: New test.
* gcc.target/riscv/sat_u_sub-40.c: New test.
* gcc.target/riscv/sat_u_sub-run-37.c: New test.
* gcc.target/riscv/sat_u_sub-run-38.c: New test.
* gcc.target/riscv/sat_u_sub-run-39.c: New test.
* gcc.target/riscv/sat_u_sub-run-40.c: New test.
 
Signed-off-by: Pan Li 
---
gcc/testsuite/gcc.target/riscv/sat_arith.h| 10 
gcc/testsuite/gcc.target/riscv/sat_u_sub-37.c | 18 +
gcc/testsuite/gcc.target/riscv/sat_u_sub-38.c | 19 ++
gcc/testsuite/gcc.target/riscv/sat_u_sub-39.c | 18 +
gcc/testsuite/gcc.target/riscv/sat_u_sub-40.c | 17 +
.../gcc.target/riscv/sat_u_sub-run-37.c   | 25 +++
.../gcc.target/riscv/sat_u_sub-run-38.c   | 25 +++
.../gcc.target/riscv/sat_u_sub-run-39.c   | 25 +++
.../gcc.target/riscv/sat_u_sub-run-40.c   | 25 +++
9 files changed, 182 insertions(+)
create mode 100644 gcc/testsuite/gcc.target/riscv/sat_u_sub-37.c
create mode 100644 gcc/testsuite/gcc.target/riscv/sat_u_sub-38.c
create mode 100644 gcc/testsuite/gcc.target/riscv/sat_u_sub-39.c
create mode 100644 gcc/testsuite/gcc.target/riscv/sat_u_sub-40.c
create mode 100644 gcc/testsuite/gcc.target/riscv/sat_u_sub-run-37.c
create mode 100644 gcc/testsuite/gcc.target/riscv/sat_u_sub-run-38.c
create mode 100644 gcc/testsuite/gcc.target/riscv/sat_u_sub-run-39.c
create mode 100644 gcc/testsuite/gcc.target/riscv/sat_u_sub-run-40.c
 
diff --git a/gcc/testsuite/gcc.target/riscv/sat_arith.h b/gcc/testsuite/gcc.target/riscv/sat_arith.h
index ecb74e56e9c..4c02783e845 100644
--- a/gcc/testsuite/gcc.target/riscv/sat_arith.h
+++ b/gcc/testsuite/gcc.target/riscv/sat_arith.h
@@ -147,6 +147,15 @@ sat_u_sub_##T##_fmt_9 (T x, T y)\
   return overflow ? 0 : ret;\
}
+#define DEF_SAT_U_SUB_FMT_10(T) \
+T __attribute__((noinline)) \
+sat_u_sub_##T##_fmt_10 (T x, T y)   \
+{   \
+  T ret;\
+  T overflow = __builtin_sub_overflow (x, y, &ret); \
+  return !overflow ? ret : 0;   \
+}
+
#define RUN_SAT_U_SUB_FMT_1(T, x, y) sat_u_sub_##T##_fmt_1(x, y)
#define RUN_SAT_U_SUB_FMT_2(T, x, y) sat_u_sub_##T##_fmt_2(x, y)
#define RUN_SAT_U_SUB_FMT_3(T, x, y) sat_u_sub_##T##_fmt_3(x, y)
@@ -156,6 +165,7 @@ sat_u_sub_##T##_fmt_9 (T x, T y)\
#define RUN_SAT_U_SUB_FMT_7(T, x, y) sat_u_sub_##T##_fmt_7(x, y)
#define RUN_SAT_U_SUB_FMT_8(T, x, y) sat_u_sub_##T##_fmt_8(x, y)
#define RUN_SAT_U_SUB_FMT_9(T, x, y) sat_u_sub_##T##_fmt_9(x, y)
+#define RUN_SAT_U_SUB_FMT_10(T, x, y) sat_u_sub_##T##_fmt_10(x, y)
#define DEF_VEC_SAT_U_SUB_FMT_1(T)   \
void __attribute__((noinline))   \
diff --git a/gcc/testsuite/gcc.target/riscv/sat_u_sub-37.c b/gcc/testsuite/gcc.target/riscv/sat_u_sub-37.c
new file mode 100644
index 000..8c97a518d2c
--- /dev/null
+++ b/gcc/testsuite/gcc.target/riscv/sat_u_sub-37.c
@@ -0,0 +1,18 @@
+/* { dg-do compile } */
+/* { dg-options "-march=rv64gc -mabi=lp64d -O3 -fdump-rtl-expand-details -fno-schedule-insns -fno-schedule-insns2" } */
+/* { dg-final { check-function-bodies "**" "" } } */
+
+#include "sat_arith.h"
+
+/*
+** sat_u_sub_uint8_t_fmt_10:
+** sub\s+[atx][0-9]+,\s*a0,\s*a1
+** sltu\s+[atx][0-9]+,\s*[atx][0-9]+,\s*[atx][0-9]+
+** addi\s+[atx][0-9]+,\s*[atx][0-9]+,\s*-1
+** and\s+a0,\s*a0,\s*[atx][0-9]+
+** andi\s+a0,\s*a0,\s*0xff
+** ret
+*/
+DEF_SAT_U_SUB_FMT_10(uint8_t)
+
+/* { dg-final { scan-rtl-dump-times ".SAT_SUB " 2 "expand" } } */
diff --git a/gcc/testsuite/gcc.target/riscv/sat_u_sub-38.c b/gcc/testsuite/gcc.target/riscv/sat_u_sub-38.c
new file mode 100644
index 000..7e3cec2a9a7
--- /dev/null
+++ b/gcc/testsuite/gcc.target/riscv/sat_u_sub-38.c
@@ -0,0 +1,19 @@
+/* { dg-do compile } */
+/* { dg-options "-march=rv64gc -mabi=lp64d -O3 -fdump-rtl-expand-details -fno-schedule-insns -fno-schedule-insns2" } */
+/* { dg-final { check-function-bodies "**" "" } } 

Re: [PATCH] rs6000: Compute rop_hash_save_offset for non-Altivec compiles [PR115389]

2024-06-13 Thread Kewen.Lin
on 2024/6/13 21:24, Peter Bergner wrote:
> On 6/13/24 12:35 AM, Kewen.Lin wrote:
>>> @@ -826,7 +826,14 @@ rs6000_stack_info (void)
>>>   info->ehrd_offset -= info->rop_hash_size;
>>> }
>>>else
>>> -   info->ehrd_offset = info->gp_save_offset - ehrd_size;
>>> +   {
>>> + info->ehrd_offset = info->gp_save_offset - ehrd_size;
>>> +
>>> + /* Adjust for ROP protection.  */
>>> + info->rop_hash_save_offset
>>> +   = info->gp_save_offset - info->rop_hash_size;
>>> + info->ehrd_offset -= info->rop_hash_size;
>>> +   }
>>
>> I understand this is just copied from the if arm, but if I read this right,
>> it can be simplified as:
> 
> Ok, I'll retest with that simplification.

Thanks!
>>> +/* { dg-do assemble } */
>>> +/* { dg-options "-O2 -mdejagnu-cpu=power10 -mrop-protect -mno-vsx -mno-altivec -mabi=no-altivec -save-temps" } */
>>
>> I'd expect -mabi=no-altivec is default for -mno-altivec, but specifying it
>> explicitly looks fine to me. :)
> 
> That's what I expected too! :-)  However, I was surprised to learn that
> -mno-altivec does *not* disable TARGET_ALTIVEC_ABI.  I had to explicitly use
> the -mabi= option to expose the bug.

Oh, that's surprising, I learned something today! :) I guess it's not
intentional, just that no one noticed it, as it seems nonsensical to have the
altivec ABI extension without using any altivec features.

BR,
Kewen



Re: [PATCH v1 6/8] RISC-V: Add testcases for scalar unsigned SAT_SUB form 8

2024-06-13 Thread juzhe.zh...@rivai.ai
LGTM



juzhe.zh...@rivai.ai
 
From: pan2.li
Date: 2024-06-14 10:13
To: gcc-patches
CC: juzhe.zhong; kito.cheng; jeffreyalaw; rdapp.gcc; Pan Li
Subject: [PATCH v1 6/8] RISC-V: Add testcases for scalar unsigned SAT_SUB form 8
From: Pan Li 
 
After the middle-end supports form 8 of unsigned SAT_SUB and
the RISC-V backend implements the scalar .SAT_SUB, add more test
cases to cover form 8 of unsigned .SAT_SUB.

Form 8:
  #define SAT_SUB_U_8(T) \
  T sat_sub_u_8_##T (T x, T y) \
  { \
    T ret; \
    T overflow = __builtin_sub_overflow (x, y, &ret); \
    return ret & (T)-(!overflow); \
  }

Passed the rv64gcv full regression test.
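
The mask in form 8 can be looked at in isolation. The following is a minimal sketch for uint64_t (the helper names are hypothetical and not part of sat_arith.h), assuming a compiler with `__builtin_sub_overflow`:

```c
#include <assert.h>
#include <stdbool.h>
#include <stdint.h>

/* !overflow is the int 1 or 0; negating it gives -1 or 0, and the
   cast to uint64_t turns that into all ones or all zeros.  */
uint64_t
no_overflow_mask_u64 (bool overflow)
{
  return (uint64_t) -(!overflow);
}

/* Form 8 written out for uint64_t (hypothetical name).  */
uint64_t
sat_sub_u64_form8 (uint64_t x, uint64_t y)
{
  uint64_t ret;
  bool overflow = __builtin_sub_overflow (x, y, &ret);
  return ret & no_overflow_mask_u64 (overflow);
}
```

With the mask split out, the saturating result is simply the wrapped difference ANDed with either all ones (keep it) or zero (clamp it).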
 
gcc/testsuite/ChangeLog:
 
* gcc.target/riscv/sat_arith.h: Add helper macro for test.
* gcc.target/riscv/sat_u_sub-29.c: New test.
* gcc.target/riscv/sat_u_sub-30.c: New test.
* gcc.target/riscv/sat_u_sub-31.c: New test.
* gcc.target/riscv/sat_u_sub-32.c: New test.
* gcc.target/riscv/sat_u_sub-run-29.c: New test.
* gcc.target/riscv/sat_u_sub-run-30.c: New test.
* gcc.target/riscv/sat_u_sub-run-31.c: New test.
* gcc.target/riscv/sat_u_sub-run-32.c: New test.
 
Signed-off-by: Pan Li 
---
gcc/testsuite/gcc.target/riscv/sat_arith.h| 10 
gcc/testsuite/gcc.target/riscv/sat_u_sub-29.c | 18 +
gcc/testsuite/gcc.target/riscv/sat_u_sub-30.c | 19 ++
gcc/testsuite/gcc.target/riscv/sat_u_sub-31.c | 18 +
gcc/testsuite/gcc.target/riscv/sat_u_sub-32.c | 17 +
.../gcc.target/riscv/sat_u_sub-run-29.c   | 25 +++
.../gcc.target/riscv/sat_u_sub-run-30.c   | 25 +++
.../gcc.target/riscv/sat_u_sub-run-31.c   | 25 +++
.../gcc.target/riscv/sat_u_sub-run-32.c   | 25 +++
9 files changed, 182 insertions(+)
create mode 100644 gcc/testsuite/gcc.target/riscv/sat_u_sub-29.c
create mode 100644 gcc/testsuite/gcc.target/riscv/sat_u_sub-30.c
create mode 100644 gcc/testsuite/gcc.target/riscv/sat_u_sub-31.c
create mode 100644 gcc/testsuite/gcc.target/riscv/sat_u_sub-32.c
create mode 100644 gcc/testsuite/gcc.target/riscv/sat_u_sub-run-29.c
create mode 100644 gcc/testsuite/gcc.target/riscv/sat_u_sub-run-30.c
create mode 100644 gcc/testsuite/gcc.target/riscv/sat_u_sub-run-31.c
create mode 100644 gcc/testsuite/gcc.target/riscv/sat_u_sub-run-32.c
 
diff --git a/gcc/testsuite/gcc.target/riscv/sat_arith.h b/gcc/testsuite/gcc.target/riscv/sat_arith.h
index bde054d5c9d..9f901de5cdf 100644
--- a/gcc/testsuite/gcc.target/riscv/sat_arith.h
+++ b/gcc/testsuite/gcc.target/riscv/sat_arith.h
@@ -129,6 +129,15 @@ sat_u_sub_##T##_fmt_7 (T x, T y)\
   return ret & (T)(overflow - 1);   \
}
+#define DEF_SAT_U_SUB_FMT_8(T)  \
+T __attribute__((noinline)) \
+sat_u_sub_##T##_fmt_8 (T x, T y)\
+{   \
+  T ret;\
+  T overflow = __builtin_sub_overflow (x, y, &ret); \
+  return ret & (T)-(!overflow); \
+}
+
#define RUN_SAT_U_SUB_FMT_1(T, x, y) sat_u_sub_##T##_fmt_1(x, y)
#define RUN_SAT_U_SUB_FMT_2(T, x, y) sat_u_sub_##T##_fmt_2(x, y)
#define RUN_SAT_U_SUB_FMT_3(T, x, y) sat_u_sub_##T##_fmt_3(x, y)
@@ -136,6 +145,7 @@ sat_u_sub_##T##_fmt_7 (T x, T y)\
#define RUN_SAT_U_SUB_FMT_5(T, x, y) sat_u_sub_##T##_fmt_5(x, y)
#define RUN_SAT_U_SUB_FMT_6(T, x, y) sat_u_sub_##T##_fmt_6(x, y)
#define RUN_SAT_U_SUB_FMT_7(T, x, y) sat_u_sub_##T##_fmt_7(x, y)
+#define RUN_SAT_U_SUB_FMT_8(T, x, y) sat_u_sub_##T##_fmt_8(x, y)
#define DEF_VEC_SAT_U_SUB_FMT_1(T)   \
void __attribute__((noinline))   \
diff --git a/gcc/testsuite/gcc.target/riscv/sat_u_sub-29.c b/gcc/testsuite/gcc.target/riscv/sat_u_sub-29.c
new file mode 100644
index 000..1a2da50256e
--- /dev/null
+++ b/gcc/testsuite/gcc.target/riscv/sat_u_sub-29.c
@@ -0,0 +1,18 @@
+/* { dg-do compile } */
+/* { dg-options "-march=rv64gc -mabi=lp64d -O3 -fdump-rtl-expand-details -fno-schedule-insns -fno-schedule-insns2" } */
+/* { dg-final { check-function-bodies "**" "" } } */
+
+#include "sat_arith.h"
+
+/*
+** sat_u_sub_uint8_t_fmt_8:
+** sub\s+[atx][0-9]+,\s*a0,\s*a1
+** sltu\s+[atx][0-9]+,\s*[atx][0-9]+,\s*[atx][0-9]+
+** addi\s+[atx][0-9]+,\s*[atx][0-9]+,\s*-1
+** and\s+a0,\s*a0,\s*[atx][0-9]+
+** andi\s+a0,\s*a0,\s*0xff
+** ret
+*/
+DEF_SAT_U_SUB_FMT_8(uint8_t)
+
+/* { dg-final { scan-rtl-dump-times ".SAT_SUB " 2 "expand" } } */
diff --git a/gcc/testsuite/gcc.target/riscv/sat_u_sub-30.c b/gcc/testsuite/gcc.target/riscv/sat_u_sub-30.c
new file mode 100644
index 000..75aa7506369
--- /dev/null
+++ b/gcc/testsuite/gcc.target/riscv/sat_u_sub-30.c
@@ -0,0 +1,19 @@
+/* { dg-do compile } */
+/* { dg-options "-march=rv64gc -mabi=lp64d -O3 -fdump-rtl-expand-details -fno-schedule-insns -fno-schedule-insns2" } */
+/* { dg-final { check-function-bodies "**" "" } } */
+

Re: [PATCH v1 2/8] RISC-V: Add testcases for scalar unsigned SAT_SUB form 4

2024-06-13 Thread juzhe.zh...@rivai.ai
LGTM



juzhe.zh...@rivai.ai
 
From: pan2.li
Date: 2024-06-14 10:13
To: gcc-patches
CC: juzhe.zhong; kito.cheng; jeffreyalaw; rdapp.gcc; Pan Li
Subject: [PATCH v1 2/8] RISC-V: Add testcases for scalar unsigned SAT_SUB form 4
From: Pan Li 
 
After the middle-end supports form 4 of unsigned SAT_SUB and
the RISC-V backend implements the scalar .SAT_SUB, add more test
cases to cover form 4 of unsigned .SAT_SUB.

Form 4:
  #define SAT_SUB_U_4(T) \
  T sat_sub_u_4_##T (T x, T y) \
  { \
    return x >= y ? x - y : 0; \
  }

Passed the rv64gcv full regression test.
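
The uint8_t test in this patch expects the sequence sub, sltu, addi -1, and, andi 0xff. A minimal sketch of why that sequence computes form 4 follows; the function names are hypothetical, and the data flow between the instructions is an assumption, since the test only matches registers with wildcards:

```c
#include <assert.h>
#include <stdint.h>

/* Form 4 written out for uint8_t (hypothetical name).  */
uint8_t
sat_sub_u8_form4 (uint8_t x, uint8_t y)
{
  return x >= y ? x - y : 0;
}

/* C model of the expected rv64 sequence, one statement per instruction.
   The operand wiring is assumed, not read from compiler output.  */
uint8_t
sat_sub_u8_rv64_model (uint8_t x, uint8_t y)
{
  uint64_t a0 = x, a1 = y;        /* zero-extended arguments */
  uint64_t diff = a0 - a1;        /* sub */
  uint64_t lt = a0 < a1;          /* sltu: 1 when the subtraction borrows */
  uint64_t mask = lt - 1;         /* addi -1: all ones if no borrow, else 0 */
  uint64_t res = diff & mask;     /* and */
  return (uint8_t) (res & 0xff);  /* andi 0xff */
}

/* Returns 1 if the model matches form 4 on all 65536 input pairs.  */
int
model_matches_form4 (void)
{
  for (unsigned x = 0; x <= UINT8_MAX; x++)
    for (unsigned y = 0; y <= UINT8_MAX; y++)
      if (sat_sub_u8_form4 (x, y) != sat_sub_u8_rv64_model (x, y))
        return 0;
  return 1;
}
```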
 
gcc/testsuite/ChangeLog:
 
* gcc.target/riscv/sat_arith.h: Add helper macro for test.
* gcc.target/riscv/sat_u_sub-13.c: New test.
* gcc.target/riscv/sat_u_sub-14.c: New test.
* gcc.target/riscv/sat_u_sub-15.c: New test.
* gcc.target/riscv/sat_u_sub-16.c: New test.
* gcc.target/riscv/sat_u_sub-run-13.c: New test.
* gcc.target/riscv/sat_u_sub-run-14.c: New test.
* gcc.target/riscv/sat_u_sub-run-15.c: New test.
* gcc.target/riscv/sat_u_sub-run-16.c: New test.
 
Signed-off-by: Pan Li 
---
gcc/testsuite/gcc.target/riscv/sat_arith.h|  8 ++
gcc/testsuite/gcc.target/riscv/sat_u_sub-13.c | 18 +
gcc/testsuite/gcc.target/riscv/sat_u_sub-14.c | 19 ++
gcc/testsuite/gcc.target/riscv/sat_u_sub-15.c | 18 +
gcc/testsuite/gcc.target/riscv/sat_u_sub-16.c | 17 +
.../gcc.target/riscv/sat_u_sub-run-13.c   | 25 +++
.../gcc.target/riscv/sat_u_sub-run-14.c   | 25 +++
.../gcc.target/riscv/sat_u_sub-run-15.c   | 25 +++
.../gcc.target/riscv/sat_u_sub-run-16.c   | 25 +++
9 files changed, 180 insertions(+)
create mode 100644 gcc/testsuite/gcc.target/riscv/sat_u_sub-13.c
create mode 100644 gcc/testsuite/gcc.target/riscv/sat_u_sub-14.c
create mode 100644 gcc/testsuite/gcc.target/riscv/sat_u_sub-15.c
create mode 100644 gcc/testsuite/gcc.target/riscv/sat_u_sub-16.c
create mode 100644 gcc/testsuite/gcc.target/riscv/sat_u_sub-run-13.c
create mode 100644 gcc/testsuite/gcc.target/riscv/sat_u_sub-run-14.c
create mode 100644 gcc/testsuite/gcc.target/riscv/sat_u_sub-run-15.c
create mode 100644 gcc/testsuite/gcc.target/riscv/sat_u_sub-run-16.c
 
diff --git a/gcc/testsuite/gcc.target/riscv/sat_arith.h b/gcc/testsuite/gcc.target/riscv/sat_arith.h
index 50c65cdea49..b2f8478d36b 100644
--- a/gcc/testsuite/gcc.target/riscv/sat_arith.h
+++ b/gcc/testsuite/gcc.target/riscv/sat_arith.h
@@ -99,9 +99,17 @@ sat_u_sub_##T##_fmt_3 (T x, T y)  \
   return x > y ? x - y : 0;   \
}
+#define DEF_SAT_U_SUB_FMT_4(T)   \
+T __attribute__((noinline))  \
+sat_u_sub_##T##_fmt_4 (T x, T y) \
+{\
+  return x >= y ? x - y : 0; \
+}
+
#define RUN_SAT_U_SUB_FMT_1(T, x, y) sat_u_sub_##T##_fmt_1(x, y)
#define RUN_SAT_U_SUB_FMT_2(T, x, y) sat_u_sub_##T##_fmt_2(x, y)
#define RUN_SAT_U_SUB_FMT_3(T, x, y) sat_u_sub_##T##_fmt_3(x, y)
+#define RUN_SAT_U_SUB_FMT_4(T, x, y) sat_u_sub_##T##_fmt_4(x, y)
#define DEF_VEC_SAT_U_SUB_FMT_1(T)   \
void __attribute__((noinline))   \
diff --git a/gcc/testsuite/gcc.target/riscv/sat_u_sub-13.c b/gcc/testsuite/gcc.target/riscv/sat_u_sub-13.c
new file mode 100644
index 000..edb7017f9b1
--- /dev/null
+++ b/gcc/testsuite/gcc.target/riscv/sat_u_sub-13.c
@@ -0,0 +1,18 @@
+/* { dg-do compile } */
+/* { dg-options "-march=rv64gc -mabi=lp64d -O3 -fdump-rtl-expand-details -fno-schedule-insns -fno-schedule-insns2" } */
+/* { dg-final { check-function-bodies "**" "" } } */
+
+#include "sat_arith.h"
+
+/*
+** sat_u_sub_uint8_t_fmt_4:
+** sub\s+[atx][0-9]+,\s*a0,\s*a1
+** sltu\s+[atx][0-9]+,\s*[atx][0-9]+,\s*[atx][0-9]+
+** addi\s+[atx][0-9]+,\s*[atx][0-9]+,\s*-1
+** and\s+a0,\s*a0,\s*[atx][0-9]+
+** andi\s+a0,\s*a0,\s*0xff
+** ret
+*/
+DEF_SAT_U_SUB_FMT_4(uint8_t)
+
+/* { dg-final { scan-rtl-dump-times ".SAT_SUB " 2 "expand" } } */
diff --git a/gcc/testsuite/gcc.target/riscv/sat_u_sub-14.c b/gcc/testsuite/gcc.target/riscv/sat_u_sub-14.c
new file mode 100644
index 000..2aab9f65586
--- /dev/null
+++ b/gcc/testsuite/gcc.target/riscv/sat_u_sub-14.c
@@ -0,0 +1,19 @@
+/* { dg-do compile } */
+/* { dg-options "-march=rv64gc -mabi=lp64d -O3 -fdump-rtl-expand-details -fno-schedule-insns -fno-schedule-insns2" } */
+/* { dg-final { check-function-bodies "**" "" } } */
+
+#include "sat_arith.h"
+
+/*
+** sat_u_sub_uint16_t_fmt_4:
+** sub\s+[atx][0-9]+,\s*a0,\s*a1
+** sltu\s+[atx][0-9]+,\s*a0,\s*a1
+** addi\s+[atx][0-9]+,\s*[atx][0-9]+,\s*-1
+** and\s+a0,\s*[atx][0-9]+,\s*[atx][0-9]+
+** slli\s+a0,\s*a0,\s*48
+** srli\s+a0,\s*a0,\s*48
+** ret
+*/
+DEF_SAT_U_SUB_FMT_4(uint16_t)
+
+/* { dg-final { scan-rtl-dump-times ".SAT_SUB " 2 "expand" } } */
diff --git a/gcc/testsuite/gcc.target/riscv/sat_u_sub-15.c b/gcc/testsuite/gcc.target/riscv/sat_u_sub-15.c
new file mode 100644
index 000..25ad702bf04
--- /dev/null
+++ 

Re: [PATCH v1 5/8] RISC-V: Add testcases for scalar unsigned SAT_SUB form 7

2024-06-13 Thread juzhe.zh...@rivai.ai
LGTM



juzhe.zh...@rivai.ai
 
From: pan2.li
Date: 2024-06-14 10:13
To: gcc-patches
CC: juzhe.zhong; kito.cheng; jeffreyalaw; rdapp.gcc; Pan Li
Subject: [PATCH v1 5/8] RISC-V: Add testcases for scalar unsigned SAT_SUB form 7
From: Pan Li 
 
After the middle-end supports form 7 of unsigned SAT_SUB and
the RISC-V backend implements the scalar .SAT_SUB, add more test
cases to cover form 7 of unsigned .SAT_SUB.

Form 7:
  #define SAT_SUB_U_7(T) \
  T sat_sub_u_7_##T (T x, T y) \
  { \
    T ret; \
    T overflow = __builtin_sub_overflow (x, y, &ret); \
    return ret & (T)(overflow - 1); \
  }

Passed the rv64gcv full regression test.
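
Form 7 relies on integer promotion for the narrow types, which is easy to miss. A minimal sketch for uint16_t follows (the function name is hypothetical; it assumes a compiler with `__builtin_sub_overflow`):

```c
#include <assert.h>
#include <stdint.h>

/* Form 7 written out for uint16_t (hypothetical name).  */
uint16_t
sat_sub_u16_form7 (uint16_t x, uint16_t y)
{
  uint16_t ret;
  uint16_t overflow = __builtin_sub_overflow (x, y, &ret);
  /* overflow is 0 or 1.  It is promoted to int, so overflow - 1 is
     -1 or 0, and the cast back to uint16_t yields 0xffff or 0.  */
  return ret & (uint16_t) (overflow - 1);
}
```

The mask is therefore all ones when the subtraction did not borrow and zero when it did, which is the opposite polarity of the flag itself.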
 
gcc/testsuite/ChangeLog:
 
* gcc.target/riscv/sat_arith.h: Add helper macro for test.
* gcc.target/riscv/sat_u_sub-25.c: New test.
* gcc.target/riscv/sat_u_sub-26.c: New test.
* gcc.target/riscv/sat_u_sub-27.c: New test.
* gcc.target/riscv/sat_u_sub-28.c: New test.
* gcc.target/riscv/sat_u_sub-run-25.c: New test.
* gcc.target/riscv/sat_u_sub-run-26.c: New test.
* gcc.target/riscv/sat_u_sub-run-27.c: New test.
* gcc.target/riscv/sat_u_sub-run-28.c: New test.
 
Signed-off-by: Pan Li 
---
gcc/testsuite/gcc.target/riscv/sat_arith.h| 10 
gcc/testsuite/gcc.target/riscv/sat_u_sub-25.c | 18 +
gcc/testsuite/gcc.target/riscv/sat_u_sub-26.c | 19 ++
gcc/testsuite/gcc.target/riscv/sat_u_sub-27.c | 18 +
gcc/testsuite/gcc.target/riscv/sat_u_sub-28.c | 17 +
.../gcc.target/riscv/sat_u_sub-run-25.c   | 25 +++
.../gcc.target/riscv/sat_u_sub-run-26.c   | 25 +++
.../gcc.target/riscv/sat_u_sub-run-27.c   | 25 +++
.../gcc.target/riscv/sat_u_sub-run-28.c   | 25 +++
9 files changed, 182 insertions(+)
create mode 100644 gcc/testsuite/gcc.target/riscv/sat_u_sub-25.c
create mode 100644 gcc/testsuite/gcc.target/riscv/sat_u_sub-26.c
create mode 100644 gcc/testsuite/gcc.target/riscv/sat_u_sub-27.c
create mode 100644 gcc/testsuite/gcc.target/riscv/sat_u_sub-28.c
create mode 100644 gcc/testsuite/gcc.target/riscv/sat_u_sub-run-25.c
create mode 100644 gcc/testsuite/gcc.target/riscv/sat_u_sub-run-26.c
create mode 100644 gcc/testsuite/gcc.target/riscv/sat_u_sub-run-27.c
create mode 100644 gcc/testsuite/gcc.target/riscv/sat_u_sub-run-28.c
 
diff --git a/gcc/testsuite/gcc.target/riscv/sat_arith.h b/gcc/testsuite/gcc.target/riscv/sat_arith.h
index 4296235cf62..bde054d5c9d 100644
--- a/gcc/testsuite/gcc.target/riscv/sat_arith.h
+++ b/gcc/testsuite/gcc.target/riscv/sat_arith.h
@@ -120,12 +120,22 @@ sat_u_sub_##T##_fmt_6 (T x, T y) \
   return x <= y ? 0 : x - y; \
}
+#define DEF_SAT_U_SUB_FMT_7(T)  \
+T __attribute__((noinline)) \
+sat_u_sub_##T##_fmt_7 (T x, T y)\
+{   \
+  T ret;\
+  T overflow = __builtin_sub_overflow (x, y, &ret); \
+  return ret & (T)(overflow - 1);   \
+}
+
#define RUN_SAT_U_SUB_FMT_1(T, x, y) sat_u_sub_##T##_fmt_1(x, y)
#define RUN_SAT_U_SUB_FMT_2(T, x, y) sat_u_sub_##T##_fmt_2(x, y)
#define RUN_SAT_U_SUB_FMT_3(T, x, y) sat_u_sub_##T##_fmt_3(x, y)
#define RUN_SAT_U_SUB_FMT_4(T, x, y) sat_u_sub_##T##_fmt_4(x, y)
#define RUN_SAT_U_SUB_FMT_5(T, x, y) sat_u_sub_##T##_fmt_5(x, y)
#define RUN_SAT_U_SUB_FMT_6(T, x, y) sat_u_sub_##T##_fmt_6(x, y)
+#define RUN_SAT_U_SUB_FMT_7(T, x, y) sat_u_sub_##T##_fmt_7(x, y)
#define DEF_VEC_SAT_U_SUB_FMT_1(T)   \
void __attribute__((noinline))   \
diff --git a/gcc/testsuite/gcc.target/riscv/sat_u_sub-25.c b/gcc/testsuite/gcc.target/riscv/sat_u_sub-25.c
new file mode 100644
index 000..8780ef0c8f1
--- /dev/null
+++ b/gcc/testsuite/gcc.target/riscv/sat_u_sub-25.c
@@ -0,0 +1,18 @@
+/* { dg-do compile } */
+/* { dg-options "-march=rv64gc -mabi=lp64d -O3 -fdump-rtl-expand-details -fno-schedule-insns -fno-schedule-insns2" } */
+/* { dg-final { check-function-bodies "**" "" } } */
+
+#include "sat_arith.h"
+
+/*
+** sat_u_sub_uint8_t_fmt_7:
+** sub\s+[atx][0-9]+,\s*a0,\s*a1
+** sltu\s+[atx][0-9]+,\s*[atx][0-9]+,\s*[atx][0-9]+
+** addi\s+[atx][0-9]+,\s*[atx][0-9]+,\s*-1
+** and\s+a0,\s*a0,\s*[atx][0-9]+
+** andi\s+a0,\s*a0,\s*0xff
+** ret
+*/
+DEF_SAT_U_SUB_FMT_7(uint8_t)
+
+/* { dg-final { scan-rtl-dump-times ".SAT_SUB " 2 "expand" } } */
diff --git a/gcc/testsuite/gcc.target/riscv/sat_u_sub-26.c b/gcc/testsuite/gcc.target/riscv/sat_u_sub-26.c
new file mode 100644
index 000..f720f619d09
--- /dev/null
+++ b/gcc/testsuite/gcc.target/riscv/sat_u_sub-26.c
@@ -0,0 +1,19 @@
+/* { dg-do compile } */
+/* { dg-options "-march=rv64gc -mabi=lp64d -O3 -fdump-rtl-expand-details -fno-schedule-insns -fno-schedule-insns2" } */
+/* { dg-final { check-function-bodies "**" "" } } */
+
+#include "sat_arith.h"
+
+/*
+** sat_u_sub_uint16_t_fmt_7:
+** sub\s+[atx][0-9]+,\s*a0,\s*a1
+** 

Re: [PATCH v1 4/8] RISC-V: Add testcases for scalar unsigned SAT_SUB form 6

2024-06-13 Thread juzhe.zh...@rivai.ai
LGTM



juzhe.zh...@rivai.ai
 
From: pan2.li
Date: 2024-06-14 10:13
To: gcc-patches
CC: juzhe.zhong; kito.cheng; jeffreyalaw; rdapp.gcc; Pan Li
Subject: [PATCH v1 4/8] RISC-V: Add testcases for scalar unsigned SAT_SUB form 6
From: Pan Li 
 
After the middle-end supports form 6 of unsigned SAT_SUB and
the RISC-V backend implements the scalar .SAT_SUB, add more test
cases to cover form 6 of unsigned .SAT_SUB.

Form 6:
  #define SAT_SUB_U_6(T) \
  T sat_sub_u_6_##T (T x, T y) \
  { \
    return x <= y ? 0 : x - y; \
  }
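
The run tests added by this patch are not quoted here, so the following is only a hypothetical table-driven check of form 6 at the uint16_t boundaries; the struct, table, and function names are illustrative and do not reflect the contents of the sat_u_sub-run-*.c files:

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* Form 6 written out for uint16_t (hypothetical name).  */
uint16_t
sat_sub_u16_form6 (uint16_t x, uint16_t y)
{
  return x <= y ? 0 : x - y;
}

/* One input pair and its expected saturated difference.  */
struct sat_case
{
  uint16_t x, y, expect;
};

/* Boundary inputs: zero, one, and the values around UINT16_MAX.  */
static const struct sat_case sat_cases[] = {
  { 0, 0, 0 },
  { 0, 1, 0 },
  { 1, 0, 1 },
  { 0, UINT16_MAX, 0 },
  { UINT16_MAX, 0, UINT16_MAX },
  { UINT16_MAX, UINT16_MAX, 0 },
  { UINT16_MAX, UINT16_MAX - 1, 1 },
  { UINT16_MAX - 1, UINT16_MAX, 0 },
};

/* Returns the number of table entries whose result is wrong.  */
size_t
count_sat_failures (void)
{
  size_t failures = 0;
  for (size_t i = 0; i < sizeof sat_cases / sizeof sat_cases[0]; i++)
    if (sat_sub_u16_form6 (sat_cases[i].x, sat_cases[i].y)
        != sat_cases[i].expect)
      failures++;
  return failures;
}
```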
 
gcc/testsuite/ChangeLog:
 
* gcc.target/riscv/sat_arith.h: Add helper macro for test.
* gcc.target/riscv/sat_u_sub-21.c: New test.
* gcc.target/riscv/sat_u_sub-22.c: New test.
* gcc.target/riscv/sat_u_sub-23.c: New test.
* gcc.target/riscv/sat_u_sub-24.c: New test.
* gcc.target/riscv/sat_u_sub-run-21.c: New test.
* gcc.target/riscv/sat_u_sub-run-22.c: New test.
* gcc.target/riscv/sat_u_sub-run-23.c: New test.
* gcc.target/riscv/sat_u_sub-run-24.c: New test.
 
Signed-off-by: Pan Li 
---
gcc/testsuite/gcc.target/riscv/sat_arith.h|  8 ++
gcc/testsuite/gcc.target/riscv/sat_u_sub-21.c | 18 +
gcc/testsuite/gcc.target/riscv/sat_u_sub-22.c | 19 ++
gcc/testsuite/gcc.target/riscv/sat_u_sub-23.c | 18 +
gcc/testsuite/gcc.target/riscv/sat_u_sub-24.c | 17 +
.../gcc.target/riscv/sat_u_sub-run-21.c   | 25 +++
.../gcc.target/riscv/sat_u_sub-run-22.c   | 25 +++
.../gcc.target/riscv/sat_u_sub-run-23.c   | 25 +++
.../gcc.target/riscv/sat_u_sub-run-24.c   | 25 +++
9 files changed, 180 insertions(+)
create mode 100644 gcc/testsuite/gcc.target/riscv/sat_u_sub-21.c
create mode 100644 gcc/testsuite/gcc.target/riscv/sat_u_sub-22.c
create mode 100644 gcc/testsuite/gcc.target/riscv/sat_u_sub-23.c
create mode 100644 gcc/testsuite/gcc.target/riscv/sat_u_sub-24.c
create mode 100644 gcc/testsuite/gcc.target/riscv/sat_u_sub-run-21.c
create mode 100644 gcc/testsuite/gcc.target/riscv/sat_u_sub-run-22.c
create mode 100644 gcc/testsuite/gcc.target/riscv/sat_u_sub-run-23.c
create mode 100644 gcc/testsuite/gcc.target/riscv/sat_u_sub-run-24.c
 
diff --git a/gcc/testsuite/gcc.target/riscv/sat_arith.h b/gcc/testsuite/gcc.target/riscv/sat_arith.h
index d08755dd861..4296235cf62 100644
--- a/gcc/testsuite/gcc.target/riscv/sat_arith.h
+++ b/gcc/testsuite/gcc.target/riscv/sat_arith.h
@@ -113,11 +113,19 @@ sat_u_sub_##T##_fmt_5 (T x, T y) \
   return x < y ? 0 : x - y;  \
}
+#define DEF_SAT_U_SUB_FMT_6(T)   \
+T __attribute__((noinline))  \
+sat_u_sub_##T##_fmt_6 (T x, T y) \
+{\
+  return x <= y ? 0 : x - y; \
+}
+
#define RUN_SAT_U_SUB_FMT_1(T, x, y) sat_u_sub_##T##_fmt_1(x, y)
#define RUN_SAT_U_SUB_FMT_2(T, x, y) sat_u_sub_##T##_fmt_2(x, y)
#define RUN_SAT_U_SUB_FMT_3(T, x, y) sat_u_sub_##T##_fmt_3(x, y)
#define RUN_SAT_U_SUB_FMT_4(T, x, y) sat_u_sub_##T##_fmt_4(x, y)
#define RUN_SAT_U_SUB_FMT_5(T, x, y) sat_u_sub_##T##_fmt_5(x, y)
+#define RUN_SAT_U_SUB_FMT_6(T, x, y) sat_u_sub_##T##_fmt_6(x, y)
#define DEF_VEC_SAT_U_SUB_FMT_1(T)   \
void __attribute__((noinline))   \
diff --git a/gcc/testsuite/gcc.target/riscv/sat_u_sub-21.c b/gcc/testsuite/gcc.target/riscv/sat_u_sub-21.c
new file mode 100644
index 000..9a8fb7f1c91
--- /dev/null
+++ b/gcc/testsuite/gcc.target/riscv/sat_u_sub-21.c
@@ -0,0 +1,18 @@
+/* { dg-do compile } */
+/* { dg-options "-march=rv64gc -mabi=lp64d -O3 -fdump-rtl-expand-details -fno-schedule-insns -fno-schedule-insns2" } */
+/* { dg-final { check-function-bodies "**" "" } } */
+
+#include "sat_arith.h"
+
+/*
+** sat_u_sub_uint8_t_fmt_6:
+** sub\s+[atx][0-9]+,\s*a0,\s*a1
+** sltu\s+[atx][0-9]+,\s*[atx][0-9]+,\s*[atx][0-9]+
+** addi\s+[atx][0-9]+,\s*[atx][0-9]+,\s*-1
+** and\s+a0,\s*a0,\s*[atx][0-9]+
+** andi\s+a0,\s*a0,\s*0xff
+** ret
+*/
+DEF_SAT_U_SUB_FMT_6(uint8_t)
+
+/* { dg-final { scan-rtl-dump-times ".SAT_SUB " 2 "expand" } } */
diff --git a/gcc/testsuite/gcc.target/riscv/sat_u_sub-22.c b/gcc/testsuite/gcc.target/riscv/sat_u_sub-22.c
new file mode 100644
index 000..6182169edc5
--- /dev/null
+++ b/gcc/testsuite/gcc.target/riscv/sat_u_sub-22.c
@@ -0,0 +1,19 @@
+/* { dg-do compile } */
+/* { dg-options "-march=rv64gc -mabi=lp64d -O3 -fdump-rtl-expand-details -fno-schedule-insns -fno-schedule-insns2" } */
+/* { dg-final { check-function-bodies "**" "" } } */
+
+#include "sat_arith.h"
+
+/*
+** sat_u_sub_uint16_t_fmt_6:
+** sub\s+[atx][0-9]+,\s*a0,\s*a1
+** sltu\s+[atx][0-9]+,\s*a0,\s*a1
+** addi\s+[atx][0-9]+,\s*[atx][0-9]+,\s*-1
+** and\s+a0,\s*[atx][0-9]+,\s*[atx][0-9]+
+** slli\s+a0,\s*a0,\s*48
+** srli\s+a0,\s*a0,\s*48
+** ret
+*/
+DEF_SAT_U_SUB_FMT_6(uint16_t)
+
+/* { dg-final { scan-rtl-dump-times ".SAT_SUB " 2 "expand" } } */
diff --git a/gcc/testsuite/gcc.target/riscv/sat_u_sub-23.c 
b/gcc/testsuite/gcc.target/riscv/sat_u_sub-23.c
new 

Re: [PATCH v1 7/8] RISC-V: Add testcases for scalar unsigned SAT_SUB form 9

2024-06-13 Thread juzhe.zh...@rivai.ai
LGTM



juzhe.zh...@rivai.ai
 
From: pan2.li
Date: 2024-06-14 10:13
To: gcc-patches
CC: juzhe.zhong; kito.cheng; jeffreyalaw; rdapp.gcc; Pan Li
Subject: [PATCH v1 7/8] RISC-V: Add testcases for scalar unsigned SAT_SUB form 9
From: Pan Li 
 
After the middle-end supports form 9 of unsigned SAT_SUB and
the RISC-V backend implements the scalar .SAT_SUB, add more test
cases to cover form 9 of unsigned .SAT_SUB.
 
Form 9:
  #define SAT_SUB_U_9(T) \
  T sat_sub_u_9_##T (T x, T y) \
  { \
    T ret; \
    T overflow = __builtin_sub_overflow (x, y, &ret); \
    return overflow ? 0 : ret; \
  }
 
Passed the rv64gcv full regression test.
 
gcc/testsuite/ChangeLog:
 
* gcc.target/riscv/sat_arith.h: Add helper macro for test.
* gcc.target/riscv/sat_u_sub-33.c: New test.
* gcc.target/riscv/sat_u_sub-34.c: New test.
* gcc.target/riscv/sat_u_sub-35.c: New test.
* gcc.target/riscv/sat_u_sub-36.c: New test.
* gcc.target/riscv/sat_u_sub-run-33.c: New test.
* gcc.target/riscv/sat_u_sub-run-34.c: New test.
* gcc.target/riscv/sat_u_sub-run-35.c: New test.
* gcc.target/riscv/sat_u_sub-run-36.c: New test.
 
Signed-off-by: Pan Li 
---
gcc/testsuite/gcc.target/riscv/sat_arith.h| 10 
gcc/testsuite/gcc.target/riscv/sat_u_sub-33.c | 18 +
gcc/testsuite/gcc.target/riscv/sat_u_sub-34.c | 19 ++
gcc/testsuite/gcc.target/riscv/sat_u_sub-35.c | 18 +
gcc/testsuite/gcc.target/riscv/sat_u_sub-36.c | 17 +
.../gcc.target/riscv/sat_u_sub-run-33.c   | 25 +++
.../gcc.target/riscv/sat_u_sub-run-34.c   | 25 +++
.../gcc.target/riscv/sat_u_sub-run-35.c   | 25 +++
.../gcc.target/riscv/sat_u_sub-run-36.c   | 25 +++
9 files changed, 182 insertions(+)
create mode 100644 gcc/testsuite/gcc.target/riscv/sat_u_sub-33.c
create mode 100644 gcc/testsuite/gcc.target/riscv/sat_u_sub-34.c
create mode 100644 gcc/testsuite/gcc.target/riscv/sat_u_sub-35.c
create mode 100644 gcc/testsuite/gcc.target/riscv/sat_u_sub-36.c
create mode 100644 gcc/testsuite/gcc.target/riscv/sat_u_sub-run-33.c
create mode 100644 gcc/testsuite/gcc.target/riscv/sat_u_sub-run-34.c
create mode 100644 gcc/testsuite/gcc.target/riscv/sat_u_sub-run-35.c
create mode 100644 gcc/testsuite/gcc.target/riscv/sat_u_sub-run-36.c
 
diff --git a/gcc/testsuite/gcc.target/riscv/sat_arith.h 
b/gcc/testsuite/gcc.target/riscv/sat_arith.h
index 9f901de5cdf..ecb74e56e9c 100644
--- a/gcc/testsuite/gcc.target/riscv/sat_arith.h
+++ b/gcc/testsuite/gcc.target/riscv/sat_arith.h
@@ -138,6 +138,15 @@ sat_u_sub_##T##_fmt_8 (T x, T y)\
   return ret & (T)-(!overflow); \
}
+#define DEF_SAT_U_SUB_FMT_9(T)  \
+T __attribute__((noinline)) \
+sat_u_sub_##T##_fmt_9 (T x, T y)\
+{   \
+  T ret;\
+  T overflow = __builtin_sub_overflow (x, y, &ret); \
+  return overflow ? 0 : ret;\
+}
+
#define RUN_SAT_U_SUB_FMT_1(T, x, y) sat_u_sub_##T##_fmt_1(x, y)
#define RUN_SAT_U_SUB_FMT_2(T, x, y) sat_u_sub_##T##_fmt_2(x, y)
#define RUN_SAT_U_SUB_FMT_3(T, x, y) sat_u_sub_##T##_fmt_3(x, y)
@@ -146,6 +155,7 @@ sat_u_sub_##T##_fmt_8 (T x, T y)\
#define RUN_SAT_U_SUB_FMT_6(T, x, y) sat_u_sub_##T##_fmt_6(x, y)
#define RUN_SAT_U_SUB_FMT_7(T, x, y) sat_u_sub_##T##_fmt_7(x, y)
#define RUN_SAT_U_SUB_FMT_8(T, x, y) sat_u_sub_##T##_fmt_8(x, y)
+#define RUN_SAT_U_SUB_FMT_9(T, x, y) sat_u_sub_##T##_fmt_9(x, y)
#define DEF_VEC_SAT_U_SUB_FMT_1(T)   \
void __attribute__((noinline))   \
diff --git a/gcc/testsuite/gcc.target/riscv/sat_u_sub-33.c 
b/gcc/testsuite/gcc.target/riscv/sat_u_sub-33.c
new file mode 100644
index 000..aca4bd28b5d
--- /dev/null
+++ b/gcc/testsuite/gcc.target/riscv/sat_u_sub-33.c
@@ -0,0 +1,18 @@
+/* { dg-do compile } */
+/* { dg-options "-march=rv64gc -mabi=lp64d -O3 -fdump-rtl-expand-details 
-fno-schedule-insns -fno-schedule-insns2" } */
+/* { dg-final { check-function-bodies "**" "" } } */
+
+#include "sat_arith.h"
+
+/*
+** sat_u_sub_uint8_t_fmt_9:
+** sub\s+[atx][0-9]+,\s*a0,\s*a1
+** sltu\s+[atx][0-9]+,\s*[atx][0-9]+,\s*[atx][0-9]+
+** addi\s+[atx][0-9]+,\s*[atx][0-9]+,\s*-1
+** and\s+a0,\s*a0,\s*[atx][0-9]+
+** andi\s+a0,\s*a0,\s*0xff
+** ret
+*/
+DEF_SAT_U_SUB_FMT_9(uint8_t)
+
+/* { dg-final { scan-rtl-dump-times ".SAT_SUB " 2 "expand" } } */
diff --git a/gcc/testsuite/gcc.target/riscv/sat_u_sub-34.c 
b/gcc/testsuite/gcc.target/riscv/sat_u_sub-34.c
new file mode 100644
index 000..f87a51a504b
--- /dev/null
+++ b/gcc/testsuite/gcc.target/riscv/sat_u_sub-34.c
@@ -0,0 +1,19 @@
+/* { dg-do compile } */
+/* { dg-options "-march=rv64gc -mabi=lp64d -O3 -fdump-rtl-expand-details 
-fno-schedule-insns -fno-schedule-insns2" } */
+/* { dg-final { check-function-bodies "**" "" } } */
+

Re: [PATCH v1 3/8] RISC-V: Add testcases for scalar unsigned SAT_SUB form 5

2024-06-13 Thread juzhe.zh...@rivai.ai
LGTM



juzhe.zh...@rivai.ai
 
From: pan2.li
Date: 2024-06-14 10:13
To: gcc-patches
CC: juzhe.zhong; kito.cheng; jeffreyalaw; rdapp.gcc; Pan Li
Subject: [PATCH v1 3/8] RISC-V: Add testcases for scalar unsigned SAT_SUB form 5
From: Pan Li 
 
After the middle-end supports form 5 of unsigned SAT_SUB and
the RISC-V backend implements the scalar .SAT_SUB, add more test
cases to cover form 5 of unsigned .SAT_SUB.
 
Form 5:
  #define SAT_SUB_U_5(T) \
  T sat_sub_u_5_##T (T x, T y) \
  { \
    return x < y ? 0 : x - y; \
  }
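
For readers following along, here is a minimal sketch of how form 5 appears to relate to the sub/sltu/addi/and sequence that the new tests expect. It is only an illustration, not part of the patch: sat_u_sub_masked and count_mismatches are hypothetical helper names, and the mapping of C statements to instructions in the comments is an assumption.

```c
#include <stdint.h>

/* Form 5 as written in the patch, instantiated for uint8_t.  */
uint8_t
sat_u_sub_fmt_5 (uint8_t x, uint8_t y)
{
  return x < y ? 0 : x - y;
}

/* Hypothetical branchless equivalent.  (x < y) is 1 or 0, so
   subtracting 1 from it gives a mask of 0 or all-ones.  */
uint8_t
sat_u_sub_masked (uint8_t x, uint8_t y)
{
  uint8_t diff = (uint8_t) (x - y);       /* sub  */
  uint8_t borrow = x < y;                 /* sltu */
  uint8_t mask = (uint8_t) (borrow - 1);  /* addi ..., -1 */
  return diff & mask;                     /* and  */
}

/* Count the uint8_t input pairs on which the two versions disagree.  */
int
count_mismatches (void)
{
  int bad = 0;
  for (int x = 0; x < 256; x++)
    for (int y = 0; y < 256; y++)
      if (sat_u_sub_fmt_5 ((uint8_t) x, (uint8_t) y)
          != sat_u_sub_masked ((uint8_t) x, (uint8_t) y))
        bad++;
  return bad;
}
```

Because uint8_t has only 65536 input pairs, the comparison can be exhaustive; for wider types a set of boundary values would be the practical choice.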
 
Passed the rv64gcv full regression test.
 
gcc/testsuite/ChangeLog:
 
* gcc.target/riscv/sat_arith.h: Add helper macro for test.
* gcc.target/riscv/sat_u_sub-17.c: New test.
* gcc.target/riscv/sat_u_sub-18.c: New test.
* gcc.target/riscv/sat_u_sub-19.c: New test.
* gcc.target/riscv/sat_u_sub-20.c: New test.
* gcc.target/riscv/sat_u_sub-run-17.c: New test.
* gcc.target/riscv/sat_u_sub-run-18.c: New test.
* gcc.target/riscv/sat_u_sub-run-19.c: New test.
* gcc.target/riscv/sat_u_sub-run-20.c: New test.
 
Signed-off-by: Pan Li 
---
gcc/testsuite/gcc.target/riscv/sat_arith.h|  8 ++
gcc/testsuite/gcc.target/riscv/sat_u_sub-17.c | 18 +
gcc/testsuite/gcc.target/riscv/sat_u_sub-18.c | 19 ++
gcc/testsuite/gcc.target/riscv/sat_u_sub-19.c | 18 +
gcc/testsuite/gcc.target/riscv/sat_u_sub-20.c | 17 +
.../gcc.target/riscv/sat_u_sub-run-17.c   | 25 +++
.../gcc.target/riscv/sat_u_sub-run-18.c   | 25 +++
.../gcc.target/riscv/sat_u_sub-run-19.c   | 25 +++
.../gcc.target/riscv/sat_u_sub-run-20.c   | 25 +++
9 files changed, 180 insertions(+)
create mode 100644 gcc/testsuite/gcc.target/riscv/sat_u_sub-17.c
create mode 100644 gcc/testsuite/gcc.target/riscv/sat_u_sub-18.c
create mode 100644 gcc/testsuite/gcc.target/riscv/sat_u_sub-19.c
create mode 100644 gcc/testsuite/gcc.target/riscv/sat_u_sub-20.c
create mode 100644 gcc/testsuite/gcc.target/riscv/sat_u_sub-run-17.c
create mode 100644 gcc/testsuite/gcc.target/riscv/sat_u_sub-run-18.c
create mode 100644 gcc/testsuite/gcc.target/riscv/sat_u_sub-run-19.c
create mode 100644 gcc/testsuite/gcc.target/riscv/sat_u_sub-run-20.c
 
diff --git a/gcc/testsuite/gcc.target/riscv/sat_arith.h 
b/gcc/testsuite/gcc.target/riscv/sat_arith.h
index b2f8478d36b..d08755dd861 100644
--- a/gcc/testsuite/gcc.target/riscv/sat_arith.h
+++ b/gcc/testsuite/gcc.target/riscv/sat_arith.h
@@ -106,10 +106,18 @@ sat_u_sub_##T##_fmt_4 (T x, T y) \
   return x >= y ? x - y : 0; \
}
+#define DEF_SAT_U_SUB_FMT_5(T)   \
+T __attribute__((noinline))  \
+sat_u_sub_##T##_fmt_5 (T x, T y) \
+{\
+  return x < y ? 0 : x - y;  \
+}
+
#define RUN_SAT_U_SUB_FMT_1(T, x, y) sat_u_sub_##T##_fmt_1(x, y)
#define RUN_SAT_U_SUB_FMT_2(T, x, y) sat_u_sub_##T##_fmt_2(x, y)
#define RUN_SAT_U_SUB_FMT_3(T, x, y) sat_u_sub_##T##_fmt_3(x, y)
#define RUN_SAT_U_SUB_FMT_4(T, x, y) sat_u_sub_##T##_fmt_4(x, y)
+#define RUN_SAT_U_SUB_FMT_5(T, x, y) sat_u_sub_##T##_fmt_5(x, y)
#define DEF_VEC_SAT_U_SUB_FMT_1(T)   \
void __attribute__((noinline))   \
diff --git a/gcc/testsuite/gcc.target/riscv/sat_u_sub-17.c 
b/gcc/testsuite/gcc.target/riscv/sat_u_sub-17.c
new file mode 100644
index 000..853ddcfd285
--- /dev/null
+++ b/gcc/testsuite/gcc.target/riscv/sat_u_sub-17.c
@@ -0,0 +1,18 @@
+/* { dg-do compile } */
+/* { dg-options "-march=rv64gc -mabi=lp64d -O3 -fdump-rtl-expand-details 
-fno-schedule-insns -fno-schedule-insns2" } */
+/* { dg-final { check-function-bodies "**" "" } } */
+
+#include "sat_arith.h"
+
+/*
+** sat_u_sub_uint8_t_fmt_5:
+** sub\s+[atx][0-9]+,\s*a0,\s*a1
+** sltu\s+[atx][0-9]+,\s*[atx][0-9]+,\s*[atx][0-9]+
+** addi\s+[atx][0-9]+,\s*[atx][0-9]+,\s*-1
+** and\s+a0,\s*a0,\s*[atx][0-9]+
+** andi\s+a0,\s*a0,\s*0xff
+** ret
+*/
+DEF_SAT_U_SUB_FMT_5(uint8_t)
+
+/* { dg-final { scan-rtl-dump-times ".SAT_SUB " 2 "expand" } } */
diff --git a/gcc/testsuite/gcc.target/riscv/sat_u_sub-18.c 
b/gcc/testsuite/gcc.target/riscv/sat_u_sub-18.c
new file mode 100644
index 000..423a6f82170
--- /dev/null
+++ b/gcc/testsuite/gcc.target/riscv/sat_u_sub-18.c
@@ -0,0 +1,19 @@
+/* { dg-do compile } */
+/* { dg-options "-march=rv64gc -mabi=lp64d -O3 -fdump-rtl-expand-details 
-fno-schedule-insns -fno-schedule-insns2" } */
+/* { dg-final { check-function-bodies "**" "" } } */
+
+#include "sat_arith.h"
+
+/*
+** sat_u_sub_uint16_t_fmt_5:
+** sub\s+[atx][0-9]+,\s*a0,\s*a1
+** sltu\s+[atx][0-9]+,\s*a0,\s*a1
+** addi\s+[atx][0-9]+,\s*[atx][0-9]+,\s*-1
+** and\s+a0,\s*[atx][0-9]+,\s*[atx][0-9]+
+** slli\s+a0,\s*a0,\s*48
+** srli\s+a0,\s*a0,\s*48
+** ret
+*/
+DEF_SAT_U_SUB_FMT_5(uint16_t)
+
+/* { dg-final { scan-rtl-dump-times ".SAT_SUB " 2 "expand" } } */
diff --git a/gcc/testsuite/gcc.target/riscv/sat_u_sub-19.c 
b/gcc/testsuite/gcc.target/riscv/sat_u_sub-19.c
new file mode 100644
index 

Re: [PATCH v1 1/8] RISC-V: Add testcases for scalar unsigned SAT_SUB form 3

2024-06-13 Thread juzhe.zh...@rivai.ai
LGTM



juzhe.zh...@rivai.ai
 
From: pan2.li
Date: 2024-06-14 10:13
To: gcc-patches
CC: juzhe.zhong; kito.cheng; jeffreyalaw; rdapp.gcc; Pan Li
Subject: [PATCH v1 1/8] RISC-V: Add testcases for scalar unsigned SAT_SUB form 3
From: Pan Li 
 
After the middle-end supports form 3 of unsigned SAT_SUB and
the RISC-V backend implements the scalar .SAT_SUB, add more test
cases to cover form 3 of unsigned .SAT_SUB.
 
Form 3:
  #define SAT_SUB_U_3(T) \
  T sat_sub_u_3_##T (T x, T y) \
  { \
    return x > y ? x - y : 0; \
  }
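
As a quick sanity check, the sketch below compares form 3 with the form 2 expression that is visible as context in the sat_arith.h hunk of this patch, (x - y) & (-(T)(x > y)). The boundary value list and the helper name forms_agree_on_boundaries are assumptions chosen for illustration; this is not one of the run tests added by the series.

```c
#include <stddef.h>
#include <stdint.h>

/* Form 3 from this patch, instantiated for uint16_t.  */
uint16_t
sat_u_sub_fmt_3 (uint16_t x, uint16_t y)
{
  return x > y ? x - y : 0;
}

/* Form 2 as it appears in the context lines of sat_arith.h.  */
uint16_t
sat_u_sub_fmt_2 (uint16_t x, uint16_t y)
{
  return (x - y) & (-(uint16_t) (x > y));
}

/* Return 1 if both forms agree on every pair of boundary values.  */
int
forms_agree_on_boundaries (void)
{
  static const uint16_t v[] = { 0, 1, 2, 0x7fff, 0x8000, 0xfffe, 0xffff };
  size_t n = sizeof v / sizeof v[0];
  for (size_t i = 0; i < n; i++)
    for (size_t j = 0; j < n; j++)
      if (sat_u_sub_fmt_3 (v[i], v[j]) != sat_u_sub_fmt_2 (v[i], v[j]))
        return 0;
  return 1;
}
```

Both forms yield 0 when x == y, since x - y is already 0 there, so the strict comparison does not change the result.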
 
Passed the rv64gcv full regression test.
 
gcc/testsuite/ChangeLog:
 
* gcc.target/riscv/sat_arith.h: Add helper macro for test.
* gcc.target/riscv/sat_u_sub-10.c: New test.
* gcc.target/riscv/sat_u_sub-11.c: New test.
* gcc.target/riscv/sat_u_sub-12.c: New test.
* gcc.target/riscv/sat_u_sub-9.c: New test.
* gcc.target/riscv/sat_u_sub-run-10.c: New test.
* gcc.target/riscv/sat_u_sub-run-11.c: New test.
* gcc.target/riscv/sat_u_sub-run-12.c: New test.
* gcc.target/riscv/sat_u_sub-run-9.c: New test.
 
Signed-off-by: Pan Li 
---
gcc/testsuite/gcc.target/riscv/sat_arith.h|  8 ++
gcc/testsuite/gcc.target/riscv/sat_u_sub-10.c | 19 ++
gcc/testsuite/gcc.target/riscv/sat_u_sub-11.c | 18 +
gcc/testsuite/gcc.target/riscv/sat_u_sub-12.c | 17 +
gcc/testsuite/gcc.target/riscv/sat_u_sub-9.c  | 18 +
.../gcc.target/riscv/sat_u_sub-run-10.c   | 25 +++
.../gcc.target/riscv/sat_u_sub-run-11.c   | 25 +++
.../gcc.target/riscv/sat_u_sub-run-12.c   | 25 +++
.../gcc.target/riscv/sat_u_sub-run-9.c| 25 +++
9 files changed, 180 insertions(+)
create mode 100644 gcc/testsuite/gcc.target/riscv/sat_u_sub-10.c
create mode 100644 gcc/testsuite/gcc.target/riscv/sat_u_sub-11.c
create mode 100644 gcc/testsuite/gcc.target/riscv/sat_u_sub-12.c
create mode 100644 gcc/testsuite/gcc.target/riscv/sat_u_sub-9.c
create mode 100644 gcc/testsuite/gcc.target/riscv/sat_u_sub-run-10.c
create mode 100644 gcc/testsuite/gcc.target/riscv/sat_u_sub-run-11.c
create mode 100644 gcc/testsuite/gcc.target/riscv/sat_u_sub-run-12.c
create mode 100644 gcc/testsuite/gcc.target/riscv/sat_u_sub-run-9.c
 
diff --git a/gcc/testsuite/gcc.target/riscv/sat_arith.h 
b/gcc/testsuite/gcc.target/riscv/sat_arith.h
index bc9a372b6df..50c65cdea49 100644
--- a/gcc/testsuite/gcc.target/riscv/sat_arith.h
+++ b/gcc/testsuite/gcc.target/riscv/sat_arith.h
@@ -92,8 +92,16 @@ sat_u_sub_##T##_fmt_2 (T x, T y)  \
   return (x - y) & (-(T)(x > y)); \
}
+#define DEF_SAT_U_SUB_FMT_3(T)\
+T __attribute__((noinline))   \
+sat_u_sub_##T##_fmt_3 (T x, T y)  \
+{ \
+  return x > y ? x - y : 0;   \
+}
+
#define RUN_SAT_U_SUB_FMT_1(T, x, y) sat_u_sub_##T##_fmt_1(x, y)
#define RUN_SAT_U_SUB_FMT_2(T, x, y) sat_u_sub_##T##_fmt_2(x, y)
+#define RUN_SAT_U_SUB_FMT_3(T, x, y) sat_u_sub_##T##_fmt_3(x, y)
#define DEF_VEC_SAT_U_SUB_FMT_1(T)   \
void __attribute__((noinline))   \
diff --git a/gcc/testsuite/gcc.target/riscv/sat_u_sub-10.c 
b/gcc/testsuite/gcc.target/riscv/sat_u_sub-10.c
new file mode 100644
index 000..6e78164865f
--- /dev/null
+++ b/gcc/testsuite/gcc.target/riscv/sat_u_sub-10.c
@@ -0,0 +1,19 @@
+/* { dg-do compile } */
+/* { dg-options "-march=rv64gc -mabi=lp64d -O3 -fdump-rtl-expand-details 
-fno-schedule-insns -fno-schedule-insns2" } */
+/* { dg-final { check-function-bodies "**" "" } } */
+
+#include "sat_arith.h"
+
+/*
+** sat_u_sub_uint16_t_fmt_3:
+** sub\s+[atx][0-9]+,\s*a0,\s*a1
+** sltu\s+[atx][0-9]+,\s*a0,\s*a1
+** addi\s+[atx][0-9]+,\s*[atx][0-9]+,\s*-1
+** and\s+a0,\s*[atx][0-9]+,\s*[atx][0-9]+
+** slli\s+a0,\s*a0,\s*48
+** srli\s+a0,\s*a0,\s*48
+** ret
+*/
+DEF_SAT_U_SUB_FMT_3(uint16_t)
+
+/* { dg-final { scan-rtl-dump-times ".SAT_SUB " 2 "expand" } } */
diff --git a/gcc/testsuite/gcc.target/riscv/sat_u_sub-11.c 
b/gcc/testsuite/gcc.target/riscv/sat_u_sub-11.c
new file mode 100644
index 000..84e34657f55
--- /dev/null
+++ b/gcc/testsuite/gcc.target/riscv/sat_u_sub-11.c
@@ -0,0 +1,18 @@
+/* { dg-do compile } */
+/* { dg-options "-march=rv64gc -mabi=lp64d -O3 -fdump-rtl-expand-details 
-fno-schedule-insns -fno-schedule-insns2" } */
+/* { dg-final { check-function-bodies "**" "" } } */
+
+#include "sat_arith.h"
+
+/*
+** sat_u_sub_uint32_t_fmt_3:
+** sub\s+[atx][0-9]+,\s*a0,\s*a1
+** sltu\s+[atx][0-9]+,\s*a0,\s*a1
+** addi\s+[atx][0-9]+,\s*[atx][0-9]+,\s*-1
+** and\s+a0,\s*[atx][0-9]+,\s*[atx][0-9]+
+** sext.w\s+a0,\s*a0
+** ret
+*/
+DEF_SAT_U_SUB_FMT_3(uint32_t)
+
+/* { dg-final { scan-rtl-dump-times ".SAT_SUB " 2 "expand" } } */
diff --git a/gcc/testsuite/gcc.target/riscv/sat_u_sub-12.c 
b/gcc/testsuite/gcc.target/riscv/sat_u_sub-12.c
new file mode 100644
index 000..eea282b21ae
--- /dev/null
+++ b/gcc/testsuite/gcc.target/riscv/sat_u_sub-12.c
@@ -0,0 +1,17 @@
+/* { dg-do compile } */
+/* { 

Re: [PATCH 10/52] jit: Replace uses of {FLOAT,{,LONG_}DOUBLE}_TYPE_SIZE

2024-06-13 Thread Kewen.Lin
Hi David,

on 2024/6/13 21:44, David Malcolm wrote:
> On Sun, 2024-06-02 at 22:01 -0500, Kewen Lin wrote:
>> Joseph pointed out "floating types should have their mode,
>> not a poorly defined precision value" in the discussion[1],
>> as he and Richi suggested, the existing macros
>> {FLOAT,{,LONG_}DOUBLE}_TYPE_SIZE will be replaced with a
>> hook mode_for_floating_type.  Unlike the other FEs, for the
>> uses in recording::memento_of_get_type::get_size, since
>> {float,{,long_}double}_type_node haven't been initialized
>> yet, this is to replace {FLOAT,{,LONG_}DOUBLE}_TYPE_SIZE
>> with calling hook targetm.c.mode_for_floating_type.
>>
>> [1] https://gcc.gnu.org/pipermail/gcc-patches/2024-May/651209.html
>>
>> gcc/jit/ChangeLog:
>>
>> * jit-recording.cc
>> (recording::memento_of_get_type::get_size): Update
>> macros {FLOAT,DOUBLE,LONG_DOUBLE}_TYPE_SIZE by calling
>> targetm.c.mode_for_floating_type with
>> TI_{FLOAT,DOUBLE,LONG_DOUBLE}_TYPE.
>> ---
>>  gcc/jit/jit-recording.cc | 12 
>>  1 file changed, 8 insertions(+), 4 deletions(-)
>>
>> diff --git a/gcc/jit/jit-recording.cc b/gcc/jit/jit-recording.cc
>> index 68a2e860c1f..7719b898e57 100644
>> --- a/gcc/jit/jit-recording.cc
>> +++ b/gcc/jit/jit-recording.cc
>> @@ -21,7 +21,7 @@ along with GCC; see the file COPYING3.  If not see
>>  #include "config.h"
>>  #include "system.h"
>>  #include "coretypes.h"
>> -#include "tm.h"
>> +#include "target.h"
>>  #include "pretty-print.h"
>>  #include "toplev.h"
>>  
>> @@ -2353,6 +2353,7 @@ size_t
>>  recording::memento_of_get_type::get_size ()
>>  {
>>    int size;
>> +  machine_mode m;
>>    switch (m_kind)
>>  {
>>  case GCC_JIT_TYPE_VOID:
>> @@ -2399,13 +2400,16 @@ recording::memento_of_get_type::get_size ()
>>    size = 128;
>>    break;
>>  case GCC_JIT_TYPE_FLOAT:
>> -  size = FLOAT_TYPE_SIZE;
>> +  m = targetm.c.mode_for_floating_type (TI_FLOAT_TYPE);
>> +  size = GET_MODE_PRECISION (m).to_constant ();
>>    break;
>>  case GCC_JIT_TYPE_DOUBLE:
>> -  size = DOUBLE_TYPE_SIZE;
>> +  m = targetm.c.mode_for_floating_type (TI_DOUBLE_TYPE);
>> +  size = GET_MODE_PRECISION (m).to_constant ();
>>    break;
>>  case GCC_JIT_TYPE_LONG_DOUBLE:
>> -  size = LONG_DOUBLE_TYPE_SIZE;
>> +  m = targetm.c.mode_for_floating_type (TI_LONG_DOUBLE_TYPE);
>> +  size = GET_MODE_PRECISION (m).to_constant ();
>>    break;
>>  case GCC_JIT_TYPE_SIZE_T:
>>    size = MAX_BITS_PER_WORD;
> 
> [CCing jit mailing list]
> 
> Thanks for the patch; sorry for the delay in responding.
> 
> Did your testing include jit?  Note that --enable-languages=all does
> *not* include it (due to it needing --enable-host-shared).

Thanks for the hints!  Yes, as noted in the cover letter, I did test jit.
Initially I used TYPE_PRECISION ({float,{long_,}double}_type_node) to
replace these, just as I proposed for the other FE changes, but the
testing showed some failures on test-combination.c etc.  Looking into
them, I realized that this call to recording::memento_of_get_type::get_size
can happen before we set up those type nodes.  So I had to use the
current approach with the new hook, which made all the failures go away (no
regressions).  btw, the test result comparison showed some more lines with
"NA->PASS: test-threads.c.exe"; since that's positive, I didn't look into
it.

> 
> The jit::recording code runs *very* early - before toplev::main.  For
> example, a call to gcc_jit_type_get_size can trigger the above code
> path before toplev::main has run.
> 
> target.h says each target should have a:
> 
>   struct gcc_target targetm = TARGET_INITIALIZER;
> 
> Has targetm.c.mode_for_floating_type been initialized enough by that
> static initialization?  

It depends on how we define "enough".  The hook has been initialized,
as you pointed out; I just debugged it and confirmed that the target-specific
hook was called as expected (rs6000_c_mode_for_floating_type on Power)
when this jit::recording function gets called.  If "enough" refers to
something like command line options, those are not ready yet.

> Could the mode_for_floating_type hook be
> relying on some target-specific dynamic initialization that hasn't run
> yet?  (e.g. taking account of command-line options?)
> 

Yes, it could.  For example, on the rs6000 port the hook checks
rs6000_long_double_type_size for long double (it's related to the command
line option -mlong-double-x), and some other targets such as i386 would also
check TARGET_LONG_DOUBLE_64 and TARGET_LONG_DOUBLE_128.  But I think it isn't
worse than before: without this change (with the previous macro), the macro
was defined in terms of things related to these command line options, which
are still not ready at that point.

#define LONG_DOUBLE_TYPE_SIZE rs6000_long_double_type_size
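
To make the ordering concern concrete, here is a small stand-alone sketch in plain C.  It does not use any GCC internals: demo_long_double_type_size, demo_hooks and demo_process_options are hypothetical stand-ins for the option variable, the statically initialized hook table and option processing respectively.

```c
/* Hypothetical stand-in for an option variable such as
   rs6000_long_double_type_size: zero until options are processed.  */
int demo_long_double_type_size;

/* Hypothetical hook body that depends on the option variable.  */
int
demo_long_double_bits (void)
{
  return demo_long_double_type_size;
}

/* The hook table is ready after static initialization, so the hook
   can be called early, but the value it reads may not be set yet.  */
struct demo_target_hooks
{
  int (*long_double_bits) (void);
};

struct demo_target_hooks demo_hooks = { demo_long_double_bits };

/* Stand-in for command line option processing, which runs later.  */
void
demo_process_options (int bits)
{
  demo_long_double_type_size = bits;
}
```

Calling demo_hooks.long_double_bits () before demo_process_options returns 0, and returns the configured size afterwards; the macro-based version has the same dependency, since it expands to the same variable.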

I debugged the code: jit::recording will see rs6000_long_double_type_size
with its statically initialized value of zero, which means that the function
recording::memento_of_get_type::get_size would 

[PATCH v1 8/8] RISC-V: Add testcases for scalar unsigned SAT_SUB form 10

2024-06-13 Thread pan2 . li
From: Pan Li 

After the middle-end supports form 10 of unsigned SAT_SUB and
the RISC-V backend implements the scalar .SAT_SUB, add more test
cases to cover form 10 of unsigned .SAT_SUB.

Form 10:
  #define SAT_SUB_U_10(T) \
  T sat_sub_u_10_##T (T x, T y) \
  { \
    T ret; \
    T overflow = __builtin_sub_overflow (x, y, &ret); \
    return !overflow ? ret : 0; \
  }
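
A minimal stand-alone instantiation of form 10 for uint32_t is sketched below, assuming the third argument of __builtin_sub_overflow is a pointer to ret, as the builtin requires.  The function name mirrors the DEF_SAT_U_SUB_FMT_10 naming from sat_arith.h, but the snippet itself is only an illustration and is not taken from the run tests.

```c
#include <stdint.h>

/* Form 10 instantiated for uint32_t.  __builtin_sub_overflow stores
   the wrapped difference in ret and returns true when x - y does not
   fit in the result type, i.e. when x < y for unsigned operands.  */
uint32_t
sat_u_sub_uint32_t_fmt_10 (uint32_t x, uint32_t y)
{
  uint32_t ret;
  uint32_t overflow = __builtin_sub_overflow (x, y, &ret);
  return !overflow ? ret : 0;
}
```

For example, sat_u_sub_uint32_t_fmt_10 (5, 3) gives 2, while sat_u_sub_uint32_t_fmt_10 (3, 5) saturates to 0.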

Passed the rv64gcv full regression test.

gcc/testsuite/ChangeLog:

* gcc.target/riscv/sat_arith.h: Add helper macro for test.
* gcc.target/riscv/sat_u_sub-37.c: New test.
* gcc.target/riscv/sat_u_sub-38.c: New test.
* gcc.target/riscv/sat_u_sub-39.c: New test.
* gcc.target/riscv/sat_u_sub-40.c: New test.
* gcc.target/riscv/sat_u_sub-run-37.c: New test.
* gcc.target/riscv/sat_u_sub-run-38.c: New test.
* gcc.target/riscv/sat_u_sub-run-39.c: New test.
* gcc.target/riscv/sat_u_sub-run-40.c: New test.

Signed-off-by: Pan Li 
---
 gcc/testsuite/gcc.target/riscv/sat_arith.h| 10 
 gcc/testsuite/gcc.target/riscv/sat_u_sub-37.c | 18 +
 gcc/testsuite/gcc.target/riscv/sat_u_sub-38.c | 19 ++
 gcc/testsuite/gcc.target/riscv/sat_u_sub-39.c | 18 +
 gcc/testsuite/gcc.target/riscv/sat_u_sub-40.c | 17 +
 .../gcc.target/riscv/sat_u_sub-run-37.c   | 25 +++
 .../gcc.target/riscv/sat_u_sub-run-38.c   | 25 +++
 .../gcc.target/riscv/sat_u_sub-run-39.c   | 25 +++
 .../gcc.target/riscv/sat_u_sub-run-40.c   | 25 +++
 9 files changed, 182 insertions(+)
 create mode 100644 gcc/testsuite/gcc.target/riscv/sat_u_sub-37.c
 create mode 100644 gcc/testsuite/gcc.target/riscv/sat_u_sub-38.c
 create mode 100644 gcc/testsuite/gcc.target/riscv/sat_u_sub-39.c
 create mode 100644 gcc/testsuite/gcc.target/riscv/sat_u_sub-40.c
 create mode 100644 gcc/testsuite/gcc.target/riscv/sat_u_sub-run-37.c
 create mode 100644 gcc/testsuite/gcc.target/riscv/sat_u_sub-run-38.c
 create mode 100644 gcc/testsuite/gcc.target/riscv/sat_u_sub-run-39.c
 create mode 100644 gcc/testsuite/gcc.target/riscv/sat_u_sub-run-40.c

diff --git a/gcc/testsuite/gcc.target/riscv/sat_arith.h 
b/gcc/testsuite/gcc.target/riscv/sat_arith.h
index ecb74e56e9c..4c02783e845 100644
--- a/gcc/testsuite/gcc.target/riscv/sat_arith.h
+++ b/gcc/testsuite/gcc.target/riscv/sat_arith.h
@@ -147,6 +147,15 @@ sat_u_sub_##T##_fmt_9 (T x, T y)\
   return overflow ? 0 : ret;\
 }
 
+#define DEF_SAT_U_SUB_FMT_10(T) \
+T __attribute__((noinline)) \
+sat_u_sub_##T##_fmt_10 (T x, T y)   \
+{   \
+  T ret;\
+  T overflow = __builtin_sub_overflow (x, y, &ret); \
+  return !overflow ? ret : 0;   \
+}
+
 #define RUN_SAT_U_SUB_FMT_1(T, x, y) sat_u_sub_##T##_fmt_1(x, y)
 #define RUN_SAT_U_SUB_FMT_2(T, x, y) sat_u_sub_##T##_fmt_2(x, y)
 #define RUN_SAT_U_SUB_FMT_3(T, x, y) sat_u_sub_##T##_fmt_3(x, y)
@@ -156,6 +165,7 @@ sat_u_sub_##T##_fmt_9 (T x, T y)\
 #define RUN_SAT_U_SUB_FMT_7(T, x, y) sat_u_sub_##T##_fmt_7(x, y)
 #define RUN_SAT_U_SUB_FMT_8(T, x, y) sat_u_sub_##T##_fmt_8(x, y)
 #define RUN_SAT_U_SUB_FMT_9(T, x, y) sat_u_sub_##T##_fmt_9(x, y)
+#define RUN_SAT_U_SUB_FMT_10(T, x, y) sat_u_sub_##T##_fmt_10(x, y)
 
 #define DEF_VEC_SAT_U_SUB_FMT_1(T)   \
 void __attribute__((noinline))   \
diff --git a/gcc/testsuite/gcc.target/riscv/sat_u_sub-37.c 
b/gcc/testsuite/gcc.target/riscv/sat_u_sub-37.c
new file mode 100644
index 000..8c97a518d2c
--- /dev/null
+++ b/gcc/testsuite/gcc.target/riscv/sat_u_sub-37.c
@@ -0,0 +1,18 @@
+/* { dg-do compile } */
+/* { dg-options "-march=rv64gc -mabi=lp64d -O3 -fdump-rtl-expand-details 
-fno-schedule-insns -fno-schedule-insns2" } */
+/* { dg-final { check-function-bodies "**" "" } } */
+
+#include "sat_arith.h"
+
+/*
+** sat_u_sub_uint8_t_fmt_10:
+** sub\s+[atx][0-9]+,\s*a0,\s*a1
+** sltu\s+[atx][0-9]+,\s*[atx][0-9]+,\s*[atx][0-9]+
+** addi\s+[atx][0-9]+,\s*[atx][0-9]+,\s*-1
+** and\s+a0,\s*a0,\s*[atx][0-9]+
+** andi\s+a0,\s*a0,\s*0xff
+** ret
+*/
+DEF_SAT_U_SUB_FMT_10(uint8_t)
+
+/* { dg-final { scan-rtl-dump-times ".SAT_SUB " 2 "expand" } } */
diff --git a/gcc/testsuite/gcc.target/riscv/sat_u_sub-38.c 
b/gcc/testsuite/gcc.target/riscv/sat_u_sub-38.c
new file mode 100644
index 000..7e3cec2a9a7
--- /dev/null
+++ b/gcc/testsuite/gcc.target/riscv/sat_u_sub-38.c
@@ -0,0 +1,19 @@
+/* { dg-do compile } */
+/* { dg-options "-march=rv64gc -mabi=lp64d -O3 -fdump-rtl-expand-details 
-fno-schedule-insns -fno-schedule-insns2" } */
+/* { dg-final { check-function-bodies "**" "" } } */
+
+#include "sat_arith.h"
+
+/*
+** sat_u_sub_uint16_t_fmt_10:
+** sub\s+[atx][0-9]+,\s*a0,\s*a1
+** 

[PATCH v1 7/8] RISC-V: Add testcases for scalar unsigned SAT_SUB form 9

2024-06-13 Thread pan2 . li
From: Pan Li 

After the middle-end supports form 9 of unsigned SAT_SUB and
the RISC-V backend implements the scalar .SAT_SUB, add more test
cases to cover form 9 of unsigned .SAT_SUB.

Form 9:
  #define SAT_SUB_U_9(T) \
  T sat_sub_u_9_##T (T x, T y) \
  { \
    T ret; \
    T overflow = __builtin_sub_overflow (x, y, &ret); \
    return overflow ? 0 : ret; \
  }

Passed the rv64gcv full regression test.

gcc/testsuite/ChangeLog:

* gcc.target/riscv/sat_arith.h: Add helper macro for test.
* gcc.target/riscv/sat_u_sub-33.c: New test.
* gcc.target/riscv/sat_u_sub-34.c: New test.
* gcc.target/riscv/sat_u_sub-35.c: New test.
* gcc.target/riscv/sat_u_sub-36.c: New test.
* gcc.target/riscv/sat_u_sub-run-33.c: New test.
* gcc.target/riscv/sat_u_sub-run-34.c: New test.
* gcc.target/riscv/sat_u_sub-run-35.c: New test.
* gcc.target/riscv/sat_u_sub-run-36.c: New test.

Signed-off-by: Pan Li 
---
 gcc/testsuite/gcc.target/riscv/sat_arith.h| 10 
 gcc/testsuite/gcc.target/riscv/sat_u_sub-33.c | 18 +
 gcc/testsuite/gcc.target/riscv/sat_u_sub-34.c | 19 ++
 gcc/testsuite/gcc.target/riscv/sat_u_sub-35.c | 18 +
 gcc/testsuite/gcc.target/riscv/sat_u_sub-36.c | 17 +
 .../gcc.target/riscv/sat_u_sub-run-33.c   | 25 +++
 .../gcc.target/riscv/sat_u_sub-run-34.c   | 25 +++
 .../gcc.target/riscv/sat_u_sub-run-35.c   | 25 +++
 .../gcc.target/riscv/sat_u_sub-run-36.c   | 25 +++
 9 files changed, 182 insertions(+)
 create mode 100644 gcc/testsuite/gcc.target/riscv/sat_u_sub-33.c
 create mode 100644 gcc/testsuite/gcc.target/riscv/sat_u_sub-34.c
 create mode 100644 gcc/testsuite/gcc.target/riscv/sat_u_sub-35.c
 create mode 100644 gcc/testsuite/gcc.target/riscv/sat_u_sub-36.c
 create mode 100644 gcc/testsuite/gcc.target/riscv/sat_u_sub-run-33.c
 create mode 100644 gcc/testsuite/gcc.target/riscv/sat_u_sub-run-34.c
 create mode 100644 gcc/testsuite/gcc.target/riscv/sat_u_sub-run-35.c
 create mode 100644 gcc/testsuite/gcc.target/riscv/sat_u_sub-run-36.c

diff --git a/gcc/testsuite/gcc.target/riscv/sat_arith.h 
b/gcc/testsuite/gcc.target/riscv/sat_arith.h
index 9f901de5cdf..ecb74e56e9c 100644
--- a/gcc/testsuite/gcc.target/riscv/sat_arith.h
+++ b/gcc/testsuite/gcc.target/riscv/sat_arith.h
@@ -138,6 +138,15 @@ sat_u_sub_##T##_fmt_8 (T x, T y)\
   return ret & (T)-(!overflow); \
 }
 
+#define DEF_SAT_U_SUB_FMT_9(T)  \
+T __attribute__((noinline)) \
+sat_u_sub_##T##_fmt_9 (T x, T y)\
+{   \
+  T ret;\
+  T overflow = __builtin_sub_overflow (x, y, &ret); \
+  return overflow ? 0 : ret;\
+}
+
 #define RUN_SAT_U_SUB_FMT_1(T, x, y) sat_u_sub_##T##_fmt_1(x, y)
 #define RUN_SAT_U_SUB_FMT_2(T, x, y) sat_u_sub_##T##_fmt_2(x, y)
 #define RUN_SAT_U_SUB_FMT_3(T, x, y) sat_u_sub_##T##_fmt_3(x, y)
@@ -146,6 +155,7 @@ sat_u_sub_##T##_fmt_8 (T x, T y)\
 #define RUN_SAT_U_SUB_FMT_6(T, x, y) sat_u_sub_##T##_fmt_6(x, y)
 #define RUN_SAT_U_SUB_FMT_7(T, x, y) sat_u_sub_##T##_fmt_7(x, y)
 #define RUN_SAT_U_SUB_FMT_8(T, x, y) sat_u_sub_##T##_fmt_8(x, y)
+#define RUN_SAT_U_SUB_FMT_9(T, x, y) sat_u_sub_##T##_fmt_9(x, y)
 
 #define DEF_VEC_SAT_U_SUB_FMT_1(T)   \
 void __attribute__((noinline))   \
diff --git a/gcc/testsuite/gcc.target/riscv/sat_u_sub-33.c 
b/gcc/testsuite/gcc.target/riscv/sat_u_sub-33.c
new file mode 100644
index 000..aca4bd28b5d
--- /dev/null
+++ b/gcc/testsuite/gcc.target/riscv/sat_u_sub-33.c
@@ -0,0 +1,18 @@
+/* { dg-do compile } */
+/* { dg-options "-march=rv64gc -mabi=lp64d -O3 -fdump-rtl-expand-details 
-fno-schedule-insns -fno-schedule-insns2" } */
+/* { dg-final { check-function-bodies "**" "" } } */
+
+#include "sat_arith.h"
+
+/*
+** sat_u_sub_uint8_t_fmt_9:
+** sub\s+[atx][0-9]+,\s*a0,\s*a1
+** sltu\s+[atx][0-9]+,\s*[atx][0-9]+,\s*[atx][0-9]+
+** addi\s+[atx][0-9]+,\s*[atx][0-9]+,\s*-1
+** and\s+a0,\s*a0,\s*[atx][0-9]+
+** andi\s+a0,\s*a0,\s*0xff
+** ret
+*/
+DEF_SAT_U_SUB_FMT_9(uint8_t)
+
+/* { dg-final { scan-rtl-dump-times ".SAT_SUB " 2 "expand" } } */
diff --git a/gcc/testsuite/gcc.target/riscv/sat_u_sub-34.c 
b/gcc/testsuite/gcc.target/riscv/sat_u_sub-34.c
new file mode 100644
index 000..f87a51a504b
--- /dev/null
+++ b/gcc/testsuite/gcc.target/riscv/sat_u_sub-34.c
@@ -0,0 +1,19 @@
+/* { dg-do compile } */
+/* { dg-options "-march=rv64gc -mabi=lp64d -O3 -fdump-rtl-expand-details 
-fno-schedule-insns -fno-schedule-insns2" } */
+/* { dg-final { check-function-bodies "**" "" } } */
+
+#include "sat_arith.h"
+
+/*
+** sat_u_sub_uint16_t_fmt_9:
+** sub\s+[atx][0-9]+,\s*a0,\s*a1
+** sltu\s+[atx][0-9]+,\s*a0,\s*a1
+** 

[PATCH v1 4/8] RISC-V: Add testcases for scalar unsigned SAT_SUB form 6

2024-06-13 Thread pan2 . li
From: Pan Li 

After the middle-end supports form 6 of unsigned SAT_SUB and
the RISC-V backend implements the scalar .SAT_SUB, add more test
cases to cover form 6 of unsigned .SAT_SUB.

Form 6:
  #define SAT_SUB_U_6(T) \
  T sat_sub_u_6_##T (T x, T y) \
  { \
    return x <= y ? 0 : x - y; \
  }
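
The uint16_t test in this patch (sat_u_sub-22.c) expects a trailing slli/srli by 48 after the sub/sltu/addi/and sequence.  The sketch below is a rough model of that body using 64-bit values in place of registers; sat_u_sub_u16_in_reg and model_matches_form_6 are hypothetical names, and the per-line instruction comments are an assumed reading of the expected assembly rather than verified compiler output.

```c
#include <stddef.h>
#include <stdint.h>

/* Form 6 from this patch, instantiated for uint16_t.  */
uint16_t
sat_u_sub_fmt_6 (uint16_t x, uint16_t y)
{
  return x <= y ? 0 : x - y;
}

/* Rough 64-bit register model; a0 and a1 hold zero-extended
   uint16_t arguments.  */
uint64_t
sat_u_sub_u16_in_reg (uint64_t a0, uint64_t a1)
{
  uint64_t diff = a0 - a1;     /* sub  */
  uint64_t borrow = a0 < a1;   /* sltu */
  uint64_t mask = borrow - 1;  /* addi ..., -1 */
  uint64_t res = diff & mask;  /* and  */
  res <<= 48;                  /* slli ..., 48 */
  res >>= 48;                  /* srli ..., 48: keep only the low 16 bits */
  return res;
}

/* Return 1 if the model matches form 6 on all boundary pairs.  */
int
model_matches_form_6 (void)
{
  static const uint16_t v[] = { 0, 1, 2, 0x7fff, 0x8000, 0xfffe, 0xffff };
  size_t n = sizeof v / sizeof v[0];
  for (size_t i = 0; i < n; i++)
    for (size_t j = 0; j < n; j++)
      if (sat_u_sub_u16_in_reg (v[i], v[j]) != sat_u_sub_fmt_6 (v[i], v[j]))
        return 0;
  return 1;
}
```

Note that x <= y and x < y give the same result here, because the difference is already 0 when x == y.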

gcc/testsuite/ChangeLog:

* gcc.target/riscv/sat_arith.h: Add helper macro for test.
* gcc.target/riscv/sat_u_sub-21.c: New test.
* gcc.target/riscv/sat_u_sub-22.c: New test.
* gcc.target/riscv/sat_u_sub-23.c: New test.
* gcc.target/riscv/sat_u_sub-24.c: New test.
* gcc.target/riscv/sat_u_sub-run-21.c: New test.
* gcc.target/riscv/sat_u_sub-run-22.c: New test.
* gcc.target/riscv/sat_u_sub-run-23.c: New test.
* gcc.target/riscv/sat_u_sub-run-24.c: New test.

Signed-off-by: Pan Li 
---
 gcc/testsuite/gcc.target/riscv/sat_arith.h|  8 ++
 gcc/testsuite/gcc.target/riscv/sat_u_sub-21.c | 18 +
 gcc/testsuite/gcc.target/riscv/sat_u_sub-22.c | 19 ++
 gcc/testsuite/gcc.target/riscv/sat_u_sub-23.c | 18 +
 gcc/testsuite/gcc.target/riscv/sat_u_sub-24.c | 17 +
 .../gcc.target/riscv/sat_u_sub-run-21.c   | 25 +++
 .../gcc.target/riscv/sat_u_sub-run-22.c   | 25 +++
 .../gcc.target/riscv/sat_u_sub-run-23.c   | 25 +++
 .../gcc.target/riscv/sat_u_sub-run-24.c   | 25 +++
 9 files changed, 180 insertions(+)
 create mode 100644 gcc/testsuite/gcc.target/riscv/sat_u_sub-21.c
 create mode 100644 gcc/testsuite/gcc.target/riscv/sat_u_sub-22.c
 create mode 100644 gcc/testsuite/gcc.target/riscv/sat_u_sub-23.c
 create mode 100644 gcc/testsuite/gcc.target/riscv/sat_u_sub-24.c
 create mode 100644 gcc/testsuite/gcc.target/riscv/sat_u_sub-run-21.c
 create mode 100644 gcc/testsuite/gcc.target/riscv/sat_u_sub-run-22.c
 create mode 100644 gcc/testsuite/gcc.target/riscv/sat_u_sub-run-23.c
 create mode 100644 gcc/testsuite/gcc.target/riscv/sat_u_sub-run-24.c

diff --git a/gcc/testsuite/gcc.target/riscv/sat_arith.h 
b/gcc/testsuite/gcc.target/riscv/sat_arith.h
index d08755dd861..4296235cf62 100644
--- a/gcc/testsuite/gcc.target/riscv/sat_arith.h
+++ b/gcc/testsuite/gcc.target/riscv/sat_arith.h
@@ -113,11 +113,19 @@ sat_u_sub_##T##_fmt_5 (T x, T y) \
   return x < y ? 0 : x - y;  \
 }
 
+#define DEF_SAT_U_SUB_FMT_6(T)   \
+T __attribute__((noinline))  \
+sat_u_sub_##T##_fmt_6 (T x, T y) \
+{\
+  return x <= y ? 0 : x - y; \
+}
+
 #define RUN_SAT_U_SUB_FMT_1(T, x, y) sat_u_sub_##T##_fmt_1(x, y)
 #define RUN_SAT_U_SUB_FMT_2(T, x, y) sat_u_sub_##T##_fmt_2(x, y)
 #define RUN_SAT_U_SUB_FMT_3(T, x, y) sat_u_sub_##T##_fmt_3(x, y)
 #define RUN_SAT_U_SUB_FMT_4(T, x, y) sat_u_sub_##T##_fmt_4(x, y)
 #define RUN_SAT_U_SUB_FMT_5(T, x, y) sat_u_sub_##T##_fmt_5(x, y)
+#define RUN_SAT_U_SUB_FMT_6(T, x, y) sat_u_sub_##T##_fmt_6(x, y)
 
 #define DEF_VEC_SAT_U_SUB_FMT_1(T)   \
 void __attribute__((noinline))   \
diff --git a/gcc/testsuite/gcc.target/riscv/sat_u_sub-21.c 
b/gcc/testsuite/gcc.target/riscv/sat_u_sub-21.c
new file mode 100644
index 000..9a8fb7f1c91
--- /dev/null
+++ b/gcc/testsuite/gcc.target/riscv/sat_u_sub-21.c
@@ -0,0 +1,18 @@
+/* { dg-do compile } */
+/* { dg-options "-march=rv64gc -mabi=lp64d -O3 -fdump-rtl-expand-details -fno-schedule-insns -fno-schedule-insns2" } */
+/* { dg-final { check-function-bodies "**" "" } } */
+
+#include "sat_arith.h"
+
+/*
+** sat_u_sub_uint8_t_fmt_6:
+** sub\s+[atx][0-9]+,\s*a0,\s*a1
+** sltu\s+[atx][0-9]+,\s*[atx][0-9]+,\s*[atx][0-9]+
+** addi\s+[atx][0-9]+,\s*[atx][0-9]+,\s*-1
+** and\s+a0,\s*a0,\s*[atx][0-9]+
+** andi\s+a0,\s*a0,\s*0xff
+** ret
+*/
+DEF_SAT_U_SUB_FMT_6(uint8_t)
+
+/* { dg-final { scan-rtl-dump-times ".SAT_SUB " 2 "expand" } } */
diff --git a/gcc/testsuite/gcc.target/riscv/sat_u_sub-22.c b/gcc/testsuite/gcc.target/riscv/sat_u_sub-22.c
new file mode 100644
index 000..6182169edc5
--- /dev/null
+++ b/gcc/testsuite/gcc.target/riscv/sat_u_sub-22.c
@@ -0,0 +1,19 @@
+/* { dg-do compile } */
+/* { dg-options "-march=rv64gc -mabi=lp64d -O3 -fdump-rtl-expand-details -fno-schedule-insns -fno-schedule-insns2" } */
+/* { dg-final { check-function-bodies "**" "" } } */
+
+#include "sat_arith.h"
+
+/*
+** sat_u_sub_uint16_t_fmt_6:
+** sub\s+[atx][0-9]+,\s*a0,\s*a1
+** sltu\s+[atx][0-9]+,\s*a0,\s*a1
+** addi\s+[atx][0-9]+,\s*[atx][0-9]+,\s*-1
+** and\s+a0,\s*[atx][0-9]+,\s*[atx][0-9]+
+** slli\s+a0,\s*a0,\s*48
+** srli\s+a0,\s*a0,\s*48
+** ret
+*/
+DEF_SAT_U_SUB_FMT_6(uint16_t)
+
+/* { dg-final { scan-rtl-dump-times ".SAT_SUB " 2 "expand" } } */
diff --git a/gcc/testsuite/gcc.target/riscv/sat_u_sub-23.c b/gcc/testsuite/gcc.target/riscv/sat_u_sub-23.c
new file mode 100644
index 000..820110cdbb0
--- /dev/null
+++ b/gcc/testsuite/gcc.target/riscv/sat_u_sub-23.c
@@ -0,0 +1,18 @@

[PATCH v1 5/8] RISC-V: Add testcases for scalar unsigned SAT_SUB form 7

2024-06-13 Thread pan2 . li
From: Pan Li 

Now that the middle-end supports form 7 of unsigned SAT_SUB and
the RISC-V backend implements the scalar .SAT_SUB, add more test
cases to cover form 7 of unsigned .SAT_SUB.

Form 7:
  #define SAT_SUB_U_7(T) \
  T sat_sub_u_7_##T (T x, T y) \
  { \
    T ret; \
    T overflow = __builtin_sub_overflow (x, y, &ret); \
    return ret & (T)(overflow - 1); \
  }

Passed the rv64gcv full regression test.
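
For readers who want to try form 7 on a host compiler, here is a minimal sketch of the macro written out for uint8_t. The function names are hypothetical and not part of the patch; the only assumption is a compiler that provides __builtin_sub_overflow (GCC or Clang).

```c
#include <assert.h>
#include <stdint.h>

/* Form 7 written out by hand for uint8_t (hypothetical name).  */
uint8_t
sat_sub_u8_form7 (uint8_t x, uint8_t y)
{
  uint8_t ret;
  /* 1 when x - y does not fit in uint8_t, i.e. when x < y; else 0.  */
  uint8_t overflow = __builtin_sub_overflow (x, y, &ret);
  /* overflow == 0: the mask is (uint8_t)-1 == 0xff and ret is kept.
     overflow == 1: the mask is 0 and the result saturates to 0.  */
  return ret & (uint8_t)(overflow - 1);
}

/* Hypothetical self-check over a few boundary values.  */
void
check_form7 (void)
{
  assert (sat_sub_u8_form7 (5, 3) == 2);
  assert (sat_sub_u8_form7 (3, 5) == 0);
  assert (sat_sub_u8_form7 (7, 7) == 0);
  assert (sat_sub_u8_form7 (255, 0) == 255);
}
```

The cast on the mask matters for narrow types: overflow - 1 is computed as int, so it has to be truncated back to T before the AND.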

gcc/testsuite/ChangeLog:

* gcc.target/riscv/sat_arith.h: Add helper macro for test.
* gcc.target/riscv/sat_u_sub-25.c: New test.
* gcc.target/riscv/sat_u_sub-26.c: New test.
* gcc.target/riscv/sat_u_sub-27.c: New test.
* gcc.target/riscv/sat_u_sub-28.c: New test.
* gcc.target/riscv/sat_u_sub-run-25.c: New test.
* gcc.target/riscv/sat_u_sub-run-26.c: New test.
* gcc.target/riscv/sat_u_sub-run-27.c: New test.
* gcc.target/riscv/sat_u_sub-run-28.c: New test.

Signed-off-by: Pan Li 
---
 gcc/testsuite/gcc.target/riscv/sat_arith.h| 10 
 gcc/testsuite/gcc.target/riscv/sat_u_sub-25.c | 18 +
 gcc/testsuite/gcc.target/riscv/sat_u_sub-26.c | 19 ++
 gcc/testsuite/gcc.target/riscv/sat_u_sub-27.c | 18 +
 gcc/testsuite/gcc.target/riscv/sat_u_sub-28.c | 17 +
 .../gcc.target/riscv/sat_u_sub-run-25.c   | 25 +++
 .../gcc.target/riscv/sat_u_sub-run-26.c   | 25 +++
 .../gcc.target/riscv/sat_u_sub-run-27.c   | 25 +++
 .../gcc.target/riscv/sat_u_sub-run-28.c   | 25 +++
 9 files changed, 182 insertions(+)
 create mode 100644 gcc/testsuite/gcc.target/riscv/sat_u_sub-25.c
 create mode 100644 gcc/testsuite/gcc.target/riscv/sat_u_sub-26.c
 create mode 100644 gcc/testsuite/gcc.target/riscv/sat_u_sub-27.c
 create mode 100644 gcc/testsuite/gcc.target/riscv/sat_u_sub-28.c
 create mode 100644 gcc/testsuite/gcc.target/riscv/sat_u_sub-run-25.c
 create mode 100644 gcc/testsuite/gcc.target/riscv/sat_u_sub-run-26.c
 create mode 100644 gcc/testsuite/gcc.target/riscv/sat_u_sub-run-27.c
 create mode 100644 gcc/testsuite/gcc.target/riscv/sat_u_sub-run-28.c

diff --git a/gcc/testsuite/gcc.target/riscv/sat_arith.h b/gcc/testsuite/gcc.target/riscv/sat_arith.h
index 4296235cf62..bde054d5c9d 100644
--- a/gcc/testsuite/gcc.target/riscv/sat_arith.h
+++ b/gcc/testsuite/gcc.target/riscv/sat_arith.h
@@ -120,12 +120,22 @@ sat_u_sub_##T##_fmt_6 (T x, T y) \
   return x <= y ? 0 : x - y; \
 }
 
+#define DEF_SAT_U_SUB_FMT_7(T)  \
+T __attribute__((noinline)) \
+sat_u_sub_##T##_fmt_7 (T x, T y)\
+{   \
+  T ret;\
+  T overflow = __builtin_sub_overflow (x, y, &ret); \
+  return ret & (T)(overflow - 1);   \
+}
+
 #define RUN_SAT_U_SUB_FMT_1(T, x, y) sat_u_sub_##T##_fmt_1(x, y)
 #define RUN_SAT_U_SUB_FMT_2(T, x, y) sat_u_sub_##T##_fmt_2(x, y)
 #define RUN_SAT_U_SUB_FMT_3(T, x, y) sat_u_sub_##T##_fmt_3(x, y)
 #define RUN_SAT_U_SUB_FMT_4(T, x, y) sat_u_sub_##T##_fmt_4(x, y)
 #define RUN_SAT_U_SUB_FMT_5(T, x, y) sat_u_sub_##T##_fmt_5(x, y)
 #define RUN_SAT_U_SUB_FMT_6(T, x, y) sat_u_sub_##T##_fmt_6(x, y)
+#define RUN_SAT_U_SUB_FMT_7(T, x, y) sat_u_sub_##T##_fmt_7(x, y)
 
 #define DEF_VEC_SAT_U_SUB_FMT_1(T)   \
 void __attribute__((noinline))   \
diff --git a/gcc/testsuite/gcc.target/riscv/sat_u_sub-25.c b/gcc/testsuite/gcc.target/riscv/sat_u_sub-25.c
new file mode 100644
index 000..8780ef0c8f1
--- /dev/null
+++ b/gcc/testsuite/gcc.target/riscv/sat_u_sub-25.c
@@ -0,0 +1,18 @@
+/* { dg-do compile } */
+/* { dg-options "-march=rv64gc -mabi=lp64d -O3 -fdump-rtl-expand-details -fno-schedule-insns -fno-schedule-insns2" } */
+/* { dg-final { check-function-bodies "**" "" } } */
+
+#include "sat_arith.h"
+
+/*
+** sat_u_sub_uint8_t_fmt_7:
+** sub\s+[atx][0-9]+,\s*a0,\s*a1
+** sltu\s+[atx][0-9]+,\s*[atx][0-9]+,\s*[atx][0-9]+
+** addi\s+[atx][0-9]+,\s*[atx][0-9]+,\s*-1
+** and\s+a0,\s*a0,\s*[atx][0-9]+
+** andi\s+a0,\s*a0,\s*0xff
+** ret
+*/
+DEF_SAT_U_SUB_FMT_7(uint8_t)
+
+/* { dg-final { scan-rtl-dump-times ".SAT_SUB " 2 "expand" } } */
diff --git a/gcc/testsuite/gcc.target/riscv/sat_u_sub-26.c b/gcc/testsuite/gcc.target/riscv/sat_u_sub-26.c
new file mode 100644
index 000..f720f619d09
--- /dev/null
+++ b/gcc/testsuite/gcc.target/riscv/sat_u_sub-26.c
@@ -0,0 +1,19 @@
+/* { dg-do compile } */
+/* { dg-options "-march=rv64gc -mabi=lp64d -O3 -fdump-rtl-expand-details -fno-schedule-insns -fno-schedule-insns2" } */
+/* { dg-final { check-function-bodies "**" "" } } */
+
+#include "sat_arith.h"
+
+/*
+** sat_u_sub_uint16_t_fmt_7:
+** sub\s+[atx][0-9]+,\s*a0,\s*a1
+** sltu\s+[atx][0-9]+,\s*a0,\s*a1
+** addi\s+[atx][0-9]+,\s*[atx][0-9]+,\s*-1
+** and\s+a0,\s*[atx][0-9]+,\s*[atx][0-9]+
+** slli\s+a0,\s*a0,\s*48

[PATCH v1 3/8] RISC-V: Add testcases for scalar unsigned SAT_SUB form 5

2024-06-13 Thread pan2 . li
From: Pan Li 

Now that the middle-end supports form 5 of unsigned SAT_SUB and
the RISC-V backend implements the scalar .SAT_SUB, add more test
cases to cover form 5 of unsigned .SAT_SUB.

Form 5:
  #define SAT_SUB_U_5(T) \
  T sat_sub_u_5_##T (T x, T y) \
  { \
    return x < y ? 0 : x - y; \
  }

Passed the rv64gcv full regression test.
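
The function bodies expected by the tests in this patch are a branchless sub / sltu / addi -1 / and sequence. As a rough sketch (hypothetical function names, written for a register-width uint64_t so that no final truncation is needed), the same steps can be spelled out in C and compared against form 5:

```c
#include <assert.h>
#include <stdint.h>

/* Form 5 instantiated by hand for uint64_t (hypothetical name).  */
uint64_t
sat_sub_u64_form5 (uint64_t x, uint64_t y)
{
  return x < y ? 0 : x - y;
}

/* The same computation, one statement per expected instruction.  */
uint64_t
sat_sub_u64_steps (uint64_t x, uint64_t y)
{
  uint64_t diff = x - y;	/* sub: wraps modulo 2^64 when x < y.  */
  uint64_t borrow = x < y;	/* sltu: 1 when x < y, else 0.  */
  uint64_t mask = borrow - 1;	/* addi -1: 0 when x < y, else all ones.  */
  return diff & mask;		/* and: keep diff or force 0.  */
}

/* Hypothetical self-check: both versions agree on boundary values.  */
void
check_form5_steps (void)
{
  static const uint64_t v[] = { 0, 1, 2, 1000, UINT64_MAX - 1, UINT64_MAX };
  for (unsigned i = 0; i < sizeof v / sizeof v[0]; i++)
    for (unsigned j = 0; j < sizeof v / sizeof v[0]; j++)
      assert (sat_sub_u64_form5 (v[i], v[j]) == sat_sub_u64_steps (v[i], v[j]));
}
```

For uint8_t and uint16_t the tests expect one more step after the and (andi 0xff, or slli/srli by 48) to bring the result back to the narrow type.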

gcc/testsuite/ChangeLog:

* gcc.target/riscv/sat_arith.h: Add helper macro for test.
* gcc.target/riscv/sat_u_sub-17.c: New test.
* gcc.target/riscv/sat_u_sub-18.c: New test.
* gcc.target/riscv/sat_u_sub-19.c: New test.
* gcc.target/riscv/sat_u_sub-20.c: New test.
* gcc.target/riscv/sat_u_sub-run-17.c: New test.
* gcc.target/riscv/sat_u_sub-run-18.c: New test.
* gcc.target/riscv/sat_u_sub-run-19.c: New test.
* gcc.target/riscv/sat_u_sub-run-20.c: New test.

Signed-off-by: Pan Li 
---
 gcc/testsuite/gcc.target/riscv/sat_arith.h|  8 ++
 gcc/testsuite/gcc.target/riscv/sat_u_sub-17.c | 18 +
 gcc/testsuite/gcc.target/riscv/sat_u_sub-18.c | 19 ++
 gcc/testsuite/gcc.target/riscv/sat_u_sub-19.c | 18 +
 gcc/testsuite/gcc.target/riscv/sat_u_sub-20.c | 17 +
 .../gcc.target/riscv/sat_u_sub-run-17.c   | 25 +++
 .../gcc.target/riscv/sat_u_sub-run-18.c   | 25 +++
 .../gcc.target/riscv/sat_u_sub-run-19.c   | 25 +++
 .../gcc.target/riscv/sat_u_sub-run-20.c   | 25 +++
 9 files changed, 180 insertions(+)
 create mode 100644 gcc/testsuite/gcc.target/riscv/sat_u_sub-17.c
 create mode 100644 gcc/testsuite/gcc.target/riscv/sat_u_sub-18.c
 create mode 100644 gcc/testsuite/gcc.target/riscv/sat_u_sub-19.c
 create mode 100644 gcc/testsuite/gcc.target/riscv/sat_u_sub-20.c
 create mode 100644 gcc/testsuite/gcc.target/riscv/sat_u_sub-run-17.c
 create mode 100644 gcc/testsuite/gcc.target/riscv/sat_u_sub-run-18.c
 create mode 100644 gcc/testsuite/gcc.target/riscv/sat_u_sub-run-19.c
 create mode 100644 gcc/testsuite/gcc.target/riscv/sat_u_sub-run-20.c

diff --git a/gcc/testsuite/gcc.target/riscv/sat_arith.h b/gcc/testsuite/gcc.target/riscv/sat_arith.h
index b2f8478d36b..d08755dd861 100644
--- a/gcc/testsuite/gcc.target/riscv/sat_arith.h
+++ b/gcc/testsuite/gcc.target/riscv/sat_arith.h
@@ -106,10 +106,18 @@ sat_u_sub_##T##_fmt_4 (T x, T y) \
   return x >= y ? x - y : 0; \
 }
 
+#define DEF_SAT_U_SUB_FMT_5(T)   \
+T __attribute__((noinline))  \
+sat_u_sub_##T##_fmt_5 (T x, T y) \
+{\
+  return x < y ? 0 : x - y;  \
+}
+
 #define RUN_SAT_U_SUB_FMT_1(T, x, y) sat_u_sub_##T##_fmt_1(x, y)
 #define RUN_SAT_U_SUB_FMT_2(T, x, y) sat_u_sub_##T##_fmt_2(x, y)
 #define RUN_SAT_U_SUB_FMT_3(T, x, y) sat_u_sub_##T##_fmt_3(x, y)
 #define RUN_SAT_U_SUB_FMT_4(T, x, y) sat_u_sub_##T##_fmt_4(x, y)
+#define RUN_SAT_U_SUB_FMT_5(T, x, y) sat_u_sub_##T##_fmt_5(x, y)
 
 #define DEF_VEC_SAT_U_SUB_FMT_1(T)   \
 void __attribute__((noinline))   \
diff --git a/gcc/testsuite/gcc.target/riscv/sat_u_sub-17.c b/gcc/testsuite/gcc.target/riscv/sat_u_sub-17.c
new file mode 100644
index 000..853ddcfd285
--- /dev/null
+++ b/gcc/testsuite/gcc.target/riscv/sat_u_sub-17.c
@@ -0,0 +1,18 @@
+/* { dg-do compile } */
+/* { dg-options "-march=rv64gc -mabi=lp64d -O3 -fdump-rtl-expand-details -fno-schedule-insns -fno-schedule-insns2" } */
+/* { dg-final { check-function-bodies "**" "" } } */
+
+#include "sat_arith.h"
+
+/*
+** sat_u_sub_uint8_t_fmt_5:
+** sub\s+[atx][0-9]+,\s*a0,\s*a1
+** sltu\s+[atx][0-9]+,\s*[atx][0-9]+,\s*[atx][0-9]+
+** addi\s+[atx][0-9]+,\s*[atx][0-9]+,\s*-1
+** and\s+a0,\s*a0,\s*[atx][0-9]+
+** andi\s+a0,\s*a0,\s*0xff
+** ret
+*/
+DEF_SAT_U_SUB_FMT_5(uint8_t)
+
+/* { dg-final { scan-rtl-dump-times ".SAT_SUB " 2 "expand" } } */
diff --git a/gcc/testsuite/gcc.target/riscv/sat_u_sub-18.c b/gcc/testsuite/gcc.target/riscv/sat_u_sub-18.c
new file mode 100644
index 000..423a6f82170
--- /dev/null
+++ b/gcc/testsuite/gcc.target/riscv/sat_u_sub-18.c
@@ -0,0 +1,19 @@
+/* { dg-do compile } */
+/* { dg-options "-march=rv64gc -mabi=lp64d -O3 -fdump-rtl-expand-details -fno-schedule-insns -fno-schedule-insns2" } */
+/* { dg-final { check-function-bodies "**" "" } } */
+
+#include "sat_arith.h"
+
+/*
+** sat_u_sub_uint16_t_fmt_5:
+** sub\s+[atx][0-9]+,\s*a0,\s*a1
+** sltu\s+[atx][0-9]+,\s*a0,\s*a1
+** addi\s+[atx][0-9]+,\s*[atx][0-9]+,\s*-1
+** and\s+a0,\s*[atx][0-9]+,\s*[atx][0-9]+
+** slli\s+a0,\s*a0,\s*48
+** srli\s+a0,\s*a0,\s*48
+** ret
+*/
+DEF_SAT_U_SUB_FMT_5(uint16_t)
+
+/* { dg-final { scan-rtl-dump-times ".SAT_SUB " 2 "expand" } } */
diff --git a/gcc/testsuite/gcc.target/riscv/sat_u_sub-19.c b/gcc/testsuite/gcc.target/riscv/sat_u_sub-19.c
new file mode 100644
index 000..29b9c235d97
--- /dev/null
+++ b/gcc/testsuite/gcc.target/riscv/sat_u_sub-19.c
@@ -0,0 +1,18 @@
+/* { dg-do compile } */

[PATCH v1 6/8] RISC-V: Add testcases for scalar unsigned SAT_SUB form 8

2024-06-13 Thread pan2 . li
From: Pan Li 

Now that the middle-end supports form 8 of unsigned SAT_SUB and
the RISC-V backend implements the scalar .SAT_SUB, add more test
cases to cover form 8 of unsigned .SAT_SUB.

Form 8:
  #define SAT_SUB_U_8(T) \
  T sat_sub_u_8_##T (T x, T y) \
  { \
    T ret; \
    T overflow = __builtin_sub_overflow (x, y, &ret); \
    return ret & (T)-(!overflow); \
  }

Passed the rv64gcv full regression test.
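
Form 8 builds its mask by negating the logical NOT of the overflow flag. A small sketch for uint16_t, with hypothetical names and assuming a compiler that provides __builtin_sub_overflow, shows how the mask comes out:

```c
#include <assert.h>
#include <stdint.h>

/* Form 8 written out by hand for uint16_t (hypothetical name).  */
uint16_t
sat_sub_u16_form8 (uint16_t x, uint16_t y)
{
  uint16_t ret;
  /* 1 when x - y does not fit in uint16_t, i.e. when x < y; else 0.  */
  uint16_t overflow = __builtin_sub_overflow (x, y, &ret);
  /* !overflow is the int 1 or 0, so -(!overflow) is -1 or 0; the cast
     turns that into the mask 0xffff (keep ret) or 0 (saturate).  */
  return ret & (uint16_t)-(!overflow);
}

/* Hypothetical self-check over a few boundary values.  */
void
check_form8 (void)
{
  assert (sat_sub_u16_form8 (1000, 1) == 999);
  assert (sat_sub_u16_form8 (1, 1000) == 0);
  assert (sat_sub_u16_form8 (65535, 65535) == 0);
  assert (sat_sub_u16_form8 (65535, 0) == 65535);
}
```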

gcc/testsuite/ChangeLog:

* gcc.target/riscv/sat_arith.h: Add helper macro for test.
* gcc.target/riscv/sat_u_sub-29.c: New test.
* gcc.target/riscv/sat_u_sub-30.c: New test.
* gcc.target/riscv/sat_u_sub-31.c: New test.
* gcc.target/riscv/sat_u_sub-32.c: New test.
* gcc.target/riscv/sat_u_sub-run-29.c: New test.
* gcc.target/riscv/sat_u_sub-run-30.c: New test.
* gcc.target/riscv/sat_u_sub-run-31.c: New test.
* gcc.target/riscv/sat_u_sub-run-32.c: New test.

Signed-off-by: Pan Li 
---
 gcc/testsuite/gcc.target/riscv/sat_arith.h| 10 
 gcc/testsuite/gcc.target/riscv/sat_u_sub-29.c | 18 +
 gcc/testsuite/gcc.target/riscv/sat_u_sub-30.c | 19 ++
 gcc/testsuite/gcc.target/riscv/sat_u_sub-31.c | 18 +
 gcc/testsuite/gcc.target/riscv/sat_u_sub-32.c | 17 +
 .../gcc.target/riscv/sat_u_sub-run-29.c   | 25 +++
 .../gcc.target/riscv/sat_u_sub-run-30.c   | 25 +++
 .../gcc.target/riscv/sat_u_sub-run-31.c   | 25 +++
 .../gcc.target/riscv/sat_u_sub-run-32.c   | 25 +++
 9 files changed, 182 insertions(+)
 create mode 100644 gcc/testsuite/gcc.target/riscv/sat_u_sub-29.c
 create mode 100644 gcc/testsuite/gcc.target/riscv/sat_u_sub-30.c
 create mode 100644 gcc/testsuite/gcc.target/riscv/sat_u_sub-31.c
 create mode 100644 gcc/testsuite/gcc.target/riscv/sat_u_sub-32.c
 create mode 100644 gcc/testsuite/gcc.target/riscv/sat_u_sub-run-29.c
 create mode 100644 gcc/testsuite/gcc.target/riscv/sat_u_sub-run-30.c
 create mode 100644 gcc/testsuite/gcc.target/riscv/sat_u_sub-run-31.c
 create mode 100644 gcc/testsuite/gcc.target/riscv/sat_u_sub-run-32.c

diff --git a/gcc/testsuite/gcc.target/riscv/sat_arith.h b/gcc/testsuite/gcc.target/riscv/sat_arith.h
index bde054d5c9d..9f901de5cdf 100644
--- a/gcc/testsuite/gcc.target/riscv/sat_arith.h
+++ b/gcc/testsuite/gcc.target/riscv/sat_arith.h
@@ -129,6 +129,15 @@ sat_u_sub_##T##_fmt_7 (T x, T y)\
   return ret & (T)(overflow - 1);   \
 }
 
+#define DEF_SAT_U_SUB_FMT_8(T)  \
+T __attribute__((noinline)) \
+sat_u_sub_##T##_fmt_8 (T x, T y)\
+{   \
+  T ret;\
+  T overflow = __builtin_sub_overflow (x, y, &ret); \
+  return ret & (T)-(!overflow); \
+}
+
 #define RUN_SAT_U_SUB_FMT_1(T, x, y) sat_u_sub_##T##_fmt_1(x, y)
 #define RUN_SAT_U_SUB_FMT_2(T, x, y) sat_u_sub_##T##_fmt_2(x, y)
 #define RUN_SAT_U_SUB_FMT_3(T, x, y) sat_u_sub_##T##_fmt_3(x, y)
@@ -136,6 +145,7 @@ sat_u_sub_##T##_fmt_7 (T x, T y)\
 #define RUN_SAT_U_SUB_FMT_5(T, x, y) sat_u_sub_##T##_fmt_5(x, y)
 #define RUN_SAT_U_SUB_FMT_6(T, x, y) sat_u_sub_##T##_fmt_6(x, y)
 #define RUN_SAT_U_SUB_FMT_7(T, x, y) sat_u_sub_##T##_fmt_7(x, y)
+#define RUN_SAT_U_SUB_FMT_8(T, x, y) sat_u_sub_##T##_fmt_8(x, y)
 
 #define DEF_VEC_SAT_U_SUB_FMT_1(T)   \
 void __attribute__((noinline))   \
diff --git a/gcc/testsuite/gcc.target/riscv/sat_u_sub-29.c b/gcc/testsuite/gcc.target/riscv/sat_u_sub-29.c
new file mode 100644
index 000..1a2da50256e
--- /dev/null
+++ b/gcc/testsuite/gcc.target/riscv/sat_u_sub-29.c
@@ -0,0 +1,18 @@
+/* { dg-do compile } */
+/* { dg-options "-march=rv64gc -mabi=lp64d -O3 -fdump-rtl-expand-details -fno-schedule-insns -fno-schedule-insns2" } */
+/* { dg-final { check-function-bodies "**" "" } } */
+
+#include "sat_arith.h"
+
+/*
+** sat_u_sub_uint8_t_fmt_8:
+** sub\s+[atx][0-9]+,\s*a0,\s*a1
+** sltu\s+[atx][0-9]+,\s*[atx][0-9]+,\s*[atx][0-9]+
+** addi\s+[atx][0-9]+,\s*[atx][0-9]+,\s*-1
+** and\s+a0,\s*a0,\s*[atx][0-9]+
+** andi\s+a0,\s*a0,\s*0xff
+** ret
+*/
+DEF_SAT_U_SUB_FMT_8(uint8_t)
+
+/* { dg-final { scan-rtl-dump-times ".SAT_SUB " 2 "expand" } } */
diff --git a/gcc/testsuite/gcc.target/riscv/sat_u_sub-30.c b/gcc/testsuite/gcc.target/riscv/sat_u_sub-30.c
new file mode 100644
index 000..75aa7506369
--- /dev/null
+++ b/gcc/testsuite/gcc.target/riscv/sat_u_sub-30.c
@@ -0,0 +1,19 @@
+/* { dg-do compile } */
+/* { dg-options "-march=rv64gc -mabi=lp64d -O3 -fdump-rtl-expand-details -fno-schedule-insns -fno-schedule-insns2" } */
+/* { dg-final { check-function-bodies "**" "" } } */
+
+#include "sat_arith.h"
+
+/*
+** sat_u_sub_uint16_t_fmt_8:
+** sub\s+[atx][0-9]+,\s*a0,\s*a1
+** sltu\s+[atx][0-9]+,\s*a0,\s*a1
+** 

[PATCH v1 2/8] RISC-V: Add testcases for scalar unsigned SAT_SUB form 4

2024-06-13 Thread pan2 . li
From: Pan Li 

Now that the middle-end supports form 4 of unsigned SAT_SUB and
the RISC-V backend implements the scalar .SAT_SUB, add more test
cases to cover form 4 of unsigned .SAT_SUB.

Form 4:
  #define SAT_SUB_U_4(T) \
  T sat_sub_u_4_##T (T x, T y) \
  { \
    return x >= y ? x - y : 0; \
  }

Passed the rv64gcv full regression test.
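
Because uint8_t has only 65536 input pairs, form 4 can be checked exhaustively against a reference computed in a wider type. The sketch below uses hypothetical names and is independent of the run tests added by the patch:

```c
#include <assert.h>
#include <stdint.h>

/* Form 4 instantiated by hand for uint8_t (hypothetical name).  */
uint8_t
sat_sub_u8_form4 (uint8_t x, uint8_t y)
{
  return x >= y ? x - y : 0;
}

/* Count the inputs where form 4 differs from a reference that subtracts
   in int and clamps negative results to 0.  */
int
count_form4_mismatches (void)
{
  int mismatches = 0;
  for (int x = 0; x <= UINT8_MAX; x++)
    for (int y = 0; y <= UINT8_MAX; y++)
      {
	int wide = x - y;
	uint8_t expected = wide < 0 ? 0 : (uint8_t) wide;
	if (sat_sub_u8_form4 ((uint8_t) x, (uint8_t) y) != expected)
	  mismatches++;
      }
  return mismatches;
}
```

Form 4 differs from form 3 only in using >= rather than >; both return 0 when x == y, so they describe the same function.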

gcc/testsuite/ChangeLog:

* gcc.target/riscv/sat_arith.h: Add helper macro for test.
* gcc.target/riscv/sat_u_sub-13.c: New test.
* gcc.target/riscv/sat_u_sub-14.c: New test.
* gcc.target/riscv/sat_u_sub-15.c: New test.
* gcc.target/riscv/sat_u_sub-16.c: New test.
* gcc.target/riscv/sat_u_sub-run-13.c: New test.
* gcc.target/riscv/sat_u_sub-run-14.c: New test.
* gcc.target/riscv/sat_u_sub-run-15.c: New test.
* gcc.target/riscv/sat_u_sub-run-16.c: New test.

Signed-off-by: Pan Li 
---
 gcc/testsuite/gcc.target/riscv/sat_arith.h|  8 ++
 gcc/testsuite/gcc.target/riscv/sat_u_sub-13.c | 18 +
 gcc/testsuite/gcc.target/riscv/sat_u_sub-14.c | 19 ++
 gcc/testsuite/gcc.target/riscv/sat_u_sub-15.c | 18 +
 gcc/testsuite/gcc.target/riscv/sat_u_sub-16.c | 17 +
 .../gcc.target/riscv/sat_u_sub-run-13.c   | 25 +++
 .../gcc.target/riscv/sat_u_sub-run-14.c   | 25 +++
 .../gcc.target/riscv/sat_u_sub-run-15.c   | 25 +++
 .../gcc.target/riscv/sat_u_sub-run-16.c   | 25 +++
 9 files changed, 180 insertions(+)
 create mode 100644 gcc/testsuite/gcc.target/riscv/sat_u_sub-13.c
 create mode 100644 gcc/testsuite/gcc.target/riscv/sat_u_sub-14.c
 create mode 100644 gcc/testsuite/gcc.target/riscv/sat_u_sub-15.c
 create mode 100644 gcc/testsuite/gcc.target/riscv/sat_u_sub-16.c
 create mode 100644 gcc/testsuite/gcc.target/riscv/sat_u_sub-run-13.c
 create mode 100644 gcc/testsuite/gcc.target/riscv/sat_u_sub-run-14.c
 create mode 100644 gcc/testsuite/gcc.target/riscv/sat_u_sub-run-15.c
 create mode 100644 gcc/testsuite/gcc.target/riscv/sat_u_sub-run-16.c

diff --git a/gcc/testsuite/gcc.target/riscv/sat_arith.h b/gcc/testsuite/gcc.target/riscv/sat_arith.h
index 50c65cdea49..b2f8478d36b 100644
--- a/gcc/testsuite/gcc.target/riscv/sat_arith.h
+++ b/gcc/testsuite/gcc.target/riscv/sat_arith.h
@@ -99,9 +99,17 @@ sat_u_sub_##T##_fmt_3 (T x, T y)  \
   return x > y ? x - y : 0;   \
 }
 
+#define DEF_SAT_U_SUB_FMT_4(T)   \
+T __attribute__((noinline))  \
+sat_u_sub_##T##_fmt_4 (T x, T y) \
+{\
+  return x >= y ? x - y : 0; \
+}
+
 #define RUN_SAT_U_SUB_FMT_1(T, x, y) sat_u_sub_##T##_fmt_1(x, y)
 #define RUN_SAT_U_SUB_FMT_2(T, x, y) sat_u_sub_##T##_fmt_2(x, y)
 #define RUN_SAT_U_SUB_FMT_3(T, x, y) sat_u_sub_##T##_fmt_3(x, y)
+#define RUN_SAT_U_SUB_FMT_4(T, x, y) sat_u_sub_##T##_fmt_4(x, y)
 
 #define DEF_VEC_SAT_U_SUB_FMT_1(T)   \
 void __attribute__((noinline))   \
diff --git a/gcc/testsuite/gcc.target/riscv/sat_u_sub-13.c b/gcc/testsuite/gcc.target/riscv/sat_u_sub-13.c
new file mode 100644
index 000..edb7017f9b1
--- /dev/null
+++ b/gcc/testsuite/gcc.target/riscv/sat_u_sub-13.c
@@ -0,0 +1,18 @@
+/* { dg-do compile } */
+/* { dg-options "-march=rv64gc -mabi=lp64d -O3 -fdump-rtl-expand-details -fno-schedule-insns -fno-schedule-insns2" } */
+/* { dg-final { check-function-bodies "**" "" } } */
+
+#include "sat_arith.h"
+
+/*
+** sat_u_sub_uint8_t_fmt_4:
+** sub\s+[atx][0-9]+,\s*a0,\s*a1
+** sltu\s+[atx][0-9]+,\s*[atx][0-9]+,\s*[atx][0-9]+
+** addi\s+[atx][0-9]+,\s*[atx][0-9]+,\s*-1
+** and\s+a0,\s*a0,\s*[atx][0-9]+
+** andi\s+a0,\s*a0,\s*0xff
+** ret
+*/
+DEF_SAT_U_SUB_FMT_4(uint8_t)
+
+/* { dg-final { scan-rtl-dump-times ".SAT_SUB " 2 "expand" } } */
diff --git a/gcc/testsuite/gcc.target/riscv/sat_u_sub-14.c b/gcc/testsuite/gcc.target/riscv/sat_u_sub-14.c
new file mode 100644
index 000..2aab9f65586
--- /dev/null
+++ b/gcc/testsuite/gcc.target/riscv/sat_u_sub-14.c
@@ -0,0 +1,19 @@
+/* { dg-do compile } */
+/* { dg-options "-march=rv64gc -mabi=lp64d -O3 -fdump-rtl-expand-details -fno-schedule-insns -fno-schedule-insns2" } */
+/* { dg-final { check-function-bodies "**" "" } } */
+
+#include "sat_arith.h"
+
+/*
+** sat_u_sub_uint16_t_fmt_4:
+** sub\s+[atx][0-9]+,\s*a0,\s*a1
+** sltu\s+[atx][0-9]+,\s*a0,\s*a1
+** addi\s+[atx][0-9]+,\s*[atx][0-9]+,\s*-1
+** and\s+a0,\s*[atx][0-9]+,\s*[atx][0-9]+
+** slli\s+a0,\s*a0,\s*48
+** srli\s+a0,\s*a0,\s*48
+** ret
+*/
+DEF_SAT_U_SUB_FMT_4(uint16_t)
+
+/* { dg-final { scan-rtl-dump-times ".SAT_SUB " 2 "expand" } } */
diff --git a/gcc/testsuite/gcc.target/riscv/sat_u_sub-15.c b/gcc/testsuite/gcc.target/riscv/sat_u_sub-15.c
new file mode 100644
index 000..25ad702bf04
--- /dev/null
+++ b/gcc/testsuite/gcc.target/riscv/sat_u_sub-15.c
@@ -0,0 +1,18 @@
+/* { dg-do compile } */
+/* { dg-options "-march=rv64gc -mabi=lp64d -O3 

[PATCH v1 1/8] RISC-V: Add testcases for scalar unsigned SAT_SUB form 3

2024-06-13 Thread pan2 . li
From: Pan Li 

Now that the middle-end supports form 3 of unsigned SAT_SUB and
the RISC-V backend implements the scalar .SAT_SUB, add more test
cases to cover form 3 of unsigned .SAT_SUB.

Form 3:
  #define SAT_SUB_U_3(T) \
  T sat_sub_u_3_##T (T x, T y) \
  { \
    return x > y ? x - y : 0; \
  }

Passed the rv64gcv full regression test.
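
As a sketch of how the form 3 macro is meant to be used, it can be instantiated once per unsigned type, much as the tests instantiate DEF_SAT_U_SUB_FMT_3 from sat_arith.h. The macro name below is local to this example and the check function is hypothetical:

```c
#include <assert.h>
#include <stdint.h>

/* Form 3 as given in the description; ## pastes the type name into the
   function name, e.g. sat_sub_u_3_uint8_t.  */
#define DEF_SAT_SUB_U_3(T)    \
T sat_sub_u_3_##T (T x, T y)  \
{                             \
  return x > y ? x - y : 0;   \
}

DEF_SAT_SUB_U_3 (uint8_t)
DEF_SAT_SUB_U_3 (uint16_t)
DEF_SAT_SUB_U_3 (uint32_t)
DEF_SAT_SUB_U_3 (uint64_t)

/* Hypothetical self-check: one saturating and one plain case per type.  */
void
check_form3 (void)
{
  assert (sat_sub_u_3_uint8_t (100, 200) == 0);
  assert (sat_sub_u_3_uint8_t (200, 100) == 100);
  assert (sat_sub_u_3_uint16_t (0, 65535) == 0);
  assert (sat_sub_u_3_uint16_t (65535, 1) == 65534);
  assert (sat_sub_u_3_uint32_t (5, 5) == 0);
  assert (sat_sub_u_3_uint64_t (UINT64_MAX, 1) == UINT64_MAX - 1);
}
```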

gcc/testsuite/ChangeLog:

* gcc.target/riscv/sat_arith.h: Add helper macro for test.
* gcc.target/riscv/sat_u_sub-10.c: New test.
* gcc.target/riscv/sat_u_sub-11.c: New test.
* gcc.target/riscv/sat_u_sub-12.c: New test.
* gcc.target/riscv/sat_u_sub-9.c: New test.
* gcc.target/riscv/sat_u_sub-run-10.c: New test.
* gcc.target/riscv/sat_u_sub-run-11.c: New test.
* gcc.target/riscv/sat_u_sub-run-12.c: New test.
* gcc.target/riscv/sat_u_sub-run-9.c: New test.

Signed-off-by: Pan Li 
---
 gcc/testsuite/gcc.target/riscv/sat_arith.h|  8 ++
 gcc/testsuite/gcc.target/riscv/sat_u_sub-10.c | 19 ++
 gcc/testsuite/gcc.target/riscv/sat_u_sub-11.c | 18 +
 gcc/testsuite/gcc.target/riscv/sat_u_sub-12.c | 17 +
 gcc/testsuite/gcc.target/riscv/sat_u_sub-9.c  | 18 +
 .../gcc.target/riscv/sat_u_sub-run-10.c   | 25 +++
 .../gcc.target/riscv/sat_u_sub-run-11.c   | 25 +++
 .../gcc.target/riscv/sat_u_sub-run-12.c   | 25 +++
 .../gcc.target/riscv/sat_u_sub-run-9.c| 25 +++
 9 files changed, 180 insertions(+)
 create mode 100644 gcc/testsuite/gcc.target/riscv/sat_u_sub-10.c
 create mode 100644 gcc/testsuite/gcc.target/riscv/sat_u_sub-11.c
 create mode 100644 gcc/testsuite/gcc.target/riscv/sat_u_sub-12.c
 create mode 100644 gcc/testsuite/gcc.target/riscv/sat_u_sub-9.c
 create mode 100644 gcc/testsuite/gcc.target/riscv/sat_u_sub-run-10.c
 create mode 100644 gcc/testsuite/gcc.target/riscv/sat_u_sub-run-11.c
 create mode 100644 gcc/testsuite/gcc.target/riscv/sat_u_sub-run-12.c
 create mode 100644 gcc/testsuite/gcc.target/riscv/sat_u_sub-run-9.c

diff --git a/gcc/testsuite/gcc.target/riscv/sat_arith.h b/gcc/testsuite/gcc.target/riscv/sat_arith.h
index bc9a372b6df..50c65cdea49 100644
--- a/gcc/testsuite/gcc.target/riscv/sat_arith.h
+++ b/gcc/testsuite/gcc.target/riscv/sat_arith.h
@@ -92,8 +92,16 @@ sat_u_sub_##T##_fmt_2 (T x, T y)  \
   return (x - y) & (-(T)(x > y)); \
 }
 
+#define DEF_SAT_U_SUB_FMT_3(T)\
+T __attribute__((noinline))   \
+sat_u_sub_##T##_fmt_3 (T x, T y)  \
+{ \
+  return x > y ? x - y : 0;   \
+}
+
 #define RUN_SAT_U_SUB_FMT_1(T, x, y) sat_u_sub_##T##_fmt_1(x, y)
 #define RUN_SAT_U_SUB_FMT_2(T, x, y) sat_u_sub_##T##_fmt_2(x, y)
+#define RUN_SAT_U_SUB_FMT_3(T, x, y) sat_u_sub_##T##_fmt_3(x, y)
 
 #define DEF_VEC_SAT_U_SUB_FMT_1(T)   \
 void __attribute__((noinline))   \
diff --git a/gcc/testsuite/gcc.target/riscv/sat_u_sub-10.c b/gcc/testsuite/gcc.target/riscv/sat_u_sub-10.c
new file mode 100644
index 000..6e78164865f
--- /dev/null
+++ b/gcc/testsuite/gcc.target/riscv/sat_u_sub-10.c
@@ -0,0 +1,19 @@
+/* { dg-do compile } */
+/* { dg-options "-march=rv64gc -mabi=lp64d -O3 -fdump-rtl-expand-details -fno-schedule-insns -fno-schedule-insns2" } */
+/* { dg-final { check-function-bodies "**" "" } } */
+
+#include "sat_arith.h"
+
+/*
+** sat_u_sub_uint16_t_fmt_3:
+** sub\s+[atx][0-9]+,\s*a0,\s*a1
+** sltu\s+[atx][0-9]+,\s*a0,\s*a1
+** addi\s+[atx][0-9]+,\s*[atx][0-9]+,\s*-1
+** and\s+a0,\s*[atx][0-9]+,\s*[atx][0-9]+
+** slli\s+a0,\s*a0,\s*48
+** srli\s+a0,\s*a0,\s*48
+** ret
+*/
+DEF_SAT_U_SUB_FMT_3(uint16_t)
+
+/* { dg-final { scan-rtl-dump-times ".SAT_SUB " 2 "expand" } } */
diff --git a/gcc/testsuite/gcc.target/riscv/sat_u_sub-11.c b/gcc/testsuite/gcc.target/riscv/sat_u_sub-11.c
new file mode 100644
index 000..84e34657f55
--- /dev/null
+++ b/gcc/testsuite/gcc.target/riscv/sat_u_sub-11.c
@@ -0,0 +1,18 @@
+/* { dg-do compile } */
+/* { dg-options "-march=rv64gc -mabi=lp64d -O3 -fdump-rtl-expand-details -fno-schedule-insns -fno-schedule-insns2" } */
+/* { dg-final { check-function-bodies "**" "" } } */
+
+#include "sat_arith.h"
+
+/*
+** sat_u_sub_uint32_t_fmt_3:
+** sub\s+[atx][0-9]+,\s*a0,\s*a1
+** sltu\s+[atx][0-9]+,\s*a0,\s*a1
+** addi\s+[atx][0-9]+,\s*[atx][0-9]+,\s*-1
+** and\s+a0,\s*[atx][0-9]+,\s*[atx][0-9]+
+** sext.w\s+a0,\s*a0
+** ret
+*/
+DEF_SAT_U_SUB_FMT_3(uint32_t)
+
+/* { dg-final { scan-rtl-dump-times ".SAT_SUB " 2 "expand" } } */
diff --git a/gcc/testsuite/gcc.target/riscv/sat_u_sub-12.c b/gcc/testsuite/gcc.target/riscv/sat_u_sub-12.c
new file mode 100644
index 000..eea282b21ae
--- /dev/null
+++ b/gcc/testsuite/gcc.target/riscv/sat_u_sub-12.c
@@ -0,0 +1,17 @@
+/* { dg-do compile } */
+/* { dg-options "-march=rv64gc -mabi=lp64d -O3 -fdump-rtl-expand-details -fno-schedule-insns -fno-schedule-insns2" } */
+/* { dg-final { 

Re: [PATCH] expand: constify sepops operand to expand_expr_real_2 and expand_widen_pattern_expr [PR113212]

2024-06-13 Thread Jeff Law




On 6/13/24 7:54 PM, Andrew Pinski wrote:

While working on an expand patch back in January I noticed that
the first argument (of sepops type) of expand_expr_real_2 could be
constified as it was not to be touched by the function (nor should it be).
There is code in internal-fn.cc that depends on expand_expr_real_2 not touching
the ops argument so constification makes this more obvious.

Bootstrapped and tested on x86_64-linux-gnu with no regressions.

gcc/ChangeLog:

PR middle-end/113212
* expr.h (const_sepops): New typedef.
(expand_expr_real_2): Constify the first argument.
* optabs.cc (expand_widen_pattern_expr): Likewise.
* optabs.h (expand_widen_pattern_expr): Likewise.
* expr.cc (expand_expr_real_2): Likewise.
(do_store_flag): Likewise. Remove incorrect store to ops->code.

OK.
jeff



[PATCH] expand: constify sepops operand to expand_expr_real_2 and expand_widen_pattern_expr [PR113212]

2024-06-13 Thread Andrew Pinski
While working on an expand patch back in January I noticed that
the first argument (of sepops type) of expand_expr_real_2 could be
constified as it was not to be touched by the function (nor should it be).
There is code in internal-fn.cc that depends on expand_expr_real_2 not touching
the ops argument so constification makes this more obvious.

Bootstrapped and tested on x86_64-linux-gnu with no regressions.
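
To illustrate the idea outside of GCC, here is a minimal sketch with a hypothetical miniature struct. None of these names exist in GCC; the real separate_ops holds tree operands and more fields. It shows why a separate pointer-to-const typedef is needed and how a function that wants a different code can work on a local copy, loosely following what do_store_flag does after this patch.

```c
#include <assert.h>

/* Hypothetical miniature of separate_ops; only the shape matters.  */
struct mini_ops
{
  int code;
  int op0, op1;
};
typedef struct mini_ops *mini_sepops;
/* Pointer to a const struct.  Writing "const mini_sepops" instead would
   make the pointer itself const and leave the struct writable.  */
typedef const struct mini_ops *const_mini_sepops;

/* Stand-in for an expander that only reads its operands.  */
int
mini_expand (const_mini_sepops ops)
{
  return ops->code * 100 + ops->op0 + ops->op1;
}

/* When a different code is wanted, modify a local copy and leave the
   caller's struct alone.  */
int
mini_expand_with_code (const_mini_sepops ops, int new_code)
{
  if (new_code != ops->code)
    {
      struct mini_ops nops = *ops;
      nops.code = new_code;	/* ops->code = new_code would not compile.  */
      return mini_expand (&nops);
    }
  return mini_expand (ops);
}

/* Hypothetical self-check: the caller's struct is left untouched.  */
void
check_mini_ops (void)
{
  struct mini_ops o = { 1, 2, 3 };
  assert (mini_expand_with_code (&o, 4) == 405);
  assert (o.code == 1);
  assert (mini_expand_with_code (&o, 1) == 105);
}
```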

gcc/ChangeLog:

PR middle-end/113212
* expr.h (const_sepops): New typedef.
(expand_expr_real_2): Constify the first argument.
* optabs.cc (expand_widen_pattern_expr): Likewise.
* optabs.h (expand_widen_pattern_expr): Likewise.
* expr.cc (expand_expr_real_2): Likewise.
(do_store_flag): Likewise. Remove incorrect store to ops->code.

Signed-off-by: Andrew Pinski 
---
 gcc/expr.cc   | 8 
 gcc/expr.h| 4 +++-
 gcc/optabs.cc | 2 +-
 gcc/optabs.h  | 2 +-
 4 files changed, 9 insertions(+), 7 deletions(-)

diff --git a/gcc/expr.cc b/gcc/expr.cc
index 04bad5e1425..9cecc1758f5 100644
--- a/gcc/expr.cc
+++ b/gcc/expr.cc
@@ -94,7 +94,7 @@ static unsigned HOST_WIDE_INT highest_pow2_factor_for_target (const_tree, const_
 
 static bool is_aligning_offset (const_tree, const_tree);
 static rtx reduce_to_bit_field_precision (rtx, rtx, tree);
-static rtx do_store_flag (sepops, rtx, machine_mode);
+static rtx do_store_flag (const_sepops, rtx, machine_mode);
 #ifdef PUSH_ROUNDING
 static void emit_single_push_insn (machine_mode, rtx, tree);
 #endif
@@ -9643,7 +9643,7 @@ expand_expr_divmod (tree_code code, machine_mode mode, tree treeop0,
 }
 
 rtx
-expand_expr_real_2 (sepops ops, rtx target, machine_mode tmode,
+expand_expr_real_2 (const_sepops ops, rtx target, machine_mode tmode,
enum expand_modifier modifier)
 {
   rtx op0, op1, op2, temp;
@@ -13504,7 +13504,7 @@ expand_single_bit_test (location_t loc, enum tree_code code,
set/jump/set sequence.  */
 
 static rtx
-do_store_flag (sepops ops, rtx target, machine_mode mode)
+do_store_flag (const_sepops ops, rtx target, machine_mode mode)
 {
   enum rtx_code code;
   tree arg0, arg1, type;
@@ -13566,7 +13566,7 @@ do_store_flag (sepops ops, rtx target, machine_mode mode)
   if (new_code != ops->code)
{
  struct separate_ops nops = *ops;
- nops.code = ops->code = new_code;
+ nops.code = new_code;
  nops.op0 = arg0;
  nops.op1 = arg1;
  nops.type = TREE_TYPE (arg0);
diff --git a/gcc/expr.h b/gcc/expr.h
index 75181584108..533ae0af387 100644
--- a/gcc/expr.h
+++ b/gcc/expr.h
@@ -53,6 +53,8 @@ typedef struct separate_ops
   tree type;
   tree op0, op1, op2;
 } *sepops;
+
+typedef const struct separate_ops *const_sepops;
 
 /* This is run during target initialization to set up which modes can be
used directly in memory and to initialize the block move optab.  */
@@ -305,7 +307,7 @@ extern rtx expand_expr_real (tree, rtx, machine_mode,
 enum expand_modifier, rtx *, bool);
 extern rtx expand_expr_real_1 (tree, rtx, machine_mode,
   enum expand_modifier, rtx *, bool);
-extern rtx expand_expr_real_2 (sepops, rtx, machine_mode,
+extern rtx expand_expr_real_2 (const_sepops, rtx, machine_mode,
   enum expand_modifier);
 extern rtx expand_expr_real_gassign (gassign *, rtx, machine_mode,
 enum expand_modifier modifier,
diff --git a/gcc/optabs.cc b/gcc/optabs.cc
index 78cd9ef3448..c54d275b8b7 100644
--- a/gcc/optabs.cc
+++ b/gcc/optabs.cc
@@ -253,7 +253,7 @@ widen_operand (rtx op, machine_mode mode, machine_mode oldmode,
type-promotion (vec-unpack)  1   oprnd0  -   -  */
 
 rtx
-expand_widen_pattern_expr (sepops ops, rtx op0, rtx op1, rtx wide_op,
+expand_widen_pattern_expr (const_sepops ops, rtx op0, rtx op1, rtx wide_op,
   rtx target, int unsignedp)
 {
   class expand_operand eops[4];
diff --git a/gcc/optabs.h b/gcc/optabs.h
index c0b8df5268f..301847e2186 100644
--- a/gcc/optabs.h
+++ b/gcc/optabs.h
@@ -182,7 +182,7 @@ enum optab_methods
   OPTAB_MUST_WIDEN
 };
 
-extern rtx expand_widen_pattern_expr (struct separate_ops *, rtx , rtx , rtx,
+extern rtx expand_widen_pattern_expr (const struct separate_ops *, rtx , rtx , rtx,
   rtx, int);
 extern rtx expand_ternary_op (machine_mode mode, optab ternary_optab,
  rtx op0, rtx op1, rtx op2, rtx target,
-- 
2.43.0



Re: [PATCH] [i386] restore recompute to override opts after change [PR113719]

2024-06-13 Thread Alexandre Oliva
On Jun 13, 2024, Hongyu Wang  wrote:

> I think the function name can be like ix86_unroll_flag_adjust instead
> of ix86_override_options_after_change_1, like the previous 2 functions
> which declares the usage more clearly.

I'd be happy to rename it, but we have a long-established convention of
using _<N> suffixes for functions that are wrapped or otherwise
formerly part of another, and that would likely be immediately
recognizable by GCC developers, whereas there seems to be a bit of an
organically evolutionary mess of recompute/override/adjust/tweak
functions in this general area, so I figured we'd be better off sticking
to a name that more clearly refers to the target hook.

-- 
Alexandre Oliva, happy hacker            https://FSFLA.org/blogs/lxo/
   Free Software Activist   GNU Toolchain Engineer
More tolerance and less prejudice are key for inclusion and diversity
Excluding neuro-others for not behaving ""normal"" is *not* inclusive


[PATCH 2/2] [APX CFCMOV] Support APX CFCMOV

2024-06-13 Thread Kong, Lingling
From: konglin1 <lingling.k...@intel.com>



The APX CFCMOV feature implements conditional faulting: all memory faults
are suppressed when the condition code evaluates to false while loading or
storing a memory operand. With it, a conditional move can load or store a
memory operand that may trap or fault.

To enable CFCMOV, we add a target hook TARGET_HAVE_CONDITIONAL_MOVE_MEM_NOTRAP
to the if-conversion pass to allow conversion to cmov.

Bootstrapped & regtested on x86-64-pc-linux-gnu with binutils 2.42 branch.
OK for trunk?
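
As a rough source-level illustration (hypothetical function names, not taken from the new tests), these are the kinds of conditional load and store the description refers to. The memory access must not happen when the condition is false, so a branch-free form is only safe if the instruction suppresses the fault in that case, which is what CFCMOV is described as doing.

```c
#include <assert.h>
#include <stddef.h>

/* "if (test) x = a; else x = b;" where a is a load that may fault.  */
int
cond_load (int test, const int *a, int b)
{
  int x;
  if (test)
    x = *a;
  else
    x = b;
  return x;
}

/* Conditional store: *x must stay untouched when test is false.  */
void
cond_store (int test, int *x, int a)
{
  if (test)
    *x = a;
}

/* Hypothetical self-check: the false paths pass a null pointer, which is
   only valid because the access is skipped.  */
void
check_cond_moves (void)
{
  int v = 42;
  assert (cond_load (1, &v, 7) == 42);
  assert (cond_load (0, NULL, 7) == 7);
  cond_store (0, NULL, 1);
  cond_store (1, &v, 9);
  assert (v == 9);
}
```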



gcc/ChangeLog:

   * config/i386/i386-expand.cc (ix86_can_cfcmov_p): New function that
   tests if the cfcmov can be generated.
   (ix86_expand_int_movcc): Expand to cfcmov pattern if ix86_can_cfcmov_p
   returns true.
   * config/i386/i386-opts.h (enum apx_features): Add apx_cfcmov.
   * config/i386/i386.cc (ix86_have_conditional_move_mem_notrap): New
   function to hook TARGET_HAVE_CONDITIONAL_MOVE_MEM_NOTRAP.
   (TARGET_HAVE_CONDITIONAL_MOVE_MEM_NOTRAP): Target hook define.
   (ix86_rtx_costs): Add UNSPEC_APX_CFCMOV cost.
   * config/i386/i386.h (TARGET_APX_CFCMOV): Define.
   * config/i386/i386.md (*cfcmov_1): New define_insn to support
   cfcmov.
   (*cfcmov_2): Ditto.
   (UNSPEC_APX_CFCMOV): New unspec for cfcmov.
   * config/i386/i386.opt: Add enum value for cfcmov.
   * ifcvt.cc (noce_try_cmove_load_mem_notrap): Use target hook to allow
   conversion to cfcmov for conditional load.
   (noce_try_cmove_store_mem_notrap): Convert to conditional store.
   (noce_process_if_block): Ditto.

gcc/testsuite/ChangeLog:

   * gcc.target/i386/apx-cfcmov-1.c: New test.
   * gcc.target/i386/apx-cfcmov-2.c: Ditto.

---
 gcc/config/i386/i386-expand.cc               |  63 +
 gcc/config/i386/i386-opts.h                  |   4 +-
 gcc/config/i386/i386.cc                      |  33 ++-
 gcc/config/i386/i386.h                       |   1 +
 gcc/config/i386/i386.md                      |  53 +++-
 gcc/config/i386/i386.opt                     |   3 +
 gcc/config/i386/predicates.md                |   7 +
 gcc/ifcvt.cc                                 | 247 ++-
 gcc/testsuite/gcc.target/i386/apx-cfcmov-1.c |  73 ++
 gcc/testsuite/gcc.target/i386/apx-cfcmov-2.c |  40 +++
 10 files changed, 511 insertions(+), 13 deletions(-)
 create mode 100644 gcc/testsuite/gcc.target/i386/apx-cfcmov-1.c
 create mode 100644 gcc/testsuite/gcc.target/i386/apx-cfcmov-2.c



diff --git a/gcc/config/i386/i386-expand.cc b/gcc/config/i386/i386-expand.cc
index 312329e550b..c02a4bcbec3 100644
--- a/gcc/config/i386/i386-expand.cc
+++ b/gcc/config/i386/i386-expand.cc
@@ -3336,6 +3336,30 @@ ix86_expand_int_addcc (rtx operands[])
   return true;
 }
 
+/* Return TRUE if we could convert "if (test) x = a; else x = b;" to cfcmov,
+   especially when load a or b or x store may cause memory faults.  */
+bool
+ix86_can_cfcmov_p (rtx x, rtx a, rtx b)
+{
+  machine_mode mode = GET_MODE (x);
+  if (TARGET_APX_CFCMOV
+  && (mode == DImode || mode == SImode || mode == HImode))
+{
+  /* C load (r m r), (r m C), (r r m). For r m m could use
+ two cfcmov. */
+  if (register_operand (x, mode)
+   && ((MEM_P (a) && register_operand (b, mode))
+   || (MEM_P (a) && b == const0_rtx)
+   || (register_operand (a, mode) && MEM_P (b))
+   || (MEM_P (a) && MEM_P (b))))
+ return true;
+  /* C store  (m r 0).  */
+  else if (MEM_P (x) && x == b && register_operand (a, mode))
+ return true;
+}
+  return false;
+}
+
 bool
 ix86_expand_int_movcc (rtx operands[])
 {
@@ -3366,6 +3390,45 @@ ix86_expand_int_movcc (rtx operands[])
 
   compare_code = GET_CODE (compare_op);
 
+  if (MEM_P (operands[0])
+  && !ix86_can_cfcmov_p (operands[0], op2, op3))
+return false;
+
+  if (may_trap_or_fault_p (op2) || may_trap_or_fault_p (op3))
+  {
+ if (ix86_can_cfcmov_p (operands[0], op2, op3))
+   {
+ if (may_trap_or_fault_p (op2))
+   op2 = gen_rtx_UNSPEC (mode, gen_rtvec (1, operands[2]),
+  UNSPEC_APX_CFCMOV);
+ if (may_trap_or_fault_p (op3))
+   op3 = gen_rtx_UNSPEC (mode, gen_rtvec (1, operands[3]),
+  UNSPEC_APX_CFCMOV);
+ emit_insn (compare_seq);
+
+ if (may_trap_or_fault_p (op2) && may_trap_or_fault_p (op3))
+   {
+emit_insn (gen_rtx_SET (operands[0],

+ 

[PATCH 1/2] Add a new target hook: TARGET_HAVE_CONDITIONAL_MOVE_MEM_NOTRAP

2024-06-13 Thread Kong, Lingling
From: konglin1 

gcc/ChangeLog:

* doc/tm.texi: Regenerated.
* doc/tm.texi.in: Add TARGET_HAVE_CONDITIONAL_MOVE_MEM_NOTRAP
* target.def (bool,): New hook.
* targhooks.cc (default_have_conditional_move_mem_notrap): New
function to hook TARGET_HAVE_CONDITIONAL_MOVE_MEM_NOTRAP.
* targhooks.h (default_have_conditional_move_mem_notrap): New
target hook declaration.
---
 gcc/doc/tm.texi|  6 ++
 gcc/doc/tm.texi.in |  2 ++
 gcc/target.def | 11 +++
 gcc/targhooks.cc   |  8 
 gcc/targhooks.h|  1 +
 5 files changed, 28 insertions(+)

diff --git a/gcc/doc/tm.texi b/gcc/doc/tm.texi
index 8a7aa70d605..f8faf44ab73 100644
--- a/gcc/doc/tm.texi
+++ b/gcc/doc/tm.texi
@@ -7311,6 +7311,12 @@ candidate as a replacement for the if-convertible 
sequence described in
 @code{if_info}.
 @end deftypefn
 
+@deftypefn {Target Hook} bool TARGET_HAVE_CONDITIONAL_MOVE_MEM_NOTRAP (rtx 
@var{x})
+This hook returns true if the target supports condition move instructions
+  that enables fault suppression of memory operands when the condition code
+  evaluates to false.
+@end deftypefn
+
 @deftypefn {Target Hook} bool TARGET_NEW_ADDRESS_PROFITABLE_P (rtx 
@var{memref}, rtx_insn * @var{insn}, rtx @var{new_addr})
 Return @code{true} if it is profitable to replace the address in
 @var{memref} with @var{new_addr}.  This allows targets to prevent the
diff --git a/gcc/doc/tm.texi.in b/gcc/doc/tm.texi.in
index 9e0830758ae..17c122aea43 100644
--- a/gcc/doc/tm.texi.in
+++ b/gcc/doc/tm.texi.in
@@ -4748,6 +4748,8 @@ Define this macro if a non-short-circuit operation 
produced by
 
 @hook TARGET_NOCE_CONVERSION_PROFITABLE_P
 
+@hook TARGET_HAVE_CONDITIONAL_MOVE_MEM_NOTRAP
+
 @hook TARGET_NEW_ADDRESS_PROFITABLE_P
 
 @hook TARGET_NO_SPECULATION_IN_DELAY_SLOTS_P
diff --git a/gcc/target.def b/gcc/target.def
index 70070caebc7..aa77737e006 100644
--- a/gcc/target.def
+++ b/gcc/target.def
@@ -3993,6 +3993,17 @@ candidate as a replacement for the if-convertible 
sequence described in\n\
 bool, (rtx_insn *seq, struct noce_if_info *if_info),
 default_noce_conversion_profitable_p)
 
+/* Return true if the target support condition move instructions that enables
+   fault suppression of memory operands when the condition code evaluates to
+   false.  */
+DEFHOOK
+(have_conditional_move_mem_notrap,
+ "This hook returns true if the target supports condition move instructions\n\
+  that enables fault suppression of memory operands when the condition code\n\
+  evaluates to false.",
+bool, (rtx x),
+default_have_conditional_move_mem_notrap)
+
 /* Return true if new_addr should be preferred over the existing address used 
by
memref in insn.  */
 DEFHOOK
diff --git a/gcc/targhooks.cc b/gcc/targhooks.cc
index fb339bf75dd..a616371b204 100644
--- a/gcc/targhooks.cc
+++ b/gcc/targhooks.cc
@@ -2816,4 +2816,12 @@ default_memtag_untagged_pointer (rtx tagged_pointer, rtx 
target)
   return untagged_base;
 }
 
+/* The default implementation of
+   TARGET_HAVE_CONDITIONAL_MOVE_MEM_NOTRAP.  */
+bool
+default_have_conditional_move_mem_notrap (rtx x ATTRIBUTE_UNUSED)
+{
+  return false;
+}
+
 #include "gt-targhooks.h"
diff --git a/gcc/targhooks.h b/gcc/targhooks.h
index 85f3817c176..f8ea2fde53d 100644
--- a/gcc/targhooks.h
+++ b/gcc/targhooks.h
@@ -305,5 +305,6 @@ extern rtx default_memtag_add_tag (rtx, poly_int64, 
uint8_t);
 extern rtx default_memtag_set_tag (rtx, rtx, rtx);
 extern rtx default_memtag_extract_tag (rtx, rtx);
 extern rtx default_memtag_untagged_pointer (rtx, rtx);
+extern bool default_have_conditional_move_mem_notrap (rtx x);
 
 #endif /* GCC_TARGHOOKS_H */
-- 
2.31.1



[PATCH] x86: Emit cvtne2ps2bf16 for odd increasing perm in __builtin_shufflevector

2024-06-13 Thread Levy Hsu
This patch updates the GCC x86 backend to efficiently handle
odd, incrementally increasing permutations of BF16 vectors
using the cvtne2ps2bf16 instruction.
It modifies ix86_vectorize_vec_perm_const to support these operations
and adds a specific predicate to ensure proper sequence handling.

Bootstrapped and tested on x86_64-linux-gnu, OK for trunk?
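
As a rough illustration of what "odd, incrementally increasing" means for the selector, the check can be modelled in plain C as below. This is only a sketch: the helper name and its array-based interface are hypothetical and are not the RTL predicate from the patch.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>

/* Hypothetical helper: return true if every selector element equals
   2*i + 1, i.e. the sequence 1, 3, 5, ...  */
bool
is_odd_increasing_perm (const int *sel, size_t n)
{
  assert (sel != NULL || n == 0);
  for (size_t i = 0; i < n; ++i)
    if (sel[i] != (int) (2 * i + 1))
      return false;
  return true;
}
```

For an 8-element BF16 vector pair this accepts exactly the selector 1, 3, 5, 7, 9, 11, 13, 15 and rejects anything else, including the even sequence 0, 2, 4, ...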


[PATCH] x86: Emit cvtne2ps2bf16 for odd increasing perm in __builtin_shufflevector

2024-06-13 Thread Levy Hsu
gcc/ChangeLog:

* config/i386/i386-expand.cc
(ix86_vectorize_vec_perm_const): Convert BF to HI using subreg.
* config/i386/predicates.md
(vcvtne2ps2bf_parallel): New predicate that matches odd increasing perm.
* config/i386/sse.md
(vpermt2_sepcial_bf16_shuffle_): New define_insn_and_split.

gcc/testsuite/ChangeLog:

* gcc.target/i386/vpermt2-special-bf16-shufflue.c: New test.
---
 gcc/config/i386/i386-expand.cc|  4 +--
 gcc/config/i386/predicates.md | 11 ++
 gcc/config/i386/sse.md| 35 +++
 .../i386/vpermt2-special-bf16-shufflue.c  | 27 ++
 4 files changed, 75 insertions(+), 2 deletions(-)
 create mode 100755 
gcc/testsuite/gcc.target/i386/vpermt2-special-bf16-shufflue.c

diff --git a/gcc/config/i386/i386-expand.cc b/gcc/config/i386/i386-expand.cc
index 312329e550b..3d599c0651a 100644
--- a/gcc/config/i386/i386-expand.cc
+++ b/gcc/config/i386/i386-expand.cc
@@ -23657,8 +23657,8 @@ ix86_vectorize_vec_perm_const (machine_mode vmode, 
machine_mode op_mode,
   if (GET_MODE_SIZE (vmode) == 64 && !TARGET_EVEX512)
 return false;
 
-  /* For HF mode vector, convert it to HI using subreg.  */
-  if (GET_MODE_INNER (vmode) == HFmode)
+  /* For HF and BF mode vector, convert it to HI using subreg.  */
+  if (GET_MODE_INNER (vmode) == HFmode || GET_MODE_INNER (vmode) == BFmode)
 {
   machine_mode orig_mode = vmode;
   vmode = mode_for_vector (HImode,
diff --git a/gcc/config/i386/predicates.md b/gcc/config/i386/predicates.md
index 7afe3100cb7..1676c50de71 100644
--- a/gcc/config/i386/predicates.md
+++ b/gcc/config/i386/predicates.md
@@ -2322,3 +2322,14 @@
 
   return true;
 })
+
+;; Check that each element is odd and incrementally increasing from 1
+(define_predicate "vcvtne2ps2bf_parallel"
+  (and (match_code "const_vector")
+   (match_code "const_int" "a"))
+{
+  for (int i = 0; i < XVECLEN (op, 0); ++i)
+if (INTVAL (XVECEXP (op, 0, i)) != (2 * i + 1))
+  return false;
+  return true;
+})
diff --git a/gcc/config/i386/sse.md b/gcc/config/i386/sse.md
index 680a46a0b08..5ddd1c0a778 100644
--- a/gcc/config/i386/sse.md
+++ b/gcc/config/i386/sse.md
@@ -30698,3 +30698,38 @@
   "TARGET_AVXVNNIINT16"
   "vpdp\t{%3, %2, %0|%0, %2, %3}"
[(set_attr "prefix" "vex")])
+
+(define_mode_attr hi_cvt_bf
+  [(V8HI "v8bf") (V16HI "v16bf") (V32HI "v32bf")])
+
+(define_mode_attr HI_CVT_BF
+  [(V8HI "V8BF") (V16HI "V16BF") (V32HI "V32BF")])
+
+(define_insn_and_split "vpermt2_sepcial_bf16_shuffle_"
+  [(set (match_operand:VI2_AVX512F 0 "register_operand")
+   (unspec:VI2_AVX512F
+ [(match_operand:VI2_AVX512F 1 "vcvtne2ps2bf_parallel")
+  (match_operand:VI2_AVX512F 2 "register_operand")
+  (match_operand:VI2_AVX512F 3 "nonimmediate_operand")]
+  UNSPEC_VPERMT2))]
+  "TARGET_AVX512VL && TARGET_AVX512BF16 && ix86_pre_reload_split ()"
+  "#"
+  "&& 1"
+  [(const_int 0)]
+{
+  rtx op0 = gen_reg_rtx (mode);
+  operands[2] = lowpart_subreg (mode,
+   force_reg (mode, operands[2]),
+   mode);
+  operands[3] = lowpart_subreg (mode,
+   force_reg (mode, operands[3]),
+   mode);
+
+  emit_insn (gen_avx512f_cvtne2ps2bf16_(op0,
+  operands[3],
+  operands[2]));
+  emit_move_insn (operands[0], lowpart_subreg (mode, op0,
+  mode));
+  DONE;
+}
+[(set_attr "mode" "")])
diff --git a/gcc/testsuite/gcc.target/i386/vpermt2-special-bf16-shufflue.c 
b/gcc/testsuite/gcc.target/i386/vpermt2-special-bf16-shufflue.c
new file mode 100755
index 000..5c65f2a9884
--- /dev/null
+++ b/gcc/testsuite/gcc.target/i386/vpermt2-special-bf16-shufflue.c
@@ -0,0 +1,27 @@
+/* { dg-do compile } */
+/* { dg-options "-O2 -mavx512bf16 -mavx512vl" } */
+/* { dg-final { scan-assembler-not "vpermi2b" } } */
+/* { dg-final { scan-assembler-times "vcvtne2ps2bf16" 3 } } */
+
+typedef __bf16 v8bf __attribute__((vector_size(16)));
+typedef __bf16 v16bf __attribute__((vector_size(32)));
+typedef __bf16 v32bf __attribute__((vector_size(64)));
+
+v8bf foo0(v8bf a, v8bf b)
+{
+  return __builtin_shufflevector(a, b, 1, 3, 5, 7, 9, 11, 13, 15);
+}
+
+v16bf foo1(v16bf a, v16bf b)
+{
+  return __builtin_shufflevector(a, b, 1, 3, 5, 7, 9, 11, 13, 15,
+ 17, 19, 21, 23, 25, 27, 29, 31);
+}
+
+v32bf foo2(v32bf a, v32bf b)
+{
+  return __builtin_shufflevector(a, b, 1, 3, 5, 7, 9, 11, 13, 15, 
+ 17, 19, 21, 23, 25, 27, 29, 31, 
+ 33, 35, 37, 39, 41, 43, 45, 47, 
+ 49, 51, 53, 55, 57, 59, 61, 63);
+}
-- 
2.31.1



RE: [PATCH v7] Match: Support more form for scalar unsigned SAT_ADD

2024-06-13 Thread Li, Pan2
Thanks for another try.

It looks like the build failures listed below have nothing to do with this
patch. I think the relevant owner will take care of this -Werror build issue soon.

Pan

-----Original Message-----
From: Maciej W. Rozycki  
Sent: Friday, June 14, 2024 12:15 AM
To: Li, Pan2 
Cc: Richard Biener ; gcc-patches@gcc.gnu.org; 
juzhe.zh...@rivai.ai; kito.ch...@gmail.com; tamar.christ...@arm.com
Subject: RE: [PATCH v7] Match: Support more form for scalar unsigned SAT_ADD

On Thu, 13 Jun 2024, Li, Pan2 wrote:

> Could you please help to update the upstream for another try ?
> 
> Should be fixed by this commit 
> https://github.com/gcc-mirror/gcc/commit/d03ff3fd3e2da1352a404e3c53fe61314569345c.
> 
> Feel free to ping me if any questions or concerns.

 Upstream master (as at 609764a42f0c) doesn't build:

In file included from .../gcc/gcc/coretypes.h:487,
 from .../gcc/gcc/tree-vect-stmts.cc:24:
In member function 'bool poly_int::is_constant() const [with unsigned int 
N = 2; C = long unsigned int]',
inlined from 'C poly_int::to_constant() const [with unsigned int N = 
2; C = long unsigned int]' at .../gcc/gcc/poly-int.h:588:3,
inlined from 'bool get_group_load_store_type(vec_info*, stmt_vec_info, 
tree, slp_tree, bool, vec_load_store_type, vect_memory_access_type*, 
poly_int64*, dr_alignment_support*, int*, gather_scatter_info*, internal_fn*)' 
at .../gcc/gcc/tree-vect-stmts.cc:2155:39,
inlined from 'bool get_load_store_type(vec_info*, stmt_vec_info, tree, 
slp_tree, bool, vec_load_store_type, unsigned int, vect_memory_access_type*, 
poly_int64*, dr_alignment_support*, int*, gather_scatter_info*, internal_fn*)' 
at .../gcc/gcc/tree-vect-stmts.cc:2387:38:
.../gcc/gcc/poly-int.h:557:7: error: 'remain.poly_int<2, long unsigned 
int>::coeffs[1]' may be used uninitialized [-Werror=maybe-uninitialized]
  557 |   if (this->coeffs[i] != 0)
  |   ^~
.../gcc/gcc/tree-vect-stmts.cc: In function 'bool 
get_load_store_type(vec_info*, stmt_vec_info, tree, slp_tree, bool, 
vec_load_store_type, unsigned int, vect_memory_access_type*, poly_int64*, 
dr_alignment_support*, int*, gather_scatter_info*, internal_fn*)':
.../gcc/gcc/tree-vect-stmts.cc:2115:23: note: 'remain.poly_int<2, long unsigned 
int>::coeffs[1]' was declared here
 2115 |   poly_uint64 remain;
  |   ^~
In file included from .../gcc/gcc/system.h:1250,
 from .../gcc/gcc/tree-vect-stmts.cc:23:
In function 'int ceil_log2(long unsigned int)',
inlined from 'bool get_group_load_store_type(vec_info*, stmt_vec_info, 
tree, slp_tree, bool, vec_load_store_type, vect_memory_access_type*, 
poly_int64*, dr_alignment_support*, int*, gather_scatter_info*, internal_fn*)' 
at .../gcc/gcc/tree-vect-stmts.cc:2156:43,
inlined from 'bool get_load_store_type(vec_info*, stmt_vec_info, tree, 
slp_tree, bool, vec_load_store_type, unsigned int, vect_memory_access_type*, 
poly_int64*, dr_alignment_support*, int*, gather_scatter_info*, internal_fn*)' 
at .../gcc/gcc/tree-vect-stmts.cc:2387:38:
.../gcc/gcc/hwint.h:266:17: error: 'remain.poly_int<2, long unsigned 
int>::coeffs[0]' may be used uninitialized [-Werror=maybe-uninitialized]
  266 |   return x == 0 ? 0 : floor_log2 (x - 1) + 1;
  |  ~~~^~~~
.../gcc/gcc/tree-vect-stmts.cc: In function 'bool 
get_load_store_type(vec_info*, stmt_vec_info, tree, slp_tree, bool, 
vec_load_store_type, unsigned int, vect_memory_access_type*, poly_int64*, 
dr_alignment_support*, int*, gather_scatter_info*, internal_fn*)':
.../gcc/gcc/tree-vect-stmts.cc:2115:23: note: 'remain.poly_int<2, long unsigned 
int>::coeffs[0]' was declared here
 2115 |   poly_uint64 remain;
  |   ^~
cc1plus: all warnings being treated as errors
make[2]: *** [Makefile:1198: tree-vect-stmts.o] Error 1

and actually e14afbe2d1c6^ doesn't build either (I guess it just slipped
through bisection as the file didn't have to be rebuilt or something):

In file included from .../gcc/gcc/rtl.h:3973,
 from .../gcc/gcc/config/riscv/riscv.cc:31:
In function 'rtx_def* init_rtx_fmt_ee(rtx, machine_mode, rtx, rtx)',
inlined from 'rtx_def* gen_rtx_fmt_ee_stat(rtx_code, machine_mode, rtx, 
rtx)' at ./genrtl.h:50:26,
inlined from 'void riscv_move_integer(rtx, rtx, long int, machine_mode)' at 
.../gcc/gcc/config/riscv/riscv.cc:2786:10:
./genrtl.h:37:16: error: 'x' may be used uninitialized 
[-Werror=maybe-uninitialized]
   37 |   XEXP (rt, 0) = arg0;
.../gcc/gcc/config/riscv/riscv.cc: In function 'void riscv_move_integer(rtx, 
rtx, long int, machine_mode)':
.../gcc/gcc/config/riscv/riscv.cc:2723:7: note: 'x' was declared here
 2723 |   rtx x;
  |   ^
cc1plus: all warnings being treated as errors
make[2]: *** [Makefile:2563: riscv.o] Error 1

I hope you'll find this all useful.  As it happens I don't need to verify 
my needs with a RISC-V target anymore, so I'm leaving it all up to you now 
as I 

Re: [PATCH] build: Fix missing variable quotes

2024-06-13 Thread Sam James
Collin Funk  writes:

> When dlopen and pthread_create are in libc the variable is
> set to "none required", therefore running configure will show
> the following errors:
>
> ./configure: line 8997: test: too many arguments
> ./configure: line 8999: test: too many arguments
> ./configure: line 9003: test: too many arguments
> ./configure: line 9005: test: =: unary operator expected
>
> ChangeLog:
>
>   * configure.ac: Quote variable result of AC_SEARCH_LIBS.
> * configure: Regenerate.

This is PR115453 (which also needs to address a 'crate' typo).

>
> Signed-off-by: Collin Funk 
> ---
>  configure| 10 +-
>  configure.ac |  8 
>  2 files changed, 9 insertions(+), 9 deletions(-)
>
> diff --git a/configure b/configure
> index 51576a41f30..6e95b27d9df 100755
> --- a/configure
> +++ b/configure
> @@ -8994,15 +8994,15 @@ if test "$ac_res" != no; then :
>  fi
>  
>  
> -if test $ac_cv_search_dlopen = -ldl; then
> +if test "$ac_cv_search_dlopen" = -ldl; then
>  CRAB1_LIBS="$CRAB1_LIBS -ldl"
> -elif test $ac_cv_search_dlopen = no; then
> +elif test "$ac_cv_search_dlopen" = no; then
>  missing_rust_dynlibs="libdl"
>  fi
>  
> -if test $ac_cv_search_pthread_create = -lpthread; then
> +if test "$ac_cv_search_pthread_create" = -lpthread; then
>  CRAB1_LIBS="$CRAB1_LIBS -lpthread"
> -elif test $ac_cv_search_pthread_crate = no; then
> +elif test "$ac_cv_search_pthread_crate" = no; then
>  missing_rust_dynlibs="$missing_rust_dynlibs, libpthread"
>  fi
>  
> @@ -19746,7 +19746,7 @@ config.status
>  configured by $0, generated by GNU Autoconf 2.69,
>with options \\"\$ac_cs_config\\"
>  
> -Copyright (C) 2012 Free Software Foundation, Inc.
> +Copyright (C)  Free Software Foundation, Inc.
>  This config.status script is free software; the Free Software Foundation
>  gives unlimited permission to copy, distribute and modify it."
>  
> diff --git a/configure.ac b/configure.ac
> index 5eda8dcdbf7..88576b31bfc 100644
> --- a/configure.ac
> +++ b/configure.ac
> @@ -2045,15 +2045,15 @@ missing_rust_dynlibs=none
>  AC_SEARCH_LIBS([dlopen], [dl])
>  AC_SEARCH_LIBS([pthread_create], [pthread])
>  
> -if test $ac_cv_search_dlopen = -ldl; then
> +if test "$ac_cv_search_dlopen" = -ldl; then
>  CRAB1_LIBS="$CRAB1_LIBS -ldl"
> -elif test $ac_cv_search_dlopen = no; then
> +elif test "$ac_cv_search_dlopen" = no; then
>  missing_rust_dynlibs="libdl"
>  fi
>  
> -if test $ac_cv_search_pthread_create = -lpthread; then
> +if test "$ac_cv_search_pthread_create" = -lpthread; then
>  CRAB1_LIBS="$CRAB1_LIBS -lpthread"
> -elif test $ac_cv_search_pthread_crate = no; then
> +elif test "$ac_cv_search_pthread_crate" = no; then
>  missing_rust_dynlibs="$missing_rust_dynlibs, libpthread"
>  fi


signature.asc
Description: PGP signature


[PATCH] build: Fix missing variable quotes

2024-06-13 Thread Collin Funk
When dlopen and pthread_create are in libc the variable is
set to "none required", therefore running configure will show
the following errors:

./configure: line 8997: test: too many arguments
./configure: line 8999: test: too many arguments
./configure: line 9003: test: too many arguments
./configure: line 9005: test: =: unary operator expected

ChangeLog:

* configure.ac: Quote variable result of AC_SEARCH_LIBS.
* configure: Regenerate.

Signed-off-by: Collin Funk 
---
 configure| 10 +-
 configure.ac |  8 
 2 files changed, 9 insertions(+), 9 deletions(-)

diff --git a/configure b/configure
index 51576a41f30..6e95b27d9df 100755
--- a/configure
+++ b/configure
@@ -8994,15 +8994,15 @@ if test "$ac_res" != no; then :
 fi
 
 
-if test $ac_cv_search_dlopen = -ldl; then
+if test "$ac_cv_search_dlopen" = -ldl; then
 CRAB1_LIBS="$CRAB1_LIBS -ldl"
-elif test $ac_cv_search_dlopen = no; then
+elif test "$ac_cv_search_dlopen" = no; then
 missing_rust_dynlibs="libdl"
 fi
 
-if test $ac_cv_search_pthread_create = -lpthread; then
+if test "$ac_cv_search_pthread_create" = -lpthread; then
 CRAB1_LIBS="$CRAB1_LIBS -lpthread"
-elif test $ac_cv_search_pthread_crate = no; then
+elif test "$ac_cv_search_pthread_crate" = no; then
 missing_rust_dynlibs="$missing_rust_dynlibs, libpthread"
 fi
 
@@ -19746,7 +19746,7 @@ config.status
 configured by $0, generated by GNU Autoconf 2.69,
   with options \\"\$ac_cs_config\\"
 
-Copyright (C) 2012 Free Software Foundation, Inc.
+Copyright (C)  Free Software Foundation, Inc.
 This config.status script is free software; the Free Software Foundation
 gives unlimited permission to copy, distribute and modify it."
 
diff --git a/configure.ac b/configure.ac
index 5eda8dcdbf7..88576b31bfc 100644
--- a/configure.ac
+++ b/configure.ac
@@ -2045,15 +2045,15 @@ missing_rust_dynlibs=none
 AC_SEARCH_LIBS([dlopen], [dl])
 AC_SEARCH_LIBS([pthread_create], [pthread])
 
-if test $ac_cv_search_dlopen = -ldl; then
+if test "$ac_cv_search_dlopen" = -ldl; then
 CRAB1_LIBS="$CRAB1_LIBS -ldl"
-elif test $ac_cv_search_dlopen = no; then
+elif test "$ac_cv_search_dlopen" = no; then
 missing_rust_dynlibs="libdl"
 fi
 
-if test $ac_cv_search_pthread_create = -lpthread; then
+if test "$ac_cv_search_pthread_create" = -lpthread; then
 CRAB1_LIBS="$CRAB1_LIBS -lpthread"
-elif test $ac_cv_search_pthread_crate = no; then
+elif test "$ac_cv_search_pthread_crate" = no; then
 missing_rust_dynlibs="$missing_rust_dynlibs, libpthread"
 fi
 
-- 
2.45.2



RE: [PATCH v2] aarch64: Add vector popcount besides QImode [PR113859]

2024-06-13 Thread Pengxuan Zheng (QUIC)
> Pengxuan Zheng  writes:
> > This patch improves GCC’s vectorization of __builtin_popcount for
> > aarch64 target by adding popcount patterns for vector modes besides
> > QImode, i.e., HImode, SImode and DImode.
> >
> > With this patch, we now generate the following for V8HI:
> >   cnt v1.16b, v.16b
> >   uaddlp  v2.8h, v1.16b
> >
> > For V4HI, we generate:
> >   cnt v1.8b, v.8b
> >   uaddlp  v2.4h, v1.8b
> >
> > For V4SI, we generate:
> >   cnt v1.16b, v.16b
> >   uaddlp  v2.8h, v1.16b
> >   uaddlp  v3.4s, v2.8h
> >
> > For V2SI, we generate:
> >   cnt v1.8b, v.8b
> >   uaddlp  v2.4h, v1.8b
> >   uaddlp  v3.2s, v2.4h
> >
> > For V2DI, we generate:
> >   cnt v1.16b, v.16b
> >   uaddlp  v2.8h, v1.16b
> >   uaddlp  v3.4s, v2.8h
> >   uaddlp  v4.2d, v3.4s
> >
> > PR target/113859
> >
> > gcc/ChangeLog:
> >
> > * config/aarch64/aarch64-simd.md (aarch64_addlp):
> Rename to...
> > (@aarch64_addlp): ... This.
> > (popcount2): New define_expand.
> >
> > gcc/testsuite/ChangeLog:
> >
> > * gcc.target/aarch64/popcnt-vec.c: New test.
> > ---
> >  gcc/config/aarch64/aarch64-simd.md| 28 +++-
> >  gcc/testsuite/gcc.target/aarch64/popcnt-vec.c | 65
> > +++
> >  2 files changed, 92 insertions(+), 1 deletion(-)  create mode 100644
> > gcc/testsuite/gcc.target/aarch64/popcnt-vec.c
> >
> > diff --git a/gcc/config/aarch64/aarch64-simd.md
> > b/gcc/config/aarch64/aarch64-simd.md
> > index 0bb39091a38..38dba285f69 100644
> > --- a/gcc/config/aarch64/aarch64-simd.md
> > +++ b/gcc/config/aarch64/aarch64-simd.md
> > @@ -3461,7 +3461,7 @@ (define_insn
> "*aarch64_addlv_ze"
> >[(set_attr "type" "neon_reduc_add")]
> >  )
> >
> > -(define_expand "aarch64_addlp"
> > +(define_expand "@aarch64_addlp"
> >[(set (match_operand: 0 "register_operand")
> > (plus:
> >   (vec_select:
> > @@ -3517,6 +3517,32 @@ (define_insn
> "popcount2"
> >[(set_attr "type" "neon_cnt")]
> >  )
> >
> > +(define_expand "popcount2"
> > +  [(set (match_operand:VDQHSD 0 "register_operand")
> > +(popcount:VDQHSD (match_operand:VDQHSD 1
> > +"register_operand")))]
> > +  "TARGET_SIMD"
> > +  {
> > +/* Generate a byte popcount. */
> > +machine_mode mode =  == 64 ? V8QImode : V16QImode;
> > +rtx tmp = gen_reg_rtx (mode);
> > +auto icode = optab_handler (popcount_optab, mode);
> > +emit_insn (GEN_FCN (icode) (tmp, gen_lowpart (mode,
> > +operands[1])));
> > +
> > +/* Use a sequence of UADDLPs to accumulate the counts. Each step
> doubles the
> > +   element size and halves the number of elements. */
> 
> Nit: reflowing this paragraph has made the first line too long.
> I think we should stick with the version in the review:
> 
>/* Use a sequence of UADDLPs to accumulate the counts.  Each step
> doubles
>   the element size and halves the number of elements.  */

Good catch. I've fixed this in the latest version.
https://gcc.gnu.org/pipermail/gcc-patches/2024-June/654601.html

> 
> > +do
> > +  {
> > +auto icode = code_for_aarch64_addlp (ZERO_EXTEND, GET_MODE
> (tmp));
> > +mode = insn_data[icode].operand[0].mode;
> > +rtx dest = mode == mode ? operands[0] : gen_reg_rtx
> (mode);
> > +emit_insn (GEN_FCN (icode) (dest, tmp));
> > +tmp = dest;
> > +  }
> > +while (mode != mode);
> > +DONE;
> > +  }
> > +)
> > +
> >  ;; 'across lanes' max and min ops.
> >
> >  ;; Template for outputting a scalar, so we can create __builtins
> > which can be diff --git
> > a/gcc/testsuite/gcc.target/aarch64/popcnt-vec.c
> > b/gcc/testsuite/gcc.target/aarch64/popcnt-vec.c
> > new file mode 100644
> > index 000..89860940296
> > --- /dev/null
> > +++ b/gcc/testsuite/gcc.target/aarch64/popcnt-vec.c
> > @@ -0,0 +1,65 @@
> > +/* { dg-do compile } */
> > +/* { dg-options "-O2" } */
> > +
> > +/* This function should produce cnt v.16b. */ void bar (unsigned char
> > +*__restrict b, unsigned char *__restrict d) {
> > +  for (int i = 0; i < 1024; i++)
> > +d[i] = __builtin_popcount (b[i]); }
> > +
> > +/* This function should produce cnt v.16b and uaddlp (Add Long
> > +Pairwise). */ void
> > +bar1 (unsigned short *__restrict b, unsigned short *__restrict d) {
> > +  for (int i = 0; i < 1024; i++)
> > +d[i] = __builtin_popcount (b[i]); }
> > +
> > +/* This function should produce cnt v.16b and 2 uaddlp (Add Long
> > +Pairwise). */ void
> > +bar2 (unsigned int *__restrict b, unsigned int *__restrict d) {
> > +  for (int i = 0; i < 1024; i++)
> > +d[i] = __builtin_popcount (b[i]); }
> > +
> > +/* This function should produce cnt v.16b and 3 uaddlp (Add Long
> > +Pairwise). */ void
> > +bar3 (unsigned long long *__restrict b, unsigned long long
> > +*__restrict d) {
> > +  for (int i = 0; i < 1024; i++)
> > +d[i] = __builtin_popcountll (b[i]); }
> > +
> > +/* This function should produce cnt v.8b and uaddlp (Add Long
> > +Pairwise). */ void
> > +bar4 (unsigned short *__restrict b, unsigned 

Re: [FYI] map packed field type to unpacked for debug info

2024-06-13 Thread Alexandre Oliva
On Jun 11, 2024, Alexandre Oliva  wrote:

> Regstrapped on x86_64-linux-gnu.  Pre-approved by Eric.  I'm checking it
> in.

... I've just reverted it.  It turned out to be too easy to be good :-(
There were various regressions, from infinite loops in the compiler
to GDB regressions yet to be investigated.  I'll be back with an
improved version.

-- 
Alexandre Oliva, happy hacker            https://FSFLA.org/blogs/lxo/
   Free Software Activist                   GNU Toolchain Engineer
More tolerance and less prejudice are key for inclusion and diversity
Excluding neuro-others for not behaving "normal" is *not* inclusive


[PATCH v3] aarch64: Add vector popcount besides QImode [PR113859]

2024-06-13 Thread Pengxuan Zheng
This patch improves GCC’s vectorization of __builtin_popcount for the aarch64
target by adding popcount patterns for vector modes besides QImode, i.e., HImode,
SImode and DImode.

With this patch, we now generate the following for V8HI:
  cnt v1.16b, v.16b
  uaddlp  v2.8h, v1.16b

For V4HI, we generate:
  cnt v1.8b, v.8b
  uaddlp  v2.4h, v1.8b

For V4SI, we generate:
  cnt v1.16b, v.16b
  uaddlp  v2.8h, v1.16b
  uaddlp  v3.4s, v2.8h

For V2SI, we generate:
  cnt v1.8b, v.8b
  uaddlp  v2.4h, v1.8b
  uaddlp  v3.2s, v2.4h

For V2DI, we generate:
  cnt v1.16b, v.16b
  uaddlp  v2.8h, v1.16b
  uaddlp  v3.4s, v2.8h
  uaddlp  v4.2d, v3.4s
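
For readers who want to see why the CNT + UADDLP chain gives the right answer,
here is a small scalar sketch in C of what happens to one 32-bit lane.  It is
only a model under my own assumptions, not code from the patch: the helper
names (byte_popcount, popcount32_by_pairwise_add) are made up for illustration,
and a real vector register processes all lanes at once.

```c
#include <assert.h>
#include <stdint.h>

/* Scalar stand-in for CNT: population count of a single byte.  */
static uint8_t
byte_popcount (uint8_t x)
{
  uint8_t n = 0;
  while (x)
    {
      n += x & 1;
      x >>= 1;
    }
  return n;
}

/* Hypothetical model of one 32-bit lane: one CNT over its four bytes,
   followed by two pairwise widening adds (bytes -> halfwords -> word).  */
static uint32_t
popcount32_by_pairwise_add (uint32_t x)
{
  uint8_t b[4];
  for (int i = 0; i < 4; i++)
    b[i] = byte_popcount ((uint8_t) (x >> (8 * i)));

  /* First UADDLP step: adjacent byte counts are summed into 16-bit
     elements.  */
  uint16_t h[2] = { (uint16_t) (b[0] + b[1]), (uint16_t) (b[2] + b[3]) };

  /* Each halfword covers 16 input bits, so it can hold at most 16.  */
  assert (h[0] <= 16 && h[1] <= 16);

  /* Second UADDLP step: adjacent halfword counts are summed into one
     32-bit element.  */
  return (uint32_t) h[0] + h[1];
}
```

Each pairwise step doubles the element size and halves the element count,
which is why 64-bit elements need one more step than 32-bit elements.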

PR target/113859

gcc/ChangeLog:

* config/aarch64/aarch64-simd.md (aarch64_addlp): Rename to...
(@aarch64_addlp): ... This.
(popcount2): New define_expand.

gcc/testsuite/ChangeLog:

* gcc.target/aarch64/popcnt-vec.c: New test.

Signed-off-by: Pengxuan Zheng 
---
 gcc/config/aarch64/aarch64-simd.md| 28 +++-
 gcc/testsuite/gcc.target/aarch64/popcnt-vec.c | 69 +++
 2 files changed, 96 insertions(+), 1 deletion(-)
 create mode 100644 gcc/testsuite/gcc.target/aarch64/popcnt-vec.c

diff --git a/gcc/config/aarch64/aarch64-simd.md 
b/gcc/config/aarch64/aarch64-simd.md
index 0bb39091a38..ee73e13534b 100644
--- a/gcc/config/aarch64/aarch64-simd.md
+++ b/gcc/config/aarch64/aarch64-simd.md
@@ -3461,7 +3461,7 @@ (define_insn 
"*aarch64_addlv_ze"
   [(set_attr "type" "neon_reduc_add")]
 )
 
-(define_expand "aarch64_addlp"
+(define_expand "@aarch64_addlp"
   [(set (match_operand: 0 "register_operand")
(plus:
  (vec_select:
@@ -3517,6 +3517,32 @@ (define_insn "popcount2"
   [(set_attr "type" "neon_cnt")]
 )
 
+(define_expand "popcount2"
+  [(set (match_operand:VDQHSD 0 "register_operand")
+(popcount:VDQHSD (match_operand:VDQHSD 1 "register_operand")))]
+  "TARGET_SIMD"
+  {
+/* Generate a byte popcount. */
+machine_mode mode = <bitsize> == 64 ? V8QImode : V16QImode;
+rtx tmp = gen_reg_rtx (mode);
+auto icode = optab_handler (popcount_optab, mode);
+emit_insn (GEN_FCN (icode) (tmp, gen_lowpart (mode, operands[1])));
+
+/* Use a sequence of UADDLPs to accumulate the counts. Each step doubles
+   the element size and halves the number of elements. */
+do
+  {
+auto icode = code_for_aarch64_addlp (ZERO_EXTEND, GET_MODE (tmp));
+mode = insn_data[icode].operand[0].mode;
+rtx dest = mode == <MODE>mode ? operands[0] : gen_reg_rtx (mode);
+emit_insn (GEN_FCN (icode) (dest, tmp));
+tmp = dest;
+  }
+while (mode != <MODE>mode);
+DONE;
+  }
+)
+
 ;; 'across lanes' max and min ops.
 
 ;; Template for outputting a scalar, so we can create __builtins which can be
diff --git a/gcc/testsuite/gcc.target/aarch64/popcnt-vec.c 
b/gcc/testsuite/gcc.target/aarch64/popcnt-vec.c
new file mode 100644
index 000..0c4926d7ca8
--- /dev/null
+++ b/gcc/testsuite/gcc.target/aarch64/popcnt-vec.c
@@ -0,0 +1,69 @@
+/* { dg-do compile } */
+/* { dg-options "-O2 -fno-vect-cost-model" } */
+
+/* This function should produce cnt v.16b. */
+void
+bar (unsigned char *__restrict b, unsigned char *__restrict d)
+{
+  for (int i = 0; i < 1024; i++)
+d[i] = __builtin_popcount (b[i]);
+}
+
+/* This function should produce cnt v.16b and uaddlp (Add Long Pairwise). */
+void
+bar1 (unsigned short *__restrict b, unsigned short *__restrict d)
+{
+  for (int i = 0; i < 1024; i++)
+d[i] = __builtin_popcount (b[i]);
+}
+
+/* This function should produce cnt v.16b and 2 uaddlp (Add Long Pairwise). */
+void
+bar2 (unsigned int *__restrict b, unsigned int *__restrict d)
+{
+  for (int i = 0; i < 1024; i++)
+d[i] = __builtin_popcount (b[i]);
+}
+
+/* This function should produce cnt v.16b and 3 uaddlp (Add Long Pairwise). */
+void
+bar3 (unsigned long long *__restrict b, unsigned long long *__restrict d)
+{
+  for (int i = 0; i < 1024; i++)
+d[i] = __builtin_popcountll (b[i]);
+}
+
+/* SLP
+   This function should produce cnt v.8b and uaddlp (Add Long Pairwise). */
+void
+bar4 (unsigned short *__restrict b, unsigned short *__restrict d)
+{
+  d[0] = __builtin_popcount (b[0]);
+  d[1] = __builtin_popcount (b[1]);
+  d[2] = __builtin_popcount (b[2]);
+  d[3] = __builtin_popcount (b[3]);
+}
+
+/* SLP
+   This function should produce cnt v.8b and 2 uaddlp (Add Long Pairwise). */
+void
+bar5 (unsigned int *__restrict b, unsigned int *__restrict d)
+{
+  d[0] = __builtin_popcount (b[0]);
+  d[1] = __builtin_popcount (b[1]);
+}
+
+/* SLP
+   This function should produce cnt v.16b and 3 uaddlp (Add Long Pairwise). */
+void
+bar6 (unsigned long long *__restrict b, unsigned long long *__restrict d)
+{
+  d[0] = __builtin_popcountll (b[0]);
+  d[1] = __builtin_popcountll (b[1]);
+}
+
+/* { dg-final { scan-assembler-not {\tbl\tpopcount} } } */
+/* { dg-final { scan-assembler-times {cnt\t} 7 } } */
+/* { dg-final { scan-assembler-times {uaddlp\t} 12 } } */
+/* { 

Re: [PATCH] RISC-V: Add support for subword atomic loads/stores

2024-06-13 Thread Patrick O'Neill



On 6/13/24 12:58, Jeff Law wrote:



On 6/12/24 6:10 PM, Patrick O'Neill wrote:
Andrea Parri recently pointed out that we were emitting overly conservative
fences for seq_cst atomic loads/stores. This adds support for the optimized
fences specified in the PSABI:
https://github.com/riscv-non-isa/riscv-elf-psabi-doc/blob/2092568f7896ceaa1ec0f02569b19eaa42cd51c9/riscv-atomic.adoc



gcc/ChangeLog:

* config/riscv/sync-rvwmo.md: Add support for subword fenced
loads/stores.
* config/riscv/sync-ztso.md: Ditto.
* config/riscv/sync.md: Ditto.

gcc/testsuite/ChangeLog:

* gcc.target/riscv/amo/amo-table-a-6-load-1.c: Increase test coverage to
include longs, shorts, chars, and bools.
* gcc.target/riscv/amo/amo-table-a-6-load-2.c: Ditto.
* gcc.target/riscv/amo/amo-table-a-6-load-3.c: Ditto.
* gcc.target/riscv/amo/amo-table-a-6-store-1.c: Ditto.
* gcc.target/riscv/amo/amo-table-a-6-store-2.c: Ditto.
* gcc.target/riscv/amo/amo-table-a-6-store-compat-3.c: Ditto.
* gcc.target/riscv/amo/amo-table-ztso-load-1.c: Ditto.
* gcc.target/riscv/amo/amo-table-ztso-load-2.c: Ditto.
* gcc.target/riscv/amo/amo-table-ztso-load-3.c: Ditto.
* gcc.target/riscv/amo/amo-table-ztso-store-1.c: Ditto.
* gcc.target/riscv/amo/amo-table-ztso-store-2.c: Ditto.
* gcc.target/riscv/amo/amo-table-ztso-store-3.c: Ditto.

OK
jeff


Committed with a fixup to the long case to match both ld/sd or lw/sw
since that tripped up on rv32 targets. I resent the committed patch for
the archiver.

Patrick




[Committed] RISC-V: Add support for subword atomic loads/stores

2024-06-13 Thread Patrick O'Neill
Andrea Parri recently pointed out that we were emitting overly conservative
fences for seq_cst atomic loads/stores. This adds support for the optimized
fences specified in the PSABI:
https://github.com/riscv-non-isa/riscv-elf-psabi-doc/blob/2092568f7896ceaa1ec0f02569b19eaa42cd51c9/riscv-atomic.adoc
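
As a rough illustration of the kind of source this affects, the C11 sketch
below does atomic loads and stores on char, short and bool objects with
explicit memory orders.  The variable and function names are hypothetical and
not taken from the testcases; which fences the compiler emits for each access
depends on the target and memory model, so nothing here shows expected
assembly.

```c
#include <assert.h>
#include <stdatomic.h>
#include <stdbool.h>

/* Subword atomic objects (illustrative names only).  */
static _Atomic unsigned char status_byte;
static _Atomic unsigned short payload16;
static atomic_bool ready;

/* Publish a 16-bit payload, then signal readiness with a seq_cst store.  */
static void
publish (unsigned short value)
{
  atomic_store_explicit (&payload16, value, memory_order_relaxed);
  atomic_store_explicit (&status_byte, 1, memory_order_release);
  atomic_store_explicit (&ready, true, memory_order_seq_cst);
}

/* Return the payload if it has been published, or -1 otherwise.  */
static int
consume (void)
{
  if (!atomic_load_explicit (&ready, memory_order_seq_cst))
    return -1;

  /* The seq_cst load above observed the seq_cst store in publish, so the
     earlier stores are visible here.  */
  assert (atomic_load_explicit (&status_byte, memory_order_acquire) == 1);
  return atomic_load_explicit (&payload16, memory_order_relaxed);
}
```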

gcc/ChangeLog:

* config/riscv/sync-rvwmo.md: Add support for subword fenced
loads/stores.
* config/riscv/sync-ztso.md: Ditto.
* config/riscv/sync.md: Ditto.

gcc/testsuite/ChangeLog:

* gcc.target/riscv/amo/amo-table-a-6-load-1.c: Increase test coverage to
include longs, shorts, chars, and bools.
* gcc.target/riscv/amo/amo-table-a-6-load-2.c: Ditto.
* gcc.target/riscv/amo/amo-table-a-6-load-3.c: Ditto.
* gcc.target/riscv/amo/amo-table-a-6-store-1.c: Ditto.
* gcc.target/riscv/amo/amo-table-a-6-store-2.c: Ditto.
* gcc.target/riscv/amo/amo-table-a-6-store-compat-3.c: Ditto.
* gcc.target/riscv/amo/amo-table-ztso-load-1.c: Ditto.
* gcc.target/riscv/amo/amo-table-ztso-load-2.c: Ditto.
* gcc.target/riscv/amo/amo-table-ztso-load-3.c: Ditto.
* gcc.target/riscv/amo/amo-table-ztso-store-1.c: Ditto.
* gcc.target/riscv/amo/amo-table-ztso-store-2.c: Ditto.
* gcc.target/riscv/amo/amo-table-ztso-store-3.c: Ditto.

Signed-off-by: Patrick O'Neill 
Tested-by: Andrea Parri 
---
v2 ChangeLog:
Adjusted 'long' testcase regex from ld/sd to l[wd]/s[wd] to also pass on rv32
targets.
---
 gcc/config/riscv/sync-rvwmo.md| 24 
 gcc/config/riscv/sync-ztso.md | 20 +++
 gcc/config/riscv/sync.md  |  8 +--
 .../riscv/amo/amo-table-a-6-load-1.c  | 48 +++-
 .../riscv/amo/amo-table-a-6-load-2.c  | 52 -
 .../riscv/amo/amo-table-a-6-load-3.c  | 56 ++-
 .../riscv/amo/amo-table-a-6-store-1.c | 48 +++-
 .../riscv/amo/amo-table-a-6-store-2.c | 52 -
 .../riscv/amo/amo-table-a-6-store-compat-3.c  | 56 ++-
 .../riscv/amo/amo-table-ztso-load-1.c | 48 +++-
 .../riscv/amo/amo-table-ztso-load-2.c | 48 +++-
 .../riscv/amo/amo-table-ztso-load-3.c | 52 -
 .../riscv/amo/amo-table-ztso-store-1.c| 48 +++-
 .../riscv/amo/amo-table-ztso-store-2.c| 48 +++-
 .../riscv/amo/amo-table-ztso-store-3.c| 52 -
 15 files changed, 610 insertions(+), 50 deletions(-)

diff --git a/gcc/config/riscv/sync-rvwmo.md b/gcc/config/riscv/sync-rvwmo.md
index e639a1e2392..5db94c8c27f 100644
--- a/gcc/config/riscv/sync-rvwmo.md
+++ b/gcc/config/riscv/sync-rvwmo.md
@@ -47,9 +47,9 @@
 ;; Atomic memory operations.

 (define_insn "atomic_load_rvwmo"
-  [(set (match_operand:GPR 0 "register_operand" "=r")
-   (unspec_volatile:GPR
-   [(match_operand:GPR 1 "memory_operand" "A")
+  [(set (match_operand:ANYI 0 "register_operand" "=r")
+   (unspec_volatile:ANYI
+   [(match_operand:ANYI 1 "memory_operand" "A")
 (match_operand:SI 2 "const_int_operand")]  ;; model
 UNSPEC_ATOMIC_LOAD))]
   "!TARGET_ZTSO"
@@ -59,13 +59,13 @@

 if (model == MEMMODEL_SEQ_CST)
   return "fence\trw,rw\;"
-"l\t%0,%1\;"
+"\t%0,%1\;"
 "fence\tr,rw";
 if (model == MEMMODEL_ACQUIRE)
-  return "l\t%0,%1\;"
+  return "\t%0,%1\;"
 "fence\tr,rw";
 else
-  return "l\t%0,%1";
+  return "\t%0,%1";
   }
   [(set_attr "type" "multi")
(set (attr "length") (const_int 12))])
@@ -73,9 +73,9 @@
 ;; Implement atomic stores with conservative fences.
 ;; This allows us to be compatible with the ISA manual Table A.6 and Table A.7.
 (define_insn "atomic_store_rvwmo"
-  [(set (match_operand:GPR 0 "memory_operand" "=A")
-   (unspec_volatile:GPR
-   [(match_operand:GPR 1 "reg_or_0_operand" "rJ")
+  [(set (match_operand:ANYI 0 "memory_operand" "=A")
+   (unspec_volatile:ANYI
+   [(match_operand:ANYI 1 "reg_or_0_operand" "rJ")
 (match_operand:SI 2 "const_int_operand")]  ;; model
 UNSPEC_ATOMIC_STORE))]
   "!TARGET_ZTSO"
@@ -85,13 +85,13 @@

 if (model == MEMMODEL_SEQ_CST)
   return "fence\trw,w\;"
-"s\t%z1,%0\;"
+"\t%z1,%0\;"
 "fence\trw,rw";
 if (model == MEMMODEL_RELEASE)
   return "fence\trw,w\;"
-"s\t%z1,%0";
+"\t%z1,%0";
 else
-  return "s\t%z1,%0";
+  return "\t%z1,%0";
   }
   [(set_attr "type" "multi")
(set (attr "length") (const_int 12))])
diff --git a/gcc/config/riscv/sync-ztso.md b/gcc/config/riscv/sync-ztso.md
index 0a866d2906b..f99a21b45ca 100644
--- a/gcc/config/riscv/sync-ztso.md
+++ b/gcc/config/riscv/sync-ztso.md
@@ -41,9 +41,9 @@
 ;; Atomic memory operations.

 (define_insn "atomic_load_ztso"
-  [(set (match_operand:GPR 0 

[RFC v2] RISC-V: Promote Zaamo/Zalrsc to a when using an old binutils

2024-06-13 Thread Patrick O'Neill
Binutils 2.42 and before don't support Zaamo/Zalrsc. Add a configure
check to upgrade Zaamo/Zalrsc to 'a' when the assembler does not support them.

This change respects Zaamo/Zalrsc when generating code.

Testcases that check for the default isa string will fail with the old binutils
since zaamo/zalrsc aren't emitted anymore. All other Zaamo/Zalrsc testcases
pass.
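
The decision of when 'a' has to be added can be sketched in plain C as below.
This is a simplified model, assuming the extension list is just an array of
names; needs_a_insertion is a hypothetical helper, not a function in
riscv-common.cc, and it ignores version numbers and canonical ordering.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>
#include <string.h>

/* Hypothetical helper: return true when 'a' must be added to the arch
   string handed to an assembler that lacks Zaamo/Zalrsc support.
   UPGRADE_EXTS mirrors the toggle described above.  */
static bool
needs_a_insertion (const char *const *exts, size_t n, bool upgrade_exts)
{
  bool has_a = false;
  bool has_sub_ext = false;

  assert (exts != NULL || n == 0);

  if (!upgrade_exts)
    return false;

  for (size_t i = 0; i < n; i++)
    {
      if (strcmp (exts[i], "a") == 0)
        has_a = true;
      if (strcmp (exts[i], "zaamo") == 0 || strcmp (exts[i], "zalrsc") == 0)
        has_sub_ext = true;
    }

  /* Only insert 'a' when a sub-extension is present and 'a' is not.  */
  return has_sub_ext && !has_a;
}
```

With the toggle off, the arch string is left alone, which matches the idea of
using the non-upgraded string for the compiler and the upgraded one for the
assembler.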

gcc/ChangeLog:

* common/config/riscv/riscv-common.cc
(riscv_subset_list::to_string): Add toggle to promote Zaamo/Zalrsc
extensions to 'a'.
(riscv_arch_str): Ditto.
(riscv_expand_arch): Ditto.
(riscv_expand_arch_from_cpu): Ditto.
(riscv_expand_arch_upgrade_exts): New function. Wrapper around
riscv_expand_arch to preserve the function signature.
(riscv_expand_arch_no_upgrade_exts): Ditto
(riscv_expand_arch_from_cpu_upgrade_exts): New function. Wrapper around
riscv_expand_arch_from_cpu to preserve the function signature.
(riscv_expand_arch_from_cpu_no_upgrade_exts): Ditto.
* config/riscv/riscv-protos.h (riscv_arch_str): Add toggle to function
prototype.
* config/riscv/riscv-subset.h: Ditto.
* config/riscv/riscv-target-attr.cc (riscv_process_target_attr):
* config/riscv/riscv.cc (riscv_emit_attribute):
(riscv_declare_function_name):
* config/riscv/riscv.h (riscv_expand_arch): Remove.
(riscv_expand_arch_from_cpu): Ditto.
(riscv_expand_arch_upgrade_exts): Add toggle wrapper functions.
(riscv_expand_arch_no_upgrade_exts): Ditto.
(riscv_expand_arch_from_cpu_upgrade_exts): Ditto.
(riscv_expand_arch_from_cpu_no_upgrade_exts): Ditto.
(EXTRA_SPEC_FUNCTIONS): Ditto.
(OPTION_DEFAULT_SPECS): Use non-upgraded march string when invoking the
compiler.
(ASM_SPEC): Use upgraded march string when invoking the assembler.
* config.in: Regenerate.
* configure: Regenerate.
* configure.ac: Add zaamo/zalrsc assembler check.

Signed-off-by: Patrick O'Neill 
---
This change is way more invasive than the last one hence the RFC. I'd be happy
to iterate more on this or just commit v1 with the misc configure diffs removed.
---
 gcc/common/config/riscv/riscv-common.cc | 101 ++--
 gcc/config.in   |   6 ++
 gcc/config/riscv/riscv-protos.h |   3 +-
 gcc/config/riscv/riscv-subset.h |   2 +-
 gcc/config/riscv/riscv-target-attr.cc   |   4 +-
 gcc/config/riscv/riscv.cc   |   7 +-
 gcc/config/riscv/riscv.h|  46 ++-
 gcc/configure   |  31 
 gcc/configure.ac|   5 ++
 9 files changed, 174 insertions(+), 31 deletions(-)

diff --git a/gcc/common/config/riscv/riscv-common.cc 
b/gcc/common/config/riscv/riscv-common.cc
index 78dfd6b1470..cdb390982c8 100644
--- a/gcc/common/config/riscv/riscv-common.cc
+++ b/gcc/common/config/riscv/riscv-common.cc
@@ -907,7 +907,7 @@ riscv_subset_list::add (const char *subset, bool implied_p)
VERSION_P to determine append version info or not.  */

 std::string
-riscv_subset_list::to_string (bool version_p) const
+riscv_subset_list::to_string (bool version_p, bool upgrade_exts) const
 {
   std::ostringstream oss;
   oss << "rv" << m_xlen;
@@ -916,6 +916,11 @@ riscv_subset_list::to_string (bool version_p) const
   riscv_subset_t *subset;

   bool skip_zifencei = false;
+  bool upgrade_zaamo_zalrsc = false;
+  bool has_a_ext = false;
+  bool insert_a_ext = false;
+  bool inserted_a_ext = false;
+  riscv_subset_t *a_subset;
   bool skip_zicsr = false;
   bool i2p0 = false;

@@ -943,6 +948,31 @@ riscv_subset_list::to_string (bool version_p) const
  a mistake in that binutils 2.35 supports zicsr but not zifencei.  */
   skip_zifencei = true;
 #endif
+#ifndef HAVE_AS_MARCH_ZAAMO_ZALRSC
+  /* Upgrade Zaamo/Zalrsc extensions to 'a' since binutils 2.42 and earlier
+ don't recognize zaamo/zalrsc.  */
+  upgrade_zaamo_zalrsc = upgrade_exts;
+  if (upgrade_zaamo_zalrsc)
+{
+  for (subset = m_head; subset != NULL; subset = subset->next)
+   {
+ if (subset->name == "a")
+   has_a_ext = true;
+ if (subset->name == "zaamo" || subset->name == "zalrsc")
+   insert_a_ext = true;
+   }
+  if (insert_a_ext && !has_a_ext)
+   {
+ unsigned int major_version = 0, minor_version = 0;
+ get_default_version ("a", &major_version, &minor_version);
+ a_subset = new riscv_subset_t ();
+ a_subset->name = "a";
+ a_subset->implied_p = false;
+ a_subset->major_version = major_version;
+ a_subset->minor_version = minor_version;
+   }
+}
+#endif

   for (subset = m_head; subset != NULL; subset = subset->next)
 {
@@ -954,6 +984,27 @@ riscv_subset_list::to_string (bool version_p) const
  subset->name == "zicsr")
continue;

+  if (upgrade_zaamo_zalrsc && subset->name == 

[committed] c: Implement C2Y complex increment/decrement support

2024-06-13 Thread Joseph Myers
Support for complex increment and decrement (previously supported as
an extension) was voted into C2Y today (paper N3259).  Thus, change
the pedwarn to a pedwarn_c23 and add associated tests.

Note: the type of the 1 to be added / subtracted is underspecified (to
be addressed in a subsequent paper), but understood to be intended to
be a real type (so the sign of a zero imaginary part is never changed)
and this is what is implemented; the tests added include verifying
that there is no undesired change to the sign of a zero imaginary
part.
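
A small C sketch of the distinction, using the same GCC built-ins as the new
tests.  The function names are mine and purely illustrative.  The first
function increments a value whose imaginary part is negative zero; the second
adds a complex 1 + 0i instead, which is the behaviour the real-typed 1 is
meant to avoid, assuming default IEEE round-to-nearest arithmetic.

```c
#include <assert.h>

/* Return nonzero if postfix ++ on 0 - 0i leaves the imaginary part as
   negative zero while the real part becomes 1.  */
static int
increment_keeps_imag_sign (void)
{
  _Complex float z = __builtin_complex (0.0f, -0.0f);
  _Complex float old = z++;

  /* Postfix increment yields the value before the increment.  */
  assert (__builtin_crealf (old) == 0.0f);

  return __builtin_crealf (z) == 1.0f
         && __builtin_signbit (__builtin_cimagf (z)) != 0;
}

/* Return nonzero if adding a complex one (1 + 0i) turns the negative-zero
   imaginary part into positive zero, since -0.0 + 0.0 is +0.0.  */
static int
adding_complex_one_clears_imag_sign (void)
{
  _Complex float z = __builtin_complex (0.0f, -0.0f);
  _Complex float one = __builtin_complex (1.0f, 0.0f);

  z = z + one;
  return __builtin_signbit (__builtin_cimagf (z)) == 0;
}
```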

Bootstrapped with no regressions on x86_64-pc-linux-gnu.

gcc/c/
* c-typeck.cc (build_unary_op): Use pedwarn_c23 for complex
increment and decrement.

gcc/testsuite/
* gcc.dg/c23-complex-1.c, gcc.dg/c23-complex-2.c,
gcc.dg/c23-complex-3.c, gcc.dg/c23-complex-4.c,
gcc.dg/c2y-complex-1.c, gcc.dg/c2y-complex-2.c: New tests.

diff --git a/gcc/c/c-typeck.cc b/gcc/c/c-typeck.cc
index a5ca9ea7db6..ffcab7df4d3 100644
--- a/gcc/c/c-typeck.cc
+++ b/gcc/c/c-typeck.cc
@@ -5079,8 +5079,9 @@ build_unary_op (location_t location, enum tree_code code, 
tree xarg,
{
  tree real, imag;
 
- pedwarn (location, OPT_Wpedantic,
-  "ISO C does not support %<++%> and %<--%> on complex types");
+ pedwarn_c23 (location, OPT_Wpedantic,
+  "ISO C does not support %<++%> and %<--%> on complex "
+  "types before C2Y");
 
  if (!atomic_op)
{
diff --git a/gcc/testsuite/gcc.dg/c23-complex-1.c 
b/gcc/testsuite/gcc.dg/c23-complex-1.c
new file mode 100644
index 000..3607336593d
--- /dev/null
+++ b/gcc/testsuite/gcc.dg/c23-complex-1.c
@@ -0,0 +1,14 @@
+/* Test C2Y complex increment and decrement: disallowed for C23.  */
+/* { dg-do compile } */
+/* { dg-options "-std=c23 -pedantic-errors" } */
+
+_Complex float a;
+
+void
+f (void)
+{
+  a++; /* { dg-error "does not support" } */
+  ++a; /* { dg-error "does not support" } */
+  a--; /* { dg-error "does not support" } */
+  --a; /* { dg-error "does not support" } */
+}
diff --git a/gcc/testsuite/gcc.dg/c23-complex-2.c 
b/gcc/testsuite/gcc.dg/c23-complex-2.c
new file mode 100644
index 000..301b668ea15
--- /dev/null
+++ b/gcc/testsuite/gcc.dg/c23-complex-2.c
@@ -0,0 +1,15 @@
+/* Test C2Y complex increment and decrement: disallowed for C23 (warning with
+   -pedantic).  */
+/* { dg-do compile } */
+/* { dg-options "-std=c23 -pedantic" } */
+
+_Complex float a;
+
+void
+f (void)
+{
+  a++; /* { dg-warning "does not support" } */
+  ++a; /* { dg-warning "does not support" } */
+  a--; /* { dg-warning "does not support" } */
+  --a; /* { dg-warning "does not support" } */
+}
diff --git a/gcc/testsuite/gcc.dg/c23-complex-3.c 
b/gcc/testsuite/gcc.dg/c23-complex-3.c
new file mode 100644
index 000..6fef30105b0
--- /dev/null
+++ b/gcc/testsuite/gcc.dg/c23-complex-3.c
@@ -0,0 +1,15 @@
+/* Test C2Y complex increment and decrement: allowed for C23 with
+   -Wno-c23-c2y-compat.  */
+/* { dg-do compile } */
+/* { dg-options "-std=c23 -pedantic-errors -Wno-c23-c2y-compat" } */
+
+_Complex float a;
+
+void
+f (void)
+{
+  a++;
+  ++a;
+  a--;
+  --a;
+}
diff --git a/gcc/testsuite/gcc.dg/c23-complex-4.c 
b/gcc/testsuite/gcc.dg/c23-complex-4.c
new file mode 100644
index 000..61d50e9a1dd
--- /dev/null
+++ b/gcc/testsuite/gcc.dg/c23-complex-4.c
@@ -0,0 +1,15 @@
+/* Test C2Y complex increment and decrement: allowed for C23 by default (not
+   pedantic).  */
+/* { dg-do compile } */
+/* { dg-options "-std=c23" } */
+
+_Complex float a;
+
+void
+f (void)
+{
+  a++;
+  ++a;
+  a--;
+  --a;
+}
diff --git a/gcc/testsuite/gcc.dg/c2y-complex-1.c 
b/gcc/testsuite/gcc.dg/c2y-complex-1.c
new file mode 100644
index 000..29a8c2771ac
--- /dev/null
+++ b/gcc/testsuite/gcc.dg/c2y-complex-1.c
@@ -0,0 +1,232 @@
+/* Test C2Y complex increment and decrement.  */
+/* { dg-do run } */
+/* { dg-options "-std=c2y -pedantic-errors" } */
+
+extern void abort (void);
+extern void exit (int);
+
+_Complex float a, ax;
+_Complex double b, bx;
+_Complex long double c, cx;
+
+int
+main ()
+{
+  ax = a++;
+  if (ax != 0
+  || a != 1
+  || __builtin_signbit (__builtin_crealf (ax))
+  || __builtin_signbit (__builtin_cimagf (ax))
+  || __builtin_signbit (__builtin_crealf (a))
+  || __builtin_signbit (__builtin_cimagf (a)))
+abort ();
+  a = __builtin_complex (0.0f, -0.0f);
+  ax = a++;
+  if (ax != 0
+  || a != 1
+  || __builtin_signbit (__builtin_crealf (ax))
+  || !__builtin_signbit (__builtin_cimagf (ax))
+  || __builtin_signbit (__builtin_crealf (a))
+  || !__builtin_signbit (__builtin_cimagf (a)))
+abort ();
+  a = 0;
+  ax = ++a;
+  if (ax != 1
+  || a != 1
+  || __builtin_signbit (__builtin_crealf (ax))
+  || __builtin_signbit (__builtin_cimagf (ax))
+  || __builtin_signbit (__builtin_crealf (a))
+  || __builtin_signbit (__builtin_cimagf (a)))
+abort ();

Re: [PATCH] rs6000, altivec-2-runnable.c should be a runnable test

2024-06-13 Thread Carl Love
Segher:

On 6/13/24 12:51, Segher Boessenkool wrote:



> 
>> --- a/gcc/testsuite/gcc.target/powerpc/altivec-2-runnable.c
>> +++ b/gcc/testsuite/gcc.target/powerpc/altivec-2-runnable.c
>> @@ -1,4 +1,4 @@
>> -/* { dg-do compile { target powerpc*-*-* } } */
>> +/* { dg-do run { target powerpc*-*-* } } */
>>  /* { dg-options "-mvsx" } */
>>  /* { dg-additional-options "-mdejagnu-cpu=power8" { target { ! 
>> has_arch_pwr8 } } } */
>>  /* { dg-require-effective-target powerpc_vsx } */
> 
> Everything in gcc.target/powerpc/ is tested for "target powerpc*-*-*"
> already, so you could remove that target clause even (after testing of
> course :-) )
> 
> Okay for trunk with or without that extra tweak.  Thank you!

I updated the patch by removing the target clause as suggested:

-/* { dg-do compile { target powerpc*-*-* } } */
+/* { dg-do run } */
 /* { dg-options "-mvsx" } */
 /* { dg-additional-options "-mdejagnu-cpu=power8" { target { ! has_arch_pwr8 } 
} } */
 /* { dg-require-effective-target powerpc_vsx } */
 
Retested on Power 10.  Reports 2 passes and no failures.  I will go ahead and 
commit.

Thanks. 

   Carl 


[PATCH 4/13 ver4] rs6000, extend the current vec_{un,}signed{e,o}, built-ins

2024-06-13 Thread Carl Love


GCC maintainers:

As noted, the removal of __builtin_vsx_xvcvdpuxds_uns and
__builtin_vsx_xvcvspuxws was moved to patch 2 in the series.  The patch has been
updated per the comments from version 3.

Please let me know if this patch is acceptable for mainline.  

 Carl 

--

rs6000, extend the current vec_{un,}signed{e,o} built-ins

The built-ins __builtin_vsx_xvcvspsxds and __builtin_vsx_xvcvspuxds
convert a vector of floats to signed/unsigned long long ints.  Extend the
existing vec_{un,}signed{e,o} built-ins to handle the argument
vector of floats to return the even/odd signed/unsigned integers.

The define expands vsignede_v4sf, vsignedo_v4sf, vunsignede_v4sf,
vunsignedo_v4sf are added to support the new vec_{un,}signed{e,o}
built-ins.

The built-ins __builtin_vsx_xvcvspsxds and __builtin_vsx_xvcvspuxds are
now for internal use only. They are not documented and they do not
have testcases.

The built-in __builtin_vsx_xvcvdpsxws is redundant as it is covered by
vec_signed{e,o}, remove.

The built-in __builtin_vsx_xvcvdpuxws is redundant as it is covered by
vec_unsigned{e,o}, remove.

Add testcases and update documentation.
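
To make the even/odd wording concrete, here is a scalar C model of the new
float overloads.  It is a sketch under stated assumptions, not the built-in
implementation: the function names are hypothetical, element 0 is taken to be
"even" in array order (the mapping to register lanes depends on endianness),
and inputs are assumed to be in range, since an out-of-range float to
long long cast is undefined in C.

```c
#include <assert.h>
#include <stddef.h>

/* Hypothetical model: convert the even-numbered floats (elements 0 and 2)
   to signed long long, truncating toward zero.  */
static void
model_signede_v4sf (const float in[4], long long out[2])
{
  assert (in != NULL && out != NULL);
  out[0] = (long long) in[0];
  out[1] = (long long) in[2];
}

/* Hypothetical model: convert the odd-numbered floats (elements 1 and 3)
   to signed long long, truncating toward zero.  */
static void
model_signedo_v4sf (const float in[4], long long out[2])
{
  assert (in != NULL && out != NULL);
  out[0] = (long long) in[1];
  out[1] = (long long) in[3];
}
```

Four floats produce only two 64-bit results per call, which is why separate
even and odd variants are needed to cover the whole input vector.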

gcc/ChangeLog:
* config/rs6000/rs6000-builtins.def (__builtin_vsx_xvcvdpsxws,
__builtin_vsx_xvcvdpuxws): Removed.
(__builtin_vsx_xvcvspsxds, __builtin_vsx_xvcvspuxds): Renamed
__builtin_vsignede_v4sf, __builtin_vunsignede_v4sf respectively.
(XVCVSPSXDS, XVCVSPUXDS): Renamed VEC_VSIGNEDE_V4SF,
VEC_VUNSIGNEDE_V4SF respectively.
(__builtin_vsignedo_v4sf, __builtin_vunsignedo_v4sf): New
built-in definitions.
* config/rs6000/rs6000-overload.def (vec_signede, vec_signedo,
vec_unsignede,vec_unsignedo):  Add new overloaded specifications.
* config/rs6000/vsx.md (vsignede_v4sf, vsignedo_v4sf,
vunsignede_v4sf, vunsignedo_v4sf): New  define_expands.
* doc/extend.texi (vec_signedo, vec_signede): Add documentation
for new overloaded built-ins.

gcc/testsuite/ChangeLog:
* gcc.target/powerpc/builtins-3-runnable.c
(test_unsigned_int_result, test_ll_unsigned_int_result): Add
new argument.
(vec_signede, vec_signedo, vec_unsignede, vec_unsignedo): New
tests for the overloaded built-ins.
---
 gcc/config/rs6000/rs6000-builtins.def | 20 ++---
 gcc/config/rs6000/rs6000-overload.def |  8 ++
 gcc/config/rs6000/vsx.md  | 84 +++
 gcc/doc/extend.texi   | 10 +++
 .../gcc.target/powerpc/builtins-3-runnable.c  | 49 +--
 5 files changed, 154 insertions(+), 17 deletions(-)

diff --git a/gcc/config/rs6000/rs6000-builtins.def 
b/gcc/config/rs6000/rs6000-builtins.def
index 322d27b7a0d..29a9deb3410 100644
--- a/gcc/config/rs6000/rs6000-builtins.def
+++ b/gcc/config/rs6000/rs6000-builtins.def
@@ -1688,26 +1688,26 @@
   const vsll __builtin_vsx_xvcvdpsxds_scale (vd, const int);
 XVCVDPSXDS_SCALE vsx_xvcvdpsxds_scale {}
 
-  const vsi __builtin_vsx_xvcvdpsxws (vd);
-XVCVDPSXWS vsx_xvcvdpsxws {}
-
   const vsll __builtin_vsx_xvcvdpuxds (vd);
 XVCVDPUXDS vsx_fixuns_truncv2dfv2di2 {}
 
   const vsll __builtin_vsx_xvcvdpuxds_scale (vd, const int);
 XVCVDPUXDS_SCALE vsx_xvcvdpuxds_scale {}
 
-  const vsi __builtin_vsx_xvcvdpuxws (vd);
-XVCVDPUXWS vsx_xvcvdpuxws {}
-
   const vd __builtin_vsx_xvcvspdp (vf);
 XVCVSPDP vsx_xvcvspdp {}
 
-  const vsll __builtin_vsx_xvcvspsxds (vf);
-XVCVSPSXDS vsx_xvcvspsxds {}
+  const vsll __builtin_vsignede_v4sf (vf);
+VEC_VSIGNEDE_V4SF vsignede_v4sf {}
+
+  const vsll __builtin_vsignedo_v4sf (vf);
+VEC_VSIGNEDO_V4SF vsignedo_v4sf {}
+
+  const vull __builtin_vunsignede_v4sf (vf);
+VEC_VUNSIGNEDE_V4SF vunsignede_v4sf {}
 
-  const vsll __builtin_vsx_xvcvspuxds (vf);
-XVCVSPUXDS vsx_xvcvspuxds {}
+  const vull __builtin_vunsignedo_v4sf (vf);
+VEC_VUNSIGNEDO_V4SF vunsignedo_v4sf {}
 
   const vd __builtin_vsx_xvcvsxddp (vsll);
 XVCVSXDDP vsx_floatv2div2df2 {}
diff --git a/gcc/config/rs6000/rs6000-overload.def 
b/gcc/config/rs6000/rs6000-overload.def
index 84bd9ae6554..4d857bb1af3 100644
--- a/gcc/config/rs6000/rs6000-overload.def
+++ b/gcc/config/rs6000/rs6000-overload.def
@@ -3307,10 +3307,14 @@
 [VEC_SIGNEDE, vec_signede, __builtin_vec_vsignede]
   vsi __builtin_vec_vsignede (vd);
 VEC_VSIGNEDE_V2DF
+  vsll __builtin_vec_vsignede (vf);
+VEC_VSIGNEDE_V4SF
 
 [VEC_SIGNEDO, vec_signedo, __builtin_vec_vsignedo]
   vsi __builtin_vec_vsignedo (vd);
 VEC_VSIGNEDO_V2DF
+  vsll __builtin_vec_vsignedo (vf);
+VEC_VSIGNEDO_V4SF
 
 [VEC_SIGNEXTI, vec_signexti, __builtin_vec_signexti]
   vsi __builtin_vec_signexti (vsc);
@@ -4433,10 +4437,14 @@
 [VEC_UNSIGNEDE, vec_unsignede, __builtin_vec_vunsignede]
   vui __builtin_vec_vunsignede (vd);
 VEC_VUNSIGNEDE_V2DF
+  vull __builtin_vec_vunsignede (vf);
+

Re: [Patch, fortran] PR59104

2024-06-13 Thread Paul Richard Thomas
Hi Both,

Thanks for the highly constructive comments. I think that I have
incorporated them fully in the attached.

OK for mainline and ...?

Paul


On Mon, 10 Jun 2024 at 08:19, Andre Vehreschild  wrote:

> Hi Paul,
>
> while looking at your patch I see calls to gfc_add_init_cleanup (...,
> back),
> while the function signature is gfc_add_init_cleanup (..., bool front).
> This
> slightly confuses me. I would at least expect to see
> gfc_add_init_cleanup(...,
> !back) calls. Just to get the semantics right.
>
> Then I wonder why not doing:
>
> diff --git a/gcc/fortran/dependency.cc b/gcc/fortran/dependency.cc
> index bafe8cbc5bc..97ace8c778e 100644
> --- a/gcc/fortran/dependency.cc
> +++ b/gcc/fortran/dependency.cc
> @@ -2497,3 +2497,63 @@ gfc_omp_expr_prefix_same (gfc_expr *lexpr, gfc_expr
> *rexpr)
>return true;
>  }
> +
> +
> +/* gfc_function_dependency returns true for non-dummy symbols with
> dependencies
> +   on an old-fashioned function result (ie. proc_name =
> proc_name->result).
> +   This is used to ensure that initialization code appears after the
> function
> +   result is treated and that any mutual dependencies between these
> symbols are
> +   respected.  */
> +
> +static bool
> +dependency_fcn (gfc_expr *e, gfc_symbol *sym,
> +int *f ATTRIBUTE_UNUSED)
> +{
> +  return (e && e->expr_type == EXPR_VARIABLE
> +  && e->symtree
> +  && e->symtree->n.sym == sym);
> +}
>
> Instead of the multiple if-statements?
>
> +
> +bool
> +gfc_function_dependency (gfc_symbol *sym, gfc_symbol *proc_name)
> +{
> +  bool front = false;
> +
> +  if (proc_name && proc_name->attr.function
> +  && proc_name == proc_name->result
> +  && !(sym->attr.dummy || sym->attr.result))
> +{
> +  if (sym->as && sym->as->type == AS_EXPLICIT)
> +   {
> + for (int dim = 0; dim < sym->as->rank; dim++)
> +   {
> + if (sym->as->lower[dim]
> + && sym->as->lower[dim]->expr_type != EXPR_CONSTANT)
> +   front = gfc_traverse_expr (sym->as->lower[dim], proc_name,
> +  dependency_fcn, 0);
> + if (front)
> +   break;
> + if (sym->as->upper[dim]
> + && sym->as->upper[dim]->expr_type != EXPR_CONSTANT)
> +   front = gfc_traverse_expr (sym->as->upper[dim], proc_name,
> +  dependency_fcn, 0);
> + if (front)
> +   break;
> +   }
> +   }
> +
> +  if (sym->ts.type == BT_CHARACTER
> + && sym->ts.u.cl && sym->ts.u.cl->length
> + && sym->ts.u.cl->length->expr_type != EXPR_CONSTANT)
> +   front = gfc_traverse_expr (sym->ts.u.cl->length, proc_name,
> +  dependency_fcn, 0);
>
> This can overwrite a previous front == true, right? Is this intended?
>
> +}
> +  return front;
> + }
>
> The rest - besides the front-back confusion - looks fine to me. Thanks for
> the
> patch.
>
> Regards,
> Andre
>
> On Sun, 9 Jun 2024 07:14:39 +0100
> Paul Richard Thomas  wrote:
>
> > Hi All,
> >
> > The attached fixes a problem that, judging by the comments, has been
> looked
> > at periodically over the last ten years but just looked to be too
> > fiendishly complicated to fix. This is not in small part because of the
> > confusing ordering of dummies in the tlink chain and the unintuitive
> > placement of all deferred initializations to the front of the init chain
> in
> > the wrapped block.
> >
> > The result of the existing ordering is that the initialization code for
> > non-dummy variables that depends on the function result occurs before any
> > initialization code for the function result itself. The fix ensures that:
> > (i) These variables are placed correctly in the tlink chain, respecting
> > inter-dependencies; and (ii) The dependent initializations are placed at
> > the end of the wrapped block init chain.  The details appear in the
> > comments in the patch. It is entirely possible that a less clunky fix
> > exists but I failed to find it.
> >
> > OK for mainline?
> >
> > Regards
> >
> > Paul
>
>
> --
> Andre Vehreschild * Email: vehre ad gmx dot de
>
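
The dependency check discussed above can be sketched in self-contained C as
follows.  This is only a toy model: the expression type, the symbol ids and
the function names are invented for illustration and are much simpler than
gfc_expr, gfc_symbol and gfc_traverse_expr.  The point is the shape of the
walk, and that a later check should not overwrite an earlier positive result.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>

/* Toy expression tree (hypothetical, far simpler than gfc_expr).  */
enum expr_kind { EXPR_CONSTANT_K, EXPR_VARIABLE_K, EXPR_OP_K };

struct expr
{
  enum expr_kind kind;
  int symbol_id;            /* Meaningful for EXPR_VARIABLE_K only.  */
  const struct expr *lhs;   /* Meaningful for EXPR_OP_K only.  */
  const struct expr *rhs;   /* Meaningful for EXPR_OP_K only.  */
};

/* Return true if E refers to the symbol SYM anywhere in its tree.  */
static bool
references_symbol (const struct expr *e, int sym)
{
  if (e == NULL)
    return false;
  if (e->kind == EXPR_VARIABLE_K)
    return e->symbol_id == sym;
  if (e->kind == EXPR_OP_K)
    return references_symbol (e->lhs, sym) || references_symbol (e->rhs, sym);
  return false;
}

/* Return true if any array bound or the character length depends on the
   function result RESULT_SYM.  Once a dependency is found it is kept.  */
static bool
depends_on_result (const struct expr *const *bounds, size_t n,
                   const struct expr *char_len, int result_sym)
{
  bool front = false;

  assert (bounds != NULL || n == 0);

  for (size_t i = 0; i < n && !front; i++)
    front = references_symbol (bounds[i], result_sym);

  /* Only consult the length when nothing was found, so that an earlier
     hit is never overwritten.  */
  if (!front)
    front = references_symbol (char_len, result_sym);

  return front;
}
```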
diff --git a/gcc/fortran/dependency.cc b/gcc/fortran/dependency.cc
index fb4d94de641..e299508e53a 100644
--- a/gcc/fortran/dependency.cc
+++ b/gcc/fortran/dependency.cc
@@ -2465,3 +2465,85 @@ gfc_omp_expr_prefix_same (gfc_expr *lexpr, gfc_expr *rexpr)
 
   return true;
 }
+
+
+/* gfc_function_dependency returns true for non-dummy symbols with dependencies
+   on an old-fashioned function result (ie. proc_name = proc_name->result).
+   This is used to ensure that initialization code appears after the function
+   result is treated and that any mutual dependencies between these symbols are
+   respected.  */
+
+static bool
+dependency_fcn (gfc_expr *e, gfc_symbol *sym,
+		 int *f ATTRIBUTE_UNUSED)
+{
+  if (e == NULL)
+return false;
+
+  if (e && e->expr_type == 

[PATCH V2 2/2] RISC-V: Move mode assertion out of conditional branch in emit_insn

2024-06-13 Thread Edwin Lu
When emitting insns, we have an early assertion to ensure the input
operand's mode and the expanded operand's mode are the same; however, it
does not perform this check if the pattern does not have an explicit
machine mode specifying the operand. In this scenario, it will always
assume that mode = Pmode to correctly satisfy the
maybe_legitimize_operand check; however, there may be problems when
working in 32-bit environments.

Make the assert unconditional and replace it with an internal error for
more descriptive logging.
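
The shape of the change can be sketched outside of GCC like this.  It is a
stand-alone model with made-up names (enum mode, check_operand_mode); the
real code works on machine_mode values and calls internal_error instead of
filling a buffer.

```c
#include <assert.h>
#include <stddef.h>
#include <stdio.h>

/* Toy mode enumeration (hypothetical stand-in for machine_mode).  */
enum mode { MODE_VOID, MODE_SI, MODE_DI };

static const char *
mode_name (enum mode m)
{
  switch (m)
    {
    case MODE_SI: return "SI";
    case MODE_DI: return "DI";
    default: return "VOID";
    }
}

/* Check operand OPNO.  A pattern mode of MODE_VOID falls back to PMODE,
   and the check now runs in that case too.  Return 0 on success, or -1
   after writing a descriptive message into BUF.  */
static int
check_operand_mode (enum mode operand_mode, enum mode pattern_mode,
                    enum mode pmode, int opno, char *buf, size_t len)
{
  assert (buf != NULL && len > 0);

  enum mode expected = pattern_mode == MODE_VOID ? pmode : pattern_mode;

  if (operand_mode != MODE_VOID && operand_mode != expected)
    {
      snprintf (buf, len, "expected mode %s for operand %d but got mode %s",
                mode_name (expected), opno, mode_name (operand_mode));
      return -1;
    }

  buf[0] = '\0';
  return 0;
}
```

With Pmode modelled as SI, a DI operand in a mode-less pattern slot is now
reported with both mode names instead of passing silently.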

gcc/ChangeLog:

* config/riscv/riscv-v.cc: Move assert out of conditional block.

Signed-off-by: Edwin Lu 
Co-authored-by: Robin Dapp 
---
V2: change assert to internal error
---
 gcc/config/riscv/riscv-v.cc | 25 +++--
 1 file changed, 19 insertions(+), 6 deletions(-)

diff --git a/gcc/config/riscv/riscv-v.cc b/gcc/config/riscv/riscv-v.cc
index 8911f5783c8..3f8214b74d5 100644
--- a/gcc/config/riscv/riscv-v.cc
+++ b/gcc/config/riscv/riscv-v.cc
@@ -50,6 +50,7 @@
 #include "rtx-vector-builder.h"
 #include "targhooks.h"
 #include "predict.h"
+#include "errors.h"
 
 using namespace riscv_vector;
 
@@ -290,11 +291,17 @@ public:
   always Pmode.  */
if (mode == VOIDmode)
  mode = Pmode;
-   else
- /* Early assertion ensures same mode since maybe_legitimize_operand
-will check this.  */
- gcc_assert (GET_MODE (ops[opno]) == VOIDmode
- || GET_MODE (ops[opno]) == mode);
+
+   /* Early assertion ensures same mode since maybe_legitimize_operand
+  will check this.  */
+machine_mode required_mode = GET_MODE (ops[opno]);
+if (required_mode != VOIDmode && required_mode != mode)
+ internal_error ("expected mode %s for operand %d of "
+ "insn %s but got mode %s.\n",
+ GET_MODE_NAME (mode),
+ opno,
+ insn_data[(int) icode].name,
+ GET_MODE_NAME (required_mode));
 
add_input_operand (ops[opno], mode);
   }
@@ -346,7 +353,13 @@ public:
 else if (m_insn_flags & VXRM_RDN_P)
   add_rounding_mode_operand (VXRM_RDN);
 
-gcc_assert (insn_data[(int) icode].n_operands == m_opno);
+
+if (insn_data[(int) icode].n_operands != m_opno)
+  internal_error ("invalid number of operands for insn %s, "
+ "expected %d but got %d.\n",
+ insn_data[(int) icode].name,
+ insn_data[(int) icode].n_operands, m_opno);
+
 expand (icode, any_mem_p);
   }
 
-- 
2.34.1



[PATCH V2 1/2] RISC-V: Fix vwsll combine on rv32 targets

2024-06-13 Thread Edwin Lu
On rv32 targets, vwsll_zext1_scalar_ would trigger an ICE in
maybe_legitimize_instruction when zero extending a uint32 to uint64 due
to a mismatch between the input operand's mode (DI) and the expanded insn
operand's mode (Pmode == SI). Ensure that the modes of the operands match.
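
As a rough illustration (this is not the failing testcase, and the function
name is made up for this sketch), the kind of source loop that a widening
shift-left pattern is meant to cover zero-extends each 32-bit element to
64 bits and then shifts it by a scalar amount:

```cpp
#include <cassert>
#include <cstddef>
#include <cstdint>

// Zero-extend each uint32 element to uint64, then shift left by a scalar.
// The caller must keep shift in the range [0, 63].
void widen_shift_left (uint64_t *dst, const uint32_t *src, int shift,
		       std::size_t n)
{
  for (std::size_t i = 0; i < n; i++)
    dst[i] = (uint64_t) src[i] << shift;
}

// Usage example with two elements and a shift of 4.
void demo_widen_shift_left ()
{
  const uint32_t src[2] = { 1u, 0x80000000u };
  uint64_t dst[2] = { 0, 0 };
  widen_shift_left (dst, src, 4, 2);
  assert (dst[0] == 16u);
  assert (dst[1] == 0x800000000ull);
}
```

Whether a given loop is actually vectorized into vwsll depends on the
enabled extensions and options, which this sketch does not assume.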

gcc/ChangeLog:

* config/riscv/autovec-opt.md: Fix mode mismatch

Signed-off-by: Edwin Lu 
Co-authored-by: Robin Dapp 
---
V2: Remove subreg check
---
 gcc/config/riscv/autovec-opt.md | 3 +--
 1 file changed, 1 insertion(+), 2 deletions(-)

diff --git a/gcc/config/riscv/autovec-opt.md b/gcc/config/riscv/autovec-opt.md
index 6a2eabbd854..29916adb62b 100644
--- a/gcc/config/riscv/autovec-opt.md
+++ b/gcc/config/riscv/autovec-opt.md
@@ -1517,8 +1517,7 @@ (define_insn_and_split "*vwsll_zext1_scalar_"
   "&& 1"
   [(const_int 0)]
   {
-if (GET_CODE (operands[2]) == SUBREG)
-  operands[2] = SUBREG_REG (operands[2]);
+operands[2] = gen_lowpart (Pmode, operands[2]);
 insn_code icode = code_for_pred_vwsll_scalar (mode);
 riscv_vector::emit_vlmax_insn (icode, riscv_vector::BINARY_OP, operands);
 DONE;
-- 
2.34.1



[PATCH V2 0/2] Fix ICE with vwsll combine on 32bit targets

2024-06-13 Thread Edwin Lu
The following testcases have been failing on rv32 targets since 
r15-953-gaf4bf422a69:
FAIL: gcc.target/riscv/rvv/autovec/binop/vwsll-1.c (internal compiler
error: in maybe_legitimize_operand, at optabs.cc:8056)
FAIL: gcc.target/riscv/rvv/autovec/binop/vwsll-1.c (test for excess
errors)

Fix the bug and also make our emit_insn more robust by making an assertion
check unconditional.

I'm not sure if this ICE warrants its own separate testcase since it is
already being tested. I do have a minimal testcase on hand if we would
like to add one.

V2: Remove subreg condition and change assert to internal error

Edwin Lu (2):
  RISC-V: Fix vwsll combine on rv32 targets
  RISC-V: Move mode assertion out of conditional branch in emit_insn

 gcc/config/riscv/autovec-opt.md |  3 +--
 gcc/config/riscv/riscv-v.cc | 25 +++--
 2 files changed, 20 insertions(+), 8 deletions(-)

-- 
2.34.1



[PATCH 7/13 ver4] rs6000, add overloaded vec_sel with int128 arguments

2024-06-13 Thread Carl Love


GCC maintainers:

The patch has been updated per the comments from version 3.  Please let me know 
if the patch is acceptable for mainline.

 Carl 

-

rs6000, add overloaded vec_sel with int128 arguments

Extend the vec_sel built-in to take three signed/unsigned/bool int128
arguments and return a signed/unsigned/bool int128 result.

Extending the vec_sel built-in makes the existing built-ins
__builtin_vsx_xxsel_1ti and __builtin_vsx_xxsel_1ti_uns obsolete.  The
patch removes these built-ins.

The patch adds documentation and test cases for the new overloaded
vec_sel built-ins.
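
As a scalar model of what the new instances compute (an illustration only,
not part of the patch): per the PVIPR description of vec_sel, each result bit
comes from the second argument where the corresponding mask bit is 1 and from
the first argument otherwise.  The sketch below uses plain unsigned __int128,
a GCC/Clang extension on 64-bit targets, and the helper name select_bits is
invented here.

```cpp
#include <cassert>

// unsigned __int128 is a compiler extension available on 64-bit targets.
using u128 = unsigned __int128;

// Bitwise select: take b where mask has a 1 bit, a where it has a 0 bit.
u128 select_bits (u128 a, u128 b, u128 mask)
{
  return (a & ~mask) | (b & mask);
}

// Usage example: selecting from all-ones with a one-byte mask in the
// upper half yields exactly that mask.
void demo_select_bits ()
{
  u128 zero = 0;
  u128 ones = ~zero;
  u128 mask = (u128) 0xFF << 64;
  assert (select_bits (zero, ones, mask) == mask);
  assert (select_bits (zero, ones, zero) == zero);
}
```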

gcc/ChangeLog:
* config/rs6000/rs6000-builtins.def (__builtin_vsx_xxsel_1ti,
__builtin_vsx_xxsel_1ti_uns): Remove built-in definitions.
* config/rs6000/rs6000-overload.def (vec_sel): Add new
overloaded  definitions.
* doc/extend.texi: Add documentation for new vec_sel instances.

gcc/testsuite/ChangeLog:
* gcc.target/powerpc/builtins-10-runnable.c: New runnable test
file.
* gcc.target/powerpc/builtins-10.c: New compile only test file.
---
 gcc/config/rs6000/rs6000-builtins.def |   6 -
 gcc/config/rs6000/rs6000-overload.def |  12 +
 gcc/doc/extend.texi   |  20 ++
 .../gcc.target/powerpc/builtins-10-runnable.c | 220 ++
 .../gcc.target/powerpc/builtins-10.c  |  63 +
 5 files changed, 315 insertions(+), 6 deletions(-)
 create mode 100644 gcc/testsuite/gcc.target/powerpc/builtins-10-runnable.c
 create mode 100644 gcc/testsuite/gcc.target/powerpc/builtins-10.c

diff --git a/gcc/config/rs6000/rs6000-builtins.def 
b/gcc/config/rs6000/rs6000-builtins.def
index b90b3f34167..c969cd0f3f6 100644
--- a/gcc/config/rs6000/rs6000-builtins.def
+++ b/gcc/config/rs6000/rs6000-builtins.def
@@ -1907,12 +1907,6 @@
   const vuc __builtin_vsx_xxsel_16qi_uns (vuc, vuc, vuc);
 XXSEL_16QI_UNS vector_select_v16qi_uns {}
 
-  const vsq __builtin_vsx_xxsel_1ti (vsq, vsq, vsq);
-XXSEL_1TI vector_select_v1ti {}
-
-  const vsq __builtin_vsx_xxsel_1ti_uns (vsq, vsq, vsq);
-XXSEL_1TI_UNS vector_select_v1ti_uns {}
-
   const vd __builtin_vsx_xxsel_2df (vd, vd, vd);
 XXSEL_2DF vector_select_v2df {}
 
diff --git a/gcc/config/rs6000/rs6000-overload.def 
b/gcc/config/rs6000/rs6000-overload.def
index 4d857bb1af3..6cec1ad4f1a 100644
--- a/gcc/config/rs6000/rs6000-overload.def
+++ b/gcc/config/rs6000/rs6000-overload.def
@@ -3274,6 +3274,18 @@
 VSEL_2DF  VSEL_2DF_B
   vd __builtin_vec_sel (vd, vd, vull);
 VSEL_2DF  VSEL_2DF_U
+  vsq __builtin_vec_sel (vsq, vsq, vbq);
+VSEL_1TI  VSEL_1TI_B
+  vsq __builtin_vec_sel (vsq, vsq, vuq);
+VSEL_1TI  VSEL_1TI_U
+  vuq __builtin_vec_sel (vuq, vuq, vbq);
+VSEL_1TI_UNS  VSEL_1TI_UB
+  vuq __builtin_vec_sel (vuq, vuq, vuq);
+VSEL_1TI_UNS  VSEL_1TI_UU
+  vbq __builtin_vec_sel (vbq, vbq, vbq);
+VSEL_1TI_UNS  VSEL_1TI_BB
+  vbq __builtin_vec_sel (vbq, vbq, vuq);
+VSEL_1TI_UNS  VSEL_1TI_BU
 ; The following variants are deprecated.
   vsll __builtin_vec_sel (vsll, vsll, vsll);
 VSEL_2DI_B  VSEL_2DI_S
diff --git a/gcc/doc/extend.texi b/gcc/doc/extend.texi
index b1620274285..d7d8d149a43 100644
--- a/gcc/doc/extend.texi
+++ b/gcc/doc/extend.texi
@@ -21420,6 +21420,26 @@ Additional built-in functions are available for the 
64-bit PowerPC
 family of processors, for efficient use of 128-bit floating point
 (@code{__float128}) values.
 
+Vector select
+
+@smallexample
+vector signed __int128 vec_sel (vector signed __int128,
+   vector signed __int128, vector bool __int128);
+vector signed __int128 vec_sel (vector signed __int128,
+   vector signed __int128, vector unsigned __int128);
+vector unsigned __int128 vec_sel (vector unsigned __int128,
+   vector unsigned __int128, vector bool __int128);
+vector unsigned __int128 vec_sel (vector unsigned __int128,
+   vector unsigned __int128, vector unsigned __int128);
+vector bool __int128 vec_sel (vector bool __int128,
+   vector bool __int128, vector bool __int128);
+vector bool __int128 vec_sel (vector bool __int128,
+   vector bool __int128, vector unsigned __int128);
+@end smallexample
+
+The instance is an extension of the existing overloaded built-in @code{vec_sel}
+that is documented in the PVIPR.
+
 @node Basic PowerPC Built-in Functions Available on ISA 2.06
 @subsubsection Basic PowerPC Built-in Functions Available on ISA 2.06
 
diff --git a/gcc/testsuite/gcc.target/powerpc/builtins-10-runnable.c 
b/gcc/testsuite/gcc.target/powerpc/builtins-10-runnable.c
new file mode 100644
index 000..b7b4a95ea0e
--- /dev/null
+++ b/gcc/testsuite/gcc.target/powerpc/builtins-10-runnable.c
@@ -0,0 +1,220 @@
+/* { dg-do run } */
+/* { dg-require-effective-target vmx_hw } */
+/* { dg-options "-maltivec -O2 " } */
+
+#include 
+
+#define DEBUG 0
+
+#if DEBUG
+#include 

Re: [PATCH 13/13 ver 3] rs6000, remove vector set and vector init built-ins.

2024-06-13 Thread Carl Love
Kewen:

On 6/3/24 22:59, Kewen.Lin wrote:
> Hi,
> 
> on 2024/5/30 00:16, Carl Love wrote:
>> This was patch 13 from the previous series.  Note the previous series patch 
>> 12 was dropped.  This patch is the same as the previous version.  The 
>> additional work to remove  __builtin_vec_set_v1ti, __builtin_vec_set_v2di,  
>> __builtin_vec_set_v2d per the feedback comments with equivalent gimple code 
>> is being deferred to a future patch.  The goal of this series was simply to 
>> remove duplicated built-ins, extending overloaded built-ins as needed.  
>> Adding the needed gimple code to remove the additional built-ins is beyond 
>> the goal of this patch series.
>>
>>  Carl 
>> ---
>>
>> rs6000, remove vector set and vector init built-ins.
>>
>> The vector init built-ins:
>>
>>   __builtin_vec_init_v16qi, __builtin_vec_init_v8hi,
>>   __builtin_vec_init_v4si, __builtin_vec_init_v4sf,
>>   __builtin_vec_init_v2di, __builtin_vec_init_v2df,
>>   __builtin_vec_set_v1ti
> 
> Typo here, s/__builtin_vec_set_v1ti/__builtin_vec_init_v1ti/

Fixed.

> 
>>
>> perform the same operation as initializing the vector in C code.  For
>> example:
>>
>>   result_v4si = __builtin_vec_init_v4si (1, 2, 3, 4);
>>   result_v4si = {1, 2, 3, 4};
>>
>> These two constructs were tested and verified they generate identical
>> assembly instructions with no optimization and -O3 optimization.
>>
>> The vector set built-ins:
>>
>>   __builtin_vec_set_v16qi, __builtin_vec_set_v8hi,
>>   __builtin_vec_set_v4si, __builtin_vec_set_v4sf
> 
> Please also add the reserved ones (...v1ti/v2di/v2df), as they are the 
> same too, temporarily reserving them for the uses in resolve_vec_insert()
> doesn't affect this.

Added the three additional built-ins to the list.

> 
>>
>> perform the same operation as setting a specific element in the vector in
>> C code.  For example:
>>
>>   src_v4si = __builtin_vec_set_v4si (src_v4si, int_val, index);
>>   src_v4si[index] = int_val;
>>
>> The built-in actually generates more instructions than the inline C code
>> with no optimization but is identical with -O3 optimizations.
>>
>> All of the above built-ins that are removed do not have test cases and
>> are not documented.
>>
>> Built-ins __builtin_vec_set_v1ti, __builtin_vec_set_v2di,
>> __builtin_vec_set_v2df are not removed as they are used in function
>> resolve_vec_insert() in file rs6000-c.cc.
>>
>> The built-ins are removed as they don't provide any benefit over just
>> using C code.
>>
>> gcc/ChangeLog:
>>  * config/rs6000/rs6000-builtins.def (__builtin_vec_init_v16qi,
>>  __builtin_vec_init_v8hi, __builtin_vec_init_v4si,
>>  __builtin_vec_init_v4sf, __builtin_vec_init_v2di,
>>  __builtin_vec_init_v2df, __builtin_vec_set_v1ti,
> 
> Typo, s/__builtin_vec_set_v1ti/__builtin_vec_init_v1ti/

Fixed

> 
>>  __builtin_vec_set_v16qi, __builtin_vec_set_v8hi.
>>  __builtin_vec_set_v4si, __builtin_vec_set_v4sf,
>>  __builtin_vec_set_v2di, __builtin_vec_set_v2df,
>>  __builtin_vec_set_v1ti): Remove built-in definitions.
> 
> The last three ones are not actually removed.

OK, fixed.

> 
>> ---
>>  gcc/config/rs6000/rs6000-builtins.def | 42 ++-
>>  1 file changed, 2 insertions(+), 40 deletions(-)
>>
>> diff --git a/gcc/config/rs6000/rs6000-builtins.def 
>> b/gcc/config/rs6000/rs6000-builtins.def
>> index 48ebc018a8d..8349d45169f 100644
>> --- a/gcc/config/rs6000/rs6000-builtins.def
>> +++ b/gcc/config/rs6000/rs6000-builtins.def
>> @@ -1118,37 +1118,6 @@
>>const signed short __builtin_vec_ext_v8hi (vss, signed int);
>>  VEC_EXT_V8HI nothing {extract}
>>  
>> -  const vsc __builtin_vec_init_v16qi (signed char, signed char, signed 
>> char, \
>> -signed char, signed char, signed char, signed char, signed 
>> char, \
>> -signed char, signed char, signed char, signed char, signed 
>> char, \
>> -signed char, signed char, signed char);
>> -VEC_INIT_V16QI nothing {init}
>> -
>> -  const vf __builtin_vec_init_v4sf (float, float, float, float);
>> -VEC_INIT_V4SF nothing {init}
>> -
>> -  const vsi __builtin_vec_init_v4si (signed int, signed int, signed int, \
>> - signed int);
>> -VEC_INIT_V4SI nothing {init}
>> -
>> -  const vss __builtin_vec_init_v8hi (signed short, signed short, signed 
>> short,\
>> - signed short, signed short, signed short, signed short, \
>> - signed short);
>> -VEC_INIT_V8HI nothing {init}
>> -
>> -  const vsc __builtin_vec_set_v16qi (vsc, signed char, const int<4>);
>> -VEC_SET_V16QI nothing {set}
>> -
>> -  const vf __builtin_vec_set_v4sf (vf, float, const int<2>);
>> -VEC_SET_V4SF nothing {set}
>> -
>> -  const vsi __builtin_vec_set_v4si (vsi, signed int, const int<2>);
>> -VEC_SET_V4SI nothing {set}
>> -
>> -  const vss __builtin_vec_set_v8hi (vss, signed short, const int<3>);
>> - 

Re: [PATCH] configure: adjustments for building with in-tree binutils

2024-06-13 Thread Jeff Law




On 6/12/24 7:06 AM, Jan Beulich wrote:

For one, setting ld_ver in a conditional (no in-tree ld) when it's used,
for x86 at least, in unconditional ways can't be quite right. And then
prefixing relative paths to binaries with ${objdir}/, when ${objdir}
nowadays resolves to just .libs, can at best be a leftover that wasn't
properly cleaned up at some earlier point.

gcc/

* configure.ac: Drop ${objdir}/ from NM and AR. Move setting of
  ld_ver out of conditional.
* configure: Re-generate.

OK.
jeff



Re: PING^1 [PATCH 30/52] pdp11: Remove macro LONG_DOUBLE_TYPE_SIZE

2024-06-13 Thread Paul Koning
What is the effect of this change?  The original code intended to have "float" 
mean a 32 bit value, and "double" a 64 bit value.  There aren't any larger 
floats, so I defined the long double size as 64 also.  Is the right answer not 
to define it?

That part I understand, but why does the patch also remove FLOAT_TYPE_SIZE and 
DOUBLE_TYPE_SIZE without explanation and without mention in the changelog?

paul

> On Jun 13, 2024, at 3:16 AM, Kewen.Lin  wrote:
> 
> Hi,
> 
> Gentle ping:
> 
> https://gcc.gnu.org/pipermail/gcc-patches/2024-June/653368.html
> 
> BR,
> Kewen
> 
> on 2024/6/3 11:01, Kewen Lin wrote:
>> This is to remove macro LONG_DOUBLE_TYPE_SIZE define
>> in pdp11 port.
>> 
>> gcc/ChangeLog:
>> 
>>  * config/pdp11/pdp11.h (LONG_DOUBLE_TYPE_SIZE): Remove.
>> ---
>> gcc/config/pdp11/pdp11.h | 11 ---
>> 1 file changed, 11 deletions(-)
>> 
>> diff --git a/gcc/config/pdp11/pdp11.h b/gcc/config/pdp11/pdp11.h
>> index 2446fea0b58..6c8e045bc57 100644
>> --- a/gcc/config/pdp11/pdp11.h
>> +++ b/gcc/config/pdp11/pdp11.h
>> @@ -71,17 +71,6 @@ along with GCC; see the file COPYING3.  If not see
>> #define LONG_TYPE_SIZE   32
>> #define LONG_LONG_TYPE_SIZE  64 
>> 
>> -/* In earlier versions, FLOAT_TYPE_SIZE was selectable as 32 or 64,
>> -   but that conflicts with Fortran language rules.  Since there is no
>> -   obvious reason why we should have that feature -- other targets
>> -   generally don't have float and double the same size -- I've removed
>> -   it.  Note that it continues to be true (for now) that arithmetic is
>> -   always done with 64-bit values, i.e., the FPU is always in "double"
>> -   mode.  */
>> -#define FLOAT_TYPE_SIZE 32
>> -#define DOUBLE_TYPE_SIZE64
>> -#define LONG_DOUBLE_TYPE_SIZE   64
>> -
>> /* machine types from ansi */
>> #define SIZE_TYPE "short unsigned int"   /* definition of size_t */
>> #define WCHAR_TYPE "short int"   /* or long int */
> 
> 
> 



[PATCH 0/13 ver4] rs6000, built-in cleanup patch series

2024-06-13 Thread Carl Love
GCC maintainers:

I have addressed the comments to the five patches in the series that have not 
yet been approved.
The patches that have already been approved are 1, 3, 5, 6, 8, 9, 10, and 12.

The remaining patches all have fairly minor fixes requested.  I will just post 
version 4 of these patches here.  The goal is to commit the entire series all 
at once as they are all related.  So I am holding off committing the approved 
patches.  

Thank you for your time and feedback on these patches.  The entire patch series 
has been tested on Power 10 LE and Power 9 BE with no regression failures.

   Carl 


Re: [PATCH] RISC-V: Add configure check for Zaamo/Zalrsc assembler support

2024-06-13 Thread Jeff Law




On 6/12/24 5:20 PM, Patrick O'Neill wrote:

Binutils 2.42 and before don't support Zaamo/Zalrsc. Add a configure
check to prevent emitting Zaamo/Zalrsc in the arch string when the
assembler does not support it.

gcc/ChangeLog:

* common/config/riscv/riscv-common.cc
  (riscv_subset_list::to_string): Skip zaamo/zalrsc when not
  supported by the assembler.
* config.in: Regenerate.
* configure: Regenerate.
* configure.ac: Add zaamo/zalrsc assembler check.

OK.

It looks like you've got some unexpected diff fragments in configure -- 
all the LARGE_OFF_T stuff.  They look OK to me, but something like that 
is usually a sign of different autoconf versions.   I wouldn't lose any 
sleep if you left them as-is or removed those hunks before committing.


jeff



Re: [PATCH] RISC-V: Add support for subword atomic loads/stores

2024-06-13 Thread Jeff Law




On 6/12/24 6:10 PM, Patrick O'Neill wrote:

Andrea Parri recently pointed out that we were emitting overly conservative
fences for seq_cst atomic loads/stores. This adds support for the optimized
fences specified in the PSABI:
https://github.com/riscv-non-isa/riscv-elf-psabi-doc/blob/2092568f7896ceaa1ec0f02569b19eaa42cd51c9/riscv-atomic.adoc

gcc/ChangeLog:

* config/riscv/sync-rvwmo.md: Add support for subword fenced
loads/stores.
* config/riscv/sync-ztso.md: Ditto.
* config/riscv/sync.md: Ditto.

gcc/testsuite/ChangeLog:

* gcc.target/riscv/amo/amo-table-a-6-load-1.c: Increase test coverage to
include longs, shorts, chars, and bools.
* gcc.target/riscv/amo/amo-table-a-6-load-2.c: Ditto.
* gcc.target/riscv/amo/amo-table-a-6-load-3.c: Ditto.
* gcc.target/riscv/amo/amo-table-a-6-store-1.c: Ditto.
* gcc.target/riscv/amo/amo-table-a-6-store-2.c: Ditto.
* gcc.target/riscv/amo/amo-table-a-6-store-compat-3.c: Ditto.
* gcc.target/riscv/amo/amo-table-ztso-load-1.c: Ditto.
* gcc.target/riscv/amo/amo-table-ztso-load-2.c: Ditto.
* gcc.target/riscv/amo/amo-table-ztso-load-3.c: Ditto.
* gcc.target/riscv/amo/amo-table-ztso-store-1.c: Ditto.
* gcc.target/riscv/amo/amo-table-ztso-store-2.c: Ditto.
* gcc.target/riscv/amo/amo-table-ztso-store-3.c: Ditto.

OK
jeff



Re: [PATCH] rs6000, altivec-2-runnable.c should be a runnable test

2024-06-13 Thread Segher Boessenkool
Hi!

On Thu, Jun 13, 2024 at 11:32:58AM -0700, Carl Love wrote:
> The test gcc/testsuite/gcc.target/powerpc/altivec-2-runnable.c is supposed to 
> be a runnable test
> to verify the execution of the vec_unpackl and vec_unpackh built-ins.  The 
> dg-do command is set to
> compile not run.  This patch fixes the dg-do command argument.
> 
> The patch has been verified on a P10.  The test runs without errors.

> rs6000, altivec-2-runnable.c should be a runnable test
> 
> The test case has "dg-do compile" set not "dg-do run" for a runnable
> test.  This patch changes the dg-do command argument to run.
> 
> gcc/testsuite/ChangeLog:
>   * gcc.target/powerpc/altivec-2-runnable.c: Change dg-do
>   argument to run.


> --- a/gcc/testsuite/gcc.target/powerpc/altivec-2-runnable.c
> +++ b/gcc/testsuite/gcc.target/powerpc/altivec-2-runnable.c
> @@ -1,4 +1,4 @@
> -/* { dg-do compile { target powerpc*-*-* } } */
> +/* { dg-do run { target powerpc*-*-* } } */
>  /* { dg-options "-mvsx" } */
>  /* { dg-additional-options "-mdejagnu-cpu=power8" { target { ! has_arch_pwr8 
> } } } */
>  /* { dg-require-effective-target powerpc_vsx } */

Everything in gcc.target/powerpc/ is tested for "target powerpc*-*-*"
already, so you could remove that target clause even (after testing of
course :-) )

Okay for trunk with or without that extra tweak.  Thank you!


Segher


Re: [PATCH 13/13 ver4] rs6000, remove vector set and vector init built-ins

2024-06-13 Thread Carl Love
GCC maintainers:

The patch has been updated per the feedback from version 3.  Please let me know 
if the patch is acceptable for mainline.

Thanks.

  Carl 

--

rs6000, remove vector set and vector init built-ins

The vector init built-ins:

  __builtin_vec_init_v16qi, __builtin_vec_init_v8hi,
  __builtin_vec_init_v4si, __builtin_vec_init_v4sf,
  __builtin_vec_init_v2di, __builtin_vec_init_v2df,
  __builtin_vec_init_v1ti

perform the same operation as initializing the vector in C code.  For
example:

  result_v4si = __builtin_vec_init_v4si (1, 2, 3, 4);
  result_v4si = {1, 2, 3, 4};

These two constructs were tested and verified they generate identical
assembly instructions with no optimization and -O3 optimization.

The vector set built-ins:

  __builtin_vec_set_v16qi, __builtin_vec_set_v8hi,
  __builtin_vec_set_v4si, __builtin_vec_set_v4sf,
  __builtin_vec_set_v1ti, __builtin_vec_set_v2di,
  __builtin_vec_set_v2df

perform the same operation as setting a specific element in the vector in
C code.  For example:

  src_v4si = __builtin_vec_set_v4si (src_v4si, int_val, index);
  src_v4si[index] = int_val;

The built-in actually generates more instructions than the inline C code
with no optimization but is identical with -O3 optimizations.
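
For readers without a PowerPC toolchain, the plain C forms above can be tried
with the target-independent GCC vector extension.  This is only a sketch under
that assumption: the typedef and the helper names below are invented for the
example, and the removed built-ins themselves are not used.

```cpp
#include <cassert>

// Generic GCC/Clang vector type: four 32-bit ints in 16 bytes.
typedef int v4si __attribute__ ((vector_size (16)));

// Equivalent of the removed init built-in: a brace initializer.
v4si make_v4si (int a, int b, int c, int d)
{
  v4si v = { a, b, c, d };
  return v;
}

// Equivalent of the removed set built-in: a subscript assignment.
// The index is masked to stay within the four elements.
v4si set_elem (v4si v, int val, int index)
{
  v[index & 3] = val;
  return v;
}

// Read one element back, again with a masked index.
int get_elem (v4si v, int index)
{
  return v[index & 3];
}

// Usage example: build {1, 2, 3, 4}, then overwrite element 2.
void demo_vec_init_set ()
{
  v4si v = make_v4si (1, 2, 3, 4);
  v = set_elem (v, 9, 2);
  assert (get_elem (v, 0) == 1);
  assert (get_elem (v, 2) == 9);
}
```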

All of the above built-ins that are removed do not have test cases and
are not documented.

Built-ins __builtin_vec_set_v1ti, __builtin_vec_set_v2di and
__builtin_vec_set_v2df are not removed as they are used in function
resolve_vec_insert() in file rs6000-c.cc.

The built-ins are removed as they don't provide any benefit over just
using C code.

gcc/ChangeLog:
* config/rs6000/rs6000-builtins.def (__builtin_vec_init_v16qi,
__builtin_vec_init_v4sf, __builtin_vec_init_v4si,
__builtin_vec_init_v8hi, __builtin_vec_init_v1ti,
__builtin_vec_init_v2df, __builtin_vec_init_v2di,
__builtin_vec_set_v16qi, __builtin_vec_set_v4sf,
__builtin_vec_set_v4si, __builtin_vec_set_v8hi): Remove
built-in definitions.
---
 gcc/config/rs6000/rs6000-builtins.def | 44 +++
 1 file changed, 4 insertions(+), 40 deletions(-)

diff --git a/gcc/config/rs6000/rs6000-builtins.def 
b/gcc/config/rs6000/rs6000-builtins.def
index 02aa04e5698..053dc0115d2 100644
--- a/gcc/config/rs6000/rs6000-builtins.def
+++ b/gcc/config/rs6000/rs6000-builtins.def
@@ -1118,37 +1118,6 @@
   const signed short __builtin_vec_ext_v8hi (vss, signed int);
 VEC_EXT_V8HI nothing {extract}
 
-  const vsc __builtin_vec_init_v16qi (signed char, signed char, signed char, \
-signed char, signed char, signed char, signed char, signed char, \
-signed char, signed char, signed char, signed char, signed char, \
-signed char, signed char, signed char);
-VEC_INIT_V16QI nothing {init}
-
-  const vf __builtin_vec_init_v4sf (float, float, float, float);
-VEC_INIT_V4SF nothing {init}
-
-  const vsi __builtin_vec_init_v4si (signed int, signed int, signed int, \
- signed int);
-VEC_INIT_V4SI nothing {init}
-
-  const vss __builtin_vec_init_v8hi (signed short, signed short, signed short,\
- signed short, signed short, signed short, signed short, \
- signed short);
-VEC_INIT_V8HI nothing {init}
-
-  const vsc __builtin_vec_set_v16qi (vsc, signed char, const int<4>);
-VEC_SET_V16QI nothing {set}
-
-  const vf __builtin_vec_set_v4sf (vf, float, const int<2>);
-VEC_SET_V4SF nothing {set}
-
-  const vsi __builtin_vec_set_v4si (vsi, signed int, const int<2>);
-VEC_SET_V4SI nothing {set}
-
-  const vss __builtin_vec_set_v8hi (vss, signed short, const int<3>);
-VEC_SET_V8HI nothing {set}
-
-
 ; Cell builtins.
 [cell]
   pure vsc __builtin_altivec_lvlx (signed long, const void *);
@@ -1295,15 +1264,10 @@
   const signed long long __builtin_vec_ext_v2di (vsll, signed int);
 VEC_EXT_V2DI nothing {extract}
 
-  const vsq __builtin_vec_init_v1ti (signed __int128);
-VEC_INIT_V1TI nothing {init}
-
-  const vd __builtin_vec_init_v2df (double, double);
-VEC_INIT_V2DF nothing {init}
-
-  const vsll __builtin_vec_init_v2di (signed long long, signed long long);
-VEC_INIT_V2DI nothing {init}
-
+;; VEC_SET_V1TI, VEC_SET_V2DF and VEC_SET_V2DI are used in
+;; resolve_vec_insert(), rs6000-c.cc
+;; TODO: Remove VEC_SET_V1TI, VEC_SET_V2DF and VEC_SET_V2DI once the uses
+;; in resolve_vec_insert are replaced by the equivalent gimple statements.
   const vsq __builtin_vec_set_v1ti (vsq, signed __int128, const int<0,0>);
 VEC_SET_V1TI nothing {set}
 
-- 
2.45.0



[PATCH 11/13 ver4] rs6000, extend vec_xxpermdi built-in for __int128 args

2024-06-13 Thread Carl Love


GCC maintainers:

The patch has been updated per the comments from version 3.  Please let me know 
if the patch is acceptable for mainline.

Thanks.

 Carl 

-

rs6000, extend vec_xxpermdi built-in for __int128 args

Add new signed and unsigned overloaded instances for vec_xxpermdi:

   __int128 vec_xxpermdi (__int128, __int128, const int);
   __uint128 vec_xxpermdi (__uint128, __uint128, const int);

Update the documentation to include a reference to the new built-in
instances.

Add test cases for the new overloaded instances.

gcc/ChangeLog:
* config/rs6000/rs6000-overload.def (vec_xxpermdi): Add new
overloaded built-in instances.
* doc/extend.texi:  Add documentation for new overloaded built-in
instances.

gcc/testsuite/ChangeLog:
* gcc.target/powerpc/vec_perm-runnable-i128.c: New test file.
---
 gcc/config/rs6000/rs6000-overload.def |   4 +
 gcc/doc/extend.texi   |   4 +
 .../powerpc/vec_perm-runnable-i128.c  | 229 ++
 3 files changed, 237 insertions(+)
 create mode 100644 gcc/testsuite/gcc.target/powerpc/vec_perm-runnable-i128.c

diff --git a/gcc/config/rs6000/rs6000-overload.def 
b/gcc/config/rs6000/rs6000-overload.def
index 6cec1ad4f1a..354f8fabe0f 100644
--- a/gcc/config/rs6000/rs6000-overload.def
+++ b/gcc/config/rs6000/rs6000-overload.def
@@ -4936,6 +4936,10 @@
 XXPERMDI_2DI  XXPERMDI_VSLL
   vull __builtin_vsx_xxpermdi (vull, vull, const int);
 XXPERMDI_2DI  XXPERMDI_VULL
+  vsq __builtin_vsx_xxpermdi (vsq, vsq, const int);
+XXPERMDI_1TI  XXPERMDI_1SQ
+  vuq __builtin_vsx_xxpermdi (vuq, vuq, const int);
+XXPERMDI_1TI  XXPERMDI_1UQ
   vf __builtin_vsx_xxpermdi (vf, vf, const int);
 XXPERMDI_4SF  XXPERMDI_VF
   vd __builtin_vsx_xxpermdi (vd, vd, const int);
diff --git a/gcc/doc/extend.texi b/gcc/doc/extend.texi
index d7d8d149a43..9e45976436b 100644
--- a/gcc/doc/extend.texi
+++ b/gcc/doc/extend.texi
@@ -22610,6 +22610,10 @@ void vec_vsx_st (vector bool char, int, signed char *);
 
 vector double vec_xxpermdi (vector double, vector double, const int);
 vector float vec_xxpermdi (vector float, vector float, const int);
+vector __int128 vec_xxpermdi (vector signed __int128,
+  vector signed __int128, const int);
+vector __int128 vec_xxpermdi (vector unsigned __int128,
+  vector unsigned __int128, const int);
 vector long long vec_xxpermdi (vector long long, vector long long, const int);
 vector unsigned long long vec_xxpermdi (vector unsigned long long,
 vector unsigned long long, const int);
diff --git a/gcc/testsuite/gcc.target/powerpc/vec_perm-runnable-i128.c 
b/gcc/testsuite/gcc.target/powerpc/vec_perm-runnable-i128.c
new file mode 100644
index 000..0e0d77bcb84
--- /dev/null
+++ b/gcc/testsuite/gcc.target/powerpc/vec_perm-runnable-i128.c
@@ -0,0 +1,229 @@
+/* { dg-do run } */
+/* { dg-require-effective-target vmx_hw } */
+/* { dg-options "-maltivec -O2 " } */
+
+#include 
+
+#define DEBUG 0
+
+#if DEBUG
+#include 
+void print_i128 (unsigned __int128 val)
+{
+  printf(" 0x%016llx%016llx",
+ (unsigned long long)(val >> 64),
+ (unsigned long long)(val & 0x));
+}
+#endif
+
+extern void abort (void);
+
+union convert_union {
+  vector signed __int128 s128;
+  vector unsigned __int128  u128;
+  char  val[16];
+} convert;
+
+int check_u128_result(vector unsigned __int128 vresult_u128,
+ vector unsigned __int128 expected_vresult_u128)
+{
+  /* Use a for loop to check each byte manually so the test case will
+ run with ISA 2.06.
+
+ Return 1 if they match, 0 otherwise.  */
+
+  int i;
+
+  union convert_union result;
+  union convert_union expected;
+
+  result.u128 = vresult_u128;
+  expected.u128 = expected_vresult_u128;
+
+  /* Check if each byte of the result and expected match. */
+  for (i = 0; i < 16; i++)
+{
+  if (result.val[i] != expected.val[i])
+   return 0;
+}
+  return 1;
+}
+
+int check_s128_result(vector signed __int128 vresult_s128,
+ vector signed __int128 expected_vresult_s128)
+{
+  /* Convert the arguments to unsigned, then check equality.  */
+  union convert_union result;
+  union convert_union expected;
+
+  result.s128 = vresult_s128;
+  expected.s128 = expected_vresult_s128;
+
+  return check_u128_result (result.u128, expected.u128);
+}
+
+
+int
+main (int argc, char *argv [])
+{
+  int i;
+  
+  vector signed __int128 src_va_s128;
+  vector signed __int128 src_vb_s128;
+  vector signed __int128 vresult_s128;
+  vector signed __int128 expected_vresult_s128;
+
+  vector unsigned __int128 src_va_u128;
+  vector unsigned __int128 src_vb_u128;
+  vector unsigned __int128 src_vc_u128;
+  vector unsigned __int128 vresult_u128;
+  vector unsigned __int128 

[PATCH 2/13 ver4] rs6000, Remove __builtin_vsx_xvcvspsxws, __builtin_vsx_xvcvdpuxds_uns, __builtin_vsx_xvcvspuxws built-ins.

2024-06-13 Thread Carl Love
GCC maintainers:

Per the comments on patch 0004 from version 3, the removal of the built-ins
__builtin_vsx_xvcvdpuxds_uns and __builtin_vsx_xvcvspuxws was moved to this
patch.  The rest of the patch is unchanged from version 3.  There were no
comments on this patch for version 3.

Please let me know if this patch is acceptable.  Thanks.

Carl 


-

rs6000, Remove __builtin_vsx_xvcvspsxws,
 __builtin_vsx_xvcvdpuxds_uns, __builtin_vsx_xvcvspuxws built-ins.

The built-in __builtin_vsx_xvcvspsxws is a duplicate of the vec_signed
built-in that is documented in the PVIPR.  The __builtin_vsx_xvcvspsxws
built-in is not documented and there are no test cases for it.

The built-in __builtin_vsx_xvcvdpuxds_uns is redundant as it is covered by
vec_unsigned, remove.

The __builtin_vsx_xvcvspuxws is redundant as it is covered by
vec_unsigned, remove.

This patch removes the redundant built-ins.

gcc/ChangeLog:
* config/rs6000/rs6000-builtins.def (__builtin_vsx_xvcvspsxws,
__builtin_vsx_xvcvdpuxds_uns, __builtin_vsx_xvcvspuxws):
Remove built-in definitions.
---
 gcc/config/rs6000/rs6000-builtins.def | 9 -
 1 file changed, 9 deletions(-)

diff --git a/gcc/config/rs6000/rs6000-builtins.def 
b/gcc/config/rs6000/rs6000-builtins.def
index 7c36976a089..8cf0b715898 100644
--- a/gcc/config/rs6000/rs6000-builtins.def
+++ b/gcc/config/rs6000/rs6000-builtins.def
@@ -1697,9 +1697,6 @@
   const vsll __builtin_vsx_xvcvdpuxds_scale (vd, const int);
 XVCVDPUXDS_SCALE vsx_xvcvdpuxds_scale {}
 
-  const vull __builtin_vsx_xvcvdpuxds_uns (vd);
-XVCVDPUXDS_UNS vsx_fixuns_truncv2dfv2di2 {}
-
   const vsi __builtin_vsx_xvcvdpuxws (vd);
 XVCVDPUXWS vsx_xvcvdpuxws {}
 
@@ -1709,15 +1706,9 @@
   const vsll __builtin_vsx_xvcvspsxds (vf);
 XVCVSPSXDS vsx_xvcvspsxds {}
 
-  const vsi __builtin_vsx_xvcvspsxws (vf);
-XVCVSPSXWS vsx_fix_truncv4sfv4si2 {}
-
   const vsll __builtin_vsx_xvcvspuxds (vf);
 XVCVSPUXDS vsx_xvcvspuxds {}
 
-  const vsi __builtin_vsx_xvcvspuxws (vf);
-XVCVSPUXWS vsx_fixuns_truncv4sfv4si2 {}
-
   const vd __builtin_vsx_xvcvsxddp (vsll);
 XVCVSXDDP vsx_floatv2div2df2 {}
 
-- 
2.45.0



Re: [PATCH] libstdc++: Do not use memset _Hashtable buckets allocation

2024-06-13 Thread Jonathan Wakely
On Thu, 13 Jun 2024 at 19:57, Jonathan Wakely  wrote:
>
> On Thu, 13 Jun 2024 at 18:40, François Dumont  wrote:
> >
> > Hi
> >
> > Following your recent change here:
> >
> > https://gcc.gnu.org/pipermail/libstdc++/2024-June/058998.html
> >
> > I think we also need to fix the memset at bucket allocation level.
> >
> > I did it trying also to be more fancy pointer friendly by running
> > __uninitialized_default_n_a on the allocator returned pointer rather
> > than on the __to_address result. I wonder if an __uninitialized_fill_n_a
> > would have been better ? Doing so I also had to call std::_Destroy on
> > deallocation. Let me know if it is too early.
>
> You don't need the RAII guard. Initializing Alloc::pointer isn't
> allowed to throw exceptions:
>
> "An allocator type X shall meet the Cpp17CopyConstructible
> requirements (Table 32). The XX::pointer,
> XX::const_pointer, XX::void_pointer, and XX::const_void_pointer types
> shall meet the Cpp17Nullable-
> Pointer requirements (Table 36). No constructor, comparison operator
> function, copy operation, move
> operation, or swap operation on these pointer types shall exit via an
> exception."
>
> And you should not pass the allocator to the __uninitialized_xxx call,
> nor the _Destroy call. We don't want to use the allocator's
> construct/destroy members for those pointers. They are not container
> elements.
>
> I think either uninitialized_fill_n with nullptr or
> __uninitialized_default_n is fine. Not the _a forms taking an
> allocator though.

And I'd use _Destroy_n(_M_buckets, _M_bucket_count)


>
> > I also wonder if the compiler will be able to optimize it to a memset
> > call ? I'm interested to work on it if you confirm that it won't.
>
> It will do whatever is fastest, which might be memset or might be
> vectorized code to zero it out (which is probably what libc memset
> does too).
>
> >
> > libstdc++: Do not use memset in _Hashtable buckets allocation
> >
> > Using memset is incorrect if the __bucket_ptr type is non-trivial, or
> > does not use an all-zero bit pattern for its null value.
> >
> > Replace the use of memset with std::__uninitialized_default_n_a to set the
> > pointers to nullptr, and add the corresponding std::_Destroy call when
> > deallocating buckets.
> >
> > libstdc++-v3/ChangeLog:
> >
> >  * include/bits/hashtable_policy.h
> >  (_Hashtable_alloc::_M_allocate_buckets): Do not use memset to zero
> >  out bucket pointers.
> >  (_Hashtable_alloc::_M_deallocate_buckets): Add destroy of buckets.
> >
> >
> > I hope you won't ask for copy rights on the changelog entry :-)
> >
> > Tested under Linux x64, ok to commit ?
> >
> > François



Re: [PATCH] libstdc++: Do not use memset _Hashtable buckets allocation

2024-06-13 Thread Jonathan Wakely
On Thu, 13 Jun 2024 at 18:40, François Dumont  wrote:
>
> Hi
>
> Following your recent change here:
>
> https://gcc.gnu.org/pipermail/libstdc++/2024-June/058998.html
>
> I think we also need to fix the memset at bucket allocation level.
>
> I did it trying also to be more fancy pointer friendly by running
> __uninitialized_default_n_a on the allocator returned pointer rather
> than on the __to_address result. I wonder if an __uninitialized_fill_n_a
> would have been better ? Doing so I also had to call std::_Destroy on
> deallocation. Let me know if it is too early.

You don't need the RAII guard. Initializing Alloc::pointer isn't
allowed to throw exceptions:

"An allocator type X shall meet the Cpp17CopyConstructible
requirements (Table 32). The XX::pointer,
XX::const_pointer, XX::void_pointer, and XX::const_void_pointer types
shall meet the Cpp17Nullable-
Pointer requirements (Table 36). No constructor, comparison operator
function, copy operation, move
operation, or swap operation on these pointer types shall exit via an
exception."

And you should not pass the allocator to the __uninitialized_xxx call,
nor the _Destroy call. We don't want to use the allocator's
construct/destroy members for those pointers. They are not container
elements.

I think either uninitialized_fill_n with nullptr or
__uninitialized_default_n is fine. Not the _a forms taking an
allocator though.
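
A minimal sketch of the shape suggested above, written against the public
C++17 algorithms rather than the libstdc++ internals. The names node_base,
allocate_buckets, deallocate_buckets and buckets_start_null are hypothetical
stand-ins, not the _Hashtable_alloc members, and std::allocator is assumed so
that Alloc::pointer is a raw pointer:

```cpp
#include <cassert>
#include <cstddef>
#include <memory>

// Hypothetical stand-in for the hashtable node base type.
struct node_base { node_base* next = nullptr; };

// Allocate N bucket slots and set each one to null.  The allocator is used
// only for storage; the slots are initialized without Alloc::construct,
// because they are not container elements.
template<typename Alloc>
typename std::allocator_traits<Alloc>::pointer
allocate_buckets(Alloc& alloc, std::size_t n)
{
  auto p = std::allocator_traits<Alloc>::allocate(alloc, n);
  std::uninitialized_fill_n(p, n, nullptr);
  return p;
}

// Destroy the slots (again without Alloc::destroy), then release storage.
template<typename Alloc>
void
deallocate_buckets(Alloc& alloc,
                   typename std::allocator_traits<Alloc>::pointer p,
                   std::size_t n)
{
  std::destroy_n(p, n);
  std::allocator_traits<Alloc>::deallocate(alloc, p, n);
}

// Helper for checking the sketch: every freshly allocated slot is null.
inline bool buckets_start_null(std::size_t n)
{
  std::allocator<node_base*> alloc;
  auto buckets = allocate_buckets(alloc, n);
  bool all_null = true;
  for (std::size_t i = 0; i != n; ++i)
    all_null = all_null && buckets[i] == nullptr;
  deallocate_buckets(alloc, buckets, n);
  return all_null;
}
```

No RAII guard appears here because, per the quoted allocator requirements,
initializing the pointer values is not allowed to throw.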

> I also wonder if the compiler will be able to optimize it to a memset
> call ? I'm interested to work on it if you confirm that it won't.

It will do whatever is fastest, which might be memset or might be
vectorized code to zero it out (which is probably what libc memset
does too).

>
> libstdc++: Do not use memset in _Hashtable buckets allocation
>
> Using memset is incorrect if the __bucket_ptr type is non-trivial, or
> does not use an all-zero bit pattern for its null value.
>
> Replace the use of memset with std::__uninitialized_default_n_a to set the
> pointers to nullptr, and add the corresponding std::_Destroy call when
> deallocating buckets.
>
> libstdc++-v3/ChangeLog:
>
>  * include/bits/hashtable_policy.h
>  (_Hashtable_alloc::_M_allocate_buckets): Do not use memset to zero
>  out bucket pointers.
>  (_Hashtable_alloc::_M_deallocate_buckets): Add destroy of buckets.
>
>
> I hope you won't ask for copy rights on the changelog entry :-)
>
> Tested under Linux x64, ok to commit ?
>
> François



[PATCH] rs6000, altivec-2-runnable.c should be a runnable test

2024-06-13 Thread Carl Love


GCC maintainers:

The test gcc/testsuite/gcc.target/powerpc/altivec-2-runnable.c is supposed to 
be a runnable test
to verify the execution of the vec_unpackl and vec_unpackh built-ins.  The 
dg-do command is set to
compile not run.  This patch fixes the dg-do command argument.

The patch has been verified on a P10.  The test runs without errors.

Please let me know if the patch is acceptable.  Thanks.

Carl 

-

rs6000, altivec-2-runnable.c should be a runnable test

The test case has "dg-do compile" set not "dg-do run" for a runnable
test.  This patch changes the dg-do command argument to run.

gcc/testsuite/ChangeLog:
* gcc.target/powerpc/altivec-2-runnable.c: Change dg-do
argument to run.
---
 gcc/testsuite/gcc.target/powerpc/altivec-2-runnable.c | 2 +-
 1 file changed, 1 insertion(+), 1 deletion(-)

diff --git a/gcc/testsuite/gcc.target/powerpc/altivec-2-runnable.c 
b/gcc/testsuite/gcc.target/powerpc/altivec-2-runnable.c
index 6975ea57e65..3e66435d0d2 100644
--- a/gcc/testsuite/gcc.target/powerpc/altivec-2-runnable.c
+++ b/gcc/testsuite/gcc.target/powerpc/altivec-2-runnable.c
@@ -1,4 +1,4 @@
-/* { dg-do compile { target powerpc*-*-* } } */
+/* { dg-do run { target powerpc*-*-* } } */
 /* { dg-options "-mvsx" } */
 /* { dg-additional-options "-mdejagnu-cpu=power8" { target { ! has_arch_pwr8 } 
} } */
 /* { dg-require-effective-target powerpc_vsx } */
-- 
2.45.0



[pushed] doc: Spell "command-line option" with a hyphen

2024-06-13 Thread Gerald Pfeifer
Per codingconventions.html; noticed reviewing a patch that went in already 
(touching something else actually).

Pushed.

Gerald


gcc:
* doc/extend.texi (AArch64 Function Attributes): Add
(AVR Variable Attributes): Ditto.
(Common Type Attributes): Ditto.
---
 gcc/doc/extend.texi | 10 +-
 1 file changed, 5 insertions(+), 5 deletions(-)

diff --git a/gcc/doc/extend.texi b/gcc/doc/extend.texi
index ee3644a5264..173cdef0131 100644
--- a/gcc/doc/extend.texi
+++ b/gcc/doc/extend.texi
@@ -4627,14 +4627,14 @@ the same behavior as that of the command-line option
 Indicates that the workaround for the Cortex-A53 erratum 835769 should be
 applied to this function.  To explicitly disable the workaround for this
 function specify the negated form: @code{no-fix-cortex-a53-835769}.
-This corresponds to the behavior of the command line options
+This corresponds to the behavior of the command-line options
 @option{-mfix-cortex-a53-835769} and @option{-mno-fix-cortex-a53-835769}.
 
 @cindex @code{cmodel=} function attribute, AArch64
 @item cmodel=
 Indicates that code should be generated for a particular code model for
 this function.  The behavior and permissible arguments are the same as
-for the command line option @option{-mcmodel=}.
+for the command-line option @option{-mcmodel=}.
 
 @cindex @code{strict-align} function attribute, AArch64
 @item strict-align
@@ -4694,7 +4694,7 @@ behavior and permissible arguments are the same as for 
the command-line option
 @cindex @code{outline-atomics} function attribute, AArch64
 @item outline-atomics
 Enable or disable calls to out-of-line helpers to implement atomic operations.
-This corresponds to the behavior of the command line options
+This corresponds to the behavior of the command-line options
 @option{-moutline-atomics} and @option{-mno-outline-atomics}.
 
 @end table
@@ -8456,7 +8456,7 @@ volatile int porta __attribute__((address (0x600)));
 
 This attribute can also be used to define symbols in C/C++
 code which otherwise would require assembly, a linker description file
-or command line options like @code{-Wl,--defsym,a_symbol=@var{value}}.
+or command-line options like @code{-Wl,--defsym,a_symbol=@var{value}}.
 For example,
 @smallexample
 int a_symbol __attribute__((weak, address (1234)));
@@ -9473,7 +9473,7 @@ bat (void)
 @end example
 
 @cindex strub eligibility and viability
-Some @option{-fstrub=*} command line options enable @code{strub} modes
+Some @option{-fstrub=*} command-line options enable @code{strub} modes
 implicitly where viable.  A @code{strub} mode is only viable for a
 function if the function is eligible for that mode, and if other
 conditions, detailed below, are satisfied.  If it's not eligible for a
-- 
2.45.2


Re: [PATCH v3] Target-independent store forwarding avoidance.

2024-06-13 Thread Jeff Law




On 6/13/24 10:10 AM, Andi Kleen wrote:
> Manolis Tsamis  writes:
>
>> Assembly like this can appear with bitfields or type punning / unions.
>> On stress-ng when running the cpu-union microbenchmark the following speedups
>> have been observed.
>>
>>    Neoverse-N1:      +29.4%
>>    Intel Coffeelake: +13.1%
>>    AMD 5950X:        +17.5%
>
> It seems this should have some kind of target hook so that the target
> can configure what forwards should be avoided. At least in x86 land
> there is a trend to the hardware handling more and more cases with each
> generation.

Definitely the case that we should expect the hardware guys to keep 
improving things.  I was speaking to one of ours about this specific 
case and even with their planned improvements in the uarch they think 
the compiler side transformation will perform better when it can be applied.


But yes, I think we're going to need some way to control this not just 
on a per arch, but on a per uarch basis.  I originally thought we just 
do it all the time, but my position has evolved since then.


jeff
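
For readers unfamiliar with the pattern under discussion, here is a small
sketch of source that can lead to a narrow store followed by a wider load of
the same bytes. It is not taken from the patch or from stress-ng; the
function name replace_first_byte is hypothetical, and whether a given
compiler actually emits the store/load pair (rather than keeping everything
in registers) depends on the target and optimization level:

```cpp
#include <cassert>
#include <cstdint>
#include <cstring>

// Overwrite the byte at the lowest address of WORD with B and return the
// result.  Expressed with memcpy instead of a union so the C++ is well
// defined.
inline std::uint64_t
replace_first_byte(std::uint64_t word, unsigned char b)
{
  unsigned char bytes[sizeof word];
  std::memcpy(bytes, &word, sizeof word);
  bytes[0] = b;                         // narrow (1-byte) store
  std::uint64_t out;
  std::memcpy(&out, bytes, sizeof out); // wide (8-byte) load over that store
  return out;
}
```

If the byte store and the 8-byte load both reach memory, the load needs data
from a store that only partially covers it, which is the kind of case the
quoted discussion is about.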


Re: [SUBREG V4 2/4] DF: Add DF_LIVE_SUBREG problem

2024-06-13 Thread Richard Sandiford
Thanks for the update!

In terms of high-level comments:

- It looks like the code processes normal pseudos as well as subregs.
  Is that necessary?  The handling of single-register pseudos should
  be fully redundant with the normal liveness tracking.

  I think if possible we should just process pseudos for which
  need_track_subreg_p is true.

- It looks like the information uses a map from register numbers to
  sbitmaps.  That seems like quite an expensive structure.

  The way I'd imagined it working, we'd instead precompute a mapping
  from pseudo registers to a zero-based "subreg id".  A pseudo that
  has N blocks would have N consecutive subreg ids.  We could then
  have a single bitmap (rather than sbitmap) for all subreg ids.

  For example, if we have:

  P == FIRST_PSEUDO_REGISTER

  reg P:   2 subregisters: subreg ids [0, 1]
  reg P+1: 1 subregister:  ignored
  reg P+2: 4 subregisters: subreg ids [2, 5]
  reg P+3: 1 subregister:  ignored

  The map from pseudo registers to subreg ids could be linear array or
  a hash map.  I don't have a good intuition for which would be better,
  but a linear array sounds like the best starting point.
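
A minimal sketch of the linear-array variant of that mapping. This is not
GCC code: assign_subreg_ids is a hypothetical helper, a std::vector stands in
for whatever array type the pass would use, and -1 is assumed as the marker
for pseudos that are not tracked:

```cpp
#include <cassert>
#include <cstddef>
#include <vector>

// BLOCKS[i] is the number of subregister blocks of pseudo P+i.
// Returns, for each pseudo, its first subreg id, or -1 if the pseudo has a
// single block and is therefore ignored.  TOTAL receives the number of
// subreg ids handed out, i.e. the size needed for the single bitmap.
inline std::vector<int>
assign_subreg_ids(const std::vector<unsigned>& blocks, unsigned& total)
{
  std::vector<int> first_id(blocks.size(), -1);
  total = 0;
  for (std::size_t i = 0; i != blocks.size(); ++i)
    if (blocks[i] > 1)
      {
        first_id[i] = static_cast<int>(total);
        total += blocks[i];   // N blocks take N consecutive ids
      }
  return first_id;
}
```

With the example above (2, 1, 4 and 1 blocks for P to P+3) this yields first
ids 0, -1, 2, -1 and six ids in total, so block K of a tracked pseudo is bit
first_id + K, and "is this pseudo tracked" becomes a test for first_id >= 0.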

Some individual comments below:

Juzhe-Zhong  writes:
> ---
>  gcc/Makefile.in  |   1 +
>  gcc/df-problems.cc   | 886 ++-
>  gcc/df.h | 159 +++
>  gcc/regs.h   |   5 +
>  gcc/sbitmap.cc   |  98 +
>  gcc/sbitmap.h|   2 +
>  gcc/subreg-live-range.cc | 233 ++
>  gcc/subreg-live-range.h  |  60 +++
>  gcc/timevar.def  |   1 +
>  9 files changed, 1444 insertions(+), 1 deletion(-)
>  create mode 100644 gcc/subreg-live-range.cc
>  create mode 100644 gcc/subreg-live-range.h
>
> diff --git a/gcc/Makefile.in b/gcc/Makefile.in
> index a7f15694c34..67d2e3ca1bc 100644
> --- a/gcc/Makefile.in
> +++ b/gcc/Makefile.in
> @@ -1684,6 +1684,7 @@ OBJS = \
>   store-motion.o \
>   streamer-hooks.o \
>   stringpool.o \
> + subreg-live-range.o \
>   substring-locations.o \
>   target-globals.o \
>   targhooks.o \
> diff --git a/gcc/df-problems.cc b/gcc/df-problems.cc
> index 88ee0dd67fc..01f1f850925 100644
> --- a/gcc/df-problems.cc
> +++ b/gcc/df-problems.cc
> @@ -28,6 +28,7 @@ along with GCC; see the file COPYING3.  If not see
>  #include "target.h"
>  #include "rtl.h"
>  #include "df.h"
> +#include "subreg-live-range.h"
>  #include "memmodel.h"
>  #include "tm_p.h"
>  #include "insn-config.h"
> @@ -1344,8 +1345,891 @@ df_lr_verify_transfer_functions (void)
>    bitmap_clear (&all_blocks);
>  }
>  
> +/*
> +   REGISTER AND SUBREGS LIVES
> +   Like DF_LR, but include tracking subreg liveness.  Currently used to 
> provide
> +   subreg liveness related information to the register allocator.  The subreg
> +   information is currently tracked for registers that satisfy the following
> +   conditions:
> + 1.  REG is a pseudo register
> + 2.  MODE_SIZE > UNIT_SIZE
> + 3.  MODE_SIZE is a multiple of UNIT_SIZE
> + 4.  REG is used via subreg pattern
> +   Assuming: MODE = the machine mode of the REG
> +  MODE_SIZE = GET_MODE_SIZE (MODE)
> +  UNIT_SIZE = REGMODE_NATURAL_SIZE (MODE)
> +   Condition 3 is currently strict, maybe it can be removed in the future, 
> but
> +   for now it is sufficient.
> +*/
> +
> +/* These two empty data are used as default data in case the user does not 
> turn
> + * on the track-subreg-liveness feature.  */

Nit: should be no leading "*" on this line.

Maybe:

/* Data for an empty subreg problem, for cases in which subreg tracking
   is not enabled.  */

> +bitmap_head df_subreg_empty_bitmap;
> +subregs_live df_subreg_empty_live;
> +
> +/* Private data for live_subreg problem.  */
> +struct df_live_subreg_problem_data
> +{
> +  /* Record registers that need to track subreg liveness.  */

Maybe:

  /* The set of pseudo registers to track.  */

But with the linear array described above, it would be simpler to check
whether the subreg id >= 0.

> +  bitmap_head tracked_regs;
> +  /* An obstack for the bitmaps we need for this problem.  */
> +  bitmap_obstack live_subreg_bitmaps;
> +};
> +
> +/* Helper functions.  */
> +
> +static df_live_subreg_bb_info *
> +df_live_subreg_get_bb_info (unsigned int index)
> +{
> +  if (index < df_live_subreg->block_info_size)
> +return _cast (
> +  df_live_subreg->block_info)[index];
> +  else
> +return nullptr;
> +}
> +
> +static df_live_subreg_local_bb_info *
> +get_live_subreg_local_bb_info (unsigned int bb_index)
> +{
> +  return df_live_subreg_get_bb_info (bb_index);
> +}
> +
> +/* Return true if regno is a multireg.  */
> +bool
> +multireg_p (int regno)
> +{
> +  if (regno < FIRST_PSEUDO_REGISTER)
> +return false;
> +  rtx regno_rtx = regno_reg_rtx[regno];
> +  machine_mode reg_mode = 

Re: PING^1 [PATCH 22/52] microblaze: Remove macros {FLOAT,DOUBLE,LONG_DOUBLE}_TYPE_SIZE

2024-06-13 Thread Michael Eager

OK with me.

On 6/13/24 00:16, Kewen.Lin wrote:

Hi,

Gentle ping:

https://gcc.gnu.org/pipermail/gcc-patches/2024-June/653355.html

BR,
Kewen

on 2024/6/3 11:01, Kewen Lin wrote:

This is to remove macros {FLOAT,{,LONG_}DOUBLE}_TYPE_SIZE
defines in microblaze port.

gcc/ChangeLog:

* config/microblaze/microblaze.h (FLOAT_TYPE_SIZE): Remove.
(DOUBLE_TYPE_SIZE): Likewise.
(LONG_DOUBLE_TYPE_SIZE): Likewise.
---
  gcc/config/microblaze/microblaze.h | 3 ---
  1 file changed, 3 deletions(-)

diff --git a/gcc/config/microblaze/microblaze.h 
b/gcc/config/microblaze/microblaze.h
index c88a87c12e2..5d28abf9741 100644
--- a/gcc/config/microblaze/microblaze.h
+++ b/gcc/config/microblaze/microblaze.h
@@ -216,9 +216,6 @@ extern enum pipeline_type microblaze_pipe;
  #define SHORT_TYPE_SIZE 16
  #define LONG_TYPE_SIZE  32
  #define LONG_LONG_TYPE_SIZE 64
-#define FLOAT_TYPE_SIZE 32
-#define DOUBLE_TYPE_SIZE64
-#define LONG_DOUBLE_TYPE_SIZE   64
  #define POINTER_SIZE32
  #define PARM_BOUNDARY   32
  #define FUNCTION_BOUNDARY   32





--
Michael Eager


[PATCH] libstdc++: Do not use memset _Hashtable buckets allocation

2024-06-13 Thread François Dumont

Hi

Following your recent change here:

https://gcc.gnu.org/pipermail/libstdc++/2024-June/058998.html

I think we also need to fix the memset at bucket allocation level.

I did it trying also to be more fancy pointer friendly by running 
__uninitialized_default_n_a on the allocator returned pointer rather 
than on the __to_address result. I wonder if an __uninitialized_fill_n_a 
would have been better ? Doing so I also had to call std::_Destroy on 
deallocation. Let me know if it is too early.


I also wonder if the compiler will be able to optimize it to a memset 
call ? I'm interested to work on it if you confirm that it won't.


libstdc++: Do not use memset in _Hashtable buckets allocation

Using memset is incorrect if the __bucket_ptr type is non-trivial, or
does not use an all-zero bit pattern for its null value.
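
To illustrate the second case, a minimal sketch with a hypothetical
pointer-like type whose null value is not all-zero bits. offset_ptr and the
two helper functions are made up for this example and are not libstdc++
types:

```cpp
#include <cassert>
#include <cstddef>
#include <cstring>
#include <memory>

// Hypothetical fancy pointer: stores an offset and uses -1, not 0, as its
// null representation.
struct offset_ptr
{
  std::ptrdiff_t off = -1;
  offset_ptr() = default;
  offset_ptr(std::nullptr_t) noexcept : off(-1) { }
  explicit operator bool() const noexcept { return off != -1; }
};

// Zero the bytes, as a memset-based initialization would.
inline bool null_after_memset()
{
  offset_ptr buckets[4];
  std::memset(static_cast<void*>(buckets), 0, sizeof(buckets));
  return !buckets[0];          // off == 0 now, which is not the null value
}

// Construct each slot from nullptr instead.
inline bool null_after_fill()
{
  alignas(offset_ptr) unsigned char storage[4 * sizeof(offset_ptr)];
  auto* first = reinterpret_cast<offset_ptr*>(storage);
  std::uninitialized_fill_n(first, 4, nullptr);
  bool ok = !first[0] && !first[3];
  std::destroy_n(first, 4);
  return ok;
}
```

Here null_after_memset() reports that the slot is not null, while
null_after_fill() leaves every slot in the type's own null state.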

Replace the use of memset with std::__uninitialized_default_n_a to set the
pointers to nullptr, and add the corresponding std::_Destroy call when
deallocating buckets.

libstdc++-v3/ChangeLog:

    * include/bits/hashtable_policy.h
    (_Hashtable_alloc::_M_allocate_buckets): Do not use memset to zero
    out bucket pointers.
    (_Hashtable_alloc::_M_deallocate_buckets): Add destroy of buckets.


I hope you won't ask for copy rights on the changelog entry :-)

Tested under Linux x64, ok to commit ?

François
diff --git a/libstdc++-v3/include/bits/hashtable_policy.h 
b/libstdc++-v3/include/bits/hashtable_policy.h
index 26def24f24e..6456c53e8b8 100644
--- a/libstdc++-v3/include/bits/hashtable_policy.h
+++ b/libstdc++-v3/include/bits/hashtable_policy.h
@@ -33,8 +33,9 @@
 
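 #include <tuple>                  // for std::tuple, std::forward_as_tuple
 #include <bits/functional_hash.h> // for __is_fast_hash
-#include <bits/stl_algobase.h>    // for std::min, std::is_permutation.
+#include <bits/stl_algobase.h>    // for std::min, std::is_permutation
 #include <bits/stl_pair.h>        // for std::pair
+#include <bits/stl_uninitialized.h> // for __uninitialized_default_n_a
 #include <ext/aligned_buffer.h>   // for __gnu_cxx::__aligned_buffer
 #include <ext/alloc_traits.h>     // for std::__alloc_rebind
 #include <ext/numeric_traits.h>   // for __gnu_cxx::__int_traits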
@@ -2068,12 +2069,39 @@ namespace __detail
 _Hashtable_alloc<_NodeAlloc>::_M_allocate_buckets(std::size_t __bkt_count)
 -> __buckets_ptr
 {
-  __buckets_alloc_type __alloc(_M_node_allocator());
+  // RAII guard for allocated storage.
+  struct _Guard_alloc
+  {
+   __buckets_ptr _M_storage;   // Storage to deallocate
+   std::size_t _M_len;
+   __buckets_alloc_type& _M_alloc;
+
+   _Guard_alloc(__buckets_ptr __s, std::size_t __l,
+__buckets_alloc_type& __alloc)
+   : _M_storage(__s), _M_len(__l), _M_alloc(__alloc)
+   { }
+   _Guard_alloc(const _Guard_alloc&) = delete;
+
+   ~_Guard_alloc()
+   {
+ if (_M_storage)
+   __buckets_alloc_traits::deallocate(_M_alloc, _M_storage, _M_len);
+   }
+
+   __buckets_ptr
+   _M_release()
+   {
+ auto __res = _M_storage;
+ _M_storage = nullptr;
+ return __res;
+   }
+  };
 
+  __buckets_alloc_type __alloc(_M_node_allocator());
   auto __ptr = __buckets_alloc_traits::allocate(__alloc, __bkt_count);
-  __buckets_ptr __p = std::__to_address(__ptr);
-  __builtin_memset(__p, 0, __bkt_count * sizeof(__node_base_ptr));
-  return __p;
+  _Guard_alloc __guard(__ptr, __bkt_count, __alloc);
+  std::__uninitialized_default_n_a(__ptr, __bkt_count, __alloc);
+  return std::__to_address(__guard._M_release());
 }
 
   template
@@ -2085,6 +2113,7 @@ namespace __detail
   typedef typename __buckets_alloc_traits::pointer _Ptr;
   auto __ptr = std::pointer_traits<_Ptr>::pointer_to(*__bkts);
   __buckets_alloc_type __alloc(_M_node_allocator());
+  std::_Destroy(__ptr, __ptr + __bkt_count, __alloc);
   __buckets_alloc_traits::deallocate(__alloc, __ptr, __bkt_count);
 }
 


Re: [PATCH] [alpha] adjust MEM alignment for block move [PR115459] (was: Re: [PATCH v2] [PR100106] Reject unaligned subregs when strict alignment is required)

2024-06-13 Thread Joseph Myers
On Thu, 13 Jun 2024, Maciej W. Rozycki wrote:

> > This was tricky to duplicate without access to an alpha-linux-gnu
> > machine.  I ended up building an uberbaum tree with --disable-shared
> > --disable-threads --enable-languages=ada up to all-target-libgcc, then I
> > replaced gcc/collect2 with a wrapper script that dropped crt[1in].o and
> > -lc, so that link tests in libada/configure would succeed without glibc
> > for the target.  libada still wouldn't build, because of the missing
> > glibc headers, but I could compile g-depboo.adb with -I pointing at a
> > x86_64-linux-gnu's gcc/ada/rts build tree, and with that, at -O2, I
> > could trigger the problem and investigate it.  And with the following
> > patch, the problem seems to be gone.
> 
>  If you like, I'll share with you a pair of scripts I use to cross-compile 
> Linux configurations.  No target system is required to build things and 
> all is done based on git checkouts from the relevant upstream repos, which 
> then bootstrap themselves using whatever compiler is locally available to 
> bootstrap native GCC first and then going through the usual steps to get 
> a target cross-compiler.

Also, you can use build-many-glibcs.py --full-gcc to build a cross 
compiler with all languages (for any architecture and ABI supported by 
glibc), but you need to build the same-version native compiler yourself 
first (and put it in the PATH) in order to build Ada; that bit isn't 
automated by build-many-glibcs.py.

-- 
Joseph S. Myers
josmy...@redhat.com



Re: [PATCH] c++: alias CTAD and copy deduction guide [PR115198]

2024-06-13 Thread Patrick Palka
On Thu, 13 Jun 2024, Jason Merrill wrote:

> On 6/13/24 11:05, Patrick Palka wrote:
> > On Thu, 23 May 2024, Jason Merrill wrote:
> > 
> > > On 5/23/24 17:42, Patrick Palka wrote:
> > > > On Thu, 23 May 2024, Jason Merrill wrote:
> > > > 
> > > > > On 5/23/24 14:06, Patrick Palka wrote:
> > > > > > Bootstrapped and regtested on x86_64-pc-linux-gnu, does this look
> > > > > > OK for trunk/14?
> > > > > > 
> > > > > > -- >8 --
> > > > > > 
> > > > > > Here we're neglecting to update DECL_NAME during the alias CTAD
> > > > > > guide
> > > > > > transformation, which causes copy_guide_p to return false for the
> > > > > > transformed copy deduction guide since DECL_NAME is still __dguide_C
> > > > > > with TREE_TYPE C but it should be __dguide_A with TREE_TYPE
> > > > > > A
> > > > > > (equivalently C).  This ultimately results in ambiguity
> > > > > > during
> > > > > > overload resolution between the copy deduction guide vs copy ctor
> > > > > > guide.
> > > > > > 
> > > > > > This patch makes us update DECL_NAME of a transformed guide
> > > > > > accordingly
> > > > > > during alias CTAD.  This eventually needs to be done for inherited
> > > > > > CTAD
> > > > > > too, but it's not clear what identifier to use there since it has to
> > > > > > be
> > > > > > unique for each derived/base pair.  For
> > > > > > 
> > > > > >  template struct A { ... };
> > > > > >  template struct B : A { using A<T>::A; }
> > > > > > 
> > > > > > at first glance it'd be reasonable to give inherited guides a name
> > > > > > of
> > > > > > __dguide_B with TREE_TYPE A, but since that name is
> > > > > > already
> > > > > > used B's own guides its TREE_TYPE is already B.
> > > > > 
> > > > > Why can't it be the same __dguide_B with TREE_TYPE B?
> > > > 
> > > > Ah because copy_guide_p relies on TREE_TYPE in order to recognize a copy
> > > > deduction guide, and with that TREE_TYPE it would still incorrectly
> > > > return false for an inherited copy deduction guide, e.g.
> > > > 
> > > > A(A) -> A
> > > > 
> > > > gets transformed into
> > > > 
> > > > B(A) -> B
> > > > 
> > > > and A != B so copy_guide_p returns false.
> > > 
> > > Hmm, that seems correct; the transformed candidate is not the copy
> > > deduction
> > > guide for B.
> > 
> > By https://eel.is/c++draft/over.match.class.deduct#3.4 it seems that a
> > class template can now have multiple copy deduction guides with inherited
> > CTAD: the derived class's own copy guide, along with the transformed copy
> > guides of its eligible base classes.  Do we want to follow the standard
> > precisely here, or should we maybe restrict the copy-guideness propagation
> > to alias CTAD only?
> 
> The latter, I think; it seems nonsensical to have multiple copy guides.

Sounds good, so for inherited CTAD it should suffice to use __dguide_B
as the name (where B is the derived class), like so?

-- >8 --

Subject: [PATCH] c++: alias CTAD and copy deduction guide [PR115198]

Here we're neglecting to update DECL_NAME during the alias CTAD guide
transformation, which causes copy_guide_p to return false for the
transformed copy deduction guide since DECL_NAME is still __dguide_C
with TREE_TYPE C but it should be __dguide_A with TREE_TYPE A
(equivalently C).  This ultimately results in ambiguity during
overload resolution between the copy deduction guide vs copy ctor guide.

This patch makes us update DECL_NAME of a transformed guide accordingly
during alias/inherited CTAD.

PR c++/115198

gcc/cp/ChangeLog:

* pt.cc (alias_ctad_tweaks): Update DECL_NAME of a transformed
guide.

gcc/testsuite/ChangeLog:

* g++.dg/cpp2a/class-deduction-alias22.C: New test.
---
 gcc/cp/pt.cc   |  6 +-
 .../g++.dg/cpp2a/class-deduction-alias22.C | 14 ++
 2 files changed, 19 insertions(+), 1 deletion(-)
 create mode 100644 gcc/testsuite/g++.dg/cpp2a/class-deduction-alias22.C

diff --git a/gcc/cp/pt.cc b/gcc/cp/pt.cc
index 607753ae6b7..daa8ac386dc 100644
--- a/gcc/cp/pt.cc
+++ b/gcc/cp/pt.cc
@@ -30342,13 +30342,14 @@ alias_ctad_tweaks (tree tmpl, tree uguides)
  any).  */
 
   enum { alias, inherited } ctad_kind;
-  tree atype, fullatparms, utype;
+  tree atype, fullatparms, utype, name;
   if (TREE_CODE (tmpl) == TEMPLATE_DECL)
 {
   ctad_kind = alias;
   atype = TREE_TYPE (tmpl);
   fullatparms = DECL_TEMPLATE_PARMS (tmpl);
   utype = DECL_ORIGINAL_TYPE (DECL_TEMPLATE_RESULT (tmpl));
+  name = dguide_name (tmpl);
 }
   else
 {
@@ -30356,6 +30357,8 @@ alias_ctad_tweaks (tree tmpl, tree uguides)
   atype = NULL_TREE;
   fullatparms = TREE_PURPOSE (tmpl);
   utype = TREE_VALUE (tmpl);
+  name = dguide_name (TPARMS_PRIMARY_TEMPLATE
+ (INNERMOST_TEMPLATE_PARMS (fullatparms)));
 }
 
   tsubst_flags_t complain = tf_warning_or_error;
@@ -30451,6 +30454,7 @@ alias_ctad_tweaks (tree tmpl, tree uguides)
}
  

Re: [PATCH] c++: alias CTAD and copy deduction guide [PR115198]

2024-06-13 Thread Patrick Palka
On Thu, 23 May 2024, Jason Merrill wrote:

> On 5/23/24 17:42, Patrick Palka wrote:
> > On Thu, 23 May 2024, Jason Merrill wrote:
> > 
> > > On 5/23/24 14:06, Patrick Palka wrote:
> > > > Bootstrapped and regtested on x86_64-pc-linux-gnu, does this look
> > > > OK for trunk/14?
> > > > 
> > > > -- >8 --
> > > > 
> > > > Here we're neglecting to update DECL_NAME during the alias CTAD guide
> > > > transformation, which causes copy_guide_p to return false for the
> > > > transformed copy deduction guide since DECL_NAME is still __dguide_C
> > > > with TREE_TYPE C but it should be __dguide_A with TREE_TYPE A
> > > > (equivalently C).  This ultimately results in ambiguity during
> > > > overload resolution between the copy deduction guide vs copy ctor guide.
> > > > 
> > > > This patch makes us update DECL_NAME of a transformed guide accordingly
> > > > during alias CTAD.  This eventually needs to be done for inherited CTAD
> > > > too, but it's not clear what identifier to use there since it has to be
> > > > unique for each derived/base pair.  For
> > > > 
> > > > template struct A { ... };
> > > > template struct B : A { using A::A; }
> > > > 
> > > > at first glance it'd be reasonable to give inherited guides a name of
> > > > __dguide_B with TREE_TYPE A, but since that name is already
> > > > used B's own guides its TREE_TYPE is already B.
> > > 
> > > Why can't it be the same __dguide_B with TREE_TYPE B?
> > 
> > Ah because copy_guide_p relies on TREE_TYPE in order to recognize a copy
> > deduction guide, and with that TREE_TYPE it would still incorrectly
> > return false for an inherited copy deduction guide, e.g.
> > 
> >A(A) -> A
> > 
> > gets transformed into
> > 
> >B(A) -> B
> > 
> > and A != B so copy_guide_p returns false.
> 
> Hmm, that seems correct; the transformed candidate is not the copy deduction
> guide for B.

By https://eel.is/c++draft/over.match.class.deduct#3.4 it seems that a
class template can now have multiple copy deduction guides with inherited
CTAD: the derived class's own copy guide, along with the transformed copy
guides of its eligible base classes.  Do we want to follow the standard
precisely here, or should we maybe restrict the copy-guideness propagation
to alias CTAD only?
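
As background on what the copy deduction candidate does in ordinary CTAD,
here is a small C++17 sketch. It deliberately uses std::vector and does not
exercise alias or inherited CTAD, so it says nothing about the bug itself;
copy_candidate_wins is a hypothetical name:

```cpp
#include <cassert>
#include <type_traits>
#include <vector>

// With a single argument that is already a specialization of the class
// template, deduction picks the copy deduction candidate, so the result is
// a copy of v1 rather than a vector holding v1 as its only element.
inline bool copy_candidate_wins()
{
  std::vector<int> v1{1, 2, 3};
  std::vector v2{v1};
  static_assert(std::is_same_v<decltype(v2), std::vector<int>>);
  return v2.size() == 3;
}
```

The question above is whether a guide transformed for inherited CTAD should
get this same preferential treatment.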

> 
> > But it just occurred to me that this TREE_TYPE clobbering of the
> > __dguide_foo identifier already happens if we have two class templates
> > with the same name in different namespaces, since the identifier
> > contains only the terminal name.  Maybe this suggests that we should
> > use a tree flag to track whether a guide is the copy deduction guide
> > instead of setting TREE_TYPE of DECL_NAME?
> 
> Good point.
> 
> Jason
> 
> 



[pushed] c++/modules: export using across namespace [PR114683]

2024-06-13 Thread Jason Merrill
Tested x86_64-pc-linux-gnu, applying to trunk.

-- 8< --

Currently we represent a non-function using-declaration by inserting the
named declaration into the target scope.  In general this works fine, but in
the case of an exported using-declaration we have nowhere to mark the
using-declaration as exported, so we mark the original declaration as
exported instead, and then treat all using-declarations that name it as
exported as well.  We were doing this only if there was also a previous
non-exported using, so for this testcase the export got lost; this patch
broadens the workaround to also apply to the using that first brings the
declaration into the current scope.

This does not fully resolve 114683, but replaces a missing exports bug with
an extra exports bug, which should be a significant usability improvement.
The testcase has xfails for extra exports.

I imagine a complete fix should involve inserting a USING_DECL.

PR c++/114683

gcc/cp/ChangeLog:

* name-lookup.cc (do_nonmember_using_decl): Allow exporting
a newly inserted decl.

gcc/testsuite/ChangeLog:

* g++.dg/modules/using-22_a.C: New test.
* g++.dg/modules/using-22_b.C: New test.
---
 gcc/cp/name-lookup.cc |  5 ++---
 gcc/testsuite/g++.dg/modules/using-22_a.C | 24 +++
 gcc/testsuite/g++.dg/modules/using-22_b.C | 13 
 3 files changed, 39 insertions(+), 3 deletions(-)
 create mode 100644 gcc/testsuite/g++.dg/modules/using-22_a.C
 create mode 100644 gcc/testsuite/g++.dg/modules/using-22_b.C

diff --git a/gcc/cp/name-lookup.cc b/gcc/cp/name-lookup.cc
index 71482db7b76..b57893116eb 100644
--- a/gcc/cp/name-lookup.cc
+++ b/gcc/cp/name-lookup.cc
@@ -5316,14 +5316,13 @@ do_nonmember_using_decl (name_lookup &lookup, bool 
fn_scope_p,
   /* FIXME: Handle exporting declarations from a different scope
 without also marking those declarations as exported.
 This will require not just binding directly to the underlying
-value; see c++/114863 and c++/114865.  We allow this for purview
-declarations for now as this doesn't (currently) cause ICEs
+value; see c++/114683 and c++/114685.  We allow the extra exports
+for now as this doesn't (currently) cause ICEs
 later down the line, but this should be revisited.  */
   if (revealing_p)
{
  if (module_exporting_p ()
  && check_can_export_using_decl (lookup.value)
- && lookup.value == value
  && !DECL_MODULE_EXPORT_P (lookup.value))
{
  /* We're redeclaring the same value, but this time as
diff --git a/gcc/testsuite/g++.dg/modules/using-22_a.C 
b/gcc/testsuite/g++.dg/modules/using-22_a.C
new file mode 100644
index 000..9eca9dacb46
--- /dev/null
+++ b/gcc/testsuite/g++.dg/modules/using-22_a.C
@@ -0,0 +1,24 @@
+// PR c++/114683
+// { dg-additional-options "-fmodules-ts -Wno-global-module" }
+
+module;
+
+namespace std
+{
+  inline namespace __cxx11
+  {
+template 
+struct basic_string{};
+  }
+}
+
+namespace foo {
+  using std::basic_string;
+}
+
+export module std;
+
+export namespace std
+{
+  using std::basic_string;
+}
diff --git a/gcc/testsuite/g++.dg/modules/using-22_b.C 
b/gcc/testsuite/g++.dg/modules/using-22_b.C
new file mode 100644
index 000..0b66f4ad6b0
--- /dev/null
+++ b/gcc/testsuite/g++.dg/modules/using-22_b.C
@@ -0,0 +1,13 @@
+// { dg-additional-options "-fmodules-ts" }
+
+import std;
+
+int main()
+{
+  std::basic_string s;
+
+  // The inline namespace should not be exported, only the 'using' in std.
+  std::__cxx11::basic_string s2; // { dg-error "has not been declared" 
"" { xfail *-*-* } }
+  // The non-exported using should also not be visible.
+  foo::basic_string s3; // { dg-error "has not been declared" "" { xfail 
*-*-* } }
+}

base-commit: 99e6cf404e37655be303e71f20df03c284c7989e
-- 
2.44.0



Re: [pushed] c++: module std and exception_ptr

2024-06-13 Thread Patrick Palka
On Wed, 12 Jun 2024, Jason Merrill wrote:

> Tested x86_64-pc-linux-gnu, applying to trunk.
> 
> -- 8< --
> 
> exception_ptr.h contains
> 
>   namespace __exception_ptr
>   {
> class exception_ptr;
>   }
>   using __exception_ptr::exception_ptr;
> 
> so when module std tries to 'export using std::exception_ptr', it names
> another using-declaration rather than the class directly, so __exception_ptr
> is never explicitly opened in module purview.

FWIW PR100134 ICEd in the same way, and r13-3236-g9736a42e1fb8df
narrowly fixed this by setting DECL_MODULE_PURVIEW_P on the enclosing
namespace around the time we set the flag on the namespace-scope entity in
question.  I wonder if it'd be preferable to do something similar here,
e.g. set DECL_MODULE_PURVIEW_P on the enclosing namespace in
do_nonmember_using_decl?
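
For reference, a minimal non-modules sketch of the situation quoted above: a
using-declaration that names another using-declaration instead of the class
itself.  The names lib, detail, widget and api are made up for illustration,
and this only shows that all the names denote the same class; it does not
reproduce the module-purview behaviour under discussion.

```cpp
#include <type_traits>

namespace lib
{
  namespace detail
  {
    class widget { };          // the entity lives in the inner namespace
  }
  using detail::widget;        // first using-declaration, as in exception_ptr.h
}

namespace api
{
  using lib::widget;           // names the using-declaration above, so
                               // lib::detail is never opened from here
}

// Hypothetical helper: true if every spelling denotes the same class.
constexpr bool names_same_entity ()
{
  return std::is_same<api::widget, lib::detail::widget>::value
         && std::is_same<lib::widget, lib::detail::widget>::value;
}

static_assert (names_same_entity (), "all names refer to lib::detail::widget");
```

The point is only that lookup through either using-declaration reaches the
class in the inner namespace, even though that namespace is never reopened.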

> 
> gcc/cp/ChangeLog:
> 
>   * module.cc (depset::hash::add_binding_entity): Set
>   DECL_MODULE_PURVIEW_P instead of asserting.
> 
> gcc/testsuite/ChangeLog:
> 
>   * g++.dg/modules/using-20_a.C: New test.
> ---
>  gcc/cp/module.cc  |  7 +--
>  gcc/testsuite/g++.dg/modules/using-20_a.C | 14 ++
>  2 files changed, 19 insertions(+), 2 deletions(-)
>  create mode 100644 gcc/testsuite/g++.dg/modules/using-20_a.C
> 
> diff --git a/gcc/cp/module.cc b/gcc/cp/module.cc
> index 21fc85150c9..72e876cec18 100644
> --- a/gcc/cp/module.cc
> +++ b/gcc/cp/module.cc
> @@ -13253,8 +13253,11 @@ depset::hash::add_binding_entity (tree decl, 
> WMB_Flags flags, void *data_)
>data->met_namespace = true;
>if (data->hash->add_namespace_entities (decl, data->partitions))
>   {
> -   /* It contains an exported thing, so it is exported.  */
> -   gcc_checking_assert (DECL_MODULE_PURVIEW_P (decl));
> +   /* It contains an exported thing, so it is exported.
> +  We used to assert DECL_MODULE_PURVIEW_P, but that fails for a
> +  namespace like std::__exception_ptr which is never opened in
> +  module purview; the exporting using finds another using.  */
> +   DECL_MODULE_PURVIEW_P (decl) = true;
> DECL_MODULE_EXPORT_P (decl) = true;
>   }
>  
> diff --git a/gcc/testsuite/g++.dg/modules/using-20_a.C 
> b/gcc/testsuite/g++.dg/modules/using-20_a.C
> new file mode 100644
> index 000..bb3bb6160f8
> --- /dev/null
> +++ b/gcc/testsuite/g++.dg/modules/using-20_a.C
> @@ -0,0 +1,14 @@
> +// { dg-additional-options "-fmodules-ts -fdump-lang-module 
> -Wno-global-module" }
> +// { dg-final { scan-lang-dump {Writing definition '::foo::bar::baz'} module 
> } }
> +
> +module;
> +namespace foo {
> +  namespace bar {
> +struct baz { };
> +  }
> +  using bar::baz;
> +}
> +export module foo;
> +namespace foo {
> +  export using foo::baz;
> +}
> 
> base-commit: 7bf072e87a03c9eaff9b7a1ac182537b70f0ba8e
> -- 
> 2.44.0
> 
> 



Re: [PATCH v3] gcc, libcpp: Add warning switch for "#pragma once in main file" [PR89808]

2024-06-13 Thread Jason Merrill

On 6/13/24 10:31, Ken Matsui wrote:

This patch adds a warning switch for "#pragma once in main file".  The
warning option name is Wpragma-once-outside-header, which is the same
as Clang.

PR preprocessor/89808

gcc/c-family/ChangeLog:

* c.opt (Wpragma_once_outside_header): Define new option.

gcc/ChangeLog:

* doc/invoke.texi (Warning Options): Document
-Wno-pragma-once-outside-header.

libcpp/ChangeLog:

* include/cpplib.h (struct cpp_options): Define
cpp_warn_pragma_once_outside_header.


This bit-field should be unneeded now, along with the uses of it.


(cpp_warning_reason): Define CPP_W_PRAGMA_ONCE_OUTSIDE_HEADER.
* directives.cc (do_pragma_once): Use
cpp_warn_pragma_once_outside_header and
CPP_W_PRAGMA_ONCE_OUTSIDE_HEADER.
* init.cc (cpp_create_reader): Handle
cpp_warn_pragma_once_outside_header.

gcc/testsuite/ChangeLog:

* g++.dg/warn/Wno-pragma-once-outside-header.C: New test.
* g++.dg/warn/Wpragma-once-outside-header.C: New test.

Signed-off-by: Ken Matsui 
---
  gcc/c-family/c.opt |  4 
  gcc/doc/invoke.texi| 10 --
  .../g++.dg/warn/Wno-pragma-once-outside-header.C   |  5 +
  .../g++.dg/warn/Wpragma-once-outside-header.C  |  6 ++
  libcpp/directives.cc   |  9 ++---
  libcpp/include/cpplib.h|  7 ++-
  libcpp/init.cc |  1 +
  7 files changed, 36 insertions(+), 6 deletions(-)
  create mode 100644 gcc/testsuite/g++.dg/warn/Wno-pragma-once-outside-header.C
  create mode 100644 gcc/testsuite/g++.dg/warn/Wpragma-once-outside-header.C

diff --git a/gcc/c-family/c.opt b/gcc/c-family/c.opt
index 403abc1f26e..3439f36fe45 100644
--- a/gcc/c-family/c.opt
+++ b/gcc/c-family/c.opt
@@ -1188,6 +1188,10 @@ Wpragmas
  C ObjC C++ ObjC++ Var(warn_pragmas) Init(1) Warning
  Warn about misuses of pragmas.
  
+Wpragma-once-outside-header

+C ObjC C++ ObjC++ Var(warn_pragma_once_outside_header) 
CppReason(CPP_W_PRAGMA_ONCE_OUTSIDE_HEADER) Init(1) Warning
+Warn about #pragma once outside of a header.
+
  Wprio-ctor-dtor
  C ObjC C++ ObjC++ Var(warn_prio_ctor_dtor) Init(1) Warning
  Warn if constructor or destructors with priorities from 0 to 100 are used.
diff --git a/gcc/doc/invoke.texi b/gcc/doc/invoke.texi
index 9456ced468a..c7f17ca9eb7 100644
--- a/gcc/doc/invoke.texi
+++ b/gcc/doc/invoke.texi
@@ -391,8 +391,8 @@ Objective-C and Objective-C++ Dialects}.
  -Wpacked  -Wno-packed-bitfield-compat  -Wpacked-not-aligned  -Wpadded
  -Wparentheses  -Wno-pedantic-ms-format
  -Wpointer-arith  -Wno-pointer-compare  -Wno-pointer-to-int-cast
--Wno-pragmas  -Wno-prio-ctor-dtor  -Wredundant-decls
--Wrestrict  -Wno-return-local-addr  -Wreturn-type
+-Wno-pragmas  -Wno-pragma-once-outside-header  -Wno-prio-ctor-dtor
+-Wredundant-decls  -Wrestrict  -Wno-return-local-addr  -Wreturn-type
  -Wno-scalar-storage-order  -Wsequence-point
  -Wshadow  -Wshadow=global  -Wshadow=local  -Wshadow=compatible-local
  -Wno-shadow-ivar
@@ -7983,6 +7983,12 @@ Do not warn about misuses of pragmas, such as incorrect 
parameters,
  invalid syntax, or conflicts between pragmas.  See also
  @option{-Wunknown-pragmas}.
  
+@opindex Wno-pragma-once-outside-header

+@opindex Wpragma-once-outside-header
+@item -Wno-pragma-once-outside-header
+Do not warn when @code{#pragma once} is used in a file that is not a header
+file, such as a main file.
+
  @opindex Wno-prio-ctor-dtor
  @opindex Wprio-ctor-dtor
  @item -Wno-prio-ctor-dtor
diff --git a/gcc/testsuite/g++.dg/warn/Wno-pragma-once-outside-header.C 
b/gcc/testsuite/g++.dg/warn/Wno-pragma-once-outside-header.C
new file mode 100644
index 000..b5be4d25a9d
--- /dev/null
+++ b/gcc/testsuite/g++.dg/warn/Wno-pragma-once-outside-header.C
@@ -0,0 +1,5 @@
+// { dg-do assemble  }
+// { dg-options "-Wno-pragma-once-outside-header" }
+
+#pragma once
+int main() {}
diff --git a/gcc/testsuite/g++.dg/warn/Wpragma-once-outside-header.C 
b/gcc/testsuite/g++.dg/warn/Wpragma-once-outside-header.C
new file mode 100644
index 000..324b0638c3f
--- /dev/null
+++ b/gcc/testsuite/g++.dg/warn/Wpragma-once-outside-header.C
@@ -0,0 +1,6 @@
+// { dg-do assemble  }
+// { dg-options "-Werror=pragma-once-outside-header" }
+// { dg-message "some warnings being treated as errors" "" {target "*-*-*"} 0 }
+
+#pragma once  // { dg-error "#pragma once in main file" }
+int main() {}
diff --git a/libcpp/directives.cc b/libcpp/directives.cc
index 479f8c716e8..68f47104dea 100644
--- a/libcpp/directives.cc
+++ b/libcpp/directives.cc
@@ -1588,8 +1588,12 @@ do_pragma (cpp_reader *pfile)
  static void
  do_pragma_once (cpp_reader *pfile)
  {
-  if (_cpp_in_main_source_file (pfile))
-cpp_error (pfile, CPP_DL_WARNING, "#pragma once in main file");
+  const unsigned char warn_level =
+CPP_OPTION (pfile, 

Re: [PATCH] c++: alias CTAD and copy deduction guide [PR115198]

2024-06-13 Thread Jason Merrill

On 6/13/24 11:05, Patrick Palka wrote:

On Thu, 23 May 2024, Jason Merrill wrote:


On 5/23/24 17:42, Patrick Palka wrote:

On Thu, 23 May 2024, Jason Merrill wrote:


On 5/23/24 14:06, Patrick Palka wrote:

Bootstrapped and regtested on x86_64-pc-linux-gnu, does this look
OK for trunk/14?

-- >8 --

Here we're neglecting to update DECL_NAME during the alias CTAD guide
transformation, which causes copy_guide_p to return false for the
transformed copy deduction guide since DECL_NAME is still __dguide_C
with TREE_TYPE C but it should be __dguide_A with TREE_TYPE A
(equivalently C).  This ultimately results in ambiguity during
overload resolution between the copy deduction guide and the copy ctor guide.

This patch makes us update DECL_NAME of a transformed guide accordingly
during alias CTAD.  This eventually needs to be done for inherited CTAD
too, but it's not clear what identifier to use there since it has to be
unique for each derived/base pair.  For

 template struct A { ... };
 template struct B : A { using A::A; }

at first glance it'd be reasonable to give inherited guides a name of
__dguide_B with TREE_TYPE A, but since that name is already
used by B's own guides its TREE_TYPE is already B.


Why can't it be the same __dguide_B with TREE_TYPE B?


Ah because copy_guide_p relies on TREE_TYPE in order to recognize a copy
deduction guide, and with that TREE_TYPE it would still incorrectly
return false for an inherited copy deduction guide, e.g.

A(A) -> A

gets transformed into

B(A) -> B

and A != B so copy_guide_p returns false.


Hmm, that seems correct; the transformed candidate is not the copy deduction
guide for B.


By https://eel.is/c++draft/over.match.class.deduct#3.4 it seems that a
class template can now have multiple copy deduction guides with inherited
CTAD: the derived class's own copy guide, along with the transformed copy
guides of its eligible base classes.  Do we want to follow the standard
precisely here, or should we maybe restrict the copy-guideness propagation
to alias CTAD only?


The latter, I think; it seems nonsensical to have multiple copy guides.

Jason


But it just occurred to me that this TREE_TYPE clobbering of the
__dguide_foo identifier already happens if we have two class templates
with the same name in different namespaces, since the identifier
contains only the terminal name.  Maybe this suggests that we should
use a tree flag to track whether a guide is the copy deduction guide
instead of setting TREE_TYPE of DECL_NAME?


Good point.

Jason








Re: [PATCH v3] Target-independent store forwarding avoidance.

2024-06-13 Thread Andi Kleen
Manolis Tsamis  writes:
>
> Assembly like this can appear with bitfields or type punning / unions.
> On stress-ng when running the cpu-union microbenchmark the following speedups
> have been observed.
>
>   Neoverse-N1:  +29.4%
>   Intel Coffeelake: +13.1%
>   AMD 5950X:+17.5%

It seems this should have some kind of target hook so that the target
can configure what forwards should be avoided. At least in x86 land
there is a trend to the hardware handling more and more cases with each
generation.

Also, is there any data on what this does to code size? Perhaps it should
only be done on hot blocks?

And did you see speedups on real applications?

-Andi
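
For readers unfamiliar with the pattern, a reduced sketch of the kind of source
that can lead to a narrow store followed by a wider load of the same bytes.
The function name store_byte_then_load is made up, and whether a given
compiler really emits a store followed by a reload here depends on the target
and options; this is an illustration, not the benchmark kernel.

```cpp
#include <cstdint>
#include <cstring>

// Overwrite one byte of an 8-byte object, then reload the whole object.
std::uint64_t
store_byte_then_load (std::uint64_t word, unsigned index, unsigned char byte)
{
  unsigned char bytes[sizeof word];
  std::memcpy (bytes, &word, sizeof word);

  bytes[index % sizeof word] = byte;            // narrow (1-byte) store

  std::uint64_t result;
  std::memcpy (&result, bytes, sizeof result);  // wide (8-byte) load
  return result;
}
```

The wide load overlaps the preceding byte store, which is the case where
hardware store-to-load forwarding may not apply.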


RE: [PATCH v7] Match: Support more form for scalar unsigned SAT_ADD

2024-06-13 Thread Maciej W. Rozycki
On Thu, 13 Jun 2024, Li, Pan2 wrote:

> Could you please help to update the upstream for another try ?
> 
> Should be fixed by this commit 
> https://github.com/gcc-mirror/gcc/commit/d03ff3fd3e2da1352a404e3c53fe61314569345c.
> 
> Feel free to ping me if any questions or concerns.

 Upstream master (as at 609764a42f0c) doesn't build:

In file included from .../gcc/gcc/coretypes.h:487,
 from .../gcc/gcc/tree-vect-stmts.cc:24:
In member function 'bool poly_int::is_constant() const [with unsigned int 
N = 2; C = long unsigned int]',
inlined from 'C poly_int::to_constant() const [with unsigned int N = 
2; C = long unsigned int]' at .../gcc/gcc/poly-int.h:588:3,
inlined from 'bool get_group_load_store_type(vec_info*, stmt_vec_info, 
tree, slp_tree, bool, vec_load_store_type, vect_memory_access_type*, 
poly_int64*, dr_alignment_support*, int*, gather_scatter_info*, internal_fn*)' 
at .../gcc/gcc/tree-vect-stmts.cc:2155:39,
inlined from 'bool get_load_store_type(vec_info*, stmt_vec_info, tree, 
slp_tree, bool, vec_load_store_type, unsigned int, vect_memory_access_type*, 
poly_int64*, dr_alignment_support*, int*, gather_scatter_info*, internal_fn*)' 
at .../gcc/gcc/tree-vect-stmts.cc:2387:38:
.../gcc/gcc/poly-int.h:557:7: error: 'remain.poly_int<2, long unsigned 
int>::coeffs[1]' may be used uninitialized [-Werror=maybe-uninitialized]
  557 |   if (this->coeffs[i] != 0)
  |   ^~
.../gcc/gcc/tree-vect-stmts.cc: In function 'bool 
get_load_store_type(vec_info*, stmt_vec_info, tree, slp_tree, bool, 
vec_load_store_type, unsigned int, vect_memory_access_type*, poly_int64*, 
dr_alignment_support*, int*, gather_scatter_info*, internal_fn*)':
.../gcc/gcc/tree-vect-stmts.cc:2115:23: note: 'remain.poly_int<2, long unsigned 
int>::coeffs[1]' was declared here
 2115 |   poly_uint64 remain;
  |   ^~
In file included from .../gcc/gcc/system.h:1250,
 from .../gcc/gcc/tree-vect-stmts.cc:23:
In function 'int ceil_log2(long unsigned int)',
inlined from 'bool get_group_load_store_type(vec_info*, stmt_vec_info, 
tree, slp_tree, bool, vec_load_store_type, vect_memory_access_type*, 
poly_int64*, dr_alignment_support*, int*, gather_scatter_info*, internal_fn*)' 
at .../gcc/gcc/tree-vect-stmts.cc:2156:43,
inlined from 'bool get_load_store_type(vec_info*, stmt_vec_info, tree, 
slp_tree, bool, vec_load_store_type, unsigned int, vect_memory_access_type*, 
poly_int64*, dr_alignment_support*, int*, gather_scatter_info*, internal_fn*)' 
at .../gcc/gcc/tree-vect-stmts.cc:2387:38:
.../gcc/gcc/hwint.h:266:17: error: 'remain.poly_int<2, long unsigned 
int>::coeffs[0]' may be used uninitialized [-Werror=maybe-uninitialized]
  266 |   return x == 0 ? 0 : floor_log2 (x - 1) + 1;
  |  ~~~^~~~
.../gcc/gcc/tree-vect-stmts.cc: In function 'bool 
get_load_store_type(vec_info*, stmt_vec_info, tree, slp_tree, bool, 
vec_load_store_type, unsigned int, vect_memory_access_type*, poly_int64*, 
dr_alignment_support*, int*, gather_scatter_info*, internal_fn*)':
.../gcc/gcc/tree-vect-stmts.cc:2115:23: note: 'remain.poly_int<2, long unsigned 
int>::coeffs[0]' was declared here
 2115 |   poly_uint64 remain;
  |   ^~
cc1plus: all warnings being treated as errors
make[2]: *** [Makefile:1198: tree-vect-stmts.o] Error 1

and actually e14afbe2d1c6^ doesn't build either (I guess it just slipped 
through bisection as the file didn't have to be rebuilt or something):

In file included from .../gcc/gcc/rtl.h:3973,
 from .../gcc/gcc/config/riscv/riscv.cc:31:
In function 'rtx_def* init_rtx_fmt_ee(rtx, machine_mode, rtx, rtx)',
inlined from 'rtx_def* gen_rtx_fmt_ee_stat(rtx_code, machine_mode, rtx, 
rtx)' at ./genrtl.h:50:26,
inlined from 'void riscv_move_integer(rtx, rtx, long int, machine_mode)' at 
.../gcc/gcc/config/riscv/riscv.cc:2786:10:
./genrtl.h:37:16: error: 'x' may be used uninitialized 
[-Werror=maybe-uninitialized]
   37 |   XEXP (rt, 0) = arg0;
.../gcc/gcc/config/riscv/riscv.cc: In function 'void riscv_move_integer(rtx, 
rtx, long int, machine_mode)':
.../gcc/gcc/config/riscv/riscv.cc:2723:7: note: 'x' was declared here
 2723 |   rtx x;
  |   ^
cc1plus: all warnings being treated as errors
make[2]: *** [Makefile:2563: riscv.o] Error 1

I hope you'll find this all useful.  As it happens I don't need to verify 
my needs with a RISC-V target anymore, so I'm leaving it all up to you now 
as I need to switch back to Alpha, which has been my actual objective, and 
these rebuilds have taken a lot of my attention already.

 Thank you for your input.

  Maciej


Re: [PATCH v2] gcc, libcpp: Add warning switch for "#pragma once in main file" [PR89808]

2024-06-13 Thread Ken Matsui
On Tue, Jun 4, 2024 at 7:54 AM Jason Merrill  wrote:
>
> On 3/14/24 04:01, Ken Matsui wrote:
> > On Sat, Mar 2, 2024 at 5:04 AM Ken Matsui  wrote:
> >>
> >> This patch adds a warning switch for "#pragma once in main file".  The
> >> warning option name is Wpragma-once-outside-header, which is the same
> >> as Clang.
> >
> > Ping.
> >
> >>
> >>  PR preprocessor/89808
> >>
> >> gcc/c-family/ChangeLog:
> >>
> >>  * c-opts.cc (c_common_handle_option): Handle
> >>  OPT_Wpragma_once_outside_header.
> >>  * c.opt (Wpragma_once_outside_header): Define new option.
> >>
> >> gcc/ChangeLog:
> >>
> >>  * doc/invoke.texi (Warning Options): Document
> >>  -Wno-pragma-once-outside-header.
> >>
> >> libcpp/ChangeLog:
> >>
> >>  * include/cpplib.h (struct cpp_options): Define
> >>  cpp_warn_pragma_once_outside_header.
> >>  * directives.cc (do_pragma_once): Use
> >>  cpp_warn_pragma_once_outside_header.
> >>  * init.cc (cpp_create_reader): Handle
> >>  cpp_warn_pragma_once_outside_header.
> >>
> >> gcc/testsuite/ChangeLog:
> >>
> >>  * g++.dg/Wpragma-once-outside-header.C: New test.
>
> Please drop this file, keeping the duplicate in the warn subdirectory.
>
> >>  * g++.dg/warn/Wno-pragma-once-outside-header.C: New test.
> >>  * g++.dg/warn/Wpragma-once-outside-header.C: New test.
> >>
> >> Signed-off-by: Ken Matsui 
> >> ---
> >>   gcc/c-family/c-opts.cc |  9 +
> >>   gcc/c-family/c.opt |  4 
> >>   gcc/doc/invoke.texi| 10 --
> >>   gcc/testsuite/g++.dg/Wpragma-once-outside-header.C |  5 +
> >>   .../g++.dg/warn/Wno-pragma-once-outside-header.C   |  5 +
> >>   .../g++.dg/warn/Wpragma-once-outside-header.C  |  5 +
> >>   libcpp/directives.cc   |  8 ++--
> >>   libcpp/include/cpplib.h|  4 
> >>   libcpp/init.cc |  1 +
> >>   9 files changed, 47 insertions(+), 4 deletions(-)
> >>   create mode 100644 gcc/testsuite/g++.dg/Wpragma-once-outside-header.C
> >>   create mode 100644 
> >> gcc/testsuite/g++.dg/warn/Wno-pragma-once-outside-header.C
> >>   create mode 100644 
> >> gcc/testsuite/g++.dg/warn/Wpragma-once-outside-header.C
> >>
> >> diff --git a/gcc/c-family/c-opts.cc b/gcc/c-family/c-opts.cc
> >> index be3058dca63..4edd8c6c515 100644
> >> --- a/gcc/c-family/c-opts.cc
> >> +++ b/gcc/c-family/c-opts.cc
> >> @@ -430,6 +430,15 @@ c_common_handle_option (size_t scode, const char 
> >> *arg, HOST_WIDE_INT value,
> >> cpp_opts->warn_num_sign_change = value;
> >> break;
> >>
> >> +case OPT_Wpragma_once_outside_header:
> >> +  if (value == 0)
> >> +   cpp_opts->cpp_warn_pragma_once_outside_header = 0;
> >> +  else if (kind == DK_ERROR)
> >> +   cpp_opts->cpp_warn_pragma_once_outside_header = 2;
> >> +  else
> >> +   cpp_opts->cpp_warn_pragma_once_outside_header = 1;
> >> +  break;
>
> Rather than encode the -Werror this way...
>
> >>   case OPT_Wunknown_pragmas:
> >> /* Set to greater than 1, so that even unknown pragmas in
> >>   system headers will be warned about.  */
> >> diff --git a/gcc/c-family/c.opt b/gcc/c-family/c.opt
> >> index b7a4a1a68e3..6841a5a5e81 100644
> >> --- a/gcc/c-family/c.opt
> >> +++ b/gcc/c-family/c.opt
> >> @@ -1180,6 +1180,10 @@ Wpragmas
> >>   C ObjC C++ ObjC++ Var(warn_pragmas) Init(1) Warning
> >>   Warn about misuses of pragmas.
> >>
> >> +Wpragma-once-outside-header
> >> +C ObjC C++ ObjC++ Var(warn_pragma_once_outside_header) Init(1) Warning
> >> +Warn about #pragma once outside of a header.
> >> +
> >>   Wprio-ctor-dtor
> >>   C ObjC C++ ObjC++ Var(warn_prio_ctor_dtor) Init(1) Warning
> >>   Warn if constructor or destructors with priorities from 0 to 100 are 
> >> used.
> >> diff --git a/gcc/doc/invoke.texi b/gcc/doc/invoke.texi
> >> index bdf05be387d..eeb8954bcdf 100644
> >> --- a/gcc/doc/invoke.texi
> >> +++ b/gcc/doc/invoke.texi
> >> @@ -391,8 +391,8 @@ Objective-C and Objective-C++ Dialects}.
> >>   -Wpacked  -Wno-packed-bitfield-compat  -Wpacked-not-aligned  -Wpadded
> >>   -Wparentheses  -Wno-pedantic-ms-format
> >>   -Wpointer-arith  -Wno-pointer-compare  -Wno-pointer-to-int-cast
> >> --Wno-pragmas  -Wno-prio-ctor-dtor  -Wredundant-decls
> >> --Wrestrict  -Wno-return-local-addr  -Wreturn-type
> >> +-Wno-pragmas  -Wno-pragma-once-outside-header  -Wno-prio-ctor-dtor
> >> +-Wredundant-decls  -Wrestrict  -Wno-return-local-addr  -Wreturn-type
> >>   -Wno-scalar-storage-order  -Wsequence-point
> >>   -Wshadow  -Wshadow=global  -Wshadow=local  -Wshadow=compatible-local
> >>   -Wno-shadow-ivar
> >> @@ -7955,6 +7955,12 @@ Do not warn about misuses of pragmas, such as 
> >> incorrect parameters,
> >>   invalid syntax, or conflicts between pragmas.  See 

Re: [SUBREG V4 1/4] DF: Add -ftrack-subreg-liveness option

2024-06-13 Thread Richard Sandiford
Juzhe-Zhong  writes:
> ---
>  gcc/common.opt  | 4 
>  gcc/common.opt.urls | 3 +++
>  gcc/doc/invoke.texi | 8 
>  gcc/opts.cc | 1 +
>  4 files changed, 16 insertions(+)
>
> diff --git a/gcc/common.opt b/gcc/common.opt
> index 40cab3cb36a..5710e817abe 100644
> --- a/gcc/common.opt
> +++ b/gcc/common.opt
> @@ -2163,6 +2163,10 @@ fira-share-spill-slots
>  Common Var(flag_ira_share_spill_slots) Init(1) Optimization
>  Share stack slots for spilled pseudo-registers.
>  
> +ftrack-subreg-liveness
> +Common Var(flag_track_subreg_liveness) Init(0) Optimization
> +Track subreg liveness information.
> +
>  fira-verbose=
>  Common RejectNegative Joined UInteger Var(flag_ira_verbose) Init(5)
>  -fira-verbose=   Control IRA's level of diagnostic messages.
> diff --git a/gcc/common.opt.urls b/gcc/common.opt.urls
> index f71ed80a34b..59f27a6f7c6 100644
> --- a/gcc/common.opt.urls
> +++ b/gcc/common.opt.urls
> @@ -880,6 +880,9 @@ 
> UrlSuffix(gcc/Optimize-Options.html#index-fira-share-save-slots)
>  fira-share-spill-slots
>  UrlSuffix(gcc/Optimize-Options.html#index-fira-share-spill-slots)
>  
> +ftrack-subreg-liveness
> +UrlSuffix(gcc/Optimize-Options.html#index-ftrack-subreg-liveness)
> +
>  fira-verbose=
>  UrlSuffix(gcc/Developer-Options.html#index-fira-verbose)
>  
> diff --git a/gcc/doc/invoke.texi b/gcc/doc/invoke.texi
> index ddcd5213f06..fbcde8aa745 100644
> --- a/gcc/doc/invoke.texi
> +++ b/gcc/doc/invoke.texi
> @@ -13188,6 +13188,14 @@ Disable sharing of stack slots allocated for 
> pseudo-registers.  Each
>  pseudo-register that does not get a hard register gets a separate
>  stack slot, and as a result function stack frames are larger.
>  
> +@opindex ftrack-subreg-liveness
> +@item -ftrack-subreg-liveness
> +Enable tracking subreg liveness information. This infomation allows IRA
> +and LRA to support subreg coalesce feature which can improve the quality
> +of register allocation.
> +
> +This option is enabled at level @option{-O3} for all targets.
> +

This is a good description, but some of these GCC terms might not be
familiar to users.  How about something like:


Enable a more precise form of dataflow analysis.  This analysis focuses
on values that occupy multiple consecutive machine registers; examples
of such values include complex numbers and small tuples of vectors.
The analysis detects which parts of a value are in use at a given time
and which parts are free to be reused for other things.  Enabling the
analysis can improve the quality of register allocation.

This option is enabled at level @option{-O3} for all targets.


It might be worth enabling at -O2 and above eventually, but I agree
it makes sense to start with -O3.

OK with that change if you agree and if there are no countersuggestions
from others.

Thanks,
Richard
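
To make the suggested wording a bit more concrete, a sketch of a value of
which only one part stays in use.  The function name is hypothetical, and
whether a complex value really occupies several machine registers depends on
the target and ABI; this only illustrates the "which parts are in use" idea.

```cpp
#include <complex>

// After the multiplication only the real part of the product is needed.
double
real_part_of_product (std::complex<double> a, std::complex<double> b)
{
  std::complex<double> p = a * b;  // p is made of two double-sized parts
  return p.real ();                // only one of the parts is still live
}
```

An analysis that tracks the parts separately could treat the storage of the
imaginary part as free once the product has been computed.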

>  @opindex flra-remat
>  @item -flra-remat
>  Enable CFG-sensitive rematerialization in LRA.  Instead of loading
> diff --git a/gcc/opts.cc b/gcc/opts.cc
> index 14d1767e48f..8fe3a213807 100644
> --- a/gcc/opts.cc
> +++ b/gcc/opts.cc
> @@ -698,6 +698,7 @@ static const struct default_options 
> default_options_table[] =
>  { OPT_LEVELS_3_PLUS, OPT_funswitch_loops, NULL, 1 },
>  { OPT_LEVELS_3_PLUS, OPT_fvect_cost_model_, NULL, 
> VECT_COST_MODEL_DYNAMIC },
>  { OPT_LEVELS_3_PLUS, OPT_fversion_loops_for_strides, NULL, 1 },
> +{ OPT_LEVELS_3_PLUS, OPT_ftrack_subreg_liveness, NULL, 1 },
>  
>  /* -O3 parameters.  */
>  { OPT_LEVELS_3_PLUS, OPT__param_max_inline_insns_auto_, NULL, 30 },


Re: [PATCH 11/13 ver 3] rs6000, extend vec_xxpermdi built-in for __int128 args

2024-06-13 Thread Carl Love
Kewen:

On 6/3/24 22:58, Kewen.Lin wrote:
> Hi,
> 
> on 2024/5/30 00:10, Carl Love wrote:
>>  This was patch 10 from the previous series.  The patch was updated to 
>> address feedback comments.
>>
>> Carl 
>> ---
>>
>> rs6000, extend vec_xxpermdi built-in for __int128 args
>>
>> Add new signed and unsigned overloaded instances for vec_xxpermdi
>>
>>__int128 vec_xxpermdi (__int128, __int128, const int);
>>__uint128 vec_xxpermdi (__uint128, __uint128, const int);
>>
>> Update the documentation to include a reference to the new built-in
>> instances.
>>
>> Add test cases for the new overloaded instances.
>>
>> gcc/ChangeLog:
>>  * config/rs6000/rs6000-overload.def (vec_xxpermdi): Add new
>>  overloaded built-in instances.
>>  * doc/extend.texi:  Add documentation for new overloaded built-in
>>  instances.
>>
>> gcc/testsuite/ChangeLog:
>>  * gcc.target/powerpc/vec_perm-runnable-i128.c: New test file.
>> ---
>>  gcc/config/rs6000/rs6000-overload.def |   4 +
>>  gcc/doc/extend.texi   |   2 +
>>  .../powerpc/vec_perm-runnable-i128.c  | 229 ++
>>  3 files changed, 235 insertions(+)
>>  create mode 100644 gcc/testsuite/gcc.target/powerpc/vec_perm-runnable-i128.c
>>
>> diff --git a/gcc/config/rs6000/rs6000-overload.def 
>> b/gcc/config/rs6000/rs6000-overload.def
>> index a210c5ad10d..45000f161e4 100644
>> --- a/gcc/config/rs6000/rs6000-overload.def
>> +++ b/gcc/config/rs6000/rs6000-overload.def
>> @@ -4932,6 +4932,10 @@
>>  XXPERMDI_4SF  XXPERMDI_VF
>>vd __builtin_vsx_xxpermdi (vd, vd, const int);
>>  XXPERMDI_2DF  XXPERMDI_VD
>> +  vsq __builtin_vsx_xxpermdi (vsq, vsq, const int);
>> +XXPERMDI_1TI  XXPERMDI_1TI
>> +  vuq __builtin_vsx_xxpermdi (vuq, vuq, const int);
>> +XXPERMDI_1TI  XXPERMDI_1TUI
> 
> Nits:
>   - Move them before "vf __builtin_vsx_xxpermdi (vf, vf, const int);" so
> they are close to instances for other integral types.
>   - Following the existing naming convention, _{SQ,UQ} are better.
> 
> vsq __builtin_vsx_xxpermdi (vsq, vsq, const int);
>XXPERMDI_1TI  XXPERMDI_1SQ
> vuq __builtin_vsx_xxpermdi (vuq, vuq, const int);
>XXPERMDI_1TI  XXPERMDI_1UQ
> 

OK, moved the definitions up and changed the names.

>>  
>>  [VEC_XXSLDWI, vec_xxsldwi, __builtin_vsx_xxsldwi]
>>vsc __builtin_vsx_xxsldwi (vsc, vsc, const int);
>> diff --git a/gcc/doc/extend.texi b/gcc/doc/extend.texi
>> index 0756230b19e..edfef1bdab7 100644
>> --- a/gcc/doc/extend.texi
>> +++ b/gcc/doc/extend.texi
>> @@ -22555,6 +22555,8 @@ void vec_vsx_st (vector bool char, int, signed char 
>> *);
>>  vector double vec_xxpermdi (vector double, vector double, const int);
>>  vector float vec_xxpermdi (vector float, vector float, const int);
>>  vector long long vec_xxpermdi (vector long long, vector long long, const 
>> int);
> 
>> +vector __int128 vec_xxpermdi (vector __int128, vector __int128, const int);
>> +vector __int128 vec_xxpermdi (vector __uint128, vector __uint128, const 
>> int);
> 
> Nit: These two lines break the long long and unsigned long long lines, can
> you move them one line upward?  Also using the explicit "signed" and
> "unsigned" would be better than "__{u,}int128".
> 

Yup, I didn't get them in the right place.  Fixed.

>>  vector unsigned long long vec_xxpermdi (vector unsigned long long,
>>  vector unsigned long long, const 
>> int);
>>  vector int vec_xxpermdi (vector int, vector int, const int);
>> diff --git a/gcc/testsuite/gcc.target/powerpc/vec_perm-runnable-i128.c 
>> b/gcc/testsuite/gcc.target/powerpc/vec_perm-runnable-i128.c
>> new file mode 100644
>> index 000..2d5dce09404
>> --- /dev/null
>> +++ b/gcc/testsuite/gcc.target/powerpc/vec_perm-runnable-i128.c
>> @@ -0,0 +1,229 @@
>> +/* { dg-do run } */
>> +/* { dg-require-effective-target vmx_hw } */
>> +/* { dg-options "-save-temps" } */
> 
> Nit: dg-options line isn't needed as it doesn't check assembly.

Removed the save-temps.

> 
> BR,
> Kewen
> 
>> +
>> +#include 
>> +
>> +#define DEBUG 0
>> +
>> +#if DEBUG
>> +#include 
>> +void print_i128 (unsigned __int128 val)
>> +{
>> +  printf(" 0x%016llx%016llx",
>> + (unsigned long long)(val >> 64),
>> + (unsigned long long)(val & 0x));
>> +}
>> +#endif
>> +
>> +extern void abort (void);
>> +
>> +union convert_union {
>> +  vector signed __int128s128;
>> +  vector unsigned __int128  u128;
>> +  char  val[16];
>> +} convert;
>> +
>> +int check_u128_result(vector unsigned __int128 vresult_u128,
>> +  vector unsigned __int128 expected_vresult_u128)
>> +{
>> +  /* Use a for loop to check each byte manually so the test case will
>> + run with ISA 2.06.
>> +
>> + Return 1 if they match, 0 otherwise.  */
>> +
>> +  int i;
>> +
>> +  union convert_union result;
>> +  union convert_union 

Re: [PATCH 7/13 ver 3] rs6000, add overloaded vec_sel with int128 arguments

2024-06-13 Thread Carl Love
Kewen:

On 6/3/24 22:58, Kewen.Lin wrote:
> Hi,
> 
> on 2024/5/30 00:03, Carl Love wrote:
>> This was patch 6 in the previous series.  Updated the documentation file per 
>> the comments.  No functional changes to the patch.
>>
>>   Carl 
>> 
>>
>> rs6000, add overloaded vec_sel with int128 arguments
>>
>> Extend the vec_sel built-in to take three signed/unsigned int128 arguments
>> and return a signed/unsigned int128 result.
>>
>> Extending the vec_sel built-in makes the existing built-ins
>> __builtin_vsx_xxsel_1ti and __builtin_vsx_xxsel_1ti_uns obsolete.  The
>> patch removes these built-ins.
>>
>> The patch adds documentation and test cases for the new overloaded vec_sel
>> built-ins.
>>
>> gcc/ChangeLog:
>>  * config/rs6000/rs6000-builtins.def (__builtin_vsx_xxsel_1ti,
>>  __builtin_vsx_xxsel_1ti_uns): Remove built-in definitions.
>>  * config/rs6000/rs6000-overload.def (vec_sel): Add new overloaded
>>  definitions.
>>  * doc/extend.texi: Add documentation for new vec_sel instances.
>>
>> gcc/testsuite/ChangeLog:
>>  * gcc.target/powerpc/vec-sel-runnable-i128.c: New test file.
>> ---
>>  gcc/config/rs6000/rs6000-builtins.def |   6 -
>>  gcc/config/rs6000/rs6000-overload.def |   4 +
>>  gcc/doc/extend.texi   |  12 ++
>>  .../powerpc/vec-sel-runnable-i128.c   | 129 ++
>>  4 files changed, 145 insertions(+), 6 deletions(-)
>>  create mode 100644 gcc/testsuite/gcc.target/powerpc/vec-sel-runnable-i128.c
>>
>> diff --git a/gcc/config/rs6000/rs6000-builtins.def 
>> b/gcc/config/rs6000/rs6000-builtins.def
>> index 13e36df008d..ea0da77f13e 100644
>> --- a/gcc/config/rs6000/rs6000-builtins.def
>> +++ b/gcc/config/rs6000/rs6000-builtins.def
>> @@ -1904,12 +1904,6 @@
>>const vuc __builtin_vsx_xxsel_16qi_uns (vuc, vuc, vuc);
>>  XXSEL_16QI_UNS vector_select_v16qi_uns {}
>>  
>> -  const vsq __builtin_vsx_xxsel_1ti (vsq, vsq, vsq);
>> -XXSEL_1TI vector_select_v1ti {}
>> -
>> -  const vsq __builtin_vsx_xxsel_1ti_uns (vsq, vsq, vsq);
>> -XXSEL_1TI_UNS vector_select_v1ti_uns {}
>> -
>>const vd __builtin_vsx_xxsel_2df (vd, vd, vd);
>>  XXSEL_2DF vector_select_v2df {}
>>  
>> diff --git a/gcc/config/rs6000/rs6000-overload.def 
>> b/gcc/config/rs6000/rs6000-overload.def
>> index 4d857bb1af3..a210c5ad10d 100644
>> --- a/gcc/config/rs6000/rs6000-overload.def
>> +++ b/gcc/config/rs6000/rs6000-overload.def
>> @@ -3274,6 +3274,10 @@
>>  VSEL_2DF  VSEL_2DF_B
>>vd __builtin_vec_sel (vd, vd, vull);
>>  VSEL_2DF  VSEL_2DF_U
>> +  vsq __builtin_vec_sel (vsq, vsq, vsq);
>> +VSEL_1TI  VSEL_1TI_S
>> +  vuq __builtin_vec_sel (vuq, vuq, vuq);
>> +VSEL_1TI_UNS  VSEL_1TI_U
> 
> I just noticed that for integral types, such as: signed/unsigned int, we have 
> six instances:
> 
>   vsi __builtin_vec_sel (vsi, vsi, vbi);
> VSEL_4SI  VSEL_4SI_B
>   vsi __builtin_vec_sel (vsi, vsi, vui);
> VSEL_4SI  VSEL_4SI_U
>   vui __builtin_vec_sel (vui, vui, vbi);
> VSEL_4SI_UNS  VSEL_4SI_UB
>   vui __builtin_vec_sel (vui, vui, vui);
> VSEL_4SI_UNS  VSEL_4SI_UU
>   vbi __builtin_vec_sel (vbi, vbi, vbi);
> VSEL_4SI_UNS  VSEL_4SI_BB
>   vbi __builtin_vec_sel (vbi, vbi, vui);
> 
> These treat the control vector as having only unsigned or bool type, and also
> allow the return type to be bool.  That aligns with what PVIPR defines, so
> here we should have:
> 
> vsq __builtin_vec_sel (vsq, vsq, vbq);
> vsq __builtin_vec_sel (vsq, vsq, vuq);
> vuq __builtin_vec_sel (vuq, vuq, vbq);
> vuq __builtin_vec_sel (vuq, vuq, vuq);
> vbq __builtin_vec_sel (vbq, vbq, vbq);
> vbq __builtin_vec_sel (vbq, vbq, vuq);
> 
> Sorry that I didn't find this in the previous review.

Yea, my bad I missed that as well.  Fixed to add all six instances.
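
For readers following along, the bitwise semantics shared by all six
instances can be sketched in portable C++.  This is only an illustration,
not the rs6000 implementation: the u128 struct and the sel64/sel128 helpers
are hypothetical names, and the 128-bit element is modelled as two 64-bit
halves so the sketch does not depend on __int128 support.

```cpp
#include <cstdint>

// Hypothetical stand-in for one 128-bit vector element (two 64-bit halves).
struct u128 {
  std::uint64_t hi;
  std::uint64_t lo;
};

// Bitwise select on 64 bits: where a mask bit is 0 take the bit from a,
// where it is 1 take the bit from b.
constexpr std::uint64_t
sel64 (std::uint64_t a, std::uint64_t b, std::uint64_t mask)
{
  return (a & ~mask) | (b & mask);
}

// The same select applied to both halves of the modelled 128-bit element.
constexpr u128
sel128 (u128 a, u128 b, u128 mask)
{
  return u128{ sel64 (a.hi, b.hi, mask.hi), sel64 (a.lo, b.lo, mask.lo) };
}
```

The operation never interprets the data bits as signed or unsigned, which
is why the overloads differ only in argument and return types while the
control vector is restricted to unsigned or bool.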
> 
> 
>>  ; The following variants are deprecated.
>>vsll __builtin_vec_sel (vsll, vsll, vsll);
>>  VSEL_2DI_B  VSEL_2DI_S
>> diff --git a/gcc/doc/extend.texi b/gcc/doc/extend.texi
>> index b88e61641a2..0756230b19e 100644
>> --- a/gcc/doc/extend.texi
>> +++ b/gcc/doc/extend.texi
>> @@ -21372,6 +21372,18 @@ Additional built-in functions are available for the 
>> 64-bit PowerPC
>>  family of processors, for efficient use of 128-bit floating point
>>  (@code{__float128}) values.
>>  
>> +Vector select
>> +
>> +@smallexample
>> +vector signed __int128 vec_sel (vector signed __int128,
>> +   vector signed __int128, vector signed __int128);
>> +vector unsigned __int128 vec_sel (vector unsigned __int128,
>> +   vector unsigned __int128, vector unsigned __int128);
>> +@end smallexample
> 
> As above, the documentation here has to consider vector bool __int128 and
> note that the control vector is of type either vector unsigned __int128 or
> vector bool __int128.
> 
>> +
>> +The instance is an extension of the existing overloaded built-in 
>> @code{vec_sel}
>> +that is 

Re: [PATCH 4/13 ver 3] rs6000, extend the current vec_{un,}signed{e,o} built-ins

2024-06-13 Thread Carl Love
Kewen:

On 6/4/24 00:19, Kewen.Lin wrote:
> Hi,
> 
> on 2024/5/29 23:58, Carl Love wrote:
>> Updated the patch per the feedback comments from the previous version.
>>
>>  Carl 
>> ---
>>
>> rs6000, extend the current vec_{un,}signed{e,o} built-ins
>>
>> The built-ins __builtin_vsx_xvcvspsxds and __builtin_vsx_xvcvspuxds
>> convert a vector of floats to signed/unsigned long long ints.  Extend the
>> existing vec_{un,}signed{e,o} built-ins to handle the argument
>> vector of floats to return the even/odd signed/unsigned integers.
>>
>> The define expands vsignede_v4sf, vsignedo_v4sf, vunsignede_v4sf,
>> vunsignedo_v4sf are added to support the new vec_{un,}signed{e,o}
>> built-ins.
>>
>> The built-ins __builtin_vsx_xvcvspsxds and __builtin_vsx_xvcvspuxds are
>> now for internal use only. They are not documented and they do not
>> have testcases.
>>
>> The built-in __builtin_vsx_xvcvdpsxws is redundant as it is covered by
>> vec_signed{e,o}, remove.
>>
>> The built-in __builtin_vsx_xvcvdpuxws is redundant as it is covered by
>> vec_unsigned{e,o}, remove.
>>
>> The built-in __builtin_vsx_xvcvdpuxds_uns is redundant as it is covered by
>> vec_unsigned, remove.
>>
>> The __builtin_vsx_xvcvspuxws is redundant as it is covered by
>> vec_unsigned, remove.
> 
> I prefer to move these removals into sub-patch 2/13 or split them out into
> a new patch, since they don't match the subject of this patch.  Moving it
> to sub-patch 2/13 looks good as they are all about vec_{un,}signed{,e,o}.

Yes, we need to have all of the vec_unsigned in the same patch.  Moved 
__builtin_vsx_xvcvdpuxds_uns and __builtin_vsx_xvcvspuxws to patch 2.
> 
>>
>> Add testcases and update documentation.
>>
>> gcc/ChangeLog:
>>  * config/rs6000/rs6000-builtins.def (__builtin_vsx_xvcvspsxds_low,
>>  __builtin_vsx_xvcvspuxds_low): New built-in definitions.
>>  (__builtin_vsx_xvcvspuxds): Fix return type.
>>  (XVCVSPSXDS, XVCVSPUXDS): Renamed VEC_VSIGNEDE_V4SF,
>>  VEC_VUNSIGNEDE_V4SF respectively.
>>  (vsx_xvcvspsxds, vsx_xvcvspuxds): Renamed vsignede_v4sf,
>>  vunsignede_v4sf respectively.
>>  (__builtin_vsx_xvcvdpsxws, __builtin_vsx_xvcvdpuxws,
>>  __builtin_vsx_xvcvdpuxds_uns, __builtin_vsx_xvcvspuxws): Removed.
>>  * config/rs6000/rs6000-overload.def (vec_signede, vec_signedo,
>>  vec_unsignede,vec_unsignedo):  Add new overloaded specifications.
>>  * config/rs6000/vsx.md (vsignede_v4sf, vsignedo_v4sf,
>>  vunsignede_v4sf, vunsignedo_v4sf): New define_expands.
>>  * doc/extend.texi (vec_signedo, vec_signede): Add documentation.
>>
>> gcc/testsuite/ChangeLog:
>>  * gcc.target/powerpc/builtins-3-runnable.c: New tests for the added
>>  overloaded built-ins.
>> ---
>>  gcc/config/rs6000/rs6000-builtins.def | 25 ++
>>  gcc/config/rs6000/rs6000-overload.def |  8 ++
>>  gcc/config/rs6000/vsx.md  | 88 +++
>>  gcc/doc/extend.texi   | 10 +++
>>  .../gcc.target/powerpc/builtins-3-runnable.c  | 51 +--
>>  5 files changed, 157 insertions(+), 25 deletions(-)
>>
>> diff --git a/gcc/config/rs6000/rs6000-builtins.def 
>> b/gcc/config/rs6000/rs6000-builtins.def
>> index bf9a0ae22fc..cea2649b86c 100644
>> --- a/gcc/config/rs6000/rs6000-builtins.def
>> +++ b/gcc/config/rs6000/rs6000-builtins.def
>> @@ -1688,32 +1688,23 @@
>>const vsll __builtin_vsx_xvcvdpsxds_scale (vd, const int);
>>  XVCVDPSXDS_SCALE vsx_xvcvdpsxds_scale {}
>>  
>> -  const vsi __builtin_vsx_xvcvdpsxws (vd);
>> -XVCVDPSXWS vsx_xvcvdpsxws {}
>> -
>> -  const vsll __builtin_vsx_xvcvdpuxds (vd);
>> -XVCVDPUXDS vsx_fixuns_truncv2dfv2di2 {}
>> -
>>const vsll __builtin_vsx_xvcvdpuxds_scale (vd, const int);
>>  XVCVDPUXDS_SCALE vsx_xvcvdpuxds_scale {}
>>  
>> -  const vull __builtin_vsx_xvcvdpuxds_uns (vd);
>> -XVCVDPUXDS_UNS vsx_fixuns_truncv2dfv2di2 {}
>> -
>> -  const vsi __builtin_vsx_xvcvdpuxws (vd);
>> -XVCVDPUXWS vsx_xvcvdpuxws {}
>> -
>>const vd __builtin_vsx_xvcvspdp (vf);
>>  XVCVSPDP vsx_xvcvspdp {}
>>  
>>const vsll __builtin_vsx_xvcvspsxds (vf);
>> -XVCVSPSXDS vsx_xvcvspsxds {}
>> +VEC_VSIGNEDE_V4SF vsignede_v4sf {}
> 
> We should rename __builtin_vsx_xvcvspsxds to
> __builtin_vsx_vsignede_v4sf.  One reason is to align with
> the existing built-ins; more importantly, it does not
> map 1-1 to the xvcvspsxds instruction, so keeping that
> mnemonic in the name can be misleading.

Yes, that would be more consistent. Changed.

> 
>> +
>> +  const vsll __builtin_vsx_xvcvspsxds_low (vf);
> 
> Ditto.
Changed.

> 
>> +VEC_VSIGNEDO_V4SF vsignedo_v4sf {}
>>  
>> -  const vsll __builtin_vsx_xvcvspuxds (vf);
>> -XVCVSPUXDS vsx_xvcvspuxds {}
>> +  const vull __builtin_vsx_xvcvspuxds (vf);
> 
> Ditto.
Changed.

> 
>> +VEC_VUNSIGNEDE_V4SF vunsignede_v4sf {}
>>  
>> -  const vsi __builtin_vsx_xvcvspuxws (vf);
>> -

Re: [pushed] c++: module std and exception_ptr

2024-06-13 Thread Jason Merrill

On 6/13/24 11:16, Patrick Palka wrote:

On Wed, 12 Jun 2024, Jason Merrill wrote:


exception_ptr.h contains

   namespace __exception_ptr
   {
 class exception_ptr;
   }
   using __exception_ptr::exception_ptr;

so when module std tries to 'export using std::exception_ptr', it names
another using-declaration rather than the class directly, so __exception_ptr
is never explicitly opened in module purview.


FWIW PR100134 ICEd in the same way, and r13-3236-g9736a42e1fb8df
narrowly fixed this by setting DECL_MODULE_PURVIEW_P on the enclosing
namespace around the time we set the flag on the namespace-scope entity in
question.  I wonder if it'd be preferable to do something similar here,
e.g. set DECL_MODULE_PURVIEW_P on the enclosing namespace in
do_nonmember_using_decl?


Interesting thought, but I don't think so, as this is a workaround for 
the broader 114683 problem; we shouldn't actually be setting 
DECL_MODULE_PURVIEW_P on the class, either, only the using-declaration.
The problem is that we don't currently represent the using in a way that 
we can set flags on specifically.
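
The name-lookup side of the exception_ptr.h pattern can be reproduced
without modules.  The following is only a sketch with hypothetical names
(lib, detail and handle stand in for std, __exception_ptr and
exception_ptr); it shows that a using-declaration naming another
using-declaration still denotes the underlying class, which is why there
is currently no separate entity on which to set the purview flag.

```cpp
#include <type_traits>

namespace lib {            // stands in for namespace std
  namespace detail {       // stands in for __exception_ptr
    class handle {};       // stands in for exception_ptr
  }
  using detail::handle;    // first using-declaration
}

// Names the using-declaration in lib, not the class in lib::detail directly.
using lib::handle;

// All three spellings denote the same class.
static_assert (std::is_same<handle, lib::detail::handle>::value,
               "a using of a using denotes the original class");
```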


Jason



Re: [PATCH 2/3] Enabled LRA for ia64.

2024-06-13 Thread Jonathan Wakely
On Thu, 13 Jun 2024 at 15:16, Jonathan Wakely  wrote:
>
> On Thu, 13 Jun 2024 at 15:11, Jeff Law  wrote:
> >
> >
> >
> > On 6/13/24 4:33 AM, Jonathan Wakely wrote:
> > > On Wed, 12 Jun 2024 at 22:00, Frank Scheiner  
> > > wrote:
> > >>
> > >> Hi Jonathan, Richard,
> > >>
> > >> On 12.06.24 20:54, Jonathan Wakely wrote:
> > >>> On 12/06/24 16:09 +0200, Frank Scheiner wrote:
> >  Dear Richard,
> > 
> >  On 12.06.24 13:01, Richard Biener wrote:
> > > [...]
> > > I can find two gcc-testresult postings, one apparently with LRA
> > > and one without?  Both from May:
> > >
> > > https://sourceware.org/pipermail/gcc-testresults/2024-May/816422.html
> > > https://sourceware.org/pipermail/gcc-testresults/2024-May/816346.html
> > >
> > > somehow for example libstdc++ summaries were not merged, it might
> > > be you do not have recent python installed on the system?  Or you
> > > didn't use contrib/test_summary to create those mails.
> > 
> >  No, I did not use contrib/test_summary. But I still have tarballs of
> >  both testsuite runs, so could still produce these summaries - I hope?
> > >>>
> > >>> It looks like the results at
> > >>> https://gcc.gnu.org/pipermail/gcc-testresults/2024-May/816422.html are
> > >>> just what's printed on standard out, including output from 'make -j4'
> > >>> so not combined into one set of results.
> > >>
> > >> That's what it is, yes.
> > >>
> > >>> It would certainly be better to either get the results from the .sum
> > >>> files, or just use the contrib/test_summary script to do that for you.
> > >>
> > >> Ok, I posted the results as created by contrib/test_summary now:
> > >>
> > >> 1. non-LRA version on [1]
> > >>
> > >> 2. LRA version on [2]
> > >>
> > >> [1]: https://gcc.gnu.org/pipermail/gcc-testresults/2024-June/817267.html
> > >>
> > >> [2]: https://gcc.gnu.org/pipermail/gcc-testresults/2024-June/817268.html
> > >
> > > Thanks!
> > >
> > > These ones are probably due to non-reserved names in glibc or kernel 
> > > headers:
> > >
> > > FAIL: 17_intro/names.cc  -std=gnu++17 (test for excess errors)
> > > FAIL: 17_intro/names_pstl.cc  -std=gnu++17 (test for excess errors)
> > > FAIL: experimental/names.cc  -std=gnu++17 (test for excess errors)
> > >
> > > The errors for all three are probably the same and should be
> > > decipherable from libstdc++.log which will show which names defined as
> > > macros in names.cc are clashing with names in system headers.
> > And wouldn't failure of these imply that the headers are either ancient
> > with some kind of pollution or that there's a ia64 specific goof in the
> > headers?
>
> Yes, indeed. It probably means some ia64-specific structures in kernel
> headers use non-reserved names like "next" or "ptr" or something,
> instead of __next or __ptr.
>
> >  These tests work on the other linux targets AFAIK.
>
> Most of them, yes. I think Jakub noticed some failures on s390x linux
> recently, due to bad names in s390x-specific structs in the kernel
> headers.

Ah yes, see r14-10076-gcf5f7791056b3e for details - the commit message
has lots of info. There were problems in kernel headers and in
s390x-specific parts of glibc.
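
A miniature version of the failure mode, for anyone not familiar with these
tests.  This is a sketch under assumptions: the macro definition imitates
the style names.cc is understood to use, and list_node and probe are
hypothetical names.  The struct plays the role of a system header; the
reserved spelling __next is only legitimate there and is used here purely
to simulate one.

```cpp
// Define a user-space name as a macro that cannot appear in valid code,
// as a names.cc-style test would do before including the headers under test.
#define next (

// Simulated system-header fragment.  The reserved spelling is not touched
// by the macro above, so this compiles.
struct list_node {
  list_node *__next;
  int value;
};

// A header that instead declared `list_node *next;` would expand to
// `list_node *(;` at this point and fail to compile, which the testsuite
// reports as "test for excess errors".

#undef next

// Small helper so the fragment can be checked at compile time.
constexpr int probe () { return list_node{nullptr, 7}.value; }
```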



Re: [PATCH v3] gcc, libcpp: Add warning switch for "#pragma once in main file" [PR89808]

2024-06-13 Thread David Malcolm
On Thu, 2024-06-13 at 07:31 -0700, Ken Matsui wrote:
> This patch adds a warning switch for "#pragma once in main file".  The
> warning option name is Wpragma-once-outside-header, which is the same
> as Clang.
> PR preprocessor/89808
> 
> gcc/c-family/ChangeLog:
> 
> * c.opt (Wpragma_once_outside_header): Define new option.
> 
> gcc/ChangeLog:
> 
> * doc/invoke.texi (Warning Options): Document
> -Wno-pragma-once-outside-header.
> 
> libcpp/ChangeLog:
> 
> * include/cpplib.h (struct cpp_options): Define
> cpp_warn_pragma_once_outside_header.
> (cpp_warning_reason): Define
> CPP_W_PRAGMA_ONCE_OUTSIDE_HEADER.
> * directives.cc (do_pragma_once): Use
> cpp_warn_pragma_once_outside_header and
> CPP_W_PRAGMA_ONCE_OUTSIDE_HEADER.
> * init.cc (cpp_create_reader): Handle
> cpp_warn_pragma_once_outside_header.
> 
> gcc/testsuite/ChangeLog:
> 
> * g++.dg/warn/Wno-pragma-once-outside-header.C: New test.
> * g++.dg/warn/Wpragma-once-outside-header.C: New test.
> 
> Signed-off-by: Ken Matsui 

[...snip...]

Thanks for the updated patch.

> @@ -7983,6 +7983,12 @@ Do not warn about misuses of pragmas, such as
> incorrect parameters,
>  invalid syntax, or conflicts between pragmas.  See also
>  @option{-Wunknown-pragmas}.
>  
> +@opindex Wno-pragma-once-outside-header
> +@opindex Wpragma-once-outside-header
> +@item -Wno-pragma-once-outside-header
> +Do not warn when @code{#pragma once} is used in a file that is not a header
> +file, such as a main file.
> +
>  @opindex Wno-prio-ctor-dtor
>  @opindex Wprio-ctor-dtor
>  @item -Wno-prio-ctor-dtor

Please run "make html && make regenerate-opt-urls" so that the
diagnostic gets a documentation URL.  Sorry that you have to do this
manually (it's to avoid complicating the build dependencies for someone
just building gcc).

[...snip...]


> diff --git a/libcpp/directives.cc b/libcpp/directives.cc
> index 479f8c716e8..68f47104dea 100644
> --- a/libcpp/directives.cc
> +++ b/libcpp/directives.cc
> @@ -1588,8 +1588,12 @@ do_pragma (cpp_reader *pfile)
>  static void
>  do_pragma_once (cpp_reader *pfile)
>  {
> -  if (_cpp_in_main_source_file (pfile))
> -    cpp_error (pfile, CPP_DL_WARNING, "#pragma once in main file");
> +  const unsigned char warn_level =
> +    CPP_OPTION (pfile, cpp_warn_pragma_once_outside_header);
> +
> +  if (warn_level && _cpp_in_main_source_file (pfile))
> +    cpp_warning (pfile, CPP_W_PRAGMA_ONCE_OUTSIDE_HEADER,
> +    "#pragma once in main file");

Please put the "#pragma once" in the message in quotes, such as via:

cpp_warning (pfile, CPP_W_PRAGMA_ONCE_OUTSIDE_HEADER,
"%<#pragma once%> in main file");

or via:

cpp_warning (pfile, CPP_W_PRAGMA_ONCE_OUTSIDE_HEADER,
"%qs in main file", "pragma once");

Although it's a minor style nit, I'm working on patches to
automatically add URLs to GCC's documentation for certain quoted
strings on sufficiently capable terminals (I've done command-line
options, I'm working on attributes, and I hope to eventually do
pragmas).

Dave



[pushed] c++/modules: multiple usings of the same decl [PR115194]

2024-06-13 Thread Jason Merrill
Tested x86_64-pc-linux-gnu, applying to trunk.

-- 8< --

add_binding_entity creates an OVERLOAD to represent a using-declaration in
module purview of a declaration in the global module, even for
non-functions, and we were failing to merge that with the original
declaration in name lookup.

It's not clear to me that building the OVERLOAD is what should be happening,
but let's work around it for now pending an overhaul of using-decl handling
for c++/114683.

PR c++/115194

gcc/cp/ChangeLog:

* name-lookup.cc (name_lookup::process_module_binding): Strip an
OVERLOAD from a non-function.

gcc/testsuite/ChangeLog:

* g++.dg/modules/using-23_a.C: New test.
* g++.dg/modules/using-23_b.C: New test.
---
 gcc/cp/name-lookup.cc | 11 +++
 gcc/testsuite/g++.dg/modules/using-23_a.C | 19 +++
 gcc/testsuite/g++.dg/modules/using-23_b.C |  7 +++
 3 files changed, 37 insertions(+)
 create mode 100644 gcc/testsuite/g++.dg/modules/using-23_a.C
 create mode 100644 gcc/testsuite/g++.dg/modules/using-23_b.C

diff --git a/gcc/cp/name-lookup.cc b/gcc/cp/name-lookup.cc
index 3d3e20f48cb..71482db7b76 100644
--- a/gcc/cp/name-lookup.cc
+++ b/gcc/cp/name-lookup.cc
@@ -827,6 +827,17 @@ name_lookup::process_module_binding (tree new_val, tree 
new_type,
   marker |= 2;
 }
 
+  /* add_binding_entity wraps decls brought in by 'using' in an OVERLOAD even
+ for non-functions; strip it now.
+ ??? Why isn't it represented with a USING_DECL?  Or do we want to use
+ OVERLOAD for using more widely to address 114683?  */
+  if (new_val && TREE_CODE (new_val) == OVERLOAD
+  && !DECL_DECLARES_FUNCTION_P (OVL_FUNCTION (new_val)))
+{
+  gcc_checking_assert (OVL_USING_P (new_val) && !OVL_CHAIN (new_val));
+  new_val = OVL_FUNCTION (new_val);
+}
+
   if (new_type || new_val)
 marker |= process_binding (new_val, new_type);
 
diff --git a/gcc/testsuite/g++.dg/modules/using-23_a.C 
b/gcc/testsuite/g++.dg/modules/using-23_a.C
new file mode 100644
index 000..e7e6fecbea6
--- /dev/null
+++ b/gcc/testsuite/g++.dg/modules/using-23_a.C
@@ -0,0 +1,19 @@
+// PR c++/115194
+// { dg-additional-options "-fmodules-ts -Wno-global-module" }
+
+module;
+
+namespace NS1 {
+  namespace NS2 {
+class Thing {};
+  } // NS2
+  using NS2::Thing;
+} // NS1
+
+export module modA;
+
+export
+namespace NS1 {
+  using ::NS1::Thing;
+  namespace NS2 { }
+}
diff --git a/gcc/testsuite/g++.dg/modules/using-23_b.C 
b/gcc/testsuite/g++.dg/modules/using-23_b.C
new file mode 100644
index 000..6502c476b9b
--- /dev/null
+++ b/gcc/testsuite/g++.dg/modules/using-23_b.C
@@ -0,0 +1,7 @@
+// { dg-additional-options "-fmodules-ts" }
+
+import modA;
+
+using NS1::Thing;
+using namespace NS1::NS2;
+Thing thing;

base-commit: 99e6cf404e37655be303e71f20df03c284c7989e
-- 
2.44.0



Re: [PATCH 1/2] RISC-V: Fix vwsll combine on rv32 targets

2024-06-13 Thread Robin Dapp
> I did a test run without the subreg condition and it also appears to
> work when running on rv32gcv and rv64gcv newlib. Would it be better
> to remove the subreg?

Yep, if it works, i.e. all tests still pass then let's get rid of it.

Regards
 Robin



[PATCH v3] gcc, libcpp: Add warning switch for "#pragma once in main file" [PR89808]

2024-06-13 Thread Ken Matsui
This patch adds a warning switch for "#pragma once in main file".  The
warning option name is Wpragma-once-outside-header, which is the same
as Clang.

PR preprocessor/89808

gcc/c-family/ChangeLog:

* c.opt (Wpragma_once_outside_header): Define new option.

gcc/ChangeLog:

* doc/invoke.texi (Warning Options): Document
-Wno-pragma-once-outside-header.

libcpp/ChangeLog:

* include/cpplib.h (struct cpp_options): Define
cpp_warn_pragma_once_outside_header.
(cpp_warning_reason): Define CPP_W_PRAGMA_ONCE_OUTSIDE_HEADER.
* directives.cc (do_pragma_once): Use
cpp_warn_pragma_once_outside_header and
CPP_W_PRAGMA_ONCE_OUTSIDE_HEADER.
* init.cc (cpp_create_reader): Handle
cpp_warn_pragma_once_outside_header.

gcc/testsuite/ChangeLog:

* g++.dg/warn/Wno-pragma-once-outside-header.C: New test.
* g++.dg/warn/Wpragma-once-outside-header.C: New test.

Signed-off-by: Ken Matsui 
---
 gcc/c-family/c.opt |  4 
 gcc/doc/invoke.texi| 10 --
 .../g++.dg/warn/Wno-pragma-once-outside-header.C   |  5 +
 .../g++.dg/warn/Wpragma-once-outside-header.C  |  6 ++
 libcpp/directives.cc   |  9 ++---
 libcpp/include/cpplib.h|  7 ++-
 libcpp/init.cc |  1 +
 7 files changed, 36 insertions(+), 6 deletions(-)
 create mode 100644 gcc/testsuite/g++.dg/warn/Wno-pragma-once-outside-header.C
 create mode 100644 gcc/testsuite/g++.dg/warn/Wpragma-once-outside-header.C

diff --git a/gcc/c-family/c.opt b/gcc/c-family/c.opt
index 403abc1f26e..3439f36fe45 100644
--- a/gcc/c-family/c.opt
+++ b/gcc/c-family/c.opt
@@ -1188,6 +1188,10 @@ Wpragmas
 C ObjC C++ ObjC++ Var(warn_pragmas) Init(1) Warning
 Warn about misuses of pragmas.
 
+Wpragma-once-outside-header
+C ObjC C++ ObjC++ Var(warn_pragma_once_outside_header) 
CppReason(CPP_W_PRAGMA_ONCE_OUTSIDE_HEADER) Init(1) Warning
+Warn about #pragma once outside of a header.
+
 Wprio-ctor-dtor
 C ObjC C++ ObjC++ Var(warn_prio_ctor_dtor) Init(1) Warning
 Warn if constructor or destructors with priorities from 0 to 100 are used.
diff --git a/gcc/doc/invoke.texi b/gcc/doc/invoke.texi
index 9456ced468a..c7f17ca9eb7 100644
--- a/gcc/doc/invoke.texi
+++ b/gcc/doc/invoke.texi
@@ -391,8 +391,8 @@ Objective-C and Objective-C++ Dialects}.
 -Wpacked  -Wno-packed-bitfield-compat  -Wpacked-not-aligned  -Wpadded
 -Wparentheses  -Wno-pedantic-ms-format
 -Wpointer-arith  -Wno-pointer-compare  -Wno-pointer-to-int-cast
--Wno-pragmas  -Wno-prio-ctor-dtor  -Wredundant-decls
--Wrestrict  -Wno-return-local-addr  -Wreturn-type
+-Wno-pragmas  -Wno-pragma-once-outside-header  -Wno-prio-ctor-dtor
+-Wredundant-decls  -Wrestrict  -Wno-return-local-addr  -Wreturn-type
 -Wno-scalar-storage-order  -Wsequence-point
 -Wshadow  -Wshadow=global  -Wshadow=local  -Wshadow=compatible-local
 -Wno-shadow-ivar
@@ -7983,6 +7983,12 @@ Do not warn about misuses of pragmas, such as incorrect 
parameters,
 invalid syntax, or conflicts between pragmas.  See also
 @option{-Wunknown-pragmas}.
 
+@opindex Wno-pragma-once-outside-header
+@opindex Wpragma-once-outside-header
+@item -Wno-pragma-once-outside-header
+Do not warn when @code{#pragma once} is used in a file that is not a header
+file, such as a main file.
+
 @opindex Wno-prio-ctor-dtor
 @opindex Wprio-ctor-dtor
 @item -Wno-prio-ctor-dtor
diff --git a/gcc/testsuite/g++.dg/warn/Wno-pragma-once-outside-header.C 
b/gcc/testsuite/g++.dg/warn/Wno-pragma-once-outside-header.C
new file mode 100644
index 000..b5be4d25a9d
--- /dev/null
+++ b/gcc/testsuite/g++.dg/warn/Wno-pragma-once-outside-header.C
@@ -0,0 +1,5 @@
+// { dg-do assemble  }
+// { dg-options "-Wno-pragma-once-outside-header" }
+
+#pragma once
+int main() {}
diff --git a/gcc/testsuite/g++.dg/warn/Wpragma-once-outside-header.C 
b/gcc/testsuite/g++.dg/warn/Wpragma-once-outside-header.C
new file mode 100644
index 000..324b0638c3f
--- /dev/null
+++ b/gcc/testsuite/g++.dg/warn/Wpragma-once-outside-header.C
@@ -0,0 +1,6 @@
+// { dg-do assemble  }
+// { dg-options "-Werror=pragma-once-outside-header" }
+// { dg-message "some warnings being treated as errors" "" {target "*-*-*"} 0 }
+
+#pragma once  // { dg-error "#pragma once in main file" }
+int main() {}
diff --git a/libcpp/directives.cc b/libcpp/directives.cc
index 479f8c716e8..68f47104dea 100644
--- a/libcpp/directives.cc
+++ b/libcpp/directives.cc
@@ -1588,8 +1588,12 @@ do_pragma (cpp_reader *pfile)
 static void
 do_pragma_once (cpp_reader *pfile)
 {
-  if (_cpp_in_main_source_file (pfile))
-cpp_error (pfile, CPP_DL_WARNING, "#pragma once in main file");
+  const unsigned char warn_level =
+CPP_OPTION (pfile, cpp_warn_pragma_once_outside_header);
+
+  if (warn_level && _cpp_in_main_source_file (pfile))
+cpp_warning (pfile, 

Re: [PATCH v2] Target-independent store forwarding avoidance.

2024-06-13 Thread Jeff Law




On 6/13/24 5:40 AM, Manolis Tsamis wrote:



Could you please run the v3 with your tester? I assume some of the
additional fixes introduced may clear some of the other issues.

Will do.

jeff



Re: [PATCH 2/3] Enabled LRA for ia64.

2024-06-13 Thread Jonathan Wakely
On Thu, 13 Jun 2024 at 15:11, Jeff Law  wrote:
>
>
>
> On 6/13/24 4:33 AM, Jonathan Wakely wrote:
> > On Wed, 12 Jun 2024 at 22:00, Frank Scheiner  wrote:
> >>
> >> Hi Jonathan, Richard,
> >>
> >> On 12.06.24 20:54, Jonathan Wakely wrote:
> >>> On 12/06/24 16:09 +0200, Frank Scheiner wrote:
>  Dear Richard,
> 
>  On 12.06.24 13:01, Richard Biener wrote:
> > [...]
> > I can find two gcc-testresult postings, one apparently with LRA
> > and one without?  Both from May:
> >
> > https://sourceware.org/pipermail/gcc-testresults/2024-May/816422.html
> > https://sourceware.org/pipermail/gcc-testresults/2024-May/816346.html
> >
> > somehow for example libstdc++ summaries were not merged, it might
> > be you do not have recent python installed on the system?  Or you
> > didn't use contrib/test_summary to create those mails.
> 
>  No, I did not use contrib/test_summary. But I still have tarballs of
>  both testsuite runs, so could still produce these summaries - I hope?
> >>>
> >>> It looks like the results at
> >>> https://gcc.gnu.org/pipermail/gcc-testresults/2024-May/816422.html are
> >>> just what's printed on standard out, including output from 'make -j4'
> >>> so not combined into one set of results.
> >>
> >> That's what it is, yes.
> >>
> >>> It would certainly be better to either get the results from the .sum
> >>> files, or just use the contrib/test_summary script to do that for you.
> >>
> >> Ok, I posted the results as created by contrib/test_summary now:
> >>
> >> 1. non-LRA version on [1]
> >>
> >> 2. LRA version on [2]
> >>
> >> [1]: https://gcc.gnu.org/pipermail/gcc-testresults/2024-June/817267.html
> >>
> >> [2]: https://gcc.gnu.org/pipermail/gcc-testresults/2024-June/817268.html
> >
> > Thanks!
> >
> > These ones are probably due to non-reserved names in glibc or kernel 
> > headers:
> >
> > FAIL: 17_intro/names.cc  -std=gnu++17 (test for excess errors)
> > FAIL: 17_intro/names_pstl.cc  -std=gnu++17 (test for excess errors)
> > FAIL: experimental/names.cc  -std=gnu++17 (test for excess errors)
> >
> > The errors for all three are probably the same and should be
> > decipherable from libstdc++.log which will show which names defined as
> > macros in names.cc are clashing with names in system headers.
> And wouldn't failure of these imply that the headers are either ancient
> with some kind of pollution or that there's a ia64 specific goof in the
> headers?

Yes, indeed. It probably means some ia64-specific structures in kernel
headers use non-reserved names like "next" or "ptr" or something,
instead of __next or __ptr.

>  These tests work on the other linux targets AFAIK.

Most of them, yes. I think Jakub noticed some failures on s390x linux
recently, due to bad names in s390x-specific structs in the kernel
headers.



Re: [PATCH 2/3] Enabled LRA for ia64.

2024-06-13 Thread Jeff Law




On 6/13/24 4:33 AM, Jonathan Wakely wrote:

On Wed, 12 Jun 2024 at 22:00, Frank Scheiner  wrote:


Hi Jonathan, Richard,

On 12.06.24 20:54, Jonathan Wakely wrote:

On 12/06/24 16:09 +0200, Frank Scheiner wrote:

Dear Richard,

On 12.06.24 13:01, Richard Biener wrote:

[...]
I can find two gcc-testresult postings, one apparently with LRA
and one without?  Both from May:

https://sourceware.org/pipermail/gcc-testresults/2024-May/816422.html
https://sourceware.org/pipermail/gcc-testresults/2024-May/816346.html

somehow for example libstdc++ summaries were not merged, it might
be you do not have recent python installed on the system?  Or you
didn't use contrib/test_summary to create those mails.


No, I did not use contrib/test_summary. But I still have tarballs of
both testsuite runs, so could still produce these summaries - I hope?


It looks like the results at
https://gcc.gnu.org/pipermail/gcc-testresults/2024-May/816422.html are
just what's printed on standard out, including output from 'make -j4'
so not combined into one set of results.


That's what it is, yes.


It would certainly be better to either get the results from the .sum
files, or just use the contrib/test_summary script to do that for you.


Ok, I posted the results as created by contrib/test_summary now:

1. non-LRA version on [1]

2. LRA version on [2]

[1]: https://gcc.gnu.org/pipermail/gcc-testresults/2024-June/817267.html

[2]: https://gcc.gnu.org/pipermail/gcc-testresults/2024-June/817268.html


Thanks!

These ones are probably due to non-reserved names in glibc or kernel headers:

FAIL: 17_intro/names.cc  -std=gnu++17 (test for excess errors)
FAIL: 17_intro/names_pstl.cc  -std=gnu++17 (test for excess errors)
FAIL: experimental/names.cc  -std=gnu++17 (test for excess errors)

The errors for all three are probably the same and should be
decipherable from libstdc++.log which will show which names defined as
macros in names.cc are clashing with names in system headers.
And wouldn't failure of these imply that the headers are either ancient 
with some kind of pollution or that there's a ia64 specific goof in the 
headers?  These tests work on the other linux targets AFAIK.


jeff



Re: [PATCH] c++: ICE w/ ambig and non-strictly-viable cands [PR115239]

2024-06-13 Thread Patrick Palka
On Wed, 12 Jun 2024, Jason Merrill wrote:

> On 6/12/24 13:56, Patrick Palka wrote:
> > Bootstrapped and regtested on x86_64-pc-linux-gnu, does this look OK for
> > trunk/14?
> > 
> > -- >8 --
> > 
> > Here during overload resolution we have two strictly viable ambiguous
> > candidates #1 and #2, and two non-strictly viable candidates #3 and #4
> > which we hold on to ever since r14-6522.  These latter candidates have
> > an empty third arg conversion since the second arg conversion was deemed
> > bad.  This ends up causing an ICE during joust for #3 and #4 due to this
> > empty arg conversion.
> > 
> > We can fix this by making joust robust to empty arg conversions, but in
> > this situation we shouldn't need to compare #3 and #4 at all given that
> > we have a strictly viable candidate.  To that end, this patch makes
> > tourney shortcut considering non-strictly viable candidates upon
> > encountering ambiguity between two strictly viable candidates, taking
> > advantage of the fact that the candidates list is sorted according to
> > viability via splice_viable.
> > 
> > PR c++/115239
> > 
> > gcc/cp/ChangeLog:
> > 
> > * call.cc (tourney): Don't consider a non-strictly viable
> > candidate as the champ if there was ambiguity between two
> > strictly viable candidates.
> > 
> > gcc/testsuite/ChangeLog:
> > 
> > * g++.dg/overload/error7.C: New test.
> > ---
> >   gcc/cp/call.cc |  4 +++-
> >   gcc/testsuite/g++.dg/overload/error7.C | 10 ++
> >   2 files changed, 13 insertions(+), 1 deletion(-)
> >   create mode 100644 gcc/testsuite/g++.dg/overload/error7.C
> > 
> > diff --git a/gcc/cp/call.cc b/gcc/cp/call.cc
> > index ed68eb3c568..82c70f5c39f 100644
> > --- a/gcc/cp/call.cc
> > +++ b/gcc/cp/call.cc
> > @@ -13484,9 +13484,11 @@ tourney (struct z_candidate *candidates,
> > tsubst_flags_t complain)
> > }
> > else
> > {
> > + z_candidate *prev_champ = *champ;
> >   previous_worse_champ = nullptr;
> >   champ = &(*challenger)->next;
> > - if (!*champ || !(*champ)->viable)
> > + if (!*champ || !(*champ)->viable
> > + || (prev_champ->viable == 1 && (*champ)->viable == -1))
> 
> Maybe
> 
> (!*champ || (*champ)->viable < prev_champ->viable) ?
> 
> OK with that change.

Nice, done.  In passing it occurred to me that we can just consider
challenger->viable instead of prev_champ->viable (they must be the
same since there was ambiguity), so we don't need to introduce
prev_champ.
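
As a rough model of the resulting condition (not the call.cc code itself;
stop_after_ambiguity and its parameters are hypothetical names), assume the
viable field is 1 for a strictly viable candidate, -1 for a non-strictly
viable one and 0 for a non-viable one, and that the candidate list is sorted
so viability never increases:

```cpp
// Decide whether tourney should give up on finding a champ after the
// current champ and the challenger turned out to be ambiguous.
//   have_next         - is there a candidate after the challenger?
//   next_viable       - viable value of that next candidate
//   challenger_viable - viable value of the challenger
constexpr bool
stop_after_ambiguity (bool have_next, int next_viable, int challenger_viable)
{
  return !have_next                          // nothing left to consider
         || !next_viable                     // only non-viable candidates remain
         || next_viable < challenger_viable; // would drop to a worse class
}
```

With #1 and #2 ambiguous (both 1) and #3 next (-1), the last test fires and
joust is never asked to compare #3 with #4.  When every candidate is
non-strictly viable, the comparison is -1 < -1, so the search continues as
before.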

This is what I ended up pushing:

-- >8 --

Subject: [PATCH] c++: ICE w/ ambig and non-strictly-viable cands [PR115239]

PR c++/115239

gcc/cp/ChangeLog:

* call.cc (tourney): Don't consider a non-strictly viable
candidate as the champ if there was ambiguity between two
strictly viable candidates.

gcc/testsuite/ChangeLog:

* g++.dg/overload/error7.C: New test.

Reviewed-by: Jason Merrill 
---
 gcc/cp/call.cc |  3 ++-
 gcc/testsuite/g++.dg/overload/error7.C | 10 ++
 2 files changed, 12 insertions(+), 1 deletion(-)
 create mode 100644 gcc/testsuite/g++.dg/overload/error7.C

diff --git a/gcc/cp/call.cc b/gcc/cp/call.cc
index 85536fc25ff..7bbc1fb0c78 100644
--- a/gcc/cp/call.cc
+++ b/gcc/cp/call.cc
@@ -13490,7 +13490,8 @@ tourney (struct z_candidate *candidates, tsubst_flags_t 
complain)
{
  previous_worse_champ = nullptr;
  champ = &(*challenger)->next;
- if (!*champ || !(*champ)->viable)
+ if (!*champ || !(*champ)->viable
+ || (*champ)->viable < (*challenger)->viable)
{
  champ = nullptr;
  break;
diff --git a/gcc/testsuite/g++.dg/overload/error7.C 
b/gcc/testsuite/g++.dg/overload/error7.C
new file mode 100644
index 000..de50ce5f66e
--- /dev/null
+++ b/gcc/testsuite/g++.dg/overload/error7.C
@@ -0,0 +1,10 @@
+// PR c++/115239
+
+bool foo(char *, long); // #1, strictly viable, ambig with #2
+bool foo(char *, unsigned); // #2, strictly viable, ambig with #1
+bool foo(char, long);   // #3, non-strictly viable
+bool foo(char, unsigned);   // #4, non-strictly viable
+
+int main() {
+  foo((char *)0, 0); // { dg-error "ambiguous" }
+}
-- 
2.45.2.457.g8d94cfb545



Re: [PATCH 10/52] jit: Replace uses of {FLOAT,{,LONG_}DOUBLE}_TYPE_SIZE

2024-06-13 Thread David Malcolm
On Sun, 2024-06-02 at 22:01 -0500, Kewen Lin wrote:
> Joseph pointed out "floating types should have their mode,
> not a poorly defined precision value" in the discussion[1],
> as he and Richi suggested, the existing macros
> {FLOAT,{,LONG_}DOUBLE}_TYPE_SIZE will be replaced with a
> hook mode_for_floating_type.  Unlike the other FEs, for the
> uses in recording::memento_of_get_type::get_size, since
> {float,{,long_}double}_type_node haven't been initialized
> yet, this is to replace {FLOAT,{,LONG_}DOUBLE}_TYPE_SIZE
> with calling hook targetm.c.mode_for_floating_type.
> 
> [1] https://gcc.gnu.org/pipermail/gcc-patches/2024-May/651209.html
> 
> gcc/jit/ChangeLog:
> 
> * jit-recording.cc
> (recording::memento_of_get_type::get_size): Update
> macros {FLOAT,DOUBLE,LONG_DOUBLE}_TYPE_SIZE by calling
> targetm.c.mode_for_floating_type with
> TI_{FLOAT,DOUBLE,LONG_DOUBLE}_TYPE.
> ---
>  gcc/jit/jit-recording.cc | 12 
>  1 file changed, 8 insertions(+), 4 deletions(-)
> 
> diff --git a/gcc/jit/jit-recording.cc b/gcc/jit/jit-recording.cc
> index 68a2e860c1f..7719b898e57 100644
> --- a/gcc/jit/jit-recording.cc
> +++ b/gcc/jit/jit-recording.cc
> @@ -21,7 +21,7 @@ along with GCC; see the file COPYING3.  If not see
>  #include "config.h"
>  #include "system.h"
>  #include "coretypes.h"
> -#include "tm.h"
> +#include "target.h"
>  #include "pretty-print.h"
>  #include "toplev.h"
>  
> @@ -2353,6 +2353,7 @@ size_t
>  recording::memento_of_get_type::get_size ()
>  {
>    int size;
> +  machine_mode m;
>    switch (m_kind)
>  {
>  case GCC_JIT_TYPE_VOID:
> @@ -2399,13 +2400,16 @@ recording::memento_of_get_type::get_size ()
>    size = 128;
>    break;
>  case GCC_JIT_TYPE_FLOAT:
> -  size = FLOAT_TYPE_SIZE;
> +  m = targetm.c.mode_for_floating_type (TI_FLOAT_TYPE);
> +  size = GET_MODE_PRECISION (m).to_constant ();
>    break;
>  case GCC_JIT_TYPE_DOUBLE:
> -  size = DOUBLE_TYPE_SIZE;
> +  m = targetm.c.mode_for_floating_type (TI_DOUBLE_TYPE);
> +  size = GET_MODE_PRECISION (m).to_constant ();
>    break;
>  case GCC_JIT_TYPE_LONG_DOUBLE:
> -  size = LONG_DOUBLE_TYPE_SIZE;
> +  m = targetm.c.mode_for_floating_type (TI_LONG_DOUBLE_TYPE);
> +  size = GET_MODE_PRECISION (m).to_constant ();
>    break;
>  case GCC_JIT_TYPE_SIZE_T:
>    size = MAX_BITS_PER_WORD;

[CCing jit mailing list]

Thanks for the patch; sorry for the delay in responding.

Did your testing include jit?  Note that --enable-languages=all does
*not* include it (due to it needing --enable-host-shared).

The jit::recording code runs *very* early - before toplev::main.  For
example, a call to gcc_jit_type_get_size can trigger the above code
path before toplev::main has run.

target.h says each target should have a:

  struct gcc_target targetm = TARGET_INITIALIZER;

Has targetm.c.mode_for_floating_type been initialized enough by that
static initialization?  Could the mode_for_floating_type hook be
relying on some target-specific dynamic initialization that hasn't run
yet?  (e.g. taking account of command-line options?)
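
To illustrate the concern, here is a minimal sketch of a hook table whose
function pointer is valid from static initialization but whose result still
depends on state set later.  All names here (hooks, target_hooks,
mode_bits_for, process_options, long_double_bits) are hypothetical and not
taken from GCC; whether any real target's mode_for_floating_type behaves like
this is exactly the open question.

```cpp
#include <cassert>

// Hypothetical miniature of a target hook table; none of these names are GCC's.
enum float_kind { KIND_FLOAT, KIND_DOUBLE, KIND_LONG_DOUBLE };

// State that a target might only set while processing command-line options.
// Zero means "option handling has not run yet".
static int long_double_bits = 0;

// Hook implementation: two answers are fixed, one depends on the state above.
static int
mode_bits_for (float_kind k)
{
  switch (k)
    {
    case KIND_FLOAT:
      return 32;
    case KIND_DOUBLE:
      return 64;
    case KIND_LONG_DOUBLE:
      return long_double_bits;
    }
  return 0;
}

struct hooks
{
  int (*mode_bits) (float_kind);
};

// Statically initialized, so the pointer itself is usable very early.
static hooks target_hooks = { mode_bits_for };

// Stand-in for the dynamic, option-dependent part of target setup.
static void
process_options ()
{
  long_double_bits = 128;
}
```

In this sketch, calling target_hooks.mode_bits (KIND_LONG_DOUBLE) before
process_options has run yields 0 rather than 128, while the float and double
answers are already correct.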

Dave



[COMMITTED 29/30] ada: Remove -gnatdJ switch

2024-06-13 Thread Marc Poulhiès
From: Viljar Indus 

Using -gnatdJ with various other switches was error prone.
Remove this switch since the primary users of this mode,
GNATCheck and CodePeer, no longer need it.

gcc/ada/

* debug.adb: Remove mentions of -gnatdJ.
* errout.adb: Remove printing subprogram names to JSON.
* erroutc.adb: Remove printing subprogram names in messages.
* erroutc.ads: Remove Node and Subprogram_Name_Ptr used for -gnatdJ.
* errutil.adb: Remove Node used for -gnatdJ.
* gnat1drv.adb: Remove references to -gnatdJ and
Include_Subprogram_In_Messages.
* opt.ads: Remove Include_Subprogram_In_Messages.
* par-util.adb: Remove behavior related to
Include_Subprogram_In_Messages.
* sem_util.adb: Remove Subprogram_Name used for -gnatdJ.

Tested on x86_64-pc-linux-gnu, committed on master.

---
 gcc/ada/debug.adb|   6 ---
 gcc/ada/errout.adb   |  62 +++
 gcc/ada/erroutc.adb  |  20 +---
 gcc/ada/erroutc.ads  |  18 ---
 gcc/ada/errutil.adb  |   3 +-
 gcc/ada/gnat1drv.adb |   7 ---
 gcc/ada/opt.ads  |   4 --
 gcc/ada/par-util.adb |   6 ---
 gcc/ada/sem_util.adb | 116 ---
 9 files changed, 22 insertions(+), 220 deletions(-)

diff --git a/gcc/ada/debug.adb b/gcc/ada/debug.adb
index 540db2a9942..602a8fa0b63 100644
--- a/gcc/ada/debug.adb
+++ b/gcc/ada/debug.adb
@@ -67,7 +67,6 @@ package body Debug is
--  dG   Generate all warnings including those normally suppressed
--  dH   Hold (kill) call to gigi
--  dI   Inhibit internal name numbering in gnatG listing
-   --  dJ   Prepend subprogram name in messages
--  dK   Kill all error messages
--  dL   Ignore external calls from instances for elaboration
--  dM   Assume all variables are modified (no current values)
@@ -615,11 +614,6 @@ package body Debug is
--   is used in the fixed bugs run to minimize system and version
--   dependency in filed -gnatD or -gnatG output.
 
-   --  dJ   Prepend the name of the enclosing subprogram in compiler messages
-   --   (errors, warnings, style checks). This is useful in particular to
-   --   integrate compiler warnings in static analysis tools such as
-   --   CodePeer.
-
--  dK   Kill all error messages. This debug flag suppresses the output
--   of all error messages. It is used in regression tests where the
--   error messages are target dependent and irrelevant.
diff --git a/gcc/ada/errout.adb b/gcc/ada/errout.adb
index 92c4f6a4635..76c461a2fd7 100644
--- a/gcc/ada/errout.adb
+++ b/gcc/ada/errout.adb
@@ -100,8 +100,7 @@ package body Errout is
  (Msg  : String;
   Span : Source_Span;
   Opan : Source_Span;
-  Msg_Cont : Boolean;
-  Node : Node_Id);
+  Msg_Cont : Boolean);
--  This is the low-level routine used to post messages after dealing with
--  the issue of messages placed on instantiations (which get broken up
--  into separate calls in Error_Msg). Span is the location on which the
@@ -112,9 +111,7 @@ package body Errout is
--  copy. So typically we can see Opan pointing to the template location
--  in an instantiation copy when Span points to the source location of
--  the actual instantiation (i.e the line with the new). Msg_Cont is
-   --  set true if this is a continuation message. Node is the relevant
-   --  Node_Id for this message, to be used to compute the enclosing entity if
-   --  Opt.Include_Subprogram_In_Messages is set.
+   --  set true if this is a continuation message.
 
function No_Warnings (N : Node_Or_Entity_Id) return Boolean;
--  Determines if warnings should be suppressed for the given node
@@ -475,7 +472,7 @@ package body Errout is
   --  Error_Msg_Internal to place the message in the requested location.
 
   if Instantiation (Sindex) = No_Location then
- Error_Msg_Internal (Msg, Flag_Span, Flag_Span, False, N);
+ Error_Msg_Internal (Msg, Flag_Span, Flag_Span, False);
  return;
   end if;
 
@@ -573,32 +570,28 @@ package body Errout is
(Msg  => "info: in inlined body #",
 Span => To_Span (Actual_Error_Loc),
 Opan => Flag_Span,
-Msg_Cont => Msg_Cont_Status,
-Node => N);
+Msg_Cont => Msg_Cont_Status);
 
   elsif Is_Warning_Msg then
  Error_Msg_Internal
(Msg  => Warn_Insertion & "in inlined body #",
 Span => To_Span (Actual_Error_Loc),
 Opan => Flag_Span,
-Msg_Cont => Msg_Cont_Status,
-Node => N);
+Msg_Cont => Msg_Cont_Status);
 
   elsif Is_Style_Msg then
  Error_Msg_Internal
(Msg 

Re: [PATCH] rs6000: Compute rop_hash_save_offset for non-Altivec compiles [PR115389]

2024-06-13 Thread Peter Bergner
On 6/13/24 12:35 AM, Kewen.Lin wrote:
>> @@ -826,7 +826,14 @@ rs6000_stack_info (void)
>>info->ehrd_offset -= info->rop_hash_size;
>>  }
>>else
>> -info->ehrd_offset = info->gp_save_offset - ehrd_size;
>> +{
>> +  info->ehrd_offset = info->gp_save_offset - ehrd_size;
>> +
>> +  /* Adjust for ROP protection.  */
>> +  info->rop_hash_save_offset
>> += info->gp_save_offset - info->rop_hash_size;
>> +  info->ehrd_offset -= info->rop_hash_size;
>> +}
> 
> I understand this is just copied from the if arm, but if I read this right,
> it can be simplified as:

Ok, I'll retest with that simplification.





>> +/* { dg-do assemble } */
>> +/* { dg-options "-O2 -mdejagnu-cpu=power10 -mrop-protect -mno-vsx -mno-altivec -mabi=no-altivec -save-temps" } */
> 
> I'd expect -mabi=no-altivec is default for -mno-altivec, but specifying it
> explicitly looks fine to me. :)

That's what I expected too! :-)  However, I was surprised to learn that
-mno-altivec does *not* disable TARGET_ALTIVEC_ABI.  I had to explicitly use
the -mabi= option to expose the bug.



Peter


