date:20220325

Re: [PATCH] x86: Use x constraint on KL patterns

2022-03-25 Thread H.J. Lu via Gcc-patches

On Fri, Mar 25, 2022 at 7:04 PM Hongyu Wang  wrote:
>
> > > Is it possible to create a test case that gas would throw an error for
> > > invalid operands?
> >
> > You can use -ffix-xmmN to disable XMM0-15.
>
> I mean can we create an intrinsic test for this PR that produces xmm16-31?
> And the -ffix-xmmN is an option for assembler or compiler? I didn't
> find it in document.

You can add -march=skylake-avx512 -ffix-xmm0 ... -ffix-xmm15 to force
XMM16-XMM31.
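
Editorial note: GCC documents this option as -ffixed-reg (for example -ffixed-xmm0), so the "-ffix-xmmN" above is best read as shorthand. It is a compiler option, not an assembler option. The C sketch below only builds the flag string that keeps XMM0 through XMM15 away from the register allocator; the helper name build_fixed_xmm_options is hypothetical and is not part of GCC or its testsuite.

```c
#include <stdio.h>
#include <stddef.h>

/* Write "-ffixed-xmm<first> ... -ffixed-xmm<last>" into BUF, separated by
   single spaces.  Returns the number of characters written (excluding the
   terminating NUL), or 0 if BUF is too small.  Hypothetical helper, shown
   only to illustrate the flag spelling.  */
size_t build_fixed_xmm_options(char *buf, size_t size, int first, int last)
{
  size_t used = 0;

  if (size == 0)
    return 0;
  buf[0] = '\0';

  for (int n = first; n <= last; n++)
    {
      int w = snprintf(buf + used, size - used, "%s-ffixed-xmm%d",
                       n == first ? "" : " ", n);
      if (w < 0 || (size_t) w >= size - used)
        {
          /* Not enough room: report failure with an empty string.  */
          buf[0] = '\0';
          return 0;
        }
      used += (size_t) w;
    }
  return used;
}
```

Calling build_fixed_xmm_options (buf, sizeof buf, 0, 15) and appending the result to -march=skylake-avx512 gives the option set suggested above for a dg-options line.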

> On Sat, Mar 26, 2022 at 09:22, H.J. Lu wrote:
> >
> > On Fri, Mar 25, 2022 at 6:08 PM Hongyu Wang  wrote:
> > >
> > > Is it possible to create a test case that gas would throw an error for
> > > invalid operands?
> >
> > You can use -ffix-xmmN to disable XMM0-15.
> >
> > > On Sat, Mar 26, 2022 at 04:50, H.J. Lu via Gcc-patches wrote:
> > > >
> > > > Since KL instructions have no AVX512 version, replace the "v" register
> > > > constraint with the "x" register constraint.
> > > >
> > > > PR target/105058
> > > > * config/i386/sse.md (loadiwkey): Replace "v" with "x".
> > > > > (aes<aesklvariant>u8): Likewise.
> > > > ---
> > > >  gcc/config/i386/sse.md | 6 +++---
> > > >  1 file changed, 3 insertions(+), 3 deletions(-)
> > > >
> > > > diff --git a/gcc/config/i386/sse.md b/gcc/config/i386/sse.md
> > > > index 29802d00ce6..33bd2c4768a 100644
> > > > --- a/gcc/config/i386/sse.md
> > > > +++ b/gcc/config/i386/sse.md
> > > > > @@ -28364,8 +28364,8 @@ (define_insn "avx512f_dpbf16ps_<mode>_mask"
> > > >
> > > >  ;; KEYLOCKER
> > > >  (define_insn "loadiwkey"
> > > > -  [(unspec_volatile:V2DI [(match_operand:V2DI 0 "register_operand" "v")
> > > > - (match_operand:V2DI 1 "register_operand" "v")
> > > > +  [(unspec_volatile:V2DI [(match_operand:V2DI 0 "register_operand" "x")
> > > > + (match_operand:V2DI 1 "register_operand" "x")
> > > >   (match_operand:V2DI 2 "register_operand" "Yz")
> > > >   (match_operand:SI   3 "register_operand" "a")]
> > > >  UNSPECV_LOADIWKEY)
> > > > @@ -28498,7 +28498,7 @@ (define_int_attr aesklvariant
> > > > (UNSPECV_AESENC256KLU8 "enc256kl")])
> > > >
> > > > >  (define_insn "aes<aesklvariant>u8"
> > > > > -  [(set (match_operand:V2DI 0 "register_operand" "=v")
> > > > > +  [(set (match_operand:V2DI 0 "register_operand" "=x")
> > > > > (unspec_volatile:V2DI [(match_operand:V2DI 1 "register_operand" "0")
> > > > >(match_operand:BLK   2 "memory_operand" "m")]
> > > >   AESDECENCKL))
> > > > --
> > > > 2.35.1
> > > >
> >
> >
> >
> > --
> > H.J.



-- 
H.J.

Re: [PATCH] x86: Use x constraint on KL patterns

2022-03-25 Thread Hongyu Wang via Gcc-patches

> > Is it possible to create a test case that gas would throw an error for
> > invalid operands?
>
> You can use -ffix-xmmN to disable XMM0-15.

I mean can we create an intrinsic test for this PR that produces xmm16-31?
And the -ffix-xmmN is an option for assembler or compiler? I didn't
find it in document.

On Sat, Mar 26, 2022 at 09:22, H.J. Lu wrote:
>
> On Fri, Mar 25, 2022 at 6:08 PM Hongyu Wang  wrote:
> >
> > Is it possible to create a test case that gas would throw an error for
> > invalid operands?
>
> You can use -ffix-xmmN to disable XMM0-15.
>
> > On Sat, Mar 26, 2022 at 04:50, H.J. Lu via Gcc-patches wrote:
> > >
> > > Since KL instructions have no AVX512 version, replace the "v" register
> > > constraint with the "x" register constraint.
> > >
> > > PR target/105058
> > > * config/i386/sse.md (loadiwkey): Replace "v" with "x".
> > > (aes<aesklvariant>u8): Likewise.
> > > ---
> > >  gcc/config/i386/sse.md | 6 +++---
> > >  1 file changed, 3 insertions(+), 3 deletions(-)
> > >
> > > diff --git a/gcc/config/i386/sse.md b/gcc/config/i386/sse.md
> > > index 29802d00ce6..33bd2c4768a 100644
> > > --- a/gcc/config/i386/sse.md
> > > +++ b/gcc/config/i386/sse.md
> > > @@ -28364,8 +28364,8 @@ (define_insn "avx512f_dpbf16ps_<mode>_mask"
> > >
> > >  ;; KEYLOCKER
> > >  (define_insn "loadiwkey"
> > > -  [(unspec_volatile:V2DI [(match_operand:V2DI 0 "register_operand" "v")
> > > - (match_operand:V2DI 1 "register_operand" "v")
> > > +  [(unspec_volatile:V2DI [(match_operand:V2DI 0 "register_operand" "x")
> > > + (match_operand:V2DI 1 "register_operand" "x")
> > >   (match_operand:V2DI 2 "register_operand" "Yz")
> > >   (match_operand:SI   3 "register_operand" "a")]
> > >  UNSPECV_LOADIWKEY)
> > > @@ -28498,7 +28498,7 @@ (define_int_attr aesklvariant
> > > (UNSPECV_AESENC256KLU8 "enc256kl")])
> > >
> > >  (define_insn "aes<aesklvariant>u8"
> > > -  [(set (match_operand:V2DI 0 "register_operand" "=v")
> > > +  [(set (match_operand:V2DI 0 "register_operand" "=x")
> > > (unspec_volatile:V2DI [(match_operand:V2DI 1 "register_operand" "0")
> > >(match_operand:BLK   2 "memory_operand" "m")]
> > >   AESDECENCKL))
> > > --
> > > 2.35.1
> > >
>
>
>
> --
> H.J.

[PATCH, stage 1] Fortran: Add support for OMP non-rectangular loops

2022-03-25 Thread Sandra Loosemore

This patch adds Fortran support for OMP 5.1 "canonical loop nest form" 
and non-rectangular loops.  The C/C++ and middle-end support is already 
present except for some missing constraint checks in the gimplifier, 
which I've added here.  There's still a TODO with respect to the 
not-yet-implemented TILE construct.
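
Editorial note: a non-rectangular loop nest is one in which an inner loop's bounds depend on an outer iteration variable. Since the message says C/C++ support is already present, the idea can be sketched in C; the function name count_triangle is hypothetical and is not taken from the patch or its tests. Without -fopenmp the pragma is ignored and the function still computes the same result.

```c
/* Count the iterations of a triangular (non-rectangular) loop nest.
   The inner lower bound "j = i" refers to the outer iteration variable,
   which is the case the canonical loop nest form has to allow when the
   two loops are collapsed.  Hypothetical example.  */
long count_triangle(int n)
{
  long count = 0;

#pragma omp parallel for collapse(2) reduction(+ : count)
  for (int i = 0; i < n; i++)
    for (int j = i; j < n; j++)
      count++;

  return count;
}
```

For n = 4 the nest runs 4 + 3 + 2 + 1 = 10 iterations, that is n * (n + 1) / 2.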


Is this OK for stage 1 when the time comes?

-Sandra

commit c46a79a9841b90fb0cde564e4147932290d91832
Author: Sandra Loosemore 
Date:   Fri Mar 25 14:14:37 2022 -0700

Fortran: Add support for OMP non-rectangular loops.

This patch adds support for OMP 5.1 "canonical loop nest form" to the
Fortran front end, marks non-rectangular loops for processing
by the middle end, and implements missing checks in the gimplifier
for additional prohibitions on non-rectangular loops.

Note that the OMP spec also prohibits non-rectangular loops with the TILE
construct; that construct hasn't been implemented yet, so that error will
need to be filled in later.

	gcc/fortran/
	* gfortran.h (struct gfc_omp_clauses): Add non_rectangular bit.
	* openmp.cc (is_outer_iteration_variable): New function.
	(expr_is_invariant): New function.
	(bound_expr_is_canonical): New function.
	(resolve_omp_do): Replace existing non-rectangularity error with
	check for canonical form and setting non_rectangular bit.
	* trans-openmp.cc (gfc_trans_omp_do): Transfer non_rectangular
	flag to generated tree structure.

	gcc/
	* gimplify.cc (gimplify_omp_for): Update messages for SCHEDULED
	and ORDERED clause conflict errors.  Add check for GRAINSIZE and
	NUM_TASKS on TASKLOOP.

	gcc/testsuite/
	* c-c++-common/gomp/loop-6.c (f3): New function to test TASKLOOP
	diagnostics.
	* gfortran.dg/gomp/collapse1.f90: Update expected messages.
	* gfortran.dg/gomp/pr85313.f90: Remove dg-error on non-rectangular
	loops that are now accepted.
	* gfortran.dg/gomp/non-rectangular-loop.f90: New file.
	* gfortran.dg/gomp/canonical-loop-1.f90: New file.
	* gfortran.dg/gomp/canonical-loop-2.f90: New file.

diff --git a/gcc/fortran/gfortran.h b/gcc/fortran/gfortran.h
index 7bf1d5a..1bce283 100644
--- a/gcc/fortran/gfortran.h
+++ b/gcc/fortran/gfortran.h
@@ -1533,6 +1533,7 @@ typedef struct gfc_omp_clauses
   unsigned simd:1, threads:1, depend_source:1, destroy:1, order_concurrent:1;
   unsigned order_unconstrained:1, order_reproducible:1, capture:1;
   unsigned grainsize_strict:1, num_tasks_strict:1, compare:1, weak:1;
+  unsigned non_rectangular:1;
   ENUM_BITFIELD (gfc_omp_sched_kind) sched_kind:3;
   ENUM_BITFIELD (gfc_omp_device_type) device_type:2;
   ENUM_BITFIELD (gfc_omp_memorder) memorder:3;
diff --git a/gcc/fortran/openmp.cc b/gcc/fortran/openmp.cc
index 7141481..4d3fcc8 100644
--- a/gcc/fortran/openmp.cc
+++ b/gcc/fortran/openmp.cc
@@ -8446,6 +8446,105 @@ gfc_resolve_omp_local_vars (gfc_namespace *ns)
 gfc_traverse_ns (ns, handle_local_var);
 }
 
+/* CODE is an OMP loop construct.  Return true if VAR matches an iteration
+   variable outer to level DEPTH.  */
+static bool
+is_outer_iteration_variable (gfc_code *code, int depth, gfc_symbol *var)
+{
+  int i;
+  gfc_code *do_code = code->block->next;
+
+  for (i = 1; i < depth; i++)
+    {
+      gfc_symbol *ivar = do_code->ext.iterator->var->symtree->n.sym;
+      if (var == ivar)
+	return true;
+      do_code = do_code->block->next;
+    }
+  return false;
+}
+
+/* CODE is an OMP loop construct.  Return true if EXPR does not reference
+   any iteration variables outer to level DEPTH.  */
+static bool
+expr_is_invariant (gfc_code *code, int depth, gfc_expr *expr)
+{
+  int i;
+  gfc_code *do_code = code->block->next;
+
+  for (i = 1; i < depth; i++)
+    {
+      gfc_symbol *ivar = do_code->ext.iterator->var->symtree->n.sym;
+      if (gfc_find_sym_in_expr (ivar, expr))
+	return false;
+      do_code = do_code->block->next;
+    }
+  return true;
+}
+
+/* CODE is an OMP loop construct.  Return true if EXPR matches one of the
+   canonical forms for a bound expression.  It may include references to
+   an iteration variable outer to level DEPTH; set OUTER_VARP if so.  */
+static bool
+bound_expr_is_canonical (gfc_code *code, int depth, gfc_expr *expr,
+			 gfc_symbol **outer_varp)
+{
+  gfc_expr *expr2 = NULL;
+
+  /* Rectangular case.  */
+  if (depth == 0 || expr_is_invariant (code, depth, expr))
+    return true;
+
+  /* Any simple variable that didn't pass expr_is_invariant must be
+     an outer_var.  */
+  if (expr->expr_type == EXPR_VARIABLE && expr->rank == 0)
+    {
+      *outer_varp = expr->symtree->n.sym;
+      return true;
+    }
+
+  /* All other permitted forms are binary operators.  */
+  if (expr->expr_type != EXPR_OP)
+    return false;
+
+  /* Check for plus/minus a loop invariant expr.  */
+  if (expr->value.op.op == INTRINSIC_PLUS
+  || expr->value.op.op == INTRINSIC_MINUS)
+{
+  if (expr_is_invariant (code, depth,

[PATCH] Fortran: Add location info to OpenMP tree nodes

2022-03-25 Thread Sandra Loosemore

I've got another patch forthcoming (stage 1 material) that adds some new 
diagnostics for non-rectangular loops during gimplification of OMP 
nodes.  When I was working on that, I discovered that the Fortran front 
end wasn't attaching location information to the tree nodes 
corresponding to the various OMP directives, so the new errors weren't 
coming out with location info either.  I went through trans-openmp.cc 
and fixed all the places where make_node was being called to explicitly 
set the location.
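
Editorial note: the practical effect of a missing location is that the diagnostic is prefixed with the compiler's program name rather than a file:line:column triple. The C sketch below is a toy model of that behaviour only; struct toy_loc and format_diag_prefix are hypothetical and do not reflect GCC's real location_t or diagnostic machinery.

```c
#include <stdio.h>

/* Toy stand-in for a source location.  A NULL file or a line of 0 plays
   the role of an unknown location.  */
struct toy_loc
{
  const char *file;
  int line;
  int column;
};

/* Format the leading part of a warning into BUF.  When LOC is unknown,
   fall back to PROGNAME, which is all a reader would then see.  Returns
   the value of snprintf.  */
int format_diag_prefix(char *buf, size_t size, const char *progname,
                       struct toy_loc loc)
{
  if (loc.file == NULL || loc.line == 0)
    return snprintf(buf, size, "%s: warning: ", progname);
  return snprintf(buf, size, "%s:%d:%d: warning: ",
                  loc.file, loc.line, loc.column);
}
```

Setting the location on each OMP tree node corresponds to moving from the first branch to the second.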


I don't have a test case specifically for this change, but my test cases 
for the new diagnostics in the non-rectangular loops patch do exercise 
it.  Is this OK for trunk now, or for stage 1 when we get there?


-Sandra

commit 4c745003d0b39d0e92032b62421df4920753783a
Author: Sandra Loosemore 
Date:   Thu Mar 24 21:02:34 2022 -0700

Fortran: Add location info to OpenMP tree nodes

	 gcc/fortran/
	 * trans-openmp.cc (gfc_trans_omp_critical): Set location on OMP
	 tree node.
	 (gfc_trans_omp_do): Likewise.
	 (gfc_trans_omp_masked): Likewise.
	 (gfc_trans_omp_do_simd): Likewise.
	 (gfc_trans_omp_scope): Likewise.
	 (gfc_trans_omp_taskgroup): Likewise.
	 (gfc_trans_omp_taskwait): Likewise.
	 (gfc_trans_omp_distribute): Likewise.
	 (gfc_trans_omp_taskloop): Likewise.
	 (gfc_trans_omp_master_masked_taskloop): Likewise.

diff --git a/gcc/fortran/trans-openmp.cc b/gcc/fortran/trans-openmp.cc
index 25dde82..ba3ff71 100644
--- a/gcc/fortran/trans-openmp.cc
+++ b/gcc/fortran/trans-openmp.cc
@@ -5012,6 +5012,7 @@ gfc_trans_omp_critical (gfc_code *code)
 name = get_identifier (code->ext.omp_clauses->critical_name);
   gfc_start_block (&block);
   stmt = make_node (OMP_CRITICAL);
+  SET_EXPR_LOCATION (stmt, gfc_get_location (&code->loc));
   TREE_TYPE (stmt) = void_type_node;
   OMP_CRITICAL_BODY (stmt) = gfc_trans_code (code->block->next);
   OMP_CRITICAL_NAME (stmt) = name;
@@ -5044,6 +5045,7 @@ gfc_trans_omp_do (gfc_code *code, gfc_exec_op op, stmtblock_t *pblock,
   unsigned ix;
   vec<tree, va_gc> *saved_doacross_steps = doacross_steps;
   gfc_expr_list *tile = do_clauses ? do_clauses->tile_list : clauses->tile_list;
+  gfc_code *orig_code = code;
 
   /* Both collapsed and tiled loops are lowered the same way.  In
  OpenACC, those clauses are not compatible, so prioritize the tile
@@ -5398,6 +5400,7 @@ gfc_trans_omp_do (gfc_code *code, gfc_exec_op op, stmtblock_t *pblock,
 default: gcc_unreachable ();
 }
 
+  SET_EXPR_LOCATION (stmt, gfc_get_location (&orig_code->loc));
   TREE_TYPE (stmt) = void_type_node;
   OMP_FOR_BODY (stmt) = gfc_finish_block ();
   OMP_FOR_CLAUSES (stmt) = omp_clauses;
@@ -5670,6 +5673,7 @@ gfc_trans_omp_masked (gfc_code *code, gfc_omp_clauses *clauses)
   gfc_start_block (&block);
   tree omp_clauses = gfc_trans_omp_clauses (&block, clauses, code->loc);
   tree stmt = make_node (OMP_MASKED);
+  SET_EXPR_LOCATION (stmt, gfc_get_location (&code->loc));
   TREE_TYPE (stmt) = void_type_node;
   OMP_MASKED_BODY (stmt) = body;
   OMP_MASKED_CLAUSES (stmt) = omp_clauses;
@@ -6444,6 +6448,7 @@ gfc_trans_omp_do_simd (gfc_code *code, stmtblock_t *pblock,
   if (flag_openmp)
 {
   stmt = make_node (OMP_FOR);
+  SET_EXPR_LOCATION (stmt, gfc_get_location (&code->loc));
   TREE_TYPE (stmt) = void_type_node;
   OMP_FOR_BODY (stmt) = body;
   OMP_FOR_CLAUSES (stmt) = omp_do_clauses;
@@ -6616,6 +6621,7 @@ gfc_trans_omp_scope (gfc_code *code)
   tree omp_clauses = gfc_trans_omp_clauses (&block, code->ext.omp_clauses,
 	code->loc);
   tree stmt = make_node (OMP_SCOPE);
+  SET_EXPR_LOCATION (stmt, gfc_get_location (&code->loc));
   TREE_TYPE (stmt) = void_type_node;
   OMP_SCOPE_BODY (stmt) = body;
   OMP_SCOPE_CLAUSES (stmt) = omp_clauses;
@@ -6691,6 +6697,7 @@ gfc_trans_omp_taskgroup (gfc_code *code)
   gfc_start_block (&block);
   tree body = gfc_trans_code (code->block->next);
   tree stmt = make_node (OMP_TASKGROUP);
+  SET_EXPR_LOCATION (stmt, gfc_get_location (&code->loc));
   TREE_TYPE (stmt) = void_type_node;
   OMP_TASKGROUP_BODY (stmt) = body;
   OMP_TASKGROUP_CLAUSES (stmt) = gfc_trans_omp_clauses (&block,
@@ -6711,6 +6718,7 @@ gfc_trans_omp_taskwait (gfc_code *code)
   stmtblock_t block;
   gfc_start_block (&block);
   tree stmt = make_node (OMP_TASK);
+  SET_EXPR_LOCATION (stmt, gfc_get_location (&code->loc));
   TREE_TYPE (stmt) = void_type_node;
   OMP_TASK_BODY (stmt) = NULL_TREE;
   OMP_TASK_CLAUSES (stmt) = gfc_trans_omp_clauses (&block,
@@ -6788,6 +6796,7 @@ gfc_trans_omp_distribute (gfc_code *code, gfc_omp_clauses *clausesa)
   if (flag_openmp)
 {
   tree distribute = make_node (OMP_DISTRIBUTE);
+  SET_EXPR_LOCATION (distribute, gfc_get_location (&code->loc));
   TREE_TYPE (distribute) = void_type_node;
   OMP_FOR_BODY (distribute) = stmt;
   OMP_FOR_CLAUSES (distribute) = omp_clauses;
@@ -7008,6 +7017,7 @@ gfc_trans_omp_taskloop (gfc_code *code, gfc_exec_op op)
   if (flag_openmp)
 {
   tree taskloop = make_node (OMP_TASKLOOP);
+  SET_EXPR_LOCATION

[PATCH] Fortran: Fix clause splitting for OMP masked taskloop directive

2022-03-25 Thread Sandra Loosemore

I ran into this bug in the handling of clauses on the combined "masked 
taskloop" OMP directive when I was working on something else.  The fix 
turned out to be a 1-liner.  OK for trunk?


-Sandra

commit 17c4fa0bd97c070945004095a06fb7d9e91869e3
Author: Sandra Loosemore 
Date:   Wed Mar 23 18:45:25 2022 -0700

Fortran: Fix clause splitting for OMP masked taskloop directive

This patch fixes an obvious coding goof that caused all clauses for
the combined OMP masked taskloop directive to be discarded.

	gcc/fortran/
	* trans-openmp.cc (gfc_split_omp_clauses): Fix mask for
	EXEC_OMP_MASKED_TASKLOOP.

	gcc/testsuite/
	* gfortran.dg/gomp/masked-taskloop.f90: New.

diff --git a/gcc/fortran/trans-openmp.cc b/gcc/fortran/trans-openmp.cc
index 101924f..25dde82 100644
--- a/gcc/fortran/trans-openmp.cc
+++ b/gcc/fortran/trans-openmp.cc
@@ -5998,7 +5998,7 @@ gfc_split_omp_clauses (gfc_code *code,
   innermost = GFC_OMP_SPLIT_DO;
   break;
 case EXEC_OMP_MASKED_TASKLOOP:
-  mask = GFC_OMP_SPLIT_MASKED | GFC_OMP_SPLIT_TASKLOOP;
+  mask = GFC_OMP_MASK_MASKED | GFC_OMP_MASK_TASKLOOP;
   innermost = GFC_OMP_SPLIT_TASKLOOP;
   break;
 case EXEC_OMP_MASTER_TASKLOOP:
diff --git a/gcc/testsuite/gfortran.dg/gomp/masked-taskloop.f90 b/gcc/testsuite/gfortran.dg/gomp/masked-taskloop.f90
new file mode 100644
index 000..6fb7111
--- /dev/null
+++ b/gcc/testsuite/gfortran.dg/gomp/masked-taskloop.f90
@@ -0,0 +1,19 @@
+! { dg-do compile }
+! { dg-additional-options "-fopenmp -fdump-tree-original" }
+
+! There was a bug in the clause splitting for the "masked taskloop"
+! combined directive that caused it to lose all the clauses.
+
+subroutine s1 (a1, a2)
+  integer :: a1, a2
+  integer :: i, j
+
+  !$omp masked taskloop collapse(2) grainsize(4)
+  do i = 1, a1
+    do j = 1, a2
+    end do
+  end do
+
+end subroutine
+
+! { dg-final { scan-tree-dump "omp taskloop collapse\\(2\\) grainsize\\(4\\)" "original" } }
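
Editorial note: the one-line fix replaces enumerator indices (the SPLIT names) with bit flags (the MASK names) when building a mask. The C sketch below shows why OR-ing indices goes wrong. The enumerators and their values are hypothetical stand-ins chosen for illustration; they are not the actual GFC_OMP_SPLIT_* or GFC_OMP_MASK_* values in trans-openmp.cc, so which clauses end up selected in the real code may differ.

```c
/* Position of each construct in a per-construct clause array.  */
enum split_index
{
  SPLIT_SIMD,      /* 0 */
  SPLIT_DO,        /* 1 */
  SPLIT_MASKED,    /* 2 */
  SPLIT_TASKLOOP,  /* 3 */
  SPLIT_NUM
};

/* One bit per construct, derived from the index above.  */
enum split_mask
{
  MASK_SIMD = 1 << SPLIT_SIMD,
  MASK_DO = 1 << SPLIT_DO,
  MASK_MASKED = 1 << SPLIT_MASKED,
  MASK_TASKLOOP = 1 << SPLIT_TASKLOOP
};

/* Return nonzero if MASK selects the construct at index IDX.  */
int mask_selects(unsigned mask, enum split_index idx)
{
  return (mask >> idx) & 1u;
}

/* The mistake: OR-ing indices.  Here 2 | 3 == 3, i.e. SIMD and DO.  */
unsigned buggy_mask(void)
{
  return SPLIT_MASKED | SPLIT_TASKLOOP;
}

/* The fix: OR-ing bit flags.  Here 4 | 8 == 12, i.e. MASKED and TASKLOOP.  */
unsigned fixed_mask(void)
{
  return MASK_MASKED | MASK_TASKLOOP;
}
```

In this toy layout the buggy mask selects neither of the two constructs that make up the combined directive, which is consistent with the clauses being lost.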

Re: [PATCH] x86: Use x constraint on KL patterns

2022-03-25 Thread H.J. Lu via Gcc-patches

On Fri, Mar 25, 2022 at 6:08 PM Hongyu Wang  wrote:
>
> Is it possible to create a test case that gas would throw an error for
> invalid operands?

You can use -ffix-xmmN to disable XMM0-15.

> On Sat, Mar 26, 2022 at 04:50, H.J. Lu via Gcc-patches wrote:
> >
> > Since KL instructions have no AVX512 version, replace the "v" register
> > constraint with the "x" register constraint.
> >
> > PR target/105058
> > * config/i386/sse.md (loadiwkey): Replace "v" with "x".
> > (aes<aesklvariant>u8): Likewise.
> > ---
> >  gcc/config/i386/sse.md | 6 +++---
> >  1 file changed, 3 insertions(+), 3 deletions(-)
> >
> > diff --git a/gcc/config/i386/sse.md b/gcc/config/i386/sse.md
> > index 29802d00ce6..33bd2c4768a 100644
> > --- a/gcc/config/i386/sse.md
> > +++ b/gcc/config/i386/sse.md
> > @@ -28364,8 +28364,8 @@ (define_insn "avx512f_dpbf16ps_<mode>_mask"
> >
> >  ;; KEYLOCKER
> >  (define_insn "loadiwkey"
> > -  [(unspec_volatile:V2DI [(match_operand:V2DI 0 "register_operand" "v")
> > - (match_operand:V2DI 1 "register_operand" "v")
> > +  [(unspec_volatile:V2DI [(match_operand:V2DI 0 "register_operand" "x")
> > + (match_operand:V2DI 1 "register_operand" "x")
> >   (match_operand:V2DI 2 "register_operand" "Yz")
> >   (match_operand:SI   3 "register_operand" "a")]
> >  UNSPECV_LOADIWKEY)
> > @@ -28498,7 +28498,7 @@ (define_int_attr aesklvariant
> > (UNSPECV_AESENC256KLU8 "enc256kl")])
> >
> >  (define_insn "aes<aesklvariant>u8"
> > -  [(set (match_operand:V2DI 0 "register_operand" "=v")
> > +  [(set (match_operand:V2DI 0 "register_operand" "=x")
> > (unspec_volatile:V2DI [(match_operand:V2DI 1 "register_operand" "0")
> >(match_operand:BLK   2 "memory_operand" "m")]
> >   AESDECENCKL))
> > --
> > 2.35.1
> >



-- 
H.J.

Re: [PATCH] x86: Use x constraint on KL patterns

2022-03-25 Thread Hongyu Wang via Gcc-patches

Is it possible to create a test case that gas would throw an error for
invalid operands?

On Sat, Mar 26, 2022 at 04:50, H.J. Lu via Gcc-patches wrote:
>
> Since KL instructions have no AVX512 version, replace the "v" register
> constraint with the "x" register constraint.
>
> PR target/105058
> * config/i386/sse.md (loadiwkey): Replace "v" with "x".
> (aes<aesklvariant>u8): Likewise.
> ---
>  gcc/config/i386/sse.md | 6 +++---
>  1 file changed, 3 insertions(+), 3 deletions(-)
>
> diff --git a/gcc/config/i386/sse.md b/gcc/config/i386/sse.md
> index 29802d00ce6..33bd2c4768a 100644
> --- a/gcc/config/i386/sse.md
> +++ b/gcc/config/i386/sse.md
> @@ -28364,8 +28364,8 @@ (define_insn "avx512f_dpbf16ps_<mode>_mask"
>
>  ;; KEYLOCKER
>  (define_insn "loadiwkey"
> -  [(unspec_volatile:V2DI [(match_operand:V2DI 0 "register_operand" "v")
> - (match_operand:V2DI 1 "register_operand" "v")
> +  [(unspec_volatile:V2DI [(match_operand:V2DI 0 "register_operand" "x")
> + (match_operand:V2DI 1 "register_operand" "x")
>   (match_operand:V2DI 2 "register_operand" "Yz")
>   (match_operand:SI   3 "register_operand" "a")]
>  UNSPECV_LOADIWKEY)
> @@ -28498,7 +28498,7 @@ (define_int_attr aesklvariant
> (UNSPECV_AESENC256KLU8 "enc256kl")])
>
>  (define_insn "aes<aesklvariant>u8"
> -  [(set (match_operand:V2DI 0 "register_operand" "=v")
> +  [(set (match_operand:V2DI 0 "register_operand" "=x")
> (unspec_volatile:V2DI [(match_operand:V2DI 1 "register_operand" "0")
>(match_operand:BLK   2 "memory_operand" "m")]
>   AESDECENCKL))
> --
> 2.35.1
>

Re: rs6000/testsuite: Use -mdejagnu-cpu= and -mdejagnu-tune= options

2022-03-25 Thread Segher Boessenkool

On Fri, Mar 25, 2022 at 06:15:56PM -0500, Peter Bergner wrote:
> On 3/25/22 4:08 PM, Segher Boessenkool wrote:
> > On Fri, Mar 25, 2022 at 02:51:38PM -0500, Peter Bergner wrote:
> > It seems likely many of these tests should move to g++.target/powerpc .
> 
> Probably, that can be a follow on patch.  Maybe a good first patch for Surya.

Certainly, it should be a separate patch no matter what; just an
obvious improvement, that your patch exposes :-)  It is quite unusual
to see -mcpu= in target-independent testcases: if this particular -mcpu=
(or any other machine flag) exposes the problem, you typically want to
build the testcase everywhere else anyway, for increased coverage :-)

> > Those that should not should likely not use -mcpu= in the first place
> > (instead, those tests should use has_arch_pwrN).
> 
> If the test cases explicitly added -mcpu=, I'm guessing they need them
> to test whatever the test case is checking for.  If we remove the -mcpu=
> and rely on dg-require has_arch_pwrN or whatever, then the test case
> would only run whenever the default flags match that, right?  So it
> would seem we'd get less test coverage than before.

It comes down to what the test wants to test.  For target tests -m
options are fine and usual, in most cases you want to test a bug or some
code generation that only happens with those flags; for generic tests,
-m options are unusual.

> >> +++ b/gcc/testsuite/gcc.target/powerpc/int_128bit-runnable.c
> >> +++ b/gcc/testsuite/gcc.target/powerpc/test_mffsl.c
> > 
> > I missed these two in reviewing when the -mcpu= was introduced, oops.
> 
> It's WAY too easy to miss those, since -mcpu= is such a common option
> that we see and use everyday, we almost expect to see it, so it doesn't
> look out of place or wrong.

Yes it is very easy.  But I trained myself pretty well apparently :-)

> > Okay for trunk.  Also okay for backports if you want / if you think it
> > useful.  Thanks!
> 
> Thanks, commit pushed.  I had not thought about backports, but if it
> helps stabilize our test results there, it can't hurt.  I'll have a
> look and see if the tests even exist there or not.

It is mostly useful because this causes many more things to be tested.
In the grand scheme of things this is just a few tests, so if it is
hard, just don't bother :-)


Segher

Re: rs6000/testsuite: Use -mdejagnu-cpu= and -mdejagnu-tune= options

2022-03-25 Thread Peter Bergner via Gcc-patches

On 3/25/22 4:08 PM, Segher Boessenkool wrote:
> On Fri, Mar 25, 2022 at 02:51:38PM -0500, Peter Bergner wrote:
>> This patch updates the POWER testsuite test cases using -mcpu= and -mtune=
>> to use the preferred -mdejagnu-cpu= and -mdejagnu-tune= options.  This also
>> obviates the need for the dg-skip-if directive, since the user cannot
>> override the -mcpu= value being used to compile the test case.
> 
> So this is all testcases that say "do not override -mcpu"?

Not all of them, but most of them, yes.

> It seems likely many of these tests should move to g++.target/powerpc .

Probably, that can be a follow on patch.  Maybe a good first patch for Surya.

> Those that should not should likely not use -mcpu= in the first place
> (instead, those tests should use has_arch_pwrN).

If the test cases explicitly added -mcpu=, I'm guessing they need them
to test whatever the test case is checking for.  If we remove the -mcpu=
and rely on dg-require has_arch_pwrN or whatever, then the test case
would only run whenever the default flags match that, right?  So it
would seem we'd get less test coverage than before.

>> +++ b/gcc/testsuite/gcc.target/powerpc/int_128bit-runnable.c
>> +++ b/gcc/testsuite/gcc.target/powerpc/test_mffsl.c
> 
> I missed these two in reviewing when the -mcpu= was introduced, oops.

It's WAY too easy to miss those, since -mcpu= is such a common option
that we see and use everyday, we almost expect to see it, so it doesn't
look out of place or wrong.

> Okay for trunk.  Also okay for backports if you want / if you think it
> useful.  Thanks!

Thanks, commit pushed.  I had not thought about backports, but if it
helps stabilize our test results there, it can't hurt.  I'll have a
look and see if the tests even exist there or not.

Peter

[committed] wwwdocs: Add release notes for new C2X features in GCC 12

2022-03-25 Thread Joseph Myers

Committed.

diff --git a/htdocs/gcc-12/changes.html b/htdocs/gcc-12/changes.html
index 9cff81b9..689feeba 100644
--- a/htdocs/gcc-12/changes.html
+++ b/htdocs/gcc-12/changes.html
@@ -193,6 +193,27 @@ a work-in-progress.
   
 
 
+C
+
+  Some new features from the upcoming C2X revision of the ISO C
+  standard are supported with -std=c2x
+  and -std=gnu2x.  Some of these features are also
+  supported as extensions when compiling for older language versions.
+  In addition to the features listed, some features previously
+  supported as extensions and now added to the C standard are enabled
+  by default in C2X mode and not diagnosed with -std=c2x
+  -Wpedantic.
+  
+Digit separators (as in C++) are supported for C2X.
+The #elifdef and #elifndef
+preprocessing directives are now supported.
+The printf and scanf format checking
+with -Wformat now supports the %b format
+specified by C2X for binary integers, and the %B
+format recommended by C2X for printf.
+  
+
+
 C++
 
   Several C++23 features have been implemented:

-- 
Joseph S. Myers
jos...@codesourcery.com

[PATCH] c++: ICE with aggregate assignment and DMI [PR104583]

2022-03-25 Thread Marek Polacek via Gcc-patches

The attached 93280 test no longer ICEs but looks like it was never added to the
testsuite.  The 104583 test, modified so that it closely resembles 93280, still
ICEs.

The problem is that in 104583 we have a value-init from {} (the line A a{};),
so this code in convert_like_internal

    /* If we're initializing from {}, it's value-initialization.  */
    if (BRACE_ENCLOSED_INITIALIZER_P (expr)
        && CONSTRUCTOR_NELTS (expr) == 0
        && TYPE_HAS_DEFAULT_CONSTRUCTOR (totype)
        && !processing_template_decl)
      {
        bool direct = CONSTRUCTOR_IS_DIRECT_INIT (expr);
...
        TARGET_EXPR_DIRECT_INIT_P (expr) = direct;

sets TARGET_EXPR_DIRECT_INIT_P.  This does not happen in 93280 where we
initialize from {0}.

In 104583, when gimplifying, the d = {}; line, we have

d = {.a=TARGET_EXPR >> }

where the TARGET_EXPR is the one with TARGET_EXPR_DIRECT_INIT_P set.  In
gimplify_init_ctor_preeval we do

  FOR_EACH_VEC_SAFE_ELT (v, ix, ce)
    gimplify_init_ctor_preeval (&ce->value, pre_p, post_p, data);

so we gimplify the TARGET_EXPR, crashing at

    case TARGET_EXPR:
      /* A TARGET_EXPR that expresses direct-initialization should have been
         elided by cp_gimplify_init_expr.  */
      gcc_checking_assert (!TARGET_EXPR_DIRECT_INIT_P (*expr_p));

but there is no INIT_EXPR so cp_gimplify_init_expr was never called!

Now, the fix for c++/93280

says "let's only set TARGET_EXPR_DIRECT_INIT_P when we're using the DMI in
a constructor." and the comment talks about the full initialization.  Is
it accurate to say that our TARGET_EXPR does not represent the full
initialization, because it only initializes the 'a' subobject?  If so,
then maybe get_nsdmi should clear TARGET_EXPR_DIRECT_INIT_P when in_ctor
is false.

I've compared the 93280.s and 104583.s files, they differ only in one
movl $0, so there are no extra calls and similar.

Bootstrapped/regtested on x86_64-pc-linux-gnu, ok for trunk?

PR c++/93280
PR c++/104583

gcc/cp/ChangeLog:

* init.cc (get_nsdmi): Set TARGET_EXPR_DIRECT_INIT_P to in_ctor.

gcc/testsuite/ChangeLog:

* g++.dg/cpp0x/nsdmi-list7.C: New test.
* g++.dg/cpp0x/nsdmi-list8.C: New test.
---
 gcc/cp/init.cc   |  8 
 gcc/testsuite/g++.dg/cpp0x/nsdmi-list7.C | 17 +
 gcc/testsuite/g++.dg/cpp0x/nsdmi-list8.C | 17 +
 3 files changed, 38 insertions(+), 4 deletions(-)
 create mode 100644 gcc/testsuite/g++.dg/cpp0x/nsdmi-list7.C
 create mode 100644 gcc/testsuite/g++.dg/cpp0x/nsdmi-list8.C

diff --git a/gcc/cp/init.cc b/gcc/cp/init.cc
index 08767679dd4..fd32a8bd90f 100644
--- a/gcc/cp/init.cc
+++ b/gcc/cp/init.cc
@@ -679,10 +679,10 @@ get_nsdmi (tree member, bool in_ctor, tsubst_flags_t complain)
   if (simple_target)
 init = TARGET_EXPR_INITIAL (init);
   init = break_out_target_exprs (init, /*loc*/true);
-  if (in_ctor && init && TREE_CODE (init) == TARGET_EXPR)
-/* This expresses the full initialization, prevent perform_member_init from
-   calling another constructor (58162).  */
-TARGET_EXPR_DIRECT_INIT_P (init) = true;
+  if (init && TREE_CODE (init) == TARGET_EXPR)
+/* If this expresses the full initialization, prevent perform_member_init
+   from calling another constructor (58162).  */
+TARGET_EXPR_DIRECT_INIT_P (init) = in_ctor;
   if (simple_target && TREE_CODE (init) != CONSTRUCTOR)
 /* Now put it back so C++17 copy elision works.  */
 init = get_target_expr (init);
diff --git a/gcc/testsuite/g++.dg/cpp0x/nsdmi-list7.C b/gcc/testsuite/g++.dg/cpp0x/nsdmi-list7.C
new file mode 100644
index 000..62b07429bec
--- /dev/null
+++ b/gcc/testsuite/g++.dg/cpp0x/nsdmi-list7.C
@@ -0,0 +1,17 @@
+// PR c++/93280
+// { dg-do compile { target c++11 } }
+
+struct A {
+  template <typename T> A(T);
+  int c;
+};
+
+struct D {
+  A a{0};
+};
+
+void g()
+{
+  D d;
+  d = {};
+}
diff --git a/gcc/testsuite/g++.dg/cpp0x/nsdmi-list8.C b/gcc/testsuite/g++.dg/cpp0x/nsdmi-list8.C
new file mode 100644
index 000..fe73da8f98d
--- /dev/null
+++ b/gcc/testsuite/g++.dg/cpp0x/nsdmi-list8.C
@@ -0,0 +1,17 @@
+// PR c++/104583
+// { dg-do compile { target c++11 } }
+
+struct A {
+  A();
+  int c;
+};
+
+struct D {
+  A a{};
+};
+
+void g()
+{
+  D d;
+  d = {};
+}

base-commit: bdd7b679e8497c07e25726f6ab6429e4c4d429c7
-- 
2.35.1

[PATCH] gimple-fold: fix location of loads for memory ops [PR104308]

2022-03-25 Thread David Malcolm via Gcc-patches

PR analyzer/104308 reports that when -Wanalyzer-use-of-uninitialized-value
complains about certain memmove operations where the source is
uninitialized, the diagnostic uses UNKNOWN_LOCATION:

In function 'main':
cc1: warning: use of uninitialized value '*(short unsigned int *)&s + 1' [CWE-457] [-Wanalyzer-use-of-uninitialized-value]
  'main': event 1
|
|pr104308.c:5:8:
|5 |   char s[5]; /* { dg-message "region created on stack here" } */
|  |^
|  ||
|  |(1) region created on stack here
|
  'main': event 2
|
|cc1:
| (2): use of uninitialized value '*(short unsigned int *)&s + 1' here
|

The issue is that gimple_fold_builtin_memory_op converts a memmove to:

  _3 = MEM  [(char * {ref-all})_1];
  MEM  [(char * {ref-all})] = _3;

but only sets the location of the 2nd stmt, not the 1st.

Fixed thusly, giving:

pr104308.c: In function 'main':
pr104308.c:6:3: warning: use of uninitialized value '*(short unsigned int *)&s + 1' [CWE-457] [-Wanalyzer-use-of-uninitialized-value]
6 |   memmove(s, s + 1, 2); /* { dg-warning "use of uninitialized value" } */
  |   ^~~~
  'main': events 1-2
|
|5 |   char s[5]; /* { dg-message "region created on stack here" } */
|  |^
|  ||
|  |(1) region created on stack here
|6 |   memmove(s, s + 1, 2); /* { dg-warning "use of uninitialized value" } */
|  |   
|  |   |
|  |   (2) use of uninitialized value '*(short unsigned int *)&s + 1' here
|

One side-effect of this change is a change in part of the output of
gcc.dg/uninit-40.c from:

  uninit-40.c:47:3: warning: ‘*(long unsigned int *)(&u[1][0][0])’ is used uninitialized [-Wuninitialized]
 47 |   __builtin_memcpy (&v[1], &u[1], sizeof (V));
|   ^~~
  uninit-40.c:45:5: note: ‘*(long unsigned int *)(&u[1][0][0])’ was declared here
 45 |   V u[2], v[2];
| ^

to:

  uninit-40.c:47:3: warning: ‘u’ is used uninitialized [-Wuninitialized]
 47 |   __builtin_memcpy (&v[1], &u[1], sizeof (V));
|   ^~~
  uninit-40.c:45:5: note: ‘u’ declared here
 45 |   V u[2], v[2];
| ^

What's happening is that pass "early_uninit"(29)'s call to
maybe_warn_operand is guarded by this condition:
  1051    else if (gimple_assign_load_p (stmt)
  1052             && gimple_has_location (stmt))

Before the patch, the stmt:
  _3 = MEM  [(char * {ref-all}) + 8B];
has no location, and so early_uninit skips this operand at line
1052 above.  Later, pass "uninit"(217) tests the var_decl "u$8", and
emits a warning for it.

With the patch, the stmt has a location, and so early_uninit emits a
warning for "u" and sets a NW_UNINIT warning suppression at that
location.  Later, pass "uninit"(217)'s test of "u$8" is rejected
due to that per-location suppression of uninit warnings, from the
earlier warning.

Successfully bootstrapped & regrtested on x86_64-pc-linux-gnu.

OK for stage 4?  or for next stage 1?

gcc/ChangeLog:
PR analyzer/104308
* gimple-fold.cc (gimple_fold_builtin_memory_op): When optimizing
to loads then stores, set the location of the new load stmt.

gcc/testsuite/ChangeLog:
PR analyzer/104308
* gcc.dg/analyzer/pr104308.c: New test.
* gcc.dg/uninit-40.c (foo): Update expression in expected message.

Signed-off-by: David Malcolm 
---
 gcc/gimple-fold.cc   | 1 +
 gcc/testsuite/gcc.dg/analyzer/pr104308.c | 8 
 gcc/testsuite/gcc.dg/uninit-40.c | 2 +-
 3 files changed, 10 insertions(+), 1 deletion(-)
 create mode 100644 gcc/testsuite/gcc.dg/analyzer/pr104308.c

diff --git a/gcc/gimple-fold.cc b/gcc/gimple-fold.cc
index 5eff7d68ac1..e73bc6a7137 100644
--- a/gcc/gimple-fold.cc
+++ b/gcc/gimple-fold.cc
@@ -1039,6 +1039,7 @@ gimple_fold_builtin_memory_op (gimple_stmt_iterator *gsi,
  new_stmt);
  gimple_assign_set_lhs (new_stmt, srcmem);
  gimple_set_vuse (new_stmt, gimple_vuse (stmt));
+ gimple_set_location (new_stmt, loc);
  gsi_insert_before (gsi, new_stmt, GSI_SAME_STMT);
}
  if (dest_align < GET_MODE_ALIGNMENT (mode))
diff --git a/gcc/testsuite/gcc.dg/analyzer/pr104308.c 
b/gcc/testsuite/gcc.dg/analyzer/pr104308.c
new file mode 100644
index 000..9cd5ee6feee
--- /dev/null
+++ b/gcc/testsuite/gcc.dg/analyzer/pr104308.c
@@ -0,0 +1,8 @@
+#include <string.h>
+
+int main()
+{
+  char s[5]; /* { dg-message "region created on stack here" } */
+  memmove(s, s + 1, 2); /* { dg-warning "use of uninitialized value" } */
+  return 0;
+}
diff --git a/gcc/testsuite/gcc.dg/uninit-40.c b/gcc/testsuite/gcc.dg/uninit-40.c
index 8708079d397..567707a885e

Re: rs6000/testsuite: Use -mdejagnu-cpu= and -mdejagnu-tune= options

2022-03-25 Thread Segher Boessenkool

On Fri, Mar 25, 2022 at 02:51:38PM -0500, Peter Bergner wrote:
> This patch updates the POWER testsuite test cases using -mcpu= and -mtune=
> to use the preferred -mdejagnu-cpu= and -mdejagnu-tune= options.  This also
> obviates the need for the dg-skip-if directive, since the user cannot
> override the -mcpu= value being used to compile the test case.

So this is all testcases that say "do not override -mcpu"?

It seems likely many of these tests should move to g++.target/powerpc .
Those that should not should likely not use -mcpu= in the first place
(instead, those tests should use has_arch_pwrN).

> +++ b/gcc/testsuite/gcc.target/powerpc/int_128bit-runnable.c
> +++ b/gcc/testsuite/gcc.target/powerpc/test_mffsl.c

I missed these two in reviewing when the -mcpu= was introduced, oops.

Okay for trunk.  Also okay for backports if you want / if you think it
useful.  Thanks!

Segher

[PATCH] x86: Use -msse2 on gcc.target/i386/pr95483-1.c

2022-03-25 Thread H.J. Lu via Gcc-patches

Replace -msse with -msse2 since  requires SSE2.

PR testsuite/105055
* gcc.target/i386/pr95483-1.c: Replace -msse with -msse2.
---
 gcc/testsuite/gcc.target/i386/pr95483-1.c | 2 +-
 1 file changed, 1 insertion(+), 1 deletion(-)

diff --git a/gcc/testsuite/gcc.target/i386/pr95483-1.c 
b/gcc/testsuite/gcc.target/i386/pr95483-1.c
index 6b008261f35..0f3e0bf9280 100644
--- a/gcc/testsuite/gcc.target/i386/pr95483-1.c
+++ b/gcc/testsuite/gcc.target/i386/pr95483-1.c
@@ -1,5 +1,5 @@
 /* { dg-do compile } */
-/* { dg-options "-O2 -msse" } */
+/* { dg-options "-O2 -msse2" } */
 /* { dg-final { scan-assembler-times "pxor\[ 
\\t\]+\[^\n\]*%xmm\[0-9\]+\[^\n\]*%xmm\[0-9\]+(?:\n|\[ \\t\]+#)" 1 } } */
 /* { dg-final { scan-assembler-times "pinsrw\[ 
\\t\]+\[^\n\]*%xmm\[0-9\]+(?:\n|\[ \\t\]+#)" 1 } } */
 /* { dg-final { scan-assembler-times "pextrw\[ 
\\t\]+\[^\n\]*%xmm\[0-9\]+\[^\n\]*(?:\n|\[ \\t\]+#)" 1 } } */
-- 
2.35.1

[PATCH] x86: Use x constraint on KL patterns

2022-03-25 Thread H.J. Lu via Gcc-patches

Since KL instructions have no AVX512 version, replace the "v" register
constraint with the "x" register constraint.

PR target/105058
* config/i386/sse.md (loadiwkey): Replace "v" with "x".
(aesu8): Likewise.
---
 gcc/config/i386/sse.md | 6 +++---
 1 file changed, 3 insertions(+), 3 deletions(-)

diff --git a/gcc/config/i386/sse.md b/gcc/config/i386/sse.md
index 29802d00ce6..33bd2c4768a 100644
--- a/gcc/config/i386/sse.md
+++ b/gcc/config/i386/sse.md
@@ -28364,8 +28364,8 @@ (define_insn "avx512f_dpbf16ps__mask"
 
 ;; KEYLOCKER
 (define_insn "loadiwkey"
-  [(unspec_volatile:V2DI [(match_operand:V2DI 0 "register_operand" "v")
- (match_operand:V2DI 1 "register_operand" "v")
+  [(unspec_volatile:V2DI [(match_operand:V2DI 0 "register_operand" "x")
+ (match_operand:V2DI 1 "register_operand" "x")
  (match_operand:V2DI 2 "register_operand" "Yz")
  (match_operand:SI   3 "register_operand" "a")]
 UNSPECV_LOADIWKEY)
@@ -28498,7 +28498,7 @@ (define_int_attr aesklvariant
(UNSPECV_AESENC256KLU8 "enc256kl")])
 
 (define_insn "aesu8"
-  [(set (match_operand:V2DI 0 "register_operand" "=v")
+  [(set (match_operand:V2DI 0 "register_operand" "=x")
(unspec_volatile:V2DI [(match_operand:V2DI 1 "register_operand" "0")
   (match_operand:BLK   2 "memory_operand" "m")]
  AESDECENCKL))
-- 
2.35.1

Re: [PATCH v3] Document that the 'access' and 'nonnull' attributes are independent

2022-03-25 Thread Martin Sebor via Gcc-patches


On 3/25/22 12:45, David Malcolm wrote:

On Wed, 2022-03-23 at 17:52 +0100, Sebastian Huber wrote:

On 23/03/2022 17:31, Martin Sebor via Gcc-patches wrote:


The concern is that the constraints implied by attributes access and
nonnull are independent of each other.  I would suggest to document
that without talking about dereferencing because that's not implied
by either of them.  E.g., something like this (feel free to tweak it
as you see fit):

Note that the @code{access} attribute doesn't imply the same
constraint as attribute @code{nonnull} (@pxref{Attribute nonnull}).
The latter attribute should be used to annotate arguments that must
never be null, regardless of the value of the size argument.


I would not give advice on using the nonnull attribute here. This
attribute could have pretty dangerous effects in the function definition
(removal of null pointer checks).



That's a fair point.

Here's a v3 of the patch, which tones down the advice, and mentions that
there are caveats when directing the reader to the "nonnull" attribute.

How does this look?


This version looks good to me.

Thanks
Martin



gcc/ChangeLog:
* doc/extend.texi (Common Function Attributes): Document that
'access' does not imply 'nonnull'.

Signed-off-by: David Malcolm 
---
  gcc/doc/extend.texi | 8 
  1 file changed, 8 insertions(+)

diff --git a/gcc/doc/extend.texi b/gcc/doc/extend.texi
index a4a25e86928..539dad7001d 100644
--- a/gcc/doc/extend.texi
+++ b/gcc/doc/extend.texi
@@ -2652,6 +2652,14 @@ The mode is intended to be used as a means to help 
validate the expected
  object size, for example in functions that call @code{__builtin_object_size}.
  @xref{Object Size Checking}.
  
+Note that the @code{access} attribute merely specifies how an object
+referenced by the pointer argument can be accessed; it does not imply that
+an access @strong{will} happen.  Also, the @code{access} attribute does not
+imply the attribute @code{nonnull}; it may be appropriate to add both 
attributes
+at the declaration of a function that unconditionally manipulates a buffer via
+a pointer argument.  See the @code{nonnull} attribute for more information and
+caveats.
+
  @item alias ("@var{target}")
  @cindex @code{alias} function attribute
  The @code{alias} attribute causes the declaration to be emitted as an alias

[committed] libstdc++: Add more doxygen comments in

2022-03-25 Thread Jonathan Wakely via Gcc-patches

Tested powerpc64le-linux, pushed to trunk.

-- >8 --

libstdc++-v3/ChangeLog:

* include/std/bit (bit_cast, byteswap, endian): Add doxygen
comments.
---
 libstdc++-v3/include/std/bit | 20 +++-
 1 file changed, 19 insertions(+), 1 deletion(-)

diff --git a/libstdc++-v3/include/std/bit b/libstdc++-v3/include/std/bit
index a40f1ce99df..ef19d649e32 100644
--- a/libstdc++-v3/include/std/bit
+++ b/libstdc++-v3/include/std/bit
@@ -69,6 +69,12 @@ _GLIBCXX_BEGIN_NAMESPACE_VERSION
 #define __cpp_lib_bit_cast 201806L
 
   /// Create a value of type `To` from the bits of `from`.
+  /**
+   * @tparam _To   A trivially-copyable type.
+   * @param __from A trivially-copyable object of the same size as `_To`.
+   * @return   An object of type `_To`.
+   * @since C++20
+   */
   template
 [[nodiscard]]
 constexpr _To
@@ -86,6 +92,12 @@ _GLIBCXX_BEGIN_NAMESPACE_VERSION
 #define __cpp_lib_byteswap 202110L
 
   /// Reverse order of bytes in the object representation of `value`.
+  /**
+   * @tparam _Tp An integral type.
+   * @param __value  An object of integer type.
+   * @return An object of the same type, with the bytes reversed.
+   * @since C++23
+   */
   template
 [[nodiscard]]
 constexpr enable_if_t::value, _Tp>
@@ -444,7 +456,13 @@ _GLIBCXX_BEGIN_NAMESPACE_VERSION
 
 #define __cpp_lib_endian 201907L
 
-  /// Byte order
+  /// Byte order constants
+  /**
+   * The platform endianness can be checked by comparing `std::endian::native`
+   * to one of `std::endian::big` or `std::endian::little`.
+   *
+   * @since C++20
+   */
   enum class endian
   {
 little = __ORDER_LITTLE_ENDIAN__,
-- 
2.34.1

rs6000/testsuite: Use -mdejagnu-cpu= and -mdejagnu-tune= options

2022-03-25 Thread Peter Bergner via Gcc-patches

This patch updates the POWER testsuite test cases using -mcpu= and -mtune=
to use the preferred -mdejagnu-cpu= and -mdejagnu-tune= options.  This also
obviates the need for the dg-skip-if directive, since the user cannot
override the -mcpu= value being used to compile the test case.

This passed regression testing with no regressions on powerpc64le-linux.
Ok for trunk?

Peter


gcc/testsuite/

* g++.dg/pr65240-1.C: Use -mdejagnu-cpu=.  Remove dg-skip-if.
* g++.dg/pr65240-2.C: Likewise.
* g++.dg/pr65240-3.C: Likewise.
* g++.dg/pr65240-4.C: Likewise.
* g++.dg/pr65242.C: Likewise.
* g++.dg/pr67211.C: Likewise.
* g++.dg/pr69667.C: Likewise.
* g++.dg/pr71294.C: Likewise.
* g++.dg/pr84279.C: Likewise.
* g++.dg/torture/ppc-ldst-array.C: Likewise.
* gfortran.dg/nint_p7.f90: Likewise.
* gfortran.dg/pr102860.f90: Likewise.
* gcc.target/powerpc/fusion.c: Use -mdejagnu-cpu= and -mdejagnu-tune=.
* gcc.target/powerpc/fusion2.c: Likewise.
* gcc.target/powerpc/int_128bit-runnable.c: Use -mdejagnu-cpu=.
* gcc.target/powerpc/test_mffsl.c: Likewise.
* gfortran.dg/pr47614.f: Likewise.
* gfortran.dg/pr58968.f: Likewise.


diff --git a/gcc/testsuite/g++.dg/pr65240-1.C b/gcc/testsuite/g++.dg/pr65240-1.C
index d2e25b65fca..ff8910df6a1 100644
--- a/gcc/testsuite/g++.dg/pr65240-1.C
+++ b/gcc/testsuite/g++.dg/pr65240-1.C
@@ -1,8 +1,7 @@
 /* { dg-do compile { target { powerpc*-*-* && lp64 } } } */
 /* { dg-skip-if "" { powerpc*-*-darwin* } } */
 /* { dg-require-effective-target powerpc_p8vector_ok } */
-/* { dg-skip-if "do not override -mcpu" { powerpc*-*-* } { "-mcpu=*" } { 
"-mcpu=power8" } } */
-/* { dg-options "-mcpu=power8 -O3 -ffast-math -mcmodel=small -mno-fp-in-toc 
-Wno-return-type" } */
+/* { dg-options "-mdejagnu-cpu=power8 -O3 -ffast-math -mcmodel=small 
-mno-fp-in-toc -Wno-return-type" } */
 
 /* target/65240, compiler got a 'insn does not satisfy its constraints' error. 
 */
 
diff --git a/gcc/testsuite/g++.dg/pr65240-2.C b/gcc/testsuite/g++.dg/pr65240-2.C
index 38d5020bd19..bdb7a62d73d 100644
--- a/gcc/testsuite/g++.dg/pr65240-2.C
+++ b/gcc/testsuite/g++.dg/pr65240-2.C
@@ -1,8 +1,7 @@
 /* { dg-do compile { target { powerpc*-*-* && lp64 } } } */
 /* { dg-skip-if "" { powerpc*-*-darwin* } } */
 /* { dg-require-effective-target powerpc_p8vector_ok } */
-/* { dg-skip-if "do not override -mcpu" { powerpc*-*-* } { "-mcpu=*" } { 
"-mcpu=power8" } } */
-/* { dg-options "-mcpu=power8 -O3 -ffast-math -mcmodel=small -mfp-in-toc 
-Wno-return-type" } */
+/* { dg-options "-mdejagnu-cpu=power8 -O3 -ffast-math -mcmodel=small 
-mfp-in-toc -Wno-return-type" } */
 
 /* target/65240, compiler got a 'insn does not satisfy its constraints' error. 
 */
 
diff --git a/gcc/testsuite/g++.dg/pr65240-3.C b/gcc/testsuite/g++.dg/pr65240-3.C
index e8463c91494..f37db9025d1 100644
--- a/gcc/testsuite/g++.dg/pr65240-3.C
+++ b/gcc/testsuite/g++.dg/pr65240-3.C
@@ -1,8 +1,7 @@
 /* { dg-do compile { target { powerpc*-*-* && lp64 } } } */
 /* { dg-skip-if "" { powerpc*-*-darwin* } } */
 /* { dg-require-effective-target powerpc_p8vector_ok } */
-/* { dg-skip-if "do not override -mcpu" { powerpc*-*-* } { "-mcpu=*" } { 
"-mcpu=power8" } } */
-/* { dg-options "-mcpu=power8 -O3 -ffast-math -mcmodel=medium 
-Wno-return-type" } */
+/* { dg-options "-mdejagnu-cpu=power8 -O3 -ffast-math -mcmodel=medium 
-Wno-return-type" } */
 
 /* target/65240, compiler got a 'insn does not satisfy its constraints' error. 
 */
 
diff --git a/gcc/testsuite/g++.dg/pr65240-4.C b/gcc/testsuite/g++.dg/pr65240-4.C
index a119752d18e..efb6a6c06e7 100644
--- a/gcc/testsuite/g++.dg/pr65240-4.C
+++ b/gcc/testsuite/g++.dg/pr65240-4.C
@@ -1,8 +1,7 @@
 /* { dg-do compile { target { powerpc*-*-* && lp64 } } } */
 /* { dg-skip-if "" { powerpc*-*-darwin* } } */
 /* { dg-require-effective-target powerpc_vsx_ok } */
-/* { dg-skip-if "do not override -mcpu" { powerpc*-*-* } { "-mcpu=*" } { 
"-mcpu=power7" } } */
-/* { dg-options "-mcpu=power7 -O3 -ffast-math -Wno-return-type" } */
+/* { dg-options "-mdejagnu-cpu=power7 -O3 -ffast-math -Wno-return-type" } */
 
 /* target/65240, compiler got a 'insn does not satisfy its constraints' error. 
 */
 
diff --git a/gcc/testsuite/g++.dg/pr65242.C b/gcc/testsuite/g++.dg/pr65242.C
index be2ddaa85b2..662f375015f 100644
--- a/gcc/testsuite/g++.dg/pr65242.C
+++ b/gcc/testsuite/g++.dg/pr65242.C
@@ -1,8 +1,7 @@
 /* { dg-do compile { target { powerpc*-*-* && lp64 } } } */
 /* { dg-skip-if "" { powerpc*-*-darwin* } } */
 /* { dg-require-effective-target powerpc_p8vector_ok } */
-/* { dg-skip-if "do not override -mcpu" { powerpc*-*-* } { "-mcpu=*" } { 
"-mcpu=power8" } } */
-/* { dg-options "-mcpu=power8 -O3" } */
+/* { dg-options "-mdejagnu-cpu=power8 -O3" } */
 
 class A {
 public:
diff --git a/gcc/testsuite/g++.dg/pr67211.C b/gcc/testsuite/g++.dg/pr67211.C
index cb3d342c122..ac241818ab5 100644
---

[PATCH] Add condition coverage profiling

2022-03-25 Thread Jørgen Kvalsvik via Gcc-patches

Hello, and thanks for the review!

> 1) Do I correctly understand that __conditions_accu_true/false track
> every short-circuit sub-expression of a condition and record
> if a given sub-expr is seen to be true or false?

Sort-of. It is not really aware of sub-expressions at all, but tracks every
boolean condition/term in the full expression, mapped to a bitvector. I usually
find it easier to understand visually:

if (a || (b && c) || d)

terms | a b c d
------|--------
true  | 0 0 0 0
false | 0 0 0 0

Whenever a = true, true becomes 1000 and false remains 0000. a = true would
short-circuit (implemented as a normal edge to the "exit" node of the
expression), leaving bcd unevaluated. Coverage is then determined as the number
of unset bits in this vector. The accumulators are zeroed on every function
entry, and |= into the global "counter" on function exit.
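
As a rough illustration of the accumulator scheme described above (not the
instrumentation the patch actually emits), the following C++ sketch records
which terms of `a || (b && c) || d` were evaluated to true or to false. All
names here (Accumulators, g_counters, decision, bits, uncovered) are made up
for this example, and the masking step discussed further down is deliberately
left out.

```cpp
#include <string>

// Hypothetical names throughout; a sketch, not the patch's instrumentation.
constexpr int kTerms = 4;  // terms a, b, c, d (term i uses bit i)

struct Accumulators {
  unsigned seen_true = 0;   // bit i set: term i was evaluated to true
  unsigned seen_false = 0;  // bit i set: term i was evaluated to false
};

// Global "counter" that the per-call accumulators are OR-ed into.
Accumulators g_counters;

// Evaluates a || (b && c) || d and records every term that is actually
// evaluated. Short-circuited terms are never recorded.
bool decision(bool a, bool b, bool c, bool d) {
  Accumulators local;  // zeroed on every function entry
  auto term = [&](int i, bool v) {
    (v ? local.seen_true : local.seen_false) |= 1u << i;
    return v;
  };
  bool r = term(0, a) || (term(1, b) && term(2, c)) || term(3, d);
  g_counters.seen_true |= local.seen_true;    // flush on function exit
  g_counters.seen_false |= local.seen_false;
  return r;
}

// Renders a bit vector with term a leftmost, as in the table above.
std::string bits(unsigned v) {
  std::string s;
  for (int i = 0; i < kTerms; ++i) s += ((v >> i) & 1u) ? '1' : '0';
  return s;
}

// Number of unset bits across both global vectors.
int uncovered() {
  int n = 0;
  for (int i = 0; i < kTerms; ++i) {
    n += !((g_counters.seen_true >> i) & 1u);
    n += !((g_counters.seen_false >> i) & 1u);
  }
  return n;
}
```

With this sketch, a single call decision(true, false, false, false) leaves the
true vector at 1000 and the false vector at 0000, which matches the example in
the text.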

> 2) As mentioned these are bit sets, can you please explain why you (&
> -value) from these sets?

Yes, excellent question. This should answer 3) too, and is the most surprising
thing when unfamiliar with MC/DC.

For modified condition/decision coverage we need to prove that a condition's
value independently affects the outcome/decision. In other words, would
flipping a condition change the outcome?

Let's take your if (a || b) check, which says that 1/4 conditions are covered
on (0, 1). If a = 1, the expression is always true, and b is never evaluated.
In fact, the value of b does not change the outcome at all, so intuitively only
1/4 of the conditions are covered.

On (0, 1) b gets evaluated, so in a way we know that a = 0. However, the value
of a will no longer affect the outcome because (* || 1) is always true. In a
way, you can think of it as reverse short-circuiting; the same thing would
happen on (* && 0). This is what Whalen, Heimdahl, and De Silva call masking.
What I do is figure out when masking must happen by analyzing the CFG and zero
out the bits that are masked (i.e. no longer affect the outcome). This is in
practice what happens to the bits when regular short-circuiting happens.

So while (0, 1) "sees" the condition a = false, it cannot prove that it
mattered based on your inputs. In general, you need N+1 test cases to cover N
conditionals with MC/DC. As a consequence, the only way of covering a = false
is (0, 0), which alone would be 2/4 cases covered.
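
To make the masking argument concrete, here is a small C++ sketch for the
decision `a || b` only. It hard-codes the reasoning from the paragraphs above
instead of deriving it from a CFG, so it is an illustration of the counting,
not of the implementation; the names covered_by and count are invented for the
example.

```cpp
// One bit per (condition, value) pair of the decision a || b.
enum : unsigned { A_TRUE = 1u, A_FALSE = 2u, B_TRUE = 4u, B_FALSE = 8u };

// Which pairs does the input (a, b) demonstrably cover?
unsigned covered_by(bool a, bool b) {
  if (a) return A_TRUE;      // short-circuit: b is never evaluated
  if (b) return B_TRUE;      // a == false is masked, since (* || 1) is true
  return A_FALSE | B_FALSE;  // both values independently decide the outcome
}

// Number of set bits in a coverage mask.
int count(unsigned mask) {
  int n = 0;
  for (; mask != 0; mask &= mask - 1) ++n;
  return n;
}
```

Under these assumptions (0, 1) covers 1 of 4 pairs, (0, 0) covers 2 of 4, and
the three inputs (1, 0), (0, 1), (0, 0) together cover all four, in line with
the N+1 rule of thumb for N = 2.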

I hope this made it clearer, if not I can write a more elaborate explanation
with more examples.

> 4) I noticed various CFG patterns in tree-profile.cc which are handled. Can
> you please explain how the algorithm works even without knowing the original
> paper?

My paper is not written yet, but I will describe the high-level algorithm (+
extracts from source comments) at the end of this email, as it is a bit long.

> 5) I noticed the following ICE:
Yes, this is an unfortunate flaw - I piggyback on the setup done by branch
coverage, which means I silently assume --coverage is used. I was unsure what
to do (and meant to ask, sorry!) as it is a larger decision - should
condition-coverage also imply branch coverage? Should condition coverage work
on its own? Other options?

I'm happy to implement either strategy - please advise on what you would
prefer.


> 6) Please follow the GNU coding style, most notable we replace 8 spaces with 
> a tab. And we break line before {, (, ... Remove noexcept specifiers for 
> functions and I think most of the newly added functions can be static. And 
> each newly added function should have a comment.

Sure thing, I'll add some tabs. All the helper functions are already static (in
an unnamed namespace), but I can remove the namespace and/or add static.

> 7) Please consider supporting -fprofile-update=atomic, where you'll need to
> utilize atomic builtins (see how other instrumentation looks like with
> it).
Will do! I always intended to support the atomics, not having them there is an
oversight. To confirm, I will only need to use atomics for the global counters,
right? The per-call accumulators are local to each activation record and should
not need the atomics, or am I missing something?
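
The split being asked about could look roughly like the C++ sketch below: the
per-call accumulators stay plain locals, and only the merge into the shared
counters is atomic. This is an assumption about the shape of the answer, not
something confirmed in this thread; the names g_seen_true, g_seen_false and
flush are hypothetical, and the relaxed memory order is likewise a guess.

```cpp
#include <atomic>
#include <cstdint>

// Shared across threads, so updated atomically (hypothetical names).
std::atomic<std::uint64_t> g_seen_true{0};
std::atomic<std::uint64_t> g_seen_false{0};

// Called on function exit with the accumulators of one activation record.
// The arguments are ordinary locals and need no synchronization themselves.
void flush(std::uint64_t local_true, std::uint64_t local_false) {
  g_seen_true.fetch_or(local_true, std::memory_order_relaxed);
  g_seen_false.fetch_or(local_false, std::memory_order_relaxed);
}
```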

ALGORITHM

Phase 1: expression isolation from CFG analysis

Break the CFG into smaller pieces ("sections") by splitting it on dominators.
No expression can cross a dominator, so this becomes a natural way of
terminating expression searches.

For each section, do a BFS from the root node and collect all reachable nodes
following edges that point to "conditional" nodes, i.e. nodes with outgoing
true edges. Series of single-parent single-exit nodes are contracted.
This gives a good estimate for an expression, but might include conditions in
the if/else blocks.
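
The BFS step can be sketched on a toy graph as follows. This is only a
simplified reading of the paragraph above: the graph representation (Node,
succs, conditional) and the function name collect are invented for the
example, and the contraction of single-parent single-exit chains is omitted.

```cpp
#include <queue>
#include <set>
#include <vector>

// Toy CFG node (hypothetical representation).
struct Node {
  std::vector<int> succs;  // indices of successor nodes
  bool conditional;        // true if the node has an outgoing true edge
};

// BFS from root, following only edges that lead to conditional nodes.
std::set<int> collect(const std::vector<Node>& g, int root) {
  std::set<int> seen{root};
  std::queue<int> work;
  work.push(root);
  while (!work.empty()) {
    int n = work.front();
    work.pop();
    for (int s : g[n].succs)
      if (g[s].conditional && seen.insert(s).second)
        work.push(s);
  }
  return seen;
}
```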

The algorithm then collects the immediate neighborhood, i.e. all nodes adjacent
to the collected set, dropping everything in the set itself. Then, a new BFS is
performed, but now "upwards" (i.e. following predecessors), skipping any
neighbor not dominated by the root (this eliminates loops and other expressions
with

Re: [PATCH] c++: Fix up __builtin_{bit_cast,convertvector} parsing

2022-03-25 Thread Jason Merrill via Gcc-patches


On 3/25/22 11:58, Jakub Jelinek wrote:

Hi!

Jonathan reported on IRC that we don't parse
__builtin_bit_cast (type, val).field
etc.
The problem is that for these 2 builtins we return from
cp_parser_postfix_expression instead of setting postfix_expression
to the cp_build_* value and falling through into the postfix-expression
suffix handling loop.

Ok for trunk if it passes bootstrap/regtest?


OK.


2022-03-25  Jakub Jelinek  

* parser.cc (cp_parser_postfix_expression)
: Don't
return cp_build_{vec,convert,bit_cast} result right away, instead
set postfix_expression to it and break.

* c-c++-common/builtin-convertvector-3.c: New test.
* g++.dg/cpp2a/bit-cast15.C: New test.

--- gcc/cp/parser.cc.jj 2022-03-15 09:15:21.366108714 +0100
+++ gcc/cp/parser.cc2022-03-25 16:04:21.464248103 +0100
@@ -7525,8 +7525,10 @@ cp_parser_postfix_expression (cp_parser
}
/* Look for the closing `)'.  */
parens.require_close (parser);
-   return cp_build_vec_convert (expression, type_location, type,
-tf_warning_or_error);
+   postfix_expression
+ = cp_build_vec_convert (expression, type_location, type,
+ tf_warning_or_error);
+   break;
}
  
  case RID_BUILTIN_BIT_CAST:

@@ -7551,8 +7553,10 @@ cp_parser_postfix_expression (cp_parser
expression = cp_parser_assignment_expression (parser);
/* Look for the closing `)'.  */
parens.require_close (parser);
-   return cp_build_bit_cast (type_location, type, expression,
- tf_warning_or_error);
+   postfix_expression
+ = cp_build_bit_cast (type_location, type, expression,
+  tf_warning_or_error);
+   break;
}
  
  default:

--- gcc/testsuite/c-c++-common/builtin-convertvector-3.c.jj 2022-03-25 
16:23:18.033120090 +0100
+++ gcc/testsuite/c-c++-common/builtin-convertvector-3.c2022-03-25 
16:23:40.633799410 +0100
@@ -0,0 +1,11 @@
+/* { dg-do compile } */
+/* { dg-options "-O2" } */
+
+typedef int v4si __attribute__((vector_size (4 * sizeof (int))));
+typedef double v4df __attribute__((vector_size (4 * sizeof (double))));
+double
+foo (void)
+{
+  v4si a = { 1, 2, 3, 4 };
+  return __builtin_convertvector (a, v4df)[1];
+}
--- gcc/testsuite/g++.dg/cpp2a/bit-cast15.C.jj  2022-03-25 16:26:22.271505979 
+0100
+++ gcc/testsuite/g++.dg/cpp2a/bit-cast15.C 2022-03-25 16:26:15.907596274 
+0100
@@ -0,0 +1,19 @@
+// { dg-do compile }
+
+struct S { short a, b; };
+struct T { float a[4]; };
+struct U { int b[4]; };
+
+#if __SIZEOF_FLOAT__ == __SIZEOF_INT__
+int
+f1 (T &x)
+{
+  return __builtin_bit_cast (U, x).b[1];
+}
+
+float
+f2 (int (&x)[4])
+{
+  return __builtin_bit_cast (T, x).a[2];
+}
+#endif

Jakub
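
For readers on a compiler without this fix, parenthesizing the builtin should
sidestep the problem, because a parenthesized expression is an ordinary
primary-expression and its suffixes go through the normal postfix loop. The
sketch below assumes a compiler that provides __builtin_bit_cast, that float
and int have the same size, and IEEE-754 floats; second_word is a made-up
name.

```cpp
struct T { float a[4]; };
struct U { int b[4]; };

static_assert(sizeof(T) == sizeof(U), "sketch assumes float and int match in size");

// Reinterprets x as a U and reads its second element. The extra parentheses
// keep the member access out of the builtin's own parsing path.
int second_word(const T &x) {
  return (__builtin_bit_cast(U, x)).b[1];
}
```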

Re: [PATCH] c++: ICE when building builtin operator->* set [PR103455]

2022-03-25 Thread Jason Merrill via Gcc-patches


On 3/25/22 14:09, Patrick Palka wrote:

When constructing the builtin operator->* candidate set according to
the available conversion functions for each operand type, we end up
considering a candidate with C1=T (a TEMPLATE_TYPE_PARM) and C2=F,
during which we crash from lookup_base because dependent_type_p sees
a TEMPLATE_TYPE_PARM when processing_template_decl is cleared.

Sidestepping the question of whether we should be considering a
dependent conversion function here in the first place (which I'm not
sure about), it seems futile to check DERIVED_FROM_P for anything other
than an actual class type, so this patch fixes this ICE by guarding
the DERIVED_FROM_P test with CLASS_TYPE_P instead of MAYBE_CLASS_TYPE_P.

Bootstrapped and regtested on x86_64-pc-linux-gnu, does this look OK for
trunk and perhaps the release branches?


OK for trunk and branches.


PR c++/103455

gcc/cp/ChangeLog:

* call.cc (add_builtin_candidate) : Check
CLASS_TYPE_P instead of MAYBE_CLASS_TYPE_P.

gcc/testsuite/ChangeLog:

* g++.dg/overload/builtin6.C: New test.
---
  gcc/cp/call.cc   |  2 +-
  gcc/testsuite/g++.dg/overload/builtin6.C | 14 ++
  2 files changed, 15 insertions(+), 1 deletion(-)
  create mode 100644 gcc/testsuite/g++.dg/overload/builtin6.C

diff --git a/gcc/cp/call.cc b/gcc/cp/call.cc
index ec6c5d5baa2..dfe370d685d 100644
--- a/gcc/cp/call.cc
+++ b/gcc/cp/call.cc
@@ -2821,7 +2821,7 @@ add_builtin_candidate (struct z_candidate **candidates, 
enum tree_code code,
  tree c1 = TREE_TYPE (type1);
  tree c2 = TYPE_PTRMEM_CLASS_TYPE (type2);
  
- if (MAYBE_CLASS_TYPE_P (c1) && DERIVED_FROM_P (c2, c1)
+ if (CLASS_TYPE_P (c1) && DERIVED_FROM_P (c2, c1)
  && (TYPE_PTRMEMFUNC_P (type2)
  || is_complete (TYPE_PTRMEM_POINTED_TO_TYPE (type2))))
break;
diff --git a/gcc/testsuite/g++.dg/overload/builtin6.C 
b/gcc/testsuite/g++.dg/overload/builtin6.C
new file mode 100644
index 000..25e45040094
--- /dev/null
+++ b/gcc/testsuite/g++.dg/overload/builtin6.C
@@ -0,0 +1,14 @@
+// PR c++/103455
+
+struct A { };
+
+struct B {
+  operator A*() const;
+  template operator T*() const;
+};
+
+typedef void (A::*F)();
+
+void foo(B b, F f) {
+  (b->*f)();
+}

Re: [PATCH] c++: diagnosing if-stmt with non-constant branches [PR105050]

2022-03-25 Thread Jason Merrill via Gcc-patches


On 3/25/22 12:07, Patrick Palka wrote:

When an if-stmt is deemed non-constant because both of its branches are
non-constant, we issue a rather generic error which, given that it points
to the 'if' token, misleadingly suggests the condition is at fault:

   constexpr-105050.C:8:3: error: expression ‘’ is not a constant 
expression
   8 |   if (p != q && *p < 0)
 |   ^~

This patch clarifies the error message to read:

   constexpr-105050.C:8:3: error: neither branch of ‘if’ is a constant 
expression
   8 |   if (p != q && *p < 0)
 |   ^~

Bootstrapped and regtested on x86_64-pc-linux-gnu, does this look OK for
trunk?


OK.  I wonder if we want to then diagnose the branches individually, but 
let's leave that for stage 1, if at all.



PR c++/105050

gcc/cp/ChangeLog:

* constexpr.cc (potential_constant_expression_1) :
Clarify error message when a if-stmt is non-constant because its
branches are non-constant.

gcc/testsuite/ChangeLog:

* g++.dg/cpp1y/constexpr-105050.C: New test.
---
  gcc/cp/constexpr.cc   |  7 ++-
  gcc/testsuite/g++.dg/cpp1y/constexpr-105050.C | 12 
  2 files changed, 18 insertions(+), 1 deletion(-)
  create mode 100644 gcc/testsuite/g++.dg/cpp1y/constexpr-105050.C

diff --git a/gcc/cp/constexpr.cc b/gcc/cp/constexpr.cc
index 778680b8270..9c40b051574 100644
--- a/gcc/cp/constexpr.cc
+++ b/gcc/cp/constexpr.cc
@@ -9439,7 +9439,12 @@ potential_constant_expression_1 (tree t, bool want_rval, 
bool strict, bool now,
}
}
if (flags & tf_error)
-   error_at (loc, "expression %qE is not a constant expression", t);
+   {
+ if (TREE_CODE (t) == IF_STMT)
+   error_at (loc, "neither branch of %<if%> is a constant expression");
+ else
+   error_at (loc, "expression %qE is not a constant expression", t);
+   }
return false;
  
  case VEC_INIT_EXPR:

diff --git a/gcc/testsuite/g++.dg/cpp1y/constexpr-105050.C 
b/gcc/testsuite/g++.dg/cpp1y/constexpr-105050.C
new file mode 100644
index 000..99d5c9960ac
--- /dev/null
+++ b/gcc/testsuite/g++.dg/cpp1y/constexpr-105050.C
@@ -0,0 +1,12 @@
+// PR c++/105050
+// { dg-do compile { target c++14 } }
+
+void g();
+void h();
+
+constexpr void f(int* p, int *q) {
+  if (p != q && *p < 0) // { dg-error "neither branch of 'if' is a constant 
expression" }
+g();
+  else
+h();
+}

[PATCH v3] Document that the 'access' and 'nonnull' attributes are independent

2022-03-25 Thread David Malcolm via Gcc-patches

On Wed, 2022-03-23 at 17:52 +0100, Sebastian Huber wrote:
> On 23/03/2022 17:31, Martin Sebor via Gcc-patches wrote:
> > 
> > The concern is that the constraints implied by attributes access and
> > nonnull are independent of each other.  I would suggest to document
> > that without talking about dereferencing because that's not implied
> > by either of them.  E.g., something like this (feel free to tweak it
> > as you see fit):
> > 
> >    Note that the @code{access} attribute doesn't imply the same
> >    constraint as attribute @code{nonnull} (@pxref{Attribute nonnull}).
> >    The latter attribute should be used to annotate arguments that must
> >    never be null, regardless of the value of the size argument.
> 
> I would not give advice on using the nonnull attribute here. This
> attribute could have pretty dangerous effects in the function definition
> (removal of null pointer checks).
> 

That's a fair point.

Here's a v3 of the patch, which tones down the advice, and mentions that
there are caveats when directing the reader to the "nonnull" attribute.

How does this look?

gcc/ChangeLog:
* doc/extend.texi (Common Function Attributes): Document that
'access' does not imply 'nonnull'.

Signed-off-by: David Malcolm 
---
 gcc/doc/extend.texi | 8 
 1 file changed, 8 insertions(+)

diff --git a/gcc/doc/extend.texi b/gcc/doc/extend.texi
index a4a25e86928..539dad7001d 100644
--- a/gcc/doc/extend.texi
+++ b/gcc/doc/extend.texi
@@ -2652,6 +2652,14 @@ The mode is intended to be used as a means to help 
validate the expected
 object size, for example in functions that call @code{__builtin_object_size}.
 @xref{Object Size Checking}.
 
+Note that the @code{access} attribute merely specifies how an object
+referenced by the pointer argument can be accessed; it does not imply that
+an access @strong{will} happen.  Also, the @code{access} attribute does not
+imply the attribute @code{nonnull}; it may be appropriate to add both 
attributes
+at the declaration of a function that unconditionally manipulates a buffer via
+a pointer argument.  See the @code{nonnull} attribute for more information and
+caveats.
+
 @item alias ("@var{target}")
 @cindex @code{alias} function attribute
 The @code{alias} attribute causes the declaration to be emitted as an alias
-- 
2.26.3
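
To illustrate the documentation text in the patch above, here is a small C++
sketch of the two situations it distinguishes. The function names maybe_fill
and always_fill are invented for this example; the attribute spellings are the
GCC ones, and a compiler that does not know `access` is expected to ignore it
with a warning.

```cpp
#include <cstddef>
#include <cstring>

// 'access' alone: describes how the buffer may be accessed, but does not
// promise that an access happens, nor that the pointer is non-null.
__attribute__((access(write_only, 1, 2)))
bool maybe_fill(char *buf, std::size_t n) {
  if (buf == nullptr)  // this check stays meaningful without 'nonnull'
    return false;
  std::memset(buf, 0, n);
  return true;
}

// Both attributes: the function unconditionally writes through the pointer,
// so the caller must never pass null. Note the caveat from the thread: with
// 'nonnull', null checks inside the definition may be optimized away.
__attribute__((access(write_only, 1, 2), nonnull(1)))
void always_fill(char *buf, std::size_t n) {
  std::memset(buf, 0, n);
}
```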

[PATCH] c++: ICE when building builtin operator->* set [PR103455]

2022-03-25 Thread Patrick Palka via Gcc-patches

When constructing the builtin operator->* candidate set according to
the available conversion functions for each operand type, we end up
considering a candidate with C1=T (a TEMPLATE_TYPE_PARM) and C2=F,
during which we crash from lookup_base because dependent_type_p sees
a TEMPLATE_TYPE_PARM when processing_template_decl is cleared.

Sidestepping the question of whether we should be considering a
dependent conversion function here in the first place (which I'm not
sure about), it seems futile to check DERIVED_FROM_P for anything other
than an actual class type, so this patch fixes this ICE by guarding
the DERIVED_FROM_P test with CLASS_TYPE_P instead of MAYBE_CLASS_TYPE_P.

Bootstrapped and regtested on x86_64-pc-linux-gnu, does this look OK for
trunk and perhaps the release branches?

PR c++/103455

gcc/cp/ChangeLog:

* call.cc (add_builtin_candidate) : Check
CLASS_TYPE_P instead of MAYBE_CLASS_TYPE_P.

gcc/testsuite/ChangeLog:

* g++.dg/overload/builtin6.C: New test.
---
 gcc/cp/call.cc   |  2 +-
 gcc/testsuite/g++.dg/overload/builtin6.C | 14 ++
 2 files changed, 15 insertions(+), 1 deletion(-)
 create mode 100644 gcc/testsuite/g++.dg/overload/builtin6.C

diff --git a/gcc/cp/call.cc b/gcc/cp/call.cc
index ec6c5d5baa2..dfe370d685d 100644
--- a/gcc/cp/call.cc
+++ b/gcc/cp/call.cc
@@ -2821,7 +2821,7 @@ add_builtin_candidate (struct z_candidate **candidates, 
enum tree_code code,
  tree c1 = TREE_TYPE (type1);
  tree c2 = TYPE_PTRMEM_CLASS_TYPE (type2);
 
- if (MAYBE_CLASS_TYPE_P (c1) && DERIVED_FROM_P (c2, c1)
+ if (CLASS_TYPE_P (c1) && DERIVED_FROM_P (c2, c1)
  && (TYPE_PTRMEMFUNC_P (type2)
  || is_complete (TYPE_PTRMEM_POINTED_TO_TYPE (type2))))
break;
diff --git a/gcc/testsuite/g++.dg/overload/builtin6.C 
b/gcc/testsuite/g++.dg/overload/builtin6.C
new file mode 100644
index 000..25e45040094
--- /dev/null
+++ b/gcc/testsuite/g++.dg/overload/builtin6.C
@@ -0,0 +1,14 @@
+// PR c++/103455
+
+struct A { };
+
+struct B {
+  operator A*() const;
+  template operator T*() const;
+};
+
+typedef void (A::*F)();
+
+void foo(B b, F f) {
+  (b->*f)();
+}
-- 
2.35.1.655.ga68dfadae5

Re: [PATCH] c++: Fix up ICE when cplus_decl_attributes is called with error_mark_node attributes [PR104668]

2022-03-25 Thread Jason Merrill via Gcc-patches


On 3/25/22 12:34, Jakub Jelinek wrote:

Hi!

cplus_decl_attributes can be called with attributes equal to
error_mark_node.  Some spots in the function test for it, and
decl_attributes, which it calls, starts with:
   if (TREE_TYPE (*node) == error_mark_node || attributes == error_mark_node)
 return NULL_TREE;
But the recent PR104245 change broke this when processing_template_decl
is true.

This fixes it and also fixes an OpenMP problem with such attributes.

Ok for trunk if it passes bootstrap/regtest?

2022-03-25  Jakub Jelinek  

PR c++/104668
* decl2.cc (splice_template_attributes): Return NULL if *p is
error_mark_node.
(cplus_decl_attributes): Don't chain on OpenMP attributes if
attributes is error_mark_node.

* g++.dg/cpp0x/pr104668.C: New test.

--- gcc/cp/decl2.cc.jj  2022-03-09 09:09:55.415843331 +0100
+++ gcc/cp/decl2.cc 2022-03-25 17:17:27.769036749 +0100
@@ -1336,7 +1336,7 @@ splice_template_attributes (tree *attr_p
tree late_attrs = NULL_TREE;
tree *q = &late_attrs;
  
-  if (!p)

+  if (!p || *p == error_mark_node)
  return NULL_TREE;
  
for (; *p; )

@@ -1644,6 +1644,8 @@ cplus_decl_attributes (tree *decl, tree
  && DECL_CLASS_SCOPE_P (*decl))
error ("%q+D static data member inside of declare target directive",
   *decl);
+  else if (attributes == error_mark_node)
+   ;


Why not check at the beginning of the function?


else if (VAR_P (*decl)
   && (processing_template_decl
   || !cp_omp_mappable_type (TREE_TYPE (*decl))))
--- gcc/testsuite/g++.dg/cpp0x/pr104668.C.jj2022-03-25 17:25:42.280068058 
+0100
+++ gcc/testsuite/g++.dg/cpp0x/pr104668.C   2022-03-25 17:24:44.862881444 
+0100
@@ -0,0 +1,13 @@
+// PR c++/104668
+// { dg-do compile { target c++11 } }
+// { dg-excess-errors "" }
+
+template 
+void sink(Ts...);
+template 
+void f(Ts...) {
+  sink([] { struct alignas:Ts) S {}; }...); }
+}
+int main() {
+  f(0);
+}

Jakub

[PATCH] PR102024 - IBM Z: Add psabi diagnostics

2022-03-25 Thread Andreas Krebbel via Gcc-patches

For IBM Z in particular there is a problem with structs like:

struct A { float a; int :0; };

Our ABI document allows passing a struct in an FPR only if it has
exactly one member. On the other hand it says that structs of 1,2,4,8
bytes are passed in a GPR. So this struct is expected to be passed in
a GPR. Since we don't return structs in registers (regardless of the
number of members) it is always returned in memory.

The situation is as follows:

All compiler versions tested return it in memory - as expected.

gcc 11, gcc 12, g++ 12, and clang 13 pass it in a GPR - as expected.

g++ 11 as well as clang++ 13 pass it in an FPR.

For IBM Z we stick to the current GCC 12 behavior, i.e. zero-width
bitfields are NOT ignored.  A struct as above will be passed in a
GPR.  The rationale behind this is that not affecting the C ABI is more
important here.
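
To make the difference between the compilers concrete, here is a toy model
of the decision described above.  It is only a sketch: FieldKind, Field,
PassIn and classify_arg are hypothetical names that do not appear in
s390.cc, and the model ignores vector registers, nested records and
empty bases.

```cpp
#include <vector>

// Hypothetical description of one struct member.
enum class FieldKind { Float, Int, ZeroWidthBitfield };

struct Field {
  FieldKind kind;
  unsigned size_bytes;  // 0 for a zero-width bit-field
};

enum class PassIn { FPR, GPR, Memory };

// ignore_zero_width_bf == false models the GCC 12 choice (the bit-field
// counts as a member); true models what g++ 11 and clang++ 13 did.
PassIn classify_arg(const std::vector<Field>& fields,
                    bool ignore_zero_width_bf) {
  unsigned members = 0;
  unsigned total_size = 0;
  bool only_float = true;

  for (const Field& f : fields) {
    if (f.kind == FieldKind::ZeroWidthBitfield) {
      if (!ignore_zero_width_bf)
        ++members;  // occupies no storage, but still counts as a member
      continue;
    }
    ++members;
    total_size += f.size_bytes;
    if (f.kind != FieldKind::Float)
      only_float = false;
  }

  // Exactly one member, and it is a float: treated like that member.
  if (members == 1 && only_float && total_size > 0)
    return PassIn::FPR;

  // Structs of 1, 2, 4 or 8 bytes travel in a GPR.
  if (total_size == 1 || total_size == 2 || total_size == 4
      || total_size == 8)
    return PassIn::GPR;

  return PassIn::Memory;
}
```

For struct A above, the model yields GPR when the bit-field is counted and
FPR when it is ignored, which is the divergence the psABI diagnostic is
meant to flag.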

A patch for clang is in progress: https://reviews.llvm.org/D122388

In addition to the usual regression test I ran the compat and
struct-layout-1 testsuites comparing the compiler before and after the
patch.

gcc/ChangeLog:
PR target/102024
* config/s390/s390-protos.h (s390_function_arg_vector): Remove
prototype.
* config/s390/s390.cc (s390_single_field_struct_p): New function.
(s390_function_arg_vector): Invoke s390_single_field_struct_p.
(s390_function_arg_float): Likewise.

gcc/testsuite/ChangeLog:
PR target/102024
* g++.target/s390/pr102024-1.C: New test.
* g++.target/s390/pr102024-2.C: New test.
* g++.target/s390/pr102024-3.C: New test.
* g++.target/s390/pr102024-4.C: New test.
* g++.target/s390/pr102024-5.C: New test.
* g++.target/s390/pr102024-6.C: New test.
---
 gcc/config/s390/s390-protos.h  |   1 -
 gcc/config/s390/s390.cc| 212 +++--
 gcc/testsuite/g++.target/s390/pr102024-1.C |  12 ++
 gcc/testsuite/g++.target/s390/pr102024-2.C |  14 ++
 gcc/testsuite/g++.target/s390/pr102024-3.C |  15 ++
 gcc/testsuite/g++.target/s390/pr102024-4.C |  15 ++
 gcc/testsuite/g++.target/s390/pr102024-5.C |  14 ++
 gcc/testsuite/g++.target/s390/pr102024-6.C |  12 ++
 8 files changed, 195 insertions(+), 100 deletions(-)
 create mode 100644 gcc/testsuite/g++.target/s390/pr102024-1.C
 create mode 100644 gcc/testsuite/g++.target/s390/pr102024-2.C
 create mode 100644 gcc/testsuite/g++.target/s390/pr102024-3.C
 create mode 100644 gcc/testsuite/g++.target/s390/pr102024-4.C
 create mode 100644 gcc/testsuite/g++.target/s390/pr102024-5.C
 create mode 100644 gcc/testsuite/g++.target/s390/pr102024-6.C

diff --git a/gcc/config/s390/s390-protos.h b/gcc/config/s390/s390-protos.h
index e6251595870..fd4acaae44a 100644
--- a/gcc/config/s390/s390-protos.h
+++ b/gcc/config/s390/s390-protos.h
@@ -49,7 +49,6 @@ extern void s390_function_profiler (FILE *, int);
 extern void s390_set_has_landing_pad_p (bool);
 extern bool s390_hard_regno_rename_ok (unsigned int, unsigned int);
 extern int s390_class_max_nregs (enum reg_class, machine_mode);
-extern bool s390_function_arg_vector (machine_mode, const_tree);
 extern bool s390_return_addr_from_memory(void);
 extern bool s390_fma_allowed_p (machine_mode);
 #if S390_USE_TARGET_ATTRIBUTE
diff --git a/gcc/config/s390/s390.cc b/gcc/config/s390/s390.cc
index d2af6d8813d..6cfa586b9cd 100644
--- a/gcc/config/s390/s390.cc
+++ b/gcc/config/s390/s390.cc
@@ -12148,29 +12148,29 @@ s390_function_arg_size (machine_mode mode, const_tree 
type)
   gcc_unreachable ();
 }
 
-/* Return true if a function argument of type TYPE and mode MODE
-   is to be passed in a vector register, if available.  */
-
-bool
-s390_function_arg_vector (machine_mode mode, const_tree type)
+/* Return true if a variable of TYPE should be passed as single value
+   with type CODE. If STRICT_SIZE_CHECK_P is true the sizes of the
+   record type and the field type must match.
+
+   The ABI says that record types with a single member are treated
+   just like that member would be.  This function is a helper to
+   detect such cases.  The function also produces the proper
+   diagnostics for cases where the outcome might be different
+   depending on the GCC version.  */
+static bool
+s390_single_field_struct_p (enum tree_code code, const_tree type,
+   bool strict_size_check_p)
 {
-  if (!TARGET_VX_ABI)
-return false;
-
-  if (s390_function_arg_size (mode, type) > 16)
-return false;
-
-  /* No type info available for some library calls ...  */
-  if (!type)
-return VECTOR_MODE_P (mode);
-
-  /* The ABI says that record types with a single member are treated
- just like that member would be.  */
   int empty_base_seen = 0;
+  bool zero_width_bf_seen_p = false;
   const_tree orig_type = type;
+  bool single_p = true;
+
   while (TREE_CODE (type) == RECORD_TYPE)
 {
-  tree field, single = NULL_TREE;
+  tree field, single_type = NULL_TREE;
+  int num_zero_width_bf_seen = 0;
+  int num_fields_seen = 0;

Re: [PATCH] arm: Revert Auto-vectorization for MVE: add pack/unpack patterns PR target/104882

2022-03-25 Thread Christophe Lyon via Gcc-patches





On 3/25/22 11:42, Jakub Jelinek wrote:

On Tue, Mar 22, 2022 at 03:33:44PM +0100, Christophe Lyon via Gcc-patches wrote:

This reverts commit r12-1434-g046a3beb1673bf to fix PR target/104882.

As discussed in the PR, it turns out that the MVE ISA has no natural
mapping with GCC's vec_pack_trunc / vec_unpack standard patterns, unlike
Neon or SVE for instance.

This patch also adds the executable testcase provided in the PR.
This test passes at -O3 because the generated code does not need
to use the pack/unpack patterns, hence the use of -O2 which now
triggers vectorization since a few months ago.


For reverting your own patches you don't need to wait for approval:
https://gcc.gnu.org/gitwrite.html
"Similarly, no outside approval is needed to revert a patch that you checked 
in."

The new test LGTM.


Thanks, just pushed as r12-7414-g1027dc45920489.

Christophe




2022-03-18  Christophe Lyon  

PR target/104882
Revert
2021-06-11  Christophe Lyon  

gcc/
* config/arm/mve.md (mve_vec_unpack_lo_): Delete.
(mve_vec_unpack_hi_): Delete.
(@mve_vec_pack_trunc_lo_): Delete.
(mve_vmovntq_): Remove '@' prefix.
* config/arm/neon.md (vec_unpack_hi_): Move back
from vec-common.md.
(vec_unpack_lo_): Likewise.
(vec_pack_trunc_): Rename from
neon_quad_vec_pack_trunc_.
* config/arm/vec-common.md (vec_unpack_hi_): Delete.
(vec_unpack_lo_): Delete.
(vec_pack_trunc_): Delete.

PR target/104882
gcc/testsuite/
* gcc.target/arm/simd/mve-vclz.c: Update expected results.
* gcc.target/arm/simd/mve-vshl.c: Likewise.
* gcc.target/arm/simd/mve-vec-pack.c: Delete.
* gcc.target/arm/simd/mve-vec-unpack.c: Delete.
* gcc.target/arm/simd/pr104882.c: New test.


Jakub

[PATCH] x86: Use x constraint on SSSE3 patterns with MMX operands

2022-03-25 Thread H.J. Lu via Gcc-patches

Since PHADDW/PHADDD/PHADDSW/PHSUBW/PHSUBD/PHSUBSW/PSIGNB/PSIGNW/PSIGND
have no AVX512 version, replace the "Yv" register constraint with the
"x" register constraint.

PR target/105052
* config/i386/sse.md (ssse3_phwv4hi3):
Replace "Yv" with "x".
(ssse3_phdv2si3): Likewise.
(ssse3_psign3): Likewise.
---
 gcc/config/i386/sse.md | 18 +-
 1 file changed, 9 insertions(+), 9 deletions(-)

diff --git a/gcc/config/i386/sse.md b/gcc/config/i386/sse.md
index 6f7af2f21d6..aae29cd462f 100644
--- a/gcc/config/i386/sse.md
+++ b/gcc/config/i386/sse.md
@@ -20112,12 +20112,12 @@ (define_insn "ssse3_phwv8hi3"
(set_attr "mode" "TI")])
 
 (define_insn_and_split "ssse3_phwv4hi3"
-  [(set (match_operand:V4HI 0 "register_operand" "=y,x,Yv")
+  [(set (match_operand:V4HI 0 "register_operand" "=y,x,x")
(ssse3_plusminus:V4HI
  (vec_select:V4HI
(vec_concat:V8HI
- (match_operand:V4HI 1 "register_operand" "0,0,Yv")
- (match_operand:V4HI 2 "register_mmxmem_operand" "ym,x,Yv"))
+ (match_operand:V4HI 1 "register_operand" "0,0,x")
+ (match_operand:V4HI 2 "register_mmxmem_operand" "ym,x,x"))
(parallel
  [(const_int 0) (const_int 2) (const_int 4) (const_int 6)]))
  (vec_select:V4HI
@@ -20199,12 +20199,12 @@ (define_insn "ssse3_phdv4si3"
(set_attr "mode" "TI")])
 
 (define_insn_and_split "ssse3_phdv2si3"
-  [(set (match_operand:V2SI 0 "register_operand" "=y,x,Yv")
+  [(set (match_operand:V2SI 0 "register_operand" "=y,x,x")
(plusminus:V2SI
  (vec_select:V2SI
(vec_concat:V4SI
- (match_operand:V2SI 1 "register_operand" "0,0,Yv")
- (match_operand:V2SI 2 "register_mmxmem_operand" "ym,x,Yv"))
+ (match_operand:V2SI 1 "register_operand" "0,0,x")
+ (match_operand:V2SI 2 "register_mmxmem_operand" "ym,x,x"))
(parallel [(const_int 0) (const_int 2)]))
  (vec_select:V2SI
(vec_concat:V4SI (match_dup 1) (match_dup 2))
@@ -20702,10 +20702,10 @@ (define_insn "_psign3"
(set_attr "mode" "")])
 
 (define_insn "ssse3_psign3"
-  [(set (match_operand:MMXMODEI 0 "register_operand" "=y,x,Yv")
+  [(set (match_operand:MMXMODEI 0 "register_operand" "=y,x,x")
(unspec:MMXMODEI
- [(match_operand:MMXMODEI 1 "register_operand" "0,0,Yv")
-  (match_operand:MMXMODEI 2 "register_mmxmem_operand" "ym,x,Yv")]
+ [(match_operand:MMXMODEI 1 "register_operand" "0,0,x")
+  (match_operand:MMXMODEI 2 "register_mmxmem_operand" "ym,x,x")]
  UNSPEC_PSIGN))]
   "(TARGET_MMX || TARGET_MMX_WITH_SSE) && TARGET_SSSE3"
   "@
-- 
2.35.1

Re: [PATCH] recog: Return 1 from insn_invalid_p if REG_INC reg overlaps some stored reg [PR103775]

2022-03-25 Thread Jeff Law via Gcc-patches





On 3/25/2022 4:17 AM, Jakub Jelinek wrote:

Hi!

The following testcase ICEs on aarch64-linux with -g and
assembles with a warning otherwise, because it emits
ldrb w0,[x0,16]!
instruction which sets the x0 register multiple times.
Due to disabled DCE (from -Og) we end up before REE with:
(insn 12 39 13 2 (set (reg:SI 1 x1 [orig:93 _2 ] [93])
 (zero_extend:SI (mem/c:QI (pre_modify:DI (reg/f:DI 0 x0 [114])
 (plus:DI (reg/f:DI 0 x0 [114])
 (const_int 16 [0x10]))) [1 u128_1+0 S1 A128]))) 
"pr103775.c":5:35 117 {*zero_extendqisi2_aarch64}
  (expr_list:REG_INC (reg/f:DI 0 x0 [114])
 (nil)))
(insn 13 12 14 2 (set (reg:DI 0 x0 [orig:112 _2 ] [112])
 (zero_extend:DI (reg:SI 1 x1 [orig:93 _2 ] [93]))) "pr103775.c":5:16 
111 {*zero_extendsidi2_aarch64}
  (nil))
which is valid but not exactly efficient as x0 is dead after the
insn that auto-increments it.  REE turns it into:
(insn 12 39 44 2 (set (reg:DI 0 x0)
 (zero_extend:DI (mem/c:QI (pre_modify:DI (reg/f:DI 0 x0 [114])
 (plus:DI (reg/f:DI 0 x0 [114])
 (const_int 16 [0x10]))) [1 u128_1+0 S1 A128]))) 
"pr103775.c":5:35 119 {*zero_extendqidi2_aarch64}
  (expr_list:REG_INC (reg/f:DI 0 x0 [114])
 (nil)))
(insn 44 12 14 2 (set (reg:DI 1 x1)
 (reg:DI 0 x0)) "pr103775.c":5:35 -1
  (nil))
which is invalid because it sets x0 multiple times, once
in SET_DEST of the PATTERN and once in PRE_MODIFY.
As perhaps other passes than REE might suffer from it, IMHO it is better
to reject this during change validation.
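
The kind of check being proposed can be illustrated outside of GCC with a
small model.  This is a sketch under stated assumptions, not the code in
recog.cc: ToyInsn and sets_reg_twice are hypothetical names, registers are
plain numbers, and multi-register hard regs (where a real overlap test is
needed) are ignored.

```cpp
#include <set>
#include <vector>

// Hypothetical, heavily simplified stand-in for an insn: only the
// registers it stores to and the registers it auto-modifies.
struct ToyInsn {
  std::vector<unsigned> stored_regs;   // registers written via SET_DEST
  std::vector<unsigned> autoinc_regs;  // registers named in REG_INC notes
};

// Returns true if some auto-modified register is also stored to by the
// same insn, i.e. the register would be set twice.
bool sets_reg_twice(const ToyInsn& insn) {
  std::set<unsigned> stored(insn.stored_regs.begin(),
                            insn.stored_regs.end());
  for (unsigned regno : insn.autoinc_regs)
    if (stored.count(regno) != 0)
      return true;
  return false;
}
```

With the insns quoted above, the pre-REE insn 12 stores x1 and
auto-modifies x0, so the model accepts it; the post-REE insn 12 stores x0
and auto-modifies x0, so the model rejects it.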

Below is one patch that does that only if reload_completed, attached
is another version that does it always even before reload.

I've so far bootstrapped/regtested the first patch on
{x86_64,i686,powerpc64le,armv7hl}-linux, aarch64-linux regtest
is still pending.
If you prefer the second version, I can start testing it momentarily.

2022-03-25  Jakub Jelinek

PR rtl-optimization/103775
* recog.cc (check_invalid_inc_dec): New function.
(insn_invalid_p): Return 1 if REG_INC operand overlaps
any stored REGs.

* gcc.dg/pr103775.c: New test.

OK.  I bet this could also be extended to catch the case where the 
autoinc'd operand is used as a source elsewhere in the insn.  That's 
supposed to be avoided in RTL, but we don't check it and I've seen it 
sneak in (see PR 101697):



(insn 1444 1443 164 31 (parallel [
 (set (mem/f:SI (pre_dec:SI (reg/f:SI 7 sp)) [3  S4 A32])
 (reg/f:SI 7 sp))
 (clobber (reg:CC 12 cc))
 ]) "libc/inet/getaddrinfo.c":466:11 -1
  (expr_list:REG_ARGS_SIZE (const_int 4 [0x4])
 (nil)))
While this has well-defined semantics on the H8, RTL considers it 
ill-defined but never checks for it.



If a register used as the operand of these expressions is used in
another address in an insn, the original value of the register is used.
Uses of the register outside of an address are not permitted within the
same insn as a use in an embedded side effect expression because such
insns behave differently on different machines and hence must be treated
as ambiguous and disallowed.



Jeff

Re: [aarch64] Implement determine_suggested_unroll_factor

2022-03-25 Thread Andre Vieira (lists) via Gcc-patches


Hi,

Addressed all of your comments bar the pred ops one.

Is this OK?


gcc/ChangeLog:

    * config/aarch64/aarch64.cc (aarch64_vector_costs): Define
    determine_suggested_unroll_factor and m_nosve_pattern.
    (determine_suggested_unroll_factor): New function.
    (aarch64_vector_costs::add_stmt_cost): Check for a qualifying pattern
    to set m_nosve_pattern.
    (aarch64_vector_costs::finish_costs): Use
    determine_suggested_unroll_factor.
    * config/aarch64/aarch64.opt (aarch64-vect-unroll-limit): New.

On 16/03/2022 18:01, Richard Sandiford wrote:

"Andre Vieira (lists)"  writes:

Hi,

This patch implements the costing function
determine_suggested_unroll_factor for aarch64.
It determines the unrolling factor by dividing the number of X
operations we can do per cycle by the number of X operations in the loop
body, taking this information from the vec_ops analysis during vector
costing and the available issue_info information.
We multiply the dividend by a potential reduction_latency, to improve
our pipeline utilization if we are stalled waiting on a particular
reduction operation.
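
As a rough illustration of the arithmetic described above (a sketch only,
not the code in the patch): OpCount, ceil_div and suggest_unroll are
hypothetical names, the per-cycle numbers stand in for what issue_info
would provide, and taking the minimum over operation kinds plus the
user-visible cap is an assumption about how the individual ratios are
combined.

```cpp
#include <algorithm>
#include <vector>

// Hypothetical per-kind counts: how many operations of this kind the
// core can issue per cycle, and how many the loop body contains.
struct OpCount {
  unsigned per_cycle;
  unsigned in_body;
};

unsigned ceil_div(unsigned a, unsigned b) { return (a + b - 1) / b; }

// For each operation kind, compute
//   ceil (per_cycle * reduction_latency / in_body)
// and let the most constrained kind (and the cap) decide.
unsigned suggest_unroll(const std::vector<OpCount>& ops,
                        unsigned reduction_latency, unsigned limit) {
  // A latency of 0 means "no reduction to wait for"; treat it as 1.
  unsigned latency = std::max(1u, reduction_latency);
  unsigned factor = limit;

  for (const OpCount& op : ops) {
    if (op.in_body == 0)
      continue;  // this kind does not constrain the loop
    factor = std::min(factor, ceil_div(op.per_cycle * latency, op.in_body));
  }
  return std::max(1u, factor);
}
```

For example, with 2 loads per cycle against 1 load in the body and 4
general ops per cycle against 1 in the body, a cap of 4 gives a factor of
2; a reduction latency of 2 raises that to 4.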

Right now we also have a workaround for vectorization choices where the
main loop uses a NEON mode and predication is available, such that if
the main loop makes use of a NEON pattern that is not directly supported
by SVE we do not unroll, as that might cause performance regressions in
cases where we would enter the original main loop's VF. As an example if
you have a loop where you could use AVG_CEIL with a V8HI mode, you would
originally get 8x NEON using AVG_CEIL followed by a 8x SVE predicated
epilogue, using other instructions. Whereas with the unrolling you would
end up with 16x AVG_CEIL NEON + 8x SVE predicated loop, thus skipping
the original 8x NEON. In the future, we could handle this differently,
by either using a different costing model for epilogues, or potentially
vectorizing more than one single epilogue.

gcc/ChangeLog:

      * config/aarch64/aarch64.cc (aarch64_vector_costs): Define
determine_suggested_unroll_factor.
      (determine_suggested_unroll_factor): New function.
      (aarch64_vector_costs::finish_costs): Use
determine_suggested_unroll_factor.

diff --git a/gcc/config/aarch64/aarch64.cc b/gcc/config/aarch64/aarch64.cc
index b5687aab59f630920e51b742b80a540c3a56c6c8..9d3a607d378d6a2792efa7c6dece2a65c24e4521 100644
--- a/gcc/config/aarch64/aarch64.cc
+++ b/gcc/config/aarch64/aarch64.cc
@@ -15680,6 +15680,7 @@ private:
unsigned int adjust_body_cost (loop_vec_info, const aarch64_vector_costs *,
 unsigned int);
bool prefer_unrolled_loop () const;
+  unsigned int determine_suggested_unroll_factor ();
  
/* True if we have performed one-time initialization based on the

   vec_info.  */
@@ -16768,6 +16769,105 @@ adjust_body_cost_sve (const aarch64_vec_op_count *ops,
return sve_cycles_per_iter;
  }
  
+unsigned int

+aarch64_vector_costs::determine_suggested_unroll_factor ()
+{
+  auto *issue_info = aarch64_tune_params.vec_costs->issue_info;
+  if (!issue_info)
+return 1;
+  bool sve = false;
+  if (aarch64_sve_mode_p (m_vinfo->vector_mode))

Other code uses m_vec_flags & VEC_ANY_SVE for this.


+{
+  if (!issue_info->sve)
+   return 1;
+  sve = true;
+}
+  else
+{
+  if (!issue_info->advsimd)
+   return 1;

The issue info should instead be taken from vec_ops.simd_issue_info ()
in the loop below.  It can vary for each entry.


+  /* If we are trying to unroll a NEON main loop that contains patterns

s/a NEON/an Advanced SIMD/


+that we do not support with SVE and we might use a predicated
+epilogue, we need to be conservative and block unrolling as this might
+lead to a less optimal loop for the first and only epilogue using the
+original loop's vectorization factor.
+TODO: Remove this constraint when we add support for multiple epilogue
+vectorization.  */
+  if (partial_vectors_supported_p ()
+ && param_vect_partial_vector_usage != 0
+ && !TARGET_SVE2)
+   {
+ unsigned int i;
+ stmt_vec_info stmt_vinfo;
+ FOR_EACH_VEC_ELT (m_vinfo->stmt_vec_infos, i, stmt_vinfo)
+   {
+ if (is_pattern_stmt_p (stmt_vinfo))
+   {
+ gimple *stmt = stmt_vinfo->stmt;
+ if (is_gimple_call (stmt)
+ && gimple_call_internal_p (stmt))
+   {
+ enum internal_fn ifn
+   = gimple_call_internal_fn (stmt);
+ switch (ifn)
+   {
+   case IFN_AVG_FLOOR:
+   case IFN_AVG_CEIL:
+ return 1;
+   default:
+ break;
+   }
+   }
+   }
+   }
+   }

I think we should instead

[PATCH] c++: Fix up ICE when cplus_decl_attributes is called with error_mark_node attributes [PR104668]

2022-03-25 Thread Jakub Jelinek via Gcc-patches

Hi!

cplus_decl_attributes can be called with attributes equal to
error_mark_node.  Some spots in the function test for it, and
decl_attributes, which it calls, starts with:
  if (TREE_TYPE (*node) == error_mark_node || attributes == error_mark_node)
return NULL_TREE;
But the recent PR104245 change broke this when processing_template_decl
is true.

This fixes it and also fixes an OpenMP problem with such attributes.

Ok for trunk if it passes bootstrap/regtest?

2022-03-25  Jakub Jelinek  

PR c++/104668
* decl2.cc (splice_template_attributes): Return NULL if *p is
error_mark_node.
(cplus_decl_attributes): Don't chain on OpenMP attributes if
attributes is error_mark_node.

* g++.dg/cpp0x/pr104668.C: New test.

--- gcc/cp/decl2.cc.jj  2022-03-09 09:09:55.415843331 +0100
+++ gcc/cp/decl2.cc 2022-03-25 17:17:27.769036749 +0100
@@ -1336,7 +1336,7 @@ splice_template_attributes (tree *attr_p
   tree late_attrs = NULL_TREE;
   tree *q = &late_attrs;
 
-  if (!p)
+  if (!p || *p == error_mark_node)
 return NULL_TREE;
 
   for (; *p; )
@@ -1644,6 +1644,8 @@ cplus_decl_attributes (tree *decl, tree
  && DECL_CLASS_SCOPE_P (*decl))
error ("%q+D static data member inside of declare target directive",
   *decl);
+  else if (attributes == error_mark_node)
+   ;
   else if (VAR_P (*decl)
   && (processing_template_decl
   || !cp_omp_mappable_type (TREE_TYPE (*decl))))
--- gcc/testsuite/g++.dg/cpp0x/pr104668.C.jj2022-03-25 17:25:42.280068058 
+0100
+++ gcc/testsuite/g++.dg/cpp0x/pr104668.C   2022-03-25 17:24:44.862881444 
+0100
@@ -0,0 +1,13 @@
+// PR c++/104668
+// { dg-do compile { target c++11 } }
+// { dg-excess-errors "" }
+
+template 
+void sink(Ts...);
+template 
+void f(Ts...) {
+  sink([] { struct alignas:Ts) S {}; }...); }
+}
+int main() {
+  f(0);
+}

Jakub

[committed] [PR104971] LRA: check live hard regs to remove a dead insn

2022-03-25 Thread Vladimir Makarov via Gcc-patches


The following patch is for

https://gcc.gnu.org/bugzilla/show_bug.cgi?id=104971

The PR was already fixed by Jakub but his patch did not fix a latent LRA 
bug mentioned in the PR comments.  The current patch fixes the latent bug.


The patch was successfully bootstrapped and tested on x86-64 and aarch64.

commit 33904327c92bd914d4e0e076be12dc0a6b453c2d
Author: Vladimir N. Makarov 
Date:   Fri Mar 25 12:22:08 2022 -0400

[PR104971] LRA: check live hard regs to remove a dead insn

LRA removes an insn modifying sp for the given PR test.  We should also
check live hard regs to prevent this.  The patch fixes this.

gcc/ChangeLog:

PR middle-end/104971
* lra-lives.cc (process_bb_lives): Check hard_regs_live for hard
regs to clear remove_p flag.

diff --git a/gcc/lra-lives.cc b/gcc/lra-lives.cc
index 796f00629b4..a755464ee81 100644
--- a/gcc/lra-lives.cc
+++ b/gcc/lra-lives.cc
@@ -724,7 +724,10 @@ process_bb_lives (basic_block bb, int &curr_point, bool dead_insn_p)
 	  bool remove_p = true;
 
 	  for (reg = curr_id->regs; reg != NULL; reg = reg->next)
-	if (reg->type != OP_IN && sparseset_bit_p (pseudos_live, reg->regno))
+	if (reg->type != OP_IN
+		&& (reg->regno < FIRST_PSEUDO_REGISTER
+		? TEST_HARD_REG_BIT (hard_regs_live, reg->regno)
+		: sparseset_bit_p (pseudos_live, reg->regno)))
 	  {
 		remove_p = false;
 		break;
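
The condition in the hunk above can be modelled in isolation.  The
following is a sketch under stated assumptions rather than LRA code:
kFirstPseudo is a made-up stand-in for FIRST_PSEUDO_REGISTER, and RegUse,
RegRef and can_remove_insn are hypothetical names.

```cpp
#include <bitset>
#include <set>
#include <vector>

// Hypothetical stand-in for FIRST_PSEUDO_REGISTER: register numbers
// below this are hard registers, the rest are pseudos.
constexpr unsigned kFirstPseudo = 64;

enum class RegUse { In, Out, InOut };

struct RegRef {
  unsigned regno;
  RegUse type;
};

// An insn is a removal candidate only if none of the registers it
// writes is live afterwards.  Hard registers are looked up in a bitset,
// pseudos in a separate set, mirroring the two liveness structures.
bool can_remove_insn(const std::vector<RegRef>& regs,
                     const std::bitset<kFirstPseudo>& hard_regs_live,
                     const std::set<unsigned>& pseudos_live) {
  for (const RegRef& reg : regs) {
    if (reg.type == RegUse::In)
      continue;  // pure inputs never keep an insn alive
    bool live = reg.regno < kFirstPseudo
                    ? hard_regs_live.test(reg.regno)
                    : pseudos_live.count(reg.regno) != 0;
    if (live)
      return false;
  }
  return true;
}
```

In this model, an insn that writes a live hard register such as the stack
pointer is kept, whereas a check that consulted only the pseudo set would
have allowed it to be deleted.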

Re: [PATCH] c++: diagnosing if-stmt with non-constant branches [PR105050]

2022-03-25 Thread Marek Polacek via Gcc-patches

On Fri, Mar 25, 2022 at 12:07:31PM -0400, Patrick Palka via Gcc-patches wrote:
> When an if-stmt is deemed non-constant because both of its branches are
> non-constant, we issue a rather generic error which, given that it points
> to the 'if' token, misleadingly suggests the condition is at fault:
> 
>   constexpr-105050.C:8:3: error: expression ‘’ is not a constant 
> expression
>   8 |   if (p != q && *p < 0)
> |   ^~
> 
> This patch clarifies the error message to read:
> 
>   constexpr-105050.C:8:3: error: neither branch of ‘if’ is a constant 
> expression
>   8 |   if (p != q && *p < 0)
> |   ^~
> 
> Bootstrapped and regtested on x86_64-pc-linux-gnu, does this look OK for
> trunk?

LGTM.
 
>   PR c++/105050
> 
> gcc/cp/ChangeLog:
> 
>   * constexpr.cc (potential_constant_expression_1) :
>   Clarify error message when an if-stmt is non-constant because its
>   branches are non-constant.
> 
> gcc/testsuite/ChangeLog:
> 
>   * g++.dg/cpp1y/constexpr-105050.C: New test.
> ---
>  gcc/cp/constexpr.cc   |  7 ++-
>  gcc/testsuite/g++.dg/cpp1y/constexpr-105050.C | 12 
>  2 files changed, 18 insertions(+), 1 deletion(-)
>  create mode 100644 gcc/testsuite/g++.dg/cpp1y/constexpr-105050.C
> 
> diff --git a/gcc/cp/constexpr.cc b/gcc/cp/constexpr.cc
> index 778680b8270..9c40b051574 100644
> --- a/gcc/cp/constexpr.cc
> +++ b/gcc/cp/constexpr.cc
> @@ -9439,7 +9439,12 @@ potential_constant_expression_1 (tree t, bool 
> want_rval, bool strict, bool now,
>   }
>   }
>if (flags & tf_error)
> - error_at (loc, "expression %qE is not a constant expression", t);
> + {
> +   if (TREE_CODE (t) == IF_STMT)
> + error_at (loc, "neither branch of %<if%> is a constant expression");
> +   else
> + error_at (loc, "expression %qE is not a constant expression", t);
> + }
>return false;
>  
>  case VEC_INIT_EXPR:
> diff --git a/gcc/testsuite/g++.dg/cpp1y/constexpr-105050.C 
> b/gcc/testsuite/g++.dg/cpp1y/constexpr-105050.C
> new file mode 100644
> index 000..99d5c9960ac
> --- /dev/null
> +++ b/gcc/testsuite/g++.dg/cpp1y/constexpr-105050.C
> @@ -0,0 +1,12 @@
> +// PR c++/105050
> +// { dg-do compile { target c++14 } }
> +
> +void g();
> +void h();
> +
> +constexpr void f(int* p, int *q) {
> +  if (p != q && *p < 0) // { dg-error "neither branch of 'if' is a constant 
> expression" }
> +g();
> +  else
> +h();
> +}
> -- 
> 2.35.1.655.ga68dfadae5
> 

Marek

[PATCH] c++: diagnosing if-stmt with non-constant branches [PR105050]

2022-03-25 Thread Patrick Palka via Gcc-patches

When an if-stmt is deemed non-constant because both of its branches are
non-constant, we issue a rather generic error which, given that it points
to the 'if' token, misleadingly suggests the condition is at fault:

  constexpr-105050.C:8:3: error: expression ‘’ is not a constant 
expression
  8 |   if (p != q && *p < 0)
|   ^~

This patch clarifies the error message to read:

  constexpr-105050.C:8:3: error: neither branch of ‘if’ is a constant expression
  8 |   if (p != q && *p < 0)
|   ^~

Bootstrapped and regtested on x86_64-pc-linux-gnu, does this look OK for
trunk?

PR c++/105050

gcc/cp/ChangeLog:

* constexpr.cc (potential_constant_expression_1) :
Clarify error message when an if-stmt is non-constant because its
branches are non-constant.

gcc/testsuite/ChangeLog:

* g++.dg/cpp1y/constexpr-105050.C: New test.
---
 gcc/cp/constexpr.cc   |  7 ++-
 gcc/testsuite/g++.dg/cpp1y/constexpr-105050.C | 12 
 2 files changed, 18 insertions(+), 1 deletion(-)
 create mode 100644 gcc/testsuite/g++.dg/cpp1y/constexpr-105050.C

diff --git a/gcc/cp/constexpr.cc b/gcc/cp/constexpr.cc
index 778680b8270..9c40b051574 100644
--- a/gcc/cp/constexpr.cc
+++ b/gcc/cp/constexpr.cc
@@ -9439,7 +9439,12 @@ potential_constant_expression_1 (tree t, bool want_rval, 
bool strict, bool now,
}
}
   if (flags & tf_error)
-   error_at (loc, "expression %qE is not a constant expression", t);
+   {
+ if (TREE_CODE (t) == IF_STMT)
+   error_at (loc, "neither branch of %<if%> is a constant expression");
+ else
+   error_at (loc, "expression %qE is not a constant expression", t);
+   }
   return false;
 
 case VEC_INIT_EXPR:
diff --git a/gcc/testsuite/g++.dg/cpp1y/constexpr-105050.C 
b/gcc/testsuite/g++.dg/cpp1y/constexpr-105050.C
new file mode 100644
index 000..99d5c9960ac
--- /dev/null
+++ b/gcc/testsuite/g++.dg/cpp1y/constexpr-105050.C
@@ -0,0 +1,12 @@
+// PR c++/105050
+// { dg-do compile { target c++14 } }
+
+void g();
+void h();
+
+constexpr void f(int* p, int *q) {
+  if (p != q && *p < 0) // { dg-error "neither branch of 'if' is a constant 
expression" }
+g();
+  else
+h();
+}
-- 
2.35.1.655.ga68dfadae5

[PATCH] c++: Fix up __builtin_{bit_cast,convertvector} parsing

2022-03-25 Thread Jakub Jelinek via Gcc-patches

Hi!

Jonathan reported on IRC that we don't parse
__builtin_bit_cast (type, val).field
etc.
The problem is that for these 2 builtins we return from
cp_parser_postfix_expression instead of setting postfix_expression
to the cp_build_* value and falling through into the postfix expression
suffix handling loop.
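
The effect of returning early versus falling into the suffix loop can be
shown with a toy parser.  This is only an illustration: ToyParser and its
members are hypothetical, the input is a pre-split token list that is
assumed to be well formed, and only ".field" suffixes are handled.

```cpp
#include <cstddef>
#include <string>
#include <vector>

// Toy postfix-expression parser over pre-split tokens.
struct ToyParser {
  std::vector<std::string> toks;
  std::size_t pos;
  bool early_return;  // true models returning before the suffix loop

  bool at_end() const { return pos >= toks.size(); }

  // Returns a textual rendering of the expression that was parsed.
  std::string parse_postfix() {
    std::string expr;
    if (toks[pos] == "__builtin_bit_cast") {
      // Tokens: name ( type , value )
      expr = "bit_cast<" + toks[pos + 2] + ">(" + toks[pos + 4] + ")";
      pos += 6;
      if (early_return)
        return expr;  // any following ".field" is left unparsed
    } else {
      expr = toks[pos++];
    }

    // Suffix loop: consume ".field" as long as it follows.
    while (pos + 1 < toks.size() && toks[pos] == ".") {
      expr = "(" + expr + ")." + toks[pos + 1];
      pos += 2;
    }
    return expr;
  }
};
```

With the tokens of `__builtin_bit_cast (U, x).b`, the early-return variant
stops after the closing parenthesis and leaves `.b` behind, while the
fall-through variant consumes the whole expression.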

Ok for trunk if it passes bootstrap/regtest?

2022-03-25  Jakub Jelinek  

* parser.cc (cp_parser_postfix_expression)
: Don't
return cp_build_{vec,convert,bit_cast} result right away, instead
set postfix_expression to it and break.

* c-c++-common/builtin-convertvector-3.c: New test.
* g++.dg/cpp2a/bit-cast15.C: New test.

--- gcc/cp/parser.cc.jj 2022-03-15 09:15:21.366108714 +0100
+++ gcc/cp/parser.cc2022-03-25 16:04:21.464248103 +0100
@@ -7525,8 +7525,10 @@ cp_parser_postfix_expression (cp_parser
}
/* Look for the closing `)'.  */
parens.require_close (parser);
-   return cp_build_vec_convert (expression, type_location, type,
-tf_warning_or_error);
+   postfix_expression
+ = cp_build_vec_convert (expression, type_location, type,
+ tf_warning_or_error);
+   break;
   }
 
 case RID_BUILTIN_BIT_CAST:
@@ -7551,8 +7553,10 @@ cp_parser_postfix_expression (cp_parser
expression = cp_parser_assignment_expression (parser);
/* Look for the closing `)'.  */
parens.require_close (parser);
-   return cp_build_bit_cast (type_location, type, expression,
- tf_warning_or_error);
+   postfix_expression
+ = cp_build_bit_cast (type_location, type, expression,
+  tf_warning_or_error);
+   break;
   }
 
 default:
--- gcc/testsuite/c-c++-common/builtin-convertvector-3.c.jj 2022-03-25 
16:23:18.033120090 +0100
+++ gcc/testsuite/c-c++-common/builtin-convertvector-3.c2022-03-25 
16:23:40.633799410 +0100
@@ -0,0 +1,11 @@
+/* { dg-do compile } */
+/* { dg-options "-O2" } */
+
+typedef int v4si __attribute__((vector_size (4 * sizeof (int))));
+typedef double v4df __attribute__((vector_size (4 * sizeof (double))));
+double
+foo (void)
+{
+  v4si a = { 1, 2, 3, 4 };
+  return __builtin_convertvector (a, v4df)[1];
+}
--- gcc/testsuite/g++.dg/cpp2a/bit-cast15.C.jj  2022-03-25 16:26:22.271505979 
+0100
+++ gcc/testsuite/g++.dg/cpp2a/bit-cast15.C 2022-03-25 16:26:15.907596274 
+0100
@@ -0,0 +1,19 @@
+// { dg-do compile }
+
+struct S { short a, b; };
+struct T { float a[4]; };
+struct U { int b[4]; };
+
+#if __SIZEOF_FLOAT__ == __SIZEOF_INT__
+int
+f1 (T &x)
+{
+  return __builtin_bit_cast (U, x).b[1];
+}
+
+float
+f2 (int (&x)[4])
+{
+  return __builtin_bit_cast (T, x).a[2];
+}
+#endif

Jakub

[PATCH][GCC 13] Enable match.pd dumping with -fdump-tree-original

2022-03-25 Thread Alex Coplan via Gcc-patches

Hi,

I noticed that, while the C/C++ frontends invoke the GENERIC match.pd
simplifications to do early folding, the debug output from
generic-match.cc does not appear in the -fdump-tree-original output,
even with -fdump-tree-original-folding or -fdump-tree-original-all. This
patch fixes that.

For example, before the patch, for the following code:

int a[2];
void bar ();
void f()
{
if ((unsigned long)(a + 1) == 0)
bar ();
}

on AArch64 at -O0, -fdump-tree-original-all would give:

;; Function f (null)
;; enabled by -tree-original


{
  if (0)
{
  bar ();
}
}

After the patch, we get:

Applying pattern match.pd:3774, generic-match.cc:24535
Matching expression match.pd:146, generic-match.cc:23
Applying pattern match.pd:5638, generic-match.cc:13388

;; Function f (null)
;; enabled by -tree-original


{
  if (0)
{
  bar ();
}
}

The reason we don't get the match.pd output as it stands is that the
original dump is treated specially in c-opts.cc: it gets its own state
which is independent from that used by other dump files in the compiler.
Like most of the compiler, the generated generic-match.cc has code of
the form:

  if (dump_file && (dump_flags & TDF_FOLDING))
fprintf (dump_file, ...);

But, as it stands, -fdump-tree-original has its own FILE * and flags in
c-opts.cc (original_dump_{file,flags}) and never touches the global
dump_{file,flags} (managed by dumpfile.{h,cc}). This patch adjusts the
code in c-opts.cc to use the main dump infrastructure used by the rest
of the compiler, instead of treating the original dump specially.
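
The mechanism can be reproduced in miniature.  The sketch below is not GCC
code: the toy namespace, the TDF_FOLDING bit value and the function names
are assumptions made for illustration, and a std::ostream stands in for
the FILE * that the real dump infrastructure uses.

```cpp
#include <sstream>
#include <string>

namespace toy {

constexpr unsigned TDF_FOLDING = 1u << 0;  // hypothetical bit value

// Model of the global dump state consulted by generated code.
std::ostream* dump_file = nullptr;
unsigned dump_flags = 0;

// Model of a generated generic-match.cc snippet: it only ever looks at
// the global dump state.
void apply_pattern(const std::string& where) {
  if (dump_file && (dump_flags & TDF_FOLDING))
    *dump_file << "Applying pattern " << where << "\n";
}

// Model of a front end producing the "original" dump.  When use_global
// is false, the stream and flags stay private, as with
// original_dump_{file,flags}; when true, they are published through the
// global dump state for the duration of the dump.
std::string dump_original(bool use_global) {
  std::ostringstream original;
  unsigned original_flags = TDF_FOLDING;

  if (use_global) {
    dump_file = &original;
    dump_flags = original_flags;
  }

  apply_pattern("match.pd:3774");  // folding happens during parsing
  original << ";; Function f\n";

  dump_file = nullptr;  // comparable in spirit to finishing the dump
  dump_flags = 0;
  return original.str();
}

}  // namespace toy
```

Calling toy::dump_original (false) yields only the function header, while
toy::dump_original (true) also contains the "Applying pattern" line, which
mirrors the before/after dumps shown earlier.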

We take the opportunity to make a small refactor: the code in
c-gimplify.cc:c_genericize can, with this change, use the global dump
infrastructure to get the original dump file and flags instead of using
the bespoke get_dump_info function implemented in c-opts.cc. With this
change, we remove the only use of get_dump_info, so this can be removed.

Note that we also fix a leak of the original dump file in
c_common_parse_file. I originally thought it might be possible to
achieve this with only one static call to dump_finish () (by simply
moving it earlier in the loop), but unfortunately the dump file is
required to be open while c_parse_final_cleanups runs, as we (e.g.)
perform some template instantiations here for C++, which need to appear
in the original dump file.

We adjust cgraph_node::get_create to avoid introducing noise in the
original dump file: without this, these "Introduced new external node"
lines start appearing in the original dump files, which breaks tests
that do a scan-tree-dump-times on the original dump looking for a
certain function name.

Bootstrapped/regtested on aarch64-linux-gnu, OK for GCC 13?

Thanks,
Alex

gcc/c-family/ChangeLog:

* c-common.h (get_dump_info): Delete.
* c-gimplify.cc (c_genericize): Get TDI_original dump file info
from the global dump_manager instead of the (now obsolete)
get_dump_info.
* c-opts.cc (original_dump_file): Delete.
(original_dump_flags): Delete.
(c_common_parse_file): Switch to using global dump_manager to
manage the original dump file; fix leak of dump file.
(get_dump_info): Delete.

gcc/ChangeLog:

* cgraph.cc (cgraph_node::get_create): Don't dump if the current
dump file is that of -fdump-tree-original.
diff --git a/gcc/c-family/c-common.h b/gcc/c-family/c-common.h
index 52a85bfb783..b829cdbfe28 100644
--- a/gcc/c-family/c-common.h
+++ b/gcc/c-family/c-common.h
@@ -950,7 +950,6 @@ extern bool c_common_post_options (const char **);
 extern bool c_common_init (void);
 extern void c_common_finish (void);
 extern void c_common_parse_file (void);
-extern FILE *get_dump_info (int, dump_flags_t *);
 extern alias_set_type c_common_get_alias_set (tree);
 extern void c_register_builtin_type (tree, const char*);
 extern bool c_promoting_integer_type_p (const_tree);
diff --git a/gcc/c-family/c-gimplify.cc b/gcc/c-family/c-gimplify.cc
index a00b0a02dcc..a6f26c9b0d3 100644
--- a/gcc/c-family/c-gimplify.cc
+++ b/gcc/c-family/c-gimplify.cc
@@ -40,6 +40,7 @@ along with GCC; see the file COPYING3.  If not see
 #include "dumpfile.h"
 #include "c-ubsan.h"
 #include "tree-nested.h"
+#include "context.h"
 
 /*  The gimplification pass converts the language-dependent trees
 (ld-trees) emitted by the parser into language-independent trees
@@ -552,6 +553,7 @@ c_genericize_control_r (tree *stmt_p, int *walk_subtrees, void *data)
 void
 c_genericize (tree fndecl)
 {
+  dump_file_info *dfi;
   FILE *dump_orig;
   dump_flags_t local_dump_flags;
   struct cgraph_node *cgn;
@@ -581,7 +583,9 @@ c_genericize (tree fndecl)
  do_warn_duplicated_branches_r, NULL);
 
   /* Dump the C-specific tree IR.  */
-  dump_orig = get_dump_info (TDI_original, &local_dump_flags);
+  dfi = g->get_dumps ()->get_dump_file_info (TDI_original);
+  dump_orig = dfi->pstream;
+  local_dump_flags = dfi->pflags;
   if

[PATCH] tree-optimization/105053 - fix reduction chain epilogue generation

2022-03-25 Thread Richard Biener via Gcc-patches

When we optimize permutations in a reduction chain we have to
be careful to select the correct live-out stmt, otherwise the
reduction result will be unused and the retained scalar code will
execute only the number of vector iterations.

Bootstrapped and tested on x86_64-unknown-linux-gnu, pushed to trunk.
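
The selection described above can be sketched outside the vectorizer.
This is a simplified model under stated assumptions, not the GCC data
structures: chain_stmt and find_chain_end are hypothetical names, with
the next pointer playing the role of REDUC_GROUP_NEXT_ELEMENT.

```cpp
#include <vector>

// Hypothetical model of a statement in a reduction chain: `next` links to
// the following statement in the chain, or is null for the last one.
struct chain_stmt
{
  int id;
  const chain_stmt *next;
};

// Return the statement that ends the chain.  The position within `stmts`
// is not trusted, since the vector may have been permuted.
const chain_stmt *find_chain_end (const std::vector<const chain_stmt *> &stmts)
{
  for (const chain_stmt *s : stmts)
    if (s->next == nullptr)
      return s;
  return nullptr;
}
```

Taking the last vector element is only correct while the vector is still
in chain order; searching for the element without a successor works for
any permutation.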

2022-03-25  Richard Biener  

PR tree-optimization/105053
* tree-vect-loop.cc (vect_create_epilog_for_reduction): Pick
the correct live-out stmt for a reduction chain.

* g++.dg/vect/pr105053.cc: New testcase.
---
 gcc/testsuite/g++.dg/vect/pr105053.cc | 25 +
 gcc/tree-vect-loop.cc | 14 +++---
 2 files changed, 36 insertions(+), 3 deletions(-)
 create mode 100644 gcc/testsuite/g++.dg/vect/pr105053.cc

diff --git a/gcc/testsuite/g++.dg/vect/pr105053.cc b/gcc/testsuite/g++.dg/vect/pr105053.cc
new file mode 100644
index 000..6deef8458fc
--- /dev/null
+++ b/gcc/testsuite/g++.dg/vect/pr105053.cc
@@ -0,0 +1,25 @@
+// { dg-require-effective-target c++11 }
+// { dg-require-effective-target int32plus }
+
+#include <vector>
+#include <tuple>
+#include <algorithm>
+
+int main()
+{
+  const int n = 4;
+  std::vector<std::tuple<int,int,double>> vec
+  = { { 1597201307, 1817606674, 0. },
+{ 1380347796, 1721941769, 0.},
+{837975613, 1032707773, 0.},
+{1173654292, 2020064272, 0.} } ;
+  int sup1 = 0;
+  for(int i=0;i<n;++i)
+    sup1=std::max(sup1,std::max(std::get<0>(vec[i]),std::get<1>(vec[i])));
+  int sup2 = 0;
+  for(int i=0;i<n;++i)
+    sup2=std::max(std::max(sup2,std::get<0>(vec[i])),std::get<1>(vec[i]));
+  if (sup1 != sup2)
+std::abort ();
+  return 0;
+}
diff --git a/gcc/tree-vect-loop.cc b/gcc/tree-vect-loop.cc
index 7a74633e0b4..d7bc34636bd 100644
--- a/gcc/tree-vect-loop.cc
+++ b/gcc/tree-vect-loop.cc
@@ -5271,9 +5271,17 @@ vect_create_epilog_for_reduction (loop_vec_info loop_vinfo,
 /* All statements produce live-out values.  */
 live_out_stmts = SLP_TREE_SCALAR_STMTS (slp_node);
   else if (slp_node)
-/* The last statement in the reduction chain produces the live-out
-   value.  */
-single_live_out_stmt[0] = SLP_TREE_SCALAR_STMTS (slp_node)[group_size - 1];
+{
+  /* The last statement in the reduction chain produces the live-out
+value.  Note SLP optimization can shuffle scalar stmts to
+optimize permutations so we have to search for the last stmt.  */
+  for (k = 0; k < group_size; ++k)
+   if (!REDUC_GROUP_NEXT_ELEMENT (SLP_TREE_SCALAR_STMTS (slp_node)[k]))
+ {
+   single_live_out_stmt[0] = SLP_TREE_SCALAR_STMTS (slp_node)[k];
+   break;
+ }
+}
 
   unsigned vec_num;
   int ncopies;
-- 
2.34.1

Re: [PATCH] middle-end/104854: Avoid overread warning for strnlen and strndup

2022-03-25 Thread Siddhesh Poyarekar


On 10/03/2022 06:09, Siddhesh Poyarekar wrote:

A size argument larger than the size of SRC for strnlen and strndup is
problematic only if SRC is not NUL-terminated, which invokes undefined
behaviour.  In all other cases, as long as SRC is large enough to hold a
NUL char (i.e. size 1 or more), a larger N should not invoke a warning
during compilation.

Such a warning may instead be a suitable check for the static analyzer,
with slightly different wording suggesting that the choice of size
argument makes the function call equivalent to strlen/strdup.
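
As a small illustration of the benign case described above (a sketch
only; capped_length is a hypothetical helper, and strnlen is POSIX
rather than ISO C++):

```cpp
#include <cstddef>
#include <string.h>

// Length of `s`, capped at `limit`.  Assumes `s` is NUL-terminated, in
// which case a limit larger than the array is harmless: the scan stops
// at the terminator and never reads past it.
std::size_t capped_length (const char *s, std::size_t limit)
{
  return strnlen (s, limit);
}
```

With a 4-byte array holding "abc", a limit of 64 still yields 3, the same
result strlen would give; only an unterminated array makes the larger
bound dangerous.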


This fix is too aggressive, I need to take another pass at this once 
stage 1 opens.


Siddhesh

Re: [committed] Docs: Document that taint analyzer checker disables some warnings [PR103533]

2022-03-25 Thread David Malcolm via Gcc-patches

On Fri, 2022-03-25 at 10:58 +0100, Tobias Burnus wrote:
> This commit broke bootstrapping - well, at least kind of:
>    make pdf   (and probably 'make dvi' and other formats using TeX)
> now fails with:
> > Runaway argument?
> > 
> > -Wanalyzer-tainted-allocation-size @gol 
> > -Wanalyzer-tainted-array-inde@ETC.
> > src/gcc-mainline/gcc/doc/invoke.texi:96
> > 82: File ended while scanning use of @doignoretext.
> The problem seems to be that when producing TeX output, the '...' of
>    @gccoptlist{ ... }
> ends up in a single line. Thus, '@ignore' is found but no
> corresponding '@end ignore'.
> 
> Using '@c' has the same problem. And adding a line break will give
> an error as  @gccoptlist  does not permit another paragraph in
> the argument.
> 
> My solution is move the '@ignore' comment after the closing '}'.
> 
> Committed attached patch as obvious
> as r12-7810-g748f36a48b506f52e10bcdeb750a7fe9c30c26f3 

Sorry about the breakage; thanks for fixing it.

Dave

Re: [PATCH v2] middle-end/104854: Limit strncmp overread warnings

2022-03-25 Thread Siddhesh Poyarekar


On 25/03/2022 18:56, Jason Merrill via Gcc-patches wrote:
Perhaps a suitable compromise would be to add a separate warning flag 
specifically for the strn* warnings, so users deliberately using the 
bound to express a limit other than the length of the argument string 
(and confident that their strings are always NUL-terminated) can turn 
them off without turning off all the overread warnings.


For strncmp (in cases where NUL termination cannot be proven) that is 
perhaps a reasonable compromise.  However I think I need to take a 
closer look to figure out if there are other ways to work around this, 
especially since discovering that I had misread the previous report.


I take back this patch and will revisit this a bit later, probably once 
stage 1 opens.


Thanks,
Siddhesh

Re: [PATCH][RFC] tree-optimization/101908 - avoid STLF fails when vectorizing

2022-03-25 Thread Hongtao Liu via Gcc-patches

On Fri, Mar 25, 2022 at 9:42 PM Richard Biener  wrote:
>
> On Fri, 25 Mar 2022, Hongtao Liu wrote:
>
> > On Fri, Mar 25, 2022 at 8:11 PM Richard Biener via Gcc-patches
> >  wrote:
> > >
> > > Since we're now vectorizing by default at -O2 issues like PR101908
> > > become more important where we apply basic-block vectorization to
> > > parts of the function covering loads from function parameters passed
> > > on the stack.  Since we have no good idea how the stack pushing
> > > was performed, but we do know that it was done recently when we are
> > > at the start of the function, an STLF failure is inevitable if the
> > > argument passing code didn't use the same or larger vector stores,
> > > aligned the same way.
> > >
> > > Until there's a robust IPA-based solution the following implements
> > > target-independent heuristics in the vectorizer to retain scalar
> > > loads for loads from parameters likely passed in memory (I use
> > > a BLKmode DECL_MODE check for this rather than firing up
> > > cumulative-args).  I've restricted this also to loads from the
> > > first "block" (that can be less than the first basic block if there's
> > > a call for example), since that covers the testcase.
> > I prefer this patch to the md-reorg way.
>
> I was mostly posting the variant because it is target agnostic.  In
> general I agree that mitigation is best done at the RTL level after
> scheduling/RA so that the instruction sequence from function entry
> is visible.
>
> Did you have any success in coding this up yet?  Do you think it
Not yet.
> can be done in a way to be re-used by multiple targets?
>
> > Can the vectorizer similarly handle arguments passed by reference if
> > the corresponding actual arguments are local variables of the caller?
> > I.e., can we also prevent vectorization in the case below?
> >
> > struct vec3 {
> >   double x, y, z;
> > };
> >
> > struct ray {
> >  struct vec3 orig, dir;
> > };
> >
> > void
> > __attribute__((noinline))
> > ray_sphere (struct ray* __restrict ray, double *__restrict res)
> > {
> >   res[0] = ray->orig.y * ray->dir.x;
> >   res[1] = ray->orig.z * ray->dir.y;
> > }
> >
> > extern struct ray g;
> > void bar (double* res)
> > {
> > struct ray tmp = g;
> > ray_sphere (&tmp, res);
> > }
>
> Without IPA analysis this will be hard, if not impossible.
>
> Richard.
>
> > >
> > > Note that for the testcase (but not c-ray from the bugreport) there's
> > > > an x86 peephole2 that vectorizes things back, so the patch is
> > > not effective there.
> > >
> > > Any comments?  I know we're also looking at x86 port specific
> > > mitigations but the issue will hit arm and power/z as well I think.
> > >
> > > Bootstrapped and tested on x86_64-unknown-linux-gnu.
> > >
> > > Thanks,
> > > Richard.
> > >
> > > 2022-03-25  Richard Biener  
> > >
> > > PR tree-optimization/101908
> > > * tree-vect-stmts.cc (get_group_load_store_type): Add
> > > heuristic to limit BB vectorization of function parameters.
> > >
> > > * gcc.dg/vect/bb-slp-pr101908.c: New testcase.
> > > ---
> > >  gcc/testsuite/gcc.dg/vect/bb-slp-pr101908.c | 18 ++
> > >  gcc/tree-vect-stmts.cc  | 27 -
> > >  2 files changed, 44 insertions(+), 1 deletion(-)
> > >  create mode 100644 gcc/testsuite/gcc.dg/vect/bb-slp-pr101908.c
> > >
> > > diff --git a/gcc/testsuite/gcc.dg/vect/bb-slp-pr101908.c 
> > > b/gcc/testsuite/gcc.dg/vect/bb-slp-pr101908.c
> > > new file mode 100644
> > > index 000..b7534a18f0e
> > > --- /dev/null
> > > +++ b/gcc/testsuite/gcc.dg/vect/bb-slp-pr101908.c
> > > @@ -0,0 +1,18 @@
> > > +/* { dg-do compile } */
> > > +/* { dg-require-effective-target vect_double } */
> > > +
> > > +struct vec3 {
> > > +  double x, y, z;
> > > +};
> > > +
> > > +struct ray {
> > > +  struct vec3 orig, dir;
> > > +};
> > > +
> > > +void ray_sphere (struct ray ray, double *res)
> > > +{
> > > +  res[0] = ray.orig.y * ray.dir.x;
> > > +  res[1] = ray.orig.z * ray.dir.y;
> > > +}
> > > +
> > > +/* { dg-final { scan-tree-dump "STLF fail" "slp2" } } */
> > > diff --git a/gcc/tree-vect-stmts.cc b/gcc/tree-vect-stmts.cc
> > > index f7449a79d1c..1e37e9678b6 100644
> > > --- a/gcc/tree-vect-stmts.cc
> > > +++ b/gcc/tree-vect-stmts.cc
> > > @@ -2197,7 +2197,32 @@ get_group_load_store_type (vec_info *vinfo, 
> > > stmt_vec_info stmt_info,
> > >/* Stores can't yet have gaps.  */
> > >gcc_assert (slp_node || vls_type == VLS_LOAD || gap == 0);
> > >
> > > -  if (slp_node)
> > > +  tree parm;
> > > +  if (!loop_vinfo
> > > +  && vls_type == VLS_LOAD
> > > +  /* The access is based on a PARM_DECL.  */
> > > +  && TREE_CODE (DR_BASE_ADDRESS (first_dr_info->dr)) == ADDR_EXPR
> > > +  && ((parm = TREE_OPERAND (DR_BASE_ADDRESS (first_dr_info->dr), 
> > > 0)), true)
> > > +  && TREE_CODE (parm) == PARM_DECL
> > > +  /* Likely passed on the stack.  */
> > > +  && DECL_MODE (parm) == BLKmode
> > > +  /*

Re: [PATCH] c++: memory corruption during name lookup w/ modules [PR99479]

2022-03-25 Thread Patrick Palka via Gcc-patches

On Thu, 17 Mar 2022, Patrick Palka wrote:

> On Tue, Mar 1, 2022 at 8:13 AM Patrick Palka  wrote:
> >
> > On Thu, Feb 17, 2022 at 3:24 PM Patrick Palka  wrote:
> > >
> > > name_lookup::search_unqualified uses a statically allocated vector
> > > in order to avoid repeated reallocation, under the assumption that
> > > the function can't be called recursively.  With modules however,
> > > this assumption turns out to be false, and search_unqualified can
> > > > be called recursively, as demonstrated by the testcase in comment #19
> > > > of PR99479[1], where the recursive call causes the vector to get
> > > > reallocated, which invalidates the reference held by the parent call.
> > >
> > > This patch makes search_unqualified instead use an auto_vec with 16
> > > elements of internal storage (since with the various libraries I tested,
> > > the size of the vector never exceeded 12).  In turn we can simplify the
> > > API of subroutines to take the vector by reference and return void.
> > >
> > > Bootstrapped and regtested on x86_64-pc-linux-gnu, does this look OK for
> > > trunk?
> >
> > Ping.
> 
> Ping.

Ping (+CC Nathan, I wonder if you can take a look at this patch?)
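
For readers joining the thread here, the hazard the quoted patch removes
can be sketched as follows.  This is a simplified model, not the
name-lookup code: search is a hypothetical function and std::vector
stands in for GCC's auto_vec.

```cpp
#include <vector>

// Hypothetical re-entrant search: each call owns its queue, so a nested
// call cannot reallocate storage that the outer call still refers to.
int search (int depth)
{
  std::vector<int> queue;          // local, not `static`
  queue.push_back (depth);
  int &front = queue.front ();     // reference into this call's storage
  int nested = depth > 0 ? search (depth - 1) : 0;
  return front + nested;           // `front` is still valid here
}
```

Had queue been declared static, the nested call's push_back could
reallocate the shared storage and leave front dangling in the outer
call, which is the failure mode the quoted description attributes to
search_unqualified.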

> 
> >
> > >
> > > [1]: https://gcc.gnu.org/PR99479#c19
> > >
> > > PR c++/99479
> > >
> > > gcc/cp/ChangeLog:
> > >
> > > * name-lookup.cc (name_lookup::using_queue): Change to an
> > > auto_vec (with 16 elements of internal storage).
> > > (name_lookup::queue_namespace): Change return type to void,
> > > take queue parameter by reference and adjust function body
> > > accordingly.
> > > (name_lookup::do_queue_usings): Inline into ...
> > > (name_lookup::queue_usings): ... here.  As in queue_namespace.
> > > (name_lookup::search_unqualified): Don't make queue static,
> > > assume its incoming length is 0, and adjust function body
> > > accordingly.
> > > ---
> > >  gcc/cp/name-lookup.cc | 62 +++
> > >  1 file changed, 22 insertions(+), 40 deletions(-)
> > >
> > > diff --git a/gcc/cp/name-lookup.cc b/gcc/cp/name-lookup.cc
> > > index 93c4eb7193b..5c965d6fba1 100644
> > > --- a/gcc/cp/name-lookup.cc
> > > +++ b/gcc/cp/name-lookup.cc
> > > @@ -429,7 +429,7 @@ class name_lookup
> > >  {
> > >  public:
> > > >typedef std::pair<tree, tree> using_pair;
> > > > -  typedef vec<using_pair, va_heap, vl_embed> using_queue;
> > > > +  typedef auto_vec<using_pair, 16> using_queue;
> > >
> > >  public:
> > >tree name;   /* The identifier being looked for.  */
> > > @@ -528,16 +528,8 @@ private:
> > >bool search_usings (tree scope);
> > >
> > >  private:
> > > > -  using_queue *queue_namespace (using_queue *queue, int depth, tree scope);
> > > > -  using_queue *do_queue_usings (using_queue *queue, int depth,
> > > > -   vec<tree, va_gc> *usings);
> > > > -  using_queue *queue_usings (using_queue *queue, int depth,
> > > > -vec<tree, va_gc> *usings)
> > > > -  {
> > > > -if (usings)
> > > > -  queue = do_queue_usings (queue, depth, usings);
> > > > -return queue;
> > > > -  }
> > > > +  void queue_namespace (using_queue& queue, int depth, tree scope);
> > > > +  void queue_usings (using_queue& queue, int depth, vec<tree, va_gc> *usings);
> > >
> > >  private:
> > >void add_fns (tree);
> > > @@ -1084,39 +1076,35 @@ name_lookup::search_qualified (tree scope, bool 
> > > usings)
> > >  /* Add SCOPE to the unqualified search queue, recursively add its
> > > inlines and those via using directives.  */
> > >
> > > -name_lookup::using_queue *
> > > -name_lookup::queue_namespace (using_queue *queue, int depth, tree scope)
> > > +void
> > > +name_lookup::queue_namespace (using_queue& queue, int depth, tree scope)
> > >  {
> > >if (see_and_mark (scope))
> > > -return queue;
> > > +return;
> > >
> > >/* Record it.  */
> > >tree common = scope;
> > >while (SCOPE_DEPTH (common) > depth)
> > >  common = CP_DECL_CONTEXT (common);
> > > -  vec_safe_push (queue, using_pair (common, scope));
> > > +  queue.safe_push (using_pair (common, scope));
> > >
> > >/* Queue its inline children.  */
> > > >if (vec<tree, va_gc> *inlinees = DECL_NAMESPACE_INLINEES (scope))
> > >  for (unsigned ix = inlinees->length (); ix--;)
> > > -  queue = queue_namespace (queue, depth, (*inlinees)[ix]);
> > > +  queue_namespace (queue, depth, (*inlinees)[ix]);
> > >
> > >/* Queue its using targets.  */
> > > > -  queue = queue_usings (queue, depth, NAMESPACE_LEVEL (scope)->using_directives);
> > > -
> > > -  return queue;
> > > +  queue_usings (queue, depth, NAMESPACE_LEVEL (scope)->using_directives);
> > >  }
> > >
> > >  /* Add the namespaces in USINGS to the unqualified search queue.  */
> > >
> > > > -name_lookup::using_queue *
> > > > -name_lookup::do_queue_usings (using_queue *queue, int depth,
> > > > - vec<tree, va_gc> *usings)
> > > > +void
> > > > +name_lookup::queue_usings (using_queue& queue, int depth, vec<tree, va_gc> *usings)
> > >  {
> > > -  for

Re: [PATCH v3] c++: alignas and alignof void [PR104944]

2022-03-25 Thread Marek Polacek via Gcc-patches

On Fri, Mar 25, 2022 at 09:36:10AM -0400, Jason Merrill wrote:
> On 3/24/22 18:43, Marek Polacek wrote:
> > On Thu, Mar 24, 2022 at 05:12:12PM -0400, Jason Merrill wrote:
> > > On 3/24/22 15:56, Marek Polacek wrote:
> > > > On Thu, Mar 24, 2022 at 12:02:29PM -0400, Jason Merrill wrote:
> > > > > On 3/24/22 11:49, Marek Polacek wrote:
> > > > > > I started looking into this PR because in GCC 4.9 we were able to
> > > > > > detect the invalid
> > > > > > 
> > > > > >  struct alignas(void) S{};
> > > > > > 
> > > > > > but I broke it in r210262.
> > > > > > 
> > > > > > It's ill-formed code in C++:
> > > > > > [dcl.align]/3: "An alignment-specifier of the form alignas(type-id) 
> > > > > > has
> > > > > > the same effect as alignas(alignof(type-id))", and [expr.align]/1:
> > > > > > "The operand shall be a type-id representing a complete object type,
> > > > > > or an array thereof, or a reference to one of those types." and void
> > > > > > is not a complete type.
> > > > > > 
> > > > > > It's also invalid in C:
> > > > > > 6.7.5: _Alignas(type-name) is equivalent to 
> > > > > > _Alignas(_Alignof(type-name))
> > > > > > 6.5.3.4: "The _Alignof operator shall not be applied to a function 
> > > > > > type
> > > > > > or an incomplete type."
> > > > > > 
> > > > > > We have a GNU extension whereby we treat sizeof(void) as 1, but I 
> > > > > > assume
> > > > > > it doesn't apply to alignof, so I'd like to reject it in C too.
> > > > > 
> > > > > That makes sense to me in principle, but we've allowed it since the
> > > > > beginning of version control, back when c_alignof was a separate 
> > > > > function.
> > > > > Changing that seems questionable for a regression fix.
> > > > 
> > > > Ok, that makes sense.  How about rejecting alignof(void) in C++ only
> > > > now (where it is a regression), and maybe come back to this in GCC 13 
> > > > for C?
> > > 
> > > I'd probably just leave it alone for C and __alignof.
> > 
> > Fair enough.
> > 
> > > > PR c++/104944
> > > > 
> > > > gcc/c-family/ChangeLog:
> > > > 
> > > > * c-common.cc (c_sizeof_or_alignof_type): Do not allow 
> > > > alignof(void)
> > > > in C++.
> > > > 
> > > > gcc/cp/ChangeLog:
> > > > 
> > > > * typeck.cc (cxx_alignas_expr): Call cxx_sizeof_or_alignof_type 
> > > > with
> > > > complain == true.
> > > 
> > > This hunk is OK.  But let's put the diagnostic in
> > > cxx_sizeof_or_alignof_type, where it can depend on std_alignof.
> > 
> > Like so?  With this patch __alignof only produces a pedwarn (there's no
> > __alignas to worry about).
> > 
> > -- >8 --
> > I started looking into this PR because in GCC 4.9 we were able to
> > detect the invalid
> > 
> >struct alignas(void) S{};
> > 
> > but I broke it in r210262.
> > 
> > It's ill-formed code in C++:
> > [dcl.align]/3: "An alignment-specifier of the form alignas(type-id) has
> > the same effect as alignas(alignof(type-id))", and [expr.align]/1:
> > "The operand shall be a type-id representing a complete object type,
> > or an array thereof, or a reference to one of those types." and void
> > is not a complete type.
> > 
> > It's also invalid in C:
> > 6.7.5: _Alignas(type-name) is equivalent to _Alignas(_Alignof(type-name))
> > 6.5.3.4: "The _Alignof operator shall not be applied to a function type
> > or an incomplete type."
> > 
> > We have a GNU extension whereby we treat sizeof(void) as 1, but I assume
> > it doesn't apply to alignof, at least in C++.  However, __alignof__(void)
> > is still accepted with a -Wpedantic warning.
> > 
> > (We still say "invalid application of '__alignof__'" rather than
> > 'alignas' but I felt that fixing that may not be suitable as part of this
> > patch.)
> 
> Do we still say '__alignof__' in this version of the patch?  Seems like now
> we might as well say 'alignof'.  OK with that change.

When diagnosing alignof(void) we now say 'alignof', for __alignof__(void) we
say '__alignof__', but the "incomplete type" diagnostic still always prints
'__alignof__' :(.
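
For reference, the rule being enforced can be sketched with a complete
type.  This is only an illustration; aligned_like_double and
alignment_of_example are hypothetical names, not part of the new test.

```cpp
#include <cstddef>

// alignas(type-id) has the same effect as alignas(alignof(type-id)),
// so the operand must be a complete object type.
struct alignas(double) aligned_like_double
{
  char c;
};

static_assert (alignof (aligned_like_double) == alignof (double),
               "alignas(double) behaves like alignas(alignof(double))");

// Ill-formed in C++, because void is not a complete type:
//   struct alignas(void) S {};

std::size_t alignment_of_example ()
{
  return alignof (aligned_like_double);
}
```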

I'll fix the note and push, thanks!

Marek

Re: [PATCH][RFC] tree-optimization/101908 - avoid STLF fails when vectorizing

2022-03-25 Thread Richard Biener via Gcc-patches

On Fri, 25 Mar 2022, Hongtao Liu wrote:

> On Fri, Mar 25, 2022 at 8:11 PM Richard Biener via Gcc-patches
>  wrote:
> >
> > Since we're now vectorizing by default at -O2 issues like PR101908
> > become more important where we apply basic-block vectorization to
> > parts of the function covering loads from function parameters passed
> > on the stack.  Since we have no good idea how the stack pushing
> > was performed, but we do know that it was done recently when we are
> > at the start of the function, an STLF failure is inevitable if the
> > argument passing code didn't use the same or larger vector stores,
> > aligned the same way.
> >
> > Until there's a robust IPA-based solution the following implements
> > target-independent heuristics in the vectorizer to retain scalar
> > loads for loads from parameters likely passed in memory (I use
> > a BLKmode DECL_MODE check for this rather than firing up
> > cumulative-args).  I've restricted this also to loads from the
> > first "block" (that can be less than the first basic block if there's
> > a call for example), since that covers the testcase.
> I prefer this patch to the md-reorg way.

I was mostly posting the variant because it is target agnostic.  In
general I agree that mitigation is best done at the RTL level after
scheduling/RA so that the instruction sequence from function entry
is visible.

Did you have any success in coding this up yet?  Do you think it
can be done in a way to be re-used by multiple targets?

> Can the vectorizer similarly handle arguments passed by reference if
> the corresponding actual arguments are local variables of the caller?
> I.e., can we also prevent vectorization in the case below?
> 
> struct vec3 {
>   double x, y, z;
> };
> 
> struct ray {
>  struct vec3 orig, dir;
> };
> 
> void
> __attribute__((noinline))
> ray_sphere (struct ray* __restrict ray, double *__restrict res)
> {
>   res[0] = ray->orig.y * ray->dir.x;
>   res[1] = ray->orig.z * ray->dir.y;
> }
> 
> extern struct ray g;
> void bar (double* res)
> {
> struct ray tmp = g;
> ray_sphere (&tmp, res);
> }

Without IPA analysis this will be hard, if not impossible.

Richard.

> >
> > Note that for the testcase (but not c-ray from the bugreport) there's
> > > an x86 peephole2 that vectorizes things back, so the patch is
> > not effective there.
> >
> > Any comments?  I know we're also looking at x86 port specific
> > mitigations but the issue will hit arm and power/z as well I think.
> >
> > Bootstrapped and tested on x86_64-unknown-linux-gnu.
> >
> > Thanks,
> > Richard.
> >
> > 2022-03-25  Richard Biener  
> >
> > PR tree-optimization/101908
> > * tree-vect-stmts.cc (get_group_load_store_type): Add
> > heuristic to limit BB vectorization of function parameters.
> >
> > * gcc.dg/vect/bb-slp-pr101908.c: New testcase.
> > ---
> >  gcc/testsuite/gcc.dg/vect/bb-slp-pr101908.c | 18 ++
> >  gcc/tree-vect-stmts.cc  | 27 -
> >  2 files changed, 44 insertions(+), 1 deletion(-)
> >  create mode 100644 gcc/testsuite/gcc.dg/vect/bb-slp-pr101908.c
> >
> > diff --git a/gcc/testsuite/gcc.dg/vect/bb-slp-pr101908.c 
> > b/gcc/testsuite/gcc.dg/vect/bb-slp-pr101908.c
> > new file mode 100644
> > index 000..b7534a18f0e
> > --- /dev/null
> > +++ b/gcc/testsuite/gcc.dg/vect/bb-slp-pr101908.c
> > @@ -0,0 +1,18 @@
> > +/* { dg-do compile } */
> > +/* { dg-require-effective-target vect_double } */
> > +
> > +struct vec3 {
> > +  double x, y, z;
> > +};
> > +
> > +struct ray {
> > +  struct vec3 orig, dir;
> > +};
> > +
> > +void ray_sphere (struct ray ray, double *res)
> > +{
> > +  res[0] = ray.orig.y * ray.dir.x;
> > +  res[1] = ray.orig.z * ray.dir.y;
> > +}
> > +
> > +/* { dg-final { scan-tree-dump "STLF fail" "slp2" } } */
> > diff --git a/gcc/tree-vect-stmts.cc b/gcc/tree-vect-stmts.cc
> > index f7449a79d1c..1e37e9678b6 100644
> > --- a/gcc/tree-vect-stmts.cc
> > +++ b/gcc/tree-vect-stmts.cc
> > @@ -2197,7 +2197,32 @@ get_group_load_store_type (vec_info *vinfo, 
> > stmt_vec_info stmt_info,
> >/* Stores can't yet have gaps.  */
> >gcc_assert (slp_node || vls_type == VLS_LOAD || gap == 0);
> >
> > -  if (slp_node)
> > +  tree parm;
> > +  if (!loop_vinfo
> > +  && vls_type == VLS_LOAD
> > +  /* The access is based on a PARM_DECL.  */
> > +  && TREE_CODE (DR_BASE_ADDRESS (first_dr_info->dr)) == ADDR_EXPR
> > +  && ((parm = TREE_OPERAND (DR_BASE_ADDRESS (first_dr_info->dr), 0)), 
> > true)
> > +  && TREE_CODE (parm) == PARM_DECL
> > +  /* Likely passed on the stack.  */
> > +  && DECL_MODE (parm) == BLKmode
> > +  /* The access is in the first group.  */
> > +  && first_dr_info->group == 0)
> > +{
> > +  /* When doing BB vectorizing force early loads from function 
> > parameters
> > +passed on the stack and thus stored recently to be done elementwise
> > +to avoid store-to-load forwarding penalties.
> > +

Re: [PATCH v3] c++: alignas and alignof void [PR104944]

2022-03-25 Thread Jason Merrill via Gcc-patches


On 3/24/22 18:43, Marek Polacek wrote:

On Thu, Mar 24, 2022 at 05:12:12PM -0400, Jason Merrill wrote:

On 3/24/22 15:56, Marek Polacek wrote:

On Thu, Mar 24, 2022 at 12:02:29PM -0400, Jason Merrill wrote:

On 3/24/22 11:49, Marek Polacek wrote:

I started looking into this PR because in GCC 4.9 we were able to
detect the invalid

 struct alignas(void) S{};

but I broke it in r210262.

It's ill-formed code in C++:
[dcl.align]/3: "An alignment-specifier of the form alignas(type-id) has
the same effect as alignas(alignof(type-id))", and [expr.align]/1:
"The operand shall be a type-id representing a complete object type,
or an array thereof, or a reference to one of those types." and void
is not a complete type.

It's also invalid in C:
6.7.5: _Alignas(type-name) is equivalent to _Alignas(_Alignof(type-name))
6.5.3.4: "The _Alignof operator shall not be applied to a function type
or an incomplete type."

We have a GNU extension whereby we treat sizeof(void) as 1, but I assume
it doesn't apply to alignof, so I'd like to reject it in C too.


That makes sense to me in principle, but we've allowed it since the
beginning of version control, back when c_alignof was a separate function.
Changing that seems questionable for a regression fix.


Ok, that makes sense.  How about rejecting alignof(void) in C++ only
now (where it is a regression), and maybe come back to this in GCC 13 for C?


I'd probably just leave it alone for C and __alignof.


Fair enough.


PR c++/104944

gcc/c-family/ChangeLog:

* c-common.cc (c_sizeof_or_alignof_type): Do not allow alignof(void)
in C++.

gcc/cp/ChangeLog:

* typeck.cc (cxx_alignas_expr): Call cxx_sizeof_or_alignof_type with
complain == true.


This hunk is OK.  But let's put the diagnostic in
cxx_sizeof_or_alignof_type, where it can depend on std_alignof.


Like so?  With this patch __alignof only produces a pedwarn (there's no
__alignas to worry about).

-- >8 --
I started looking into this PR because in GCC 4.9 we were able to
detect the invalid

   struct alignas(void) S{};

but I broke it in r210262.

It's ill-formed code in C++:
[dcl.align]/3: "An alignment-specifier of the form alignas(type-id) has
the same effect as alignas(alignof(type-id))", and [expr.align]/1:
"The operand shall be a type-id representing a complete object type,
or an array thereof, or a reference to one of those types." and void
is not a complete type.

It's also invalid in C:
6.7.5: _Alignas(type-name) is equivalent to _Alignas(_Alignof(type-name))
6.5.3.4: "The _Alignof operator shall not be applied to a function type
or an incomplete type."

We have a GNU extension whereby we treat sizeof(void) as 1, but I assume
it doesn't apply to alignof, at least in C++.  However, __alignof__(void)
is still accepted with a -Wpedantic warning.

(We still say "invalid application of '__alignof__'" rather than
'alignas' but I felt that fixing that may not be suitable as part of this
patch.)


Do we still say '__alignof__' in this version of the patch?  Seems like 
now we might as well say 'alignof'.  OK with that change.



PR c++/104944

gcc/cp/ChangeLog:

* typeck.cc (cxx_sizeof_or_alignof_type): Diagnose alignof(void).
(cxx_alignas_expr): Call cxx_sizeof_or_alignof_type with
complain == true.

gcc/testsuite/ChangeLog:

* g++.dg/cpp0x/alignas20.C: New test.
---
  gcc/cp/typeck.cc   | 21 +++--
  gcc/testsuite/g++.dg/cpp0x/alignas20.C | 26 ++
  2 files changed, 41 insertions(+), 6 deletions(-)
  create mode 100644 gcc/testsuite/g++.dg/cpp0x/alignas20.C

diff --git a/gcc/cp/typeck.cc b/gcc/cp/typeck.cc
index 516fa574ef6..26a7cb4b50d 100644
--- a/gcc/cp/typeck.cc
+++ b/gcc/cp/typeck.cc
@@ -1873,9 +1873,9 @@ compparms (const_tree parms1, const_tree parms2)
  }
  
  

-/* Process a sizeof or alignof expression where the operand is a
-   type. STD_ALIGNOF indicates whether an alignof has C++11 (minimum alignment)
-   or GNU (preferred alignment) semantics; it is ignored if op is
+/* Process a sizeof or alignof expression where the operand is a type.
+   STD_ALIGNOF indicates whether an alignof has C++11 (minimum alignment)
+   or GNU (preferred alignment) semantics; it is ignored if OP is
 SIZEOF_EXPR.  */
  
  tree

@@ -1899,6 +1899,13 @@ cxx_sizeof_or_alignof_type (location_t loc, tree type, 
enum tree_code op,
else
return error_mark_node;
  }
+  else if (VOID_TYPE_P (type) && std_alignof)
+{
+  if (complain)
+   error_at (loc, "invalid application of %qs to a void type",
+ OVL_OP_INFO (false, op)->name);
+  return error_mark_node;
+}
  
bool dependent_p = dependent_type_p (type);

if (!dependent_p)
@@ -2132,11 +2139,13 @@ cxx_alignas_expr (tree e)
  /* [dcl.align]/3:
 
  	   When the alignment-specifier is of the form

-  alignas(type-id ), it shall have the same effect as
-

[wwwdocs] Add functions that require <iterator> to GCC 12 porting-to docs

2022-03-25 Thread Jonathan Wakely via Gcc-patches

Pushed to wwwdocs.

---
 htdocs/gcc-12/porting_to.html | 3 ++-
 1 file changed, 2 insertions(+), 1 deletion(-)

diff --git a/htdocs/gcc-12/porting_to.html b/htdocs/gcc-12/porting_to.html
index 470703c7..079bda30 100644
--- a/htdocs/gcc-12/porting_to.html
+++ b/htdocs/gcc-12/porting_to.html
@@ -67,7 +67,8 @@ be included explicitly when compiled with GCC 12:
   (for std::shared_ptr, std::unique_ptr etc.)
 
  iterator
-  (for std::istream_iterator, std::istreambuf_iterator)
+  (for std::begin, std::end, std::size,
+  std::istream_iterator, std::istreambuf_iterator)
 
  algorithm
   (for std::for_each, std::copy etc.)
-- 
2.34.1

Re: [PATCH v2] middle-end/104854: Limit strncmp overread warnings

2022-03-25 Thread Jason Merrill via Gcc-patches


On 3/17/22 06:35, Jonathan Wakely via Gcc-patches wrote:

On 15/03/22 14:36 -0600, Martin Sebor wrote:

On 3/15/22 10:40, Siddhesh Poyarekar wrote:

On 15/03/2022 21:09, Martin Sebor wrote:

The strncmp function takes arrays as arguments (not necessarily
strings).  The main purpose of the -Wstringop-overread warning
for calls to it is to detect calls where one of the arrays is
not a nul-terminated string and the bound is larger than the size
of the array.  For example:

   char a[4], b[4];

   int f (void)
   {
     return strncmp (a, b, 8);   // -Wstringop-overread
   }

Such a call is suspect: if one of the arrays isn't nul-terminated
the call is undefined.  Otherwise, if both are nul-terminated there


Isn't "suspect" too harsh a description though?  The bound does not
specify the size of a or b, it specifies the maximum extent to which to
compare a and b, the extent being any application-specific limit.  In
fact the limit could be the size of some arbitrary third buffer that the
contents of a or b must be copied to, truncating to the bound.


The intended use of the strncmp bound is to limit the comparison to
at most the size of the arrays or (in a subset of cases) the length
of an initial substring. Providing an arbitrary bound that's not
related to the sizes as you describe sounds very much like a misuse.

As a historical note, strncmp was first introduced in UNIX v7 where
its purpose, alongside strncpy, was to manipulate (potentially)
unterminated character arrays like file names stored in fixed size
arrays (typically 14 bytes).  Strncpy would fill the buffers with
ASCII data up to their size and pad the rest with nuls only if there
was room.

Strncmp was then used to compare these potentially unterminated
character arrays (e.g., archive headers in ld and ranlib).  The bound
was the size of the fixed size array.  Its other use case was to compare
leading portions of strings (e.g, when looking for an environment
variable or when stripping "./" from path names).

Since the early UNIX days, both strncpy and to a lesser extent strncmp
have been widely misused and, along with many other functions in
<string.h>, a frequent source of bugs due to common misunderstanding
of their intended purpose.  The aim of these warnings is to detect
the common (and sometimes less common) misuses and bugs.


I agree the call is undefined if one of the arrays is not nul-terminated
and that's the thing; nothing about the bound is undefined in this
context, it's the NUL termination that is key.


is no point in calling strncmp with a bound greater than their sizes.


There is, when the bound describes something else, e.g. the size of a
third destination buffer into which one of the input buffers may get
copied into.  Or when the bound describes the maximum length of a set of
strings where only a subset of the strings are reachable in the current
function and ranger sees it, allowing us to reduce our input string size
estimate.  The bounds being the maximum of the lengths of two input
strings is just one of many possibilities.


With no evidence that this warning is ever harmful I'd consider


There is, the false positives were seen in Fedora/RHEL builds.


I haven't seen these so I can't very well comment on them.  But I can
assure you that warning for the code above is intentional.  Whether


I don't think anybody is saying it wasn't intentional, the point is
that we can change our minds and do it differently based on feedback
and usage experience.

If users no longer have faith in these warnings and just disable them
or ignore them, then the warnings do not find real bugs and are not
fit for purpose.


or not the arrays are nul-terminated, the expected way to call
the function is with a bound no greater than their size (some coding
guidelines are explicit about this; see for example the CERT C Secure
Coding standard rule ARR38-C).


That's fine. It shouldn't be in -Wall though.


Perhaps a suitable compromise would be to add a separate warning flag 
specifically for the strn* warnings, so users deliberately using the 
bound to express a limit other than the length of the argument string 
(and confident that their strings are always NUL-terminated) can turn 
them off without turning off all the overread warnings.


Jason
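
To make the two readings of the bound discussed in this thread concrete, here is a minimal sketch in C. The names KEY_MAX, cmp_sized and cmp_limit are hypothetical and appear nowhere in the patch; only strncmp itself is a real API. Whether a given GCC version actually diagnoses the second call depends on the version and options, so no warning text is claimed here.

```c
#include <string.h>

/* Hypothetical application-wide limit, unrelated to the size of a or b.  */
enum { KEY_MAX = 8 };

/* Both arrays are NUL-terminated, so every comparison below stops at or
   before the terminator and never reads past the arrays.  */
static const char a[4] = "abc";
static const char b[4] = "abd";

/* Bound tied to the array size: the call shape the warning expects.  */
static int
cmp_sized (void)
{
  return strncmp (a, b, sizeof a);
}

/* Bound expressing an unrelated limit.  Well-defined only because both
   arrays are NUL-terminated; this is the call shape that
   -Wstringop-overread is designed to flag.  */
static int
cmp_limit (void)
{
  return strncmp (a, b, KEY_MAX);
}
```

Both functions return the same negative result here, which is the point made against calling such code "suspect": the larger bound changes nothing as long as the inputs are terminated.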

Re: [PATCH][RFC] tree-optimization/101908 - avoid STLF fails when vectorizing

2022-03-25 Thread Hongtao Liu via Gcc-patches

On Fri, Mar 25, 2022 at 8:11 PM Richard Biener via Gcc-patches
 wrote:
>
> Since we're now vectorizing by default at -O2 issues like PR101908
> become more important where we apply basic-block vectorization to
> parts of the function covering loads from function parameters passed
> on the stack.  We have no good idea how the stack pushing was
> performed, but we do know that it was done recently when we are
> at the start of the function, so a STLF failure is inevitable
> if the argument passing code didn't use the same or larger vector
> stores with the same alignment.
>
> Until there's a robust IPA based solution the following implements
> target independent heuristics in the vectorizer to retain scalar
> loads for loads from parameters likely passed in memory (I use
> a BLKmode DECL_MODE check for this rather than firing up
> cumulative-args).  I've restricted this also to loads from the
> first "block" (that can be less than the first basic block if there's
> a call for example), since that covers the testcase.
I prefer this patch to the md-reorg way.
Can the vectorizer similarly handle arguments passed by reference when
the corresponding actual arguments are local variables of the caller?
I.e., can we also prevent vectorization for the case below?

struct vec3 {
  double x, y, z;
};

struct ray {
 struct vec3 orig, dir;
};

void
__attribute__((noinline))
ray_sphere (struct ray* __restrict ray, double *__restrict res)
{
  res[0] = ray->orig.y * ray->dir.x;
  res[1] = ray->orig.z * ray->dir.y;
}

extern struct ray g;
void bar (double* res)
{
struct ray tmp = g;
ray_sphere (&tmp, res);
}

>
> Note that for the testcase (but not c-ray from the bugreport) there's
> a x86 peephole2 that vectorizes things back, so the patch is
> not effective there.
>
> Any comments?  I know we're also looking at x86 port specific
> mitigations but the issue will hit arm and power/z as well I think.
>
> Bootstrapped and tested on x86_64-unknown-linux-gnu.
>
> Thanks,
> Richard.
>
> 2022-03-25  Richard Biener  
>
> PR tree-optimization/101908
> * tree-vect-stmts.cc (get_group_load_store_type): Add
> heuristic to limit BB vectorization of function parameters.
>
> * gcc.dg/vect/bb-slp-pr101908.c: New testcase.
> ---
>  gcc/testsuite/gcc.dg/vect/bb-slp-pr101908.c | 18 ++
>  gcc/tree-vect-stmts.cc  | 27 -
>  2 files changed, 44 insertions(+), 1 deletion(-)
>  create mode 100644 gcc/testsuite/gcc.dg/vect/bb-slp-pr101908.c
>
> diff --git a/gcc/testsuite/gcc.dg/vect/bb-slp-pr101908.c b/gcc/testsuite/gcc.dg/vect/bb-slp-pr101908.c
> new file mode 100644
> index 000..b7534a18f0e
> --- /dev/null
> +++ b/gcc/testsuite/gcc.dg/vect/bb-slp-pr101908.c
> @@ -0,0 +1,18 @@
> +/* { dg-do compile } */
> +/* { dg-require-effective-target vect_double } */
> +
> +struct vec3 {
> +  double x, y, z;
> +};
> +
> +struct ray {
> +  struct vec3 orig, dir;
> +};
> +
> +void ray_sphere (struct ray ray, double *res)
> +{
> +  res[0] = ray.orig.y * ray.dir.x;
> +  res[1] = ray.orig.z * ray.dir.y;
> +}
> +
> +/* { dg-final { scan-tree-dump "STLF fail" "slp2" } } */
> diff --git a/gcc/tree-vect-stmts.cc b/gcc/tree-vect-stmts.cc
> index f7449a79d1c..1e37e9678b6 100644
> --- a/gcc/tree-vect-stmts.cc
> +++ b/gcc/tree-vect-stmts.cc
> @@ -2197,7 +2197,32 @@ get_group_load_store_type (vec_info *vinfo, stmt_vec_info stmt_info,
>/* Stores can't yet have gaps.  */
>gcc_assert (slp_node || vls_type == VLS_LOAD || gap == 0);
>
> -  if (slp_node)
> +  tree parm;
> +  if (!loop_vinfo
> +  && vls_type == VLS_LOAD
> +  /* The access is based on a PARM_DECL.  */
> +  && TREE_CODE (DR_BASE_ADDRESS (first_dr_info->dr)) == ADDR_EXPR
> +  && ((parm = TREE_OPERAND (DR_BASE_ADDRESS (first_dr_info->dr), 0)), true)
> +  && TREE_CODE (parm) == PARM_DECL
> +  /* Likely passed on the stack.  */
> +  && DECL_MODE (parm) == BLKmode
> +  /* The access is in the first group.  */
> +  && first_dr_info->group == 0)
> +{
> +  /* When doing BB vectorizing force early loads from function parameters
> +passed on the stack and thus stored recently to be done elementwise
> +to avoid store-to-load forwarding penalties.
> +Note this will cause vectorization to fail for the load because of
> +the fear of underestimating the cost of elementwise accesses,
> +see the end of get_load_store_type.  We are then going to effectively
> +do the same via handling the loads as external input to the SLP.  */
> +  if (dump_enabled_p ())
> +   dump_printf_loc (MSG_NOTE, vect_location,
> +"Not using vector loads from function parameter "
> +"for the fear of causing a STLF fail\n");
> +  *memory_access_type = VMAT_ELEMENTWISE;
> +}
> +  else if (slp_node)
>  {
>/* For SLP vectorization we directly vectorize a subchain
>

RE: [aarch64] Update Neoverse N2 core definition

2022-03-25 Thread Kyrylo Tkachov via Gcc-patches



> -Original Message-
> From: Andre Vieira (lists) 
> Sent: Wednesday, March 16, 2022 3:01 PM
> To: gcc-patches@gcc.gnu.org
> Cc: Kyrylo Tkachov ; Richard Sandiford
> 
> Subject: [aarch64] Update Neoverse N2 core definition
> 
> Hi,
> 
> As requested, I updated the Neoverse N2 entry to use the
> AARCH64_FL_FOR_ARCH9 feature set, removed duplicate entries, updated
> the ARCH_IDENT to 9A and moved it under the Armv9 cores.
> 

Ok, I should have said that the change is pre-approved.
Thanks,
Kyrill

> gcc/ChangeLog:
> 
>      * config/aarch64/aarch64-cores.def: Update Neoverse N2 core entry.

Re: [PATCH][libgomp, testsuite] Scale down some OpenACC test-cases

2022-03-25 Thread Tom de Vries via Gcc-patches


On 3/25/22 13:35, Thomas Schwinge wrote:

Hi!

On 2022-03-25T13:08:52+0100, Tom de Vries  wrote:

On 3/25/22 11:04, Tobias Burnus wrote:

On 25.03.22 10:27, Jakub Jelinek via Gcc-patches wrote:

On Fri, Mar 25, 2022 at 10:18:49AM +0100, Tom de Vries wrote:

[...]
Fix this by scaling down the failing test-cases.
Tested on x86_64-linux with nvptx accelerator.
[...]

Will defer to Thomas, as it is a purely OpenACC change.

One way to do it is
/* { dg-additional-options "-DEXPENSIVE" { target run_expensive_tests
} } */
and using
#ifdef EXPENSIVE
[...]

For the Fortran test it would mean .F90 extension though...


Alternatively, use the "-cpp" flag in 'dg-additional-options', which also
enables C preprocessing in gfortran.


Ack, updated patch accordingly.


Not sure if this additional "complexity" is really necessary here: as far
as I can tell, there's no actual rationale behind the original number of
iterations, so it seems fine to unconditionally scale them down.  I'd
thus move forward with your original patch -- but won't object to the
'run_expensive_tests' variant either; the latter is already used in a
handful of other libgomp test cases.



Ack, committed the GCC_TEST_RUN_EXPENSIVE variant.

Thanks,
- Tom

Re: [PATCH][libgomp, testsuite] Scale down some OpenACC test-cases

2022-03-25 Thread Thomas Schwinge

Hi!

On 2022-03-25T13:08:52+0100, Tom de Vries  wrote:
> On 3/25/22 11:04, Tobias Burnus wrote:
>> On 25.03.22 10:27, Jakub Jelinek via Gcc-patches wrote:
>>> On Fri, Mar 25, 2022 at 10:18:49AM +0100, Tom de Vries wrote:
 [...]
 Fix this by scaling down the failing test-cases.
 Tested on x86_64-linux with nvptx accelerator.
 [...]
>>> Will defer to Thomas, as it is a purely OpenACC change.
>>>
>>> One way to do it is
>>> /* { dg-additional-options "-DEXPENSIVE" { target run_expensive_tests
>>> } } */
>>> and using
>>> #ifdef EXPENSIVE
>>> [...]
>>>
>>> For the Fortran test it would mean .F90 extension though...
>>
>> Alternatively, use the "-cpp" flag in 'dg-additional-options', which also
>> enables C preprocessing in gfortran.
>
> Ack, updated patch accordingly.

Not sure if this additional "complexity" is really necessary here: as far
as I can tell, there's no actual rationale behind the original number of
iterations, so it seems fine to unconditionally scale them down.  I'd
thus move forward with your original patch -- but won't object to the
'run_expensive_tests' variant either; the latter is already used in a
handful of other libgomp test cases.


Regards
 Thomas

Re: [PATCH] fortran: Fix up initializers of param(0) PARAMETERs [PR103691]

2022-03-25 Thread Jakub Jelinek via Gcc-patches

On Fri, Mar 25, 2022 at 01:13:06PM +0100, Richard Biener wrote:
> > Also, I think typically in the Fortran FE side-effects would go into
> > se.pre and se.post sequences, not into se.expr, and this routine
> > doesn't emit those se.pre/se.post sequences anywhere, so presumably it
> > assumes they don't exist.
> >
> > What is the behavior with a RANGE_EXPR when one has { [0..10] = ++i; },
> > is that applying the side-effects 11 times or once ?
> 
> 11 times is what is documented.

Then [0..-1] = ++i should be 0 times the side-effect.

Jakub

Re: [PATCH] middle-end/105049 - fix uniform_vector_p and vector CTOR gimplification

2022-03-25 Thread Richard Biener via Gcc-patches

On Fri, 25 Mar 2022, Richard Biener wrote:

> We have
> 
>   return VIEW_CONVERT_EXPR( VEC_PERM_EXPR < {<<< Unknown tree: 
> compound_literal_expr
> V D.1984 = { 0 }; >>>, { 0 }} , {<<< Unknown tree: 
> compound_literal_expr
> V D.1985 = { 0 }; >>>, { 0 }} , { 0, 0 } >  & {(short int) SAVE_EXPR 
> , (short int) SAVE_EXPR });
> 
> where we gimplify the init CTORs to
> 
>   _1 = {{ 0 }, { 0 }};
>   _2 = {{ 0 }, { 0 }};
> 
> instead of to vector constants.  That later runs into a bug in
> uniform_vector_p which doesn't handle CTORs of vector elements
> correctly.
> 
> The following adjusts uniform_vector_p to handle CTORs of vector
> elements and also makes sure to simplify the CTORs to VECTOR_CSTs
> during gimplification by re-ordering the simplification to after
> CTOR flag recomputation.
> 
> Bootstrapped and tested on x86_64-unknown-linux-gnu.  At this
> point I'm leaning towards delaying the gimplification change
> to stage1 - do you agree?

I have now pushed the variant with just the tree.cc hunk.

Richard.

> Thanks,
> Richard.
> 
> 2022-03-25  Richard Biener  
> 
>   PR middle-end/105049
>   * gimplify.cc (gimplify_init_constructor): First gimplify,
>   then simplify the result to a VECTOR_CST.
>   * tree.cc (uniform_vector_p): Recurse for VECTOR_CST or
>   CONSTRUCTOR first elements.
> 
>   * gcc/testsuite/gcc.dg/pr105049.c: New testcase.
> ---
>  gcc/gimplify.cc | 33 -
>  gcc/testsuite/gcc.dg/pr105049.c | 12 
>  gcc/tree.cc |  2 ++
>  3 files changed, 30 insertions(+), 17 deletions(-)
>  create mode 100644 gcc/testsuite/gcc.dg/pr105049.c
> 
> diff --git a/gcc/gimplify.cc b/gcc/gimplify.cc
> index f62f150fc08..a866d4e6f56 100644
> --- a/gcc/gimplify.cc
> +++ b/gcc/gimplify.cc
> @@ -5390,6 +5390,22 @@ gimplify_init_constructor (tree *expr_p, gimple_seq *pre_p, gimple_seq *post_p,
>   if (notify_temp_creation)
> return GS_OK;
>  
> + /* Vector types use CONSTRUCTOR all the way through gimple
> +compilation as a general initializer.  */
> + FOR_EACH_VEC_SAFE_ELT (elts, ix, ce)
> +   {
> + enum gimplify_status tret;
> + tret = gimplify_expr (&ce->value, pre_p, post_p, is_gimple_val,
> +   fb_rvalue);
> + if (tret == GS_ERROR)
> +   ret = GS_ERROR;
> + else if (TREE_STATIC (ctor)
> +  && !initializer_constant_valid_p (ce->value,
> +TREE_TYPE (ce->value)))
> +   TREE_STATIC (ctor) = 0;
> +   }
> + recompute_constructor_flags (ctor);
> +
>   /* Go ahead and simplify constant constructors to VECTOR_CST.  */
>   if (TREE_CONSTANT (ctor))
> {
> @@ -5412,25 +5428,8 @@ gimplify_init_constructor (tree *expr_p, gimple_seq *pre_p, gimple_seq *post_p,
>   TREE_OPERAND (*expr_p, 1) = build_vector_from_ctor (type, elts);
>   break;
> }
> -
> - TREE_CONSTANT (ctor) = 0;
> }
>  
> - /* Vector types use CONSTRUCTOR all the way through gimple
> -compilation as a general initializer.  */
> - FOR_EACH_VEC_SAFE_ELT (elts, ix, ce)
> -   {
> - enum gimplify_status tret;
> - tret = gimplify_expr (&ce->value, pre_p, post_p, is_gimple_val,
> -   fb_rvalue);
> - if (tret == GS_ERROR)
> -   ret = GS_ERROR;
> - else if (TREE_STATIC (ctor)
> -  && !initializer_constant_valid_p (ce->value,
> -TREE_TYPE (ce->value)))
> -   TREE_STATIC (ctor) = 0;
> -   }
> - recompute_constructor_flags (ctor);
>   if (!is_gimple_reg (TREE_OPERAND (*expr_p, 0)))
> TREE_OPERAND (*expr_p, 1) = get_formal_tmp_var (ctor, pre_p);
>}
> diff --git a/gcc/testsuite/gcc.dg/pr105049.c b/gcc/testsuite/gcc.dg/pr105049.c
> new file mode 100644
> index 000..b0518c6a181
> --- /dev/null
> +++ b/gcc/testsuite/gcc.dg/pr105049.c
> @@ -0,0 +1,12 @@
> +/* { dg-do compile } */
> +/* { dg-options "-O -fno-tree-forwprop" } */
> +
> +typedef short __attribute__((__vector_size__ (sizeof(short)))) V;
> +typedef short __attribute__((__vector_size__ (2*sizeof(short)))) U;
> +char c;
> +
> +U
> +foo (void)
> +{
> +  return __builtin_shufflevector ((V){}, (V){}, 0, 0) & c;
> +}
> diff --git a/gcc/tree.cc b/gcc/tree.cc
> index b8017af6cfc..ec200e9a7eb 100644
> --- a/gcc/tree.cc
> +++ b/gcc/tree.cc
> @@ -10266,6 +10266,8 @@ uniform_vector_p (const_tree vec)
>if (i != nelts)
>   return NULL_TREE;
>  
> +  if (TREE_CODE (first) == CONSTRUCTOR || TREE_CODE (first) == VECTOR_CST)
> + return uniform_vector_p (first);
>return first;
>  }
>  
> 

-- 
Richard Biener 
SUSE Software Solutions Germany GmbH, Maxfeldstrasse 5, 90409 Nuernberg,
Germany; GF: Ivo Totev; HRB 36809 (AG Nuernberg)

Re: [PATCH] fortran: Fix up initializers of param(0) PARAMETERs [PR103691]

2022-03-25 Thread Richard Biener via Gcc-patches

On Fri, Mar 25, 2022 at 12:34 PM Jakub Jelinek  wrote:
>
> On Fri, Mar 25, 2022 at 12:16:40PM +0100, Richard Biener wrote:
> > On Fri, Mar 25, 2022 at 11:13 AM Tobias Burnus  
> > wrote:
> > >
> > > On 25.03.22 09:57, Jakub Jelinek via Fortran wrote:
> > > > > On the gfortran.dg/pr103691.f90 testcase the Fortran FE emits
> > > >static real(kind=4) a[0] = {[0 ... -1]=2.0e+0};
> > > > That is an invalid RANGE_EXPR where the maximum is smaller than the 
> > > > minimum.
> > > >
> > > > The following patch fixes that.  If TYPE_MAX_VALUE is smaller than
> > > > TYPE_MIN_VALUE, the array is empty and so doesn't need any initializer,
> > > > if the two are equal, we don't need to bother with a RANGE_EXPR and
> > > > can just use that INTEGER_CST as the index and finally for the 2+ values
> > > > in the range it uses a RANGE_EXPR as before.
> > > >
> > > > Bootstrapped/regtested on x86_64-linux and i686-linux, ok for trunk?
> > >
> > > LGTM – thanks for taking care of Fortran patches and regressions.
> > >
> > > > 2022-03-25  Jakub Jelinek  
> > > >
> > > >   PR fortran/103691
> > > > >   * trans-array.cc (gfc_conv_array_initializer): If TYPE_MAX_VALUE is
> > > > >   smaller than TYPE_MIN_VALUE (i.e. empty array), throw the initializer
> > > > >   on the floor, if TYPE_MIN_VALUE is equal to TYPE_MAX_VALUE, use just
> > > > >   the TYPE_MIN_VALUE as index instead of RANGE_EXPR.
> > >
> > > > I am not sure whether "throw the initializer on the floor" is the best wording
> > > > for a changelog. I think I prefer a wording like "ignore the initializer" or
> > > > another less idiomatic expression. And I think a ';' before the second 'if'
> > > > also increases readability.
> >
> > Can there be side-effects in those initializer elements in Fortran?
>
> For PARAMETERs certainly not, those need to be constant.
> Even otherwise, this is in a routine that does
>   /* Create a constructor from the list of elements.  */
>   tmp = build_constructor (type, v);
>   TREE_CONSTANT (tmp) = 1;
>   return tmp;
> at the end so I wouldn't expect side-effects anywhere.

Ah, didn't see that.

> Also, I think typically in the Fortran FE side-effects would go into
> se.pre and se.post sequences, not into se.expr, and this routine
> doesn't emit those se.pre/se.post sequences anywhere, so presumably it
> assumes they don't exist.
>
> What is the behavior with a RANGE_EXPR when one has { [0..10] = ++i; },
> is that applying the side-effects 11 times or once ?

11 times is what is documented.

Richard.

>
>
> Jakub
>
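
The three cases described in the quoted patch summary (empty array, single element, two or more elements) can be sketched as a small classifier. This is only an illustration of the decision, not the code in trans-array.cc: the enum and the function name classify_bounds are hypothetical, and plain long values stand in for the TYPE_MIN_VALUE and TYPE_MAX_VALUE trees.

```c
/* Hypothetical model of the index choice; not GCC's actual code.  */
enum init_index_kind
{
  INIT_NONE,    /* max < min: empty array, emit no initializer */
  INIT_SINGLE,  /* max == min: use that one value as the index */
  INIT_RANGE    /* max > min: use a [min ... max] range index */
};

static enum init_index_kind
classify_bounds (long min, long max)
{
  if (max < min)
    return INIT_NONE;
  if (max == min)
    return INIT_SINGLE;
  return INIT_RANGE;
}
```

For the testcase in the PR, the bounds are 0 and -1, so the sketch yields INIT_NONE, which corresponds to dropping the invalid [0 ... -1] range altogether.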

Re: [PATCH][libgomp, testsuite] Scale down some OpenACC test-cases

2022-03-25 Thread Jakub Jelinek via Gcc-patches

On Fri, Mar 25, 2022 at 01:08:52PM +0100, Tom de Vries wrote:
> On 3/25/22 11:04, Tobias Burnus wrote:
> > On 25.03.22 10:27, Jakub Jelinek via Gcc-patches wrote:
> > > On Fri, Mar 25, 2022 at 10:18:49AM +0100, Tom de Vries wrote:
> > > > [...]
> > > > Fix this by scaling down the failing test-cases.
> > > > Tested on x86_64-linux with nvptx accelerator.
> > > > [...]
> > > Will defer to Thomas, as it is a purely OpenACC change.
> > > 
> > > One way to do it is
> > > /* { dg-additional-options "-DEXPENSIVE" { target
> > > run_expensive_tests } } */
> > > and using
> > > #ifdef EXPENSIVE
> > > [...]
> > > 
> > > For the Fortran test it would mean .F90 extension though...
> > 
> > Alternatively, use the "-cpp" flag in 'dg-additional-options', which also
> > enables C preprocessing in gfortran.
> > 
> 
> Ack, updated patch accordingly.

LGTM, if Thomas doesn't disagree until mid next week, it is ok for trunk.

> 2022-03-25  Tom de Vries  
> 
>   PR libgomp/105042
>   * testsuite/libgomp.oacc-c-c++-common/parallel-dims.c: Reduce
>   execution time.
>   * testsuite/libgomp.oacc-c-c++-common/vred2d-128.c: Same.
>   * testsuite/libgomp.oacc-fortran/parallel-dims.f90: Same.

Jakub
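
The macro-based scaling suggested above can be sketched as follows. The values 100 and 50 are the ones used in the posted parallel-dims.c change; the helper loop_trip_count is hypothetical and only serves to show how the loop bounds shrink when EXPENSIVE is not defined. In the real test the macro would come from the dg-additional-options directive guarded by run_expensive_tests, not from the source file.

```c
/* EXPENSIVE is expected to be passed on the command line (-DEXPENSIVE)
   only when expensive tests are enabled.  */
#ifdef EXPENSIVE
#define N 100
#else
#define N 50
#endif

/* Hypothetical helper: counts the iterations of a loop with the same
   shape as the ones in the test, running from N * scale down to, but
   not including, -N * scale.  */
static int
loop_trip_count (int scale)
{
  int count = 0;
  for (int i = N * scale; i > -N * scale; --i)
    ++count;
  return count;
}
```

The trip count is 2 * N * scale, so the default build does half the work of the expensive one without changing the structure of the test.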

[PATCH][RFC] tree-optimization/101908 - avoid STLF fails when vectorizing

2022-03-25 Thread Richard Biener via Gcc-patches

Since we're now vectorizing by default at -O2 issues like PR101908
become more important where we apply basic-block vectorization to
parts of the function covering loads from function parameters passed
on the stack.  We have no good idea how the stack pushing was
performed, but we do know that it was done recently when we are
at the start of the function, so a STLF failure is inevitable
if the argument passing code didn't use the same or larger vector
stores with the same alignment.

Until there's a robust IPA based solution the following implements
target independent heuristics in the vectorizer to retain scalar
loads for loads from parameters likely passed in memory (I use
a BLKmode DECL_MODE check for this rather than firing up
cumulative-args).  I've restricted this also to loads from the
first "block" (that can be less than the first basic block if there's
a call for example), since that covers the testcase.

Note that for the testcase (but not c-ray from the bugreport) there's
a x86 peephole2 that vectorizes things back, so the patch is
not effective there.

Any comments?  I know we're also looking at x86 port specific
mitigations but the issue will hit arm and power/z as well I think.

Bootstrapped and tested on x86_64-unknown-linux-gnu.

Thanks,
Richard.

2022-03-25  Richard Biener  

PR tree-optimization/101908
* tree-vect-stmts.cc (get_group_load_store_type): Add
heuristic to limit BB vectorization of function parameters.

* gcc.dg/vect/bb-slp-pr101908.c: New testcase.
---
 gcc/testsuite/gcc.dg/vect/bb-slp-pr101908.c | 18 ++
 gcc/tree-vect-stmts.cc  | 27 -
 2 files changed, 44 insertions(+), 1 deletion(-)
 create mode 100644 gcc/testsuite/gcc.dg/vect/bb-slp-pr101908.c

diff --git a/gcc/testsuite/gcc.dg/vect/bb-slp-pr101908.c b/gcc/testsuite/gcc.dg/vect/bb-slp-pr101908.c
new file mode 100644
index 000..b7534a18f0e
--- /dev/null
+++ b/gcc/testsuite/gcc.dg/vect/bb-slp-pr101908.c
@@ -0,0 +1,18 @@
+/* { dg-do compile } */
+/* { dg-require-effective-target vect_double } */
+
+struct vec3 {
+  double x, y, z;
+};
+
+struct ray {
+  struct vec3 orig, dir;
+};
+
+void ray_sphere (struct ray ray, double *res)
+{
+  res[0] = ray.orig.y * ray.dir.x;
+  res[1] = ray.orig.z * ray.dir.y;
+}
+
+/* { dg-final { scan-tree-dump "STLF fail" "slp2" } } */
diff --git a/gcc/tree-vect-stmts.cc b/gcc/tree-vect-stmts.cc
index f7449a79d1c..1e37e9678b6 100644
--- a/gcc/tree-vect-stmts.cc
+++ b/gcc/tree-vect-stmts.cc
@@ -2197,7 +2197,32 @@ get_group_load_store_type (vec_info *vinfo, stmt_vec_info stmt_info,
   /* Stores can't yet have gaps.  */
   gcc_assert (slp_node || vls_type == VLS_LOAD || gap == 0);
 
-  if (slp_node)
+  tree parm;
+  if (!loop_vinfo
+  && vls_type == VLS_LOAD
+  /* The access is based on a PARM_DECL.  */
+  && TREE_CODE (DR_BASE_ADDRESS (first_dr_info->dr)) == ADDR_EXPR
+  && ((parm = TREE_OPERAND (DR_BASE_ADDRESS (first_dr_info->dr), 0)), true)
+  && TREE_CODE (parm) == PARM_DECL
+  /* Likely passed on the stack.  */
+  && DECL_MODE (parm) == BLKmode
+  /* The access is in the first group.  */
+  && first_dr_info->group == 0)
+{
+  /* When doing BB vectorizing force early loads from function parameters
+passed on the stack and thus stored recently to be done elementwise
+to avoid store-to-load forwarding penalties.
+Note this will cause vectorization to fail for the load because of
+the fear of underestimating the cost of elementwise accesses,
+see the end of get_load_store_type.  We are then going to effectively
+do the same via handling the loads as external input to the SLP.  */
+  if (dump_enabled_p ())
+   dump_printf_loc (MSG_NOTE, vect_location,
+"Not using vector loads from function parameter "
+"for the fear of causing a STLF fail\n");
+  *memory_access_type = VMAT_ELEMENTWISE;
+}
+  else if (slp_node)
 {
   /* For SLP vectorization we directly vectorize a subchain
 without permutation.  */
-- 
2.34.1

Re: [PATCH][libgomp, testsuite] Scale down some OpenACC test-cases

2022-03-25 Thread Tom de Vries via Gcc-patches


On 3/25/22 11:04, Tobias Burnus wrote:

On 25.03.22 10:27, Jakub Jelinek via Gcc-patches wrote:

On Fri, Mar 25, 2022 at 10:18:49AM +0100, Tom de Vries wrote:

[...]
Fix this by scaling down the failing test-cases.
Tested on x86_64-linux with nvptx accelerator.
[...]

Will defer to Thomas, as it is a purely OpenACC change.

One way to do it is
/* { dg-additional-options "-DEXPENSIVE" { target run_expensive_tests 
} } */

and using
#ifdef EXPENSIVE
[...]

For the Fortran test it would mean .F90 extension though...


Alternatively, use the "-cpp" flag in 'dg-additional-options', which also
enables C preprocessing in gfortran.



Ack, updated patch accordingly.

Thanks,
- Tom
[libgomp, testsuite] Scale down some OpenACC test-cases

When a display manager is running on an nvidia card, all CUDA kernel launches
get a 5 seconds watchdog timer.

Consequently, when running the libgomp testsuite with nvptx accelerator and
GOMP_NVPTX_JIT=-O0 we run into a few FAILs like this:
...
libgomp: cuStreamSynchronize error: the launch timed out and was terminated
FAIL: libgomp.oacc-c/../libgomp.oacc-c-c++-common/parallel-dims.c \
  -DACC_DEVICE_TYPE_nvidia=1 -DACC_MEM_SHARED=0 -foffload=nvptx-none -O0 \
  execution test
...

Fix this by scaling down the failing test-cases by default, and reverting to
the original behaviour for GCC_TEST_RUN_EXPENSIVE=1.

Tested on x86_64-linux with nvptx accelerator.

libgomp/ChangeLog:

2022-03-25  Tom de Vries  

	PR libgomp/105042
	* testsuite/libgomp.oacc-c-c++-common/parallel-dims.c: Reduce
	execution time.
	* testsuite/libgomp.oacc-c-c++-common/vred2d-128.c: Same.
	* testsuite/libgomp.oacc-fortran/parallel-dims.f90: Same.

---
 .../libgomp.oacc-c-c++-common/parallel-dims.c  | 45 +-
 .../libgomp.oacc-c-c++-common/vred2d-128.c |  6 +++
 .../libgomp.oacc-fortran/parallel-dims.f90 | 18 +++--
 3 files changed, 46 insertions(+), 23 deletions(-)

diff --git a/libgomp/testsuite/libgomp.oacc-c-c++-common/parallel-dims.c b/libgomp/testsuite/libgomp.oacc-c-c++-common/parallel-dims.c
index b1cfe37df8a..6798e23ef70 100644
--- a/libgomp/testsuite/libgomp.oacc-c-c++-common/parallel-dims.c
+++ b/libgomp/testsuite/libgomp.oacc-c-c++-common/parallel-dims.c
@@ -1,6 +1,8 @@
 /* OpenACC parallelism dimensions clauses: num_gangs, num_workers,
vector_length.  */
 
+/* { dg-additional-options "-DEXPENSIVE" { target run_expensive_tests } } */
+
 /* { dg-additional-options "--param=openacc-kernels=decompose" } */
 
 /* { dg-additional-options "-fopt-info-all-omp" }
@@ -49,6 +51,11 @@ static int acc_vector ()
   return __builtin_goacc_parlevel_id (GOMP_DIM_VECTOR);
 }
 
+#ifdef EXPENSIVE
+#define N 100
+#else
+#define N 50
+#endif
 
 int main ()
 {
@@ -76,7 +83,7 @@ int main ()
 {
   /* We're actually executing with num_gangs (1).  */
   gangs_actual = 1;
-  for (int i = 100 * gangs_actual; i > -100 * gangs_actual; --i)
+  for (int i = N * gangs_actual; i > -N * gangs_actual; --i)
 	{
 	  gangs_min = gangs_max = acc_gang ();
 	  workers_min = workers_max = acc_worker ();
@@ -115,7 +122,7 @@ int main ()
   reduction (min: gangs_min, workers_min, vectors_min) reduction (max: gangs_max, workers_max, vectors_max)
   /* { dg-note {variable 'i' in 'private' clause isn't candidate for adjusting OpenACC privatization level: not addressable} {} { target *-*-* } l_loop_i$c_loop_i } */
   /* { dg-optimized {assigned OpenACC gang loop parallelism} {} { target *-*-* } l_loop_i$c_loop_i } */
-  for (int i = 100 * gangs_actual; i > -100 * gangs_actual; --i)
+  for (int i = N * gangs_actual; i > -N * gangs_actual; --i)
 	{
 	  gangs_min = gangs_max = acc_gang ();
 	  workers_min = workers_max = acc_worker ();
@@ -154,7 +161,7 @@ int main ()
   reduction (min: gangs_min, workers_min, vectors_min) reduction (max: gangs_max, workers_max, vectors_max)
   /* { dg-note {variable 'i' in 'private' clause isn't candidate for adjusting OpenACC privatization level: not addressable} {} { target *-*-* } l_loop_i$c_loop_i } */
   /* { dg-optimized {assigned OpenACC worker loop parallelism} {} { target *-*-* } l_loop_i$c_loop_i } */
-  for (int i = 100 * workers_actual; i > -100 * workers_actual; --i)
+  for (int i = N * workers_actual; i > -N * workers_actual; --i)
 	{
 	  gangs_min = gangs_max = acc_gang ();
 	  workers_min = workers_max = acc_worker ();
@@ -200,7 +207,7 @@ int main ()
   reduction (min: gangs_min, workers_min, vectors_min) reduction (max: gangs_max, workers_max, vectors_max)
   /* { dg-note {variable 'i' in 'private' clause isn't candidate for adjusting OpenACC privatization level: not addressable} {} { target *-*-* } l_loop_i$c_loop_i } */
   /* { dg-optimized {assigned OpenACC vector loop parallelism} {} { target *-*-* } l_loop_i$c_loop_i } */
-  for (int i = 100 * vectors_actual; i > -100 * vectors_actual; --i)
+  for (int i = N * vectors_actual; i > -N * vectors_actual; --i)
 	{

[c-family] Fix issue for pointers to anonymous types with -fdump-ada-spec

2022-03-25 Thread Eric Botcazou via Gcc-patches

This used to work long ago but broke at some point, so I'm applying the fix
only on the mainline, all the more so that it deals with the "section" attribute.

Tested on x86-64/Linux, applied on the mainline.


2022-03-25  Eric Botcazou  

c-family/
* c-ada-spec.cc (dump_ada_import): Deal with the "section" attribute.
(dump_ada_node) : Do not modify and pass the name, but
the referenced type instead.  Deal with the anonymous original type
of a typedef'ed type.  In the actual access case, follow the chain of
external subtypes.
: Tidy up control flow.

-- 
Eric Botcazou
diff --git a/gcc/c-family/c-ada-spec.cc b/gcc/c-family/c-ada-spec.cc
index aeb429136b6..f291e150934 100644
--- a/gcc/c-family/c-ada-spec.cc
+++ b/gcc/c-family/c-ada-spec.cc
@@ -1526,6 +1526,15 @@ dump_ada_import (pretty_printer *buffer, tree t, int spc)
 
   newline_and_indent (buffer, spc + 5);
 
+  tree sec = lookup_attribute ("section", DECL_ATTRIBUTES (t));
+  if (sec)
+{
+  pp_string (buffer, "Linker_Section => \"");
+  pp_string (buffer, TREE_STRING_POINTER (TREE_VALUE (TREE_VALUE (sec))));
+  pp_string (buffer, "\", ");
+  newline_and_indent (buffer, spc + 5);
+}
+
   pp_string (buffer, "External_Name => \"");
 
   if (is_stdcall)
@@ -2179,10 +2188,11 @@ dump_ada_node (pretty_printer *buffer, tree node, tree type, int spc,
 	}
   else
 	{
-	  const unsigned int quals = TYPE_QUALS (TREE_TYPE (node));
+	  tree ref_type = TREE_TYPE (node);
+	  const unsigned int quals = TYPE_QUALS (ref_type);
 	  bool is_access = false;
 
-	  if (VOID_TYPE_P (TREE_TYPE (node)))
+	  if (VOID_TYPE_P (ref_type))
 	{
 	  if (!name_only)
 		pp_string (buffer, "new ");
@@ -2197,9 +2207,8 @@ dump_ada_node (pretty_printer *buffer, tree node, tree type, int spc,
 	  else
 	{
 	  if (TREE_CODE (node) == POINTER_TYPE
-		  && TREE_CODE (TREE_TYPE (node)) == INTEGER_TYPE
-		  && id_equal (DECL_NAME (TYPE_NAME (TREE_TYPE (node))),
-			   "char"))
+		  && TREE_CODE (ref_type) == INTEGER_TYPE
+		  && id_equal (DECL_NAME (TYPE_NAME (ref_type)), "char"))
 		{
 		  if (!name_only)
 		pp_string (buffer, "new ");
@@ -2214,28 +2223,11 @@ dump_ada_node (pretty_printer *buffer, tree node, tree type, int spc,
 		}
 	  else
 		{
-		  tree type_name = TYPE_NAME (TREE_TYPE (node));
-
-		  /* Generate "access " instead of "access "
-		 if the subtype comes from another file, because subtype
-		 declarations do not contribute to the limited view of a
-		 package and thus subtypes cannot be referenced through
-		 a limited_with clause.  */
-		  if (type_name
-		  && TREE_CODE (type_name) == TYPE_DECL
-		  && DECL_ORIGINAL_TYPE (type_name)
-		  && TYPE_NAME (DECL_ORIGINAL_TYPE (type_name)))
-		{
-		  const expanded_location xloc
-			= expand_location (decl_sloc (type_name, false));
-		  if (xloc.line
-			  && xloc.file
-			  && xloc.file != current_source_file)
-			type_name = DECL_ORIGINAL_TYPE (type_name);
-		}
+		  tree stub = TYPE_STUB_DECL (ref_type);
+		  tree type_name = TYPE_NAME (ref_type);
 
 		  /* For now, handle access-to-access as System.Address.  */
-		  if (TREE_CODE (TREE_TYPE (node)) == POINTER_TYPE)
+		  if (TREE_CODE (ref_type) == POINTER_TYPE)
 		{
 		  if (package_prefix)
 			{
@@ -2251,7 +2243,7 @@ dump_ada_node (pretty_printer *buffer, tree node, tree type, int spc,
 
 		  if (!package_prefix)
 		pp_string (buffer, "access");
-		  else if (AGGREGATE_TYPE_P (TREE_TYPE (node)))
+		  else if (AGGREGATE_TYPE_P (ref_type))
 		{
 		  if (!type || TREE_CODE (type) != FUNCTION_DECL)
 			{
@@ -2281,12 +2273,41 @@ dump_ada_node (pretty_printer *buffer, tree node, tree type, int spc,
 			pp_string (buffer, "all ");
 		}
 
-		  if (RECORD_OR_UNION_TYPE_P (TREE_TYPE (node)) && type_name)
-		dump_ada_node (buffer, type_name, TREE_TYPE (node), spc,
-   is_access, true);
-		  else
-		dump_ada_node (buffer, TREE_TYPE (node), TREE_TYPE (node),
-   spc, false, true);
+		  /* If this is the anonymous original type of a typedef'ed
+		 type, then use the name of the latter.  */
+		  if (!type_name
+		  && stub
+		  && DECL_CHAIN (stub)
+		  && TREE_CODE (DECL_CHAIN (stub)) == TYPE_DECL
+		  && DECL_ORIGINAL_TYPE (DECL_CHAIN (stub)) == ref_type)
+		ref_type = TREE_TYPE (DECL_CHAIN (stub));
+
+		  /* Generate "access " instead of "access "
+		 if the subtype comes from another file, because subtype
+		 declarations do not contribute to the limited view of a
+		 package and thus subtypes cannot be referenced through
+		 a limited_with clause.  */
+		  else if (is_access)
+		while (type_name
+			   && TREE_CODE (type_name) == TYPE_DECL
+			   && DECL_ORIGINAL_TYPE (type_name)
+			   && TYPE_NAME (DECL_ORIGINAL_TYPE (type_name)))
+		  {
+			const expanded_location xloc
+			  = expand_location (decl_sloc (type_name, false));
+			if (xloc.line
+			&& xloc.file
+

Re: [PATCH] fortran: Fix up initializers of param(0) PARAMETERs [PR103691]

2022-03-25 Thread Jakub Jelinek via Gcc-patches

On Fri, Mar 25, 2022 at 12:16:40PM +0100, Richard Biener wrote:
> On Fri, Mar 25, 2022 at 11:13 AM Tobias Burnus  
> wrote:
> >
> > On 25.03.22 09:57, Jakub Jelinek via Fortran wrote:
> > > > On the gfortran.dg/pr103691.f90 testcase the Fortran FE emits
> > >static real(kind=4) a[0] = {[0 ... -1]=2.0e+0};
> > > That is an invalid RANGE_EXPR where the maximum is smaller than the 
> > > minimum.
> > >
> > > The following patch fixes that.  If TYPE_MAX_VALUE is smaller than
> > > TYPE_MIN_VALUE, the array is empty and so doesn't need any initializer,
> > > if the two are equal, we don't need to bother with a RANGE_EXPR and
> > > can just use that INTEGER_CST as the index and finally for the 2+ values
> > > in the range it uses a RANGE_EXPR as before.
> > >
> > > Bootstrapped/regtested on x86_64-linux and i686-linux, ok for trunk?
> >
> > LGTM – thanks for taking care of Fortran patches and regressions.
> >
> > > 2022-03-25  Jakub Jelinek  
> > >
> > >   PR fortran/103691
> > >   * trans-array.cc (gfc_conv_array_initializer): If TYPE_MAX_VALUE is
> > >   smaller than TYPE_MIN_VALUE (i.e. empty array), throw the 
> > > initializer
> > >   on the floor, if TYPE_MIN_VALUE is equal to TYPE_MAX_VALUE, use just
> > >   the TYPE_MIN_VALUE as index instead of RANGE_EXPR.
> >
> > I am not sure whether "throw the initializer on the floor" is the best 
> > wording
> > for a changelog. I think I prefer a wording like "ignore the initializer" or
> > another less idiomatic expression. And I think a ';' before the second 'if'
> > also increases readability.
> 
> Can there be side-effects in those initializer elements in Fortran?

For PARAMETERs certainly not, those need to be constant.
Even otherwise, this is in a routine that does
  /* Create a constructor from the list of elements.  */
  tmp = build_constructor (type, v);
  TREE_CONSTANT (tmp) = 1;
  return tmp;
at the end so I wouldn't expect side-effects anywhere.

Also, I think typically in the Fortran FE side-effects would go into
se.pre and se.post sequences, not into se.expr, and this routine
doesn't emit those se.pre/se.post sequences anywhere, so presumably it
assumes they don't exist.

What is the behavior with a RANGE_EXPR when one has { [0..10] = ++i; },
is that applying the side-effects 11 times or once ?


Jakub
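
For readers following the thread, the three cases described in the quoted
patch summary (empty domain, single element, real range) can be sketched in
plain C.  This is only a model under stated assumptions: the enum and the
function name are hypothetical, plain longs stand in for the INTEGER_CST
bounds, and nothing here is GCC code.

```c
/* Hypothetical model of the index choice; not taken from trans-array.cc.  */
enum init_index_kind
{
  INDEX_NONE,    /* max < min: empty array, no initializer element.  */
  INDEX_SINGLE,  /* max == min: use the single bound as the index.  */
  INDEX_RANGE    /* max > min: keep a [min ... max] range index.  */
};

/* Decide which kind of index an initializer for the domain [MIN, MAX]
   would need.  */
static enum init_index_kind
classify_domain (long min, long max)
{
  if (max < min)
    return INDEX_NONE;
  if (max == min)
    return INDEX_SINGLE;
  return INDEX_RANGE;
}
```

With the domain from the PR, classify_domain (0, -1) yields INDEX_NONE, which
corresponds to the case where no element is appended at all.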

Re: [PATCH] fortran: Fix up initializers of param(0) PARAMETERs [PR103691]

2022-03-25 Thread Richard Biener via Gcc-patches

On Fri, Mar 25, 2022 at 11:13 AM Tobias Burnus  wrote:
>
> On 25.03.22 09:57, Jakub Jelinek via Fortran wrote:
> > On the gfortran.dg/pr103691.f90 testcase the Fortran FE emits
> >static real(kind=4) a[0] = {[0 ... -1]=2.0e+0};
> > That is an invalid RANGE_EXPR where the maximum is smaller than the minimum.
> >
> > The following patch fixes that.  If TYPE_MAX_VALUE is smaller than
> > TYPE_MIN_VALUE, the array is empty and so doesn't need any initializer,
> > if the two are equal, we don't need to bother with a RANGE_EXPR and
> > can just use that INTEGER_CST as the index and finally for the 2+ values
> > in the range it uses a RANGE_EXPR as before.
> >
> > Bootstrapped/regtested on x86_64-linux and i686-linux, ok for trunk?
>
> LGTM – thanks for taking care of Fortran patches and regressions.
>
> > 2022-03-25  Jakub Jelinek  
> >
> >   PR fortran/103691
> >   * trans-array.cc (gfc_conv_array_initializer): If TYPE_MAX_VALUE is
> >   smaller than TYPE_MIN_VALUE (i.e. empty array), throw the initializer
> >   on the floor, if TYPE_MIN_VALUE is equal to TYPE_MAX_VALUE, use just
> >   the TYPE_MIN_VALUE as index instead of RANGE_EXPR.
>
> I am not sure whether "throw the initializer on the floor" is the best wording
> for a changelog. I think I prefer a wording like "ignore the initializer" or
> another less idiomatic expression. And I think a ';' before the second 'if'
> also increases readability.

Can there be side-effects in those initializer elements in Fortran?

Richard.

> Tobias
>
> > --- gcc/fortran/trans-array.cc.jj 2022-02-04 14:36:55.113603791 +0100
> > > +++ gcc/fortran/trans-array.cc 2022-03-24 16:14:58.334498775 +0100
> > > @@ -6267,10 +6267,17 @@ gfc_conv_array_initializer (tree type, g
> > > else
> > >   gfc_conv_structure (&se, expr, 1);
> >
> > -  CONSTRUCTOR_APPEND_ELT (v, build2 (RANGE_EXPR, gfc_array_index_type,
> > -  TYPE_MIN_VALUE (TYPE_DOMAIN (type)),
> > -  TYPE_MAX_VALUE (TYPE_DOMAIN (type))),
> > -   se.expr);
> > > +  if (tree_int_cst_lt (TYPE_MAX_VALUE (TYPE_DOMAIN (type)),
> > > +TYPE_MIN_VALUE (TYPE_DOMAIN (type))))
> > > + break;
> > > +  else if (tree_int_cst_equal (TYPE_MIN_VALUE (TYPE_DOMAIN (type)),
> > > +TYPE_MAX_VALUE (TYPE_DOMAIN (type))))
> > + range = TYPE_MIN_VALUE (TYPE_DOMAIN (type));
> > +  else
> > + range = build2 (RANGE_EXPR, gfc_array_index_type,
> > + TYPE_MIN_VALUE (TYPE_DOMAIN (type)),
> > + TYPE_MAX_VALUE (TYPE_DOMAIN (type)));
> > +  CONSTRUCTOR_APPEND_ELT (v, range, se.expr);
> > break;
> >
> >   case EXPR_ARRAY:
> >
> >   Jakub
> >
> -
> Siemens Electronic Design Automation GmbH; Address: Arnulfstraße 201, 80634
> München; limited liability company; Managing Directors: Thomas
> Heurung, Frank Thürauf; Registered office: München; Court of registration:
> München, HRB 106955

Re: [PATCH] arm: Revert Auto-vectorization for MVE: add pack/unpack patterns PR target/104882

2022-03-25 Thread Jakub Jelinek via Gcc-patches

On Tue, Mar 22, 2022 at 03:33:44PM +0100, Christophe Lyon via Gcc-patches wrote:
> This reverts commit r12-1434-g046a3beb1673bf to fix PR target/104882.
> 
> As discussed in the PR, it turns out that the MVE ISA has no natural
> mapping with GCC's vec_pack_trunc / vec_unpack standard patterns, unlike
> Neon or SVE for instance.
> 
> This patch also adds the executable testcase provided in the PR.
> This test passes at -O3 because the generated code does not need
> to use the pack/unpack patterns, hence the use of -O2 which now
> triggers vectorization since a few months ago.

For reverting your own patches you don't need to wait for approval:
https://gcc.gnu.org/gitwrite.html
"Similarly, no outside approval is needed to revert a patch that you checked 
in."

The new test LGTM.

> 2022-03-18  Christophe Lyon  
> 
>   PR target/104882
>   Revert
>   2021-06-11  Christophe Lyon  
> 
>   gcc/
>   * config/arm/mve.md (mve_vec_unpack_lo_): Delete.
>   (mve_vec_unpack_hi_): Delete.
>   (@mve_vec_pack_trunc_lo_): Delete.
>   (mve_vmovntq_): Remove '@' prefix.
>   * config/arm/neon.md (vec_unpack_hi_): Move back
>   from vec-common.md.
>   (vec_unpack_lo_): Likewise.
>   (vec_pack_trunc_): Rename from
>   neon_quad_vec_pack_trunc_.
>   * config/arm/vec-common.md (vec_unpack_hi_): Delete.
>   (vec_unpack_lo_): Delete.
>   (vec_pack_trunc_): Delete.
> 
>   PR target/104882
>   gcc/testsuite/
>   * gcc.target/arm/simd/mve-vclz.c: Update expected results.
>   * gcc.target/arm/simd/mve-vshl.c: Likewise.
>   * gcc.target/arm/simd/mve-vec-pack.c: Delete.
>   * gcc.target/arm/simd/mve-vec-unpack.c: Delete.
>   * gcc.target/arm/simd/pr104882.c: New test.

Jakub

[PATCH] recog: Return 1 from insn_invalid_p if REG_INC reg overlaps some stored reg [PR103775]

2022-03-25 Thread Jakub Jelinek via Gcc-patches

Hi!

The following testcase ICEs on aarch64-linux with -g and
assembles with a warning otherwise, because it emits
ldrb w0,[x0,16]!
instruction which sets the x0 register multiple times.
Due to disabled DCE (from -Og) we end up before REE with:
(insn 12 39 13 2 (set (reg:SI 1 x1 [orig:93 _2 ] [93])
(zero_extend:SI (mem/c:QI (pre_modify:DI (reg/f:DI 0 x0 [114])
(plus:DI (reg/f:DI 0 x0 [114])
(const_int 16 [0x10]))) [1 u128_1+0 S1 A128]))) 
"pr103775.c":5:35 117 {*zero_extendqisi2_aarch64}
 (expr_list:REG_INC (reg/f:DI 0 x0 [114])
(nil)))
(insn 13 12 14 2 (set (reg:DI 0 x0 [orig:112 _2 ] [112])
(zero_extend:DI (reg:SI 1 x1 [orig:93 _2 ] [93]))) "pr103775.c":5:16 
111 {*zero_extendsidi2_aarch64}
 (nil))
which is valid but not exactly efficient as x0 is dead after the
insn that auto-increments it.  REE turns it into:
(insn 12 39 44 2 (set (reg:DI 0 x0)
(zero_extend:DI (mem/c:QI (pre_modify:DI (reg/f:DI 0 x0 [114])
(plus:DI (reg/f:DI 0 x0 [114])
(const_int 16 [0x10]))) [1 u128_1+0 S1 A128]))) 
"pr103775.c":5:35 119 {*zero_extendqidi2_aarch64}
 (expr_list:REG_INC (reg/f:DI 0 x0 [114])
(nil)))
(insn 44 12 14 2 (set (reg:DI 1 x1)
(reg:DI 0 x0)) "pr103775.c":5:35 -1
 (nil))
which is invalid because it sets x0 multiple times, once
in SET_DEST of the PATTERN and once in PRE_MODIFY.
As perhaps other passes than REE might suffer from it, IMHO it is better
to reject this during change validation.
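
As a rough illustration of the check (a sketch only, not the patch itself),
the idea can be modelled in plain C with registers reduced to integers.  The
struct, the function names and the "-1 means overlap found" convention are
all assumptions made for this example; the callback merely mirrors the
note_stores style of passing state through a void pointer.

```c
#include <stdbool.h>
#include <stddef.h>

/* Hypothetical model: an insn is described by the registers it stores to
   and the registers it auto-modifies (its REG_INC operands).  */
struct insn_model
{
  const int *stored;
  size_t n_stored;
  const int *inc;
  size_t n_inc;
};

/* Callback in the style of note_stores: DATA points at the auto-modified
   register; it is reset to -1 once a store to the same register is seen.  */
static void
check_overlap (int stored_reg, void *data)
{
  int *pinc = data;
  if (*pinc >= 0 && *pinc == stored_reg)
    *pinc = -1;
}

/* Return true if any auto-modified register is also stored to.  */
static bool
insn_model_invalid_p (const struct insn_model *insn)
{
  for (size_t i = 0; i < insn->n_inc; i++)
    {
      int reg = insn->inc[i];
      for (size_t j = 0; j < insn->n_stored; j++)
        check_overlap (insn->stored[j], &reg);
      if (reg < 0)
        return true;
    }
  return false;
}
```

In this model, insn 12 before REE (stores x1, auto-modifies x0) is accepted,
while the form after REE (stores x0, auto-modifies x0) is rejected.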

Below is one patch that does that only if reload_completed, attached
is another version that does it always even before reload.

I've so far bootstrapped/regtested the first patch on
{x86_64,i686,powerpc64le,armv7hl}-linux, aarch64-linux regtest
is still pending.
If you prefer the second version, I can start testing it momentarily.

2022-03-25  Jakub Jelinek  

PR rtl-optimization/103775
* recog.cc (check_invalid_inc_dec): New function.
(insn_invalid_p): Return 1 if REG_INC operand overlaps
any stored REGs.

* gcc.dg/pr103775.c: New test.

--- gcc/recog.cc.jj 2022-01-18 11:58:59.802978913 +0100
+++ gcc/recog.cc 2022-03-24 17:46:00.116489292 +0100
@@ -329,6 +329,17 @@ canonicalize_change_group (rtx_insn *ins
 return false;
 }
 
+/* Check if REG_INC argument in *data overlaps a stored REG.  */
+
+static void
+check_invalid_inc_dec (rtx reg, const_rtx, void *data)
+{
+  rtx *pinc = (rtx *) data;
+  if (*pinc == NULL_RTX || MEM_P (reg))
+return;
+  if (reg_overlap_mentioned_p (reg, *pinc))
+*pinc = NULL_RTX;
+}
 
 /* This subroutine of apply_change_group verifies whether the changes to INSN
were valid; i.e. whether INSN can still be recognized.
@@ -384,6 +395,17 @@ insn_invalid_p (rtx_insn *insn, bool in_
 
   if (! constrain_operands (1, get_preferred_alternatives (insn)))
return 1;
+
+  /* Punt if REG_INC argument overlaps some stored REG.  */
+  for (rtx link = FIND_REG_INC_NOTE (insn, NULL_RTX);
+  link; link = XEXP (link, 1))
+   if (REG_NOTE_KIND (link) == REG_INC)
+ {
+   rtx reg = XEXP (link, 0);
+   note_stores (insn, check_invalid_inc_dec, &reg);
+   if (reg == NULL_RTX)
+ return 1;
+ }
 }
 
   INSN_CODE (insn) = icode;
--- gcc/testsuite/gcc.dg/pr103775.c.jj  2022-03-24 17:51:25.962817859 +0100
+++ gcc/testsuite/gcc.dg/pr103775.c 2022-03-24 17:51:11.024032506 +0100
@@ -0,0 +1,12 @@
+/* PR rtl-optimization/103775 */
+/* { dg-do assemble { target int128 } } */
+/* { dg-options "-Og -fno-forward-propagate -free -g" } */
+
+int
+foo (char a, short b, int c, __int128 d, char e, short f, int g, __int128 h)
+{
+  long i = __builtin_clrsbll ((char) h);
+  __builtin_memset ((char *)  + 4, d, 3);
+  c &= (char) h;
+  return c + i;
+}

Jakub
2022-03-25  Jakub Jelinek  

PR rtl-optimization/103775
* recog.cc (check_invalid_inc_dec): New function.
(insn_invalid_p): Return 1 if REG_INC operand overlaps
any stored REGs.

* gcc.dg/pr103775.c: New test.

--- gcc/recog.cc.jj 2022-01-18 11:58:59.802978913 +0100
+++ gcc/recog.cc 2022-03-24 17:46:00.116489292 +0100
@@ -329,6 +329,17 @@ canonicalize_change_group (rtx_insn *ins
 return false;
 }
 
+/* Check if REG_INC argument in *data overlaps a stored REG.  */
+
+static void
+check_invalid_inc_dec (rtx reg, const_rtx, void *data)
+{
+  rtx *pinc = (rtx *) data;
+  if (*pinc == NULL_RTX || MEM_P (reg))
+return;
+  if (reg_overlap_mentioned_p (reg, *pinc))
+*pinc = NULL_RTX;
+}
 
 /* This subroutine of apply_change_group verifies whether the changes to INSN
were valid; i.e. whether INSN can still be recognized.
@@ -386,6 +397,17 @@ insn_invalid_p (rtx_insn *insn, bool in_
return 1;
 }
 
+  /* Punt if REG_INC argument overlaps some stored REG.  */
+  for (rtx link = FIND_REG_INC_NOTE (insn, NULL_RTX);
+

Re: [PATCH] fortran: Fix up initializers of param(0) PARAMETERs [PR103691]

2022-03-25 Thread Tobias Burnus


On 25.03.22 09:57, Jakub Jelinek via Fortran wrote:

> On the gfortran.dg/pr103691.f90 testcase the Fortran FE emits
>    static real(kind=4) a[0] = {[0 ... -1]=2.0e+0};
> That is an invalid RANGE_EXPR where the maximum is smaller than the minimum.
>
> The following patch fixes that.  If TYPE_MAX_VALUE is smaller than
> TYPE_MIN_VALUE, the array is empty and so doesn't need any initializer,
> if the two are equal, we don't need to bother with a RANGE_EXPR and
> can just use that INTEGER_CST as the index and finally for the 2+ values
> in the range it uses a RANGE_EXPR as before.
>
> Bootstrapped/regtested on x86_64-linux and i686-linux, ok for trunk?


LGTM – thanks for taking care of Fortran patches and regressions.


> 2022-03-25  Jakub Jelinek
>
>   PR fortran/103691
>   * trans-array.cc (gfc_conv_array_initializer): If TYPE_MAX_VALUE is
>   smaller than TYPE_MIN_VALUE (i.e. empty array), throw the initializer
>   on the floor, if TYPE_MIN_VALUE is equal to TYPE_MAX_VALUE, use just
>   the TYPE_MIN_VALUE as index instead of RANGE_EXPR.


I am not sure whether "throw the initializer on the floor" is the best wording
for a changelog. I think I prefer a wording like "ignore the initializer" or
another less idiomatic expression. And I think a ';' before the second 'if'
also increases readability.

Tobias


> --- gcc/fortran/trans-array.cc.jj 2022-02-04 14:36:55.113603791 +0100
> +++ gcc/fortran/trans-array.cc 2022-03-24 16:14:58.334498775 +0100
> @@ -6267,10 +6267,17 @@ gfc_conv_array_initializer (tree type, g
> else
>   gfc_conv_structure (&se, expr, 1);
>
> -  CONSTRUCTOR_APPEND_ELT (v, build2 (RANGE_EXPR, gfc_array_index_type,
> -  TYPE_MIN_VALUE (TYPE_DOMAIN (type)),
> -  TYPE_MAX_VALUE (TYPE_DOMAIN (type))),
> -   se.expr);
> +  if (tree_int_cst_lt (TYPE_MAX_VALUE (TYPE_DOMAIN (type)),
> +TYPE_MIN_VALUE (TYPE_DOMAIN (type))))
> + break;
> +  else if (tree_int_cst_equal (TYPE_MIN_VALUE (TYPE_DOMAIN (type)),
> +TYPE_MAX_VALUE (TYPE_DOMAIN (type))))
> + range = TYPE_MIN_VALUE (TYPE_DOMAIN (type));
> +  else
> + range = build2 (RANGE_EXPR, gfc_array_index_type,
> + TYPE_MIN_VALUE (TYPE_DOMAIN (type)),
> + TYPE_MAX_VALUE (TYPE_DOMAIN (type)));
> +  CONSTRUCTOR_APPEND_ELT (v, range, se.expr);
> break;
>
>   case EXPR_ARRAY:
>
>   Jakub



Re: [PATCH][libgomp, testsuite] Scale down some OpenACC test-cases

2022-03-25 Thread Tobias Burnus


On 25.03.22 10:27, Jakub Jelinek via Gcc-patches wrote:

> On Fri, Mar 25, 2022 at 10:18:49AM +0100, Tom de Vries wrote:
> > [...]
> > Fix this by scaling down the failing test-cases.
> > Tested on x86_64-linux with nvptx accelerator.
> > [...]
>
> Will defer to Thomas, as it is a purely OpenACC change.
>
> One way to do it is
> /* { dg-additional-options "-DEXPENSIVE" { target run_expensive_tests } } */
> and using
> #ifdef EXPENSIVE
> [...]
>
> For the Fortran test it would mean .F90 extension though...


Alternatively, use the "-cpp" flag in 'dg-additional-options', which also
enables C preprocessing in gfortran.

Tobias


Re: [committed] Docs: Document that taint analyzer checker disables some warnings [PR103533]

2022-03-25 Thread Tobias Burnus


This commit broke bootstrapping - well, at least kind of:
  make pdf   (and probably 'make dvi' and other formats using TeX)
now fails with:

Runaway argument?

-Wanalyzer-tainted-allocation-size @gol -Wanalyzer-tainted-array-inde@ETC.
src/gcc-mainline/gcc/doc/invoke.texi:9682: File ended while scanning use of @doignoretext.

The problem seems to be that when producing TeX output, the '...' of
  @gccoptlist{ ... }
ends up in a single line. Thus, '@ignore' is found but no
corresponding '@end ignore'.

Using '@c' has the same problem. And adding a line break will give
an error as  @gccoptlist  does not permit another paragraph in
the argument.

My solution is to move the '@ignore' comment after the closing '}'.

Committed attached patch as obvious
as r12-7810-g748f36a48b506f52e10bcdeb750a7fe9c30c26f3

The culprit is the following change:

On 25.03.22 02:08, David Malcolm via Gcc-patches wrote:

From: Avinash Sonawane 

On Thu, 2022-03-24 at 10:41 +0530, Avinash Sonawane wrote:

[...]

[...]

I've gone ahead and pushed it to trunk as r12-7808-g319ba7e241e7e2:
   
https://gcc.gnu.org/git/?p=gcc.git;a=commitdiff;h=319ba7e241e7e21f9eb481f075310796f13d2035
(the patch I committed follows)
[...]

gcc/ChangeLog:
  PR analyzer/103533
  * doc/invoke.texi: Document that enabling taint analyzer
  checker disables some warnings from `-fanalyzer`.
[...]

--- a/gcc/doc/invoke.texi
+++ b/gcc/doc/invoke.texi
[...]
@@ -9659,22 +9659,24 @@ Enabling this option effectively enables the following warnings:
  -Wanalyzer-free-of-non-heap @gol
  -Wanalyzer-malloc-leak @gol
  -Wanalyzer-mismatching-deallocation @gol
--Wanalyzer-possible-null-argument @gol
--Wanalyzer-possible-null-dereference @gol
  -Wanalyzer-null-argument @gol
  -Wanalyzer-null-dereference @gol
+-Wanalyzer-possible-null-argument @gol
+-Wanalyzer-possible-null-dereference @gol
  -Wanalyzer-shift-count-negative @gol
  -Wanalyzer-shift-count-overflow @gol
  -Wanalyzer-stale-setjmp-buffer @gol
+@ignore
  -Wanalyzer-tainted-allocation-size @gol
  -Wanalyzer-tainted-array-index @gol
  -Wanalyzer-tainted-divisor @gol
  -Wanalyzer-tainted-offset @gol
  -Wanalyzer-tainted-size @gol
+@end ignore
  -Wanalyzer-unsafe-call-within-signal-handler @gol
  -Wanalyzer-use-after-free @gol
--Wanalyzer-use-of-uninitialized-value @gol
  -Wanalyzer-use-of-pointer-in-stale-stack-frame @gol
+-Wanalyzer-use-of-uninitialized-value @gol
  -Wanalyzer-write-to-const @gol
  -Wanalyzer-write-to-string-literal @gol
  }
[...]


Tobias
commit 748f36a48b506f52e10bcdeb750a7fe9c30c26f3
Author: Tobias Burnus 
Date:   Fri Mar 25 10:47:49 2022 +0100

doc/invoke.texi: Move @ignore block out of @gccoptlist [PR103533]

With TeX output ("make pdf"), @gccoptlist's content ends up in a single
line such that TeX does not find the matching '@end ignore' for the
'@ignore' block – failing with a runaway error. Solution is to move
the @ignore block after the closing '}'.
(Follow up to r12-7808-g319ba7e241e7e21f9eb481f075310796f13d2035 )

gcc/
PR analyzer/103533
* doc/invoke.texi (Static Analyzer Options): Move
@ignore block after @gccoptlist's '}' for 'make pdf'.

diff --git a/gcc/doc/invoke.texi b/gcc/doc/invoke.texi
index afb21d9154c..28307105541 100644
--- a/gcc/doc/invoke.texi
+++ b/gcc/doc/invoke.texi
@@ -9667,13 +9667,6 @@ Enabling this option effectively enables the following warnings:
 -Wanalyzer-shift-count-negative @gol
 -Wanalyzer-shift-count-overflow @gol
 -Wanalyzer-stale-setjmp-buffer @gol
-@ignore
--Wanalyzer-tainted-allocation-size @gol
--Wanalyzer-tainted-array-index @gol
--Wanalyzer-tainted-divisor @gol
--Wanalyzer-tainted-offset @gol
--Wanalyzer-tainted-size @gol
-@end ignore
 -Wanalyzer-unsafe-call-within-signal-handler @gol
 -Wanalyzer-use-after-free @gol
 -Wanalyzer-use-of-pointer-in-stale-stack-frame @gol
@@ -9681,6 +9674,13 @@ Enabling this option effectively enables the following warnings:
 -Wanalyzer-write-to-const @gol
 -Wanalyzer-write-to-string-literal @gol
 }
+@ignore
+-Wanalyzer-tainted-allocation-size @gol
+-Wanalyzer-tainted-array-index @gol
+-Wanalyzer-tainted-divisor @gol
+-Wanalyzer-tainted-offset @gol
+-Wanalyzer-tainted-size @gol
+@end ignore
 
 This option is only available if GCC was configured with analyzer
 support enabled.

[PATCH] middle-end/105049 - fix uniform_vector_p and vector CTOR gimplification

2022-03-25 Thread Richard Biener via Gcc-patches

We have

  return VIEW_CONVERT_EXPR( VEC_PERM_EXPR < {<<< Unknown tree: 
compound_literal_expr
V D.1984 = { 0 }; >>>, { 0 }} , {<<< Unknown tree: compound_literal_expr
V D.1985 = { 0 }; >>>, { 0 }} , { 0, 0 } >  & {(short int) SAVE_EXPR 
, (short int) SAVE_EXPR });

where we gimplify the init CTORs to

  _1 = {{ 0 }, { 0 }};
  _2 = {{ 0 }, { 0 }};

instead of to vector constants.  That later runs into a bug in
uniform_vector_p which doesn't handle CTORs of vector elements
correctly.

The following adjusts uniform_vector_p to handle CTORs of vector
elements and also makes sure to simplify the CTORs to VECTOR_CSTs
during gimplification by re-ordering the simplification to after
CTOR flag recomputation.
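
To make the uniform_vector_p part concrete, here is a small self-contained
model of the recursion.  It is a sketch under assumptions: "struct node" is a
hypothetical stand-in for a tree (a scalar constant or a constructor of
element nodes), and node_equal and uniform_element are illustrative names,
not GCC functions.

```c
#include <stddef.h>

/* Hypothetical stand-in for a tree node.  */
struct node
{
  int is_ctor;                      /* Nonzero for a constructor.  */
  int value;                        /* Scalar value when !is_ctor.  */
  const struct node *const *elts;   /* Elements when is_ctor.  */
  size_t n;                         /* Number of elements.  */
};

/* Structural equality of two nodes.  */
static int
node_equal (const struct node *a, const struct node *b)
{
  if (a->is_ctor != b->is_ctor)
    return 0;
  if (!a->is_ctor)
    return a->value == b->value;
  if (a->n != b->n)
    return 0;
  for (size_t i = 0; i < a->n; i++)
    if (!node_equal (a->elts[i], b->elts[i]))
      return 0;
  return 1;
}

/* Return the scalar that every lane of VEC equals, or NULL if there is
   no such scalar.  */
static const struct node *
uniform_element (const struct node *vec)
{
  if (!vec->is_ctor || vec->n == 0)
    return NULL;
  const struct node *first = vec->elts[0];
  for (size_t i = 1; i < vec->n; i++)
    if (!node_equal (vec->elts[i], first))
      return NULL;
  /* Models the step the patch adds: a vector-typed first element is
     itself checked for uniformity instead of being returned as-is.  */
  if (first->is_ctor)
    return uniform_element (first);
  return first;
}
```

For a node shaped like {{ 0 }, { 0 }}, the model returns the inner scalar 0
rather than the inner vector { 0 }.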

Bootstrapped and tested on x86_64-unknown-linux-gnu.  At this
point I'm leaning towards delaying the gimplification change
to stage1 - do you agree?

Thanks,
Richard.

2022-03-25  Richard Biener  

PR middle-end/105049
* gimplify.cc (gimplify_init_constructor): First gimplify,
then simplify the result to a VECTOR_CST.
* tree.cc (uniform_vector_p): Recurse for VECTOR_CST or
CONSTRUCTOR first elements.

* gcc/testsuite/gcc.dg/pr105049.c: New testcase.
---
 gcc/gimplify.cc | 33 -
 gcc/testsuite/gcc.dg/pr105049.c | 12 
 gcc/tree.cc |  2 ++
 3 files changed, 30 insertions(+), 17 deletions(-)
 create mode 100644 gcc/testsuite/gcc.dg/pr105049.c

diff --git a/gcc/gimplify.cc b/gcc/gimplify.cc
index f62f150fc08..a866d4e6f56 100644
--- a/gcc/gimplify.cc
+++ b/gcc/gimplify.cc
@@ -5390,6 +5390,22 @@ gimplify_init_constructor (tree *expr_p, gimple_seq *pre_p, gimple_seq *post_p,
if (notify_temp_creation)
  return GS_OK;
 
+   /* Vector types use CONSTRUCTOR all the way through gimple
+  compilation as a general initializer.  */
+   FOR_EACH_VEC_SAFE_ELT (elts, ix, ce)
+ {
+   enum gimplify_status tret;
+   tret = gimplify_expr (&ce->value, pre_p, post_p, is_gimple_val,
+ fb_rvalue);
+   if (tret == GS_ERROR)
+ ret = GS_ERROR;
+   else if (TREE_STATIC (ctor)
+&& !initializer_constant_valid_p (ce->value,
+  TREE_TYPE (ce->value)))
+ TREE_STATIC (ctor) = 0;
+ }
+   recompute_constructor_flags (ctor);
+
/* Go ahead and simplify constant constructors to VECTOR_CST.  */
if (TREE_CONSTANT (ctor))
  {
@@ -5412,25 +5428,8 @@ gimplify_init_constructor (tree *expr_p, gimple_seq *pre_p, gimple_seq *post_p,
TREE_OPERAND (*expr_p, 1) = build_vector_from_ctor (type, elts);
break;
  }
-
-   TREE_CONSTANT (ctor) = 0;
  }
 
-   /* Vector types use CONSTRUCTOR all the way through gimple
-  compilation as a general initializer.  */
-   FOR_EACH_VEC_SAFE_ELT (elts, ix, ce)
- {
-   enum gimplify_status tret;
-   tret = gimplify_expr (&ce->value, pre_p, post_p, is_gimple_val,
- fb_rvalue);
-   if (tret == GS_ERROR)
- ret = GS_ERROR;
-   else if (TREE_STATIC (ctor)
-&& !initializer_constant_valid_p (ce->value,
-  TREE_TYPE (ce->value)))
- TREE_STATIC (ctor) = 0;
- }
-   recompute_constructor_flags (ctor);
if (!is_gimple_reg (TREE_OPERAND (*expr_p, 0)))
  TREE_OPERAND (*expr_p, 1) = get_formal_tmp_var (ctor, pre_p);
   }
diff --git a/gcc/testsuite/gcc.dg/pr105049.c b/gcc/testsuite/gcc.dg/pr105049.c
new file mode 100644
index 000..b0518c6a181
--- /dev/null
+++ b/gcc/testsuite/gcc.dg/pr105049.c
@@ -0,0 +1,12 @@
+/* { dg-do compile } */
+/* { dg-options "-O -fno-tree-forwprop" } */
+
+typedef short __attribute__((__vector_size__ (sizeof(short)))) V;
+typedef short __attribute__((__vector_size__ (2*sizeof(short)))) U;
+char c;
+
+U
+foo (void)
+{
+  return __builtin_shufflevector ((V){}, (V){}, 0, 0) & c;
+}
diff --git a/gcc/tree.cc b/gcc/tree.cc
index b8017af6cfc..ec200e9a7eb 100644
--- a/gcc/tree.cc
+++ b/gcc/tree.cc
@@ -10266,6 +10266,8 @@ uniform_vector_p (const_tree vec)
   if (i != nelts)
return NULL_TREE;
 
+  if (TREE_CODE (first) == CONSTRUCTOR || TREE_CODE (first) == VECTOR_CST)
+   return uniform_vector_p (first);
   return first;
 }
 
-- 
2.34.1

Re: [PATCH][libgomp, testsuite] Scale down some OpenACC test-cases

2022-03-25 Thread Jakub Jelinek via Gcc-patches

On Fri, Mar 25, 2022 at 10:18:49AM +0100, Tom de Vries wrote:
> When a display manager is running on an nvidia card, all CUDA kernel launches
> get a 5-second watchdog timer.
> 
> Consequently, when running the libgomp testsuite with nvptx accelerator and
> GOMP_NVPTX_JIT=-O0 we run into a few FAILs like this:
> ...
> libgomp: cuStreamSynchronize error: the launch timed out and was terminated
> FAIL: libgomp.oacc-c/../libgomp.oacc-c-c++-common/parallel-dims.c \
>   -DACC_DEVICE_TYPE_nvidia=1 -DACC_MEM_SHARED=0 -foffload=nvptx-none -O0 \
>   execution test
> ...
> 
> Fix this by scaling down the failing test-cases.
> 
> Tested on x86_64-linux with nvptx accelerator.
> 
> OK for trunk?

Will defer to Thomas, as it is a purely OpenACC change.
One way to do it is
/* { dg-additional-options "-DEXPENSIVE" { target run_expensive_tests } } */
and using
#ifdef EXPENSIVE
#define N 100
#else
#define N 50
#endif
etc., that way the tests will be normally scaled down, but with
GCC_TEST_RUN_EXPENSIVE=1 in the environment one can still request
the more expensive tests.
For the Fortran test it would mean .F90 extension though...

Jakub
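
To show how such a macro would scale the loops, here is a minimal sketch.  It
assumes that EXPENSIVE is defined by the test harness through the
dg-additional-options line above; count_iterations is a hypothetical helper
that only counts iterations of the loop shape used in parallel-dims.c and is
not part of the test.

```c
/* EXPENSIVE is assumed to come from the command line (-DEXPENSIVE).  */
#ifdef EXPENSIVE
#define N 100
#else
#define N 50
#endif

/* Count the iterations of a loop running from N * ACTUAL down to just
   above -N * ACTUAL, as the loops in parallel-dims.c do.  */
static long
count_iterations (int actual)
{
  long count = 0;
  for (int i = N * actual; i > -N * actual; --i)
    ++count;
  return count;
}
```

The loop runs 2 * N * actual times, so going from N = 100 to N = 50 halves
the work of each kernel in this model.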

[PATCH][libgomp, testsuite] Scale down some OpenACC test-cases

2022-03-25 Thread Tom de Vries via Gcc-patches

Hi,

When a display manager is running on an nvidia card, all CUDA kernel launches
get a 5-second watchdog timer.

Consequently, when running the libgomp testsuite with nvptx accelerator and
GOMP_NVPTX_JIT=-O0 we run into a few FAILs like this:
...
libgomp: cuStreamSynchronize error: the launch timed out and was terminated
FAIL: libgomp.oacc-c/../libgomp.oacc-c-c++-common/parallel-dims.c \
  -DACC_DEVICE_TYPE_nvidia=1 -DACC_MEM_SHARED=0 -foffload=nvptx-none -O0 \
  execution test
...

Fix this by scaling down the failing test-cases.

Tested on x86_64-linux with nvptx accelerator.

OK for trunk?

Thanks,
- Tom

[libgomp, testsuite] Scale down some OpenACC test-cases

libgomp/ChangeLog:

2022-03-25  Tom de Vries  

PR libgomp/105042
* testsuite/libgomp.oacc-c-c++-common/parallel-dims.c: Reduce
execution time.
* testsuite/libgomp.oacc-c-c++-common/vred2d-128.c: Same.
* testsuite/libgomp.oacc-fortran/parallel-dims.f90: Same.

---
 .../libgomp.oacc-c-c++-common/parallel-dims.c  | 39 +++---
 .../libgomp.oacc-c-c++-common/vred2d-128.c |  2 +-
 .../libgomp.oacc-fortran/parallel-dims.f90 | 10 +++---
 3 files changed, 27 insertions(+), 24 deletions(-)

diff --git a/libgomp/testsuite/libgomp.oacc-c-c++-common/parallel-dims.c b/libgomp/testsuite/libgomp.oacc-c-c++-common/parallel-dims.c
index b1cfe37df8a..d9e4bd0d75f 100644
--- a/libgomp/testsuite/libgomp.oacc-c-c++-common/parallel-dims.c
+++ b/libgomp/testsuite/libgomp.oacc-c-c++-common/parallel-dims.c
@@ -49,6 +49,7 @@ static int acc_vector ()
   return __builtin_goacc_parlevel_id (GOMP_DIM_VECTOR);
 }
 
+#define N 50
 
 int main ()
 {
@@ -76,7 +77,7 @@ int main ()
 {
   /* We're actually executing with num_gangs (1).  */
   gangs_actual = 1;
-  for (int i = 100 * gangs_actual; i > -100 * gangs_actual; --i)
+  for (int i = N * gangs_actual; i > -N * gangs_actual; --i)
{
  gangs_min = gangs_max = acc_gang ();
  workers_min = workers_max = acc_worker ();
@@ -115,7 +116,7 @@ int main ()
   reduction (min: gangs_min, workers_min, vectors_min) reduction (max: gangs_max, workers_max, vectors_max)
   /* { dg-note {variable 'i' in 'private' clause isn't candidate for adjusting OpenACC privatization level: not addressable} {} { target *-*-* } l_loop_i$c_loop_i } */
   /* { dg-optimized {assigned OpenACC gang loop parallelism} {} { target *-*-* } l_loop_i$c_loop_i } */
-  for (int i = 100 * gangs_actual; i > -100 * gangs_actual; --i)
+  for (int i = N * gangs_actual; i > -N * gangs_actual; --i)
{
  gangs_min = gangs_max = acc_gang ();
  workers_min = workers_max = acc_worker ();
@@ -154,7 +155,7 @@ int main ()
   reduction (min: gangs_min, workers_min, vectors_min) reduction (max: gangs_max, workers_max, vectors_max)
   /* { dg-note {variable 'i' in 'private' clause isn't candidate for adjusting OpenACC privatization level: not addressable} {} { target *-*-* } l_loop_i$c_loop_i } */
   /* { dg-optimized {assigned OpenACC worker loop parallelism} {} { target *-*-* } l_loop_i$c_loop_i } */
-  for (int i = 100 * workers_actual; i > -100 * workers_actual; --i)
+  for (int i = N * workers_actual; i > -N * workers_actual; --i)
{
  gangs_min = gangs_max = acc_gang ();
  workers_min = workers_max = acc_worker ();
@@ -200,7 +201,7 @@ int main ()
   reduction (min: gangs_min, workers_min, vectors_min) reduction (max: gangs_max, workers_max, vectors_max)
   /* { dg-note {variable 'i' in 'private' clause isn't candidate for adjusting OpenACC privatization level: not addressable} {} { target *-*-* } l_loop_i$c_loop_i } */
   /* { dg-optimized {assigned OpenACC vector loop parallelism} {} { target *-*-* } l_loop_i$c_loop_i } */
-  for (int i = 100 * vectors_actual; i > -100 * vectors_actual; --i)
+  for (int i = N * vectors_actual; i > -N * vectors_actual; --i)
{
  gangs_min = gangs_max = acc_gang ();
  workers_min = workers_max = acc_worker ();
@@ -250,7 +251,7 @@ int main ()
}
   /* As we're executing GR not GP, don't multiply with a "gangs_actual"
 factor.  */
-  for (int i = 100 /* * gangs_actual */; i > -100 /* * gangs_actual */; --i)
+  for (int i = N /* * gangs_actual */; i > -N /* * gangs_actual */; --i)
{
  gangs_min = gangs_max = acc_gang ();
  workers_min = workers_max = acc_worker ();
@@ -291,7 +292,7 @@ int main ()
   reduction (min: gangs_min, workers_min, vectors_min) reduction (max: gangs_max, workers_max, vectors_max)
   /* { dg-note {variable 'i' in 'private' clause isn't candidate for adjusting OpenACC privatization level: not addressable} {} { target *-*-* } l_loop_i$c_loop_i } */
   /* { dg-optimized {assigned OpenACC gang loop parallelism} {} { target *-*-* } l_loop_i$c_loop_i } */
-  for (int i = 100 * gangs_actual; i > -100 * gangs_actual; --i)
+

[PATCH] fortran: Fix up initializers of param(0) PARAMETERs [PR103691]

2022-03-25 Thread Jakub Jelinek via Gcc-patches

Hi!

On the gfortran.dg/pr103691.f90 testcase the Fortran front end emits
  static real(kind=4) a[0] = {[0 ... -1]=2.0e+0};
That is an invalid RANGE_EXPR where the maximum is smaller than the minimum.

The following patch fixes that.  If TYPE_MAX_VALUE is smaller than
TYPE_MIN_VALUE, the array is empty and so doesn't need any initializer.
If the two are equal, we don't need to bother with a RANGE_EXPR and
can just use that INTEGER_CST as the index.  Finally, for two or more
values in the range, it uses a RANGE_EXPR as before.
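
To make the three cases concrete, here is a minimal standalone sketch of
the decision, using plain integers instead of INTEGER_CST trees.  The enum
and the function name are made up for illustration and do not exist in GCC.

```c
/* Hypothetical stand-in for the three outcomes described above; these
   names are illustrative only and are not part of GCC.  */
enum init_index_kind
{
  INIT_NONE,    /* max < min: empty array, emit no initializer.  */
  INIT_SINGLE,  /* min == max: use the single index directly.  */
  INIT_RANGE    /* max > min: use a [min ... max] range.  */
};

/* Classify the array domain [min, max] the same way the patch
   description does, with long integers in place of trees.  */
static enum init_index_kind
classify_domain (long min, long max)
{
  if (max < min)
    return INIT_NONE;
  if (min == max)
    return INIT_SINGLE;
  return INIT_RANGE;
}
```

With the domain from the testcase (minimum 0, maximum -1) this lands in the
first case, so no initializer element would be emitted at all.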

Bootstrapped/regtested on x86_64-linux and i686-linux, ok for trunk?

2022-03-25  Jakub Jelinek  

PR fortran/103691
* trans-array.cc (gfc_conv_array_initializer): If TYPE_MAX_VALUE is
smaller than TYPE_MIN_VALUE (i.e. empty array), throw the initializer
on the floor, if TYPE_MIN_VALUE is equal to TYPE_MAX_VALUE, use just
the TYPE_MIN_VALUE as index instead of RANGE_EXPR.

--- gcc/fortran/trans-array.cc.jj   2022-02-04 14:36:55.113603791 +0100
+++ gcc/fortran/trans-array.cc  2022-03-24 16:14:58.334498775 +0100
@@ -6267,10 +6267,17 @@ gfc_conv_array_initializer (tree type, g
   else
gfc_conv_structure (&se, expr, 1);
 
-  CONSTRUCTOR_APPEND_ELT (v, build2 (RANGE_EXPR, gfc_array_index_type,
-TYPE_MIN_VALUE (TYPE_DOMAIN (type)),
-TYPE_MAX_VALUE (TYPE_DOMAIN (type))),
- se.expr);
+  if (tree_int_cst_lt (TYPE_MAX_VALUE (TYPE_DOMAIN (type)),
+  TYPE_MIN_VALUE (TYPE_DOMAIN (type))))
+   break;
+  else if (tree_int_cst_equal (TYPE_MIN_VALUE (TYPE_DOMAIN (type)),
+  TYPE_MAX_VALUE (TYPE_DOMAIN (type))))
+   range = TYPE_MIN_VALUE (TYPE_DOMAIN (type));
+  else
+   range = build2 (RANGE_EXPR, gfc_array_index_type,
+   TYPE_MIN_VALUE (TYPE_DOMAIN (type)),
+   TYPE_MAX_VALUE (TYPE_DOMAIN (type)));
+  CONSTRUCTOR_APPEND_ELT (v, range, se.expr);
   break;
 
 case EXPR_ARRAY:

Jakub

[PATCH 3/3] RISC-V: Cache Management Operation instructions testcases

2022-03-25 Thread yulong

From: yulong-plct 

This commit adds testcases for the CMO instructions.
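
For readers unfamiliar with the generic prefetch builtin that the zicbop
tests exercise, here is a small sketch.  __builtin_prefetch is the
existing GCC builtin; the function name and the prefetch distance of 16
elements are arbitrary choices for illustration.  Judging by the
scan-assembler directives in the tests, calls with a second argument of 0
are expected to become prefetch.r and calls with 1 to become prefetch.w
when Zicbop is enabled; on other targets the builtin may expand to
nothing.

```c
#include <stddef.h>

/* Sum an array while hinting upcoming reads.  The second argument of
   __builtin_prefetch is 0 for a read and 1 for a write; the third is a
   locality hint from 0 to 3.  The distance of 16 is an arbitrary
   illustrative choice.  */
static long
sum_with_prefetch (const int *a, size_t n)
{
  long total = 0;
  for (size_t i = 0; i < n; i++)
    {
      if (i + 16 < n)
        __builtin_prefetch (&a[i + 16], 0, 3);
      total += a[i];
    }
  return total;
}
```

The prefetch is only a hint, so the result is the same with or without it.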

gcc/testsuite/ChangeLog:

* gcc.target/riscv/cmo-zicbom-1.c: New test.
* gcc.target/riscv/cmo-zicbom-2.c: New test.
* gcc.target/riscv/cmo-zicbop-1.c: New test.
* gcc.target/riscv/cmo-zicbop-2.c: New test.
* gcc.target/riscv/cmo-zicboz-1.c: New test.
* gcc.target/riscv/cmo-zicboz-2.c: New test.

---
 gcc/testsuite/gcc.target/riscv/cmo-zicbom-1.c | 21 +
 gcc/testsuite/gcc.target/riscv/cmo-zicbom-2.c | 21 +
 gcc/testsuite/gcc.target/riscv/cmo-zicbop-1.c | 23 +++
 gcc/testsuite/gcc.target/riscv/cmo-zicbop-2.c | 23 +++
 gcc/testsuite/gcc.target/riscv/cmo-zicboz-1.c |  9 
 gcc/testsuite/gcc.target/riscv/cmo-zicboz-2.c |  9 
 6 files changed, 106 insertions(+)
 create mode 100644 gcc/testsuite/gcc.target/riscv/cmo-zicbom-1.c
 create mode 100644 gcc/testsuite/gcc.target/riscv/cmo-zicbom-2.c
 create mode 100644 gcc/testsuite/gcc.target/riscv/cmo-zicbop-1.c
 create mode 100644 gcc/testsuite/gcc.target/riscv/cmo-zicbop-2.c
 create mode 100644 gcc/testsuite/gcc.target/riscv/cmo-zicboz-1.c
 create mode 100644 gcc/testsuite/gcc.target/riscv/cmo-zicboz-2.c

diff --git a/gcc/testsuite/gcc.target/riscv/cmo-zicbom-1.c b/gcc/testsuite/gcc.target/riscv/cmo-zicbom-1.c
new file mode 100644
index 000..26f980feb98
--- /dev/null
+++ b/gcc/testsuite/gcc.target/riscv/cmo-zicbom-1.c
@@ -0,0 +1,21 @@
+/* { dg-do compile } */
+/* { dg-options "-march=rv64gc_zicbom -mabi=lp64" } */
+
+int foo1()
+{
+return __builtin_riscv_zicbom_cbo_clean();
+}
+
+int foo2()
+{
+return __builtin_riscv_zicbom_cbo_flush();
+}
+
+int foo3()
+{
+return __builtin_riscv_zicbom_cbo_inval();
+}
+
+/* { dg-final { scan-assembler-times "cbo.clean" 1 } } */
+/* { dg-final { scan-assembler-times "cbo.flush" 1 } } */
+/* { dg-final { scan-assembler-times "cbo.inval" 1 } } */
\ No newline at end of file
diff --git a/gcc/testsuite/gcc.target/riscv/cmo-zicbom-2.c b/gcc/testsuite/gcc.target/riscv/cmo-zicbom-2.c
new file mode 100644
index 000..a997f22c233
--- /dev/null
+++ b/gcc/testsuite/gcc.target/riscv/cmo-zicbom-2.c
@@ -0,0 +1,21 @@
+/* { dg-do compile } */
+/* { dg-options "-march=rv32gc_zicbom -mabi=ilp32" } */
+
+int foo1()
+{
+return __builtin_riscv_zicbom_cbo_clean();
+}
+
+int foo2()
+{
+return __builtin_riscv_zicbom_cbo_flush();
+}
+
+int foo3()
+{
+return __builtin_riscv_zicbom_cbo_inval();
+}
+
+/* { dg-final { scan-assembler-times "cbo.clean" 1 } } */
+/* { dg-final { scan-assembler-times "cbo.flush" 1 } } */
+/* { dg-final { scan-assembler-times "cbo.inval" 1 } } */
\ No newline at end of file
diff --git a/gcc/testsuite/gcc.target/riscv/cmo-zicbop-1.c b/gcc/testsuite/gcc.target/riscv/cmo-zicbop-1.c
new file mode 100644
index 000..a6132d4d893
--- /dev/null
+++ b/gcc/testsuite/gcc.target/riscv/cmo-zicbop-1.c
@@ -0,0 +1,23 @@
+/* { dg-do compile target { { rv64-*-*}}} */
+/* { dg-options "-march=rv64gc_zicbop -mabi=lp64" } */
+
+void foo (char *p)
+{
+  __builtin_prefetch (p, 0, 0);
+  __builtin_prefetch (p, 0, 1);
+  __builtin_prefetch (p, 0, 2);
+  __builtin_prefetch (p, 0, 3);
+  __builtin_prefetch (p, 1, 0);
+  __builtin_prefetch (p, 1, 1);
+  __builtin_prefetch (p, 1, 2);
+  __builtin_prefetch (p, 1, 3);
+}
+
+int foo1()
+{
+  return __builtin_riscv_zicbop_cbo_prefetchi(1);
+}
+
+/* { dg-final { scan-assembler-times "prefetch.i" 1 } } */
+/* { dg-final { scan-assembler-times "prefetch.r" 4 } } */
+/* { dg-final { scan-assembler-times "prefetch.w" 4 } } */
\ No newline at end of file
diff --git a/gcc/testsuite/gcc.target/riscv/cmo-zicbop-2.c b/gcc/testsuite/gcc.target/riscv/cmo-zicbop-2.c
new file mode 100644
index 000..b88c1e42d99
--- /dev/null
+++ b/gcc/testsuite/gcc.target/riscv/cmo-zicbop-2.c
@@ -0,0 +1,23 @@
+/* { dg-do compile target { { rv32-*-*}}} */
+/* { dg-options "-march=rv32gc_zicbop -mabi=ilp32" } */
+
+void foo (char *p)
+{
+  __builtin_prefetch (p, 0, 0);
+  __builtin_prefetch (p, 0, 1);
+  __builtin_prefetch (p, 0, 2);
+  __builtin_prefetch (p, 0, 3);
+  __builtin_prefetch (p, 1, 0);
+  __builtin_prefetch (p, 1, 1);
+  __builtin_prefetch (p, 1, 2);
+  __builtin_prefetch (p, 1, 3);
+}
+
+int foo1()
+{
+  return __builtin_riscv_zicbop_cbo_prefetchi(1);
+}
+
+/* { dg-final { scan-assembler-times "prefetch.i" 1 } } */
+/* { dg-final { scan-assembler-times "prefetch.r" 4 } } */
+/* { dg-final { scan-assembler-times "prefetch.w" 4 } } */
\ No newline at end of file
diff --git a/gcc/testsuite/gcc.target/riscv/cmo-zicboz-1.c b/gcc/testsuite/gcc.target/riscv/cmo-zicboz-1.c
new file mode 100644
index 000..3f1488a21b4
--- /dev/null
+++ b/gcc/testsuite/gcc.target/riscv/cmo-zicboz-1.c
@@ -0,0 +1,9 @@
+/* { dg-do compile } */
+/* { dg-options "-march=rv64gc_zicboz -mabi=lp64" } */
+
+int foo1()
+{
+return __builtin_riscv_zicboz_cbo_zero();
+}
+
+/* { dg-final { scan-assembler-times

[PATCH 2/3] RISC-V: Cache Management Operation instructions

2022-03-25 Thread yulong

From: yulong-plct 

This commit adds the cbo.clean, cbo.flush, cbo.inval, cbo.zero, prefetch.i,
prefetch.r and prefetch.w instructions.
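
As a rough illustration of how the AVAIL entries in this patch gate each
builtin on both the extension and XLEN, here is a standalone sketch.  The
struct and the function names are hypothetical; in GCC the conditions are
written with the TARGET_ZICBOM, TARGET_ZICBOZ, TARGET_ZICBOP and
TARGET_64BIT macros.

```c
#include <stdbool.h>

/* Hypothetical model of the target state.  In GCC these are the
   TARGET_ZICBOM, TARGET_ZICBOZ, TARGET_ZICBOP and TARGET_64BIT macros.  */
struct target_state
{
  bool zicbom;
  bool zicboz;
  bool zicbop;
  bool is_64bit;
};

/* Mirrors AVAIL (clean32, TARGET_ZICBOM && !TARGET_64BIT).  */
static bool
avail_clean32 (const struct target_state *t)
{
  return t->zicbom && !t->is_64bit;
}

/* Mirrors AVAIL (clean64, TARGET_ZICBOM && TARGET_64BIT).  */
static bool
avail_clean64 (const struct target_state *t)
{
  return t->zicbom && t->is_64bit;
}

/* Mirrors AVAIL (zero64, TARGET_ZICBOZ && TARGET_64BIT).  */
static bool
avail_zero64 (const struct target_state *t)
{
  return t->zicboz && t->is_64bit;
}
```

Each 32-bit/64-bit pair is mutually exclusive, which is how the same
builtin name can be registered twice in riscv-cmo.def with different modes.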

gcc/ChangeLog:

* config/riscv/predicates.md (imm5_operand): Add a new operand type for 
prefetch instructions.
* config/riscv/riscv-builtins.cc (AVAIL): Add new AVAILs for CMO ISA 
Extensions.
(RISCV_ATYPE_SI): New.
(RISCV_ATYPE_DI): New.
* config/riscv/riscv-ftypes.def (0): New.
(1): New.
* config/riscv/riscv.md (riscv_clean_): New.
(riscv_flush_): New.
(riscv_inval_): New.
(riscv_zero_): New.
(prefetch): New.
(riscv_prefetchi_): New.
* config/riscv/riscv-cmo.def: New file.
---
 gcc/config/riscv/predicates.md |  4 +++
 gcc/config/riscv/riscv-builtins.cc | 16 +
 gcc/config/riscv/riscv-cmo.def | 17 ++
 gcc/config/riscv/riscv-ftypes.def  |  4 +++
 gcc/config/riscv/riscv.md  | 52 ++
 5 files changed, 93 insertions(+)
 create mode 100644 gcc/config/riscv/riscv-cmo.def

diff --git a/gcc/config/riscv/predicates.md b/gcc/config/riscv/predicates.md
index 97cdbdf053b..3fb4d95ab08 100644
--- a/gcc/config/riscv/predicates.md
+++ b/gcc/config/riscv/predicates.md
@@ -239,3 +239,7 @@
 (define_predicate "const63_operand"
   (and (match_code "const_int")
(match_test "INTVAL (op) == 63")))
+
+(define_predicate "imm5_operand"
+  (and (match_code "const_int")
+   (match_test "INTVAL (op) < 5")))
\ No newline at end of file
diff --git a/gcc/config/riscv/riscv-builtins.cc b/gcc/config/riscv/riscv-builtins.cc
index 0658f8d3047..795132a0c16 100644
--- a/gcc/config/riscv/riscv-builtins.cc
+++ b/gcc/config/riscv/riscv-builtins.cc
@@ -87,6 +87,18 @@ struct riscv_builtin_description {
 
 AVAIL (hard_float, TARGET_HARD_FLOAT)
 
+
+AVAIL (clean32, TARGET_ZICBOM && !TARGET_64BIT)
+AVAIL (clean64, TARGET_ZICBOM && TARGET_64BIT)
+AVAIL (flush32, TARGET_ZICBOM && !TARGET_64BIT)
+AVAIL (flush64, TARGET_ZICBOM && TARGET_64BIT)
+AVAIL (inval32, TARGET_ZICBOM && !TARGET_64BIT)
+AVAIL (inval64, TARGET_ZICBOM && TARGET_64BIT)
+AVAIL (zero32,  TARGET_ZICBOZ && !TARGET_64BIT)
+AVAIL (zero64,  TARGET_ZICBOZ && TARGET_64BIT)
+AVAIL (prefetchi32, TARGET_ZICBOP && !TARGET_64BIT)
+AVAIL (prefetchi64, TARGET_ZICBOP && TARGET_64BIT)
+
 /* Construct a riscv_builtin_description from the given arguments.
 
INSN is the name of the associated instruction pattern, without the
@@ -119,6 +131,8 @@ AVAIL (hard_float, TARGET_HARD_FLOAT)
 /* Argument types.  */
 #define RISCV_ATYPE_VOID void_type_node
 #define RISCV_ATYPE_USI unsigned_intSI_type_node
+#define RISCV_ATYPE_SI intSI_type_node
+#define RISCV_ATYPE_DI intDI_type_node
 
 /* RISCV_FTYPE_ATYPESN takes N RISCV_FTYPES-like type codes and lists
their associated RISCV_ATYPEs.  */
@@ -128,6 +142,8 @@ AVAIL (hard_float, TARGET_HARD_FLOAT)
   RISCV_ATYPE_##A, RISCV_ATYPE_##B
 
 static const struct riscv_builtin_description riscv_builtins[] = {
+  #include "riscv-cmo.def"
+
   DIRECT_BUILTIN (frflags, RISCV_USI_FTYPE, hard_float),
   DIRECT_NO_TARGET_BUILTIN (fsflags, RISCV_VOID_FTYPE_USI, hard_float)
 };
diff --git a/gcc/config/riscv/riscv-cmo.def b/gcc/config/riscv/riscv-cmo.def
new file mode 100644
index 000..01cbf6ad64f
--- /dev/null
+++ b/gcc/config/riscv/riscv-cmo.def
@@ -0,0 +1,17 @@
+// zicbom
+RISCV_BUILTIN (clean_si, "zicbom_cbo_clean", RISCV_BUILTIN_DIRECT, RISCV_SI_FTYPE, clean32),
+RISCV_BUILTIN (clean_di, "zicbom_cbo_clean", RISCV_BUILTIN_DIRECT, RISCV_DI_FTYPE, clean64),
+
+RISCV_BUILTIN (flush_si, "zicbom_cbo_flush", RISCV_BUILTIN_DIRECT, RISCV_SI_FTYPE, flush32),
+RISCV_BUILTIN (flush_di, "zicbom_cbo_flush", RISCV_BUILTIN_DIRECT, RISCV_DI_FTYPE, flush64),
+
+RISCV_BUILTIN (inval_si, "zicbom_cbo_inval", RISCV_BUILTIN_DIRECT, RISCV_SI_FTYPE, inval32),
+RISCV_BUILTIN (inval_di, "zicbom_cbo_inval", RISCV_BUILTIN_DIRECT, RISCV_DI_FTYPE, inval64),
+
+// zicboz
+RISCV_BUILTIN (zero_si, "zicboz_cbo_zero", RISCV_BUILTIN_DIRECT, RISCV_SI_FTYPE, zero32),
+RISCV_BUILTIN (zero_di, "zicboz_cbo_zero", RISCV_BUILTIN_DIRECT, RISCV_DI_FTYPE, zero64),
+
+// zicbop
+RISCV_BUILTIN (prefetchi_si, "zicbop_cbo_prefetchi", RISCV_BUILTIN_DIRECT, RISCV_SI_FTYPE_SI, prefetchi32),
+RISCV_BUILTIN (prefetchi_di, "zicbop_cbo_prefetchi", RISCV_BUILTIN_DIRECT, RISCV_DI_FTYPE_DI, prefetchi64),
\ No newline at end of file
diff --git a/gcc/config/riscv/riscv-ftypes.def b/gcc/config/riscv/riscv-ftypes.def
index 2214c496f9b..62421292ce7 100644
--- a/gcc/config/riscv/riscv-ftypes.def
+++ b/gcc/config/riscv/riscv-ftypes.def
@@ -28,3 +28,7 @@ along with GCC; see the file COPYING3.  If not see
 
 DEF_RISCV_FTYPE (0, (USI))
 DEF_RISCV_FTYPE (1, (VOID, USI))
+DEF_RISCV_FTYPE (0, (SI))
+DEF_RISCV_FTYPE (0, (DI))
+DEF_RISCV_FTYPE (1, (SI, SI))
+DEF_RISCV_FTYPE (1, (DI, DI))
diff --git a/gcc/config/riscv/riscv.md b/gcc/config/riscv/riscv.md
index b3c5bce842a..43ad6e5a481 100644
--- a/gcc/config/riscv/riscv.md
+++

[PATCH 1/3] RISC-V: Add minimal support for Zicbo[mzp]

2022-03-25 Thread yulong

From: yulong-plct 

This commit adds minimal support for the 'Zicbom', 'Zicboz' and 'Zicbop' extensions.
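
As a quick illustration of the flag scheme this patch uses, here is a
standalone sketch of the bit tests.  The mask values are the ones from the
riscv-opts.h hunk; the has_ext helper is hypothetical and stands in for
the TARGET_ZICBO* macros, which read the riscv_zicmo_subext target
variable directly.

```c
/* Mask values copied from the riscv-opts.h hunk in this patch.  */
#define MASK_ZICBOZ (1 << 0)
#define MASK_ZICBOM (1 << 1)
#define MASK_ZICBOP (1 << 2)

/* Hypothetical helper: in GCC the flags live in the riscv_zicmo_subext
   target variable; here they are passed in explicitly.  */
static int
has_ext (int subext, int mask)
{
  return (subext & mask) != 0;
}
```

For example, a -march string that enables zicbom and zicbop would set bits
1 and 2, so only the Zicboz test would come back false.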

gcc/ChangeLog:

* common/config/riscv/riscv-common.cc: Add zicbom, zicboz, zicbop 
extensions.
* config/riscv/riscv-opts.h (MASK_ZICBOZ): New.
(MASK_ZICBOM): New.
(MASK_ZICBOP): New.
(TARGET_ZICBOZ): New.
(TARGET_ZICBOM): New.
(TARGET_ZICBOP): New.
* config/riscv/riscv.opt: New.

---
 gcc/common/config/riscv/riscv-common.cc | 6 ++
 gcc/config/riscv/riscv-opts.h   | 9 +
 gcc/config/riscv/riscv.opt  | 3 +++
 3 files changed, 18 insertions(+)

diff --git a/gcc/common/config/riscv/riscv-common.cc b/gcc/common/config/riscv/riscv-common.cc
index 1501242e296..52c6ac3b1c8 100644
--- a/gcc/common/config/riscv/riscv-common.cc
+++ b/gcc/common/config/riscv/riscv-common.cc
@@ -164,6 +164,9 @@ static const struct riscv_ext_version riscv_ext_version_table[] =
   {"zksed", ISA_SPEC_CLASS_NONE, 1, 0},
   {"zksh",  ISA_SPEC_CLASS_NONE, 1, 0},
   {"zkt",   ISA_SPEC_CLASS_NONE, 1, 0},
+  {"zicboz",ISA_SPEC_CLASS_NONE, 1, 0},
+  {"zicbom",ISA_SPEC_CLASS_NONE, 1, 0},
+  {"zicbop",ISA_SPEC_CLASS_NONE, 1, 0},
 
   {"zk",ISA_SPEC_CLASS_NONE, 1, 0},
   {"zkn",   ISA_SPEC_CLASS_NONE, 1, 0},
@@ -1109,6 +1112,9 @@ static const riscv_ext_flag_table_t riscv_ext_flag_table[] =
   {"zksed",  &gcc_options::x_riscv_zk_subext, MASK_ZKSED},
   {"zksh",   &gcc_options::x_riscv_zk_subext, MASK_ZKSH},
   {"zkt",    &gcc_options::x_riscv_zk_subext, MASK_ZKT},
+  {"zicboz", &gcc_options::x_riscv_zicmo_subext, MASK_ZICBOZ},
+  {"zicbom", &gcc_options::x_riscv_zicmo_subext, MASK_ZICBOM},
+  {"zicbop", &gcc_options::x_riscv_zicmo_subext, MASK_ZICBOP},
 
   {"zve32x",   &gcc_options::x_target_flags, MASK_VECTOR},
   {"zve32f",   &gcc_options::x_target_flags, MASK_VECTOR},
diff --git a/gcc/config/riscv/riscv-opts.h b/gcc/config/riscv/riscv-opts.h
index 15bb5e76854..42a7ff698e7 100644
--- a/gcc/config/riscv/riscv-opts.h
+++ b/gcc/config/riscv/riscv-opts.h
@@ -83,6 +83,15 @@ enum stack_protector_guard {
 #define TARGET_ZBC((riscv_zb_subext & MASK_ZBC) != 0)
 #define TARGET_ZBS((riscv_zb_subext & MASK_ZBS) != 0)
 
+#define MASK_ZICBOZ   (1 << 0)
+#define MASK_ZICBOM   (1 << 1)
+#define MASK_ZICBOP   (1 << 2)
+
+
+#define TARGET_ZICBOZ ((riscv_zicmo_subext & MASK_ZICBOZ) != 0)
+#define TARGET_ZICBOM ((riscv_zicmo_subext & MASK_ZICBOM) != 0)
+#define TARGET_ZICBOP ((riscv_zicmo_subext & MASK_ZICBOP) != 0)
+
 #define MASK_ZBKB (1 << 0)
 #define MASK_ZBKC (1 << 1)
 #define MASK_ZBKX (1 << 2)
diff --git a/gcc/config/riscv/riscv.opt b/gcc/config/riscv/riscv.opt
index 492aad12324..a0722613fcc 100644
--- a/gcc/config/riscv/riscv.opt
+++ b/gcc/config/riscv/riscv.opt
@@ -200,6 +200,9 @@ int riscv_zi_subext
 TargetVariable
 int riscv_zb_subext
 
+TargetVariable
+int riscv_zicmo_subext
+
 TargetVariable
 int riscv_zk_subext
 
-- 
2.17.1

[PATCH 0/3] RISC-V: Add Ratified Cache Management Operation ISA Extensions

2022-03-25 Thread yulong

From: yulong-plct 

This patchset adds support for three recently ratified RISC-V extensions:

-   Zicbom (Cache-Block Management Instructions)
-   Zicbop (Cache-Block Prefetch hint instructions)
-   Zicboz (Cache-Block Zero Instructions)

The previous builtin names were inconsistent, so in this version we have
renamed the builtins.  For example, "__builtin_riscv_zero()" is now
"__builtin_riscv_zicboz_cbo_zero()".

Patch 1: Add Zicbom/z/p minimal support
Patch 2: Add Zicbom/z/p instructions arch support
Patch 3: Add Zicbom/z/p instructions testcases


yulong-plct (3):
  RISC-V: Add minimal support for Zicbo[mzp]
  RISC-V: Cache Management Operation instructions
  RISC-V: Cache Management Operation instructions testcases

 gcc/common/config/riscv/riscv-common.cc   |  6 +++
 gcc/config/riscv/predicates.md|  4 ++
 gcc/config/riscv/riscv-builtins.cc| 16 ++
 gcc/config/riscv/riscv-cmo.def| 17 ++
 gcc/config/riscv/riscv-ftypes.def |  4 ++
 gcc/config/riscv/riscv-opts.h |  9 
 gcc/config/riscv/riscv.md | 52 +++
 gcc/config/riscv/riscv.opt|  3 ++
 gcc/testsuite/gcc.target/riscv/cmo-zicbom-1.c | 21 
 gcc/testsuite/gcc.target/riscv/cmo-zicbom-2.c | 21 
 gcc/testsuite/gcc.target/riscv/cmo-zicbop-1.c | 23 
 gcc/testsuite/gcc.target/riscv/cmo-zicbop-2.c | 23 
 gcc/testsuite/gcc.target/riscv/cmo-zicboz-1.c |  9 
 gcc/testsuite/gcc.target/riscv/cmo-zicboz-2.c |  9 
 14 files changed, 217 insertions(+)
 create mode 100644 gcc/config/riscv/riscv-cmo.def
 create mode 100644 gcc/testsuite/gcc.target/riscv/cmo-zicbom-1.c
 create mode 100644 gcc/testsuite/gcc.target/riscv/cmo-zicbom-2.c
 create mode 100644 gcc/testsuite/gcc.target/riscv/cmo-zicbop-1.c
 create mode 100644 gcc/testsuite/gcc.target/riscv/cmo-zicbop-2.c
 create mode 100644 gcc/testsuite/gcc.target/riscv/cmo-zicboz-1.c
 create mode 100644 gcc/testsuite/gcc.target/riscv/cmo-zicboz-2.c

-- 
2.17.1
