Re: [PATCH] tree-optimization/104595 - vectorization of COND_EXPR with bool load

2022-05-04 Thread Richard Biener via Gcc-patches
On Mon, Feb 21, 2022 at 4:54 PM Richard Biener via Gcc-patches
 wrote:
>
> The following fixes an omission in bool pattern detection that
> makes it fail when check_bool_pattern fails for COND_EXPR.  That's
> not what it should do, instead it should still pattern recog
> to var != 0 even if no further adjustments to the def chain are
> necessary when var is not a mask already.
>
> Bootstrapped and tested on x86_64-unknown-linux-gnu, queued for stage1.
>
> There's another piece of the PR not yet fixed.

Re-tested on x86_64-unknown-linux-gnu, about to push if gcc.gnu.org lets me.

Richard.

> 2022-02-21  Richard Biener  
>
> PR tree-optimization/104595
> * tree-vect-patterns.c (vect_recog_bool_pattern): For
> COND_EXPR do not fail if check_bool_pattern returns false.
>
> * gcc.dg/vect/pr104595.c: New testcase.
> ---
>  gcc/testsuite/gcc.dg/vect/pr104595.c | 24 
>  gcc/tree-vect-patterns.cc| 16 
>  2 files changed, 32 insertions(+), 8 deletions(-)
>  create mode 100644 gcc/testsuite/gcc.dg/vect/pr104595.c
>
> diff --git a/gcc/testsuite/gcc.dg/vect/pr104595.c 
> b/gcc/testsuite/gcc.dg/vect/pr104595.c
> new file mode 100644
> index 000..bb7d79aa69f
> --- /dev/null
> +++ b/gcc/testsuite/gcc.dg/vect/pr104595.c
> @@ -0,0 +1,24 @@
> +/* { dg-do compile } */
> +/* { dg-require-effective-target vect_condition } */
> +
> +#define N 256
> +typedef char T;
> +extern T a[N];
> +extern T b[N];
> +extern T c[N];
> +extern _Bool pb[N];
> +extern char pc[N];
> +
> +void predicate_by_bool()
> +{
> +  for (int i = 0; i < N; i++)
> +c[i] = pb[i] ? a[i] : b[i];
> +}
> +
> +void predicate_by_char()
> +{
> +  for (int i = 0; i < N; i++)
> +c[i] = pc[i] ? a[i] : b[i];
> +}
> +
> +/* { dg-final { scan-tree-dump-times "vectorized 1 loops" 2 "vect" } } */
> diff --git a/gcc/tree-vect-patterns.cc b/gcc/tree-vect-patterns.cc
> index 217bdfd7045..8c61eb965a6 100644
> --- a/gcc/tree-vect-patterns.cc
> +++ b/gcc/tree-vect-patterns.cc
> @@ -4450,18 +4450,18 @@ vect_recog_bool_pattern (vec_info *vinfo,
>if (get_vectype_for_scalar_type (vinfo, type) == NULL_TREE)
> return NULL;
>
> -  if (!check_bool_pattern (var, vinfo, bool_stmts))
> +  if (check_bool_pattern (var, vinfo, bool_stmts))
> +   var = adjust_bool_stmts (vinfo, bool_stmts, type, stmt_vinfo);
> +  else if (integer_type_for_mask (var, vinfo))
> return NULL;
>
> -  rhs = adjust_bool_stmts (vinfo, bool_stmts, type, stmt_vinfo);
> -
>lhs = vect_recog_temp_ssa_var (TREE_TYPE (lhs), NULL);
>pattern_stmt
> - = gimple_build_assign (lhs, COND_EXPR,
> -build2 (NE_EXPR, boolean_type_node,
> -rhs, build_int_cst (type, 0)),
> -gimple_assign_rhs2 (last_stmt),
> -gimple_assign_rhs3 (last_stmt));
> +   = gimple_build_assign (lhs, COND_EXPR,
> +  build2 (NE_EXPR, boolean_type_node,
> +  var, build_int_cst (TREE_TYPE (var), 
> 0)),
> +  gimple_assign_rhs2 (last_stmt),
> +  gimple_assign_rhs3 (last_stmt));
>*type_out = vectype;
>vect_pattern_detected ("vect_recog_bool_pattern", last_stmt);
>
> --
> 2.34.1


[PATCH] libstdc++: testsuite: async.cc early timeout

2022-05-04 Thread Alexandre Oliva via Gcc-patches


The async call and future variable initialization may take a while to
complete on uniprocessors, especially if the async call and other
unrelated processes run before context switches back to the main
thread.

Taking steady_begin only then sometimes causes the 11*100ms in the
slow clock, counted from before the async call, to not be enough for
the measured wait to last 1s in the steady clock.  I've seen it fall
short of 1s by as little as a third of a tenth of a second in some
cases, but in one surprisingly extreme case the elapsed wait time got
only up to 216.7ms.

Initializing both timestamps next to each other, before the async
call, appears to avoid the problem entirely.  I've renamed the
variable moved out of the block so as to avoid name hiding in the
subsequent block, that has another steady_begin variable.

The second wait fails a lot less frequently, but the 2s limit has been
exceeded, so I'm bumping up the max sleep to ~4s, and the tolerance to
3s.


I wasn't sure about whether to leave the added outputs that I put in to
confirm the failure modes.  Please let me know in case they're
undersirable, and I'll take them out.

Regstrapped on x86_64-linux-gnu, ppc64le-linux-gnu, and also tested on
ppc- and ppc64-vx7r2.  Ok to install?


for  libstdc++-v3/ChangeLog

* testsuite/30_threads/async/async.cc (test04): Initialize
steady_start, renamed from steady_begin, next to slow_start.
Increase tolerance for final wait.
---
 libstdc++-v3/testsuite/30_threads/async/async.cc |   17 -
 1 file changed, 12 insertions(+), 5 deletions(-)

diff --git a/libstdc++-v3/testsuite/30_threads/async/async.cc 
b/libstdc++-v3/testsuite/30_threads/async/async.cc
index 38943ff1a9a5e..a36e1aee8bdef 100644
--- a/libstdc++-v3/testsuite/30_threads/async/async.cc
+++ b/libstdc++-v3/testsuite/30_threads/async/async.cc
@@ -20,6 +20,7 @@
 // with this library; see the file COPYING3.  If not see
 // .
 
+#include 
 
 #include 
 #include 
@@ -133,6 +134,7 @@ void test04()
 {
   using namespace std::chrono;
 
+  auto const steady_start = steady_clock::now();
   auto const slow_start = slow_clock::now();
   future f1 = async(launch::async, []() {
   std::this_thread::sleep_for(std::chrono::seconds(2));
@@ -140,21 +142,26 @@ void test04()
 
   // Wait for ~1s
   {
-auto const steady_begin = steady_clock::now();
 auto const status = f1.wait_until(slow_start + milliseconds(100));
 VERIFY(status == std::future_status::timeout);
-auto const elapsed = steady_clock::now() - steady_begin;
+auto const elapsed = steady_clock::now() - steady_start;
+if (elapsed < seconds(1))
+  std::cout << elapsed.count () << "ns < 1s" << std::endl;
 VERIFY(elapsed >= seconds(1));
 VERIFY(elapsed < seconds(2));
   }
 
-  // Wait for up to ~2s more
+  // Wait for up to ~4s more, but since the async sleep completes, the
+  // actual wait may be shorter than 1s.  Tolerate 3s because 2s
+  // hasn't been enough in some extreme cases.
   {
 auto const steady_begin = steady_clock::now();
-auto const status = f1.wait_until(slow_start + milliseconds(300));
+auto const status = f1.wait_until(slow_start + milliseconds(500));
 VERIFY(status == std::future_status::ready);
 auto const elapsed = steady_clock::now() - steady_begin;
-VERIFY(elapsed < seconds(2));
+if (elapsed >= seconds(3))
+  std::cout << elapsed.count () << "ns > 2s" << std::endl;
+VERIFY(elapsed < seconds(3));
   }
 }
 


-- 
Alexandre Oliva, happy hackerhttps://FSFLA.org/blogs/lxo/
   Free Software Activist   GNU Toolchain Engineer
Disinformation flourishes because many people care deeply about injustice
but very few check the facts.  Ask me about 


Re: [PATCH] gimple-isel: handle x CMP y ? z : 0

2022-05-04 Thread Richard Biener via Gcc-patches
On Wed, 4 May 2022, Richard Earnshaw wrote:

> 
> 
> On 04/05/2022 12:14, Richard Biener wrote:
> > On Wed, May 4, 2022 at 12:16 PM Richard Earnshaw via Gcc-patches
> >  wrote:
> >>
> >>
> >> Gimple isel already handles x CMP y ? -1 : 0 when lowering vector cond
> >> operations, but this can be generalized further when the comparison
> >> forms a natural mask so that we can also handle x CMP y ? z : 0 by
> >> transforming it into (x CMP y) & z.  This will, in most cases save
> >> having to load a register with the zero vector.
> > 
> > Hmm, I'm not sure why exactly the existing case is in ISEL but if
> > we want to extend it there we migh also want to handle ? 0 : -1
> > as BIT_NOT.
> 
> I was assuming that there had already been some level of canonicalization at
> this point, but maybe that's an incorrect assumption.  There are quite a few
> transforms that would avoid a select operation, but at some point the bit
> manipulations may well become more expensive than the select itself.
> 
> > 
> > For the case in question it looks like it might better fit as optimization
> > in match.pd?  It also lacks a query for present and3 support.
> > For match.pd it might be nice to handle x CMP y ? 0 : z as well
> > if we can invert x CMP y (and it has only a single use).  When in
> > ISEL we might as well check for andn3 support and go with
> > BIT_NOT + BIT_AND ...
> 
> I'll have to think about that a bit, the approach was inspired by the hunk
> just a bit earlier that starts:
> 
>   /* Lower mask typed, non-vector mode VEC_COND_EXPRs to bitwise operations.
>  Those can end up generated by folding and at least for integer mode masks
>  we cannot expect vcond expanders to exist.  We lower a ? b : c
>  to (b & a) | (c & ~a).  */
>   if (VECTOR_BOOLEAN_TYPE_P (TREE_TYPE (lhs))
>   && !VECTOR_MODE_P (mode))
> {
>   gcc_assert (types_compatible_p (TREE_TYPE (op0), TREE_TYPE (op1)));
>   gimple_seq stmts = NULL;
>   tree type = TREE_TYPE (lhs);
>   location_t loc = gimple_location (stmt);
>   tree tem0 = gimple_build (&stmts, loc, BIT_AND_EXPR, type, op1, op0);
> 
> There's no test for and3 support there, but I guess that's working on
> integral modes rather than vectors.

Yes, that's for word_mode or AVX512 mask modes (QImode or HImode).

> > 
> > Do you have a testcase where it improves code generation btw?
> 
> It was originally inspired by:
> 
> void f (int * __restrict__ dest, int *a, int *b)
> {
>   int i;
>   for (i=0; i<4; i++) {
> dest[i] = a[i] == b[i];
>   }
> }
> 
> which with Neon on Arm was producing a vsel sequence rather than a mask
> operation because the backend doesn't handle this case directly during expand
> (unlike some other targets); but it seemed wrong for every target to have to
> handle this in the back-end when it's a pretty generic optimization.

But of course whether the AND or the VSEL is more efficient depends
on the target so in match.pd it would be canonicalization while in
ISEL (aka instruction selection) we should look at what the target
prefers.  ISEL is supposed to become a place where we do all the
non-trivial RTL expansion tricks.

The existing trick just drops an operation which should be always
profitable, evading the problem of deciding.  If we add things
here we should either reason that by all means profitability
is guaranteed (with a comment) or somehow query the target
(I'm not sure how - how does RTL expansion do it?  IIRC that
might also just pick the first "working" expansion?)

I understand your target can do the vcond as well, but for
the testcases it seems we have used_vec_cond_exprs == 1
and thus the vcond would also do the compare and thus to me
it looks like the single .VCOND is going to be cheaper
than a compare to produce the mask and the then AND?

If the VSEL doesn't actually perform the comparison then I think
you shouldn't have vcond{[u],eq} patterns but just comparison
patterns and vcond_mask?  For that I'd then say the target
could choose to expand vcond_mask with same mode (not
mixed FP/INT modes) as AND if profitable?

Richard.

> A more generalized case would be something like
> 
> void f (int * __restrict__ dest, int *a, int *b)
> {
>   int i;
>   for (i=0; i<4; i++) {
> dest[i] = a[i] ? b[i] : 0;
>   }
> }
> 
> R.
> 
> > 
> > Richard.
> > 
> >>gcc/ChangeLog:
> >> * gimple-isel.cc (gimple_expand_vec_cond_expr): Handle
> >> x CMP y ? z : 0.
> >>---
> >>  gcc/gimple-isel.cc | 29 +++--
> >>  1 file changed, 23 insertions(+), 6 deletions(-)
> >>
> 

-- 
Richard Biener 
SUSE Software Solutions Germany GmbH, Maxfeldstrasse 5, 90409 Nuernberg,
Germany; GF: Ivo Totev; HRB 36809 (AG Nuernberg)


[PATCH] [PR100106] Reject unaligned subregs when strict alignment is required

2022-05-04 Thread Alexandre Oliva via Gcc-patches


The testcase for pr100106, compiled with optimization for 32-bit
powerpc -mcpu=604 with -mstrict-align expands the initialization of a
union from a float _Complex value into a load from an SCmode
constant pool entry, aligned to 4 bytes, into a DImode pseudo,
requiring 8-byte alignment.

The patch that introduced the testcase modified simplify_subreg to
avoid changing the MEM to outermode, but simplify_gen_subreg still
creates a SUBREG or a MEM that would require stricter alignment than
MEM's, and lra_constraints appears to get confused by that, repeatedly
creating unsatisfiable reloads for the SUBREG until it exceeds the
insn count.

Avoiding the unaligned SUBREG, expand splits the DImode dest into
SUBREGs and loads each SImode word of the constant pool with the
proper alignment.


At the time of posting this patch, it occurred to me that maybe the test
should allow paradoxical subregs of mems, or even that non-paradoxical
subregs of mems should be allowed to change to a mode with stricter
alignment, and the register allocator should deal with that somehow.
WDYT?


Regstrapped on x86_64-linux-gnu and ppc64le-linux-gnu, also tested
targeting ppc- and ppc64-vx7r2.  Ok to install?


for  gcc/ChangeLog

PR target/100106
* emit-rtl.c (validate_subreg): Reject a SUBREG of a MEM that
requires stricter alignment than MEM's.

for  gcc/testsuite/ChangeLog

PR target/100106
* gcc.target/powerpc/pr100106-sa.c: New.
---
 gcc/emit-rtl.cc|3 +++
 gcc/testsuite/gcc.target/powerpc/pr100106-sa.c |4 
 2 files changed, 7 insertions(+)
 create mode 100644 gcc/testsuite/gcc.target/powerpc/pr100106-sa.c

diff --git a/gcc/emit-rtl.cc b/gcc/emit-rtl.cc
index 1e02ae254d012..642e47eada0d7 100644
--- a/gcc/emit-rtl.cc
+++ b/gcc/emit-rtl.cc
@@ -982,6 +982,9 @@ validate_subreg (machine_mode omode, machine_mode imode,
 
   return subreg_offset_representable_p (regno, imode, offset, omode);
 }
+  else if (reg && MEM_P (reg)
+  && STRICT_ALIGNMENT && MEM_ALIGN (reg) < GET_MODE_ALIGNMENT (omode))
+return false;
 
   /* The outer size must be ordered wrt the register size, otherwise
  we wouldn't know at compile time how many registers the outer
diff --git a/gcc/testsuite/gcc.target/powerpc/pr100106-sa.c 
b/gcc/testsuite/gcc.target/powerpc/pr100106-sa.c
new file mode 100644
index 0..6cc29595c8b25
--- /dev/null
+++ b/gcc/testsuite/gcc.target/powerpc/pr100106-sa.c
@@ -0,0 +1,4 @@
+/* { dg-do compile { target { ilp32 } } } */
+/* { dg-options "-mcpu=604 -O -mstrict-align" } */
+
+#include "../../gcc.c-torture/compile/pr100106.c"


-- 
Alexandre Oliva, happy hackerhttps://FSFLA.org/blogs/lxo/
   Free Software Activist   GNU Toolchain Engineer
Disinformation flourishes because many people care deeply about injustice
but very few check the facts.  Ask me about 


Re: [PATCH] lto-plugin: add support for feature detection

2022-05-04 Thread Richard Biener via Gcc-patches
On Thu, May 5, 2022 at 8:15 AM Richard Biener
 wrote:
>
> On Wed, May 4, 2022 at 3:31 PM Martin Liška  wrote:
> >
> > On 5/4/22 15:10, Alexander Monakov wrote:
> > > On Wed, 4 May 2022, Martin Liška wrote:
> > >
> > >> On 5/4/22 14:32, Alexander Monakov wrote:
> > >>> On Wed, 4 May 2022, Martin Liška wrote:
> > >>>
> >  The patch is a follow-up of the discussion we've got in:
> >  https://gcc.gnu.org/pipermail/gcc-patches/2022-May/593901.html
> > 
> >  Mold linker would appreciate knowing in advance if get_symbols_v3 is 
> >  supported
> >  by a GCC plug-in or not.
> > >
> > > Out of curiousity, I looked at mold, and I don't understand what problem 
> > > this
> > > detection is solving, nor why this is the best way to solve that. Was 
> > > there
> > > some discussion with mold author I should check out?
> >
> > Sure, please take a look at this issue:
> > https://github.com/rui314/mold/issues/454#issuecomment-1116849458
> >
> > >
> > > Note that mold takes this not only as 'v3 API is supported', but, more
> > > importantly, as 'v2 entrypoint will not be called'.
> >
> > Yes, if they register get_symbols_v3, then it will be called. That's how the
> > plug-in works.
>
> I think they should simply try to not register LDPT_GET_SYMBOLS or
> LDPT_GET_SYMBOLS_V2 with the plugin in the onload hook and if
> that fails they will know the plugin doesn't support V3 only.  I suppose
> it should work to call onload() multiple times (when only increasing the
> set of supported functions) until it returns LDPS_OK without intermediately
> dlclosing it (maybe call cleanup_handler inbertween).  This should work
> for old plugin versions.
>
> That said, a better API extension compared to adding some random
> symbol like you propose is to enhance the return value from onload(),
> maybe returning an alternate transfer vector specifying symbol entries
> that will not be used (or return a transfer vector that will be used).
> We've been mostly versioning the symbol related hooks here.
>
> That said, I do not like at all this proposed add of a special symbol
> to flag exclusive v3 use.  That's a hack and not extensible at all.

Speaking of which, onload_v2 would be in order and should possibly
return some instantiation handle of the plugin that the linker could
instruct the plugin to dispose (reset ()?).  I see the GCC implementation
of the plugin just has a single global state and it doesn't seem that it's
prepared for multiple onload() calls (but it might work by accident if
you never remove things from the support vector but only add).

Without revamping the whole API onload_v2 could set the current
global state for the plugin based on the transfer vector and the reset()
API would discard the state (might also be redundant and implicitely
performed by the next onload_v2 call).

onload_v2 could then also have an "output" transfer vector where the
plugin simply copies the entries it picked and dropped those it will
never call.  We'd document the plugin may only pick _one_ of the versioned
API variants.

That said, the heuristic outlined above might still work with the present
onload() API and existing implementations.

Richard.

> Richard.
>
> > Martin
> >
> > >
> > > Alexander
> >


Re: [PATCH] lto-plugin: add support for feature detection

2022-05-04 Thread Richard Biener via Gcc-patches
On Wed, May 4, 2022 at 3:31 PM Martin Liška  wrote:
>
> On 5/4/22 15:10, Alexander Monakov wrote:
> > On Wed, 4 May 2022, Martin Liška wrote:
> >
> >> On 5/4/22 14:32, Alexander Monakov wrote:
> >>> On Wed, 4 May 2022, Martin Liška wrote:
> >>>
>  The patch is a follow-up of the discussion we've got in:
>  https://gcc.gnu.org/pipermail/gcc-patches/2022-May/593901.html
> 
>  Mold linker would appreciate knowing in advance if get_symbols_v3 is 
>  supported
>  by a GCC plug-in or not.
> >
> > Out of curiousity, I looked at mold, and I don't understand what problem 
> > this
> > detection is solving, nor why this is the best way to solve that. Was there
> > some discussion with mold author I should check out?
>
> Sure, please take a look at this issue:
> https://github.com/rui314/mold/issues/454#issuecomment-1116849458
>
> >
> > Note that mold takes this not only as 'v3 API is supported', but, more
> > importantly, as 'v2 entrypoint will not be called'.
>
> Yes, if they register get_symbols_v3, then it will be called. That's how the
> plug-in works.

I think they should simply try to not register LDPT_GET_SYMBOLS or
LDPT_GET_SYMBOLS_V2 with the plugin in the onload hook and if
that fails they will know the plugin doesn't support V3 only.  I suppose
it should work to call onload() multiple times (when only increasing the
set of supported functions) until it returns LDPS_OK without intermediately
dlclosing it (maybe call cleanup_handler inbertween).  This should work
for old plugin versions.

That said, a better API extension compared to adding some random
symbol like you propose is to enhance the return value from onload(),
maybe returning an alternate transfer vector specifying symbol entries
that will not be used (or return a transfer vector that will be used).
We've been mostly versioning the symbol related hooks here.

That said, I do not like at all this proposed add of a special symbol
to flag exclusive v3 use.  That's a hack and not extensible at all.

Richard.

> Martin
>
> >
> > Alexander
>


[committed] MAINTAINERS: Add myself as PowerPC port co-maintainer

2022-05-04 Thread Kewen.Lin via Gcc-patches
Hi,

Add myself as PowerPC port co-maintainer and to DCO section.  Pushed below as 
r13-127.

ChangeLog:

* MAINTAINERS: Add myself as PowerPC port co-maintainer and to DCO
section.
---
 MAINTAINERS | 4 +++-
 1 file changed, 3 insertions(+), 1 deletion(-)

diff --git a/MAINTAINERS b/MAINTAINERS
index 15973503722..87708942d39 100644
--- a/MAINTAINERS
+++ b/MAINTAINERS
@@ -112,6 +112,7 @@ riscv port  Andrew Waterman 

 riscv port Jim Wilson  
 rs6000/powerpc portDavid Edelsohn  
 rs6000/powerpc portSegher Boessenkool  
+rs6000/powerpc portKewen Lin   
 rs6000 vector extnsAldy Hernandez  
 rx portNick Clifton
 s390 port  Ulrich Weigand  
@@ -506,7 +507,6 @@ Ilya Leoshkevich

 Kriang Lerdsuwanakij   
 Renlin Li  
 Xinliang David Li  
-Kewen Lin  
 Chen Liqin 
 Jiangning Liu  
 Sa Liu 
@@ -717,6 +717,8 @@ information.
 Matthias Kretz 
 Jeff Law   
 Jeff Law   
+Kewen Lin  
+Kewen Lin  
 Gaius Mulley   
 Siddhesh Poyarekar 
 Navid Rahimi   
--
2.25.1


[PATCH] Strip of a vector load which is only used partially.

2022-05-04 Thread liuhongt via Gcc-patches
Optimize

  _1 = *srcp_3(D);
  _4 = VEC_PERM_EXPR <_1, _1, { 4, 5, 6, 7, 4, 5, 6, 7 }>;
  _5 = BIT_FIELD_REF <_4, 128, 0>;

to

  _1 = *srcp_3(D);
  _5 = BIT_FIELD_REF <_1, 128, 128>;

the upper will finally be optimized to

_5 = BIT_FIELD_REF <*srcp_3(D), 128, 128>;

Bootstrapped and regtested on x86_64-pc-linux-gnu{m32,}.
Ok for trunk?

gcc/ChangeLog:

PR tree-optimization/102583
* gimple.h (gate_optimize_vector_load): Declare.
* match.pd: Simplify (BIT_FIELD_REF (vec_perm *p *p { 4, 5, 6,
7, 4, 5, 6, 7 }) 128 0) to (BIT_FIELD_REF *p 128 128).
* tree-ssa-forwprop.cc (gate_optimize_vector_load): New
function.
(pass_forwprop::execute): Put condition codes in the upper new
function.

gcc/testsuite/ChangeLog:

* gcc.target/i386/pr102583.c: New test.
---
 gcc/gimple.h |  1 +
 gcc/match.pd | 56 
 gcc/testsuite/gcc.target/i386/pr102583.c | 30 +
 gcc/tree-ssa-forwprop.cc | 32 +-
 4 files changed, 109 insertions(+), 10 deletions(-)
 create mode 100644 gcc/testsuite/gcc.target/i386/pr102583.c

diff --git a/gcc/gimple.h b/gcc/gimple.h
index 6b1e89ad74e..1747dae1193 100644
--- a/gcc/gimple.h
+++ b/gcc/gimple.h
@@ -1638,6 +1638,7 @@ extern void maybe_remove_unused_call_args (struct 
function *, gimple *);
 extern bool gimple_inexpensive_call_p (gcall *);
 extern bool stmt_can_terminate_bb_p (gimple *);
 extern location_t gimple_or_expr_nonartificial_location (gimple *, tree);
+extern bool gate_optimize_vector_load (gimple *);
 
 /* Return the disposition for a warning (or all warnings by default)
for a statement.  */
diff --git a/gcc/match.pd b/gcc/match.pd
index 6d691d302b3..ac214310251 100644
--- a/gcc/match.pd
+++ b/gcc/match.pd
@@ -6832,6 +6832,62 @@ DEFINE_INT_AND_FLOAT_ROUND_FN (RINT)
}
(cmp @0 { res; })
 
+#if GIMPLE
+/* Simplify partail vector access, transform
+
+   V8SI A;
+   V4SI B;
+   A = *PA;
+   B = VEC_PERM_EXPR (A, A, { 4, 5, 6, 7, 4, 5, 6, 7 });
+   C = BIT_FIELD_REF (B, 128, 0)
+
+to
+
+   A = *PA;
+   C = BIT_FIELD_REF (B, 128, 128);
+
+optimize_vector_load will eventually optimize the upper to
+
+   C = BIT_FIELD_REF (*PA, 128, 128);  */
+
+(simplify
+ (BIT_FIELD_REF (vec_perm@2 SSA_NAME@0 @0 VECTOR_CST@1) @rsize @rpos)
+ (if (VECTOR_TYPE_P (type)
+ && TYPE_MODE (type) != BLKmode
+ && single_use (@2)
+ && gate_optimize_vector_load (SSA_NAME_DEF_STMT (@0))
+ && types_match (TREE_TYPE (type), TREE_TYPE (TREE_TYPE (@0
+  (with
+   {
+ unsigned HOST_WIDE_INT nelts = -1;
+ if (!VECTOR_CST_NELTS (@1).is_constant (&nelts))
+   return NULL_TREE;
+ tree inner_type = TREE_TYPE (type);
+ unsigned HOST_WIDE_INT elt_w = tree_to_uhwi (TYPE_SIZE (inner_type));
+ unsigned HOST_WIDE_INT pos = tree_to_uhwi (@rpos);
+ unsigned HOST_WIDE_INT size = tree_to_uhwi (@rsize);
+ unsigned HOST_WIDE_INT start
+   = tree_to_uhwi (vector_cst_elt (@1, pos / elt_w));
+
+ for (unsigned HOST_WIDE_INT i  = pos / elt_w + 1; i != size / elt_w; i++)
+   {
+/* Continuous area.  */
+if (tree_to_uhwi (vector_cst_elt (@1, i)) - 1
+!= tree_to_uhwi (vector_cst_elt (@1, i - 1)))
+  return NULL_TREE;
+   }
+
+ /* Aligned or support movmisalign_optab.  */
+ unsigned HOST_WIDE_INT dest_align = tree_to_uhwi (TYPE_SIZE (type));
+ if ((TYPE_ALIGN (TREE_TYPE (@0)) % dest_align
+ || start * elt_w % dest_align)
+   && (optab_handler (movmisalign_optab, TYPE_MODE (type))
+   == CODE_FOR_nothing))
+   return NULL_TREE;
+   }
+   (BIT_FIELD_REF @0 @rsize { bitsize_int (start * elt_w); }
+#endif
+
 /* Canonicalizations of BIT_FIELD_REFs.  */
 
 (simplify
diff --git a/gcc/testsuite/gcc.target/i386/pr102583.c 
b/gcc/testsuite/gcc.target/i386/pr102583.c
new file mode 100644
index 000..ff2ffb5e671
--- /dev/null
+++ b/gcc/testsuite/gcc.target/i386/pr102583.c
@@ -0,0 +1,30 @@
+/* { dg-do compile } */
+/* { dg-options "-mavx512f -O2" } */
+/* { dg-final { scan-assembler-times {(?n)vcvtdq2ps[ \t]+20\(%.*%ymm} 1 } } */
+/* { dg-final { scan-assembler-times {(?n)vcvtdq2ps[ \t]+8\(%.*%xmm} 1 } } */
+/* { dg-final { scan-assembler-times {(?n)vmovq[ \t]+16\(%.*%xmm} 1 { target { 
! ia32 } } } } */
+/* { dg-final { scan-assembler-not {(?n)vpermd[ \t]+.*%zmm} } } */
+
+typedef int v16si __attribute__((vector_size(64)));
+typedef float v8sf __attribute__((vector_size(32)));
+typedef float v4sf __attribute__((vector_size(16)));
+typedef float v2sf __attribute__((vector_size(8)));
+
+v8sf part (v16si *srcp)
+{
+  v16si src = *srcp;
+  return (v8sf) { (float)src[5], (float)src[6], (float)src[7], (float)src[8],
+  (float)src[9], (float)src[10], (float)src[11], (float)src[12] };
+}
+
+v4sf part1 (v16si *srcp)
+{
+  v16si src = *srcp;
+  return (v4sf) { (float)src[2], (float)src[3], (float)src[4], (float)

Re: [PATCH] c++: wrong error with MVP and pushdecl [PR64679]

2022-05-04 Thread Jason Merrill via Gcc-patches

On 5/4/22 19:20, Marek Polacek wrote:

On Wed, May 04, 2022 at 05:44:45PM -0400, Jason Merrill wrote:

On 5/4/22 16:03, Marek Polacek wrote:

This patch fixes the second half of 64679.  Here we issue a wrong
"redefinition of 'int x'" for the following:

struct Bar {
  Bar(int, int, int);
};

int x = 1;
Bar bar(int(x), int(x), int{x}); // #1

cp_parser_parameter_declaration_list does pushdecl every time it sees
a named parameter, so the second "int(x)" causes the error.  That's
premature, since this turns out to be a constructor call after the
third argument!

If the first parameter is parenthesized, we can't push until we've
established we're looking at a function declaration.  Therefore this
could be fixed by some kind of lookahead.  I thought about introducing
a lightweight variant of cp_parser_parameter_declaration_list that would
not have any side effects and would return as soon as it figures out
whether it's looking at a declaration or expression.  Since that would
require fairly nontrivial changes, I wanted something simpler.  Something
like delaying the pushdecl until we've reached the ')' following the
parameter-declaration-clause.  But that doesn't quite cut it: we must
have pushed the parameters before processing a default argument, as in:

Bar bar(int(a), int(b), int c = sizeof(a));  // valid


I wondered how this would affect

   void f(int (i), decltype(i) j = 42);

interestingly, clang and EDG both reject this, but they accept

   void f(int (i), bool b = true, decltype(i) j = 42);

which suggests a similar implementation strategy.  MSVC accepts both.


Sigh, C++.

So the former would be rejected with this patch because decltype(i)
is parsed as the declspec.  And I can't play any cute games with decltype
because a decltype doesn't necessarily mean a parameter:

struct F {
   F(int, int);
};

void
g ()
{
   int x = 42;

   F v1(int(x), decltype(x)(42));

   F f1(int(i), decltype(i) j = 42);
   F f2(int(i), decltype(i) j);
   F f3(int(i), decltype(i)(j));
   F f4(int(i), decltype(i)(j) = 42);
   F f5(int (i), bool b = true, decltype(i) j = 42);
   F f6(int(i), decltype(x)(x));
}


But I think there's a way out: we could pushdecl the parameters as we
go and only stash when there would be a clash, if parsing tentatively.
And then push the pending parameters only at the end of the clause, solely
to get the redefinition/redeclaration error.  That is:

   Bar b(int(x), int(x), int{x});

would mean:
push x
store x
it's not a decl -> discard it, parse as an expression

   Bar b(int(x), int(x), int);

would mean:
push x
store x
it's a decl -> push pending parameters -> error

And then I don't need to push when about to commit, avoiding the need to
change cp_parser_parameter_declaration.  WDYT?


Sounds good to me.

Jason



Re: [PATCH] c++: wrong error with MVP and pushdecl [PR64679]

2022-05-04 Thread Marek Polacek via Gcc-patches
On Wed, May 04, 2022 at 05:44:45PM -0400, Jason Merrill wrote:
> On 5/4/22 16:03, Marek Polacek wrote:
> > This patch fixes the second half of 64679.  Here we issue a wrong
> > "redefinition of 'int x'" for the following:
> > 
> >struct Bar {
> >  Bar(int, int, int);
> >};
> > 
> >int x = 1;
> >Bar bar(int(x), int(x), int{x}); // #1
> > 
> > cp_parser_parameter_declaration_list does pushdecl every time it sees
> > a named parameter, so the second "int(x)" causes the error.  That's
> > premature, since this turns out to be a constructor call after the
> > third argument!
> > 
> > If the first parameter is parenthesized, we can't push until we've
> > established we're looking at a function declaration.  Therefore this
> > could be fixed by some kind of lookahead.  I thought about introducing
> > a lightweight variant of cp_parser_parameter_declaration_list that would
> > not have any side effects and would return as soon as it figures out
> > whether it's looking at a declaration or expression.  Since that would
> > require fairly nontrivial changes, I wanted something simpler.  Something
> > like delaying the pushdecl until we've reached the ')' following the
> > parameter-declaration-clause.  But that doesn't quite cut it: we must
> > have pushed the parameters before processing a default argument, as in:
> > 
> >Bar bar(int(a), int(b), int c = sizeof(a));  // valid
> 
> I wondered how this would affect
> 
>   void f(int (i), decltype(i) j = 42);
> 
> interestingly, clang and EDG both reject this, but they accept
> 
>   void f(int (i), bool b = true, decltype(i) j = 42);
> 
> which suggests a similar implementation strategy.  MSVC accepts both.

Sigh, C++.

So the former would be rejected with this patch because decltype(i)
is parsed as the declspec.  And I can't play any cute games with decltype
because a decltype doesn't necessarily mean a parameter:

struct F {
  F(int, int);
};

void
g ()
{
  int x = 42;

  F v1(int(x), decltype(x)(42));

  F f1(int(i), decltype(i) j = 42);
  F f2(int(i), decltype(i) j);
  F f3(int(i), decltype(i)(j));
  F f4(int(i), decltype(i)(j) = 42);
  F f5(int (i), bool b = true, decltype(i) j = 42);
  F f6(int(i), decltype(x)(x));
}


But I think there's a way out: we could pushdecl the parameters as we
go and only stash when there would be a clash, if parsing tentatively.
And then push the pending parameters only at the end of the clause, solely
to get the redefinition/redeclaration error.  That is:

  Bar b(int(x), int(x), int{x});

would mean:
push x
store x
it's not a decl -> discard it, parse as an expression

  Bar b(int(x), int(x), int);

would mean:
push x
store x
it's a decl -> push pending parameters -> error

And then I don't need to push when about to commit, avoiding the need to
change cp_parser_parameter_declaration.  WDYT?

Marek



[PATCH] libsanitizer: cherry-pick commit f52e365092aa from upstream

2022-05-04 Thread H.J. Lu via Gcc-patches
On Wed, May 4, 2022 at 1:59 AM Martin Liška  wrote:
>
> Hello.
>
> I'm going to do merge from upstream.
>
> Patch can bootstrap on x86_64-linux-gnu and survives regression tests. I've 
> also tested
> on ppc64le-linux-gnu and verified the ABI.
>
> The only real change is a small change in
> gcc/testsuite/c-c++-common/asan/alloca_loop_unpoisoning.c
> where we need --param=asan-use-after-return=0.
>
> I'm going to push the patches.

Hi,

I am checking in this patch to cherry-pick

f52e365092aa [sanitizer] Use newfstatat for x32

to restore x32 build.

-- 
H.J.
From ae90c2d0f9bcc30af98c730f91544efa01cb897c Mon Sep 17 00:00:00 2001
From: "H.J. Lu" 
Date: Wed, 4 May 2022 15:59:49 -0700
Subject: [PATCH] libsanitizer: cherry-pick commit f52e365092aa from upstream

cherry-pick:

f52e365092aa [sanitizer] Use newfstatat for x32
---
 libsanitizer/sanitizer_common/sanitizer_linux.cpp | 4 ++--
 1 file changed, 2 insertions(+), 2 deletions(-)

diff --git a/libsanitizer/sanitizer_common/sanitizer_linux.cpp b/libsanitizer/sanitizer_common/sanitizer_linux.cpp
index 8e144a4e9a0..e2c32d679ad 100644
--- a/libsanitizer/sanitizer_common/sanitizer_linux.cpp
+++ b/libsanitizer/sanitizer_common/sanitizer_linux.cpp
@@ -343,7 +343,7 @@ uptr internal_stat(const char *path, void *buf) {
 #if SANITIZER_FREEBSD
   return internal_syscall(SYSCALL(fstatat), AT_FDCWD, (uptr)path, (uptr)buf, 0);
 #elif SANITIZER_LINUX
-#  if SANITIZER_WORDSIZE == 64
+#  if SANITIZER_WORDSIZE == 64 || SANITIZER_X32
   return internal_syscall(SYSCALL(newfstatat), AT_FDCWD, (uptr)path, (uptr)buf,
   0);
 #  else
@@ -366,7 +366,7 @@ uptr internal_lstat(const char *path, void *buf) {
   return internal_syscall(SYSCALL(fstatat), AT_FDCWD, (uptr)path, (uptr)buf,
   AT_SYMLINK_NOFOLLOW);
 #elif SANITIZER_LINUX
-#  if defined(_LP64)
+#  if defined(_LP64) || SANITIZER_X32
   return internal_syscall(SYSCALL(newfstatat), AT_FDCWD, (uptr)path, (uptr)buf,
   AT_SYMLINK_NOFOLLOW);
 #  else
-- 
2.35.1



Re: [PATCH] libstdc++: fix pointer type exception catch [PR105387]

2022-05-04 Thread Jonathan Wakely via Gcc-patches
On Wed, 4 May 2022 at 12:14, Jonathan Wakely  wrote:
>
> On Tue, 3 May 2022 at 11:57, Jakob Hasse via Libstdc++
>  wrote:
> >
> > This is a patch for the bug 105387 reported in bugzilla: 105387 
> > https://gcc.gnu.org/bugzilla/show_bug.cgi?id=105387. This report should 
> > contain all the necessary information about the issue. But the patch there 
> > was just a preliminary one.
> > I created a proper patch with the fix and a test case, based on TAG 
> > releases/gcc-11.2.0 (7ca388565af176bd4efd4f8db1e5e9e11e98ef45). The 
> > changelog is part of the commit message in the patch.
>
> Thanks for the patch!
>
> Some boring administrative comments:
>
> In the summary line of the git commit message:
> - The component should be libstdc++ not libstdc++-v3.
> - The PR number should include "PR" i.e. [PR105387].
> Your email Subject: gets this right, but the patch doesn't.
> This is (not very well) documented at
> https://gcc.gnu.org/contribute.html#patches
>
> Your new testcase has a FSF copyright notice. Unless you have already
> completed the paperwork to assign your work (either for just this
> change, or this and all future changes) to the FSF then you need to
> either do that legal paperwork, or alternatively contribute under the
> DCO terms without assigning copyright. See
> https://gcc.gnu.org/contribute.html#legal
> For simplicity and to expedite the process, I suggest just removing
> the copyright notice and license notice from the testcase, and adding
> a Signed-off-by: tag to the commit message, as per
> https://gcc.gnu.org/dco.html
>
> As for the actual code ...

The commit msg says:

"In case RTTI is enabled, this does not seem to be a problem because
RTTI-based checks would run before and prevent running into the bad
down-cast. However, since the check is very simple and I'm not 100% sure
about the RTTI-case, it has been left for both cases (RTTI and no-RTTI)."

I think we don't want to do the redundant __is_pointer_p() call for
the RTTI-enabled case. For that case we know it's a pointer, or we'd
have returned early, so making an extra virtual call is just
unnecessary overhead. So I think your new check should be done in an
#else branch of the existing #if __cpp_rtti block.

More importantly, I think your patch breaks this case, which currently
works with or without RTTI:

struct X { int i; };
try {  throw &X::i; }
catch (const int X::*) { }

For this case __is_pointer_p() is false, but the cast to
__pbase_type_info is safe, and required for correctness. I think with
your patch the exception won't be caught.

Maybe that's a sacrifice we have to make in order to avoid undefined
behaviour for your original example, but it's unfortunate if that's
the case.


[pushed] c++: alias CTAD refactoring [PR104470]

2022-05-04 Thread Jason Merrill via Gcc-patches
In my previous PR104470 patch I added yet another place that needs to handle
dependent member rewriting for deduction guides; this patches centralizes
rewriting into maybe_dependent_member_ref.  tsubst_baselink still has its
own handling because that's simpler than teaching maybe_dependent_member_ref
about BASELINKs.

Tested x86_64-pc-linux-gnu, applying to trunk.

PR c++/104470

gcc/cp/ChangeLog:

* pt.cc (maybe_dependent_member_ref): Handle types.
(tsubst, tsubst_copy): Use it.
(tsubst_aggr_type, instantiate_alias_template): Don't handle
tf_dguide here.
---
 gcc/cp/pt.cc | 104 ++-
 1 file changed, 54 insertions(+), 50 deletions(-)

diff --git a/gcc/cp/pt.cc b/gcc/cp/pt.cc
index ac002907a41..d8d36f35692 100644
--- a/gcc/cp/pt.cc
+++ b/gcc/cp/pt.cc
@@ -220,6 +220,7 @@ static tree make_argument_pack (tree);
 static void register_parameter_specializations (tree, tree);
 static tree enclosing_instantiation_of (tree tctx);
 static void instantiate_body (tree pattern, tree args, tree d, bool nested);
+static tree maybe_dependent_member_ref (tree, tree, tsubst_flags_t, tree);
 
 /* Make the current scope suitable for access checking when we are
processing T.  T can be FUNCTION_DECL for instantiated function
@@ -13725,18 +13726,6 @@ tsubst_aggr_type (tree t,
 complain, in_decl);
  if (argvec == error_mark_node)
r = error_mark_node;
- else if (!entering_scope && (complain & tf_dguide)
-  && dependent_scope_p (context))
-   {
- /* See maybe_dependent_member_ref.  */
- tree name = TYPE_IDENTIFIER (t);
- tree fullname = name;
- if (instantiates_primary_template_p (t))
-   fullname = build_nt (TEMPLATE_ID_EXPR, name,
-INNERMOST_TEMPLATE_ARGS (argvec));
- return build_typename_type (context, name, fullname,
- typename_type);
-   }
  else
{
  r = lookup_template_class (t, argvec, in_decl, context,
@@ -15586,6 +15575,9 @@ tsubst (tree t, tree args, tsubst_flags_t complain, 
tree in_decl)
 
   gcc_assert (type != unknown_type_node);
 
+  if (tree d = maybe_dependent_member_ref (t, args, complain, in_decl))
+return d;
+
   /* Reuse typedefs.  We need to do this to handle dependent attributes,
  such as attribute aligned.  */
   if (TYPE_P (t)
@@ -16815,16 +16807,58 @@ maybe_dependent_member_ref (tree t, tree args, 
tsubst_flags_t complain,
   if (!(complain & tf_dguide))
 return NULL_TREE;
 
-  tree ctx = context_for_name_lookup (t);
+  tree decl = (t && TYPE_P (t)) ? TYPE_NAME (t) : t;
+  if (!decl || !DECL_P (decl))
+return NULL_TREE;
+
+  tree ctx = context_for_name_lookup (decl);
   if (!CLASS_TYPE_P (ctx))
 return NULL_TREE;
 
   ctx = tsubst (ctx, args, complain, in_decl);
-  if (dependent_scope_p (ctx))
-return build_qualified_name (NULL_TREE, ctx, DECL_NAME (t),
-/*template_p=*/false);
+  if (!dependent_scope_p (ctx))
+return NULL_TREE;
 
-  return NULL_TREE;
+  if (TYPE_P (t))
+{
+  if (typedef_variant_p (t))
+   t = strip_typedefs (t);
+  tree decl = TYPE_NAME (t);
+  if (decl)
+   decl = maybe_dependent_member_ref (decl, args, complain, in_decl);
+  if (!decl)
+   return NULL_TREE;
+  return cp_build_qualified_type_real (TREE_TYPE (decl), cp_type_quals (t),
+  complain);
+}
+
+  tree name = DECL_NAME (t);
+  tree fullname = name;
+  if (instantiates_primary_template_p (t))
+{
+  tree tinfo = get_template_info (t);
+  name = DECL_NAME (TI_TEMPLATE (tinfo));
+  tree targs = INNERMOST_TEMPLATE_ARGS (TI_ARGS (tinfo));
+  targs = tsubst_template_args (targs, args, complain, in_decl);
+  fullname = build_nt (TEMPLATE_ID_EXPR, name, targs);
+}
+
+  if (TREE_CODE (t) == TYPE_DECL)
+{
+  if (TREE_CODE (TREE_TYPE (t)) == TYPENAME_TYPE
+ && TYPE_NAME (TREE_TYPE (t)) == t)
+   /* The TYPE_DECL for a typename has DECL_CONTEXT of the typename
+  scope, but it doesn't need to be rewritten again.  */
+   return NULL_TREE;
+  tree type = build_typename_type (ctx, name, fullname, typename_type);
+  return TYPE_NAME (type);
+}
+  else if (DECL_TYPE_TEMPLATE_P (t))
+return make_unbound_class_template (ctx, name,
+   NULL_TREE, complain);
+  else
+return build_qualified_name (NULL_TREE, ctx, fullname,
+TREE_CODE (t) == TEMPLATE_DECL);
 }
 
 /* Like tsubst, but deals with expressions.  This function just replaces
@@ -16840,6 +16874,9 @@ tsubst_copy (tree t, tree args, tsubst_flags_t 
complain, tree in_decl)
   if (t == NULL_TREE || t == error_mark_node || args == NULL_TREE)
 return 

Re: [PATCH] c++: wrong error with MVP and pushdecl [PR64679]

2022-05-04 Thread Jason Merrill via Gcc-patches

On 5/4/22 16:03, Marek Polacek wrote:

This patch fixes the second half of 64679.  Here we issue a wrong
"redefinition of 'int x'" for the following:

   struct Bar {
 Bar(int, int, int);
   };

   int x = 1;
   Bar bar(int(x), int(x), int{x}); // #1

cp_parser_parameter_declaration_list does pushdecl every time it sees
a named parameter, so the second "int(x)" causes the error.  That's
premature, since this turns out to be a constructor call after the
third argument!

If the first parameter is parenthesized, we can't push until we've
established we're looking at a function declaration.  Therefore this
could be fixed by some kind of lookahead.  I thought about introducing
a lightweight variant of cp_parser_parameter_declaration_list that would
not have any side effects and would return as soon as it figures out
whether it's looking at a declaration or expression.  Since that would
require fairly nontrivial changes, I wanted something simpler.  Something
like delaying the pushdecl until we've reached the ')' following the
parameter-declaration-clause.  But that doesn't quite cut it: we must
have pushed the parameters before processing a default argument, as in:

   Bar bar(int(a), int(b), int c = sizeof(a));  // valid


I wondered how this would affect

  void f(int (i), decltype(i) j = 42);

interestingly, clang and EDG both reject this, but they accept

  void f(int (i), bool b = true, decltype(i) j = 42);

which suggests a similar implementation strategy.  MSVC accepts both.


After more trial and error than I'd be willing to admit I came up with
the following, which stashes parameters into a vector when parsing
tentatively, and only pushes when we're about to commit to any tentative
parse (that handles the default argument case) or we're at the end of
a parameter-declaration-clause followed by a ')' (so that we still diagnose
redefining a parameter).

Bootstrapped/regtested on x86_64-pc-linux-gnu, ok for trunk?

PR c++/64679

gcc/cp/ChangeLog:

* parser.cc (push_pending_decls): New.
(cp_parser_parameter_declaration_clause): Maintain a vector of
parameters that haven't been pushed yet.  Push them at the end of
a valid parameter-declaration-clause
(cp_parser_parameter_declaration_list): Take a new auto_vec parameter.
Pass it down to cp_parser_parameter_declaration.  Do not pushdecl
while parsing tentatively.
(cp_parser_parameter_declaration): Take a new auto_vec parameter.
push_pending_decls when we're about to commit.
(cp_parser_cache_defarg): Adjust the call to
cp_parser_parameter_declaration_list.

gcc/testsuite/ChangeLog:

* g++.dg/parse/ambig11.C: New test.
* g++.dg/parse/ambig12.C: New test.
* g++.dg/parse/ambig13.C: New test.
---
  gcc/cp/parser.cc | 76 
  gcc/testsuite/g++.dg/parse/ambig11.C | 39 ++
  gcc/testsuite/g++.dg/parse/ambig12.C | 12 +
  gcc/testsuite/g++.dg/parse/ambig13.C | 32 
  4 files changed, 148 insertions(+), 11 deletions(-)
  create mode 100644 gcc/testsuite/g++.dg/parse/ambig11.C
  create mode 100644 gcc/testsuite/g++.dg/parse/ambig12.C
  create mode 100644 gcc/testsuite/g++.dg/parse/ambig13.C

diff --git a/gcc/cp/parser.cc b/gcc/cp/parser.cc
index 5fa743b5a8e..dd51b3a7b79 100644
--- a/gcc/cp/parser.cc
+++ b/gcc/cp/parser.cc
@@ -2391,9 +2391,9 @@ static void cp_parser_type_specifier_seq
  static tree cp_parser_parameter_declaration_clause
(cp_parser *, cp_parser_flags);
  static tree cp_parser_parameter_declaration_list
-  (cp_parser *, cp_parser_flags);
+  (cp_parser *, cp_parser_flags, auto_vec *);
  static cp_parameter_declarator *cp_parser_parameter_declaration
-  (cp_parser *, cp_parser_flags, bool, bool *);
+  (cp_parser *, cp_parser_flags, bool, bool *, auto_vec * = nullptr);
  static tree cp_parser_default_argument
(cp_parser *, bool);
  static void cp_parser_function_body
@@ -24438,6 +24438,27 @@ function_being_declared_is_template_p (cp_parser* 
parser)
  (current_class_type));
  }
  
+/* Push any pending named parameters we have seen in this function declaration

+   and stashed into PENDING_DECLS.  This mechanism is used for cases like
+
+ S foo(int(x), int(x), int{x});
+
+   where it's not clear if we're dealing with a constructor call or a function
+   declaration until we've seen the last argument which breaks it up.  Pushing
+   immediately would result in the redefinition error on the second 'x'.
+
+   We only have to do this if the first parameter is parenthesized.  */
+
+static void
+push_pending_decls (auto_vec *pending_decls)
+{
+  if (!pending_decls)
+return;
+  for (tree p : pending_decls)
+pushdecl (p);
+  pending_decls->truncate (0);
+}
+
  /* Parse a parameter-declaration-clause.
  
 parameter-declaration-clause:

@@ -24494,8 +24515,12 @@ cp_parser_parameter_declaration_clause (cp_parser* 
parser,
return explicit_voi

Re: [PATCH v2] cpp: new built-in __EXP_COUNTER__

2022-05-04 Thread Kaz Kylheku via Gcc-patches
On 2022-04-21 09:11, Kaz Kylheku wrote:
> libcpp/ChangeLog
> 2022-04-21  Kaz Kylheku  
> 
>   This change introduces a pair of related macros
>   __EXP_COUNTER__ and __UEXP_COUNTER__.  These macros access
>   integer values which enumerate macro expansions.
>   They can be used for the purposes of obtaining, unique
>   identifiers (within the scope of a translation unit), as a
>   replacement for unreliable hacks based on __LINE__.

Ping!


[PATCH] c++: wrong error with MVP and pushdecl [PR64679]

2022-05-04 Thread Marek Polacek via Gcc-patches
This patch fixes the second half of 64679.  Here we issue a wrong
"redefinition of 'int x'" for the following:

  struct Bar {
Bar(int, int, int);
  };

  int x = 1;
  Bar bar(int(x), int(x), int{x}); // #1

cp_parser_parameter_declaration_list does pushdecl every time it sees
a named parameter, so the second "int(x)" causes the error.  That's
premature, since this turns out to be a constructor call after the
third argument!

If the first parameter is parenthesized, we can't push until we've
established we're looking at a function declaration.  Therefore this
could be fixed by some kind of lookahead.  I thought about introducing
a lightweight variant of cp_parser_parameter_declaration_list that would
not have any side effects and would return as soon as it figures out
whether it's looking at a declaration or expression.  Since that would
require fairly nontrivial changes, I wanted something simpler.  Something
like delaying the pushdecl until we've reached the ')' following the
parameter-declaration-clause.  But that doesn't quite cut it: we must
have pushed the parameters before processing a default argument, as in:

  Bar bar(int(a), int(b), int c = sizeof(a));  // valid

After more trial and error than I'd be willing to admit I came up with
the following, which stashes parameters into a vector when parsing
tentatively, and only pushes when we're about to commit to any tentative
parse (that handles the default argument case) or we're at the end of
a parameter-declaration-clause followed by a ')' (so that we still diagnose
redefining a parameter).

Bootstrapped/regtested on x86_64-pc-linux-gnu, ok for trunk?

PR c++/64679

gcc/cp/ChangeLog:

* parser.cc (push_pending_decls): New.
(cp_parser_parameter_declaration_clause): Maintain a vector of
parameters that haven't been pushed yet.  Push them at the end of
a valid parameter-declaration-clause
(cp_parser_parameter_declaration_list): Take a new auto_vec parameter.
Pass it down to cp_parser_parameter_declaration.  Do not pushdecl
while parsing tentatively.
(cp_parser_parameter_declaration): Take a new auto_vec parameter.
push_pending_decls when we're about to commit.
(cp_parser_cache_defarg): Adjust the call to
cp_parser_parameter_declaration_list.

gcc/testsuite/ChangeLog:

* g++.dg/parse/ambig11.C: New test.
* g++.dg/parse/ambig12.C: New test.
* g++.dg/parse/ambig13.C: New test.
---
 gcc/cp/parser.cc | 76 
 gcc/testsuite/g++.dg/parse/ambig11.C | 39 ++
 gcc/testsuite/g++.dg/parse/ambig12.C | 12 +
 gcc/testsuite/g++.dg/parse/ambig13.C | 32 
 4 files changed, 148 insertions(+), 11 deletions(-)
 create mode 100644 gcc/testsuite/g++.dg/parse/ambig11.C
 create mode 100644 gcc/testsuite/g++.dg/parse/ambig12.C
 create mode 100644 gcc/testsuite/g++.dg/parse/ambig13.C

diff --git a/gcc/cp/parser.cc b/gcc/cp/parser.cc
index 5fa743b5a8e..dd51b3a7b79 100644
--- a/gcc/cp/parser.cc
+++ b/gcc/cp/parser.cc
@@ -2391,9 +2391,9 @@ static void cp_parser_type_specifier_seq
 static tree cp_parser_parameter_declaration_clause
   (cp_parser *, cp_parser_flags);
 static tree cp_parser_parameter_declaration_list
-  (cp_parser *, cp_parser_flags);
+  (cp_parser *, cp_parser_flags, auto_vec *);
 static cp_parameter_declarator *cp_parser_parameter_declaration
-  (cp_parser *, cp_parser_flags, bool, bool *);
+  (cp_parser *, cp_parser_flags, bool, bool *, auto_vec * = nullptr);
 static tree cp_parser_default_argument
   (cp_parser *, bool);
 static void cp_parser_function_body
@@ -24438,6 +24438,27 @@ function_being_declared_is_template_p (cp_parser* 
parser)
  (current_class_type));
 }
 
+/* Push any pending named parameters we have seen in this function declaration
+   and stashed into PENDING_DECLS.  This mechanism is used for cases like
+
+ S foo(int(x), int(x), int{x});
+
+   where it's not clear if we're dealing with a constructor call or a function
+   declaration until we've seen the last argument which breaks it up.  Pushing
+   immediately would result in the redefinition error on the second 'x'.
+
+   We only have to do this if the first parameter is parenthesized.  */
+
+static void
+push_pending_decls (auto_vec *pending_decls)
+{
+  if (!pending_decls)
+return;
+  for (tree p : pending_decls)
+pushdecl (p);
+  pending_decls->truncate (0);
+}
+
 /* Parse a parameter-declaration-clause.
 
parameter-declaration-clause:
@@ -24494,8 +24515,12 @@ cp_parser_parameter_declaration_clause (cp_parser* 
parser,
   return explicit_void_list_node;
 }
 
+  /* A vector of parameters that haven't been pushed yet.  */
+  auto_vec pending_decls;
+
   /* Parse the parameter-declaration-list.  */
-  parameters = cp_parser_parameter_declaration_list (parser, flags);
+  parameters = cp_parser_parameter_declaration_list (parser, flags,
+ 

Re: [PATCH] c++: wrong parse with functors [PR64679]

2022-05-04 Thread Marek Polacek via Gcc-patches
On Tue, May 03, 2022 at 04:43:05PM -0400, Jason Merrill via Gcc-patches wrote:
> On 5/2/22 12:18, Marek Polacek wrote:
> > Consider
> > 
> >struct F {
> >  F(int) {}
> >  F operator()(int) const { return *this; }
> >};
> > 
> > and
> > 
> >F(i)(0)(0);
> > 
> > where we're supposed to first call the constructor and then invoke
> > the operator() twice.  However, we parse this as an init-declarator:
> > "(i)" looks like a perfectly valid declarator, then we see an '(' and
> > think it must be an initializer, so we commit and we're toast.
> 
> How vexing!

:-) Most vexing indeed!
 
> > My
> > fix is to look a little bit farther before deciding we've seen an
> > initializer.
> > 
> > This is only a half of c++/64679, the other part of the PR is unrelated:
> > there the problem is that we are calling pushdecl while parsing
> > tentatively (in cp_parser_parameter_declaration_list), which is bad.
> > I don't know how to fix it though, maybe move the pushdecl call to
> > grokparms?  Tricky :(.
> 
> Can we pop the parm decls when the tentative parse fails?

Unfortunately no, we'll have already given a hard error when that
happens.  About to send a patch where I describe the problem in
detail.

> > Bootstrapped/regtested on x86_64-pc-linux-gnu, ok for trunk?
> 
> OK.

Thanks!

Marek



[pushed] c++: optimize reshape_init

2022-05-04 Thread Jason Merrill via Gcc-patches
If the index of a constructor_elt is a FIELD_DECL, the CONSTRUCTOR is
already reshaped, so we can save time and memory by returning immediately.

Tested x86_64-pc-linux-gnu, applying to trunk.

gcc/cp/ChangeLog:

* decl.cc (reshape_init): Shortcut already-reshaped init.
 (reshape_init_class): Assert not getting one here.
---
 gcc/cp/decl.cc | 8 +++-
 1 file changed, 7 insertions(+), 1 deletion(-)

diff --git a/gcc/cp/decl.cc b/gcc/cp/decl.cc
index c9110db796a..0fa758ff214 100644
--- a/gcc/cp/decl.cc
+++ b/gcc/cp/decl.cc
@@ -6631,7 +6631,9 @@ reshape_init_class (tree type, reshape_iter *d, bool 
first_initializer_p,
 
  if (TREE_CODE (d->cur->index) == FIELD_DECL)
{
- /* We already reshaped this.  */
+ /* We already reshaped this; we should have returned early from
+reshape_init.  */
+ gcc_checking_assert (false);
  if (field != d->cur->index)
{
  if (tree id = DECL_NAME (d->cur->index))
@@ -7068,6 +7070,10 @@ reshape_init (tree type, tree init, tsubst_flags_t 
complain)
   if (vec_safe_is_empty (v))
 return init;
 
+  if ((*v)[0].index && TREE_CODE ((*v)[0].index) == FIELD_DECL)
+/* Already reshaped.  */
+return init;
+
   /* Brace elision is not performed for a CONSTRUCTOR representing
  parenthesized aggregate initialization.  */
   if (CONSTRUCTOR_IS_PAREN_INIT (init))

base-commit: c2e846b539bb932d7f68f7e6b3e401c361cc3bf3
-- 
2.27.0



Re: [PATCH] c++: ICE during aggr CTAD for member tmpl [PR105476]

2022-05-04 Thread Jason Merrill via Gcc-patches

On 5/4/22 10:09, Patrick Palka wrote:

Here we're crashing from maybe_aggr_guide ultimately because
processing_template_decl isn't set when partially instantiating the
guide's parameter list.  This causes us to prematurely force completion
of the dependent type Visitor_functior, which fails and results in
an unexpected error_mark_node.

Bootstrapped and regtested on x86_64-pc-linux-gnu, does this look OK for
trunk/12.2?


OK.


PR c++/105476

gcc/cp/ChangeLog:

* pt.cc (maybe_aggr_guide): Set processing_template_decl when
partially instantiating the guide's parameter list.

gcc/testsuite/ChangeLog:

* g++.dg/cpp2a/class-deduction-aggr13.C: New test.
* g++.dg/cpp2a/class-deduction-aggr13a.C: New test.
---
  gcc/cp/pt.cc   |  6 +-
  .../g++.dg/cpp2a/class-deduction-aggr13.C  | 11 +++
  .../g++.dg/cpp2a/class-deduction-aggr13a.C | 18 ++
  3 files changed, 34 insertions(+), 1 deletion(-)
  create mode 100644 gcc/testsuite/g++.dg/cpp2a/class-deduction-aggr13.C
  create mode 100644 gcc/testsuite/g++.dg/cpp2a/class-deduction-aggr13a.C

diff --git a/gcc/cp/pt.cc b/gcc/cp/pt.cc
index ac002907a41..2bec47dc295 100644
--- a/gcc/cp/pt.cc
+++ b/gcc/cp/pt.cc
@@ -29569,7 +29569,11 @@ maybe_aggr_guide (tree tmpl, tree init, 
vec *args)
 PARMS, so that its template level is properly reduced and we don't get
 mismatches when deducing types using the guide with PARMS.  */
if (member_template_p)
-   parms = tsubst (parms, DECL_TI_ARGS (tmpl), complain, init);
+   {
+ ++processing_template_decl;
+ parms = tsubst (parms, DECL_TI_ARGS (tmpl), complain, init);
+ --processing_template_decl;
+   }
  }
else if (TREE_CODE (init) == TREE_LIST)
  {
diff --git a/gcc/testsuite/g++.dg/cpp2a/class-deduction-aggr13.C 
b/gcc/testsuite/g++.dg/cpp2a/class-deduction-aggr13.C
new file mode 100644
index 000..aa8032c60b5
--- /dev/null
+++ b/gcc/testsuite/g++.dg/cpp2a/class-deduction-aggr13.C
@@ -0,0 +1,11 @@
+// PR c++/105476
+// { dg-do compile { target c++20 } }
+
+template struct Visitor_functor;
+
+template struct Events {
+  template struct Visitor : Visitor_functor::type_t... { };
+};
+
+using ev_t = Events;
+ev_t::Visitor v = { {} }; // { dg-error "too many initializers" }
diff --git a/gcc/testsuite/g++.dg/cpp2a/class-deduction-aggr13a.C 
b/gcc/testsuite/g++.dg/cpp2a/class-deduction-aggr13a.C
new file mode 100644
index 000..69ae5dd4b60
--- /dev/null
+++ b/gcc/testsuite/g++.dg/cpp2a/class-deduction-aggr13a.C
@@ -0,0 +1,18 @@
+// PR c++/105476
+// { dg-do compile { target c++20 } }
+// A valid version of class-deduction-aggr13.C.
+
+template struct Visitor_functor;
+
+template<> struct Visitor_functor {
+  using type_t = int;
+};
+
+template struct Events {
+  template struct Visitor {
+Visitor_functor::type_t t;
+  };
+};
+
+using ev_t = Events;
+ev_t::Visitor v = { {} };




Re: [PATCH] gengtype: do not skip char after escape sequnce

2022-05-04 Thread Iain Sandoe



> On 4 May 2022, at 20:14, Martin Liška  wrote:
> 
> Right now, when a \$x escape sequence occures, the
> next character after $x is skipped, which is bogus.
> 
> The code has very low coverage right now.
> 
> Patch can bootstrap on x86_64-linux-gnu and survives regression tests.
> 
> Ready to be installed?

… just curious ...

Is there no way to test this?
or to identify a target where the behaviour would be changed with/without the 
patch?
(and confirm the expected result).

thanks
Iain


> Thanks,
> Martin
> 
> gcc/ChangeLog:
> 
>   * gengtype-state.cc (read_a_state_token): Do not skip extra
>   character after escaped sequence.
> ---
> gcc/gengtype-state.cc | 10 --
> 1 file changed, 10 deletions(-)
> 
> diff --git a/gcc/gengtype-state.cc b/gcc/gengtype-state.cc
> index dfd9ea52785..2dfe8edf1a5 100644
> --- a/gcc/gengtype-state.cc
> +++ b/gcc/gengtype-state.cc
> @@ -473,43 +473,33 @@ read_a_state_token (void)
>   {
>   case 'a':
> obstack_1grow (&bstring_obstack, '\a');
> -   getc (state_file);
> break;
>   case 'b':
> obstack_1grow (&bstring_obstack, '\b');
> -   getc (state_file);
> break;
>   case 't':
> obstack_1grow (&bstring_obstack, '\t');
> -   getc (state_file);
> break;
>   case 'n':
> obstack_1grow (&bstring_obstack, '\n');
> -   getc (state_file);
> break;
>   case 'v':
> obstack_1grow (&bstring_obstack, '\v');
> -   getc (state_file);
> break;
>   case 'f':
> obstack_1grow (&bstring_obstack, '\f');
> -   getc (state_file);
> break;
>   case 'r':
> obstack_1grow (&bstring_obstack, '\r');
> -   getc (state_file);
> break;
>   case '"':
> obstack_1grow (&bstring_obstack, '\"');
> -   getc (state_file);
> break;
>   case '\\':
> obstack_1grow (&bstring_obstack, '\\');
> -   getc (state_file);
> break;
>   case ' ':
> obstack_1grow (&bstring_obstack, ' ');
> -   getc (state_file);
> break;
>   case 'x':
> {
> -- 
> 2.36.0
> 



[PATCH] gengtype: do not skip char after escape sequnce

2022-05-04 Thread Martin Liška
Right now, when a \$x escape sequence occures, the
next character after $x is skipped, which is bogus.

The code has very low coverage right now.

Patch can bootstrap on x86_64-linux-gnu and survives regression tests.

Ready to be installed?
Thanks,
Martin

gcc/ChangeLog:

* gengtype-state.cc (read_a_state_token): Do not skip extra
character after escaped sequence.
---
 gcc/gengtype-state.cc | 10 --
 1 file changed, 10 deletions(-)

diff --git a/gcc/gengtype-state.cc b/gcc/gengtype-state.cc
index dfd9ea52785..2dfe8edf1a5 100644
--- a/gcc/gengtype-state.cc
+++ b/gcc/gengtype-state.cc
@@ -473,43 +473,33 @@ read_a_state_token (void)
{
case 'a':
  obstack_1grow (&bstring_obstack, '\a');
- getc (state_file);
  break;
case 'b':
  obstack_1grow (&bstring_obstack, '\b');
- getc (state_file);
  break;
case 't':
  obstack_1grow (&bstring_obstack, '\t');
- getc (state_file);
  break;
case 'n':
  obstack_1grow (&bstring_obstack, '\n');
- getc (state_file);
  break;
case 'v':
  obstack_1grow (&bstring_obstack, '\v');
- getc (state_file);
  break;
case 'f':
  obstack_1grow (&bstring_obstack, '\f');
- getc (state_file);
  break;
case 'r':
  obstack_1grow (&bstring_obstack, '\r');
- getc (state_file);
  break;
case '"':
  obstack_1grow (&bstring_obstack, '\"');
- getc (state_file);
  break;
case '\\':
  obstack_1grow (&bstring_obstack, '\\');
- getc (state_file);
  break;
case ' ':
  obstack_1grow (&bstring_obstack, ' ');
- getc (state_file);
  break;
case 'x':
  {
-- 
2.36.0



Re: [PATCH v4] c++: add color to function decl printing

2022-05-04 Thread Jason Merrill via Gcc-patches

On 1/14/22 22:56, Jason Merrill wrote:

On 1/14/22 16:49, David Malcolm wrote:

On Mon, 2021-12-13 at 09:58 -0500, Jason Merrill via Gcc-patches wrote:

On 12/13/21 06:02, Jonathan Wakely wrote:

On Sun, 12 Dec 2021 at 05:39, Jason Merrill mailto:ja...@redhat.com>> wrote:
  >
  > In reading C++ diagnostics, it's often hard to find the name of
the
function
  > in the middle of the template header, return type, parameters, and
template
  > arguments.  So let's colorize it, and maybe the template argument
bindings
  > while we're at it.


Thanks for the patch; sorry for not responding before.

Overall I like that patch, with some reservations...


  >
  > I've somewhat arbitrarily chosen bold green for the function name,
and
  > non-bold magenta for the template arguments.  I'm not at all
attached to
  > these choices.


I tried the patch with gnome terminals "light" and "dark" themes, and
the colors seemed readable with both color schemes.


  >
  > A side-effect is that when this happens in a quote (i.e. %qD), the
  > rest of the quote after the function name is no longer bold.  I
think
that's
  > acceptable; returning to the bold would require maintaining a
colorize stack
  > instead of the on/off controls we have now.


I was going to grumble about this, but having tried it on some
examples, I think it's actually an improvement in readability to drop
the emboldening, in that it reduces the "wall of bold text" seen.

Having tried it on some examples, I think the patch as a whole is a
definite readability win, in that it breaks up long stretches of bold
text into useful colorized parts; the result seems much easier to
decipher at a glance.  I wonder to what extent this is a poor-man's
syntax highlighting?


I think it definitely is.  Going farther in that direction would make 
sense it future.



Did you try any other variants of the highlighting?  This approach
seems to work well, FWIW, I wonder if others might work better, or if
this is a local maxima in terms of readability.


Do you have any particular variants in mind?  This was motivated by my 
having trouble finding the name of the function in diagnostic output, so 
that's the main thing I wanted to highlight.



I'm taking the liberty of attaching a screenshot (137K) showing
before/after the patch for the sake of discussion.


  >
  > Any thoughts?


I was also concerned about how this would interact with the template
type diff printing from f012c8ef4b35dcee9b5a3807868d050812d5b3b9, but I
did a few tests and it seems to work OK.

I only tested in lightly for a few minutes, so it could do with some
more testing.



I thought I was going to love this ... and it's a nice little
improvement, but I didn't love it as much as I expected.

Is it intentional that only the last function in the instantiation
stack
gets colourized? In this example the function on line 390 isn't
highlighted:

/home/jwakely/gcc/12/include/c++/12.0.0/bits/ranges_base.h:390:12:
  required
by substitution of 'template  requires
(is_bounded_array_v::type>) ||
(__member_size*<_Tp>) || (__adl_size<_Tp>) ||
(__sentinel_size<_Tp>)
constexpr auto std::ranges::__cust_access::_Size::operator()(_Tp&&)
const [with _Tp = adl::Footie (&)[]]*'


Oops, I needed to change subst_to_string as well.  Updated patch
below.


Jonathan, did you try the v2 patch?




Aside: it's a little odd that the first caret diagnostic there only
highlights the word "operator" and not the name of the function,
"operator()".


Yes, I imagine we need to adjust DECL_SOURCE_LOCATION to use a range
instead of assuming the location of the first token is sufficient.

Jason


[...]


diff --git a/gcc/doc/invoke.texi b/gcc/doc/invoke.texi
index 9b4371b9213..cdfddd75343 100644
--- a/gcc/doc/invoke.texi
+++ b/gcc/doc/invoke.texi
@@ -4803,6 +4803,14 @@ SGR substring for location information,

@samp{file:line} or

  @vindex quote GCC_COLORS @r{capability}
  SGR substring for information printed within quotes.
+@item fnname=
+@vindex fnname GCC_COLORS @r{capability}
+SGR substring for names of C++ functions.
+
+@item targs=
+@vindex targs GCC_COLORS @r{capability}
+SGR substring for C++ function template parameter bindings.
+
  @item fixit-insert=
  @vindex fixit-insert GCC_COLORS @r{capability}
  SGR substring for fix-it hints suggesting text to


There's a section of the docs for -fdiagnostics-color a little above
this, starting
   "The default @env{GCC_COLORS} is"
which needs to be updated whenever we add new color capabilities.


Fixed.


diff --git a/gcc/cp/error.c b/gcc/cp/error.c
index 323643d1e6f..0008ee2ee8d 100644
--- a/gcc/cp/error.c
+++ b/gcc/cp/error.c
@@ -1,4 +1,4 @@
-/* Call-backs for C++ error reporting.
+/* Call-backs for -*- C++ -*- error reporting.


This change isn't called out in the ChangeLog.  Is it deliberate?


Yes, an incidental change to tell emacs that it's a C++ source file 
despite the .c suffix.  Probably not important, since it seems we're 
about to rename the .c files.



[...]


@@ -1158,6 +1163,2

[pushed] c++: use %D more

2022-05-04 Thread Jason Merrill via Gcc-patches
There's no reason to call cxx_printable_name_translate here instead of using
%D in the format string.

Tested x86_64-pc-linux-gnu, applying to trunk.

gcc/cp/ChangeLog:

* error.c (cp_print_error_function): Use %qD.
(function_category): Use %qD.
---
 gcc/cp/error.cc | 29 +++--
 1 file changed, 15 insertions(+), 14 deletions(-)

diff --git a/gcc/cp/error.cc b/gcc/cp/error.cc
index 53a9073d99b..2e9dd2c6abd 100644
--- a/gcc/cp/error.cc
+++ b/gcc/cp/error.cc
@@ -3543,7 +3543,7 @@ cp_print_error_function (diagnostic_context *context,
fndecl = current_function_decl;
 
  pp_printf (context->printer, function_category (fndecl),
-cxx_printable_name_translate (fndecl, 2));
+fndecl);
 
  while (abstract_origin)
{
@@ -3587,19 +3587,19 @@ cp_print_error_function (diagnostic_context *context,
{
  if (context->show_column && s.column != 0)
pp_printf (context->printer,
-  _("inlined from %qs at %r%s:%d:%d%R"),
-  cxx_printable_name_translate (fndecl, 2),
+  _("inlined from %qD at %r%s:%d:%d%R"),
+  fndecl,
   "locus", s.file, s.line, s.column);
  else
pp_printf (context->printer,
-  _("inlined from %qs at %r%s:%d%R"),
-  cxx_printable_name_translate (fndecl, 2),
+  _("inlined from %qD at %r%s:%d%R"),
+  fndecl,
   "locus", s.file, s.line);
 
}
  else
-   pp_printf (context->printer, _("inlined from %qs"),
-  cxx_printable_name_translate (fndecl, 2));
+   pp_printf (context->printer, _("inlined from %qD"),
+  fndecl);
}
}
  pp_character (context->printer, ':');
@@ -3613,7 +3613,8 @@ cp_print_error_function (diagnostic_context *context,
 }
 
 /* Returns a description of FUNCTION using standard terminology.  The
-   result is a format string of the form "In CATEGORY %qs".  */
+   result is a format string of the form "In CATEGORY %qD".  */
+
 static const char *
 function_category (tree fn)
 {
@@ -3624,20 +3625,20 @@ function_category (tree fn)
   && DECL_FUNCTION_MEMBER_P (fn))
 {
   if (DECL_STATIC_FUNCTION_P (fn))
-   return _("In static member function %qs");
+   return _("In static member function %qD");
   else if (DECL_COPY_CONSTRUCTOR_P (fn))
-   return _("In copy constructor %qs");
+   return _("In copy constructor %qD");
   else if (DECL_CONSTRUCTOR_P (fn))
-   return _("In constructor %qs");
+   return _("In constructor %qD");
   else if (DECL_DESTRUCTOR_P (fn))
-   return _("In destructor %qs");
+   return _("In destructor %qD");
   else if (LAMBDA_FUNCTION_P (fn))
return _("In lambda function");
   else
-   return _("In member function %qs");
+   return _("In member function %qD");
 }
   else
-return _("In function %qs");
+return _("In function %qD");
 }
 
 /* Disable warnings about missing quoting in GCC diagnostics for

base-commit: 4a2061610726becfa5158e418c69800f5634b4c1
-- 
2.27.0



Re: [ping2][PATCH 0/8][RFC] Support BTF decl_tag and type_tag annotations

2022-05-04 Thread David Faust via Gcc-patches




On 5/3/22 15:32, Joseph Myers wrote:

On Mon, 2 May 2022, David Faust via Gcc-patches wrote:


Consider the following example:

#define __typetag1 __attribute__((btf_type_tag("tag1")))
#define __typetag2 __attribute__((btf_type_tag("tag2")))
#define __typetag3 __attribute__((btf_type_tag("tag3")))

int __typetag1 * __typetag2 __typetag3 * g;

The expected behavior is that 'g' is "a pointer with tags 'tag2' and 'tag3',
to a pointer with tag 'tag1' to an int". i.e.:


That's not a correct expectation for either GNU __attribute__ or C2x [[]]
attribute syntax.  In either syntax, __typetag2 __typetag3 should apply to
the type to which g points, not to g or its type, just as if you had a
type qualifier there.  You'd need to put the attributes (or qualifier)
after the *, not before, to make them apply to the pointer type.  See
"Attribute Syntax" in the GCC manual for how the syntax is defined for GNU
attributes and deduce in turn, for each subsequence of the tokens matching
the syntax for some kind of declarator, what the type for "T D1" would be
as defined there and in the C standard, as deduced from the type for "T D"
for a sub-declarator D.
 >> But GCC's attribute parsing produces a variable 'g' which is "a 

pointer with

tag 'tag1' to a pointer with tags 'tag2' and 'tag3' to an int", i.e.


In GNU syntax, __typetag1 applies to the declaration, whereas in C2x
syntax it applies to int.  Again, if you wanted it to apply to the pointer
type it would need to go after the * not before.

If you are concerned with the fine details of what construct an attribute
appertains to, I recommend using C2x syntax not GNU syntax.



Joseph, thank you! This is very helpful. My understanding of the syntax 
was not correct.


(Actually, I made a bad mistake in paraphrasing this example from the 
discussion of it in the series cover letter. But, the reason why it is 
incorrect is the same.)



Yonghong, is the specific ordering an expectation in BPF programs or 
other users of the tags?


This example comes from my testing against clang to check that the BTF 
generated by both toolchains is compatible. In this case we get 
different results when using the GNU attribute syntax.



To avoid confusion, here is the full example (from the cover letter). 
The difference in the results is clear in the DWARF.



Consider the following example:

  #define __typetag1 __attribute__((btf_type_tag("type-tag-1")))
  #define __typetag2 __attribute__((btf_type_tag("type-tag-2")))
  #define __typetag3 __attribute__((btf_type_tag("type-tag-3")))

  int __typetag1 * __typetag2 __typetag3 * g;

 
asm_written unsigned DI
size 
unit-size 
align:64 warn_if_not_align:0 symtab:0 alias-set -1 canonical-type 
0x77450888
attributes 
value 
readonly constant static "type-tag-3\000">>
chain 
value 
readonly constant static "type-tag-2\000"
pointer_to_this >
asm_written unsigned DI size  unit-size 

align:64 warn_if_not_align:0 symtab:0 alias-set -1 canonical-type 
0x77509930
attributes 
value 
readonly constant static "type-tag-1\000"
public static unsigned DI defer-output /home/dfaust/playpen/btf/annotate.c:29:42 size 
 unit-size 
align:64 warn_if_not_align:0>




The current implementation produces the following DWARF:

 <1><1e>: Abbrev Number: 4 (DW_TAG_variable)
<1f>   DW_AT_name: g
<21>   DW_AT_decl_file   : 1
<22>   DW_AT_decl_line   : 6
<23>   DW_AT_decl_column : 42
<24>   DW_AT_type: <0x32>
<28>   DW_AT_external: 1
<28>   DW_AT_location: 9 byte block: 3 0 0 0 0 0 0 0 0(DW_OP_addr: 
0)
 <1><32>: Abbrev Number: 2 (DW_TAG_pointer_type)
<33>   DW_AT_byte_size   : 8
<33>   DW_AT_type: <0x45>
<37>   DW_AT_sibling : <0x45>
 <2><3b>: Abbrev Number: 1 (User TAG value: 0x6000)
<3c>   DW_AT_name: (indirect string, offset: 0x18): btf_type_tag
<40>   DW_AT_const_value : (indirect string, offset: 0xc7): type-tag-1
 <2><44>: Abbrev Number: 0
 <1><45>: Abbrev Number: 2 (DW_TAG_pointer_type)
<46>   DW_AT_byte_size   : 8
<46>   DW_AT_type: <0x61>
<4a>   DW_AT_sibling : <0x61>
 <2><4e>: Abbrev Number: 1 (User TAG value: 0x6000)
<4f>   DW_AT_name: (indirect string, offset: 0x18): btf_type_tag
<53>   DW_AT_const_value : (indirect string, offset: 0xd): type-tag-3
 <2><57>: Abbrev Number: 1 (User TAG value: 0x6000)
<58>   DW_AT_name: (indirect string, offset: 0x18): btf_type_tag
<5c>   DW_AT_const_value : (indirect string, offset: 0xd2): type-tag-2
 <2><60>: Abbrev Number: 0
 <1><61>: Abbrev Number: 5 (DW_TAG_base_type)
<62>   DW_AT_byte_size   : 4
<63>   DW_AT_encoding: 5  (signed)
<64>   DW_AT_name: int
 <1><68>: Abbrev Number: 

Re: [PATCH] Add a restriction on allocate clause (OpenMP 5.0)

2022-05-04 Thread Jakub Jelinek via Gcc-patches
On Fri, Feb 18, 2022 at 11:13:16PM +, Hafiz Abid Qadeer wrote:
> An allocate clause in target region must specify an allocator
> unless the compilation unit has requires construct with
> dynamic_allocators clause.  Current implementation of the allocate
> clause did not check for this restriction. This patch fills that
> gap.
> 
> gcc/ChangeLog:
> 
>   * omp-low.cc (omp_maybe_offloaded_ctx): New prototype.
>   (scan_sharing_clauses):  Check a restriction on allocate clause.
> 
> gcc/testsuite/ChangeLog:
> 
>   * c-c++-common/gomp/allocate-2.c: Add tests.
>   * c-c++-common/gomp/allocate-8.c: New test.
>   * gfortran.dg/gomp/allocate-3.f90: Add tests.
>   * gcc.dg/gomp/pr104517.c: Update.

I think it has even stronger requirement, that the allocator is present
and is present in uses_allocators clause.

But we don't parse uses_allocators, so this is a good step forward.

Ok.

Jakub



Re: [Patch] OpenMP: Fix use_device_{addr,ptr} with in-data-sharing arg

2022-05-04 Thread Tobias Burnus

Hi Jakub,

On 04.05.22 14:03, Jakub Jelinek wrote:

On Wed, Apr 20, 2022 at 03:19:38PM +0200, Tobias Burnus wrote:

--- a/gcc/omp-low.cc
+++ b/gcc/omp-low.cc
@@ -13656,26 +13656,30 @@ lower_omp_target (gimple_stmt_iterator *gsi_p, 
omp_context *ctx)
 new_var = lookup_decl (var, ctx);
 new_var = DECL_VALUE_EXPR (new_var);
 tree v = new_var;
+tree v2 = var;
+if (OMP_CLAUSE_CODE (c) == OMP_CLAUSE_USE_DEVICE_PTR
+|| OMP_CLAUSE_CODE (c) == OMP_CLAUSE_USE_DEVICE_ADDR)
+  {
+v2 = maybe_lookup_decl_in_outer_ctx (var, ctx);
+if (DECL_HAS_VALUE_EXPR_P (v2))
+  v2 = DECL_VALUE_EXPR (v2);

I don't understand the above 2 lines, why do you need that?


I think it was intermittently required with some (half-)working patch.
But I concur that it is no longer is needed.


Otherwise LGTM, so if the 2 lines aren't needed, please also drop the
{}s around v2 = maybe_lookup_decl_in_outer_ctx (var, ctx); and reindent.

I did so, tested it also on my end and committed it as

r13-116-g3f8c389fe90bf565a6221a46bb7fb745dd4c1510

Thanks for the review!

Tobias

-
Siemens Electronic Design Automation GmbH; Anschrift: Arnulfstraße 201, 80634 
München; Gesellschaft mit beschränkter Haftung; Geschäftsführer: Thomas 
Heurung, Frank Thürauf; Sitz der Gesellschaft: München; Registergericht 
München, HRB 106955


Re: libgomp/plugin/plugin-gcn.c: Use -foffload-options= in err msg

2022-05-04 Thread Jakub Jelinek via Gcc-patches
On Wed, May 04, 2022 at 06:16:14PM +0200, Tobias Burnus wrote:
> See also https://gcc.gnu.org/gcc-12/changes.html#languages and
> https://gcc.gnu.org/onlinedocs/gcc/C-Dialect-Options.html#index-foffload
> 
> -foffload= was never officially documented, albeit most users will
> have encountered it. Since GCC 12 it is - but the -foffload=-
> part is officially only handled by -foffload-options=, even if it
> works as legacy feature with -foffload= as well.
> 
> OK for GCC 13?
> 
> Tobias
> 
> PS: Note that -foffload=amdgcn-amdhsa=-march=gfx908 and
> -foffload-options=amdgcn-amdhsa=-march=gfx908 are not identical.
> The former (legacy feature) will disable all other supported targets,
> such as nvptx, while the latter keeps the default set - and just passes
> that additional flag to the amdgcn target compiler.
> -
> Siemens Electronic Design Automation GmbH; Anschrift: Arnulfstraße 201, 80634 
> München; Gesellschaft mit beschränkter Haftung; Geschäftsführer: Thomas 
> Heurung, Frank Thürauf; Sitz der Gesellschaft: München; Registergericht 
> München, HRB 106955

> libgomp/plugin/plugin-gcn.c: Use -foffload-options= in err msg
> 
> While -foffload=- works (never documented legacy feature),
> the documented way is to use -foffload-options=.
> 
> libgomp/ChangeLog:
> 
> * plugin/plugin-gcn.c (isa_matches_agent): Suggest -foffload-options.

LGTM.

Jakub



libgomp/plugin/plugin-gcn.c: Use -foffload-options= in err msg

2022-05-04 Thread Tobias Burnus

See also https://gcc.gnu.org/gcc-12/changes.html#languages and
https://gcc.gnu.org/onlinedocs/gcc/C-Dialect-Options.html#index-foffload

-foffload= was never officially documented, albeit most users will
have encountered it. Since GCC 12 it is - but the -foffload=-
part is officially only handled by -foffload-options=, even if it
works as legacy feature with -foffload= as well.

OK for GCC 13?

Tobias

PS: Note that -foffload=amdgcn-amdhsa=-march=gfx908 and
-foffload-options=amdgcn-amdhsa=-march=gfx908 are not identical.
The former (legacy feature) will disable all other supported targets,
such as nvptx, while the latter keeps the default set - and just passes
that additional flag to the amdgcn target compiler.
-
Siemens Electronic Design Automation GmbH; Anschrift: Arnulfstraße 201, 80634 
München; Gesellschaft mit beschränkter Haftung; Geschäftsführer: Thomas 
Heurung, Frank Thürauf; Sitz der Gesellschaft: München; Registergericht 
München, HRB 106955
libgomp/plugin/plugin-gcn.c: Use -foffload-options= in err msg

While -foffload=- works (never documented legacy feature),
the documented way is to use -foffload-options=.

libgomp/ChangeLog:

* plugin/plugin-gcn.c (isa_matches_agent): Suggest -foffload-options.

diff --git a/libgomp/plugin/plugin-gcn.c b/libgomp/plugin/plugin-gcn.c
index f305d726874..2b32f5352c8 100644
--- a/libgomp/plugin/plugin-gcn.c
+++ b/libgomp/plugin/plugin-gcn.c
@@ -2352,7 +2352,7 @@ isa_matches_agent (struct agent_info *agent, Elf64_Ehdr *image)
 
   snprintf (msg, sizeof msg,
 		"GCN code object ISA '%s' does not match GPU ISA '%s'.\n"
-		"Try to recompile with '-foffload=-march=%s'.\n",
+		"Try to recompile with '-foffload-options=-march=%s'.\n",
 		isa_s, agent_isa_s, agent_isa_gcc_s);
 
   hsa_error (msg, HSA_STATUS_ERROR);


Re: [PATCH] c++: parse error with >= in template argument list [PR105436]

2022-05-04 Thread Jason Merrill via Gcc-patches

On 5/4/22 11:15, Marek Polacek wrote:

On Tue, May 03, 2022 at 04:46:11PM -0400, Jason Merrill wrote:

On 5/3/22 16:43, Jakub Jelinek wrote:

On Tue, May 03, 2022 at 04:26:51PM -0400, Jason Merrill wrote:

On 5/2/22 12:19, Marek Polacek wrote:

This patch fixes an oversight whereby we treated >= as the end of
a template argument.  This causes problems in C++14, because in
cp_parser_template_argument we go different ways for C++14 and C++17:

 /* It must be a non-type argument.  In C++17 any constant-expression is
allowed.  */
 if (cxx_dialect > cxx14)
   goto general_expr;

so in this testcase in C++14 we get "N" as the template argument but in
C++17 it is the whole "N >= 5" expression.  So in C++14 the remaining
">= 5" triggered the newly-added diagnostic.


Hmm, I think >>= is questionable as well, as it could resolve to a constexpr
operator>>=.  Seems like the two calls to


The template argument is a constant-expression and >>= can't appear non-nested
in constant-expression non-terminal, can it?


Ah, true, a constant-expression is a conditional-expression, which can't
involve an assignment operator.


So do you want me to make any changes or is the patch OK as-is?


OK.



Re: [PATCH] OpenMP, libgomp: Environment variable syntax extension.

2022-05-04 Thread Tobias Burnus

On 04.05.22 17:12, Jakub Jelinek via Gcc-patches wrote:

Though, there is one gotcha, if we had code where we parsed some var first
and another one later and there was interdependence between the two, in
environ they can appear in any order.


I think for interdependence it depends whether in a first step, only a
flag or the string is stored or whether the data is fully evaluated.
Especially for some 'global' entries, storing pointer to the string and
evaluating it later makes sense. In some cases, evaluation the values
(e.g. convert to integer) could be done early while checking the range
could be done later. – In any case, OMP_DISPLAY_ENV needs to be handled
last – but it can be parsed early (e.g. by setting a simple flag).

Tobias

PS: For completeness, I want to point out that in the current nonpublic
draft (will show up as TR11 or OpenMP 6.0, I guess), some fixes related
to one ICV (default-device-var → global; issue #2740) and to the _DEV
rules (priority of variants, _all, fine-print, cleanup; PR #3175) was
done. Those clarifications were found and discussed while the patch was
written and, hence, are incorporated and should be nonsurprising.

PPS: An example of an ICV which has to be evaluated before another is
the to-be-added available-devices-icv  that surely have to be evaluated
before default-device-icv (→PR #3198; useful, fancy & overengeneered).
But that one is surely not a GCC 13 topic.

-
Siemens Electronic Design Automation GmbH; Anschrift: Arnulfstraße 201, 80634 
München; Gesellschaft mit beschränkter Haftung; Geschäftsführer: Thomas 
Heurung, Frank Thürauf; Sitz der Gesellschaft: München; Registergericht 
München, HRB 106955


Re: [PATCH] c++: parse error with >= in template argument list [PR105436]

2022-05-04 Thread Marek Polacek via Gcc-patches
On Tue, May 03, 2022 at 04:46:11PM -0400, Jason Merrill wrote:
> On 5/3/22 16:43, Jakub Jelinek wrote:
> > On Tue, May 03, 2022 at 04:26:51PM -0400, Jason Merrill wrote:
> > > On 5/2/22 12:19, Marek Polacek wrote:
> > > > This patch fixes an oversight whereby we treated >= as the end of
> > > > a template argument.  This causes problems in C++14, because in
> > > > cp_parser_template_argument we go different ways for C++14 and C++17:
> > > > 
> > > > /* It must be a non-type argument.  In C++17 any 
> > > > constant-expression is
> > > >allowed.  */
> > > > if (cxx_dialect > cxx14)
> > > >   goto general_expr;
> > > > 
> > > > so in this testcase in C++14 we get "N" as the template argument but in
> > > > C++17 it is the whole "N >= 5" expression.  So in C++14 the remaining
> > > > ">= 5" triggered the newly-added diagnostic.
> > > 
> > > Hmm, I think >>= is questionable as well, as it could resolve to a 
> > > constexpr
> > > operator>>=.  Seems like the two calls to
> > 
> > The template argument is a constant-expression and >>= can't appear 
> > non-nested
> > in constant-expression non-terminal, can it?
> 
> Ah, true, a constant-expression is a conditional-expression, which can't
> involve an assignment operator.

So do you want me to make any changes or is the patch OK as-is?

Marek



Re: [PATCH] OpenMP, libgomp: Environment variable syntax extension.

2022-05-04 Thread Jakub Jelinek via Gcc-patches
On Tue, Jan 18, 2022 at 05:10:47PM +0100, Marcel Vollweiler wrote:
> Hi,
> 
> This patch considers the environment variable syntax extension for
> device-specific variants of environment variables from OpenMP 5.1 (see
> OpenMP 5.1 specification, p. 75 and p. 639). An environment variable
> (e.g. OMP_NUM_TEAMS) can have different suffixes:
> 
> _DEV (e.g. OMP_NUM_TEAMS_DEV): affects all devices but not the host.
> _DEV_ (e.g. OMP_NUM_TEAMS_DEV_42): affects only device with
> number .
> no suffix (e.g. OMP_NUM_TEAMS): affects only the host.
> 
> In future OpenMP versions also suffix _ALL will be introduced (see
> discussion https://github.com/OpenMP/spec/issues/3179). This is also
> considered in this patch:
> 
> _ALL (e.g. OMP_NUM_TEAMS_ALL): affects all devices and the host.
> 
> The precedence is as follows (descending). For the host:
> 
> 1. no suffix
> 2. _ALL
> 
> For devices:
> 
> 1. _DEV_
> 2. _DEV
> 3. _ALL
> 
> That means, _DEV_ is used whenever available. Otherwise _DEV is
> used if available, and at last _ALL. If there is no value for any of the
> variable variants, default values are used as already implemented before.
> 
> This patch concerns parsing (a), storing (b), output (c) and
> transmission to the device (d):
> 
> (a) The actual number of devices and the numbering are not known when
> parsing the environment variables. Thus all environment variables are
> iterated and searched for device-specific ones.
> 
> (b) Only configured device-specific variables are stored. Thus, linked
> lists are used.
> 
> (c) The output is done in omp_display_env (see specification p. 468f).
> Global ICVs are tagged with [all], see
> https://github.com/OpenMP/spec/issues/3179. ICVs which are not global
> but aren't handled device-specific yet are tagged with [host].
> omp_display_env outputs the initial values of the ICVs. That's why
> separate data structures are introduced (like gomp_initial_icv...).
> 
> (d) Device-specific ICVs which are already user accessible on the device
> are transmitted to the device (moreover nteams-var is added and used for
> the tests). There are ICVs which values are currently set explicitly in
> the config when copying them to the device: GOMP_NTHREADS_VAR,
> GOMP_THREAD_LIMIT_VAR, GOMP_DYN_VAR (see gomp_gcn_enter_kernel in
> libgomp/config/gcn/team.c and gomp_nvptx_main in
> libgomp/config/nvptx/team.c). The corresponding environment variables
> are nevertheless parsed and stored device-specific but the transmission
> to the device is not changed.

Just a partial review, there are many issues.
Some issues I'm mentioning just once or several times but many apply to
various other spots in the patch.

> +/* Returns the element of the list for the specified device number.  */
> +struct gomp_icv_list*
> +gomp_get_icv_list (struct gomp_icv_list **list, int device_num)
> +{
> +  struct gomp_icv_list *l = *list;
> +  while (l != NULL)
> +{
> +  if (l->device_num == device_num)
> + return l;
> +  l = l->next;
> +}
> +  return NULL;
> +}
> +
> +void*

Space before *.

> +gomp_get_icv_value_ptr (struct gomp_icv_list **list, int device_num)
> +{
> +  struct gomp_icv_list *l = gomp_get_icv_list (list, device_num);
> +  if (l == NULL)
> +return NULL;
> +  return l->value;
> +}
> +
> +/* Lists for initial device-specific ICVs, i.e. ICVs that are configured for
> +   particular devices (with environment variables like 
> OMP_NUM_TEAMS_DEV_42). */
> +struct gomp_icv_list *gomp_dyn_var_dev_list = NULL;
> +struct gomp_icv_list *gomp_nthreads_var_dev_list = NULL;
> +struct gomp_icv_list *gomp_nthreads_var_list_dev_list = NULL;
> +struct gomp_icv_list *gomp_nthreads_var_list_len_dev_list = NULL;
> +struct gomp_icv_list *gomp_run_sched_var_dev_list = NULL;
> +struct gomp_icv_list *gomp_run_sched_chunk_size_dev_list = NULL;
> +struct gomp_icv_list *gomp_nteams_var_dev_list = NULL;
> +struct gomp_icv_list *gomp_thread_limit_var_dev_list = NULL;
> +struct gomp_icv_list *gomp_max_active_levels_var_dev_list = NULL;
> +struct gomp_icv_list *gomp_proc_bind_var_dev_list = NULL;
> +struct gomp_icv_list *gomp_proc_bind_var_list_dev_list = NULL;
> +struct gomp_icv_list *gomp_proc_bind_var_list_len_dev_list = NULL;
> +struct gomp_icv_list *stacksize_dev_list = NULL;
> +struct gomp_icv_list *wait_policy_dev_list = NULL;
> +struct gomp_icv_list *teams_thread_limit_var_dev_list = NULL;

To me the above is just too big extra .data growth, we should optimize for
the common case of no OMP_* env vars or a few host cases of them.
So, I think it is ok to have the gomp_initial_icv var as is and
gomp_initial_icv_flags too.  But I'd turn gomp_initial_icv_all and
gomp_initial_icv_dev into pointers to gomp_initial_icv_t, and maybe instead
of the OMP_*_DEV_ linked lists for each var separately add one linked
list that contains device number, next pointer, gomp_initial_icv_t for values
and gomp_icv_flags_t used as a bitmask "is this ICV set for this ".

> +
> +/* Flags for non-global

Re: [PATCH] lto-plugin: add support for feature detection

2022-05-04 Thread Bernhard Reutner-Fischer via Gcc-patches
On Wed, 4 May 2022 15:31:32 +0200
Martin Liška  wrote:

> On 5/4/22 15:10, Alexander Monakov wrote:
> > On Wed, 4 May 2022, Martin Liška wrote:
> >   
> >> On 5/4/22 14:32, Alexander Monakov wrote:  
> >>> On Wed, 4 May 2022, Martin Liška wrote:
> >>>  
>  The patch is a follow-up of the discussion we've got in:
>  https://gcc.gnu.org/pipermail/gcc-patches/2022-May/593901.html
> 
>  Mold linker would appreciate knowing in advance if get_symbols_v3 is 
>  supported
>  by a GCC plug-in or not.  
> > 
> > Out of curiousity, I looked at mold, and I don't understand what problem 
> > this
> > detection is solving, nor why this is the best way to solve that. Was there
> > some discussion with mold author I should check out?  
> 
> Sure, please take a look at this issue:
> https://github.com/rui314/mold/issues/454#issuecomment-1116849458
> 
> > 
> > Note that mold takes this not only as 'v3 API is supported', but, more
> > importantly, as 'v2 entrypoint will not be called'.  
> 
> Yes, if they register get_symbols_v3, then it will be called. That's how the
> plug-in works.

FWIW, I usually use an alias to some existing, exported symbol, like:
/* Announce that we are GPL.  */
#if defined __GNUC__
extern __typeof (plugin_init) plugin_is_GPL_compatible __attribute__ ((alias 
("plugin_init")));
#else
unsigned char plugin_is_GPL_compatible;
#endif


New Spanish PO file for 'cpplib' (version 12.1-b20220213)

2022-05-04 Thread Translation Project Robot
Hello, gentle maintainer.

This is a message from the Translation Project robot.

A revised PO file for textual domain 'cpplib' has been submitted
by the Spanish team of translators.  The file is available at:

https://translationproject.org/latest/cpplib/es.po

(This file, 'cpplib-12.1-b20220213.es.po', has just now been sent to you in
a separate email.)

All other PO files for your package are available in:

https://translationproject.org/latest/cpplib/

Please consider including all of these in your next release, whether
official or a pretest.

Whenever you have a new distribution with a new version number ready,
containing a newer POT file, please send the URL of that distribution
tarball to the address below.  The tarball may be just a pretest or a
snapshot, it does not even have to compile.  It is just used by the
translators when they need some extra translation context.

The following HTML page has been updated:

https://translationproject.org/domain/cpplib.html

If any question arises, please contact the translation coordinator.

Thank you for all your work,

The Translation Project robot, in the
name of your translation coordinator.




Contents of PO file 'cpplib-12.1-b20220213.es.po'

2022-05-04 Thread Translation Project Robot


cpplib-12.1-b20220213.es.po.gz
Description: Binary data
The Translation Project robot, in the
name of your translation coordinator.



New Spanish PO file for 'cpplib' (version 12.1-b20220213)

2022-05-04 Thread Translation Project Robot
Hello, gentle maintainer.

This is a message from the Translation Project robot.

A revised PO file for textual domain 'cpplib' has been submitted
by the Spanish team of translators.  The file is available at:

https://translationproject.org/latest/cpplib/es.po

(This file, 'cpplib-12.1-b20220213.es.po', has just now been sent to you in
a separate email.)

All other PO files for your package are available in:

https://translationproject.org/latest/cpplib/

Please consider including all of these in your next release, whether
official or a pretest.

Whenever you have a new distribution with a new version number ready,
containing a newer POT file, please send the URL of that distribution
tarball to the address below.  The tarball may be just a pretest or a
snapshot, it does not even have to compile.  It is just used by the
translators when they need some extra translation context.

The following HTML page has been updated:

https://translationproject.org/domain/cpplib.html

If any question arises, please contact the translation coordinator.

Thank you for all your work,

The Translation Project robot, in the
name of your translation coordinator.




Contents of PO file 'cpplib-12.1-b20220213.es.po'

2022-05-04 Thread Translation Project Robot


cpplib-12.1-b20220213.es.po.gz
Description: Binary data
The Translation Project robot, in the
name of your translation coordinator.



Re: [AArch64] PR105162: emit barrier for __sync and __atomic builtins on CPUs without LSE

2022-05-04 Thread Pop, Sebastian via Gcc-patches
> Yes this looks good to me (still needs maintainer approval). 

Thanks again Wilco for your review.

> One minor nitpick,
> a few of the tests check for __aarch64_cas2 - this should be 
> __aarch64_cas2_sync.

Fixed in the attached patch.

> Note the patch still needs an appropriate commit message.

Added the following ChangeLog entry to the commit message.

* config/aarch64/aarch64-protos.h (atomic_ool_names): Increase dimension
of str array.
* config/aarch64/aarch64.cc (aarch64_atomic_ool_func): Call
memmodel_from_int and handle MEMMODEL_SYNC_*.
(DEF0): Add __aarch64_*_sync functions.

testsuite/
* gcc.target/aarch64/sync-comp-swap-ool.c: New.
* gcc.target/aarch64/sync-op-acquire-ool.c: New.
* gcc.target/aarch64/sync-op-full-ool.c: New.
* gcc.target/aarch64/target_attr_20.c: Update check.
* gcc.target/aarch64/target_attr_21.c: Same.

libgcc/
* config/aarch64/lse.S: Define BARRIER and handle memory MODEL 5.
* config/aarch64/t-lse: Add a 5th memory model for _sync functions.
From 3b624598035e4e0c1aee89efaae28596a64b3d0d Mon Sep 17 00:00:00 2001
From: Sebastian Pop 
Date: Mon, 18 Apr 2022 15:13:20 +
Subject: [PATCH] [AArch64] add barriers to ool __sync builtins

	* config/aarch64/aarch64-protos.h (atomic_ool_names): Increase dimension
	of str array.
	* config/aarch64/aarch64.cc (aarch64_atomic_ool_func): Call
	memmodel_from_int and handle MEMMODEL_SYNC_*.
	(DEF0): Add __aarch64_*_sync functions.

testsuite/
	* gcc.target/aarch64/sync-comp-swap-ool.c: New.
	* gcc.target/aarch64/sync-op-acquire-ool.c: New.
	* gcc.target/aarch64/sync-op-full-ool.c: New.
	* gcc.target/aarch64/target_attr_20.c: Update check.
	* gcc.target/aarch64/target_attr_21.c: Same.

libgcc/
	* config/aarch64/lse.S: Define BARRIER and handle memory MODEL 5.
	* config/aarch64/t-lse: Add a 5th memory model for _sync functions.
---
 gcc/config/aarch64/aarch64-protos.h   |  2 +-
 gcc/config/aarch64/aarch64.cc | 12 --
 .../gcc.target/aarch64/sync-comp-swap-ool.c   |  6 +++
 .../gcc.target/aarch64/sync-op-acquire-ool.c  |  6 +++
 .../gcc.target/aarch64/sync-op-full-ool.c |  9 
 .../gcc.target/aarch64/target_attr_20.c   |  2 +-
 .../gcc.target/aarch64/target_attr_21.c   |  2 +-
 libgcc/config/aarch64/lse.S   | 42 +--
 libgcc/config/aarch64/t-lse   |  8 ++--
 9 files changed, 75 insertions(+), 14 deletions(-)
 create mode 100644 gcc/testsuite/gcc.target/aarch64/sync-comp-swap-ool.c
 create mode 100644 gcc/testsuite/gcc.target/aarch64/sync-op-acquire-ool.c
 create mode 100644 gcc/testsuite/gcc.target/aarch64/sync-op-full-ool.c

diff --git a/gcc/config/aarch64/aarch64-protos.h b/gcc/config/aarch64/aarch64-protos.h
index 46bade28ed6..3ad5e77a1af 100644
--- a/gcc/config/aarch64/aarch64-protos.h
+++ b/gcc/config/aarch64/aarch64-protos.h
@@ -1051,7 +1051,7 @@ bool aarch64_high_bits_all_ones_p (HOST_WIDE_INT);
 
 struct atomic_ool_names
 {
-const char *str[5][4];
+const char *str[5][5];
 };
 
 rtx aarch64_atomic_ool_func(machine_mode mode, rtx model_rtx,
diff --git a/gcc/config/aarch64/aarch64.cc b/gcc/config/aarch64/aarch64.cc
index 18f80499079..3ad11e84aae 100644
--- a/gcc/config/aarch64/aarch64.cc
+++ b/gcc/config/aarch64/aarch64.cc
@@ -22670,14 +22670,14 @@ aarch64_emit_unlikely_jump (rtx insn)
   add_reg_br_prob_note (jump, profile_probability::very_unlikely ());
 }
 
-/* We store the names of the various atomic helpers in a 5x4 array.
+/* We store the names of the various atomic helpers in a 5x5 array.
Return the libcall function given MODE, MODEL and NAMES.  */
 
 rtx
 aarch64_atomic_ool_func(machine_mode mode, rtx model_rtx,
 			const atomic_ool_names *names)
 {
-  memmodel model = memmodel_base (INTVAL (model_rtx));
+  memmodel model = memmodel_from_int (INTVAL (model_rtx));
   int mode_idx, model_idx;
 
   switch (mode)
@@ -22717,6 +22717,11 @@ aarch64_atomic_ool_func(machine_mode mode, rtx model_rtx,
 case MEMMODEL_SEQ_CST:
   model_idx = 3;
   break;
+case MEMMODEL_SYNC_ACQUIRE:
+case MEMMODEL_SYNC_RELEASE:
+case MEMMODEL_SYNC_SEQ_CST:
+  model_idx = 4;
+  break;
 default:
   gcc_unreachable ();
 }
@@ -22729,7 +22734,8 @@ aarch64_atomic_ool_func(machine_mode mode, rtx model_rtx,
   { "__aarch64_" #B #N "_relax", \
 "__aarch64_" #B #N "_acq", \
 "__aarch64_" #B #N "_rel", \
-"__aarch64_" #B #N "_acq_rel" }
+"__aarch64_" #B #N "_acq_rel", \
+"__aarch64_" #B #N "_sync" }
 
 #define DEF4(B)  DEF0(B, 1), DEF0(B, 2), DEF0(B, 4), DEF0(B, 8), \
 		 { NULL, NULL, NULL, NULL }
diff --git a/gcc/testsuite/gcc.target/aarch64/sync-comp-swap-ool.c b/gcc/testsuite/gcc.target/aarch64/sync-comp-swap-ool.c
new file mode 100644
index 000..372f4aa8746
--- /dev/null
+++ b/gcc/testsuite/gcc.target/aarch64/sync-comp-swap-ool.c
@@ -0,0 +1,6 @@
+/* { dg-do compile } */
+/* { dg-options "-march=armv

Re: [PATCH][pushed] Remove dead code.

2022-05-04 Thread Martin Liška
On 5/4/22 15:20, Andreas Schwab wrote:
> On Mai 04 2022, Martin Liška wrote:
> 
>> diff --git a/gcc/gengtype-state.cc b/gcc/gengtype-state.cc
>> index ea566af3249..dfd9ea52785 100644
>> --- a/gcc/gengtype-state.cc
>> +++ b/gcc/gengtype-state.cc
>> @@ -473,43 +473,43 @@ read_a_state_token (void)
>>  {
>>  case 'a':
>>obstack_1grow (&bstring_obstack, '\a');
>> -  c = getc (state_file);
>> +  getc (state_file);
>>break;
>>  case 'b':
>>obstack_1grow (&bstring_obstack, '\b');
>> -  c = getc (state_file);
>> +  getc (state_file);
>>break;
>>  case 't':
>>obstack_1grow (&bstring_obstack, '\t');
>> -  c = getc (state_file);
>> +  getc (state_file);
>>break;
>>  case 'n':
>>obstack_1grow (&bstring_obstack, '\n');
>> -  c = getc (state_file);
>> +  getc (state_file);
>>break;
>>  case 'v':
>>obstack_1grow (&bstring_obstack, '\v');
>> -  c = getc (state_file);
>> +  getc (state_file);
>>break;
>>  case 'f':
>>obstack_1grow (&bstring_obstack, '\f');
>> -  c = getc (state_file);
>> +  getc (state_file);
>>break;
>>  case 'r':
>>obstack_1grow (&bstring_obstack, '\r');
>> -  c = getc (state_file);
>> +  getc (state_file);
>>break;
>>  case '"':
>>obstack_1grow (&bstring_obstack, '\"');
>> -  c = getc (state_file);
>> +  getc (state_file);
>>break;
>>  case '\\':
>>obstack_1grow (&bstring_obstack, '\\');
>> -  c = getc (state_file);
>> +  getc (state_file);
>>break;
>>  case ' ':
>>obstack_1grow (&bstring_obstack, ' ');
>> -  c = getc (state_file);
>> +  getc (state_file);
>>break;
> 
> This is surprising.  Does that mean that an escape sequence must always
> be followed by a character that is thrown away?

Yeah, that's bogus. There's only one string that has an escape sequence.
If I apply:

diff --git a/gcc/basic-block.h b/gcc/basic-block.h
index e3fff1f6975..982fa724af0 100644
--- a/gcc/basic-block.h
+++ b/gcc/basic-block.h
@@ -160,7 +160,7 @@ struct GTY((chain_next ("%h.next_bb"), chain_prev 
("%h.prev_bb"))) basic_block_d
is the better choice.  */
 typedef int __assert_gimple_bb_smaller_rtl_bb
   [(int) sizeof (struct rtl_bb_info)
-   - (int) sizeof (struct gimple_bb_info)];
+- (int) sizeof (struct gimple_bb_info)];
 
 
 #define BB_FREQ_MAX 1

Then I end up with:

CSTR=(int) sizeof (struct rtl_bb_info)
 (int) sizeof (struct gimple_bb_info)

Where '-' is wrongly discarded.

> state_writer::write_state_a_string surely doesn't suggest that.  I
> suspect that this part of the code has never been exercised.
> 

So yes, I'm going to prepare a patch which would not skip next character.

Martin


[PATCH] c++: ICE during aggr CTAD for member tmpl [PR105476]

2022-05-04 Thread Patrick Palka via Gcc-patches
Here we're crashing from maybe_aggr_guide ultimately because
processing_template_decl isn't set when partially instantiating the
guide's parameter list.  This causes us to prematurely force completion
of the dependent type Visitor_functior, which fails and results in
an unexpected error_mark_node.

Bootstrapped and regtested on x86_64-pc-linux-gnu, does this look OK for
trunk/12.2?

PR c++/105476

gcc/cp/ChangeLog:

* pt.cc (maybe_aggr_guide): Set processing_template_decl when
partially instantiating the guide's parameter list.

gcc/testsuite/ChangeLog:

* g++.dg/cpp2a/class-deduction-aggr13.C: New test.
* g++.dg/cpp2a/class-deduction-aggr13a.C: New test.
---
 gcc/cp/pt.cc   |  6 +-
 .../g++.dg/cpp2a/class-deduction-aggr13.C  | 11 +++
 .../g++.dg/cpp2a/class-deduction-aggr13a.C | 18 ++
 3 files changed, 34 insertions(+), 1 deletion(-)
 create mode 100644 gcc/testsuite/g++.dg/cpp2a/class-deduction-aggr13.C
 create mode 100644 gcc/testsuite/g++.dg/cpp2a/class-deduction-aggr13a.C

diff --git a/gcc/cp/pt.cc b/gcc/cp/pt.cc
index ac002907a41..2bec47dc295 100644
--- a/gcc/cp/pt.cc
+++ b/gcc/cp/pt.cc
@@ -29569,7 +29569,11 @@ maybe_aggr_guide (tree tmpl, tree init, 
vec *args)
 PARMS, so that its template level is properly reduced and we don't get
 mismatches when deducing types using the guide with PARMS.  */
   if (member_template_p)
-   parms = tsubst (parms, DECL_TI_ARGS (tmpl), complain, init);
+   {
+ ++processing_template_decl;
+ parms = tsubst (parms, DECL_TI_ARGS (tmpl), complain, init);
+ --processing_template_decl;
+   }
 }
   else if (TREE_CODE (init) == TREE_LIST)
 {
diff --git a/gcc/testsuite/g++.dg/cpp2a/class-deduction-aggr13.C 
b/gcc/testsuite/g++.dg/cpp2a/class-deduction-aggr13.C
new file mode 100644
index 000..aa8032c60b5
--- /dev/null
+++ b/gcc/testsuite/g++.dg/cpp2a/class-deduction-aggr13.C
@@ -0,0 +1,11 @@
+// PR c++/105476
+// { dg-do compile { target c++20 } }
+
+template struct Visitor_functor;
+
+template struct Events {
+  template struct Visitor : Visitor_functor::type_t... { };
+};
+
+using ev_t = Events;
+ev_t::Visitor v = { {} }; // { dg-error "too many initializers" }
diff --git a/gcc/testsuite/g++.dg/cpp2a/class-deduction-aggr13a.C 
b/gcc/testsuite/g++.dg/cpp2a/class-deduction-aggr13a.C
new file mode 100644
index 000..69ae5dd4b60
--- /dev/null
+++ b/gcc/testsuite/g++.dg/cpp2a/class-deduction-aggr13a.C
@@ -0,0 +1,18 @@
+// PR c++/105476
+// { dg-do compile { target c++20 } }
+// A valid version of class-deduction-aggr13.C.
+
+template struct Visitor_functor;
+
+template<> struct Visitor_functor {
+  using type_t = int;
+};
+
+template struct Events {
+  template struct Visitor {
+Visitor_functor::type_t t;
+  };
+};
+
+using ev_t = Events;
+ev_t::Visitor v = { {} };
-- 
2.36.0.44.g0f828332d5



[pushed] c++: Remove cdtor_label

2022-05-04 Thread Jason Merrill via Gcc-patches
Jakub pointed out that cdtor_label is unnecessary, we should get all the
desired semantics with a normal return.

Tested x86_64-pc-linux-gnu and arm-eabi//arm-sim, applying to trunk.

gcc/cp/ChangeLog:

* cp-tree.h (struct language_function): Remove x_cdtor_label.
(cdtor_label, LABEL_DECL_CDTOR): Remove.
* constexpr.cc (returns): Don't check LABEL_DECL_CDTOR.
(cxx_eval_constant_expression): Don't call returns.
* decl.cc (check_goto): Don't check cdtor_label.
(start_preparsed_function): And don't set it.
(finish_constructor_body, finish_destructor_body): Remove.
(finish_function_body): Don't call them.
* typeck.cc (check_return_expr): Handle cdtor_returns_this here.
* semantics.cc (finish_return_stmt): Not here.
---
 gcc/cp/cp-tree.h| 14 --
 gcc/cp/constexpr.cc |  8 ++--
 gcc/cp/decl.cc  | 20 
 gcc/cp/semantics.cc | 11 ---
 gcc/cp/typeck.cc| 12 ++--
 5 files changed, 12 insertions(+), 53 deletions(-)

diff --git a/gcc/cp/cp-tree.h b/gcc/cp/cp-tree.h
index 8d7cf240b68..663fe7a20fc 100644
--- a/gcc/cp/cp-tree.h
+++ b/gcc/cp/cp-tree.h
@@ -543,7 +543,6 @@ extern GTY(()) tree cp_global_trees[CPTI_MAX];
   DECL_CONSTRAINT_VAR_P (in a PARM_DECL)
   TEMPLATE_DECL_COMPLEX_ALIAS_P (in TEMPLATE_DECL)
   DECL_INSTANTIATING_NSDMI_P (in a FIELD_DECL)
-  LABEL_DECL_CDTOR (in LABEL_DECL)
   USING_DECL_UNRELATED_P (in USING_DECL)
3: DECL_IN_AGGR_P.
4: DECL_C_BIT_FIELD (in a FIELD_DECL)
@@ -2057,7 +2056,6 @@ struct named_label_hash : ggc_remove 
 struct GTY(()) language_function {
   struct c_language_function base;
 
-  tree x_cdtor_label;
   tree x_current_class_ptr;
   tree x_current_class_ref;
   tree x_eh_spec_block;
@@ -2091,13 +2089,6 @@ struct GTY(()) language_function {
 
 #define cp_function_chain (cfun->language)
 
-/* In a constructor destructor, the point at which all derived class
-   destroying/construction has been done.  I.e., just before a
-   constructor returns, or before any base class destroying will be done
-   in a destructor.  */
-
-#define cdtor_label cp_function_chain->x_cdtor_label
-
 /* When we're processing a member function, current_class_ptr is the
PARM_DECL for the `this' pointer.  The current_class_ref is an
expression for `*this'.  */
@@ -4278,11 +4269,6 @@ get_vec_init_expr (tree t)
 #define DECL_LOCAL_DECL_ALIAS(NODE)\
   DECL_ACCESS ((gcc_checking_assert (DECL_LOCAL_DECL_P (NODE)), NODE))
 
-/* Nonzero if NODE is the target for genericization of 'return' stmts
-   in constructors/destructors of targetm.cxx.cdtor_returns_this targets.  */
-#define LABEL_DECL_CDTOR(NODE) \
-  DECL_LANG_FLAG_2 (LABEL_DECL_CHECK (NODE))
-
 /* True if NODE was declared with auto in its return type, but it has
started compilation and so the return type might have been changed by
return type deduction; its declared return type should be found in
diff --git a/gcc/cp/constexpr.cc b/gcc/cp/constexpr.cc
index c40efa6cc4e..9b1e71857fc 100644
--- a/gcc/cp/constexpr.cc
+++ b/gcc/cp/constexpr.cc
@@ -6060,9 +6060,7 @@ static bool
 returns (tree *jump_target)
 {
   return *jump_target
-&& (TREE_CODE (*jump_target) == RETURN_EXPR
-   || (TREE_CODE (*jump_target) == LABEL_DECL
-   && LABEL_DECL_CDTOR (*jump_target)));
+&& TREE_CODE (*jump_target) == RETURN_EXPR;
 }
 
 static bool
@@ -7473,9 +7471,7 @@ cxx_eval_constant_expression (const constexpr_ctx *ctx, 
tree t,
 
 case GOTO_EXPR:
   if (breaks (&TREE_OPERAND (t, 0))
- || continues (&TREE_OPERAND (t, 0))
- /* Allow for jumping to a cdtor_label.  */
- || returns (&TREE_OPERAND (t, 0)))
+ || continues (&TREE_OPERAND (t, 0)))
*jump_target = TREE_OPERAND (t, 0);
   else
{
diff --git a/gcc/cp/decl.cc b/gcc/cp/decl.cc
index 324498f399d..c9110db796a 100644
--- a/gcc/cp/decl.cc
+++ b/gcc/cp/decl.cc
@@ -3613,11 +3613,6 @@ check_goto (tree decl)
   if (TREE_CODE (decl) != LABEL_DECL)
 return;
 
-  /* We didn't record any information about this label when we created it,
- and there's not much point since it's trivial to analyze as a return.  */
-  if (decl == cdtor_label)
-return;
-
   hashval_t hash = IDENTIFIER_HASH_VALUE (DECL_NAME (decl));
   named_label_entry **slot
 = named_labels->find_slot_with_hash (DECL_NAME (decl), hash, NO_INSERT);
@@ -17325,14 +17320,6 @@ start_preparsed_function (tree decl1, tree attrs, int 
flags)
 
   ++function_depth;
 
-  if (DECL_DESTRUCTOR_P (decl1)
-  || (DECL_CONSTRUCTOR_P (decl1)
- && targetm.cxx.cdtor_returns_this ()))
-{
-  cdtor_label = create_artificial_label (input_location);
-  LABEL_DECL_CDTOR (cdtor_label) = true;
-}
-
   start_fname_decls ();
 
   store_parm_decls (current_function_parms);
@@ -17503,9 +17490,6 @@ finish_constructor_body (void)
 
   if (targetm.cxx.cdtor_returns_this ())
 {
-  /* Any

Re: [PATCH] gimple-isel: handle x CMP y ? z : 0

2022-05-04 Thread Richard Earnshaw via Gcc-patches




On 04/05/2022 12:14, Richard Biener wrote:

On Wed, May 4, 2022 at 12:16 PM Richard Earnshaw via Gcc-patches
 wrote:



Gimple isel already handles x CMP y ? -1 : 0 when lowering vector cond
operations, but this can be generalized further when the comparison
forms a natural mask so that we can also handle x CMP y ? z : 0 by
transforming it into (x CMP y) & z.  This will, in most cases save
having to load a register with the zero vector.


Hmm, I'm not sure why exactly the existing case is in ISEL but if
we want to extend it there we migh also want to handle ? 0 : -1
as BIT_NOT.


I was assuming that there had already been some level of 
canonicalization at this point, but maybe that's an incorrect 
assumption.  There are quite a few transforms that would avoid a select 
operation, but at some point the bit manipulations may well become more 
expensive than the select itself.




For the case in question it looks like it might better fit as optimization
in match.pd?  It also lacks a query for present and3 support.
For match.pd it might be nice to handle x CMP y ? 0 : z as well
if we can invert x CMP y (and it has only a single use).  When in
ISEL we might as well check for andn3 support and go with
BIT_NOT + BIT_AND ...


I'll have to think about that a bit, the approach was inspired by the 
hunk just a bit earlier that starts:


  /* Lower mask typed, non-vector mode VEC_COND_EXPRs to bitwise 
operations.
 Those can end up generated by folding and at least for integer 
mode masks

 we cannot expect vcond expanders to exist.  We lower a ? b : c
 to (b & a) | (c & ~a).  */
  if (VECTOR_BOOLEAN_TYPE_P (TREE_TYPE (lhs))
  && !VECTOR_MODE_P (mode))
{
  gcc_assert (types_compatible_p (TREE_TYPE (op0), TREE_TYPE (op1)));
  gimple_seq stmts = NULL;
  tree type = TREE_TYPE (lhs);
  location_t loc = gimple_location (stmt);
  tree tem0 = gimple_build (&stmts, loc, BIT_AND_EXPR, type, op1, op0);

There's no test for and3 support there, but I guess that's working 
on integral modes rather than vectors.




Do you have a testcase where it improves code generation btw?


It was originally inspired by:

void f (int * __restrict__ dest, int *a, int *b)
{
  int i;
  for (i=0; i<4; i++) {
dest[i] = a[i] == b[i];
  }
}

which with Neon on Arm was producing a vsel sequence rather than a mask 
operation because the backend doesn't handle this case directly during 
expand (unlike some other targets); but it seemed wrong for every target 
to have to handle this in the back-end when it's a pretty generic 
optimization.


A more generalized case would be something like

void f (int * __restrict__ dest, int *a, int *b)
{
  int i;
  for (i=0; i<4; i++) {
dest[i] = a[i] ? b[i] : 0;
  }
}

R.



Richard.


gcc/ChangeLog:
 * gimple-isel.cc (gimple_expand_vec_cond_expr): Handle
 x CMP y ? z : 0.
---
  gcc/gimple-isel.cc | 29 +++--
  1 file changed, 23 insertions(+), 6 deletions(-)



Re: [PATCH] lto-plugin: add support for feature detection

2022-05-04 Thread Martin Liška
On 5/4/22 15:10, Alexander Monakov wrote:
> On Wed, 4 May 2022, Martin Liška wrote:
> 
>> On 5/4/22 14:32, Alexander Monakov wrote:
>>> On Wed, 4 May 2022, Martin Liška wrote:
>>>
 The patch is a follow-up of the discussion we've got in:
 https://gcc.gnu.org/pipermail/gcc-patches/2022-May/593901.html

 Mold linker would appreciate knowing in advance if get_symbols_v3 is 
 supported
 by a GCC plug-in or not.
> 
> Out of curiousity, I looked at mold, and I don't understand what problem this
> detection is solving, nor why this is the best way to solve that. Was there
> some discussion with mold author I should check out?

Sure, please take a look at this issue:
https://github.com/rui314/mold/issues/454#issuecomment-1116849458

> 
> Note that mold takes this not only as 'v3 API is supported', but, more
> importantly, as 'v2 entrypoint will not be called'.

Yes, if they register get_symbols_v3, then it will be called. That's how the
plug-in works.

Martin

> 
> Alexander



Re: [PATCH][pushed] Remove dead code.

2022-05-04 Thread Andreas Schwab
On Mai 04 2022, Martin Liška wrote:

> diff --git a/gcc/gengtype-state.cc b/gcc/gengtype-state.cc
> index ea566af3249..dfd9ea52785 100644
> --- a/gcc/gengtype-state.cc
> +++ b/gcc/gengtype-state.cc
> @@ -473,43 +473,43 @@ read_a_state_token (void)
>   {
>   case 'a':
> obstack_1grow (&bstring_obstack, '\a');
> -   c = getc (state_file);
> +   getc (state_file);
> break;
>   case 'b':
> obstack_1grow (&bstring_obstack, '\b');
> -   c = getc (state_file);
> +   getc (state_file);
> break;
>   case 't':
> obstack_1grow (&bstring_obstack, '\t');
> -   c = getc (state_file);
> +   getc (state_file);
> break;
>   case 'n':
> obstack_1grow (&bstring_obstack, '\n');
> -   c = getc (state_file);
> +   getc (state_file);
> break;
>   case 'v':
> obstack_1grow (&bstring_obstack, '\v');
> -   c = getc (state_file);
> +   getc (state_file);
> break;
>   case 'f':
> obstack_1grow (&bstring_obstack, '\f');
> -   c = getc (state_file);
> +   getc (state_file);
> break;
>   case 'r':
> obstack_1grow (&bstring_obstack, '\r');
> -   c = getc (state_file);
> +   getc (state_file);
> break;
>   case '"':
> obstack_1grow (&bstring_obstack, '\"');
> -   c = getc (state_file);
> +   getc (state_file);
> break;
>   case '\\':
> obstack_1grow (&bstring_obstack, '\\');
> -   c = getc (state_file);
> +   getc (state_file);
> break;
>   case ' ':
> obstack_1grow (&bstring_obstack, ' ');
> -   c = getc (state_file);
> +   getc (state_file);
> break;

This is surprising.  Does that mean that an escape sequence must always
be followed by a character that is thrown away?
state_writer::write_state_a_string surely doesn't suggest that.  I
suspect that this part of the code has never been exercised.

-- 
Andreas Schwab, sch...@linux-m68k.org
GPG Key fingerprint = 7578 EB47 D4E5 4D69 2510  2552 DF73 E780 A9DA AEC1
"And now for something completely different."


Re: [PATCH] tree-optimization/103116 - SLP permutes and peeling for gaps

2022-05-04 Thread Richard Biener via Gcc-patches
On Wed, 4 May 2022, Richard Sandiford wrote:

> Richard Biener  writes:
> > The testcase shows that we can end up with a contiguous access across
> > loop iterations but by means of permutations the elements accessed
> > might only cover parts of a vector.  In this case we end up with
> > GROUP_GAP == 0 but still need to avoid accessing excess elements
> > in the last loop iterations.  Peeling for gaps is designed to cover
> > this but a single scalar iteration might not cover all of the excess
> > elements.  The following ensures peeling for gaps is done in this
> > situation and when that isn't sufficient because we need to peel
> > more than one iteration (gcc.dg/vect/pr103116-2.c), fail the SLP
> > vectorization.
> >
> > Bootstrapped and tested on x86_64-unknown-linux-gnu.
> >
> > OK?
> 
> LGTM.

Thanks, pushed.

> In principle I think we could (in future) handle some of the
> !multiple_p cases for variable-length vectors, but I don't think it
> would ever trigger in practice yet, given the limited permutes we
> support in that case.

I wonder if for variable-length vectors the gap peeling can be
better avoided by using a static mask?  It would of course be
repeated til the vector length, not sure if that's always
possible for { 1, 1 ..., 0, 0, ..., } style masks of fixed
known (sub-)lengths.

Richard.


Re: [PATCH] lto-plugin: add support for feature detection

2022-05-04 Thread Alexander Monakov
On Wed, 4 May 2022, Martin Liška wrote:

> On 5/4/22 14:32, Alexander Monakov wrote:
> > On Wed, 4 May 2022, Martin Liška wrote:
> > 
> >> The patch is a follow-up of the discussion we've got in:
> >> https://gcc.gnu.org/pipermail/gcc-patches/2022-May/593901.html
> >>
> >> Mold linker would appreciate knowing in advance if get_symbols_v3 is 
> >> supported
> >> by a GCC plug-in or not.

Out of curiousity, I looked at mold, and I don't understand what problem this
detection is solving, nor why this is the best way to solve that. Was there
some discussion with mold author I should check out?

Note that mold takes this not only as 'v3 API is supported', but, more
importantly, as 'v2 entrypoint will not be called'.

Alexander


Re: [PATCH] tree-optimization/103116 - SLP permutes and peeling for gaps

2022-05-04 Thread Richard Sandiford via Gcc-patches
Richard Biener  writes:
> The testcase shows that we can end up with a contiguous access across
> loop iterations but by means of permutations the elements accessed
> might only cover parts of a vector.  In this case we end up with
> GROUP_GAP == 0 but still need to avoid accessing excess elements
> in the last loop iterations.  Peeling for gaps is designed to cover
> this but a single scalar iteration might not cover all of the excess
> elements.  The following ensures peeling for gaps is done in this
> situation and when that isn't sufficient because we need to peel
> more than one iteration (gcc.dg/vect/pr103116-2.c), fail the SLP
> vectorization.
>
> Bootstrapped and tested on x86_64-unknown-linux-gnu.
>
> OK?

LGTM.  In principle I think we could (in future) handle some of the
!multiple_p cases for variable-length vectors, but I don't think it
would ever trigger in practice yet, given the limited permutes we
support in that case.

Thanks,
Richard

>
> Thanks,
> Richard.
>
> 2022-05-04  Richard Biener  
>
>   PR tree-optimization/103116
>   * tree-vect-stmts.cc (get_group_load_store_type): Handle the
>   case we need peeling for gaps even though GROUP_GAP is zero.
>
>   * gcc.dg/vect/pr103116-1.c: New testcase.
>   * gcc.dg/vect/pr103116-2.c: Likewise.
> ---
>  gcc/testsuite/gcc.dg/vect/pr103116-1.c | 50 ++
>  gcc/testsuite/gcc.dg/vect/pr103116-2.c | 59 ++
>  gcc/tree-vect-stmts.cc | 31 ++
>  3 files changed, 140 insertions(+)
>  create mode 100644 gcc/testsuite/gcc.dg/vect/pr103116-1.c
>  create mode 100644 gcc/testsuite/gcc.dg/vect/pr103116-2.c
>
> diff --git a/gcc/testsuite/gcc.dg/vect/pr103116-1.c 
> b/gcc/testsuite/gcc.dg/vect/pr103116-1.c
> new file mode 100644
> index 000..d3639fc8cfd
> --- /dev/null
> +++ b/gcc/testsuite/gcc.dg/vect/pr103116-1.c
> @@ -0,0 +1,50 @@
> +/* { dg-require-effective-target mmap } */
> +
> +#include 
> +#include 
> +
> +#define COUNT 128
> +#define MMAP_SIZE 0x2
> +#define ADDRESS 0x112200
> +#define TYPE unsigned int
> +
> +#ifndef MAP_ANONYMOUS
> +#define MAP_ANONYMOUS MAP_ANON
> +#endif
> +
> +void __attribute__((noipa))
> +loop (TYPE *restrict x, TYPE *restrict y)
> +{
> +  for (int i = 0; i < COUNT; ++i)
> +{
> +  x[i * 4] = y[i * 2] + 1;
> +  x[i * 4 + 1] = y[i * 2] + 2;
> +  x[i * 4 + 2] = y[i * 2 + 1] + 3;
> +  x[i * 4 + 3] = y[i * 2 + 1] + 4;
> +}
> +}
> +
> +TYPE x[COUNT * 4];
> +
> +int
> +main (void)
> +{
> +  void *y;
> +  TYPE *end_y;
> +
> +  y = mmap ((void *) ADDRESS, MMAP_SIZE, PROT_READ | PROT_WRITE,
> +MAP_PRIVATE | MAP_ANONYMOUS, -1, 0);
> +  if (y == MAP_FAILED)
> +{
> +  perror ("mmap");
> +  return 1;
> +}
> +
> +  end_y = (TYPE *) ((char *) y + MMAP_SIZE);
> +
> +  loop (x, end_y - COUNT * 2);
> +
> +  return 0;
> +}
> +
> +/* { dg-final { scan-tree-dump "Data access with gaps requires scalar 
> epilogue loop" "vect" { target { vect_perm && vect_int } } } } */
> diff --git a/gcc/testsuite/gcc.dg/vect/pr103116-2.c 
> b/gcc/testsuite/gcc.dg/vect/pr103116-2.c
> new file mode 100644
> index 000..2f4ed0f404c
> --- /dev/null
> +++ b/gcc/testsuite/gcc.dg/vect/pr103116-2.c
> @@ -0,0 +1,59 @@
> +/* { dg-require-effective-target mmap } */
> +/* { dg-additional-options "-mssse3" { target x86_64-*-* i?86-*-* } } */
> +
> +#include 
> +#include 
> +#include "tree-vect.h"
> +
> +#define COUNT 128
> +#define MMAP_SIZE 0x2
> +#define ADDRESS 0x112200
> +#define TYPE unsigned short
> +#define GROUP_SIZE 2
> +
> +#ifndef MAP_ANONYMOUS
> +#define MAP_ANONYMOUS MAP_ANON
> +#endif
> +
> +void __attribute__((noipa))
> +loop (TYPE *restrict x, TYPE *restrict y)
> +{
> +  for (int i = 0; i < COUNT; ++i)
> +{
> +  x[i * 8] = y[i * GROUP_SIZE] + 1;
> +  x[i * 8 + 1] = y[i * GROUP_SIZE] + 2;
> +  x[i * 8 + 2] = y[i * GROUP_SIZE + 1] + 3;
> +  x[i * 8 + 3] = y[i * GROUP_SIZE + 1] + 4;
> +  x[i * 8 + 4] = y[i * GROUP_SIZE] + 5;
> +  x[i * 8 + 5] = y[i * GROUP_SIZE] + 6;
> +  x[i * 8 + 6] = y[i * GROUP_SIZE + 1] + 7;
> +  x[i * 8 + 7] = y[i * GROUP_SIZE + 1] + 8;
> +}
> +}
> +
> +TYPE x[COUNT * 4];
> +
> +int
> +main (void)
> +{
> +  void *y;
> +  TYPE *end_y;
> +
> +  check_vect ();
> +
> +  y = mmap ((void *) ADDRESS, MMAP_SIZE, PROT_READ | PROT_WRITE,
> +MAP_PRIVATE | MAP_ANONYMOUS, -1, 0);
> +  if (y == MAP_FAILED)
> +{
> +  perror ("mmap");
> +  return 1;
> +}
> +
> +  end_y = (TYPE *) ((char *) y + MMAP_SIZE);
> +
> +  loop (x, end_y - COUNT * GROUP_SIZE);
> +
> +  return 0;
> +}
> +
> +/* { dg-final { scan-tree-dump "peeling for gaps insufficient for access" 
> "vect" { target { vect_perm_short } } } } */
> diff --git a/gcc/tree-vect-stmts.cc b/gcc/tree-vect-stmts.cc
> index c9534ef9b1e..d8da13e312a 100644
> --- a/gcc/tree-vect-stmts.cc
> +++ b/gcc/tree-vect-stmts.cc
> @@ -2293,6 +2293,37 @@ get_group_load_store_type (vec_info 

[PATCH][pushed] Remove dead code.

2022-05-04 Thread Martin Liška
gcc/ChangeLog:

* gengtype-state.cc (read_a_state_token): Remove dead code.
* ipa-profile.cc (ipa_profile_read_summary_section): Likewise.
---
 gcc/gengtype-state.cc | 22 +++---
 gcc/ipa-profile.cc|  1 -
 2 files changed, 11 insertions(+), 12 deletions(-)

diff --git a/gcc/gengtype-state.cc b/gcc/gengtype-state.cc
index ea566af3249..dfd9ea52785 100644
--- a/gcc/gengtype-state.cc
+++ b/gcc/gengtype-state.cc
@@ -473,43 +473,43 @@ read_a_state_token (void)
{
case 'a':
  obstack_1grow (&bstring_obstack, '\a');
- c = getc (state_file);
+ getc (state_file);
  break;
case 'b':
  obstack_1grow (&bstring_obstack, '\b');
- c = getc (state_file);
+ getc (state_file);
  break;
case 't':
  obstack_1grow (&bstring_obstack, '\t');
- c = getc (state_file);
+ getc (state_file);
  break;
case 'n':
  obstack_1grow (&bstring_obstack, '\n');
- c = getc (state_file);
+ getc (state_file);
  break;
case 'v':
  obstack_1grow (&bstring_obstack, '\v');
- c = getc (state_file);
+ getc (state_file);
  break;
case 'f':
  obstack_1grow (&bstring_obstack, '\f');
- c = getc (state_file);
+ getc (state_file);
  break;
case 'r':
  obstack_1grow (&bstring_obstack, '\r');
- c = getc (state_file);
+ getc (state_file);
  break;
case '"':
  obstack_1grow (&bstring_obstack, '\"');
- c = getc (state_file);
+ getc (state_file);
  break;
case '\\':
  obstack_1grow (&bstring_obstack, '\\');
- c = getc (state_file);
+ getc (state_file);
  break;
case ' ':
  obstack_1grow (&bstring_obstack, ' ');
- c = getc (state_file);
+ getc (state_file);
  break;
case 'x':
  {
@@ -520,7 +520,7 @@ read_a_state_token (void)
  fatal_reading_state
(NULL_STATE_TOKEN,
 "Lexical error in string hex escape");
-   c = getc (state_file);
+   getc (state_file);
break;
  }
default:
diff --git a/gcc/ipa-profile.cc b/gcc/ipa-profile.cc
index ffdcb4476e8..b74d77f2c95 100644
--- a/gcc/ipa-profile.cc
+++ b/gcc/ipa-profile.cc
@@ -481,7 +481,6 @@ ipa_profile_read_summary_section (struct lto_file_decl_data 
*file_data,
   for (i = 0; i < count; i++)
 {
   index = streamer_read_uhwi (ib);
-  encoder = file_data->symtab_node_encoder;
   node
= dyn_cast (lto_symtab_encoder_deref (encoder, index));
 
-- 
2.36.0



Re: [PATCH] lto-plugin: add support for feature detection

2022-05-04 Thread Martin Liška
On 5/4/22 14:32, Alexander Monakov wrote:
> On Wed, 4 May 2022, Martin Liška wrote:
> 
>> The patch is a follow-up of the discussion we've got in:
>> https://gcc.gnu.org/pipermail/gcc-patches/2022-May/593901.html
>>
>> Mold linker would appreciate knowing in advance if get_symbols_v3 is 
>> supported
>> by a GCC plug-in or not.
>>
>> Ready to be installed?
> 
> Quick note: if the linker is supposed to check for presence of this symbol
> via dlsym(), I expect it won't work as-is since lto-plugin.map hides every
> symbol except 'onload'.

Oh, good point!

Changing that, I get now:

$ nm ./lto-plugin/.libs/liblto_plugin.so | grep v3
00015008 D supports_get_symbols_v3

> 
> (also it might be nicer to reword the comment to say 'Presence of the 
> following
> symbols is used for ...', because you're leaving the value as 'false').

Sure. I'm also changing it's default value.

Sending v2.

Cheers,
Martin

> 
> Alexander
> 
>> Thanks,
>> Martin
>>
>> lto-plugin/ChangeLog:
>>
>>  * lto-plugin.c (supports_get_symbols_v3): Add symbol.
>> ---
>>  lto-plugin/lto-plugin.c | 5 +
>>  1 file changed, 5 insertions(+)
>>
>> diff --git a/lto-plugin/lto-plugin.c b/lto-plugin/lto-plugin.c
>> index 47378435612..049f3841d5b 100644
>> --- a/lto-plugin/lto-plugin.c
>> +++ b/lto-plugin/lto-plugin.c
>> @@ -1554,3 +1554,8 @@ onload (struct ld_plugin_tv *tv)
>>  
>>return LDPS_OK;
>>  }
>> +
>> +/* The following symbols are used for dynamic detection of plug-in features
>> +   from linker side.  */
>> +
>> +bool supports_get_symbols_v3;
From 7b5acd79a749e50bc825a49b283d4f9e82a5e1b5 Mon Sep 17 00:00:00 2001
From: Martin Liska 
Date: Tue, 1 Mar 2022 12:27:45 +0100
Subject: [PATCH] lto-plugin: add support for feature detection

lto-plugin/ChangeLog:

	* lto-plugin.c (supports_get_symbols_v3): Add symbol and
	initialize it to true;

lto-plugin/ChangeLog:

	* lto-plugin.map: Add supports_get_symbols_v3 symbol.
---
 lto-plugin/lto-plugin.c   | 5 +
 lto-plugin/lto-plugin.map | 5 -
 2 files changed, 9 insertions(+), 1 deletion(-)

diff --git a/lto-plugin/lto-plugin.c b/lto-plugin/lto-plugin.c
index 47378435612..6aac5867066 100644
--- a/lto-plugin/lto-plugin.c
+++ b/lto-plugin/lto-plugin.c
@@ -1554,3 +1554,8 @@ onload (struct ld_plugin_tv *tv)
 
   return LDPS_OK;
 }
+
+/* Presence of the following symbols is used for dynamic detection of plug-in features
+   from linker side.  */
+
+bool supports_get_symbols_v3 = true;
diff --git a/lto-plugin/lto-plugin.map b/lto-plugin/lto-plugin.map
index 3d60e71cc2d..42dc50a819c 100644
--- a/lto-plugin/lto-plugin.map
+++ b/lto-plugin/lto-plugin.map
@@ -1,3 +1,6 @@
 {
-  global: onload; local: *;
+  global: onload;
+  supports_get_symbols_v3;
+
+  local: *;
 };
-- 
2.36.0



Re: [PATCH, stage 1] Fortran: Add support for OMP non-rectangular loops

2022-05-04 Thread Jakub Jelinek via Gcc-patches
On Fri, Mar 25, 2022 at 08:03:38PM -0600, Sandra Loosemore wrote:
> This patch adds support for OMP 5.1 "canonical loop nest form" to the
> Fortran front end, marks non-rectangular loops for processing
> by the middle end, and implements missing checks in the gimplifier
> for additional prohibitions on non-rectangular loops.
> 
> Note that the OMP spec also prohibits non-rectangular loops with the TILE
> construct; that construct hasn't been implemented yet, so that error will
> need to be filled in later.
> 
>   gcc/fortran/
>   * gfortran.h (struct gfc_omp_clauses): Add non_rectangular bit.
>   * openmp.cc (is_outer_iteration_variable): New function.
>   (expr_is_invariant): New function.
>   (bound_expr_is_canonical): New function.
>   (resolve_omp_do): Replace existing non-rectangularity error with
>   check for canonical form and setting non_rectangular bit.
>   * trans-openmp.cc (gfc_trans_omp_do): Transfer non_rectangular
>   flag to generated tree structure.
> 
>   gcc/
>   * gimplify.cc (gimplify_omp_for): Update messages for SCHEDULED
>   and ORDERED clause conflict errors.  Add check for GRAINSIZE and
>   NUM_TASKS on TASKLOOP.
> 
>   gcc/testsuite/
>   * c-c++-common/gomp/loop-6.c (f3): New function to test TASKLOOP
>   diagnostics.
>   * gfortran.dg/gomp/collapse1.f90: Update expected messages.
>   * gfortran.dg/gomp/pr85313.f90: Remove dg-error on non-rectangular
>   loops that are now accepted.
>   * gfortran.dg/gomp/non-rectangular-loop.f90: New file.
>   * gfortran.dg/gomp/canonical-loop-1.f90: New file.
>   * gfortran.dg/gomp/canonical-loop-2.f90: New file.
> 
> --- a/gcc/gimplify.cc
> +++ b/gcc/gimplify.cc
> @@ -12468,11 +12468,11 @@ gimplify_omp_for (tree *expr_p, gimple_seq *pre_p)
>  OMP_CLAUSE_SCHEDULE))
>   error_at (EXPR_LOCATION (for_stmt),
> "%qs clause may not appear on non-rectangular %qs",
> -   "schedule", "for");
> +   "schedule", (lang_GNU_Fortran () ? "do" : "for"));
> if (omp_find_clause (OMP_FOR_CLAUSES (for_stmt), OMP_CLAUSE_ORDERED))
>   error_at (EXPR_LOCATION (for_stmt),
> "%qs clause may not appear on non-rectangular %qs",
> -   "ordered", "for");
> +   "ordered", (lang_GNU_Fortran () ? "do" : "for"));

Please drop the superfluous ()s around the argument, just use
  "...", lang_GNU_Fortran () ? "do" : "for");

Ok for trunk with that nit fixed.  And thanks for discovering the missing
grainsize/num_tasks checks.

Jakub



Re: [PATCH] lto-plugin: add support for feature detection

2022-05-04 Thread Alexander Monakov
On Wed, 4 May 2022, Martin Liška wrote:

> The patch is a follow-up of the discussion we've got in:
> https://gcc.gnu.org/pipermail/gcc-patches/2022-May/593901.html
> 
> Mold linker would appreciate knowing in advance if get_symbols_v3 is supported
> by a GCC plug-in or not.
> 
> Ready to be installed?

Quick note: if the linker is supposed to check for presence of this symbol
via dlsym(), I expect it won't work as-is since lto-plugin.map hides every
symbol except 'onload'.

(also it might be nicer to reword the comment to say 'Presence of the following
symbols is used for ...', because you're leaving the value as 'false').

Alexander

> Thanks,
> Martin
> 
> lto-plugin/ChangeLog:
> 
>   * lto-plugin.c (supports_get_symbols_v3): Add symbol.
> ---
>  lto-plugin/lto-plugin.c | 5 +
>  1 file changed, 5 insertions(+)
> 
> diff --git a/lto-plugin/lto-plugin.c b/lto-plugin/lto-plugin.c
> index 47378435612..049f3841d5b 100644
> --- a/lto-plugin/lto-plugin.c
> +++ b/lto-plugin/lto-plugin.c
> @@ -1554,3 +1554,8 @@ onload (struct ld_plugin_tv *tv)
>  
>return LDPS_OK;
>  }
> +
> +/* The following symbols are used for dynamic detection of plug-in features
> +   from linker side.  */
> +
> +bool supports_get_symbols_v3;
> 


[PATCH] lto-plugin: add support for feature detection

2022-05-04 Thread Martin Liška
The patch is a follow-up of the discussion we've got in:
https://gcc.gnu.org/pipermail/gcc-patches/2022-May/593901.html

Mold linker would appreciate knowing in advance if get_symbols_v3 is supported
by a GCC plug-in or not.

Ready to be installed?
Thanks,
Martin

lto-plugin/ChangeLog:

* lto-plugin.c (supports_get_symbols_v3): Add symbol.
---
 lto-plugin/lto-plugin.c | 5 +
 1 file changed, 5 insertions(+)

diff --git a/lto-plugin/lto-plugin.c b/lto-plugin/lto-plugin.c
index 47378435612..049f3841d5b 100644
--- a/lto-plugin/lto-plugin.c
+++ b/lto-plugin/lto-plugin.c
@@ -1554,3 +1554,8 @@ onload (struct ld_plugin_tv *tv)
 
   return LDPS_OK;
 }
+
+/* The following symbols are used for dynamic detection of plug-in features
+   from linker side.  */
+
+bool supports_get_symbols_v3;
-- 
2.36.0



Re: [Patch] OpenMP: Fix use_device_{addr,ptr} with in-data-sharing arg

2022-05-04 Thread Jakub Jelinek via Gcc-patches
On Wed, Apr 20, 2022 at 03:19:38PM +0200, Tobias Burnus wrote:
> For array-descriptor vars, the descriptor is assigned to a temporary. However,
> this failed when the clause's argument was in turn in a data-sharing clause
> as the outer context's VALUE_EXPR wasn't used.
> 
> gcc/ChangeLog:
> 
>   * omp-low.cc (lower_omp_target): Fix use_device_{addr,ptr} with list
>   item that is in an outer data-sharing clause.
> 
> libgomp/ChangeLog:
> 
>   * testsuite/libgomp.fortran/use_device_addr-5.f90: New test.
> 
>  gcc/omp-low.cc |  22 ++--
>  .../libgomp.fortran/use_device_addr-5.f90  | 143 
> +
>  2 files changed, 156 insertions(+), 9 deletions(-)
> 
> diff --git a/gcc/omp-low.cc b/gcc/omp-low.cc
> index bf5779b6543..6e387fd9a61 100644
> --- a/gcc/omp-low.cc
> +++ b/gcc/omp-low.cc
> @@ -13656,26 +13656,30 @@ lower_omp_target (gimple_stmt_iterator *gsi_p, 
> omp_context *ctx)
>   new_var = lookup_decl (var, ctx);
>   new_var = DECL_VALUE_EXPR (new_var);
>   tree v = new_var;
> + tree v2 = var;
> + if (OMP_CLAUSE_CODE (c) == OMP_CLAUSE_USE_DEVICE_PTR
> + || OMP_CLAUSE_CODE (c) == OMP_CLAUSE_USE_DEVICE_ADDR)
> +   {
> + v2 = maybe_lookup_decl_in_outer_ctx (var, ctx);
> + if (DECL_HAS_VALUE_EXPR_P (v2))
> +   v2 = DECL_VALUE_EXPR (v2);

I don't understand the above 2 lines, why do you need that?
Regardless whether v2 has DECL_VALUE_EXPR or not, the type of the
DECL_VALUE_EXPR (v2) and v2 should be the same, build_fold_indirect_ref
should work on both and then v2 is only used as second operand of
gimplify_assign, where the gimplifier makes sure to handle DECL_VALUE_EXPR
correctly.  I certainly don't see any difference in the *.omplower dump
if I comment out the above 2 lines.

Otherwise LGTM, so if the 2 lines aren't needed, please also drop the
{}s around v2 = maybe_lookup_decl_in_outer_ctx (var, ctx); and reindent.

> +   }
>  
>   if (is_ref)
> {
> - var = build_fold_indirect_ref (var);
> - gimplify_expr (&var, &assign_body, NULL, is_gimple_val,
> -fb_rvalue);
> - v = create_tmp_var_raw (TREE_TYPE (var), get_name (var));
> + v2 = build_fold_indirect_ref (v2);
> + v = create_tmp_var_raw (TREE_TYPE (v2), get_name (var));
>   gimple_add_tmp_var (v);
>   TREE_ADDRESSABLE (v) = 1;
> - gimple_seq_add_stmt (&assign_body,
> -  gimple_build_assign (v, var));
> + gimplify_assign (v, v2, &assign_body);
>   tree rhs = build_fold_addr_expr (v);
>   gimple_seq_add_stmt (&assign_body,
>gimple_build_assign (new_var, rhs));
> }
>   else
> -   gimple_seq_add_stmt (&assign_body,
> -gimple_build_assign (new_var, var));
> +   gimplify_assign (new_var, v2, &assign_body);
>  
> - tree v2 = lang_hooks.decls.omp_array_data (unshare_expr (v), 
> false);
> + v2 = lang_hooks.decls.omp_array_data (unshare_expr (v), false);
>   gcc_assert (v2);
>   gimplify_expr (&x, &assign_body, NULL, is_gimple_val, 
> fb_rvalue);
>   gimple_seq_add_stmt (&assign_body,

Jakub



[PATCH] Fold more vector constants early

2022-05-04 Thread Richard Biener via Gcc-patches
In PR105049 we had

  return VIEW_CONVERT_EXPR( VEC_PERM_EXPR < {<<< Unknown tree: 
compound_literal_expr
V D.1984 = { 0 }; >>>, { 0 }} , {<<< Unknown tree: compound_literal_expr
V D.1985 = { 0 }; >>>, { 0 }} , { 0, 0 } >  & {(short int) SAVE_EXPR 
, (short int) SAVE_EXPR });

where we gimplify the init CTORs to

  _1 = {{ 0 }, { 0 }};
  _2 = {{ 0 }, { 0 }};

instead of to vector constants.  The following makes sure to simplify the
CTORs to VECTOR_CSTs during gimplification by re-ordering the simplification
to after CTOR flag recomputation and gimplification of the elements.

Bootstrapped and tested on x86_64-unknown-linux-gnu, pushed.

2022-03-25  Richard Biener  

* gimplify.cc (gimplify_init_constructor): First gimplify,
then simplify the result to a VECTOR_CST.
---
 gcc/gimplify.cc | 33 -
 1 file changed, 16 insertions(+), 17 deletions(-)

diff --git a/gcc/gimplify.cc b/gcc/gimplify.cc
index 2588824dce2..f052d9f970c 100644
--- a/gcc/gimplify.cc
+++ b/gcc/gimplify.cc
@@ -5432,6 +5432,22 @@ gimplify_init_constructor (tree *expr_p, gimple_seq 
*pre_p, gimple_seq *post_p,
if (notify_temp_creation)
  return GS_OK;
 
+   /* Vector types use CONSTRUCTOR all the way through gimple
+  compilation as a general initializer.  */
+   FOR_EACH_VEC_SAFE_ELT (elts, ix, ce)
+ {
+   enum gimplify_status tret;
+   tret = gimplify_expr (&ce->value, pre_p, post_p, is_gimple_val,
+ fb_rvalue);
+   if (tret == GS_ERROR)
+ ret = GS_ERROR;
+   else if (TREE_STATIC (ctor)
+&& !initializer_constant_valid_p (ce->value,
+  TREE_TYPE (ce->value)))
+ TREE_STATIC (ctor) = 0;
+ }
+   recompute_constructor_flags (ctor);
+
/* Go ahead and simplify constant constructors to VECTOR_CST.  */
if (TREE_CONSTANT (ctor))
  {
@@ -5454,25 +5470,8 @@ gimplify_init_constructor (tree *expr_p, gimple_seq 
*pre_p, gimple_seq *post_p,
TREE_OPERAND (*expr_p, 1) = build_vector_from_ctor (type, elts);
break;
  }
-
-   TREE_CONSTANT (ctor) = 0;
  }
 
-   /* Vector types use CONSTRUCTOR all the way through gimple
-  compilation as a general initializer.  */
-   FOR_EACH_VEC_SAFE_ELT (elts, ix, ce)
- {
-   enum gimplify_status tret;
-   tret = gimplify_expr (&ce->value, pre_p, post_p, is_gimple_val,
- fb_rvalue);
-   if (tret == GS_ERROR)
- ret = GS_ERROR;
-   else if (TREE_STATIC (ctor)
-&& !initializer_constant_valid_p (ce->value,
-  TREE_TYPE (ce->value)))
- TREE_STATIC (ctor) = 0;
- }
-   recompute_constructor_flags (ctor);
if (!is_gimple_reg (TREE_OPERAND (*expr_p, 0)))
  TREE_OPERAND (*expr_p, 1) = get_formal_tmp_var (ctor, pre_p);
   }
-- 
2.35.3


Re: [PATCH] gimple-isel: handle x CMP y ? z : 0

2022-05-04 Thread Richard Biener via Gcc-patches
On Wed, May 4, 2022 at 12:16 PM Richard Earnshaw via Gcc-patches
 wrote:
>
>
> Gimple isel already handles x CMP y ? -1 : 0 when lowering vector cond
> operations, but this can be generalized further when the comparison
> forms a natural mask so that we can also handle x CMP y ? z : 0 by
> transforming it into (x CMP y) & z.  This will, in most cases save
> having to load a register with the zero vector.

Hmm, I'm not sure why exactly the existing case is in ISEL but if
we want to extend it there we migh also want to handle ? 0 : -1
as BIT_NOT.

For the case in question it looks like it might better fit as optimization
in match.pd?  It also lacks a query for present and3 support.
For match.pd it might be nice to handle x CMP y ? 0 : z as well
if we can invert x CMP y (and it has only a single use).  When in
ISEL we might as well check for andn3 support and go with
BIT_NOT + BIT_AND ...

Do you have a testcase where it improves code generation btw?

Richard.

> gcc/ChangeLog:
> * gimple-isel.cc (gimple_expand_vec_cond_expr): Handle
> x CMP y ? z : 0.
> ---
>  gcc/gimple-isel.cc | 29 +++--
>  1 file changed, 23 insertions(+), 6 deletions(-)
>


[PATCH] gimple-isel: handle x CMP y ? z : 0

2022-05-04 Thread Richard Earnshaw via Gcc-patches

Gimple isel already handles x CMP y ? -1 : 0 when lowering vector cond
operations, but this can be generalized further when the comparison
forms a natural mask so that we can also handle x CMP y ? z : 0 by
transforming it into (x CMP y) & z.  This will, in most cases save
having to load a register with the zero vector.

gcc/ChangeLog:
* gimple-isel.cc (gimple_expand_vec_cond_expr): Handle
x CMP y ? z : 0.
---
 gcc/gimple-isel.cc | 29 +++--
 1 file changed, 23 insertions(+), 6 deletions(-)

diff --git a/gcc/gimple-isel.cc b/gcc/gimple-isel.cc
index a8f7a0d25d0..5bf4a4eccc1 100644
--- a/gcc/gimple-isel.cc
+++ b/gcc/gimple-isel.cc
@@ -190,16 +190,33 @@ gimple_expand_vec_cond_expr (struct function *fun, gimple_stmt_iterator *gsi,
 	can_compute_op0 = expand_vec_cmp_expr_p (op0a_type, op0_type,
 		 tcode);
 
-	  /* Try to fold x CMP y ? -1 : 0 to x CMP y.  */
+	  /* Try to fold x CMP y ? z : 0.  */
 	  if (can_compute_op0
-	  && integer_minus_onep (op1)
 	  && integer_zerop (op2)
 	  && TYPE_MODE (TREE_TYPE (lhs)) == TYPE_MODE (TREE_TYPE (op0)))
 	{
-	  tree conv_op = build1 (VIEW_CONVERT_EXPR, TREE_TYPE (lhs), op0);
-	  gassign *new_stmt = gimple_build_assign (lhs, conv_op);
-	  gsi_replace (gsi, new_stmt, true);
-	  return new_stmt;
+	  /* If Z is -1, then the result is just the comparison.  */
+	  if (integer_minus_onep (op1))
+		{
+		  tree conv_op = build1 (VIEW_CONVERT_EXPR, TREE_TYPE (lhs),
+	 op0);
+		  gassign *new_stmt = gimple_build_assign (lhs, conv_op);
+		  gsi_replace (gsi, new_stmt, true);
+		  return new_stmt;
+		}
+	  /* Otherwise, use the comparison as a mask for Z.  */
+	  else
+		{
+		  gimple_seq stmts = NULL;
+		  tree type = TREE_TYPE (lhs);
+		  location_t loc = gimple_location (stmt);
+		  tree tem0 = gimple_build (&stmts, loc, VIEW_CONVERT_EXPR,
+	type, op0);
+		  tree tem1 = gimple_build (&stmts, loc, BIT_AND_EXPR, type,
+	tem0, op1);
+		  gsi_insert_seq_before (gsi, stmts, GSI_SAME_STMT);
+		  return gimple_build_assign (lhs, tem1);
+		}
 	}
 
 	  /* When the compare has EH we do not want to forward it when


[PATCH] tree-optimization/103116 - SLP permutes and peeling for gaps

2022-05-04 Thread Richard Biener via Gcc-patches
The testcase shows that we can end up with a contiguous access across
loop iterations but by means of permutations the elements accessed
might only cover parts of a vector.  In this case we end up with
GROUP_GAP == 0 but still need to avoid accessing excess elements
in the last loop iterations.  Peeling for gaps is designed to cover
this but a single scalar iteration might not cover all of the excess
elements.  The following ensures peeling for gaps is done in this
situation and when that isn't sufficient because we need to peel
more than one iteration (gcc.dg/vect/pr103116-2.c), fail the SLP
vectorization.

Bootstrapped and tested on x86_64-unknown-linux-gnu.

OK?

Thanks,
Richard.

2022-05-04  Richard Biener  

PR tree-optimization/103116
* tree-vect-stmts.cc (get_group_load_store_type): Handle the
case we need peeling for gaps even though GROUP_GAP is zero.

* gcc.dg/vect/pr103116-1.c: New testcase.
* gcc.dg/vect/pr103116-2.c: Likewise.
---
 gcc/testsuite/gcc.dg/vect/pr103116-1.c | 50 ++
 gcc/testsuite/gcc.dg/vect/pr103116-2.c | 59 ++
 gcc/tree-vect-stmts.cc | 31 ++
 3 files changed, 140 insertions(+)
 create mode 100644 gcc/testsuite/gcc.dg/vect/pr103116-1.c
 create mode 100644 gcc/testsuite/gcc.dg/vect/pr103116-2.c

diff --git a/gcc/testsuite/gcc.dg/vect/pr103116-1.c 
b/gcc/testsuite/gcc.dg/vect/pr103116-1.c
new file mode 100644
index 000..d3639fc8cfd
--- /dev/null
+++ b/gcc/testsuite/gcc.dg/vect/pr103116-1.c
@@ -0,0 +1,50 @@
+/* { dg-require-effective-target mmap } */
+
+#include 
+#include 
+
+#define COUNT 128
+#define MMAP_SIZE 0x2
+#define ADDRESS 0x112200
+#define TYPE unsigned int
+
+#ifndef MAP_ANONYMOUS
+#define MAP_ANONYMOUS MAP_ANON
+#endif
+
+void __attribute__((noipa))
+loop (TYPE *restrict x, TYPE *restrict y)
+{
+  for (int i = 0; i < COUNT; ++i)
+{
+  x[i * 4] = y[i * 2] + 1;
+  x[i * 4 + 1] = y[i * 2] + 2;
+  x[i * 4 + 2] = y[i * 2 + 1] + 3;
+  x[i * 4 + 3] = y[i * 2 + 1] + 4;
+}
+}
+
+TYPE x[COUNT * 4];
+
+int
+main (void)
+{
+  void *y;
+  TYPE *end_y;
+
+  y = mmap ((void *) ADDRESS, MMAP_SIZE, PROT_READ | PROT_WRITE,
+MAP_PRIVATE | MAP_ANONYMOUS, -1, 0);
+  if (y == MAP_FAILED)
+{
+  perror ("mmap");
+  return 1;
+}
+
+  end_y = (TYPE *) ((char *) y + MMAP_SIZE);
+
+  loop (x, end_y - COUNT * 2);
+
+  return 0;
+}
+
+/* { dg-final { scan-tree-dump "Data access with gaps requires scalar epilogue 
loop" "vect" { target { vect_perm && vect_int } } } } */
diff --git a/gcc/testsuite/gcc.dg/vect/pr103116-2.c 
b/gcc/testsuite/gcc.dg/vect/pr103116-2.c
new file mode 100644
index 000..2f4ed0f404c
--- /dev/null
+++ b/gcc/testsuite/gcc.dg/vect/pr103116-2.c
@@ -0,0 +1,59 @@
+/* { dg-require-effective-target mmap } */
+/* { dg-additional-options "-mssse3" { target x86_64-*-* i?86-*-* } } */
+
+#include 
+#include 
+#include "tree-vect.h"
+
+#define COUNT 128
+#define MMAP_SIZE 0x2
+#define ADDRESS 0x112200
+#define TYPE unsigned short
+#define GROUP_SIZE 2
+
+#ifndef MAP_ANONYMOUS
+#define MAP_ANONYMOUS MAP_ANON
+#endif
+
+void __attribute__((noipa))
+loop (TYPE *restrict x, TYPE *restrict y)
+{
+  for (int i = 0; i < COUNT; ++i)
+{
+  x[i * 8] = y[i * GROUP_SIZE] + 1;
+  x[i * 8 + 1] = y[i * GROUP_SIZE] + 2;
+  x[i * 8 + 2] = y[i * GROUP_SIZE + 1] + 3;
+  x[i * 8 + 3] = y[i * GROUP_SIZE + 1] + 4;
+  x[i * 8 + 4] = y[i * GROUP_SIZE] + 5;
+  x[i * 8 + 5] = y[i * GROUP_SIZE] + 6;
+  x[i * 8 + 6] = y[i * GROUP_SIZE + 1] + 7;
+  x[i * 8 + 7] = y[i * GROUP_SIZE + 1] + 8;
+}
+}
+
+TYPE x[COUNT * 4];
+
+int
+main (void)
+{
+  void *y;
+  TYPE *end_y;
+
+  check_vect ();
+
+  y = mmap ((void *) ADDRESS, MMAP_SIZE, PROT_READ | PROT_WRITE,
+MAP_PRIVATE | MAP_ANONYMOUS, -1, 0);
+  if (y == MAP_FAILED)
+{
+  perror ("mmap");
+  return 1;
+}
+
+  end_y = (TYPE *) ((char *) y + MMAP_SIZE);
+
+  loop (x, end_y - COUNT * GROUP_SIZE);
+
+  return 0;
+}
+
+/* { dg-final { scan-tree-dump "peeling for gaps insufficient for access" 
"vect" { target { vect_perm_short } } } } */
diff --git a/gcc/tree-vect-stmts.cc b/gcc/tree-vect-stmts.cc
index c9534ef9b1e..d8da13e312a 100644
--- a/gcc/tree-vect-stmts.cc
+++ b/gcc/tree-vect-stmts.cc
@@ -2293,6 +2293,37 @@ get_group_load_store_type (vec_info *vinfo, 
stmt_vec_info stmt_info,
  gcc_assert (!loop_vinfo || cmp > 0);
  *memory_access_type = VMAT_CONTIGUOUS;
}
+
+ /* When we have a contiguous access across loop iterations
+but the access in the loop doesn't cover the full vector
+we can end up with no gap recorded but still excess
+elements accessed, see PR103116.  Make sure we peel for
+gaps if necessary and sufficient and give up if not.  */
+ if (loop_vinfo
+ && *memory_access_type == VMAT_CONTIGUOUS

Re: [PATCH] genconditions: Add support for targets without non-trivial insn conditions

2022-05-04 Thread Jakub Jelinek via Gcc-patches
On Wed, May 04, 2022 at 10:57:57AM +0200, Richard Biener wrote:
> On Wed, 4 May 2022, Jakub Jelinek wrote:
> 
> > Hi!
> > 
> > Somebody complained on IRC that when writing a new backend one can get
> > an error while compiling build/gencondmd.cc.
> > The problem is that when host compiler is g++ 3 or later (or when
> > bootstrapping), we compile it with g++ -std=c++11 -pedantic and
> > the generated insn_conditions array contains pairs
> > { "cond", __builtin_constant_p (cond) ? (int) (cond) : -1 },
> > where cond is some non-trivial instruction condition.  Now if a target
> > uses "" for all the conditions (admittedly unlikely for non-trivial
> > target), the initializer for insn_conditions[] is {} and that is
> > pedantically rejected because C++ doesn't support zero-sized arrays.
> > 
> > The following patch fixes that by adding an artificial termination
> > element and skips that during the walk.
> > 
> > Bootstrapped/regtested on x86_64-linux and i686-linux, no change in
> > generated insn-conditions.md with it.  Ok for trunk?
> 
> OK.
> 
> I wonder what the generated GCC_VERSION conditions are about ...

To determine if an insn condition is compile time constant or not.
If it is and e.g. it is always false, it can just omit the insn altogether,
etc.
See https://gcc.gnu.org/legacy-ml/gcc-patches/2006-01/msg00394.html
for more details.

Jakub



Re: [PATCH] genconditions: Add support for targets without non-trivial insn conditions

2022-05-04 Thread Richard Biener via Gcc-patches
On Wed, 4 May 2022, Jakub Jelinek wrote:

> Hi!
> 
> Somebody complained on IRC that when writing a new backend one can get
> an error while compiling build/gencondmd.cc.
> The problem is that when host compiler is g++ 3 or later (or when
> bootstrapping), we compile it with g++ -std=c++11 -pedantic and
> the generated insn_conditions array contains pairs
> { "cond", __builtin_constant_p (cond) ? (int) (cond) : -1 },
> where cond is some non-trivial instruction condition.  Now if a target
> uses "" for all the conditions (admittedly unlikely for non-trivial
> target), the initializer for insn_conditions[] is {} and that is
> pedantically rejected because C++ doesn't support zero-sized arrays.
> 
> The following patch fixes that by adding an artificial termination
> element and skips that during the walk.
> 
> Bootstrapped/regtested on x86_64-linux and i686-linux, no change in
> generated insn-conditions.md with it.  Ok for trunk?

OK.

I wonder what the generated GCC_VERSION conditions are about ...

Richard.

> 2022-05-04  Jakub Jelinek  
> 
>   * genconditions.cc (write_conditions): Append a { nullptr, -1 }
>   element at the end of insn_conditions.
>   (write_writer): Use ARRAY_SIZE (insn_conditions) - 1 instead of
>   ARRAY_SIZE (insn_conditions).
> 
> --- gcc/genconditions.cc.jj   2022-01-18 11:58:59.586981999 +0100
> +++ gcc/genconditions.cc  2022-05-03 20:01:59.428065439 +0200
> @@ -175,7 +175,7 @@ static const struct c_test insn_conditio
>  
>traverse_c_tests (write_one_condition, 0);
>  
> -  puts ("\n};\n#endif /* gcc >= 3.0.1 */\n");
> +  puts ("  { nullptr, -1 }\n};\n#endif /* gcc >= 3.0.1 */\n");
>  }
>  
>  /* Emit code which will convert the C-format table to a
> @@ -192,7 +192,7 @@ write_writer (void)
>  "  const char *p;\n"
>  "  puts (\"(define_conditions [\");\n"
>   "#if GCC_VERSION >= 3001\n"
> - "  for (i = 0; i < ARRAY_SIZE (insn_conditions); i++)\n"
> + "  for (i = 0; i < ARRAY_SIZE (insn_conditions) - 1; i++)\n"
>   "{\n"
>   "  printf (\"  (%d \\\"\", insn_conditions[i].value);\n"
>   "  for (p = insn_conditions[i].expr; *p; p++)\n"
> 
>   Jakub
> 
> 

-- 
Richard Biener 
SUSE Software Solutions Germany GmbH, Maxfeldstrasse 5, 90409 Nuernberg,
Germany; GF: Ivo Totev; HRB 36809 (AG Nuernberg)


Re: [committed][libgomp, testsuite, nvptx] Fix dg-output test in vector-length-128-7.c

2022-05-04 Thread Thomas Schwinge
Hi Tom!

On 2022-04-01T13:23:06+0200, Tom de Vries  wrote:
> When running test-case libgomp.oacc-c-c++-common/vector-length-128-7.c on an
> RTX A2000 (sm_86) with driver 510.60.02 I run into:
> ...
> FAIL: libgomp.oacc-c/../libgomp.oacc-c-c++-common/vector-length-128-7.c \
>   -DACC_DEVICE_TYPE_nvidia=1 -DACC_MEM_SHARED=0 -foffload=nvptx-none  -O0  \
>   output pattern test
> ...
>
> The failing check verifies the launch dimensions:
> ...
> /* { dg-output "nvptx_exec: kernel main\\\$_omp_fn\\\$0: \
> launch gangs=1, workers=8, vectors=128" } */
> ...
> which fails because (as we can see with GOMP_DEBUG=1) the actual num_workers
> is 6:
> ...
>   nvptx_exec: kernel main$_omp_fn$0: launch gangs=1, workers=6, vectors=128
> ...
>
> This is due to the result of cuOccupancyMaxPotentialBlockSize (which suggests
> 'a launch configuration with reasonable occupancy') printed just before:
> ...
> cuOccupancyMaxPotentialBlockSize: grid = 52, block = 768
> ...
> [ Note: 6 * 128 == 768. ]

I had a while ago observed, and now finally looked into a similar case
with Nvidia TITAN V, Driver 455.23.05, GCC/nvptx default multilib.
Looking at 'GOMP_DEBUG=1' output:

'-O2'; all good:

[...]
Link log info: 0 bytes gmem
info: Function properties for 'main$_omp_fn$0':
info: used 32 registers, 0 stack, 288 bytes smem, 360 bytes cmem[0], 0 
bytes lmem
  GOMP_OFFLOAD_openacc_exec: prepare mappings
 warp_size=32, block_size=1024, dev_size=80, cpu_size=2048
 default dimensions [160,32,32]
cuOccupancyMaxPotentialBlockSize: grid = 160, block = 1024
  nvptx_exec: kernel main$_omp_fn$0: launch gangs=1, workers=8, vectors=128
  nvptx_exec: kernel main$_omp_fn$0: finished

... vs. '-O0'; similar to your report:

[...]
Link log info: 0 bytes gmem
info: Function properties for 'main$_omp_fn$0':
info: used 33 registers, 32 stack, 432 bytes smem, 360 bytes cmem[0], 0 
bytes lmem
  GOMP_OFFLOAD_openacc_exec: prepare mappings
 warp_size=32, block_size=1024, dev_size=80, cpu_size=2048
 default dimensions [160,32,32]
cuOccupancyMaxPotentialBlockSize: grid = 160, block = 768
  nvptx_exec: kernel main$_omp_fn$0: launch gangs=1, workers=6, vectors=128
  nvptx_exec: kernel main$_omp_fn$0: finished

..., so I would've suggested:

> Fix this by updating the check to allow num_workers in the range 1 to 8.

... to do this for '-O0' only, to make sure that we'll notice should the
'-O2' case regress at some later point in time.  Are you OK if I make the
obvious a change?


But that said...  We might also generally classify this as a regression,
because when using the GCC/nvptx '-mptx=3.1' instead of default multilib
('-foffload-options=nvptx-none=-mptx=3.1'), I see:

'-O2'; all good (exactly the same launch configuration as with GCC/nvptx
default multilib, see above):

[...]
Link log info: 0 bytes gmem
info: Function properties for 'main$_omp_fn$0':
info: used 32 registers, 0 stack, 288 bytes smem, 360 bytes cmem[0], 0 
bytes lmem
  GOMP_OFFLOAD_openacc_exec: prepare mappings
 warp_size=32, block_size=1024, dev_size=80, cpu_size=2048
 default dimensions [160,32,32]
cuOccupancyMaxPotentialBlockSize: grid = 160, block = 1024
  nvptx_exec: kernel main$_omp_fn$0: launch gangs=1, workers=8, vectors=128
  nvptx_exec: kernel main$_omp_fn$0: finished

..., but also for -O0'; all good:

Link log info: 0 bytes gmem
info: Function properties for 'main$_omp_fn$0':
info: used 30 registers, 32 stack, 432 bytes smem, 360 bytes cmem[0], 0 
bytes lmem
  GOMP_OFFLOAD_openacc_exec: prepare mappings
 warp_size=32, block_size=1024, dev_size=80, cpu_size=2048
 default dimensions [160,32,32]
cuOccupancyMaxPotentialBlockSize: grid = 160, block = 1024
  nvptx_exec: kernel main$_omp_fn$0: launch gangs=1, workers=8, vectors=128
  nvptx_exec: kernel main$_omp_fn$0: finished

Are you able to reproduce that?

Follows '-O0' word-diff between GCC/nvptx default vs. '-mptx=3.1'
multilib:

Link log info: 0 bytes gmem
info: Function properties for 'main$_omp_fn$0':
info: used [-33-]{+30+} registers, 32 stack, 432 bytes smem, 360 bytes 
cmem[0], 0 bytes lmem
  GOMP_OFFLOAD_openacc_exec: prepare mappings
 warp_size=32, block_size=1024, dev_size=80, cpu_size=2048
 default dimensions [160,32,32]
cuOccupancyMaxPotentialBlockSize: grid = 160, block = [-768-]{+1024+}
  nvptx_exec: kernel main$_omp_fn$0: launch gangs=1, 
[-workers=6,-]{+workers=8,+} vectors=128
  nvptx_exec: kernel main$_omp_fn$0: finished

Notice that the GCC/nvptx default multilib uses 33 registers vs. the
'-mptx=3.1' multilib uses 30 registers!  (..., which then allows for
'block = [-768-]{+1024+}', 'workers=[-6-]{+8+}').

If that's useful, 'diff' of the PTX code that gets loaded to the GPU:

 // BEGIN PREAMBLE
-.version 6.0
+.version 3.1
 .target sm_30

Re: [PATCH] arm: Restrict support of vectors of boolean immediates (PR target/104662)

2022-05-04 Thread Christophe Lyon via Gcc-patches




On 5/4/22 10:03, Kyrylo Tkachov wrote:




-Original Message-
From: Gcc-patches  On Behalf Of Christophe
Lyon via Gcc-patches
Sent: Wednesday, May 4, 2022 8:51 AM
To: gcc-patches@gcc.gnu.org
Subject: Re: [PATCH] arm: Restrict support of vectors of boolean immediates
(PR target/104662)

ping?

On 4/20/22 10:50, Christophe Lyon wrote:

This simple patch avoids the ICE described in the PR:
internal compiler error: in simd_valid_immediate, at

config/arm/arm.cc:12866


with an early exit from simd_valid_immediate if we are trying to
handle a vector of booleans and MVE is not enabled.

We still get an ICE when compiling the existing
gcc.dg/rtl/arm/mve-vxbi.c without -march=armv8.1-m.main+mve:

error: unrecognizable insn:
(insn 7 5 8 2 (set (reg:V4BI 114)
  (const_vector:V4BI [
  (const_int 1 [0x1])
  (const_int 0 [0]) repeated x2
  (const_int 1 [0x1])
  ])) -1
   (nil))
during RTL pass: ira

but there's little we can do since the testcase explicitly creates
vectors of booleans which do need MVE.

That is the reason why I do not add a testcase.


Ok if testing is fine.


Thanks, pushed as r13-102-ge2285af309000b

Christophe


Thanks,
Kyrill



2022-04-19  Christophe Lyon  

PR target/104662
* config/arm/arm.cc (simd_valid_immediate): Exit when input is a
vector of booleans and MVE is not enabled.
---
   gcc/config/arm/arm.cc | 3 +++
   1 file changed, 3 insertions(+)

diff --git a/gcc/config/arm/arm.cc b/gcc/config/arm/arm.cc
index 14e2fdfeafa..69a18c2f157 100644
--- a/gcc/config/arm/arm.cc
+++ b/gcc/config/arm/arm.cc
@@ -12849,6 +12849,9 @@ simd_valid_immediate (rtx op, machine_mode

mode, int inverse,

  || n_elts * innersize != 16))
   return -1;

+  if (!TARGET_HAVE_MVE && GET_MODE_CLASS (mode) ==

MODE_VECTOR_BOOL)

+return -1;
+
 /* Vectors of float constants.  */
 if (GET_MODE_CLASS (mode) == MODE_VECTOR_FLOAT)
   {


[PATCH] genconditions: Add support for targets without non-trivial insn conditions

2022-05-04 Thread Jakub Jelinek via Gcc-patches
Hi!

Somebody complained on IRC that when writing a new backend one can get
an error while compiling build/gencondmd.cc.
The problem is that when host compiler is g++ 3 or later (or when
bootstrapping), we compile it with g++ -std=c++11 -pedantic and
the generated insn_conditions array contains pairs
{ "cond", __builtin_constant_p (cond) ? (int) (cond) : -1 },
where cond is some non-trivial instruction condition.  Now if a target
uses "" for all the conditions (admittedly unlikely for non-trivial
target), the initializer for insn_conditions[] is {} and that is
pedantically rejected because C++ doesn't support zero-sized arrays.

The following patch fixes that by adding an artificial termination
element and skips that during the walk.

Bootstrapped/regtested on x86_64-linux and i686-linux, no change in
generated insn-conditions.md with it.  Ok for trunk?

2022-05-04  Jakub Jelinek  

* genconditions.cc (write_conditions): Append a { nullptr, -1 }
element at the end of insn_conditions.
(write_writer): Use ARRAY_SIZE (insn_conditions) - 1 instead of
ARRAY_SIZE (insn_conditions).

--- gcc/genconditions.cc.jj 2022-01-18 11:58:59.586981999 +0100
+++ gcc/genconditions.cc2022-05-03 20:01:59.428065439 +0200
@@ -175,7 +175,7 @@ static const struct c_test insn_conditio
 
   traverse_c_tests (write_one_condition, 0);
 
-  puts ("\n};\n#endif /* gcc >= 3.0.1 */\n");
+  puts ("  { nullptr, -1 }\n};\n#endif /* gcc >= 3.0.1 */\n");
 }
 
 /* Emit code which will convert the C-format table to a
@@ -192,7 +192,7 @@ write_writer (void)
 "  const char *p;\n"
 "  puts (\"(define_conditions [\");\n"
"#if GCC_VERSION >= 3001\n"
-   "  for (i = 0; i < ARRAY_SIZE (insn_conditions); i++)\n"
+   "  for (i = 0; i < ARRAY_SIZE (insn_conditions) - 1; i++)\n"
"{\n"
"  printf (\"  (%d \\\"\", insn_conditions[i].value);\n"
"  for (p = insn_conditions[i].expr; *p; p++)\n"

Jakub



RE: [PATCH] arm: Restrict support of vectors of boolean immediates (PR target/104662)

2022-05-04 Thread Kyrylo Tkachov via Gcc-patches


> -Original Message-
> From: Gcc-patches  bounces+kyrylo.tkachov=arm@gcc.gnu.org> On Behalf Of Christophe
> Lyon via Gcc-patches
> Sent: Wednesday, May 4, 2022 8:51 AM
> To: gcc-patches@gcc.gnu.org
> Subject: Re: [PATCH] arm: Restrict support of vectors of boolean immediates
> (PR target/104662)
> 
> ping?
> 
> On 4/20/22 10:50, Christophe Lyon wrote:
> > This simple patch avoids the ICE described in the PR:
> > internal compiler error: in simd_valid_immediate, at
> config/arm/arm.cc:12866
> >
> > with an early exit from simd_valid_immediate if we are trying to
> > handle a vector of booleans and MVE is not enabled.
> >
> > We still get an ICE when compiling the existing
> > gcc.dg/rtl/arm/mve-vxbi.c without -march=armv8.1-m.main+mve:
> >
> > error: unrecognizable insn:
> > (insn 7 5 8 2 (set (reg:V4BI 114)
> >  (const_vector:V4BI [
> >  (const_int 1 [0x1])
> >  (const_int 0 [0]) repeated x2
> >  (const_int 1 [0x1])
> >  ])) -1
> >   (nil))
> > during RTL pass: ira
> >
> > but there's little we can do since the testcase explicitly creates
> > vectors of booleans which do need MVE.
> >
> > That is the reason why I do not add a testcase.

Ok if testing is fine.
Thanks,
Kyrill

> >
> > 2022-04-19  Christophe Lyon  
> >
> > PR target/104662
> > * config/arm/arm.cc (simd_valid_immediate): Exit when input is a
> > vector of booleans and MVE is not enabled.
> > ---
> >   gcc/config/arm/arm.cc | 3 +++
> >   1 file changed, 3 insertions(+)
> >
> > diff --git a/gcc/config/arm/arm.cc b/gcc/config/arm/arm.cc
> > index 14e2fdfeafa..69a18c2f157 100644
> > --- a/gcc/config/arm/arm.cc
> > +++ b/gcc/config/arm/arm.cc
> > @@ -12849,6 +12849,9 @@ simd_valid_immediate (rtx op, machine_mode
> mode, int inverse,
> >   || n_elts * innersize != 16))
> >   return -1;
> >
> > +  if (!TARGET_HAVE_MVE && GET_MODE_CLASS (mode) ==
> MODE_VECTOR_BOOL)
> > +return -1;
> > +
> > /* Vectors of float constants.  */
> > if (GET_MODE_CLASS (mode) == MODE_VECTOR_FLOAT)
> >   {


[PATCH][PUSHED] hwasan: support new dg-output format.

2022-05-04 Thread Martin Liška
Supports change in libsanitizer where it newly reports:
READ of size 4 at 0xc3d4 tags: 02/01(00) (ptr/mem) in thread T0

So the 'tags' contains now 3 entries compared to 2 entries.

gcc/testsuite/ChangeLog:

* c-c++-common/hwasan/alloca-outside-caught.c: Update dg-output.
* c-c++-common/hwasan/heap-overflow.c: Likewise.
* c-c++-common/hwasan/hwasan-thread-access-parent.c: Likewise.
* c-c++-common/hwasan/large-aligned-1.c: Likewise.
* c-c++-common/hwasan/stack-tagging-basic-1.c: Likewise.
---
 gcc/testsuite/c-c++-common/hwasan/alloca-outside-caught.c   | 2 +-
 gcc/testsuite/c-c++-common/hwasan/heap-overflow.c   | 2 +-
 gcc/testsuite/c-c++-common/hwasan/hwasan-thread-access-parent.c | 2 +-
 gcc/testsuite/c-c++-common/hwasan/large-aligned-1.c | 2 +-
 gcc/testsuite/c-c++-common/hwasan/stack-tagging-basic-1.c   | 2 +-
 5 files changed, 5 insertions(+), 5 deletions(-)

diff --git a/gcc/testsuite/c-c++-common/hwasan/alloca-outside-caught.c 
b/gcc/testsuite/c-c++-common/hwasan/alloca-outside-caught.c
index 60d7a9a874f..6f3825bee7c 100644
--- a/gcc/testsuite/c-c++-common/hwasan/alloca-outside-caught.c
+++ b/gcc/testsuite/c-c++-common/hwasan/alloca-outside-caught.c
@@ -20,6 +20,6 @@ main ()
 }
 
 /* { dg-output "HWAddressSanitizer: tag-mismatch on address 0x\[0-9a-f\]*.*" } 
*/
-/* { dg-output "READ of size 4 at 0x\[0-9a-f\]* tags: 
\[\[:xdigit:\]\]\[\[:xdigit:\]\]/00 \\(ptr/mem\\) in thread T0.*" } */
+/* { dg-output "READ of size 4 at 0x\[0-9a-f\]* tags: 
\[\[:xdigit:\]\]\[\[:xdigit:\]\]/00.* \\(ptr/mem\\) in thread T0.*" } */
 /* { dg-output "Address 0x\[0-9a-f\]* is located in stack of thread T0.*" } */
 /* { dg-output "SUMMARY: HWAddressSanitizer: tag-mismatch \[^\n\]*.*" } */
diff --git a/gcc/testsuite/c-c++-common/hwasan/heap-overflow.c 
b/gcc/testsuite/c-c++-common/hwasan/heap-overflow.c
index 137466800de..bddb38c81f1 100644
--- a/gcc/testsuite/c-c++-common/hwasan/heap-overflow.c
+++ b/gcc/testsuite/c-c++-common/hwasan/heap-overflow.c
@@ -23,7 +23,7 @@ int main(int argc, char **argv) {
 }
 
 /* { dg-output "HWAddressSanitizer: tag-mismatch on address 0x\[0-9a-f\]*.*" } 
*/
-/* { dg-output "READ of size 1 at 0x\[0-9a-f\]* tags: 
\[\[:xdigit:\]\]\[\[:xdigit:\]\]/\[\[:xdigit:\]\]\[\[:xdigit:\]\] \\(ptr/mem\\) 
in thread T0.*" } */
+/* { dg-output "READ of size 1 at 0x\[0-9a-f\]* tags: 
\[\[:xdigit:\]\]\[\[:xdigit:\]\]/\[\[:xdigit:\]\]\[\[:xdigit:\]\].* 
\\(ptr/mem\\) in thread T0.*" } */
 /* { dg-output "located 0 bytes to the right of 10-byte region.*" } */
 /* { dg-output "allocated here:.*" } */
 /* { dg-output "#1 0x\[0-9a-f\]+ +in _*main \[^\n\r]*heap-overflow.c:18" } */
diff --git a/gcc/testsuite/c-c++-common/hwasan/hwasan-thread-access-parent.c 
b/gcc/testsuite/c-c++-common/hwasan/hwasan-thread-access-parent.c
index 828909d3b3b..eca27c8cd2c 100644
--- a/gcc/testsuite/c-c++-common/hwasan/hwasan-thread-access-parent.c
+++ b/gcc/testsuite/c-c++-common/hwasan/hwasan-thread-access-parent.c
@@ -46,6 +46,6 @@ main (int argc, char **argv)
 }
 
 /* { dg-output "HWAddressSanitizer: tag-mismatch on address 0x\[0-9a-f\]*.*" } 
*/
-/* { dg-output "READ of size 4 at 0x\[0-9a-f\]* tags: 
00/\[\[:xdigit:\]\]\[\[:xdigit:\]\] \\(ptr/mem\\) in thread T1.*" } */
+/* { dg-output "READ of size 4 at 0x\[0-9a-f\]* tags: 
00/\[\[:xdigit:\]\]\[\[:xdigit:\]\].* \\(ptr/mem\\) in thread T1.*" } */
 /* { dg-output "Address 0x\[0-9a-f\]* is located in stack of thread T0.*" } */
 /* { dg-output "SUMMARY: HWAddressSanitizer: tag-mismatch \[^\n\]*.*" } */
diff --git a/gcc/testsuite/c-c++-common/hwasan/large-aligned-1.c 
b/gcc/testsuite/c-c++-common/hwasan/large-aligned-1.c
index 1aa13032396..6158ba4bdbc 100644
--- a/gcc/testsuite/c-c++-common/hwasan/large-aligned-1.c
+++ b/gcc/testsuite/c-c++-common/hwasan/large-aligned-1.c
@@ -9,6 +9,6 @@
 /* { dg-output "HWAddressSanitizer: tag-mismatch on address 0x\[0-9a-f\]*.*" } 
*/
 /* NOTE: This assumes the current tagging mechanism (one at a time from the
base and large aligned variables being handled first).  */
-/* { dg-output "READ of size 4 at 0x\[0-9a-f\]* tags: 
\[\[:xdigit:\]\]\[\[:xdigit:\]\]/\[\[:xdigit:\]\]\[\[:xdigit:\]\] \\(ptr/mem\\) 
in thread T0.*" } */
+/* { dg-output "READ of size 4 at 0x\[0-9a-f\]* tags: 
\[\[:xdigit:\]\]\[\[:xdigit:\]\]/\[\[:xdigit:\]\]\[\[:xdigit:\]\].* 
\\(ptr/mem\\) in thread T0.*" } */
 /* { dg-output "Address 0x\[0-9a-f\]* is located in stack of thread T0.*" } */
 /* { dg-output "SUMMARY: HWAddressSanitizer: tag-mismatch \[^\n\]*.*" } */
diff --git a/gcc/testsuite/c-c++-common/hwasan/stack-tagging-basic-1.c 
b/gcc/testsuite/c-c++-common/hwasan/stack-tagging-basic-1.c
index 90d5837254f..c4e698e1b8b 100644
--- a/gcc/testsuite/c-c++-common/hwasan/stack-tagging-basic-1.c
+++ b/gcc/testsuite/c-c++-common/hwasan/stack-tagging-basic-1.c
@@ -13,6 +13,6 @@
 #undef ARG
 
 /* { dg-output "HWAddressSanitizer: tag-mismatch on address 0x\[0-9a-f\]*.*" } 
*/
-/* { dg-output "READ of size

Re: [PATCH] arm: Restrict support of vectors of boolean immediates (PR target/104662)

2022-05-04 Thread Christophe Lyon via Gcc-patches

ping?

On 4/20/22 10:50, Christophe Lyon wrote:

This simple patch avoids the ICE described in the PR:
internal compiler error: in simd_valid_immediate, at config/arm/arm.cc:12866

with an early exit from simd_valid_immediate if we are trying to
handle a vector of booleans and MVE is not enabled.

We still get an ICE when compiling the existing
gcc.dg/rtl/arm/mve-vxbi.c without -march=armv8.1-m.main+mve:

error: unrecognizable insn:
(insn 7 5 8 2 (set (reg:V4BI 114)
 (const_vector:V4BI [
 (const_int 1 [0x1])
 (const_int 0 [0]) repeated x2
 (const_int 1 [0x1])
 ])) -1
  (nil))
during RTL pass: ira

but there's little we can do since the testcase explicitly creates
vectors of booleans which do need MVE.

That is the reason why I do not add a testcase.

2022-04-19  Christophe Lyon  

PR target/104662
* config/arm/arm.cc (simd_valid_immediate): Exit when input is a
vector of booleans and MVE is not enabled.
---
  gcc/config/arm/arm.cc | 3 +++
  1 file changed, 3 insertions(+)

diff --git a/gcc/config/arm/arm.cc b/gcc/config/arm/arm.cc
index 14e2fdfeafa..69a18c2f157 100644
--- a/gcc/config/arm/arm.cc
+++ b/gcc/config/arm/arm.cc
@@ -12849,6 +12849,9 @@ simd_valid_immediate (rtx op, machine_mode mode, int 
inverse,
  || n_elts * innersize != 16))
  return -1;
  
+  if (!TARGET_HAVE_MVE && GET_MODE_CLASS (mode) == MODE_VECTOR_BOOL)

+return -1;
+
/* Vectors of float constants.  */
if (GET_MODE_CLASS (mode) == MODE_VECTOR_FLOAT)
  {