date:20220517

Re: [PATCH] [Middle-end] Enhance final_value_replacement_loop to handle bitwise induction.

2022-05-17 Thread Richard Biener via Gcc-patches

On Wed, May 18, 2022 at 4:45 AM Hongtao Liu  wrote:
>
> On Fri, May 13, 2022 at 7:16 PM Richard Biener
>  wrote:
> >
> > On Fri, May 13, 2022 at 5:37 AM Hongtao Liu  wrote:
> > >
> > > On Wed, May 11, 2022 at 4:45 PM Richard Biener via Gcc-patches
> > >  wrote:
> > > >
> > > > On Mon, May 9, 2022 at 7:19 AM liuhongt  wrote:
> > > > >
> > > > > This patch will enable below optimization:
> > > > >
> > > > >  {
> > > > > -  int bit;
> > > > > -  long long unsigned int _1;
> > > > > -  long long unsigned int _2;
> > > > > -
> > > > > [local count: 46707768]:
> > > > > -
> > > > > -   [local count: 1027034057]:
> > > > > -  # tmp_11 = PHI 
> > > > > -  # bit_13 = PHI 
> > > > > -  _1 = 1 << bit_13;
> > > > > -  _2 = ~_1;
> > > > > -  tmp_8 = _2 & tmp_11;
> > > > > -  bit_9 = bit_13 + -3;
> > > > > -  if (bit_9 != -3(OVF))
> > > > > -goto ; [95.65%]
> > > > > -  else
> > > > > -goto ; [4.35%]
> > > > > -
> > > > > -   [local count: 46707768]:
> > > > > -  return tmp_8;
> > > > > +  tmp_12 = tmp_6(D) & 7905747460161236406;
> > > > > +  return tmp_12;
> > > > >
> > > > >  }
> > > > >
> > > > >
> > > > > > Bootstrapped and regtested on x86_64-pc-linux-gnu{-m32,}
> > > > > Ok for trunk?
> > > > >
> > > > > gcc/ChangeLog:
> > > > >
> > > > > PR middle-end/103462
> > > > > * match.pd (bitwise_induction_p): New match.
> > > > > * tree-scalar-evolution.c (gimple_bitwise_induction_p):
> > > > > Declare.
> > > > > (analyze_and_compute_bitwise_induction_effect): New function.
> > > > > (enum bit_op_kind): New enum.
> > > > > (final_value_replacement_loop): Enhanced to handle bitwise
> > > > > induction.
> > > > >
> > > > > gcc/testsuite/ChangeLog:
> > > > >
> > > > > * gcc.target/i386/pr103462-1.c: New test.
> > > > > * gcc.target/i386/pr103462-2.c: New test.
> > > > > * gcc.target/i386/pr103462-3.c: New test.
> > > > > * gcc.target/i386/pr103462-4.c: New test.
> > > > > * gcc.target/i386/pr103462-5.c: New test.
> > > > > * gcc.target/i386/pr103462-6.c: New test.
> > > > > ---
> > > > >  gcc/match.pd   |   7 +
> > > > >  gcc/testsuite/gcc.target/i386/pr103462-1.c | 111 +
> > > > >  gcc/testsuite/gcc.target/i386/pr103462-2.c |  45 ++
> > > > >  gcc/testsuite/gcc.target/i386/pr103462-3.c | 111 +
> > > > >  gcc/testsuite/gcc.target/i386/pr103462-4.c |  46 ++
> > > > >  gcc/testsuite/gcc.target/i386/pr103462-5.c | 111 +
> > > > >  gcc/testsuite/gcc.target/i386/pr103462-6.c |  46 ++
> > > > > >  gcc/tree-scalar-evolution.cc   | 178 -
> > > > >  8 files changed, 654 insertions(+), 1 deletion(-)
> > > > >  create mode 100644 gcc/testsuite/gcc.target/i386/pr103462-1.c
> > > > >  create mode 100644 gcc/testsuite/gcc.target/i386/pr103462-2.c
> > > > >  create mode 100644 gcc/testsuite/gcc.target/i386/pr103462-3.c
> > > > >  create mode 100644 gcc/testsuite/gcc.target/i386/pr103462-4.c
> > > > >  create mode 100644 gcc/testsuite/gcc.target/i386/pr103462-5.c
> > > > >  create mode 100644 gcc/testsuite/gcc.target/i386/pr103462-6.c
> > > > >
> > > > > diff --git a/gcc/match.pd b/gcc/match.pd
> > > > > index 6d691d302b3..24ff5f9e6a8 100644
> > > > > --- a/gcc/match.pd
> > > > > +++ b/gcc/match.pd
> > > > > @@ -7746,3 +7746,10 @@ and,
> > > > >== TYPE_UNSIGNED (TREE_TYPE (@3
> > > > > && single_use (@4)
> > > > > && single_use (@5
> > > > > +
> > > > > +(for bit_op (bit_and bit_ior bit_xor)
> > > > > + (match (bitwise_induction_p @0 @2 @3)
> > > > > > +   (bit_op:c (nop_convert1? (bit_not2?@0 (convert3? (lshift integer_onep@1 @2 @3)))
> > > > > > +
> > > > > > +(match (bitwise_induction_p @0 @2 @3)
> > > > > > +  (bit_not (nop_convert1? (bit_xor@0 (convert2? (lshift integer_onep@1 @2)) @3
> > > > > diff --git a/gcc/testsuite/gcc.target/i386/pr103462-1.c 
> > > > > b/gcc/testsuite/gcc.target/i386/pr103462-1.c
> > > > > new file mode 100644
> > > > > index 000..1dc4c2acad6
> > > > > --- /dev/null
> > > > > +++ b/gcc/testsuite/gcc.target/i386/pr103462-1.c
> > > > > @@ -0,0 +1,111 @@
> > > > > +/* { dg-do compile } */
> > > > > +/* { dg-options "-O1 -fdump-tree-sccp-details" } */
> > > > > > +/* { dg-final { scan-tree-dump-times {final value replacement} 12 "sccp" } } */
> > > > > +
> > > > > +unsigned long long
> > > > > +__attribute__((noipa))
> > > > > +foo (unsigned long long tmp)
> > > > > +{
> > > > > +  for (int bit = 0; bit < 64; bit += 3)
> > > > > +tmp &= ~(1ULL << bit);
> > > > > +  return tmp;
> > > > > +}
> > > > > +
> > > > > +unsigned long long
> > > > > +__attribute__((noipa))
> > > > > +foo1 (unsigned long long tmp)
> > > > > +{
> > > > > +  for (int bit = 63; bit >= 0; bit -= 3)
> > > > > +tmp &= ~(1ULL << bit);
> > > > > +  return tmp;
> > > > > +}
> > > > > +
> > > > > +unsigned long long
> > > > > +__at
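
For readers following the thread, the replacement shown in the dump above can be checked by hand.  The following is only a sketch, not part of the patch: the function names are made up for illustration, and the constant is the one that appears in the optimized dump.  It runs the foo1 loop from the test case and compares it with a single AND against that constant.

```c
#include <stdint.h>

/* The loop from foo1 in pr103462-1.c: clears bits 63, 60, ..., 3, 0.
   (Hypothetical name, for illustration only.)  */
static uint64_t clear_every_third_bit_loop(uint64_t tmp)
{
  for (int bit = 63; bit >= 0; bit -= 3)
    tmp &= ~(1ULL << bit);
  return tmp;
}

/* The mask the loop applies does not depend on the input, so it can be
   recovered by feeding the loop an all-ones value.  */
static uint64_t every_third_bit_mask(void)
{
  return clear_every_third_bit_loop(~0ULL);
}

/* The closed form from the optimized dump: one AND with a constant.  */
static uint64_t clear_every_third_bit_closed(uint64_t tmp)
{
  return tmp & 7905747460161236406ULL;
}
```

The loop in foo (counting up from 0 in steps of 3) touches the same set of bits, so it would reduce to the same constant.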

Re: [PATCH] PR tree-optimization/31178 - Add rshift side effect.

2022-05-17 Thread Richard Biener via Gcc-patches

On Tue, May 17, 2022 at 8:41 PM Andrew MacLeod via Gcc-patches
 wrote:
>
> This patch implements side effects of the second operand of a shift
> operation.
>
> given A >> B or A << B, the range of B is restricted to [0, PRECISION_A).
>
> Fortran is currently more permissive than this, allowing the range to be
> [0, PRECISION_A], so this is the value we currently default to in this
> patch.  If the fortran front end were adjusted, we could adjust the end
> point.
>
> This currently bootstraps with no regressions on x86_64-pc-linux-gnu.
>
> Is this sufficient, or should I also be checking some other flags which
> may allow other values outside this range to be valid?

I think the "undefined behavior" side-effects are dangerous since
I know of no code that makes sure to clamp the shift argument when
hoisting it.  Sanitizing shifts will also not help to discover such latent
issues since sanitizing is happening early and it will most definitely
avoid the hoisting itself.

As to that we _definitely_ want a way to disable this [assumption
that the shift operand is in-range] if we make that assumption
even on IL state after optimizations.

Candidates to look for are invariant motion, ifcombine,
partial PRE, PRE in general (we possibly hoist such expressions
across function calls that might terminate the program normally).

That is, what prevents

   if (n > 32)
 abort ();
   x = i << n;

from being transformed to

   x = i << n;
   if (n > 32)
 abort ();

?  Yes, that's probably a latent issue in some sense but you would
now optimize the if (n > 32) abort () away while previously x would
have an undetermined value but we'd still abort.
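
To make the hazard concrete at the source level, here is a small sketch.  It is an illustration only: the function names are hypothetical, a boolean return stands in for abort (), and it says nothing about what any GCC pass actually does.  The first function keeps the guard ahead of the shift; the second shows what a hoisted shift would need (a clamped count) to remain well defined on the path where the guard fires.

```c
#include <stdbool.h>
#include <stdint.h>

/* Guard first, shift second.  Returning false plays the role of abort ().  */
static bool guarded_shl(uint32_t i, unsigned n, uint32_t *out)
{
  if (n >= 32)
    return false;
  *out = i << n;
  return true;
}

/* Shift evaluated before the guard.  The count is clamped so the shift is
   defined for every n; the result is only used when the guard passes.  */
static bool hoisted_shl(uint32_t i, unsigned n, uint32_t *out)
{
  uint32_t x = i << (n & 31);   /* clamped count, always in [0, 31] */
  if (n >= 32)
    return false;
  *out = x;
  return true;
}
```

Without the clamp in the second form, a range assumption derived from the shift could be used to delete the guard, which is the situation described above.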

Do you have some statistics on how this particular side-effect
improves code generation?

Richard.

>
> Andrew
>
>
> PS. Note that in the testcase,  one of the tests is currently disabled
> as full recomputation of side-effects is not quite in place yet. When it
> is, I will enable the test.
>

Re: [PATCH] Add divide by zero side effect.

2022-05-17 Thread Richard Biener via Gcc-patches

On Tue, May 17, 2022 at 8:40 PM Andrew MacLeod via Gcc-patches
 wrote:
>
> I haven't checked this patch in yet.  This implements a side effect that
> the divisor cannot be 0 after a divide executes. This allows us to fold
> the divide away:
>
> a = b / c;
> if (c == 0)
>dead();
>
> This bootstraps on x86_64-pc-linux-gnu with no regressions, but I first
> wanted to check to see if there are some flags or conditions that should
> be checked in order NOT to do this optimization.  I am guessing there is
> probably something :-)  Anyway, this is how we straightforwardly add
> side effects now.
>
> Do the patch conditions need tweaking to apply the side effect?

What does "after the stmt" mean?  If the stmt throws internally then on
the EH edge the divisor can be zero.

How do you fold away the divide in your above example?
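
For reference, one reading of the quoted example is that it is the comparison after the divide, rather than the divide itself, that becomes foldable.  The sketch below is an assumption about the intent, with a hypothetical function name, and it only covers the fall-through path; it does not model an EH edge.

```c
#include <stdbool.h>

/* If the division completes normally, c cannot have been zero, so the
   flag computed afterwards is always false on this path.  */
static int divide_and_check(int b, int c, bool *zero_seen)
{
  int a = b / c;            /* undefined (typically a trap) when c == 0 */
  *zero_seen = (c == 0);    /* candidate for folding to false */
  return a;
}
```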

Richard.

>
> Andrew
>

Re: [PATCH] middle-end/105604 - snprintf diagnostics and non-constant sizes/offsets

2022-05-17 Thread Richard Biener via Gcc-patches

On Tue, 17 May 2022, Martin Sebor wrote:

> On 5/16/22 03:16, Richard Biener wrote:
> > The following tries to correct get_origin_and_offset_r not handling
> > non-constant sizes of array elements in ARRAY_REFs and non-constant
> > offsets of COMPONENT_REFs.  It isn't exactly clear how such failures
> > should be treated in this API and existing handling isn't consistent
> > here either.  The following applies two different variants, treating
> > non-constant array sizes like non-constant array indices and
> > treating non-constant offsets of COMPONENT_REFs by terminating
> > the recursion (not sure what that means to the callers).
> > 
> > Basically the code failed to use component_ref_field_offset and
> > array_ref_element_size and instead relies on inappropriate
> > helpers (that shouldn't exist in the first place ...).  The code
> > is also not safe-guarded against overflows in the final offset/size
> > computations but I'm not trying to rectify that.
> > 
> > Martin - can you comment on how the API should handle such
> > situations?
> 
> It looks like the -Wrestrict warning here ignores offsets equal to
> HOST_WIDE_INT_MIN so presumably setting dst_offset (via *fldoff) to
> that should avoid it.  Or maybe to HWI_MAX as it does for variable
> offsets.

Can you suggest wording for the function comment as to how it handles
the case when offset or size cannot be determined exactly?   The
comment currently only suggests that the caller possibly cannot
trust fldsize or off when the function returns NULL but the actual
implementation differs from that.

> It also looks like the function only handles constant offsets and
> sizes, and I have a vague recollection of enhancing it to work with
> ranges.  That should avoid the overflow problem too.

So the correct thing is to return NULL?

Is the patch OK as-is?  As said, I'm not sure how the caller interprets
the result and how it can distinguish the exact vs. non-exact cases
or what a "conservative" inexact answer would be.

Please help document this API properly.
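
To make the question concrete, here is one possible convention, written as a standalone sketch rather than as a proposal for gimple-ssa-sprintf.cc.  Everything in it is an assumption for illustration: the helper name, the use of INT64_MAX as an "unknown" marker (mirroring the HOST_WIDE_INT_MAX mentioned above), and the boolean "exact" return.  It relies on the GCC/Clang overflow builtins.

```c
#include <stdbool.h>
#include <stdint.h>

/* Hypothetical sentinel meaning "offset not known exactly".  */
#define OFFSET_UNKNOWN INT64_MAX

/* Add idx * elsz to *fldoff.  Returns true when the result is exact.
   On a non-constant element size, an unknown index, or arithmetic
   overflow, *fldoff is set to OFFSET_UNKNOWN and false is returned.  */
static bool add_scaled_offset(int64_t *fldoff, int64_t idx,
                              bool elsz_is_constant, int64_t elsz)
{
  int64_t scaled, sum;
  if (*fldoff == OFFSET_UNKNOWN
      || idx == OFFSET_UNKNOWN
      || !elsz_is_constant
      || __builtin_mul_overflow(idx, elsz, &scaled)
      || __builtin_add_overflow(*fldoff, scaled, &sum)
      || sum == OFFSET_UNKNOWN)
    {
      *fldoff = OFFSET_UNKNOWN;
      return false;
    }
  *fldoff = sum;
  return true;
}
```

With a convention along these lines the caller can tell the exact and inexact cases apart from the return value, and the sentinel is sticky once set.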

Thanks,
Richard.

> Martin
> 
> > 
> > Bootstrapped and tested on x86_64-unknown-linux-gnu.
> > 
> > OK for trunk and branches?
> > 
> > Thanks,
> > Richard.
> > 
> > 2022-05-16  Richard Biener  
> > 
> >  PR middle-end/105604
> >  * gimple-ssa-sprintf.cc (get_origin_and_offset_r):
> >  Handle non-constant ARRAY_REF element size and non-constant
> >  COMPONENT_REF field offset.
> > 
> > * gcc.dg/torture/pr105604.c: New testcase.
> > ---
> >   gcc/gimple-ssa-sprintf.cc   | 14 +++---
> >   gcc/testsuite/gcc.dg/torture/pr105604.c | 24 
> >   2 files changed, 35 insertions(+), 3 deletions(-)
> >   create mode 100644 gcc/testsuite/gcc.dg/torture/pr105604.c
> > 
> > diff --git a/gcc/gimple-ssa-sprintf.cc b/gcc/gimple-ssa-sprintf.cc
> > index c93f12f90b5..14e215ce69c 100644
> > --- a/gcc/gimple-ssa-sprintf.cc
> > +++ b/gcc/gimple-ssa-sprintf.cc
> > @@ -2312,14 +2312,16 @@ get_origin_and_offset_r (tree x, HOST_WIDE_INT
> > *fldoff, HOST_WIDE_INT *fldsize,
> >HOST_WIDE_INT idx = (tree_fits_uhwi_p (offset)
> >   ? tree_to_uhwi (offset) : HOST_WIDE_INT_MAX);
> >   + tree elsz = array_ref_element_size (x);
> >tree eltype = TREE_TYPE (x);
> >if (TREE_CODE (eltype) == INTEGER_TYPE)
> >  {
> >if (off)
> >  *off = idx;
> >   }
> > -   else if (idx < HOST_WIDE_INT_MAX)
> > - *fldoff += idx * int_size_in_bytes (eltype);
> > +   else if (idx < HOST_WIDE_INT_MAX
> > +&& tree_fits_shwi_p (elsz))
> > + *fldoff += idx * tree_to_shwi (elsz);
> >else
> >  *fldoff = idx;
> >   @@ -2350,8 +2352,14 @@ get_origin_and_offset_r (tree x, HOST_WIDE_INT
> > *fldoff, HOST_WIDE_INT *fldsize,
> >   
> >   case COMPONENT_REF:
> > {
> > +   tree foff = component_ref_field_offset (x);
> > tree fld = TREE_OPERAND (x, 1);
> > -   *fldoff += int_byte_position (fld);
> > +   if (!tree_fits_shwi_p (foff)
> > +   || !tree_fits_shwi_p (DECL_FIELD_BIT_OFFSET (fld)))
> > + return x;
> > +   *fldoff += (tree_to_shwi (foff)
> > +   + (tree_to_shwi (DECL_FIELD_BIT_OFFSET (fld))
> > +  / BITS_PER_UNIT));
> >   
> >get_origin_and_offset_r (fld, fldoff, fldsize, off);
> >x = TREE_OPERAND (x, 0);
> > diff --git a/gcc/testsuite/gcc.dg/torture/pr105604.c
> > b/gcc/testsuite/gcc.dg/torture/pr105604.c
> > new file mode 100644
> > index 000..b002251df10
> > --- /dev/null
> > +++ b/gcc/testsuite/gcc.dg/torture/pr105604.c
> > @@ -0,0 +1,24 @@
> > +/* { dg-do compile } */
> > +/* { dg-additional-options "-Wall" } */
> > +
> > +struct {
> > +  long users;
> > +  long size;
> > +  char *data;
> > +} * main_trans;
> > +void *main___trans_tmp_1;
> > +int sprintf(char *, char *, ...);
> > +int main() {
> > +  int users = 0;
> > +  struct {
> > +long users;
> > +long size;
> > +char *data;
> > +int links[users];
> > +char buf[];
> > +  } *trans = trans;
> > +  tr

Re: [PATCH] Simplify logic in tree-scalar-evolution's expensive_expression_p.

2022-05-17 Thread Richard Biener via Gcc-patches

On Tue, May 17, 2022 at 7:51 PM Roger Sayle  wrote:
>
>
> This patch simplifies tree-scalar-evolution's expression_expensive_p, but
> produces identical results; the replacement implementation is just smaller
> (uses less memory), faster and easier to understand.
>
> The current idiom (introduced to fix PR90726) looks like:
>
> hash_map cache;
> uint64_t expanded_size = 0;
> return (expression_expensive_p (expr, cache, expanded_size)
>|| expanded_size > cache.elements ());
>
> Here the recursive function computes expanded_size, effectively the
> number of tree nodes visited, which is then only used in the comparison
> against cache.elements(), i.e. to check whether the number of visited
> nodes is greater than the number of unique visited nodes.  This is
> equivalent to instead checking whether expression_expensive_p's recursion
> visits any node more than once.
>
> Instead of using a map to cache the "cost" of revisited sub-trees, the
> same outcome can be determined using a set, and immediately returning
> true as soon as encountering a previously seen tree node, avoiding the
> unnecessary "cost"/expanded_size computation.  [A simplification analogous
> to checking STL's empty() instead of comparing size() with zero].
>
> The semantics of expression_expensive_p (both before and after) are
> quite reasonable, as calling unshare_expr on a generic tree can result
> in an exponential growth in the number of gimple statements, hence
> any "shared" nodes are indeed expensive.  If shared nodes are to be
> allowed, they'll need to be managed explicitly with SAVE_EXPR (or similar
> mechanism) that avoids exponential growth.

Indeed.  The code seems to allow for doing "better" than counting the
cost of a sub-expression either as one or giving up on the whole expression
though while the cleanup removes this possibility.  In fact the predicate
currently allows arbitrary expensive (read: large) expressions and the
result of expression_expensive_p (x) and expression_expensive_p
(unshare_expr (x))
isn't equal which IMHO is a bug.

> This patch has been tested on x86_64-pc-linux-gnu with make bootstrap
> and make -k check, both with and without --target_board=unix{-m32}, with
> no new failures.  Is this a reasonable clean-up for mainline?

So I think the change goes in the wrong direction, even if it preserves
the current semantics.  The bug is probably in the callers allowing
arbitrarily large expressions so maybe the API can be changed to
provide a max_size argument.

I know I placed the extra expression expansion limit on top of the
previous implementation which just looked for an expensive operation.
Also note that technically unshare_expr isn't necessary if
gimplification would cope with shared trees - possibly a variant which,
instead of unsharing in-expression uses, would insert SAVE_EXPRs,
would do the trick here.
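
The equivalence Roger describes (visits exceeding unique nodes versus hitting any node twice) can be seen on a toy DAG.  This is a simplified sketch, not the GCC code: the node type, the linear-search "set" and the fixed capacity are all assumptions made for illustration, and a revisit is counted as a single visit rather than by a cached cost.

```c
#include <stdbool.h>
#include <stddef.h>

#define MAX_NODES 64   /* sketch assumes fewer than 64 distinct nodes */

struct node { struct node *kid[2]; };

struct seen_set { const struct node *item[MAX_NODES]; size_t count; };

static bool seen_contains(const struct seen_set *s, const struct node *n)
{
  for (size_t i = 0; i < s->count; i++)
    if (s->item[i] == n)
      return true;
  return false;
}

/* Counting variant: every arrival at a node is a visit; a node already in
   the set is counted but not walked again.  */
static void count_visits(const struct node *n, struct seen_set *s,
                         size_t *visits)
{
  if (!n)
    return;
  ++*visits;
  if (seen_contains(s, n))
    return;
  s->item[s->count++] = n;
  count_visits(n->kid[0], s, visits);
  count_visits(n->kid[1], s, visits);
}

static bool shared_by_counting(const struct node *root)
{
  struct seen_set s = { .count = 0 };
  size_t visits = 0;
  count_visits(root, &s, &visits);
  return visits > s.count;
}

/* Set variant: return true as soon as a node is reached a second time.  */
static bool revisits(const struct node *n, struct seen_set *s)
{
  if (!n)
    return false;
  if (seen_contains(s, n))
    return true;
  s->item[s->count++] = n;
  return revisits(n->kid[0], s) || revisits(n->kid[1], s);
}

static bool shared_by_set(const struct node *root)
{
  struct seen_set s = { .count = 0 };
  return revisits(root, &s);
}
```

Both variants agree on whether anything is shared; what the set variant gives up is the accumulated size, which is the information a max_size style limit would need.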

Richard.

>
> 2022-05-17  Roger Sayle  
>
> gcc/ChangeLog
> * tree-scalar-evolution.cc (expression_expensive_p): Change type
> of cache from hash_map to hash_set, and remove cost argument.
> When expr appears in the hash_set, return true.  Calculation of
> cost (and updating hash_map) is no longer required.
> (expression_expensive_p):  Simplify top-level implementation.
>
>
> Thanks in advance,
> Roger
> --
>

Re: [PATCH] PR tree-optimization/105458 - Check for equivalence after merging relations.

2022-05-17 Thread Richard Biener via Gcc-patches

On Tue, May 17, 2022 at 5:46 PM Andrew MacLeod via Gcc-patches
 wrote:
>
> Sorry, missed this one earlier.
>
> When we register a relation, such as LE_EXPR,  we first check if there
> is an existing relation that applies, and if so they are combined. We
> were checking if the relation being registered was an EQ_EXPR, and if
> so, invoked the equivalence oracle.
>
>   I was doing the check for EQ_EXPR first, then merging with any
> existing relation.   In this case, the merge resulted in transforming
> the LE_EXPR into an EQ_EXPR, but the check to invoke the
> equivalence_oracle had already been done, and we got to a place we
> shouldn't have. doh!
>
> The fix is to do the merge first, then check for EQ_EXPR.
>
> The patch is a hair different (due to VREL_*  renames in gcc13), so I've
> attached both patches.
>
> bootstraps on gcc12 and gcc13 with no regressions.  pushed on trunk.
>
> OK for GCC12?

OK
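
The ordering problem described in the quoted text can be shown with a toy encoding.  This is a sketch under stated assumptions: relations are modelled as bit sets of the outcomes {<, ==, >} and merging is intersection; the names are hypothetical and this is not how the relation oracle represents things.

```c
#include <stdbool.h>

/* Toy encoding: a relation is the set of outcomes it allows.  */
enum {
  REL_LT = 1, REL_EQ = 2, REL_GT = 4,
  REL_LE = REL_LT | REL_EQ,
  REL_GE = REL_GT | REL_EQ
};

/* Combining two known relations keeps only the outcomes both allow.  */
static unsigned merge_relation(unsigned existing, unsigned incoming)
{
  return existing & incoming;
}

/* Old order: test the incoming relation, then merge.  A merge that
   produces an equivalence goes unnoticed.  */
static bool is_equivalence_check_first(unsigned existing, unsigned incoming)
{
  bool eq = (incoming == REL_EQ);
  (void) merge_relation(existing, incoming);
  return eq;
}

/* Fixed order: merge first, then test the merged result.  */
static bool is_equivalence_merge_first(unsigned existing, unsigned incoming)
{
  return merge_relation(existing, incoming) == REL_EQ;
}
```

Registering LE on top of an existing GE is the case from the report: the merged relation is EQ, which only the merge-first order detects.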

> Andrew

Re: [PATCH] Optimize multiply/add of DImode extended to TImode, PR target/103109.

2022-05-17 Thread Michael Meissner via Gcc-patches

On Fri, May 13, 2022 at 01:20:30PM -0500, will schmidt wrote:
> On Fri, 2022-05-13 at 12:17 -0400, Michael Meissner wrote:
> > Optimize multiply/add of DImode extended to TImode, PR target/103109.
> > 
> > On power9 and power10 systems, we have instructions that support doing
> > 64-bit integers converted to 128-bit integers and producing 128-bit
> > results.  This patch adds support to generate these instructions.
> > 
> > Previously GCC had define_expands to handle conversion of the 64-bit
> > extend to 128-bit and multiply.  This patch changes these define_expands
> > to define_insn_and_split and then it provides combiner patterns to
> > generate these multiply/add instructions.
> > 
> > To support using this optimization on power9, this patch extends the sign
> > extend DImode to TImode to also run on power9 (added for PR
> > target/104698).
> > 
> > This patch needs the previous patch to add unsigned DImode to TImode
> > conversion so that the combiner can combine the extend, multiply, and add
> > instructions.
> > 
> > I have built this patch on little endian power10, little endian power9, and
> > big endian power8 systems.  There were no regressions when I ran it.  Can I
> > install this patch into the GCC 13 master branch?
> > 
> > 2022-05-13   Michael Meissner  
> > 
> > gcc/
> > PR target/103109
> > * config/rs6000/rs6000.md (su_int32): New code attribute.
> > (mul3): Convert from define_expand to
> > define_insn_and_split.
> > (maddld4): Add generator function.
> 
> -(define_insn "*maddld4"
> +(define_insn "maddld4"
> 
> Is the removal of the "*" what "adding a generator" refers to?  (That's
> terminology that I'm not immediately familiar with).

Yes.  If you have a pattern:

(define_insn "foosi2"
  [(set (match_operand:SI 0 "register_operand" "=r")
(foo:SI (match_operand:SI 1 "register_operand" "r")))]
""
"foo %0,%1")

It creates a 'gen_foosi2' function that has 2 arguments, and it makes the insn
listed.

It then has support for insn recognition and output.

If the pattern starts with a '*', there is no 'gen_foosi2' function created,
but the insn recognition and output are still done.

In practice, you typically use the '*' names for patterns that are used as the
targets of combination, or separate insns for different machines.

Here is the verbiage from rtl.texi:

These names serve one of two purposes.  The first is to indicate that the
instruction performs a certain standard job for the RTL-generation
pass of the compiler, such as a move, an addition, or a conditional
jump.  The second is to help the target generate certain target-specific
operations, such as when implementing target-specific intrinsic functions.

It is better to prefix target-specific names with the name of the
target, to avoid any clash with current or future standard names.

The absence of a name is indicated by writing an empty string
where the name should go.  Nameless instruction patterns are never
used for generating RTL code, but they may permit several simpler insns
to be combined later on.

For the purpose of debugging the compiler, you may also specify a
name beginning with the @samp{*} character.  Such a name is used only
for identifying the instruction in RTL dumps; it is equivalent to having
a nameless pattern for all other purposes.  Names beginning with the
@samp{*} character are not required to be unique.

> > diff --git a/gcc/testsuite/gcc.target/powerpc/pr103109.c 
> > b/gcc/testsuite/gcc.target/powerpc/pr103109.c
> > new file mode 100644
> > index 000..ae2cfb9eda7
> > --- /dev/null
> > +++ b/gcc/testsuite/gcc.target/powerpc/pr103109.c
> > @@ -0,0 +1,62 @@
> > +/* { dg-require-effective-target int128 } */
> > +/* { dg-require-effective-target power10_ok } */
> > +/* { dg-options "-mdejagnu-cpu=power10 -O2" } */
> > +
> > +/* This test makes sure that GCC generates the maddhd, maddhdu, and maddld
> > +   power9 instructions when doing some forms of 64-bit integers converted to
> > +   128-bit integers and used with multiply/add operations.  */
> > +
> > +__int128_t
> > +s_mult_add (long long a,
> > +   long long b,
> > +   long long c)
> > +{
> > +  /* maddhd, maddld.  */
> > +  return ((__int128_t)a * (__int128_t)b) + (__int128_t)c;
> > +}
> > +
> > +/* Test 32-bit constants that are loaded into GPRs instead of doing the
> > +   mulld/mulhd and then addic/addime or addc/addze.  */
> 
> addme ?

Yes, I meant addme.  Good catch.
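
For anyone unfamiliar with the instructions the test is looking for, here is a rough C model of the unsigned case.  It is a sketch based on my reading of the ISA (maddld giving the low doubleword and maddhdu the high doubleword of the 128-bit a*b+c), so treat that mapping as an assumption; the function names are made up, and unsigned __int128 is a GCC extension.

```c
#include <stdint.h>

/* Low 64 bits of the 128-bit value a * b + c (what maddld is assumed
   to produce).  */
static uint64_t madd_low(uint64_t a, uint64_t b, uint64_t c)
{
  unsigned __int128 r = (unsigned __int128) a * b + c;
  return (uint64_t) r;
}

/* High 64 bits of the same unsigned 128-bit value (what maddhdu is
   assumed to produce).  */
static uint64_t madd_high_unsigned(uint64_t a, uint64_t b, uint64_t c)
{
  unsigned __int128 r = (unsigned __int128) a * b + c;
  return (uint64_t) (r >> 64);
}
```

The signed variant (maddhd) would use __int128 with sign-extended operands in the same way.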

-- 
Michael Meissner, IBM
PO Box 98, Ayer, Massachusetts, USA, 01432
email: meiss...@linux.ibm.com

Re: [PATCH] [i386] recognize bzhi pattern when there's zero_extendsidi.

2022-05-17 Thread Hongtao Liu via Gcc-patches

On Tue, May 17, 2022 at 6:07 PM Uros Bizjak via Gcc-patches
 wrote:
>
> On Tue, May 17, 2022 at 5:06 AM liuhongt  wrote:
> >
> > backend has
> >
> > (define_insn "*bmi2_bzhi_3_2"
> >   [(set (match_operand:SWI48 0 "register_operand" "=r")
> >     (and:SWI48
> >       (plus:SWI48
> >         (ashift:SWI48 (const_int 1)
> >           (match_operand:QI 2 "register_operand" "r"))
> >         (const_int -1))
> >       (match_operand:SWI48 1 "nonimmediate_operand" "rm")))
> >    (clobber (reg:CC FLAGS_REG))]
> >   "TARGET_BMI2"
> >   "bzhi\t{%2, %1, %0|%0, %1, %2}"
> >   [(set_attr "type" "bitmanip")
> >    (set_attr "prefix" "vex")
> >    (set_attr "mode" "")])
> >
> > But there's extra zero_extend in pattern match.
> >
> > Failed to match this instruction:
> > (parallel [
> >     (set (reg:DI 90)
> >         (zero_extend:DI (and:SI (plus:SI (ashift:SI (const_int 1 [0x1])
> >                         (subreg:QI (reg:SI 98) 0))
> >                     (const_int -1 [0x]))
> >                 (subreg:SI (reg:DI 95) 0))))
> >     (clobber (reg:CC 17 flags))
> >     ])
> >
> > Add new define_insn for it.
> >
> >
> > Bootstrapped and regtested on x86_64-pc-linux-gnu{-m32,}..
> > Ok for trunk?
> >
> > gcc/ChangeLog:
> >
> > PR target/104375
> > * config/i386/i386.md (*bmi2_bzhi_zero_extendsidi_4): New
> > define_insn.
> >
> > gcc/testsuite/ChangeLog:
> >
> > * gcc.target/i386/pr104375.c: New test.
>
> OK with a nit below.
Thanks for the review, changed and committed.
>
> Thanks,
> Uros.
>
> > ---
> >  gcc/config/i386/i386.md  | 16 
> >  gcc/testsuite/gcc.target/i386/pr104375.c |  9 +
> >  2 files changed, 25 insertions(+)
> >  create mode 100644 gcc/testsuite/gcc.target/i386/pr104375.c
> >
> > diff --git a/gcc/config/i386/i386.md b/gcc/config/i386/i386.md
> > index f9c06ff302a..ec7bdd04947 100644
> > --- a/gcc/config/i386/i386.md
> > +++ b/gcc/config/i386/i386.md
> > @@ -16636,6 +16636,22 @@ (define_insn "*bmi2_bzhi_3_3"
> > (set_attr "prefix" "vex")
> > (set_attr "mode" "")])
> >
> > +(define_insn "*bmi2_bzhi_zero_extendsidi_4"
> > +  [(set (match_operand:DI 0 "register_operand" "=r")
> > +   (zero_extend:DI
> > + (and:SI
> > +   (plus:SI
> > + (ashift:SI (const_int 1)
> > +(match_operand:QI 2 "register_operand" "r"))
> > + (const_int -1))
> > +   (match_operand:SI 1 "nonimmediate_operand" "rm"
> > +   (clobber (reg:CC FLAGS_REG))]
> > +  "TARGET_BMI2 && TARGET_64BIT"
>
> Please put TARGET_64BIT first here.
>
> > +  "bzhi\t{%q2, %q1, %q0|%q0, %q1, %q2}"
> > +  [(set_attr "type" "bitmanip")
> > +   (set_attr "prefix" "vex")
> > +   (set_attr "mode" "DI")])
> > +
> >  (define_insn "bmi2_pdep_3"
> >[(set (match_operand:SWI48 0 "register_operand" "=r")
> >  (unspec:SWI48 [(match_operand:SWI48 1 "register_operand" "r")
> > diff --git a/gcc/testsuite/gcc.target/i386/pr104375.c 
> > b/gcc/testsuite/gcc.target/i386/pr104375.c
> > new file mode 100644
> > index 000..5c9f511da5c
> > --- /dev/null
> > +++ b/gcc/testsuite/gcc.target/i386/pr104375.c
> > @@ -0,0 +1,9 @@
> > +#/* { dg-do compile { target { ! ia32 } } } */
> > +/* { dg-options "-mbmi2 -O2" } */
> > +/* { dg-final { scan-assembler-times {(?n)shrx[\t ]+} 1 } } */
> > +/* { dg-final { scan-assembler-times {(?n)bzhi[\t ]+} 1 } } */
> > +
> > +unsigned long long bextr_u64(unsigned long long w, unsigned off, unsigned int len)
> > +{
> > +return (w >> off) & ((1U << len) - 1U);
> > +}
> > --
> > 2.18.1
> >



-- 
BR,
Hongtao
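
As a reading aid for the pattern above: the RTL shape and:SI (plus (ashift 1 n) -1) x is the usual "keep the low n bits" idiom, which is what bzhi implements.  The model below is only a sketch with hypothetical names; the behaviour for an index of 32 or more (source returned unchanged) reflects my understanding of the instruction and should be treated as an assumption.

```c
#include <stdint.h>

/* Software model of a 32-bit bzhi: zero the bits of src from position
   index upward.  Only the low 8 bits of index are considered.  */
static uint32_t bzhi32_model(uint32_t src, unsigned index)
{
  index &= 0xff;
  return index < 32 ? (src & ((1u << index) - 1u)) : src;
}

/* The test function from pr104375.c expressed through the model.  The
   32-bit masked value is zero-extended to 64 bits, which is the
   zero_extend:DI in the new pattern.  Valid for off < 64 and len < 32.  */
static uint64_t bextr_u64_model(uint64_t w, unsigned off, unsigned len)
{
  return (uint64_t) bzhi32_model((uint32_t) (w >> off), len);
}
```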

Re: [PATCH] Expand __builtin_memcmp_eq with ptest for OImode.

2022-05-17 Thread Hongtao Liu via Gcc-patches

On Tue, May 17, 2022 at 6:03 PM Uros Bizjak  wrote:
>
> On Tue, May 17, 2022 at 3:33 AM Hongtao Liu  wrote:
> >
> > On Mon, May 16, 2022 at 5:21 PM Uros Bizjak via Gcc-patches
> >  wrote:
> > >
> > > On Sat, May 7, 2022 at 7:05 AM liuhongt  wrote:
> > > >
> > > > This is adjusted patch only for OImode.
> > > >
> > > > Bootstrapped and regtested on x86_64-pc-linux-gnu{-m32,}.
> > > > Ok for trunk?
> > > >
> > > > gcc/ChangeLog:
> > > >
> > > > PR target/104610
> > > > * config/i386/i386-expand.cc (ix86_expand_branch): Use ptest
> > > > for QImode when code is EQ or NE.
> > > > * config/i386/sse.md (cbranch4): Extend to OImode.
> > > >
> > > > gcc/testsuite/ChangeLog:
> > > >
> > > > * gcc.target/i386/pr104610.c: New test.
> > > > ---
> > > >  gcc/config/i386/i386-expand.cc   | 10 +-
> > > >  gcc/config/i386/sse.md   |  8 ++--
> > > >  gcc/testsuite/gcc.target/i386/pr104610.c | 15 +++
> > > >  3 files changed, 30 insertions(+), 3 deletions(-)
> > > >  create mode 100644 gcc/testsuite/gcc.target/i386/pr104610.c
> > > >
> > > > diff --git a/gcc/config/i386/i386-expand.cc 
> > > > b/gcc/config/i386/i386-expand.cc
> > > > index bc806ffa283..c2f8776102c 100644
> > > > --- a/gcc/config/i386/i386-expand.cc
> > > > +++ b/gcc/config/i386/i386-expand.cc
> > > > @@ -2267,11 +2267,19 @@ ix86_expand_branch (enum rtx_code code, rtx 
> > > > op0, rtx op1, rtx label)
> > > >
> > > >/* Handle special case - vector comparsion with boolean result, 
> > > > transform
> > > >   it using ptest instruction.  */
> > > > -  if (GET_MODE_CLASS (mode) == MODE_VECTOR_INT)
> > > > +  if (GET_MODE_CLASS (mode) == MODE_VECTOR_INT
> > > > +  || (mode == OImode && (code == EQ || code == NE)))
> > >
> > > No need for the code check here. You have an assert in the code below.
> > >
> > Changed.
> > I misread QImode as OImode, and had thought that other comparison
> > codes could also be handled for OImode.
> > > >  {
> > > >rtx flag = gen_rtx_REG (CCZmode, FLAGS_REG);
> > > >machine_mode p_mode = GET_MODE_SIZE (mode) == 32 ? V4DImode : 
> > > > V2DImode;
> > > >
> > > > +  if (mode == OImode)
> > > > +   {
> > > > + op0 = lowpart_subreg (p_mode, force_reg (mode, op0), mode);
> > > > + op1 = lowpart_subreg (p_mode, force_reg (mode, op1), mode);
> > > > + mode = p_mode;
> > > > +   }
> > > > +
> > > >gcc_assert (code == EQ || code == NE);
> > >
> > > Please put the above hunk after the assert.
> > Changed.
> > >
> > > >/* Generate XOR since we can't check that one operand is zero 
> > > > vector.  */
> > > >tmp = gen_reg_rtx (mode);
> > > > diff --git a/gcc/config/i386/sse.md b/gcc/config/i386/sse.md
> > > > index 7b791def542..9514b8e0234 100644
> > > > --- a/gcc/config/i386/sse.md
> > > > +++ b/gcc/config/i386/sse.md
> > > > @@ -26034,10 +26034,14 @@ (define_expand 
> > > > "maskstore"
> > > >   (match_operand: 2 "register_operand")))]
> > > >"TARGET_AVX512BW")
> > > >
> > > > +(define_mode_iterator VI48_OI_AVX
> > > > +  [(V8SI "TARGET_AVX") (V4DI "TARGET_AVX") (OI "TARGET_AVX")
> > > > +   V4SI V2DI])
> > > > +
> > > >  (define_expand "cbranch4"
> > > >[(set (reg:CC FLAGS_REG)
> > > > -   (compare:CC (match_operand:VI48_AVX 1 "register_operand")
> > > > -   (match_operand:VI48_AVX 2 "nonimmediate_operand")))
> > > > +   (compare:CC (match_operand:VI48_OI_AVX 1 "register_operand")
> > > > +   (match_operand:VI48_OI_AVX 2 
> > > > "nonimmediate_operand")))
> > > > (set (pc) (if_then_else
> > > >(match_operator 0 "bt_comparison_operator"
> > > > [(reg:CC FLAGS_REG) (const_int 0)])
> > >
> > > Please rather put the new cbranchoi4 expander in i386.md.
> > Good idea, changed.
> > >
> > > > diff --git a/gcc/testsuite/gcc.target/i386/pr104610.c 
> > > > b/gcc/testsuite/gcc.target/i386/pr104610.c
> > > > new file mode 100644
> > > > index 000..00866238bd7
> > > > --- /dev/null
> > > > +++ b/gcc/testsuite/gcc.target/i386/pr104610.c
> > > > @@ -0,0 +1,15 @@
> > > > +/* { dg-do compile } */
> > > > +/* { dg-options "-O2 -mmove-max=256 -mstore-max=256" } */
> > > > +/* { dg-final { scan-assembler-times {(?n)vptest.*ymm} 1 } } */
> > > > +/* { dg-final { scan-assembler-times {sete} 1 } } */
> > > > +/* { dg-final { scan-assembler-not {(?n)je.*L[0-9]} } } */
> > > > +/* { dg-final { scan-assembler-not {(?n)jne.*L[0-9]} } } */
> > > > +
> > > > +
> > > > +#include
> > > > +__attribute__((target("avx")))
> > > > +bool f256(char *a)
> > >
> > > Use _Bool instead and simply pass -mavx to dg-options.
> > >
> > Changed.
> > > Uros.
> > >
> > > > +{
> > > > +  char t[] = "0123456789012345678901234567890";
> > > > +  return __builtin_memcmp(a, &t[0], sizeof(t)) == 0;
> > > > +}
> > > > --
> > > > 2.18.1
> > > >
> >
> >
> > Here's the updated patch.
>
>
>gcc_assert (code == EQ || code == NE);
> +

Re: [PATCH] [Middle-end] Enhance final_value_replacement_loop to handle bitwise induction.

2022-05-17 Thread Hongtao Liu via Gcc-patches

On Fri, May 13, 2022 at 7:16 PM Richard Biener
 wrote:
>
> On Fri, May 13, 2022 at 5:37 AM Hongtao Liu  wrote:
> >
> > On Wed, May 11, 2022 at 4:45 PM Richard Biener via Gcc-patches
> >  wrote:
> > >
> > > On Mon, May 9, 2022 at 7:19 AM liuhongt  wrote:
> > > >
> > > > This patch will enable below optimization:
> > > >
> > > >  {
> > > > -  int bit;
> > > > -  long long unsigned int _1;
> > > > -  long long unsigned int _2;
> > > > -
> > > > [local count: 46707768]:
> > > > -
> > > > -   [local count: 1027034057]:
> > > > -  # tmp_11 = PHI 
> > > > -  # bit_13 = PHI 
> > > > -  _1 = 1 << bit_13;
> > > > -  _2 = ~_1;
> > > > -  tmp_8 = _2 & tmp_11;
> > > > -  bit_9 = bit_13 + -3;
> > > > -  if (bit_9 != -3(OVF))
> > > > -goto ; [95.65%]
> > > > -  else
> > > > -goto ; [4.35%]
> > > > -
> > > > -   [local count: 46707768]:
> > > > -  return tmp_8;
> > > > +  tmp_12 = tmp_6(D) & 7905747460161236406;
> > > > +  return tmp_12;
> > > >
> > > >  }
> > > >
> > > >
> > > > Bootstrapped and regtested on x86_64-pc-linux-gnu{-m32,}
> > > > Ok for trunk?
> > > >
> > > > gcc/ChangeLog:
> > > >
> > > > PR middle-end/103462
> > > > * match.pd (bitwise_induction_p): New match.
> > > > * tree-scalar-evolution.c (gimple_bitwise_induction_p):
> > > > Declare.
> > > > (analyze_and_compute_bitwise_induction_effect): New function.
> > > > (enum bit_op_kind): New enum.
> > > > (final_value_replacement_loop): Enhanced to handle bitwise
> > > > induction.
> > > >
> > > > gcc/testsuite/ChangeLog:
> > > >
> > > > * gcc.target/i386/pr103462-1.c: New test.
> > > > * gcc.target/i386/pr103462-2.c: New test.
> > > > * gcc.target/i386/pr103462-3.c: New test.
> > > > * gcc.target/i386/pr103462-4.c: New test.
> > > > * gcc.target/i386/pr103462-5.c: New test.
> > > > * gcc.target/i386/pr103462-6.c: New test.
> > > > ---
> > > >  gcc/match.pd   |   7 +
> > > >  gcc/testsuite/gcc.target/i386/pr103462-1.c | 111 +
> > > >  gcc/testsuite/gcc.target/i386/pr103462-2.c |  45 ++
> > > >  gcc/testsuite/gcc.target/i386/pr103462-3.c | 111 +
> > > >  gcc/testsuite/gcc.target/i386/pr103462-4.c |  46 ++
> > > >  gcc/testsuite/gcc.target/i386/pr103462-5.c | 111 +
> > > >  gcc/testsuite/gcc.target/i386/pr103462-6.c |  46 ++
> > > >  gcc/tree-scalar-evolution.cc   | 178 -
> > > >  8 files changed, 654 insertions(+), 1 deletion(-)
> > > >  create mode 100644 gcc/testsuite/gcc.target/i386/pr103462-1.c
> > > >  create mode 100644 gcc/testsuite/gcc.target/i386/pr103462-2.c
> > > >  create mode 100644 gcc/testsuite/gcc.target/i386/pr103462-3.c
> > > >  create mode 100644 gcc/testsuite/gcc.target/i386/pr103462-4.c
> > > >  create mode 100644 gcc/testsuite/gcc.target/i386/pr103462-5.c
> > > >  create mode 100644 gcc/testsuite/gcc.target/i386/pr103462-6.c
> > > >
> > > > diff --git a/gcc/match.pd b/gcc/match.pd
> > > > index 6d691d302b3..24ff5f9e6a8 100644
> > > > --- a/gcc/match.pd
> > > > +++ b/gcc/match.pd
> > > > @@ -7746,3 +7746,10 @@ and,
> > > >== TYPE_UNSIGNED (TREE_TYPE (@3
> > > > && single_use (@4)
> > > > && single_use (@5
> > > > +
> > > > +(for bit_op (bit_and bit_ior bit_xor)
> > > > + (match (bitwise_induction_p @0 @2 @3)
> > > > +   (bit_op:c (nop_convert1? (bit_not2?@0 (convert3? (lshift 
> > > > integer_onep@1 @2 @3)))
> > > > +
> > > > +(match (bitwise_induction_p @0 @2 @3)
> > > > +  (bit_not (nop_convert1? (bit_xor@0 (convert2? (lshift integer_onep@1 
> > > > @2)) @3
> > > > > diff --git a/gcc/testsuite/gcc.target/i386/pr103462-1.c b/gcc/testsuite/gcc.target/i386/pr103462-1.c
> > > > new file mode 100644
> > > > index 000..1dc4c2acad6
> > > > --- /dev/null
> > > > +++ b/gcc/testsuite/gcc.target/i386/pr103462-1.c
> > > > @@ -0,0 +1,111 @@
> > > > +/* { dg-do compile } */
> > > > +/* { dg-options "-O1 -fdump-tree-sccp-details" } */
> > > > > +/* { dg-final { scan-tree-dump-times {final value replacement} 12 "sccp" } } */
> > > > +
> > > > +unsigned long long
> > > > +__attribute__((noipa))
> > > > +foo (unsigned long long tmp)
> > > > +{
> > > > +  for (int bit = 0; bit < 64; bit += 3)
> > > > +tmp &= ~(1ULL << bit);
> > > > +  return tmp;
> > > > +}
> > > > +
> > > > +unsigned long long
> > > > +__attribute__((noipa))
> > > > +foo1 (unsigned long long tmp)
> > > > +{
> > > > +  for (int bit = 63; bit >= 0; bit -= 3)
> > > > +tmp &= ~(1ULL << bit);
> > > > +  return tmp;
> > > > +}
> > > > +
> > > > +unsigned long long
> > > > +__attribute__((noipa))
> > > > +foo2 (unsigned long long tmp)
> > > > +{
> > > > +  for (int bit = 0; bit < 64; bit += 3)
> > > > +tmp &= (1ULL << bit);
> > > > +  return tmp;
> > > > +}
> > > > +
> > > > +unsigned long long
> > > > +__attribute__((noipa))
> > > > +foo3 (unsigned long long tmp)
> > > > +{
> > > > +  f

Re: [PATCH v2] rs6000: Prefer assigning the MMA vector operands to altivec registers [PR105556]

2022-05-17 Thread Peter Bergner via Gcc-patches

On 5/17/22 6:41 PM, Segher Boessenkool wrote:
> On Mon, May 16, 2022 at 05:31:31PM -0500, Peter Bergner wrote:
>>  (define_insn "mma_"
>> -  [(set (match_operand:XO 0 "fpr_reg_operand" "=&d")
>> -(unspec:XO [(match_operand:V16QI 1 "vsx_register_operand" "wa")
>> -(match_operand:V16QI 2 "vsx_register_operand" "wa")]
>> +  [(set (match_operand:XO 0 "fpr_reg_operand" "=&d,&d")
>> +(unspec:XO [(match_operand:V16QI 1 "vsx_register_operand" "v,?wa")
>> +(match_operand:V16QI 2 "vsx_register_operand" "v,?wa")]
> 
> You now have two "?" on alternative 1, instead of just one.  This is the
> same as if you had had
>   [(set (match_operand:XO 0 "fpr_reg_operand" "=&d,&d")
>   (unspec:XO [(match_operand:V16QI 1 "vsx_register_operand" "v,??wa")
>   (match_operand:V16QI 2 "vsx_register_operand" "v,wa")]
> The "?" are per alternative, not really per operand.  It won't change
> much here of course, just penalise more than you perhaps expected.

Ah, ok.  I think giving an extra penalty is fine here, since we really
really want to use altivec regs here, so I went with the patch as is.
Pushed.

I'll wait a few days before backporting to GCC 12.  As for GCC 11 & 10,
I'll wait until someone actually has a test case that shows the same
problem.  I suspect a patch went into GCC 12 that changed the costs
slightly and that's why we don't see the problem on the older branches.
Thanks!

Peter

[PATCH] c++: fix SIGFPE with -Wclass-memaccess [PR105634]

2022-05-17 Thread Marek Polacek via Gcc-patches

Here we crash because we attempt to % by 0.  Thus fixed.
While at it, I've moved the -Wclass-memaccess tests into warn/.
I've checked that the # of expected passes is the same before/after
the move.
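
As a rough illustration of the guard only (not the code in call.cc), the pattern can be sketched in plain C. The helper name and its "return 0 when there is nothing to check" convention are assumptions made for this example.

```c
#include <assert.h>
#include <stddef.h>

/* Hypothetical helper: how many bytes of a SIZE-byte access are left
   over after the last whole object of TYPESIZE bytes.  */
size_t partial_bytes (size_t size, size_t typesize)
{
  /* A zero-sized type would make the % below divide by zero, which
     typically raises SIGFPE, so bail out before computing it.  */
  if (typesize == 0)
    return 0;
  return size % typesize;
}
```

The point is simply that the zero check has to come before the remainder is evaluated; testing the result afterwards would be too late.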

Bootstrapped/regtested on x86_64-pc-linux-gnu, ok for trunk?

PR c++/105634

gcc/cp/ChangeLog:

* call.cc (maybe_warn_class_memaccess): Avoid % by zero.

gcc/testsuite/ChangeLog:

* g++.dg/Wclass-memaccess-2.C: Moved to...
* g++.dg/warn/Wclass-memaccess-2.C: ...here.
* g++.dg/Wclass-memaccess-3.C: Moved to...
* g++.dg/warn/Wclass-memaccess-3.C: ...here.
* g++.dg/Wclass-memaccess-4.C: Moved to...
* g++.dg/warn/Wclass-memaccess-4.C: ...here.
* g++.dg/Wclass-memaccess-5.C: Moved to...
* g++.dg/warn/Wclass-memaccess-5.C: ...here.
* g++.dg/Wclass-memaccess-6.C: Moved to...
* g++.dg/warn/Wclass-memaccess-6.C: ...here.
* g++.dg/Wclass-memaccess.C: Moved to...
* g++.dg/warn/Wclass-memaccess.C: ...here.
* g++.dg/warn/Wclass-memaccess-7.C: New test.
---
 gcc/cp/call.cc  |  2 ++
 .../g++.dg/{ => warn}/Wclass-memaccess-2.C  |  0
 .../g++.dg/{ => warn}/Wclass-memaccess-3.C  |  0
 .../g++.dg/{ => warn}/Wclass-memaccess-4.C  |  0
 .../g++.dg/{ => warn}/Wclass-memaccess-5.C  |  0
 .../g++.dg/{ => warn}/Wclass-memaccess-6.C  |  0
 gcc/testsuite/g++.dg/warn/Wclass-memaccess-7.C  | 13 +
 gcc/testsuite/g++.dg/{ => warn}/Wclass-memaccess.C  |  0
 8 files changed, 15 insertions(+)
 rename gcc/testsuite/g++.dg/{ => warn}/Wclass-memaccess-2.C (100%)
 rename gcc/testsuite/g++.dg/{ => warn}/Wclass-memaccess-3.C (100%)
 rename gcc/testsuite/g++.dg/{ => warn}/Wclass-memaccess-4.C (100%)
 rename gcc/testsuite/g++.dg/{ => warn}/Wclass-memaccess-5.C (100%)
 rename gcc/testsuite/g++.dg/{ => warn}/Wclass-memaccess-6.C (100%)
 create mode 100644 gcc/testsuite/g++.dg/warn/Wclass-memaccess-7.C
 rename gcc/testsuite/g++.dg/{ => warn}/Wclass-memaccess.C (100%)

diff --git a/gcc/cp/call.cc b/gcc/cp/call.cc
index 0240e364324..14c6037729f 100644
--- a/gcc/cp/call.cc
+++ b/gcc/cp/call.cc
@@ -10329,6 +10329,8 @@ maybe_warn_class_memaccess (location_t loc, tree fndecl,
  /* Finally, warn on partial copies.  */
  unsigned HOST_WIDE_INT typesize
= tree_to_uhwi (TYPE_SIZE_UNIT (desttype));
+ if (typesize == 0)
+   break;
  if (unsigned HOST_WIDE_INT partial = tree_to_uhwi (sz) % typesize)
warned = warning_at (loc, OPT_Wclass_memaccess,
 (typesize - partial > 1
diff --git a/gcc/testsuite/g++.dg/Wclass-memaccess-2.C b/gcc/testsuite/g++.dg/warn/Wclass-memaccess-2.C
similarity index 100%
rename from gcc/testsuite/g++.dg/Wclass-memaccess-2.C
rename to gcc/testsuite/g++.dg/warn/Wclass-memaccess-2.C
diff --git a/gcc/testsuite/g++.dg/Wclass-memaccess-3.C b/gcc/testsuite/g++.dg/warn/Wclass-memaccess-3.C
similarity index 100%
rename from gcc/testsuite/g++.dg/Wclass-memaccess-3.C
rename to gcc/testsuite/g++.dg/warn/Wclass-memaccess-3.C
diff --git a/gcc/testsuite/g++.dg/Wclass-memaccess-4.C b/gcc/testsuite/g++.dg/warn/Wclass-memaccess-4.C
similarity index 100%
rename from gcc/testsuite/g++.dg/Wclass-memaccess-4.C
rename to gcc/testsuite/g++.dg/warn/Wclass-memaccess-4.C
diff --git a/gcc/testsuite/g++.dg/Wclass-memaccess-5.C b/gcc/testsuite/g++.dg/warn/Wclass-memaccess-5.C
similarity index 100%
rename from gcc/testsuite/g++.dg/Wclass-memaccess-5.C
rename to gcc/testsuite/g++.dg/warn/Wclass-memaccess-5.C
diff --git a/gcc/testsuite/g++.dg/Wclass-memaccess-6.C b/gcc/testsuite/g++.dg/warn/Wclass-memaccess-6.C
similarity index 100%
rename from gcc/testsuite/g++.dg/Wclass-memaccess-6.C
rename to gcc/testsuite/g++.dg/warn/Wclass-memaccess-6.C
diff --git a/gcc/testsuite/g++.dg/warn/Wclass-memaccess-7.C b/gcc/testsuite/g++.dg/warn/Wclass-memaccess-7.C
new file mode 100644
index 000..7e86b248629
--- /dev/null
+++ b/gcc/testsuite/g++.dg/warn/Wclass-memaccess-7.C
@@ -0,0 +1,13 @@
+// PR c++/105634
+// { dg-do compile { target { c++11 } } }
+// { dg-options "-Wall" }
+
+struct s
+{
+  struct {} a[] = 1.0; // { dg-error "" }
+  void f (char *c)
+  {
+s s;
+__builtin_memcpy (&s, c, sizeof(s));
+  }
+};
diff --git a/gcc/testsuite/g++.dg/Wclass-memaccess.C b/gcc/testsuite/g++.dg/warn/Wclass-memaccess.C
similarity index 100%
rename from gcc/testsuite/g++.dg/Wclass-memaccess.C
rename to gcc/testsuite/g++.dg/warn/Wclass-memaccess.C

base-commit: c9852156dd2fedec130f6d8eb669579ef6237946
-- 
2.36.1

[PATCH] c: Implement new -Wenum-int-mismatch warning [PR105131]

2022-05-17 Thread Marek Polacek via Gcc-patches

In C, an enumerated type is compatible with char, a signed integer type,
or an unsigned integer type (6.7.2.2/5).  Therefore this code compiles:

  enum E { l = -1, z = 0, g = 1 };
  int foo(void);
  enum E foo(void) { return z; }

if the underlying type of 'enum E' is 'int' (if not, we emit an error).
This is different for typedefs, where C11 permits typedefs to be
redeclared to the same type, but not to compatible types.  In C++, the
code above is invalid.
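
To make the portability concern concrete, here is a small sketch (not part of the patch) that asks the compiler which integer type it treats as compatible with an enum. The enum U, the macro, and the function names are invented for this example; the list of candidate types is an assumption and is not exhaustive.

```c
#include <assert.h>
#include <string.h>

enum E { l = -1, z = 0, g = 1 };   /* has a negative enumerator */
enum U { a = 0, b = 1 };           /* hypothetical: no negative enumerators */

/* Name the integer type that the compiler treats as compatible with
   the type of X.  _Generic selects by type compatibility, so an enum
   matches whichever integer type the implementation chose for it.  */
#define COMPATIBLE_TYPE(x) _Generic ((x), \
    char: "char", signed char: "signed char", \
    unsigned char: "unsigned char", \
    short: "short", unsigned short: "unsigned short", \
    int: "int", unsigned int: "unsigned int", \
    long: "long", unsigned long: "unsigned long", \
    default: "other")

const char *compatible_with_E (void) { return COMPATIBLE_TYPE ((enum E) 0); }
const char *compatible_with_U (void) { return COMPATIBLE_TYPE ((enum U) 0); }
```

With GCC's default options the two calls would normally be expected to report different types (int for E, unsigned int for U), but that choice is implementation-defined, which is exactly why a declaration pair such as "int foo(void);" and "enum E foo(void)" may compile on one implementation and fail on another.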

It seems desirable to emit a warning in the C case, because it is
probably a mistake and definitely a portability error, given that the
choice of the underlying type is implementation-defined.

To that end, this patch implements a new -Wenum-int-mismatch warning.
Conveniently, we already have comptypes_check_enum_int to detect such
mismatches.

Bootstrapped/regtested on x86_64-pc-linux-gnu, ok for trunk?

PR c/105131

gcc/c-family/ChangeLog:

* c.opt (Wenum-int-mismatch): New.

gcc/c/ChangeLog:

* c-decl.cc (diagnose_mismatched_decls): Warn about enum/integer type
mismatches.
* c-tree.h (comptypes_check_enum_int): Declare.
* c-typeck.cc (comptypes): No longer static.

gcc/ChangeLog:

* doc/invoke.texi: Document -Wenum-int-mismatch.

gcc/testsuite/ChangeLog:

* gcc.dg/Wenum-int-mismatch-1.c: New test.
* gcc.dg/Wenum-int-mismatch-2.c: New test.
---
 gcc/c-family/c.opt  |  4 +++
 gcc/c/c-decl.cc | 13 ++--
 gcc/c/c-tree.h  |  1 +
 gcc/c/c-typeck.cc   |  2 +-
 gcc/doc/invoke.texi | 20 
 gcc/testsuite/gcc.dg/Wenum-int-mismatch-1.c | 35 +
 gcc/testsuite/gcc.dg/Wenum-int-mismatch-2.c | 35 +
 7 files changed, 107 insertions(+), 3 deletions(-)
 create mode 100644 gcc/testsuite/gcc.dg/Wenum-int-mismatch-1.c
 create mode 100644 gcc/testsuite/gcc.dg/Wenum-int-mismatch-2.c

diff --git a/gcc/c-family/c.opt b/gcc/c-family/c.opt
index 035b1de0d84..0cb64283261 100644
--- a/gcc/c-family/c.opt
+++ b/gcc/c-family/c.opt
@@ -638,6 +638,10 @@ Wenum-conversion
 C ObjC C++ ObjC++ Var(warn_enum_conversion) Init(0) Warning LangEnabledBy(C ObjC,Wextra)
 Warn about implicit conversion of enum types.
 
+Wenum-int-mismatch
+C ObjC Var(warn_enum_int_mismatch) Warning LangEnabledBy(C ObjC,Wall)
+Warn about enum/integer type mismatches.
+
 Werror
 C ObjC C++ ObjC++
 ; Documented in common.opt
diff --git a/gcc/c/c-decl.cc b/gcc/c/c-decl.cc
index 83655548fc4..5266a61b859 100644
--- a/gcc/c/c-decl.cc
+++ b/gcc/c/c-decl.cc
@@ -1993,9 +1993,12 @@ diagnose_mismatched_decls (tree newdecl, tree olddecl,
 
   bool pedwarned = false;
   bool warned = false;
+  bool enum_and_int_p = false;
   auto_diagnostic_group d;
 
-  if (!comptypes (oldtype, newtype))
+  int comptypes_result = comptypes_check_enum_int (oldtype, newtype,
+  &enum_and_int_p);
+  if (!comptypes_result)
 {
   if (TREE_CODE (olddecl) == FUNCTION_DECL
  && fndecl_built_in_p (olddecl, BUILT_IN_NORMAL)
@@ -2137,6 +2140,13 @@ diagnose_mismatched_decls (tree newdecl, tree olddecl,
  return false;
}
 }
+  /* Warn about enum/integer type mismatches.  They are compatible types
+ (C2X 6.7.2.2/5), but may pose portability problems.  */
+  else if (enum_and_int_p && TREE_CODE (newdecl) != TYPE_DECL)
+warned = warning_at (DECL_SOURCE_LOCATION (newdecl),
+OPT_Wenum_int_mismatch,
+"conflicting types for %q+D due to enum/integer "
+"mismatch; have %qT", newdecl, newtype);
 
   /* Redeclaration of a type is a constraint violation (6.7.2.3p1),
  but silently ignore the redeclaration if either is in a system
@@ -2146,7 +2156,6 @@ diagnose_mismatched_decls (tree newdecl, tree olddecl,
   if (TREE_CODE (newdecl) == TYPE_DECL)
 {
   bool types_different = false;
-  int comptypes_result;
 
   comptypes_result
= comptypes_check_different_types (oldtype, newtype, &types_different);
diff --git a/gcc/c/c-tree.h b/gcc/c/c-tree.h
index c70f0ba5ab6..2bcb9662620 100644
--- a/gcc/c/c-tree.h
+++ b/gcc/c/c-tree.h
@@ -685,6 +685,7 @@ extern tree require_complete_type (location_t, tree);
 extern bool same_translation_unit_p (const_tree, const_tree);
 extern int comptypes (tree, tree);
 extern int comptypes_check_different_types (tree, tree, bool *);
+extern int comptypes_check_enum_int (tree, tree, bool *);
 extern bool c_vla_type_p (const_tree);
 extern bool c_mark_addressable (tree, bool = false);
 extern void c_incomplete_type_error (location_t, const_tree, const_tree);
diff --git a/gcc/c/c-typeck.cc b/gcc/c/c-typeck.cc
index bcfe08b82bc..4f3611f1b89 100644
--- a/gcc/c/c-typeck.cc
+++ b/gcc/c/c-typeck.cc
@@ -1055,7 +1055,7 @@ comptypes (tree type1, tree type2)
 /* Like comptypes, but if it returns non-zero because enum a

[PATCH v3] c, c++: -Wswitch warning on [[maybe_unused]] enumerator [PR105497]

2022-05-17 Thread Marek Polacek via Gcc-patches

On Tue, May 10, 2022 at 09:54:12AM -0400, Marek Polacek wrote:
> On Tue, May 10, 2022 at 08:58:46AM -0400, Jason Merrill wrote:
> > On 5/7/22 18:26, Marek Polacek wrote:
> > > Corrected version that avoids an uninitialized warning:
> > > 
> > > This PR complains that we emit the "enumeration value not handled in
> > > switch" warning even though the enumerator was marked with the
> > > [[maybe_unused]] attribute.
> > > 
> > > The first snag was that I couldn't just check TREE_USED, because
> > > the enumerator could have been used earlier in the function, which
> > > doesn't play well with the c_do_switch_warnings warning.  Instead,
> > > I had to check the attributes on the CONST_DECL directly, which led
> > > to the second, and worse, snag: in C we don't have direct access to
> > > the CONST_DECL for the enumerator.
> > 
> > I wonder if you want to change that instead of working around it?
> 
> I wouldn't mind looking into that; I've hit this discrepancy numerous
> times throughout the years and it'd be good to unify it so that the
> c-common code doesn't need to hack around it.
>  
> Let's see how far I'll get...
 
Now done (r13-575), which makes this patch a piece of cake.

Bootstrapped/regtested on x86_64-pc-linux-gnu, ok for trunk?

-- >8 --
This PR complains that we emit the "enumeration value not handled in
switch" warning even though the enumerator was marked with the
[[maybe_unused]] attribute.

I couldn't just check TREE_USED, because the enumerator could have been
used earlier in the function, which doesn't play well with the
c_do_switch_warnings warning.  Instead, I had to check the attributes on
the CONST_DECL.  This is easy since the TYPE_VALUES of an enum type are
now consistent between C and C++, both of which store the CONST_DECL in
its TREE_VALUE.
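
A minimal sketch of the kind of code this is meant to help, assuming a compiler that accepts attributes on enumerators (a GNU extension in C). The enum and function names are made up for this example and do not come from the testsuite.

```c
#include <assert.h>

enum color
{
  RED,
  GREEN,
  COLOR_COUNT __attribute__ ((unused))   /* sentinel, never a real value */
};

/* The switch has no case for COLOR_COUNT.  With the change described
   above, -Wswitch is expected to stay quiet about it because the
   enumerator carries the unused attribute.  */
int is_warm (enum color c)
{
  switch (c)
    {
    case RED:
      return 1;
    case GREEN:
      return 0;
    }
  return -1;   /* reached only for values without a case label */
}
```

Note that the attribute only affects the diagnostic; the generated code still falls through to the final return for the sentinel value.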

PR c++/105497

gcc/c-family/ChangeLog:

* c-warn.cc (c_do_switch_warnings): Don't warn about unhandled
enumerator when it was marked with attribute unused.

gcc/testsuite/ChangeLog:

* c-c++-common/Wswitch-1.c: New test.
* g++.dg/warn/Wswitch-4.C: New test.
---
 gcc/c-family/c-warn.cc | 11 +-
 gcc/testsuite/c-c++-common/Wswitch-1.c | 29 ++
 gcc/testsuite/g++.dg/warn/Wswitch-4.C  | 52 ++
 3 files changed, 90 insertions(+), 2 deletions(-)
 create mode 100644 gcc/testsuite/c-c++-common/Wswitch-1.c
 create mode 100644 gcc/testsuite/g++.dg/warn/Wswitch-4.C

diff --git a/gcc/c-family/c-warn.cc b/gcc/c-family/c-warn.cc
index cae89294aea..ea7335f3edf 100644
--- a/gcc/c-family/c-warn.cc
+++ b/gcc/c-family/c-warn.cc
@@ -1738,8 +1738,8 @@ c_do_switch_warnings (splay_tree cases, location_t switch_location,
   for (chain = TYPE_VALUES (type); chain; chain = TREE_CHAIN (chain))
 {
   tree value = TREE_VALUE (chain);
-  if (TREE_CODE (value) == CONST_DECL)
-   value = DECL_INITIAL (value);
+  tree attrs = DECL_ATTRIBUTES (value);
+  value = DECL_INITIAL (value);
   node = splay_tree_lookup (cases, (splay_tree_key) value);
   if (node)
{
@@ -1769,6 +1769,13 @@ c_do_switch_warnings (splay_tree cases, location_t switch_location,
   /* We've now determined that this enumerated literal isn't
 handled by the case labels of the switch statement.  */
 
+  /* Don't warn if the enumerator was marked as unused.  We can't use
+TREE_USED here: it could have been set on the enumerator if the
+enumerator was used earlier.  */
+  if (lookup_attribute ("unused", attrs)
+ || lookup_attribute ("maybe_unused", attrs))
+   continue;
+
   /* If the switch expression is a constant, we only really care
 about whether that constant is handled by the switch.  */
   if (cond && tree_int_cst_compare (cond, value))
diff --git a/gcc/testsuite/c-c++-common/Wswitch-1.c b/gcc/testsuite/c-c++-common/Wswitch-1.c
new file mode 100644
index 000..de9ee03b0a3
--- /dev/null
+++ b/gcc/testsuite/c-c++-common/Wswitch-1.c
@@ -0,0 +1,29 @@
+/* PR c++/105497 */
+/* { dg-options "-Wswitch" } */
+
+enum E {
+  A,
+  B,
+  C __attribute((unused)),
+  D
+};
+
+void
+g (enum E e)
+{
+  switch (e)
+{
+case A:
+case B:
+case D:
+  break;
+}
+
+  switch (e) // { dg-warning "not handled in switch" }
+{
+case A:
+case B:
+case C:
+  break;
+}
+}
diff --git a/gcc/testsuite/g++.dg/warn/Wswitch-4.C b/gcc/testsuite/g++.dg/warn/Wswitch-4.C
new file mode 100644
index 000..553a57d777b
--- /dev/null
+++ b/gcc/testsuite/g++.dg/warn/Wswitch-4.C
@@ -0,0 +1,52 @@
+// PR c++/105497
+// { dg-do compile { target c++11 } }
+// { dg-options "-Wswitch" }
+
+enum class Button
+{
+Left,
+Right,
+Middle,
+NumberOfButtons [[maybe_unused]]
+};
+
+enum class Sound
+{
+  Bark,
+  Meow,
+  Hiss,
+  Moo __attribute((unused))
+};
+
+enum class Chordata
+{
+  Urochordata,
+  Cephalochordata,
+  Vertebrata
+};
+
+int main()
+{
+  Button b = Button::Left;
+  switch

Re: [PATCH v2] rs6000: Prefer assigning the MMA vector operands to altivec registers [PR105556]

2022-05-17 Thread Segher Boessenkool

On Mon, May 16, 2022 at 05:31:31PM -0500, Peter Bergner wrote:
> On 5/10/22 5:35 PM, Segher Boessenkool wrote:
> > Out of interest, did you try using v,?wa (so just two alternatives, not
> > four)?  Or did you think it would result in measurably worse code?  Or
> > did you decide it is not such bad backend code size explosion after
> > all :-)
> 
> So I tried using just "v,?wa" instead of the 4 alternative "v,v,?d,?d"
> version and that fixes the performance issue too and is simpler too.
> The other option is "better", in that it can allow one operand to get
> a "v" reg when the other gets a "d" reg, but I think that's just a
> micro-optimization and not worth the extra complexity in the pattern.
> Thanks for the suggestion!

The difference is that "v,?wa" does not distinguish between one lower
VSR being used and several.  But whenever that would matter, there is
already so much register allocation pressure that it does not change
anything materially.

> gcc/
>   PR target/105556
>   * config/rs6000/mma.md (mma_, mma_, mma_, mma_,
>   mma_, mma_, mma_, mma_,
>   mma_, mma_, mma_, mma_,
>   mma_, mma_): Replace "wa" constraints with "v,?wa".
>   Update other operands accordingly.

>  (define_insn "mma_"
> -  [(set (match_operand:XO 0 "fpr_reg_operand" "=&d")
> - (unspec:XO [(match_operand:V16QI 1 "vsx_register_operand" "wa")
> - (match_operand:V16QI 2 "vsx_register_operand" "wa")]
> +  [(set (match_operand:XO 0 "fpr_reg_operand" "=&d,&d")
> + (unspec:XO [(match_operand:V16QI 1 "vsx_register_operand" "v,?wa")
> + (match_operand:V16QI 2 "vsx_register_operand" "v,?wa")]

You now have two "?" on alternative 1, instead of just one.  This is the
same as if you had had
  [(set (match_operand:XO 0 "fpr_reg_operand" "=&d,&d")
(unspec:XO [(match_operand:V16QI 1 "vsx_register_operand" "v,??wa")
(match_operand:V16QI 2 "vsx_register_operand" "v,wa")]
The "?" are per alternative, not really per operand.  It won't change
much here of course, just penalise more than you perhaps expected.
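
To spell out the "per alternative" point, here is a toy model in C that simply counts the "?" characters belonging to one alternative across all operands of a pattern. It only looks at the constraint strings; it is not how the register allocator computes costs, and both function names are invented for this sketch.

```c
#include <assert.h>
#include <stddef.h>

/* Count the '?' characters in alternative ALT (0-based) of a single
   operand's comma-separated constraint string.  */
int disparagement_in_operand (const char *constraint, int alt)
{
  int current = 0, count = 0;
  for (const char *p = constraint; *p; p++)
    {
      if (*p == ',')
        current++;                 /* next alternative starts here */
      else if (current == alt && *p == '?')
        count++;
    }
  return count;
}

/* Sum the '?' count of alternative ALT over all N operands.  */
int disparagement_of_alternative (const char *const *constraints,
                                  size_t n, int alt)
{
  int total = 0;
  for (size_t i = 0; i < n; i++)
    total += disparagement_in_operand (constraints[i], alt);
  return total;
}
```

Fed the three constraint strings from the patch ("=&d,&d", "v,?wa", "v,?wa"), alternative 1 totals two; the rewritten form ("=&d,&d", "v,??wa", "v,wa") totals two as well, which is the equivalence described above.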

With or without that changed: okay for trunk and for 12 (after the usual
cooldown).  Thanks!  Also okay for 11 and 10, should you want that later
anyway.

Segher

Go patch committed: load LHS subexpressions of op= only once

2022-05-17 Thread Ian Lance Taylor via Gcc-patches

This patch to the Go frontend loads LHS subexpressions of op=
assignment only once.  This avoids inconsistencies if the variables
are changed by evaluating the RHS.  This fixes
https://go.dev/issue/52811.  Bootstrapped and ran Go testsuite on
x86_64-pc-linux-gnu.  Committed to mainline.  There is a test case in
the main Go repo that will be copied over at some point.
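
As an analogy only, here is the idea written out in C rather than Go: the lowered form of an op= assignment reads the LHS index into a temporary once, so the load and the store hit the same element even if the RHS changes the index variable. All names below are hypothetical and none of this is frontend code.

```c
#include <assert.h>

int idx;   /* index variable named on the LHS (hypothetical) */

/* RHS with a side effect on the variable that the LHS names.  */
int bump_and_get (void)
{
  idx = 1;
  return 10;
}

/* Hand-lowered form of "a[idx] += bump_and_get ()".  */
void add_assign_lowered (int *a)
{
  int saved = idx;             /* load the LHS subexpression once */
  int rhs = bump_and_get ();   /* may change idx */
  a[saved] = a[saved] + rhs;   /* read and write the same slot */
}
```

Without the temporary, the read of a[idx] and the write to a[idx] could refer to different elements, which is the kind of inconsistency the patch description mentions.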

Ian
88c35c4d23e2458687508187601f8d2b8570bbe3
diff --git a/gcc/go/gofrontend/MERGE b/gcc/go/gofrontend/MERGE
index daa725f9de9..5fa8becde3e 100644
--- a/gcc/go/gofrontend/MERGE
+++ b/gcc/go/gofrontend/MERGE
@@ -1,4 +1,4 @@
-f5bc28a30b7503015bbef38afb5812313184e822
+9d07072e58ca4f9f05343dfd3475b9f49dae5ec5
 
 The first line of this file holds the git revision number of the last
 merge done from the gofrontend repository.
diff --git a/gcc/go/gofrontend/statements.cc b/gcc/go/gofrontend/statements.cc
index 95fa3c48709..b3db843365e 100644
--- a/gcc/go/gofrontend/statements.cc
+++ b/gcc/go/gofrontend/statements.cc
@@ -1260,6 +1260,16 @@ Assignment_operation_statement::do_lower(Gogo*, Named_object*,
   Move_ordered_evals moe(b);
   this->lhs_->traverse_subexpressions(&moe);
 
+  // We can still be left with subexpressions that have to be loaded
+  // even if they don't have side effects themselves, in case the RHS
+  // changes variables named on the LHS.
+  int i;
+  if (this->lhs_->must_eval_subexpressions_in_order(&i))
+{
+  Move_subexpressions ms(i, b);
+  this->lhs_->traverse_subexpressions(&ms);
+}
+
   Expression* lval = this->lhs_->copy();
 
   Operator op;

[pushed] c++: constexpr ref to array of array [PR102307]

2022-05-17 Thread Jason Merrill via Gcc-patches

The problem here is that first check_initializer calls
build_aggr_init_full_exprs, which does overload resolution, but then in the
case of failed constexpr throws away the result and does it again in
build_functional_cast.  But in the first overload resolution,
reshape_init_array_1 decided to reuse the inner CONSTRUCTORs because
tf_error is set, so we know we're committed.  But the second pass gets
confused by the CONSTRUCTORs with non-init-list types.

Fixed by avoiding a second pass: instead, pass the call from build_aggr_init
to build_cplus_new, which will turn it into a TARGET_EXPR.  I don't bother
to change the object argument because it will be replaced later in
simplify_aggr_init_expr.

Tested x86_64-pc-linux-gnu, applying to trunk.

PR c++/102307

gcc/cp/ChangeLog:

* decl.cc (check_initializer): Use build_cplus_new in case of
constexpr failure.

gcc/testsuite/ChangeLog:

* g++.dg/cpp1z/constexpr-array2.C: New test.
---
 gcc/cp/decl.cc| 17 -
 gcc/testsuite/g++.dg/cpp1z/constexpr-array2.C | 12 
 2 files changed, 24 insertions(+), 5 deletions(-)
 create mode 100644 gcc/testsuite/g++.dg/cpp1z/constexpr-array2.C

diff --git a/gcc/cp/decl.cc b/gcc/cp/decl.cc
index 5654bc754e6..381259cb9cf 100644
--- a/gcc/cp/decl.cc
+++ b/gcc/cp/decl.cc
@@ -7413,12 +7413,19 @@ check_initializer (tree decl, tree init, int flags, vec<tree, va_gc> **cleanups)
  /* Declared constexpr or constinit, but no suitable initializer;
 massage init appropriately so we can pass it into
 store_init_value for the error.  */
- if (CLASS_TYPE_P (type)
- && (!init || TREE_CODE (init) == TREE_LIST))
+ tree new_init = NULL_TREE;
+ if (!processing_template_decl
+ && TREE_CODE (init_code) == CALL_EXPR)
+   new_init = build_cplus_new (type, init_code, tf_none);
+ else if (CLASS_TYPE_P (type)
+  && (!init || TREE_CODE (init) == TREE_LIST))
+   new_init = build_functional_cast (input_location, type,
+ init, tf_none);
+ if (new_init)
{
- init = build_functional_cast (input_location, type,
-   init, tf_none);
- if (TREE_CODE (init) == TARGET_EXPR)
+ init = new_init;
+ if (TREE_CODE (init) == TARGET_EXPR
+ && !(flags & LOOKUP_ONLYCONVERTING))
TARGET_EXPR_DIRECT_INIT_P (init) = true;
}
  init_code = NULL_TREE;
diff --git a/gcc/testsuite/g++.dg/cpp1z/constexpr-array2.C b/gcc/testsuite/g++.dg/cpp1z/constexpr-array2.C
new file mode 100644
index 000..c30e3f2361d
--- /dev/null
+++ b/gcc/testsuite/g++.dg/cpp1z/constexpr-array2.C
@@ -0,0 +1,12 @@
+// PR c++/102307
+// { dg-do compile { target c++11 } }
+
+#include <array>
+template  struct Matrix {
+  constexpr Matrix(double const (&arr)[N][M]); // { dg-warning "never defined" }
+  constexpr Matrix(std::array<std::array<double, M>, N> const &arr);
+};
+int main() {
+  constexpr Matrix<2, 3>
+mat {{ {1.0, 2.0, 3.0}, {4.0, 5.0, 6.0} }}; // { dg-error "before its definition" }
+}

base-commit: ed12749a3c9d9569a2c23df2e0db2136dcd3512d
-- 
2.27.0

Re: [PATCH] middle-end/105604 - snprintf dianostics and non-constant sizes/offsets

2022-05-17 Thread Martin Sebor via Gcc-patches


On 5/16/22 03:16, Richard Biener wrote:

The following tries to correct get_origin_and_offset_r not handling
non-constant sizes of array elements in ARRAY_REFs and non-constant
offsets of COMPONENT_REFs.  It isn't exactly clear how such failures
should be treated in this API and existing handling isn't consistent
here either.  The following applies two different variants, treating
non-constant array sizes like non-constant array indices and
treating non-constant offsets of COMPONENT_REFs by terminating
the recursion (not sure what that means to the callers).

Basically the code failed to use component_ref_field_offset and
array_ref_element_size and instead relies on inappropriate
helpers (that shouldn't exist in the first place ...).  The code
is also not safe-guarded against overflows in the final offset/size
computations but I'm not trying to rectify that.

Martin - can you comment on how the API should handle such
situations?


It looks like the -Wrestrict warning here ignores offsets equal to
HOST_WIDE_INT_MIN so presumably setting dst_offset (via *fldoff) to
that should avoid it.  Or maybe to HWI_MAX as it does for variable
offsets.

It also looks like the function only handles constant offsets and
sizes, and I have a vague recollection of enhancing it to work with
ranges.  That should avoid the overflow problem too.
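
A rough sketch of the sentinel idea in standalone C, assuming a 64-bit signed offset and using the maximum value to mean "not a known constant". The sentinel macro and the function name are made up for this example, the overflow checks use the GCC/Clang __builtin_*_overflow builtins, and none of this is taken from gimple-ssa-sprintf.cc.

```c
#include <assert.h>
#include <stdint.h>

/* Hypothetical sentinel meaning "offset is not a known constant".  */
#define OFFSET_UNKNOWN INT64_MAX

/* Add IDX * ELSZ to OFF.  Return OFFSET_UNKNOWN when any input is
   unknown or when the multiplication or addition would overflow.  */
int64_t add_scaled_offset (int64_t off, int64_t idx, int64_t elsz)
{
  int64_t scaled, sum;
  if (off == OFFSET_UNKNOWN || idx == OFFSET_UNKNOWN
      || elsz == OFFSET_UNKNOWN)
    return OFFSET_UNKNOWN;
  if (__builtin_mul_overflow (idx, elsz, &scaled)
      || __builtin_add_overflow (off, scaled, &sum))
    return OFFSET_UNKNOWN;
  return sum;
}
```

Folding overflow into the same "unknown" result keeps callers down to a single special case to test for, at the price of treating a genuine offset equal to the sentinel as unknown too.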

Martin



Bootstrapped and tested on x86_64-unknown-linux-gnu.

OK for trunk and branches?

Thanks,
Richard.

2022-05-16  Richard Biener  

PR middle-end/105604
* gimple-ssa-sprintf.cc (get_origin_and_offset_r):
Handle non-constant ARRAY_REF element size and non-constant
COMPONENT_REF field offset.

* gcc.dg/torture/pr105604.c: New testcase.
---
  gcc/gimple-ssa-sprintf.cc   | 14 +++---
  gcc/testsuite/gcc.dg/torture/pr105604.c | 24 
  2 files changed, 35 insertions(+), 3 deletions(-)
  create mode 100644 gcc/testsuite/gcc.dg/torture/pr105604.c

diff --git a/gcc/gimple-ssa-sprintf.cc b/gcc/gimple-ssa-sprintf.cc
index c93f12f90b5..14e215ce69c 100644
--- a/gcc/gimple-ssa-sprintf.cc
+++ b/gcc/gimple-ssa-sprintf.cc
@@ -2312,14 +2312,16 @@ get_origin_and_offset_r (tree x, HOST_WIDE_INT *fldoff, HOST_WIDE_INT *fldsize,
HOST_WIDE_INT idx = (tree_fits_uhwi_p (offset)
 ? tree_to_uhwi (offset) : HOST_WIDE_INT_MAX);
  
+	tree elsz = array_ref_element_size (x);

tree eltype = TREE_TYPE (x);
if (TREE_CODE (eltype) == INTEGER_TYPE)
  {
if (off)
  *off = idx;
  }
-   else if (idx < HOST_WIDE_INT_MAX)
- *fldoff += idx * int_size_in_bytes (eltype);
+   else if (idx < HOST_WIDE_INT_MAX
+&& tree_fits_shwi_p (elsz))
+ *fldoff += idx * tree_to_shwi (elsz);
else
  *fldoff = idx;
  
@@ -2350,8 +2352,14 @@ get_origin_and_offset_r (tree x, HOST_WIDE_INT *fldoff, HOST_WIDE_INT *fldsize,
  
  case COMPONENT_REF:

{
+   tree foff = component_ref_field_offset (x);
tree fld = TREE_OPERAND (x, 1);
-   *fldoff += int_byte_position (fld);
+   if (!tree_fits_shwi_p (foff)
+   || !tree_fits_shwi_p (DECL_FIELD_BIT_OFFSET (fld)))
+ return x;
+   *fldoff += (tree_to_shwi (foff)
+   + (tree_to_shwi (DECL_FIELD_BIT_OFFSET (fld))
+  / BITS_PER_UNIT));
  
  	get_origin_and_offset_r (fld, fldoff, fldsize, off);

x = TREE_OPERAND (x, 0);
diff --git a/gcc/testsuite/gcc.dg/torture/pr105604.c b/gcc/testsuite/gcc.dg/torture/pr105604.c
new file mode 100644
index 000..b002251df10
--- /dev/null
+++ b/gcc/testsuite/gcc.dg/torture/pr105604.c
@@ -0,0 +1,24 @@
+/* { dg-do compile } */
+/* { dg-additional-options "-Wall" } */
+
+struct {
+  long users;
+  long size;
+  char *data;
+} * main_trans;
+void *main___trans_tmp_1;
+int sprintf(char *, char *, ...);
+int main() {
+  int users = 0;
+  struct {
+long users;
+long size;
+char *data;
+int links[users];
+char buf[];
+  } *trans = trans;
+  trans->data = trans->buf;
+  main___trans_tmp_1 = trans;
+  main_trans = main___trans_tmp_1;
+  sprintf(main_trans->data, "test");
+}

Re: [PATCH] c: use CONST_DECL for enumerators in TYPE_VALUES

2022-05-17 Thread Marek Polacek via Gcc-patches

On Tue, May 17, 2022 at 02:59:00PM -0700, Ian Lance Taylor wrote:
> On Tue, May 17, 2022 at 2:46 PM Marek Polacek  wrote:
> >
> > On Tue, May 17, 2022 at 09:35:14PM +, Joseph Myers wrote:
> > > On Tue, 17 May 2022, Marek Polacek via Gcc-patches wrote:
> > >
> > > > The C and C++ FEs differ in TYPE_VALUES for an enum type: an entry in
> > > > the list in the C++ FE has a CONST_DECL in the TREE_VALUE, but the C FE
> > > > has only the numerical value of the CONST_DECL there.  This has caused
> > > > me some trouble in my PR105497 patch.  Using a CONST_DECL is preferable
> > > > because a CONST_DECL can track more information (e.g., attributes), and
> > > > you can always get the value simply by looking at its DECL_INITIAL.
> > > >
> > > > This turned out to be a trivial change.  One place in godump.cc had to be
> > > > adjusted.  I'm not changing the CONST_DECL check in c_do_switch_warnings
> > > > because I'll be changing it soon in my next patch.  I didn't see any other
> > > > checks that this patch makes redundant.
> > > >
> > > > Bootstrapped/regtested on x86_64-pc-linux-gnu, ok for trunk?
> > > >
> > > > gcc/c/ChangeLog:
> > > >
> > > > * c-decl.cc (finish_enum): Store the CONST_DECL into TREE_VALUE, not
> > > > its value.
> > >
> > > The C front-end changes are OK.
> >
> > Thanks.  Ian, are the (more or less obvious) godump.cc changes also OK?
> 
> Yes, that change is OK (assuming it works).  Thanks.

Thanks.  It still works, the code in question is tested by e.g.
testsuite/gcc.misc-tests/godump-1.c.

Marek

Re: [PATCH] c: use CONST_DECL for enumerators in TYPE_VALUES

2022-05-17 Thread Ian Lance Taylor via Gcc-patches

On Tue, May 17, 2022 at 2:46 PM Marek Polacek  wrote:
>
> On Tue, May 17, 2022 at 09:35:14PM +, Joseph Myers wrote:
> > On Tue, 17 May 2022, Marek Polacek via Gcc-patches wrote:
> >
> > > The C and C++ FEs differ in TYPE_VALUES for an enum type: an entry in
> > > the list in the C++ FE has a CONST_DECL in the TREE_VALUE, but the C FE
> > > has only the numerical value of the CONST_DECL there.  This has caused
> > > me some trouble in my PR105497 patch.  Using a CONST_DECL is preferable
> > > because a CONST_DECL can track more information (e.g., attributes), and
> > > you can always get the value simply by looking at its DECL_INITIAL.
> > >
> > > This turned out to be a trivial change.  One place in godump.cc had to be
> > > adjusted.  I'm not changing the CONST_DECL check in c_do_switch_warnings
> > > because I'll be changing it soon in my next patch.  I didn't see any other
> > > checks that this patch makes redundant.
> > >
> > > Bootstrapped/regtested on x86_64-pc-linux-gnu, ok for trunk?
> > >
> > > gcc/c/ChangeLog:
> > >
> > > * c-decl.cc (finish_enum): Store the CONST_DECL into TREE_VALUE, not
> > > its value.
> >
> > The C front-end changes are OK.
>
> Thanks.  Ian, are the (more or less obvious) godump.cc changes also OK?

Yes, that change is OK (assuming it works).  Thanks.

Ian

Re: [PATCH] c: use CONST_DECL for enumerators in TYPE_VALUES

2022-05-17 Thread Marek Polacek via Gcc-patches

On Tue, May 17, 2022 at 09:35:14PM +, Joseph Myers wrote:
> On Tue, 17 May 2022, Marek Polacek via Gcc-patches wrote:
> 
> > The C and C++ FEs differ in TYPE_VALUES for an enum type: an entry in
> > the list in the C++ FE has a CONST_DECL in the TREE_VALUE, but the C FE
> > has only the numerical value of the CONST_DECL there.  This has caused
> > me some trouble in my PR105497 patch.  Using a CONST_DECL is preferable
> > because a CONST_DECL can track more information (e.g., attributes), and
> > you can always get the value simply by looking at its DECL_INITIAL.
> > 
> > This turned out to be a trivial change.  One place in godump.cc had to be
> > adjusted.  I'm not changing the CONST_DECL check in c_do_switch_warnings
> > because I'll be changing it soon in my next patch.  I didn't see any other
> > checks that this patch makes redundant.
> > 
> > Bootstrapped/regtested on x86_64-pc-linux-gnu, ok for trunk?
> > 
> > gcc/c/ChangeLog:
> > 
> > * c-decl.cc (finish_enum): Store the CONST_DECL into TREE_VALUE, not
> > its value.
> 
> The C front-end changes are OK.

Thanks.  Ian, are the (more or less obvious) godump.cc changes also OK?

Marek

Re: [PATCH] c: use CONST_DECL for enumerators in TYPE_VALUES

2022-05-17 Thread Joseph Myers

On Tue, 17 May 2022, Marek Polacek via Gcc-patches wrote:

> The C and C++ FEs differ in TYPE_VALUES for an enum type: an entry in
> the list in the C++ FE has a CONST_DECL in the TREE_VALUE, but the C FE
> has only the numerical value of the CONST_DECL there.  This has caused
> me some trouble in my PR105497 patch.  Using a CONST_DECL is preferable
> because a CONST_DECL can track more information (e.g., attributes), and
> you can always get the value simply by looking at its DECL_INITIAL.
> 
> This turned out to be a trivial change.  One place in godump.cc had to be
> adjusted.  I'm not changing the CONST_DECL check in c_do_switch_warnings
> because I'll be changing it soon in my next patch.  I didn't see any other
> checks that this patch makes redundant.
> 
> Bootstrapped/regtested on x86_64-pc-linux-gnu, ok for trunk?
> 
> gcc/c/ChangeLog:
> 
>   * c-decl.cc (finish_enum): Store the CONST_DECL into TREE_VALUE, not
>   its value.

The C front-end changes are OK.

-- 
Joseph S. Myers
jos...@codesourcery.com

[PATCH] c: use CONST_DECL for enumerators in TYPE_VALUES

2022-05-17 Thread Marek Polacek via Gcc-patches

The C and C++ FEs differ in TYPE_VALUES for an enum type: an entry in
the list in the C++ FE has a CONST_DECL in the TREE_VALUE, but the C FE
has only the numerical value of the CONST_DECL there.  This has caused
me some trouble in my PR105497 patch.  Using a CONST_DECL is preferable
because a CONST_DECL can track more information (e.g., attributes), and
you can always get the value simply by looking at its DECL_INITIAL.
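To make the difference concrete, here is a rough toy model of the two layouts. It is only a sketch: `toy_const_decl`, `toy_pair`, `make_pair_for` and `enumerator_value` are invented names for illustration and are not GCC's tree API; `purpose` and `value` merely mimic TREE_PURPOSE and TREE_VALUE, and `initial` mimics DECL_INITIAL.

```cpp
#include <cassert>
#include <string>
#include <vector>

// Toy stand-in for a CONST_DECL: it carries more than the bare value.
struct toy_const_decl
{
  std::string name;                     // mimics DECL_NAME
  long initial;                         // mimics DECL_INITIAL
  std::vector<std::string> attributes;  // extra info a plain value cannot hold
};

// Toy stand-in for one TYPE_VALUES list entry.
struct toy_pair
{
  std::string purpose;          // mimics TREE_PURPOSE (the enumerator name)
  const toy_const_decl* value;  // mimics TREE_VALUE: the decl, not its value
};

// Build an entry that stores the decl itself.
inline toy_pair make_pair_for(const toy_const_decl& enu)
{
  return { enu.name, &enu };
}

// The numeric value is still one hop away, via the decl's initializer.
inline long enumerator_value(const toy_pair& p)
{
  assert(p.value != nullptr);
  return p.value->initial;
}
```

A consumer that previously read the number straight out of the pair would now go through something like `enumerator_value`, which is the kind of adjustment described for godump.cc.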

This turned out to be a trivial change.  One place in godump.cc had to be
adjusted.  I'm not changing the CONST_DECL check in c_do_switch_warnings
because I'll be changing it soon in my next patch.  I didn't see any other
checks that this patch makes redundant.

Bootstrapped/regtested on x86_64-pc-linux-gnu, ok for trunk?

gcc/c/ChangeLog:

* c-decl.cc (finish_enum): Store the CONST_DECL into TREE_VALUE, not
its value.

gcc/ChangeLog:

* godump.cc (go_output_typedef): Use the DECL_INITIAL of the TREE_VALUE.
---
 gcc/c/c-decl.cc | 4 +++-
 gcc/godump.cc   | 9 +
 2 files changed, 8 insertions(+), 5 deletions(-)

diff --git a/gcc/c/c-decl.cc b/gcc/c/c-decl.cc
index e49879ab286..83655548fc4 100644
--- a/gcc/c/c-decl.cc
+++ b/gcc/c/c-decl.cc
@@ -9253,7 +9253,9 @@ finish_enum (tree enumtype, tree values, tree attributes)
 
  DECL_INITIAL (enu) = ini;
  TREE_PURPOSE (pair) = DECL_NAME (enu);
- TREE_VALUE (pair) = ini;
+ /* To match the C++ FE, store the CONST_DECL rather than just its
+value.  */
+ TREE_VALUE (pair) = enu;
}
 
   TYPE_VALUES (enumtype) = values;
diff --git a/gcc/godump.cc b/gcc/godump.cc
index 2ae0bcc9672..c0f52bbd0f2 100644
--- a/gcc/godump.cc
+++ b/gcc/godump.cc
@@ -1114,6 +1114,7 @@ go_output_typedef (class godump_container *container, tree decl)
  struct macro_hash_value *mhval;
  void **slot;
  char buf[WIDE_INT_PRINT_BUFFER_SIZE];
+ tree value = DECL_INITIAL (TREE_VALUE (element));
 
  name = IDENTIFIER_POINTER (TREE_PURPOSE (element));
 
@@ -1127,12 +1128,12 @@ go_output_typedef (class godump_container *container, tree decl)
  if (*slot != NULL)
macro_hash_del (*slot);
 
- if (tree_fits_shwi_p (TREE_VALUE (element)))
+ if (tree_fits_shwi_p (value))
snprintf (buf, sizeof buf, HOST_WIDE_INT_PRINT_DEC,
-tree_to_shwi (TREE_VALUE (element)));
- else if (tree_fits_uhwi_p (TREE_VALUE (element)))
+tree_to_shwi (value));
+ else if (tree_fits_uhwi_p (value))
snprintf (buf, sizeof buf, HOST_WIDE_INT_PRINT_UNSIGNED,
- tree_to_uhwi (TREE_VALUE (element)));
+ tree_to_uhwi (value));
  else
print_hex (wi::to_wide (element), buf);
 

base-commit: b7501739f3b14ac7749aace93f636d021fd607f7
-- 
2.36.1

[committed] libstdc++: Relax memory ordering for default memory resource object

2022-05-17 Thread Jonathan Wakely via Gcc-patches

Tested powerpc64le-linux, pushed to trunk.

-- >8 --

Currently pmr::set_default_resource and pmr::get_default_resource both
use sequentially consistent memory ordering. This is overkill. The
standard only requires that a call to set_default_resource synchronizes
with subsequent calls to set_default_resource and get_default_resource.

Using acquire-release for the setter and acquire for the getter is
sufficient to meet the requirement.
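As a rough illustration of that ordering choice, the sketch below uses a plain std::atomic pointer. The names `resource`, `fallback_resource`, `default_res`, `set_default` and `get_default` are made up for this example and are not the libstdc++ internals touched by the patch.

```cpp
#include <atomic>
#include <cassert>

// Hypothetical stand-in for a memory resource type.
struct resource { int id; };

resource fallback_resource{0};
std::atomic<resource*> default_res{&fallback_resource};

// Setter: acq_rel makes the exchange an acquire with respect to earlier
// setters and a release with respect to later setters and getters.
resource* set_default(resource* r) noexcept
{
  if (r == nullptr)
    r = &fallback_resource;
  return default_res.exchange(r, std::memory_order_acq_rel);
}

// Getter: an acquire load is enough to observe what the setter published.
resource* get_default() noexcept
{
  resource* r = default_res.load(std::memory_order_acquire);
  assert(r != nullptr);
  return r;
}
```

Nothing here needs the single total order that memory_order_seq_cst would add; only the setter-to-getter synchronization edge matters.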

Reviewed-by: Thomas Rodgers  

libstdc++-v3/ChangeLog:

* src/c++17/memory_resource.cc (set_default_resource): Use
memory_order_acq_rel.
(get_default_resource): Use memory_order_acquire.
---
 libstdc++-v3/src/c++17/memory_resource.cc | 12 ++--
 1 file changed, 6 insertions(+), 6 deletions(-)

diff --git a/libstdc++-v3/src/c++17/memory_resource.cc b/libstdc++-v3/src/c++17/memory_resource.cc
index bb6334c9694..8bc55a69f1f 100644
--- a/libstdc++-v3/src/c++17/memory_resource.cc
+++ b/libstdc++-v3/src/c++17/memory_resource.cc
@@ -112,13 +112,13 @@ namespace pmr
   mutex mx;
   memory_resource* val;
 
-  memory_resource* load()
+  memory_resource* load(std::memory_order)
   {
lock_guard lock(mx);
return val;
   }
 
-  memory_resource* exchange(memory_resource* r)
+  memory_resource* exchange(memory_resource* r, std::memory_order)
   {
lock_guard lock(mx);
return std::__exchange(val, r);
@@ -134,12 +134,12 @@ namespace pmr
 
   memory_resource* val;
 
-  memory_resource* load() const
+  memory_resource* load(std::memory_order) const
   {
return val;
   }
 
-  memory_resource* exchange(memory_resource* r)
+  memory_resource* exchange(memory_resource* r, std::memory_order)
   {
return std::__exchange(val, r);
   }
@@ -166,12 +166,12 @@ namespace pmr
   {
 if (r == nullptr)
   r = new_delete_resource();
-return default_res.obj.exchange(r);
+return default_res.obj.exchange(r, std::memory_order_acq_rel);
   }
 
   memory_resource*
   get_default_resource() noexcept
-  { return default_res.obj.load(); }
+  { return default_res.obj.load(std::memory_order_acquire); }
 
   // Member functions for std::pmr::monotonic_buffer_resource
 
-- 
2.34.3

[committed] libstdc++: Add attributes to functions in <memory_resource>

2022-05-17 Thread Jonathan Wakely via Gcc-patches

Tested powerpc64le-linux, pushed to trunk.

-- >8 --

Add attributes to the accessors for the global memory resource objects,
to allow the compiler to eliminate redundant calls to them. For example,
multiple calls to std::pmr::new_delete_resource() will always return the
same object, and so the compiler can replace them with a single call.

Ideally we would like adjacent calls to std::pmr::get_default_resource()
to be combined into a single call by the CSE pass. The 'pure' attribute
would permit that. However, the standard requires that calls to
std::pmr::set_default_resource() synchronize with subsequent calls to
std::pmr::get_default_resource().  With 'pure' the DCE pass might
eliminate seemingly redundant calls to std::pmr::get_default_resource().
That might be unsafe, because the caller might be relying on the
associated synchronization. We could use a hypothetical attribute that
allows CSE but not DCE, but we don't have one. So it can't be 'pure'.
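A small sketch of the case where 'const' is fine, assuming GCC or a compatible compiler that understands the gnu:: attributes. `resource`, `fixed_resource` and `check_stable` are hypothetical names for illustration; they are not the library's functions.

```cpp
#include <cassert>

// Hypothetical stand-in for a memory resource type.
struct resource { int id; };

// Always returns the address of the same static object and reads no
// mutable global state, so repeated calls can be merged into one.
[[nodiscard, gnu::returns_nonnull, gnu::const]]
resource* fixed_resource() noexcept
{
  static resource r{42};
  return &r;
}

// Two adjacent calls must agree; with 'const' the compiler is allowed
// to evaluate the function only once here.
inline void check_stable()
{
  assert(fixed_resource() == fixed_resource());
}
```

A getter whose result depends on a pointer that another thread may replace does not fit this pattern, which is why only returns_nonnull is applied to the default-resource accessors.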

Also add [[nodiscard]] to equality operators.

libstdc++-v3/ChangeLog:

* include/std/memory_resource (new_delete_resource): Add
nodiscard, returns_nonnull and const attributes.
(null_memory_resource): Likewise.
(set_default_resource, get_default_resource): Add returns_nonnull
attribute.
(memory_resource::is_equal): Add nodiscard attribute.
(operator==, operator!=): Likewise.
---
 libstdc++-v3/include/std/memory_resource | 30 
 1 file changed, 25 insertions(+), 5 deletions(-)

diff --git a/libstdc++-v3/include/std/memory_resource b/libstdc++-v3/include/std/memory_resource
index 4a18d3e8598..88e8abd60fa 100644
--- a/libstdc++-v3/include/std/memory_resource
+++ b/libstdc++-v3/include/std/memory_resource
@@ -74,11 +74,26 @@ namespace pmr
 #endif
 
   // Global memory resources
-  memory_resource* new_delete_resource() noexcept;
-  memory_resource* null_memory_resource() noexcept;
-  memory_resource* set_default_resource(memory_resource* __r) noexcept;
-  memory_resource* get_default_resource() noexcept
-__attribute__((__returns_nonnull__));
+
+  /// A pmr::memory_resource that uses `new` to allocate memory
+  [[nodiscard, __gnu__::__returns_nonnull__, __gnu__::__const__]]
+  memory_resource*
+  new_delete_resource() noexcept;
+
+  /// A pmr::memory_resource that always throws `bad_alloc`
+  [[nodiscard, __gnu__::__returns_nonnull__, __gnu__::__const__]]
+  memory_resource*
+  null_memory_resource() noexcept;
+
+  /// Replace the default memory resource pointer
+  [[__gnu__::__returns_nonnull__]]
+  memory_resource*
+  set_default_resource(memory_resource* __r) noexcept;
+
+  /// Get the current default memory resource pointer
+  [[__gnu__::__returns_nonnull__]]
+  memory_resource*
+  get_default_resource() noexcept;
 
   // Pool resource classes
   struct pool_options;
@@ -111,6 +126,7 @@ namespace pmr
 __attribute__((__nonnull__))
 { return do_deallocate(__p, __bytes, __alignment); }
 
+[[nodiscard]]
 bool
 is_equal(const memory_resource& __other) const noexcept
 { return do_is_equal(__other); }
@@ -126,11 +142,13 @@ namespace pmr
 do_is_equal(const memory_resource& __other) const noexcept = 0;
   };
 
+  [[nodiscard]]
   inline bool
   operator==(const memory_resource& __a, const memory_resource& __b) noexcept
   { return &__a == &__b || __a.is_equal(__b); }
 
 #if __cpp_impl_three_way_comparison < 201907L
+  [[nodiscard]]
   inline bool
   operator!=(const memory_resource& __a, const memory_resource& __b) noexcept
   { return !(__a == __b); }
@@ -369,6 +387,7 @@ namespace pmr
 };
 
   template<typename _Tp1, typename _Tp2>
+[[nodiscard]]
 inline bool
 operator==(const polymorphic_allocator<_Tp1>& __a,
   const polymorphic_allocator<_Tp2>& __b) noexcept
@@ -376,6 +395,7 @@ namespace pmr
 
 #if __cpp_impl_three_way_comparison < 201907L
   template<typename _Tp1, typename _Tp2>
+[[nodiscard]]
 inline bool
 operator!=(const polymorphic_allocator<_Tp1>& __a,
   const polymorphic_allocator<_Tp2>& __b) noexcept
-- 
2.34.3

[committed] libstdc++: Add attributes to <system_error> and related

2022-05-17 Thread Jonathan Wakely via Gcc-patches

Tested powerpc64le-linux, pushed to trunk.

-- >8 --

Add the const attribute to std::future_category() and
std::iostream_category(), to match the existing attributes on
std::generic_category() and std::system_category().

Also add [[nodiscard]] to those functions and to the comparison
operators for std::error_code and std::error_condition, and to
std::make_error_code and std::make_error_condition overloads.

libstdc++-v3/ChangeLog:

* include/bits/ios_base.h (io_category): Add const and nodiscard
attributes.
(make_error_code, make_error_condition): Add nodiscard.
* include/std/future (future_category): Add const and nodiscard.
(make_error_code, make_error_condition): Add nodiscard.
* include/std/system_error (generic_category system_category):
Add nodiscard. Replace _GLIBCXX_CONST with C++11 attribute.
(error_code::value, error_code::category, error_code::operator bool)
(error_condition::value, error_condition::category)
(error_condition::operator bool, make_error_code)
(make_error_condition, operator==, operator!=, operator<=>): Add
nodiscard.
---
 libstdc++-v3/include/bits/ios_base.h  |  6 +-
 libstdc++-v3/include/std/future   |  3 +++
 libstdc++-v3/include/std/system_error | 23 +--
 3 files changed, 29 insertions(+), 3 deletions(-)

diff --git a/libstdc++-v3/include/bits/ios_base.h b/libstdc++-v3/include/bits/ios_base.h
index bdb30140536..e34097171a5 100644
--- a/libstdc++-v3/include/bits/ios_base.h
+++ b/libstdc++-v3/include/bits/ios_base.h
@@ -205,12 +205,16 @@ _GLIBCXX_BEGIN_NAMESPACE_VERSION
 
   template <> struct is_error_code_enum<io_errc> : public true_type { };
 
-  const error_category& iostream_category() noexcept;
+  [[__nodiscard__, __gnu__::__const__]]
+  const error_category&
+  iostream_category() noexcept;
 
+  [[__nodiscard__]]
   inline error_code
   make_error_code(io_errc __e) noexcept
   { return error_code(static_cast<int>(__e), iostream_category()); }
 
+  [[__nodiscard__]]
   inline error_condition
   make_error_condition(io_errc __e) noexcept
   { return error_condition(static_cast<int>(__e), iostream_category()); }
diff --git a/libstdc++-v3/include/std/future b/libstdc++-v3/include/std/future
index f7de8ddb0bc..a925d03d19c 100644
--- a/libstdc++-v3/include/std/future
+++ b/libstdc++-v3/include/std/future
@@ -82,15 +82,18 @@ _GLIBCXX_BEGIN_NAMESPACE_VERSION
  struct is_error_code_enum<future_errc> : public true_type { };
 
   /// Points to a statically-allocated object derived from error_category.
+  [[__nodiscard__, __gnu__::__const__]]
   const error_category&
   future_category() noexcept;
 
   /// Overload of make_error_code for `future_errc`.
+  [[__nodiscard__]]
   inline error_code
   make_error_code(future_errc __errc) noexcept
   { return error_code(static_cast<int>(__errc), future_category()); }
 
   /// Overload of make_error_condition for `future_errc`.
+  [[__nodiscard__]]
   inline error_condition
   make_error_condition(future_errc __errc) noexcept
   { return error_condition(static_cast<int>(__errc), future_category()); }
diff --git a/libstdc++-v3/include/std/system_error b/libstdc++-v3/include/std/system_error
index 95508da73dd..87cf720f6e3 100644
--- a/libstdc++-v3/include/std/system_error
+++ b/libstdc++-v3/include/std/system_error
@@ -153,12 +153,14 @@ _GLIBCXX_BEGIN_INLINE_ABI_NAMESPACE(_V2)
 equivalent(const error_code& __code, int __i) const noexcept;
 
 /// An error_category only compares equal to itself.
+[[__nodiscard__]]
 bool
 operator==(const error_category& __other) const noexcept
 { return this == &__other; }
 
 /// Ordered comparison that defines a total order for error categories.
 #if __cpp_lib_three_way_comparison
+[[nodiscard]]
 strong_ordering
 operator<=>(const error_category& __rhs) const noexcept
 { return std::compare_three_way()(this, &__rhs); }
@@ -176,10 +178,14 @@ _GLIBCXX_BEGIN_INLINE_ABI_NAMESPACE(_V2)
   // DR 890.
 
   /// Error category for `errno` error codes.
-  _GLIBCXX_CONST const error_category& generic_category() noexcept;
+  [[__nodiscard__, __gnu__::__const__]]
+  const error_category&
+  generic_category() noexcept;
 
   /// Error category for other error codes defined by the OS.
-  _GLIBCXX_CONST const error_category& system_category() noexcept;
+  [[__nodiscard__, __gnu__::__const__]]
+  const error_category&
+  system_category() noexcept;
 
   /// @}
 
@@ -241,10 +247,12 @@ _GLIBCXX_END_INLINE_ABI_NAMESPACE(_V2)
   { return *this = make_error_code(__e); }
 
 /// The error value.
+[[__nodiscard__]]
 int
 value() const noexcept { return _M_value; }
 
 /// The error category that this error belongs to.
+[[__nodiscard__]]
 const error_category&
 category() const noexcept { return *_M_cat; }
 
@@ -259,6 +267,7 @@ _GLIBCXX_END_INLINE_ABI_NAMESPACE(_V2)
 { return category().message(value()); }
 
 /// Test whether `value()` is non-zero.
+[[__nodisc

[committed] Revert 'Use more ARRAY_SIZE.' for mkoffload (was: [PATCH] Use more ARRAY_SIZE.)

2022-05-17 Thread Tobias Burnus


Hi Martin,

On 16.05.22 10:39, Martin Liška wrote:

All right, CCing the following maintainers for other parts:

- David for JIT and Analyzer
- Tobias for Fortran part
- Jason for C-family part


Sorry for having missed that review request – and thanks to Mikael for
doing the review!

And thanks for the patch in general.


There are not further comments from the remaining C-family part so I'm going
to install the patch.


This patch broke offloading – fixed by reverting the patch for
{gcn,nvptx}/mkoffload.cc – and committed as obvious. Changing a C-code
generating string without telling the then called C compiler about the
macro won't fly.  See attachment for
r13-569-gc9852156dd2fedec130f6d8eb669579ef6237946, which reverts it for
mkoffload.cc, only.

Tobias
commit c9852156dd2fedec130f6d8eb669579ef6237946
Author: Tobias Burnus 
Date:   Tue May 17 20:46:29 2022 +0200

Revert 'Use more ARRAY_SIZE.' for mkoffload

Revert commit r13-472-gca32b29ec3e92dcf8dda5c2501d0baf9dd1cb09d partially;
namely for {gcn,nvptx}/mkoffload.cc, only.

The patch changed 'sizeof(...)/sizeof(...[0])' to the 'ARRAY_SIZE' macro,
which is in principle a good idea – except that in the two mkoffload.cc,
the change happened inside a string that is used to generate plain C code.
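To illustrate why the macro cannot be used there, here is a minimal sketch of a generator that emits C source text. `emit_target_data` is a hypothetical helper, not the mkoffload code; the point is only that the emitted text is compiled later, by a compiler that never sees the headers defining ARRAY_SIZE.

```cpp
#include <cassert>
#include <string>

// Hypothetical generator: returns plain C source that a separate compiler
// invocation will build.  The generated file does not include the headers
// that define ARRAY_SIZE, so the element count is spelled out with sizeof.
std::string emit_target_data(const std::string& array_name)
{
  assert(!array_name.empty());
  return "static const unsigned " + array_name + "_count =\n"
         "  sizeof (" + array_name + ") / sizeof (" + array_name + "[0]);\n";
}
```

Calling `emit_target_data("gcn_kernels")` yields text with the explicit sizeof division, which is valid in any C translation unit without further macros.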

With offloading to nvptx or gcn, mkoffload then generates the C file,
and compilation of the latter fails with
"warning: implicit declaration of function 'ARRAY_SIZE'" followed by
"error: initializer element is not constant"

gcc/
* config/gcn/mkoffload.cc (process_obj): Revert: Use ARRAY_SIZE.
* config/nvptx/mkoffload.cc (process): Likewise.
---
 gcc/config/gcn/mkoffload.cc   | 2 +-
 gcc/config/nvptx/mkoffload.cc | 6 +++---
 2 files changed, 4 insertions(+), 4 deletions(-)

diff --git a/gcc/config/gcn/mkoffload.cc b/gcc/config/gcn/mkoffload.cc
index 9db2124b129..94ba7ffa5af 100644
--- a/gcc/config/gcn/mkoffload.cc
+++ b/gcc/config/gcn/mkoffload.cc
@@ -723,7 +723,7 @@ process_obj (FILE *in, FILE *cfile)
 	   "  unsigned global_variable_count;\n"
 	   "} target_data = {\n"
 	   "  &gcn_image,\n"
-	   "  ARRAY_SIZE (gcn_kernels),\n"
+	   "  sizeof (gcn_kernels) / sizeof (gcn_kernels[0]),\n"
 	   "  gcn_kernels,\n"
 	   "  gcn_num_vars\n"
 	   "};\n\n");
diff --git a/gcc/config/nvptx/mkoffload.cc b/gcc/config/nvptx/mkoffload.cc
index fa3b4b76821..b28c1a32292 100644
--- a/gcc/config/nvptx/mkoffload.cc
+++ b/gcc/config/nvptx/mkoffload.cc
@@ -316,11 +316,11 @@ process (FILE *in, FILE *out)
 	   "  const struct nvptx_fn *fn_names;\n"
 	   "  unsigned fn_num;\n"
 	   "} target_data = {\n"
-	   "  ptx_objs, ARRAY_SIZE (ptx_objs),\n"
+	   "  ptx_objs, sizeof (ptx_objs) / sizeof (ptx_objs[0]),\n"
 	   "  var_mappings,"
-	   "  ARRAY_SIZE (var_mappings),\n"
+	   "  sizeof (var_mappings) / sizeof (var_mappings[0]),\n"
 	   "  func_mappings,"
-	   "  ARRAY_SIZE (func_mappings)\n"
+	   "  sizeof (func_mappings) / sizeof (func_mappings[0])\n"
 	   "};\n\n");
 
   fprintf (out, "#ifdef __cplusplus\n"

[PATCH] PR tree-optimization/31178 - Add rshift side effect.

2022-05-17 Thread Andrew MacLeod via Gcc-patches

This patch implements side effects of the second operand of a shift 
operation.


Given A >> B or A << B, the range of B is restricted to [0, PRECISION_A).

Fortran is currently more permissive than this, allowing the range to be
[0, PRECISION_A], so this is the value we currently default to in this
patch.  If the Fortran front end were adjusted, we could adjust the end
point.
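The two candidate ranges can be written down as a small sketch. `count_range`, `shift_count_range` and `checked_shl` are names invented here, and the precision is taken as sizeof * CHAR_BIT, which assumes a type without padding bits; none of this is the ranger code.

```cpp
#include <cassert>
#include <climits>

// Inclusive bounds on a shift count.
struct count_range { unsigned lo; unsigned hi; };

// Permitted range of B in A << B or A >> B for an operand of type T.
// allow_full_width = true models the more permissive [0, PRECISION]
// variant; false models the strict [0, PRECISION) variant.
template <typename T>
count_range shift_count_range(bool allow_full_width)
{
  const unsigned precision = sizeof(T) * CHAR_BIT;
  return { 0u, allow_full_width ? precision : precision - 1 };
}

// Left shift that asserts the count lies in the strict range.
inline unsigned checked_shl(unsigned value, unsigned count)
{
  assert(count <= shift_count_range<unsigned>(false).hi);
  return value << count;
}
```

After a statement like `checked_shl(a, c)` has executed, a range analysis may assume `c` lies in the returned interval, which is the deduction the patch registers as a side effect.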


This currently bootstraps with no regressions on x86_64-pc-linux-gnu.

Is this sufficient, or should I also be checking some other flags which 
may allow other values outside this range to be valid?


Andrew


PS. Note that in the testcase, one of the tests is currently disabled
as full recomputation of side-effects is not quite in place yet. When it
is, I will enable the test.


commit e283395a570328874d3215893c7781fd2770d87f
Author: Andrew MacLeod 
Date:   Mon Apr 4 16:26:15 2022 -0400

Add rshift side effect.

After a shift operation, we can make deductions about the bounds of the shift
value based on the precision of the value being shifted.

Fortran is currently more permissive than the other front ends, so we set the
range of B in A >> B to [0, PRECISION_A] rather than [0, PRECISION_A) that the
other front ends require.

gcc/
PR tree-optimization/31178
* gimple-range-side-effect.cc (stmt_side_effects::stmt_side_effects):
Add support for LSHIFT_EXPR and RSHIFT_EXPR.

gcc/testsuite/
* gcc.dg/tree-ssa/pr31178.c: New.

diff --git a/gcc/gimple-range-side-effect.cc b/gcc/gimple-range-side-effect.cc
index 548e4bea313..fdd5fdc296d 100644
--- a/gcc/gimple-range-side-effect.cc
+++ b/gcc/gimple-range-side-effect.cc
@@ -34,6 +34,7 @@ along with GCC; see the file COPYING3.  If not see
 #include "gimple-iterator.h"
 #include "gimple-walk.h"
 #include "cfganal.h"
+#include "stor-layout.h"		// for element_precision()
 
 // Adapted from infer_nonnull_range_by_dereference and check_loadstore
 // to process nonnull ssa_name OP in S.  DATA contains a pointer to a
@@ -129,6 +130,22 @@ stmt_side_effects::stmt_side_effects (gimple *s)
 	  if (gimple_range_ssa_p (gimple_assign_rhs2 (s)))
 	add_nonzero (gimple_assign_rhs2 (s));
 	  break;
+
+	case LSHIFT_EXPR:
+	case RSHIFT_EXPR:
+	  if (gimple_range_ssa_p (gimple_assign_rhs2 (s)))
+	{
+	  // A << B, A >>B implies [0, PRECISION_of_A)
+	  tree op1_type = TREE_TYPE (gimple_assign_rhs1 (s));
+	  tree op2_type = TREE_TYPE (gimple_assign_rhs2 (s));
+	  tree l = build_int_cst (op2_type, 0);
+	  // C is [0, N), but fortran is [0, N], so default to [0, N].
+	  tree u = build_int_cst (op2_type, element_precision (op1_type));
+	  int_range_max shift (l, u);
+	  add_range (gimple_assign_rhs2 (s), shift);
+	}
+	  break;
+
 	default:
 	  break;
 	}
diff --git a/gcc/testsuite/gcc.dg/tree-ssa/pr31178.c b/gcc/testsuite/gcc.dg/tree-ssa/pr31178.c
new file mode 100644
index 000..27c72fb7104
--- /dev/null
+++ b/gcc/testsuite/gcc.dg/tree-ssa/pr31178.c
@@ -0,0 +1,33 @@
+/* { dg-do compile } */
+/* { dg-options "-O2 -fdump-tree-evrp " } */
+
+/* Side effects of divide are that the divisor cannot be 0. */
+
+#include "limits.h"
+void dead (int);
+
+void
+f1 (int a, int c) {
+  int b = a;
+  if ((a << c) > 100)
+b = c;
+
+  // Fortran allows [0, sizeof(int)] , so that is GCC default for now.
+  if (c < 0 || c > sizeof(int) * CHAR_BIT)
+dead (b);
+}
+
+#if 0
+/* Until we get recomputation of a side effect value working, ... */
+
+int 
+f2 (int a, int c) {
+  int nz = (c < 0 || c > sizeof(int) * CHAR_BIT);
+  int b = a >> c;
+  if (nz)
+dead (0);
+  return b;
+}
+#endif
+
+/* { dg-final { scan-tree-dump-not "dead" "evrp" } } */

[PATCH] Add divide by zero side effect.

2022-05-17 Thread Andrew MacLeod via Gcc-patches

I haven't checked this patch in yet.  This implements a side effect that 
the divisor cannot be 0 after a divide executes. This allows us to fold 
the divide away:


a = b / c;
if (c == 0)
  dead();

This bootstraps on x86_64-pc-linux-gnu with no regressions, but I first
wanted to check to see if there are some flags or conditions that should
be checked in order NOT to do this optimization.  I am guessing there is
probably something :-)  Anyway, this is how we straightforwardly add
side effects now.


Do the patch conditions need tweaking to apply the side effect?

Andrew

commit 3bbcccf2ddd4d50cc5febf630bd8b55a45688352
Author: Andrew MacLeod 
Date:   Mon Apr 4 16:13:57 2022 -0400

Add divide by zero side effect.

After a divide, we know the divisor is not zero.

gcc/
* gimple-range-side-effect.cc (stmt_side_effects::stmt_side_effects):
Add support for all divides.

gcc/testsuite/
* gcc.dg/tree-ssa/evrp-zero.c: New.

diff --git a/gcc/gimple-range-side-effect.cc b/gcc/gimple-range-side-effect.cc
index 2c8c77dc569..548e4bea313 100644
--- a/gcc/gimple-range-side-effect.cc
+++ b/gcc/gimple-range-side-effect.cc
@@ -116,6 +116,23 @@ stmt_side_effects::stmt_side_effects (gimple *s)
 walk_stmt_load_store_ops (s, (void *)this, non_null_loadstore,
 			  non_null_loadstore);
 
+  if (is_a <gassign *> (s))
+{
+  switch (gimple_assign_rhs_code (s))
+	{
+	case TRUNC_DIV_EXPR:
+	case CEIL_DIV_EXPR:
+	case FLOOR_DIV_EXPR:
+	case ROUND_DIV_EXPR:
+	case EXACT_DIV_EXPR:
+	  // Divide means operand 2 is not zero after this stmt.
+	  if (gimple_range_ssa_p (gimple_assign_rhs2 (s)))
+	add_nonzero (gimple_assign_rhs2 (s));
+	  break;
+	default:
+	  break;
+	}
+}
 }
 
 // -
diff --git a/gcc/testsuite/gcc.dg/tree-ssa/evrp-zero.c b/gcc/testsuite/gcc.dg/tree-ssa/evrp-zero.c
new file mode 100644
index 000..2b76e449c9b
--- /dev/null
+++ b/gcc/testsuite/gcc.dg/tree-ssa/evrp-zero.c
@@ -0,0 +1,28 @@
+/* { dg-do compile } */
+/* { dg-options "-O2 -fdump-tree-evrp " } */
+
+/* Side effects of divide are that the divisor cannot be 0. */
+
+void dead (int);
+
+void
+f1 (int a, int c) {
+  int b = a;
+  if (a / c > 10)
+b = c;
+
+  if (c == 0)
+dead (b);
+}
+
+
+void
+f2 (int a, int c) {
+  int nz = c == 0;
+  int b = a / c;
+  if (nz)
+dead (0);
+}
+
+
+/* { dg-final { scan-tree-dump-not "dead" "evrp" } } */

[COMMITTED] Add ranger side effect infrastructure.

2022-05-17 Thread Andrew MacLeod via Gcc-patches

This patch replaces the old non-null processing mechanism in ranger with 
generic side-effect processing.


The way it used to work:
- The first time a query for non-nullness was made on an ssa-name, a
quick pass over the immediate use lists was made.
- This checked each use for triggering the non-null property, and if it
did, a bit was set in a sparse bitmap for that block.
- This was considered to be true on exit from the block, so whenever a
query for a range crossed a block boundary, the ~[0,0] range was applied
to the current range as appropriate.


How it works now:

- There are 2 new classes in gimple-range-side-effect.h

 * side_effect_manager  : maintains a list of "side-effect" ranges for
   each block. These are also considered to apply upon exit from a
   block in much the same way the bitmap used to work, except it isn't
   limited to non-null.
 * stmt_side_effects : This is the statement level list of side effects
   on a stmt.


The side effect manager is maintained with ranger's cache, and is
utilized via GORI and ranger itself to apply any side effects that have
been registered as appropriate. Its operation is relatively transparent,
and you can forget it exists for the most part.


When ranger is doing a DOM-walk, as in VRP/EVRP, whenever a stmt is 
'finalized', any side effects are registered immediately in the on-entry 
cache for that block, and any subsequent uses within the block of that 
name will have the side effects registered.  Any pure on-demand clients 
will only get that benefit on exit from the block.


The stmt_side_effects class is a lightweight class that is constructed
from a stmt.  Once constructed, it becomes a simple list of names/ranges
accessed via 3 methods, num(), name(), and range(), i.e.:


stmt_side_effects se (stmt);
for (unsigned x = 0; x < se.num (); x++)
  process (se.name (x), se.range (x));


To keep it lightweight, a small internal vector (currently 10 items) of 
side effects is maintained.
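For readers who want the shape of that interface without opening the patch, here is a toy model. It is only a sketch: `toy_range`, `toy_stmt`, `op_code` and `toy_side_effects` are invented for illustration, the statement is reduced to an opcode plus the name of operand 2, and the real class works on gimple statements and GCC range types rather than these stand-ins.

```cpp
#include <array>
#include <cassert>
#include <string>

// Toy range: [lo, hi], or everything except [lo, hi] when inverted.
struct toy_range { long lo; long hi; bool inverted; };

enum class op_code { div, shl, add };

// Toy statement "lhs = op1 <code> op2"; only operand 2 matters here.
struct toy_stmt { op_code code; std::string op2; long precision; };

class toy_side_effects
{
  static constexpr unsigned max_effects = 10;  // small fixed-size store

public:
  // All the "smarts" live in the constructor.
  explicit toy_side_effects(const toy_stmt& s)
  {
    switch (s.code)
      {
      case op_code::div:  // the divisor is nonzero afterwards: ~[0, 0]
        add(s.op2, {0, 0, true});
        break;
      case op_code::shl:  // the shift count lies in [0, precision]
        add(s.op2, {0, s.precision, false});
        break;
      default:
        break;
      }
  }

  unsigned num() const { return m_num; }
  const std::string& name(unsigned i) const
  { assert(i < m_num); return m_names[i]; }
  const toy_range& range(unsigned i) const
  { assert(i < m_num); return m_ranges[i]; }

private:
  void add(const std::string& n, toy_range r)
  {
    if (m_num < max_effects)  // silently drop anything past capacity
      {
        m_names[m_num] = n;
        m_ranges[m_num] = r;
        ++m_num;
      }
  }

  std::array<std::string, max_effects> m_names;
  std::array<toy_range, max_effects> m_ranges{};
  unsigned m_num = 0;
};
```

A client would iterate it exactly like the loop above: construct it from a statement, then walk indices 0 to num() - 1 reading name(i) and range(i).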


The constructor for this class contains all the "smarts" for finding
side effects.


There is also an option (controlled via a constructor flag) in ranger to 
continue using the on-demand mechanism of scanning the immediate uses 
lists with the side-effect manager, as some clients, such as threading, 
do not process every statement, but still want as accurate information 
as possible.  Otherwise there is no following of imm-use chains anywhere.


This patch uses this mechanism to replace the non-null processing; the
following 2 patches implement other side effects as simple examples of
how it's used.


The overhead of doing this generically is fairly low. 1% in EVRP and 
<0.5% everywhere else.


Bootstraps on x86_64-pc-linux-gnu with no regressions. pushed.

Andrew






commit b7501739f3b14ac7749aace93f636d021fd607f7
Author: Andrew MacLeod 
Date:   Mon May 9 15:35:14 2022 -0400

Add side effect infrastructure.

Replace the non-null processing with a generic side effect implementation that
can handle arbitrary side effects.

* Makefile.in (OBJS): Add gimple-range-side-effect.o.
* gimple-range-cache.cc (non_null_ref::non_null_ref): Delete.
(non_null_ref::~non_null_ref): Delete.
(non_null_ref::set_nonnull): Delete.
(non_null_ref::non_null_deref_p): Delete.
(non_null_ref::process_name): Delete.
(ranger_cache::ranger_cache): Initialize m_exit object.
(ranger_cache::fill_block_cache): Use m_exit object instead of nonnull.
(ranger_cache::range_from_dom): Use side_effect class and m_exit object.
(ranger_cache::update_to_nonnull): Delete.
(non_null_loadstore): Delete.
(ranger_cache::block_apply_nonnull): Delete.
(ranger_cache::apply_side_effects): New.
* gimple-range-cache.h (class non_null_ref): Delete.
(non_null_ref::adjust_range): Delete.
(class ranger_cache): Adjust prototypes, add side effect manager.
* gimple-range-path.cc (path_range_query::range_defined_in_block): Use
side effect manager for queries.
(path_range_query::adjust_for_non_null_uses): Ditto.
* gimple-range-path.h (class path_range_query): Delete non_null_ref.
* gimple-range-side-effect.cc: New.
* gimple-range-side-effect.h: New.
* gimple-range.cc (gimple_ranger::gimple_ranger): Update constructor.
(gimple_ranger::range_of_expr): Check def block for override value.
(gimple_ranger::range_on_entry): Don't scan dominators for non-null.
(gimple_ranger::range_on_edge): Check for outgoing side-effects.
(gimple_ranger::register_side_effects): Call apply_side_effects.
(enable_ranger): Update constructor.
* gimple-range.h (class gimple_ranger): Update prototype.
(enable_ranger): Update prototype.
* tree-vrp.cc (execute_ranger_vrp)

Re: [Patch] OpenMP: Skip target-nesting warning for reverse offload

2022-05-17 Thread Jakub Jelinek via Gcc-patches

On Mon, May 16, 2022 at 05:14:12PM +0200, Tobias Burnus wrote:
> --- a/gcc/omp-low.cc
> +++ b/gcc/omp-low.cc
> @@ -3883,6 +3883,16 @@ check_omp_nesting_restrictions (gimple *stmt, omp_context *ctx)
>   }
> else
>   {
> +   if ((gimple_omp_target_kind (ctx->stmt)
> +== GF_OMP_TARGET_KIND_REGION)
> +   && (gimple_omp_target_kind (stmt)
> +   == GF_OMP_TARGET_KIND_REGION)
> +   && ((c = omp_find_clause (
> +  gimple_omp_target_clauses (stmt),
> +  OMP_CLAUSE_DEVICE))
> +   != NULL_TREE)
> +   && OMP_CLAUSE_DEVICE_ANCESTOR (c))
> + break;

The ( at the end of line is too ugly for me.
Can't you write it as:
  if ((gimple_omp_target_kind (ctx->stmt)
   == GF_OMP_TARGET_KIND_REGION)
  && (gimple_omp_target_kind (stmt)
  == GF_OMP_TARGET_KIND_REGION))
{
  c = omp_find_clause (gimple_omp_target_clauses (stmt),
   OMP_CLAUSE_DEVICE);
  if (c && OMP_CLAUSE_DEVICE_ANCESTOR (c))
break;
}
?

Otherwise LGTM.

Jakub

demangler: Structured Bindings

2022-05-17 Thread Nathan Sidwell


C++ Structured bindings have a mangling that has yet to be formally
documented.  However, it's been around for a while and shows up for
module support.

This adds it to the demangler.

nathan

--
Nathan Sidwell
From 451894cadcf1210883ceefb2d69a0ed2d6a8cd8b Mon Sep 17 00:00:00 2001
From: Nathan Sidwell 
Date: Tue, 8 Mar 2022 13:00:35 -0800
Subject: [PATCH 1/2] demangler: Structured Bindings

C++ Structured bindings have a mangling that has yet to be formally
documented.  However, it's been around for a while and shows up for
module support.

	include/
	* demangle.h (enum demangle_component_type): Add
	DEMANGLE_COMPONENT_STRUCTURED_BINDING.
	libiberty/
	* cp-demangle.c (d_make_comp): Adjust.
	(d_unqualified_name): Add 'DC' support.
	(d_count_template_scopes): Adjust.
	(d_print_comp_inner): Add structured binding.
	* testsuite/demangle-expected: Add testcases.
---
 include/demangle.h|  4 ++-
 libiberty/cp-demangle.c   | 49 +++
 libiberty/testsuite/demangle-expected | 10 ++
 3 files changed, 56 insertions(+), 7 deletions(-)

diff --git a/include/demangle.h b/include/demangle.h
index 402308f769f..44a27374d4f 100644
--- a/include/demangle.h
+++ b/include/demangle.h
@@ -449,7 +449,9 @@ enum demangle_component_type
   /* A cloned function.  */
   DEMANGLE_COMPONENT_CLONE,
   DEMANGLE_COMPONENT_NOEXCEPT,
-  DEMANGLE_COMPONENT_THROW_SPEC
+  DEMANGLE_COMPONENT_THROW_SPEC,
+
+  DEMANGLE_COMPONENT_STRUCTURED_BINDING
 };
 
 /* Types which are only used internally.  */
diff --git a/libiberty/cp-demangle.c b/libiberty/cp-demangle.c
index 6dff7d28fcf..fc618fa7383 100644
--- a/libiberty/cp-demangle.c
+++ b/libiberty/cp-demangle.c
@@ -1020,6 +1020,7 @@ d_make_comp (struct d_info *di, enum demangle_component_type type,
 case DEMANGLE_COMPONENT_NULLARY:
 case DEMANGLE_COMPONENT_TRINARY_ARG2:
 case DEMANGLE_COMPONENT_TPARM_OBJ:
+case DEMANGLE_COMPONENT_STRUCTURED_BINDING:
   if (left == NULL)
 	return NULL;
   break;
@@ -1619,12 +1620,12 @@ d_prefix (struct d_info *di, int subst)
 }
 }
 
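-/* <unqualified-name> ::= <operator-name>
-                      ::= <ctor-dtor-name>
-                      ::= <source-name>
-                      ::= <local-source-name>
-
-    <local-source-name> ::= L <source-name> <discriminator>
+/* <unqualified-name> ::= <operator-name> [<abi-tags>]
+                      ::= <ctor-dtor-name> [<abi-tags>]
+                      ::= <source-name> [<abi-tags>]
+                      ::= <local-source-name> [<abi-tags>]
+                      ::= DC <source-name>+ E [<abi-tags>]
+    <local-source-name> ::= L <source-name> <discriminator> [<abi-tags>]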
 */
 
 static struct demangle_component *
@@ -1655,6 +1656,28 @@ d_unqualified_name (struct d_info *di)
 			   d_source_name (di));
 	}
 }
+  else if (peek == 'D' && d_peek_next_char (di) == 'C')
+{
+  // structured binding
+  d_advance (di, 2);
+  struct demangle_component *prev = NULL;
+  do
+	{
+	  struct demangle_component *next = 
+	d_make_comp (di, DEMANGLE_COMPONENT_STRUCTURED_BINDING,
+			 d_source_name (di), NULL);
+	  if (prev)
+	d_right (prev) = next;
+	  else
+	ret = next;
+	  prev = next;
+	}
+  while (prev && d_peek_char (di) != 'E');
+  if (prev)
+	d_advance (di, 1);
+  else
+	ret = NULL;
+}
   else if (peek == 'C' || peek == 'D')
 ret = d_ctor_dtor_name (di);
   else if (peek == 'L')
@@ -4179,6 +4202,7 @@ d_count_templates_scopes (struct d_print_info *dpi,
 case DEMANGLE_COMPONENT_CHARACTER:
 case DEMANGLE_COMPONENT_NUMBER:
 case DEMANGLE_COMPONENT_UNNAMED_TYPE:
+case DEMANGLE_COMPONENT_STRUCTURED_BINDING:
   break;
 
 case DEMANGLE_COMPONENT_TEMPLATE:
@@ -4850,6 +4874,19 @@ d_print_comp_inner (struct d_print_info *dpi, int options,
   d_append_char (dpi, ']');
   return;
 
+case DEMANGLE_COMPONENT_STRUCTURED_BINDING:
+  d_append_char (dpi, '[');
+  for (;;)
+	{
+	  d_print_comp (dpi, options, d_left (dc));
+	  dc = d_right (dc);
+	  if (!dc)
+	break;
+	  d_append_string (dpi, ", ");
+	}
+  d_append_char (dpi, ']');
+  return;
+
 case DEMANGLE_COMPONENT_QUAL_NAME:
 case DEMANGLE_COMPONENT_LOCAL_NAME:
   d_print_comp (dpi, options, d_left (dc));
diff --git a/libiberty/testsuite/demangle-expected b/libiberty/testsuite/demangle-expected
index de54ad73cc8..2b0b531d4bf 100644
--- a/libiberty/testsuite/demangle-expected
+++ b/libiberty/testsuite/demangle-expected
@@ -1493,3 +1493,13 @@ decltype ({parm#1}.A::x) f(A)
 
 _Z2f6IP1AEDtptfp_gssr1A1BE1xET_
 decltype ({parm#1}->(::A::B::x)) f6(A*)
+
+# Structured Bindings
+_ZDC1a1bE
+[a, b]
+
+_ZNStDC1aEE
+std::[a]
+
+_ZN3NMSDC1aEE
+NMS::[a]
-- 
2.30.2
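
The `DC <source-name>+ E` production handled by the patch above can be tried in isolation with a small standalone sketch.  This is not libiberty code: `parse_source_name` and `demangle_structured_binding` are hypothetical helpers, and the sketch only accepts the unscoped form `_ZDC...E` (no nested names, no abi-tags).

```cpp
#include <cassert>
#include <cctype>
#include <cstddef>
#include <optional>
#include <string>

// Hypothetical helper (not part of libiberty): parse one <source-name>,
// i.e. a decimal length followed by that many identifier characters.
static std::optional<std::string>
parse_source_name (const std::string &s, std::size_t &pos)
{
  assert (pos <= s.size ());
  std::size_t len = 0;
  bool have_digit = false;
  while (pos < s.size () && std::isdigit (static_cast<unsigned char> (s[pos])))
    {
      len = len * 10 + static_cast<std::size_t> (s[pos] - '0');
      ++pos;
      have_digit = true;
    }
  if (!have_digit || len == 0 || len > s.size () - pos)
    return std::nullopt;
  std::string name = s.substr (pos, len);
  pos += len;
  return name;
}

// Hypothetical helper: demangle "_ZDC <source-name>+ E" into "[a, b, ...]".
std::optional<std::string>
demangle_structured_binding (const std::string &mangled)
{
  if (mangled.compare (0, 4, "_ZDC") != 0)
    return std::nullopt;
  std::size_t pos = 4;
  std::string out = "[";
  bool first = true;
  // The '+' in the grammar means at least one <source-name>, hence do/while.
  do
    {
      std::optional<std::string> name = parse_source_name (mangled, pos);
      if (!name)
        return std::nullopt;
      if (!first)
        out += ", ";
      out += *name;
      first = false;
    }
  while (pos < mangled.size () && mangled[pos] != 'E');
  if (pos >= mangled.size ())
    return std::nullopt;        // the terminating 'E' is missing
  ++pos;                        // consume 'E'
  if (pos != mangled.size ())
    return std::nullopt;        // trailing characters are not handled here
  out += "]";
  return out;
}
```

With these helpers, `demangle_structured_binding ("_ZDC1a1bE")` yields `[a, b]`, matching the first new entry in demangle-expected.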

Re: [PATCH] OpenMP, libgomp: Add new runtime routines omp_target_memcpy_async and omp_target_memcpy_rect_async

2022-05-17 Thread Jakub Jelinek via Gcc-patches

On Tue, May 17, 2022 at 11:57:02AM +0200, Marcel Vollweiler wrote:
> > More importantly, I have no idea how this can work when you pass arg_size 0
> > and arg_align 0.  The s variable is in the current function frame, with
> > arg_size 0 nothing is really copied to the generated task.
> > arg_size should be sizeof (memcpy_t) and arg_align __alignof__ (memcpy_t)
> > (well, struct omp_target_memcpy_data).
> 
> The copy function of GOMP_task ("cpyfn") is not used here (set to NULL) and
> thus also arg_size and arg_align are set to 0 since they are related to cpyfn
> if I understand it correctly.

No, arg_size and arg_align are for all (explicit) tasks the size and
alignment of the arguments.  For an included task (one executed by the
encountering thread) we indeed use data directly instead of allocating
arg_size arg_align aligned bytes and copying data to it.  But when we create
a deferred task (that is the only thing that actually can be asynchronous), we
allocate struct gomp_task together with memory for the data (arg_size bytes
aligned to arg_align).  If cpyfn, we invoke that copy function (from source
data to the destination buffer), otherwise memcpy.  cpyfn is a callback that
will do memcpy for parts that need bitwise copy and copy construction /
whatever else is needed for other data.
Looking at your patch, you call GOMP_task always with if_clause = false,
that means it is always an included task (like with #pragma omp task if(0)),
but that also means calling GOMP_task doesn't bring any advantages and it is
not asynchronous.
If you called it with if_clause = true, like what #pragma omp task would do,
then the arg_size = 0 and arg_align = 0 would make it not work at all,
so after fixing if_clause, you need to supply sizeof (s) and __alignof__ (s).
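
To illustrate why arg_size and arg_align matter for a deferred task, here is a minimal sketch.  It is not libgomp code: `deferred_task`, `memcpy_args` and `make_copy_task` are hypothetical names, and it only models the "allocate arg_size bytes aligned to arg_align and memcpy the data" step described above for the case where cpyfn is NULL.

```cpp
#include <cassert>
#include <cstddef>
#include <cstdint>
#include <cstring>
#include <memory>
#include <vector>

// Hypothetical argument block, standing in for the patch's local struct.
struct memcpy_args
{
  char *dst;
  const char *src;
  std::size_t length;
};

// Hypothetical model of a deferred task: it owns a private, suitably aligned
// copy of the argument block, so it can run after the creator has returned.
class deferred_task
{
public:
  deferred_task (void (*fn) (void *), const void *data,
                 std::size_t arg_size, std::size_t arg_align)
    : fn_ (fn), storage_ (arg_size + arg_align), data_ (nullptr)
  {
    // arg_align must be a non-zero power of two for the rounding below.
    assert (arg_align != 0 && (arg_align & (arg_align - 1)) == 0);
    std::uintptr_t addr = reinterpret_cast<std::uintptr_t> (storage_.data ());
    addr = (addr + arg_align - 1) & ~static_cast<std::uintptr_t> (arg_align - 1);
    data_ = reinterpret_cast<void *> (addr);
    // With arg_size == 0 nothing would be copied and fn would later read an
    // empty block, which is the problem pointed out above.
    std::memcpy (data_, data, arg_size);
  }
  deferred_task (const deferred_task &) = delete;
  deferred_task &operator= (const deferred_task &) = delete;

  void run () { fn_ (data_); }

private:
  void (*fn_) (void *);
  std::vector<unsigned char> storage_;
  void *data_;
};

static void
do_memcpy (void *p)
{
  memcpy_args *a = static_cast<memcpy_args *> (p);
  std::memcpy (a->dst, a->src, a->length);
}

// The argument block `s` lives only in this frame; the task keeps its own copy.
std::unique_ptr<deferred_task>
make_copy_task (char *dst, const char *src, std::size_t length)
{
  memcpy_args s = { dst, src, length };
  return std::make_unique<deferred_task> (do_memcpy, &s, sizeof (s),
                                          alignof (memcpy_args));
}
```

The task returned by `make_copy_task` can be run after that function has returned, because the arguments were copied with the real size and alignment rather than 0, 0.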

> > Also, it would be nice to avoid GOMP_task for the depobj_count == 0 case
> > at least sometimes (but perhaps that can be done incrementally) and instead
> > use some CUDA etc. asynchronous copy APIs.  We don't really need to wait
> > for anything in that case, and from OpenMP POV all we need to make sure is
> > that barrier/taskwait/taskgroup end will know about these "tasks" and
> > wait for them.  So, it can be implemented more like #pragma omp target
> > nowait instead of #pragma omp task that calls the synchronous
> > omp_target_memcpy.
> > Though, maybe that is how it should be implemented always, something like
> > gomp_create_target_task and its caller.  We already use that single routine
> > for multiple purposes (target nowait as well as target enter/exit data
> > nowait), so just telling it somehow that it shouldn't do mapping/unmapping
> > and perhaps target execution and instead copying would be nice.
> 
> I don't see/understand the advantage of using gomp_create_target_task over
> GOMP_task. Whether the task waits for dependencies
> ("gomp_task_maybe_wait_for_dependencies") depends on GOMP_TASK_FLAG_DEPEND,
> which is only set if depobj_count > 0 and depobj_list != NULL. Thus, there
> shouldn't be any waiting in case of depobj_count == 0? Additionally, in both
> functions a new thread is created - independently of dependencies.

GOMP_task never creates a new thread.
gomp_create_target_task can create (but just once) an unshackled thread
that runs on the side, doesn't do normal OpenMP user work and just polls the
offloading device and performs unmapping or whatever is needed to finish a
nowait offloaded task.

The disadvantage of GOMP_task is:
1) if you call say omp_target_memcpy_async from outside of parallel, it will
   not be actually asynchronous even if you call GOMP_task with if_clause = true
2) if you call it from inside of parallel, it might be scheduled only when
   some host thread is ready for work (e.g. when reaching #pragma omp barrier,
   implicit barrier, #pragma omp taskwait etc.), so even when the offloading
   device is unused but host has lots of work to do, it might take quite a
   while before starting the work, and then one of the OpenMP host threads
   will be blocked waiting for the copying to be done

gomp_create_target_task doesn't have these disadvantages, it can fire off the
copying right away and then just needs to be able to figure out when it
finished (either the unshackled thread polls the device, or some other way
how to find out that it finished; but OpenMP certainly needs to know that,
because user code can say #pragma omp taskwait for it, or it should be
complete at the end of a taskgroup, or at the end of #pragma omp barrier
or implicit barrier etc.).

Anyway, I guess it is ok to use GOMP_task in the initial patch and change it
later, but if_clause = false and 0, 0 for arg_{size,align} are definitely
wrong.

> +int
> +omp_target_memcpy (void *dst, const void *src, size_t length, size_t dst_offset,
> +size_t src_offset, int dst_device_num, int src_device_num)
> +{
> +  struct gomp_device_descr *dst_devicep = NULL, *src_devicep = NULL;
> +  int ret;
> +
> +  ret = omp_target_memcpy_check

[PATCH] Simplify logic in tree-scalar-evolution's expensive_expression_p.

2022-05-17 Thread Roger Sayle


This patch simplifies tree-scalar-evolution's expression_expensive_p, but
produces identical results; the replacement implementation is just smaller
(uses less memory), faster and easier to understand.

The current idiom (introduced to fix PR90726) looks like:

hash_map<tree, uint64_t> cache;
uint64_t expanded_size = 0;
return (expression_expensive_p (expr, cache, expanded_size)
   || expanded_size > cache.elements ());

Here the recursive function computes expanded_size, effectively the
number of tree nodes visited, which is then only used in the comparison
against cache.elements(), i.e. to check whether the number of visited
nodes is greater than the number of unique visited nodes.  This is
equivalent to instead checking whether expression_expensive_p's recursion
visits any node more than once.

Instead of using a map to cache the "cost" of revisited sub-trees, the
same outcome can be determined using a set, and immediately returning
true as soon as encountering a previously seen tree node, avoiding the
unnecessary "cost"/expanded_size computation.  [A simplification analogous
to checking STL's empty() instead of comparing size() with zero].

The semantics of expression_expensive_p (both before and after) are
quite reasonable, as calling unshare_expr on a generic tree can result
in an exponential growth in the number of gimple statements, hence
any "shared" nodes are indeed expensive.  If shared nodes are to be
allowed, they'll need to be managed explicitly with SAVE_EXPR (or similar
mechanism) that avoids exponential growth.
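
The equivalence argued above can be sketched outside GCC.  The following is only an illustration under stated assumptions: `node` is a hypothetical stand-in for a tree, std::unordered_map and std::unordered_set stand in for hash_map and hash_set, and the other reasons for returning true (expensive operations, overflow checks) are left out.

```cpp
#include <cassert>
#include <cstdint>
#include <unordered_map>
#include <unordered_set>
#include <vector>

// Hypothetical stand-in for a tree node; only operand links are modelled.
struct node
{
  std::vector<const node *> ops;
};

// Old idiom, simplified: accumulate the expanded size (shared subtrees are
// counted once per use) while memoising each node's subtree cost.
static void
count_visits (const node *n,
              std::unordered_map<const node *, std::uint64_t> &cache,
              std::uint64_t &cost)
{
  assert (n != nullptr);
  auto [it, inserted] = cache.try_emplace (n, 1);
  if (!inserted)
    {
      cost += it->second;       // revisit: charge the cached subtree cost
      return;
    }
  std::uint64_t op_cost = 0;
  for (const node *op : n->ops)
    count_visits (op, cache, op_cost);
  cache[n] += op_cost;          // look up again; `it` may be stale after rehash
  cost += op_cost + 1;
}

bool
shared_by_counting (const node *root)
{
  std::unordered_map<const node *, std::uint64_t> cache;
  std::uint64_t expanded_size = 0;
  count_visits (root, cache, expanded_size);
  return expanded_size > cache.size ();
}

// New idiom: stop as soon as a node is seen twice.  Note that GCC's
// hash_set::add returns true when the element was already present, whereas
// std::unordered_set::insert reports true in .second for a new element.
static bool
revisits (const node *n, std::unordered_set<const node *> &seen)
{
  assert (n != nullptr);
  if (!seen.insert (n).second)
    return true;
  for (const node *op : n->ops)
    if (revisits (op, seen))
      return true;
  return false;
}

bool
shared_by_set (const node *root)
{
  std::unordered_set<const node *> seen;
  return revisits (root, seen);
}
```

For a plain tree both functions return false; as soon as one operand is reachable twice, both return true, but the set-based version stops at the first repeat.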


This patch has been tested on x86_64-pc-linux-gnu with make bootstrap
and make -k check, both with and without --target_board=unix{-m32}, with
no new failures.  Is this a reasonable clean-up for mainline?


2022-05-17  Roger Sayle  

gcc/ChangeLog
* tree-scalar-evolution.cc (expression_expensive_p): Change type
of cache from hash_map to hash_set, and remove cost argument.
When expr appears in the hash_set, return true.  Calculation of
cost (and updating hash_map) is no longer required.
(expression_expensive_p):  Simplify top-level implementation.


Thanks in advance,
Roger
--

diff --git a/gcc/tree-scalar-evolution.cc b/gcc/tree-scalar-evolution.cc
index 72ceb40..347dede 100644
--- a/gcc/tree-scalar-evolution.cc
+++ b/gcc/tree-scalar-evolution.cc
@@ -3352,8 +3352,7 @@ scev_finalize (void)
for scev_const_prop.  */
 
 static bool
-expression_expensive_p (tree expr, hash_map<tree, uint64_t> &cache,
-                        uint64_t &cost)
+expression_expensive_p (tree expr, hash_set<tree> &cache)
 {
   enum tree_code code;
 
@@ -3377,19 +3376,11 @@ expression_expensive_p (tree expr, hash_map<tree, uint64_t> &cache,
return true;
 }
 
-  bool visited_p;
-  uint64_t &local_cost = cache.get_or_insert (expr, &visited_p);
-  if (visited_p)
-{
-  uint64_t tem = cost + local_cost;
-  if (tem < cost)
-   return true;
-  cost = tem;
-  return false;
-}
-  local_cost = 1;
+  /* If we've encountered this expression before, it would be duplicated
+ by unshare_expr, which makes this expression expensive.  */
+  if (cache.add (expr))
+return true;
 
-  uint64_t op_cost = 0;
   if (code == CALL_EXPR)
 {
   tree arg;
@@ -3431,16 +3422,14 @@ expression_expensive_p (tree expr, hash_map<tree, uint64_t> &cache,
}
 
   FOR_EACH_CALL_EXPR_ARG (arg, iter, expr)
-   if (expression_expensive_p (arg, cache, op_cost))
+   if (expression_expensive_p (arg, cache))
  return true;
-  *cache.get (expr) += op_cost;
-  cost += op_cost + 1;
   return false;
 }
 
   if (code == COND_EXPR)
 {
-  if (expression_expensive_p (TREE_OPERAND (expr, 0), cache, op_cost)
+  if (expression_expensive_p (TREE_OPERAND (expr, 0), cache)
  || (EXPR_P (TREE_OPERAND (expr, 1))
  && EXPR_P (TREE_OPERAND (expr, 2)))
  /* If either branch has side effects or could trap.  */
@@ -3448,13 +3437,9 @@ expression_expensive_p (tree expr, hash_map<tree, uint64_t> &cache,
  || generic_expr_could_trap_p (TREE_OPERAND (expr, 1))
  || TREE_SIDE_EFFECTS (TREE_OPERAND (expr, 0))
  || generic_expr_could_trap_p (TREE_OPERAND (expr, 0))
- || expression_expensive_p (TREE_OPERAND (expr, 1),
-cache, op_cost)
- || expression_expensive_p (TREE_OPERAND (expr, 2),
-cache, op_cost))
+ || expression_expensive_p (TREE_OPERAND (expr, 1), cache)
+ || expression_expensive_p (TREE_OPERAND (expr, 2), cache))
return true;
-  *cache.get (expr) += op_cost;
-  cost += op_cost + 1;
   return false;
 }
 
@@ -3462,15 +3447,13 @@ expression_expensive_p (tree expr, hash_map<tree, uint64_t> &cache,
 {
 case tcc_binary:
 case tcc_comparison:
-  if (expression_expensive_p (TREE_OPERAND (expr, 1), cache, op_cost))
+  if (expression_expensive_p (TREE_OPERAND (expr, 1), cache))
return true;
 
   /* Fallthru.

RE: [PATCH 1/3]middle-end: Add the ability to let the target decide the method of argument promotions.

2022-05-17 Thread Tamar Christina via Gcc-patches

 […]
> >> > We generate for e.g.:
> >> >
> >> > #include <stdint.h>
> >> >
> >> > uint16_t f8 (uint8_t xr, uint8_t xc){
> >> > return (uint8_t)(xr * xc);
> >> > }
> >> >
> >> > (insn 9 6 10 2 (set (reg:HI 101)
> >> (zero_extend:HI (reg/v:QI 96 [ xr ]))) "prom.c":4:16 -1
> >> (nil))
> >> (insn 10 9 11 2 (set (reg:HI 102)
> >> (zero_extend:HI (reg/v:QI 98 [ xc ]))) "prom.c":4:16 -1
> >> (nil))
> >> (insn 11 10 12 2 (set (reg:SI 103)
> >> (mult:SI (subreg:SI (reg:HI 101) 0)
> >> (subreg:SI (reg:HI 102) 0))) "prom.c":4:16 -1
> >> (nil))
> >> >
> >> > Out of expand. The paradoxical subreg isn't generated at all out of
> >> > expand unless it's needed. It does keep the original params around
> >> > as
> >> unused:
> >> >
> >> > (insn 2 7 4 2 (set (reg:QI 97)
> >> (reg:QI 0 x0 [ xr ])) "prom.c":3:37 -1
> >> (nil))
> >> (insn 4 2 3 2 (set (reg:QI 99)
> >> (reg:QI 1 x1 [ xc ])) "prom.c":3:37 -1
> >> (nil))
> >> >
> >> > And the paradoxical subreg is moved into the first operation requiring 
> >> > it:
> >> >
> >> > (insn 11 10 12 2 (set (reg:SI 103)
> >> (mult:SI (subreg:SI (reg:HI 101) 0)
> >> (subreg:SI (reg:HI 102) 0))) "prom.c":4:16 -1
> >> (nil))
> >>
> >> Ah, OK, this isn't what I'd imagined.  I thought the xr and xc
> >> registers would be SIs and the DECL_RTLs would be QI subregs of those SI
> >> regs.
> >> I think that might work better, for the reasons above.  (That is,
> >> whenever we need the register in extended form, we can simply extend
> >> the existing reg rather than create a new one.)
> >
> > Ah, I see, no, I explicitly avoid this. When doing the type promotions
> > I tell it that size of the copies of xr and xc is still the original size, 
> > e.g. QI (i.e. I
> don't change 97 and 99).
> > This is different from what we do with extends where 97 and 99 *would*
> be changed.
> >
> > The reason is that if I make this SI the compiler thinks it knows the
> > value of all the bits in the register which led to various miscompares as it
> thinks it can use the SI value directly.
> >
> > This happens because again the xr and xc are hard regs. So having 97
> > be
> >
> > (set (reg:SI 97) (subreg:SI (reg:QI 0 x0 [ xr ]) 0))
> >
> > gets folded to an incorrect
> >
> > (set (reg:SI 97) (reg:SI 0 x0 [ xr ]))
> 
> This part I would expect (and hope for :-)).
> 
> > And now 97 is free to be used without any zero extension, as 97 on its own
> > is an invalid RTX.
> 
> But the way I'd imagined it working, expand would need to insert an
> extension before any operation that needs the upper 24 bits to be defined
> (e.g. comparisons, right shifts).  If the DECL_RTL is (subreg:QI (reg:SI x) 0)
> then the upper bits are not defined, since SUBREG_PROMOTED_VAR_P
> would/should be false for the subreg.

Ah I see, my fear here was that if we have a pattern which splits out the
zero-extend for whatever reason, then folding it would make it invalid.  But I
think I understand what you meant.  In your case we'd never again use the
hardreg; everything goes through 97.  Got it.
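
The point that only some operations need the upper 24 bits to be defined can be checked with plain integer arithmetic.  The sketch below is an illustration only: it models the register as a uint32_t, `garbage` stands for whatever happens to be in bits 8..31, and all function names are hypothetical.

```cpp
#include <cassert>
#include <cstdint>

// Model an 8-bit value held in a 32-bit register whose upper 24 bits are
// undefined; `garbage` supplies those upper bits.
constexpr std::uint32_t
in_reg (std::uint8_t value, std::uint32_t garbage)
{
  assert (garbage <= 0xffffffu);
  return (garbage << 8) | value;
}

// Bitwise NOT and addition: the low 8 bits of the result depend only on the
// low 8 bits of the input, so no extension is needed beforehand.
constexpr std::uint8_t
not_low8 (std::uint32_t reg)
{
  return static_cast<std::uint8_t> (~reg);
}

constexpr std::uint8_t
add1_low8 (std::uint32_t reg)
{
  return static_cast<std::uint8_t> (reg + 1u);
}

// Right shift without an extension: bit 8 leaks into bit 7 of the result.
constexpr std::uint8_t
shr1_low8_unextended (std::uint32_t reg)
{
  return static_cast<std::uint8_t> (reg >> 1);
}

// Right shift after a zero extension from 8 bits: the garbage is masked off.
constexpr std::uint8_t
shr1_low8_extended (std::uint32_t reg)
{
  return static_cast<std::uint8_t> ((reg & 0xffu) >> 1);
}
```

So `~a` and `a + 1` can operate on the register as-is, while `a >> 1` (and likewise a comparison) needs the extension first.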

> 
> E.g. for:
> 
>   int8_t foo(int8_t x) { return x >> 1; }
> 
> x would have a DECL_RTL of (subreg:QI (reg:SI x) 0), the parameter
> assignment would be expanded as:
> 
>   (set (reg:SI x) (reg:SI x0))
> 
> the shift would be expanded as:
> 
>   (set (reg:SI x) (zero_extend:SI (subreg:QI (reg:SI x) 0)))
>   (set (reg:SI x) (ashiftrt:SI (reg:SI x) (const_int 1)))
> 
> and the return assignment would be expanded as:
> 
>   (set (reg:SI x0) (reg:SI x))
> 
> x + 1 would instead be expanded to just:
> 
>   (set (reg:SI x) (plus:SI (reg:SI x) (const_int 1)))
> 
> (without an extension).
> 
> I realised later though that, although reusing the DECL_RTL reg for the
> extension has the nice RA property of avoiding multiple live values, it would
> make it harder to combine the extension into the operation if the variable is
> still live afterwards.  So I guess we lose something both ways.
> 
> Maybe we need a different approach, not based on changing
> PROMOTE_MODE.
> 
> I wonder how easy it would be to do the promotion in gimple, then reuse
> backprop to determine when a sign/zero-extension (i.e. a normal gimple cast)
> can be converted into an “any extend”
> (probably represented as a new ifn).

Do you mean without changing the hook implementation but keeping the current
promotion?

I guess the problem here is that it's the inverse case that's the problem,
isn't it?  It's not that there are unneeded extends in gimple, it's that some
operations require an any-extend, no?

For example, in gimple ~a, where a is an 8-bit quantity, requires an
any-extend, but no cast would be there in gimple.

So for instance

#include <stdint.h>

uint8_t f (uint8_t a)
{
return ~a;
}

Is just simply:

f (uint8_t a)
{
  uint8_t _2;

   [local count: 1073741824]:
  _2 = ~a_1(D);
  return _2;

}

In gimple. I'm also slightly worried about interfering with phi opts. Backprop
runs before ifcombine and phiopts, for instance, and there are various phi opts
like ifcombine_ifandif that rely on the

Re: [AArch64] Improve SVE dup intrinsics codegen

2022-05-17 Thread Richard Sandiford via Gcc-patches

"Andre Vieira (lists)"  writes:
> Hi,
>
> This patch teaches the aarch64 backend to improve codegen when using dup 
> with NEON vectors with repeating patterns. It will attempt to use a 
> smaller NEON vector (or element) to limit the number of instructions 
> needed to construct the input vector.

The new sequences definitely look like an improvement.  However, this
change overlaps a bit with what Prathamesh is doing for PR96463.

Stepping back and thinking about how we handle this kind of thing
in general, it might make sense to do the following:

(1) Extend VEC_PERM_EXPR so that it can handle Advanced SIMD inputs
and SVE outputs (for constant permute indices).  This is part of
what Prathamesh is doing.

(2a) Add a way for targets to expand such VEC_PERM_EXPRs when the
 arguments are CONSTRUCTORs.  This would only be useful for
 variable-length vectors, since VEC_PERM_EXPRs of CONSTRUCTORs
 should be folded to new CONSTRUCTORs for fixed-length vectors.

(2b) Generalise the SVE handling in aarch64_expand_vector_init
 to cope with general rtx_vector_builders, rather than just
 fixed-length ones, and use it to implement the new hook
 added in (2a).

(3a) Use VEC_PERM_EXPRs of CONSTRUCTORs to simplify or replace the
 duplicate_and_interleave stuff in SLP (think Richi would be glad
 to see this go :-)).

(3b) Make svdupq_impl::fold() lower non-constant inputs to VEC_PERM_EXPRs
 of CONSTRUCTORs.

with (3a) and (3b) being independent from each other.

The advantages of doing things this way are:

* autovectorised SLP code will benefit from the same tricks as svdupq.

* gimple optimisers get to work with the simplified svdupq form.

If you don't want to do that, or wait for it to happen, perhaps
we could short-circuit the process by doing (2b) on its own.
That is, create an interface like:

   void aarch64_expand_vector_init (rtx target, rtx_vector_builder &builder);

Then have svdupq_impl::expand stuff the elements into an
rtx_vector_builder (a bit like svdupq_impl::fold does with a
tree_vector_builder when the elements are constant) and pass the
rtx_vector_builder to this new routine.  Then aarch64_expand_vector_init
would be a home for all the optimisations, using the npatterns/
nelts_per_pattern information where useful.  It would be good if
possible to integrate it with the existing SVE aarch64_expand_vector_init
code.

This would also make it easier to optimise:

svint8_t int8_2(int8_t a, int8_t b)
{
return svdupq_n_s8(a, b, a, b, a, b, a, b, a, b, a, b, a, b, a, b);
}

to the expected 16-bit dup, even without V2QI being defined.
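
As a rough illustration of the kind of check such a routine could make, the sketch below finds the smallest power-of-two period of an element list.  It is not GCC code: `smallest_repeating_period` is a hypothetical helper, and the link to npatterns (a period of 2 corresponding to two interleaved one-element patterns) is my reading of the vector_builder encoding rather than something taken from the patch.

```cpp
#include <cassert>
#include <cstddef>
#include <vector>

// Hypothetical helper: return the smallest power-of-two period P such that
// elts[i] == elts[i - P] for every i >= P.  Returns elts.size () when the
// elements do not repeat with any shorter power-of-two period.
template <typename T>
std::size_t
smallest_repeating_period (const std::vector<T> &elts)
{
  const std::size_t n = elts.size ();
  // The sketch assumes a power-of-two element count, as for a 128-bit quadword.
  assert (n != 0 && (n & (n - 1)) == 0);
  for (std::size_t period = 1; period < n; period *= 2)
    {
      bool repeats = true;
      for (std::size_t i = period; i < n && repeats; ++i)
        repeats = elts[i] == elts[i - period];
      if (repeats)
        return period;
    }
  return n;
}
```

For the int8_2 example above, the sixteen arguments a, b, a, b, ... have period 2, so two 8-bit elements (16 bits) are enough to describe the whole quadword.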

Thanks,
Richard

> Bootstrapped and regression tested  aarch64-none-linux-gnu.
>
> Is this OK for trunk?
>
> gcc/ChangeLog:
>
>      * config/aarch64/aarch64.cc (aarch64_simd_container_mode): Make 
> it global.
>      * config/aarch64/aarch64-protos.h 
> (aarch64_simd_container_mode): Declare it.
>      * config/aarch64/aarch64-sve.md (*vec_duplicate_reg): 
> Rename this to ...
>      (@aarch64_vec_duplicae_reg_): ... this.
>      * gcc/config/aarch64-sve-builtins-base.cc 
> (svdup_lane_impl::expand): Improve codegen when inputs form a repeating 
> pattern.
>
> gcc/testsuite/ChangeLog:
>
>      * gcc.target/aarch64/sve/dup_opt.c: New test.
>
> diff --git a/gcc/config/aarch64/aarch64-protos.h 
> b/gcc/config/aarch64/aarch64-protos.h
> index 
> 2ac781dff4a93cbe0f0b091147b2521ed1a88750..cfc31b467cf1d3cd79b2dfe6a54e6910dd43b5d8
>  100644
> --- a/gcc/config/aarch64/aarch64-protos.h
> +++ b/gcc/config/aarch64/aarch64-protos.h
> @@ -771,6 +771,7 @@ int aarch64_branch_cost (bool, bool);
>  enum aarch64_symbol_type aarch64_classify_symbolic_expression (rtx);
>  bool aarch64_advsimd_struct_mode_p (machine_mode mode);
>  opt_machine_mode aarch64_vq_mode (scalar_mode);
> +machine_mode aarch64_simd_container_mode (scalar_mode, poly_int64);
>  opt_machine_mode aarch64_full_sve_mode (scalar_mode);
>  bool aarch64_can_const_movi_rtx_p (rtx x, machine_mode mode);
>  bool aarch64_const_vec_all_same_int_p (rtx, HOST_WIDE_INT);
> diff --git a/gcc/config/aarch64/aarch64-sve-builtins-base.cc 
> b/gcc/config/aarch64/aarch64-sve-builtins-base.cc
> index 
> c24c05487246f529f81867d6429e636fd6dc74d0..f8b755a83dc37578363270618323f87c95fa327f
>  100644
> --- a/gcc/config/aarch64/aarch64-sve-builtins-base.cc
> +++ b/gcc/config/aarch64/aarch64-sve-builtins-base.cc
> @@ -875,13 +875,98 @@ public:
> argument N to go into architectural lane N, whereas Advanced SIMD
> vectors are loaded memory lsb to register lsb.  We therefore need
> to reverse the elements for big-endian targets.  */
> -rtx vq_reg = gen_reg_rtx (vq_mode);
>  rtvec vec = rtvec_alloc (elements_per_vq);
>  for (unsigned int i = 0; i < elements_per_vq; ++i)
>{
>   unsigned int argno = BYTES_BIG_ENDIAN ? elements_per_vq - i - 1 : i;
>   RTVEC_ELT (vec, i) = e.args[argno];
>}
> +
> +/* Look for a repeating pattern

Re: [PATCH v2] PR105169 Fix references to discarded sections

2022-05-17 Thread Richard Biener via Gcc-patches




> On 17.05.2022 at 19:37, Giuliano Belinassi via Gcc-patches wrote:
> 
> On Mon, 2022-05-09 at 13:39 +0200, Richard Biener wrote:
>>> On Sat, 7 May 2022, Giuliano Belinassi wrote:
>>> 
>>> When -fpatchable-function-entry= is enabled, certain C++ code fails to
>>> link because of generated references to discarded sections in the
>>> __patchable_function_entries section. This commit fixes this problem by
>>> putting those references in a COMDAT section.
>>> 
>>> On the previous patch, GCC fails to compile with --enable-vtable-verify
>>> enabled. This version compiles fine with it.
>> 
>> OK for trunk.
>> 
> 
> Pushed to trunk. Is it okay if I backport it to older versions as well?

Ok for the GCC 12 branch after a week with no reported issues. Since it’s not a
regression, further backporting requires good reasons.

Richard.

> Giuliano.
> 
>> Thanks,
>> Richard.
>> 
>>> 2022-05-06  Giuliano Belinassi  
>>> 
>>>PR c++/105169
>>>* targhooks.cc (default_print_patchable_function_entry_1):
>>> Handle COMDAT case.
>>>* varasm.cc (switch_to_comdat_section): New
>>>(handle_vtv_comdat_section): Call switch_to_comdat_section.
>>>* varasm.h: Declare switch_to_comdat_section.
>>> 
>>> gcc/testsuite/ChangeLog
>>> 2022-05-06  Giuliano Belinassi  
>>> 
>>>PR c++/105169
>>>* g++.dg/modules/pr105169.h: New file.
>>>* g++.dg/modules/pr105169_a.C: New test.
>>>* g++.dg/modules/pr105169_b.C: New file.
>>> ---
>>> gcc/targhooks.cc  |  8 --
>>> gcc/testsuite/g++.dg/modules/pr105169.h   | 22 +++
>>> gcc/testsuite/g++.dg/modules/pr105169_a.C | 25 +
>>> gcc/testsuite/g++.dg/modules/pr105169_b.C | 12 +
>>> gcc/varasm.cc | 33 ++-
>>> 
>>> gcc/varasm.h  |  2 ++
>>> 6 files changed, 87 insertions(+), 15 deletions(-)
>>> create mode 100644 gcc/testsuite/g++.dg/modules/pr105169.h
>>> create mode 100644 gcc/testsuite/g++.dg/modules/pr105169_a.C
>>> create mode 100644 gcc/testsuite/g++.dg/modules/pr105169_b.C
>>> 
>>> diff --git a/gcc/targhooks.cc b/gcc/targhooks.cc
>>> index 399d6f874dc..b15ae19bcb6 100644
>>> --- a/gcc/targhooks.cc
>>> +++ b/gcc/targhooks.cc
>>> @@ -2009,8 +2009,12 @@ default_print_patchable_function_entry_1
>>> (FILE *file,
>>>   patch_area_number++;
>>>   ASM_GENERATE_INTERNAL_LABEL (buf, "LPFE",
>>> patch_area_number);
>>> 
>>> -  switch_to_section (get_section
>>> ("__patchable_function_entries",
>>> -  flags, current_function_decl));
>>> +  section *sect = get_section ("__patchable_function_entries",
>>> +  flags, current_function_decl);
>>> +  if (HAVE_COMDAT_GROUP && DECL_COMDAT_GROUP
>>> (current_function_decl))
>>> +switch_to_comdat_section (sect, current_function_decl);
>>> +  else
>>> +switch_to_section (sect);
>>>   assemble_align (POINTER_SIZE);
>>>   fputs (asm_op, file);
>>>   assemble_name_raw (file, buf);
>>> diff --git a/gcc/testsuite/g++.dg/modules/pr105169.h
>>> b/gcc/testsuite/g++.dg/modules/pr105169.h
>>> new file mode 100644
>>> index 000..a7e76270531
>>> --- /dev/null
>>> +++ b/gcc/testsuite/g++.dg/modules/pr105169.h
>>> @@ -0,0 +1,22 @@
>>> +class IPXAddressClass
>>> +{
>>> +public:
>>> +IPXAddressClass(void);
>>> +};
>>> +
>>> +class WinsockInterfaceClass
>>> +{
>>> +
>>> +public:
>>> +WinsockInterfaceClass(void);
>>> +
>>> +virtual void Set_Broadcast_Address(void*){};
>>> +
>>> +virtual int Get_Protocol(void)
>>> +{
>>> +return 0;
>>> +};
>>> +
>>> +protected:
>>> +};
>>> +
>>> diff --git a/gcc/testsuite/g++.dg/modules/pr105169_a.C
>>> b/gcc/testsuite/g++.dg/modules/pr105169_a.C
>>> new file mode 100644
>>> index 000..66dc4b7901f
>>> --- /dev/null
>>> +++ b/gcc/testsuite/g++.dg/modules/pr105169_a.C
>>> @@ -0,0 +1,25 @@
>>> +/* { dg-module-do link } */
>>> +/* { dg-options "-std=c++11 -fpatchable-function-entry=1 -O2" } */
>>> +/* { dg-additional-options "-std=c++11 -fpatchable-function-
>>> entry=1 -O2" } */
>>> +
>>> +/* This test is in the "modules" package because it supports
>>> multiple files
>>> +   linkage.  */
>>> +
>>> +#include "pr105169.h"
>>> +
>>> +WinsockInterfaceClass* PacketTransport;
>>> +
>>> +IPXAddressClass::IPXAddressClass(void)
>>> +{
>>> +}
>>> +
>>> +int function()
>>> +{
>>> +  return PacketTransport->Get_Protocol();
>>> +}
>>> +
>>> +int main()
>>> +{
>>> +  IPXAddressClass ipxaddr;
>>> +  return 0;
>>> +}
>>> diff --git a/gcc/testsuite/g++.dg/modules/pr105169_b.C
>>> b/gcc/testsuite/g++.dg/modules/pr105169_b.C
>>> new file mode 100644
>>> index 000..5f8b00dfe51
>>> --- /dev/null
>>> +++ b/gcc/testsuite/g++.dg/modules/pr105169_b.C
>>> @@ -0,0 +1,12 @@
>>> +/* { dg-module-do link } */
>>> +/* { dg-options "-std=c++11 -fpatchable-function-entry=1 -O2" } */
>>> +/* { dg-additional-options "-std=c++11 -fpatchable-function-
>>> entry=1 -O2" } */
>>> +
>

Re: [PATCH v2] PR105169 Fix references to discarded sections

2022-05-17 Thread Giuliano Belinassi via Gcc-patches

On Mon, 2022-05-09 at 13:39 +0200, Richard Biener wrote:
> On Sat, 7 May 2022, Giuliano Belinassi wrote:
> 
> > When -fpatchable-function-entry= is enabled, certain C++ code fails to
> > link because of generated references to discarded sections in the
> > __patchable_function_entries section. This commit fixes this problem by
> > putting those references in a COMDAT section.
> > 
> > On the previous patch, GCC fails to compile with --enable-vtable-verify
> > enabled. This version compiles fine with it.
> 
> OK for trunk.
> 

Pushed to trunk. Is it okay if I backport it to older versions as well?

Giuliano.

> Thanks,
> Richard.
> 
> > 2022-05-06  Giuliano Belinassi  
> > 
> > PR c++/105169
> > * targhooks.cc (default_print_patchable_function_entry_1):
> > Handle COMDAT case.
> > * varasm.cc (switch_to_comdat_section): New
> > (handle_vtv_comdat_section): Call switch_to_comdat_section.
> > * varasm.h: Declare switch_to_comdat_section.
> > 
> > gcc/testsuite/ChangeLog
> > 2022-05-06  Giuliano Belinassi  
> > 
> > PR c++/105169
> > * g++.dg/modules/pr105169.h: New file.
> > * g++.dg/modules/pr105169_a.C: New test.
> > * g++.dg/modules/pr105169_b.C: New file.
> > ---
> >  gcc/targhooks.cc  |  8 --
> >  gcc/testsuite/g++.dg/modules/pr105169.h   | 22 +++
> >  gcc/testsuite/g++.dg/modules/pr105169_a.C | 25 +
> >  gcc/testsuite/g++.dg/modules/pr105169_b.C | 12 +
> >  gcc/varasm.cc | 33 ++-
> > 
> >  gcc/varasm.h  |  2 ++
> >  6 files changed, 87 insertions(+), 15 deletions(-)
> >  create mode 100644 gcc/testsuite/g++.dg/modules/pr105169.h
> >  create mode 100644 gcc/testsuite/g++.dg/modules/pr105169_a.C
> >  create mode 100644 gcc/testsuite/g++.dg/modules/pr105169_b.C
> > 
> > diff --git a/gcc/targhooks.cc b/gcc/targhooks.cc
> > index 399d6f874dc..b15ae19bcb6 100644
> > --- a/gcc/targhooks.cc
> > +++ b/gcc/targhooks.cc
> > @@ -2009,8 +2009,12 @@ default_print_patchable_function_entry_1
> > (FILE *file,
> >patch_area_number++;
> >ASM_GENERATE_INTERNAL_LABEL (buf, "LPFE",
> > patch_area_number);
> >  
> > -  switch_to_section (get_section
> > ("__patchable_function_entries",
> > - flags, current_function_decl));
> > +  section *sect = get_section ("__patchable_function_entries",
> > + flags, current_function_decl);
> > +  if (HAVE_COMDAT_GROUP && DECL_COMDAT_GROUP
> > (current_function_decl))
> > +   switch_to_comdat_section (sect, current_function_decl);
> > +  else
> > +   switch_to_section (sect);
> >assemble_align (POINTER_SIZE);
> >fputs (asm_op, file);
> >assemble_name_raw (file, buf);
> > diff --git a/gcc/testsuite/g++.dg/modules/pr105169.h
> > b/gcc/testsuite/g++.dg/modules/pr105169.h
> > new file mode 100644
> > index 000..a7e76270531
> > --- /dev/null
> > +++ b/gcc/testsuite/g++.dg/modules/pr105169.h
> > @@ -0,0 +1,22 @@
> > +class IPXAddressClass
> > +{
> > +public:
> > +IPXAddressClass(void);
> > +};
> > +
> > +class WinsockInterfaceClass
> > +{
> > +
> > +public:
> > +WinsockInterfaceClass(void);
> > +
> > +virtual void Set_Broadcast_Address(void*){};
> > +
> > +virtual int Get_Protocol(void)
> > +{
> > +return 0;
> > +};
> > +
> > +protected:
> > +};
> > +
> > diff --git a/gcc/testsuite/g++.dg/modules/pr105169_a.C
> > b/gcc/testsuite/g++.dg/modules/pr105169_a.C
> > new file mode 100644
> > index 000..66dc4b7901f
> > --- /dev/null
> > +++ b/gcc/testsuite/g++.dg/modules/pr105169_a.C
> > @@ -0,0 +1,25 @@
> > +/* { dg-module-do link } */
> > +/* { dg-options "-std=c++11 -fpatchable-function-entry=1 -O2" } */
> > +/* { dg-additional-options "-std=c++11 -fpatchable-function-entry=1 -O2" } */
> > +
> > +/* This test is in the "modules" package because it supports multiple files
> > +   linkage.  */
> > +
> > +#include "pr105169.h"
> > +
> > +WinsockInterfaceClass* PacketTransport;
> > +
> > +IPXAddressClass::IPXAddressClass(void)
> > +{
> > +}
> > +
> > +int function()
> > +{
> > +  return PacketTransport->Get_Protocol();
> > +}
> > +
> > +int main()
> > +{
> > +  IPXAddressClass ipxaddr;
> > +  return 0;
> > +}
> > diff --git a/gcc/testsuite/g++.dg/modules/pr105169_b.C b/gcc/testsuite/g++.dg/modules/pr105169_b.C
> > new file mode 100644
> > index 000..5f8b00dfe51
> > --- /dev/null
> > +++ b/gcc/testsuite/g++.dg/modules/pr105169_b.C
> > @@ -0,0 +1,12 @@
> > +/* { dg-module-do link } */
> > +/* { dg-options "-std=c++11 -fpatchable-function-entry=1 -O2" } */
> > +/* { dg-additional-options "-std=c++11 -fpatchable-function-entry=1 -O2" } */
> > +
> > +/* This test is in the "modules" package because it supports multiple files
> > +   linkage.  */
> > +
> > +#include "pr105169.h"
> > +
> > +WinsockInterfaceClass::WinsockInterfaceClass(void)
>

Re: [PATCH, rs6000] Remove the (no longer used) RS6000_BTC defines.

2022-05-17 Thread Segher Boessenkool

Hi!

On Tue, May 17, 2022 at 11:54:10AM -0500, will schmidt wrote:
> These defines are no longer used once the rs6000 built-in
> reworks were completed.   Would be good to remove them.

:-)

> There was a reference to RS6000_BTC_SPECIAL in a TODO comment
> in rs6000-builtins.def.  That comment remains, but I have updated
> the comment to refer to "SPECIAL" processing, instead of having it
> refer directly to the RS6000_BTC_SPECIAL macro.
> 
> 2022-05-17  Will Schmidt  
> 
> gcc/
>   * config/rs6000/rs6000-builtins.def: rephrase
>   RS6000_BTC_SPECIAL in comment.

"Rephrase", capital R.

>   * config/rs6000/rs6000.h:  Remove definitions

One space after a colon.

>   RS6000_BTC_UNARY, RS6000_BTC_BINARY,
>   RS6000_BTC_TERNARY, RS6000_BTC_QUATERNARY,
>   RS6000_BTC_QUINARY, RS6000_BTC_SENARY, RS6000_BTC_OPND_MASK,
>   RS6000_BTC_SPECIAL, RS6000_BTC_PREDICATE, RS6000_BTC_ABS,
>   RS6000_BTC_DST, RS6000_BTC_TYPE_MASK, RS6000_BTC_MISC,
>   RS6000_BTC_CONST, RS6000_BTC_PURE, RS6000_BTC_FP,
>   RS6000_BTC_QUAD, RS6000_BTC_PAIR, RS6000_BTC_QUADPAIR,
>   RS6000_BTC_ATTR_MASK, RS6000_BTC_SPR, RS6000_BTC_VOID,
>   RS6000_BTC_CR, RS6000_BTC_OVERLOADED, RS6000_BTC_GIMPLE,
>   RS6000_BTC_MISC_MASK, RS6000_BTC_MEM, RS6000_BTC_SAT,
>   RS6000_BTM_ALWAYS

Sentences end in a dot, and every changelog line is a sentence.  But,
this should be
* config/rs6000/rs6000.h (RS6000_BTC_UNARY, RS6000_BTC_BINARY,
RS6000_BTC_TERNARY, RS6000_BTC_QUATERNARY, RS6000_BTC_QUINARY,
...
RS6000_BTC_SAT, RS6000_BTM_ALWAYS): Delete.

> --- a/gcc/config/rs6000/rs6000-builtins.def
> +++ b/gcc/config/rs6000/rs6000-builtins.def
> @@ -1423,11 +1423,11 @@
>  
>pure vsc __builtin_vsx_ld_elemrev_v16qi (signed long, const void *);
>  LD_ELEMREV_V16QI vsx_ld_elemrev_v16qi {ldvec,endian}
>  
>  ; TODO: There is apparent intent in rs6000-builtin.def to have

That file no longer exists.

> -; RS6000_BTC_SPECIAL processing for LXSDX, LXVDSX, and STXSDX, but there are
> +; SPECIAL processing for LXSDX, LXVDSX, and STXSDX, but there are
>  ; no def_builtin calls for any of them.  At some point, we may want to add a
>  ; set of built-ins for whichever vector types make sense for these.

Is the comment still relevant?  If so a bit more rephrasing would be
good; if not, just remove it?

Okay for trunk with those things dealt with somehow.  Thanks!


Segher

[PATCH, rs6000] Remove the (no longer used) RS6000_BTC defines.

2022-05-17 Thread will schmidt via Gcc-patches

Hi, 

These defines are no longer used once the rs6000 built-in
reworks were completed.   Would be good to remove them.

There was a reference to RS6000_BTC_SPECIAL in a TODO comment
in rs6000-builtins.def.  That comment remains, but I have updated
the comment to refer to "SPECIAL" processing, instead of having it
refer directly to the RS6000_BTC_SPECIAL macro.

2022-05-17  Will Schmidt  

gcc/
* config/rs6000/rs6000-builtins.def: rephrase
RS6000_BTC_SPECIAL in comment.
* config/rs6000/rs6000.h:  Remove definitions
RS6000_BTC_UNARY, RS6000_BTC_BINARY,
RS6000_BTC_TERNARY, RS6000_BTC_QUATERNARY,
RS6000_BTC_QUINARY, RS6000_BTC_SENARY, RS6000_BTC_OPND_MASK,
RS6000_BTC_SPECIAL, RS6000_BTC_PREDICATE, RS6000_BTC_ABS,
RS6000_BTC_DST, RS6000_BTC_TYPE_MASK, RS6000_BTC_MISC,
RS6000_BTC_CONST, RS6000_BTC_PURE, RS6000_BTC_FP,
RS6000_BTC_QUAD, RS6000_BTC_PAIR, RS6000_BTC_QUADPAIR,
RS6000_BTC_ATTR_MASK, RS6000_BTC_SPR, RS6000_BTC_VOID,
RS6000_BTC_CR, RS6000_BTC_OVERLOADED, RS6000_BTC_GIMPLE,
RS6000_BTC_MISC_MASK, RS6000_BTC_MEM, RS6000_BTC_SAT,
RS6000_BTM_ALWAYS


diff --git a/gcc/config/rs6000/rs6000-builtins.def b/gcc/config/rs6000/rs6000-builtins.def
index f4a9f24bcc5c..9a63a9eda580 100644
--- a/gcc/config/rs6000/rs6000-builtins.def
+++ b/gcc/config/rs6000/rs6000-builtins.def
@@ -1423,11 +1423,11 @@
 
   pure vsc __builtin_vsx_ld_elemrev_v16qi (signed long, const void *);
 LD_ELEMREV_V16QI vsx_ld_elemrev_v16qi {ldvec,endian}
 
 ; TODO: There is apparent intent in rs6000-builtin.def to have
-; RS6000_BTC_SPECIAL processing for LXSDX, LXVDSX, and STXSDX, but there are
+; SPECIAL processing for LXSDX, LXVDSX, and STXSDX, but there are
 ; no def_builtin calls for any of them.  At some point, we may want to add a
 ; set of built-ins for whichever vector types make sense for these.
 
   pure vsq __builtin_vsx_lxvd2x_v1ti (signed long, const void *);
 LXVD2X_V1TI vsx_load_v1ti {ldvec}
diff --git a/gcc/config/rs6000/rs6000.h b/gcc/config/rs6000/rs6000.h
index 523256a5c9d5..90a357ab7932 100644
--- a/gcc/config/rs6000/rs6000.h
+++ b/gcc/config/rs6000/rs6000.h
@@ -2247,58 +2247,10 @@ extern char rs6000_reg_names[][8];  /* register names (0 vs. %r0).  */
 /* #define  MACHINE_no_sched_speculative_load */
 
 /* General flags.  */
 extern int frame_pointer_needed;
 
-/* Classification of the builtin functions as to which switches enable the
-   builtin, and what attributes it should have.  We used to use the target
-   flags macros, but we've run out of bits, so we now map the options into new
-   settings used here.  */
-
-/* Builtin operand count.  */
-#define RS6000_BTC_UNARY   0x0001  /* normal unary function.  */
-#define RS6000_BTC_BINARY  0x0002  /* normal binary function.  */
-#define RS6000_BTC_TERNARY 0x0003  /* normal ternary function.  */
-#define RS6000_BTC_QUATERNARY  0x0004  /* normal quaternary
-  function. */
-#define RS6000_BTC_QUINARY 0x0005  /* normal quinary function.  */
-#define RS6000_BTC_SENARY  0x0006  /* normal senary function.  */
-#define RS6000_BTC_OPND_MASK   0x0007  /* Mask to isolate operands. */
-
-/* Builtin attributes.  */
-#define RS6000_BTC_SPECIAL 0x  /* Special function.  */
-#define RS6000_BTC_PREDICATE   0x0008  /* predicate function.  */
-#define RS6000_BTC_ABS 0x0010  /* Altivec/VSX ABS
-  function.  */
-#define RS6000_BTC_DST 0x0020  /* Altivec DST function.  */
-
-#define RS6000_BTC_TYPE_MASK   0x003f  /* Mask to isolate types */
-
-#define RS6000_BTC_MISC        0x  /* No special attributes.  */
-#define RS6000_BTC_CONST   0x0100  /* Neither uses, nor
-  modifies global state.  */
-#define RS6000_BTC_PURE        0x0200  /* reads global
-  state/mem and does
-  not modify global state.  */
-#define RS6000_BTC_FP  0x0400  /* depends on rounding mode.  */
-#define RS6000_BTC_QUAD        0x0800  /* Uses a register quad.  */
-#define RS6000_BTC_PAIR        0x1000  /* Uses a register pair.  */
-#define RS6000_BTC_QUADPAIR    0x1800  /* Uses a quad and a pair.  */
-#define RS6000_BTC_ATTR_MASK   0x1f00  /* Mask of the attributes.  */
-
-/* Miscellaneous information.  */
-#define RS6000_BTC_SPR 0x0100  /* function references SPRs.  */
-#define RS6000_BTC_VOID        0x0200  /* function has no return value.  */
-#define RS6000_BTC_CR  0x0400  /* function references a CR.  */

[x86 PATCH] Correct ix86_rtx_cost for multi-word multiplication.

2022-05-17 Thread Roger Sayle


This is the i386 backend specific piece of my revised patch for
PR middle-end/98865, where Richard Biener has suggested that I perform
the desired transformation during RTL expansion where the backend can
control whether it is profitable to convert a multiplication into a
bit-wise AND and a negation.  This works well for x86_64, but alas
exposes a latent bug with -m32, where a DImode multiplication incorrectly
appears to be cheaper than negdi2+anddi3(!?).  The fix to ix86_rtx_costs
is to report that a DImode (multi-word) multiplication actually requires
three SImode multiplications and two SImode additions.  This also corrects
the cost of TImode multiplication on TARGET_64BIT.
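
As a rough illustration of where the "three multiplications and two additions" count comes from, here is a minimal C sketch of a 64-bit multiply built from 32-bit pieces. The helper name `mul64_via_words` is hypothetical and is not part of the patch; it only mirrors the usual double-word expansion, assuming a 32-bit word.

```c
#include <stdint.h>

/* Hypothetical helper (not from the patch): low 64 bits of a * b,
   computed only from 32-bit halves, as a 32-bit target would.  */
uint64_t
mul64_via_words (uint64_t a, uint64_t b)
{
  uint32_t a_lo = (uint32_t) a, a_hi = (uint32_t) (a >> 32);
  uint32_t b_lo = (uint32_t) b, b_hi = (uint32_t) (b >> 32);

  /* Multiplication 1: widening 32x32->64 product of the low halves.  */
  uint64_t low = (uint64_t) a_lo * b_lo;

  /* Multiplications 2 and 3: cross products, only their low 32 bits
     can reach the 64-bit result.  */
  uint32_t cross1 = a_lo * b_hi;
  uint32_t cross2 = a_hi * b_lo;

  /* Additions 1 and 2: fold the cross products into the high word.  */
  uint32_t high = (uint32_t) (low >> 32) + cross1 + cross2;

  return ((uint64_t) high << 32) | (uint32_t) low;
}
```

The a_hi * b_hi term is absent because it only affects bits 64 and above, which a DImode result discards.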

This patch has been tested on x86_64-pc-linux-gnu with make bootstrap
and make -k check, both with and without --target_board=unix{-m32}, with
no new failures.  This change avoids the need for a !ia32 target selector
for the upcoming new test case gcc.target/i386/pr98865.c.
Ok for mainline?


2022-05-17  Roger Sayle  

gcc/ChangeLog
* config/i386/i386.cc (ix86_rtx_costs) [MULT]: When mode size
is wider than word_mode, a multiplication costs three word_mode
multiplications and two word_mode additions.

Thanks in advance,
Roger
--

diff --git a/gcc/config/i386/i386.cc b/gcc/config/i386/i386.cc
index 86752a6..e8a2229 100644
--- a/gcc/config/i386/i386.cc
+++ b/gcc/config/i386/i386.cc
@@ -20634,7 +20634,17 @@ ix86_rtx_costs (rtx x, machine_mode mode, int outer_code_i, int opno,
op0 = XEXP (op0, 0), mode = GET_MODE (op0);
}
 
- *total = (cost->mult_init[MODE_INDEX (mode)]
+ int mult_init;
+ // Double word multiplication requires 3 mults and 2 adds.
+ if (GET_MODE_SIZE (mode) > UNITS_PER_WORD)
+   {
+ mult_init = 3 * cost->mult_init[MODE_INDEX (word_mode)]
+ + 2 * cost->add;
+ nbits *= 3;
+   }
+ else mult_init = cost->mult_init[MODE_INDEX (mode)];
+
+ *total = (mult_init
+ nbits * cost->mult_bit
+ rtx_cost (op0, mode, outer_code, opno, speed)
+ rtx_cost (op1, mode, outer_code, opno, speed));

Re: [PATCH] c++: constexpr init of union sub-aggr w/ base [PR105491]

2022-05-17 Thread Patrick Palka via Gcc-patches

On Sat, May 7, 2022 at 5:18 PM Jason Merrill  wrote:
>
> On 5/6/22 16:46, Patrick Palka wrote:
> > On Fri, 6 May 2022, Jason Merrill wrote:
> >
> >> On 5/6/22 16:10, Patrick Palka wrote:
> >>> On Fri, 6 May 2022, Patrick Palka wrote:
> >>>
>  On Fri, 6 May 2022, Jason Merrill wrote:
> 
> > On 5/6/22 14:00, Patrick Palka wrote:
> >> On Fri, 6 May 2022, Patrick Palka wrote:
> >>
> >>> On Fri, 6 May 2022, Jason Merrill wrote:
> >>>
>  On 5/6/22 11:22, Patrick Palka wrote:
> > Here ever since r10-7313-gb599bf9d6d1e18,
> > reduced_constant_expression_p
> > in C++11/14 is rejecting the marked sub-aggregate initializer
> > (of type
> > S)
> >
> >   W w = {.D.2445={.s={.D.2387={.m=0}, .b=0}}}
> >  ^
> >
> > ultimately because said initializer has CONSTRUCTOR_NO_CLEARING
> > set,
> > and
> > so the function proceeds to verify that all fields of S are
> > initialized.
> > And before C++17 we don't expect to see base class fields (since
> > > next_initializable_field skips over them), so the base class
> > initializer
> > causes r_c_e_p to return false.
> 
>  That seems like the primary bug.  I guess r_c_e_p shouldn't be
>  using
>  next_initializable_field.  Really that function should only be
>  used for
>  aggregates.
> >>>
> >>> I see, I'll try replacing it in r_c_e_p.  Would that be in addition
> >>> to
> >>> or instead of the clear_no_implicit_zero approach?
> >>
> >> I'm testing the following, which uses a custom predicate instead of
> >> next_initializable_field in r_c_e_p.
> >
> > Let's make it a public predicate, not internal to r_c_e_p.  Maybe it
> > could be
> > next_subobject_field, and the current next_initializable_field change to
> > next_aggregate_field?
> 
>  Will do.
> 
> >
> >> Looks like the inner initializer {.D.2387={.m=0}, .b=0} is formed
> >> during
> >> the subobject constructor call:
> >>
> >> V::V (&((struct S *) this)->D.2120);
> >>
> >> after the evaluation of which, 'result' in cxx_eval_call_expression is
> >> NULL
> >> (presumably because it's a CALL_EXPR, not AGGR_INIT_EXPR?):
> >>
> >> /* This can be null for a subobject constructor call, in
> >>which case what we care about is the initialization
> >>side-effects rather than the value.  We could get at the
> >>value by evaluating *this, but we don't bother; there's
> >>no need to put such a call in the hash table.  */
> >> result = lval ? ctx->object : ctx->ctor;
> >>
> >> so we end up not calling clear_no_implicit_zero for the inner
> >> initializer
> >> directly.  We only call clear_no_implicit_zero after evaluating the
> >> AGGR_INIT_EXPR for outermost initializer (of type W).
> >
> > Maybe for constructors we could call it on ctx->ctor instead of result,
> > or
> > call r_c_e_p in C++20+?
> 
>  But both ctx->ctor and ->object are NULL during a subobject constructor
>  call (since we apparently clear these fields when entering a
>  STATEMENT_LIST):
> 
>  So I tried instead obtaining the constructor by evaluating new_obj via
> 
>  --- a/gcc/cp/constexpr.cc
>  +++ b/gcc/cp/constexpr.cc
>  @@ -2993,6 +2988,9 @@ cxx_eval_call_expression (const constexpr_ctx *ctx,
>  tree t,
>  in order to detect reading an unitialized object in constexpr
>  instead
>  of value-initializing it.  (reduced_constant_expression_p is
>  expected to
>  take care of clearing the flag.)  */
>  +  if (new_obj && DECL_CONSTRUCTOR_P (fun))
>  +result = cxx_eval_constant_expression (ctx, new_obj, /*lval=*/false,
>  +  non_constant_p, overflow_p);
>   if (TREE_CODE (result) == CONSTRUCTOR
>   && (cxx_dialect < cxx20
>  || !DECL_CONSTRUCTOR_P (fun)))
> 
>  but that seems to break e.g. g++.dg/cpp2a/constexpr-init12.C because
>  after the subobject constructor call
> 
>  S::S (&((struct W *) this)->s, NON_LVALUE_EXPR <8>);
> 
>  the constructor for the subobject a.s in new_obj is still completely
>  missing (I suppose because S::S doesn't initialize any of its members)
>  so trying to obtain it causes us to complain too soon from
>  cxx_eval_component_reference:
> 
>  constexpr-init12.C:16:24:   in ‘constexpr’ expansion of ‘W(42)’
>  constexpr-init12.C:10:22:   in ‘constexpr’ expansion of
>  ‘((W*)this)->W::s.S::S(8)’
>  constexpr-init12.C:16:24: error: accessing uninitialized member ‘W::s’
>   16 | constexpr auto a = W(42); // { dg-error "not a constant
>  express

Re: [PATCH] gdc 9, 10 and 11 bug fix

2022-05-17 Thread Marc Aurèle La France


On Tue, 17 May 2022, Iain Buclaw wrote:

Excerpts from Marc Aurèle La France's message of May 17, 2022 5:31 pm:

On Tue, 17 May 2022, Iain Buclaw wrote:

Excerpts from Marc Aurèle La France's message of May 16, 2022 11:34 pm:

On Sun, 15 May 2022, Iain Buclaw wrote:

Excerpts from Marc Aurèle La France's message of May 12, 2022 10:29 pm:



No compiler has any business rejecting files for the sole crime of
being symlinked to.  The following applies, modulo patch fuzz, to the
9, 10 and 11 series of compilers.



Given my use of shadow trees, this bug attempted to prevent me from
building 12.1.0.  The D-based gdc in 12.1.0 and up does not exhibit
this quirky behaviour.



Thanks, I've checked upstream and see the following change:



https://github.com/dlang/dmd/pull/11836/commits/ebda81e44fd0ca4b247a1860d9bef411c41c16cb



It should be fine to just backport that.



Thanks for the pointer.



I ended up with the three slightly different diffs below, one each for
the 9, 10 and 11 branches.  Each was rebuilt using 8.5.0, then used to
rebuild 12.1.0.  All of this ran smoothly without complaint, although I
wouldn't want to do this on a 486...



Signed-off-by: Marc Aurèle La France 



For GCC 9   --  8< --

diff -aNpRruz -X /etc/diff.excludes gcc-9.4.0/gcc/d/dmd/root/filename.c devel-9.4.0/gcc/d/dmd/root/filename.c
--- gcc-9.4.0/gcc/d/dmd/root/filename.c 2021-06-01 01:53:04.716474774 -0600
+++ devel-9.4.0/gcc/d/dmd/root/filename.c   2022-05-15 15:02:49.995441507 -0600
@@ -475,53 +475,7 @@ const char *FileName::safeSearchPath(Strings *path, const char *name)

 return FileName::searchPath(path, name, false);
 #elif POSIX
-/* Even with realpath(), we must check for // and disallow it
- */
-for (const char *p = name; *p; p++)
-{
-char c = *p;
-if (c == '/' && p[1] == '/')
-{
-return NULL;
-}
-}



I'd keep this check in, otherwise removing/replacing only the `if
(path)` branch looks OK to me.



The corresponding D code doesn't care about double slashes and neither
should this.  Also, the comment is misleading as realpath() would no
longer be used here.



True, but the D front-end does check the path in other places now:



https://github.com/dlang/dmd/blob/e9ba29d71b557fe079e95ee6554f116b24159bab/src/dmd/root/filename.d#L787-L803



https://github.com/dlang/dmd/blob/e9ba29d71b557fe079e95ee6554f116b24159bab/src/dmd/expressionsem.d#L5984-L6007



If we remove all checks, then there'd be nothing to prevent either
import("/file") or import("../file") from being used.


There is still no check for double slashes.  All I want here is to fix a 
C++ -> D migration bug without leaving any detritus behind.


Marc.

Re: [PATCH] gdc 9, 10 and 11 bug fix

2022-05-17 Thread Iain Buclaw via Gcc-patches

Excerpts from Marc Aurèle La France's message of May 17, 2022 5:31 pm:
> On Tue, 17 May 2022, Iain Buclaw wrote:
>> Excerpts from Marc Aurèle La France's message of May 16, 2022 11:34 pm:
>>> On Sun, 15 May 2022, Iain Buclaw wrote:
>>>> Excerpts from Marc Aurèle La France's message of May 12, 2022 10:29 pm:
> 
>>>>> No compiler has any business rejecting files for the sole crime of
>>>>> being symlinked to.  The following applies, modulo patch fuzz, to the
>>>>> 9, 10 and 11 series of compilers.
>>>>> 
>>>>> Given my use of shadow trees, this bug attempted to prevent me from
>>>>> building 12.1.0.  The D-based gdc in 12.1.0 and up does not exhibit
>>>>> this quirky behaviour.
> 
>>>> Thanks, I've checked upstream and see the following change:
> 
>>>> https://github.com/dlang/dmd/pull/11836/commits/ebda81e44fd0ca4b247a1860d9bef411c41c16cb
> 
>>>> It should be fine to just backport that.
> 
>>> Thanks for the pointer.
> 
>>> I ended up with the three slightly different diffs below, one each for
>>> the 9, 10 and 11 branches.  Each was rebuilt using 8.5.0, then used to
>>> rebuild 12.1.0.  All of this ran smoothly without complaint, although I
>>> wouldn't want to do this on a 486...
> 
>>> Signed-off-by: Marc Aurèle La France 
> 
>>> For GCC 9   --  8< --
>>>
>>> diff -aNpRruz -X /etc/diff.excludes gcc-9.4.0/gcc/d/dmd/root/filename.c devel-9.4.0/gcc/d/dmd/root/filename.c
>>> --- gcc-9.4.0/gcc/d/dmd/root/filename.c 2021-06-01 01:53:04.716474774 -0600
>>> +++ devel-9.4.0/gcc/d/dmd/root/filename.c   2022-05-15 15:02:49.995441507 -0600
>>> @@ -475,53 +475,7 @@ const char *FileName::safeSearchPath(Strings *path, const char *name)
>>>
>>>  return FileName::searchPath(path, name, false);
>>>  #elif POSIX
>>> -/* Even with realpath(), we must check for // and disallow it
>>> - */
>>> -for (const char *p = name; *p; p++)
>>> -{
>>> -char c = *p;
>>> -if (c == '/' && p[1] == '/')
>>> -{
>>> -return NULL;
>>> -}
>>> -}
> 
>> I'd keep this check in, otherwise removing/replacing only the `if
>> (path)` branch looks OK to me.
> 
> The corresponding D code doesn't care about double slashes and neither 
> should this.  Also, the comment is misleading as realpath() would no 
> longer be used here.
> 

True, but the D front-end does check the path in other places now:

https://github.com/dlang/dmd/blob/e9ba29d71b557fe079e95ee6554f116b24159bab/src/dmd/root/filename.d#L787-L803

https://github.com/dlang/dmd/blob/e9ba29d71b557fe079e95ee6554f116b24159bab/src/dmd/expressionsem.d#L5984-L6007

If we remove all checks, then there'd be nothing to prevent either
import("/file") or import("../file") from being used.
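
To make the concern concrete, here is a minimal C sketch of the kind of check being discussed: reject absolute names and any ".." component, while (as noted elsewhere in the thread) not treating a double slash as an error. The function `import_name_is_safe` is hypothetical and is not the dmd implementation linked above.

```c
#include <stdbool.h>
#include <stddef.h>
#include <string.h>

/* Hypothetical helper, not taken from dmd: returns false for names
   that are absolute or that contain a ".." path component.  */
bool
import_name_is_safe (const char *name)
{
  if (name[0] == '/')
    return false;               /* e.g. import("/file") */

  const char *p = name;
  for (;;)
    {
      const char *end = strchr (p, '/');
      size_t len = end ? (size_t) (end - p) : strlen (p);

      if (len == 2 && p[0] == '.' && p[1] == '.')
        return false;           /* e.g. import("../file") */

      if (!end)
        return true;            /* last component checked */
      p = end + 1;              /* an empty component ("//") is tolerated */
    }
}
```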

Iain.

[x86 PATCH take 2] Avoid andn and generate shorter not;and with -Oz.

2022-05-17 Thread Roger Sayle


This is a revised version of my i386 backend patch to avoid andn with -Oz,
when an explicit not;and (or not;test) would be (one byte) shorter.
https://gcc.gnu.org/pipermail/gcc-patches/2022-April/593168.html
This revision incorporates Michael Matz's feedback/suggestions with
explicit checks for LEGACY_INT_REG_P and REX_INT_REG_P.
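
The size argument can be sketched as a small C model. The byte counts below are my assumption for the register-register forms (a 5-byte VEX-encoded andn, 2-byte notl and andl, plus one REX prefix byte per instruction that touches %r8 or beyond); they are not stated in the patch, and the names are hypothetical.

```c
#include <stdbool.h>

/* Assumed length of a register-register andn: 3-byte VEX prefix,
   opcode byte and ModRM byte.  */
enum { ANDN_LEN = 5 };

/* Assumed length of the notl;andl pair.  Each instruction is 2 bytes,
   plus 1 for a REX prefix if any register it names is %r8 or beyond.  */
int
not_and_len (bool dest_needs_rex, bool src_needs_rex)
{
  int not_len = 2 + (dest_needs_rex ? 1 : 0);                     /* notl %dest */
  int and_len = 2 + ((dest_needs_rex || src_needs_rex) ? 1 : 0);  /* andl %src, %dest */
  return not_len + and_len;
}

/* The split only pays off when it is strictly shorter than andn.  */
bool
split_saves_bytes (bool dest_needs_rex, bool src_needs_rex)
{
  return not_and_len (dest_needs_rex, src_needs_rex) < ANDN_LEN;
}
```

Under these assumptions the pair wins only when neither register needs a REX prefix, which is what the LEGACY_INT_REG_P and !REX_INT_REG_P conditions in the splitters guard.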

This patch has been tested against gcc13 trunk on x86_64-pc-linux-gnu
with make bootstrap and make -k check, both with and without
--target_board=unix{-m32}, with no new failures.  Ok for mainline?

2022-05-17  Roger Sayle  

gcc/ChangeLog
* config/i386/i386.md (define_split): Split *andnsi_1 and
*andn_si_ccno after reload with -Oz.

gcc/testsuite/ChangeLog
* gcc.target/i386/bmi-andn-3.c: New test case.


Thanks in advance,
Roger
--

> -Original Message-
> From: Michael Matz 
> Sent: 13 April 2022 14:11
> To: Roger Sayle 
> Cc: gcc-patches@gcc.gnu.org
> Subject: Re: [x86 PATCH] Avoid andn and generate shorter not;and with -Oz.
> 
> Hello,
> 
> On Wed, 13 Apr 2022, Roger Sayle wrote:
> 
> > The x86 instruction encoding for SImode andn is longer than the
> > equivalent notl/andl sequence when the source for the not operand is
> > the same register as the destination.
> 
> _And_ when no REX prefixes are necessary for the notl,andn, which they are if
> the respective registers are %r8 or beyond.  As you seem to be fine with saving
> just a byte you ought to test that as well to not waste one again :-)
> 
> 
> Ciao,
> Michael.
diff --git a/gcc/config/i386/i386.md b/gcc/config/i386/i386.md
index f9c06ff..33473c6 100644
--- a/gcc/config/i386/i386.md
+++ b/gcc/config/i386/i386.md
@@ -10401,6 +10401,40 @@
   [(set_attr "type" "bitmanip")
(set_attr "btver2_decode" "direct, double")
(set_attr "mode" "")])
+
+;; Split *andnsi_1 after reload with -Oz when not;and is shorter.
+(define_split
+  [(set (match_operand:SI 0 "register_operand")
+   (and:SI (not:SI (match_operand:SI 1 "register_operand"))
+   (match_operand:SI 2 "nonimmediate_operand")))
+   (clobber (reg:CC FLAGS_REG))]
+  "reload_completed
+   && optimize_insn_for_size_p () && optimize_size > 1
+   && REGNO (operands[0]) == REGNO (operands[1])
+   && LEGACY_INT_REG_P (operands[0])
+   && !REX_INT_REG_P (operands[2])
+   && !reg_overlap_mentioned_p (operands[0], operands[2])"
+  [(set (match_dup 0) (not:SI (match_dup 1)))
+   (parallel [(set (match_dup 0) (and:SI (match_dup 0) (match_dup 2)))
+ (clobber (reg:CC FLAGS_REG))])])
+
+;; Split *andn_si_ccno with -Oz when not;test is shorter.
+(define_split
+  [(set (match_operand 0 "flags_reg_operand")
+   (match_operator 1 "compare_operator"
+ [(and:SI (not:SI (match_operand:SI 2 "general_reg_operand"))
+  (match_operand:SI 3 "nonimmediate_operand"))
+  (const_int 0)]))
+   (clobber (match_dup 2))]
+  "reload_completed
+   && optimize_insn_for_size_p () && optimize_size > 1
+   && LEGACY_INT_REG_P (operands[2])
+   && !REX_INT_REG_P (operands[3])
+   && !reg_overlap_mentioned_p (operands[2], operands[3])"
+  [(set (match_dup 2) (not:SI (match_dup 2)))
+   (set (match_dup 0) (match_op_dup 1
+[(and:SI (match_dup 3) (match_dup 2))
+(const_int 0)]))])
 
 ;; Logical inclusive and exclusive OR instructions
 
diff --git a/gcc/testsuite/gcc.target/i386/bmi-andn-3.c b/gcc/testsuite/gcc.target/i386/bmi-andn-3.c
new file mode 100644
index 000..16993a3
--- /dev/null
+++ b/gcc/testsuite/gcc.target/i386/bmi-andn-3.c
@@ -0,0 +1,15 @@
+/* { dg-do compile } */
+/* { dg-options "-Oz -mbmi" } */
+int m;
+
+int foo(int x, int y)
+{
+  return (x & ~y) != 0;
+}
+
+int bar(int x)
+{
+  return (~x & m) != 0;
+}
+/* { dg-final { scan-assembler-not "andn\[ \\t\]+" } } */
+

[PATCH] PR tree-optimization/105458 - Check for equivalence after merging relations.

2022-05-17 Thread Andrew MacLeod via Gcc-patches


Sorry, missed this one earlier.

When we register a relation, such as LE_EXPR, we first check if there
is an existing relation that applies, and if so they are combined.  We
were checking if the relation being registered was an EQ_EXPR, and if
so, invoked the equivalence oracle.


I was doing the check for EQ_EXPR first, then merging with any
existing relation.  In this case, the merge resulted in transforming
the LE_EXPR into an EQ_EXPR, but the check to invoke the
equivalence_oracle had already been done, and we got to a place we
shouldn't have. doh!


The fix is to do the merge first, then check for EQ_EXPR.
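
The ordering problem can be shown with a small self-contained C sketch. It uses a hypothetical bitmask encoding of relations in which intersection is a bitwise AND; this is an illustration only and not how value-relation.cc represents relation_kind.

```c
#include <stdbool.h>

/* Hypothetical encoding: one bit each for "less", "equal", "greater".  */
enum rel
{
  REL_LT = 1, REL_EQ = 2, REL_GT = 4,
  REL_LE = REL_LT | REL_EQ,
  REL_GE = REL_GT | REL_EQ,
  REL_VARYING = REL_LT | REL_EQ | REL_GT   /* nothing known */
};

static enum rel
rel_intersect (enum rel a, enum rel b)
{
  return (enum rel) (a & b);
}

/* Old order: test for equality, then merge.  Returns true if the
   relation was routed to the equivalence handling; *final receives
   the relation after merging.  */
bool
register_check_then_merge (enum rel existing, enum rel k, enum rel *final)
{
  bool as_equiv = (k == REL_EQ);
  if (existing != REL_VARYING)
    k = rel_intersect (existing, k);   /* may turn k into REL_EQ too late */
  *final = k;
  return as_equiv;
}

/* Fixed order: merge first, then test for equality.  */
bool
register_merge_then_check (enum rel existing, enum rel k, enum rel *final)
{
  if (existing != REL_VARYING)
    k = rel_intersect (existing, k);
  *final = k;
  return k == REL_EQ;
}
```

With an existing "greater or equal" relation and a newly registered "less or equal", both versions end up with equality after the merge, but only the second one notices it.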

The patch is a hair different (due to VREL_* renames in gcc13), so I've
attached both patches.


Bootstraps on gcc12 and gcc13 with no regressions.  Pushed on trunk.

OK for GCC12?

Andrew
commit 84c0e801f26275a4700c10a710a185c17f0418e5
Author: Andrew MacLeod 
Date:   Mon May 16 21:39:30 2022 -0400

Check for equivalence after merging relations.

When registering a relation, we need to merge with any existing relation
before checking if it was an equivalence... otherwise it was not being
handled properly.

gcc/
PR tree-optimization/105458
* value-relation.cc (path_oracle::register_relation): Merge, then check
for equivalence.

gcc/testsuite/
* gcc.dg/pr105458.c: New.

diff --git a/gcc/testsuite/gcc.dg/pr105458.c b/gcc/testsuite/gcc.dg/pr105458.c
new file mode 100644
index 000..eb58bf21f32
--- /dev/null
+++ b/gcc/testsuite/gcc.dg/pr105458.c
@@ -0,0 +1,20 @@
+/* PR tree-optimization/105458 */
+/* { dg-do compile } */
+/* { dg-options "-O1 -fexpensive-optimizations -fno-tree-dominator-opts " } */
+
+void
+yj (int j4)
+{
+  int t3;
+
+  for (t3 = 0; t3 < 6; ++t3)
+{
+  short int v4 = t3;
+
+  if (v4 == j4 || v4 > t3)
+for (;;)
+  {
+  }
+}
+}
+
diff --git a/gcc/value-relation.cc b/gcc/value-relation.cc
index a93565109f9..85d159f5d96 100644
--- a/gcc/value-relation.cc
+++ b/gcc/value-relation.cc
@@ -1384,16 +1384,16 @@ path_oracle::register_relation (basic_block bb, relation_kind k, tree ssa1,
   fprintf (dump_file, " (root: bb%d)\n", bb->index);
 }
 
+  relation_kind curr = query_relation (bb, ssa1, ssa2);
+  if (curr != VREL_VARYING)
+k = relation_intersect (curr, k);
+
   if (k == VREL_EQ)
 {
   register_equiv (bb, ssa1, ssa2);
   return;
 }
 
-  relation_kind curr = query_relation (bb, ssa1, ssa2);
-  if (curr != VREL_VARYING)
-k = relation_intersect (curr, k);
-
   bitmap_set_bit (m_relations.m_names, SSA_NAME_VERSION (ssa1));
   bitmap_set_bit (m_relations.m_names, SSA_NAME_VERSION (ssa2));
   relation_chain *ptr = (relation_chain *) obstack_alloc (&m_chain_obstack,
commit c48fe8d3430e81ddea621c24e9b66d55aadfb316
Author: Andrew MacLeod 
Date:   Tue May 17 09:36:39 2022 -0400

Check for equivalence after merging relations.

When registering a relation, we need to merge with any existing relation
before checking if it was an equivalence... otherwise it was not being
handled properly.

gcc/
PR tree-optimization/105458
* value-relation.cc (path_oracle::register_relation): Merge, then check
for equivalence.

gcc/testsuite/
* gcc.dg/pr105458.c: New.

diff --git a/gcc/testsuite/gcc.dg/pr105458.c b/gcc/testsuite/gcc.dg/pr105458.c
new file mode 100644
index 000..eb58bf21f32
--- /dev/null
+++ b/gcc/testsuite/gcc.dg/pr105458.c
@@ -0,0 +1,20 @@
+/* PR tree-optimization/105458 */
+/* { dg-do compile } */
+/* { dg-options "-O1 -fexpensive-optimizations -fno-tree-dominator-opts " } */
+
+void
+yj (int j4)
+{
+  int t3;
+
+  for (t3 = 0; t3 < 6; ++t3)
+{
+  short int v4 = t3;
+
+  if (v4 == j4 || v4 > t3)
+for (;;)
+  {
+  }
+}
+}
+
diff --git a/gcc/value-relation.cc b/gcc/value-relation.cc
index 077ab4230a7..a69ad080e47 100644
--- a/gcc/value-relation.cc
+++ b/gcc/value-relation.cc
@@ -1388,16 +1388,16 @@ path_oracle::register_relation (basic_block bb, relation_kind k, tree ssa1,
   fprintf (dump_file, " (root: bb%d)\n", bb->index);
 }
 
+  relation_kind curr = query_relation (bb, ssa1, ssa2);
+  if (curr != VREL_NONE)
+k = relation_intersect (curr, k);
+
   if (k == EQ_EXPR)
 {
   register_equiv (bb, ssa1, ssa2);
   return;
 }
 
-  relation_kind curr = query_relation (bb, ssa1, ssa2);
-  if (curr != VREL_NONE)
-k = relation_intersect (curr, k);
-
   bitmap_set_bit (m_relations.m_names, SSA_NAME_VERSION (ssa1));
   bitmap_set_bit (m_relations.m_names, SSA_NAME_VERSION (ssa2));
   relation_chain *ptr = (relation_chain *) obstack_alloc (&m_chain_obstack,

Re: [PATCH] i386: Remove constraints when used with constant integer predicates.

2022-05-17 Thread Uros Bizjak via Gcc-patches

I have reverted the patch to fix PR105624.

Uros.

On Sun, May 15, 2022 at 10:10 PM Uros Bizjak  wrote:
>
> const_int_operand and other const*_operand predicates do not need
> constraints when the constraint is inherited from the range of
> constant integer predicate.  Remove the constraint in case all
> alternatives use the same inherited constraint.
>
> 2022-05-15  Uroš Bizjak  
>
> gcc/ChangeLog:
>
> * config/i386/i386.md: Remove constraints when used with
> const_int_operand, const0_operand, const_1_operand, constm1_operand,
> const8_operand, const128_operand, const248_operand, const123_operand,
> const2367_operand, const1248_operand, const359_operand,
> const_4_or_8_to_11_operand, const48_operand, const_0_to_1_operand,
> const_0_to_3_operand, const_0_to_4_operand, const_0_to_5_operand,
> const_0_to_7_operand, const_0_to_15_operand, const_0_to_31_operand,
> const_0_to_63_operand, const_0_to_127_operand, const_0_to_255_operand,
> const_0_to_255_mul_8_operand, const_1_to_31_operand,
> const_1_to_63_operand, const_2_to_3_operand, const_4_to_5_operand,
> const_4_to_7_operand, const_6_to_7_operand, const_8_to_9_operand,
> const_8_to_11_operand, const_8_to_15_operand, const_10_to_11_operand,
> const_12_to_13_operand, const_12_to_15_operand, const_14_to_15_operand,
> const_16_to_19_operand, const_16_to_31_operand, const_20_to_23_operand,
> const_24_to_27_operand and const_28_to_31_operand.
> * config/i386/mmx.md: Ditto.
> * config/i386/sse.md: Ditto.
> * config/i386/subst.md: Ditto.
> * config/i386/sync.md: Ditto.
>
> Bootstrapped and regression tested on x86_64-linux-gnu {,-m32}.
>
> Pushed to master.
>
> Uros.

Re: [PATCH] gdc 9, 10 and 11 bug fix

2022-05-17 Thread Marc Aurèle La France


On Tue, 17 May 2022, Iain Buclaw wrote:

Excerpts from Marc Aurèle La France's message of May 16, 2022 11:34 pm:

On Sun, 15 May 2022, Iain Buclaw wrote:

Excerpts from Marc Aurèle La France's message of May 12, 2022 10:29 pm:



No compiler has any business rejecting files for the sole crime of
being symlinked to.  The following applies, modulo patch fuzz, to the
9, 10 and 11 series of compilers.



Given my use of shadow trees, this bug attempted to prevent me from
building 12.1.0.  The D-based gdc in 12.1.0 and up does not exhibit
this quirky behaviour.



Thanks, I've checked upstream and see the following change:



https://github.com/dlang/dmd/pull/11836/commits/ebda81e44fd0ca4b247a1860d9bef411c41c16cb



It should be fine to just backport that.



Thanks for the pointer.



I ended up with the three slightly different diffs below, one each for
the 9, 10 and 11 branches.  Each was rebuilt using 8.5.0, then used to
rebuild 12.1.0.  All of this ran smoothly without complaint, although I
wouldn't want to do this on a 486...



Signed-off-by: Marc Aurèle La France 



For GCC 9   --  8< --

diff -aNpRruz -X /etc/diff.excludes gcc-9.4.0/gcc/d/dmd/root/filename.c devel-9.4.0/gcc/d/dmd/root/filename.c
--- gcc-9.4.0/gcc/d/dmd/root/filename.c 2021-06-01 01:53:04.716474774 -0600
+++ devel-9.4.0/gcc/d/dmd/root/filename.c   2022-05-15 15:02:49.995441507 -0600
@@ -475,53 +475,7 @@ const char *FileName::safeSearchPath(Strings *path, const char *name)

 return FileName::searchPath(path, name, false);
 #elif POSIX
-/* Even with realpath(), we must check for // and disallow it
- */
-for (const char *p = name; *p; p++)
-{
-char c = *p;
-if (c == '/' && p[1] == '/')
-{
-return NULL;
-}
-}



I'd keep this check in, otherwise removing/replacing only the `if
(path)` branch looks OK to me.


The corresponding D code doesn't care about double slashes and neither 
should this.  Also, the comment is misleading as realpath() would no 
longer be used here.


Marc.
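
For readers following the disagreement, the check in question is small enough to restate on its own.  The following is a minimal C sketch of the loop that the patch removes, not the code from filename.c itself; the function name has_double_slash is hypothetical.

```c
/* Hypothetical helper restating the removed loop: return 1 if NAME
   contains two consecutive '/' characters, 0 otherwise.  */
static int
has_double_slash (const char *name)
{
  for (const char *p = name; *p; p++)
    /* p[1] is at worst the terminating NUL, so this never reads
       past the end of the string.  */
    if (p[0] == '/' && p[1] == '/')
      return 1;
  return 0;
}
```

In the removed code a match made safeSearchPath return NULL, which is the rejection Iain would keep and Marc would drop.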

[PATCH] Do not clear bb->aux in duplicate_loop_body_to_header_edge

2022-05-17 Thread Richard Biener via Gcc-patches

duplicate_loop_body_to_header_edge clears bb->aux which is not wanted
by a new use in loop unswitching.  The clearing was introduced with
r0-69110-g6580ee7781f903 and it seems accidentally so.

Bootstrapped on x86_64-unknown-linux-gnu, testing in progress.

2022-05-17  Richard Biener  

* cfgloopmanip.cc (duplicate_loop_body_to_header_edge): Do
not clear bb->aux of the copied blocks.
---
 gcc/cfgloopmanip.cc | 1 -
 1 file changed, 1 deletion(-)

diff --git a/gcc/cfgloopmanip.cc b/gcc/cfgloopmanip.cc
index b4357c03e86..7736e3ec709 100644
--- a/gcc/cfgloopmanip.cc
+++ b/gcc/cfgloopmanip.cc
@@ -1351,7 +1351,6 @@ duplicate_loop_body_to_header_edge (class loop *loop, edge e,
   unsigned j;
 
   bb = bbs[i];
-  bb->aux = 0;
 
   auto_vec dom_bbs = get_dominated_by (CDI_DOMINATORS, bb);
   FOR_EACH_VEC_ELT (dom_bbs, j, dominated)
-- 
2.35.3

[committed] libgomp: Clarify that omp_display_env is fully implemented

2022-05-17 Thread Jakub Jelinek via Gcc-patches

Hi!

OpenMP 5.2 added
"When called from within a target region the effect is unspecified."
restriction to omp_display_env, so it is ok not to support it in
target regions (worst case we could add an empty implementation
or one with __builtin_trap in there).

Committed to trunk and 12.2.

2022-05-17  Jakub Jelinek  

* libgomp.texi (OpenMP 5.1): Remove "Not inside target regions"
comment for omp_display_env feature.

--- libgomp/libgomp.texi.jj 2022-05-17 16:51:06.792778541 +0200
+++ libgomp/libgomp.texi 2022-05-17 16:56:05.117730443 +0200
@@ -321,8 +321,7 @@ The OpenMP 4.5 specification is fully su
   @code{omp_aligned_calloc} runtime routines @tab Y @tab
 @item @code{omp_alloctrait_key_t} enum: @code{omp_atv_serialized} added,
   @code{omp_atv_default} changed @tab Y @tab
-@item @code{omp_display_env} runtime routine @tab Y
-  @tab Not inside @code{target} regions
+@item @code{omp_display_env} runtime routine @tab Y @tab
 @item @code{ompt_scope_endpoint_t} enum: @code{ompt_scope_beginend} @tab N @tab
 @item @code{ompt_sync_region_t} enum additions @tab N @tab
 @item @code{ompt_state_t} enum: @code{ompt_state_wait_barrier_implementation}


Jakub

Re: [PATCH] Mitigate -Wmaybe-uninitialized in expmed.cc.

2022-05-17 Thread Martin Liška

On 5/16/22 12:32, Richard Biener wrote:
> It only seems to happen with your host compiler though?  The set of

Yes, happens with just released 12.1 as host compiler:

g++ -fcf-protection -fno-PIE -c   -g -O2 -DIN_GCC -fPIC -fno-exceptions 
-fno-rtti -fasynchronous-unwind-tables -W -Wall -Wno-narrowing -Wwrite-strings 
-Wcast-qual -Wmissing-format-attribute -Woverloaded-virtual -pedantic 
-Wno-long-long -Wno-variadic-macros -Wno-overlength-strings -fno-common  
-DHAVE_CONFIG_H -I. -I. -I/home/marxin/Programming/gcc/gcc 
-I/home/marxin/Programming/gcc/gcc/. 
-I/home/marxin/Programming/gcc/gcc/../include 
-I/home/marxin/Programming/gcc/gcc/../libcpp/include 
-I/home/marxin/Programming/gcc/gcc/../libcody  
-I/home/marxin/Programming/gcc/gcc/../libdecnumber 
-I/home/marxin/Programming/gcc/gcc/../libdecnumber/bid -I../libdecnumber 
-I/home/marxin/Programming/gcc/gcc/../libbacktrace   -o expmed.o -MT expmed.o 
-MMD -MP -MF ./.deps/expmed.TPo /home/marxin/Programming/gcc/gcc/expmed.cc
In file included from /home/marxin/Programming/gcc/gcc/coretypes.h:478,
 from /home/marxin/Programming/gcc/gcc/expmed.cc:26:
In function ‘poly_uint16 mode_to_bytes(machine_mode)’,
inlined from ‘typename if_nonpoly::type 
GET_MODE_SIZE(const T&) [with T = scalar_int_mode]’ at 
/home/marxin/Programming/gcc/gcc/machmode.h:647:24,
inlined from ‘rtx_def* emit_store_flag_1(rtx, rtx_code, rtx, rtx, 
machine_mode, int, int, machine_mode)’ at 
/home/marxin/Programming/gcc/gcc/expmed.cc:5728:56:
/home/marxin/Programming/gcc/gcc/machmode.h:550:49: warning: ‘*(unsigned 
int*)((char*)&int_mode + offsetof(scalar_int_mode, scalar_int_mode::m_mode))’ 
may be used uninitialized [-Wmaybe-uninitialized]
  550 |   ? mode_size_inline (mode) : mode_size[mode]);
  | ^~~~
/home/marxin/Programming/gcc/gcc/expmed.cc: In function ‘rtx_def* 
emit_store_flag_1(rtx, rtx_code, rtx, rtx, machine_mode, int, int, 
machine_mode)’:
/home/marxin/Programming/gcc/gcc/expmed.cc:5657:19: note: ‘*(unsigned 
int*)((char*)&int_mode + offsetof(scalar_int_mode, scalar_int_mode::m_mode))’ 
was declared here
 5657 |   scalar_int_mode int_mode;
  |   ^~~~

Cheers,
Martin

[PATCH v2 4/5] xtensa: Add setmemsi insn pattern

2022-05-17 Thread Takayuki 'January June' Suwa via Gcc-patches


This patch introduces two kinds of setmemsi insn pattern, an unrolled loop
and a small loop, for a small fixed length and a constant initialization value.

gcc/ChangeLog:

* gcc/config/xtensa/xtensa-protos.h
(xtensa_expand_block_set_unrolled_loop,
xtensa_expand_block_set_small_loop): New prototypes.
* gcc/config/xtensa/xtensa.c (xtensa_sizeof_MOVI,
xtensa_expand_block_set_unrolled_loop,
xtensa_expand_block_set_small_loop): New functions.
* gcc/config/xtensa/xtensa.md (setmemsi): New expansion pattern.
* gcc/config/xtensa/xtensa.opt (mlongcalls): Add target mask.
---
 gcc/config/xtensa/xtensa-protos.h |   2 +
 gcc/config/xtensa/xtensa.c| 208 ++
 gcc/config/xtensa/xtensa.md   |  16 +++
 gcc/config/xtensa/xtensa.opt  |   2 +-
 4 files changed, 227 insertions(+), 1 deletion(-)

diff --git a/gcc/config/xtensa/xtensa-protos.h b/gcc/config/xtensa/xtensa-protos.h
index 18d803581..80b1da2bb 100644
--- a/gcc/config/xtensa/xtensa-protos.h
+++ b/gcc/config/xtensa/xtensa-protos.h
@@ -41,6 +41,8 @@ extern void xtensa_expand_conditional_branch (rtx *, machine_mode);
 extern int xtensa_expand_conditional_move (rtx *, int);
 extern int xtensa_expand_scc (rtx *, machine_mode);
 extern int xtensa_expand_block_move (rtx *);
+extern int xtensa_expand_block_set_unrolled_loop (rtx *);
+extern int xtensa_expand_block_set_small_loop (rtx *);
 extern void xtensa_split_operand_pair (rtx *, machine_mode);
 extern int xtensa_emit_move_sequence (rtx *, machine_mode);
 extern rtx xtensa_copy_incoming_a7 (rtx);
diff --git a/gcc/config/xtensa/xtensa.c b/gcc/config/xtensa/xtensa.c
index d3405beb6..fb398d00c 100644
--- a/gcc/config/xtensa/xtensa.c
+++ b/gcc/config/xtensa/xtensa.c
@@ -1363,6 +1363,214 @@ xtensa_expand_block_move (rtx *operands)
 }


+/* Try to expand a block set operation to a sequence of RTL move
+   instructions.  If not optimizing, or if the block size is not a
+   constant, or if the block is too large, or if the value to
+   initialize the block with is not a constant, the expansion
+   fails and GCC falls back to calling memset().
+
+   operands[0] is the destination
+   operands[1] is the length
+   operands[2] is the initialization value
+   operands[3] is the alignment */
+
+static int
+xtensa_sizeof_MOVI (HOST_WIDE_INT imm)
+{
+  return (TARGET_DENSITY && IN_RANGE (imm, -32, 95)) ? 2 : 3;
+}
+
+int
+xtensa_expand_block_set_unrolled_loop (rtx *operands)
+{
+  rtx dst_mem = operands[0];
+  HOST_WIDE_INT bytes, value, align;
+  int expand_len, funccall_len;
+  rtx x, reg;
+  int offset;
+
+  if (!CONST_INT_P (operands[1]) || !CONST_INT_P (operands[2]))
+return 0;
+
+  bytes = INTVAL (operands[1]);
+  if (bytes <= 0)
+return 0;
+  value = (int8_t)INTVAL (operands[2]);
+  align = INTVAL (operands[3]);
+  if (align > MOVE_MAX)
+align = MOVE_MAX;
+
+  /* Insn expansion: holding the init value.
+ Either MOV(.N) or L32R w/litpool.  */
+  if (align == 1)
+expand_len = xtensa_sizeof_MOVI (value);
+  else if (value == 0 || value == -1)
+expand_len = TARGET_DENSITY ? 2 : 3;
+  else
+expand_len = 3 + 4;
+  /* Insn expansion: a series of aligned memory stores.
+ Consist of S8I, S16I or S32I(.N).  */
+  expand_len += (bytes / align) * (TARGET_DENSITY
+  && align == 4 ? 2 : 3);
+  /* Insn expansion: the remainder, sub-aligned memory stores.
+ A combination of S8I and S16I as needed.  */
+  expand_len += ((bytes % align + 1) / 2) * 3;
+
+  /* Function call: preparing two arguments.  */
+  funccall_len = xtensa_sizeof_MOVI (value);
+  funccall_len += xtensa_sizeof_MOVI (bytes);
+  /* Function call: calling memset().  */
+  funccall_len += TARGET_LONGCALLS ? (3 + 4 + 3) : 3;
+
+  /* Apply expansion bonus (2x) if optimizing for speed.  */
+  if (optimize > 1 && !optimize_size)
+funccall_len *= 2;
+
+  /* Decide whether to expand or not, based on the sum of the length
+ of instructions.  */
+  if (expand_len > funccall_len)
+return 0;
+
+  x = XEXP (dst_mem, 0);
+  if (!REG_P (x))
+dst_mem = replace_equiv_address (dst_mem, force_reg (Pmode, x));
+  switch (align)
+{
+case 1:
+  break;
+case 2:
+  value = (int16_t)((uint8_t)value * 0x0101U);
+  break;
+case 4:
+  value = (int32_t)((uint8_t)value * 0x01010101U);
+  break;
+default:
+  gcc_unreachable ();
+}
+  reg = force_reg (SImode, GEN_INT (value));
+
+  offset = 0;
+  do
+{
+  int unit_size = MIN (bytes, align);
+  machine_mode unit_mode = (unit_size >= 4 ? SImode :
+  (unit_size >= 2 ? HImode :
+QImode));
+  unit_size = GET_MODE_SIZE (unit_mode);
+
+  emit_move_insn (adjust_address (dst_mem, unit_mode, offset),
+ unit_mode == SImode ? reg
+ : convert_to_mode (unit_mode, reg, true));
+
+  offset += unit_size;
+
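
The size heuristic in the part of the patch shown above can be tried outside the compiler.  The following is a rough standalone C sketch of that arithmetic, assuming the byte counts used in the patch (2 or 3 bytes per MOVI, 3 + 4 for an L32R with its literal); the demo_* names and the density, longcalls and speed parameters are hypothetical stand-ins for TARGET_DENSITY, TARGET_LONGCALLS and the "optimize > 1 && !optimize_size" test.

```c
#include <stdint.h>

/* Size in bytes of a MOVI, or MOVI.N when the density option applies.  */
static int
demo_sizeof_movi (long imm, int density)
{
  return (density && imm >= -32 && imm <= 95) ? 2 : 3;
}

/* Estimated size of the inline expansion.  ALIGN must be 1, 2 or 4.  */
static int
demo_expand_len (long bytes, int value, int align, int density)
{
  int len;
  if (align == 1)
    len = demo_sizeof_movi (value, density);
  else if (value == 0 || value == -1)
    len = density ? 2 : 3;
  else
    len = 3 + 4;		/* L32R plus its literal pool entry.  */
  /* Aligned stores, then the sub-aligned remainder.  */
  len += (int) (bytes / align) * ((density && align == 4) ? 2 : 3);
  len += (int) ((bytes % align + 1) / 2) * 3;
  return len;
}

/* Estimated size of calling memset(), doubled when optimizing for speed.  */
static int
demo_funccall_len (long bytes, int value, int density, int longcalls,
		   int speed)
{
  int len = demo_sizeof_movi (value, density)
	    + demo_sizeof_movi (bytes, density);
  len += longcalls ? (3 + 4 + 3) : 3;
  return speed ? len * 2 : len;
}

/* Expand inline only if that is no larger than the (weighted) call.  */
static int
demo_should_expand (long bytes, int value, int align, int density,
		    int longcalls, int speed)
{
  return demo_expand_len (bytes, value, align, density)
	 <= demo_funccall_len (bytes, value, density, longcalls, speed);
}

/* Replicate the low byte of VALUE across a 1-, 2- or 4-byte store unit,
   returned here as an unsigned bit pattern.  */
static uint32_t
demo_replicate_byte (int value, int align)
{
  uint32_t b = (uint8_t) value;
  if (align == 4)
    return b * 0x01010101U;
  if (align == 2)
    return (b * 0x0101U) & 0xFFFFU;
  return b;
}
```

With density enabled and no long calls, zeroing 8 word-aligned bytes costs an estimated 6 bytes inline against 7 for the call, so it is expanded; 16 bytes costs 10 against 7 and is expanded only when the speed bonus doubles the call estimate to 14.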

Re: [PING] Advise to call 'internal_error' instead of 'abort' or 'fancy_abort'

2022-05-17 Thread Richard Biener via Gcc-patches

On Tue, May 17, 2022 at 12:15 PM Thomas Schwinge
 wrote:
>
> Hi!
>
> Ping.

OK.

>
> Grüße
>  Thomas
>
>
> On 2022-05-10T16:03:12+0200, I wrote:
> > Hi!
> >
> > On 2022-05-03T15:46:43+0200, Richard Biener wrote:
> >> On Tue, May 3, 2022 at 2:29 PM Thomas Schwinge wrote:
> >>> On 2022-05-03T12:53:50+0200, Richard Biener wrote:
> >>> > On Tue, May 3, 2022 at 10:16 AM Thomas Schwinge wrote:
> >>> >> On 2022-05-03T09:17:52+0200, Richard Biener wrote:
> >>> >> > On Mon, May 2, 2022 at 4:01 PM Thomas Schwinge wrote:
> >>> >> > +#if 0
> >>> >> >gcc_unreachable ();
> >>> >> > +#else
> >>> >> > +  /* ..., but due to bugs (PR100400), we may actually come here.
> >>> >> > +Reliably catch this, regardless of checking level.  */
> >>> >> > +  abort ();
> >>> >> > +#endif
> >>> >> >
> >>> >> > this doesn't look correct.  If you want a reliable diagnostic here 
> >>> >> > please [...]
> >>> >> > call internal_error () manually (the IL verifiers do this).
> >>> >>
> >>> >> Hmm, I feel I'm going in circles...  ;-)
> >
> >>> >> I first had this as 'internal_error', but then saw the following source
> >>> >> code comment, 'gcc/diagnostic.cc':
> >>> >>
> >>> >> /* An internal consistency check has failed.  We make no attempt to
> >>> >>continue.  Note that unless there is debugging value to be had from
> >>> >>a more specific message, or some other good reason, you should use
> >>> >>abort () instead of calling this function directly.  */
> >>> >> void
> >>> >> internal_error (const char *gmsgid, ...)
> >>> >> {
> >>> >>
> >>> >> Here, there's no "debugging value to be had from a more specific
> >>> >> message", and I couldn't think of "some other good reason", so decided to
> >>> >> "use abort () instead of calling this function directly".
> >>> >
> >>> > I think that is misguided.
> >>>
> >>> So that I know which one to fix/reconsider: does your "that" refer to the
> >>> 'gcc/diagnostic.cc:internal_error' source code comment cited above, or my
> >>> interpretation of it?
> >>
> >> The comment to "use abort ()".
> >
> > Does the attached
> > "Advise to call 'internal_error' instead of 'abort' or 'fancy_abort'"
> > capture what you had in mind?
> >
> > (This is, obviously, not updating any of the many 'abort' or even a few
> > 'fancy_abort' calls that we currently have.)
> >
> >
> > Grüße
> >  Thomas
>
>
> -
> Siemens Electronic Design Automation GmbH; Anschrift: Arnulfstraße 201, 80634 
> München; Gesellschaft mit beschränkter Haftung; Geschäftsführer: Thomas 
> Heurung, Frank Thürauf; Sitz der Gesellschaft: München; Registergericht 
> München, HRB 106955

[committed] openmp: Add support for inoutset depend-kind

2022-05-17 Thread Jakub Jelinek via Gcc-patches

Hi!

This patch adds support for inoutset depend-kind in depend
clauses.  It is very similar to the in depend-kind in that
a task with a dependency with that depend-kind is dependent
on all previously created sibling tasks with matching address
unless they have the same depend-kind.
In the in depend-kind case everything is dependent except
for in -> in dependency, for inoutset everything is
dependent except for inoutset -> inoutset dependency.
mutexinoutset is also similar (everything is dependent except
for mutexinoutset -> mutexinoutset dependency), but there is
also the additional restriction that only one task with
mutexinoutset for each address can be scheduled at once (i.e.
mutual exclusivity).  For now we support mutexinoutset
the same as inout/out, but the inoutset support is full.
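
The rule described above can be written down as a small predicate.  This is only a standalone C sketch of the description in this mail, not the code in task.c; the enum and the function name demo_must_wait are hypothetical, and mutexinoutset is modelled as described (not as currently implemented, where it is treated like inout/out).

```c
/* Hypothetical depend-kind enumeration for the sketch.  */
enum demo_dep_kind
{
  DEMO_DEP_IN,
  DEMO_DEP_OUT,
  DEMO_DEP_INOUT,
  DEMO_DEP_MUTEXINOUTSET,
  DEMO_DEP_INOUTSET
};

/* Return 1 if a new task with depend-kind NEXT has to wait for an
   earlier sibling task with depend-kind PREV on the same address.  */
static int
demo_must_wait (enum demo_dep_kind prev, enum demo_dep_kind next)
{
  switch (next)
    {
    case DEMO_DEP_IN:
    case DEMO_DEP_INOUTSET:
    case DEMO_DEP_MUTEXINOUTSET:
      /* Only the same-kind pairing is free of a dependency.  The extra
	 mutual exclusion of mutexinoutset is not modelled here.  */
      return prev != next;
    default:
      /* out and inout depend on everything that came before.  */
      return 1;
    }
}
```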

In order not to bump the ABI for dependencies each time
(we've bumped it already once, the old ABI supports only
inout/out and in depend-kind, the new ABI supports
inout/out, mutexinoutset, in and depobj), this patch arranges
for inoutset to be at least for the time being always handled
as if it was specified through depobj even when it is not.
So it uses the new ABI for that and inoutset are represented
like depobj - pointer to a pair of pointers where the first one
will be the actual address of the object mentioned in depend
clause and second pointer will be (void *) GOMP_DEPEND_INOUTSET.
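
To illustrate the "pair of pointers" representation, here is a minimal C sketch.  It only models the layout described above; the helper names are hypothetical, and DEMO_DEPEND_INOUTSET is a placeholder standing in for the real GOMP_DEPEND_INOUTSET from gomp-constants.h, whose numeric value is not assumed here to match.

```c
#include <stdint.h>

/* Placeholder for GOMP_DEPEND_INOUTSET; the value is illustrative only.  */
#define DEMO_DEPEND_INOUTSET 5

/* Fill PAIR the way the mail describes: first the address of the object
   named in the depend clause, then the depend-kind cast to a pointer.  */
static void
demo_encode_inoutset (void *pair[2], void *object)
{
  pair[0] = object;
  pair[1] = (void *) (uintptr_t) DEMO_DEPEND_INOUTSET;
}

/* Recognize such a pair again by looking at its second slot.  */
static int
demo_is_inoutset (void *const pair[2])
{
  return (uintptr_t) pair[1] == DEMO_DEPEND_INOUTSET;
}
```

The depend array then carries a pointer to such a pair instead of the object address, which is how a depobj is passed as well.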

Bootstrapped/regtested on x86_64-linux and i686-linux, committed
to trunk.

2022-05-17  Jakub Jelinek  

gcc/
* tree-core.h (enum omp_clause_depend_kind): Add
OMP_CLAUSE_DEPEND_INOUTSET.
* tree-pretty-print.cc (dump_omp_clause): Handle
OMP_CLAUSE_DEPEND_INOUTSET.
* gimplify.cc (gimplify_omp_depend): Likewise.
* omp-low.cc (lower_depend_clauses): Likewise.
gcc/c-family/
* c-omp.cc (c_finish_omp_depobj): Handle
OMP_CLAUSE_DEPEND_INOUTSET.
gcc/c/
* c-parser.cc (c_parser_omp_clause_depend): Parse
inoutset depend-kind.
(c_parser_omp_depobj): Likewise.
gcc/cp/
* parser.cc (cp_parser_omp_clause_depend): Parse
inoutset depend-kind.
(cp_parser_omp_depobj): Likewise.
* cxx-pretty-print.cc (cxx_pretty_printer::statement): Handle
OMP_CLAUSE_DEPEND_INOUTSET.
gcc/testsuite/
* c-c++-common/gomp/all-memory-1.c (boo): Add test with
inoutset depend-kind.
* c-c++-common/gomp/all-memory-2.c (boo): Likewise.
* c-c++-common/gomp/depobj-1.c (f1): Likewise.
(f2): Adjusted expected diagnostics.
* g++.dg/gomp/depobj-1.C (f4): Adjust expected diagnostics.
include/
* gomp-constants.h (GOMP_DEPEND_INOUTSET): Define.
libgomp/
* libgomp.h (struct gomp_task_depend_entry): Change is_in type
from bool to unsigned char.
* task.c (gomp_task_handle_depend): Handle GOMP_DEPEND_INOUTSET.
Ignore dependencies where
task->depend[i].is_in && task->depend[i].is_in == ent->is_in
rather than just task->depend[i].is_in && ent->is_in.  Remember
whether GOMP_DEPEND_IN loop is needed and guard the loop with that
conditional.
(gomp_task_maybe_wait_for_dependencies): Handle GOMP_DEPEND_INOUTSET.
Ignore dependencies where elem.is_in && elem.is_in == ent->is_in
rather than just elem.is_in && ent->is_in.
* testsuite/libgomp.c-c++-common/depend-1.c (test): Add task with
inoutset depend-kind.
* testsuite/libgomp.c-c++-common/depend-2.c (test): Likewise.
* testsuite/libgomp.c-c++-common/depend-3.c (test): Likewise.
* testsuite/libgomp.c-c++-common/depend-inoutset-1.c: New test.

--- gcc/tree-core.h.jj  2022-05-17 09:00:46.753995662 +0200
+++ gcc/tree-core.h 2022-05-17 11:19:00.901120286 +0200
@@ -1527,6 +1527,7 @@ enum omp_clause_depend_kind
   OMP_CLAUSE_DEPEND_OUT,
   OMP_CLAUSE_DEPEND_INOUT,
   OMP_CLAUSE_DEPEND_MUTEXINOUTSET,
+  OMP_CLAUSE_DEPEND_INOUTSET,
   OMP_CLAUSE_DEPEND_SOURCE,
   OMP_CLAUSE_DEPEND_SINK,
   OMP_CLAUSE_DEPEND_DEPOBJ,
--- gcc/tree-pretty-print.cc.jj 2022-05-17 09:00:46.794995110 +0200
+++ gcc/tree-pretty-print.cc 2022-05-17 11:19:00.902120273 +0200
@@ -804,6 +804,9 @@ dump_omp_clause (pretty_printer *pp, tre
case OMP_CLAUSE_DEPEND_MUTEXINOUTSET:
  name = "mutexinoutset";
  break;
+   case OMP_CLAUSE_DEPEND_INOUTSET:
+ name = "inoutset";
+ break;
case OMP_CLAUSE_DEPEND_SOURCE:
  pp_string (pp, "source)");
  return;
--- gcc/gimplify.cc.jj  2022-05-17 09:00:46.563998222 +0200
+++ gcc/gimplify.cc 2022-05-17 11:19:00.890120434 +0200
@@ -8270,9 +8270,9 @@ gimplify_omp_depend (tree *list_p, gimpl
 {
   tree c;
   gimple *g;
-  size_t n[4] = { 0, 0, 0, 0 };
-  bool unused[4];
-  tree counts[4] = { NULL_TREE, NULL_TREE, NULL_TREE, NULL_TREE };
+  size_t n[5] = { 0, 0, 0, 0, 0 };
+  bool unused[5];
+  tree counts[5] = { NULL_TREE, NULL_TREE, NULL_TREE, NULL_TREE, NULL_TREE };
   tree last_

Re: [PATCH] lto-plugin: add support for feature detection

2022-05-17 Thread Martin Liška

On 5/16/22 17:16, Alexander Monakov wrote:
> On Mon, 16 May 2022, Martin Liška wrote:
> 
>> I've implemented first version of the patch, please take a look.
> 
> I'll comment on the patch, feel free to inform me when I should back off
> with forcing my opinion in this thread :)

I do really welcome your suggestions Alexander ;)

> 
>> --- a/include/plugin-api.h
>> +++ b/include/plugin-api.h
>> @@ -483,6 +483,18 @@ enum ld_plugin_level
>>LDPL_FATAL
>>  };
>>  
>> +/* The linker's interface for API version negotiation.  */
>> +
>> +typedef
>> +int (*ld_plugin_get_api_version) (char *linker_identifier, int linker_version,
>> +  int preferred_linker_api,
>> +  const char **compiler_identifier,
>> +  int *compiler_version);
>> +
>> +typedef
>> +enum ld_plugin_status
>> +(*ld_plugin_register_get_api_version) (ld_plugin_get_api_version handler);
>> +
>>  /* Values for the tv_tag field of the transfer vector.  */
>>  
>>  enum ld_plugin_tag
>> @@ -521,6 +533,7 @@ enum ld_plugin_tag
>>LDPT_REGISTER_NEW_INPUT_HOOK,
>>LDPT_GET_WRAP_SYMBOLS,
>>LDPT_ADD_SYMBOLS_V2,
>> +  LDPT_REGISTER_GET_API_VERSION,
>>  };
>>  
>>  /* The plugin transfer vector.  */
>> @@ -556,6 +569,7 @@ struct ld_plugin_tv
>>  ld_plugin_get_input_section_size tv_get_input_section_size;
>>  ld_plugin_register_new_input tv_register_new_input;
>>  ld_plugin_get_wrap_symbols tv_get_wrap_symbols;
>> +ld_plugin_register_get_api_version tv_register_get_api_version;
>>} tv_u;
>>  };
> 
> Here I disagree with the overall design. Rui already pointed out how plugin
> API seems to consist of callbacks-that-register-callbacks, and I'm with him
> on that, let's not make that worse. On a more serious note, this pattern:
> 
> * the linker provides register_get_api_version entrypoint
> * the plugin registers its get_api_version implementation
> * the linker uses the provided function pointer
> 
> is problematic because the plugin doesn't know when the linker is going to
> invoke its callback (or maybe the linker won't do that at all).

Yes, it depends on which direction of communication we want to implement.
In my implementation the linker sends an API version request along with its
own version, the plugin then decides on a version, and that value is returned
to the linker (via its callback).

> 
> I'd recommend to reduce the level of indirection, remove the register_
> callback, and simply require that if LDPT_GET_API_VERSION is provided,
> the plugin MUST invoke it before returning from onload, i.e.:
> 
> * the linker invokes onload with LDPT_GET_API_VERSION in 'transfer vector'
> * the plugin iterates over the transfer vector and notes if LDPT_GET_API_VERSION
>   is seen
>   * if not, the plugin knows the linker predates its introduction
>   * if yes, the plugin invokes it before returning from onload
> * the linker now knows the plugin version (either one provided via
>   LDPT_GET_API_VERSION, or 'old' if the callback wasn't invoked).

All right, so it will be the linker that makes the decision...
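
The flow Alexander proposes could look roughly like the C sketch below.  It is an illustration under simplifying assumptions, not the plugin-api.h interface: the demo_* types are hypothetical, reduced stand-ins for the real transfer vector, and the callback signature (lowest and highest plugin version in, negotiated version out) is assumed for the example.

```c
/* Hypothetical, simplified stand-ins for the plugin API types.  */
enum demo_tag
{
  DEMO_LDPT_NULL = 0,		/* Terminates the transfer vector.  */
  DEMO_LDPT_MESSAGE,
  DEMO_LDPT_GET_API_VERSION
};

typedef int (*demo_get_api_version) (int plugin_min, int plugin_max);

struct demo_tv
{
  enum demo_tag tag;
  demo_get_api_version get_api_version;	/* Set for GET_API_VERSION only.  */
};

/* Example linker side: a pretend linker that supports only API 2 and
   returns -1 when the plugin's range does not include it.  */
static int
demo_linker_get_api_version (int plugin_min, int plugin_max)
{
  return (plugin_min <= 2 && 2 <= plugin_max) ? 2 : -1;
}

/* Plugin side, called from onload: scan the vector and call the entry
   before returning.  Returns 0 when the linker predates the entry.  */
static int
demo_onload (const struct demo_tv *tv, int plugin_min, int plugin_max)
{
  for (; tv->tag != DEMO_LDPT_NULL; ++tv)
    if (tv->tag == DEMO_LDPT_GET_API_VERSION)
      return tv->get_api_version (plugin_min, plugin_max);
  return 0;
}
```

Because the plugin makes the call itself during onload, neither side has to guess when, or whether, a registered callback will be invoked.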

> 
>> diff --git a/lto-plugin/lto-plugin.c b/lto-plugin/lto-plugin.c
>> index 00b760636dc..49484decd89 100644
>> --- a/lto-plugin/lto-plugin.c
>> +++ b/lto-plugin/lto-plugin.c
>> @@ -69,6 +69,7 @@ along with this program; see the file COPYING3.  If not see
>>  #include "../gcc/lto/common.h"
>>  #include "simple-object.h"
>>  #include "plugin-api.h"
>> +#include "ansidecl.h"
>>  
>>  /* We need to use I64 instead of ll width-specifier on native Windows.
>> The reason for this is that older MS-runtimes don't support the ll.  */
>> @@ -166,6 +167,10 @@ static ld_plugin_add_input_file add_input_file;
>>  static ld_plugin_add_input_library add_input_library;
>>  static ld_plugin_message message;
>>  static ld_plugin_add_symbols add_symbols, add_symbols_v2;
>> +static ld_plugin_register_get_api_version register_get_api_version;
>> +
>> +/* By default, use version 1 if there is not negotiation.  */
>> +static int used_api_version = 1;
>>  
>>  static struct plugin_file_info *claimed_files = NULL;
>>  static unsigned int num_claimed_files = 0;
>> @@ -1407,6 +1412,29 @@ process_option (const char *option)
>>verbose = verbose || debug;
>>  }
>>  
>> +static int
>> +get_api_version (char *linker_identifier, int linker_version,
>> + int preferred_linker_api,
> 
> The 'preferred' qualifier seems vague. If you go with my suggestion above,
> I'd suggest passing the lowest and highest supported version numbers, and the
> linker can check if that intersects with its range of supported versions, and
> error out if the intersection is empty (and otherwise return the highest version
> they both support as the 'negotiated' one).

... so the plug-in tells the linker its range of versions and the linker makes
the decision.
Got it, let me prepare v2 of the patch.
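
The range check suggested above amounts to intersecting two closed intervals.  A minimal C sketch, with a hypothetical function name and -1 chosen here as the "no common version" result:

```c
/* Return the highest API version supported by both sides, or -1 if the
   plugin range [plugin_min, plugin_max] and the linker range
   [linker_min, linker_max] do not intersect.  */
static int
demo_negotiate_api_version (int plugin_min, int plugin_max,
			    int linker_min, int linker_max)
{
  int lo = plugin_min > linker_min ? plugin_min : linker_min;
  int hi = plugin_max < linker_max ? plugin_max : linker_max;
  return lo <= hi ? hi : -1;
}
```

A plugin supporting versions 1 to 3 and a linker supporting 2 to 5 would settle on 3; a plugin stuck at 1 against a linker requiring at least 2 yields -1, which is where the linker would error out.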

> 
>> + const char **compiler_identifier,
>> + int *comp

[AArch64] Improve SVE dup intrinsics codegen

2022-05-17 Thread Andre Vieira (lists) via Gcc-patches


Hi,

This patch teaches the aarch64 backend to improve codegen when using dup 
with NEON vectors with repeating patterns. It will attempt to use a 
smaller NEON vector (or element) to limit the number of instructions 
needed to construct the input vector.


Bootstrapped and regression tested on aarch64-none-linux-gnu.

Is this OK for trunk?

gcc/ChangeLog:

    * config/aarch64/aarch64.cc (aarch64_simd_container_mode): Make it global.
    * config/aarch64/aarch64-protos.h (aarch64_simd_container_mode): Declare it.
    * config/aarch64/aarch64-sve.md (*vec_duplicate_reg): Rename this to ...
    (@aarch64_vec_duplicate_reg_): ... this.
    * config/aarch64/aarch64-sve-builtins-base.cc (svdup_lane_impl::expand):
    Improve codegen when inputs form a repeating pattern.


gcc/testsuite/ChangeLog:

    * gcc.target/aarch64/sve/dup_opt.c: New test.
diff --git a/gcc/config/aarch64/aarch64-protos.h b/gcc/config/aarch64/aarch64-protos.h
index 2ac781dff4a93cbe0f0b091147b2521ed1a88750..cfc31b467cf1d3cd79b2dfe6a54e6910dd43b5d8 100644
--- a/gcc/config/aarch64/aarch64-protos.h
+++ b/gcc/config/aarch64/aarch64-protos.h
@@ -771,6 +771,7 @@ int aarch64_branch_cost (bool, bool);
 enum aarch64_symbol_type aarch64_classify_symbolic_expression (rtx);
 bool aarch64_advsimd_struct_mode_p (machine_mode mode);
 opt_machine_mode aarch64_vq_mode (scalar_mode);
+machine_mode aarch64_simd_container_mode (scalar_mode, poly_int64);
 opt_machine_mode aarch64_full_sve_mode (scalar_mode);
 bool aarch64_can_const_movi_rtx_p (rtx x, machine_mode mode);
 bool aarch64_const_vec_all_same_int_p (rtx, HOST_WIDE_INT);
diff --git a/gcc/config/aarch64/aarch64-sve-builtins-base.cc b/gcc/config/aarch64/aarch64-sve-builtins-base.cc
index c24c05487246f529f81867d6429e636fd6dc74d0..f8b755a83dc37578363270618323f87c95fa327f 100644
--- a/gcc/config/aarch64/aarch64-sve-builtins-base.cc
+++ b/gcc/config/aarch64/aarch64-sve-builtins-base.cc
@@ -875,13 +875,98 @@ public:
argument N to go into architectural lane N, whereas Advanced SIMD
vectors are loaded memory lsb to register lsb.  We therefore need
to reverse the elements for big-endian targets.  */
-rtx vq_reg = gen_reg_rtx (vq_mode);
 rtvec vec = rtvec_alloc (elements_per_vq);
 for (unsigned int i = 0; i < elements_per_vq; ++i)
   {
unsigned int argno = BYTES_BIG_ENDIAN ? elements_per_vq - i - 1 : i;
RTVEC_ELT (vec, i) = e.args[argno];
   }
+
+/* Look for a repeating pattern in the 128-bit input as that potentially
+   simplifies constructing the input vector.
+   For example, codegen for svdupq_n_s32 (a, b, a, b), could be simplified
+   from:
+   dup v0.4s, w0
+   fmov s1, w1
+   ins v0.s[1], v1.s[0]
+   ins v0.s[3], v1.s[0]
+   dup z0.q, z0.q[0]
+   to:
+   fmov d0, x0
+   ins v0.s[1], w1
+   mov z0.d, d0
+   where we can see it uses a [a, b] input vector reducing the number of
+   needed instructions.  */
+if  (elements_per_vq > 1 && mode == e.vector_mode(0))
+  {
+   unsigned int new_elements_n = elements_per_vq;
+   bool group = true;
+   while (group && new_elements_n > 1)
+ {
+   for (unsigned int i = 0; i < new_elements_n / 2; ++i)
+ {
+   if (rtx_equal_p (RTVEC_ELT (vec, i),
+RTVEC_ELT (vec, new_elements_n / 2 + i)) == 0)
+ {
+   group = false;
+   break;
+ }
+ }
+   if (group)
+ new_elements_n /= 2;
+ }
+   /* We have found a repeating pattern smaller than 128-bits, so use that
+  instead.  */
+   if (new_elements_n < elements_per_vq)
+ {
+   unsigned int input_size = 128 / elements_per_vq * new_elements_n;
+   scalar_mode new_mode
+ = int_mode_for_size (input_size, 0).require ();
+   rtx input;
+   if (new_elements_n > 1)
+ {
+   if (input_size < 64)
+ {
+   /* TODO: Remove this when support for 32- and 16-bit vectors
+  is added.
+  */
+   new_elements_n *= 64/input_size;
+   input_size = 64;
+   new_mode = int_mode_for_size (input_size, 0).require ();
+ }
+   input = gen_reg_rtx (new_mode);
+   rtvec new_vec = rtvec_alloc (new_elements_n);
+   for (unsigned int i = 0; i < new_elements_n; ++i)
+ RTVEC_ELT (new_vec, i) = RTVEC_ELT (vec, i);
+
+   machine_mode merge_mode
+ = aarch64_simd_container_mode (element_mode, input_size);
+
+   rtx merge_subreg = simplify_gen_subreg (merge_mode, input,
+   new_mode, 0);
+   aarch64_expand_vector_init (merge_subreg,
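
The pattern search in the hunk above (halve the element count for as long as the two halves match) can be exercised on plain integers.  This is a standalone C sketch of that loop only, assuming a power-of-two element count as in the 128-bit case; the function name is hypothetical and int comparison stands in for rtx_equal_p.

```c
/* Return the length of the shortest prefix of ELTS that, repeated,
   reproduces all N elements, found by repeatedly comparing the two
   halves.  N is assumed to be a power of two.  */
static unsigned int
demo_repeating_prefix_len (const int *elts, unsigned int n)
{
  while (n > 1)
    {
      unsigned int half = n / 2;
      for (unsigned int i = 0; i < half; ++i)
	if (elts[i] != elts[half + i])
	  return n;		/* Halves differ: cannot shrink further.  */
      n = half;
    }
  return n;
}
```

For the (a, b, a, b) case from the comment this gives 2, and with four 32-bit elements the patch's formula 128 / elements_per_vq * new_elements_n then yields a 64-bit input.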

Re: [Patch] gcn/t-omp-device: Add 'amdgcn' as 'arch' [PR105602]

2022-05-17 Thread Jakub Jelinek via Gcc-patches

On Tue, May 17, 2022 at 02:45:09PM +0200, Tobias Burnus wrote:
> Hi Jakub, hi Andrew,
> 
> On 17.05.22 10:01, Jakub Jelinek wrote:
> > But the above patch only implements it partially.
> > What is in omp-device-properties-* is for the sake of the host compiler,
> > [...]
> > You need to also change gcc/config/gcn/gcn.cc (gcn_omp_device_kind_arch_isa)
> > case omp_device_arch: handling so that it accepts both "gcn" and "amdgcn"
> > equally.
> 
> Done with the attached patch, which I intend to commit after the lunch break,
> unless there are further comments.
> 
> * * *
> 
> Additionally, I am considering documenting the permitted values → second patch.
> The idea is to later add more device-specific information, separately for gcn
> and nvptx, such as simd/teams/threads handling or similar information (additional
> envvar, where it makes sense etc.), which is currently spread over several
> slides, wiki pages, word-of-mouth information etc.
> 
> Thoughts regarding the second patch?
> 
> Tobias

> gcn/t-omp-device: Add 'amdgcn' as 'arch' [PR105602]
> 
> Improve cross-compiler handling.
> 
> gcc/ChangeLog:
> 
>   PR target/105602
>   * config/gcn/t-omp-device (arch): Add 'amdgcn' besides existing 'gcn'.
>   * config/gcn/gcn.cc (gcn_omp_device_kind_arch_isa): Likewise.
> 
>  gcc/config/gcn/gcn.cc   | 2 +-
>  gcc/config/gcn/t-omp-device | 2 +-
>  2 files changed, 2 insertions(+), 2 deletions(-)
> 
> diff --git a/gcc/config/gcn/gcn.cc b/gcc/config/gcn/gcn.cc
> index e2e9335ad75..92896ab972f 100644
> --- a/gcc/config/gcn/gcn.cc
> +++ b/gcc/config/gcn/gcn.cc
> @@ -2632,7 +2632,7 @@ gcn_omp_device_kind_arch_isa (enum omp_device_kind_arch_isa trait,
>  case omp_device_kind:
>return strcmp (name, "gpu") == 0;
>  case omp_device_arch:
> -  return strcmp (name, "gcn") == 0;
> +  return (strcmp (name, "amdgcn") == 0 || strcmp (name, "gcn") == 0);

The ()s around it aren't needed and don't make it more readable.

Otherwise LGTM.
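
For illustration, the same test as a standalone C function without the outer parentheses; demo_arch_matches is a hypothetical name, not part of gcn.cc.

```c
#include <string.h>

/* Accept both spellings of the architecture name.  */
static int
demo_arch_matches (const char *name)
{
  return strcmp (name, "amdgcn") == 0 || strcmp (name, "gcn") == 0;
}
```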

> +@headitem @code{arch} @tab @code{kind} @tab @code{isa}
> +@item @code{intel_mic}, @code{86}, @code{86_64}, @code{386}, @code{486},
> +  @code{586}, @code{686}, @code{a32}

You've lost the first letter of most of the above arches.
x86, x86_64, i386, i486, i586, i686, ia32

Otherwise LGTM.

Jakub

Re: [Patch] gcn/t-omp-device: Add 'amdgcn' as 'arch' [PR105602]

2022-05-17 Thread Tobias Burnus


Hi Jakub, hi Andrew,

On 17.05.22 10:01, Jakub Jelinek wrote:

But the above patch only implements it partially.
What is in omp-device-properties-* is for the sake of the host compiler,
[...]
You need to also change gcc/config/gcn/gcn.cc (gcn_omp_device_kind_arch_isa)
case omp_device_arch: handling so that it accepts both "gcn" and "amdgcn"
equally.


Done with the attached patch, which I intend to commit after the lunch break,
unless there are further comments.

* * *

Additionally, I am considering documenting the permitted values → second patch.
The idea is to later add more device-specific information, separately for gcn
and nvptx, such as simd/teams/threads handling or similar information (additional
envvar, where it makes sense etc.), which is currently spread over several
slides, wiki pages, word-of-mouth information etc.

Thoughts regarding the second patch?

Tobias
gcn/t-omp-device: Add 'amdgcn' as 'arch' [PR105602]

Improve cross-compiler handling.

gcc/ChangeLog:

	PR target/105602
	* config/gcn/t-omp-device (arch): Add 'amdgcn' besides existing 'gcn'.
	* config/gcn/gcn.cc (gcn_omp_device_kind_arch_isa): Likewise.

 gcc/config/gcn/gcn.cc   | 2 +-
 gcc/config/gcn/t-omp-device | 2 +-
 2 files changed, 2 insertions(+), 2 deletions(-)

diff --git a/gcc/config/gcn/gcn.cc b/gcc/config/gcn/gcn.cc
index e2e9335ad75..92896ab972f 100644
--- a/gcc/config/gcn/gcn.cc
+++ b/gcc/config/gcn/gcn.cc
@@ -2632,7 +2632,7 @@ gcn_omp_device_kind_arch_isa (enum omp_device_kind_arch_isa trait,
 case omp_device_kind:
   return strcmp (name, "gpu") == 0;
 case omp_device_arch:
-  return strcmp (name, "gcn") == 0;
+  return (strcmp (name, "amdgcn") == 0 || strcmp (name, "gcn") == 0);
 case omp_device_isa:
   if (strcmp (name, "fiji") == 0)
 	return gcn_arch == PROCESSOR_FIJI;
diff --git a/gcc/config/gcn/t-omp-device b/gcc/config/gcn/t-omp-device
index cd56e2f8a68..e1d9e0d2a1e 100644
--- a/gcc/config/gcn/t-omp-device
+++ b/gcc/config/gcn/t-omp-device
@@ -1,4 +1,4 @@
 omp-device-properties-gcn: $(srcdir)/config/gcn/gcn.cc
 	echo kind: gpu > $@
-	echo arch: gcn >> $@
+	echo arch: amdgcn gcn >> $@
 	echo isa: fiji gfx900 gfx906 gfx908 >> $@
libgomp.texi: Document OpenMP context selectors

libgomp/
	libgomp.texi (Offload-Target Specifics): New chapter; add section
	to document OpenMP context selectors.

diff --git a/libgomp/libgomp.texi b/libgomp/libgomp.texi
index 8a7512fb959..78a8b881a40 100644
--- a/libgomp/libgomp.texi
+++ b/libgomp/libgomp.texi
@@ -113,6 +113,7 @@ changed to GNU Offloading and Multi Processing Runtime Library.
 * OpenACC Library Interoperability:: OpenACC library interoperability with the
NVIDIA CUBLAS library.
 * OpenACC Profiling Interface::
+* Offload-Target Specifics::   Notes on offload-target specific internals
 * The libgomp ABI::Notes on the external ABI presented by libgomp.
 * Reporting Bugs:: How to report bugs in the GNU Offloading and
Multi Processing Runtime Library.
@@ -4255,6 +4256,37 @@ offloading devices (it's not clear if they should be):
 @item @code{acc_memcpy_from_device}, @code{acc_memcpy_from_device_async}
 @end itemize
 
+@c -
+@c Offload-Target Specifics
+@c -
+
+@node Offload-Target Specifics
+@chapter Offload-Target Specifics
+
+The following sections present notes on the offload-target specifics.
+
+@menu
+* OpenMP Context Selectors::
+@end menu
+
+@node OpenMP Context Selectors
+@section OpenMP Context Selectors
+
+@code{vendor} is always @code{gnu}. References are to the GCC manual.
+
+@multitable @columnfractions .60 .10 .25
+@headitem @code{arch} @tab @code{kind} @tab @code{isa}
+@item @code{intel_mic}, @code{86}, @code{86_64}, @code{386}, @code{486},
+  @code{586}, @code{686}, @code{a32}
+  @tab @code{host}
+  @tab See @code{-m...} flags in ``x86 Options'' (without @code{-m})
+@item @code{amdgcn} @code{gcn}
+  @tab @code{gpu}
+  @tab See @code{-march=} in ``AMD GCN Options''
+@item @code{nvptx}
+  @tab @code{gpu}
+  @tab See @code{-misa=} in ``Nvidia PTX Options''
+@end multitable
 
 
 @c -
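
For orientation, the `arch` names in the table above are what a user writes in an OpenMP context selector. The following is a minimal sketch, assuming a compiler that accepts `amdgcn` as an `arch` trait (which is what the t-omp-device change is meant to allow); `compute` and `compute_gcn` are hypothetical names used only for illustration.

```c
/* Hypothetical functions for illustration; not part of the patch.  */
int compute_gcn (void);

/* Ask the compiler to substitute compute_gcn for compute when the code
   is compiled for a device whose arch trait is 'amdgcn'.  */
#pragma omp declare variant (compute_gcn) match (device = {arch (amdgcn)})
int compute (void);

int
compute_gcn (void)
{
  return 1;  /* Variant intended for AMD GCN offload code.  */
}

int
compute (void)
{
  return 0;  /* Base version, used when the selector does not match.  */
}
```

Built without `-fopenmp`, the pragma is ignored and `compute` is always the base version; on a host-only build with `-fopenmp` the selector is not expected to match either.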

Re: [PATCH] gdc 9, 10 and 11 bug fix

2022-05-17 Thread Iain Buclaw via Gcc-patches

Excerpts from Marc Aurèle La France's message of May 16, 2022 11:34 pm:
> On Sun, 15 May 2022, Iain Buclaw wrote:
>> Excerpts from Marc Aurèle La France's message of May 12, 2022 10:29 pm:
> 
>>> No compiler has any business rejecting files for the sole crime of
>>> being symlinked to.  The following applies, modulo patch fuzz, to the
>>> 9, 10 and 11 series of compilers.
> 
>>> Given my use of shadow trees, this bug attempted to prevent me from
>>> building 12.1.0.  The D-based gdc in 12.1.0 and up does not exhibit
>>> this quirky behaviour.
> 
>> Thanks, I've checked upstream and see the following change:
> 
>> https://github.com/dlang/dmd/pull/11836/commits/ebda81e44fd0ca4b247a1860d9bef411c41c16cb
> 
>> It should be fine to just backport that.
> 
> Thanks for the pointer.
> 
> I ended up with the three slightly different diffs below, one each for
> the 9, 10 and 11 branches.  Each was rebuilt using 8.5.0, then used to
> rebuild 12.1.0.  All of this ran smoothly without complaint, although I
> wouldn't want to do this on a 486...
> 
> Thanks again and have a great day.
> 
> Marc.
> 
> Signed-off-by: Marc Aurèle La France 
> 
> For GCC 9   --  8< --
> 
> diff -aNpRruz -X /etc/diff.excludes gcc-9.4.0/gcc/d/dmd/root/filename.c 
> devel-9.4.0/gcc/d/dmd/root/filename.c
> --- gcc-9.4.0/gcc/d/dmd/root/filename.c   2021-06-01 01:53:04.716474774 
> -0600
> +++ devel-9.4.0/gcc/d/dmd/root/filename.c 2022-05-15 15:02:49.995441507 
> -0600
> @@ -475,53 +475,7 @@ const char *FileName::safeSearchPath(Strings *path, 
> const char *name)
> 
>  return FileName::searchPath(path, name, false);
>  #elif POSIX
> -/* Even with realpath(), we must check for // and disallow it
> - */
> -for (const char *p = name; *p; p++)
> -{
> -char c = *p;
> -if (c == '/' && p[1] == '/')
> -{
> -return NULL;
> -}
> -}

I'd keep this check in, otherwise removing/replacing only the `if
(path)` branch looks OK to me.

Iain.
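
The check that the reply asks to keep can be read in isolation. This is a standalone sketch of the same loop, assuming nothing beyond the quoted hunk; `has_double_slash` is a hypothetical name and not a function in the D front end.

```c
#include <stdbool.h>

/* Hypothetical helper mirroring the loop in the quoted hunk: returns
   true if NAME contains two consecutive '/' characters.  */
static inline bool
has_double_slash (const char *name)
{
  for (const char *p = name; *p; p++)
    if (p[0] == '/' && p[1] == '/')
      return true;
  return false;
}
```

In the quoted code a match makes `safeSearchPath` return NULL, so such names are rejected before any path lookup happens.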

[committed] libstdc++: Skip tests that fail for the versioned namespace

2022-05-17 Thread Jonathan Wakely via Gcc-patches

Tested x86_64-linux, pushed to trunk.

-- >8 --

Most tests for the contents of header synopses need to be suppressed for
the versioned namespace build, because redeclaring the entities in std
fails when they were originally declared in std::__8.

I added these tests recently without the suppression, so they fail.

libstdc++-v3/ChangeLog:

* testsuite/20_util/expected/synopsis.cc: Skip for versioned
namespace.
* testsuite/27_io/headers/iosfwd/synopsis.cc: Likewise.
---
 libstdc++-v3/testsuite/20_util/expected/synopsis.cc | 1 +
 libstdc++-v3/testsuite/27_io/headers/iosfwd/synopsis.cc | 1 +
 2 files changed, 2 insertions(+)

diff --git a/libstdc++-v3/testsuite/20_util/expected/synopsis.cc 
b/libstdc++-v3/testsuite/20_util/expected/synopsis.cc
index 304bae93ebd..3a7eef3eee4 100644
--- a/libstdc++-v3/testsuite/20_util/expected/synopsis.cc
+++ b/libstdc++-v3/testsuite/20_util/expected/synopsis.cc
@@ -1,5 +1,6 @@
 // { dg-options "-std=gnu++23" }
 // { dg-do compile { target c++23 } }
+// { dg-require-normal-namespace "" }
 
 #include 
 
diff --git a/libstdc++-v3/testsuite/27_io/headers/iosfwd/synopsis.cc 
b/libstdc++-v3/testsuite/27_io/headers/iosfwd/synopsis.cc
index 467d63609bd..b6d3fa7a719 100644
--- a/libstdc++-v3/testsuite/27_io/headers/iosfwd/synopsis.cc
+++ b/libstdc++-v3/testsuite/27_io/headers/iosfwd/synopsis.cc
@@ -1,4 +1,5 @@
 // { dg-do compile }
+// { dg-require-normal-namespace "" }
 
 #include 
 
-- 
2.34.3

[committed] libstdc++: Stop defining C++0x compat symbols for versioned namespace

2022-05-17 Thread Jonathan Wakely via Gcc-patches

Tested x86_64-linux, --enable-symvers=gnu and
--enable-symvers=gnu-versioned-namespace.

Pushed to trunk.

-- >8 --

The src/c++11/compatibility*-c++0x.cc files define symbols that need to
be exported for ancient versions of libstdc++.so.6 due to changes
between C++0x and the final C++11 standard. Those symbols are not needed
in the libstdc++.so.8 library, and we can skip building them entirely.

This also fixes the build failure I introduced last week when making the
versioned namespace config not use the _V2 namespace for compat symbols.

libstdc++-v3/ChangeLog:

* src/Makefile.am [ENABLE_SYMVERS_GNU_NAMESPACE] (cxx11_sources):
Do not build the compatibility*-c++0x.cc objects.
* src/Makefile.in: Regenerate.
* src/c++11/compatibility-c++0x.cc [_GLIBCXX_INLINE_VERSION]:
Refuse to build for the versioned namespace.
* src/c++11/compatibility-chrono.cc: Likewise.
* src/c++11/compatibility-condvar.cc: Likewise.
* src/c++11/compatibility-thread-c++0x.cc: Likewise.
* src/c++11/chrono.cc (system_clock, steady_clock):
Use macros to define in inline namespace _V2, matching the
declarations in .
* src/c++11/system_error.cc (system_category, generic_category):
Likewise.
---
 libstdc++-v3/src/Makefile.am  | 16 +++---
 libstdc++-v3/src/Makefile.in  | 31 ---
 libstdc++-v3/src/c++11/chrono.cc  |  5 ++-
 libstdc++-v3/src/c++11/compatibility-c++0x.cc |  4 +++
 .../src/c++11/compatibility-chrono.cc |  4 +++
 .../src/c++11/compatibility-condvar.cc|  4 +++
 .../src/c++11/compatibility-thread-c++0x.cc   |  4 +++
 libstdc++-v3/src/c++11/system_error.cc| 10 --
 8 files changed, 55 insertions(+), 23 deletions(-)

diff --git a/libstdc++-v3/src/Makefile.am b/libstdc++-v3/src/Makefile.am
index 9c3f4aca655..b83c222d51d 100644
--- a/libstdc++-v3/src/Makefile.am
+++ b/libstdc++-v3/src/Makefile.am
@@ -96,6 +96,16 @@ else
 ldbl_alt128_compat_sources =
 endif
 
+if ENABLE_SYMVERS_GNU_NAMESPACE
+cxx0x_compat_sources =
+else
+cxx0x_compat_sources = \
+   compatibility-atomic-c++0x.cc \
+   compatibility-c++0x.cc \
+   compatibility-chrono.cc \
+   compatibility-condvar.cc \
+   compatibility-thread-c++0x.cc
+endif
 
 parallel_compat_sources = \
compatibility-parallel_list.cc  compatibility-parallel_list-2.cc
@@ -108,11 +118,7 @@ cxx98_sources = \
${ldbl_compat_sources}
 
 cxx11_sources = \
-   compatibility-c++0x.cc \
-   compatibility-atomic-c++0x.cc \
-   compatibility-thread-c++0x.cc \
-   compatibility-chrono.cc \
-   compatibility-condvar.cc \
+   ${cxx0x_compat_sources} \
${ldbl_alt128_compat_sources}
 
 libstdc___la_SOURCES = $(cxx98_sources) $(cxx11_sources)
diff --git a/libstdc++-v3/src/c++11/chrono.cc b/libstdc++-v3/src/c++11/chrono.cc
index 6825b5bc4bf..5539d8cbedd 100644
--- a/libstdc++-v3/src/c++11/chrono.cc
+++ b/libstdc++-v3/src/c++11/chrono.cc
@@ -43,8 +43,7 @@ _GLIBCXX_BEGIN_NAMESPACE_VERSION
 
   namespace chrono
   {
-// XXX GLIBCXX_ABI Deprecated
-inline namespace _V2 {
+_GLIBCXX_BEGIN_INLINE_ABI_NAMESPACE(_V2)
 
 constexpr bool system_clock::is_steady;
 
@@ -94,7 +93,7 @@ _GLIBCXX_BEGIN_NAMESPACE_VERSION
 #endif
 }
 
-  } // end inline namespace _V2
+_GLIBCXX_END_INLINE_ABI_NAMESPACE(_V2)
   } // namespace chrono
 
 _GLIBCXX_END_NAMESPACE_VERSION
diff --git a/libstdc++-v3/src/c++11/compatibility-c++0x.cc 
b/libstdc++-v3/src/c++11/compatibility-c++0x.cc
index d48f5bd1e28..768dd666d00 100644
--- a/libstdc++-v3/src/c++11/compatibility-c++0x.cc
+++ b/libstdc++-v3/src/c++11/compatibility-c++0x.cc
@@ -40,6 +40,10 @@
 # error "compatibility-c++0x.cc must be compiled with -std=gnu++0x"
 #endif
 
+#if _GLIBCXX_INLINE_VERSION
+# error "compatibility-thread-c++0x.cc is not needed for 
gnu-versioned-namespace"
+#endif
+
 #ifdef _GLIBCXX_SHARED
 
 namespace std _GLIBCXX_VISIBILITY(default)
diff --git a/libstdc++-v3/src/c++11/compatibility-chrono.cc 
b/libstdc++-v3/src/c++11/compatibility-chrono.cc
index 38b96e811fb..6beb8b39a25 100644
--- a/libstdc++-v3/src/c++11/compatibility-chrono.cc
+++ b/libstdc++-v3/src/c++11/compatibility-chrono.cc
@@ -24,6 +24,10 @@
 
 #include 
 
+#if _GLIBCXX_INLINE_VERSION
+# error "compatibility-thread-c++0x.cc is not needed for 
gnu-versioned-namespace"
+#endif
+
 #ifdef _GLIBCXX_USE_C99_STDINT_TR1
 
 #ifdef _GLIBCXX_USE_GETTIMEOFDAY
diff --git a/libstdc++-v3/src/c++11/compatibility-condvar.cc 
b/libstdc++-v3/src/c++11/compatibility-condvar.cc
index ea3e11efeda..e3a8b8403ca 100644
--- a/libstdc++-v3/src/c++11/compatibility-condvar.cc
+++ b/libstdc++-v3/src/c++11/compatibility-condvar.cc
@@ -28,6 +28,10 @@
 # error "compatibility-condvar-c++0x.cc must be compiled with -std=gnu++11"
 #endif
 
+#if _GLIBCXX_INLINE_VERSION
+# error "compatibility-thread-c++0x.cc is not needed for 
gnu-versioned-namespace"
+#endif
+
 #if defined(_GLI

[PING] Advise to call 'internal_error' instead of 'abort' or 'fancy_abort'

2022-05-17 Thread Thomas Schwinge

Hi!

Ping.


Grüße
 Thomas


On 2022-05-10T16:03:12+0200, I wrote:
> Hi!
>
> On 2022-05-03T15:46:43+0200, Richard Biener  
> wrote:
>> On Tue, May 3, 2022 at 2:29 PM Thomas Schwinge  
>> wrote:
>>> On 2022-05-03T12:53:50+0200, Richard Biener  
>>> wrote:
>>> > On Tue, May 3, 2022 at 10:16 AM Thomas Schwinge  
>>> > wrote:
>>> >> On 2022-05-03T09:17:52+0200, Richard Biener  
>>> >> wrote:
>>> >> > On Mon, May 2, 2022 at 4:01 PM Thomas Schwinge 
>>> >> >  wrote:
>>> >> > +#if 0
>>> >> >gcc_unreachable ();
>>> >> > +#else
>>> >> > +  /* ..., but due to bugs (PR100400), we may actually come here.
>>> >> > +Reliably catch this, regardless of checking level.  */
>>> >> > +  abort ();
>>> >> > +#endif
>>> >> >
>>> >> > this doesn't look correct.  If you want a reliable diagnostic here 
>>> >> > please [...]
>>> >> > call internal_error () manually (the IL verifiers do this).
>>> >>
>>> >> Hmm, I feel I'm going in circles...  ;-)
>
>>> >> I first had this as 'internal_error', but then saw the following source
>>> >> code comment, 'gcc/diagnostic.cc':
>>> >>
>>> >> /* An internal consistency check has failed.  We make no attempt to
>>> >>continue.  Note that unless there is debugging value to be had 
>>> >> from
>>> >>a more specific message, or some other good reason, you should use
>>> >>abort () instead of calling this function directly.  */
>>> >> void
>>> >> internal_error (const char *gmsgid, ...)
>>> >> {
>>> >>
>>> >> Here, there's no "debugging value to be had from a more specific
>>> >> message", and I couldn't think of "some other good reason", so decided to
>>> >> "use abort () instead of calling this function directly".
>>> >
>>> > I think that is misguided.
>>>
>>> So that I know which one to fix/reconsider: does your "that" refer to the
>>> 'gcc/diagnostic.cc:internal_error' source code comment cited above, or my
>>> interpretation of it?
>>
>> The comment to "use abort ()".
>
> Does the attached
> "Advise to call 'internal_error' instead of 'abort' or 'fancy_abort'"
> capture what you had in mind?
>
> (This is, obviously, not updating any of the many 'abort' or even a few
> 'fancy_abort' calls that we currently have.)
>
>
> Grüße
>  Thomas


>From a8017c7b5fa7b5e8210b6446acf7dd09989a7517 Mon Sep 17 00:00:00 2001
From: Thomas Schwinge 
Date: Tue, 10 May 2022 15:56:08 +0200
Subject: [PATCH] Advise to call 'internal_error' instead of 'abort' or
 'fancy_abort'

	gcc/
	* diagnostic.cc: Don't advise to call 'abort' instead of
	'internal_error'.
	* system.h: Advise to call 'internal_error' instead of 'abort' or
	'fancy_abort'.

Suggested-by: Richard Biener 
---
 gcc/diagnostic.cc | 4 +---
 gcc/system.h  | 6 --
 2 files changed, 5 insertions(+), 5 deletions(-)

diff --git a/gcc/diagnostic.cc b/gcc/diagnostic.cc
index 73324a728fe..fef11467b6f 100644
--- a/gcc/diagnostic.cc
+++ b/gcc/diagnostic.cc
@@ -1935,9 +1935,7 @@ fatal_error (location_t loc, const char *gmsgid, ...)
 }
 
 /* An internal consistency check has failed.  We make no attempt to
-   continue.  Note that unless there is debugging value to be had from
-   a more specific message, or some other good reason, you should use
-   abort () instead of calling this function directly.  */
+   continue.  */
 void
 internal_error (const char *gmsgid, ...)
 {
diff --git a/gcc/system.h b/gcc/system.h
index c25cd64366f..187763efcd6 100644
--- a/gcc/system.h
+++ b/gcc/system.h
@@ -770,8 +770,10 @@ extern int vsnprintf (char *, size_t, const char *, va_list);
 #endif
 #endif
 
-/* Redefine abort to report an internal error w/o coredump, and
-   reporting the location of the error in the source file.  */
+/* Redefine 'abort' to report an internal error w/o coredump, and
+   reporting the location of the error in the source file.
+   Instead of directly calling 'abort' or 'fancy_abort', GCC code
+   should normally call 'internal_error' with a specific message.  */
 extern void fancy_abort (const char *, int, const char *)
 	 ATTRIBUTE_NORETURN ATTRIBUTE_COLD;
 #define abort() fancy_abort (__FILE__, __LINE__, __FUNCTION__)
-- 
2.25.1
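
To make the difference behind this advice concrete, here is a rough standalone sketch. It is not GCC's implementation: `format_bare_abort` and `format_specific_error` are hypothetical helpers that only build the kind of text each approach can offer, under the assumption that a bare `abort ()` knows nothing but its source location.

```c
#include <stdio.h>

/* Hypothetical stand-in, not GCC code: the location-only report that a
   bare abort () can give once it is redefined to pass __FILE__,
   __LINE__ and __FUNCTION__ along.  */
static inline int
format_bare_abort (char *buf, size_t size, const char *file, int line,
                   const char *function)
{
  return snprintf (buf, size, "in %s, at %s:%d", function, file, line);
}

/* Hypothetical stand-in for a call that carries a specific message,
   which is what the updated comment recommends.  */
static inline int
format_specific_error (char *buf, size_t size, const char *what, int code)
{
  return snprintf (buf, size, "unexpected %s (code %d)", what, code);
}
```

The first form only says where the compiler stopped; the second can say what was wrong, which is the debugging value the thread is concerned with.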

Re: [PATCH] [i386] recognize bzhi pattern when there's zero_extendsidi.

2022-05-17 Thread Uros Bizjak via Gcc-patches

On Tue, May 17, 2022 at 5:06 AM liuhongt  wrote:
>
> backend has
>
> (define_insn "*bmi2_bzhi_3_2"
>   [(set (match_operand:SWI48 0 "register_operand" "=r")
>         (and:SWI48
>           (plus:SWI48
>             (ashift:SWI48 (const_int 1)
>                           (match_operand:QI 2 "register_operand" "r"))
>             (const_int -1))
>           (match_operand:SWI48 1 "nonimmediate_operand" "rm")))
>    (clobber (reg:CC FLAGS_REG))]
>   "TARGET_BMI2"
>   "bzhi\t{%2, %1, %0|%0, %1, %2}"
>   [(set_attr "type" "bitmanip")
>    (set_attr "prefix" "vex")
>    (set_attr "mode" "")])
>
> But there's an extra zero_extend in the pattern to be matched.
>
> Failed to match this instruction:
> (parallel [
>         (set (reg:DI 90)
>             (zero_extend:DI (and:SI (plus:SI (ashift:SI (const_int 1 [0x1])
>                             (subreg:QI (reg:SI 98) 0))
>                         (const_int -1 [0x]))
>                     (subreg:SI (reg:DI 95) 0
>         (clobber (reg:CC 17 flags))
>     ])
>
> Add new define_insn for it.
>
>
> Bootstrapped and regtested on x86_64-pc-linux-gnu{-m32,}..
> Ok for trunk?
>
> gcc/ChangeLog:
>
> PR target/104375
> * config/i386/i386.md (*bmi2_bzhi_zero_extendsidi_4): New
> define_insn.
>
> gcc/testsuite/ChangeLog:
>
> * gcc.target/i386/pr104375.c: New test.

OK with a nit below.

Thanks,
Uros.

> ---
>  gcc/config/i386/i386.md  | 16 
>  gcc/testsuite/gcc.target/i386/pr104375.c |  9 +
>  2 files changed, 25 insertions(+)
>  create mode 100644 gcc/testsuite/gcc.target/i386/pr104375.c
>
> diff --git a/gcc/config/i386/i386.md b/gcc/config/i386/i386.md
> index f9c06ff302a..ec7bdd04947 100644
> --- a/gcc/config/i386/i386.md
> +++ b/gcc/config/i386/i386.md
> @@ -16636,6 +16636,22 @@ (define_insn "*bmi2_bzhi_3_3"
> (set_attr "prefix" "vex")
> (set_attr "mode" "")])
>
> +(define_insn "*bmi2_bzhi_zero_extendsidi_4"
> +  [(set (match_operand:DI 0 "register_operand" "=r")
> +   (zero_extend:DI
> + (and:SI
> +   (plus:SI
> + (ashift:SI (const_int 1)
> +(match_operand:QI 2 "register_operand" "r"))
> + (const_int -1))
> +   (match_operand:SI 1 "nonimmediate_operand" "rm"
> +   (clobber (reg:CC FLAGS_REG))]
> +  "TARGET_BMI2 && TARGET_64BIT"

Please put TARGET_64BIT first here.

> +  "bzhi\t{%q2, %q1, %q0|%q0, %q1, %q2}"
> +  [(set_attr "type" "bitmanip")
> +   (set_attr "prefix" "vex")
> +   (set_attr "mode" "DI")])
> +
>  (define_insn "bmi2_pdep_3"
>[(set (match_operand:SWI48 0 "register_operand" "=r")
>  (unspec:SWI48 [(match_operand:SWI48 1 "register_operand" "r")
> diff --git a/gcc/testsuite/gcc.target/i386/pr104375.c 
> b/gcc/testsuite/gcc.target/i386/pr104375.c
> new file mode 100644
> index 000..5c9f511da5c
> --- /dev/null
> +++ b/gcc/testsuite/gcc.target/i386/pr104375.c
> @@ -0,0 +1,9 @@
> +#/* { dg-do compile { target { ! ia32 } } } */
> +/* { dg-options "-mbmi2 -O2" } */
> +/* { dg-final { scan-assembler-times {(?n)shrx[\t ]+} 1 } } */
> +/* { dg-final { scan-assembler-times {(?n)bzhi[\t ]+} 1 } } */
> +
> +unsigned long long bextr_u64(unsigned long long w, unsigned off, unsigned 
> int len)
> +{
> +return (w >> off) & ((1U << len) - 1U);
> +}
> --
> 2.18.1
>
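
For readers less used to RTL, the shape the new pattern matches corresponds to a familiar C idiom. The sketch below assumes only what the quoted test shows; `keep_low_bits` and `extract_bits` are hypothetical names.

```c
#include <stdint.h>

/* Keep the low LEN bits of X; LEN must be less than 32 for the shift
   to be defined in C.  This is the
   (and (plus (ashift 1 len) -1) x) shape from the pattern.  */
static inline uint32_t
keep_low_bits (uint32_t x, unsigned len)
{
  return x & ((1U << len) - 1U);
}

/* Same shape as the quoted test: the 32-bit mask is zero-extended to
   64 bits before the AND, so the upper half of the result is zero.
   OFF must be less than 64 and LEN less than 32.  */
static inline uint64_t
extract_bits (uint64_t w, unsigned off, unsigned len)
{
  return (w >> off) & ((1U << len) - 1U);
}
```

Because the mask is computed in 32 bits and only then widened, the 64-bit result is a zero extension of a 32-bit AND, which is the extra `zero_extend` the new define_insn accepts.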

Re: [PATCH] i386: Fix up V2DI and V1TI inequality comparisons [PR105613]

2022-05-17 Thread Uros Bizjak via Gcc-patches

On Tue, May 17, 2022 at 9:00 AM Jakub Jelinek  wrote:
>
> Hi!
>
> The recent r13-458 change to introduce vec_cmpeqv1tiv1ti and
> add TARGET_SSE2 support to vec_cmpeqv2div2di works nicely for
> equality comparisons, but as the testcase shows doesn't work
> for inequality comparisons.
> For EQ, if we perform the comparison with twice as many half-sized elements,
> the result should be ~0 when both halves are ~0 only (both halves need
> to be equal for the whole to be equal), otherwise 0, so AND is the
> correct operation for it.
> But for NE, the result should be ~0 when either of the halves is ~0
> (if either half is not equal, the whole is not equal) and so the right
> operation for NE is IOR, not AND.
>
> Bootstrapped/regtested on x86_64-linux and i686-linux, ok for trunk?
>
> 2022-05-17  Jakub Jelinek  
>
> PR target/105613
> * config/i386/sse.md (vec_cmpeqv2div2di, vec_cmpeqv1tiv1ti): Use
> andv4si3 only for EQ, for NE use iorv4si3 instead.
>
> * gcc.c-torture/execute/pr105613.c: New test.

OK.

Thanks,
Uros.

>
> --- gcc/config/i386/sse.md.jj   2022-05-16 09:46:01.962065216 +0200
> +++ gcc/config/i386/sse.md  2022-05-16 10:48:45.698038881 +0200
> @@ -4407,7 +4407,10 @@ (define_expand "vec_cmpeqv2div2di"
>emit_insn (gen_sse2_pshufd (tmp1, ops[0], GEN_INT (0xb1)));
>
>rtx tmp2 = gen_reg_rtx (V4SImode);
> -  emit_insn (gen_andv4si3 (tmp2, tmp1, ops[0]));
> +  if (GET_CODE (operands[1]) == EQ)
> +   emit_insn (gen_andv4si3 (tmp2, tmp1, ops[0]));
> +  else
> +   emit_insn (gen_iorv4si3 (tmp2, tmp1, ops[0]));
>
>emit_move_insn (operands[0], gen_lowpart (V2DImode, tmp2));
>  }
> @@ -4435,7 +4438,10 @@ (define_expand "vec_cmpeqv1tiv1ti"
>emit_insn (gen_sse2_pshufd (tmp1, tmp2, GEN_INT (0x4e)));
>
>rtx tmp3 = gen_reg_rtx (V4SImode);
> -  emit_insn (gen_andv4si3 (tmp3, tmp2, tmp1));
> +  if (GET_CODE (operands[1]) == EQ)
> +emit_insn (gen_andv4si3 (tmp3, tmp2, tmp1));
> +  else
> +emit_insn (gen_iorv4si3 (tmp3, tmp2, tmp1));
>
>emit_move_insn (operands[0], gen_lowpart (V1TImode, tmp3));
>DONE;
> --- gcc/testsuite/gcc.c-torture/execute/pr105613.c.jj   2022-05-16 
> 10:42:34.286151601 +0200
> +++ gcc/testsuite/gcc.c-torture/execute/pr105613.c  2022-05-16 
> 10:48:07.687562119 +0200
> @@ -0,0 +1,26 @@
> +/* PR target/105613 */
> +/* { dg-do run { target int128 } } */
> +
> +typedef unsigned __int128 __attribute__((__vector_size__ (16))) V;
> +
> +void
> +foo (V v, V *r)
> +{
> +  *r = v != 0;
> +}
> +
> +int
> +main ()
> +{
> +  V r;
> +  foo ((V) {5}, &r);
> +  if (r[0] != ~(unsigned __int128) 0)
> +__builtin_abort ();
> +  foo ((V) {0x50005ULL}, &r);
> +  if (r[0] != ~(unsigned __int128) 0)
> +__builtin_abort ();
> +  foo ((V) {0}, &r);
> +  if (r[0] != 0)
> +__builtin_abort ();
> +  return 0;
> +}
>
> Jakub
>
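
The AND-versus-IOR reasoning in the patch description can be checked with a scalar model. This is a sketch under the assumption that each 32-bit lane compare yields all ones or all zeros; `lane_eq`, `wide_eq` and `wide_ne` are hypothetical names, and the code models one 64-bit element rather than the actual SSE2 expansion.

```c
#include <stdint.h>

/* 32-bit lane compare mask: all ones if equal, else zero.  */
static inline uint32_t
lane_eq (uint32_t a, uint32_t b)
{
  return a == b ? UINT32_MAX : 0;
}

/* 64-bit EQ mask from two 32-bit lane results: both halves must match,
   so the lane masks are ANDed.  */
static inline uint64_t
wide_eq (uint64_t a, uint64_t b)
{
  uint32_t lo = lane_eq ((uint32_t) a, (uint32_t) b);
  uint32_t hi = lane_eq ((uint32_t) (a >> 32), (uint32_t) (b >> 32));
  uint32_t m = lo & hi;
  return ((uint64_t) m << 32) | m;
}

/* 64-bit NE mask: one differing half is enough, so the inverted lane
   masks are ORed.  Using AND here would miss values that differ in
   only one half.  */
static inline uint64_t
wide_ne (uint64_t a, uint64_t b)
{
  uint32_t lo = ~lane_eq ((uint32_t) a, (uint32_t) b);
  uint32_t hi = ~lane_eq ((uint32_t) (a >> 32), (uint32_t) (b >> 32));
  uint32_t m = lo | hi;
  return ((uint64_t) m << 32) | m;
}
```

A value such as 5 compared against 0 differs only in its low half, which is exactly the case where AND would wrongly report "not unequal".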

Re: [PATCH] Expand __builtin_memcmp_eq with ptest for OImode.

2022-05-17 Thread Uros Bizjak via Gcc-patches

On Tue, May 17, 2022 at 3:33 AM Hongtao Liu  wrote:
>
> On Mon, May 16, 2022 at 5:21 PM Uros Bizjak via Gcc-patches
>  wrote:
> >
> > On Sat, May 7, 2022 at 7:05 AM liuhongt  wrote:
> > >
> > > This is adjusted patch only for OImode.
> > >
> > > Bootstrapped and regtested on x86_64-pc-linux-gnu{-m32,}.
> > > Ok for trunk?
> > >
> > > gcc/ChangeLog:
> > >
> > > PR target/104610
> > > * config/i386/i386-expand.cc (ix86_expand_branch): Use ptest
> > > for QImode when code is EQ or NE.
> > > * config/i386/sse.md (cbranch4): Extend to OImode.
> > >
> > > gcc/testsuite/ChangeLog:
> > >
> > > * gcc.target/i386/pr104610.c: New test.
> > > ---
> > >  gcc/config/i386/i386-expand.cc   | 10 +-
> > >  gcc/config/i386/sse.md   |  8 ++--
> > >  gcc/testsuite/gcc.target/i386/pr104610.c | 15 +++
> > >  3 files changed, 30 insertions(+), 3 deletions(-)
> > >  create mode 100644 gcc/testsuite/gcc.target/i386/pr104610.c
> > >
> > > diff --git a/gcc/config/i386/i386-expand.cc 
> > > b/gcc/config/i386/i386-expand.cc
> > > index bc806ffa283..c2f8776102c 100644
> > > --- a/gcc/config/i386/i386-expand.cc
> > > +++ b/gcc/config/i386/i386-expand.cc
> > > @@ -2267,11 +2267,19 @@ ix86_expand_branch (enum rtx_code code, rtx op0, 
> > > rtx op1, rtx label)
> > >
> > >/* Handle special case - vector comparsion with boolean result, 
> > > transform
> > >   it using ptest instruction.  */
> > > -  if (GET_MODE_CLASS (mode) == MODE_VECTOR_INT)
> > > +  if (GET_MODE_CLASS (mode) == MODE_VECTOR_INT
> > > +  || (mode == OImode && (code == EQ || code == NE)))
> >
> > No need for the code check here. You have an assert in the code below.
> >
> Changed.
> I mistakenly read QImode as OImode; I thought other comparison codes
> could also be handled for OImode.
> > >  {
> > >rtx flag = gen_rtx_REG (CCZmode, FLAGS_REG);
> > >machine_mode p_mode = GET_MODE_SIZE (mode) == 32 ? V4DImode : 
> > > V2DImode;
> > >
> > > +  if (mode == OImode)
> > > +   {
> > > + op0 = lowpart_subreg (p_mode, force_reg (mode, op0), mode);
> > > + op1 = lowpart_subreg (p_mode, force_reg (mode, op1), mode);
> > > + mode = p_mode;
> > > +   }
> > > +
> > >gcc_assert (code == EQ || code == NE);
> >
> > Please put the above hunk after the assert.
> Changed.
> >
> > >/* Generate XOR since we can't check that one operand is zero 
> > > vector.  */
> > >tmp = gen_reg_rtx (mode);
> > > diff --git a/gcc/config/i386/sse.md b/gcc/config/i386/sse.md
> > > index 7b791def542..9514b8e0234 100644
> > > --- a/gcc/config/i386/sse.md
> > > +++ b/gcc/config/i386/sse.md
> > > @@ -26034,10 +26034,14 @@ (define_expand 
> > > "maskstore"
> > >   (match_operand: 2 "register_operand")))]
> > >"TARGET_AVX512BW")
> > >
> > > +(define_mode_iterator VI48_OI_AVX
> > > +  [(V8SI "TARGET_AVX") (V4DI "TARGET_AVX") (OI "TARGET_AVX")
> > > +   V4SI V2DI])
> > > +
> > >  (define_expand "cbranch4"
> > >[(set (reg:CC FLAGS_REG)
> > > -   (compare:CC (match_operand:VI48_AVX 1 "register_operand")
> > > -   (match_operand:VI48_AVX 2 "nonimmediate_operand")))
> > > +   (compare:CC (match_operand:VI48_OI_AVX 1 "register_operand")
> > > +   (match_operand:VI48_OI_AVX 2 "nonimmediate_operand")))
> > > (set (pc) (if_then_else
> > >(match_operator 0 "bt_comparison_operator"
> > > [(reg:CC FLAGS_REG) (const_int 0)])
> >
> > Please rather put the new cbranchoi4 expander in i386.md.
> Good idea, changed.
> >
> > > diff --git a/gcc/testsuite/gcc.target/i386/pr104610.c 
> > > b/gcc/testsuite/gcc.target/i386/pr104610.c
> > > new file mode 100644
> > > index 000..00866238bd7
> > > --- /dev/null
> > > +++ b/gcc/testsuite/gcc.target/i386/pr104610.c
> > > @@ -0,0 +1,15 @@
> > > +/* { dg-do compile } */
> > > +/* { dg-options "-O2 -mmove-max=256 -mstore-max=256" } */
> > > +/* { dg-final { scan-assembler-times {(?n)vptest.*ymm} 1 } } */
> > > +/* { dg-final { scan-assembler-times {sete} 1 } } */
> > > +/* { dg-final { scan-assembler-not {(?n)je.*L[0-9]} } } */
> > > +/* { dg-final { scan-assembler-not {(?n)jne.*L[0-9]} } } */
> > > +
> > > +
> > > +#include
> > > +__attribute__((target("avx")))
> > > +bool f256(char *a)
> >
> > > Use _Bool instead and simply pass -mavx to dg-options.
> >
> Changed.
> > Uros.
> >
> > > +{
> > > +  char t[] = "0123456789012345678901234567890";
> > > +  return __builtin_memcmp(a, &t[0], sizeof(t)) == 0;
> > > +}
> > > --
> > > 2.18.1
> > >
>
>
> Here's the updated patch.


   gcc_assert (code == EQ || code == NE);
+  if (mode == OImode)

Please add one line of vertical space in the code above.

OK with that change.

Thanks,
Uros.
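
The idea behind the expansion discussed in this thread is that two 32-byte operands are XORed and the result is tested for zero. A scalar sketch of that idea follows, assuming 64-bit chunks in place of a vector register; `equal32` is a hypothetical name and this is not what the compiler emits.

```c
#include <stdint.h>
#include <string.h>

/* Compare two 32-byte blocks for equality by accumulating the XOR of
   their 64-bit chunks and testing the result for zero, which is the
   scalar analogue of XOR followed by a zero test on a 256-bit value.  */
static inline int
equal32 (const void *a, const void *b)
{
  uint64_t x[4], y[4], diff = 0;

  memcpy (x, a, sizeof x);
  memcpy (y, b, sizeof y);
  for (int i = 0; i < 4; i++)
    diff |= x[i] ^ y[i];
  return diff == 0;
}
```

Only equality and inequality can be answered this way, which is consistent with the `gcc_assert (code == EQ || code == NE)` in the quoted hunk.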

Re: [wwwdocs][Patch] Add OpenMP by-GCC-version implementation status

2022-05-17 Thread Jakub Jelinek via Gcc-patches

On Tue, May 17, 2022 at 11:50:15AM +0200, Tobias Burnus wrote:
> On 17.05.22 11:00, Jakub Jelinek wrote:
> > BTW, it would be really nice to use colors like
> > https://gcc.gnu.org/projects/cxx-status.html uses, use just GCC versions
> > instead of GCC version and No instead of N and use hyperlinks to
> > changes.html OpenMP ids (or just changes.html if we don't have an id).
> Done so: https://gcc.gnu.org/projects/gomp/
> 
> I also had to add a table.omptable to gcc.css as we need column 2 not 3
> or 4 to be centered.

Thanks.
Noticed small typo, committed now as obvious.

diff --git a/htdocs/projects/gomp/index.html b/htdocs/projects/gomp/index.html
index 52991c02..799a1e3b 100644
--- a/htdocs/projects/gomp/index.html
+++ b/htdocs/projects/gomp/index.html
@@ -24,7 +24,7 @@ OpenMP and OpenACC are supported with GCC's C, C++ and 
Fortran compilers.
   Contributing
   Reporting Bugs
   OpenMP Implementation Status:
-  2.5 · 5.0 ·
+  2.5 · 3.0 ·
   3.1 · 4.0 ·
   4.5 · 5.0 ·
   5.1 · 5.2


Jakub

Re: [PATCH] OpenMP, libgomp: Add new runtime routines omp_target_memcpy_async and omp_target_memcpy_rect_async

2022-05-17 Thread Marcel Vollweiler


Hi Jakub,


--- a/libgomp/libgomp.map
+++ b/libgomp/libgomp.map
@@ -224,6 +224,8 @@ OMP_5.1 {
 omp_set_teams_thread_limit_8_;
 omp_get_teams_thread_limit;
 omp_get_teams_thread_limit_;
+omp_target_memcpy_async;
+omp_target_memcpy_rect_async;
  } OMP_5.0.2;


These should be added to OMP_5.1.1, not here.


Changed.


--- a/libgomp/omp.h.in
+++ b/libgomp/omp.h.in
@@ -272,6 +272,10 @@ extern int omp_target_is_present (const void *, int) 
__GOMP_NOTHROW;
  extern int omp_target_memcpy (void *, const void *, __SIZE_TYPE__,
   __SIZE_TYPE__, __SIZE_TYPE__, int, int)
__GOMP_NOTHROW;
+extern int omp_target_memcpy_async (void *, const void *, __SIZE_TYPE__,
+__SIZE_TYPE__, __SIZE_TYPE__, int, int,
+int, omp_depend_t*)


Formatting, space before *.


Changed.


+  __GOMP_NOTHROW;
  extern int omp_target_memcpy_rect (void *, const void *, __SIZE_TYPE__, int,
const __SIZE_TYPE__ *,
const __SIZE_TYPE__ *,
@@ -279,6 +283,14 @@ extern int omp_target_memcpy_rect (void *, const void *, 
__SIZE_TYPE__, int,
const __SIZE_TYPE__ *,
const __SIZE_TYPE__ *, int, int)
__GOMP_NOTHROW;
+extern int omp_target_memcpy_rect_async (void *, const void *, __SIZE_TYPE__,
+ int, const __SIZE_TYPE__ *,
+ const __SIZE_TYPE__ *,
+ const __SIZE_TYPE__ *,
+ const __SIZE_TYPE__ *,
+ const __SIZE_TYPE__ *, int, int, int,
+ omp_depend_t*)


Likewise.


Changed.


-int
-omp_target_memcpy (void *dst, const void *src, size_t length,
-   size_t dst_offset, size_t src_offset, int dst_device_num,
-   int src_device_num)
+static int
+omp_target_memcpy_check (void *dst, const void *src, int dst_device_num,
+ int src_device_num,
+ struct gomp_device_descr **dst_devicep,
+ struct gomp_device_descr **src_devicep)
  {


Why does omp_target_memcpy_check need the dst and src arguments?  From what
I can see, they aren't used by it.


Good point, dst and src arguments are removed.


+typedef struct
+{
+  void *dst;
+  const void *src;
+  size_t length;
+  size_t dst_offset;
+  size_t src_offset;
+  struct gomp_device_descr *dst_devicep;
+  struct gomp_device_descr *src_devicep;
+} memcpy_t;


Please come up with some less generic name, struct omp_target_memcpy_data
or something similar.  Even the *_t suffix is problematic, as *_t is
reserved for the implementation.


Renamed "memcpy_t" into "omp_target_memcpy_data" and "memcpy_rect_t" into
"omp_target_memcpy_rect_data".


+
+void
+omp_target_memcpy_async_helper (void *args)


This should be static.


Changed for "omp_target_memcpy_async_helper" and
"omp_target_memcpy_rect_async_helper".


+{
+  memcpy_t *a = args;
+  int ret = omp_target_memcpy_copy (a->dst, a->src, a->length, a->dst_offset,
+a->src_offset, a->dst_devicep,
+a->src_devicep);
+  if (ret)
+gomp_fatal ("asynchronous memcpy failed");


I'm not really sure killing the whole program if the copying failed is the
best action.  Has it been discussed on omp-lang?  Perhaps the APIs should
have a way how to propagate the result to the caller when it completes
somehow?


I agree that gomp_fatal is quite harsh here. Otherwise I am afraid that
undefined behaviour can result from silently ignoring copy failures. I agree
with Tobias to keep gomp_fatal for now (as I don't see any useful alternative
yet) and discuss a (general) approach for OpenMP (as Tobias triggered in
https://github.com/OpenMP/spec/issues/3286).

As Tobias suggested, I replaced the error messages with "omp_target_memcpy
failed" and "omp_target_memcpy_rect failed".


Even if we do that, the ret variable seems to be superfluous; just do
   if (omp_target_memcpy_copy (...))
 gomp_fatal (...);


Changed.
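
The review points above (a less generic struct name, a static helper, and testing the call result directly) can be seen together in a host-only sketch. Everything here is an assumption for illustration: `copy_request`, `do_copy` and `copy_helper` are hypothetical names, plain `memcpy` stands in for the device copy, and the helper returns a status where the patch calls `gomp_fatal`.

```c
#include <stddef.h>
#include <string.h>

/* Hypothetical host-only analogue of the argument block discussed
   above; the names are illustrative and not libgomp's.  */
struct copy_request
{
  void *dst;
  const void *src;
  size_t length;
  size_t dst_offset;
  size_t src_offset;
};

/* Plain memcpy stands in for the device copy.  Returns 0 on success,
   nonzero on invalid arguments.  */
static inline int
do_copy (const struct copy_request *r)
{
  if (r->dst == NULL || r->src == NULL)
    return 1;
  memcpy ((char *) r->dst + r->dst_offset,
          (const char *) r->src + r->src_offset, r->length);
  return 0;
}

/* Task-style entry point taking an opaque pointer.  The result of the
   copy is tested directly, without a temporary variable.  */
static inline int
copy_helper (void *args)
{
  if (do_copy ((const struct copy_request *) args))
    return 1;
  return 0;
}
```

Returning a status is only one possible answer to the open question in the thread about how a failed asynchronous copy should be reported.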




+{
+  struct gomp_device_descr *dst_devicep = NULL, *src_devicep = NULL;
+
+  int check = omp_target_memcpy_check (dst, src, dst_device_num, 
src_device_num,
+   &dst_devicep, &src_devicep);
+  if (check)
+return check;
+
+  void (*fn) (void *) = &omp_target_memcpy_async_helper;
+  void *data = NULL;
+  void (*cpyfn) (void *, void *) = NULL;
+  long arg_size = 0;
+  long arg_align = 0;
+  bool if_clause = false;
+  unsigned flags = 0;
+  int priority_arg = 0;
+  void *detach = NULL;
+
+  memcpy_t s = {
+.dst = dst,
+.src = src,
+.length = length,
+.dst_offset = dst_offset,
+.src_offset = src_offset,
+.dst_devicep = dst_devicep,
+.src_devicep = src_devicep
+  };


I think we in libgomp try to use C89 and so declare vars first befor

Re: [wwwdocs][Patch] Add OpenMP by-GCC-version implementation status

2022-05-17 Thread Tobias Burnus


On 17.05.22 11:00, Jakub Jelinek wrote:

BTW, it would be really nice to use colors like
https://gcc.gnu.org/projects/cxx-status.html uses, use just GCC versions
instead of GCC version and No instead of N and use hyperlinks to
changes.html OpenMP ids (or just changes.html if we don't have an id).

Done so: https://gcc.gnu.org/projects/gomp/

I also had to add a table.omptable to gcc.css as we need column 2 not 3
or 4 to be centered.

Thanks again for all the comments!

Tobias


Re: [PATCH] [x86_64]: Zhaoxin lujiazui enablement

2022-05-17 Thread Mayshao-oc

> On Tue, May 17, 2022 at 5:15 AM mayshao  wrote:
>> Hi Uros:
>> This patch fixes the Zhaoxin CPU vendor ID detection problem and adds 
>> Zhaoxin "lujiazui" processor support.
>> Currently GCC can't recognize Zhaoxin CPUs (vendor IDs "CentaurHauls" 
>> and "Shanghai") when the user passes -march=native, which is confusing for 
>> users.
>> This patch enables -march=native on Zhaoxin family 7 processors and 
>> adds -march/-mtune=lujiazui; costs and tuning are set according to the 
>> characteristics of the processor. We add a new md file to describe the 
>> lujiazui pipeline.
>> Testing:
>> Bootstrap is ok, and no regressions for i386/x86-64 testsuite.
>> Ok for master?
>> Background:
>> Related Zhaoxin linux kernel patch can be found at:
>> https://lore.kernel.org/lkml/01042674b2f741b2aed1f797359bd...@zhaoxin.com/
>> Related Zhaoxin glibc patch can be found at:
>> https://sourceware.org/git/?p=glibc.git;a=commit;h=32ac0b988466785d6e3cc1dffc364bb26fc63193
>> gcc/ChangeLog:
> The entries below are suspiciously empty - please fill in the details.

Sorry for forgetting this. The patch is updated. Thanks.

* common/config/i386/cpuinfo.h (get_zhaoxin_cpu): Detect
the specific type of Zhaoxin CPU, and return Zhaoxin CPU name.
(cpu_indicator_init): Handle Zhaoxin processors.
* common/config/i386/i386-common.cc: Add lujiazui.
* common/config/i386/i386-cpuinfo.h (enum processor_vendor): Add
VENDOR_ZHAOXIN.
(enum processor_types): Add ZHAOXIN_FAM7H.
(enum processor_subtypes): Add ZHAOXIN_FAM7H_LUJIAZUI.
* config.gcc: Add lujiazui.
* config/i386/cpuid.h (signature_SHANGHAI_ebx): Add
signatures for Zhaoxin.
(signature_SHANGHAI_ecx): Ditto.
(signature_SHANGHAI_edx): Ditto.
* config/i386/driver-i386.cc (host_detect_local_cpu): Let
-march=native recognize lujiazui processors.
* config/i386/i386-c.cc (ix86_target_macros_internal): Add lujiazui.
* config/i386/i386-options.cc (m_LUJIAZUI): New definition.
* config/i386/i386.h (enum processor_type): Ditto.
* config/i386/i386.md: Add lujiazui.
* config/i386/x86-tune-costs.h (struct processor_costs): Add
lujiazui costs.
* config/i386/x86-tune-sched.cc (ix86_issue_rate): Add lujiazui.
(ix86_adjust_cost): Ditto.
* config/i386/x86-tune.def (X86_TUNE_SCHEDULE): Add lujiazui tunings.
(X86_TUNE_PARTIAL_REG_DEPENDENCY): Ditto.
(X86_TUNE_SSE_PARTIAL_REG_DEPENDENCY): Ditto.
(X86_TUNE_SSE_PARTIAL_REG_FP_CONVERTS_DEPENDENCY): Ditto.
(X86_TUNE_SSE_PARTIAL_REG_CONVERTS_DEPENDENCY): Ditto.
(X86_TUNE_MOVX): Ditto.
(X86_TUNE_MEMORY_MISMATCH_STALL): Ditto.
(X86_TUNE_FUSE_CMP_AND_BRANCH_32): Ditto.
(X86_TUNE_FUSE_CMP_AND_BRANCH_64): Ditto.
(X86_TUNE_FUSE_CMP_AND_BRANCH_SOFLAGS): Ditto.
(X86_TUNE_FUSE_ALU_AND_BRANCH): Ditto.
(X86_TUNE_ACCUMULATE_OUTGOING_ARGS): Ditto.
(X86_TUNE_USE_LEAVE): Ditto.
(X86_TUNE_PUSH_MEMORY): Ditto.
(X86_TUNE_LCP_STALL): Ditto.
(X86_TUNE_USE_INCDEC): Ditto.
(X86_TUNE_INTEGER_DFMODE_MOVES): Ditto.
(X86_TUNE_OPT_AGU): Ditto.
(X86_TUNE_PREFER_KNOWN_REP_MOVSB_STOSB): Ditto.
(X86_TUNE_MISALIGNED_MOVE_STRING_PRO_EPILOGUES): Ditto.
(X86_TUNE_USE_SAHF): Ditto.
(X86_TUNE_USE_BT): Ditto.
(X86_TUNE_AVOID_FALSE_DEP_FOR_BMI): Ditto.
(X86_TUNE_ONE_IF_CONV_INSN): Ditto.
(X86_TUNE_AVOID_MFENCE): Ditto.
(X86_TUNE_EXPAND_ABS): Ditto.
(X86_TUNE_USE_SIMODE_FIOP): Ditto.
(X86_TUNE_USE_FFREEP): Ditto.
(X86_TUNE_EXT_80387_CONSTANTS): Ditto.
(X86_TUNE_SSE_UNALIGNED_LOAD_OPTIMAL): Ditto.
(X86_TUNE_SSE_UNALIGNED_STORE_OPTIMAL): Ditto.
(X86_TUNE_SSE_TYPELESS_STORES): Ditto.
(X86_TUNE_SSE_LOAD0_BY_PXOR): Ditto.
* doc/extend.texi: Add details about lujiazui.
* doc/invoke.texi: Add details about lujiazui.
* config/i386/lujiazui.md: Introduce lujiazui cpu and include new md file.

gcc/testsuite/ChangeLog:

* gcc.target/i386/funcspec-56.inc: Test -arch=lujiazui and -tune=lujiazui.
* g++.target/i386/mv32.C: Ditto.
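
For readers following the vendor-ID part of the description above, here is a minimal sketch in C of how a CPUID leaf 0 vendor string can be assembled from register values and matched against the two strings the patch mentions. It is not the code from the patch: the names vendor_string, classify_vendor and the *_SKETCH enumerators are made up for illustration, the register values are passed in rather than read from the CPU, and the family/model checks that the real detection also needs are left out.

```c
#include <assert.h>
#include <stdint.h>
#include <string.h>

/* Hypothetical vendor tags for this sketch; not the enum used by GCC.  */
enum vendor_kind { VENDOR_OTHER_SKETCH, VENDOR_ZHAOXIN_SKETCH };

/* Assemble the 12-byte vendor string of CPUID leaf 0.  The bytes come
   from EBX, EDX, ECX in that order, least significant byte first.  */
static void
vendor_string (uint32_t ebx, uint32_t ecx, uint32_t edx, char out[13])
{
  const uint32_t regs[3] = { ebx, edx, ecx };

  assert (out != NULL);
  for (int r = 0; r < 3; r++)
    for (int b = 0; b < 4; b++)
      out[r * 4 + b] = (char) ((regs[r] >> (8 * b)) & 0xff);
  out[12] = '\0';
}

/* Match the two vendor strings named in the patch description.  The
   spaces around "Shanghai" pad the string to 12 bytes.  */
static enum vendor_kind
classify_vendor (uint32_t ebx, uint32_t ecx, uint32_t edx)
{
  char name[13];

  vendor_string (ebx, ecx, edx, name);
  if (strcmp (name, "CentaurHauls") == 0
      || strcmp (name, "  Shanghai  ") == 0)
    return VENDOR_ZHAOXIN_SKETCH;
  return VENDOR_OTHER_SKETCH;
}
```

Calling classify_vendor (0x68532020, 0x20206961, 0x68676e61) corresponds to the "  Shanghai  " string. Since "CentaurHauls" is shared with older non-Zhaoxin parts, a string match alone is not enough in practice, which is presumably why the patch keys on family 7 as well.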

>> * common/config/i386/cpuinfo.h (get_zhaoxin_cpu):
>> (cpu_indicator_init):
>> * common/config/i386/i386-common.cc:
>> * common/config/i386/i386-cpuinfo.h (enum processor_vendor):
>> (enum processor_types):
>> (enum processor_subtypes):
>> * config.gcc:
>> * config/i386/cpuid.h (signature_SHANGHAI_ebx):
>> (signature_SHANGHAI_ecx):
>> (signature_SHANGHAI_edx):
>> * config/i386/driver-i386.cc (host_detect_local_cpu):
>> * config/i386/i386-c.cc (ix86_target_macros_internal):
>> * config/i386/i386-options.cc (m_LUJIAZUI):
>> * config/i386/i386.h (enum processor_type):
>> * config/i386/i386.md:
>> * config/i386/x86-tune-costs.h (struct processor_costs):
>> * config/i386/x86-tune-sched.cc (ix86_issue_rate):
>> (ix86_adjust_cost):
>> * config/i386/x86-tune.def (X86_TUNE_SCHEDULE):
>> (X86_TUNE_PARTIAL_REG_DEPENDENCY):
>> (X86_TUNE_SSE_PARTIAL_REG_DEPENDENCY):
>> (X86_TUNE_SSE_PARTIAL_REG_FP_CONVERTS_DEPENDENCY):
>> (X86_TUNE_SSE_PARTIAL_REG_CONVERTS_DEPENDENCY):
>> (X86_TUNE_MOVX):
>> (X86_TUNE_MEMOR

Re: [PATCH 1/3]middle-end: Add the ability to let the target decide the method of argument promotions.

2022-05-17 Thread Richard Sandiford via Gcc-patches

Tamar Christina  writes:
[…]
>> E.g. does the patch avoid the AND in:
>> 
>> #include <stdint.h>
>> uint8_t foo(uint8_t x, int y) {
>> if (y) {
>> printf("Foo %d\n", x ? 1 : 2);
>> __builtin_abort ();
>> }
>> return x + 1;
>> }
>> 
>> ?
>
> Morning,
>
> It does actually, it generates:
>
> foo:
>         cbnz    w1, .L9
>         add     w0, w0, 1
>         ret
> .L9:
>         tst     w0, 255
>         stp     x29, x30, [sp, -16]!
>         cset    w1, ne
>         add     w1, w1, 1
>         mov     x29, sp
>         adrp    x0, .LC0
>         add     x0, x0, :lo12:.LC0
>         bl      printf
>         bl      abort
>         .size   foo, .-foo

Ah, nice.

> Now I will admit that this isn't because of a grand master design, but
> purely because the patch works around the cases seen in SPEC.  In those
> cases the comparisons in question were floated out of the if statement.
>
> The heuristic in patch 2/3 allows this because it only looks for compares in
> gimple assigns whereas in this case the compare is in the Gimple cond
> directly.

OK.

[…]
>> > We generate for e.g.:
>> >
>> > #include <stdint.h>
>> >
>> > uint16_t f8 (uint8_t xr, uint8_t xc){
>> > return (uint8_t)(xr * xc);
>> > }
>> >
>> > (insn 9 6 10 2 (set (reg:HI 101)
>> >         (zero_extend:HI (reg/v:QI 96 [ xr ]))) "prom.c":4:16 -1
>> >      (nil))
>> > (insn 10 9 11 2 (set (reg:HI 102)
>> >         (zero_extend:HI (reg/v:QI 98 [ xc ]))) "prom.c":4:16 -1
>> >      (nil))
>> > (insn 11 10 12 2 (set (reg:SI 103)
>> >         (mult:SI (subreg:SI (reg:HI 101) 0)
>> >             (subreg:SI (reg:HI 102) 0))) "prom.c":4:16 -1
>> >      (nil))
>> >
>> > Out of expand. The paradoxical subreg isn't generated at all out of
>> > expand unless it's needed. It does keep the original params around as
>> > unused:
>> >
>> > (insn 2 7 4 2 (set (reg:QI 97)
>> >         (reg:QI 0 x0 [ xr ])) "prom.c":3:37 -1
>> >      (nil))
>> > (insn 4 2 3 2 (set (reg:QI 99)
>> >         (reg:QI 1 x1 [ xc ])) "prom.c":3:37 -1
>> >      (nil))
>> >
>> > And the paradoxical subreg is moved into the first operation requiring it:
>> >
>> > (insn 11 10 12 2 (set (reg:SI 103)
>> >         (mult:SI (subreg:SI (reg:HI 101) 0)
>> >             (subreg:SI (reg:HI 102) 0))) "prom.c":4:16 -1
>> >      (nil))
>> 
>> Ah, OK, this isn't what I'd imagined.  I thought the xr and xc registers
>> would be SIs and the DECL_RTLs would be QI subregs of those SI regs.
>> I think that might work better, for the reasons above.  (That is, whenever we
>> need the register in extended form, we can simply extend the existing reg
>> rather than create a new one.)
>
> Ah, I see, no, I explicitly avoid this. When doing the type promotions I tell
> it that the size of the copies of xr and xc is still the original size, e.g. QI
> (i.e. I don't change 97 and 99).
> This is different from what we do with extends where 97 and 99 *would* be
> changed.
>
> The reason is that if I make this SI the compiler thinks it knows the value
> of all the bits in the register which led to various miscompares as it thinks
> it can use the SI value directly.
>
> This happens because again the xr and xc are hard regs. So having 97 be
>
> (set (reg:SI 97) (subreg:SI (reg:QI 0 x0 [ xr ]) 0))
>
> gets folded to an incorrect
>
> (set (reg:SI 97) (reg:SI 0 x0 [ xr ]))

This part I would expect (and hope for :-)).

> And now 97 is free to be used without any zero extension, as 97 on its own
> is an invalid RTX.

But the way I'd imagined it working, expand would need to insert an
extension before any operation that needs the upper 24 bits to be
defined (e.g. comparisons, right shifts).  If the DECL_RTL is
(subreg:QI (reg:SI x) 0) then the upper bits are not defined,
since SUBREG_PROMOTED_VAR_P would/should be false for the subreg.

E.g. for:

  int8_t foo(int8_t x) { return x >> 1; }

x would have a DECL_RTL of (subreg:QI (reg:SI x) 0), the parameter
assignment would be expanded as:

  (set (reg:SI x) (reg:SI x0))

the shift would be expanded as:

  (set (reg:SI x) (zero_extend:SI (subreg:QI (reg:SI x) 0)))
  (set (reg:SI x) (ashiftrt:SI (reg:SI x) (const_int 1)))

and the return assignment would be expanded as:

  (set (reg:SI x0) (reg:SI x))

x + 1 would instead be expanded to just:

  (set (reg:SI x) (plus:SI (reg:SI x) (const_int 1)))

(without an extension).
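
To make the distinction above concrete, here is a small C model (not GCC code, and not RTL) of a 32-bit register carrying an 8-bit value with unspecified upper bits. It uses unsigned values and a logical shift to stay within well-defined C, whereas the example above is for int8_t; all function names are invented for the sketch.

```c
#include <assert.h>
#include <stdint.h>

/* Model a 32-bit register whose low 8 bits hold VALUE and whose upper
   24 bits hold whatever GARBAGE the caller left there.  */
static uint32_t
reg_with_garbage (uint8_t value, uint32_t garbage)
{
  return (garbage & 0xffffff00u) | value;
}

/* x + 1 done directly on the register: carries only propagate upwards,
   so the low 8 bits of the result depend only on the low 8 bits of REG.  */
static uint8_t
add_one_no_extend (uint32_t reg)
{
  return (uint8_t) (reg + 1u);
}

/* x >> 1 done directly on the register: bit 8 of REG leaks into bit 7
   of the result, so the garbage becomes visible.  */
static uint8_t
shift_no_extend (uint32_t reg)
{
  return (uint8_t) (reg >> 1);
}

/* x >> 1 with an extension inserted first, as expand would have to do.  */
static uint8_t
shift_with_extend (uint32_t reg)
{
  uint32_t extended = reg & 0xffu;   /* zero-extend from 8 bits */

  assert (extended <= 0xffu);
  return (uint8_t) (extended >> 1);
}
```

With reg_with_garbage (0x02, 0xABCDEF00), add_one_no_extend still yields 0x03, while shift_no_extend yields 0x81 instead of the 0x01 that shift_with_extend produces.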

I realised later though that, although reusing the DECL_RTL reg for
the extension has the nice RA property of avoiding multiple live values,
it would make it harder to combine the extension into the operation
if the variable is still live afterwards.  So I guess we lose something
both ways.

Maybe we need a different approach, not based on changing PROMOTE_MODE.

I wonder how easy it would be to do the promotion in gimple,
then reuse backprop to determine when a sign/zero-extension
(i.e. a normal gimple cast) can be converted into an “any extend”
(probably represented as a new ifn).

> So I have to keep the intermediate copy QI mode, after which the RTX
> optimizations being done during expand generates the forms above.
>
>> 
>> I think that's where confusion was coming from.
>>

Re: [wwwdocs][Patch] Add OpenMP by-GCC-version implementation status

2022-05-17 Thread Jakub Jelinek via Gcc-patches

On Tue, May 17, 2022 at 10:49:42AM +0200, Tobias Burnus wrote:
> Thoughts on this part?

Either place is fine.

> > > +Map-order clarificationsGCC?
> > This entry I gave up on, it isn't exactly clear to me what that
> > bullet is about and once we figure that out, we need to do some archeology
> > on whether we support it at all and if yes, since which commit and thus
> > since which GCC version.
> I concur – but the question is how to handle it now? Leave it and
> correct it later? Comment/remove it?

Write ? in the Version column alone instead of GCC? and in comment
say To be verified or something similar.

BTW, it would be really nice to use colors like
https://gcc.gnu.org/projects/cxx-status.html uses, use just GCC versions
instead of GCC version and No instead of N and use hyperlinks to
changes.html OpenMP ids (or just changes.html if we don't have an id).
And, for No we could in the future hyperlink to bugzilla if we file
PRs for those missing or WIP features which people can assign etc.

But that can be changed incrementally.

> Can you check whether you now like the bullet points? If so, I will
> update the .texi to match.

LGTM.

Jakub

Re: [wwwdocs][Patch] Add OpenMP by-GCC-version implementation status

2022-05-17 Thread Tobias Burnus


Hi Jakub & Gerald,

first, thanks for all the suggestions!

I have now followed Gerald's suggestion to place the table into the main
GOMP page.

I then also decided to make it more GCC-user orientated than
GCC-developer orientated by re-writing the intro (but keeping the old
one as background), also referencing  OpenACC, mentioning the
command-line arguments to use + adding a bunch of GCC links to have
everything together.

Thoughts on this part?

I also added the first OpenMP entry to gcc-13/changes.html, which also
links to the GOMP page. (I expect more mainline commits in the near
future, but I want to have a stub there. I think the next update could
be done in a month or two – once more items have accumulated.)

And I tried to incorporate all suggested changes. Regarding:

On 17.05.22 10:21, Jakub Jelinek wrote:

> > +Features added by OpenMP version
> > +
> > +  OpenMP 4.5
> ...
>
> I think we should have also 2.5, 3.0, 3.1 and 4.0 entries in the above list,
> with similar style as the OpenMP 4.5 entry.
> ...
> Though, maybe a table form for the 2.5-4.5 versions would be more consistent
> with the rest,


Partially because I am lazy and partially because I think a table with
those entries looks odd, I went for the bullet style.


> > +Map-order clarificationsGCC?
>
> This entry I gave up on, it isn't exactly clear to me what that
> bullet is about and once we figure that out, we need to do some archeology
> on whether we support it at all and if yes, since which commit and thus
> since which GCC version.

I concur – but the question is how to handle it now? Leave it and
correct it later? Comment/remove it?

Can you check whether you now like the bullet points? If so, I will
update the .texi to match.

Tobias
 Add OpenMP by-GCC-version implementation status

 * htdocs/projects/gomp/index.html: Add by-GCC-version implementation
 status; add new intro, add links and crossrefs.
 * htdocs/projects/index.html: Link it.
 * htdocs/gcc-13/changes.html: Likewise; document first new features.

 htdocs/gcc-13/changes.html  |  12 ++
 htdocs/projects/gomp/index.html | 303 ++--
 htdocs/projects/index.html  |   1 +
 3 files changed, 302 insertions(+), 14 deletions(-)

diff --git a/htdocs/gcc-13/changes.html b/htdocs/gcc-13/changes.html
index f21b546b..d62030d2 100644
--- a/htdocs/gcc-13/changes.html
+++ b/htdocs/gcc-13/changes.html
@@ -37,6 +37,18 @@ a work-in-progress.
 
 General Improvements
 
+
+  https://gcc.gnu.org/projects/gomp/";>OpenMP
+  
+The following OpenMP 5.1 features have been added: the
+omp_all_memory reserved locator and the
+omp_target_is_accessible and omp_get_mapped_ptr
+API routines.
+  
+  
+
+
+
 
 New Languages and Language specific improvements
 
diff --git a/htdocs/projects/gomp/index.html b/htdocs/projects/gomp/index.html
index 59697c10..32345c8e 100644
--- a/htdocs/projects/gomp/index.html
+++ b/htdocs/projects/gomp/index.html
@@ -3,13 +3,84 @@
 
 
 
-GOMP — An OpenMP implementation for GCC
+GNU Offloading and Multi-Processing Project (GOMP)
 https://gcc.gnu.org/gcc.css"; />
 
 
 
 
-Welcome to the home of GOMP
+GNU Offloading and Multi-Processing Project (GOMP)
+
+The GOMP project consists of implementation of OpenMP and OpenACC to
+permit annotating the source code to permit running it concurrently with
+thread parallelization and on offloading devices (accelerators such
+as GPUs), including the associated run-time library and API routines. Both
+OpenMP and OpenACC are supported with GCC's C, C++ and Fortran compilers.
+
+Content
+
+  Usage
+  History and Project Goal
+  Contributing
+  Reporting Bugs
+  OpenMP Implementation Status:
+  2.5 · 5.0 ·
+  3.1 · 4.0 ·
+  4.5 · 5.0 ·
+  5.1 · 5.2
+  OpenMP Releases and Status
+
+
+Usage
+
+  To enable https://www.openmp.org";>OpenMP,
+  use https://gcc.gnu.org/onlinedocs/gcc/C-Dialect-Options.html#index-fopenmp";
+  >-fopenmp; -fopenmp-simd can be used
+  to enable only the SIMD vectorization and loop-transformation constructs
+  without creating multiple threads, offloading code or adding library
+  dependency.
+  To enable https://www.openacc.org";>OpenACC,
+  use https://gcc.gnu.org/onlinedocs/gcc/C-Dialect-Options.html#index-fopenacc";
+  >-fopenacc.
+  If either is enabled, offloading is automatically generated for all
+  offload-device types for which the compiler has been configured. Use https://gcc.gnu.org/onlinedocs/gcc/C-Dialect-Options.html#index-foffload";
+  >-foffload= to disable or specify the offload-devices to be
+  used. Use https://gcc.gnu.org/onlinedocs/gcc/C-Dialect-Options.html#index-foffload-options";
+  >-foffload-options= to pass device-specific compiler and
+  linker flags.
+
+
+Diagnostics
+
+  The https://gcc.gnu.org/onlinedocs/

[PATCH] tree-optimization/105618 - restore load sinking

2022-05-17 Thread Richard Biener via Gcc-patches

The PR97330 fix caused some missed sinking of loads out of loops, which
the following patch re-instantiates.
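
As a rough illustration of what "sinking a load out of a loop" buys in the testcase below, the following C sketch compares the loop as written with a hand-derived equivalent in which only the last load survives. This is not what the sink pass emits; the function names and the sample array are made up for the illustration.

```c
#include <assert.h>
#include <stddef.h>

/* Hypothetical sample data; the testcase itself uses an array of ones.  */
static const int sample_e[5] = { 5, 4, 3, 2, 1 };

/* The loop as written: one load from e[] and one store per iteration.
   C_BEFORE stands for the value of the global before the loop.  */
static int
as_written (const int *e, int start, int c_before)
{
  int c = c_before;

  assert (e != NULL);
  for (int b = start; b >= 0; b--)
    c = e[b];
  return c;
}

/* Hand-derived equivalent: only the load of the final iteration
   (b == 0) is observable, and only if the loop runs at all.  */
static int
after_sinking (const int *e, int start, int c_before)
{
  assert (e != NULL);
  return start >= 0 ? e[0] : c_before;
}
```

Once the load no longer sits in the loop body, nothing inside the loop is observable any more, which is what lets final value replacement and DCE remove the loop as the dg-final checks expect.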

Bootstrapped and tested on x86_64-unknown-linux-gnu, pushed.

2022-05-17  Richard Biener  

PR tree-optimization/105618
* tree-ssa-sink.cc (statement_sink_location): For virtual
PHI uses ignore those defining the used virtual operand.

* gcc.dg/tree-ssa/ssa-sink-19.c: New testcase.
---
 gcc/testsuite/gcc.dg/tree-ssa/ssa-sink-19.c | 21 +
 gcc/tree-ssa-sink.cc|  3 +++
 2 files changed, 24 insertions(+)
 create mode 100644 gcc/testsuite/gcc.dg/tree-ssa/ssa-sink-19.c

diff --git a/gcc/testsuite/gcc.dg/tree-ssa/ssa-sink-19.c b/gcc/testsuite/gcc.dg/tree-ssa/ssa-sink-19.c
new file mode 100644
index 000..e98d13fe85b
--- /dev/null
+++ b/gcc/testsuite/gcc.dg/tree-ssa/ssa-sink-19.c
@@ -0,0 +1,21 @@
+/* { dg-do compile } */
+/* { dg-options "-O2 -fdump-tree-sink1-details -fdump-tree-cddce2-details" } */
+
+static int b=4;
+int c;
+
+int
+main()
+{
+  int e[5] = {1,1,1,1,1};
+  for (; b >= 0; b--) {
+    c = e[b];
+  }
+  return 0;
+}
+
+/* We should sink e[b] out of the loop which is possible after
+   applying store motion to c and b.  */
+/* { dg-final { scan-tree-dump "Sinking # VUSE" "sink1" } } */
+/* And remove the loop after final value replacement.  */
+/* { dg-final { scan-tree-dump "fix_loop_structure: removing loop" "cddce2" } } */
diff --git a/gcc/tree-ssa-sink.cc b/gcc/tree-ssa-sink.cc
index 1c226406feb..8ce4403ddc8 100644
--- a/gcc/tree-ssa-sink.cc
+++ b/gcc/tree-ssa-sink.cc
@@ -390,6 +390,9 @@ statement_sink_location (gimple *stmt, basic_block frombb,
 with the use.  */
  if (gimple_code (use_stmt) == GIMPLE_PHI)
{
+ /* If the PHI defines the virtual operand, ignore it.  */
+ if (gimple_phi_result (use_stmt) == gimple_vuse (stmt))
+   continue;
  /* In case the PHI node post-dominates the current insert
 location we can disregard it.  But make sure it is not
 dominating it as well as can happen in a CFG cycle.  */
-- 
2.35.3

[Ada] Subprogram renaming fails to hide homograph

2022-05-17 Thread Pierre-Marie de Rodat via Gcc-patches

The compiler failed to detect an error where the first prefix of an
expanded name given as the renamed subprogram in a subprogram renaming
declaration denotes a unit with the same name as the name given for the
subprogram renaming. Such a unit must be hidden by the renaming itself.
An error check is added to catch this case.
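
The check can be pictured with a small C model of an expanded name as a chain of prefixes. This is only a sketch of the idea, not the GNAT implementation: struct name_node and both functions are hypothetical, names are compared as plain lower-case strings rather than through GNAT's name table, and the sample nodes stand for a renaming named P whose renamed subprogram is written P.Q.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>
#include <string.h>

/* Hypothetical model of a name: a simple identifier has no prefix, an
   expanded name Prefix.Selector points at its prefix.  */
struct name_node
{
  const struct name_node *prefix;
  const char *chars;   /* identifier, or selector of an expanded name */
};

/* Walk to the outermost prefix, in the spirit of Ult_Expanded_Prefix.  */
static const struct name_node *
ult_expanded_prefix (const struct name_node *n)
{
  assert (n != NULL);
  while (n->prefix != NULL)
    n = n->prefix;
  return n;
}

/* True when RENAMED is an expanded name whose outermost prefix has the
   same name as the renaming itself, outside of generic instances.  */
static bool
prefix_hidden_by_renaming (const struct name_node *renamed,
                           const char *new_name, bool in_instance)
{
  return renamed->prefix != NULL
         && !in_instance
         && strcmp (ult_expanded_prefix (renamed)->chars, new_name) == 0;
}

/* Sample nodes for "P" and "P.Q".  */
static const struct name_node sample_p = { NULL, "p" };
static const struct name_node sample_p_dot_q = { &sample_p, "q" };
```

In this model, prefix_hidden_by_renaming (&sample_p_dot_q, "p", false) is the case the new error reports; the same call with in_instance set, or with a renaming named differently, is accepted.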

Tested on x86_64-pc-linux-gnu, committed on trunk

gcc/ada/

* sem_ch8.adb (Analyze_Subprogram_Renaming): Add error check for
the case of a renamed subprogram given by an expanded name whose
outermost prefix names a unit that is hidden by the name of the
renaming.
(Ult_Expanded_Prefix): New local expression function to return
the ultimate prefix of an expanded name.

diff --git a/gcc/ada/sem_ch8.adb b/gcc/ada/sem_ch8.adb
--- a/gcc/ada/sem_ch8.adb
+++ b/gcc/ada/sem_ch8.adb
@@ -3967,6 +3967,31 @@ package body Sem_Ch8 is
  ("implicit operation& is not visible (RM 8.3 (15))",
   Nam, Old_S);
 end if;
+
+ --  Check whether an expanded name used for the renamed subprogram
+ --  begins with the same name as the renaming itself, and if so,
+ --  issue an error about the prefix being hidden by the renaming.
+ --  We exclude generic instances from this checking, since such
+ --  normally illegal renamings can be constructed when expanding
+ --  instantiations.
+
+ elsif Nkind (Nam) = N_Expanded_Name then
+declare
+   function Ult_Expanded_Prefix (N : Node_Id) return Node_Id is
+ (if Nkind (N) /= N_Expanded_Name
+  then N
+  else Ult_Expanded_Prefix (Prefix (N)));
+   --  Returns the ultimate prefix of an expanded name
+
+begin
+   if Chars (Entity (Ult_Expanded_Prefix (Nam))) = Chars (New_S)
+ and then not In_Instance
+   then
+  Error_Msg_Sloc := Sloc (N);
+  Error_Msg_NE
+("& is hidden by declaration#", Nam, New_S);
+   end if;
+end;
  end if;
 
  Set_Convention (New_S, Convention (Old_S));

[Ada] Restore defensive guard in checks for volatile actuals

2022-05-17 Thread Pierre-Marie de Rodat via Gcc-patches

When flagging names of volatile objects occurring in actual parameters
it is safer to guard against identifiers without entity. This is
redundant (because earlier in the resolution of actual parameters we
already guard against actuals with Any_Type), but perhaps such
identifiers will become allowed in constructs like:

   Subprogram_Call
 (Actual =>
(declare
   X : Boolean := ...
 with Annotate (GNATprove, ...)));
^

which include an identifier that does not denote any entity.

Code cleanup related to handling of volatile components; behaviour is
unaffected.

Tested on x86_64-pc-linux-gnu, committed on trunk

gcc/ada/

* sem_res.adb (Flag_Effectively_Volatile_Objects): Restore
redundant guard.

diff --git a/gcc/ada/sem_res.adb b/gcc/ada/sem_res.adb
--- a/gcc/ada/sem_res.adb
+++ b/gcc/ada/sem_res.adb
@@ -3873,7 +3873,8 @@ package body Sem_Res is
   --  selector_name in selected_component or as a choice in
   --  component_association.
 
-  if Is_Object (Id)
+  if Present (Id)
+and then Is_Object (Id)
 and then Ekind (Id) not in E_Component | E_Discriminant
 and then Is_Effectively_Volatile_For_Reading (Id)
 and then

[Ada] CUDA: remove code performing kernel registration

2022-05-17 Thread Pierre-Marie de Rodat via Gcc-patches

A previous commit implemented a new kernel registration scheme, using
the binder to generate registration code rather than inserting
registration code in packages.  Now that this new approach has had time
to be thoroughly tested, it is time to remove the old approach.

Tested on x86_64-pc-linux-gnu, committed on trunk

gcc/ada/

* gnat_cuda.ads: Update package-level comments.
(Build_And_Insert_CUDA_Initialization): Remove function.
* gnat_cuda.adb (Build_And_Insert_CUDA_Initialization): Remove
function.
(Expand_CUDA_Package): Remove call to
Build_And_Insert_CUDA_Initialization.

diff --git a/gcc/ada/gnat_cuda.adb b/gcc/ada/gnat_cuda.adb
--- a/gcc/ada/gnat_cuda.adb
+++ b/gcc/ada/gnat_cuda.adb
@@ -31,19 +31,12 @@ with Einfo.Entities; use Einfo.Entities;
 with Einfo.Utils;    use Einfo.Utils;
 with Elists;         use Elists;
 with Errout;         use Errout;
-with Namet;          use Namet;
 with Nlists;         use Nlists;
 with Nmake;          use Nmake;
-with Rtsfind;        use Rtsfind;
-with Sem;            use Sem;
 with Sem_Aux;        use Sem_Aux;
 with Sem_Util;       use Sem_Util;
 with Sinfo.Nodes;    use Sinfo.Nodes;
 with Sinfo;          use Sinfo;
-with Snames;         use Snames;
-with Stringt;        use Stringt;
-with Tbuild;         use Tbuild;
-with Uintp;          use Uintp;
 
 with GNAT.HTable;
 
@@ -83,25 +76,6 @@ package body GNAT_CUDA is
--  least one procedure marked with aspect CUDA_Global. The values are
--  Elists of the marked procedures.
 
-   procedure Build_And_Insert_CUDA_Initialization (N : Node_Id);
-   --  Builds declarations necessary for CUDA initialization and inserts them
-   --  in N, the package body that contains CUDA_Global nodes. These
-   --  declarations are:
-   --
-   --* A symbol to hold the pointer P to the CUDA fat binary.
-   --
-   --* A type definition T for a wrapper that contains the pointer to the
-   --  CUDA fat binary.
-   --
-   --* An object of the aforementioned type to hold the aforementioned
-   --  pointer.
-   --
-   --* For each CUDA_Global procedure in the package, a declaration of a C
-   --  string containing the function's name.
-   --
-   --* A procedure that takes care of calling CUDA functions that register
-   --  CUDA_Global procedures with the runtime.
-
procedure Empty_CUDA_Global_Subprograms (Pack_Id : Entity_Id);
--  For all subprograms marked CUDA_Global in Pack_Id, remove declarations
--  and replace statements with a single null statement.
@@ -234,13 +208,6 @@ package body GNAT_CUDA is
 
   Remove_CUDA_Device_Entities
 (Package_Specification (Corresponding_Spec (N)));
-
-  --  If procedures marked with CUDA_Global have been defined within N,
-  --  we need to register them with the CUDA runtime at program startup.
-  --  This requires multiple declarations and function calls which need
-  --  to be appended to N's declarations.
-
-  Build_And_Insert_CUDA_Initialization (N);
end Expand_CUDA_Package;
 
--
@@ -270,463 +237,6 @@ package body GNAT_CUDA is
   return CUDA_Kernels_Table.Get (Pack_Id);
end Get_CUDA_Kernels;
 
-   ------------------------------------------
-   -- Build_And_Insert_CUDA_Initialization --
-   ------------------------------------------
-
-   procedure Build_And_Insert_CUDA_Initialization (N : Node_Id) is
-
-  --  For the following kernel declaration:
-  --
-  --  package body  is
-  -- procedure  (X : Integer) with CUDA_Global;
-  --  end package;
-  --
-  --  Insert the following declarations:
-  --
-  -- Fat_Binary : System.Address;
-  -- pragma Import
-  --(Convention=> C,
-  -- Entity=> Fat_Binary,
-  -- External_Name => "_binary__fatbin_start");
-  --
-  -- Wrapper : Fatbin_Wrapper :=
-  --   (16#466243b1#, 1, Fat_Binary'Address, System.Null_Address);
-  --
-  -- Proc_Symbol_Name : Interfaces.C.Strings.Chars_Ptr :=
-  --   Interfaces.C.Strings.New_Char_Array("");
-  --
-  -- Fat_Binary_Handle : System.Address :=
-  --   CUDA.Internal.Register_Fat_Binary (Wrapper'Address);
-  --
-  -- procedure Initialize_CUDA_Kernel is
-  -- begin
-  --CUDA.Internal.Register_Function
-  --   (Fat_Binary_Handle,
-  --'Address,
-  --Proc_Symbol_Name,
-  --Proc_Symbol_Name,
-  ---1,
-  --System.Null_Address,
-  --System.Null_Address,
-  --System.Null_Address,
-  --System.Null_Address,
-  --System.Null_Address);
-  --CUDA.Internal.Register_Fat_Binary_End (Fat_Binary_Handle);
-  -- end Initialize_CUDA_Kernel;
-  --
-  --  Proc_Symbol_Name is the name of the procedure marked with
-  --  CUDA_Glo

[Ada] Enhance the warning on C enum with size clause for size /= 32

2022-05-17 Thread Pierre-Marie de Rodat via Gcc-patches

Improve the warning message and silence the warning when size > 32; such
a size is likely intentional and does not warrant a warning.
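
For context on the C side of this warning, here is a short sketch showing that the size of an enum is something to query rather than assume, together with a C rendering of the new size test. It is an illustration only: the enum, the helper names and size_clause_warrants_warning are invented, and the sketch leaves out the Boolean/Character and Target_Short_Enums exclusions that the Ada code applies.

```c
#include <assert.h>
#include <limits.h>
#include <stdbool.h>
#include <stddef.h>

/* Hypothetical enum standing in for the C counterpart of the Ada type.  */
enum sample_color { SAMPLE_RED, SAMPLE_GREEN, SAMPLE_BLUE };

/* Size in bits the C compiler actually picked for the enum.  */
static size_t
c_enum_bits (void)
{
  return sizeof (enum sample_color) * CHAR_BIT;
}

/* Size in bits of a C int on this target.  */
static size_t
c_int_bits (void)
{
  return sizeof (int) * CHAR_BIT;
}

/* Sketch of the new condition: only a size clause smaller than int is
   worth a heads-up; equal or larger sizes are taken as intentional.  */
static bool
size_clause_warrants_warning (size_t requested_bits)
{
  assert (requested_bits > 0);
  return requested_bits < c_int_bits ();
}
```

On common targets c_enum_bits () equals c_int_bits (), but options such as GCC's -fshort-enums change that, which is why the reworded message asks the user to check the C counterpart instead of stating a fixed size.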

Tested on x86_64-pc-linux-gnu, committed on trunk

gcc/ada/

* freeze.adb (Freeze_Enumeration_Type): Fix comment, enhance
message and silence warning for size > 32.

diff --git a/gcc/ada/freeze.adb b/gcc/ada/freeze.adb
--- a/gcc/ada/freeze.adb
+++ b/gcc/ada/freeze.adb
@@ -7968,15 +7968,17 @@ package body Freeze is
 
   else
  --  If the enumeration type interfaces to C, and it has a size clause
- --  that specifies less than int size, it warrants a warning. The
- --  user may intend the C type to be an enum or a char, so this is
+ --  that is smaller than the size of int, it warrants a warning. The
+ --  user may intend the C type to be a boolean or a char, so this is
  --  not by itself an error that the Ada compiler can detect, but it
- --  it is a worth a heads-up. For Boolean and Character types we
+ --  is worth a heads-up. For Boolean and Character types we
  --  assume that the programmer has the proper C type in mind.
+ --  For explicit sizes larger than int, assume the user knows what
+ --  he is doing and that the code is intentional.
 
  if Convention (Typ) = Convention_C
and then Has_Size_Clause (Typ)
-   and then Esize (Typ) /= Esize (Standard_Integer)
+   and then Esize (Typ) < Standard_Integer_Size
and then not Is_Boolean_Type (Typ)
and then not Is_Character_Type (Typ)
 
@@ -7985,7 +7987,12 @@ package body Freeze is
and then not Target_Short_Enums
  then
 Error_Msg_N
-  ("C enum types have the size of a C int??", Size_Clause (Typ));
+  ("??the size of enums in C is implementation-defined",
+   Size_Clause (Typ));
+Error_Msg_N
+  ("\??check that the C counterpart has size of " &
+   UI_Image (Esize (Typ)),
+   Size_Clause (Typ));
  end if;
 
  Adjust_Esize_For_Alignment (Typ);

[Ada] Allow inlining for proof inside generics

2022-05-17 Thread Pierre-Marie de Rodat via Gcc-patches

For local subprograms without contracts inside generics, allow their
inlining for proof in GNATprove mode. This requires forbidding the
inlining of subprograms which contain references to object renamings,
which would be replaced in the SPARK expansion and violate assumptions
of the inlining code.

Tested on x86_64-pc-linux-gnu, committed on trunk

gcc/ada/

* exp_spark.adb (Expand_SPARK_Potential_Renaming): Deal with no
entity case.
* inline.ads (Check_Object_Renaming_In_GNATprove_Mode): New
procedure.
* inline.adb (Check_Object_Renaming_In_GNATprove_Mode): New
procedure.
(Can_Be_Inlined_In_GNATprove_Mode): Remove case forbidding
inlining for subprograms inside generics.
* sem_ch12.adb (Copy_Generic_Node): Preserve global entities
when inlining in GNATprove mode.
* sem_ch6.adb (Analyze_Subprogram_Body_Helper): Remove body to
inline if renaming is detected in GNATprove mode.

diff --git a/gcc/ada/exp_spark.adb b/gcc/ada/exp_spark.adb
--- a/gcc/ada/exp_spark.adb
+++ b/gcc/ada/exp_spark.adb
@@ -850,9 +850,12 @@ package body Exp_SPARK is
--  Start of processing for Expand_SPARK_Potential_Renaming
 
begin
-  --  Replace a reference to a renaming with the actual renamed object
+  --  Replace a reference to a renaming with the actual renamed object.
+  --  Protect against previous errors leaving no entity in N.
 
-  if Is_Object (Obj_Id) then
+  if Present (Obj_Id)
+and then Is_Object (Obj_Id)
+  then
  Ren := Renamed_Object (Obj_Id);
 
  if Present (Ren) then


diff --git a/gcc/ada/inline.adb b/gcc/ada/inline.adb
--- a/gcc/ada/inline.adb
+++ b/gcc/ada/inline.adb
@@ -1893,13 +1893,6 @@ package body Inline is
   then
  return False;
 
-  --  Subprograms in generic instances are currently not inlined, as this
-  --  interacts badly with the expansion of object renamings in GNATprove
-  --  mode.
-
-  elsif Instantiation_Location (Sloc (Id)) /= No_Location then
- return False;
-
   --  Do not inline subprograms and entries defined inside protected types,
   --  which typically are not helper subprograms, which also avoids getting
   --  spurious messages on calls that cannot be inlined.
@@ -2643,6 +2636,75 @@ package body Inline is
   end if;
end Check_And_Split_Unconstrained_Function;
 
+   ---------------------------------------------
+   -- Check_Object_Renaming_In_GNATprove_Mode --
+   ---------------------------------------------
+
+   procedure Check_Object_Renaming_In_GNATprove_Mode (Spec_Id : Entity_Id) is
+  Decl  : constant Node_Id := Unit_Declaration_Node (Spec_Id);
+  Body_Decl : constant Node_Id :=
+Unit_Declaration_Node (Corresponding_Body (Decl));
+
+  function Check_Object_Renaming (N : Node_Id) return Traverse_Result;
+  --  Returns Abandon on node N if this is a reference to an object
+  --  renaming, which will be expanded into the renamed object in
+  --  GNATprove mode.
+
+  ---------------------------
+  -- Check_Object_Renaming --
+  ---------------------------
+
+  function Check_Object_Renaming (N : Node_Id) return Traverse_Result is
+  begin
+ case Nkind (Original_Node (N)) is
+when N_Expanded_Name
+   | N_Identifier
+=>
+   declare
+  Obj_Id : constant Entity_Id := Entity (Original_Node (N));
+   begin
+  --  Recognize the case when SPARK expansion rewrites a
+  --  reference to an object renaming.
+
+  if Present (Obj_Id)
+and then Is_Object (Obj_Id)
+and then Present (Renamed_Object (Obj_Id))
+and then Nkind (Renamed_Object (Obj_Id)) not in N_Entity
+
+--  Copy_Generic_Node called for inlining expects the
+--  references to global entities to have the same kind
+--  in the "generic" code and its "instantiation".
+
+and then Nkind (Original_Node (N)) /=
+  Nkind (Renamed_Object (Obj_Id))
+  then
+ return Abandon;
+  else
+ return OK;
+  end if;
+   end;
+
+when others =>
+   return OK;
+ end case;
+  end Check_Object_Renaming;
+
+  function Check_All_Object_Renamings is new
+Traverse_Func (Check_Object_Renaming);
+
+   --  Start of processing for Check_Object_Renaming_In_GNATprove_Mode
+
+   begin
+  --  Subprograms with object renamings replaced by the special SPARK
+  --  expansion cannot be inlined.
+
+  if Check_All_Object_Renamings (Body_Decl) /= OK then
+ Cannot_Inline ("cannot inline & (object renaming)?",
+Body_Decl, Spec_Id);
+ Set_Body_To

[Ada] Provide allocation subtype for allocators of a Designated_Storage_Model type

2022-05-17 Thread Pierre-Marie de Rodat via Gcc-patches

When an allocator is for an access type that has a
Designated_Storage_Model aspect, and the designated type is an
unconstrained record type with discriminants, and the subtype associated
with the allocator is constrained, a dereference of the new access value
can be passed to the designated type's initialization procedure. The
post-front-end phase of the compiler needs to be able to create a
temporary object in the host memory space to pass to the init proc,
which requires creating such an object, but the subtype needed for the
allocation isn't readily available at the point of the dereference.  To
make the subtype easily accessible, we set the Actual_Designated_Subtype
of such a dereference to the subtype of the allocated object.

Tested on x86_64-pc-linux-gnu, committed on trunk

gcc/ada/

* exp_ch4.adb (Expand_N_Allocator): For an allocator with an
unconstrained discriminated designated type, and whose
allocation subtype is constrained, set the
Actual_Designated_Subtype of the dereference passed to the init
proc of the designated type to be the allocation subtype.
* sinfo.ads: Add documentation of new setting of
Actual_Designated_Subtype on a dereference used as an actual
parameter of call to an init proc associated with an allocator.
Also add missing syntax and documentation for the GNAT language
extension that allows an expression as a default for a concrete
generic formal function.

diff --git a/gcc/ada/exp_ch4.adb b/gcc/ada/exp_ch4.adb
--- a/gcc/ada/exp_ch4.adb
+++ b/gcc/ada/exp_ch4.adb
@@ -5135,6 +5135,30 @@ package body Exp_Ch4 is
 Set_Expression (N, New_Occurrence_Of (Typ, Loc));
  end if;
 
+ --  When the designated subtype is unconstrained and
+ --  the allocator specifies a constrained subtype (or
+ --  such a subtype has been created, such as above by
+ --  Build_Default_Subtype), associate that subtype with
+ --  the dereference of the allocator's access value.
+ --  This is needed by the back end for cases where
+ --  the access type has a Designated_Storage_Model,
+ --  to support allocation of a host object of the right
+ --  size for passing to the initialization procedure.
+
+ if not Is_Constrained (Dtyp)
+   and then Is_Constrained (Typ)
+ then
+declare
+   Init_Deref : constant Node_Id :=
+ Unqual_Conv (Init_Arg1);
+begin
+   pragma Assert
+ (Nkind (Init_Deref) = N_Explicit_Dereference);
+
+   Set_Actual_Designated_Subtype (Init_Deref, Typ);
+end;
+ end if;
+
  Discr := First_Elmt (Discriminant_Constraint (Typ));
  while Present (Discr) loop
 Nod := Node (Discr);


diff --git a/gcc/ada/sinfo.ads b/gcc/ada/sinfo.ads
--- a/gcc/ada/sinfo.ads
+++ b/gcc/ada/sinfo.ads
@@ -816,12 +816,15 @@ package Sinfo is
 
--  Actual_Designated_Subtype
--Present in N_Free_Statement and N_Explicit_Dereference nodes. If gigi
-   --needs to known the dynamic constrained subtype of the designated
-   --object, this attribute is set to that type. This is done for
-   --N_Free_Statements for access-to-classwide types and access to
-   --unconstrained packed array types, and for N_Explicit_Dereference when
-   --the designated type is an unconstrained packed array and the
-   --dereference is the prefix of a 'Size attribute reference.
+   --needs to know the dynamic constrained subtype of the designated
+   --object, this attribute is set to that subtype. This is done for
+   --N_Free_Statements for access-to-classwide types and access-to-
+   --unconstrained packed array types. For N_Explicit_Dereference,
+   --this is done in two circumstances: 1) when the designated type is
+   --an unconstrained packed array and the dereference is the prefix of
+   --a 'Size attribute reference, or 2) when the dereference node is
+   --created for the expansion of an allocator with a subtype_indication
+   --and the designated subtype is an unconstrained discriminated type.
 
--  Address_Warning_Posted
--Present in N_Attribute_Definition nodes. Set to indicate that we have
@@ -7313,10 +7316,15 @@ package Sinfo is
   --  Specification
   --  Default_Name (set to Empty if no subprogram default)
   --  Box_Present
+  --  Expression (set to Empty if no expression present)
 
-  --  Note: if no subprogram default is present, then Name is set
+  --  Note: If no subprogram default is p

[Ada] Cleanups related to front-end SJLJ

2022-05-17 Thread Pierre-Marie de Rodat via Gcc-patches

This patch cleans up some code that is left over from the front-end SJLJ
exception handling mechanism, which has been removed.
This is in preparation for fixing a finalization-related bug.

Most importantly:

The documentation is changed: a Handled_Sequence_Of_Statements node
CAN contain both Exception_Handlers and an At_End_Proc.

The assertion contradicting that is removed from
Expand_At_End_Handler.

The From_At_End field is removed.

Tested on x86_64-pc-linux-gnu, committed on trunk

gcc/ada/

* sinfo.ads: Remove From_At_End.  Update comments.
* gen_il-fields.ads, gen_il-gen-gen_nodes.adb, sem_ch11.adb:
Remove From_At_End.
* exp_ch11.adb (Expand_At_End_Handler): Remove assertion.
* fe.h (Exception_Mechanism, Exception_Mechanism_Type, Has_DIC,
Has_Invariants, Is_List_Member, List_Containing): Remove
declarations that are not used in gigi.
* opt.ads (Exception_Mechanism): This is not used in gigi.
* exp_util.ads: Minor comment fix.

diff --git a/gcc/ada/exp_ch11.adb b/gcc/ada/exp_ch11.adb
--- a/gcc/ada/exp_ch11.adb
+++ b/gcc/ada/exp_ch11.adb
@@ -85,8 +85,6 @@ package body Exp_Ch11 is
   pragma Unreferenced (Blk_Id);
begin
   pragma Assert (Present (Entity (At_End_Proc (HSS))));
-  pragma Assert (No (Exception_Handlers (HSS)));
-  return;
end Expand_At_End_Handler;
 
---


diff --git a/gcc/ada/exp_util.ads b/gcc/ada/exp_util.ads
--- a/gcc/ada/exp_util.ads
+++ b/gcc/ada/exp_util.ads
@@ -1105,8 +1105,8 @@ package Exp_Util is
--1) controlled objects
--2) library-level tagged types
--
-   --  These cases require special actions on scope exit. The flag Lib_Level
-   --  is set True if the construct is at library level, and False otherwise.
+   --  These cases require special actions on scope exit. Lib_Level is True if
+   --  the construct is at library level, and False otherwise.
 
function Safe_Unchecked_Type_Conversion (Exp : Node_Id) return Boolean;
--  Given the node for an N_Unchecked_Type_Conversion, return True if this


diff --git a/gcc/ada/fe.h b/gcc/ada/fe.h
--- a/gcc/ada/fe.h
+++ b/gcc/ada/fe.h
@@ -207,7 +207,6 @@ extern Boolean In_Extended_Main_Code_Unit	(Entity_Id);
 #define Enable_128bit_Types		opt__enable_128bit_types
 #define Exception_Extra_Info		opt__exception_extra_info
 #define Exception_Locations_Suppressed	opt__exception_locations_suppressed
-#define Exception_Mechanism		opt__exception_mechanism
 #define Generate_SCO_Instance_Table	opt__generate_sco_instance_table
 #define GNAT_Mode			opt__gnat_mode
 #define List_Representation_Info	opt__list_representation_info
@@ -218,10 +217,6 @@ typedef enum {
   Ada_83, Ada_95, Ada_2005, Ada_2012, Ada_2022, Ada_With_Extensions
 } Ada_Version_Type;
 
-typedef enum {
-  Back_End_ZCX, Back_End_SJLJ
-} Exception_Mechanism_Type;
-
 extern Ada_Version_Type Ada_Version;
 extern Boolean Assume_No_Invalid_Values;
 extern Boolean Back_End_Inlining;
@@ -229,7 +224,6 @@ extern Boolean Debug_Generated_Code;
 extern Boolean Enable_128bit_Types;
 extern Boolean Exception_Extra_Info;
 extern Boolean Exception_Locations_Suppressed;
-extern Exception_Mechanism_Type Exception_Mechanism;
 extern Boolean Generate_SCO_Instance_Table;
 extern Boolean GNAT_Mode;
 extern Int List_Representation_Info;
@@ -645,12 +639,6 @@ B Is_Floating_Point_Type  (E Id);
 #define Is_Record_Type einfo__utils__is_record_type
 B Is_Record_Type  (E Id);
 
-#define Has_DIC einfo__utils__has_dic
-B Has_DIC (E Id);
-
-#define Has_Invariants einfo__utils__has_invariants
-B Has_Invariants (E Id);
-
 #define Is_Full_Access einfo__utils__is_full_access
 B Is_Full_Access (E Id);
 
@@ -668,12 +656,6 @@ E Next_Stored_Discriminant (E Id);
 // fe.h is included before einfo.h.
 Entity_Kind Parameter_Mode (E Id);
 
-#define Is_List_Member einfo__utils__is_list_member
-B Is_List_Member (N Node);
-
-#define List_Containing einfo__utils__list_containing
-S List_Containing (N Node);
-
 // The following is needed because Convention in Sem_Util is a renaming
 // of Basic_Convention.
 


diff --git a/gcc/ada/gen_il-fields.ads b/gcc/ada/gen_il-fields.ads
--- a/gcc/ada/gen_il-fields.ads
+++ b/gcc/ada/gen_il-fields.ads
@@ -191,7 +191,6 @@ package Gen_IL.Fields is
   Formal_Type_Definition,
   Forwards_OK,
   From_Aspect_Specification,
-  From_At_End,
   From_At_Mod,
   From_Conditional_Expression,
   From_Default,


diff --git a/gcc/ada/gen_il-gen-gen_nodes.adb b/gcc/ada/gen_il-gen-gen_nodes.adb
--- a/gcc/ada/gen_il-gen-gen_nodes.adb
+++ b/gcc/ada/gen_il-gen-gen_nodes.adb
@@ -1043,8 +1043,7 @@ begin -- Gen_IL.Gen.Gen_Nodes
 
Cc (N_Raise_Statement, N_Statement_Other_Than_Procedure_Call,
(Sy (Name, Node_Id, Default_Empty),
-Sy (Expression, Node_Id, Default_Empty),
-Sm (From_At_End, Flag)));
+Sy (Expression, Node_Id, Default_Empty)));
 
Cc (N_R

[Ada] GNAT.Binary_Search is not internal

2022-05-17 Thread Pierre-Marie de Rodat via Gcc-patches

Add package GNAT.Binary_Search to the list of predefined units.

Tested on x86_64-pc-linux-gnu, committed on trunk

gcc/ada/

* impunit.adb: Add "g-binsea" to Non_Imp_File_Names_95 list.

diff --git a/gcc/ada/impunit.adb b/gcc/ada/impunit.adb
--- a/gcc/ada/impunit.adb
+++ b/gcc/ada/impunit.adb
@@ -241,6 +241,7 @@ package body Impunit is
 ("g-arrspl", F),  -- GNAT.Array_Split
 ("g-awk   ", F),  -- GNAT.AWK
 ("g-binenv", F),  -- GNAT.Bind_Environment
+("g-binsea", F),  -- GNAT.Binary_Search
 ("g-boubuf", F),  -- GNAT.Bounded_Buffers
 ("g-boumai", F),  -- GNAT.Bounded_Mailboxes
 ("g-brapre", F),  -- GNAT.Branch_Prediction

[Ada] Fix insertion of declaration inside quantified expression

2022-05-17 Thread Pierre-Marie de Rodat via Gcc-patches

When the evaluation of the subtype_indication for the
iterator_specification of a quantified_expression leads to the insertion
of a type declaration, this should be done with Insert_Action instead of
Insert_Before.

Tested on x86_64-pc-linux-gnu, committed on trunk

gcc/ada/

* sem_ch5.adb (Analyze_Iterator_Specification): Use
Insert_Action when possibly inside an expression.

diff --git a/gcc/ada/sem_ch5.adb b/gcc/ada/sem_ch5.adb
--- a/gcc/ada/sem_ch5.adb
+++ b/gcc/ada/sem_ch5.adb
@@ -2316,7 +2316,7 @@ package body Sem_Ch5 is
   Defining_Identifier => S,
   Subtype_Indication  => New_Copy_Tree (Subt));
 begin
-   Insert_Before (Parent (Parent (N)), Decl);
+   Insert_Action (N, Decl);
Analyze (Decl);
Rewrite (Subt, New_Occurrence_Of (S, Sloc (Subt)));
 end;

[Ada] Fix Forced sign flag in formatted string

2022-05-17 Thread Pierre-Marie de Rodat via Gcc-patches

Fix the Forced sign flag that is incorrectly ignored for scientific
notation and shortest representation.
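
For reference, a small sketch of the formats concerned, using the "+",
"&" and "-" operators of GNAT.Formatted_String. The procedure and
constant names are hypothetical, and the exact text printed is
deliberately not shown here.

```ada
with Ada.Text_IO;           use Ada.Text_IO;
with GNAT.Formatted_String; use GNAT.Formatted_String;

procedure Forced_Sign_Sketch is
   --  Typed constants avoid ambiguity between the overloaded "&" operators
   N : constant Integer    := 42;
   X : constant Long_Float := 1234.5;
begin
   --  '+' is the forced sign flag: a sign is requested for positive values too
   Put_Line (-(+"%+d" & N));  --  decimal integer
   Put_Line (-(+"%+f" & X));  --  decimal float
   Put_Line (-(+"%+e" & X));  --  scientific notation
   Put_Line (-(+"%+g" & X));  --  shortest representation
end Forced_Sign_Sketch;
```

The last two calls use the format kinds that the widened Is_Number
subtype now covers.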

Tested on x86_64-pc-linux-gnu, committed on trunk

gcc/ada/

* libgnat/g-forstr.adb (Is_Number): Add scientific notation and
shortest representation.

diff --git a/gcc/ada/libgnat/g-forstr.adb b/gcc/ada/libgnat/g-forstr.adb
--- a/gcc/ada/libgnat/g-forstr.adb
+++ b/gcc/ada/libgnat/g-forstr.adb
@@ -58,7 +58,7 @@ package body GNAT.Formatted_String is
 
type Sign_Kind is (Neg, Zero, Pos);
 
-   subtype Is_Number is F_Kind range Decimal_Int .. Decimal_Float;
+   subtype Is_Number is F_Kind range Decimal_Int .. Shortest_Decimal_Float_Up;
 
type F_Sign is (If_Neg, Forced, Space) with Default_Value => If_Neg;

[Ada] Fix small glitch in Expand_N_Full_Type_Declaration

2022-05-17 Thread Pierre-Marie de Rodat via Gcc-patches

The original node is not guaranteed to also be an
N_Full_Type_Declaration, so the code needs to look into the node itself.

Tested on x86_64-pc-linux-gnu, committed on trunk

gcc/ada/

* exp_ch3.adb (Expand_N_Full_Type_Declaration): Look into N.

diff --git a/gcc/ada/exp_ch3.adb b/gcc/ada/exp_ch3.adb
--- a/gcc/ada/exp_ch3.adb
+++ b/gcc/ada/exp_ch3.adb
@@ -6102,8 +6102,7 @@ package body Exp_Ch3 is
  Par_Id := Base_Type (Full_View (Par_Id));
   end if;
 
-  if Nkind (Type_Definition (Original_Node (N))) =
-   N_Derived_Type_Definition
+  if Nkind (Type_Definition (N)) = N_Derived_Type_Definition
 and then not Is_Tagged_Type (Def_Id)
 and then Present (Freeze_Node (Par_Id))
 and then Present (TSS_Elist (Freeze_Node (Par_Id)))

[Ada] Requires_Cleanup_Actions and N_Protected_Body

2022-05-17 Thread Pierre-Marie de Rodat via Gcc-patches

This patch disallows N_Protected_Body from being passed to
Requires_Cleanup_Actions. Protected bodies never need cleanup, and are
never passed to Requires_Cleanup_Actions, which is a good thing, because
it would blow up on Handled_Statement_Sequence, which doesn't exist for
N_Protected_Body.

Tested on x86_64-pc-linux-gnu, committed on trunk

gcc/ada/

* exp_util.adb (Requires_Cleanup_Actions): Remove
N_Protected_Body from the case statement, so that case will be
covered by "raise Program_Error".

diff --git a/gcc/ada/exp_util.adb b/gcc/ada/exp_util.adb
--- a/gcc/ada/exp_util.adb
+++ b/gcc/ada/exp_util.adb
@@ -12781,7 +12781,6 @@ package body Exp_Util is
 | N_Block_Statement
 | N_Entry_Body
 | N_Package_Body
-| N_Protected_Body
 | N_Subprogram_Body
 | N_Task_Body
  =>

[Ada] Output.w always writes to stderr

2022-05-17 Thread Pierre-Marie de Rodat via Gcc-patches

There are several debugging procedures called Output.w, and some
output-redirection features. This patch modifies Output.w so their
output is not redirected; it always goes to standard error. Otherwise,
debugging output can get mixed in with some "real" output (perhaps to a
file), which causes confusion and in some cases failure to build.

Tested on x86_64-pc-linux-gnu, committed on trunk

gcc/ada/

* output.adb (Pop_Output, Set_Output): Unconditionally flush
output when switching from one output destination to another.
Otherwise buffering can cause garbled output.
(w): Push/pop the current settings, and temporarily
Set_Standard_Error during these procedures.

diff --git a/gcc/ada/output.adb b/gcc/ada/output.adb
--- a/gcc/ada/output.adb
+++ b/gcc/ada/output.adb
@@ -235,6 +235,7 @@ package body Output is
 
procedure Pop_Output is
begin
+  Flush_Buffer;
   pragma Assert (FD_Stack_Idx >= FD_Array'First);
   Current_FD := FD_Stack (FD_Stack_Idx);
   FD_Stack_Idx := FD_Stack_Idx - 1;
@@ -292,10 +293,7 @@ package body Output is
 
procedure Set_Output (FD : File_Descriptor) is
begin
-  if Special_Output_Proc = null then
- Flush_Buffer;
-  end if;
-
+  Flush_Buffer;
   Current_FD := FD;
end Set_Output;
 
@@ -323,59 +321,99 @@ package body Output is
 
procedure w (C : Character) is
begin
+  Push_Output;
+  Set_Standard_Error;
+
   Write_Char (''');
   Write_Char (C);
   Write_Char (''');
   Write_Eol;
+
+  Pop_Output;
end w;
 
procedure w (S : String) is
begin
+  Push_Output;
+  Set_Standard_Error;
+
   Write_Str (S);
   Write_Eol;
+
+  Pop_Output;
end w;
 
procedure w (V : Int) is
begin
+  Push_Output;
+  Set_Standard_Error;
+
   Write_Int (V);
   Write_Eol;
+
+  Pop_Output;
end w;
 
procedure w (B : Boolean) is
begin
+  Push_Output;
+  Set_Standard_Error;
+
   if B then
  w ("True");
   else
  w ("False");
   end if;
+
+  Pop_Output;
end w;
 
procedure w (L : String; C : Character) is
begin
+  Push_Output;
+  Set_Standard_Error;
+
   Write_Str (L);
   Write_Char (' ');
   w (C);
+
+  Pop_Output;
end w;
 
procedure w (L : String; S : String) is
begin
+  Push_Output;
+  Set_Standard_Error;
+
   Write_Str (L);
   Write_Char (' ');
   w (S);
+
+  Pop_Output;
end w;
 
procedure w (L : String; V : Int) is
begin
+  Push_Output;
+  Set_Standard_Error;
+
   Write_Str (L);
   Write_Char (' ');
   w (V);
+
+  Pop_Output;
end w;
 
procedure w (L : String; B : Boolean) is
begin
+  Push_Output;
+  Set_Standard_Error;
+
   Write_Str (L);
   Write_Char (' ');
   w (B);
+
+  Pop_Output;
end w;

[Ada] Generic binary search implementation

2022-05-17 Thread Pierre-Marie de Rodat via Gcc-patches

Allow binary search in a sorted anonymous array (or array-like
container).
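
The patch itself is attached compressed, so the interface of
GNAT.Binary_Search is not visible in this message. The following is
therefore only a sketch of the general idea, with hypothetical names
(Find, Get_Data) that are not taken from the new package: the container
is reached solely through an accessor function, which is what lets the
search work on an anonymous array.

```ada
with Ada.Text_IO; use Ada.Text_IO;

procedure Binary_Search_Sketch is

   --  Hypothetical generic, not the GNAT.Binary_Search interface: the
   --  container is only accessed through Get, so no array type is named.
   generic
      type Element_Type is private;
      with function Get (Index : Positive) return Element_Type;
      with function "<" (Left, Right : Element_Type) return Boolean is <>;
   function Find
     (First : Positive;
      Last  : Natural;
      Item  : Element_Type) return Natural;
   --  Return the index of Item in First .. Last, or 0 if it is absent

   function Find
     (First : Positive;
      Last  : Natural;
      Item  : Element_Type) return Natural
   is
      Lo  : Positive := First;
      Hi  : Natural  := Last;
      Mid : Positive;
   begin
      while Lo <= Hi loop
         Mid := Lo + (Hi - Lo) / 2;

         if Get (Mid) < Item then
            Lo := Mid + 1;
         elsif Item < Get (Mid) then
            Hi := Mid - 1;
         else
            return Mid;
         end if;
      end loop;

      return 0;
   end Find;

   --  Sorted object of an anonymous array type
   Data : constant array (1 .. 6) of Integer := (2, 3, 5, 7, 11, 13);

   function Get_Data (Index : Positive) return Integer is (Data (Index));

   function Find_Int is new Find (Integer, Get_Data);

begin
   Put_Line (Natural'Image (Find_Int (Data'First, Data'Last, 7)));  --  4
   Put_Line (Natural'Image (Find_Int (Data'First, Data'Last, 4)));  --  0
end Binary_Search_Sketch;
```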

Tested on x86_64-pc-linux-gnu, committed on trunk

gcc/ada/

* libgnat/g-binsea.ads, libgnat/g-binsea.adb
(GNAT.Binary_Search): New package.
* Makefile.rtl (GNATRTL_NONTASKING_OBJS): New item in list.
* doc/gnat_rm/the_gnat_library.rst (GNAT.Binary_Search): New
package record.
* gnat_rm.texi: Regenerate.

patch.diff.gz
Description: application/gzip

[Ada] Fix bogus visibility error with partially parameterized formal package

2022-05-17 Thread Pierre-Marie de Rodat via Gcc-patches

The problem comes from the special instantiation (abbreviated instantiation
in GNAT parlance) done to check conformance between a formal package and its
corresponding actual in a generic instantiation: the compiler instantiates
the formal package, in the context of the generic instantiation, so that it
can check the conformance of the actual with the result.

More precisely, it occurs with formal packages that are only partially
parameterized, i.e. that have at least one parameter association and an
(others => <>) choice. In this case, RM 12.7(10/2) says that the visible
part of the formal package contains a copy of the formal parameters that
are not explicitly associated.

The analysis of these copies for the abbreviated instantiation is not done
in the correct context when the generic unit is a child generic unit.

Tested on x86_64-pc-linux-gnu, committed on trunk

gcc/ada/

* sem_ch12.ads (Is_Abbreviated_Instance): Declare.
* sem_ch12.adb (Check_Abbreviated_Instance): Declare.
(Requires_Conformance_Checking): Declare.
(Analyze_Association.Process_Default): Fix subtype of parameter.
(Analyze_Formal_Object_Declaration): Check whether it is in the
visible part of abbreviated instance.
(Analyze_Formal_Subprogram_Declaration): Likewise.
(Analyze_Formal_Type_Declaration): Likewise.
(Analyze_Package_Instantiation): Do not check for a generic child
unit in the case of an abbreviated instance.
(Check_Abbreviated_Instance): New procedure.
(Check_Formal_Packages): Tidy up.
(Copy_Generic_Elist): Fix comment.
(Instantiate_Formal_Package): Tidy up.  If the generic unit is a
child unit, copy the qualified name onto the abbreviated instance.
(Is_Abbreviated_Instance): New function.
(Collect_Previous_Instances): Call Is_Abbreviated_Instance.
(Requires_Conformance_Checking): New function.
* sem_ch7.adb (Analyze_Package_Specification): Do not install the
private declarations of the parent for an abbreviated instance.

diff --git a/gcc/ada/sem_ch12.adb b/gcc/ada/sem_ch12.adb
--- a/gcc/ada/sem_ch12.adb
+++ b/gcc/ada/sem_ch12.adb
@@ -516,6 +516,22 @@ package body Sem_Ch12 is
--  The body of the wrapper is a call to the actual, with the generated
--  pre/postconditon checks added.
 
+   procedure Check_Abbreviated_Instance
+ (N: Node_Id;
+  Parent_Installed : in out Boolean);
+   --  If the name of the generic unit in an abbreviated instantiation is an
+   --  expanded name, then the prefix may be an instance and the selector may
+   --  designate a child unit. If the parent is installed as a result of this
+   --  call, then Parent_Installed is set True, otherwise Parent_Installed is
+   --  unchanged by the call.
+
+   --  This routine needs to be called for declaration nodes of formal objects,
+   --  types and subprograms to check whether they are the copy, present in the
+   --  visible part of the abbreviated instantiation of formal packages, of the
+   --  declaration node of their corresponding formal parameter in the template
+   --  of the formal package, as specified by RM 12.7(10/2), so as to establish
+   --  the proper context for their analysis.
+
procedure Check_Access_Definition (N : Node_Id);
--  Subsidiary routine to null exclusion processing. Perform an assertion
--  check on Ada version and the presence of an access definition in N.
@@ -865,6 +881,10 @@ package body Sem_Ch12 is
procedure Remove_Parent (In_Body : Boolean := False);
--  Reverse effect after instantiation of child is complete
 
+   function Requires_Conformance_Checking (N : Node_Id) return Boolean;
+   --  Determine whether the formal package declaration N requires conformance
+   --  checking with actuals in instantiations.
+
procedure Restore_Hidden_Primitives (Prims_List : in out Elist_Id);
--  Restore suffix 'P' to primitives of Prims_List and leave Prims_List
--  set to No_Elist.
@@ -1160,10 +1180,10 @@ package body Sem_Ch12 is
   --  association for it includes a box, or whether the associations
   --  include an Others clause.
 
-  procedure Process_Default (F : Entity_Id);
-  --  Add a copy of the declaration of generic formal F to the list of
-  --  associations, and add an explicit box association for F if there
-  --  is none yet, and the default comes from an Others_Choice.
+  procedure Process_Default (Formal : Node_Id);
+  --  Add a copy of the declaration of a generic formal to the list of
+  --  associations, and add an explicit box association for its entity
+  --  if there is none yet, and the default comes from an Others_Choice.
 
   function Renames_Standard_Subprogram (Subp : Entity_Id) return Boolean;
   --  Determine whether Subp renames one of the subprograms defined in the
@@ -1517,9 +1537,9 @@ package body Sem_Ch12 is
   -- Process_Default --

[Ada] Take full view of private type

2022-05-17 Thread Pierre-Marie de Rodat via Gcc-patches

This allows the following to be resolved:

  type Rec (<>) is private;
  type Arr (<>) is private;
   private
  type Arr is array (Positive range <>) of Natural;
  type Rec (L : Natural) is record
 F1 : Integer;
 F2 : Arr (1 .. L);
  end record;

Tested on x86_64-pc-linux-gnu, committed on trunk

gcc/ada/

* sem_ch3.adb (Analyze_Subtype_Declaration): Use underlying type
of Indic_Typ.
(Constrain_Array): Ditto for T.

diff --git a/gcc/ada/sem_ch3.adb b/gcc/ada/sem_ch3.adb
--- a/gcc/ada/sem_ch3.adb
+++ b/gcc/ada/sem_ch3.adb
@@ -5978,7 +5978,7 @@ package body Sem_Ch3 is
   if Nkind (Subtype_Indication (N)) = N_Subtype_Indication then
  declare
 Indic_Typ: constant Entity_Id :=
- Etype (Subtype_Mark (Subtype_Indication (N)));
+  Underlying_Type (Etype (Subtype_Mark (Subtype_Indication (N))));
 Subt_Index   : Node_Id;
 Target_Index : Node_Id;
 
@@ -13595,6 +13595,8 @@ package body Sem_Ch3 is
  T := Designated_Type (T);
   end if;
 
+  T := Underlying_Type (T);
+
   --  If an index constraint follows a subtype mark in a subtype indication
   --  then the type or subtype denoted by the subtype mark must not already
   --  impose an index constraint. The subtype mark must denote either an

[Ada] Allow 'Reduce with -gnat2022

2022-05-17 Thread Pierre-Marie de Rodat via Gcc-patches

After a period of experimentation, allow 'Reduce in Ada 2022 mode.
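
A minimal sketch of what is now accepted in Ada 2022 mode (enabled with
-gnat2022, or with pragma Ada_2022 as below) without -gnatX. The
procedure, type and object names are hypothetical.

```ada
pragma Ada_2022;

with Ada.Text_IO; use Ada.Text_IO;

procedure Reduce_Sketch is
   type Int_Array is array (Positive range <>) of Integer;

   A : constant Int_Array := [1, 2, 3, 4];

   --  Fold the array with "+", starting from the initial value 0
   Sum : constant Integer := A'Reduce ("+", 0);
begin
   Put_Line (Integer'Image (Sum));
end Reduce_Sketch;
```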

Tested on x86_64-pc-linux-gnu, committed on trunk

gcc/ada/

* sem_attr.adb (Analyze_Attribute [Attribute_Reduce]): Allow
'Reduce for Ada 2022 and above.
* sem_attr.ads (Attribute_Impl_Def): 'Reduce is no longer
implementation defined.

diff --git a/gcc/ada/sem_attr.adb b/gcc/ada/sem_attr.adb
--- a/gcc/ada/sem_attr.adb
+++ b/gcc/ada/sem_attr.adb
@@ -5775,11 +5775,7 @@ package body Sem_Attr is
 
   when Attribute_Reduce =>
  Check_E2;
-
- if not Extensions_Allowed then
-Error_Attr
-  ("% attribute only supported under -gnatX", P);
- end if;
+ Error_Msg_Ada_2022_Feature ("Reduce attribute", Sloc (N));
 
  declare
 Stream : constant Node_Id := Prefix (N);


diff --git a/gcc/ada/sem_attr.ads b/gcc/ada/sem_attr.ads
--- a/gcc/ada/sem_attr.ads
+++ b/gcc/ada/sem_attr.ads
@@ -407,13 +407,6 @@ package Sem_Attr is
   --  as Range applied to the array itself. The result is of type universal
   --  integer.
 
-  
-  -- Reduce --
-  
-
-  Attribute_Reduce => True,
-  --  See AI12-0262-1
-
   -
   -- Ref --
   -

[Ada] Don't create calls to Abort_Undefer when not Abort_Allowed

2022-05-17 Thread Pierre-Marie de Rodat via Gcc-patches

Prevent creation of references to Abort_Undefer when aborts aren't
allowed. Another solution could have been an early return at the
beginning of Expand_N_Asynchronous_Select, but this would break back
ends that currently expect trees containing no N_Asynchronous_Select
nodes (e.g. CodePeer).

Tested on x86_64-pc-linux-gnu, committed on trunk

gcc/ada/

* exp_ch9.adb (Expand_N_Asynchronous_Select): Don't generate
Abort_Undefers when not Abort_Allowed.

diff --git a/gcc/ada/exp_ch9.adb b/gcc/ada/exp_ch9.adb
--- a/gcc/ada/exp_ch9.adb
+++ b/gcc/ada/exp_ch9.adb
@@ -7812,7 +7812,9 @@ package body Exp_Ch9 is
 
  Hdle := New_List (Build_Abort_Block_Handler (Loc));
 
- Prepend_To (Astats, Build_Runtime_Call (Loc, RE_Abort_Undefer));
+ if Abort_Allowed then
+Prepend_To (Astats, Build_Runtime_Call (Loc, RE_Abort_Undefer));
+ end if;
 
  Abortable_Block :=
Make_Block_Statement (Loc,

[Ada] Typo fix in finalization comment

2022-05-17 Thread Pierre-Marie de Rodat via Gcc-patches

Add missing 's' and reformat the comment block.

Tested on x86_64-pc-linux-gnu, committed on trunk

gcc/ada/

* exp_ch7.adb: Fix typo.

diff --git a/gcc/ada/exp_ch7.adb b/gcc/ada/exp_ch7.adb
--- a/gcc/ada/exp_ch7.adb
+++ b/gcc/ada/exp_ch7.adb
@@ -157,14 +157,14 @@ package body Exp_Ch7 is
-- Finalization Management --
-
 
-   --  This part describe how Initialization/Adjustment/Finalization procedures
-   --  are generated and called. Two cases must be considered, types that are
-   --  Controlled (Is_Controlled flag set) and composite types that contain
-   --  controlled components (Has_Controlled_Component flag set). In the first
-   --  case the procedures to call are the user-defined primitive operations
-   --  Initialize/Adjust/Finalize. In the second case, GNAT generates
-   --  Deep_Initialize, Deep_Adjust and Deep_Finalize that are in charge
-   --  of calling the former procedures on the controlled components.
+   --  This part describes how Initialization/Adjustment/Finalization
+   --  procedures are generated and called. Two cases must be considered: types
+   --  that are Controlled (Is_Controlled flag set) and composite types that
+   --  contain controlled components (Has_Controlled_Component flag set). In
+   --  the first case the procedures to call are the user-defined primitive
+   --  operations Initialize/Adjust/Finalize. In the second case, GNAT
+   --  generates Deep_Initialize, Deep_Adjust and Deep_Finalize that are in
+   --  charge of calling the former procedures on the controlled components.
 
--  For records with Has_Controlled_Component set, a hidden "controller"
--  component is inserted. This controller component contains its own

[Ada] Initialize Compiler_State to avoid Constraint_Error

2022-05-17 Thread Pierre-Marie de Rodat via Gcc-patches

When building gnat1 with `-gnatVa` as we do locally, commands like `gcc -c
-gnatyM79 ` raise a Constraint_Error, because `lib.compiler_state` is
initialized by par.adb, i.e. after scanning. Therefore any error message
issued during scanning performs this uninitialized read (which raises a
Constraint_Error when the compiler was compiled with `-gnatVa`).

Initialize this flag to `Parsing`.

Tested on x86_64-pc-linux-gnu, committed on trunk

gcc/ada/

* lib.ads: Initialize `Compiler_State` to `Parsing`.

diff --git a/gcc/ada/lib.ads b/gcc/ada/lib.ads
--- a/gcc/ada/lib.ads
+++ b/gcc/ada/lib.ads
@@ -39,7 +39,7 @@ package Lib is
--  Type to hold list of indirect references to unit number table
 
type Compiler_State_Type is (Parsing, Analyzing);
-   Compiler_State : Compiler_State_Type;
+   Compiler_State : Compiler_State_Type := Parsing;
--  Indicates current state of compilation. This is used to implement the
--  function In_Extended_Main_Source_Unit.

[Ada] Deal with derived record types in Has_Compatible_Representation

2022-05-17 Thread Pierre-Marie de Rodat via Gcc-patches

More precisely, this concerns untagged record types, as tagged record types
are already handled by the predicate.  If the derived type has not been given
its own representation clause, then the representations are the same.
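
A small sketch of the case in question, with hypothetical type and
object names: an untagged record type derived without any
representation clause, converted back to its parent type.

```ada
procedure Derived_Rep_Sketch is
   type Parent is record
      A : Integer;
      B : Boolean;
   end record;

   --  No representation clause is given for Child, so it inherits the
   --  representation of Parent.
   type Child is new Parent;

   P : Parent := (A => 1, B => True);
   C : constant Child := (A => 2, B => False);
begin
   --  The operand type (Child) is derived from the target type (Parent)
   P := Parent (C);
end Derived_Rep_Sketch;
```

In the terms of the patch, Parent plays the role of T1 (target) and
Child that of T2 (operand).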

Tested on x86_64-pc-linux-gnu, committed on trunk

gcc/ada/

* sem_ch13.adb (Has_Compatible_Representation): Return true for
derived untagged record types without representation clause.

diff --git a/gcc/ada/sem_ch13.adb b/gcc/ada/sem_ch13.adb
--- a/gcc/ada/sem_ch13.adb
+++ b/gcc/ada/sem_ch13.adb
@@ -13539,6 +13539,16 @@ package body Sem_Ch13 is
  if Is_Packed (T1) /= Is_Packed (T2) then
 return False;
 
+ --  If the operand type is derived from the target type and no clause
+ --  has been given after the derivation, then the representations are
+ --  the same since the derived type inherits that of the parent type.
+
+ elsif Is_Derived_Type (T2)
+   and then Etype (T2) = T1
+   and then not Has_Record_Rep_Clause (T2)
+ then
+return True;
+
  --  Otherwise we must check components. Typ2 maybe a constrained
  --  subtype with fewer components, so we compare the components
  --  of the base types.

[Ada] Streamline implementation of Has_Compatible_Representation

2022-05-17 Thread Pierre-Marie de Rodat via Gcc-patches

The predicate is only concerned with the internal representation of types
and this representation is shared by the subtypes of a given type, so the
implementation can directly look into the (implementation) base types.

No functional changes.

Tested on x86_64-pc-linux-gnu, committed on trunk

gcc/ada/

* sem_ch13.ads (Has_Compatible_Representation): Minor tweaks.
* sem_ch13.adb (Has_Compatible_Representation): Look directly into
the (implementation) base types and simplify accordingly.
* exp_ch5.adb (Change_Of_Representation): Adjust.
* exp_ch6.adb (Expand_Actuals): Likewise.

diff --git a/gcc/ada/exp_ch5.adb b/gcc/ada/exp_ch5.adb
--- a/gcc/ada/exp_ch5.adb
+++ b/gcc/ada/exp_ch5.adb
@@ -292,8 +292,8 @@ package body Exp_Ch5 is
   return
 Nkind (Rhs) = N_Type_Conversion
   and then not Has_Compatible_Representation
- (Target_Type  => Etype (Rhs),
-  Operand_Type => Etype (Expression (Rhs)));
+ (Target_Typ  => Etype (Rhs),
+  Operand_Typ => Etype (Expression (Rhs)));
end Change_Of_Representation;
 
--


diff --git a/gcc/ada/exp_ch6.adb b/gcc/ada/exp_ch6.adb
--- a/gcc/ada/exp_ch6.adb
+++ b/gcc/ada/exp_ch6.adb
@@ -1576,8 +1576,8 @@ package body Exp_Ch6 is
 Var := Make_Var (Expression (Actual));
 
 Crep := not Has_Compatible_Representation
-  (Target_Type  => F_Typ,
-   Operand_Type => Etype (Expression (Actual)));
+  (Target_Typ  => F_Typ,
+   Operand_Typ => Etype (Expression (Actual)));
 
  else
 V_Typ := Etype (Actual);
@@ -2379,8 +2379,8 @@ package body Exp_Ch6 is
   --  Also pass by copy if change of representation
 
   or else not Has_Compatible_Representation
-(Target_Type  => Etype (Formal),
- Operand_Type => Etype (Expression (Actual))))
+(Target_Typ  => Etype (Formal),
+ Operand_Typ => Etype (Expression (Actual))))
 then
Add_Call_By_Copy_Code;
 
@@ -4556,8 +4556,8 @@ package body Exp_Ch6 is
   --  warning, and do the change of representation.
 
   elsif not Has_Compatible_Representation
-  (Target_Type  => Formal_Typ,
-   Operand_Type => Parent_Typ)
+  (Target_Typ  => Formal_Typ,
+   Operand_Typ => Parent_Typ)
   then
  Error_Msg_N
("??change of representation required", Actual);


diff --git a/gcc/ada/sem_ch13.adb b/gcc/ada/sem_ch13.adb
--- a/gcc/ada/sem_ch13.adb
+++ b/gcc/ada/sem_ch13.adb
@@ -13436,56 +13436,40 @@ package body Sem_Ch13 is
---
 
function Has_Compatible_Representation
- (Target_Type, Operand_Type : Entity_Id) return Boolean
+ (Target_Typ, Operand_Typ : Entity_Id) return Boolean
is
-  T1 : constant Entity_Id := Underlying_Type (Target_Type);
-  T2 : constant Entity_Id := Underlying_Type (Operand_Type);
+  --  The subtype-specific representation attributes (Size and Alignment)
+  --  do not affect representation from the point of view of this function.
 
-   begin
-  --  A quick check, if base types are the same, then we definitely have
-  --  the same representation, because the subtype specific representation
-  --  attributes (Size and Alignment) do not affect representation from
-  --  the point of view of this test.
-
-  if Base_Type (T1) = Base_Type (T2) then
- return True;
+  T1 : constant Entity_Id := Implementation_Base_Type (Target_Typ);
+  T2 : constant Entity_Id := Implementation_Base_Type (Operand_Typ);
 
-  elsif Is_Private_Type (Base_Type (T2))
-and then Base_Type (T1) = Full_View (Base_Type (T2))
-  then
- return True;
-
-  --  If T2 is a generic actual it is declared as a subtype, so
-  --  check against its base type.
+   begin
+  --  Return true immediately for the same base type
 
-  elsif Is_Generic_Actual_Type (T1)
-and then Has_Compatible_Representation (Base_Type (T1), T2)
-  then
+  if T1 = T2 then
  return True;
-  end if;
 
   --  Tagged types always have the same representation, because it is not
   --  possible to specify different representations for common fields.
 
-  if Is_Tagged_Type (T1) then
+  elsif Is_Tagged_Type (T1) then
  return True;
-  end if;
 
   --  Representations are definitely different if conventions differ
 
-  if Convention (T1) /= Convention (T2) then
+  elsif Convention (T1) /= Convention (T2) then
  return Fa

[Ada] Remove superfluous call to Original_Node

2022-05-17 Thread Pierre-Marie de Rodat via Gcc-patches

The function Same_Object starts by taking the Original_Node of its
arguments, so calling Original_Node on the right-hand side before
passing it in is redundant.

Tested on x86_64-pc-linux-gnu, committed on trunk

gcc/ada/

* sem_ch5.adb (Analyze_Assignment): Remove superfluous call to
Original_Node.

diff --git a/gcc/ada/sem_ch5.adb b/gcc/ada/sem_ch5.adb
--- a/gcc/ada/sem_ch5.adb
+++ b/gcc/ada/sem_ch5.adb
@@ -,7 +,7 @@ package body Sem_Ch5 is
 
  --  Where the object is the same on both sides
 
- and then Same_Object (Lhs, Original_Node (Rhs))
+ and then Same_Object (Lhs, Rhs)
 
  --  But exclude the case where the right side was an operation that
  --  got rewritten (e.g. JUNK + K, where K was known to be zero). We

[Ada] Crash freezing declaration that will raise constraint error

2022-05-17 Thread Pierre-Marie de Rodat via Gcc-patches

When the compiler is built with assertions enabled and processes the
following declarations:

   type Vector_Boolean_Array is array (1 .. 10) of Boolean;
   O2 : constant Vector_Boolean_Array := [for J in 2 => True];

the frontend rewrites the expression as an N_Raise_CE node, which
triggers an assertion failure at the freezing point of the object
declaration.

Tested on x86_64-pc-linux-gnu, committed on trunk

gcc/ada/

* freeze.adb (Freeze_Entity): Protect the call to
Declaration_Node against entities of expressions replaced by the
frontend with an N_Raise_CE node.

diff --git a/gcc/ada/freeze.adb b/gcc/ada/freeze.adb
--- a/gcc/ada/freeze.adb
+++ b/gcc/ada/freeze.adb
@@ -6582,9 +6582,13 @@ package body Freeze is
end if;
 end if;
 
---  Special processing for objects created by object declaration
+--  Special processing for objects created by object declaration;
+--  we protect the call to Declaration_Node against entities of
+--  expressions replaced by the frontend with an N_Raise_CE node.
 
-if Nkind (Declaration_Node (E)) = N_Object_Declaration then
+if Ekind (E) in E_Constant | E_Variable
+  and then Nkind (Declaration_Node (E)) = N_Object_Declaration
+then
Freeze_Object_Declaration (E);
 end if;

[Ada] Use Actions field of freeze nodes for subprograms

2022-05-17 Thread Pierre-Marie de Rodat via Gcc-patches

This has a couple of advantages: 1) the actions are analyzed with checks
disabled and 2) they are considered elaboration code by Sem_Elab.

Tested on x86_64-pc-linux-gnu, committed on trunk

gcc/ada/

* exp_ch13.adb (Expand_N_Freeze_Entity): Delete freeze nodes for
subprograms only if they have no actions.
* exp_ch6.adb (Freeze_Subprogram): Put the actions into the Actions
field of the freeze node instead of inserting them after it.
* sem_elab.adb (Is_SPARK_Semantic_Target): Fix typo in comment.
* gcc-interface/trans.cc (process_freeze_entity): Return early for
freeze nodes of subprograms with Interface_Alias set.

diff --git a/gcc/ada/exp_ch13.adb b/gcc/ada/exp_ch13.adb
--- a/gcc/ada/exp_ch13.adb
+++ b/gcc/ada/exp_ch13.adb
@@ -617,14 +617,12 @@ package body Exp_Ch13 is
   elsif Is_Subprogram (E) then
  Exp_Ch6.Freeze_Subprogram (N);
 
- --  Ada 2005 (AI-251): Remove the freezing node associated with the
- --  entities internally used by the frontend to register primitives
- --  covering abstract interfaces. The call to Freeze_Subprogram has
- --  already expanded the code that fills the corresponding entry in
- --  its secondary dispatch table and therefore the code generator
- --  has nothing else to do with this freezing node.
-
- Delete := Present (Interface_Alias (E));
+ --  Ada 2005 (AI-251): Remove the freeze nodes associated with the
+ --  entities internally used by the front end to register primitives
+ --  covering abstract interfaces if they have no side effects. For the
+ --  others, gigi must discard them after evaluating the side effects.
+
+ Delete := Present (Interface_Alias (E)) and then No (Actions (N));
   end if;
 
   --  Analyze actions generated by freezing. The init_proc contains source


diff --git a/gcc/ada/exp_ch6.adb b/gcc/ada/exp_ch6.adb
--- a/gcc/ada/exp_ch6.adb
+++ b/gcc/ada/exp_ch6.adb
@@ -7856,6 +7856,8 @@ package body Exp_Ch6 is
  declare
 Typ : constant Entity_Id := Scope (DTC_Entity (Subp));
 
+L : List_Id;
+
  begin
 --  Handle private overridden primitives
 
@@ -7895,8 +7897,17 @@ package body Exp_Ch6 is
  Register_Predefined_DT_Entry (Subp);
   end if;
 
-  Insert_Actions_After (N,
-Register_Primitive (Loc, Prim => Subp));
+  L := Register_Primitive (Loc, Prim => Subp);
+
+  if Is_Empty_List (L) then
+ null;
+
+  elsif No (Actions (N)) then
+ Set_Actions (N, L);
+
+  else
+ Append_List (L, Actions (N));
+  end if;
end if;
 end if;
  end;


diff --git a/gcc/ada/gcc-interface/trans.cc b/gcc/ada/gcc-interface/trans.cc
--- a/gcc/ada/gcc-interface/trans.cc
+++ b/gcc/ada/gcc-interface/trans.cc
@@ -9045,6 +9045,11 @@ process_freeze_entity (Node_Id gnat_node)
   if (kind == E_Class_Wide_Type)
 return;
 
+  /* Likewise for the entities internally used by the front-end to register
+ primitives covering abstract interfaces, see Expand_N_Freeze_Entity.  */
+  if (Is_Subprogram (gnat_entity) && Present (Interface_Alias (gnat_entity)))
+return;
+
   /* Check for an old definition if this isn't an object with address clause,
  since the saved GCC tree is the address expression in that case.  */
   gnu_old


diff --git a/gcc/ada/sem_elab.adb b/gcc/ada/sem_elab.adb
--- a/gcc/ada/sem_elab.adb
+++ b/gcc/ada/sem_elab.adb
@@ -1845,7 +1845,7 @@ package body Sem_Elab is
 
   function Is_SPARK_Semantic_Target (Id : Entity_Id) return Boolean;
   pragma Inline (Is_SPARK_Semantic_Target);
-  --  Determine whether arbitrary entity Id nodes a source or internally
+  --  Determine whether arbitrary entity Id denotes a source or internally
   --  generated subprogram which emulates SPARK semantics.
 
   function Is_Subprogram_Inst (Id : Entity_Id) return Boolean;

[Ada] Implement calls to abstract subprograms in class-wide pre/post-conditions

2022-05-17 Thread Pierre-Marie de Rodat via Gcc-patches

In some special cases involving class-wide pre/post conditions, Ada
allows a non-dispatching call to an abstract function (which is usually
illegal).  Fix a bug in the implementation of Ada's rules about the
run-time behavior of such a call. Thanks to Javier Miranda for producing
this patch.

Tested on x86_64-pc-linux-gnu, committed on trunk

gcc/ada/

* contracts.adb (Build_Call_Helper_Body): Improve handling of
the case of a (legal) non-dispatching call to an abstract
subprogram.

diff --git a/gcc/ada/contracts.adb b/gcc/ada/contracts.adb
--- a/gcc/ada/contracts.adb
+++ b/gcc/ada/contracts.adb
@@ -3899,7 +3899,16 @@ package body Contracts is
 Set_Corresponding_Body (Helper_Decl, Body_Id);
 Set_Must_Override (Body_Spec, False);
 
-if Present (Class_Preconditions (Subp_Id)) then
+if Present (Class_Preconditions (Subp_Id))
+--  Evaluate the expression if we are building a dynamic helper
+--  or we are building a static helper for a non-abstract tagged
+--  type; for abstract tagged types the helper just returns True
+--  since it is called by the indirect call wrapper (ICW).
+  and then
+(Is_Dynamic
+   or else
+  not Is_Abstract_Type (Find_Dispatching_Type (Subp_Id)))
+then
Return_Expr :=
  Copy_And_Update_References (Class_Preconditions (Subp_Id));
 
@@ -3910,7 +3919,8 @@ package body Contracts is
 --  enabled.
 
 else
-   pragma Assert (Present (Ignored_Class_Preconditions (Subp_Id)));
+   pragma Assert (Present (Ignored_Class_Preconditions (Subp_Id))
+ or else Is_Abstract_Type (Find_Dispatching_Type (Subp_Id)));
Return_Expr := New_Occurrence_Of (Standard_True, Loc);
 end if;

[Ada] Fix documentation of using attribute Loop_Entry in pragmas

2022-05-17 Thread Pierre-Marie de Rodat via Gcc-patches

Attribute Loop_Entry was initially only allowed to appear in pragmas
Loop_Variant and Loop_Invariant. Then it was also allowed to appear in
pragmas Assert, Assert_And_Cut and Assume, but this change was not
reflected in the GNAT RM.
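
As a minimal sketch of the usage the updated text describes (the
procedure name Loop_Entry_Demo and the variable X are hypothetical and
not taken from the patch), Loop_Entry can appear in a pragma Assert as
well as in a pragma Loop_Invariant, provided the pragma is placed
directly in the loop body:

```ada
--  Illustrative sketch only; names are hypothetical.
procedure Loop_Entry_Demo is
   X : Integer := 0;
begin
   for J in 1 .. 10 loop
      X := X + 1;

      --  Loop_Entry inside pragma Assert: X has grown since loop entry.
      pragma Assert (X > X'Loop_Entry);

      --  The long-documented use, inside pragma Loop_Invariant:
      --  after J iterations X exceeds its entry value by exactly J.
      pragma Loop_Invariant (X = X'Loop_Entry + J);
   end loop;
end Loop_Entry_Demo;
```

The assertions are only evaluated when assertions are enabled, for
example by compiling with -gnata.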

Tested on x86_64-pc-linux-gnu, committed on trunk

gcc/ada/

* doc/gnat_rm/implementation_defined_attributes.rst
(Loop_Entry): Mention pragmas Assert, Assert_And_Cut and Assume;
refill.
* gnat_rm.texi: Regenerate.

diff --git a/gcc/ada/doc/gnat_rm/implementation_defined_attributes.rst b/gcc/ada/doc/gnat_rm/implementation_defined_attributes.rst
--- a/gcc/ada/doc/gnat_rm/implementation_defined_attributes.rst
+++ b/gcc/ada/doc/gnat_rm/implementation_defined_attributes.rst
@@ -629,10 +629,13 @@ to the value an expression had upon entry to the subprogram. The
 relevant loop is either identified by the given loop name, or it is the
 innermost enclosing loop when no loop name is given.
 
-A ``Loop_Entry`` attribute can only occur within a
-``Loop_Variant`` or ``Loop_Invariant`` pragma. A common use of
-``Loop_Entry`` is to compare the current value of objects with their
-initial value at loop entry, in a ``Loop_Invariant`` pragma.
+A ``Loop_Entry`` attribute can only occur within an ``Assert``,
+``Assert_And_Cut``, ``Assume``, ``Loop_Variant`` or ``Loop_Invariant`` pragma.
+In addition, such a pragma must be one of the items in the sequence
+of statements of a loop body, or nested inside block statements that
+appear in the sequence of statements of a loop body.
+A common use of ``Loop_Entry`` is to compare the current value of objects with
+their initial value at loop entry, in a ``Loop_Invariant`` pragma.
 
 The effect of using ``X'Loop_Entry`` is the same as declaring
 a constant initialized with the initial value of ``X`` at loop


diff --git a/gcc/ada/gnat_rm.texi b/gcc/ada/gnat_rm.texi
--- a/gcc/ada/gnat_rm.texi
+++ b/gcc/ada/gnat_rm.texi
@@ -11028,10 +11028,13 @@ to the value an expression had upon entry to the subprogram. The
 relevant loop is either identified by the given loop name, or it is the
 innermost enclosing loop when no loop name is given.
 
-A @code{Loop_Entry} attribute can only occur within a
-@code{Loop_Variant} or @code{Loop_Invariant} pragma. A common use of
-@code{Loop_Entry} is to compare the current value of objects with their
-initial value at loop entry, in a @code{Loop_Invariant} pragma.
+A @code{Loop_Entry} attribute can only occur within an @code{Assert},
+@code{Assert_And_Cut}, @code{Assume}, @code{Loop_Variant} or @code{Loop_Invariant} pragma.
+In addition, such a pragma must be one of the items in the sequence
+of statements of a loop body, or nested inside block statements that
+appear in the sequence of statements of a loop body.
+A common use of @code{Loop_Entry} is to compare the current value of objects with
+their initial value at loop entry, in a @code{Loop_Invariant} pragma.
 
 The effect of using @code{X'Loop_Entry} is the same as declaring
 a constant initialized with the initial value of @code{X} at loop
