Re: [RFC] split pseudos during loop unrolling in RTL unroller

2020-04-23 Thread Richard Biener via Gcc-patches
On Thu, Apr 23, 2020 at 12:31 AM Jeff Law  wrote:
>
> On Wed, 2020-04-22 at 15:50 -0500, Segher Boessenkool wrote:
> >
> > > > In some ways it feels like it would be easier to resurrect RTL SSA :-)
> >
> > Why was RTL SSA abandoned?
> >
> > It might well work to keep everything in SSA form all the way to RA.
> > Hrm, that doesn't sound bad at all :-)
> >
> > (The PHIs need to be made explicit to something that resembles the
> > machine code we will end up with, very early in the pipeline, but it
> > could still also be some valid SSA form; and we can of course also
> > have hard registers in all RTL, so that needs to be dealt with sanely
> > > some way as well.  Lots of details, I don't see a crucial problem though,
> > probably means I need to look harder ;-) )
> Lack of time mostly.  There's some complications like subregs, argument
> registers and the like.  But you can restrict ssa based analysis &
> optimizations to just the set of pseudos that are in SSA form and do
> something more conservative on the rest.

I guess time is better spent on trying to extend GIMPLE + SSA up to RA, thus
doing instruction selection on GIMPLE.  Back in the day Steven spent quite some
time on factored SSA, but I don't remember whether something went wrong or
whether DF simply appeared first.

Richard.

> Jeff
>


RE: [GCC][PATCH][ARM]: Modify the MVE polymorphic variant arguments to match the MVE intrinsic definition.

2020-04-23 Thread Kyrylo Tkachov



> -----Original Message-----
> From: Srinath Parvathaneni 
> Sent: 22 April 2020 14:00
> To: gcc-patches@gcc.gnu.org
> Cc: Kyrylo Tkachov 
> Subject: [GCC][PATCH][ARM]: Modify the MVE polymorphic variant
> arguments to match the MVE intrinsic definition.
> 
> Hello,
> 
> When MVE intrinsics are called, a few implicit typecasts are done on the
> formal arguments to match the intrinsic parameters.
> But when the same intrinsics are called through MVE polymorphic variants,
> the _Generic feature used there does strict type checking and fails to match
> the exact intrinsic.
> This patch corrects the behaviour of the polymorphic variants so that they
> match the expected intrinsic, by explicitly typecasting the polymorphic
> variant's arguments.
> 
> Please refer to M-profile Vector Extension (MVE) intrinsics [1] for more
> details.
> [1] https://developer.arm.com/architectures/instruction-sets/simd-isas/helium/mve-intrinsics
> 
> Regression tested on arm-none-eabi and found no regressions.
> 
> Ok for trunk?

Ok.
Thanks,
Kyrill

> 
> Thanks,
> Srinath.
> 
> gcc/ChangeLog:
> 
> 2020-04-22  Srinath Parvathaneni  
> 
>   * config/arm/arm_mve.h (__arm_vbicq_n_u16): Modify function parameter's
>   datatype.
>   (__arm_vbicq_n_s16): Likewise.
>   (__arm_vbicq_n_u32): Likewise.
>   (__arm_vbicq_n_s32): Likewise.
>   (__arm_vbicq): Likewise.
>   (__arm_vbicq_n_s16): Modify MVE polymorphic variant argument's datatype.
>   (__arm_vbicq_n_s32): Likewise.
>   (__arm_vbicq_n_u16): Likewise.
>   (__arm_vbicq_n_u32): Likewise.
>   (__arm_vdupq_m_n_s8): Likewise.
>   (__arm_vdupq_m_n_s16): Likewise.
>   (__arm_vdupq_m_n_s32): Likewise.
>   (__arm_vdupq_m_n_u8): Likewise.
>   (__arm_vdupq_m_n_u16): Likewise.
>   (__arm_vdupq_m_n_u32): Likewise.
>   (__arm_vdupq_m_n_f16): Likewise.
>   (__arm_vdupq_m_n_f32): Likewise.
>   (__arm_vldrhq_gather_offset_s16): Likewise.
>   (__arm_vldrhq_gather_offset_s32): Likewise.
>   (__arm_vldrhq_gather_offset_u16): Likewise.
>   (__arm_vldrhq_gather_offset_u32): Likewise.
>   (__arm_vldrhq_gather_offset_f16): Likewise.
>   (__arm_vldrhq_gather_offset_z_s16): Likewise.
>   (__arm_vldrhq_gather_offset_z_s32): Likewise.
>   (__arm_vldrhq_gather_offset_z_u16): Likewise.
>   (__arm_vldrhq_gather_offset_z_u32): Likewise.
>   (__arm_vldrhq_gather_offset_z_f16): Likewise.
>   (__arm_vldrhq_gather_shifted_offset_s16): Likewise.
>   (__arm_vldrhq_gather_shifted_offset_s32): Likewise.
>   (__arm_vldrhq_gather_shifted_offset_u16): Likewise.
>   (__arm_vldrhq_gather_shifted_offset_u32): Likewise.
>   (__arm_vldrhq_gather_shifted_offset_f16): Likewise.
>   (__arm_vldrhq_gather_shifted_offset_z_s16): Likewise.
>   (__arm_vldrhq_gather_shifted_offset_z_s32): Likewise.
>   (__arm_vldrhq_gather_shifted_offset_z_u16): Likewise.
>   (__arm_vldrhq_gather_shifted_offset_z_u32): Likewise.
>   (__arm_vldrhq_gather_shifted_offset_z_f16): Likewise.
>   (__arm_vldrwq_gather_offset_s32): Likewise.
>   (__arm_vldrwq_gather_offset_u32): Likewise.
>   (__arm_vldrwq_gather_offset_f32): Likewise.
>   (__arm_vldrwq_gather_offset_z_s32): Likewise.
>   (__arm_vldrwq_gather_offset_z_u32): Likewise.
>   (__arm_vldrwq_gather_offset_z_f32): Likewise.
>   (__arm_vldrwq_gather_shifted_offset_s32): Likewise.
>   (__arm_vldrwq_gather_shifted_offset_u32): Likewise.
>   (__arm_vldrwq_gather_shifted_offset_f32): Likewise.
>   (__arm_vldrwq_gather_shifted_offset_z_s32): Likewise.
>   (__arm_vldrwq_gather_shifted_offset_z_u32): Likewise.
>   (__arm_vldrwq_gather_shifted_offset_z_f32): Likewise.
>   (__arm_vdwdupq_x_n_u8): Likewise.
>   (__arm_vdwdupq_x_n_u16): Likewise.
>   (__arm_vdwdupq_x_n_u32): Likewise.
>   (__arm_viwdupq_x_n_u8): Likewise.
>   (__arm_viwdupq_x_n_u16): Likewise.
>   (__arm_viwdupq_x_n_u32): Likewise.
>   (__arm_vidupq_x_n_u8): Likewise.
>   (__arm_vddupq_x_n_u8): Likewise.
>   (__arm_vidupq_x_n_u16): Likewise.
>   (__arm_vddupq_x_n_u16): Likewise.
>   (__arm_vidupq_x_n_u32): Likewise.
>   (__arm_vddupq_x_n_u32): Likewise.
>   (__arm_vldrdq_gather_offset_s64): Likewise.
>   (__arm_vldrdq_gather_offset_u64): Likewise.
>   (__arm_vldrdq_gather_offset_z_s64): Likewise.
>   (__arm_vldrdq_gather_offset_z_u64): Likewise.
>   (__arm_vldrdq_gather_shifted_offset_s64): Likewise.
>   (__arm_vldrdq_gather_shifted_offset_u64): Likewise.
>   (__arm_vldrdq_gather_shifted_offset_z_s64): Likewise.
>   (__arm_vldrdq_gather_shifted_offset_z_u64): Likewise.
>   (__arm_vidupq_m_n_u8): Likewise.
>   (__arm_vidupq_m_n_u16): Likewise.
>   (__arm_vidupq_m_n_u32): Likewise.
>   (__arm_vddupq_m_n_u8): Likewise.
>   (__arm_vddupq_m_n_u16): Likewise.
>   (__arm_vddupq_m_n_u32): Likewise.
>   (__arm_vidupq_n_u16): Likewise.
>   (__arm_vidupq_n_u32): Like

Re: [PATCH] rs6000: Fix C++14 vs. C++17 ABI bug on powerpc64le [PR94707]

2020-04-23 Thread Jakub Jelinek via Gcc-patches
On Wed, Apr 22, 2020 at 05:10:56PM -0500, Segher Boessenkool wrote:
> > PR target/94707
> > * config/rs6000/rs6000-call.c (rs6000_aggregate_candidate): Add
> > CXX17_EMPTY_BASE_SEEN argument.  Pass it to recursive calls.
> > Ignore cxx17_empty_base_field_p fields after setting
> > *CXX17_EMPTY_BASE_SEEN to true.
> 
> Please use the literal capitalisation of symbol names?  So that grep and
> other search works (the ALL CAPS thing is for the function introductory
> comment (and other function comments) only).

Done.

> > +   if (cxx17_empty_base_field_p (field))
> > + {
> > +   *cxx17_empty_base_seen = true;
> > +   continue;
> > + }
> 
> Is there no way to describe this without referring to "c++17" (or even
> "base field")?  It's a pretty gross abstraction violation.

I'm afraid it is desirable to talk about c++17 and base field, otherwise
it won't be clear what we mean.

> > + inform (input_location,
> > + "prior to GCC 10, parameters of type "
> > + "%qT were passed incorrectly for C++17", type);
> 
> This could more explicitly say that this makes the compiled code incompatible
> between GCC 10 and older, which is the point of the warning?  It's not
> really necessary to say the old one was bad -- let's hope it was :-)

After many iterations on IRC we came up with the following wording, which is
what I've committed.  It might be nice to also have a URL with a longer
explanation in changes.html, but the diagnostic framework doesn't support that yet.
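
For readers without the PR at hand, here is a minimal sketch of the kind of
type the new -Wpsabi note is about, assuming (as the ChangeLog below puts it)
that it concerns homogeneous aggregates with C++17 empty base fields.  The type
and function names are made up for illustration and are not taken from the
patch or from the PR testcase.

```cpp
#include <type_traits>

// Hypothetical types, for illustration only.
struct empty_tag {};                         // an empty base class
struct plain { double x, y; };               // two doubles, no base
struct tagged : empty_tag { double x, y; };  // same members plus an empty base

// The empty base contributes no storage, so both types have the same size.
static_assert(std::is_empty<empty_tag>::value, "empty_tag has no members");
static_assert(sizeof(tagged) == sizeof(plain), "empty base adds no storage");

// A by-value parameter of such a type is where the calling convention matters:
// per the thread, the patched rs6000_aggregate_candidate skips the field that
// represents the empty base and records that it saw one.
double sum(tagged t) { return t.x + t.y; }
```

Going by the subject line, the concern is that a function like `sum` built with
-std=c++14 and its caller built with -std=c++17 could disagree on how the
argument is passed; the sketch only shows the shape of the type, not the ABI
difference itself.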

2020-04-23  Jakub Jelinek  

PR target/94707
* config/rs6000/rs6000-call.c (rs6000_aggregate_candidate): Add
cxx17_empty_base_seen argument.  Pass it to recursive calls.
Ignore cxx17_empty_base_field_p fields after setting
*cxx17_empty_base_seen to true.
(rs6000_discover_homogeneous_aggregate): Adjust
rs6000_aggregate_candidate caller.  With -Wpsabi, diagnose homogeneous
aggregates with C++17 empty base fields.

* g++.dg/tree-ssa/pr27830.C: Use -Wpsabi -w for -std=c++17 and higher.

--- gcc/config/rs6000/rs6000-call.c.jj  2020-03-30 22:53:40.746640328 +0200
+++ gcc/config/rs6000/rs6000-call.c 2020-04-22 13:05:07.947809888 +0200
@@ -5528,7 +5528,8 @@ const struct altivec_builtin_types altiv
sub-tree.  */
 
 static int
-rs6000_aggregate_candidate (const_tree type, machine_mode *modep)
+rs6000_aggregate_candidate (const_tree type, machine_mode *modep,
+   bool *cxx17_empty_base_seen)
 {
   machine_mode mode;
   HOST_WIDE_INT size;
@@ -5598,7 +5599,8 @@ rs6000_aggregate_candidate (const_tree t
|| TREE_CODE (TYPE_SIZE (type)) != INTEGER_CST)
  return -1;
 
-   count = rs6000_aggregate_candidate (TREE_TYPE (type), modep);
+   count = rs6000_aggregate_candidate (TREE_TYPE (type), modep,
+   cxx17_empty_base_seen);
if (count == -1
|| !index
|| !TYPE_MAX_VALUE (index)
@@ -5636,7 +5638,14 @@ rs6000_aggregate_candidate (const_tree t
if (TREE_CODE (field) != FIELD_DECL)
  continue;
 
-   sub_count = rs6000_aggregate_candidate (TREE_TYPE (field), modep);
+   if (cxx17_empty_base_field_p (field))
+ {
+   *cxx17_empty_base_seen = true;
+   continue;
+ }
+
+   sub_count = rs6000_aggregate_candidate (TREE_TYPE (field), modep,
+   cxx17_empty_base_seen);
if (sub_count < 0)
  return -1;
count += sub_count;
@@ -5669,7 +5678,8 @@ rs6000_aggregate_candidate (const_tree t
if (TREE_CODE (field) != FIELD_DECL)
  continue;
 
-   sub_count = rs6000_aggregate_candidate (TREE_TYPE (field), modep);
+   sub_count = rs6000_aggregate_candidate (TREE_TYPE (field), modep,
+   cxx17_empty_base_seen);
if (sub_count < 0)
  return -1;
count = count > sub_count ? count : sub_count;
@@ -5710,7 +5720,9 @@ rs6000_discover_homogeneous_aggregate (m
   && AGGREGATE_TYPE_P (type))
 {
   machine_mode field_mode = VOIDmode;
-  int field_count = rs6000_aggregate_candidate (type, &field_mode);
+  bool cxx17_empty_base_seen = false;
+  int field_count = rs6000_aggregate_candidate (type, &field_mode,
+   &cxx17_empty_base_seen);
 
   if (field_count > 0)
{
@@ -5725,6 +5737,18 @@ rs6000_discover_homogeneous_aggregate (m
*elt_mode = field_mode;
  if (n_elts)
*n_elts = field_count;
+ if (cxx17_empty_base_seen && warn_psabi)
+   {
+ static const_tree last_reported_type;
+ if (type != last_reported_type)
+   {
+ 

[rs6000] fix mffsl emulation

2020-04-23 Thread Alexandre Oliva


The emulation of mffsl with mffs, used when !TARGET_P9_MISC, is going
through the motions, but not storing the result in the given
operands[0]; it rather modifies operands[0] without effect.  It also
creates a DImode pseudo that it doesn't use, a DFmode pseudo that's
unnecessary AFAICT, and it's not indented per the GNU Coding
Standards.  The patch below fixes all of these.

I wasn't sure whether simplify_gen_subreg might emit code in
obscure cases, so I left it in place, and accommodated the possibility
that the result of the mode conversion back might need copying to the
requested output operand.  In my tests, the output was always a
REG:DF, so the subreg was trivial and the conversion back got the
original REG:DF output back.


I'm concerned about several issues in the mffsl testcase.  First, I
don't see that comparing the values as doubles rather than as long
longs is desirable.  These are FPSCR bitfields, not FP numbers.  I
understand mffs et al use double because they output to FP registers,
but...  The bit patterns might not even be well-formed FP numbers.

Another issue with the test is that, if the compare fails, it calls
mffsl again to print the value, as if it would yield the same result.
But part of the FPSCR that mffsl (emulated with mffs or not) copies to
the output FP register is the FPCC, so the fcmpu used to compare the
result of the first mffsl will modify FPSCR and thus the result of the
second mffsl call.

Yet another issue is that the test assumed the mffs bits not extracted
by mffsl are all zero.  This appears to be the case, as the bits left
out are for sticky exceptions, but there are reserved parts of FPSCR
that might turn out to be set in the future, and then the masking in
the GCC-emulated version of mffsl would zero out those bits and cause
the compare to fail.

So I put in masking in the mffs result before the compare, but then,
what if mffsl is changed so that it copies additional nonzero bits?
Should we mask both mffs and mffsl outputs?  Or is it safe to leave
those bits alone and assume them to be zero at the entry point of
main(), as the test used to do?
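
To make the comparison concern concrete, here is a small portable sketch (not
the testcase itself) of comparing two register images as bit patterns rather
than as doubles.  The mask value is the one from the patch; the helper names
are hypothetical, and masking both sides is only one of the options raised
above, not what the patch does.

```cpp
#include <cstdint>
#include <cstring>

// Mask from the patch: the FPSCR bits that mffsl is expected to return.
constexpr std::uint64_t mffsl_mask = 0x70007f0ffULL;

// Reinterpret the 64 bits of a double as an integer.
std::uint64_t bits_of(double d) {
    static_assert(sizeof(double) == sizeof(std::uint64_t),
                  "sketch assumes a 64-bit double");
    std::uint64_t u;
    std::memcpy(&u, &d, sizeof u);
    return u;
}

// Build a double from a raw 64-bit pattern.
double from_bits(std::uint64_t u) {
    double d;
    std::memcpy(&d, &u, sizeof d);
    return d;
}

// Compare two FP-register images on the selected bits only.
bool same_fpscr_bits(double a, double b) {
    return (bits_of(a) & mffsl_mask) == (bits_of(b) & mffsl_mask);
}
```

A pattern that happens to encode a NaN compares unequal to itself as a double,
while the integer comparison still treats identical images as equal.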


Regstrapped on powerpc64le-linux-gnu.  Ok to install?


for  gcc/ChangeLog

* gcc/config/rs6000/rs6000.md (rs6000_mffsl): Copy result to
output operand in emulation.  Simplify.

for  gcc/testsuite/ChangeLog

* gcc.target/powerpc/test_mffsl.c: Call mffsl only once.
Reinterpret the doubles as long longs for compares.  Mask out
mffs bits that are not expected from mffsl.
---
 gcc/config/rs6000/rs6000.md   |   26 +
 gcc/testsuite/gcc.target/powerpc/test_mffsl.c |   12 
 2 files changed, 22 insertions(+), 16 deletions(-)

diff --git a/gcc/config/rs6000/rs6000.md b/gcc/config/rs6000/rs6000.md
index 11ab745..8f1ab55 100644
--- a/gcc/config/rs6000/rs6000.md
+++ b/gcc/config/rs6000/rs6000.md
@@ -13620,18 +13620,20 @@
 
   if (!TARGET_P9_MISC)
 {
-   rtx tmp_di = gen_reg_rtx (DImode);
-   rtx tmp_df = gen_reg_rtx (DFmode);
-
-   /* The mffs instruction reads the entire FPSCR.  Emulate the mffsl
-  instruction using the mffs instruction and masking off the bits
-  the mmsl instruciton actually reads.  */
-   emit_insn (gen_rs6000_mffs (tmp_df));
-   tmp_di = simplify_gen_subreg (DImode, tmp_df, DFmode, 0);
-   emit_insn (gen_anddi3 (tmp_di, tmp_di, GEN_INT (0x70007f0ffLL)));
-
-   operands[0] = simplify_gen_subreg (DFmode, tmp_di, DImode, 0);
-   DONE;
+  rtx tmp_df = operands[0];
+  rtx tmp_di;
+
+  /* The mffs instruction reads the entire FPSCR.  Emulate the mffsl
+instruction using the mffs instruction and masking off the bits
+the mmsl instruciton actually reads.  */
+  emit_insn (gen_rs6000_mffs (tmp_df));
+  tmp_di = simplify_gen_subreg (DImode, tmp_df, DFmode, 0);
+  emit_insn (gen_anddi3 (tmp_di, tmp_di, GEN_INT (0x70007f0ffLL)));
+
+  tmp_df = simplify_gen_subreg (DFmode, tmp_di, DImode, 0);
+  if (operands[0] != tmp_df)
+   emit_move_insn (operands[0], tmp_df);
+  DONE;
 }
 
 emit_insn (gen_rs6000_mffsl_hw (operands[0]));
diff --git a/gcc/testsuite/gcc.target/powerpc/test_mffsl.c b/gcc/testsuite/gcc.target/powerpc/test_mffsl.c
index 93a8ec2..a1f73aa 100644
--- a/gcc/testsuite/gcc.target/powerpc/test_mffsl.c
+++ b/gcc/testsuite/gcc.target/powerpc/test_mffsl.c
@@ -14,17 +14,21 @@ int main ()
   union blah {
 double d;
 unsigned long long ll;
-  } conv_val;
+  } mffs_val, mffsl_val;
 
   /* Test reading the FPSCR register.  */
   __asm __volatile ("mffs %0" : "=f"(f14));
-  conv_val.d = f14;
+  mffs_val.d = f14;
+  /* Select the bits obtained by mffsl.  */
+  mffs_val.ll &= 0x70007f0ffLL;
 
-  if (conv_val.d != __builtin_mffsl())
+  mffsl_val.d = __builtin_mffsl ();
+
+  if (mffs_val.ll != mffsl_val.ll)
 {
 #ifdef DEBUG
   printf("ERROR, __builtin_mffsl() returned 0x%llx, not the expecected 
value 

[PATCH][wwwdocs][AArch64] Fix typo in sve2-aes

2020-04-23 Thread Kyrylo Tkachov
Hi all,

Pushing this obvious typo fix to the AArch64 changes.html for GCC 10.

Thanks,
Kyrill


wwwdocs-typo.patch
Description: wwwdocs-typo.patch


Re: [PATCH] amdgcn: Add stub personality function

2020-04-23 Thread Andrew Stubbs

On 22/04/2020 22:10, Kwok Cheung Yeung wrote:

Hello

This patch adds a stub implementation of __gxx_personality_v0, which is 
used in C++ exception handling. AMD GCN currently does not actually 
support exception handling (the unwind functions are all stubs too), so 
adding an extra stub function does not regress the current level of 
functionality any. This allows the following tests in the libgomp 
testsuite that were previously failing with a linker error to compile 
and run, provided that they do not throw any exceptions:


libgomp.c-c++-common/function-not-offloaded.c
libgomp.c++/for-15.C
libgomp.c++/for-24.C
libgomp.oacc-c-c++-common/routine-1.c
libgomp.oacc-c++/pr71959.C
libgomp.oacc-c++/routine-1-auto.C
libgomp.oacc-c++/routine-1-template-auto.C
libgomp.oacc-c++/routine-1-template-trailing-return-type.C
libgomp.oacc-c++/routine-1-template.C
libgomp.oacc-c++/routine-1-trailing-return-type.C

Tested with offloaded and standalone builds of GCC for AMD GCN. Okay for 
trunk?


Kwok

2020-04-22  Kwok Cheung Yeung  

 libgcc/
 * config/gcn/unwind-gcn.c (__gxx_personality_v0): New.

diff --git a/libgcc/config/gcn/unwind-gcn.c b/libgcc/config/gcn/unwind-gcn.c
index 813f03f..6508b45 100644
--- a/libgcc/config/gcn/unwind-gcn.c
+++ b/libgcc/config/gcn/unwind-gcn.c
@@ -35,3 +35,13 @@ _Unwind_GetIPInfo (struct _Unwind_Context *c, int *ip_before_insn)
  {
    return 0;
  }
+
+_Unwind_Reason_Code
+__gxx_personality_v0 (int version,
+  _Unwind_Action actions,
+  _Unwind_Exception_Class exception_class,
+  struct _Unwind_Exception *ue_header,
+  struct _Unwind_Context *context)
+{
+  return 0;
+}



OK.

Andrew


Re: introduce target tmpnam and require it in tests relying on it

2020-04-23 Thread Alexandre Oliva
On Apr 21, 2020, Bernhard Reutner-Fischer  wrote:

> On 17 April 2020 21:21:41 CEST, Martin Sebor via Gcc-patches
>  wrote:
>> On 4/17/20 11:48 AM, Alexandre Oliva wrote:
>>> On Apr  9, 2020, Alexandre Oliva  wrote:
>>> 
>>>> Some target C libraries that aren't recognized as freestanding don't
>>>> have filesystem support, so calling tmpnam, fopen/open and
>>>> remove/unlink fails to link.
>>>
>>>> This patch introduces a tmpnam effective target to the testsuite, and
>>>> requires it in the tests that call tmpnam.
>>>
>>>> for  gcc/testsuite/ChangeLog
>>>
>>>> * lib/target-supports.exp (check_effective_target_tmpnam): New.
>>>> * gcc.c-torture/execute/fprintf-2.c: Require it.
>>>> * gcc.c-torture/execute/printf-2.c: Likewise.
>>>> * gcc.c-torture/execute/user-printf.c: Likewise.
>>> 
>>> Ping?
>>> 
>>> https://gcc.gnu.org/pipermail/gcc-patches/2020-April/543672.html
>> 
>> I'm okay with the changes to the tests.
>> 
>> The target-supports.exp changes look reasonable to me as well but
>> I can't approve them.  Since you said it's for targets that don't
>> have file I/O functions I wonder if the name would better reflect
>> that if it were called, say, check_effective_target_fileio?

> If you want a fileio predicate then please do not key it off obsolescent
> functions.

I'd actually considered adding two expect/dejagnu procs, one for fileio,
one for tmpnam, possibly with the latter depending on the former, but
decided to take the simpler path on the grounds that all tests that
would have depended on fileio would also depend on tmpnam.

Plus, it did seem to make sense to test for tmpnam, since it probably
won't be found on freestanding environments (the affected tests require
the non-freestanding effective target, but that doesn't translate to
requiring I/O support), and tmpnam might be removed from standards in the
future.  We might want to catch that, rather than silently skip the test, though.

I'd be glad to add an intermediate fileio effective target, or rename
the proposed one and drop tmpnam from it, if there's agreement such a
separate effective target would be more useful.


So, should I rename _tmpnam to _fileio and drop tmpnam() from the code
snippet in the effective target test?  Or should I keep _tmpnam and
introduce _fileio?  With or without a dependency of _tmpnam on _fileio?

Since Jeff Law approved the patch as is, would you guys mind if I make
any further changes as separate, followup patches?

Thanks,

-- 
Alexandre Oliva, freedom fighter    he/him    https://FSFLA.org/blogs/lxo/
Free Software Evangelist  Stallman was right, but he's left :(
GNU Toolchain Engineer   Live long and free, and prosper ethically


Re: [PATCH] libstdc++: don't use #include_next in c_global headers

2020-04-23 Thread Jonathan Wakely via Gcc-patches

On 23/04/20 06:32 +0200, Helmut Grohne wrote:

Hi,

On Mon, Apr 20, 2020 at 10:12:37AM +0100, Jonathan Wakely wrote:

> Now you are probably going to say that "-isystem /usr/include" is a bad
> idea and that you shouldn't do that.

Right.

> I'm inclined to agree. This isn't a
> problem just yet. Debian wants to move /usr/include/stdlib.h to
> /usr/include//stdlib.h. After that move, the problematic flag
> becomes "-isystem /usr/include/". Unfortunately, around 30
> Debian packages[1] do pass exactly that flag. Regardless whether doing
> so is a bad idea, I guess we will have to support that.

Or Debian should fix what they're going to break.


This is not quite precise. The offending -isystem
/usr/include/ flag is already being passed. According to what
you write later, doing so is broken today. It just happens to work by
accident. So all we do is making the present breakage visible.


> I am proposing to replace those two #include_next with plain #include.
> That'll solve the problem described above, but it is not entirely
> obvious that doing so doesn't break something else.
>
> After switching those #include_next to #include,
> libstdc++-v3/include/c_global/cstdlib will continue to temporarily
> define _GLIBCXX_INCLUDE_NEXT_C_HEADERS and will #include <stdlib.h>. Now,
> it'll search all include directories. It
> may find libstdc++-v3/include/c_compatibility/stdlib.h or the libc's
> version. We cannot tell which. If it finds the one from libstdc++-v3,
> the header will notice the _GLIBCXX_INCLUDE_NEXT_C_HEADERS macro and
> immediately #include_next <stdlib.h>, skipping the rest of the header.
> That in turn will find the libc version. So in both cases, it ends up
> using the right one. Precisely what we wanted.

As Marc said, this doesn't work.


That is not very precise either. Marc said that it won't fix all cases.
In practice, it would make those work that don't #include <stdlib.h> but
use #include <cstdlib> instead.

Marc also indicated that using include_next for a header of a different
name is wrong. So this is a bug in libstdc++ regardless of whether it
breaks or unbreaks other pieces of software.


He said he doesn't like it, that doesn't mean it's a bug or actually
causes incorrect results.

Whereas using -isystem provably *does* break the implementation,
making it impossible for #include <stdlib.h> to meet the requirements
of the C++ standard. And your proposed patch doesn't prevent that.



If a program tries to include <stdlib.h> it needs to get the libstdc++
version, otherwise only the libc versions of certain functions are
defined. That means the additional C++ overloads such as ::abs(long)
and ::abs(long long) won't be defined. That is the reason why
libstdc++ provides its own <stdlib.h>.

And if you do -isystem /usr/include (or any other option that causes
libstdc++'s <stdlib.h> to be skipped) that doesn't work. Only
::abs(int) gets defined.

So -isystem /usr/include breaks code, with or without your patch.
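
As a rough illustration of what those extra overloads buy (a sketch assuming a
correctly configured include path; the function name is made up), the C++
library's abs keeps the width of its argument:

```cpp
#include <cstdlib>
#include <type_traits>

// With the C++ library's overloads visible, the argument type selects the
// overload, so the result type follows the argument type.
static_assert(std::is_same<decltype(std::abs(0L)), long>::value,
              "abs(long) returns long");
static_assert(std::is_same<decltype(std::abs(0LL)), long long>::value,
              "abs(long long) returns long long");

// Hypothetical helper: relies on the long long overload being available.
long long magnitude(long long v) { return std::abs(v); }
```

If only the libc declaration of abs(int) were visible, a call with a long long
argument would convert the argument to int instead of selecting a wider overload.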


It is very difficult to disagree with -isystem /usr/include or -isystem
/usr/include/ being broken and unsupported. Having you state it
that clearly does help with communicating to other upstreams. For this
reason, I've looked into the remaining cases. It turns out that there
aren't that many left. In particular chromium, opencv and vtk got fixed
in the meantime. Basically all remaining failures could be attributed
to qmake, which passes all directories below /usr/include (including
/usr/include and /usr/include/ if a .pc file mentions them)
using -isystem. I've sent a patch https://bugs.debian.org/958479 to make
qmake stop doing that.

I therefore agree with you that the patch I sent for libstdc++ is not
necessary to make packages build on Debian. Removing the offending
-isystem flags from the respective builds is a manageable option and has
already happened to a large extent.


Yes, I introduced the current <stdlib.h> and <math.h> wrappers years
ago in GCC 6, and so I'm surprised to see it coming up again now.
Several packages had problems and already fixed them.


We can conclude that the motivation for my patch is not a good one,
because it embraces broken behaviour. However, the use of include_next
remains a bug, because the name of the including and the name of the
included header differ, and it should be fixed on that ground.


Not liking something is not a bug.

You need to demonstrate an actual bug (e.g. failure to compile,
non-conformance to the C++ standard) that is not caused by user error
(like misuse of -isystem) to argue for fixing something.




RE: [PATCH v2] aarch64: Add TX3 machine model

2020-04-23 Thread Kyrylo Tkachov
Hi Anton,

Thanks to you and Joel for clarifying the copyright assignment...

> -----Original Message-----
> From: Gcc-patches  On Behalf Of Anton
> Youdkevitch
> Sent: 20 April 2020 19:29
> To: gcc-patches@gcc.gnu.org
> Cc: jo...@marvell.com
> Subject: [PATCH v2] aarch64: Add TX3 machine model
> 
> Here is the patch introducing the thunderx3t11 machine model
> for the scheduler. A name for the new chip was added to the
> list of the names to be recognized as a valid parameter for mcpu
> and mtune flags. The TX2 cost model was reused for TX3.
> 
> The previously used "cryptic" name for the command line
> parameter is replaced with the same "thunderx3t11" name.
> 
> Bootstrapped on AArch64.
> 
> 2020-04-20 Anton Youdkevitch 
> 
> * config/aarch64/aarch64-cores.def: Add the chip name.
> * config/aarch64/aarch64-tune.md: Regenerated.
> * gcc/config/aarch64/aarch64.c: Add the cost tables for the chip.
> * gcc/config/aarch64/thunderx3t11.md: New file: add the new
> machine model for the scheduler
> * gcc/config/aarch64/aarch64.md: Include the new model.

No "gcc/" in the path here.
Also, please add an entry in the documentation in doc/invoke.texi for the new
option.

> 
> ---
>  gcc/config/aarch64/aarch64-cores.def |   3 +
>  gcc/config/aarch64/aarch64-tune.md   |   2 +-
>  gcc/config/aarch64/aarch64.c |  27 +
>  gcc/config/aarch64/aarch64.md|   1 +
>  gcc/config/aarch64/thunderx3t11.md   | 686 +++
>  5 files changed, 718 insertions(+), 1 deletion(-)

diff --git a/gcc/config/aarch64/aarch64-cores.def b/gcc/config/aarch64/aarch64-cores.def
index ea9b98b..ece6c34 100644
--- a/gcc/config/aarch64/aarch64-cores.def
+++ b/gcc/config/aarch64/aarch64-cores.def
@@ -95,6 +95,9 @@ AARCH64_CORE("vulcan",  vulcan, thunderx2t99, 8_1A,  AARCH64_FL_FOR_ARCH8_1 | AA
 /* Cavium ('C') cores. */
 AARCH64_CORE("thunderx2t99",  thunderx2t99,  thunderx2t99, 8_1A,  AARCH64_FL_FOR_ARCH8_1 | AARCH64_FL_CRYPTO, thunderx2t99, 0x43, 0x0af, -1)
 
+/* Cavium ('??') cores (TX3). */
+AARCH64_CORE("thunderx3t11",  thunderx3t11,  thunderx3t11, 8_1A,  AARCH64_FL_FOR_ARCH8_1 | AARCH64_FL_CRYPTO, thunderx3t11, 0x43, 0x0b8, 0x0a)
+

I appreciate this is early CPU enablement and documentation is not always
ready, but would it be better to use a "Marvell cores" comment above that
entry? Up to you.
The more important thing is the architecture features enabled. The entry here
means it's an Armv8.1-a CPU (with crypto).
From what I can find on the Internet [1] this CPU has Armv8.3-a features. Can
you please double-check that and update the flags here, if necessary?
It would be a shame to miss out on architecture enablement for
-mcpu=thunderx3t11 due to a flag mismatch.

Thanks,
Kyrill

[1] https://www.anandtech.com/show/15621/marvell-announces-thunderx3-96-cores-384-thread-3rd-gen-arm-server-processor


Re: [patch, fortran] Fix PR 93956, wrong pointer when returned via function

2020-04-23 Thread Paul Richard Thomas via Gcc-patches
Hi Thomas,

You didn't attach the testcase but never mind, I am sure that it is OK :-)

OK for trunk and, if you feel like it, for 9-branch.

Thanks

Paul


On Tue, 21 Apr 2020 at 22:56, Thomas Koenig via Fortran 
wrote:

> Hello world,
>
> this one took a bit of detective work.  When array pointers point
> to components of derived types, we currently set the span field
> and then create an array temporary when we pass the array
> pointer to a procedure as a non-pointer or non-target argument.
> (This is inefficient, but that's for another release).
>
> Now, the compiler detected this case when there was a direct assignment
> like p => a%b, but not when p was returned either as a function result
> or via an argument.  This patch fixes that.
>
> Regression-tested. OK for trunk, gcc 9 and gcc8 (all are affected)?
>
> Regards
>
> Thomas
>
> 2020-04-21  Thomas Koenig  
>
> PR fortran/93956
> * expr.c (gfc_check_pointer_assign): Also set subref_array_pointer
> when a function returns a pointer.
> * interface.c (gfc_set_subref_array_pointer_arg): New function.
> (gfc_procedure_use): Call it.
>
> 2020-04-21  Thomas Koenig  
>
> PR fortran/93956
> * gfortran.dg/pointer_assign_13.f90: New test.
>


-- 
"If you can't explain it simply, you don't understand it well enough" -
Albert Einstein


[PATCH PR94708] rtl combine should consider NaNs when generate fp min/max

2020-04-23 Thread Zhanghaijian (A)
Hi

This is a simple fix for pr94708.
It's unsafe for rtl combine to generate fp min/max under
-funsafe-math-optimizations alone, considering NaNs.
We can only do this kind of transformation under both
-funsafe-math-optimizations and -ffinite-math-only.
Bootstrapped and tested on aarch64 Linux platform. No new regressions witnessed.

Any suggestions?

Thanks,
Haijian Zhang


pr94708-v1.patch
Description: pr94708-v1.patch


Re: [PATCH] coroutines: Handle lambda capture objects in the way as clang.

2020-04-23 Thread Iain Sandoe

Nathan Sidwell  wrote:


On 4/22/20 8:48 AM, Iain Sandoe wrote:

Hi,
There is no PR for this, at present, but the implementation of
clang and GCC's handling of lambda capture object implicit parms
is currently different.  There is still some discussion about
'correct' interpretation of the standard - but in the short-term
it is probably best to have consistent implementations - even if
those subsequently turn out to be 'consistently wrong'.


Agreed, the std is at best ambiguous in this area, we should aim for
implementation agreement.


Following more discussion amongst WG21 members, it appears that there is
still some confusion over the status of other implementations, and it may
well be that clang will be updated to follow the pattern that GCC is
currently implementing.


In light of this, perhaps it’s best to withdraw this patch for now.

Iain



Re: [PATCH PR94708] rtl combine should consider NaNs when generate fp min/max

2020-04-23 Thread Richard Biener via Gcc-patches
On Thu, Apr 23, 2020 at 10:42 AM Zhanghaijian (A)
 wrote:
>
> Hi
>
> This is a simple fix for pr94708.
> It's unsafe for rtl combine to generate fp min/max under
> -funsafe-math-optimizations, considering NaNs.
> We can only do this kind of transformation under
> -funsafe-math-optimizations and -ffinite-math-only.
> Bootstrap and tested on aarch64 Linux platform. No new regression witnessed.
>
> Any suggestion?

Please do not check flags; instead use && !HONOR_NANS (mode).
What about signed zeros?  The GENERIC folding routine producing
min/max is avoiding it when those are honored (and it doesn't
check flag_unsafe_math_optimizations at all).

Certainly the patch is an incremental correct fix, with the
flag testing replaced by the mode feature testing.
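
For reference, a small standalone sketch (plain C++, not GCC internals; the
function name is made up) of why a compare-and-select "minimum" cannot be
treated as a symmetric min operation when NaNs or signed zeros have to be
honored:

```cpp
#include <cmath>
#include <limits>

// The source-level pattern in question: a compare-and-select "minimum".
// Any comparison involving a NaN is false, so the second operand is selected
// whenever either operand is a NaN.  +0.0 and -0.0 compare equal, so the
// second operand is selected for them as well.
double select_min(double a, double b) { return a < b ? a : b; }

// Convenience helper for trying the sketch out.
double quiet_nan() { return std::numeric_limits<double>::quiet_NaN(); }
```

With these definitions, select_min (1.0, NaN) yields the NaN while
select_min (NaN, 1.0) yields 1.0, and select_min (0.0, -0.0) yields -0.0 while
select_min (-0.0, 0.0) yields +0.0.  The result depends on operand order, so a
replacement with different NaN or zero behaviour is only safe when those cases
need not be honored.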

Richard.

> Thanks,
> Haijian Zhang


Re: [PATCH] [Stage1] Refactor tree-ssa-operands.c

2020-04-23 Thread Richard Biener via Gcc-patches
On Wed, Apr 22, 2020 at 8:40 PM Giuliano Belinassi
 wrote:
>
> This patch refactors tree-ssa-operands.c by wrapping the global
> variables into a class, and also removes unused code.
>
> Just sending this for when Stage1 is back again.
>
> I ran the testsuite and bootstrapped on an x86_64 Linux machine and
> found no issues.

First of all thanks for doing this.  I have a few editorial suggestions
about the class setup - first the name build_virtual_operands is
badly chosen, I prefer operand_scanner.  Second I suggest to
have the CTOR take the invariants as arguments which is the
stmt we operate on and its containing function.  Thus,

  operand_scanner (function *, gimple *);

which makes passing those down functions unnecessary.

Since build_vuses is now a member and allocated for each stmt
which would be a regression I'd suggest to use an auto_vec
with some pre-allocated storage, thus change it to

  auto_vec build_uses;

that also makes the destructor trivial (please simply remove
cleanup_build_arrays).  I guess there's further possibilities
for streamlining the initialization/teardown process but that's
better done as followup.
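
The suggested class shape could look roughly like the sketch below. It is only an illustration: `function_stub` and `gimple_stub` are hypothetical stand-ins for GCC's `function` and `gimple`, `std::vector` with `reserve` stands in for `auto_vec` (it still allocates on the heap, unlike `auto_vec`'s inline storage), and the capacity of 16 is an arbitrary assumption.

```cpp
#include <cassert>
#include <cstddef>
#include <vector>

// Hypothetical stand-ins for GCC's `function` and `gimple` types.
struct function_stub { int id; };
struct gimple_stub { int code; };

class operand_scanner
{
public:
  // The invariants (containing function and statement) are fixed at
  // construction, so member functions need not pass them around.
  operand_scanner (function_stub *fn, gimple_stub *stmt)
    : m_fn (fn), m_stmt (stmt)
  {
    assert (fn && stmt);
    m_build_uses.reserve (16);  // assumed size; avoids early regrowth
  }

  void append_use (int *use_p) { m_build_uses.push_back (use_p); }
  std::size_t num_uses () const { return m_build_uses.size (); }
  function_stub *fn () const { return m_fn; }
  gimple_stub *stmt () const { return m_stmt; }

private:
  function_stub *m_fn;
  gimple_stub *m_stmt;
  // Released by the implicit destructor; no explicit cleanup function.
  std::vector<int *> m_build_uses;
};
```

Because the vector member cleans up after itself, the class needs no user-written destructor, which is the point of dropping cleanup_build_arrays.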

Otherwise the change looks OK to me.

Thanks,
Richard.

> gcc/ChangeLog:
> 2020-04-22  Giuliano Belinassi  
>
> * tree-ssa-operands.c (build_virtual_operands): New class.
> (operands_bitmap_obstack): Remove.
> (n_initialized): Remove.
> (build_uses): Move to build_virtual_operands class.
> (build_vuse): Same as above.
> (build_vdef): Same as above.
> (verify_ssa_operands): Same as above.
> (finalize_ssa_uses): Same as above.
> (cleanup_build_arrays): Same as above.
> (finalize_ssa_stmt_operands): Same as above.
> (start_ssa_stmt_operands): Same as above.
> (append_use): Same as above.
> (append_vdef): Same as above.
> (add_virtual_operand): Same as above.
> (add_stmt_operand): Same as above.
> (get_mem_ref_operands): Same as above.
> (get_tmr_operands): Same as above.
> (maybe_add_call_vops): Same as above.
> (get_asm_stmt_operands): Same as above.
> (get_expr_operands): Same as above.
> (parse_ssa_operands): Same as above.
> (finalize_ssa_defs): Same as above.
> (build_ssa_operands): Same as above, plus create a C-like wrapper.
> (update_stmt_operands): Create an instance of build_virtual_operands.


Re: [RFC] split pseudos during loop unrolling in RTL unroller

2020-04-23 Thread Segher Boessenkool
On Thu, Apr 23, 2020 at 09:32:37AM +0200, Richard Biener wrote:
> On Thu, Apr 23, 2020 at 12:31 AM Jeff Law  wrote:
> > On Wed, 2020-04-22 at 15:50 -0500, Segher Boessenkool wrote:
> > > > > In some ways it feels like it would be easier to resurrect RTL SSA :-)
> > >
> > > Why was RTL SSA abandoned?
> > >
> > > It might well work to keep everything in SSA form all the way to RA.
> > > Hrm, that doesn't sound bad at all :-)
> > >
> > > (The PHIs need to be made explicit to something that resembles the
> > > machine code we will end up with, very early in the pipeline, but it
> > > could still also be some valid SSA form; and we can of course also
> > > have hard registers in all RTL, so that needs to be dealt with sanely
> > > some way as well  Lots of details, I don't see a crucial problem though,
> > > probably means I need to look harder ;-) )
> > > Lack of time mostly.  There's some complications like subregs, argument
> > > registers and the like.  But you can restrict ssa based analysis &
> > > optimizations to just the set of pseudos that are in SSA form and do
> > > something more conservative on the rest.
> 
> I guess time is better spent on trying to extend GIMPLE + SSA up to RA, thus
> make instruction selection on GIMPLE.

I think this is a bad idea.  By the time you have invented enough new
"lower GIMPLE" ("limple"?) to be able to use it to describe machine
insns like we can with RTL, you will have a more verbose, more memory
hungry, slower, etc. reinvented RTL.

RTL is a *feature*, and it is one of the things that makes GCC
significantly better than the competition.


More optimisations should move to GIMPLE, for example some loop
optimisations should be done much earlier (most unrolling).  The expand
pass should lose most of the "optimisations" it has built up over the
decades (that now often are detrimental at best).  Some of what expand
now does should probably be done while still in GIMPLE, even.

But it is very useful to have a separate "low level" representation,
that is actually close to the machine code we will eventually generate.
RTL is one such representation, and we already have it, and it is very
well tuned by now -- throwing it away would require some *huge*
advantage, because the costs of doing that are immense as well.


Segher


Re: [PATCH] rs6000: Fix C++14 vs. C++17 ABI bug on powerpc64le [PR94707]

2020-04-23 Thread Segher Boessenkool
Hi!

On Thu, Apr 23, 2020 at 10:06:16AM +0200, Jakub Jelinek wrote:
> > Is there no way to describe this without referring to "c++17" (or even
> > "base field")?  It's a pretty gross abstraction violation.
> 
> I'm afraid it is desirable to talk about c++17 and base field, otherwise
> it won't be clear what we mean.

Yeah, but that just shows it is a bad abstraction (not an abstraction at
all really, heh).  Not nice :-(

> > > +   inform (input_location,
> > > +   "prior to GCC 10, parameters of type "
> > > +   "%qT were passed incorrectly for C++17", type);
> > 
> > This could more explicitly say that makes the compiled code incompatible
> > between GCC 10 and older, which is the point of the warning?  It's not
> > really necessary to say the old one was bad -- let's hope it was :-)
> 
> After many iterations on IRC we came up with the following wording, which is
> what I've committed.  Might be nice to also have URL with larger explanation
> in changes.html, but the diagnostic framework doesn't support that yet.

> +   inform (input_location,
> +   "parameter passing for argument of type %qT "
> +   "when C++17 is enabled changed to match C++14 "
> +   "in GCC 10.1", type);

It isn't "to match C++14".  It simply is a bugfix, we didn't follow
the ABI before :-)

Thanks again for doing this work, much appreciated,


Segher


RE: [PATCH] aarch64: eliminate redundant zero extend after bitwise negation

2020-04-23 Thread Kyrylo Tkachov
Hi Alex,

> -Original Message-
> From: Alex Coplan 
> Sent: 22 April 2020 21:41
> To: gcc-patches@gcc.gnu.org
> Cc: Richard Earnshaw ; Richard Sandiford
> ; Marcus Shawcroft
> ; Kyrylo Tkachov ;
> nd 
> Subject: [PATCH] aarch64: eliminate redundant zero extend after bitwise
> negation
> 
> Hello,
> 
> The attached patch eliminates a redundant zero extend from the AArch64
> backend. Given the following C code:
> 
> unsigned long long foo(unsigned a)
> {
> return ~a;
> }
> 
> prior to this patch, AArch64 GCC at -O2 generates:
> 
> foo:
> mvn w0, w0
> uxtw x0, w0
> ret
> 
> but the uxtw is redundant, since the mvn clears the upper half of the x0
> register. After applying this patch, GCC at -O2 gives:
> 
> foo:
> mvn w0, w0
> ret
> 
> Testing:
> Added regression test which passes after applying the change to
> aarch64.md.
> Full bootstrap and regression on aarch64-linux with no additional failures.
> 
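
As a small illustration of the source-level property behind the quoted example: the complement is computed in 32 bits and only then widened, so the 64-bit result never has any of its upper 32 bits set. The sketch below restates `foo` with fixed-width types; the checking helper is hypothetical and not part of the patch or its test.

```cpp
#include <cassert>
#include <cstdint>

// Same computation as the quoted foo(): ~a is evaluated as a 32-bit
// unsigned value and then converted to 64 bits.
inline std::uint64_t foo (std::uint32_t a)
{
  return ~a;
}

// Hypothetical helper: the widened result must have a zero upper half.
inline void check_upper_half_clear (std::uint32_t a)
{
  assert ((foo (a) >> 32) == 0);
}
```

This says nothing about code generation by itself; it only shows that a 32-bit mvn whose destination write clears the upper half of x0 already produces the required value.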

Thanks, this patch is ok.
However, GCC 10 is now in stage 4, so I'll defer committing it until stage 1 
reopens (hopefully not long to go!)
If I do not commit it in the first few weeks of stage 1 feel free to ping me.

Kyrill

> Thanks,
> Alex
> 
> ---
> 
> gcc/ChangeLog:
> 
> 2020-04-22  Alex Coplan  
> 
> * config/aarch64/aarch64.md (*one_cmpl_zero_extend): New.
> 
> gcc/testsuite/ChangeLog:
> 
> 2020-04-22  Alex Coplan  
> 
> * gcc.target/aarch64/mvn_zero_ext.c: New test.



Re: [RFC] split pseudos during loop unrolling in RTL unroller

2020-04-23 Thread Richard Biener via Gcc-patches
On Thu, Apr 23, 2020 at 12:17 PM Segher Boessenkool
 wrote:
>
> On Thu, Apr 23, 2020 at 09:32:37AM +0200, Richard Biener wrote:
> > On Thu, Apr 23, 2020 at 12:31 AM Jeff Law  wrote:
> > > On Wed, 2020-04-22 at 15:50 -0500, Segher Boessenkool wrote:
> > > > > > In some ways it feels like it would be easier to resurrect RTL SSA :-)
> > > >
> > > > Why was RTL SSA abandoned?
> > > >
> > > > It might well work to keep everything in SSA form all the way to RA.
> > > > Hrm, that doesn't sound bad at all :-)
> > > >
> > > > (The PHIs need to be made explicit to something that resembles the
> > > > machine code we will end up with, very early in the pipeline, but it
> > > > could still also be some valid SSA form; and we can of course also
> > > > have hard registers in all RTL, so that needs to be dealt with sanely
> > > > some way as well  Lots of details, I don't see a crucial problem though,
> > > > probably means I need to look harder ;-) )
> > > > Lack of time mostly.  There's some complications like subregs, argument
> > > > registers and the like.  But you can restrict ssa based analysis &
> > > > optimizations to just the set of pseudos that are in SSA form and do
> > > > something more conservative on the rest.
> >
> > I guess time is better spent on trying to extend GIMPLE + SSA up to RA, thus
> > make instruction selection on GIMPLE.
>
> I think this is a bad idea.  By the time you have invented enough new
> "lower GIMPLE" ("limple"?) to be able to use it to describe machine
> insns like we can with RTL, you will have a more verbose, more memory
> hungry, slower, etc. reinvented RTL.

I don't think there's much to invent.

I think at least one step would be uncontroversial(?), namely moving
the RTL expansion "magic" up to a GIMPLE pass.  Where the "magic" would
be to turn GIMPLE stmts not directly expandable via an existing optab
into GIMPLE that can be trivially expanded.  That includes eventually
combining multiple stmts into more powerful instructions and
doing the magic we have in, like, expand_binop (widening, etc.).
Where there's not a 1:1 mapping of a GIMPLE stmt to an optab
GIMPLE gets direct-internal-fn calls.
Then RTL expansion would be mostly invoking gen_insn (optab-code).

More controversial would be ending up in GIMPLE there.  I think
GIMPLE can handle all RTL insns if we massage GIMPLE_ASM
a bit.  You'd end up with, say,

 asm ("(set (reg:DI $0)
(and:DI (reg/v:DI $1 [ dst ])
(reg:DI $2)))" : "r" (_1) : "r" (_2), "r" (_3) : "cc");

in place of

  _1 = _2 & _3;

and the GIMPLE_ASM text could be actual RTL.  We'd extend
the stmt with an extra operand to denote recognized patterns,
so another option would be to keep the original GIMPLE as well.

> RTL is a *feature*, and it is one of the things that makes GCC
> significantly better than the competition.

That said, I actually agree with that.  It's just that I hope we can
make some of the knowledge just represented on the RTL side
available on the GIMPLE side.  The more complicated parts,
like calling conventions, that is.

And yes, I want to get rid of that expand monster to be able to
do something like sched1 on "GIMPLE" without expand coming
along and re-scheduling everything at-will.

> More optimisations should move to GIMPLE, for example some loop
> optimisations should be done much earlier (most unrolling).  The expand
> pass should lose most of the "optimisations" it has built up over the
> decades (that now often are detrimental at best).  Some of what expand
> now does should probably be done while still in GIMPLE, even.
>
> But it is very useful to have a separate "low level" representation,
> that is actually close to the machine code we will eventually generate.
> RTL is one such representation, and we already have it, and it is very
> well tuned by now -- throwing it away would require some *huge*
> advantage, because the costs of doing that are immense as well.

But being stuck with something means no progress...  I know
very well it's 100 times harder to get rid of something than to
add something new on top.

Richard.

>
> Segher


[PATCH] Enable simple invocation of runtest in testsuite

2020-04-23 Thread Matthias Kretz
I noticed this inconvenience while learning dejagnu.

From: Matthias Kretz 

* testsuite/Makefile.am: Remove dup target_triplet and set tool,
allowing runtest to work without arguments.
---
 libstdc++-v3/testsuite/Makefile.am | 2 +-
 1 file changed, 1 insertion(+), 1 deletion(-)

diff --git a/libstdc++-v3/testsuite/Makefile.am b/libstdc++-v3/testsuite/Makefile.am
index e19509d2534..9cef1e65e1b 100644
--- a/libstdc++-v3/testsuite/Makefile.am
+++ b/libstdc++-v3/testsuite/Makefile.am
@@ -47,6 +47,7 @@ site.exp: Makefile
 	@echo '## these variables are automatically generated by make ##' >site.tmp
 	@echo '# Do not edit here.  If you wish to override these values' >>site.tmp
 	@echo '# edit the last section' >>site.tmp
+	@echo 'set tool libstdc++' >>site.tmp
 	@echo 'set srcdir $(srcdir)' >>site.tmp
 	@echo "set objdir `pwd`" >>site.tmp
 	@echo 'set build_alias "$(build_alias)"' >>site.tmp
@@ -55,7 +56,6 @@ site.exp: Makefile
 	@echo 'set host_triplet $(host_triplet)' >>site.tmp
 	@echo 'set target_alias "$(target_alias)"' >>site.tmp
 	@echo 'set target_triplet $(target_triplet)' >>site.tmp
-	@echo 'set target_triplet $(target_triplet)' >>site.tmp
 	@echo 'set libiconv "$(LIBICONV)"' >>site.tmp
 	@echo 'set baseline_dir "$(baseline_dir)"' >> site.tmp
 	@echo 'set baseline_subdir_switch "$(baseline_subdir_switch)"' >> site.tmp


Re: [PATCH] amdgcn: Add stub personality function

2020-04-23 Thread Thomas Schwinge
Hi!

On 2020-04-23T09:15:29+0100, Andrew Stubbs  wrote:
> On 22/04/2020 22:10, Kwok Cheung Yeung wrote:
>> This patch adds a stub implementation of __gxx_personality_v0, which is
>> used in C++ exception handling. AMD GCN currently does not actually
>> support exception handling

So we should simply disable it properly (see below)...

>> (the unwind functions are all stubs too), so
>> adding an extra stub function does not regress the current level of
>> functionality any.

... instead of adding such stub functions.

>> This allows the following tests in the libgomp
>> testsuite that were previously failing with a linker error to compile
>> and run, provided that they do not throw any exceptions:
>>
>> libgomp.c-c++-common/function-not-offloaded.c
>> libgomp.c++/for-15.C
>> libgomp.c++/for-24.C
>> libgomp.oacc-c-c++-common/routine-1.c
>> libgomp.oacc-c++/pr71959.C
>> libgomp.oacc-c++/routine-1-auto.C
>> libgomp.oacc-c++/routine-1-template-auto.C
>> libgomp.oacc-c++/routine-1-template-trailing-return-type.C
>> libgomp.oacc-c++/routine-1-template.C
>> libgomp.oacc-c++/routine-1-trailing-return-type.C

That's  "[amdgcn] ld: error: undefined
symbol: __gxx_personality_v0", by the way, so should be referenced in the
ChangeLog.

>> Tested with offloaded and standalone builds of GCC for AMD GCN. Okay for
>> trunk?

>>  libgcc/
>>  * config/gcn/unwind-gcn.c (__gxx_personality_v0): New.

>> --- a/libgcc/config/gcn/unwind-gcn.c
>> +++ b/libgcc/config/gcn/unwind-gcn.c

>> +_Unwind_Reason_Code
>> +__gxx_personality_v0 (int version,
>> +  _Unwind_Action actions,
>> +  _Unwind_Exception_Class exception_class,
>> +  struct _Unwind_Exception *ue_header,
>> +  struct _Unwind_Context *context)
>> +{
>> +  return 0;
>> +}

What does a 'return 0' semantically mean here?  Shouldn't this rather
return some '_URC_*' code -- or even abort (given that we're not
supporting unwinding)?

> OK.

I suggest we instead apply what I'd proposed a month ago in
 "[amdgcn] ld: error: undefined symbol:
__gxx_personality_v0", and now yesterday (coincidentally) posted.

+static enum unwind_info_type
+gcn_except_unwind_info (struct gcc_options *opts ATTRIBUTE_UNUSED)
+{
+  return UI_NONE;
+}

+#undef  TARGET_EXCEPT_UNWIND_INFO
+#define TARGET_EXCEPT_UNWIND_INFO gcn_except_unwind_info


Grüße
 Thomas
-
Mentor Graphics (Deutschland) GmbH, Arnulfstraße 201, 80634 München / Germany
Registergericht München HRB 106955, Geschäftsführer: Thomas Heurung, Alexander Walter


Re: [PATCH] amdgcn: Add stub personality function

2020-04-23 Thread Kwok Cheung Yeung

On 23/04/2020 12:05 pm, Thomas Schwinge wrote:

So we should simply disable it properly (see below)...

... instead of adding such stub functions.


I suggest we instead apply what I'd proposed a month ago in
 "[amdgcn] ld: error: undefined symbol:
__gxx_personality_v0", and now yesterday (coincidentally) posted.

 +static enum unwind_info_type
 +gcn_except_unwind_info (struct gcc_options *opts ATTRIBUTE_UNUSED)
 +{
 +  return UI_NONE;
 +}

 +#undef  TARGET_EXCEPT_UNWIND_INFO
 +#define TARGET_EXCEPT_UNWIND_INFO gcn_except_unwind_info



I agree that not generating the problematic code in the first place is the 
better approach. Does that mean we can now remove libgcc/config/gcn/unwind-gcn.c 
completely?


Thanks

Kwok


Re: [rs6000] fix mffsl emulation

2020-04-23 Thread Segher Boessenkool
Hi!

On Thu, Apr 23, 2020 at 05:08:55AM -0300, Alexandre Oliva wrote:
> The emulation of mffsl with mffs, used when !TARGET_P9_MISC, is going
> through the motions, but not storing the result in the given
> operands[0]; it rather modifies operands[0] without effect.

Heh, oops.

> It also
> creates a DImode pseudo that it doesn't use, a DFmode pseudo that's
> unnecessary AFAICT, and it's not indented per the GNU Coding
> Standards.  The patch below fixes all of these.
> 
> I wasn't sure simplify_gen_subreg might possibly emit any code in
> obscure cases,

It never does, it just returns an RTX.  Where would it emit the insns to?

> so I left it in place, and accommodated the possibility
> that the result of the mode conversion back might need copying to the
> requested output operand.  In my tests, the output was always a
> REG:DF, so the subreg was trivial and the conversion back got the
> original REG:DF output back.

> I'm concerned about several issues in the mffsl testcase.  First, I
> don't see that comparing the values as doubles rather than as long
> longs is desirable.  These are FPSCR bitfields, not FP numbers.  I
> understand mffs et al use double because they output to FP registers,
> but...  The bit patterns might not even be well-formed FP numbers.

"Desirable", probably not, no.  But it will always work: since all the
top bits are zeros always, it will always be a subnormal number, so all
comparisons will work as expected / wanted.
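
A minimal sketch of that argument, assuming IEEE 754 binary64 doubles and a floating-point environment that does not flush subnormals to zero. The helper name is hypothetical. A 64-bit pattern whose top bits (sign and exponent field) are all zero is either zero or a subnormal, and such values compare in the same order as the underlying integers:

```cpp
#include <cassert>
#include <cmath>
#include <cstdint>
#include <cstring>

// Reinterpret a 64-bit pattern as a double without changing any bits.
inline double bits_to_double (std::uint64_t bits)
{
  double d;
  static_assert (sizeof d == sizeof bits, "double must be 64 bits wide");
  std::memcpy (&d, &bits, sizeof d);
  return d;
}
```

For example, the patterns 0x70007f0ff and 0x7f0ff both classify as subnormal and the first compares greater than the second, just as the integers do.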

> Another issue with the test is that, if the compare fails, it calls
> mffsl again to print the value, as if it would yield the same result.
> But part of the FPSCR that mffsl (emulated with mffs or not) copies to
> the output FP register is the FPCC, so the fcmpu used to compare the
> result of the first mffsl will modify FPSCR and thus the result of the
> second mffsl call.

Yeah, good point.  We never *use* FPCC (nowhere, not just in this test),
but that is somewhat beside the point.

OTOH, all this is only done if we are debugging the testcase, so it
isn't important either way.

> Yet another issue is that the test assumed the mffs bits not extracted
> by mffsl are all zero.  This appears to be the case, as the bits left
> out are for sticky exceptions, but there are reserved parts of FPSCR
> that might turn out to be set in the future, and then the masking in
> the GCC-emulated version of mffsl would zero out those bits and cause
> the compare to fail.

All those extra bits are required to be set to 0.  From the ISA:
  For Move From FPSCR Lightweight (mffsl), do the following.  The
  contents of the control bits in the FPSCR, that is, bits 29:31 (DRN)
  and bits 56:63 (VE, OE, UE, ZE, XE, NI, RN), and the non-sticky status
  bits in the FPSCR, that is, bits 45:51 (FR, FI, C, FL, FG, FE, FU),
  are placed into the corresponding bits in register FRT. All other bits
  in register FRT are set to 0.
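
The bit ranges in that ISA text can be turned into a mask as a cross-check. The sketch below assumes the ISA's numbering, where bit 0 is the most significant bit of the 64-bit register; the function names are hypothetical.

```cpp
#include <cassert>
#include <cstdint>

// Mask covering bits first..last in IBM numbering (bit 0 is the most
// significant bit of a 64-bit value, bit 63 the least significant).
constexpr std::uint64_t ibm_field_mask (unsigned first, unsigned last)
{
  return ((~std::uint64_t{0}) >> first) & ((~std::uint64_t{0}) << (63 - last));
}

// Union of the fields that mffsl copies into FRT.
constexpr std::uint64_t mffsl_mask ()
{
  return ibm_field_mask (29, 31)     // DRN
         | ibm_field_mask (45, 51)   // FR, FI, C, FL, FG, FE, FU
         | ibm_field_mask (56, 63);  // VE, OE, UE, ZE, XE, NI, RN
}

static_assert (mffsl_mask () == 0x70007f0ffULL,
               "agrees with the constant used in the emulation");
```

The three fields contribute 0x700000000, 0x7f000 and 0xff respectively, which add up to the 0x70007f0ff constant passed to gen_anddi3 in the patch.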

> So I put in masking in the mffs result before the compare, but then,
> what if mffsl is changed so that it copies additional nonzero bits?
> Should we mask both mffs and mffsl outputs?  Or is it safe to leave
> those bits alone and assume them to be zero at the entry point of
> main(), as the test used to do?

The code as-is was correct here (the compiler code as well as the
testcase code).

> for  gcc/ChangeLog
> 
>   * gcc/config/rs6000/rs6000.md (rs6000_mffsl): Copy result to
> output operand in emulation.  Simplify.

Indent with a tab please.

> +  rtx tmp_df = operands[0];

Please don't reuse pseudos (or worse, non-pseudo registers).  This was
correct in the original code.

> +  rtx tmp_di;

Please just declare at first use, like the original did.  It's the more
modern style, and it actually is nicer ;-)

> +  /* The mffs instruction reads the entire FPSCR.  Emulate the mffsl
> +  instruction using the mffs instruction and masking off the bits
> +  the mmsl instruciton actually reads.  */

"mffsl instruction".  Heh, I see the original was bad already, oops.

As I said above, the insn is *required* to zero all other bits.

> +  emit_insn (gen_rs6000_mffs (tmp_df));
> +  tmp_di = simplify_gen_subreg (DImode, tmp_df, DFmode, 0);
> +  emit_insn (gen_anddi3 (tmp_di, tmp_di, GEN_INT (0x70007f0ffLL)));
> +
> +  tmp_df = simplify_gen_subreg (DFmode, tmp_di, DImode, 0);
> +  if (operands[0] != tmp_df)
> + emit_move_insn (operands[0], tmp_df);
> +  DONE;

That "==" won't do what you want, I guess?  It's not needed, anyway, it
is just a distracting premature micro-optimisation.

So just do the emit_move_insn please, with whitespace and comment fixed
if you want?

Thanks,


Segher

(Is there a PR, btw?)


[AArch64] (PR94383) Avoid C++17 empty base field checking for HVA/HFA

2020-04-23 Thread Matthew Malcomson
In C++17, an empty class deriving from an empty base is not an
aggregate, while in C++14 it is.  In order to implement this, GCC adds
an artificial field to such classes.

This artificial field has no mapping to Fundamental Data Types in the
AArch64 PCS ABI and hence should not count towards determining whether an
object can be passed using the vector registers as per section
"6.4.2 Parameter Passing Rules" in the AArch64 PCS.
https://github.com/ARM-software/abi-aa/blob/master/aapcs64/aapcs64.rst#the-base-procedure-call-standard

This patch avoids counting this artificial field in
aapcs_vfp_sub_candidate, and hence calculates whether such objects
should be passed in vector registers in the same manner as C++14 (where
the artificial field does not exist).
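
A heavily simplified model of that counting rule is sketched below. It is not the aarch64.c code: `field_kind` and `count_fp_members` are hypothetical, only 32-bit floats are modelled, and treating the artificial field as a non-candidate in the old mode is a simplification of what the real recursion does.

```cpp
#include <cassert>
#include <vector>

enum class field_kind { fp32, int32, cxx17_empty_base };

// Returns the number of floating-point members, or -1 if the aggregate
// is not a candidate.  When AVOID is non-null, artificial empty-base
// fields are skipped and *AVOID records that one was seen; when it is
// null they are treated like any other non-floating-point field.
inline int count_fp_members (const std::vector<field_kind> &fields,
                             bool *avoid)
{
  int count = 0;
  for (field_kind f : fields)
    {
      if (f == field_kind::cxx17_empty_base && avoid)
        {
          *avoid = true;
          continue;
        }
      if (f != field_kind::fp32)
        return -1;
      ++count;
    }
  return count;
}
```

Calling the function once with a null pointer and once with a real flag gives the "before" and "after" answers, and a difference between them is what would trigger the `-Wpsabi` note.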

Before this change, the test below would pass the arguments to `f` in
general registers.  After this change, the test passes the arguments to
`f` using the vector registers.

The new behaviour matches the behaviour of `armclang`, and also matches
the behaviour when run with `-std=gnu++14`.

> gcc -std=gnu++17 test.cpp

``` test.cpp
struct base {};

struct pair : base
{
  float first;
  float second;
  pair (float f, float s) : first(f), second(s) {}
};

void f (pair);
int main()
{
  f({3.14, 666});
  return 1;
}
```

We add a `-Wpsabi` warning to catch cases where this fix has changed the ABI for
some functions.  Unfortunately this warning is not emitted twice for multiple
calls to the same function, but I feel this is not much of a problem and can be
fixed later if needs be.

(i.e. if `main` called `f` twice in a row we only emit a diagnostic for the
first).

Testing:
Minimal testing done just to demonstrate new warning message and to check
tests related to this still pass.


gcc/ChangeLog:

2020-04-23  Matthew Malcomson  
Jakub Jelinek  

PR target/94383
* config/aarch64/aarch64.c (aapcs_vfp_sub_candidate): Account for C++17
empty base class artificial fields.
(aarch64_vfp_is_call_or_return_candidate): Warn when ABI PCS decision is
different after this fix.



### Attachment also inlined for ease of reply###


diff --git a/gcc/config/aarch64/aarch64.c b/gcc/config/aarch64/aarch64.c
index f728ac530a4ac968de16ea59d2b724ed63b23d6f..25f60ec44ae3a177d0aceb4bae92ea17da91ba05 100644
--- a/gcc/config/aarch64/aarch64.c
+++ b/gcc/config/aarch64/aarch64.c
@@ -16442,9 +16442,19 @@ aarch64_member_type_forces_blk (const_tree field_or_array, machine_mode mode)
If *MODEP is VOIDmode, then set it to the first valid floating point
type.  If a non-floating point type is found, or if a floating point
type that doesn't match a non-VOIDmode *MODEP is found, then return -1,
-   otherwise return the count in the sub-tree.  */
+   otherwise return the count in the sub-tree.
+
+   The AVOID_CXX17_EMPTY_BASE argument is to allow the caller to check whether
+   this function has changed its behavior after the fix for PR94384 -- this fix
+   is to avoid artificial fields in empty base classes.
+   When called with this argument as a NULL pointer this function does not
+   avoid the artificial fields -- this is useful to check whether the function
+   returns something different after the fix.
+   When called pointing at a value, this function avoids such artificial fields
+   and sets the value to TRUE when one of these fields has been set.  */
 static int
-aapcs_vfp_sub_candidate (const_tree type, machine_mode *modep)
+aapcs_vfp_sub_candidate (const_tree type, machine_mode *modep,
+bool *avoid_cxx17_empty_base)
 {
   machine_mode mode;
   HOST_WIDE_INT size;
@@ -16520,7 +16530,8 @@ aapcs_vfp_sub_candidate (const_tree type, machine_mode *modep)
|| TREE_CODE (TYPE_SIZE (type)) != INTEGER_CST)
  return -1;
 
-   count = aapcs_vfp_sub_candidate (TREE_TYPE (type), modep);
+   count = aapcs_vfp_sub_candidate (TREE_TYPE (type), modep,
+avoid_cxx17_empty_base);
if (count == -1
|| !index
|| !TYPE_MAX_VALUE (index)
@@ -16558,7 +16569,18 @@ aapcs_vfp_sub_candidate (const_tree type, machine_mode *modep)
if (TREE_CODE (field) != FIELD_DECL)
  continue;
 
-   sub_count = aapcs_vfp_sub_candidate (TREE_TYPE (field), modep);
+   /* Ignore C++17 empty base fields, while their type indicates
+  they do contain padding, they have zero size and thus don't
+  contain any padding.  */
+   if (cxx17_empty_base_field_p (field)
+   && avoid_cxx17_empty_base)
+ {
+   *avoid_cxx17_empty_base = true;
+   continue;
+ }
+
+   sub_count = aapcs_vfp_sub_candidate (TREE_TYPE (field), modep,
+avoid_cxx17_empty_base);
if (sub_count < 0)
  return -1;
count += sub_count

Re: [AArch64] (PR94383) Avoid C++17 empty base field checking for HVA/HFA

2020-04-23 Thread Richard Sandiford
Matthew Malcomson  writes:
> In C++17, an empty class deriving from an empty base is not an
> aggregate, while in C++14 it is.  In order to implement this, GCC adds
> an artificial field to such classes.
>
> This artificial field has no mapping to Fundamental Data Types in the
> AArch64 PCS ABI and hence should not count towards determining whether an
> object can be passed using the vector registers as per section
> "6.4.2 Parameter Passing Rules" in the AArch64 PCS.
> https://github.com/ARM-software/abi-aa/blob/master/aapcs64/aapcs64.rst#the-base-procedure-call-standard
>
> This patch avoids counting this artificial field in
> aapcs_vfp_sub_candidate, and hence calculates whether such objects
> should be passed in vector registers in the same manner as C++14 (where
> the artificial field does not exist).
>
> Before this change, the test below would pass the arguments to `f` in
> general registers.  After this change, the test passes the arguments to
> `f` using the vector registers.
>
> The new behaviour matches the behaviour of `armclang`, and also matches
> the behaviour when run with `-std=gnu++14`.
>
>> gcc -std=gnu++17 test.cpp
>
> ``` test.cpp
> struct base {};
>
> struct pair : base
> {
>   float first;
>   float second;
>   pair (float f, float s) : first(f), second(s) {}
> };
>
> void f (pair);
> int main()
> {
>   f({3.14, 666});
>   return 1;
> }
> ```
>
> We add a `-Wpsabi` warning to catch cases where this fix has changed the ABI
> for some functions.  Unfortunately this warning is not emitted twice for
> multiple calls to the same function, but I feel this is not much of a problem
> and can be fixed later if needs be.
>
> (i.e. if `main` called `f` twice in a row we only emit a diagnostic for the
> first).
>
> Testing:
> Minimal testing done just to demonstrate new warning message and to check
> tests related to this still pass.
>
>
> gcc/ChangeLog:
>
> 2020-04-23  Matthew Malcomson  
>   Jakub Jelinek  
>
>   PR target/94383
>   * config/aarch64/aarch64.c (aapcs_vfp_sub_candidate): Account for C++17
>   empty base class artificial fields.
>   (aarch64_vfp_is_call_or_return_candidate): Warn when ABI PCS decision is
>   different after this fix.

OK if it passes testing, thanks.

Richard


Re: [PATCH PR94708] rtl combine should consider NaNs when generate fp min/max

2020-04-23 Thread Segher Boessenkool
Hi!

On Thu, Apr 23, 2020 at 11:05:22AM +0200, Richard Biener wrote:
> On Thu, Apr 23, 2020 at 10:42 AM Zhanghaijian (A)
>  wrote:
> > This is a simple fix for pr94708.
> > It's unsafe for rtl combine to generate fp min/max under 
> > -funsafe-math-optimizations, considering NaNs.
> > We can only do this kind of transformation under 
> > -funsafe-math-optimizations and -ffinite-math-only.
> > Bootstrap and tested on aarch64 Linux platform. No new regression witnessed.
> >
> > Any suggestion?
> 
> Please do not check flags, instead use && !HONOR_NANS (mode)

Yeah, good point.

> What about signed zeros?

-funsafe-math-optimizations implies -fno-signed-zeros.

> The GENERIC folding routine producing
> min/max is avoiding it when those are honored (and it doesn't
> check flag_unsafe_math_optimizations at all).
> 
> Certainly the patch is an incremental correct fix, with the
> flag testing replaced by the mode feature testing.

Yeah, and the SMAX etc. definition is so weak that it isn't obvious
that this combine transform is valid without this flag.  We can or
should fix that, of course :-)


Segher


[PATCH] rs6000: Small improvement to the C++17 ABI fix [PR94707]

2020-04-23 Thread Jakub Jelinek via Gcc-patches
On Thu, Apr 23, 2020 at 05:24:19AM -0500, Segher Boessenkool wrote:
> > + inform (input_location,
> > + "parameter passing for argument of type %qT "
> > + "when C++17 is enabled changed to match C++14 "
> > + "in GCC 10.1", type);
> 
> It isn't "to match C++14".  It simply is a bugfix, we didn't follow
> the ABI before :-)

The reason for the exact wording was to make it clearer to the user
that C++17 doesn't have a different ABI from C++14 now, but it had in the
older releases.

Anyway, based on IRC discussion with Richard Sandiford, we should
probably test type uids instead of type pointers because type uids aren't
reused, but type pointers in a very bad luck case could be, and having the
static var at filescope and GTY((deletable)) is an overkill (and with costs
during GC time).
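
The idea can be sketched outside GCC as follows. `type_stub` and `should_report` are hypothetical, and the sketch assumes that a uid of 0 is never handed out, so the zero-initialised static never matches a real type by accident.

```cpp
#include <cassert>

// Hypothetical stand-in for a type node; only the uid matters here.
struct type_stub { unsigned uid; };

// True when a note should be emitted for TYPE, i.e. when its uid differs
// from that of the most recently reported type.  A plain integer is kept
// instead of a pointer, so nothing dangles if the node is freed and its
// address later reused for another type.
inline bool should_report (const type_stub &type)
{
  static unsigned last_reported_uid;  // zero-initialised
  assert (type.uid != 0);
  if (type.uid == last_reported_uid)
    return false;
  last_reported_uid = type.uid;
  return true;
}
```

Only consecutive repeats are suppressed: reporting type A, then B, then A again emits three notes, which matches the "most recently diagnosed" wording in the ChangeLog entry below.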

Ok if it passes bootstrap/regtest?

2020-04-23  Jakub Jelinek  

PR target/94707
* config/rs6000/rs6000-call.c (rs6000_discover_homogeneous_aggregate):
Use TYPE_UID (TYPE_MAIN_VARIANT (type)) instead of type to check
if the same type has been diagnosed most recently already.

--- gcc/config/rs6000/rs6000-call.c.jj  2020-04-23 09:59:12.002172006 +0200
+++ gcc/config/rs6000/rs6000-call.c 2020-04-23 13:42:10.037745872 +0200
@@ -5739,14 +5739,15 @@ rs6000_discover_homogeneous_aggregate (m
*n_elts = field_count;
  if (cxx17_empty_base_seen && warn_psabi)
{
- static const_tree last_reported_type;
- if (type != last_reported_type)
+ static unsigned last_reported_type_uid;
+ unsigned uid = TYPE_UID (TYPE_MAIN_VARIANT (type));
+ if (uid != last_reported_type_uid)
{
  inform (input_location,
  "parameter passing for argument of type %qT "
  "when C++17 is enabled changed to match C++14 "
  "in GCC 10.1", type);
- last_reported_type = type;
+ last_reported_type_uid = uid;
}
}
  return true;


Jakub



Re: [PATCH, libgfortran] Protect the trigd functions in libgfortran from unavailable math functions [PR94586, PR94694]

2020-04-23 Thread Jakub Jelinek via Gcc-patches
On Wed, Apr 22, 2020 at 01:13:47PM -0400, Fritz Reese via Gcc-patches wrote:
> Jakub has OK'd the patch and I recently authored the original trigd

My patch is already in.

> code being modified here. After his patch is committed I will commit
> this one unless others have comments or concerns.
> 
> libgfortran/ChangeLog:
> 
> 2020-04-22  Fritz Reese  
> 
> * intrinsics/trigd.c, intrinsics/trigd_lib.inc, intrinsics/trigd.inc:
> Guard against unavailable math functions.
> Use suffixes from kinds.h based on the REAL kind.
> 
> gcc/fortran/ChangeLog:
> 
> 2020-04-22  Fritz Reese  
> 
> * trigd_fe.inc: Use mpfr to compute cosd(30) rather than a host-
> precision floating point literal based on an invalid macro.

Ok for trunk, thanks.

Jakub



Re: [PATCH][RFC] extend DECL_GIMPLE_REG_P to all types

2020-04-23 Thread Richard Biener
On Wed, 22 Apr 2020, Richard Sandiford wrote:

> Richard Biener  writes:
> > On Wed, 22 Apr 2020, Richard Biener wrote:
> >
> >> 
> >> This extends DECL_GIMPLE_REG_P to all types so we can clear
> >> TREE_ADDRESSABLE even for integers with partial defs, not just
> >> complex and vector variables.  To make that transition easier
> >> the patch inverts DECL_GIMPLE_REG_P to DECL_NOT_GIMPLE_REG_P
> >> since that makes the default the current state for all other
> >> types besides complex and vectors.  That also nicely simplifies
> >> code throughout the compiler.
> >> 
> >> TREE_ADDRESSABLE and DECL_NOT_GIMPLE_REG_P are now truly
> >> independent, either set prevents a decl from being rewritten
> >> into SSA form.
> >> 
> >> For the testcase in PR94703 we're able to expand the partial
> >> def'ed local integer to a register then, producing a single
> >> movl rather than going through the stack.
> >> 
> >> Bootstrapped on x86_64-unknown-linux-gnu, testing in progress.
> >> 
> >> If there are no objections I'm going to install this once
> >> stage1 opens.
> >
> > Of course there was some fallout.  On 32bit x86 gcc.dg/torture/pr71522.c
> > fails execution because while the GIMPLE is unchanged at RTL expansion
> > time:
> >
> > main ()
> > {
> >   char s[12];
> >   long double d;
> >
> >   MEM  [(char * {ref-all})&d] = MEM  
> > [(char * {ref-all})"AAA"];
> >   MEM  [(char * {ref-all})&s] = MEM  
> > [(char * {ref-all})&d];
> >   _1 = __builtin_strcmp (&s, "AAA");
> >   if (_1 != 0)
> > ...
> >
> > we now assign 'd' an XFmode register (TREE_ADDRESSABLE is cleared
> > now since we can set DECL_NOT_GIMPLE_REG_P).  The case is lost
> > then, impossible to fix up AFAICS.  On x86 all moves to/from
> > XFmode are normalizing, specifically we end up with
> >
> > fldt    .LC0
> > fstpt   (%esp)
> >
> > now the most appealing solution - and totally in the opposite
> > direction of this patch - is to simply stop expanding non-SSA names
> > as pseudos.  I do not remember the history as why we do this
> > but it's likely remnants we preserved from either pre-SSA times, times
> > we did not go into SSA for -O0, or times we really went out-of-SSA.
> >
> > There is _some_ good reason to expand a non-SSA "register" into
> > a pseudo though - namely that RTL is not SSA and thus can accept
> > partial defs.  And of course that RTL cannot get rid of a stack
> > slot assigned to a variable.  Today we have somewhat robust
> > infrastructure to deal with partial defs on GIMPLE, namely
> > BIT_INSERT_EXPR, but it's not fully exercised.
> 
> Yeah, not being able to get rid of the stack slot seems
> worrying here.
> 
> > It's of course possible to fixup the above problematical
> > cases (there's precedent with discover_nonconstant_array_refs,
> > which could be "easily" extended to handle "weird" accesses
> > of non-integral-mode variables) but with the recent discussion
> > on making RTL expansion more straight-forward I'd bring up
> > the above idea ... it would get rid of quite some special
> > code dealing with tcc_reference trees (and MEM_REFs) ending
> > up operating on registers.
> 
> It might be nice to do it eventually, but I think at least
> is_gimple_reg_type would need to be "return true" first,
> otherwise we'll lose too much on aggregates.
> 
> There's also the problem that things passed in registers do need
> to be RTL registers at function boundaries, so I'm not sure all
> the expand code would necessarily go away.
> 
> Wouldn't want to see all targets suffer for XFmode oddities :-)

OK, so here's the patch amended with a heuristic to catch
this.  The heuristic triggers exactly on the previously
failing testcase and on nothing else in an x86_64 bootstrap and regtest.
Citing the code:

/* If there's a chance to get a pseudo for t then if it would be of float mode
   and the actual access is via an integer mode (lowered memcpy or similar
   access) then avoid the register expansion if the mode likely is not storage
   suitable for raw bits processing (like XFmode on i?86).  */

static void
avoid_type_punning_on_regs (tree t)
{
  machine_mode access_mode = TYPE_MODE (TREE_TYPE (t));
  if (access_mode != BLKmode
      && !SCALAR_INT_MODE_P (access_mode))
    return;
  tree base = get_base_address (t);
  if (DECL_P (base)
      && !TREE_ADDRESSABLE (base)
      && FLOAT_MODE_P (DECL_MODE (base))
      && maybe_lt (GET_MODE_PRECISION (DECL_MODE (base)),
                   GET_MODE_BITSIZE (GET_MODE_INNER (DECL_MODE (base))))
      /* Double check in the expensive way we really would get a pseudo.  */
      && use_register_for_decl (base))
    TREE_ADDRESSABLE (base) = 1;
}

invoked on stores like

  if (gimple_vdef (stmt))
    {
      tree t = gimple_get_lhs (stmt);
      if (t && REFERENCE_CLASS_P (t))
        avoid_type_punning_on_regs (t);
    }
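
The decision the heuristic makes can be modelled outside the compiler roughly like this. It is a sketch under stated assumptions: toy_mode, toy_decl and avoid_punning are invented names, the use_register_for_decl double check is left out, and the 80/96 figures used below are my reading of XFmode on 32-bit x86 (they match the 12-byte copy in the testcase).

```c
#include <assert.h>
#include <stdbool.h>

/* Toy description of a machine mode; names and fields are illustrative only.  */
enum toy_class { TOY_INT, TOY_FLOAT, TOY_BLK };

struct toy_mode
{
  enum toy_class mclass;
  unsigned precision;   /* bits that carry value */
  unsigned bitsize;     /* bits occupied in storage */
};

/* Toy variable: its mode plus an "address taken, keep in memory" flag.  */
struct toy_decl
{
  struct toy_mode mode;
  bool addressable;
};

/* A float mode whose precision is smaller than its storage size has
   padding bits that a register move need not preserve.  */
static bool
float_mode_has_padding (struct toy_mode m)
{
  return m.mclass == TOY_FLOAT && m.precision < m.bitsize;
}

/* Model of a store to DECL done in mode ACCESS.  If the store is a raw-bits
   access (integer or block mode) and DECL would otherwise live in a float
   register with padding, force DECL to memory.  Return true if forced.  */
static bool
avoid_punning (struct toy_mode access, struct toy_decl *decl)
{
  assert (decl != 0);
  if (access.mclass != TOY_BLK && access.mclass != TOY_INT)
    return false;
  if (!decl->addressable && float_mode_has_padding (decl->mode))
    {
      decl->addressable = true;
      return true;
    }
  return false;
}
```

With a variable of mode { TOY_FLOAT, 80, 96 }, a block-mode store forces it to memory, while a float-mode store or a variable of mode { TOY_FLOAT, 64, 64 } is left alone.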

loads are not an issue on their own.  So the basic idea is to rule
out float-mode pseudos whi

Re: [RFC] split pseudos during loop unrolling in RTL unroller

2020-04-23 Thread Segher Boessenkool
On Thu, Apr 23, 2020 at 12:52:30PM +0200, Richard Biener wrote:
> On Thu, Apr 23, 2020 at 12:17 PM Segher Boessenkool
>  wrote:
> >
> > On Thu, Apr 23, 2020 at 09:32:37AM +0200, Richard Biener wrote:
> > > On Thu, Apr 23, 2020 at 12:31 AM Jeff Law  wrote:
> > > > On Wed, 2020-04-22 at 15:50 -0500, Segher Boessenkool wrote:
> > > > > > > In some ways it feels like it would be easier to resurrect RTL SSA :-)
> > > > >
> > > > > Why was RTL SSA abandoned?
> > > > >
> > > > > It might well work to keep everything in SSA form all the way to RA.
> > > > > Hrm, that doesn't sound bad at all :-)
> > > > >
> > > > > (The PHIs need to be made explicit to something that resembles the
> > > > > machine code we will end up with, very early in the pipeline, but it
> > > > > could still also be some valid SSA form; and we can of course also
> > > > > have hard registers in all RTL, so that needs to be dealt with sanely
> > > > > some way as well  Lots of details, I don't see a crucial problem though,
> > > > > probably means I need to look harder ;-) )
> > > > Lack of time mostly.  There's some complications like subregs, argument registers
> > > > and the like.  But you can restrict ssa based analysis & optimizations to just
> > > > the set of pseudos that are in SSA form and do something more conservative on the
> > > > rest.
> > >
> > > I guess time is better spent on trying to extend GIMPLE + SSA up to RA, thus
> > > make instruction selection on GIMPLE.
> >
> > I think this is a bad idea.  By the time you have invented enough new
> > "lower GIMPLE" ("limple"?) to be able to use it to describe machine
> > insns like we can with RTL, you will have a more verbose, more memory
> > hungry, slower, etc. reinvented RTL.
> 
> I don't think there's much to invent.
> 
> I think at least one step would be uncontroversical(?), namely moving
> the RTL expansion "magic"
> up to a GIMPLE pass.  Where the "magic" would be to turn
> GIMPLE stmts not directly expandable via an existing optab into
> GIMPLE that can be trivially expanded.  That includes eventually
> combining multiple stmts into more powerful instructions and
> doing the magic we have in, like, expand_binop (widening, etc.).
> Where there's not a 1:1 mapping of a GIMPLE stmt to an optab
> GIMPLE gets direct-internal-fn calls.
> Then RTL expansion would be mostly invoking gen_insn (optab-code).

Most of expand is *other stuff*.  Expand does a *lot* of things that are
actually changing the code.  And much of that is not done anywhere else
either yet, so this cannot be fixed by simply deleting the offending code.

> More controversical would be ending up in GIMPLE there.  I think
> GIMPLE can handle all RTL insns if we massage GIMPLE_ASM
> a bit.  You'd end up with, say,
> 
>  asm ("(set (reg:DI $0)
> (and:DI (reg/v:DI $1 [ dst ])
> (reg:DI $2)))" : "r" (_1) : "r" (_2), "r" (_3) : "cc");
> 
> in place of
> 
>   _1 = _2 & _3;
> 
> and the GIMPLE_ASM text could be actual RTL.  We'd extend
> the stmt with an extra operand to denote recognized patterns,
> so another option would be to keep the original GIMPLE as well.

Why would you ever want to do that?  That would take much more memory,
and RTL's memory use until recently always was a pain point.

> > RTL is a *feature*, and it is one of the things that makes GCC
> > significantly better than the competition.
> 
> That said, I actually agree with that.  It's just that I hope we can
> make some of the knowledge just represented on the RTL side
> available on the GIMPLE side.  The more complicated parts,
> like calling conventions, that is.

Yeah, and like I said, some things (unroll...) should move to GIMPLE as
well, most of it anyway.  And some of the remaining RTL code needs a
good overhaul (oh hello CSE).

> And yes, I want to get rid of that expand monster to be able to
> do something like sched1 on "GIMPLE" without expand coming
> along and re-scheduling everything at-will.

Right, ideally, expand would just translate GIMPLE to RTL one-to-one
(well, few-to-few, whatever :-) ).  But it does so much other stuff now,
so all that has to be moved or reimplemented or whatever.

> > More optimisations should move to GIMPLE, for example some loop
> > optimisations should be done much earlier (most unrolling).  The expand
> > pass should lose most of the "optimisations" it has built up over the
> > decades (that now often are detrimental at best).  Some of what expand
> > now does should probably be done while still in GIMPLE, even.
> >
> > But it is very useful to have a separate "low level" representation,
> > that is actually close to the machine code we will eventually generate.
> > RTL is one such representation, and we already have it, and it is very
> > well tuned by now -- throwing it away would require some *huge*
> > advantage, because the costs of doing that are immense as well.
> 
> But being stuck with something means no 

Re: [PATCH] amdgcn: Add stub personality function

2020-04-23 Thread Andrew Stubbs

On 23/04/2020 12:21, Kwok Cheung Yeung wrote:
> I agree that not generating the problematic code in the first place is
> the better approach. Does that mean we can now remove
> libgcc/config/gcn/unwind-gcn.c completely?


That was added for the benefit of libgfortran, not C++. It's used by the 
backtrace code called from the stop handler, among other places, and I 
don't believe there was another workaround for that (without adding 
extra config checks to libgfortran).


I had not got to Thomas's patch yet, so I hadn't noticed they solved the 
same problem.


It probably is better to use Thomas's patch. I'm just worried that it'll 
catch me out when we do implement this stuff, in a way that the stub 
would not.


Andrew


Re: [PATCH] Do not remove ifunc_resolver in LTO.

2020-04-23 Thread Jan Hubicka
> On 4/22/20 8:11 PM, Jan Hubicka wrote:
> > > On Mon, 2020-04-20 at 11:34 +0200, Martin Liška wrote:
> > > > Hi.
> > > > 
> > > > The patch prevents a ifunc alias from removal in remove unreachable nodes.
> > > > Note that ifunc alias lives in a COMDAT section and so that
> > > > cgraph_node::can_remove_if_no_direct_calls_and_refs_p returned true for it.
> > > > 
> > > > Patch can bootstrap on x86_64-linux-gnu and survives regression tests.
> > > > I was unable to create a lto test-case where a linked binary could be
> > > > scanned for assembly.
> > > > 
> > > > Ready to be installed?
> > > > Thanks,
> > > > Martin
> > > > 
> > > > gcc/ChangeLog:
> > > > 
> > > > 2020-04-20  Martin Liska  
> > > > 
> > > > PR lto/94659
> > > > * cgraph.h (cgraph_node::can_remove_if_no_direct_calls_and_refs_p):
> > > > Do not remove ifunc_resolvers in remove unreachable nodes in LTO.
> > > OK
> > Is it intended to keep the comdat group alive even when the function is
> > not used by the current translation unit?
> 
> Yes, if you take a look at the mentioned PR, the function is exported and used
> by a different TU.
> 
> Or do you have a particular test-case which you're talking about?

Well, first I think you want to use force_output flag when you create
the alias instead of modifying can_remove_if_no_direct_calls_and_refs_p.

I wonder what happens when your function is static and we optimize away
all uses of it - then I think GCC should optimize it away.
Similarly if the function is comdat:
__attribute__((target_clones("default,avx")))
inline
int f1()
{
  return 2;
}
int
main()
{
  return f1();
}

We should avoid outputting it to every compilation unit that includes
the header.

I do not recall: why is that function comdat in the first place?

Honza


Re: [RFC] split pseudos during loop unrolling in RTL unroller

2020-04-23 Thread Richard Biener via Gcc-patches
On Thu, Apr 23, 2020 at 2:07 PM Segher Boessenkool
 wrote:
>
> On Thu, Apr 23, 2020 at 12:52:30PM +0200, Richard Biener wrote:
> > On Thu, Apr 23, 2020 at 12:17 PM Segher Boessenkool
> >  wrote:
> > >
> > > On Thu, Apr 23, 2020 at 09:32:37AM +0200, Richard Biener wrote:
> > > > On Thu, Apr 23, 2020 at 12:31 AM Jeff Law  wrote:
> > > > > On Wed, 2020-04-22 at 15:50 -0500, Segher Boessenkool wrote:
> > > > > > > > In some ways it feels like it would be easier to resurrect RTL SSA :-)
> > > > > >
> > > > > > Why was RTL SSA abandoned?
> > > > > >
> > > > > > It might well work to keep everything in SSA form all the way to RA.
> > > > > > Hrm, that doesn't sound bad at all :-)
> > > > > >
> > > > > > (The PHIs need to be made explicit to something that resembles the
> > > > > > machine code we will end up with, very early in the pipeline, but it
> > > > > > could still also be some valid SSA form; and we can of course also
> > > > > > have hard registers in all RTL, so that needs to be dealt with sanely
> > > > > > some way as well  Lots of details, I don't see a crucial problem though,
> > > > > > probably means I need to look harder ;-) )
> > > > > Lack of time mostly.  There's some complications like subregs, argument registers
> > > > > and the like.  But you can restrict ssa based analysis & optimizations to just
> > > > > the set of pseudos that are in SSA form and do something more conservative on the
> > > > > rest.
> > > >
> > > > I guess time is better spent on trying to extend GIMPLE + SSA up to RA, thus
> > > > make instruction selection on GIMPLE.
> > >
> > > I think this is a bad idea.  By the time you have invented enough new
> > > "lower GIMPLE" ("limple"?) to be able to use it to describe machine
> > > insns like we can with RTL, you will have a more verbose, more memory
> > > hungry, slower, etc. reinvented RTL.
> >
> > I don't think there's much to invent.
> >
> > I think at least one step would be uncontroversical(?), namely moving
> > the RTL expansion "magic"
> > up to a GIMPLE pass.  Where the "magic" would be to turn
> > GIMPLE stmts not directly expandable via an existing optab into
> > GIMPLE that can be trivially expanded.  That includes eventually
> > combining multiple stmts into more powerful instructions and
> > doing the magic we have in, like, expand_binop (widening, etc.).
> > Where there's not a 1:1 mapping of a GIMPLE stmt to an optab
> > GIMPLE gets direct-internal-fn calls.
> > Then RTL expansion would be mostly invoking gen_insn (optab-code).
>
> Most of expand is *other stuff*.  Expand does a *lot* of things that are
> actually changing the code.  And much of that is not done anywhere else
> either yet, so this cannot be fixed by simply deleting the offending code.
>
> > More controversical would be ending up in GIMPLE there.  I think
> > GIMPLE can handle all RTL insns if we massage GIMPLE_ASM
> > a bit.  You'd end up with, say,
> >
> >  asm ("(set (reg:DI $0)
> > (and:DI (reg/v:DI $1 [ dst ])
> > (reg:DI $2)))" : "r" (_1) : "r" (_2), "r" (_3) : "cc");
> >
> > in place of
> >
> >   _1 = _2 & _3;
> >
> > and the GIMPLE_ASM text could be actual RTL.  We'd extend
> > the stmt with an extra operand to denote recognized patterns,
> > so another option would be to keep the original GIMPLE as well.
>
> Why would you ever want to do that?  That would take much more memory,
> and RTL's memory use until recently always was a pain point.

It's not the RTL IL that uses much memory, it's infrastructure like DF
that easily blows up.  Mind that GCC has only a single function in
RTL at a time but the whole program in GIMPLE.  And GIMPLE
includes "DF" by default.

> > > RTL is a *feature*, and it is one of the things that makes GCC
> > > significantly better than the competition.
> >
> > That said, I actually agree with that.  It's just that I hope we can
> > make some of the knowledge just represented on the RTL side
> > available on the GIMPLE side.  The more complicated parts,
> > like calling conventions, that is.
>
> Yeah, and like I said, some things (unroll...) should move to GIMPLE as
> well, most of it anyway.  And some of the remaining RTL code needs a
> good overhaul (oh hello CSE).
>
> > And yes, I want to get rid of that expand monster to be able to
> > do something like sched1 on "GIMPLE" without expand coming
> > along and re-scheduling everything at-will.
>
> Right, ideally, expand would just translate GIMPLE to RTL one-to-one
> (well, few-to-few, whatever :-) ).  But it does so much other stuff now,
> so all that has to be moved or reimplemented or whatever.
>
> > > More optimisations should move to GIMPLE, for example some loop
> > > optimisations should be done much earlier (most unrolling).  The expand
> > > pass should lose most of the "optimisations" it has built up over the
> > > decades (that now often are detrimental at best).  Some of what ex

Re: [PATCH] rs6000: Small improvement to the C++17 ABI fix [PR94707]

2020-04-23 Thread Segher Boessenkool
On Thu, Apr 23, 2020 at 01:48:00PM +0200, Jakub Jelinek wrote:
> On Thu, Apr 23, 2020 at 05:24:19AM -0500, Segher Boessenkool wrote:
> > > +   inform (input_location,
> > > +   "parameter passing for argument of type %qT "
> > > +   "when C++17 is enabled changed to match C++14 "
> > > +   "in GCC 10.1", type);
> > 
> > It isn't "to match C++14".  It simply is a bugfix, we didn't follow
> > the ABI before :-)
> 
> The reason for the exact wording was to make it clearer to the user
> that C++17 doesn't have a different ABI from C++14 now, but it had in the
> older releases.

No, it used the same ABI then as well, but with a buggy implementation :-)

The ABI is not determined by GCC.

> Anyway, based on IRC discussion with Richard Sandiford on IRC, we should
> probably test type uids instead of type pointers because type uids aren't
> reused, but type pointers in a very bad luck case could be, and having the
> static var at filescope and GTY((deletable)) is an overkill (and with costs
> during GC time).
> 
> Ok if it passes bootstrap/regtest?
> 
> 2020-04-23  Jakub Jelinek  
> 
>   PR target/94707
>   * config/rs6000/rs6000-call.c (rs6000_discover_homogeneous_aggregate):
>   Use TYPE_UID (TYPE_MAIN_VARIANT (type)) instead of type to check
>   if the same type has been diagnosed most recently already.
> 
> --- gcc/config/rs6000/rs6000-call.c.jj  2020-04-23 09:59:12.002172006 +0200
> +++ gcc/config/rs6000/rs6000-call.c   2020-04-23 13:42:10.037745872 +0200
> @@ -5739,14 +5739,15 @@ rs6000_discover_homogeneous_aggregate (m
>   *n_elts = field_count;
> if (cxx17_empty_base_seen && warn_psabi)
>   {
> -   static const_tree last_reported_type;
> -   if (type != last_reported_type)
> +   static unsigned last_reported_type_uid;
> +   unsigned uid = TYPE_UID (TYPE_MAIN_VARIANT (type));
> +   if (uid != last_reported_type_uid)
>   {
> inform (input_location,
> "parameter passing for argument of type %qT "
> "when C++17 is enabled changed to match C++14 "
> "in GCC 10.1", type);
> -   last_reported_type = type;
> +   last_reported_type_uid = uid;
>   }
>   }
> return true;

That looks fine, please go ahead.  Thanks :-)


Segher


[PATCH] aarch64: add tests for CPP predefines under -mgeneral-regs-only

2020-04-23 Thread Yangfei (Felix)
Hi,

I noticed that gcc.target/aarch64/pragma_cpp_predefs_1.c tests
-mgeneral-regs-only.
This patch adds similar testing to the following two tests, to make sure that
redefinition of the CPP predefines on #pragma works as expected when the
-mgeneral-regs-only option is specified (see PR94678):
gcc.target/aarch64/pragma_cpp_predefs_2.c
gcc.target/aarch64/pragma_cpp_predefs_3.c

The two tests pass with the modification.  OK?

gcc/testsuite/
PR target/94678
* gcc.target/aarch64/pragma_cpp_predefs_2.c: Fix typos, pop_pragma ->
pop_options. Add tests for general-regs-only.
* gcc.target/aarch64/pragma_cpp_predefs_3.c: Add tests for
general-regs-only.

Thanks for your help,
Felix


add-sve-predef-tests-v1.diff
Description: add-sve-predef-tests-v1.diff


Re: [RFC] split pseudos during loop unrolling in RTL unroller

2020-04-23 Thread Segher Boessenkool
On Thu, Apr 23, 2020 at 02:25:40PM +0200, Richard Biener wrote:
> > > But being stuck with something means no progress...  I know
> > > very well it's 100 times harder to get rid of something than to
> > > add something new ontop.
> >
> > Well, what progress do you expect to make?  After expand that is :-)
> 
> I'd like the RTL pipeline before RA to shrink significantly, no PRE,
> no CSE, ...

RTL CSE for example is very much required to get any good code.  It
needs to CSE stuff that wasn't there before expand.

The pass currently does much more (as well as not enough), of course.

> The important part before RA is robust and intelligent
> instruction selection - part of which is already done at RTL expansion
> time.

LOL.

The expand pass doesn't often make good choices, and it *shouldn't*, it
should not make many choices at all; it should just generate valid RTL,
new pseudos for everything, and let later RTL passes make faster code
from that.

> > Most of what is done in RTL is done very well.
> 
> Umm, well...  I beg to differ with regard to DF and passes like
> postreload-gcse.

What is wrong with DF?

Is there something particular in postreload-gcse that is bad?  To me it
always is just one of those passes that doesn't do anything :-)  That
can and should be cleaned up, sure :-)

> > Replacing specific things in RTL, maybe as big as whole passes, already
> > is hard to do without regressing (a *lot*).  And if there is no real
> > reason to do that...
> 
> The motivation is to make GCC faster, obviously.  Not spending time
> doing things that should have been done before (RTL PRE vs. GIMPLE PRE, etc.).
> Using the same infrastructure (what, no loop dependency analysis on RTL?), etc.

But everything you want to remove isn't high on profiles anyway?  And
you proposed adding bigger, slower, stuff to replace it all with.

Slow are RA, and the language frontends (esp. C++ like to take 15% of
total :-/)

> You could say we should do more on RTL and enhance it instead, like do
> vectorization where we actually could have a better idea on costs and
> capabilities.  But I'm a GIMPLE person and don't know enough of RTL to
> enhance it ...

Oh no, I think we should do more earlier, and GIMPLE is a fine IR for
there.  But for low-level, close-to-the-machine stuff, RTL is much
better suited.  And we *do* want to optimise at that level as well, and
much more than just peepholes.


Segher


Re: [RFC] split pseudos during loop unrolling in RTL unroller

2020-04-23 Thread Richard Biener via Gcc-patches
On Thu, Apr 23, 2020 at 2:52 PM Segher Boessenkool
 wrote:
>
> On Thu, Apr 23, 2020 at 02:25:40PM +0200, Richard Biener wrote:
> > > > But being stuck with something means no progress...  I know
> > > > very well it's 100 times harder to get rid of something than to
> > > > add something new ontop.
> > >
> > > Well, what progress do you expect to make?  After expand that is :-)
> >
> > I'd like the RTL pipeline before RA to shrink significantly, no PRE,
> > no CSE, ...
>
> RTL CSE for example is very much required to get any good code.  It
> needs to CSE stuff that wasn't there before expand.

Sure, but then we should fix that!

> The pass currently does much more (as well as not enough), of course.
>
> > The important part before RA is robust and intelligent
> > instruction selection - part of which is already done at RTL expansion
> > time.
>
> LOL.
>
> The expand pass doesn't often make good choices, and it *shouldn't*, it
> should not make many choices at all; it should just generate valid RTL,
> new pseudos for everything, and let later RTL passes make faster code
> from that.

But valid RTL is instructions that are recognized.  Which means
when the target doesn't support an SImode add we may not create
one.  That's instruction selection ;)
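
To make the point concrete, here is a toy model of that choice. It is not how expand or the optabs machinery is implemented; toy_target, has_add and select_add_mode are invented for illustration, and the only idea taken from the discussion is "if the wanted mode has no add pattern, pick a wider mode that has one".

```c
#include <assert.h>
#include <stdbool.h>

/* Integer modes in increasing width; TOY_NMODES doubles as "none".  */
enum toy_imode { TOY_QI, TOY_HI, TOY_SI, TOY_DI, TOY_NMODES };

/* Hypothetical target description: one flag per mode saying whether
   the target has an add pattern in that mode.  */
struct toy_target
{
  bool has_add[TOY_NMODES];
};

/* Return the narrowest mode at least as wide as WANTED in which TARGET
   can add, or TOY_NMODES if there is none (a real compiler would then
   fall back to something else, e.g. a library call).  */
static enum toy_imode
select_add_mode (const struct toy_target *target, enum toy_imode wanted)
{
  assert (target != 0 && wanted < TOY_NMODES);
  for (int m = wanted; m < TOY_NMODES; m++)
    if (target->has_add[m])
      return (enum toy_imode) m;
  return TOY_NMODES;
}
```

On a target that only has a DImode add, asking for an SImode add yields TOY_DI: already a selection decision, made before any later RTL pass gets to look at the code.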

> > > Most of what is done in RTL is done very well.
> >
> > Umm, well...  I beg to differ with regard to DF and passes like
> > postreload-gcse.
>
> What is wrong with DF?

It's slow and memory hungry?

> Is there something particular in postreload-gcse that is bad?  To me it
> always is just one of those passes that doesn't do anything :-)  That
> can and should be cleaned up, sure :-)

postreload-gcse is ad-hoc, it uses full blown gcse tools that easily
blow up (compute_transp) when it doesn't really require it
(I've fixed things up a bit in dc91c65378cd0e6c0).  But I wonder why,
if we want to do PRE of loads, we don't simply schedule another
gcse pass rather than implementing a new one.  IIRC what the pass
does could be done with much more local dataflow.  Both
postreload gcse and cse are major time-hogs on "bad" testcases :/

> > > Replacing specific things in RTL, maybe as big as whole passes, already
> > > is hard to do without regressing (a *lot*).  And if there is no real
> > > reason to do that...
> >
> > The motivation is to make GCC faster, obviously.  Not spending time
> > doing things that should have been done before (RTL PRE vs. GIMPLE PRE, etc.).
> > Using the same infrastructure (what, no loop dependency analysis on RTL?), etc.
>
> But everything you want to remove isn't high on profiles anyway?  And
> you proposed adding bigger, slower, stuff to replace it all with.
>
> Slow are RA, and the language frontends (esp. C++ like to take 15% of
> total :-/)
>
> > You could say we should do more on RTL and enhance it instead, like do
> > vectorization where we actually could have a better idea on costs and
> > capabilities.  But I'm a GIMPLE person and don't know enough of RTL to
> > enhance it ...
>
> Oh no, I think we should do more earlier, and GIMPLE is a fine IR for
> there.  But for low-level, close-to-the-machine stuff, RTL is much
> better suited.  And we *do* want to optimise at that level as well, and
> much more than just peepholes.

Well, everything that requires costing (unrolling, vectorization,
IV selection to name a few) _is_ close-to-the-machine.  We're
just saying they are not because GIMPLE is so much easier to
work with here (not sure why exactly...).

Richard.

>
> Segher


Re: [PATCH] aarch64, libgcc: Fix unwinding from pac-ret to normal frames [PR94514]

2020-04-23 Thread Szabolcs Nagy
The 04/22/2020 15:22, Christophe Lyon wrote:
> The new test fails with ilp32, not sure if that's supposed to work?
> 
> FAIL: gcc.target/aarch64/pr94514.c (test for excess errors)
> Excess errors:
> /gcc/testsuite/gcc.target/aarch64/pr94514.c:27:4: warning: cast to
> pointer from integer of different size [-Wint-to-pointer-cast]
> 
> spawn 
> /aci-gcc-fsf/builds/gcc-fsf-gccsrc/tools/aarch64-none-elf/invoke-foundation-v8-bare-metal.sh
> ./pr94514.exe
> force_unwind_stop: CFA: 0xef80 PC: 0x80001304 actions: 10
> force_unwind_stop: CFA: 0xef90 PC: 0x8000133c actions: 10
> Terminated by exception.
> 
> *** EXIT code 126
> gcc.target/aarch64/pr94514.c execution test (reason: TCL LOOKUP CHANNEL exp7)
> FAIL: gcc.target/aarch64/pr94514.c execution test
> 
> (executed using the Foundation Model)
> 
> 
> The C++ test compiles without warnings, but fails at execution too
...
> Maybe you just want to skip the test for ilp32?

i didn't test ilp32, i would expect a compile time error for

__attribute__((target("branch-protection=pac-ret")))

on ilp32, or just ignoring it (which would have worked),
runtime error for this on a pac-enabled system is nasty.

i disabled the test on ilp32 as an obvious fix (attached),
and raised PR94729 for the attribute handling in ilp32.

thanks for catching this.
From 744b3e4478df83f54543964b8eb7250eb9bb6d40 Mon Sep 17 00:00:00 2001
From: Szabolcs Nagy 
Date: Thu, 23 Apr 2020 11:26:10 +0100
Subject: [PATCH] aarch64: disable tests on ilp32 [PR94514]

branch-protection=pac-ret is only supported with lp64 abi.

gcc/testsuite/ChangeLog:

	PR target/94514
	* g++.target/aarch64/pr94514.C: Require lp64.
	* gcc.target/aarch64/pr94514.c: Likewise.
---
 gcc/testsuite/ChangeLog| 6 ++
 gcc/testsuite/g++.target/aarch64/pr94514.C | 1 +
 gcc/testsuite/gcc.target/aarch64/pr94514.c | 1 +
 3 files changed, 8 insertions(+)

diff --git a/gcc/testsuite/ChangeLog b/gcc/testsuite/ChangeLog
index 245c1512c76..7e676f053a5 100644
--- a/gcc/testsuite/ChangeLog
+++ b/gcc/testsuite/ChangeLog
@@ -1,3 +1,9 @@
+2020-04-23  Szabolcs Nagy  
+
+	PR target/94514
+	* g++.target/aarch64/pr94514.C: Require lp64.
+	* gcc.target/aarch64/pr94514.c: Likewise.
+
 2020-04-23  Jakub Jelinek  
 
 	PR target/94707
diff --git a/gcc/testsuite/g++.target/aarch64/pr94514.C b/gcc/testsuite/g++.target/aarch64/pr94514.C
index 2a8c949ba30..ae925cafeb6 100644
--- a/gcc/testsuite/g++.target/aarch64/pr94514.C
+++ b/gcc/testsuite/g++.target/aarch64/pr94514.C
@@ -1,5 +1,6 @@
 /* PR target/94514. Unwind across mixed pac-ret and non-pac-ret frames.  */
 /* { dg-do run } */
+/* { dg-require-effective-target lp64 } */
 
 __attribute__((noinline, target("branch-protection=pac-ret")))
 static void do_throw (void)
diff --git a/gcc/testsuite/gcc.target/aarch64/pr94514.c b/gcc/testsuite/gcc.target/aarch64/pr94514.c
index bbbf5a6b0b3..cbc940421d2 100644
--- a/gcc/testsuite/gcc.target/aarch64/pr94514.c
+++ b/gcc/testsuite/gcc.target/aarch64/pr94514.c
@@ -1,5 +1,6 @@
 /* PR target/94514. Unwind across mixed pac-ret and non-pac-ret frames.  */
 /* { dg-do run } */
+/* { dg-require-effective-target lp64 } */
 /* { dg-options "-fexceptions -O2" } */
 
 #include 
-- 
2.17.1



Re: [PATCH] coroutines: Handle lambda capture objects in the way as clang.

2020-04-23 Thread Iain Sandoe
Iain Sandoe  wrote:

> Nathan Sidwell  wrote:
> 
>> On 4/22/20 8:48 AM, Iain Sandoe wrote:
>>> Hi,
>>> There is no PR for this, at present, but the implementation of
>>> clang and GCC's handling of lambda capture object implicit parms
>>> is currently different.  There is still some discussion about
>>> 'correct' interpretation of the standard - but in the short-term
>>> it is probably best to have consistent implementations - even if
>>> those subsequently turn out to be 'consistently wrong'.
>> 
>> Agreed, the std is at best ambiguous in this area, we should aim for
>> implementation agreement.
> 
> following more discussion amongst WG21 members, it appears that there is still
> some confusion over the status of other implementations, and it may well be that
> clang will be updated to follow the pattern that GCC is currently implementing.
> 
> In light of this, perhaps it’s best to withdraw this patch for now.

I think we should apply the following, pending the resolution of the ‘correct’
action for the lambda closure object.  This brings GCC into line with clang for
the handling of ‘this’ in method coroutines, but leaves it untouched for lambda
closure object pointers.

OK for master?
thanks
Iain



We changed the argument passed to the promise parameter preview
to match a reference to *this.  However to be consistent with the
other ports, we do need to match the reference transformation in
the traits lookup and the promise allocator lookup.

gcc/cp/ChangeLog:

2020-04-23  Iain Sandoe  

* coroutines.cc (instantiate_coro_traits): Pass a reference to
object type rather than a pointer type for 'this', for method
coroutines.
(struct param_info): Add a field to hold that the parm is a lambda
closure pointer.
(morph_fn_to_coro): Check for lambda closure pointers in the
args.  Use a reference to *this when building the args list for the
promise allocator lookup.
---
 gcc/cp/coroutines.cc | 52 
 1 file changed, 48 insertions(+), 4 deletions(-)

diff --git a/gcc/cp/coroutines.cc b/gcc/cp/coroutines.cc
index 1a31415690b..8eec53cea46 100644
--- a/gcc/cp/coroutines.cc
+++ b/gcc/cp/coroutines.cc
@@ -406,14 +406,25 @@ instantiate_coro_traits (tree fndecl, location_t kw)
  type.  */
 
   tree functyp = TREE_TYPE (fndecl);
+  tree arg = DECL_ARGUMENTS (fndecl);
+  bool lambda_p = LAMBDA_TYPE_P (DECL_CONTEXT (fndecl));
   tree arg_node = TYPE_ARG_TYPES (functyp);
   tree argtypes = make_tree_vec (list_length (arg_node)-1);
   unsigned p = 0;
 
   while (arg_node != NULL_TREE && !VOID_TYPE_P (TREE_VALUE (arg_node)))
 {
-  TREE_VEC_ELT (argtypes, p++) = TREE_VALUE (arg_node);
+  if (is_this_parameter (arg) && !lambda_p)
+   {
+ /* We pass a reference to *this to the param preview.  */
+ tree ct = TREE_TYPE (TREE_TYPE (arg));
+ TREE_VEC_ELT (argtypes, p++) = cp_build_reference_type (ct, false);
+   }
+  else
+   TREE_VEC_ELT (argtypes, p++) = TREE_VALUE (arg_node);
+
   arg_node = TREE_CHAIN (arg_node);
+  arg = DECL_CHAIN (arg);
 }
 
   tree argtypepack = cxx_make_type (TYPE_ARGUMENT_PACK);
@@ -1885,6 +1896,7 @@ struct param_info
   bool pt_ref;   /* Was a pointer to object.  */
   bool trivial_dtor; /* The frame type has a trivial DTOR.  */
   bool this_ptr; /* Is 'this' */
+  bool lambda_cobj;  /* Lambda capture object */
 };
 
 struct local_var_info
@@ -3798,6 +3810,7 @@ morph_fn_to_coro (tree orig, tree *resumer, tree *destroyer)
  The second two entries start out empty - and only get populated
  when we see uses.  */
   param_uses = new hash_map;
+  bool lambda_p = LAMBDA_TYPE_P (DECL_CONTEXT (orig));
 
   for (tree arg = DECL_ARGUMENTS (orig); arg != NULL;
   arg = DECL_CHAIN (arg))
@@ -3837,7 +3850,17 @@ morph_fn_to_coro (tree orig, tree *resumer, tree *destroyer)
}
  else
parm.frame_type = actual_type;
+
  parm.this_ptr = is_this_parameter (arg);
+ if (lambda_p)
+   {
+ parm.lambda_cobj = parm.this_ptr
+|| (DECL_NAME (arg) == closure_identifier);
+ parm.this_ptr = false;
+   }
+ else
+   parm.lambda_cobj = false;
+
  parm.trivial_dtor = TYPE_HAS_TRIVIAL_DESTRUCTOR (parm.frame_type);
  tree pname = DECL_NAME (arg);
  char *buf = xasprintf ("__parm.%s", IDENTIFIER_POINTER (pname));
@@ -3977,9 +4000,28 @@ morph_fn_to_coro (tree orig, tree *resumer, tree *destroyer)
those of the original function.  */
   vec *args = make_tree_vector ();
   vec_safe_push (args, resizeable); /* Space needed.  */
+
   for (tree arg = DECL_ARGUMENTS (orig); arg != NULL;
   arg = DECL_CHAIN (arg))
-   vec_safe_push (args, arg);
+   {
+ param_info *parm_i = param_uses->get (arg);
+ gcc_checking_asser

Re: [PING] [PATCH] [ARM] Adjust test expectations of unaligned-memcpy-2/3.c (PR 91614)

2020-04-23 Thread Bernd Edlinger



On 4/22/20 8:20 PM, Jeff Law wrote:
> On Thu, 2020-03-26 at 04:23 +0100, Bernd Edlinger wrote:
>> Hi,
>>
>> I am pinging this because PR 91614 has been raised to P2 now:
>> https://gcc.gnu.org/legacy-ml/gcc-patches/2019-09/msg00370.html
> It's been nearly a month since the ping and many since the patch was posted
> originally.
> 
> I interpret Richard E's comments from Sept as a reject of the patch as-is.
> 
> I tend to agree with Richard in that testing assembly code for this is going
> to be quite fragile and an execution test would be better.
> 

Yes, that is/was my understanding as well, I just wanted to make sure this was
not a misunderstanding or a message lost in transmission.
I think we can live with this test case as a known fail.
Jeff, could you please make this tracker a P4 instead of P2?

FYI: I received no bounce, but any message can easily be dropped by spam
filters...


Thanks,
Bernd.

> Jeff
> 


Re: [RFC] split pseudos during loop unrolling in RTL unroller

2020-04-23 Thread Segher Boessenkool
On Thu, Apr 23, 2020 at 03:07:23PM +0200, Richard Biener wrote:
> On Thu, Apr 23, 2020 at 2:52 PM Segher Boessenkool
>  wrote:
> >
> > On Thu, Apr 23, 2020 at 02:25:40PM +0200, Richard Biener wrote:
> > > > > But being stuck with something means no progress...  I know
> > > > > very well it's 100 times harder to get rid of something than to
> > > > > add something new ontop.
> > > >
> > > > Well, what progress do you expect to make?  After expand that is :-)
> > >
> > > I'd like the RTL pipeline before RA to shrink significantly, no PRE,
> > > no CSE, ...
> >
> > RTL CSE for example is very much required to get any good code.  It
> > needs to CSE stuff that wasn't there before expand.
> 
> Sure, but then we should fix that!

The expand pass, but also many RTL passes, will naturally generate code
that can be CSEd.  You don't want passes to have to do this themselves:
for example, this can be constants used to implement some standard
patterns in target code, etc.

> > LOL.
> >
> > The expand pass doesn't often make good choices, and it *shouldn't*, it
> > should not make many choices at all; it should just generate valid RTL,
> > new pseudos for everything, and let later RTL passes make faster code
> > from that.
> 
> But valid RTL is instructions that are recognized.  Which means
> when the target doesn't support an SImode add we may not create
> one.  That's instruction selection ;)

In that sense, you can call all RTL passes instruction selection?
Usually I understand more like combine, cprop, fwprop, cse, ifcvt,
splitters, peepholes.  That kind of thing :-)

Pretty much all of the RTL passes before RA, and a few after it.

> > > > Most of what is done in RTL is done very well.
> > >
> > > Umm, well...  I beg to differ with regard to DF and passes like
> > > postreload-gcse.
> >
> > What is wrong with DF?
> 
> It's slow and memory hungry?

Very true, of course.  But can this be significantly better?

> > Is there something particular in postreload-gcse that is bad?  To me it
> > always is just one of those passes that doesn't do anything :-)  That
> > can and should be cleaned up, sure :-)
> 
> postreload-gcse is ad-hoc, it uses full blown gcse tools that easily
> blow up (compute_transp) when it doesn't really require it
> > (I've fixed things up a bit in dc91c65378cd0e6c0).  But I wonder why,
> if we want to do PRE of loads, we don't simply schedule another
> gcse pass rather than implementing a new one.  IIRC what the pass
> does could be done with much more local dataflow.  Both
> postreload gcse and cse are major time-hogs on "bad" testcases :/

RTL CSE?  Really?  It just loves to give up early (which is a bad thing
of course, but that makes it take bounded time, and *less* on bad
testcases :-) )

So the "normal" gcse does not have this problem?

> > Oh no, I think we should do more earlier, and GIMPLE is a fine IR for
> > there.  But for low-level, close-to-the-machine stuff, RTL is much
> > better suited.  And we *do* want to optimise at that level as well, and
> > much more than just peepholes.
> 
> Well, everything that requires costing (unrolling, vectorization,
> IV selection to name a few) _is_ close-to-the-machine.  We're
> just saying they are not because GIMPLE is so much easier to
> work with here (not sure why exactly...).

Those transforms aren't close to the machine, not in the same way,
because they are beneficial independent of what exact instruction
sequences are generated.

Both are nasty in that both have cases where doing the transform actually
hurts quite a bit; but *not* doing it where it *could* be done costs a lot
as well.  But other than that "little" issue ;-)


Segher


Follow-up Patch – Re: [Patch][OpenMP] Fix 'omp exit data' for Fortran arrays (PR 94635)

2020-04-23 Thread Tobias Burnus

On 4/20/20 11:33 PM, Thomas Schwinge wrote:

> Really 'GOMP_MAP_DELETE', or should that rather be 'GOMP_MAP_RELEASE'?


Depends on the previous item, i.e. 'delete:' vs. 'release:/from:/…'
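
To spell that rule out, here is a minimal sketch in plain C++.  The enum
and function names are invented for illustration and are not the gimplify.c
API; the sketch only models the choice the patch makes for the descriptor
(MAP_TO_PSET) clause on 'omp target exit data'.

```cpp
#include <cassert>

// Toy model of the map kinds involved; these are not the GOMP_MAP_* values.
enum class MapKind { Delete, Release, From };

// Kind to give the array-descriptor clause on 'target exit data', derived
// from the kind of the data clause that precedes it: 'delete:' stays
// 'delete:', everything else ('release:', 'from:', ...) becomes 'release:'.
inline MapKind exit_data_pset_kind(MapKind previous) {
    return previous == MapKind::Delete ? MapKind::Delete : MapKind::Release;
}

// Usage mirroring the three cases in the new Fortran test.
inline void example_usage() {
    assert(exit_data_pset_kind(MapKind::Delete) == MapKind::Delete);    // one
    assert(exit_data_pset_kind(MapKind::Release) == MapKind::Release);  // two
    assert(exit_data_pset_kind(MapKind::From) == MapKind::Release);     // three
}
```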

Rather obvious – OK?

Tobias

[OpenMP] Fix 'omp exit data' for Fortran arrays (PR 94635)

	PR middle-end/94635
	* gimplify.c (gimplify_scan_omp_clauses): For MAP_TO_PSET with
	OMP_TARGET_EXIT_DATA, use 'release:' unless the associated
	item is 'delete:'.

	PR middle-end/94635
	* gfortran.dg/gomp/target-exit-data.f90: New.

 gcc/gimplify.c  |  4 +++-
 gcc/testsuite/gfortran.dg/gomp/target-exit-data.f90 | 20 
 2 files changed, 23 insertions(+), 1 deletion(-)

diff --git a/gcc/gimplify.c b/gcc/gimplify.c
index 2f2c51b2d89..0bac9900210 100644
--- a/gcc/gimplify.c
+++ b/gcc/gimplify.c
@@ -8789,7 +8789,9 @@ gimplify_scan_omp_clauses (tree *list_p, gimple_seq *pre_p,
 	 to be delete; hence, we turn the MAP_TO_PSET into a MAP_DELETE.  */
 	  if (code == OMP_TARGET_EXIT_DATA
 	  && OMP_CLAUSE_MAP_KIND (c) == GOMP_MAP_TO_PSET)
-	OMP_CLAUSE_SET_MAP_KIND (c, GOMP_MAP_DELETE);
+	OMP_CLAUSE_SET_MAP_KIND (c, OMP_CLAUSE_MAP_KIND (*prev_list_p)
+	== GOMP_MAP_DELETE
+	? GOMP_MAP_DELETE : GOMP_MAP_RELEASE);
 	  else if ((code == OMP_TARGET_EXIT_DATA || code == OMP_TARGET_UPDATE)
 		   && (OMP_CLAUSE_MAP_KIND (c) == GOMP_MAP_POINTER
 		   || OMP_CLAUSE_MAP_KIND (c) == GOMP_MAP_TO_PSET))
diff --git a/gcc/testsuite/gfortran.dg/gomp/target-exit-data.f90 b/gcc/testsuite/gfortran.dg/gomp/target-exit-data.f90
new file mode 100644
index 000..ed57d0072d7
--- /dev/null
+++ b/gcc/testsuite/gfortran.dg/gomp/target-exit-data.f90
@@ -0,0 +1,20 @@
+! { dg-do compile }
+! { dg-additional-options "-fdump-tree-omplower" }
+!
+! PR middle-end/94635
+
+integer, allocatable :: one(:), two(:), three(:)
+
+!$omp target enter data map(alloc:one)
+!$omp target enter data map(alloc:two)
+!$omp target enter data map(to:three)
+
+! ...
+!$omp target exit data map(delete:one)
+!$omp target exit data map(release:two)
+!$omp target exit data map(from:three)
+end
+
+! { dg-final { scan-tree-dump "omp target exit data map\\(delete:.*\\) map\\(delete:one \\\[len: .*\\\]\\)" "omplower" } }
+! { dg-final { scan-tree-dump "omp target exit data map\\(release:.*\\) map\\(release:two \\\[len: .*\\\]\\)" "omplower" } }
+! { dg-final { scan-tree-dump "omp target exit data map\\(from:.*\\) map\\(release:three \\\[len: .*\\\]\\)" "omplower" } }


[PATCH] aarch64: ensure bti c is emitted at function start [PR94697]

2020-04-23 Thread Szabolcs Nagy
The bti pass currently first emits bti c at function start
if there is no paciasp (which also acts as indirect call
landing pad), then bti j is emitted at jump labels, however
if there is a label right before paciasp then the function
start can end up like

  foo:
  label:
bti j
paciasp
...

This patch is a minimal fix that just moves the bti c handling
after the bti j handling so we end up with

  foo:
bti c
  label:
bti j
paciasp
...

This could be improved by emitting bti jc in this case, or by
detecting that the label is not in fact an indirect jump target
and then this situation would be much less common.
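
The ordering effect can be reproduced with a toy model, given here purely
as an illustration.  It works on a list of strings rather than RTL, and it
assumes the entry check looks at the first instruction while skipping
labels; the helper names add_bti_j and add_bti_c are made up and the real
pass may differ in detail.

```cpp
#include <cassert>
#include <string>
#include <vector>

using Insns = std::vector<std::string>;

// In this toy model a label is any entry ending in ':'.
static bool is_label(const std::string& s) {
    return !s.empty() && s.back() == ':';
}

// Toy "bti j" step: add a landing pad after every label.
Insns add_bti_j(Insns in) {
    Insns out;
    for (const std::string& i : in) {
        out.push_back(i);
        if (is_label(i))
            out.push_back("bti j");
    }
    return out;
}

// Toy "bti c" step: unless the first non-label instruction is paciasp,
// put a bti c at the very start of the function.
Insns add_bti_c(Insns in) {
    for (const std::string& i : in) {
        if (is_label(i))
            continue;
        if (i == "paciasp")
            return in;  // paciasp already acts as a landing pad
        break;
    }
    in.insert(in.begin(), "bti c");
    return in;
}

inline void example_usage() {
    const Insns body = {"label:", "paciasp"};
    // Old order: bti c is skipped, then bti j lands in front of paciasp.
    assert((add_bti_j(add_bti_c(body)) == Insns{"label:", "bti j", "paciasp"}));
    // New order: bti j first, so the entry check no longer sees paciasp.
    assert((add_bti_c(add_bti_j(body))
            == Insns{"bti c", "label:", "bti j", "paciasp"}));
}
```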

Needs to be backported to gcc-9 branch.

gcc/ChangeLog:

2020-04-XX  Szabolcs Nagy  

PR target/94697
* config/aarch64/aarch64-bti-insert.c (rest_of_insert_bti): Swap
bti c and bti j handling.

gcc/testsuite/ChangeLog:

2020-04-XX  Szabolcs Nagy  

PR target/94697
* gcc.target/aarch64/pr94697.c: New test.
---
 gcc/config/aarch64/aarch64-bti-insert.c| 32 +++---
 gcc/testsuite/gcc.target/aarch64/pr94697.c | 19 +
 2 files changed, 35 insertions(+), 16 deletions(-)
 create mode 100644 gcc/testsuite/gcc.target/aarch64/pr94697.c

diff --git a/gcc/config/aarch64/aarch64-bti-insert.c b/gcc/config/aarch64/aarch64-bti-insert.c
index 295d18acab8..aa091c308f6 100644
--- a/gcc/config/aarch64/aarch64-bti-insert.c
+++ b/gcc/config/aarch64/aarch64-bti-insert.c
@@ -132,22 +132,6 @@ rest_of_insert_bti (void)
   rtx_insn *insn;
   basic_block bb;
 
-  /* Since a Branch Target Exception can only be triggered by an indirect call,
- we exempt function that are only called directly.  We also exempt
- functions that are already protected by Return Address Signing (PACIASP/
- PACIBSP).  For all other cases insert a BTI C at the beginning of the
- function.  */
-  if (!cgraph_node::get (cfun->decl)->only_called_directly_p ())
-{
-  bb = ENTRY_BLOCK_PTR_FOR_FN (cfun)->next_bb;
-  insn = BB_HEAD (bb);
-  if (!aarch64_pac_insn_p (get_first_nonnote_insn ()))
-   {
- bti_insn = gen_bti_c ();
- emit_insn_before (bti_insn, insn);
-   }
-}
-
   bb = 0;
   FOR_EACH_BB_FN (bb, cfun)
 {
@@ -203,6 +187,22 @@ rest_of_insert_bti (void)
}
 }
 
+  /* Since a Branch Target Exception can only be triggered by an indirect call,
+ we exempt function that are only called directly.  We also exempt
+ functions that are already protected by Return Address Signing (PACIASP/
+ PACIBSP).  For all other cases insert a BTI C at the beginning of the
+ function.  */
+  if (!cgraph_node::get (cfun->decl)->only_called_directly_p ())
+{
+  bb = ENTRY_BLOCK_PTR_FOR_FN (cfun)->next_bb;
+  insn = BB_HEAD (bb);
+  if (!aarch64_pac_insn_p (get_first_nonnote_insn ()))
+   {
+ bti_insn = gen_bti_c ();
+ emit_insn_before (bti_insn, insn);
+   }
+}
+
   timevar_pop (TV_MACH_DEP);
   return 0;
 }
diff --git a/gcc/testsuite/gcc.target/aarch64/pr94697.c b/gcc/testsuite/gcc.target/aarch64/pr94697.c
new file mode 100644
index 000..e6069d22ece
--- /dev/null
+++ b/gcc/testsuite/gcc.target/aarch64/pr94697.c
@@ -0,0 +1,19 @@
+/* { dg-do compile } */
+/* { dg-options "-O2 -mbranch-protection=standard" } */
+/* { dg-final { check-function-bodies "**" "" } } */
+
+void bar (int *);
+void *addr;
+
+/*
+** foo:
+** hint(25|34|38) // (paciasp|bti c|bti jc)
+** ...
+*/
+int foo (int x)
+{
+label:
+  addr = &&label;
+  bar (&x);
+  return x;
+}
-- 
2.17.1




[PATCH 2/8] testsuite: Add arm_v8_2a_fp16_neon and arm_v8_2a_bf16_neon options

2020-04-23 Thread Christophe Lyon via Gcc-patches
A few tests lack the dg-add-options directives associated with the
dg-require-effective-target they are using. Adding them ensures that
the right float-abi option is passed.

2020-04-21  Christophe Lyon  

gcc/testsuite/
* gcc.target/arm/bfloat16_scalar_typecheck.c: Add
arm_v8_2a_fp16_neon and arm_v8_2a_bf16_neon.
* gcc.target/arm/bfloat16_vector_typecheck_1.c: Likewise.
* gcc.target/arm/bfloat16_vector_typecheck_2.c: Likewise.
---
 gcc/testsuite/gcc.target/arm/bfloat16_scalar_typecheck.c   | 2 ++
 gcc/testsuite/gcc.target/arm/bfloat16_vector_typecheck_1.c | 2 ++
 gcc/testsuite/gcc.target/arm/bfloat16_vector_typecheck_2.c | 6 --
 3 files changed, 8 insertions(+), 2 deletions(-)

diff --git a/gcc/testsuite/gcc.target/arm/bfloat16_scalar_typecheck.c b/gcc/testsuite/gcc.target/arm/bfloat16_scalar_typecheck.c
index 672641e..8c80c55 100644
--- a/gcc/testsuite/gcc.target/arm/bfloat16_scalar_typecheck.c
+++ b/gcc/testsuite/gcc.target/arm/bfloat16_scalar_typecheck.c
@@ -2,6 +2,8 @@
 /* { dg-skip-if "" { *-*-* } { "-fno-fat-lto-objects" } } */
 /* { dg-require-effective-target arm_v8_2a_fp16_neon_ok } */
 /* { dg-require-effective-target arm_v8_2a_bf16_neon_ok } */
+/* { dg-add-options arm_v8_2a_fp16_neon } */
+/* { dg-add-options arm_v8_2a_bf16_neon } */
 /* { dg-additional-options "-march=armv8.6-a+bf16+fp16 -Wno-pedantic -O3 --save-temps" }  */
 
 #include 
diff --git a/gcc/testsuite/gcc.target/arm/bfloat16_vector_typecheck_1.c b/gcc/testsuite/gcc.target/arm/bfloat16_vector_typecheck_1.c
index ba39cb6..f3c350b 100644
--- a/gcc/testsuite/gcc.target/arm/bfloat16_vector_typecheck_1.c
+++ b/gcc/testsuite/gcc.target/arm/bfloat16_vector_typecheck_1.c
@@ -2,6 +2,8 @@
 /* { dg-skip-if "" { *-*-* } { "-fno-fat-lto-objects" } } */
 /* { dg-require-effective-target arm_v8_2a_fp16_neon_ok } */
 /* { dg-require-effective-target arm_v8_2a_bf16_neon_ok } */
+/* { dg-add-options arm_v8_2a_fp16_neon } */
+/* { dg-add-options arm_v8_2a_bf16_neon } */
 /* { dg-additional-options "-march=armv8.6-a+bf16+fp16 -Wno-pedantic -O3 --save-temps" }  */
 
 #include 
diff --git a/gcc/testsuite/gcc.target/arm/bfloat16_vector_typecheck_2.c b/gcc/testsuite/gcc.target/arm/bfloat16_vector_typecheck_2.c
index 16669dc..de0ade5 100644
--- a/gcc/testsuite/gcc.target/arm/bfloat16_vector_typecheck_2.c
+++ b/gcc/testsuite/gcc.target/arm/bfloat16_vector_typecheck_2.c
@@ -2,6 +2,8 @@
 /* { dg-skip-if "" { *-*-* } { "-fno-fat-lto-objects" } } */
 /* { dg-require-effective-target arm_v8_2a_fp16_neon_ok } */
 /* { dg-require-effective-target arm_v8_2a_bf16_neon_ok } */
+/* { dg-add-options arm_v8_2a_fp16_neon } */
+/* { dg-add-options arm_v8_2a_bf16_neon } */
 /* { dg-additional-options "-march=armv8.6-a+bf16+fp16 -Wno-pedantic -O3 --save-temps" }  */
 
 #include 
@@ -25,8 +27,8 @@ float is_a_float16;
 double is_a_double;
 
 bfloat16x8_t foo3 (void) { return (bfloat16x8_t) 0x12345678123456781234567812345678; }
- /* { dg-error {integer constant is too large for its type} "" {target *-*-*} 27 } */
- /* { dg-error {cannot convert a value of type 'long long int' to vector type '__simd128_bfloat16_t' which has different size} "" {target *-*-*} 27 } */
+ /* { dg-error {integer constant is too large for its type} "" {target *-*-*} .-1 } */
+ /* { dg-error {cannot convert a value of type 'long long int' to vector type '__simd128_bfloat16_t' which has different size} "" {target *-*-*} .-2 } */
 
 bfloat16x8_t footest (bfloat16x8_t vector0)
 {
-- 
2.7.4



[PATCH 6/8] testsuite: Add arm_dsp_ok effective target and use it in arm/dsp_arith.c

2020-04-23 Thread Christophe Lyon via Gcc-patches
gcc.target/arm/acle/dsp_arith.c uses DSP intrinsics, which arm_acle.h
defines only with __ARM_FEATURE_DSP, so make the test check for that
property rather than arm_qbit_ok.

However, the existing arm_dsp effective target only checks if DSP
features are supported with the current multilib rather than trying
-march and -mfloat-abi options. Thus we introduce a similar effective
target, arm_dsp_ok and associated dg-add-options.

This makes dsp_arith.c unsupported rather than failed when no option
combination is suitable.
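
For readers less familiar with the Tcl helpers, the selection logic can be
sketched in C++ as follows.  This is only an illustration of the idea: the
names pick_flags, Probe, needs_armv5te and rejects_everything are invented
here, and the real implementation is the Tcl code in target-supports.exp
in the diff below.

```cpp
#include <cassert>
#include <string>
#include <vector>

// Hypothetical stand-in for "does the probe file compile with these flags?".
using Probe = bool (*)(const std::string& flags);

// Try each candidate option string in order.  On the first success, record
// it in 'chosen' and report the target as supported.
bool pick_flags(const std::vector<std::string>& candidates, Probe probe,
                std::string& chosen) {
    for (const std::string& flags : candidates) {
        if (probe(flags)) {
            chosen = flags;
            return true;
        }
    }
    chosen.clear();
    return false;  // nothing works: the test is unsupported, not failed
}

// Two made-up toolchains for illustration.
bool needs_armv5te(const std::string& flags) {
    return flags.find("-march=armv5te") != std::string::npos;
}
bool rejects_everything(const std::string&) { return false; }

inline void example_usage() {
    const std::vector<std::string> candidates = {
        "", "-march=armv5te", "-march=armv5te -mfloat-abi=softfp",
        "-march=armv5te -mfloat-abi=hard"};
    std::string chosen;
    bool ok = pick_flags(candidates, needs_armv5te, chosen);
    assert(ok);
    assert(chosen == "-march=armv5te");  // first match wins
    ok = pick_flags(candidates, rejects_everything, chosen);
    assert(!ok);
    assert(chosen.empty());
}
```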

2020-04-21  Christophe Lyon  

gcc/
* doc/sourcebuild.texi (arm_dsp_ok, arm_dsp): Document.

gcc/testsuite/
* lib/target-supports.exp
(check_effective_target_arm_dsp_ok_nocache)
(check_effective_target_arm_dsp_ok, add_options_for_arm_dsp): New.
* gcc.target/arm/acle/dsp_arith.c: Use arm_dsp_ok effective target
and add arm_dsp options.
---
 gcc/doc/sourcebuild.texi  | 11 
 gcc/testsuite/gcc.target/arm/acle/dsp_arith.c |  4 +--
 gcc/testsuite/lib/target-supports.exp | 40 +++
 3 files changed, 53 insertions(+), 2 deletions(-)

diff --git a/gcc/doc/sourcebuild.texi b/gcc/doc/sourcebuild.texi
index b696120..b79f65e 100644
--- a/gcc/doc/sourcebuild.texi
+++ b/gcc/doc/sourcebuild.texi
@@ -1971,6 +1971,12 @@ ARM Target supports options suitable for accessing the Q-bit manipulation
 intrinsics from @code{arm_acle.h}.
 Some multilibs may be incompatible with these options.
 
+@item arm_dsp_ok
+@anchor{arm_dsp_ok}
+ARM Target supports options suitable for accessing the DSP intrinsics
+from @code{arm_acle.h}.
+Some multilibs may be incompatible with these options.
+
 @item arm_softfp_ok
 @anchor{arm_softfp_ok}
 ARM target supports the @code{-mfloat-abi=softfp} option.
@@ -2613,6 +2619,11 @@ Add options to enable generation of the @code{VFMAL} and @code{VFMSL}
 instructions, if this is supported by the target; see the
 @ref{arm_fp16fml_neon_ok} effective target keyword.
 
+@item arm_dsp
+Add options for ARM DSP intrinsics support, if this is supported by
+the target; see the @ref{arm_dsp_ok,,arm_dsp_ok effective target
+keyword}.
+
 @item bind_pic_locally
 Add the target-specific flags needed to enable functions to bind
 locally when using pic/PIC passes in the testsuite.
diff --git a/gcc/testsuite/gcc.target/arm/acle/dsp_arith.c b/gcc/testsuite/gcc.target/arm/acle/dsp_arith.c
index 9ebd55a..7bf458e 100644
--- a/gcc/testsuite/gcc.target/arm/acle/dsp_arith.c
+++ b/gcc/testsuite/gcc.target/arm/acle/dsp_arith.c
@@ -1,6 +1,6 @@
 /* { dg-do compile } */
-/* { dg-require-effective-target arm_qbit_ok } */
-/* { dg-add-options arm_qbit  } */
+/* { dg-require-effective-target arm_dsp_ok } */
+/* { dg-add-options arm_dsp } */
 
 #include 
 
diff --git a/gcc/testsuite/lib/target-supports.exp b/gcc/testsuite/lib/target-supports.exp
index 53ff2f6..9430be9 100644
--- a/gcc/testsuite/lib/target-supports.exp
+++ b/gcc/testsuite/lib/target-supports.exp
@@ -3994,6 +3994,46 @@ proc add_options_for_arm_qbit { flags } {
 return "$flags $et_arm_qbit_flags"
 }
 
+# Return 1 if this is an ARM target supporting the DSP intrinsics from
+# arm_acle.h.  Some multilibs may be incompatible with these options.
+# Also set et_arm_dsp_flags to the best options to add.
+# arm_acle.h includes stdint.h which can cause trouble with incompatible
+# -mfloat-abi= options.
+# check_effective_target_arm_dsp also exists, which checks the current
+# multilib, without trying other options.
+
+proc check_effective_target_arm_dsp_ok_nocache { } {
+global et_arm_dsp_flags
+set et_arm_dsp_flags ""
+foreach flags {"" "-march=armv5te" "-march=armv5te -mfloat-abi=softfp" "-march=armv5te -mfloat-abi=hard"} {
+  if { [check_no_compiler_messages_nocache et_arm_dsp_ok object {
+   #include 
+   int dummy;
+   #ifndef __ARM_FEATURE_DSP
+   #error not DSP
+   #endif
+  } "$flags"] } {
+   set et_arm_dsp_flags $flags
+   return 1
+  }
+}
+
+  return 0
+}
+
+proc check_effective_target_arm_dsp_ok { } {
+return [check_cached_effective_target et_arm_dsp_flags \
+   check_effective_target_arm_dsp_ok_nocache]
+}
+
+proc add_options_for_arm_dsp { flags } {
+if { ! [check_effective_target_arm_dsp_ok] } {
+   return "$flags"
+}
+global et_arm_dsp_flags
+return "$flags $et_arm_dsp_flags"
+}
+
 # Return 1 if this is an ARM target supporting -mfpu=neon without any
 # -mfloat-abi= option.  Useful in tests where add_options is not
 # supported (such as lto tests).
-- 
2.7.4



[PATCH 4/8] testsuite: Add arm_softfp_ok or arm_hard_ok as needed.

2020-04-23 Thread Christophe Lyon via Gcc-patches
Several tests want to override the -mfloat-abi option detected by the
other effective targets. Make sure it is supported, so that these
tests are unsupported rather than failed.

2020-04-21  Christophe Lyon  

gcc/testsuite/
* gcc.target/arm/bf16_dup.c: Add arm_softfp_ok.
* gcc.target/arm/bfloat16_simd_1_2.c: Likewise.
* gcc.target/arm/bfloat16_simd_2_2.c: Likewise.
* gcc.target/arm/bfloat16_simd_3_2.c: Likewise.
* gcc.target/arm/bf16_reinterpret.c: Add arm_hard_ok.
* gcc.target/arm/bfloat16_simd_2_1.c: Likewise.
* gcc.target/arm/bfloat16_simd_3_1.c: Likewise.
* gcc.target/arm/simd/bf16_vldn_1.c: Likewise.
* gcc.target/arm/simd/bf16_vstn_1.c: Likewise.
---
 gcc/testsuite/gcc.target/arm/bf16_dup.c  | 1 +
 gcc/testsuite/gcc.target/arm/bf16_reinterpret.c  | 1 +
 gcc/testsuite/gcc.target/arm/bfloat16_simd_1_2.c | 1 +
 gcc/testsuite/gcc.target/arm/bfloat16_simd_2_1.c | 1 +
 gcc/testsuite/gcc.target/arm/bfloat16_simd_2_2.c | 1 +
 gcc/testsuite/gcc.target/arm/bfloat16_simd_3_1.c | 1 +
 gcc/testsuite/gcc.target/arm/bfloat16_simd_3_2.c | 1 +
 gcc/testsuite/gcc.target/arm/simd/bf16_vldn_1.c  | 1 +
 gcc/testsuite/gcc.target/arm/simd/bf16_vstn_1.c  | 1 +
 9 files changed, 9 insertions(+)

diff --git a/gcc/testsuite/gcc.target/arm/bf16_dup.c b/gcc/testsuite/gcc.target/arm/bf16_dup.c
index 94be99a..fe8c741 100644
--- a/gcc/testsuite/gcc.target/arm/bf16_dup.c
+++ b/gcc/testsuite/gcc.target/arm/bf16_dup.c
@@ -1,4 +1,5 @@
 /* { dg-do assemble { target { arm*-*-* } } } */
+/* { dg-require-effective-target arm_softfp_ok } */
 /* { dg-require-effective-target arm_v8_2a_bf16_neon_ok } */
 /* { dg-add-options arm_v8_2a_bf16_neon }  */
 /* { dg-additional-options "-save-temps -march=armv8.2-a+bf16+fp16 -mfloat-abi=softfp" } */
diff --git a/gcc/testsuite/gcc.target/arm/bf16_reinterpret.c b/gcc/testsuite/gcc.target/arm/bf16_reinterpret.c
index e7d30a9..044a664 100644
--- a/gcc/testsuite/gcc.target/arm/bf16_reinterpret.c
+++ b/gcc/testsuite/gcc.target/arm/bf16_reinterpret.c
@@ -1,4 +1,5 @@
 /* { dg-do assemble { target { arm*-*-* } } } */
+/* { dg-require-effective-target arm_hard_ok } */
 /* { dg-require-effective-target arm_v8_2a_bf16_neon_ok } */
 /* { dg-add-options arm_v8_2a_bf16_neon }  */
 /* { dg-additional-options "-save-temps -march=armv8.2-a+fp16+bf16 -mfloat-abi=hard -mfpu=crypto-neon-fp-armv8" } */
diff --git a/gcc/testsuite/gcc.target/arm/bfloat16_simd_1_2.c b/gcc/testsuite/gcc.target/arm/bfloat16_simd_1_2.c
index 4ffcc54..95eecec 100644
--- a/gcc/testsuite/gcc.target/arm/bfloat16_simd_1_2.c
+++ b/gcc/testsuite/gcc.target/arm/bfloat16_simd_1_2.c
@@ -1,4 +1,5 @@
 /* { dg-do assemble { target { arm*-*-* } } } */
+/* { dg-require-effective-target arm_softfp_ok } */
 /* { dg-require-effective-target arm_v8_neon_ok } */
 /* { dg-require-effective-target arm_v8_2a_bf16_neon_ok } */
 /* { dg-additional-options "-march=armv8.2-a+bf16 -mfloat-abi=softfp -mfpu=auto" } */
diff --git a/gcc/testsuite/gcc.target/arm/bfloat16_simd_2_1.c b/gcc/testsuite/gcc.target/arm/bfloat16_simd_2_1.c
index 05ee4d8..02b4c41 100644
--- a/gcc/testsuite/gcc.target/arm/bfloat16_simd_2_1.c
+++ b/gcc/testsuite/gcc.target/arm/bfloat16_simd_2_1.c
@@ -1,4 +1,5 @@
 /* { dg-do assemble { target { arm*-*-* } } } */
+/* { dg-require-effective-target arm_hard_ok } */
 /* { dg-require-effective-target arm_v8_neon_ok } */
 /* { dg-require-effective-target arm_v8_2a_bf16_neon_ok } */
 /* { dg-additional-options "-march=armv8.2-a -mfloat-abi=hard -mfpu=neon-fp-armv8" } */
diff --git a/gcc/testsuite/gcc.target/arm/bfloat16_simd_2_2.c b/gcc/testsuite/gcc.target/arm/bfloat16_simd_2_2.c
index 15fba31..175bfa5 100644
--- a/gcc/testsuite/gcc.target/arm/bfloat16_simd_2_2.c
+++ b/gcc/testsuite/gcc.target/arm/bfloat16_simd_2_2.c
@@ -1,4 +1,5 @@
 /* { dg-do assemble { target { arm*-*-* } } } */
+/* { dg-require-effective-target arm_softfp_ok } */
 /* { dg-require-effective-target arm_v8_neon_ok } */
 /* { dg-require-effective-target arm_v8_2a_bf16_neon_ok } */
 /* { dg-additional-options "-march=armv8.2-a -mfloat-abi=softfp -mfpu=neon-fp-armv8" } */
diff --git a/gcc/testsuite/gcc.target/arm/bfloat16_simd_3_1.c b/gcc/testsuite/gcc.target/arm/bfloat16_simd_3_1.c
index b9b7606..d2326c2 100644
--- a/gcc/testsuite/gcc.target/arm/bfloat16_simd_3_1.c
+++ b/gcc/testsuite/gcc.target/arm/bfloat16_simd_3_1.c
@@ -1,4 +1,5 @@
 /* { dg-do assemble { target { arm*-*-* } } } */
+/* { dg-require-effective-target arm_hard_ok } */
 /* { dg-require-effective-target arm_v8_neon_ok } */
 /* { dg-require-effective-target arm_v8_2a_bf16_neon_ok } */
 /* { dg-additional-options "-march=armv8.2-a -mfloat-abi=hard -mfpu=neon-fp-armv8" } */
diff --git a/gcc/testsuite/gcc.target/arm/bfloat16_simd_3_2.c b/gcc/testsuite/gcc.target/arm/bfloat16_simd_3_2.c
index ab1fe10..346253b 100644
--- a/gcc/testsuite/gcc.target/arm/bfloat16_simd_3_2.c
+++ b/gcc/testsuite/gcc.target/arm/bfloat16_simd_3_2.c
@@ -1,4 +1,5 @

[PATCH 7/8] testsuite: [arm] Remove useless -mfloat-abi option

2020-04-23 Thread Christophe Lyon via Gcc-patches
These tests pass with their current dg-add-options, so there is no need
to force -mfloat-abi.

2020-04-21  Christophe Lyon  

gcc/testsuite/
* gcc.target/arm/armv8_1m-shift-imm-1.c: Remove -mfloat-abi option.
* gcc.target/arm/armv8_1m-shift-reg-1.c: Likewise.
* gcc.target/arm/pr51534.c: Likewise.
---
 gcc/testsuite/gcc.target/arm/armv8_1m-shift-imm-1.c | 2 +-
 gcc/testsuite/gcc.target/arm/armv8_1m-shift-reg-1.c | 2 +-
 gcc/testsuite/gcc.target/arm/pr51534.c  | 2 +-
 3 files changed, 3 insertions(+), 3 deletions(-)

diff --git a/gcc/testsuite/gcc.target/arm/armv8_1m-shift-imm-1.c b/gcc/testsuite/gcc.target/arm/armv8_1m-shift-imm-1.c
index 883fbb09..84f13e2 100644
--- a/gcc/testsuite/gcc.target/arm/armv8_1m-shift-imm-1.c
+++ b/gcc/testsuite/gcc.target/arm/armv8_1m-shift-imm-1.c
@@ -1,6 +1,6 @@
 /* { dg-do compile } */
-/* { dg-options "-O2 -mfloat-abi=softfp -mlittle-endian" } */
 /* { dg-require-effective-target arm_v8_1m_mve_ok } */
+/* { dg-options "-O2 -mlittle-endian" } */
 /* { dg-add-options arm_v8_1m_mve } */
 
 long long longval1;
diff --git a/gcc/testsuite/gcc.target/arm/armv8_1m-shift-reg-1.c b/gcc/testsuite/gcc.target/arm/armv8_1m-shift-reg-1.c
index e125ff8..8668b6b 100644
--- a/gcc/testsuite/gcc.target/arm/armv8_1m-shift-reg-1.c
+++ b/gcc/testsuite/gcc.target/arm/armv8_1m-shift-reg-1.c
@@ -1,6 +1,6 @@
 /* { dg-do compile } */
-/* { dg-options "-O2 -mfloat-abi=softfp -mlittle-endian" } */
 /* { dg-require-effective-target arm_v8_1m_mve_ok } */
+/* { dg-options "-O2 -mlittle-endian" } */
 /* { dg-add-options arm_v8_1m_mve } */
 
 long long longval2;
diff --git a/gcc/testsuite/gcc.target/arm/pr51534.c b/gcc/testsuite/gcc.target/arm/pr51534.c
index f675a44..3711b45 100644
--- a/gcc/testsuite/gcc.target/arm/pr51534.c
+++ b/gcc/testsuite/gcc.target/arm/pr51534.c
@@ -3,7 +3,7 @@
 
 /* { dg-do assemble } */
 /* { dg-require-effective-target arm_neon_ok } */
-/* { dg-options "-save-temps -mfloat-abi=hard -O3" } */
+/* { dg-options "-save-temps -O3" } */
 /* { dg-add-options arm_neon } */
 
 #include 
-- 
2.7.4



[PATCH 1/8] testsuite: Fix -mfloat-abi order in arm_v8_2a_bf16_neon_ok and arm_v8_2a_i8mm_ok_nocache

2020-04-23 Thread Christophe Lyon via Gcc-patches
Make the order in which we try -mfloat-abi options consistent with the
other similar effective targets: try softfp first, then hard.

We have new failures on arm-eabi:
FAIL: gcc.target/arm/bfloat16_scalar_1_1.c check-function-bodies stacktest1
FAIL: gcc.target/arm/bfloat16_simd_1_1.c check-function-bodies stacktest1
FAIL: gcc.target/arm/bfloat16_simd_1_1.c check-function-bodies stacktest2
FAIL: gcc.target/arm/bfloat16_simd_1_1.c check-function-bodies stacktest3
FAIL: gcc.target/arm/simd/bf16_ma_1.c check-function-bodies test_vfmabq_f32
FAIL: gcc.target/arm/simd/bf16_ma_1.c check-function-bodies test_vfmabq_lane_f32
FAIL: gcc.target/arm/simd/bf16_ma_1.c check-function-bodies test_vfmabq_laneq_f32
FAIL: gcc.target/arm/simd/bf16_ma_1.c check-function-bodies test_vfmatq_f32
FAIL: gcc.target/arm/simd/bf16_ma_1.c check-function-bodies test_vfmatq_lane_f32
FAIL: gcc.target/arm/simd/bf16_ma_1.c check-function-bodies test_vfmatq_laneq_f32
FAIL: gcc.target/arm/simd/bf16_mmla_1.c check-function-bodies test_vmmlaq_f32
FAIL: gcc.target/arm/simd/vdot-2-1.c check-function-bodies sfoo_lane
FAIL: gcc.target/arm/simd/vdot-2-1.c check-function-bodies sfooq_lane
FAIL: gcc.target/arm/simd/vdot-2-1.c check-function-bodies usfoo
FAIL: gcc.target/arm/simd/vdot-2-1.c check-function-bodies usfoo_lane
FAIL: gcc.target/arm/simd/vdot-2-1.c check-function-bodies usfoo_lane_untied
FAIL: gcc.target/arm/simd/vdot-2-1.c check-function-bodies usfoo_untied
FAIL: gcc.target/arm/simd/vdot-2-1.c check-function-bodies usfooq_lane
FAIL: gcc.target/arm/simd/vdot-2-2.c check-function-bodies sfoo_lane
FAIL: gcc.target/arm/simd/vdot-2-2.c check-function-bodies sfooq_lane
FAIL: gcc.target/arm/simd/vdot-2-2.c check-function-bodies usfoo
FAIL: gcc.target/arm/simd/vdot-2-2.c check-function-bodies usfoo_lane
FAIL: gcc.target/arm/simd/vdot-2-2.c check-function-bodies usfoo_lane_untied
FAIL: gcc.target/arm/simd/vdot-2-2.c check-function-bodies usfoo_untied
FAIL: gcc.target/arm/simd/vdot-2-2.c check-function-bodies usfooq_lane

Are these tests supposed to require -mfloat-abi=hard?

2020-04-21  Christophe Lyon  

gcc/testsuite/
* lib/target-supports.exp
(check_effective_target_arm_v8_2a_i8mm_ok_nocache): Fix
-mfloat-abi= options order.
(check_effective_target_arm_v8_2a_bf16_neon_ok_nocache): Likewise.
---
 gcc/testsuite/lib/target-supports.exp | 4 ++--
 1 file changed, 2 insertions(+), 2 deletions(-)

diff --git a/gcc/testsuite/lib/target-supports.exp b/gcc/testsuite/lib/target-supports.exp
index a667ddf..53ff2f6 100644
--- a/gcc/testsuite/lib/target-supports.exp
+++ b/gcc/testsuite/lib/target-supports.exp
@@ -5017,7 +5017,7 @@ proc check_effective_target_arm_v8_2a_i8mm_ok_nocache { } {
 
 # Iterate through sets of options to find the compiler flags that
 # need to be added to the -march option.
-foreach flags {"" "-mfloat-abi=hard -mfpu=neon-fp-armv8" "-mfloat-abi=softfp -mfpu=neon-fp-armv8" } {
+foreach flags {"" "-mfloat-abi=softfp -mfpu=neon-fp-armv8" "-mfloat-abi=hard -mfpu=neon-fp-armv8" } {
 if { [check_no_compiler_messages_nocache \
   arm_v8_2a_i8mm_ok object {
 #include 
@@ -5102,7 +5102,7 @@ proc check_effective_target_arm_v8_2a_bf16_neon_ok_nocache { } {
 return 0;
 }
 
-foreach flags {"" "-mfloat-abi=hard -mfpu=neon-fp-armv8" "-mfloat-abi=softfp -mfpu=neon-fp-armv8" } {
+foreach flags {"" "-mfloat-abi=softfp -mfpu=neon-fp-armv8" "-mfloat-abi=hard -mfpu=neon-fp-armv8" } {
 if { [check_no_compiler_messages_nocache arm_v8_2a_bf16_neon_ok object {
 #include 
 #if !defined (__ARM_FEATURE_BF16_VECTOR_ARITHMETIC)
-- 
2.7.4



[PATCH 5/8] testsuite: Add arm_softfp_ok in gcc.target/arm/pr51968.c

2020-04-23 Thread Christophe Lyon via Gcc-patches
This test forces -mfloat-abi=softfp, so we add the related effective
target to make it unsupported on arm-linux-gnueabihf.

2020-04-21  Christophe Lyon  

gcc/testsuite/
* gcc.target/arm/pr51968.c: Add arm_softfp_ok effective target.
---
 gcc/testsuite/gcc.target/arm/pr51968.c | 3 ++-
 1 file changed, 2 insertions(+), 1 deletion(-)

diff --git a/gcc/testsuite/gcc.target/arm/pr51968.c b/gcc/testsuite/gcc.target/arm/pr51968.c
index 7814702..c06da48 100644
--- a/gcc/testsuite/gcc.target/arm/pr51968.c
+++ b/gcc/testsuite/gcc.target/arm/pr51968.c
@@ -1,7 +1,8 @@
 /* PR target/51968 */
 /* { dg-do compile } */
-/* { dg-options "-O2 -march=armv7-a -mfloat-abi=softfp -mfpu=neon" } */
+/* { dg-require-effective-target arm_softfp_ok } */
 /* { dg-require-effective-target arm_neon_ok } */
+/* { dg-options "-O2 -march=armv7-a -mfloat-abi=softfp -mfpu=neon" } */
 #include 
 
 struct T { int8x8x2_t val; };
-- 
2.7.4



[PATCH 3/8] testsuite: Add arm_v8_2a_i8mm options in gcc.target/arm/simd/vmmla_1.c

2020-04-23 Thread Christophe Lyon via Gcc-patches
We need to add the options corresponding to the arm_v8_2a_i8mm_ok
effective target in order to use the right float-abi option.

2020-04-21  Christophe Lyon  

gcc/testsuite/
* gcc.target/arm/simd/vmmla_1.c: Add arm_v8_2a_i8mm options.
---
 gcc/testsuite/gcc.target/arm/simd/vmmla_1.c | 1 +
 1 file changed, 1 insertion(+)

diff --git a/gcc/testsuite/gcc.target/arm/simd/vmmla_1.c b/gcc/testsuite/gcc.target/arm/simd/vmmla_1.c
index b766a91..0007de6 100644
--- a/gcc/testsuite/gcc.target/arm/simd/vmmla_1.c
+++ b/gcc/testsuite/gcc.target/arm/simd/vmmla_1.c
@@ -1,6 +1,7 @@
 /* { dg-do assemble } */
 /* { dg-require-effective-target arm_v8_2a_i8mm_ok } */
 /* { dg-options "-save-temps -O2" } */
+/* { dg-add-options arm_v8_2a_i8mm } */
 /* { dg-additional-options "-march=armv8.2-a+i8mm" } */
 
 #include "arm_neon.h"
-- 
2.7.4



RE: [PATCH] aarch64: ensure bti c is emitted at function start [PR94697]

2020-04-23 Thread Kyrylo Tkachov


> -Original Message-
> From: Szabolcs Nagy 
> Sent: 23 April 2020 14:51
> To: gcc-patches@gcc.gnu.org
> Cc: Richard Earnshaw ; Kyrylo Tkachov
> ; Sudakshina Das 
> Subject: [PATCH] aarch64: ensure bti c is emitted at function start [PR94697]
> 
> The bti pass currently first emits bti c at function start
> if there is no paciasp (which also acts as indirect call
> landing pad), then bti j is emitted at jump labels, however
> if there is a label right before paciasp then the function
> start can end up like
> 
>   foo:
>   label:
> bti j
> paciasp
> ...
> 
> This patch is a minimal fix that just moves the bti c handling
> after the bti j handling so we end up with
> 
>   foo:
> bti c
>   label:
> bti j
> paciasp
> ...
> 
> This could be improved by emitting bti jc in this case, or by
> detecting that the label is not in fact an indirect jump target
> and then this situation would be much less common.
> 
> Needs to be backported to gcc-9 branch.

Ok.
Thanks,
Kyrill

> 
> gcc/ChangeLog:
> 
> 2020-04-XX  Szabolcs Nagy  
> 
>   PR target/94697
>   * config/aarch64/aarch64-bti-insert.c (rest_of_insert_bti): Swap
>   bti c and bti j handling.
> 
> gcc/testsuite/ChangeLog:
> 
> 2020-04-XX  Szabolcs Nagy  
> 
>   PR target/94697
>   * gcc.target/aarch64/pr94697.c: New test.
> ---
>  gcc/config/aarch64/aarch64-bti-insert.c| 32 +++---
>  gcc/testsuite/gcc.target/aarch64/pr94697.c | 19 +
>  2 files changed, 35 insertions(+), 16 deletions(-)
>  create mode 100644 gcc/testsuite/gcc.target/aarch64/pr94697.c
> 
> diff --git a/gcc/config/aarch64/aarch64-bti-insert.c
> b/gcc/config/aarch64/aarch64-bti-insert.c
> index 295d18acab8..aa091c308f6 100644
> --- a/gcc/config/aarch64/aarch64-bti-insert.c
> +++ b/gcc/config/aarch64/aarch64-bti-insert.c
> @@ -132,22 +132,6 @@ rest_of_insert_bti (void)
>rtx_insn *insn;
>basic_block bb;
> 
> -  /* Since a Branch Target Exception can only be triggered by an indirect 
> call,
> - we exempt function that are only called directly.  We also exempt
> - functions that are already protected by Return Address Signing (PACIASP/
> - PACIBSP).  For all other cases insert a BTI C at the beginning of the
> - function.  */
> -  if (!cgraph_node::get (cfun->decl)->only_called_directly_p ())
> -{
> -  bb = ENTRY_BLOCK_PTR_FOR_FN (cfun)->next_bb;
> -  insn = BB_HEAD (bb);
> -  if (!aarch64_pac_insn_p (get_first_nonnote_insn ()))
> - {
> -   bti_insn = gen_bti_c ();
> -   emit_insn_before (bti_insn, insn);
> - }
> -}
> -
>bb = 0;
>FOR_EACH_BB_FN (bb, cfun)
>  {
> @@ -203,6 +187,22 @@ rest_of_insert_bti (void)
>   }
>  }
> 
> +  /* Since a Branch Target Exception can only be triggered by an indirect 
> call,
> + we exempt function that are only called directly.  We also exempt
> + functions that are already protected by Return Address Signing (PACIASP/
> + PACIBSP).  For all other cases insert a BTI C at the beginning of the
> + function.  */
> +  if (!cgraph_node::get (cfun->decl)->only_called_directly_p ())
> +{
> +  bb = ENTRY_BLOCK_PTR_FOR_FN (cfun)->next_bb;
> +  insn = BB_HEAD (bb);
> +  if (!aarch64_pac_insn_p (get_first_nonnote_insn ()))
> + {
> +   bti_insn = gen_bti_c ();
> +   emit_insn_before (bti_insn, insn);
> + }
> +}
> +
>timevar_pop (TV_MACH_DEP);
>return 0;
>  }
> diff --git a/gcc/testsuite/gcc.target/aarch64/pr94697.c
> b/gcc/testsuite/gcc.target/aarch64/pr94697.c
> new file mode 100644
> index 000..e6069d22ece
> --- /dev/null
> +++ b/gcc/testsuite/gcc.target/aarch64/pr94697.c
> @@ -0,0 +1,19 @@
> +/* { dg-do compile } */
> +/* { dg-options "-O2 -mbranch-protection=standard" } */
> +/* { dg-final { check-function-bodies "**" "" } } */
> +
> +void bar (int *);
> +void *addr;
> +
> +/*
> +** foo:
> +**   hint(25|34|38) // (paciasp|bti c|bti jc)
> +**   ...
> +*/
> +int foo (int x)
> +{
> +label:
> +  addr = &&label;
> +  bar (&x);
> +  return x;
> +}
> --
> 2.17.1
> 



[PATCH 8/8] testsuite: Fix -mfloat-abi order in arm_v8_1m_mve_ok and arm_v8_1m_mve_fp_ok

2020-04-23 Thread Christophe Lyon via Gcc-patches
Make the order in which we try -mfloat-abi options consistent with the
other similar effective targets: try softfp first, then hard.

We have new failures on arm-eabi:

FAIL: gcc.target/arm/mve/intrinsics/mve_vector_int.c scan-assembler vmov\\tq[0-7], q[0-7]
FAIL: gcc.target/arm/mve/intrinsics/mve_vector_int.c scan-assembler vmov\\tq[0-7], q[0-7]
FAIL: gcc.target/arm/mve/intrinsics/mve_vector_int.c scan-assembler vmov\\tq[0-7], q[0-7]
FAIL: gcc.target/arm/mve/intrinsics/mve_vector_int.c scan-assembler vmov\\tq[0-7], q[0-7]
FAIL: gcc.target/arm/mve/intrinsics/mve_vector_uint.c scan-assembler vmov\\tq[0-7], q[0-7]
FAIL: gcc.target/arm/mve/intrinsics/mve_vector_uint.c scan-assembler vmov\\tq[0-7], q[0-7]
FAIL: gcc.target/arm/mve/intrinsics/mve_vector_uint.c scan-assembler vmov\\tq[0-7], q[0-7]
FAIL: gcc.target/arm/mve/intrinsics/mve_vector_uint.c scan-assembler vmov\\tq[0-7], q[0-7]
FAIL: gcc.target/arm/mve/intrinsics/mve_vector_uint1.c scan-assembler vmov\\tq[0-7], q[0-7]
FAIL: gcc.target/arm/mve/intrinsics/mve_vector_uint1.c scan-assembler vmov\\tq[0-7], q[0-7]
FAIL: gcc.target/arm/mve/intrinsics/mve_vector_uint1.c scan-assembler vmov\\tq[0-7], q[0-7]
FAIL: gcc.target/arm/mve/intrinsics/mve_vector_uint1.c scan-assembler vmov\\tq[0-7], q[0-7]
FAIL: gcc.target/arm/mve/intrinsics/mve_vector_uint2.c scan-assembler vmov\\tq[0-7], q[0-7]
FAIL: gcc.target/arm/mve/intrinsics/mve_vector_uint2.c scan-assembler vmov\\tq[0-7], q[0-7]
FAIL: gcc.target/arm/mve/intrinsics/mve_vector_uint2.c scan-assembler vmov\\tq[0-7], q[0-7]
FAIL: gcc.target/arm/mve/intrinsics/mve_vector_uint2.c scan-assembler vmov\\tq[0-7], q[0-7]
FAIL: gcc.target/arm/mve/intrinsics/vgetq_lane_s64.c scan-assembler vmov\\tr0, r1, d0
FAIL: gcc.target/arm/mve/intrinsics/vgetq_lane_s64.c scan-assembler vmov\\tr0, r1, d0
FAIL: gcc.target/arm/mve/intrinsics/vgetq_lane_u64.c scan-assembler vmov\\tr0, r1, d0
FAIL: gcc.target/arm/mve/intrinsics/vgetq_lane_u64.c scan-assembler vmov\\tr0, r1, d0
FAIL: gcc.target/arm/mve/intrinsics/vsetq_lane_s64.c scan-assembler vmov\\td0, r[1-9]*[0-9], r[1-9]*[0-9]
FAIL: gcc.target/arm/mve/intrinsics/vsetq_lane_u64.c scan-assembler vmov\\td0, r[1-9]*[0-9], r[1-9]*[0-9]
FAIL: gcc.target/arm/mve/intrinsics/mve_vector_float2.c scan-assembler vmov\\tq[0-7], q[0-7]
FAIL: gcc.target/arm/mve/intrinsics/mve_vector_float2.c scan-assembler vmov\\tq[0-7], q[0-7]

Are these tests supposed to require -mfloat-abi=hard?

2020-04-21  Christophe Lyon  

gcc/testsuite/
* lib/target-supports.exp
(check_effective_target_arm_v8_1m_mve_fp_ok_nocache): Fix
-mfloat-abi= options order.
(check_effective_target_arm_v8_1m_mve_ok_nocache): Likewise.
---
 gcc/testsuite/lib/target-supports.exp | 4 ++--
 1 file changed, 2 insertions(+), 2 deletions(-)

diff --git a/gcc/testsuite/lib/target-supports.exp b/gcc/testsuite/lib/target-supports.exp
index 9430be9..2dca1cf 100644
--- a/gcc/testsuite/lib/target-supports.exp
+++ b/gcc/testsuite/lib/target-supports.exp
@@ -4815,7 +4815,7 @@ proc check_effective_target_arm_v8_1m_mve_fp_ok_nocache { } {
 
 # Iterate through sets of options to find the compiler flags that
 # need to be added to the -march option.
-foreach flags {"" "-mfloat-abi=hard -mfpu=auto -march=armv8.1-m.main+mve.fp" "-mfloat-abi=softfp -mfpu=auto -march=armv8.1-m.main+mve.fp"} {
+foreach flags {"" "-mfloat-abi=softfp -mfpu=auto -march=armv8.1-m.main+mve.fp" "-mfloat-abi=hard -mfpu=auto -march=armv8.1-m.main+mve.fp"} {
if { [check_no_compiler_messages_nocache \
  arm_v8_1m_mve_fp_ok object {
#include 
@@ -4998,7 +4998,7 @@ proc check_effective_target_arm_v8_1m_mve_ok_nocache { } {
 
 # Iterate through sets of options to find the compiler flags that
 # need to be added to the -march option.
-foreach flags {"" "-mfloat-abi=hard -mfpu=auto -march=armv8.1-m.main+mve" "-mfloat-abi=softfp -mfpu=auto -march=armv8.1-m.main+mve"} {
+foreach flags {"" "-mfloat-abi=softfp -mfpu=auto -march=armv8.1-m.main+mve" "-mfloat-abi=hard -mfpu=auto -march=armv8.1-m.main+mve"} {
 if { [check_no_compiler_messages_nocache \
   arm_v8_1m_mve_ok object {
 #if !defined (__ARM_FEATURE_MVE)
-- 
2.7.4



Re: [PATCH] coroutines: Fix handling of conditional statements [PR94288]

2020-04-23 Thread Nathan Sidwell

On 4/20/20 12:48 PM, Iain Sandoe wrote:

Hi,

Normally, when we find a statement containing an await expression
this will be expanded to a statement list implementing the control
flow implied.  The expansion process successively replaces each
await expression in a statement with the result of its await_resume().

In the case of conditional statements (if, while, do, switch) the
expansion of the condition (or expression in the case of do-while)
cannot take place 'inline', leading to the PR.

The solution is to evaluate the expression separately, and to
transform while and do-while loops into endless loops with a break
on the required condition.

In fixing this, I realised that I'd also made a thinko in the case
of expanding truth-and/or-if expressions, where one arm of the
expression might need to be short-circuited.  The mechanism for
expanding via the tree walk will not work correctly in this case and
we need to pre-expand any truth-and/or-if with an await expression
on its conditionally-taken arm.  This applies to any statement with
truth-and/or-if expressions, so can be handled generically.

This has been tested in various permutations (including without-
checking) on x86_64-darwin, now testing on x86/power Linux.

The testcases do not include the testcase from the PR since that
fails because of PR94661 (it can be included when that’s resolved).

The testcases are appended as a text file.

OK for master, assuming that regstraps on x86/power Linux are OK?
thanks
Iain


gcc/cp/ChangeLog:

2020-04-20  Iain Sandoe  

PR c++/94288
* coroutines.cc (await_statement_expander): Simplify cases.
(struct susp_frame_data): Add fields for truth and/or if
cases, rename one field.
(analyze_expression_awaits): New.
(expand_one_truth_if): New.
(add_var_to_bind): New helper.
(coro_build_add_if_not_cond_break): New helper.
(await_statement_walker): Handle conditional expressions,
handle expansion of truth-and/or-if cases.
(bind_expr_find_in_subtree): New, checking-only.
(coro_body_contains_bind_expr_p): New, checking-only.
(morph_fn_to_coro): Ensure that we have a top level bind
expression.


ok.

--
Nathan Sidwell
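
As an aside, the two rewrites described in the patch summary (a loop turned into an endless loop with a break, and a short-circuit operand evaluated ahead of the condition) can be illustrated in plain C without any coroutine machinery. This is only an analogy under that assumption: cond() is a hypothetical function with a side effect standing in for an expression containing an await, and none of the names below come from coroutines.cc.

```c
#include <stdbool.h>

static int calls;   /* number of times the condition was evaluated */

/* Hypothetical condition with a visible side effect.  */
static bool cond (int limit)
{
  return calls++ < limit;
}

/* Original shape: the condition is evaluated inline in the loop header.  */
static int loop_inline (int limit)
{
  int iterations = 0;
  calls = 0;
  while (cond (limit))
    iterations++;
  return iterations;
}

/* Rewritten shape: an endless loop where the condition is evaluated as
   a separate statement, followed by a break when it is false.  */
static int loop_rewritten (int limit)
{
  int iterations = 0;
  calls = 0;
  for (;;)
    {
      bool keep_going = cond (limit);
      if (!keep_going)
        break;
      iterations++;
    }
  return iterations;
}

/* Short-circuit case: "a && cond (limit)" pre-expanded so that the
   second operand is only evaluated when the first one is true.  */
static bool and_if_rewritten (bool a, int limit)
{
  bool result = a;
  if (result)
    result = cond (limit);
  return result;
}
```

Both loop shapes run the body the same number of times and evaluate the condition the same number of times, and the pre-expanded form still skips the second operand when the first is false.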


[committed] amdgcn: Check HSA return codes [PR94629]

2020-04-23 Thread Andrew Stubbs
This patch adds some additional checking to ensure that the HSA runtime 
calls do not return errors.


Previously the return codes were ignored, but failure was still detected 
because the output data kept its initial value. This was probably safe, 
but a static analyzer correctly noticed that the status was ignored. 
Anyway, this is probably good practice.


Andrew
amdgcn: Check HSA return codes [PR94629]

Ensure that the returned status values are not ignored.  The old code was
not broken, but this is both safer and satisfies static analysis.

2020-04-23  Andrew Stubbs  

	PR other/94629

	libgomp/
	* plugin/plugin-gcn.c (init_hsa_context): Check return value from
	hsa_iterate_agents.
	(GOMP_OFFLOAD_init_device): Check return values from both calls to
	hsa_agent_iterate_regions.

diff --git a/libgomp/plugin/plugin-gcn.c b/libgomp/plugin/plugin-gcn.c
index dc72c90962c..4c6a4c03b6e 100644
--- a/libgomp/plugin/plugin-gcn.c
+++ b/libgomp/plugin/plugin-gcn.c
@@ -1508,6 +1508,8 @@ init_hsa_context (void)
 = GOMP_PLUGIN_malloc_cleared (hsa_context.agent_count
   * sizeof (struct agent_info));
   status = hsa_fns.hsa_iterate_agents_fn (assign_agent_ids, &agent_index);
+  if (status != HSA_STATUS_SUCCESS)
+return hsa_error ("Scanning compute agents failed", status);
   if (agent_index != hsa_context.agent_count)
 {
   GOMP_PLUGIN_error ("Failed to assign IDs to all GCN agents");
@@ -3473,6 +3475,9 @@ GOMP_OFFLOAD_init_device (int n)
   status = hsa_fns.hsa_agent_iterate_regions_fn (agent->id,
 		 get_kernarg_memory_region,
 		 &agent->kernarg_region);
+  if (status != HSA_STATUS_SUCCESS
+  && status != HSA_STATUS_INFO_BREAK)
+hsa_error ("Scanning memory regions failed", status);
   if (agent->kernarg_region.handle == (uint64_t) -1)
 {
   GOMP_PLUGIN_error ("Could not find suitable memory region for kernel "
@@ -3486,6 +3491,9 @@ GOMP_OFFLOAD_init_device (int n)
   status = hsa_fns.hsa_agent_iterate_regions_fn (agent->id,
 		 get_data_memory_region,
 		 &agent->data_region);
+  if (status != HSA_STATUS_SUCCESS
+  && status != HSA_STATUS_INFO_BREAK)
+hsa_error ("Scanning memory regions failed", status);
   if (agent->data_region.handle == (uint64_t) -1)
 {
   GOMP_PLUGIN_error ("Could not find suitable memory region for device "

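The checking pattern in the patch above (test the returned status first, treat an early "break" from the callback as a non-error, then test the output as before) can be sketched in self-contained C. Everything here is a hypothetical model: the status enum, iterate_regions and the callbacks are invented for illustration and are not the HSA runtime API.

```c
#include <stdio.h>

/* Hypothetical status codes modelling the three outcomes the patch
   distinguishes; these are not the real HSA definitions.  */
typedef enum { ST_SUCCESS, ST_INFO_BREAK, ST_ERROR } status_t;

typedef status_t (*region_cb) (int region, void *data);

/* Hypothetical iterator: calls CB for each region until CB returns
   something other than ST_SUCCESS, then passes that status back.  */
static status_t iterate_regions (int nregions, region_cb cb, void *data)
{
  for (int i = 0; i < nregions; i++)
    {
      status_t s = cb (i, data);
      if (s != ST_SUCCESS)
        return s;
    }
  return ST_SUCCESS;
}

/* Callback that stops early once it has found the wanted region.  */
static status_t find_region (int region, void *data)
{
  int *found = data;
  if (region == 2)
    {
      *found = region;
      return ST_INFO_BREAK;
    }
  return ST_SUCCESS;
}

/* Callback that always reports a failure.  */
static status_t failing_cb (int region, void *data)
{
  (void) region;
  (void) data;
  return ST_ERROR;
}

/* Return 1 on success, 0 on failure.  The status is checked first;
   the "output still has its initial value" check is kept as well.  */
static int scan_regions (int nregions, region_cb cb, int *found)
{
  *found = -1;
  status_t s = iterate_regions (nregions, cb, found);
  if (s != ST_SUCCESS && s != ST_INFO_BREAK)
    {
      fprintf (stderr, "Scanning memory regions failed (status %d)\n", (int) s);
      return 0;
    }
  if (*found == -1)
    {
      fprintf (stderr, "Could not find suitable memory region\n");
      return 0;
    }
  return 1;
}
```

The second check alone would also catch the failing callback in this model, which matches the remark that the old code was not broken; the explicit status test just reports the real cause.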

Re: [RFC] split pseudos during loop unrolling in RTL unroller

2020-04-23 Thread Jeff Law via Gcc-patches
On Thu, 2020-04-23 at 07:07 -0500, Segher Boessenkool wrote:
> 
> > I think at least one step would be uncontroversical(?), namely moving
> > the RTL expansion "magic"
> > up to a GIMPLE pass.  Where the "magic" would be to turn
> > GIMPLE stmts not directly expandable via an existing optab into
> > GIMPLE that can be trivially expanded.  That includes eventually
> > combining multiple stmts into more powerful instructions and
> > doing the magic we have in, like, expand_binop (widening, etc.).
> > Where there's not a 1:1 mapping of a GIMPLE stmt to an optab
> > GIMPLE gets direct-internal-fn calls.
> > Then RTL expansion would be mostly invoking gen_insn (optab-code).
> 
> Most of expand is *other stuff*.  Expand does a *lot* of things that are
> actually changing the code.  And much of that is not done anywhere else
> either yet, so this cannot be fixed by simply deleting the offending code.
A lot of what is done by the expansion pass is historical and dates back to when
we never optimized more than a statement at a time for trees and the real heavy
lifting was all done in RTL.

The introduction of tree-ssa meant that all the expansion code which wanted to
see a "nice" complex statement and generate good RTL code was useless and as a
result we saw significant regressions in the end code quality.

That in turn brought in TER, whose sole purpose was to reconstruct those more
complex trees for the purposes of improving initial expansion.  I think match.pd
has the potential to make TER go away as match.pd is a generic combining
framework.

WRT extending SSA deeper, I'm all for it and it's always been something I wanted
to see happen.  I've always believed we could do RTL SSA to the register
allocation phase and that doing so would be a positive.  If we start at the front
of the RTL pipeline and work towards the back we bump into CSE quickly, which
would be good.  Our CSE is awful across multiple axes.  I suspect most RTL passes
would be reimplementations rather than bolting SSA on the side and I suspect
those reimplementations would be simpler than the existing stuff -- I'm a strong
believer there's a lot of dead code in the RTL passes as well.


jeff





RE: [PATCH PR94708] rtl combine should consider NaNs when generate fp min/max

2020-04-23 Thread Zhanghaijian (A)
Thanks for your suggestions. I have modified accordingly.
Attached please find the adapted patch. Bootstrapped and tested on aarch64 Linux
platform.
Does the v2 patch look better?

diff --git a/gcc/combine.c b/gcc/combine.c
index cff76cd3303..ad8a385fc48 100644
--- a/gcc/combine.c
+++ b/gcc/combine.c
@@ -6643,7 +6643,8 @@ simplify_if_then_else (rtx x)
 
   /* Look for MIN or MAX.  */
 
-  if ((! FLOAT_MODE_P (mode) || flag_unsafe_math_optimizations)
+  if ((! FLOAT_MODE_P (mode)
+   || (!HONOR_NANS (mode) && !HONOR_SIGNED_ZEROS (mode)))
   && comparison_p
   && rtx_equal_p (XEXP (cond, 0), true_rtx)
   && rtx_equal_p (XEXP (cond, 1), false_rtx)

Thanks,
Haijian Zhang

-Original Message-
From: Segher Boessenkool [mailto:seg...@kernel.crashing.org] 
Sent: Thursday, April 23, 2020 7:38 PM
To: Richard Biener 
Cc: Zhanghaijian (A) ; gcc-patches@gcc.gnu.org
Subject: Re: [PATCH PR94708] rtl combine should consider NaNs when generate fp min/max

Hi!

On Thu, Apr 23, 2020 at 11:05:22AM +0200, Richard Biener wrote:
> On Thu, Apr 23, 2020 at 10:42 AM Zhanghaijian (A) 
>  wrote:
> > This is a simple fix for pr94708.
> > It's unsafe for rtl combine to generate fp min/max under 
> > -funsafe-math-optimizations, considering NaNs.
> > We can only do this kind of transformation under 
> > -funsafe-math-optimizations and -ffinite-math-only.
> > Bootstrap and tested on aarch64 Linux platform. No new regression witnessed.
> >
> > Any suggestion?
> 
> Please do not check flags, instead use && !HONOR_NANS (mode)

Yeah, good point.

> What about signed zeros?

-funsafe-math-optimizations implies -fno-signed-zeros.

> The GENERIC folding routine producing
> min/max is avoiding it when those are honored (and it doesn't check 
> flag_unsafe_math_optimizations at all).
> 
> Certainly the patch is an incremental correct fix, with the flag 
> testing replaced by the mode feature testing.

Yeah, and the SMAX etc. definition is so weak that it isn't obvious that this 
combine transform is valid without this flag.  We can or should fix that, of 
course :-)


Segher


pr94708-v2.patch
Description: pr94708-v2.patch
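
To make the NaN and signed-zero concern in this thread concrete, here is a small C sketch of the source-level pattern that simplify_if_then_else looks for. It only demonstrates the IEEE behaviour of the conditional itself; it makes no claim about what any particular target's min instruction returns. The helper order_sensitive is a hypothetical name introduced for this illustration.

```c
#include <math.h>

/* The selection pattern that may be turned into a min operation.  */
static double select_min (double a, double b)
{
  return a < b ? a : b;
}

/* Nonzero when swapping the operands changes the selected value,
   comparing NaN-ness and sign so that NaN and -0.0 are visible.  */
static int order_sensitive (double a, double b)
{
  double x = select_min (a, b);
  double y = select_min (b, a);
  if (!!isnan (x) != !!isnan (y))
    return 1;
  if (!!signbit (x) != !!signbit (y))
    return 1;
  return !isnan (x) && x != y;
}
```

For ordinary operands the selection does not care about operand order, but with a NaN or with +0.0 and -0.0 it always yields the second operand. An operation that treats its operands symmetrically cannot reproduce that for both orders, which is why the transform is only safe when neither NaNs nor signed zeros need to be honored.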


Re: [RFC] split pseudos during loop unrolling in RTL unroller

2020-04-23 Thread Jeff Law via Gcc-patches
On Thu, 2020-04-23 at 15:07 +0200, Richard Biener wrote:
> On Thu, Apr 23, 2020 at 2:52 PM Segher Boessenkool
>  wrote:
> > On Thu, Apr 23, 2020 at 02:25:40PM +0200, Richard Biener wrote:
> > > > > But being stuck with something means no progress...  I know
> > > > > very well it's 100 times harder to get rid of something than to
> > > > > add something new ontop.
> > > > 
> > > > Well, what progress do you expect to make?  After expand that is :-)
> > > 
> > > I'd like the RTL pipeline before RA to shrink significantly, no PRE,
> > > no CSE, ...
> > 
> > RTL CSE for example is very much required to get any good code.  It
> > needs to CSE stuff that wasn't there before expand.
> 
> Sure, but then we should fix that!
Exactly.  Its purpose largely becomes dealing with the redundancies exposed by
expansion, i.e., address arithmetic and the like.  A lot of its path following
code should be throttled back.

> 
> But valid RTL is instructions that are recognized.  Which means
> when the target doesn't support an SImode add we may not create
> one.  That's instruction selection ;)
That's always a point of tension.  But I think that in general continuing to have
targets claim to support things they do not (such as double-wordsize arithmetic,
logicals, moves, etc) is a mistake.  It made sense at one time, but I think we've
got better mechanisms in place to deal with this stuff now.

> 
> > Is there something particular in postreload-gcse that is bad?  To me it
> > always is just one of those passes that doesn't do anything :-)  That
> > can and should be cleaned up, sure :-)
> 
> postreload-gcse is ad-hoc, it uses full blown gcse tools that easily
> blow up (compute_transp) when it doesn't really require it
> (Ive fixed things up a bit in dc91c65378cd0e6c0).  But I wonder why,
> if we want to do PRE of loads, we don't simply schedule another
> gcse pass rather than implementing a new one.  IIRC what the pass
> does could be done with much more local dataflow.  Both
> postreload gcse and cse are major time-hogs on "bad" testcases :/
I think the biggest reason is the existing gcse bits inherently assume they can
create new registers.  It's deeply baked into gcse.c.  There are ways around that,
but it's likely a lot of work.

> 
> > Oh no, I think we should do more earlier, and GIMPLE is a fine IR for
> > there.  But for low-level, close-to-the-machine stuff, RTL is much
> > better suited.  And we *do* want to optimise at that level as well, and
> > much more than just peepholes.
> 
> Well, everything that requires costing (unrolling, vectorization,
> IV selection to name a few) _is_ close-to-the-machine.  We're
> just saying they are not because GIMPLE is so much easier to
> work with here (not sure why exactly...).
The primary motivation behind discouraging target costing and the like from
gimple was to make it easier to implement and predict the behavior of the gimple
optimizers.   We've relaxed that somewhat, particularly for vectorization, but I
think the principle is still solid.

But I think there is a place for adding target dependencies -- and that's at the
end of the current gimple pipeline.

Jeff



[Version 2][PATCH][gcc][PR94230]provide an option to change the size limitation for -Wmisleading-indent

2020-04-23 Thread Qing Zhao via Gcc-patches
Hi,
This is the second version of the patch based on the previous discussion.
In this new version, the major changes are:

1. The name of the option is changed to -flarge-source-files;
2. Add a hint to use this new option “-flarge-source-files” in the routine “get_visual_column”;
3. Documentation for this new option;
4. Update the test case location-overflow-test-1.c to include the new hint.

Please take a look at this new patch and let me know if you have any further comments.

thanks.

Qing.

gcc/ChangeLog:

2020-04-22  qing zhao  

PR c/94230
* common.opt: Add -flarge-source-files.
* doc/invoke.texi: Document it.
* toplev.c (process_options): set line_table->default_range_bits
to 0 when flag_large_source_files is true.


gcc/c-family/ChangeLog:

2020-04-22  qing zhao  

PR c/94230
* c-indentation.c (get_visual_column): Add a hint to use the new
-flarge-source-files option.

gcc/testsuite/ChangeLog:

2020-04-22  qing zhao  

PR c/94230
* gcc.dg/plugin/location-overflow-test-1.c (fn_1): New message to 
provide hint to use the new -flarge-source-files option.



PR94230.patch
Description: Binary data
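
As a rough illustration of the trade-off behind this option, the sketch below models a fixed-width location value whose bits are shared between line, column and range information. This is a deliberately simplified, hypothetical model and not libcpp's actual encoding: the 32-bit total and the 7-bit column width are arbitrary assumptions, and only the range width of 5 versus 0 is taken from the discussion quoted below.

```c
#include <stdint.h>

/* Assumed total width of a packed location value (hypothetical).  */
#define TOTAL_BITS 32u

/* Number of distinct lines that fit when COLUMN_BITS and RANGE_BITS of
   each value are reserved for column and range information.  Returns 0
   when nothing is left for the line number.  */
static uint64_t max_lines (unsigned column_bits, unsigned range_bits)
{
  if (column_bits + range_bits >= TOTAL_BITS)
    return 0;
  return UINT64_C (1) << (TOTAL_BITS - column_bits - range_bits);
}
```

In this model, dropping the 5 range bits multiplies the number of representable lines by 32, at the cost of storing ranges some other, less compact way.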




> On Apr 22, 2020, at 9:22 AM, Qing Zhao via Gcc-patches 
>  wrote:
> 
> Hi, Richard And Dave:
> 
> Thanks a lot for the review and comments.
> 
>> On Apr 21, 2020, at 1:46 PM, Richard Sandiford  
>> wrote:
>> 
>> David Malcolm  writes:
>>> On Tue, 2020-04-21 at 15:04 +0100, Richard Sandiford wrote:
>>>> 
>>>> Please add:
>>>> 
>>>>    PR c/94230
> 
> Will do. 
> 
>>>> 
>>>>>   * common.opt: Add -flocation-ranges.
>>>>>   * doc/invoke.texi: Document it.
>>>>>   * toplev.c (process_options): set line_table->default_range_bits
>>>>>   to 0 when flag_location_ranges is false.
>>>> 
>>>> I think it would be worth adding a hint to use the new option to
>>>> get_visual_column, when warning about column tracking being disabled.
>>>> This should probably be a second inform(), immediately after the
>>>> current one.
> 
> Sounds reasonable to me, I will add that.
> 
>>>> 
>>>>> @@ -14151,6 +14151,13 @@ This option may be useful in conjunction
>>>>> with the @option{-B} or
>>>>> perform additional processing of the program source between
>>>>> normal preprocessing and compilation.
>>>>> 
>>>>> +@item -flocation-ranges
>>>>> +@opindex flocation-ranges
>>>> 
>>>> Normally the documented option should be the non-default one,
>>>> so -fno-... in this case.
> 
> Okay. 
> 
>>>> 
>>>>> +Enable range tracking when recording source locations.
>>>>> +By default, GCC enables range tracking when recording source locations.
>>>>> +If disable range tracking by -fno-location-ranges, more location space
>>>>> +will be saved for column tracking.
>>>> 
>>>> My understanding is that the patch doesn't actually disable location-range
>>>> tracking, but simply uses a less efficient form for *all* ranges, rather
>>>> than only using the less efficient form for ranges that aren't "caret at
>>>> start, length < 64 chars".
>>> 
>>> Indeed.
> 
> Okay, I see. 
> Providing a good documentation at the user level for this option might be a 
> challenge to me, I will try.  -:)
> 
>>> 
>>>> I know you're simply following the suggestion in the PR, sorry,
>>> 
>>> Sorry.  I did put a caveat on the suggestion FWIW.
>>> 
>>>> but I wonder if the option should instead be:
>>>> 
>>>> -flarge-source-files
>>>> 
>>>> since that seems like a more user-facing concept.  The option would
>>>> tell GCC that the source files are likely to be very large and that
>>>> GCC should adapt accordingly.  In particular, the option makes GCC
>>>> cope with more source lines at the expense of slowing down compilation
>>>> and using more memory.
>>> 
>>> Another approach would be to go lower-level and introduce a param for
>>> this; something like "--param location-range-bits" defaulting to 5; the
>>> user can set it to 0 to disable the range-bit optimization to deal with
>>> bigger files, and it allows for experimentation without rebuilding the
>>> compiler.
>>> 
>>> Again, I don't know if this is a good idea; I'm thinking aloud; I'm not
>>> sure what the best direction here is.
>> 
>> The reason I like the -flarge-source-files (suggestion for better
>> names welcome) is that the user is giving user-level information and
>> letting the compiler decide how to deal with that.  What the option
>> actually does can change with the implementation as necessary.
>> 
>> Potentially any user could hit the -Wmisleading-indent note, or could
>> hit the limit at which columns get dropped from diagnostics.  So while
>> this option isn't going to be used all that often, it also doesn't feel
>> like an option designed specifically for “power users” who like to
>> experiment with compiler internals.
> 
> Agreed, I prefer to use -flarge-source-files for this functionality. 
> 
> Let me know if you have any other suggestions for this patch.
> 
> Thanks.
> 
> Qing
> 
> 
>> 
>> T

[committed] vect: Fix comparisons between invariant booleans [PR94727]

2020-04-23 Thread Richard Sandiford
This PR was caused by mismatched expectations between
vectorizable_comparison and SLP.  We had a "<" comparison
between two booleans that were leaves of the SLP tree, so
vectorizable_comparison fell back on:

  /* Invariant comparison.  */
  if (!vectype)
{
  vectype = get_vectype_for_scalar_type (vinfo, TREE_TYPE (rhs1),
 slp_node);
  if (maybe_ne (TYPE_VECTOR_SUBPARTS (vectype), nunits))
return false;
}

rhs1 and rhs2 were *unsigned* boolean types, so we got back a vector
of unsigned integers.  This in itself was OK, and meant that "<"
worked as expected without the need for the boolean fix-ups:

  /* Boolean values may have another representation in vectors
 and therefore we prefer bit operations over comparison for
 them (which also works for scalar masks).  We store opcodes
 to use in bitop1 and bitop2.  Statement is vectorized as
   BITOP2 (rhs1 BITOP1 rhs2) or
   rhs1 BITOP2 (BITOP1 rhs2)
 depending on bitop1 and bitop2 arity.  */
  bool swap_p = false;
  if (VECTOR_BOOLEAN_TYPE_P (vectype))
{

However, vectorizable_comparison then used vect_get_slp_defs to get
the actual operands.  The request went to vect_get_constant_vectors,
which also has logic to calculate the vector type.  The problem was
that this type was different from the one chosen above:

  if (VECT_SCALAR_BOOLEAN_TYPE_P (TREE_TYPE (op))
  && vect_mask_constant_operand_p (stmt_vinfo))
vector_type = truth_type_for (stmt_vectype);
  else
vector_type = get_vectype_for_scalar_type (vinfo, TREE_TYPE (op), op_node);

So the function gave back a vector of mask types, which here are vectors
of *signed* booleans.  This meant that "<" gave:

  true (-1) < false (0)

and so the boolean fixup above was needed after all.

Fixed by making vectorizable_comparison also pick a mask type in this case.

Tested on aarch64-linux-gnu (with and without SVE) and x86_64-linux-gnu.
Approved by Richard in the PR.

Richard


2020-04-23  Richard Sandiford  

gcc/
PR tree-optimization/94727
* tree-vect-stmts.c (vectorizable_comparison): Use mask_type when
comparing invariant scalar booleans.

gcc/testsuite/
PR tree-optimization/94727
* gcc.dg/vect/pr94727.c: New test.
---
 gcc/testsuite/gcc.dg/vect/pr94727.c | 24 
 gcc/tree-vect-stmts.c   |  7 +--
 2 files changed, 29 insertions(+), 2 deletions(-)
 create mode 100644 gcc/testsuite/gcc.dg/vect/pr94727.c

diff --git a/gcc/tree-vect-stmts.c b/gcc/tree-vect-stmts.c
index 7f3a9fb5fb3..88a1e2c51d2 100644
--- a/gcc/tree-vect-stmts.c
+++ b/gcc/tree-vect-stmts.c
@@ -10566,8 +10566,11 @@ vectorizable_comparison (stmt_vec_info stmt_info, gimple_stmt_iterator *gsi,
   /* Invariant comparison.  */
   if (!vectype)
 {
-  vectype = get_vectype_for_scalar_type (vinfo, TREE_TYPE (rhs1),
-slp_node);
+  if (VECT_SCALAR_BOOLEAN_TYPE_P (TREE_TYPE (rhs1)))
+   vectype = mask_type;
+  else
+   vectype = get_vectype_for_scalar_type (vinfo, TREE_TYPE (rhs1),
+  slp_node);
   if (!vectype || maybe_ne (TYPE_VECTOR_SUBPARTS (vectype), nunits))
return false;
 }
diff --git a/gcc/testsuite/gcc.dg/vect/pr94727.c b/gcc/testsuite/gcc.dg/vect/pr94727.c
new file mode 100644
index 000..38408711345
--- /dev/null
+++ b/gcc/testsuite/gcc.dg/vect/pr94727.c
@@ -0,0 +1,24 @@
+/* { dg-additional-options "-O3" } */
+
+unsigned char a[16][32];
+long b[16][32];
+unsigned long c;
+_Bool d;
+
+void __attribute__((noipa))
+foo (void)
+{
+  for (int j = 0; j < 8; j++)
+for (int i = 0; i < 17; ++i)
+  b[j][i] = (a[j][i] < c) > d;
+}
+
+int
+main (void)
+{
+  c = 1;
+  foo ();
+  if (!b[0][0])
+__builtin_abort ();
+  return 0;
+}
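
The representation mismatch described in the commit message can be reproduced one lane at a time in scalar C. The sketch below assumes, as the message states, that an unsigned boolean element stores true as 1 while a mask element stores true as -1. The function names are hypothetical and this is not vectorizer code.

```c
#include <stdint.h>

/* "a < b" on booleans with true stored as 1 (unsigned element).  */
static int lt_unsigned_repr (int a, int b)
{
  uint8_t va = a ? 1 : 0;
  uint8_t vb = b ? 1 : 0;
  return va < vb;
}

/* The same compare with true stored as -1 (signed mask element).  */
static int lt_signed_repr (int a, int b)
{
  int8_t va = a ? -1 : 0;
  int8_t vb = b ? -1 : 0;
  return va < vb;
}

/* Representation-independent form using bit operations on the mask:
   for booleans, a < b is equivalent to (!a & b).  */
static int lt_bitops (int a, int b)
{
  int8_t va = a ? -1 : 0;
  int8_t vb = b ? -1 : 0;
  return (int8_t) (~va & vb) != 0;
}
```

With the signed representation a plain "<" reports true < false, the opposite of the intended answer, while the bit-operation form gives the right result for every input. That is the fix-up the mask type triggers once vectorizable_comparison picks it.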


Re: [RFC] split pseudos during loop unrolling in RTL unroller

2020-04-23 Thread Eric Botcazou
> > > What is wrong with DF?
> > 
> > It's slow and memory hungry?
> 
> Very true, of course.  But can this be significantly better?

That's a good question worth investigating in my opinion, because DF didn't 
quite achieve its initial goal of replacing all the custom liveness analysis 
passes only because of its performance; the rest is quite neat.  But IIRC 
long-time GCC hackers were involved in it, so it was probably tuned already.

-- 
Eric Botcazou


Re: [PR93488] [OpenACC] ICE in type-cast 'async', 'wait' clauses

2020-04-23 Thread Andrew Stubbs

On 21/04/2020 10:54, Thomas Schwinge wrote:

Normalize GOACC_parallel_keyed async and wait parameters


Oh, not only 'GOACC_parallel_keyed', but also other OpenACC directives.
Maybe simply "[OpenACC] Avoid ICE in type-cast 'async', 'wait' clauses"?
;-P

To record the review effort, please include "Reviewed-by: Thomas Schwinge
" in the commit log, see
.


I've committed the attached.

Andrew
OpenACC: Avoid ICE in type-cast 'async', 'wait' clauses

2020-04-23  Andrew Stubbs  
	Thomas Schwinge  

	PR middle-end/93488

	gcc/
	* omp-expand.c (expand_omp_target): Use force_gimple_operand_gsi on
	t_async and the wait arguments.

	gcc/testsuite/
	* c-c++-common/goacc/pr93488.c: New file.

Reviewed-by: Thomas Schwinge 

diff --git a/gcc/omp-expand.c b/gcc/omp-expand.c
index a642ccc9980..da1f4c39d18 100644
--- a/gcc/omp-expand.c
+++ b/gcc/omp-expand.c
@@ -8418,7 +8418,9 @@ expand_omp_target (struct omp_region *region)
 	  i_async));
 	  }
 	if (t_async)
-	  args.safe_push (t_async);
+	  args.safe_push (force_gimple_operand_gsi (&gsi, t_async, true,
+		NULL_TREE, true,
+		GSI_SAME_STMT));
 
 	/* Save the argument index, and ... */
 	unsigned t_wait_idx = args.length ();
@@ -8431,9 +8433,12 @@ expand_omp_target (struct omp_region *region)
 	for (; c; c = OMP_CLAUSE_CHAIN (c))
 	  if (OMP_CLAUSE_CODE (c) == OMP_CLAUSE_WAIT)
 	{
-	  args.safe_push (fold_convert_loc (OMP_CLAUSE_LOCATION (c),
-		integer_type_node,
-		OMP_CLAUSE_WAIT_EXPR (c)));
+	  tree arg = fold_convert_loc (OMP_CLAUSE_LOCATION (c),
+	   integer_type_node,
+	   OMP_CLAUSE_WAIT_EXPR (c));
+	  arg = force_gimple_operand_gsi (&gsi, arg, true, NULL_TREE, true,
+	  GSI_SAME_STMT);
+	  args.safe_push (arg);
 	  num_waits++;
 	}
 
diff --git a/gcc/testsuite/c-c++-common/goacc/pr93488.c b/gcc/testsuite/c-c++-common/goacc/pr93488.c
new file mode 100644
index 000..6fddad919d2
--- /dev/null
+++ b/gcc/testsuite/c-c++-common/goacc/pr93488.c
@@ -0,0 +1,22 @@
+/* PR middle-end/93488
+ 
+   Ensure that wait and async arguments can be cast to the correct type
+   without breaking gimple verification.  */
+
+void test()
+{
+  /* int */ unsigned char a = 1;
+  /* int */ unsigned char w = 1;
+
+#pragma acc parallel wait(w) async(a)
+  ;
+#pragma acc kernels wait(w) async(a)
+  ;
+#pragma acc serial wait(w) async(a)
+  ;
+  int data = 0;
+#pragma acc enter data wait(w) async(a) create(data)
+#pragma acc update wait(w) async(a) device(data)
+#pragma acc exit data wait(w) async(a) delete(data)
+#pragma acc wait(w) async(a)
+}


Re: [PATCH] aarch64: add tests for CPP predefines under -mgeneral-regs-only

2020-04-23 Thread Richard Sandiford
"Yangfei (Felix)"  writes:
> Hi,
>
> I noticed that gcc.target/aarch64/pragma_cpp_predefs_1.c performs testing
> for -mgeneral-regs-only.
> This adds similar testing in the following two tests to make sure CPP
> predefine redefinitions on #pragma
> work as expected when the -mgeneral-regs-only option is specified (see
> PR94678):
> gcc.target/aarch64/pragma_cpp_predefs_2.c
> gcc.target/aarch64/pragma_cpp_predefs_3.c
>
> The two tests pass with the modification.  OK?
>
> gcc/testsuite/
> PR target/94678
> * gcc.target/aarch64/pragma_cpp_predefs_2.c: Fix typos, pop_pragma ->
> pop_options. Add tests for general-regs-only.
> * gcc.target/aarch64/pragma_cpp_predefs_3.c: Add tests for
> general-regs-only.

Thanks, pushed to master.

Richard


Re: [RFC] split pseudos during loop unrolling in RTL unroller

2020-04-23 Thread Richard Sandiford
Jeff Law via Gcc-patches  writes:
> On Thu, 2020-04-23 at 15:07 +0200, Richard Biener wrote:
>> On Thu, Apr 23, 2020 at 2:52 PM Segher Boessenkool
>>  wrote:
>> > On Thu, Apr 23, 2020 at 02:25:40PM +0200, Richard Biener wrote:
>> > > > > But being stuck with something means no progress...  I know
>> > > > > very well it's 100 times harder to get rid of something than to
>> > > > > add something new ontop.
>> > > > 
>> > > > Well, what progress do you expect to make?  After expand that is :-)
>> > > 
>> > > I'd like the RTL pipeline before RA to shrink significantly, no PRE,
>> > > no CSE, ...
>> > 
>> > RTL CSE for example is very much required to get any good code.  It
>> > needs to CSE stuff that wasn't there before expand.
>> 
>> Sure, but then we should fix that!
> Exactly.  Its purpose largely becomes dealing with the redundancies exposed by
> expansion, i.e. address arithmetic and the like.  A lot of its path following
> code should be throttled back.

Agreed.  But things like address legitimisation and ensuring immediate
operands are in-range could be done in gimple too, and probably be
optimised more effectively and efficiently in SSA form than in RTL.
The ultimate question then wouldn't just be "does the target support
this optab?" but also "are these operands already legitimate for the
optab"?

I also wonder how difficult it would be to get recog to recognise
gimple :-)

Thanks,
Richard


Re: [PATCH] rs6000, Fix header comment for intrinsic function

2020-04-23 Thread will schmidt via Gcc-patches
On Wed, 2020-04-22 at 11:20 -0700, Carl Love wrote:
> GCC maintainers:
> 

Hi,


> The following is a trivial patch to fix a comment describing the
> intrinsic function _mm_movemask_epi8.  The comment was expanded to
> clarify the layout of the returned result. 

Something seems wrong there, see below.

> 
> The patch does not make any functional changes.
> 
> Please let me know if the patch is OK for mainline and backporting as
> appropriate.
> 
> Thanks.
> 
>  Carl Love
> ---
> rs6000, Fix header comment for intrinsic function _mm_movemask_epi8
> 
> gcc/ChangeLog
> 
> 2020-04-22  Carl Love  
> 
>   * config/rs6000/emmintrin.h (_mm_movemask_epi8): Fix comment
> for the
>   function.


drop /for the function/

> 
> Signed-off-by: Carl Love 
> ---
>  gcc/config/rs6000/emmintrin.h | 4 +++-
>  1 file changed, 3 insertions(+), 1 deletion(-)
> 
> diff --git a/gcc/config/rs6000/emmintrin.h
> b/gcc/config/rs6000/emmintrin.h
> index 2462cf5bdac..0872a75c0de 100644
> --- a/gcc/config/rs6000/emmintrin.h
> +++ b/gcc/config/rs6000/emmintrin.h
> @@ -2033,7 +2033,9 @@ _mm_min_epu8 (__m128i __A, __m128i __B)
>  #ifdef _ARCH_PWR8
>  /* Intrinsic functions that require PowerISA 2.07 minimum.  */
> 
> -/* Creates a 4-bit mask from the most significant bits of the SPFP
> values.  */
> +/* Creates a 16-bit mask from the most significant bits of the
> sixteen 8-bit
> +   values.  The 16-bit result is placed in bits[48:63], bits [0:47]
> and
> +   bits [64:127] are  set to zero.  */


That function returns an int out of one of the elements of the result
vector.


extern __inline int __attribute__((__gnu_inline__, __always_inline__,
__artificial__))
_mm_movemask_epi8 (__m128i __A)
{
...
  __vector unsigned long long result;
...

So the description of the intermediate __vector unsigned long long
variable, if actually needed, would fit better near the assignment to
result.

  result = ((__vector unsigned long long)
vec_vbpermq ((__vector unsigned char) __A,
 (__vector unsigned char) perm_mask));

But then you would probably want to clarify that the return is only one
of those long long vector elements. 

#ifdef __LITTLE_ENDIAN__
  return result[1];
#else
  return result[0];
#endif
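
For reference, the value the function returns can be described with a portable scalar sketch. This follows the x86 intrinsic's semantics (bit i of the result is the most significant bit of byte i); it is not the rs6000 implementation, and movemask_epi8_ref is a made-up name.

```c
#include <assert.h>
#include <stdint.h>

/* Reference model: gather the top bit of each of the sixteen bytes
   into the low 16 bits of an int, byte 0 ending up in bit 0.  */
static int movemask_epi8_ref (const uint8_t bytes[16])
{
  int mask = 0;
  for (int i = 0; i < 16; i++)
    mask |= (bytes[i] >> 7) << i;
  return mask;
}
```

Only the low 16 bits of the returned int can be set, which is why describing the layout of the full 128-bit intermediate vector in the function's header comment is misleading.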


thanks,
-Will

>  extern __inline int __attribute__((__gnu_inline__,
> __always_inline__, __artificial__))
>  _mm_movemask_epi8 (__m128i __A)
>  {



Re: [PATCH] handle initialized flexible array members in __builtin_object_size [PR92815]

2020-04-23 Thread Jeff Law via Gcc-patches
On Wed, 2020-04-22 at 15:36 -0600, Martin Sebor via Gcc-patches wrote:
> When computing the size of an object with a flexible array member
> the object size pass doesn't consider that the initializer of such
> an object can result in its size being in excess of the size of
> the enclosing type.  As a result, stores into such objects by
> string functions cause false positive warnings and can abort
> at runtime.
> 
> The warnings are an old regression but as more of them make use
> of the object size results more of them are affected by the bug.
> The abort goes back to when support for _FORTIFY_SOURCE was added.
> 
> The same problem has already been independently fixed in GCC 10
> for -Warray-bounds which doesn't use the object size checking pass,
> but the object size bug still remains.  The attached patch corrects
> it as well.
> 
> Tested on x86_64-linux.
Do you need to change the guarding condition to use decl_init_size instead of
DECL_SIZE_UNIT as well?

 else if (pt_var
   && DECL_P (pt_var)
   && tree_fits_uhwi_p (DECL_SIZE_UNIT (pt_var))
^^
   && tree_to_uhwi (DECL_SIZE_UNIT (pt_var)) < offset_limit)
^^
{
  *pdecl = pt_var;
  pt_var_size = DECL_SIZE_UNIT (pt_var);
}
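
For readers following along, the kind of object under discussion can be sketched as below. This is only an illustration of the GNU C extension that allows a flexible array member to be statically initialized; the struct and helper names are invented here and do not come from the patch or its tests.

```c
#include <assert.h>
#include <stddef.h>
#include <string.h>

struct flex
{
  int len;
  char data[];   /* flexible array member */
};

/* GNU C extension: statically initialize the flexible array member.
   The object now needs more storage than sizeof (struct flex).  */
static struct flex hello = { 5, "hello" };

/* Lower bound on the storage the initialized object occupies: the
   fixed part plus the initializer of the trailing array, including
   its terminating NUL.  */
static size_t flex_object_size (void)
{
  return offsetof (struct flex, data) + sizeof "hello";
}
```

A size computed from the type alone (sizeof (struct flex)) is smaller than flex_object_size (), which is the discrepancy behind the false positives described in the quoted text.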

Jeff



[PATCH] tsan: Add optional support for distinguishing volatiles

2020-04-23 Thread Marco Elver via Gcc-patches
Add support to optionally emit different instrumentation for accesses to
volatile variables. While the default TSAN runtime likely will never
require this feature, other runtimes for different environments that
have subtly different memory models or assumptions may require
distinguishing volatiles.

OS kernels are one such environment: volatile is still used there in
various places for various reasons, and is often declared to be
"safe enough" even in multi-threaded contexts. One such example is the
Linux kernel, which implements various synchronization primitives using
volatile (READ_ONCE(), WRITE_ONCE()). Here the Kernel Concurrency
Sanitizer (KCSAN) [1], is a runtime that uses TSAN instrumentation but
otherwise implements a very different approach to race detection from
TSAN.

While in the Linux kernel it is generally discouraged to use volatiles
explicitly, the topic will likely come up again, and we will eventually
need to distinguish volatile accesses [2]. The other use-case is
ignoring data races on specially marked variables in the kernel, for
example bit-flags (here we may hide 'volatile' behind a different name
such as 'no_data_race').

[1] https://github.com/google/ktsan/wiki/KCSAN
[2] 
https://lkml.kernel.org/r/CANpmjNOfXNE-Zh3MNP=-gmnhvKbsfUfTtWkyg_=vqtxs4nn...@mail.gmail.com
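
As a rough sketch of the kind of code this targets: the accessors below are simplified stand-ins for the kernel macros named above, not their real definitions, and ready_flag, publish and is_ready are hypothetical names.

```c
#include <assert.h>

/* Simplified stand-ins for kernel-style accessors: each access goes
   through a volatile-qualified lvalue.  */
#define READ_ONCE(x)       (*(const volatile __typeof__ (x) *) &(x))
#define WRITE_ONCE(x, val) (*(volatile __typeof__ (x) *) &(x) = (val))

static int ready_flag;

/* Volatile store to the flag.  */
static void publish (void)
{
  WRITE_ONCE (ready_flag, 1);
}

/* Volatile load of the flag.  */
static int is_ready (void)
{
  return READ_ONCE (ready_flag);
}
```

Built with -fsanitize=thread and --param=tsan-distinguish-volatile=1, the intent of the patch is that accesses like these receive the separate volatile instrumentation, while plain accesses keep the regular one.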

2020-04-23  Marco Elver  

gcc/
* params.opt: Define --param=tsan-distinguish-volatile=[0,1].
* sanitizer.def (BUILT_IN_TSAN_VOLATILE_READ1): Define new
builtin for volatile instrumentation of reads/writes.
(BUILT_IN_TSAN_VOLATILE_READ2): Likewise.
(BUILT_IN_TSAN_VOLATILE_READ4): Likewise.
(BUILT_IN_TSAN_VOLATILE_READ8): Likewise.
(BUILT_IN_TSAN_VOLATILE_READ16): Likewise.
(BUILT_IN_TSAN_VOLATILE_WRITE1): Likewise.
(BUILT_IN_TSAN_VOLATILE_WRITE2): Likewise.
(BUILT_IN_TSAN_VOLATILE_WRITE4): Likewise.
(BUILT_IN_TSAN_VOLATILE_WRITE8): Likewise.
(BUILT_IN_TSAN_VOLATILE_WRITE16): Likewise.
* tsan.c (get_memory_access_decl): Argument if access is
volatile. If param tsan-distinguish-volatile is non-zero, and
access if volatile, return volatile instrumentation decl.
(instrument_expr): Check if access is volatile.

gcc/testsuite/
* c-c++-common/tsan/volatile.c: New test.
---
 gcc/ChangeLog  | 19 +++
 gcc/params.opt |  4 ++
 gcc/sanitizer.def  | 21 
 gcc/testsuite/ChangeLog|  4 ++
 gcc/testsuite/c-c++-common/tsan/volatile.c | 62 ++
 gcc/tsan.c | 53 --
 6 files changed, 146 insertions(+), 17 deletions(-)
 create mode 100644 gcc/testsuite/c-c++-common/tsan/volatile.c

diff --git a/gcc/ChangeLog b/gcc/ChangeLog
index 5f299e463db..aa2bb98ae05 100644
--- a/gcc/ChangeLog
+++ b/gcc/ChangeLog
@@ -1,3 +1,22 @@
+2020-04-23  Marco Elver  
+
+   * params.opt: Define --param=tsan-distinguish-volatile=[0,1].
+   * sanitizer.def (BUILT_IN_TSAN_VOLATILE_READ1): Define new
+   builtin for volatile instrumentation of reads/writes.
+   (BUILT_IN_TSAN_VOLATILE_READ2): Likewise.
+   (BUILT_IN_TSAN_VOLATILE_READ4): Likewise.
+   (BUILT_IN_TSAN_VOLATILE_READ8): Likewise.
+   (BUILT_IN_TSAN_VOLATILE_READ16): Likewise.
+   (BUILT_IN_TSAN_VOLATILE_WRITE1): Likewise.
+   (BUILT_IN_TSAN_VOLATILE_WRITE2): Likewise.
+   (BUILT_IN_TSAN_VOLATILE_WRITE4): Likewise.
+   (BUILT_IN_TSAN_VOLATILE_WRITE8): Likewise.
+   (BUILT_IN_TSAN_VOLATILE_WRITE16): Likewise.
+   * tsan.c (get_memory_access_decl): Argument if access is
+   volatile. If param tsan-distinguish-volatile is non-zero, and
+   access if volatile, return volatile instrumentation decl.
+   (instrument_expr): Check if access is volatile.
+
 2020-04-23  Srinath Parvathaneni  
 
* config/arm/arm_mve.h (__arm_vbicq_n_u16): Modify function parameter's
diff --git a/gcc/params.opt b/gcc/params.opt
index 4aec480798b..9b564bb046c 100644
--- a/gcc/params.opt
+++ b/gcc/params.opt
@@ -908,6 +908,10 @@ Stop reverse growth if the reverse probability of best edge is less than this th
 Common Joined UInteger Var(param_tree_reassoc_width) Param Optimization
 Set the maximum number of instructions executed in parallel in reassociated tree.  If 0, use the target dependent heuristic.
 
+-param=tsan-distinguish-volatile=
+Common Joined UInteger Var(param_tsan_distinguish_volatile) IntegerRange(0, 1) Param
+Emit special instrumentation for accesses to volatiles.
+
 -param=uninit-control-dep-attempts=
 Common Joined UInteger Var(param_uninit_control_dep_attempts) Init(1000) IntegerRange(1, 65536) Param Optimization
 Maximum number of nested calls to search for control dependencies during uninitialized variable analysis.
diff --git a/gcc/sanitizer.def b/gcc/sanitizer.def
index 11eb6467eba..a32715ddb92 100644
--- a/gcc/sanitizer.def
+++ b/

Re: [RFC][PR target PR90000] (rs6000) Compile time hog w/impossible asm constraint lra loop

2020-04-23 Thread will schmidt via Gcc-patches
On Wed, 2020-04-22 at 12:26 -0600, Jeff Law wrote:
> On Fri, 2020-04-10 at 16:40 -0500, will schmidt via Gcc-patches
> wrote:
> > [RFC][PR target/9] Compile time hog w/impossible asm constraint
> > lra loop
> > 
> > Hi,
> >   RFC for a bandaid/patch to partially address target PR/9.
> > 
> > This adds an escape condition from the forever loop where 
> > LRA gets stuck while attempting to handle constraints from an 
> > instruction that has previously suffered an impossible constraint
> > error.
> > 
> > This is somewhat inspired by MAX_RELOAD_INSNS_NUMBER as
> > seen in lra-constraints.c lra_constraints().   This utilizes the
> > existing counter variable lra_constraint_iter.
> > 
> > More needs to be done here, as this does replace a spin-forever
> > situation with an ICE.
> > 
> > Thanks
> > -Will
> > 
> > 
> > gcc/
> > 2020-04-10  Will Schmidt  
> > 
> > * lra.c: Add include of rtl-error.h.
> > (MAX_LRA_CONSTRAINT_PASSES): New define.
> > (lra): Add check of lra_constraint_iter value.
> 
> Doesn't this argue that there's some other datastructure that needs
> to be updated
> when we removed the impossible asm?

Yes, i think so.   I'm just not sure exactly what or where.
The submitted patch is minimally allowing for manageable-in-size reload
dumps for my continued debug.  :-)

There is an old patch that addressed what looks like a similar issue,
but i wasn't able to directly apply that to this situation without
failing in other places. 

>commit e86c0101ae59b32c3f10edcca78398cbf8848eaa
>Author: Steven Bosscher 
>Date:   Thu Jan 24 10:30:26 2013 +
>re PR inline-asm/55934 (LRA inline asm error recovery)

Which does a bit more, but at its core is this:

+ PATTERN (insn) = gen_rtx_USE (VOIDmode, const0_rtx);
+ lra_set_insn_deleted (insn);


I suspect this particular scenario with the testcase is a dependency across
several 'insns', so marking just one as deleted is not enough
(but I'm not sure).

void foo (void)
{
  register float __attribute__ ((mode(SD))) r31 __asm__ ("r31");
  register float __attribute__ ((mode(SD))) fr1 __asm__ ("fr1");

  __asm__ ("#" : "=d" (fr1));
  r31 = fr1;
  __asm__ ("#" : : "r" (r31));
}

thanks
-Will

> 
> Jeff
> > 
> 
> 



Re: [RFC] split pseudos during loop unrolling in RTL unroller

2020-04-23 Thread Jeff Law via Gcc-patches
On Thu, 2020-04-23 at 16:32 +0100, Richard Sandiford wrote:
> Jeff Law via Gcc-patches  writes:
> > On Thu, 2020-04-23 at 15:07 +0200, Richard Biener wrote:
> > > On Thu, Apr 23, 2020 at 2:52 PM Segher Boessenkool
> > >  wrote:
> > > > On Thu, Apr 23, 2020 at 02:25:40PM +0200, Richard Biener wrote:
> > > > > > > But being stuck with something means no progress...  I know
> > > > > > > very well it's 100 times harder to get rid of something than to
> > > > > > > add something new ontop.
> > > > > > 
> > > > > > Well, what progress do you expect to make?  After expand that is :-)
> > > > > 
> > > > > I'd like the RTL pipeline before RA to shrink significantly, no PRE,
> > > > > no CSE, ...
> > > > 
> > > > RTL CSE for example is very much required to get any good code.  It
> > > > needs to CSE stuff that wasn't there before expand.
> > > 
> > > Sure, but then we should fix that!
> > Exactly.  Its purpose largely becomes dealing with the redundancies exposed
> > by expansion, i.e. address arithmetic and the like.  A lot of its path
> > following code should be throttled back.
> 
> Agreed.  But things like address legitimisation and ensuring immediate
> operands are in-range could be done in gimple too, and probably be
> optimised more effectively and efficiently in SSA form than in RTL.
Yea, but once exposed we'd probably need some kind of way to prevent something
like an out-of-range immediate from being propagated back into the statement. 
Certainly do-able though.  Maybe significantly easier than resurrecting RTL-SSA.

> The ultimate question then wouldn't just be "does the target support
> this optab?" but also "are these operands already legitimate for the
> optab"?
> 
> I also wonder how difficult it would be to get recog to recognise
> gimple :-)
:-)  You know better than I...  The more structured nature of gimple probably
helps that effort. 

jeff
> 



[PATCH] rs6000: Replace outdated link to ELFv2 ABI

2020-04-23 Thread Bill Schmidt via Gcc-patches
A user reported that we are still referring to a public review
draft of the ELFv2 ABI specification.  Replace that by a permalink.

Tested with "make pdf" and verified the link is hot.  Is this okay
for master?

Thanks,
Bill

2020-04-24  Bill Schmidt  

* gcc/doc/extend.texi (PowerPC AltiVec/VSX Built-in Functions):
Replace outdated link to ELFv2 ABI.
---
 gcc/doc/extend.texi | 2 +-
 1 file changed, 1 insertion(+), 1 deletion(-)

diff --git a/gcc/doc/extend.texi b/gcc/doc/extend.texi
index c17b1040bde..936c22e2fe7 100644
--- a/gcc/doc/extend.texi
+++ b/gcc/doc/extend.texi
@@ -17633,7 +17633,7 @@ subject to change without notice.
 
 GCC complies with the OpenPOWER 64-Bit ELF V2 ABI Specification,
 which may be found at
-@uref{http://openpowerfoundation.org/wp-content/uploads/resources/leabi-prd/content/index.html}.
+@uref{https://openpowerfoundation.org/?resource_lib=64-bit-elf-v2-abi-specification-power-architecture}.
 Appendix A of this document lists the vector API interfaces that must be
 provided by compliant compilers.  Programmers should preferentially use
 the interfaces described therein.  However, historically GCC has provided
-- 
2.17.1



Re: [AMD GCN] Use 'radeon' for the environment variable 'ACC_DEVICE_TYPE'

2020-04-23 Thread Andrew Stubbs

On 21/04/2020 13:24, Thomas Schwinge wrote:

I wondered whether for symmetry, the GCC-internal 'GOMP_DEVICE_GCN',
'OFFLOAD_TARGET_TYPE_GCN' should also be renamed to '*_RADEON'?  Or,
going by example of '*_NVIDIA_PTX', name them '*_AMD_GCN'.  Or, in fact
then leave them as '*_GCN', given Julian's point in another thread that
"CPU architectures or core implementations [sometimes shift] company
allegiance".  Thank you for listening to me thinking aloud.  ;-)


I don't think the GCC internal names need to change. The port name is 
now "gcn", and the target name "amdgcn". I don't think we need a third 
name for it!



But more importantly, what about the user-visible 'gcn' in the
'ACC_DEVICE_TYPE' environment variable?  As I'd quoted in
:

| Per OpenACC 3.0, A.1.2. "AMD
| GPU Targets", for example, there is 'acc_device_radeon'

... which you've addressed...

| (and "the
| case-insensitive name 'radeon' for the environment variable
| 'ACC_DEVICE_TYPE'").

..., but this not yet?


Oh, right, the user-visible case ought to accept "radeon", at least. :-(


Please see the attached "[AMD GCN] Use 'radeon' for the environment
variable 'ACC_DEVICE_TYPE'", completely untested.  Will you please test
and review that?  If approving this patch, please respond with
"Reviewed-by: NAME " so that your effort will be recorded in the
commit log, see .


This is OK, assuming it tests OK.

Andrew


Re: [PATCH v2] aarch64: Add TX3 machine model

2020-04-23 Thread Anton Youdkevitch

Hi Kyrylo,

On 23.4.2020 11:29 , Kyrylo Tkachov wrote:

Hi Anton,

Thanks to you and Joel for clarifying the copyright assignment...


-Original Message-
From: Gcc-patches  On Behalf Of Anton
Youdkevitch
Sent: 20 April 2020 19:29
To: gcc-patches@gcc.gnu.org
Cc: jo...@marvell.com
Subject: [PATCH v2] aarch64: Add TX3 machine model

Here is the patch introducing the thunderx3t11 machine model
for the scheduler. A name for the new chip was added to the
list of the names to be recognized as a valid parameter for mcpu
and mtune flags. The TX2 cost model was reused for TX3.

The previously used "cryptic" name for the command line
parameter is replaced with the same "thunderx3t11" name.

Bootstrapped on AArch64.

2020-04-20 Anton Youdkevitch 

 * config/aarch64/aarch64-cores.def: Add the chip name.
 * config/aarch64/aarch64-tune.md: Regenerated.
 * gcc/config/aarch64/aarch64.c: Add the cost tables for the chip.
 * gcc/config/aarch64/thunderx3t11.md: New file: add the new
 machine model for the scheduler
 * gcc/config/aarch64/aarch64.md: Include the new model.

No "gcc/" in the path here.
Also, please add an entry in the documentation in doc/invoke.texi for the new 
option.

Yes, sure, I missed that.
Will correct.


---
  gcc/config/aarch64/aarch64-cores.def |   3 +
  gcc/config/aarch64/aarch64-tune.md   |   2 +-
  gcc/config/aarch64/aarch64.c |  27 +
  gcc/config/aarch64/aarch64.md|   1 +
  gcc/config/aarch64/thunderx3t11.md   | 686 +++
  5 files changed, 718 insertions(+), 1 deletion(-)

diff --git a/gcc/config/aarch64/aarch64-cores.def b/gcc/config/aarch64/aarch64-cores.def
index ea9b98b..ece6c34 100644
--- a/gcc/config/aarch64/aarch64-cores.def
+++ b/gcc/config/aarch64/aarch64-cores.def
@@ -95,6 +95,9 @@ AARCH64_CORE("vulcan",  vulcan, thunderx2t99, 8_1A,  AARCH64_FL_FOR_ARCH8_1 | AA
  /* Cavium ('C') cores. */
  AARCH64_CORE("thunderx2t99",  thunderx2t99,  thunderx2t99, 8_1A,  AARCH64_FL_FOR_ARCH8_1 | AARCH64_FL_CRYPTO, thunderx2t99, 0x43, 0x0af, -1)
  
+/* Cavium ('??') cores (TX3). */
+AARCH64_CORE("thunderx3t11",  thunderx3t11,  thunderx3t11, 8_1A,  AARCH64_FL_FOR_ARCH8_1 | AARCH64_FL_CRYPTO, thunderx3t11, 0x43, 0x0b8, 0x0a)
+

I appreciate this is early CPU enablement and documentation is not always ready, but 
would it be better to use a "Marvell cores" comment above that entry? Up to you.
The more important thing is the architecture features enabled. The entry here 
means it's an Armv8.1-a CPU (with crypto).
 From what I can find on the Internet [1] this CPU has Armv8.3-a features. Can 
you please double-check that and update the flags here, if necessary?
It would be a shame to miss out on architecture enablement for 
-mcpu=thunderx3t11 due to a flag mismatch.

No, I was using the wrong version. It has to be corrected.

Thanks a lot for your comments.

--
  Anton


Re: [PATCH PR94708] rtl combine should consider NaNs when generate fp min/max

2020-04-23 Thread Segher Boessenkool
Hi!

On Thu, Apr 23, 2020 at 02:34:03PM +, Zhanghaijian (A) wrote:
> Thanks for your suggestions. I have modified accordingly.
> Attached please find the adapted patch. Bootstrapped and tested on the aarch64 Linux
> platform.
> Does the v2 patch look better?
> 
> diff --git a/gcc/combine.c b/gcc/combine.c
> index cff76cd3303..ad8a385fc48 100644
> --- a/gcc/combine.c
> +++ b/gcc/combine.c
> @@ -6643,7 +6643,8 @@ simplify_if_then_else (rtx x)
>  
>/* Look for MIN or MAX.  */
>  
> -  if ((! FLOAT_MODE_P (mode) || flag_unsafe_math_optimizations)
> +  if ((! FLOAT_MODE_P (mode)
> +   || (!HONOR_NANS (mode) && !HONOR_SIGNED_ZEROS (mode)))
>&& comparison_p
>&& rtx_equal_p (XEXP (cond, 0), true_rtx)
>&& rtx_equal_p (XEXP (cond, 1), false_rtx)

> > The GENERIC folding routine producing min/max is avoiding it when those
> > are honored (and it doesn't check flag_unsafe_math_optimizations at all).
> > 
> > Certainly the patch is an incremental correct fix, with the flag 
> > testing replaced by the mode feature testing.
> 
> Yeah, and the SMAX etc. definition is so weak that it isn't obvious that this 
> combine transform is valid without this flag.  We can or should fix that, of 
> course :-)

Please put flag_unsafe_math_optimizations back?  It isn't clear at all
that we do not need it.
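
For reference, the source-level pattern in question behaves as follows under IEEE semantics. This is a plain C sketch of the conditional, not of what combine emits, and select_min is an illustrative name.

```c
#include <assert.h>
#include <math.h>

/* The if-then-else shape that the transform would rewrite as a
   min operation.  */
static double select_min (double a, double b)
{
  return a < b ? a : b;
}
```

Because any comparison with a NaN is false, select_min (NAN, 1.0) yields 1.0 but select_min (1.0, NAN) yields NaN; likewise select_min (0.0, -0.0) yields -0.0 while select_min (-0.0, 0.0) yields +0.0. The result depends on operand order in both cases, so a replacement operation has to match that behaviour or the modes involved must not honor NaNs and signed zeros.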

Also, do you have a changelog entry?


Segher


Re: introduce target tmpnam and require it in tests relying on it

2020-04-23 Thread Martin Sebor via Gcc-patches

On 4/23/20 2:21 AM, Alexandre Oliva wrote:

On Apr 21, 2020, Bernhard Reutner-Fischer  wrote:


On 17 April 2020 21:21:41 CEST, Martin Sebor via Gcc-patches
 wrote:

On 4/17/20 11:48 AM, Alexandre Oliva wrote:

On Apr  9, 2020, Alexandre Oliva  wrote:


Some target C libraries that aren't recognized as freestanding don't
have filesystem support, so calling tmpnam, fopen/open and
remove/unlink fails to link.



This patch introduces a tmpnam effective target to the testsuite, and
requires it in the tests that call tmpnam.



for  gcc/testsuite/ChangeLog



* lib/target-supports.exp (check_effective_target_tmpnam): New.
* gcc.c-torture/execute/fprintf-2.c: Require it.
* gcc.c-torture/execute/printf-2.c: Likewise.
* gcc.c-torture/execute/user-printf.c: Likewise.


Ping?

https://gcc.gnu.org/pipermail/gcc-patches/2020-April/543672.html


I'm okay with the changes to the tests.

The target-supports.exp changes look reasonable to me as well but
I can't approve them.  Since you said it's for targets that don't
have file I/O functions I wonder if the name would better reflect
that if it were called, say, check_effective_target_fileio?



If you want a fileio predicate then please do not key it off obsolescent
functions.


I'd actually considered adding two expect/dejagnu procs, one for fileio,
one for tmpnam, possibly with the latter depending on the former, but
decided to take the simpler path on the grounds that all tests that
would have depended on fileio would also depend on tmpnam.

Plus, it did seem to make sense to test for tmpnam, since it probably
won't be found on freestanding environments (the affected tests require
non-freestanding effective target, but that translates to requiring I/O
support), and tmpnam might be removed from standards in the future.  We
might want to catch that, rather than silently skip the test, though.

I'd be glad to add an intermediate fileio effective target, or rename
the proposed one and drop tmpnam from it, if there's agreement such a
separate effective target would be more useful.


So, should I rename _tmpnam to _fileio and drop tmpnam() from the code
snippet in the effective target test?  Or should I keep _tmpnam and
introduce _fileio?  With or without a dependency of _tmpnam on _fileio?

Since Jeff Law approved the patch as is, would you guys mind if I make
any further changes as separate, followup patches?


Sure.  I'd go with _fileio but that's just a suggestion.  I don't
think there are enough uses of tmpnam in the test suite or risk
that it will disappear anytime soon to justify its own target test
or removing its uses, but I'm not opposed to it either.

Martin


Re: [RFC] split pseudos during loop unrolling in RTL unroller

2020-04-23 Thread Segher Boessenkool
Hi!

On Thu, Apr 23, 2020 at 08:29:34AM -0600, Jeff Law wrote:
> On Thu, 2020-04-23 at 07:07 -0500, Segher Boessenkool wrote:
> > Most of expand is *other stuff*.  Expand does a *lot* of things that are
> > actually changing the code.  And much of that is not done anywhere else
> > either yet, so this cannot be fixed by simply deleting the offending code.
> A lot of what is done by the expansion pass is historical and dates back to when
> we never optimized more than a statement at a time for trees and the real heavy
> lifting was all done in RTL.

Yes.

> The introduction of tree-ssa meant that all the expansion code which wanted to
> see a "nice" complex statement and generate good RTL code was useless and as a
> result we saw significant regressions in the end code quality.
> 
> That in turn brought in TER, whose sole purpose was to reconstruct those more
> complex trees for the purposes of improving initial expansion.  I think match.pd
> has the potential to make TER go away as match.pd is a generic combining
> framework.

Ooh that would be great news as well!

For moving stuff out of expand, it needs to *move*, not just be deleted.

> WRT extending SSA deeper, I'm all for it and it's always been something I wanted
> to see happen.  I've always believed we could do RTL SSA to the register
> allocation phase and that doing so would be a positive.

Well, since we have hard regs as well, it cannot really be SSA?  And SSA
doesn't fit RTL all that well anyway?  But the "webs always" idea is the
same of course.

> If we start at the front
> of the RTL pipeline and work towards the back we bump into CSE quickly, which
> would be good.  Our CSE is awful across multiple axes.

RTL CSE does quite many things that aren't really CSE...  Have to figure
out for each whether it still is useful, and what to do with it then.

> I suspect most RTL passes
> would be reimplementations rather than bolting SSA on the side and I suspect
> those reimplementations would be simpler than the existing stuff -- I'm a strong
> believer there's a lot of dead code in the RTL passes as well.

Sure, but there also is a lot of stuff that perhaps is historical, and
perhaps is badly architected, but still does beneficial things.

If a reimplementation hugely improves things it's not bad in the end to
lose all such little things, but otherwise?


Segher


[wwwdocs] Remove form for (un)subscribing from old mailing lists

2020-04-23 Thread Jonathan Wakely via Gcc-patches
This no longer works, so direct people to the mailman listinfo pages
instead.

OK to commit to wwwdocs?


commit 2ae426c876e6ddc026bc2a30e82cb56946c9031f
Author: Jonathan Wakely 
Date:   Thu Apr 23 18:38:00 2020 +0100

Remove form for (un)subscribing from old mailing lists

This no longer works, so direct people to the mailman listinfo pages
instead.

diff --git a/htdocs/lists.html b/htdocs/lists.html
index b05e1712..ea37aedc 100644
--- a/htdocs/lists.html
+++ b/htdocs/lists.html
@@ -194,46 +194,9 @@ it's assumed that no-one on these lists means to speak for their company.
 
 Subscribing/unsubscribing
 
-You will be able to subscribe or unsubscribe from any of the GCC mailing
-lists via this form:
-
-
-
-
-https://gcc.gnu.org/cgi-bin/ml-request";>
-  Mailing list: 
-  
-   gcc-announce
-   gcc
-   gcc-help
-   gcc-bugs
-   gcc-patches
-   gcc-testresults
-   gcc-cvs
-   gcc-cvs-wwwdocs
-   gcc-regression
-   libstdc++
-   libstdc++-cvs
-   fortran
-   jit
-   gnutools-advocacy
-  
- 
-  Your e-mail address:
-  
- 
-  Digest version
-   
-  
-subscribe
-unsubscribe
-  
-   
- 
-
-
+You can subscribe or unsubscribe from any of the GCC mailing
+lists by clicking on the list name above and then following the
+"more information about this list" link.
 
 If you're having trouble getting off a list, look at the
 List-Unsubscribe: header on a message sent to that list.


[PATCH] tree: Fix up get_narrower [PR94724]

2020-04-23 Thread Jakub Jelinek via Gcc-patches
Hi!

In the recent get_narrower change, I wanted it to be efficient and avoid
recursion if there are many nested COMPOUND_EXPRs.  That builds the
COMPOUND_EXPR nest with the right arguments, but as build2_loc computes some
flags like TREE_SIDE_EFFECTS, TREE_CONSTANT and TREE_READONLY, when it
is called with something that will not be the argument in the end, those
flags are computed incorrectly.
So, this patch instead uses an auto_vec and builds them in the reverse order
so when they are built, they are built with the correct operands.
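
To illustrate the ordering problem outside of GCC, here is a minimal sketch on a toy expression type.  Everything in it (Node, make_leaf, make_compound, rebuild_with_new_leaf) is a hypothetical stand-in, not GCC's tree API; the only property carried over is that the constructor computes a flag from the operands it is handed, as build2_loc does.  Rebuilding the nest innermost-first means every node is created with its final second operand:

```cpp
#include <cassert>
#include <memory>
#include <vector>

// Toy expression node, NOT GCC's tree.  The side_effects flag is computed
// once, at construction time, from the operands that are passed in.
struct Node {
  enum Kind { LEAF, COMPOUND } kind;
  bool side_effects;
  std::shared_ptr<Node> op0, op1;
};
using NodeP = std::shared_ptr<Node>;

NodeP make_leaf(bool side_effects) {
  return std::make_shared<Node>(
      Node{Node::LEAF, side_effects, nullptr, nullptr});
}

// Stand-in for build2_loc (..., COMPOUND_EXPR, ...): the flag depends on
// both operands, so building with a placeholder operand would get it wrong.
NodeP make_compound(NodeP a, NodeP b) {
  bool se = a->side_effects || b->side_effects;
  return std::make_shared<Node>(Node{Node::COMPOUND, se, a, b});
}

// Replace the innermost second operand of a COMPOUND nest with new_leaf.
// The nest is collected outermost-first and rebuilt in reverse, so each
// new COMPOUND node is constructed with its final operands.
NodeP rebuild_with_new_leaf(const NodeP &outer, NodeP new_leaf) {
  std::vector<NodeP> nest;
  for (NodeP op = outer; op->kind == Node::COMPOUND; op = op->op1)
    nest.push_back(op);
  NodeP ret = new_leaf;
  for (auto it = nest.rbegin(); it != nest.rend(); ++it)
    ret = make_compound((*it)->op0, ret);
  return ret;
}
```

The loop is iterative, so deep nests do not recurse, and no node is ever patched after construction.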

Bootstrapped/regtested on powerpc64{,le}-linux, ok for trunk?

2020-04-23  Jakub Jelinek  

PR middle-end/94724
* tree.c (get_narrower): Instead of creating COMPOUND_EXPRs
temporarily with non-final second operand and updating it later,
push COMPOUND_EXPRs into a vector and process it in reverse,
creating COMPOUND_EXPRs with the final operands.

* gcc.c-torture/execute/pr94724.c: New test.

--- gcc/tree.c.jj   2020-04-04 09:14:29.808002636 +0200
+++ gcc/tree.c  2020-04-23 11:07:34.003675831 +0200
@@ -8881,18 +8881,22 @@ get_narrower (tree op, int *unsignedp_pt
 
   if (TREE_CODE (op) == COMPOUND_EXPR)
 {
-  while (TREE_CODE (op) == COMPOUND_EXPR)
+  do
op = TREE_OPERAND (op, 1);
+  while (TREE_CODE (op) == COMPOUND_EXPR);
   tree ret = get_narrower (op, unsignedp_ptr);
   if (ret == op)
return win;
-  op = win;
-  for (tree *p = &win; TREE_CODE (op) == COMPOUND_EXPR;
-  op = TREE_OPERAND (op, 1), p = &TREE_OPERAND (*p, 1))
-   *p = build2_loc (EXPR_LOCATION (op), COMPOUND_EXPR,
-TREE_TYPE (ret), TREE_OPERAND (op, 0),
-ret);
-  return win;
+  auto_vec  v;
+  unsigned int i;
+  for (tree op = win; TREE_CODE (op) == COMPOUND_EXPR;
+  op = TREE_OPERAND (op, 1))
+   v.safe_push (op);
+  FOR_EACH_VEC_ELT_REVERSE (v, i, op)
+   ret = build2_loc (EXPR_LOCATION (op), COMPOUND_EXPR,
+ TREE_TYPE (win), TREE_OPERAND (op, 0),
+ ret);
+  return ret;
 }
   while (TREE_CODE (op) == NOP_EXPR)
 {
--- gcc/testsuite/gcc.c-torture/execute/pr94724.c.jj   2020-04-23 11:11:52.470736940 +0200
+++ gcc/testsuite/gcc.c-torture/execute/pr94724.c   2020-04-23 11:14:27.999367103 +0200
@@ -0,0 +1,12 @@
+/* PR middle-end/94724 */
+
+short a, b;
+
+int
+main ()
+{
+  (0, (0, (a = 0 >= 0, b))) != 53601;
+  if (a != 1)
+__builtin_abort ();
+  return 0;
+}

Jakub



Re: [Version 2][PATCH][gcc][PR94230]provide an option to change the size limitation for -Wmisleading-indent

2020-04-23 Thread Richard Sandiford
Qing Zhao  writes:
> ---
>  gcc/c-family/c-indentation.c   |  3 +++
>  gcc/common.opt |  5 +
>  gcc/doc/invoke.texi| 15 ++-
>  gcc/testsuite/gcc.dg/plugin/location-overflow-test-1.c |  2 +-
>  gcc/toplev.c   |  3 +++
>  5 files changed, 26 insertions(+), 2 deletions(-)
>
> diff --git a/gcc/c-family/c-indentation.c b/gcc/c-family/c-indentation.c
> index f737555..7074b10 100644
> --- a/gcc/c-family/c-indentation.c
> +++ b/gcc/c-family/c-indentation.c
> @@ -67,6 +67,9 @@ get_visual_column (expanded_location exploc, location_t loc,
> "%<-Wmisleading-indentation%> is disabled from this point"
> " onwards, since column-tracking was disabled due to"
> " the size of the code/headers");
> +   inform (loc,
> +   "please add %<-flarge-source-files%> to invoke more" 
> +   " column-tracking for large source files");
>   }
>return false;
>  }

This should be conditional on !flag_large_source_files.

> diff --git a/gcc/common.opt b/gcc/common.opt
> index 4368910..10a3d5b 100644
> --- a/gcc/common.opt
> +++ b/gcc/common.opt
> @@ -1597,6 +1597,11 @@ fkeep-gc-roots-live
>  Common Undocumented Report Var(flag_keep_gc_roots_live) Optimization
>  ; Always keep a pointer to a live memory block
>  
> +flarge-source-files
> +Common Report Var(flag_large_source_files) Init(0)
> +Adjust GCC to cope with large source files to provide more accurate
> +column information.
> +

I'm having difficulty suggesting wording here, but I think it would be good
to mention the downside.  How about:

--
Improve GCC's ability to track column numbers in large source files,
at the expense of slower compilation.
--

>  floop-parallelize-all
>  Common Report Var(flag_loop_parallelize_all) Optimization
>  Mark all loops as parallel.
> diff --git a/gcc/doc/invoke.texi b/gcc/doc/invoke.texi
> index 96a9516..c6ea9ef 100644
> --- a/gcc/doc/invoke.texi
> +++ b/gcc/doc/invoke.texi
> @@ -574,7 +574,7 @@ Objective-C and Objective-C++ Dialects}.
>  -fdebug-cpp  -fdirectives-only  -fdollars-in-identifiers  @gol
>  -fexec-charset=@var{charset}  -fextended-identifiers  @gol
>  -finput-charset=@var{charset}  -fmacro-prefix-map=@var{old}=@var{new}  @gol
> --fmax-include-depth=@var{depth} @gol
> +-fmax-include-depth=@var{depth} -flarge-source-files @gol
>  -fno-canonical-system-headers  -fpch-deps  -fpch-preprocess  @gol
>  -fpreprocessed  -ftabstop=@var{width}  -ftrack-macro-expansion  @gol
>  -fwide-exec-charset=@var{charset}  -fworking-directory @gol

This should be kept in alphabetical order, so after -finput-charset.

> @@ -14151,6 +14151,19 @@ This option may be useful in conjunction with the @option{-B} or
>  perform additional processing of the program source between
>  normal preprocessing and compilation.
>  
> +@item -flarge-source-files
> +@opindex flarge-source-files
> +Adjust GCC to cope with large source files to provide more accurate
> +column information. 
> +By default, GCC will lose accurate column information if the source 
> +file is very large.
> +If this option is provided, GCC will adapt accordingly to provide more
> +accurate column information. 
> +This option may be useful when any user hits the @option{-Wmisleading-indent} 
> +note, "is disabled from this point onwards, since column-tracking was disabled 
> +due to the size of the code/headers", or hits the limit at which columns get
> +dropped from diagnostics.
> +

In a similar vein, how about:

--
Adjust GCC to expect large source files, at the expense of slower
compilation and higher memory usage.

Specifically, GCC normally tracks both column numbers and line numbers
within source files and it normally prints both of these numbers in
diagnostics.  However, once it has processed a certain number of source
lines, it stops tracking column numbers and only tracks line numbers.
This means that diagnostics for later lines do not include column numbers.
It also means that options like @option{-Wmisleading-indentation} cease to work
at that point, although the compiler prints a note if this happens.
Passing @option{-flarge-source-files} significantly increases the number
of source lines that GCC can process before it stops tracking column
numbers.
--
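
To make the trade-off concrete, here is a toy model of the behaviour described in the wording above.  It is only a sketch under invented assumptions: the class name, the fixed cost of 128 ids per line, and the budget values are all hypothetical and are not taken from libcpp.  The point is just that a line that keeps column information consumes far more of a finite id space than one that does not, so a larger budget postpones the point where columns are dropped:

```cpp
#include <cassert>
#include <cstdint>

// Toy model only; names and constants are invented for illustration.
class ToyLineTable {
 public:
  explicit ToyLineTable(std::uint32_t budget_with_columns)
      : budget_(budget_with_columns) {}

  // Register one more source line.  Returns true if column numbers are
  // still tracked for it, false once the budget has been used up.
  bool add_line() {
    if (used_ + kIdsPerLineWithColumns <= budget_) {
      used_ += kIdsPerLineWithColumns;  // room for one id per column
      return true;
    }
    used_ += 1;  // line-only tracking: a single id for the whole line
    return false;
  }

 private:
  static constexpr std::uint32_t kIdsPerLineWithColumns = 128;
  std::uint32_t budget_;
  std::uint32_t used_ = 0;
};

// Count how many of total_lines keep their column information.
std::uint32_t lines_with_columns(std::uint32_t budget,
                                 std::uint32_t total_lines) {
  ToyLineTable table(budget);
  std::uint32_t tracked = 0;
  for (std::uint32_t i = 0; i < total_lines; ++i)
    if (table.add_line())
      ++tracked;
  return tracked;
}
```

In this model, raising the budget is the analogue of passing the new option: the same file keeps columns for more of its lines, at the cost of a larger id space.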

Thanks,
Richard


Re: [patch, fortran] Fix PR 93956, wrong pointer when returned via function

2020-04-23 Thread Thomas Koenig via Gcc-patches

Hi Paul,


You didn't attach the testcase but never mind, I am sure that it is OK :-)


You're right. I thought I had it in the git diff, but then again, I am
still learning the niceties (and not-so-niceties) of git.

Test case is attached, for completeness.


OK for trunk and, if you feel like it, for 9-branch.


Yes, I think this deserves a backport.

Thanks for the review!

Best regards

Thomas
! { dg-do run }
! PR 93956 - span was set incorrectly, leading to wrong code.
! Original test case by "martin".
program array_temps
  implicit none
  
  type :: tt
 integer :: u = 1
 integer :: v = 2
  end type tt

  type(tt), dimension(:), pointer :: r
  integer :: n
  integer, dimension(:), pointer :: p, q, u

  n = 10
  allocate(r(1:n))
  call foo(r%v,n)
  p => get(r(:))
  call foo(p, n)
  call get2(r,u)
  call foo(u,n)
  q => r%v
  call foo(q, n)

deallocate(r)

contains

   subroutine foo(a, n)
  integer, dimension(:), intent(in) :: a
  integer, intent(in) :: n
  if (sum(a(1:n)) /= 2*n) stop 1
   end subroutine foo

   function get(x) result(q)
  type(tt), dimension(:), target, intent(in) :: x
  integer, dimension(:), pointer :: q
  q => x(:)%v
   end function get

   subroutine get2(x,q)
  type(tt), dimension(:), target, intent(in) :: x
  integer, dimension(:), pointer, intent(out) :: q
  q => x(:)%v
end subroutine get2
end program array_temps


Re: [PATCH] c++: Explicit constructor called in copy-initialization [PR90320]

2020-04-23 Thread Jason Merrill via Gcc-patches

On 4/22/20 11:27 PM, Marek Polacek wrote:

This test is rejected with a bogus "use of deleted function" error
starting with r225705 whereby convert_like_real/ck_base no longer
sets LOOKUP_ONLYCONVERTING for user_conv_p conversions.  This does
not seem to be always correct.  To recap, when we have something like
T t = x where T is a class type and the type of x is not T or derived
from T, we perform copy-initialization, something like:
   1. choose a user-defined conversion to convert x to T, the result is
  a prvalue,
   2. use this prvalue to direct-initialize t.

In the second step, explicit constructors should be considered, since
we're direct-initializing.  This is what r225705 fixed.

In this PR we are dealing with the first step, I think, where explicit
constructors should be skipped.  [over.match.copy] says "The converting
constructors of T are candidate functions" which clearly eliminates
explicit constructors.  But we also have to copy-initialize the argument
we are passing to such a converting constructor, and here we should
disregard explicit constructors too.
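
As a small reminder of the underlying rule (this is not the testcase from the PR, and the type T below is made up for illustration), copy-initialization only considers converting constructors, while direct-initialization also considers explicit ones.  With the same argument the two forms can therefore pick different constructors:

```cpp
#include <cassert>

// Hypothetical type: one explicit and one converting constructor.
struct T {
  int which;
  explicit T(int) : which(1) {}  // candidate for direct-initialization only
  T(long) : which(2) {}          // converting constructor
};

// Direct-initialization: explicit T(int) is a candidate and is an exact
// match for the int argument, so it is selected.
int direct_init_choice() {
  T t(0);
  return t.which;
}

// Copy-initialization: explicit constructors are not candidates, so the
// int argument is converted to long and T(long) is selected.
int copy_init_choice() {
  T t = 0;
  return t.which;
}
```

The bug discussed here is the same rule applied one level down, to the copy-initialization of the argument passed to the converting constructor.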

In this testcase we have

   V v = m;

and we choose V::V(M) to convert m to V.  But we wrongly choose
the explicit M::M(M&) to copy-initialize the argument; it's
a better match for a non-const lvalue than the implicit M::M(const M&)
but because it's explicit, we shouldn't use it.

When convert_like is processing the ck_user conversion -- the convfn is
V::V(M) -- it can see that cand->flags contains LOOKUP_ONLYCONVERTING,
but then when we're in build_over_call for this convfn, we have no way
to pass the flag to convert_like for the argument 'm', because convert_like
doesn't take flags.  So I've resorted to setting need_temporary_p in
a ck_rvalue, thus far unused, to signal that we're only interested in
non-explicit constructors.

LOOKUP_COPY_PARM looks relevant, but again, it's a LOOKUP_* flag, so
can't pass it to convert_like.  DR 899 also seemed related, but that
deals with direct-init contexts only.

Does this make sense?

Bootstrapped/regtested on x86_64-pc-linux-gnu, ok for trunk?

PR c++/90320
* call.c (standard_conversion): Set need_temporary_p if FLAGS demands
LOOKUP_ONLYCONVERTING.
(convert_like_real) : If a ck_rvalue has
need_temporary_p set, or LOOKUP_ONLYCONVERTING into FLAGS.

* g++.dg/cpp0x/explicit13.C: New test.
---
  gcc/cp/call.c   | 24 +---
  gcc/testsuite/g++.dg/cpp0x/explicit13.C | 14 ++
  2 files changed, 31 insertions(+), 7 deletions(-)
  create mode 100644 gcc/testsuite/g++.dg/cpp0x/explicit13.C

diff --git a/gcc/cp/call.c b/gcc/cp/call.c
index c58231601c9..d802f1a0c2f 100644
--- a/gcc/cp/call.c
+++ b/gcc/cp/call.c
@@ -92,10 +92,10 @@ struct conversion {
   language standards, e.g. disregarding pointer qualifiers or
   converting integers to pointers.  */
BOOL_BITFIELD bad_p : 1;
-  /* If KIND is ck_ref_bind ck_base_conv, true to indicate that a
+  /* If KIND is ck_ref_bind or ck_base, true to indicate that a
   temporary should be created to hold the result of the
- conversion.  If KIND is ck_ambig or ck_user, true means force
- copy-initialization.  */
+ conversion.  If KIND is ck_ambig, ck_rvalue, or ck_user, true means
+ force copy-initialization.  */
BOOL_BITFIELD need_temporary_p : 1;
/* If KIND is ck_ptr or ck_pmem, true to indicate that a conversion
   from a pointer-to-derived to pointer-to-base is being performed.  */
@@ -1252,6 +1252,8 @@ standard_conversion (tree to, tree from, tree expr, bool c_cast_p,
if (flags & LOOKUP_PREFER_RVALUE)
/* Tell convert_like_real to set LOOKUP_PREFER_RVALUE.  */
conv->rvaluedness_matches_p = true;
+  if (flags & LOOKUP_ONLYCONVERTING)
+   conv->need_temporary_p = true;


Presumably we want the same thing for ck_base?


@@ -7654,10 +7656,18 @@ convert_like_real (conversion *convs, tree expr, tree fn, int argnum,
 destination [is treated as direct-initialization].  [dcl.init] */
flags = LOOKUP_NORMAL;
if (convs->user_conv_p)
-   /* This conversion is being done in the context of a user-defined
-  conversion (i.e. the second step of copy-initialization), so
-  don't allow any more.  */
-   flags |= LOOKUP_NO_CONVERSION;
+   {
+ /* This conversion is being done in the context of a user-defined
+conversion (i.e. the second step of copy-initialization), so
+don't allow any more.  */
+ flags |= LOOKUP_NO_CONVERSION;
+ /* But we might also be performing a conversion of the argument
+to the user-defined conversion, i.e., not a conversion of the
+result of the user-defined conversion.  In which case we skip
+explicit constructors.  */
+ if (convs->kind == ck_rvalue && convs->need_temporary_p)
+   flags |= LOOKUP_ONLYCONVERTING;
+   }
el

Re: [PATCH] Support the new ("v0") mangling scheme in rust-demangle.

2020-04-23 Thread Eduard-Mihai Burtescu
Ping 4: https://gcc.gnu.org/pipermail/gcc-patches/2020-March/542012.html

Thanks,
- Eddy B.

On Mon, Apr 13, 2020, at 05:52, Eduard-Mihai Burtescu wrote:
> Ping 3: https://gcc.gnu.org/pipermail/gcc-patches/2020-March/542012.html
> 
> Thanks,
> - Eddy B.
> 
> On Tue, Apr 7, 2020, at 00:52, Eduard-Mihai Burtescu wrote:
> > Ping 2: https://gcc.gnu.org/pipermail/gcc-patches/2020-March/542012.html
> > 
> > Thanks,
> > - Eddy B.
> > 
> > On Fri, Mar 13, 2020, at 10:28 PM, Eduard-Mihai Burtescu wrote:
> > > This is the libiberty (mainly for binutils/gdb) counterpart of
> > > https://github.com/alexcrichton/rustc-demangle/pull/23.
> > > 
> > > Relevant links for the new Rust mangling scheme (aka "v0"):
> > > * Rust RFC: https://github.com/rust-lang/rfcs/pull/2603
> > > * tracking issue: https://github.com/rust-lang/rust/issues/60705
> > > * implementation: https://github.com/rust-lang/rust/pull/57967
> > > 
> > > This implementation includes full support for UTF-8 identifiers
> > > via punycode, so I've included a testcase for that as well.
> > > (Let me know if it causes any issues and I'll take it out)
> > > 
> > > Last year I've submitted several small patches to rust-demangle
> > > in preparation for upstreaming the entire new demangler, and
> > > feedback from that has proven useful.
> > > For example, I started with error-handling macros, but instead
> > > the code now has "rdm->errored = 1;" before several returns/gotos.
> > > 
> > > The patch is attached instead of inline, as it's over 1000 lines long.
> > > 
> > > Bootstrapped and tested on x86_64-unknown-linux-gnu.
> > > 
> > > Also, I have no commit access, so I'd be thankful if
> > > someone would commit this for me if/once approved.
> > > Attachments:
> > > * 0001-Support-the-new-v0-mangling-scheme-in-rust-demangle.patch


Re: [PATCH] tree: Fix up get_narrower [PR94724]

2020-04-23 Thread Richard Biener
On April 23, 2020 8:19:55 PM GMT+02:00, Jakub Jelinek  wrote:
>Hi!
>
>In the recent get_narrower change, I wanted it to be efficient and avoid
>recursion if there are many nested COMPOUND_EXPRs.  That builds the
>COMPOUND_EXPR nest with the right arguments, but as build2_loc computes some
>flags like TREE_SIDE_EFFECTS, TREE_CONSTANT and TREE_READONLY, when it
>is called with something that will not be the argument in the end, those
>flags are computed incorrectly.
>So, this patch instead uses an auto_vec and builds them in the reverse order
>so when they are built, they are built with the correct operands.
>
>Bootstrapped/regtested on powerpc64{,le}-linux, ok for trunk?

OK. 

Richard. 

>2020-04-23  Jakub Jelinek  
>
>   PR middle-end/94724
>   * tree.c (get_narrower): Instead of creating COMPOUND_EXPRs
>   temporarily with non-final second operand and updating it later,
>   push COMPOUND_EXPRs into a vector and process it in reverse,
>   creating COMPOUND_EXPRs with the final operands.
>
>   * gcc.c-torture/execute/pr94724.c: New test.
>
>--- gcc/tree.c.jj  2020-04-04 09:14:29.808002636 +0200
>+++ gcc/tree.c 2020-04-23 11:07:34.003675831 +0200
>@@ -8881,18 +8881,22 @@ get_narrower (tree op, int *unsignedp_pt
> 
>   if (TREE_CODE (op) == COMPOUND_EXPR)
> {
>-  while (TREE_CODE (op) == COMPOUND_EXPR)
>+  do
>   op = TREE_OPERAND (op, 1);
>+  while (TREE_CODE (op) == COMPOUND_EXPR);
>   tree ret = get_narrower (op, unsignedp_ptr);
>   if (ret == op)
>   return win;
>-  op = win;
>-  for (tree *p = &win; TREE_CODE (op) == COMPOUND_EXPR;
>- op = TREE_OPERAND (op, 1), p = &TREE_OPERAND (*p, 1))
>-  *p = build2_loc (EXPR_LOCATION (op), COMPOUND_EXPR,
>-   TREE_TYPE (ret), TREE_OPERAND (op, 0),
>-   ret);
>-  return win;
>+  auto_vec  v;
>+  unsigned int i;
>+  for (tree op = win; TREE_CODE (op) == COMPOUND_EXPR;
>+ op = TREE_OPERAND (op, 1))
>+  v.safe_push (op);
>+  FOR_EACH_VEC_ELT_REVERSE (v, i, op)
>+  ret = build2_loc (EXPR_LOCATION (op), COMPOUND_EXPR,
>+TREE_TYPE (win), TREE_OPERAND (op, 0),
>+ret);
>+  return ret;
> }
>   while (TREE_CODE (op) == NOP_EXPR)
> {
>--- gcc/testsuite/gcc.c-torture/execute/pr94724.c.jj   2020-04-23 11:11:52.470736940 +0200
>+++ gcc/testsuite/gcc.c-torture/execute/pr94724.c  2020-04-23 11:14:27.999367103 +0200
>@@ -0,0 +1,12 @@
>+/* PR middle-end/94724 */
>+
>+short a, b;
>+
>+int
>+main ()
>+{
>+  (0, (0, (a = 0 >= 0, b))) != 53601;
>+  if (a != 1)
>+__builtin_abort ();
>+  return 0;
>+}
>
>   Jakub



[PATCH] Shortcut identity VEC_PERM expansion [PR94710]

2020-04-23 Thread Jakub Jelinek via Gcc-patches
Hi!

This PR is about the rs6000 backend emitting wrong assembly
for a whole vector shift by 0.  While I think it is desirable
to fix the backend, I don't see why the expander should
try to emit that at all: a whole vector shift by 0 is the identity,
so we can just return the operand.
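
The idea can be sketched on plain arrays.  This is only an illustration of the shortcut, not the optabs.c code; whole_vector_shift is a hypothetical helper, and the element direction and zero fill are assumptions made for the example:

```cpp
#include <array>
#include <cassert>
#include <cstddef>

// Hypothetical helper: shift a whole vector by `shift` elements towards
// index 0, filling the vacated positions with zero.
template <typename T, std::size_t N>
std::array<T, N> whole_vector_shift(const std::array<T, N> &v,
                                    std::size_t shift) {
  // A shift by 0 is the identity, so hand back the operand and never
  // reach the general shifting code below.
  if (shift == 0)
    return v;
  std::array<T, N> out{};  // value-initialized, i.e. all zeros
  for (std::size_t i = 0; i + shift < N; ++i)
    out[i] = v[i + shift];
  return out;
}
```

Returning early keeps the degenerate case away from the general path, which is the code the backend got wrong here.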

Bootstrapped/regtested on powerpc64{,le}-linux, ok for trunk?

2020-04-23  Jakub Jelinek  

PR target/94710
* optabs.c (expand_vec_perm_const): For shift_amt const0_rtx
just return v2.

--- gcc/optabs.c.jj 2020-04-17 14:18:44.380437703 +0200
+++ gcc/optabs.c2020-04-23 11:50:07.931780323 +0200
@@ -5627,6 +5627,8 @@ expand_vec_perm_const (machine_mode mode
   if (shift_amt)
{
  class expand_operand ops[3];
+ if (shift_amt == const0_rtx)
+   return v2;
  if (shift_code != CODE_FOR_nothing)
{
  create_output_operand (&ops[0], target, mode);

Jakub



[PATCH v3] aarch64: Add TX3 machine model

2020-04-23 Thread Anton Youdkevitch
Here is the patch introducing the thunderx3t11 machine model
for the scheduler. The name of the new chip was added to the
list of names recognized as valid parameters for the -mcpu
and -mtune flags. The TX2 cost model was reused for TX3.

The previously used "cryptic" name for the command line
parameter is replaced with the same "thunderx3t11" name.

Added the new chip name to the documentation. Fixed
copyright names and dates.

Bootstrapped on AArch64.

2020-04-23 Anton Youdkevitch 

* config/aarch64/aarch64-cores.def: Add the chip name.
* config/aarch64/aarch64-tune.md: Regenerated.
* config/aarch64/aarch64.c: Add the cost tables for the chip.
* config/aarch64/thunderx3t11.md: New file: add the new
machine model for the scheduler
* config/aarch64/aarch64.md: Include the new model.
* doc/invoke.texi: Add the new name to the list

---
gcc/config/aarch64/aarch64-cores.def |   3 +
gcc/config/aarch64/aarch64-tune.md   |   2 +-
gcc/config/aarch64/aarch64.c |  27 ++
gcc/config/aarch64/aarch64.md|   1 +
gcc/config/aarch64/thunderx3t11.md   | 686 +
gcc/doc/invoke.texi  |   2 +-
6 files changed, 719 insertions(+), 2 deletions(-)
From 452ae6d022a5a882abdf236d8e58cbacf4f0b301 Mon Sep 17 00:00:00 2001
From: Anton Youdkevitch 
Date: Mon, 23 Mar 2020 13:22:35 -0700
Subject: [PATCH] TX3 scheduling and tuning implementation

Added the scheduler descriptions for TX3 and the
cost tables borrowed from TX2.
---
 gcc/config/aarch64/aarch64-cores.def |   3 +
 gcc/config/aarch64/aarch64-tune.md   |   2 +-
 gcc/config/aarch64/aarch64.c |  27 ++
 gcc/config/aarch64/aarch64.md|   1 +
 gcc/config/aarch64/thunderx3t11.md   | 686 +++
 gcc/doc/invoke.texi  |   2 +-
 6 files changed, 719 insertions(+), 2 deletions(-)
 create mode 100644 gcc/config/aarch64/thunderx3t11.md

diff --git a/gcc/config/aarch64/aarch64-cores.def b/gcc/config/aarch64/aarch64-cores.def
index ea9b98b..7224802 100644
--- a/gcc/config/aarch64/aarch64-cores.def
+++ b/gcc/config/aarch64/aarch64-cores.def
@@ -95,6 +95,9 @@ AARCH64_CORE("vulcan",  vulcan, thunderx2t99, 8_1A,  AARCH64_FL_FOR_ARCH8_1 | AA
 /* Cavium ('C') cores. */
 AARCH64_CORE("thunderx2t99",  thunderx2t99,  thunderx2t99, 8_1A,  AARCH64_FL_FOR_ARCH8_1 | AARCH64_FL_CRYPTO, thunderx2t99, 0x43, 0x0af, -1)
 
+/* Marvell cores (TX3). */
+AARCH64_CORE("thunderx3t11",  thunderx3t11,  thunderx3t11, 8_4A,  AARCH64_FL_FOR_ARCH8_4 | AARCH64_FL_CRYPTO | AARCH64_FL_SHA3, thunderx3t11, 0x43, 0x0b8, 0x0a)
+
 /* ARMv8.2-A Architecture Processors.  */
 
 /* ARM ('A') cores. */
diff --git a/gcc/config/aarch64/aarch64-tune.md b/gcc/config/aarch64/aarch64-tune.md
index 3cc1c4d..573a4a9 100644
--- a/gcc/config/aarch64/aarch64-tune.md
+++ b/gcc/config/aarch64/aarch64-tune.md
@@ -1,5 +1,5 @@
 ;; -*- buffer-read-only: t -*-
 ;; Generated automatically by gentune.sh from aarch64-cores.def
 (define_attr "tune"
-	"cortexa34,cortexa35,cortexa53,cortexa57,cortexa72,cortexa73,thunderx,thunderxt88p1,thunderxt88,octeontx,octeontxt81,octeontxt83,thunderxt81,thunderxt83,emag,xgene1,falkor,qdf24xx,exynosm1,phecda,thunderx2t99p1,vulcan,thunderx2t99,cortexa55,cortexa75,cortexa76,cortexa76ae,cortexa77,cortexa65,cortexa65ae,ares,neoversen1,neoversee1,octeontx2,octeontx2t98,octeontx2t96,octeontx2t93,octeontx2f95,octeontx2f95n,octeontx2f95mm,tsv110,saphira,cortexa57cortexa53,cortexa72cortexa53,cortexa73cortexa35,cortexa73cortexa53,cortexa75cortexa55,cortexa76cortexa55"
+	"cortexa34,cortexa35,cortexa53,cortexa57,cortexa72,cortexa73,thunderx,thunderxt88p1,thunderxt88,octeontx,octeontxt81,octeontxt83,thunderxt81,thunderxt83,emag,xgene1,falkor,qdf24xx,exynosm1,phecda,thunderx2t99p1,vulcan,thunderx2t99,thunderx3t11,cortexa55,cortexa75,cortexa76,cortexa76ae,cortexa77,cortexa65,cortexa65ae,ares,neoversen1,neoversee1,octeontx2,octeontx2t98,octeontx2t96,octeontx2t93,octeontx2f95,octeontx2f95n,octeontx2f95mm,tsv110,saphira,cortexa57cortexa53,cortexa72cortexa53,cortexa73cortexa35,cortexa73cortexa53,cortexa75cortexa55,cortexa76cortexa55"
 	(const (symbol_ref "((enum attr_tune) aarch64_tune)")))
diff --git a/gcc/config/aarch64/aarch64.c b/gcc/config/aarch64/aarch64.c
index 24c055d..7abce6a 100644
--- a/gcc/config/aarch64/aarch64.c
+++ b/gcc/config/aarch64/aarch64.c
@@ -1216,6 +1216,33 @@ static const struct tune_params thunderx2t99_tunings =
   &thunderx2t99_prefetch_tune
 };
 
+static const struct tune_params thunderx3t11_tunings =
+{
+  &thunderx2t99_extra_costs,
+  &thunderx2t99_addrcost_table,
+  &thunderx2t99_regmove_cost,
+  &thunderx2t99_vector_cost,
+  &generic_branch_cost,
+  &generic_approx_modes,
+  SVE_NOT_IMPLEMENTED, /* sve_width  */
+  4, /* memmov_cost.  */
+  4, /* issue_rate.  */
+  (AARCH64_FUSE_ALU_BRANCH | AARCH64_FUSE_AES_AESMC
+   | AARCH64_FUSE_ALU_CBZ), /* fusible_ops  */
+  "16",	/* function_align.  */
+  "8",	/* jump_align.  */
+  "16",	/* lo

[PATCH] testsuite: C++14 vs. C++17 struct-layout-1.exp testing with ALT_CXX_UNDER_TEST [PR94383]

2020-04-23 Thread Jakub Jelinek via Gcc-patches
Hi!

On Tue, Apr 21, 2020 at 11:57:02AM +0200, Jakub Jelinek wrote:
> I haven't added (yet) checks if the alternate compiler does support these
> options (I think that can be done incrementally), so for now this testing is
> done only if the alternate compiler is not used.

This patch does that, so now when testing against a not too old compiler
it can do the -std=c++14 vs. -std=c++17 testing also between the under-test and
alt compilers.

Tested on x86_64-linux, without ALT_CXX_UNDER_TEST (all tests still used),
with ALT_CXX_UNDER_TEST=g++ (all tests still used too), and after tweaking
it to test -std=c++20 instead of -std=c++17 that my system g++ doesn't
support, where it only used tests before *_32* and bootstrapped/regtested
on powerpc64{,le}-linux, ok for trunk?

2020-04-23  Jakub Jelinek  

PR c++/94383
* g++.dg/compat/struct-layout-1.exp: Use the -std=c++14 vs. -std=c++17
ABI compatibility testing even with ALT_CXX_UNDER_TEST, as long as
that compiler accepts -std=c++14 and -std=c++17 options.

--- gcc/testsuite/g++.dg/compat/struct-layout-1.exp.jj  2020-04-21 17:07:52.004248153 +0200
+++ gcc/testsuite/g++.dg/compat/struct-layout-1.exp 2020-04-23 15:56:16.057326947 +0200
@@ -142,7 +142,19 @@ if { $status == 0 } then {
 file delete -force $tstobjdir
 file mkdir $tstobjdir
 set generator_args "-s $srcdir/$subdir -d $tstobjdir"
-if { $use_alt == 0 } then {
+set test_cxx14_vs_cxx17 1
+if { $use_alt != 0 } then {
+   compat-use-alt-compiler
+   if { [check_no_compiler_messages_nocache compat_alt_has_cxx14 object {
+   int dummy; } "-std=c++14"] == 0 } {
+   set test_cxx14_vs_cxx17 0
+   } elseif { [check_no_compiler_messages_nocache compat_alt_has_cxx17 object {
+   int dummy; } "-std=c++17"] == 0 } {
+   set test_cxx14_vs_cxx17 0
+   }
+   compat-use-tst-compiler
+}
+if { $test_cxx14_vs_cxx17 != 0 } then {
set generator_args "$generator_args -c"
 }
 if [info exists env(RUN_ALL_COMPAT_TESTS) ] then {


Jakub



Re: [PATCH] rs6000: Replace outdated link to ELFv2 ABI

2020-04-23 Thread will schmidt via Gcc-patches
On Thu, 2020-04-23 at 11:13 -0500, Bill Schmidt via Gcc-patches wrote:
> A user reported that we are still referring to a public review
> draft of the ELFv2 ABI specification.  Replace that by a permalink.
> 
> Tested with "make pdf" and verified the link is hot.  Is this okay
> for master?
> 
Hi,

I have confirmed the URL goes to the right place.

lgtm :-)

Thanks
-Will



> Thanks,
> Bill
> 
> 2020-04-24  Bill Schmidt  
> 
>   * gcc/doc/extend.texi (PowerPC AltiVec/VSX Built-in Functions):
>   Replace outdated link to ELFv2 ABI.
> ---
>  gcc/doc/extend.texi | 2 +-
>  1 file changed, 1 insertion(+), 1 deletion(-)
> 
> diff --git a/gcc/doc/extend.texi b/gcc/doc/extend.texi
> index c17b1040bde..936c22e2fe7 100644
> --- a/gcc/doc/extend.texi
> +++ b/gcc/doc/extend.texi
> @@ -17633,7 +17633,7 @@ subject to change without notice.
> 
>  GCC complies with the OpenPOWER 64-Bit ELF V2 ABI Specification,
>  which may be found at
> -@uref{http://openpowerfoundation.org/wp-content/uploads/resources/leabi-prd/content/index.html}.
> +@uref{https://openpowerfoundation.org/?resource_lib=64-bit-elf-v2-abi-specification-power-architecture}.
>  Appendix A of this document lists the vector API interfaces that must be
>  provided by compliant compilers.  Programmers should preferentially use
>  the interfaces described therein.  However, historically GCC has provided



[committed] c-family: Fix ICE on attribute with -fgnu-tm [PR94733]

2020-04-23 Thread Marek Polacek via Gcc-patches
find_tm_attribute was using TREE_PURPOSE to get the attribute name,
which is breaking now that we preserve the C++11-style attribute
format past decl_attributes.  So use get_attribute_name which can
handle both formats of attributes.
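
For readers less familiar with the two representations, here is a toy model of the difference.  ToyNode and the helper functions are hypothetical and are not GCC's tree API; the sketch only assumes that a GNU-style attribute keeps its name directly in the purpose slot, while a C++11-style attribute keeps a (namespace, name) pair there:

```cpp
#include <cassert>
#include <string>

// Toy node, NOT GCC's tree: either an identifier (ident set, purpose null)
// or a list cell (purpose set, value optional).
struct ToyNode {
  std::string ident;
  const ToyNode *purpose;
  const ToyNode *value;
  bool is_list() const { return purpose != nullptr; }
};

ToyNode make_ident(const std::string &s) {
  return ToyNode{s, nullptr, nullptr};
}

ToyNode make_list(const ToyNode *purpose, const ToyNode *value) {
  return ToyNode{"", purpose, value};
}

// Return the identifier naming the attribute for either format.  Reading
// attr.purpose directly is only right for the GNU-style case; for the
// C++11-style case it yields the (namespace, name) pair instead.
const ToyNode *toy_attribute_name(const ToyNode &attr) {
  const ToyNode *p = attr.purpose;
  return p->is_list() ? p->value : p;
}
```

A caller that expects an identifier but receives the pair is the kind of mismatch that showed up as the ICE.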

Tested x86_64-pc-linux-gnu, applying to trunk.

PR c++/94733
* c-attribs.c (find_tm_attribute): Use get_attribute_name instead of
TREE_PURPOSE.

* g++.dg/tm/attrib-5.C: New test.
---
 gcc/c-family/c-attribs.c   | 2 +-
 gcc/testsuite/g++.dg/tm/attrib-5.C | 5 +
 2 files changed, 6 insertions(+), 1 deletion(-)
 create mode 100644 gcc/testsuite/g++.dg/tm/attrib-5.C

diff --git a/gcc/c-family/c-attribs.c b/gcc/c-family/c-attribs.c
index 1483b3540dc..ac936d5 100644
--- a/gcc/c-family/c-attribs.c
+++ b/gcc/c-family/c-attribs.c
@@ -3314,7 +3314,7 @@ find_tm_attribute (tree list)
 {
   for (; list ; list = TREE_CHAIN (list))
 {
-  tree name = TREE_PURPOSE (list);
+  tree name = get_attribute_name (list);
   if (tm_attr_to_mask (name) != 0)
return name;
 }
diff --git a/gcc/testsuite/g++.dg/tm/attrib-5.C b/gcc/testsuite/g++.dg/tm/attrib-5.C
new file mode 100644
index 000..0b7bc728f06
--- /dev/null
+++ b/gcc/testsuite/g++.dg/tm/attrib-5.C
@@ -0,0 +1,5 @@
+// PR c++/94733
+// { dg-do compile { target c++11 } }
+// { dg-options "-fgnu-tm" }
+
+struct [[gnu::may_alias]] pe { };

base-commit: dcf69ac5448fd6a16137cfe9fe6deadd0ec0243d
-- 
Marek Polacek • Red Hat, Inc. • 300 A St, Boston, MA



Re: [PATCH] rs6000: Replace outdated link to ELFv2 ABI

2020-04-23 Thread David Edelsohn via Gcc-patches
On Thu, Apr 23, 2020 at 12:13 PM Bill Schmidt  wrote:
>
> A user reported that we are still referring to a public review
> draft of the ELFv2 ABI specification.  Replace that by a permalink.
>
> Tested with "make pdf" and verified the link is hot.  Is this okay
> for master?
>
> Thanks,
> Bill
>
> 2020-04-24  Bill Schmidt  
>
> * gcc/doc/extend.texi (PowerPC AltiVec/VSX Built-in Functions):
> Replace outdated link to ELFv2 ABI.

Okay.

This probably counts as an obvious change.

Thanks, David


Re: [PATCH][v3], rs6000: Use plq/pstq for atomic_{load, store} (PR94622)

2020-04-23 Thread will schmidt via Gcc-patches
On Wed, 2020-04-22 at 07:59 -0500, Segher Boessenkool wrote:
> Hi!
> 
> On Tue, Apr 21, 2020 at 04:53:53PM -0500, Aaron Sawdey via Gcc-patches wrote:
> > For future architecture with prefix instructions, always use plq/pstq
> > rather than lq/stq for atomic load of quadword. Then we never have to
> > do the doubleword swap on little endian. Before this fix, -mno-pcrel
> > would generate lq with the doubleword swap (which was ok) and -mpcrel
> > would generate plq, also with the doubleword swap, which was wrong.
> > 2020-04-20  Aaron Sawdey  
> > 
> > PR target/94622
> > * config/rs6000/sync.md (load_quadpti): Add attr "prefixed"
> > if TARGET_PREFIXED.
> > (store_quadpti): Ditto.
> > (atomic_load): Do not swap doublewords if TARGET_PREFIXED as
> > plq will be used and doesn't need it.
> > (atomic_store): Ditto, for pstq.
> > +;; Pattern load_quadpti will always use plq for atomic TImode if
> > +;; TARGET_PREFIXED.  It has the correct doubleword ordering on either LE
> > +;; or BE, so we can just move the result into the output register and
> > +;; do not need to do the doubleword swap for LE. Also this avoids any
> > +;; confusion about whether the lq vs plq might be used based on whether
> > +;; op1 has PC-relative addressing. We could potentially allow BE to
> > +;; use lq because it doesn't have the doubleword ordering problem.
> 
> Two spaces after dot (twice).
> 
> Thanks for the nice comments :-)
> 
> > -  [(set_attr "type" "store")])
> > +  [(set_attr "type" "store")
> > +   (set (attr "prefixed") (if_then_else (match_test "TARGET_PREFIXED")
> > +(const_string "yes")
> > +(const_string "no")))])
> 
> Every 8 leading spaces should be a tab (it's annoying to have a mixture
> of styles, and then later to have patches randomly change such things as
> well.  Spaces everywhere (no tabs ever) works fine for me, but that is
> not what we use, not in GCC, and not in our port.  We could change that
> in GCC 11 perhaps?  Opinions?)

Keep the port consistent with the rest of the project, whatever it is.
I tend to prefer just tabs, but also like to have >80 column windows.  :-)

thanks
-Will


> 
> The patch is okay for trunk modulo those nits.  Thanks!
> 
> 
> Segher



Re: [PATCH] Shortcut identity VEC_PERM expansion [PR94710]

2020-04-23 Thread Richard Biener
On April 23, 2020 9:04:40 PM GMT+02:00, Jakub Jelinek  wrote:
>Hi!
>
>This PR is about the rs6000 backend emitting wrong assembly
>for whole vector shift by 0, and while I think it is desirable
>to fix the backend, I don't see a point why the expander should
>try to emit that, whole vector shift by 0 is identity, we can just
>return the operand.
>
>Bootstrapped/regtested on powerpc64{,le}-linux, ok for trunk?

OK. 

Richard. 

>2020-04-23  Jakub Jelinek  
>
>   PR target/94710
>   * optabs.c (expand_vec_perm_const): For shift_amt const0_rtx
>   just return v2.
>
>--- gcc/optabs.c.jj2020-04-17 14:18:44.380437703 +0200
>+++ gcc/optabs.c   2020-04-23 11:50:07.931780323 +0200
>@@ -5627,6 +5627,8 @@ expand_vec_perm_const (machine_mode mode
>   if (shift_amt)
>   {
> class expand_operand ops[3];
>+if (shift_amt == const0_rtx)
>+  return v2;
> if (shift_code != CODE_FOR_nothing)
>   {
> create_output_operand (&ops[0], target, mode);
>
>   Jakub
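
The identity argument above can be illustrated outside the compiler. The following is a minimal sketch, not GCC's expander: `whole_vector_shift` is a hypothetical helper that models a whole-vector shift on a `std::array`, with the same early return for a shift amount of zero that the patch adds for `const0_rtx`.

```cpp
#include <array>
#include <cassert>
#include <cstddef>

// Toy model of a whole-vector shift: element i of the result is element
// i + shift of the input, and zeros are shifted in at the end.
// Hypothetical helper for illustration only; this is not GCC code.
template <typename T, std::size_t N>
std::array<T, N> whole_vector_shift(const std::array<T, N>& v, std::size_t shift)
{
  if (shift == 0)
    return v;  // identity: nothing needs to be emitted, return the operand

  std::array<T, N> result{};  // value-initialised, so every element is zero
  for (std::size_t i = 0; i + shift < N; ++i)
    result[i] = v[i + shift];
  return result;
}
```

With the early return, the zero case never reaches the code that would build a shift at all, which is the property the optabs.c change relies on.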



Re: [Version 2][PATCH][gcc][PR94230]provide an option to change the size limitation for -Wmisleading-indent

2020-04-23 Thread Qing Zhao via Gcc-patches
Hi, Richard,


> On Apr 23, 2020, at 1:27 PM, Richard Sandiford  
> wrote:
> 
> Qing Zhao  writes:
>> ---
>> gcc/c-family/c-indentation.c   |  3 +++
>> gcc/common.opt |  5 +
>> gcc/doc/invoke.texi| 15 ++-
>> gcc/testsuite/gcc.dg/plugin/location-overflow-test-1.c |  2 +-
>> gcc/toplev.c   |  3 +++
>> 5 files changed, 26 insertions(+), 2 deletions(-)
>> 
>> diff --git a/gcc/c-family/c-indentation.c b/gcc/c-family/c-indentation.c
>> index f737555..7074b10 100644
>> --- a/gcc/c-family/c-indentation.c
>> +++ b/gcc/c-family/c-indentation.c
>> @@ -67,6 +67,9 @@ get_visual_column (expanded_location exploc, location_t 
>> loc,
>>"%<-Wmisleading-indentation%> is disabled from this point"
>>" onwards, since column-tracking was disabled due to"
>>" the size of the code/headers");
>> +  inform (loc,
>> +  "please add %<-flarge-source-files%> to invoke more" 
>> +  " column-tracking for large source files");
>>  }
>>   return false;
>> }
> 
> This should be conditional on !flag_large_source_files.

Yes, indeed, will add it.
> 
>> diff --git a/gcc/common.opt b/gcc/common.opt
>> index 4368910..10a3d5b 100644
>> --- a/gcc/common.opt
>> +++ b/gcc/common.opt
>> @@ -1597,6 +1597,11 @@ fkeep-gc-roots-live
>> Common Undocumented Report Var(flag_keep_gc_roots_live) Optimization
>> ; Always keep a pointer to a live memory block
>> 
>> +flarge-source-files
>> +Common Report Var(flag_large_source_files) Init(0)
>> +Adjust GCC to cope with large source files to provide more accurate
>> +column information.
>> +
> 
> I'm having difficulty suggesting wording here, but I think would be good
> to mention the downside.  How about:
> 
> --
> Improve GCC's ability to track column numbers in large source files,
> at the expense of slower compilation.
> ———

Sounds better than my previous wording. Thanks.

> 
>> floop-parallelize-all
>> Common Report Var(flag_loop_parallelize_all) Optimization
>> Mark all loops as parallel.
>> diff --git a/gcc/doc/invoke.texi b/gcc/doc/invoke.texi
>> index 96a9516..c6ea9ef 100644
>> --- a/gcc/doc/invoke.texi
>> +++ b/gcc/doc/invoke.texi
>> @@ -574,7 +574,7 @@ Objective-C and Objective-C++ Dialects}.
>> -fdebug-cpp  -fdirectives-only  -fdollars-in-identifiers  @gol
>> -fexec-charset=@var{charset}  -fextended-identifiers  @gol
>> -finput-charset=@var{charset}  -fmacro-prefix-map=@var{old}=@var{new}  @gol
>> --fmax-include-depth=@var{depth} @gol
>> +-fmax-include-depth=@var{depth} -flarge-source-files @gol
>> -fno-canonical-system-headers  -fpch-deps  -fpch-preprocess  @gol
>> -fpreprocessed  -ftabstop=@var{width}  -ftrack-macro-expansion  @gol
>> -fwide-exec-charset=@var{charset}  -fworking-directory @gol
> 
> This should be kept in alphabetical order, so after -finput-charset.

Okay. 
> 
>> @@ -14151,6 +14151,19 @@ This option may be useful in conjunction with the 
>> @option{-B} or
>> perform additional processing of the program source between
>> normal preprocessing and compilation.
>> 
>> +@item -flarge-source-files
>> +@opindex flarge-source-files
>> +Adjust GCC to cope with large source files to provide more accurate
>> +column information. 
>> +By default, GCC will lose accurate column information if the source 
>> +file is very large.
>> +If this option is provided, GCC will adapt accordingly to provide more
>> +accurate column information. 
>> +This option may be useful when any user hits the 
>> @option{-Wmisleading-indent} 
>> +note, "is disabled from this point onwards, since column-tracking was 
>> disabled 
>> +due to the size of the code/headers", or hits the limit at which columns get
>> +dropped from diagnostics.
>> +
> 
> On a similar vein, how about:
> 
> --
> Adjust GCC to expect large source files, at the expense of slower
> compilation and higher memory usage.
> 
> Specifically, GCC normally tracks both column numbers and line numbers
> within source files and it normally prints both of these numbers in
> diagnostics.  However, once it has processed a certain number of source
> lines, it stops tracking column numbers and only tracks line numbers.
> This means that diagnostics for later lines do not include column numbers.
> It also means that options like @option{-Wmisleading-indent} cease to work
> at that point, although the compiler prints a note if this happens.
> Passing @option{-flarge-source-files} significantly increases the number
> of source lines that GCC can process before it stops tracking column
> numbers.
> ———

Thanks a lot for this paragraph. I will use it.

Qing
> 
> Thanks,
> Richard



[PATCH] c++: zero_init_expr_p of dependent expression

2020-04-23 Thread Patrick Palka via Gcc-patches
This fixes an ICE coming from mangle.c:write_expression when compiling the
range-v3 testsuite; the added testcase is a reduced reproducer of the ICE.

Bootstrapped and regtested on x86_64-pc-linux-gnu, and also tested on the
cmcstl2, fmt and range-v3 libraries.  Does this look OK to commit?

gcc/cp/ChangeLog:

* tree.c (zero_init_expr_p): Use uses_template_parms instead of
dependent_type_p.

gcc/testsuite/ChangeLog:

* g++.dg/cpp0x/dependent3.C: New test.
---
 gcc/cp/tree.c   |  2 +-
 gcc/testsuite/g++.dg/cpp0x/dependent3.C | 28 +
 2 files changed, 29 insertions(+), 1 deletion(-)
 create mode 100644 gcc/testsuite/g++.dg/cpp0x/dependent3.C

diff --git a/gcc/cp/tree.c b/gcc/cp/tree.c
index 090c565c093..8840932dba2 100644
--- a/gcc/cp/tree.c
+++ b/gcc/cp/tree.c
@@ -4486,7 +4486,7 @@ bool
 zero_init_expr_p (tree t)
 {
   tree type = TREE_TYPE (t);
-  if (!type || dependent_type_p (type))
+  if (!type || uses_template_parms (type))
 return false;
   if (zero_init_p (type))
 return initializer_zerop (t);
diff --git a/gcc/testsuite/g++.dg/cpp0x/dependent3.C 
b/gcc/testsuite/g++.dg/cpp0x/dependent3.C
new file mode 100644
index 000..caf7e1cd4a4
--- /dev/null
+++ b/gcc/testsuite/g++.dg/cpp0x/dependent3.C
@@ -0,0 +1,28 @@
+// { dg-do compile { target c++11 } }
+
+template
+struct d
+{
+  using e = c;
+};
+
+template
+struct g
+{
+  using h = typename d::e;
+
+  template
+  auto operator()(i, j k) -> decltype(h{k});
+};
+
+template
+void m()
+{
+  int a[1];
+  l{}(a, a);
+}
+
+int main()
+{
+  m>();
+}
-- 
2.26.2.266.ge870325ee8



[PATCH] Mark experimental::net::system_context ctor deleted

2020-04-23 Thread Thomas Rodgers via Gcc-patches


  * include/experimental/net/executor: Mark
  system_context::system_context() = default.
  * testsuite/experimental/net/executor/1.cc: Add new
  test for deleted system_context ::system_context().
---
 libstdc++-v3/include/experimental/executor| 2 +-
 libstdc++-v3/testsuite/experimental/net/executor/1.cc | 7 +++
 2 files changed, 8 insertions(+), 1 deletion(-)

diff --git a/libstdc++-v3/include/experimental/executor 
b/libstdc++-v3/include/experimental/executor
index b5c6e18a19a..fa39eaa0468 100644
--- a/libstdc++-v3/include/experimental/executor
+++ b/libstdc++-v3/include/experimental/executor
@@ -850,7 +850,7 @@ inline namespace v1
 
 // construct / copy / destroy:
 
-system_context() = default;
+system_context() = delete;
 system_context(const system_context&) = delete;
 system_context& operator=(const system_context&) = delete;
 
diff --git a/libstdc++-v3/testsuite/experimental/net/executor/1.cc 
b/libstdc++-v3/testsuite/experimental/net/executor/1.cc
index 456d620e193..cd0af4b7737 100644
--- a/libstdc++-v3/testsuite/experimental/net/executor/1.cc
+++ b/libstdc++-v3/testsuite/experimental/net/executor/1.cc
@@ -85,9 +85,16 @@ test02()
   VERIFY( e == g );
 }
 
+void
+test03()
+{
+  static_assert( ! std::is_default_constructible::value, 
"" );
+}
+
 int
 main()
 {
   test01();
   test02();
+  test03();
 }
-- 
2.25.3

 



[PATCH] c++: Lambda in friend of constrained class [PR94645]

2020-04-23 Thread Patrick Palka via Gcc-patches
In the testcase below, when grokfndecl processes the operator() decl for the
lambda inside the friend function foo, processing_template_decl is rightly 1,
but template_class_depth on the lambda's closure type incorrectly returns 0
instead of 1.

Since processing_template_decl > template_class_depth, this makes grokfndecl
think that the operator() has its own set of template arguments, and so we
attach the innermost set of constraints -- those belonging to struct l -- to the
operator() decl.  We then get confused when checking constraints_satisfied_p on
the operator() because it doesn't have template information and yet has
constraints associated with it.

This patch fixes template_class_depth to return the correct template nesting
level in cases like these, so that when it hits a friend function it walks into
the DECL_FRIEND_CONTEXT of the friend rather than into the CP_DECL_CONTEXT.

Bootstrapped and regtested on x86_64-pc-linux-gnu, and also tested on cmcstl2,
range-v3, and fmt libraries.

gcc/cp/ChangeLog:

PR c++/94645
* pt.c (template_class_depth): Walk into the DECL_FRIEND_CONTEXT of a
friend declaration rather than into its CP_DECL_CONTEXT.

gcc/testsuite/ChangeLog:

PR c++/94645
* g++.dg/cpp2a/concepts-lambda6.C: New test.
---
 gcc/cp/pt.c   |  7 ++-
 gcc/testsuite/g++.dg/cpp2a/concepts-lambda6.C | 19 +++
 2 files changed, 25 insertions(+), 1 deletion(-)
 create mode 100644 gcc/testsuite/g++.dg/cpp2a/concepts-lambda6.C

diff --git a/gcc/cp/pt.c b/gcc/cp/pt.c
index 0e3e7b2039a..08a35c41007 100644
--- a/gcc/cp/pt.c
+++ b/gcc/cp/pt.c
@@ -390,7 +390,12 @@ template_class_depth (tree type)
++depth;
 
   if (DECL_P (type))
-   type = CP_DECL_CONTEXT (type);
+   {
+ if (tree fctx = DECL_FRIEND_CONTEXT (type))
+   type = fctx;
+ else
+   type = CP_DECL_CONTEXT (type);
+   }
   else if (LAMBDA_TYPE_P (type) && LAMBDA_TYPE_EXTRA_SCOPE (type))
type = LAMBDA_TYPE_EXTRA_SCOPE (type);
   else
diff --git a/gcc/testsuite/g++.dg/cpp2a/concepts-lambda6.C 
b/gcc/testsuite/g++.dg/cpp2a/concepts-lambda6.C
new file mode 100644
index 000..244421e64ec
--- /dev/null
+++ b/gcc/testsuite/g++.dg/cpp2a/concepts-lambda6.C
@@ -0,0 +1,19 @@
+// PR c++/94632
+// { dg-do compile { target concepts } }
+
+struct unordered_map {
+  int cend() const noexcept;
+};
+
+template  concept HasMapInterface = requires(a t) { t.cend(); };
+
+template  requires HasMapInterface
+struct l {
+  friend void foo(l opt) { ([]() {})(); }
+};
+
+struct p {
+  static unordered_map map();
+};
+
+void g(l *y) { foo(*y); }
-- 
2.26.2.266.ge870325ee8



Re: [RFC] split pseudos during loop unrolling in RTL unroller

2020-04-23 Thread Segher Boessenkool
On Thu, Apr 23, 2020 at 08:40:50AM -0600, Jeff Law wrote:
> On Thu, 2020-04-23 at 15:07 +0200, Richard Biener wrote:
> > On Thu, Apr 23, 2020 at 2:52 PM Segher Boessenkool
> >  wrote:
> > > On Thu, Apr 23, 2020 at 02:25:40PM +0200, Richard Biener wrote:
> > > > > > But being stuck with something means no progress...  I know
> > > > > > very well it's 100 times harder to get rid of something than to
> > > > > > add something new on top.
> > > > > 
> > > > > Well, what progress do you expect to make?  After expand that is :-)
> > > > 
> > > > I'd like the RTL pipeline before RA to shrink significantly, no PRE,
> > > > no CSE, ...
> > > 
> > > RTL CSE for example is very much required to get any good code.  It
> > > needs to CSE stuff that wasn't there before expand.
> > 
> > Sure, but then we should fix that!
> > > Exactly.  Its purpose largely becomes dealing with the redundancies exposed by
> > > expansion, i.e. address arithmetic and the like.   A lot of its path following
> code should be throttled back.

Hrm, I never thought about it like this.  CSE was always there, I never
stopped to question if we needed it :-)

Well, that's cse1 then.  What about cse2?

> > But valid RTL is instructions that are recognized.  Which means
> > when the target doesn't support an SImode add we may not create
> > one.  That's instruction selection ;)
> That's always a point of tension.  But I think that in general continuing to 
> have
> targets claim to support things they do not (such as double-wordsize 
> arithmetic,
> logicals, moves, etc) is a mistake.  It made sense at one time, but I think 
> we've
> > got better mechanisms in place to deal with this stuff now.

Different targets have *very* different insns for add, mul, div, shifts;
everything really.  Describing this at expand time with two-machine-word
operations works pretty bloody well, for most or all targets -- this is
just part of the power of define_expand (but an important part).  And
define_expand is very very useful, it's the swiss army escape hatch, it
lets you do everything optabs have a too small mind for.

> > > Oh no, I think we should do more earlier, and GIMPLE is a fine IR for
> > > there.  But for low-level, close-to-the-machine stuff, RTL is much
> > > better suited.  And we *do* want to optimise at that level as well, and
> > > much more than just peepholes.
> > 
> > Well, everything that requires costing (unrolling, vectorization,
> > IV selection to name a few) _is_ close-to-the-machine.  We're
> > just saying they are not because GIMPLE is so much easier to
> > work with here (not sure why exactly...).
> The primary motivation behind discouraging target costing and the like from
> gimple was to make it easier to implement and predict the behavior of the 
> gimple
> optimizers.   We've relaxed that somewhat, particularly for vectorization, 
> but I
> think the principle is still solid.

There are two kinds of costing.  The first only says which of A or B is
better; that can perhaps be done on GIMPLE already, using
target-specific costs.  The other gives a number to everything, which is
much harder to get anywhere close to usably correct (what does the
number even *mean*?  For performance, latency of the whole sequence is
the most important number, but that is not easy to work with, or what we
use for say insn_cost).

> 
> But I think there is a place for adding target dependencies -- and that's at 
> the
> end of the current gimple pipeline.

There are a *few* things in GIMPLE that use target costs (ivopts...)
But yeah, most things should not.


Segher


Re: [RFC] split pseudos during loop unrolling in RTL unroller

2020-04-23 Thread Segher Boessenkool
On Thu, Apr 23, 2020 at 04:32:51PM +0100, Richard Sandiford wrote:
> I also wonder how difficult it would be to get recog to recognise
> gimple :-)

Since recog recognises single (rtl) insns: hard to impossible?


Segher


Re: [PATCH] Mark experimental::net::system_context ctor deleted

2020-04-23 Thread Jonathan Wakely via Gcc-patches

On 23/04/20 13:09 -0700, Thomas Rodgers via Libstdc++ wrote:


 * include/experimental/net/executor: Mark
 system_context::system_context() = default.


s/default/delete/ :-)

But the affected function/type/thingie should be named in parens, not
in the comment i.e.

* include/experimental/net/executor (system_context): Define default
constructor as deleted.


 * testsuite/experimental/net/executor/1.cc: Add new
 test for deleted system_context ::system_context().


There's a stray space in there.

It would be more accurate to say "check system_context isn't default
constructible" because you can't test if it's deleted, as opposed to
private or just doesn't exist.

OK with those changelog tweaks.

Please backport to gcc-9 too.
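
The point about what the test can actually observe is easy to check in isolation. A minimal sketch, using three hypothetical types that are not part of libstdc++: `std::is_default_constructible` gives the same answer whether the default constructor is deleted, private, or simply absent.

```cpp
#include <cassert>
#include <type_traits>

// Three hypothetical types, none of which can be default constructed
// from outside the class, each for a different reason.
struct deleted_ctor
{
  deleted_ctor() = delete;            // explicitly deleted
};

class private_ctor
{
  private_ctor() {}                   // exists, but is inaccessible
};

struct no_default_ctor
{
  explicit no_default_ctor(int) {}    // no default constructor declared
};

// The trait cannot tell the three cases apart.
static_assert(!std::is_default_constructible<deleted_ctor>::value, "");
static_assert(!std::is_default_constructible<private_ctor>::value, "");
static_assert(!std::is_default_constructible<no_default_ctor>::value, "");
```

So a static_assert of this form checks that the type is not default constructible, and nothing stronger.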




---
libstdc++-v3/include/experimental/executor| 2 +-
libstdc++-v3/testsuite/experimental/net/executor/1.cc | 7 +++
2 files changed, 8 insertions(+), 1 deletion(-)

diff --git a/libstdc++-v3/include/experimental/executor 
b/libstdc++-v3/include/experimental/executor
index b5c6e18a19a..fa39eaa0468 100644
--- a/libstdc++-v3/include/experimental/executor
+++ b/libstdc++-v3/include/experimental/executor
@@ -850,7 +850,7 @@ inline namespace v1

// construct / copy / destroy:

-system_context() = default;
+system_context() = delete;
system_context(const system_context&) = delete;
system_context& operator=(const system_context&) = delete;

diff --git a/libstdc++-v3/testsuite/experimental/net/executor/1.cc 
b/libstdc++-v3/testsuite/experimental/net/executor/1.cc
index 456d620e193..cd0af4b7737 100644
--- a/libstdc++-v3/testsuite/experimental/net/executor/1.cc
+++ b/libstdc++-v3/testsuite/experimental/net/executor/1.cc
@@ -85,9 +85,16 @@ test02()
  VERIFY( e == g );
}

+void
+test03()
+{
+  static_assert( ! std::is_default_constructible::value, 
"" );
+}
+
int
main()
{
  test01();
  test02();
+  test03();
}
--
2.25.3







Re: [RFC] split pseudos during loop unrolling in RTL unroller

2020-04-23 Thread Joseph Myers
On Thu, 23 Apr 2020, Richard Biener via Gcc-patches wrote:

> I think at least one step would be uncontroversial(?), namely moving
> the RTL expansion "magic"
> up to a GIMPLE pass.  Where the "magic" would be to turn
> GIMPLE stmts not directly expandable via an existing optab into
> GIMPLE that can be trivially expanded.  That includes eventually

I suspect some such pieces should actually happen before some GIMPLE 
optimizations are done, rather than as the last pass before expansion - 
though then you need to make sure subsequent GIMPLE passes don't 
reintroduce the operations you've lowered.

E.g. the special handling of arithmetic on bit-field types in expand is a 
good candidate for being a lowering step (lowering to operations on 
integer widths actually supported by the machine / that correspond to 
known machine modes) somewhere in GIMPLE processing (and you don't want to 
subsequently narrow back to operations on those widths that don't exist in 
hardware).  Note in that regard that there is a proposal for C2x to 
support integer types with a specified number of bits more generally than 
just as bit-fields.
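
As a rough picture of what such a lowering step would produce, here is a minimal sketch at the source level. It assumes a 5-bit unsigned field and a 32-bit machine width; `Packed`, `add_via_bitfield` and `add_lowered` are hypothetical names, and the sketch relies on the usual behaviour (as in GCC) that storing into an unsigned bit-field truncates to the field width.

```cpp
#include <cassert>
#include <cstdint>

struct Packed
{
  unsigned value : 5;  // a width with no matching machine mode
};

// Arithmetic written directly on the 5-bit field.
unsigned add_via_bitfield(unsigned a, unsigned b)
{
  Packed p{};
  p.value = a;            // store truncates to 5 bits (as GCC does)
  p.value = p.value + b;  // computed in a wider type, truncated on store
  return p.value;
}

// The same operation lowered by hand: do the arithmetic at a width the
// machine supports and make the truncation explicit with a mask.
std::uint32_t add_lowered(std::uint32_t a, std::uint32_t b)
{
  const std::uint32_t mask = (1u << 5) - 1;
  return ((a & mask) + b) & mask;
}
```

Once the operation is in the second form, later passes see only ordinary 32-bit additions and masks, and the concern raised above is that they must not narrow it back to a 5-bit operation.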

> That said, I actually agree with that.  It's just that I hope we can
> make some of the knowledge just represented on the RTL side
> available on the GIMPLE side.  The more complicated parts,
> like calling conventions, that is.

On the other hand, there is some knowledge like that which should move out 
of the front ends, into a GIMPLE lowering step (front ends should not be 
calling targetm.calls.promote_prototypes and changing their internal 
representation accordingly, that information should be handled later in 
the compiler, probably on GIMPLE).

-- 
Joseph S. Myers
jos...@codesourcery.com


Fix PR tree-optimization/94717

2020-04-23 Thread Eric Botcazou
This is a regression present on the mainline in the form of a segfault on the 
attached C testcase with -O2 -fnon-call-exceptions -ftracer.  The GIMPLE SSA 
store merging pass blows up when it is rewriting the stores because it didn't 
realize that they don't belong to the same EH region.

Fixed by refusing to merge them.  Tested on x86-64/Linux, pre-approved by 
Jakub and applied on the mainline.


2020-04-23  Eric Botcazou  

PR tree-optimization/94717
* gimple-ssa-store-merging.c (try_coalesce_bswap): Return false if one
of the stores doesn't have the same landing pad number as the first.
(coalesce_immediate_stores): Do not try to coalesce the store using
bswap if it doesn't have the same landing pad number as the first.


2020-04-23  Eric Botcazou  

* g++.dg/opt/store-merging-4.C: New test.

-- 
Eric Botcazou

diff --git a/gcc/gimple-ssa-store-merging.c b/gcc/gimple-ssa-store-merging.c
index a6687cd9c98..25753517cc6 100644
--- a/gcc/gimple-ssa-store-merging.c
+++ b/gcc/gimple-ssa-store-merging.c
@@ -2435,6 +2435,7 @@ imm_store_chain_info::try_coalesce_bswap (merged_store_group *merged_store,
   for (unsigned int i = first + 1; i < len; ++i)
 {
   if (m_store_info[i]->bitpos != m_store_info[first]->bitpos + width
+	  || m_store_info[i]->lp_nr != merged_store->lp_nr
 	  || m_store_info[i]->ins_stmt == NULL)
 	return false;
   width += m_store_info[i]->bitsize;
@@ -2682,6 +2683,7 @@ imm_store_chain_info::coalesce_immediate_stores ()
   if (info->bitpos == merged_store->start + merged_store->width
 	  && merged_store->stores.length () == 1
 	  && merged_store->stores[0]->ins_stmt != NULL
+	  && info->lp_nr == merged_store->lp_nr
 	  && info->ins_stmt != NULL)
 	{
 	  unsigned int try_size;
// PR tree-optimization/94717
// Reported by Zdenek Sojka 

// { dg-do compile }
// { dg-options "-O2 -fnon-call-exceptions -ftracer" }

int abs (int);

static inline void
bar (int d)
{
  d && abs (d);
}

struct S
{
  int a;
  int b;
  int c;
  S (unsigned a, unsigned b) : a (a), b (b) { }
};

void
foo (S *x)
{
  bar (x->c);
  new S (x->a, x->b);
  bar (0);
}


[committed 9/8] libstdc++: Define __cpp_lib_three_way_comparison for freestanding

2020-04-23 Thread Jonathan Wakely via Gcc-patches

The <compare> header is always supported, not only for hosted configs.

* include/std/version (__cpp_lib_three_way_comparison): Define for
freestanding builds.

Tested powerpc64le-linux, committed to master.


commit a2dcb56c9443d1211e14889bd0c2c21360d54cdb
Author: Jonathan Wakely 
Date:   Thu Apr 23 21:39:33 2020 +0100

libstdc++: Define __cpp_lib_three_way_comparison for freestanding

The <compare> header is always supported, not only for hosted configs.

* include/std/version (__cpp_lib_three_way_comparison): Define for
freestanding builds.

diff --git a/libstdc++-v3/include/std/version b/libstdc++-v3/include/std/version
index d06d60c9106..1beb9aa938e 100644
--- a/libstdc++-v3/include/std/version
+++ b/libstdc++-v3/include/std/version
@@ -184,6 +184,9 @@
 #endif
 #define __cpp_lib_is_nothrow_convertible 201806L
 #define __cpp_lib_remove_cvref 201711L
+#if __cpp_impl_three_way_comparison >= 201907L && __cpp_lib_concepts
+# define __cpp_lib_three_way_comparison 201907L
+#endif
 #define __cpp_lib_type_identity 201806L
 #define __cpp_lib_unwrap_ref 201811L
 
@@ -215,9 +218,6 @@
 #define __cpp_lib_span 202002L
 #define __cpp_lib_ssize 201902L
 #define __cpp_lib_starts_ends_with 201711L
-#if __cpp_impl_three_way_comparison >= 201907L && __cpp_lib_concepts
-# define __cpp_lib_three_way_comparison 201907L
-#endif
 #define __cpp_lib_to_address 201711L
 #define __cpp_lib_to_array 201907L
 #endif
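
For readers wondering how user code would consume this macro, a minimal sketch follows. `compare_ints` is a hypothetical function; it uses the three-way comparison operator when the library macro is defined and falls back to plain comparisons otherwise, so it compiles in any language mode.

```cpp
#include <cassert>

// Pull in <version> only where it exists (it is a C++20 header).
#if defined(__has_include)
# if __has_include(<version>)
#  include <version>
# endif
#endif

#ifdef __cpp_lib_three_way_comparison
# include <compare>
#endif

// Returns -1, 0 or 1 depending on how a compares to b.
int compare_ints(int a, int b)
{
#ifdef __cpp_lib_three_way_comparison
  const std::strong_ordering order = a <=> b;
  if (order < 0)
    return -1;
  if (order > 0)
    return 1;
  return 0;
#else
  return (a > b) - (a < b);  // fallback without operator<=>
#endif
}
```

With the change above, the first branch also becomes available to freestanding builds, provided the compiler-side prerequisites checked in the #if are met.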


Re: [committed 7/8] libstdc++: Update (and revert) value of __cpp_lib_array_constexpr

2020-04-23 Thread Jonathan Wakely via Gcc-patches

On 22/04/20 22:59 +0100, Jonathan Wakely wrote:

This macro should have been updated to 201811 when the last C++20
changes were implemented. However those changes are not enabled for
C++17 mode, so the macro should only have the new value in C++20 mode.

This change ensures that the macro is defined to 201603 for C++17 and
201811 for C++20.

* include/bits/stl_iterator.h (__cpp_lib_array_constexpr): Define
different values for C++17 and C++20, to indicate different feature
sets. Update value for C++20 to indicate P1032R1 support.
* include/std/version (__cpp_lib_array_constexpr): Likewise.
* testsuite/23_containers/array/comparison_operators/constexpr.cc:
Check feature test macro.
* testsuite/23_containers/array/element_access/constexpr_c++17.cc:
New test.
* testsuite/23_containers/array/requirements/constexpr_fill.cc: Check
feature test macro.
* testsuite/23_containers/array/requirements/constexpr_iter.cc: Test
in C++17 mode and check feature test macro.


On second thoughts, changing __cpp_lib_array_constexpr for C++17 was
wrong, it should have been left at 201803.

This partially reverts my previous change related to this macro. The
C++20 constexpr iterator requirements are always met by array::iterator,
because it's just a pointer. So the macro can be set to 201803 even in
C++17 mode.

Tested powerpc64le-linux, committed to master.
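
A minimal sketch of what the 201803 value promises to users: when the macro reports at least that value, array iterators can be used in constant expressions. The names `values` and `first_value` are hypothetical, and the fallback branch only exists so that the sketch still compiles where the macro is missing or older.

```cpp
#include <array>
#include <cassert>

constexpr std::array<int, 3> values{{1, 2, 3}};

#if defined(__cpp_lib_array_constexpr) && __cpp_lib_array_constexpr >= 201803L
// Iterators are usable in constant expressions (array::iterator is a pointer).
constexpr int first_value = *values.begin();
static_assert(first_value == 1, "begin() must be usable at compile time");
static_assert(values.end() - values.begin() == 3, "iterator arithmetic too");
#else
// Older library or language mode: same computation, but done at run time.
const int first_value = *values.begin();
#endif
```

This is the kind of check the constexpr_iter.cc test performs in C++17 mode after the revert.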


commit 40541efe1c063e9ce894b5f11ff727e4aec56e8b
Author: Jonathan Wakely 
Date:   Thu Apr 23 21:39:33 2020 +0100

libstdc++: Change __cpp_lib_array_constexpr for C++17 again

This partially reverts my previous change related to this macro. The
C++20 constexpr iterator requirements are always met by array::iterator,
because it's just a pointer. So the macro can be set to 201803 even in
C++17 mode.

* include/bits/stl_iterator.h (__cpp_lib_array_constexpr): Revert
value for C++17 to 201803L because P0858R0 is supported for C++17.
* include/std/version (__cpp_lib_array_constexpr): Likewise.
* testsuite/23_containers/array/element_access/constexpr_c++17.cc:
Check for value corresponding to P0031R0 features being tested.
* testsuite/23_containers/array/requirements/constexpr_iter.cc:
Check for value corresponding to P0858R0 features being tested.

diff --git a/libstdc++-v3/include/bits/stl_iterator.h b/libstdc++-v3/include/bits/stl_iterator.h
index d7e85b84041..cc0b3e0a766 100644
--- a/libstdc++-v3/include/bits/stl_iterator.h
+++ b/libstdc++-v3/include/bits/stl_iterator.h
@@ -72,7 +72,7 @@
 #if __cplusplus > 201703L
 # define __cpp_lib_array_constexpr 201811L
 #elif __cplusplus == 201703L
-# define __cpp_lib_array_constexpr 201603L
+# define __cpp_lib_array_constexpr 201803L
 #endif
 
 #if __cplusplus > 201703L
diff --git a/libstdc++-v3/include/std/version b/libstdc++-v3/include/std/version
index 1beb9aa938e..fa505f25e98 100644
--- a/libstdc++-v3/include/std/version
+++ b/libstdc++-v3/include/std/version
@@ -123,7 +123,7 @@
 #if _GLIBCXX_HOSTED
 #define __cpp_lib_any 201606L
 #define __cpp_lib_apply 201603
-#define __cpp_lib_array_constexpr 201603L
+#define __cpp_lib_array_constexpr 201803L
 #define __cpp_lib_as_const 201510
 #define __cpp_lib_boyer_moore_searcher 201603
 #define __cpp_lib_chrono 201611
diff --git a/libstdc++-v3/testsuite/23_containers/array/element_access/constexpr_c++17.cc b/libstdc++-v3/testsuite/23_containers/array/element_access/constexpr_c++17.cc
index 56d1cf256be..dd69645833f 100644
--- a/libstdc++-v3/testsuite/23_containers/array/element_access/constexpr_c++17.cc
+++ b/libstdc++-v3/testsuite/23_containers/array/element_access/constexpr_c++17.cc
@@ -24,8 +24,6 @@
 # error "Feature test macro for array constexpr is missing in "
 #elif __cpp_lib_array_constexpr < 201603L
 # error "Feature test macro for array constexpr has wrong value in "
-#elif __cpp_lib_array_constexpr > 201603L && __cplusplus == 201703
-# error "Feature test macro for array constexpr has wrong value for C++17"
 #endif
 
 constexpr std::size_t test01()
diff --git a/libstdc++-v3/testsuite/23_containers/array/requirements/constexpr_iter.cc b/libstdc++-v3/testsuite/23_containers/array/requirements/constexpr_iter.cc
index a119937f773..566388405b6 100644
--- a/libstdc++-v3/testsuite/23_containers/array/requirements/constexpr_iter.cc
+++ b/libstdc++-v3/testsuite/23_containers/array/requirements/constexpr_iter.cc
@@ -22,12 +22,13 @@
 
 #ifndef __cpp_lib_array_constexpr
 # error "Feature test macro for array constexpr is missing in "
-#elif __cpp_lib_array_constexpr < 201603L
+#elif __cpp_lib_array_constexpr < 201803L
 # error "Feature test macro for array constexpr has wrong value in "
-#elif __cpp_lib_array_constexpr > 201603L && __cplusplus == 201703
-# error "Feature test macro for array constexpr has wrong value for C++17"
 #endif
 
+// This test is compiled as C++17 because array::itera

Re: [committed 0/8] libstdc++: Add/update/fix feature test macros

2020-04-23 Thread Jonathan Wakely via Gcc-patches

On 22/04/20 22:57 +0100, Jonathan Wakely wrote:

This series of patches fixes a number of omissions and errors in the
feature test macros we define.

Tested powerpc64le-linux, committed to master.

Jonathan Wakely (8):
 libstdc++: Update value of __cpp_lib_jthread macro
 libstdc++: Remove non-standard feature test macros
 libstdc++: Add missing feature test macros
 libstdc++: Rename __cpp_lib_constexpr_invoke macro
 libstdc++: Update __cpp_lib_concepts value
 libstdc++: Do not define __cpp_lib_constexpr_algorithms in 
 libstdc++: Update (and revert) value of __cpp_lib_array_constexpr
 libstdc++: Define __cpp_lib_execution feature test macro



I've backported some of these changes (and other ones related to
feature test macros) to the gcc-9 branch. This means both master and
gcc-9 should define exactly the feature test macros for the features
they support.

With that done, I've also updated the C++20 status table in the docs
(only for master so far, but I'll do it for gcc-9 too).

The attached doc patch has been committed to master.

commit be0363c80f7ac93f1dbd00da6beb9ce0eed96d9d
Author: Jonathan Wakely 
Date:   Thu Apr 23 21:39:33 2020 +0100

libstdc++: Update C++20 library status docs

This reorganises the C++20 status table, grouping the proposals by
category. It also adds more proposals, and documents all the feature
test macros for C++20 library changes.

* doc/xml/manual/status_cxx2020.xml: Update C++20 status table.
* doc/html/*: Regenerate.

diff --git a/libstdc++-v3/doc/xml/manual/status_cxx2020.xml b/libstdc++-v3/doc/xml/manual/status_cxx2020.xml
index 17f28887119..ade77cbb80b 100644
--- a/libstdc++-v3/doc/xml/manual/status_cxx2020.xml
+++ b/libstdc++-v3/doc/xml/manual/status_cxx2020.xml
@@ -1,8 +1,8 @@
-http://docbook.org/ns/docbook"; version="5.0" 
-	 xml:id="status.iso.2020" xreflabel="Status C++ 2020">
+http://docbook.org/ns/docbook"; version="5.0"
+   xml:id="status.iso.2020" xreflabel="Status C++ 2020">
 
 
-C++ 202a
+C++ 2020
   
 ISO C++
 2020
@@ -26,17 +26,20 @@ not in any particular release.
 
 
 The following table lists new library features that have been accepted into
-the C++2a working draft. The "Proposal" column provides a link to the
+the C++20 working draft. The "Proposal" column provides a link to the
 ISO C++ committee proposal that describes the feature, while the "Status"
 column indicates the first version of GCC that contains an implementation of
 this feature (if it has been implemented).
-The "SD-6 Feature Test" column shows the corresponding macro or header from
+A dash (—) in the status column indicates that the changes in the proposal
+either do not affect the code in libstdc++, or the changes are not required for conformance.
+The "SD-6 Feature Test / Notes" column shows the corresponding macro or header from
 http://www.w3.org/1999/xlink"; xlink:href="https://isocpp.org/std/standing-documents/sd-6-sg10-feature-test-recommendations";>SD-6:
-Feature-testing recommendations for C++.
+Feature-testing recommendations for C++ (where applicable)
+or any notes about the implementation.
 
 
-
-C++ 2020 Implementation Status
+
+C++ 2020 Library Features
 
 
 
@@ -48,44 +51,309 @@ Feature-testing recommendations for C++.
   Library Feature
   Proposal
   Status
-  SD-6 Feature Test
+  SD-6 Feature Test / Notes
 
   
 
   
 
 
-Endian just Endian 
-  
-http://www.w3.org/1999/xlink"; xlink:href="http://www.open-std.org/jtc1/sc22/wg21/docs/papers/2017/p0463r1.html";>
-	P0463R1
-	
+  
+Compile-time programming
   
-   8.1 
-  
+
+
+
+Add constexpr modifiers to functions in  and  Headers 
+  
+http://www.w3.org/1999/xlink"; xlink:href="http://www.open-std.org/jtc1/sc22/wg21/docs/papers/2017/p0202r3.html";>
+P0202R3 
+  
+   10.1 
+   __cpp_lib_constexpr_algorithms >= 201703L 
+
+
+
+Constexpr for swap and swap related functions 
+  
+http://www.w3.org/1999/xlink"; xlink:href="http://www.open-std.org/jtc1/sc22/wg21/docs/papers/2018/p0879r0.html";>
+P0879R0 
+  
+   10.1 
+   __cpp_lib_constexpr_algorithms >= 201806L 
+
+
+
+Constexpr for std::complex 
+  
+http://www.w3.org/1999/xlink"; xlink:href="http://www.open-std.org/jtc1/sc22/wg21/docs/papers/2017/p0415r1.html";>
+P0415R1 
+  
+   9.1 
+   __cpp_lib_constexpr_complex >= 201711L (since 9.4, see Note 1) 
+
+
+
+P0595R2 std::is_constant_evaluated() 
+  
+http://www.w3.org/1999/xlink"; xlink:href="http://www.open-std.org/jtc1/sc22/wg21/docs/papers/2018/p0595r2.html";>
+P0595R2 
+  
+   9.1 
+   __cpp_lib_is_constant_evaluated >= 201811L 
+
+
+
+More constexpr containers 
+  
+http://www.w3.org/1999/xlink"; xlink:href="http://www.open-std.org/jtc1/sc22/wg21/docs/papers
