Re: [PATCH 1/2] rs6000: tune cunroll for simple loops at O2

2020-05-28 Thread Jiufu Guo via Gcc-patches
Richard Biener  writes:

> On Wed, May 27, 2020 at 6:36 AM Jiufu Guo  wrote:
>>
>> Segher Boessenkool  writes:
>>
>> > Hi!
>> >
>> > On Tue, May 26, 2020 at 08:58:13AM +0200, Richard Biener wrote:
>> >> On Mon, May 25, 2020 at 7:44 PM Segher Boessenkool
>> >>  wrote:
>> >> > Yes, cunroll does not have its own option, and that is a problem.  But
>> >> > that is easy to fix!  Either with an option, or just with params (the
>> >> > option wouldn't do more than set params anyway?)
>> >>
>> >> Well, given coming up with different names for essentially the same
>> >> transform is going to be challenging how about sth like
>> >>
>> >> -funroll-loops={early,late,static,dynamic}[insert better names here]
>> >
>> > User interface is hard :-)  I think luckily we don't need to change
>> > anything there yet, just have an internal flag?
>> >
>> > But complete unrolling is something quite different, so it should have
>> > its own flag anyway (at least internally).
>> >
>> >> note there's also -fpeel-loops which may match the transform
>> >> done on GIMPLE better?
>> >
>> > Peeling and unrolling are the same thing, if doing complete unrolling
>> > (or complete peeling), followed by DCE in both cases.  Peeling is a
>> > nicer name here I think, yeah.
>> >
>> >> I'm not sure which are the technically
>> >> correct terms for unrollings that elide the loop (the backedge).
>> >
>> > I don't know a better term than "complete", I don't remember ever seeing
>> > something else either.
>>
>> How about "Var(flag_cunroll_grow_size) EnabledBy(funroll-loops ||
>> funroll-all-loops || fpeel-loops)" Or flag_cunroll_allow_grow_size?
>>
>> And then using this flags as:
>>   unsigned int val = tree_unroll_loops_completely (flag_cunroll_grow_size
>>|| optimize >= 3, true);
>>
>> And we do not need to enable this flag at -O2.
>
> Sure this works for me.  Note I'd make funroll-loops enabled by
> funroll-all-loops so you could simplify the above.

After some checking, I did not use 'EnabledBy(funroll-all-loops)' for
funroll-loops, because of the special behavior of -fno-unroll-all-loops:
on the command line, -fno-unroll-all-loops turns off flag_unroll_loops
if flag_unroll_loops was enabled implicitly (not through the command line).

The original logic is that only a positive flag_unroll_all_loops
overrides flag_unroll_loops:
"if (flag_unroll_all_loops) flag_unroll_loops = 1;" in process_options.

Thanks.
Jiufu

>
> Richard.
>
>> Thanks for all your helpful comments again!
>>
>> Jiufu
>>
>> >
>> >> We're doing such kind of unrolling even if we cannot statically
>> >> decide which of a set of possible exits we take (and internally
>> >> call that peeling, if we can statically decide we call it complete
>> >> unrolling).
>> >
>> > "Peeling" is placing some copies of the loop before the loop;
>> > "unrolling" is placing a few copies of the loop inside the loop body.
>> > Does that match usage here?
>> >
>> >> The RTL side OTOH only performs classical unrolling,
>> >> preserving the backedge with various strategies for the
>> >> remaining iterations.
>> >
>> > And if you do complete unrolling that way, the backedge can be removed,
>> > since it can be shown never to be taken.
>> >
>> >> As said, for the regression on the 10 branch with ppc I'd add
>> >> [a hidden] flag that controls the RTL unroller, also set by
>> >> -funroll-loops and triggered by the ppc specific heuristics.
>> >
>> > But the problem is in cunroll?  This is so backwards...  Because some
>> > other transform abuses the unroller flags, adding a second level flag
>> > with the same meaning :-(  It will work for fixing the regression,
>> > sure, and it is slightly less code as well.
>> >
>> >
>> > Segher


[PATCH] testsuite/95363 - fix gcc.dg/vect/bb-slp-pr95271.c for ilp32

2020-05-28 Thread Richard Biener
This fixes the testcase to avoid out of bound shifts on ilp32 targets.

2020-05-28  Richard Biener  

PR testsuite/95363
* gcc.dg/vect/bb-slp-pr95271.c: Fix on ilp32 targets.
---
 gcc/testsuite/gcc.dg/vect/bb-slp-pr95271.c | 17 ++---
 1 file changed, 10 insertions(+), 7 deletions(-)

diff --git a/gcc/testsuite/gcc.dg/vect/bb-slp-pr95271.c b/gcc/testsuite/gcc.dg/vect/bb-slp-pr95271.c
index 2f235980405..f6e266cce5c 100644
--- a/gcc/testsuite/gcc.dg/vect/bb-slp-pr95271.c
+++ b/gcc/testsuite/gcc.dg/vect/bb-slp-pr95271.c
@@ -1,19 +1,22 @@
 /* { dg-do compile } */
+/* { dg-require-effective-target stdint_types } */
 /* { dg-additional-options "-march=cooperlake" { target x86_64-*-* i?86-*-* } } */
 
+#include <stdint.h>
+
 int a;
 struct b c;
-long d;
+int64_t d;
 struct b {
-  unsigned long address;
-  unsigned long e;
+  uint64_t address;
+  uint64_t e;
 };
 void f()
 {
-  d = (long)(&a)[0] << 56 | (long)((unsigned char *)&a)[1] << 48 |
-  (long)((unsigned char *)&a)[2] << 40 |
-  (long)((unsigned char *)&a)[3] << 32 |
-  (long)((unsigned char *)&a)[4] << 24 | ((unsigned char *)&a)[5] << 16 |
+  d = (int64_t)(&a)[0] << 56 | (int64_t)((unsigned char *)&a)[1] << 48 |
+  (int64_t)((unsigned char *)&a)[2] << 40 |
+  (int64_t)((unsigned char *)&a)[3] << 32 |
+  (int64_t)((unsigned char *)&a)[4] << 24 | ((unsigned char *)&a)[5] << 16 |
   ((unsigned char *)&a)[6] << 8 | ((unsigned char *)&a)[7];
   c.address = c.e = d;
 }
-- 
2.26.1


Re: [PATCH] mklog: support renaming of files

2020-05-28 Thread Martin Liška

On 5/27/20 8:15 PM, Martin Liška wrote:

> The support is optional and detected during run-time.
>
> Thoughts?


Pushed as eb78da45ab8.

Martin


[PATCH 2/2] rs6000: allow cunroll to grow size according to -funroll-loop or -fpeel-loops

2020-05-28 Thread guojiufu via Gcc-patches
From: Jiufu Guo 

Previously, flag_unroll_loops was turned on implicitly at -O2.  That
also allowed cunroll to increase code size, so cunroll would unroll/peel
a loop even if the loop is complex, like the code in PR95018.
With this patch, size growth for cunroll is allowed if -funroll-loops
or -fpeel-loops is specified explicitly.

Bootstrap/regtest pass on powerpc64le. OK for trunk? And backport to
GCC10?

BR,
Jiufu

gcc/ChangeLog
2020-02-28  Jiufu Guo  

PR target/95018
* config/rs6000/rs6000.c (rs6000_option_override_internal):
Override flag_cunroll_grow_size.

---
 gcc/config/rs6000/rs6000.c | 5 +
 1 file changed, 5 insertions(+)

diff --git a/gcc/config/rs6000/rs6000.c b/gcc/config/rs6000/rs6000.c
index 8435bc15d72..df6e03146cb 100644
--- a/gcc/config/rs6000/rs6000.c
+++ b/gcc/config/rs6000/rs6000.c
@@ -4567,7 +4567,12 @@ rs6000_option_override_internal (bool global_init_p)
unroll_only_small_loops = 0;
  if (!global_options_set.x_flag_rename_registers)
flag_rename_registers = 1;
+ if (!global_options_set.x_flag_cunroll_grow_size)
+   flag_cunroll_grow_size = 1;
}
+  else
+   if (!global_options_set.x_flag_cunroll_grow_size)
+ flag_cunroll_grow_size = flag_peel_loops;
 
   /* If using typedef char *va_list, signal that
 __builtin_va_start (&ap, 0) can be optimized to
-- 
2.17.1



[PATCH 1/2] Introduce flag_cunroll_grow_size for cunroll

2020-05-28 Thread guojiufu via Gcc-patches
From: Jiufu Guo 

Currently the GIMPLE complete unroller (cunroll) checks
flag_unroll_loops and flag_peel_loops to decide whether to allow size
growth.  Besides affecting cunroll, flag_unroll_loops also controls the
RTL unroller.  To have more freedom to control cunroll and the RTL
unroller, this patch introduces flag_cunroll_grow_size.  With this
patch, we can control cunroll and the RTL unroller independently.
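Here is a minimal model of how the new flag is meant to resolve. It assumes GCC's usual convention that Init(2) equals AUTODETECT_VALUE ("not set yet"). The struct and function names are hypothetical. Only the two conditions are taken from the toplev.c and tree-ssa-loop-ivcanon.c hunks of this patch.

```c
#include <assert.h>

/* Assumed to match GCC's AUTODETECT_VALUE (Init(2) in common.opt).  */
#define AUTODETECT_VALUE 2

/* Hypothetical bundle of the relevant flags; not a GCC type.  */
struct loop_flags
{
  int unroll_loops;      /* -funroll-loops */
  int unroll_all_loops;  /* -funroll-all-loops */
  int peel_loops;        /* -fpeel-loops */
  int cunroll_grow_size; /* new flag, starts as AUTODETECT_VALUE */
};

/* Mirrors process_options: a positive -funroll-all-loops forces
   -funroll-loops on, then an unset cunroll_grow_size is derived from
   the unrolling/peeling flags.  */
static void
resolve_loop_flags (struct loop_flags *f)
{
  if (f->unroll_all_loops)
    f->unroll_loops = 1;
  if (f->cunroll_grow_size == AUTODETECT_VALUE)
    f->cunroll_grow_size = f->unroll_loops || f->peel_loops;
}

/* Mirrors the argument passed to tree_unroll_loops_completely: size
   growth is allowed if the flag is set or at -O3 and above.  */
static int
cunroll_may_grow (const struct loop_flags *f, int optimize)
{
  assert (f->cunroll_grow_size != AUTODETECT_VALUE); /* resolve first */
  return f->cunroll_grow_size || optimize >= 3;
}
```

Suppose something else (for example a target override) has already set the flag to 0. Then -funroll-loops can stay on for the RTL unroller while cunroll at -O2 no longer grows code.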

Bootstrap/regtest pass on powerpc64le. OK for trunk? And backport to
GCC 10 after a week?

gcc/ChangeLog
2020-02-28  Jiufu Guo  

* common.opt (flag_cunroll_grow_size): New flag.
* toplev.c (process_options): Set flag_cunroll_grow_size.
* tree-ssa-loop-ivcanon.c (pass_complete_unroll::execute):
Use flag_cunroll_grow_size.
---
 gcc/common.opt  | 4 
 gcc/toplev.c| 4 
 gcc/tree-ssa-loop-ivcanon.c | 3 +--
 3 files changed, 9 insertions(+), 2 deletions(-)

diff --git a/gcc/common.opt b/gcc/common.opt
index 4464049fc1f..1d0fa7b1749 100644
--- a/gcc/common.opt
+++ b/gcc/common.opt
@@ -2856,6 +2856,10 @@ funroll-all-loops
 Common Report Var(flag_unroll_all_loops) Optimization
 Perform loop unrolling for all loops.
 
+funroll-completely-grow-size
+Var(flag_cunroll_grow_size) Init(2)
+; Control cunroll to allow size growth during complete unrolling
+
 ; Nonzero means that loop optimizer may assume that the induction variables
 ; that control loops do not overflow and that the loops with nontrivial
 ; exit condition are not infinite
diff --git a/gcc/toplev.c b/gcc/toplev.c
index 96316fbd23b..8d52358efdd 100644
--- a/gcc/toplev.c
+++ b/gcc/toplev.c
@@ -1474,6 +1474,10 @@ process_options (void)
   if (flag_unroll_all_loops)
 flag_unroll_loops = 1;
 
+  /* Allow cunroll to grow size accordingly.  */
+  if (flag_cunroll_grow_size == AUTODETECT_VALUE)
+flag_cunroll_grow_size = flag_unroll_loops || flag_peel_loops;
+
   /* web and rename-registers help when run after loop unrolling.  */
   if (flag_web == AUTODETECT_VALUE)
 flag_web = flag_unroll_loops;
diff --git a/gcc/tree-ssa-loop-ivcanon.c b/gcc/tree-ssa-loop-ivcanon.c
index 8ab6ab3330c..d6a4617a6a1 100644
--- a/gcc/tree-ssa-loop-ivcanon.c
+++ b/gcc/tree-ssa-loop-ivcanon.c
@@ -1603,8 +1603,7 @@ pass_complete_unroll::execute (function *fun)
  re-peeling the same loop multiple times.  */
   if (flag_peel_loops)
 peeled_loops = BITMAP_ALLOC (NULL);
-  unsigned int val = tree_unroll_loops_completely (flag_unroll_loops
-  || flag_peel_loops
+  unsigned int val = tree_unroll_loops_completely (flag_cunroll_grow_size
   || optimize >= 3, true);
   if (peeled_loops)
 {
-- 
2.17.1



[PATCH] Fix check-params-in-docs.py for --help=param.

2020-05-28 Thread Martin Liška

Updated to current help format and pushed to master.

Martin

contrib/ChangeLog:

* check-params-in-docs.py: Update to new format
of help.  Apply flake8 corrections.
---
 contrib/check-params-in-docs.py | 17 ++---
 1 file changed, 10 insertions(+), 7 deletions(-)

diff --git a/contrib/check-params-in-docs.py b/contrib/check-params-in-docs.py
index 6cff090dc4c..dfbfa3d0067 100755
--- a/contrib/check-params-in-docs.py
+++ b/contrib/check-params-in-docs.py
@@ -22,16 +22,19 @@
 #
 #
 
-import sys
-import json
 import argparse
+from itertools import dropwhile, takewhile
 
-from itertools import *
 
 def get_param_tuple(line):
-    line = line.strip()
+    line = line.strip().replace('--param=', '')
     i = line.find(' ')
-    return (line[:i], line[i:].strip())
+    name = line[:i]
+    if '=' in name:
+        name = name[:name.find('=')]
+    description = line[i:].strip()
+    return (name, description)
+
 
 parser = argparse.ArgumentParser()
 parser.add_argument('texi_file')
@@ -49,8 +52,8 @@ for line in open(args.params_output).readlines():
 
 # Find section in .texi manual with parameters
 texi = ([x.strip() for x in open(args.texi_file).readlines()])
-texi = dropwhile(lambda x: not 'item --param' in x, texi)
-texi = takewhile(lambda x: not '@node Instrumentation Options' in x, texi)
+texi = dropwhile(lambda x: 'item --param' not in x, texi)
+texi = takewhile(lambda x: '@node Instrumentation Options' not in x, texi)
 texi = list(texi)[1:]
 
 token = '@item '
--
2.26.2



[PATCH] Add documentation for missing params.

2020-05-28 Thread Martin Liška

The patch fixes various issues spotted by the check-params-in-docs.py
script. I'm going to install the patch.

gcc/ChangeLog:

PR web/95380
* doc/invoke.texi: Add missing params, remove max-once-peeled-insns and
rename ipcp-unit-growth to ipa-cp-unit-growth.
---
 gcc/doc/invoke.texi | 38 +-
 1 file changed, 33 insertions(+), 5 deletions(-)

diff --git a/gcc/doc/invoke.texi b/gcc/doc/invoke.texi
index 78c2f500c90..5345bc3def3 100644
--- a/gcc/doc/invoke.texi
+++ b/gcc/doc/invoke.texi
@@ -10565,7 +10565,7 @@ When enabled, interprocedural constant propagation performs function cloning
 when externally visible function can be called with constant arguments.
 Because this optimization can create multiple copies of functions,
 it may significantly increase code size
-(see @option{--param ipcp-unit-growth=@var{value}}).
+(see @option{--param ipa-cp-unit-growth=@var{value}}).
 This flag is enabled by default at @option{-O3}.
 It is also enabled by @option{-fprofile-use} and @option{-fauto-profile}.
 
@@ -12454,7 +12454,7 @@ For example, parameter value 20 limits unit growth to 1.2 times the original
 size. Cold functions (either marked cold via an attribute or by profile
 feedback) are not accounted into the unit size.
 
-@item ipcp-unit-growth
+@item ipa-cp-unit-growth
 Specifies maximal overall growth of the compilation unit caused by
 interprocedural constant propagation.  For example, parameter value 10 limits
 unit growth to 1.1 times the original size.
@@ -13460,9 +13460,6 @@ will not try to thread through its block.
 Maximum number of nested calls to search for control dependencies
 during uninitialized variable analysis.
 
-@item max-once-peeled-insns
-The maximum number of insns of a peeled loop that rolls only once.
-
 @item sra-max-scalarization-size-Osize
 Maximum size, in storage units, of an aggregate
 which should be considered for scalarization when compiling for size.
@@ -13493,6 +13490,37 @@ of iterations or recursive calls GCC performs when optimizing certain
 statements or when determining their validity prior to issuing
 diagnostics.
 
+@item store-merging-max-size
+Maximum size of a single store merging region in bytes.
+
+@item hash-table-verification-limit
+The number of elements for which hash table verification is done
+for each searched element.
+
+@item max-find-base-term-values
+Maximum number of VALUEs handled during a single find_base_term call.
+
+@item analyzer-max-enodes-per-program-point
+The maximum number of exploded nodes per program point within
+the analyzer, before terminating analysis of that point.
+
+@item analyzer-min-snodes-for-call-summary
+The minimum number of supernodes within a function for the
+analyzer to consider summarizing its effects at call sites.
+
+@item analyzer-max-recursion-depth
+The maximum number of times a callsite can appear in a call stack
+within the analyzer, before terminating analysis of a call that would
+recurse deeper.
+
+@item gimple-fe-computed-hot-bb-threshold
+The number of executions of a basic block which is considered hot.
+The parameter is used only in GIMPLE FE.
+
+@item analyzer-bb-explosion-factor
+The maximum number of 'after supernode' exploded nodes within the analyzer
+per supernode, before terminating analysis.
+
 @end table
 
 The following choices of @var{name} are available on AArch64 targets:
--
2.26.2



Re: [PATCH] gcc-changelog: enhance handling of renamings

2020-05-28 Thread Pierre-Marie de Rodat

On 27/05/2020 19:50, Martin Liška wrote:

> Thank you very much for working on this! It's a good idea that's currently
> not supported.

You are welcome, I’m glad to contribute. :-)

> However, this is available for unidiff package starting from version
> 0.6.0. With a bit older release I see:
>
>      t = 'D'
>   elif f.is_rename:
> E   AttributeError: 'PatchedFile' object has no attribute 'is_rename'
>
> Which is a minor limitation is git_email.py is supposed to be used only
> for tests.

Agreed.

> Can you please align both previous hunks, I mean doing in both:
>
> modified_files.append(..., 'A')
> t = 'D'

I initially thought it was not possible (f.path in git_email.py
corresponds to the source file, whereas file.b_path in git_repository.py
corresponds to the target file), but a simple refactoring did it.

> We'll need here a skip based on version of unidiff. So something like:
> @pytest.mark.skipif
> ?
>
> I'm going to prepare a counter-part for mklog that can also handle file
> renaming.


Thanks! The updated patch is attached.

--
Pierre-Marie de Rodat
>From 42b48c97cb30bcc1b05679ced3cb946551bfcae0 Mon Sep 17 00:00:00 2001
From: Pierre-Marie de Rodat 
Date: Wed, 27 May 2020 15:25:18 +0200
Subject: [PATCH] gcc-changelog: enhance handling of renamings

So far, we have expected a commit that renames a file to contain a
changelog entry only for the new name. For example, after the following
commit:

   $ git mv foo bar
   $ git commit

We expect the following changelog:

   * bar: Renamed from foo.

Git does not keep track of renamings, only file deletions and additions.
The display of patches then uses heuristics (with config-dependent
parameters) to try to match deleted and added files in the same commit.
It is thus brittle to rely on this information.

This commit modifies changelog processing so that renames are considered
as a deletion of a file plus an addition of another file. The following
changelog is now expected for the above example:

   * foo: Move...
   * bar: Here.
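The same rule is sketched below in C for illustration only. The real scripts are Python and use unidiff or GitPython, and every type and name here is hypothetical. A rename expands into a 'D' entry for the old path followed by an 'A' entry for the new one.

```c
#include <assert.h>
#include <stddef.h>

/* Hypothetical, simplified diff entry.  */
enum change_kind
{
  CHANGE_ADDED,
  CHANGE_REMOVED,
  CHANGE_RENAMED,
  CHANGE_MODIFIED
};

struct diff_entry
{
  enum change_kind kind;
  const char *source; /* path before the change */
  const char *target; /* path after the change */
};

struct modified_file
{
  const char *path;
  char type; /* 'A', 'D' or 'M' */
};

/* Expand DIFF into (path, type) pairs.  A rename becomes a deletion of
   the source path followed by an addition of the target path, so OUT
   needs room for up to 2 * N entries.  Returns the number written.  */
static size_t
collect_modified_files (const struct diff_entry *diff, size_t n,
                        struct modified_file *out)
{
  size_t count = 0;
  assert (n == 0 || (diff != NULL && out != NULL));
  for (size_t i = 0; i < n; i++)
    {
      const char *path = diff[i].target;
      char type;
      switch (diff[i].kind)
        {
        case CHANGE_ADDED:
          type = 'A';
          break;
        case CHANGE_REMOVED:
          /* In this model a removed file is reported by its old path.  */
          path = diff[i].source;
          type = 'D';
          break;
        case CHANGE_RENAMED:
          out[count].path = diff[i].source;
          out[count].type = 'D';
          count++;
          type = 'A';
          break;
        default:
          type = 'M';
          break;
        }
      out[count].path = path;
      out[count].type = type;
      count++;
    }
  return count;
}
```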

contrib/

	* gcc-changelog/git_email.py (GitEmail.__init__): Interpret file
	renamings as a file deletion plus a file addition.
	* gcc-changelog/git_repository.py (parse_git_revisions):
	Likewise.
	* gcc-changelog/test_email.py: New testcase.
	* gcc-changelog/test_patches.txt: New testcase.
---
 contrib/gcc-changelog/git_email.py  |  11 +-
 contrib/gcc-changelog/git_repository.py |   5 +
 contrib/gcc-changelog/test_email.py |  10 ++
 contrib/gcc-changelog/test_patches.txt  | 153 
 4 files changed, 178 insertions(+), 1 deletion(-)

diff --git a/contrib/gcc-changelog/git_email.py b/contrib/gcc-changelog/git_email.py
index 8c9df293a66..367cf76d8ee 100755
--- a/contrib/gcc-changelog/git_email.py
+++ b/contrib/gcc-changelog/git_email.py
@@ -50,13 +50,22 @@ class GitEmail(GitCommit):
 
 modified_files = []
 for f in diff:
+# Strip "a/" and "b/" prefixes
+source = f.source_file[2:]
+target = f.target_file[2:]
+
 if f.is_added_file:
 t = 'A'
 elif f.is_removed_file:
 t = 'D'
+elif f.is_rename:
+# Consider that renamed files are two operations: the deletion
+# of the original name and the addition of the new one.
+modified_files.append((source, 'D'))
+t = 'A'
 else:
 t = 'M'
-modified_files.append((f.path, t))
+modified_files.append((target, t))
 super().__init__(None, date, author, body, modified_files,
  strict=strict)
 
diff --git a/contrib/gcc-changelog/git_repository.py b/contrib/gcc-changelog/git_repository.py
index 0473fe73fba..e3b6c4d7a38 100755
--- a/contrib/gcc-changelog/git_repository.py
+++ b/contrib/gcc-changelog/git_repository.py
@@ -47,6 +47,11 @@ def parse_git_revisions(repo_path, revisions, strict=False):
 t = 'A'
 elif file.deleted_file:
 t = 'D'
+elif file.renamed_file:
+# Consider that renamed files are two operations: the deletion
+# of the original name and the addition of the new one.
+modified_files.append((file.a_path, 'D'))
+t = 'A'
 else:
 t = 'M'
 modified_files.append((file.b_path, t))
diff --git a/contrib/gcc-changelog/test_email.py b/contrib/gcc-changelog/test_email.py
index 3d2c8ff2412..23372f082a0 100755
--- a/contrib/gcc-changelog/test_email.py
+++ b/contrib/gcc-changelog/test_email.py
@@ -18,6 +18,7 @@
 
 import os
 import tempfile
+import unidiff
 import unittest
 
 from git_email import GitEmail
@@ -25,6 +26,8 @@ from git_email import GitEmail
 
 script_path = os.path.dirname(os.path.realpath(__file__))
 
+unidiff_supports_renaming = hasattr(unidiff.PatchedFile(), 'is_rename')
+
 
 class TestGccChangelog(unittest.TestCase):
 def setUp(sel

Re: [PATCH] gcc-changelog: enhance handling of renamings

2020-05-28 Thread Martin Liška

On 5/28/20 11:05 AM, Pierre-Marie de Rodat wrote:

> Thanks! The updated patch is attached.


The patch is fine, please install it.

Thanks,
Martin


Re: [PATCH] gcc-changelog: enhance handling of renamings

2020-05-28 Thread Pierre-Marie de Rodat

On 28/05/2020 11:09, Martin Liška wrote:

> On 5/28/20 11:05 AM, Pierre-Marie de Rodat wrote:
>
>> Thanks! The updated patch is attached.
>
> The patch is fine, please install it.


Now pushed. Thank you again.

--
Pierre-Marie de Rodat


[PATCH] arm: Fix unwanted fall-throughs in arm.c

2020-05-28 Thread Andrea Corallo
Hi all,

this small patch fixes some unintentional fall-throughs in
`mve_vector_mem_operand'.
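Below is a reduced, hypothetical two-mode version of the bug class. The mode names and limits are made up, loosely echoing the 127/255 limits in the hunk below. Without a trailing return, an offset rejected by one case falls into the next case and can be accepted against a looser limit.

```c
#include <assert.h>
#include <stdbool.h>
#include <stdlib.h>

/* Hypothetical modes, for illustration only.  */
enum vec_mode { MODE_NARROW, MODE_WIDE, MODE_OTHER };

/* Buggy shape: a failed range check falls through into the next case,
   so the offset is re-tested against that case's looser limit.  */
static bool
offset_ok_buggy (enum vec_mode mode, int val)
{
  switch (mode)
    {
    case MODE_NARROW:
      if (abs (val) <= 127)
        return true;
      /* Missing "return false;": control continues into MODE_WIDE.  */
    case MODE_WIDE:
      if (abs (val) <= 255)
        return true;
      /* Missing "return false;": control continues into default.  */
    default:
      return false;
    }
}

/* Fixed shape, in the style of the patch: each case ends with an
   explicit return, so a rejected offset stays rejected.  */
static bool
offset_ok_fixed (enum vec_mode mode, int val)
{
  switch (mode)
    {
    case MODE_NARROW:
      if (abs (val) <= 127)
        return true;
      return false;
    case MODE_WIDE:
      if (abs (val) <= 255)
        return true;
      return false;
    default:
      return false;
    }
}
```

Building with -Wimplicit-fallthrough is one way to catch this pattern at compile time.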

Regtested and bootstrapped on arm-linux-gnueabihf.

Okay for trunk?

Regards

  Andrea

gcc/ChangeLog

2020-05-28  Andrea Corallo  

* config/arm/arm.c (mve_vector_mem_operand): Fix unwanted
fall-throughs.

>From 9b24746c356036c48cc5017942d565f5a786748e Mon Sep 17 00:00:00 2001
From: Andrea Corallo 
Date: Wed, 27 May 2020 17:43:48 +0100
Subject: [PATCH] arm: Fix unwanted fall-throughs in arm.c

gcc/ChangeLog

2020-05-28  Andrea Corallo  

	* config/arm/arm.c (mve_vector_mem_operand): Fix unwanted
	fall-throughs.
---
 gcc/config/arm/arm.c | 6 ++
 1 file changed, 6 insertions(+)

diff --git a/gcc/config/arm/arm.c b/gcc/config/arm/arm.c
index 90cb1fb5f901..9257c7a51a40 100644
--- a/gcc/config/arm/arm.c
+++ b/gcc/config/arm/arm.c
@@ -13330,32 +13330,38 @@ mve_vector_mem_operand (machine_mode mode, rtx op, bool strict)
 	if (abs (val) <= 127)
 	  return ((reg_no < LAST_ARM_REGNUM && reg_no != SP_REGNUM)
 		  || (!strict && reg_no >= FIRST_PSEUDO_REGISTER));
+	return FALSE;
 	  case E_V8HImode:
 	  case E_V8HFmode:
 	if (abs (val) <= 255)
 	  return ((reg_no < LAST_ARM_REGNUM && reg_no != SP_REGNUM)
 		  || (!strict && reg_no >= FIRST_PSEUDO_REGISTER));
+	return FALSE;
 	  case E_V8QImode:
 	  case E_V4QImode:
 	if (abs (val) <= 127)
 	  return (reg_no <= LAST_LO_REGNUM
 		  || (!strict && reg_no >= FIRST_PSEUDO_REGISTER));
+	return FALSE;
 	  case E_V4HImode:
 	  case E_V4HFmode:
 	if (val % 2 == 0 && abs (val) <= 254)
 	  return (reg_no <= LAST_LO_REGNUM
 		  || (!strict && reg_no >= FIRST_PSEUDO_REGISTER));
+	return FALSE;
 	  case E_V4SImode:
 	  case E_V4SFmode:
 	if (val % 4 == 0 && abs (val) <= 508)
 	  return ((reg_no < LAST_ARM_REGNUM && reg_no != SP_REGNUM)
 		  || (!strict && reg_no >= FIRST_PSEUDO_REGISTER));
+	return FALSE;
 	  case E_V2DImode:
 	  case E_V2DFmode:
 	  case E_TImode:
 	if (val % 4 == 0 && val >= 0 && val <= 1020)
 	  return ((reg_no < LAST_ARM_REGNUM && reg_no != SP_REGNUM)
 		  || (!strict && reg_no >= FIRST_PSEUDO_REGISTER));
+	return FALSE;
 	  default:
 	return FALSE;
 	}
-- 
2.17.1



RE: [PATCH] arm: Fix unwanted fall-throughs in arm.c

2020-05-28 Thread Kyrylo Tkachov



> -Original Message-
> From: Andrea Corallo 
> Sent: 28 May 2020 10:20
> To: gcc Patches 
> Cc: nd ; Kyrylo Tkachov ; Richard
> Earnshaw 
> Subject: [PATCH] arm: Fix unwanted fall-throughs in arm.c
> 
> Hi all,
> 
> this small patch fix some unintentional fall-throughs in
> `mve_vector_mem_operand'.
> 
> Regtested and bootstraped on arm-linux-gnueabihf.
> 
> Okay for trunk?

Oops.
Ok.
Thanks,
Kyrill

> 
> Regards
> 
>   Andrea
> 
> gcc/ChangeLog
> 
> 2020-05-28  Andrea Corallo  
> 
>   * config/arm/arm.c (mve_vector_mem_operand): Fix unwanted
>   fall-throughs.
> 


Re: [PATCH] arm: Fix unwanted fall-throughs in arm.c

2020-05-28 Thread Andrea Corallo
Kyrylo Tkachov  writes:

>> -Original Message-
>> From: Andrea Corallo 
>> Sent: 28 May 2020 10:20
>> To: gcc Patches 
>> Cc: nd ; Kyrylo Tkachov ; Richard
>> Earnshaw 
>> Subject: [PATCH] arm: Fix unwanted fall-throughs in arm.c
>>
>> Hi all,
>>
>> this small patch fix some unintentional fall-throughs in
>> `mve_vector_mem_operand'.
>>
>> Regtested and bootstraped on arm-linux-gnueabihf.
>>
>> Okay for trunk?
>
> Oops.
> Ok.
> Thanks,
> Kyrill
>

Installed as dd019ef07358.

Thanks

  Andrea


[PATCH] tree-optimization/95273 - more vectorizable_shift massaging

2020-05-28 Thread Richard Biener
Covering all bases in vectorizable_shift is hard - this makes sure
to appropriately handle the case of PR95356 without breaking others.
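Here is a rough, hypothetical model of the cases the new testcase exercises. It is not vectorizable_shift's actual logic, and all names are invented. A single scalar shift count can serve every SLP lane only when all lanes use the same amount (f2, f3). A per-lane count vector that mixes constants with an invariant such as `x` (f2a, f2b) cannot be a constant vector and has to be built at run time.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>

/* Hypothetical classification of one lane's shift amount.  */
enum amount_kind { AMOUNT_CONSTANT, AMOUNT_INVARIANT };

struct shift_amount
{
  enum amount_kind kind;
  int value; /* constant value, or an id naming the invariant */
};

/* True if every lane uses the same amount, so one scalar count works.  */
static bool
can_use_scalar_shift_arg (const struct shift_amount *lanes, size_t n)
{
  assert (n > 0);
  for (size_t i = 1; i < n; i++)
    if (lanes[i].kind != lanes[0].kind || lanes[i].value != lanes[0].value)
      return false;
  return true;
}

/* True if a per-lane count vector is needed and at least one lane is an
   invariant, so the vector cannot be emitted as a constant.  */
static bool
needs_runtime_vector_build (const struct shift_amount *lanes, size_t n)
{
  for (size_t i = 0; i < n; i++)
    if (lanes[i].kind == AMOUNT_INVARIANT)
      return !can_use_scalar_shift_arg (lanes, n);
  return false;
}
```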

Bootstrapped / tested on x86_64-unknown-linux-gnu, applied.

2020-05-28  Richard Biener  

PR tree-optimization/95273
PR tree-optimization/95356
* tree-vect-stmts.c (vectorizable_shift): Adjust when and to
what we set the vector type of the shift operand SLP node
again.

* gcc.target/i386/pr95356.c: New testcase.
---
 gcc/testsuite/gcc.target/i386/pr95356.c | 125 
 gcc/tree-vect-stmts.c   |   6 +-
 2 files changed, 130 insertions(+), 1 deletion(-)
 create mode 100644 gcc/testsuite/gcc.target/i386/pr95356.c

diff --git a/gcc/testsuite/gcc.target/i386/pr95356.c b/gcc/testsuite/gcc.target/i386/pr95356.c
new file mode 100644
index 000..fdd917ba5e7
--- /dev/null
+++ b/gcc/testsuite/gcc.target/i386/pr95356.c
@@ -0,0 +1,125 @@
+/* { dg-do compile } */
+/* { dg-options "-O3 -mavx512dq" } */
+
+extern void abort (void);
+long long a[16];
+
+__attribute__((noinline, noclone)) void
+f1 (void)
+{
+  long long a0, a1, a2, a3;
+  a0 = a[0];
+  a1 = a[1];
+  a2 = a[2];
+  a3 = a[3];
+  a0 = a0 << 2;
+  a1 = a1 << 3;
+  a2 = a2 << 4;
+  a3 = a3 << 5;
+  a[0] = a0;
+  a[1] = a1;
+  a[2] = a2;
+  a[3] = a3;
+}
+
+__attribute__((noinline, noclone)) void
+f2 (void)
+{
+  long long a0, a1, a2, a3;
+  a0 = a[0];
+  a1 = a[1];
+  a2 = a[2];
+  a3 = a[3];
+  a0 = a0 << 2;
+  a1 = a1 << 2;
+  a2 = a2 << 2;
+  a3 = a3 << 2;
+  a[0] = a0;
+  a[1] = a1;
+  a[2] = a2;
+  a[3] = a3;
+}
+
+__attribute__((noinline, noclone)) void
+f2a (int x)
+{
+  long long a0, a1, a2, a3;
+  a0 = a[0];
+  a1 = a[1];
+  a2 = a[2];
+  a3 = a[3];
+  a0 = a0 << x;
+  a1 = a1 << 2;
+  a2 = a2 << 2;
+  a3 = a3 << 2;
+  a[0] = a0;
+  a[1] = a1;
+  a[2] = a2;
+  a[3] = a3;
+}
+
+__attribute__((noinline, noclone)) void
+f2b (int x)
+{
+  long long a0, a1, a2, a3;
+  a0 = a[0];
+  a1 = a[1];
+  a2 = a[2];
+  a3 = a[3];
+  a0 = a0 << 2;
+  a1 = a1 << 2;
+  a2 = a2 << x;
+  a3 = a3 << 2;
+  a[0] = a0;
+  a[1] = a1;
+  a[2] = a2;
+  a[3] = a3;
+}
+
+__attribute__((noinline, noclone)) void
+f3 (int x)
+{
+  long long a0, a1, a2, a3;
+  a0 = a[0];
+  a1 = a[1];
+  a2 = a[2];
+  a3 = a[3];
+  a0 = a0 << x;
+  a1 = a1 << x;
+  a2 = a2 << x;
+  a3 = a3 << x;
+  a[0] = a0;
+  a[1] = a1;
+  a[2] = a2;
+  a[3] = a3;
+}
+
+int
+main ()
+{
+  a[0] = 4LL;
+  a[1] = 3LL;
+  a[2] = 2LL;
+  a[3] = 1LL;
+  f1 ();
+  if (a[0] != (4LL << 2) || a[1] != (3LL << 3)
+  || a[2] != (2LL << 4) || a[3] != (1LL << 5))
+abort ();
+  f2 ();
+  if (a[0] != (4LL << 4) || a[1] != (3LL << 5)
+  || a[2] != (2LL << 6) || a[3] != (1LL << 7))
+abort ();
+  f3 (3);
+  if (a[0] != (4LL << 7) || a[1] != (3LL << 8)
+  || a[2] != (2LL << 9) || a[3] != (1LL << 10))
+abort ();
+  f2a (3);
+  if (a[0] != (4LL << 10) || a[1] != (3LL << 10)
+  || a[2] != (2LL << 11) || a[3] != (1LL << 12))
+abort ();
+  f2b (3);
+  if (a[0] != (4LL << 12) || a[1] != (3LL << 12)
+  || a[2] != (2LL << 14) || a[3] != (1LL << 14))
+abort ();
+  return 0;
+}
diff --git a/gcc/tree-vect-stmts.c b/gcc/tree-vect-stmts.c
index 2f92bbe..ff335aa531e 100644
--- a/gcc/tree-vect-stmts.c
+++ b/gcc/tree-vect-stmts.c
@@ -5792,7 +5792,11 @@ vectorizable_shift (vec_info *vinfo,
   if (slp_node
  && (!vect_maybe_update_slp_op_vectype (slp_op0, vectype)
  || (!scalar_shift_arg
- && !vect_maybe_update_slp_op_vectype (slp_op1, vectype
+ && (!incompatible_op1_vectype_p
+ || dt[1] == vect_constant_def)
+ && !vect_maybe_update_slp_op_vectype
+   (slp_op1,
+incompatible_op1_vectype_p ? vectype : op1_vectype
{
  if (dump_enabled_p ())
dump_printf_loc (MSG_MISSED_OPTIMIZATION, vect_location,
-- 
2.16.4


Re: [PATCH 1/2] Introduce flag_cunroll_grow_size for cunroll

2020-05-28 Thread Richard Biener via Gcc-patches
On Thu, May 28, 2020 at 10:52 AM guojiufu  wrote:
>
> From: Jiufu Guo 
>
> Currently GIMPLE complete unroller(cunroll) is checking
> flag_unroll_loops and flag_peel_loops to see if allow size growth.
> Beside affects curnoll, flag_unroll_loops also controls RTL unroler.
> To have more freedom to control cunroll and RTL unroller, this patch
> introduces flag_cunroll_grow_size.  With this patch, we can control
> cunroll and RTL unroller indepently.
>
> Bootstrap/regtest pass on powerpc64le. OK for trunk? And backport to
> GCC10 after week?
>
> gcc/ChangeLog
> 2020-02-28  Jiufu Guo  
>
> * common.opt (flag_cunroll_grow_size): New flag.
> * toplev.c (process_options): Set flag_cunroll_grow_size.
> * tree-ssa-loop-ivcanon.c (pass_complete_unroll::execute):
> Use flag_cunroll_grow_size.
> ---
>  gcc/common.opt  | 4 
>  gcc/toplev.c| 4 
>  gcc/tree-ssa-loop-ivcanon.c | 3 +--
>  3 files changed, 9 insertions(+), 2 deletions(-)
>
> diff --git a/gcc/common.opt b/gcc/common.opt
> index 4464049fc1f..1d0fa7b1749 100644
> --- a/gcc/common.opt
> +++ b/gcc/common.opt
> @@ -2856,6 +2856,10 @@ funroll-all-loops
>  Common Report Var(flag_unroll_all_loops) Optimization
>  Perform loop unrolling for all loops.
>
> +funroll-completely-grow-size
> +Var(flag_cunroll_grow_size) Init(2)
> +; Control cunroll to allow size growth during complete unrolling
> +

So this really adds a new compiler option which would need documenting.

I fear we'll get into bikeshed territory here as well...  I originally thought
we can use

Variable
int flag_cunroll_grow_size;

but now realize that does not work well with LTO without adjusting
the awk scripts to generate option saving/restoring.  For your patch
you'd need to add 'Optimization' to get the flag streamed properly,
you should also verify the target adjustment done in the backend
is reflected in LTO mode.

>  ; Nonzero means that loop optimizer may assume that the induction variables
>  ; that control loops do not overflow and that the loops with nontrivial
>  ; exit condition are not infinite
> diff --git a/gcc/toplev.c b/gcc/toplev.c
> index 96316fbd23b..8d52358efdd 100644
> --- a/gcc/toplev.c
> +++ b/gcc/toplev.c
> @@ -1474,6 +1474,10 @@ process_options (void)
>if (flag_unroll_all_loops)
>  flag_unroll_loops = 1;
>
> +  /* Allow cunroll to grow size accordingly.  */
> +  if (flag_cunroll_grow_size == AUTODETECT_VALUE)
> +flag_cunroll_grow_size = flag_unroll_loops || flag_peel_loops;
> +

Any reason to not use EnabledBy(funroll-loops || fpeel-loops)?

>/* web and rename-registers help when run after loop unrolling.  */
>if (flag_web == AUTODETECT_VALUE)
>  flag_web = flag_unroll_loops;
> diff --git a/gcc/tree-ssa-loop-ivcanon.c b/gcc/tree-ssa-loop-ivcanon.c
> index 8ab6ab3330c..d6a4617a6a1 100644
> --- a/gcc/tree-ssa-loop-ivcanon.c
> +++ b/gcc/tree-ssa-loop-ivcanon.c
> @@ -1603,8 +1603,7 @@ pass_complete_unroll::execute (function *fun)
>   re-peeling the same loop multiple times.  */
>if (flag_peel_loops)
>  peeled_loops = BITMAP_ALLOC (NULL);
> -  unsigned int val = tree_unroll_loops_completely (flag_unroll_loops
> -  || flag_peel_loops
> +  unsigned int val = tree_unroll_loops_completely (flag_cunroll_grow_size
>|| optimize >= 3, true);

Given we check optimize >= 3 here please enable the flag by default
at O3+ via opts.c:default_options_table and also elide the optimize >= 3
check.  That way -fno-unroll-completely-grow-size would have the desired effect.

Now back to the option name ... if we expose the option we should apply
some forward looking.  Currently cunroll cannot be disabled or enabled
with a flag and the desired new flag simply tunes one knob on it.  How
about adding

-fcomplete-unroll-loops[=may-grow]

to be able to further extend this later (there's the knob to only unroll
non-outermost loops and the knob whether to unroll loops where
intermediate exits are not statically predicted - incompletely controlled
by -fpeel-loops).  There are unfortunately no existing examples that allow
multiple flags like -fcomplete-unroll-loops=may-grow,outer other than
the sanitizers, which have manual option parsing.

So if there's no good suggestion from option folks maybe go with

-fcomplete-unroll-loops-may-grow

(ick).  And on a second thought -fcomplete-unroll-loops[=...] should
be -funroll-loops[={complete,may-grow,all}] to cover all unrolling
bases?

I really hate to explode the number of options users have to
consider optimizing their code ...

So if we can defer all this thinking and make a non-option flag
variable work that would be best IMHO.

Richard.

>if (peeled_loops)
>  {
> --
> 2.17.1
>


Re: [PATCH 1/2] [aarch64] Rework fpcr fpsr getter/setter builtins

2020-05-28 Thread Andrea Corallo
Andrea Corallo  writes:

> Hi all,
>
> I'd like to submit this patch introducing the following 64bit builtins
> variants as FPCR and FPSR registers getter/setter:
>
> unsigned long long __builtin_aarch64_get_fpcr64 ()
> void __builtin_aarch64_set_fpcr64 (unsigned long long)
> unsigned long long __builtin_aarch64_get_fpsr64 ()
> void __builtin_aarch64_set_fpsr64 (unsigned long long)
>
> Regards
>   Andrea
>
> gcc/ChangeLog:
>
> 2020-??-??  Andrea Corallo  
>
>   * config/aarch64/aarch64-builtins.c (aarch64_builtins): Add enums
>   for 64-bit fpsr/fpcr getter/setter builtin variants.
>   (aarch64_init_fpsr_fpcr_builtins): New function.
>   (aarch64_expand_fcr_fpsr_builtin): New function.
>   (aarch64_general_expand_builtin): Modify to make use of the latter.
>   * config/aarch64/aarch64.md (UNSPECV_GET_FPCR64)
>   (UNSPECV_SET_FPCR64, UNSPECV_GET_FPSR64, UNSPECV_SET_FPSR64): Add
>   4 new unspecv.
>   (set_fpcr64, get_fpcr64,set_fpsr64, get_fpsr64): New patterns.
>   * doc/extend.texi (__builtin_aarch64_get_fpcr64)
>   (__builtin_aarch64_set_fpcr64, __builtin_aarch64_get_fpsr64)
>   (__builtin_aarch64_set_fpsr64): Add into AArch64 Built-in
>   Functions.
>
> gcc/testsuite/ChangeLog:
>
> 2020-??-??  Andrea Corallo  
>
>   * gcc.target/aarch64/get_fpcr64.c: New test.
>   * gcc.target/aarch64/set_fpcr64.c: New test.
>   * gcc.target/aarch64/get_fpsr64.c: New test.
>   * gcc.target/aarch64/set_fpsr64.c: New test.

Hi all,

Leaving aside 2/2, I've retested this one (1/2) on top of current
master.

Regtested and bootstrapped on aarch64-linux-gnu.

Is it okay for trunk?

Regards

  Andrea


Re: Ping^1 [PATCH 2/4 V3] Add target hook stride_dform_valid_p

2020-05-28 Thread Richard Sandiford
"Kewen.Lin"  writes:
> Hi,
>
> Gentle ping patches as below:
>
> 1/4 v3 https://gcc.gnu.org/pipermail/gcc-patches/2020-February/540171.html
> 2/4 v3 https://gcc.gnu.org/pipermail/gcc-patches/2020-March/541387.html
> 3/4 v3 https://gcc.gnu.org/pipermail/gcc-patches/2020-May/545643.html
>
> Or shall I ping them separately?

Can you repost the full series?

Thanks,
Richard


[WIKI] Replace delta with C-Vise (and C-Reduce)

2020-05-28 Thread Martin Liška

Hello.

I've spent quite some time working on a super-parallel reduction tool
and I would like to promote it ;) Moreover, the delta website is down, so
references to it should be replaced: [1].

Here is updated wording for the following wiki page:
https://gcc.gnu.org/wiki/A_guide_to_testcase_reduction

For all delta lovers who don't like C-Vise (or C-Reduce), I recommend
the cvise-delta tool. It works the same way as delta, but in a super-parallel way.

The tool is available on openSUSE and Gentoo Linux. Thanks to Matthias,
the tool will soon be available on Ubuntu and Debian. The Red Hat port
is a work in progress.

Thoughts?
Martin

[1] http://delta.tigris.org/
From 4f9bb31b435d0e60ea7cdccd575aa590c943f516 Mon Sep 17 00:00:00 2001
From: Martin Liska 
Date: Thu, 28 May 2020 12:54:21 +0200
Subject: [PATCH] Replace usage of delta (and multidelta) with C-Vise and
 C-Reduce.

---
 guide_to_test_case.txt | 110 -
 1 file changed, 20 insertions(+), 90 deletions(-)

diff --git a/guide_to_test_case.txt b/guide_to_test_case.txt
index 7926b28..cd66c15 100644
--- a/guide_to_test_case.txt
+++ b/guide_to_test_case.txt
@@ -5,9 +5,7 @@
 Our [[https://gcc.gnu.org/bugs.html|bug reporting instructions]] ask that a bug report include the preprocessed version of the file that triggers the bug. There are several [[https://gcc.gnu.org/bugs/minimize.html|methods to minimise a testcase]].
 
 This page ought to serve as an introduction to automatic testcase reduction using
-the [[http://delta.tigris.org/|Delta]] or [[http://embed.cs.utah.edu/creduce/|C-Reduce]] tools.
-
-For '''Fortran''' there is a [[https://gcc.gnu.org/ml/gcc/2009-10/msg00618.html|patched version of Delta]] which takes `subroutine`/`do`/`if` boundaries into account.
+the [[https://github.com/marxin/cvise|C-Vise]] or [[http://embed.cs.utah.edu/creduce/|C-Reduce]] tools.
 
 == Simple ICE reduction ==
 
@@ -23,19 +21,15 @@ For '''Fortran''' there is a [[https://gcc.gnu.org/ml/gcc/2009-10/msg00618.html|
   > mv testcase.x testcase.i
 }}}
 
- The Delta tool requires us to create a script that exits with status zero in
+ The C-Vise (or C-Reduce) tool requires us to create a script that exits with status zero in
  case of the intermediate reduced testcase still is a testcase for what we got
  it in the first place (the same ICE is produced).  A sample script may look
- like (the testcase filename is passed as argument to the script)
+ like
 
 
 {{{
   #!/bin/sh
-  gcc -c -O -Wfatal-errors $1 2>&1 | grep 'internal compiler error: in typeck.c:2534'
-  if ! test $? = 0; then
-   exit 1
-  fi
-  exit 0
+  gcc -c -O -Wfatal-errors testcase.i 2>&1 | grep 'internal compiler error: in typeck.c:2534'
 }}}
 
  Note the {{{-Wfatal-errors}}} option can greatly speed up reducing large testcases,
@@ -46,92 +40,35 @@ For '''Fortran''' there is a [[https://gcc.gnu.org/ml/gcc/2009-10/msg00618.html|
  You should be able to verify your script by invoking it with the unreduced testcase.
  Try if it has zero exit code.
 
- Now we can invoke Delta to have it reduce the testcase using the script we just
+ Now we can invoke C-Vise (or C-Reduce) to have it reduce the testcase using the script we just
  wrote
 
-
 {{{
-  > ~/bin/delta -test=check.sh -suffix=.i -cp_minimal=testcase-min.i testcase.i
+  > cvise check.sh testcase.i
 }}}
 
  This will reduce the testcase until no single line can be removed from it without
  the check.sh script failing to identify it as a valid testcase.
 
-== Using topformflat ==
-
- The way delta reduces a testcase by removing complete lines often conflicts with
- the syntactic structure of a C/C++ testcase.  To make testcase reduction faster
- and more accurate there exists the topformflat tool in the Delta distribution
- that puts syntactically related tokens on one line, thereby making it possible
- to, f.i. restrict reduction to whole-function removal in a first step.  Basically
- you can control the nesting level up to which tokens are put to separate lines
- where a level of zero is all toplevel constructs onto a line on their own, level
- one would be each statement of a toplevel function on a separate line.
-
- Reducing a big C++ testcase one usually starts with level zero, increasing it
- until Delta no longer can reduce the testcase further (due to the line-oriented
- reduction it may be worthwhile to start over with level zero again and iterate
- until there's no further reduction).  An improved topformflat was posted at
- [[https://gcc.gnu.org/ml/gcc-patches/2005-08/msg01503.html]] where you can additionally
- specify if you want to ignore namespace and extern "C" as a nesting construct by
- specifying a second command line argument to topformflat.
-
-
-{{{
-  > ~/bin/topformflat 0 x < testcase.i > testcase.0x.i
-}}}
-
-
-== Using multidelta ==
-
-All the above can be simplified by using the '''multidelta''' tool that comes with the [[http://delta.tigris.org/|Delta distribution]]. The only differences is that the script should be able to b

[PATCH][GCC] arm: Fix the MVE ACLE vbicq intrinsics.

2020-05-28 Thread Srinath Parvathaneni
Hello,

The following MVE intrinsic testcases are failing in the GCC testsuite.

Directory: gcc.target/arm/mve/intrinsics/
Testcases: vbicq_f16.c, vbicq_f32.c, vbicq_s16.c, vbicq_s32.c, vbicq_s8.c,
vbicq_u16.c, vbicq_u32.c and vbicq_u8.c.

This patch fixes the vbicq intrinsics by modifying the intrinsic parameters
and polymorphic variants in "arm_mve.h" header file.

Please refer to the M-profile Vector Extension (MVE) intrinsics documentation [1] for more details.
[1] https://developer.arm.com/architectures/instruction-sets/simd-isas/helium/mve-intrinsics
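For readers less familiar with vbicq_n, here is a portable scalar model of what it computes on eight 16-bit lanes. VBIC is a bitwise AND with the complement of the second operand; the function name is hypothetical, and the model does not check that the immediate is one of the restricted encodings the hardware form accepts. It takes the immediate as int, like the corrected prototypes.

```c
#include <stddef.h>
#include <stdint.h>

/* Hypothetical scalar model of vbicq_n on uint16x8_t: each lane becomes
   a & ~imm, with IMM truncated to the 16-bit lane width first.  */
static void
vbic_n_u16_model (uint16_t out[8], const uint16_t a[8], int imm)
{
  uint16_t mask = (uint16_t) imm;
  for (size_t lane = 0; lane < 8; lane++)
    out[lane] = (uint16_t) (a[lane] & (uint16_t) ~mask);
}
```

For example, an immediate of 0xFF clears the low byte of every lane.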

Regression tested on arm-none-eabi and found no regressions.

Ok for master and gcc-10 branch?

Thanks,
Srinath.

gcc/ChangeLog:

2020-05-20  Srinath Parvathaneni  

* config/arm/arm_mve.h (__arm_vbicq_n_u16): Correct the intrinsic
arguments.
(__arm_vbicq_n_s16): Likewise.
(__arm_vbicq_n_u32): Likewise.
(__arm_vbicq_n_s32): Likewise.
(__arm_vbicq): Modify polymorphic variant.

gcc/testsuite/ChangeLog:

2020-05-20  Srinath Parvathaneni  

* gcc.target/arm/mve/intrinsics/vbicq_f16.c: Modify.
* gcc.target/arm/mve/intrinsics/vbicq_f32.c: Likewise.
* gcc.target/arm/mve/intrinsics/vbicq_n_s16.c: Likewise.
* gcc.target/arm/mve/intrinsics/vbicq_n_s32.c: Likewise.
* gcc.target/arm/mve/intrinsics/vbicq_n_u16.c: Likewise. 
* gcc.target/arm/mve/intrinsics/vbicq_n_u32.c: Likewise.
* gcc.target/arm/mve/intrinsics/vbicq_s16.c: Likewise.
* gcc.target/arm/mve/intrinsics/vbicq_s32.c: Likewise.
* gcc.target/arm/mve/intrinsics/vbicq_s8.c: Likewise.
* gcc.target/arm/mve/intrinsics/vbicq_u16.c: Likewise.
* gcc.target/arm/mve/intrinsics/vbicq_u32.c: Likewise.
* gcc.target/arm/mve/intrinsics/vbicq_u8.c: Likewise.


### Attachment also inlined for ease of reply###


diff --git a/gcc/config/arm/arm_mve.h b/gcc/config/arm/arm_mve.h
index 1002512a98f9364403f66eba0e320fe5070bdc3a..9bc5c97db8fea15d8140d966bc501b8a457a1abf 100644
--- a/gcc/config/arm/arm_mve.h
+++ b/gcc/config/arm/arm_mve.h
@@ -6361,7 +6361,7 @@ __arm_vorrq_n_u16 (uint16x8_t __a, const int __imm)
 
 __extension__ extern __inline uint16x8_t
 __attribute__ ((__always_inline__, __gnu_inline__, __artificial__))
-__arm_vbicq_n_u16 (uint16x8_t __a, const uint16_t __imm)
+__arm_vbicq_n_u16 (uint16x8_t __a, const int __imm)
 {
   return __builtin_mve_vbicq_n_uv8hi (__a, __imm);
 }
@@ -6473,7 +6473,7 @@ __arm_vorrq_n_s16 (int16x8_t __a, const int __imm)
 
 __extension__ extern __inline int16x8_t
 __attribute__ ((__always_inline__, __gnu_inline__, __artificial__))
-__arm_vbicq_n_s16 (int16x8_t __a, const int16_t __imm)
+__arm_vbicq_n_s16 (int16x8_t __a, const int __imm)
 {
   return __builtin_mve_vbicq_n_sv8hi (__a, __imm);
 }
@@ -6564,7 +6564,7 @@ __arm_vorrq_n_u32 (uint32x4_t __a, const int __imm)
 
 __extension__ extern __inline uint32x4_t
 __attribute__ ((__always_inline__, __gnu_inline__, __artificial__))
-__arm_vbicq_n_u32 (uint32x4_t __a, const uint32_t __imm)
+__arm_vbicq_n_u32 (uint32x4_t __a, const int __imm)
 {
   return __builtin_mve_vbicq_n_uv4si (__a, __imm);
 }
@@ -6676,7 +6676,7 @@ __arm_vorrq_n_s32 (int32x4_t __a, const int __imm)
 
 __extension__ extern __inline int32x4_t
 __attribute__ ((__always_inline__, __gnu_inline__, __artificial__))
-__arm_vbicq_n_s32 (int32x4_t __a, const int32_t __imm)
+__arm_vbicq_n_s32 (int32x4_t __a, const int __imm)
 {
   return __builtin_mve_vbicq_n_sv4si (__a, __imm);
 }
@@ -23182,7 +23182,7 @@ __arm_vorrq (uint16x8_t __a, const int __imm)
 
 __extension__ extern __inline uint16x8_t
 __attribute__ ((__always_inline__, __gnu_inline__, __artificial__))
-__arm_vbicq (uint16x8_t __a, const uint16_t __imm)
+__arm_vbicq (uint16x8_t __a, const int __imm)
 {
  return __arm_vbicq_n_u16 (__a, __imm);
 }
@@ -23294,7 +23294,7 @@ __arm_vorrq (int16x8_t __a, const int __imm)
 
 __extension__ extern __inline int16x8_t
 __attribute__ ((__always_inline__, __gnu_inline__, __artificial__))
-__arm_vbicq (int16x8_t __a, const int16_t __imm)
+__arm_vbicq (int16x8_t __a, const int __imm)
 {
  return __arm_vbicq_n_s16 (__a, __imm);
 }
@@ -23385,7 +23385,7 @@ __arm_vorrq (uint32x4_t __a, const int __imm)
 
 __extension__ extern __inline uint32x4_t
 __attribute__ ((__always_inline__, __gnu_inline__, __artificial__))
-__arm_vbicq (uint32x4_t __a, const uint32_t __imm)
+__arm_vbicq (uint32x4_t __a, const int __imm)
 {
  return __arm_vbicq_n_u32 (__a, __imm);
 }
@@ -23497,7 +23497,7 @@ __arm_vorrq (int32x4_t __a, const int __imm)
 
 __extension__ extern __inline int32x4_t
 __attribute__ ((__always_inline__, __gnu_inline__, __artificial__))
-__arm_vbicq (int32x4_t __a, const int32_t __imm)
+__arm_vbicq (int32x4_t __a, const int __imm)
 {
  return __arm_vbicq_n_s32 (__a, __imm);
 }
@@ -35963,10 +35963,10 @@ extern void *__ARM_undef;
 #define __arm_vbicq(p0,p1) ({ __typeof(p0) __p0 = (p0); \
   __typeof(p1) __p1 = (p1); \
   _Generic( (int (*)[__ARM_mve_typeid

Re: [WIKI] Replace delta with C-Vise (and C-Reduce)

2020-05-28 Thread Martin Jambor
Hi,

On Thu, May 28 2020, Martin Liška wrote:
> Hello.
>
> I've spent quite some time working on a super-parallel reduction tool
> and I would like to promote it ;) Moreover, the delta website is down, so
> references to it should be replaced: [1].
>
> Here is updated wording for the following wiki page:
> https://gcc.gnu.org/wiki/A_guide_to_testcase_reduction
>
> For all delta lovers who don't like C-Vise (or C-Reduce), I recommend
> the cvise-delta tool. It works the same way as delta, but in a super-parallel way.
>
> The tool is available on openSUSE and Gentoo Linux. Thanks to Matthias,
> the tool will soon be available on Ubuntu and Debian. The Red Hat port
> is a work in progress.
>
> Thoughts?

I don't think you need to seek approval to edit wiki pages and putting
c-vise instructions at the top of that page is definitely the right
thing to do.

On the other hand, I would not remove the delta and multidelta sections
but rather move them to the bottom of the page.  The instructions may
still be useful on various ancient and non-Linux systems.

Thanks,

Martin


> Martin
>
> [1] http://delta.tigris.org/

[PATCH] remove obsolete code from SLP invariant costing

2020-05-28 Thread Richard Biener
This removes handling of !SLP_TREE_VECTYPE from invariant costing.
The single caller guards against this case already.

2020-05-28  Richard Biener  

* tree-vect-slp.c (vect_prologue_cost_for_slp): Remove
case for !SLP_TREE_VECTYPE.
(vect_slp_analyze_node_operations): Adjust.
---
 gcc/tree-vect-slp.c | 10 ++
 1 file changed, 2 insertions(+), 8 deletions(-)

diff --git a/gcc/tree-vect-slp.c b/gcc/tree-vect-slp.c
index aa95c0a7f75..5976e91cf62 100644
--- a/gcc/tree-vect-slp.c
+++ b/gcc/tree-vect-slp.c
@@ -2739,19 +2739,13 @@ vect_slp_convert_to_external (vec_info *vinfo, slp_tree node,
by NODE.  */
 
 static void
-vect_prologue_cost_for_slp (vec_info *vinfo,
-                            slp_tree node,
+vect_prologue_cost_for_slp (slp_tree node,
                             stmt_vector_for_cost *cost_vec)
 {
   /* Without looking at the actual initializer a vector of
      constants can be implemented as load from the constant pool.
      When all elements are the same we can use a splat.  */
   tree vectype = SLP_TREE_VECTYPE (node);
-  /* ???  Ideally we'd want all invariant nodes to have a vectype.  */
-  if (!vectype)
-    vectype = get_vectype_for_scalar_type (vinfo,
-                                           TREE_TYPE (SLP_TREE_SCALAR_OPS
-                                                         (node)[0]), node);
   unsigned group_size = SLP_TREE_SCALAR_OPS (node).length ();
   unsigned num_vects_to_check;
   unsigned HOST_WIDE_INT const_nunits;
@@ -2911,7 +2905,7 @@ vect_slp_analyze_node_operations (vec_info *vinfo, slp_tree node,
           SLP_TREE_NUMBER_OF_VEC_STMTS (child)
             = vect_get_num_vectors (vf * group_size, vector_type);
           /* And cost them.  */
-          vect_prologue_cost_for_slp (vinfo, child, cost_vec);
+          vect_prologue_cost_for_slp (child, cost_vec);
         }
 
   /* If this node can't be vectorized, try pruning the tree here rather
-- 
2.26.2


Re: [PATCH] gcc-changelog: enhance handling of renamings

2020-05-28 Thread Jakub Jelinek via Gcc-patches
On Thu, May 28, 2020 at 11:09:41AM +0200, Martin Liška wrote:
> On 5/28/20 11:05 AM, Pierre-Marie de Rodat wrote:
> > Thanks! The updated patch is attached.
> 
> The patch is fine, please install it.

I'd like to mention that for file renames, perhaps it is acceptable not to
have a ChangeLog entry for a pure file rename (do we still require one after
Pierre-Marie's patch?), but if it is a rename plus some changes, the committer
should still describe the changes there, which is something the script
can't do for them.

Jakub



Re: [WIKI] Replace delta with C-Vise (and C-Reduce)

2020-05-28 Thread Tobias Burnus

Hi Martin,

On 5/28/20 1:17 PM, Martin Jambor wrote:
> On Thu, May 28 2020, Martin Liška wrote:
>> Hello.
>>
>> I've spent quite some time working on a super-parallel reduction tool
>> and I would like to promote it ;) Moreover, the delta website is down, so
>> references to it should be replaced: [1].

It is not completely clear to me whether C-Vise also works with Fortran;
if not, could you add a note early in the page pointing Fortran users
to the delta section? If C-Vise does work with Fortran, I still concur
with Martin's suggestion:

> On the other hand, I would not remove the delta and multidelta sections
> but rather move them to the bottom of the page.  The instructions may
> still be useful on various ancient and non-Linux systems.

Tobias
-
Mentor Graphics (Deutschland) GmbH, Arnulfstraße 201, 80634 München / Germany
Registergericht München HRB 106955, Geschäftsführer: Thomas Heurung, Alexander Walter


Re: [PATCH] gcc-changelog: enhance handling of renamings

2020-05-28 Thread Martin Liška

On 5/28/20 1:20 PM, Jakub Jelinek wrote:
> On Thu, May 28, 2020 at 11:09:41AM +0200, Martin Liška wrote:
>> On 5/28/20 11:05 AM, Pierre-Marie de Rodat wrote:
>>> Thanks! The updated patch is attached.
>>
>> The patch is fine, please install it.
>
> I'd like to mention that for file renames, perhaps it is acceptable not to
> have a ChangeLog entry for a pure file rename (do we still require one after
> Pierre-Marie's patch?), but if it is a rename plus some changes, the committer
> should still describe the changes there, which is something the script
> can't do for them.

That will likely depend on the similarity index (a pure rename is 100% similar).
But I'm not sure GitPython can provide such information. It seems like a nit
to me; renaming a file is quite a rare operation.

Martin

>
> 	Jakub




[PATCH 0/4] IVOPTs consider step cost for different forms when unrolling

2020-05-28 Thread Kewen.Lin via Gcc-patches
Hi,

This is a repost; you can refer to the original series
via https://gcc.gnu.org/pipermail/gcc-patches/2020-January/538360.html.

As we discussed in the thread
https://gcc.gnu.org/ml/gcc-patches/2020-01/msg00196.html
Original: https://gcc.gnu.org/ml/gcc-patches/2020-01/msg00104.html,
I'm working on teaching IVOPTs to consider D-form group access during unrolling.
The difference between D-form and other forms during unrolling is that we can put
the stride into the displacement field to avoid an additional step increment, e.g.:

With X-form (uf step increments):
  ...
  LD A = baseA, X
  LD B = baseB, X
  ST C = baseC, X
  X = X + stride
  LD A = baseA, X
  LD B = baseB, X
  ST C = baseC, X
  X = X + stride
  LD A = baseA, X
  LD B = baseB, X
  ST C = baseC, X
  X = X + stride
  ...

With D-form (one step increment for each base):
  ...
  LD A = baseA, OFF
  LD B = baseB, OFF
  ST C = baseC, OFF
  LD A = baseA, OFF+stride
  LD B = baseB, OFF+stride
  ST C = baseC, OFF+stride
  LD A = baseA, OFF+2*stride
  LD B = baseB, OFF+2*stride
  ST C = baseC, OFF+2*stride
  ...
  baseA += stride * uf
  baseB += stride * uf
  baseC += stride * uf

If the loop gets unrolled 8 times, D-form needs 3 step updates (one per base)
versus 8 step updates with X-form.  We only need to check that the stride
meets the D-form displacement field requirement, since if OFF doesn't, we can
construct baseA' as baseA + OFF.
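A rough C rendering of the two shapes above may help. This is only an illustration under stated assumptions: the computation c[i] = a[i] + b[i], an unroll factor of 4, n a multiple of 4, and made-up function names. A compiler is free to pick other addressing for either version.

```c
#include <stddef.h>

/* X-form style: one shared index, bumped after every copy,
   i.e. uf (here 4) step updates per unrolled iteration.  */
static void
add_xform (int *c, const int *a, const int *b, size_t n)
{
  size_t x = 0;
  while (x < n)
    {
      c[x] = a[x] + b[x]; x++;
      c[x] = a[x] + b[x]; x++;
      c[x] = a[x] + b[x]; x++;
      c[x] = a[x] + b[x]; x++;
    }
}

/* D-form style: constant displacements 0..3 from each base; each of
   the three bases is bumped once per unrolled iteration.  */
static void
add_dform (int *c, const int *a, const int *b, size_t n)
{
  const int *end = a + n;
  while (a < end)
    {
      c[0] = a[0] + b[0];
      c[1] = a[1] + b[1];
      c[2] = a[2] + b[2];
      c[3] = a[3] + b[3];
      a += 4; b += 4; c += 4;
    }
}
```

Both compute the same result; the difference is only in how many induction-variable updates each unrolled iteration needs (4 versus 3 here).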

This patch set consists of four parts:
 
  [PATCH 1/4] unroll: Add middle-end unroll factor estimation

 Add unroll factor estimation in the middle-end.  It is mainly based on the
 current RTL unroll factor determination in function decide_unrolling and
 its callees.  As Richi suggested, we could probably force the unroll factor
 with this and avoid duplicate unroll factor calculation, but I think that
 needs more benchmarking work and should be handled separately.

  [PATCH 2/4] param: Introduce one param to control unroll factor 

 As Richard and Segher suggested, I used addr_offset_valid_p for the
 addressing mode check rather than a target hook.  As Richard suggested,
 it introduces a parameter to control this IVOPTs consideration and
 further tweaking [3/4] on top of the unroll factor estimation [1/4].
 
  [PATCH 3/4] ivopts: Consider cost_step on different forms during unrolling

 Teach IVOPTs to mark an IV cand as reg_offset_p if it is derived from
 an address IV type group in which the whole group is valid to use
 reg_offset mode.  Then scale up the IV cand step cost by (uf - 1) for IV
 cands that are not reg_offset_p, where uf is the estimated unroll factor [1/4].
 
  [PATCH 4/4] rs6000: P9 D-form test cases

 Add some test cases, mainly copied from Kelvin's patch.  Segher has
 approved this one, provided the whole series is fine.


Many thanks to Richard and Segher on previous version reviews.

Bootstrapped and regress tested on powerpc64le-linux-gnu.

Any comments are highly appreciated!  Thanks in advance!


BR,
Kewen

---

 gcc/cfgloop.h  |   3 ++
 gcc/config/i386/i386-options.c |   6 +++
 gcc/config/s390/s390.c |   6 +++
 gcc/doc/invoke.texi|   9 +
 gcc/params.opt |   4 ++
 gcc/tree-ssa-loop-ivopts.c | 100 ++-
 gcc/tree-ssa-loop-manip.c  | 253 ++
 gcc/tree-ssa-loop-manip.h  |   3 +-
 gcc/tree-ssa-loop.c|  33 
 gcc/tree-ssa-loop.h|   2 +
 10 files changed, 416 insertions(+), 3 deletions(-)



[PATCH 1/4] unroll: Add middle-end unroll factor estimation

2020-05-28 Thread Kewen.Lin via Gcc-patches

gcc/ChangeLog

2020-MM-DD  Kewen Lin  

* cfgloop.h (struct loop): New field estimated_unroll.
* tree-ssa-loop-manip.c (decide_unroll_const_iter): New function.
(decide_unroll_runtime_iter): Likewise.
(decide_unroll_stupid): Likewise.
(estimate_unroll_factor): Likewise.
* tree-ssa-loop-manip.h (estimate_unroll_factor): New declaration.
* tree-ssa-loop.c (tree_average_num_loop_insns): New function.
* tree-ssa-loop.h (tree_average_num_loop_insns): New declaration.


---
 gcc/cfgloop.h |   3 +
 gcc/tree-ssa-loop-manip.c | 253 ++
 gcc/tree-ssa-loop-manip.h |   3 +-
 gcc/tree-ssa-loop.c   |  33 ++
 gcc/tree-ssa-loop.h   |   2 +
 5 files changed, 292 insertions(+), 2 deletions(-)

diff --git a/gcc/cfgloop.h b/gcc/cfgloop.h
index 11378ca..c5bcca7 100644
--- a/gcc/cfgloop.h
+++ b/gcc/cfgloop.h
@@ -232,6 +232,9 @@ public:
  Other values means unroll with the given unrolling factor.  */
   unsigned short unroll;
 
+  /* Like unroll field above, but it's estimated in middle-end.  */
+  unsigned short estimated_unroll;
+
   /* If this loop was inlined the main clique of the callee which does
  not need remapping when copying the loop body.  */
   unsigned short owned_clique;
diff --git a/gcc/tree-ssa-loop-manip.c b/gcc/tree-ssa-loop-manip.c
index 120b35b..8a5a1a9 100644
--- a/gcc/tree-ssa-loop-manip.c
+++ b/gcc/tree-ssa-loop-manip.c
@@ -21,6 +21,7 @@ along with GCC; see the file COPYING3.  If not see
 #include "system.h"
 #include "coretypes.h"
 #include "backend.h"
+#include "target.h"
 #include "tree.h"
 #include "gimple.h"
 #include "cfghooks.h"
@@ -42,6 +43,7 @@ along with GCC; see the file COPYING3.  If not see
 #include "cfgloop.h"
 #include "tree-scalar-evolution.h"
 #include "tree-inline.h"
+#include "wide-int.h"
 
 /* All bitmaps for rewriting into loop-closed SSA go on this obstack,
so that we can free them all at once.  */
@@ -1592,3 +1594,254 @@ canonicalize_loop_ivs (class loop *loop, tree *nit, bool bump_in_latch)
 
   return var_before;
 }
+
+/* Try to determine estimated unroll factor for given LOOP with constant number
+   of iterations, mainly refer to decide_unroll_constant_iterations.
+- NITER_DESC holds number of iteration description if it isn't NULL.
+- NUNROLL holds a unroll factor value computed with instruction numbers.
+- ITER holds estimated or likely max loop iterations.
+   Return true if it succeeds, also update estimated_unroll.  */
+
+static bool
+decide_unroll_const_iter (class loop *loop, const tree_niter_desc *niter_desc,
+                          unsigned nunroll, const widest_int *iter)
+{
+  /* Skip big loops.  */
+  if (nunroll <= 1)
+    return false;
+
+  gcc_assert (niter_desc && niter_desc->assumptions);
+
+  /* Check number of iterations is constant, return false if no.  */
+  if ((niter_desc->may_be_zero && !integer_zerop (niter_desc->may_be_zero))
+      || !tree_fits_uhwi_p (niter_desc->niter))
+    return false;
+
+  unsigned HOST_WIDE_INT const_niter = tree_to_uhwi (niter_desc->niter);
+
+  /* If unroll factor is set explicitly, use it as estimated_unroll.  */
+  if (loop->unroll > 0 && loop->unroll < USHRT_MAX)
+    {
+      /* It should have been peeled instead.  */
+      if (const_niter == 0 || (unsigned) loop->unroll > const_niter - 1)
+        loop->estimated_unroll = 1;
+      else
+        loop->estimated_unroll = loop->unroll;
+      return true;
+    }
+
+  /* Check whether the loop rolls enough to consider.  */
+  if (const_niter < 2 * nunroll || wi::ltu_p (*iter, 2 * nunroll))
+    return false;
+
+  /* Success; now compute number of iterations to unroll.  */
+  unsigned best_unroll = 0, n_copies = 0;
+  unsigned best_copies = 2 * nunroll + 10;
+  unsigned i = 2 * nunroll + 2;
+
+  if (i > const_niter - 2)
+    i = const_niter - 2;
+
+  for (; i >= nunroll - 1; i--)
+    {
+      unsigned exit_mod = const_niter % (i + 1);
+
+      if (!empty_block_p (loop->latch))
+        n_copies = exit_mod + i + 1;
+      else if (exit_mod != i)
+        n_copies = exit_mod + i + 2;
+      else
+        n_copies = i + 1;
+
+      if (n_copies < best_copies)
+        {
+          best_copies = n_copies;
+          best_unroll = i;
+        }
+    }
+
+  loop->estimated_unroll = best_unroll + 1;
+  return true;
+}
+
+/* Try to determine estimated unroll factor for given LOOP with countable but
+   non-constant number of iterations, mainly refer to
+   decide_unroll_runtime_iterations.
+- NITER_DESC holds number of iteration description if it isn't NULL.
+- NUNROLL_IN holds a unroll factor value computed with instruction numbers.
+- ITER holds estimated or likely max loop iterations.
+   Return true if it succeeds, also update estimated_unroll.  */
+
+static bool
+decide_unroll_runtime_iter (class loop *loop, const tree_niter_desc *niter_desc,
+                            unsigned nunroll_in, const widest_int *iter)
+{

[committed] aarch64: Fix segfault in aarch64_expand_epilogue [PR95361]

2020-05-28 Thread Richard Sandiford
The stack frame for the function in the testcase consisted of two
SVE save slots.  Both saves had been shrink-wrapped, but for different
blocks, meaning that the stack allocation and deallocation were
separate from the saves themselves.  Before emitting the deallocation,
we tried to attach a REG_CFA_DEF_CFA note to the preceding instruction,
to redefine the CFA in terms of the stack pointer.  But in this case
there was no preceding instruction.

This in practice only happens for SVE because:

(a) We don't try to shrink-wrap wb_candidate* registers even when
we've decided to treat them as normal saves and restores.
I have a fix for that.

(b) Even with (a) fixed, we're (almost?) guaranteed to emit
a stack tie for frames that are 64k or larger, so we end
up hanging the REG_CFA_DEF_CFA note on that instead.

We should only need to redefine the CFA if it was previously
defined in terms of the frame pointer.  In other cases the CFA
should already be defined in terms of the stack pointer,
so redefining it is unnecessary but usually harmless.

Tested on aarch64-linux-gnu (with and without SVE) and aarch64_be-elf.
Pushed to master.  I'm planning to backport this to branches without
the assert.

Richard


2020-05-28  Richard Sandiford  

gcc/
PR testsuite/95361
* config/aarch64/aarch64.c (aarch64_expand_epilogue): Assert that
we have at least some CFI operations when using a frame pointer.
Only redefine the CFA if we have CFI operations.

gcc/testsuite/
PR testsuite/95361
* gcc.target/aarch64/sve/pr95361.c: New test.
---
 gcc/config/aarch64/aarch64.c   |  6 +-
 gcc/testsuite/gcc.target/aarch64/sve/pr95361.c | 11 +++
 2 files changed, 16 insertions(+), 1 deletion(-)
 create mode 100644 gcc/testsuite/gcc.target/aarch64/sve/pr95361.c

diff --git a/gcc/config/aarch64/aarch64.c b/gcc/config/aarch64/aarch64.c
index 78db0a56323..cffb945d7dd 100644
--- a/gcc/config/aarch64/aarch64.c
+++ b/gcc/config/aarch64/aarch64.c
@@ -8180,7 +8180,11 @@ aarch64_expand_epilogue (bool for_sibcall)
   if (callee_adjust != 0)
     aarch64_pop_regs (reg1, reg2, callee_adjust, &cfi_ops);
 
-  if (callee_adjust != 0 || maybe_gt (initial_adjust, 65536))
+  /* If we have no register restore information, the CFA must have been
+     defined in terms of the stack pointer since the end of the prologue.  */
+  gcc_assert (cfi_ops || !frame_pointer_needed);
+
+  if (cfi_ops && (callee_adjust != 0 || maybe_gt (initial_adjust, 65536)))
     {
       /* Emit delayed restores and set the CFA to be SP + initial_adjust.  */
       insn = get_last_insn ();
diff --git a/gcc/testsuite/gcc.target/aarch64/sve/pr95361.c b/gcc/testsuite/gcc.target/aarch64/sve/pr95361.c
new file mode 100644
index 000..ce70d0d5cdf
--- /dev/null
+++ b/gcc/testsuite/gcc.target/aarch64/sve/pr95361.c
@@ -0,0 +1,11 @@
+/* { dg-options "-O2" } */
+
+__SVInt8_t
+f (__SVInt8_t x, int y)
+{
+  if (y == 1)
+    asm volatile ("" ::: "z8");
+  if (y == 2)
+    asm volatile ("" ::: "z9");
+  return x;
+}


[committed] aarch64: Fix missed shrink-wrapping opportunity

2020-05-28 Thread Richard Sandiford
wb_candidate1 and wb_candidate2 exist for two overlapping cases:
when we use an STR or STP with writeback to allocate the frame,
and when we set up a frame chain record (either using writeback
allocation or not).

However, aarch64_layout_frame was leaving these fields with
legitimate register numbers even if we decided to do neither
of those things.  This prevented those registers from being
shrink-wrapped, even though we were otherwise treating them
as normal saves and restores.

The case this patch handles isn't the common case, so it might
not be worth going out of our way to optimise it.  But I think
the patch actually makes the output of aarch64_layout_frame more
consistent.
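A hypothetical, cut-down model of the two overlapping cases described above (writeback allocation, and a frame record set up without writeback) and of the invalidation this patch adds is sketched below.  toy_frame, classify_wb and TOY_INVALID_REGNUM are illustrative names only; the real fields live in struct aarch64_frame and the real sentinel is GCC's INVALID_REGNUM.

```c
#include <assert.h>
#include <stdbool.h>

/* Illustrative stand-in for GCC's INVALID_REGNUM (an all-ones unsigned).  */
#define TOY_INVALID_REGNUM (~0u)

enum wb_mode { WB_NONE, WB_WRITEBACK_ALLOC, WB_SEPARATE_FRAME_RECORD };

/* Hypothetical subset of struct aarch64_frame.  */
struct toy_frame
{
  bool emit_frame_chain;
  long callee_adjust;
  unsigned wb_candidate1, wb_candidate2;
};

/* Which mechanism, if any, associates register saves with the
   initial stack allocation.  */
static enum wb_mode
classify_wb (const struct toy_frame *f)
{
  if (f->callee_adjust != 0)
    return WB_WRITEBACK_ALLOC;        /* STR/STP with writeback.  */
  if (f->emit_frame_chain)
    return WB_SEPARATE_FRAME_RECORD;  /* Separate STP for the frame record.  */
  return WB_NONE;
}

/* If neither mechanism is used, drop the candidates so that those
   registers can be shrink-wrapped like any other save.  */
static void
finish_layout (struct toy_frame *f)
{
  if (classify_wb (f) == WB_NONE)
    {
      f->wb_candidate1 = TOY_INVALID_REGNUM;
      f->wb_candidate2 = TOY_INVALID_REGNUM;
    }
}
```

The WB_NONE case is exactly "!emit_frame_chain && callee_adjust == 0", the condition the patch tests in aarch64_layout_frame.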

Tested on aarch64-linux-gnu (with and without SVE) and aarch64_be-elf.
Pushed.

Richard


2020-05-28  Richard Sandiford  

gcc/
* config/aarch64/aarch64.h (aarch64_frame): Add a comment above
wb_candidate1 and wb_candidate2.
* config/aarch64/aarch64.c (aarch64_layout_frame): Invalidate
wb_candidate1 and wb_candidate2 if we decided not to use them.

gcc/testsuite/
* gcc.target/aarch64/shrink_wrap_1.c: New test.
---
 gcc/config/aarch64/aarch64.c  |  8 
 gcc/config/aarch64/aarch64.h  | 17 +
 .../gcc.target/aarch64/shrink_wrap_1.c| 19 +++
 3 files changed, 44 insertions(+)
 create mode 100644 gcc/testsuite/gcc.target/aarch64/shrink_wrap_1.c

diff --git a/gcc/config/aarch64/aarch64.h b/gcc/config/aarch64/aarch64.h
index 24767c747ba..2be52fd4d73 100644
--- a/gcc/config/aarch64/aarch64.h
+++ b/gcc/config/aarch64/aarch64.h
@@ -842,6 +842,23 @@ struct GTY (()) aarch64_frame
   /* Store FP,LR and setup a frame pointer.  */
   bool emit_frame_chain;
 
+  /* In each frame, we can associate up to two register saves with the
+ initial stack allocation.  This happens in one of two ways:
+
+ (1) Using an STR or STP with writeback to perform the initial
+stack allocation.  When EMIT_FRAME_CHAIN, the registers will
+be those needed to create a frame chain.
+
+Indicated by CALLEE_ADJUST != 0.
+
+ (2) Using a separate STP to set up the frame record, after the
+initial stack allocation but before setting up the frame pointer.
+This is used if the offset is too large to use writeback.
+
+Indicated by CALLEE_ADJUST == 0 && EMIT_FRAME_CHAIN.
+
+ These fields indicate which registers we've decided to handle using
+ (1) or (2), or INVALID_REGNUM if none.  */
   unsigned wb_candidate1;
   unsigned wb_candidate2;
 
diff --git a/gcc/config/aarch64/aarch64.c b/gcc/config/aarch64/aarch64.c
index cffb945d7dd..7feff77adf6 100644
--- a/gcc/config/aarch64/aarch64.c
+++ b/gcc/config/aarch64/aarch64.c
@@ -6749,6 +6749,14 @@ aarch64_layout_frame (void)
+ frame.sve_callee_adjust
+ frame.final_adjust, frame.frame_size));
 
+  if (!frame.emit_frame_chain && frame.callee_adjust == 0)
+{
+  /* We've decided not to associate any register saves with the initial
+stack allocation.  */
+  frame.wb_candidate1 = INVALID_REGNUM;
+  frame.wb_candidate2 = INVALID_REGNUM;
+}
+
   frame.laid_out = true;
 }
 
diff --git a/gcc/testsuite/gcc.target/aarch64/shrink_wrap_1.c b/gcc/testsuite/gcc.target/aarch64/shrink_wrap_1.c
new file mode 100644
index 000..ab7cd74ec3b
--- /dev/null
+++ b/gcc/testsuite/gcc.target/aarch64/shrink_wrap_1.c
@@ -0,0 +1,19 @@
+/* { dg-do compile { target { aarch64*-*-* } } } */
+/* { dg-options "-O2" } */
+/* { dg-final { check-function-bodies "**" "" } } */
+
+/*
+** foo:
+** ...
+** str d8, \[sp\]
+** ldr d8, \[sp\]
+** ...
+*/
+void
+foo (int x)
+{
+  int tmp[0x1000];
+  asm volatile ("" : "=m" (tmp));
+  if (x == 1)
+    asm volatile ("" ::: "d8");
+}


[PATCH 2/4] param: Introduce one param to control ivopts reg-offset consideration

2020-05-28 Thread Kewen.Lin via Gcc-patches

gcc/ChangeLog

2020-MM-DD  Kewen Lin  

* doc/invoke.texi (iv-consider-reg-offset-for-unroll): Document new 
option.
* params.opt (iv-consider-reg-offset-for-unroll): New.
* config/s390/s390.c (s390_option_override_internal): Disable parameter
iv-consider-reg-offset-for-unroll by default.
* config/i386/i386-options.c (ix86_option_override_internal): Likewise.


diff --git a/gcc/config/i386/i386-options.c b/gcc/config/i386/i386-options.c
index e0be493..41c99b3 100644
--- a/gcc/config/i386/i386-options.c
+++ b/gcc/config/i386/i386-options.c
@@ -2902,6 +2902,12 @@ ix86_option_override_internal (bool main_args_p,
   if (ix86_indirect_branch != indirect_branch_keep)
 SET_OPTION_IF_UNSET (opts, opts_set, flag_jump_tables, 0);
 
+  /* Disable this for now till loop_unroll_adjust supports gimple level checks,
+ to avoid possible ICE.  */
+  if (opts->x_optimize >= 1)
+SET_OPTION_IF_UNSET (opts, opts_set,
+param_iv_consider_reg_offset_for_unroll, 0);
+
   return true;
 }
 
diff --git a/gcc/config/s390/s390.c b/gcc/config/s390/s390.c
index ebba670..ae4c2bd 100644
--- a/gcc/config/s390/s390.c
+++ b/gcc/config/s390/s390.c
@@ -15318,6 +15318,12 @@ s390_option_override_internal (struct gcc_options *opts,
  not the case when the code runs before the prolog. */
   if (opts->x_flag_fentry && !TARGET_64BIT)
 error ("%<-mfentry%> is supported only for 64-bit CPUs");
+
+  /* Disable this for now till loop_unroll_adjust supports gimple level checks,
+ to avoid possible ICE.  */
+  if (opts->x_optimize >= 1)
+SET_OPTION_IF_UNSET (opts, opts_set,
+param_iv_consider_reg_offset_for_unroll, 0);
 }
 
 static void
diff --git a/gcc/doc/invoke.texi b/gcc/doc/invoke.texi
index fa98e2f..502031c 100644
--- a/gcc/doc/invoke.texi
+++ b/gcc/doc/invoke.texi
@@ -12220,6 +12220,15 @@ If the number of candidates in the set is smaller than this value,
 always try to remove unnecessary ivs from the set
 when adding a new one.
 
+@item iv-consider-reg-offset-for-unroll
+When RTL unrolling is performed on a loop, the duplicated loop iterations introduce
+appropriate induction variable step update expressions.  But if an induction
+variable is derived from an address object, it is profitable to fill its required
+offset updates into the memory access expressions if the target's memory
+accesses support register offset mode and the resulting offset is in the
+valid range.  The induction variable optimizations consider this information
+to produce better unrolled code.  It requires unroll factor estimation in the middle end.
+
 @item avg-loop-niter
 Average number of iterations of a loop.
 
diff --git a/gcc/params.opt b/gcc/params.opt
index 8e4217d..31424cf 100644
--- a/gcc/params.opt
+++ b/gcc/params.opt
@@ -270,6 +270,10 @@ Bound on number of candidates below that all candidates are considered in iv opt
 Common Joined UInteger Var(param_iv_max_considered_uses) Init(250) Param Optimization
 Bound on number of iv uses in loop optimized in iv optimizations.
 
+-param=iv-consider-reg-offset-for-unroll=
+Common Joined UInteger Var(param_iv_consider_reg_offset_for_unroll) Init(1) Optimization IntegerRange(0, 1) Param
+Whether iv optimizations mark register offset valid groups and consider their derived iv candidates would be profitable with estimated unroll factor consideration.
+
 -param=jump-table-max-growth-ratio-for-size=
 Common Joined UInteger Var(param_jump_table_max_growth_ratio_for_size) Init(300) Param Optimization
 The maximum code size growth ratio when expanding into a jump table (in percent).  The parameter is used when optimizing for size.
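To make the iv-consider-reg-offset-for-unroll description above more concrete, here is a source-level sketch of the two code shapes it contrasts for an unroll factor of 4.  This is only an illustration, assuming the array length is a multiple of 4; the real decision is made by ivopts on GIMPLE and its effect shows up in the unrolled RTL, not in C source.

```c
#include <assert.h>
#include <stddef.h>

/* Unrolled by 4, each copy carrying its own pointer update.
   Assumes N is a multiple of 4 (no remainder loop).  */
static long
sum_bumped (const long *p, size_t n)
{
  long s = 0;
  for (size_t i = 0; i + 4 <= n; i += 4)
    {
      s += *p;
      p++;
      s += *p;
      p++;
      s += *p;
      p++;
      s += *p;
      p++;
    }
  return s;
}

/* Same unrolled loop, with the per-copy updates folded into
   base + offset addressing, leaving one pointer update per iteration.  */
static long
sum_offsets (const long *p, size_t n)
{
  long s = 0;
  for (size_t i = 0; i + 4 <= n; i += 4)
    {
      s += p[0];
      s += p[1];
      s += p[2];
      s += p[3];
      p += 4;
    }
  return s;
}
```

The second shape only pays off if offsets up to (unroll factor - 1) * step fit the target's register-offset addressing range, which is why the parameter depends on an estimated unroll factor.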


[PATCH 0/4] Make IPA-SRA not depend on tree-dce and related fixes

2020-05-28 Thread Martin Jambor
Hi,

this patch series addresses PR 93385 which exposed that the new
IPA-SRA depends on tree-dce and can leave misbehaving instructions
behind if the user switched it off.  It is a series because I also
tried to produce the best debug info possible in such cases while
avoiding unnecessary copying of instructions during IPA-SRA clone
materialization and it seemed best to tackle different problems
independently.  We also might not want to backport all of the patches
to GCC 10.

The first patch actually fixes similar but different PR 95113 where
IPA-SRA should switch itself off because of non-call exceptions.

The second patch fixes non-debug instructions in PR 93385; debug
instructions are simply reset.

The third patch fixes up debug statements except for those when a
removed value is passed to another function.

The fourth patch attempts to produce useful debug info even in this
situation and fixes PR debug/95343.  However it requires relaxing some
gimple IL rules during IPA passes (and only during IPA passes).

After/if there is a consensus on the above, I would like to proceed
with a little bit of clean-up in the messy parts of tree-inline.c
which are directly involved in this - particularly the debug info
generation.

Finally, in Bugzilla Jakub asked me to make IPA-SRA consider any
arithmetic operation on an otherwise unnecessary argument a use if the
user used the option -fno-tree-dce.  I have not done that yet, mostly
because I realized we already differentiate between -fno-dce and
-fno-tree-dce, so neither option is really meant for users, and GCC
hackers using one of them probably want to disable a specific pass and
not a little bit of another.  Also, making the testcase fail without
-fno-tree-dce requires using the following exact combination of
options:

  -fno-dce -fdisable-tree-cddce1 -fdisable-tree-cdce
  -fdisable-tree-cddce3 -fdisable-tree-dce2 -fdisable-tree-dce3
  -fdisable-tree-dce4 -fdisable-tree-dce7

And that does not seem very maintainable in the testcase.
Nevertheless, if the consensus is that -fno-tree-dce should also limit
IPA-SRA in this regard, the patch is trivial (Jakub wrote it in
comment 23).

All patches were individually bootstrapped and tested on x86_64-linux
and the whole bundle also passes LTO bootstrap and profiled-LTO
bootstrap on the same platform.  Bootstrap on aarch64 and i686 is
underway.

I am looking forward to your comments, questions and suggestions,

Martin


Martin Jambor (4):
  ipa-sra: Do not remove statements necessary because of non-call EH (PR 95113)
  ipa-sra: Introduce a mini-DCE to tree-inline.c (PR 93385)
  ipa-sra: Improve debug info for removed parameters (PR 93385)
  ipa-sra: Fix debug info for removed args passed to other functions (PR 93385, 95343)

 gcc/ipa-param-manipulation.c | 406 +++
 gcc/ipa-param-manipulation.h |  18 +
 gcc/ipa-sra.c|  28 +-
 gcc/testsuite/gcc.dg/guality/ipa-sra-1.c |  45 +++
 gcc/testsuite/gcc.dg/guality/pr95343.c   |  45 +++
 gcc/testsuite/gcc.dg/ipa/ipa-sra-23.c|  24 ++
 gcc/testsuite/gcc.dg/ipa/pr93385.c   |  27 ++
 gcc/testsuite/gcc.dg/ipa/pr95113.c   |  33 ++
 gcc/tree-cfg.c   |  14 +-
 gcc/tree-eh.c|  10 +
 gcc/tree-eh.h|   1 +
 gcc/tree-inline.c|  51 ++-
 gcc/tree-ssa-dce.c   |   4 +-
 gcc/tree-ssa-operands.c  |  16 +-
 14 files changed, 635 insertions(+), 87 deletions(-)
 create mode 100644 gcc/testsuite/gcc.dg/guality/ipa-sra-1.c
 create mode 100644 gcc/testsuite/gcc.dg/guality/pr95343.c
 create mode 100644 gcc/testsuite/gcc.dg/ipa/ipa-sra-23.c
 create mode 100644 gcc/testsuite/gcc.dg/ipa/pr93385.c
 create mode 100644 gcc/testsuite/gcc.dg/ipa/pr95113.c

-- 
2.26.2


[PATCH 3/4] ivopts: Consider cost_step on different forms during unrolling

2020-05-28 Thread Kewen.Lin via Gcc-patches

gcc/ChangeLog

2020-MM-DD  Kewen Lin  

* tree-ssa-loop-ivopts.c (struct iv_group): New field reg_offset_p.
(struct iv_cand): New field reg_offset_p.
(struct ivopts_data): New field consider_reg_offset_for_unroll_p.
(dump_groups): Dump group with reg_offset_p.
(record_group): Initialize reg_offset_p.
(mark_reg_offset_groups): New function.
(find_interesting_uses): Call mark_reg_offset_groups.
(add_candidate_1): Update reg_offset_p if derived from reg_offset_p 
group.
(set_group_iv_cost): Scale up group cost with estimate_unroll_factor if
consider_reg_offset_for_unroll_p.
(determine_iv_cost): Increase step cost with estimate_unroll_factor if
consider_reg_offset_for_unroll_p.
(tree_ssa_iv_optimize_loop): Call estimate_unroll_factor, update
consider_reg_offset_for_unroll_p.


diff --git a/gcc/tree-ssa-loop-ivopts.c b/gcc/tree-ssa-loop-ivopts.c
index 1d2697ae1ba..1b7e4621f37 100644
--- a/gcc/tree-ssa-loop-ivopts.c
+++ b/gcc/tree-ssa-loop-ivopts.c
@@ -432,6 +432,8 @@ struct iv_group
   struct iv_cand *selected;
   /* To indicate this is a doloop use group.  */
   bool doloop_p;
+  /* To indicate this group is reg_offset valid.  */
+  bool reg_offset_p;
   /* Uses in the group.  */
   vec vuses;
 };
@@ -473,6 +475,7 @@ struct iv_cand
   struct iv *orig_iv;  /* The original iv if this cand is added from biv with
   smaller type.  */
   bool doloop_p;   /* Whether this is a doloop candidate.  */
+  bool reg_offset_p;/* Derived from one reg_offset valid group.  */
 };
 
 /* Hashtable entry for common candidate derived from iv uses.  */
@@ -653,6 +656,10 @@ struct ivopts_data
 
   /* Whether the loop has doloop comparison use.  */
   bool doloop_use_p;
+
+  /* Whether need to consider register offset addressing mode for the loop with
+ upcoming unrolling by estimated unroll factor.  */
+  bool consider_reg_offset_for_unroll_p;
 };
 
 /* An assignment of iv candidates to uses.  */
@@ -840,6 +847,11 @@ dump_groups (FILE *file, struct ivopts_data *data)
  gcc_assert (group->type == USE_COMPARE);
  fprintf (file, "  Type:\tCOMPARE\n");
}
+  if (group->reg_offset_p)
+   {
+ gcc_assert (address_p (group->type));
+ fprintf (file, "  reg_offset_p: true\n");
+   }
   for (j = 0; j < group->vuses.length (); j++)
dump_use (file, group->vuses[j]);
 }
@@ -1582,6 +1594,7 @@ record_group (struct ivopts_data *data, enum use_type type)
   group->related_cands = BITMAP_ALLOC (NULL);
   group->vuses.create (1);
   group->doloop_p = false;
+  group->reg_offset_p = false;
 
   data->vgroups.safe_push (group);
   return group;
@@ -2731,6 +2744,60 @@ split_address_groups (struct ivopts_data *data)
 }
 }
 
+/* Go through all address type groups, check and mark reg_offset addressing mode
+   valid groups.  */
+
+static void
+mark_reg_offset_groups (struct ivopts_data *data)
+{
+  class loop *loop = data->current_loop;
+  gcc_assert (data->current_loop->estimated_unroll > 1);
+  bool any_reg_offset_p = false;
+
+  for (unsigned i = 0; i < data->vgroups.length (); i++)
+{
+  struct iv_group *group = data->vgroups[i];
+  if (address_p (group->type))
+   {
+ struct iv_use *head_use = group->vuses[0];
+ if (!tree_fits_poly_int64_p (head_use->iv->step))
+   continue;
+
+ bool found = true;
+ poly_int64 step = tree_to_poly_int64 (head_use->iv->step);
+ /* Max extra offset to fill for head of group.  */
+ poly_int64 max_increase = (loop->estimated_unroll - 1) * step;
+ /* Check whether this increment still valid.  */
+ if (!addr_offset_valid_p (head_use, max_increase))
+   found = false;
+
+ unsigned group_size = group->vuses.length ();
+ /* Check the whole group further.  */
+ if (group_size > 1)
+   {
+ /* Only need to check the last one in the group, both the head and
+   the last is valid, the others should be fine.  */
+ struct iv_use *last_use = group->vuses[group_size - 1];
+ poly_int64 max_delta
+   = last_use->addr_offset - head_use->addr_offset;
+ poly_int64 max_offset = max_delta + max_increase;
+ if (maybe_ne (max_delta, 0)
+ && !addr_offset_valid_p (head_use, max_offset))
+   found = false;
+   }
+
+ if (found)
+   {
+ group->reg_offset_p = true;
+ any_reg_offset_p = true;
+   }
+   }
+}
+
+  if (!any_reg_offset_p)
+data->consider_reg_offset_for_unroll_p = false;
+}
+
 /* Finds uses of the induction variables that are interesting.  */
 
 static void
@@ -2762,6 +2829,9 @@ find_interesting_uses (struct ivopts_data *data)
 
   split_address_groups (data);
 
+  if (data->consider_reg_offset_for_unroll_p)
+mark_
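The range check performed by mark_reg_offset_groups can be modelled in isolation as below.  This is a hypothetical sketch: offset_in_range_p stands in for addr_offset_valid_p and assumes the target accepts reg + imm addresses with a signed 16-bit displacement, which is only one possibility (in GCC the backend answers that question).  Offsets are plain longs rather than poly_int64.

```c
#include <assert.h>
#include <stdbool.h>

/* Hypothetical stand-in for addr_offset_valid_p: assume a signed
   16-bit displacement is accepted.  */
static bool
offset_in_range_p (long offset)
{
  return offset >= -32768 && offset <= 32767;
}

/* Mirror of the per-group test: the head use must still be addressable
   after (unroll - 1) steps, and so must the last use of the group.  */
static bool
group_reg_offset_valid_p (long step, unsigned estimated_unroll,
                          long head_offset, long last_offset)
{
  long max_increase = (long) (estimated_unroll - 1) * step;
  if (!offset_in_range_p (max_increase))
    return false;

  long max_delta = last_offset - head_offset;
  if (max_delta != 0 && !offset_in_range_p (max_delta + max_increase))
    return false;

  return true;
}
```

As in the patch, only the head and the last use are checked, on the reasoning that uses in between fall inside that range.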

[PATCH 1/4] ipa-sra: Do not remove statements necessary because of non-call EH (PR 95113)

2020-05-28 Thread Martin Jambor
PR 95113 revealed that when reasoning about which parameters are dead,
IPA-SRA does not perform the same check related to non-call exceptions
as tree DCE.  It most certainly should and so this patch moves the
condition used in tree-ssa-dce.c into a separate predicate (in
tree-eh.c) and uses it from both places.
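A simplified model of the check that this patch factors out is sketched below.  It assumes, as in the tree-ssa-dce.c code being moved, that a statement is pinned when the function can throw non-call exceptions, dead exceptions may not be deleted, and the statement itself could throw.  toy_fun and the boolean parameters are illustrative stand-ins for struct function and stmt_could_throw_p, and only assignments (not PHIs) are modelled.

```c
#include <assert.h>
#include <stdbool.h>

/* Hypothetical subset of struct function.  */
struct toy_fun
{
  bool can_throw_non_call_exceptions;  /* -fnon-call-exceptions */
  bool can_delete_dead_exceptions;     /* -fdelete-dead-exceptions */
};

static bool
toy_unremovable_because_of_non_call_eh_p (const struct toy_fun *fun,
                                          bool stmt_could_throw)
{
  return fun->can_throw_non_call_exceptions
         && !fun->can_delete_dead_exceptions
         && stmt_could_throw;
}

/* IPA-SRA may only treat an assignment as simple arithmetic that dies
   with the parameter if the statement is not pinned by the check above.  */
static bool
toy_simple_arith_use_p (const struct toy_fun *fun, bool has_volatile_ops,
                        bool stmt_could_throw)
{
  return !toy_unremovable_because_of_non_call_eh_p (fun, stmt_could_throw)
         && !has_volatile_ops;
}
```

In the new pr95113.c test, the division x / y may trap, so under -fnon-call-exceptions it must not be treated as a removable use.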

gcc/ChangeLog:

2020-05-27  Martin Jambor  

PR ipa/95113
* gcc/tree-ssa-dce.c (mark_stmt_if_obviously_necessary): Move non-call
exceptions check to...
* gcc/tree-eh.c (stmt_unremovable_because_of_non_call_eh_p): ...this
new function.
* gcc/tree-eh.h (stmt_unremovable_because_of_non_call_eh_p): Declare it.
* gcc/ipa-sra.c (isra_track_scalar_value_uses): Use it.  New parameter
fun.

gcc/testsuite/ChangeLog:

2020-05-27  Martin Jambor  

PR ipa/95113
* gcc.dg/ipa/pr95113.c: New test.
---
 gcc/ipa-sra.c  | 28 +
 gcc/testsuite/gcc.dg/ipa/pr95113.c | 33 ++
 gcc/tree-eh.c  | 10 +
 gcc/tree-eh.h  |  1 +
 gcc/tree-ssa-dce.c |  4 +---
 5 files changed, 60 insertions(+), 16 deletions(-)
 create mode 100644 gcc/testsuite/gcc.dg/ipa/pr95113.c

diff --git a/gcc/ipa-sra.c b/gcc/ipa-sra.c
index 7c922e40a4e..c81e8869e7a 100644
--- a/gcc/ipa-sra.c
+++ b/gcc/ipa-sra.c
@@ -795,17 +795,17 @@ get_single_param_flow_source (const isra_param_flow *param_flow)
 }
 
 /* Inspect all uses of NAME and simple arithmetic calculations involving NAME
-   in NODE and return a negative number if any of them is used for something
-   else than either an actual call argument, simple arithmetic operation or
-   debug statement.  If there are no such uses, return the number of actual
-   arguments that this parameter eventually feeds to (or zero if there is none).
-   For any such parameter, mark PARM_NUM as one of its sources.  ANALYZED is a
-   bitmap that tracks which SSA names we have already started
-   investigating.  */
+   in FUN represented with NODE and return a negative number if any of them is
+   used for something else than either an actual call argument, simple
+   arithmetic operation or debug statement.  If there are no such uses, return
+   the number of actual arguments that this parameter eventually feeds to (or
+   zero if there is none).  For any such parameter, mark PARM_NUM as one of its
+   sources.  ANALYZED is a bitmap that tracks which SSA names we have already
+   started investigating.  */
 
 static int
-isra_track_scalar_value_uses (cgraph_node *node, tree name, int parm_num,
- bitmap analyzed)
+isra_track_scalar_value_uses (function *fun, cgraph_node *node, tree name,
+ int parm_num, bitmap analyzed)
 {
   int res = 0;
   imm_use_iterator imm_iter;
@@ -859,8 +859,9 @@ isra_track_scalar_value_uses (cgraph_node *node, tree name, int parm_num,
}
  res += all_uses;
}
-  else if ((is_gimple_assign (stmt) && !gimple_has_volatile_ops (stmt))
-  || gimple_code (stmt) == GIMPLE_PHI)
+  else if (!stmt_unremovable_because_of_non_call_eh_p (fun, stmt)
+  && ((is_gimple_assign (stmt) && !gimple_has_volatile_ops (stmt))
+  || gimple_code (stmt) == GIMPLE_PHI))
{
  tree lhs;
  if (gimple_code (stmt) == GIMPLE_PHI)
@@ -876,7 +877,7 @@ isra_track_scalar_value_uses (cgraph_node *node, tree name, int parm_num,
  gcc_assert (!gimple_vdef (stmt));
  if (bitmap_set_bit (analyzed, SSA_NAME_VERSION (lhs)))
{
- int tmp = isra_track_scalar_value_uses (node, lhs, parm_num,
+ int tmp = isra_track_scalar_value_uses (fun, node, lhs, parm_num,
  analyzed);
  if (tmp < 0)
{
@@ -927,7 +928,8 @@ isra_track_scalar_param_local_uses (function *fun, cgraph_node *node, tree parm,
 return true;
 
   bitmap analyzed = BITMAP_ALLOC (NULL);
-  int call_uses = isra_track_scalar_value_uses (node, name, parm_num, analyzed);
+  int call_uses = isra_track_scalar_value_uses (fun, node, name, parm_num,
+   analyzed);
   BITMAP_FREE (analyzed);
   if (call_uses < 0)
 return true;
diff --git a/gcc/testsuite/gcc.dg/ipa/pr95113.c b/gcc/testsuite/gcc.dg/ipa/pr95113.c
new file mode 100644
index 000..a8f8c901ebe
--- /dev/null
+++ b/gcc/testsuite/gcc.dg/ipa/pr95113.c
@@ -0,0 +1,33 @@
+/* { dg-do run } */
+/* { dg-options "-O2 -fexceptions -fnon-call-exceptions" } */
+/* { dg-require-effective-target exceptions } */
+
+int a, b;
+
+static inline long int
+foo (long int x, int y)
+{
+  if (y == 0)
+    return 0;
+
+  if (x == -1 && y == -1)
+    return 0;
+
+  return x / y;
+}
+
+static inline int
+bar (int *p)
+{
+  int c = foo (a, 1) + *p;
+  return b;
+}
+
+int
+main ()
+{
+  int d = 0;
+  b = foo (1,

[PATCH 3/4] ipa-sra: Improve debug info for removed parameters (PR 93385)

2020-05-28 Thread Martin Jambor
Whereas the previous patch fixed issues with code left behind after
IPA-SRA removed a parameter but only reset all affected debug bind
statements, this one updates them with expressions which can allow the
debugger to print the removed value - see the added test-case.

Even though I originally did not want to create DEBUG_EXPR_DECLs for
intermediate values, I ended up doing so, because otherwise the code
started creating statements like

   # DEBUG __aD.198693 => &MEM[(const struct _Alloc_nodeD.171110 *)D#195]._M_tD.184726->_M_implD.171154

which is not only a bit scary, but gimple-fold also ICEs on it.
Therefore I decided they are probably necessary and kept them.

The patch simply notes each removed SSA name present in a debug
statement and then works from it backwards, looking if it can
reconstruct the expression it represents (which can fail if a
non-degenerate PHI node is in the way).  If it can, it populates two
hash maps with those expressions so that 1) removed assignments are
replaced with a debug bind defining a new intermediate DEBUG_EXPR_DECL
and 2) existing debug binds that refer to SSA names that are being
removed now refer to the corresponding DEBUG_EXPR_DECLs.

If a removed parameter is passed to another function, the debugging
information still cannot describe its value there - see the xfailed
test in the testcase.  This is addressed in the following patch,
which removes the xfail.
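The backward walk described above can be pictured on a toy IR in which every removed SSA name is defined either by the parameter itself, by the addition of a constant, or by a two-argument PHI.  The sketch is hypothetical: it builds a string where the real patch creates DEBUG_EXPR_DECLs and debug binds, but it shows where the reconstruction has to give up.

```c
#include <assert.h>
#include <stdbool.h>
#include <stdio.h>
#include <string.h>

/* Hypothetical toy IR: an SSA name is an index into an array of defs.  */
enum def_kind { DEF_PARAM, DEF_PLUS_CONST, DEF_PHI };

struct toy_def
{
  enum def_kind kind;
  int operand;      /* DEF_PLUS_CONST: SSA name being added to.  */
  long constant;    /* DEF_PLUS_CONST: the addend.  */
  int phi_args[2];  /* DEF_PHI: incoming SSA names.  */
};

/* Write into BUF an expression for SSA name N in terms of the removed
   parameter "p".  Return false if a non-degenerate PHI is in the way
   (or the buffer is too small).  */
static bool
reconstruct (const struct toy_def *defs, int n, char *buf, size_t len)
{
  const struct toy_def *d = &defs[n];
  switch (d->kind)
    {
    case DEF_PARAM:
      return snprintf (buf, len, "p") < (int) len;
    case DEF_PLUS_CONST:
      {
        char inner[128];
        if (!reconstruct (defs, d->operand, inner, sizeof inner))
          return false;
        return snprintf (buf, len, "(%s + %ld)", inner, d->constant)
               < (int) len;
      }
    case DEF_PHI:
      /* Only a degenerate PHI (all arguments equal) can be looked through.  */
      if (d->phi_args[0] != d->phi_args[1])
        return false;
      return reconstruct (defs, d->phi_args[0], buf, len);
    }
  return false;
}
```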
---
 gcc/ipa-param-manipulation.c | 271 ++-
 gcc/ipa-param-manipulation.h |  12 +-
 gcc/testsuite/gcc.dg/guality/ipa-sra-1.c |  45 
 gcc/tree-inline.c|  45 ++--
 4 files changed, 302 insertions(+), 71 deletions(-)
 create mode 100644 gcc/testsuite/gcc.dg/guality/ipa-sra-1.c

diff --git a/gcc/ipa-param-manipulation.c b/gcc/ipa-param-manipulation.c
index 1f47f3a4268..0a265e26c4f 100644
--- a/gcc/ipa-param-manipulation.c
+++ b/gcc/ipa-param-manipulation.c
@@ -40,6 +40,8 @@ along with GCC; see the file COPYING3.  If not see
 #include "builtins.h"
 #include "tree-ssa.h"
 #include "tree-inline.h"
+#include "tree-phinodes.h"
+#include "cfgexpand.h"
 
 
 /* Actual prefixes of different newly synthetized parameters.  Keep in sync
@@ -979,7 +981,8 @@ phi_arg_will_live_p (gphi *phi, bitmap blocks_to_copy, tree arg)
any replacement or splitting.  */
 
 void
-ipa_param_body_adjustments::mark_dead_statements (tree dead_param)
+ipa_param_body_adjustments::mark_dead_statements (tree dead_param,
+ vec *debugstack)
 {
   if (!is_gimple_reg (dead_param))
 return;
@@ -988,6 +991,7 @@ ipa_param_body_adjustments::mark_dead_statements (tree dead_param)
 return;
 
   auto_vec stack;
+  hash_set used_in_debug;
   m_dead_ssas.add (parm_ddef);
   stack.safe_push (parm_ddef);
   while (!stack.is_empty ())
@@ -1010,6 +1014,11 @@ ipa_param_body_adjustments::mark_dead_statements (tree dead_param)
{
  m_dead_stmts.add (stmt);
  gcc_assert (gimple_debug_bind_p (stmt));
+ if (!used_in_debug.contains (t))
+   {
+ used_in_debug.add (t);
+ debugstack->safe_push (t);
+   }
}
  else if (gimple_code (stmt) == GIMPLE_PHI)
{
@@ -1044,6 +1053,155 @@ ipa_param_body_adjustments::mark_dead_statements (tree dead_param)
gcc_unreachable ();
}
 }
+
+  if (!MAY_HAVE_DEBUG_STMTS)
+{
+  gcc_assert (debugstack->is_empty ());
+  return;
+}
+
+  tree dp_ddecl = make_node (DEBUG_EXPR_DECL);
+  DECL_ARTIFICIAL (dp_ddecl) = 1;
+  TREE_TYPE (dp_ddecl) = TREE_TYPE (dead_param);
+  SET_DECL_MODE (dp_ddecl, DECL_MODE (dead_param));
+  m_dead_ssa_debug_equiv.put (parm_ddef, dp_ddecl);
+}
+
+/* Callback to walk_tree.  If REMAP is an SSA_NAME that is present in hash_map
+   passed in DATA, replace it with unshared version of what it was mapped
+   to.  */
+
+static tree
+replace_with_mapped_expr (tree *remap, int *walk_subtrees, void *data)
+{
+  if (TYPE_P (*remap))
+{
+  *walk_subtrees = 0;
+  return 0;
+}
+  if (TREE_CODE (*remap) != SSA_NAME)
+return 0;
+
+  *walk_subtrees = 0;
+
+  hash_map *equivs = (hash_map *) data;
+  if (tree *p = equivs->get (*remap))
+*remap = unshare_expr (*p);
+  return 0;
+}
+
+/* Replace all occurrences of SSAs in m_dead_ssa_debug_equiv in t with what they
+   are mapped to.  */
+
+void
+ipa_param_body_adjustments::remap_with_debug_expressions (tree *t)
+{
+  /* If *t is an SSA_NAME which should have its debug statements reset, it is
+ mapped to NULL in the hash_map.  We need to handle that case separately or
+ otherwise the walker would segfault.  No expression that is more
+ complicated than that can have its operands mapped to NULL.  */
+  if (TREE_CODE (*t) == SSA_NAME)
+{
+  if (tree *p = m_dead_ssa_debug_equiv.get (*t))
+   *t = *p;
+}
+  else
+walk_tree (t, replace_with_ma

[PATCH 4/4] ipa-sra: Fix debug info for removed args passed to other functions (PR 93385, 95343)

2020-05-28 Thread Martin Jambor
This patch arguably finishes what I was asked to do in bugzilla PR
93385 and remaps *all* occurrences of SSA names discovered to be dead
in the process of parameter removal during clone materialization
either to error_mark_node or to DEBUG_EXPR_DECL that represents the
removed value - including those that appeared as arguments in call
statements.

However, these occurrences of error_mark_nodes and DEBUG_EXPR_DECLs
are not removed straight away because arguments are removed only as a part
of call redirection - mostly following the plan for the callee - which
is not part of clone materialization.  Just for the record, this is
not something introduced by IPA-SRA, this has always been that way
since the beginning of IPA infrastructure and for good reasons.

As a consequence, error_mark_node and DEBUG_EXPR_DECL must be allowed
in places where they are normally not, which this patch does but only
during IPA passes. Afterwards, they are again banned.  I am confident
that if some bug allowed one of these to survive until late tree
passes, the compiler would ICE very quickly and so it is a safe thing
to do, even if not exactly consistent.  It is perhaps safer than the
temporary decls that the second patch introduced.
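The rule that placeholders may appear as call arguments only while IPA passes run can be summarised by a small predicate.  This is a hypothetical model with illustrative names; how the patch actually detects that it is within simple IPA passes (tree-ssa-operands.c now includes tree-pass.h for that purpose) is not reproduced here.

```c
#include <assert.h>
#include <stdbool.h>

/* Illustrative classification of a call argument.  */
enum toy_arg_kind
{
  ARG_SSA_NAME,
  ARG_CONSTANT,
  ARG_ERROR_MARK,       /* Removed argument with no debug equivalent.  */
  ARG_DEBUG_EXPR_DECL   /* Removed argument with a debug equivalent.  */
};

/* Placeholders for removed arguments are tolerated only until call
   redirection drops them, i.e. only during IPA passes.  */
static bool
toy_call_arg_valid_p (enum toy_arg_kind kind, bool during_ipa_passes)
{
  switch (kind)
    {
    case ARG_SSA_NAME:
    case ARG_CONSTANT:
      return true;
    case ARG_ERROR_MARK:
    case ARG_DEBUG_EXPR_DECL:
      return during_ipa_passes;
    }
  return false;
}
```

If a placeholder survived into late tree passes, a verifier following this rule would reject it, which is the "ICE very quickly" safety net the paragraph above relies on.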

Temporarily replacing arguments with associated DEBUG_EXPR_DECL allows
us to produce debug info allowing the debugger to print values of
unused parameters which were removed not only in their function but also
in the caller.  At least sometimes :-) See the removed xfail in
testcase/gcc.dg/guality/ipa-sra-1.c.

I have attempted to achieve the same thing by associating the
DEBUG_EXPR_DECL with the artificial temporary and keeping track of this
relationship in on-the-side summaries, constantly remapping both when
a clone of a clone gets its body; it is doable but quite ugly.
Injecting the DEBUG_EXPR_DECL directly into the IL works out of the
box.

Oh, and this patch also fixes PR debug/95343 - a case where call
redirection can produce bad debug info.  A non-controversial fix is in
the first bugzilla comment but it needs all the other bits of this
patch to really allow the debugger to print the value of the removed
parameter and not "value optimized out."  But perhaps that is what we
want to backport?

gcc/Changelog:

2020-05-26  Martin Jambor  

PR ipa/93385
PR debug/95343
* ipa-param-manipulation.c (transitive_split_p): Handle
error_mark_node.
(ipa_param_adjustments::modify_call): Use index_map if available.
Directly use removed argument if it is a DEBUG_EXPR_DECL for
corresponding debug info, assert all are removed.
(ipa_param_body_adjustments::get_removed_call_arg_placeholder): Return
corresponding DEBUG_EXPR_DECL if there is one, otherwise return
error_mark_node.
* tree-ssa-operands.c: Include tree-pass.h.
(operands_scanner::get_expr_operands): Allow DEBUG_EXPR_DECL and
error_mark_node in call arguments during simple IPA passes.
* tree-cfg.c (verify_gimple_call): Likewise.

gcc/testsuite/Changelog:

2020-05-26  Martin Jambor  

PR ipa/93385
PR debug/95343
* gcc.dg/guality/pr95343.c: New test.
* gcc.dg/guality/ipa-sra-1.c (bar): Remove xfail.
---
 gcc/ipa-param-manipulation.c | 31 
 gcc/testsuite/gcc.dg/guality/ipa-sra-1.c |  2 +-
 gcc/testsuite/gcc.dg/guality/pr95343.c   | 45 
 gcc/tree-cfg.c   | 14 ++--
 gcc/tree-ssa-operands.c  | 16 +++--
 5 files changed, 95 insertions(+), 13 deletions(-)
 create mode 100644 gcc/testsuite/gcc.dg/guality/pr95343.c

diff --git a/gcc/ipa-param-manipulation.c b/gcc/ipa-param-manipulation.c
index 0a265e26c4f..b43c1323ef1 100644
--- a/gcc/ipa-param-manipulation.c
+++ b/gcc/ipa-param-manipulation.c
@@ -466,6 +466,8 @@ transitive_split_p (vec *performed_splits,
tree expr, unsigned *sm_idx_p, unsigned *unit_offset_p)
 {
   tree base;
+  if (expr == error_mark_node)
+return false;
   if (!isra_get_ref_base_and_offset (expr, &base, unit_offset_p))
 return false;
 
@@ -617,6 +619,8 @@ ipa_param_adjustments::modify_call (gcall *stmt,
index = index_map[apm->base_index];
 
  tree arg = gimple_call_arg (stmt, index);
+ gcc_assert (arg != error_mark_node
+ && TREE_CODE (arg) != DEBUG_EXPR_DECL);
 
  vargs.quick_push (arg);
  kept[index] = true;
@@ -789,7 +793,14 @@ ipa_param_adjustments::modify_call (gcall *stmt,
  if (!is_gimple_reg (old_parm) || kept[i])
continue;
  tree origin = DECL_ORIGIN (old_parm);
- tree arg = gimple_call_arg (stmt, i);
+ int index;
+ if (transitive_remapping)
+   index = index_map[i];
+ else
+   index = i;
+ tree arg = gimple_call_arg (stmt, index);
+ if (arg == error_mark_node)
+   continue;
 
  if (!useless_type_conversion_p (TR

[PATCH 2/4] ipa-sra: Introduce a mini-DCE to tree-inline.c (PR 93385)

2020-05-28 Thread Martin Jambor
PR 93385 reveals that if the user explicitly disables DCE, IPA-SRA
can leave behind statements which are useless because their results
are eventually not used but can have problematic side effects,
especially since their inputs are now bogus because useless parameters
were removed.

This patch fixes the problem by doing a similar def-use walk when
materializing clones, marking which statements should not be copied
and which SSA_NAMEs will be removed by call redirections and now need
to be replaced with anything valid.  Default-definition SSA_NAMEs of
parameters which are removed and all SSA_NAMEs derived from them (in a
phi node or a simple assignment statement) are then remapped to
error_mark_node - a sure way to spot it if any is left in place.
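The def-use walk described above amounts to a worklist propagation from the parameter's default definition.  The sketch below shows that on a toy graph; it is hypothetical and assumes every use is a PHI or a simple assignment, whereas the real mark_dead_statements also handles debug binds and call arguments separately and consults phi_arg_will_live_p.

```c
#include <assert.h>
#include <stdbool.h>

#define MAX_NAMES 16

/* Hypothetical def-use graph: uses[i] lists the SSA names whose defining
   statement (a PHI or simple assignment) uses name i.  */
struct toy_graph
{
  int n_names;
  int n_uses[MAX_NAMES];
  int uses[MAX_NAMES][MAX_NAMES];
};

/* Mark PARAM_DEFAULT_DEF and everything derived from it as dead.  */
static void
mark_dead_names (const struct toy_graph *g, int param_default_def,
                 bool dead[MAX_NAMES])
{
  int stack[MAX_NAMES];
  int sp = 0;

  for (int i = 0; i < g->n_names; i++)
    dead[i] = false;

  dead[param_default_def] = true;
  stack[sp++] = param_default_def;
  while (sp > 0)
    {
      int name = stack[--sp];
      for (int j = 0; j < g->n_uses[name]; j++)
        {
          int lhs = g->uses[name][j];
          /* Each name is pushed at most once, so the stack cannot
             overflow MAX_NAMES entries.  */
          if (!dead[lhs])
            {
              dead[lhs] = true;
              stack[sp++] = lhs;
            }
        }
    }
}
```

Every name marked dead here would then be remapped to error_mark_node (or, with the later patches in the series, to a debug equivalent).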

There is one exception to the above rule: if such SSA_NAMEs appear as
an argument of a call, they need to be removed by call redirection and
not as part of clone materialization.  So to have something valid
there until that time, this patch pulls dummy declarations out of
thin air.  If you do not like that, see patch number 4 in the series,
which changes this, but probably in a controversial way.

This patch only resets debug statements using the removed SSA_NAMEs.
The first follow-up patch adjusts debug statements in the current
function to still try to make the removed values available in the debugger
in the current function and the subsequent one also in other functions
where they are passed.

gcc/ChangeLog:

2020-05-14  Martin Jambor  

PR ipa/93385
* ipa-param-manipulation.h (class ipa_param_body_adjustments): New
members m_dead_stmts, m_dead_ssas, mark_dead_statements and
get_removed_call_arg_placeholder.
* ipa-param-manipulation.c (phi_arg_will_live_p): New function.
(ipa_param_body_adjustments::mark_dead_statements): New method.
(ipa_param_body_adjustments::common_initialization): Call it.
(ipa_param_body_adjustments::ipa_param_body_adjustments): Initialize
new members.
(ipa_param_body_adjustments::get_removed_call_arg_placeholder): New.
(ipa_param_body_adjustments::modify_call_stmt): Replace dead SSAs
with dummy decls.
* tree-inline.c (remap_gimple_stmt): Do not copy dead statements,
reset dead debug statements.
(copy_phis_for_bb): Do not copy dead PHI nodes.

gcc/testsuite/ChangeLog:

2020-05-14  Martin Jambor  

PR ipa/93385
* gcc.dg/ipa/pr93385.c: New test.
* gcc.dg/ipa/ipa-sra-23.c: Likewise.
---
 gcc/ipa-param-manipulation.c  | 142 --
 gcc/ipa-param-manipulation.h  |   8 ++
 gcc/testsuite/gcc.dg/ipa/ipa-sra-23.c |  24 +
 gcc/testsuite/gcc.dg/ipa/pr93385.c|  27 +
 gcc/tree-inline.c |  18 +++-
 5 files changed, 205 insertions(+), 14 deletions(-)
 create mode 100644 gcc/testsuite/gcc.dg/ipa/ipa-sra-23.c
 create mode 100644 gcc/testsuite/gcc.dg/ipa/pr93385.c

diff --git a/gcc/ipa-param-manipulation.c b/gcc/ipa-param-manipulation.c
index 978916057f0..1f47f3a4268 100644
--- a/gcc/ipa-param-manipulation.c
+++ b/gcc/ipa-param-manipulation.c
@@ -953,6 +953,99 @@ ipa_param_body_adjustments::carry_over_param (tree t)
   return new_parm;
 }
 
+/* Return true if BLOCKS_TO_COPY is NULL or if PHI has an argument ARG in
+   position that corresponds to an edge that is coming from a block that has
+   the corresponding bit set in BLOCKS_TO_COPY.  */
+
+static bool
+phi_arg_will_live_p (gphi *phi, bitmap blocks_to_copy, tree arg)
+{
+  bool arg_will_survive = false;
+  if (!blocks_to_copy)
+    arg_will_survive = true;
+  else
+    for (unsigned i = 0; i < gimple_phi_num_args (phi); i++)
+      if (gimple_phi_arg_def (phi, i) == arg
+          && bitmap_bit_p (blocks_to_copy,
+                           gimple_phi_arg_edge (phi, i)->src->index))
+        {
+          arg_will_survive = true;
+          break;
+        }
+  return arg_will_survive;
+}
+
+/* Populate m_dead_stmts given that DEAD_PARAM is going to be removed without
+   any replacement or splitting.  */
+
+void
+ipa_param_body_adjustments::mark_dead_statements (tree dead_param)
+{
+  if (!is_gimple_reg (dead_param))
+    return;
+  tree parm_ddef = ssa_default_def (m_id->src_cfun, dead_param);
+  if (!parm_ddef || has_zero_uses (parm_ddef))
+    return;
+
+  auto_vec stack;
+  m_dead_ssas.add (parm_ddef);
+  stack.safe_push (parm_ddef);
+  while (!stack.is_empty ())
+    {
+      tree t = stack.pop ();
+
+      imm_use_iterator imm_iter;
+      gimple *stmt;
+
+      insert_decl_map (m_id, t, error_mark_node);
+      FOR_EACH_IMM_USE_STMT (stmt, imm_iter, t)
+        {
+          if (is_gimple_call (stmt)
+              || (m_id->blocks_to_copy
+                  && !bitmap_bit_p (m_id->blocks_to_copy,
+                                    gimple_bb (stmt)->index)))
+            continue;
+
+          if (is_gimple_debug (stmt))
+            {
+              m_dead_stmts.add (stmt);
+              gcc_assert (

Re: [WIKI] Replace delta with C-Vise (and C-Reduce)

2020-05-28 Thread Martin Liška

On 5/28/20 1:17 PM, Martin Jambor wrote:

I don't think you need to seek approval to edit wiki pages and putting
c-vise instructions at the top of that page is definitely the right
thing to do.


All right.



On the other hand, I would not remove the delta and multidelta sections
but rather move them to the bottom of the page.  The instructions may
still be useful on various ancient and non-Linux systems.


I moved the section to Legacy tools and I've just updated the wiki page.

Thanks,
Martin


Re: [WIKI] Replace delta with C-Vise (and C-Reduce)

2020-05-28 Thread Martin Liška

On 5/28/20 1:26 PM, Tobias Burnus wrote:

It is not completely clear to me whether C-Vise also works with Fortran;


Yes, it works. Some C/C++-related passes are skipped, but it works.

Martin


[DOC] Mention C-Vise and C-Reduce instead of Delta.

2020-05-28 Thread Martin Liška

Hi.

I've just updated https://gcc.gnu.org/wiki/A_guide_to_testcase_reduction and
I'm changing the text of the link to that wiki page accordingly.

Martin

---
 htdocs/bugs/minimize.html | 2 +-
 1 file changed, 1 insertion(+), 1 deletion(-)

diff --git a/htdocs/bugs/minimize.html b/htdocs/bugs/minimize.html
index 6197169a..4503b73f 100644
--- a/htdocs/bugs/minimize.html
+++ b/htdocs/bugs/minimize.html
@@ -67,7 +67,7 @@ the bug. For this, either use the undo function of your editor, or use
 frequent backup files; alternatively, you can use #if 0
 and #endif. The automatic and recommended way to
 https://gcc.gnu.org/wiki/A_guide_to_testcase_reduction";>reduce
-a testcase is using the Delta tool.
+a testcase is using the C-Vise or the C-Reduce tool.
 
 If you have access to the original sources, it is better to start

 with the original sources and when those cannot be reduced further,
--
2.26.2



RE: [PATCH PR95254] aarch64: gcc generate inefficient code with fixed sve vector length

2020-05-28 Thread Yangfei (Felix)
Hi,

> -Original Message-
> From: Richard Sandiford [mailto:richard.sandif...@arm.com]
> Sent: Thursday, May 28, 2020 12:07 AM
> To: Yangfei (Felix) 
> Cc: gcc-patches@gcc.gnu.org
> Subject: Re: [PATCH PR95254] aarch64: gcc generate inefficient code with
> fixed sve vector length
> 

Snip...

> 
> Ah, OK.  But in that case, shouldn't we allow the change if the original
> unaligned MEM was also “slow”?
> 
> I guess there might be cases in which both modes are slow enough for the
> hook to return true for them, but one is worse than the other.
> But I don't think there's much we can do about that as things stand:
> changing the mode might move from a slow mode to a slower mode, but it
> might move in the other direction too.

Good point.

> > +2020-05-27  Felix Yang  
> > +   Richard Sandiford  
> 
> I appreciate the gesture, but I don't think it's appropriate to list me as an
> author.  I haven't written any of the code, I've just reviewed it. :-)

OK.

> > diff --git a/gcc/expr.c b/gcc/expr.c
> > index dfbeae71518..3035791c764 100644
> > --- a/gcc/expr.c
> > +++ b/gcc/expr.c
> > @@ -3814,6 +3814,69 @@ emit_move_insn (rtx x, rtx y)
> >gcc_assert (mode != BLKmode
> >   && (GET_MODE (y) == mode || GET_MODE (y) == VOIDmode));
> >
> > +  /* If we have a copy which looks like one of the following patterns:
> 
> s/which/that/ (I think)

OK.

> > +   (set (subreg:M1 (reg:M2 ...)) (subreg:M1 (reg:M2 ...)))
> > +   (set (subreg:M1 (reg:M2 ...)) (mem:M1 ADDR))
> > +   (set (mem:M1 ADDR) (subreg:M1 (reg:M2 ...)))
> > +   (set (subreg:M1 (reg:M2 ...)) (constant C))
> > + where mode M1 is equal in size to M2 and target hook can_change_mode_class
> > + (M1, M2, ALL_REGS) returns false, try to remove the subreg.  This avoids
> > + an implicit round trip through memory.  */
> 
> How about:
> 
>  where mode M1 is equal in size to M2, try to detect whether the
>  mode change involves an implicit round trip through memory.
>  If so, see if we can avoid that by removing the subregs and
>  doing the move in mode M2 instead.  */
> 
> > +  else if (x_inner != NULL_RTX
> > +  && MEM_P (y)
> > +  && ! targetm.can_change_mode_class (GET_MODE (x_inner),
> > +  mode, ALL_REGS)
> > +  /* Stop if the inner mode requires too much alignment.  */
> > +  && (! targetm.slow_unaligned_access (GET_MODE (x_inner),
> > +   MEM_ALIGN (y))
> > +  || MEM_ALIGN (y) >= GET_MODE_ALIGNMENT (GET_MODE (x_inner
> 
> It's better to check the alignment first, since it's cheaper.
> So taking the comment above into account, I think this ends up as:
> 
>  && (MEM_ALIGN (y) >= GET_MODE_ALIGNMENT (GET_MODE (x_inner))
>      || targetm.slow_unaligned_access (mode, MEM_ALIGN (y))
>      || !targetm.slow_unaligned_access (GET_MODE (x_inner), MEM_ALIGN (y)))
> 
> (Note: no space after "!", although the sources aren't as consistent about
> that as they could be.)

OK.

> TBH I think it would be good to avoid duplicating such a complicated condition
> in both directions, so at the risk of getting flamed, how about using a 
> lambda?
> 
>   auto candidate_mem_p = [&](machine_mode inner_mode, rtx mem) {
> return ...;
>   };
> 
> with ... containing everything after the MEM_P check?

Yes, this avoids duplicating code.
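The lambda suggestion can be illustrated with a self-contained model. Everything below is hypothetical: `ToyMode`, `strip_subreg_p` and the free function `slow_unaligned_access` stand in for GCC's `machine_mode`, the `emit_move_insn` logic and `targetm.slow_unaligned_access`, and `can_change_mode_class` is reduced to a boolean. What it shows is the shape Richard describes: one predicate, with the cheap alignment comparison first, shared by the load and store directions.

```cpp
#include <cassert>

// Hypothetical stand-in for a machine mode.
struct ToyMode {
  unsigned align;           // natural alignment in bits
  unsigned fast_min_align;  // below this alignment an access counts as slow
};

// Stand-in for targetm.slow_unaligned_access (mode, align).
static bool slow_unaligned_access(const ToyMode &m, unsigned align) {
  return align < m.fast_min_align;
}

// X_INNER / Y_INNER: mode inside a subreg destination / source, or nullptr
// when that operand is not a subreg.  MEM_ALIGN is the MEM's alignment.
bool strip_subreg_p(const ToyMode &mode, const ToyMode *x_inner, bool x_is_mem,
                    const ToyMode *y_inner, bool y_is_mem, unsigned mem_align,
                    bool can_change_mode_class) {
  // One predicate for both directions instead of two copies.
  auto candidate_mem_p = [&](const ToyMode &inner_mode, unsigned align) {
    return !can_change_mode_class
           && (align >= inner_mode.align                // cheapest test first
               || slow_unaligned_access(mode, align)    // original already slow
               || !slow_unaligned_access(inner_mode, align));
  };
  if (x_inner && y_is_mem)  // (set (subreg:M1 (reg:M2)) (mem:M1))
    return candidate_mem_p(*x_inner, mem_align);
  if (y_inner && x_is_mem)  // (set (mem:M1) (subreg:M1 (reg:M2)))
    return candidate_mem_p(*y_inner, mem_align);
  return false;
}
```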

> Looks good otherwise, thanks,

Thanks for reviewing this.
Attached please find the v5 patch.
Note: we also need to update the local variable "mode" once we catch one of
these cases; I see test failures without this change.

Bootstrapped and tested on aarch64-linux-gnu.
Also bootstrapped on x86_64-linux-gnu.  Regression testing shows two failing
tests on this platform:

1> FAIL: gcc.target/i386/avx512f-vcvtps2ph-2.c (test for excess errors)
2> FAIL: gcc.target/i386/pr67609.c scan-assembler movdqa

I have adjusted the second one in the v4 patch, but the first one looks
strange to me.  I see GCC emit invalid x86 vcvtps2ph instructions that
look like:

    vcvtps2ph   $0, %zmm0, -112(%rbp){%k1}
    vcvtps2ph   $0, %zmm0, -80(%rbp){%k1}{z}

This happens in the combine phase, where a combined insn looks like:

Trying 31 -> 33:
   31: r106:V16HI=vec_merge(unspec[r103:V16SF,0] 133,[frame:DI-0x60],r109:HI)
   33: [frame:DI-0x60]=r106:V16HI
      REG_DEAD r106:V16HI
Successfully matched this instruction:
(set (mem/j/c:V16HI (plus:DI (reg/f:DI 19 frame)
            (const_int -96 [0xffa0])) [2 res2.x+0 S32 A256])
    (vec_merge:V16HI (unspec:V16HI [
                (reg:V16SF 103)
                (const_int 0 [0])
            ] UNSPEC_VCVTPS2PH)
        (mem/j/c:V16HI (plus:DI (reg/f:DI 19 frame)
                (const_int -96 [0xffa0])) [2 res2.x+0 S32 A256])
        (reg:HI 109)))
allowing c

Re: [PATCH] extend cselim to check non-trapping for more references (PR tree-optimizaton/89430)

2020-05-28 Thread Richard Biener
On Wed, 27 May 2020, Hao Liu OS wrote:

> Hi all,
> 
> Previously, the fix for PR89430 was reverted by PR94734 due to a bug.
> The root cause is a missing non-trapping check with the dominating
> LOAD/STORE.
> 
> This patch extends the cselim non-trapping check to support ARRAY_REF
> and COMPONENT_REF (previously only MEM_REF was supported) by using
> get_inner_reference and hashing, following the comments from Jakub.
> 
> To support the cases in PR89430: if there is a dominating LOAD of a
> local variable whose address does not escape, the STORE cannot trap,
> because the local stack is always writable, so it can be optimized.
> 
> Review, please.
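The patch replaces an SSA-version key with a composite key (base, bitsize, bitpos, optional offset, writable). Any hash table keyed this way must keep the hash and equality functions consistent, including for the optional offset: if equality ignores the offset when it is absent, the hash must ignore it too. A hypothetical C++ model of that contract follows; strings stand in for trees, and `operand_equal_p` and `inchash::add_expr` are not modelled.

```cpp
#include <cassert>
#include <cstddef>
#include <cstdint>
#include <functional>
#include <string>
#include <unordered_set>

// Hypothetical model of the ref_to_bb key fields.
struct RefKey {
  std::string base;
  int64_t bitsize;
  int64_t bitpos;
  bool has_offset;     // false models offset == NULL_TREE
  std::string offset;  // only meaningful when has_offset
  bool writable;
};

struct RefKeyHash {
  std::size_t operator()(const RefKey &k) const {
    std::size_t h = std::hash<std::string>{}(k.base);
    auto mix = [&h](std::size_t v) {
      h ^= v + 0x9e3779b9 + (h << 6) + (h >> 2);
    };
    mix(std::hash<int64_t>{}(k.bitsize));
    mix(std::hash<int64_t>{}(k.bitpos));
    mix(std::hash<bool>{}(k.writable));
    // Hash the offset only when present, matching RefKeyEq below.
    if (k.has_offset)
      mix(std::hash<std::string>{}(k.offset));
    return h;
  }
};

struct RefKeyEq {
  bool operator()(const RefKey &a, const RefKey &b) const {
    if (a.base != b.base || a.bitsize != b.bitsize
        || a.bitpos != b.bitpos || a.writable != b.writable)
      return false;
    // A missing offset only equals another missing offset.
    if (!a.has_offset || !b.has_offset)
      return a.has_offset == b.has_offset;
    return a.offset == b.offset;
  }
};

using SeenRefs = std::unordered_set<RefKey, RefKeyHash, RefKeyEq>;
```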

How did you test the patch?  There's a ChangeLog missing as well as
a testcase or testcase adjustments in case we XFAILed the old ones.
It helps to post patches created by git format-patch.

First comments inline...

> 
> diff --git a/gcc/tree-ssa-phiopt.c b/gcc/tree-ssa-phiopt.c
> index b1e0dce93d8..3733780a0bc 100644
> --- a/gcc/tree-ssa-phiopt.c
> +++ b/gcc/tree-ssa-phiopt.c
> @@ -1986,26 +1986,31 @@ abs_replacement (basic_block cond_bb, basic_block middle_bb,
> 
> ??? We currently are very conservative and assume that a load might
> trap even if a store doesn't (write-only memory).  This probably is
> -   overly conservative.  */
> +   overly conservative.
> 
> -/* A hash-table of SSA_NAMEs, and in which basic block an MEM_REF
> -   through it was seen, which would constitute a no-trap region for
> -   same accesses.  */
> -struct name_to_bb
> +   We currently support a special case that for !TREE_ADDRESSABLE automatic
> +   variables, it could ignore whether something is a load or store because the
> +   local stack should be always writable.  */
> +
> +/* A hash-table of references (MEM_REF/ARRAY_REF/COMPONENT_REF), and in which
> +   basic block an *_REF through it was seen, which would constitute a
> +   no-trap region for same accesses.  */
> +struct ref_to_bb
>  {
> -  unsigned int ssa_name_ver;
> +  tree base;
> +  poly_int64 bitsize, bitpos;
> +  tree offset;
>unsigned int phase;
> -  bool store;
> -  HOST_WIDE_INT offset, size;
> +  bool writable;
>basic_block bb;
>  };
> 
>  /* Hashtable helpers.  */
> 
> -struct ssa_names_hasher : free_ptr_hash 
> +struct refs_hasher : free_ptr_hash
>  {
> -  static inline hashval_t hash (const name_to_bb *);
> -  static inline bool equal (const name_to_bb *, const name_to_bb *);
> +  static inline hashval_t hash (const ref_to_bb *);
> +  static inline bool equal (const ref_to_bb *, const ref_to_bb *);
>  };
> 
>  /* Used for quick clearing of the hash-table when we see calls.
> @@ -2015,28 +2020,44 @@ static unsigned int nt_call_phase;
>  /* The hash function.  */
> 
>  inline hashval_t
> -ssa_names_hasher::hash (const name_to_bb *n)
> +refs_hasher::hash (const ref_to_bb *n)
>  {
> -  return n->ssa_name_ver ^ (((hashval_t) n->store) << 31)
> - ^ (n->offset << 6) ^ (n->size << 3);
> +  inchash::hash hstate;
> +  inchash::add_expr (n->base, hstate);
> +  hstate.add_poly_int (n->bitsize);
> +  hstate.add_poly_int (n->bitpos);
> +  hstate.add_int (n->writable);
> +  if (n->offset != NULL_TREE)
> +{
> +  inchash::add_expr (n->offset, hstate);
> +}

extra {}

> +  return hstate.end ();
>  }
> 
>  /* The equality function of *P1 and *P2.  */
> 
>  inline bool
> -ssa_names_hasher::equal (const name_to_bb *n1, const name_to_bb *n2)
> +refs_hasher::equal (const ref_to_bb *n1, const ref_to_bb *n2)
>  {
> -  return n1->ssa_name_ver == n2->ssa_name_ver
> - && n1->store == n2->store
> - && n1->offset == n2->offset
> - && n1->size == n2->size;
> +  if (operand_equal_p (n1->base, n2->base, 0)
> +  && known_eq (n1->bitsize, n2->bitsize)
> +  && known_eq (n1->bitpos, n2->bitpos) && n1->writable == n2->writable)
> +{
> +  /* Should not call operand_equal_p with NULL_TREE.  */
> +  if (n1->offset == NULL_TREE || n2->offset == NULL_TREE)
> +  return n1->offset == n2->offset;

indents are off.  You should be using a tab-stop of 8 characters.

> +  else
> +  return operand_equal_p (n1->offset, n2->offset, 0);
> +}
> +  return false;
>  }
> 
>  class nontrapping_dom_walker : public dom_walker
>  {
>  public:
>nontrapping_dom_walker (cdi_direction direction, hash_set *ps)
> -: dom_walker (direction), m_nontrapping (ps), m_seen_ssa_names (128) {}
> +: dom_walker (direction), m_nontrapping (ps), m_seen_refs (256)
> +  {}
> 
>virtual edge before_dom_children (basic_block);
>virtual void after_dom_children (basic_block);
> @@ -2053,7 +2074,7 @@ private:
>hash_set *m_nontrapping;
> 
>/* The hash table for remembering what we've seen.  */
> -  hash_table m_seen_ssa_names;
> +  hash_table m_seen_refs;
>  };
> 
>  /* Called by walk_dominator_tree, when entering the block BB.  */
> @@ -2102,65 +2123,76 @@ nontra

Re: [PATCH 1/2] rs6000: tune cunroll for simple loops at O2

2020-05-28 Thread Segher Boessenkool
On Wed, May 27, 2020 at 10:20:18AM +0200, Richard Biener wrote:
> > How about "Var(flag_cunroll_grow_size) EnabledBy(funroll-loops ||
> > funroll-all-loops || fpeel-loops)" Or flag_cunroll_allow_grow_size?
> >
> > And then using this flags as:
> >   unsigned int val = tree_unroll_loops_completely (flag_cunroll_grow_size
> >|| optimize >= 3, true);
> >
> > And we do not need to enable this flag at -O2.
> 
> Sure this works for me.  Note I'd make funroll-loops enabled by
> funroll-all-loops so you could simplify the above.

But only do that on trunk?


Segher
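With -funroll-all-loops enabling -funroll-loops, as Richard suggests, the proposed gate needs fewer terms. A minimal sketch of those relationships, assuming plain booleans in place of GCC's generated option variables; `LoopFlags`, `apply_implications` and `cunroll_may_grow_size` are hypothetical, and explicit -fno-* options are not modelled.

```cpp
#include <cassert>

// Hypothetical model of the option relationships discussed in this thread.
struct LoopFlags {
  bool unroll_all_loops;
  bool unroll_loops;
  bool peel_loops;
  bool cunroll_grow_size;
  int optimize;
};

// If -funroll-all-loops implies -funroll-loops, the grow-size flag only
// needs to be enabled by -funroll-loops and -fpeel-loops.
void apply_implications(LoopFlags &f) {
  if (f.unroll_all_loops)
    f.unroll_loops = true;
  if (f.unroll_loops || f.peel_loops)
    f.cunroll_grow_size = true;
}

// The condition proposed for the first argument of
// tree_unroll_loops_completely.
bool cunroll_may_grow_size(const LoopFlags &f) {
  return f.cunroll_grow_size || f.optimize >= 3;
}
```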


Re: [PATCH 1/2] Introduce flag_cunroll_grow_size for cunroll

2020-05-28 Thread Segher Boessenkool
On Thu, May 28, 2020 at 12:01:26PM +0200, Richard Biener wrote:
> On Thu, May 28, 2020 at 10:52 AM guojiufu  wrote:
> > --- a/gcc/common.opt
> > +++ b/gcc/common.opt
> > @@ -2856,6 +2856,10 @@ funroll-all-loops
> >  Common Report Var(flag_unroll_all_loops) Optimization
> >  Perform loop unrolling for all loops.
> >
> > +funroll-completely-grow-size
> > +Var(flag_cunroll_grow_size) Init(2)
> > +; Control cunroll to allow size growth during complete unrolling
> > +
> 
> So this really adds a new compiler option which would need documenting.

The new flag can be marked Undocumented, would that help?  It is only
for internal use (which a comment can say as well).

> I fear we'll get into bikeshed territory here as well...  I originally thought
> we can use
> 
> Variable
> int flag_cunroll_grow_size;
> 
> but now realize that does not work well with LTO without adjusting
> the awk scripts to generate option saving/restoring.  For your patch
> you'd need to add 'Optimization' to get the flag streamed properly,
> you should also verify the target adjustment done in the backend
> is reflected in LTO mode.

The option machinery together with LTO is too complicated for me :-/

> Now back to the option name ... if we expose the option we should apply
> some forward looking.  Currently cunroll cannot be disabled or enabled
> with a flag and the desired new flag simply tunes one knob on it.  How
> about adding
> 
> -fcomplete-unroll-loops[=may-grow]
> 
> to be able to further extend this later

Trying to anticipate how things will be extended later rarely works :-(

> (there's the knob to only unroll
> non-outermost loops and the knob whether to unroll loops where
> intermediate exits are not statically predicted - incompletely controlled
> by -fpeel-loops).  There are unfortunately no existing examples that allow
> multiple flags like -fcomplete-unroll-loops=may-grow,outer other than
> the sanitizers which have manual option parsing.
> 
> So if there's no good suggestion from option folks maybe go with
> 
> -fcomplete-unroll-loops-may-grow
> 
> (ick).  And on a second thought -fcomplete-unroll-loops[=...] should
> be -funroll-loops[={complete,may-grow,all}] to cover all unrolling
> bases?
> 
> I really hate to explode the number of options users have to
> consider optimizing their code ...

Well, the defaults should be good for almost everyone.  But after that,
sure, it should be possible to tune things in a reasonable way.

> So if we can defer all this thinking and make a non-option flag
> variable work that would be best IMHO.

:-)


Segher
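The quoted hunk gives the new flag `Init(2)`, a value that neither an explicit -f (1) nor -fno- (0) option produces. One way to use that is to let later code tell "not given" apart from an explicit choice. Assuming that reading, a target override could fill in only the unset case. All names below are hypothetical; this is not the actual rs6000 code.

```cpp
#include <cassert>

// Treating 2 as "not given on the command line" is this sketch's
// assumption, based on the Init(2) in the quoted common.opt hunk.
constexpr int kFlagUnset = 2;

struct Options {
  int cunroll_grow_size;  // 0 = off, 1 = on, kFlagUnset = not given
  bool unroll_loops;
  bool peel_loops;
};

// Fill in a default only when the user left the flag unset, so an
// explicit choice on the command line is preserved.
void target_option_override(Options &o) {
  if (o.cunroll_grow_size == kFlagUnset)
    o.cunroll_grow_size = (o.unroll_loops || o.peel_loops) ? 1 : 0;
}
```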


Re: [pushed] c++: Handle multiple aggregate overloads [PR95319].

2020-05-28 Thread H.J. Lu via Gcc-patches
On Wed, May 27, 2020 at 12:07 PM Jason Merrill via Gcc-patches
 wrote:
>
> Here, when considering the two 'insert' overloads, we look for aggregate
> conversions from the same initializer-list to B<3> or
> initializer_list>.  But since my fix for reshape_init overhead on the
> PR14179 testcase we reshaped the initializer-list directly, leading to an
> error when we then tried to reshape it differently for the second overload.
>
> Tested x86_64-pc-linux-gnu, applying to trunk and 10.
>
> gcc/cp/ChangeLog:
>
> PR c++/95319
> * decl.c (reshape_init_array_1): Don't reuse in overload context.
>
> gcc/testsuite/ChangeLog:
>
> PR c++/95319
> * g++.dg/cpp0x/initlist-array12.C: New test.

I got

FAIL: g++.dg/cpp0x/initlist-array12.C  -std=c++14 (test for excess errors)
FAIL: g++.dg/cpp0x/initlist-array12.C  -std=c++17 (test for excess errors)
FAIL: g++.dg/cpp0x/initlist-array12.C  -std=c++2a (test for excess errors)
FAIL: g++.dg/ext/tmplattr10.C  -std=c++98 (test for excess errors)

on Linux/x86:

https://gcc.gnu.org/pipermail/gcc-regression/2020-May/072622.html

[hjl@gnu-cfl-2 gcc]$
/export/build/gnu/tools-build/gcc-debug/build-x86_64-linux/gcc/testsuite/g++/../../xg++
-B/export/build/gnu/tools-build/gcc-debug/build-x86_64-linux/gcc/testsuite/g++/../../
/export/gnu/import/git/sources/gcc/gcc/testsuite/g++.dg/cpp0x/initlist-array12.C
-m32 -fno-diagnostics-show-caret -fno-diagnostics-show-line-numbers
-fdiagnostics-color=never -fdiagnostics-urls=never -nostdinc++
-I/export/build/gnu/tools-build/gcc-debug/build-x86_64-linux/x86_64-pc-linux-gnu/32/libstdc++-v3/include/x86_64-pc-linux-gnu
-I/export/build/gnu/tools-build/gcc-debug/build-x86_64-linux/x86_64-pc-linux-gnu/32/libstdc++-v3/include
-I/export/gnu/import/git/sources/gcc/libstdc++-v3/libsupc++
-I/export/gnu/import/git/sources/gcc/libstdc++-v3/include/backward
-I/export/gnu/import/git/sources/gcc/libstdc++-v3/testsuite/util
-fmessage-length=0 -std=c++2a -pedantic-errors -Wno-long-long -S -o
initlist-array12.s
/export/gnu/import/git/sources/gcc/gcc/testsuite/g++.dg/cpp0x/initlist-array12.C:5:24:
fatal error: definition of ‘class std::initializer_list<
 >’ does not match ‘#include
’
compilation terminated.
[hjl@gnu-cfl-2 gcc]$

-- 
H.J.


Re: [PATCH 2/2] rs6000: allow cunroll to grow size according to -funroll-loop or -fpeel-loops

2020-05-28 Thread Segher Boessenkool
Hi Jiufu,

On Thu, May 28, 2020 at 04:52:07PM +0800, guojiufu wrote:
> gcc/ChangeLog
> 2020-02-28  Jiufu Guo  
> 
>   PR target/95018
>   * config/rs6000/rs6000.c (rs6000_option_override_internal):
>   Override flag_cunroll_grow_size.

This part is fine of course.  Thanks!


Segher


Re: [pushed] c++: Handle multiple aggregate overloads [PR95319].

2020-05-28 Thread Marek Polacek via Gcc-patches
On Thu, May 28, 2020 at 06:44:53AM -0700, H.J. Lu via Gcc-patches wrote:
> On Wed, May 27, 2020 at 12:07 PM Jason Merrill via Gcc-patches
>  wrote:
> >
> > Here, when considering the two 'insert' overloads, we look for aggregate
> > conversions from the same initializer-list to B<3> or
> > initializer_list>.  But since my fix for reshape_init overhead on the
> > PR14179 testcase we reshaped the initializer-list directly, leading to an
> > error when we then tried to reshape it differently for the second overload.
> >
> > Tested x86_64-pc-linux-gnu, applying to trunk and 10.
> >
> > gcc/cp/ChangeLog:
> >
> > PR c++/95319
> > * decl.c (reshape_init_array_1): Don't reuse in overload context.
> >
> > gcc/testsuite/ChangeLog:
> >
> > PR c++/95319
> > * g++.dg/cpp0x/initlist-array12.C: New test.
> 
> I got
> 
> FAIL: g++.dg/cpp0x/initlist-array12.C  -std=c++14 (test for excess errors)
> FAIL: g++.dg/cpp0x/initlist-array12.C  -std=c++17 (test for excess errors)
> FAIL: g++.dg/cpp0x/initlist-array12.C  -std=c++2a (test for excess errors)

This can be fixed with the attached patch.  Jason, is it OK?

> FAIL: g++.dg/ext/tmplattr10.C  -std=c++98 (test for excess errors)

But I don't know why this one fails.

-- >8 --

* g++.dg/cpp0x/initlist-array12.C: Fix the definition of
initializer_list for ilp32 target.
---
 gcc/testsuite/g++.dg/cpp0x/initlist-array12.C | 4 +++-
 1 file changed, 3 insertions(+), 1 deletion(-)

diff --git a/gcc/testsuite/g++.dg/cpp0x/initlist-array12.C b/gcc/testsuite/g++.dg/cpp0x/initlist-array12.C
index b012e7295d5..168c5dd6492 100644
--- a/gcc/testsuite/g++.dg/cpp0x/initlist-array12.C
+++ b/gcc/testsuite/g++.dg/cpp0x/initlist-array12.C
@@ -1,10 +1,12 @@
 // PR c++/95319
 // { dg-do compile { target c++11 } }
 
+typedef decltype(sizeof(char)) size_t;
+
 namespace std {
 template  class initializer_list {
   int *_M_array;
-  unsigned long _M_len;
+  size_t _M_len;
 };
 template  struct A { typedef int _Type[_Nm]; };
 template  struct B { typename A<_Nm>::_Type _M_elems; };

base-commit: 3ea6977d0f1813d982743a09660eec1760e981ec
-- 
Marek Polacek • Red Hat, Inc. • 300 A St, Boston, MA
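The fix relies on `sizeof` always yielding `std::size_t`, so `decltype(sizeof(char))` names the right type without including any header. A hard-coded `unsigned long` is only correct where `size_t` happens to be `unsigned long` (as on LP64 targets); on i386 it is `unsigned int`, which is why the -m32 runs failed. A minimal check of the idiom (`my_size_t` is a hypothetical name):

```cpp
#include <cassert>
#include <cstddef>
#include <type_traits>

// The testcase's idiom: name size_t's type from the sizeof operator.
typedef decltype(sizeof(char)) my_size_t;

static_assert(std::is_same<my_size_t, std::size_t>::value,
              "sizeof yields std::size_t on every target");
```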



Re: [pushed] c++: Handle multiple aggregate overloads [PR95319].

2020-05-28 Thread H.J. Lu via Gcc-patches
On Thu, May 28, 2020 at 6:57 AM Marek Polacek  wrote:
>
> On Thu, May 28, 2020 at 06:44:53AM -0700, H.J. Lu via Gcc-patches wrote:
> > On Wed, May 27, 2020 at 12:07 PM Jason Merrill via Gcc-patches
> >  wrote:
> > >
> > > Here, when considering the two 'insert' overloads, we look for aggregate
> > > conversions from the same initializer-list to B<3> or
> > > initializer_list>.  But since my fix for reshape_init overhead on the
> > > PR14179 testcase we reshaped the initializer-list directly, leading to an
> > > error when we then tried to reshape it differently for the second 
> > > overload.
> > >
> > > Tested x86_64-pc-linux-gnu, applying to trunk and 10.
> > >
> > > gcc/cp/ChangeLog:
> > >
> > > PR c++/95319
> > > * decl.c (reshape_init_array_1): Don't reuse in overload context.
> > >
> > > gcc/testsuite/ChangeLog:
> > >
> > > PR c++/95319
> > > * g++.dg/cpp0x/initlist-array12.C: New test.
> >
> > I got
> >
> > FAIL: g++.dg/cpp0x/initlist-array12.C  -std=c++14 (test for excess errors)
> > FAIL: g++.dg/cpp0x/initlist-array12.C  -std=c++17 (test for excess errors)
> > FAIL: g++.dg/cpp0x/initlist-array12.C  -std=c++2a (test for excess errors)
>
> This can be fixed with the attached patch.  Jason, is it OK?
>
> > FAIL: g++.dg/ext/tmplattr10.C  -std=c++98 (test for excess errors)
>
> But I don't know why this one fails.

[hjl@gnu-cfl-2 gcc]$
/export/build/gnu/tools-build/gcc-debug/build-x86_64-linux/gcc/testsuite/g++/../../xg++
-B/export/build/gnu/tools-build/gcc-debug/build-x86_64-linux/gcc/testsuite/g++/../../
/export/gnu/import/git/sources/gcc/gcc/testsuite/g++.dg/ext/tmplattr10.C
-m32 -fno-diagnostics-show-caret -fno-diagnostics-show-line-numbers
-fdiagnostics-color=never -fdiagnostics-urls=never -nostdinc++
-I/export/build/gnu/tools-build/gcc-debug/build-x86_64-linux/x86_64-pc-linux-gnu/32/libstdc++-v3/include/x86_64-pc-linux-gnu
-I/export/build/gnu/tools-build/gcc-debug/build-x86_64-linux/x86_64-pc-linux-gnu/32/libstdc++-v3/include
-I/export/gnu/import/git/sources/gcc/libstdc++-v3/libsupc++
-I/export/gnu/import/git/sources/gcc/libstdc++-v3/include/backward
-I/export/gnu/import/git/sources/gcc/libstdc++-v3/testsuite/util
-fmessage-length=0 -std=c++98 -pedantic-errors -Wno-long-long -S -o
tmplattr10.s
/export/gnu/import/git/sources/gcc/gcc/testsuite/g++.dg/ext/tmplattr10.C:15:31:
error: variadic templates only available with ‘-std=c++11’ or
‘-std=gnu++11’
/export/gnu/import/git/sources/gcc/gcc/testsuite/g++.dg/ext/tmplattr10.C:16:28:
error: variadic templates only available with ‘-std=c++11’ or
‘-std=gnu++11’
/export/gnu/import/git/sources/gcc/gcc/testsuite/g++.dg/ext/tmplattr10.C:18:11:
error: expected nested-name-specifier before ‘type’
/export/gnu/import/git/sources/gcc/gcc/testsuite/g++.dg/ext/tmplattr10.C:21:31:
error: variadic templates only available with ‘-std=c++11’ or
‘-std=gnu++11’
/export/gnu/import/git/sources/gcc/gcc/testsuite/g++.dg/ext/tmplattr10.C:22:39:
error: variadic templates only available with ‘-std=c++11’ or
‘-std=gnu++11’
/export/gnu/import/git/sources/gcc/gcc/testsuite/g++.dg/ext/tmplattr10.C:24:11:
error: expected nested-name-specifier before ‘type’
/export/gnu/import/git/sources/gcc/gcc/testsuite/g++.dg/ext/tmplattr10.C:27:31:
error: variadic templates only available with ‘-std=c++11’ or
‘-std=gnu++11’
/export/gnu/import/git/sources/gcc/gcc/testsuite/g++.dg/ext/tmplattr10.C:28:38:
error: variadic templates only available with ‘-std=c++11’ or
‘-std=gnu++11’
/export/gnu/import/git/sources/gcc/gcc/testsuite/g++.dg/ext/tmplattr10.C:30:11:
error: expected nested-name-specifier before ‘type’
/export/gnu/import/git/sources/gcc/gcc/testsuite/g++.dg/ext/tmplattr10.C:34:6:
error: ISO C++ forbids declaration of ‘wrap’ with no type
[-fpermissive]
/export/gnu/import/git/sources/gcc/gcc/testsuite/g++.dg/ext/tmplattr10.C:34:1:
error: top-level declaration of ‘wrap’ specifies ‘auto’
/export/gnu/import/git/sources/gcc/gcc/testsuite/g++.dg/ext/tmplattr10.C:34:51:
error: trailing return type only available with ‘-std=c++11’ or
‘-std=gnu++11’
/export/gnu/import/git/sources/gcc/gcc/testsuite/g++.dg/ext/tmplattr10.C:
In function ‘int main()’:
/export/gnu/import/git/sources/gcc/gcc/testsuite/g++.dg/ext/tmplattr10.C:51:12:
error: ‘wrap’ was not declared in this scope
[hjl@gnu-cfl-2 gcc]$

-- 
H.J.


Re: [patch] Add support for __builtin_bswap128

2020-05-28 Thread H.J. Lu via Gcc-patches
On Wed, May 27, 2020 at 8:26 AM Richard Biener via Gcc-patches
 wrote:
>
> On Wed, May 27, 2020 at 3:33 PM Eric Botcazou  wrote:
> >
> > > Please use int128 effective target rather than lp64 in the tests that need
> > > __int128 type.
> >
> > OK, thanks, adjusted locally.
>
> OK.

I am checking in this as an obvious fix.

-- 
H.J.
From 4d80ebea98cda52c57da8353a4da47029eea88b4 Mon Sep 17 00:00:00 2001
From: "H.J. Lu" 
Date: Thu, 28 May 2020 07:07:13 -0700
Subject: [PATCH] gcc.dg/builtin-bswap-10.c: Check "! int128"

Check "! int128" instead of ilp32 since ILP32 targets can support int128.

gcc/testsuite/

	* gcc.dg/builtin-bswap-10.c: Check "! int128" instead of ilp32
---
 gcc/testsuite/gcc.dg/builtin-bswap-10.c | 2 +-
 1 file changed, 1 insertion(+), 1 deletion(-)

diff --git a/gcc/testsuite/gcc.dg/builtin-bswap-10.c b/gcc/testsuite/gcc.dg/builtin-bswap-10.c
index 6c8a39f17d0..6c69bcd70d8 100644
--- a/gcc/testsuite/gcc.dg/builtin-bswap-10.c
+++ b/gcc/testsuite/gcc.dg/builtin-bswap-10.c
@@ -1,4 +1,4 @@
-/* { dg-do compile { target { ilp32 } } } */
+/* { dg-do compile { target { ! int128 } } } */
 /* { dg-options "" } */
 /* { dg-final { scan-assembler "__builtin_" } } */
 
-- 
2.26.2
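Switching from `ilp32` to `! int128` keys the test on what it actually depends on: whether `__int128` exists. In source code, a comparable guard is the `__SIZEOF_INT128__` macro that GCC and Clang define when the type is available. The sketch below uses that guard and composes a 128-bit byte swap from two `__builtin_bswap64` calls. It is only an illustration, not how `__builtin_bswap128` is implemented, and `bswap128_sketch` is a hypothetical name.

```cpp
#include <cassert>
#include <cstdint>

#ifdef __SIZEOF_INT128__
// __extension__ silences -pedantic warnings about __int128.
__extension__ typedef unsigned __int128 u128;

// 128-bit byte swap built from two 64-bit swaps: the swapped low half
// becomes the high half and vice versa.
static inline u128 bswap128_sketch(u128 x) {
  uint64_t lo = (uint64_t)x;
  uint64_t hi = (uint64_t)(x >> 64);
  return ((u128)__builtin_bswap64(lo) << 64) | __builtin_bswap64(hi);
}
#endif
```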



Re: [pushed] c++: Handle multiple aggregate overloads [PR95319].

2020-05-28 Thread Marek Polacek via Gcc-patches
On Thu, May 28, 2020 at 07:05:47AM -0700, H.J. Lu wrote:
> On Thu, May 28, 2020 at 6:57 AM Marek Polacek  wrote:
> >
> > On Thu, May 28, 2020 at 06:44:53AM -0700, H.J. Lu via Gcc-patches wrote:
> > > On Wed, May 27, 2020 at 12:07 PM Jason Merrill via Gcc-patches
> > >  wrote:
> > > >
> > > > Here, when considering the two 'insert' overloads, we look for aggregate
> > > > conversions from the same initializer-list to B<3> or
> > > > initializer_list>.  But since my fix for reshape_init overhead on 
> > > > the
> > > > PR14179 testcase we reshaped the initializer-list directly, leading to 
> > > > an
> > > > error when we then tried to reshape it differently for the second 
> > > > overload.
> > > >
> > > > Tested x86_64-pc-linux-gnu, applying to trunk and 10.
> > > >
> > > > gcc/cp/ChangeLog:
> > > >
> > > > PR c++/95319
> > > > * decl.c (reshape_init_array_1): Don't reuse in overload 
> > > > context.
> > > >
> > > > gcc/testsuite/ChangeLog:
> > > >
> > > > PR c++/95319
> > > > * g++.dg/cpp0x/initlist-array12.C: New test.
> > >
> > > I got
> > >
> > > FAIL: g++.dg/cpp0x/initlist-array12.C  -std=c++14 (test for excess errors)
> > > FAIL: g++.dg/cpp0x/initlist-array12.C  -std=c++17 (test for excess errors)
> > > FAIL: g++.dg/cpp0x/initlist-array12.C  -std=c++2a (test for excess errors)
> >
> > This can be fixed with the attached patch.  Jason, is it OK?
> >
> > > FAIL: g++.dg/ext/tmplattr10.C  -std=c++98 (test for excess errors)
> >
> > But I don't know why this one fails.
> 
> [hjl@gnu-cfl-2 gcc]$
> /export/build/gnu/tools-build/gcc-debug/build-x86_64-linux/gcc/testsuite/g++/../../xg++
> -B/export/build/gnu/tools-build/gcc-debug/build-x86_64-linux/gcc/testsuite/g++/../../
> /export/gnu/import/git/sources/gcc/gcc/testsuite/g++.dg/ext/tmplattr10.C
> -m32 -fno-diagnostics-show-caret -fno-diagnostics-show-line-numbers
> -fdiagnostics-color=never -fdiagnostics-urls=never -nostdinc++
> -I/export/build/gnu/tools-build/gcc-debug/build-x86_64-linux/x86_64-pc-linux-gnu/32/libstdc++-v3/include/x86_64-pc-linux-gnu
> -I/export/build/gnu/tools-build/gcc-debug/build-x86_64-linux/x86_64-pc-linux-gnu/32/libstdc++-v3/include
> -I/export/gnu/import/git/sources/gcc/libstdc++-v3/libsupc++
> -I/export/gnu/import/git/sources/gcc/libstdc++-v3/include/backward
> -I/export/gnu/import/git/sources/gcc/libstdc++-v3/testsuite/util
> -fmessage-length=0 -std=c++98 -pedantic-errors -Wno-long-long -S -o
> tmplattr10.s
> /export/gnu/import/git/sources/gcc/gcc/testsuite/g++.dg/ext/tmplattr10.C:15:31:
> error: variadic templates only available with ‘-std=c++11’ or
> ‘-std=gnu++11’
> /export/gnu/import/git/sources/gcc/gcc/testsuite/g++.dg/ext/tmplattr10.C:16:28:
> error: variadic templates only available with ‘-std=c++11’ or
> ‘-std=gnu++11’
> /export/gnu/import/git/sources/gcc/gcc/testsuite/g++.dg/ext/tmplattr10.C:18:11:
> error: expected nested-name-specifier before ‘type’
> /export/gnu/import/git/sources/gcc/gcc/testsuite/g++.dg/ext/tmplattr10.C:21:31:
> error: variadic templates only available with ‘-std=c++11’ or
> ‘-std=gnu++11’
> /export/gnu/import/git/sources/gcc/gcc/testsuite/g++.dg/ext/tmplattr10.C:22:39:
> error: variadic templates only available with ‘-std=c++11’ or
> ‘-std=gnu++11’
> /export/gnu/import/git/sources/gcc/gcc/testsuite/g++.dg/ext/tmplattr10.C:24:11:
> error: expected nested-name-specifier before ‘type’
> /export/gnu/import/git/sources/gcc/gcc/testsuite/g++.dg/ext/tmplattr10.C:27:31:
> error: variadic templates only available with ‘-std=c++11’ or
> ‘-std=gnu++11’
> /export/gnu/import/git/sources/gcc/gcc/testsuite/g++.dg/ext/tmplattr10.C:28:38:
> error: variadic templates only available with ‘-std=c++11’ or
> ‘-std=gnu++11’
> /export/gnu/import/git/sources/gcc/gcc/testsuite/g++.dg/ext/tmplattr10.C:30:11:
> error: expected nested-name-specifier before ‘type’
> /export/gnu/import/git/sources/gcc/gcc/testsuite/g++.dg/ext/tmplattr10.C:34:6:
> error: ISO C++ forbids declaration of ‘wrap’ with no type
> [-fpermissive]
> /export/gnu/import/git/sources/gcc/gcc/testsuite/g++.dg/ext/tmplattr10.C:34:1:
> error: top-level declaration of ‘wrap’ specifies ‘auto’
> /export/gnu/import/git/sources/gcc/gcc/testsuite/g++.dg/ext/tmplattr10.C:34:51:
> error: trailing return type only available with ‘-std=c++11’ or
> ‘-std=gnu++11’
> /export/gnu/import/git/sources/gcc/gcc/testsuite/g++.dg/ext/tmplattr10.C:
> In function ‘int main()’:
> /export/gnu/import/git/sources/gcc/gcc/testsuite/g++.dg/ext/tmplattr10.C:51:12:
> error: ‘wrap’ was not declared in this scope
> [hjl@gnu-cfl-2 gcc]$

I see, thanks.  Jason, is this one OK too, then?

Tested with
make check-g++ RUNTESTFLAGS='--target_board=unix\{-m32,-m64\} dg.exp=tmplattr10.C'

-- >8 --
This test uses C++11 features, so it should only run in C++11 mode.

* g++.dg/ext/tmplattr10.C: Only run in c++11.
---
 gcc/testsuite/g++.dg/ext/tmplattr10.C | 2 +-
 1 file changed, 1 insertion(+), 1 deletion(-)

diff --git a/gcc/testsuite/g++.dg/ext/tmplattr10.C 
b/gcc/testsuite/g++.

Re: [PATCH 1/2] Separate -funroll-loops for GIMPLE unroller and RTL unroller

2020-05-28 Thread Richard Biener via Gcc-patches
On Tue, May 26, 2020 at 6:58 AM Jiufu Guo  wrote:
>
> David Edelsohn  writes:
>
> > On Mon, May 25, 2020 at 1:58 PM Richard Biener
> >  wrote:
> >>
> >> On May 25, 2020 7:40:00 PM GMT+02:00, Segher Boessenkool 
> >>  wrote:
> >> >On Mon, May 25, 2020 at 02:14:02PM +0200, Richard Biener wrote:
> >> >> On Mon, May 25, 2020 at 1:10 PM guojiufu 
> >> >wrote:
> >> >> Since a new flag is not needed to fix the regression please avoid
> >> >> adding -fcomplete-unroll-loops.
> >> >>
> >> >> For -frtl-unroll-loops you should be able to use
> >> >
> >> >Erm.  That *is* a new command-line option (the internal flags I do not
> >> >care about so much: new implementation details are *good*).  And a new
> >> >name that is a mistake in my opinion, for many reasons (users do not
> >> >know and should not have to care about "rtl"; the name is not
> >> >descriptive; it is useless churn, it is not the same name as we have
> >> >had for decades now; it is adding a new option for a future where we
> >> >will do most unrolling at gimple level, a future we do not know will
> >> >ever exist, and we do not know what that will look like anyway; it is
> >> >an extra level of indirection (in the name)).
> >> >
> >> >We should not have an -frtl-unroll-loops if we do not have a
> >> >-ftree-unroll-loops (or whatever).
> >> >
> >> >Unrolling early is not a good idea in general (the problems with the
> >> >very trivial complete unrolling case just underline that).  But we
> >> >*should* know which code we expect to unroll later, for better costing.
> >> >Adding names like "rtl-unroll-loops" only stands in the way of getting
> >> >a better design here.
> >>
> >> You folks made ppc specific hacks instead of a better design. Those
> >> now stand in the way as well. But sure, simply do not expose the
> >> flag to the users, use
> >> Var(flag_rtl_unroll_loops). My other points still stand.
> >>
> >> Feel free to ignore the regression part on the branch and come up
> >> with a great design. But don't expect to backport that then.
> >
> > I completely agree.
>
> Thanks a lot for all your comments, suggestions, and tips in the
> discussion.  Thanks to Richard, Segher, David, Hanza, and all!
>
> Let me explain the intention of this work.
>
> We know that loop unrolling is a complex and tricky thing.  It can
> greatly help some kinds of code, but sometimes it has side effects.
> For different types of loops and different platforms, it may have
> different effects.
> It makes sense to tune loop unrolling accordingly, and helping to tune
> loop unrolling is what we want to do.
>
> Currently, we have loop unrollers in the GIMPLE part (cunroll/cunrolli)
> and the RTL part.  There are some options (like -funroll-loops) and
> --params to control the unrollers.
>
> Through target hooks, different platforms could tune the unrollers,
> e.g. by checking the type of loops or the optimization level.
> Existing hooks may already help with some of this, like tuning --params.
>
> Adding separate flags (or options) may help to control the different
> behaviors independently.  This is one reason for the patch, which
> introduces internal undocumented options.
>
> One previous patch, r10-4525, tuned ppc at -O2.  It implements an
> existing hook for rs6000 to check for simple loops in the RTL
> unroller.  For cunroll, it simply allows size growth at -O2,
> without checking the type of the loops.  So the side/negative
> effects of cunroll also become visible at -O2, besides the
> positive effects.  In PR95018, the side effect shows up on a complex
> loop (early exit, and more peeling).
> One idea is to tune cunroll to avoid these side effects.  If the
> heuristic is suitable, it would also help other uses, like -O3 and
> -funroll-loops.
>
> Thanks for any comments!

For GIMPLE level transforms I don't think targets have more knowledge
than the middle-end.  In fact GIMPLE complete unrolling is about
secondary effects, removing redundancies and abstraction.  So IMHO
the correct approach is to look at individual cases and try to improve
the generic code rather than trying to get better benchmark results
in a per-target manner by magical parameter tuning.

For what the RTL unroller does it indeed depends very heavily on
the target whether sth is beneficial or not.

So I'd like to see specific cases where you think cunroll should
do "better" on powerpc only but not elsewhere.

Richard.

> Jiufu
>
> >
> > This path is digging a deeper and deeper hole.
> >
> > - David


Re: [patch] Add support for __builtin_bswap128

2020-05-28 Thread Marek Polacek via Gcc-patches
On Thu, May 28, 2020 at 07:10:20AM -0700, H.J. Lu via Gcc-patches wrote:
> On Wed, May 27, 2020 at 8:26 AM Richard Biener via Gcc-patches
>  wrote:
> >
> > On Wed, May 27, 2020 at 3:33 PM Eric Botcazou  wrote:
> > >
> > > > Please use int128 effective target rather than lp64 in the tests that need
> > > > __int128 type.
> > >
> > > OK, thanks, adjusted locally.
> >
> > OK.
> 
> I am checking in this as an obvious fix.

In
https://gcc.gnu.org/git/?p=gcc.git;a=commitdiff;h=7e58fe0e4c2b79a1cf5c93161856e27e1c830162
you've committed a new ChangeLog entry to a ChangeLog file.  This should no
longer happen:
https://gcc.gnu.org/pipermail/gcc-patches/2020-May/546472.html

Please revert that -- the bot should take care of adding the ChangeLog entry
from the commit message to the ChangeLog file.

Marek



Re: [patch] Add support for __builtin_bswap128

2020-05-28 Thread Martin Liška

On 5/28/20 4:30 PM, Marek Polacek via Gcc-patches wrote:

> On Thu, May 28, 2020 at 07:10:20AM -0700, H.J. Lu via Gcc-patches wrote:
>> On Wed, May 27, 2020 at 8:26 AM Richard Biener via Gcc-patches
>>  wrote:
>>>
>>> On Wed, May 27, 2020 at 3:33 PM Eric Botcazou  wrote:
>>>>
>>>>> Please use int128 effective target rather than lp64 in the tests that need
>>>>> __int128 type.
>>>>
>>>> OK, thanks, adjusted locally.
>>>
>>> OK.
>>
>> I am checking in this as an obvious fix.
>
> In
> https://gcc.gnu.org/git/?p=gcc.git;a=commitdiff;h=7e58fe0e4c2b79a1cf5c93161856e27e1c830162
> you've committed a new ChangeLog entry to a ChangeLog file.  This should no
> longer happen:
> https://gcc.gnu.org/pipermail/gcc-patches/2020-May/546472.html
>
> Please revert that -- the bot should take care of adding the ChangeLog entry
> from the commit message to the ChangeLog file.
>
> Marek



And please do not use references like r11-694; one can't use them in git
commands.

Thanks,
Martin


Re: [patch] Add support for __builtin_bswap128

2020-05-28 Thread H.J. Lu via Gcc-patches
On Thu, May 28, 2020 at 7:30 AM Marek Polacek  wrote:
>
> On Thu, May 28, 2020 at 07:10:20AM -0700, H.J. Lu via Gcc-patches wrote:
> > On Wed, May 27, 2020 at 8:26 AM Richard Biener via Gcc-patches
> >  wrote:
> > >
> > > On Wed, May 27, 2020 at 3:33 PM Eric Botcazou  wrote:
> > > >
> > > > > Please use int128 effective target rather than lp64 in the tests that need
> > > > > __int128 type.
> > > >
> > > > OK, thanks, adjusted locally.
> > >
> > > OK.
> >
> > I am checking in this as an obvious fix.
>
> In
> https://gcc.gnu.org/git/?p=gcc.git;a=commitdiff;h=7e58fe0e4c2b79a1cf5c93161856e27e1c830162
> you've committed a new ChangeLog entry to a ChangeLog file.  This should no
> longer happen:
> https://gcc.gnu.org/pipermail/gcc-patches/2020-May/546472.html
>
> Please revert that -- the bot should take care of adding the ChangeLog entry
> from the commit message to the ChangeLog file.

Done.

-- 
H.J.


Re: [PATCH 1/2] Introduce flag_cunroll_grow_size for cunroll

2020-05-28 Thread Jiufu Guo via Gcc-patches
Richard Biener  writes:

> On Thu, May 28, 2020 at 10:52 AM guojiufu  wrote:
>>
>> From: Jiufu Guo 
>>
>> Currently the GIMPLE complete unroller (cunroll) checks
>> flag_unroll_loops and flag_peel_loops to decide whether to allow size growth.
>> Besides affecting cunroll, flag_unroll_loops also controls the RTL unroller.
>> To have more freedom to control cunroll and the RTL unroller, this patch
>> introduces flag_cunroll_grow_size.  With this patch, we can control
>> cunroll and the RTL unroller independently.
>>
>> Bootstrap/regtest pass on powerpc64le. OK for trunk? And backport to
>> GCC10 after a week?
>>
>>
>> +funroll-completely-grow-size
>> +Var(flag_cunroll_grow_size) Init(2)
>> +; Control cunroll to allow size growth during complete unrolling
>> +
>
> So this really adds a new compiler option which would need
> documenting.
I once added 'Undocumented' (to avoid it being shown in --help), and did
not add 'Common' (to avoid --help=common).  What I want is to avoid
exposing this to users.
Still, it is an option, as you said.

>
> I fear we'll get into bikeshed territory here as well...  I originally thought
> we can use
>
> Variable
> int flag_cunroll_grow_size;

Thanks, this is definitely a variable rather than an option.  I will try
this way.
>
> but now realize that does not work well with LTO without adjusting
> the awk scripts to generate option saving/restoring.  For your patch
> you'd need to add 'Optimization' to get the flag streamed properly,
> you should also verify the target adjustment done in the backend
> is reflected in LTO mode.

Here, for an internal option, it is relatively simple: 'Optimization' could help.
When trying 'Variable', I will verify it in LTO mode.

>
>>  ; Nonzero means that loop optimizer may assume that the induction variables
>>
>> +  /* Allow cunroll to grow size accordingly.  */
>> +  if (flag_cunroll_grow_size == AUTODETECT_VALUE)
>> +flag_cunroll_grow_size = flag_unroll_loops || flag_peel_loops;
>> +
>
> Any reason to not use EnabledBy(funroll-loops || fpeel-loops)?

With tests and by checking the generated code (e.g. options.c), I found that
this setting has some unexpected behavior:
for example, "-funroll-loops -fno-peel-loops" turns off the flag.
"||" means the flag will be turned _on/off_ by -f[no-]unroll-loops or
-f[no-]peel-loops.

>
>>/* web and rename-registers help when run after loop unrolling.  */
>>if (flag_web == AUTODETECT_VALUE)
>>  flag_web = flag_unroll_loops;

>> -  unsigned int val = tree_unroll_loops_completely (flag_unroll_loops
>> -  || flag_peel_loops
>> +  unsigned int val = tree_unroll_loops_completely (flag_cunroll_grow_size
>>|| optimize >= 3, true);
>
> Given we check optimize >= 3 here please enable the flag by default
> at O3+ via opts.c:default_options_table and also elide the optimize >= 3
> check.  That way -fno-unroll-completely-grow-size would have the desired 
> effect.
>
Actually, in the code flag_peel_loops is enabled at O3+, so "|| optimize >=
3" could be removed.  As you said, this lets the negative form take effect
even at -O3.

> Now back to the option name ... if we expose the option we should apply
> some forward looking.  Currently cunroll cannot be disabled or enabled
> with a flag and the desired new flag simply tunes one knob on it.  How
> about adding
>
> -fcomplete-unroll-loops[=may-grow]
-fcomplete-unroll-loops[=may-grow|inner|outer]
>
> to be able to further extend this later (there's the knob to only unroll
> non-outermost loops and the knob whether to unroll loops where
> intermediate exits are not statically predicted - incompletely controlled
> by -fpeel-loops).  There are unfortunately no existing examples that allow
> multiple flags like -fcomplete-unroll-loops=may-grow,outer other than
> the sanitizers which have manual option parsing.
>
> So if there's no good suggestion from option folks maybe go with
>
> -fcomplete-unroll-loops-may-grow
>
> (ick).  And on a second thought -fcomplete-unroll-loops[=...] should
> be -funroll-loops[={complete,may-grow,all}] to cover all unrolling
> bases?
>
> I really hate to explode the number of options users have to
> consider optimizing their code ...
>
> So if we can defer all this thinking and make a non-option flag
> variable work that would be best IMHO.
Yes, a few things need trade-offs when designing user options.


Jiufu
>
> Richard.
>
>>if (peeled_loops)
>>  {
>> --
>> 2.17.1
>>


[AArch64][GCC-8][GCC-9] Use __getauxval instead of getauxval in LSE detection code in libgcc

2020-05-28 Thread Andre Vieira (lists)

The patch applies cleanly on gcc-9 and gcc-8.
I bootstrapped this on aarch64-none-linux-gnu and tested 
aarch64-none-elf for both.


Is this OK for those backports?

libgcc/ChangeLog:
2020-05-28  Andre Vieira  

    Backport from mainline.
    2020-05-06  Kyrylo Tkachov  

    * config/aarch64/lse-init.c (init_have_lse_atomics): Use __getauxval
    instead of getauxval.
    (AT_HWCAP): Define.
    (HWCAP_ATOMICS): Define.
    Guard detection on __gnu_linux__.

On 06/05/2020 16:24, Kyrylo Tkachov wrote:
>
>> -Original Message-
>> From: Joseph Myers 
>> Sent: 06 May 2020 15:46
>> To: Richard Biener 
>> Cc: Kyrylo Tkachov ; Florian Weimer
>> ; Szabolcs Nagy ; gcc-
>> patc...@gcc.gnu.org; Jakub Jelinek 
>> Subject: Re: [PATCH][AArch64] Use __getauxval instead of getauxval in LSE
>> detection code in libgcc
>>
>> On Wed, 6 May 2020, Richard Biener wrote:
>>
>>>> Here is the updated patch for the record.
>>>> Jakub, richi, is this ok for the GCC 10 branch?
>>> I'll defer to Joseph who is release manager as well.
>> This version is OK with me.
> Thank you Joseph,
> I've committed this version to trunk and the gcc-10 branch.
> Kyrill
>
>> --
>> Joseph S. Myers
>> jos...@codesourcery.com


RE: [AArch64][GCC-8][GCC-9] Use __getauxval instead of getauxval in LSE detection code in libgcc

2020-05-28 Thread Kyrylo Tkachov
Hi Andre,

> -Original Message-
> From: Andre Vieira (lists) 
> Sent: 28 May 2020 15:42
> To: gcc-patches@gcc.gnu.org
> Cc: Kyrylo Tkachov 
> Subject: [AArch64][GCC-8][GCC-9] Use __getauxval instead of getauxval in
> LSE detection code in libgcc
> 
> The patch applies cleanly on gcc-9 and gcc-8.
> I bootstrapped this on aarch64-none-linux-gnu and tested
> aarch64-none-elf for both.
> 
> Is this OK for those backports?

Yes, thanks for testing this.
Kyrill

> 
> libgcc/ChangeLog:
> 2020-05-28  Andre Vieira  
> 
>      Backport from mainline.
>      2020-05-06  Kyrylo Tkachov  
> 
>      * config/aarch64/lse-init.c (init_have_lse_atomics): Use __getauxval
>      instead of getauxval.
>      (AT_HWCAP): Define.
>      (HWCAP_ATOMICS): Define.
>      Guard detection on __gnu_linux__.
> 
> On 06/05/2020 16:24, Kyrylo Tkachov wrote:
> >
> >> -Original Message-
> >> From: Joseph Myers 
> >> Sent: 06 May 2020 15:46
> >> To: Richard Biener 
> >> Cc: Kyrylo Tkachov ; Florian Weimer
> >> ; Szabolcs Nagy ; gcc-
> >> patc...@gcc.gnu.org; Jakub Jelinek 
> >> Subject: Re: [PATCH][AArch64] Use __getauxval instead of getauxval in LSE
> >> detection code in libgcc
> >>
> >> On Wed, 6 May 2020, Richard Biener wrote:
> >>
> >>>> Here is the updated patch for the record.
> >>>> Jakub, richi, is this ok for the GCC 10 branch?
> >>> I'll defer to Joseph who is release manager as well.
> >> This version is OK with me.
> > Thank you Joseph,
> > I've committed this version to trunk and the gcc-10 branch.
> > Kyrill
> >
> >> --
> >> Joseph S. Myers
> >> jos...@codesourcery.com


Re: [stage1][PATCH] Lower VEC_COND_EXPR into internal functions.

2020-05-28 Thread Martin Liška

Hi.

There's a new patch that adds normal internal functions for the 4
VCOND* functions.

The patch survives bootstrap and regression
tests on x86_64-linux-gnu and ppc64le-linux-gnu.

Thoughts?
Martin
>From 9a8880a601c7820eb2d0c9104367ea454571681e Mon Sep 17 00:00:00 2001
From: Martin Liska 
Date: Mon, 9 Mar 2020 13:23:03 +0100
Subject: [PATCH] Lower VEC_COND_EXPR into internal functions.

gcc/ChangeLog:

2020-03-30  Martin Liska  

	* expr.c (expand_expr_real_2): Put gcc_unreachable; we should not
	reach this path.
	(do_store_flag): Likewise here.
	* internal-fn.c (expand_vect_cond_optab_fn): New.
	(expand_VCOND): Likewise.
	(expand_VCONDU): Likewise.
	(expand_VCONDEQ): Likewise.
	(expand_vect_cond_mask_optab_fn): Likewise.
	(expand_VCOND_MASK): Likewise.
	* internal-fn.def (VCOND): New.
	(VCONDU): Likewise.
	(VCONDEQ): Likewise.
	(VCOND_MASK): Likewise.
	* optabs.c (expand_vec_cond_mask_expr): Removed.
	(expand_vec_cond_expr): Likewise.
	* optabs.h (expand_vec_cond_expr): Likewise.
	(vector_compare_rtx): Likewise.
	* passes.def: Add pass_gimple_isel.
	* tree-cfg.c (verify_gimple_assign_ternary): Add new
	GIMPLE check.
	* tree-pass.h (make_pass_gimple_isel): New.
	* tree-ssa-forwprop.c (pass_forwprop::execute): Do not forward
	to already lowered VEC_COND_EXPR.
	* tree-vect-generic.c (expand_vector_divmod): Expand to SSA_NAME.
	(expand_vector_condition): Expand tcc_comparison of a VEC_COND_EXPR
	into a SSA_NAME.
	(gimple_expand_vec_cond_expr): New.
	(gimple_expand_vec_cond_exprs): New.
	(class pass_gimple_isel): New.
	(make_pass_gimple_isel): New.
---
 gcc/expr.c  |  25 +
 gcc/internal-fn.c   |  98 +
 gcc/internal-fn.def |   5 +
 gcc/optabs.c| 124 +
 gcc/optabs.h|   7 +-
 gcc/passes.def  |   1 +
 gcc/tree-cfg.c  |   8 ++
 gcc/tree-pass.h |   1 +
 gcc/tree-ssa-forwprop.c |   6 +
 gcc/tree-vect-generic.c | 237 +++-
 10 files changed, 358 insertions(+), 154 deletions(-)

diff --git a/gcc/expr.c b/gcc/expr.c
index dfbeae71518..a757394f436 100644
--- a/gcc/expr.c
+++ b/gcc/expr.c
@@ -9205,17 +9205,8 @@ expand_expr_real_2 (sepops ops, rtx target, machine_mode tmode,
   if (temp != 0)
 	return temp;
 
-  /* For vector MIN , expand it a VEC_COND_EXPR 
-	 and similarly for MAX .  */
   if (VECTOR_TYPE_P (type))
-	{
-	  tree t0 = make_tree (type, op0);
-	  tree t1 = make_tree (type, op1);
-	  tree comparison = build2 (code == MIN_EXPR ? LE_EXPR : GE_EXPR,
-type, t0, t1);
-	  return expand_vec_cond_expr (type, comparison, t0, t1,
-   original_target);
-	}
+	gcc_unreachable ();
 
   /* At this point, a MEM target is no longer useful; we will get better
 	 code without it.  */
@@ -9804,10 +9795,6 @@ expand_expr_real_2 (sepops ops, rtx target, machine_mode tmode,
 	return temp;
   }
 
-case VEC_COND_EXPR:
-  target = expand_vec_cond_expr (type, treeop0, treeop1, treeop2, target);
-  return target;
-
 case VEC_DUPLICATE_EXPR:
   op0 = expand_expr (treeop0, NULL_RTX, VOIDmode, modifier);
   target = expand_vector_broadcast (mode, op0);
@@ -12138,8 +12125,7 @@ do_store_flag (sepops ops, rtx target, machine_mode mode)
   STRIP_NOPS (arg1);
 
   /* For vector typed comparisons emit code to generate the desired
- all-ones or all-zeros mask.  Conveniently use the VEC_COND_EXPR
- expander for this.  */
+ all-ones or all-zeros mask.  */
   if (TREE_CODE (ops->type) == VECTOR_TYPE)
 {
   tree ifexp = build2 (ops->code, ops->type, arg0, arg1);
@@ -12147,12 +12133,7 @@ do_store_flag (sepops ops, rtx target, machine_mode mode)
 	  && expand_vec_cmp_expr_p (TREE_TYPE (arg0), ops->type, ops->code))
 	return expand_vec_cmp_expr (ops->type, ifexp, target);
   else
-	{
-	  tree if_true = constant_boolean_node (true, ops->type);
-	  tree if_false = constant_boolean_node (false, ops->type);
-	  return expand_vec_cond_expr (ops->type, ifexp, if_true,
-   if_false, target);
-	}
+	gcc_unreachable ();
 }
 
   /* Optimize (x % C1) == C2 or (x % C1) != C2 if it is beneficial
diff --git a/gcc/internal-fn.c b/gcc/internal-fn.c
index 5e9aa60721e..aa41b4f6870 100644
--- a/gcc/internal-fn.c
+++ b/gcc/internal-fn.c
@@ -49,6 +49,7 @@ along with GCC; see the file COPYING3.  If not see
 #include "gimple-ssa.h"
 #include "tree-phinodes.h"
 #include "ssa-iterators.h"
+#include "explow.h"
 
 /* The names of each internal function, indexed by function number.  */
 const char *const internal_fn_name_array[] = {
@@ -2548,6 +2549,103 @@ expand_mask_store_optab_fn (internal_fn, gcall *stmt, convert_optab optab)
 
 #define expand_mask_store_lanes_optab_fn expand_mask_store_optab_fn
 
+/* Expand VCOND, VCONDU and VCONDEQ internal functions.
+   The expansion of STMT happens based on OPTAB table associated.  */
+
+static void
+expand_vect_cond_optab_fn (internal_fn ifn, gcall *stmt)
+{
+  class expand_operand ops[6];
+  insn_co

Re: [PATCH] [aarch64] Fix PR94591: GCC generates invalid rev64 insns

2020-05-28 Thread Alex Coplan
On 19/05/2020 17:59, Richard Sandiford wrote:
> Alex Coplan  writes:
> > Hello,
> >
> > This patch fixes PR94591. The problem was the function aarch64_evpc_rev_local()
> > matching vector permutations that were not reversals. In particular, prior to
> > this patch, this function matched the identity permutation which led to
> > generating bogus REV64 insns which were rejected by the assembler.
> >
> > Testing:
> >  - New regression test which passes after applying the patch.
> >  - New test passes on an x64 -> aarch64-none-elf cross.
> >  - Bootstrap and regtest on aarch64-linux-gnu.
> >
> > OK to install?
> >
> > Thanks,
> > Alex
> >
> > ---
> >
> > gcc/ChangeLog:
> >
> > 2020-05-19  Alex Coplan  
> >
> > PR target/94591
> > * config/aarch64/aarch64.c (aarch64_evpc_rev_local): Don't match
> > identity permutation.
> >
> > gcc/testsuite/ChangeLog:
> >
> > 2020-05-19  Alex Coplan  
> >
> > PR target/94591
> > * gcc.c-torture/execute/pr94591.c: New test.
> 
> OK, thanks.
> 
> Richard

I've just tested this patch on gcc-{8,9,10} release branches:
bootstraps+regtests on aarch64-linux-gnu came back clean.

Since this was a regression introduced in GCC 8, is it OK to backport
the fix to those release branches now?

Thanks,
Alex

> 
> > diff --git a/gcc/config/aarch64/aarch64.c b/gcc/config/aarch64/aarch64.c
> > index 70aa2f752b5..79c016f4dc3 100644
> > --- a/gcc/config/aarch64/aarch64.c
> > +++ b/gcc/config/aarch64/aarch64.c
> > @@ -20191,7 +20191,8 @@ aarch64_evpc_rev_local (struct expand_vec_perm_d *d)
> >  
> >    if (d->vec_flags == VEC_SVE_PRED
> >        || !d->one_vector_p
> > -      || !d->perm[0].is_constant (&diff))
> > +      || !d->perm[0].is_constant (&diff)
> > +      || !diff)
> >      return false;
> >  
> >    size = (diff + 1) * GET_MODE_UNIT_SIZE (d->vmode);
> > diff --git a/gcc/testsuite/gcc.c-torture/execute/pr94591.c 
> > b/gcc/testsuite/gcc.c-torture/execute/pr94591.c
> > new file mode 100644
> > index 000..42271ad8bce
> > --- /dev/null
> > +++ b/gcc/testsuite/gcc.c-torture/execute/pr94591.c
> > @@ -0,0 +1,32 @@
> > +typedef unsigned __attribute__((__vector_size__(8))) V2SI_u;
> > +typedef int __attribute__((__vector_size__(8))) V2SI_d;
> > +
> > +typedef unsigned long __attribute__((__vector_size__(16))) V2DI_u;
> > +typedef long __attribute__((__vector_size__(16))) V2DI_d;
> > +
> > +void id_V2SI(V2SI_d *v)
> > +{
> > +  *v = __builtin_shuffle(*v, (V2SI_d)(V2SI_u) { 0, 1 });
> > +}
> > +
> > +void id_V2DI(V2DI_d *v)
> > +{
> > +  *v = __builtin_shuffle(*v, (V2DI_d)(V2DI_u) { 0, 1 });
> > +}
> > +
> > +extern void abort(void);
> > +
> > +int main(void)
> > +{
> > +  V2SI_d si = { 35, 42 };
> > +  id_V2SI(&si);
> > +
> > +  if (si[0] != 35 || si[1] != 42)
> > +    abort();
> > +
> > +  V2DI_d di = { 63, 38 };
> > +  id_V2DI(&di);
> > +
> > +  if (di[0] != 63 || di[1] != 38)
> > +    abort();
> > +}


Re: [PATCH 1/2] Introduce flag_cunroll_grow_size for cunroll

2020-05-28 Thread Richard Biener via Gcc-patches
On Thu, May 28, 2020 at 4:37 PM Jiufu Guo  wrote:
>
> Richard Biener  writes:
>
> > On Thu, May 28, 2020 at 10:52 AM guojiufu  wrote:
> >>
> >> From: Jiufu Guo 
> >>
> >> Currently the GIMPLE complete unroller (cunroll) checks
> >> flag_unroll_loops and flag_peel_loops to decide whether to allow size growth.
> >> Besides affecting cunroll, flag_unroll_loops also controls the RTL unroller.
> >> To have more freedom to control cunroll and the RTL unroller, this patch
> >> introduces flag_cunroll_grow_size.  With this patch, we can control
> >> cunroll and the RTL unroller independently.
> >>
> >> Bootstrap/regtest pass on powerpc64le. OK for trunk? And backport to
> >> GCC10 after a week?
> >>
> >>
> >> +funroll-completely-grow-size
> >> +Var(flag_cunroll_grow_size) Init(2)
> >> +; Control cunroll to allow size growth during complete unrolling
> >> +
> >
> > So this really adds a new compiler option which would need
> > documenting.
> I once added 'Undocumented' (to avoid it being shown in --help), and did
> not add 'Common' (to avoid --help=common).  What I want is to avoid
> exposing this to users.
> Still, it is an option, as you said.
>
> >
> > I fear we'll get into bikeshed territory here as well...  I originally thought
> > we can use
> >
> > Variable
> > int flag_cunroll_grow_size;
>
> Thanks, this is definitely a variable rather than an option.  I will try
> this way.
> >
> > but now realize that does not work well with LTO without adjusting
> > the awk scripts to generate option saving/restoring.  For your patch
> > you'd need to add 'Optimization' to get the flag streamed properly,
> > you should also verify the target adjustment done in the backend
> > is reflected in LTO mode.
>
> Here, for an internal option, it is relatively simple: 'Optimization' could help.
> When trying 'Variable', I will verify it in LTO mode.

It won't work without adjusting the awk scripts.  So go with

funroll-completely-grow-size
Undocumented Optimization Var(flag_cunroll_grow_size)
EnabledBy(funroll-loops || fpeel-loops)
; ...

and enable it at O3+.  AUTODETECT_VALUE doesn't make sense for
an option not supposed to be set by users?


> >
> >>  ; Nonzero means that loop optimizer may assume that the induction variables
> >>
> >> +  /* Allow cunroll to grow size accordingly.  */
> >> +  if (flag_cunroll_grow_size == AUTODETECT_VALUE)
> >> +flag_cunroll_grow_size = flag_unroll_loops || flag_peel_loops;
> >> +
> >
> > Any reason to not use EnabledBy(funroll-loops || fpeel-loops)?
>
> With tests and by checking the generated code (e.g. options.c), I found that
> this setting has some unexpected behavior:
> for example, "-funroll-loops -fno-peel-loops" turns off the flag.
> "||" means the flag will be turned _on/off_ by -f[no-]unroll-loops or
> -f[no-]peel-loops.
>
> >
> >>/* web and rename-registers help when run after loop unrolling.  */
> >>if (flag_web == AUTODETECT_VALUE)
> >>  flag_web = flag_unroll_loops;
>
> >> -  unsigned int val = tree_unroll_loops_completely (flag_unroll_loops
> >> -  || flag_peel_loops
> >> +  unsigned int val = tree_unroll_loops_completely (flag_cunroll_grow_size
> >>|| optimize >= 3, true);
> >
> > Given we check optimize >= 3 here please enable the flag by default
> > at O3+ via opts.c:default_options_table and also elide the optimize >= 3
> > check.  That way -fno-unroll-completely-grow-size would have the desired 
> > effect.
> >
> Actually, in the code flag_peel_loops is enabled at O3+, so "|| optimize >=
> 3" could be removed.  As you said, this lets the negative form take effect
> even at -O3.

You are right.

> > Now back to the option name ... if we expose the option we should apply
> > some forward looking.  Currently cunroll cannot be disabled or enabled
> > with a flag and the desired new flag simply tunes one knob on it.  How
> > about adding
> >
> > -fcomplete-unroll-loops[=may-grow]
> -fcomplete-unroll-loops[=may-grow|inner|outer]
> >
> > to be able to further extend this later (there's the knob to only unroll
> > non-outermost loops and the knob whether to unroll loops where
> > intermediate exits are not statically predicted - incompletely controlled
> > by -fpeel-loops).  There are unfortunately no existing examples that allow
> > multiple flags like -fcomplete-unroll-loops=may-grow,outer other than
> > the sanitizers which have manual option parsing.
> >
> > So if there's no good suggestion from option folks maybe go with
> >
> > -fcomplete-unroll-loops-may-grow
> >
> > (ick).  And on a second thought -fcomplete-unroll-loops[=...] should
> > be -funroll-loops[={complete,may-grow,all}] to cover all unrolling
> > bases?
> >
> > I really hate to explode the number of options users have to
> > consider optimizing their code ...
> >
> > So if we can defer all this thinking and make a non-option flag
> > variable work that would be best IMHO.
> Yes, a few things need trade-offs when designing user options.
>
>
> Jiufu
> >
> 

Re: [PATCH PR95254] aarch64: gcc generate inefficient code with fixed sve vector length

2020-05-28 Thread Richard Sandiford
"Yangfei (Felix)"  writes:
> Thanks for reviewing this.
> Attached please find the v5 patch.
> Note: we also need to modify the local variable "mode" once we catch one
> case.  I see a test failure without this change.

Looks good.  Patch is OK assuming the x86 folks don't want to rewrite
gcc.target/i386/pr67609.c to avoid the new optimisation.  I'll hold off
applying until the AVX512 thing is sorted.

> Bootstrapped and tested on aarch64-linux-gnu.
> Also bootstrapped on x86_64-linux-gnu.  The regression test shows two
> failing tests on this platform:

Thanks for the extra testing.

> 1> FAIL: gcc.target/i386/avx512f-vcvtps2ph-2.c (test for excess errors)
> 2> FAIL: gcc.target/i386/pr67609.c scan-assembler movdqa
>
> I have adjusted the second one in the v4 patch.

So this is:

movdqa  reg(%rip), %xmm1
movaps  %xmm1, -24(%rsp)
movsd   %xmm0, -24(%rsp)
movapd  -24(%rsp), %xmm2
movaps  %xmm2, reg(%rip)
ret

to:

movq    %xmm0, reg(%rip)
ret

Nice.  I think it's safe to say that's an improvement :-)

I don't know whether this means we're no longer testing what the test
was intended to test.  Maybe one of the x86 folks has an opinion about
whether we should instead rewrite the test somehow.

> But the first one looks strange to me.
> I see gcc emit invalid x86 vcvtps2ph instructions which look like:
>
> 125 vcvtps2ph   $0, %zmm0, -112(%rbp){%k1}
> 126 vcvtps2ph   $0, %zmm0, -80(%rbp){%k1}{z}
>
> This happens in the combine phase, where an combined insn looks like:
>
> 1989 Trying 31 -> 33:
> 1990    31: r106:V16HI=vec_merge(unspec[r103:V16SF,0] 133,[frame:DI-0x60],r109:HI)
> 1991    33: [frame:DI-0x60]=r106:V16HI
> 1992   REG_DEAD r106:V16HI
> 1993 Successfully matched this instruction:
> 1994 (set (mem/j/c:V16HI (plus:DI (reg/f:DI 19 frame)
> 1995 (const_int -96 [0xffa0])) [2 res2.x+0 S32 A256])
> 1996 (vec_merge:V16HI (unspec:V16HI [
> 1997 (reg:V16SF 103)
> 1998 (const_int 0 [0])
> 1999 ] UNSPEC_VCVTPS2PH)
> 2000 (mem/j/c:V16HI (plus:DI (reg/f:DI 19 frame)
> 2001 (const_int -96 [0xffa0])) [2 res2.x+0 S32 A256])
> 2002 (reg:HI 109)))
> 2003 allowing combination of insns 31 and 33
> 2004 original costs 16 + 4 = 20
> 2005 replacement cost 16
> 2006 deferring deletion of insn with uid = 31.
> 2007 modifying insn i3    33: [frame:DI-0x60]=vec_merge(unspec[r103:V16SF,0] 133,[frame:DI-0x60],r109:HI)
> 2008 deferring rescan insn with uid = 33.
>
> And this can be matched with pattern: avx512f_vcvtps2ph512_mask
> 2282 (insn 33 31 37 4 (set (mem/j/c:V16HI (plus:DI (reg/f:DI 19 frame)
> 2283 (const_int -96 [0xffa0])) [2 res2.x+0 S32 A256])
> 2284 (vec_merge:V16HI (unspec:V16HI [
> 2285 (reg:V16SF 103)
> 2286 (const_int 0 [0])
> 2287 ] UNSPEC_VCVTPS2PH)
> 2288 (mem/j/c:V16HI (plus:DI (reg/f:DI 19 frame)
> 2289 (const_int -96 [0xffa0])) [2 res2.x+0 S32 A256])
> 2290 (reg:HI 109))) "avx512f-vcvtps2ph-2.c":80:10 5324 {avx512f_vcvtps2ph512_mask}
> 2291  (nil))
>
> gcc/config/i386/sse.md:
> 21663 (define_insn "avx512f_vcvtps2ph512"
> 21664   [(set (match_operand:V16HI 0 "nonimmediate_operand" "=vm")
> 21665 (unspec:V16HI
> 21666   [(match_operand:V16SF 1 "register_operand" "v")
> 21667(match_operand:SI 2 "const_0_to_255_operand" "N")]
> 21668   UNSPEC_VCVTPS2PH))]
> 21669   "TARGET_AVX512F"
> 21670   "vcvtps2ph\t{%2, %1, %0|%0, %1, %2}"
> 21671   [(set_attr "type" "ssecvt")
> 21672(set_attr "prefix" "evex")
> 21673(set_attr "mode" "V16SF")])
>
> How can that happen? 

This is due to define_subst magic.  The generators automatically
create a vec_merge form of the instruction based on the information
in the  attributes.

AFAICT the rtl above is for the line-125 instruction, which looks ok.
The problem is the line-126 instruction, since vcvtps2ph doesn't
AIUI allow zero masking.
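For readers less familiar with AVX-512 masking, the difference between the two outputs can be modeled with RTL `vec_merge` semantics: lane i comes from the first operand when bit i of the mask is set, and from the second operand otherwise. The C++ sketch below is only a model with hypothetical helper names, not GCC code. A merge-masked store (`{%k1}`) corresponds to `vec_merge (src, old_mem, k)`, so masked-off lanes keep their old memory contents; zero-masking (`{%k1}{z}`) corresponds to `vec_merge (src, 0, k)`. As far as I know, AVX-512 only permits merge-masking when the destination is memory, which would explain why the line-126 form is invalid.

```cpp
#include <array>
#include <cassert>
#include <cstddef>
#include <cstdint>

using Lane = std::uint16_t;  // V16HI lanes are 16 bits wide

// Model of RTL (vec_merge:M op0 op1 mask): lane i comes from op0 when bit i
// of mask is set, and from op1 otherwise.
template <std::size_t N>
std::array<Lane, N> vec_merge(const std::array<Lane, N>& op0,
                              const std::array<Lane, N>& op1,
                              std::uint32_t mask) {
  static_assert(N <= 32, "mask has at most 32 bits");
  std::array<Lane, N> result{};
  for (std::size_t i = 0; i < N; ++i)
    result[i] = ((mask >> i) & 1u) ? op0[i] : op1[i];
  return result;
}

// Merge-masking ({%k1}): masked-off lanes keep the destination's old value.
// Assumed here to be the only masking form allowed for a memory destination.
template <std::size_t N>
std::array<Lane, N> merge_masked(const std::array<Lane, N>& src,
                                 const std::array<Lane, N>& old_dest,
                                 std::uint32_t mask) {
  return vec_merge(src, old_dest, mask);
}

// Zero-masking ({%k1}{z}): masked-off lanes become zero.
template <std::size_t N>
std::array<Lane, N> zero_masked(const std::array<Lane, N>& src,
                                std::uint32_t mask) {
  return vec_merge(src, std::array<Lane, N>{}, mask);
}

// Usage: mask 0b0101 selects lanes 0 and 2 from src.
void demo_masking() {
  std::array<Lane, 4> src{1, 2, 3, 4};
  std::array<Lane, 4> old_dest{9, 9, 9, 9};
  assert((merge_masked(src, old_dest, 0x5u) == std::array<Lane, 4>{1, 9, 3, 9}));
  assert((zero_masked(src, 0x5u) == std::array<Lane, 4>{1, 0, 3, 0}));
}
```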

The "mask" define_subst allows both zeroing and merging,
so I guess this means that the pattern should either be using
a different define_subst, or should be enforcing merging in
some other way.  Please could one of the x86 devs take a look?

Thanks,
Richard


Re: [pushed] c++: Handle multiple aggregate overloads [PR95319].

2020-05-28 Thread Jason Merrill via Gcc-patches
On Thu, May 28, 2020 at 10:16 AM Marek Polacek  wrote:

> On Thu, May 28, 2020 at 07:05:47AM -0700, H.J. Lu wrote:
> > On Thu, May 28, 2020 at 6:57 AM Marek Polacek  wrote:
> > >
> > > On Thu, May 28, 2020 at 06:44:53AM -0700, H.J. Lu via Gcc-patches wrote:
> > > > On Wed, May 27, 2020 at 12:07 PM Jason Merrill via Gcc-patches
> > > >  wrote:
> > > > >
> > > > > Here, when considering the two 'insert' overloads, we look for aggregate
> > > > > conversions from the same initializer-list to B<3> or
> > > > > initializer_list>.  But since my fix for reshape_init overhead on the
> > > > > PR14179 testcase we reshaped the initializer-list directly, leading to an
> > > > > error when we then tried to reshape it differently for the second overload.
> > > > >
> > > > > Tested x86_64-pc-linux-gnu, applying to trunk and 10.
> > > > >
> > > > > gcc/cp/ChangeLog:
> > > > >
> > > > > PR c++/95319
> > > > > * decl.c (reshape_init_array_1): Don't reuse in overload context.
> > > > >
> > > > > gcc/testsuite/ChangeLog:
> > > > >
> > > > > PR c++/95319
> > > > > * g++.dg/cpp0x/initlist-array12.C: New test.
> > > >
> > > > I got
> > > >
> > > > FAIL: g++.dg/cpp0x/initlist-array12.C  -std=c++14 (test for excess errors)
> > > > FAIL: g++.dg/cpp0x/initlist-array12.C  -std=c++17 (test for excess errors)
> > > > FAIL: g++.dg/cpp0x/initlist-array12.C  -std=c++2a (test for excess errors)
> > >
> > > This can be fixed with the attached patch.  Jason, is it OK?
> > >
> > > > FAIL: g++.dg/ext/tmplattr10.C  -std=c++98 (test for excess errors)
> > >
> > > But I don't know why this one fails.
> >
> > [hjl@gnu-cfl-2 gcc]$ /export/build/gnu/tools-build/gcc-debug/build-x86_64-linux/gcc/testsuite/g++/../../xg++
> > -B/export/build/gnu/tools-build/gcc-debug/build-x86_64-linux/gcc/testsuite/g++/../../
> > /export/gnu/import/git/sources/gcc/gcc/testsuite/g++.dg/ext/tmplattr10.C
> > -m32 -fno-diagnostics-show-caret -fno-diagnostics-show-line-numbers
> > -fdiagnostics-color=never -fdiagnostics-urls=never -nostdinc++
> > -I/export/build/gnu/tools-build/gcc-debug/build-x86_64-linux/x86_64-pc-linux-gnu/32/libstdc++-v3/include/x86_64-pc-linux-gnu
> > -I/export/build/gnu/tools-build/gcc-debug/build-x86_64-linux/x86_64-pc-linux-gnu/32/libstdc++-v3/include
> > -I/export/gnu/import/git/sources/gcc/libstdc++-v3/libsupc++
> > -I/export/gnu/import/git/sources/gcc/libstdc++-v3/include/backward
> > -I/export/gnu/import/git/sources/gcc/libstdc++-v3/testsuite/util
> > -fmessage-length=0 -std=c++98 -pedantic-errors -Wno-long-long -S -o
> > tmplattr10.s
> > /export/gnu/import/git/sources/gcc/gcc/testsuite/g++.dg/ext/tmplattr10.C:15:31: error: variadic templates only available with ‘-std=c++11’ or ‘-std=gnu++11’
> > /export/gnu/import/git/sources/gcc/gcc/testsuite/g++.dg/ext/tmplattr10.C:16:28: error: variadic templates only available with ‘-std=c++11’ or ‘-std=gnu++11’
> > /export/gnu/import/git/sources/gcc/gcc/testsuite/g++.dg/ext/tmplattr10.C:18:11: error: expected nested-name-specifier before ‘type’
> > /export/gnu/import/git/sources/gcc/gcc/testsuite/g++.dg/ext/tmplattr10.C:21:31: error: variadic templates only available with ‘-std=c++11’ or ‘-std=gnu++11’
> > /export/gnu/import/git/sources/gcc/gcc/testsuite/g++.dg/ext/tmplattr10.C:22:39: error: variadic templates only available with ‘-std=c++11’ or ‘-std=gnu++11’
> > /export/gnu/import/git/sources/gcc/gcc/testsuite/g++.dg/ext/tmplattr10.C:24:11: error: expected nested-name-specifier before ‘type’
> > /export/gnu/import/git/sources/gcc/gcc/testsuite/g++.dg/ext/tmplattr10.C:27:31: error: variadic templates only available with ‘-std=c++11’ or ‘-std=gnu++11’
> > /export/gnu/import/git/sources/gcc/gcc/testsuite/g++.dg/ext/tmplattr10.C:28:38: error: variadic templates only available with ‘-std=c++11’ or ‘-std=gnu++11’
> > /export/gnu/import/git/sources/gcc/gcc/testsuite/g++.dg/ext/tmplattr10.C:30:11: error: expected nested-name-specifier before ‘type’
> > /export/gnu/import/git/sources/gcc/gcc/testsuite/g++.dg/ext/tmplattr10.C:34:6: error: ISO C++ forbids declaration of ‘wrap’ with no type [-fpermissive]
> > /export/gnu/import/git/sources/gcc/gcc/testsuite/g++.dg/ext/tmplattr10.C:34:1: error: top-level declaration of ‘wrap’ specifies ‘auto’
> > /export/gnu/import/git/sources/gcc/gcc/testsuite/g++.dg/ext/tmplattr10.C:34:51: error: trailing return type only available with ‘-std=c++11’ or ‘-std=gnu++11’
> > /export/gnu/import/git/sources/gcc/gcc/testsuite/g++.dg/ext/tmplattr10.C: In function ‘int main()’:
> > /export/gnu/import/git/sources/gcc/gcc/testsuite/g++.dg/ext/tmplattr10.C:51:12: error: ‘wrap’ was not declared in this scope
> > [hjl@gnu-cfl-2 gcc]$
>
> I see, thanks.  Jason, is this one OK too, then?
>

Yes, thanks.  I wonder what was wrong with my testing setup that I didn't
see these myself...


> Tested
> make check-g++ RUNTESTFLAG

[PATCH] c++: lambdas inside constraints [PR92652]

2020-05-28 Thread Patrick Palka via Gcc-patches
When parsing a constraint-expression, a requires-clause or a
requires-expression, we temporarily increment processing_template_decl
so that we always obtain template trees which we later reduce via
substitution even when not inside a template.

But incrementing processing_template_decl when we're already inside a
template has the unintended side effect of shifting up the template
parameter levels of a lambda defined inside one of these constructs,
which leads to confusion later during substitution into the lambda.

This patch fixes this issue by incrementing processing_template_decl
during parsing of these constructs only if it is 0.
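The difference between the old and new code can be seen in a toy model. The C++ sketch below uses a simplified RAII class in the spirit of the front end's `temp_override` (the class here is a hypothetical reimplementation, not the real one) and a global counter named after `processing_template_decl`. Inside a template, where the counter is already 1, the old `++`/`--` scheme makes nested code see level 2, while the patched scheme leaves it at 1 and restores the saved value when the scope ends.

```cpp
#include <cassert>

// Simplified RAII helper: saves a variable on construction and restores it
// on destruction, so every return path restores the old value.
template <typename T>
class temp_override_sketch {
 public:
  explicit temp_override_sketch(T& var) : var_(var), saved_(var) {}
  ~temp_override_sketch() { var_ = saved_; }
  temp_override_sketch(const temp_override_sketch&) = delete;
  temp_override_sketch& operator=(const temp_override_sketch&) = delete;

 private:
  T& var_;
  T saved_;
};

// Hypothetical stand-in for the parser's global counter.
int processing_template_decl = 0;

// Old behaviour: always increment, so inside a template the level goes up.
int level_seen_old() {
  ++processing_template_decl;
  int seen = processing_template_decl;
  --processing_template_decl;
  return seen;
}

// Patched behaviour: only set the counter to 1 when it is 0.
int level_seen_new() {
  temp_override_sketch<int> ovr(processing_template_decl);
  if (!processing_template_decl)
    processing_template_decl = 1;
  return processing_template_decl;  // read before ovr's destructor restores it
}

// Usage: inside a template (counter already 1) only the old scheme bumps it.
void demo_levels() {
  int saved = processing_template_decl;
  processing_template_decl = 1;
  assert(level_seen_old() == 2);
  assert(level_seen_new() == 1);
  processing_template_decl = saved;
}
```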

Passes 'make check-c++' and was also tested by building cmcstl2.  Does this
look OK to commit after a full bootstrap/regtest?

gcc/cp/ChangeLog:

PR c++/92652
PR c++/93698
PR c++/94128
* parser.c (cp_parser_requires_clause_expression): Temporarily
increment processing_template_decl only if it is 0.
(cp_parser_constraint_expression): Likewise.
(cp_parser_requires_expression): Likewise.

gcc/testsuite/ChangeLog:

PR c++/92652
PR c++/93698
PR c++/94128
* g++.dg/cpp2a/concepts-lambda8.C: New test.
* g++.dg/cpp2a/concepts-lambda9.C: New test.
* g++.dg/cpp2a/concepts-lambda10.C: New test.
---
 gcc/cp/parser.c| 15 +--
 gcc/testsuite/g++.dg/cpp2a/concepts-lambda10.C |  7 +++
 gcc/testsuite/g++.dg/cpp2a/concepts-lambda8.C  | 11 +++
 gcc/testsuite/g++.dg/cpp2a/concepts-lambda9.C  | 11 +++
 4 files changed, 38 insertions(+), 6 deletions(-)
 create mode 100644 gcc/testsuite/g++.dg/cpp2a/concepts-lambda10.C
 create mode 100644 gcc/testsuite/g++.dg/cpp2a/concepts-lambda8.C
 create mode 100644 gcc/testsuite/g++.dg/cpp2a/concepts-lambda9.C

diff --git a/gcc/cp/parser.c b/gcc/cp/parser.c
index 54ca875ce54..3bca1f3770a 100644
--- a/gcc/cp/parser.c
+++ b/gcc/cp/parser.c
@@ -27663,11 +27663,12 @@ static tree
 cp_parser_requires_clause_expression (cp_parser *parser, bool lambda_p)
 {
   processing_constraint_expression_sentinel parsing_constraint;
-  ++processing_template_decl;
+  temp_override ovr (processing_template_decl);
+  if (!processing_template_decl)
+processing_template_decl = 1;
   cp_expr expr = cp_parser_constraint_logical_or_expression (parser, lambda_p);
   if (check_for_bare_parameter_packs (expr))
 expr = error_mark_node;
-  --processing_template_decl;
   return expr;
 }
 
@@ -27684,12 +27685,13 @@ static tree
 cp_parser_constraint_expression (cp_parser *parser)
 {
   processing_constraint_expression_sentinel parsing_constraint;
-  ++processing_template_decl;
+  temp_override ovr (processing_template_decl);
+  if (!processing_template_decl)
+processing_template_decl = 1;
   cp_expr expr = cp_parser_binary_expression (parser, false, true,
  PREC_NOT_OPERATOR, NULL);
   if (check_for_bare_parameter_packs (expr))
 expr = error_mark_node;
-  --processing_template_decl;
   expr.maybe_add_location_wrapper ();
   return expr;
 }
@@ -27798,9 +27800,10 @@ cp_parser_requires_expression (cp_parser *parser)
   parms = NULL_TREE;
 
 /* Parse the requirement body. */
-++processing_template_decl;
+temp_override ovr (processing_template_decl);
+if (!processing_template_decl)
+  processing_template_decl = 1;
 reqs = cp_parser_requirement_body (parser);
---processing_template_decl;
 if (reqs == error_mark_node)
   return error_mark_node;
   }
diff --git a/gcc/testsuite/g++.dg/cpp2a/concepts-lambda10.C b/gcc/testsuite/g++.dg/cpp2a/concepts-lambda10.C
new file mode 100644
index 000..392da312b28
--- /dev/null
+++ b/gcc/testsuite/g++.dg/cpp2a/concepts-lambda10.C
@@ -0,0 +1,7 @@
+// PR c++/94128
+// { dg-do compile { target c++20 } }
+
+void test(auto param)
+requires requires{ { [](auto p){return p;}(param) }; };
+
+void test2() { test(1); }
diff --git a/gcc/testsuite/g++.dg/cpp2a/concepts-lambda8.C b/gcc/testsuite/g++.dg/cpp2a/concepts-lambda8.C
new file mode 100644
index 000..c1c9be682d0
--- /dev/null
+++ b/gcc/testsuite/g++.dg/cpp2a/concepts-lambda8.C
@@ -0,0 +1,11 @@
+// PR c++/92652
+// { dg-do compile { target concepts } }
+
+template < typename T >
+requires ([]{return true ;}())
+void h() { }
+
+int main()
+{
+h();
+}
diff --git a/gcc/testsuite/g++.dg/cpp2a/concepts-lambda9.C b/gcc/testsuite/g++.dg/cpp2a/concepts-lambda9.C
new file mode 100644
index 000..6b81ba0adac
--- /dev/null
+++ b/gcc/testsuite/g++.dg/cpp2a/concepts-lambda9.C
@@ -0,0 +1,11 @@
+// PR c++/93698
+// { dg-do compile { target concepts } }
+
+#include 
+
+template 
+concept foo = [](std::index_sequence) constexpr {
+  return (Is + ...) > 10;
+}(std::make_index_sequence());
+
+bool a = foo<7>;
-- 
2.27.0.rc1.5.gae92ac8ae3



[PATCH]: aarch64: add support for unpacked EOR, ORR and AND

2020-05-28 Thread Joe Ramsay


From: Joe Ramsay 
Date: Thursday, 28 May 2020 at 16:19
To: Gcc-patches 
Subject: [PATCH]: aarch64: add support for unpacked EOR, ORR and AND

Hi!

This patch improves code generation for EOR, ORR and AND on unpacked vectors 
with SVE. The following function:
void f (unsigned int *x, unsigned short *y, unsigned short *z) {
  for (int i = 0; i < 7; ++i)
    x[i] = (unsigned short) (y[i] & z[i]);
}

previously compiled to
ptrue   p1.d, vl3
ld1h    z0.d, p1/z, [x1, #1, mul vl]
ptrue   p0.b, vl32
st1h    z0.d, p0, [sp, #1, mul vl]
ld1h    z0.d, p1/z, [x2, #1, mul vl]
st1h    z0.d, p0, [sp]
ldr     x3, [x2]
ldp     x4, x2, [sp]
ldr     x1, [x1]
and     x1, x3, x1
and     x2, x2, x4
str     x2, [sp]
ld1h    z0.d, p0/z, [sp]
str     x1, [sp]
uxth    z0.s, p0/m, z0.s
st1w    z0.d, p1, [x0, #1, mul vl]
ld1h    z0.d, p0/z, [sp]
uxth    z0.s, p0/m, z0.s
st1w    z0.d, p0, [x0]
add     sp, sp, 16
ret

and now compiles to:
ptrue   p0.s, vl7
ptrue   p1.b, vl32
ld1h    z1.s, p0/z, [x1]
ld1h    z0.s, p0/z, [x2]
add     z0.h, z0.h, z1.h
uxth    z0.s, p1/m, z0.s
st1w    z0.s, p0, [x0]
ret
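The improvement relies on each 16-bit element living in the low half of a wider container ("unpacked"), as with the `ld1h z1.s` loads above. A bitwise AND, ORR or EOR only combines bits in the same position, so the low 16 bits of the result depend only on the low 16 bits of the inputs. The C++ sketch below models this for the AND example. The function names are hypothetical, and filling the upper container bits with junk is an assumption meant to mirror the idea that those bits need not be defined before the final zero extension.

```cpp
#include <cassert>
#include <cstdint>

// Scalar reference: the loop from the patch description.
void f_ref(unsigned int *x, const unsigned short *y, const unsigned short *z) {
  for (int i = 0; i < 7; ++i)
    x[i] = (unsigned short) (y[i] & z[i]);
}

// Unpacked model: each 16-bit element sits in the low half of a 32-bit
// container.  The upper bits are deliberately filled with junk to show that
// the result does not depend on them.
void f_unpacked_model(unsigned int *x, const unsigned short *y,
                      const unsigned short *z) {
  for (int i = 0; i < 7; ++i) {
    std::uint32_t a = 0xABCD0000u | y[i];  // junk in bits 16-31
    std::uint32_t b = 0x12340000u | z[i];
    std::uint32_t r = a & b;               // bitwise op on the whole container
    x[i] = r & 0xFFFFu;                    // like UXTH: keep the low 16 bits
  }
}

// Usage: both versions agree on a sample input.
void demo_unpacked() {
  const unsigned short y[7] = {0xFFFF, 0x1234, 0x0F0F, 0, 0x8000, 0xAAAA, 0x5555};
  const unsigned short z[7] = {0x00FF, 0xFFFF, 0xF0F0, 0xFFFF, 0x8001, 0x5555, 0x5555};
  unsigned int r1[7], r2[7];
  f_ref(r1, y, z);
  f_unpacked_model(r2, y, z);
  for (int i = 0; i < 7; ++i)
    assert(r1[i] == r2[i]);
}
```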

Tested on aarch64-linux-gnu and x86_64-linux-gnu hosts.

Thanks,
Joe

2020-05-20  Joe Ramsay  

* config/aarch64/aarch64-sve.md (3):
Add support for unpacked EOR, ORR, AND.

gcc/testsuite/ChangeLog

2020-05-20  Joe Ramsay  

* gcc.target/aarch64/sve/load_const_offset_2.c: Force using packed
vectors.
* gcc.target/aarch64/sve/logical_unpacked_and_1.c: New test.
* gcc.target/aarch64/sve/logical_unpacked_and_2.c: New test.
* gcc.target/aarch64/sve/logical_unpacked_and_3.c: New test.
* gcc.target/aarch64/sve/logical_unpacked_and_4.c: New test.
* gcc.target/aarch64/sve/logical_unpacked_and_5.c: New test.
* gcc.target/aarch64/sve/logical_unpacked_and_6.c: New test.
* gcc.target/aarch64/sve/logical_unpacked_and_7.c: New test.
* gcc.target/aarch64/sve/logical_unpacked_eor_1.c: New test.
* gcc.target/aarch64/sve/logical_unpacked_eor_2.c: New test.
* gcc.target/aarch64/sve/logical_unpacked_eor_3.c: New test.
* gcc.target/aarch64/sve/logical_unpacked_eor_4.c: New test.
* gcc.target/aarch64/sve/logical_unpacked_eor_5.c: New test.
* gcc.target/aarch64/sve/logical_unpacked_eor_6.c: New test.
* gcc.target/aarch64/sve/logical_unpacked_eor_7.c: New test.
* gcc.target/aarch64/sve/logical_unpacked_orr_1.c: New test.
* gcc.target/aarch64/sve/logical_unpacked_orr_2.c: New test.
* gcc.target/aarch64/sve/logical_unpacked_orr_3.c: New test.
* gcc.target/aarch64/sve/logical_unpacked_orr_4.c: New test.
* gcc.target/aarch64/sve/logical_unpacked_orr_5.c: New test.
* gcc.target/aarch64/sve/logical_unpacked_orr_6.c: New test.
* gcc.target/aarch64/sve/logical_unpacked_orr_7.c: New test.
* gcc.target/aarch64/sve/scatter_store_6.c: Force using packed vectors.
* gcc.target/aarch64/sve/scatter_store_7.c: Force using packed vectors.
* gcc.target/aarch64/sve/strided_load_3.c: Force using packed vectors.
* gcc.target/aarch64/sve/strided_store_3.c: Force using packed vectors.
* gcc.target/aarch64/sve/unpack_signed_1.c: Force using packed vectors.

0001-Support-AND-ORR-EOR-on-unpacked-vectors.patch
Description: 0001-Support-AND-ORR-EOR-on-unpacked-vectors.patch


Re: [stage1][PATCH] Lower VEC_COND_EXPR into internal functions.

2020-05-28 Thread Richard Sandiford
Martin Liška  writes:
> Hi.
>
> There's a new patch that adds normal internal functions for the 4
> VCOND* functions.
>
> The patch survives bootstrap and regression
> tests on x86_64-linux-gnu and ppc64le-linux-gnu.

I think this has the same problem as the previous one.  What I meant
in yesterday's message is that:

  expand_insn (icode, 6, ops);

is simply not valid when icode is allowed to FAIL.  That's true in
any context, not just internal functions.  If icode does FAIL,
the expand_insn call will ICE:

  if (!maybe_expand_insn (icode, nops, ops))
    gcc_unreachable ();

When using optabs you either:

(a) declare that the md patterns aren't allowed to FAIL.  expand_insn
is for this case.

(b) allow the md patterns to FAIL and provide a fallback when they do.
maybe_expand_insn is for this case.

So if we keep IFN_VCOND, we need to use maybe_expand_insn and find some
way of implementing the IFN_VCOND when the pattern FAILs.
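The two contracts can be illustrated outside GCC with a toy model. In the C++ sketch below, `expand_pattern` is a hypothetical stand-in for an expander that may FAIL, reporting it by returning false much as `maybe_expand_insn` does; the other two functions correspond to options (a) and (b). None of these names are GCC APIs.

```cpp
#include <cassert>
#include <cstdio>
#include <cstdlib>

// Hypothetical stand-in for an md expander that is allowed to FAIL.
// Returning false models FAIL.
bool expand_pattern(int op, bool pattern_can_handle, int *result) {
  if (!pattern_can_handle)
    return false;
  *result = op * 2;  // pretend this is what the emitted insn computes
  return true;
}

// (a) The caller assumes the pattern never FAILs, like expand_insn, which
// hits gcc_unreachable () when the expansion fails.
int expand_must_succeed(int op, bool pattern_can_handle) {
  int result;
  if (!expand_pattern(op, pattern_can_handle, &result)) {
    std::fputs("internal compiler error: pattern FAILed\n", stderr);
    std::abort();
  }
  return result;
}

// (b) The caller tolerates FAIL and provides a fallback sequence.
int expand_with_fallback(int op, bool pattern_can_handle) {
  int result;
  if (expand_pattern(op, pattern_can_handle, &result))
    return result;
  return op + op;  // hypothetical open-coded fallback with the same value
}

// Usage: with a fallback, both outcomes of the pattern give the same value.
void demo_expand() {
  assert(expand_with_fallback(3, true) == 6);
  assert(expand_with_fallback(3, false) == 6);
  assert(expand_must_succeed(4, true) == 8);
}
```

In the IFN_VCOND case the fallback would have to implement the vector conditional by other means; the arithmetic here is only a placeholder.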

Thanks,
Richard


Re: [PATCH PR95254] aarch64: gcc generate inefficient code with fixed sve vector length

2020-05-28 Thread H.J. Lu via Gcc-patches
On Thu, May 28, 2020 at 8:00 AM Richard Sandiford
 wrote:
>
> "Yangfei (Felix)"  writes:
> > Thanks for reviewing this.
> > Attached please find the v5 patch.
> > Note: we also need to modify local variable "mode" once we catch one case.  
> > I see test failure without this change.
>
> Looks good.  Patch is OK assuming the x86 folks don't want to rewrite
> gcc.target/i386/pr67609.c to avoid the new optimisation.  I'll hold off
> applying until the AVX512 thing is sorted.
>
> > Bootstrapped and tested on aarch64-linux-gnu.
> > Also bootstrapped on x86_64-linux-gnu.  Regression tests show two failed
> > tests on this platform:
>
> Thanks for the extra testing.
>
> > 1> FAIL: gcc.target/i386/avx512f-vcvtps2ph-2.c (test for excess errors)
> > 2> FAIL: gcc.target/i386/pr67609.c scan-assembler movdqa
> >
> > I have adjusted the second one in the v4 patch.
>
> So this is:
>
> movdqa  reg(%rip), %xmm1
> movaps  %xmm1, -24(%rsp)
> movsd   %xmm0, -24(%rsp)
> movapd  -24(%rsp), %xmm2
> movaps  %xmm2, reg(%rip)
> ret
>
> to:
>
> movq    %xmm0, reg(%rip)
> ret
>
> Nice.  I think it's safe to say that's an improvement :-)
>
> I don't know whether this means we're no longer testing what the test
> was intended to test.  Maybe one of the x86 folks has an opinion about
> whether we should instead rewrite the test somehow.
>
> > But the first one looks strange to me.
> > I see gcc emits invalid x86 vcvtps2ph instructions which look like:
> >
> > 125 vcvtps2ph   $0, %zmm0, -112(%rbp){%k1}
> > 126 vcvtps2ph   $0, %zmm0, -80(%rbp){%k1}{z}
> >
> > This happens in the combine phase, where a combined insn looks like:
> >
> > Trying 31 -> 33:
> >    31: r106:V16HI=vec_merge(unspec[r103:V16SF,0] 133,[frame:DI-0x60],r109:HI)
> >    33: [frame:DI-0x60]=r106:V16HI
> >       REG_DEAD r106:V16HI
> > Successfully matched this instruction:
> > (set (mem/j/c:V16HI (plus:DI (reg/f:DI 19 frame)
> >             (const_int -96 [0xffa0])) [2 res2.x+0 S32 A256])
> >     (vec_merge:V16HI (unspec:V16HI [
> >                 (reg:V16SF 103)
> >                 (const_int 0 [0])
> >             ] UNSPEC_VCVTPS2PH)
> >         (mem/j/c:V16HI (plus:DI (reg/f:DI 19 frame)
> >                 (const_int -96 [0xffa0])) [2 res2.x+0 S32 A256])
> >         (reg:HI 109)))
> > allowing combination of insns 31 and 33
> > original costs 16 + 4 = 20
> > replacement cost 16
> > deferring deletion of insn with uid = 31.
> > modifying insn i3    33: [frame:DI-0x60]=vec_merge(unspec[r103:V16SF,0] 133,[frame:DI-0x60],r109:HI)
> > deferring rescan insn with uid = 33.
> >
> > And this can be matched with pattern: avx512f_vcvtps2ph512_mask
> > (insn 33 31 37 4 (set (mem/j/c:V16HI (plus:DI (reg/f:DI 19 frame)
> >                 (const_int -96 [0xffa0])) [2 res2.x+0 S32 A256])
> >         (vec_merge:V16HI (unspec:V16HI [
> >                     (reg:V16SF 103)
> >                     (const_int 0 [0])
> >                 ] UNSPEC_VCVTPS2PH)
> >             (mem/j/c:V16HI (plus:DI (reg/f:DI 19 frame)
> >                     (const_int -96 [0xffa0])) [2 res2.x+0 S32 A256])
> >             (reg:HI 109))) "avx512f-vcvtps2ph-2.c":80:10 5324 {avx512f_vcvtps2ph512_mask}
> >      (nil))
> >
> > gcc/config/i386/sse.md:
> > (define_insn "avx512f_vcvtps2ph512"
> >   [(set (match_operand:V16HI 0 "nonimmediate_operand" "=vm")
> >         (unspec:V16HI
> >           [(match_operand:V16SF 1 "register_operand" "v")
> >            (match_operand:SI 2 "const_0_to_255_operand" "N")]
> >           UNSPEC_VCVTPS2PH))]
> >   "TARGET_AVX512F"
> >   "vcvtps2ph\t{%2, %1, %0|%0, %1, %2}"
> >   [(set_attr "type" "ssecvt")
> >    (set_attr "prefix" "evex")
> >    (set_attr "mode" "V16SF")])
> >
> > How can that happen?
>
> This is due to define_subst magic.  The generators automatically
> create a vec_merge form of the instruction based on the information
> in the  attributes.
>
> AFAICT the rtl above is for the line-125 instruction, which looks ok.
> The problem is the line-126 instruction, since vcvtps2ph doesn't
> AIUI allow zero masking.
>
> The "mask" define_subst allows both zeroing and merging,
> so I guess this means that the pattern should either be using
> a different define_subst, or should be enforcing merging in
> some other way.  Please could one of the x86 devs take a look?
>

Hongtao, can you take a look?

Thanks.


-- 
H.J.


[PATCH] doc: Clarify __builtin_return_address [PR94891]

2020-05-28 Thread Szabolcs Nagy
The expected semantics and valid usage of __builtin_return_address are
not clear since it exposes implementation internals that are normally
not meaningful to portable C code.

This documentation change tries to clarify the semantics in case the
return address is stored in a mangled form in memory which affects
AArch64 when pointer authentication is used for the return address
(i.e. -mbranch-protection=pac-ret).

---

This is an RFC patch trying to address PR target/94891:

AArch64 __builtin_return_address currently returns the mangled
address even though user code cannot generally use such an address or
tell if it is mangled. (So this patch will require aarch64 backend
changes.)

__builtin_extract_return_addr returns its argument unchanged on
AArch64. This can be changed but the assumption that this operation
can be reversed by __builtin_frob_return_addr makes it unsuitable
for general unmangling (return address signing requires additional
input other than the code address).

On AArch64 the return address mangling is ABI between the compiler
and unwinder / debugger: the unwind / debug info describes when and
how to unmangle the return address. This information may not be
available at runtime (e.g. without unwind tables) so user code cannot
handle a mangled return address in general. Currently the xpaclri
instruction always works and gives an unmangled address, but exposing
the mangled address to users means breaking existing code using
__builtin_return_address and constrains the mangling ABI.

On AArch64 with the ILP32 ABI the return address is stored as 64 bits in
memory, but __builtin_return_address returns a 32-bit void*, so it cannot
be the same as the stored value if the top bits are used for mangling.

It seems that only the
  __builtin_extract_return_addr (__builtin_return_address (0))
usage was ever useful in portable code, so I think this should be
documented and the remaining semantics left to the target to decide.
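As a rough illustration of the idiom the patch proposes to document, the following C++ sketch (for GCC, since it uses GNU builtins) obtains the code address the current function will return to. The function names are hypothetical; `noinline` is there so that the helper really has a caller to return to. The sketch only prints the address; passing it to an interface such as `dladdr` is left out to avoid an extra link dependency.

```cpp
#include <cassert>
#include <cstdio>

// Returns the code address this function will return to, using the
// extract-after-return-address idiom.  noinline keeps a real call/return.
__attribute__((noinline)) void *my_return_address() {
  return __builtin_extract_return_addr(__builtin_return_address(0));
}

// Usage: the result is an ordinary code address inside the caller.
void demo_return_address() {
  void *ra = my_return_address();
  assert(ra != nullptr);
  std::printf("returning to %p\n", ra);
}
```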
---
 gcc/doc/extend.texi | 16 ++--
 1 file changed, 14 insertions(+), 2 deletions(-)

diff --git a/gcc/doc/extend.texi b/gcc/doc/extend.texi
index cced19d2018..0fd32a22599 100644
--- a/gcc/doc/extend.texi
+++ b/gcc/doc/extend.texi
@@ -11151,18 +11151,30 @@ The @var{level} argument must be a constant integer.
 
 On some machines it may be impossible to determine the return address of
 any function other than the current one; in such cases, or when the top
-of the stack has been reached, this function returns @code{0} or a
-random value.  In addition, @code{__builtin_frame_address} may be used
+of the stack has been reached, this function returns an unspecified
+value.  In addition, @code{__builtin_frame_address} may be used
 to determine if the top of the stack has been reached.
 
 Additional post-processing of the returned value may be needed, see
 @code{__builtin_extract_return_addr}.
 
+The stored representation of the return address in memory may be different
+from the address returned by @code{__builtin_return_address}.  For example
+on AArch64 the stored address may be mangled with return address signing.
+
 Calling this function with a nonzero argument can have unpredictable
 effects, including crashing the calling program.  As a result, calls
 that are considered unsafe are diagnosed when the @option{-Wframe-address}
 option is in effect.  Such calls should only be made in debugging
 situations.
+
+On targets where code addresses are representable as @code{void *},
+@smallexample
+void *addr = __builtin_extract_return_addr (__builtin_return_address (0));
+@end smallexample
+gives the code address where the current function would return.  For example
+such address may be used with @code{dladdr} or other interfaces that work
+with code addresses.
 @end deftypefn
 
@deftypefn {Built-in Function} {void *} __builtin_extract_return_addr (void *@var{addr})
-- 
2.17.1



Re: [PATCH] doc: Clarify __builtin_return_address [PR94891]

2020-05-28 Thread Florian Weimer via Gcc-patches
* Szabolcs Nagy:

> diff --git a/gcc/doc/extend.texi b/gcc/doc/extend.texi
> index cced19d2018..0fd32a22599 100644
> --- a/gcc/doc/extend.texi
> +++ b/gcc/doc/extend.texi

>  On some machines it may be impossible to determine the return address of
>  any function other than the current one; in such cases, or when the top
> +of the stack has been reached, this function returns an unspecified
> +value.

Can it crash as well?  But that's a pre-existing issue with the wording.

>  Additional post-processing of the returned value may be needed, see
>  @code{__builtin_extract_return_addr}.
>  
> +The stored representation of the return address in memory may be different
> +from the address returned by @code{__builtin_return_address}.  For example
> +on AArch64 the stored address may be mangled with return address signing.
> +
>  Calling this function with a nonzero argument can have unpredictable
>  effects, including crashing the calling program.  As a result, calls
>  that are considered unsafe are diagnosed when the @option{-Wframe-address}
>  option is in effect.  Such calls should only be made in debugging
>  situations.
> +
> +On targets where code addresses are representable as @code{void *},
> +@smallexample
> +void *addr = __builtin_extract_return_addr (__builtin_return_address (0));
> +@end smallexample
> +gives the code address where the current function would return.  For example
> +such address may be used with @code{dladdr} or other interfaces that work
> +with code addresses.
>  @end deftypefn

The change looks reasonable to me.  It is worded in such a way that it
covers architectures which use function descriptors.

Thanks,
Florian



[PATCH] S/390: Emit vector alignment hints for z13

2020-05-28 Thread Stefan Schulze Frielinghaus via Gcc-patches
Vector alignment hints are fully supported since z14.  On z13 alignment
hints have no effect; however, instructions with alignment hints are
still legal.  Thus, emit alignment hints also for z13 targets so that if
the binary is actually run on a z14 or later it benefits from such
hints.

Note, this requires gas including commit f687f5f563 of the binutils
repository.
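The decision made by the patched `print_operand` code can be summarized with a small model. The C++ sketch below is hypothetical: `arch_level` stands in for the `TARGET_Z13`/`TARGET_Z14` checks, `mem_align_bits` for `MEM_ALIGN` (which is measured in bits), and only the `>= 128` branch visible in the quoted hunk is modeled; other alignments simply get no hint here.

```cpp
#include <cassert>
#include <string>

// Hypothetical model of the alignment-hint part of the 'A' operand code
// after this patch.  arch_level: 13 = z13, 14 = z14, and so on.
std::string vector_alignment_hint(int arch_level, bool is_mem,
                                  unsigned mem_align_bits) {
  if (arch_level < 13 || !is_mem)  // before the patch this required z14
    return "";
  if (mem_align_bits >= 128)       // 16-byte aligned access: emit ",4"
    return ",4";
  return "";  // other alignments are outside the quoted hunk; no hint here
}

// Usage: a 16-byte aligned vector access now gets the hint on z13 as well.
void demo_hint() {
  assert(vector_alignment_hint(13, true, 128) == ",4");
  assert(vector_alignment_hint(12, true, 128).empty());
}
```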

gcc/ChangeLog:

* config/s390/s390.c (print_operand): Emit vector alignment
hints for z13.

gcc/testsuite/ChangeLog:

* gcc.target/s390/vector/align-1.c: Change target architecture
to z13.
* gcc.target/s390/vector/align-2.c: Change target architecture
to z13.
---
 gcc/config/s390/s390.c | 7 ++-
 gcc/testsuite/gcc.target/s390/vector/align-1.c | 2 +-
 gcc/testsuite/gcc.target/s390/vector/align-2.c | 2 +-
 3 files changed, 8 insertions(+), 3 deletions(-)

diff --git a/gcc/config/s390/s390.c b/gcc/config/s390/s390.c
index 4de3129f88e..b5fd5a2f3ed 100644
--- a/gcc/config/s390/s390.c
+++ b/gcc/config/s390/s390.c
@@ -7854,7 +7854,12 @@ print_operand (FILE *file, rtx x, int code)
 {
 case 'A':
 #ifdef HAVE_AS_VECTOR_LOADSTORE_ALIGNMENT_HINTS
-  if (TARGET_Z14 && MEM_P (x))
+  /* Vector alignment hints are fully supported since z14.  On z13
+alignment hints have no effect, however, instructions with alignment
+hints are still legal.  Thus, emit alignment hints also for z13
+targets so that if the binary is actually run on a z14 or later it
+benefits from such hints.  */
+  if (TARGET_Z13 && MEM_P (x))
{
  if (MEM_ALIGN (x) >= 128)
fprintf (file, ",4");
diff --git a/gcc/testsuite/gcc.target/s390/vector/align-1.c b/gcc/testsuite/gcc.target/s390/vector/align-1.c
index ccad22a..6997af2ddcd 100644
--- a/gcc/testsuite/gcc.target/s390/vector/align-1.c
+++ b/gcc/testsuite/gcc.target/s390/vector/align-1.c
@@ -1,5 +1,5 @@
 /* { dg-do compile } */
-/* { dg-options "-O3 -mzarch -march=z14" } */
+/* { dg-options "-O3 -mzarch -march=z13" } */
 
 /* The user alignment ends up in DECL_ALIGN of the VAR_DECL and is
currently ignored if it is smaller than the alignment of the type.
diff --git a/gcc/testsuite/gcc.target/s390/vector/align-2.c b/gcc/testsuite/gcc.target/s390/vector/align-2.c
index e4e2fba6a58..00e09d3eadb 100644
--- a/gcc/testsuite/gcc.target/s390/vector/align-2.c
+++ b/gcc/testsuite/gcc.target/s390/vector/align-2.c
@@ -1,5 +1,5 @@
 /* { dg-do compile } */
-/* { dg-options "-O3 -mzarch -march=z14" } */
+/* { dg-options "-O3 -mzarch -march=z13" } */
 
 /* The user alignment ends up in TYPE_ALIGN of the type of the
VAR_DECL.  */
-- 
2.25.3



Re: [PATCH] S/390: Emit vector alignment hints for z13

2020-05-28 Thread Stefan Schulze Frielinghaus via Gcc-patches
Forgot to mention:

Bootstrapped and regtested on z13 and z14 with gas including f687f5f563

Ok for master?

On Thu, May 28, 2020 at 08:24:26PM +0200, Stefan Schulze Frielinghaus via Gcc-patches wrote:
> Vector alignment hints are fully supported since z14.  On z13 alignment
> hints have no effect, however, instructions with alignment hints are
> still legal.  Thus, emit alignment hints also for z13 targets so that if
> the binary is actually run on a z14 or later it benefits from such
> hints.
> 
> Note, this requires gas including commit f687f5f563 of the binutils
> repository.
> 
> gcc/ChangeLog:
> 
>   * config/s390/s390.c (print_operand): Emit vector alignment
>   hints for z13.
> 
> gcc/testsuite/ChangeLog:
> 
>   * gcc.target/s390/vector/align-1.c: Change target architecture
>   to z13.
>   * gcc.target/s390/vector/align-2.c: Change target architecture
>   to z13.
> ---
>  gcc/config/s390/s390.c | 7 ++-
>  gcc/testsuite/gcc.target/s390/vector/align-1.c | 2 +-
>  gcc/testsuite/gcc.target/s390/vector/align-2.c | 2 +-
>  3 files changed, 8 insertions(+), 3 deletions(-)
> 
> diff --git a/gcc/config/s390/s390.c b/gcc/config/s390/s390.c
> index 4de3129f88e..b5fd5a2f3ed 100644
> --- a/gcc/config/s390/s390.c
> +++ b/gcc/config/s390/s390.c
> @@ -7854,7 +7854,12 @@ print_operand (FILE *file, rtx x, int code)
>  {
>  case 'A':
>  #ifdef HAVE_AS_VECTOR_LOADSTORE_ALIGNMENT_HINTS
> -  if (TARGET_Z14 && MEM_P (x))
> +  /* Vector alignment hints are fully supported since z14.  On z13
> +  alignment hints have no effect, however, instructions with alignment
> +  hints are still legal.  Thus, emit alignment hints also for z13
> +  targets so that if the binary is actually run on a z14 or later it
> +  benefits from such hints.  */
> +  if (TARGET_Z13 && MEM_P (x))
>   {
> if (MEM_ALIGN (x) >= 128)
>   fprintf (file, ",4");
> diff --git a/gcc/testsuite/gcc.target/s390/vector/align-1.c b/gcc/testsuite/gcc.target/s390/vector/align-1.c
> index ccad22a..6997af2ddcd 100644
> --- a/gcc/testsuite/gcc.target/s390/vector/align-1.c
> +++ b/gcc/testsuite/gcc.target/s390/vector/align-1.c
> @@ -1,5 +1,5 @@
>  /* { dg-do compile } */
> -/* { dg-options "-O3 -mzarch -march=z14" } */
> +/* { dg-options "-O3 -mzarch -march=z13" } */
>  
>  /* The user alignment ends up in DECL_ALIGN of the VAR_DECL and is
> currently ignored if it is smaller than the alignment of the type.
> diff --git a/gcc/testsuite/gcc.target/s390/vector/align-2.c b/gcc/testsuite/gcc.target/s390/vector/align-2.c
> index e4e2fba6a58..00e09d3eadb 100644
> --- a/gcc/testsuite/gcc.target/s390/vector/align-2.c
> +++ b/gcc/testsuite/gcc.target/s390/vector/align-2.c
> @@ -1,5 +1,5 @@
>  /* { dg-do compile } */
> -/* { dg-options "-O3 -mzarch -march=z14" } */
> +/* { dg-options "-O3 -mzarch -march=z13" } */
>  
>  /* The user alignment ends up in TYPE_ALIGN of the type of the
> VAR_DECL.  */
> -- 
> 2.25.3
> 


[committed] Fix incorrect code on H8/SX with bit logicals

2020-05-28 Thread Jeff Law via Gcc-patches


The H8/SX has some extended capabilities for bit-and, bit-ior and bit-xor
compared to earlier processors in the H8 family and thus there are some special
patterns to handle them.

The instructions work on byte-sized chunks, but GCC is exposing them in HImode
as well.  It looks like someone tried to handle the big-endian correction issues
for the bit-and instruction, and there's some chance that splitter might work.
But the ior/xor pattern is clearly broken.

I tried to fix the ior/xor pattern in a similar manner, but doing so highlighted
that these insns don't allow reg+d addressing modes.  It seems safest to limit
when they apply.  If someone cares enough they can always go back and try to
handle the address adjustments (allocating a scratch as needed) and add support
for SImode as well.

So this patch restricts the existing splitter to cases where no address
adjustment is needed.  It removes bclrhi_msx and turns bhi_msx into a
splitter.
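The big-endian adjustment the splitter deals with can be checked with a small model. On the big-endian H8, the high byte of a halfword is at offset 0 and the low byte at offset 1, so narrowing a 16-bit AND whose mask has a single zero bit gives a byte AND at one of those offsets. The sketch below uses hypothetical names. The `abs (INTVAL (operands[2])) > 0xff` condition appears to correspond to the offset-0 case, which is the only one the committed splitter keeps, since offset 1 would need reg+d addressing.

```cpp
#include <cassert>
#include <cstdint>

// Byte-sized replacement for a 16-bit AND on a big-endian halfword.
struct NarrowedAnd {
  int byte_offset;         // offset added to the halfword's address
  std::uint8_t byte_mask;  // QImode AND mask for that byte
};

// Assumes mask16 has exactly one zero bit (the single-zero-operand case).
NarrowedAnd narrow_single_zero_and(std::uint16_t mask16) {
  std::uint16_t cleared = static_cast<std::uint16_t>(~mask16);
  if (cleared > 0xFF)  // the cleared bit is in the high byte, at offset 0
    return {0, static_cast<std::uint8_t>(mask16 >> 8)};
  // The cleared bit is in the low byte: offset 1, i.e. reg+d addressing.
  return {1, static_cast<std::uint8_t>(mask16 & 0xFF)};
}

void apply_narrowed_and(std::uint8_t mem[2], NarrowedAnd op) {
  mem[op.byte_offset] &= op.byte_mask;
}

// Usage: the byte AND matches the 16-bit AND for every single-zero mask.
void demo_narrowing() {
  for (int bit = 0; bit < 16; ++bit) {
    std::uint16_t mask16 = static_cast<std::uint16_t>(~(1u << bit));
    std::uint8_t mem[2] = {0xBE, 0xEF};  // big-endian halfword 0xBEEF
    apply_narrowed_and(mem, narrow_single_zero_and(mask16));
    unsigned got = (unsigned(mem[0]) << 8) | mem[1];
    assert(got == (0xBEEFu & mask16));
  }
}
```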

Committing to the trunk.


Jeff
commit f04f2fcd3d40f944f29189b1f995aa35ea04a379
Author: Jeff Law 
Date:   Thu May 28 12:28:56 2020 -0600

Fix incorrect code generation with bit insns on H8/SX.

* config/h8300/logical.md (HImode H8/SX bit-and splitter): Don't
make a nonzero adjustment to the memory offset.
(bhi_msx): Turn into a splitter.

diff --git a/gcc/config/h8300/logical.md b/gcc/config/h8300/logical.md
index 9dd863cdd8c..a099bbb4f5f 100644
--- a/gcc/config/h8300/logical.md
+++ b/gcc/config/h8300/logical.md
@@ -14,22 +14,14 @@
   [(set (match_operand:HI 0 "bit_register_indirect_operand")
(and:HI (match_operand:HI 1 "bit_register_indirect_operand")
(match_operand:HI 2 "single_zero_operand")))]
-  "TARGET_H8300SX"
+  "TARGET_H8300SX && abs (INTVAL (operands[2])) > 0xff"
   [(set (match_dup 0)
(and:QI (match_dup 1)
(match_dup 2)))]
   {
-if (abs (INTVAL (operands[2])) > 0xFF)
-  {
-   operands[0] = adjust_address (operands[0], QImode, 0);
-   operands[1] = adjust_address (operands[1], QImode, 0);
-   operands[2] = GEN_INT ((INTVAL (operands[2])) >> 8);
-  }
-else
-  {
-   operands[0] = adjust_address (operands[0], QImode, 1);
-   operands[1] = adjust_address (operands[1], QImode, 1);
-  }
+operands[0] = adjust_address (operands[0], QImode, 0);
+operands[1] = adjust_address (operands[1], QImode, 0);
+operands[2] = GEN_INT ((INTVAL (operands[2])) >> 8);
   })
 
 (define_insn "bclrhi_msx"
@@ -134,13 +126,19 @@
   { return  == IOR ? "bset\\t%V2,%0" : "bnot\\t%V2,%0"; }
   [(set_attr "length" "8")])
 
-(define_insn "bhi_msx"
-  [(set (match_operand:HI 0 "bit_register_indirect_operand" "=m")
-   (ors:HI (match_operand:HI 1 "bit_register_indirect_operand" "%0")
-   (match_operand:HI 2 "single_one_operand" "Y2")))]
-  "TARGET_H8300SX"
-  { return  == IOR ? "bset\\t%V2,%0" : "bnot\\t%V2,%0"; }
-  [(set_attr "length" "8")])
+(define_split
+  [(set (match_operand:HI 0 "bit_register_indirect_operand")
+   (ors:HI (match_operand:HI 1 "bit_register_indirect_operand")
+   (match_operand:HI 2 "single_one_operand")))]
+  "TARGET_H8300SX && abs (INTVAL (operands[2])) > 0xff"
+  [(set (match_dup 0)
+   (and:QI (match_dup 1)
+   (match_dup 2)))]
+  {
+operands[0] = adjust_address (operands[0], QImode, 0);
+operands[1] = adjust_address (operands[1], QImode, 0);
+operands[2] = GEN_INT ((INTVAL (operands[2])) >> 8);
+  })
 
 (define_insn "qi3_1"
   [(set (match_operand:QI 0 "bit_operand" "=U,rQ")
commit ccf4e86dc01d8c89a8d56b228757a689d1fcc564
Author: Jeff Law 
Date:   Thu May 28 12:37:08 2020 -0600

Finish prior patch

* config/h8300/logical.md (bclrhi_msx): Remove pattern.

diff --git a/gcc/config/h8300/logical.md b/gcc/config/h8300/logical.md
index a099bbb4f5f..7d24fad360a 100644
--- a/gcc/config/h8300/logical.md
+++ b/gcc/config/h8300/logical.md
@@ -24,14 +24,6 @@
 operands[2] = GEN_INT ((INTVAL (operands[2])) >> 8);
   })
 
-(define_insn "bclrhi_msx"
-  [(set (match_operand:HI 0 "bit_register_indirect_operand" "=m")
-   (and:HI (match_operand:HI 1 "bit_register_indirect_operand" "%0")
-   (match_operand:HI 2 "single_zero_operand" "Y0")))]
-  "TARGET_H8300SX"
-  "bclr\\t%W2,%0"
-  [(set_attr "length" "8")])
-
 (define_insn "*andqi3_2"
   [(set (match_operand:QI 0 "bit_operand" "=U,rQ,r")
(and:QI (match_operand:QI 1 "bit_operand" "%0,0,WU")


Re: [PATCH 1/2] Separate -funroll-loops for GIMPLE unroller and RTL unroller

2020-05-28 Thread Segher Boessenkool
Hi!

On Thu, May 28, 2020 at 04:22:16PM +0200, Richard Biener wrote:
> For GIMPLE level transforms I don't think targets have more knowledge
> than the middle-end.

Yes, certainly.

> In fact GIMPLE complete unrolling is about
> secondary effects, removing redundancies and abstraction.  So IMHO
> the correct approach is to look at individual cases and try to improve
> the generic code

Yep.

> rather than try to get better benchmark results
> on a per-target manner by magical parameter tuning.

I'm no fan of that for target-specific code either.  It's fine to be led
by benchmarks, but usually a better justification is needed.

> For what the RTL unroller does it indeed depends very heavily on
> the target whether sth is beneficial or not.

Yes :-(  And this means it will need to remain late in the pass
pipeline,  or at least the decision needs to use target information
(just like what ivopts does).

> So I'd like to see specific cases where you think cunroll should
> do "better" on powerpc only but not elsewhere.

It is probably not a good idea in general to unroll 14 times, yes :-)
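To make the terminology concrete, "complete" unrolling (or complete peeling) replaces a loop whose trip count is a small known constant with straight-line copies of its body, so the back edge disappears. The hand-written C++ pair below only illustrates the shape of that transform; it is not output from GCC's cunroll pass, and the function names are hypothetical.

```cpp
#include <array>
#include <cassert>

// Loop form: the trip count is the constant 4.
int dot4_loop(const std::array<int, 4>& a, const std::array<int, 4>& b) {
  int sum = 0;
  for (int i = 0; i < 4; ++i)
    sum += a[i] * b[i];
  return sum;
}

// Hand-written equivalent of complete unrolling: no loop and no back edge,
// just straight-line code that later passes may simplify further
// (for example when some elements are known constants).
int dot4_unrolled(const std::array<int, 4>& a, const std::array<int, 4>& b) {
  return a[0] * b[0] + a[1] * b[1] + a[2] * b[2] + a[3] * b[3];
}
```

The benefit of the unrolled form comes mostly from those follow-on simplifications, which is why a large copy count like 14 is rarely worth the code growth on its own.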


Segher


Re: [PATCH, committed] [9/10/11 Regression] PR fortran/95104 - Segfault on a legal WAIT statement

2020-05-28 Thread Harald Anlauf
The fix for

> PR fortran/95104 - Segfault on a legal WAIT statement
>
> Referencing a unit in a WAIT statement that has not been opened before
> resulted in a NULL pointer dereference.  Check for this condition.
>
> 2020-05-26  Harald Anlauf  
>
> libgfortran/
>   PR libfortran/95104
>   * io/transfer.c (st_wait_async): Do not dereference NULL pointer.
>
> gcc/testsuite/
>   PR libfortran/95104
>   * gfortran.dg/pr95104.f90: New test.
>
> Co-Authored-By: Steven G. Kargl  

did uncover a latent issue with regard to unit locking that was introduced in the
context of asynchronous I/O in libgfortran.  This was reported by Rainer Orth,
see

  https://gcc.gnu.org/bugzilla/show_bug.cgi?id=95104#c13

I can reproduce this when compiling/linking with -fopenmp.

There are two possible fixes for this:

(1) guard the call to unlock_unit by:

diff --git a/libgfortran/io/transfer.c b/libgfortran/io/transfer.c
index cd51679ff46..296be0711a2 100644
--- a/libgfortran/io/transfer.c
+++ b/libgfortran/io/transfer.c
@@ -4508,7 +4508,8 @@ st_wait_async (st_parameter_wait *wtp)
async_wait (&(wtp->common), u->au);
 }

-  unlock_unit (u);
+  if (u)
+unlock_unit (u);
 }



(2) in unlock_unit():

diff --git a/libgfortran/io/unit.c b/libgfortran/io/unit.c
index 0030d7e8701..a3b0656cb90 100644
--- a/libgfortran/io/unit.c
+++ b/libgfortran/io/unit.c
@@ -767,9 +767,12 @@ close_unit_1 (gfc_unit *u, int locked)
 void
 unlock_unit (gfc_unit *u)
 {
-  NOTE ("unlock_unit = %d", u->unit_number);
-  UNLOCK (&u->lock);
-  NOTE ("unlock_unit done");
+  if (u)
+{
+  NOTE ("unlock_unit = %d", u->unit_number);
+  UNLOCK (&u->lock);
+  NOTE ("unlock_unit done");
+}
 }

 /* close_unit()-- Close a unit.  The stream is closed, and any memory


Does anybody prefer one over the other, or just commit both (which might be
preferable to catch other unguarded cases)?
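As a rough illustration of why option (2) covers more callers, here is a C++ sketch of a NULL-tolerant unlock in the spirit of `free(NULL)`. The `unit_t` struct, `unlock_unit_sketch` and `wait_on_unit` are hypothetical stand-ins, not libgfortran's `gfc_unit`, `unlock_unit` or `st_wait_async`; the sketch also assumes, as in libgfortran, that a successful unit lookup hands back a unit that is already locked.

```cpp
#include <cassert>
#include <mutex>

// Hypothetical stand-in for gfc_unit; only the lock matters here.
struct unit_t {
  int unit_number;
  std::mutex lock;
};

// Option (2): the callee tolerates a null unit, so every caller that may
// end up without a unit is covered at once.
void unlock_unit_sketch(unit_t* u) {
  if (u)
    u->lock.unlock();
}

// Caller modelled loosely on st_wait_async: the lookup may fail (e.g. for
// a unit that was never opened) and yield a null pointer.
bool wait_on_unit(unit_t* u) {
  bool waited = false;
  if (u) {
    // ... the asynchronous-I/O wait would happen here ...
    waited = true;
  }
  unlock_unit_sketch(u);  // safe even when u is null
  return waited;
}
```

With the check inside the callee, guarding each call site as in option (1) becomes redundant.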

Thanks,
Harald



Re: [PATCH, committed] [9/10/11 Regression] PR fortran/95104 - Segfault on a legal WAIT statement

2020-05-28 Thread Thomas Koenig via Gcc-patches

Hi Harald,


> There are two possible fixes for this:
>
> (1) guard the call to unlock_unit by:
>
> diff --git a/libgfortran/io/transfer.c b/libgfortran/io/transfer.c
> index cd51679ff46..296be0711a2 100644
> --- a/libgfortran/io/transfer.c
> +++ b/libgfortran/io/transfer.c
> @@ -4508,7 +4508,8 @@ st_wait_async (st_parameter_wait *wtp)
>  async_wait (&(wtp->common), u->au);
>   }
>
> -  unlock_unit (u);
> +  if (u)
> +unlock_unit (u);
>   }
>
> (2) in unlock_unit():
>
> diff --git a/libgfortran/io/unit.c b/libgfortran/io/unit.c
> index 0030d7e8701..a3b0656cb90 100644
> --- a/libgfortran/io/unit.c
> +++ b/libgfortran/io/unit.c
> @@ -767,9 +767,12 @@ close_unit_1 (gfc_unit *u, int locked)
>   void
>   unlock_unit (gfc_unit *u)
>   {
> -  NOTE ("unlock_unit = %d", u->unit_number);
> -  UNLOCK (&u->lock);
> -  NOTE ("unlock_unit done");
> +  if (u)
> +{
> +  NOTE ("unlock_unit = %d", u->unit_number);
> +  UNLOCK (&u->lock);
> +  NOTE ("unlock_unit done");
> +}
>   }
>
>   /* close_unit()-- Close a unit.  The stream is closed, and any memory
>
> Does anybody prefer one over the other, or just commit both (which might be
> preferable to catch other unguarded cases)?


I think the second one is more robust - like you say, this may catch
other cases.  If we have that one, we don't need the first one.

Regards

Thomas


[pushed] c++: Immediately deduce auto member [PR94926].

2020-05-28 Thread Jason Merrill via Gcc-patches
In r9-297 I was trying to be more flexible and treat static data members of
class templates more like variable templates, where the type need not be
determined until the variable is instantiated, but I suppose that in a class
the types of all the non-template members need to be determined at the time
of class instantiation.
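A small C++17 illustration of the rule described above, using a hypothetical class template `A`: a non-template static data member declared with `auto` gets a concrete type per specialization. The assertions below hold on any conforming compiler. What the patch changes is when GCC performs the deduction (at class instantiation rather than deferred), which matters for self-referential cases like the new nsdmi16.C test.

```cpp
#include <cassert>
#include <type_traits>

// Hypothetical example: a non-template static data member whose type is a
// placeholder, deduced separately for each specialization of A.
template <typename T>
struct A {
  static inline auto value = T{} + 1;  // C++17 inline static data member
};

static_assert(std::is_same<decltype(A<int>::value), int>::value,
              "deduced as int for A<int>");
static_assert(std::is_same<decltype(A<double>::value), double>::value,
              "deduced as double for A<double>");
```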

Tested x86_64-pc-linux-gnu, applying to trunk.

gcc/cp/ChangeLog:

PR c++/94926
* decl.c (cp_finish_decl): Revert r9-297 change.
(check_static_variable_definition): Likewise.
* constexpr.c (ensure_literal_type_for_constexpr_object): Likewise.
* pt.c (instantiate_decl): Return early on type error.

gcc/testsuite/ChangeLog:

* g++.dg/cpp1z/pr86648.C: Expect error.
* g++.dg/cpp1z/static2.C: Expect error.
* g++.dg/cpp0x/nsdmi16.C: New test.
---
 gcc/cp/constexpr.c   |  2 --
 gcc/cp/decl.c| 21 +
 gcc/cp/pt.c  |  1 +
 gcc/testsuite/g++.dg/cpp0x/nsdmi16.C | 11 +++
 gcc/testsuite/g++.dg/cpp1z/pr86648.C |  4 +++-
 gcc/testsuite/g++.dg/cpp1z/static2.C |  2 +-
 6 files changed, 29 insertions(+), 12 deletions(-)
 create mode 100644 gcc/testsuite/g++.dg/cpp0x/nsdmi16.C

diff --git a/gcc/cp/constexpr.c b/gcc/cp/constexpr.c
index 4e441ac8d2f..4b1f92f989c 100644
--- a/gcc/cp/constexpr.c
+++ b/gcc/cp/constexpr.c
@@ -96,8 +96,6 @@ ensure_literal_type_for_constexpr_object (tree decl)
   if (CLASS_TYPE_P (stype) && !COMPLETE_TYPE_P (complete_type (stype)))
/* Don't complain here, we'll complain about incompleteness
   when we try to initialize the variable.  */;
-  else if (type_uses_auto (type))
-   /* We don't know the actual type yet.  */;
   else if (!literal_type_p (type))
{
  if (DECL_DECLARED_CONSTEXPR_P (decl))
diff --git a/gcc/cp/decl.c b/gcc/cp/decl.c
index 56571e39570..b0de90630d7 100644
--- a/gcc/cp/decl.c
+++ b/gcc/cp/decl.c
@@ -7467,18 +7467,24 @@ cp_finish_decl (tree decl, tree init, bool init_const_expr_p,
   && (DECL_INITIAL (decl) || init))
 DECL_INITIALIZED_IN_CLASS_P (decl) = 1;
 
-  /* Do auto deduction unless decl is a function or an uninstantiated
- template specialization.  */
   if (TREE_CODE (decl) != FUNCTION_DECL
-  && !(init == NULL_TREE
-  && DECL_LANG_SPECIFIC (decl)
-  && DECL_TEMPLATE_INSTANTIATION (decl)
-  && !DECL_TEMPLATE_INSTANTIATED (decl))
   && (auto_node = type_uses_auto (type)))
 {
   tree d_init;
   if (init == NULL_TREE)
-   gcc_assert (CLASS_PLACEHOLDER_TEMPLATE (auto_node));
+   {
+ if (DECL_LANG_SPECIFIC (decl)
+ && DECL_TEMPLATE_INSTANTIATION (decl)
+ && !DECL_TEMPLATE_INSTANTIATED (decl))
+   {
+ /* init is null because we're deferring instantiating the
+initializer until we need it.  Well, we need it now.  */
+ instantiate_decl (decl, /*defer_ok*/true, /*expl*/false);
+ return;
+   }
+
+ gcc_assert (CLASS_PLACEHOLDER_TEMPLATE (auto_node));
+   }
   d_init = init;
   if (d_init)
{
@@ -10171,7 +10177,6 @@ check_static_variable_definition (tree decl, tree type)
  in check_initializer.  Similarly for inline static data members.  */
   else if (DECL_P (decl)
   && (DECL_DECLARED_CONSTEXPR_P (decl)
- || undeduced_auto_decl (decl)
  || DECL_VAR_DECLARED_INLINE_P (decl)))
 ;
   else if (cxx_dialect >= cxx11 && !INTEGRAL_OR_ENUMERATION_TYPE_P (type))
diff --git a/gcc/cp/pt.c b/gcc/cp/pt.c
index 4d9651acee6..90dafff3aa7 100644
--- a/gcc/cp/pt.c
+++ b/gcc/cp/pt.c
@@ -25293,6 +25293,7 @@ instantiate_decl (tree d, bool defer_ok, bool expl_inst_class_mem_p)
 d = DECL_CLONED_FUNCTION (d);
 
   if (DECL_TEMPLATE_INSTANTIATED (d)
+  || TREE_TYPE (d) == error_mark_node
   || (TREE_CODE (d) == FUNCTION_DECL
  && DECL_DEFAULTED_FN (d) && DECL_INITIAL (d))
   || DECL_TEMPLATE_SPECIALIZATION (d))
diff --git a/gcc/testsuite/g++.dg/cpp0x/nsdmi16.C b/gcc/testsuite/g++.dg/cpp0x/nsdmi16.C
new file mode 100644
index 000..07bc198e691
--- /dev/null
+++ b/gcc/testsuite/g++.dg/cpp0x/nsdmi16.C
@@ -0,0 +1,11 @@
+// PR c++/94926
+// { dg-do compile { target c++11 } }
+
+template
+struct A {
+static auto self_reference = A{}; // { dg-error "incomplete" }
+};
+
+int main() {
+A{};
+}
diff --git a/gcc/testsuite/g++.dg/cpp1z/pr86648.C b/gcc/testsuite/g++.dg/cpp1z/pr86648.C
index 20ee4c8c0d4..58c611c985f 100644
--- a/gcc/testsuite/g++.dg/cpp1z/pr86648.C
+++ b/gcc/testsuite/g++.dg/cpp1z/pr86648.C
@@ -1,5 +1,7 @@
 // { dg-do compile { target c++17 } }
 
 template  class A;
-template  struct B { static A a{T::a}; };
+template  struct B {
+  static A a{T::a};// { dg-error "int" }
+};
 void foo () { B a; }
diff --git a/gcc/testsuite/g++.dg/cpp1z/static2.C b/gcc/testsuite/g++.dg/cpp1z/static2.C
index 9462e0355c8..5d93a0e7242 100

Re: [PATCH] Port libgccjit to Windows.

2020-05-28 Thread David Malcolm via Gcc-patches
On Wed, 2020-05-27 at 22:27 -0300, Nicolas Bértolo wrote:
> Hi,
> 
> > Do you have commit/push access to the gcc repository?
> 
> No I don't.
> 
> > BTW, why isn't it necessary to use --enable-host-shared in Windows?
> > Can we document that?
> 
> That's because all code is position independent in Windows.
> 
> > On the subject of nitpicking, I find myself getting distracted by the
> > indentation in the patch; there seem to be a lot of mismatches.
> 
> > What editor are you using, and does it have options to
> > (a) show visible whitespace, and
> > (b) to apply a formatting convention?
> 
> > I use Emacs, and it takes care of this for me.  I haven't used it, but
> > there's a contrib/clang-format file in the gcc source tree which
> > presumably describes GCC's coding conventions, if that helps for the
> > new code.
> 
> The problem seems to be that I was writing tabs but since I have set up my
> editor to show them as 2 spaces I couldn't see what was wrong.

Thanks; the latest patch is much better.

> > Am I right in thinking that this installs the libgccjit.a file on Windows?
> > Why is this done?
> 
> That is the file libgccjit.dll.a
> 
> It is the import library for gccjit. It is part of the way Windows handles
> dynamic libraries.

Thanks.

> > New C++ source files should have a .cc extension.
> > I hope that at some point we'll rename all the existing .c ones
> > accordingly.
> 
> I just couldn't get Make to generate jit-w32.o from jit-w32.cc.
> It looks for jit-w32.c.
> 
> I had to leave it with the .c extension.

Fair enough.

> > Does this call generate a directory that's only accessible to the
> > current user?
> > Otherwise there could be a risk of a hostile user on the same machine
> > clobbering the contents and injecting code into this process.
> 
> I changed the code to generate a directory that can only be accessed by the
> current user.
> 
> I've attached a new version. It contains a rewrite of the code that creates
> temporary directories.
> Nico

I'm going to have to trust your Windows expertise here; the tempdir
code looks convoluted to me, but perhaps that's the only way to do it.
(Microsoft's docs for "SECURITY_ATTRIBUTES" suggest to me that if
lpSecurityDescriptor is NULL, then the directory gets a default
security descriptor, and that this may mean it's only readable by the
user represented by the access token of the process [1], which might
suggest a simplification - but I'm very hazy on how the security model
in Windows works)


I was able to successfully bootstrap and regression test with your
patch on x86_64-pc-linux-gnu.  I also verified that the result of "make
install" was not affected for my configuration.

I've pushed your patch to master as
c83027f32d9cca84959c7d6a1e519a0129731501.

(I had to do a little fixup of the ChangeLog entries to get them to
work with the new hooks on our git repo)

Thanks again for the patch
Dave

[1] https://docs.microsoft.com/en-us/previous-versions/windows/desktop/legacy/aa379560(v=vs.85)



Re: [PATCH] Port libgccjit to Windows.

2020-05-28 Thread Nicolas Bértolo via Gcc-patches
> I'm going to have to trust your Windows expertise here; the tempdir
> code looks convoluted to me, but perhaps that's the only way to do it.
> (Microsoft's docs for "SECURITY_ATTRIBUTES" suggest to me that if
> lpSecurityDescriptor is NULL, then the directory gets a default
> security descriptor, and that this may mean it's only readable by the
> user represented by the access token of the process [1], which might
> suggest a simplification - but I'm very hazy on how the security model
> in Windows works)

I tested this and it gives write access to the "Authenticated Users" group.
The way I did it gives access only to the user that owns the libgccjit process.
I have to admit that it is a lot of code and it is hard to understand unless
you know the security model of Windows well. I don't know it well; I wrote this
keeping the documentation close and experimenting.

> I was able to successfully bootstrap and regression test with your
> patch on x86_64-pc-linux-gnu.  I also verified that the result of "make
> install" was not affected for my configuration.

Great.

> I've pushed your patch to master as
> c83027f32d9cca84959c7d6a1e519a0129731501.
>
> Thanks again for the patch
> Dave

Thanks to you for all the good feedback.

Nico.


Re: [PATCH, committed] [9/10/11 Regression] PR fortran/95104 - Segfault on a legal WAIT statement

2020-05-28 Thread Harald Anlauf
Dear all,

> I think the second one is more robust - like you say, this may catch
> other cases.  If we have that one, we don't need the first one.

to fix the fallout I've committed to master the following patch,
which I will backport to the affected branches (9/10):


PR fortran/95104 - Segfault on a legal WAIT statement

The initial commit for this PR uncovered a latent issue with unit locking
in the Fortran run-time library.  Add check for valid unit.

2020-05-28  Harald Anlauf  

libgfortran/
PR libfortran/95104
* io/unit.c (unlock_unit): Guard by check for NULL pointer.


diff --git a/libgfortran/io/unit.c b/libgfortran/io/unit.c
index 0030d7e8701..a3b0656cb90 100644
--- a/libgfortran/io/unit.c
+++ b/libgfortran/io/unit.c
@@ -767,9 +767,12 @@ close_unit_1 (gfc_unit *u, int locked)
 void
 unlock_unit (gfc_unit *u)
 {
-  NOTE ("unlock_unit = %d", u->unit_number);
-  UNLOCK (&u->lock);
-  NOTE ("unlock_unit done");
+  if (u)
+{
+  NOTE ("unlock_unit = %d", u->unit_number);
+  UNLOCK (&u->lock);
+  NOTE ("unlock_unit done");
+}
 }

 /* close_unit()-- Close a unit.  The stream is closed, and any memory



[PATCH, committed] PR fortran/95373 - [9/10/11 Regression] ICE in build_reference_type, at tree.c:7942

2020-05-28 Thread Harald Anlauf
The obvious patch attached was already OKed in the PR by Steve Kargl.
As it is a 9/10/11 regression, I will backport it in a few days.

Thanks,
Harald


PR fortran/95373 - [9/10/11 Regression] ICE in build_reference_type, at tree.c:7942

The use of KIND, LEN, RE, and IM inquiry references for applicable intrinsic
types is valid only for sufficiently new Fortran standards (F2003 for KIND
and LEN, F2008 for RE and IM).  Add an appropriate check.

2020-05-28  Harald Anlauf  

gcc/fortran/
PR fortran/95373
* primary.c (is_inquiry_ref): Check validity of inquiry
references against selected Fortran standard.

gcc/testsuite/
PR fortran/95373
* gfortran.dg/pr95373_1.f90: New test.
* gfortran.dg/pr95373_2.f90: New test.
diff --git a/gcc/fortran/primary.c b/gcc/fortran/primary.c
index d73898473df..67105cc9ab1 100644
--- a/gcc/fortran/primary.c
+++ b/gcc/fortran/primary.c
@@ -1998,6 +1998,28 @@ is_inquiry_ref (const char *name, gfc_ref **ref)
   else
 return false;

+  switch (type)
+{
+case INQUIRY_RE:
+case INQUIRY_IM:
+  if (!gfc_notify_std (GFC_STD_F2008, "RE or IM part_ref at %C"))
+	return false;
+  break;
+
+case INQUIRY_KIND:
+  if (!gfc_notify_std (GFC_STD_F2003, "KIND part_ref at %C"))
+	return false;
+  break;
+
+case INQUIRY_LEN:
+  if (!gfc_notify_std (GFC_STD_F2003, "LEN part_ref at %C"))
+	return false;
+  break;
+
+default:
+  gcc_unreachable ();
+}
+
   if (ref)
 {
   *ref = gfc_get_ref ();
diff --git a/gcc/testsuite/gfortran.dg/pr95373_1.f90 b/gcc/testsuite/gfortran.dg/pr95373_1.f90
new file mode 100644
index 000..f39b6a72346
--- /dev/null
+++ b/gcc/testsuite/gfortran.dg/pr95373_1.f90
@@ -0,0 +1,15 @@
+! { dg-do compile }
+! { dg-options "-std=f95" }
+! PR fortran/95373 - ICE in build_reference_type, at tree.c:7942
+
+subroutine s (x)
+  complex, parameter :: z = 3
+  real(z% kind)  :: x   ! { dg-error "nonderived-type variable" }
+  type t
+ real:: kind
+ logical :: re
+  end type t
+  type(t) :: b
+  print *, b% kind, b% re
+  print *, z% re! { dg-error "nonderived-type variable" }
+end
diff --git a/gcc/testsuite/gfortran.dg/pr95373_2.f90 b/gcc/testsuite/gfortran.dg/pr95373_2.f90
new file mode 100644
index 000..2a654b43faa
--- /dev/null
+++ b/gcc/testsuite/gfortran.dg/pr95373_2.f90
@@ -0,0 +1,15 @@
+! { dg-do compile }
+! { dg-options "-std=f2003" }
+! PR fortran/95373 - ICE in build_reference_type, at tree.c:7942
+
+subroutine s (x)
+  complex, parameter :: z = 3
+  real(z% kind)  :: x
+  type t
+ real:: kind
+ logical :: re
+  end type t
+  type(t) :: b
+  print *, b% kind, b% re
+  print *, z% re! { dg-error "nonderived-type variable" }
+end


Re: [PATCH] c++: lambdas inside constraints [PR92652]

2020-05-28 Thread Jason Merrill via Gcc-patches

On 5/28/20 11:03 AM, Patrick Palka wrote:

When parsing a constraint-expression, a requires-clause or a
requires-expression, we temporarily increment processing_template_decl
so that we always obtain template trees which we later reduce via
substitution even when not inside a template.

But incrementing processing_template_decl when we're already inside a
template has the unintended side effect of shifting up the template
parameter levels of a lambda defined inside one of these constructs,
which leads to confusion later during substitution into the lambda.

This patch fixes this issue by incrementing processing_template_decl
during parsing of these constructs only if it is 0.
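The save/force/restore idiom can be sketched outside the parser. In the C++ below, `scoped_restore` is a hypothetical RAII helper in the spirit of GCC's `temp_override`, and `processing_depth` is a hypothetical global standing in for `processing_template_decl`. Outside a template the flag is forced to 1. Inside one it is left alone, so template parameter levels seen by a nested lambda do not shift. Either way, the old value comes back when the scope ends.

```cpp
#include <cassert>

// Hypothetical global standing in for processing_template_decl.
int processing_depth = 0;

// Minimal RAII helper: saves a variable on construction and restores it
// on destruction (similar in spirit to GCC's temp_override).
template <typename T>
class scoped_restore {
 public:
  explicit scoped_restore(T& var) : var_(var), saved_(var) {}
  ~scoped_restore() { var_ = saved_; }
  scoped_restore(const scoped_restore&) = delete;
  scoped_restore& operator=(const scoped_restore&) = delete;

 private:
  T& var_;
  T saved_;
};

// Models parsing a constraint: enable "template mode" only if it is off,
// and report the value observed while parsing.
int parse_constraint_depth() {
  scoped_restore<int> ovr(processing_depth);
  if (!processing_depth)
    processing_depth = 1;
  return processing_depth;  // evaluated before ovr restores the old value
}
```

With a plain `++`, the second case would observe 3 instead of 2, which is the level shift the patch avoids.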

Passes 'make check-c++' and was also tested by building cmcstl2.  Does this
look OK to commit after a full bootstrap/regtest?

gcc/cp/ChangeLog:

PR c++/92652
PR c++/93698
PR c++/94128
* parser.c (cp_parser_requires_clause_expression): Temporarily
increment processing_template_decl only if it is 0.
(cp_parser_constraint_expression): Likewise.
(cp_parser_requires_expression): Likewise.

gcc/testsuite/ChangeLog:

PR c++/92652
PR c++/93698
PR c++/94128
* g++.dg/cpp2a/concepts-lambda8.C: New test.
* g++.dg/cpp2a/concepts-lambda9.C: New test.
* g++.dg/cpp2a/concepts-lambda10.C: New test.
---
  gcc/cp/parser.c| 15 +--
  gcc/testsuite/g++.dg/cpp2a/concepts-lambda10.C |  7 +++
  gcc/testsuite/g++.dg/cpp2a/concepts-lambda8.C  | 11 +++
  gcc/testsuite/g++.dg/cpp2a/concepts-lambda9.C  | 11 +++
  4 files changed, 38 insertions(+), 6 deletions(-)
  create mode 100644 gcc/testsuite/g++.dg/cpp2a/concepts-lambda10.C
  create mode 100644 gcc/testsuite/g++.dg/cpp2a/concepts-lambda8.C
  create mode 100644 gcc/testsuite/g++.dg/cpp2a/concepts-lambda9.C

diff --git a/gcc/cp/parser.c b/gcc/cp/parser.c
index 54ca875ce54..3bca1f3770a 100644
--- a/gcc/cp/parser.c
+++ b/gcc/cp/parser.c
@@ -27663,11 +27663,12 @@ static tree
  cp_parser_requires_clause_expression (cp_parser *parser, bool lambda_p)
  {
processing_constraint_expression_sentinel parsing_constraint;
-  ++processing_template_decl;
+  temp_override ovr (processing_template_decl);
+  if (!processing_template_decl)
+processing_template_decl = 1;


This needs a comment about why you're doing it this way instead of the 
usual ++.  OK with that change.



cp_expr expr = cp_parser_constraint_logical_or_expression (parser, lambda_p);
if (check_for_bare_parameter_packs (expr))
  expr = error_mark_node;
-  --processing_template_decl;
return expr;
  }
  
@@ -27684,12 +27685,13 @@ static tree

  cp_parser_constraint_expression (cp_parser *parser)
  {
processing_constraint_expression_sentinel parsing_constraint;
-  ++processing_template_decl;
+  temp_override ovr (processing_template_decl);
+  if (!processing_template_decl)
+processing_template_decl = 1;
cp_expr expr = cp_parser_binary_expression (parser, false, true,
  PREC_NOT_OPERATOR, NULL);
if (check_for_bare_parameter_packs (expr))
  expr = error_mark_node;
-  --processing_template_decl;
expr.maybe_add_location_wrapper ();
return expr;
  }
@@ -27798,9 +27800,10 @@ cp_parser_requires_expression (cp_parser *parser)
parms = NULL_TREE;
  
  /* Parse the requirement body. */

-++processing_template_decl;
+temp_override ovr (processing_template_decl);
+if (!processing_template_decl)
+  processing_template_decl = 1;
  reqs = cp_parser_requirement_body (parser);
---processing_template_decl;
  if (reqs == error_mark_node)
return error_mark_node;
}
diff --git a/gcc/testsuite/g++.dg/cpp2a/concepts-lambda10.C b/gcc/testsuite/g++.dg/cpp2a/concepts-lambda10.C
new file mode 100644
index 000..392da312b28
--- /dev/null
+++ b/gcc/testsuite/g++.dg/cpp2a/concepts-lambda10.C
@@ -0,0 +1,7 @@
+// PR c++/94128
+// { dg-do compile { target c++20 } }
+
+void test(auto param)
+requires requires{ { [](auto p){return p;}(param) }; };
+
+void test2() { test(1); }
diff --git a/gcc/testsuite/g++.dg/cpp2a/concepts-lambda8.C b/gcc/testsuite/g++.dg/cpp2a/concepts-lambda8.C
new file mode 100644
index 000..c1c9be682d0
--- /dev/null
+++ b/gcc/testsuite/g++.dg/cpp2a/concepts-lambda8.C
@@ -0,0 +1,11 @@
+// PR c++/92652
+// { dg-do compile { target concepts } }
+
+template < typename T >
+requires ([]{return true ;}())
+void h() { }
+
+int main()
+{
+h();
+}
diff --git a/gcc/testsuite/g++.dg/cpp2a/concepts-lambda9.C b/gcc/testsuite/g++.dg/cpp2a/concepts-lambda9.C
new file mode 100644
index 000..6b81ba0adac
--- /dev/null
+++ b/gcc/testsuite/g++.dg/cpp2a/concepts-lambda9.C
@@ -0,0 +1,11 @@
+// PR c++/93698
+// { dg-do compile { target concepts } }
+
+#include 
+
+template 
+concept foo = [](std::index_sequence) constexpr {
+  return (Is + ...) > 10;
+}(std:

Re: [PATCH] c++: constexpr RANGE_EXPR ctor indexes [PR95241]

2020-05-28 Thread Jason Merrill via Gcc-patches

On 5/27/20 5:15 PM, Patrick Palka wrote:

On Wed, 27 May 2020, Patrick Palka wrote:


On Wed, 27 May 2020, Patrick Palka wrote:


In the testcase below, the CONSTRUCTOR for 'field' contains a
RANGE_EXPR index:

   {aggr_init_expr<...>, [1...2]={.off=1}}

but get_or_insert_ctor_field isn't prepared to handle RANGE_EXPR
indexes.

This patch adds limited support for RANGE_EXPR indexes to
get_or_insert_ctor_field.  The limited scope of this patch should make
it more suitable for backporting, and support for more access patterns
would be needed only to handle self-modifying CONSTRUCTORs containing a
RANGE_EXPR index, but I haven't yet been able to come up with a testcase
that exhibits such a CONSTRUCTOR.

Passes 'make check-c++', does this look OK to commit to master and to
the GCC 10 branch after a full bootstrap and regtest?


OK.


gcc/cp/ChangeLog:

PR c++/95241
* constexpr.c (get_or_insert_ctor_field): Add limited support
for RANGE_EXPR indexes.

gcc/testsuite/ChangeLog:

PR c++/95241
* g++.dg/cpp0x/constexpr-array25.C: New test.
---
  gcc/cp/constexpr.c| 12 +++
  .../g++.dg/cpp0x/constexpr-array25.C  | 21 +++
  2 files changed, 33 insertions(+)
  create mode 100644 gcc/testsuite/g++.dg/cpp0x/constexpr-array25.C

diff --git a/gcc/cp/constexpr.c b/gcc/cp/constexpr.c
index 4e441ac8d2f..6f9bafbe8d8 100644
--- a/gcc/cp/constexpr.c
+++ b/gcc/cp/constexpr.c
@@ -3301,6 +3301,18 @@ get_or_insert_ctor_field (tree ctor, tree index, int pos_hint = -1)
  }
else if (TREE_CODE (type) == ARRAY_TYPE || TREE_CODE (type) == VECTOR_TYPE)
  {
+  if (TREE_CODE (index) == RANGE_EXPR)
+   {
+ /* Our support for RANGE_EXPR indexes is limited to accessing an
+existing one via POS_HINT, and appending a new one to the end of
+CTOR.  ??? Support for other access patterns might be needed.  */
+ tree lo = TREE_OPERAND (index, 0);
+ auto *elts = CONSTRUCTOR_ELTS (ctor);
+ gcc_assert (vec_safe_is_empty (elts)
+ || array_index_cmp (lo, elts->last().index) > 0);
+ return vec_safe_push (elts, {index, NULL_TREE});
+   }
+


Oops, it just occurred to me that the use of C++11 features here would
make this patch unsuitable for backporting.  C++98-compatible patch
incoming...


Here it is.  Does the following look OK to commit to master and to the
GCC 10 branch after a full bootstrap and regtest?

-- >8 --

Subject: [PATCH] c++: constexpr RANGE_EXPR ctor indexes [PR95241]

In the testcase below, the CONSTRUCTOR for 'field' contains a
RANGE_EXPR index:

   {aggr_init_expr<...>, [1...2]={.off=1}}

but get_or_insert_ctor_field isn't prepared to handle RANGE_EXPR
indexes.

This patch adds limited support for RANGE_EXPR indexes to
get_or_insert_ctor_field.  The limited scope of this patch should make
it more suitable for backporting, and support for more access patterns
would be needed only to handle self-modifying CONSTRUCTORs containing a
RANGE_EXPR index, but I haven't yet been able to come up with a testcase
that exhibits such a CONSTRUCTOR.
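The append-only invariant behind the new `gcc_assert` can be modelled independently of GCC's trees. The C++ sketch below uses a hypothetical `ctor_elt_model` in place of `constructor_elt` with a `RANGE_EXPR` index, and compares the new range's low bound against the last element's upper bound. That comparison is a modelling choice, not GCC's exact check. It is only an analogy for "a range index may only be appended after every existing element".

```cpp
#include <cassert>
#include <vector>

// Hypothetical model of a CONSTRUCTOR element: covers indexes [lo, hi]
// (lo == hi for an ordinary index).
struct ctor_elt_model {
  long lo, hi;
  int value;
};

// Appends a range element only if it starts after the last existing one,
// then returns a pointer to the new slot (valid until the vector grows).
ctor_elt_model* append_range_elt(std::vector<ctor_elt_model>& elts,
                                 long lo, long hi, int value) {
  assert(elts.empty() || lo > elts.back().hi);
  elts.push_back({lo, hi, value});
  return &elts.back();
}
```

For the CONSTRUCTOR quoted above, index 0 holds the `aggr_init_expr` and the `[1...2]` range is then appended after it.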

gcc/cp/ChangeLog:

PR c++/95241
* constexpr.c (get_or_insert_ctor_field): Add limited support
for RANGE_EXPR indexes.

gcc/testsuite/ChangeLog:

PR c++/95241
* g++.dg/cpp0x/constexpr-array25.C: New test.
---
  gcc/cp/constexpr.c| 15 +
  .../g++.dg/cpp0x/constexpr-array25.C  | 21 +++
  2 files changed, 36 insertions(+)
  create mode 100644 gcc/testsuite/g++.dg/cpp0x/constexpr-array25.C

diff --git a/gcc/cp/constexpr.c b/gcc/cp/constexpr.c
index 4e441ac8d2f..32f2ef96fc7 100644
--- a/gcc/cp/constexpr.c
+++ b/gcc/cp/constexpr.c
@@ -3301,6 +3301,21 @@ get_or_insert_ctor_field (tree ctor, tree index, int pos_hint = -1)
  }
else if (TREE_CODE (type) == ARRAY_TYPE || TREE_CODE (type) == VECTOR_TYPE)
  {
+  if (TREE_CODE (index) == RANGE_EXPR)
+   {
+ /* ??? Support for RANGE_EXPR indexes is currently limited to
+accessing one via POS_HINT, or appending a new one to the end
+of CTOR.  Support for other access patterns may be needed.  */
+ vec *elts = CONSTRUCTOR_ELTS (ctor);
+ if (vec_safe_length (elts))
+   {
+ tree lo = TREE_OPERAND (index, 0);
+ gcc_assert (array_index_cmp (lo, elts->last().index) > 0);
+   }
+ CONSTRUCTOR_APPEND_ELT (elts, index, NULL_TREE);
+ return &elts->last();
+   }
+
HOST_WIDE_INT i = find_array_ctor_elt (ctor, index, /*insert*/true);
gcc_assert (i >= 0);
constructor_elt *cep = CONSTRUCTOR_ELT (ctor, i);
diff --git a/gcc/testsuite/g++.dg/cpp0x/constexpr-array25.C b/gcc/testsuite/g++.dg/cpp0x/constexpr-array25.C
new file mode 100644
index 000..9162943249f
--- /dev/null
+++ b/gcc/testsuite/g++.dg/cpp0x/constexpr-array25.C
@@ 

Re: [PATCH] Port libgccjit to Windows.

2020-05-28 Thread David Malcolm via Gcc-patches
On Thu, 2020-05-28 at 16:51 -0300, Nicolas Bértolo wrote:
> > I'm going to have to trust your Windows expertise here; the tempdir
> > code looks convoluted to me, but perhaps that's the only way to do it.
> > (Microsoft's docs for "SECURITY_ATTRIBUTES" suggest to me that if
> > lpSecurityDescriptor is NULL, then the directory gets a default
> > security descriptor, and that this may mean it's only readable by the
> > user represented by the access token of the process [1], which might
> > suggest a simplification - but I'm very hazy on how the security model
> > in Windows works)
> 
> I tested this and it gives write access to the "Authenticated Users"
> group.

Aha - sounds like that would be a problem.  Thanks for clarifying.

> The way I did it gives access only to the user that owns the libgccjit
> process. I have to admit that it is a lot of code and it is hard to
> understand unless you know the security model of Windows well. I don't
> know it well; I wrote this keeping the documentation close and
> experimenting.

Thanks.

> > I was able to successfully bootstrap and regression test with your
> > patch on x86_64-pc-linux-gnu.  I also verified that the result of
> > "make install" was not affected for my configuration.
> 
> Great.
> 
> > I've pushed your patch to master as
> > c83027f32d9cca84959c7d6a1e519a0129731501.
> > 
> > Thanks again for the patch
> > Dave
> 
> Thanks to you for all the good feedback.
> 
> Nico.



Re: [PATCH] c++: Try to complete decomp types [PR95328]

2020-05-28 Thread Jason Merrill via Gcc-patches

On 5/27/20 4:50 AM, Jakub Jelinek wrote:

Hi!

Two years ago Paolo added the
   else if (processing_template_decl && !COMPLETE_TYPE_P (type))
     pedwarn (...);
lines to cp_finish_decomp.  For a type-dependent decl we punt much earlier,
but even for types which aren't type-dependent, COMPLETE_TYPE_P might be
false, as this testcase shows, so this patch tries complete_type first.
(The reason for writing it that way is that it is followed by another
else if, and if complete_type returns error_mark_node, we shouldn't report
anything, as a diagnostic should have been reported already.)
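Here is a minimal C++ sketch of the situation described above. The archived decomp53.C lost its template arguments, so S<int> and foo below are illustrative stand-ins, not the exact test. The key point is that the initializer's type does not depend on the enclosing template's parameter. The structured binding can therefore be checked while the template is still being parsed, possibly before S<int> has been instantiated. That is the gap complete_type fills.

```cpp
// Illustrative stand-ins for the (garbled) decomp53.C test case.
template <typename T>
struct S
{
  int a, b;
};

template <typename T>
int
foo ()
{
  // S<int> does not depend on T, so the front end may check this
  // structured binding while parsing the template, possibly before
  // S<int> has been instantiated (i.e. made complete).
  auto [a, b] = S<int>{1, 2};
  return a + b;
}
```

Calling foo<void> () returns 3. In C++14 mode, GCC should additionally emit the "structured bindings only available with" pedwarn that the test expects.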

Bootstrapped/regtested on x86_64-linux and i686-linux, ok for trunk?


OK.


2020-05-27  Jakub Jelinek  

PR c++/95328
* decl.c (cp_finish_decomp): Call complete_type before checking
COMPLETE_TYPE_P.

* g++.dg/cpp1z/decomp53.C: New test.

--- gcc/cp/decl.c.jj	2020-05-22 11:07:21.884215758 +0200
+++ gcc/cp/decl.c   2020-05-26 15:21:25.039880747 +0200
@@ -8392,6 +8392,8 @@ cp_finish_decomp (tree decl, tree first,
error_at (loc, "cannot decompose lambda closure type %qT", type);
goto error_out;
  }
+  else if (processing_template_decl && complete_type (type) == error_mark_node)
+goto error_out;
else if (processing_template_decl && !COMPLETE_TYPE_P (type))
  pedwarn (loc, 0, "structured binding refers to incomplete class type %qT",
 type);
--- gcc/testsuite/g++.dg/cpp1z/decomp53.C.jj	2020-05-26 15:25:01.397644953 +0200
+++ gcc/testsuite/g++.dg/cpp1z/decomp53.C   2020-05-26 15:24:37.764998398 +0200
@@ -0,0 +1,22 @@
+// PR c++/95328
+// { dg-do compile { target c++11 } }
+// { dg-options "" }
+
+template 
+struct S
+{
+  int a, b;
+};
+
+template 
+void
+foo ()
+{
+  auto [a, b] = S();	// { dg-warning "structured bindings only available with" "" { target c++14_down } }
+}
+
+int
+main ()
+{
+  foo ();
+}

Jakub





Re: [PATCH] c++: Fix bogus -Wparentheses warning [PR95344]

2020-05-28 Thread Jason Merrill via Gcc-patches

On 5/26/20 8:25 PM, Marek Polacek wrote:

Since r267272, which added location wrappers, cp_fold loses
TREE_NO_WARNING on a MODIFY_EXPR that finish_parenthesized_expr set, and
that results in a bogus -Wparentheses warning.

I.e., previously we had "b = 1" but now we have "VIEW_CONVERT_EXPR(b) = 1"
and cp_fold_maybe_rvalue folds away the location wrapper and so we do
2718 x = fold_build2_loc (loc, code, TREE_TYPE (x), op0, op1);
in cp_fold and the flag is lost.
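For reference, here is the source-level idiom at stake, as a small C++ sketch modelled on the new Wparentheses-2.c test (set_if_99 is a hypothetical name). Wrapping an assignment that is used as a truth value in parentheses is the conventional way to tell -Wparentheses the assignment is intentional. That is why the TREE_NO_WARNING flag set by finish_parenthesized_expr has to survive cp_fold.

```cpp
// Modelled on Wparentheses-2.c; the function name is illustrative.
// The parentheses around "b = true" mark the assignment as intentional,
// so -Wparentheses should stay quiet here.  Before the PR95344 fix, the
// flag recording this was lost during folding and GCC warned anyway.
bool
set_if_99 (int i)
{
  bool b = false;
  if (i == 99 ? (b = true) : false)
    return b;   // only reached when i == 99, right after b was set
  return b;     // i != 99: the assignment never ran, so b is false
}
```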

Bootstrapped/regtested on x86_64-pc-linux-gnu, ok for trunk/10/9?

PR c++/95344
* cp-gimplify.c (cp_fold) : Set TREE_NO_WARNING.

* c-c++-common/Wparentheses-2.c: New test.
---
  gcc/cp/cp-gimplify.c|  5 -
  gcc/testsuite/c-c++-common/Wparentheses-2.c | 18 ++
  2 files changed, 22 insertions(+), 1 deletion(-)
  create mode 100644 gcc/testsuite/c-c++-common/Wparentheses-2.c

diff --git a/gcc/cp/cp-gimplify.c b/gcc/cp/cp-gimplify.c
index 53d715dcd89..8b505dd878c 100644
--- a/gcc/cp/cp-gimplify.c
+++ b/gcc/cp/cp-gimplify.c
@@ -2745,7 +2745,10 @@ cp_fold (tree x)
x = org_x;
}
if (code == MODIFY_EXPR && TREE_CODE (x) == MODIFY_EXPR)
-   TREE_THIS_VOLATILE (x) = TREE_THIS_VOLATILE (org_x);
+   {
+ TREE_THIS_VOLATILE (x) = TREE_THIS_VOLATILE (org_x);
+ TREE_NO_WARNING (x) = TREE_NO_WARNING (org_x);
+   }


I wonder if we want to copy these flags lower down for any EXPR_P (x) 
where TREE_CODE (x) == code?


Jason



Re: [PATCH v2] c++: Fix bogus -Wparentheses warning [PR95344]

2020-05-28 Thread Marek Polacek via Gcc-patches
On Thu, May 28, 2020 at 05:01:51PM -0400, Jason Merrill wrote:
> On 5/26/20 8:25 PM, Marek Polacek wrote:
> > Since r267272, which added location wrappers, cp_fold loses
> > TREE_NO_WARNING on a MODIFY_EXPR that finish_parenthesized_expr set, and
> > that results in a bogus -Wparentheses warning.
> > 
> > I.e., previously we had "b = 1" but now we have "VIEW_CONVERT_EXPR(b) = 1"
> > and cp_fold_maybe_rvalue folds away the location wrapper and so we do
> > 2718 x = fold_build2_loc (loc, code, TREE_TYPE (x), op0, op1);
> > in cp_fold and the flag is lost.
> > 
> > Bootstrapped/regtested on x86_64-pc-linux-gnu, ok for trunk/10/9?
> > 
> > PR c++/95344
> > * cp-gimplify.c (cp_fold) : Set TREE_NO_WARNING.
> > 
> > * c-c++-common/Wparentheses-2.c: New test.
> > ---
> >   gcc/cp/cp-gimplify.c|  5 -
> >   gcc/testsuite/c-c++-common/Wparentheses-2.c | 18 ++
> >   2 files changed, 22 insertions(+), 1 deletion(-)
> >   create mode 100644 gcc/testsuite/c-c++-common/Wparentheses-2.c
> > 
> > diff --git a/gcc/cp/cp-gimplify.c b/gcc/cp/cp-gimplify.c
> > index 53d715dcd89..8b505dd878c 100644
> > --- a/gcc/cp/cp-gimplify.c
> > +++ b/gcc/cp/cp-gimplify.c
> > @@ -2745,7 +2745,10 @@ cp_fold (tree x)
> > x = org_x;
> > }
> > if (code == MODIFY_EXPR && TREE_CODE (x) == MODIFY_EXPR)
> > -   TREE_THIS_VOLATILE (x) = TREE_THIS_VOLATILE (org_x);
> > +   {
> > + TREE_THIS_VOLATILE (x) = TREE_THIS_VOLATILE (org_x);
> > + TREE_NO_WARNING (x) = TREE_NO_WARNING (org_x);
> > +   }
> 
> I wonder if we want to copy these flags lower down for any EXPR_P (x) where
> TREE_CODE (x) == code?

Sounds good; I don't think we want to lose those flags when folding in general,
not just for MODIFY_EXPR.
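The following toy model shows the v2 approach. It uses hypothetical Node/fold types rather than GCC's tree API. Folding may build a fresh node, and the per-node flags are copied from the original only when the folded result is still the same kind of expression. This mirrors the new "TREE_CODE (x) == code" check placed after the big switch.

```cpp
#include <memory>

// Toy model with hypothetical types; this is not GCC's tree API.
enum class Code { Modify, Plus, Constant };

struct Node
{
  Code code;
  bool no_warning = false;     // plays the role of TREE_NO_WARNING
  bool this_volatile = false;  // plays the role of TREE_THIS_VOLATILE
};

// Folding builds a fresh node, so flags set on the original are lost.
// For illustration, pretend every Plus folds down to a Constant.
std::shared_ptr<Node>
rebuild (const Node &org)
{
  Node fresh{org.code == Code::Plus ? Code::Constant : org.code};
  return std::make_shared<Node> (fresh);
}

std::shared_ptr<Node>
fold (const Node &org)
{
  std::shared_ptr<Node> x = rebuild (org);
  // Counterpart of "if (EXPR_P (x) && TREE_CODE (x) == code)": copy
  // the flags only when folding kept the same kind of expression.
  if (x->code == org.code)
    {
      x->this_volatile = org.this_volatile;
      x->no_warning = org.no_warning;
    }
  return x;
}
```

Under this rule, a Modify node keeps its no_warning flag after fold, while a Plus node that folds to a Constant does not inherit it.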

Bootstrapped/regtested on x86_64-pc-linux-gnu, ok for trunk?

-- >8 --
Since r267272, which added location wrappers, cp_fold loses
TREE_NO_WARNING on a MODIFY_EXPR that finish_parenthesized_expr set, and
that results in a bogus -Wparentheses warning.

I.e., previously we had "b = 1" but now we have "VIEW_CONVERT_EXPR(b) = 1"
and cp_fold_maybe_rvalue folds away the location wrapper and so we do
2718 x = fold_build2_loc (loc, code, TREE_TYPE (x), op0, op1);
in cp_fold and the flag is lost.

PR c++/95344
* cp-gimplify.c (cp_fold) : Don't set
TREE_THIS_VOLATILE here.
(cp_fold): Set it here along with TREE_NO_WARNING.

* c-c++-common/Wparentheses-2.c: New test.
---
 gcc/cp/cp-gimplify.c|  8 ++--
 gcc/testsuite/c-c++-common/Wparentheses-2.c | 18 ++
 2 files changed, 24 insertions(+), 2 deletions(-)
 create mode 100644 gcc/testsuite/c-c++-common/Wparentheses-2.c

diff --git a/gcc/cp/cp-gimplify.c b/gcc/cp/cp-gimplify.c
index 53d715dcd89..d6723e44ec4 100644
--- a/gcc/cp/cp-gimplify.c
+++ b/gcc/cp/cp-gimplify.c
@@ -2744,8 +2744,6 @@ cp_fold (tree x)
  else
x = org_x;
}
-  if (code == MODIFY_EXPR && TREE_CODE (x) == MODIFY_EXPR)
-   TREE_THIS_VOLATILE (x) = TREE_THIS_VOLATILE (org_x);
 
   break;
 
@@ -2994,6 +2992,12 @@ cp_fold (tree x)
   return org_x;
 }
 
+  if (EXPR_P (x) && TREE_CODE (x) == code)
+{
+  TREE_THIS_VOLATILE (x) = TREE_THIS_VOLATILE (org_x);
+  TREE_NO_WARNING (x) = TREE_NO_WARNING (org_x);
+}
+
   if (!c.evaluation_restricted_p ())
 {
   fold_cache->put (org_x, x);
diff --git a/gcc/testsuite/c-c++-common/Wparentheses-2.c b/gcc/testsuite/c-c++-common/Wparentheses-2.c
new file mode 100644
index 000..1aa5d314ae7
--- /dev/null
+++ b/gcc/testsuite/c-c++-common/Wparentheses-2.c
@@ -0,0 +1,18 @@
+// PR c++/95344 - bogus -Wparentheses warning.
+// { dg-do compile }
+// { dg-options "-Wparentheses" }
+
+#ifndef __cplusplus
+# define bool _Bool
+# define true 1
+# define false 0
+#endif
+
+void
+f (int i)
+{
+  bool b = false;
+  if (i == 99 ? (b = true) : false) // { dg-bogus "suggest parentheses" }
+{
+}
+}

base-commit: 3d8d5ddb539a5254c7ef83414377f4c74c7701d4
-- 
Marek Polacek • Red Hat, Inc. • 300 A St, Boston, MA



[OBSOLETE][PATCH] PR preprocessor/94657: use $AR, not 'ar',

2020-05-28 Thread Sergei Trofimovich via Gcc-patches
On Thu, 7 May 2020 08:18:31 +0100
Sergei Trofimovich via Gcc-patches  wrote:

> On Wed, 22 Apr 2020 23:05:38 +0100
> Sergei Trofimovich  wrote:
> 
> > From: Sergei Trofimovich 
> > 
> > On a system with both 'ar' and '${CHOST}-ar', the latter is preferred,
> > as it might not be the same tool as the default 'ar'.
> > 
> > Bug is initially reported downstream as https://bugs.gentoo.org/718004.
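The following is a simplified C++ model of the lookup that AC_CHECK_TOOL generates in the configure hunk of this patch. check_tool and its parameters are hypothetical, and tool_prefix stands for autoconf's ac_tool_prefix, which includes the trailing '-'. A user-supplied AR wins, then the host-prefixed tool, then plain 'ar'. The generated script also appears to warn when it falls back to an unprefixed tool while cross compiling; this sketch omits that.

```cpp
#include <functional>
#include <string>

// Hypothetical model of AC_CHECK_TOOL([AR], [ar]).  in_path stands in
// for the generated loop that searches each $PATH entry.
// Returns an empty string when no tool is found.
std::string
check_tool (const std::string &user_value, const std::string &tool_prefix,
            const std::string &tool,
            const std::function<bool (const std::string &)> &in_path)
{
  if (!user_value.empty ())
    return user_value;                 // "Let the user override the test."
  if (!tool_prefix.empty () && in_path (tool_prefix + tool))
    return tool_prefix + tool;         // e.g. x86_64-pc-linux-gnu-ar
  if (in_path (tool))
    return tool;                       // unprefixed fallback
  return "";
}
```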
> > 
> > libcpp/ChangeLog:
> > 
> > PR libcpp/94657
> > * Makefile.in: use @AR@ placeholder
> > * configure.ac: use AC_CHECK_TOOL to find 'ar'
> > * configure: regenerate
> > ---
> >  libcpp/ChangeLog|  7 
> >  libcpp/Makefile.in  |  2 +-
> >  libcpp/configure| 94 +
> >  libcpp/configure.ac |  1 +
> >  4 files changed, 103 insertions(+), 1 deletion(-)
> > 
> > diff --git a/libcpp/ChangeLog b/libcpp/ChangeLog
> > index 307cf3add94..77145768a3d 100644
> > --- a/libcpp/ChangeLog
> > +++ b/libcpp/ChangeLog
> > @@ -1,3 +1,10 @@
> > +2020-04-22  Sergei Trofimovich  
> > +
> > +   PR preprocessor/94657: use $AR, not 'ar'
> > +   * Makefile.in: use @AR@ placeholder
> > +   * configure.ac: use AC_CHECK_TOOL to find 'ar'
> > +   * configure: regenerate
> > +
> >  2020-02-14  Jakub Jelinek  
> >  
> > Partially implement P1042R1: __VA_OPT__ wording clarifications
> > diff --git a/libcpp/Makefile.in b/libcpp/Makefile.in
> > index 8f8c8f65eb3..af7a0c6f73e 100644
> > --- a/libcpp/Makefile.in
> > +++ b/libcpp/Makefile.in
> > @@ -25,7 +25,7 @@ srcdir = @srcdir@
> >  top_builddir = .
> >  VPATH = @srcdir@
> >  INSTALL = @INSTALL@
> > -AR = ar
> > +AR = @AR@
> >  ARFLAGS = cru
> >  ACLOCAL = @ACLOCAL@
> >  AUTOCONF = @AUTOCONF@
> > diff --git a/libcpp/configure b/libcpp/configure
> > index 11da199083b..a6dcf5dcb61 100755
> > --- a/libcpp/configure
> > +++ b/libcpp/configure
> > @@ -657,6 +657,7 @@ ACLOCAL
> >  EGREP
> >  GREP
> >  CPP
> > +AR
> >  RANLIB
> >  ac_ct_CXX
> >  CXXFLAGS
> > @@ -1039,6 +1040,7 @@ do
> >| -silent | --silent | --silen | --sile | --sil)
> >  silent=yes ;;
> >  
> > +
> >-sbindir | --sbindir | --sbindi | --sbind | --sbin | --sbi | --sb)
> >  ac_prev=sbindir ;;
> >-sbindir=* | --sbindir=* | --sbindi=* | --sbind=* | --sbin=* \
> > @@ -4008,6 +4010,98 @@ else
> >RANLIB="$ac_cv_prog_RANLIB"
> >  fi
> >  
> > +if test -n "$ac_tool_prefix"; then
> > +  # Extract the first word of "${ac_tool_prefix}ar", so it can be a program name with args.
> > +set dummy ${ac_tool_prefix}ar; ac_word=$2
> > +{ $as_echo "$as_me:${as_lineno-$LINENO}: checking for $ac_word" >&5
> > +$as_echo_n "checking for $ac_word... " >&6; }
> > +if ${ac_cv_prog_AR+:} false; then :
> > +  $as_echo_n "(cached) " >&6
> > +else
> > +  if test -n "$AR"; then
> > +  ac_cv_prog_AR="$AR" # Let the user override the test.
> > +else
> > +as_save_IFS=$IFS; IFS=$PATH_SEPARATOR
> > +for as_dir in $PATH
> > +do
> > +  IFS=$as_save_IFS
> > +  test -z "$as_dir" && as_dir=.
> > +for ac_exec_ext in '' $ac_executable_extensions; do
> > +  if as_fn_executable_p "$as_dir/$ac_word$ac_exec_ext"; then
> > +ac_cv_prog_AR="${ac_tool_prefix}ar"
> > +$as_echo "$as_me:${as_lineno-$LINENO}: found $as_dir/$ac_word$ac_exec_ext" >&5
> > +break 2
> > +  fi
> > +done
> > +  done
> > +IFS=$as_save_IFS
> > +
> > +fi
> > +fi
> > +AR=$ac_cv_prog_AR
> > +if test -n "$AR"; then
> > +  { $as_echo "$as_me:${as_lineno-$LINENO}: result: $AR" >&5
> > +$as_echo "$AR" >&6; }
> > +else
> > +  { $as_echo "$as_me:${as_lineno-$LINENO}: result: no" >&5
> > +$as_echo "no" >&6; }
> > +fi
> > +
> > +
> > +fi
> > +if test -z "$ac_cv_prog_AR"; then
> > +  ac_ct_AR=$AR
> > +  # Extract the first word of "ar", so it can be a program name with args.
> > +set dummy ar; ac_word=$2
> > +{ $as_echo "$as_me:${as_lineno-$LINENO}: checking for $ac_word" >&5
> > +$as_echo_n "checking for $ac_word... " >&6; }
> > +if ${ac_cv_prog_ac_ct_AR+:} false; then :
> > +  $as_echo_n "(cached) " >&6
> > +else
> > +  if test -n "$ac_ct_AR"; then
> > +  ac_cv_prog_ac_ct_AR="$ac_ct_AR" # Let the user override the test.
> > +else
> > +as_save_IFS=$IFS; IFS=$PATH_SEPARATOR
> > +for as_dir in $PATH
> > +do
> > +  IFS=$as_save_IFS
> > +  test -z "$as_dir" && as_dir=.
> > +for ac_exec_ext in '' $ac_executable_extensions; do
> > +  if as_fn_executable_p "$as_dir/$ac_word$ac_exec_ext"; then
> > +ac_cv_prog_ac_ct_AR="ar"
> > +$as_echo "$as_me:${as_lineno-$LINENO}: found $as_dir/$ac_word$ac_exec_ext" >&5
> > +break 2
> > +  fi
> > +done
> > +  done
> > +IFS=$as_save_IFS
> > +
> > +fi
> > +fi
> > +ac_ct_AR=$ac_cv_prog_ac_ct_AR
> > +if test -n "$ac_ct_AR"; then
> > +  { $as_echo "$as_me:${as_lineno-$LINENO}: result: $ac_ct_AR" >&5
> > +$as_echo "$ac_ct_AR" >&6; }
> > +else
> > +  { $as_echo "$as_me:${as_lineno-$LINENO}: result: no" >&5
> > +$as_echo "no" >&6; }
> > +fi
> > +
> > +  if test "x$ac_ct_AR" = x; then
> > +AR=""
> > +  else
> > +case $cross_compiling:$ac_tool_warned in
> > +yes:)
> > +{ $as_echo "$as_me:$

[PATCH] diagnostics: Consistently add fixit hint for implicit builtin declaration

2020-05-28 Thread Mark Wielaard
There are two warnings that might trigger when a builtin function is
used but not yet declared. Both are reached through implicitly_declare in
c-decl.c. The first, in implicit_decl_warning, does warn for builtins,
but does not add a fixit hint for them (only for non-builtins, when
a header is suggested through lookup_name_fuzzy). This warning is
guarded by -Wimplicit-function-declaration. The second warning, which
does include a fixit hint if possible, is given when the implicit
builtin declaration has an incompatible signature. This second warning
cannot be disabled.

This setup means that you only get a fixit hint for usage of builtin
functions where the implicit signature is different from the actual
signature of the builtin. No fixit hints with header suggestions
are ever generated for builtins like abs, isdigit or putchar.

It seems more consistent to always generate a fixit hint, if possible,
for the -Wimplicit-function-declaration warning, and to make the second
warning depend on -Wbuiltin-declaration-mismatch, like other warnings
about builtin declaration mismatches.

This includes a new test showing that we now get fixit hints for abs,
isdigit and putchar, and some small tweaks to existing tests to show the
effect of -Wno-builtin-declaration-mismatch with this change.

A nice follow-up would be to merge the built-in missing headers table
from header_for_builtin_fn in c/c-decl.c with the known headers in
c-family/known-headers.cc so that they can also be used in the C++
frontend unqualified_name_lookup_error through suggest_alternatives_for.
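Below is a much-reduced C++ sketch of the lookup behind the new note. It assumes a small hypothetical table in place of header_for_builtin_fn, whose real table covers many more builtins. It also uses plain ASCII quotes where the real %qs/%qE directives may print typographic ones. When no header is known, the patched code emits no note, which the sketch models as an empty string.

```cpp
#include <cstring>
#include <string>

// Hypothetical, much-reduced stand-in for header_for_builtin_fn.
const char *
header_for_builtin (const char *name)
{
  static const struct { const char *fn; const char *header; } table[] = {
    { "abs", "<stdlib.h>" },
    { "isdigit", "<ctype.h>" },
    { "putchar", "<stdio.h>" },
  };
  for (const auto &entry : table)
    if (std::strcmp (entry.fn, name) == 0)
      return entry.header;
  return nullptr;
}

// Build note text shaped like the new inform () call.
std::string
missing_header_note (const char *name)
{
  const char *header = header_for_builtin (name);
  if (!header)
    return "";  // no known header: no note is emitted
  return std::string ("include '") + header
         + "' or provide a declaration of '" + name + "'";
}
```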

gcc/c/ChangeLog:

* c-decl.c (implicit_decl_warning): When warned and olddecl is
an undeclared builtin, then add a fixit header hint, if found.
(implicitly_declare): Add OPT_Wbuiltin_declaration_mismatch to
warning_at about implicit builtin declaration type mismatch.

gcc/testsuite/ChangeLog:

* gcc.dg/missing-header-fixit-3.c: Add
-Wno-implicit-function-declaration.
* gcc.dg/missing-header-fixit-4.c: Add new expected output.
* gcc.dg/missing-header-fixit-5.c: New testcase.
---
 gcc/c/c-decl.c| 30 ++--
 gcc/testsuite/gcc.dg/missing-header-fixit-3.c |  2 +-
 gcc/testsuite/gcc.dg/missing-header-fixit-4.c |  4 +++
 gcc/testsuite/gcc.dg/missing-header-fixit-5.c | 36 +++
 4 files changed, 68 insertions(+), 4 deletions(-)
 create mode 100644 gcc/testsuite/gcc.dg/missing-header-fixit-5.c

diff --git a/gcc/c/c-decl.c b/gcc/c/c-decl.c
index b3e05be0af87..81bd2ee94f02 100644
--- a/gcc/c/c-decl.c
+++ b/gcc/c/c-decl.c
@@ -3368,8 +3368,30 @@ implicit_decl_warning (location_t loc, tree id, tree olddecl)
 warned = warning_at (loc, OPT_Wimplicit_function_declaration,
 G_("implicit declaration of function %qE"), id);
 
-  if (olddecl && warned)
-locate_old_decl (olddecl);
+  if (warned)
+{
+  /* Whether the olddecl is an undeclared builtin function.
+locate_old_decl will not generate a diagnostic for those,
+so in that case we want to look elsewhere.  */
+  bool undeclared_builtin = (olddecl
+&& TREE_CODE (olddecl) == FUNCTION_DECL
+&& fndecl_built_in_p (olddecl)
+&& !C_DECL_DECLARED_BUILTIN (olddecl));
+  if (undeclared_builtin)
+   {
+ const char *header = header_for_builtin_fn (olddecl);
+ if (header)
+   {
+ rich_location richloc (line_table, loc);
+ maybe_add_include_fixit (&richloc, header, true);
+ inform (&richloc,
+ "include %qs or provide a declaration of %qE",
+ header, id);
+   }
+   }
+  else if (olddecl)
+   locate_old_decl (olddecl);
+}
 
   if (!warned)
 hint.suppress ();
@@ -3631,7 +3653,9 @@ implicitly_declare (location_t loc, tree functionid)
  (TREE_TYPE (decl)));
  if (!comptypes (newtype, TREE_TYPE (decl)))
{
- bool warned = warning_at (loc, 0, "incompatible implicit "
+ bool warned = warning_at (loc,
+   OPT_Wbuiltin_declaration_mismatch,
+   "incompatible implicit "
"declaration of built-in "
"function %qD", decl);
  /* See if we can hint which header to include.  */
diff --git a/gcc/testsuite/gcc.dg/missing-header-fixit-3.c b/gcc/testsuite/gcc.dg/missing-header-fixit-3.c
index dd53bf65d3c8..8394010c1ac1 100644
--- a/gcc/testsuite/gcc.dg/missing-header-fixit-3.c
+++ b/gcc/testsuite/gcc.dg/missing-header-fixit-3.c
@@ -2,7 +2,7 @@
adding them to the top of the file, given that there is no
pre-existing #include.  */
 
-/* { dg-options "-fdiagnostics-show-caret -fdi

[PATCH] c++: Make braced-init-list as template arg work with aggr init [PR95369]

2020-05-28 Thread Marek Polacek via Gcc-patches
Barry pointed out to me that our braced-init-list as a template-argument
extension doesn't work as expected when we aggregate-initialize.  This
patch fixes it by calling digest_init in convert_nontype_argument so that
we can actually convert the CONSTRUCTOR.

I don't think we can call digest_init any earlier, and it needs to
happen before the call to build_converted_constant_expr.
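The patch calls reshape_init before digest_init because reshape_init undoes brace elision: it regroups a flat initializer list into the nested form an aggregate expects. The same rule is visible in an ordinary initializer. The sketch below reuses the aggregate shapes from the new nontype-class38.C test, whose Y<{ 1, 2, 3 }> case is commented as relying on brace elision.

```cpp
// Same aggregate shapes as in nontype-class38.C.
struct S { int a; int b; };
struct W { int i; S s; };

// Brace elision: the flat list { 1, 2, 3 } initializes W as if it had
// been written { 1, { 2, 3 } }.  In the front end, reshape_init does
// this regrouping before digest_init checks each element.
constexpr W w = { 1, 2, 3 };
static_assert (w.i == 1 && w.s.a == 2 && w.s.b == 3, "brace elision");
```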

Barry also noticed that we allow designated initializers for
non-aggregate types in the template-argument context, i.e. this

  struct S {
unsigned a;
unsigned b;
constexpr S(unsigned _a, unsigned _b) noexcept: a{_a}, b{_b} { }
  };

  template struct X { };

  void f()
  {
X<{.a = 1, .b = 2}> x;
  }

probably should not compile.  But I'm not too sure about it, and don't
know how I would fix it anyway, so I'm not dealing with it in this
patch.

Bootstrapped/regtested on x86_64-pc-linux-gnu, ok for trunk?

gcc/cp/ChangeLog:

PR c++/95369
* pt.c (convert_nontype_argument): In C++20, reshape and digest
a braced-init-list if the type is an aggregate.

gcc/testsuite/ChangeLog:

PR c++/95369
* g++.dg/cpp2a/nontype-class38.C: New test.
---
 gcc/cp/pt.c  | 13 +
 gcc/testsuite/g++.dg/cpp2a/nontype-class38.C | 30 
 2 files changed, 43 insertions(+)
 create mode 100644 gcc/testsuite/g++.dg/cpp2a/nontype-class38.C

diff --git a/gcc/cp/pt.c b/gcc/cp/pt.c
index 90dafff3aa7..adb7593f77d 100644
--- a/gcc/cp/pt.c
+++ b/gcc/cp/pt.c
@@ -7133,6 +7133,19 @@ convert_nontype_argument (tree type, tree expr, tsubst_flags_t complain)
return error_mark_node;
 }
 
+  /* For a { } template argument, like in X<{ 1, 2 }>, we need to digest
+ here so that build_converted_constant_expr below is able to convert
+ it to TYPE.  */
+  if (cxx_dialect >= cxx20
+  && BRACE_ENCLOSED_INITIALIZER_P (expr)
+  && CP_AGGREGATE_TYPE_P (type))
+{
+  expr = reshape_init (type, expr, complain);
+  expr = digest_init (type, expr, complain);
+  if (expr == error_mark_node)
+   return error_mark_node;
+}
+
   /* If we are in a template, EXPR may be non-dependent, but still
  have a syntactic, rather than semantic, form.  For example, EXPR
  might be a SCOPE_REF, rather than the VAR_DECL to which the
diff --git a/gcc/testsuite/g++.dg/cpp2a/nontype-class38.C b/gcc/testsuite/g++.dg/cpp2a/nontype-class38.C
new file mode 100644
index 000..5b440fd1c9e
--- /dev/null
+++ b/gcc/testsuite/g++.dg/cpp2a/nontype-class38.C
@@ -0,0 +1,30 @@
+// PR c++/95369
+// { dg-do compile { target c++20 } }
+
+struct S {
+  int a;
+  int b;
+};
+
+struct W {
+  int i;
+  S s;
+};
+
+template 
+void fnc()
+{
+}
+
+template struct X { };
+template struct Y { };
+
+void f()
+{
+  fnc<{ .a = 10, .b = 20 }>();
+  fnc<{ 10, 20 }>();
+  X<{ .a = 1, .b = 2 }> x;
+  X<{ 1, 2 }> x2;
+  // Brace elision is likely to be allowed.
+  Y<{ 1, 2, 3 }> x3;
+}

base-commit: 3d8d5ddb539a5254c7ef83414377f4c74c7701d4
-- 
Marek Polacek • Red Hat, Inc. • 300 A St, Boston, MA



Re: [PATCH 2/2] Provide diagnostic hints for missing C++ cinttypes string constants.

2020-05-28 Thread Mark Wielaard
Hi,

On Mon, May 25, 2020 at 12:26:33PM -0400, Jason Merrill wrote:
> On 5/23/20 8:30 PM, Mark Wielaard wrote:
> > When reporting an error in cp_parser and we notice a string literal
> > followed by an unknown name check whether there is a known standard
> > header containing a string macro with the same name, then add a hint
> > to the error message to include that header.
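The check described above can be modelled in a few lines of C++. The Token/Kind types are hypothetical, and a crude "PRI"/"SCN" prefix match stands in for the real string-macro table; get_cp_stdlib_header_for_string_macro_name appears to consult an explicit list of macro names. The previous token is fetched safely, as cp_lexer_safe_previous_token does. A hint is produced only for a string literal followed by a name that looks like a <cinttypes> format macro such as PRId64.

```cpp
#include <cstddef>
#include <string>
#include <vector>

// Hypothetical token model; GCC's cp_token is far richer.
enum class Kind { String, Name, Other };
struct Token { Kind kind; std::string text; };

// Crude stand-in for the real table of string macros.
const char *
header_for_string_macro (const std::string &name)
{
  if (name.compare (0, 3, "PRI") == 0 || name.compare (0, 3, "SCN") == 0)
    return "<cinttypes>";
  return nullptr;
}

// Like cp_lexer_safe_previous_token: no previous token at the start.
const Token *
safe_previous (const std::vector<Token> &toks, std::size_t next)
{
  return next > 0 && next <= toks.size () ? &toks[next - 1] : nullptr;
}

// Suggest a header when a string literal is followed by a name that
// matches a known string macro, as in "%" PRId64.
const char *
string_macro_hint (const std::vector<Token> &toks, std::size_t next)
{
  const Token *prev = safe_previous (toks, next);
  if (prev && prev->kind == Kind::String
      && next < toks.size () && toks[next].kind == Kind::Name)
    return header_for_string_macro (toks[next].text);
  return nullptr;
}
```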
> > 
> > gcc/c-family/ChangeLog:
> > 
> > * known-headers.cc (get_cp_stdlib_header_for_string_macro_name):
> > New function.
> > * known-headers.h (get_c_stdlib_header_for_string_macro_name):
> 
> Missing 'p'.
> 
> > New function definition.
> 
> Declaration, not definition.
> 
> The C++ changes are OK with these ChangeLog corrections.

Thanks. David, are you OK with the diagnostic changes?

Who can we trick into reviewing the C frontend changes in the 1/2
patch that this depends on?

Cheers,

Mark

> > gcc/cp/ChangeLog:
> > 
> > * parser.c (cp_lexer_safe_previous_token): New function.
> > (cp_parser_error_1): Add name_hint if the previous token is
> > a string literal and next token is a CPP_NAME and we have a
> > missing header suggestion for the name.
> > 
> > gcc/testsuite/ChangeLog:
> > 
> > * g++.dg/spellcheck-inttypes.C: Add string-literal testcases.
> > ---
> >   gcc/c-family/known-headers.cc  |  8 +
> >   gcc/c-family/known-headers.h   |  1 +
> >   gcc/cp/parser.c| 36 
> >   gcc/testsuite/g++.dg/spellcheck-inttypes.C | 39 ++
> >   4 files changed, 84 insertions(+)
> > 
> > diff --git a/gcc/c-family/known-headers.cc b/gcc/c-family/known-headers.cc
> > index c07cfd1db815..977230a586db 100644
> > --- a/gcc/c-family/known-headers.cc
> > +++ b/gcc/c-family/known-headers.cc
> > @@ -268,6 +268,14 @@ get_c_stdlib_header_for_string_macro_name (const char *name)
> > return get_string_macro_hint (name, STDLIB_C);
> >   }
> > +/* Given non-NULL NAME, return the header name defining a string macro
> > +   within the C++ standard library (with '<' and '>'), or NULL.  */
> > +const char *
> > +get_cp_stdlib_header_for_string_macro_name (const char *name)
> > +{
> > +  return get_string_macro_hint (name, STDLIB_CPLUSPLUS);
> > +}
> > +
> >   /* Implementation of class suggest_missing_header.  */
> >   /* suggest_missing_header's ctor.  */
> > diff --git a/gcc/c-family/known-headers.h b/gcc/c-family/known-headers.h
> > index a69bbbf28e76..f0c89dc9019d 100644
> > --- a/gcc/c-family/known-headers.h
> > +++ b/gcc/c-family/known-headers.h
> > @@ -24,6 +24,7 @@ extern const char *get_c_stdlib_header_for_name (const char *name);
> >   extern const char *get_cp_stdlib_header_for_name (const char *name);
> >   extern const char *get_c_stdlib_header_for_string_macro_name (const char *n);
> > +extern const char *get_cp_stdlib_header_for_string_macro_name (const char *n);
> >   /* Subclass of deferred_diagnostic for suggesting to the user
> >  that they have missed a #include.  */
> > diff --git a/gcc/cp/parser.c b/gcc/cp/parser.c
> > index 54ca875ce54c..95b8c635fc65 100644
> > --- a/gcc/cp/parser.c
> > +++ b/gcc/cp/parser.c
> > @@ -45,6 +45,7 @@ along with GCC; see the file COPYING3.  If not see
> >   #include "tree-iterator.h"
> >   #include "cp-name-hint.h"
> >   #include "memmodel.h"
> > +#include "c-family/known-headers.h"
> >   
> >   /* The lexer.  */
> > @@ -776,6 +777,20 @@ cp_lexer_previous_token (cp_lexer *lexer)
> > return cp_lexer_token_at (lexer, tp);
> >   }
> > +/* Same as above, but return NULL when the lexer doesn't own the token
> > +   buffer or if the next_token is at the start of the token
> > +   vector.  */
> > +
> > +static cp_token *
> > +cp_lexer_safe_previous_token (cp_lexer *lexer)
> > +{
> > +  if (lexer->buffer)
> > +if (lexer->next_token != lexer->buffer->address ())
> > +  return cp_lexer_previous_token (lexer);
> > +
> > +  return NULL;
> > +}
> > +
> >   /* Overload for make_location, taking the lexer to mean the location of the
> >  previous token.  */
> > @@ -2919,6 +2934,7 @@ cp_parser_error_1 (cp_parser* parser, const char* gmsgid,
> > }
> >   }
> > +  auto_diagnostic_group d;
> > gcc_rich_location richloc (input_location);
> > bool added_matching_location = false;
> > @@ -2941,6 +2957,26 @@ cp_parser_error_1 (cp_parser* parser, const char* gmsgid,
> >   = richloc.add_location_if_nearby (matching_location);
> >   }
> > +  /* If we were parsing a string-literal and there is an unknown name
> > + token right after, then check to see if that could also have been
> > + a literal string by checking the name against a list of known
> > + standard string literal constants defined in header files. If
> > + there is one, then add that as a hint to the error message. */
> > +  name_hint h;
> > +  cp_token *prev_token = cp_lexer_safe_previous_token (parser->lexer);
> > +  if (prev_token && cp_parser

Re: [PATCH] diagnostics: Consistently add fixit hint for implicit builtin declaration

2020-05-28 Thread Martin Sebor via Gcc-patches

On 5/28/20 5:16 PM, Mark Wielaard wrote:

There are two warnings that might trigger when a builtin function is
used but not yet declared. Both are reached through implicitly_declare in
c-decl.c. The first, in implicit_decl_warning, does warn for builtins,
but does not add a fixit hint for them (only for non-builtins, when
a header is suggested through lookup_name_fuzzy). This warning is
guarded by -Wimplicit-function-declaration. The second warning, which
does include a fixit hint if possible, is given when the implicit
builtin declaration has an incompatible signature. This second warning
cannot be disabled.

This setup means that you only get a fixit hint for usage of builtin
functions where the implicit signature is different from the actual
signature of the builtin. No fixit hints with header suggestions
are ever generated for builtins like abs, isdigit or putchar.

It seems more consistent to always generate a fixit hint, if possible,
for the -Wimplicit-function-declaration warning, and to make the second
warning depend on -Wbuiltin-declaration-mismatch, like other warnings
about builtin declaration mismatches.

This includes a new test showing that we now get fixit hints for abs,
isdigit and putchar, and some small tweaks to existing tests to show the
effect of -Wno-builtin-declaration-mismatch with this change.

A nice follow-up would be to merge the built-in missing headers table
from header_for_builtin_fn in c/c-decl.c with the known headers in
c-family/known-headers.cc so that they can also be used in the C++
frontend unqualified_name_lookup_error through suggest_alternatives_for.


This is much more in David's domain than mine but since I promised
to look at it let me just say it seems like a nice improvement :)

Although few tests bother with it, since you add an option for
the existing warning where there was none before, an even more
exhaustive test than the one you added would also verify the same
option can be used to suppress it (e.g., via #pragma GCC diagnostic
ignored).

Martin




[PATCH] rs6000: libgcc multilib FAT libraries

2020-05-28 Thread David Edelsohn via Gcc-patches
When AIX added 64-bit support, it implemented what Apple MacOS Darwin
calls "FAT" libraries for its equivalent functionality -- both 32-bit
and 64-bit objects (or shared objects) are co-located in the same
archive.  GCC on AIX historically has followed the GCC multilib
directory hierarchy approach, with separate directories and archives
for each multilib.

We are now working to support GCC on AIX in 64-bit mode.  To retain
the directory hierarchy, it is beneficial to shift (or at least
initially augment) the GCC multilib mechanism with AIX-style "FAT"
libraries.

It is beneficial for the "FAT" libraries to be created consistently
for GCC in both 32-bit mode and 64-bit mode, so this begins the
process for libgcc in the existing 32-bit build.  When all of the
libraries are converted, the multilib rules will look for 32-bit and
64-bit multilibs in the top-level library.  All target multilibs need
to be enabled at the same time, but the build can start creating the
"FAT" libraries before it uses them.

The goal is to place both 32-bit and 64-bit objects and shared objects
in archives at the top level, not in multilib subdirectories.  The
multilibs are built in subdirectories, but must be combined during the
last parts of the target library build process.  Because of the way
GCC bootstrap works, the libraries must be combined during each of the
multiple stages of GCC bootstrap, not solely when installed in the
final destination, so the libraries have to be correct at the end of
each target library build stage; an install recipe alone is not enough.

For libgcc, this is accomplished by copying 64-bit objects into the
top-level 32-bit library in 32-bit mode, and 32-bit objects into the
top-level 64-bit library in 64-bit mode.  The recipe is guarded by
MULTIBUILDTOP so that it runs only at the top level, after the
multilibs are built.  The recipe is rather explicit, but it already has
to know a lot of details about the names and locations of objects, so
I did not see the need to insert additional macros, which would only
hide its purpose.  It is also a very target-specific purpose and context.

For other target libraries, I plan to attach something similar to the
all-local rule that is only valid when $(target_os) is AIX.
Something similar can be implemented for Darwin.

If anyone has a suggestion for a cleaner approach, please let me know.

Thanks, David

libgcc/
* config.host (extra_parts): Add crtcxa_64 and crtdbase_64.
* config/rs6000/t-aix-cxa: Explicitly compile 32 bit with -maix32
and 64 bit with -maix64.
* config/rs6000/t-slibgcc-aix: Remove extra @multilib_dir@ level.
Build and install AIX-style FAT libraries.

gcc/
* config/rs6000/aix72.h (STARTFILE_SPEC): Add 64 bit crtcxa.


0001-rs6000-libgcc-multilib.patch
Description: Binary data


Re: [PATCH v3] Add -fuse-ld= to specify an arbitrary executable as the linker

2020-05-28 Thread Fangrui Song via Gcc-patches

On 2020-05-25, Martin Liška wrote:

On 5/22/20 6:42 AM, Fangrui Song wrote:

but I can't fix this one because joining two lines will break the 80-column rule.


What about this:

diff --git a/gcc/collect2.c b/gcc/collect2.c
index cc57a20e08b..e5b54b080f7 100644
--- a/gcc/collect2.c
+++ b/gcc/collect2.c
@@ -1138,8 +1138,8 @@ main (int argc, char **argv)
  /* Search the ordinary system bin directories
 for `ld' (if native linking) or `TARGET-ld' (if cross).  */
  if (ld_file_name == 0)
-   ld_file_name =
- find_a_file (&path, full_ld_suffixes[selected_linker], X_OK);
+   ld_file_name
+ = find_a_file (&path, full_ld_suffixes[selected_linker], X_OK);
}
#ifdef REAL_NM_FILE_NAME

Apart from that, the patch is fine.

Martin


Adding people who may be able to approve and commit on my behalf.

This formatting issue seems small enough. Hopefully a maintainer can do
it for me.


Re: [PATCH] diagnostics: Consistently add fixit hint for implicit builtin declaration

2020-05-28 Thread Mark Wielaard
Hi Martin,

On Thu, May 28, 2020 at 06:21:39PM -0600, Martin Sebor wrote:
> Although few tests bother with it, since you add an option for
> the existing warning where there was none before, an even more
> exhaustive test than the one you added would also verify the same
> option can be used to suppress it (e.g., via #pragma GCC diagnostic
> ignored).

OK. How about this variant with an extra
Wbuiltin-declaration-mismatch-ignore.c test?
It FAILS with (test for excess errors) before the patch.
It PASSes with the patch.

Thanks,

Mark

From a35979eee900c57ebf5c60f2eea7f8e4eb6d0464 Mon Sep 17 00:00:00 2001
From: Mark Wielaard 
Date: Thu, 28 May 2020 02:55:36 +0200
Subject: [PATCH] diagnostics: Consistently add fixit hint for implicit builtin
 declaration

There are two warnings that might trigger when a builtin function is
used but not declared yet. Both are called through implicitly_declare in
c-decl.c. The first, in implicit_decl_warning, does warn for builtins,
but does not add a fixit hint for them (only for non-builtins when
a header is suggested through lookup_name_fuzzy). This warning is
guarded by -Wimplicit-function-declaration. The second warning, which
does include a fixit hint if possible, is given when the implicit
builtin declaration has an incompatible signature. This second warning
cannot be disabled.

This setup means that you only get a fixit-hint for usage of builtin
functions where the implicit signature is different from the actual
signature of the builtin. No fixit hints with header suggestions
are ever generated for builtins like abs, isdigit or putchar.

It seems more consistent to always generate a fixit-hint if possible
for the -Wimplicit-function-declaration warning. And for the second
warning to make it depend on -Wbuiltin-declaration-mismatch like
other warnings about builtin declaration mismatches.

Include a new test to show we get fixit-hints for abs, isdigit and
putchar now. Some small tweaks to existing tests to show the
effect of -Wno-builtin-declaration-mismatch with this change. And
a testcase to show that #pragma GCC diagnostic ignored now works.

gcc/c/ChangeLog:

	* c-decl.c (implicit_decl_warning): When warned and olddecl is
	an undeclared builtin, then add a fixit header hint, if found.
	(implicitly_declare): Add OPT_Wbuiltin_declaration_mismatch to
	warning_at about implicit builtin declaration type mismatch.

gcc/testsuite/ChangeLog:

	* gcc.dg/missing-header-fixit-3.c: Add
	-Wno-implicit-function-declaration.
	* gcc.dg/missing-header-fixit-4.c: Add new expected output.
	* gcc.dg/missing-header-fixit-5.c: New testcase.
	* gcc.dg/Wbuiltin-declaration-mismatch-ignore.c: Likewise.
---
 gcc/c/c-decl.c| 30 ++--
 .../Wbuiltin-declaration-mismatch-ignore.c| 11 ++
 gcc/testsuite/gcc.dg/missing-header-fixit-3.c |  2 +-
 gcc/testsuite/gcc.dg/missing-header-fixit-4.c |  4 +++
 gcc/testsuite/gcc.dg/missing-header-fixit-5.c | 36 +++
 5 files changed, 79 insertions(+), 4 deletions(-)
 create mode 100644 gcc/testsuite/gcc.dg/Wbuiltin-declaration-mismatch-ignore.c
 create mode 100644 gcc/testsuite/gcc.dg/missing-header-fixit-5.c

diff --git a/gcc/c/c-decl.c b/gcc/c/c-decl.c
index b3e05be0af87..81bd2ee94f02 100644
--- a/gcc/c/c-decl.c
+++ b/gcc/c/c-decl.c
@@ -3368,8 +3368,30 @@ implicit_decl_warning (location_t loc, tree id, tree olddecl)
 warned = warning_at (loc, OPT_Wimplicit_function_declaration,
 			 G_("implicit declaration of function %qE"), id);
 
-  if (olddecl && warned)
-locate_old_decl (olddecl);
+  if (warned)
+{
+  /* Whether the olddecl is an undeclared builtin function.
+	 locate_old_decl will not generate a diagnostic for those,
+	 so in that case we want to look elsewhere.  */
+  bool undeclared_builtin = (olddecl
+ && TREE_CODE (olddecl) == FUNCTION_DECL
+ && fndecl_built_in_p (olddecl)
+ && !C_DECL_DECLARED_BUILTIN (olddecl));
+  if (undeclared_builtin)
+	{
+	  const char *header = header_for_builtin_fn (olddecl);
+	  if (header)
+	{
+	  rich_location richloc (line_table, loc);
+	  maybe_add_include_fixit (&richloc, header, true);
+	  inform (&richloc,
+		  "include %qs or provide a declaration of %qE",
+		  header, id);
+	}
+	}
+  else if (olddecl)
+	locate_old_decl (olddecl);
+}
 
   if (!warned)
 hint.suppress ();
@@ -3631,7 +3653,9 @@ implicitly_declare (location_t loc, tree functionid)
 		  (TREE_TYPE (decl)));
 	  if (!comptypes (newtype, TREE_TYPE (decl)))
 		{
-		  bool warned = warning_at (loc, 0, "incompatible implicit "
+		  bool warned = warning_at (loc,
+	OPT_Wbuiltin_declaration_mismatch,
+	"incompatible implicit "
 	"declaration of built-in "
 	"function %qD", decl);
 		  /* See if we can hint which header to include.  */
diff --git a/gcc/testsuite/gcc.dg/Wbuiltin-declaration-mismatch-ignore.c b/gcc/testsuite/gcc.dg/Wbuiltin-declaration-mismatch-ignore.c
new file mode 1006

[PATCH] Optimize and+or+sub into xor+not (PR94882)

2020-05-28 Thread Naveen Hurugalawadi via Gcc-patches
Hi,

Please find attached the patch that addresses PR94882.

Bootstrapped and regression tested on x86_64-pc-linux-gnu.

Thanks,
Naveen

match.pd: (x & y) - (x | y) - 1 -> ~(x ^ y) simplification [PR94882]

2020-05-04  Naveen H S

PR tree-optimization/94882

* match.pd (x & y) - (x | y) - 1 -> ~(x ^ y): New simplification.

* gcc.dg/tree-ssa/pr94882.c: New test.
* gcc.dg/tree-ssa/pr94882-1.c: New test.
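The algebra behind the new simplification: x | y is the disjoint union of the bits of x & y and x ^ y, so (x & y) - (x | y) = -(x ^ y), and in two's complement -v - 1 = ~v. The following C++ sketch (the function name is hypothetical and is not part of the patch or its testcases) checks the identity exhaustively for 8-bit operands, using unsigned arithmetic so that wraparound is well defined:

```cpp
#include <cstdint>

// Returns true if (x & y) - (x | y) - 1 == ~(x ^ y) for every pair of
// 8-bit operands.  The arithmetic is done in unsigned int, which wraps
// modulo 2^N, and both sides are then truncated to 8 bits, so they are
// compared modulo 256.
bool pr94882_identity_holds_u8()
{
  for (unsigned x = 0; x < 256; ++x)
    for (unsigned y = 0; y < 256; ++y)
      {
        const std::uint8_t lhs
          = static_cast<std::uint8_t>((x & y) - (x | y) - 1u);
        const std::uint8_t rhs = static_cast<std::uint8_t>(~(x ^ y));
        if (lhs != rhs)
          return false;
      }
  return true;
}
```

For example, with x = 12 and y = 10: (x & y) - (x | y) - 1 = 8 - 14 - 1 = -7, and ~(x ^ y) = ~6 = -7.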


2.patch


Re: [PATCH PR95254] aarch64: gcc generate inefficient code with fixed sve vector length

2020-05-28 Thread Hongtao Liu via Gcc-patches
> > > 21668   UNSPEC_VCVTPS2PH))]
> > > 21669   "TARGET_AVX512F"
> > > 21670   "vcvtps2ph\t{%2, %1, %0|%0, %1, %2}"
> > > 21671   [(set_attr "type" "ssecvt")
> > > 21672(set_attr "prefix" "evex")
> > > 21673(set_attr "mode" "V16SF")])
> > >
> > > How can that happen?
> >
> > This is due to define_subst magic.  The generators automatically
> > create a vec_merge form of the instruction based on the information
> > in the  attributes.
> >
> > AFAICT the rtl above is for the line-125 instruction, which looks ok.
> > The problem is the line-126 instruction, since vcvtps2ph doesn't
> > AIUI allow zero masking.
> >

Zero masking is not allowed for mem_operand here, but it is available for
register_operand.
There's something wrong in the pattern; we need to fix it.
(define_insn "avx512f_vcvtps2ph512"


> > The "mask" define_subst allows both zeroing and merging,
> > so I guess this means that the pattern should either be using
> > a different define_subst, or should be enforcing merging in
> > some other way.  Please could one of the x86 devs take a look?
> >
>
> Hongtao, can you take a look?
>
> Thanks.
>
>
> --
> H.J.

BTW, I failed to build GCC after applying pr95254-v4.txt.

gcc configure:

Using built-in specs.
COLLECT_GCC=./gcc/xgcc
Target: x86_64-pc-linux-gnu
Configured with: ../../gcc/gnu-toolchain/gcc/configure
--enable-languages=c,c++,fortran --disable-bootstrap
Thread model: posix
Supported LTO compression algorithms: zlib
gcc version 11.0.0 20200528 (experimental) (GCC)

host on x86_64 rel8.

error message:

during RTL pass: expand
../../../../../gcc/gnu-toolchain/gcc/libgcc/config/libbid/_eq_dd.c: In
function ‘__bid_eqdd2’:
../../../../../gcc/gnu-toolchain/gcc/libgcc/config/libbid/_eq_dd.c:35:9:
internal compiler error: in emit_single_push_insn, at expr.c:4405
   35 |   res = __bid64_quiet_equal (ux.i, uy.i);
  | ^~~~
during RTL pass: expand
../../../../../gcc/gnu-toolchain/gcc/libgcc/config/libbid/_gt_dd.c: In
function ‘__bid_gtdd2’:
../../../../../gcc/gnu-toolchain/gcc/libgcc/config/libbid/_gt_dd.c:35:9:
internal compiler error: in emit_single_push_insn, at expr.c:4405
   35 |   res = __bid64_quiet_greater (ux.i, uy.i);
  | ^~
during RTL pass: expand
../../../../../gcc/gnu-toolchain/gcc/libgcc/config/libbid/_dd_to_di.c:
In function ‘__bid_fixdddi’:
../../../../../gcc/gnu-toolchain/gcc/libgcc/config/libbid/_dd_to_di.c:34:9:
internal compiler error: in emit_single_push_insn, at expr.c:4405
   34 |   res = __bid64_to_int64_xint (ux.i);
  | ^~~~
during RTL pass: expand
../../../../../gcc/gnu-toolchain/gcc/libgcc/config/libbid/_ge_dd.c: In
function ‘__bid_gedd2’:
../../../../../gcc/gnu-toolchain/gcc/libgcc/config/libbid/_ge_dd.c:35:9:
internal compiler error: in emit_single_push_insn, at expr.c:4405
   35 |   res = __bid64_quiet_greater_equal (ux.i, uy.i);
  | ^~~~
during RTL pass: expand
../../../../../gcc/gnu-toolchain/gcc/libgcc/config/libbid/_le_dd.c: In
function ‘__bid_ledd2’:
../../../../../gcc/gnu-toolchain/gcc/libgcc/config/libbid/_le_dd.c:35:9:
internal compiler error: in emit_single_push_insn, at expr.c:4405
   35 |   res = __bid64_quiet_less_equal (ux.i, uy.i);
  | ^

--
BR,
Hongtao


Re: [PATCH 1/2] Separate -funroll-loops for GIMPLE unroller and RTL unroller

2020-05-28 Thread Jiufu Guo via Gcc-patches
Segher Boessenkool  writes:

> Hi!
>
> On Thu, May 28, 2020 at 04:22:16PM +0200, Richard Biener wrote:
>> For GIMPLE level transforms I don't think targets have more knowledge
>> than the middle-end.
>
> Yes, certainly.
>
>> In fact GIMPLE complete unrolling is about
>> secondary effects, removing redundancies and abstraction.  So IMHO
>> the correct approach is to look at individual cases and try to improve
>> the generic code
>
> Yep.
>
>> rather than try to get better benchmark results
>> on a per-target manner by magical parameter tuning.
>
> I'm no fan of that for target-specific code either.  It's fine to be led
> by benchmarks, but usually a better justification is needed.

Thanks all.
Agreed, we had better tune it in the generic code.

Jiufu

>
>> For what the RTL unroller does it indeed depends very heavily on
>> the target whether sth is beneficial or not.
>
> Yes :-(  And this means it will need to remain late in the pass
> pipeline,  or at least the decision needs to use target information
> (just like what ivopts does).
>
>> So I'd like to see specific cases where you think cunroll should
>> do "better" on powerpc only but not elsewhere.
>
> It is probably not a good idea in general to unroll 14 times, yes :-)
>
>
> Segher


Re: [PATCH 1/2] Introduce flag_cunroll_grow_size for cunroll

2020-05-28 Thread Jiufu Guo via Gcc-patches
Richard Biener  writes:

> On Thu, May 28, 2020 at 4:37 PM Jiufu Guo  wrote:
>>
>> Richard Biener  writes:
>>
>> > On Thu, May 28, 2020 at 10:52 AM guojiufu  wrote:
>> >>
>> >> From: Jiufu Guo 
>> >>
>> >> Currently the GIMPLE complete unroller (cunroll) checks
>> >> flag_unroll_loops and flag_peel_loops to decide whether to allow size
>> >> growth.  Besides affecting cunroll, flag_unroll_loops also controls the
>> >> RTL unroller.  To have more freedom to control cunroll and the RTL
>> >> unroller, this patch introduces flag_cunroll_grow_size.  With this
>> >> patch, we can control cunroll and the RTL unroller independently.
>> >>
>> >> Bootstrap/regtest pass on powerpc64le. OK for trunk? And backport to
>> >> GCC10 after a week?
>> >>
>> >>
>> >> +funroll-completely-grow-size
>> >> +Var(flag_cunroll_grow_size) Init(2)
>> >> +; Control cunroll to allow size growth during complete unrolling
>> >> +
>> >
>> > So this really adds a new compiler option which would need
>> > documenting.
>> I added 'Undocumented' (to keep it out of --help) and did not add
>> 'Common' (to keep it out of --help=common).  What I want is to avoid
>> exposing this to users, although it is still an option, as you said.
>>
>> >
>> > I fear we'll get into bikeshed territory here as well...  I originally thought
>> > we can use
>> >
>> > Variable
>> > int flag_cunroll_grow_size;
>>
>> Thanks, this is definitely a variable rather than an option. I will try
>> this way.
>> >
>> > but now realize that does not work well with LTO without adjusting
>> > the awk scripts to generate option saving/restoring.  For your patch
>> > you'd need to add 'Optimization' to get the flag streamed properly,
>> > you should also verify the target adjustment done in the backend
>> > is reflected in LTO mode.
>>
>> Here, an internal option is relatively simple, and 'Optimization' could
>> help.  When trying 'Variable', I will verify it in LTO mode.
>
> It won't work without adjusting the awk scripts.  So go with
>
> funroll-completely-grow-size
> Undocumented Optimization Var(flag_cunroll_grow_size)
> EnabledBy(funroll-loops || fpeel-loops)
> ; ...
>
EnabledBy(funroll-loops || fpeel-loops) does not work as expected:
"-funroll-loops -fno-peel-loops" turns off flag_cunroll_grow_size.

With "EnabledBy", the flag can be turned on, and also turned off, by the
"EnabledBy" options, as long as the flag itself is not specified on the
command line.

> and enable it at O3+.  AUTODETECT_VALUE doesn't make sense for
> an option not supposed to be set by users?
>

global_options_set.x_flagxxx can be used to check whether an option was
set by the user.  But it does not work well here either, because we also
care whether the flag is overridden by OPTION_OPTIMIZATION_TABLE or
OPTION_OVERRIDE.

AUTODETECT_VALUE (value 2) is used for some flags, such as flag_web,
flag_rename_registers, flag_var_tracking, flag_tree_cselim...
This approach can be used to check whether the flag is effective (on/off),
whether it was set explicitly on the command line or implicitly through
OPTION_OVERRIDE or OPTION_OPTIMIZATION_TABLE.
So, I use it here.
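The difference between the two mechanisms can be modelled in a few lines. The following C++ sketch is a simplified, hypothetical model: the namespace, types and functions are invented for illustration and are not GCC's generated options code. Under EnabledBy, as described above, each -f[no-]unroll-loops or -f[no-]peel-loops overwrites the dependent flag; with AUTODETECT_VALUE, the flag is derived once, after all options are known.

```cpp
#include <vector>

// Hypothetical, simplified model of how flag_cunroll_grow_size gets its
// value; this is not GCC's generated options code.
namespace cunroll_model {

constexpr int AUTODETECT_VALUE = 2;  // "Not decided yet", as with Init(2).

enum class opt { unroll_loops, peel_loops };

// One -f[no-]unroll-loops or -f[no-]peel-loops on the command line.
struct opt_event
{
  opt which;
  int value;  // 1 for -fX, 0 for -fno-X.
};

// EnabledBy(funroll-loops || fpeel-loops), as described in the thread:
// each enabling option, when processed, copies its own value into the
// dependent flag (assuming the dependent flag is never set directly),
// so the last enabling option on the command line wins.
int resolve_enabled_by(const std::vector<opt_event>& cmdline)
{
  int grow = 0;
  for (const opt_event& e : cmdline)
    grow = e.value;
  return grow;
}

// AUTODETECT_VALUE approach: record the final values of the enabling
// flags first, then derive the dependent flag once, but only if nothing
// (command line, target override, optimization table) already set it.
int resolve_autodetect(const std::vector<opt_event>& cmdline,
                       int initial = AUTODETECT_VALUE)
{
  int unroll = 0;
  int peel = 0;
  for (const opt_event& e : cmdline)
    {
      if (e.which == opt::unroll_loops)
        unroll = e.value;
      else
        peel = e.value;
    }
  int grow = initial;
  if (grow == AUTODETECT_VALUE)
    grow = (unroll || peel) ? 1 : 0;
  return grow;
}

}  // namespace cunroll_model
```

With "-funroll-loops -fno-peel-loops", the EnabledBy model ends with the flag off, which is the surprising behaviour reported above, while the AUTODETECT_VALUE model keeps it on because -funroll-loops is still in effect.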

Thanks again!
Jiufu

>
>> >
>> >>  ; Nonzero means that loop optimizer may assume that the induction 
>> >> variables
>> >>
>> >> +  /* Allow cunroll to grow size accordingly.  */
>> >> +  if (flag_cunroll_grow_size == AUTODETECT_VALUE)
>> >> +flag_cunroll_grow_size = flag_unroll_loops || flag_peel_loops;
>> >> +
>> >
>> > Any reason to not use EnabledBy(funroll-loops || fpeel-loops)?
>>
>> By testing and checking the generated code (e.g. options.c), I found that
>> this setting has some unexpected behavior.
>> For example, "-funroll-loops -fno-peel-loops" turns off the flag:
>> "||" means the flag will be turned _on/off_ by -f[no-]unroll-loops or
>> -f[no-]peel-loops.
>>
>> >
>> >>/* web and rename-registers help when run after loop unrolling.  */
>> >>if (flag_web == AUTODETECT_VALUE)
>> >>  flag_web = flag_unroll_loops;
>>
>> >> -  unsigned int val = tree_unroll_loops_completely (flag_unroll_loops
>> >> -  || flag_peel_loops
>> >> +  unsigned int val = tree_unroll_loops_completely (flag_cunroll_grow_size
>> >>|| optimize >= 3, 
>> >> true);
>> >
>> > Given we check optimize >= 3 here please enable the flag by default
>> > at O3+ via opts.c:default_options_table and also elide the optimize >= 3
>> > check.  That way -fno-unroll-completely-grow-size would have the desired effect.
>> >
>> Actually, flag_peel_loops is already enabled at -O3 and above, so
>> "|| optimize >= 3" could be removed.  As you said, this allows the
>> negative form to take effect even at -O3.
>
> You are right.
>
>> > Now back to the option name ... if we expose the option we should apply
>> > some forward looking.  Currently cunroll cannot be disabled or enabled
>> > with a flag and the desired new flag simply tunes one knob on it.  How
>> > about adding
>> >
>> > -fcomplete-unroll-loops[=may-grow]
>> -fcomplete-unroll-loops[=may-grow|inner|outer]
>> >
>> > 

[PATCH v5 4/8] libstdc++ atomic_futex: Use std::chrono::steady_clock as reference clock

2020-05-28 Thread Mike Crowe via Gcc-patches
The user-visible effect of this change is that std::future::wait_for now
uses std::chrono::steady_clock to determine the timeout.  This makes it
immune to changes made to the system clock.  It also means that anyone
using their own clock types with std::future::wait_until will have the
timeout converted to std::chrono::steady_clock rather than
std::chrono::system_clock.

Now that use of both std::chrono::steady_clock and
std::chrono::system_clock is correctly supported for the wait timeout, I
believe that std::chrono::steady_clock is a better choice for the reference
clock that all other clocks are converted to since it is guaranteed to
advance steadily.  The previous behaviour of converting to
std::chrono::system_clock risks timeouts changing dramatically when the
system clock is changed.
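The conversion that the generic _M_load_when_equal_until overload performs (sample the caller's clock and the reference clock once, then carry the remaining duration over) can be sketched as follows. This is a hypothetical stand-alone helper, not libstdc++ code. With RefClock set to std::chrono::steady_clock, a later change to the system clock cannot move the converted deadline, which is the property argued for above.

```cpp
#include <chrono>

// Hypothetical helper (not the libstdc++ implementation): translate a
// deadline on an arbitrary clock into a deadline on a reference clock.
template<typename RefClock, typename Clock, typename Dur>
typename RefClock::time_point
to_reference_deadline(const std::chrono::time_point<Clock, Dur>& atime)
{
  const auto c_entry = Clock::now();     // "now" on the caller's clock
  const auto s_entry = RefClock::now();  // "now" on the reference clock
  const auto delta = atime - c_entry;    // time remaining until the deadline
  return s_entry
         + std::chrono::duration_cast<typename RefClock::duration>(delta);
}
```

For instance, a deadline of std::chrono::system_clock::now() + 2s becomes a steady_clock deadline roughly two seconds ahead of steady_clock::now().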

* libstdc++-v3/include/bits/atomic_futex.h:
(__atomic_futex_unsigned): Change __clock_t typedef to use
steady_clock so that unknown clocks are synced to it rather than
system_clock. Change existing __clock_t overloads of
_M_load_and_test_until_impl and _M_load_when_equal_until to use
system_clock explicitly. Remove comment about DR 887 since these
changes address that problem as well as we currently can.
---
 libstdc++-v3/include/bits/atomic_futex.h | 7 +++
 1 file changed, 3 insertions(+), 4 deletions(-)

diff --git a/libstdc++-v3/include/bits/atomic_futex.h 
b/libstdc++-v3/include/bits/atomic_futex.h
index 507c5c9..4375129 100644
--- a/libstdc++-v3/include/bits/atomic_futex.h
+++ b/libstdc++-v3/include/bits/atomic_futex.h
@@ -71,7 +71,7 @@ _GLIBCXX_BEGIN_NAMESPACE_VERSION
   template 
   class __atomic_futex_unsigned : __atomic_futex_unsigned_base
   {
-typedef chrono::system_clock __clock_t;
+typedef chrono::steady_clock __clock_t;
 
 // This must be lock-free and at offset 0.
 atomic _M_data;
@@ -169,7 +169,7 @@ _GLIBCXX_BEGIN_NAMESPACE_VERSION
 unsigned
 _M_load_and_test_until_impl(unsigned __assumed, unsigned __operand,
bool __equal, memory_order __mo,
-   const chrono::time_point<__clock_t, _Dur>& __atime)
+   const chrono::time_point& __atime)
 {
   auto __s = chrono::time_point_cast(__atime);
   auto __ns = chrono::duration_cast(__atime - __s);
@@ -229,7 +229,6 @@ _GLIBCXX_BEGIN_NAMESPACE_VERSION
   _M_load_when_equal_until(unsigned __val, memory_order __mo,
  const chrono::time_point<_Clock, _Duration>& __atime)
   {
-   // DR 887 - Sync unknown clock to known clock.
const typename _Clock::time_point __c_entry = _Clock::now();
const __clock_t::time_point __s_entry = __clock_t::now();
const auto __delta = __atime - __c_entry;
@@ -241,7 +240,7 @@ _GLIBCXX_BEGIN_NAMESPACE_VERSION
 template
 _GLIBCXX_ALWAYS_INLINE bool
 _M_load_when_equal_until(unsigned __val, memory_order __mo,
-   const chrono::time_point<__clock_t, _Duration>& __atime)
+   const chrono::time_point& __atime)
 {
   unsigned __i = _M_load(__mo);
   if ((__i & ~_Waiter_bit) == __val)
-- 
git-series 0.9.1


[PATCH v5 5/8] libstdc++ futex: Loop when waiting against arbitrary clock

2020-05-28 Thread Mike Crowe via Gcc-patches
If std::future::wait_until is passed a time point measured against a clock
that is neither std::chrono::steady_clock nor std::chrono::system_clock
then the generic implementation of
__atomic_futex_unsigned::_M_load_when_equal_until is called which
calculates the timeout based on __clock_t and calls the
_M_load_when_equal_until method for that clock to perform the actual wait.

There's no guarantee that __clock_t is running at the same speed as the
caller's clock, so if the underlying wait times out we need to
check the timeout against the caller's clock again before potentially
looping.
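The retry loop described above can be exercised deterministically with fake clocks. In the following C++ sketch, wait_until_any_clock, fake_ref_clock and fake_slow_clock are hypothetical names. The wait_ref callback stands in for the wait against __clock_t, and in the test the fake wait simply jumps the reference clock to the requested deadline and reports a timeout. The loop mirrors the structure of the patched _M_load_when_equal_until, but it is a sketch, not the libstdc++ code.

```cpp
#include <chrono>
#include <ratio>

// wait_ref(deadline) returns true if the awaited condition became true
// and false on a timeout measured against RefClock.  Because RefClock and
// Clock may tick at different rates, a RefClock timeout is only trusted
// once Clock itself says the deadline has passed.
template<typename RefClock, typename Clock, typename Dur, typename WaitRef>
bool wait_until_any_clock(const std::chrono::time_point<Clock, Dur>& atime,
                          WaitRef wait_ref)
{
  auto c_entry = Clock::now();
  do
    {
      const auto s_entry = RefClock::now();
      const auto s_atime = s_entry + (atime - c_entry);
      if (wait_ref(s_atime))
        return true;
      c_entry = Clock::now();  // Re-check against the caller's clock.
    }
  while (c_entry < atime);
  return false;
}

// Deterministic reference clock: time only moves when ticks() is changed.
struct fake_ref_clock
{
  using rep = long long;
  using period = std::milli;
  using duration = std::chrono::duration<rep, period>;
  using time_point = std::chrono::time_point<fake_ref_clock, duration>;
  static constexpr bool is_steady = true;
  static rep& ticks() { static rep t = 0; return t; }
  static time_point now() { return time_point(duration(ticks())); }
};

// A clock that runs at half the speed of fake_ref_clock.
struct fake_slow_clock
{
  using rep = long long;
  using period = std::milli;
  using duration = std::chrono::duration<rep, period>;
  using time_point = std::chrono::time_point<fake_slow_clock, duration>;
  static constexpr bool is_steady = true;
  static time_point now()
  { return time_point(duration(fake_ref_clock::ticks() / 2)); }
};
```

With the slow clock running at half speed, the first reference-clock timeout arrives when only half of the caller's timeout has elapsed on the caller's clock, so the loop recomputes the remaining time and waits again until fake_slow_clock itself reaches the deadline.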

Also add two extra tests to the testsuite's async.cc:

* run test03 with steady_clock_copy, which behaves identically to
  std::chrono::steady_clock, but isn't std::chrono::steady_clock. This
  causes the overload of __atomic_futex_unsigned::_M_load_when_equal_until
  that takes an arbitrary clock to be called.

* invent test04 which uses a deliberately slow running clock in order to
  exercise the looping behaviour of
  __atomic_futex_unsigned::_M_load_when_equal_until described above.

* libstdc++-v3/include/bits/atomic_futex.h:
(__atomic_futex_unsigned): Add loop to _M_load_when_equal_until on
generic _Clock to check the timeout against _Clock again after
_M_load_when_equal_until returns indicating a timeout.

* libstdc++-v3/testsuite/30_threads/async/async.cc: Invent
slow_clock that runs at an eleventh of steady_clock's speed. Use it
to test that the user-supplied-clock variant of
__atomic_futex_unsigned::_M_load_when_equal_until works generally
with test03 and loops correctly when the timeout time hasn't been
reached in test04.
---
 libstdc++-v3/include/bits/atomic_futex.h | 15 ++--
 libstdc++-v3/testsuite/30_threads/async/async.cc | 70 +-
 2 files changed, 80 insertions(+), 5 deletions(-)

diff --git a/libstdc++-v3/include/bits/atomic_futex.h 
b/libstdc++-v3/include/bits/atomic_futex.h
index 4375129..5f95ade 100644
--- a/libstdc++-v3/include/bits/atomic_futex.h
+++ b/libstdc++-v3/include/bits/atomic_futex.h
@@ -229,11 +229,16 @@ _GLIBCXX_BEGIN_NAMESPACE_VERSION
   _M_load_when_equal_until(unsigned __val, memory_order __mo,
  const chrono::time_point<_Clock, _Duration>& __atime)
   {
-   const typename _Clock::time_point __c_entry = _Clock::now();
-   const __clock_t::time_point __s_entry = __clock_t::now();
-   const auto __delta = __atime - __c_entry;
-   const auto __s_atime = __s_entry + __delta;
-   return _M_load_when_equal_until(__val, __mo, __s_atime);
+   typename _Clock::time_point __c_entry = _Clock::now();
+   do {
+ const __clock_t::time_point __s_entry = __clock_t::now();
+ const auto __delta = __atime - __c_entry;
+ const auto __s_atime = __s_entry + __delta;
+ if (_M_load_when_equal_until(__val, __mo, __s_atime))
+   return true;
+ __c_entry = _Clock::now();
+   } while (__c_entry < __atime);
+   return false;
   }
 
 // Returns false iff a timeout occurred.
diff --git a/libstdc++-v3/testsuite/30_threads/async/async.cc 
b/libstdc++-v3/testsuite/30_threads/async/async.cc
index 84d94cf..ee117f4 100644
--- a/libstdc++-v3/testsuite/30_threads/async/async.cc
+++ b/libstdc++-v3/testsuite/30_threads/async/async.cc
@@ -63,6 +63,24 @@ void test02()
   VERIFY( status == std::future_status::ready );
 }
 
+// This clock behaves exactly the same as steady_clock, but it is not
+// steady_clock which means that the generic clock overload of
+// future::wait_until is used.
+struct steady_clock_copy
+{
+  using rep = std::chrono::steady_clock::rep;
+  using period = std::chrono::steady_clock::period;
+  using duration = std::chrono::steady_clock::duration;
+  using time_point = std::chrono::time_point;
+  static constexpr bool is_steady = true;
+
+  static time_point now()
+  {
+const auto steady = std::chrono::steady_clock::now();
+return time_point{steady.time_since_epoch()};
+  }
+};
+
 // This test is prone to failures if run on a loaded machine where the
 // kernel decides not to schedule us for several seconds. It also
 // assumes that no-one will warp CLOCK whilst the test is
@@ -90,11 +108,63 @@ void test03()
   VERIFY( elapsed < std::chrono::seconds(5) );
 }
 
+// This clock is supposed to run at a tenth of normal speed, but we
+// don't have to worry about rounding errors causing us to wake up
+// slightly too early below if we actually run it at an eleventh of
+// normal speed. It is used to exercise the
+// __atomic_futex_unsigned::_M_load_when_equal_until overload that
+// takes an arbitrary clock.
+struct slow_clock
+{
+  using rep = std::chrono::steady_clock::rep;
+  using period = std::chrono::steady_clock::period;
+  using duration = std::chrono::steady_clock::duration;
+  using time_point = std::chrono::time_point;
+  static constexpr bool is_steady = true;
+
+  static time_point now(

[PATCH v5 8/8] libstdc++: Extra async tests, not for merging

2020-05-28 Thread Mike Crowe via Gcc-patches
These tests show that changing the system clock has an effect on
std::future::wait_until when using std::chrono::system_clock but not when
using std::chrono::steady_clock.  Unfortunately these tests have a number
of downsides:

1. Nothing that is attempting to keep the clock set correctly (ntpd,
   systemd-timesyncd) can be running at the same time.

2. The test process requires the CAP_SYS_TIME capability (although, as
   written, it checks for being root).

3. Other processes running concurrently may misbehave when the clock darts
   back and forth.

4. They are slow to run.

As such, I don't think they are suitable for merging. I include them here
because I wanted to document how I had tested the changes in the previous
commits.
---
 libstdc++-v3/testsuite/30_threads/async/async.cc | 70 +-
 1 file changed, 70 insertions(+)

diff --git a/libstdc++-v3/testsuite/30_threads/async/async.cc 
b/libstdc++-v3/testsuite/30_threads/async/async.cc
index f697292..8b44810 100644
--- a/libstdc++-v3/testsuite/30_threads/async/async.cc
+++ b/libstdc++-v3/testsuite/30_threads/async/async.cc
@@ -24,6 +24,7 @@
 
 #include 
 #include 
+#include 
 
 using namespace std;
 
@@ -172,6 +173,71 @@ void test_pr91486()
   VERIFY( elapsed_steady >= std::chrono::seconds(1) );
 }
 
+void perturb_system_clock(const std::chrono::seconds &seconds)
+{
+  struct timeval tv;
+  if (gettimeofday(&tv, NULL))
+abort();
+
+  tv.tv_sec += seconds.count();
+  if (settimeofday(&tv, NULL))
+abort();
+}
+
+// Ensure that advancing CLOCK_REALTIME doesn't make any difference
+// when we're waiting on std::chrono::steady_clock.
+void test05()
+{
+  auto const start = chrono::steady_clock::now();
+  future f1 = async(launch::async, []() {
+  std::this_thread::sleep_for(std::chrono::seconds(10));
+});
+
+  perturb_system_clock(chrono::seconds(20));
+
+  std::future_status status;
+  status = f1.wait_for(std::chrono::seconds(4));
+  VERIFY( status == std::future_status::timeout );
+
+  status = f1.wait_until(start + std::chrono::seconds(6));
+  VERIFY( status == std::future_status::timeout );
+
+  status = f1.wait_until(start + std::chrono::seconds(12));
+  VERIFY( status == std::future_status::ready );
+
+  auto const elapsed = chrono::steady_clock::now() - start;
+  VERIFY( elapsed >= std::chrono::seconds(10) );
+  VERIFY( elapsed < std::chrono::seconds(15) );
+
+  perturb_system_clock(chrono::seconds(-20));
+}
+
+// Ensure that advancing CLOCK_REALTIME does make a difference when
+// we're waiting on std::chrono::system_clock.
+void test06()
+{
+  auto const start = chrono::system_clock::now();
+  auto const start_steady = chrono::steady_clock::now();
+
+  future f1 = async(launch::async, []() {
+  std::this_thread::sleep_for(std::chrono::seconds(5));
+  perturb_system_clock(chrono::seconds(60));
+  std::this_thread::sleep_for(std::chrono::seconds(5));
+});
+  future_status status;
+  status = f1.wait_until(start + std::chrono::seconds(60));
+  VERIFY( status == std::future_status::timeout );
+
+  auto const elapsed_steady = chrono::steady_clock::now() - start_steady;
+  VERIFY( elapsed_steady >= std::chrono::seconds(5) );
+  VERIFY( elapsed_steady < std::chrono::seconds(10) );
+
+  status = f1.wait_until(start + std::chrono::seconds(75));
+  VERIFY( status == std::future_status::ready );
+
+  perturb_system_clock(chrono::seconds(-60));
+}
+
 int main()
 {
   test01();
@@ -181,5 +247,9 @@ int main()
   test03();
   test04();
   test_pr91486();
+  if (geteuid() == 0) {
+test05();
+test06();
+  }
   return 0;
 }
-- 
git-series 0.9.1


[PATCH v5 1/8] libstdc++: Improve async test

2020-05-28 Thread Mike Crowe via Gcc-patches
Add tests for waiting for the future using both std::chrono::steady_clock
and std::chrono::system_clock in preparation for dealing with those clocks
properly in futex.cc.

 * libstdc++-v3/testsuite/30_threads/async/async.cc (test02): Test
 steady_clock with std::future::wait_until.  (test03): Add new test
 templated on clock type waiting for future associated with async
 to resolve.  (main): Call test03 to test both system_clock and
 steady_clock.
---
 libstdc++-v3/testsuite/30_threads/async/async.cc | 33 +-
 1 file changed, 33 insertions(+)

diff --git a/libstdc++-v3/testsuite/30_threads/async/async.cc 
b/libstdc++-v3/testsuite/30_threads/async/async.cc
index 7fa9b03..84d94cf 100644
--- a/libstdc++-v3/testsuite/30_threads/async/async.cc
+++ b/libstdc++-v3/testsuite/30_threads/async/async.cc
@@ -51,17 +51,50 @@ void test02()
   VERIFY( status == std::future_status::timeout );
   status = f1.wait_until(std::chrono::system_clock::now());
   VERIFY( status == std::future_status::timeout );
+  status = f1.wait_until(std::chrono::steady_clock::now());
+  VERIFY( status == std::future_status::timeout );
   l.unlock();  // allow async thread to proceed
   f1.wait();   // wait for it to finish
   status = f1.wait_for(std::chrono::milliseconds(0));
   VERIFY( status == std::future_status::ready );
   status = f1.wait_until(std::chrono::system_clock::now());
   VERIFY( status == std::future_status::ready );
+  status = f1.wait_until(std::chrono::steady_clock::now());
+  VERIFY( status == std::future_status::ready );
+}
+
+// This test is prone to failures if run on a loaded machine where the
+// kernel decides not to schedule us for several seconds. It also
+// assumes that no-one will warp CLOCK whilst the test is
+// running when CLOCK is std::chrono::system_clock.
+template
+void test03()
+{
+  auto const start = CLOCK::now();
+  future f1 = async(launch::async, []() {
+  std::this_thread::sleep_for(std::chrono::seconds(2));
+});
+  std::future_status status;
+
+  status = f1.wait_for(std::chrono::milliseconds(500));
+  VERIFY( status == std::future_status::timeout );
+
+  status = f1.wait_until(start + std::chrono::seconds(1));
+  VERIFY( status == std::future_status::timeout );
+
+  status = f1.wait_until(start + std::chrono::seconds(5));
+  VERIFY( status == std::future_status::ready );
+
+  auto const elapsed = CLOCK::now() - start;
+  VERIFY( elapsed >= std::chrono::seconds(2) );
+  VERIFY( elapsed < std::chrono::seconds(5) );
 }
 
 int main()
 {
   test01();
   test02();
+  test03();
+  test03();
   return 0;
 }
-- 
git-series 0.9.1


[PATCH v5 3/8] libstdc++ futex: Support waiting on std::chrono::steady_clock directly

2020-05-28 Thread Mike Crowe via Gcc-patches
The user-visible effect of this change is for std::future::wait_until to
use CLOCK_MONOTONIC when passed a timeout of std::chrono::steady_clock
type.  This makes it immune to any changes made to the system clock
CLOCK_REALTIME.

Add an overload of __atomic_futex_unsigned::_M_load_and_test_until_impl
that accepts a std::chrono::steady_clock, and correctly passes this through
to __atomic_futex_unsigned_base::_M_futex_wait_until_steady which uses
CLOCK_MONOTONIC for the timeout within the futex system call.  These
functions are mostly just copies of the std::chrono::system_clock versions
with small tweaks.

Prior to this commit, a std::chrono::steady_clock timeout would be converted via
std::chrono::system_clock which risks reducing or increasing the timeout if
someone changes CLOCK_REALTIME whilst the wait is happening.  (The commit
immediately prior to this one increases the window of opportunity for that
from a short period during the calculation of a relative timeout, to the
entire duration of the wait.)

FUTEX_WAIT_BITSET was added in kernel v2.6.25.  If futex reports ENOSYS to
indicate that this operation is not supported then the code falls back to
using clock_gettime(2) to calculate a relative time to wait for.
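The FUTEX_WAIT_BITSET path and its ENOSYS fallback can be sketched with the raw futex system call. This is a simplified, hypothetical helper (the name futex_wait_until_monotonic is invented) that assumes a 64-bit Linux target. Unlike the patch, it does not remember that FUTEX_WAIT_BITSET was unavailable (futex.cc stores that in futex_clock_monotonic_unavailable), so every call retries the bitset operation first.

```cpp
#include <cerrno>
#include <ctime>
#include <linux/futex.h>
#include <sys/syscall.h>
#include <unistd.h>

// Wait while *addr == expected, until an absolute CLOCK_MONOTONIC deadline.
// Returns false iff the deadline passed.  true means the word did not
// match, we were woken, or the wait was interrupted; callers re-check.
bool futex_wait_until_monotonic(int* addr, int expected,
                                const timespec& abs_deadline)
{
  // FUTEX_WAIT_BITSET takes an absolute timeout; without
  // FUTEX_CLOCK_REALTIME it is measured against CLOCK_MONOTONIC.
  long r = syscall(SYS_futex, addr, FUTEX_WAIT_BITSET, expected,
                   &abs_deadline, nullptr, FUTEX_BITSET_MATCH_ANY);
  if (r == -1 && errno == ENOSYS)
    {
      // Kernels before 2.6.25: plain FUTEX_WAIT only takes a relative
      // timeout, so compute one from clock_gettime.
      timespec now;
      clock_gettime(CLOCK_MONOTONIC, &now);
      timespec rel;
      rel.tv_sec = abs_deadline.tv_sec - now.tv_sec;
      rel.tv_nsec = abs_deadline.tv_nsec - now.tv_nsec;
      if (rel.tv_nsec < 0)
        {
          rel.tv_nsec += 1000000000L;
          --rel.tv_sec;
        }
      if (rel.tv_sec < 0)
        return false;  // Deadline already passed.
      r = syscall(SYS_futex, addr, FUTEX_WAIT, expected, &rel);
    }
  return !(r == -1 && errno == ETIMEDOUT);
}
```

Because the timeout is absolute on CLOCK_MONOTONIC in the common path, a change to CLOCK_REALTIME during the wait has no effect on when it expires.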

I believe that I've added this functionality in a way that it doesn't break
ABI compatibility, but that has made it more verbose and less type safe.  I
believe that it would be better to maintain the timeout as an instance of
the correct clock type all the way down to a single _M_futex_wait_until
function with an overload for each clock.  The current scheme of separating
out the seconds and nanoseconds early risks accidentally calling the wait
function for the wrong clock.  Unfortunately, doing this would break code
that compiled against the old header.

* libstdc++-v3/config/abi/pre/gnu.ver: Update for addition of
  __atomic_futex_unsigned_base::_M_futex_wait_until_steady.

* libstdc++-v3/include/bits/atomic_futex.h
  (__atomic_futex_unsigned_base): Add comments to clarify that
  _M_futex_wait_until and _M_load_and_test_until use CLOCK_REALTIME.
  Declare new _M_futex_wait_until_steady and
  _M_load_and_test_until_steady methods that use CLOCK_MONOTONIC.
  Add _M_load_and_test_until_impl and _M_load_when_equal_until
  overloads that accept a steady_clock time_point and use these new
  methods.

* libstdc++-v3/src/c++11/futex.cc: Include headers required for
  clock_gettime.  Add futex_clock_monotonic_flag constant to tell
  futex to use CLOCK_MONOTONIC to match the existing
  futex_clock_realtime_flag.  Add futex_clock_monotonic_unavailable
  to store the result of trying to use CLOCK_MONOTONIC.
  (__atomic_futex_unsigned_base::_M_futex_wait_until_steady):
  Add new variant of _M_futex_wait_until that uses CLOCK_MONOTONIC to
  support waiting using steady_clock.
---
 libstdc++-v3/config/abi/pre/gnu.ver  | 10 +--
 libstdc++-v3/include/bits/atomic_futex.h | 67 +++-
 libstdc++-v3/src/c++11/futex.cc  | 82 +-
 3 files changed, 154 insertions(+), 5 deletions(-)

diff --git a/libstdc++-v3/config/abi/pre/gnu.ver b/libstdc++-v3/config/abi/pre/gnu.ver
index edf4485..3d734d7 100644
--- a/libstdc++-v3/config/abi/pre/gnu.ver
+++ b/libstdc++-v3/config/abi/pre/gnu.ver
@@ -1916,10 +1916,9 @@ GLIBCXX_3.4.21 {
 _ZNSt7codecvtID[is]c*;
 _ZT[ISV]St7codecvtID[is]c*E;
 
-extern "C++"
-{
-  std::__atomic_futex_unsigned_base*;
-};
+# std::__atomic_futex_unsigned_base members
+_ZNSt28__atomic_futex_unsigned_base19_M_futex_notify_all*;
+_ZNSt28__atomic_futex_unsigned_base19_M_futex_wait_until*;
 
 # codecvt_utf8 etc.
 _ZNKSt19__codecvt_utf8_base*;
@@ -2297,6 +2296,9 @@ GLIBCXX_3.4.28 {
 _ZNSt3pmr25monotonic_buffer_resourceD[0125]Ev;
 _ZT[ISV]NSt3pmr25monotonic_buffer_resourceE;
 
+# std::__atomic_futex_unsigned_base::_M_futex_wait_until_steady
+_ZNSt28__atomic_futex_unsigned_base26_M_futex_wait_until_steady*;
+
 } GLIBCXX_3.4.27;
 
 # Symbols in the support library (libsupc++) have their own tag.
diff --git a/libstdc++-v3/include/bits/atomic_futex.h b/libstdc++-v3/include/bits/atomic_futex.h
index 886fc63..507c5c9 100644
--- a/libstdc++-v3/include/bits/atomic_futex.h
+++ b/libstdc++-v3/include/bits/atomic_futex.h
@@ -52,11 +52,18 @@ _GLIBCXX_BEGIN_NAMESPACE_VERSION
 #if defined(_GLIBCXX_HAVE_LINUX_FUTEX) && ATOMIC_INT_LOCK_FREE > 1
   struct __atomic_futex_unsigned_base
   {
-// Returns false iff a timeout occurred.
+// __s and __ns are measured against CLOCK_REALTIME. Returns false
+// iff a timeout occurred.
 bool
 _M_futex_wait_until(unsigned *__addr, unsigned __val, bool __has_timeout,
chrono::seconds __s, chrono::nanoseconds __ns);
 
+// __s and __ns are measured against CLOCK_MONOTONIC. Returns
+// false iff a timeout occurred.
+bool
+_M_fu
