Re: [committed][testsuite] Re-enable pr94600-{1,3}.c tests for arm

2020-09-30 Thread Hans-Peter Nilsson
On Wed, 30 Sep 2020, Tom de Vries wrote:

> [ was: Re: [committed][testsuite] Require non_strict_align in
> pr94600-{1,3}.c ]
>
> On 9/30/20 4:53 AM, Hans-Peter Nilsson wrote:
> > On Thu, 24 Sep 2020, Tom de Vries wrote:
> >
> >> Hi,
> >>
> >> With the nvptx target, we run into:
> >> ...
> >> FAIL: gcc.dg/pr94600-1.c scan-rtl-dump-times final "\\(mem/v" 6
> >> FAIL: gcc.dg/pr94600-1.c scan-rtl-dump-times final "\\(set \\(mem/v" 6
> >> FAIL: gcc.dg/pr94600-3.c scan-rtl-dump-times final "\\(mem/v" 1
> >> FAIL: gcc.dg/pr94600-3.c scan-rtl-dump-times final "\\(set \\(mem/v" 1
> >> ...
> >> The scans attempt to check for volatile stores, but on nvptx we have memcpy
> >> instead.
> >>
> >> This is due to nvptx being a STRICT_ALIGNMENT target, which has the effect
> >> that the TYPE_MODE for the store target is set to BLKmode in
> >> compute_record_mode.
> >>
> >> Fix the FAILs by requiring effective target non_strict_align.
> >
> > No, that's wrong.  There's more than that at play; it worked for
> > the strict-alignment targets where it was tested at the time.
> >
>
> Hi,
>
> thanks for letting me know.
>
> > The test is a valuable canary for this kind of bug.  You now
> > disabled it for strict-alignment targets.
> >
> > Please revert and add your target specifier instead, if you
> > don't feel like investigating further.
>
> I've analyzed the compilation on strict-alignment target arm-eabi, and

An analysis should result in more than that statement.

> broadened the effective target to (non_strict_align ||
> pcc_bitfield_type_matters).

That's *also* not right.  I'm guessing your nvptx fails because
it has 64-bit alignment requirement, but no 32-bit writes.
...um, no that can't be it, nvptx seems to have them.  Costs?
Yes, probably your #define MOVE_RATIO(SPEED) 4.

The writes are to 32-bit aligned addresses which gcc can deduce
(also for strict-alignment targets) because it's a literal,
where it isn't explicitly declared to be attribute-aligned.

You should have noticed the weirdness in that you "only" needed
to tweak pr94600-1.c and -3.c, not even pr94600-2.c, which
should be the case if it was just the test-case getting the
predicates wrong.  This points at your MOVE_RATIO, together with
middle-end not applying it consistently for -2.c.

Again, please just skip for nvptx (don't mix-n-match general
predicates) unless you really look into the reason you don't get
6 single 32-bit-writes only in *some* of the cases.

brgds, H-P


Re: [PATCH] generalized range_query class for multiple contexts

2020-09-30 Thread Andrew MacLeod via Gcc-patches

On 9/25/20 1:41 PM, Andrew MacLeod via Gcc-patches wrote:

On 9/23/20 7:53 PM, Martin Sebor via Gcc-patches wrote:

On 9/18/20 12:38 PM, Aldy Hernandez via Gcc-patches wrote:
As part of the ranger work, we have been trying to clean up and 
generalize interfaces whenever possible. This not only helps in 
reducing the maintenance burden going forward, but provides 
mechanisms for backwards compatibility between ranger and other 
providers/users of ranges throughout the compiler like evrp and VRP.


One such interface is the range_query class in vr_values.h, which 
provides a range query mechanism for use in the 
simplify_using_ranges module.  With it, simplify_using_ranges can be 
used with the ranger, or the VRP twins by providing a 
get_value_range() method.  This has helped us in comparing apples to 
apples while doing our work, and has also future proofed the 
interface so that asking for a range can be done within the context 
in which it appeared.  For example, get_value_range now takes a 
gimple statement which provides context.  We are no longer tied to 
asking for a global SSA range, but can ask for the range of an SSA 
within a statement. Granted, this functionality is currently only in 
the ranger, but evrp/vrp could be adapted to pass such context.


The range_query is a good first step, but what we really want is a 
generic query mechanism that can ask for SSA ranges within an 
expression, a statement, an edge, or anything else that may come 
up.  We think that a generic mechanism can be used not only for 
range producers, but consumers such as the 
substitute_and_fold_engine (see get_value virtual) and possibly the 
gimple folder (see valueize).


The attached patchset provides such an interface.  It is meant to be 
a replacement for range_query that can be used for vr_values, 
substitute_and_fold, the substitute_and_fold_engine, as well as the
ranger.  The general API is:


class value_query
{
public:
   // Return the singleton expression for NAME at a gimple statement,
   // or NULL if none found.
   virtual tree value_of_expr (tree name, gimple * = NULL) = 0;
   // Return the singleton expression for NAME at an edge, or NULL if
   // none found.
   virtual tree value_on_edge (edge, tree name);
   // Return the singleton expression for the LHS of a gimple
   // statement, assuming an (optional) initial value of NAME. Returns
   // NULL if none found.
   //
   // Note this method calculates the range the LHS would have *after*
   // the statement has executed.
   virtual tree value_of_stmt (gimple *, tree name = NULL);
};

class range_query : public value_query
{
public:
   range_query ();
   virtual ~range_query ();

   virtual tree value_of_expr (tree name, gimple * = NULL) OVERRIDE;
   virtual tree value_on_edge (edge, tree name) OVERRIDE;
   virtual tree value_of_stmt (gimple *, tree name = NULL) OVERRIDE;

   // These are the range equivalents of the value_* methods. Instead
   // of returning a singleton, they calculate a range and return it in
   // R.  TRUE is returned on success or FALSE if no range was found.
   virtual bool range_of_expr (irange &r, tree name, gimple * = NULL) = 0;

   virtual bool range_on_edge (irange &r, edge, tree name);
   virtual bool range_of_stmt (irange &r, gimple *, tree name = NULL);

   // DEPRECATED: This method is used from vr-values.  The plan is to
   // rewrite all uses of it to the above API.
   virtual const class value_range_equiv *get_value_range (const_tree,
   gimple * = NULL);
};

The duality of the API (value_of_* and range_on_*) is because some 
passes are interested in a singleton value 
(substitute_and_fold_engine), while others are interested in ranges
(vr_values).  Passes that are only interested in singletons can take 
a value_query, while passes that are interested in full ranges, can 
take a range_query.  Of course, for future proofing, we would 
recommend taking a range_query, since if you provide a default 
range_of_expr, sensible defaults will be provided for the others in 
terms of range_of_expr.


Note that the absolute bare minimum that must be provided is a
value_of_expr for a value_query and a range_of_expr for a range_query.
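
To make the layering concrete, here is a minimal sketch of the idea that a
provider only supplies range_of_expr and gets value_of_expr by default.  It
is not the patch's code: toy_range, toy_value_query, toy_range_query,
fixed_ranges and fold_to_constant are all made-up stand-ins, a "name" is
just an int, and a range is a closed interval.

```cpp
#include <cassert>
#include <optional>

// Hypothetical stand-in for irange: a closed interval [lo, hi].
struct toy_range { long lo, hi; };

// Singleton-only interface, modelled loosely on value_query.
class toy_value_query
{
public:
  virtual ~toy_value_query () = default;
  // Return the singleton value of NAME, or nothing if it is not known.
  virtual std::optional<long> value_of_expr (int name) = 0;
};

// Range interface, modelled loosely on range_query.
class toy_range_query : public toy_value_query
{
public:
  // The only method a provider has to supply.
  virtual bool range_of_expr (toy_range &r, int name) = 0;

  // Default: NAME is a singleton when its range is a single point.
  std::optional<long> value_of_expr (int name) override
  {
    toy_range r{0, 0};
    if (range_of_expr (r, name) && r.lo == r.hi)
      return r.lo;
    return std::nullopt;
  }
};

// A provider that knows name 1 is exactly 5 and name 2 is in [0, 9].
class fixed_ranges : public toy_range_query
{
public:
  bool range_of_expr (toy_range &r, int name) override
  {
    if (name == 1) { r = {5, 5}; return true; }
    if (name == 2) { r = {0, 9}; return true; }
    return false;
  }
};

// A consumer that only wants singletons takes the base interface.
inline std::optional<long> fold_to_constant (toy_value_query &q, int name)
{
  return q.value_of_expr (name);
}
```

In this sketch a pass written against toy_value_query works unchanged with
any toy_range_query provider, which is the property the paragraph above
relies on.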


One piece of the API which is missing is a method  to return the 
range of an arbitrary SSA_NAME *after* a statement. Currently 
range_of_expr calculates the range of an expression upon entry to 
the statement, whereas range_of_stmt calculates the range of *only* 
the LHS of a statement AFTER the statement has executed.


This would allow for complete representation of the ranges/values in 
something like:


 d_4 = *g_7;

Here the range of g_7 upon entry could be VARYING, but after the 
dereference we know it must be non-zero.  Well for sane targets anyhow.


Choices would be to:

   1) add a 4th method such as "range_after_stmt", or

   2) merge that functionality with the existing range_of_stmt 
method to provide "after" functionality for any ssa_name. Currently 
the SSA_NAME must be the 

libgo patch committed: Add 32-bit RISC-V support

2020-09-30 Thread Ian Lance Taylor via Gcc-patches
This libgo patch by Maciej W. Rozycki adds 32-bit RISC-V support.
Bootstrapped and ran Go tests on x86_64-pc-linux-gnu.  Committed to
mainline.

Ian
a119b20263517656379c4833a3341031a6d58dc4
diff --git a/gcc/go/gofrontend/MERGE b/gcc/go/gofrontend/MERGE
index 314ffd2efab..8d9fda54619 100644
--- a/gcc/go/gofrontend/MERGE
+++ b/gcc/go/gofrontend/MERGE
@@ -1,4 +1,4 @@
-9e55baf44ab63ba06af0b57038e7b3aab8216222
+c9c084bce713e258721e12041a351ec8ad33ad17
 
 The first line of this file holds the git revision number of the last
 merge done from the gofrontend repository.
diff --git a/libgo/configure.ac b/libgo/configure.ac
index abc58b87b53..f15f8d830bb 100644
--- a/libgo/configure.ac
+++ b/libgo/configure.ac
@@ -342,8 +342,14 @@ AC_COMPILE_IFELSE([AC_LANG_SOURCE([
 [GOARCH=ppc64le],
 [GOARCH=ppc64])])
 ;;
-  riscv64-*-*)
-GOARCH=riscv64
+  riscv*-*-*)
+AC_COMPILE_IFELSE([AC_LANG_SOURCE([
+#if __riscv_xlen == 64
+#error 64-bit
+#endif
+])],
+[GOARCH=riscv],
+[GOARCH=riscv64])
 ;;
   s390*-*-*)
 AC_COMPILE_IFELSE([AC_LANG_SOURCE([
diff --git a/libgo/go/cmd/cgo/main.go b/libgo/go/cmd/cgo/main.go
index 80f35681d75..6de6d69ce6c 100644
--- a/libgo/go/cmd/cgo/main.go
+++ b/libgo/go/cmd/cgo/main.go
@@ -184,6 +184,7 @@ var ptrSizeMap = map[string]int64{
"ppc": 4,
"ppc64":   8,
"ppc64le": 8,
+   "riscv":   4,
"riscv64": 8,
"s390":4,
"s390x":   8,
@@ -210,6 +211,7 @@ var intSizeMap = map[string]int64{
"ppc": 4,
"ppc64":   8,
"ppc64le": 8,
+   "riscv":   4,
"riscv64": 8,
"s390":4,
"s390x":   8,
diff --git a/libgo/go/cmd/go/testdata/script/link_syso_issue33139.txt b/libgo/go/cmd/go/testdata/script/link_syso_issue33139.txt
index 46b0ef4200e..3030ee924ff 100644
--- a/libgo/go/cmd/go/testdata/script/link_syso_issue33139.txt
+++ b/libgo/go/cmd/go/testdata/script/link_syso_issue33139.txt
@@ -8,8 +8,9 @@
 # See: https://github.com/golang/go/issues/8912
 [linux] [ppc64] skip
 
-# External linking is not supported on linux/riscv64.
+# External linking is not supported on linux/riscv, linux/riscv64.
 # See: https://github.com/golang/go/issues/36739
+[linux] [riscv] skip
 [linux] [riscv64] skip
 
 cc -c -o syso/objTestImpl.syso syso/src/objTestImpl.c
diff --git a/libgo/go/cmd/internal/sys/arch.go b/libgo/go/cmd/internal/sys/arch.go
index e8687363def..60a3b3c8ecd 100644
--- a/libgo/go/cmd/internal/sys/arch.go
+++ b/libgo/go/cmd/internal/sys/arch.go
@@ -19,6 +19,7 @@ const (
MIPS
MIPS64
PPC64
+   RISCV
RISCV64
S390X
Wasm
@@ -143,6 +144,15 @@ var ArchPPC64LE = &Arch{
MinLC: 4,
 }
 
+var ArchRISCV = &Arch{
+   Name:  "riscv",
+   Family:RISCV,
+   ByteOrder: binary.LittleEndian,
+   PtrSize:   4,
+   RegSize:   4,
+   MinLC: 4,
+}
+
 var ArchRISCV64 = &Arch{
Name:  "riscv64",
Family:RISCV64,
@@ -181,6 +191,7 @@ var Archs = [...]*Arch{
ArchMIPS64LE,
ArchPPC64,
ArchPPC64LE,
+   ArchRISCV,
ArchRISCV64,
ArchS390X,
ArchWasm,
diff --git a/libgo/go/debug/elf/file.go b/libgo/go/debug/elf/file.go
index b9a8b1e0cbb..48178d480d7 100644
--- a/libgo/go/debug/elf/file.go
+++ b/libgo/go/debug/elf/file.go
@@ -617,6 +617,8 @@ func (f *File) applyRelocations(dst []byte, rels []byte) error {
return f.applyRelocationsMIPS(dst, rels)
case f.Class == ELFCLASS64 && f.Machine == EM_MIPS:
return f.applyRelocationsMIPS64(dst, rels)
+   case f.Class == ELFCLASS32 && f.Machine == EM_RISCV:
+   return f.applyRelocationsRISCV(dst, rels)
case f.Class == ELFCLASS64 && f.Machine == EM_RISCV:
return f.applyRelocationsRISCV64(dst, rels)
case f.Class == ELFCLASS64 && f.Machine == EM_S390:
@@ -1008,6 +1010,47 @@ func (f *File) applyRelocationsMIPS64(dst []byte, rels []byte) error {
return nil
 }
 
+func (f *File) applyRelocationsRISCV(dst []byte, rels []byte) error {
+   // 12 is the size of Rela32.
+   if len(rels)%12 != 0 {
+   return errors.New("length of relocation section is not a multiple of 12")
+   }
+
+   symbols, _, err := f.getSymbols(SHT_SYMTAB)
+   if err != nil {
+   return err
+   }
+
+   b := bytes.NewReader(rels)
+   var rela Rela32
+
+   for b.Len() > 0 {
+   binary.Read(b, f.ByteOrder, &rela)
+   symNo := rela.Info >> 8
+   t := R_RISCV(rela.Info & 0xff)
+
+   if symNo == 0 || symNo > uint32(len(symbols)) {
+   continue
+   }
+   sym := &symbols[symNo-1]
+   needed, val := relocSymbolTargetOK(sym)
+   if !needed {
+   continue
+   }
+
+   switch t {
+   case R_RISCV_32:
+   

Re: [RS6000] Adjust gcc asm for power10

2020-09-30 Thread Segher Boessenkool
Hi Alan,

On Thu, Oct 01, 2020 at 08:49:44AM +0930, Alan Modra wrote:
> On Wed, Sep 30, 2020 at 05:36:08PM -0500, Segher Boessenkool wrote:
> > On Wed, Sep 30, 2020 at 05:06:57PM +0930, Alan Modra wrote:
> > > Generate assembly that is .localentry 1 with @notoc calls to match.
> > 
> > What is the purpose of this?  Non-obvious patches without any
> > explanation like that cost needless extra time to review :-/
> > 
> > "Support __PCREL__ code." suggests that it did not even build before?
> > Or did not work?  Or is this just a performance improvement?
> 
> Sorry, I sometimes credit you with super-human powers.  It's a
> performance improvement for libgcc.a.  Calling between functions that
> advertise as using the TOC and those that don't, will require linker
> call stubs.

Thanks for the explanation!


Segher


[PATCH] issue -Wstring-compare in more cases (PR 95673)

2020-09-30 Thread Martin Sebor via Gcc-patches

-Wstring-compare triggers under the same strict conditions under which
the strcmp/strncmp call is folded into a constant: only when
all the uses of the result are [in]equality expressions with
zero.  However, even when the call cannot be folded into
a constant because the result is in addition used in other
expressions besides equality to zero, GCC still sets the range
of the result to nonzero.  So in more complex functions where
some of the uses of the same result are in tests for equality
to zero and others in other expressions, the warning fails to
point out the very mistake it's designed to detect.

The attached change enhances the function that determines how
the strcmp/strncmp is used to also make it possible to detect
the mistakes in the multi-use situations.
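
For readers unfamiliar with the warning, the sketch below shows the size
reasoning behind "evaluates to nonzero".  It is a simplification and not the
code in tree-ssa-strlen.c: the function names are made up, and it assumes
the array holds a valid nul-terminated string, so an array of N bytes stores
at most N - 1 characters.

```cpp
#include <algorithm>
#include <cassert>
#include <cstddef>

// True when strcmp of a string stored in an array of ARRAY_SIZE bytes and
// a literal of LITERAL_LEN characters can never be zero: the array cannot
// hold a string as long as the literal.
constexpr bool
strcmp_never_equal (std::size_t array_size, std::size_t literal_len)
{
  return literal_len >= array_size;
}

// Same question for strncmp with bound BOUND: only the first
// min (LITERAL_LEN, BOUND) characters of the literal take part, and the
// array's string must hit its terminating nul before that many characters.
constexpr bool
strncmp_never_equal (std::size_t array_size, std::size_t literal_len,
                     std::size_t bound)
{
  return std::min (literal_len, bound) >= array_size;
}
```

With a3 declared as char[3], strcmp (a3, "123") corresponds to
strcmp_never_equal (3, 3) and strncmp (a3, "1234", 4) to
strncmp_never_equal (3, 4, 4), which are the two cases the new test
expects a warning for.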

Tested on x86_64-linux & by building Glibc and Binutils/GDB
and confirming it triggers no new warnings.

Martin
PR middle-end/95673 - missing -Wstring-compare for an impossible strncmp test

gcc/ChangeLog:

	PR middle-end/95673
	* tree-ssa-strlen.c (used_only_for_zero_equality): Rename...
	(use_in_zero_equality): ...to this.  Add a default argument.
	(handle_builtin_memcmp): Adjust to the name change above.
	(handle_builtin_string_cmp): Same.
	(maybe_warn_pointless_strcmp): Same.  Pass in an explicit argument.

gcc/testsuite/ChangeLog:

	PR middle-end/95673
	* gcc.dg/Wstring-compare-3.c: New test.

diff --git a/gcc/testsuite/gcc.dg/Wstring-compare-3.c b/gcc/testsuite/gcc.dg/Wstring-compare-3.c
new file mode 100644
index 000..d4d7121dba7
--- /dev/null
+++ b/gcc/testsuite/gcc.dg/Wstring-compare-3.c
@@ -0,0 +1,106 @@
+/* PR middle-end/95673 - missing -Wstring-compare for an impossible strncmp test
+   { dg-do compile }
+   { dg-options "-O2 -Wall -Wstring-compare -ftrack-macro-expansion=0" } */
+
+typedef __SIZE_TYPE__ size_t;
+
+extern int strcmp (const char*, const char*);
+extern int strncmp (const char*, const char*, size_t);
+
+void sink (int, ...);
+
+extern char a3[3];
+
+int nowarn_strcmp_one_use_ltz (int c)
+{
+  const char *s = c ? "1234" : a3;
+  int n = strcmp (s, "123");
+  return n < 0;
+}
+
+
+int nowarn_strcmp_one_use_eqnz (int c)
+{
+  const char *s = c ? "12345" : a3;
+  int n = strcmp (s, "123");
+  return n == 1;
+}
+
+
+int warn_strcmp_one_use_eqz (int c)
+{
+  const char *s = c ? "123456" : a3;
+  int n = strcmp (s, "123");// { dg-warning "'strcmp' of a string of length 3 and an array of size 3 evaluates to nonzero" }
+  return n == 0;// { dg-message "in this expression" }
+}
+
+
+int warn_strcmp_one_use_bang (int c)
+{
+  const char *s = c ? "1234567" : a3;
+  int n = strcmp (s, "123");// { dg-warning "'strcmp' of a string of length 3 and an array of size 3 evaluates to nonzero" }
+  return !n;// { dg-message "in this expression" }
+}
+
+
+int warn_strcmp_one_use_bang_bang (int c)
+{
+  const char *s = c ? "12345678" : a3;
+  int n = strcmp (s, "123");// { dg-warning "'strcmp' of a string of length 3 and an array of size 3 evaluates to nonzero" }
+  return !!n;   // { dg-message "in this expression" }
+}
+
+
+_Bool warn_one_use_bool (int c)
+{
+  const char *s = c ? "123456789" : a3;
+  int n = strcmp (s, "123");// { dg-warning "'strcmp' of a string of length 3 and an array of size 3 evaluates to nonzero" }
+  return (_Bool)n;  // { dg-message "in this expression" }
+}
+
+
+int warn_strcmp_one_use_cond (int c)
+{
+  const char *s = c ? "1234567890" : a3;
+  int n = strcmp (s, "123");// { dg-warning "'strcmp' of a string of length 3 and an array of size 3 evaluates to nonzero" }
+  return n ? 3 : 5; // { dg-message "in this expression" }
+}
+
+
+int nowarn_strcmp_multiple_uses (int c)
+{
+  const char *s = c ? "1234" : a3;
+  int n = strcmp (s, "123");
+  sink (n < 0);
+  sink (n > 0);
+  sink (n <= 0);
+  sink (n >= 0);
+  sink (n + 1);
+  return n;
+}
+
+
+int warn_strcmp_multiple_uses (int c)
+{
+  const char *s = c ? "12345" : a3;
+  int n = strcmp (s, "123");// { dg-warning "'strcmp' of a string of length 3 and an array of size 3 evaluates to nonzero" }
+  sink (n < 0);
+  sink (n > 0);
+  sink (n <= 0);
+  sink (n >= 0);
+  sink (n == 0);// { dg-message "in this expression" }
+  return n;
+}
+
+
+int warn_strncmp_multiple_uses (int c)
+{
+  const char *s = a3;
+  int n = strncmp (s, "1234", 4); // { dg-warning "'strncmp' of a string of length 4, an array of size 3 and bound of 4 evaluates to nonzero" }
+  sink (n < 0);
+  sink (n > 0);
+  sink (n <= 0);
+  sink (n >= 0);
+  sink (n == 0);// { dg-message "in this expression" }
+  return n;
+}
diff --git a/gcc/tree-ssa-strlen.c b/gcc/tree-ssa-strlen.c
index 47f537ab210..936b39577b8 100644
--- a/gcc/tree-ssa-strlen.c
+++ b/gcc/tree-ssa-strlen.c
@@ -3982,11 +3982,13 @@ handle_builtin_memset (gimple_stmt_iterator *gsi, bool *zero_write,
   return true;
 }
 
-/* Return a pointer to the first such equality expression if RES is used
-   only in 

Re: [RS6000] Adjust gcc asm for power10

2020-09-30 Thread Alan Modra via Gcc-patches
On Wed, Sep 30, 2020 at 05:36:08PM -0500, Segher Boessenkool wrote:
> On Wed, Sep 30, 2020 at 05:06:57PM +0930, Alan Modra wrote:
> > Generate assembly that is .localentry 1 with @notoc calls to match.
> 
> What is the purpose of this?  Non-obvious patches without any
> explanation like that cost needless extra time to review :-/
> 
> "Support __PCREL__ code." suggests that it did not even build before?
> Or did not work?  Or is this just a performance improvement?

Sorry, I sometimes credit you with super-human powers.  It's a
performance improvement for libgcc.a.  Calling between functions that
advertise as using the TOC and those that don't, will require linker
call stubs.

To recap, a function that uses a TOC pointer advertises that fact by a
value of 2 or larger in the symbol st_other localentry bits.  A call
advertises that it is from a function that needs to preserve r2 by
using an R_PPC64_REL24 reloc on the call; a function that doesn't have
a valid TOC pointer uses R_PPC64_REL24_NOTOC.

Note that the extra stubs I'm talking about are in statically linked
code.  Calls to shared library functions have no extra overhead due to
mis-matched toc/notoc code.  Those calls need a plt call stub anyway.
Also, indirect calls are not affected.

> > gcc/
> > * config/rs6000/ppc-asm.h: Support __PCREL__ code.
> > libgcc/
> > * config/rs6000/morestack.S,
> > * config/rs6000/tramp.S,
> > * config/powerpc/sjlj.S: Support __PCREL__ code.
> 
> The patch does look fine.  Okay for trunk (and backports if those are
> wanted; discuss with Bill I guess).  Thanks!
> 
> (But please explain the purpose of this, in the commit message if that
> makes sense.)
> 
> 
> Segher

-- 
Alan Modra
Australia Development Lab, IBM


Re: [RS6000] Adjust gcc asm for power10

2020-09-30 Thread Segher Boessenkool
On Wed, Sep 30, 2020 at 05:06:57PM +0930, Alan Modra wrote:
> Generate assembly that is .localentry 1 with @notoc calls to match.

What is the purpose of this?  Non-obvious patches without any
explanation like that cost needless extra time to review :-/

"Support __PCREL__ code." suggests that it did not even build before?
Or did not work?  Or is this just a performance improvement?

> gcc/
>   * config/rs6000/ppc-asm.h: Support __PCREL__ code.
> libgcc/
>   * config/rs6000/morestack.S,
>   * config/rs6000/tramp.S,
>   * config/powerpc/sjlj.S: Support __PCREL__ code.

The patch does look fine.  Okay for trunk (and backports if those are
wanted; discuss with Bill I guess).  Thanks!

(But please explain the purpose of this, in the commit message if that
makes sense.)


Segher


[PING #2][PATCH] use get_size_range to get allocated size (PR 92942)

2020-09-30 Thread Martin Sebor via Gcc-patches

Ping: https://gcc.gnu.org/pipermail/gcc-patches/2020-August/552903.html

(I lost track of this patch.)

On 9/9/20 3:42 PM, Martin Sebor wrote:

Ping: https://gcc.gnu.org/pipermail/gcc-patches/2020-August/552903.html

On 8/28/20 11:12 AM, Martin Sebor wrote:

The gimple_call_alloc_size() function that determines the range
of sizes of allocated objects and constrains the bounds in calls
to functions like memcpy calls get_range() instead of
get_size_range() to obtain its result.  The latter is the right
function to call because it has the necessary logic to constrain
the range to just the values that are valid for object sizes.
This is especially useful when the range is the result of
a conversion from a signed to a wider unsigned integer where
the upper subrange is excessive and can be eliminated such as in:

   char* f (int n)
   {
 if (n > 8)
   n = 8;
 char *p = malloc (n);
 strcpy (p, "0123456789");   // buffer overflow
 ...
   }

Attached is a fix that lets -Wstringop-overflow diagnose the buffer
overflow above.  Besides with GCC I have also tested the change by
building Binutils/GDB and Glibc and verifying that it doesn't
introduce any false positives.
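
The point about the excessive upper subrange can be illustrated with a small
sketch.  This is not the GCC implementation: size_range, valid_sizes and
is_valid_object_size are hypothetical names, and the sketch assumes the
signed upper bound is nonnegative.

```cpp
#include <cassert>
#include <cstddef>
#include <cstdint>

// No object can be larger than PTRDIFF_MAX bytes.
constexpr std::size_t max_object_size = PTRDIFF_MAX;

constexpr bool is_valid_object_size (std::size_t n)
{
  return n <= max_object_size;
}

// Hypothetical helper type: a closed range of sizes.
struct size_range { std::size_t min, max; };

// Given a signed N known to lie in [LO, HI] with HI >= 0, return the
// values (size_t) N can take that are still valid object sizes.  Negative
// N converts to a huge unsigned value, which is not a valid size, so that
// part of the converted range is dropped.
constexpr size_range valid_sizes (int lo, int hi)
{
  return { lo < 0 ? std::size_t{0} : static_cast<std::size_t> (lo),
           static_cast<std::size_t> (hi) };
}
```

For the function f above, n is at most 8, so valid_sizes (-5, 8).max is 8
for any negative lower bound, while the copied string needs
sizeof "0123456789", that is 11, bytes.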

Martin






Re: [PATCH] c++: CTAD and explicit deduction guides for copy-list-init [PR90210]

2020-09-30 Thread Jason Merrill via Gcc-patches

On 9/19/20 5:33 PM, Marek Polacek wrote:

This PR points out that we accept

   template<typename T> struct tuple { tuple(T); }; // #1
   template<typename T> explicit tuple(T t) -> tuple<T>; // #2
   tuple t = { 1 };

despite the 'explicit' deduction guide in a copy-list-initialization
context.  That's because in deduction_guides_for we first find the
user-defined deduction guide (#2), and then ctor_deduction_guides_for
creates artificial deduction guides: one from the tuple(T) constructor and
a copy guide.  So we end up with these three guides:

   (1) template<typename T> tuple(T) -> tuple<T> [DECL_NONCONVERTING_P]
   (2) template<typename T> tuple(tuple<T>) -> tuple<T>
   (3) template<typename T> tuple(T) -> tuple<T>

Then, in do_class_deduction, we prune this set, and get rid of (1).
Then overload resolution selects (3) and we succeed.

But [over.match.list]p1 says "In copy-list-initialization, if an explicit
constructor is chosen, the initialization is ill-formed."  It also goes
on to say that this differs from other situations where only converting
constructors are considered for copy-initialization.  Therefore for
list-initialization we consider explicit constructors and complain if one
is chosen.  E.g. convert_like_internal/ck_user can give an error.

So my logic runs that we should not prune the deduction_guides_for guides
in a copy-list-initialization context, and only complain if we actually
choose an explicit deduction guide.  This matches clang++/EDG/msvc++.
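
As a rough illustration of how an explicit guide behaves in the two contexts
that are not in dispute (direct-initialization and plain
copy-initialization), here is a small C++17 sketch.  The class box and the
helper functions are made up for this example; the explicit guide
deliberately deduces box<long> so that the chosen guide is visible in the
resulting type.

```cpp
#include <cassert>
#include <type_traits>

template<typename T> struct box { box (T) {} };

// Explicit user-declared guide; deduces a different type than the guide
// implicitly generated from the box(T) constructor would.
template<typename T> explicit box (T) -> box<long>;

// Direct-initialization considers the explicit guide, and a user-declared
// guide is preferred over a constructor-generated one.
inline bool direct_init_uses_explicit_guide ()
{
  box b (1);
  return std::is_same_v<decltype (b), box<long>>;
}

// Copy-initialization does not consider the explicit guide, so the
// constructor-generated guide deduces box<int>.
inline bool copy_init_skips_explicit_guide ()
{
  box b = 1;
  return std::is_same_v<decltype (b), box<int>>;
}

// Not compiled here: per the analysis above, "box b = { 1 };" selects the
// explicit guide and should therefore be rejected.
```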

Bootstrapped/regtested on x86_64-pc-linux-gnu, ok for trunk?


OK.


gcc/cp/ChangeLog:

PR c++/90210
* pt.c (do_class_deduction): Don't prune explicit deduction guides
in copy-list-initialization.  In copy-list-initialization, if an
explicit deduction guide was selected, give an error.

gcc/testsuite/ChangeLog:

PR c++/90210
* g++.dg/cpp1z/class-deduction73.C: New test.
---
  gcc/cp/pt.c   | 49 ++-
  .../g++.dg/cpp1z/class-deduction73.C  | 41 
  2 files changed, 79 insertions(+), 11 deletions(-)
  create mode 100644 gcc/testsuite/g++.dg/cpp1z/class-deduction73.C

diff --git a/gcc/cp/pt.c b/gcc/cp/pt.c
index cfe5ff4a94f..9bcb743dc1d 100644
--- a/gcc/cp/pt.c
+++ b/gcc/cp/pt.c
@@ -28929,6 +28929,7 @@ do_class_deduction (tree ptype, tree tmpl, tree init,
tree type = TREE_TYPE (tmpl);
  
bool try_list_ctor = false;

+  bool list_init_p = false;
  
releasing_vec rv_args = NULL;

vec<tree,va_gc> *&args = *&rv_args;
@@ -28936,6 +28937,7 @@ do_class_deduction (tree ptype, tree tmpl, tree init,
  args = make_tree_vector ();
else if (BRACE_ENCLOSED_INITIALIZER_P (init))
  {
+  list_init_p = true;
try_list_ctor = TYPE_HAS_LIST_CTOR (type);
if (try_list_ctor && CONSTRUCTOR_NELTS (init) == 1)
{
@@ -28967,9 +28969,10 @@ do_class_deduction (tree ptype, tree tmpl, tree init,
if (cands == error_mark_node)
  return error_mark_node;
  
-  /* Prune explicit deduction guides in copy-initialization context.  */

+  /* Prune explicit deduction guides in copy-initialization context (but
+ not copy-list-initialization).  */
bool elided = false;
-  if (flags & LOOKUP_ONLYCONVERTING)
+  if (!list_init_p && (flags & LOOKUP_ONLYCONVERTING))
  {
for (lkp_iterator iter (cands); !elided && iter; ++iter)
if (DECL_NONCONVERTING_P (STRIP_TEMPLATE (*iter)))
@@ -29038,18 +29041,42 @@ do_class_deduction (tree ptype, tree tmpl, tree init,
--cp_unevaluated_operand;
  }
  
-  if (call == error_mark_node

-  && (complain & tf_warning_or_error))
+  if (call == error_mark_node)
  {
-  error ("class template argument deduction failed:");
+  if (complain & tf_warning_or_error)
+   {
+ error ("class template argument deduction failed:");
  
-  ++cp_unevaluated_operand;

-  call = build_new_function_call (cands, &args, complain | tf_decltype);
-  --cp_unevaluated_operand;
+ ++cp_unevaluated_operand;
+ call = build_new_function_call (cands, &args,
+ complain | tf_decltype);
+ --cp_unevaluated_operand;
  
-  if (elided)

-   inform (input_location, "explicit deduction guides not considered "
-   "for copy-initialization");
+ if (elided)
+   inform (input_location, "explicit deduction guides not considered "
+   "for copy-initialization");
+   }
+  return error_mark_node;
+}
+  /* [over.match.list]/1: In copy-list-initialization, if an explicit
+ constructor is chosen, the initialization is ill-formed.  */
+  else if (flags & LOOKUP_ONLYCONVERTING)
+{
+  tree fndecl = cp_get_callee_fndecl_nofold (call);
+  if (fndecl && DECL_NONCONVERTING_P (fndecl))
+   {
+ if (complain & tf_warning_or_error)
+   {
+ // TODO: Pass down location from cp_finish_decl.
+ error ("class template argument deduction for %qT failed: "
+"explicit 

Re: [PATCH] c++: ICE in dependent_type_p with constrained auto [PR97052]

2020-09-30 Thread Jason Merrill via Gcc-patches

On 9/29/20 5:01 PM, Patrick Palka wrote:

This patch fixes an "unguarded" call to coerce_template_parms in
build_standard_check: processing_template_decl could be zero if we
get here during processing of the first 'auto' parameter of an
abbreviated function template.  In the testcase below, this leads to an
ICE when coerce_template_parms substitutes into C's dependent default
template argument.

Bootstrapped and regtested on x86_64-pc-linux-gnu and tested by building
cmcstl2 and range-v3.  Does this look OK for trunk?


This looks OK, but is there a place higher in the call stack where we 
should have already set processing_template_decl?



gcc/cp/ChangeLog:

PR c++/97052
* constraint.cc (build_standard_check): Temporarily increment
processing_template_decl when calling coerce_template_parms.

gcc/testsuite/ChangeLog:

PR c++/97052
* g++.dg/cpp2a/concepts-defarg2.C: New test.
---
  gcc/cp/constraint.cc  | 2 ++
  gcc/testsuite/g++.dg/cpp2a/concepts-defarg2.C | 9 +
  2 files changed, 11 insertions(+)
  create mode 100644 gcc/testsuite/g++.dg/cpp2a/concepts-defarg2.C

diff --git a/gcc/cp/constraint.cc b/gcc/cp/constraint.cc
index d49957a6c4a..da3b2cc7e65 100644
--- a/gcc/cp/constraint.cc
+++ b/gcc/cp/constraint.cc
@@ -1355,7 +1355,9 @@ build_standard_check (tree tmpl, tree args, tsubst_flags_t complain)
gcc_assert (standard_concept_p (tmpl));
gcc_assert (TREE_CODE (tmpl) == TEMPLATE_DECL);
tree parms = INNERMOST_TEMPLATE_PARMS (DECL_TEMPLATE_PARMS (tmpl));
+  ++processing_template_decl;
args = coerce_template_parms (parms, args, tmpl, complain);
+  --processing_template_decl;
if (args == error_mark_node)
  return error_mark_node;
return build2 (TEMPLATE_ID_EXPR, boolean_type_node, tmpl, args);
diff --git a/gcc/testsuite/g++.dg/cpp2a/concepts-defarg2.C b/gcc/testsuite/g++.dg/cpp2a/concepts-defarg2.C
new file mode 100644
index 000..6c0670e9fd2
--- /dev/null
+++ b/gcc/testsuite/g++.dg/cpp2a/concepts-defarg2.C
@@ -0,0 +1,9 @@
+// PR c++/97052
+// { dg-do compile { target c++20 } }
+
+template
+concept C = true;
+
+bool f(C auto) {
+  return true;
+}





Re: [RS6000] -mno-minimal-toc vs. power10 pcrelative

2020-09-30 Thread Segher Boessenkool
Hi!

On Wed, Sep 30, 2020 at 05:01:45PM +0930, Alan Modra wrote:
>   * config/rs6000/linux64.h (SUBSUBTARGET_OVERRIDE_OPTIONS): Don't
>   set -mcmodel=small for -mno-minimal-toc when pcrel.

> -   SET_CMODEL (CMODEL_SMALL);\
> +   if (TARGET_MINIMAL_TOC\
> +   || !(TARGET_PCREL \
> +|| (PCREL_SUPPORTED_BY_OS\
> +&& (rs6000_isa_flags_explicit\
> +& OPTION_MASK_PCREL) == 0))) \
> + SET_CMODEL (CMODEL_SMALL);  \

Please write this in a more readable way?  With some "else" statements,
perhaps.

It is also fine to SET_CMODEL twice if that makes for simpler code.
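
One possible shape for such a rewrite, shown only as boolean logic and not
as the actual macro: the sketch below models the four inputs as plain bools
(the parameter names are invented; pcrel_explicit stands for
(rs6000_isa_flags_explicit & OPTION_MASK_PCREL) != 0) and checks that a
stepwise form agrees with the condition in the patch.

```cpp
#include <cassert>

// The condition as written in the patch.
constexpr bool
small_cmodel_original (bool minimal_toc, bool pcrel, bool os_pcrel,
                       bool pcrel_explicit)
{
  return minimal_toc || !(pcrel || (os_pcrel && !pcrel_explicit));
}

// The same decision spelled out one case at a time.
constexpr bool
small_cmodel_stepwise (bool minimal_toc, bool pcrel, bool os_pcrel,
                       bool pcrel_explicit)
{
  if (minimal_toc)
    return true;          // -mminimal-toc: always small
  if (pcrel)
    return false;         // pcrel already selected
  if (os_pcrel && !pcrel_explicit)
    return false;         // pcrel may still be enabled by default
  return true;
}

// Compare the two forms over all 16 input combinations.
constexpr bool forms_agree ()
{
  for (int bits = 0; bits < 16; ++bits)
    {
      bool a = bits & 1, b = bits & 2, c = bits & 4, d = bits & 8;
      if (small_cmodel_original (a, b, c, d)
          != small_cmodel_stepwise (a, b, c, d))
        return false;
    }
  return true;
}
```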

The rest looks fine, fwiw.


Segher


Re: [committed] libstdc++: Use __libc_single_threaded to optimise atomics [PR 96817]

2020-09-30 Thread Jonathan Wakely via Gcc-patches

On 30/09/20 16:03 +0100, Jonathan Wakely wrote:

On 29/09/20 13:51 +0200, Christophe Lyon via Libstdc++ wrote:

On Sat, 26 Sep 2020 at 21:42, Jonathan Wakely via Gcc-patches wrote:


Glibc 2.32 adds a global variable that says whether the process is
single-threaded. We can use this to decide whether to elide atomic
operations, as a more precise and reliable indicator than
__gthread_active_p.

This means that guard variables for statics and reference counting in
shared_ptr can use less expensive, non-atomic ops even in processes that
are linked to libpthread, as long as no threads have been created yet.
It also means that we switch to using atomics if libpthread gets loaded
later via dlopen (this still isn't supported in general, for other
reasons).
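
A minimal sketch of the dispatch idea follows.  It is not the code in
ext/atomicity.h: toy_single_threaded is a made-up stand-in for glibc's
__libc_single_threaded so the example does not depend on glibc 2.32, and the
function name only echoes __exchange_and_add_dispatch.  The atomic path uses
the GCC/Clang __atomic_fetch_add builtin.

```cpp
#include <cassert>

// Hypothetical stand-in for __libc_single_threaded.
bool toy_single_threaded = true;

// Add DELTA to *COUNTER and return the previous value, using a plain
// read-modify-write while the process is known to be single-threaded and
// an atomic one otherwise.
inline int
toy_exchange_and_add_dispatch (int *counter, int delta)
{
  if (toy_single_threaded)
    {
      int old = *counter;
      *counter += delta;
      return old;
    }
  return __atomic_fetch_add (counter, delta, __ATOMIC_ACQ_REL);
}
```

Both paths return the same result; the difference is only in the cost of
the operation, which is why the predicate can be checked on every call for
reference counts.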

We can't use __libc_single_threaded to replace __gthread_active_p
everywhere. If we replaced the uses of __gthread_active_p in std::mutex
then we would elide the pthread_mutex_lock in the code below, but not
the pthread_mutex_unlock:

 std::mutex m;
 m.lock();// pthread_mutex_lock
 std::thread t([]{}); // __libc_single_threaded = false
 t.join();
 m.unlock();  // pthread_mutex_unlock

We need the lock and unlock to use the same "is threading enabled"
predicate, and similarly for init/destroy pairs for mutexes and
condition variables, so that we don't try to release resources that were
never acquired.
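
The mismatch can be modelled with a toy counter.  This is only a simulation
under stated assumptions: toy_mutex is not std::mutex, no real threads are
created, and the bool passed to lock and unlock plays the role of the
"is threading enabled" predicate at the time of each call.

```cpp
#include <cassert>

// Toy mutex that elides the real operation when threads are not active.
struct toy_mutex
{
  int held = 0;   // real lock operations minus real unlock operations
  void lock (bool threads_active) { if (threads_active) ++held; }
  void unlock (bool threads_active) { if (threads_active) --held; }
};

// The predicate flips between lock and unlock, as it would if a thread
// were created in between: the lock is elided but the unlock is not.
inline int balance_when_predicate_flips ()
{
  toy_mutex m;
  bool threads_active = false;
  m.lock (threads_active);      // elided
  threads_active = true;        // a thread gets created here
  m.unlock (threads_active);    // real unlock of a mutex never locked
  return m.held;
}

// The predicate gives the same answer for both calls.
inline int balance_when_predicate_is_stable ()
{
  toy_mutex m;
  const bool threads_active = true;
  m.lock (threads_active);
  m.unlock (threads_active);
  return m.held;
}
```

A balance of -1 in the first case corresponds to unlocking a mutex that was
never locked; the second case stays at 0.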

There are other places that could use __libc_single_threaded, such as
_Sp_locker in src/c++11/shared_ptr.cc and locale init functions, but
they can be changed later.

libstdc++-v3/ChangeLog:

   PR libstdc++/96817
   * include/ext/atomicity.h (__gnu_cxx::__is_single_threaded()):
   New function wrapping __libc_single_threaded if available.
   (__exchange_and_add_dispatch, __atomic_add_dispatch): Use it.
   * libsupc++/guard.cc (__cxa_guard_acquire, __cxa_guard_abort)
   (__cxa_guard_release): Likewise.
   * testsuite/18_support/96817.cc: New test.

Tested powerpc64le-linux, with glibc 2.31 and 2.32. Committed to trunk.


Hi,

This patch introduced regressions on armeb-linux-gnueabihf:
--target armeb-none-linux-gnueabihf --with-cpu cortex-a9
  g++.dg/compat/init/init-ref2 cp_compat_x_tst.o-cp_compat_y_tst.o execute
  g++.dg/cpp2a/decomp1.C  -std=gnu++14 execution test
  g++.dg/cpp2a/decomp1.C  -std=gnu++17 execution test
  g++.dg/cpp2a/decomp1.C  -std=gnu++2a execution test
  g++.dg/init/init-ref2.C  -std=c++14 execution test
  g++.dg/init/init-ref2.C  -std=c++17 execution test
  g++.dg/init/init-ref2.C  -std=c++2a execution test
  g++.dg/init/init-ref2.C  -std=c++98 execution test
  g++.dg/init/ref15.C  -std=c++14 execution test
  g++.dg/init/ref15.C  -std=c++17 execution test
  g++.dg/init/ref15.C  -std=c++2a execution test
  g++.dg/init/ref15.C  -std=c++98 execution test
  g++.old-deja/g++.jason/pmf7.C  -std=c++98 execution test
  g++.old-deja/g++.mike/leak1.C  -std=c++14 execution test
  g++.old-deja/g++.mike/leak1.C  -std=c++17 execution test
  g++.old-deja/g++.mike/leak1.C  -std=c++2a execution test
  g++.old-deja/g++.mike/leak1.C  -std=c++98 execution test
  g++.old-deja/g++.other/init19.C  -std=c++14 execution test
  g++.old-deja/g++.other/init19.C  -std=c++17 execution test
  g++.old-deja/g++.other/init19.C  -std=c++2a execution test
  g++.old-deja/g++.other/init19.C  -std=c++98 execution test

and probably some (280) in the libstdc++ tests (I didn't bisect those):
  19_diagnostics/error_category/generic_category.cc execution test
  19_diagnostics/error_category/system_category.cc execution test
  20_util/scoped_allocator/1.cc execution test
  20_util/scoped_allocator/2.cc execution test
  20_util/scoped_allocator/construct_pair_c++2a.cc execution test
  20_util/to_address/debug.cc execution test
  20_util/variant/run.cc execution test


I think this is a latent bug in the static initialization code for
EABI that affects big endian. In libstdc++-v3/libsupc++/guard.cc we
have:

# ifndef _GLIBCXX_GUARD_TEST_AND_ACQUIRE

// Test the guard variable with a memory load with
// acquire semantics.

inline bool
__test_and_acquire (__cxxabiv1::__guard *g)
{
 unsigned char __c;
 unsigned char *__p = reinterpret_cast<unsigned char *>(g);
 __atomic_load (__p, &__c,  __ATOMIC_ACQUIRE);
 (void) __p;
 return _GLIBCXX_GUARD_TEST(&__c);
}
#  define _GLIBCXX_GUARD_TEST_AND_ACQUIRE(G) __test_and_acquire (G)
# endif

That inspects the first byte of the guard variable. But for EABI the
"is initialized" bit is the least significant bit of the guard
variable. For little endian that's fine, the least significant bit is
in the first byte. But for big endian, it's not in the first byte, so
we are looking in the wrong place. This means that the initial check
in __cxa_guard_acquire is wrong:

 extern "C"
 int __cxa_guard_acquire (__guard *g)
 {
#ifdef __GTHREADS
   // If the target can reorder loads, we need to insert a read memory
   // barrier so that accesses to the guarded variable happen after the
   // guard test.
   if 
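
The endianness argument made above can be checked with a small sketch. It is not code from guard.cc: `first_byte` is a hypothetical helper that lays out a 32-bit guard word in a chosen byte order and returns the byte at the lowest address, which is the byte the one-byte load inspects.

```cpp
#include <cassert>
#include <cstdint>

// Returns the first (lowest-addressed) byte of a 32-bit word as it would
// be stored in memory in the given byte order.
inline unsigned char first_byte(std::uint32_t word, bool big_endian)
{
  unsigned char bytes[4];
  for (int i = 0; i < 4; ++i)
    {
      int shift = big_endian ? 8 * (3 - i) : 8 * i;
      bytes[i] = static_cast<unsigned char>(word >> shift);
    }
  return bytes[0];
}

inline void demo()
{
  // Guard word with only the least significant bit set ("initialized").
  assert(first_byte(1u, false) == 1); // little endian: bit is visible
  assert(first_byte(1u, true) == 0);  // big endian: first byte is zero
}
```

With a big-endian layout the first byte stays zero even though the "is initialized" bit is set, so a test of that byte alone reports the wrong state.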

Re: [PATCH v2] c++: Fix up default initialization with consteval default ctor [PR96994]

2020-09-30 Thread Jason Merrill via Gcc-patches

On 9/30/20 3:57 AM, Jakub Jelinek wrote:

On Fri, Sep 25, 2020 at 04:30:26PM -0400, Jason Merrill via Gcc-patches wrote:

On 9/15/20 3:57 AM, Jakub Jelinek wrote:

The following testcase is miscompiled (in particular the a and i
initialization).  The problem is that build_special_member_call due to
the immediate constructors (but not evaluated in constant expression mode)
doesn't create a CALL_EXPR, but returns a TARGET_EXPR with CONSTRUCTOR
as the initializer for it,


That seems like the bug; at the end of build_over_call, after you


call = cxx_constant_value (call, obj_arg);


You need to build an INIT_EXPR if obj_arg isn't a dummy.


That works.  obj_arg is NULL if it is a dummy from the earlier code.

Bootstrapped/regtested on x86_64-linux and i686-linux, ok for trunk?


OK.


2020-09-30  Jakub Jelinek  

PR c++/96994
* call.c (build_over_call): If obj_arg is non-NULL, return INIT_EXPR
setting obj_arg to call.

* g++.dg/cpp2a/consteval18.C: New test.

--- gcc/cp/call.c.jj	2020-09-10 15:52:50.688207138 +0200
+++ gcc/cp/call.c   2020-09-29 20:39:55.003361651 +0200
@@ -9200,6 +9200,8 @@ build_over_call (struct z_candidate *can
}
}
  call = cxx_constant_value (call, obj_arg);
+ if (obj_arg && !error_operand_p (call))
+   call = build2 (INIT_EXPR, void_type_node, obj_arg, call);
}
  }
return call;
--- gcc/testsuite/g++.dg/cpp2a/consteval18.C.jj	2020-09-29 20:33:56.533596845 +0200
+++ gcc/testsuite/g++.dg/cpp2a/consteval18.C	2020-09-29 20:33:56.533596845 +0200
@@ -0,0 +1,26 @@
+// PR c++/96994
+// { dg-do run { target c++20 } }
+
+struct A { consteval A () { i = 1; } consteval A (int x) : i (x) {} int i = 0; };
+struct B { constexpr B () { i = 1; } constexpr B (int x) : i (x) {} int i = 0; };
+A const a;
+constexpr A b;
+B const c;
+A const constinit d;
+A const e = 2;
+constexpr A f = 3;
+B const g = 4;
+A const constinit h = 5;
+A i;
+B j;
+A k = 6;
+B l = 7;
+static_assert (b.i == 1 && f.i == 3);
+
+int
+main()
+{
+  if (a.i != 1 || c.i != 1 || d.i != 1 || e.i != 2 || g.i != 4 || h.i != 5
+  || i.i != 1 || j.i != 1 || k.i != 6 || l.i != 7)
+__builtin_abort ();
+}


Jakub





Re: [PATCH] c++: Handle std::construct_at on automatic vars during constant evaluation [PR97195]

2020-09-30 Thread Jason Merrill via Gcc-patches

On 9/30/20 4:01 AM, Jakub Jelinek wrote:

Hi!

As mentioned in the PR, due to a bug we only support std::construct_at in
constant expressions on non-automatic variables, because we VERIFY_CONSTANT the
second argument of placement new, which fails verification if it is the
address of an automatic variable.
The following patch fixes it by not performing that verification; the
placement new evaluation later on will verify it after it is dereferenced.
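
For readers unfamiliar with the shape of the call, here is a run-time sketch of what construct_at does with the address of an automatic variable. It only shows the placement new on a local; it does not exercise constant evaluation, which is what the patch is about. `construct_at_sketch` and `overwrite_automatic` are hypothetical names.

```cpp
#include <cassert>
#include <new>
#include <utility>

// Hypothetical helper mirroring the shape of std::construct_at: build a T
// in the storage at `location` using placement new.
template <typename T, typename... Args>
T* construct_at_sketch(T* location, Args&&... args)
{
  assert(location != nullptr);
  return ::new (static_cast<void*>(location)) T(std::forward<Args>(args)...);
}

// Same steps as foo() in the new test, but executed at run time.
inline bool overwrite_automatic()
{
  int a = 5;                            // automatic variable
  int* p = construct_at_sketch(&a, -1); // second placement-new argument is &a
  return *p == -1 && a == -1;
}
```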

Bootstrapped/regtested on x86_64-linux and i686-linux, ok for trunk?


OK.


2020-09-30  Jakub Jelinek  

PR c++/97195
* constexpr.c (cxx_eval_call_expression): Don't VERIFY_CONSTANT the
second argument.

* g++.dg/cpp2a/constexpr-new14.C: New test.

--- gcc/cp/constexpr.c.jj   2020-09-22 21:08:01.993199681 +0200
+++ gcc/cp/constexpr.c  2020-09-29 18:37:09.517051012 +0200
@@ -2342,9 +2342,10 @@ cxx_eval_call_expression (const constexp
  tree arg = CALL_EXPR_ARG (t, i);
  arg = cxx_eval_constant_expression (ctx, arg, false,
  non_constant_p, overflow_p);
- VERIFY_CONSTANT (arg);
  if (i == 1)
arg1 = arg;
+ else
+   VERIFY_CONSTANT (arg);
}
  gcc_assert (arg1);
  return arg1;
--- gcc/testsuite/g++.dg/cpp2a/constexpr-new14.C.jj	2020-09-29 18:40:52.834785887 +0200
+++ gcc/testsuite/g++.dg/cpp2a/constexpr-new14.C	2020-09-29 18:40:47.707860852 +0200
@@ -0,0 +1,73 @@
+// PR c++/97195
+// { dg-do compile { target c++20 } }
+
+namespace std
+{
+  typedef __SIZE_TYPE__ size_t;
+
+  template <typename T>
+  struct allocator
+  {
+    constexpr allocator () noexcept {}
+
+    constexpr T *allocate (size_t n)
+    { return static_cast<T *> (::operator new (n * sizeof(T))); }
+
+    constexpr void
+    deallocate (T *p, size_t n)
+    { ::operator delete (p); }
+  };
+
+  template <typename T, typename U = T &&>
+  U __declval (int);
+  template <typename T>
+  T __declval (long);
+  template <typename T>
+  auto declval () noexcept -> decltype (__declval<T> (0));
+
+  template <typename T>
+  struct remove_reference
+  { typedef T type; };
+  template <typename T>
+  struct remove_reference<T &>
+  { typedef T type; };
+  template <typename T>
+  struct remove_reference<T &&>
+  { typedef T type; };
+
+  template <typename T>
+  constexpr T &&
+  forward (typename std::remove_reference<T>::type &t) noexcept
+  { return static_cast<T&&> (t); }
+
+  template<typename T>
+  constexpr T &&
+  forward (typename std::remove_reference<T>::type &&t) noexcept
+  { return static_cast<T&&> (t); }
+
+  template <typename T, typename... A>
+  constexpr auto
+  construct_at (T *l, A &&... a)
+  noexcept (noexcept (::new ((void *) 0) T (std::declval<A> ()...)))
+  -> decltype (::new ((void *) 0) T (std::declval<A> ()...))
+  { return ::new ((void *) l) T (std::forward<A> (a)...); }
+
+  template <typename T>
+  constexpr inline void
+  destroy_at (T *l)
+  { l->~T (); }
+}
+
+inline void *operator new (std::size_t, void *p) noexcept
+{ return p; }
+
+constexpr bool
+foo ()
+{
+  int a = 5;
+  int *p = std::construct_at (&a, -1);
+  if (p[0] != -1)
+throw 1;
+  return true;
+}
+constexpr bool b = foo ();

Jakub





Re: [PATCH v2] builtins: (not just) rs6000: Add builtins for fegetround, feclearexcept and feraiseexcept [PR94193]

2020-09-30 Thread Segher Boessenkool
On Wed, Sep 30, 2020 at 09:02:34AM +0200, Richard Biener wrote:
> On Tue, 29 Sep 2020, Segher Boessenkool wrote:
> > I don't see much about optabs in the docs either.  Add some text to
> > optabs.def itself then?
> 
> All optabs are documented in doc/md.texi as 'instruction patterns'

Except for what seems to be the majority that isn't.

> This is where new optabs need to be documented.

It's going to be challenging to find a reasonable spot in there.
Oh well.

Thanks,


Segher


Re: [PATCH] arm: subdivide the type attribute "alu_shift_imm"

2020-09-30 Thread Richard Sandiford via Gcc-patches
Thanks for the patch and sorry for the slow reply.

Must admit that I hadn't realised that we'd need quite that many
autodetect_types, sorry.  Obviously the operand numbering is a lot
less regular in arm than in aarch64. :-)  The approach still seems
reasonable to me though, and the patch generally looks really good.

Qian Jianhua  writes:
> diff --git a/gcc/config/aarch64/aarch64.md b/gcc/config/aarch64/aarch64.md
> index dbc6b1db176..12418f42ee5 100644
> --- a/gcc/config/aarch64/aarch64.md
> +++ b/gcc/config/aarch64/aarch64.md
> @@ -2447,7 +2447,7 @@
> (match_operand:GPI 3 "register_operand" "r")))]
>""
>"add\\t%0, %3, %1,  %2"
> -  [(set_attr "type" "alu_shift_imm")]
> +  [(set_attr "autodetect_type" "alu_shift_operator")]
>  )

The full pattern is:

(define_insn "*add_<shift>_<mode>"
  [(set (match_operand:GPI 0 "register_operand" "=r")
        (plus:GPI (ASHIFT:GPI (match_operand:GPI 1 "register_operand" "r")
                              (match_operand:QI 2 "aarch64_shift_imm_<mode>" "n"))
                  (match_operand:GPI 3 "register_operand" "r")))]
  ""
  "add\\t%<w>0, %<w>3, %<w>1, <shift> %2"
  [(set_attr "autodetect_type" "alu_shift_operator")]
)

so I think in this case it would be better to have:

  alu_shift_<shift>_op2

and define alu_shift_lsr_op2 and alu_shift_asr_op2 autodetect_types that
always map to alu_shift_imm_other.

I think all of the aarch64.md uses would then also be:

  alu_shift_<shift>_op2

> @@ -1370,7 +1371,8 @@
> (set_attr "arch" "32,a")
> (set_attr "shift" "3")
> (set_attr "predicable" "yes")
> -   (set_attr "type" "alu_shift_imm,alu_shift_reg")]
> +   (set_attr "autodetect_type" "alu_shift_operator2,none")
> +   (set_attr "type" "*,alu_shift_reg")]
>  )
>  
>  (define_insn "*addsi3_carryin_clobercc"

I guess here we have the option of using just:

  (set_attr "autodetect_type" "alu_shift_operator2")

We can then make alu_shift_operator2 detect shifts by registers too.
It looked like this could simplify some of the other patterns too.

Neither way's obviously better than the other, just mentioning it
as a suggestion.

> @@ -9501,7 +9509,7 @@
>[(set_attr "predicable" "yes")
> (set_attr "shift" "2")
> (set_attr "arch" "a,t2")
> -   (set_attr "type" "alu_shift_imm")])
> +   (set_attr "autodetect_type" "alu_shift_lsl_op3")])

The pattern here is:

(define_insn "*<arith_shift_insn>_multsi"
  [(set (match_operand:SI 0 "s_register_operand" "=r,r")
        (SHIFTABLE_OPS:SI
         (mult:SI (match_operand:SI 2 "s_register_operand" "r,r")
                  (match_operand:SI 3 "power_of_two_operand" ""))
         (match_operand:SI 1 "s_register_operand" "rk,<t2_binop0>")))]
  "TARGET_32BIT"
  "<arith_shift_insn>%?\\t%0, %1, %2, lsl %b3"
  [(set_attr "predicable" "yes")
   (set_attr "shift" "2")
   (set_attr "arch" "a,t2")
   (set_attr "autodetect_type" "alu_shift_lsl_op3")])

so I think alu_shift_mul_op3 would be a better name.

(By rights this pattern should never match, since the mult should
be converted to a shift.  But fixing that would be feature creep. :-))

> diff --git a/gcc/config/arm/common.md b/gcc/config/arm/common.md
> new file mode 100644
> index 000..1a5da834d61
> --- /dev/null
> +++ b/gcc/config/arm/common.md
> @@ -0,0 +1,37 @@
> +;; Common predicate definitions for ARM, Thumb and AArch64
> +;; Copyright (C) 2020 Free Software Foundation, Inc.
> +;; Contributed by Fujitsu Ltd.
> +
> +;; This file is part of GCC.
> +
> +;; GCC is free software; you can redistribute it and/or modify it
> +;; under the terms of the GNU General Public License as published
> +;; by the Free Software Foundation; either version 3, or (at your
> +;; option) any later version.
> +
> +;; GCC is distributed in the hope that it will be useful, but WITHOUT
> +;; ANY WARRANTY; without even the implied warranty of MERCHANTABILITY
> +;; or FITNESS FOR A PARTICULAR PURPOSE.  See the GNU General Public
> +;; License for more details.
> +
> +;; You should have received a copy of the GNU General Public License
> +;; along with GCC; see the file COPYING3.  If not see
> +;; <http://www.gnu.org/licenses/>.
> +
> +;; Return true if constant is CONST_INT >= 1 and <= 4
> +(define_predicate "const_1_to_4_operand"
> +  (and (match_code "const_int")
> +   (match_test "IN_RANGE(INTVAL (op), 1, 4)")))

Minor formatting nit, but: GCC style is to have a space between
"IN_RANGE" and "(".

> +;; Return true if constant is 2 or 4 or 8 or 16
> +(define_predicate "const_2_4_8_16_operand"
> +  (and (match_code "const_int")
> +   (match_test ("   INTVAL (op) == 2
> + || INTVAL (op) == 4
> + || INTVAL (op) == 8
> + || INTVAL (op) == 16 "
> +
> +;; Return true if shift type is lsl and amount is in[1,4].
> +(define_predicate "alu_shift_operator_lsl_1_to_4"
> +  (and (match_code "ashift")
> +   (match_test "const_1_to_4_operand(XEXP(op, 1), mode)")))

Same space comment here.

> diff --git a/gcc/config/arm/types.md b/gcc/config/arm/types.md
> index 83983452f52..d1303c6fd76 100644
> --- 

Re: [PATCH] libstdc++: Rebase include/pstl to current upstream

2020-09-30 Thread Jonathan Wakely via Gcc-patches

On 21/09/20 15:40 +0100, Jonathan Wakely wrote:

On 15/09/20 20:35 -0700, Thomas Rodgers wrote:

From: Thomas Rodgers 

From llvm-project/pstl @ 0b2e0e80d96

libstdc++-v3/ChangeLog:

* include/pstl/algorithm_impl.h: Update file.
* include/pstl/execution_impl.h: Likewise.
* include/pstl/glue_algorithm_impl.h: Likewise.
* include/pstl/glue_memory_impl.h: Likewise.
* include/pstl/glue_numeric_impl.h: Likewise.
* include/pstl/memory_impl.h: Likewise.
* include/pstl/numeric_impl.h: Likewise.
* include/pstl/parallel_backend.h: Likewise.
* include/pstl/parallel_backend_serial.h: Likewise.
* include/pstl/parallel_backend_tbb.h: Likewise.
* include/pstl/parallel_backend_utils.h: Likewise.
* include/pstl/pstl_config.h: Likewise.
* include/pstl/unseq_backend_simd.h: Likewise.
---
libstdc++-v3/include/pstl/algorithm_impl.h| 181 ++--
libstdc++-v3/include/pstl/execution_impl.h|   4 +-
.../include/pstl/glue_algorithm_impl.h| 543 +--
libstdc++-v3/include/pstl/glue_memory_impl.h  | 264 ++---
libstdc++-v3/include/pstl/glue_numeric_impl.h |  68 +-
libstdc++-v3/include/pstl/memory_impl.h   |  67 +-
libstdc++-v3/include/pstl/numeric_impl.h  |   8 +-
libstdc++-v3/include/pstl/parallel_backend.h  |   8 +
.../include/pstl/parallel_backend_serial.h|   8 +-
.../include/pstl/parallel_backend_tbb.h   | 903 +++---
.../include/pstl/parallel_backend_utils.h | 248 +++--
libstdc++-v3/include/pstl/pstl_config.h   |  24 +-
.../include/pstl/unseq_backend_simd.h |  39 +-
13 files changed, 1586 insertions(+), 779 deletions(-)

diff --git a/libstdc++-v3/include/pstl/glue_algorithm_impl.h b/libstdc++-v3/include/pstl/glue_algorithm_impl.h
index 379de4033ec..d2e30529f78 100644
--- a/libstdc++-v3/include/pstl/glue_algorithm_impl.h
+++ b/libstdc++-v3/include/pstl/glue_algorithm_impl.h
@@ -757,8 +743,7 @@ __pstl::__internal::__enable_if_execution_policy<_ExecutionPolicy, bool>
equal(_ExecutionPolicy&& __exec, _ForwardIterator1 __first1, _ForwardIterator1 __last1, _ForwardIterator2 __first2,
 _ForwardIterator2 __last2)
{
-return std::equal(std::forward<_ExecutionPolicy>(__exec), __first1, __last1, __first2, __last2,
-  __pstl::__internal::__pstl_equal());
+return equal(std::forward<_ExecutionPolicy>(__exec), __first1, __last1, __first2, __last2, std::equal_to<>());


Any idea why this is now called unqualified? I don't think we want ADL
here.
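
The concern about the unqualified call can be shown with a small sketch of how argument-dependent lookup lets a user overload take over. The namespaces `lib` and `user` and all names in them are hypothetical; this is not PSTL code.

```cpp
#include <cassert>

namespace lib
{
  // The library's own overload.
  template <typename T>
  int process(T) { return 1; }

  // Unqualified call: overloads in the argument's namespace are
  // considered too (ADL).
  template <typename T>
  int call_unqualified(T t) { return process(t); }

  // Qualified call: only lib::process is considered.
  template <typename T>
  int call_qualified(T t) { return lib::process(t); }
}

namespace user
{
  struct widget {};

  // A non-template overload found by ADL; it beats the library template.
  int process(widget) { return 2; }
}

inline void demo()
{
  assert(lib::call_unqualified(user::widget{}) == 2); // hijacked by ADL
  assert(lib::call_qualified(user::widget{}) == 1);   // library overload
}
```

Qualifying the call, as the removed line did with std::equal, pins the lookup to the library's own function.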



diff --git a/libstdc++-v3/include/pstl/parallel_backend_tbb.h b/libstdc++-v3/include/pstl/parallel_backend_tbb.h
index 9c05ade0532..4476486d548 100644
--- a/libstdc++-v3/include/pstl/parallel_backend_tbb.h
+++ b/libstdc++-v3/include/pstl/parallel_backend_tbb.h


This file is full of non-reserved names, like _root and _x_orig and
move_y_range.

Fixing those upstream might take a while though.


Please go ahead and commit this as is. The problems can be addressed
upstream and fixed here later.




[committed] fix ICE in attribute access formatting (PR middle-end/97189)

2020-09-30 Thread Martin Sebor via Gcc-patches

Redeclaring a function that takes a VLA parameter with attribute
access that references the same parameter can cause conflicts
when the two aren't in sync.  The conflicts are detected and
diagnosed but also have to be resolved.  The code wasn't robust
enough to handle all cases gracefully, leading to the ICE reported
in the PR.  After testing on x86_64-linux I have committed in
r11-3571 the attached fix to improve it and avoid the ICE.
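
The robustness problem is easiest to see in isolation. The following is a simplified sketch of a backward scan for a closing bracket that stops at the start of the string instead of assuming the bracket exists. It is not the code from attribs.c, and `has_dollar_bracket` is a hypothetical name.

```cpp
#include <cassert>
#include <cstring>

// Scans backwards from the end of `str` for ']' and reports whether the
// character just before it is '$'. The scan never moves past the start
// of the string, so a string without ']' is handled safely.
inline bool has_dollar_bracket(const char* str)
{
  assert(str != nullptr);
  const char* p = str + std::strlen(str);
  while (p != str && *--p != ']')
    ;
  return p != str && *p == ']' && p[-1] == '$';
}
```

Without the `p != str` test, an input that lacks a closing bracket would make the loop read before the beginning of the buffer.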

Martin

Avoid assuming a VLA access specification string contains a closing bracket (PR middle-end/97189).

Resolves:
PR middle-end/97189 - ICE on redeclaration of a function with VLA argument and attribute access

gcc/ChangeLog:

	PR middle-end/97189
	* attribs.c (attr_access::array_as_string): Avoid assuming a VLA
	access specification string contains a closing bracket.

gcc/c-family/ChangeLog:

	PR middle-end/97189
	* c-attribs.c (append_access_attr): Use the function declaration
	location for a warning about an attribute access argument.

gcc/testsuite/ChangeLog:

	PR middle-end/97189
	* gcc.dg/attr-access-2.c: Adjust caret location.
	* gcc.dg/Wvla-parameter-6.c: New test.
	* gcc.dg/Wvla-parameter-7.c: New test.

diff --git a/gcc/attribs.c b/gcc/attribs.c
index 3f6ec3d3aa3..94b9e02699f 100644
--- a/gcc/attribs.c
+++ b/gcc/attribs.c
@@ -2270,11 +2270,11 @@ attr_access::array_as_string (tree type) const
 	 bound is nonconstant and whose access string has "$]" in it)
 	 extract the bound expression from SIZE.  */
 	  const char *p = end;
-	  for ( ; *p-- != ']'; );
+	  for ( ; p != str && *p-- != ']'; );
 	  if (*p == '$')
 	index_type = build_index_type (TREE_VALUE (size));
 	}
-  else  if (minsize)
+  else if (minsize)
 	index_type = build_index_type (size_int (minsize - 1));
 
   tree arat = NULL_TREE;
diff --git a/gcc/c-family/c-attribs.c b/gcc/c-family/c-attribs.c
index 70b00037d98..c779d13f023 100644
--- a/gcc/c-family/c-attribs.c
+++ b/gcc/c-family/c-attribs.c
@@ -4151,18 +4151,12 @@ append_access_attr (tree node[3], tree attrs, const char *attrstr,
  "missing in previous designation",
  attrstr);
 	  else if (newa->internal_p || cura->internal_p)
-	{
-	  /* Mismatch in the value of the size argument and a VLA
-		 bound.  */
-	  location_t argloc = curloc;
-	  if (tree arg = get_argument (node[2], newa->sizarg))
-		argloc = DECL_SOURCE_LOCATION (arg);
-	  warned = warning_at (argloc, OPT_Wattributes,
-   "attribute %qs positional argument 2 "
-   "conflicts with previous designation "
-   "by argument %u",
-   attrstr, cura->sizarg + 1);
-	}
+	/* Mismatch in the value of the size argument and a VLA bound.  */
+	warned = warning_at (curloc, OPT_Wattributes,
+ "attribute %qs positional argument 2 "
+ "conflicts with previous designation "
+ "by argument %u",
+ attrstr, cura->sizarg + 1);
 	  else
 	/* Mismatch in the value of the size argument between two
 	   explicit access attributes.  */
diff --git a/gcc/testsuite/gcc.dg/Wvla-parameter-6.c b/gcc/testsuite/gcc.dg/Wvla-parameter-6.c
new file mode 100644
index 000..268aeec9251
--- /dev/null
+++ b/gcc/testsuite/gcc.dg/Wvla-parameter-6.c
@@ -0,0 +1,34 @@
+/* PR middle-end/97189 - ICE on redeclaration of a function with VLA argument
+   and attribute access
+   Also verify the right arguments are underlined in the notes.
+   { dg-do compile }
+   { dg-options "-Wall -fdiagnostics-show-caret" } */
+
+#define RW(...) __attribute__ ((access (read_write, __VA_ARGS__)))
+
+RW (2, 3) void f1 (int n, int[n], int);
+/* { dg-warning "attribute 'access \\(read_write, 2, 3\\)' positional argument 2 conflicts with previous designation by argument 3" "warning" { target *-*-* } .-1 }
+   { dg-begin-multiline-output "" }
+ RW (2, 3) void f1 (int n, int[n], int);
+^~
+   { dg-end-multiline-output "" }
+   { dg-message "designating the bound of variable length array argument 2" "note" { target *-*-* } .-6 }
+   { dg-begin-multiline-output "" }
+ RW (2, 3) void f1 (int n, int[n], int);
+^  ~~
+   { dg-end-multiline-output "" } */
+
+
+RW (2)void f2 (int, int[*], int);
+/* { dg-message "previously declared as a variable length array 'int\\\[\\\*]'" "note" { target *-*-* } .-1 }
+   { dg-begin-multiline-output "" }
+ RW (2, 3) void f2 (int, int[], int);
+ ^
+   { dg-end-multiline-output "" } */
+
+RW (2, 3) void f2 (int, int[], int);
+/* { dg-warning "argument 2 of type 'int\\\[]' declared as an ordinary array" "warning" { target *-*-* } .-1 }
+   { dg-begin-multiline-output "" }
+ RW (2)void f2 (int, int[*], int);
+ ^~
+   { dg-end-multiline-output "" } */
diff --git a/gcc/testsuite/gcc.dg/Wvla-parameter-7.c b/gcc/testsuite/gcc.dg/Wvla-parameter-7.c
new file mode 100644
index 000..14ce75f3e2e
--- /dev/null
+++ b/gcc/testsuite/gcc.dg/Wvla-parameter-7.c
@@ -0,0 +1,36 @@
+/* PR middle-end/97189 - ICE on 

Re: [PATCH] avoid modifying type in place (PR 97206)

2020-09-30 Thread Martin Sebor via Gcc-patches

On 9/30/20 3:57 AM, Jakub Jelinek wrote:

On Tue, Sep 29, 2020 at 03:40:40PM -0600, Martin Sebor via Gcc-patches wrote:

I will commit this patch later this week unless I hear concerns
or suggestions for changes.


That is not how the patch review process works.


The review process hasn't been working well for me, but thankfully,
the commit policy lets me make these types of "obvious" fixes on
my own, without waiting for approval.  But if I could get simple
changes reviewed in a few days instead of having to ping them for
weeks there would be no reason for me to take advantage of this
latitude (and for us to rehash this topic yet again).

+ arat = tree_cons (get_identifier ("array"), flag, NULL_TREE);


Better
  arat = build_tree_list (get_identifier ("array"), flag);
then, tree_cons is when you have a meaningful TREE_CHAIN you want to supply
too.


Okay.  I checked to make sure they both do the same thing and
create a tree with the size and committed the updated patch in
r11-3570.

Martin


}
  
-  TYPE_ATOMIC (artype) = TYPE_ATOMIC (type);

-  TYPE_READONLY (artype) = TYPE_READONLY (type);
-  TYPE_RESTRICT (artype) = TYPE_RESTRICT (type);
-  TYPE_VOLATILE (artype) = TYPE_VOLATILE (type);
-  type = artype;
+  const int quals = TYPE_QUALS (type);
+  type = build_array_type (eltype, index_type);
+  type = build_type_attribute_qual_variant (type, arat, quals);
  }
  
/* Format the type using the current pretty printer.  The generic tree

@@ -2309,10 +2304,6 @@ attr_access::array_as_string (tree type) const
typstr = pp_formatted_text (pp);
delete pp;
  
-  if (this->str)

-/* Remove the attribute that wasn't installed by decl_attributes.  */
-TYPE_ATTRIBUTES (type) = NULL_TREE;
-
return typstr;
  }


Otherwise LGTM.

Jakub





Re: Another issue on RS6000 target. Re: One issue with default implementation of zero_call_used_regs

2020-09-30 Thread Qing Zhao via Gcc-patches



> On Sep 30, 2020, at 11:25 AM, Richard Sandiford wrote:
> 
> Qing Zhao  writes:
 
>>>> As I checked, the above failure happens when the FP registers are zeroed.
>>>> 
>>>> I suspect that the issue is still related to the following statement:
>>>> 
>>>> machine_mode mode = reg_raw_mode[regno];
>>>> 
>>>> As I checked, reg_raw_mode always returns the integer mode that can be
>>>> held by the hard register, even when it is an FP register.
>>> 
>>> Well, more precisely: it's the largest mode that the target allows the
>>> registers to hold.  If there are multiple candidate modes of the same
>>> size, the integer one wins, like you say.  But the point is that DI only
>>> wins over DF because the target allows both DI and DF to be stored in
>>> the register, and therefore supports both DI and DF moves for that
>>> register.
>>> 
>>> So I don't think the mode is the issue.  Integer zero and floating-point
>>> zero have the same bit representation after all.
>> 
>> Theoretically, yes.
>> However, as we noticed on AArch64, the integer TI move was not
>> supported before your fix today. As a result, the TI move had to be
>> split.
>> With your fix today on aarch64, yes, the default implementation works well
>> for those vector registers. Thanks a lot.
>> 
>> Potentially there will be other targets that have the same issue. Then those 
>> targets need to fix those issues too in order to make the default 
>> implementation work.
> 
> Right.  But that's not a bad thing.
> 
> My point above was that what you describe was not the issue for Power.
> AIUI the issue there was…
> 
>>> AIUI, without VSX, Power needs to load the zero from the constant pool.
> 
> …this instead.
> 
>>>> So, I'm still wondering:
>>>> 
>>>> 1. Is there another available utility routine that returns the proper MODE
>>>> for the hard registers that can be readily used to zero the hard register?
>>>> 2. If not, should I add one more target hook for this purpose? i.e.
>>>> 
>>>> /* Return the proper machine mode that can be used to zero this hard
>>>> register specified by REGNO.  */
>>>> machine_mode zero_call_used_regs_mode (unsigned int REGNO)
>>>> 
>>>> 3. Or should I just delete the default implementation and let the target
>>>> implement it?
>>> 
>>> IMO no.  This goes back to what we discussed earlier.  It isn't the
>>> case that a default target hook has to be correct for all targets,
>>> with targets only overriding them as an optimisation.  The default
>>> versions of many hooks and macros are not conservatively correct.
>>> They are just reasonable default assumptions.  And IMO that's true
>>> of the hook above too.
>>> 
>>> The way to flush out whether a target needs to override the hook
>>> is to add tests that run on all targets.
>> I plan to add these new test cases, so currently I have been running
>> simple test cases on aarch64 and rs6000 to look for any issues
>> with the default implementation. So far, I have found these issues with
>> very simple test cases.
>> 
>> For my part, at most I can test the aarch64 and rs6000 targets with some
>> small test cases to check correctness.
> 
> Thanks for testing other targets.  But I don't think that invalidates
> what I said above.  It might be that some of the targets you pick to
> test are ones that can't use the default implementation (or at least,
> not in all circumstances).  At that point, hopefully target maintainers
> will step in to help.
> 
>>> That said, one way of erring on the side of caution from an ICE
>>> perspective would be to do something like:
>>> 
>>>   rtx_insn *last_insn = get_last_insn ();
>>>   rtx zero = CONST0_RTX (reg_raw_mode[regno]);
>>>   rtx_insn *insn = emit_insn (gen_rtx_SET (regno_reg_rtx[regno], zero));
>>>   if (!valid_insn_p (insn))
>>> {
>>>   delete_insns_since (last_insn);
>>>   ...remove regno from the set of cleared registers...;
>>> }
>>> 
>>> where valid_insn_p abstracts out this code from ira.c:
>>> 
>>>   recog_memoized (move_insn);
>>>   if (INSN_CODE (move_insn) < 0)
>>> continue;
>>>   extract_insn (move_insn);
>>>   /* We don't know whether the move will be in code that is optimized
>>>  for size or speed, so consider all enabled alternatives.  */
>>>   if (! constrain_operands (1, get_enabled_alternatives (move_insn)))
>>> continue;
>>> 
>>> (but keeping the comment where it is).  The default behaviour would then
>>> be to drop any register that can't be zeroed easily.
>> 
>> I will check whether the above fixes the ICE on rs6000.

Yes, it fixes the ICE on rs6000.
However, now with this check, ONLY integer registers on rs6000 can be zeroed.  
All other call_used registers are excluded from ALL. 

This doesn’t look like the correct fix to me. 
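
The fallback policy under discussion (try to zero each register and drop the ones for which no valid zeroing move exists) amounts to a filter over the register set. The sketch below shows that shape outside of GCC: `zeroable_registers` is a hypothetical name, and `can_zero` stands in for the valid_insn_p check suggested in the quoted text.

```cpp
#include <cassert>
#include <functional>
#include <vector>

// Keeps only the register numbers for which `can_zero` reports that a
// zeroing move is available; all other registers are dropped from the set.
inline std::vector<unsigned>
zeroable_registers(const std::vector<unsigned>& regs,
                   const std::function<bool(unsigned)>& can_zero)
{
  assert(can_zero); // the predicate must be callable
  std::vector<unsigned> kept;
  for (unsigned regno : regs)
    if (can_zero(regno))
      kept.push_back(regno);
  return kept;
}
```

If the predicate only accepts integer registers, as observed on rs6000 above, every other call-used register silently falls out of the result, which is the behaviour being questioned here.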
>>> 
>>> Doing this would make the default hook usable for more targets.
>>> The question is whether dropping registers that can't be zeroed
>>> easily is acceptable as a default policy for a 

Re: libgfortran caf API help needed: Fixing fnspec strings in trans-decl

2020-09-30 Thread Andre Vehreschild

Hi Honza, Tobias,
Yes, I am willing to help and will do so as soon as my small vacation ends
on Monday.

Regards,
Andre

Andre Vehreschild * ve...@gmx.de
On 30 September 2020 at 19:12:48, Tobias Burnus wrote:


Hi Honza,

On 9/30/20 6:12 PM, Jan Hubicka wrote:

_gfortran_caf_co_sum (gfc_descriptor_t *a __attribute__ ((unused)),

Should have fnspec
  ".XXWXX"
First dot represents return value, then X is for unused parameters


'X' is definitely wrong. GCC itself only contains a stub implementation of
the library for gfortran's coarrays. The full version needs
a communication library such as MPI (Message Passing Interface),
GASNet or OpenSHMEM, so that library is separate. The main point
of the stub library is to provide some means for testing.

See http://www.opencoarrays.org/ and
https://github.com/sourceryinstitute/opencoarrays/
for a (or rather: the) version which actually implements those
library functions.


I would appreciate help from someone who knows the API to correct the
strings.


@Andre? How about you? ;-)


-  gfor_fndecl_sr_kind = gfc_build_library_function_decl_with_spec (
- get_identifier (PREFIX("selected_real_kind2008")), ".RR",
+  gfor_fndecl_sr_kind = gfc_build_library_function_decl (
+ get_identifier (PREFIX("selected_real_kind2008")),


(This one is outside CAF.)

@Honza: I want to note that also for user functions, 'fn spec' are
generated, cf. create_fn_spec in trans-types.c – hopefully this one is fine.

Tobias

-
Mentor Graphics (Deutschland) GmbH, Arnulfstraße 201, 80634 München / Germany
Registergericht München HRB 106955, Geschäftsführer: Thomas Heurung,
Alexander Walter




RE: [PATCH][GCC][ARM] Add support for Cortex-A78 and Cortex-A78AE

2020-09-30 Thread Przemyslaw Wirkus via Gcc-patches
> > Subject: [PATCH][GCC][ARM] Add support for Cortex-A78 and Cortex-A78AE
> >
> > This patch introduces support for Cortex-A78 [0] and Cortex-A78AE [1]
> > cpus.
> >
> > [0]: https://www.arm.com/products/silicon-ip-cpu/cortex-a/cortex-
> > a78
> > [1]: https://www.arm.com/products/silicon-ip-cpu/cortex-a/cortex-
> > a78ae
> >
> > OK for master branch ?

commit 60e4b3cade5c63f919df4ddc0f0d23261f968e13

> Ok.
> Thanks,
> Kyrill
> 
> >
> > kind regards
> > Przemyslaw Wirkus
> >
> > gcc/ChangeLog:
> >
> > * config/arm/arm-cpus.in: Add Cortex-A78 and Cortex-A78AE cores.
> > * config/arm/arm-tables.opt: Regenerate.
> > * config/arm/arm-tune.md: Regenerate.
> > * doc/invoke.texi: Update docs.


RE: [PATCH][GCC][AArch64] Add support for Cortex-A78 and Cortex-A78AE

2020-09-30 Thread Przemyslaw Wirkus via Gcc-patches
> > Subject: [PATCH][GCC][AArch64] Add support for Cortex-A78 and Cortex-
> > A78AE
> >
> > This patch introduces support for Cortex-A78 [0] and Cortex-A78AE [1]
> > cpus.
> >
> > [0]: https://www.arm.com/products/silicon-ip-cpu/cortex-a/cortex-a78
> > [1]:
> > https://www.arm.com/products/silicon-ip-cpu/cortex-a/cortex-a78ae
> >
> > OK for master branch ?

commit b6860cb96d038fe7519797adfb9c3c2e635234de

> Ok.
> Thanks,
> Kyrill
> 
> >
> > kind regards
> > Przemyslaw Wirkus
> >
> > gcc/ChangeLog:
> >
> > * config/aarch64/aarch64-cores.def: Add Cortex-A78 and Cortex-
> A78AE
> > cores.
> > * config/aarch64/aarch64-tune.md: Regenerate.
> > * doc/invoke.texi: Add -mtune=cortex-a78 and -mtune=cortex-a78ae.


[committed] libstdc++: Use __is_same instead of __is_same_as

2020-09-30 Thread Jonathan Wakely via Gcc-patches
PR 92271 added __is_same as another spelling of __is_same_as. Since
Clang also spells it __is_same, let's just use that consistently.

It appears that Intel icc sets __GNUC__ to 10, but only supports
__is_same_as. If we only use __is_same for __GNUC__ >= 11 then we won't
break icc again (it looks like we broke previous versions of icc when we
started using __is_same_as).
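
As an illustration of the general technique (use the built-in only when the compiler advertises it, otherwise fall back to a portable definition), here is a minimal sketch. It does not reproduce the logic in c++config, which tests __GNUC__ and __is_identifier. The macro `SKETCH_HAVE_BUILTIN_IS_SAME` and the trait `same_type` are hypothetical names, and the detection here uses `__has_builtin`, which is an assumption of this sketch.

```cpp
#include <cassert>

// Hypothetical feature macro, defined only if the compiler reports the
// __is_same built-in through __has_builtin.
#if defined(__has_builtin)
# if __has_builtin(__is_same)
#  define SKETCH_HAVE_BUILTIN_IS_SAME 1
# endif
#endif

#ifdef SKETCH_HAVE_BUILTIN_IS_SAME
// Built-in path: one primary template, no partial specialization needed.
template <typename T, typename U>
struct same_type { static constexpr bool value = __is_same(T, U); };
#else
// Portable fallback: primary template plus a partial specialization.
template <typename T, typename U>
struct same_type { static constexpr bool value = false; };
template <typename T>
struct same_type<T, T> { static constexpr bool value = true; };
#endif

inline void demo()
{
  assert((same_type<int, int>::value));
  assert(!(same_type<int, long>::value));
}
```

Either branch gives the same answers; the built-in path only saves the compiler the work of instantiating a partial specialization.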

libstdc++-v3/ChangeLog:

* include/bits/c++config (_GLIBCXX_HAVE_BUILTIN_IS_SAME):
Define for GCC 11 or when !__is_identifier(__is_same).
(_GLIBCXX_BUILTIN_IS_SAME_AS): Remove.
* include/std/type_traits (is_same, is_same_v): Replace uses
of _GLIBCXX_BUILTIN_IS_SAME_AS.

Tested powerpc64le-linux. Committed to trunk.

commit 73ae6eb572515ad627b575a7fbdfdd47a4368e1c
Author: Jonathan Wakely 
Date:   Wed Sep 30 18:24:48 2020

libstdc++: Use __is_same instead of __is_same_as

PR 92271 added __is_same as another spelling of __is_same_as. Since
Clang also spells it __is_same, let's just use that consistently.

It appears that Intel icc sets __GNUC__ to 10, but only supports
__is_same_as. If we only use __is_same for __GNUC__ >= 11 then we won't
break icc again (it looks like we broke previous versions of icc when we
started using __is_same_as).

libstdc++-v3/ChangeLog:

* include/bits/c++config (_GLIBCXX_HAVE_BUILTIN_IS_SAME):
Define for GCC 11 or when !__is_identifier(__is_same).
(_GLIBCXX_BUILTIN_IS_SAME_AS): Remove.
* include/std/type_traits (is_same, is_same_v): Replace uses
of _GLIBCXX_BUILTIN_IS_SAME_AS.

diff --git a/libstdc++-v3/include/bits/c++config b/libstdc++-v3/include/bits/c++config
index 860bf6dbcb3..2e6c880ad95 100644
--- a/libstdc++-v3/include/bits/c++config
+++ b/libstdc++-v3/include/bits/c++config
@@ -658,10 +658,12 @@ namespace std
 # define _GLIBCXX_HAVE_BUILTIN_HAS_UNIQ_OBJ_REP 1
 # define _GLIBCXX_HAVE_BUILTIN_IS_AGGREGATE 1
 # define _GLIBCXX_HAVE_BUILTIN_LAUNDER 1
-# define _GLIBCXX_BUILTIN_IS_SAME_AS(T, U) __is_same_as(T, U)
 # if __GNUC__ >= 9
 #  define _GLIBCXX_HAVE_BUILTIN_IS_CONSTANT_EVALUATED 1
 # endif
+# if __GNUC__ >= 11
+#  define _GLIBCXX_HAVE_BUILTIN_IS_SAME 1
+# endif
 #elif defined(__is_identifier) && defined(__has_builtin)
 // For non-GNU compilers:
 # if ! __is_identifier(__has_unique_object_representations)
@@ -677,7 +679,7 @@ namespace std
 #  define _GLIBCXX_HAVE_BUILTIN_IS_CONSTANT_EVALUATED 1
 # endif
 # if ! __is_identifier(__is_same)
-#  define _GLIBCXX_BUILTIN_IS_SAME_AS(T, U) __is_same(T, U)
+#  define _GLIBCXX_HAVE_BUILTIN_IS_SAME 1
 # endif
 #endif // GCC
 
diff --git a/libstdc++-v3/include/std/type_traits b/libstdc++-v3/include/std/type_traits
index b7bb63bbc74..9994c9ae3d7 100644
--- a/libstdc++-v3/include/std/type_traits
+++ b/libstdc++-v3/include/std/type_traits
@@ -1394,14 +1394,14 @@ _GLIBCXX_BEGIN_NAMESPACE_VERSION
   /// is_same
   template<typename _Tp, typename _Up>
     struct is_same
-#ifdef _GLIBCXX_BUILTIN_IS_SAME_AS
-    : public integral_constant<bool, _GLIBCXX_BUILTIN_IS_SAME_AS(_Tp, _Up)>
+#ifdef _GLIBCXX_HAVE_BUILTIN_IS_SAME
+    : public integral_constant<bool, __is_same(_Tp, _Up)>
 #else
     : public false_type
 #endif
     { };
 
-#ifndef _GLIBCXX_BUILTIN_IS_SAME_AS
+#ifndef _GLIBCXX_HAVE_BUILTIN_IS_SAME
   template<typename _Tp>
     struct is_same<_Tp, _Tp>
     : public true_type
@@ -3215,9 +3215,9 @@ template <typename _Tp>
   inline constexpr size_t rank_v = rank<_Tp>::value;
 template <typename _Tp, unsigned _Idx = 0>
   inline constexpr size_t extent_v = extent<_Tp, _Idx>::value;
-#ifdef _GLIBCXX_BUILTIN_IS_SAME_AS
+#ifdef _GLIBCXX_HAVE_BUILTIN_IS_SAME
 template <typename _Tp, typename _Up>
-  inline constexpr bool is_same_v = _GLIBCXX_BUILTIN_IS_SAME_AS(_Tp, _Up);
+  inline constexpr bool is_same_v = __is_same(_Tp, _Up);
 #else
 template <typename _Tp, typename _Up>
   inline constexpr bool is_same_v = std::is_same<_Tp, _Up>::value;


[PATCH] libstdc++: Make ranges::construct_at constexpr-friendly [PR95788]

2020-09-30 Thread Patrick Palka via Gcc-patches
This rewrites ranges::construct_at in terms of std::construct_at so
that we can piggy back on the compiler's existing support for
recognizing placement new within std::construct_at during constexpr
evaluation instead of having to additionally teach the compiler about
ranges::construct_at.

While we're here, we should also make ranges::construct_at conditionally
noexcept like std::construct_at.
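
The "conditionally noexcept" part can be sketched on its own. The names
below (demo_construct_at_fn, demo_construct_at, demo_may_throw) are
hypothetical, and the sketch deliberately uses plain placement new rather
than std::construct_at so that it does not need C++20. It therefore shows
only how the exception specification follows the constructor, not the
constexpr behaviour this patch is about.

```cpp
#include <new>
#include <utility>

// Hypothetical stand-in for the function object touched by the patch.
struct demo_construct_at_fn
{
  // The exception specification mirrors the construction expression:
  // the call is noexcept exactly when constructing T from Args is.
  template<typename T, typename... Args>
    T*
    operator()(T* location, Args&&... args) const
    noexcept(noexcept(::new (static_cast<void*>(nullptr))
                        T(std::declval<Args>()...)))
    {
      return ::new (static_cast<void*>(location))
        T(std::forward<Args>(args)...);
    }
};

constexpr demo_construct_at_fn demo_construct_at{};

// A type whose default constructor is not declared noexcept; it is only
// used in unevaluated operands, so no definition is needed.
struct demo_may_throw { demo_may_throw(); };
```

The two checks at the end of the new test in this patch have the same
shape: constructing an int must be noexcept, constructing a type with a
potentially-throwing constructor must not be.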

Tested on x86_64-pc-linux-gnu, does this look OK for trunk?

libstdc++-v3/ChangeLog:

PR libstdc++/95788
* include/bits/ranges_uninitialized.h:
(__construct_at_fn::operator()): Just call std::construct_at.
Declare it conditionally noexcept.
* testsuite/20_util/specialized_algorithms/construct_at/95788.cc:
New test.
---
 .../include/bits/ranges_uninitialized.h   |  6 +--
 .../construct_at/95788.cc | 40 +++
 2 files changed, 42 insertions(+), 4 deletions(-)
 create mode 100644 libstdc++-v3/testsuite/20_util/specialized_algorithms/construct_at/95788.cc

diff --git a/libstdc++-v3/include/bits/ranges_uninitialized.h b/libstdc++-v3/include/bits/ranges_uninitialized.h
index d758078fc03..def086508fb 100644
--- a/libstdc++-v3/include/bits/ranges_uninitialized.h
+++ b/libstdc++-v3/include/bits/ranges_uninitialized.h
@@ -496,10 +496,8 @@ namespace ranges
   requires requires { ::new (declval<void*>()) _Tp(declval<_Args>()...); }
   constexpr _Tp*
   operator()(_Tp* __location, _Args&&... __args) const
-  {
-   return ::new (__detail::__voidify(*__location))
-_Tp(std::forward<_Args>(__args)...);
-  }
+  noexcept(noexcept(std::construct_at(__location, declval<_Args>()...)))
+  { return std::construct_at(__location, std::forward<_Args>(__args)...); }
   };
 
   inline constexpr __construct_at_fn construct_at{};
diff --git a/libstdc++-v3/testsuite/20_util/specialized_algorithms/construct_at/95788.cc b/libstdc++-v3/testsuite/20_util/specialized_algorithms/construct_at/95788.cc
new file mode 100644
index 000..aeb04de1ea3
--- /dev/null
+++ b/libstdc++-v3/testsuite/20_util/specialized_algorithms/construct_at/95788.cc
@@ -0,0 +1,40 @@
+// Copyright (C) 2020 Free Software Foundation, Inc.
+//
+// This file is part of the GNU ISO C++ Library.  This library is free
+// software; you can redistribute it and/or modify it under the
+// terms of the GNU General Public License as published by the
+// Free Software Foundation; either version 3, or (at your option)
+// any later version.
+
+// This library is distributed in the hope that it will be useful,
+// but WITHOUT ANY WARRANTY; without even the implied warranty of
+// MERCHANTABILITY or FITNESS FOR A PARTICULAR PURPOSE.  See the
+// GNU General Public License for more details.
+
+// You should have received a copy of the GNU General Public License along
+// with this library; see the file COPYING3.  If not see
+// <http://www.gnu.org/licenses/>.
+
+// { dg-options "-std=gnu++2a" }
+// { dg-do compile { target c++2a } }
+
+#include <memory>
+
+constexpr bool test01()
+{
+  const int sz{1};
+  int* data{std::allocator<int>{}.allocate(sz)};
+  static_assert(noexcept(std::ranges::construct_at(data, 42)));
+  std::ranges::construct_at(data, 42);
+  if (*data != 42)
+    return false;
+  std::ranges::destroy_at(data);
+  std::allocator<int>{}.deallocate(data, sz);
+  return true;
+}
+
+static_assert(test01());
+
+struct S { S(); };
+S *p;
+static_assert(!noexcept(std::ranges::construct_at(p)));
-- 
2.28.0.651.g306ee63a70



Re: libgfortran caf API help needed: Fixing fnspec strings in trans-decl

2020-09-30 Thread Tobias Burnus

Hi Honza,

On 9/30/20 6:12 PM, Jan Hubicka wrote:

> _gfortran_caf_co_sum (gfc_descriptor_t *a __attribute__ ((unused)),
>
> Should have fnspec
>   ".XXWXX"
> First dot represents return value, then X is for unused parameters


'X' is definitely wrong. GCC itself only ships a stub implementation of
the library behind gfortran's coarray (Fortran) support. The full version
needs a communication library, such as MPI (Message Passing Interface),
GASNet or OpenSHMEM, and is therefore a separate library. The main point
of the stub library is to provide some means for testing, so parameters
that look unused in the stub are not unused in general.

See http://www.opencoarrays.org/ and
https://github.com/sourceryinstitute/opencoarrays/
for a (or rather: the) version which actually implements those
library functions.


> I would appreciate help from someone who knows the API to correct the
> strings.


@Andre? How about you? ;-)


> -  gfor_fndecl_sr_kind = gfc_build_library_function_decl_with_spec (
> - get_identifier (PREFIX("selected_real_kind2008")), ".RR",
> +  gfor_fndecl_sr_kind = gfc_build_library_function_decl (
> + get_identifier (PREFIX("selected_real_kind2008")),


(This one is outside CAF.)

@Honza: Note that 'fn spec' strings are also generated for user functions,
cf. create_fn_spec in trans-types.c; hopefully that one is fine.

Tobias

-
Mentor Graphics (Deutschland) GmbH, Arnulfstraße 201, 80634 München / Germany
Registergericht München HRB 106955, Geschäftsführer: Thomas Heurung, Alexander 
Walter


Re: [PATCH 6/6] ipa-cp: Separate and increase the large-unit parameter

2020-09-30 Thread Martin Jambor
Hi,

On Wed, Sep 30 2020, Richard Biener wrote:
> On Tue, Sep 29, 2020 at 9:31 PM Jan Hubicka  wrote:
>>
>> >
>> > gcc/ChangeLog:
>> >
>> > 2020-09-07  Martin Jambor  
>> >
>> >   * params.opt (ipa-cp-large-unit-insns): New parameter.
>> >   * ipa-cp.c (get_max_overall_size): Use the new parameter.
>> OK,
>
> Maybe the IPA CP large-unit should be a factor of the large-unit
> param?  Thus, make the new param ipa-cp-large-unit-factor
> instead so when people increase large-unit they also get "other"
> large units increased accordingly?

I do not have a very strong opinion about this but I think that having
two separate parameters will make it easier for us to experiment with
the passes and is probably easier to document and thus also easier for
users who want to play with this to understand.

On the other hand, having a single param to tune sensitivity of all IPA
towards sizes - or just what big size means - does not seem like such a
big advantage to me.

But I guess I could be persuaded otherwise.

Thanks,

Martin


Re: [PATCH] PR target/97250: i386: Add support for x86-64-v2, x86-64-v3, x86-64-v4 levels for x86-64

2020-09-30 Thread Jakub Jelinek via Gcc-patches
On Wed, Sep 30, 2020 at 06:06:31PM +0200, Florian Weimer wrote:
> This is what I came up with.  It is not valid to set ix86_arch to
> PROCESSOR_GENERIC, which is why PTA_NO_TUNE is still needed.

OK, LGTM, but I would prefer Uros to have the final say.

Jakub



Re: Another issue on RS6000 target. Re: One issue with default implementation of zero_call_used_regs

2020-09-30 Thread Richard Sandiford via Gcc-patches
Qing Zhao  writes:
>>> +  }
>>> +  return need_zeroed_hardregs;
>>> +}
>>> +
>>> 
>>> With the small testing case:
>>> int
>>> test ()
>>> {
>>>  return 1;
>>> }
>>> 
>>> If I compiled it with 
>>> 
>>> /home/qinzhao/Install/latest/bin/gcc -O2 -fzero-call-used-regs=all-arg t.c
>>> 
>>> It fails with:
>>> 
>>> t.c: In function ‘test’:
>>> t.c:6:1: error: insn does not satisfy its constraints:
>>>6 | }
>>>  | ^
>>> (insn 28 27 29 (set (reg:DI 33 1)
>>>(const_int 0 [0])) "t.c":6:1 647 {*movdi_internal64}
>>> (nil))
>>> during RTL pass: shorten
>>> dump file: t.c.319r.shorten
>>> t.c:6:1: internal compiler error: in extract_constrain_insn_cached, at 
>>> recog.c:2207
>>> 0x1018d693 _fatal_insn(char const*, rtx_def const*, char const*, int, char 
>>> const*)
>>> ../../latest-gcc-x86/gcc/rtl-error.c:108
>>> 0x1018d6e7 _fatal_insn_not_found(rtx_def const*, char const*, int, char 
>>> const*)
>>> ../../latest-gcc-x86/gcc/rtl-error.c:118
>>> 0x1099a82b extract_constrain_insn_cached(rtx_insn*)
>>> ../../latest-gcc-x86/gcc/recog.c:2207
>>> 0x11393917 insn_min_length(rtx_insn*)
>>> ../../latest-gcc-x86/gcc/config/rs6000/rs6000.md:721
>>> 0x105bece3 shorten_branches(rtx_insn*)
>>> ../../latest-gcc-x86/gcc/final.c:1118
>>> 
>>> 
>>> As far as I can tell, the failure above happens when the FP registers are zeroed.
>>> 
>>> I suspect that the issue is still related to the following statement:
>>> 
>>> machine_mode mode = reg_raw_mode[regno];
>>> 
>>> As far as I can tell, reg_raw_mode always returns the integer mode that can 
>>> be held by the hard register, even when it is an FP register.
>> 
>> Well, more precisely: it's the largest mode that the target allows the
>> registers to hold.  If there are multiple candidate modes of the same
>> size, the integer one wins, like you say.  But the point is that DI only
>> wins over DF because the target allows both DI and DF to be stored in
>> the register, and therefore supports both DI and DF moves for that
>> register.
>> 
>> So I don't think the mode is the issue.  Integer zero and floating-point
>> zero have the same bit representation after all.
>
> Theoretically, yes. 
> However, as we noticed on AArch64, the integer TI move was not 
> supported before your fix today. As a result, the TI move had to be split.
> With your fix today on AArch64, yes, the default implementation works well 
> for those vector registers. Thanks a lot.
>
> Potentially there will be other targets that have the same issue. Those 
> targets will then need to fix these issues too in order to make the default 
> implementation work.

Right.  But that's not a bad thing.

My point above was that what you describe was not the issue for Power.
AIUI the issue there was…

>> AIUI, without VSX, Power needs to load the zero from the constant pool.

…this instead.

>>> So I am still wondering:
>>> 
>>> 1. Is there another utility routine that returns the proper mode 
>>> for a hard register, one that can readily be used to zero it?
>>> 2. If not, should I add one more target hook for this purpose? i.e. 
>>> 
>>> /* Return the proper machine mode that can be used to zero the hard 
>>> register specified by REGNO.  */
>>> machine_mode zero_call_used_regs_mode (unsigned int REGNO)
>>> 
>>> 3. Or should I just delete the default implementation and let each target 
>>> implement it?
>> 
>> IMO no.  This goes back to what we discussed earlier.  It isn't the
>> case that a default target hook has to be correct for all targets,
>> with targets only overriding them as an optimisation.  The default
>> versions of many hooks and macros are not conservatively correct.
>> They are just reasonable default assumptions.  And IMO that's true
>> of the hook above too.
>> 
>> The way to flush out whether a target needs to override the hook
>> is to add tests that run on all targets.
> I plan to add these new test cases, so I have been running some 
> simple test cases on aarch64 and rs6000 to look for issues 
> with the default implementation. So far, I have found these issues with 
> very simple test cases.
>
> At most, I can test the aarch64 and rs6000 targets with some small test 
> cases to check correctness.

Thanks for testing other targets.  But I don't think that invalidates
what I said above.  It might be that some of the targets you pick to
test are ones that can't use the default implementation (or at least,
not in all circumstances).  At that point, hopefully target maintainers
will step in to help.

>> That said, one way of erring on the side of caution from an ICE
>> perspective would be to do something like:
>> 
>>rtx_insn *last_insn = get_last_insn ();
>>rtx zero = CONST0_RTX (reg_raw_mode[regno]);
>>rtx_insn *insn = emit_insn (gen_rtx_SET (regno_reg_rtx[regno], zero));
>>if (!valid_insn_p (insn))
>>  {
>>delete_insns_since (last_insn);
>>...remove regno from the set of cleared 

libgfortran caf API help needed: Fixing fnspec strings in trans-decl

2020-09-30 Thread Jan Hubicka
Hi,
this patch contains basic fixup of the fnspec strings for caf, however I
am quite sure I need help on this (short of dropping them all).

I first assumed that the "." for the return value was simply missing, since
most strings had as many letters as parameters, but that is not true.
I tried to check the strings against reality. For example:


void
_gfortran_caf_co_sum (gfc_descriptor_t *a __attribute__ ((unused)), 
  int result_image __attribute__ ((unused)),
  int *stat, char *errmsg __attribute__ ((unused)), 
  size_t errmsg_len __attribute__ ((unused)))   
{   
  if (stat) 
*stat = 0;  
}   

Should have fnspec
 ".XXWXX"
First dot represents return value, then X is for unused parameters and W
is for stat pointer we write into.

However, I am not sure why the pointers are part of the API. If they are
meant to be used later, we need to specify them so things remain ABI
compatible.

It is declared as:
get_identifier (PREFIX("caf_co_sum")), "W.WW",
which correctly specifies stat as W, but I am not sure what the rest is
meant to say.

I would appreciate help from someone who knows the API to correct the
strings.  Basically all strings starting with "W" or "R" are wrong since
they miss the return value specifier.
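
The layout assumed in this mail (one leading character for the return
value, then one character per parameter) can be written down as a small
check. The helpers below are hypothetical and exist only for illustration;
they verify the shape of a string, not whether the letters are the right
ones for a given function.

```cpp
// Hypothetical helper: length of a string, usable in constant expressions.
constexpr unsigned
demo_strlen (const char *s)
{
  return *s ? 1u + demo_strlen (s + 1) : 0u;
}

// Hypothetical helper: true if SPEC has one character for the return
// value followed by exactly one character per parameter.
constexpr bool
demo_fnspec_shape_ok (const char *spec, unsigned nparams)
{
  return demo_strlen (spec) == nparams + 1;
}

// Hypothetical helper: the character describing parameter IDX (0-based),
// skipping the leading return-value character.
constexpr char
demo_fnspec_param (const char *spec, unsigned idx)
{
  return spec[idx + 1];
}
```

With the five-parameter caf_co_sum above, ".XXWXX" has the expected shape
and its third parameter (stat) maps to 'W', while the declared "W.WW" is
too short under this reading.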

An alternative would be to simply drop all of those if we are unsure
what they do, but it seems lame.

Honza

diff --git a/gcc/fortran/trans-decl.c b/gcc/fortran/trans-decl.c
index 2be9df40d2c..59ea891915e 100644
--- a/gcc/fortran/trans-decl.c
+++ b/gcc/fortran/trans-decl.c
@@ -3514,8 +3514,8 @@ gfc_build_intrinsic_function_decls (void)
   DECL_PURE_P (gfor_fndecl_si_kind) = 1;
   TREE_NOTHROW (gfor_fndecl_si_kind) = 1;
 
-  gfor_fndecl_sr_kind = gfc_build_library_function_decl_with_spec (
-   get_identifier (PREFIX("selected_real_kind2008")), ".RR",
+  gfor_fndecl_sr_kind = gfc_build_library_function_decl (
+   get_identifier (PREFIX("selected_real_kind2008")),
gfc_int4_type_node, 3, pvoid_type_node, pvoid_type_node,
pvoid_type_node);
   DECL_PURE_P (gfor_fndecl_sr_kind) = 1;
@@ -3841,50 +3841,50 @@ gfc_build_builtin_function_decls (void)
get_identifier (PREFIX("caf_num_images")), integer_type_node,
2, integer_type_node, integer_type_node);
 
-  gfor_fndecl_caf_register = gfc_build_library_function_decl_with_spec (
-   get_identifier (PREFIX("caf_register")), "RRR", void_type_node, 7,
+  gfor_fndecl_caf_register = gfc_build_library_function_decl (
+   get_identifier (PREFIX("caf_register")), void_type_node, 7,
size_type_node, integer_type_node, ppvoid_type_node, pvoid_type_node,
pint_type, pchar_type_node, size_type_node);
 
   gfor_fndecl_caf_deregister = gfc_build_library_function_decl_with_spec (
-   get_identifier (PREFIX("caf_deregister")), "WRWWR", void_type_node, 5,
+   get_identifier (PREFIX("caf_deregister")), ".W.WW.", void_type_node, 5,
ppvoid_type_node, integer_type_node, pint_type, pchar_type_node,
size_type_node);
 
-  gfor_fndecl_caf_get = gfc_build_library_function_decl_with_spec (
-   get_identifier (PREFIX("caf_get")), ".R.RRWRRRW", void_type_node, 10,
+  gfor_fndecl_caf_get = gfc_build_library_function_decl (
+   get_identifier (PREFIX("caf_get")), void_type_node, 10,
pvoid_type_node, size_type_node, integer_type_node, pvoid_type_node,
pvoid_type_node, pvoid_type_node, integer_type_node, integer_type_node,
boolean_type_node, pint_type);
 
-  gfor_fndecl_caf_send = gfc_build_library_function_decl_with_spec (
-   get_identifier (PREFIX("caf_send")), ".R.RRWR", void_type_node, 11,
+  gfor_fndecl_caf_send = gfc_build_library_function_decl (
+   get_identifier (PREFIX("caf_send")), void_type_node, 11,
pvoid_type_node, size_type_node, integer_type_node, pvoid_type_node,
pvoid_type_node, pvoid_type_node, integer_type_node, integer_type_node,
boolean_type_node, pint_type, pvoid_type_node);
 
-  gfor_fndecl_caf_sendget = gfc_build_library_function_decl_with_spec (
-   get_identifier (PREFIX("caf_sendget")), ".R..RR",
+  gfor_fndecl_caf_sendget = gfc_build_library_function_decl (
+   get_identifier (PREFIX("caf_sendget")),
void_type_node, 14, pvoid_type_node, size_type_node, integer_type_node,
pvoid_type_node, pvoid_type_node, pvoid_type_node, size_type_node,
integer_type_node, pvoid_type_node, pvoid_type_node, integer_type_node,
integer_type_node, boolean_type_node, integer_type_node);
 
-  

Re: [PATCH] PR target/97250: i386: Add support for x86-64-v2, x86-64-v3, x86-64-v4 levels for x86-64

2020-09-30 Thread Florian Weimer
* Jakub Jelinek:

> On Wed, Sep 30, 2020 at 04:29:34PM +0200, Florian Weimer wrote:
>> > Thinking about it more, wouldn't it better to just imply generic tuning
>> > for these -march= options?
>> 
>> I think this is what the patch does?  See the x86-64-v3-haswell.c
>> test.
>
> No, I think it will have that behavior solely when the compiler has been
> configured to default to -mtune=generic.
> What I'm suggesting is to not ignore the tuning like you do for PTA_NO_TUNE,
> but instead perhaps use PROCESSOR_GENERIC and special case it in the code
> so that ix86_arch will be set to PROCESSOR_K8 in that case and only
> ix86_tune will be PROCESSOR_GENERIC.

This is what I came up with.  It is not valid to set ix86_arch to
PROCESSOR_GENERIC, which is why PTA_NO_TUNE is still needed.
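
The intended effect of such a flag can be sketched with a toy alias table.
Everything below is hypothetical (demo_processor, DEMO_PTA_NO_TUNE,
demo_alias, demo_apply_march) and far simpler than the real option
handling: an entry always sets the architecture, but only overrides the
tuning when it does not carry the no-tune flag.

```cpp
// Hypothetical, simplified stand-ins for the real processor enum and flags.
enum demo_processor { DEMO_GENERIC, DEMO_K8, DEMO_HASWELL };

constexpr unsigned DEMO_PTA_NO_TUNE = 1u << 0;

struct demo_alias
{
  const char *name;
  demo_processor processor;
  unsigned flags;
};

struct demo_selection
{
  demo_processor arch;
  demo_processor tune;
};

// Apply an alias-table entry: the architecture always comes from the
// entry; the tuning follows the architecture unless the entry asks for
// the current tuning to be left alone.
constexpr demo_selection
demo_apply_march (const demo_alias &entry, demo_processor current_tune)
{
  return demo_selection{ entry.processor,
                         (entry.flags & DEMO_PTA_NO_TUNE) != 0
                           ? current_tune
                           : entry.processor };
}
```

In this toy model a level such as "x86-64-v3" can use a K8-like entry for
the architecture while an existing generic or Haswell tuning stays in
place.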

8<--8<
These micro-architecture levels are defined in the x86-64 psABI:

https://gitlab.com/x86-psABIs/x86-64-ABI/-/commit/77566eb03bc6a326811cb7e9

PTA_NO_TUNE is introduced so that the new processor alias table entries
do not affect the CPU tuning setting in ix86_tune.

The tests depend on the macros added in commit 92e652d8c21bd7e66cbb0f900
("i386: Define __LAHF_SAHF__ and __MOVBE__ macros, based on ISA flags").

gcc/:
PR target/97250
* config/i386/i386.h (PTA_NO_TUNE, PTA_X86_64_BASELINE)
(PTA_X86_64_V2, PTA_X86_64_V3, PTA_X86_64_V4): New.
* common/config/i386/i386-common.c (processor_alias_table):
Add "x86-64-v2", "x86-64-v3", "x86-64-v4".
* config/i386/i386-options.c (ix86_option_override_internal):
Handle new PTA_NO_TUNE processor table entries.
* doc/invoke.texi (x86 Options): Document new -march values.

gcc/testsuite/:
PR target/97250
* gcc.target/i386/x86-64-v2.c: New test.
* gcc.target/i386/x86-64-v3.c: New test.
* gcc.target/i386/x86-64-v3-haswell.c: New test.
* gcc.target/i386/x86-64-v3-skylake.c: New test.
* gcc.target/i386/x86-64-v4.c: New test.

---
 gcc/common/config/i386/i386-common.c  |  10 +-
 gcc/config/i386/i386-options.c|  29 +-
 gcc/config/i386/i386.h|  11 +-
 gcc/doc/invoke.texi   |  15 ++-
 gcc/testsuite/gcc.target/i386/x86-64-v2.c | 116 ++
 gcc/testsuite/gcc.target/i386/x86-64-v3-haswell.c |  18 
 gcc/testsuite/gcc.target/i386/x86-64-v3-skylake.c |  21 
 gcc/testsuite/gcc.target/i386/x86-64-v3.c | 116 ++
 gcc/testsuite/gcc.target/i386/x86-64-v4.c | 116 ++
 9 files changed, 442 insertions(+), 10 deletions(-)

diff --git a/gcc/common/config/i386/i386-common.c b/gcc/common/config/i386/i386-common.c
index 10142149115..62a620b4430 100644
--- a/gcc/common/config/i386/i386-common.c
+++ b/gcc/common/config/i386/i386-common.c
@@ -1795,9 +1795,13 @@ const pta processor_alias_table[] =
 PTA_MMX | PTA_3DNOW | PTA_3DNOW_A | PTA_SSE | PTA_FXSR, 0, P_NONE},
   {"athlon-mp", PROCESSOR_ATHLON, CPU_ATHLON,
 PTA_MMX | PTA_3DNOW | PTA_3DNOW_A | PTA_SSE | PTA_FXSR, 0, P_NONE},
-  {"x86-64", PROCESSOR_K8, CPU_K8,
-PTA_64BIT | PTA_MMX | PTA_SSE | PTA_SSE2 | PTA_NO_SAHF | PTA_FXSR,
-0, P_NONE},
+  {"x86-64", PROCESSOR_K8, CPU_K8, PTA_X86_64_BASELINE, 0, P_NONE},
+  {"x86-64-v2", PROCESSOR_K8, CPU_GENERIC, PTA_X86_64_V2 | PTA_NO_TUNE,
+   0, P_NONE},
+  {"x86-64-v3", PROCESSOR_K8, CPU_GENERIC, PTA_X86_64_V3 | PTA_NO_TUNE,
+   0, P_NONE},
+  {"x86-64-v4", PROCESSOR_K8, CPU_GENERIC, PTA_X86_64_V4 | PTA_NO_TUNE,
+   0, P_NONE},
   {"eden-x2", PROCESSOR_K8, CPU_K8,
 PTA_64BIT | PTA_MMX | PTA_SSE | PTA_SSE2 | PTA_SSE3 | PTA_FXSR,
 0, P_NONE},
diff --git a/gcc/config/i386/i386-options.c b/gcc/config/i386/i386-options.c
index 597de533fbd..a59bd703880 100644
--- a/gcc/config/i386/i386-options.c
+++ b/gcc/config/i386/i386-options.c
@@ -2058,10 +2058,27 @@ ix86_option_override_internal (bool main_args_p,
return false;
  }
 
+   /* The feature-only micro-architecture levels that use
+  PTA_NO_TUNE are only defined for the x86-64 psABI.  */
+   if ((processor_alias_table[i].flags & PTA_NO_TUNE) != 0
+   && (!TARGET_64BIT_P (opts->x_ix86_isa_flags)
+   || opts->x_ix86_abi != SYSV_ABI))
+ {
+   error (G_("%<%s%> architecture level is only defined"
+ " for the x86-64 psABI"), opts->x_ix86_arch_string);
+   return false;
+ }
+
ix86_schedule = processor_alias_table[i].schedule;
ix86_arch = processor_alias_table[i].processor;
-   /* Default cpu tuning to the architecture.  */
-   ix86_tune = ix86_arch;
+
+   /* Default cpu tuning to the architecture, unless the table
+  entry requests not to do this.  Used by the x86-64 psABI
+  micro-architecture levels.  */
+   if ((processor_alias_table[i].flags & PTA_NO_TUNE) == 

Re: [PATCH] tree-optimization/97151 - improve PTA for C++ operator delete

2020-09-30 Thread Jason Merrill via Gcc-patches

On 9/28/20 3:09 PM, Jason Merrill wrote:

On 9/28/20 3:56 AM, Richard Biener wrote:

On Fri, 25 Sep 2020, Jason Merrill wrote:


On 9/25/20 2:30 AM, Richard Biener wrote:

On Thu, 24 Sep 2020, Jason Merrill wrote:


On 9/24/20 3:43 AM, Richard Biener wrote:

On Wed, 23 Sep 2020, Jason Merrill wrote:


On 9/23/20 2:42 PM, Richard Biener wrote:

On September 23, 2020 7:53:18 PM GMT+02:00, Jason Merrill wrote:

On 9/23/20 4:14 AM, Richard Biener wrote:

C++ operator delete, when DECL_IS_REPLACEABLE_OPERATOR_DELETE_P,
does not cause the deleted object to be escaped.  It also has no
other interesting side-effects for PTA so skip it like we do
for BUILT_IN_FREE.


Hmm, this is true of the default implementation, but since the function
is replaceable, we don't know what a user definition might do with the
pointer.


But can the object still be 'used' after delete? Can delete fail /
throw?

What guarantee does the predicate give us?


The deallocation function is called as part of a delete expression in
order to release the storage for an object, ending its lifetime (if it
was not ended by a destructor), so no, the object can't be used afterward.


OK, but the delete operator can access the object contents if there
wasn't a destructor ...



A deallocation function that throws has undefined behavior.


OK, so it seems the 'replaceable' operators are the global ones
(for user-defined/class-specific placement variants I see arbitrary
extra arguments that we'd possibly need to handle).

I'm happy to revert but I'd like to have a testcase that FAILs
with the patch ;)

Now, the following aborts:

struct X {
  static struct X saved;
  int *p;
  X() { __builtin_memcpy (this, &saved, sizeof (X)); }
};
void operator delete (void *p)
{
  __builtin_memcpy (&X::saved, p, sizeof (X));
}
int main()
{
  int y = 1;
  X *p = new X;
  p->p = &y;
  delete p;
  X *q = new X;
  *(q->p) = 2;
  if (y != 2)
    __builtin_abort ();
}

and I could fix this by not making *p but what *p points to escape.
The testcase is of course maximally awkward, but hey ... ;)

Now this would all be moot if operator delete may not access
the object (or if the object contents are undefined at that point).

Oh, and the testcase segfaults when compiled with GCC 10 because
there we elide the new X / delete p pair ... which is invalid then?
Hmm, we emit

 MEM[(struct X *)_8] ={v} {CLOBBER};
 operator delete (_8, 8);

so the object contents are undefined _before_ calling delete
even when I do not have a DTOR?  That is, the above,
w/o -fno-lifetime-dse, makes the PTA patch OK for the testcase.


Yes, all classes have a destructor, even if it's trivial, so the object's
lifetime definitely ends before the call to operator delete. This is less
clear for scalar objects, but treating them similarly would be consistent
with other recent changes, so I think it's fine for us to assume that scalar
objects are also invalidated before the call to operator delete.  But of
course this doesn't apply to explicit calls to operator delete outside of a
delete expression.


OK, so change the testcase main slightly to

int main()
{
    int y = 1;
    X *p = new X;
    p->p = &y;
    ::operator delete(p);
    X *q = new X;
    *(q->p) = 2;
    if (y != 2)
      __builtin_abort ();
}

in this case the lifetime of *p does not end before calling
::operator delete() and delete can stash the object contents
somewhere before ending its lifetime.  For the very same reason
we may not elide a new/delete pair like in

int main()
{
    int *p = new int;
    *p = 1;
    ::operator delete (p);
}


Correct; the permission to elide new/delete pairs is for the expressions,
not the functions.


which we before the change did not do only because calling
operator delete made p escape.  Unfortunately points-to analysis
cannot really reconstruct whether delete was called as part of
a delete expression or directly (and thus whether object lifetime
ended already), neither can DCE.  So I guess we need to mark
the operator delete call in some way to make those transforms
safe.  At least currently any operator delete call makes the
alias guarantee of a operator new call moot by forcing the object
to be aliased with all global and escaped memory ...

Looks like there are some unallocated flags for CALL_EXPR we could
pick but I wonder if we can recycle protected_flag which is

 CALL_FROM_THUNK_P and
 CALL_ALLOCA_FOR_VAR_P in
 CALL_EXPR

for calls to DECL_IS_OPERATOR_{NEW,DELETE}_P, thus whether
we have CALL_FROM_THUNK_P for those operators.  Guess picking
a new flag is safer.


We won't ever call those operators from a thunk, so it should be OK to reuse
it.


But, does it seem correct that we need to distinguish
delete expressions from plain calls to operator delete?


A reason for that distinction came up in the context of omitting new/delete
pairs: we want to consider the operator first called by the new or delete

[PATCH] Add a testcase for PR target/96827

2020-09-30 Thread H.J. Lu via Gcc-patches
Add a testcase for PR target/96827 which was fixed by r11-3559:

commit 97b798d80baf945ea28236eef3fa69f36626b579
Author: Joel Hutton 
Date:   Wed Sep 30 15:08:13 2020 +0100

[SLP][VECT] Add check to fix 96837

PR target/96827
* gcc.target/i386/pr96827.c: New test.
---
 gcc/testsuite/gcc.target/i386/pr96827.c | 41 +
 1 file changed, 41 insertions(+)
 create mode 100644 gcc/testsuite/gcc.target/i386/pr96827.c

diff --git a/gcc/testsuite/gcc.target/i386/pr96827.c b/gcc/testsuite/gcc.target/i386/pr96827.c
new file mode 100644
index 000..309e9e8947d
--- /dev/null
+++ b/gcc/testsuite/gcc.target/i386/pr96827.c
@@ -0,0 +1,41 @@
+/* { dg-do run { target sse2_runtime } } */
+/* { dg-options "-O3 -msse2 -mfpmath=sse" } */
+
+typedef unsigned short int __uint16_t;
+typedef unsigned int __uint32_t;
+typedef __uint16_t uint16_t;
+typedef __uint32_t uint32_t;
+typedef int __v4si __attribute__ ((__vector_size__ (16)));
+typedef long long __m128i __attribute__ ((__vector_size__ (16), __may_alias__));
+extern __inline void __attribute__((__gnu_inline__, __always_inline__, __artificial__))
+_mm_store_si128 (__m128i *__P, __m128i __B)
+{
+  *__P = __B;
+}
+extern __inline __m128i __attribute__((__gnu_inline__, __always_inline__, __artificial__))
+_mm_set_epi32 (int __q3, int __q2, int __q1, int __q0)
+{
+  return __extension__ (__m128i)(__v4si){ __q0, __q1, __q2, __q3 };
+}
+typedef uint16_t u16;
+typedef uint32_t u32;
+extern int printf (const char *__restrict __format, ...);
+void do_the_thing(u32 idx, __m128i *dude)
+{
+ u32 dude_[4] = { idx+0, idx+2, idx+4, idx+6 };
+ for (u32 i = 0; i < 3; ++i)
+  if (dude_[i] == 1234)
+   dude_[i]--;
+ *dude = _mm_set_epi32(dude_[0], dude_[1], dude_[2], dude_[3]);
+}
+int main()
+{
+ __m128i dude;
+ u32 idx = 0;
+ do_the_thing(idx, &dude);
+ __attribute__((aligned(16))) u32 dude_[4];
+ _mm_store_si128((__m128i*)dude_, dude);
+ if (!(6 == dude_[0] && 4 == dude_[1] && 2 == dude_[2] && 0 == dude_[3]))
+   __builtin_abort ();
+ return 0;
+}
-- 
2.26.2



Re: [PING][PATCH] aarch64: Don't generate invalid zero/sign-extend syntax

2020-09-30 Thread Christophe Lyon via Gcc-patches
On Wed, 30 Sep 2020 at 16:03, Alex Coplan  wrote:
>
> Ping. Are these testsuite fixes for ILP32 OK?
>
LGTM, by looking at the patch (I didn't run it in ilp32 mode)

Thanks
Christophe


> On 18/09/2020 17:15, Alex Coplan wrote:
> > Hi Christophe,
> >
> > On 08/09/2020 10:14, Christophe Lyon wrote:
> > > On Mon, 17 Aug 2020 at 11:00, Alex Coplan  wrote:
> > > >
> > > > gcc/ChangeLog:
> > > >
> > > > * config/aarch64/aarch64.md
> > > > (*adds__): Ensure extended operand
> > > > agrees with width of extension specifier.
> > > > (*subs__): Likewise.
> > > > (*adds__shift_): Likewise.
> > > > (*subs__shift_): Likewise.
> > > > (*add__): Likewise.
> > > > (*add__shft_): Likewise.
> > > > (*add_uxt_shift2): Likewise.
> > > > (*sub__): Likewise.
> > > > (*sub__shft_): Likewise.
> > > > (*sub_uxt_shift2): Likewise.
> > > > (*cmp_swp__reg): Likewise.
> > > > (*cmp_swp__shft_): Likewise.
> > > >
> > > >
> > > > gcc/testsuite/ChangeLog:
> > > >
> > > > * gcc.target/aarch64/adds3.c: Fix test w.r.t. new syntax.
> > > > * gcc.target/aarch64/cmp.c: Likewise.
> > > > * gcc.target/aarch64/subs3.c: Likewise.
> > > > * gcc.target/aarch64/subsp.c: Likewise.
> > > > * gcc.target/aarch64/extend-syntax.c: New test.
> > > >
> > >
> > > Hi,
> > >
> > > I've noticed some of the new tests fail with -mabi=ilp32:
> > > gcc.target/aarch64/extend-syntax.c check-function-bodies add1
> > > gcc.target/aarch64/extend-syntax.c check-function-bodies add3
> > > gcc.target/aarch64/extend-syntax.c check-function-bodies sub2
> > > gcc.target/aarch64/extend-syntax.c check-function-bodies sub3
> > > gcc.target/aarch64/extend-syntax.c scan-assembler-times
> > > subs\tx[0-9]+, x[0-9]+, w[0-9]+, sxtw 3 1
> > > gcc.target/aarch64/subsp.c scan-assembler sub\tsp, sp, w[0-9]*, sxtw 
> > > 4\n
> > >
> > > Christophe
> >
> > AFAICT the second scan-assembler in that subsp test failed on ILP32
> > before my commit. This is because we generate slightly suboptimal code
> > here. On LP64 with -O, we get:
> >
> > f2:
> > stp x29, x30, [sp, -16]!
> > mov x29, sp
> > add w1, w1, 1
> > sub sp, sp, x1, sxtw 4
> > mov x0, sp
> > bl  foo
> > mov sp, x29
> > ldp x29, x30, [sp], 16
> > ret
> >
> > On ILP32, we get:
> >
> > f2:
> > stp x29, x30, [sp, -16]!
> > mov x29, sp
> > add w1, w1, 1
> > lsl w1, w1, 4
> > sub sp, sp, x1
> > mov w0, wsp
> > bl  foo
> > mov sp, x29
> > ldp x29, x30, [sp], 16
> > ret
> >
> > And we see similar results all the way back to GCC 6. So AFAICT this
> > scan-assembler has never worked. The attached patch disables it on ILP32
> > since this isn't a code quality regression.
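
The difference between the two sequences above can be modelled in plain C++. This is only a sketch of the arithmetic, not compiler output: the function names `sub_sxtw4` and `sub_lsl_then_sub` are hypothetical, and the model assumes the architectural rule that a write to a W register zeroes the upper half of the X register.

```cpp
#include <cassert>
#include <cstdint>

// LP64 sequence: "sub sp, sp, x1, sxtw 4" sign-extends the low 32 bits of
// the register, shifts the result left by 4, and subtracts it from sp.
std::uint64_t sub_sxtw4(std::uint64_t sp, std::uint32_t w1) {
  std::int64_t extended = static_cast<std::int32_t>(w1);    // sxtw
  return sp - (static_cast<std::uint64_t>(extended) << 4);  // shift, subtract
}

// ILP32 sequence: "lsl w1, w1, 4" shifts in 32 bits (upper half of x1 becomes
// zero), then "sub sp, sp, x1" subtracts the whole 64-bit register.
std::uint64_t sub_lsl_then_sub(std::uint64_t sp, std::uint32_t w1) {
  std::uint32_t shifted = w1 << 4;  // 32-bit shift, wraps modulo 2^32
  return sp - static_cast<std::uint64_t>(shifted);
}
```

Under these assumptions the two functions agree in the low 32 bits of the result for every input, which is the part an ILP32 pointer uses; the ILP32 form simply needs one more instruction.
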
> >
> > This patch also fixes up the DejaGnu directives in extend-syntax.c to
> > work on ILP32: we change the check-function-bodies directive to only run
> > on LP64, adding scan-assembler directives for ILP32 where required.
> >
> > OK for trunk?
> >
> > Thanks,
> > Alex
>
> > diff --git a/gcc/testsuite/gcc.target/aarch64/extend-syntax.c 
> > b/gcc/testsuite/gcc.target/aarch64/extend-syntax.c
> > index 23fa9f4ffc5..1bfcdb59dde 100644
> > --- a/gcc/testsuite/gcc.target/aarch64/extend-syntax.c
> > +++ b/gcc/testsuite/gcc.target/aarch64/extend-syntax.c
> > @@ -20,6 +20,7 @@ unsigned long long *add1(unsigned long long *p, unsigned 
> > x)
> >  */
> >  unsigned long long add2(unsigned long long x, unsigned y)
> >  {
> > +  /* { dg-final { scan-assembler-times "add\tx0, x0, w1, uxtw" 1 { target 
> > ilp32 } } } */
> >return x + y;
> >  }
> >
> > @@ -34,6 +35,9 @@ double *add3(double *p, int x)
> >return p + x;
> >  }
> >
> > +// add1 and add3 should both generate this on ILP32:
> > +/* { dg-final { scan-assembler-times "add\tw0, w0, w1, lsl 3" 2 { target 
> > ilp32 } } } */
> > +
> >  // Hits *sub_zero_extendsi_di (*sub__).
> >  /*
> >  ** sub1:
> > @@ -42,6 +46,7 @@ double *add3(double *p, int x)
> >  */
> >  unsigned long long sub1(unsigned long long x, unsigned n)
> >  {
> > +/* { dg-final { scan-assembler-times "sub\tx0, x0, w1, uxtw" 1 { 
> > target ilp32 } } } */
> >  return x - n;
> >  }
> >
> > @@ -67,6 +72,9 @@ double *sub3(double *p, int n)
> >return p - n;
> >  }
> >
> > +// sub2 and sub3 should both generate this on ILP32:
> > +/* { dg-final { scan-assembler-times "sub\tw0, w0, w1, lsl 3" 2 { target 
> > ilp32 } } } */
> > +
> >  // Hits *adds_zero_extendsi_di (*adds__).
> >  int adds1(unsigned long long x, unsigned y)
> >  {
> > @@ -97,7 +105,8 @@ int subs1(unsigned long long x, unsigned y)
> >  unsigned long long *w;
> >  int subs2(unsigned long long *x, int y)
> >  {
> > -  /* { dg-final { scan-assembler-times "subs\tx\[0-9\]+, x\[0-9\]+, 
> > w\[0-9\]+, 

Re: [committed] libstdc++: Use __libc_single_threaded to optimise atomics [PR 96817]

2020-09-30 Thread Jonathan Wakely via Gcc-patches

On 29/09/20 13:51 +0200, Christophe Lyon via Libstdc++ wrote:

On Sat, 26 Sep 2020 at 21:42, Jonathan Wakely via Gcc-patches
 wrote:


Glibc 2.32 adds a global variable that says whether the process is
single-threaded. We can use this to decide whether to elide atomic
operations, as a more precise and reliable indicator than
__gthread_active_p.

This means that guard variables for statics and reference counting in
shared_ptr can use less expensive, non-atomic ops even in processes that
are linked to libpthread, as long as no threads have been created yet.
It also means that we switch to using atomics if libpthread gets loaded
later via dlopen (this still isn't supported in general, for other
reasons).

We can't use __libc_single_threaded to replace __gthread_active_p
everywhere. If we replaced the uses of __gthread_active_p in std::mutex
then we would elide the pthread_mutex_lock in the code below, but not
the pthread_mutex_unlock:

  std::mutex m;
  m.lock();// pthread_mutex_lock
  std::thread t([]{}); // __libc_single_threaded = false
  t.join();
  m.unlock();  // pthread_mutex_unlock

We need the lock and unlock to use the same "is threading enabled"
predicate, and similarly for init/destroy pairs for mutexes and
condition variables, so that we don't try to release resources that were
never acquired.
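
The mismatch can be modelled without any real threads. What follows is a minimal sketch, not libstdc++ code: `single_threaded` is a hypothetical stand-in for `__libc_single_threaded`, and `ToyMutex` and `unbalanced_sequence` are names invented for illustration.

```cpp
#include <cassert>

// Hypothetical stand-in for glibc's __libc_single_threaded flag.
static bool single_threaded = true;

// Toy mutex that only counts the calls that would reach pthread.
struct ToyMutex {
  int balance = 0;  // real locks minus real unlocks
  void lock()   { if (!single_threaded) ++balance; }  // elided while single-threaded
  void unlock() { if (!single_threaded) --balance; }  // performed once a thread exists
};

// Replays the lock / create thread / unlock sequence from the example above
// and returns the resulting lock balance.
int unbalanced_sequence() {
  single_threaded = true;
  ToyMutex m;
  m.lock();                 // skipped: the process is still single-threaded
  single_threaded = false;  // models std::thread being created
  m.unlock();               // performed: an unlock with no matching lock
  return m.balance;
}
```

In this model `unbalanced_sequence()` ends with a balance of -1, that is, one unlock that was never paired with a lock. A predicate that cannot change between the two calls avoids this.
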

There are other places that could use __libc_single_threaded, such as
_Sp_locker in src/c++11/shared_ptr.cc and locale init functions, but
they can be changed later.

libstdc++-v3/ChangeLog:

PR libstdc++/96817
* include/ext/atomicity.h (__gnu_cxx::__is_single_threaded()):
New function wrapping __libc_single_threaded if available.
(__exchange_and_add_dispatch, __atomic_add_dispatch): Use it.
* libsupc++/guard.cc (__cxa_guard_acquire, __cxa_guard_abort)
(__cxa_guard_release): Likewise.
* testsuite/18_support/96817.cc: New test.

Tested powerpc64le-linux, with glibc 2.31 and 2.32. Committed to trunk.


Hi,

This patch introduced regressions on armeb-linux-gnueabihf:
--target armeb-none-linux-gnueabihf --with-cpu cortex-a9
   g++.dg/compat/init/init-ref2 cp_compat_x_tst.o-cp_compat_y_tst.o execute
   g++.dg/cpp2a/decomp1.C  -std=gnu++14 execution test
   g++.dg/cpp2a/decomp1.C  -std=gnu++17 execution test
   g++.dg/cpp2a/decomp1.C  -std=gnu++2a execution test
   g++.dg/init/init-ref2.C  -std=c++14 execution test
   g++.dg/init/init-ref2.C  -std=c++17 execution test
   g++.dg/init/init-ref2.C  -std=c++2a execution test
   g++.dg/init/init-ref2.C  -std=c++98 execution test
   g++.dg/init/ref15.C  -std=c++14 execution test
   g++.dg/init/ref15.C  -std=c++17 execution test
   g++.dg/init/ref15.C  -std=c++2a execution test
   g++.dg/init/ref15.C  -std=c++98 execution test
   g++.old-deja/g++.jason/pmf7.C  -std=c++98 execution test
   g++.old-deja/g++.mike/leak1.C  -std=c++14 execution test
   g++.old-deja/g++.mike/leak1.C  -std=c++17 execution test
   g++.old-deja/g++.mike/leak1.C  -std=c++2a execution test
   g++.old-deja/g++.mike/leak1.C  -std=c++98 execution test
   g++.old-deja/g++.other/init19.C  -std=c++14 execution test
   g++.old-deja/g++.other/init19.C  -std=c++17 execution test
   g++.old-deja/g++.other/init19.C  -std=c++2a execution test
   g++.old-deja/g++.other/init19.C  -std=c++98 execution test

and probably some (280) in libstdc++ tests (I didn't bisect those):
   19_diagnostics/error_category/generic_category.cc execution test
   19_diagnostics/error_category/system_category.cc execution test
   20_util/scoped_allocator/1.cc execution test
   20_util/scoped_allocator/2.cc execution test
   20_util/scoped_allocator/construct_pair_c++2a.cc execution test
   20_util/to_address/debug.cc execution test
   20_util/variant/run.cc execution test


I think this is a latent bug in the static initialization code for
EABI that affects big endian. In libstdc++-v3/libsupc++/guard.cc we
have:

# ifndef _GLIBCXX_GUARD_TEST_AND_ACQUIRE

// Test the guard variable with a memory load with
// acquire semantics.

inline bool
__test_and_acquire (__cxxabiv1::__guard *g)
{
  unsigned char __c;
  unsigned char *__p = reinterpret_cast<unsigned char *>(g);
  __atomic_load (__p, &__c,  __ATOMIC_ACQUIRE);
  (void) __p;
  return _GLIBCXX_GUARD_TEST(&__c);
}
#  define _GLIBCXX_GUARD_TEST_AND_ACQUIRE(G) __test_and_acquire (G)
# endif

That inspects the first byte of the guard variable. But for EABI the
"is initialized" bit is the least significant bit of the guard
variable. For little endian that's fine, the least significant bit is
in the first byte. But for big endian, it's not in the first byte, so
we are looking in the wrong place. This means that the initial check
in __cxa_guard_acquire is wrong:

  extern "C"
  int __cxa_guard_acquire (__guard *g)
  {
#ifdef __GTHREADS
// If the target can reorder loads, we need to insert a read memory
// barrier so that accesses to the guarded variable happen after the
// guard test.
if 
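
The byte-order point made above can be illustrated in isolation. The sketch below assumes a 32-bit guard word whose "is initialized" flag is the least significant bit; the helper names are hypothetical, and the byte layout is computed explicitly so that both cases can be checked on any host.

```cpp
#include <cassert>
#include <cstdint>

// Lay out a 32-bit guard value in memory in the requested byte order and
// return the byte at the lowest address (the one a byte-sized load reads).
unsigned char first_byte_of_guard(std::uint32_t guard, bool big_endian) {
  unsigned char bytes[4];
  for (int i = 0; i < 4; ++i) {
    int shift = big_endian ? 8 * (3 - i) : 8 * i;
    bytes[i] = static_cast<unsigned char>((guard >> shift) & 0xff);
  }
  return bytes[0];
}

// Models a check that inspects only the first byte of the guard.
bool first_byte_says_initialized(std::uint32_t guard, bool big_endian) {
  return (first_byte_of_guard(guard, big_endian) & 1) != 0;
}
```

With the flag set (guard value 1), the first-byte check reports "initialized" for the little-endian layout but not for the big-endian one, where the flag lives in the last byte.
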

[PATCH] amend SLP reduction testcases

2020-09-30 Thread Richard Biener
This amends SLP reduction testcases that currently trigger
vect_attempt_slp_rearrange_stmts eliding load permutations to
verify this is actually happening.

tested on x86_64-unknown-linux-gnu, pushed

2020-09-30  Richard Biener  

* gcc.dg/vect/pr37027.c: Amend.
* gcc.dg/vect/pr67790.c: Likewise.
* gcc.dg/vect/pr92324-4.c: Likewise.
* gcc.dg/vect/pr92558.c: Likewise.
* gcc.dg/vect/pr95495.c: Likewise.
* gcc.dg/vect/slp-reduc-1.c: Likewise.
* gcc.dg/vect/slp-reduc-2.c: Likewise.
* gcc.dg/vect/slp-reduc-3.c: Likewise.
* gcc.dg/vect/slp-reduc-4.c: Likewise.
* gcc.dg/vect/slp-reduc-5.c: Likewise.
* gcc.dg/vect/slp-reduc-7.c: Likewise.
* gcc.dg/vect/vect-reduc-in-order-4.c: Likewise.
---
 gcc/testsuite/gcc.dg/vect/pr37027.c   | 2 +-
 gcc/testsuite/gcc.dg/vect/pr67790.c   | 1 +
 gcc/testsuite/gcc.dg/vect/pr92324-4.c | 2 ++
 gcc/testsuite/gcc.dg/vect/pr92558.c   | 2 ++
 gcc/testsuite/gcc.dg/vect/pr95495.c   | 2 ++
 gcc/testsuite/gcc.dg/vect/slp-reduc-1.c   | 2 +-
 gcc/testsuite/gcc.dg/vect/slp-reduc-2.c   | 1 +
 gcc/testsuite/gcc.dg/vect/slp-reduc-3.c   | 1 +
 gcc/testsuite/gcc.dg/vect/slp-reduc-4.c   | 1 +
 gcc/testsuite/gcc.dg/vect/slp-reduc-5.c   | 2 +-
 gcc/testsuite/gcc.dg/vect/slp-reduc-7.c   | 2 +-
 gcc/testsuite/gcc.dg/vect/vect-reduc-in-order-4.c | 1 +
 12 files changed, 15 insertions(+), 4 deletions(-)

diff --git a/gcc/testsuite/gcc.dg/vect/pr37027.c 
b/gcc/testsuite/gcc.dg/vect/pr37027.c
index ef6760ec924..69f58264de9 100644
--- a/gcc/testsuite/gcc.dg/vect/pr37027.c
+++ b/gcc/testsuite/gcc.dg/vect/pr37027.c
@@ -33,4 +33,4 @@ foo (void)
 
 /* { dg-final { scan-tree-dump-times "vectorized 1 loops" 1 "vect" { xfail 
vect_no_int_add } } } */
 /* { dg-final { scan-tree-dump-times "vectorizing stmts using SLP" 1 "vect" { 
xfail vect_no_int_add } } } */
-
+/* { dg-final { scan-tree-dump-times "VEC_PERM_EXPR" 0 "vect" } } */
diff --git a/gcc/testsuite/gcc.dg/vect/pr67790.c 
b/gcc/testsuite/gcc.dg/vect/pr67790.c
index 5e2d506a730..32eacd91fda 100644
--- a/gcc/testsuite/gcc.dg/vect/pr67790.c
+++ b/gcc/testsuite/gcc.dg/vect/pr67790.c
@@ -38,3 +38,4 @@ int main()
 }
 
 /* { dg-final { scan-tree-dump "vectorizing stmts using SLP" "vect" } } */
+/* { dg-final { scan-tree-dump-times "VEC_PERM_EXPR" 0 "vect" } } */
diff --git a/gcc/testsuite/gcc.dg/vect/pr92324-4.c 
b/gcc/testsuite/gcc.dg/vect/pr92324-4.c
index 83479852233..57e117ca109 100644
--- a/gcc/testsuite/gcc.dg/vect/pr92324-4.c
+++ b/gcc/testsuite/gcc.dg/vect/pr92324-4.c
@@ -28,3 +28,5 @@ int main ()
 __builtin_abort ();
   return 0;
 }
+
+/* { dg-final { scan-tree-dump-times "VEC_PERM_EXPR" 0 "vect" } } */
diff --git a/gcc/testsuite/gcc.dg/vect/pr92558.c 
b/gcc/testsuite/gcc.dg/vect/pr92558.c
index 1d24fa0f2f8..11f41320ec1 100644
--- a/gcc/testsuite/gcc.dg/vect/pr92558.c
+++ b/gcc/testsuite/gcc.dg/vect/pr92558.c
@@ -21,3 +21,5 @@ int main()
 __builtin_abort ();
   return 0;
 }
+
+/* { dg-final { scan-tree-dump-times "VEC_PERM_EXPR" 0 "vect" } } */
diff --git a/gcc/testsuite/gcc.dg/vect/pr95495.c 
b/gcc/testsuite/gcc.dg/vect/pr95495.c
index a961aef59fc..683f0f26a82 100644
--- a/gcc/testsuite/gcc.dg/vect/pr95495.c
+++ b/gcc/testsuite/gcc.dg/vect/pr95495.c
@@ -14,3 +14,5 @@ h()
 d += e[f].b >> 1 | e[f].b & 1;
   }
 }
+
+/* { dg-final { scan-tree-dump-times "VEC_PERM_EXPR" 0 "vect" } } */
diff --git a/gcc/testsuite/gcc.dg/vect/slp-reduc-1.c 
b/gcc/testsuite/gcc.dg/vect/slp-reduc-1.c
index b353dd7ccf8..b9bddb85994 100644
--- a/gcc/testsuite/gcc.dg/vect/slp-reduc-1.c
+++ b/gcc/testsuite/gcc.dg/vect/slp-reduc-1.c
@@ -44,4 +44,4 @@ int main (void)
 
 /* { dg-final { scan-tree-dump-times "vectorized 1 loops" 1 "vect" { xfail 
vect_no_int_add } } } */
 /* { dg-final { scan-tree-dump-times "vectorizing stmts using SLP" 1 "vect" { 
xfail vect_no_int_add } } } */
-
+/* { dg-final { scan-tree-dump-times "VEC_PERM_EXPR" 0 "vect" } } */
diff --git a/gcc/testsuite/gcc.dg/vect/slp-reduc-2.c 
b/gcc/testsuite/gcc.dg/vect/slp-reduc-2.c
index 15dd59922fc..aa09d01975a 100644
--- a/gcc/testsuite/gcc.dg/vect/slp-reduc-2.c
+++ b/gcc/testsuite/gcc.dg/vect/slp-reduc-2.c
@@ -41,4 +41,5 @@ int main (void)
 
 /* { dg-final { scan-tree-dump-times "vectorized 1 loops" 1 "vect" { xfail 
vect_no_int_add } } } */
 /* { dg-final { scan-tree-dump-times "vectorizing stmts using SLP" 1 "vect" { 
xfail vect_no_int_add } } } */
+/* { dg-final { scan-tree-dump-times "VEC_PERM_EXPR" 0 "vect" } } */
 
diff --git a/gcc/testsuite/gcc.dg/vect/slp-reduc-3.c 
b/gcc/testsuite/gcc.dg/vect/slp-reduc-3.c
index 7358275c3cb..4969fe82b25 100644
--- a/gcc/testsuite/gcc.dg/vect/slp-reduc-3.c
+++ b/gcc/testsuite/gcc.dg/vect/slp-reduc-3.c
@@ -60,3 +60,4 @@ int main (void)
 /* { dg-final { scan-tree-dump-times "vect_recog_dot_prod_pattern: detected" 1 
"vect" { xfail *-*-* } } } */
 /* { dg-final { 

Re: [PATCH] PR target/97250: i386: Add support for x86-64-v2, x86-64-v3, x86-64-v4 levels for x86-64

2020-09-30 Thread Jakub Jelinek via Gcc-patches
On Wed, Sep 30, 2020 at 04:29:34PM +0200, Florian Weimer wrote:
> > Thinking about it more, wouldn't it be better to just imply generic tuning
> > for these -march= options?
> 
> I think this is what the patch does?  See the x86-64-v3-haswell.c
> test.

No, I think it will have that behavior solely when the compiler has been
configured to default to -mtune=generic.
What I'm suggesting is to not ignore the tuning like you do for PTA_NO_TUNE,
but instead perhaps use PROCESSOR_GENERIC and special case it in the code
so that ix86_arch will be set to PROCESSOR_K8 in that case and only
ix86_tune will be PROCESSOR_GENERIC.

Jakub



Re: Another issue on RS6000 target. Re: One issue with default implementation of zero_call_used_regs

2020-09-30 Thread Qing Zhao via Gcc-patches



> On Sep 30, 2020, at 4:21 AM, Richard Sandiford  
> wrote:
> 
> Qing Zhao <qing.z...@oracle.com> writes:
>> Hi, Richard,
>> 
>> While testing aarch64, I also tested the default implementation 
>> on the rs6000 target. 
>> 
>> The default implementation now is:
>> 
>> +/* The default hook for TARGET_ZERO_CALL_USED_REGS.  */
>> +
>> +HARD_REG_SET
>> +default_zero_call_used_regs (HARD_REG_SET need_zeroed_hardregs)
>> +{
>> +  gcc_assert (!hard_reg_set_empty_p (need_zeroed_hardregs));
>> +
>> +  for (unsigned int regno = 0; regno < FIRST_PSEUDO_REGISTER; regno++)
>> +if (TEST_HARD_REG_BIT (need_zeroed_hardregs, regno))
>> +  {
>> +   machine_mode mode = reg_raw_mode[regno];
>> +   rtx reg = gen_rtx_REG (mode, regno);
>> +   emit_move_insn (reg, const0_rtx);
> 
> This should just be:
> 
>   rtx zero = CONST0_RTX (reg_raw_mode[regno]);
>   emit_move_insn (regno_reg_rtx[regno], zero);

Okay. Will update the code.

> 
>> +  }
>> +  return need_zeroed_hardregs;
>> +}
>> +
>> 
>> With this small test case:
>> int
>> test ()
>> {
>>  return 1;
>> }
>> 
>> If I compile it with 
>> 
>> /home/qinzhao/Install/latest/bin/gcc -O2 -fzero-call-used-regs=all-arg t.c
>> 
>> It fails with:
>> 
>> t.c: In function ‘test’:
>> t.c:6:1: error: insn does not satisfy its constraints:
>>6 | }
>>  | ^
>> (insn 28 27 29 (set (reg:DI 33 1)
>>(const_int 0 [0])) "t.c":6:1 647 {*movdi_internal64}
>> (nil))
>> during RTL pass: shorten
>> dump file: t.c.319r.shorten
>> t.c:6:1: internal compiler error: in extract_constrain_insn_cached, at 
>> recog.c:2207
>> 0x1018d693 _fatal_insn(char const*, rtx_def const*, char const*, int, char 
>> const*)
>>  ../../latest-gcc-x86/gcc/rtl-error.c:108
>> 0x1018d6e7 _fatal_insn_not_found(rtx_def const*, char const*, int, char 
>> const*)
>>  ../../latest-gcc-x86/gcc/rtl-error.c:118
>> 0x1099a82b extract_constrain_insn_cached(rtx_insn*)
>>  ../../latest-gcc-x86/gcc/recog.c:2207
>> 0x11393917 insn_min_length(rtx_insn*)
>>  ../../latest-gcc-x86/gcc/config/rs6000/rs6000.md:721
>> 0x105bece3 shorten_branches(rtx_insn*)
>>  ../../latest-gcc-x86/gcc/final.c:1118
>> 
>> 
>> As far as I can tell, the failure above happens when the FP registers are zeroed.
>> 
>> I suspect that the issue is still related to the following statement:
>> 
>> machine_mode mode = reg_raw_mode[regno];
>> 
>> As far as I can tell, reg_raw_mode always returns the integer mode that the 
>> hard register can hold, even when it is an FP register.
> 
> Well, more precisely: it's the largest mode that the target allows the
> registers to hold.  If there are multiple candidate modes of the same
> size, the integer one wins, like you say.  But the point is that DI only
> wins over DF because the target allows both DI and DF to be stored in
> the register, and therefore supports both DI and DF moves for that
> register.
> 
> So I don't think the mode is the issue.  Integer zero and floating-point
> zero have the same bit representation after all.

Theoretically, yes. 
However, as we noticed on AArch64, the integer TI move was not 
supported before your fix today. As a result, the TI move had to be split.
With your fix today on AArch64, yes, the default implementation works well for 
those vector registers. Thanks a lot.

Potentially there will be other targets that have the same issue. Then those 
targets need to fix those issues too in order to make the default 
implementation work.
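
Richard's remark that integer zero and floating-point zero share a bit pattern can be checked directly. The sketch below is only an illustration on the host, not GCC code; `bits_of` is a hypothetical helper, and it assumes a 64-bit `double` as in IEEE 754 binary64.

```cpp
#include <cassert>
#include <cstdint>
#include <cstring>

// Return the raw 64-bit pattern of a double.
std::uint64_t bits_of(double d) {
  static_assert(sizeof(double) == sizeof(std::uint64_t),
                "this sketch expects a 64-bit double");
  std::uint64_t u;
  std::memcpy(&u, &d, sizeof u);  // well-defined way to inspect the bytes
  return u;
}
```

On such a host `bits_of(0.0)` is all zero bits, the same pattern as integer 0, so a DI-mode zero written to a register that also holds DF values reads back as +0.0. Only negative zero differs, since it has the sign bit set.
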

> 
> AIUI, without VSX, Power needs to load the zero from the constant pool.
> 
>> So, I am still wondering:
>> 
>> 1. Is there another available utility routine that returns the proper MODE 
>> for the hard registers that can be readily used to zero the hard register?
>> 2. If not, should I add one more target hook for this purpose? i.e 
>> 
>> /* Return the proper machine mode that can be used to zero this hard 
>> register specified by REGNO.  */
>> machine_mode zero-call-used-regs-mode (unsigned int REGNO)
>> 
>> 3. Or should I just delete the default implementation and let the target 
>> implement it?
> 
> IMO no.  This goes back to what we discussed earlier.  It isn't the
> case that a default target hook has to be correct for all targets,
> with targets only overriding them as an optimisation.  The default
> versions of many hooks and macros are not conservatively correct.
> They are just reasonable default assumptions.  And IMO that's true
> of the hook above too.
> 
> The way to flush out whether a target needs to override the hook
> is to add tests that run on all targets.
I plan to add these new test cases, so for now I have been testing 
simple cases on aarch64 and rs6000 to look for any issues 
with the default implementation. So far, I have found these issues with 
very simple test cases.

At most, I can test the aarch64 and rs6000 targets with some small test 
cases to check correctness.
> 
> That said, one way of erring on the 

Re: [PATCH] PR target/97250: i386: Add support for x86-64-v2, x86-64-v3, x86-64-v4 levels for x86-64

2020-09-30 Thread Florian Weimer
* Jakub Jelinek:

> On Wed, Sep 30, 2020 at 04:05:41PM +0200, Florian Weimer wrote:
>> > I think the documentation should state that these are not valid in -mtune=,
>> > just in -march=, and that using -march=x86-64-v* will not change tuning.
>> > I guess there should be some testsuite coverage for the possibly unexpected
>> > behavior of
>> > -march=skylake -march=x86-64-v3
>> > actually acting as
>> > -march=x86-64-v3 -mtune=skylake
>> > though perhaps it needs to be skipped if user used explicit -mtune= and
>> > not sure how to actually test that (-fverbose-asm doesn't print -mtune=
>> > when it is not explicit).
>> 
>> I think the compiler driver collapses -march=skylake -march=x86-64-v3
>> to -march=x86-64-v3, dropping the tuning.  The cc1 option parser also
>> drops the first -march=.  That's a bit surprising to me.  It means
>> that we can't use multiple tuning/non-tuning -march= switches, and
>> that tuning with (say) -march=x86-64-v3 needs to use -mtune.
>> 
>> PTA_NO_TUNE is still needed because we'd define __tune_k8__ otherwise
>> (and switch to K8 tuning internally).
>> 
>> Is it okay to simply document this?  Perhaps like this?
>
> Thinking about it more, wouldn't it be better to just imply generic tuning
> for these -march= options?

I think this is what the patch does?  See the x86-64-v3-haswell.c
test.

I tried to explain this in the documentation.  I do not think this is
particularly confusing for end users because they do not see the
implementation, which is making this complicated.

I think we should not set generic tuning in processor_alias_table
because it would override tuning for target clones, and I don't think
we want to do that automatically.


Re: [PATCH] PR target/97250: i386: Add support for x86-64-v2, x86-64-v3, x86-64-v4 levels for x86-64

2020-09-30 Thread Jakub Jelinek via Gcc-patches
On Wed, Sep 30, 2020 at 04:05:41PM +0200, Florian Weimer wrote:
> > I think the documentation should state that these are not valid in -mtune=,
> > just in -march=, and that using -march=x86-64-v* will not change tuning.
> > I guess there should be some testsuite coverage for the possibly unexpected
> > behavior of
> > -march=skylake -march=x86-64-v3
> > actually acting as
> > -march=x86-64-v3 -mtune=skylake
> > though perhaps it needs to be skipped if user used explicit -mtune= and
> > not sure how to actually test that (-fverbose-asm doesn't print -mtune=
> > when it is not explicit).
> 
> I think the compiler driver collapses -march=skylake -march=x86-64-v3
> to -march=x86-64-v3, dropping the tuning.  The cc1 option parser also
> drops the first -march=.  That's a bit surprising to me.  It means
> that we can't use multiple tuning/non-tuning -march= switches, and
> that tuning with (say) -march=x86-64-v3 needs to use -mtune.
> 
> PTA_NO_TUNE is still needed because we'd define __tune_k8__ otherwise
> (and switch to K8 tuning internally).
> 
> Is it okay to simply document this?  Perhaps like this?

Thinking about it more, wouldn't it be better to just imply generic tuning
for these -march= options?

Jakub



Re: [PATCH] PR target/97250: i386: Add support for x86-64-v2, x86-64-v3, x86-64-v4 levels for x86-64

2020-09-30 Thread Florian Weimer
* Jakub Jelinek:

> On Wed, Sep 30, 2020 at 02:27:38PM +0200, Florian Weimer wrote:
>> --- a/gcc/doc/invoke.texi
>> +++ b/gcc/doc/invoke.texi
>> @@ -29258,6 +29258,13 @@ of the selected instruction set.
>>  @item x86-64
>>  A generic CPU with 64-bit extensions.
>>  
>> +@item x86-64-v2
>> +@itemx x86-64-v3
>> +@itemx x86-64-v4
>> +These choices for @var{cpu-type} select the corresponding
>> +micro-architecture level from the x86-64 psABI.  They are only available
>> +when compiling for a x86-64 target that uses the System V psABI@.
>
> I think the documentation should state that these are not valid in -mtune=,
> just in -march=, and that using -march=x86-64-v* will not change tuning.
> I guess there should be some testsuite coverage for the possibly unexpected
> behavior of
> -march=skylake -march=x86-64-v3
> actually acting as
> -march=x86-64-v3 -mtune=skylake
> though perhaps it needs to be skipped if user used explicit -mtune= and
> not sure how to actually test that (-fverbose-asm doesn't print -mtune=
> when it is not explicit).

I think the compiler driver collapses -march=skylake -march=x86-64-v3
to -march=x86-64-v3, dropping the tuning.  The cc1 option parser also
drops the first -march=.  That's a bit surprising to me.  It means
that we can't use multiple tuning/non-tuning -march= switches, and
that tuning with (say) -march=x86-64-v3 needs to use -mtune.

PTA_NO_TUNE is still needed because we'd define __tune_k8__ otherwise
(and switch to K8 tuning internally).

Is it okay to simply document this?  Perhaps like this?

8<--8<
These micro-architecture levels are defined in the x86-64 psABI:

https://gitlab.com/x86-psABIs/x86-64-ABI/-/commit/77566eb03bc6a326811cb7e9

PTA_NO_TUNE is introduced so that the new processor alias table entries
do not affect the CPU tuning setting in ix86_tune.

The tests depend on the macros added in commit 92e652d8c21bd7e66cbb0f900
("i386: Define __LAHF_SAHF__ and __MOVBE__ macros, based on ISA flags").

gcc/:
PR target/97250
* config/i386/i386.h (PTA_NO_TUNE, PTA_X86_64_BASELINE)
(PTA_X86_64_V2, PTA_X86_64_V3, PTA_X86_64_V4): New.
* common/config/i386/i386-common.c (processor_alias_table):
Add "x86-64-v2", "x86-64-v3", "x86-64-v4".
* config/i386/i386-options.c (ix86_option_override_internal):
Handle new PTA_NO_TUNE processor table entries.
* doc/invoke.texi (x86 Options): Document new -march values.

gcc/testsuite/:
PR target/97250
* gcc.target/i386/x86-64-v2.c: New test.
* gcc.target/i386/x86-64-v3.c: New test.
* gcc.target/i386/x86-64-v3-haswell.c: New test.
* gcc.target/i386/x86-64-v3-skylake.c: New test.
* gcc.target/i386/x86-64-v4.c: New test.

---
 gcc/common/config/i386/i386-common.c  |  10 +-
 gcc/config/i386/i386-options.c|  27 -
 gcc/config/i386/i386.h|  11 +-
 gcc/doc/invoke.texi   |  15 ++-
 gcc/testsuite/gcc.target/i386/x86-64-v2.c | 116 ++
 gcc/testsuite/gcc.target/i386/x86-64-v3-haswell.c |  18 
 gcc/testsuite/gcc.target/i386/x86-64-v3-skylake.c |  21 
 gcc/testsuite/gcc.target/i386/x86-64-v3.c | 116 ++
 gcc/testsuite/gcc.target/i386/x86-64-v4.c | 116 ++
 9 files changed, 440 insertions(+), 10 deletions(-)

diff --git a/gcc/common/config/i386/i386-common.c 
b/gcc/common/config/i386/i386-common.c
index 10142149115..62a620b4430 100644
--- a/gcc/common/config/i386/i386-common.c
+++ b/gcc/common/config/i386/i386-common.c
@@ -1795,9 +1795,13 @@ const pta processor_alias_table[] =
 PTA_MMX | PTA_3DNOW | PTA_3DNOW_A | PTA_SSE | PTA_FXSR, 0, P_NONE},
   {"athlon-mp", PROCESSOR_ATHLON, CPU_ATHLON,
 PTA_MMX | PTA_3DNOW | PTA_3DNOW_A | PTA_SSE | PTA_FXSR, 0, P_NONE},
-  {"x86-64", PROCESSOR_K8, CPU_K8,
-PTA_64BIT | PTA_MMX | PTA_SSE | PTA_SSE2 | PTA_NO_SAHF | PTA_FXSR,
-0, P_NONE},
+  {"x86-64", PROCESSOR_K8, CPU_K8, PTA_X86_64_BASELINE, 0, P_NONE},
+  {"x86-64-v2", PROCESSOR_K8, CPU_GENERIC, PTA_X86_64_V2 | PTA_NO_TUNE,
+   0, P_NONE},
+  {"x86-64-v3", PROCESSOR_K8, CPU_GENERIC, PTA_X86_64_V3 | PTA_NO_TUNE,
+   0, P_NONE},
+  {"x86-64-v4", PROCESSOR_K8, CPU_GENERIC, PTA_X86_64_V4 | PTA_NO_TUNE,
+   0, P_NONE},
   {"eden-x2", PROCESSOR_K8, CPU_K8,
 PTA_64BIT | PTA_MMX | PTA_SSE | PTA_SSE2 | PTA_SSE3 | PTA_FXSR,
 0, P_NONE},
diff --git a/gcc/config/i386/i386-options.c b/gcc/config/i386/i386-options.c
index 597de533fbd..cf48a911798 100644
--- a/gcc/config/i386/i386-options.c
+++ b/gcc/config/i386/i386-options.c
@@ -2058,10 +2058,25 @@ ix86_option_override_internal (bool main_args_p,
return false;
  }
 
+   /* Only the x86-64 psABI defines the feature-only
+  micro-architecture levels that use PTA_NO_TUNE.  */
+   if ((processor_alias_table[i].flags & 

Re: [committed] testsuite: Fix up amx* dg-do run tests with older binutils

2020-09-30 Thread Hongyu Wang via Gcc-patches
Thanks for the fix! I forgot that we don't have builtin check for
target-supports.exp.

Will update these once we implement AMX with builtins.

On Wed, Sep 30, 2020 at 7:51 PM Jakub Jelinek wrote:

> On Fri, Sep 18, 2020 at 04:31:55PM +0800, Hongyu Wang via Gcc-patches
> wrote:
> > Thank you very much for the review again.
> >
> > I have just updated the patch to add the XSAVE dependency and to use
> > __builtin_cpu_supports for the runtime test.
>
> Several tests FAIL when using older binutils that don't support AMX.
>
> Fixed thusly, tested on x86_64-linux -m32/-m64, committed to trunk as
> obvious:
>
> 2020-09-30  Jakub Jelinek  
>
> * gcc.target/i386/amxint8-dpbssd-2.c: Require effective targets
> amx_tile and amx_int8.
> * gcc.target/i386/amxint8-dpbsud-2.c: Likewise.
> * gcc.target/i386/amxint8-dpbusd-2.c: Likewise.
> * gcc.target/i386/amxint8-dpbuud-2.c: Likewise.
> * gcc.target/i386/amxbf16-dpbf16ps-2.c: Require effective targets
> amx_tile and amx_bf16.
> * gcc.target/i386/amxtile-2.c: Require effective target amx_tile.
>
> --- gcc/testsuite/gcc.target/i386/amxint8-dpbssd-2.c.jj 2020-09-29
> 11:32:02.950602758 +0200
> +++ gcc/testsuite/gcc.target/i386/amxint8-dpbssd-2.c2020-09-30
> 13:16:08.186445881 +0200
> @@ -1,4 +1,6 @@
>  /* { dg-do run { target { ! ia32 } } } */
> +/* { dg-require-effective-target amx_tile } */
> +/* { dg-require-effective-target amx_int8 } */
>  /* { dg-options "-O2 -mamx-tile -mamx-int8" } */
>  #include 
>
> --- gcc/testsuite/gcc.target/i386/amxint8-dpbsud-2.c.jj 2020-09-29
> 11:32:02.950602758 +0200
> +++ gcc/testsuite/gcc.target/i386/amxint8-dpbsud-2.c2020-09-30
> 13:16:23.715221450 +0200
> @@ -1,4 +1,6 @@
>  /* { dg-do run { target { ! ia32 } } } */
> +/* { dg-require-effective-target amx_tile } */
> +/* { dg-require-effective-target amx_int8 } */
>  /* { dg-options "-O2 -mamx-tile -mamx-int8" } */
>  #include 
>
> --- gcc/testsuite/gcc.target/i386/amxint8-dpbusd-2.c.jj 2020-09-29
> 11:32:02.950602758 +0200
> +++ gcc/testsuite/gcc.target/i386/amxint8-dpbusd-2.c2020-09-30
> 13:16:28.998145100 +0200
> @@ -1,4 +1,6 @@
>  /* { dg-do run { target { ! ia32 } } } */
> +/* { dg-require-effective-target amx_tile } */
> +/* { dg-require-effective-target amx_int8 } */
>  /* { dg-options "-O2 -mamx-tile -mamx-int8" } */
>  #include 
>
> --- gcc/testsuite/gcc.target/i386/amxint8-dpbuud-2.c.jj 2020-09-29
> 11:32:02.950602758 +0200
> +++ gcc/testsuite/gcc.target/i386/amxint8-dpbuud-2.c2020-09-30
> 13:16:35.770047224 +0200
> @@ -1,4 +1,6 @@
>  /* { dg-do run { target { ! ia32 } } } */
> +/* { dg-require-effective-target amx_tile } */
> +/* { dg-require-effective-target amx_int8 } */
>  /* { dg-options "-O2 -mamx-tile -mamx-int8" } */
>  #include 
>
> --- gcc/testsuite/gcc.target/i386/amxbf16-dpbf16ps-2.c.jj   2020-09-29
> 11:32:02.949602773 +0200
> +++ gcc/testsuite/gcc.target/i386/amxbf16-dpbf16ps-2.c  2020-09-30
> 13:15:41.079837637 +0200
> @@ -1,4 +1,6 @@
>  /* { dg-do run { target { ! ia32 } } } */
> +/* { dg-require-effective-target amx_tile } */
> +/* { dg-require-effective-target amx_bf16 } */
>  /* { dg-options "-O2 -mamx-tile -mamx-bf16" } */
>  #include 
>
> --- gcc/testsuite/gcc.target/i386/amxtile-2.c.jj2020-09-29
> 11:32:02.950602758 +0200
> +++ gcc/testsuite/gcc.target/i386/amxtile-2.c   2020-09-30
> 13:16:57.972726339 +0200
> @@ -1,4 +1,5 @@
>  /* { dg-do run { target { ! ia32 } } } */
> +/* { dg-require-effective-target amx_tile } */
>  /* { dg-options "-O2 -mamx-tile " } */
>  #include 
>
>
>
> Jakub
>
>

-- 
Regards,

Hongyu, Wang


[PING][PATCH] aarch64: Don't generate invalid zero/sign-extend syntax

2020-09-30 Thread Alex Coplan via Gcc-patches
Ping. Are these testsuite fixes for ILP32 OK?

On 18/09/2020 17:15, Alex Coplan wrote:
> Hi Christophe,
> 
> On 08/09/2020 10:14, Christophe Lyon wrote:
> > On Mon, 17 Aug 2020 at 11:00, Alex Coplan  wrote:
> > >
> > > gcc/ChangeLog:
> > >
> > > * config/aarch64/aarch64.md
> > > (*adds__): Ensure extended operand
> > > agrees with width of extension specifier.
> > > (*subs__): Likewise.
> > > (*adds__shift_): Likewise.
> > > (*subs__shift_): Likewise.
> > > (*add__): Likewise.
> > > (*add__shft_): Likewise.
> > > (*add_uxt_shift2): Likewise.
> > > (*sub__): Likewise.
> > > (*sub__shft_): Likewise.
> > > (*sub_uxt_shift2): Likewise.
> > > (*cmp_swp__reg): Likewise.
> > > (*cmp_swp__shft_): Likewise.
> > >
> > >
> > > gcc/testsuite/ChangeLog:
> > >
> > > * gcc.target/aarch64/adds3.c: Fix test w.r.t. new syntax.
> > > * gcc.target/aarch64/cmp.c: Likewise.
> > > * gcc.target/aarch64/subs3.c: Likewise.
> > > * gcc.target/aarch64/subsp.c: Likewise.
> > > * gcc.target/aarch64/extend-syntax.c: New test.
> > >
> > 
> > Hi,
> > 
> > I've noticed some of the new tests fail with -mabi=ilp32:
> > gcc.target/aarch64/extend-syntax.c check-function-bodies add1
> > gcc.target/aarch64/extend-syntax.c check-function-bodies add3
> > gcc.target/aarch64/extend-syntax.c check-function-bodies sub2
> > gcc.target/aarch64/extend-syntax.c check-function-bodies sub3
> > gcc.target/aarch64/extend-syntax.c scan-assembler-times
> > subs\tx[0-9]+, x[0-9]+, w[0-9]+, sxtw 3 1
> > gcc.target/aarch64/subsp.c scan-assembler sub\tsp, sp, w[0-9]*, sxtw 4\n
> > 
> > Christophe
> 
> AFAICT the second scan-assembler in that subsp test failed on ILP32
> before my commit. This is because we generate slightly suboptimal code
> here. On LP64 with -O, we get:
> 
> f2:
> stp x29, x30, [sp, -16]!
> mov x29, sp
> add w1, w1, 1
> sub sp, sp, x1, sxtw 4
> mov x0, sp
> bl  foo
> mov sp, x29
> ldp x29, x30, [sp], 16
> ret
> 
> On ILP32, we get:
> 
> f2:
> stp x29, x30, [sp, -16]!
> mov x29, sp
> add w1, w1, 1
> lsl w1, w1, 4
> sub sp, sp, x1
> mov w0, wsp
> bl  foo
> mov sp, x29
> ldp x29, x30, [sp], 16
> ret
> 
> And we see similar results all the way back to GCC 6. So AFAICT this
> scan-assembler has never worked. The attached patch disables it on ILP32
> since this isn't a code quality regression.
> 
> This patch also fixes up the DejaGnu directives in extend-syntax.c to
> work on ILP32: we change the check-function-bodies directive to only run
> on LP64, adding scan-assembler directives for ILP32 where required.
> 
> OK for trunk?
> 
> Thanks,
> Alex
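
As a reading aid for the extension operators named in those directives, here is a minimal C++ sketch of the source-level semantics involved. The three function bodies follow the shapes of add2, sub1 and add3 as they appear in the patch context; the comments about instruction forms only restate what the directives in the patch expect, and the exact code generated was not verified here.

```cpp
#include <cassert>

// Same shape as add2 in extend-syntax.c: the 32-bit operand is
// zero-extended to 64 bits before the addition (the "uxtw" form).
unsigned long long add2 (unsigned long long x, unsigned y)
{
  return x + y;
}

// Same shape as sub1: zero-extend the 32-bit operand, then subtract.
unsigned long long sub1 (unsigned long long x, unsigned n)
{
  return x - n;
}

// Same shape as add3: the int index is sign-extended and scaled by
// sizeof (double), which is 8 on AArch64, i.e. a left shift by 3.
// With 64-bit pointers (LP64) that is "sxtw 3"; with 32-bit pointers
// (ILP32) no widening is needed, hence the plain "lsl 3" scans.
double *add3 (double *p, int x)
{
  return p + x;
}
```

The difference between the two ABIs is therefore only in whether the index has to be widened to the pointer width, which is why the ILP32 directives look for a shift rather than an extend.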

> diff --git a/gcc/testsuite/gcc.target/aarch64/extend-syntax.c 
> b/gcc/testsuite/gcc.target/aarch64/extend-syntax.c
> index 23fa9f4ffc5..1bfcdb59dde 100644
> --- a/gcc/testsuite/gcc.target/aarch64/extend-syntax.c
> +++ b/gcc/testsuite/gcc.target/aarch64/extend-syntax.c
> @@ -20,6 +20,7 @@ unsigned long long *add1(unsigned long long *p, unsigned x)
>  */
>  unsigned long long add2(unsigned long long x, unsigned y)
>  {
> +  /* { dg-final { scan-assembler-times "add\tx0, x0, w1, uxtw" 1 { target 
> ilp32 } } } */
>return x + y;
>  }
>  
> @@ -34,6 +35,9 @@ double *add3(double *p, int x)
>return p + x;
>  }
>  
> +// add1 and add3 should both generate this on ILP32:
> +/* { dg-final { scan-assembler-times "add\tw0, w0, w1, lsl 3" 2 { target 
> ilp32 } } } */
> +
>  // Hits *sub_zero_extendsi_di (*sub__).
>  /*
>  ** sub1:
> @@ -42,6 +46,7 @@ double *add3(double *p, int x)
>  */
>  unsigned long long sub1(unsigned long long x, unsigned n)
>  {
> +/* { dg-final { scan-assembler-times "sub\tx0, x0, w1, uxtw" 1 { target 
> ilp32 } } } */
>  return x - n;
>  }
>  
> @@ -67,6 +72,9 @@ double *sub3(double *p, int n)
>return p - n;
>  }
>  
> +// sub2 and sub3 should both generate this on ILP32:
> +/* { dg-final { scan-assembler-times "sub\tw0, w0, w1, lsl 3" 2 { target 
> ilp32 } } } */
> +
>  // Hits *adds_zero_extendsi_di (*adds__).
>  int adds1(unsigned long long x, unsigned y)
>  {
> @@ -97,7 +105,8 @@ int subs1(unsigned long long x, unsigned y)
>  unsigned long long *w;
>  int subs2(unsigned long long *x, int y)
>  {
> -  /* { dg-final { scan-assembler-times "subs\tx\[0-9\]+, x\[0-9\]+, 
> w\[0-9\]+, sxtw 3" 1 } } */
> +  /* { dg-final { scan-assembler-times "subs\tx\[0-9\]+, x\[0-9\]+, 
> w\[0-9\]+, sxtw 3" 1 { target lp64 } } } */
> +  /* { dg-final { scan-assembler-times "subs\tw\[0-9\]+, w\[0-9\]+, 
> w\[0-9\]+, lsl 3" 1 { target ilp32 } } } */
>unsigned long long *t = x - y;
>w = t;
>return !!t;
> @@ -117,4 +126,4 @@ int cmp2(unsigned long long x, int y)
>return x == ((unsigned 

RE: [PATCH 1/1] arm: [testsuite] Skip thumb2-cond-cmp tests on Cortex-M [PR94595]

2020-09-30 Thread Kyrylo Tkachov via Gcc-patches
Now adding gcc-patches too

> -Original Message-
> From: Kyrylo Tkachov
> Sent: 30 September 2020 15:02
> To: Christophe Lyon 
> Subject: RE: [PATCH 1/1] arm: [testsuite] Skip thumb2-cond-cmp tests on
> Cortex-M [PR94595]
> 
> Hi Christophe,
> 
> > -Original Message-
> > From: Gcc-patches  On Behalf Of
> > Christophe Lyon via Gcc-patches
> > Sent: 07 September 2020 17:13
> > To: gcc-patches@gcc.gnu.org
> > Subject: [PATCH 1/1] arm: [testsuite] Skip thumb2-cond-cmp tests on
> > Cortex-M [PR94595]
> >
> > Since r204778 (g571880a0a4c512195aa7d41929ba6795190887b2), we favor
> > branches over IT blocks on Cortex-M. As a result, instead of
> > generating two nested IT blocks in thumb2-cond-cmp-[1234].c, we
> > generate either a single IT block, or use branches depending on
> > conditions tested by the program.
> >
> > Since this was a deliberate change and the tests still pass as
> > expected on Cortex-A, this patch skips them when targeting
> > Cortex-M. This avoids the failures on Cortex-M3, M4, and M33.  This
> > patch also makes the testcases unsupported on Cortex-M7, although they
> > pass there because this CPU has different branch costs.
> >
> > I tried to relax the scan-assembler directives using e.g. cmpne|subne
> > or cmpgt|ble but that seemed fragile.
> >
> > OK?
> 
> Ok. Sorry for the delay, it fell through my filters.
> 
> Thanks,
> Kyrill
> 
> >
> > 2020-09-07  Christophe Lyon  
> >
> > gcc/testsuite/
> > PR target/94595
> > * gcc.target/arm/thumb2-cond-cmp-1.c: Skip if arm_cortex_m.
> > * gcc.target/arm/thumb2-cond-cmp-2.c: Skip if arm_cortex_m.
> > * gcc.target/arm/thumb2-cond-cmp-3.c: Skip if arm_cortex_m.
> > * gcc.target/arm/thumb2-cond-cmp-4.c: Skip if arm_cortex_m.
> > ---
> >  gcc/testsuite/gcc.target/arm/thumb2-cond-cmp-1.c | 2 +-
> >  gcc/testsuite/gcc.target/arm/thumb2-cond-cmp-2.c | 2 +-
> >  gcc/testsuite/gcc.target/arm/thumb2-cond-cmp-3.c | 2 +-
> >  gcc/testsuite/gcc.target/arm/thumb2-cond-cmp-4.c | 2 +-
> >  4 files changed, 4 insertions(+), 4 deletions(-)
> >
> > diff --git a/gcc/testsuite/gcc.target/arm/thumb2-cond-cmp-1.c
> > b/gcc/testsuite/gcc.target/arm/thumb2-cond-cmp-1.c
> > index 45ab605..36204f4 100644
> > --- a/gcc/testsuite/gcc.target/arm/thumb2-cond-cmp-1.c
> > +++ b/gcc/testsuite/gcc.target/arm/thumb2-cond-cmp-1.c
> > @@ -1,6 +1,6 @@
> >  /* Use conditional compare */
> >  /* { dg-options "-O2" } */
> > -/* { dg-skip-if "" { arm_thumb1_ok } } */
> > +/* { dg-skip-if "" { arm_thumb1_ok || arm_cortex_m } } */
> >  /* { dg-final { scan-assembler "cmpne" } } */
> >
> >  int f(int i, int j)
> > diff --git a/gcc/testsuite/gcc.target/arm/thumb2-cond-cmp-2.c
> > b/gcc/testsuite/gcc.target/arm/thumb2-cond-cmp-2.c
> > index 17d9a8f..108d1c3 100644
> > --- a/gcc/testsuite/gcc.target/arm/thumb2-cond-cmp-2.c
> > +++ b/gcc/testsuite/gcc.target/arm/thumb2-cond-cmp-2.c
> > @@ -1,6 +1,6 @@
> >  /* Use conditional compare */
> >  /* { dg-options "-O2" } */
> > -/* { dg-skip-if "" { arm_thumb1_ok } } */
> > +/* { dg-skip-if "" { arm_thumb1_ok || arm_cortex_m } } */
> >  /* { dg-final { scan-assembler "cmpeq" } } */
> >
> >  int f(int i, int j)
> > diff --git a/gcc/testsuite/gcc.target/arm/thumb2-cond-cmp-3.c
> > b/gcc/testsuite/gcc.target/arm/thumb2-cond-cmp-3.c
> > index 6b2a79b..ca7fd9f 100644
> > --- a/gcc/testsuite/gcc.target/arm/thumb2-cond-cmp-3.c
> > +++ b/gcc/testsuite/gcc.target/arm/thumb2-cond-cmp-3.c
> > @@ -1,6 +1,6 @@
> >  /* Use conditional compare */
> >  /* { dg-options "-O2" } */
> > -/* { dg-skip-if "" { arm_thumb1_ok } } */
> > +/* { dg-skip-if "" { arm_thumb1_ok || arm_cortex_m } } */
> >  /* { dg-final { scan-assembler "cmpgt" } } */
> >
> >  int f(int i, int j)
> > diff --git a/gcc/testsuite/gcc.target/arm/thumb2-cond-cmp-4.c
> > b/gcc/testsuite/gcc.target/arm/thumb2-cond-cmp-4.c
> > index 80e1076..91cc8f4 100644
> > --- a/gcc/testsuite/gcc.target/arm/thumb2-cond-cmp-4.c
> > +++ b/gcc/testsuite/gcc.target/arm/thumb2-cond-cmp-4.c
> > @@ -1,6 +1,6 @@
> >  /* Use conditional compare */
> >  /* { dg-options "-O2" } */
> > -/* { dg-skip-if "" { arm_thumb1_ok } } */
> > +/* { dg-skip-if "" { arm_thumb1_ok || arm_cortex_m } } */
> >  /* { dg-final { scan-assembler "cmpgt" } } */
> >
> >  int f(int i, int j)
> > --
> > 2.7.4



RE: [GCC][PATCH] arm: Fix MVE intrinsics polymorphic variants wrongly generating __ARM_undef type (pr96795).

2020-09-30 Thread Kyrylo Tkachov via Gcc-patches
Hi Srinath,

> -Original Message-
> From: Srinath Parvathaneni 
> Sent: 30 September 2020 12:51
> To: gcc-patches@gcc.gnu.org
> Cc: Kyrylo Tkachov 
> Subject: [GCC][PATCH] arm: Fix MVE intrinsics polymorphic variants wrongly
> generating __ARM_undef type (pr96795).
> 
> Hello,
> 
> This patch fixes (PR96795) MVE intrinsic polymorphic variants vaddq,
> vaddq_m, vaddq_x, vcmpeqq_m,
> vcmpeqq, vcmpgeq_m, vcmpgeq, vcmpgtq_m, vcmpgtq, vcmpleq_m,
> vcmpleq, vcmpltq_m, vcmpltq,
> vcmpneq_m, vcmpneq, vfmaq_m, vfmaq, vfmasq_m, vfmasq, vmaxnmavq,
> vmaxnmavq_p, vmaxnmvq,
> vmaxnmvq_p, vminnmavq, vminnmavq_p, vminnmvq, vminnmvq_p, vmulq_m,
> vmulq, vmulq_x, vsetq_lane,
> vsubq_m, vsubq and vsubq_x which are incorrectly generating __ARM_undef
> and mismatching the passed
> floating point scalar arguments.
> 
> Bootstrapped on arm-none-linux-gnueabihf and regression tested on
> arm-none-eabi and found no regressions.
> 
> Ok for master? Ok for GCC-10 branch?

Ok for both.
Thanks,
Kyrill

> 
> Regards,
> Srinath.
> 
> gcc/ChangeLog:
> 
> 2020-09-30  Srinath Parvathaneni  
> 
>   PR target/96795
>   * config/arm/arm_mve.h (__ARM_mve_coerce2): Define.
>   (__arm_vaddq): Correct the scalar argument.
>   (__arm_vaddq_m): Likewise.
>   (__arm_vaddq_x): Likewise.
>   (__arm_vcmpeqq_m): Likewise.
>   (__arm_vcmpeqq): Likewise.
>   (__arm_vcmpgeq_m): Likewise.
>   (__arm_vcmpgeq): Likewise.
>   (__arm_vcmpgtq_m): Likewise.
>   (__arm_vcmpgtq): Likewise.
>   (__arm_vcmpleq_m): Likewise.
>   (__arm_vcmpleq): Likewise.
>   (__arm_vcmpltq_m): Likewise.
>   (__arm_vcmpltq): Likewise.
>   (__arm_vcmpneq_m): Likewise.
>   (__arm_vcmpneq): Likewise.
>   (__arm_vfmaq_m): Likewise.
>   (__arm_vfmaq): Likewise.
>   (__arm_vfmasq_m): Likewise.
>   (__arm_vfmasq): Likewise.
>   (__arm_vmaxnmavq): Likewise.
>   (__arm_vmaxnmavq_p): Likewise.
>   (__arm_vmaxnmvq): Likewise.
>   (__arm_vmaxnmvq_p): Likewise.
>   (__arm_vminnmavq): Likewise.
>   (__arm_vminnmavq_p): Likewise.
>   (__arm_vminnmvq): Likewise.
>   (__arm_vminnmvq_p): Likewise.
>   (__arm_vmulq_m): Likewise.
>   (__arm_vmulq): Likewise.
>   (__arm_vmulq_x): Likewise.
>   (__arm_vsetq_lane): Likewise.
>   (__arm_vsubq_m): Likewise.
>   (__arm_vsubq): Likewise.
>   (__arm_vsubq_x): Likewise.
> 
> gcc/testsuite/ChangeLog:
> 
>   PR target/96795
>   * gcc.target/arm/mve/intrinsics/mve_fp_vaddq_n.c: New Test.
>   * gcc.target/arm/mve/intrinsics/mve_vaddq_n.c: Likewise.
>   * gcc.target/arm/mve/intrinsics/vaddq_m_n_f16-1.c: Likewise.
>   * gcc.target/arm/mve/intrinsics/vaddq_m_n_f32-1.c: Likewise.
>   * gcc.target/arm/mve/intrinsics/vaddq_x_n_f16-1.c: Likewise.
>   * gcc.target/arm/mve/intrinsics/vaddq_x_n_f32-1.c: Likewise.
>   * gcc.target/arm/mve/intrinsics/vcmpeqq_m_n_f16-1.c: Likewise.
>   * gcc.target/arm/mve/intrinsics/vcmpeqq_m_n_f32-1.c: Likewise.
>   * gcc.target/arm/mve/intrinsics/vcmpeqq_n_f16-1.c: Likewise.
>   * gcc.target/arm/mve/intrinsics/vcmpeqq_n_f32-1.c: Likewise.
>   * gcc.target/arm/mve/intrinsics/vcmpgeq_m_n_f16-1.c: Likewise.
>   * gcc.target/arm/mve/intrinsics/vcmpgeq_m_n_f32-1.c: Likewise.
>   * gcc.target/arm/mve/intrinsics/vcmpgeq_n_f16-1.c: Likewise.
>   * gcc.target/arm/mve/intrinsics/vcmpgeq_n_f32-1.c: Likewise.
>   * gcc.target/arm/mve/intrinsics/vcmpgtq_m_n_f16-1.c: Likewise.
>   * gcc.target/arm/mve/intrinsics/vcmpgtq_m_n_f32-1.c: Likewise.
>   * gcc.target/arm/mve/intrinsics/vcmpgtq_n_f16-1.c: Likewise.
>   * gcc.target/arm/mve/intrinsics/vcmpgtq_n_f32-1.c: Likewise.
>   * gcc.target/arm/mve/intrinsics/vcmpleq_m_n_f16-1.c: Likewise.
>   * gcc.target/arm/mve/intrinsics/vcmpleq_m_n_f32-1.c: Likewise.
>   * gcc.target/arm/mve/intrinsics/vcmpleq_n_f16-1.c: Likewise.
>   * gcc.target/arm/mve/intrinsics/vcmpleq_n_f32-1.c: Likewise.
>   * gcc.target/arm/mve/intrinsics/vcmpltq_m_n_f16-1.c: Likewise.
>   * gcc.target/arm/mve/intrinsics/vcmpltq_m_n_f32-1.c: Likewise.
>   * gcc.target/arm/mve/intrinsics/vcmpltq_n_f16-1.c: Likewise.
>   * gcc.target/arm/mve/intrinsics/vcmpltq_n_f32-1.c: Likewise.
>   * gcc.target/arm/mve/intrinsics/vcmpneq_m_n_f16-1.c: Likewise.
>   * gcc.target/arm/mve/intrinsics/vcmpneq_m_n_f32-1.c: Likewise.
>   * gcc.target/arm/mve/intrinsics/vcmpneq_n_f16-1.c: Likewise.
>   * gcc.target/arm/mve/intrinsics/vcmpneq_n_f32-1.c: Likewise.
>   * gcc.target/arm/mve/intrinsics/vfmaq_m_n_f16-1.c: Likewise.
>   * gcc.target/arm/mve/intrinsics/vfmaq_m_n_f32-1.c: Likewise.
>   * gcc.target/arm/mve/intrinsics/vfmaq_n_f16-1.c: Likewise.
>   * gcc.target/arm/mve/intrinsics/vfmaq_n_f32-1.c: Likewise.
>   * gcc.target/arm/mve/intrinsics/vfmasq_m_n_f16-1.c: Likewise.
>   * gcc.target/arm/mve/intrinsics/vfmasq_m_n_f32-1.c: Likewise.
>   * 

Re: [PATCH 1/1] arm: [testsuite] Skip thumb2-cond-cmp tests on Cortex-M [PR94595]

2020-09-30 Thread Christophe Lyon via Gcc-patches
Ping?

On Thu, 24 Sep 2020 at 15:18, Christophe Lyon
 wrote:
>
> Ping?
>
> On Mon, 7 Sep 2020 at 18:13, Christophe Lyon  
> wrote:
> >
> > Since r204778 (g571880a0a4c512195aa7d41929ba6795190887b2), we favor
> > branches over IT blocks on Cortex-M. As a result, instead of
> > generating two nested IT blocks in thumb2-cond-cmp-[1234].c, we
> > generate either a single IT block, or use branches depending on
> > conditions tested by the program.
> >
> > Since this was a deliberate change and the tests still pass as
> > > expected on Cortex-A, this patch skips them when targeting
> > > Cortex-M. This avoids the failures on Cortex-M3, M4, and M33.  This
> > > patch also makes the testcases unsupported on Cortex-M7, although they
> > > pass there because this CPU has different branch costs.
> > >
> > > I tried to relax the scan-assembler directives using e.g. cmpne|subne
> > > or cmpgt|ble but that seemed fragile.
> >
> > OK?
> >
> > 2020-09-07  Christophe Lyon  
> >
> > gcc/testsuite/
> > PR target/94595
> > * gcc.target/arm/thumb2-cond-cmp-1.c: Skip if arm_cortex_m.
> > * gcc.target/arm/thumb2-cond-cmp-2.c: Skip if arm_cortex_m.
> > * gcc.target/arm/thumb2-cond-cmp-3.c: Skip if arm_cortex_m.
> > > * gcc.target/arm/thumb2-cond-cmp-4.c: Skip if arm_cortex_m.
> > ---
> >  gcc/testsuite/gcc.target/arm/thumb2-cond-cmp-1.c | 2 +-
> >  gcc/testsuite/gcc.target/arm/thumb2-cond-cmp-2.c | 2 +-
> >  gcc/testsuite/gcc.target/arm/thumb2-cond-cmp-3.c | 2 +-
> >  gcc/testsuite/gcc.target/arm/thumb2-cond-cmp-4.c | 2 +-
> >  4 files changed, 4 insertions(+), 4 deletions(-)
> >
> > diff --git a/gcc/testsuite/gcc.target/arm/thumb2-cond-cmp-1.c 
> > b/gcc/testsuite/gcc.target/arm/thumb2-cond-cmp-1.c
> > index 45ab605..36204f4 100644
> > --- a/gcc/testsuite/gcc.target/arm/thumb2-cond-cmp-1.c
> > +++ b/gcc/testsuite/gcc.target/arm/thumb2-cond-cmp-1.c
> > @@ -1,6 +1,6 @@
> >  /* Use conditional compare */
> >  /* { dg-options "-O2" } */
> > -/* { dg-skip-if "" { arm_thumb1_ok } } */
> > +/* { dg-skip-if "" { arm_thumb1_ok || arm_cortex_m } } */
> >  /* { dg-final { scan-assembler "cmpne" } } */
> >
> >  int f(int i, int j)
> > diff --git a/gcc/testsuite/gcc.target/arm/thumb2-cond-cmp-2.c 
> > b/gcc/testsuite/gcc.target/arm/thumb2-cond-cmp-2.c
> > index 17d9a8f..108d1c3 100644
> > --- a/gcc/testsuite/gcc.target/arm/thumb2-cond-cmp-2.c
> > +++ b/gcc/testsuite/gcc.target/arm/thumb2-cond-cmp-2.c
> > @@ -1,6 +1,6 @@
> >  /* Use conditional compare */
> >  /* { dg-options "-O2" } */
> > -/* { dg-skip-if "" { arm_thumb1_ok } } */
> > +/* { dg-skip-if "" { arm_thumb1_ok || arm_cortex_m } } */
> >  /* { dg-final { scan-assembler "cmpeq" } } */
> >
> >  int f(int i, int j)
> > diff --git a/gcc/testsuite/gcc.target/arm/thumb2-cond-cmp-3.c 
> > b/gcc/testsuite/gcc.target/arm/thumb2-cond-cmp-3.c
> > index 6b2a79b..ca7fd9f 100644
> > --- a/gcc/testsuite/gcc.target/arm/thumb2-cond-cmp-3.c
> > +++ b/gcc/testsuite/gcc.target/arm/thumb2-cond-cmp-3.c
> > @@ -1,6 +1,6 @@
> >  /* Use conditional compare */
> >  /* { dg-options "-O2" } */
> > -/* { dg-skip-if "" { arm_thumb1_ok } } */
> > +/* { dg-skip-if "" { arm_thumb1_ok || arm_cortex_m } } */
> >  /* { dg-final { scan-assembler "cmpgt" } } */
> >
> >  int f(int i, int j)
> > diff --git a/gcc/testsuite/gcc.target/arm/thumb2-cond-cmp-4.c 
> > b/gcc/testsuite/gcc.target/arm/thumb2-cond-cmp-4.c
> > index 80e1076..91cc8f4 100644
> > --- a/gcc/testsuite/gcc.target/arm/thumb2-cond-cmp-4.c
> > +++ b/gcc/testsuite/gcc.target/arm/thumb2-cond-cmp-4.c
> > @@ -1,6 +1,6 @@
> >  /* Use conditional compare */
> >  /* { dg-options "-O2" } */
> > -/* { dg-skip-if "" { arm_thumb1_ok } } */
> > +/* { dg-skip-if "" { arm_thumb1_ok || arm_cortex_m } } */
> >  /* { dg-final { scan-assembler "cmpgt" } } */
> >
> >  int f(int i, int j)
> > --
> > 2.7.4
> >


c++: Kill DECL_HIDDEN_FRIEND_P

2020-09-30 Thread Nathan Sidwell


Now that hiddenness is managed by name-lookup, we no longer need
DECL_HIDDEN_FRIEND_P.  This patch removes it, mainly by deleting its
bookkeeping, but a couple of uses needed replacing:


1) two name lookups look at it to see if they found a hidden thing.
In one we have the OVERLOAD, so can record OVL_HIDDEN_P.  In the other
we're repeating a lookup that failed, but asking for hidden things --
so if that succeeds we know the thing was hidden.  (FWIW CWG recently
discussed whether template specializations and instantiations should
see such hidden templates anyway; there is compiler divergence.)

2) We had a confusing setting of KOENIG_P when building a
non-dependent call.  We don't repeat that lookup at instantiation time
anyway.
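
For readers less familiar with the term, a hidden friend is a friend function whose only declaration is inside a class, so ordinary lookup cannot see it and only argument-dependent lookup finds it. The following is a minimal sketch of that language rule; the names lib, Widget and peek are invented for illustration, and the example says nothing about how the compiler tracks hiddenness internally.

```cpp
#include <cassert>

namespace lib {

struct Widget
{
  int value;

  // Declared (and defined) only inside the class: a hidden friend.
  // There is no namespace-scope declaration of lib::peek.
  friend int peek (const Widget &w) { return w.value; }
};

} // namespace lib

int call_via_adl ()
{
  lib::Widget w{42};
  // Unqualified call: argument-dependent lookup searches namespace lib
  // because of the argument type and finds the hidden friend.
  return peek (w);
  // A qualified call such as lib::peek (w) would not compile, because
  // qualified lookup does not see the hidden declaration.
}
```

Adding a declaration `int peek (const Widget &);` at namespace scope after the class would make the function visible to ordinary lookup as well, at which point it is no longer hidden.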

gcc/cp/
* cp-tree.h (struct lang_decl_fn): Remove hidden_friend_p.
(DECL_HIDDEN_FRIEND_P): Delete.
* call.c (add_function_candidate): Drop assert about anticipated
decl.
(build_new_op_1): Drop koenig lookup flagging for hidden friend.
* decl.c (duplicate_decls): Drop HIDDEN_FRIEND_P updating.
* name-lookup.c (do_pushdecl): Likewise.
(set_decl_namespace): Discover hiddenness from OVL_HIDDEN_P.
* pt.c (check_explicit_specialization): Record found_hidden
explicitly.

pushing to trunk

nathan


--
Nathan Sidwell
diff --git i/gcc/cp/call.c w/gcc/cp/call.c
index 5606389f4bd..da013e17e14 100644
--- i/gcc/cp/call.c
+++ w/gcc/cp/call.c
@@ -2220,11 +2220,6 @@ add_function_candidate (struct z_candidate **candidates,
   int viable = 1;
   struct rejection_reason *reason = NULL;
 
-  /* At this point we should not see any functions which haven't been
- explicitly declared, except for friend functions which will have
- been found using argument dependent lookup.  */
-  gcc_assert (!DECL_ANTICIPATED (fn) || DECL_HIDDEN_FRIEND_P (fn));
-
   /* The `this', `in_chrg' and VTT arguments to constructors are not
  considered in overload resolution.  */
   if (DECL_CONSTRUCTOR_P (fn))
@@ -6344,11 +6339,6 @@ build_new_op_1 (const op_location_t , enum tree_code code, int flags,
 	  tree call = extract_call_expr (result);
 	  CALL_EXPR_OPERATOR_SYNTAX (call) = true;
 
-	  if (processing_template_decl && DECL_HIDDEN_FRIEND_P (cand->fn))
-		/* This prevents build_new_function_call from discarding this
-		   function during instantiation of the enclosing template.  */
-		KOENIG_LOOKUP_P (call) = 1;
-
 	  /* Specify evaluation order as per P0145R2.  */
 	  CALL_EXPR_ORDERED_ARGS (call) = false;
 	  switch (op_is_ordered (code))
diff --git i/gcc/cp/cp-tree.h w/gcc/cp/cp-tree.h
index a25934e3263..762a3519b7c 100644
--- i/gcc/cp/cp-tree.h
+++ w/gcc/cp/cp-tree.h
@@ -2720,14 +2720,13 @@ struct GTY(()) lang_decl_fn {
   unsigned thunk_p : 1;
 
   unsigned this_thunk_p : 1;
-  unsigned hidden_friend_p : 1;
   unsigned omp_declare_reduction_p : 1;
   unsigned has_dependent_explicit_spec_p : 1;
   unsigned immediate_fn_p : 1;
   unsigned maybe_deleted : 1;
   unsigned coroutine_p : 1;
 
-  unsigned spare : 9;
+  unsigned spare : 10;
 
   /* 32-bits padding on 64-bit host.  */
 
@@ -4067,12 +4066,6 @@ more_aggr_init_expr_args_p (const aggr_init_expr_arg_iterator *iter)
 #define DECL_OMP_PRIVATIZED_MEMBER(NODE) \
   (DECL_LANG_SPECIFIC (VAR_DECL_CHECK (NODE))->u.base.anticipated_p)
 
-/* Nonzero if NODE is a FUNCTION_DECL which was declared as a friend
-   within a class but has not been declared in the surrounding scope.
-   The function is invisible except via argument dependent lookup.  */
-#define DECL_HIDDEN_FRIEND_P(NODE) \
-  (LANG_DECL_FN_CHECK (DECL_COMMON_CHECK (NODE))->hidden_friend_p)
-
 /* Nonzero if NODE is an artificial FUNCTION_DECL for
#pragma omp declare reduction.  */
 #define DECL_OMP_DECLARE_REDUCTION_P(NODE) \
diff --git i/gcc/cp/decl.c w/gcc/cp/decl.c
index 617b96e02e4..14742c115ad 100644
--- i/gcc/cp/decl.c
+++ w/gcc/cp/decl.c
@@ -2141,10 +2141,7 @@ duplicate_decls (tree newdecl, tree olddecl, bool hiding, bool was_hidden)
   olddecl_hidden_friend = olddecl_friend && was_hidden;
   hidden_friend = olddecl_hidden_friend && hiding;
   if (!hidden_friend)
-	{
-	  DECL_ANTICIPATED (olddecl) = 0;
-	  DECL_HIDDEN_FRIEND_P (olddecl) = 0;
-	}
+	DECL_ANTICIPATED (olddecl) = false;
 }
 
   if (TREE_CODE (newdecl) == TEMPLATE_DECL)
@@ -2892,12 +2889,9 @@ duplicate_decls (tree newdecl, tree olddecl, bool hiding, bool was_hidden)
 
   DECL_UID (olddecl) = olddecl_uid;
   if (olddecl_friend)
-DECL_FRIEND_P (olddecl) = 1;
+DECL_FRIEND_P (olddecl) = true;
   if (hidden_friend)
-{
-  DECL_ANTICIPATED (olddecl) = 1;
-  DECL_HIDDEN_FRIEND_P (olddecl) = 1;
-}
+DECL_ANTICIPATED (olddecl) = true;
 
   /* NEWDECL contains the merged attribute lists.
  Update OLDDECL to be the same.  */
diff --git i/gcc/cp/name-lookup.c w/gcc/cp/name-lookup.c
index bc60d343f7e..8cd6fe38271 100644
--- i/gcc/cp/name-lookup.c
+++ w/gcc/cp/name-lookup.c
@@ -3172,7 +3172,7 @@ 

Re: [PATCH] options: Save and restore opts_set for Optimization and Target options

2020-09-30 Thread Stefan Schulze Frielinghaus via Gcc-patches
On Wed, Sep 30, 2020 at 01:39:11PM +0200, Jakub Jelinek wrote:
> On Wed, Sep 30, 2020 at 01:21:44PM +0200, Stefan Schulze Frielinghaus wrote:
> > I think the problem boils down to the fact that on S/390 we distinguish
> > between four states of a flag: explicitly set to yes/no and implicitly
> > set to yes/no.  If set explicitly, the option wins.  For example, the
> > options `-march=z10 -mhtm` should enable the hardware transactional
> > memory option although z10 does not have one.  In the past, whether a
> > flag was set explicitly was encoded into opts_set->x_target_flags ...
> > for each flag individually, e.g. TARGET_OPT_HTM_P (opts_set->x_target_flags) was
> 
> Oops, seems I've missed that set_option has special treatment for
> CLVC_BIT_CLEAR/CLVC_BIT_SET.
> Which means I'll need to change the generic handling, so that
> global_options_set elements mentioned in CLVC_BIT_* options are treated
> differently: instead of using the accumulated bitmasks they'll need to use
> their specific bitmask variables during the option saving/restoring.
> Is it ok if I defer it until tomorrow? Need to prepare for OpenMP meeting now.

Sure, no problem at all.  In that case I will stop investigating further
and wait for you.

Cheers,
Stefan
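
The four flag states described in the quoted text above (explicit yes/no, implicit yes/no) can be modelled with two parallel bitmasks, one holding the current values and one recording which bits the user set explicitly. The sketch below is only an illustration of that idea: the names MASK_HTM, flag_state, set_explicit and apply_arch_default are hypothetical and are not the real S/390 back-end or options-machinery definitions.

```cpp
#include <cassert>
#include <cstdint>

// Hypothetical mask for one target flag (not the real S/390 value).
constexpr std::uint32_t MASK_HTM = 1u << 0;

struct flag_state
{
  std::uint32_t values = 0;         // current on/off value of each flag
  std::uint32_t explicit_mask = 0;  // bit set => user gave the option
};

// Record an explicit -mfoo / -mno-foo from the command line.
void set_explicit (flag_state &st, std::uint32_t mask, bool on)
{
  st.explicit_mask |= mask;
  if (on)
    st.values |= mask;
  else
    st.values &= ~mask;
}

// Apply the default implied by -march=, but only to flags the user
// did not set explicitly: an explicit setting always wins.
void apply_arch_default (flag_state &st, std::uint32_t mask, bool arch_has_it)
{
  if (st.explicit_mask & mask)
    return;
  if (arch_has_it)
    st.values |= mask;
  else
    st.values &= ~mask;
}

// Convenience wrapper covering all four states for the HTM flag.
bool htm_enabled (bool user_gave_option, bool user_value, bool arch_has_it)
{
  flag_state st;
  if (user_gave_option)
    set_explicit (st, MASK_HTM, user_value);
  apply_arch_default (st, MASK_HTM, arch_has_it);
  return (st.values & MASK_HTM) != 0;
}
```

In this model, `htm_enabled (true, true, false)` corresponds to the `-march=z10 -mhtm` case from the thread: the architecture default would leave the flag off, but the explicit bit keeps it on. Saving and restoring options correctly then requires preserving the per-flag explicit bit, not just a single "something was set" indicator for the whole word.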


RE: [PATCH][GCC][AArch64] Add support for Cortex-A78 and Cortex-A78AE

2020-09-30 Thread Kyrylo Tkachov via Gcc-patches



> -Original Message-
> From: Przemyslaw Wirkus 
> Sent: 30 September 2020 11:39
> To: gcc-patches@gcc.gnu.org
> Cc: Richard Earnshaw ; Richard Sandiford
> ; Kyrylo Tkachov
> ; Marcus Shawcroft
> 
> Subject: [PATCH][GCC][AArch64] Add support for Cortex-A78 and
> Cortex-A78AE
> 
> This patch introduces support for Cortex-A78 [0] and Cortex-A78AE [1]
> cpus.
> 
> [0]: https://www.arm.com/products/silicon-ip-cpu/cortex-a/cortex-a78
> [1]: https://www.arm.com/products/silicon-ip-cpu/cortex-a/cortex-a78ae
> 
> OK for master branch ?

Ok.
Thanks,
Kyrill

> 
> kind regards
> Przemyslaw Wirkus
> 
> gcc/ChangeLog:
> 
>   * config/aarch64/aarch64-cores.def: Add Cortex-A78 and
>   Cortex-A78AE cores.
>   * config/aarch64/aarch64-tune.md: Regenerate.
>   * doc/invoke.texi: Add -mtune=cortex-a78 and -mtune=cortex-a78ae.


RE: [PATCH][GCC][ARM] Add support for Cortex-A78 and Cortex-A78AE

2020-09-30 Thread Kyrylo Tkachov via Gcc-patches



> -Original Message-
> From: Przemyslaw Wirkus 
> Sent: 30 September 2020 11:42
> To: gcc-patches@gcc.gnu.org
> Cc: ni...@redhat.com; Ramana Radhakrishnan
> ; Kyrylo Tkachov
> ; Richard Earnshaw
> 
> Subject: [PATCH][GCC][ARM] Add support for Cortex-A78 and Cortex-A78AE
> 
> This patch introduces support for Cortex-A78 [0] and Cortex-A78AE [1]
> cpus.
> 
>   [0]: https://www.arm.com/products/silicon-ip-cpu/cortex-a/cortex-a78
>   [1]: https://www.arm.com/products/silicon-ip-cpu/cortex-a/cortex-a78ae
> 
> OK for master branch ?

Ok.
Thanks,
Kyrill

> 
> kind regards
> Przemyslaw Wirkus
> 
> gcc/ChangeLog:
> 
>   * config/arm/arm-cpus.in: Add Cortex-A78 and Cortex-A78AE cores.
>   * config/arm/arm-tables.opt: Regenerate.
>   * config/arm/arm-tune.md: Regenerate.
>   * doc/invoke.texi: Update docs.


Re: [PATCH] PR target/97250: i386: Add support for x86-64-v2, x86-64-v3, x86-64-v4 levels for x86-64

2020-09-30 Thread Jakub Jelinek via Gcc-patches
On Wed, Sep 30, 2020 at 02:27:38PM +0200, Florian Weimer wrote:
> --- a/gcc/doc/invoke.texi
> +++ b/gcc/doc/invoke.texi
> @@ -29258,6 +29258,13 @@ of the selected instruction set.
>  @item x86-64
>  A generic CPU with 64-bit extensions.
>  
> +@item x86-64-v2
> +@itemx x86-64-v3
> +@itemx x86-64-v4
> +These choices for @var{cpu-type} select the corresponding
> +micro-architecture level from the x86-64 psABI.  They are only available
> +when compiling for a x86-64 target that uses the System V psABI@.

I think the documentation should state that these are not valid in -mtune=,
just in -march=, and that using -march=x86-64-v* will not change tuning.
I guess there should be some testsuite coverage for the possibly unexpected
behavior of
-march=skylake -march=x86-64-v3
actually acting as
-march=x86-64-v3 -mtune=skylake
though perhaps it needs to be skipped if the user used an explicit -mtune=,
and I'm not sure how to actually test that (-fverbose-asm doesn't print
-mtune= when it is not explicit).

Jakub
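
The interaction described above can be pictured with a toy model in which each alias table entry carries a "no tune" marker and the -march= values are applied in command-line order. This is purely an illustration of the observable behavior under that assumption; alias_entry, alias_table, selection and apply_march are invented names, the defaults are assumed, and the sketch makes no claim about how GCC's option handling actually arrives at the result.

```cpp
#include <cassert>
#include <string>
#include <vector>

// Hypothetical, heavily simplified stand-in for an alias table entry.
struct alias_entry
{
  const char *name;
  bool no_tune;  // models the idea behind PTA_NO_TUNE
};

static const alias_entry alias_table[] = {
  {"skylake", false},
  {"x86-64", false},
  {"x86-64-v3", true},
};

struct selection
{
  std::string arch;
  std::string tune;
};

// Apply each -march= value in order.  An ordinary entry sets both the
// architecture and the default tuning; a no_tune entry sets only the
// architecture and leaves whatever tuning was already selected.
selection apply_march (const std::vector<std::string> &marches)
{
  selection sel{"x86-64", "generic"};  // assumed starting defaults
  for (const std::string &m : marches)
    for (const alias_entry &e : alias_table)
      if (m == e.name)
        {
          sel.arch = m;
          if (!e.no_tune)
            sel.tune = m;
        }
  return sel;
}
```

With this model, `apply_march ({"skylake", "x86-64-v3"})` yields arch "x86-64-v3" and tune "skylake", which is the combination the message above suggests covering in the testsuite.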



Re: [PATCH] PR target/97250: i386: Add support for x86-64-v2, x86-64-v3, x86-64-v4 levels for x86-64

2020-09-30 Thread Florian Weimer
* Uros Bizjak:

> On Wed, Sep 30, 2020 at 2:27 PM Florian Weimer  wrote:
>>
>> These micro-architecture levels are defined in the x86-64 psABI:
>>
>> https://gitlab.com/x86-psABIs/x86-64-ABI/-/commit/77566eb03bc6a326811cb7e9
>>
>> PTA_NO_TUNE is introduced so that the new processor alias table entries
>> do not affect the CPU tuning setting in ix86_tune.
>>
>> The tests depend on the macros added in commit 92e652d8c21bd7e66cbb0f900
>> ("i386: Define __LAHF_SAHF__ and __MOVBE__ macros, based on ISA flags").
>>
>> gcc/:
>> PR target/97250
>> * config/i386/i386.h (PTA_NO_TUNE, PTA_X86_64_BASELINE)
>> (PTA_X86_64_V2, PTA_X86_64_V3, PTA_X86_64_V4): New.
>> * common/config/i386/i386-common.c (processor_alias_table):
>> Add "x86-64-v2", "x86-64-v3", "x86-64-v4".
>> * config/i386/i386-options.c (ix86_option_override_internal):
>> Handle new PTA_NO_TUNE processor table entries.
>> * doc/invoke.texi (x86 Options): Document new -march values.
>>
>> gcc/testsuite/:
>> PR target/97250
>> * gcc.target/i386/x86-64-v2.c: New test.
>> * gcc.target/i386/x86-64-v3.c: New test.
>> * gcc.target/i386/x86-64-v4.c: New test.
>
> Perhaps you should also test for the newly introduced __LAHF_SAHF__ define?

Like this?  Thanks.

8<--8<
These micro-architecture levels are defined in the x86-64 psABI:

https://gitlab.com/x86-psABIs/x86-64-ABI/-/commit/77566eb03bc6a326811cb7e9

PTA_NO_TUNE is introduced so that the new processor alias table entries
do not affect the CPU tuning setting in ix86_tune.

The tests depend on the macros added in commit 92e652d8c21bd7e66cbb0f900
("i386: Define __LAHF_SAHF__ and __MOVBE__ macros, based on ISA flags").

gcc/:
PR target/97250
* config/i386/i386.h (PTA_NO_TUNE, PTA_X86_64_BASELINE)
(PTA_X86_64_V2, PTA_X86_64_V3, PTA_X86_64_V4): New.
* common/config/i386/i386-common.c (processor_alias_table):
Add "x86-64-v2", "x86-64-v3", "x86-64-v4".
* config/i386/i386-options.c (ix86_option_override_internal):
Handle new PTA_NO_TUNE processor table entries.
* doc/invoke.texi (x86 Options): Document new -march values.

gcc/testsuite/:
PR target/97250
* gcc.target/i386/x86-64-v2.c: New test.
* gcc.target/i386/x86-64-v3.c: New test.
* gcc.target/i386/x86-64-v4.c: New test.

---
 gcc/common/config/i386/i386-common.c  |  10 ++-
 gcc/config/i386/i386-options.c|  27 +--
 gcc/config/i386/i386.h|  11 ++-
 gcc/doc/invoke.texi   |   7 ++
 gcc/testsuite/gcc.target/i386/x86-64-v2.c | 116 ++
 gcc/testsuite/gcc.target/i386/x86-64-v3.c | 116 ++
 gcc/testsuite/gcc.target/i386/x86-64-v4.c | 116 ++
 7 files changed, 394 insertions(+), 9 deletions(-)

diff --git a/gcc/common/config/i386/i386-common.c 
b/gcc/common/config/i386/i386-common.c
index 10142149115..62a620b4430 100644
--- a/gcc/common/config/i386/i386-common.c
+++ b/gcc/common/config/i386/i386-common.c
@@ -1795,9 +1795,13 @@ const pta processor_alias_table[] =
 PTA_MMX | PTA_3DNOW | PTA_3DNOW_A | PTA_SSE | PTA_FXSR, 0, P_NONE},
   {"athlon-mp", PROCESSOR_ATHLON, CPU_ATHLON,
 PTA_MMX | PTA_3DNOW | PTA_3DNOW_A | PTA_SSE | PTA_FXSR, 0, P_NONE},
-  {"x86-64", PROCESSOR_K8, CPU_K8,
-PTA_64BIT | PTA_MMX | PTA_SSE | PTA_SSE2 | PTA_NO_SAHF | PTA_FXSR,
-0, P_NONE},
+  {"x86-64", PROCESSOR_K8, CPU_K8, PTA_X86_64_BASELINE, 0, P_NONE},
+  {"x86-64-v2", PROCESSOR_K8, CPU_GENERIC, PTA_X86_64_V2 | PTA_NO_TUNE,
+   0, P_NONE},
+  {"x86-64-v3", PROCESSOR_K8, CPU_GENERIC, PTA_X86_64_V3 | PTA_NO_TUNE,
+   0, P_NONE},
+  {"x86-64-v4", PROCESSOR_K8, CPU_GENERIC, PTA_X86_64_V4 | PTA_NO_TUNE,
+   0, P_NONE},
   {"eden-x2", PROCESSOR_K8, CPU_K8,
 PTA_64BIT | PTA_MMX | PTA_SSE | PTA_SSE2 | PTA_SSE3 | PTA_FXSR,
 0, P_NONE},
diff --git a/gcc/config/i386/i386-options.c b/gcc/config/i386/i386-options.c
index 597de533fbd..cf48a911798 100644
--- a/gcc/config/i386/i386-options.c
+++ b/gcc/config/i386/i386-options.c
@@ -2058,10 +2058,25 @@ ix86_option_override_internal (bool main_args_p,
return false;
  }
 
+   /* Only the x86-64 psABI defines the feature-only
+  micro-architecture levels that use PTA_NO_TUNE.  */
+   if ((processor_alias_table[i].flags & PTA_NO_TUNE) != 0
+   && (!TARGET_64BIT_P (opts->x_ix86_isa_flags)
+   || opts->x_ix86_abi != SYSV_ABI))
+ {
+   error (G_("%<%s%> architecture level is only defined"
+ " for the x86-64 psABI"), opts->x_ix86_arch_string);
+   return false;
+ }
+
ix86_schedule = processor_alias_table[i].schedule;
ix86_arch = processor_alias_table[i].processor;
-   /* Default cpu tuning to the architecture.  */
-   ix86_tune = 

Re: [PATCH] PR target/97250: i386: Add support for x86-64-v2, x86-64-v3, x86-64-v4 levels for x86-64

2020-09-30 Thread Uros Bizjak via Gcc-patches
On Wed, Sep 30, 2020 at 2:27 PM Florian Weimer  wrote:
>
> These micro-architecture levels are defined in the x86-64 psABI:
>
> https://gitlab.com/x86-psABIs/x86-64-ABI/-/commit/77566eb03bc6a326811cb7e9
>
> PTA_NO_TUNE is introduced so that the new processor alias table entries
> do not affect the CPU tuning setting in ix86_tune.
>
> The tests depend on the macros added in commit 92e652d8c21bd7e66cbb0f900
> ("i386: Define __LAHF_SAHF__ and __MOVBE__ macros, based on ISA flags").
>
> gcc/:
> PR target/97250
> * config/i386/i386.h (PTA_NO_TUNE, PTA_X86_64_BASELINE)
> (PTA_X86_64_V2, PTA_X86_64_V3, PTA_X86_64_V4): New.
> * common/config/i386/i386-common.c (processor_alias_table):
> Add "x86-64-v2", "x86-64-v3", "x86-64-v4".
> * config/i386/i386-options.c (ix86_option_override_internal):
> Handle new PTA_NO_TUNE processor table entries.
> * doc/invoke.texi (x86 Options): Document new -march values.
>
> gcc/testsuite/:
> PR target/97250
> * gcc.target/i386/x86-64-v2.c: New test.
> * gcc.target/i386/x86-64-v3.c: New test.
> * gcc.target/i386/x86-64-v4.c: New test.

Perhaps you should also test for the newly introduced __LAHF_SAHF__ define?

Uros.
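
For context, checking such a define comes down to an ordinary preprocessor test. The contents of the new test files are not shown in this part of the thread, so the following is only a hedged sketch of the idea: isa_macros_defined is a hypothetical helper, and a real dg test would more likely fail the compilation with #error than count macros at run time.

```cpp
#include <cassert>

// Count how many of the two macros named in the thread are predefined
// for the current compiler invocation.  The result depends entirely on
// the -march= / ISA options in effect.
int isa_macros_defined ()
{
  int n = 0;
#ifdef __LAHF_SAHF__
  ++n;
#endif
#ifdef __MOVBE__
  ++n;
#endif
  return n;
}
```

Which macros a given micro-architecture level is expected to define is specified by the psABI and the patch itself, so no expected count is asserted here.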

> ---
>
> Notes (not going to be committed):
>
> I struggled a bit to avoid ICEs when I used PROCESSOR_GENERIC
> instead of PROCESSOR_K8 in the new processor alias table entries.  In
> the end, I think not resetting the tuning setting is the correct thing
> to do.
>
> Test results on x86-64 (on Debian buster) look okay-ish to me.  I see
> lots of obviously unrelated FAILs.
>
>  gcc/common/config/i386/i386-common.c  |  10 ++-
>  gcc/config/i386/i386-options.c|  27 +--
>  gcc/config/i386/i386.h|  11 ++-
>  gcc/doc/invoke.texi   |   7 ++
>  gcc/testsuite/gcc.target/i386/x86-64-v2.c | 113 ++
>  gcc/testsuite/gcc.target/i386/x86-64-v3.c | 113 ++
>  gcc/testsuite/gcc.target/i386/x86-64-v4.c | 113 ++
>  7 files changed, 385 insertions(+), 9 deletions(-)
>
> diff --git a/gcc/common/config/i386/i386-common.c b/gcc/common/config/i386/i386-common.c
> index 10142149115..62a620b4430 100644
> --- a/gcc/common/config/i386/i386-common.c
> +++ b/gcc/common/config/i386/i386-common.c
> @@ -1795,9 +1795,13 @@ const pta processor_alias_table[] =
>  PTA_MMX | PTA_3DNOW | PTA_3DNOW_A | PTA_SSE | PTA_FXSR, 0, P_NONE},
>{"athlon-mp", PROCESSOR_ATHLON, CPU_ATHLON,
>  PTA_MMX | PTA_3DNOW | PTA_3DNOW_A | PTA_SSE | PTA_FXSR, 0, P_NONE},
> -  {"x86-64", PROCESSOR_K8, CPU_K8,
> -PTA_64BIT | PTA_MMX | PTA_SSE | PTA_SSE2 | PTA_NO_SAHF | PTA_FXSR,
> -0, P_NONE},
> +  {"x86-64", PROCESSOR_K8, CPU_K8, PTA_X86_64_BASELINE, 0, P_NONE},
> +  {"x86-64-v2", PROCESSOR_K8, CPU_GENERIC, PTA_X86_64_V2 | PTA_NO_TUNE,
> +   0, P_NONE},
> +  {"x86-64-v3", PROCESSOR_K8, CPU_GENERIC, PTA_X86_64_V3 | PTA_NO_TUNE,
> +   0, P_NONE},
> +  {"x86-64-v4", PROCESSOR_K8, CPU_GENERIC, PTA_X86_64_V4 | PTA_NO_TUNE,
> +   0, P_NONE},
>{"eden-x2", PROCESSOR_K8, CPU_K8,
>  PTA_64BIT | PTA_MMX | PTA_SSE | PTA_SSE2 | PTA_SSE3 | PTA_FXSR,
>  0, P_NONE},
> diff --git a/gcc/config/i386/i386-options.c b/gcc/config/i386/i386-options.c
> index 597de533fbd..cf48a911798 100644
> --- a/gcc/config/i386/i386-options.c
> +++ b/gcc/config/i386/i386-options.c
> @@ -2058,10 +2058,25 @@ ix86_option_override_internal (bool main_args_p,
> return false;
>   }
>
> +   /* Only the x86-64 psABI defines the feature-only
> +  micro-architecture levels that use PTA_NO_TUNE.  */
> +   if ((processor_alias_table[i].flags & PTA_NO_TUNE) != 0
> +   && (!TARGET_64BIT_P (opts->x_ix86_isa_flags)
> +   || opts->x_ix86_abi != SYSV_ABI))
> + {
> +   error (G_("%<%s%> architecture level is only defined"
> + " for the x86-64 psABI"), opts->x_ix86_arch_string);
> +   return false;
> + }
> +
> ix86_schedule = processor_alias_table[i].schedule;
> ix86_arch = processor_alias_table[i].processor;
> -   /* Default cpu tuning to the architecture.  */
> -   ix86_tune = ix86_arch;
> +
> +   /* Default cpu tuning to the architecture, unless the table
> +  entry requests not to do this.  Used by the x86-64 psABI
> +  micro-architecture levels.  */
> +   if ((processor_alias_table[i].flags & PTA_NO_TUNE) == 0)
> + ix86_tune = ix86_arch;
>
> if (((processor_alias_table[i].flags & PTA_MMX) != 0)
> && !(opts->x_ix86_isa_flags_explicit & OPTION_MASK_ISA_MMX))
> @@ -2384,7 +2399,8 @@ ix86_option_override_internal (bool main_args_p,
>  ix86_arch_features[i] = !!(initial_ix86_arch_features[i] & ix86_arch_mask);
>
>for (i = 0; i < pta_size; i++)
> -if (! strcmp (opts->x_ix86_tune_string, 

Re: Ping: [PATCH] arm: Add new vector mode macros

2020-09-30 Thread Christophe Lyon via Gcc-patches
On Tue, 29 Sep 2020 at 12:38, Kyrylo Tkachov  wrote:
>
>
>
> > -Original Message-
> > From: Richard Sandiford 
> > Sent: 29 September 2020 11:27
> > To: Kyrylo Tkachov 
> > Cc: gcc-patches@gcc.gnu.org; ni...@redhat.com; Richard Earnshaw
> > ; Ramana Radhakrishnan
> > ; Dennis Zhang
> > 
> > Subject: Ping: [PATCH] arm: Add new vector mode macros
> >
> > Ping
> >
> > Richard Sandiford  writes:
> > > Kyrylo Tkachov  writes:
> > >> This looks like a productive way forward to me.
> > >> Okay if the other maintainers don't object by the end of the week.
> > >
> > > Thanks.  Dennis pointed out off-list that it regressed
> > > armv8_2-fp16-arith-2.c, which was expecting FP16 vectorisation
> > > to be rejected for -fno-fast-math.  As mentioned above, that shouldn't
> > > be necessary given that FP16 arithmetic (unlike FP32 arithmetic) has a
> > > flush-to-zero control.
> > >
> > > This version therefore updates the test to expect the same output
> > > as the -ffast-math version.
> > >
> > > Tested on arm-linux-gnueabi (hopefully for real this time -- I must
> > > have messed up the testing last time).  OK for trunk?
> > >
>
> Ok.
> Thanks,
> Kyrill
>

Hi Richard,

I didn't notice the initial regression you mention above, but after
this commit (r11-3522),
I see:
FAIL: gcc.target/arm/armv8_2-fp16-arith-2.c scan-assembler-times vabs\\.f16\\ts[0-9]+, s[0-9]+ 2
FAIL: gcc.target/arm/armv8_2-fp16-arith-2.c scan-assembler-times vmul\\.f16\\td[0-9]+, d[0-9]+, d[0-9]+ 1
FAIL: gcc.target/arm/armv8_2-fp16-arith-2.c scan-assembler-times vmul\\.f16\\tq[0-9]+, q[0-9]+, q[0-9]+ 1
FAIL: gcc.target/arm/armv8_2-fp16-arith-2.c scan-assembler-times vmul\\.f16\\ts[0-9]+, s[0-9]+, s[0-9]+ 1
FAIL: gcc.target/arm/armv8_2-fp16-arith-2.c scan-assembler-times vsub\\.f16\\td[0-9]+, d[0-9]+, d[0-9]+ 1
FAIL: gcc.target/arm/armv8_2-fp16-arith-2.c scan-assembler-times vsub\\.f16\\tq[0-9]+, q[0-9]+, q[0-9]+ 1
FAIL: gcc.target/arm/armv8_2-fp16-arith-2.c scan-assembler-times vsub\\.f16\\ts[0-9]+, s[0-9]+, s[0-9]+ 1

Looks like we are running validations differently?

Christophe

> > > FWIW, the non-testsuite part is the same as before.
> > >
> > > Richard
> > >
> > >
> > > gcc/
> > > * config/arm/arm.h (ARM_HAVE_NEON_V8QI_ARITH,
> > ARM_HAVE_NEON_V4HI_ARITH)
> > > (ARM_HAVE_NEON_V2SI_ARITH, ARM_HAVE_NEON_V16QI_ARITH):
> > New macros.
> > > (ARM_HAVE_NEON_V8HI_ARITH, ARM_HAVE_NEON_V4SI_ARITH):
> > Likewise.
> > > (ARM_HAVE_NEON_V2DI_ARITH, ARM_HAVE_NEON_V4HF_ARITH):
> > Likewise.
> > > (ARM_HAVE_NEON_V8HF_ARITH, ARM_HAVE_NEON_V2SF_ARITH):
> > Likewise.
> > > (ARM_HAVE_NEON_V4SF_ARITH, ARM_HAVE_V8QI_ARITH,
> > ARM_HAVE_V4HI_ARITH)
> > > (ARM_HAVE_V2SI_ARITH, ARM_HAVE_V16QI_ARITH,
> > ARM_HAVE_V8HI_ARITH)
> > > (ARM_HAVE_V4SI_ARITH, ARM_HAVE_V2DI_ARITH,
> > ARM_HAVE_V4HF_ARITH)
> > > (ARM_HAVE_V2SF_ARITH, ARM_HAVE_V8HF_ARITH,
> > ARM_HAVE_V4SF_ARITH):
> > > Likewise.
> > > * config/arm/iterators.md (VNIM, VNINOTM): Delete.
> > > * config/arm/vec-common.md (add3, addv8hf3)
> > > (add3): Replace with...
> > > (add3): ...this new expander.
> > > * config/arm/neon.md (*add3_neon): Use the new
> > > ARM_HAVE_NEON__ARITH macros as the C condition.
> > > (addv8hf3_neon, addv4hf3, add3_fp16): Delete in
> > > favor of the above.
> > > (neon_vadd): Use gen_add3 instead of
> > > gen_add3_fp16.
> > >
> > > gcc/testsuite/
> > > * gcc.target/arm/armv8_2-fp16-arith-2.c: Expect FP16 vectorization
> > > even without -ffast-math.
> > > ---
> > >  gcc/config/arm/arm.h  | 41 
> > >  gcc/config/arm/iterators.md   |  8 
> > >  gcc/config/arm/neon.md| 47 +--
> > >  gcc/config/arm/vec-common.md  | 42 ++---
> > >  .../gcc.target/arm/armv8_2-fp16-arith-2.c | 20 +---
> > >  5 files changed, 61 insertions(+), 97 deletions(-)
> > >
> > > diff --git a/gcc/config/arm/arm.h b/gcc/config/arm/arm.h
> > > index f4d3676c5bc..4a63d33c70d 100644
> > > --- a/gcc/config/arm/arm.h
> > > +++ b/gcc/config/arm/arm.h
> > > @@ -1110,6 +1110,47 @@ extern const int arm_arch_cde_coproc_bits[];
> > >  #define VALID_MVE_STRUCT_MODE(MODE) \
> > >((MODE) == TImode || (MODE) == OImode || (MODE) == XImode)
> > >
> > > +/* The conditions under which vector modes are supported for general
> > > +   arithmetic using Neon.  */
> > > +
> > > +#define ARM_HAVE_NEON_V8QI_ARITH TARGET_NEON
> > > +#define ARM_HAVE_NEON_V4HI_ARITH TARGET_NEON
> > > +#define ARM_HAVE_NEON_V2SI_ARITH TARGET_NEON
> > > +
> > > +#define ARM_HAVE_NEON_V16QI_ARITH TARGET_NEON
> > > +#define ARM_HAVE_NEON_V8HI_ARITH TARGET_NEON
> > > +#define ARM_HAVE_NEON_V4SI_ARITH TARGET_NEON
> > > +#define ARM_HAVE_NEON_V2DI_ARITH TARGET_NEON
> > > +
> > > +/* HF operations have their own flush-to-zero control (FPSCR.FZ16).  */
> > > +#define ARM_HAVE_NEON_V4HF_ARITH TARGET_NEON_FP16INST
> > > +#define 

Re: [Patch] Fortran: add contiguous check for ptr assignment, fix non-contig check (PR97242)

2020-09-30 Thread Paul Richard Thomas via Gcc-patches
Hi Tobias,

This looks good to me - OK for master.

Thanks for the patch

Paul


On Wed, 30 Sep 2020 at 09:59, Tobias Burnus  wrote:

> The non-contiguous check had both false positive and false
> negative results. Some more refinements
> are surely possible, but hopefully there are no longer
> false positives.
>
> I also now used this check for pointer assignments where the
> LHS pointer has the contiguous attribute.
>
> In the non-contiguous-check function:
> - for 'dt(i)%array' it returned true due to dt(i) but that's
>an element, which is contiguous.
> - ref_size (which is a size) is compared with 'arr_size' calculated
>    via dep_difference, which returns upper-lower, but the array size is
>    (upper-lower)+1.
> - fixed a memory leak.
>
> OK?
>
> Tobias
>
> -
> Mentor Graphics (Deutschland) GmbH, Arnulfstraße 201, 80634 München /
> Germany
> Registergericht München HRB 106955, Geschäftsführer: Thomas Heurung,
> Alexander Walter
>


-- 
"If you can't explain it simply, you don't understand it well enough" -
Albert Einstein


[PATCH] PR target/97250: i386: Add support for x86-64-v2, x86-64-v3, x86-64-v4 levels for x86-64

2020-09-30 Thread Florian Weimer
These micro-architecture levels are defined in the x86-64 psABI:

https://gitlab.com/x86-psABIs/x86-64-ABI/-/commit/77566eb03bc6a326811cb7e9

PTA_NO_TUNE is introduced so that the new processor alias table entries
do not affect the CPU tuning setting in ix86_tune.

The tests depend on the macros added in commit 92e652d8c21bd7e66cbb0f900
("i386: Define __LAHF_SAHF__ and __MOVBE__ macros, based on ISA flags").

gcc/:
PR target/97250
* config/i386/i386.h (PTA_NO_TUNE, PTA_X86_64_BASELINE)
(PTA_X86_64_V2, PTA_X86_64_V3, PTA_X86_64_V4): New.
* common/config/i386/i386-common.c (processor_alias_table):
Add "x86-64-v2", "x86-64-v3", "x86-64-v4".
* config/i386/i386-options.c (ix86_option_override_internal):
Handle new PTA_NO_TUNE processor table entries.
* doc/invoke.texi (x86 Options): Document new -march values.

gcc/testsuite/:
PR target/97250
* gcc.target/i386/x86-64-v2.c: New test.
* gcc.target/i386/x86-64-v3.c: New test.
* gcc.target/i386/x86-64-v4.c: New test.

---

Notes (not going to be committed);

I struggled a bit to avoid ICEs when I used PROCESSOR_GENERIC
instead of PROCESSOR_K8 in the new processor alias table entries.  In
the end, I think not resetting the tuning setting is the correct thing
to do.
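
A minimal sketch of the PTA_NO_TUNE idea described in this message, not the actual GCC code: an -march lookup that copies the architecture into the tuning setting unless the table entry carries a "no tune" flag, and an -mtune lookup that refuses such entries. The enum values, struct layout, table contents and the helper names apply_march/apply_mtune are hypothetical stand-ins chosen for illustration.

```c
#include <assert.h>
#include <stddef.h>
#include <string.h>

/* Hypothetical stand-ins for GCC's processor enum and PTA_* bits.  */
enum processor { PROC_GENERIC, PROC_K8, PROC_HASWELL };

#define PTA_64BIT   (1u << 0)
#define PTA_NO_TUNE (1u << 1)   /* Feature-only entry: leave tuning alone.  */

struct alias_entry
{
  const char *name;
  enum processor processor;
  unsigned flags;
};

static const struct alias_entry alias_table[] = {
  { "x86-64",    PROC_K8,      PTA_64BIT },
  { "x86-64-v2", PROC_K8,      PTA_64BIT | PTA_NO_TUNE },
  { "haswell",   PROC_HASWELL, PTA_64BIT },
};

#define ALIAS_COUNT (sizeof alias_table / sizeof alias_table[0])

struct cpu_state
{
  enum processor arch;
  enum processor tune;
};

/* Apply -march=NAME.  Returns 1 on success, 0 if NAME is unknown.  */
static int
apply_march (struct cpu_state *st, const char *name)
{
  assert (st != NULL && name != NULL);
  for (size_t i = 0; i < ALIAS_COUNT; i++)
    if (strcmp (name, alias_table[i].name) == 0)
      {
        st->arch = alias_table[i].processor;
        /* Default tuning to the architecture unless the entry opts out.  */
        if ((alias_table[i].flags & PTA_NO_TUNE) == 0)
          st->tune = st->arch;
        return 1;
      }
  return 0;
}

/* Apply -mtune=NAME.  Feature-only entries are not valid tuning targets.  */
static int
apply_mtune (struct cpu_state *st, const char *name)
{
  assert (st != NULL && name != NULL);
  for (size_t i = 0; i < ALIAS_COUNT; i++)
    if (strcmp (name, alias_table[i].name) == 0
        && (alias_table[i].flags & PTA_NO_TUNE) == 0)
      {
        st->tune = alias_table[i].processor;
        return 1;
      }
  return 0;
}
```

With a state that starts as generic/generic, apply_march (&st, "x86-64-v2") changes st.arch but leaves st.tune untouched, while apply_mtune (&st, "x86-64-v2") is rejected.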

Test results on x86-64 (on Debian buster) look okay-ish to me.  I see
lots of obviously unrelated FAILs.

 gcc/common/config/i386/i386-common.c  |  10 ++-
 gcc/config/i386/i386-options.c|  27 +--
 gcc/config/i386/i386.h|  11 ++-
 gcc/doc/invoke.texi   |   7 ++
 gcc/testsuite/gcc.target/i386/x86-64-v2.c | 113 ++
 gcc/testsuite/gcc.target/i386/x86-64-v3.c | 113 ++
 gcc/testsuite/gcc.target/i386/x86-64-v4.c | 113 ++
 7 files changed, 385 insertions(+), 9 deletions(-)

diff --git a/gcc/common/config/i386/i386-common.c b/gcc/common/config/i386/i386-common.c
index 10142149115..62a620b4430 100644
--- a/gcc/common/config/i386/i386-common.c
+++ b/gcc/common/config/i386/i386-common.c
@@ -1795,9 +1795,13 @@ const pta processor_alias_table[] =
 PTA_MMX | PTA_3DNOW | PTA_3DNOW_A | PTA_SSE | PTA_FXSR, 0, P_NONE},
   {"athlon-mp", PROCESSOR_ATHLON, CPU_ATHLON,
 PTA_MMX | PTA_3DNOW | PTA_3DNOW_A | PTA_SSE | PTA_FXSR, 0, P_NONE},
-  {"x86-64", PROCESSOR_K8, CPU_K8,
-PTA_64BIT | PTA_MMX | PTA_SSE | PTA_SSE2 | PTA_NO_SAHF | PTA_FXSR,
-0, P_NONE},
+  {"x86-64", PROCESSOR_K8, CPU_K8, PTA_X86_64_BASELINE, 0, P_NONE},
+  {"x86-64-v2", PROCESSOR_K8, CPU_GENERIC, PTA_X86_64_V2 | PTA_NO_TUNE,
+   0, P_NONE},
+  {"x86-64-v3", PROCESSOR_K8, CPU_GENERIC, PTA_X86_64_V3 | PTA_NO_TUNE,
+   0, P_NONE},
+  {"x86-64-v4", PROCESSOR_K8, CPU_GENERIC, PTA_X86_64_V4 | PTA_NO_TUNE,
+   0, P_NONE},
   {"eden-x2", PROCESSOR_K8, CPU_K8,
 PTA_64BIT | PTA_MMX | PTA_SSE | PTA_SSE2 | PTA_SSE3 | PTA_FXSR,
 0, P_NONE},
diff --git a/gcc/config/i386/i386-options.c b/gcc/config/i386/i386-options.c
index 597de533fbd..cf48a911798 100644
--- a/gcc/config/i386/i386-options.c
+++ b/gcc/config/i386/i386-options.c
@@ -2058,10 +2058,25 @@ ix86_option_override_internal (bool main_args_p,
return false;
  }
 
+   /* Only the x86-64 psABI defines the feature-only
+  micro-architecture levels that use PTA_NO_TUNE.  */
+   if ((processor_alias_table[i].flags & PTA_NO_TUNE) != 0
+   && (!TARGET_64BIT_P (opts->x_ix86_isa_flags)
+   || opts->x_ix86_abi != SYSV_ABI))
+ {
+   error (G_("%<%s%> architecture level is only defined"
+ " for the x86-64 psABI"), opts->x_ix86_arch_string);
+   return false;
+ }
+
ix86_schedule = processor_alias_table[i].schedule;
ix86_arch = processor_alias_table[i].processor;
-   /* Default cpu tuning to the architecture.  */
-   ix86_tune = ix86_arch;
+
+   /* Default cpu tuning to the architecture, unless the table
+  entry requests not to do this.  Used by the x86-64 psABI
+  micro-architecture levels.  */
+   if ((processor_alias_table[i].flags & PTA_NO_TUNE) == 0)
+ ix86_tune = ix86_arch;
 
if (((processor_alias_table[i].flags & PTA_MMX) != 0)
&& !(opts->x_ix86_isa_flags_explicit & OPTION_MASK_ISA_MMX))
@@ -2384,7 +2399,8 @@ ix86_option_override_internal (bool main_args_p,
 ix86_arch_features[i] = !!(initial_ix86_arch_features[i] & ix86_arch_mask);
 
   for (i = 0; i < pta_size; i++)
-if (! strcmp (opts->x_ix86_tune_string, processor_alias_table[i].name))
+if (! strcmp (opts->x_ix86_tune_string, processor_alias_table[i].name)
+   && (processor_alias_table[i].flags & PTA_NO_TUNE) == 0)
   {
ix86_schedule = processor_alias_table[i].schedule;
ix86_tune = processor_alias_table[i].processor;
@@ -2428,8 +2444,9 @@ ix86_option_override_internal (bool main_args_p,
 
 

[committed][testsuite] Re-enable pr94600-{1,3}.c tests for arm

2020-09-30 Thread Tom de Vries
[ was: Re: [committed][testsuite] Require non_strict_align in
pr94600-{1,3}.c ]

On 9/30/20 4:53 AM, Hans-Peter Nilsson wrote:
> On Thu, 24 Sep 2020, Tom de Vries wrote:
> 
>> Hi,
>>
>> With the nvptx target, we run into:
>> ...
>> FAIL: gcc.dg/pr94600-1.c scan-rtl-dump-times final "\\(mem/v" 6
>> FAIL: gcc.dg/pr94600-1.c scan-rtl-dump-times final "\\(set \\(mem/v" 6
>> FAIL: gcc.dg/pr94600-3.c scan-rtl-dump-times final "\\(mem/v" 1
>> FAIL: gcc.dg/pr94600-3.c scan-rtl-dump-times final "\\(set \\(mem/v" 1
>> ...
>> The scans attempt to check for volatile stores, but on nvptx we have memcpy
>> instead.
>>
>> This is due to nvptx being a STRICT_ALIGNMENT target, which has the effect
>> that the TYPE_MODE for the store target is set to BLKmode in
>> compute_record_mode.
>>
>> Fix the FAILs by requiring effective target non_strict_align.
> 
> No, that's wrong.  There's more than that at play; it worked for
> the strict-alignment targets where it was tested at the time.
> 

Hi,

thanks for letting me know.

> The test is a valuable canary for this kind of bug.  You now
> disabled it for strict-alignment targets.
> 
> Please revert and add your target specifier instead, if you
> don't feel like investigating further.

I've analyzed the compilation on strict-alignment target arm-eabi, and
broadened the effective target to (non_strict_align ||
pcc_bitfield_type_matters).

Thanks,
- Tom

[testsuite] Re-enable pr94600-{1,3}.c tests for arm

Before commit 7e437162001 "[testsuite] Require non_strict_align in
pr94600-{1,3}.c", some tests were failing for nvptx, because volatile stores
were expected, but memcpy's were found instead.

This was traced back to this bit in compute_record_mode:
...
  /* If structure's known alignment is less than what the scalar
 mode would need, and it matters, then stick with BLKmode.  */
  if (mode != BLKmode
  && STRICT_ALIGNMENT
  && ! (TYPE_ALIGN (type) >= BIGGEST_ALIGNMENT
|| TYPE_ALIGN (type) >= GET_MODE_ALIGNMENT (mode)))
{
  /* If this is the only reason this type is BLKmode, then
 don't force containing types to be BLKmode.  */
  TYPE_NO_FORCE_BLK (type) = 1;
  mode = BLKmode;
}
...
which got triggered for nvptx, but not for x86_64.

The commit restricted the tests to the non_strict_align effective target, but
that had the effect of disabling them for the arm target, even though they
were passing there before.

Further investigation in compute_record_mode shows that the if-condition
evaluates to false for arm, because TYPE_ALIGN (type) == 32, while
it's 8 for nvptx.  This again can be explained by the
PCC_BITFIELD_TYPE_MATTERS setting, which is 1 for arm, but 0 for nvptx.
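
The condition quoted from compute_record_mode can be sketched as a standalone predicate to see why the two targets differ. This is an illustration only: the function name forces_blkmode is hypothetical, it assumes the candidate mode is already a non-BLKmode scalar mode, and the values 32 (mode alignment) and 64 (BIGGEST_ALIGNMENT) used in the usage note are assumptions, not taken from the message. Only the TYPE_ALIGN values 32 (arm) and 8 (nvptx) come from the text above.

```c
#include <assert.h>
#include <stdbool.h>

/* Return true when the alignment check would force the record to BLKmode.
   All alignments are in bits.  The candidate mode is assumed to be a
   scalar (non-BLKmode) mode, so the "mode != BLKmode" test is omitted.  */
static bool
forces_blkmode (bool strict_alignment, unsigned type_align,
                unsigned mode_align, unsigned biggest_align)
{
  assert (mode_align > 0);
  return strict_alignment
         && !(type_align >= biggest_align || type_align >= mode_align);
}
```

Under the assumed values, forces_blkmode (true, 32, 32, 64) is false (the arm case), forces_blkmode (true, 8, 32, 64) is true (the nvptx case), and any call with strict_alignment false returns false (the x86_64 case).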

Re-enable the test for arm by using effective target
(non_strict_align || pcc_bitfield_type_matters).

Tested on arm-eabi and nvptx.

gcc/testsuite/ChangeLog:

2020-09-30  Tom de Vries  

	* gcc.dg/pr94600-1.c: Use effective target
	(non_strict_align || pcc_bitfield_type_matters).
	* gcc.dg/pr94600-3.c: Same.

---
 gcc/testsuite/gcc.dg/pr94600-1.c | 4 ++--
 gcc/testsuite/gcc.dg/pr94600-3.c | 4 ++--
 2 files changed, 4 insertions(+), 4 deletions(-)

diff --git a/gcc/testsuite/gcc.dg/pr94600-1.c b/gcc/testsuite/gcc.dg/pr94600-1.c
index 38f939a98cb..c9a7bb9007e 100644
--- a/gcc/testsuite/gcc.dg/pr94600-1.c
+++ b/gcc/testsuite/gcc.dg/pr94600-1.c
@@ -32,5 +32,5 @@ foo(void)
 }
 
 /* The only volatile accesses should be the obvious writes.  */
-/* { dg-final { scan-rtl-dump-times {\(mem/v} 6 "final" { target { non_strict_align } } } } */
-/* { dg-final { scan-rtl-dump-times {\(set \(mem/v} 6 "final" { target { non_strict_align } } } } */
+/* { dg-final { scan-rtl-dump-times {\(mem/v} 6 "final" { target { non_strict_align || pcc_bitfield_type_matters } } } } */
+/* { dg-final { scan-rtl-dump-times {\(set \(mem/v} 6 "final" { target { non_strict_align || pcc_bitfield_type_matters } } } } */
diff --git a/gcc/testsuite/gcc.dg/pr94600-3.c b/gcc/testsuite/gcc.dg/pr94600-3.c
index e8776fbdb28..ff42c7db3c6 100644
--- a/gcc/testsuite/gcc.dg/pr94600-3.c
+++ b/gcc/testsuite/gcc.dg/pr94600-3.c
@@ -31,5 +31,5 @@ foo(void)
 }
 
 /* The loop isn't unrolled. */
-/* { dg-final { scan-rtl-dump-times {\(mem/v} 1 "final" { target { non_strict_align } } } } */
-/* { dg-final { scan-rtl-dump-times {\(set \(mem/v} 1 "final" { target { non_strict_align } } } } */
+/* { dg-final { scan-rtl-dump-times {\(mem/v} 1 "final" { target { non_strict_align || pcc_bitfield_type_matters } } } } */
+/* { dg-final { scan-rtl-dump-times {\(set \(mem/v} 1 "final" { target { non_strict_align || pcc_bitfield_type_matters } } } } */


Re: [PATCH] Add type arg to TARGET_LIBC_HAS_FUNCTION

2020-09-30 Thread Richard Biener via Gcc-patches
On Tue, Sep 29, 2020 at 2:18 PM Tom de Vries  wrote:
>
> On 9/29/20 8:59 AM, Richard Biener wrote:
> > On Mon, Sep 28, 2020 at 7:28 PM Tom de Vries  wrote:
> >>
> >> [ was: Re: [Patch][nvptx] return true in libc_has_function for
> >> function_sincos ]
> >>
> >> On 9/26/20 6:47 PM, Tobias Burnus wrote:
> >>> Found when looking at PR97203 (but having no effect there).
> >>>
> >>> The GCC ME optimizes with -O1 (or higher) the
> >>>   a = sinf(x)
> >>>   b = cosf(x)
> >>> to __builtin_cexpi(x, , )
> >>> (...i as in internal; like cexp(z) but with __real__ z == 0)
> >>>
> >>>
> >>> In expand_builtin_cexpi, that is handled as:
> >>>   if (optab_handler (sincos_optab, mode) != CODE_FOR_nothing)
> >>> ...
> >>>   else if (targetm.libc_has_function (function_sincos))
> >>> ...
> >>>   else
> >>> fn = builtin_decl_explicit (BUILT_IN_CEXPF);
> >>>
> >>> And the latter is done. As newlib's cexpf does not know that
> >>> __real__ z == 0, it calculates 'r = expf (__real__ z)' before
> >>> invoking sinf and cosf on __imag__ z.
> >>>
> >>> Thus, it is much faster to call 'sincosf', which also exists
> >>> in newlib.
> >>>
> >>> Solution: Return true for targetm.libc_has_function (function_sincos).
> >>>
> >>>
> >>> NOTE: With -funsafe-math-optimizations (-O0 or higher),
> >>> sinf/cosf and sincosf invoke .sin.approx/.cos.approx instead of
> >>> doing a library call.
> >>
> >> This version takes care to enable sincos and sincosf, but not sincosl.
> >>
> >> Target hook changes OK for trunk?
> >
> > @@ -9770,7 +9770,7 @@ fold_builtin_sincos (location_t loc,
> >  }
> >if (!call)
> >  {
> > -  if (!targetm.libc_has_function (function_c99_math_complex)
> > +  if (!targetm.libc_has_function (function_c99_math_complex, NULL_TREE)
> >
> > why pass NULL_TREE and not 'type' here?
> >
> >   || !builtin_decl_implicit_p (fn))
> > return NULL_TREE;
> >
>
> I was trying to do the minimal, sincos-only implementation.
>
> > similar for the builtins.def change for the cases where math functions
> > are affected?  I guess it's a bit awkward to make it work there, so OK.
> >
> >  bool
> > -darwin_libc_has_function (enum function_class fn_class)
> > +darwin_libc_has_function (enum function_class fn_class, tree type)
> >  {
> > -  if (fn_class == function_sincos)
> > +  if (type != NULL_TREE)
> > +{
> > +  switch (fn_class)
> > +   {
> > +   case function_sincos:
> > + break;
> > +   default:
> > + /* Not implemented.  */
> > + gcc_unreachable ();
> > +   }
> > +}
> >
> > huh.  I think special-casing this just for sincos is a bit awkward,
> > esp. ICEing for other queries with a type.  Specifically
> >
> > -@deftypefn {Target Hook} bool TARGET_LIBC_HAS_FUNCTION (enum
> > function_class @var{fn_class})
> > +@deftypefn {Target Hook} bool TARGET_LIBC_HAS_FUNCTION (enum
> > function_class @var{fn_class}, tree @var{type})
> >  This hook determines whether a function from a class of functions
> > -@var{fn_class} is present in the target C library.
> > +@var{fn_class} is present in the target C library.  The @var{type} argument
> > +can be used to distinguish between float, double and long double versions.
> >  @end deftypefn
> >
> > This doesn't mention we'll ICE for anything but sincos.  A sensible
> > semantics would be that if TYPE is NULL the caller asks for support
> > for all standard (float, double, long double) types while with TYPE
> > non-NULL it can ask for a specific type including for example the
> > new _FloatN, etc. types.
> >
>
> Ack, updated accordingly and retested.
>
> OK for trunk?

OK.

Thanks,
Richard.

> Thanks,
> - Tom
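
The hook semantics proposed in the review above (a NULL type asks about all of float, double and long double, a non-NULL type asks about that one type) can be sketched as follows. This is a simplified model, not the GCC hook: it uses a small enum and a pointer in place of a tree type, and the target described (sincosf and sincos present, sincosl absent) is a hypothetical example.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>

enum fn_class { FN_SINCOS, FN_C99_MATH_COMPLEX };
enum fp_type { FP_FLOAT, FP_DOUBLE, FP_LONG_DOUBLE };

/* Hypothetical target: the C library provides sincosf and sincos,
   but not sincosl, and none of the other function classes.  */
static bool
has_fn_for_type (enum fn_class fn_class, enum fp_type type)
{
  assert (type <= FP_LONG_DOUBLE);
  switch (fn_class)
    {
    case FN_SINCOS:
      return type == FP_FLOAT || type == FP_DOUBLE;
    default:
      return false;
    }
}

/* TYPE == NULL: is the class available for all standard types?
   TYPE != NULL: is the class available for that specific type?  */
static bool
libc_has_function (enum fn_class fn_class, const enum fp_type *type)
{
  if (type != NULL)
    return has_fn_for_type (fn_class, *type);
  return has_fn_for_type (fn_class, FP_FLOAT)
         && has_fn_for_type (fn_class, FP_DOUBLE)
         && has_fn_for_type (fn_class, FP_LONG_DOUBLE);
}
```

With this shape, a query with a type never needs to abort for an unhandled class; it simply answers false.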


[GCC][PATCH] arm: Fix MVE intrinsics polymorphic variants wrongly generating __ARM_undef type (pr96795).

2020-09-30 Thread Srinath Parvathaneni via Gcc-patches
Hello,

This patch fixes (PR96795) MVE intrinsic polymorphic variants vaddq, vaddq_m,
vaddq_x, vcmpeqq_m, vcmpeqq, vcmpgeq_m, vcmpgeq, vcmpgtq_m, vcmpgtq, vcmpleq_m,
vcmpleq, vcmpltq_m, vcmpltq, vcmpneq_m, vcmpneq, vfmaq_m, vfmaq, vfmasq_m,
vfmasq, vmaxnmavq, vmaxnmavq_p, vmaxnmvq, vmaxnmvq_p, vminnmavq, vminnmavq_p,
vminnmvq, vminnmvq_p, vmulq_m, vmulq, vmulq_x, vsetq_lane, vsubq_m, vsubq and
vsubq_x which are incorrectly generating __ARM_undef and mismatching the passed
floating point scalar arguments.

Bootstrapped on arm-none-linux-gnueabihf and regression tested on arm-none-eabi
and found no regressions.

Ok for master? Ok for GCC-10 branch?

Regards,
Srinath.

gcc/ChangeLog:

2020-09-30  Srinath Parvathaneni  

PR target/96795
* config/arm/arm_mve.h (__ARM_mve_coerce2): Define.
(__arm_vaddq): Correct the scalar argument.
(__arm_vaddq_m): Likewise.
(__arm_vaddq_x): Likewise.
(__arm_vcmpeqq_m): Likewise.
(__arm_vcmpeqq): Likewise.
(__arm_vcmpgeq_m): Likewise.
(__arm_vcmpgeq): Likewise.
(__arm_vcmpgtq_m): Likewise.
(__arm_vcmpgtq): Likewise.
(__arm_vcmpleq_m): Likewise.
(__arm_vcmpleq): Likewise.
(__arm_vcmpltq_m): Likewise.
(__arm_vcmpltq): Likewise.
(__arm_vcmpneq_m): Likewise.
(__arm_vcmpneq): Likewise.
(__arm_vfmaq_m): Likewise.
(__arm_vfmaq): Likewise.
(__arm_vfmasq_m): Likewise.
(__arm_vfmasq): Likewise.
(__arm_vmaxnmavq): Likewise.
(__arm_vmaxnmavq_p): Likewise.
(__arm_vmaxnmvq): Likewise.
(__arm_vmaxnmvq_p): Likewise.
(__arm_vminnmavq): Likewise.
(__arm_vminnmavq_p): Likewise.
(__arm_vminnmvq): Likewise.
(__arm_vminnmvq_p): Likewise.
(__arm_vmulq_m): Likewise.
(__arm_vmulq): Likewise.
(__arm_vmulq_x): Likewise.
(__arm_vsetq_lane): Likewise.
(__arm_vsubq_m): Likewise.
(__arm_vsubq): Likewise.
(__arm_vsubq_x): Likewise.

gcc/testsuite/ChangeLog:

PR target/96795
* gcc.target/arm/mve/intrinsics/mve_fp_vaddq_n.c: New Test.
* gcc.target/arm/mve/intrinsics/mve_vaddq_n.c: Likewise.
* gcc.target/arm/mve/intrinsics/vaddq_m_n_f16-1.c: Likewise.
* gcc.target/arm/mve/intrinsics/vaddq_m_n_f32-1.c: Likewise.
* gcc.target/arm/mve/intrinsics/vaddq_x_n_f16-1.c: Likewise.
* gcc.target/arm/mve/intrinsics/vaddq_x_n_f32-1.c: Likewise.
* gcc.target/arm/mve/intrinsics/vcmpeqq_m_n_f16-1.c: Likewise.
* gcc.target/arm/mve/intrinsics/vcmpeqq_m_n_f32-1.c: Likewise.
* gcc.target/arm/mve/intrinsics/vcmpeqq_n_f16-1.c: Likewise.
* gcc.target/arm/mve/intrinsics/vcmpeqq_n_f32-1.c: Likewise.
* gcc.target/arm/mve/intrinsics/vcmpgeq_m_n_f16-1.c: Likewise.
* gcc.target/arm/mve/intrinsics/vcmpgeq_m_n_f32-1.c: Likewise.
* gcc.target/arm/mve/intrinsics/vcmpgeq_n_f16-1.c: Likewise.
* gcc.target/arm/mve/intrinsics/vcmpgeq_n_f32-1.c: Likewise.
* gcc.target/arm/mve/intrinsics/vcmpgtq_m_n_f16-1.c: Likewise.
* gcc.target/arm/mve/intrinsics/vcmpgtq_m_n_f32-1.c: Likewise.
* gcc.target/arm/mve/intrinsics/vcmpgtq_n_f16-1.c: Likewise.
* gcc.target/arm/mve/intrinsics/vcmpgtq_n_f32-1.c: Likewise.
* gcc.target/arm/mve/intrinsics/vcmpleq_m_n_f16-1.c: Likewise.
* gcc.target/arm/mve/intrinsics/vcmpleq_m_n_f32-1.c: Likewise.
* gcc.target/arm/mve/intrinsics/vcmpleq_n_f16-1.c: Likewise.
* gcc.target/arm/mve/intrinsics/vcmpleq_n_f32-1.c: Likewise.
* gcc.target/arm/mve/intrinsics/vcmpltq_m_n_f16-1.c: Likewise.
* gcc.target/arm/mve/intrinsics/vcmpltq_m_n_f32-1.c: Likewise.
* gcc.target/arm/mve/intrinsics/vcmpltq_n_f16-1.c: Likewise.
* gcc.target/arm/mve/intrinsics/vcmpltq_n_f32-1.c: Likewise.
* gcc.target/arm/mve/intrinsics/vcmpneq_m_n_f16-1.c: Likewise.
* gcc.target/arm/mve/intrinsics/vcmpneq_m_n_f32-1.c: Likewise.
* gcc.target/arm/mve/intrinsics/vcmpneq_n_f16-1.c: Likewise.
* gcc.target/arm/mve/intrinsics/vcmpneq_n_f32-1.c: Likewise.
* gcc.target/arm/mve/intrinsics/vfmaq_m_n_f16-1.c: Likewise.
* gcc.target/arm/mve/intrinsics/vfmaq_m_n_f32-1.c: Likewise.
* gcc.target/arm/mve/intrinsics/vfmaq_n_f16-1.c: Likewise.
* gcc.target/arm/mve/intrinsics/vfmaq_n_f32-1.c: Likewise.
* gcc.target/arm/mve/intrinsics/vfmasq_m_n_f16-1.c: Likewise.
* gcc.target/arm/mve/intrinsics/vfmasq_m_n_f32-1.c: Likewise.
* gcc.target/arm/mve/intrinsics/vfmasq_n_f16-1.c: Likewise.
* gcc.target/arm/mve/intrinsics/vfmasq_n_f32-1.c: Likewise.
* gcc.target/arm/mve/intrinsics/vmaxnmavq_f16-1.c: Likewise.
* gcc.target/arm/mve/intrinsics/vmaxnmavq_f32-1.c: Likewise.
* gcc.target/arm/mve/intrinsics/vmaxnmavq_p_f16-1.c: Likewise.
* 

[committed] testsuite: Fix up amx* dg-do run tests with older binutils

2020-09-30 Thread Jakub Jelinek via Gcc-patches
On Fri, Sep 18, 2020 at 04:31:55PM +0800, Hongyu Wang via Gcc-patches wrote:
> Thanks very much for your review again.
> 
> I just updated the patch, adding the XSAVE dependency and using
> __builtin_cpu_supports for the runtime test.

Several tests FAIL when using older binutils that don't support AMX.

Fixed thusly, tested on x86_64-linux -m32/-m64, committed to trunk as
obvious:

2020-09-30  Jakub Jelinek  

* gcc.target/i386/amxint8-dpbssd-2.c: Require effective targets
amx_tile and amx_int8.
* gcc.target/i386/amxint8-dpbsud-2.c: Likewise.
* gcc.target/i386/amxint8-dpbusd-2.c: Likewise.
* gcc.target/i386/amxint8-dpbuud-2.c: Likewise.
* gcc.target/i386/amxbf16-dpbf16ps-2.c: Require effective targets
amx_tile and amx_bf16.
* gcc.target/i386/amxtile-2.c: Require effective target amx_tile.

--- gcc/testsuite/gcc.target/i386/amxint8-dpbssd-2.c.jj 2020-09-29 11:32:02.950602758 +0200
+++ gcc/testsuite/gcc.target/i386/amxint8-dpbssd-2.c 2020-09-30 13:16:08.186445881 +0200
@@ -1,4 +1,6 @@
 /* { dg-do run { target { ! ia32 } } } */
+/* { dg-require-effective-target amx_tile } */
+/* { dg-require-effective-target amx_int8 } */
 /* { dg-options "-O2 -mamx-tile -mamx-int8" } */
 #include 
 
--- gcc/testsuite/gcc.target/i386/amxint8-dpbsud-2.c.jj 2020-09-29 11:32:02.950602758 +0200
+++ gcc/testsuite/gcc.target/i386/amxint8-dpbsud-2.c 2020-09-30 13:16:23.715221450 +0200
@@ -1,4 +1,6 @@
 /* { dg-do run { target { ! ia32 } } } */
+/* { dg-require-effective-target amx_tile } */
+/* { dg-require-effective-target amx_int8 } */
 /* { dg-options "-O2 -mamx-tile -mamx-int8" } */
 #include 
 
--- gcc/testsuite/gcc.target/i386/amxint8-dpbusd-2.c.jj 2020-09-29 11:32:02.950602758 +0200
+++ gcc/testsuite/gcc.target/i386/amxint8-dpbusd-2.c 2020-09-30 13:16:28.998145100 +0200
@@ -1,4 +1,6 @@
 /* { dg-do run { target { ! ia32 } } } */
+/* { dg-require-effective-target amx_tile } */
+/* { dg-require-effective-target amx_int8 } */
 /* { dg-options "-O2 -mamx-tile -mamx-int8" } */
 #include 
 
--- gcc/testsuite/gcc.target/i386/amxint8-dpbuud-2.c.jj 2020-09-29 11:32:02.950602758 +0200
+++ gcc/testsuite/gcc.target/i386/amxint8-dpbuud-2.c 2020-09-30 13:16:35.770047224 +0200
@@ -1,4 +1,6 @@
 /* { dg-do run { target { ! ia32 } } } */
+/* { dg-require-effective-target amx_tile } */
+/* { dg-require-effective-target amx_int8 } */
 /* { dg-options "-O2 -mamx-tile -mamx-int8" } */
 #include 
 
--- gcc/testsuite/gcc.target/i386/amxbf16-dpbf16ps-2.c.jj 2020-09-29 11:32:02.949602773 +0200
+++ gcc/testsuite/gcc.target/i386/amxbf16-dpbf16ps-2.c 2020-09-30 13:15:41.079837637 +0200
@@ -1,4 +1,6 @@
 /* { dg-do run { target { ! ia32 } } } */
+/* { dg-require-effective-target amx_tile } */
+/* { dg-require-effective-target amx_bf16 } */
 /* { dg-options "-O2 -mamx-tile -mamx-bf16" } */
 #include 
 
--- gcc/testsuite/gcc.target/i386/amxtile-2.c.jj 2020-09-29 11:32:02.950602758 +0200
+++ gcc/testsuite/gcc.target/i386/amxtile-2.c 2020-09-30 13:16:57.972726339 +0200
@@ -1,4 +1,5 @@
 /* { dg-do run { target { ! ia32 } } } */
+/* { dg-require-effective-target amx_tile } */
 /* { dg-options "-O2 -mamx-tile " } */
 #include 
 


Jakub



Re: [Patch] OpenMP: Add implicit declare target for nested procedures

2020-09-30 Thread Jakub Jelinek via Gcc-patches
On Wed, Sep 30, 2020 at 01:37:46PM +0200, Tobias Burnus wrote:
> We failed to handle nested procedures.
> 
> OK for the trunk?

Yes, thanks.

Jakub



Re: [PATCH] options: Save and restore opts_set for Optimization and Target options

2020-09-30 Thread Jakub Jelinek via Gcc-patches
On Wed, Sep 30, 2020 at 01:21:44PM +0200, Stefan Schulze Frielinghaus wrote:
> I think the problem boils down to the fact that on S/390 we distinguish
> between four states of a flag: explicitly set to yes/no and implicitly set
> to yes/no.  If set explicitly, the option wins.  For example, the options
> `-march=z10 -mhtm` should enable the hardware transactional memory
> option although z10 does not have one.  In the past, whether a flag was
> set explicitly was encoded into opts_set->x_target_flags ... for each
> flag individually, e.g. TARGET_OPT_HTM_P (opts_set->x_target_flags) was
Oops, seems I've missed that set_option has special treatment for
CLVC_BIT_CLEAR/CLVC_BIT_SET.
Which means I'll need to change the generic handling, so that
global_options_set elements mentioned in CLVC_BIT_* options are treated
differently: instead of using the accumulated bitmasks, they'll need to use
their specific bitmask variables during the option saving/restoring.
Is it ok if I defer it for tomorrow? Need to prepare for OpenMP meeting now.

Jakub
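
The four-state flag handling discussed in this message (a value bit plus a separate "was set explicitly" bit per flag) can be sketched as below. This is a simplified model and not the GCC option machinery: the struct, the FLAG_HTM bit and the helper names are hypothetical, and only the rule "if set explicitly, the option wins" is taken from the quoted text.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>

#define FLAG_HTM (1u << 0)   /* Hypothetical bit for -mhtm.  */

struct target_opts
{
  unsigned flags;       /* Current yes/no value of each flag.  */
  unsigned flags_set;   /* Bit is 1 if the user set the flag explicitly.  */
};

/* Record an explicit user choice, e.g. -mhtm or -mno-htm.  */
static void
set_explicit (struct target_opts *o, unsigned bit, bool on)
{
  assert (o != NULL);
  if (on)
    o->flags |= bit;
  else
    o->flags &= ~bit;
  o->flags_set |= bit;
}

/* Apply a default implied by e.g. -march=...; an explicit choice wins.  */
static void
set_implicit (struct target_opts *o, unsigned bit, bool on)
{
  assert (o != NULL);
  if (o->flags_set & bit)
    return;
  if (on)
    o->flags |= bit;
  else
    o->flags &= ~bit;
}

static bool
flag_enabled (const struct target_opts *o, unsigned bit)
{
  return (o->flags & bit) != 0;
}
```

In this model `-march=z10 -mhtm` corresponds to set_implicit (&o, FLAG_HTM, false) plus set_explicit (&o, FLAG_HTM, true), in either order, and the flag ends up enabled. Saving and restoring such options has to preserve both words per flag, which is the part the message says needs different handling.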



[Patch] OpenMP: Add implicit declare target for nested procedures

2020-09-30 Thread Tobias Burnus

We failed to handle nested procedures.

OK for the trunk?

Tobias

-
Mentor Graphics (Deutschland) GmbH, Arnulfstraße 201, 80634 München / Germany
Registergericht München HRB 106955, Geschäftsführer: Thomas Heurung, Alexander 
Walter

OpenMP: Add implicit declare target for nested procedures

gcc/ChangeLog:

	* omp-offload.c (omp_discover_implicit_declare_target): Also
	handle nested functions.

libgomp/ChangeLog:

	* testsuite/libgomp.fortran/declare-target-3.f90: New test.

 gcc/omp-offload.c  |  7 
 .../testsuite/libgomp.fortran/declare-target-3.f90 | 45 ++
 2 files changed, 52 insertions(+)

diff --git a/gcc/omp-offload.c b/gcc/omp-offload.c
index a89275b3a7a..7fb3a72ec55 100644
--- a/gcc/omp-offload.c
+++ b/gcc/omp-offload.c
@@ -327,11 +327,18 @@ omp_discover_implicit_declare_target (void)
   FOR_EACH_DEFINED_FUNCTION (node)
 if (DECL_SAVED_TREE (node->decl))
   {
+	struct cgraph_node *cgn;
 if (omp_declare_target_fn_p (node->decl))
 	  worklist.safe_push (node->decl);
 	else if (DECL_STRUCT_FUNCTION (node->decl)
 		 && DECL_STRUCT_FUNCTION (node->decl)->has_omp_target)
 	  worklist.safe_push (node->decl);
+	for (cgn = node->nested; cgn; cgn = cgn->next_nested)
+	  if (omp_declare_target_fn_p (cgn->decl))
+	worklist.safe_push (cgn->decl);
+	  else if (DECL_STRUCT_FUNCTION (cgn->decl)
+		   && DECL_STRUCT_FUNCTION (cgn->decl)->has_omp_target)
+	worklist.safe_push (cgn->decl);
   }
   FOR_EACH_STATIC_INITIALIZER (vnode)
 if (omp_declare_target_var_p (vnode->decl))
diff --git a/libgomp/testsuite/libgomp.fortran/declare-target-3.f90 b/libgomp/testsuite/libgomp.fortran/declare-target-3.f90
new file mode 100644
index 000..6e5301de0a9
--- /dev/null
+++ b/libgomp/testsuite/libgomp.fortran/declare-target-3.f90
@@ -0,0 +1,45 @@
+! { dg-additional-options "-fdump-tree-omplower" }
+
+module m
+  implicit none (type, external)
+contains
+  subroutine mod_proc(x)
+integer :: x(2)
+  x = x + 5
+end subroutine
+end module m
+
+program main
+  use m
+  implicit none (type, external)
+  if (any (foo() /= [48, 49])) stop 1
+contains
+  integer function fourty_two(y)
+integer :: y
+fourty_two = y + 42
+  end function
+
+  integer function wrapper (x, y)
+integer :: x, y(2)
+call mod_proc(y)
+wrapper = fourty_two(x) + 1
+  end function
+
+  function foo()
+integer :: foo(2)
+integer :: a(2)
+integer :: b, summed(2)
+a = [1, 2]
+b = -1
+!$omp target map (tofrom: a, b, summed)
+  summed = wrapper (b, a)
+!$omp end target
+if (b /= -1) stop 2! unchanged
+if (any (summed /= 42)) stop 3 ! b + 42 + 1 = 42
+if (any (a /= [6, 7])) stop 4  ! [1, 2] + 5
+foo = summed + a   ! [48, 49]
+  end function
+end
+
+! 3 times: mod_proc, fourty_two and wrapper:
+! { dg-final { scan-tree-dump-times "__attribute__..omp declare target" 3 "omplower" } }


Re: [PATCH] options: Save and restore opts_set for Optimization and Target options

2020-09-30 Thread Stefan Schulze Frielinghaus via Gcc-patches
On Wed, Sep 30, 2020 at 11:32:55AM +0200, Jakub Jelinek wrote:
> On Mon, Sep 28, 2020 at 09:50:00PM +0200, Stefan Schulze Frielinghaus via 
> Gcc-patches wrote:
> > This patch breaks quite a few test cases (target-attribute/tattr-*) on
> > IBM Z.  Having a look at function cl_target_option_restore reveals that
> > some members of opts_set are reduced to 1 or 0 depending on whether a
> > member was set before or not, e.g. for target_flags we have
> 
> I've tried to reproduce the tattr FAILs reported in
> https://gcc.gnu.org/pipermail/gcc-testresults/2020-September/608760.html
> in a cross-compiler (with
> #define HAVE_AS_MACHINE_MACHINEMODE 1
> ), but couldn't, neither the ICEs nor the scan-assembler failures.
> Anyway, could you do a side-by-side debugging of one of those failures
> before/after my change and see what behaves differently?

I think the problem boils down to the fact that on S/390 we distinguish
between four states of a flag: explicitly set to yes/no and implicitly set
to yes/no.  If set explicitly, the option wins.  For example, the options
`-march=z10 -mhtm` should enable the hardware transactional memory
option although z10 does not have one.  In the past, whether a flag was
set explicitly or not was encoded into opts_set->x_target_flags ... for each
flag individually, e.g. TARGET_OPT_HTM_P (opts_set->x_target_flags) was
used.  This has changed with the mentioned patch in the sense that
opts_set encodes whether any flag of x_target_flags was set or not, but
not which individual one, after a call to the generated function
cl_target_option_restore where we have:
opts_set->x_target_flags = (mask & 1) != 0;
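
To make the loss of information concrete, here is a minimal sketch.  It is
not GCC's option machinery: FLAG_HTM, FLAG_VX, resolve_flag and
collapse_mask are names made up for this illustration, and the "collapse"
only models the behaviour described above (per-flag bits reduced to 0 or 1).

```c
#include <stdbool.h>
#include <stdint.h>

/* Hypothetical flag bits, standing in for bits of x_target_flags.  */
#define FLAG_HTM (1u << 3)
#define FLAG_VX  (1u << 4)

/* Effective value of FLAG: an explicit setting wins, otherwise the
   architecture default applies.  EXPLICIT_MASK has one bit per flag.  */
bool
resolve_flag (uint32_t flag, uint32_t flags, uint32_t explicit_mask,
              uint32_t arch_defaults)
{
  if (explicit_mask & flag)
    return (flags & flag) != 0;
  return (arch_defaults & flag) != 0;
}

/* Model of the restore described above: the per-flag mask is reduced
   to "was anything set at all", i.e. 0 or 1.  */
uint32_t
collapse_mask (uint32_t explicit_mask)
{
  return explicit_mask != 0;
}
```

With an explicit -mhtm on top of defaults that lack HTM, resolve_flag
reports the flag as enabled while the per-flag mask is intact.  Once the
mask has gone through collapse_mask, the same query falls back to the
architecture default, which matches the kind of difference seen in the
tattr tests.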

Compiling the following program

#pragma GCC target ("arch=z10")
void fn_pragma_0 (void) { }

with options `-march=z13 -mzarch -mhtm -mdebug` produces different flags
for 4ac7b669580 (the commit prior to your patch) and ba948b37768 (your patch).

This is my current understanding of the option handling.  I will try to
come up with a trace where these things hopefully become clearer.

Cheers,
Stefan


[PATCH][committed] PR target/96313 AArch64: vqmovun* return types should be unsigned

2020-09-30 Thread Kyrylo Tkachov via Gcc-patches
Hi all,

In this PR we have the wrong return type for some intrinsics. It should be 
unsigned, but we implement it as signed.
Fix this by adjusting the type qualifiers used when creating the builtins and 
fixing the type in the arm_neon.h intrinsic.
With the adjustment in qualifiers we now don't need to cast the result when 
returning.

Bootstrapped and tested on aarch64-none-linux-gnu.

Pushing to master.
Thanks,
Kyrill

gcc/
PR target/96313
* config/aarch64/aarch64-simd-builtins.def (sqmovun): Use UNOPUS
qualifiers.
* config/aarch64/arm_neon.h (vqmovun_s16): Adjust builtin call.
Remove unnecessary result cast.
(vqmovun_s32): Likewise.
(vqmovun_s64): Likewise.
(vqmovunh_s16): Likewise.  Fix return type.
(vqmovuns_s32): Likewise.
(vqmovund_s64): Likewise.

gcc/testsuite/
PR target/96313
* gcc.target/aarch64/pr96313.c: New test.
* gcc.target/aarch64/scalar_intrinsics.c (test_vqmovunh_s16): Adjust
return type.
(test_vqmovuns_s32): Likewise.
(test_vqmovund_s64): Likewise.


vqmovun.patch
Description: vqmovun.patch


[PATCH][committed] PR target/97150 AArch64: 2nd parameter of unsigned Neon scalar shift intrinsics should be signed

2020-09-30 Thread Kyrylo Tkachov via Gcc-patches
Hi all,

In this PR the second argument to the intrinsics should be signed but we use an 
unsigned one erroneously.
The corresponding builtins are already using the correct types so it's just a 
matter of correcting the signatures
in arm_neon.h.

Bootstrapped and tested on aarch64-none-linux-gnu.

Pushing to master.
Thanks,
Kyrill

gcc/
PR target/97150
* config/aarch64/arm_neon.h (vqrshlb_u8): Make second argument signed.
(vqrshlh_u16): Likewise.
(vqrshls_u32): Likewise.
(vqrshld_u64): Likewise.
(vqshlb_u8): Likewise.
(vqshlh_u16): Likewise.
(vqshls_u32): Likewise.
(vqshld_u64): Likewise.
(vshld_u64): Likewise.

gcc/testsuite/
PR target/97150
* gcc.target/aarch64/pr97150.c: New test.


shift-sign.patch
Description: shift-sign.patch


Re: [PATCH] x86: Use SET operation in MOVDIRI and MOVDIR64B

2020-09-30 Thread Uros Bizjak via Gcc-patches
> gcc/
>
> PR target/97184
> * config/i386/i386.md (UNSPECV_MOVDIRI): Renamed to ...
> (UNSPEC_MOVDIRI): This.
> (UNSPECV_MOVDIR64B): Renamed to ...
> (UNSPEC_MOVDIR64B): This.
> (movdiri): Use SET operation.
> (@movdir64b_): Likewise.
>
> gcc/testsuite/
>
> PR target/97184
> * gcc.target/i386/movdir64b.c: New test.
> * gcc.target/i386/movdiri32.c: Likewise.
> * gcc.target/i386/movdiri64.c: Likewise.
> * testsuite/lib/target-supports.exp
> (check_effective_target_movdir): New.

OK for mainline and backports.

Thanks,
Uros.


[committed] aarch64: Tweak movti and movtf patterns

2020-09-30 Thread Richard Sandiford via Gcc-patches
movti lacked a way of zeroing an FPR, meaning that we'd do:

mov x0, 0
mov x1, 0
fmov d0, x0
fmov v0.d[1], x1

instead of just:

movi v0.2d, #0

movtf had the opposite problem for GPRs: we'd generate:

movi v0.2d, #0
fmov x0, d0
fmov x1, v0.d[1]

instead of just:

mov x0, 0
mov x1, 0

Also, there was an unnecessary earlyclobber on the GPR<-GPR movtf
alternative (but not the movti one).  The splitter handles overlap
correctly.

The TF splitter used aarch64_reg_or_imm, but the _imm part only
accepts integer constants, not floating-point ones.  The patch
changes it to nonmemory_operand instead.

Tested on aarch64-linux-gnu, pushed.

Richard


gcc/
* config/aarch64/aarch64.c (aarch64_split_128bit_move_p): Add a
function comment.  Tighten check for FP moves.
* config/aarch64/aarch64.md (*movti_aarch64): Add a w<-Z alternative.
(*movtf_aarch64): Handle r<-Y like r<-r.  Remove unnecessary
earlyclobber.  Change splitter predicate from aarch64_reg_or_imm
to nonmemory_operand.

gcc/testsuite/
* gcc.target/aarch64/movtf_1.c: New test.
* gcc.target/aarch64/movti_1.c: Likewise.
---
 gcc/config/aarch64/aarch64.c   |  9 ++-
 gcc/config/aarch64/aarch64.md  | 17 +++--
 gcc/testsuite/gcc.target/aarch64/movtf_1.c | 87 ++
 gcc/testsuite/gcc.target/aarch64/movti_1.c | 87 ++
 4 files changed, 190 insertions(+), 10 deletions(-)
 create mode 100644 gcc/testsuite/gcc.target/aarch64/movtf_1.c
 create mode 100644 gcc/testsuite/gcc.target/aarch64/movti_1.c

diff --git a/gcc/config/aarch64/aarch64.c b/gcc/config/aarch64/aarch64.c
index 491fc582dab..9e88438b3c3 100644
--- a/gcc/config/aarch64/aarch64.c
+++ b/gcc/config/aarch64/aarch64.c
@@ -3422,11 +3422,16 @@ aarch64_split_128bit_move (rtx dst, rtx src)
 }
 }
 
+/* Return true if we should split a move from 128-bit value SRC
+   to 128-bit register DEST.  */
+
 bool
 aarch64_split_128bit_move_p (rtx dst, rtx src)
 {
-  return (! REG_P (src)
- || ! (FP_REGNUM_P (REGNO (dst)) && FP_REGNUM_P (REGNO (src))));
+  if (FP_REGNUM_P (REGNO (dst)))
+return REG_P (src) && !FP_REGNUM_P (REGNO (src));
+  /* All moves to GPRs need to be split.  */
+  return true;
 }
 
 /* Split a complex SIMD combine.  */
diff --git a/gcc/config/aarch64/aarch64.md b/gcc/config/aarch64/aarch64.md
index 19ec9e33f9f..78fe7c43a00 100644
--- a/gcc/config/aarch64/aarch64.md
+++ b/gcc/config/aarch64/aarch64.md
@@ -1361,13 +1361,14 @@ (define_expand "movti"
 
 (define_insn "*movti_aarch64"
   [(set (match_operand:TI 0
-"nonimmediate_operand"  "=   r,w, r,w,r,m,m,w,m")
+"nonimmediate_operand"  "=   r,w,w, r,w,r,m,m,w,m")
(match_operand:TI 1
-"aarch64_movti_operand" " rUti,r, w,w,m,r,Z,m,w"))]
+"aarch64_movti_operand" " rUti,Z,r, w,w,m,r,Z,m,w"))]
   "(register_operand (operands[0], TImode)
 || aarch64_reg_or_zero (operands[1], TImode))"
   "@
#
+   movi\\t%0.2d, #0
#
#
mov\\t%0.16b, %1.16b
@@ -1376,11 +1377,11 @@ (define_insn "*movti_aarch64"
stp\\txzr, xzr, %0
ldr\\t%q0, %1
str\\t%q1, %0"
-  [(set_attr "type" "multiple,f_mcr,f_mrc,neon_logic_q, \
+  [(set_attr "type" "multiple,neon_move,f_mcr,f_mrc,neon_logic_q, \
 load_16,store_16,store_16,\
  load_16,store_16")
-   (set_attr "length" "8,8,8,4,4,4,4,4,4")
-   (set_attr "arch" "*,*,*,simd,*,*,*,fp,fp")]
+   (set_attr "length" "8,4,8,8,4,4,4,4,4,4")
+   (set_attr "arch" "*,simd,*,*,simd,*,*,*,fp,fp")]
 )
 
 ;; Split a TImode register-register or register-immediate move into
@@ -1511,9 +1512,9 @@ (define_split
 
 (define_insn "*movtf_aarch64"
   [(set (match_operand:TF 0
-"nonimmediate_operand" "=w,?&r,w ,?r,w,?w,w,m,?r,m ,m")
+"nonimmediate_operand" "=w,?r ,w ,?r,w,?w,w,m,?r,m ,m")
(match_operand:TF 1
-"general_operand"  " w,?r, ?r,w ,Y,Y ,m,w,m ,?r,Y"))]
+"general_operand"  " w,?rY,?r,w ,Y,Y ,m,w,m ,?r,Y"))]
   "TARGET_FLOAT && (register_operand (operands[0], TFmode)
 || aarch64_reg_or_fp_zero (operands[1], TFmode))"
   "@
@@ -1536,7 +1537,7 @@ (define_insn "*movtf_aarch64"
 
 (define_split
[(set (match_operand:TF 0 "register_operand" "")
-(match_operand:TF 1 "aarch64_reg_or_imm" ""))]
+(match_operand:TF 1 "nonmemory_operand" ""))]
   "reload_completed && aarch64_split_128bit_move_p (operands[0], operands[1])"
   [(const_int 0)]
   {
diff --git a/gcc/testsuite/gcc.target/aarch64/movtf_1.c 
b/gcc/testsuite/gcc.target/aarch64/movtf_1.c
new file mode 100644
index 000..570de931389
--- /dev/null
+++ b/gcc/testsuite/gcc.target/aarch64/movtf_1.c
@@ -0,0 +1,87 @@
+/* { dg-do compile } */
+/* { dg-options "-O" } */
+/* { dg-final { check-function-bodies "**" "" } } */
+
+/*
+** zero_q:
+** movi v0.2d, #0
+** ret

Re: [PATCH] aarch64: Add extend-as-extract-with-shift pattern [PR96998]

2020-09-30 Thread Alex Coplan via Gcc-patches
On 29/09/2020 14:20, Segher Boessenkool wrote:
> On Tue, Sep 29, 2020 at 11:36:12AM +0100, Alex Coplan wrote:
> > Is the combine change (a canonicalization fix, as described below) OK
> > for trunk in light of this info?
> 
> Can you please resend it with correct info and a corresponding commit
> message?

Sure. I've sent just the combine patch with a proposed commit message:

https://gcc.gnu.org/pipermail/gcc-patches/2020-September/555158.html

Thanks,
Alex


[PATCH][GCC][ARM] Add support for Cortex-A78 and Cortex-A78AE

2020-09-30 Thread Przemyslaw Wirkus via Gcc-patches
This patch introduces support for Cortex-A78 [0] and Cortex-A78AE [1]
cpus.

[0]: https://www.arm.com/products/silicon-ip-cpu/cortex-a/cortex-a78
[1]: https://www.arm.com/products/silicon-ip-cpu/cortex-a/cortex-a78ae

OK for master branch ?

kind regards
Przemyslaw Wirkus

gcc/ChangeLog:

* config/arm/arm-cpus.in: Add Cortex-A78 and Cortex-A78AE cores.
* config/arm/arm-tables.opt: Regenerate.
* config/arm/arm-tune.md: Regenerate.
* doc/invoke.texi: Update docs.


rb13552.patch
Description: rb13552.patch


[PATCH][GCC][AArch64] Add support for Cortex-A78 and Cortex-A78AE

2020-09-30 Thread Przemyslaw Wirkus via Gcc-patches
This patch introduces support for Cortex-A78 [0] and Cortex-A78AE [1]
cpus.

[0]: https://www.arm.com/products/silicon-ip-cpu/cortex-a/cortex-a78
[1]: https://www.arm.com/products/silicon-ip-cpu/cortex-a/cortex-a78ae

OK for master branch ?

kind regards
Przemyslaw Wirkus

gcc/ChangeLog:

* config/aarch64/aarch64-cores.def: Add Cortex-A78 and Cortex-A78AE 
cores.
* config/aarch64/aarch64-tune.md: Regenerate.
* doc/invoke.texi: Add -mtune=cortex-a78 and -mtune=cortex-a78ae.


rb13551.patch
Description: rb13551.patch


[PATCH v2] combine: Don't turn (mult (extend x) 2^n) into extract [PR96998]

2020-09-30 Thread Alex Coplan via Gcc-patches
Currently, make_extraction() identifies where we can emit an ASHIFT of
an extend in place of an extraction, but fails to make the corresponding
canonicalization/simplification when presented with a MULT by a power of
two. Such a representation is canonical when representing a left-shifted
address inside a MEM.

This patch remedies this situation: after the patch, make_extraction()
now also identifies RTXs such as:

(mult:DI (subreg:DI (reg:SI r)) (const_int 2^n))

and rewrites this as:

(mult:DI (sign_extend:DI (reg:SI r)) (const_int 2^n))

instead of using a sign_extract.
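
As a sanity check on why the two forms agree, here is a small arithmetic
model in C for n = 2, matching the (const_int 4) and (const_int 34) seen in
the dumps in this message.  It is only a sketch: sext32, extract_form and
extend_form are names invented for this illustration, and the "upper"
parameter models the unspecified upper half of the paradoxical subreg.

```c
#include <stdint.h>

/* Sign-extend the low 32 bits of X to 64 bits (result as a bit pattern).  */
uint64_t
sext32 (uint32_t x)
{
  return ((uint64_t) x ^ 0x80000000u) - 0x80000000u;
}

/* Model of
     (sign_extract:DI (mult:DI (subreg:DI (reg:SI r) 0) (const_int 4))
                      (const_int 34) (const_int 0))
   UPPER stands for whatever happens to be in the upper 32 bits.  */
uint64_t
extract_form (uint32_t r, uint32_t upper)
{
  uint64_t wide = ((uint64_t) upper << 32) | r;
  uint64_t low34 = (wide * 4) & ((UINT64_C (1) << 34) - 1);
  uint64_t sign = UINT64_C (1) << 33;
  /* Sign-extend from bit 33.  */
  return (low34 ^ sign) - sign;
}

/* Model of (mult:DI (sign_extend:DI (reg:SI r)) (const_int 4)).  */
uint64_t
extend_form (uint32_t r)
{
  return sext32 (r) * 4;
}
```

Both functions return the same 64-bit pattern for any r and any upper
bits, since bit 33 of the product is bit 31 of r.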

(This patch also fixes up a comment in expand_compound_operation() which
appears to have suffered from bitrot.)

This fixes PR96998: an ICE on AArch64 due to an unrecognised
sign_extract insn which was exposed by
r11-2903-g6b3034eaba83935d9f6dfb20d2efbdb34b5b00bf. That change
introduced a canonicalisation in LRA to rewrite mult to shift in address
reloads.

Prior to this patch, the flow was as follows. We start with the
following insn going into combine:

(insn 9 8 10 3 (set (mem:SI (plus:DI (mult:DI (reg:DI 98 [ g ])
(const_int 4 [0x4]))
(reg/f:DI 96)) [3 *i_5+0 S4 A32])
(asm_operands:SI ("") ("=Q") 0 []
 []
 [] test.c:11)) "test.c":11:5 -1
 (expr_list:REG_DEAD (reg:DI 98 [ g ])
(nil)))

Then combine turns this into a sign_extract:

(insn 9 8 10 3 (set (mem:SI (plus:DI (sign_extract:DI (mult:DI (subreg:DI 
(reg/v:SI 92 [ g ]) 0)
(const_int 4 [0x4]))
(const_int 34 [0x22])
(const_int 0 [0]))
(reg/f:DI 96)) [3 *i_5+0 S4 A32])
(asm_operands:SI ("") ("=Q") 0 []
 []
 [] test.c:11)) "test.c":11:5 -1
 (expr_list:REG_DEAD (reg/v:SI 92 [ g ])
(nil)))

Then LRA reloads the address and (prior to the LRA change) we get:

(insn 32 8 9 3 (set (reg:DI 0 x0 [100])
(plus:DI (sign_extract:DI (mult:DI (reg:DI 0 x0 [orig:92 g ] [92])
(const_int 4 [0x4]))
(const_int 34 [0x22])
(const_int 0 [0]))
(reg/f:DI 19 x19 [96]))) "test.c":11:5 283 {*add_extvdi_multp2}
 (nil))
(insn 9 32 10 3 (set (mem:SI (reg:DI 0 x0 [100]) [3 *i_5+0 S4 A32])
(asm_operands:SI ("") ("=Q") 0 []
 []
 [] test.c:11)) "test.c":11:5 -1
 (nil))

Now observe that insn 32 here is not canonical: firstly, we should be
using an ASHIFT by 2 instead of a MULT by 4, since we're outside of a
MEM. Indeed, the LRA change remedies this, and support for such insns in
the AArch64 backend was dropped in
r11-3033-g2f8ae301f6a125f50b0a758047fcddae7b68daa8.

Now the reason we ICE after the LRA change here is that AArch64 has
never supported the ASHIFT variant of this sign_extract insn. Inspecting
the unrecognised reloaded insn confirms this:

(gdb) p debug(insn)
(insn 33 8 34 3 (set (reg:DI 100)
(sign_extract:DI (ashift:DI (subreg:DI (reg/v:SI 92 [ g ]) 0)
(const_int 2 [0x2]))
(const_int 34 [0x22])
(const_int 0 [0]))) "test.c":11:5 -1
 (nil))

The thesis of this patch is that combine should _never_ be producing
such an insn. Clearly this should be canonicalised as an extend
operation instead (as combine already does in make_extraction() for the
ASHIFT form). After this change to combine, we get:

(insn 9 8 10 3 (set (mem:SI (plus:DI (mult:DI (sign_extend:DI (reg/v:SI 92 [ g 
]))
(const_int 4 [0x4]))
(reg/f:DI 96)) [3 *i_5+0 S4 A32])
(asm_operands:SI ("") ("=Q") 0 []
 []
 [] test.c:11)) "test.c":11:5 -1
 (expr_list:REG_DEAD (reg/v:SI 92 [ g ])
(nil)))

coming out of combine, and LRA can happily reload the address:

(insn 32 8 9 3 (set (reg:DI 0 x0 [100])
(plus:DI (ashift:DI (sign_extend:DI (reg/v:SI 0 x0 [orig:92 g ] [92]))
(const_int 2 [0x2]))
(reg/f:DI 19 x19 [96]))) "test.c":11:5 245 {*add_extendsi_shft_di}
 (nil))
(insn 9 32 10 3 (set (mem:SI (reg:DI 0 x0 [100]) [3 *i_5+0 S4 A32])
(asm_operands:SI ("") ("=Q") 0 []
 []
 [] test.c:11)) "test.c":11:5 -1
 (nil))

and all is well, with nice simple and canonical RTL being used
throughout.

Testing:
 * Bootstrap and regtest on aarch64-linux-gnu, arm-linux-gnueabihf, and
   x86-linux-gnu in progress.

OK for trunk (with AArch64 changes discussed here [0] as a follow-on
patch) provided it passes testing?

Thanks,
Alex

[0] : https://gcc.gnu.org/pipermail/gcc-patches/2020-September/554257.html

---

gcc/ChangeLog:

PR target/96998
* combine.c (expand_compound_operation): Tweak variable name in
comment to match source.
(make_extraction): Handle mult by power of two in addition to
ashift.

gcc/testsuite/ChangeLog:

PR target/96998
* gcc.c-torture/compile/pr96998.c: New test.
diff --git 

Re: [PATCH PR96757] aarch64: ICE during GIMPLE pass: vect

2020-09-30 Thread Richard Sandiford via Gcc-patches
Thanks for the update, looks good apart from…

"duanbo (C)"  writes:
> @@ -4361,7 +4391,7 @@ vect_recog_mask_conversion_pattern (vec_info *vinfo,
>if (known_eq (TYPE_VECTOR_SUBPARTS (vectype1),
>   TYPE_VECTOR_SUBPARTS (vectype2))
> && (TREE_CODE (rhs1) == SSA_NAME
> -   || rhs1_type == TREE_TYPE (TREE_OPERAND (rhs1, 0))))
> +   || !rhs1_op0_type || !rhs1_op1_type))
>   return NULL;

…I think this should be:

  && (TREE_CODE (rhs1) == SSA_NAME
  || (!rhs1_op0_type && !rhs1_op1_type))

i.e. punt only if both types are already OK.  If one operand wants
a specific mask type, we should continue to the code below and attach
the chosen type to the comparison.

Although I guess this simplifies to:

  if (known_eq (TYPE_VECTOR_SUBPARTS (vectype1),
TYPE_VECTOR_SUBPARTS (vectype2))
  && !rhs1_op0_type
  && !rhs1_op1_type)
return NULL;

(I think the comment above the code is still accurate with this change.)

> @@ -4393,7 +4423,16 @@ vect_recog_mask_conversion_pattern (vec_info *vinfo,
>if (TREE_CODE (rhs1) != SSA_NAME)
>   {
> tmp = vect_recog_temp_ssa_var (TREE_TYPE (rhs1), NULL);
> -   pattern_stmt = gimple_build_assign (tmp, rhs1);
> +   if (rhs1_op0_type && TYPE_PRECISION (rhs1_op0_type)
> + != TYPE_PRECISION (rhs1_type))
> + rhs1_op0 = build_mask_conversion (vinfo, rhs1_op0,
> +   vectype2, stmt_vinfo);
> +   if (rhs1_op1_type && TYPE_PRECISION (rhs1_op1_type)
> + != TYPE_PRECISION (rhs1_type))

Very minor -- I would have fixed this up before committing if it
wasn't for the above -- but: GCC formatting is instead:

  if (rhs1_op1_type
  && TYPE_PRECISION (rhs1_op1_type) != TYPE_PRECISION (rhs1_type))

LGTM with those changes, thanks.

Richard

> + rhs1_op1 = build_mask_conversion (vinfo, rhs1_op1,
> +   vectype2, stmt_vinfo);
> +   pattern_stmt = gimple_build_assign (tmp, TREE_CODE (rhs1),
> +   rhs1_op0, rhs1_op1);
> rhs1 = tmp;
> append_pattern_def_seq (vinfo, stmt_vinfo, pattern_stmt, vectype2,
> rhs1_type);



Re: [PATCH] i386: Define __LAHF_SAHF__ and __MOVBE__ macros, based on ISA flags

2020-09-30 Thread Uros Bizjak via Gcc-patches
On Tue, Sep 29, 2020 at 5:47 PM Florian Weimer  wrote:
>
> It looks like these have been omitted by accident.
>
> gcc/
> * config/i386/i386-c.c (ix86_target_macros_internal): Define
> __LAHF_SAHF__ and __MOVBE__ based on ISA flags.

LGTM.

Thanks,
Uros.

>
> ---
>  gcc/config/i386/i386-c.c | 4 
>  1 file changed, 4 insertions(+)
>
> diff --git a/gcc/config/i386/i386-c.c b/gcc/config/i386/i386-c.c
> index 9da682ab05c..e647fce9ad4 100644
> --- a/gcc/config/i386/i386-c.c
> +++ b/gcc/config/i386/i386-c.c
> @@ -594,6 +594,10 @@ ix86_target_macros_internal (HOST_WIDE_INT isa_flag,
>  def_or_undef (parse_in, "__AMX_INT8__");
>if (isa_flag2 & OPTION_MASK_ISA2_AMX_BF16)
>  def_or_undef (parse_in, "__AMX_BF16__");
> +  if (isa_flag & OPTION_MASK_ISA_SAHF)
> +def_or_undef (parse_in, "__LAHF_SAHF__");
> +  if (isa_flag2 & OPTION_MASK_ISA2_MOVBE)
> +def_or_undef (parse_in, "__MOVBE__");
>
>if (TARGET_IAMCU)
>  {
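
For illustration only, user code could test the two macros as sketched
below.  The helper names have_movbe and have_lahf_sahf are made up for this
example; the assumption, taken from the patch description, is that the
macros are defined exactly when the corresponding ISA flags are enabled
(for example with -mmovbe or -msahf).

```c
#include <stdbool.h>

/* True if this translation unit was compiled with MOVBE enabled.  */
bool
have_movbe (void)
{
#ifdef __MOVBE__
  return true;
#else
  return false;
#endif
}

/* True if this translation unit was compiled with LAHF/SAHF enabled.  */
bool
have_lahf_sahf (void)
{
#ifdef __LAHF_SAHF__
  return true;
#else
  return false;
#endif
}
```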


Re: [PATCH] avoid modifying type in place (PR 97206)

2020-09-30 Thread Jakub Jelinek via Gcc-patches
On Tue, Sep 29, 2020 at 03:40:40PM -0600, Martin Sebor via Gcc-patches wrote:
> I will commit this patch later this week unless I hear concerns
> or suggestions for changes.

That is not how the patch review process works.

> +   arat = tree_cons (get_identifier ("array"), flag, NULL_TREE);

Better
  arat = build_tree_list (get_identifier ("array"), flag);
then, tree_cons is when you have a meaningful TREE_CHAIN you want to supply
too.
>   }
>  
> -  TYPE_ATOMIC (artype) = TYPE_ATOMIC (type);
> -  TYPE_READONLY (artype) = TYPE_READONLY (type);
> -  TYPE_RESTRICT (artype) = TYPE_RESTRICT (type);
> -  TYPE_VOLATILE (artype) = TYPE_VOLATILE (type);
> -  type = artype;
> +  const int quals = TYPE_QUALS (type);
> +  type = build_array_type (eltype, index_type);
> +  type = build_type_attribute_qual_variant (type, arat, quals);
>  }
>  
>/* Format the type using the current pretty printer.  The generic tree
> @@ -2309,10 +2304,6 @@ attr_access::array_as_string (tree type) const
>typstr = pp_formatted_text (pp);
>delete pp;
>  
> -  if (this->str)
> -/* Remove the attribute that wasn't installed by decl_attributes.  */
> -TYPE_ATTRIBUTES (type) = NULL_TREE;
> -
>return typstr;
>  }

Otherwise LGTM.

Jakub



Re: [PATCH] options: Save and restore opts_set for Optimization and Target options

2020-09-30 Thread Jakub Jelinek via Gcc-patches
On Mon, Sep 28, 2020 at 09:50:00PM +0200, Stefan Schulze Frielinghaus via 
Gcc-patches wrote:
> This patch breaks quite a few test cases (target-attribute/tattr-*) on
> IBM Z.  Having a look at function cl_target_option_restore reveals that
> some members of opts_set are reduced to 1 or 0 depending on whether a
> member was set before or not, e.g. for target_flags we have

I've tried to reproduce the tattr FAILs reported in
https://gcc.gnu.org/pipermail/gcc-testresults/2020-September/608760.html
in a cross-compiler (with
#define HAVE_AS_MACHINE_MACHINEMODE 1
), but couldn't, neither the ICEs nor the scan-assembler failures.
Anyway, could you do a side-by-side debugging of one of those failures
before/after my change and see what behaves differently?

Jakub



Re: Another issue on RS6000 target. Re: One issue with default implementation of zero_call_used_regs

2020-09-30 Thread Richard Sandiford via Gcc-patches
Qing Zhao  writes:
> Hi, Richard,
>
> At the same time testing aarch64, I also tested the default implementation on 
> rs6000 target. 
>
> The default implementation now is:
>
> +/* The default hook for TARGET_ZERO_CALL_USED_REGS.  */
> +
> +HARD_REG_SET
> +default_zero_call_used_regs (HARD_REG_SET need_zeroed_hardregs)
> +{
> +  gcc_assert (!hard_reg_set_empty_p (need_zeroed_hardregs));
> +
> +  for (unsigned int regno = 0; regno < FIRST_PSEUDO_REGISTER; regno++)
> +if (TEST_HARD_REG_BIT (need_zeroed_hardregs, regno))
> +  {
> +   machine_mode mode = reg_raw_mode[regno];
> +   rtx reg = gen_rtx_REG (mode, regno);
> +   emit_move_insn (reg, const0_rtx);

This should just be:

rtx zero = CONST0_RTX (reg_raw_mode[regno]);
emit_move_insn (regno_reg_rtx[regno], zero);

> +  }
> +  return need_zeroed_hardregs;
> +}
> +
>
> With the small testing case:
> int
> test ()
> {
>   return 1;
> }
>
> If I compiled it with 
>
> /home/qinzhao/Install/latest/bin/gcc -O2 -fzero-call-used-regs=all-arg t.c
>
> It fails as follows:
>
> t.c: In function ‘test’:
> t.c:6:1: error: insn does not satisfy its constraints:
> 6 | }
>   | ^
> (insn 28 27 29 (set (reg:DI 33 1)
> (const_int 0 [0])) "t.c":6:1 647 {*movdi_internal64}
>  (nil))
> during RTL pass: shorten
> dump file: t.c.319r.shorten
> t.c:6:1: internal compiler error: in extract_constrain_insn_cached, at 
> recog.c:2207
> 0x1018d693 _fatal_insn(char const*, rtx_def const*, char const*, int, char 
> const*)
>   ../../latest-gcc-x86/gcc/rtl-error.c:108
> 0x1018d6e7 _fatal_insn_not_found(rtx_def const*, char const*, int, char 
> const*)
>   ../../latest-gcc-x86/gcc/rtl-error.c:118
> 0x1099a82b extract_constrain_insn_cached(rtx_insn*)
>   ../../latest-gcc-x86/gcc/recog.c:2207
> 0x11393917 insn_min_length(rtx_insn*)
>   ../../latest-gcc-x86/gcc/config/rs6000/rs6000.md:721
> 0x105bece3 shorten_branches(rtx_insn*)
>   ../../latest-gcc-x86/gcc/final.c:1118
>
>
> As I checked, when the FP registers are zeroed, the above failure happened.
>
> I suspect that the issue still relates to the following statement:
>
> machine_mode mode = reg_raw_mode[regno];
>
> As far as I can tell, reg_raw_mode always returns the integer mode that can
> be held by the hard register, even if it is an FP register.

Well, more precisely: it's the largest mode that the target allows the
registers to hold.  If there are multiple candidate modes of the same
size, the integer one wins, like you say.  But the point is that DI only
wins over DF because the target allows both DI and DF to be stored in
the register, and therefore supports both DI and DF moves for that
register.

So I don't think the mode is the issue.  Integer zero and floating-point
zero have the same bit representation after all.
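
A quick way to check the bit-representation claim outside the compiler is
the sketch below.  It assumes double is IEEE 754 binary64, which I believe
holds on the targets discussed here; double_zero_bits is a name made up
for this example.

```c
#include <stdint.h>
#include <string.h>

_Static_assert (sizeof (double) == sizeof (uint64_t),
                "sketch assumes a 64-bit double");

/* Return the object representation of +0.0 as a 64-bit integer.  */
uint64_t
double_zero_bits (void)
{
  double d = 0.0;
  uint64_t bits;
  memcpy (&bits, &d, sizeof bits);
  return bits;
}
```

On such targets the result is the all-zero pattern, the same bits a DImode
move of const0_rtx would write.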

AIUI, without VSX, Power needs to load the zero from the constant pool.

> So, I am still wondering:
>
> 1. Is there another available utility routine that returns the proper MODE 
> for the hard registers that can be readily used to zero the hard register?
> 2. If not, should I add one more target hook for this purpose? i.e 
>
> /* Return the proper machine mode that can be used to zero this hard register 
> specified by REGNO.  */
> machine_mode zero-call-used-regs-mode (unsigned int REGNO)
>
> 3. Or should I just delete the default implementation, and let the target
> implement it?

IMO no.  This goes back to what we discussed earlier.  It isn't the
case that a default target hook has to be correct for all targets,
with targets only overriding them as an optimisation.  The default
versions of many hooks and macros are not conservatively correct.
They are just reasonable default assumptions.  And IMO that's true
of the hook above too.

The way to flush out whether a target needs to override the hook
is to add tests that run on all targets.

That said, one way of erring on the side of caution from an ICE
perspective would be to do something like:

rtx_insn *last_insn = get_last_insn ();
rtx zero = CONST0_RTX (reg_raw_mode[regno]);
rtx_insn *insn = emit_insn (gen_rtx_SET (regno_reg_rtx[regno], zero));
if (!valid_insn_p (insn))
  {
delete_insns_since (last_insn);
...remove regno from the set of cleared registers...;
  }

where valid_insn_p abstracts out this code from ira.c:

  recog_memoized (move_insn);
  if (INSN_CODE (move_insn) < 0)
continue;
  extract_insn (move_insn);
  /* We don't know whether the move will be in code that is optimized
 for size or speed, so consider all enabled alternatives.  */
  if (! constrain_operands (1, get_enabled_alternatives (move_insn)))
continue;

(but keeping the comment where it is).  The default behaviour would then
be to drop any register that can't be zeroed easily.
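
The intended contract can be shown with a toy model in plain C.  This is
not GCC code: registers are bits in a 32-bit mask, and can_zero_fn is a
hypothetical stand-in for the valid_insn_p check sketched above.

```c
#include <stdbool.h>
#include <stdint.h>

/* Hypothetical predicate: can this register be zeroed with one valid
   move instruction?  */
typedef bool (*can_zero_fn) (unsigned regno);

/* Try to zero every register in NEED (one bit per register) and return
   the subset that was actually zeroed.  Registers for which the move
   would not be valid are silently dropped.  */
uint32_t
zero_regs (uint32_t need, can_zero_fn can_zero)
{
  uint32_t zeroed = 0;
  for (unsigned regno = 0; regno < 32; regno++)
    if ((need & (UINT32_C (1) << regno)) && can_zero (regno))
      zeroed |= UINT32_C (1) << regno;   /* the move would be emitted here */
  return zeroed;
}

/* Example predicate: pretend registers 16..31 are FPRs that cannot be
   zeroed with a plain move.  */
bool
gpr_only (unsigned regno)
{
  return regno < 16;
}
```

The caller sees which registers were really cleared from the return
value, in the same way the hook returns a HARD_REG_SET.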

Doing this would make the default hook usable for more targets.
The question is whether dropping registers that can't be 

[Patch] Fortran: add contiguous check for ptr assignment, fix non-contig check (PR97242)

2020-09-30 Thread Tobias Burnus

The non-contiguous check gave both false-positive and false-negative
results. Some more refinements
are surely possible, but hopefully there are no longer
any false positives.

I also now used this check for pointer assignments where the
LHS pointer has the contiguous attribute.

In the non-contiguous-check function:
- for 'dt(i)%array' it returned true due to dt(i) but that's
  an element, which is contiguous.
- ref_size (which is a size) is compared with 'arr_size' calculated
  via dep_difference, which returns upper-lower, but the array size is
  (upper-lower)+1.
- fixed a memory leak.
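
The off-by-one in the second item can be illustrated with a small sketch in
C.  The helpers dim_extent and section_size are made up for this example
and are not part of gfortran.

```c
/* Number of elements in a dimension declared as LOWER:UPPER.  */
long
dim_extent (long lower, long upper)
{
  return upper - lower + 1;
}

/* Number of elements selected by the section LO:HI:STRIDE,
   assuming STRIDE > 0.  */
long
section_size (long lo, long hi, long stride)
{
  return hi < lo ? 0 : (hi - lo) / stride + 1;
}
```

For a dimension declared as 1:5, upper-lower is 4 while the extent is 5, so
a full section (size 5) only compares equal once 1 is added.  A section
like 1:2:3 selects a single element, which is why such a case cannot be
diagnosed as non-contiguous.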

OK?

Tobias

Fortran: add contiguous check for ptr assignment, fix non-contig check (PR97242)

gcc/fortran/ChangeLog:

	PR fortran/97242
	* expr.c (gfc_is_not_contiguous): Fix check.
	(gfc_check_pointer_assign): Use it.

gcc/testsuite/ChangeLog:

	PR fortran/97242
	* gfortran.dg/contiguous_11.f90: New test.
	* gfortran.dg/contiguous_4.f90: Update.
	* gfortran.dg/contiguous_7.f90: Update.

 gcc/fortran/expr.c  | 26 -
 gcc/testsuite/gfortran.dg/contiguous_11.f90 | 45 +
 gcc/testsuite/gfortran.dg/contiguous_4.f90  |  6 ++--
 gcc/testsuite/gfortran.dg/contiguous_7.f90  | 16 --
 4 files changed, 82 insertions(+), 11 deletions(-)

diff --git a/gcc/fortran/expr.c b/gcc/fortran/expr.c
index 68784a235f1..b87ae3d72a1 100644
--- a/gcc/fortran/expr.c
+++ b/gcc/fortran/expr.c
@@ -4366,10 +4366,18 @@ gfc_check_pointer_assign (gfc_expr *lvalue, gfc_expr *rvalue,
  contiguous.  */
 
   if (lhs_attr.contiguous
-  && lhs_attr.dimension > 0
-  && !gfc_is_simply_contiguous (rvalue, false, true))
-gfc_warning (OPT_Wextra, "Assignment to contiguous pointer from "
-		 "non-contiguous target at %L", &rvalue->where);
+  && lhs_attr.dimension > 0)
+{
+  if (gfc_is_not_contiguous (rvalue))
+	{
+	  gfc_error ("Assignment to contiguous pointer from "
+		 "non-contiguous target at %L", &rvalue->where);
+	  return false;
+	}
+  if (!gfc_is_simply_contiguous (rvalue, false, true))
+	gfc_warning (OPT_Wextra, "Assignment to contiguous pointer from "
+ "non-contiguous target at %L", &rvalue->where);
+}
 
   /* Warn if it is the LHS pointer may lives longer than the RHS target.  */
   if (warn_target_lifetime
@@ -5935,7 +5943,7 @@ gfc_is_not_contiguous (gfc_expr *array)
 {
   /* Array-ref shall be last ref.  */
 
-  if (ar)
+  if (ar && ar->type != AR_ELEMENT)
 	return true;
 
   if (ref->type == REF_ARRAY)
@@ -5955,10 +5963,11 @@ gfc_is_not_contiguous (gfc_expr *array)
 
   if (gfc_ref_dimen_size (ar, i, &ref_size, NULL))
 	{
-	  if (gfc_dep_difference (ar->as->lower[i], ar->as->upper[i], &arr_size))
+	  if (gfc_dep_difference (ar->as->upper[i], ar->as->lower[i], &arr_size))
 	{
 	  /* a(2:4,2:) is known to be non-contiguous, but
 		 a(2:4,i:i) can be contiguous.  */
+	  mpz_add_ui (arr_size, arr_size, 1L);
 	  if (previous_incomplete && mpz_cmp_si (ref_size, 1) != 0)
 		{
 		  mpz_clear (arr_size);
@@ -5979,7 +5988,10 @@ gfc_is_not_contiguous (gfc_expr *array)
 	  && ar->dimen_type[i] == DIMEN_RANGE
 	  && ar->stride[i] && ar->stride[i]->expr_type == EXPR_CONSTANT
 	  && mpz_cmp_si (ar->stride[i]->value.integer, 1) != 0)
-	return true;
+	{
+	  mpz_clear (ref_size);
+	  return true;
+	}
 
 	  mpz_clear (ref_size);
 	}
diff --git a/gcc/testsuite/gfortran.dg/contiguous_11.f90 b/gcc/testsuite/gfortran.dg/contiguous_11.f90
new file mode 100644
index 000..b7eb7bfd0b4
--- /dev/null
+++ b/gcc/testsuite/gfortran.dg/contiguous_11.f90
@@ -0,0 +1,45 @@
+! { dg-do compile }
+!
+! PR fortran/97242
+!
+implicit none
+type t
+  integer, allocatable :: A(:,:,:)
+  integer :: D(5,5,5)
+end type t
+
+type(t), target :: B(5)
+integer, pointer, contiguous :: P(:,:,:)
+integer, target :: C(5,5,5)
+integer :: i
+
+i = 1
+
+! OK: contiguous
+P => B(i)%A
+P => B(i)%A(:,:,:)
+P => C
+P => C(:,:,:)
+call foo (B(i)%A)
+call foo (B(i)%A(:,:,:))
+call foo (C)
+call foo (C(:,:,:))
+
+! Invalid - not contiguous
+! "If the pointer object has the CONTIGUOUS attribute, the pointer target shall be contiguous."
+! → known to be noncontigous (not always checkable, however)
+P => B(i)%A(:,::3,::4)   ! <<< Unknown as (1:2:3,1:3:4) is contiguous and has one element.
+P => B(i)%D(:,::2,::2)   ! { dg-error "Assignment to contiguous pointer from non-contiguous target" }
+P => C(::2,::2,::2)  ! { dg-error "Assignment to contiguous pointer from non-contiguous target" }
+
+! This following is stricter:
+! C1541  The actual argument corresponding to a dummy pointer with the
+!CONTIGUOUS attribute shall be simply contiguous (9.5.4).
+call foo (B(i)%A(:,::3,::4))  ! { dg-error "must be simply contiguous" }
+call foo (C(::2,::2,::2)) ! { dg-error "must be 

[PATCH] c++: Handle std::construct_at on automatic vars during constant evaluation [PR97195]

2020-09-30 Thread Jakub Jelinek via Gcc-patches
Hi!

As mentioned in the PR, due to a bug we currently support
std::construct_at in constant expressions only on non-automatic
variables, because we VERIFY_CONSTANT the second argument of placement
new, and that verification fails if the argument is the address of an
automatic variable.
The following patch fixes it by not performing that verification; the
placement new evaluation later on will verify it after it is dereferenced.

Bootstrapped/regtested on x86_64-linux and i686-linux, ok for trunk?

2020-09-30  Jakub Jelinek  

PR c++/97195
* constexpr.c (cxx_eval_call_expression): Don't VERIFY_CONSTANT the
second argument.

* g++.dg/cpp2a/constexpr-new14.C: New test.

--- gcc/cp/constexpr.c.jj   2020-09-22 21:08:01.993199681 +0200
+++ gcc/cp/constexpr.c  2020-09-29 18:37:09.517051012 +0200
@@ -2342,9 +2342,10 @@ cxx_eval_call_expression (const constexp
  tree arg = CALL_EXPR_ARG (t, i);
  arg = cxx_eval_constant_expression (ctx, arg, false,
  non_constant_p, overflow_p);
- VERIFY_CONSTANT (arg);
  if (i == 1)
arg1 = arg;
+ else
+   VERIFY_CONSTANT (arg);
}
  gcc_assert (arg1);
  return arg1;
--- gcc/testsuite/g++.dg/cpp2a/constexpr-new14.C.jj 2020-09-29 18:40:52.834785887 +0200
+++ gcc/testsuite/g++.dg/cpp2a/constexpr-new14.C    2020-09-29 18:40:47.707860852 +0200
@@ -0,0 +1,73 @@
+// PR c++/97195
+// { dg-do compile { target c++20 } }
+
+namespace std
+{
+  typedef __SIZE_TYPE__ size_t;
+
+  template <typename T>
+  struct allocator
+  {
+    constexpr allocator () noexcept {}
+
+    constexpr T *allocate (size_t n)
+    { return static_cast<T *> (::operator new (n * sizeof(T))); }
+
+    constexpr void
+    deallocate (T *p, size_t n)
+    { ::operator delete (p); }
+  };
+
+  template <typename T, typename U = T &&>
+  U __declval (int);
+  template <typename T>
+  T __declval (long);
+  template <typename T>
+  auto declval () noexcept -> decltype (__declval<T> (0));
+
+  template <typename T>
+  struct remove_reference
+  { typedef T type; };
+  template <typename T>
+  struct remove_reference<T &>
+  { typedef T type; };
+  template <typename T>
+  struct remove_reference<T &&>
+  { typedef T type; };
+
+  template <typename T>
+  constexpr T &&
+  forward (typename std::remove_reference<T>::type &t) noexcept
+  { return static_cast<T&&> (t); }
+
+  template<typename T>
+  constexpr T &&
+  forward (typename std::remove_reference<T>::type &&t) noexcept
+  { return static_cast<T&&> (t); }
+
+  template <typename T, typename... A>
+  constexpr auto
+  construct_at (T *l, A &&... a)
+  noexcept (noexcept (::new ((void *) 0) T (std::declval<A> ()...)))
+  -> decltype (::new ((void *) 0) T (std::declval<A> ()...))
+  { return ::new ((void *) l) T (std::forward<A> (a)...); }
+
+  template <typename T>
+  constexpr inline void
+  destroy_at (T *l)
+  { l->~T (); }
+}
+
+inline void *operator new (std::size_t, void *p) noexcept
+{ return p; }
+
+constexpr bool
+foo ()
+{
+  int a = 5;
+  int *p = std::construct_at (&a, -1);
+  if (p[0] != -1)
+throw 1;
+  return true;
+}
+constexpr bool b = foo ();

Jakub



[PATCH v2] c++: Fix up default initialization with consteval default ctor [PR96994]

2020-09-30 Thread Jakub Jelinek via Gcc-patches
On Fri, Sep 25, 2020 at 04:30:26PM -0400, Jason Merrill via Gcc-patches wrote:
> On 9/15/20 3:57 AM, Jakub Jelinek wrote:
> > The following testcase is miscompiled (in particular the a and i
> > initialization).  The problem is that build_special_member_call due to
> > the immediate constructors (but not evaluated in constant expression mode)
> > doesn't create a CALL_EXPR, but returns a TARGET_EXPR with CONSTRUCTOR
> > as the initializer for it,
> 
> That seems like the bug; at the end of build_over_call, after you
> 
> >call = cxx_constant_value (call, obj_arg);
> 
> You need to build an INIT_EXPR if obj_arg isn't a dummy.

That works.  obj_arg is NULL if it is a dummy from the earlier code.

Bootstrapped/regtested on x86_64-linux and i686-linux, ok for trunk?

2020-09-30  Jakub Jelinek  

PR c++/96994
* call.c (build_over_call): If obj_arg is non-NULL, return INIT_EXPR
setting obj_arg to call.

* g++.dg/cpp2a/consteval18.C: New test.

--- gcc/cp/call.c.jj2020-09-10 15:52:50.688207138 +0200
+++ gcc/cp/call.c   2020-09-29 20:39:55.003361651 +0200
@@ -9200,6 +9200,8 @@ build_over_call (struct z_candidate *can
}
}
  call = cxx_constant_value (call, obj_arg);
+ if (obj_arg && !error_operand_p (call))
+   call = build2 (INIT_EXPR, void_type_node, obj_arg, call);
}
 }
   return call;
--- gcc/testsuite/g++.dg/cpp2a/consteval18.C.jj 2020-09-29 20:33:56.533596845 +0200
+++ gcc/testsuite/g++.dg/cpp2a/consteval18.C    2020-09-29 20:33:56.533596845 +0200
@@ -0,0 +1,26 @@
+// PR c++/96994
+// { dg-do run { target c++20 } }
+
+struct A { consteval A () { i = 1; } consteval A (int x) : i (x) {} int i = 0; };
+struct B { constexpr B () { i = 1; } constexpr B (int x) : i (x) {} int i = 0; };
+A const a;
+constexpr A b;
+B const c;
+A const constinit d;
+A const e = 2;
+constexpr A f = 3;
+B const g = 4;
+A const constinit h = 5;
+A i;
+B j;
+A k = 6;
+B l = 7;
+static_assert (b.i == 1 && f.i == 3);
+
+int
+main()
+{
+  if (a.i != 1 || c.i != 1 || d.i != 1 || e.i != 2 || g.i != 4 || h.i != 5
+  || i.i != 1 || j.i != 1 || k.i != 6 || l.i != 7)
+__builtin_abort ();
+}


Jakub



Re: [PATCH] libgomp: Enforce 1-thread limit in subteams

2020-09-30 Thread Jakub Jelinek via Gcc-patches
On Tue, Sep 29, 2020 at 08:25:44PM +0100, Andrew Stubbs wrote:
> 2020-09-29  Andrew Stubbs  
> 
>   * parallel.c (gomp_resolve_num_threads): Ignore nest_var on nvptx
>   and amdgcn targets.
> 
> diff --git a/libgomp/parallel.c b/libgomp/parallel.c
> index 2423f11f44a..0618056a7fe 100644
> --- a/libgomp/parallel.c
> +++ b/libgomp/parallel.c
> @@ -48,7 +48,14 @@ gomp_resolve_num_threads (unsigned specified, unsigned count)
>  
>if (specified == 1)
>  return 1;
> -  else if (thr->ts.active_level >= 1 && !icv->nest_var)
> +
> +  /* Accelerators with fixed thread counts require this to return 1 for
> + nested parallel regions.  */
> +  if (thr->ts.active_level >= 1
> +#if !defined(__AMDGCN__) && !defined(__nvptx__)

I think the comment should go right above the #if !defined line, because
it doesn't describe what the whole if is about, just a particular detail in
it.
I think I'd prefer some macro for this, but we already have quite a few
nvptx and AMDGCN ifdefs in libgomp/*.c, so I can live with that.

So ok for trunk with the comment move.
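
To make the requested placement concrete, here is a minimal standalone
sketch of the condition with the comment moved directly above the #if
line.  The struct and function names are hypothetical stand-ins, not
the libgomp ones; only the shape of the condition follows the quoted
patch.

```c
#include <stdbool.h>

/* Hypothetical stand-in for the ICV data; only nest_var matters here.  */
struct icv_sketch
{
  bool nest_var;
};

/* Returns true when a nested parallel region has to run with a single
   thread.  This models just the one branch touched by the patch.  */
static bool
nested_region_is_serialized (unsigned active_level,
                             const struct icv_sketch *icv)
{
  (void) icv;  /* Unused when the nest_var test is compiled out.  */
  if (active_level >= 1
      /* Accelerators with fixed thread counts require this to return 1
         for nested parallel regions.  */
#if !defined(__AMDGCN__) && !defined(__nvptx__)
      && !icv->nest_var
#endif
      )
    return true;
  return false;
}
```

On a host build the nest_var test stays in, so a nested region is only
serialized when nesting is disabled; on the two accelerator targets the
test is compiled out and every nested region is serialized.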

> +  && !icv->nest_var
> +#endif
> +  )
>  return 1;
>else if (thr->ts.active_level >= gomp_max_active_levels_var)
>  return 1;


Jakub



[PATCH] [PR96608] analyzer: Change cast from long to intptr_t

2020-09-30 Thread Markus Böck via Gcc-patches
Casting to intptr_t states the intent of a pointer-to-integer cast
more clearly and ensures that the cast loses no precision on any
platform.  On LLP64 platforms, for example, long is 4 bytes wide while
pointers are 8 bytes wide, which may even cause compiler errors.
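
A small sketch of the difference, under stated assumptions: hashval_t
is modelled as unsigned int, and low_32_bits is a hypothetical helper
that mimics what a 4-byte long would keep of a 64-bit address.  It is
an illustration only, not the analyzer's code.

```c
#include <stdint.h>

/* Stand-in for GCC's hash value type (an assumption for this sketch).  */
typedef unsigned int hashval_t;

/* Mix a pointer into a hash by going through intptr_t, which is wide
   enough to hold any object pointer on platforms that provide it.  */
static hashval_t
mix_pointer (hashval_t seed, const void *p)
{
  return seed ^ (hashval_t) (intptr_t) p;
}

/* Model of a cast through a 4-byte long on an LLP64 platform: only the
   low 32 bits of the 64-bit address survive.  */
static uint64_t
low_32_bits (uint64_t address)
{
  return (uint64_t) (uint32_t) address;
}
```

Two distinct addresses such as 0x100000010 and 0x200000010 become
indistinguishable after low_32_bits, which is the precision loss the
patch description refers to.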

Fixes PR 96608

Would need this to be committed for me if accepted. (username
zero9178, email markus.boec...@gmail.com)

Markus

gcc/analyzer/ChangeLog:
PR analyzer/96608

* store.h (hash): Cast to intptr_t instead of long

---
 gcc/analyzer/store.h | 2 +-
 1 file changed, 1 insertion(+), 1 deletion(-)

diff --git a/gcc/analyzer/store.h b/gcc/analyzer/store.h
index 0f4e7ab2a56..9589c566e1b 100644
--- a/gcc/analyzer/store.h
+++ b/gcc/analyzer/store.h
@@ -269,7 +269,7 @@ public:

   hashval_t hash () const
   {
-return (binding_key::impl_hash () ^ (long)m_region);
+return (binding_key::impl_hash () ^ (intptr_t)m_region);
   }
   bool operator== (const symbolic_binding ) const
   {
-- 
2.17.1


[RS6000] Adjust gcc asm for power10

2020-09-30 Thread Alan Modra via Gcc-patches
Generate assembly that is .localentry 1 with @notoc calls to match.

Bootstrapped and regression tested powerpc64le-linux on power8, and
bootstrapped on power10.  (I lost the power10 machine to someone else
before I could build a baseline to compare against.)

gcc/
* config/rs6000/ppc-asm.h: Support __PCREL__ code.
libgcc/
* config/rs6000/morestack.S,
* config/rs6000/tramp.S,
* config/powerpc/sjlj.S: Support __PCREL__ code.

diff --git a/gcc/config/rs6000/ppc-asm.h b/gcc/config/rs6000/ppc-asm.h
index 48edc9945d7..e0bce9c5aec 100644
--- a/gcc/config/rs6000/ppc-asm.h
+++ b/gcc/config/rs6000/ppc-asm.h
@@ -262,6 +262,14 @@ see the files COPYING3 and COPYING.RUNTIME respectively.  If not, see
 #undef toc
 
 #define FUNC_NAME(name) GLUE(__USER_LABEL_PREFIX__,name)
+#ifdef __PCREL__
+#define JUMP_TARGET(name) GLUE(FUNC_NAME(name),@notoc)
+#define FUNC_START(name) \
+   .type FUNC_NAME(name),@function; \
+   .globl FUNC_NAME(name); \
+FUNC_NAME(name): \
+   .localentry FUNC_NAME(name),1
+#else
 #define JUMP_TARGET(name) FUNC_NAME(name)
 #define FUNC_START(name) \
.type FUNC_NAME(name),@function; \
@@ -270,6 +278,7 @@ FUNC_NAME(name): \
 0: addis 2,12,(.TOC.-0b)@ha; \
addi 2,2,(.TOC.-0b)@l; \
.localentry FUNC_NAME(name),.-FUNC_NAME(name)
+#endif /* !__PCREL__ */
 
 #define HIDDEN_FUNC(name) \
   FUNC_START(name) \
diff --git a/libgcc/config/rs6000/morestack.S b/libgcc/config/rs6000/morestack.S
index 1b8ebb5dc3b..ac33c882c30 100644
--- a/libgcc/config/rs6000/morestack.S
+++ b/libgcc/config/rs6000/morestack.S
@@ -55,11 +55,18 @@
.type name,@function;   \
 name##:
 
+#ifdef __PCREL__
+#define ENTRY(name)\
+   ENTRY0(name);   \
+   .localentry name, 1
+#define JUMP_TARGET(name) name##@notoc
+#else
 #define ENTRY(name)\
ENTRY0(name);   \
 0: addis %r2,%r12,.TOC.-0b@ha; \
 addi %r2,%r2,.TOC.-0b@l;   \
.localentry name, .-name
+#endif
 
 #else
 
@@ -81,6 +88,9 @@ BODY_LABEL(name)##:
 
 #define SIZE(name) .size name, .-BODY_LABEL(name)
 
+#ifndef JUMP_TARGET
+#define JUMP_TARGET(name) name
+#endif
 
.text
 # Just like __morestack, but with larger excess allocation
@@ -156,7 +166,7 @@ ENTRY0(__morestack)
stdu %r1,-MORESTACK_FRAMESIZE(%r1)
 
# void __morestack_block_signals (void)
-   bl __morestack_block_signals
+   bl JUMP_TARGET(__morestack_block_signals)
 
# void *__generic_morestack (size_t *pframe_size,
#void *old_stack,
@@ -164,7 +174,7 @@ ENTRY0(__morestack)
addi %r3,%r29,NEWSTACKSIZE_SAVE
mr %r4,%r29
li %r5,0# no copying from old stack
-   bl __generic_morestack
+   bl JUMP_TARGET(__generic_morestack)
 
 # Start using new stack
stdu %r29,-32(%r3)  # back-chain
@@ -183,7 +193,7 @@ ENTRY0(__morestack)
std %r3,-0x7000-64(%r13)# tcbhead_t.__private_ss
 
# void __morestack_unblock_signals (void)
-   bl __morestack_unblock_signals
+   bl JUMP_TARGET(__morestack_unblock_signals)
 
 # Set up for a call to the target function, located 3
 # instructions after __morestack's return address.
@@ -218,11 +228,11 @@ ENTRY0(__morestack)
std %r10,PARAMREG_SAVE+56(%r29)
 #endif
 
-   bl __morestack_block_signals
+   bl JUMP_TARGET(__morestack_block_signals)
 
# void *__generic_releasestack (size_t *pavailable)
addi %r3,%r29,NEWSTACKSIZE_SAVE
-   bl __generic_releasestack
+   bl JUMP_TARGET(__generic_releasestack)
 
 # Reset __private_ss stack guard to value for old stack
ld %r12,NEWSTACKSIZE_SAVE(%r29)
@@ -231,7 +241,7 @@ ENTRY0(__morestack)
 .LEHE0:
std %r3,-0x7000-64(%r13)# tcbhead_t.__private_ss
 
-   bl __morestack_unblock_signals
+   bl JUMP_TARGET(__morestack_unblock_signals)
 
 # Use old stack again.
mr %r1,%r29
@@ -260,13 +270,15 @@ cleanup:
std %r3,PARAMREG_SAVE(%r29) # Save exception header
# size_t __generic_findstack (void *stack)
mr %r3,%r29
-   bl __generic_findstack
+   bl JUMP_TARGET(__generic_findstack)
sub %r3,%r29,%r3
addi %r3,%r3,BACKOFF
std %r3,-0x7000-64(%r13)# tcbhead_t.__private_ss
ld %r3,PARAMREG_SAVE(%r29)
-   bl _Unwind_Resume
+   bl JUMP_TARGET(_Unwind_Resume)
+#ifndef __PCREL__
nop
+#endif
.cfi_endproc
SIZE (__morestack)
 
@@ -310,7 +322,7 @@ ENTRY(__stack_split_initialize)
# void __generic_morestack_set_initial_sp (void *sp, size_t len)
mr %r3,%r1
li %r4, 0x4000
-   b __generic_morestack_set_initial_sp
+   b JUMP_TARGET(__generic_morestack_set_initial_sp)
 # The lack of .cfi_endproc here is 

[RS6000] -mno-minimal-toc vs. power10 pcrelative

2020-09-30 Thread Alan Modra via Gcc-patches
We've had this hack in the libgcc config to build libgcc with
-mcmodel=small for powerpc64 for a long time.  It wouldn't be a bad
thing if someone who knows the multilib machinery well could arrange
for -mcmodel=small to be passed just for ppc64 when building for
earlier than power10.  But for now, make -mno-minimal-toc do nothing
when pcrel is in effect, which will do the right thing for any project
that has copied libgcc's trick.

We want this when configuring with --with-cpu=power10 to build a
power10 pcrel libgcc, since -mcmodel=small turns off pcrel.
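
The decision made by the linux64.h hunk below can be modelled as a
small pure function.  This is only a sketch: the struct, its field
names and resolve_cmodel are hypothetical, and the error path for a
conflicting explicit -mcmodel is left out.

```c
#include <stdbool.h>

enum cmodel { CMODEL_SMALL, CMODEL_MEDIUM, CMODEL_LARGE };

/* Hypothetical snapshot of the option state the macro looks at.  */
struct toc_opts
{
  bool cmodel_explicit;        /* -mcmodel= given on the command line */
  enum cmodel cmodel;          /* its value when given */
  bool minimal_toc_explicit;   /* -mminimal-toc or -mno-minimal-toc given */
  bool minimal_toc;            /* TARGET_MINIMAL_TOC */
  bool pcrel;                  /* TARGET_PCREL */
  bool pcrel_explicit;         /* -mpcrel or -mno-pcrel given */
  bool pcrel_supported_by_os;  /* PCREL_SUPPORTED_BY_OS */
};

static enum cmodel
resolve_cmodel (struct toc_opts o)
{
  /* Default to the medium model unless the user chose one.  */
  enum cmodel model = o.cmodel_explicit ? o.cmodel : CMODEL_MEDIUM;

  if (o.minimal_toc_explicit)
    {
      /* pcrel is on, or will be enabled by default later.  */
      bool pcrel_in_effect
        = o.pcrel || (o.pcrel_supported_by_os && !o.pcrel_explicit);

      /* -mno-minimal-toc selects the small model only without pcrel.  */
      if (o.minimal_toc || !pcrel_in_effect)
        model = CMODEL_SMALL;
    }
  return model;
}
```

With -mno-minimal-toc and pcrel in effect the model stays medium, so
pcrel is not turned off; without pcrel the old behaviour (small model)
is kept.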

Bootstrapped and regression tested powerpc64le-linux.  OK?

gcc/
* config/rs6000/linux64.h (SUBSUBTARGET_OVERRIDE_OPTIONS): Don't
set -mcmodel=small for -mno-minimal-toc when pcrel.
libgcc/
* config/rs6000/t-linux: Document purpose of -mno-minimal-toc.

diff --git a/gcc/config/rs6000/linux64.h b/gcc/config/rs6000/linux64.h
index 2ded3301282..2de1097d3ec 100644
--- a/gcc/config/rs6000/linux64.h
+++ b/gcc/config/rs6000/linux64.h
@@ -132,20 +132,25 @@ extern int dot_symbols;
  if ((rs6000_isa_flags & OPTION_MASK_POWERPC64) == 0)  \
{   \
  rs6000_isa_flags |= OPTION_MASK_POWERPC64;\
- error ("%<-m64%> requires a PowerPC64 cpu");  \
+ error ("%<-m64%> requires a PowerPC64 cpu");  \
}   \
+ if (!global_options_set.x_rs6000_current_cmodel)  \
+   SET_CMODEL (CMODEL_MEDIUM); \
  if ((rs6000_isa_flags_explicit\
   & OPTION_MASK_MINIMAL_TOC) != 0) \
{   \
  if (global_options_set.x_rs6000_current_cmodel\
  && rs6000_current_cmodel != CMODEL_SMALL) \
error ("%<-mcmodel incompatible with other toc options%>"); \
- SET_CMODEL (CMODEL_SMALL);\
+ if (TARGET_MINIMAL_TOC\
+ || !(TARGET_PCREL \
+  || (PCREL_SUPPORTED_BY_OS\
+  && (rs6000_isa_flags_explicit\
+  & OPTION_MASK_PCREL) == 0))) \
+   SET_CMODEL (CMODEL_SMALL);  \
}   \
  else  \
{   \
- if (!global_options_set.x_rs6000_current_cmodel)  \
-   SET_CMODEL (CMODEL_MEDIUM); \
  if (rs6000_current_cmodel != CMODEL_SMALL)\
{   \
  if (!global_options_set.x_TARGET_NO_FP_IN_TOC) \
diff --git a/libgcc/config/rs6000/t-linux b/libgcc/config/rs6000/t-linux
index 4f6d4c4a4d2..ed821947b66 100644
--- a/libgcc/config/rs6000/t-linux
+++ b/libgcc/config/rs6000/t-linux
@@ -1,3 +1,8 @@
 SHLIB_MAPFILES += $(srcdir)/config/rs6000/libgcc-glibc.ver
 
-HOST_LIBGCC2_CFLAGS += -mlong-double-128 -mno-minimal-toc
+HOST_LIBGCC2_CFLAGS += -mlong-double-128
+
+# This is a way of selecting -mcmodel=small for ppc64, which gives
+# smaller and faster libgcc code.  Directly specifying -mcmodel=small
+# would need to take into account targets for which -mcmodel is invalid.
+HOST_LIBGCC2_CFLAGS += -mno-minimal-toc

-- 
Alan Modra
Australia Development Lab, IBM


Re: [PATCH v2] builtins: (not just) rs6000: Add builtins for fegetround, feclearexcept and feraiseexcept [PR94193]

2020-09-30 Thread Richard Biener
On Tue, 29 Sep 2020, Segher Boessenkool wrote:

> Hi Raoni,
> 
> Some of this isn't an rs6000 patch, but the subject says it is, so it
> might well not draw the attention it needs.
> 
> Adding some Cc:s.
> 
> On Fri, Sep 04, 2020 at 12:52:30PM -0300, Raoni Fassina Firmino wrote:
> > There is one pending question raised by Segher.  It is about adding
> > documentation; I am not sure if it is needed and if so, where it
> > should be. I will quote the relevant part of the conversation[2] from
> > the v1 thread for context:
> > 
> >   > > > +OPTAB_D (fegetround_optab, "fegetround$a")
> >   > > > +OPTAB_D (feclearexcept_optab, "feclearexcept$a")
> >   > > > +OPTAB_D (feraiseexcept_optab, "feraiseexcept$a")
> >   > >?
> >   > > Should those be documented somewhere?  (In gcc/doc/ somewhere).
> >   >
> >   > I am lost on that one. I took a look at the docs (I hope looking at the
> >   > online docs was good enough) and I didn't find a place where I feel it
> >   > sits well. In the PowerPC target-specific section (6.60.22 Basic
> >   > PowerPC Built-in Functions), I didn't find any mention of builtins that
> >   > are optimizations for the standard library functions, but we do have
> >   > many of these for Power.  Then, the generic section (6.59 Other
> >   > Built-in Functions Provided by GCC) mentions C99 functions that have
> >   > builtins, but it seems to mention only builtins that have a target
> >   > independent implementation, or at least it does not say that some
> >   > builtins may be implemented on only some targets.  And in this case
> >   > there is no implementation (for now) for any target other than
> >   > PowerPC.
> >   >
> >   > So, I don't know if or where this should be documented.
> 
> I don't see much about optabs in the docs either.  Add some text to
> optabs.def itself then?

All optabs are documented in doc/md.texi as 'instruction patterns'.

This is where new optabs need to be documented.

> > +(define_expand "feclearexceptsi"
> > +  [(use (match_operand:SI 1 "const_int_operand" "n"))
> > +   (set (match_operand:SI 0 "gpc_reg_operand")
> > +   (const_int 0))]
> > +  "TARGET_HARD_FLOAT"
> > +{
> > +  switch (INTVAL (operands[1]))
> > +{
> > +case 0x200:  /* FE_INEXACT */
> > +case 0x400:  /* FE_DIVBYZERO */
> > +case 0x800:  /* FE_UNDERFLOW */
> > +case 0x1000: /* FE_OVERFLOW */
> 
> Please write 0x0200 etc. instead?
> 
> > +;; int fegraiseexcept(int excepts)
> 
> (typo)
> 
> > +/* { dg-do run } */
> > +/* { dg-require-effective-target fenv_exceptions } */
> > +/* { dg-options "-lm -fno-builtin" } */
> 
> That -fno-builtin looks very strange...  Comment what it is for?
> 
> > +#define FAIL(v, e) printf("ERROR, __builtin_fegetround() returned %d," \
> > +  " not the expecected value %d\n", v, e);
> 
> (Typo, "expected")
> 
> The rs6000 part is okay for trunk (with those modifications), after the
> generic parts are approved.  Thanks!
> 
> 
> Segher
> 

-- 
Richard Biener 
SUSE Software Solutions Germany GmbH, Maxfeldstrasse 5, 90409 Nuernberg,
Germany; GF: Felix Imend


Re: [PATCH][GCC 10] Fix build failure with zstd version 1.2.0 or older.

2020-09-30 Thread Richard Biener via Gcc-patches
On Wed, Sep 30, 2020 at 5:56 AM Jim Wilson  wrote:
>
> This is the gcc-10 branch version of the patch on mainline.
>
> Extends the configure check for zstd.h to also verify the zstd version,
> since gcc requires features that only exist in 1.3.0 and newer.  Without
> this patch we get a build error for lto-compress.c when using an old zstd
> version.
>
> OK?

OK

> Jim
>
> Backported from master:
> 2020-09-29  Jim Wilson  
>
> gcc/
> PR bootstrap/97183
> * configure.ac (gcc_cv_header_zstd_h): Check ZSTD_VERSION_NUMBER.
> * configure: Regenerated.
> ---
>  gcc/configure| 11 ---
>  gcc/configure.ac |  7 ++-
>  2 files changed, 14 insertions(+), 4 deletions(-)
>
> diff --git a/gcc/configure b/gcc/configure
> index eb6061c1631..b4088d8fd1e 100755
> --- a/gcc/configure
> +++ b/gcc/configure
> @@ -10024,9 +10024,14 @@ $as_echo_n "checking for zstd.h... " >&6; }
>  if ${gcc_cv_header_zstd_h+:} false; then :
>$as_echo_n "(cached) " >&6
>  else
> -  cat confdefs.h - <<_ACEOF >conftest.$ac_ext
> +  # We require version 1.3.0 or later.  This is the first version that has
> +# ZSTD_getFrameContentSize.
> +cat confdefs.h - <<_ACEOF >conftest.$ac_ext
>  /* end confdefs.h.  */
>  #include <zstd.h>
> +#if ZSTD_VERSION_NUMBER < 10300
> +#error "need zstd 1.3.0 or better"
> +#endif
>  int
>  main ()
>  {
> @@ -19015,7 +19020,7 @@ else
>lt_dlunknown=0; lt_dlno_uscore=1; lt_dlneed_uscore=2
>lt_status=$lt_dlunknown
>cat > conftest.$ac_ext <<_LT_EOF
> -#line 19018 "configure"
> +#line 19023 "configure"
>  #include "confdefs.h"
>
>  #if HAVE_DLFCN_H
> @@ -19121,7 +19126,7 @@ else
>lt_dlunknown=0; lt_dlno_uscore=1; lt_dlneed_uscore=2
>lt_status=$lt_dlunknown
>cat > conftest.$ac_ext <<_LT_EOF
> -#line 19124 "configure"
> +#line 19129 "configure"
>  #include "confdefs.h"
>
>  #if HAVE_DLFCN_H
> diff --git a/gcc/configure.ac b/gcc/configure.ac
> index 715fcba0482..070b9c6c497 100644
> --- a/gcc/configure.ac
> +++ b/gcc/configure.ac
> @@ -1382,8 +1382,13 @@ LDFLAGS="$LDFLAGS $ZSTD_LDFLAGS"
>
>  AC_MSG_CHECKING(for zstd.h)
>  AC_CACHE_VAL(gcc_cv_header_zstd_h,
> +# We require version 1.3.0 or later.  This is the first version that has
> +# ZSTD_getFrameContentSize.
>  [AC_COMPILE_IFELSE([AC_LANG_PROGRAM(
> -[[#include <zstd.h>]])],
> +[[#include <zstd.h>
> +#if ZSTD_VERSION_NUMBER < 10300
> +#error "need zstd 1.3.0 or better"
> +#endif]])],
>[gcc_cv_header_zstd_h=yes],
>[gcc_cv_header_zstd_h=no])])
>  AC_MSG_RESULT($gcc_cv_header_zstd_h)
> --
> 2.17.1
>
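
For reference, the threshold 10300 in the check above follows from how
zstd.h encodes its version: major * 100 * 100 + minor * 100 + release.
A minimal sketch of that arithmetic, with hypothetical names so that it
does not need zstd.h itself:

```c
/* Same encoding as ZSTD_VERSION_NUMBER in zstd.h; the SKETCH_ name is
   made up for this example.  */
#define SKETCH_ZSTD_VERSION(major, minor, release) \
  ((major) * 100 * 100 + (minor) * 100 + (release))

/* Mirrors the "#if ZSTD_VERSION_NUMBER < 10300" test in the patch:
   returns nonzero when the version is 1.3.0 or newer.  */
static int
zstd_is_new_enough (unsigned version_number)
{
  return version_number >= SKETCH_ZSTD_VERSION (1, 3, 0);
}
```

So 1.2.0 encodes to 10200 and is rejected by configure, while 1.3.0
encodes to 10300 and passes.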


Re: [PATCH 6/6] ipa-cp: Separate and increase the large-unit parameter

2020-09-30 Thread Richard Biener via Gcc-patches
On Tue, Sep 29, 2020 at 9:31 PM Jan Hubicka  wrote:
>
> >
> > gcc/ChangeLog:
> >
> > 2020-09-07  Martin Jambor  
> >
> >   * params.opt (ipa-cp-large-unit-insns): New parameter.
> >   * ipa-cp.c (get_max_overall_size): Use the new parameter.
> OK,

Maybe the IPA CP large-unit should be a factor of the large-unit
param?  Thus, make the new param ipa-cp-large-unit-factor
instead so when people increase large-unit they also get "other"
large units increased accordingly?
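
Roughly what I have in mind, as a sketch only: the factor parameter
does not exist, the function names are made up, and a real param would
probably need a percentage or similar scale rather than a plain integer
multiplier.

```c
/* Hypothetical: derive the IPA-CP notion of "large unit" from the
   generic large-unit-insns param and a factor.  */
static long
ipa_cp_large_unit (long large_unit_insns, long ipa_cp_large_unit_factor)
{
  return large_unit_insns * ipa_cp_large_unit_factor;
}

/* Same clamp as at the start of get_max_overall_size: a unit smaller
   than the "large" threshold is treated as if it had that size.  */
static long
clamp_to_large_unit (long orig_overall_size, long large_unit)
{
  return orig_overall_size < large_unit ? large_unit : orig_overall_size;
}
```

Raising large-unit-insns would then scale the IPA-CP threshold along
with it, without the user having to touch a second param.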

> thanks!
> Honza
> > ---
> >  gcc/ipa-cp.c   | 2 +-
> >  gcc/params.opt | 4 
> >  2 files changed, 5 insertions(+), 1 deletion(-)
> >
> > diff --git a/gcc/ipa-cp.c b/gcc/ipa-cp.c
> > index 12acf24c553..2152f9e5876 100644
> > --- a/gcc/ipa-cp.c
> > +++ b/gcc/ipa-cp.c
> > @@ -3448,7 +3448,7 @@ static long
> >  get_max_overall_size (cgraph_node *node)
> >  {
> >long max_new_size = orig_overall_size;
> > -  long large_unit = opt_for_fn (node->decl, param_large_unit_insns);
> > +  long large_unit = opt_for_fn (node->decl, param_ipa_cp_large_unit_insns);
> >if (max_new_size < large_unit)
> >  max_new_size = large_unit;
> >int unit_growth = opt_for_fn (node->decl, param_ipa_cp_unit_growth);
> > diff --git a/gcc/params.opt b/gcc/params.opt
> > index acb59f17e45..9d177ab50ad 100644
> > --- a/gcc/params.opt
> > +++ b/gcc/params.opt
> > @@ -218,6 +218,10 @@ Percentage penalty functions containing a single call to another function will r
> >  Common Joined UInteger Var(param_ipa_cp_unit_growth) Init(10) Param Optimization
> >  How much can given compilation unit grow because of the interprocedural constant propagation (in percent).
> >
> > +-param=ipa-cp-large-unit-insns=
> > +Common Joined UInteger Var(param_ipa_cp_large_unit_insns) Optimization Init(16000) Param
> > +The size of translation unit that IPA-CP pass considers large.
> > +
> >  -param=ipa-cp-value-list-size=
> >  Common Joined UInteger Var(param_ipa_cp_value_list_size) Init(8) Param Optimization
> >  Maximum size of a list of values associated with each parameter for interprocedural constant propagation.
> > --
> > 2.28.0


Re: [SLP][VECT] Add check to fix 96837

2020-09-30 Thread Richard Biener
On Tue, 29 Sep 2020, Joel Hutton wrote:

>  Hi All,
> 
> The following patch adds a simple check to prevent slp stmts from vector 
> constructors being rearranged. vect_attempt_slp_rearrange_stmts tries to 
> rearrange to avoid a load permutation.
> 
> This fixes PR target/96837 https://gcc.gnu.org/bugzilla/show_bug.cgi?id=96837
> gcc/ChangeLog:

OK for trunk and branch(es)

Thanks,
Richard.

> 2020-09-29  Joel Hutton  
> 
> PR target/96837
> * tree-vect-slp.c (vect_analyze_slp): Do not call 
> vect_attempt_slp_rearrange_stmts for vector constructors.
> 
> gcc/testsuite/ChangeLog:
> 
> 2020-09-29  Joel Hutton  
> 
> PR target/96837
> * gcc.dg/vect/bb-slp-49.c: New test.

-- 
Richard Biener 
SUSE Software Solutions Germany GmbH, Maxfeldstrasse 5, 90409 Nuernberg,
Germany; GF: Felix Imend


Re: Add trailing dots to fortran io fnspecs to match signature

2020-09-30 Thread Richard Biener
On Tue, 29 Sep 2020, Jan Hubicka wrote:

> > On September 29, 2020 4:20:42 PM GMT+02:00, Jan Hubicka wrote:
> > >Hi,
> > >this patch is not needed but makes it possible to sanity check that
> > >fnspec match function signature. It turns out that there are quite few
> > >mistakes in that in trans-decl and one mistake here.
> > >Transfer_derived has additional parameters.
> > 
> > Hmm, omitting trailing dots was on purpose to make the string short (also 
> > consider varargs...).  You can still sanity check the prefix, no? 
> 
> Yes, I check the prefix and check that only permitted letters appear on
> given positions.  However it seems there is enough fuzz to justify one
> extra byte or two in the string (it is not very long anyway).
> 
> I only check it in gfc_build infrastructure and allow early ending
> strings otherwise.

Ah, OK.

Fair enough then, thus OK

Richard.
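
A sketch of the kind of check being discussed, assuming the layout
visible in the patch: one character for the return value followed by
one character per parameter.  The function name is hypothetical and the
set of permitted letters is an assumption for illustration, not the
list GCC actually accepts.

```c
#include <stdbool.h>
#include <stddef.h>
#include <string.h>

/* Returns true if SPEC has exactly one character for the return value
   plus one per argument, and every argument position holds a letter
   from the (assumed) permitted set.  */
static bool
fnspec_matches_signature (const char *spec, unsigned nargs)
{
  static const char permitted[] = ".rRwW";  /* assumed set */
  size_t len = strlen (spec);

  if (len != (size_t) nargs + 1)
    return false;
  for (size_t i = 1; i < len; i++)
    if (strchr (permitted, spec[i]) == NULL)
      return false;
  return true;
}
```

With this, ".wW" for a three-argument call is rejected while ".wW." is
accepted, which is the sort of off-by-one the trailing dots make
detectable.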

> I do not have very strong opinions here, but it seems it is easy to shift
> the string by one or miss a middle argument (especially for calls with
> 13 parameters), and that is caught by this check.
> 
> I was also considering teaching fortran to check that R/W is used only
> for pointer type parameters (but did not implement it)
> Honza
> > 
> > >Bootstrapped/regtested x86_64-linux. OK?
> > >Honza
> > >
> > >   * trans-io.c (gfc_build_io_library_fndecls): Add trailing "." for
> > >   fnspecs so they match the number of parameters.
> > >diff --git a/gcc/fortran/trans-io.c b/gcc/fortran/trans-io.c
> > >index 21bdd5ef0d8..363cca51ef9 100644
> > >--- a/gcc/fortran/trans-io.c
> > >+++ b/gcc/fortran/trans-io.c
> > >@@ -328,86 +328,86 @@ gfc_build_io_library_fndecls (void)
> > >dt_parm_type = build_pointer_type (st_parameter[IOPARM_ptype_dt].type);
> > > 
> > > iocall[IOCALL_X_INTEGER] = gfc_build_library_function_decl_with_spec (
> > >-  get_identifier (PREFIX("transfer_integer")), ".wW",
> > >+  get_identifier (PREFIX("transfer_integer")), ".wW.",
> > >   void_type_node, 3, dt_parm_type, pvoid_type_node, gfc_int4_type_node);
> > > 
> > >iocall[IOCALL_X_INTEGER_WRITE] =
> > >gfc_build_library_function_decl_with_spec (
> > >-  get_identifier (PREFIX("transfer_integer_write")), ".wR",
> > >+  get_identifier (PREFIX("transfer_integer_write")), ".wR.",
> > >   void_type_node, 3, dt_parm_type, pvoid_type_node, gfc_int4_type_node);
> > > 
> > > iocall[IOCALL_X_LOGICAL] = gfc_build_library_function_decl_with_spec (
> > >-  get_identifier (PREFIX("transfer_logical")), ".wW",
> > >+  get_identifier (PREFIX("transfer_logical")), ".wW.",
> > >   void_type_node, 3, dt_parm_type, pvoid_type_node, gfc_int4_type_node);
> > > 
> > >iocall[IOCALL_X_LOGICAL_WRITE] =
> > >gfc_build_library_function_decl_with_spec (
> > >-  get_identifier (PREFIX("transfer_logical_write")), ".wR",
> > >+  get_identifier (PREFIX("transfer_logical_write")), ".wR.",
> > >   void_type_node, 3, dt_parm_type, pvoid_type_node, gfc_int4_type_node);
> > > 
> > >iocall[IOCALL_X_CHARACTER] = gfc_build_library_function_decl_with_spec
> > >(
> > >-  get_identifier (PREFIX("transfer_character")), ".wW",
> > >+  get_identifier (PREFIX("transfer_character")), ".wW.",
> > >   void_type_node, 3, dt_parm_type, pvoid_type_node,
> > >gfc_charlen_type_node);
> > > 
> > >iocall[IOCALL_X_CHARACTER_WRITE] =
> > >gfc_build_library_function_decl_with_spec (
> > >-  get_identifier (PREFIX("transfer_character_write")), ".wR",
> > >+  get_identifier (PREFIX("transfer_character_write")), ".wR.",
> > >   void_type_node, 3, dt_parm_type, pvoid_type_node,
> > >gfc_charlen_type_node);
> > > 
> > >iocall[IOCALL_X_CHARACTER_WIDE] =
> > >gfc_build_library_function_decl_with_spec (
> > >-  get_identifier (PREFIX("transfer_character_wide")), ".wW",
> > >+  get_identifier (PREFIX("transfer_character_wide")), ".wW..",
> > >   void_type_node, 4, dt_parm_type, pvoid_type_node,
> > >   gfc_charlen_type_node, gfc_int4_type_node);
> > > 
> > >   iocall[IOCALL_X_CHARACTER_WIDE_WRITE] =
> > > gfc_build_library_function_decl_with_spec (
> > >-  get_identifier (PREFIX("transfer_character_wide_write")), ".wR",
> > >+  get_identifier (PREFIX("transfer_character_wide_write")), ".wR..",
> > >   void_type_node, 4, dt_parm_type, pvoid_type_node,
> > >   gfc_charlen_type_node, gfc_int4_type_node);
> > > 
> > >   iocall[IOCALL_X_REAL] = gfc_build_library_function_decl_with_spec (
> > >-  get_identifier (PREFIX("transfer_real")), ".wW",
> > >+  get_identifier (PREFIX("transfer_real")), ".wW.",
> > >   void_type_node, 3, dt_parm_type, pvoid_type_node, gfc_int4_type_node);
> > > 
> > >iocall[IOCALL_X_REAL_WRITE] = gfc_build_library_function_decl_with_spec
> > >(
> > >-  get_identifier (PREFIX("transfer_real_write")), ".wR",
> > >+  get_identifier (PREFIX("transfer_real_write")), ".wR.",
> > >   void_type_node, 3, dt_parm_type, pvoid_type_node, gfc_int4_type_node);
> > > 
> > > iocall[IOCALL_X_COMPLEX] = gfc_build_library_function_decl_with_spec (
> > >-  get_identifier (PREFIX("transfer_complex")), ".wW",
> > >+  get_identifier 

Re: [PATCH] assorted improvements for fold_truth_andor_1

2020-09-30 Thread Richard Biener via Gcc-patches
On Tue, Sep 29, 2020 at 3:07 PM Alexandre Oliva  wrote:
>
> On Sep 29, 2020, Richard Biener  wrote:
>
> > On Tue, Sep 29, 2020 at 9:23 AM Alexandre Oliva  wrote:
>
> >> On Sep 28, 2020, Richard Biener  wrote:
>
> > ifcombine should stop using fold*, yeah
>
> Wow, that's quite a lot of work for no expected improvement in codegen.
> I don't expect to be able to justify such an undertaking :-(
>
> > I also think it will not end up using the simplifications using loads.
>
> Yeah, ifcombine's bb_no_side_effects_p gives up on any gimple_vuse in
> the inner block.  That won't do when the whole point is to merge loads
> from memory.
>
> That seems excessive.  Since we rule out any memory-changing side
> effects, I suppose we could get away with checking for volatile operands
> there.  Then, adding just a little SSA_DEF chasing, I believe I could
> bring all of the fold_truth_andor_1 logic I've worked on into ifcombine
> without much difficulty, and then we could do away with at least that
> part of fold_truth_andor.

The current restrictions were for sure to make my life easier at start
when implementing the pass ;)  Note that you have to watch out
for short-circuited stmts that may trap or invoke undefined behavior
at runtime.

> > Specifically your patch seems to introduce splitting of loads
> > at alignment boundaries
>
> ... when there's another compare involving a load from either side of
> the crossed alignment boundary.  Even on arches that can do unaligned
> loads, the result is no worse, and if there are multiple fields crossing
> consecutive alignment boundaries, the codegen and performance difference
> can be pretty significant.

Ah, OK - I didn't look that closely.

> >> I *think* ifcombine could even be extended so as to reuse the
> >> separate-test logic I put in, by looking for non-immediate dominating
> >> outer conditions for the inner condition.  A further modified version of
> >> fold_truth_andor_1 could then be used to combine the separate tests.
>
> > I think the structure of ifcombine doesn't exactly match what
> > fold_truth_andor does
>
> How so?  AFAICT ifcombine_ifandif deals exactly with the (gimplified
> version of the) structure I described in the patch that started the
> thread:
>
>   (a.x1 EQNE b.x1)  ANDOR  (a.y1 EQNE b.y1)

Indeed.
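
For readers following along, the shape above corresponds to something
like the following at the source level.  This is an illustration of the
kind of merge under discussion, with made-up names, and not what either
fold_truth_andor_1 or ifcombine emits: two adjacent one-byte field
compares joined by && can be done as one two-byte compare.

```c
#include <stdbool.h>
#include <stdint.h>
#include <string.h>

struct pair
{
  uint8_t x1;
  uint8_t y1;
};

/* The merge below relies on the two fields being adjacent with no
   padding.  */
_Static_assert (sizeof (struct pair) == 2, "fields must be adjacent");

/* (a.x1 == b.x1) && (a.y1 == b.y1), written field by field.  */
static bool
equal_fieldwise (const struct pair *a, const struct pair *b)
{
  return a->x1 == b->x1 && a->y1 == b->y1;
}

/* The same test done with one wider load per operand.  */
static bool
equal_merged (const struct pair *a, const struct pair *b)
{
  uint16_t wa, wb;
  memcpy (&wa, a, sizeof wa);
  memcpy (&wb, b, sizeof wb);
  return wa == wb;
}
```

Both functions give the same answer for every input; the merged form
needs the loads to be free of side effects, which is where the volatile
and trapping concerns mentioned above come in.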

>
> --
> Alexandre Oliva, happy hacker
> https://FSFLA.org/blogs/lxo/
> Free Software Activist
> GNU Toolchain Engineer