[PATCH] Enable mask operation for 128/256-bit vector VCOND_EXPR under avx512f (PR92686)

2019-12-03 Thread Hongtao Liu
Hi:
  Currently for VCOND_EXPR, integer mask operation is only available
for 512-bit vector, but since mask register is related to isa not
vector size, under avx512f we can also have 128/256-bit vector
condition move. My local tests show there's no boost frequency penalty
for using integer mask register, and also it will reduce sse register
pressure and save 1 instruction(vpblendvb).

  Bootstrap is ok, regression test on i386/x86_64 backend is ok.
  Ok for trunk.

Changelog
gcc/
PR target/92686
* config/i386/sse.md
(*_cmp3,
*_cmp3,
*_ucmp3,
*_ucmp3): New.
* config/i386/i386.c (ix86_print_operand): New operand substitution.
* config/i386/i386-expand.c (ix86_valid_mask_cmp_mode):
New function.
(ix86_expand_sse_cmp): Relax condition for integer mask from
512-bit vector to all 128/256/512-bit vector. Delete code gen
for avx512f compare patterns since we have generic pattern now.
(ix86_expand_sse_movcc): Adjust condition and codegen for
maskcmp.
(ix86_expand_int_sse_cmp): Don't canonicalize the comparison
when corresponding vector compare is available.

gcc/testsuite/
* gcc.target/i386/pr92686.inc: New file.
* gcc.target/i386/avx512bw-pr92686-vpcmp-1.c: New test.
* gcc.target/i386/avx512bw-pr92686-vpcmp-2.c: Ditto.
* gcc.target/i386/avx512vl-pr92686-vpcmp-1.c: Ditto.
* gcc.target/i386/avx512vl-pr92686-vpcmp-2.c: Ditto.
* gcc.target/i386/avx512bw-pr92686-movcc-1.c: Ditto.
* gcc.target/i386/avx512bw-pr92686-movcc-2.c: Ditto.
* gcc.target/i386/avx512vl-pr92686-movcc-1.c: Ditto.
* gcc.target/i386/avx512vl-pr92686-movcc-2.c: Ditto.
* gcc.target/i386/avx512vl-pr88547-1.c: Adjust testcase.
* gcc.target/i386/pr88547-1.c: Ditto.

-- 
BR,
Hongtao
From 103d31db47dacc5bba9c85389c61f69293f83695 Mon Sep 17 00:00:00 2001
From: liuhongt 
Date: Thu, 28 Nov 2019 14:12:31 +0800
Subject: [PATCH] Enable mask movement for VCOND_EXPR under avx512f for
 128/256-bit vector when integer mask is available.

Changelog
gcc/
	PR target/92686
	* config/i386/sse.md
	(*_cmp3,
	*_cmp3,
	*_ucmp3,
	*_ucmp3): New.
	* config/i386/i386.c (ix86_print_operand): New operand substitution.
	* config/i386/i386-expand.c (ix86_valid_mask_cmp_mode):
	New function.
	(ix86_expand_sse_cmp): Relax condition for integer mask from
	512-bit vector to all 128/256/512-bit vector. Delete code gen
	for avx512f compare patterns since we have generic pattern now.
	(ix86_expand_sse_movcc): Adjust condition and codegen for
	maskcmp.
	(ix86_expand_int_sse_cmp): Don't canonicalize the comparison
	when corresponding vector compare is available.

gcc/testsuite/
	* gcc.target/i386/pr92686.inc: New file.
	* gcc.target/i386/avx512bw-pr92686-vpcmp-1.c: New test.
	* gcc.target/i386/avx512bw-pr92686-vpcmp-2.c: Ditto.
	* gcc.target/i386/avx512vl-pr92686-vpcmp-1.c: Ditto.
	* gcc.target/i386/avx512vl-pr92686-vpcmp-2.c: Ditto.
	* gcc.target/i386/avx512bw-pr92686-movcc-1.c: Ditto.
	* gcc.target/i386/avx512bw-pr92686-movcc-2.c: Ditto.
	* gcc.target/i386/avx512vl-pr92686-movcc-1.c: Ditto.
	* gcc.target/i386/avx512vl-pr92686-movcc-2.c: Ditto.
	* gcc.target/i386/avx512vl-pr88547-1.c: Adjust testcase.
	* gcc.target/i386/pr88547-1.c: Ditto.
---
 gcc/config/i386/i386-expand.c | 172 ++--
 gcc/config/i386/i386.c|  32 +++
 gcc/config/i386/sse.md|  48 +
 .../i386/avx512bw-pr92686-movcc-1.c   | 133 
 .../i386/avx512bw-pr92686-movcc-2.c   | 102 ++
 .../i386/avx512bw-pr92686-vpcmp-1.c   | 112 +++
 .../i386/avx512bw-pr92686-vpcmp-2.c   |  90 +
 .../gcc.target/i386/avx512vl-pr88547-1.c  |   8 +-
 .../i386/avx512vl-pr92686-movcc-1.c   | 133 
 .../i386/avx512vl-pr92686-movcc-2.c   | 102 ++
 .../i386/avx512vl-pr92686-vpcmp-1.c   | 112 +++
 .../i386/avx512vl-pr92686-vpcmp-2.c   |  91 +
 gcc/testsuite/gcc.target/i386/pr88547-1.c |  16 +-
 gcc/testsuite/gcc.target/i386/pr92686.inc | 189 ++
 14 files changed, 1212 insertions(+), 128 deletions(-)
 create mode 100644 gcc/testsuite/gcc.target/i386/avx512bw-pr92686-movcc-1.c
 create mode 100644 gcc/testsuite/gcc.target/i386/avx512bw-pr92686-movcc-2.c
 create mode 100644 gcc/testsuite/gcc.target/i386/avx512bw-pr92686-vpcmp-1.c
 create mode 100644 gcc/testsuite/gcc.target/i386/avx512bw-pr92686-vpcmp-2.c
 create mode 100644 gcc/testsuite/gcc.target/i386/avx512vl-pr92686-movcc-1.c
 create mode 100644 gcc/testsuite/gcc.target/i386/avx512vl-pr92686-movcc-2.c
 create mode 100644 gcc/testsuite/gcc.target/i386/avx512vl-pr92686-vpcmp-1.c
 create mode 100644 gcc/testsuite/gcc.target/i386/avx512vl-pr92686-vpcmp-2.c
 create mode 100644 gcc/testsuite/gcc.target/i386/pr92686.inc

diff --git a/gcc/config/i386/i386-expand.c 

[ PATCH ] [ C++ ] Implementing P0767 - deprecate POD

2019-12-03 Thread JeanHeyd Meneide
This patch implements deprecate POD for the C++ Standard Library,
bringing libstdc++ that much closer to 2020 conformance !

Hilariously, a small bug in the [[deprecated]] warning message was
found while implementing this patch, which drove me a bit insane for a
good 10 minutes until I realized what was going on:
https://godbolt.org/z/WwBNUx

Either way, here you go! Someone will probably have to fix the
template-class [[deprecated("foo")]] bug at some point too.

2019-12-03  JeanHeyd "ThePhD" Meneide  

* include/bits/c++config: Add new _GLIBCXX20_DEPRECATED macro.
* include/std/type_traits: Deprecate is_pod with message.
* testuite/20_util/is_pod/deprecated-2a.cc (new): test deprecation
diff --git a/libstdc++-v3/include/bits/c++config 
b/libstdc++-v3/include/bits/c++config
index 7ccfc5f199d..5876b0b977b 100644
--- a/libstdc++-v3/include/bits/c++config
+++ b/libstdc++-v3/include/bits/c++config
@@ -78,6 +78,7 @@
 //   _GLIBCXX_USE_DEPRECATED
 //   _GLIBCXX_DEPRECATED
 //   _GLIBCXX17_DEPRECATED
+//   _GLIBCXX20_DEPRECATED( STRINGS... )
 #ifndef _GLIBCXX_USE_DEPRECATED
 # define _GLIBCXX_USE_DEPRECATED 1
 #endif
@@ -94,6 +95,12 @@
 # define _GLIBCXX17_DEPRECATED
 #endif
 
+#if defined(__DEPRECATED) && (__cplusplus > 201703L)
+# define _GLIBCXX20_DEPRECATED(...) [[deprecated(__VA_ARGS__)]]
+#else
+# define _GLIBCXX20_DEPRECATED(...) 
+#endif
+
 // Macros for ABI tag attributes.
 #ifndef _GLIBCXX_ABI_TAG_CXX11
 # define _GLIBCXX_ABI_TAG_CXX11 __attribute ((__abi_tag__ ("cxx11")))
diff --git a/libstdc++-v3/include/std/type_traits 
b/libstdc++-v3/include/std/type_traits
index 8e787a994c3..91269d1bd02 100644
--- a/libstdc++-v3/include/std/type_traits
+++ b/libstdc++-v3/include/std/type_traits
@@ -685,10 +685,12 @@ _GLIBCXX_BEGIN_NAMESPACE_VERSION
"template argument must be a complete class or an unbounded array");
 };
 
-  /// is_pod
+  /// is_pod (deprecated C++2a)
   // Could use is_standard_layout && is_trivial instead of the builtin.
   template
-struct is_pod
+struct
+_GLIBCXX20_DEPRECATED("is_pod is deprecated in C++20: use 
is_standard_layout && is_trivial instead") 
+is_pod
 : public integral_constant
 {
   static_assert(std::__is_complete_or_unbounded(__type_identity<_Tp>{}),
@@ -3073,6 +3075,7 @@ template 
 template 
   inline constexpr bool is_standard_layout_v = is_standard_layout<_Tp>::value;
 template 
+  _GLIBCXX20_DEPRECATED("is_pod is deprecated in C++20: use 
is_standard_layout && is_trivial instead")
   inline constexpr bool is_pod_v = is_pod<_Tp>::value;
 template 
   inline constexpr bool is_literal_type_v = is_literal_type<_Tp>::value;
diff --git a/libstdc++-v3/testsuite/20_util/is_pod/deprecated-2a.cc 
b/libstdc++-v3/testsuite/20_util/is_pod/deprecated-2a.cc
new file mode 100644
index 000..9782c3f551d
--- /dev/null
+++ b/libstdc++-v3/testsuite/20_util/is_pod/deprecated-2a.cc
@@ -0,0 +1,25 @@
+// Copyright (C) 2010-2019 Free Software Foundation, Inc.
+//
+// This file is part of the GNU ISO C++ Library.  This library is free
+// software; you can redistribute it and/or modify it under the
+// terms of the GNU General Public License as published by the
+// Free Software Foundation; either version 3, or (at your option)
+// any later version.
+//
+// This library is distributed in the hope that it will be useful,
+// but WITHOUT ANY WARRANTY; without even the implied warranty of
+// MERCHANTABILITY or FITNESS FOR A PARTICULAR PURPOSE.  See the
+// GNU General Public License for more details.
+//
+// You should have received a copy of the GNU General Public License along
+// with this library; see the file COPYING3.  If not see
+// .
+
+// { dg-do compile { target c++2a } }
+
+// { dg-prune-output "declared here" }
+
+#include 
+
+static_assert(std::is_pod::value); // { dg-warning "is deprecated" }
+static_assert(std::is_pod_v); // { dg-warning "is deprecated" }


[AMDGCN] Use fixed registers for queue ptr sgpr pair

2019-12-03 Thread Julian Brown
Hi,

When running tests on a slightly-revised version of the
worker-partitioning patch series previously posted starting here:

https://gcc.gnu.org/ml/gcc-patches/2019-11/msg01475.html

I discovered that Kwok's patch to optimise GCN register usage
(r278301) was causing one of the new tests to fail:

https://gcc.gnu.org/ml/gcc-patches/2019-11/msg01210.html

The patch marks the s6/s7 pair as unfixed, but that (as the "queue
pointer") is needed for some kinds of address-space conversion --
which may be synthesized either in normal or kernel entry-point
functions by the worker-partitioning support (gcn_addr_space_convert).
So, it's not safe to leave those registers for general use.

The solution seems to be just to fix the s6/s7 pair. I think I can
self-approve this as a bug fix. Any comments, though? (Andrew, Kwok?)

Tested with offloading to AMD GCN (alongside the worker-partitioning
patches).

Thanks,

Julian

ChangeLog

gcc/
* config/gcn/gcn.h (FIXED_REGISTERS): Make s6/s7 fixed registers.
commit a4e9b17ba2bcdc3ab206fa4424cb55bc320fd092
Author: Julian Brown 
Date:   Mon Dec 2 16:53:50 2019 -0800

Use fixed registers for queue ptr sgpr pair

gcc/
* config/gcn/gcn.h (FIXED_REGISTERS): Make s6/s7 fixed registers.

diff --git a/gcc/config/gcn/gcn.h b/gcc/config/gcn/gcn.h
index e60b43122d5..bdf7188b5ff 100644
--- a/gcc/config/gcn/gcn.h
+++ b/gcc/config/gcn/gcn.h
@@ -160,7 +160,7 @@
 
 #define FIXED_REGISTERS {			\
 /* Scalars.  */\
-1, 1, 0, 0, 1, 1, 0, 0, 1, 1,		\
+1, 1, 0, 0, 1, 1, 1, 1, 1, 1,		\
 /*		fpsplr.  */		\
 1, 1, 0, 0, 0, 0, 1, 1, 0, 0,		\
 /*  exec_save, cc_save */			\


[PATCH] libstdc++: Implement spaceship for std::pair (P1614R2)

2019-12-03 Thread Jonathan Wakely

This defines operator<=> as a non-member function template and does not
alter operator==. This contradicts the changes made by P1614R2, which
specify both as hidden friends, but that specification of operator<=> is
broken and the subject of a soon-to-be-published LWG issue.

* include/bits/stl_pair.h [__cpp_lib_three_way_comparison]
(operator<=>): Define for C++20.
* libsupc++/compare (__cmp2way_res_t): Rename to __cmp3way_res_t,
move into __detail namespace. Do not turn argument types into lvalues.
(__cmp3way_helper): Rename to __cmp3way_res_impl, move into __detail
namespace. Constrain with concepts instead of using void_t.
(compare_three_way_result): Adjust name of base class.
(compare_three_way_result_t): Use __cmp3way_res_impl directly.
(__detail::__3way_cmp_with): Add workaround for PR 91073.
(compare_three_way): Use workaround.
(__detail::__synth3way, __detail::__synth3way_t): Define new helpers
implementing synth-three-way and synth-three-way-result semantics.
* testsuite/20_util/pair/comparison_operators/constexpr_c++20.cc: New
test.

Tested powerpc64le-linux, committed to trunk.

I plan to do more of P1614R2 tomorrow, starting with std::array (which
is also broken in the working draft).


commit b763e0d20380f4ef6b6cd41362128e6b78fe5b26
Author: Jonathan Wakely 
Date:   Tue Dec 3 14:57:00 2019 +

libstdc++: Implement spaceship for std::pair (P1614R2)

This defines operator<=> as a non-member function template and does not
alter operator==. This contradicts the changes made by P1614R2, which
specify both as hidden friends, but that specification of operator<=> is
broken and the subject of a soon-to-be-published LWG issue.

* include/bits/stl_pair.h [__cpp_lib_three_way_comparison]
(operator<=>): Define for C++20.
* libsupc++/compare (__cmp2way_res_t): Rename to __cmp3way_res_t,
move into __detail namespace. Do not turn argument types into 
lvalues.
(__cmp3way_helper): Rename to __cmp3way_res_impl, move into __detail
namespace. Constrain with concepts instead of using void_t.
(compare_three_way_result): Adjust name of base class.
(compare_three_way_result_t): Use __cmp3way_res_impl directly.
(__detail::__3way_cmp_with): Add workaround for PR 91073.
(compare_three_way): Use workaround.
(__detail::__synth3way, __detail::__synth3way_t): Define new helpers
implementing synth-three-way and synth-three-way-result semantics.
* testsuite/20_util/pair/comparison_operators/constexpr_c++20.cc: 
New
test.

diff --git a/libstdc++-v3/include/bits/stl_pair.h 
b/libstdc++-v3/include/bits/stl_pair.h
index 1e0e4fe892a..02b1f0ac922 100644
--- a/libstdc++-v3/include/bits/stl_pair.h
+++ b/libstdc++-v3/include/bits/stl_pair.h
@@ -59,7 +59,10 @@
 #include  // for std::move / std::forward, and std::swap
 
 #if __cplusplus >= 201103L
-#include  // for std::__decay_and_strip too
+# include  // for std::__decay_and_strip, std::is_reference_v
+#endif
+#if __cplusplus > 201703L
+# include 
 #endif
 
 namespace std _GLIBCXX_VISIBILITY(default)
@@ -447,7 +450,7 @@ _GLIBCXX_BEGIN_NAMESPACE_VERSION
_GLIBCXX20_CONSTEXPR
 pair(tuple<_Args1...>&, tuple<_Args2...>&,
  _Index_tuple<_Indexes1...>, _Index_tuple<_Indexes2...>);
-#endif
+#endif // C++11
 };
 
   /// @relates pair @{
@@ -462,6 +465,17 @@ _GLIBCXX_BEGIN_NAMESPACE_VERSION
 operator==(const pair<_T1, _T2>& __x, const pair<_T1, _T2>& __y)
 { return __x.first == __y.first && __x.second == __y.second; }
 
+#if __cpp_lib_three_way_comparison && __cpp_lib_concepts
+  template
+constexpr common_comparison_category_t<__detail::__synth3way_t<_T1>,
+  __detail::__synth3way_t<_T2>>
+operator<=>(const pair<_T1, _T2>& __x, const pair<_T1, _T2>& __y)
+{
+  if (auto __c = __detail::__synth3way(__x.first, __y.first); __c != 0)
+   return __c;
+  return __detail::__synth3way(__x.second, __y.second);
+}
+#else
   /** Defines a lexicographical order for pairs.
*
* For two pairs of the same type, `P` is ordered before `Q` if
@@ -498,6 +512,7 @@ _GLIBCXX_BEGIN_NAMESPACE_VERSION
 inline _GLIBCXX_CONSTEXPR bool
 operator>=(const pair<_T1, _T2>& __x, const pair<_T1, _T2>& __y)
 { return !(__x < __y); }
+#endif // !(three_way_comparison && concepts)
 
 #if __cplusplus >= 201103L
   /** Swap overload for pairs. Calls std::pair::swap().
diff --git a/libstdc++-v3/libsupc++/compare b/libstdc++-v3/libsupc++/compare
index 289145dea56..412ec6861d3 100644
--- a/libstdc++-v3/libsupc++/compare
+++ b/libstdc++-v3/libsupc++/compare
@@ -509,34 +509,41 @@ namespace std
{ __t <=> __u } -> __detail::__compares_as<_Cat>;
{ __u <=> __t } -> __detail::__compares_as<_Cat>;

[PATCH] libstdc++: Fix Doxygen markup error

2019-12-03 Thread Jonathan Wakely

* include/bits/stl_pair.h (pair): Remove stray Doxygen closing marker.

I added this stray marker a few months ago by mistake.

Committed to trunk.

commit ba39ad7d7e29e1459ad1d816dc84753ef7503f51
Author: Jonathan Wakely 
Date:   Tue Dec 3 11:50:31 2019 +

libstdc++: Fix Doxygen markup error

* include/bits/stl_pair.h (pair): Remove stray Doxygen closing 
marker.

diff --git a/libstdc++-v3/include/bits/stl_pair.h 
b/libstdc++-v3/include/bits/stl_pair.h
index f7ad1696545..1e0e4fe892a 100644
--- a/libstdc++-v3/include/bits/stl_pair.h
+++ b/libstdc++-v3/include/bits/stl_pair.h
@@ -272,7 +272,6 @@ _GLIBCXX_BEGIN_NAMESPACE_VERSION
   explicit constexpr pair(const _T1& __a, const _T2& __b)
   : first(__a), second(__b) { }
 #endif
-  //@}
 
 #if __cplusplus < 201103L
   /// There is also a templated constructor to convert from other pairs.


Re: introduce -fcallgraph-info option

2019-12-03 Thread Alexandre Oliva
On Nov 14, 2019, Alexandre Oliva  wrote:

> In order to address this, I propose we add an internal option (not for
> the driver), -dumpbase-ext, that names the extension to be discarded
> from dumpbase to form aux output names.

Here's a WIP patch that implements much of the desired semantics.

I'm still struggling a bit with -gdwarf-split and -save-temps; -dumpbase
and multiple inputs, -dumpdir as a prefix, and -flto + -dump*.

-gdwarf-split uses %b to strip debug info into the .dwo file, so it
lands in the same location as the .o, rather than in a named -dumpdir as
specified in the .o debug info skeleton.  I'm thinking of arranging for
-dump* flags to affect %b and %B, just like -save-temps does.  I've
reviewed all uses of %b and %B, and it looks like this would enable us
to fix the .dwo naming mismatch without significant complication.

Which brings us to the next issue.  This would cause -dumpdir to
override the -save-temps location.  This is arguably an improvement.  It
might conflict with -save-temps=cwd, however.

I'm considering rejecting command lines that specify both an explicit
-dumpdir and -save-temps=cwd, and in the absence of an explicit
-dumpdir, arranging for -save-temps=cwd or -save-temps=obj to override
what would otherwise be the default -dumpdir.

Or, for the sake of simplifying and bringing more sanity to the logic of
naming extra output files, we could just discontinue -save-temps=*, and
require -dumpdir ./ along with plain -save-temps to get the effects of
-save-temps=cwd.


When compiling multiple inputs with a single -dumpbase, the current
implementation arranges for each compilation to take an adjusted
-dumpbase appending - to the given dumpbase, minus extension.  An
alternative would be to reject such compilations, just as we reject
multiple compilations with a single object file named as output.  That
feels excessive for -dumpbase, however.  OTOH, adjusting -dumpbase only
when there are multiple inputs causes different behavior comparing:

  gcc -c foo.c -dumpbase foobar && gcc -c bar.c -dumpbase foobar

and

  gcc -c foo.c bar.c -dumpbase foobar

The latter will name outputs after foobar-foo and foobar-bar,
respectively, whereas the former will overwrite outputs named foobar
when compiling bar.c.  Under the proposal to modify %b according to
-dump*, even object files would be named after an explicit -dumpbase,
when -o is not explicitly specified.


Yet another thing I'm not so sure about is -dumpdir as a prefix, e.g.,
in cases we're compiling multiple files and then linking them together,
say 'gcc foo.c bar.c -o foobar', the proposal was to name dumps of the
compilations after foobar-foo and foobar-bar, respectively.

If we use -dumpdir as a prefix to dump names, as we historically have,
if it doesn't end with a slash (or any dir separator) then it could be
used to specify the prefix for multiple outputs, as in the above.  So
gcc -dumpdir foobar- foo.c bar.c -o foobar *could* be equivalent to the
above.  I.e., an executable output name would affect the -dumpdir, but
not the -dumpbase passed to the compiler, whereas -dumpbase would be
derived from an asm or obj output or from input.

In the end, they're pasted together one way or the other, the difference
is the ability to override one or the other.  E.g.,

  gcc -dumpbase foobar foo.c bar.c -c

could then be rejected, just as -o foo+bar.o would be, or foobar could
be appended to the implicit -dumpdir and then override -dumpbase to
foo.c or bar.c in each compilation, to get foobar-foo.o and foobar-bar.o
outputs, getting the same as:

  gcc -dumpdir foobar- foo.c bar.c -c

and then

  gcc -dumpdir temp/foobar- foo.c bar.c -o foo+bar -save-temps

would still create and preserve .o (and .i and .s) named after
foobar-foo and foobar-bar within temp, rather than foo+bar-foo and
foo+bar-bar.

Now, the implementation in the patch below places the link output name,
implicit or not, in the default -dumpbase rather than -dumpdir, so the
last command above would create temp/foobar-foo+bar-foo.o and
temp/foobar-foo+bar-bar.o.  It's this ability to override the
executable-name prefix we're adding to outputs separately from -dumpbase
that I find somewhat appealing, in that it enables even the effect of
such prefixing to be canceled out with e.g. -dumpdir ./

Any thoughts for or against (or requests for clarification on :-) any of
the above proposed changes?


As for -flto + -dump*, the problem is that lto-wrapper doesn't take any
-dump* flags into account, it just overrides them all as if none had
been specified, which is undesirable.  That's fixable, it's just that
I haven't got to it yet.

I haven't brought in the design/documentation into a .texi file yet.
Please refer to posts upthread for the proposal.


Here's the WIP patch, FTR:

---
 gcc/ada/gcc-interface/lang-specs.h |7 +
 gcc/ada/switch.adb |4 -
 gcc/common.opt |   19 ++-
 gcc/dwarf2out.c|3 -
 gcc/fortran/options.c 

[PATCH] avoid invoking assignment on uninitialized objects (PR 92761, 92762)

2019-12-03 Thread Martin Sebor

After allocating a new chunk of memory hash_table::expand() copy-
assigns elements from the current array over the newly allocated
elements without constructing them.

Similarly, after destroying existing elements, hash_table::
empty_slow() assigns a new value to them.  This bug was
introduced in r249234 when trying to deal with -Wclass-memaccess
instances when the warning was first added.

Neither of these is a problem for PODs but both cause trouble when
the hash_table contains elements of a type with a user-defined copy
assignment operator.  There are at least two such instances in GCC
already and a third is under review.

The attached patch avoids this by using placement new to construct
new elements in uninitialized storage and restoring the original
call to memset in hash_table::empty_slow(), analogously to what
was done in r272893 for PR90923,

Longer term, to make these templates safely and efficiently usable
with non-POD types with user-defined copy ctors and copy assignment
operators I think these classes will probably need to be enhanced
to make use of "assign" and "move" traits functions to efficiently
assign and move objects.

Martin
PR middle-end/92761 - hash_table::expand invokes assignment on invalid objects
PR middle-end/92762 - hash_table::empty_slow invokes assignment on invalid objects
gcc/ChangeLog:

	PR middle-end/92761
	PR middle-end/92762
	* hash-map-tests.c (test_map_of_type_with_ctor_and_dtor): Tighten
	up tests.
	* hash-table.h (hash_table::expand): Use placement new to copy
	construct objects in uninitialized storage.
	(hash_table::empty_slow): Avoid invoking copy assignment on
	uninitialized objects.

diff --git a/gcc/hash-map-tests.c b/gcc/hash-map-tests.c
index f3b216be07c..a42eac21ab3 100644
--- a/gcc/hash-map-tests.c
+++ b/gcc/hash-map-tests.c
@@ -117,23 +117,26 @@ public:
 ++ndefault;
   }
 
-  hash_map_test_val_t (const hash_map_test_val_t &)
+  hash_map_test_val_t (const hash_map_test_val_t )
 : ptr ()
   {
 ++ncopy;
+gcc_assert (rhs.ptr == );
   }
 
-  hash_map_test_val_t& operator= (const hash_map_test_val_t &)
-{
- ++nassign;
- return *this;
-}
+  hash_map_test_val_t& operator= (const hash_map_test_val_t )
+  {
+++nassign;
+gcc_assert (ptr == );
+gcc_assert (rhs.ptr == );
+return *this;
+  }
 
   ~hash_map_test_val_t ()
-{
- gcc_assert (ptr == );
- ++ndtor;
-}
+  {
+gcc_assert (ptr == );
+++ndtor;
+  }
 
   void *ptr;
 } val_t;
@@ -184,7 +187,6 @@ test_map_of_type_with_ctor_and_dtor ()
 ASSERT_TRUE (nassign == val_t::nassign);
 
 ASSERT_TRUE ( != pv2);
-ASSERT_TRUE (pv2->ptr == >ptr);
   }
 
   ASSERT_TRUE (val_t::ndefault + val_t::ncopy == val_t::ndtor);
@@ -207,7 +209,6 @@ test_map_of_type_with_ctor_and_dtor ()
 ASSERT_TRUE (nassign + 1 == val_t::nassign);
 
 ASSERT_TRUE ( != pv2);
-ASSERT_TRUE (pv2->ptr == >ptr);
   }
 
   ASSERT_TRUE (val_t::ndefault + val_t::ncopy == val_t::ndtor);
diff --git a/gcc/hash-table.h b/gcc/hash-table.h
index ba5d64fb7a3..26bac624a08 100644
--- a/gcc/hash-table.h
+++ b/gcc/hash-table.h
@@ -818,8 +818,7 @@ hash_table::expand ()
   if (!is_empty (x) && !is_deleted (x))
 {
   value_type *q = find_empty_slot_for_expand (Descriptor::hash (x));
-
-  *q = x;
+	  new ((void*) q) value_type (x);
 }
 
   p++;
@@ -869,14 +868,8 @@ hash_table::empty_slow ()
   m_size_prime_index = nindex;
 }
   else
-{
-#ifndef BROKEN_VALUE_INITIALIZATION
-  for ( ; size; ++entries, --size)
-	*entries = value_type ();
-#else
-  memset (entries, 0, size * sizeof (value_type));
-#endif
-}
+memset ((void *) entries, 0, size * sizeof (value_type));
+
   m_n_deleted = 0;
   m_n_elements = 0;
 }


[PATCH] Freestanding proposal P0829

2019-12-03 Thread Paul M. Bendixen
Hello
I've made an implementation of P0829 and tested it with gcc for avr.
I've included the patch and I believe it might need some modification.
Should I rather just maintain my own fork until P0829 is a bit
further? (It is currently being split into smaller proposals).
I have not included further tests, nor actually run the tests since
everything is in header files is only includes of already existing
libstdc++ code.

Best regards

Paul M. Bendixen
diff --git a/libstdc++-v3/include/Makefile.in b/libstdc++-v3/include/Makefile.in
index bc475c6dd90..3feed5b3bff 100644
--- a/libstdc++-v3/include/Makefile.in
+++ b/libstdc++-v3/include/Makefile.in
@@ -1886,18 +1886,62 @@ install-freestanding-headers:
 	$(mkinstalldirs) $(DESTDIR)${gxx_include_dir}/bits
 	for file in c++0x_warning.h atomic_base.h concept_check.h move.h; do \
 	  $(INSTALL_DATA) ${glibcxx_srcdir}/include/bits/$${file} $(DESTDIR)${gxx_include_dir}/bits; done
+	# P0829 headers
+	for file in algorithmfwd.h alloc_traits.h boost_concept_check.h char_traits.h \
+  concept_check.h cpp_type_traits.h functexcept.h functional_hash.h \
+  hash_byte.h invoke.h memoryfwd.h parse_numbers.h postypes.h predefined_ops.h \
+	  ptr_traits.h random.h random.tcc range_access.h refwrap.h std_abs.h stl_algo.h \
+	  stl_algobase.h stl_construct.h stl_function.h stl_heap.h stl_iterator.h \
+	  stl_iterator_base_funcs.h  stl_iterator_base_types.h \
+	  stl_numeric.h stl_pair.h stl_relops.h stl_uninitialized.h unique_ptr.h \
+  uniform_int_dist.h uses_allocator.h ; do \
+	  $(INSTALL_DATA) ${glibcxx_srcdir}/include/bits/$${file} $(DESTDIR)${gxx_include_dir}/bits; done
 	$(mkinstalldirs) $(DESTDIR)${host_installdir}
 	for file in ${host_srcdir}/os_defines.h ${host_builddir}/c++config.h \
 	  ${glibcxx_srcdir}/$(ABI_TWEAKS_SRCDIR)/cxxabi_tweaks.h \
+	  ${glibcxx_srcdir}/$(ERROR_CONSTANTS_SRCDIR)/error_constants.h \
 	  ${glibcxx_srcdir}/$(CPU_DEFINES_SRCDIR)/cpu_defines.h; do \
 	  $(INSTALL_DATA) $${file} $(DESTDIR)${host_installdir}; done
 	$(mkinstalldirs) $(DESTDIR)${gxx_include_dir}/${std_builddir}
 	for file in limits type_traits atomic bit version; do \
 	  $(INSTALL_DATA) ${std_builddir}/$${file} $(DESTDIR)${gxx_include_dir}/${std_builddir}; done
+	#P0829 headers
+	# Remaining headers:
+	#	cwchar memory ranges variant
+	# Will not be implemented:
+	#	cinttypes 
+	for file in algorithm array bitset chrono charconv functional iterator \
+  memory numeric optional random ratio string string_view system_error \
+  utility tuple ; do \
+	  $(INSTALL_DATA) ${std_builddir}/$${file} $(DESTDIR)${gxx_include_dir}/${std_builddir}; done
 	$(mkinstalldirs) $(DESTDIR)${gxx_include_dir}/${c_base_builddir}
 	for file in ciso646 cstddef cfloat climits cstdint cstdlib \
 	  cstdalign cstdarg cstdbool; do \
 	  $(INSTALL_DATA) ${c_base_builddir}/$${file} $(DESTDIR)${gxx_include_dir}/${c_base_builddir}; done
+	#P0829 headers
+	for file in csetjmp cwchar cerrno cmath ctime cctype cstring; do \
+	  $(INSTALL_DATA) ${c_base_builddir}/$${file} $(DESTDIR)${gxx_include_dir}/${c_base_builddir}; done
+	$(mkinstalldirs) $(DESTDIR)${gxx_include_dir}/${ext_builddir}
+	for file in type_traits.h numeric_traits.h; do \
+	  $(INSTALL_DATA) ${ext_srcdir}/$${file} $(DESTDIR)${gxx_include_dir}/${ext_builddir}; done
+	#P0829 headers ext
+	for file in alloc_traits.h; do \
+	  $(INSTALL_DATA) ${ext_srcdir}/$${file} $(DESTDIR)${gxx_include_dir}/${ext_builddir}; done
+	$(mkinstalldirs) $(DESTDIR)${gxx_include_dir}/${debug_builddir}
+	for file in array debug.h assertions.h; do \
+	  $(INSTALL_DATA) ${debug_srcdir}/$${file} $(DESTDIR)${gxx_include_dir}/${debug_builddir}; done
+	#P0829
+	$(mkinstalldirs) $(DESTDIR)${gxx_include_dir}/${backward_builddir}
+	for file in binders.h ; do \
+	  $(INSTALL_DATA) ${backward_srcdir}/$${file} $(DESTDIR)${gxx_include_dir}/${backward_builddir}; done
+	$(mkinstalldirs) $(DESTDIR)${host_installdir}
+	for file in ${host_headers} ${bits_host_headers} ${host_headers_extra} \
+	 ${thread_host_headers}; do \
+	  $(INSTALL_DATA) $${file} $(DESTDIR)${host_installdir}; done
+	$(mkinstalldirs) $(DESTDIR)${host_installdir}/../ext
+	for file in ${ext_host_headers}; do \
+	  $(INSTALL_DATA) $${file} $(DESTDIR)${host_installdir}/../ext; done
+
 
 # The real deal.
 install-headers:
diff --git a/libstdc++-v3/include/bits/algorithmfwd.h b/libstdc++-v3/include/bits/algorithmfwd.h
index 40e051aa9e3..ebf6bd5610d 100644
--- a/libstdc++-v3/include/bits/algorithmfwd.h
+++ b/libstdc++-v3/include/bits/algorithmfwd.h
@@ -289,6 +289,7 @@ _GLIBCXX_BEGIN_NAMESPACE_VERSION
 bool
 includes(_IIter1, _IIter1, _IIter2, _IIter2, _Compare);
 
+#if _GLIBCXX_HOSTED
   template
 void
 inplace_merge(_BIter, _BIter, _BIter);
@@ -296,6 +297,7 @@ _GLIBCXX_BEGIN_NAMESPACE_VERSION
   template
 void
 inplace_merge(_BIter, _BIter, _BIter, _Compare);
+#endif // _GLIBCXX_HOSTED
 
 #if __cplusplus >= 201103L
   template
@@ -575,9 +577,11 @@ 

[analyzer] Function pointer support

2019-12-03 Thread David Malcolm
gimple_call_fndecl (and gimple_call_addr_fndecl) fail to identify functions
for non-trivial uses of function pointers.

This patch eliminates most uses of these functions from the analyzer in
favor of resolving the rvalue for the fn ptr in the region_model, so that
with the patch it can e.g. detect the double free here:

  #include 
  typedef void (*deallocator_t) (void *);
  void test_1 (void *ptr)
  {
deallocator_t dealloc_fn = free;
dealloc_fn (ptr);
dealloc_fn (ptr);
  }

Successfully bootstrapped & regrtested on x86_64-pc-linux-gnu.

Pushed to branch "dmalcolm/analyzer" on the GCC git mirror.

gcc/ChangeLog:
* analyzer/analyzer.cc (is_named_call_p): Reorganize arguments,
adding a fndecl.  Drop second overload, and rename first overload
to...
(is_special_named_call_p): ...this.
(is_setjmp_call_p): Update for rename of is_named_call_p to
is_special_named_call_p.
(is_longjmp_call_p): Likewise.
* analyzer/analyzer.h (is_named_call_p): Reorganize arguments,
adding a fndecl.  Drop second overload, and rename first overload
to...
(is_special_named_call_p): ...this.
* analyzer/engine.cc (impl_sm_context::get_fndecl_for_call): New
vfunc implementation.
(exploded_node::on_stmt): Update for rename of is_named_call_p to
is_special_named_call_p.
(stmt_requires_new_enode_p): Likewise.
(exploded_graph::dump_exploded_nodes): Likewise.
* analyzer/region-model.cc (region_model::on_call_pre): Likewise.
Replace most calls to is_named_call_p with the new overload taking
a fndecl, calling get_fndecl_for_call to resolve function pointers.
(region_model::on_call_post): Likewise.
(region_model::get_fndecl_for_call): New member function.
* analyzer/region-model.h (region_model::get_fndecl_for_call): New
decl.
* analyzer/sm-file.cc (fileptr_state_machine::on_stmt): Call
get_fndecl_for_call to resolve function pointers, and update all
uses of is_named_call_p to use the result.
* analyzer/sm-malloc.cc (free_of_non_heap::describe_state_change):
Update for rename of is_named_call_p to is_special_named_call_p.
(malloc_state_machine::on_stmt): Call get_fndecl_for_call to
resolve function pointers, and update all uses of is_named_call_p
to use the result.
* analyzer/sm-sensitive.cc (sensitive_state_machine::on_stmt):
Likewise.
* analyzer/sm-taint.cc (taint_state_machine::on_stmt): Likewise.
* analyzer/sm.h (sm_context::get_fndecl_for_call): New vfunc.

gcc/testsuite/ChangeLog:
* gcc.dg/analyzer/attribute-nonnull.c: Add coverage for detecting
passing NULL to a __attribute__((nonnull)) function called via a
function pointer.
* gcc.dg/analyzer/malloc-callbacks.c: New test.
---
 gcc/analyzer/analyzer.cc  |  38 +++-
 gcc/analyzer/analyzer.h   |   8 +-
 gcc/analyzer/engine.cc|  24 ++-
 gcc/analyzer/region-model.cc  | 187 ++
 gcc/analyzer/region-model.h   |   3 +
 gcc/analyzer/sm-file.cc   |  78 
 gcc/analyzer/sm-malloc.cc | 104 +-
 gcc/analyzer/sm-sensitive.cc  |  55 +++---
 gcc/analyzer/sm-taint.cc  |  33 ++--
 gcc/analyzer/sm.h |   6 +
 .../gcc.dg/analyzer/attribute-nonnull.c   |  24 +++
 .../gcc.dg/analyzer/malloc-callbacks.c|  84 
 12 files changed, 415 insertions(+), 229 deletions(-)
 create mode 100644 gcc/testsuite/gcc.dg/analyzer/malloc-callbacks.c

diff --git a/gcc/analyzer/analyzer.cc b/gcc/analyzer/analyzer.cc
index 399925c96fc9..5405facf5e86 100644
--- a/gcc/analyzer/analyzer.cc
+++ b/gcc/analyzer/analyzer.cc
@@ -28,10 +28,18 @@ along with GCC; see the file COPYING3.  If not see
 #include "intl.h"
 #include "analyzer/analyzer.h"
 
-/* Helper function for checkers.  Is the CALL to the given function name?  */
+/* Helper function for checkers.  Is the CALL to the given function name,
+   and with the given number of arguments?
+
+   This doesn't resolve function pointers via the region model;
+   is_named_call_p should be used instead, using a fndecl from
+   get_fndecl_for_call; this function should only be used for special cases
+   where it's not practical to get at the region model, or for special
+   analyzer functions such as __analyzer_dump.  */
 
 bool
-is_named_call_p (const gcall *call, const char *funcname)
+is_special_named_call_p (const gcall *call, const char *funcname,
+unsigned int num_args)
 {
   gcc_assert (funcname);
 
@@ -39,19 +47,31 @@ is_named_call_p (const gcall *call, const char *funcname)
   if (!fndecl)
 return false;
 
+  return is_named_call_p (fndecl, funcname, call, num_args);
+}
+
+/* Helper 

[PATCH] add -Wmismatched-tags (PR 61339)

2019-12-03 Thread Martin Sebor

On 8/5/19 4:30 PM, Jason Merrill wrote:

On Mon, Aug 5, 2019 at 5:50 PM Martin Sebor  wrote:


On 8/5/19 1:25 PM, Jason Merrill wrote:

On 8/1/19 7:35 PM, Martin Sebor wrote:

On 8/1/19 12:09 PM, Jason Merrill wrote:

On 7/22/19 12:34 PM, Martin Sebor wrote:

Ping: https://gcc.gnu.org/ml/gcc-patches/2019-07/msg00622.html

...

-Wmismatched-tags is useful to have, given the MSVC/clang situation,
but I wonder about memory consumption from all the record keeping.
Do you have any overhead measurements?


I did think about the overhead but not knowing if the patch would
even be considered I didn't spend time optimizing it.

In an instrumented build of GCC with the patch I just did I have
collected the following stats for the number of elements in
the rec2loc hash table (for 998 files):

rec2loc elements   locvec elements
min:   0 0
max:   2,785 3,303
mean:537 1,043
median:  526 1,080

The locvec data are based on totals for the whole hash table, so
in the worst case the hash_map has 2,785 elements, and the sum of
all locvec lengths in all those elements is 3,303.  Each rec2loc
element takes up 16 bytes, plus the size of the locvec data.

If my math is right (which doesn't happen too often) in the worst
case the bookkeeping overhead is 43KB for the hash_map plus on
the order of 140KB for the vectors (just doubling the number of
elements to account for capacity = 2 X size, times 24 for
the flag_func_loc_t element type).  So 183K in the worst case
in GCC.


183k cumulative over all the GCC sources doesn't sound excessive, but...


There are a few ways to reduce that if it seems excessive.

One is by avoiding some waste in flag_func_loc_t which is

pair>>

tree could come first and tag_types and the bool could share
space.  That should bring it down to 16 in LP64, for about
30% off the 183K.

Beyond that, we could change the algorithm to discard records
for prior declarations after the first definition has been seen
(and any mismatches diagnosed).

We could also only collect one record for each definition
in system headers rather than one for every declaration and
reference.


...these all sound worthwhile.


Okay, I'll look into it.

To be clear: it was 183K maximum for a single compilation, not
aggregate for all of them.


Aha.  Thanks.


Attached is a revised patch that implements just -Wmismatched-tags
without the other two warnings (-Wclass-not-pod and -Wstruct-is-pod).
It also implements the optimizations mentioned above.  To make it
easier to do the tag cleanup by simply dropping the class-key when
it isn't necessary (suggested during the review of the cleanup patch)
I added another warning: -Wredundant-tags to point out instances
where the class-key or enum-key can safely be dropped. Both warnings
are off by default.

With the optimizations in place, the biggest space overhead of
using the option I measured in a GCC build was 990 elements of
the record_to_locs_t hash table, plus 2756 elements of the locvec
vector.  In LP64, each record_to_locs_t element type is 16 bytes
and each element of the locvec vector is 24 bytes, so the maximum
space overhead is on the order of 80K.  The average overhead per
GCC translation unit was about 30K.

The patch depends on fixes for a few bugs in GCC hash_tables (PR
92761 and 92762).  I will post those separately.

Martin

PS Independently of this patch I will propose updating the GCC
Coding Conventions to remove the guideline to use the struct
class-key for PODs and class for non-PODs:
  https://gcc.gnu.org/codingconventions.html#Class_Use
PR c++/61339 - add warning for mismatch between struct and class

gcc/c-family/ChangeLog:

	PR c++/61339
	* c.opt (-Wmismatched-tags, -Wredundant-tags): New options.

gcc/cp/ChangeLog:

	PR c++/61339
	* parser.c (cp_parser_maybe_warn_enum_key): New function.
	(rec_decl_loc_t): New class.
	(record_to_locs_t): New typedef.
	(rec2loc): New global variable.
	(cp_parser_elaborated_type_specifier): Call it.
	(cp_parser_class_head): Call cp_parser_check_class_key.
	(cp_parser_check_class_key): Add arguments.
	(diag_mismatched_tags): New function.
	(c_parse_file): Call diag_mismatched_tags.

gcc/testsuite/ChangeLog:

	PR c++/61339
	* g++.dg/warn/Wmismatched-tags.C: New test.
	* g++.dg/warn/Wredundant-tags.C: New test.

gcc/ChangeLog:

	PR c++/61339
	* doc/invoke.texi (-Wmismatched-tags, -Wredundant-tags): Document
	new C++ options.

diff --git a/gcc/c-family/c.opt b/gcc/c-family/c.opt
index 914a2f0ef44..4f3d3cf0d43 100644
--- a/gcc/c-family/c.opt
+++ b/gcc/c-family/c.opt
@@ -755,6 +755,10 @@ Wmisleading-indentation
 C C++ Common Var(warn_misleading_indentation) Warning LangEnabledBy(C C++,Wall)
 Warn when the indentation of the code does not reflect the block structure.
 
+Wmismatched-tags
+C++ Objc++ Var(warn_mismatched_tags) Warning
+Warn when a class is redeclared or referenced using a mismatched class-key.
+
 Wmissing-braces
 C ObjC C++ 

Re: [C++ PATCH] __builtin_source_location ()

2019-12-03 Thread Jakub Jelinek
On Tue, Dec 03, 2019 at 03:37:41PM -0500, Jason Merrill wrote:
> On 11/15/19 7:28 AM, Jakub Jelinek wrote:
> > +  loc = LOCATION_LOCUS (loc);
> ...
> > +  entry.loc
> > += linemap_resolve_location (line_table, loc, LRK_MACRO_EXPANSION_POINT,
> > +   );
> 
> You don't need LOCATION_LOCUS if you're calling linemap_resolve_location.
> LGTM with that change and David's.

I had to apply also Jon's suggestion to use _M_* names of the members
instead of __* names (didn't use the variant that would support both), while
__impl remains with two underscores.  Below is what I'll be retesting.

> > I know David and Jonathan had comments on the patch and can try to
> > incorporate them, but more importantly I'd like to ask whether we want to
> > add this at all or wait till GCC 11 and add something else.
> > Yet another option would be make it clear that the builtin is a temporary
> > thing until we implement something different and then would be removed,
> > say by reporting error if the builtin is used anywhere but in default
> > argument of std::source_location::current or something similar; I mean if
> > the ABI is sound, but we might have __builtin_constexpr_force_static or
> > whatever else or some new expression form one day and replace the builtin
> > uses with that.
> 
> I don't think we need to worry about enforcement; we say in general that
> C++20 support is experimental, and we don't try to maintain compatibility;
> even more so for internal details like this.

2019-12-03  Jakub Jelinek  

* cp-tree.h (enum cp_tree_index): Add CPTI_SOURCE_LOCATION_IMPL.
(source_location_impl): Define.
(enum cp_built_in_function): Add CP_BUILT_IN_SOURCE_LOCATION.
(fold_builtin_source_location): Declare.
* cp-gimplify.c: Include output.h, file-prefix-map.h and cgraph.h.
(cp_gimplify_expr, cp_fold): Handle CP_BUILT_IN_SOURCE_LOCATION.
Formatting fix.
(get_source_location_impl_type): New function.
(struct source_location_table_entry,
struct source_location_table_entry_hash): New types.
(source_location_table, source_location_id): New variables.
(fold_builtin_source_location): New function.
* constexpr.c (cxx_eval_builtin_function_call): Handle
CP_BUILT_IN_SOURCE_LOCATION.
* tree.c (builtin_valid_in_constant_expr_p): Likewise.  Formatting
fix.
* decl.c (cxx_init_decl_processing): Register
__builtin_source_location.
* name-lookup.c (get_std_name_hint): Add source_location entry.

* g++.dg/cpp2a/srcloc1.C: New test.
* g++.dg/cpp2a/srcloc2.C: New test.
* g++.dg/cpp2a/srcloc3.C: New test.
* g++.dg/cpp2a/srcloc4.C: New test.
* g++.dg/cpp2a/srcloc5.C: New test.
* g++.dg/cpp2a/srcloc6.C: New test.
* g++.dg/cpp2a/srcloc7.C: New test.
* g++.dg/cpp2a/srcloc8.C: New test.
* g++.dg/cpp2a/srcloc9.C: New test.
* g++.dg/cpp2a/srcloc10.C: New test.
* g++.dg/cpp2a/srcloc11.C: New test.
* g++.dg/cpp2a/srcloc12.C: New test.
* g++.dg/cpp2a/srcloc13.C: New test.
* g++.dg/cpp2a/srcloc14.C: New test.

--- gcc/cp/cp-tree.h.jj 2019-12-03 20:21:29.743476880 +0100
+++ gcc/cp/cp-tree.h2019-12-03 21:45:31.807989738 +0100
@@ -204,6 +204,8 @@ enum cp_tree_index
 
 CPTI_ANY_TARG,
 
+CPTI_SOURCE_LOCATION_IMPL,
+
 CPTI_MAX
 };
 
@@ -356,6 +358,9 @@ extern GTY(()) tree cp_global_trees[CPTI
 /* A node which matches any template argument.  */
 #define any_targ_node  cp_global_trees[CPTI_ANY_TARG]
 
+/* std::source_location::__impl class.  */
+#define source_location_impl   
cp_global_trees[CPTI_SOURCE_LOCATION_IMPL]
+
 /* Node to indicate default access. This must be distinct from the
access nodes in tree.h.  */
 
@@ -6182,6 +6187,7 @@ struct GTY((chain_next ("%h.next"))) tin
 enum cp_built_in_function {
   CP_BUILT_IN_IS_CONSTANT_EVALUATED,
   CP_BUILT_IN_INTEGER_PACK,
+  CP_BUILT_IN_SOURCE_LOCATION,
   CP_BUILT_IN_LAST
 };
 
@@ -7735,6 +7741,7 @@ extern void clear_fold_cache  (void);
 extern tree lookup_hotness_attribute   (tree);
 extern tree process_stmt_hotness_attribute (tree, location_t);
 extern bool simple_empty_class_p   (tree, tree, tree_code);
+extern tree fold_builtin_source_location   (location_t);
 
 /* in name-lookup.c */
 extern tree strip_using_decl(tree);
--- gcc/cp/cp-gimplify.c.jj 2019-11-27 17:26:57.233013947 +0100
+++ gcc/cp/cp-gimplify.c2019-12-03 21:52:31.720546585 +0100
@@ -35,6 +35,9 @@ along with GCC; see the file COPYING3.
 #include "attribs.h"
 #include "asan.h"
 #include "gcc-rich-location.h"
+#include "output.h"
+#include "file-prefix-map.h"
+#include "cgraph.h"
 
 /* Forward declarations.  */
 
@@ -896,8 +899,12 @@ cp_gimplify_expr (tree *expr_p, gimple_s
  tree decl = cp_get_callee_fndecl_nofold (*expr_p);
 

Re: [PATCH], V7, #6 of 7, Fix issues with vector extract and prefixed instructions

2019-12-03 Thread Segher Boessenkool
On Tue, Dec 03, 2019 at 01:20:04PM -0500, Michael Meissner wrote:
> On Tue, Nov 26, 2019 at 01:20:20PM -0600, Segher Boessenkool wrote:
> > > I needed to add a new constraint (em) in addition to new predicate 
> > > functions.
> > > I discovered that with the predicate function alone, the register 
> > > allocator
> > > would re-create the address.  The constraint prevents this combination.
> > 
> > It's a reasonable thing to have as a contraint, too.
> > 
> > Once again, how should things work in inline assembler?  Can we allow
> > prefixed addressing there, or should "m" in inline asm mean "em"?
> 
> At the moment, I think we should just not allow prefixed addresses in asm
> constructs at all.

We need to think about this *before GCC 10*.  And have patches for it.

So you are saying that yes, "m" in inline asm should mean "em".  This
probably makes sense, there is so much existing asm that does not work
with prefixed addresses -- this is somewhet similar to how "m" in inline
asm is not the same as "m<>" (as it is in the machine description).

But then we need an extra constraint to _do_ allow prefixed addresses:
there needs to be a way to write inline asm for such memory as well.

> > > I also modified the vector extract code to generate a single PC-relative 
> > > load
> > > if the vector has a PC-relative address and the offset is constant.
> > 
> > That should be handled by the RTL optimisers anyway, no?  Or is this
> > for post-reload splitters (a bad idea always).
> 
> You have it backwards.  It is the RTL optimizers that combines the vector
> extract with the load in the first place.  My code allows the combination and
> generates a single instruction.  Otherwise, it would do a PLA and then do the
> vector extract with the address loaded.

So, separate patch please, for a separate improvement.  So that I can
understand what is what.

> > > +  /* Optimize PC-relative addresses with a constant offset.  */
> > > +  else if (pcrel_p && CONST_INT_P (element_offset))
> > > +{
> > > +  rtx addr2 = addr;
> > 
> > addr isn't used after this anyway, so you can clobber it just fine.
> 
> Generally I prefer not to reuse variables.

You shouldn't fear that.  Instead, you should factor big routines, and/or
simplify control flow.

"Not reusing variables" (for the *same purpose* btw) is not the problem.
The problem is that the code is too big and complex and irregular to
simply understand.

> > > +  /* With only one temporary base register, we can't support a 
> > > PC-relative
> > > + address added to a variable offset.  This is because the PADDI 
> > > instruction
> > > + requires RA to be 0 when doing a PC-relative add (i.e. no register 
> > > to add
> > > + to).  */
> > > +  else if (pcrel_p)
> > > +gcc_unreachable ();
> > 
> > That comment suggests we just ICE when we get unwanted input.  Such a
> > comment belongs where we prevent such code from being formed in the
> > first place (or nowhere, if it is obvious).
> 
> The constraint and predicate is where we prevent it from occuring.  The
> gcc_unreachable is just there as insurance that it didn't get recreated later.
> During testing before I added the constraint, I found the register allocator
> combining the two, even though the predicate didn't allow the combination.  So
> the test is just to ICE out if the combination took place.

So there should be no comment here?  A gcc_assert could be useful, but
that is all?  And the assert should be as early as possible, in general,
like (almost) immediately after the function start.  If you find you want
to do it later, congratulations, you found a good place to factor this
routine :-)


Segher


Re: [C++ PATCH] __builtin_source_location ()

2019-12-03 Thread Jason Merrill

On 11/15/19 7:28 AM, Jakub Jelinek wrote:

+  loc = LOCATION_LOCUS (loc);

...

+  entry.loc
+= linemap_resolve_location (line_table, loc, LRK_MACRO_EXPANSION_POINT,
+   );


You don't need LOCATION_LOCUS if you're calling 
linemap_resolve_location.  LGTM with that change and David's.



I know David and Jonathan had comments on the patch and can try to
incorporate them, but more importantly I'd like to ask whether we want to
add this at all or wait till GCC 11 and add something else.
Yet another option would be make it clear that the builtin is a temporary
thing until we implement something different and then would be removed,
say by reporting error if the builtin is used anywhere but in default
argument of std::source_location::current or something similar; I mean if
the ABI is sound, but we might have __builtin_constexpr_force_static or
whatever else or some new expression form one day and replace the builtin
uses with that.


I don't think we need to worry about enforcement; we say in general that 
C++20 support is experimental, and we don't try to maintain 
compatibility; even more so for internal details like this.


Jason



Re: Ping: [C++ PATCH] Opt out of GNU vector extensions for built-in SVE types

2019-12-03 Thread Jason Merrill

On 12/3/19 7:39 AM, Richard Sandiford wrote:

Jason Merrill  writes:

On 11/29/19 5:59 AM, Richard Sandiford wrote:

Ping

Richard Sandiford  writes:

This is the C++ equivalent of r277950, which prevented the
use of the GNU vector extensions with SVE vector types for C.
[https://gcc.gnu.org/viewcvs/gcc?view=revision=277950].
I've copied the rationale below for reference.

The changes here are very similar to the C ones.  Perhaps the only
noteworthy thing (that I know of) is that the patch continues to treat
!gnu_vector_type_p vector types as literal types/potential constexprs.
Disabling the GNU vector extensions shouldn't in itself stop the types
from being literal types, since whatever the target provides instead
might be constexpr material.

Tested on aarch64-linux-gnu and x86_64-linux-gnu.  OK to install?

Richard

-
The AArch64 port defines built-in SVE types at start-up under names
like __SVInt8_t.  These types are represented in the front end and
gimple as normal VECTOR_TYPEs and are code-generated as normal vectors.
However, we'd like to stop the frontends from treating them in the
same way as GNU-style ("vector_size") vectors, for several reasons:

(1) We allowed the GNU vector extensions to be mixed with Advanced SIMD
  vector types and it ended up causing a lot of confusion on big-endian
  targets.  Although SVE handles big-endian vectors differently from
  Advanced SIMD, there are still potential surprises; see the block
  comment near the head of aarch64-sve.md for details.

(2) One of the SVE vectors is a packed one-bit-per-element boolean vector.
  That isn't a combination the GNU vector extensions have supported
  before.  E.g. it means that vectors can no longer decompose to
  arrays for indexing, and that not all elements are individually
  addressable.  It also makes it less clear which order the initialiser
  should be in (lsb first, or bitfield ordering?).  We could define
  all that of course, but it seems a bit weird to go to the effort
  for this case when, given all the other reasons, we don't want the
  extensions anyway.

(3) The GNU vector extensions only provide full-vector operations,
  which is a very artifical limitation on a predicated architecture
  like SVE.

(4) The set of operations provided by the GNU vector extensions is
  relatively small, whereas the SVE intrinsics provide many more.

(5) It makes it easier to ensure that (with default options) code is
  portable between compilers without the GNU vector extensions having
  to become an official part of the SVE intrinsics spec.

(6) The length of the SVE types is usually not fixed at compile time,
  whereas the GNU vector extension is geared around fixed-length
  vectors.

  It's possible to specify the length of an SVE vector using the
  command-line option -msve-vector-bits=N, but in principle it should
  be possible to have functions compiled for different N in the same
  translation unit.  This isn't supported yet but would be very useful
  for implementing ifuncs.  Once mixing lengths in a translation unit
  is supported, the SVE types should represent the same type throughout
  the translation unit, just as GNU vector types do.

However, when -msve-vector-bits=N is in effect, we do allow conversions
between explicit GNU vector types of N bits and the corresponding SVE
types.  This doesn't undermine the intent of (5) because in this case
the use of GNU vector types is explicit and intentional.  It also doesn't
undermine the intent of (6) because converting between the types is just
a conditionally-supported operation.  In other words, the types still
represent the same types throughout the translation unit, it's just that
conversions between them are valid in cases where a certain precondition
is known to hold.  It's similar to the way that the SVE vector types are
defined throughout the translation unit but can only be used in functions
for which SVE is enabled.
-


2019-11-08  Richard Sandiford  

gcc/cp/
* cp-tree.h (CP_AGGREGATE_TYPE_P): Check for gnu_vector_type_p
instead of VECTOR_TYPE.
* call.c (build_conditional_expr_1): Restrict vector handling
to vectors that satisfy gnu_vector_type_p.  Don't treat the
"then" and "else" types as equivalent if they have the same
vector shape but differ in whether they're GNU vectors.
* cvt.c (ocp_convert): Only allow vectors to be converted
to bool if they satisfy gnu_vector_type_p.
(build_expr_type_conversion): Only allow conversions from
vectors if they satisfy gnu_vector_type_p.
* typeck.c (cp_build_binary_op): Only allow binary operators to be
applied to vectors if they satisfy gnu_vector_type_p.
(cp_build_unary_op): 

Re: [PATCH], V7, #5 of 7, Add more effective targets for the 'future' system to target-supports.

2019-12-03 Thread Segher Boessenkool
On Tue, Dec 03, 2019 at 01:11:55PM -0500, Michael Meissner wrote:
> On Fri, Nov 22, 2019 at 08:11:16PM -0600, Segher Boessenkool wrote:
> > On Thu, Nov 14, 2019 at 05:56:50PM -0500, Michael Meissner wrote:
> > >   * lib/target-supports.exp
> > >   (check_effective_target_powerpc_future_ok): Do not require 64-bit
> > >   or Linux support before doing the test.  Use a 32-bit constant in
> > >   PLI.
> > 
> > You changed from 0x12345 to 0x1234, instead -- why?
> 
> The last time I did the patch there was discussion about 32-bit support.  I
> just rewrote the patch to accomidate possibly using -mcpu=future on a 32-bit
> platform.  In particular, AIX.  So I rewrote the basic test so that in theory
> it would run on a 32-bit platform.

I don't see how this has anything to with it?  The insns was just fine
for 32-bit already?

> > > +# Return 1 if this is a PowerPC target supporting -mcpu=future.

> > >  proc check_effective_target_powerpc_future_ok { } {
> > > -if { ([istarget powerpc64*-*-linux*]) } {
> > > +if { ([istarget powerpc*-*-*]) } {
> > >   return [check_no_compiler_messages powerpc_future_ok object {
> > >   int main (void) {
> > >   long e;
> > > - asm ("pli %0,%1" : "=r" (e) : "n" (0x12345));
> > > + asm ("pli %0,%1" : "=r" (e) : "n" (0x1234));
> > >   return e;
> > >   }
> > >   } "-mfuture"]
> > 
> > You are still passing -mfuture.
> 
> All of the tests use the -m test and not -mcpu= test
> (i.e. -mpower9-vector instead of -mcpu=power9 or -mdejagnu-cpu=power9), so I
> was being consistant with those tests.

Yes, that is a nasty problem, so let's not promulgate that.  Anyway, it
also contradicts the comment on this function.

> > I don't understand why we test both pli and pld.
> 
> Just being cautious.

Hrm, the check_effective_target_powerpc_future_ok test should probably
use a non-prefixed instruction?  If we cannot do that right now, add a
comment?

> > > +proc check_effective_target_powerpc_prefixed_addr_ok { } {
> > 
> > The name says another.
> > 
> > > +if { ([istarget powerpc*-*-*]) } {
> > 
> > This part has no function?  Are there any testcases that test for our
> > prefixed insns that are compiler for non-powerpc?
> 
> Probably not, but all of the other tests have the same prefix.

Feel free to send patches fixing old mistakes, too.

> > > + return [check_no_compiler_messages powerpc_prefixed_addr_ok object {

> > Does this need to be separate from check_effective_target_powerpc_future_ok
> > at all?
> 
> This gets back to whether some port will eventually want to use FUTURE
> instructions in a 32-bit context.  Linux will always be 64-bit little endian,

No, Linux also supports 32-bit BE and 64-bit BE.  Even if we do not
support Future on those currently, this may change.  There is absolutely
no reason to make it harder to do later.  If we think some combinations
do not exist we can *ignore* them.  That is much less work, too.

> but there are other platforms out there that may want to generate FUTURE code.

But we should not anticipate those having broken assemblers that do not
support half of the insns, etc.  You *already* required prefixed insns in
the base Future test, to prove how futile this is.

> > > +# support PC-relative addressing when -mcpu=future is used to pass this 
> > > test.
> > > +
> > > +proc check_effective_target_powerpc_pcrel_ok { } {
> > > +if { ([istarget powerpc*-*-*]) } {
> > > + return [check_no_compiler_messages powerpc_pcrel_ok object {
> > > +   int main (void) {
> > > +   static int s __attribute__((__used__));
> > > +   int e;
> > > +   asm ("plwa %0,s@pcrel(0),1" : "=r" (e));
> > > +   return e;
> > > +   }
> > > +   } "-mfuture"]
> > > +  } else {
> > > +   return 0
> > > +  }
> > > +}
> > 
> > So every assembler will support either all three of these, or none.
> > Can you simplify this please?
> 
> I can imagine for example AIX might support 64-bit 'future' but not support
> PC-relative.  I believe you (or David) asked to split up the prefixed
> addressing with numeric offets and PC-relative addressing, because ports might
> not be able to support PC-relative addressing.

That is fine, but these tests do not do that.  Some clearer comments will
help, too.


Segher


Re: [PATCH], V7, #2 of 7, Use PLI to load up 32-bit SImode constants

2019-12-03 Thread Segher Boessenkool
On Tue, Dec 03, 2019 at 12:57:24PM -0500, Michael Meissner wrote:
> No, the change for num_insns_constant_gpr could not go in until the support 
> for
> PLI went in (patch V6 #1).

Well, I lost track.  So your version 7 to 9 patches do *not* replace the
v6 patches?  Or does "V" mean something else?

Please post patches you propose currently (after addressing comments,
saying what is changed wrt the previous submission, etc.)

And if you restructure the patch sets, e.g. add new things to patches,
I will have to start reviewing from scratch.  While if you just do some
modifications (and you say what you modified, and that matches reality),
and those modifications are in line with previous reviews, then reviewing
makes easy progress.

It also normally would not take a single day even to make such changes.

> So for patches V6 #1-3, I believe you approved #1 (movdi) and #3 
> (addi3/addsi3)
> after the changes but not #2 (movsi).  I haven't yet applied the specific bits
> of #1 or #3 after doing the reformating patch.  Are the bits for #2 ok with 
> the
> above change to use num_insns_constant_gpr.  Or did you want to see the 3
> patches once again after the reformating.

https://gcc.gnu.org/ml/gcc-patches/2019-10/msg01605.html

This is six weeks ago.  And I do not know if what you call "v6" is this.


Segher


Re: [C++ Patch] A few more cp_expr_loc_or_input_loc and a diagnostic regression fix

2019-12-03 Thread Jason Merrill

On 12/2/19 5:22 PM, Paolo Carlini wrote:

Hi,

On 02/12/19 19:58, Jason Merrill wrote:

On 11/29/19 8:08 AM, Paolo Carlini wrote:

Hi,

a few more rather straightforward uses for cp_expr_loc_or_input_loc.

Additionally, while working on the latter, I noticed that, compared 
to say, gcc-7, lately the code we have in cp_build_addr_expr_1 to 
diagnose taking the address of 'main' often doesn't work anymore, 
when the argument is wrapped in a location_wrapper. The below fixes 
that by using tree_strip_any_location_wrapper in the DECL_MAIN_P 
check, which works fine, but I can imagine various other ways to 
solve the issue...


Maybe

location_t loc = cp_expr_loc_or_input_loc (arg);
STRIP_ANY_LOCATION_WRAPPER (arg);

at the top?  In general I prefer the local variable to writing 
cp_expr_loc_or_input_loc in lots of places, whether or not we then 
strip wrappers.


Sure. In a few circumstances I hesitated because 
cp_expr_loc_or_input_loc isn't really trivial, boils down to quite a few 
conditionals, and adds a bit to the complexity of simple functions when 
in fact no errors nor warnings are issued. But certainly calling it at 
the top is often much cleaner. I changed both the functions.


However, using STRIP_ANY_LOCATION_WRAPPER at the beginning of 
cp_build_addr_expr_1 doesn't seem straightforward, causes a few 
regressions (wrong location for conversion/Wwrite-strings.C; even an ICE 
for cpp1z/constexpr-lambda23.C; cpp1z/decomp48.C). The function doesn't 
seem trivial from this point of view, there is already a localized 
tree_strip_any_location_wrapper near the end, for 
processing_template_decl. I would rather further investigate that kind 
of simplification in a separate patch, beyond the regression fix. 
(before I would like to improve the locations for all the various kinds 
of cast, quite a few changes)


Absolutely, if it isn't simple don't bother.


@@ -6061,6 +6061,7 @@ cp_build_addr_expr_1 (tree arg, bool strict_lvalue
 {
   tree argtype;
   tree val;
+  location_t loc;
 
   if (!arg || error_operand_p (arg))

 return error_mark_node;
@@ -6070,6 +6071,7 @@ cp_build_addr_expr_1 (tree arg, bool strict_lvalue
 return error_mark_node;
 
   argtype = lvalue_type (arg);

+  loc = cp_expr_loc_or_input_loc (arg);


You don't need to separately declare the variable at the top of the 
function anymore.  OK with that change.


Jason



Re: [C++ PATCH] Fix up constexpr TARGET_EXPR evaluation if there are cleanups (PR c++/91369)

2019-12-03 Thread Jason Merrill

On 12/3/19 3:42 AM, Jakub Jelinek wrote:

Hi!

The following testcase shows that during constexpr evaluation we didn't
handle TARGET_EXPR_CLEANUP at all (which was probably fine when there
weren't constexpr dtors).  My understanding is that TARGET_EXPR cleanups
should be queued and evaluated only at the end of CLEANUP_POINT_EXPR, so
that is what the patch implements.

Regtesting of the first iteration revealed quite some FAILs in libstdc++,
my assumption that a TARGET_EXPR with cleanups will always appear inside of
some CLEANUP_POINT_EXPR is violated e.g. when cp_fold attempts to evaluate
some call expressions with TARGET_EXPR in arguments, so I had to add
evaluating of cleanups in cxx_eval_outermost_constant_expr too as if the
outermost expression were wrapped into another CLEANUP_POINT_EXPR.

There was another issue only seen in libstdc++, in particular,
TRY_CATCH_EXPR created by wrap_cleanups_r can sometimes have NULL_TREE
first argument.

Furthermore (though just theoretical, I haven't tried to create a testcase),
I believe a TARGET_EXPR shuld behave similarly to SAVE_EXPR that if the same
TARGET_EXPR tree is encountered multiple times, the initializer expression
is evaluated just once, not multiple times.  Maybe it didn't matter before
if things were evaluated multiple times, but e.g. with constexpr new/delete,
evaluating new multiple times without corresponding deletes is incorrect,
so in the patch I've tried to handle caching of the result and reuse of it
once it has been evaluated already (with pushing into save_exprs so that
e.g. when evaluating a loop second time or a function we undo the caching).

Bootstrapped/regtested on x86_64-linux and i686-linux, ok for trunk?

Note, the description of CLEANUP_POINT_EXPR in tree.def makes me worry a
little bit, with the:
"   Note that if the expression is a reference to storage, it is forced out
of memory before the cleanups are run.  This is necessary to handle
cases where the cleanups modify the storage referenced; in the
expression 't.i', if 't' is a struct with an integer member 'i' and a
cleanup which modifies 'i', the value of the expression depends on
whether the cleanup is run before or after 't.i' is evaluated.  When
expand_expr is run on 't.i', it returns a MEM.  This is not good enough;
the value of 't.i' must be forced out of memory."
note.  Does that mean the CLEANUP_POINT_EXPR handling should do
unshare_constructor on the result if there are any cleanups (and similarly
outermost expr handling)?  Don't want on the other side to waste too much
memory if it is not necessary...

2019-12-03  Jakub Jelinek  

PR c++/91369
* constexpr.c (struct constexpr_global_ctx): Add cleanups member,
initialize it in the ctor.
(cxx_eval_constant_expression) : If TARGET_EXPR_SLOT
is already in the values hash_map, don't evaluate it again.  Put
TARGET_EXPR_SLOT into hash_map even if not lval, and push it into
save_exprs too.  If there is TARGET_EXPR_CLEANUP and not
CLEANUP_EH_ONLY, push the cleanup to cleanups vector.
: Save outer cleanups, set cleanups to
local auto_vec, after evaluating the body evaluate cleanups and
restore previous cleanups.
: Don't crash if the first operand is NULL_TREE.
(cxx_eval_outermost_constant_expr): Set cleanups to local auto_vec,
after evaluating the expression evaluate cleanups.

* g++.dg/cpp2a/constexpr-new8.C: New test.

--- gcc/cp/constexpr.c.jj   2019-11-28 09:02:21.896897228 +0100
+++ gcc/cp/constexpr.c  2019-12-03 01:14:06.674952015 +0100
@@ -1025,8 +1025,10 @@ struct constexpr_global_ctx {
/* Heap VAR_DECLs created during the evaluation of the outermost constant
   expression.  */
auto_vec heap_vars;
+  /* Cleanups that need to be evaluated at the end of CLEANUP_POINT_EXPR.  */
+  vec *cleanups;
/* Constructor.  */
-  constexpr_global_ctx () : constexpr_ops_count (0) {}
+  constexpr_global_ctx () : constexpr_ops_count (0), cleanups (NULL) {}
  };
  
  /* The constexpr expansion context.  CALL is the current function

@@ -4984,6 +4986,14 @@ cxx_eval_constant_expression (const cons
  *non_constant_p = true;
  break;
}
+  /* Avoid evaluating a TARGET_EXPR more than once.  */
+  if (tree *p = ctx->global->values.get (TARGET_EXPR_SLOT (t)))
+   {
+ if (lval)
+   return TARGET_EXPR_SLOT (t);
+ r = *p;
+ break;
+   }
if ((AGGREGATE_TYPE_P (TREE_TYPE (t)) || VECTOR_TYPE_P (TREE_TYPE (t
{
  /* We're being expanded without an explicit target, so start
@@ -5004,13 +5014,14 @@ cxx_eval_constant_expression (const cons
if (!*non_constant_p)
/* Adjust the type of the result to the type of the temporary.  */
r = adjust_temp_type (TREE_TYPE (t), r);
+  if (TARGET_EXPR_CLEANUP (t) && !CLEANUP_EH_ONLY (t))
+   ctx->global->cleanups->safe_push 

Re: [C++ PATCH] c++/91353 - P1331R2: Allow trivial default init in constexpr contexts.

2019-12-03 Thread Jason Merrill

On 12/3/19 1:40 PM, Jason Merrill wrote:

On 12/3/19 12:49 PM, Marek Polacek wrote:

On Tue, Dec 03, 2019 at 02:07:02AM -0500, Jason Merrill wrote:

On 12/2/19 5:09 PM, Marek Polacek wrote:

On Mon, Dec 02, 2019 at 12:09:17PM -0500, Jason Merrill wrote:

On 12/1/19 8:09 PM, Marek Polacek wrote:

+ || (skip_empty
+ && is_really_empty_class (TREE_TYPE (field),


This should probably check DECL_SIZE (field) == size_zero_node 
instead,
since that will properly distinguish between overlapping and 
non-overlapping
data members of empty class type.  And please test how this works 
with data
members of empty class type both with and without 
[[no_unique_address]].


I don't think that's possible -- empty classes in C++ have 
sizeof(char), unless

their only member is char[0], then their DECL_SIZE is 0.


I think you're talking about the TYPE_SIZE of the class, and I'm talking
about the DECL_SIZE of the FIELD_DECL.


I'm really looking at the DECL_SIZE:

Breakpoint 5, next_initializable_field (field=0x7fffea90dc78 s>, skip_empty=true)

 at /home/mpolacek/src/gcc/gcc/cp/decl.c:5928
5928 && is_really_empty_class (TREE_TYPE (field),
(gdb) p field
$1 = 
(gdb) p DECL_SIZE($1)
$2 = 
(gdb) pge
8

This is constexpr-init8.C:

struct S {
   constexpr S(int) {}
};

struct W {
   constexpr W(int) : s(8), p() {}

   S s;
   int *p;
};

constexpr auto a = W(42);

I think that's because layout_class_type has:

  6491   else if (might_overlap && is_empty_class (type))
  6492 layout_empty_base_or_field (rli, field, 
empty_base_offsets);

  6493   else
  6494 layout_nonempty_base_or_field (rli, field, NULL_TREE,
  6495    empty_base_offsets);

and here might_overlap is false because 's' doesn't have 
[[no_unique_address]].

So we emit the FIELD_DECL with size 8.


Yes, that's correct.  So I don't think we want to skip 's' in 
next_initializable_field; it should have a normal initializer.



Its type CLASSTYPE_SIZE is 0 though.
I don't know but I'd rather not mess with this...


I've added two testcases: constexpr-init13.C and 
constexpr-init14.C.  Is there
another scenario regarding [[no_unique_address]] that you want me to 
test?


I think the classes with empty base fields need to have another 
initialized

field after them to have a chance of tripping

   if (idx != field)


And sure enough, that's exactly the case for

struct E {
   constexpr E() = default;
   constexpr E(int) {}
};

struct S {
   E e;
   int i;
   constexpr S() : e{} , i(11) { }
};

constexpr S s;


And that's even without [[no_unique_address]].

So it's a valid concern.  It's unclear to me how this should be resolved.


What does the initializer for s look like in this case?


One thought: drop the empty base checking in next_initializable_field, 
and change the test in reduced_constant_expression_p to


if (idx != field && !is_empty_class (TREE_TYPE (field)))

so we handle both having or not having an initializer for empty class 
elements.


Jason



Re: [C++ PATCH] c++/91353 - P1331R2: Allow trivial default init in constexpr contexts.

2019-12-03 Thread Jason Merrill

On 12/3/19 12:49 PM, Marek Polacek wrote:

On Tue, Dec 03, 2019 at 02:07:02AM -0500, Jason Merrill wrote:

On 12/2/19 5:09 PM, Marek Polacek wrote:

On Mon, Dec 02, 2019 at 12:09:17PM -0500, Jason Merrill wrote:

On 12/1/19 8:09 PM, Marek Polacek wrote:

+|| (skip_empty
+&& is_really_empty_class (TREE_TYPE (field),


This should probably check DECL_SIZE (field) == size_zero_node instead,
since that will properly distinguish between overlapping and non-overlapping
data members of empty class type.  And please test how this works with data
members of empty class type both with and without [[no_unique_address]].


I don't think that's possible -- empty classes in C++ have sizeof(char), unless
their only member is char[0], then their DECL_SIZE is 0.


I think you're talking about the TYPE_SIZE of the class, and I'm talking
about the DECL_SIZE of the FIELD_DECL.


I'm really looking at the DECL_SIZE:

Breakpoint 5, next_initializable_field (field=, 
skip_empty=true)
 at /home/mpolacek/src/gcc/gcc/cp/decl.c:5928
5928 && is_really_empty_class (TREE_TYPE (field),
(gdb) p field
$1 = 
(gdb) p DECL_SIZE($1)
$2 = 
(gdb) pge
8

This is constexpr-init8.C:

struct S {
   constexpr S(int) {}
};

struct W {
   constexpr W(int) : s(8), p() {}

   S s;
   int *p;
};

constexpr auto a = W(42);

I think that's because layout_class_type has:

  6491   else if (might_overlap && is_empty_class (type))
  6492 layout_empty_base_or_field (rli, field, empty_base_offsets);
  6493   else
  6494 layout_nonempty_base_or_field (rli, field, NULL_TREE,
  6495empty_base_offsets);

and here might_overlap is false because 's' doesn't have [[no_unique_address]].
So we emit the FIELD_DECL with size 8.


Yes, that's correct.  So I don't think we want to skip 's' in 
next_initializable_field; it should have a normal initializer.



Its type CLASSTYPE_SIZE is 0 though.
I don't know but I'd rather not mess with this...



I've added two testcases: constexpr-init13.C and constexpr-init14.C.  Is there
another scenario regarding [[no_unique_address]] that you want me to test?


I think the classes with empty base fields need to have another initialized
field after them to have a chance of tripping

   if (idx != field)


And sure enough, that's exactly the case for

struct E {
   constexpr E() = default;
   constexpr E(int) {}
};

struct S {
   E e;
   int i;
   constexpr S() : e{} , i(11) { }
};

constexpr S s;


And that's even without [[no_unique_address]].

So it's a valid concern.  It's unclear to me how this should be resolved.


What does the initializer for s look like in this case?

Jason



Re: [PING^2] Re: [PATCH 1/2] Add a pass to automatically add ptwrite instrumentation

2019-12-03 Thread Andi Kleen
Andi Kleen  writes:

Ping!

> Andi Kleen  writes:
>
> Ping!
>
>> From: Andi Kleen 
>>
>> [v4: Rebased on current tree. Avoid some redundant log statements
>> for locals and a few other fixes.  Fix some comments. Improve
>> documentation. Did some studies on the debug information quality,
>> see below]
>>
>> Add a new pass to automatically instrument changes to variables
>> with the new PTWRITE instruction on x86. PTWRITE writes a 4 or 8 byte
>> field into an Processor Trace log, which allows low over head
>> logging of information. Essentially it's a hardware accelerated
>> printf.


Clear calls_comdat_local when comdat group is dissolved

2019-12-03 Thread Jan Hubicka
Hi,
while looking into Firefox inlining dumps I noticed that we often do not
inline because we think function calls comdat local while the comdat group
itself has been dissolved.

Bootstrapped/regtested x86_64-linux, comitted.

* cgraph.c (cgraph_node::verify_node): Check that calls_comdat_local
is set only for symbol in comdat group.
* symtab.c (symtab_node::dissolve_same_comdat_group_1): Clear it.
Index: cgraph.c
===
--- cgraph.c(revision 278904)
+++ cgraph.c(working copy)
@@ -3093,6 +3094,11 @@ cgraph_node::verify_node (void)
   error ("inline clone is forced to output");
   error_found = true;
 }
+  if (calls_comdat_local && !same_comdat_group)
+{
+  error ("calls_comdat_local is set outside of a comdat group");
+  error_found = true;
+}
   for (e = indirect_calls; e; e = e->next_callee)
 {
   if (e->aux)
Index: symtab.c
===
--- symtab.c(revision 278904)
+++ symtab.c(working copy)
@@ -489,6 +489,8 @@ symtab_node::dissolve_same_comdat_group_
 {
   next = n->same_comdat_group;
   n->same_comdat_group = NULL;
+  if (dyn_cast  (n))
+   dyn_cast  (n)->calls_comdat_local = false;
   /* Clear comdat_group for comdat locals, since
  make_decl_local doesn't.  */
   if (!TREE_PUBLIC (n->decl))


Re: [PATCH], V7, #6 of 7, Fix issues with vector extract and prefixed instructions

2019-12-03 Thread Michael Meissner
On Tue, Nov 26, 2019 at 01:20:20PM -0600, Segher Boessenkool wrote:
> Hi!
> 
> On Thu, Nov 14, 2019 at 06:09:09PM -0500, Michael Meissner wrote:
> > In this case, the current code re-uses the temporary for calculating the 
> > offset
> > of the element to load up the address of the vector, losing the offset.
> 
> Reusing the same pseudo is *always* a bad idea.  You get better
> optimisation if most code is "SSA-like": write to every pseudo only
> once.  Of course you need to violate this where you woule have PHIs in
> actual SSA.

Yes.  I was describing what the current code does (and why I'm fixing it).  It
is a bug.

> > I needed to add a new constraint (em) in addition to new predicate 
> > functions.
> > I discovered that with the predicate function alone, the register allocator
> > would re-create the address.  The constraint prevents this combination.
> 
> It's a reasonable thing to have as a contraint, too.
> 
> Once again, how should things work in inline assembler?  Can we allow
> prefixed addressing there, or should "m" in inline asm mean "em"?

At the moment, I think we should just not allow prefixed addresses in asm
constructs at all.

> 
> > I also modified the vector extract code to generate a single PC-relative 
> > load
> > if the vector has a PC-relative address and the offset is constant.
> 
> That should be handled by the RTL optimisers anyway, no?  Or is this
> for post-reload splitters (a bad idea always).

You have it backwards.  It is the RTL optimizers that combines the vector
extract with the load in the first place.  My code allows the combination and
generates a single instruction.  Otherwise, it would do a PLA and then do the
vector extract with the address loaded.

> > * config/rs6000/rs6000.c (rs6000_adjust_vec_address): Add support
> > for optimizing extracting a constant vector element from a vector
> > that uses a prefixed address.  If the element number is variable
> > and the address uses a prefixed address, abort.
> 
> It doesn't abort.  Erm, wait, it *does* ICE.  Please make that more
> prominent (in the code).  It's not clear why you mention it in the
> changelog while allt he other details are just "add support"?
> 
> I find the control flow very hard to read here.
> 
> > +  /* Optimize PC-relative addresses with a constant offset.  */
> > +  else if (pcrel_p && CONST_INT_P (element_offset))
> > +{
> > +  rtx addr2 = addr;
> 
> addr isn't used after this anyway, so you can clobber it just fine.

Generally I prefer not to reuse variables.

> > +  /* With only one temporary base register, we can't support a PC-relative
> > + address added to a variable offset.  This is because the PADDI 
> > instruction
> > + requires RA to be 0 when doing a PC-relative add (i.e. no register to 
> > add
> > + to).  */
> > +  else if (pcrel_p)
> > +gcc_unreachable ();
> 
> That comment suggests we just ICE when we get unwanted input.  Such a
> comment belongs where we prevent such code from being formed in the
> first place (or nowhere, if it is obvious).

The constraint and predicate is where we prevent it from occuring.  The
gcc_unreachable is just there as insurance that it didn't get recreated later.
During testing before I added the constraint, I found the register allocator
combining the two, even though the predicate didn't allow the combination.  So
the test is just to ICE out if the combination took place.

-- 
Michael Meissner, IBM
IBM, M/S 2506R, 550 King Street, Littleton, MA 01460-6245, USA
email: meiss...@linux.ibm.com, phone: +1 (978) 899-4797


Re: [PATCH], V7, #5 of 7, Add more effective targets for the 'future' system to target-supports.

2019-12-03 Thread Michael Meissner
On Fri, Nov 22, 2019 at 08:11:16PM -0600, Segher Boessenkool wrote:
> On Thu, Nov 14, 2019 at 05:56:50PM -0500, Michael Meissner wrote:
> > * lib/target-supports.exp
> > (check_effective_target_powerpc_future_ok): Do not require 64-bit
> > or Linux support before doing the test.  Use a 32-bit constant in
> > PLI.
> 
> You changed from 0x12345 to 0x1234, instead -- why?

The last time I did the patch there was discussion about 32-bit support.  I
just rewrote the patch to accomidate possibly using -mcpu=future on a 32-bit
platform.  In particular, AIX.  So I rewrote the basic test so that in theory
it would run on a 32-bit platform.

> > -# Return 1 if this is a PowerPC target supporting -mfuture.
> > -# Limit this to 64-bit linux systems for now until other
> > -# targets support FUTURE.
> > +# Return 1 if this is a PowerPC target supporting -mcpu=future.
> 
> "Return 1 if the assembler supports Future instructions."
> 
> Please make it explicit that this isn't about the compiler.

Ok.

> >  proc check_effective_target_powerpc_future_ok { } {
> > -if { ([istarget powerpc64*-*-linux*]) } {
> > +if { ([istarget powerpc*-*-*]) } {
> > return [check_no_compiler_messages powerpc_future_ok object {
> > int main (void) {
> > long e;
> > -   asm ("pli %0,%1" : "=r" (e) : "n" (0x12345));
> > +   asm ("pli %0,%1" : "=r" (e) : "n" (0x1234));
> > return e;
> > }
> > } "-mfuture"]
> 
> You are still passing -mfuture.

All of the tests use the -m test and not -mcpu= test
(i.e. -mpower9-vector instead of -mcpu=power9 or -mdejagnu-cpu=power9), so I
was being consistant with those tests.

> > +# Return 1 if this is a PowerPC target supporting -mcpu=future.  The 
> > compiler
> > +# must support large numeric prefixed addresses by default when -mfuture is
> > +# used.  We test loading up a large constant to verify that the full 34-bit
> > +# offset for prefixed instructions is supported and we check for a prefixed
> > +# load as well.
> 
> The comment says one thing.
> 
> -mfuture isn't a compiler option...  Well it still is, but that should
> be removed.

Again, I was just being consistant with the other tests.

> The actual test uses only 30 bits (and a positive number).  Which is fine,
> but the comment is misleading then: the code doesn't test "if the full
> 34-bit offset is supported".

I can fix the comment.

> I don't understand why we test both pli and pld.

Just being cautious.

> > +proc check_effective_target_powerpc_prefixed_addr_ok { } {
> 
> The name says another.
> 
> > +if { ([istarget powerpc*-*-*]) } {
> 
> This part has no function?  Are there any testcases that test for our
> prefixed insns that are compiler for non-powerpc?

Probably not, but all of the other tests have the same prefix.

> If we want this at all, this test shouldn't be nested, just should be
> an early-out.
> 
> > +   return [check_no_compiler_messages powerpc_prefixed_addr_ok object {
> > +   int main (void) {
> > +   extern long l[];
> > +   long e, e2;
> > +   asm ("pli %0,%1" : "=r" (e) : "n" (0x12345678));
> > +   asm ("pld %0,0x12345678(%1)" : "=r" (e2) : "r" (& l[0]));
> 
> (should be "b" for the last constraint; and "[0]" is usually written
> just "l").
> 
> > +   return e - e2;
> > +   }
> > +   } "-mfuture"]
> 
> And the code tests two things.
> 
> -mcpu=future, instead?
>
> Does this need to be separate from check_effective_target_powerpc_future_ok
> at all?

This gets back to whether some port will eventually want to use FUTURE
instructions in a 32-bit context.  Linux will always be 64-bit little endian,
but there are other platforms out there that may want to generate FUTURE code.

> > +# Return 1 if this is a PowerPC target supporting -mfuture.  The compiler 
> > must
> 
> That is the third selector claiming to test the same thing ("target supports
> -mfuture").
> 
> > +# support PC-relative addressing when -mcpu=future is used to pass this 
> > test.
> > +
> > +proc check_effective_target_powerpc_pcrel_ok { } {
> > +if { ([istarget powerpc*-*-*]) } {
> > +   return [check_no_compiler_messages powerpc_pcrel_ok object {
> > + int main (void) {
> > + static int s __attribute__((__used__));
> > + int e;
> > + asm ("plwa %0,s@pcrel(0),1" : "=r" (e));
> > + return e;
> > + }
> > + } "-mfuture"]
> > +  } else {
> > + return 0
> > +  }
> > +}
> 
> So every assembler will support either all three of these, or none.
> Can you simplify this please?

I can imagine for example AIX might support 64-bit 'future' but not support
PC-relative.  I believe you (or David) asked to split up the prefixed
addressing with numeric offets and PC-relative addressing, because ports might
not be able to support PC-relative addressing.

-- 
Michael Meissner, IBM
IBM, M/S 2506R, 550 King Street, Littleton, MA 01460-6245, USA
email: 

Re: Add a new combine pass

2019-12-03 Thread Segher Boessenkool
On Tue, Dec 03, 2019 at 10:33:48PM +0900, Oleg Endo wrote:
> On Mon, 2019-11-25 at 16:47 -0600, Segher Boessenkool wrote:
> > 
> > > > - sh (that's sh4-linux):
> > > > 
> > > > /home/segher/src/kernel/net/ipv4/af_inet.c: In function 
> > > > 'snmp_get_cpu_field':
> > > > /home/segher/src/kernel/net/ipv4/af_inet.c:1638:1: error: unable to 
> > > > find a register to spill in class 'R0_REGS'
> > > >  1638 | }
> > > >   | ^
> > > > /home/segher/src/kernel/net/ipv4/af_inet.c:1638:1: error: this is the 
> > > > insn:
> > > > (insn 18 17 19 2 (set (reg:SI 0 r0)
> > > > (mem:SI (plus:SI (reg:SI 4 r4 [178])
> > > > (reg:SI 6 r6 [171])) [17 *_3+0 S4 A32])) 
> > > > "/home/segher/src/kernel/net/ipv4/af_inet.c":1638:1 188 {movsi_i}
> > > >  (expr_list:REG_DEAD (reg:SI 4 r4 [178])
> > > > (expr_list:REG_DEAD (reg:SI 6 r6 [171])
> > > > (nil
> > > > /home/segher/src/kernel/net/ipv4/af_inet.c:1638: confused by earlier 
> > > > errors, bailing out
> > > 
> > > Would have to look more at this one.  Seems odd that it can't allocate
> > > R0 when it's already the destination and when R0 can't be live before
> > > the insn.  But there again, this is reload, so my enthuasiasm for looking
> > > is a bit limited :-)
> > 
> > It wants to use r0 in some other insn, so it needs to spill it here, but
> > cannot.  This is what class_likely_spilled is for.
> 
> Hmm ... the R0 problem ... SH doesn't override class_likely_spilled
> explicitly, but it's got a R0_REGS class with only one said reg in it. 
> So the default impl of class_likely_spilled should do its thing.

Yes, good point.  So what happened here?  Is it just RA messing things
up, unrelated to the new pass?


Segher


Re: [PATCH v2 2/2] testsuite: Fix run-time tracking down of `libgcc_s'

2019-12-03 Thread Maciej W. Rozycki
On Mon, 2 Dec 2019, Ian Lance Taylor wrote:

> > gcc/testsuite/
> > * lib/gcc-defs.exp (gcc-set-multilib-library-path): Use
> > `-print-file-name=' to determine the multilib root directory.
> > Use `remote_exec host' rather than `exec' to invoke the
> > compiler.
> 
> This patch is for better Go support but the patch is to generic code.
> The patch is fine with me but should ideally be reviewed by a
> testsuite maintainer.

 I agree.  I have cc-ed you to keep you in the circle and I have added the 
testsuite people now too, but ultimately I think it's Richard who might be 
the best person to look into it as he's been highly skilled in this area 
and he has also dealt with this particular piece in the past.

 Richard, would you be able to find a few cycles and be kind enough to 
look into it?

 I'm somewhat surprised it's only libgo that suffers from the lack of 
run-time library paths in my setup where a random copy of `libgcc_s' is 
not already present in the environment, but maybe no other target library 
test suite pulls `libgcc_s' with any of its tests.

  Maciej


Re: [PATCH, rs6000] Fix PR92760 by checking VECTOR_MEM_NONE_P instead

2019-12-03 Thread Segher Boessenkool
Hi!

On Tue, Dec 03, 2019 at 05:38:41PM +0800, Kewen.Lin wrote:
> PR92760 exposed one issue that VECTOR_UNIT_NONE_P (V2DImode) is true on Power7
> then we won't return it as preferred_simd_mode but ISA 2.06 (Power7) does 
> introduce partial support on vector doubleword (very limitted) and more basic
> support origins from ISA 2.07 (Power8) though.  To make vectorizer still
> leverage those few but available V2DImode related instructions, we need to
> claim it's available on VSX (Power7 and up).
> 
> Instead of using VECTOR_UNIT_NONE_P, this fix is to use VECTOR_MEM_NONE_P
> instead.  I was intended to use both VECTOR_UNIT_NONE_P and VECTOR_MEM_NONE_P
> together but noticed that MEM is like a super set of UNIT in current
> implementation, I think it would be enough.

Yes.  This hook should return what mode we want to use, and !MEM_NONE is
a better match to what we want there.

> 2019-12-03  Kewen Lin  
> 
>   PR target/92760
>   * gcc/config/rs6000/rs6000.c (rs6000_preferred_simd_mode): Update
>   VECTOR_UNIT_NONE_P by VECTOR_MEM_NONE_P.

(Maybe better is "Use VECTOR_MEM_NONE_P instead of VECTOR_UNIT_NONE_P."?)

Okay for trunk.  Thanks!


Segher


Re: [PATCH], V7, #2 of 7, Use PLI to load up 32-bit SImode constants

2019-12-03 Thread Michael Meissner
On Mon, Nov 25, 2019 at 06:49:49PM -0600, Segher Boessenkool wrote:
> On Mon, Nov 25, 2019 at 05:17:08PM -0500, Michael Meissner wrote:
> > On Fri, Nov 22, 2019 at 06:20:52PM -0600, Segher Boessenkool wrote:
> > > >  (define_split
> > > >[(set (match_operand:SI 0 "gpc_reg_operand")
> > > > (match_operand:SI 1 "const_int_operand"))]
> > > >"(unsigned HOST_WIDE_INT) (INTVAL (operands[1]) + 0x8000) >= 0x1
> > > > -   && (INTVAL (operands[1]) & 0x) != 0"
> > > > +   && (INTVAL (operands[1]) & 0x) != 0 && !TARGET_PREFIXED_ADDR"
> > > >[(set (match_dup 0)
> > > > (match_dup 2))
> > > > (set (match_dup 0)
> > > 
> > > Please use num_insns_constant, instead (and fix num_insns_constant_gpr
> > > so it knows about SIGNED_34BIT).
> > 
> > The previous patch V6 #1 already had the modification for
> > num_insns_constant_gpr.
> 
> It's not in trunk yet?

Sorry, the USA Thanksgiving holiday got in the way.

No, the change for num_insns_constant_gpr could not go in until the support for
PLI went in (patch V6 #1).

So for patches V6 #1-3, I believe you approved #1 (movdi) and #3 (addi3/addsi3)
after the changes but not #2 (movsi).  I haven't yet applied the specific bits
of #1 or #3 after doing the reformating patch.  Are the bits for #2 ok with the
above change to use num_insns_constant_gpr.  Or did you want to see the 3
patches once again after the reformating.

-- 
Michael Meissner, IBM
IBM, M/S 2506R, 550 King Street, Littleton, MA 01460-6245, USA
email: meiss...@linux.ibm.com, phone: +1 (978) 899-4797


Re: [PATCH v2 2/2][ARM] Improve max_cond_insns setting for Cortex cores

2019-12-03 Thread Kyrill Tkachov



On 12/3/19 1:45 PM, Wilco Dijkstra wrote:

Hi,

Part 2, split off from 
https://gcc.gnu.org/ml/gcc-patches/2019-11/msg00399.html


To enable cores to use the correct max_cond_insns setting, use the 
core-specific

tuning when a CPU/tune is selected unless -mrestrict-it is explicitly set.

On Cortex-A57 this gives 1.1% performance gain on SPECINT2006 as well as a
0.4% codesize reduction.

Bootstrapped on armhf. OK for commit?


Ok.

Thanks,

Kyrill



ChangeLog:

2019-12-03  Wilco Dijkstra  

    * config/arm/arm.c (arm_option_override_internal):
    Use max_cond_insns from CPU tuning unless -mrestrict-it is used.
--

diff --git a/gcc/config/arm/arm.c b/gcc/config/arm/arm.c
index 
daebe76352d62ad94556762b4e3bc3d0532ad411..5ed9046988996e56f754c5588e4d25d5ecdd6b03 
100644

--- a/gcc/config/arm/arm.c
+++ b/gcc/config/arm/arm.c
@@ -3041,6 +3041,11 @@ arm_option_override_internal (struct 
gcc_options *opts,

   if (!TARGET_THUMB2_P (opts->x_target_flags) || !arm_arch_notm)
 opts->x_arm_restrict_it = 0;

+  /* Use the IT size from CPU specific tuning unless -mrestrict-it is 
used.  */

+  if (!opts_set->x_arm_restrict_it
+  && (opts_set->x_arm_cpu_string || opts_set->x_arm_tune_string))
+    opts->x_arm_restrict_it = 0;
+
   /* Enable -munaligned-access by default for
  - all ARMv6 architecture-based processors when compiling for a 
32-bit ISA

  i.e. Thumb2 and ARM state only.


Re: [C++ PATCH] c++/91353 - P1331R2: Allow trivial default init in constexpr contexts.

2019-12-03 Thread Marek Polacek
On Tue, Dec 03, 2019 at 02:07:02AM -0500, Jason Merrill wrote:
> On 12/2/19 5:09 PM, Marek Polacek wrote:
> > On Mon, Dec 02, 2019 at 12:09:17PM -0500, Jason Merrill wrote:
> > > On 12/1/19 8:09 PM, Marek Polacek wrote:
> > > > +|| (skip_empty
> > > > +&& is_really_empty_class (TREE_TYPE (field),
> > > 
> > > This should probably check DECL_SIZE (field) == size_zero_node instead,
> > > since that will properly distinguish between overlapping and 
> > > non-overlapping
> > > data members of empty class type.  And please test how this works with 
> > > data
> > > members of empty class type both with and without [[no_unique_address]].
> > 
> > I don't think that's possible -- empty classes in C++ have sizeof(char), 
> > unless
> > their only member is char[0], then their DECL_SIZE is 0.
> 
> I think you're talking about the TYPE_SIZE of the class, and I'm talking
> about the DECL_SIZE of the FIELD_DECL.

I'm really looking at the DECL_SIZE:

Breakpoint 5, next_initializable_field (field=, 
skip_empty=true)
at /home/mpolacek/src/gcc/gcc/cp/decl.c:5928
5928 && is_really_empty_class (TREE_TYPE (field),
(gdb) p field
$1 = 
(gdb) p DECL_SIZE($1)
$2 = 
(gdb) pge
8

This is constexpr-init8.C:

struct S {
  constexpr S(int) {}
};

struct W {
  constexpr W(int) : s(8), p() {}

  S s;
  int *p;
};

constexpr auto a = W(42);

I think that's because layout_class_type has:

 6491   else if (might_overlap && is_empty_class (type))
 6492 layout_empty_base_or_field (rli, field, empty_base_offsets);
 6493   else
 6494 layout_nonempty_base_or_field (rli, field, NULL_TREE,
 6495empty_base_offsets);

and here might_overlap is false because 's' doesn't have [[no_unique_address]].
So we emit the FIELD_DECL with size 8.  Its type CLASSTYPE_SIZE is 0 though.
I don't know but I'd rather not mess with this...

> > I've added two testcases: constexpr-init13.C and constexpr-init14.C.  Is 
> > there
> > another scenario regarding [[no_unique_address]] that you want me to test?
> 
> I think the classes with empty base fields need to have another initialized
> field after them to have a chance of tripping
> 
>   if (idx != field)

And sure enough, that's exactly the case for

struct E {
  constexpr E() = default;
  constexpr E(int) {}
};

struct S {
  E e;
  int i;
  constexpr S() : e{} , i(11) { }
};

constexpr S s;


And that's even without [[no_unique_address]].

So it's a valid concern.  It's unclear to me how this should be resolved.

--
Marek Polacek • Red Hat, Inc. • 300 A St, Boston, MA



Re: [Patch] Add OpenACC 2.6's no_create

2019-12-03 Thread Tobias Burnus

On 12/3/19 4:16 PM, Thomas Schwinge wrote:

On 2019-11-15T20:11:29+0100, Tobias Burnus  wrote:

* Make no_create.c effective by adding 'has_firstprivate = true;' to
target.c.*
(* If one tries to access c or e in the no_create-3.{c,f90} run-time
test case, plugin-nvidia rightly complains (illegal memory access),
using the created 'b' or 'd' works as tested by the test case.

So that's specifically what you fixed above, or is that another problem?


Well, that was one way of manually testing that it really worked for 
not-mapped variables w/o creating them (i.e. verifying that "no_create" 
didn't just act as "present"). – Manual as that's not that simple to 
code in the test suite (shared memory, exact wording for dg-output etc.) 
— However, I think it can be done using '#include ' / "use 
openacc", #if !ACC_MEM_SHARED, and calling acc_is_present (passing 
either "sizeof()" or a simple "1" as "len" argument); hence, I will try 
this next version of the patch.



I'm willing to accept that patch as-is, unless Jakub has any further comments 
at this point. […]
With these items considered/addressed as you feel comfortable, this is OK for 
trunk.


Tobias

PS: I will have a closer look tomorrow at the your new test cases and 
comments.




Re: [PATCH 00/49] RFC: Add a static analysis framework to GCC

2019-12-03 Thread Jakub Jelinek
On Tue, Dec 03, 2019 at 11:52:13AM -0500, David Malcolm wrote:
> > > Our plugin "interface" as such is very broad.
> > 
> > Just to sneak in here I don't like exposing our current plugin "non-
> > API"
> > more.  In fact I'd just build the analyzer into GCC with maybe an
> > option to disable its build (in case it is very fat?).
> 
> My aim here is to provide a way for distributors to be able to disable
> its build - indeed, for now, for it to be disabled by default,
> requiring opting-in.
> 
> My reasoning here is that the analyzer is middle-end code, but isn't as
> mature as the rest of the middle-end (but I'm working on getting it
> more mature).
> 
> I want some way to label the code as a "technology preview", that
> people may want to experiment with, but to set expectations that this
> is a lot of new code and there will be bugs - but to make it available
> to make it easier for adventurous users to try it out.
> 
> I hope that makes sense.
> 
> I went down the "in-tree plugin" path by seeing the analogy with
> frontends, but yes, it would probably be simpler to just build it into
> GCC, guarded with a configure-time variable.  It's many thousand lines
> of non-trivial C++ code, and associated selftests and DejaGnu tests.

I think it is enough to document it as tech preview in the documentation,
no need to have it as an in-tree plugin.  We have lots of options that had
such a state (perhaps undeclared) over the years, I'd consider
-fvtable-verify= to be such an option, or in the past e.g.
-fipa-matrix-reorg or -fipa-struct-reorg.  And 2.5% code growth isn't that
bad.  So, as long as the option isn't enabled by default, I think we'd be
fine.

Jakub



Re: [PATCH 00/49] RFC: Add a static analysis framework to GCC

2019-12-03 Thread David Malcolm
On Wed, 2019-11-20 at 11:18 +0100, Richard Biener wrote:
> On Tue, Nov 19, 2019 at 11:02 PM David Malcolm 
> wrote:
> > > > The checker is implemented as a GCC plugin.
> > > > 
> > > > The patch kit adds support for "in-tree" plugins i.e. GCC
> > > > plugins
> > > > that
> > > > would live in the GCC source tree and be shipped as part of the
> > > > GCC
> > > > tarball,
> > > > with a new:
> > > >   --enable-plugins=[LIST OF PLUGIN NAMES]
> > > > configure option, analogous to --enable-languages (the
> > > > Makefile/configure
> > > > machinery for handling in-tree GCC plugins is adapted from how
> > > > we
> > > > support
> > > > frontends).
> > > 
> > > I like that.  Implementing this as a plugin surely must help to
> > > either
> > > document the GCC plugin interface as powerful/mature for such a
> > > task.  Or
> > > make it so, if it isn't yet.  ;-)
> > 
> > Our plugin "interface" as such is very broad.
> 
> Just to sneak in here I don't like exposing our current plugin "non-
> API"
> more.  In fact I'd just build the analyzer into GCC with maybe an
> option to disable its build (in case it is very fat?).

My aim here is to provide a way for distributors to be able to disable
its build - indeed, for now, for it to be disabled by default,
requiring opting-in.

My reasoning here is that the analyzer is middle-end code, but isn't as
mature as the rest of the middle-end (but I'm working on getting it
more mature).

I want some way to label the code as a "technology preview", that
people may want to experiment with, but to set expectations that this
is a lot of new code and there will be bugs - but to make it available
to make it easier for adventurous users to try it out.

I hope that makes sense.

I went down the "in-tree plugin" path by seeing the analogy with
frontends, but yes, it would probably be simpler to just build it into
GCC, guarded with a configure-time variable.  It's many thousand lines
of non-trivial C++ code, and associated selftests and DejaGnu tests.

Building with --enable-checking=release, and stripping the binaries and
the plugin, I see:

$ ls -al cc1 cc1plus plugin/analyzer_plugin.so 
-rwxrwxr-x. 1 david david 25921600 Dec  3 11:22 cc1
-rwxrwxr-x. 1 david david 27473568 Dec  3 11:22 cc1plus
-rwxrwxr-x. 1 david david   645256 Dec  3 11:22
plugin/analyzer_plugin.so

$ ls -alh cc1 cc1plus plugin/analyzer_plugin.so 
-rwxrwxr-x. 1 david david  25M Dec  3 11:22 cc1
-rwxrwxr-x. 1 david david  27M Dec  3 11:22 cc1plus
-rwxrwxr-x. 1 david david 631K Dec  3 11:22 plugin/analyzer_plugin.so

so the plugin is about 2.5% of the size of the existing compiler.

The analysis pass is very time-consuming when enabled via -fanalyzer. 
I'm aiming for "x2 compile-time in exchange for finding lots of bugs"
as a tradeoff that users will be happy to make (by supplying
-fanalyzer) - that's faster than comparable static analyzers I've been
playing with.

> From what I read it seems the analyzer could do with a proper
> plugin API that just exposes introspection - and I really hope
> somebody finds the time to complete (or rewrite...) the
> proposed introspection API that ideally is even cross-compiler
> (proven by implementing said API ontop of both GCC and clang/llvm).
> That way the Analyzer would work with both GCC and clang [and golang
> and rustc...].

We've gone back and forth about what a GCC plugin API should look like;
I'm not sure what the objectives are.

For example, are we hoping to offer some kind of ABI guarantee to
plugins so that we can patch GCC without plugins needing to be rebuilt?
If so, how strong is the ABI guarantee?  For example, do we directly
expose the tree code enums and the gimple code enums?

Or is it more ambitious, and hoping to be cross-compiler, in which case
are these enums themselves hidden?

This feels like opening a massive can of worms, and orthogonal to my
goal of giving GCC a static analysis framework.

> So it would be interesting if you could try to sketch the kind of API
> the Analyzer needs?  That is, merely the detail on which it inspects
> statements, the CFG and the callgraph.

FWIW the symbols consumed by the plugin can be seen at:
 https://dmalcolm.fedorapeople.org/gcc/2019-11-27/symbols-used.txt

This is the result of:
  eu-readelf -s plugin/analyzer_plugin.so |c++filt|grep UNDEF

Surveying that, the plugin:
- creates a pass
- views the callgraph and the functions (e.g. ipa_reverse_postorder)
- views CFGs and SSA representation (including statements)
- uses the diagnostic subsystem (which parts of the patch kit extend,
adding e.g. control flow paths), e.g. creating and subclassing
rich_locations, subclassing diagnostic_path and diagnostic_event
- calls into middle-end support functions like
useless_type_conversion_p
- uses GCC types such as bitmap, inchash, wideint
- creates temporary trees
- has selftests
...etc.

But there are inline uses of various functions that don't show up in
that list (e.g. the various gimple_* accessor functions - grepping the

[Patch, Fortran] PR92754 - fix an issue with resolving intrinsic functions

2019-12-03 Thread Tobias Burnus
The problem here is that one gets two symbols - one inside the block and 
one outside and they do not really agree whether one has a function or a 
variable – which later gives an ICE. As sym->module was "(intrinsic)" 
and FL_VARIABLE, one was running into an assert.


The problem is that when resolving the expression with "max", gfortran 
detects that it got an intrinsic function and resolves the 
expression/function call (in this case to a constant). However, the 
information that "max" was regarded as intrinsic procedure never ends up 
in the symbol itself, only in the expression.


My first idea was to call gfc_resolve_intrinsic, which does a lot of 
nice things like setting the attributes, warning that the type is 
ignored with intrinsic procedures (-Wsurprsing, only) etc. However, that 
gives a bunch of ICE. Even a more Spartan attr setting had similar 
issues – hence, I moved it below the simplification, in the hope it 
would work there. Well, gfc_resolve_intrinsic still gave tons of errors 
and ICEs, but what I now have works. (I think it also works w/o the 
FL_UNKNOWN condition, but, in any case, I am not sure which variant is 
better)


The settings I use with this patch seem to work and I have the hope that 
it covers all valid/invalid code correctly (fingers crossed).


Build and regtested on x86-64-gnu-linux.
OK for the trunk?

Tobias

PS: I added 'sym' as after simplifcation, expr->symtree->n.sym might no 
longer exist; getting rid of 'name' was only a cleanup as sym->name is a 
tad clearer and avoids another variable.


PPS: I ended up fixing this PR as Martin CC'ed me – after finding that 
my commit caused the ICE; well, that one was in 2014 (!), somewhat 
unrelated, and before that commit the code was rejected… With a lot of 
good (evil?) will, one can still call it a regression ;-)


	gcc/fortran/
	PR fortran/92754
	* intrinsic.c (gfc_intrinsic_func_interface): Set
	sym's flavor, intrinsic and function attribute if
	unset.

	gcc/testsuite/
	PR fortran/92754
	gfortran.dg/intrinsic_9.f90: New.

diff --git a/gcc/fortran/intrinsic.c b/gcc/fortran/intrinsic.c
index 572967f5d4e..76b53bb7117 100644
--- a/gcc/fortran/intrinsic.c
+++ b/gcc/fortran/intrinsic.c
@@ -4839,9 +4839,9 @@ gfc_check_intrinsic_standard (const gfc_intrinsic_sym* isym,
 match
 gfc_intrinsic_func_interface (gfc_expr *expr, int error_flag)
 {
+  gfc_symbol *sym;
   gfc_intrinsic_sym *isym, *specific;
   gfc_actual_arglist *actual;
-  const char *name;
   int flag;
 
   if (expr->value.function.isym != NULL)
@@ -4857,15 +4857,15 @@ gfc_intrinsic_func_interface (gfc_expr *expr, int error_flag)
   flag |= (actual->expr->ts.type != BT_INTEGER
 	   && actual->expr->ts.type != BT_CHARACTER);
 
-  name = expr->symtree->n.sym->name;
+  sym = expr->symtree->n.sym;
 
-  if (expr->symtree->n.sym->intmod_sym_id)
+  if (sym->intmod_sym_id)
 {
-  gfc_isym_id id = gfc_isym_id_by_intmod_sym (expr->symtree->n.sym);
+  gfc_isym_id id = gfc_isym_id_by_intmod_sym (sym);
   isym = specific = gfc_intrinsic_function_by_id (id);
 }
   else
-isym = specific = gfc_find_function (name);
+isym = specific = gfc_find_function (sym->name);
 
   if (isym == NULL)
 {
@@ -4879,7 +4879,7 @@ gfc_intrinsic_func_interface (gfc_expr *expr, int error_flag)
|| isym->id == GFC_ISYM_SNGL || isym->id == GFC_ISYM_DFLOAT)
   && gfc_init_expr_flag
   && !gfc_notify_std (GFC_STD_F2003, "Function %qs as initialization "
-			  "expression at %L", name, >where))
+			  "expression at %L", sym->name, >where))
 {
   if (!error_flag)
 	gfc_pop_suppress_errors ();
@@ -4898,7 +4898,7 @@ gfc_intrinsic_func_interface (gfc_expr *expr, int error_flag)
 	  && id != GFC_ISYM_TRANSFER && id != GFC_ISYM_TRIM
 	  && !gfc_notify_std (GFC_STD_F2003, "Transformational function %qs "
 			  "at %L is invalid in an initialization "
-			  "expression", name, >where))
+			  "expression", sym->name, >where))
 	{
 	  if (!error_flag)
 	gfc_pop_suppress_errors ();
@@ -4956,9 +4956,6 @@ gfc_intrinsic_func_interface (gfc_expr *expr, int error_flag)
 
 got_specific:
   expr->value.function.isym = specific;
-  if (!expr->symtree->n.sym->module)
-gfc_intrinsic_symbol (expr->symtree->n.sym);
-
   if (!error_flag)
 gfc_pop_suppress_errors ();
 
@@ -4980,6 +4977,16 @@ got_specific:
 			  "character arguments at %L", >where))
 return MATCH_ERROR;
 
+  if (sym->attr.flavor == FL_UNKNOWN)
+{
+  sym->attr.function = 1;
+  sym->attr.intrinsic = 1;
+  sym->attr.flavor = FL_PROCEDURE;
+}
+
+  if (!sym->module)
+gfc_intrinsic_symbol (sym);
+
   return MATCH_YES;
 }
 
diff --git a/gcc/testsuite/gfortran.dg/intrinsic_9.f90 b/gcc/testsuite/gfortran.dg/intrinsic_9.f90
new file mode 100644
index 000..43959ad85df
--- /dev/null
+++ b/gcc/testsuite/gfortran.dg/intrinsic_9.f90
@@ -0,0 +1,15 @@
+! { dg-do run }
+!
+! PR fortran/92754
+!
+! Contributed by G. Steinmetz
+!
+
+program p
+   integer :: max
+   

[amdgcn] Add missing vcondu patterns

2019-12-03 Thread Andrew Stubbs

I've just committed the attached.

The gcn vcondu patterns had accidentally omitted the proper variants for 
floating point modes.  I believe this is intentional for the comparison 
mode, but not for the data mode.


This omission caused testcase gcc.dg/vect/pr65947-10.c to ICE.  That 
testcase now compiles, although not quite correctly, but that's another 
issue (pr92772).


Andrew
Add missing amdgcn vcondu patterns

2019-12-03  Andrew Stubbs  

	gcc/
	* config/gcn/gcn-valu.md: Change "vcondu" patterns to use VEC_1REG_MODE
	for the data mode.

diff --git a/gcc/config/gcn/gcn-valu.md b/gcc/config/gcn/gcn-valu.md
index 66b822962ae..f3262e22a02 100644
--- a/gcc/config/gcn/gcn-valu.md
+++ b/gcc/config/gcn/gcn-valu.md
@@ -2596,10 +2596,10 @@
 DONE;
   })
 
-(define_expand "vcondu"
-  [(match_operand:VEC_1REG_INT_MODE 0 "register_operand")
-   (match_operand:VEC_1REG_INT_MODE 1 "gcn_vop3_operand")
-   (match_operand:VEC_1REG_INT_MODE 2 "gcn_alu_operand")
+(define_expand "vcondu"
+  [(match_operand:VEC_1REG_MODE 0 "register_operand")
+   (match_operand:VEC_1REG_MODE 1 "gcn_vop3_operand")
+   (match_operand:VEC_1REG_MODE 2 "gcn_alu_operand")
(match_operator 3 "comparison_operator"
  [(match_operand:VEC_1REG_INT_ALT 4 "gcn_alu_operand")
   (match_operand:VEC_1REG_INT_ALT 5 "gcn_vop3_operand")])]
@@ -2608,15 +2608,15 @@
 rtx tmp = gen_reg_rtx (DImode);
 emit_insn (gen_vec_cmpdi
 	   (tmp, operands[3], operands[4], operands[5]));
-emit_insn (gen_vcond_mask_di
+emit_insn (gen_vcond_mask_di
 	   (operands[0], operands[1], operands[2], tmp));
 DONE;
   })
 
-(define_expand "vcondu_exec"
-  [(match_operand:VEC_1REG_INT_MODE 0 "register_operand")
-   (match_operand:VEC_1REG_INT_MODE 1 "gcn_vop3_operand")
-   (match_operand:VEC_1REG_INT_MODE 2 "gcn_alu_operand")
+(define_expand "vcondu_exec"
+  [(match_operand:VEC_1REG_MODE 0 "register_operand")
+   (match_operand:VEC_1REG_MODE 1 "gcn_vop3_operand")
+   (match_operand:VEC_1REG_MODE 2 "gcn_alu_operand")
(match_operator 3 "comparison_operator"
  [(match_operand:VEC_1REG_INT_ALT 4 "gcn_alu_operand")
   (match_operand:VEC_1REG_INT_ALT 5 "gcn_vop3_operand")])
@@ -2626,7 +2626,7 @@
 rtx tmp = gen_reg_rtx (DImode);
 emit_insn (gen_vec_cmpdi_exec
 	   (tmp, operands[3], operands[4], operands[5], operands[6]));
-emit_insn (gen_vcond_mask_di
+emit_insn (gen_vcond_mask_di
 	   (operands[0], operands[1], operands[2], tmp));
 DONE;
   })


Re: C++ PATCH for c++/91363 - P0960R3: Parenthesized initialization of aggregates

2019-12-03 Thread Jason Merrill

On 12/3/19 9:38 AM, Marek Polacek wrote:

On Tue, Dec 03, 2019 at 02:01:24AM -0500, Jason Merrill wrote:

On 12/2/19 7:31 PM, Marek Polacek wrote:

@@ -1967,8 +1978,23 @@ expand_default_init (tree binfo, tree true_exp, tree 
exp, tree init, int flags,
 tree ctor_name = (true_exp == exp
? complete_ctor_identifier : base_ctor_identifier);
+  /* Given class A,
+
+  A a(1, 2);
+
+can mean a call to a constructor A::A(int, int), if present.  If not,
+but A is an aggregate, we will try aggregate initialization.  */
 rval = build_special_member_call (exp, ctor_name, , binfo, flags,
complain);
+  if (BRACE_ENCLOSED_INITIALIZER_P (rval))
+   {
+ gcc_assert (cxx_dialect >= cxx2a);
+ rval = digest_init (type, rval, tf_warning_or_error);
+ rval = build2 (INIT_EXPR, type, exp, rval);
+ /* So that we do finish_expr_stmt below.  Don't return here, we
+need to release PARMS.  */
+ TREE_SIDE_EFFECTS (rval) = 1;
+   }


Can we still get a CONSTRUCTOR here after the change to
build_new_method_call_1?


Oop, not anymore.  Dropped the whole cp/init.c hunk.

Bootstrapped/regtested on x86_64-linux, ok for trunk?


OK, thanks.


2019-12-03  Marek Polacek  

PR c++/91363 - P0960R3: Parenthesized initialization of aggregates.
* c-cppbuiltin.c (c_cpp_builtins): Predefine
__cpp_aggregate_paren_init=201902 for -std=c++2a.

* call.c (build_new_method_call_1): Handle parenthesized initialization
of aggregates by building up a CONSTRUCTOR.
(extend_ref_init_temps): Do nothing for CONSTRUCTOR_IS_PAREN_INIT.
* cp-tree.h (CONSTRUCTOR_IS_PAREN_INIT, LOOKUP_AGGREGATE_PAREN_INIT):
Define.
* decl.c (grok_reference_init): Handle aggregate initialization from
a parenthesized list of values.
(reshape_init): Do nothing for CONSTRUCTOR_IS_PAREN_INIT.
(check_initializer): Handle initialization of an array from a
parenthesized list of values.  Use NULL_TREE instead of NULL.
* tree.c (build_cplus_new): Handle BRACE_ENCLOSED_INITIALIZER_P.
* typeck2.c (digest_init_r): Set LOOKUP_AGGREGATE_PAREN_INIT if it
receives a CONSTRUCTOR with CONSTRUCTOR_IS_PAREN_INIT set.  Allow
narrowing when LOOKUP_AGGREGATE_PAREN_INIT.
(massage_init_elt): Don't lose LOOKUP_AGGREGATE_PAREN_INIT when passing
flags to digest_init_r.

* g++.dg/cpp0x/constexpr-99.C: Only expect an error in C++17 and
lesser.
* g++.dg/cpp0x/explicit7.C: Likewise.
* g++.dg/cpp0x/initlist12.C: Adjust dg-error.
* g++.dg/cpp0x/pr31437.C: Likewise.
* g++.dg/cpp2a/feat-cxx2a.C: Add __cpp_aggregate_paren_init test.
* g++.dg/cpp2a/paren-init1.C: New test.
* g++.dg/cpp2a/paren-init10.C: New test.
* g++.dg/cpp2a/paren-init11.C: New test.
* g++.dg/cpp2a/paren-init12.C: New test.
* g++.dg/cpp2a/paren-init13.C: New test.
* g++.dg/cpp2a/paren-init14.C: New test.
* g++.dg/cpp2a/paren-init15.C: New test.
* g++.dg/cpp2a/paren-init16.C: New test.
* g++.dg/cpp2a/paren-init17.C: New test.
* g++.dg/cpp2a/paren-init18.C: New test.
* g++.dg/cpp2a/paren-init19.C: New test.
* g++.dg/cpp2a/paren-init2.C: New test.
* g++.dg/cpp2a/paren-init3.C: New test.
* g++.dg/cpp2a/paren-init4.C: New test.
* g++.dg/cpp2a/paren-init5.C: New test.
* g++.dg/cpp2a/paren-init6.C: New test.
* g++.dg/cpp2a/paren-init7.C: New test.
* g++.dg/cpp2a/paren-init8.C: New test.
* g++.dg/cpp2a/paren-init9.C: New test.
* g++.dg/ext/desig10.C: Adjust dg-error.
* g++.dg/template/crash107.C: Likewise.
* g++.dg/template/crash95.C: Likewise.
* g++.old-deja/g++.jason/crash3.C: Likewise.
* g++.old-deja/g++.law/ctors11.C: Likewise.
* g++.old-deja/g++.law/ctors9.C: Likewise.
* g++.old-deja/g++.mike/net22.C: Likewise.
* g++.old-deja/g++.niklas/t128.C: Likewise.

diff --git gcc/c-family/c-cppbuiltin.c gcc/c-family/c-cppbuiltin.c
index 6491545bc3b..c7f4659456a 100644
--- gcc/c-family/c-cppbuiltin.c
+++ gcc/c-family/c-cppbuiltin.c
@@ -1006,6 +1006,7 @@ c_cpp_builtins (cpp_reader *pfile)
  cpp_define (pfile, "__cpp_impl_destroying_delete=201806L");
  cpp_define (pfile, "__cpp_constexpr_dynamic_alloc=201907L");
  cpp_define (pfile, "__cpp_impl_three_way_comparison=201907L");
+ cpp_define (pfile, "__cpp_aggregate_paren_init=201902L");
}
if (flag_concepts)
  {
diff --git gcc/cp/call.c gcc/cp/call.c
index 062cff4c735..ea0e8b7e7d7 100644
--- gcc/cp/call.c
+++ gcc/cp/call.c
@@ -10124,6 +10124,38 @@ build_new_method_call_1 (tree instance, tree fns, 
vec **args,
  
if (!any_viable_p)

  {
+  /* [dcl.init], 17.6.2.2:
+
+

Re: "gcn" vs. "amdgcn" etc. (was: [PATCH 4/7 libgomp,amdgcn] GCN libgomp port)

2019-12-03 Thread Julian Brown
On Tue, 3 Dec 2019 15:06:48 +0100
Thomas Schwinge  wrote:

> We probably can't/shouldn't change 'amdgcn' in the target triplet now,
> but as far as I'm concerned, it's not too late to change
> 'gcc/config/gcn' etc., but I guess that won't happen: too much
> effort.  (And then, I don't feel too stronly about it, but wanted to
> make my point anyway.)
> 
> And, I acknowledge that by some of the logic I presented above, indeed
> 'nvptx' should've been called 'ptx' to denote purely the PTX ISA,
> outside of its Nvidia implementation.
> 
> Naming is hard.  ;-)

Another point is that not too many years ago, it'd have been ATI not
AMD. Other CPU architectures or core implementations have shifted
company allegiance since then, also.

Julian


Re: [Patch][Fortran] OpenACC – permit common blocks in some clauses

2019-12-03 Thread Tobias Burnus

Hi Thomas,

Quick version: The attached patch seems to work, kind of,  but fails at 
run time with:
libgomp: Trying to map into device [0x407218..0x40721c) object when 
[0x407210..0x40721c) is already mapped
This marks the common-block decl but not the common-block vars as 
'device resident' (alias "omp declare target").


The attached with '#if 0'  set to '1' does not work as it gives an ICE 
in lto1. – If one only marks the common-block variables, it fails as the 
ME check complains "variable 'block' has been referenced in offloaded 
code but hasn't been marked to be included in the offloaded code" –


I think the first version is fine, but it seems as if the ME needs to 
use pcreate and not create for those. I think that's also the reason for 
the odd is-program check mentioned at the very bottom.


Tobias

PS: Hmm, I really wonder why it seemed to have passed before. Looking at 
the code, it cannot have passed — more below. That goes wrong since 
r272453 for PR85221 (well, it can't before). I don't quickly see whether 
it also affects OpenMP or other clauses.


I think for a proper fix it would be very useful to know some more 
details about the intention of 'declare device_resident' (existing only 
on the device, existing on both host and device etc.). Cf. previous email.


In terms of this issue, if one does:  "integer :: a, b, c; common /name/ 
a,b,c; !$acc declare device_resident(a)", should this make all of the 
common-block variables as device resident or not? Internally, one gets 
for declare-5.f90 the following, i.e. /another/ is the common name and 
g, h and i are common-name variables:


  static integer(kind=4) g [value-expr: another.g];
  static integer(kind=4) h[3] [value-expr: another.h];
  static integer(kind=4) i[3] [value-expr: another.i];

For the test case, the issue is that 'gfc_get_symbol_decl' only called 
after it's tree representation (sym->backend_decl) has already been 
created; this happens for common blocks. – The attached patch fixes 
this, marking the common block decl and all its variables as declare 
device_resident.


One could think of handling other attributes (which ones?). For 
EQUIVALENCE in commons, the attributes are collected using 
accumulate_equivalence_attributes – and for normal variables, it is 
handled in trans-decl.c's add_attributes_to_decl


 * * *

Additionally, and unrelated to the test case, the following code looks 
very suspicious (from finish_oacc_declare in fortran/trans-decl.c):


  module_oacc_clauses = NULL;
  gfc_traverse_ns (ns, find_module_oacc_declare_clauses);
  if (module_oacc_clauses && sym->attr.flavor == FL_PROGRAM)

First, it very much looks like memory leak – one creates a linked list,
but always dumps it w/o using or freeing it if one is currently not
processing the main program. Additionally, it assumes that the main program
has a full view of module-declared 'declare device_resident' variables, but
it is trivial to construct programs where the main program does not see this
property. Most trivial example is:
  subroutine foo()
use module_w_device_decl
  end subroutine
independent whether that function exists as such or as module procedure or
(at some place) is a procedure contained in another procedure. A general
assumption is also that the whole program is compiled with -fopenacc
and that the main program is written in Fortran and not, e.g., in C or C++.

Index: gcc/fortran/trans-common.c
===
--- gcc/fortran/trans-common.c	(revision 278936)
+++ gcc/fortran/trans-common.c	(working copy)
@@ -706,6 +706,15 @@
 	}
 }
 
+  bool is_device_resident = false;
+  for (s = head; s; s = next_s = s->next)
+ is_device_resident |= s->sym->attr.oacc_declare_device_resident;
+
+  if (is_device_resident)
+DECL_ATTRIBUTES (decl)
+  = tree_cons (get_identifier ("omp declare target"),
+		   NULL_TREE, DECL_ATTRIBUTES (decl));
+
   /* Build component reference for each variable.  */
   for (s = head; s; s = next_s)
 {
@@ -750,6 +759,12 @@
 	  GFC_DECL_ASSIGN_ADDR (var_decl) = GFC_DECL_ASSIGN_ADDR (s->field);
 	}
 
+#if 0
+  if (is_device_resident)
+	DECL_ATTRIBUTES (var_decl)
+	  = tree_cons (get_identifier ("omp declare target"),
+		   NULL_TREE, DECL_ATTRIBUTES (var_decl));
+#endif
   s->sym->backend_decl = var_decl;
 
   next_s = s->next;


Patch ping (was Re: [C++ PATCH] __builtin_source_location ())

2019-12-03 Thread Jakub Jelinek
Hi!

On Fri, Nov 15, 2019 at 01:28:17PM +0100, Jakub Jelinek wrote:
> On Thu, Nov 14, 2019 at 08:34:26PM +0100, Jakub Jelinek wrote:
> > The following WIP patch implements __builtin_source_location (),
> > which returns const void pointer to a std::source_location::__impl
> > struct that is required to contain __file, __function, __line and __column
> > fields, the first two with const char * type, the latter some integral type.
> 
> Here is hopefully final version, with the hashing implemented,
> __file renamed to __file_name and __function to __function_name to match
> how the standard names the methods and with testsuite coverage.
> 
> Bootstrapped/regtested on x86_64-linux and i686-linux, ok for trunk?

I'd like to ping this patch.
I know David and Jonathan had comments on the patch and can try to
incorporate them, but more importantly I'd like to ask whether we want to
add this at all or wait till GCC 11 and add something else.
Yet another option would be make it clear that the builtin is a temporary
thing until we implement something different and then would be removed,
say by reporting error if the builtin is used anywhere but in default
argument of std::source_location::current or something similar; I mean if
the ABI is sound, but we might have __builtin_constexpr_force_static or
whatever else or some new expression form one day and replace the builtin
uses with that.

> 2019-11-15  Jakub Jelinek  
> 
>   * cp-tree.h (enum cp_tree_index): Add CPTI_SOURCE_LOCATION_IMPL.
>   (source_location_impl): Define.
>   (enum cp_built_in_function): Add CP_BUILT_IN_SOURCE_LOCATION.
>   (fold_builtin_source_location): Declare.
>   * cp-gimplify.c: Include output.h, file-prefix-map.h and cgraph.h.
>   (cp_gimplify_expr, cp_fold): Handle CP_BUILT_IN_SOURCE_LOCATION.
>   Formatting fix.
>   (get_source_location_impl): New function.
>   (struct source_location_table_entry,
>   struct source_location_table_entry_hash): New types.
>   (source_location_table, source_location_id): New variables.
>   (fold_builtin_source_location): New function.
>   * constexpr.c (cxx_eval_builtin_function_call): Handle
>   CP_BUILT_IN_SOURCE_LOCATION.
>   * tree.c (builtin_valid_in_constant_expr_p): Likewise.  Formatting
>   fix.
>   * decl.c (cxx_init_decl_processing): Register
>   __builtin_source_location.
>   * name-lookup.c (get_std_name_hint): Add source_location entry.
> 
>   * g++.dg/cpp2a/srcloc1.C: New test.
>   * g++.dg/cpp2a/srcloc2.C: New test.
>   * g++.dg/cpp2a/srcloc3.C: New test.
>   * g++.dg/cpp2a/srcloc4.C: New test.
>   * g++.dg/cpp2a/srcloc5.C: New test.
>   * g++.dg/cpp2a/srcloc6.C: New test.
>   * g++.dg/cpp2a/srcloc7.C: New test.
>   * g++.dg/cpp2a/srcloc8.C: New test.
>   * g++.dg/cpp2a/srcloc9.C: New test.
>   * g++.dg/cpp2a/srcloc10.C: New test.
>   * g++.dg/cpp2a/srcloc11.C: New test.
>   * g++.dg/cpp2a/srcloc12.C: New test.
>   * g++.dg/cpp2a/srcloc13.C: New test.
>   * g++.dg/cpp2a/srcloc14.C: New test.

Jakub



Re: [Patch] Add OpenACC 2.6's no_create

2019-12-03 Thread Thomas Schwinge
Hi!

Jakub, please note question below.

On 2019-11-15T20:11:29+0100, Tobias Burnus  wrote:
> updated version. Changes:
> * Incorporate Thomas's changes
> * Add no_create clause to newly added 'acc serial'
> * Renamed (G)OMP_MAP_NO_ALLOC to (G)OMP_MAP_IF_PRESENT as proposed
> * Make no_create.c effective by adding 'has_firstprivate = true;' to 
> target.c.*

Thanks.

> (* If one tries to access c or e in the no_create-3.{c,f90} run-time 
> test case, plugin-nvidia rightly complains (illegal memory access), 
> using the created 'b' or 'd' works as tested by the test case.

So that's specifically what you fixed above, or is that another problem?

> This 
> feature seems to be also broken on the OG9 branch.)

Not surprising, given the insufficient testsuite coverage...  ;'-|

I note that you've not addressed the other TODO items that I had put into
the libgomp memory mapping code (see below for reference).  I still think
that this should be understood better, that the code as currently
proposed/discussed is "too complex".  I have an idea how to do this
differently (easier?), but I still have to sketch that out, and not sure
when I'll get to that.  I'm willing to accept that patch as-is, unless
Jakub has any further comments at this point.


Another thing: I've added just another little bit of testsuite coverage,
and another thing broke.  See "TODO" in attached incremental patch.
(Please rename the files appropriately.)  Please have a look.

This feels like something going wrong in gimplification, when we "Look in
outer OpenACC contexts, to see if there's a data attribute for this
variable" ('gcc/gimplify.c:omp_notice_variable'), but that's just a wild
guess.  If you agree/understand that there is a problem, and add some
XFAILed 'gimple' tree-scanning test cases (maybe even just to the libgomp
test cases that I've added), I'm fine to accept that XFAILed, to be
resolved later.

Maybe even that's not specific to the 'no_create' clause, just doesn't
cause any harm (given the existing testsuite...) for other OpenACC
constructs/clauses?


The incremental Fortran test case changes have bene done in a rush; not
sure if they make much sense, or should see some further work applied to
them.


With these items considered/addressed as you feel comfortable, this is OK
for trunk.  To record the review effort, please include "Reviewed-by:
Thomas Schwinge " in the commit log, see
.


> PS: Remaining bits of the OG9 patch, which are not included are the 
> following. I think those are all attach/detach features: a test case for 
> "no_create(s.y…)" (i.e. the struct component-ref; 
> libgomp/testsuite/libgomp.oacc-c-c++-common/nocreate-{3,4}.c) and some 
> 'do_detach = false' in libgomp/target.c. Cf. openacc-gcc-9 /…-8 branch 
> patch is commit 8e74c2ec2b90819c995444370e742864a685209f of Dec 20, 
> 2018. It has been posted as 
> https://gcc.gnu.org/ml/gcc-patches/2018-12/msg01418.html


The libgomp memory mapping code:

> Add OpenACC 2.6 `no_create' clause support
>
> The clause makes any device code use the local memory address for each
> of the variables specified unless the given variable is already present
> on the current device.

> --- a/include/gomp-constants.h
> +++ b/include/gomp-constants.h
> @@ -75,6 +75,8 @@ enum gomp_map_kind
>  GOMP_MAP_DEVICE_RESIDENT =   (GOMP_MAP_FLAG_SPECIAL_1 | 1),
>  /* OpenACC link.  */
>  GOMP_MAP_LINK =  (GOMP_MAP_FLAG_SPECIAL_1 | 2),
> +/* Use device data if present, fall back to host address otherwise.  */
> +GOMP_MAP_IF_PRESENT =(GOMP_MAP_FLAG_SPECIAL_1 | 3),
>  /* Do not map, copy bits for firstprivate instead.  */
>  GOMP_MAP_FIRSTPRIVATE =  (GOMP_MAP_FLAG_SPECIAL | 0),
>  /* Similarly, but store the value in the pointer rather than

> --- a/libgomp/target.c
> +++ b/libgomp/target.c
> @@ -667,6 +667,13 @@ gomp_map_vars_internal (struct gomp_device_descr 
> *devicep,
> has_firstprivate = true;
> continue;
>   }
> +  else if ((kind & typemask) == GOMP_MAP_IF_PRESENT)
> + {
> +   tgt->list[i].key = NULL;
> +   tgt->list[i].offset = 0;
> +   has_firstprivate = true;
> +   continue;
> + }
>cur_node.host_start = (uintptr_t) hostaddrs[i];
>if (!GOMP_MAP_POINTER_P (kind & typemask))
>   cur_node.host_end = cur_node.host_start + sizes[i];
> @@ -892,6 +899,49 @@ gomp_map_vars_internal (struct gomp_device_descr 
> *devicep,
>   cur_node.tgt_offset = n->tgt->tgt_start + n->tgt_offset
> + cur_node.host_start - n->host_start;
>   continue;
> +   case GOMP_MAP_IF_PRESENT:
> + {
> +   cur_node.host_start = (uintptr_t) hostaddrs[i];
> +   cur_node.host_end = cur_node.host_start + sizes[i];
> +   splay_tree_key n = splay_tree_lookup (mem_map, _node);
> +   if (n != NULL)
> +  

Re: [PATCH][AArch64] Add support for fused compare and branch

2019-12-03 Thread Wilco Dijkstra
Hi Richard,

> But what uses CMP_BRANCH after the patch?  It looked like you renamed
> all existing uses and didn't add any new ones.

My next patch will be adding uses of it now I've done some benchmarking
to decide when to turn it on.

>> +  && reg_referenced_p (SET_DEST (curr_set), PATTERN (curr)))

> Looks like this should be prev_set.  But this condition will trigger
> for any prev_insn that is only being kept around for its effect on the
> flags, not just CMP/CMN/TST/BICS.  If it's only supposed to be those
> four insns then I think we should test for them explicitly.

Yes that should have been prev_set indeed. I've tightened up the
condition to check for integer comparisons (which matches CMP,
CMN, TST and BICS). Updated patch below.

Cheers,
Wilco


[PATCH v2][AArch64] Add support for fused compare and branch

Add support for fused compare with branch.  Rename the existing
AARCH64_FUSE_CMP_BRANCH to ALU_BRANCH, and AARCH64_FUSE_ALU_BRANCH
to ALU_CBZ to make it clear what is being fused.

AArch64 bootstrap OK, OK to commit?

ChangeLog:

2019-12-03  Wilco Dijkstra  

* config/aarch64/aarch64.c
(thunderxt88_tunings): Use AARCH64_FUSE_ALU_BRANCH.
(thunderx_tunings): Likewise.
(tsv110_tunings): Use AARCH64_FUSE_ALU_BRANCH and AARCH64_FUSE_ALU_CBZ.
(thunderx2t99_tunings): Likewise.
(aarch_macro_fusion_pair_p): Add support for AARCH64_FUSE_CMP_BRANCH.
* config/aarch64/aarch64-fusion-pairs.def: Add ALU_CBZ fusion.

--

diff --git a/gcc/config/aarch64/aarch64-fusion-pairs.def 
b/gcc/config/aarch64/aarch64-fusion-pairs.def
index 
ce4bb92d5c9d1f187c026b1a714e485a2b9f1a74..051009b42b2db4e79a8b302fd3f1b65dedfdba8f
 100644
--- a/gcc/config/aarch64/aarch64-fusion-pairs.def
+++ b/gcc/config/aarch64/aarch64-fusion-pairs.def
@@ -35,5 +35,6 @@ AARCH64_FUSION_PAIR ("adrp+ldr", ADRP_LDR)
 AARCH64_FUSION_PAIR ("cmp+branch", CMP_BRANCH)
 AARCH64_FUSION_PAIR ("aes+aesmc", AES_AESMC)
 AARCH64_FUSION_PAIR ("alu+branch", ALU_BRANCH)
+AARCH64_FUSION_PAIR ("alu+cbz", ALU_CBZ)
 
 #undef AARCH64_FUSION_PAIR
diff --git a/gcc/config/aarch64/aarch64.c b/gcc/config/aarch64/aarch64.c
index 
d0cbe13273f78ebac6f3facc983ca0dec1a43966..942413e0a377603be98a8c547db262eb75baed48
 100644
--- a/gcc/config/aarch64/aarch64.c
+++ b/gcc/config/aarch64/aarch64.c
@@ -915,7 +915,7 @@ static const struct tune_params thunderxt88_tunings =
   SVE_NOT_IMPLEMENTED, /* sve_width  */
   6, /* memmov_cost  */
   2, /* issue_rate  */
-  AARCH64_FUSE_CMP_BRANCH, /* fusible_ops  */
+  AARCH64_FUSE_ALU_BRANCH, /* fusible_ops  */
   "8", /* function_align.  */
   "8", /* jump_align.  */
   "8", /* loop_align.  */
@@ -941,7 +941,7 @@ static const struct tune_params thunderx_tunings =
   SVE_NOT_IMPLEMENTED, /* sve_width  */
   6, /* memmov_cost  */
   2, /* issue_rate  */
-  AARCH64_FUSE_CMP_BRANCH, /* fusible_ops  */
+  AARCH64_FUSE_ALU_BRANCH, /* fusible_ops  */
   "8", /* function_align.  */
   "8", /* jump_align.  */
   "8", /* loop_align.  */
@@ -968,8 +968,8 @@ static const struct tune_params tsv110_tunings =
   SVE_NOT_IMPLEMENTED, /* sve_width  */
   4,/* memmov_cost  */
   4,/* issue_rate  */
-  (AARCH64_FUSE_AES_AESMC | AARCH64_FUSE_CMP_BRANCH
-   | AARCH64_FUSE_ALU_BRANCH), /* fusible_ops  */
+  (AARCH64_FUSE_AES_AESMC | AARCH64_FUSE_ALU_BRANCH
+   | AARCH64_FUSE_ALU_CBZ), /* fusible_ops  */
   "16", /* function_align.  */
   "4",  /* jump_align.  */
   "8",  /* loop_align.  */
@@ -1103,8 +1103,8 @@ static const struct tune_params thunderx2t99_tunings =
   SVE_NOT_IMPLEMENTED, /* sve_width  */
   4, /* memmov_cost.  */
   4, /* issue_rate.  */
-  (AARCH64_FUSE_CMP_BRANCH | AARCH64_FUSE_AES_AESMC
-   | AARCH64_FUSE_ALU_BRANCH), /* fusible_ops  */
+  (AARCH64_FUSE_ALU_BRANCH | AARCH64_FUSE_AES_AESMC
+   | AARCH64_FUSE_ALU_CBZ), /* fusible_ops  */
   "16",/* function_align.  */
   "8", /* jump_align.  */
   "16",/* loop_align.  */
@@ -20372,7 +20372,16 @@ aarch_macro_fusion_pair_p (rtx_insn *prev, rtx_insn 
*curr)
 }
 }
 
+  /* Fuse compare (CMP/CMN/TST/BICS) and conditional branch.  */
   if (aarch64_fusion_enabled_p (AARCH64_FUSE_CMP_BRANCH)
+  && prev_set && curr_set && any_condjump_p (curr)
+  && GET_CODE (SET_SRC (prev_set)) == COMPARE
+  && SCALAR_INT_MODE_P (GET_MODE (XEXP (SET_SRC (prev_set), 0)))
+  && reg_referenced_p (SET_DEST (prev_set), PATTERN (curr)))
+return true;
+
+  /* Fuse flag-setting ALU instructions and conditional branch.  */
+  if (aarch64_fusion_enabled_p (AARCH64_FUSE_ALU_BRANCH)
   && any_condjump_p (curr))
 {
   unsigned int condreg1, condreg2;
@@ -20396,9 +20405,10 @@ aarch_macro_fusion_pair_p (rtx_insn *prev, rtx_insn 
*curr)
}
 }
 
+  /* Fuse ALU instructions and CBZ/CBNZ.  */
   if (prev_set
   && curr_set
-  && aarch64_fusion_enabled_p (AARCH64_FUSE_ALU_BRANCH)
+  && aarch64_fusion_enabled_p (AARCH64_FUSE_ALU_CBZ)
   && any_condjump_p (curr))
 {
   /* We're 

Re: Which OpenACC 'acc_device_t' to use for AMD GPU offloading

2019-12-03 Thread Tobias Burnus
Regarding the  acc_device_, I want to observe that there is no 
fundamental reason that one cannot have multiple names which resolve to 
the same constant.


Thus, one could add acc_device_radeon while keeping acc_device_gcn. 
Whether this makes sense or causes even more confusion is another question.


Searching the internet, acc_device_gcn seems to GCC specific while 
acc_device_radeon does pop up, but not that often the documents are from 
2014/2015 – and mentionAMD Radeon 7970, AMD Radeon 7990.


[On one slide of a super-computing center, they listed what PGI back 
then did define. For completeness, that was: acc_device_none = 0, 
acc_device_default = 1, acc_device_host = 2, acc_device_not_host = 3, 
acc_device_nvidia = 4,acc_device_radeon = 5, acc_device_xeonphi = 6, 
acc_device_pgi_opencl = 7, acc_device_nvidia_opencl = 8, 
acc_device_opencl = 9.]


Tobias

On 12/3/19 3:42 PM, Thomas Schwinge wrote:

And, as I'd mentioned, it's in the OpenACC specification, but it's a
"recommendation", so if you're not comfortable with 'acc_device_radeon',
then it's not too late to change the specification.  Think of it from a
user's perspective, as I'd suggested.

I had a look, and 'acc_device_radeon' (some kind of "brand" name, not
specific to GPUs even?) has been present in the OpenACC specification for
so long that from my archives, I can't tell who introduced it, and what
the rationale was, given that 'acc_device_nvidia' (hardware vendor name)
probably already did exist.

As one additional data point: there once also existed a
'acc_device_xeonphi', which got removed two years ago "since Intel no
longer produces this product".


Re: [PATCH 2/4] Validate acc_device_t uses

2019-12-03 Thread Harwath, Frederik
Hi Thomas,

On 03.12.19 13:14, Thomas Schwinge wrote:
> You once had this patch separate, but then merged into the upstream
> submission of 'acc_get_property'; let's please keep this separate.
> 
> With changes as indicated below, please commit this to trunk [...]

Ok, I have committed the patch as revision 278936. You can find the committed 
version in the attachment. Thank you for the review!

> Generally, does usage of these functions obsolete some existing usage of
> 'acc_dev_num_out_of_range'?  (OK to address later.)

I think it does. I am going to verify this.

>> @@ -168,7 +184,7 @@ resolve_device (acc_device_t d, bool fail_is_error)
>>break;
>>  
>>  default:
>> -  if (d > _ACC_device_hwm)
>> +  if (!acc_known_device_type (d))
>>  {
>>if (fail_is_error)
>>  goto unsupported_device;
> 
> Note that this had 'd > _ACC_device_hwm', not '>=' as it now does, that
> is, previously didn't reject 'd == _ACC_device_hwm' but now does -- but I
> suppose this was an (minor) bug that existed before, so OK to change as
> you did?
Right, I do not see any reasons why it should accept ACC_device_hwm and
the change did not cause any regressions.

Best regards,
Frederik



r278937 | frederik | 2019-12-03 15:38:54 +0100 (Di, 03 Dez 2019) | 25 lines

Validate acc_device_t uses

Check that function arguments of type acc_device_t
are valid enumeration values in all publicly visible
functions from oacc-init.c.

2019-12-03  Frederik Harwath  

libgomp/
* oacc-init.c (acc_known_device_type): Add function.
(unknown_device_type_error): Add function.
(name_of_acc_device_t): Change to call unknown_device_type_error
on unknown type.
(resolve_device): Use acc_known_device_type.
(acc_init): Fail if acc_device_t argument is not valid.
(acc_shutdown): Likewise.
(acc_get_num_devices): Likewise.
(acc_set_device_type): Likewise.
(acc_get_device_num): Likewise.
(acc_set_device_num): Likewise.
(acc_on_device): Add comment that argument validity is not checked.

Reviewed-by: Thomas Schwinge 



Index: libgomp/oacc-init.c
===
--- libgomp/oacc-init.c	(revision 278936)
+++ libgomp/oacc-init.c	(working copy)
@@ -82,6 +82,18 @@
   gomp_mutex_unlock (_device_lock);
 }
 
+static bool
+known_device_type_p (acc_device_t d)
+{
+  return d >= 0 && d < _ACC_device_hwm;
+}
+
+static void
+unknown_device_type_error (acc_device_t invalid_type)
+{
+  gomp_fatal ("unknown device type %u", invalid_type);
+}
+
 /* OpenACC names some things a little differently.  */
 
 static const char *
@@ -103,8 +115,9 @@
 case acc_device_host: return "host";
 case acc_device_not_host: return "not_host";
 case acc_device_nvidia: return "nvidia";
-default: gomp_fatal ("unknown device type %u", (unsigned) type);
+default: unknown_device_type_error (type);
 }
+  __builtin_unreachable ();
 }
 
 /* ACC_DEVICE_LOCK must be held before calling this function.  If FAIL_IS_ERROR
@@ -123,7 +136,7 @@
 	if (goacc_device_type)
 	  {
 	/* Lookup the named device.  */
-	while (++d != _ACC_device_hwm)
+	while (known_device_type_p (++d))
 	  if (dispatchers[d]
 		  && !strcasecmp (goacc_device_type,
   get_openacc_name (dispatchers[d]->name))
@@ -147,7 +160,7 @@
 
 case acc_device_not_host:
   /* Find the first available device after acc_device_not_host.  */
-  while (++d != _ACC_device_hwm)
+  while (known_device_type_p (++d))
 	if (dispatchers[d] && dispatchers[d]->get_num_devices_func () > 0)
 	  goto found;
   if (d_arg == acc_device_default)
@@ -168,7 +181,7 @@
   break;
 
 default:
-  if (d > _ACC_device_hwm)
+  if (!known_device_type_p (d))
 	{
 	  if (fail_is_error)
 	goto unsupported_device;
@@ -505,6 +518,9 @@
 void
 acc_init (acc_device_t d)
 {
+  if (!known_device_type_p (d))
+unknown_device_type_error (d);
+
   gomp_init_targets_once ();
 
   gomp_mutex_lock (_device_lock);
@@ -519,6 +535,9 @@
 void
 acc_shutdown (acc_device_t d)
 {
+  if (!known_device_type_p (d))
+unknown_device_type_error (d);
+
   gomp_init_targets_once ();
 
   gomp_mutex_lock (_device_lock);
@@ -533,6 +552,9 @@
 int
 acc_get_num_devices (acc_device_t d)
 {
+  if (!known_device_type_p (d))
+unknown_device_type_error (d);
+
   int n = 0;
   struct gomp_device_descr *acc_dev;
 
@@ -564,6 +586,9 @@
 void
 acc_set_device_type (acc_device_t d)
 {
+  if (!known_device_type_p (d))
+unknown_device_type_error (d);
+
   struct gomp_device_descr *base_dev, *acc_dev;
   struct goacc_thread *thr = goacc_thread ();
 
@@ -647,12 +672,12 @@
 int
 acc_get_device_num (acc_device_t d)
 {
+  if (!known_device_type_p (d))
+unknown_device_type_error (d);
+
   const struct 

Re: Don't install unnecessary ARRAY_REF element sizes

2019-12-03 Thread Richard Biener
On Mon, Dec 2, 2019 at 5:33 PM Richard Sandiford
 wrote:
>
> Even EXACT_DIV_EXPR doesn't distribute across addition for wrapping
> types, so in general we can't fold EXACT_DIV_EXPRs of POLY_INT_CSTs
> at compile time.  This was causing an ICE when trying to gimplify the
> element size field in an ARRAY_REF.
>
> If the result of that EXACT_DIV_EXPR is an invariant, we don't bother
> recording it in the ARRAY_REF and simply read the element size from the
> element type.  This avoids the overhead of doing:
>
>   /* ??? tree_ssa_useless_type_conversion will eliminate casts to
>  sizetype from another type of the same width and signedness.  */
>   if (TREE_TYPE (aligned_size) != sizetype)
> aligned_size = fold_convert_loc (loc, sizetype, aligned_size);
>   return size_binop_loc (loc, MULT_EXPR, aligned_size,
>  size_int (TYPE_ALIGN_UNIT (elmt_type)));
>
> each time array_ref_element_size is called.
>
> So rather than read array_ref_element_size, do some arithmetic on it,
> and only then check whether the result is an invariant, we might as
> well check whether the element size is an invariant to start with.
> We're then directly testing whether array_ref_element_size gives
> a reusable value.
>
> For consistency, the patch makes the same change for the offset field
> in a COMPONENT_REF, although I don't think that can trigger yet.
>
> Tested on aarch64-linux-gnu and x86_64-linux-gnu.  OK to install?

OK.

Thanks,
Richard.

> Richard
>
>
> 2019-12-02  Richard Sandiford  
>
> gcc/
> * gimplify.c (gimplify_compound_lval): Don't gimplify and install
> an array element size if array_element_size is already an invariant.
> Similarly don't gimplify and install a field offset if
> component_ref_field_offset is already an invariant.
>
> gcc/testsuite/
> * gcc.target/aarch64/sve/acle/general-c/struct_1.c: New test.
>
> Index: gcc/gimplify.c
> ===
> --- gcc/gimplify.c  2019-11-22 09:57:52.178273436 +
> +++ gcc/gimplify.c  2019-12-02 16:32:08.063499557 +
> @@ -2987,17 +2987,18 @@ gimplify_compound_lval (tree *expr_p, gi
>
>   if (TREE_OPERAND (t, 3) == NULL_TREE)
> {
> - tree elmt_type = TREE_TYPE (TREE_TYPE (TREE_OPERAND (t, 0)));
> - tree elmt_size = unshare_expr (array_ref_element_size (t));
> - tree factor = size_int (TYPE_ALIGN_UNIT (elmt_type));
> -
> - /* Divide the element size by the alignment of the element
> -type (above).  */
> - elmt_size
> -   = size_binop_loc (loc, EXACT_DIV_EXPR, elmt_size, factor);
> -
> + tree elmt_size = array_ref_element_size (t);
>   if (!is_gimple_min_invariant (elmt_size))
> {
> + elmt_size = unshare_expr (elmt_size);
> + tree elmt_type = TREE_TYPE (TREE_TYPE (TREE_OPERAND (t, 
> 0)));
> + tree factor = size_int (TYPE_ALIGN_UNIT (elmt_type));
> +
> + /* Divide the element size by the alignment of the element
> +type (above).  */
> + elmt_size = size_binop_loc (loc, EXACT_DIV_EXPR,
> + elmt_size, factor);
> +
>   TREE_OPERAND (t, 3) = elmt_size;
>   tret = gimplify_expr (_OPERAND (t, 3), pre_p,
> post_p, is_gimple_reg,
> @@ -3017,16 +3018,18 @@ gimplify_compound_lval (tree *expr_p, gi
>   /* Set the field offset into T and gimplify it.  */
>   if (TREE_OPERAND (t, 2) == NULL_TREE)
> {
> - tree offset = unshare_expr (component_ref_field_offset (t));
> - tree field = TREE_OPERAND (t, 1);
> - tree factor
> -   = size_int (DECL_OFFSET_ALIGN (field) / BITS_PER_UNIT);
> -
> - /* Divide the offset by its alignment.  */
> - offset = size_binop_loc (loc, EXACT_DIV_EXPR, offset, factor);
> -
> + tree offset = component_ref_field_offset (t);
>   if (!is_gimple_min_invariant (offset))
> {
> + offset = unshare_expr (offset);
> + tree field = TREE_OPERAND (t, 1);
> + tree factor
> +   = size_int (DECL_OFFSET_ALIGN (field) / BITS_PER_UNIT);
> +
> + /* Divide the offset by its alignment.  */
> + offset = size_binop_loc (loc, EXACT_DIV_EXPR,
> +  offset, factor);
> +
>   TREE_OPERAND (t, 2) = offset;
>   tret = gimplify_expr (_OPERAND (t, 2), pre_p,
> post_p, is_gimple_reg,
> Index: gcc/testsuite/gcc.target/aarch64/sve/acle/general-c/struct_1.c
> ===
> --- /dev/null  

Re: Mark constant-sized objects as addressable if they have poly-int accesses

2019-12-03 Thread Richard Biener
On Mon, Dec 2, 2019 at 5:30 PM Richard Sandiford
 wrote:
>
> [finally getting back to this]
>
> Richard Biener  writes:
> > On Fri, Nov 8, 2019 at 10:40 AM Richard Sandiford
> >  wrote:
> >>
> >> If SVE code is written for a specific vector length, it might load from
> >> or store to fixed-sized objects.  This needs to work even without
> >> -msve-vector-bits=N (which should never be needed for correctness).
> >>
> >> There's no way of handling a direct poly-int sized reference to a
> >> fixed-size register; it would have to go via memory.  And in that
> >> case it's more efficient to mark the fixed-size object as
> >> addressable from the outset, like we do for array references
> >> with non-constant indices.
> >>
> >> Tested on aarch64-linux-gnu and x86_64-linux-gnu.  OK to install?
> >
> > Hmm, shouldn't you somehow avoid walking subtrees again like
> > the array-ref cases?  Do "intermediate" types really matter here?
> > Thus if you have BIT_FIELDREF  > fixed-size-decl>,
> > select first element > do you really need it addressable?
> >
> > It seems to me you want to check it only on the actual reference type.
>
> Yeah, I guess it was overly general.  I don't think it can happen
> for anything other than the MEM_REF/TARGET_MEM_REF itself.
>
> How about this version?  Tested as before.

OK.

Thanks,
Richard.

> Richard
>
>
> 2019-12-02  Richard Sandiford  
>
> gcc/
> * cfgexpand.c (discover_nonconstant_array_refs_r): If an access
> with POLY_INT_CST size is made to a fixed-size object, force the
> object to live in memory.
>
> gcc/testsuite/
> * gcc.target/aarch64/sve/acle/general/deref_1.c: New test.
>
> Index: gcc/cfgexpand.c
> ===
> --- gcc/cfgexpand.c 2019-11-13 08:42:44.901373255 +
> +++ gcc/cfgexpand.c 2019-12-02 16:27:45.157295080 +
> @@ -6133,6 +6133,21 @@ discover_nonconstant_array_refs_r (tree
>
>*walk_subtrees = 0;
>  }
> +  /* References of size POLY_INT_CST to a fixed-size object must go
> + through memory.  It's more efficient to force that here than
> + to create temporary slots on the fly.  */
> +  else if ((TREE_CODE (t) == MEM_REF || TREE_CODE (t) == TARGET_MEM_REF)
> +  && TYPE_SIZE (TREE_TYPE (t))
> +  && POLY_INT_CST_P (TYPE_SIZE (TREE_TYPE (t
> +{
> +  tree base = get_base_address (t);
> +  if (base
> + && DECL_P (base)
> + && DECL_MODE (base) != BLKmode
> + && GET_MODE_SIZE (DECL_MODE (base)).is_constant ())
> +   TREE_ADDRESSABLE (base) = 1;
> +  *walk_subtrees = 0;
> +}
>
>return NULL_TREE;
>  }
> Index: gcc/testsuite/gcc.target/aarch64/sve/acle/general/deref_1.c
> ===
> --- /dev/null   2019-09-17 11:41:18.176664108 +0100
> +++ gcc/testsuite/gcc.target/aarch64/sve/acle/general/deref_1.c 2019-12-02 
> 16:27:45.165295026 +
> @@ -0,0 +1,25 @@
> +/* { dg-options "-O2" } */
> +
> +#include 
> +
> +uint64_t
> +f1 (int32_t *x, int32_t *y)
> +{
> +  union { uint64_t x; char c[8]; } u;
> +  svbool_t pg = svptrue_b32 ();
> +  *(svbool_t *)[0] = svcmpeq (pg, svld1 (pg, x), 0);
> +  *(svbool_t *)[4] = svcmpeq (pg, svld1 (pg, y), 1);
> +  return u.x;
> +}
> +
> +typedef unsigned int v4si __attribute__((vector_size(16)));
> +
> +/* The aliasing is somewhat dubious here, but it must compile.  */
> +
> +v4si
> +f2 (void)
> +{
> +  v4si res;
> +  *(svuint32_t *)  = svindex_u32 (0, 1);
> +  return res;
> +}


[PATCH] Revert to uniform vector CTOR canonicalization

2019-12-03 Thread Richard Biener


The PR shows I added the bail-out prematurely.

Bootstrapped and tested on x86_64-unknown-linux-gnu, applied.

Richard.

2019-12-03  Richard Biener  

PR tree-optimization/92758
* tree-ssa-forwprop.c (simplify_vector_constructor): Restore
operation on uniform vectors.

Index: gcc/tree-ssa-forwprop.c
===
--- gcc/tree-ssa-forwprop.c (revision 278930)
+++ gcc/tree-ssa-forwprop.c (working copy)
@@ -2043,8 +2043,7 @@ simplify_vector_constructor (gimple_stmt
   gcc_checking_assert (TREE_CODE (op) == CONSTRUCTOR
   && TREE_CODE (type) == VECTOR_TYPE);
 
-  if (!TYPE_VECTOR_SUBPARTS (type).is_constant ()
-  || uniform_vector_p (op))
+  if (!TYPE_VECTOR_SUBPARTS (type).is_constant ())
 return false;
   elem_type = TREE_TYPE (type);
   elem_size = TREE_INT_CST_LOW (TYPE_SIZE (elem_type));


Re: Which OpenACC 'acc_device_t' to use for AMD GPU offloading

2019-12-03 Thread Thomas Schwinge
Hi!

On 2019-12-03T14:20:13+, Julian Brown  wrote:
> On Tue, 3 Dec 2019 10:32:57 +0100
> Thomas Schwinge  wrote:
>> On 2019-12-02T14:50:42+, Julian Brown 
>> wrote:
>> > On Mon, 2 Dec 2019 15:43:29 +0100
>> > Thomas Schwinge  wrote:
>> >  
>> >> > --- a/libgomp/openacc.h
>> >> > +++ b/libgomp/openacc.h
>> >> > @@ -55,6 +55,7 @@ typedef enum acc_device_t {
>> >> >/* acc_device_host_nonshm = 3 removed.  */
>> >> >acc_device_not_host = 4,
>> >> >acc_device_nvidia = 5,
>> >> > +  acc_device_gcn = 8,
>> >> 
>> >> There is no 'acc_device_gcn' in OpenACC.  
>> 
>> My point is, the OpenACC specification defines 'acc_device_t', and
>> we're now adding/using a non-standard value, 'acc_device_gcn'.
>> Depending on how you read the specification, it may be allowed for an
>> implementation to provide additional/different values, but for good
>> reason, OpenACC 3.0, A. "Recommendations for Implementors", still
>> "gives recommendations for standard names [...] to use for
>> implementations for specific targets and target platforms, to promote
>> portability across such implementations".
>> 
>> >> Per OpenACC 3.0, A.1.2. "AMD
>> >> GPU Targets", for example, there is 'acc_device_radeon' (and "the
>> >> case-insensitive name 'radeon' for the environment variable
>> >> 'ACC_DEVICE_TYPE'").  If that is not appropriate to use (I have not
>> >> read up how exactly AMD's "GCN" and "radeon" relate to each other),
>> >> we should get that clarified in the OpenACC specification.  
>> >
>> > FWIW, I'm pretty sure there are Radeon devices that do not use the
>> > GCN ISA.  
>> 
>> But does an OpenACC user really care?  Are users likely to state: "I
>> have a GCN card", or rather: "I have an AMD GPU card", or even just:
>> "I have a card with a big AMD logo on it"?
>> 
>> Admittedly, users are probably unlikely to state: "I have a Radeon
>> card", heh?  ;-) (But it's still the standard name defined in the
>> OpenACC specification.)
>> 
>> > OTOH, there are also Nvidia devices that are not PTX-compatible.  
>> 
>> (But not any recent (as in a few years old) ones, capable for GPGPU
>> computing, are there?)
>> 
>> But again: "I have a card with a big Nvidia logo on it" is probably
>> what users care about, not whether that is internally using PTX or
>> anything else, which then is the implementation's job to sort out,
>> when 'acc_device_nvidia' is requested by the user.
>> 
>> > No strong opinion on the acc_device_foo name from me, though (maybe
>> > "acc_device_amdgcn"?).  
>> 
>> Once/if we have established that the standard 'acc_device_radeon' is
>> not suitable for us here, only then we should think of a new name, in
>> my opinion.  You wouldn't just add a 'copy_mem' function to glibc
>> given that 'memcpy' already exists?  ;-)
>
> Oops, I clearly didn't read that email carefully! Since this is a
> user-visible, not internal-only thing and there is a recommended name
> given in the spec, we should use that name. Still, it's a pity the
> official naming is such a jumble.

Yeah.

And, as I'd mentioned, it's in the OpenACC specification, but it's a
"recommendation", so if you're not comfortable with 'acc_device_radeon',
then it's not too late to change the specification.  Think of it from a
user's perspective, as I'd suggested.

I had a look, and 'acc_device_radeon' (some kind of "brand" name, not
specific to GPUs even?) has been present in the OpenACC specification for
so long that from my archives, I can't tell who introduced it, and what
the rationale was, given that 'acc_device_nvidia' (hardware vendor name)
probably already did exist.

As one additional data point: there once also existed a
'acc_device_xeonphi', which got removed two years ago "since Intel no
longer produces this product".


Grüße
 Thomas


signature.asc
Description: PGP signature


Re: Do not update SSA in lto-stremaer-in

2019-12-03 Thread Richard Biener
On Tue, 3 Dec 2019, Jan Hubicka wrote:

> Hi,
> input_functions ends with building virtual SSA which is unnecesary
> excercise when function is only loaded to be inlined, compared by
> ipa-icf or its profile merged.
> 
> This patch moves the SSA update later before we start working on the
> function body.
> 
> lto-bootstrapped/regtested x86_64-linux, OK?

OK I guess, but please put the update_ssa call _after_
gimple_register_cfg_hooks () and obstrack register, right before
IPA transform execution in ::expand?



> Honza
> 
>   * cgraph.c: Inlcude tree-into-ssa.h
>   (cgraph_node::get_body): Update SSA.
>   * cgraphunit.c (cgraph_node::expand): Likewise.
>   * lto-streamer-in.c (input_function): Do not update SSA.
> Index: cgraph.c
> ===
> --- cgraph.c  (revision 278904)
> +++ cgraph.c  (working copy)
> @@ -62,6 +62,7 @@ along with GCC; see the file COPYING3.
>  #include "stringpool.h"
>  #include "attribs.h"
>  #include "selftest.h"
> +#include "tree-into-ssa.h"
>  
>  /* FIXME: Only for PROP_loops, but cgraph shouldn't have to know about this. 
>  */
>  #include "tree-pass.h"
> @@ -3599,6 +3605,8 @@ cgraph_node::get_body (void)
>set_dump_file (NULL);
>  
>push_cfun (DECL_STRUCT_FUNCTION (decl));
> +
> +  update_ssa (TODO_update_ssa_only_virtuals);
>execute_all_ipa_transforms (true);
>cgraph_edge::rebuild_edges ();
>free_dominance_info (CDI_DOMINATORS);
> Index: cgraphunit.c
> ===
> --- cgraphunit.c  (revision 278904)
> +++ cgraphunit.c  (working copy)
> @@ -2268,6 +2268,7 @@ cgraph_node::expand (void)
>  
>gcc_assert (DECL_STRUCT_FUNCTION (decl));
>push_cfun (DECL_STRUCT_FUNCTION (decl));
> +  update_ssa (TODO_update_ssa_only_virtuals);
>init_function_start (decl);
>  
>gimple_register_cfg_hooks ();
> Index: lto-streamer-in.c
> ===
> --- lto-streamer-in.c (revision 278904)
> +++ lto-streamer-in.c (working copy)
> @@ -1223,7 +1223,6 @@ input_function (tree fn_decl, class data
>fixup_call_stmt_edges (node, stmts);
>execute_all_ipa_stmt_fixups (node, stmts);
>  
> -  update_ssa (TODO_update_ssa_only_virtuals);
>free_dominance_info (CDI_DOMINATORS);
>free_dominance_info (CDI_POST_DOMINATORS);
>free (stmts);
> 

-- 
Richard Biener 
SUSE Software Solutions Germany GmbH, Maxfeldstrasse 5, 90409 Nuernberg,
Germany; GF: Felix Imendörffer; HRB 36809 (AG Nuernberg)

Re: C++ PATCH for c++/91363 - P0960R3: Parenthesized initialization of aggregates

2019-12-03 Thread Marek Polacek
On Tue, Dec 03, 2019 at 02:01:24AM -0500, Jason Merrill wrote:
> On 12/2/19 7:31 PM, Marek Polacek wrote:
> > @@ -1967,8 +1978,23 @@ expand_default_init (tree binfo, tree true_exp, tree 
> > exp, tree init, int flags,
> > tree ctor_name = (true_exp == exp
> > ? complete_ctor_identifier : base_ctor_identifier);
> > +  /* Given class A,
> > +
> > +  A a(1, 2);
> > +
> > +can mean a call to a constructor A::A(int, int), if present.  If not,
> > +but A is an aggregate, we will try aggregate initialization.  */
> > rval = build_special_member_call (exp, ctor_name, , binfo, 
> > flags,
> > complain);
> > +  if (BRACE_ENCLOSED_INITIALIZER_P (rval))
> > +   {
> > + gcc_assert (cxx_dialect >= cxx2a);
> > + rval = digest_init (type, rval, tf_warning_or_error);
> > + rval = build2 (INIT_EXPR, type, exp, rval);
> > + /* So that we do finish_expr_stmt below.  Don't return here, we
> > +need to release PARMS.  */
> > + TREE_SIDE_EFFECTS (rval) = 1;
> > +   }
> 
> Can we still get a CONSTRUCTOR here after the change to
> build_new_method_call_1?

Oop, not anymore.  Dropped the whole cp/init.c hunk.

Bootstrapped/regtested on x86_64-linux, ok for trunk?

2019-12-03  Marek Polacek  

PR c++/91363 - P0960R3: Parenthesized initialization of aggregates.
* c-cppbuiltin.c (c_cpp_builtins): Predefine
__cpp_aggregate_paren_init=201902 for -std=c++2a.

* call.c (build_new_method_call_1): Handle parenthesized initialization
of aggregates by building up a CONSTRUCTOR.
(extend_ref_init_temps): Do nothing for CONSTRUCTOR_IS_PAREN_INIT.
* cp-tree.h (CONSTRUCTOR_IS_PAREN_INIT, LOOKUP_AGGREGATE_PAREN_INIT):
Define.
* decl.c (grok_reference_init): Handle aggregate initialization from
a parenthesized list of values.
(reshape_init): Do nothing for CONSTRUCTOR_IS_PAREN_INIT.
(check_initializer): Handle initialization of an array from a
parenthesized list of values.  Use NULL_TREE instead of NULL.
* tree.c (build_cplus_new): Handle BRACE_ENCLOSED_INITIALIZER_P.
* typeck2.c (digest_init_r): Set LOOKUP_AGGREGATE_PAREN_INIT if it
receives a CONSTRUCTOR with CONSTRUCTOR_IS_PAREN_INIT set.  Allow
narrowing when LOOKUP_AGGREGATE_PAREN_INIT.
(massage_init_elt): Don't lose LOOKUP_AGGREGATE_PAREN_INIT when passing
flags to digest_init_r.

* g++.dg/cpp0x/constexpr-99.C: Only expect an error in C++17 and
lesser.
* g++.dg/cpp0x/explicit7.C: Likewise.
* g++.dg/cpp0x/initlist12.C: Adjust dg-error.
* g++.dg/cpp0x/pr31437.C: Likewise.
* g++.dg/cpp2a/feat-cxx2a.C: Add __cpp_aggregate_paren_init test.
* g++.dg/cpp2a/paren-init1.C: New test.
* g++.dg/cpp2a/paren-init10.C: New test.
* g++.dg/cpp2a/paren-init11.C: New test.
* g++.dg/cpp2a/paren-init12.C: New test.
* g++.dg/cpp2a/paren-init13.C: New test.
* g++.dg/cpp2a/paren-init14.C: New test.
* g++.dg/cpp2a/paren-init15.C: New test.
* g++.dg/cpp2a/paren-init16.C: New test.
* g++.dg/cpp2a/paren-init17.C: New test.
* g++.dg/cpp2a/paren-init18.C: New test.
* g++.dg/cpp2a/paren-init19.C: New test.
* g++.dg/cpp2a/paren-init2.C: New test.
* g++.dg/cpp2a/paren-init3.C: New test.
* g++.dg/cpp2a/paren-init4.C: New test.
* g++.dg/cpp2a/paren-init5.C: New test.
* g++.dg/cpp2a/paren-init6.C: New test.
* g++.dg/cpp2a/paren-init7.C: New test.
* g++.dg/cpp2a/paren-init8.C: New test.
* g++.dg/cpp2a/paren-init9.C: New test.
* g++.dg/ext/desig10.C: Adjust dg-error.
* g++.dg/template/crash107.C: Likewise.
* g++.dg/template/crash95.C: Likewise.
* g++.old-deja/g++.jason/crash3.C: Likewise.
* g++.old-deja/g++.law/ctors11.C: Likewise.
* g++.old-deja/g++.law/ctors9.C: Likewise.
* g++.old-deja/g++.mike/net22.C: Likewise.
* g++.old-deja/g++.niklas/t128.C: Likewise.

diff --git gcc/c-family/c-cppbuiltin.c gcc/c-family/c-cppbuiltin.c
index 6491545bc3b..c7f4659456a 100644
--- gcc/c-family/c-cppbuiltin.c
+++ gcc/c-family/c-cppbuiltin.c
@@ -1006,6 +1006,7 @@ c_cpp_builtins (cpp_reader *pfile)
  cpp_define (pfile, "__cpp_impl_destroying_delete=201806L");
  cpp_define (pfile, "__cpp_constexpr_dynamic_alloc=201907L");
  cpp_define (pfile, "__cpp_impl_three_way_comparison=201907L");
+ cpp_define (pfile, "__cpp_aggregate_paren_init=201902L");
}
   if (flag_concepts)
 {
diff --git gcc/cp/call.c gcc/cp/call.c
index 062cff4c735..ea0e8b7e7d7 100644
--- gcc/cp/call.c
+++ gcc/cp/call.c
@@ -10124,6 +10124,38 @@ build_new_method_call_1 (tree instance, tree fns, 
vec **args,
 
   if (!any_viable_p)
 {
+  /* [dcl.init], 17.6.2.2:
+
+Otherwise, if no constructor 

Re: Which OpenACC 'acc_device_t' to use for AMD GPU offloading (was: [PATCH 4/7 libgomp,amdgcn] GCN libgomp port)

2019-12-03 Thread Julian Brown
On Tue, 3 Dec 2019 10:32:57 +0100
Thomas Schwinge  wrote:

> Hi!
> 
> On 2019-12-02T14:50:42+, Julian Brown 
> wrote:
> > On Mon, 2 Dec 2019 15:43:29 +0100
> > Thomas Schwinge  wrote:
> >  
> >> > --- a/libgomp/openacc.h
> >> > +++ b/libgomp/openacc.h
> >> > @@ -55,6 +55,7 @@ typedef enum acc_device_t {
> >> >/* acc_device_host_nonshm = 3 removed.  */
> >> >acc_device_not_host = 4,
> >> >acc_device_nvidia = 5,
> >> > +  acc_device_gcn = 8,
> >> 
> >> There is no 'acc_device_gcn' in OpenACC.  
> 
> My point is, the OpenACC specification defines 'acc_device_t', and
> we're now adding/using a non-standard value, 'acc_device_gcn'.
> Depending on how you read the specification, it may be allowed for an
> implementation to provide additional/different values, but for good
> reason, OpenACC 3.0, A. "Recommendations for Implementors", still
> "gives recommendations for standard names [...] to use for
> implementations for specific targets and target platforms, to promote
> portability across such implementations".
> 
> >> Per OpenACC 3.0, A.1.2. "AMD
> >> GPU Targets", for example, there is 'acc_device_radeon' (and "the
> >> case-insensitive name 'radeon' for the environment variable
> >> 'ACC_DEVICE_TYPE'").  If that is not appropriate to use (I have not
> >> read up how exactly AMD's "GCN" and "radeon" relate to each other),
> >> we should get that clarified in the OpenACC specification.  
> >
> > FWIW, I'm pretty sure there are Radeon devices that do not use the
> > GCN ISA.  
> 
> But does an OpenACC user really care?  Are users likely to state: "I
> have a GCN card", or rather: "I have an AMD GPU card", or even just:
> "I have a card with a big AMD logo on it"?
> 
> Admittedly, users are probably unlikely to state: "I have a Radeon
> card", heh?  ;-) (But it's still the standard name defined in the
> OpenACC specification.)
> 
> > OTOH, there are also Nvidia devices that are not PTX-compatible.  
> 
> (But not any recent (as in a few years old) ones, capable for GPGPU
> computing, are there?)
> 
> But again: "I have a card with a big Nvidia logo on it" is probably
> what users care about, not whether that is internally using PTX or
> anything else, which then is the implementation's job to sort out,
> when 'acc_device_nvidia' is requested by the user.
> 
> > No strong opinion on the acc_device_foo name from me, though (maybe
> > "acc_device_amdgcn"?).  
> 
> Once/if we have established that the standard 'acc_device_radeon' is
> not suitable for us here, only then we should think of a new name, in
> my opinion.  You wouldn't just add a 'copy_mem' function to glibc
> given that 'memcpy' already exists?  ;-)

Oops, I clearly didn't read that email carefully! Since this is a
user-visible, not internal-only thing and there is a recommended name
given in the spec, we should use that name. Still, it's a pity the
official naming is such a jumble.

Julian


Do not update SSA in lto-stremaer-in

2019-12-03 Thread Jan Hubicka
Hi,
input_functions ends with building virtual SSA which is unnecesary
excercise when function is only loaded to be inlined, compared by
ipa-icf or its profile merged.

This patch moves the SSA update later before we start working on the
function body.

lto-bootstrapped/regtested x86_64-linux, OK?

Honza

* cgraph.c: Inlcude tree-into-ssa.h
(cgraph_node::get_body): Update SSA.
* cgraphunit.c (cgraph_node::expand): Likewise.
* lto-streamer-in.c (input_function): Do not update SSA.
Index: cgraph.c
===
--- cgraph.c(revision 278904)
+++ cgraph.c(working copy)
@@ -62,6 +62,7 @@ along with GCC; see the file COPYING3.
 #include "stringpool.h"
 #include "attribs.h"
 #include "selftest.h"
+#include "tree-into-ssa.h"
 
 /* FIXME: Only for PROP_loops, but cgraph shouldn't have to know about this.  
*/
 #include "tree-pass.h"
@@ -3599,6 +3605,8 @@ cgraph_node::get_body (void)
   set_dump_file (NULL);
 
   push_cfun (DECL_STRUCT_FUNCTION (decl));
+
+  update_ssa (TODO_update_ssa_only_virtuals);
   execute_all_ipa_transforms (true);
   cgraph_edge::rebuild_edges ();
   free_dominance_info (CDI_DOMINATORS);
Index: cgraphunit.c
===
--- cgraphunit.c(revision 278904)
+++ cgraphunit.c(working copy)
@@ -2268,6 +2268,7 @@ cgraph_node::expand (void)
 
   gcc_assert (DECL_STRUCT_FUNCTION (decl));
   push_cfun (DECL_STRUCT_FUNCTION (decl));
+  update_ssa (TODO_update_ssa_only_virtuals);
   init_function_start (decl);
 
   gimple_register_cfg_hooks ();
Index: lto-streamer-in.c
===
--- lto-streamer-in.c   (revision 278904)
+++ lto-streamer-in.c   (working copy)
@@ -1223,7 +1223,6 @@ input_function (tree fn_decl, class data
   fixup_call_stmt_edges (node, stmts);
   execute_all_ipa_stmt_fixups (node, stmts);
 
-  update_ssa (TODO_update_ssa_only_virtuals);
   free_dominance_info (CDI_DOMINATORS);
   free_dominance_info (CDI_POST_DOMINATORS);
   free (stmts);


"gcn" vs. "amdgcn" etc. (was: [PATCH 4/7 libgomp,amdgcn] GCN libgomp port)

2019-12-03 Thread Thomas Schwinge
Hi!

On 2019-12-03T13:13:33+, Andrew Stubbs  wrote:
> On 02/12/2019 14:43, Thomas Schwinge wrote:
>> On 2019-11-12T13:29:13+, Andrew Stubbs  wrote:
>>> --- a/include/gomp-constants.h
>>> +++ b/include/gomp-constants.h
>>> @@ -174,6 +174,7 @@ enum gomp_map_kind
>>>   #define GOMP_DEVICE_NVIDIA_PTX5
>>>   #define GOMP_DEVICE_INTEL_MIC 6
>>>   #define GOMP_DEVICE_HSA   7
>>> +#define GOMP_DEVICE_GCN8
>> 
>> Unless we are to expect non-AMD GCN implementations (hardware), I'd favor
>> if all these "GCN" things were called "AMD GCN", like done for "Nvidia
>> PTX", or "Intel MIC", or why shouldn't we?
>
> Why do we do that for libgomp

I don't remember how that happened.

Now, I guess I understand 'GOMP_DEVICE_NVIDIA_PTX' to mean:
loading/executing PTX code by means of its "native" Nvidia mechanism
(CUDA Driver library).  There could then also be a
'GOMP_DEVICE_NOUVEAU_PTX' (or 'GOMP_DEVICE_MESA_PTX'?), for example.

> but not for GCC in general? I mean, it's 
> not "Intel i386" or "Renesas SuperH" (or should that be "Hitachi"?). The 
> only GCC port that has the company name is the "nv" in "nvptx" (ignoring 
> those where the company and architecture are the same).
>
> We use "amdgcn" for the target triplet, but it's just "gcn" almost 
> everywhere else,

You also used "amdgcn" in the tag in the email subject.  ;-P

> including the config directories.

Yeah, I guess that's what I find confusing here, that for most of all GCC
back ends (not all, however...) there seems to be consistency between the
"CPU" part of the target triplet and the corresponding GCC back end
name/directory:

$ cd gcc/config/ && for f in *; do test -d "$f"/ && echo "$f $( 
../../config.sub "$f" )"; done
aarch64 aarch64-unknown-none
alpha alpha-unknown-none
arc arc-unknown-none
arm arm-unknown-none
avr avr-unknown-none
bfin bfin-unknown-none
bpf bpf-unknown-none
c6x tic6x-unknown-coff
cr16 cr16-unknown-elf
cris cris-axis-none
csky csky-unknown-none
epiphany epiphany-unknown-none
fr30 fr30-unknown-none
frv frv-unknown-none
ft32 ft32-unknown-none
Invalid configuration `gcn': machine `gcn-unknown' not recognized
gcn 
h8300 h8300-unknown-none
i386 i386-pc-none
ia64 ia64-unknown-none
iq2000 iq2000-unknown-none
lm32 lm32-unknown-none
m32c m32c-unknown-none
m32r m32r-unknown-none
m68k m68k-unknown-none
mcore mcore-unknown-none
microblaze microblaze-xilinx-none
mips mips-unknown-elf
mmix mmix-knuth-mmixware
mn10300 mn10300-unknown-none
moxie moxie-unknown-none
msp430 msp430-unknown-none
nds32 nds32-unknown-none
nios2 nios2-unknown-none
nvptx nvptx-unknown-none
or1k or1k-unknown-none
Invalid configuration `pa': machine `pa-unknown' not recognized
pa 
pdp11 pdp11-dec-none
pru pru-unknown-elf
riscv riscv-unknown-none
rl78 rl78-unknown-none
rs6000 rs6000-ibm-aix
rx rx-unknown-none
s390 s390-ibm-aix
sh sh-unknown-none
sparc sparc-sun-sunos4.1.1
Invalid configuration `stormy16': machine `stormy16-unknown' not recognized
stormy16 
tilegx tilegx-unknown-linux-gnu
tilepro tilepro-unknown-linux-gnu
v850 v850-unknown-none
vax vax-dec-ultrix4.2
visium visium-unknown-none
vms vax-dec-vms
xtensa xtensa-unknown-none

We probably can't/shouldn't change 'amdgcn' in the target triplet now,
but as far as I'm concerned, it's not too late to change 'gcc/config/gcn'
etc., but I guess that won't happen: too much effort.  (And then, I don't
feel too stronly about it, but wanted to make my point anyway.)

And, I acknowledge that by some of the logic I presented above, indeed
'nvptx' should've been called 'ptx' to denote purely the PTX ISA, outside
of its Nvidia implementation.

Naming is hard.  ;-)


Grüße
 Thomas


signature.asc
Description: PGP signature


[PATCH v2 2/2][ARM] Improve max_cond_insns setting for Cortex cores

2019-12-03 Thread Wilco Dijkstra
Hi,

Part 2, split off from https://gcc.gnu.org/ml/gcc-patches/2019-11/msg00399.html

To enable cores to use the correct max_cond_insns setting, use the core-specific
tuning when a CPU/tune is selected unless -mrestrict-it is explicitly set.

On Cortex-A57 this gives 1.1% performance gain on SPECINT2006 as well as a
0.4% codesize reduction.

Bootstrapped on armhf. OK for commit?

ChangeLog:

2019-12-03  Wilco Dijkstra  

* config/arm/arm.c (arm_option_override_internal):
Use max_cond_insns from CPU tuning unless -mrestrict-it is used.
--

diff --git a/gcc/config/arm/arm.c b/gcc/config/arm/arm.c
index 
daebe76352d62ad94556762b4e3bc3d0532ad411..5ed9046988996e56f754c5588e4d25d5ecdd6b03
 100644
--- a/gcc/config/arm/arm.c
+++ b/gcc/config/arm/arm.c
@@ -3041,6 +3041,11 @@ arm_option_override_internal (struct gcc_options *opts,
   if (!TARGET_THUMB2_P (opts->x_target_flags) || !arm_arch_notm)
 opts->x_arm_restrict_it = 0;
 
+  /* Use the IT size from CPU specific tuning unless -mrestrict-it is used.  */
+  if (!opts_set->x_arm_restrict_it
+  && (opts_set->x_arm_cpu_string || opts_set->x_arm_tune_string))
+opts->x_arm_restrict_it = 0;
+
   /* Enable -munaligned-access by default for
  - all ARMv6 architecture-based processors when compiling for a 32-bit ISA
  i.e. Thumb2 and ARM state only.


Re: Add a new combine pass

2019-12-03 Thread Oleg Endo
On Mon, 2019-11-25 at 16:47 -0600, Segher Boessenkool wrote:
> 
> > > - sh (that's sh4-linux):
> > > 
> > > /home/segher/src/kernel/net/ipv4/af_inet.c: In function 
> > > 'snmp_get_cpu_field':
> > > /home/segher/src/kernel/net/ipv4/af_inet.c:1638:1: error: unable to find 
> > > a register to spill in class 'R0_REGS'
> > >  1638 | }
> > >   | ^
> > > /home/segher/src/kernel/net/ipv4/af_inet.c:1638:1: error: this is the 
> > > insn:
> > > (insn 18 17 19 2 (set (reg:SI 0 r0)
> > > (mem:SI (plus:SI (reg:SI 4 r4 [178])
> > > (reg:SI 6 r6 [171])) [17 *_3+0 S4 A32])) 
> > > "/home/segher/src/kernel/net/ipv4/af_inet.c":1638:1 188 {movsi_i}
> > >  (expr_list:REG_DEAD (reg:SI 4 r4 [178])
> > > (expr_list:REG_DEAD (reg:SI 6 r6 [171])
> > > (nil
> > > /home/segher/src/kernel/net/ipv4/af_inet.c:1638: confused by earlier 
> > > errors, bailing out
> > 
> > Would have to look more at this one.  Seems odd that it can't allocate
> > R0 when it's already the destination and when R0 can't be live before
> > the insn.  But there again, this is reload, so my enthuasiasm for looking
> > is a bit limited :-)
> 
> It wants to use r0 in some other insn, so it needs to spill it here, but
> cannot.  This is what class_likely_spilled is for.
> 

Hmm ... the R0 problem ... SH doesn't override class_likely_spilled
explicitly, but it's got a R0_REGS class with only one said reg in it. 
So the default impl of class_likely_spilled should do its thing.

LRA is available on SH and often fixes the R0 problems -- but not
always.  Maybe it got better over time, haven't checked.

Could you re-run the SH build tests with -mlra, please ?

Cheers,
Oleg




Re: [patch, libgomp] Enable OpenACC GCN testing

2019-12-03 Thread Thomas Schwinge
Hi!

On 2019-12-03T12:56:49+, Andrew Stubbs  wrote:
> On 02/12/2019 14:19, Thomas Schwinge wrote:
>> Generally, I'm in favor if you'd consider such a thing (that in principle
>> is just a copy/adapt of the existing cases) as obvious to commit (even
>> more so with your "amdgcn port" maintainer hat on), especially so given
>> that this has been/is blocking you, as Tobias told me more than once.
>
> I probably will do for incremental tweaks, but I left some stuff in this 
> one to see if you were awake (that's my story and I'm sticking to it).

Haha!  ;-P


> Here's what I have committed (and I just realized I forgot the new 
> fangled reviewed-by tag, sorry).

No worries.  That's just an experiment anyway -- seems to work/have been
picked up for glibc, but not so much for GCC.


About the alphabetic sorting that I've mentioned: sometimes we do that,
sometimes we sort per 'GOMP_DEVICE_*'/'acc_device_t'/whatever ordering,
sometimes special cases go to the beginning/end, sometimes it's "in order
of appearance", or seemingly random, or any combination of all these.  So
there isn't a general guideline to follow.  However, to make reading the
code more easy/enjoyable, what I like to see, is at least some kind of
local consistency, instead of just always adding new stuff to the
end. (Where the latter might indeed be the right thing to do in certain
cases, given that 'GOMP_DEVICE_GCN' is the last one in the list.)


Grüße
 Thomas


signature.asc
Description: PGP signature


Re: [PATCH][ARM] Improve max_cond_insns setting for Cortex cores

2019-12-03 Thread Wilco Dijkstra
Hi Kyrill,

> Hmm, I'm not too confident on that. I'd support such a change for the 
> generic arm_cortex_tune, definitely, and the Armv8-a based ones, but I 
> don't think the argument is as strong for Cortex-A7, Cortex-A8, Cortex-A9.
>
> So let's make the change for the Armv8-A-based cores now. If you get 
> benchmarking data for the older ones (such systems may or may not be 
> easy to get a hold of) we can update those separately.

I ran some experiments on Cortex-A53 and this shows the difference between
2, 3 and 4 is less than for out-of-order cores (which clearly prefer 2).
So it seems alright to set it to 4 for the older in-order cores - see updated 
patch
below.

>>   Set max_cond_insns
>> to 4 on Thumb-2 architectures given it's already limited to that by
>> MAX_INSN_PER_IT_BLOCK.  Also use the CPU tuning setting when a CPU/tune
>> is selected if -mrestrict-it is not explicitly set.
>
> This can go in as a separate patch from the rest, thanks.

Sure, I'll split that part off into a separate patch.

Cheers,
Wilco

[PATCH v2][ARM] Improve max_cond_insns setting for Cortex cores

Various CPUs have max_cond_insns set to 5 due to historical reasons.
Benchmarking shows that max_cond_insns=2 is fastest on modern Cortex-A
cores, so change it to 2. Set it to 4 on older in-order cores as that is
the MAX_INSN_PER_IT_BLOCK limit for Thumb-2.

Bootstrapped on armhf. OK for commit?

ChangeLog:

2019-12-03  Wilco Dijkstra  

* config/arm/arm.c (arm_v6t2_tune): Set max_cond_insns to 4.
(arm_cortex_tune): Set max_cond_insns to 2.
(arm_cortex_a8_tune): Set max_cond_insns to 4.
(arm_cortex_a7_tune): Likewise.
(arm_cortex_a35_tune): Set max_cond_insns to 2.
(arm_cortex_a53_tune): Likewise.
(arm_cortex_a5_tune): Set max_cond_insns to 4.
(arm_cortex_a9_tune): Likewise.
(arm_v6m_tune): Likewise.
--

diff --git a/gcc/config/arm/arm.c b/gcc/config/arm/arm.c
index 
a6b401b7f2e3738ff68316bd83d6e5a2bcf0e7d7..daebe76352d62ad94556762b4e3bc3d0532ad411
 100644
--- a/gcc/config/arm/arm.c
+++ b/gcc/config/arm/arm.c
@@ -1947,7 +1947,7 @@ const struct tune_params arm_v6t2_tune =
   arm_default_branch_cost,
   _default_vec_cost,
   1,   /* Constant limit.  */
-  5,   /* Max cond insns.  */
+  4,   /* Max cond insns.  */
   8,   /* Memset max inline.  */
   1,   /* Issue rate.  */
   ARM_PREFETCH_NOT_BENEFICIAL,
@@ -1971,7 +1971,7 @@ const struct tune_params arm_cortex_tune =
   arm_default_branch_cost,
   _default_vec_cost,
   1,   /* Constant limit.  */
-  5,   /* Max cond insns.  */
+  2,   /* Max cond insns.  */
   8,   /* Memset max inline.  */
   2,   /* Issue rate.  */
   ARM_PREFETCH_NOT_BENEFICIAL,
@@ -1993,7 +1993,7 @@ const struct tune_params arm_cortex_a8_tune =
   arm_default_branch_cost,
   _default_vec_cost,
   1,   /* Constant limit.  */
-  5,   /* Max cond insns.  */
+  4,   /* Max cond insns.  */
   8,   /* Memset max inline.  */
   2,   /* Issue rate.  */
   ARM_PREFETCH_NOT_BENEFICIAL,
@@ -2015,7 +2015,7 @@ const struct tune_params arm_cortex_a7_tune =
   arm_default_branch_cost,
   _default_vec_cost,
   1,   /* Constant limit.  */
-  5,   /* Max cond insns.  */
+  4,   /* Max cond insns.  */
   8,   /* Memset max inline.  */
   2,   /* Issue rate.  */
   ARM_PREFETCH_NOT_BENEFICIAL,
@@ -2059,7 +2059,7 @@ const struct tune_params arm_cortex_a35_tune =
   arm_default_branch_cost,
   _default_vec_cost,
   1,   /* Constant limit.  */
-  5,   /* Max cond insns.  */
+  2,   /* Max cond insns.  */
   8,   /* Memset max inline.  */
   1,   /* Issue rate.  */
   ARM_PREFETCH_NOT_BENEFICIAL,
@@ -2081,7 +2081,7 @@ const struct tune_params arm_cortex_a53_tune =
   arm_default_branch_cost,
   _default_vec_cost,
   1,   /* Constant limit.  */
-  5,   /* Max cond insns.  */
+  2,   /* Max cond insns.  */
   8,   /* 

Re: [PATCH 4/7 libgomp,amdgcn] GCN libgomp port

2019-12-03 Thread Andrew Stubbs

On 02/12/2019 14:43, Thomas Schwinge wrote:

Hi!

On 2019-11-12T13:29:13+, Andrew Stubbs  wrote:

--- a/include/gomp-constants.h
+++ b/include/gomp-constants.h
@@ -174,6 +174,7 @@ enum gomp_map_kind
  #define GOMP_DEVICE_NVIDIA_PTX5
  #define GOMP_DEVICE_INTEL_MIC 6
  #define GOMP_DEVICE_HSA   7
+#define GOMP_DEVICE_GCN8


Unless we are to expect non-AMD GCN implementations (hardware), I'd favor
if all these "GCN" things were called "AMD GCN", like done for "Nvidia
PTX", or "Intel MIC", or why shouldn't we?


Why do we do that for libgomp, but not for GCC in general? I mean, it's 
not "Intel i386" or "Renesas SuperH" (or should that be "Hitachi"?). The 
only GCC port that has the company name is the "nv" in "nvptx" (ignoring 
those where the company and architecture are the same).


We use "amdgcn" for the target triplet, but it's just "gcn" almost 
everywhere else, including the config directories.



--- a/libgomp/openacc.h
+++ b/libgomp/openacc.h
@@ -55,6 +55,7 @@ typedef enum acc_device_t {
/* acc_device_host_nonshm = 3 removed.  */
acc_device_not_host = 4,
acc_device_nvidia = 5,
+  acc_device_gcn = 8,


There is no 'acc_device_gcn' in OpenACC.  Per OpenACC 3.0, A.1.2. "AMD
GPU Targets", for example, there is 'acc_device_radeon' (and "the
case-insensitive name 'radeon' for the environment variable
'ACC_DEVICE_TYPE'").  If that is not appropriate to use (I have not read
up how exactly AMD's "GCN" and "radeon" relate to each other), we should
get that clarified in the OpenACC specification.


This seems more significant.

There are certainly Radeon devices that are not GCN, but I think all GCN 
devices are branded Radeon?


https://www.amd.com/en/technologies/gcn says "GCN powered Radeon GPUs 
were built ..." so I think AMD would be fine with radeon.  I'll put this 
on my todo list.


Andrew


Re: [patch, libgomp] Enable OpenACC GCN testing

2019-12-03 Thread Andrew Stubbs

On 02/12/2019 14:19, Thomas Schwinge wrote:

Generally, I'm in favor if you'd consider such a thing (that in principle
is just a copy/adapt of the existing cases) as obvious to commit (even
more so with your "amdgcn port" maintainer hat on), especially so given
that this has been/is blocking you, as Tobias told me more than once.


I probably will do for incremental tweaks, but I left some stuff in this 
one to see if you were awake (that's my story and I'm sticking to it).


Here's what I have committed (and I just realized I forgot the new 
fangled reviewed-by tag, sorry).


Andrew
Enable OpenACC GCN testing.

2019-12-03  Andrew Stubbs  

	libgomp/
	* testsuite/lib/libgomp.exp (offload_target_to_openacc_device_type):
	Recognize amdgcn.
	(check_effective_target_openacc_amdgcn_accel_present): New proc.
	(check_effective_target_openacc_amdgcn_accel_selected): New proc.
	* testsuite/libgomp.oacc-c++/c++.exp: Add support for amdgcn.
	* testsuite/libgomp.oacc-c/c.exp: Likewise.
	* testsuite/libgomp.oacc-fortran/fortran.exp: Likewise.

diff --git a/libgomp/testsuite/lib/libgomp.exp b/libgomp/testsuite/lib/libgomp.exp
index 06e3186d966..f52ed7184e4 100644
--- a/libgomp/testsuite/lib/libgomp.exp
+++ b/libgomp/testsuite/lib/libgomp.exp
@@ -318,6 +318,9 @@ proc libgomp_option_proc { option } {
 # not supported, and 'host' for offload target 'disable'.
 proc offload_target_to_openacc_device_type { offload_target } {
 switch -glob $offload_target {
+	amdgcn* {
+	return "gcn"
+	}
 	disable {
 	return "host"
 	}
@@ -479,3 +482,29 @@ proc check_effective_target_hsa_offloading_selected {} {
 	check_effective_target_hsa_offloading_selected_nocache
 }]
 }
+
+# Return 1 if at least one AMD GCN board is present.
+
+proc check_effective_target_openacc_amdgcn_accel_present { } {
+return [check_runtime openacc_amdgcn_accel_present {
+	#include 
+	int main () {
+	return !(acc_get_num_devices (acc_device_gcn) > 0);
+	}
+} "" ]
+}
+
+# Return 1 if at least one AMD GCN board is present, and the AMD GCN device
+# type is selected by default.
+
+proc check_effective_target_openacc_amdgcn_accel_selected { } {
+if { ![check_effective_target_openacc_amdgcn_accel_present] } {
+	return 0;
+}
+global offload_target
+if { [string match "amdgcn*" $offload_target] } {
+return 1;
+}
+return 0;
+}
+
diff --git a/libgomp/testsuite/libgomp.oacc-c++/c++.exp b/libgomp/testsuite/libgomp.oacc-c++/c++.exp
index dcefa792ca4..c06c2a097e3 100644
--- a/libgomp/testsuite/libgomp.oacc-c++/c++.exp
+++ b/libgomp/testsuite/libgomp.oacc-c++/c++.exp
@@ -88,6 +88,15 @@ if { $lang_test_file_found } {
 		unsupported "$subdir $offload_target offloading"
 		continue
 	}
+	gcn {
+		if { ![check_effective_target_openacc_amdgcn_accel_present] } {
+		# Don't bother; execution testing is going to FAIL.
+		untested "$subdir $offload_target offloading: supported, but hardware not accessible"
+		continue
+		}
+
+		set acc_mem_shared 0
+	}
 	host {
 		set acc_mem_shared 1
 	}
diff --git a/libgomp/testsuite/libgomp.oacc-c/c.exp b/libgomp/testsuite/libgomp.oacc-c/c.exp
index 55cd40f1e99..7f13242fd59 100644
--- a/libgomp/testsuite/libgomp.oacc-c/c.exp
+++ b/libgomp/testsuite/libgomp.oacc-c/c.exp
@@ -51,6 +51,15 @@ foreach offload_target [concat [split $offload_targets ","] "disable"] {
 	unsupported "$subdir $offload_target offloading"
 	continue
 	}
+	gcn {
+	if { ![check_effective_target_openacc_amdgcn_accel_present] } {
+		# Don't bother; execution testing is going to FAIL.
+		untested "$subdir $offload_target offloading: supported, but hardware not accessible"
+		continue
+	}
+
+	set acc_mem_shared 0
+	}
 	host {
 	set acc_mem_shared 1
 	}
diff --git a/libgomp/testsuite/libgomp.oacc-fortran/fortran.exp b/libgomp/testsuite/libgomp.oacc-fortran/fortran.exp
index 852f372b319..60f0889a07c 100644
--- a/libgomp/testsuite/libgomp.oacc-fortran/fortran.exp
+++ b/libgomp/testsuite/libgomp.oacc-fortran/fortran.exp
@@ -82,6 +82,15 @@ if { $lang_test_file_found } {
 		unsupported "$subdir $offload_target offloading"
 		continue
 	}
+	gcn {
+		if { ![check_effective_target_openacc_amdgcn_accel_present] } {
+		# Don't bother; execution testing is going to FAIL.
+		untested "$subdir $offload_target offloading: supported, but hardware not accessible"
+		continue
+		}
+
+		set acc_mem_shared 0
+	}
 	host {
 		set acc_mem_shared 1
 	}


Re: Ping: [C++ PATCH] Opt out of GNU vector extensions for built-in SVE types

2019-12-03 Thread Richard Sandiford
Jason Merrill  writes:
> On 11/29/19 5:59 AM, Richard Sandiford wrote:
>> Ping
>> 
>> Richard Sandiford  writes:
>>> This is the C++ equivalent of r277950, which prevented the
>>> use of the GNU vector extensions with SVE vector types for C.
>>> [https://gcc.gnu.org/viewcvs/gcc?view=revision=277950].
>>> I've copied the rationale below for reference.
>>>
>>> The changes here are very similar to the C ones.  Perhaps the only
>>> noteworthy thing (that I know of) is that the patch continues to treat
>>> !gnu_vector_type_p vector types as literal types/potential constexprs.
>>> Disabling the GNU vector extensions shouldn't in itself stop the types
>>> from being literal types, since whatever the target provides instead
>>> might be constexpr material.
>>>
>>> Tested on aarch64-linux-gnu and x86_64-linux-gnu.  OK to install?
>>>
>>> Richard
>>>
>>> -
>>> The AArch64 port defines built-in SVE types at start-up under names
>>> like __SVInt8_t.  These types are represented in the front end and
>>> gimple as normal VECTOR_TYPEs and are code-generated as normal vectors.
>>> However, we'd like to stop the frontends from treating them in the
>>> same way as GNU-style ("vector_size") vectors, for several reasons:
>>>
>>> (1) We allowed the GNU vector extensions to be mixed with Advanced SIMD
>>>  vector types and it ended up causing a lot of confusion on big-endian
>>>  targets.  Although SVE handles big-endian vectors differently from
>>>  Advanced SIMD, there are still potential surprises; see the block
>>>  comment near the head of aarch64-sve.md for details.
>>>
>>> (2) One of the SVE vectors is a packed one-bit-per-element boolean vector.
>>>  That isn't a combination the GNU vector extensions have supported
>>>  before.  E.g. it means that vectors can no longer decompose to
>>>  arrays for indexing, and that not all elements are individually
>>>  addressable.  It also makes it less clear which order the initialiser
>>>  should be in (lsb first, or bitfield ordering?).  We could define
>>>  all that of course, but it seems a bit weird to go to the effort
>>>  for this case when, given all the other reasons, we don't want the
>>>  extensions anyway.
>>>
>>> (3) The GNU vector extensions only provide full-vector operations,
>>>  which is a very artifical limitation on a predicated architecture
>>>  like SVE.
>>>
>>> (4) The set of operations provided by the GNU vector extensions is
>>>  relatively small, whereas the SVE intrinsics provide many more.
>>>
>>> (5) It makes it easier to ensure that (with default options) code is
>>>  portable between compilers without the GNU vector extensions having
>>>  to become an official part of the SVE intrinsics spec.
>>>
>>> (6) The length of the SVE types is usually not fixed at compile time,
>>>  whereas the GNU vector extension is geared around fixed-length
>>>  vectors.
>>>
>>>  It's possible to specify the length of an SVE vector using the
>>>  command-line option -msve-vector-bits=N, but in principle it should
>>>  be possible to have functions compiled for different N in the same
>>>  translation unit.  This isn't supported yet but would be very useful
>>>  for implementing ifuncs.  Once mixing lengths in a translation unit
>>>  is supported, the SVE types should represent the same type throughout
>>>  the translation unit, just as GNU vector types do.
>>>
>>> However, when -msve-vector-bits=N is in effect, we do allow conversions
>>> between explicit GNU vector types of N bits and the corresponding SVE
>>> types.  This doesn't undermine the intent of (5) because in this case
>>> the use of GNU vector types is explicit and intentional.  It also doesn't
>>> undermine the intent of (6) because converting between the types is just
>>> a conditionally-supported operation.  In other words, the types still
>>> represent the same types throughout the translation unit, it's just that
>>> conversions between them are valid in cases where a certain precondition
>>> is known to hold.  It's similar to the way that the SVE vector types are
>>> defined throughout the translation unit but can only be used in functions
>>> for which SVE is enabled.
>>> -
>> 
>> 2019-11-08  Richard Sandiford  
>> 
>> gcc/cp/
>>  * cp-tree.h (CP_AGGREGATE_TYPE_P): Check for gnu_vector_type_p
>>  instead of VECTOR_TYPE.
>>  * call.c (build_conditional_expr_1): Restrict vector handling
>>  to vectors that satisfy gnu_vector_type_p.  Don't treat the
>>  "then" and "else" types as equivalent if they have the same
>>  vector shape but differ in whether they're GNU vectors.
>>  * cvt.c (ocp_convert): Only allow vectors to be converted
>>  to bool if they satisfy gnu_vector_type_p.
>>  (build_expr_type_conversion): Only allow 

Re: [PATCH 2/4] Validate acc_device_t uses

2019-12-03 Thread Thomas Schwinge
Hi Frederik!

You once had this patch separate, but then merged into the upstream
submission of 'acc_get_property'; let's please keep this separate.

With changes as indicated below, please commit this to trunk (without the
three hunks related to 'acc_get_property', of course; these will then go
into the 'acc_get_property' changes, don't need to re-post
'acc_get_property' because of that).  To record the review effort, please
include "Reviewed-by: Thomas Schwinge " in the
commit log, see .

On 2019-11-13T16:32:13+0100, Frederik Harwath  wrote:
> Check that function arguments of type acc_device_t
> are valid enumeration values in all publicly visible
> functions.
>
>   libgomp/
>   * oacc-init.c (acc_known_device_type): Add function.
> (unknown_device_type_error): Add function.
> (name_of_acc_device_t): Change to call unknown_device_type_error
> on unknown type.
> (resolve_device): Use acc_known_device_type.
> (acc_init): Fail if acc_device_t argument is not valid.
> (acc_shutdown): Likewise.
> (acc_get_num_devices): Likewise.
> (acc_set_device_type): Likewise.
> (acc_get_device_num): Likewise.
> (acc_set_device_num): Likewise.
> (get_property_any): Likewise.
> (acc_get_property): Likewise.
> (acc_get_property_string): Likewise.
> (acc_on_device): Likewise.
> (goacc_save_and_set_bind): Likewise.

Proper indenting.

> --- a/libgomp/oacc-init.c
> +++ b/libgomp/oacc-init.c
> @@ -93,6 +93,21 @@ get_openacc_name (const char *name)
>  return name;
>  }
>  
> +
> +static int
> +acc_known_device_type (acc_device_t __arg)

No leading underscores ('__arg') in implementation files.  (Or, just us
'd' as often/always done elsewhere in this file.)

Often, such inquiry functions are post-fixed with '_p' (but not always).

Can drop the 'acc_' prefix, as this is an internal ('static') function.

Return type should be 'bool'.

> +{
> +  return __arg >= 0 && __arg < _ACC_device_hwm;
> +}
> +
> +
> +static void
> +unknown_device_type_error (unsigned invalid_type)

I suggest this should take an 'acc_device_t, and then cast to
'(unsigned)' when used here:

> +{
> +  gomp_fatal ("unknown device type %u", invalid_type);
> +}
> +
> +
>  static const char *
>  name_of_acc_device_t (enum acc_device_t type)
>  {

Also, move the two new functions before 'get_openacc_name'.

Generally, does usage of these functions obsolete some existing usage of
'acc_dev_num_out_of_range'?  (OK to address later.)

> @@ -103,8 +118,9 @@ name_of_acc_device_t (enum acc_device_t type)
>  case acc_device_host: return "host";
>  case acc_device_not_host: return "not_host";
>  case acc_device_nvidia: return "nvidia";
> -default: gomp_fatal ("unknown device type %u", (unsigned) type);
> +default: unknown_device_type_error(type);

Missing space before '('.  Not just here, but generally.

>  }
> +  return 0; /** Never reached **/
>  }

As done elsewhere in libgomp, instead of that 'return' plus comment, just
call '__builtin_unreachable ()'.

> @@ -123,7 +139,7 @@ resolve_device (acc_device_t d, bool fail_is_error)
>   if (goacc_device_type)
> {
>   /* Lookup the named device.  */
> - while (++d != _ACC_device_hwm)
> + while (acc_known_device_type (++d))
> if (dispatchers[d]
> && !strcasecmp (goacc_device_type,
> get_openacc_name (dispatchers[d]->name))
> @@ -147,7 +163,7 @@ resolve_device (acc_device_t d, bool fail_is_error)
>  
>  case acc_device_not_host:
>/* Find the first available device after acc_device_not_host.  */
> -  while (++d != _ACC_device_hwm)
> +  while (acc_known_device_type (++d))
>   if (dispatchers[d] && dispatchers[d]->get_num_devices_func () > 0)
> goto found;
>if (d_arg == acc_device_default)
> @@ -168,7 +184,7 @@ resolve_device (acc_device_t d, bool fail_is_error)
>break;
>  
>  default:
> -  if (d > _ACC_device_hwm)
> +  if (!acc_known_device_type (d))
>   {
> if (fail_is_error)
>   goto unsupported_device;

Note that this had 'd > _ACC_device_hwm', not '>=' as it now does, that
is, previously didn't reject 'd == _ACC_device_hwm' but now does -- but I
suppose this was an (minor) bug that existed before, so OK to change as
you did?

> @@ -328,7 +344,6 @@ acc_shutdown_1 (acc_device_t d)
>gomp_unload_device (acc_dev);
>gomp_mutex_unlock (_dev->lock);
>  }
> -  
>gomp_mutex_lock (_thread_lock);
>  
>/* Free target-specific TLS data and close all devices.  */
> @@ -494,7 +509,6 @@ goacc_attach_host_thread_to_device (int ord)
>thr->api_info = NULL;
>/* Initially, all callbacks for all events are enabled.  */
>thr->prof_callbacks_enabled = true;
> -
>thr->target_tls
>  = acc_dev->openacc.create_thread_data_func (ord);
>  }

Please 

Re: [PATCH][2/2] More abstraction penalty removal for PR92645

2019-12-03 Thread Richard Biener
On Tue, 3 Dec 2019, Richard Biener wrote:

> On Mon, 2 Dec 2019, Richard Biener wrote:
> 
> > On December 2, 2019 4:27:47 PM GMT+01:00, Alexander Monakov 
> >  wrote:
> > >On Mon, 2 Dec 2019, Richard Biener wrote:
> > >
> > >> +typedef long long v4di __attribute__((vector_size(32)));
> > >> +struct Vec
> > >> +{
> > >> +  unsigned int v[8];
> > >> +};
> > >> +
> > >> +v4di pun (struct Vec *s)
> > >> +{
> > >> +  v4di tem;
> > >> +  __builtin_memcpy (, s, 32);
> > >> +  return tem;
> > >> +}
> > >> +
> > >> +/* We're expecting exactly two stmts, in particular no
> > >BIT_INSERT_EXPR
> > >> +   and no memcpy call.
> > >> +_3 = MEM  [(char * {ref-all})s_2(D)];
> > >> +return _3;  */
> > >
> > >So just to make sure I understand correctly: the type in angle brackets
> > >does
> > >not imply 256-bit alignment, and this access has implied alignment of
> > >just 
> > >32 bits, which is deduced from the type of s_2, right?
> > 
> > Yes. I should have quoted the more obvious -gimple IL dump instead. 
> 
>   _3 = __MEM  ((char * {ref-all})s_2(D));
> 
> so it's actually only 8 bits alignment.

And the following is what I have applied (a bit more restricted to
avoid FAILs for strlenopt testcases).

Bootstrapped and tested on x86_64-unknown-linux-gnu.

Richard.

2019-12-03  Richard Biener  

PR tree-optimization/92645
* gimple-fold.c (gimple_fold_builtin_memory_op): Fold memcpy
from or to a properly aligned register variable.

* gcc.target/i386/pr92645-5.c: New testcase.

Index: gcc/gimple-fold.c
===
--- gcc/gimple-fold.c   (revision 278893)
+++ gcc/gimple-fold.c   (working copy)
@@ -986,36 +986,33 @@ gimple_fold_builtin_memory_op (gimple_st
 
   src_align = get_pointer_alignment (src);
   dest_align = get_pointer_alignment (dest);
-  if (dest_align < TYPE_ALIGN (desttype)
- || src_align < TYPE_ALIGN (srctype))
-   return false;
 
+  /* Choose between src and destination type for the access based
+ on alignment, whether the access constitutes a register access
+and whether it may actually expose a declaration for SSA rewrite
+or SRA decomposition.  */
   destvar = NULL_TREE;
+  srcvar = NULL_TREE;
   if (TREE_CODE (dest) == ADDR_EXPR
  && var_decl_component_p (TREE_OPERAND (dest, 0))
- && tree_int_cst_equal (TYPE_SIZE_UNIT (desttype), len))
+ && tree_int_cst_equal (TYPE_SIZE_UNIT (desttype), len)
+ && dest_align >= TYPE_ALIGN (desttype)
+ && (is_gimple_reg_type (desttype)
+ || src_align >= TYPE_ALIGN (desttype)))
destvar = fold_build2 (MEM_REF, desttype, dest, off0);
-
-  srcvar = NULL_TREE;
-  if (TREE_CODE (src) == ADDR_EXPR
- && var_decl_component_p (TREE_OPERAND (src, 0))
- && tree_int_cst_equal (TYPE_SIZE_UNIT (srctype), len))
-   {
- if (!destvar
- || src_align >= TYPE_ALIGN (desttype))
-   srcvar = fold_build2 (MEM_REF, destvar ? desttype : srctype,
- src, off0);
- else if (!STRICT_ALIGNMENT)
-   {
- srctype = build_aligned_type (TYPE_MAIN_VARIANT (desttype),
-   src_align);
- srcvar = fold_build2 (MEM_REF, srctype, src, off0);
-   }
-   }
-
+  else if (TREE_CODE (src) == ADDR_EXPR
+  && var_decl_component_p (TREE_OPERAND (src, 0))
+  && tree_int_cst_equal (TYPE_SIZE_UNIT (srctype), len)
+  && src_align >= TYPE_ALIGN (srctype)
+  && (is_gimple_reg_type (srctype)
+  || dest_align >= TYPE_ALIGN (srctype)))
+   srcvar = fold_build2 (MEM_REF, srctype, src, off0);
   if (srcvar == NULL_TREE && destvar == NULL_TREE)
return false;
 
+  /* Now that we chose an access type express the other side in
+ terms of it if the target allows that with respect to alignment
+constraints.  */
   if (srcvar == NULL_TREE)
{
  if (src_align >= TYPE_ALIGN (desttype))
Index: gcc/testsuite/gcc.target/i386/pr92645-5.c
===
--- gcc/testsuite/gcc.target/i386/pr92645-5.c   (nonexistent)
+++ gcc/testsuite/gcc.target/i386/pr92645-5.c   (working copy)
@@ -0,0 +1,21 @@
+/* { dg-do compile } */
+/* { dg-options "-O -fdump-tree-cddce1 -mavx2 -Wno-psabi" } */
+typedef long long v4di __attribute__((vector_size(32)));
+struct Vec
+{
+  unsigned int v[8];
+};
+
+v4di pun (struct Vec *s)
+{
+  v4di tem;
+  __builtin_memcpy (, s, 32);
+  return tem;
+}
+
+/* We're expecting exactly two stmts, in particular no BIT_INSERT_EXPR
+   and no memcpy call.
+_3 = MEM  [(char * {ref-all})s_2(D)];
+return _3;  */
+/* { dg-final { scan-tree-dump-times " = MEM" 1 "cddce1" } } */
+/* { dg-final { scan-tree-dump-not "memcpy" "cddce1" } } */


[Patch] Rework OpenACC nested reduction clause consistency checking (was: Re: [PATCH][committed] Warn about inconsistent OpenACC nested reduction clauses)

2019-12-03 Thread Harwath, Frederik
Hi Jakub,

On 08.11.19 07:41, Harwath, Frederik wrote:
> On 06.11.19 14:00, Jakub Jelinek wrote:
> [...]
>> I'm not sure it is a good idea to use a TREE_LIST in this case, vec would be
>> more natural, wouldn't it.
> 
> Yes.
> 
> [...]
>> If gimplifier is not the right spot, then use a splay tree + vector instead?
>> splay tree for the outer ones, vector for the local ones, and put into both
>> the clauses, so you can compare reduction code etc.
> 
> Sounds like a good idea. I am going to try that.

Below you can find a patch that reimplements the nested reductions check using
more appropriate data structures. As an additional benefit, the quality of the 
warnings
has also improved (see description below). I have checked the patch by running 
the testsuite on
x86_64-pc-linux-gnu.

Best regards,
Frederik

From 94ca786172afa7dab7630d75965bf6d6f0dd24e1 Mon Sep 17 00:00:00 2001
From: Frederik Harwath 
Date: Tue, 3 Dec 2019 10:38:01 +0100
Subject: [PATCH] Rework OpenACC nested reduction clause consistency checking

Revision 277875 of trunk introduced a consistency check for nested OpenACC
reduction clauses. The implementation has two drawbacks:
1) It uses suboptimal data structures for storing information about
   the reduction clauses.
2) The warnings issued for *repeated* inconsistent use of reduction operators
   are confusing. For instance, on three nested loops that use the reduction
   operators +, -, + on the same variable, we obtain a warning at the switch
   from + to - (as desired) and another warning about the switch from - to +.
   It would be preferable to avoid the second warning since + is consistent
   with the first reduction operator.

This commit attempts to fix both problems by using more appropriate data
structures (splay trees and vectors instead of tree lists) for keeping track of
the information about the reduction clauses.

2019-12-3  Frederik Harwath  

	gcc/
	* omp-low.c (omp_context): Removed fields local_reduction_clauses,
	outer_reduction_clauses; added fields oacc_reduction_clauses,
	oacc_reductions_stack.
	(oacc_reduction_clause_location): New struct.
	(oacc_reduction_var_occ): New struct.
	(new_omp_context): Adjust omp_context initialization to new fields.
	(delete_omp_context): Adjust omp_context deletion to new fields.
	(rewind_oacc_reductions_stack): New function.
	(check_oacc_reduction_clause): New function.
	(check_oacc_reduction_clauses): New function.
	(scan_sharing_clauses): Call check_oacc_reduction_clause for
	reduction clauses (this handles clauses on compute regions)
	if a new optional flag is enabled.
	(scan_omp_for): Remove old nested reduction check, call
	 check_oacc_reduction_clauses instead.
	(scan_omp_target): Adapt call to scan_sharing_clauses to enable the new
	flag.

   	gcc/testsuite/
	* c-c++-common/goacc/nested-reductions-warn.c: Add dg-prune-output to
	 ignore warnings that are not relevant to the test.
	(acc_parallel): Stop expecting pruned warnings, adjust expected
	warnings to changes in omp-low.c, add checks for info messages about the
	location of clauses.
	(acc_parallel_loop): Likewise.
	(acc_parallel_reduction): Likewise.
	(acc_parallel_loop_reduction): Likewise.
	(acc_routine): Likewise.
	(acc_kernels): Likewise.

	* gfortran.dg/goacc/nested-reductions-warn.f90: Likewise.
---
 gcc/omp-low.c | 305 --
 .../goacc/nested-reductions-warn.c|  81 ++---
 .../goacc/nested-reductions-warn.f90  |  83 ++---
 3 files changed, 271 insertions(+), 198 deletions(-)

diff --git a/gcc/omp-low.c b/gcc/omp-low.c
index 19132f76da2..ba04e7477dc 100644
--- a/gcc/omp-low.c
+++ b/gcc/omp-low.c
@@ -73,6 +73,9 @@ along with GCC; see the file COPYING3.  If not see
scanned for regions which are then moved to a new
function, to be invoked by the thread library, or offloaded.  */
 
+
+struct oacc_reduction_var_occ;
+
 /* Context structure.  Used to store information about each parallel
directive in the code.  */
 
@@ -128,12 +131,6 @@ struct omp_context
  corresponding tracking loop iteration variables.  */
   hash_map *lastprivate_conditional_map;
 
-  /* A tree_list of the reduction clauses in this context.  */
-  tree local_reduction_clauses;
-
-  /* A tree_list of the reduction clauses in outer contexts.  */
-  tree outer_reduction_clauses;
-
   /* Nesting depth of this context.  Used to beautify error messages re
  invalid gotos.  The outermost ctx is depth 1, with depth 0 being
  reserved for the main body of the function.  */
@@ -163,8 +160,52 @@ struct omp_context
 
   /* True if there is bind clause on the construct (i.e. a loop construct).  */
   bool loop_p;
+
+  /* A mapping that maps a variable to information about the last OpenACC
+ reduction clause that used the variable above the current context.
+ This information is used for checking the nesting restrictions for
+ reduction clauses by the function check_oacc_reduction_clauses.
+ The mapping is owned by 

Re: [PATCH] musl: Fix invalid tls model in libgomp and libitm PR91938

2019-12-03 Thread Jakub Jelinek
On Fri, Nov 15, 2019 at 09:55:37AM +, Szabolcs Nagy wrote:
> Musl does not support initial-exec tls in dynamically loaded shared
> libraries.
> 
> libgomp/ChangeLog:
> 
> 2019-11-15  Szabolcs Nagy  
> 
>   * configure.tgt: Avoid IE tls on *-*-musl*.
> 
> libitm/ChangeLog:
> 
> 2019-11-15  Szabolcs Nagy  
> 
>   * configure.tgt: Avoid IE tls on *-*-musl*.

Ok.

Jakub



Re: [PATCH] musl: Fix invalid tls model in libgomp and libitm PR91938

2019-12-03 Thread Szabolcs Nagy
On 20/11/2019 14:42, Szabolcs Nagy wrote:
> On 15/11/2019 09:55, Szabolcs Nagy wrote:
>> Musl does not support initial-exec tls in dynamically loaded shared
>> libraries.
> 
> ping.

ping.

>>
>> libgomp/ChangeLog:
>>
>> 2019-11-15  Szabolcs Nagy  
>>
>>  * configure.tgt: Avoid IE tls on *-*-musl*.
> 
> i should add PR libgomp/91938 here.
> 
>>
>> libitm/ChangeLog:
>>
>> 2019-11-15  Szabolcs Nagy  
>>
>>  * configure.tgt: Avoid IE tls on *-*-musl*.



Re: [GCC][PATCH] Add ARM-specific Bfloat format support to middle-end

2019-12-03 Thread Stam Markianos-Wright


On 12/2/19 9:27 PM, Joseph Myers wrote:
> On Mon, 2 Dec 2019, Jeff Law wrote:
> 
>>> 2019-11-13  Stam Markianos-Wright  
>>>
>>> * real.c (struct arm_bfloat_half_format,
>>> encode_arm_bfloat_half, decode_arm_bfloat_half): New.
>>> * real.h (arm_bfloat_half_format): New.
>>>
>>>
>> Generally OK.  Please consider using "arm_bfloat_half" instead of
>> "bfloat_half" for the name field in the arm_bfloat_half_format
>> structure.  I'm not sure if that's really visible externally, but it
> 
Hi both! Agreed that we want to be conservative. See latest diff 
attached with the name field change (also pasted below).

> Isn't this the same format used by AVX512_BF16 / Intel DL Boost (albeit
> with Arm and Intel using different rounding modes)?

Yes it is remarkably similar, but there's really only so much variation 
you can have with what is half an f32!

Cheers,
Stam


> 


diff --git a/gcc/real.h b/gcc/real.h
index 0f660c9c671..2b337bb7f7d 100644
--- a/gcc/real.h
+++ b/gcc/real.h
@@ -368,6 +368,7 @@ extern const struct real_format decimal_double_format;
  extern const struct real_format decimal_quad_format;
  extern const struct real_format ieee_half_format;
  extern const struct real_format arm_half_format;
+extern const struct real_format arm_bfloat_half_format;


  /* 
== */
diff --git a/gcc/real.c b/gcc/real.c
index 134240a6be9..07b63b6f27e 100644
--- a/gcc/real.c
+++ b/gcc/real.c
@@ -4799,6 +4799,116 @@ decode_ieee_half (const struct real_format *fmt, 
REAL_VALUE_TYPE *r,
  }
  }

+/* Encode arm_bfloat types.  */
+static void
+encode_arm_bfloat_half (const struct real_format *fmt, long *buf,
+   const REAL_VALUE_TYPE *r)
+{
+  unsigned long image, sig, exp;
+  unsigned long sign = r->sign;
+  bool denormal = (r->sig[SIGSZ-1] & SIG_MSB) == 0;
+
+  image = sign << 15;
+  sig = (r->sig[SIGSZ-1] >> (HOST_BITS_PER_LONG - 8)) & 0x7f;
+
+  switch (r->cl)
+{
+case rvc_zero:
+  break;
+
+case rvc_inf:
+  if (fmt->has_inf)
+   image |= 255 << 7;
+  else
+   image |= 0x7fff;
+  break;
+
+case rvc_nan:
+  if (fmt->has_nans)
+   {
+ if (r->canonical)
+   sig = (fmt->canonical_nan_lsbs_set ? (1 << 6) - 1 : 0);
+ if (r->signalling == fmt->qnan_msb_set)
+   sig &= ~(1 << 6);
+ else
+   sig |= 1 << 6;
+ if (sig == 0)
+   sig = 1 << 5;
+
+ image |= 255 << 7;
+ image |= sig;
+   }
+  else
+   image |= 0x7fff;
+  break;
+
+case rvc_normal:
+  if (denormal)
+   exp = 0;
+  else
+  exp = REAL_EXP (r) + 127 - 1;
+  image |= exp << 7;
+  image |= sig;
+  break;
+
+default:
+  gcc_unreachable ();
+}
+
+  buf[0] = image;
+}
+
+/* Decode arm_bfloat types.  */
+static void
+decode_arm_bfloat_half (const struct real_format *fmt, REAL_VALUE_TYPE *r,
+   const long *buf)
+{
+  unsigned long image = buf[0] & 0x;
+  bool sign = (image >> 15) & 1;
+  int exp = (image >> 7) & 0xff;
+
+  memset (r, 0, sizeof (*r));
+  image <<= HOST_BITS_PER_LONG - 8;
+  image &= ~SIG_MSB;
+
+  if (exp == 0)
+{
+  if (image && fmt->has_denorm)
+   {
+ r->cl = rvc_normal;
+ r->sign = sign;
+ SET_REAL_EXP (r, -126);
+ r->sig[SIGSZ-1] = image << 1;
+ normalize (r);
+   }
+  else if (fmt->has_signed_zero)
+   r->sign = sign;
+}
+  else if (exp == 255 && (fmt->has_nans || fmt->has_inf))
+{
+  if (image)
+   {
+ r->cl = rvc_nan;
+ r->sign = sign;
+ r->signalling = (((image >> (HOST_BITS_PER_LONG - 2)) & 1)
+  ^ fmt->qnan_msb_set);
+ r->sig[SIGSZ-1] = image;
+   }
+  else
+   {
+ r->cl = rvc_inf;
+ r->sign = sign;
+   }
+}
+  else
+{
+  r->cl = rvc_normal;
+  r->sign = sign;
+  SET_REAL_EXP (r, exp - 127 + 1);
+  r->sig[SIGSZ-1] = image | SIG_MSB;
+}
+}
+
  /* Half-precision format, as specified in IEEE 754R.  */
  const struct real_format ieee_half_format =
{
@@ -4848,6 +4958,33 @@ const struct real_format arm_half_format =
  false,
  "arm_half"
};
+
+/* ARM Bfloat half-precision format.  This format resembles a truncated
+   (16-bit) version of the 32-bit IEEE 754 single-precision floating-point
+   format.  */
+const struct real_format arm_bfloat_half_format =
+  {
+encode_arm_bfloat_half,
+decode_arm_bfloat_half,
+2,
+8,
+8,
+-125,
+128,
+15,
+15,
+0,
+false,
+true,
+true,
+true,
+true,
+true,
+true,
+false,
+"arm_bfloat_half"
+  };
+
  
  /* A synthetic "format" for internal arithmetic.  It's the size of the
 internal significand minus the two bits needed for proper rounding.
diff --git a/gcc/real.h b/gcc/real.h
index 0f660c9c671..2b337bb7f7d 100644
--- 

[Ada] Do not gratuitously use XVA encoding in debug info

2019-12-03 Thread Eric Botcazou
This streamlines the encoding used in the debug info for record types with 
components of variable size.

Tested on x86_64-suse-linux, applied on the mainline.


2019-12-03  Eric Botcazou  

* gcc-interface/utils.c (potential_alignment_gap): Delete.
(rest_of_record_type_compilation): Do not call above function.  Use
the alignment of the field instead of that of its type, if need be.
When the original field has variable size, always lower the alignment
of the pointer type.  Reset the bit-field status of the new field if
it does not encode a bit-field.

-- 
Eric BotcazouIndex: gcc-interface/utils.c
===
--- gcc-interface/utils.c	(revision 278929)
+++ gcc-interface/utils.c	(working copy)
@@ -288,7 +288,6 @@ static tree split_plus (tree, tree *);
 static tree float_type_for_precision (int, machine_mode);
 static tree convert_to_fat_pointer (tree, tree);
 static unsigned int scale_by_factor_of (tree, unsigned int);
-static bool potential_alignment_gap (tree, tree, tree);
 
 /* Linked list used as a queue to defer the initialization of the DECL_CONTEXT
of ..._DECL nodes and of the TYPE_CONTEXT of ..._TYPE nodes.  */
@@ -2171,7 +2170,6 @@ rest_of_record_type_compilation (tree re
 		 ? UNION_TYPE : TREE_CODE (record_type));
   tree orig_name = TYPE_IDENTIFIER (record_type), new_name;
   tree last_pos = bitsize_zero_node;
-  tree old_field, prev_old_field = NULL_TREE;
 
   new_name
 	= concat_name (orig_name, TREE_CODE (record_type) == QUAL_UNION_TYPE
@@ -2189,7 +2187,8 @@ rest_of_record_type_compilation (tree re
 
   /* Now scan all the fields, replacing each field with a new field
 	 corresponding to the new encoding.  */
-  for (old_field = TYPE_FIELDS (record_type); old_field;
+  for (tree old_field = TYPE_FIELDS (record_type);
+	   old_field;
 	   old_field = DECL_CHAIN (old_field))
 	{
 	  tree field_type = TREE_TYPE (old_field);
@@ -2213,9 +2212,10 @@ rest_of_record_type_compilation (tree re
 	  else
 	pos = compute_related_constant (curpos, last_pos);
 
-	  if (!pos
-	  && TREE_CODE (curpos) == MULT_EXPR
-	  && tree_fits_uhwi_p (TREE_OPERAND (curpos, 1)))
+	  if (pos)
+	;
+	  else if (TREE_CODE (curpos) == MULT_EXPR
+		   && tree_fits_uhwi_p (TREE_OPERAND (curpos, 1)))
 	{
 	  tree offset = TREE_OPERAND (curpos, 0);
 	  align = tree_to_uhwi (TREE_OPERAND (curpos, 1));
@@ -2223,8 +2223,7 @@ rest_of_record_type_compilation (tree re
 	  last_pos = round_up (last_pos, align);
 	  pos = compute_related_constant (curpos, last_pos);
 	}
-	  else if (!pos
-		   && TREE_CODE (curpos) == PLUS_EXPR
+	  else if (TREE_CODE (curpos) == PLUS_EXPR
 		   && tree_fits_uhwi_p (TREE_OPERAND (curpos, 1))
 		   && TREE_CODE (TREE_OPERAND (curpos, 0)) == MULT_EXPR
 		   && tree_fits_uhwi_p
@@ -2240,9 +2239,9 @@ rest_of_record_type_compilation (tree re
 	  last_pos = round_up (last_pos, align);
 	  pos = compute_related_constant (curpos, last_pos);
 	}
-	  else if (potential_alignment_gap (prev_old_field, old_field, pos))
+	  else
 	{
-	  align = TYPE_ALIGN (field_type);
+	  align = DECL_ALIGN (old_field);
 	  last_pos = round_up (last_pos, align);
 	  pos = compute_related_constant (curpos, last_pos);
 	}
@@ -2261,13 +2260,17 @@ rest_of_record_type_compilation (tree re
 	 in this case, if we don't preventively counter that.  */
 	  if (TREE_CODE (DECL_SIZE (old_field)) != INTEGER_CST)
 	{
-	  field_type = build_pointer_type (field_type);
-	  if (align != 0 && TYPE_ALIGN (field_type) > align)
+	  field_type = copy_type (build_pointer_type (field_type));
+	  SET_TYPE_ALIGN (field_type, BITS_PER_UNIT);
+	  var = true;
+
+	  /* ??? Kludge to work around a bug in Workbench's debugger.  */
+	  if (align == 0)
 		{
-		  field_type = copy_type (field_type);
-		  SET_TYPE_ALIGN (field_type, align);
+		  align = DECL_ALIGN (old_field);
+		  last_pos = round_up (last_pos, align);
+		  pos = compute_related_constant (curpos, last_pos);
 		}
-	  var = true;
 	}
 
 	  /* Make a new field name, if necessary.  */
@@ -2287,6 +2290,16 @@ rest_of_record_type_compilation (tree re
 	  new_field
 	= create_field_decl (field_name, field_type, new_record_type,
  DECL_SIZE (old_field), pos, 0, 0);
+	  /* The specified position is not the actual position of the field
+	 but the gap with the previous field, so the computation of the
+	 bit-field status may be incorrect.  We adjust it manually to
+	 avoid generating useless attributes for the field in DWARF.  */
+	  if (DECL_SIZE (old_field) == TYPE_SIZE (field_type)
+	  && value_factor_p (pos, BITS_PER_UNIT))
+	{
+	  DECL_BIT_FIELD (new_field) = 0;
+	  DECL_BIT_FIELD_TYPE (new_field) = NULL_TREE;
+	}
 	  DECL_CHAIN (new_field) = TYPE_FIELDS (new_record_type);
 	  TYPE_FIELDS (new_record_type) = 

Re: [patch] install the lto-dump man page

2019-12-03 Thread Martin Liška

On 12/2/19 7:15 PM, Matthias Klose wrote:

GCC 10 comes with a new lto-dump texi file, but the man page isn't built and
installed. Fix with the attached patch. Ok to install?

Matthias



Hello.

Thank you for the patch. I would consider it as obvious.

Martin


[Ada] Fix debug info for variant records on m68k

2019-12-03 Thread Eric Botcazou
This fixes a bug in the encoding used in the debug info generated for record 
types with variant part for targets where the alignment is capped to 16 bits 
like the m68k (unless -malign-int is specified).

Tested on m68k-elf and x86_64-suse-linux, applied on the mainline.


2019-12-03  Eric Botcazou  

* gcc-interface/utils.c (fold_convert_size): New function.
(fold_bit_position): Invoke it to do further folding.

-- 
Eric BotcazouIndex: gcc-interface/utils.c
===
--- gcc-interface/utils.c	(revision 277906)
+++ gcc-interface/utils.c	(working copy)
@@ -2349,19 +2349,27 @@ merge_sizes (tree last_size, tree first_
   return new_size;
 }
 
+/* Convert the size expression EXPR to TYPE and fold the result.  */
+
+static tree
+fold_convert_size (tree type, tree expr)
+{
+  /* We assume that size expressions do not wrap around.  */
+  if (TREE_CODE (expr) == MULT_EXPR || TREE_CODE (expr) == PLUS_EXPR)
+return size_binop (TREE_CODE (expr),
+		   fold_convert_size (type, TREE_OPERAND (expr, 0)),
+		   fold_convert_size (type, TREE_OPERAND (expr, 1)));
+
+  return fold_convert (type, expr);
+}
+
 /* Return the bit position of FIELD, in bits from the start of the record,
and fold it as much as possible.  This is a tree of type bitsizetype.  */
 
 static tree
 fold_bit_position (const_tree field)
 {
-  tree offset = DECL_FIELD_OFFSET (field);
-  if (TREE_CODE (offset) == MULT_EXPR || TREE_CODE (offset) == PLUS_EXPR)
-offset = size_binop (TREE_CODE (offset),
-			 fold_convert (bitsizetype, TREE_OPERAND (offset, 0)),
-			 fold_convert (bitsizetype, TREE_OPERAND (offset, 1)));
-  else
-offset = fold_convert (bitsizetype, offset);
+  tree offset = fold_convert_size (bitsizetype, DECL_FIELD_OFFSET (field));
   return size_binop (PLUS_EXPR, DECL_FIELD_BIT_OFFSET (field),
  		 size_binop (MULT_EXPR, offset, bitsize_unit_node));
 }


[Ada] Adjust Copy-In/Copy-Out mechanism on 64-bit targets

2019-12-03 Thread Eric Botcazou
This adjusts the return part of the mechanism used to pass In Out or Out 
parameters on 64-bit targets to avoid generating problematic paradoxical 
subregs with floating-point mode.

Tested on x86_64-suse-linux, applied on the mainline.


2019-12-03  Eric Botcazou  

* gcc-interface/decl.c (gnat_to_gnu_subprog_type): With the Copy-In/
Copy-Out mechanism, do not promote the mode of the return type to an
integral mode if it contains a field on a non-integral type and even
demote it for 64-bit targets.

-- 
Eric BotcazouIndex: gcc-interface/decl.c
===
--- gcc-interface/decl.c	(revision 277906)
+++ gcc-interface/decl.c	(working copy)
@@ -5620,6 +5620,32 @@ gnat_to_gnu_profile_type (Entity_Id gnat
   return gnu_type;
 }
 
+/* Return true if TYPE contains only integral data, recursively if need be.  */
+
+static bool
+type_contains_only_integral_data (tree type)
+{
+  switch (TREE_CODE (type))
+{
+case RECORD_TYPE:
+case UNION_TYPE:
+case QUAL_UNION_TYPE:
+  for (tree field = TYPE_FIELDS (type); field; field = DECL_CHAIN (field))
+	if (!type_contains_only_integral_data (TREE_TYPE (field)))
+	  return false;
+  return true;
+
+case ARRAY_TYPE:
+case COMPLEX_TYPE:
+  return type_contains_only_integral_data (TREE_TYPE (type));
+
+default:
+  return INTEGRAL_TYPE_P (type);
+}
+
+  gcc_unreachable ();
+}
+
 /* Return a GCC tree for a subprogram type corresponding to GNAT_SUBPROG.
DEFINITION is true if this is for a subprogram being defined.  DEBUG_INFO_P
is true if we need to write debug information for other types that we may
@@ -5649,8 +5675,8 @@ gnat_to_gnu_subprog_type (Entity_Id gnat
  the TYPE_CI_CO_LIST field of the FUNCTION_TYPE node we create.  */
   tree gnu_cico_list = NULL_TREE;
   tree gnu_cico_return_type = NULL_TREE;
-  /* Fields in return type of procedure with copy-in copy-out parameters.  */
-  tree gnu_field_list = NULL_TREE;
+  tree gnu_cico_field_list = NULL_TREE;
+  bool gnu_cico_only_integral_type = true;
   /* The semantics of "pure" in Ada essentially matches that of "const"
  or "pure" in GCC.  In particular, both properties are orthogonal
  to the "nothrow" property if the EH circuitry is explicit in the
@@ -5976,9 +6002,11 @@ gnat_to_gnu_subprog_type (Entity_Id gnat
  NULL_TREE, 0, 0);
 		  Sloc_to_locus (Sloc (gnat_subprog),
 			 _SOURCE_LOCATION (gnu_field));
-		  gnu_field_list = gnu_field;
+		  gnu_cico_field_list = gnu_field;
 		  gnu_cico_list
 		= tree_cons (gnu_field, void_type_node, NULL_TREE);
+		  if (!type_contains_only_integral_data (gnu_return_type))
+		gnu_cico_only_integral_type = false;
 		}
 
 	  TYPE_NAME (gnu_cico_return_type) = get_identifier ("RETURN");
@@ -5995,9 +6023,11 @@ gnat_to_gnu_subprog_type (Entity_Id gnat
  0, 0);
 	  Sloc_to_locus (Sloc (gnat_param),
 			 _SOURCE_LOCATION (gnu_field));
-	  DECL_CHAIN (gnu_field) = gnu_field_list;
-	  gnu_field_list = gnu_field;
+	  DECL_CHAIN (gnu_field) = gnu_cico_field_list;
+	  gnu_cico_field_list = gnu_field;
 	  gnu_cico_list = tree_cons (gnu_field, gnu_param, gnu_cico_list);
+	  if (!type_contains_only_integral_data (gnu_param_type))
+	gnu_cico_only_integral_type = false;
 	}
 }
 
@@ -6014,12 +6044,14 @@ gnat_to_gnu_subprog_type (Entity_Id gnat
 	 since structures are incomplete for the back-end.  */
   else if (Convention (gnat_subprog) != Convention_Stubbed)
 	{
-	  finish_record_type (gnu_cico_return_type, nreverse (gnu_field_list),
+	  finish_record_type (gnu_cico_return_type,
+			  nreverse (gnu_cico_field_list),
 			  0, false);
 
-	  /* Try to promote the mode of the return type if it is passed
-	 in registers, again to speed up accesses.  */
+	  /* Try to promote the mode if the return type is fully returned
+	 in integer registers, again to speed up accesses.  */
 	  if (TYPE_MODE (gnu_cico_return_type) == BLKmode
+	  && gnu_cico_only_integral_type
 	  && !targetm.calls.return_in_memory (gnu_cico_return_type,
 		  NULL_TREE))
 	{
@@ -6042,6 +6074,17 @@ gnat_to_gnu_subprog_type (Entity_Id gnat
 		}
 	}
 
+	  /* But demote the mode if the return type is partly returned in FP
+	 registers to avoid creating problematic paradoxical subregs.
+	 Note that we need to cater to historical 32-bit architectures
+	 that incorrectly use the mode to select the return mechanism.  */
+	  else if (INTEGRAL_MODE_P (TYPE_MODE (gnu_cico_return_type))
+		   && !gnu_cico_only_integral_type
+		   && BITS_PER_WORD >= 64
+		   && !targetm.calls.return_in_memory (gnu_cico_return_type,
+		   NULL_TREE))
+	SET_TYPE_MODE (gnu_cico_return_type, BLKmode);
+
 	  if (debug_info_p)
 	rest_of_record_type_compilation (gnu_cico_return_type);
 	}


Re: [PATCH] Improve A*B+-A -> A*(B+-1) and A+-A*B -> A*(1+-B) match.pd optimization

2019-12-03 Thread Richard Biener
On Tue, 3 Dec 2019, Richard Sandiford wrote:

> Jakub Jelinek  writes:
> > Hi!
> >
> > As discussed in the PR, we can't optimize e.g.
> >   int a = t - 1;
> >   int b = a * v;
> >   return b + v;
> > into return t * v; for signed non-wrapv arithmetics.  This can be done
> > by the match.pd (A * B) +- A -> (B +- 1) * A or
> > A +- (A * B) -> (1 +- B) * A canonicalizations.  Being a lazy guy,
> > I wrote attached brute force proglet to look for all the cases (for signed
> > char) where there is no UB in the original and the transformation would
> > introduce UB.  For the simple cases with just A and B, rather than A, B and
> > C, the problematic cases are for signed char only:
> > A*B+A -> (B+1)*A A==-1 && B==127
> > A*B+A -> (B+1)*A A==0 && B==127
> > A*B-A -> (B-1)*A A==0 && B==-128
> > A-A*B -> (1-B)*A A==-1 && B==-127
> > A-A*B -> (1-B)*A A==0 && B==-128
> > A-A*B -> (1-B)*A A==0 && B==-127
> > The current patterns already use VRP (tree_expr_nonzero_p and
> > expr_not_equal_to) to do the transformation only if A is known not to be 0
> > or -1.  But as the above problematic cases show, for A*B-A the -1
> > case is actually not problematic (transformation doesn't introduce UB;
> > if A is -1, -1*B+1 has UB only for minimum and minimum+1, and the
> > replacement (B-1)*-1 has also UB for those two cases only) and even when we
> > know nothing about A value range, if we know something about B value range,
> > we could still optimize.  So, for the
> >   int a = t - 1;
> >   int b = a * v;
> >   return b + v;
> > case, the a = t - 1 has value range that doesn't include maximum and
> > so we can conclude it is ok to transform it into return ((t - 1) + 1) * v
> > and thus t * v.
> >
> > Unfortunately, the patch "broke" a few loop_versioning_*.f90 tests (CCing
> > author), where there are small differences between the lversion pass,
> > e.g. in loop_versioning_1.f90 (f1):
> > -  # RANGE ~[0, 0]
> > -  _4 = iftmp.11_6 * S.4_19;
> > -  _11 = _4 - iftmp.11_6;
> > +  # RANGE [0, 9223372036854775806] NONZERO 9223372036854775807
> > +  _4 = S.4_19 + -1;
> > +  _11 = _4 * iftmp.11_6;
> > and the lversion pass then emits just one message instead of two, but in the
> > end assembly is identical.  In loop_versioning_6.f90 (though, with -m32
> > only), the code before lversion pass actually looks better in f1:
> > -  # i_35 = PHI <1(8), i_28(9)>
> > -  _9 = iftmp.33_15 * i_35;
> > -  _10 = _9 * 2;
> > -  _21 = _10 - iftmp.33_15;
> > -  (*x.0_23)[_21] = 1.0e+2;
> > -  _11 = i_35 * 2;
> > -  _12 = _11 + 1;
> > -  _13 = _12 * iftmp.33_15;
> > -  _22 = _13 - iftmp.33_15;
> > -  (*x.0_23)[_22] = 1.01e+2;
> > -  i_28 = i_35 + 1;
> > -  if (iftmp.36_25 < i_28)
> > +  # i_31 = PHI <1(8), i_26(9)>
> > +  _10 = iftmp.33_13 * i_31;
> > +  _11 = _10 * 2;
> > +  _19 = _11 - iftmp.33_13;
> > +  (*x.0_21)[_19] = 1.0e+2;
> > +  (*x.0_21)[_11] = 1.01e+2;
> > +  i_26 = i_31 + 1;
> > +  if (iftmp.36_23 < i_26)
> > where due to the new canonicalizations we managed to avoid some
> > multiplications.  One index was iftmp*i*2-iftmp and the other
> > was iftmp*(i*2+1)-iftmp and with the patch we managed to simplify
> > the latter into iftmp*i*2 and use for that the temporary used for
> > the first expression.  f1 is actually in assembly smaller because of this.
> > The lp64 vs. ! lp64 is just a wild guess, guess testing on further targets
> > will show what is the target property that matters.
> 
> We're supposed to version all 9 functions on all targets though,
> so I think we should xfail the test for ! lp64 rather than expect
> different output.
> 
> E.g. we longer vectorise f1 for -mabi=ilp32 on aarch64-linux-gnu,
> which is a regression of sorts,
> 
> Tested on aarch64-linux-gnu and x86_64-linux-gnu.  OK to install?

OK.

> Richard
> 
> 
> 2019-12-03  Richard Sandiford  
> 
> gcc/testsuite/
>   * gfortran.dg/loop_versioning_6.f90: XFAIL the scans for ! lp64.
> 
> Index: gcc/testsuite/gfortran.dg/loop_versioning_6.f90
> ===
> --- gcc/testsuite/gfortran.dg/loop_versioning_6.f90   2019-12-02 
> 17:38:19.276428679 +
> +++ gcc/testsuite/gfortran.dg/loop_versioning_6.f90   2019-12-03 
> 09:45:30.673414203 +
> @@ -89,7 +89,5 @@ subroutine f9(x, limit, step)
>end do
>  end subroutine f9
>  
> -! { dg-final { scan-tree-dump-times {want to version containing loop} 9 
> "lversion" { target lp64 } } }
> -! { dg-final { scan-tree-dump-times {versioned this loop} 9 "lversion" { 
> target lp64 } } }
> -! { dg-final { scan-tree-dump-times {want to version containing loop} 8 
> "lversion" { target { ! lp64 } } } }
> -! { dg-final { scan-tree-dump-times {versioned this loop} 8 "lversion" { 
> target { ! lp64 } } } }
> +! { dg-final { scan-tree-dump-times {want to version containing loop} 9 
> "lversion" { xfail { ! lp64 } } } }
> +! { dg-final { scan-tree-dump-times {versioned this loop} 9 "lversion" { 
> xfail { ! lp64 } } } }
> 

-- 
Richard Biener 
SUSE Software Solutions 

Re: [PATCH v2 00/11] timed_mutex, shared_timed_mutex: Add full steady clock support

2019-12-03 Thread Jonathan Wakely

On 02/12/19 16:23 +, Jonathan Wakely wrote:

On 15/10/19 18:57 +0100, Mike Crowe wrote:

glibc v2.30 added the pthread_mutex_clocklock,
pthread_rwlock_clockrdlock and pthread_rwlock_clockwrlock
functions. These accept CLOCK_MONOTONIC, so they can be used to
implement proper steady_clock support in timed_mutex,
recursive_timed_mutex and shared_timed_mutex that is immune to the
system clock being warped.

Unfortunately we can't warp the system clock in the testsuite, so it's
not possible to automatically ensure that the system_clock test cases
result in a wait on CLOCK_REALTIME and steady_clock test cases result
in a wait on CLOCK_MONOTONIC. It's recommended to run the test under
strace(1) and check whether the expected futex calls are made by glibc
or ltrace(1) and check whether the expected pthread calls are
made. The easiest way to do this is to copy and paste the line used to
build the test from the output of running the tests (for example):

make check-target-libstdc++-v3 
RUNTESTFLAGS="conformance.exp=30_threads/shared_mutex/* -v -v"

to compile the test with only the tests for one clock enabled and then
run it as:

strace -f ./1.exe

You should see calls to:

futex(..., FUTEX_WAIT_BITSET_PRIVATE|FUTEX_CLOCK_REALTIME, ...)

with large values of tv_sec when using system_clock and calls to:

futex(..., FUTEX_WAIT_BITSET_PRIVATE, ...)

Alternatively, you can use:

ltrace -f ./1.exe

and look for calls to pthread_mutex_clocklock,
pthread_rwlock_clockrdlock and pthread_rwlock_clockwrlock with a
parameter of 1 for CLOCK_MONOTONIC and with values of tv_sec based on
the system uptime when using steady_clock.

This series also adds some extra tests and fixes some other minor
problems with the existing implementation and tests.

Changes since v1[1]:

* Fix flaw pointed out[2] by François Dumont and add tests to prove
that.

[1] https://gcc.gnu.org/ml/libstdc++/2019-09/msg00106.html
[2] https://gcc.gnu.org/ml/libstdc++/2019-10/msg00021.html

Mike Crowe (11):
libstdc++ testsuite: Check return value from timed_mutex::try_lock_until
libstdc++ testsuite: Add timed_mutex::try_lock_until test
libstdc++ testsuite: Also test timed_mutex with steady_clock
libstdc++ testsuite: Also test unique_lock::try_lock_until with steady_clock
PR libstdc++/78237 Add full steady_clock support to timed_mutex
libstdc++ testsuite: Move slow_clock to its own header
PR libstdc++/91906 Fix timed_mutex::try_lock_until on arbitrary clock
libstdc++ testsuite: Also test shared_timed_mutex with steady_clock
libstdc++ shared_mutex: Add full steady_clock support to shared_timed_mutex
libstdc++ timed_mutex: Ensure that try_lock_for waits for long enough
shared_mutex: Fix try_lock_until and try_lock_shared_until on arbitrary clock

libstdc++-v3/acinclude.m4   |  
64 +++-
libstdc++-v3/config.h.in|   
7 +++-
libstdc++-v3/configure  | 
170 -
libstdc++-v3/configure.ac   |   
6 +++-
libstdc++-v3/include/std/mutex  |  
60 +
libstdc++-v3/include/std/shared_mutex   | 
117 +-
libstdc++-v3/testsuite/30_threads/condition_variable/members/2.cc   |  
17 +---
libstdc++-v3/testsuite/30_threads/recursive_timed_mutex/try_lock_until/3.cc |  
76 -
libstdc++-v3/testsuite/30_threads/shared_timed_mutex/try_lock/3.cc  |  
17 ---
libstdc++-v3/testsuite/30_threads/shared_timed_mutex/try_lock_until/1.cc|  
87 +-
libstdc++-v3/testsuite/30_threads/shared_timed_mutex/try_lock_until/2.cc|  
75 -
libstdc++-v3/testsuite/30_threads/timed_mutex/try_lock_until/3.cc   |  
76 -
libstdc++-v3/testsuite/30_threads/timed_mutex/try_lock_until/4.cc   |  
68 +-
libstdc++-v3/testsuite/30_threads/timed_mutex/try_lock_until/57641.cc   |  
18 +---
libstdc++-v3/testsuite/30_threads/unique_lock/locking/4.cc  |  
14 --
libstdc++-v3/testsuite/util/slow_clock.h|  
38 -
16 files changed, 850 insertions(+), 60 deletions(-)
create mode 100644 
libstdc++-v3/testsuite/30_threads/recursive_timed_mutex/try_lock_until/3.cc
create mode 100644 
libstdc++-v3/testsuite/30_threads/shared_timed_mutex/try_lock_until/1.cc
create mode 100644 
libstdc++-v3/testsuite/30_threads/shared_timed_mutex/try_lock_until/2.cc
create mode 100644 
libstdc++-v3/testsuite/30_threads/timed_mutex/try_lock_until/3.cc
create mode 100644 
libstdc++-v3/testsuite/30_threads/timed_mutex/try_lock_until/4.cc
create 

Re: [PATCH] Improve A*B+-A -> A*(B+-1) and A+-A*B -> A*(1+-B) match.pd optimization

2019-12-03 Thread Richard Sandiford
Jakub Jelinek  writes:
> Hi!
>
> As discussed in the PR, we can't optimize e.g.
>   int a = t - 1;
>   int b = a * v;
>   return b + v;
> into return t * v; for signed non-wrapv arithmetics.  This can be done
> by the match.pd (A * B) +- A -> (B +- 1) * A or
> A +- (A * B) -> (1 +- B) * A canonicalizations.  Being a lazy guy,
> I wrote attached brute force proglet to look for all the cases (for signed
> char) where there is no UB in the original and the transformation would
> introduce UB.  For the simple cases with just A and B, rather than A, B and
> C, the problematic cases are for signed char only:
> A*B+A -> (B+1)*A A==-1 && B==127
> A*B+A -> (B+1)*A A==0 && B==127
> A*B-A -> (B-1)*A A==0 && B==-128
> A-A*B -> (1-B)*A A==-1 && B==-127
> A-A*B -> (1-B)*A A==0 && B==-128
> A-A*B -> (1-B)*A A==0 && B==-127
> The current patterns already use VRP (tree_expr_nonzero_p and
> expr_not_equal_to) to do the transformation only if A is known not to be 0
> or -1.  But as the above problematic cases show, for A*B-A the -1
> case is actually not problematic (transformation doesn't introduce UB;
> if A is -1, -1*B+1 has UB only for minimum and minimum+1, and the
> replacement (B-1)*-1 has also UB for those two cases only) and even when we
> know nothing about A value range, if we know something about B value range,
> we could still optimize.  So, for the
>   int a = t - 1;
>   int b = a * v;
>   return b + v;
> case, the a = t - 1 has value range that doesn't include maximum and
> so we can conclude it is ok to transform it into return ((t - 1) + 1) * v
> and thus t * v.
>
> Unfortunately, the patch "broke" a few loop_versioning_*.f90 tests (CCing
> author), where there are small differences between the lversion pass,
> e.g. in loop_versioning_1.f90 (f1):
> -  # RANGE ~[0, 0]
> -  _4 = iftmp.11_6 * S.4_19;
> -  _11 = _4 - iftmp.11_6;
> +  # RANGE [0, 9223372036854775806] NONZERO 9223372036854775807
> +  _4 = S.4_19 + -1;
> +  _11 = _4 * iftmp.11_6;
> and the lversion pass then emits just one message instead of two, but in the
> end assembly is identical.  In loop_versioning_6.f90 (though, with -m32
> only), the code before lversion pass actually looks better in f1:
> -  # i_35 = PHI <1(8), i_28(9)>
> -  _9 = iftmp.33_15 * i_35;
> -  _10 = _9 * 2;
> -  _21 = _10 - iftmp.33_15;
> -  (*x.0_23)[_21] = 1.0e+2;
> -  _11 = i_35 * 2;
> -  _12 = _11 + 1;
> -  _13 = _12 * iftmp.33_15;
> -  _22 = _13 - iftmp.33_15;
> -  (*x.0_23)[_22] = 1.01e+2;
> -  i_28 = i_35 + 1;
> -  if (iftmp.36_25 < i_28)
> +  # i_31 = PHI <1(8), i_26(9)>
> +  _10 = iftmp.33_13 * i_31;
> +  _11 = _10 * 2;
> +  _19 = _11 - iftmp.33_13;
> +  (*x.0_21)[_19] = 1.0e+2;
> +  (*x.0_21)[_11] = 1.01e+2;
> +  i_26 = i_31 + 1;
> +  if (iftmp.36_23 < i_26)
> where due to the new canonicalizations we managed to avoid some
> multiplications.  One index was iftmp*i*2-iftmp and the other
> was iftmp*(i*2+1)-iftmp and with the patch we managed to simplify
> the latter into iftmp*i*2 and use for that the temporary used for
> the first expression.  f1 is actually in assembly smaller because of this.
> The lp64 vs. ! lp64 is just a wild guess, guess testing on further targets
> will show what is the target property that matters.

We're supposed to version all 9 functions on all targets though,
so I think we should xfail the test for ! lp64 rather than expect
different output.

E.g. we longer vectorise f1 for -mabi=ilp32 on aarch64-linux-gnu,
which is a regression of sorts,

Tested on aarch64-linux-gnu and x86_64-linux-gnu.  OK to install?

Richard


2019-12-03  Richard Sandiford  

gcc/testsuite/
* gfortran.dg/loop_versioning_6.f90: XFAIL the scans for ! lp64.

Index: gcc/testsuite/gfortran.dg/loop_versioning_6.f90
===
--- gcc/testsuite/gfortran.dg/loop_versioning_6.f90 2019-12-02 
17:38:19.276428679 +
+++ gcc/testsuite/gfortran.dg/loop_versioning_6.f90 2019-12-03 
09:45:30.673414203 +
@@ -89,7 +89,5 @@ subroutine f9(x, limit, step)
   end do
 end subroutine f9
 
-! { dg-final { scan-tree-dump-times {want to version containing loop} 9 
"lversion" { target lp64 } } }
-! { dg-final { scan-tree-dump-times {versioned this loop} 9 "lversion" { 
target lp64 } } }
-! { dg-final { scan-tree-dump-times {want to version containing loop} 8 
"lversion" { target { ! lp64 } } } }
-! { dg-final { scan-tree-dump-times {versioned this loop} 8 "lversion" { 
target { ! lp64 } } } }
+! { dg-final { scan-tree-dump-times {want to version containing loop} 9 
"lversion" { xfail { ! lp64 } } } }
+! { dg-final { scan-tree-dump-times {versioned this loop} 9 "lversion" { xfail 
{ ! lp64 } } } }


[PATCH, rs6000] Fix PR92760 by checking VECTOR_MEM_NONE_P instead

2019-12-03 Thread Kewen.Lin
Hi,

PR92760 exposed one issue that VECTOR_UNIT_NONE_P (V2DImode) is true on Power7
then we won't return it as preferred_simd_mode but ISA 2.06 (Power7) does 
introduce partial support on vector doubleword (very limitted) and more basic
support origins from ISA 2.07 (Power8) though.  To make vectorizer still
leverage those few but available V2DImode related instructions, we need to
claim it's available on VSX (Power7 and up).

Instead of using VECTOR_UNIT_NONE_P, this fix is to use VECTOR_MEM_NONE_P
instead.  I was intended to use both VECTOR_UNIT_NONE_P and VECTOR_MEM_NONE_P
together but noticed that MEM is like a super set of UNIT in current
implementation, I think it would be enough.

Bootstrapped and tested on powerpc64le-linux-gnu (P8 LE) and ppc64-redhat-linux
(P7 BE).  Also verified the failures in the PR gone.

Is it ok for trunk?

BR,
Kewen

gcc/ChangeLog

2019-12-03  Kewen Lin  

PR target/92760
* gcc/config/rs6000/rs6000.c (rs6000_preferred_simd_mode): Update
VECTOR_UNIT_NONE_P by VECTOR_MEM_NONE_P.
diff --git a/gcc/config/rs6000/rs6000.c b/gcc/config/rs6000/rs6000.c
index 3c22f64..23b6d99 100644
--- a/gcc/config/rs6000/rs6000.c
+++ b/gcc/config/rs6000/rs6000.c
@@ -4917,7 +4917,7 @@ rs6000_preferred_simd_mode (scalar_mode mode)
 {
   opt_machine_mode vmode = mode_for_vector (mode, 16 / GET_MODE_SIZE (mode));
 
-  if (vmode.exists () && !VECTOR_UNIT_NONE_P (vmode.require ()))
+  if (vmode.exists () && !VECTOR_MEM_NONE_P (vmode.require ()))
 return vmode.require ();
 
   return word_mode;


Which OpenACC 'acc_device_t' to use for AMD GPU offloading (was: [PATCH 4/7 libgomp,amdgcn] GCN libgomp port)

2019-12-03 Thread Thomas Schwinge
Hi!

On 2019-12-02T14:50:42+, Julian Brown  wrote:
> On Mon, 2 Dec 2019 15:43:29 +0100
> Thomas Schwinge  wrote:
>
>> > --- a/libgomp/openacc.h
>> > +++ b/libgomp/openacc.h
>> > @@ -55,6 +55,7 @@ typedef enum acc_device_t {
>> >/* acc_device_host_nonshm = 3 removed.  */
>> >acc_device_not_host = 4,
>> >acc_device_nvidia = 5,
>> > +  acc_device_gcn = 8,  
>> 
>> There is no 'acc_device_gcn' in OpenACC.

My point is, the OpenACC specification defines 'acc_device_t', and we're
now adding/using a non-standard value, 'acc_device_gcn'.  Depending on
how you read the specification, it may be allowed for an implementation
to provide additional/different values, but for good reason, OpenACC 3.0,
A. "Recommendations for Implementors", still "gives recommendations for
standard names [...] to use for implementations for specific targets and
target platforms, to promote portability across such implementations".

>> Per OpenACC 3.0, A.1.2. "AMD
>> GPU Targets", for example, there is 'acc_device_radeon' (and "the
>> case-insensitive name 'radeon' for the environment variable
>> 'ACC_DEVICE_TYPE'").  If that is not appropriate to use (I have not
>> read up how exactly AMD's "GCN" and "radeon" relate to each other),
>> we should get that clarified in the OpenACC specification.
>
> FWIW, I'm pretty sure there are Radeon devices that do not use the GCN
> ISA.

But does an OpenACC user really care?  Are users likely to state: "I have
a GCN card", or rather: "I have an AMD GPU card", or even just: "I have a
card with a big AMD logo on it"?

Admittedly, users are probably unlikely to state: "I have a Radeon card",
heh?  ;-) (But it's still the standard name defined in the OpenACC
specification.)

> OTOH, there are also Nvidia devices that are not PTX-compatible.

(But not any recent (as in a few years old) ones, capable for GPGPU
computing, are there?)

But again: "I have a card with a big Nvidia logo on it" is probably what
users care about, not whether that is internally using PTX or anything
else, which then is the implementation's job to sort out, when
'acc_device_nvidia' is requested by the user.

> No strong opinion on the acc_device_foo name from me, though (maybe
> "acc_device_amdgcn"?).

Once/if we have established that the standard 'acc_device_radeon' is not
suitable for us here, only then we should think of a new name, in my
opinion.  You wouldn't just add a 'copy_mem' function to glibc given that
'memcpy' already exists?  ;-)


Grüße
 Thomas


signature.asc
Description: PGP signature


Re: [Patch][Fortran] OpenACC – permit common blocks in some clauses

2019-12-03 Thread Thomas Schwinge
Hi Tobias!

On 2019-11-29T18:47:12+0100, Tobias Burnus  wrote:
> On 11/28/19 6:02 PM, Thomas Schwinge wrote:
> [Test case which uses common blocks in device_resident.]
>> If you'd like to, please commit that, to document the status quo.  (I
>> have not reviewed.)
>
> Did so as r278845 with a slightly updated comment.

Testing with nvptx offloading on two different systems, on both I'm
seeing 'STOP 10' execution test FAILure for all optimization levels:

PASS: libgomp.oacc-fortran/declare-5.f90 -DACC_DEVICE_TYPE_nvidia=1 
-DACC_MEM_SHARED=0 -foffload=nvptx-none  -O0  (test for excess errors)
[-PASS:-]{+FAIL:+} libgomp.oacc-fortran/declare-5.f90 
-DACC_DEVICE_TYPE_nvidia=1 -DACC_MEM_SHARED=0 -foffload=nvptx-none  -O0  
execution test
PASS: libgomp.oacc-fortran/declare-5.f90 -DACC_DEVICE_TYPE_nvidia=1 
-DACC_MEM_SHARED=0 -foffload=nvptx-none  -O1  (test for excess errors)
[-PASS:-]{+FAIL:+} libgomp.oacc-fortran/declare-5.f90 
-DACC_DEVICE_TYPE_nvidia=1 -DACC_MEM_SHARED=0 -foffload=nvptx-none  -O1  
execution test
PASS: libgomp.oacc-fortran/declare-5.f90 -DACC_DEVICE_TYPE_nvidia=1 
-DACC_MEM_SHARED=0 -foffload=nvptx-none  -O2  (test for excess errors)
[-PASS:-]{+FAIL:+} libgomp.oacc-fortran/declare-5.f90 
-DACC_DEVICE_TYPE_nvidia=1 -DACC_MEM_SHARED=0 -foffload=nvptx-none  -O2  
execution test
PASS: libgomp.oacc-fortran/declare-5.f90 -DACC_DEVICE_TYPE_nvidia=1 
-DACC_MEM_SHARED=0 -foffload=nvptx-none  -O3 -fomit-frame-pointer 
-funroll-loops -fpeel-loops -ftracer -finline-functions  (test for excess 
errors)
[-PASS:-]{+FAIL:+} libgomp.oacc-fortran/declare-5.f90 
-DACC_DEVICE_TYPE_nvidia=1 -DACC_MEM_SHARED=0 -foffload=nvptx-none  -O3 
-fomit-frame-pointer -funroll-loops -fpeel-loops -ftracer -finline-functions  
execution test
PASS: libgomp.oacc-fortran/declare-5.f90 -DACC_DEVICE_TYPE_nvidia=1 
-DACC_MEM_SHARED=0 -foffload=nvptx-none  -O3 -g  (test for excess errors)
[-PASS:-]{+FAIL:+} libgomp.oacc-fortran/declare-5.f90 
-DACC_DEVICE_TYPE_nvidia=1 -DACC_MEM_SHARED=0 -foffload=nvptx-none  -O3 -g  
execution test
PASS: libgomp.oacc-fortran/declare-5.f90 -DACC_DEVICE_TYPE_nvidia=1 
-DACC_MEM_SHARED=0 -foffload=nvptx-none  -Os  (test for excess errors)
[-PASS:-]{+FAIL:+} libgomp.oacc-fortran/declare-5.f90 
-DACC_DEVICE_TYPE_nvidia=1 -DACC_MEM_SHARED=0 -foffload=nvptx-none  -Os  
execution test


Grüße
 Thomas


signature.asc
Description: PGP signature


Re: [PATCH 4/4]: C++ P1423R3 char8_t remediation: New tests

2019-12-03 Thread Jonathan Wakely

On 03/12/19 09:11 +0100, Christophe Lyon wrote:

On Mon, 16 Sep 2019 at 04:34, Tom Honermann  wrote:


A revised patch is attached that modifies the tests for deleted ostream
inserters to require C++2a.  This is required by the revision of patch
2/4 that adds proper preprocessor conditionals to the definitions.

Tom.

On 9/15/19 3:40 PM, Tom Honermann wrote:
> This patch adds new tests to validate new deleted overloads of wchar_t,
> char8_t, char16_t, and char32_t for ordinary and wide formatted
> character and string ostream inserters.
>
> Additionally, new tests are added to validate invocations of u8path with
> sequences of char8_t for both the C++17 and filesystem TS implementations.
>
> libstdc++-v3/ChangeLog:
>
> 2019-09-15  Tom Honermann  
>
>   *
> 
libstdc++-v3/testsuite/27_io/basic_ostream/inserters_character/char/deleted.cc:
>
> New test to validate deleted overloads of character and string
> inserters for narrow ostreams.
>   *
> 
libstdc++-v3/testsuite/27_io/basic_ostream/inserters_character/wchar_t/deleted.cc:
>
> New test to validate deleted overloads of character and string
> inserters for wide ostreams.
>   *
> libstdc++-v3/testsuite/27_io/filesystem/path/factory/u8path-char8_t.cc:
> New test to validate u8path invocations with sequences of
> char8_t.
>   *
> libstdc++-v3/testsuite/experimental/filesystem/path/factory/u8path-char8_t.cc
>
> New test to validate u8path invocations with sequences of
> char8_t.
>


Hi,

I've noticed that the new test
27_io/filesystem/path/factory/u8path-char8_t.cc
fails to compile on arm-none-eabi with default cpu/fpu, because:
/tools/arm-none-eabi/bin/ld:
/obj-arm-none-eabi/gcc3/arm-none-eabi/libstdc++-v3/src/.libs/libstdc++.a(string-inst.o):
in function `_ZNSt7__cxx1112basic_stringIcSt11char_traitsIcESaIcEEaSEOS4_':
string-inst.cc:(.text._ZNSt7__cxx1112basic_stringIcSt11char_traitsIcESaIcEEaSEOS4_[_ZNSt7__cxx1112basic_stringIcSt11char_traitsIcESaIcEEaSEOS4_]+0xf4):
undefined reference to `_ZSt15__alloc_on_moveISaIcEEvRT_S2_'
[etc...]


That function is defined inline and so should be instantiated in any
TU that needs it, and so should not give linker errors. There was a
similar bug reported the other day that turned out to be pilot error:
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=92733


The one in experimental is unsupported thanks to
// { dg-require-filesystem-ts "" }
Should that be added to the version in 27_io?


No, the std::filesystem::path class has no dependencies, it should
work everywhere. I'm not sure what's happening here.



[PATCH] Deal with CLOBBERs better in VN (PR92751)

2019-12-03 Thread Richard Biener


The following fixes value-numbering being confused by the CLOBBERs
that appear in Skia vector constructors like

   :
  v ={v} {CLOBBER};
  MEM[(struct Vec *)] ={v} {CLOBBER};
  MEM[(struct Vec *)] ={v} {CLOBBER};
  MEM[(struct Vec *)].val = 1;
  MEM[(struct Vec *) + 4B] ={v} {CLOBBER};
  MEM[(struct Vec *) + 4B].val = 1;
  MEM[(struct Vec *) + 8B] ={v} {CLOBBER};
  MEM[(struct Vec *) + 8B] ={v} {CLOBBER};
  MEM[(struct Vec *) + 8B].val = 1;
  MEM[(struct Vec *) + 12B] ={v} {CLOBBER};
  MEM[(struct Vec *) + 12B].val = 1;
  _4 = MEM <__int128 unsigned> [(char * {ref-all})];

where we'd like to value-number _4 to { 1, 1, 1, 1 }.  Appearantly
the template obfuscation is necessary to trigger the CLOBBERs,
just a bunch of placement news to initialize array elements isn't
enough ;)

Bootstrap and regtest running on x86_64-unknown-linux-gnu.

I didn't opt for the alternative to just skip over all clobbers
when value-numbering - "reads" from them would invoke undefined
behavior and so we could assume them to be not aliasing.  The
following works as well and is safer at this point.

The testcase depends on the improved memcpy folding which I had
to adjust a bit to avoid regressions in strlenopt pass testcases
(retesting at the moment).

Richard.

2019-12-03  Richard Biener  

PR tree-optimization/92751
* tree-ssa-sccvn.c (vn_walk_cb_data::push_partial_def): Fail
when a clobber ends up in the partial-def vector.
(vn_reference_lookup_3): Let clobbers be handled by the
assignment from CTOR handling.

* g++.dg/tree-ssa/pr92751.C: New testcase.

Index: gcc/tree-ssa-sccvn.c
===
--- gcc/tree-ssa-sccvn.c(revision 278893)
+++ gcc/tree-ssa-sccvn.c(working copy)
@@ -1761,6 +1761,9 @@ vn_walk_cb_data::push_partial_def (const
 
   if (partial_defs.is_empty ())
 {
+  /* If we get a clobber upfront, fail.  */
+  if (TREE_CLOBBER_P (pd.rhs))
+   return (void *)-1;
   partial_defs.safe_push (pd);
   first_range.offset = pd.offset;
   first_range.size = pd.size;
@@ -1792,7 +1795,8 @@ vn_walk_cb_data::push_partial_def (const
   && ranges_known_overlap_p (r->offset, r->size + 1,
 newr.offset, newr.size))
 {
-  /* Ignore partial defs already covered.  */
+  /* Ignore partial defs already covered.  Here we also drop shadowed
+ clobbers arriving here at the floor.  */
   if (known_subrange_p (newr.offset, newr.size, r->offset, r->size))
return NULL;
   r->size = MAX (r->offset + r->size, newr.offset + newr.size) - r->offset;
@@ -1817,6 +1821,9 @@ vn_walk_cb_data::push_partial_def (const
 rafter->offset + rafter->size) - r->offset;
   splay_tree_remove (known_ranges, (splay_tree_key)>offset);
 }
+  /* If we get a clobber, fail.  */
+  if (TREE_CLOBBER_P (pd.rhs))
+return (void *)-1;
   partial_defs.safe_push (pd);
 
   /* Now we have merged newr into the range tree.  When we have covered
@@ -2355,10 +2362,6 @@ vn_reference_lookup_3 (ao_ref *ref, tree
   poly_int64 offset = ref->offset;
   poly_int64 maxsize = ref->max_size;
 
-  /* We can't deduce anything useful from clobbers.  */
-  if (gimple_clobber_p (def_stmt))
-return (void *)-1;
-
   /* def_stmt may-defs *ref.  See if we can derive a value for *ref
  from that definition.
  1) Memset.  */
@@ -2508,6 +2511,10 @@ vn_reference_lookup_3 (ao_ref *ref, tree
  if (data->partial_defs.is_empty ()
  && known_subrange_p (offset, maxsize, offset2, size2))
{
+ /* While technically undefined behavior do not optimize
+a full read from a clobber.  */
+ if (gimple_clobber_p (def_stmt))
+   return (void *)-1;
  tree val = build_zero_cst (vr->type);
  return vn_reference_lookup_or_insert_for_pieces
  (vuse, get_alias_set (lhs), vr->type, vr->operands, val);
@@ -2522,6 +2529,9 @@ vn_reference_lookup_3 (ao_ref *ref, tree
   && size2.is_constant ()
   && size2i % BITS_PER_UNIT == 0)
{
+ /* Let clobbers be consumed by the partial-def tracker
+which can choose to ignore them if they are shadowed
+by a later def.  */
  pd_data pd;
  pd.rhs = gimple_assign_rhs1 (def_stmt);
  pd.offset = (offset2i - offseti) / BITS_PER_UNIT;
Index: gcc/testsuite/g++.dg/tree-ssa/pr92751.C
===
--- gcc/testsuite/g++.dg/tree-ssa/pr92751.C (nonexistent)
+++ gcc/testsuite/g++.dg/tree-ssa/pr92751.C (working copy)
@@ -0,0 +1,26 @@
+// { dg-do compile }
+// { dg-options "-O -fdump-tree-fre1" }
+
+inline void* operator new(__SIZE_TYPE__, void* p) { return p; }
+template
+struct Vec {
+  Vec(int v) : lo(v), hi(v) {};
+  Vec lo, hi;
+};
+template<>

Re: [PATCH] Improve (CST1 - A) +- CST2 -> CST3 - A and CST1 - (CST2 - A) -> CST3 + A match.pd opt (PR tree-optimization/92734)

2019-12-03 Thread Richard Biener
On Tue, 3 Dec 2019, Jakub Jelinek wrote:

> Hi!
> 
> The following patch extends the improvements Marc did to the
> (A +- CST1) +- CST2 -> A +- CST3
> match.pd simplification some time ago to the other two patterns,
> in particular handle the case when the inner subtraction is done in a
> different, but nop_convert compatible, type from the outer +/-.
> 
> Bootstrapped/regtested on x86_64-linux and i686-linux, ok for trunk?

OK.

Thanks,
Richard.

> 2019-12-03  Jakub Jelinek  
> 
>   PR tree-optimization/92734
>   * match.pd ((CST1 - A) +- CST2 -> CST3 - A,
>   CST1 - (CST2 - A) -> CST3 + A): Handle nop casts around
>   inner subtraction.
> 
>   * gcc.dg/tree-ssa/pr92734.c: New test.
> 
> --- gcc/match.pd.jj   2019-12-02 09:50:27.0 +0100
> +++ gcc/match.pd  2019-12-02 16:23:43.825040429 +0100
> @@ -2237,17 +2237,39 @@ (define_operator_list COND_TERNARY
>/* (CST1 - A) +- CST2 -> CST3 - A  */
>(for outer_op (plus minus)
> (simplify
> -(outer_op (minus CONSTANT_CLASS_P@1 @0) CONSTANT_CLASS_P@2)
> -(with { tree cst = const_binop (outer_op, type, @1, @2); }
> - (if (cst && !TREE_OVERFLOW (cst))
> -  (minus { cst; } @0)
> +(outer_op (nop_convert (minus CONSTANT_CLASS_P@1 @0)) CONSTANT_CLASS_P@2)
> +/* If one of the types wraps, use that one.  */
> +(if (!ANY_INTEGRAL_TYPE_P (type) || TYPE_OVERFLOW_WRAPS (type))
> + /* If all 3 captures are CONSTANT_CLASS_P, punt, as we might recurse
> + forever if something doesn't simplify into a constant.  */
> + (if (!CONSTANT_CLASS_P (@0))
> +  (minus (outer_op (view_convert @1) @2) (view_convert @0)))
> + (if (!ANY_INTEGRAL_TYPE_P (TREE_TYPE (@0))
> +   || TYPE_OVERFLOW_WRAPS (TREE_TYPE (@0)))
> +  (view_convert (minus (outer_op @1 (view_convert @2)) @0))
> +  (if (types_match (type, @0))
> +   (with { tree cst = const_binop (outer_op, type, @1, @2); }
> + (if (cst && !TREE_OVERFLOW (cst))
> +  (minus { cst; } @0
>  
> -  /* CST1 - (CST2 - A) -> CST3 + A  */
> +  /* CST1 - (CST2 - A) -> CST3 + A
> + Use view_convert because it is safe for vectors and equivalent for
> + scalars.  */
>(simplify
> -   (minus CONSTANT_CLASS_P@1 (minus CONSTANT_CLASS_P@2 @0))
> -   (with { tree cst = const_binop (MINUS_EXPR, type, @1, @2); }
> -(if (cst && !TREE_OVERFLOW (cst))
> - (plus { cst; } @0
> +   (minus CONSTANT_CLASS_P@1 (nop_convert (minus CONSTANT_CLASS_P@2 @0)))
> +   /* If one of the types wraps, use that one.  */
> +   (if (!ANY_INTEGRAL_TYPE_P (type) || TYPE_OVERFLOW_WRAPS (type))
> +/* If all 3 captures are CONSTANT_CLASS_P, punt, as we might recurse
> +  forever if something doesn't simplify into a constant.  */
> +(if (!CONSTANT_CLASS_P (@0))
> + (plus (view_convert @0) (minus @1 (view_convert @2
> +(if (!ANY_INTEGRAL_TYPE_P (TREE_TYPE (@0))
> +  || TYPE_OVERFLOW_WRAPS (TREE_TYPE (@0)))
> + (view_convert (plus @0 (minus (view_convert @1) @2)))
> + (if (types_match (type, @0))
> +  (with { tree cst = const_binop (MINUS_EXPR, type, @1, @2); }
> +   (if (cst && !TREE_OVERFLOW (cst))
> + (plus { cst; } @0)))
>  
>  /* ((T)(A)) + CST -> (T)(A + CST)  */
>  #if GIMPLE
> --- gcc/testsuite/gcc.dg/tree-ssa/pr92734.c.jj2019-12-02 
> 16:26:55.771057273 +0100
> +++ gcc/testsuite/gcc.dg/tree-ssa/pr92734.c   2019-12-02 16:29:59.922195268 
> +0100
> @@ -0,0 +1,31 @@
> +/* PR tree-optimization/92734 */
> +/* { dg-do compile } */
> +/* { dg-options "-O2 -fdump-tree-forwprop1" } */
> +/* { dg-final { scan-tree-dump-times "return t_\[0-9]*\\\(D\\\);" 4 
> "forwprop1" } } */
> +
> +int
> +f1 (int t)
> +{
> +  return 1 - (int) (1U - t);
> +}
> +
> +int
> +f2 (int t)
> +{
> +  int a = 7U - t;
> +  return 7 - a;
> +}
> +
> +int
> +f3 (int t)
> +{
> +  int a = 32U - t;
> +  return 32 - a;
> +}
> +
> +int
> +f4 (int t)
> +{
> +  int a = 32 - t;
> +  return (int) (32 - (unsigned) a);
> +}
> 
>   Jakub
> 
> 

-- 
Richard Biener 
SUSE Software Solutions Germany GmbH, Maxfeldstrasse 5, 90409 Nuernberg,
Germany; GF: Felix Imendörffer; HRB 36809 (AG Nuernberg)

Re: [wwwdocs][Fortran] gcc-10/porting_to.html – Fortran's argument-checking

2019-12-03 Thread Tobias Burnus
I have now installed https://gcc.gnu.org/gcc-10/porting_to.html – which 
is linked from https://gcc.gnu.org/gcc-10/changes.html (first paragraph 
is now uncommented).


Comments? Omissions, additional suggestions for either page?

Cheers,

Tobias

On 12/2/19 1:48 PM, Tobias Burnus wrote:
Revised version. Changes: I now assume that this patch goes in before 
Wilco's; hence, it includes the full HTML file and the changes.html 
modification.


Additionally, I improved the wording at several places a tiny bit – it 
really helps to re-read what one has written :-) (Thanks, Gerald, I 
followed most of your suggestions, though some became obsolete due to 
other changes.)


Any other comments? Otherwise, I would like to commit it :-)

Tobias



[PATCH] Improve (CST1 - A) +- CST2 -> CST3 - A and CST1 - (CST2 - A) -> CST3 + A match.pd opt (PR tree-optimization/92734)

2019-12-03 Thread Jakub Jelinek
Hi!

The following patch extends the improvements Marc did to the
(A +- CST1) +- CST2 -> A +- CST3
match.pd simplification some time ago to the other two patterns,
in particular handle the case when the inner subtraction is done in a
different, but nop_convert compatible, type from the outer +/-.

Bootstrapped/regtested on x86_64-linux and i686-linux, ok for trunk?

2019-12-03  Jakub Jelinek  

PR tree-optimization/92734
* match.pd ((CST1 - A) +- CST2 -> CST3 - A,
CST1 - (CST2 - A) -> CST3 + A): Handle nop casts around
inner subtraction.

* gcc.dg/tree-ssa/pr92734.c: New test.

--- gcc/match.pd.jj 2019-12-02 09:50:27.0 +0100
+++ gcc/match.pd2019-12-02 16:23:43.825040429 +0100
@@ -2237,17 +2237,39 @@ (define_operator_list COND_TERNARY
   /* (CST1 - A) +- CST2 -> CST3 - A  */
   (for outer_op (plus minus)
(simplify
-(outer_op (minus CONSTANT_CLASS_P@1 @0) CONSTANT_CLASS_P@2)
-(with { tree cst = const_binop (outer_op, type, @1, @2); }
- (if (cst && !TREE_OVERFLOW (cst))
-  (minus { cst; } @0)
+(outer_op (nop_convert (minus CONSTANT_CLASS_P@1 @0)) CONSTANT_CLASS_P@2)
+/* If one of the types wraps, use that one.  */
+(if (!ANY_INTEGRAL_TYPE_P (type) || TYPE_OVERFLOW_WRAPS (type))
+ /* If all 3 captures are CONSTANT_CLASS_P, punt, as we might recurse
+   forever if something doesn't simplify into a constant.  */
+ (if (!CONSTANT_CLASS_P (@0))
+  (minus (outer_op (view_convert @1) @2) (view_convert @0)))
+ (if (!ANY_INTEGRAL_TYPE_P (TREE_TYPE (@0))
+ || TYPE_OVERFLOW_WRAPS (TREE_TYPE (@0)))
+  (view_convert (minus (outer_op @1 (view_convert @2)) @0))
+  (if (types_match (type, @0))
+   (with { tree cst = const_binop (outer_op, type, @1, @2); }
+   (if (cst && !TREE_OVERFLOW (cst))
+(minus { cst; } @0
 
-  /* CST1 - (CST2 - A) -> CST3 + A  */
+  /* CST1 - (CST2 - A) -> CST3 + A
+ Use view_convert because it is safe for vectors and equivalent for
+ scalars.  */
   (simplify
-   (minus CONSTANT_CLASS_P@1 (minus CONSTANT_CLASS_P@2 @0))
-   (with { tree cst = const_binop (MINUS_EXPR, type, @1, @2); }
-(if (cst && !TREE_OVERFLOW (cst))
- (plus { cst; } @0
+   (minus CONSTANT_CLASS_P@1 (nop_convert (minus CONSTANT_CLASS_P@2 @0)))
+   /* If one of the types wraps, use that one.  */
+   (if (!ANY_INTEGRAL_TYPE_P (type) || TYPE_OVERFLOW_WRAPS (type))
+/* If all 3 captures are CONSTANT_CLASS_P, punt, as we might recurse
+  forever if something doesn't simplify into a constant.  */
+(if (!CONSTANT_CLASS_P (@0))
+ (plus (view_convert @0) (minus @1 (view_convert @2
+(if (!ANY_INTEGRAL_TYPE_P (TREE_TYPE (@0))
+|| TYPE_OVERFLOW_WRAPS (TREE_TYPE (@0)))
+ (view_convert (plus @0 (minus (view_convert @1) @2)))
+ (if (types_match (type, @0))
+  (with { tree cst = const_binop (MINUS_EXPR, type, @1, @2); }
+   (if (cst && !TREE_OVERFLOW (cst))
+   (plus { cst; } @0)))
 
 /* ((T)(A)) + CST -> (T)(A + CST)  */
 #if GIMPLE
--- gcc/testsuite/gcc.dg/tree-ssa/pr92734.c.jj  2019-12-02 16:26:55.771057273 
+0100
+++ gcc/testsuite/gcc.dg/tree-ssa/pr92734.c 2019-12-02 16:29:59.922195268 
+0100
@@ -0,0 +1,31 @@
+/* PR tree-optimization/92734 */
+/* { dg-do compile } */
+/* { dg-options "-O2 -fdump-tree-forwprop1" } */
+/* { dg-final { scan-tree-dump-times "return t_\[0-9]*\\\(D\\\);" 4 
"forwprop1" } } */
+
+int
+f1 (int t)
+{
+  return 1 - (int) (1U - t);
+}
+
+int
+f2 (int t)
+{
+  int a = 7U - t;
+  return 7 - a;
+}
+
+int
+f3 (int t)
+{
+  int a = 32U - t;
+  return 32 - a;
+}
+
+int
+f4 (int t)
+{
+  int a = 32 - t;
+  return (int) (32 - (unsigned) a);
+}

Jakub



[C++ PATCH] Fix up constexpr TARGET_EXPR evaluation if there are cleanups (PR c++/91369)

2019-12-03 Thread Jakub Jelinek
Hi!

The following testcase shows that during constexpr evaluation we didn't
handle TARGET_EXPR_CLEANUP at all (which was probably fine when there
weren't constexpr dtors).  My understanding is that TARGET_EXPR cleanups
should be queued and evaluated only at the end of CLEANUP_POINT_EXPR, so
that is what the patch implements.

Regtesting of the first iteration revealed quite some FAILs in libstdc++,
my assumption that a TARGET_EXPR with cleanups will always appear inside of
some CLEANUP_POINT_EXPR is violated e.g. when cp_fold attempts to evaluate
some call expressions with TARGET_EXPR in arguments, so I had to add
evaluating of cleanups in cxx_eval_outermost_constant_expr too as if the
outermost expression were wrapped into another CLEANUP_POINT_EXPR.

There was another issue only seen in libstdc++, in particular,
TRY_CATCH_EXPR created by wrap_cleanups_r can sometimes have NULL_TREE
first argument.

Furthermore (though just theoretical, I haven't tried to create a testcase),
I believe a TARGET_EXPR shuld behave similarly to SAVE_EXPR that if the same
TARGET_EXPR tree is encountered multiple times, the initializer expression
is evaluated just once, not multiple times.  Maybe it didn't matter before
if things were evaluated multiple times, but e.g. with constexpr new/delete,
evaluating new multiple times without corresponding deletes is incorrect,
so in the patch I've tried to handle caching of the result and reuse of it
once it has been evaluated already (with pushing into save_exprs so that
e.g. when evaluating a loop second time or a function we undo the caching).

Bootstrapped/regtested on x86_64-linux and i686-linux, ok for trunk?

Note, the description of CLEANUP_POINT_EXPR in tree.def makes me worry a
little bit, with the:
"   Note that if the expression is a reference to storage, it is forced out
   of memory before the cleanups are run.  This is necessary to handle
   cases where the cleanups modify the storage referenced; in the
   expression 't.i', if 't' is a struct with an integer member 'i' and a
   cleanup which modifies 'i', the value of the expression depends on
   whether the cleanup is run before or after 't.i' is evaluated.  When
   expand_expr is run on 't.i', it returns a MEM.  This is not good enough;
   the value of 't.i' must be forced out of memory."
note.  Does that mean the CLEANUP_POINT_EXPR handling should do
unshare_constructor on the result if there are any cleanups (and similarly
outermost expr handling)?  Don't want on the other side to waste too much
memory if it is not necessary...

2019-12-03  Jakub Jelinek  

PR c++/91369
* constexpr.c (struct constexpr_global_ctx): Add cleanups member,
initialize it in the ctor.
(cxx_eval_constant_expression) : If TARGET_EXPR_SLOT
is already in the values hash_map, don't evaluate it again.  Put
TARGET_EXPR_SLOT into hash_map even if not lval, and push it into
save_exprs too.  If there is TARGET_EXPR_CLEANUP and not
CLEANUP_EH_ONLY, push the cleanup to cleanups vector.
: Save outer cleanups, set cleanups to
local auto_vec, after evaluating the body evaluate cleanups and
restore previous cleanups.
: Don't crash if the first operand is NULL_TREE.
(cxx_eval_outermost_constant_expr): Set cleanups to local auto_vec,
after evaluating the expression evaluate cleanups.

* g++.dg/cpp2a/constexpr-new8.C: New test.

--- gcc/cp/constexpr.c.jj   2019-11-28 09:02:21.896897228 +0100
+++ gcc/cp/constexpr.c  2019-12-03 01:14:06.674952015 +0100
@@ -1025,8 +1025,10 @@ struct constexpr_global_ctx {
   /* Heap VAR_DECLs created during the evaluation of the outermost constant
  expression.  */
   auto_vec heap_vars;
+  /* Cleanups that need to be evaluated at the end of CLEANUP_POINT_EXPR.  */
+  vec *cleanups;
   /* Constructor.  */
-  constexpr_global_ctx () : constexpr_ops_count (0) {}
+  constexpr_global_ctx () : constexpr_ops_count (0), cleanups (NULL) {}
 };
 
 /* The constexpr expansion context.  CALL is the current function
@@ -4984,6 +4986,14 @@ cxx_eval_constant_expression (const cons
  *non_constant_p = true;
  break;
}
+  /* Avoid evaluating a TARGET_EXPR more than once.  */
+  if (tree *p = ctx->global->values.get (TARGET_EXPR_SLOT (t)))
+   {
+ if (lval)
+   return TARGET_EXPR_SLOT (t);
+ r = *p;
+ break;
+   }
   if ((AGGREGATE_TYPE_P (TREE_TYPE (t)) || VECTOR_TYPE_P (TREE_TYPE (t
{
  /* We're being expanded without an explicit target, so start
@@ -5004,13 +5014,14 @@ cxx_eval_constant_expression (const cons
   if (!*non_constant_p)
/* Adjust the type of the result to the type of the temporary.  */
r = adjust_temp_type (TREE_TYPE (t), r);
+  if (TARGET_EXPR_CLEANUP (t) && !CLEANUP_EH_ONLY (t))
+   ctx->global->cleanups->safe_push (TARGET_EXPR_CLEANUP (t));
+  r = unshare_constructor (r);
+  

Re: [PATCH 4/4]: C++ P1423R3 char8_t remediation: New tests

2019-12-03 Thread Christophe Lyon
On Mon, 16 Sep 2019 at 04:34, Tom Honermann  wrote:
>
> A revised patch is attached that modifies the tests for deleted ostream
> inserters to require C++2a.  This is required by the revision of patch
> 2/4 that adds proper preprocessor conditionals to the definitions.
>
> Tom.
>
> On 9/15/19 3:40 PM, Tom Honermann wrote:
> > This patch adds new tests to validate new deleted overloads of wchar_t,
> > char8_t, char16_t, and char32_t for ordinary and wide formatted
> > character and string ostream inserters.
> >
> > Additionally, new tests are added to validate invocations of u8path with
> > sequences of char8_t for both the C++17 and filesystem TS implementations.
> >
> > libstdc++-v3/ChangeLog:
> >
> > 2019-09-15  Tom Honermann  
> >
> >   *
> > libstdc++-v3/testsuite/27_io/basic_ostream/inserters_character/char/deleted.cc:
> >
> > New test to validate deleted overloads of character and string
> > inserters for narrow ostreams.
> >   *
> > libstdc++-v3/testsuite/27_io/basic_ostream/inserters_character/wchar_t/deleted.cc:
> >
> > New test to validate deleted overloads of character and string
> > inserters for wide ostreams.
> >   *
> > libstdc++-v3/testsuite/27_io/filesystem/path/factory/u8path-char8_t.cc:
> > New test to validate u8path invocations with sequences of
> > char8_t.
> >   *
> > libstdc++-v3/testsuite/experimental/filesystem/path/factory/u8path-char8_t.cc
> >
> > New test to validate u8path invocations with sequences of
> > char8_t.
> >

Hi,

I've noticed that the new test
27_io/filesystem/path/factory/u8path-char8_t.cc
fails to compile on arm-none-eabi with default cpu/fpu, because:
/tools/arm-none-eabi/bin/ld:
/obj-arm-none-eabi/gcc3/arm-none-eabi/libstdc++-v3/src/.libs/libstdc++.a(string-inst.o):
in function `_ZNSt7__cxx1112basic_stringIcSt11char_traitsIcESaIcEEaSEOS4_':
string-inst.cc:(.text._ZNSt7__cxx1112basic_stringIcSt11char_traitsIcESaIcEEaSEOS4_[_ZNSt7__cxx1112basic_stringIcSt11char_traitsIcESaIcEEaSEOS4_]+0xf4):
undefined reference to `_ZSt15__alloc_on_moveISaIcEEvRT_S2_'
[etc...]

The one in experimental is unsupported thanks to
// { dg-require-filesystem-ts "" }
Should that be added to the version in 27_io?

Thanks,

Christophe

> > Tom.
>